GA-HRNet: High-Precision Building Extraction for Individualization of Oblique Photogrammetry 3D Models
Abstract
1. Introduction
- Proposed a novel segmentation head, Gated-ASPP (G-ASPP). The module introduces a context-aware gating mechanism into the classic ASPP structure, using ASPP's own global context to dynamically recalibrate its multi-scale feature responses channel by channel, improving boundary delineation for large-footprint buildings while reducing false positives in complex urban backgrounds.
- Developed the GA-HRNet model, which combines HRNet's spatial detail preservation with G-ASPP's dynamic context reasoning. On a building dataset constructed from publicly available 3D model data of the Kowloon Peninsula, Hong Kong, it outperformed several state-of-the-art segmentation models, verifying the effectiveness of the proposed method.
- Established an end-to-end framework for building individualization from oblique photogrammetry models. The framework integrates orthophoto generation, deep learning-based segmentation, and coordinate-based 3D mapping with interactive visualization, providing a practical solution for large-scale urban 3D model applications that sidesteps the geometric complexity of direct 3D segmentation.
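The gating idea behind G-ASPP can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the number of branches, the bottleneck size, and the weight shapes are assumptions. The point it shows is that a global-context vector (obtained by global average pooling) drives per-channel sigmoid gates that recalibrate the concatenated ASPP branch responses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_recalibrate(branches, w1, w2):
    """Channel-wise gating of ASPP branch outputs using global context.

    branches: list of (C, H, W) feature maps, one per ASPP branch.
    w1, w2:   weights of a small two-layer gate MLP (shapes assumed here).
    Returns the recalibrated, concatenated (N*C, H, W) feature map.
    """
    feats = np.concatenate(branches, axis=0)      # (N*C, H, W)
    context = feats.mean(axis=(1, 2))             # global average pooling -> (N*C,)
    hidden = np.maximum(0.0, w1 @ context)        # ReLU bottleneck
    gates = sigmoid(w2 @ hidden)                  # per-channel gates in (0, 1)
    return feats * gates[:, None, None]           # channel-wise recalibration

C, H, W = 8, 16, 16
# Mock outputs of four ASPP branches (e.g. different atrous rates):
branches = [rng.standard_normal((C, H, W)) for _ in range(4)]
w1 = rng.standard_normal((C, 4 * C)) * 0.1        # squeeze: 4C -> C
w2 = rng.standard_normal((4 * C, C)) * 0.1        # excite:  C -> 4C
out = gated_recalibrate(branches, w1, w2)
print(out.shape)                                  # (32, 16, 16)
```

Because every gate lies in (0, 1), the mechanism can only attenuate branch responses, letting the global context suppress scales that are irrelevant for the current scene.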
2. Literature Review
| Category | Model | Key Mechanism | Strengths | Limitations |
|---|---|---|---|---|
| CNN-based Methods | ARC-Net [31] | End-to-end attention refinement mechanism | Enhances discriminative power of feature maps | Repeated down-sampling and up-sampling operations easily lead to spatial information loss. |
| | BRRNet [32] | Residual refinement module | Effectively corrects predicted boundaries | |
| | MAP-Net [33] | Multiple parallel attention paths | Deeply captures spatial and semantic information | |
| | Modified U-Net [34] | Parallel distance map prediction | Improves smoothness and quality of segmentation results | |
| | FwSVM-Net [35] | Support Vector Machine (SVM) integration | Improves model training efficiency | |
| Transformer-based Methods | BuildFormer [36] | Convolution + window attention mechanism | Enhances frequency perception capability | Requires large amounts of training data; hard to converge and prone to overfitting with small sample sizes. |
| | MSTrans [37] | Multi-scale Transformer architecture | Robust perception of buildings of different sizes | |
| | DeepSwinLite [38] | Multi-scale feature fusion module | Improves computational efficiency and performance | |
| Foundations of the Proposed Method | HRNet [40] | Parallel multi-resolution subnetworks | Maintains high-resolution representations throughout, reducing spatial information loss | Relatively limited contextual understanding. |
| | DeepLabV3+ [41] | Atrous Spatial Pyramid Pooling (ASPP) | Expands receptive field to capture multi-scale context | Prone to spatial information loss; struggles with the complex scale variations of ultra-high-resolution orthophotos. |
3. Materials and Methods
3.1. Study Area
3.2. Dataset Construction
3.3. GA-HRNet Network
3.4. Post-Processing
3.5. Dynamic Individualization
4. Experiment
4.1. Experimental Settings
4.2. Overall Model Performance Evaluation
4.3. Experiment Results
4.4. Ablation Study
4.5. Dynamic Individualization Effect Display
5. Discussion
5.1. Mechanism of Performance Advantage
5.2. Limitations
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Full Form |
|---|---|
| UAV | Unmanned Aerial Vehicle |
| CNN | Convolutional Neural Network |
| DOM | Digital Orthophoto Map |
| ASPP | Atrous Spatial Pyramid Pooling |
| HRNet | High-Resolution Network |
| GA-HRNet | Gated-ASPP High-Resolution Network |
| IoU | Intersection over Union |
| GAP | Global Average Pooling |
References
- Riaz, K.; McAfee, M.; Gharbia, S.S. Management of climate resilience: Exploring the potential of digital twin technology, 3D city modelling, and early warning systems. Sensors 2023, 23, 2659. [Google Scholar] [CrossRef]
- Yiğit, A.Y.; Uysal, M. Virtual reality visualisation of automatic crack detection for bridge inspection from 3D digital twin generated by UAV photogrammetry. Measurement 2025, 242, 115931. [Google Scholar] [CrossRef]
- Christodoulides, A.; Tam, G.K.L.; Clarke, J.; Smith, R.; Horgan, J.; Micallef, N.; Morley, J.; Villamizar, N.; Walton, S. Survey on 3D Reconstruction Techniques: Large-Scale Urban City Reconstruction and Requirements. IEEE Trans. Vis. Comput. Graph. 2025, 31, 9343–9367. [Google Scholar] [CrossRef]
- Gu, D.; Chen, W.; Lu, X. Automated Assessment of Wind Damage to Windows of Buildings at a City Scale Based on Oblique Photography, Deep Learning and CFD. J. Build. Eng. 2022, 52, 104355. [Google Scholar] [CrossRef]
- Zhao, M.; Chen, J.; Song, S.; Li, Y.; Wang, F.; Wang, S.; Liu, D. Proposition of UAV multi-angle nap-of-the-object image acquisition framework based on a quality evaluation system for a 3D real scene model of a high-steep rock slope. Int. J. Appl. Earth Obs. Geoinf. 2023, 125, 103558. [Google Scholar] [CrossRef]
- Li, J.; Han, F.; Shi, L.; Liu, Z.; Wang, C.; Fan, Y. Rapid integration strategy for oblique photogrammetry terrain and highway BIM models in large-scale scenarios. Autom. Constr. 2025, 177, 106354. [Google Scholar] [CrossRef]
- Chowdhury, S.A.H.; Nguyen, C.; Li, H.; Li, B. Fixed-Lens camera setup and calibrated image registration for multifocus multiview 3D reconstruction. Neural Comput. Appl. 2021, 33, 7421–7440. [Google Scholar] [CrossRef]
- Verykokou, S.; Ioannidis, C. An overview on image-based and scanner-based 3D modeling technologies. Sensors 2023, 23, 596. [Google Scholar] [CrossRef]
- Liu, J.; Gao, J.; Ji, S.; Zeng, C.; Zhang, S.; Gong, J. Deep learning based multi-view stereo matching and 3D scene reconstruction from oblique aerial images. ISPRS J. Photogramm. Remote Sens. 2023, 204, 42–60. [Google Scholar] [CrossRef]
- Li, G.; Zhou, P.; Du, J.; Zhang, J.; Zhu, J. NMSCANet: Stereo Matching Network for Speckle Variations in Single-Shot Speckle Projection Profilometry. Opt. Express 2024, 32, 5849–5863. [Google Scholar] [CrossRef]
- Meng, C.; Song, Y.; Ji, J.; Jia, Z.; Zhou, Z.; Gao, P.; Liu, S. Automatic classification of rural building characteristics using deep learning methods on oblique photography. Build. Simul. 2022, 15, 1161–1174. [Google Scholar] [CrossRef]
- Zhang, L.; Wang, G.; Sun, W. Automatic Extraction of Building Geometries Based on Centroid Clustering and Contour Analysis on Oblique Images Taken by Unmanned Aerial Vehicles. Int. J. Geogr. Inf. Sci. 2022, 36, 453–475. [Google Scholar] [CrossRef]
- Wang, S.; Li, X.; Lin, L.; Lu, H.; Jiang, Y.; Zhang, N.; Wang, W.; Yue, J.; Li, Z. A Single Data Extraction Algorithm for Oblique Photographic Data Based on the U-Net. Remote Sens. 2024, 16, 979. [Google Scholar] [CrossRef]
- Ma, X.Y.; Zhang, X.P.; Shi, L. Research on the Algorithm of Building Object Boundary Extraction Based on Oblique Photographic Model. In Proceedings of the IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 October 2018; pp. 1957–1960. [Google Scholar] [CrossRef]
- Li, M.; Rottensteiner, F.; Heipke, C. Modelling of buildings from aerial LiDAR point clouds using TINs and label maps. ISPRS J. Photogramm. Remote Sens. 2019, 154, 127–138. [Google Scholar] [CrossRef]
- Yu, D.; Ji, S.; Liu, J.; Wei, S. Automatic 3D building reconstruction from multi-view aerial images with deep learning. ISPRS J. Photogramm. Remote Sens. 2021, 171, 155–170. [Google Scholar] [CrossRef]
- Hübner, P.; Weinmann, M.; Wursthorn, S.; Hinz, S. Automatic voxel-based 3D indoor reconstruction and room partitioning from triangle meshes. ISPRS J. Photogramm. Remote Sens. 2021, 181, 254–278. [Google Scholar] [CrossRef]
- Sarker, S.; Sarker, P.; Stone, G.; Gorman, R.; Tavakkoli, A.; Bebis, G.; Sattarvand, J. A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation. Mach. Vis. Appl. 2024, 35, 67. [Google Scholar] [CrossRef]
- Xu, W.; Zeng, Y.; Yin, C. 3D City Reconstruction: A Novel Method for Semantic Segmentation and Building Monomer Construction Using Oblique Photography. Appl. Sci. 2023, 13, 8795. [Google Scholar] [CrossRef]
- Kang, J.; Fernandez-Beltran, R.; Sun, X.; Ni, J.; Plaza, A. Deep Learning-Based Building Footprint Extraction With Missing Annotations. IEEE Geosci. Remote Sens. Lett. 2022, 19, 3002805. [Google Scholar] [CrossRef]
- Le, Q.H.; Shin, H.; Kwon, N.; Ho, J.; Ahn, Y. Deep Learning Based Urban Building Coverage Ratio Estimation Focusing on Rapid Urbanization Areas. Appl. Sci. 2022, 12, 11428. [Google Scholar] [CrossRef]
- Oostwegel, L.J.N.; Schorlemmer, D.; Guéguen, P. From Footprints to Functions: A Comprehensive Global and Semantic Building Footprint Dataset. Sci. Data 2025, 12, 1699. [Google Scholar] [CrossRef] [PubMed]
- Schrotter, G.; Hürzeler, C. The Digital Twin of the City of Zurich for Urban Planning. J. Photogramm. Remote Sens. Geoinf. Sci. 2020, 88, 99–112. [Google Scholar] [CrossRef]
- Luo, Y.; He, J. Evaluation of the urban heat island effect based on 3D modeling and planning indicators for urban planning proposals. J. Asian Archit. Build. Eng. 2025, 24, 5634–5656. [Google Scholar] [CrossRef]
- Liu, X.; Antwi-Afari, M.F.; Li, J.; Zhang, Y.; Manu, P. BIM, IoT, and GIS integration in construction resource monitoring. Autom. Constr. 2025, 174, 106149. [Google Scholar] [CrossRef]
- Alcaraz, C.; Lopez, J. Digital Twin: A Comprehensive Survey of Security Threats. IEEE Commun. Surv. Tutor. 2022, 24, 1475–1503. [Google Scholar] [CrossRef]
- Chen, M.; Wu, J.; Liu, L.; Zhao, W.; Tian, F.; Shen, Q.; Zhao, B.; Du, R. DR-Net: An Improved Network for Building Extraction from High Resolution Remote Sensing Image. Remote Sens. 2021, 13, 294. [Google Scholar] [CrossRef]
- Boonpook, W.; Tan, Y.; Torsri, K.; Kamsing, P.; Torteeka, P.; Nardkulpat, A. PCL–PTD Net: Parallel cross-learning-based pixel transferred deconvolutional network for building extraction in dense building areas with shadow. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 773–786. [Google Scholar] [CrossRef]
- Han, R.; Fan, X.; Liu, J. EUNet: Edge-UNet for Accurate Building Extraction and Edge Emphasis in Gaofen-7 Images. Remote Sens. 2024, 16, 2397. [Google Scholar] [CrossRef]
- Kuang, J.; Liu, D. SFGNet: Salient-feature-guided real-time building extraction network for remote sensing images. Knowl.-Based Syst. 2025, 317, 113413. [Google Scholar] [CrossRef]
- Liu, Y.; Zhou, J.; Qi, W.; Li, X.; Gross, L.; Shao, Q.; Zhao, Z.; Ni, L.; Fan, X.; Li, Z. ARC-Net: An Efficient Network for Building Extraction From High-Resolution Aerial Images. IEEE Access 2020, 8, 154997–155010. [Google Scholar] [CrossRef]
- Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A Fully Convolutional Neural Network for Automatic Building Extraction From High-Resolution Remote Sensing Images. Remote Sens. 2020, 12, 1050. [Google Scholar] [CrossRef]
- Zhu, Q.; Liao, C.; Hu, H.; Mei, X.; Li, H. MAP-Net: Multiple Attending Path Neural Network for Building Footprint Extraction From Remote Sensed Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6169–6181. [Google Scholar] [CrossRef]
- Moghalles, K.; Li, H.C.; Al-Huda, Z.; Hezzam, E.A. Multi-task deep network for semantic segmentation of building in very high resolution imagery. In Proceedings of the 2021 International Conference of Technology, Science and Administration (ICTSA), Taiz, Yemen, 22–24 March 2021; pp. 1–6. [Google Scholar] [CrossRef]
- Yildirim, F.S.; Karsli, F.; Bahadir, M.; Yildirim, M. FwSVM-Net: A novel deep learning-based automatic building extraction from aerial images. J. Build. Eng. 2024, 96, 110473. [Google Scholar] [CrossRef]
- Wang, L.; Fang, S.; Meng, X.; Li, R. Building Extraction with Vision Transformer. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5625711. [Google Scholar] [CrossRef]
- Yang, F.; Jiang, F.; Li, J.; Lu, L. MSTrans: Multi-Scale Transformer for Building Extraction from HR Remote Sensing Images. Electronics 2024, 13, 4610. [Google Scholar] [CrossRef]
- Yilmaz, E.O.; Kavzoglu, T. DeepSwinLite: A Swin transformer-based light deep learning model for building extraction using VHR aerial imagery. Remote Sens. 2025, 17, 3146. [Google Scholar] [CrossRef]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention Mask Transformer for Universal Image Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 1280–1289. [Google Scholar] [CrossRef]
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 3349–3364. [Google Scholar] [CrossRef]
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar] [CrossRef]
- Zhao, Z.; Jiang, G.; Li, Y. A Novel Method for Digital Orthophoto Generation from Top View Constrained Dense Matching. Remote Sens. 2023, 15, 177. [Google Scholar] [CrossRef]
- Chen, S.; Yan, Q.; Qu, Y.; Gao, W.; Yang, J.; Deng, F. Ortho-NeRF: Generating a True Digital Orthophoto Map Using the Neural Radiance Field from Unmanned Aerial Vehicle Images. Geo-Spat. Inf. Sci. 2024, 28, 741–760. [Google Scholar] [CrossRef]
- Wang, Y.; Zhao, Q.; Wu, Y.; Tian, W.; Zhang, G. SCA-Net: Multiscale Contextual Information Network for Building Extraction Based on High-Resolution Remote Sensing Images. Remote Sens. 2023, 15, 4466. [Google Scholar] [CrossRef]
- Yan, G.; Jing, H.; Li, H.; Guo, H.; He, S. Enhancing Building Segmentation in Remote Sensing Images: Advanced Multi-Scale Boundary Refinement with MBR-HRNet. Remote Sens. 2023, 15, 3766. [Google Scholar] [CrossRef]
- Huang, H.; Liu, J.; Wang, R. Easy-Net: A Lightweight Building Extraction Network Based on Building Features. IEEE Trans. Geosci. Remote Sens. 2024, 62, 4501515. [Google Scholar] [CrossRef]
Overall performance of GA-HRNet on the test set:

| Model | IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| GA-HRNet | 91.25 | 95.41 | 93.31 | 97.70 |
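The reported metrics follow their standard pixel-wise definitions; a minimal computation from confusion counts (the counts below are toy values for illustration, not the paper's confusion matrix):

```python
def segmentation_metrics(tp, fp, fn):
    """Standard pixel-wise metrics from confusion counts of the
    building class (true positives, false positives, false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)      # equivalently f1 / (2 - f1)
    return iou, f1, precision, recall

iou, f1, p, r = segmentation_metrics(tp=9000, fp=600, fn=200)
print(round(iou * 100, 2), round(f1 * 100, 2), round(p * 100, 2), round(r * 100, 2))
```

Note the identity IoU = F1 / (2 − F1), which the reported pairs satisfy up to rounding (e.g. F1 = 95.41% corresponds to IoU ≈ 91.2%).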
Comparison with existing segmentation models:

| Model | IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| FCN | 85.63 | 92.22 | 89.88 | 94.88 |
| U-Net | 86.33 | 92.62 | 91.87 | 93.86 |
| SCA-Net | 87.10 | 93.05 | 90.74 | 95.74 |
| MBR-HRNet | 89.79 | 94.61 | 92.86 | 96.51 |
| Easy-Net | 88.42 | 93.84 | 90.27 | 97.89 |
| GA-HRNet | 91.25 | 95.41 | 93.31 | 97.70 |
Ablation study on network components:

| Model | IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| DeepLabV3+ (ResNet+ASPP) | 89.04 | 94.17 | 91.20 | 97.54 |
| HRNet-V2 | 90.27 | 94.88 | 93.26 | 96.61 |
| HRNet-ASPP (HRNet+ASPP) | 90.52 | 95.01 | 92.70 | 97.63 |
| GA-HRNet (HRNet+G-ASPP) | 91.25 | 95.41 | 93.31 | 97.70 |
Ablation study on loss functions:

| Loss Function | IoU (%) | F1-Score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| Dice Loss | 87.91 | 93.55 | 90.14 | 97.38 |
| Focal Loss | 90.67 | 95.10 | 93.42 | 96.96 |
| BCE Loss | 91.25 | 95.41 | 93.31 | 97.70 |
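The three losses in the ablation can be sketched for binary building segmentation as follows. This is a hedged NumPy illustration of the standard formulations, not the paper's code; the focal-loss focusing parameter gamma = 2.0 is the common default and an assumption here.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Binary cross-entropy over predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def dice_loss(p, y, eps=1e-7):
    """Soft Dice loss: one minus the soft Dice coefficient."""
    inter = np.sum(p * y)
    return 1 - (2 * inter + eps) / (np.sum(p) + np.sum(y) + eps)

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights easy pixels by the factor (1 - p_t)^gamma."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    return -np.mean((1 - pt) ** gamma * np.log(pt))

# Toy predictions over four pixels:
y = np.array([1.0, 1.0, 0.0, 0.0])
p = np.array([0.9, 0.8, 0.2, 0.1])
print(bce_loss(p, y), dice_loss(p, y), focal_loss(p, y))
```

Because the focal factor (1 − p_t)^gamma never exceeds one, the focal loss is bounded above by BCE on the same predictions; Dice instead optimizes region overlap directly, which makes it less sensitive to class imbalance but noisier on small structures.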
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Zou, J.; Zhang, Y.; Li, F.; Wang, R.; Wu, J.; Qiao, Y. GA-HRNet: High-Precision Building Extraction for Individualization of Oblique Photogrammetry 3D Models. Appl. Sci. 2026, 16, 1486. https://doi.org/10.3390/app16031486