RegionGraph: Region-Aware Graph-Based Building Reconstruction from Satellite Imagery
Abstract
1. Introduction
- We propose a structured reconstruction framework based on graph construction and graph optimization. In the graph construction stage, topological information is encoded into an initial graph representation. In the graph optimization stage, this representation is refined by automatically adjusting graph nodes and edges to improve reconstruction accuracy and efficiency.
- We introduce RegionGraph for building structure extraction from overhead remote sensing images. The method incorporates a primitive estimation network, ConPNet, which uses regional heatmap context to estimate primitive locations and attributes. By representing building topology as a region-based graph and formulating graph optimization as a node-merging contraction process, RegionGraph improves both regional completeness and structural consistency.
- We conduct comparison and ablation experiments on the SpaceNet dataset. Among the compared methods, RegionGraph attains the highest region F1 (68.5%) and average F1 (68.8%), with corner and edge F1 scores comparable to state-of-the-art methods, indicating strong regional completeness and structural relationship modeling.
2. Related Work
2.1. Structural Reconstruction and Structural Reasoning
2.2. Traditional Structural Reconstruction Methods
2.3. Deep Learning for Building Extraction from Remote Sensing Images
2.4. Contour-Based and Topology-Aware Methods
2.5. Summary
3. Method
3.1. Overall Architecture
3.2. Graph Construction: ConPNet
1. Structure extraction operation: This step removes environmental and appearance variations while recovering vectorized structural information. Specifically, corner reconstruction and edge reconstruction are applied to suppress background noise and façade textures and to recover building contour corners and edges. The outputs of the corner and edge reconstruction branches are concatenated along the channel dimension before being passed to the subsequent rendering synthesis module.
2. Rendering synthesis operation: This step performs discrete sampling of boundary structures based on the detected corner points. Sampling points are generated along the boundary heatmaps according to the corner locations to produce the final sampled-point representation.
3.2.1. Structural Design of ConPNet
3.2.2. Structure Extraction
3.2.3. Rendering Compositing
3.3. Graph Optimization: Graph Shrinkage via Node Merging
Node Merging Strategy
1. Compute a confidence score for each relational edge.
2. Remove edges whose confidence is greater than a threshold.
3. Merge the nodes connected by the removed edges.
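The three steps above amount to contracting high-confidence relational edges. A minimal sketch using a union-find structure follows. It is an illustration under assumptions rather than the authors' implementation: the function name `contract_nodes`, the `(u, v, confidence)` edge format, and the choice to label each merged group by its smallest node index are all hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (node u, node v, confidence); assumed format


def contract_nodes(num_nodes: int, edges: List[Edge], threshold: float):
    """Merge nodes joined by edges whose confidence exceeds `threshold`.

    Returns (labels, remaining_edges): labels[i] is the merged-group id of
    node i, and remaining_edges are the kept edges rewritten between groups.
    """
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        # Path-halving find: walk to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for u, v, conf in edges:
        if conf > threshold:
            # Steps 2 and 3: remove the edge and merge its endpoints.
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
        else:
            kept.append((u, v))

    labels = [find(i) for i in range(num_nodes)]

    # Rewrite kept edges between merged groups; drop self-loops and duplicates.
    remaining = set()
    for u, v in kept:
        a, b = labels[u], labels[v]
        if a != b:
            remaining.add((min(a, b), max(a, b)))
    return labels, sorted(remaining)


# Toy graph: four nodes in a chain, with confidences chosen for illustration.
labels, remaining = contract_nodes(
    4, [(0, 1, 0.9), (1, 2, 0.2), (2, 3, 0.8)], threshold=0.5
)
print(labels, remaining)
```

In this toy case nodes 0 and 1 merge, nodes 2 and 3 merge, and the single low-confidence edge survives as a link between the two merged groups.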
4. Experiments and Analysis
4.1. Dataset and Sample Processing
4.1.1. Dataset
4.1.2. Sample Processing
- Sample-point heatmap. We uniformly sample points along each annotated edge at a 10-pixel interval to form the sampled-point set. Each sampled point is rendered as a 2D Gaussian kernel centered at its pixel location with standard deviation σ, yielding the sampled-point heatmap.
- Corner heatmap. Corner heatmaps are generated using the same Gaussian rendering procedure applied to the annotated corner set.
- Boundary heatmap. Annotated edges are dilated to 3-pixel-wide line segments and then smoothed with a Gaussian filter to form the boundary heatmap.
- Region heatmap. The closed region enclosed by annotated edges is filled with 1, and all other pixels are set to 0, producing the region heatmap.
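The sample-point, corner, and region heatmaps described above can be sketched with NumPy. This is a minimal illustration, not the authors' code. The function names, the `sigma=2.0` placeholder (the actual value is not given in this text), the use of a pixel-wise maximum to combine overlapping kernels, and the even-odd fill rule evaluated at integer pixel coordinates are all assumptions. The boundary heatmap is omitted because its dilation and smoothing steps would need an image-processing library.

```python
import numpy as np


def sample_points_on_edge(p0, p1, step=10.0):
    """Sample (x, y) points from p0 toward p1 every `step` pixels, starting at p0."""
    p0 = np.asarray(p0, dtype=float)
    p1 = np.asarray(p1, dtype=float)
    length = np.linalg.norm(p1 - p0)
    n = int(length // step)
    # Fractions along the edge; the guard avoids dividing by a zero length.
    t = np.arange(n + 1) * step / max(length, 1e-9)
    return p0 + t[:, None] * (p1 - p0)


def render_gaussian_heatmap(points, shape, sigma):
    """Place a 2D Gaussian at each (x, y) point; overlaps combine by max (assumed)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heatmap = np.zeros((h, w), dtype=float)
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma**2))
        heatmap = np.maximum(heatmap, kernel)
    return heatmap


def render_region_heatmap(polygon, shape):
    """Binary mask: 1 inside the closed polygon (even-odd rule), 0 elsewhere."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    inside = np.zeros((h, w), dtype=bool)
    poly = np.asarray(polygon, dtype=float)
    for (x0, y0), (x1, y1) in zip(poly, np.roll(poly, -1, axis=0)):
        # Rows whose horizontal ray crosses this polygon edge.
        crosses = (y0 > ys) != (y1 > ys)
        with np.errstate(divide="ignore", invalid="ignore"):
            # x-coordinate of the edge at each row (unused for horizontal edges).
            x_at_y = x0 + (ys - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (xs < x_at_y)
    return inside.astype(float)


# Toy example: one 30-pixel edge and one square footprint.
edge_points = sample_points_on_edge((0, 0), (30, 0), step=10.0)
point_heatmap = render_gaussian_heatmap(edge_points, (8, 40), sigma=2.0)  # sigma is a placeholder
corner_heatmap = render_gaussian_heatmap([(2, 2), (6, 2), (6, 6), (2, 6)], (8, 8), sigma=2.0)
region_heatmap = render_region_heatmap([(2, 2), (6, 2), (6, 6), (2, 6)], (8, 8))
```

The corner heatmap reuses the same Gaussian renderer on the corner set, mirroring the description above. Only the point list changes.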
4.2. Evaluation Metrics and Experimental Setup
4.2.1. Evaluation Metrics
4.2.2. Experimental Setup
4.3. Comparative Evaluation
4.4. Ablation Study
- Pointsample. Use sampled points as the triangulation input; otherwise use corners.
- Contracttri. Enable triangular-region merging; otherwise remove this contraction step.
- Contractheat. Enable relational-edge-based merging; otherwise remove this contraction step.
- R. Use region heatmap prior in ConPNet; otherwise remove RegionHead and DownsampleR.
- CA. Use channel attention in the Fusion Network; otherwise replace it with simple channel summation.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Nauata, N.; Furukawa, Y. Vectorizing world buildings: Planar graph reconstruction by primitive detection and relationship inference. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 711–726.
2. Zhao, S.; Tu, K.; Ye, S.; Tang, H.; Hu, Y.; Xie, C. Land Use and Land Cover Classification Meets Deep Learning: A Review. Sensors 2023, 23, 8966.
3. Sharifi, A.; Khavarian-Garmsir, A.R.; Allam, Z.; Asadzadeh, A. Progress and prospects in planning: A bibliometric review of literature in Urban Studies and Regional and Urban Planning, 1956–2022. Prog. Plan. 2023, 173, 100740.
4. Hough, P.V. Method and Means for Recognizing Complex Patterns. U.S. Patent 3,069,654, 18 December 1962.
5. Zhao, W.; Persello, C.; Stein, A. Building instance segmentation and boundary regularization from high-resolution remote sensing images. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2020; pp. 3916–3919.
6. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
7. Toshev, A.; Szegedy, C. DeepPose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2014; pp. 1653–1660.
8. Xu, D.; Zhu, Y.; Choy, C.B.; Li, F. Scene graph generation by iterative message passing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2017; pp. 5410–5419.
9. Xu, Y.; Xu, W.; Cheung, D.; Tu, Z. Line segment detection using transformers without edges. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2021; pp. 4257–4266.
10. Furukawa, Y.; Curless, B.; Seitz, S.M.; Szeliski, R. Manhattan-world stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2009; pp. 1422–1429.
11. Silberman, N.; Hoiem, D.; Kohli, P.; Fergus, R. Indoor segmentation and support inference from RGBD images. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012; pp. 746–760.
12. Nguyen, T.; Reitmayr, G.; Schmalstieg, D. Structural modeling from depth images. IEEE Trans. Vis. Comput. Graph. 2015, 21, 1230–1240.
13. Zou, C.; Colburn, A.; Shan, Q.; Hoiem, D. LayoutNet: Reconstructing the 3D room layout from a single RGB image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 2051–2059.
14. Yang, S.T.; Wang, F.E.; Peng, C.H.; Wonka, P.; Sun, M.; Chu, H.K. DuLa-Net: A dual-projection network for estimating room layouts from a single RGB panorama. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019; pp. 3363–3372.
15. Gimenez, L.; Hippolyte, J.L.; Robert, S.; Suard, F.; Zreik, K. Reconstruction of 3D building information models from 2D scanned plans. J. Build. Eng. 2015, 2, 24–35.
16. Zhang, Y. Optimisation of building detection in satellite images by combining multispectral classification and texture filtering. ISPRS J. Photogramm. Remote Sens. 1999, 54, 50–60.
17. Okorn, B.; Xiong, X.; Akinci, B.; Huber, D. Toward automated modeling of floor plans. In Proceedings of the Symposium on 3D Data Processing, Visualization and Transmission, Paris, France, 17–20 May 2010; Volume 2.
18. Cabral, R.; Furukawa, Y. Piecewise planar and compact floorplan reconstruction from images. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2014; pp. 628–635.
19. Delage, E.; Lee, H.; Ng, A.Y. A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2006; Volume 2, pp. 2418–2428.
20. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2015; pp. 3431–3440.
21. Girard, N.; Smirnov, D.; Solomon, J.; Tarabalka, Y. Polygonal Building Extraction by Frame Field Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2021; pp. 5891–5900.
22. He, J.; Cheng, Y.; Wang, W.; Ren, Z.; Zhang, C.; Zhang, W. A Lightweight Building Extraction Approach for Contour Recovery in Complex Urban Environments. Remote Sens. 2024, 16, 740.
23. Li, K.; Liu, R.; Cao, X.; Bai, X.; Zhou, F.; Meng, D.; Wang, Z. SegEarth-OV: Towards training-free open-vocabulary segmentation for remote sensing images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2025; pp. 10545–10556.
24. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 2961–2969.
25. Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 649–665.
26. Bolya, D.; Zhou, C.; Xiao, F.; Lee, Y.J. YOLACT: Real-time instance segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2019; pp. 9157–9166.
27. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 8759–8768.
28. Tian, Z.; Shen, C.; Chen, H. Conditional convolutions for instance segmentation. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 282–298.
29. Peng, S.; Jiang, W.; Pi, H.; Li, X.; Bao, H.; Zhou, X. Deep snake for real-time instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2020; pp. 8533–8542.
30. Liu, Z.; Liew, J.H.; Chen, X.; Feng, J. DANCE: A Deep Attentive Contour Model for Efficient Instance Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2021; pp. 345–354.
31. Wei, S.; Zhang, T.; Ji, S. A Concentric Loop Convolutional Neural Network for Manual Delineation-Level Building Boundary Segmentation from Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4407511.
32. Wang, L.; Wang, G.; Luo, X.; Wang, L.; Yu, W.; Zhang, Z.; Gao, H. Contour-based instance segmentation method of road scene. Sci. Rep. 2025, 15, 33692.
33. Xiao, X.; Wang, K.; Zhong, Z.; Qu, W.; Wu, W.; Cui, Z.; Su, Y.; Li, A.; Gong, J.; Li, D. A novel data-driven based high-precision building roof contour full-automatic extraction and structured 3D reconstruction method combining stereo images and LiDAR points. Int. J. Digit. Earth 2025, 18, 2484668.
34. Yao, W.; Li, C.; Xiong, M.; Dong, W.; Chen, H.; Xiao, X. ContourFormer: Real-Time Contour-Based End-to-End Instance Segmentation Transformer. arXiv 2025, arXiv:2501.17688.
35. Zhang, F.; Nauata, N.; Furukawa, Y. Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2020; pp. 2798–2807.
36. Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 483–499.
37. Wu, W.; Qian, C.; Yang, S.; Wang, Q.; Cai, Y.; Zhou, Q. Look at Boundary: A Boundary-Aware Face Alignment Algorithm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018.
38. Acuna, D.; Ling, H.; Kar, A.; Fidler, S. Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 859–868.
39. Zhang, Z.; Li, Z.; Bi, N.; Wang, J.; Zhang, S. PPGNet: Learning Point-Pair Graph for Line Segment Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2019; pp. 7105–7114.
40. Hamaguchi, R.; Hikosaka, S. Building Detection from Satellite Imagery Using Ensemble of Size-Specific Detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; IEEE: New York, NY, USA, 2018; pp. 187–191.
41. Zhang, R.; Zhang, Q.; Zhang, G. SDSC-UNet: Dual Skip Connection ViT-Based U-Shaped Model for Building Extraction. IEEE Geosci. Remote Sens. Lett. 2023, 20, 6005005.
42. Zhou, Y.; Qi, H.; Ma, Y. End-to-End Wireframe Parsing. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2019; pp. 962–971.

| Stage | Layers | | |
|---|---|---|---|
| Input | | | |
| Structure | Conv, stride = 1 | Conv, stride = 1 | ConvT, stride = 2 |
| | BatchNorm | BatchNorm | BatchNorm |
| | ReLU | ReLU | ReLU |
| | Concatenate | | |
| Output | | | |
| Method | Corner Precision (%) | Corner Recall (%) | Edge Precision (%) | Edge Recall (%) | Region Precision (%) | Region Recall (%) |
|---|---|---|---|---|---|---|
| PolyRNN++ [38] | 49.6 | 43.7 | 19.5 | 15.2 | 39.8 | 13.7 |
| PPGNet [39] | 78.0 | 69.2 | 55.1 | 50.6 | 32.4 | 30.8 |
| Hamaguchi [40] | 58.3 | 57.8 | 25.4 | 22.3 | 51.0 | 36.7 |
| SDSC-UNet [41] | 42.5 | 70.6 | 25.6 | 35.6 | 42.1 | 42.7 |
| L-CNN [42] | 66.7 | 86.2 | 51.0 | 71.2 | 25.9 | 41.5 |
| Nauata [1] | 91.1 | 64.6 | 68.1 | 48.0 | 70.9 | 53.1 |
| ConvMPN [35] | 77.9 | 80.2 | 56.9 | 60.7 | 51.1 | 57.6 |
| RegionGraph (Ours) | 80.3 | 75.9 | 61.6 | 58.3 | 71.9 | 65.4 |
| Method | F1-Corner (%) | F1-Edge (%) | F1-Region (%) | F1-Average (%) |
|---|---|---|---|---|
| PolyRNN++ [38] | 46.4 | 17.1 | 20.4 | 28.0 |
| PPGNet [39] | 73.3 | 52.8 | 31.6 | 52.6 |
| Hamaguchi [40] | 58.0 | 23.8 | 42.7 | 41.5 |
| SDSC-UNet [41] | 53.1 | 29.8 | 42.4 | 41.8 |
| L-CNN [42] | 75.2 | 59.4 | 31.9 | 55.5 |
| Nauata [1] | 75.6 | 56.3 | 60.8 | 64.2 |
| ConvMPN [35] | 79.0 | 58.7 | 54.2 | 64.0 |
| RegionGraph (Ours) | 78.0 | 59.9 | 68.5 | 68.8 |
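The F1 scores in the table above are consistent with the harmonic mean of the precision and recall values reported in the preceding table. A minimal check follows, assuming the standard F1 definition and assuming that F1-Average is the unweighted mean of the three per-primitive F1 scores. The helper name `f1_score` and the dictionary layout are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# RegionGraph (precision, recall) pairs copied from the comparison table.
regiongraph = {
    "corner": (80.3, 75.9),
    "edge": (61.6, 58.3),
    "region": (71.9, 65.4),
}

f1 = {name: f1_score(p, r) for name, (p, r) in regiongraph.items()}
f1_average = sum(f1.values()) / len(f1)  # assumed: unweighted mean

for name, value in f1.items():
    print(f"F1-{name}: {value:.1f}")
print(f"F1-average: {f1_average:.1f}")
```

Rounded to one decimal place, this reproduces the tabulated RegionGraph values of 78.0, 59.9, 68.5, and 68.8.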
| Setting Name | Pointsample | Contracttri | Contractheat | R | CA |
|---|---|---|---|---|---|
| Setting 1 | / | ✓ | / | / | / |
| Setting 2 | ✓ | / | / | / | / |
| Setting 3 | ✓ | ✓ | / | / | / |
| Setting 4 | ✓ | / | ✓ | / | / |
| Setting 5 | ✓ | ✓ | ✓ | / | / |
| Setting 6 | ✓ | ✓ | ✓ | ✓ | / |
| Setting 7 | ✓ | ✓ | ✓ | ✓ | ✓ |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Li, L.; Fang, C.; Li, W.; Chen, K.; Li, B.; Sun, Q. RegionGraph: Region-Aware Graph-Based Building Reconstruction from Satellite Imagery. J. Imaging 2026, 12, 161. https://doi.org/10.3390/jimaging12040161

