Semi-Automated Dataset Generation for Residential Buildings Using Graph-Based Topological Modelling
Abstract
1. Introduction
2. Background
2.1. Context
2.2. State of the Art
2.2.1. Graph-Based Analysis for Spatial Layouts
2.2.2. Floor Plan Datasets
3. Materials and Methods
3.1. Workflow Overview
- Raw data acquisition: the building’s typical floor plans are acquired in a raw format.
- Floor plan retracing: the spaces and the doors in the floor plans are manually retraced in a 3D modelling environment.
- Topological data generation: a VPL algorithm generates, for each floor plan, a 3D topological model that maps the spaces and their passage relationships through doors.
- Functional data assignment: the occupancy types of the spaces in the floor plan are assigned through a procedure that leverages conditional data enrichment techniques based on the analysis of graph centrality metrics and morphological attributes of spaces.
3.2. Raw Data Acquisition
3.3. Floor Plan Retracing
- The initial step is to import the raw floor plan data into Rhino, ensuring that the original architectural drawings are available in a 3D modelling environment, where their geometry and scale can be manipulated.
- Once imported, each space within the floor plan is manually retraced by drawing closed polylines delineating the gross area of each space.
- In parallel with space retracing, door locations are defined by manually adding a vertex at the centre of each door opening. The subsequent algorithms use these vertices to capture the connectivity between spaces.
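The role of the door vertices can be illustrated with a small sketch. The paper's graph generation runs in a VPL/Topologic environment, so the plain Python below is not that implementation and uses no Topologic API. It assumes that each retraced space is available as an ordered list of (x, y) vertices in metres and that each door vertex lies on the shared boundary of the two spaces it connects. The 5 cm tolerance and the function names are hypothetical.

```python
import math
from itertools import combinations


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (2D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def on_boundary(p, polygon, tol):
    """True if p lies within tol of any edge of the closed polygon."""
    n = len(polygon)
    return any(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n]) <= tol
        for i in range(n)
    )


def door_edges(spaces, doors, tol=0.05):
    """Connect every pair of spaces whose boundaries pass through a door vertex."""
    edges = set()
    for door in doors:
        touching = sorted(
            name for name, poly in spaces.items() if on_boundary(door, poly, tol)
        )
        # Sorted names give each edge a canonical (u, v) order.
        edges.update(combinations(touching, 2))
    return edges
```

For two rooms A (0 to 4 m) and B (4 to 7 m) lying side by side below a corridor C, door vertices at (4, 1.5) and (2, 3) yield the edges ("A", "B") and ("A", "C").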
3.4. Topological Graph Generation
3.5. Functional Data Assignment
3.5.1. Ruleset 1: Identifying the Stairs via Uniform Component Decomposition Centrality
- For every node in the building block graph, the node is temporarily removed, and any nodes left isolated by the removal are discarded.
- The modified graph is split into its connected subgraphs, and the algorithm records the size (number of nodes) of each resulting subgraph.
- The algorithm computes the standard deviation of these subgraph sizes to quantify how evenly the graph splits without the node.
- The standard deviation is then rescaled, and a negative logarithm is applied to produce a score, so that a lower standard deviation (i.e., more homogeneous component sizes) results in a higher UCDC score. This score is normalised and assigned to the node as its UCDC.
- Within each building block, the node with the highest score (i.e., the one whose removal yields the most balanced split) is identified as the staircase node, serving as the root of that building block (Figure 8).
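The UCDC steps above can be sketched in plain Python over an adjacency dict (node to list of neighbours). The text does not specify the rescaling, so this sketch assumes that the population standard deviation is divided by the number of nodes n before the negative logarithm, and it adds a hypothetical small constant `eps` to avoid log(0). It also assumes that nodes whose removal leaves fewer than two non-trivial components (for example, leaf rooms) get the lowest score, since they do not split the graph. These choices are assumptions, not the authors' implementation.

```python
import math
import statistics


def components(adj, removed):
    """Connected components of the graph after deleting node `removed`."""
    seen, comps = {removed}, []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first traversal
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def ucdc(adj, eps=1e-9):
    """Normalised UCDC score (0..1) for every node of a building block graph."""
    n = len(adj)
    raw = {}
    for node in adj:
        # Components of size 1 are the isolated nodes, which are discarded.
        sizes = [len(c) for c in components(adj, node) if len(c) > 1]
        if len(sizes) < 2:
            raw[node] = None  # assumption: no real split, lowest score
            continue
        std = statistics.pstdev(sizes)
        raw[node] = -math.log(std / n + eps)  # assumed rescaling: std / n
    valid = [s for s in raw.values() if s is not None]
    lo, hi = (min(valid), max(valid)) if valid else (0.0, 0.0)
    return {
        v: 0.0 if s is None else (1.0 if hi == lo else (s - lo) / (hi - lo))
        for v, s in raw.items()
    }
```

On a block in which a staircase S serves two three-room apartments, S is the only node whose removal splits the graph, so it scores 1.0 and `max(scores, key=scores.get)` selects it as the root.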
3.5.2. Ruleset 2: Detecting Elevators
3.5.3. Ruleset 3: Apartment Graph Extraction and Entrance Identification
3.5.4. Ruleset 4: Corridor Assignment
3.5.5. Ruleset 5: Distinguishing Corridors from Pass-Through Living Rooms
- Chain A: this configuration comprises nodes connected in the sequence “Entrance–Corridor–Corridor”. The central corridor is designated as a pass-through living room if its area is greater than that of the second corridor, if the degree (number of connections) of the second corridor is higher than that of the central corridor, and if the degree of the central corridor is less than or equal to 3.
- Chain B: this configuration comprises nodes connected in the sequence “Corridor–Corridor–None”, where “None” indicates that the node has not yet been classified. In this chain, the central corridor is reclassified as a pass-through living room if its area exceeds that of each adjacent node not classified as a corridor, if the degree of the first corridor is greater than that of the central corridor, and if the degree of the central corridor is less than or equal to 3.
- Chain C: this configuration comprises nodes connected in the sequence “Entrance–Corridor–None”. Here, the corridor is classified as a pass-through living room if its area is larger than that of each adjacent non-corridor node and if its degree is less than or equal to 3.
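The three chain conditions can be expressed as a single predicate. This is a sketch that assumes the apartment graph is stored as an adjacency dict, with a `labels` dict holding "entrance", "corridor" or no entry (unclassified) for each node, and an `areas` dict of gross areas. The function name is hypothetical. Following the text literally, "each adjacent non-corridor node" also includes the entrance in Chain C.

```python
def is_pass_through_living(adj, labels, areas, c):
    """True if corridor node c matches Chain A, B or C (sketch)."""
    if labels.get(c) != "corridor" or len(adj[c]) > 3:
        return False  # every chain requires a corridor with degree <= 3
    deg_c = len(adj[c])
    # Area test shared by Chains B and C: larger than every non-corridor neighbour.
    beats_non_corridors = all(
        areas[c] > areas[v] for v in adj[c] if labels.get(v) != "corridor"
    )
    for a in adj[c]:
        for b in adj[c]:
            if a == b:
                continue
            la, lb = labels.get(a), labels.get(b)
            # Chain A: Entrance(a) - Corridor(c) - Corridor(b)
            if la == "entrance" and lb == "corridor":
                if areas[c] > areas[b] and len(adj[b]) > deg_c:
                    return True
            # Chain B: Corridor(a) - Corridor(c) - None(b)
            elif la == "corridor" and lb is None:
                if beats_non_corridors and len(adj[a]) > deg_c:
                    return True
            # Chain C: Entrance(a) - Corridor(c) - None(b)
            elif la == "entrance" and lb is None:
                if beats_non_corridors:
                    return True
    return False
```

Checking every ordered pair of neighbours lets one corridor be tested against all chains in a single pass. The first match wins.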
3.5.6. Ruleset 6: Differentiating Liveable from Non-Liveable Spaces
- Approach A: Initially, each apartment’s unclassified nodes (i.e., those not already labelled as “corridor”, “entrance”, or “pass-through living room”) were clustered using K-Means with three clusters. The clustering features combined the gross area with centrality metrics (i.e., betweenness, closeness, and degree centrality in the apartment graph). The cluster with the smallest average area was designated as “Supporting” spaces, and the remaining clusters were classified as “Living” spaces. Moreover, if the cluster with the medium average area had a mean area below 65% of that of the cluster with the highest average area (a threshold derived from the ratio of the smallest single bedroom to the smallest double bedroom in Bologna according to local regulation), its nodes were also reclassified as “Supporting”.
- Approach B: However, because of sensitivity issues with Approach A (e.g., recurrent failures to identify small single bedrooms as living spaces in apartments containing many large rooms), the algorithm was simplified to perform K-Means clustering on the area attribute alone, with the number of clusters reduced to 2.
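Approach B reduces to a one-dimensional, two-cluster K-Means on gross area. The sketch below implements Lloyd's iterations directly so that it has no dependencies. In practice a library routine such as scikit-learn's `KMeans(n_clusters=2)` would normally be used. Initialising the two centroids at the smallest and largest areas is an assumption, the function name is hypothetical, and the example areas are illustrative rather than taken from the dataset.

```python
def split_living_supporting(areas, iters=100):
    """areas: node -> gross area (m2). Returns node -> "Living" / "Supporting"."""
    values = list(areas.values())
    lo_c, hi_c = min(values), max(values)  # assumed initial centroids
    for _ in range(iters):
        # Assignment step: each area joins its nearest centroid (ties go to the small cluster).
        small = [a for a in values if abs(a - lo_c) <= abs(a - hi_c)]
        large = [a for a in values if abs(a - lo_c) > abs(a - hi_c)]
        # Update step: centroids move to their cluster means.
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large) if large else hi_c
        if (new_lo, new_hi) == (lo_c, hi_c):
            break  # converged: assignments no longer change
        lo_c, hi_c = new_lo, new_hi
    # The cluster with the lower mean area is labelled "Supporting".
    return {
        n: "Supporting" if abs(a - lo_c) <= abs(a - hi_c) else "Living"
        for n, a in areas.items()
    }
```

With hypothetical areas of 2.0, 4.5 and 9.0 m² (closet, bathroom, kitchen) and 12.0, 15.0 and 20.0 m² (two bedrooms and a living room), the first three nodes are labelled "Supporting" and the last three "Living".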
3.5.7. Ruleset 7: Identifying the Living Room
- Proximity to the entrance: the closer a node is to the designated entrance, the more likely it is to function as the living room.
- Connectivity (degree): nodes with a higher number of connections (i.e., a higher degree) are more central in the apartment’s circulation and thus are favoured.
- Size: a larger area increases the likelihood of the node serving as the living room.
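The three criteria can be combined into a single score. The text does not give the formula, so this sketch makes several assumptions: equal weights, proximity measured as the inverse hop count from the entrance, and degree and area each divided by their maximum among the candidates. The candidates are assumed to be the nodes labelled "Living" by Ruleset 6. The weighting scheme and the function names are hypothetical.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def pick_living_room(adj, areas, entrance, candidates, weights=(1.0, 1.0, 1.0)):
    """Return the candidate with the highest combined proximity/degree/size score."""
    dist = hop_distances(adj, entrance)
    max_deg = max(len(adj[c]) for c in candidates)
    max_area = max(areas[c] for c in candidates)

    def score(c):
        proximity = 1.0 / dist[c]  # one hop from the entrance gives 1.0
        return (weights[0] * proximity
                + weights[1] * len(adj[c]) / max_deg
                + weights[2] * areas[c] / max_area)

    return max(candidates, key=score)
```

Exposing `weights` keeps the relative importance of the three criteria explicit, so that it can be tuned against labelled apartments.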
3.5.8. Ruleset 8: Classifying Bedrooms
3.5.9. Ruleset 9: Identifying Closets
3.5.10. Ruleset 10: Kitchen Identification
3.5.11. Ruleset 11: Setting Toilets
3.6. Data Export
3.7. Sensitivity Analysis
4. Results
4.1. Dataset Description
4.2. Dataset Analysis
5. Discussion
5.1. Evaluation of Process Accuracy
5.1.1. Evaluation of Rulesets 1 and 2: Stairs and Elevators
5.1.2. Evaluation of Rulesets 3, 4 and 5: Entrances, Corridors and Pass-Through Living Rooms
- Chain A was observed in 23 apartments (6.0% of the total). This configuration occurs when a pass-through living room is situated between two corridors, following the specified conditions.
- Chain B was found in 28 apartments (7.3%). In this case, the pass-through living room appears between two corridors, with one of them leading to a dead end.
- Chain C was the most common, appearing in 113 apartments (29.4%). Here, the pass-through living room is formed between the entrance and a single corridor, which ends in a dead end.
5.1.3. Evaluation of Ruleset 6: Living vs. Supporting Spaces
5.1.4. Evaluation of Ruleset 7, 8, 9, 10 and 11: Living Rooms, Bedrooms, Kitchens, Toilets, and Closets
- Single bedroom vs. kitchen: These are often confused when the kitchen is directly connected to the corridor (entrance) and is not connected to other spaces. Because Italian regulations (Istruzioni Ministeriali 20 Giugno 1896) stipulated a minimum of 9 square metres for both kitchens and single bedrooms, the two spaces can be indistinguishable in both size and connectivity, so this error is understandable. In such cases, the approach cannot yield better results.
- Living rooms vs. large bedrooms: The same consideration applies. Established Italian regulations require that living rooms and double bedrooms have a net area greater than 14 square metres, and when they also occupy similar positions in the apartment graph, centrality metrics cannot reliably separate living rooms from bedrooms. Again, the algorithm has little room for improvement in these cases.
- Small kitchen vs. toilets: A small kitchen can be confused with a toilet when the toilet is larger than the kitchen, for similar reasons.
- Small toilets vs. closets: Another recurring error occurs when bathrooms are smaller than closets. This happens because the process assumes that closets are the nodes with the smallest area among those classified as supporting spaces, so a bathroom smaller than the closet is labelled as the closet.
5.2. Limitations and Future Studies
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AECO | Architecture, Engineering, Construction, and Operations. |
ANN | Artificial neural network. |
BIM | Building information modelling. |
GIS | Geographic Information System. |
GNN | Graph Neural Network. |
ML | Machine learning. |
RF | Random Forest. |
UCDC | Uniform component decomposition centrality. |
VPL | Visual Programming Language. |
References
- Italian National Institute of Statistics (ISTAT). Population and Housing Census. 2011. Available online: http://dati-censimentopopolazione.istat.it/Index.aspx (accessed on 13 March 2025).
- Lehtola, V.V.; Koeva, M.; Elberink, S.O.; Raposo, P.; Virtanen, J.-P.; Vahdatikhaki, F.; Borsci, S. Digital Twin of a City: Review of Technology Serving City Needs. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 102915.
- Lu, Y.; Tian, R.; Li, A.; Wang, X.; Garcia del Castillo Lopez, J.L. CUBIGRAPH5K: Organizational Graph Generation for Structured Architectural Floor Plan Dataset. In Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA 2021), Hong Kong, China, 29 March–1 April 2021; Volume 1, pp. 81–90.
- Parenti, G. Una Esperienza Di Programmazione Settoriale Nell’edilizia: L’ina-Casa; Giuffrè: Roma, Italy, 1967.
- Ministero dei Lavori Pubblici. Legge 2 Luglio 1949, n. 408: Disposizioni per l’incremento Delle Costruzioni Edilizie; Ministero dei Lavori Pubblici: Rome, Italy, 1949.
- Xie, X.; Ding, W. An Interactive Approach for Generating Spatial Architecture Layout Based on Graph Theory. Front. Archit. Res. 2023, 12, 630–650.
- Zhang, X.; Wong, A.K.-S.; Lea, C.-T. Automatic Floor Plan Analysis for Adaptive Indoor Wi-Fi Positioning System. In Proceedings of the 2016 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2016; pp. 869–874.
- Herthogs, P.; De Temmerman, N.; De Weerdt, Y. Assessing the Generality and Adaptability of Building Layouts Using Justified Plan Graphs and Weighted Graphs: A Proof of Concept. In Proceedings of the Central Europe towards Sustainable Building 2013: Sustainable Refurbishment of Existing Building Stock, Prague, Czech Republic, 26–28 June 2013.
- Azizi, V.; Usman, M.; Zhou, H.; Faloutsos, P.; Kapadia, M. Graph-Based Generative Representation Learning of Semantically and Behaviorally Augmented Floorplans. Vis. Comput. 2022, 38, 2785–2800.
- De Las Heras, L.-P.; Terrades, O.R.; Llados, J. Attributed Graph Grammar for Floor Plan Analysis. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 726–730.
- Keleş, B.N.; Takva, Ç.; Çakıcı, F.Z. Accessibility Analysis of Public Buildings with Graph Theory and the Space Syntax Method: Government Houses. J. Asian Archit. Build. Eng. 2025, 24, 199–213.
- Hillier, B. A Theory of the City as Object: Or, How Spatial Laws Mediate the Social Construction of Urban Space. Urban Des. Int. 2001, 6, 153–179.
- Hillier, B. Space Is the Machine: A Configurational Theory of Architecture; Cambridge University Press: Cambridge, UK, 1996.
- Isaac, S.; Sadeghpour, F.; Navon, R. Analyzing Building Information Using Graph Theory. In Proceedings of the 30th ISARC, Montreal, QC, Canada, 11–15 August 2013.
- Ernst, S.; Łabuz, M.; Środa, K.; Kotulski, L. Graph-Based Spatial Data Processing and Analysis for More Efficient Road Lighting Design. Sustainability 2018, 10, 3850.
- Cao, J.; Zhang, H.; Savov, A.; Hall, D.; Dillenburger, B. Energy-Aware Design: Predicting Building Performance from Layout Graphs. In Proceedings of the Energy-Aware Design: Predicting Building Performance from Layout Graphs, Rhodes, Greece, 24–26 July 2022.
- Wilson, R.J. Introduction to Graph Theory, 4th ed.; Longman Group Ltd.: Harlow, UK, 1996.
- Wong, S.S.Y.; Chan, K.C.C. EvoArch: An Evolutionary Algorithm for Architectural Layout Design. Comput.-Aided Des. 2009, 41, 649–667.
- Valente, T.W.; Coronges, K.; Lakon, C.; Costenbader, E. How Correlated Are Network Centrality Measures? Connections 2008, 28, 16–26.
- Werner, C.; Loidl, M. Betweenness Centrality in Spatial Networks: A Spatially Normalised Approach (Short Paper). GIScience 2023, 277, 83.
- Wurzer, G.; Lorenz, W.E. SpaceBook: A Case Study of Social Network Analysis in Adjacency Graphs. In Complexity & Simplicity: Proceedings of the 34th eCAADe Conference, Oulu, Finland, 24–26 August 2016; Herneoja, A., Österlund, T., Markkanen, P., Eds.; TU Wien: Vienna, Austria, 2016; Volume 2, pp. 229–238.
- Yang, L.; Worboys, M. Generation of Navigation Graphs for Indoor Space. Int. J. Geogr. Inf. Sci. 2015, 29, 1737–1756.
- Bhattacharya, S.; Sinha, S.; Dey, P.; Saha, A.; Chowdhury, C.; Roy, S. Online Social-Network Sensing Models. In Computational Intelligence Applications for Text and Sentiment Data Analysis; Elsevier: Amsterdam, The Netherlands, 2023; pp. 113–140.
- Kalervo, A.; Ylioinas, J.; Häikiö, M.; Karhu, A.; Kannala, J. CubiCasa5K: A Dataset and an Improved Multi-Task Model for Floorplan Image Analysis. In Image Analysis; Felsberg, M., Forssén, P.-E., Sintorn, I.-M., Unger, J., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 11482, pp. 28–40.
- Goyal, S.; Mistry, V.; Chattopadhyay, C.; Bhatnagar, G. BRIDGE: Building Plan Repository for Image Description Generation, and Evaluation. In Proceedings of the 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 20–25 September 2019; pp. 1071–1076.
- van Engelenburg, C.; Mostafavi, F.; Kuhn, E.; Jeon, Y.; Franzen, M.; Standfest, M.; van Gemert, J.; Khademi, S. MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes. arXiv 2024, arXiv:2407.10121.
- Wu, W.; Fu, X.-M.; Tang, R.; Wang, Y.; Qi, Y.-H.; Liu, L. Data-Driven Interior Plan Generation for Residential Buildings. ACM Trans. Graph. 2019, 38, 1–12.
- Hu, R.; Huang, Z.; Tang, Y.; Van Kaick, O.; Zhang, H.; Huang, H. Graph2Plan: Learning Floorplan Generation from Layout Graphs. ACM Trans. Graph. 2020, 39, 4.
- Nauata, N.; Chang, K.-H.; Cheng, C.-Y.; Mori, G.; Furukawa, Y. House-GAN: Relational Generative Adversarial Networks for Graph-Constrained House Layout Generation. arXiv 2020, arXiv:2003.06988.
- Jabi, W.; Chatzivasileiadi, A. Topologic: Exploring Spatial Reasoning Through Geometry, Topology, and Semantics; Advances in Science, Technology & Innovation; Springer: Cham, Switzerland, 2021.
- Jabi, W. Topologicpy, Version 0.8.15; 2024. Available online: https://zenodo.org/records/11555173 (accessed on 10 April 2025).
- NetworkX—NetworkX Documentation. Available online: https://networkx.org/ (accessed on 13 March 2025).
- MacQueen, J.B. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; Le Cam, L.M., Neyman, J., Eds.; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
- Crucitti, P.; Latora, V.; Porta, S. Centrality Measures in Spatial Networks of Urban Streets. Phys. Rev. E 2006, 73, 036125.
- Perez, C.; Germon, R. Graph Creation and Analysis for Linking Actors: Application to Social Data. In Automating Open Source Intelligence: Algorithms for OSINT; Layton, R., Watters, P.A., Eds.; Elsevier: Amsterdam, The Netherlands, 2016; pp. 103–129.
- Golbeck, J. Nodes, Edges, and Network Measures. In Analyzing the Social Web; Elsevier: Amsterdam, The Netherlands, 2013; pp. 9–23.
Property Name | Description | Graph Type |
---|---|---|
cn_betweennessCentrality | Betweenness centrality value for the node in the graph. Betweenness centrality quantifies a node’s significance within a network from its position on the shortest paths between other pairs of nodes: a node is considered central if it frequently lies on these shortest paths [34]. | Floor, block, apartment |
cn_betwennessCentrality_Normalized | Normalised betweenness centrality of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block, apartment |
cn_closenessCentrality | Closeness centrality value for the node in the graph. Closeness centrality measures how near a node is to all other nodes in the graph, based on its shortest-path distances to them [35]. | Floor, block, apartment |
cn_closenessCentrality_Normalized | Normalised closeness centrality of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block, apartment |
cn_degreeCentrality | Degree centrality value for the node in the graph. It is the number of connections a space has relative to the number of other spaces in the graph [21]. | Floor, block, apartment |
cn_degreeCentrality_Normalized | Normalised degree centrality of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block, apartment |
cn_degree | Degree (number of direct connections) of the node in the graph. | Floor, block, apartment |
cn_degree_Normalized | Normalised degree of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block, apartment |
cn_eigenVectorCentrality | Eigenvector centrality for the node in the graph. Eigenvector centrality evaluates a node’s importance by accounting for both its direct connections and the significance of its neighbouring nodes [36]. | Floor, block, apartment |
cn_eigenVectorCentrality_Normalized | Normalised eigenvector centrality of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block, apartment |
cn_uniformComponentDecompositionCentrality | Uniform component decomposition centrality of the node in the graph | Floor, block |
cn_uniformComponentDecompositionCentrality_Normalized | Normalised uniform component decomposition centrality of the node in the graph (0 for the least central node; 1 for the most central node in the graph) | Floor, block |
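The "_Normalized" columns above map the least central node to 0 and the most central to 1, which corresponds to a per-graph min-max rescaling. The dependency-free sketch below computes a subset of the properties (degree, degree centrality and closeness) for a connected graph stored as an adjacency dict. The raw definitions follow the usual NetworkX conventions for connected graphs (`degree_centrality` divides by n − 1, and `closeness_centrality` uses (n − 1) divided by the sum of distances). The dataset values may differ if the authors' tooling uses other conventions. Returning 0 when all values are equal is an assumption. Betweenness and eigenvector centrality are omitted; NetworkX provides them as `betweenness_centrality` and `eigenvector_centrality`.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path hop counts from `source` (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def min_max(values):
    """Rescale to 0 (least central) .. 1 (most central) within one graph."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # assumption for constant values
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def node_properties(adj):
    n = len(adj)
    degree = {v: len(adj[v]) for v in adj}
    degree_c = {v: d / (n - 1) for v, d in degree.items()}
    closeness = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        closeness[v] = (len(dist) - 1) / total if total else 0.0
    return {
        "cn_degree": degree,
        "cn_degree_Normalized": min_max(degree),
        "cn_degreeCentrality": degree_c,
        "cn_degreeCentrality_Normalized": min_max(degree_c),
        "cn_closenessCentrality": closeness,
        "cn_closenessCentrality_Normalized": min_max(closeness),
    }
```

On a three-room chain a–b–c, the middle room b has degree centrality 1.0 and closeness 1.0, while both end rooms normalise to 0.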
Property Name | Description | Graph Type |
---|---|---|
pr_Compactness | Measure of the space’s compactness, defined as the ratio between the gross area of the space and the gross area of its smallest bounding rectangle in the xy plane. | Floor, block, apartment |
pr_GrossArea | Gross area of the property (in square metres). | Floor, block, apartment |
pr_Rectangularity | Measure of how closely the space’s shape approximates a rectangle, defined as the ratio between the width and the length of the smallest bounding rectangle of the space in the xy plane. | Floor, block, apartment |
pr_SVRatio | Surface-to-volume ratio of the property, defined as the ratio between the volume of the space and the sum of the areas of all faces bounding the space (walls, roofs, and floors). | Floor, block, apartment |
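The planar shape properties above can be sketched from a space's outline. This sketch makes two assumptions. First, the outline is a simple polygon given as ordered (x, y) vertices in metres, with its gross area computed by the shoelace formula. Second, "smallest bounding rectangle" is read as the minimum-area, possibly rotated, enclosing rectangle. That rectangle has one side collinear with an edge of the convex hull, so it is enough to test each hull edge direction. An axis-aligned bounding box would be a simpler alternative reading. pr_SVRatio is omitted because it needs the 3D faces of the space.

```python
import math


def shoelace_area(poly):
    """Area of a simple polygon with ordered vertices."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_bounding_rect(poly):
    """(width, length) of the minimum-area enclosing rectangle, width <= length."""
    hull = convex_hull(poly)
    best = None
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the hull by -theta so that this edge becomes axis-aligned.
        xs = [x * c + y * s for x, y in hull]
        ys = [-x * s + y * c for x, y in hull]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0] * best[1]:
            best = (w, h)
    return tuple(sorted(best))


def shape_metrics(poly):
    area = shoelace_area(poly)
    width, length = min_bounding_rect(poly)
    return {
        "pr_GrossArea": area,
        "pr_Compactness": area / (width * length),
        "pr_Rectangularity": width / length,
    }
```

For an L-shaped room of 18 m² whose smallest enclosing rectangle measures 5 m × 6 m, the sketch gives a compactness of 0.6 and a rectangularity of 5/6.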
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Massafra, A.; Al-Harasis, D.H.; Stefanini, L.; Jabi, W. Semi-Automated Dataset Generation for Residential Buildings Using Graph-Based Topological Modelling. Buildings 2025, 15, 1283. https://doi.org/10.3390/buildings15081283