# Framework for Indoor Elements Classification via Inductive Learning on Floor Plan Graphs


## Abstract


## 1. Introduction

## 2. Related Works

#### 2.1. Rule-Based Heuristic Methods and Machine Learning Algorithms in Floor Plan Analysis Research

#### 2.2. Graph Neural Networks (GNNs) and Floor Plan Analysis Using GNNs

## 3. Materials and Methods

1. The framework must detect and classify space elements, such as rooms, together with basic elements (walls, doors, etc.) and symbols.
2. The framework must start with raster data and output vector data that preserve shape without abstraction.
3. The framework must perform inductive learning on a set of graphs of various types and sizes treated as separate graph units, rather than transductive learning on a single large graph.

#### 3.1. Image Pre-Processing and Vectorization

#### 3.2. Region Adjacency Graph (Rag) Conversion and Feature Extraction

`INTERSECTS` operation on another polygon $q\in P, q\ne p$. Against the rest of the polygon elements in $P$, $p$ would need to execute the `INTERSECTS` operation $\left|P\right|-1$ times, so the total number of iterations over $P$ grows quadratically with the number of nodes. To reduce the number of iterations and the complexity, instead of two nested loops, we used an `STRtree` [21], a spatial indexing structure based on an `R-tree`. The tree returns a resulting polygon set $Q$ when $p$ queries the `INTERSECTS` relation against the other spatial objects. If a polygon element $q$ is in $Q$ and $q$'s area is larger than the minimum area parameter $m$, the edge between ${v}_{p}$ and ${v}_{q}$ is added to the edge set $E$. By using the `STRtree`, the time complexity of the RAG conversion is reduced from $O\left({n}^{2}\right)$ to $O\left(n{\log}_{m}n\right)$, where $n$ is the number of polygons (nodes) and $m$ is the number of entries per node of the tree.

**Algorithm 1:** RAG conversion
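As an illustration of this step, the edge-building loop might be sketched with Shapely's `STRtree` as follows; this assumes the Shapely 2.x query API, and the toy rectangles and minimum-area value are invented for the example:

```python
# Sketch of the RAG edge-building step with an STR-packed R-tree
# (Shapely 2.x: STRtree.query accepts a spatial predicate).
from shapely.geometry import box
from shapely.strtree import STRtree

def build_rag_edges(polygons, min_area=0.0):
    """Return the edge set E of the region adjacency graph.

    Instead of testing each polygon against the |P|-1 others, each polygon
    queries the spatial index for intersection candidates, giving roughly
    O(n log n) behaviour overall instead of O(n^2).
    """
    tree = STRtree(polygons)  # bulk-loaded STR-packed R-tree
    edges = set()
    for i, p in enumerate(polygons):
        # Q: indices of polygons whose geometry intersects p (includes p itself)
        for j in tree.query(p, predicate="intersects"):
            j = int(j)
            if j > i and polygons[j].area > min_area:  # each pair once, skip self
                edges.add((i, j))
    return edges

# Two touching rooms and one detached region: only the touching pair is adjacent.
rooms = [box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)]
edges = build_rag_edges(rooms, min_area=0.5)
```

With the toy input above, only the two touching rectangles produce an edge; the detached region stays isolated in the graph.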

#### 3.3. Graph Neural Network Models

#### 3.3.1. A GNN Variant for Inductive Learning on Graphs

#### 3.3.2. A GNN Model to Utilize Distance Weight Feature

## 4. Results

#### 4.1. Datasets

#### 4.2. GNN Models

1. GCN [16]: Graph Convolutional Networks aggregate the neighbor nodes of the target node using a symmetrically normalized graph Laplacian ${\tilde{\mathbf{D}}}^{-\frac{1}{2}}\tilde{\mathbf{A}}{\tilde{\mathbf{D}}}^{-\frac{1}{2}}$, built from the self-loop adjacency matrix $\tilde{\mathbf{A}}=\mathbf{A}+\mathbf{I}$ and the diagonal degree matrix ${\tilde{\mathbf{D}}}_{ii}={\sum}_{j}{\tilde{\mathbf{A}}}_{ij}$. The embedding vectors of the target nodes are generated by summing the information of neighboring nodes and projecting it onto a weight matrix. The update process of GCN is
$$\mathbf{h}_{v}^{k}=\sigma\left(\mathbf{W}^{k-1}\cdot\sum_{u\in\mathcal{N}\left(v\right)}\frac{1}{c_{vu}}\mathbf{h}_{u}^{k-1}\right).$$
2. GIN [18]: The Graph Isomorphism Network was proposed to maximize the discriminative and representational power of each node in a graph. It is nearly as powerful as the Weisfeiler–Lehman graph isomorphism test [29]. We used MAX, MEAN, and SUM operations as the AGGREGATE function in our experiments. The update process of GIN is
$$\mathbf{h}_{v}^{k}=\sigma\left(\mathrm{MLP}^{k}\left(\left(1+\epsilon^{k}\right)\cdot\mathbf{h}_{v}^{k-1}+\mathrm{AGGREGATE}\left(\left\{\mathbf{h}_{u}^{k-1},u\in\mathcal{N}\left(v\right)\right\}\right)\right)\right).$$
3. GraphSAGE [17]: We used the same model as introduced in Section 3.3.1. MEAN was excluded from the experiment because its propagation rule does not differ much from that of GCN. When using the POOL aggregator, a weight matrix was applied prior to the MAX operation to increase the expressive power of the message function. The POOL aggregator is defined as follows:
$$\mathrm{AGGREGATE}_{k}^{\mathrm{pool}}=\max\left(\left\{\sigma\left(\mathbf{W}_{\mathrm{pool}}^{k}\mathbf{h}_{u}^{k}+\mathbf{b}\right),\forall u\in\mathcal{N}\left(v\right)\right\}\right).$$
4. DWGNN: The model developed by the authors and introduced in Section 3.3.2 was implemented. MAX, MEAN, SUM, and LSTM were used as the AGGREGATE function in our experiments.
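As a concrete reference for the update rules above, one GCN propagation step (the first model listed) can be sketched in a few lines of NumPy; the toy two-node graph, identity features, and the choice of ReLU as $\sigma$ are illustrative assumptions, not the authors' exact setup:

```python
# Minimal GCN layer: H' = sigma(D~^{-1/2} (A+I) D~^{-1/2} H W).
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: add self-loops, symmetrically normalize,
    sum neighbor features, project, apply ReLU."""
    A_tilde = A + np.eye(A.shape[0])     # self-loop adjacency matrix
    d = A_tilde.sum(axis=1)              # degree of each node
    D_inv_sqrt = np.diag(d ** -0.5)      # D~^{-1/2}
    return np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)

# Two connected nodes, identity features and weights:
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H_out = gcn_layer(A, np.eye(2), np.eye(2))
```

Here each node's output row mixes its own and its neighbor's features with equal normalized weight, which is exactly the $1/c_{vu}$ coefficient in the GCN update.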

#### 4.3. Implementation Details

#### 4.4. Experiment on the Cubicasa Dataset

#### 4.5. Experiment on Large and Complicated Floor Plans: Uos and Uos-Aug

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Dosch, P.; Tombre, K.; Ah-Soon, C.; Masini, G. A complete system for the analysis of architectural drawings. Int. J. Doc. Anal. Recog. 2000, 3, 102–116.
- Macé, S.; Locteau, H.; Valveny, E.; Tabbone, S. A system to detect rooms in architectural floor plan images. In Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, Boston, MA, USA, 9–11 June 2010; pp. 167–174.
- Lu, T.; Yang, H.; Yang, R.; Cai, S. Automatic analysis and integration of architectural drawings. Int. J. Doc. Anal. Recog. 2007, 9, 31–47.
- Ahmed, S.; Liwicki, M.; Weber, M.; Dengel, A. Automatic room detection and room labeling from architectural floor plans. In Proceedings of the 2012 10th IAPR International Workshop on Document Analysis Systems, Gold Coast, Queensland, Australia, 27–29 March 2012; pp. 167–174.
- Barducci, A.; Marinai, S. Object recognition in floor plans by graphs of white connected components. In Proceedings of the 21st International Conference on Pattern Recognition, Tsukuba, Japan, 11–15 November 2012; pp. 298–301.
- De, P. Vectorization of Architectural Floor Plans. Twelfth Int. Conf. Contemp. Comput. 2019, 10, 1–5.
- De las Heras, L.P.; Ahmed, S.; Liwicki, M.; Valveny, E.; Sánchez, G. Statistical segmentation and structural recognition for floor plan interpretation. Int. J. Doc. Anal. Recog. 2014, 17, 221–237.
- Liu, C.; Wu, J.; Kohli, P.; Furukawa, Y. Raster-to-vector: Revisiting floorplan transformation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 142–149.
- Dodge, S.; Xu, J.; Stenger, B. Parsing floor plan images. In Proceedings of the 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017; pp. 358–361.
- Zeng, Z.; Li, X.; Yu, Y.K.; Fu, C.W. Deep Floor Plan Recognition Using a Multi-Task Network with Room-Boundary-Guided Attention. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9096–9104.
- Zlatanova, S.; Li, K.J.; Lemmen, C.; Oosterom, P. Indoor Abstract Spaces: Linking IndoorGML and LADM. In Proceedings of the 5th International FIG 3D Cadastre Workshop, Athens, Greece, 18–20 October 2016; pp. 317–328.
- Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46 (Suppl. 1), 234–240.
- Dominguez, B.; García, Á.L.; Feito, F.R. Semiautomatic detection of floor topology from CAD architectural drawings. Comput. Aided Des. 2012, 44, 367–378.
- Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 2, pp. 729–734.
- Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80.
- Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; pp. 1024–1034.
- Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
- Hu, R.; Huang, Z.; Tang, Y.; van Kaick, O.; Zhang, H.; Huang, H. Graph2Plan: Learning Floorplan Generation from Layout Graphs. arXiv 2020, arXiv:2004.13204.
- Renton, G.; Héroux, P.; Gaüzère, B.; Adam, S. Graph Neural Network for Symbol Detection on Document Images. In Proceedings of the 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Sydney, Australia, 20–25 September 2019; Volume 1, pp. 62–66.
- Pfoser, D.; Jensen, C.S.; Theodoridis, Y. Novel approaches to the indexing of moving object trajectories. In Proceedings of the 26th VLDB Conference, Cairo, Egypt, 10–14 September 2000; pp. 395–406.
- Zernike, F. Diffraction theory of the cut procedure and its improved form, the phase contrast method. Physica 1934, 1, 56.
- Zhang, Z.; Cui, P.; Zhu, W. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng. 2020.
- Hamilton, W.L. Graph representation learning. Synth. Lect. Artif. Intell. Mach. Learn. 2020, 14, 1–159.
- Barthélemy, M. Spatial networks. Phys. Rep. 2011, 499, 1–101.
- Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. arXiv 2017, arXiv:1704.01212.
- Gong, L.; Cheng, Q. Exploiting edge features for graph neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 9211–9219.
- Kalervo, A.; Ylioinas, J.; Häikiö, M.; Karhu, A.; Kannala, J. CubiCasa5K: A dataset and an improved multi-task model for floorplan image analysis. In Scandinavian Conference on Image Analysis; Springer: Cham, Switzerland, 2019; pp. 28–40.
- Weisfeiler, B.; Lehman, A.A. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Tech. Inform. 1968, 2, 12–16.
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Wang, M.; Yu, L.; Zheng, D.; Gan, Q.; Gai, Y.; Ye, Z.; Huang, Z. Deep graph library: Towards efficient and scalable deep learning on graphs. arXiv 2019, arXiv:1909.01315.

**Figure 1.** Overview of the proposed framework. The input floor plan image is pre-processed to erase text and binarize it. The processed image is then vectorized based on its closed regions and converted to an RAG. The floor plan graph is fed to a GNN module that classifies each polygon according to its own and its neighbors' feature vectors.

**Figure 2.** Overview of the vectorization process. The white areas in (**a**) are vectorized and buffered according to the thickness of the pixel lines surrounding them (**c**). The black areas are converted into polygons (**d**), which are generated by the difference operation between (**b**,**c**). Finally, the complete polygon set (**e**) is generated by merging the two polygon sets. (**f**) describes the detailed process of polygon buffering (the frame color of each step shows the respective small square in the process in detail).
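The buffer-and-difference steps described in the caption can be sketched with Shapely boolean operations; the frame size, rectangles, and wall thickness below are invented purely for illustration:

```python
# Illustrative sketch of the Figure 2 pipeline: white regions are buffered
# by an assumed wall thickness, and the black (wall) polygons fall out as
# the difference between the frame and the buffered white set.
from shapely.geometry import box
from shapely.ops import unary_union

frame = box(0, 0, 10, 10)                      # (b) outer frame polygon
white = [box(1, 1, 4, 9), box(6, 1, 9, 9)]     # (a) vectorized white regions
thickness = 0.5                                # assumed pixel-line thickness

buffered = [w.buffer(thickness, join_style=2) for w in white]  # (c) mitred buffer
black = frame.difference(unary_union(buffered))                # (d) wall polygons
complete = buffered + [black]                                  # (e) merged set
```

Mitred joins (`join_style=2`) keep rectangular regions rectangular after buffering, which matches the axis-aligned wall geometry typical of floor plans.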

**Figure 3.** Node classification with a transductive learning GNN method (**a**) and with an inductive learning GNN method (**b**). In the transductive learning method (**a**), the model is trained with access to all nodes and edges in order to predict the class of the nodes in the test set (denoted by question marks). In the inductive learning method (**b**), on the other hand, the set of graphs is split into training and test sets, and the test set is predicted with a GNN model trained only on the training graphs.

**Figure 4.** Visual illustration of the update process of node v (a door segment). The softmin function assigns an attention value to each neighbor of v according to its distance to v (${e}_{{u}_{i}v}$). Each node's embedding vector at layer $k-1$ is element-wise multiplied by its attention value. The scaled vectors pass through a weight matrix ${W}_{0}$ and are aggregated into a message. This message is added to v's embedding vector at layer $k-1$ and multiplied by the weight matrix ${W}_{1}$; the result is the embedding vector of node v at layer k. In the figure, ${\mathbf{a}}_{\mathcal{N}\left(v\right)}$ is the converted attention vector and $\mathbf{AGG}$ is the AGGREGATE function.
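The update process in Figure 4 can be sketched as follows. Note the assumptions: the softmin is taken to be $\exp(-d_i)/\sum_j \exp(-d_j)$ over edge distances, SUM is used as AGGREGATE, and the names `W0`/`W1` follow the caption; the caption does not fix these details, so this is a sketch, not the authors' implementation:

```python
# Sketch of a distance-weighted node update (DWGNN-style, per Figure 4).
import numpy as np

def softmin(d):
    """Closer neighbors (smaller distances) receive larger attention weights."""
    w = np.exp(-np.asarray(d, dtype=float))
    return w / w.sum()

def dwgnn_update(h_v, h_neigh, dists, W0, W1):
    a = softmin(dists)                               # attention per neighbor
    scaled = [a_i * h_u for a_i, h_u in zip(a, h_neigh)]
    msg = np.sum([s @ W0 for s in scaled], axis=0)   # project, then aggregate (SUM)
    return (h_v + msg) @ W1                          # combine with v's own embedding

h_new = dwgnn_update(
    h_v=np.array([1.0, 0.0]),
    h_neigh=[np.array([2.0, 0.0]), np.array([0.0, 2.0])],
    dists=[0.0, 0.0],                                # equal distances -> equal attention
    W0=np.eye(2), W1=np.eye(2),
)
```

With equal distances the two neighbors split the attention evenly; unequal distances would shift the message toward the closer neighbor.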

**Figure 5.** Examples of an input image (**a**) and ground truth (**b**), and a visual comparison of indoor element classification results by GNN models for transductive learning (**c**,**d**) and inductive learning (**e**,**f**). The element class "outer space" is erased for visibility.

**Figure 7.** Example of data augmentation. The original plan (**a**) is transformed by Equation (9), returning an augmented plan (**b**).

**Table 1.**Class-wise accuracy comparison by different GNN models on the CubiCasa dataset (micro-averaged F1 score). AGG stands for the AGGREGATE method.

| GNN Model | AGG | Objects | Wall | Window | Door | Stair | Room | Porch | Outer Space | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| GIN | MEAN | 0.9001 | 0.8009 | 0.9176 | 0.8029 | 0.5453 | 0.8092 | 0.6719 | 0.7879 | 0.8577 |
| GCN | · | 0.9113 | 0.8118 | 0.9142 | 0.8154 | 0.5398 | 0.8325 | 0.5400 | 0.7453 | 0.8658 |
| GIN | MAX | 0.9241 | 0.8842 | 0.9454 | 0.8816 | 0.5968 | 0.8833 | 0.7500 | 0.7849 | 0.9025 |
| DWGNN | MEAN | 0.9392 | 0.8485 | 0.9367 | 0.8942 | 0.7653 | 0.9030 | 0.6982 | 0.9038 | 0.9137 |
| DWGNN | MAX | 0.9429 | 0.8571 | 0.9456 | 0.8980 | 0.7854 | 0.9133 | 0.7215 | 0.9048 | 0.9201 |
| DWGNN | SUM | 0.9441 | 0.8648 | 0.9428 | 0.9054 | 0.7612 | 0.9164 | 0.7268 | 0.9119 | 0.9214 |
| GIN | SUM | 0.9445 | 0.8991 | 0.9783 | 0.9063 | 0.6572 | 0.9067 | 0.7352 | 0.8664 | 0.9283 |
| DWGNN | LSTM | 0.9597 | 0.9224 | 0.9710 | 0.9400 | 0.7913 | 0.9313 | 0.7849 | 0.9233 | 0.9471 |
| GraphSAGE | POOL | 0.9586 | 0.9157 | 0.9765 | 0.9410 | 0.7675 | 0.9377 | 0.8449 | 0.9289 | 0.9478 |
| GraphSAGE | LSTM | 0.9708 | 0.9466 | 0.9896 | 0.9557 | 0.8341 | 0.9617 | 0.8832 | 0.9625 | 0.9651 |

| GNN Model | AGG | Objects | Wall | Window | Door | Stair | Lift | Room | Hallway | X-Room | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GCN | · | 0.7286 | 0.6829 | 0.6286 | 0.6314 | 0.7643 | 0.8043 | 0.5600 | 0.4143 | 0.4357 | 0.6843 |
| GIN | MEAN | 0.7014 | 0.7614 | 0.5629 | 0.6700 | 0.7957 | 0.7071 | 0.4886 | 0.3643 | 0.4729 | 0.7100 |
| GIN | MAX | 0.7529 | 0.7857 | 0.7114 | 0.7300 | 0.7186 | 0.4686 | 0.5671 | 0.4900 | 0.4743 | 0.7457 |
| GIN | SUM | 0.8014 | 0.8700 | 0.8400 | 0.7914 | 0.7786 | 0.7414 | 0.6757 | 0.6500 | 0.5086 | 0.8329 |
| GraphSAGE | POOL | 0.8371 | 0.8700 | 0.8357 | 0.7971 | 0.8514 | 0.6714 | 0.7571 | 0.6214 | 0.5614 | 0.8429 |
| DWGNN | MEAN | 0.8626 | 0.8879 | 0.8526 | 0.8256 | 0.8857 | 0.6525 | 0.8382 | 0.7712 | 0.6686 | 0.8658 |
| DWGNN | SUM | 0.8633 | 0.8916 | 0.8684 | 0.8274 | 0.9103 | 0.8454 | 0.8067 | 0.8087 | 0.7142 | 0.8764 |
| DWGNN | MAX | 0.8644 | 0.8943 | 0.8610 | 0.8165 | 0.9191 | 0.7323 | 0.8293 | 0.8155 | 0.7353 | 0.8765 |
| DWGNN | LSTM | 0.8665 | 0.9178 | 0.9230 | 0.8661 | 0.9457 | 0.7875 | 0.8406 | 0.8385 | 0.7765 | 0.9072 |
| GraphSAGE | LSTM | 0.9080 | 0.9308 | 0.9152 | 0.8847 | 0.9318 | 0.9206 | 0.8599 | 0.7951 | 0.8255 | 0.9184 |

| GNN Model | AGG | Objects | Wall | Window | Door | Stair | Lift | Room | Hallway | X-Room | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GCN | · | 0.8812 | 0.7898 | 0.7295 | 0.7565 | 0.8516 | 0.9243 | 0.6508 | 0.6619 | 0.6255 | 0.7822 |
| GIN | MEAN | 0.8860 | 0.8865 | 0.8656 | 0.8208 | 0.9500 | 0.8406 | 0.7603 | 0.5446 | 0.8215 | 0.8752 |
| GIN | MAX | 0.9014 | 0.9373 | 0.9094 | 0.9060 | 0.9706 | 0.9348 | 0.9026 | 0.8734 | 0.8822 | 0.9250 |
| DWGNN | MEAN | 0.9334 | 0.9516 | 0.9472 | 0.9092 | 0.9705 | 0.9405 | 0.9289 | 0.8598 | 0.8862 | 0.9446 |
| DWGNN | SUM | 0.9602 | 0.9608 | 0.9511 | 0.9301 | 0.9728 | 0.9311 | 0.9469 | 0.9088 | 0.8966 | 0.9550 |
| DWGNN | MAX | 0.9612 | 0.9683 | 0.9588 | 0.9398 | 0.9803 | 0.9659 | 0.9531 | 0.9208 | 0.9189 | 0.9628 |
| GIN | SUM | 0.9640 | 0.9731 | 0.9679 | 0.9286 | 0.9766 | 0.9531 | 0.9661 | 0.9358 | 0.9004 | 0.9658 |
| GraphSAGE | POOL | 0.9696 | 0.9727 | 0.9689 | 0.9485 | 0.9740 | 0.9617 | 0.9459 | 0.8946 | 0.9330 | 0.9681 |
| DWGNN | LSTM | 0.9627 | 0.9780 | 0.9733 | 0.9474 | 0.9848 | 0.9507 | 0.9488 | 0.9401 | 0.9478 | 0.9716 |
| GraphSAGE | LSTM | 0.9762 | 0.9827 | 0.9802 | 0.9646 | 0.9796 | 0.9869 | 0.9553 | 0.9124 | 0.9747 | 0.9788 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Song, J.; Yu, K.
Framework for Indoor Elements Classification via Inductive Learning on Floor Plan Graphs. *ISPRS Int. J. Geo-Inf.* **2021**, *10*, 97.
https://doi.org/10.3390/ijgi10020097
