Automatic Method for Building Indoor Boundary Models from Dense Point Clouds Collected by Laser Scanners

In this paper we present a method that automatically yields Boundary Representation Models (B-rep) for indoors after processing dense point clouds collected by laser scanners from key locations through an existing facility. Our objective is particularly focused on providing single models which contain the shape, location and relationship of primitive structural elements of inhabited scenarios such as walls, ceilings and floors. We propose a discretization of the space in order to accurately segment the 3D data and generate complete B-rep models of indoors in which faces, edges and vertices are coherently connected. The approach has been tested in real scenarios with data coming from laser scanners yielding promising results. We have deeply evaluated the results by analyzing how reliably these elements can be detected and how accurately they are modeled.


Introduction
During the last decade laser scanners have gained popularity in architecture, engineering, construction and facility management (AEC/FM). Other measurement methods such as total stations, measuring tapes and based-stereo-camera prototypes are too time-consuming or inaccurate compared to scanners, particularly in large-scale environments. Moreover the high density provided by a single OPEN ACCESS scan (which could be over several million points) makes this technology very suitable for use in the 3D modeling of facilities.
Most of the time, the dense raw data provided from scanners are manipulated by a designer or processed by an engineer in order to create a simplified model of the scenario. This well known reverse engineering process is applied to 3D model creation. Thus, a simplified model provides a high-level representation of the scenario that ranges from a single CAD model, in which a wall is represented as a set of independent planar surfaces, to a Building Information Model (BIM), in which a wall is thought as a volumetric object composed by multiple surfaces with several relevant properties like color, material, cost, etc. Much of the emphasis in previous works has been on creating visually realistic non-parametric models rather than accurate parametric ones. Some examples in this category include methods by El-Hakim et al. [1], which is focused on indoor environments and Früh et al. [2] and Remondino et al. [3], which are focused on outdoor environments. In these cases, a great part of the modeling process is supervised by the modeler so that it cannot be said that models are obtained in an automatic manner.
In this paper we introduce a method that automatically yields Boundary Representation Models (B-rep) for indoors after processing dense point clouds collected by laser scanners from key locations through an existing facility. The 3D model then represents the current state of the building, what is named as the "as-is condition". This model does not necessarily have to be coincident with either the designed ("as-designer condition") or the built model ("as-built condition"). In fact, the facility could have lightly been modified or restored from its initial design. In other occasions, we cannot access to the design drawings or they merely do not exist. Thus, the automatic creation of "as-is models" of inhabited scenarios is a challenging research field which is gaining attention from different applications including architecture, engineering, robotics, etc. [4].
Although there are not many publications dealing with the automatic creation of 3D models, in the last years several interesting works related with some of the stages of this process can be found in the literature. A review for automatic reconstruction of as-built building information models can be found in [4]. The process of creating a single-semantic model can vary depending on the input and expected output. Generally, the automatic modeling process can be divided into three steps: data acquisition, data processing and data modeling, and most of the published papers concentrate on the second stage: data processing.
In this framework, 3D data processing means processing millions of unstructured 3D data in order to obtain higher level data structures. Several approaches that convert 3D data into high-level representations in buildings context can be found over the last years. We can here distinguish between proposals to detect and model single objects or particular parts of large scenarios and those than completely model indoors and outdoors.
As regards the first sort of proposals, one of the earlier works that obtain 3D models from laser scanner in a local space context is the one of Kwon et al. [5]. They introduce a set of algorithms for fitting sparse point clouds to a set of single volumetric primitives (cuboids, cylinders and spheres) which can be extended to groups of primitives belonging to the same object. An automated recognition/retrieval approach for 3D CAD objects in the construction context is presented in [6]. In [7] a semiautomatic method to match 3D existing models to data of industrial building steel structures is proposed. The author develops a variant of the ICP algorithm for data registration in order to recognize CAD models objects in large site laser scans. The method is semiautomatic because a coarse registration needs to be performed manually. The same author presents in [8] a plane based registration system for coarse registration of laser scanners data with 3D models in the context of AEC/FM industry. The approach is based on finding planes from the point cloud and matching them with the ones extracted from the 3D models. Nevertheless, the matching process is performed by hand. Other authors [9] propose the combination of point clouds, acquired by means of laser scanners, and photogrammetry in order to generate 3D models of a building under construction. The work of Rusu et al. [10] correctly recognizes and localizes relevant kitchen objects including cupboards, kitchen appliances, and tables. They interpret the improved point clouds in terms of rectangular planes and 3D geometric shapes. One of the innovations consists of including a novel multi-dimensional tuple representation for point clouds and robust, efficient and accurate techniques for computing these representations, which facilitate the creation of hierarchical object models.
Detailed models of part of the walls and building façades are obtained in [2,[11][12][13]. In [11,12] the data processing goes from detecting windows through low data density regions to discover other data patterns in the façade. In [13] important façade elements such as walls and roofs are distinguished as features. Then, the knowledge about the features' sizes, positions, orientations, and topology is introduced to recognize these features in a segmented laser point cloud. Früh et al. [2] develop a set of data processing algorithms for generating textured façade meshes of cities from a series of scans, corresponding to vertical 2D surfaces, obtained by a laser scanner. Thrun et al. developed a plane extraction method based on the expectation-maximization algorithm [14]. Other researchers have proposed plane sweep approaches to find planar regions [15,16]. Valero et al. [17] focus on the modeling of those linear moldings that typically surround doorways, windows, and divide ceilings from walls and walls from floors.
With respect to the automatic creation of complete indoor models, geometric surfaces or volumetric primitives can be fitted to a 3D point cloud to model walls, doors, ceilings, columns, beams, and other structures of interest. In its most simple format, the modeled primitives are annotated with labels (e.g., "wall"). In a higher modeling degree, spatial and functional relationships between nearby structures and spaces are established. Interesting developments can be seen in [18] and [19]. In these works, Adan et al. identify and model the main structural components of an indoor environment (walls, floors, ceilings, windows, and doorways) despite the presence of significant clutter and occlusion, which occur frequently in natural indoor environments, but they deal with rectangular rooms. Furthermore, in the present work we present an approach to identify walls in rooms with more complicated geometries. Okorn et al. [20] present an automated method for creating accurate 2D floor plan models of building interiors. They project the points onto a 2D ground plane and create a histogram of points density, from which line segments corresponding to walls are extracted using a Hough Transform. The first steps of our proposal seem to be inspired in the same strategy since we also project the points to extract the approximate 2D wall's location. Nevertheless, this is only used to later retrieve the corresponding 3D data and delimitate with precision the points belonging to the walls.
This work aims to make progress in the creation of automatic 3D models from point clouds provided by scanners [21]. The contributions of this work lie in three points: accuracy of the model, coherent representation and evaluation of the method. First, we propose a voxelization (discretization) of the space with which to accurately define the planes fitted to the points belonging to the ceiling, floor and walls of the facility. Second, we generate a complete boundary representation model of the indoor in which faces, edges and vertices are coherently connected. Third, once the method is designed and implemented, we make a performance evaluation by measuring the geometric modeling accuracy, the recognition accuracy and the relationship modeling accuracy. The following sections explain the main stages to achieve a complete B-rep model of interiors and present the results obtained.

Floor and Ceiling Segmentation
The first stage in the 3D model creation process consists of efficiently segmenting the data obtained by the scanner from different positions. Here we assume that preprocessing stages like outliers/noise filtering and data registration are done, so that our input is an unstructured large cloud of points.
One of the problems which arise in the automatic walls detection is the voxelization of the space in the sense of which the best voxels' size is and where the origin of the space is set. In [18,19] this aspect is not tackled. In the present work, we deal with the optimization of the voxels' size, allowing us to adjust more precisely and at the same time, the ceiling, floor and walls of rectangular rooms into the voxel planes. Thus, a voxel plane contains the majority of the data of each wall.
First of all, we face up the identification and segmentation of the floor and the ceiling of the room. The method proposed in this section assumes that, as it is usual in construction, floor and ceiling are parallel structures. From here on the word "wall" will be indistinctly used to refer any flat indoor structure (ceiling, floor or wall). Our approach is based on creating an optimum discretization of the space (from here on called "voxel space") and then accurately defining the planes which contain the maximum number of points which lie in the floor and ceiling.
Formally, the voxel space can be defined in a universal coordinate system (UCS) by means of the voxel size (ε, δ, σ) and the coordinates of centroid's voxel (v x , v y , v z ), v z being voxel's height, according to the construction context. Assuming cubic voxels, our objective is then addressed on determining the minimum voxel size, characterized by the parameter ε, and the first coordinate of the voxel plane v z which contains the maximum number of the points belonging to a wall. Once the voxelization of the space is carried out, most of the points belonging, for example, to the floor are contained in a narrow parallelepiped M whose height is ε (see Figure 1(a)). The uncertainty in ε can be limited by means of the flatness specifications provided in the construction standards. The trade-off between the size of the voxels and the number of points contained in M is regulated by the objective function (1), in which p(ε, v z ) is the percentage of the data contained in M. Therefore, this function evaluates the percentage of the wall's data contained in a voxel plane versus the voxel's size. Figure 1 shows the voxelization of the space according to the definition of the volume M for the floor (M a ) and the ceiling (M b ): The algorithm is shown below. When the algorithm is separately applied for ceiling and floor, two different voxel space be the position and voxel size parameters calculated for the floor and the ceiling of the room. We propose a new function G which integrates both voxelization proposals.

Algorithm 1. Calculation of optimum values
Equation (2) imposes one unique voxel size E and the positions of the planes of voxels V' z,a and V' z,b , attaining the maximum value of G which provide us the best simultaneous planes of voxels for ceiling and floor. The new proposed function to optimize is as follows: p a and p b are the occupation percentages for the parameters ε' a and ε' b . The pseudocode of the algorithm which obtains G is detailed in Algorithm 2.

Algorithm 2.
Obtaining the function G.

Performance Tests
The approach detailed above has been tested in simulated and real data. Figure 1 shows an example of the results obtained under simulation. In Figure 1(b) a front view of a simulated point cloud of a room is depicted. It can be seen two dense point regions, which correspond to the floor and ceiling, and sparse regions which simulate the rest of the sensed points in the room. The maximum values of two Gaussian functions fitted to the data distribution determine the initial values of v z,a and v z,b . Figure 1(c) shows the voxels planes projected over the plane YZ, the 3D data and two slices in red and blue that contain the majority of the sensed points. The red voxels plane contains points of the floor and the blue one contains points of the ceiling. We tested the algorithm over twenty simulated rooms. The occupation percentage averages for ceiling and floor were 96.9% and 92.1% respectively. Figure 1(d) presents the result for a real case. We illustrate the 3D data sensed by a laser scanner from five positions of an inhabited classroom. The data segments corresponding to the ceiling and floor are painted in cyan and red. The total point cloud was composed by 1.5 million points, and the size of the segmented regions was 187,000 points for the ceiling and 95,000 points for the floor.

Rectangular Indoor Plans
In this section we present an approach to segment 3D data corresponding to each one of the walls of a rectangular indoor plan. A rectangular plan is the easiest case to be dealt with. The strategy explained in Section 2.1 can here be extended by considering three pairs of parallel voxels planes, so that the Equation (2) can be easily extended. The objective is to find six parallelepipeds (slices) with centers c i ,

Arbitrary Indoor Plans
Assuming that floor and ceiling lie in parallel planes, the approach proposed in Section 2.1 can always be used to detect the floor and the ceiling in any non-rectangular room of a building. Figure 2(b) illustrates the extraction of the points belonging to the floor and ceiling of an arbitrary plan. However, the identification of the walls is a more complex task. The projection of the 3D data from a specific viewpoint allows us to obtain a normalized binary image in which each pixel can be occupied by one or more 3D points (white pixels in Figure 3(b)) or not. From here on, we will denote I the projected image of the data from a top view. This image will help us to obtain a coarse location and position of the walls which will be later refined. The segmentation process is as follows. After creating the image I, the boundary of the room is extracted and, through a Hough Transform algorithm, the set of edges corresponding to the walls in the projected image are detected in a 2D context. As the reader may suppose, if a wall or the connectivity between two walls is completely occluded by a piece of furniture or a constructive component, the boundary of the room does not fit to the walls but these components. Once the room boundaries are demarcated, the points out of the boundaries are removed automatically. Afterwards, we figure out the intersections between edges and obtain the corners in the image. Figure 3  Assuming vertical walls, the segments and corners in the image signify planes and edges in the 3D context. And the planes will be used to segment the 3D points which lie into each wall. Thus, we calculate the mean square distance of the point cloud to the walls and classify each point into a wall. Figure 4 shows the segmentation of points belonging to the walls for two different rooms.

Creation of B-Rep Models
The segmentation stage provides the set of points belonging to each wall (including floor and ceiling) of the indoor scene. The following step consists of converting this raw information into high level surface representation. Within the 3D representation models universe we have chosen the boundary representation (B-rep) model [22]. In B-rep, a shape is described by a set of surface elements along with the connectivity information which describe the topological relationship between the elements.
The process to achieve a B-rep of an interior space starts calculating the planes which best fit every set of the segments associated to the walls. To do this, we have used the Singular Value Decomposition (SVD) technique [23]. Through SVD, the closeness between the plane and the segmented points is easily calculated. The steps to calculate the plane equation that best fits a generic set of points are shown below.
Each point cloud P = (p 1 ,p 2 ,…,p n ), corresponding to a wall, can be fit to a plane defined by the equation: The best fit plane is that which minimizes the sum of the distances between every point p i and the plane Π. Therefore, we can calculate each fit plane by minimizing the expression: (c)

Results
Our approach was tested on panoramic range data appertaining to inhabited interiors. Note that we are dealing with very complex scenarios, in which furniture and other objects contribute to clutter and occlusions. They contain a wide variety of objects that occlude not only the walls, but also the ceiling and floor. Three to six laser scans were taken per room. A FARO Photon laser scanner provided 38 million points per room. Figures 7 and 8 show the resulting 3D B-rep models for two rooms.
In this section we present the results obtained for two different inhabited indoors. These interiors do not have rectangular but arbitrary plans. The first room corresponds to the Virtual Reality Lab at the Escuela Técnica Superior de Ingenieros Industriales (UCLM). The second one is the living room of a private flat.
After defining the B-rep model we aim to investigate the accuracy of the obtained models. Firstly, we determine the error committed when we represent the walls of the rooms by means of planes. We thus measure the quadratic distance of every point of the walls to the corresponding plane. Distances between the sensed points and the planes are represented in colormaps in Figure 6(a) for different walls of the lab and (b) for two walls belonging to the living room. It can be seen different regions where an important variation of color is produced. These areas can be owing to typical objects hanging on the wall (pictures, posters, and so on), moldings and doorframes. Of course, some of the errors come from the fact that the walls, ceiling and floor are not totally flat. The corresponding error means, for both inhabited interiors, are presented in Table 1, in which d represents the mean value for the distances between each 3D point and its corresponding plane. The mean error of the fitted planes ranged between 0.59 and 4.64 cm for the lab and between 0.19 and 6.5 cm for the room. In any case, for the majority of the walls the mean error is around below 2 centimeters, despite being severely occluded by tables and chairs. On the other hand, the wall which fitted worse was number 2 of the lab. In this case, there was a big projection panel which largely occluded the wall. Most points sensed in this part of the room corresponded to the panel, so that the calculated plane fitted the panel instead of fitting the wall.
We focused most of our analysis on understanding the performance of our modeling results, since this aspect of the algorithm is considerably less studied than planar wall modeling. We considered two aspects of the performance: first, how reliably the walls can be detected, and second, how accurately they are modeled. To answer the first question, we compared the detected walls with walls of the ground truth model. No fails were reported in this aspect and all existing walls were correctly detected. Failed detections mainly might occur in severe occlusion circumstances.
The second question was tackled first generating the geometrical ground truth of the scenes. We constructed by hand the ground truth models of the rooms with the help of a Leica DISTOTM A6 laser tape measure, which provides 1 millimeter accuracy. In order to assess the error committed, the ground truth models were then compared with our 3D models (see Figures 7 and 8). The results are summarized in Table 2.   We first compared the lengths of the vertical and horizontal edges of each face of the ground truth models with ours and the difference between are denoted as d v and d h in Table 2. The value of d v was similar for all pairs of walls; around 0.87 cm for the lab and 0.80 cm for the living room. The smallest value of d h in the living room was for number 3. In this case, the mean value was less than 2 cm in both rooms.
In order to compare the accuracy of the orientation of the faces, we calculated the difference between the respective normal vectors (α in Table 2). The smallest faces yielded a high rate in α, which distorted the average value. Thus, although the mean value of α were 1.59° and 1.85° for the lab and the living room respectively, in the majority of the faces α was less than this value.
Once we have presented the results for these two inhabited interiors, we can establish some difference between them. The living room has more and smaller walls than the laboratory, what might lead the reader to believe that the segmentation process is particularly complicated in this case. However, if we compare the mean values for the different parameters in Tables 1 and 2, the deviations with the ground truth are lower in the living room's case. This result can be due to the fact that a multitude of pieces of furniture occlude the walls of the room and, consequently, the segmentation process may yield more imprecise segments.

Conclusions
Automatic creation of "as-is models" of inhabited scenarios is a challenge research field which is gaining attention from different applications in AEC/FM contexts. In this paper an approach for automatically creating Boundary Representation Models (B-rep) from dense point clouds collected by laser scanners is presented.
The method here proposed is based on segmenting the data by optimizing the discretization of the space. We make a 2D projection of the data in the voxel space and coarsely determine the segments in which the points lie. The voxels' size is adjusted to the walls, taking into account the flatness tolerances in construction, improving the approach of previous works [18,19] in which predefined voxels' sizes were used. Then the boundary representation model is generated by intersecting the planes that contains the 3D segments, calculating the faces, edges and vertices, and establishing the relationship between components. This idea has been tested in arbitrary shape plans providing excellent results.
The results lead us to state that the majority of the wall's points were correctly segmented. We have also evaluated the performance and accuracy of our method comparing the ground truth and the query B-rep models. Overall modeling accuracy for the AEC domain typically needs to be at least within 2.5 cm of the ground truth, so that the precision of our model is clearly above the standard (between 0.8 cm and 1.88 cm for vertical and horizontal edges).
This research is a part of a larger project which aims to obtain automated reverse engineering of buildings including more complex semantic models. Future improvements to the method will be addressed in two lines: extend the method to non-flat walls and generate B-rep models including details (paintings, panels, etc.) and parts of the walls (moldings, doorframes, windows frames, etc.) in inhabited buildings. As other potential applications, these 3D models can be helpful to create, in an automatic manner, virtual scenes in which digitized objects are introduced in order to generate virtual exhibitions as the ones shown in [24][25][26].