Unsupervised Building Instance Segmentation of Airborne LiDAR Point Clouds for Parallel Reconstruction Analysis

Abstract: Efficient building instance segmentation is necessary for many applications such as parallel reconstruction, management and analysis. However, most existing instance segmentation methods still suffer from low completeness, low correctness and low quality for building instance segmentation, which is especially obvious for complex building scenes. This paper proposes a novel unsupervised building instance segmentation (UBIS) method for airborne Light Detection and Ranging (LiDAR) point clouds for parallel reconstruction analysis, which combines a clustering algorithm and a novel model consistency evaluation method. The proposed method first divides building point clouds into building instances by the improved kd tree 2D shared nearest neighbor clustering algorithm (Ikd-2DSNN). Then, a geometric feature of each building instance is obtained using the model consistency evaluation method and is used to determine whether the building instance is a single-building instance or a multi-building instance. Finally, the improved kd tree 3D shared nearest neighbor clustering algorithm (Ikd-3DSNN) is used to divide multi-building instances again to improve the accuracy of building instance segmentation. Our experimental results demonstrate that the proposed UBIS method achieves good performance for various buildings in different scenes such as high-rise buildings, podium buildings and a residential area with detached houses. A comparative analysis confirms that the proposed UBIS method performs better than state-of-the-art methods.


Introduction
Light Detection and Ranging (LiDAR) is an active remote sensing technique that is little affected by weather and lighting conditions and can quickly collect three-dimensional surface information of the ground or ground objects in real time. The data points obtained by a LiDAR system are dense, accurate and carry three-dimensional location information, so they are regarded as an important data source for, for example, point cloud classification, building instance segmentation and the reconstruction of three-dimensional building models. Automatic building instance segmentation from airborne LiDAR point clouds creates new opportunities for cadastral management, urban planning and world population monitoring. Traditionally, building instances are delineated by manual labeling with photogrammetry software. However, this process requires expensive equipment and qualified people. For this reason, building instance segmentation using automatic techniques has great potential and importance. Airborne laser scanning (ALS) greatly reduces the workload, shortens the time of field measurement, improves the efficiency of work and can quickly acquire high-precision 3D surface information of objects [1][2][3]; it has been widely used in many fields such as point cloud filtering [4][5][6][7], 3D point cloud classification [8][9][10], 3D building modeling [11][12][13][14][15] and individual tree segmentation [16][17][18][19][20]. With the development of science and technology and the increasing range of applications of laser point clouds, many scholars have studied instance segmentation of laser point clouds in recent years.
Instance segmentation is developed on the basis of object detection, localization and semantic segmentation. Due to its wide range of application scenarios and research value, this technology has attracted more and more attention in academic and industrial circles. Object detection or localization is a step in the progression from coarse to fine object inference and interpretation. It provides not only the classes of the objects but also the locations of the classified objects. Semantic segmentation gives fine inference by predicting labels for every point or pixel in the input data; each point or pixel is labeled according to the object class within which it is enclosed. Furthering this evolution, instance segmentation gives different labels to separate instances of objects belonging to the same class [21]. Different instance segmentation scenes often use different algorithms, most of which are usually applicable only to limited scenes. Instance segmentation scenes mainly include the instance segmentation of indoor objects [22][23][24][25], the instance segmentation of cars and people on outdoor roads [26][27][28][29][30], the instance segmentation of buildings in urban scenes and others [31]. Instance segmentation of indoor scenes is mainly used in the field of robotics [32], while instance segmentation of cars and people on outdoor roads is mainly used in the field of autonomous driving [33][34][35][36]. In recent years, there has been more and more research on the instance segmentation of indoor objects or of cars and people on outdoor roads; the data sources include two-dimensional images, depth images and point clouds, and the processing methods include traditional methods and deep learning methods. Buildings in urban scenes are mainly relevant to the fields of urban planning and management, yet comparatively little research has addressed their instance segmentation.
In order to meet the needs of subject queries, reconstructed three-dimensional building models must be physically distinguishable. Until now, there has been no accepted standard definition of a building instance. From an architectural point of view, buildings as small as a single house of a few tens of square meters to urban complexes of millions of square meters are all considered building instances. In this paper, a building instance is defined as an above-ground structure that has no common set (common connection part) with other buildings and can be visually distinguished as a separate object. The current building instance segmentation methods can be divided into two categories: traditional methods [37][38][39][40][41][42][43] and deep learning-based methods [44].
Traditional methods are common for building instance segmentation. Ramiya et al. [37] applied a filtering algorithm to divide a point cloud into ground points and non-ground points; the Euclidean distance clustering algorithm was then used to divide the non-ground points into point cloud clusters. The local surface normal was computed for each point in a cluster, the direction cosines of the normals were found and a histogram was generated. Histogram parameters such as the mean and standard deviation can be adopted to separate building clusters from non-building clusters. Wang et al. [38] first detected regions of building blocks from LiDAR point clouds, and the Euclidean clustering method was introduced to segment the building point clouds into individual clusters. Matei et al. [39] used voxels of a 3D grid to divide a point cloud into ground points and non-ground points, and building instances were then segmented with a smaller voxel size than that used for ground classification. In other work, LiDAR point clouds are divided into ground points and non-ground points, and the moving window algorithm is then used to divide the non-ground points into point cloud clusters. Each point cloud cluster represents an individual building or tree, and the tree clusters are then removed [40][41][42]. Yan et al. [43] proposed a dense matching point cloud building instance segmentation method. Specifically, on the basis of point cloud filtering and horizontal point cloud extraction and clustering, the roof point clouds are projected into a two-dimensional grid, and the non-roof point clouds are deleted. According to the topological relationship between the grid cells, the coverage of each building instance point cloud is obtained, and building instance segmentation is realized.
Deep learning is a new research tendency in the field of machine learning. In recent years, it has been extensively applied to tasks such as point cloud semantic segmentation, point cloud classification and point cloud filtering. There are also studies related to building instance segmentation. Iglovikov et al. [44] presented TernausNetV2, a simple fully convolutional network that allows extracting objects from high-resolution satellite imagery at the instance level.
Although the existing instance segmentation methods generally provide satisfactory building instance segmentation results, they have limitations. Many existing instance segmentation methods (Euclidean segmentation methods, voxel segmentation methods and the moving window algorithm) perform well in some simple scenes, but for complex scenes such as podium buildings, they do not. In this paper, we propose a novel unsupervised building instance segmentation (UBIS) method for airborne LiDAR point clouds for parallel reconstruction analysis which combines a clustering algorithm and a model consistency evaluation method. First, an improved kd tree shared nearest neighbor clustering algorithm is used to segment building point clouds. Next, an improved minimum bounding rectangle (MBR) algorithm is used to determine whether a cluster is an isolated point cloud cluster or a cluster to be evaluated according to the thresholds on the length and width of the cluster's MBR. Finally, a model consistency evaluation method is used to detect whether the cluster to be evaluated is a single-building instance or a multi-building instance; multi-building instances are segmented again to improve the completeness, correctness and quality of building instance segmentation.
The remainder of this paper is organized as follows. Section 2 presents building instance segmentation. Section 3 gives the experiments and analysis. Finally, Section 4 presents our conclusions and future research directions.

Building Instance Segmentation
In this paper, we propose a novel unsupervised building instance segmentation (UBIS) method with a successive scheme that includes the definition of building types (Section 2.1), building point cloud segmentation (Section 2.2), merging of building façade point clouds (Section 2.3), recognition and merging of roof detail instances (Section 2.4) and merging of isolated point cloud clusters (Section 2.5). An illustration of the UBIS method is given in Figure 1. The LiDAR datasets used in this study are classified as building regions, which are marked by manual labeling and a published benchmark dataset. Ground filtering and building segmentation are conducted as preprocessing, which is not studied in this article. First, Figure 1a,b represent aerial images corresponding to the inputted building point clouds and the inputted building point clouds themselves, respectively. Second, based on the improved kd tree 2D shared nearest neighbor clustering (Ikd-2DSNN) algorithm, the building point clouds are divided into single-building instances (a point cloud cluster that contains only one building) and multi-building instances (a point cloud cluster that contains multiple buildings), as shown in Figure 1c,d. Third, octree-based regional growing is used to remove façade point clouds from the multi-building instances, as shown in Figure 1e. Fourth, based on the improved kd tree 3D shared nearest neighbor clustering (Ikd-3DSNN) algorithm, multi-building roof planes from the multi-building instances are divided into single-building roof instances, as shown in Figure 1f. Fifth, because a building façade lies directly below its roof, the building façade point clouds are merged into the building roof instances, as shown in Figure 1g. Sixth, according to the characteristics of roof details above the roof and the roof instance MBR size, roof detail clusters are recognized and merged into building instances, as shown in Figure 1h.
Finally, the isolated point cloud cluster is merged into the nearest neighbor building instance, as shown in Figure 1i.

Definition of Building Types
A building is an artificial environment created by man, which is of great significance to human life. In order to manage, analyze and perform subject queries on buildings, buildings are classified by their complexity.
High-rise building: Generally refers to a high-rise residential building, as shown in Figure 2.
Podium building: Generally refers to the ancillary building body at the bottom of the main body of a multi-storey, high-rise or super-high-rise building, which occupies an area larger than the standard floor area of the main body of the building, as shown in Figure 3.

Building Point Clouds Segmentation
The completeness, correctness and quality of building instance segmentation are low with general instance segmentation methods for complex building scenes such as podium buildings and residential areas with detached houses. In this paper, we propose a building instance segmentation method which combines the improved kd tree shared nearest neighbor (Ikd-SNN) clustering algorithm (Section 2.2.1) and a model consistency evaluation method (Section 2.2.2). The key steps of building instance segmentation are as follows.
Step 1: The point spacing dc_1 and the ratio kt_1 are the neighbor search radius and the ratio of the shared neighbor number to the neighbor number, respectively. dc_1 and kt_1 are used as parameters of Ikd-2DSNN.
Step 2: The building point clouds are divided into building instances by Ikd-2DSNN.
Step 3: According to the thresholds of minimum bounding rectangle length and width, the improved minimum bounding rectangle algorithm (Section 2.2.3) is used to determine whether the building point cloud cluster is an isolated point cloud cluster or a building instance to be evaluated.
Step 4: The model consistency evaluation method is used to calculate the geometric feature of the building instance.
Step 5: The geometric feature of building instance is used to determine whether the building instance to be evaluated is a single building instance or a multi-building instance.
Step 6: Octree-based regional growing [45] is used to extract multi-building planes from multi-building instances. Then, according to the angle θ between multi-building planes' normal vector and horizontal direction, the multi-building facade point clouds are deleted. The multi-building non-facade point clouds are regarded as multi-building roof point clouds.
Step 7: Multi-building roof point clouds are divided into building roof instances by the Ikd-3DSNN algorithm. The neighbor search radius dc_2 and the ratio kt_2 are used as parameters of the Ikd-3DSNN algorithm.
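The façade removal in Step 6 can be sketched as follows, assuming a plane normal estimated by SVD and a tilt threshold of 75°; both the SVD fitting and the threshold value are assumptions of this sketch, not parameters reported in this paper:

```python
import numpy as np

def is_facade_plane(plane_points, angle_threshold_deg=75.0):
    """Classify a planar segment as a facade when its normal is close to
    horizontal, i.e. the plane itself is near-vertical.

    plane_points: (N, 3) array of points belonging to one extracted plane.
    angle_threshold_deg is an assumed threshold, not a value from the paper.
    """
    pts = np.asarray(plane_points, dtype=float)
    centered = pts - pts.mean(axis=0)
    # The plane normal is the right-singular vector with the smallest
    # singular value of the centered point matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # Angle between the plane normal and the vertical axis: ~0 degrees for
    # a roof (horizontal) plane, ~90 degrees for a facade (vertical) plane.
    angle_deg = np.degrees(np.arccos(np.clip(abs(normal[2]), 0.0, 1.0)))
    return bool(angle_deg > angle_threshold_deg)
```

Planes flagged as façades are removed, and the remaining non-façade planes are treated as multi-building roof point clouds.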

The Improved kd Tree Shared Nearest Neighbor Clustering Algorithm
Since the density-based spatial clustering of applications with noise (DBSCAN) algorithm [46] is robust to noise, it is widely used in point cloud segmentation. DBSCAN is used to discover clusters of arbitrary shape in the presence of noise. Czerniawski et al. [47] used the DBSCAN clustering algorithm to detect major planes in cluttered point clouds. Plane sets were separated into individual planes by DBSCAN in xyz space [48]. Due to reflections from building materials, point cloud density is relatively uneven. While DBSCAN can find clusters of arbitrary shapes, it cannot handle data containing clusters of differing densities, since its density-based definition of core points cannot identify the core points of varying-density clusters. A shared nearest neighbor approach to similarity was proposed by Jarvis and Patrick [49] and later by Levent [50] and Faustino [51]; it finds clusters that other approaches overlook, i.e., clusters of low or medium density representing relatively uniform regions "surrounded" by non-uniform or higher-density areas. However, the shared nearest neighbor approach may incorrectly connect distant points. In this paper, we propose an improved kd tree shared nearest neighbor clustering algorithm (Ikd-SNN).
The kd-SNN clustering algorithm uses a density-based approach to find core points. First, the algorithm calculates the similarity matrix; the similarity matrix is then sparsified by keeping only the k most similar neighbors, and the shared nearest neighbor graph is constructed from the sparsified similarity matrix. Next, the shared nearest neighbor density of each point is computed, the core points are found, and clusters are formed from the core points. Points that are not connected to a core point in the shared nearest neighbor graph are regarded as noise points. Finally, all noise points are discarded and all non-noise, non-core points are assigned to clusters.
The kd-SNN clustering algorithm includes two parameters: the neighbor number and the shared neighbor number. Since the point cloud density is relatively uneven, kd-SNN clustering may connect distant points from different building instances. Points within a neighborhood radius of a point are more likely to belong to the same instance than points within a fixed neighbor number. Meanwhile, the point spacing is not affected by repeatedly scanned point clouds, and distant points are not connected when it is used as the search radius.
The point cloud density of the same building does not change much, and the point cloud density between different buildings will vary greatly. Shared neighbor number cannot distinguish the density changes within the neighborhood of each point, so the same instance and different instances cannot be distinguished well. Because the ratio of shared neighbor number to neighbor number can better distinguish the changes in point cloud density, the neighbor number is replaced by the ratio of shared neighbor number to neighbor number. The key steps of the improved kd tree shared nearest neighbor (Ikd-SNN) clustering algorithm are as follows.
(1) An unvisited point in the point cloud is selected and marked as visited. The point is regarded as the center point ct_1, and dc_1 is regarded as the radius of the sphere that forms its neighborhood. The number of points within the neighborhood radius dc_1 of the center point ct_1 is num_1.
(2) Each point in the neighborhood is taken as a center point ct_2, and the number of points num_2 within the radius dc_1 of the center point ct_2 is calculated.
(3) The number of points shared by the two neighborhoods is num_3; num_1 and num_2 are compared, and the smaller value is assigned to num. The ratio kt = num_3/num is the ratio of the number of shared neighbors to the number of neighbors.
(4) The point ct_2 is added to the cluster of ct_1 if the ratio kt is greater than the threshold, and the point is then marked as visited; otherwise, it is not marked.
(5) Steps (1), (2), (3) and (4) are repeated until all points are visited.
Building point clouds are divided into building instances by the improved kd tree 2D shared nearest neighbor clustering algorithm (Ikd-2DSNN). Then, the model consistency evaluation method is used to distinguish whether a building instance is a single-building instance or a multi-building one. Next, octree-based regional growing is used to extract multi-building roof planes from the multi-building instances. Then, the multi-building roof planes are divided into single-building roof instances by the improved kd tree 3D shared nearest neighbor clustering algorithm (Ikd-3DSNN). In the process of building instance segmentation, the Ikd-SNN algorithm is called Ikd-2DSNN when calculating the two-dimensional Euclidean distance in xoy space; the two-dimensional search radius dc is equal to dc_1 and the ratio kt is equal to kt_1. In the same way, the Ikd-3DSNN algorithm calculates the three-dimensional Euclidean distance in xyz space; the three-dimensional search radius dc is equal to dc_2 and the ratio kt is equal to kt_2.
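A minimal sketch of the clustering steps above, using `scipy.spatial.cKDTree` for the radius queries; the frontier-based cluster-growing order is an assumption of this sketch, and kt is the ratio of shared neighbors to the smaller neighborhood size as in step (3):

```python
import numpy as np
from scipy.spatial import cKDTree

def ikd_snn_cluster(points, dc, kt_threshold):
    """Sketch of the improved kd-tree shared-nearest-neighbor clustering:
    2D when `points` has two columns (Ikd-2DSNN), 3D with three (Ikd-3DSNN).
    dc = neighbor search radius, kt_threshold = shared-neighbor ratio threshold.
    Returns an integer cluster label per point.
    """
    pts = np.asarray(points, dtype=float)
    tree = cKDTree(pts)
    neighbors = tree.query_ball_point(pts, r=dc)  # radius neighborhood of each point
    labels = np.full(len(pts), -1, dtype=int)
    cluster_id = 0
    for seed in range(len(pts)):
        if labels[seed] != -1:          # already visited
            continue
        labels[seed] = cluster_id
        frontier = [seed]
        while frontier:
            ct1 = frontier.pop()
            n1 = set(neighbors[ct1])
            for ct2 in neighbors[ct1]:
                if labels[ct2] != -1:
                    continue
                n2 = set(neighbors[ct2])
                shared = len(n1 & n2)            # num_3
                denom = min(len(n1), len(n2))    # num = min(num_1, num_2)
                kt = shared / denom if denom else 0.0
                if kt > kt_threshold:            # same instance: grow cluster
                    labels[ct2] = cluster_id
                    frontier.append(ct2)
        cluster_id += 1
    return labels
```

Two well-separated clusters (inter-cluster distance larger than dc) receive different labels, while points inside one cluster share a label.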

Model Consistency Evaluation Method
Based on the visual psychology of Gestalt theory, the factors that can attract human visual attention are summarized into some basic structures and structure clustering methods, including the color constancy law, vicinity law, similarity law, Rubin's closure law, constant width law, symmetry law and convexity law [52]. The heights of the points of the same building roof are basically similar, and their density is uniform. Meanwhile, each building instance point cloud forms one cluster, and the distance between buildings is much larger than the distance between points. The ratio of the actual building volume to the building volume based on the projected area stretch is close to 1. Based on the characteristics of Gestalt theory and buildings, a building is regarded as a column structure. According to the approximate consistency between a building and a column structure model, this paper proposes a model consistency evaluation method based on the ratio of the actual building volume to the building volume based on the projected area stretch, which is used to determine whether a building instance is a single-building instance or a multi-building one. A multi-building instance is further segmented to improve the completeness, correctness and quality of the building instance segmentation. The key steps of the model consistency evaluation method are as follows.
Step 1: The maximum and minimum coordinates of the point cloud are calculated. The lateral dimension of the grid is dc, and dc is equal to the point spacing dc_1. A grid is established in xoy space, and then the grid is filled with point clouds, as shown in Figure 5a.
Step 2: Each grid cell containing points is marked as true; otherwise, it is marked as false, as shown by the gray area in Figure 5b. The number of marked grid cells, num2d, is counted, and the building projected area area2dt is calculated from num2d and dc.
Step 3: The height of the three-dimensional grid is equal to dc, and then, the number of the grid rows (nx), columns (ny) and layers (nz) is calculated by the maximum, minimum and dc. A three-dimensional grid is established according to step 1, and it is filled with point clouds, as shown in Figure 6a.
Step 4: The grid with point clouds is marked as true starting from the top layer of row zero and column zero for the three-dimensional grid, as shown in Figure 6b, and it is marked as visited. Then, all grids below the marked grid are marked as true, as shown in Figure 6c, and they are marked as visited.
Step 5: Step 4 is repeated until all grids are marked as visited, as shown in Figure 6d.
Step 6: The number of three-dimensional grid cells marked as true, num3d, is counted. According to the length, width and height of the grid cells and the marked grid number num3d, the actual volume vol3d of the building is calculated.
Step 7: According to the characteristics of the building column structure, the building projected area area2dt in the two-dimensional grid and the layer number nz of the three-dimensional grid, the building volume vol3dt based on the projected area stretch is calculated as Equation (1), as shown in Figure 5c.

vol3dt = area2dt × nz × dc    (1)

where vol3dt represents the building volume based on the projected area stretch; area2dt represents the building projected area; nz represents the layer number of the three-dimensional grid; dc is the grid cell size.
Step 8: Since the roof height within a single building is basically similar, whether the point cloud instance is a single-building instance or a multi-building instance is determined by the ratio of the actual building volume to the building volume based on the projected area stretch. The ratio is calculated as Equation (2).

rt = vol3d / vol3dt    (2)

where rt is the ratio of the actual building volume to the building volume based on the projected area stretch. The closer the ratio is to 1, the more likely the cluster is to be a single-building cluster.
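Steps 1-8 can be sketched with an occupancy voxel grid as follows; the column-wise top-down filling implements steps 4 and 5, and the grid-layout details are assumptions of this sketch:

```python
import numpy as np

def model_consistency_ratio(points, dc):
    """Sketch of the model consistency evaluation: rt = vol3d / vol3dt,
    the actual occupied building volume over the volume obtained by
    stretching the projected footprint to the full height (Equations (1)
    and (2)). dc is the grid cell size (point spacing dc_1).
    """
    pts = np.asarray(points, dtype=float)
    mins = pts.min(axis=0)
    idx = np.floor((pts - mins) / dc).astype(int)      # voxel index per point
    nx, ny, nz = idx.max(axis=0) + 1
    occ = np.zeros((nx, ny, nz), dtype=bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    # 2D footprint (step 2): any column that contains at least one point.
    footprint = occ.any(axis=2)
    num2d = footprint.sum()
    # Steps 4-5: fill each column downwards from its highest occupied voxel.
    filled = np.zeros_like(occ)
    for ix, iy in zip(*np.nonzero(footprint)):
        top = np.max(np.nonzero(occ[ix, iy])[0])
        filled[ix, iy, : top + 1] = True
    num3d = filled.sum()
    vol3d = num3d * dc**3                # actual building volume (step 6)
    vol3dt = num2d * dc**2 * nz * dc     # footprint stretched to full height (step 7)
    return vol3d / vol3dt
```

A solid column-like building yields rt close to 1, while a cluster containing roofs of clearly different heights yields a noticeably smaller ratio.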

The Improved Minimum Bounding Rectangle (MBR) Algorithm
The MBR algorithm obtains the minimum-area bounding rectangle of a point set of arbitrary shape and is often used to detect building boundaries [53][54][55]. The most classic MBR algorithm is the rotation method: the point cloud cluster is rotated step by step and the bounding rectangle area is calculated at each step, and the MBR algorithm is improved based on the monotonic change of the bounding rectangle area. The improved MBR algorithm is used to determine whether a cluster is an isolated point cloud cluster or a cluster to be evaluated according to the thresholds on the length and width of the minimum bounding rectangle. The pseudocode of the improved MBR algorithm is detailed in Algorithm 1; its core loop rotates the point cloud cluster from θ to 90 degrees in steps of ∆θ and calculates the area S of the current rotated building point cloud cluster.
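A plain rotation-method sketch of the MBR computation; the exhaustive sweep below omits the paper's monotonicity-based early stopping, and ∆θ = 1° is an assumed step size:

```python
import numpy as np

def min_bounding_rect(points, dtheta_deg=1.0):
    """Rotation-method sketch of the minimum bounding rectangle: rotate the
    2D point cloud through 90 degrees in steps of dtheta_deg, track the
    axis-aligned bounding-box area and keep the smallest one.
    Returns (length, width) with length >= width.
    """
    pts = np.asarray(points, dtype=float)[:, :2]
    best = None
    for theta in np.arange(0.0, 90.0, dtheta_deg):
        rad = np.radians(theta)
        rot = np.array([[np.cos(rad), -np.sin(rad)],
                        [np.sin(rad),  np.cos(rad)]])
        rp = pts @ rot.T
        extent = rp.max(axis=0) - rp.min(axis=0)
        area = extent[0] * extent[1]
        if best is None or area < best[0]:
            best = (area, max(extent), min(extent))
    return best[1], best[2]
```

The resulting length and width are then compared against the MBR thresholds to decide whether a cluster is an isolated point cloud cluster or a cluster to be evaluated.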

Merging of Building Façade Point Clouds
Based on the characteristic that a building façade lies below the building roof, the building roof instance is projected onto the xoy plane, and then a projection region growing algorithm is utilized to obtain single-building instances with the building façade point clouds and building attachment point clouds. Figure 7 illustrates the projection region growing algorithm (the red point clouds represent the building roof and the blue point clouds represent the building façade point clouds). The key steps of the projection region growing are as follows:
Step 1: The maximum and minimum values of the building roof instance point clouds are calculated. The lateral dimension of the grid is dc, and dc is equal to the point spacing dc_1. Two-dimensional grid 1 is established and then filled with the building roof instance point clouds.
Step 2: The same grid 2 as in step 1 is created, and two-dimensional grid 2 is filled with the multi-building instances point clouds.
Step 3: In grid 2, the point clouds falling into the grid cells occupied by point clouds in grid 1 are the point clouds obtained by the projection region growing.
Step 4: Steps 2 and 3 are repeated for each single-building roof instance; the single-building instances with building façade point clouds and building attachment point clouds are obtained.
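The projection region growing above can be sketched as a 2D grid lookup; the function name is hypothetical, and dc is the grid cell size:

```python
import numpy as np

def grow_facade_under_roof(roof_points, all_points, dc):
    """Sketch of the projection region growing: project the roof instance
    and the full multi-building instance into the same 2D grid; every point
    of the full instance that falls into a grid cell occupied by the roof
    instance (typically facade points directly below the roof) is assigned
    to that building instance.
    """
    roof = np.asarray(roof_points, dtype=float)
    pts = np.asarray(all_points, dtype=float)
    # Shared grid origin so both point sets index the same cells.
    mins = np.minimum(roof[:, :2].min(axis=0), pts[:, :2].min(axis=0))
    roof_cells = set(map(tuple, np.floor((roof[:, :2] - mins) / dc).astype(int)))
    cells = np.floor((pts[:, :2] - mins) / dc).astype(int)
    mask = np.array([tuple(c) in roof_cells for c in cells])
    return pts[mask]
```

Points of the multi-building instance outside the roof footprint (e.g., belonging to a neighboring building) are left for the other roof instances.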

Recognition and Merging of Roof Detail Instance
The building attachment point clouds are obtained as described in Section 2.3 and are then divided into instances by Ikd-2DSNN. Each instance is judged by the MBR maximum MBR_max (i.e., thresholds on MBR length and MBR width), the maximum difference value dv_max and the minimum difference value dv_min, where dv_max and dv_min are the maximum and minimum thresholds on the difference between the maximum height of a roof detail instance and the maximum height of the single-building instance with building façade point clouds. An instance is a roof detail instance if the length and width of its MBR are less than MBR_max and its height difference value lies within the range of dv_min to dv_max. An instance that is not a roof detail instance is a small building instance if the length and width of its MBR are larger than the minimum building MBR thresholds MBR_min; otherwise, it is an isolated point cloud cluster.

Merging of Isolated Point Cloud Clusters
After the merging of building façade point clouds and roof detail instance point clouds, there are still some isolated point cloud clusters (an isolated point cloud cluster generally refers to building outlier point clouds or sparse roof point clouds caused by material reflection). Each point of an isolated point cloud cluster is treated as the search center to find its nearest building point. The label of the nearest building point is assigned to the point, and the discrete points are thus merged into the nearest building instance.
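This nearest-neighbor label transfer can be sketched with a kd tree; the helper name is hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def merge_isolated_points(building_points, building_labels, isolated_points):
    """Sketch of the isolated-cluster merging step: each isolated point
    takes the instance label of its nearest already-labeled building point.

    building_points: (N, 3) labeled building points.
    building_labels: (N,) instance label per building point.
    isolated_points: (M, 3) points of the isolated clusters.
    Returns the (M,) labels assigned to the isolated points.
    """
    tree = cKDTree(np.asarray(building_points, dtype=float))
    # Nearest labeled building point for every isolated point.
    _, nearest = tree.query(np.asarray(isolated_points, dtype=float), k=1)
    return np.asarray(building_labels)[nearest]
```

After this step, every point of the scene carries a building instance label.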

Experiments and Analysis
The implementation details of the experiments including the descriptions of the benchmark datasets, the evaluation criteria and the parameter settings of the UBIS method are described in this section. The UBIS method was implemented in C++. The experiments were conducted on a computer with 8 GB RAM and an Intel Core i7-9750H @ 2.59 GHz CPU.

Datasets Description
The performance of the UBIS method was evaluated with five airborne LiDAR datasets. Dataset 1 and dataset 2 were captured in Ningbo, China, at a mean flying height of 900 m; the average LiDAR point spacing was roughly 0.35 m and the median point density of the building point clouds was 8 points/m². Dataset 1 was a purely high-rise building scene, and dataset 2 was a podium building scene. Dataset 3 was the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark dataset captured over Vaihingen in Germany [56,57]. The LiDAR data used in this paper have been classified as building regions [58]. The airborne LiDAR point clouds were acquired by a Leica ALS50 system at an average flying height of 500 m with a median point density of 6.7 points/m²; the average LiDAR point spacing was roughly 0.4~0.5 m. The dataset was a purely residential area with small, detached houses [59]. Dataset 4 and dataset 5 were Dayton Annotated LiDAR Earth Scan (DALES) datasets, which were collected using a Riegl Q1560 dual-channel system flown in a Piper PA31 Panther Navajo aircraft [60]. The entire aerial LiDAR collection spanned 330 km² over the city of Surrey in British Columbia, Canada [60]. The altitude was 1300 m, with a 400% minimum overlap. The median point cloud density was 6 points/m², and the average LiDAR point spacing was roughly 0.5~0.6 m. Dataset 4 and dataset 5 were purely residential areas with small, detached houses. The LiDAR datasets used in this study are classified as "building" regions, which are marked by manual labeling and published benchmark datasets. Ground filtering and building segmentation were conducted as preprocessing, which is not studied in this article.

Evaluation Criteria
For building instance segmentation, point-based evaluation and object-based evaluation based on point-in-polygon tests are not applicable. Object-based evaluation of the mutual overlap is calculated by the intersection over union (IoU) to judge whether an automatically segmented building instance is correctly segmented, under-segmented or over-segmented [61]. When the IoU is larger than the threshold of minimum overlap, the automatically segmented building instance is a correctly segmented building instance. The automatically segmented building instance is an over-segmented building instance if the IoU is less than the threshold of minimum overlap and it contains only one building instance point cloud in the ground truth. The automatically segmented building instance is an under-segmented building instance if the IoU is less than the threshold of minimum overlap and it contains multiple building instance point clouds in the ground truth. The IoU is calculated as Equation (3).

IoU = |point_c ∩ point_g| / |point_c ∪ point_g|    (3)

where point_c represents the automatically segmented building instance point clouds and point_g represents the building instance point clouds in the ground truth.
Completeness is also referred to as the detection rate [62] or the producer's accuracy [63] and is the percentage of entities in the reference that were detected. Correctness, also referred to as the user's accuracy [63], indicates how well the detected entities match the reference and is closely linked to the false alarm rate. A good segmentation should have both high completeness and high correctness, and quality is a comprehensive evaluation parameter combining completeness and correctness. Based on the manually marked test data as the ground truth, the UBIS method utilized completeness, correctness and quality to verify the building instance segmentation effect [64]. The completeness, correctness and quality are calculated as Equations (4), (5) and (6):

Completeness = TP / (TP + FN)    (4)
Correctness = TP / (TP + FP)    (5)
Quality = TP / (TP + FP + FN)    (6)

where TP, FP and FN represent the number of correctly segmented building instances, the number of under-segmented building instances and the number of over-segmented building instances, respectively.
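The IoU and the three metrics can be computed directly; the set-based IoU on point indices and the function names below are illustrative:

```python
def instance_iou(pred_ids, gt_ids):
    """IoU between two instances given as collections of point indices,
    following Equation (3)."""
    pred, gt = set(pred_ids), set(gt_ids)
    return len(pred & gt) / len(pred | gt)

def segmentation_metrics(tp, fp, fn):
    """Completeness, correctness and quality as in Equations (4)-(6)."""
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    quality = tp / (tp + fp + fn)
    return completeness, correctness, quality
```

For example, an instance sharing two of four total points with its ground-truth counterpart has an IoU of 0.5, below both typical minimum-overlap thresholds of 0.5 (exclusive) and 0.75.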

Parameter Settings
Five datasets were processed by the proposed UBIS method. Table 1 shows the key parameter settings of the UBIS method, which were set by trial and error. These parameter settings were used for all the experiments in this paper. rt, θ, MBR_max, MBR_min, dv_max and dv_min were fixed for the five datasets, whereas five parameters (i.e., dc_1, kt_1, dc_2, kt_2 and dc) were tuned according to the building point cloud density.

Figures 8-12 show the outcomes of the UBIS method for various types of buildings in different scenes. Five building point cloud datasets were selected from Ningbo, China, Germany (the ISPRS benchmark dataset) and Canada (the DALES dataset). The evaluation is performed at the per-building-instance level. Figures 8a, 9a and 10a show the aerial images corresponding to the input point clouds. Figures 11a and 12a, for which no corresponding aerial images are available in the public datasets, show the point clouds of the entire scene colored by height. Figures 8b, 9b, 10b, 11b and 12b show the five selected building point clouds, displayed in different colors according to their heights. Figures 8c, 9c, 10c, 11c and 12c show the results of the UBIS method, where each single-building instance is dotted in one color. It is worth noting that the proposed UBIS method combines building point cloud segmentation, merging of building façade point clouds, recognition and merging of roof detail instances and merging of isolated point cloud clusters, which significantly improves building instance segmentation in complex scenes. Figures 8d, 9d, 10d, 11d and 12d are the ground truth of each point cloud, manually labeled using the CloudCompare point cloud software [65]. Figures 8e, 9e, 10e, 11e and 12e show the main differences between the UBIS method results and the ground truth (with a minimum overlap of 0.75 between the two), where the yellow, red and blue regions represent correctly segmented, under-segmented and over-segmented building instances, respectively.
More specifically, buildings in complex scenes, for example, where multiple building roofs are adjacent, roof heights are similar and the roof point cloud density is uniform, are difficult to divide into different roof instances (e.g., the red regions in Figures 8e, 9e, 10e, 11e and 12e). When the point cloud density is sparse, some buildings are partially missing or the building roof structure is extremely complex, buildings become over-segmented (e.g., the blue regions in Figures 8e, 9e, 10e, 11e and 12e).

To evaluate the performance of the UBIS method for building instance segmentation, the segmentation result was compared to the manually marked ground truth in terms of completeness, correctness and quality. IoU_0.5 and IoU_0.75 are two typical IoU levels [55] (i.e., minimum overlaps of 50% and 75%, respectively, with the corresponding building instance in the ground truth). The units are %. The corresponding metrics and runtimes on the selected point clouds are listed in Table 2. The UBIS method achieved good performance for building instance segmentation of airborne LiDAR point clouds with high-rise buildings, podium buildings and a residential area with detached houses. In the high-rise building scene, each point cloud cluster contains only one building instance, and its instance segmentation accuracy is the highest. In the podium building scene, due to occlusion between buildings or extremely complex roof structures, the results include over-segmented building instances, so the accuracy is lower than in the high-rise building scene. In the residential area with detached houses, buildings are relatively close together, and some adjacent buildings have the same height and uniform density, so the results contain some under-segmented building instances.
The completeness, correctness and quality of the UBIS method are above 92% for the five datasets.

Performance Comparison
To further compare the performance of the UBIS method with other state-of-the-art approaches, we conducted building instance segmentation with several selected methods on the point clouds of the five datasets. The three-dimensional Euclidean-based segmentation method (ES3D) [37,38] and the moving window algorithm (MV) [40][41][42] are the most popular building instance segmentation methods; locally convex connected patches (LCCP) [32] is an efficient learning- and model-free approach for segmenting 3D point clouds into object parts. To further illustrate that the proposed UBIS method fully combines the advantages of the two-dimensional and three-dimensional clustering algorithms, a two-dimensional Euclidean-based segmentation method is also introduced and compared with the method proposed in this paper. The two-dimensional Euclidean-based segmentation method (ES2D) is similar to ES3D but calculates the two-dimensional Euclidean distance in the xoy plane. For the existing building instance segmentation methods, the original authors either only describe the algorithms or additionally show processing results, but they release neither public or own datasets nor the algorithm parameters. The key parameters of the ES3D, MV, ES2D and LCCP methods were therefore set by experiment. Figures 13-17 show the outcomes of the five methods on the five selected point cloud datasets, together with the scale and orientation of each figure. Table 3 lists their corresponding metrics; IoU_0.75 is used to evaluate the completeness, correctness and quality of building instance segmentation (i.e., a minimum overlap of 75% with the corresponding building instance in the ground truth). The UBIS method outperformed the benchmark methods on the selected point clouds.
More specifically, several observations can be made from these comparison results: (1) The building instance segmentation completeness and quality of the ES2D and MV methods are lower than those of the UBIS method, and those of the ES3D and LCCP methods are far lower. The correctness of the ES2D, MV, ES3D and LCCP methods is lower than that of the UBIS method. (2) The completeness and quality of the ES2D and MV methods are far higher than those of the ES3D and LCCP methods, while their correctness is close to that of the ES3D and LCCP methods. (3) The proposed UBIS method combines building point cloud segmentation, merging of building façade point clouds, recognition and merging of roof detail instances and merging of isolated point cloud clusters, which significantly improves the completeness, correctness and quality of building instance segmentation in different scenes.

Figure 18a shows the building point clouds. Figure 18b shows that the UBIS method can segment not only the high-rise buildings in the blue circle but also the podium buildings in the green circle. Figure 18c,d show that the MV method and the ES2D method can only segment the high-rise buildings in the blue circle, while the podium buildings in the red circle are under-segmented. Figures 18e and 18f show the results of the ES3D method and the LCCP method, respectively; the blue and green circles contain under-segmented and over-segmented podium buildings and over-segmented buildings with façade point clouds. Figure 18g shows that the UBIS method can segment the interconnected buildings in the blue circle.
Figure 18h,i show that the MV method and the ES2D method cause under-segmentation of the interconnected buildings in the blue circle. Figure 18j,k show the results of the ES3D method and the LCCP method in the blue circle, which contain under-segmented and over-segmented podium buildings and buildings with façade point clouds.
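The contrast between ES2D and ES3D in the comparison above comes down to the distance used for clustering: ES2D measures Euclidean distance in the xoy plane only, so vertically stacked points (e.g., a façade below a roof edge) join one cluster, while ES3D separates them. A brute-force sketch under that assumption (a kd-tree library such as scipy or PCL would be used in practice; names here are illustrative):

```python
import numpy as np

def euclidean_cluster(points, dist_thresh, use_2d):
    """Greedy region growing: points closer than dist_thresh join
    the same cluster. use_2d=True reproduces the ES2D behavior of
    ignoring the z coordinate; use_2d=False corresponds to ES3D."""
    pts = points[:, :2] if use_2d else points
    n = len(pts)
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            d = np.linalg.norm(pts - pts[i], axis=1)
            for j in np.where((d < dist_thresh) & (labels == -1))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels
```

With two points at the same xy position but 5 m apart in height, the 2D variant puts them in one cluster and the 3D variant in two, which mirrors why ES3D over-segments buildings with façade point clouds while ES2D keeps them intact.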

Conclusions
In this paper, we propose a novel unsupervised building instance segmentation (UBIS) method for parallel reconstruction analysis, which combines a clustering algorithm and a model consistency evaluation method. The proposed method makes full use of the advantages of the two-dimensional and three-dimensional clustering algorithms. Building point clouds are first divided into building instances by the two-dimensional clustering algorithm (Ikd-2DSNN), which avoids over-segmentation of building façade and roof detail point clouds. The model consistency evaluation method is then used to distinguish whether a building instance is a single-building instance or a multi-building one. Finally, the three-dimensional clustering algorithm (Ikd-3DSNN) is used to segment multi-building instances again to improve the accuracy of building instance segmentation. The experimental results in Section 3.2 demonstrate that the proposed UBIS method obtained good performance on various scenes of building point clouds and outperformed four state-of-the-art approaches, with completeness, correctness and quality remaining above 92% on the five datasets. The UBIS method still has limitations in more complex cities with various building typologies, for example, where the point cloud density is sparse, some buildings are partially missing or the building roofs are extremely complex. In future work, we will further extend the method to improve the robustness of building instance segmentation.