A Building Point Cloud Extraction Algorithm in Complex Scenes

Abstract: Buildings are significant components of digital cities, and their precise extraction is essential for three-dimensional city modeling. However, accurately extracting building features in complex scenes is difficult, especially where trees and buildings adhere tightly to each other. This paper proposes a highly accurate, two-stage building point cloud extraction method based solely on the geometric information of points. The building point cloud coarsely extracted in the first stage is iteratively refined in the second stage with the help of mask polygons and the region growing algorithm. To enhance accuracy, this paper combines the Alpha Shape algorithm with the neighborhood expansion method to generate mask polygons, which help fill in the missing boundary points caused by the region growing algorithm. In addition, this paper performs mask extraction on the original points rather than on the non-ground points to solve the problem of the cloth simulation filtering algorithm incorrectly identifying facade points near the ground. The proposed method shows excellent extraction accuracy on the Urban-LiDAR and Vaihingen datasets. Specifically, it outperforms the PointNet network by 20.73% in precision for roof extraction on the Vaihingen dataset and achieves performance comparable with the state-of-the-art HDL-JME-GGO network. Additionally, the proposed method extracts building points with high accuracy even in scenes where buildings are closely adjacent to trees.


Introduction
Building points are widely used in a variety of fields, including urban planning, cultural preservation, and disaster management, owing to their capacity to capture detailed geometric features [1,2]. With the rapid development of cities, the surroundings of buildings have become complicated, making accurate building extraction a difficult task [3-6].
Building point cloud extraction methods can be classified into two categories based on their data sources: single-source methods and multi-source methods. Single-source methods use only LiDAR data to extract building points. Zou et al. [7] proposed an adaptive strips approach for extracting buildings, which used adaptive-weight polynomials to classify each point and extracted the edge points of buildings based on the regional clustering relationship among the points. This method successfully identified buildings using only the three-dimensional coordinates of the LiDAR data, without the need for other auxiliary information. Huang et al. [8] developed a top-down, object-entity-based method to extract building points. Ground points were separated from non-ground points, and the non-ground points were split to identify smooth regions. Top-level processing then distinguished the building regions from the smooth regions using their geometric and penetrating properties. Lastly, down-level processing employed topological, geometric, and penetrating properties to eliminate non-building points surrounding structures, yielding accurate extraction of the building point cloud.

The proposed method is evaluated on the Urban-LiDAR and Vaihingen datasets, demonstrating excellent extraction accuracy. The main contributions of this paper are summarized as follows:

1. This paper combines the Alpha Shape algorithm with the neighborhood expansion method to compensate for the shortcomings of the region growing algorithm in the coarse extraction stage, thereby obtaining more complete building points.

2. To address the issue of misidentifying facade points near the ground, we perform mask extraction on the original points instead of the non-ground points. This approach yields more comprehensive facade points within the mask polygons than those obtained using the cloth simulation filtering algorithm.

3. Even in cases where buildings are closely adjacent to trees, the proposed method can successfully separate and extract building points from tree points, thereby improving accuracy and reliability.

Methods
This section introduces the proposed method for building extraction in complex scenes in detail. Our method is divided into two stages, namely coarse extraction and fine extraction, to achieve accurate extraction of the building point cloud.
In the coarse extraction stage of the building point cloud, our proposed method identifies non-ground points in the point cloud using the cloth simulation filtering (CSF) algorithm and uses a region growing algorithm to obtain the coarse extraction of the building point cloud. At this stage, the region growing algorithm may fail to identify some building boundary points.
In the fine extraction stage of the building point cloud, our proposed method obtains mask polygons based on the coarsely extracted building points by applying the Alpha Shape algorithm and the neighborhood expansion method. The building point cloud is enlarged and replaced by the non-ground points within the mask polygons. Discrete tree points are removed from the building point cloud using the region growing algorithm and the Euclidean clustering algorithm. The building point cloud is then upgraded by merging it with the facade point cloud near the ground. Noise points are removed using the radius filtering algorithm to obtain the final building point cloud. The detailed workflow and visualization flowchart for the building point cloud extraction are shown in Figures 1 and 2.

Coarse Extraction of the Building Point Cloud
Due to the large terrain undulations and uneven density distribution of points, traditional filtering algorithms have difficulty obtaining high-accuracy non-ground points. In order to remove ground points with high accuracy, this paper uses the CSF algorithm to separate non-ground points from ground points.
The basic idea of the CSF algorithm is to invert the original points and use a cloth model composed of spring-connected cloth particles to simulate the filtering process [16]. The positions of the particles on the grid nodes in space determine the shape of the fabric [17]. According to Newton's second law, the relationship between particle position and force can be expressed as follows [18]:

m ∂²X(t)/∂t² = F_e(X, t) + F_i(X, t), (1)

where m is the mass of the particle, X(t) is the position of the particle at time t, F_e(X, t) is the external force on the particle, and F_i(X, t) is the internal force on the particle at position X at time t.
According to Equation (1), we first calculate only the influence of gravity on each particle, which gives the position of each particle [18]:

X(t + ∆t) = 2X(t) − X(t − ∆t) + (G/m)∆t², (2)

where G is the gravity, X(t) is the position of the particle at time t, and ∆t is the time step.
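Equation (2) is a Verlet-style position update. A minimal numpy sketch, assuming unit particle mass and purely illustrative values for G and ∆t:

```python
import numpy as np

def csf_gravity_step(x_curr, x_prev, g=-0.2, dt=1.0):
    """One gravity-only update of cloth-particle heights (Equation (2)):
    X(t+dt) = 2*X(t) - X(t-dt) + (G/m)*dt^2, assuming unit particle mass."""
    return 2.0 * x_curr - x_prev + g * dt ** 2

# Heights of three cloth particles, initially at rest (x_prev == x_curr).
x_prev = np.array([10.0, 10.0, 10.0])
x_curr = np.array([10.0, 10.0, 10.0])
x_next = csf_gravity_step(x_curr, x_prev)
print(x_next)  # each particle falls by |G|*dt^2 = 0.2
```

In the full CSF simulation, this gravity step is followed by the internal-force correction of Equation (3) and by pinning particles that have reached the inverted surface.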
Next, the internal forces between particles are considered to limit their displacement in the void areas of the inverted points. The displacement of each particle is calculated as follows [18]:

d⃗ = (1/2) b Σ_{k=1}^{n} ((p_k − p_0) · n⃗) n⃗, (3)

where d⃗ is the displacement vector of the particle; b is a parameter that determines whether a particle can move (b = 1 indicates it can move; b = 0 indicates it cannot); p_k is the position of an adjacent particle of p_0; and n⃗ = (0, 0, 1)^T. Finally, the relative positions of the particles are adjusted based on the internal forces between them and the fabric stiffness parameters. If the distance between an actual point and the simulated particle is less than the pre-set threshold, the point is considered a ground point; otherwise, it is considered a non-ground point (Figure 3).
After identifying the non-ground points in the point cloud, we use the region growing algorithm to obtain the coarse extraction of the building point cloud from the non-ground points. The algorithm selects the point with the minimum curvature as the initial seed point. Given a neighboring point A of a seed point B, if the angle between the normal vector of A (N_neighbor) and that of B (N_seed) is less than a given threshold θ (Equation (4)) and the curvature σ_neighbor of A is less than a given threshold σ (Equation (5)), point A is taken as a new seed point. The region continues to grow until all points have been processed (Figure 4) [19].
The two growth conditions are

arccos(N_seed · N_neighbor) < θ, (4)

σ_neighbor < σ. (5)

Here, θ and σ are usually small enough to avoid incorrectly identifying non-building points that are approximately planar as building points. In this case, the region growing algorithm may fail to extract some building boundary points due to the large angles between the local normal vectors of adjacent points (Figure 5b).
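The acceptance test of Equations (4) and (5) can be sketched as follows; the threshold values here are illustrative, not the paper's Table 1 settings:

```python
import numpy as np

def accepts_neighbor(n_seed, n_nbr, curv_nbr, theta=np.deg2rad(10.0), sigma=0.05):
    """Region-growing acceptance test (Equations (4) and (5)): a neighbor joins
    the region if the angle between its normal and the seed's normal is below
    theta and its curvature is below sigma."""
    cos_angle = abs(np.dot(n_seed, n_nbr)) / (np.linalg.norm(n_seed) * np.linalg.norm(n_nbr))
    angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return bool(angle < theta and curv_nbr < sigma)

# A coplanar roof neighbor passes; a tilted (e.g., tree) neighbor fails.
flat = accepts_neighbor(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, 1.0]), 0.01)
tilted = accepts_neighbor(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.7, 0.7]), 0.01)
print(flat, tilted)  # True False
```

Taking the absolute value of the dot product treats antiparallel normals as aligned, a common convention since normal orientation from local plane fits is ambiguous.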

Fine Extraction of the Building Point Cloud
Considering that the region growing algorithm may fail to include the boundary points of the buildings during the coarse extraction stage, the building point cloud is enlarged and replaced with the help of mask polygons.
In this paper, mask polygons are used to identify the points located within them. To obtain the mask polygons, we first project the coarsely extracted building point cloud onto the XOY plane. Then, we use the Alpha Shape algorithm [20] to extract edge points from the projected points and finally extend the edge points through the neighborhood expansion method based on corresponding multipliers.
Mask polygons are extracted in the following steps (Figure 6):

(1) All possible pairs of projected points are processed in the same way. For any pair of points P1(x1, y1) and P2(x2, y2) from the projection of the point cloud S onto the XOY plane, the center P3(x3, y3) of a circle of radius α passing through P1 and P2 is calculated based on the distance intersection method (Figure 7) [21]:

x3 = (x1 + x2)/2 + H(y2 − y1),
y3 = (y1 + y2)/2 − H(x2 − x1),

where H = √(α²/|P1P2|² − 1/4).

(2) The distance d between each point in S and P3 is calculated. If d is less than α, the point is considered to be inside the circle; otherwise, it is deemed to be outside the circle. If, for a pair P1 and P2, no other points lie inside the circle, then P1 and P2 are defined as edge points and P1P2 is defined as a boundary line. Edge points are collected in this way until all point pairs in S have been processed.

(3) The centroid Cen_point of all edge points, the distance Dis_point from each edge point to Cen_point, and the direction vector Dir_point from Cen_point to each edge point are calculated. Multi denotes the corresponding multiplier. The expanded edge point Exp_point is

Exp_point = Cen_point + Multi · Dis_point · Dir_point.

(4) The edge points are sorted based on the polar angles between adjacent points and connected to form a closed polygon for extracting the points within the polygon.
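Steps (1) and (2) can be sketched as a brute-force Alpha Shape edge detector; the function name and the small test cloud are illustrative:

```python
import numpy as np
from itertools import combinations

def alpha_shape_edges(points, alpha):
    """Brute-force Alpha Shape edge test from steps (1)-(2): a point pair is a
    boundary edge if some circle of radius alpha through both points contains
    no other point of the cloud."""
    pts = np.asarray(points, dtype=float)
    edges = []
    for i, j in combinations(range(len(pts)), 2):
        p1, p2 = pts[i], pts[j]
        d = np.linalg.norm(p2 - p1)
        if d == 0 or d > 2 * alpha:      # no radius-alpha circle through both points
            continue
        mid = (p1 + p2) / 2.0
        h = np.sqrt(alpha ** 2 - (d / 2.0) ** 2)
        perp = np.array([-(p2 - p1)[1], (p2 - p1)[0]]) / d  # unit normal to P1P2
        for center in (mid + h * perp, mid - h * perp):     # the two candidate P3
            dists = np.linalg.norm(pts - center, axis=1)
            inside = dists < alpha - 1e-9
            inside[[i, j]] = False       # the pair itself lies on the circle
            if not inside.any():         # empty circle -> boundary edge
                edges.append((i, j))
                break
    return edges

# A unit square with one interior point: only the four sides are boundary edges.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.5)]
edges = alpha_shape_edges(square, alpha=0.7)
print(sorted(edges))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
```

This O(n³) formulation follows the paper's description directly; practical implementations usually derive the alpha shape from a Delaunay triangulation instead.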
The steps for connecting edge points are as follows: First, the center point of all edge points is calculated. Then, the edge points are sorted by their polar angles relative to the center point in counterclockwise, ascending order. Finally, all edge points are connected in counterclockwise order to create a closed polygon (Figure 8).
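Steps (3) and (4) above — expanding the edge points away from their centroid and ordering them counterclockwise by polar angle — can be sketched as follows, with an illustrative value for the multiplier Multi:

```python
import math

def expand_and_order(edge_points, multi=1.05):
    """Push each edge point outward from the centroid by the multiplier
    (Exp = Cen + Multi * Dis * Dir), then order the points counterclockwise
    by polar angle about the centroid to form a closed polygon."""
    n = len(edge_points)
    cx = sum(x for x, _ in edge_points) / n
    cy = sum(y for _, y in edge_points) / n
    # Multi * Dis * Dir is equivalent to scaling the centroid-to-point offset.
    expanded = [(cx + multi * (x - cx), cy + multi * (y - cy)) for x, y in edge_points]
    expanded.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return expanded

# Four edge points around the origin, slightly enlarged and ordered CCW.
ring = expand_and_order([(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)])
print(ring)
```

Since Dir is the unit direction from the centroid and Dis the centroid distance, Multi · Dis · Dir reduces to scaling the offset vector, which is what the list comprehension does.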
After the mask polygons are obtained from the coarsely extracted building point cloud, the building point cloud is enlarged and replaced by all non-ground points within the mask polygons. Because some tree points may be added to the building point cloud in this step, we use the region growing algorithm and the Euclidean clustering algorithm [22] to filter out discrete tree points from the building point cloud.
The specific operation process of the Euclidean clustering algorithm is as follows: (1) The K nearest neighbors of any point P in space are found using the KD-Tree nearest neighbor search algorithm. (2) The Euclidean distance between each of the K nearest neighbors and P is calculated. (3) Neighbors whose distance is smaller than the set threshold are clustered into a set Q. (4) The above process is repeated until the number of elements in Q no longer increases.
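The clustering loop above can be sketched as follows; a brute-force neighbor search stands in for the KD-Tree, and growth starts from a single seed point for simplicity:

```python
import numpy as np

def euclidean_cluster(points, tolerance, k=10):
    """Single-cluster growth from steps (1)-(4): starting from point 0,
    repeatedly add any of the K nearest neighbors closer than `tolerance`
    until the set Q stops growing."""
    pts = np.asarray(points, dtype=float)
    q = {0}
    grew = True
    while grew:
        grew = False
        for idx in list(q):
            d = np.linalg.norm(pts - pts[idx], axis=1)
            neighbors = np.argsort(d)[1:k + 1]           # K nearest neighbors of idx
            close = [j for j in neighbors if d[j] < tolerance]
            for j in close:
                if j not in q:
                    q.add(j)
                    grew = True
    return sorted(q)

# Two groups about 10 units apart: only the left group joins point 0's cluster.
pts = [(0, 0), (0.5, 0), (1.0, 0), (10, 0), (10.5, 0)]
print(euclidean_cluster(pts, tolerance=1.0))  # [0, 1, 2]
```

A real implementation would use a KD-Tree (e.g., `scipy.spatial.cKDTree`) for the neighbor queries and would repeat the growth from every unvisited point to produce all clusters.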
At this stage, the threshold values for the normal vector and curvature in the region growing algorithm are relatively large so as to include the boundary points of the buildings.
Subsequently, the building point cloud is upgraded by merging it with the facade point cloud near the ground, which is obtained by performing mask extraction on the original points instead of the non-ground points and setting appropriate Z-axis values to restrict the height to a certain distance from the ground (Figure 9). Given that this facade point cloud may overlap with the existing building point cloud, duplicate points are removed from the merged building point cloud. Finally, we use the radius filtering algorithm to remove discrete noise points from the building point cloud.
The main idea of the radius filtering algorithm is to assume that each point in the original points has at least a certain number of neighboring points within a specified radius [23]. When this assumption is satisfied, the point is considered a valid point and retained. On the contrary, if the condition is not met, the point is identified as a noise point and removed. As an example, Figure 10 specifies a radius of d. If at least one adjacent point is required within this radius, only the blue points in the figure are removed from the point cloud. If at least two adjacent points are required within the radius, both the purple and the black points are removed.
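A minimal sketch of this radius test:

```python
import numpy as np

def radius_filter(points, radius, min_neighbors):
    """Radius outlier removal as described in the text: keep a point only if it
    has at least `min_neighbors` other points within `radius` of it."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)
        n_neighbors = int(np.sum(d <= radius)) - 1  # exclude the point itself
        if n_neighbors >= min_neighbors:
            keep.append(i)
    return keep

# A tight cluster survives; the isolated point at (100, 100) is removed as noise.
pts = [(0, 0), (0.5, 0.5), (0.3, 0.1), (100, 100)]
print(radius_filter(pts, radius=1.0, min_neighbors=1))  # [0, 1, 2]
```

The brute-force distance computation is for clarity; production code would again query a spatial index rather than scan all points per query.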

Study Areas
To evaluate the performance of our proposed method, we conducted experiments on two datasets: the Urban-LiDAR dataset (https://www.lidar360.com/, accessed on 2 May 2022) and the Vaihingen dataset (http://www2.isprs.org/, accessed on 7 April 2022). The Urban-LiDAR dataset consists of 719,823 points and includes various types of objects, such as buildings, trees, and ground points, as shown in Figure 11. The terrain in this area undulates significantly, with dense vegetation and tall buildings.

The Vaihingen dataset contains 411,722 points and is divided into two parts, Vaihi-1 and Vaihi-2, which are processed separately in this paper, as shown in Figure 12 (displayed by elevation). In the Vaihingen dataset, the non-ground points comprise buildings, powerlines, low vegetation, cars, fences, hedges, shrubs, and trees; the ground points comprise impervious surfaces. The Vaihingen dataset was collected by the Leica ALS50 system with a point density of 4-8 points/m². The terrain in this area is relatively flat, with sparse vegetation and low buildings.

Parameter Settings
The process of extracting the building point cloud involves several important algorithms, including the CSF algorithm, the region growing algorithm, and the Euclidean clustering algorithm. In this paper, the parameters are set mainly based on the density of the points and the terrain undulations. The specific parameter settings are shown in Table 1, where the parameter settings of the region growing algorithm are those used in the coarse extraction stage of the building points.

When using the CSF algorithm to separate ground points from non-ground points, the following key parameters play an important role: (1) cloth_resolution represents the size of the terrain coverage grid, i.e., the grid resolution, which affects the precision of the generated digital terrain model (DTM); a larger cloth resolution usually leads to a rougher DTM; (2) max_iterations represents the maximum number of iterations; (3) classification_threshold represents the distance threshold between an actual point and the simulated terrain, used to divide the point cloud into ground points and non-ground points.

In the coarse extraction stage of the building point cloud, the region growing algorithm is used to extract building points from the non-ground points. It involves the following key parameters: (1) theta_threshold represents the smoothness threshold; (2) curvature_threshold represents the curvature threshold; (3) neighbor_number represents the number of neighborhood search points; (4) min_pts_per_cluster represents the minimum number of points per cluster; and (5) max_pts_per_cluster represents the maximum number of points per cluster.
When using the Euclidean clustering algorithm to filter discrete tree points and obtain building points, several important parameters are involved: (1) tolerance represents the search radius of the nearest neighbor search, i.e., the minimum Euclidean distance between two different clusters; (2) min_cluster_size represents the minimum number of points per cluster; (3) max_cluster_size represents the maximum number of points per cluster.
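For illustration, the parameters described above can be collected into one configuration structure; every value below is hypothetical and stands in for the actual settings listed in Table 1:

```python
# Hypothetical parameter values for illustration only; the paper's actual
# settings are given in Table 1 and depend on point density and terrain.
params = {
    "csf": {
        "cloth_resolution": 0.5,         # grid size of the simulated cloth (m)
        "max_iterations": 500,           # cap on cloth-simulation iterations
        "classification_threshold": 0.5, # point-to-cloth distance split (m)
    },
    "region_growing": {
        "theta_threshold": 10.0,         # smoothness (normal angle) threshold (deg)
        "curvature_threshold": 0.05,     # curvature threshold
        "neighbor_number": 30,           # neighborhood search size
        "min_pts_per_cluster": 100,
        "max_pts_per_cluster": 1_000_000,
    },
    "euclidean_clustering": {
        "tolerance": 1.0,                # nearest-neighbor search radius (m)
        "min_cluster_size": 50,
        "max_cluster_size": 1_000_000,
    },
}
print(sorted(params))
```

Keeping the three algorithms' parameters in one place makes it easy to re-tune the pipeline per dataset, which the paper notes is necessary given the differing densities of Urban-LiDAR and Vaihingen.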

Evaluation Indicators
This paper uses precision, recall, and the F1 score as evaluation indicators to verify the effectiveness of the proposed method in extracting building points.
Precision represents the proportion of correctly predicted building points among all predicted building points [24]:

Precision = TP/(TP + FP). (9)

Recall represents the proportion of correctly predicted building points among the actual building points [24]:

Recall = TP/(TP + FN). (10)

The F1 score is the harmonic mean of precision and recall, which is closer to the smaller of the two values [24]:

F1 = 2 × Precision × Recall/(Precision + Recall), (11)

where TP is the number of correctly predicted building points, FP is the number of non-building points incorrectly predicted as building points, and FN is the number of building points incorrectly predicted as non-building points.
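The three indicators can be computed directly from the point counts; the counts in the example are made up for illustration:

```python
def building_metrics(tp, fp, fn):
    """Precision, recall, and F1 score (Equations (9)-(11)) from point counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 90 building points found correctly, 10 false alarms, 30 missed points
p, r, f1 = building_metrics(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82
```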

Benchmark Algorithm
To verify the effectiveness of the proposed method, a manually and interactively recognized building point cloud was used as a reference. For the Urban-LiDAR dataset, this paper mainly analyzes the building point cloud obtained through manual interactive recognition. For the Vaihingen dataset, this paper compares the PointNet [25], PointNet++ [26], and HDL-JME-GGO [27] networks with the proposed method. These networks estimate the test data by learning from the training data (Figure 13).

The basic idea of the PointNet network is to use a multi-layer perceptron to capture the feature information of each point, followed by max pooling to aggregate these point features into a global feature representation. The PointNet network can directly process unordered point cloud data without considering the order of the points.
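The permutation invariance that max pooling provides can be demonstrated with a toy numpy sketch; the random weights below stand in for a trained shared MLP and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_global_feature(points, w, b):
    """Toy PointNet-style aggregation: a shared per-point linear layer with
    ReLU (standing in for the MLP), then max pooling over points to a single
    global feature. Max pooling makes the result independent of point order."""
    per_point = np.maximum(points @ w + b, 0.0)  # shared layer applied to each point
    return per_point.max(axis=0)                 # symmetric aggregation

pts = rng.normal(size=(8, 3))   # 8 points with XYZ coordinates
w = rng.normal(size=(3, 16))    # weights of the shared layer (random demo values)
b = rng.normal(size=16)
feat = pointnet_global_feature(pts, w, b)
feat_shuffled = pointnet_global_feature(pts[::-1], w, b)
print(np.allclose(feat, feat_shuffled))  # True: permutation invariant
```

Reversing the point order leaves the global feature unchanged, which is exactly the property that lets PointNet consume unordered point clouds.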
The PointNet++ network incorporates a hierarchical structure comprising a sampling layer, a grouping layer, and a feature extraction layer.This structure allows for the organization of each point and its surrounding neighborhood into local regions, which are then processed using the PointNet network to extract features from the corresponding point cloud.By employing this hierarchical structure, the network becomes capable of effectively learning local feature information as the context scale expands.
The HDL-JME-GGO network utilizes layered data to enhance deep feature learning using the PointNet++ network.It incorporates a joint learning method based on nonlinear manifolds to globally optimize and embed deep features into a low-dimensional space, taking into account the contextual information of spatial and deep features.It effectively addresses artifacts caused by partitioning and sampling in the processing of large-scale datasets.This network achieves global regularization by optimizing initial labels to ensure spatial regularity, resulting in locally continuous and globally optimal classification results.

Results
We evaluated the building extraction performance of the proposed method on the Urban-LiDAR dataset and the Vaihingen dataset. The building point cloud could be divided into two non-overlapping point clouds: the facade point cloud and the roof point cloud. The separation of facade points and roof points was achieved based on the normal vector threshold in the Z direction. The extraction results of the proposed method on the Urban-LiDAR, Vaihi-1, and Vaihi-2 data are shown in Figures 14-16, respectively. It was evident from the figures that the proposed method achieved a high level of accuracy in extracting building points.
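The facade/roof split based on the Z component of the normal can be sketched as follows. The neighborhood size `k` and the `nz_threshold` value are illustrative assumptions, not parameters reported in the paper, and normals are estimated here with a simple PCA over brute-force nearest neighbours.

```python
import numpy as np

def split_facade_roof(points, k=6, nz_threshold=0.7):
    """Split building points into roof and facade sets using the |Z|
    component of per-point normals estimated by PCA over the k nearest
    neighbours. k and nz_threshold are illustrative values."""
    pts = np.asarray(points, dtype=float)
    # brute-force k-nearest neighbours (adequate for small clouds)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    nbrs = np.argsort(dists, axis=1)[:, :k]
    roof_mask = np.zeros(len(pts), dtype=bool)
    for i in range(len(pts)):
        cov = np.cov(pts[nbrs[i]].T)
        eigvals, eigvecs = np.linalg.eigh(cov)
        normal = eigvecs[:, 0]       # eigenvector of the smallest eigenvalue
        # near-vertical normal -> near-horizontal surface -> roof
        roof_mask[i] = abs(normal[2]) >= nz_threshold
    return roof_mask
```

A horizontal patch yields normals close to (0, 0, 1) and is classified as roof, while a vertical wall yields normals with a near-zero Z component and is classified as facade.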

Discussion
This paper evaluated the extraction results of the proposed method on the Urban-LiDAR data, as shown in Table 2. For the roofs, the proposed method yielded a precision of 98.74%, a recall of 98.47%, and an F1 score of 98.60%. For the facades, the values were 97.98%, 70.94%, and 82.30%, respectively. In addition, we analyzed the extraction accuracy of the roofs in the Urban-LiDAR data. From Table 3, it can be seen that the highest precision, recall, and F1 scores all reached 100% (Roof 14 and Roof 29). The lowest precision was 79.57%, with a recall of 89.13% and an F1 score of 84.08% (Roof 28). The experimental results showed that the proposed method exhibited high accuracy and completeness in roof segmentation.
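For reference, the precision, recall, and F1 scores reported throughout this section follow the standard definitions and can be computed from boolean extraction masks as in this small sketch:

```python
import numpy as np

def extraction_metrics(predicted, reference):
    """Precision, recall, and F1 for an extracted point set, given boolean
    masks over the same point cloud (True = building point)."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    tp = np.sum(predicted & reference)    # correctly extracted points
    fp = np.sum(predicted & ~reference)   # wrongly extracted points
    fn = np.sum(~predicted & reference)   # missed building points
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```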
Although the CSF algorithm can effectively separate ground points from non-ground points, it may mistakenly identify facade points that are close to the ground as ground points. To solve this problem, this paper extracted masks based on the original points rather than the non-ground points and set an appropriate Z-axis threshold to obtain the facade point cloud near the ground. As shown in Figure 17c, the facade points within the mask polygons in the original points were more complete than those in the non-ground points acquired using the CSF algorithm. In addition, we evaluated the effectiveness of building point extraction in two different scenes from the Urban-LiDAR dataset: a complex scene and a low-density scene. Figure 18c displays the extracted building point cloud using the proposed method in the complex scene; the precision, recall, and F1 score of the roof were 98.82%, 98.38%, and 98.60%, respectively. This demonstrated that the proposed method could extract building points accurately in the complex scene. Figure 19c shows the extraction results using the proposed method in the scene with low point density. The recall of the roof was only 92.02%, but the precision was 99.41%, and the F1 score was 95.57%. There were relatively dense points with significant fluctuations at the edges of the original points, and even when the region growing algorithm was applied, points at those locations could still be lost.
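The mask-based recovery of near-ground facade points can be sketched as below. The names `point_in_polygon` and `recover_facade_points` and the `z_max` parameter are hypothetical, introduced here for illustration; a standard even-odd ray-casting test stands in for whatever point-in-polygon test the authors actually used.

```python
import numpy as np

def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                            # crossing lies to the right
                inside = not inside
    return inside

def recover_facade_points(original_points, mask_polygon, z_max):
    """Keep original points that fall inside the building mask polygon and
    below z_max: candidate near-ground facade points that CSF may have
    mislabelled as ground. z_max is an illustrative parameter."""
    return np.array([p for p in original_points
                     if p[2] <= z_max and point_in_polygon(p[0], p[1], mask_polygon)])
```

Running the mask test against the original points rather than the CSF non-ground points is what allows facade points already discarded as ground to be recovered.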
Our proposed method is compared with three segmentation networks, PointNet, PointNet++, and HDL-JME-GGO, on the Vaihingen dataset. The performance indicators are listed in Table 4. The proposed method performed outstandingly in roof extraction, achieving a precision 20.73% higher than that of the PointNet network, and its F1 score was only 0.28% lower than that of the HDL-JME-GGO network. For facade extraction, the precision of the proposed method was 49.63% higher than that of the PointNet network and 16.53% higher than that of the PointNet++ network, but only 3.87% lower than that of the HDL-JME-GGO network. While our proposed method achieved slightly lower accuracy than the HDL-JME-GGO network, it considerably outperformed the PointNet and PointNet++ networks in extracting building points based on geometric information.
Because the Vaihingen dataset is composed of the Vaihi-1 point cloud and the Vaihi-2 point cloud, we conducted a detailed analysis of the extraction results on the two point clouds. For roof extraction, the proposed method achieved a precision, recall, and F1 score of 91.49%, 92.32%, and 91.90% for the Vaihi-1 point cloud and 96.27%, 83.93%, and 89.68% for the Vaihi-2 point cloud, respectively (Table 5). Furthermore, we selected 21 buildings and analyzed the roof extraction accuracy for both the Vaihi-1 point cloud and the Vaihi-2 point cloud (Table 6). For the Vaihi-1 point cloud, the proposed method achieved the highest precision, recall, and F1 score of 100%, and the lowest of 71.91%, 81.51%, and 76.41%, respectively. Regarding the Vaihi-2 point cloud, the proposed method achieved the highest precision (99.90%), recall (98.39%), and F1 score (99.04%), and the lowest precision (86.80%), recall (55.14%), and F1 score (71.05%). These results indicate the proposed method's capability to achieve high-accuracy results in roof extraction.
Although the proposed method achieved high accuracy in extracting the Vaihi-1 point cloud and the Vaihi-2 point cloud, there were still some shortcomings. Due to the limitations of the CSF algorithm, it may have difficulty extracting certain roof points close to the ground, such as those shown in the white circle in Figure 20b. In addition, it was difficult to extract building points solely based on geometric information for some roofs with significant undulations, as shown in the black circles in Figures 20b and 21b.

Conclusions
This paper proposes a highly accurate building point cloud extraction method based solely on the geometric information of points. The method is divided into two stages: coarse extraction and fine extraction. In the coarse extraction stage, a coarsely extracted building point cloud is obtained using the cloth simulation filtering algorithm and the region growing algorithm. In the fine extraction stage, the coarsely extracted building point cloud is iteratively refined using mask polygons and the region growing algorithm. The proposed method has shown excellent extraction accuracy on the Urban-LiDAR and Vaihingen datasets. On the Urban-LiDAR dataset, the method achieved a precision of 98.74%, a recall of 98.47%, and an F1 score of 98.60% for roof extraction. For facade extraction on the same dataset, the precision, recall, and F1 scores were 97.98%, 70.94%, and 82.30%, respectively. On the Vaihingen dataset, the proposed method outperformed the PointNet network by 20.73% in roof extraction precision and achieved comparable performance with the HDL-JME-GGO network. For facade extraction, the method surpassed the PointNet network by 49.63% in precision and the PointNet++ network by 16.53%, falling behind the HDL-JME-GGO network by only 3.87%. Additionally, the proposed method can still extract building points with high accuracy, even in cases where buildings are closely adjacent to trees. However, relying solely on geometric information for building extraction may face challenges for roofs with significant undulations or in situations where point density is low. In future work, we will introduce more feature information, such as color or texture, to enhance the ability to extract buildings, thereby achieving more accurate and complete building extraction.

Figure 1 .
Figure 1. Workflow of the building point cloud extraction.

Figure 3 .
Figure 3. The point cloud is divided into ground points and non-ground points using the CSF filtering algorithm (ground points are displayed in dark yellow, and non-ground points are displayed in blue).


Figure 4 .
Figure 4. Plane segmentation results using the region growing algorithm.
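A minimal sketch of the region growing plane segmentation shown in this figure follows; the `radius` and `angle_deg` parameters are illustrative values, not the settings from Table 1, and normals are assumed to be given.

```python
import math
import numpy as np

def region_growing(points, normals, radius=0.5, angle_deg=15.0):
    """Minimal region-growing plane segmentation sketch: grow each region
    from an unvisited seed, absorbing neighbours within `radius` whose
    normals deviate by less than `angle_deg` degrees. Both parameter
    values are illustrative."""
    cos_t = math.cos(math.radians(angle_deg))
    pts = np.asarray(points, dtype=float)
    nrm = np.asarray(normals, dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    labels = -np.ones(len(pts), dtype=int)   # -1 = unassigned
    region = 0
    for seed in range(len(pts)):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in np.nonzero((dists[i] <= radius) & (labels == -1))[0]:
                if abs(float(nrm[i] @ nrm[j])) >= cos_t:  # similar orientation
                    labels[j] = region
                    stack.append(int(j))
        region += 1
    return labels
```

Points that are spatially close but lie on differently oriented surfaces (e.g. a roof meeting a facade) end up in separate regions, which is what makes the segmented planes usable for the coarse building extraction.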


Figure 5 .
Figure 5. (a) Ground truth; (b) coarse extraction results using the region growing algorithm, with buildings in red, trees in green, and ground points in dark yellow.

Figure 7 .
Figure 7. Calculation of the center coordinates of a circle based on the distance intersection method.
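One way to realize the distance intersection step named in this figure is the classic chord-and-perpendicular construction: given two points and a radius, the two candidate circle centers lie on the perpendicular bisector of the chord. This is a sketch under that assumption, not the paper's exact implementation.

```python
import math

def circle_centers(p, q, alpha):
    """Centers of the two circles of radius alpha passing through the 2D
    points p and q, as used when testing Alpha Shape boundary edges.
    Returns None when the chord is longer than the diameter."""
    (x1, y1), (x2, y2) = p, q
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > 2 * alpha:
        return None
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2        # midpoint of the chord
    h = math.sqrt(alpha * alpha - (d / 2) ** 2)  # offset along the bisector
    ux, uy = -(y2 - y1) / d, (x2 - x1) / d       # unit perpendicular to pq
    return (mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)
```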

Figure 6 .
Figure 6. Mask polygon extraction using a combination of the Alpha Shape algorithm and the neighborhood expansion method.


Figure 8 .
Figure 8. Polygonal connection based on the polar angles.
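The polygonal connection step in this figure can be sketched by ordering the boundary points on their polar angle about the centroid; this minimal version assumes the boundary is roughly star-shaped with respect to that centroid.

```python
import numpy as np

def connect_by_polar_angle(boundary_points):
    """Order unorganized boundary points into a polygon by sorting on the
    polar angle about their centroid (a sketch of the connection step;
    assumes a roughly star-shaped boundary)."""
    pts = np.asarray(boundary_points, dtype=float)
    cx, cy = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - cy, pts[:, 0] - cx)
    return pts[np.argsort(angles)]
```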


Figure 9 .
Figure 9. Misclassification of building points using the CSF algorithm within the red circle, with ground points in dark yellow and non-ground points in blue.

removed from the point cloud. If at least two adjacent points are specified within the radius, both the purple and black points will be removed.

Figure 13 .
Figure 13. Training data. Ground points are in dark yellow; the facades are in purple; the roofs are in red; and other elements are in green.


Figure 14 .
Figure 14. Urban-LiDAR's extraction results: ground points are in dark yellow; tree points are in green; the facades are in purple; the roofs are in red. (a) Ground truth; (b) the extraction results using the proposed method.

Figure 16 .
Figure 16. Vaihi-2's extraction results: ground points are in dark yellow; the facades are in purple; the roofs are in red. (a) Ground truth; (b) the extraction results using the proposed method.

Figure 17 .
Figure 17. (a) Facade points within mask polygons in the original points; (b) the facade points within mask polygons in the non-ground points; (c) the overlay of (a,b).


Figure 18 .
Figure 18. Building extraction results in the complex scene: (a) original data; (b) label data; (c) the extraction results of the building using the proposed method.

Figure 19 .
Figure 19. Extraction of the buildings with low point density: (a) original point cloud; (b) manually delineated reference building points. The integration of texture information into data collected by unmanned aerial vehicles (UAVs) may introduce errors, as exemplified by the points highlighted in blue in the figure, which should ideally be categorized as building points; (c) the extracted building points using the proposed method.

Table 1 .
Parameter settings of some important algorithms.
