Indoor Clutter Object Removal Method for an As-Built Building Information Model Using a Two-Dimensional Projection Approach

Point cloud data are used to create an as-built building information model (as-built BIM) that reflects the actual status of a building, whether it is under construction or already completed. However, indoor clutter objects in the point cloud data, such as people, tools, and materials, must be removed effectively before the as-built BIM can be created. In this study, the authors propose a novel method that automatically removes indoor clutter objects based on the Manhattan World assumption and object characteristics. The method projects the 3D point cloud onto a two-dimensional (2D) plane and exploits the different properties of indoor clutter objects and structural elements in the point cloud. It combines voxel-grid downsampling, density-based spatial clustering of applications with noise (DBSCAN), the statistical outlier removal (SOR) filter, and the unsupervised radius-based nearest neighbor search algorithm. In an evaluation on six actual scan datasets, the proposed method achieved a higher mean accuracy (0.94), precision (0.97), recall (0.90), and F1 score (0.93) than the commercial point cloud processing software, and it classified and removed indoor clutter objects more reliably in complex indoor environments acquired from construction sites. These results indicate that the assumed differences between the properties of indoor clutter objects and structural elements can be used to identify indoor clutter objects. Additionally, it was confirmed that the parameters of the proposed method can be determined from the voxel size once that size is chosen for the downsampling process.


Introduction
An as-built building information model (BIM) is an innovative tool for construction management tasks such as quality control, progress monitoring, inspection, and safety monitoring [1-5]. As-built BIM is also useful for controlling and monitoring automated construction equipment and robots [6,7]. To create an as-built BIM, the actual status of a construction site or an existing building must be captured. The most common capture methods are laser scanning and photogrammetry [8], both of which output point cloud data. However, point cloud data typically include unnecessary objects, such as people, tools, and materials, which are considered clutter [9,10]. These clutter objects can degrade the accuracy and speed of automated as-built BIM creation, particularly the point cloud semantic segmentation of building elements [11,12]. Therefore, it is critical to remove these clutter objects from the point cloud. Outdoor clutter objects can be removed manually with relative ease. Indoor clutter objects, however, are more difficult to remove because they are numerous and are often in contact with building elements such as floors, walls, and ceilings [13]. Manual removal of indoor clutter objects is therefore inefficient, and automating it is necessary to develop efficient as-built BIM creation methods [14].
Removing indoor clutter objects intersects with several research areas, including the automatic creation of as-built BIMs, the automated generation of two-dimensional (2D) floor plans, and point cloud semantic segmentation. The authors believe that removing indoor clutter objects could improve the outcomes of critical Scan-to-BIM processes such as point cloud semantic segmentation, line fitting, and plane fitting. The proposed method aims to obtain an x-y plane from which indoor clutter objects have been removed; this x-y plane is then used to remove the indoor clutter objects from the point cloud.
The most typical method for removing indoor clutter objects is the line-fitting-based method. This approach fits lines or planes that represent the structural elements to be preserved and treats the points that do not fit them as outliers [9,10,14-16]. However, depending on parameter values defined a priori, the line-fitting-based method often misses certain types of elements, such as indoor columns or walls, and it may not accurately reflect the thickness of inner walls [14,16]. Furthermore, although an appropriate horizontal x-y slice is essential, related studies have either selected this slice manually or determined it from a fixed z-axis value. Feature-based approaches form another category of removal methods. However, it is challenging to set appropriate parameter values for clustering [17,18], and these methods have limitations near the contact areas of different objects [19]. To overcome these limitations, active research is being conducted on methods that use geometric features such as normal vectors [18,20-22], as well as methods that use features extracted by deep learning models such as PointNet [23]. Recently, deep learning models have been used to perform semantic segmentation into learned classes and to label any unlearned points as clutter. Although these methods classify structural elements based on distinct geometric features, distinguishing elements with similar features, such as walls and columns, remains challenging. In particular, removing indoor clutter objects whose geometric features resemble those of structural elements poses a significant challenge.
This study proposes an indoor clutter object removal method that works even when the geometric features of indoor clutter objects are similar to those of structural elements. Additionally, the proposed method can extract a representative line of structural elements and identify indoor clutter objects and structural elements based on the Manhattan World (MW) assumption. The study is based on the following two assumptions:

• Structural elements, such as columns and walls, are in contact with the floor and ceiling.
• Indoor clutter objects mainly exist on the floor and do not extend to the ceiling.
The proposed method was developed based on these two assumptions and the 2D projection approach. The method uses voxel-grid downsampling, DBSCAN, a statistical outlier removal (SOR) filter, and an unsupervised radius-based nearest neighbor search algorithm.

Literature Review
The proposed method was developed based on a 2D projection approach of the point cloud. In this section, the authors provide an overview of the literature on line-fitting-based and feature-based methods.

Line-Fitting-Based Method
The following studies used the line-fitting-based method to extract the lines of structural elements. Random sample consensus (RANSAC) is a model-fitting method that performs well in the presence of outliers [24], which makes it particularly suitable for fitting 2D lines or planes of structural elements to a point cloud. Babacan et al. [10] created multiple horizontal slices relative to the floor and ceiling to build as-built BIMs, selected the slice with the fewest indoor clutter objects, and applied RANSAC to it; the resulting 2D lines of the structural elements were used in the as-built BIM. Pouraghdam et al. [15] applied RANSAC to a horizontal slice taken 0.3 m below the ceiling. Gankhuyag and Han [14] determined the z-coordinates of the floor and the ceiling to obtain the indoor height and then extracted the horizontal slice for RANSAC at a z-coordinate estimated by multiplying the indoor height by a threshold.
The RANSAC-based method can be applied directly to either horizontal slices or full point clouds. When applied to point clouds, it is used to fit the planes of walls or columns, the typical structural elements. Previtali et al. [9] applied RANSAC to a point cloud to fit the planes of structural elements and used the fitted plane locations in as-built BIM modeling. Wang et al. [16] used RANSAC to detect wall candidates in a point cloud and then used the detected candidates to extract line segments, from which a 2D floor plan was created.
Several studies applied other line-fitting methods to obtain more accurate lines of structural elements. Kim and Lee [25] applied a voxelization-based method to obtain structural element lines, using the most clutter-free horizontal slice between the floor and ceiling. Martens and Blankenbach [26] applied morphological operations to the x-y plane of a point cloud to remove indoor noise. Wu et al. [27] developed a Modified Ring-Stepping Clustering (M-RSC) method to extract structural element lines in complex indoor environments; however, their method required a manual step to remove indoor clutter object data. Macher et al. [28] applied the Maximum Likelihood Estimation SAmple Consensus (MLESAC) method to extract structural element lines and then removed the indoor clutter objects from the structural element points obtained from those lines.
The line-fitting-based method can generate a point cloud of structural elements and 2D floor plans while accounting for indoor clutter objects in the point cloud. Notably, the RANSAC-based approach has been used robustly to extract lines or planes of structural elements. However, it struggles to represent the thickness of inner walls or columns accurately because suitable parameters are difficult to tune. Furthermore, line-fitting-based methods require an appropriate horizontal slice to be selected, and it is difficult to define the slice that is least affected by indoor clutter object data.

Feature-Based Method
The feature-based method was primarily developed for semantic segmentation, and several studies have applied this approach to segment indoor point clouds. These methods can be categorized into clustering-based methods and deep learning-based methods.
The clustering-based method segments targets, such as structural elements and indoor objects, and treats the points that do not belong to any target as indoor clutter. For example, Yang and Wu [22] performed clustering-based segmentation of point clouds by applying DBSCAN to two selected PointNet features. Yao et al. [29] applied supervoxels and DBSCAN to remove the floor points, while Chen et al. [19] proposed a new density-based clustering method to segment indoor objects. Czerniawski et al. [18] computed normal vectors from the point cloud and applied DBSCAN to the resulting sparse normal space to preserve indoor objects and remove planar elements. Stojanovic et al. [12] divided a point cloud into three segments along the z-axis to construct an as-built BIM, used the middle segment to create the floor plan, and applied k-means clustering on the x-y plane to extract the structural elements. Romero-Jarén and Arranz [30] proposed an automatic segmentation and classification method based on geometric feature clustering, in which indoor clutter was categorized into virtual other objects, virtual objects on the floor, and virtual objects on the ceiling. Their approach relied on the non-planar characteristics of indoor clutter objects.
The deep learning-based method is widely used for semantic segmentation of point clouds. These methods generally learn the key structural elements or primary objects of interest and classify any unlearned points as clutter. Park et al. [2] applied PointNet [23] to semantic segmentation, and Kim and Kim [31] applied DGCNN [32]. Perez-Perez et al. [33] developed their own model for Scan-to-BIM. Other deep learning models, such as RandLA-Net [34] and PointNet++ [35], have also been actively developed and applied in recent Scan-to-BIM research [36,37] to segment both structural elements and indoor clutter objects. However, their classification performance for objects with similar geometric features, such as walls and columns, remains unsatisfactory. Consequently, indoor clutter objects whose geometric features resemble those of structural elements are difficult to identify reliably as clutter.
As the literature above shows, feature-based methods are effective at exploiting the geometric features of the objects to be preserved. However, they have difficulty identifying indoor clutter objects whose geometric features resemble those of structural elements, and they tend to produce errors when clutter objects are close to structural elements. The authors therefore believe that removing indoor clutter objects before the semantic segmentation task may lead to more accurate as-built BIM modeling.

Method Overview and Assumptions
The proposed method was designed to efficiently eliminate indoor clutter objects from point cloud data obtained at a construction site. It assumes that structural elements extend from the floor to the ceiling, whereas indoor clutter objects exist mostly on the floor and do not extend to the ceiling. The method uses voxel-grid downsampling, DBSCAN clustering, and the SOR filter to identify and remove indoor clutter accurately while preserving the structural elements in the point cloud. Figure 1 illustrates the framework of the proposed method, which consists of seven steps (a to g). First, the method receives an original point cloud as input, as shown in Figure 1a. Next, the points near the floor and the ceiling are removed, as shown in Figure 1b. This focuses the method on the indoor clutter on the floor, since the structural elements are assumed to extend from the floor to the ceiling. Voxel-grid downsampling is then applied to produce a uniform point density, as shown in Figure 1c, which also reduces the computational burden and speeds up processing. The x- and y-coordinates are then extracted from the downsampled point cloud, as shown in Figure 1d.

Next, the extracted x- and y-coordinates of the structural element candidates are clustered using DBSCAN, as shown in Figure 1e. DBSCAN groups points with similar spatial coordinates into clusters, which helps identify the structural elements. The SOR filter is then applied to obtain more accurate structural element candidates and to remove the indoor clutter objects, as shown in Figure 1f; it smooths the surface of the structural element candidates and removes the remaining noise. Finally, the refined structural element candidates are used to search for the structural elements in the point cloud with an unsupervised radius-based nearest neighbor search algorithm, as shown in Figure 1g. Once the voxel size is determined, the proposed method operates automatically, except for the SOR filtering step, and the parameters required for each step are determined from the voxel size. The details of these steps are explained in Section 3.2.

Removal of the Floor and Ceiling
The proposed method was developed based on the Manhattan World assumption. Under the assumptions above, the histogram of the number of points along the z-coordinate of the point cloud rises sharply near the floor and the ceiling, as shown in Figure 2a. However, the point cloud may contain outliers outside the target object along the z-axis. To account for these outliers, the proposed method uses the average number of points per z-coordinate. The z-coordinates of the floor and ceiling are obtained automatically as follows: (1) compute the average number of points per z-coordinate; (2) retain only the z-coordinates that have more points than this average, as shown in Figure 2b; (3) from the retained z-coordinates and point counts, separate the lowest 30% and the highest 30% by z-coordinate; and (4) take the z-coordinates with the maximum number of points in the lowest and highest 30% as the floor and ceiling z-coordinates, respectively.

The obtained floor and ceiling z-coordinates were then used to remove the floor and ceiling. For this purpose, the lower bound of the data to be preserved, $Z_{\min}$, was defined as the floor z-coordinate $Z_{\text{floor}}$ plus 0.2 m, as shown in Equation (1), and the upper bound, $Z_{\max}$, was defined as the ceiling z-coordinate $Z_{\text{ceiling}}$ minus 0.2 m, as shown in Equation (2):

$$Z_{\min} = Z_{\text{floor}} + 0.2\ \text{m} \tag{1}$$

$$Z_{\max} = Z_{\text{ceiling}} - 0.2\ \text{m} \tag{2}$$

$Z_{\text{mid}}$ was then determined from $Z_{\min}$ and $Z_{\max}$ using Equation (3). Together with the voxel size used in the subsequent voxel-grid downsampling, $Z_{\min}$, $Z_{\text{mid}}$, and $Z_{\max}$ were used to determine the parameters of DBSCAN, the SOR filter, and the unsupervised radius-based nearest neighbor search algorithm.
Figure 3 shows the data remaining after the floor and the ceiling were removed using the determined $Z_{\min}$ and $Z_{\max}$.
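Steps (1) to (4) and Equations (1) and (2) can be sketched with NumPy as follows. This is an illustrative reimplementation, not the authors' code: the histogram bin width (`bin_size`, here 0.05 m), the use of bin centers as z-coordinates, and the function names `detect_floor_ceiling` and `slab_bounds` are assumptions. The text does not reproduce Equation (3), so $Z_{\text{mid}}$ is assumed here to be the midpoint of $Z_{\min}$ and $Z_{\max}$.

```python
import numpy as np


def detect_floor_ceiling(z, bin_size=0.05, tail_fraction=0.3):
    """Estimate floor and ceiling z-coordinates from a histogram of point counts.

    Illustrative sketch; bin_size is an assumed histogram resolution.
    """
    z = np.asarray(z, dtype=float)
    n_bins = max(1, int(np.ceil((z.max() - z.min()) / bin_size)))
    counts, edges = np.histogram(z, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])

    # Steps (1)-(2): keep only z-bins with more points than the average count.
    keep = counts > counts.mean()
    if not keep.any():
        raise ValueError("no z-bin exceeds the average point count")
    kept_z, kept_counts = centers[keep], counts[keep]

    # Step (3): lowest and highest 30% of the retained bins, ordered by z.
    n_tail = max(1, int(round(tail_fraction * kept_z.size)))
    order = np.argsort(kept_z)
    low, high = order[:n_tail], order[-n_tail:]

    # Step (4): the most populated bin in each group is the floor / ceiling.
    z_floor = kept_z[low[np.argmax(kept_counts[low])]]
    z_ceiling = kept_z[high[np.argmax(kept_counts[high])]]
    return float(z_floor), float(z_ceiling)


def slab_bounds(z_floor, z_ceiling, offset=0.2):
    """Return (Z_min, Z_mid, Z_max). Z_mid as a midpoint is an ASSUMED form of Equation (3)."""
    z_min = z_floor + offset        # Equation (1)
    z_max = z_ceiling - offset      # Equation (2)
    z_mid = 0.5 * (z_min + z_max)   # assumption, not stated in the text
    return z_min, z_mid, z_max
```

For example, with a floor at 0.0 m and a ceiling at 3.0 m, `slab_bounds` gives $Z_{\min} = 0.2$ m and $Z_{\max} = 2.8$ m, with 1.5 m for the assumed midpoint.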

Voxel-Grid Downsampling
The point density of a point cloud obtained by a three-dimensional (3D) scanner varies with the distance between the scanner and the scanned target: objects far from the scanner have a lower point density, whereas objects close to it have a relatively higher point density. The proposed method, however, applies DBSCAN to the x- and y-coordinates of the point cloud, and DBSCAN on these coordinates cannot produce the desired results if the point density is not uniform. The point cloud must therefore have a uniform point density, which the method achieves through voxel-grid downsampling. Voxel-grid downsampling divides the space into voxels whose edge length equals the predefined voxel size, as shown in Figure 4, and replaces the points inside each voxel with a single representative point. The voxel size was set to 0.05 m, which reduced the size of the point cloud while preserving the shapes of inner columns and the thickness of inner walls. The authors used the voxel-grid downsampling algorithm in Open3D (ver. 0.17.0).
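The authors used Open3D's built-in downsampling (on a `PointCloud` object, `pcd.voxel_down_sample(voxel_size=0.05)`). The dependency-free NumPy sketch below illustrates the same idea under simplifying assumptions: the grid is anchored at the minimum corner of the cloud, and each occupied voxel is replaced by the centroid of its points, as Open3D does, although Open3D's exact grid alignment may differ. The function name `voxel_downsample` is hypothetical.

```python
import numpy as np


def voxel_downsample(points, voxel_size=0.05):
    """Replace the points in each occupied voxel with their centroid (illustrative sketch)."""
    points = np.asarray(points, dtype=float)
    # Integer voxel index of every point; the grid starts at the cloud's minimum corner.
    idx = np.floor((points - points.min(axis=0)) / voxel_size).astype(np.int64)
    # Group points that share a voxel index; `inverse` maps each point to its voxel.
    _, inverse, counts = np.unique(idx, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # keep it 1-D across NumPy versions
    # Sum the coordinates per voxel, then divide by the number of points in it.
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

Because each voxel keeps at most one point, the downsampled density is bounded by the voxel size regardless of the distance to the scanner, which is the uniformity that DBSCAN relies on in the next steps.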

Extraction of XY Coordinates from the Point Cloud between Z mid and Z max
The x- and y-coordinates of the points used for DBSCAN are extracted from the slice between $Z_{\text{mid}}$ and $Z_{\max}$ along the z-axis. This follows from the second assumption of this study: indoor clutter objects mainly exist on the floor and do not extend to the ceiling. Because the purpose of DBSCAN is to extract the x- and y-coordinates of the structural elements, the proposed method uses the x- and y-coordinates between $Z_{\text{mid}}$ and $Z_{\max}$, which are less affected by indoor clutter objects, to increase the efficiency of DBSCAN.

Figure 5a shows the x-y plane on which the x- and y-coordinates of the entire point cloud between $Z_{\min}$ and $Z_{\max}$ are plotted, and Figure 5b shows the x-y plane for the point cloud between $Z_{\text{mid}}$ and $Z_{\max}$. Indoor clutter objects are less prevalent in Figure 5b than in Figure 5a; therefore, extracting the x- and y-coordinates from the point cloud between $Z_{\text{mid}}$ and $Z_{\max}$ is more efficient for DBSCAN.
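Extracting the projected coordinates amounts to a z-range mask followed by dropping the z-column. The sketch below assumes an (N, 3) NumPy array of x, y, z coordinates; the function name `xy_between` is hypothetical.

```python
import numpy as np


def xy_between(points, z_low, z_high):
    """Return the x-y coordinates of points whose z lies in [z_low, z_high]."""
    points = np.asarray(points, dtype=float)
    in_slab = (points[:, 2] >= z_low) & (points[:, 2] <= z_high)
    return points[in_slab, :2]  # 2D projection onto the x-y plane
```

Under these assumptions, `xy_between(down, z_min, z_max)` corresponds to the projection in Figure 5a and `xy_between(down, z_mid, z_max)` to the one in Figure 5b, where `down` is the downsampled cloud.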

DBSCAN
DBSCAN requires two parameters: min points and epsilon. In the proposed method, both are determined from the voxel size, $Z_{\text{mid}}$, and $Z_{\max}$. Figure 6 shows an example of the point arrangement of a wall downsampled to a voxel size of 0.05 m as an ideal case. The height of the input point cloud data is the difference between $Z_{\max}$ and $Z_{\text{mid}}$, and the points are arranged at 0.05 m intervals. When viewed from the top (x-y plane), the points of each vertical column collapse onto a single x-y location, forming a cluster whose size is approximately the integer part of the input height divided by 0.05 m. Therefore, the epsilon of DBSCAN operating on the 2D projection (x-y plane) was set to the voxel size, 0.05 m, as shown in Equation (4). The ideal min points value is the integer part of the input height divided by the 0.05 m spacing; to account for possibly omitted points, the authors subtracted two from this ideal value, as shown in Equation (5):

$$\varepsilon = v = 0.05\ \text{m} \tag{4}$$

$$\text{MinPts} = \left\lfloor \frac{Z_{\max} - Z_{\text{mid}}}{v} \right\rfloor - 2 \tag{5}$$

where $v$ denotes the voxel size. This makes the clustering result more robust when some points are missing from the data. Figure 7 shows the results obtained with the min points and epsilon determined from the input height and the voxel size: the red points are outliers, and the black points are core and border points. The proposed method defines the core and border points returned by DBSCAN as the structural element candidates. The authors used the DBSCAN implementation in the scikit-learn library (ver. 1.2.0).
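Equations (4) and (5) map directly onto scikit-learn's `DBSCAN(eps=..., min_samples=...)`. The sketch below, with the hypothetical helper names `dbscan_parameters` and `structural_candidates`, assumes that the input is the (N, 2) x-y projection of the downsampled slice between $Z_{\text{mid}}$ and $Z_{\max}$. Note that scikit-learn counts the point itself toward `min_samples`, and a small tolerance (an assumption, not from the text) is added before truncation so that floating-point round-off in the division cannot lower the result by one.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def dbscan_parameters(z_mid, z_max, voxel_size=0.05):
    """Equations (4) and (5): epsilon is the voxel size; min points comes from the slab height."""
    eps = voxel_size
    # Integer part of height / voxel size; the tiny tolerance absorbs floating-point round-off.
    ideal = int(np.floor((z_max - z_mid) / voxel_size + 1e-9))
    min_pts = max(ideal - 2, 1)  # subtract two for possibly omitted points (guarded to stay >= 1)
    return eps, min_pts


def structural_candidates(xy, eps, min_pts):
    """Keep DBSCAN core and border points; scikit-learn labels outliers as -1."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(xy)
    return np.asarray(xy)[labels != -1]
```

For example, if $Z_{\max} - Z_{\text{mid}} = 1.3$ m and the voxel size is 0.05 m, the ideal min points value is 26 and Equation (5) gives 24.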

SOR Filter
The structural element candidates obtained from DBSCAN contain the x-y locations of structural elements such as walls and columns. However, points belonging to indoor clutter objects may remain among them. If these coordinates were used in the subsequent search for structural elements in the point cloud, indoor clutter objects could be misclassified as structural elements. The x- and y-coordinates of the remaining indoor clutter objects must therefore be removed. To achieve this, the proposed method used the SOR filter in CloudCompare (v2.1.2). For each point, the SOR filter computes the average distance to its k nearest neighbors; the mean (δ) and standard deviation (σ) of these average distances, together with the standard deviation multiplier threshold $s_T$, define the maximum distance $d_{\max}$ in Equation (6), and points whose average distance exceeds $d_{\max}$ are removed as noise:

$$d_{\max} = \delta + s_T\,\sigma \tag{6}$$

Appl. Sci. 2023, 13, 9636 9 of 18
In the proposed method, k for the SOR filter is set to the min points value obtained for DBSCAN, and $s_T$ is set to 0.1 considering the voxel size of 0.05 m. Figure 8a shows the structural element candidates before the SOR filter is applied, and Figure 8b shows the results after it is applied.
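The authors ran the SOR filter in CloudCompare. The sketch below reimplements the rule in Equation (6) with scikit-learn's `NearestNeighbors`, assuming that δ and σ are the mean and standard deviation of the per-point average k-nearest-neighbor distances (CloudCompare's documented definition) and that each point is excluded from its own neighbor list; the function name `sor_filter` is hypothetical. Open3D offers a comparable filter as `PointCloud.remove_statistical_outlier(nb_neighbors, std_ratio)`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def sor_filter(xy, k, s_t):
    """Statistical outlier removal on 2D candidates (illustrative sketch of Equation (6))."""
    xy = np.asarray(xy, dtype=float)
    # kneighbors() without arguments returns neighbours of the fitted points, excluding each point itself.
    dist, _ = NearestNeighbors(n_neighbors=k).fit(xy).kneighbors()
    mean_dist = dist.mean(axis=1)              # average distance to the k nearest neighbours
    delta, sigma = mean_dist.mean(), mean_dist.std()
    d_max = delta + s_t * sigma                # Equation (6)
    return xy[mean_dist <= d_max]
```

With the settings in the text, `k` would be the DBSCAN min points value and `s_t` would be 0.1.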

Unsupervised Radius-Based Nearest Neighbor Search Algorithm
The refined structural element candidates obtained through SOR filtering were used to identify the structural elements within the original point cloud. However, because of voxel-grid downsampling, these candidates do not correspond directly to points in the original point cloud. Nearest neighbor search algorithms, such as unsupervised k-nearest neighbor search and unsupervised radius-based nearest neighbor search, can be used to address this issue. In this study, the authors applied the unsupervised radius-based nearest neighbor search algorithm because its radius can be determined from the voxel size, whereas there was no clear basis for setting k in the k-nearest neighbor search. The radius was set to 0.1 m, considering the voxel size of 0.05 m.
To ensure the effective operation of the unsupervised radius-based nearest neighbor search algorithm, a data structuring method must be selected. Popular options for data structuring include KD-tree and ball-tree, which can significantly reduce computational costs for nearest neighbor search. In this study, the results obtained from applying KDtree and ball-tree to the original point cloud (52,150,674 points) and the downsampled point cloud (106,391 points) were compared. The comparison results are summarized in Table 1. When applied to the original point cloud, the ball-tree method took 1 min and 48

Unsupervised Radius-Based Nearest Neighbor Search Algorithm
The refined structural element candidates obtained through SOR filtering were used to identify structural elements within the original point cloud. However, these structural element candidates do not directly correspond to the original point cloud due to the application of voxel-grid downsampling. To address this issue, nearest neighbor search algorithms such as unsupervised k-nearest neighbor search and unsupervised radiusbased nearest neighbor search can be employed. In this study, the authors applied the unsupervised radius-based nearest neighbor search algorithm, which could determine the radius value based on the voxel size, as the setting for the k value in k-nearest neighbor was unclear. The radius value was set at 0.1 m, taking into consideration the voxel size of 0.05 m.
To run the unsupervised radius-based nearest neighbor search efficiently, a spatial data structure must be selected. Popular options include the KD-tree and the ball-tree, both of which can significantly reduce the computational cost of nearest neighbor search. In this study, the KD-tree and the ball-tree were applied to the original point cloud (52,150,674 points) and to the downsampled point cloud (106,391 points), and the results were compared. The comparison results are summarized in Table 1. On the original point cloud, the ball-tree method took 1 min and 48 s, while the KD-tree method took 2 min and 11 s. On the downsampled point cloud, the ball-tree method took 0.2 s, while the KD-tree method took 0.4 s. Consequently, this study employed the unsupervised radius-based nearest neighbor search algorithm with the ball-tree to retrieve the structural elements from the original point cloud. Figure 9 shows the results of classifying indoor clutter objects and structural elements with the proposed method. The authors used the scikit-learn library (version 1.2.0).
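The KD-tree versus ball-tree comparison can be reproduced in spirit with the sketch below, which times scikit-learn's `NearestNeighbors` with `algorithm="kd_tree"` and `algorithm="ball_tree"` on the same radius query and checks that both return the same neighbors. The function name `compare_index_structures` and the random test data are hypothetical; absolute timings depend on hardware and point distribution and are not expected to match Table 1.

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors


def compare_index_structures(points, queries, radius):
    """Time build + radius query for KD-tree and ball-tree; check results agree."""
    neighbors, timings = {}, {}
    for algorithm in ("kd_tree", "ball_tree"):
        start = time.perf_counter()
        nn = NearestNeighbors(radius=radius, algorithm=algorithm).fit(points)
        neighbors[algorithm] = nn.radius_neighbors(queries, return_distance=False)
        timings[algorithm] = time.perf_counter() - start
    # Both structures are exact, so they should return identical neighbor sets.
    identical = all(
        np.array_equal(np.sort(a), np.sort(b))
        for a, b in zip(neighbors["kd_tree"], neighbors["ball_tree"])
    )
    return timings, identical


rng = np.random.default_rng(0)
points = rng.uniform(0.0, 10.0, size=(100_000, 3))
timings, identical = compare_index_structures(points, points[::100], radius=0.1)
print(timings, identical)
```

Because both structures return exact results, the choice between them affects only speed, so the selection of the ball-tree can rest on processing time alone.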

Experimental Data
To perform a comprehensive performance evaluation of the proposed method, six actual scan datasets were used in this study. Four of these datasets were point cloud data obtained from the parking lot, basement, and apartments 1 and 2 of an apartment complex construction site. The remaining two datasets were point cloud data obtained from lecture rooms 1 and 2 at Inha University. Figure 10 shows the six experimental datasets after voxel-grid downsampling and removal of floors and ceilings. Detailed descriptions of these datasets can be found in Table 2.

Performance Evaluation and Metrics
To evaluate the performance of the proposed method appropriately, voxel-grid downsampling was applied and the floor and ceiling were removed from the point cloud data. The ground truth data were manually labelled as structural elements or indoor clutter objects; red points indicate structural elements, and blue points indicate indoor clutter objects. The ground truth labels were then compared with the results of classifying the six experimental datasets into structural elements and indoor clutter objects using the proposed method.
Moreover, the performance of the proposed method was compared with that of the Auto-Classify Indoor function of commercial point cloud processing software. Auto-Classify Indoor automatically classifies a point cloud into indoor elements, including walls, floors, ceilings, and other remaining parts, using feature-based methods. Performance was evaluated using accuracy, precision, recall, and F1 score, which are calculated from the values of the confusion matrix with structural elements as the positive class. True Positive (TP) refers to points that are actually structural elements and are predicted as structural elements. True Negative (TN) refers to points that are actually not structural elements (indoor clutter objects in this study) and are not predicted as structural elements. False Positive (FP) refers to points that are actually indoor clutter objects but are predicted as structural elements. False Negative (FN) refers to points that are actually structural elements but are predicted as indoor clutter objects. Each metric is calculated according to Equations (7)-(10).
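For reference, the sketch below computes the four metrics from point-wise labels using the standard confusion-matrix definitions, with structural elements as the positive class. Equations (7)-(10) are not reproduced in this section, so the sketch assumes they take these usual forms; the function name `classification_metrics` is hypothetical.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with structural element = True (positive)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # structural, predicted structural
    tn = np.sum(~y_true & ~y_pred)  # clutter, predicted clutter
    fp = np.sum(~y_true & y_pred)   # clutter, predicted structural
    fn = np.sum(y_true & ~y_pred)   # structural, predicted clutter
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": float(accuracy), "precision": float(precision),
            "recall": float(recall), "f1": float(f1)}


scores = classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(scores)
```

In this small example there are two true positives, one true negative, one false positive, and one false negative, so accuracy is 3/5 and precision, recall, and F1 are all 2/3.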

Experimental Results
All experiments were performed on a PC running Windows 10, equipped with an AMD Ryzen 9 5900X 12-Core Processor (3.70 GHz) and 64 GB of RAM. The six datasets described in Figure 10 and Table 2 were classified into structural elements and indoor clutter objects in order to remove the indoor clutter objects from the point cloud. Figure 11 displays the ground truth and the classification results of the structural elements and indoor clutter objects of the six datasets obtained with the proposed method. The ground truth-based accuracy, precision, recall, and F1 score are summarized in Table 3. Across the six datasets, the proposed method achieved an average accuracy of 0.94, an average precision of 0.97, an average recall of 0.90, and an average F1 score of 0.94. Table 4 presents the classification results of the experimental data into structural elements and indoor clutter objects using the Auto-Classify Indoor function of commercial software. Table 5 presents the processing times of the proposed method for each dataset. These times exclude the SOR filter step because it was performed manually; in our experiments, this step took 1 to 2 s using commercial software. Therefore, the times listed in Table 5 adequately represent the time needed to run the proposed method. Figure 12 compares the average performance of the proposed method and the Auto-Classify Indoor function of commercial software.
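The SOR step was run manually in commercial software in this study. For readers who wish to script it, the following is a minimal sketch of the commonly used statistical outlier removal criterion: a point is kept if its mean distance to its k nearest neighbors does not exceed the global mean of that statistic plus `std_ratio` standard deviations. The function name and the defaults `k=20` and `std_ratio=2.0` are hypothetical and are not the authors' settings, which are stated to depend on the voxel size.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def statistical_outlier_removal(points, k=20, std_ratio=2.0):
    """Return a boolean mask of inliers (True) under a standard SOR criterion."""
    # k + 1 neighbors because each point's nearest neighbor is itself (distance 0).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    distances, _ = nn.kneighbors(points)
    mean_dist = distances[:, 1:].mean(axis=1)
    threshold = mean_dist.mean() + std_ratio * mean_dist.std()
    return mean_dist <= threshold


# Usage: a dense cluster of 200 points plus one far-away outlier.
rng = np.random.default_rng(1)
cloud = np.vstack([rng.uniform(0.0, 1.0, size=(200, 3)), [[100.0, 100.0, 100.0]]])
inliers = statistical_outlier_removal(cloud, k=10)
print(inliers.sum(), inliers[-1])
```

Because the threshold is derived from the distribution of the neighbor distances themselves, a single parameter pair adapts to the overall point density, which in turn is governed by the voxel size after downsampling.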

Discussion
The experiments showed that the proposed method classified the structural elements and indoor clutter objects of the six datasets with an average accuracy of 0.94, an average precision of 0.97, an average recall of 0.90, and an average F1 score of 0.93. In addition, the proposed method outperformed the Auto-Classify Indoor function of the commercial software on all evaluation metrics, as shown in Figure 12. In particular, all metrics for the parking lot and apartment 1 datasets were 0.96 or higher. The high performance on the apartment 1 dataset was attributed to its relatively low indoor complexity. Notably, the parking lot dataset also performed strongly relative to the other datasets, despite its high indoor complexity.
Why the proposed method outperformed the feature-based Auto-Classify Indoor function can be explained using Figure 13. Figure 13a shows the point cloud classified as structural elements by the Auto-Classify Indoor function, and Figure 13b shows the actual target object. The highlighted area in Figure 13b is where plasterboard was stacked on the floor, so the plasterboard should be removed as an indoor clutter object. However, because the point cloud of the stacked plasterboard has vertical geometric features similar to those of a wall, the Auto-Classify Indoor function classified it as a structural element. Because the proposed method relies on whether an object extends from the floor to the ceiling rather than on local geometric features, it remains effective even for indoor clutter objects whose geometric features resemble those of structural elements.
Although the advantages of the proposed method are summarized above, it did not outperform the commercial software on every experimental dataset. In particular, for the lecture room 1 and 2 datasets, the Auto-Classify Indoor function performed better, although only by a narrow margin. This was because the proposed method performed weakly near windows and doors, as shown in Figure 14. Because the proposed method is based on a 2D projection approach, the points of structural elements located above a door can be classified as indoor clutter in the DBSCAN step, and the point cloud acquired around windows is unstable. A review of the experimental results confirmed that most errors occurred at the upper parts of windows and doors. Addressing these limitations in future work is therefore expected to enable more accurate removal of indoor clutter objects.

Conclusions
In this study, the authors proposed a novel method to identify indoor clutter objects based on the assumptions that (a) structural elements extend from the floor to the ceiling and (b) indoor clutter objects rest on the floor and do not extend to the ceiling. The proposed method consists of floor and ceiling removal, voxel-grid downsampling, DBSCAN, SOR filtering, and an unsupervised radius-based nearest neighbor search algorithm.
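The two assumptions can be illustrated with a deliberately simplified Python sketch. It is not the authors' algorithm: it projects points onto the x-y plane, clusters the projection with scikit-learn's `DBSCAN`, and labels a cluster as structural when its points span at least a chosen fraction of the floor-to-ceiling height. The function name `classify_by_vertical_extent` and the values of `eps`, `min_samples`, and `min_span_ratio` are hypothetical, and the sketch omits floor and ceiling removal, voxel-grid downsampling, SOR filtering, and the radius-based recovery step.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def classify_by_vertical_extent(points, floor_z, ceiling_z, eps=0.15,
                                min_samples=5, min_span_ratio=0.9):
    """Return True for points in x-y clusters that stretch (almost) floor to ceiling."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points[:, :2])
    structural = np.zeros(len(points), dtype=bool)
    required_span = min_span_ratio * (ceiling_z - floor_z)
    for label in np.unique(labels):
        if label == -1:  # DBSCAN noise is treated as clutter in this sketch
            continue
        members = labels == label
        z = points[members, 2]
        if z.max() - z.min() >= required_span:  # assumption (a): floor to ceiling
            structural[members] = True
    return structural  # False marks candidate clutter, per assumption (b)


# Toy scene: a wall at x = 0 spanning 0 to 3 m in height and a 0.5 m-high box.
y, z = np.meshgrid(np.linspace(0.0, 2.0, 21), np.linspace(0.0, 3.0, 31))
wall = np.column_stack([np.zeros(y.size), y.ravel(), z.ravel()])
bx, by, bz = np.meshgrid(np.linspace(1.0, 1.2, 3), np.linspace(1.0, 1.2, 3),
                         np.linspace(0.0, 0.5, 6))
box = np.column_stack([bx.ravel(), by.ravel(), bz.ravel()])
scene = np.vstack([wall, box])
is_structural = classify_by_vertical_extent(scene, floor_z=0.0, ceiling_z=3.0)
print(is_structural[:len(wall)].all(), is_structural[len(wall):].any())
```

In the toy scene the wall cluster spans the full 3 m and is kept as structural, while the box, like stacked material on a construction site, spans only 0.5 m and is flagged as clutter even though its faces are vertical.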
Experiments on six scan datasets from actual sites showed that the proposed method identified indoor clutter objects with higher accuracy, precision, recall, and F1 score than the commercial software used for comparison. Specifically, the proposed method achieved an average accuracy of 0.94, a precision of 0.97, a recall of 0.90, and an F1 score of 0.94. Compared with the Auto-Classify Indoor function of commercial point cloud processing software, the proposed method scored higher by 0.10, 0.09, 0.07, and 0.09 in accuracy, precision, recall, and F1 score, respectively.
The contributions of this study are as follows:
• The proposed method can accurately determine and remove indoor clutter objects with higher performance than commercial software;
• The proposed method can extract an appropriate x-y plane that represents structural elements, including inner walls and columns;
• The proposed method can identify indoor clutter objects among objects with similar geometrical features to structural elements;
• The parameters of DBSCAN, the SOR filter, and the unsupervised radius-based nearest neighbor search algorithm used in the proposed method are automatically determined by the voxel size.
However, the proposed method has some limitations in accurately determining the structural elements near windows and doors. In the future, the authors plan to improve this method to more accurately determine the structural elements at all locations, including regions near windows and doors. Furthermore, we will also consider pipes that are installed horizontally. The authors will adopt the proposed method in the Scan-to-BIM process to improve point cloud semantic segmentation results. The proposed method has the potential to be applied in various fields, such as architecture, civil engineering, and interior design.