Effective Selection of Variable Point Neighbourhood for Feature Point Extraction from Aerial Building Point Cloud Data

: Existing approaches that extract buildings from point cloud data do not select the appropriate neighbourhood for estimation of normals on individual points. However, the success of these approaches depends on correct estimation of the normal vector. In most cases, a ﬁxed neighbourhood is selected without considering the geometric structure of the object and the distribution of the input point cloud. Thus, considering the object structure and the heterogeneous distribution of the point cloud, this paper proposes a new effective approach for selecting a minimal neighbourhood, which can vary for each input point. For each point, a minimal number of neighbouring points are iteratively selected. At each iteration, based on the calculated standard deviation from a ﬁtted 3D line to the selected points, a decision is made adaptively about the neighbourhood. The selected minimal neighbouring points make the calculation of the normal vector accurate. The direction of the normal vector is then used to calculate the inside fold feature points. In addition, the Euclidean distance from a point to the calculated mean of its neighbouring points is used to make a decision about the boundary point. In the context of the accuracy evaluation, the experimental results conﬁrm the competitive performance of the proposed approach of neighbourhood selection over the state-of-the-art methods. Based on our generated ground truth data, the proposed fold and boundary point extraction techniques show more than 90% F1-scores.


Introduction
Estimation of the feature points and lines is a fundamental problem in the field of image and shape analysis as this estimation facilitates better understanding of an object in a variety of areas, e.g., data registration [1], data simplification [2], road extraction [3], and building reconstruction [4]. Specifically, the area of 3D building reconstruction has a broad range of applications, such as building type classification, urban planning, solar potential estimation, change detection, forest management, and virtual tours [5][6][7][8][9][10]. Due to the availability of 3D point cloud data, from both airborne and ground-based mobile laser scanning systems, the extraction of 3D feature points and lines from point cloud data has become an attractive research topic to describe an object shape more accurately.
The airborne Light Detection and Ranging (LiDAR) data from a geographic location is a set of unordered points. It mainly consists of the parameters from three independent dimensions with X, Y, and Z coordinates along with other retro-reflective properties generally in the form of intensities. These parameters and properties together can describe the topographic profile of the Earth's surface and object at that location [11]. Therefore, LiDAR data provide more accurate geometric information than the 2D image and are used as the main input data for automatic building reconstruction [12,13]. The reconstruction approaches from LiDAR data can be broadly categorised into two: model-driven and data-driven [7]. The first approach finds the most similar models previously stored in the database to the input data, whereas the second approach tries to generate any building model from the provided 3D data. The data-driven approach mainly finds different features, e.g., planer patches, lines, curves, angles, and corners, which represent the major components of a building structure, from the input building point cloud. By correctly grouping those features and geometric topologies, the models of the buildings are generated. To reconstruct the buildings, individual planar patches are firstly identified using one or more of the different segmentation algorithms such as region growing [14]. After that, the neighbouring segments are identified and the relationship among the patches are established based on different features such as co-planarity, intersection lines, corners, and edges [15]. Therefore, feature point extraction to construct feature lines and edges to establish a relationship among the planar patches in the 3D point cloud data is the main challenge for building reconstruction techniques in the data-driven approach. While the model driven approach is limited to the models in the library, the data driven approach works in general for any building roof shape.
Although various definitions of the 3D object edges can be found in the literature [16][17][18], in the area of building reconstruction, many of the authors categorised the 3D edges into the boundary and fold edges [16,19]. Roof contours and facade outlines are referred to as boundary edges [19], and fold edges or sharp feature lines are defined as the intersecting edges of two or more planes [10,20]. Ni et al. [16] considered that boundary elements have an abrupt angular gap in the shape formed by their neighbouring points and that the points in the fold edges have an abrupt directionality change between the directions in adjacent surfaces. Existing research on edge extraction in point clouds mostly considers either statistical and geometrical methods or the directionality and geometric changes [21]. To measure the directionality and the geometric changes, the estimation of normal and curvature for each 3D point in the data is an important factor and should be calculated accurately [22]. However, estimation of the normal vector along a building edge highly depends on the neighbourhood employed for each point [21,23], and the inharmonious nature of LiDAR point cloud makes the calculation of that neighbourhood complex and challenging [22,24]. Moreover, noise associated in the oblique point cloud data can create serious problems for calculating accurate normals in the context of affective and automatic building plane reconstruction [25].
Consequently, the k-neighbourhood (also, known as k-nearest neighbours or k-NN) and r-neighbourhood [26] are two traditional approaches in selecting neighbours of a given point P i . The former selects k number of nearest points from P i , and the latter contains all points for which the distance to P i is less than or equal to r. Selecting the value for k or r is challenging as the local geometry of the object is unknown [17]. A higher value of k or r may reduce the impact of noise on the estimated normal but information from several classes or planes can be mixed up in one neighbourhood, thus producing a wrong estimation [27]. In contrast, a lower value can prevent the capture of enough shape information [27]. Figure 1 shows that, while a small neighbourhood for P 3 may offer an unstable normal estimation due to local variations (e.g., corrugations on a metallic roof), a large neighbourhood for P 2 can skip the local variations and thus offers a better estimation. However, large neighbourhoods for P 1 and P 4 attract points from other planes and objects. Therefore, a wrong selection of neighbourhoods can result in a seriously faulty normal estimation. Nonetheless, the aircraft that holds the LiDAR system scans the studied area in a specific direction; thus, specific scanlines can be observed over the objects in the area. Figure 1 demonstrates the scanline direction (red arrows) of LiDAR points over a building roof.
Considering the LiDAR point density in addition to the heterogeneous point distribution, this paper proposes an effective neighbourhood selection method that is free from selecting any fixed parameter such as k or r. The proposed method finds the best fit 3D lines first and, then, considers the standard deviation of the fitted points. Based on the point density of the input data and the distance pattern of scanlines, it selects the minimal number of neighbours automatically for each point in the point cloud. The direction of the normal is then calculated for each point to extract the fold feature points of the roof. The terms "fold edge point", "fold feature point", or "fold point" are used alternatively in this paper to indicate the same thing. Based on the distance from the mean value of the selected minimal neighbours, a decision can be made easily about whether a point is a boundary point.
The particular contributions of this paper are as follows: • In the context of calculating an accurate normal, a new robust method is proposed for automatic selection of neighbouring points of each point in a LiDAR point cloud data. This proposed method can select the optimal minimum number of neighbouring points and, thus, can solve the existing problems of accurate normal calculation of individual points. • Based on the calculated direction of the normal, we propose an effective method for finding the fold feature points. Maximum angle differences of the neighbouring normal vectors are clustered, and an experimentally selected threshold is adopted to decide fold edge points. • To find the boundaries of individual objects, a new method for boundary point detection is suggested. This method depends on the distance from a point to the calculated mean of its neighbouring points, selected by the proposed technique of automatic neighbouring point selection.
The rest of the paper is organised as follows. Section 2 presents a review of the existing techniques for neighbourhood selection, normal calculation, and feature point extraction along with their challenges. The proposed method of neighbourhood selection along with the fold and boundary feature point extraction methods are discussed in Section 3. Extensive experimentations are presented and discussed in Section 4. Finally, Section 5 exposes the conclusion.

Review
The main objective of our work is to extract the feature points from point cloud data based on the minimal number of neighbourhood for each point. Calculation of the normal vectors is an important geometric property to find the feature points. Thus, in this section, firstly, we discuss the state-of-the-art approaches of neighbourhood selection methods.
Secondly, we discuss existing methods for calculating the normal of a point. Finally, we discuss the existing methods for feature point extraction techniques.

Neighbourhood Selection
Most of the existing feature point extraction techniques use the geometric properties (e.g., curvature, discontinuity, and angle) of a point based on its k or r neighbourhoods in the input point cloud data. The classical Principal Component Analysis (PCA) can estimate the important geometric features of a point by collecting its k number of neighbours [23]. The minimal value of k needs to be chosen manually, but in practice, a single global k is often not suitable for an entire point cloud, where different objects in different regions may have different geometric structures or point densities [17,23]. A large value of k over-smooths the sharp feature points, while a small neighbourhood is more sensitive to local variations and noise [28].
To avoid these issues, some authors proposed adaptive approaches instead of using the fixed minimal neighbourhood. For example, Elong et al. [29] used a curvature-based adaptive neighbourhood selection technique to classify point cloud data. Considering the calculated curvature value of each point, the author divided an input point cloud into scatter and regular regions. After that, they selected adaptive values of k and r for scatter and regular regions, 10 ≤ k ≤ 50 and 0.5 m ≤ r ≤ 2.5 m, respectively, within fixed intervals to reduce the computational complexities. Weinman et al. [30,31] used a Shannon entropy-based [32] neighbourhood selection method to select k closest neighbours within a fixed predefined interval, where 10 ≤ k ≤ 100. For different values of k, they found different entropies for each point and finally chose the value k, which satisfied the minimum entropy. Wang et al. [23] proposed a self-attention-based normal estimation architecture, where they claimed that the network could select the minimal number of neighbouring points according to the local geometric properties that initially provided a large k number of neighbourhoods of a point. They applied a multi-head self-attention module that selects the neighbouring points softly according to the local geometric properties. However, this method worked at the expense of the high computational cost associated with the Convolutional Neural Network (CNN). Ben-Shabat et al. [28] used a point-wise multiscale 3D modified Fischer Vector (3DmFV) [33] representation to encode the local geometry (e.g., normal) of a point using a CNN. They found n subsets of points for each point in the original input point cloud. Each subset was referred to as a scale that contained a different number of points instead of fixed neighbours. The 3DmFV representation was then calculated for each scale. All 3DmFVs were provided as input to a CNN to find the normal of a point. Some authors used a multiscale neighbourhood selection approach for the classification of point clouds [27,34]. For example, Leichter et al. [27] introduced and applied a multiscale approach to recover the neighbourhood in an unstructured 3D point cloud. Both the k and r neighbourhoods were used at different scales with the aim to improve the classification accuracy. The major drawback of this approach is the high computational complexity. Besides this, these methods experience drawbacks from the so-called Hughes phenomenon [35], where the classification accuracy decreases for growing feature space dimensionality.

Normal Vector Calculation
The normal vectors of points in a point cloud are important geometric properties that have been widely used by many authors to find the fold and boundary points, and high-quality surfaces [17,21,24,36,37]. Although there are several methods for estimating normal vectors in a point cloud, they are mainly proposed for 3D geometric models that have less noise with high point densities and most of the models contain smooth surfaces. In the case of buildings in a typical urban environment, the situation is complex. LiDAR data often have low point density and nonuniform sampling rate and contain a lot of noise [25]. Therefore, the accurate calculation of normal vectors in this situation is challenging.
The literature on estimating normal vectors can be divided into two major approaches: combinatorial and numerical [37]. The combinatorial approach uses mainly Delaunay and Voronoi properties [38]. Although this approach can work in the presence of noise, in general, it becomes infeasible for large datasets. The numerical approach considers the neighbours of an interest point that may represent the surface locally, and the calculated normal of the surface is treated as the estimated normal of the point of interest. Finding a proper local neighbourhood and a best-fit plane for a point is the main issue in the numerical-based approach [30]. The PCA and its variations, for example, the Weighted PCA [39], Moving Least Square (MLS) Projection [40], Robust PCA (RPCA) [41], and Diagnostic-Robust PCA (DRPCA) [42], were used by different authors for calculating normal vectors by finding the best fitted plane. Considering specifically the oblique building point cloud data in urban environments, Zhu et al. [25] proposed an effective normal estimation method to handle the noise in building a point cloud through a local to global optimisation strategy. Instead of calculating the normal of individual points, they proceeded in a hierarchical fashion and merged similar points into supervoxels considering a planarity constraint to exclude outliers. Nurunnabi et al. [37] removed outliers using their proposed Maximum Consistency with Minimum Distance (MCMD) algorithms and then applied PCA to find normal vectors in a point cloud. Dey et al. [43] proposed an improved approach and solved the limitations of the MCMD to construct more accurate planes. Recently, Sanchez et al. [24] proposed a robust normal estimation technique through an iterative weighted PCA [39] and the robust statistical M-estimators [44]. In the weighted PCA, the neighbouring points are assigned different weights based on their distance to P i . Smaller distance points are assigned bigger weights in this approach. The M-estimators allow users to fit a model onto points by rejecting outliers. Chen et al. [17] proposed a method to extract the fold points based on the minimal number of clusters of the unit normal vectors using effective k-means clustering [45]. Using the k-neighbourhood, they calculated the normal vectors for each adjacent triangular plane constructed using any two points from the neighbours and the point itself. The directions of the unit normal vectors were calculated using a minimum spanning tree algorithm proposed by Guennebaud and Gross [46].
The major challenge in calculating the normal vectors for feature point extraction is selecting the minimal number of neighbouring points that directly influence the extraction process. The present literature mainly selects the minimal number of neighbours empirically, which is a manual process and does not consider the point cloud density and other factors. Besides this, the performance of these methods degrades in the presence of noise in the input point cloud data.

Feature Point Extraction
Existing 3D feature point extraction techniques can be broadly categorised into indirect (by converting 3D point clouds into images first) and direct (by extracting 3D edge points from the 3D-point cloud directly) approaches [11,47]. Indirect approaches take the advantages of traditional 2D edge point detection algorithms. The 3D point clouds are converted into 2D images first, and after that, the extracted lines or edge points from the 2D images are projected back to the 3D space [48][49][50]. Moghadam et al. [51] extracted feature points for the edge and boundary of an object to construct the 3D feature lines using corresponding 2D line segments of each part of the object. For each extracted 3D planar part, all of the 3D points were projected on a 2D plane [47]. Contours were extracted from the 2D segments and then projected back onto the 3D plane to obtain the 3D edge points and lines. These techniques fail to extract perfect 3D edges because of information loss during the 3D to 2D conversion and vice-versa and, thus, degrade the extraction accuracy [52].
The direct approach can also be subdivided by plane-based and direct point geometrybased approaches. Plane-based methods consider the intersections of two or more separate roof planes as the feature lines of a building. It is suitable because most buildings are a combination of different piecewise-planar roof planes [7,12,53]. Determination of planar surfaces is the main key point of this category. In this method, planar points are firstly separated from the non-planar feature points and then individual roof segments are extracted using different clustering and region growing approaches. Points of the intersecting roof plane segments are taken into consideration to form the feature lines [12]. For example, Ni et al. [16] proposed an easy-to-use 3D feature point extraction method namely Analyzing Geometric Properties of Neighbourhoods (AGPN). The authors defined the 3D edges as "3D discontinuities of the geometric properties in the underlying 3D-scene". They combined the RANdom SAmple Consensus (RANSAC) [54] and angular gap metric [55] to extract edge feature points. This method can extract two kinds of feature points, i.e., boundary elements and fold edges, which actually include all types of edges in a point cloud data. Although the plane fitting-based methods show good extraction results for the cases of fold points, most of the time, they show less accuracy for boundary point extraction [17]. Besides this, the extraction of feature points using existing plane fitting-based methods does not perform well as it loses the sharp features when the intersecting planar patches are too small [17,47].
The direct point geometry-based approach can detect both boundary and inside sharp feature points based on different geometric properties such as azimuth angles [17], normal directions [21,56], and curvature variations [21]. For example, Chen et al. [17] considered the directions of the calculated normals and azimuth angles. For a fold point, they considered the direction of all the unit normal vectors of adjacent triangular planes and aggregated them into two different clusters. The directions of the unit vectors in each of the groups are very close to each other but far from each other. To detect the boundary points, they considered the azimuth angles. Statistical approaches, such as covariance and eigenvalues, and the combination of features derived from the PCA were also used in some cases [18,37]. In this paper, we focus on the direct point geometry-based approach to avoid the problems of image and plane fitting-based methods and concentrate on accurate normal estimation to extract the feature points.
Many authors used the following 3 × 3 covariance matrix Cov[P, P] of a point P to extract the point cloud feature points as a direct approach. The features are calculated based on different combinations of eigenvalues (λ 1 ≥ λ 2 ≥ λ 3 ) and eigenvectors of Cov[P, P] [10,37,52,57]. Among different measures of feature point extraction from point-cloud data, the linearity and planarity measures are widely used [58]. These two properties of any point can be defined by Equations (1) and (2), respectively.
The main problem of feature point extraction based on the eigenvalues are the empirical determination of the thresholds and the selection of the neighbourhood [37].

Proposed Method
This paper suggests a new approach for selecting neighbouring points in the context of calculation of normal vectors and roof features. In this section, we present the proposed method for neighbourhood estimation followed by the proposed fold point extraction and boundary point detection.

Estimating Minimal Neighbourhood
The calculation of the normal vector is widely employed for extracting planes, boundaries, and edge features from LiDAR point cloud data. Most of the existing methods mainly considered high-density point cloud data with low noise, and their normal calculations are for 3D geometric models (e.g., statues and mechanical parts) having artificial or smooth surfaces. However, the nature of the aerial LiDAR data over an urban environment is different in the sense that they often contain noise, have low point density, and are heterogeneous in positions compared to the point cloud data over the 3D geometric models used by the existing methods.
The selection of the number of neighbouring points to calculate the normal vector is a great challenge. If we choose a high number of neighbours, then points from multiple planes can be aggregated (see Figure 1). If a low number of neighbouring points are chosen, they may be selected from a single straight scanline. In both cases, the calculation of the normal will be erroneous. While a small neighbourhood may be sensitive to the small variation in roof material, a large neighbourhood can attract outliers. In addition, the aerial LiDAR data come with a vertical accuracy of 15 to 30 cm. This accuracy issue can affect any size neighbourhood. Therefore, calculation of the normal using a dynamically selected minimal number of neighbourhood for each point is more suitable to circumvent these issues.
For each P i on a plane, the proposed neighbourhood selection method iteratively selects a larger neighbourhood and fits a 3D line L 3 to the neighbouring points. If all or most of these points are from the same scanline, then the line fitting error, for example, in terms of the standard deviation, will be low. In contrast, if they come from different scanlines, then the error will be high. A high error shows that the corresponding neighbourhood is large enough, it includes points from multiple scanlines and these points are minimal to form a normal of the plane. Figure 2 illustrates the flow diagram of the proposed neighbourhood estimation method that follows the steps below. Figure 2. The workflow of the proposed variable neighbourhood selection method: T d is the threshold, σ i is the standard deviation, ε is the distance error, and δ is the neighbourhood increment.

•
The proposed method first selects a minimal number of neighbouring points (say, k = 3 since a minimum of 3 points are necessary to calculate a plane normal) for P i using the k-NN algorithm. Let the set of neighbouring points be S p including the point P i . • A best fit 3D line L 3 is constructed using S p . The distance from each point of S p to L 3 is calculated. • The standard deviation σ i of the calculated distances is compared with a selected distance threshold T d . If σ i < T d , the value of k is increased (say, k = k + δ) and the procedure is repeated with the updated S p . Ideally, δ = 1 is set to iteratively find a minimal value of k for P i . However, to avoid a large number of iterations, δ = 5 is selected and, once a minimal k is found, a smaller minimal k is obtained by testing its previous δ − 1 values. T d is equal to the distance between two neighbouring points in the case of regular distribution of LiDAR points and can be calculated using Equation (3) according to Tarsha-Kurdi et al. [59], where ϑ represents the input point density. The mean area occupied by a single LiDAR point is in a square form, and the area of the square is equal to the inverse of the point density in a regular distributed point cloud data. The side length of the square represents the mean distance between two neighbouring points that satisfies Equation (3).
• If σ i ≥ T d , S p is the estimated minimal neighbourhood for P i . The green points in Figure 3a show that the above steps successfully define the minimal neighbourhood for all points on a building roof. However, when there are unexpectedly a large number of points residing along a portion of a scanline, then these steps fail to define the neighbourhood, as in this case, all or most of the points in S p are obtained from the same scanline using the k-NN algorithm (see Figure 3b). Since points are not selected from two or more scanlines, the 3D line is repeatedly formed on the scanline that offers a low σ i value. • To avoid the above issue, this paper proposes a new neighbourhood search procedure for P i (see Figure 3c). First, depending on the input point density ϑ, when the number of points in S p is larger than Aϑ, where A is the area of the smallest detectable plane, points that are very close (e.g., ε = 0.01 m) to L 3 are removed from S p (blue points remain). Second, a line L passing through P i and perpendicular to L 3 (scanline) is generated. Third, a new rectangular neighbourhood C 1 C 2 C 3 C 4 (green shaded in Figure 3c) for P i is formed. C 1 C 2 C 3 C 4 is long along L but short along L 3 , and thus, the idea is to reduce the neighbouring points from the current scanline (blue points) and to include more points from outside the scanline (green points) and even from the next scanlines (yellow points). Finally, only six points closest to four corners and two midpoints (C 1 , C 2 , C 3 , C 4 , M 1 , and M 2 ) from within C 1 C 2 C 3 C 4 and P i are assigned to an empty S p and σ i is again estimated to L 3 . If the condition (σ i ≥ T d ) is still not satisfied, the rectangle is enlarged (orange shaded) along L to include more points from outside L 3 , i.e., four more points closest to corners C 5 , C 6 , C 7 , and C 8 are added to S p . It is experimentally observed that, when (mostly in the second iteration) points from the next scanlines are considered in S p , the condition is satisfied. Figure 3d shows that all points on the roof now have minimal neighbourhoods.
Airborne LiDAR data over a building roof follow the pattern of specific scanlines. The threshold T d for the standard deviation σ i provides the guarantee that points from at least two scanlines are selected for a minimal neighbourhood irrespective of the point density (e.g., abrupt high or diluted points). Thus, the calculated normal will be accurate as a true plane can be formed using the selected minimal neighbouring points. The variable nature of the proposed neighbourhood estimation solves existing issues with the normal calculation in the literature.

Finding Fold Points
The weighted PCA algorithm [60] is adopted to calculate the normal at each input point P i . The points within the minimal neighbourhood estimated above for P i are used to calculate its normal. This paper proposes the following method to determine the fold points.
To decide if P i is a fold point, the maximum angle difference θ max between its normal and the normals of its adjacent neighbours is found. Adjacent neighbours are simply obtained by using the k-NN algorithm from the selected neighbours S p for each point P i . An alternative to find the adjacent neighbours may be the Delaunay triangulation [61]. We consider at least 8 adjacent neighbours in this case. If the selected adjacent points from S p are less than 8, then we select the remaining adjacent points from the original point cloud. Figure 4 shows some cases to decide fold points with k = 8. These cases are decided as follows by comparing θ max with an angle threshold T θ : • When two planes physically intersect, as shown in Figure 4a, and if θ max > T θ for P i (red dot) but θ max for its neighbours (green dots) can be clustered into two major groups, where the clusters are not close to each other, P i is a fold point. • When P i is a planar point, as shown in Figure 4b, θ max ≤ T θ for P i and all its neighbours. • When P i is on a curved surface, as shown in Figure 4c, θ max > T θ for P i and θ max for its neighbours are very close to T θ .
• When P i is on a step edge, as shown in Figure 4d, there can be one of two situations. The adjacent vertical plane may have no or a small number of points. When there are no points on the vertical plane, then the fold points may be completely undetermined if the two planes (top and bottom) are parallel. If there is a large slope difference between these two planes, then the case in Figure 4a applies and fold points will be determined. When there are points reflected from the vertical plane, the fold points (between vertical and top planes and between vertical and bottom planes) can also be determined using the case in Figure 4a.

Determining the Threshold T θ
To determine the threshold T θ for θ max , we consider points on a simple building from the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark dataset presented in Figure 5. The ground truth for the fold and planar points are manually generated. A total of 138 points are identified as the fold points, and the rest (3421) of the points are considered planar points.
We consider different fixed neighbourhoods (k = 9, 20, 30, 45, and 60) and the variable neighbourhood proposed in Section 3.1. For each neighbourhood considered, we estimate θ max for all the points. In Figure 5, these points are shown in different colours depending on their θ max values. As can be seen, while a small (k = 9) or large (k = 60) value of k misses many true fold points between the two building planes, a moderate value (k = 45) of k determines most of the fold points but also wrongly identifies many planar points as fold points. In contrast, the proposed variable neighbourhood determines most of the fold and planar points truly. Table 1 shows that the proposed neighbourhood offers better F1-score [53] than the fixed neighbourhoods when we consider θ max ≥ 20 • for the fold points.
We observe that θ max values of the ground truth fold points lie in the last two angle groups (20 • to 30 • and 30 • to 90 • ) of Table 1. The calculated F1-scores, considering only these two groups, as shown in the last row of the table, are highest among all possible combinations of groups of θ max for each neighbourhood. The planar points have θ max between 0 • to 20 • . Therefore, we decide T θ = 20 • .  To prove this selection of T θ is insensitive to the input point density, we selected some representative buildings from different datasets with different point densities (i.e., five buildings from the ISPRS Vaihingen area, five from Hervey Bay, and five from Aitkenvale areas; see Section 4.1) and a synthetic cube shape point cloud (see Section 4.3). Besides this, we also generated different point densities by resampling the original point cloud [62]. Figure 6 shows the average values of θ max for the fold points under different point densities. It can be observed that T θ = 20 • is a reasonable choice irrespective of different cases of point densities and datasets.

Detection of Boundary Points
We propose a simple but effective procedure for detection of the boundary points using the minimal neighbouring points S p for each P i . To make the decision about whether P i is on boundary, we first calculate the mean (S) of S p . Then, the Euclidean distance (d i ) fromS to P i is calculated. In practice, when P i is an inner point on the plane,S resides close to P i since the neighbouring points surround P i (see Figure 7a). However, when P i resides on the boundary, thenS resides away from P i (i.e., the boundary) since there are no neighbouring points outside the boundary. Therefore, we use a threshold to distinguish the boundary and non-boundary points. P i is considered a boundary point when d i ≥ T d 2 , where T d is the threshold calculated based on the density of point cloud using Equation (3). Figure 7 shows the detected boundary points on two different roof point clouds. In Figure 7b, the proposed method can also extract the proper boundary points because S p rejects the much closed points and accepts only the suitable points for P i , as described in Section 3.1 using Figure 3c.

Experimental Results
Focusing on the extraction of feature points over the building roof, the proposed methods are applied on real point cloud datasets. We chose the extracted buildings from two different datasets. The first one is the Australian benchmark [7], and the second one is the International Society for Photogrammetry and Remote Sensing (ISPRS) benchmark dataset [62,63]. Buildings from two different sites of Australian datasets and three sites of the ISPRS datasets were selected for the evaluation. To extract and justify the point clouds over the building roof, we employed the results of the extracted building roofs from Dey et al. [7,53]. Moreover, to demonstrate and compare the quantitative evaluation of our methods with the existing state-of-the-arts techniques, we manually generated the ground truth for two areas from the two different datasets. A short description of the datasets and the ground truth is given below followed by the results of the comparative analysis and outline generation. Finally, the applicability of the proposed methods are also demonstrated on the point cloud generated or captured by different media, such as, artificially generated or terrestrially scanned data.

Datasets
The Australian datasets contain three sites from two different areas: Aitkenvale and Hervey Bay. Two sites from Aikenvale area (AV) have high point densities 12 to 40 points/m 2 . The first one (AV1) covers a 66 m × 52 m area and contains 5 different buildings. The second site (AV2) contains 63 different buildings covering an area of 214 m × 149 m. Hervey Bay (HB) has 28 different buildings covering 108 m × 104 m area by a medium point density (12 points/m 2 ). The ISPRS benchmark datasets have three different sites from the Vaihingen area (VH), containing a total of 107 buildings with different complexities and shapes. Point density of the datasets is low, 2.5 to 3.9 points/m 2 .
The first site (VH1) of this area is mainly the inner city consisting of historic buildings. The second site (VH2) contains high-rising buildings, and the third site (VH3) is a residential area. Figure 8 shows the selected sites from two different datasets. It is hard to collect the ground truth from the point cloud data. Hence, we selected AV1 and VH3 only because these two sites are purely residential with detached houses. The roof of the buildings of these two sites contain multiple planes; thus, sufficient fold points exist for the ground-truth generation and evaluation. We manually categorised the roof points of each building into three different categories: fold edge points, boundary points, and inner planar points. For a fold edge point, we kept the point within a maximum distance from the intersection line: 0.15 m for AV1 and 0.3 m for VH3. These two distances are estimated based on the point density according to the Equation (3).

Comparison
To calculate and compare the processing time of the proposed neighbourhood selection method, we chose all buildings from both AV1 and VH3 sites. Table 2 compares and summarises the average processing time per building of the proposed method for each datasets with different fixed-size neighbourhood. For AV1, the proposed method takes an average of 0.232 s to find the minimal neighbourhood for all points in a building, whereas the k-NN algorithm takes 0.058 s considering k = 45. For VH3 sites, the average processing time for the proposed method is 3.120 s. The influence of the variation of k on the processing time is negligible, as shown in Table 2. Due to the existence of an abrupt point density in several buildings, the average processing time increases for VH3 datasets. For example, the building we considered from VH3 in Figure 7b takes 42.94 s to find its minimal neighbourhood for all points using our proposed method. Finally, despite the considerable difference between the processing time values between the fixed-size and proposed variable-size neighbourhood selection methods, the processing time for the proposed minimal variable-size neighbourhood selection approach is still in the acceptable range. All experiments were performed in MATLAB 2020 platform using an Intel(R) core(TM) i5-7500 CPU at 3.40 GHz with 16 GB RAM.  To demonstrate the comparison, the performance of the proposed fold point extraction and the proposed boundary point extraction using the proposed neighbourhood selection method is compared with the existing state-of-the-art methods. Both qualitative and quantitative results are presented and compared.

Fold Points
To compare the performance of the proposed fold point extraction method, we chose the AGPN method proposed by Ni et al. [16] and the fold point extraction method proposed by Chen et al. [17]. Because they are not publicly available, we implemented these methods using MATLAB to evaluate and compare their performance on our datasets. To ensure reproduction of the implementation, we carefully followed the original articles and checked the results using similar data. We obtained almost the same results. Moreover, we were provided with partial codes and some sample data from the authors of the original articles (e.g., Figures 16 and 17). The general comparison between these two and the proposed methods is summarised in Table 3. Figure 9 shows the extracted fold edge points using these two methods along with our proposed method for two sample buildings from AV1 and VH3 sites for a qualitative comparison. It is clearly visible that the proposed method identifies the points belonging to the fold edge better than the AGPN and Chen methods. The AGPN method slightly outperforms the Chen method. This is because the AGPN extracts the fold points based on a model fitting and region growing approach and, thus, is more suitable for the objects containing planar parts. However, both of the existing methods extract a lot of false-positive (FP) fold points. In this case, our proposed method extracts more precise and accurate fold points, which are clearly visible in Figure 9d,h.  Using the generated ground truth of the AV1 and VH3 area, we evaluated and compared the quantitative performance of these three methods. We considered the precision, recall rates, and F1-score [53] as quantitative measures. Lower precision rates of the existing methods in Table 4 indicate that their FP rates are higher than that of the proposed method. The AGPN has a higher true positive rate, but as it considers the intersection area of two different planes, a wider fold edge is selected; thus, a lot of FP points are selected. The Chen method can select a narrow edge, but at the same time, it misclassifies many true planar points as the fold edge, which leads to a lower precision rate. A lower recall rate of the Chen method for both datasets indicates a higher false negative rate, which is also visible in Figure 9c,g. The F1-scores show that the proposed method performs better than both of the existing methods.

Boundary Points
The performance of the proposed boundary point extraction method is compared with the recently proposed approach by Chen et al. [17] and the improved RANSAC method proposed by Ni et al. [16]. Table 5 summarises the comparison among the methods, and Figure 10 shows the visual comparison using two sample buildings selected from the VH3 and AV2 sites. The building in the first row in Figure 10 is selected from the ISPRS benchmark site (VH3) and has a low point density, while the building of the second row is selected from a high-density Australian site (AV2). Figure 10a,d represent the results of the Chen method, where we can see that some boundary points are missed on the bottom roof plane. One possible reason for misclassifying the boundary points is that the Chen method projects the neighbouring points into a 2D plane; thus, it cannot differentiate the boundary between two separate planes in the same building. Again, from Figure 10b,e, we can see that the improved RANSAC method can extract the boundary points well, but a lot of non-boundary points are also classified as boundary points and some true boundary points are missed. The probable reason behind the misclassification is that the angle between the two projected vectors has two different values, and sometimes, the method cannot choose the correct one [17]. Our proposed method is able to correctly extract the boundary points for these two buildings, as shown in Figure 10c,f. Though some false-positive points are noticeable, these are much lower than the improved RANSAC method, and there are very few missing true boundary points. Figure 10. Extracted boundary points by (a,d) the Chen method [17], (b,e) the improved RANSAC method [16], and (c,f) the proposed method. To demonstrate the quantitative comparison, the extracted boundary points of the buildings from the AV1 and VH3 areas are evaluated using the generated ground truth. Table 6 compares the results of the extracted boundary points by three different methods. The Chen method has a higher precision rate because the FP rate is lower, which means fewer inner points are identified as boundaries. Again, as the projection of 3D neighbourhood into 2D limits the detection of some true boundary points, lower recall rate is noticeable for both datasets by the Chen method. However, it performs better than AGPN and F1-score for both datasets. The proposed method has a higher recall rate and F1-scores. Thus, the overall performance of the proposed boundary extraction is much better than the state-of-the-arts. Although there are several eigenvalue-based features in the literature to classify the LiDAR points into the fold edge and non-edge points, for simplicity and to demonstrate the applicability of the proposed neighbourhood selection method, we have chosen the frequently used parameters linearity (L) and planarity (P). Equations (1) and (2) are used to calculate these two features. To find the eigenvalues (λ 1 , λ 2 , λ 3 ), the covariance matrix was constructed based on the information of the local neighbourhood. We demonstrate the qualitative performance of different fixed neighbourhoods against our proposed variable neighbourhood on a sample building from the ISPRS benchmark datasets.
In Figure 11, we calculated the linearity and planarity for each point. For a fair comparison, we considered a binary decision where green points indicate L ≥ 0.5 and red points indicate P ≥ 0.5. Blue points are considered as the original undecided points. It is clearly visible that, for low k values, both L and P found unexpected results (Figure 11a,b). A high k value considers the fold edge points as planar too (Figure 11e). It seems that, among all five k values, k = 45 (Figure 11c) shows an acceptable performance, where the proposed method allows almost similar performance for linear and planar points. Moreover, the proposed approach shows a clear distinction of the fold edge points (blue points in Figure 11f). To show the quantitative performance, F1 scores for the extracted linear and planar points are demonstrated in Table 7 for a different number of neighbouring points using the manually generated ground truth for the same building. Both fold and boundary points are considered as linear for simplicity.  Figure 12 shows the combined results of fold (blue), boundary (red), and planar (yellow) points for a sample complex building roof from the HB datasets. It is clearly visible that the combination of the proposed boundary, fold, and planar point extraction describes the building roof structure very well as it is almost similar to the reference 3D roof. Figure 13 shows the same for the AV1 dataset. Figure 14 also shows the same for some selected buildings from the AV2, HB, and VH areas. In all examples, a very small number of points, which are negligible as the total number of input points is a large number, are misclassified.

Applicability in Different Types of Point Clouds
In addition to the performance study using the real aerial point cloud data presented above, we also chose two other types of point cloud data. Firstly, a cubic object was selected as an example for artificially generated synthetic point cloud data [64] (Figure 15).
Secondly, a commercial building ( Figure 16) and a structure of "3S" (Figure 17) used by Chen et al. [17] were selected as representatives of terrestrial laser scanning (TLS) data [17], with an average density of 4000 points/m 2 . The name of the commercial building is "Computer World" situated next to Wuhan University, and the "3S" structure is about 7 m tall with several flat and curved components situated within the university area.  Combined fold (blue), boundary (red), and planar (yellow) points compared with the reference 3D building roof [7] for several buildings from the test datasets. While the first, third, and fifth rows show the reference information overlaid onto of the building roofs, the second, fourth, and sixth rows show the extracted results by the proposed method. Figure 15. Comparing results on a synthetic cube shape by the (a) AGPN [16], (b) Chen [17], and (c) proposed methods. Green represents planar points, and red represents fold points.

Figure 16.
Comparing results on building data by the (a) AGPN [16], (b) Chen [17], and (c) proposed methods. Green indicates the planar points, and red represents both the boundary and the fold edge points.
In both of the cases, all thresholds of the proposed methods are selected in the same way described in Section 3. For the cube shape, Figure 15 compares the results by the three methods. To evaluate the quantitative performance, we calculate the number of actual fold edge points and then evaluate the different methods, which is demonstrated in Table 8. The total number of points and the true fold edge points in this shape are 9602 and 484, respectively. We chose a neighbourhood size of 20 for both the AGPN and Chen methods.  Figures 16 and 17 show the results on the building TLS data by the three methods. We can see that the existing methods extract a lot of false fold and boundary points. However, the proposed method contains less false fold and boundary points. Tables 9 and 10 demonstrate the quantitative comparison between three methods in terms of total extracted feature (boundary and fold) points for "Computer World" building and the "3S" structure, respectively. As ground truths are not available for these two structure, we follow the comparison technique used by Chen et al. [17] in this situation. The extraction rate for each structure depicts the performance. For both of the TLS datasets, we chose a neighbourhood size of 30 for the AGPN and Chen methods. Figure 17. Comparing the results on the "3S" structure of Wuhan University: (a) original point cloud, (b) extracted feature points using AGPN [16], (c) extracted feature points using Chen's method [17] (d) extracted feature points using the proposed methods. the (e) extracted feature and planar points. Green indicates the planar points, and red represents both the boundary and the fold edge points.

Conclusions
This paper proposes an approach for selecting the minimal variable neighbourhood for airborne LiDAR point cloud data over the building roof. The proposed approach solves the problem of accurate normal estimation for finding the fold edge points. To extract the boundary, an effective boundary point selection method is also proposed using the suggested neighbourhood selection method. The proposed neighbourhood selection method is independent of various point densities and the calculation of normal vectors are not influenced by the heterogenic distribution of the point cloud. Using the generated ground truth for the two selected areas from the ISPRS and Australian benchmark datasets, we show the applicability and performance of the proposed method. Two different types of point cloud data, such as, artificially generated and terrestrially scanned, are also tested using the proposed methods.
In this research, we focused mainly on feature point extraction from point cloud data, which follow a specific scanline pattern. Thus, the methods are mainly demonstrated and most of the experiments are performed on the standard benchmark data of the building roof point cloud. We considered the building roof point cloud extracted from the original point cloud datasets in this research. Vegetation, outliers, and other objects were removed using our previously developed methods of building extraction. However, integrating some machine learning techniques may improve these proposed methods in terms of selecting manual thresholds. Tracing feature lines from the extracted feature points is the next step of 3D reconstruction. In the future, we will investigate the incorporation of machine learning techniques to extract the feature points and an effective feature line tracing algorithm to regularise the extracted feature points. Moreover, the applicability of the proposed methods will also be investigated in different application areas such as 3D modelling of indoor objects.