Classification and Segmentation of Mining Area Objects in Large-Scale Sparse LiDAR Point Cloud Using a Novel Rotated Density Network

The classification and segmentation of large-scale, sparse LiDAR point clouds with deep learning are widely used in engineering surveying and geoscience. The loose structure and the non-uniform point density are the two major constraints on utilizing sparse point clouds. This paper proposes a lightweight auxiliary network, called the Rotated Density-based Network (RD-Net), and a novel point cloud preprocessing method, the Grid Trajectory Box (GT-Box), to solve these problems. The combination of RD-Net and PointNet was used to achieve high-precision 3D classification and segmentation of sparse point clouds, emphasizing the importance of the density feature of LiDAR points for 3D object recognition in sparse point clouds. Furthermore, RD-Net plus PointCNN, PointNet, PointCNN, and RD-Net alone were introduced as comparisons. Public datasets were used to evaluate the performance of the proposed method. The results showed that RD-Net can significantly improve the sparse point cloud recognition performance of coordinate-based networks, raising the classification accuracy to 94% and the per-point segmentation accuracy to 70%. Additionally, the results indicate that point-density information has an independent spatially-local correlation and plays an essential role in sparse point cloud recognition.


Background
Light Detection and Ranging (LiDAR) has been widely used for its fast, convenient, and non-contact measurement advantages [1][2][3]. The point cloud is the direct output of a LiDAR sensor and is a detailed expression of all objects detected by the sensor. Classification and segmentation of large-scale point clouds are substantial topics in geodesy and remote sensing [4][5][6][7][8][9][10][11]. The point cloud has a property called spatially-local correlation, which refers to the dependency relationships within the point cloud [12]. LeCun et al. [13] proposed that mining the spatially-local correlation of an object is the key to classification and segmentation through deep learning. Many researchers [14][15][16] have developed various complex network structures to improve the precision of point cloud classification and segmentation by mining this spatially-local correlation. However, high precision requires complex networks, substantial computing parameters, and extended computing time. This complexity turns LiDAR from a light, real-time measurement technology into heavy and time-consuming data analysis work.

According to the LiDAR specification [17], the point-density is defined as the number of points per unit of sampled space. Commonly the point-density is given per cubic meter (pts/m³). For engineering surveying, sparse, low, medium, and high density point clouds are defined as point-density < 2 pts/m³, (2,7] pts/m³, (7,10] pts/m³, and > 10 pts/m³, respectively. In order to reduce computational costs and improve efficiency, many researchers have applied sparse point clouds for classification or segmentation using deep learning technology [18,19]. However, applying a sparse point cloud reduces the amount of data and blurs the shape of objects. For a large-scale sparse point cloud, as the point-density decreases, the shape of the point cloud becomes unclear, the structural features become confused, and the spatially-local correlation becomes difficult to find. Two major constraints on using the sparse point cloud are summarized as:

1. Loose structure: As shown in Figure 2, as the point cloud becomes sparse, the structural features (red dotted line) of the objects become loose. This phenomenon makes it difficult to extract the spatially-local correlation from the sparse point cloud and apply deep learning methods.

2. Non-uniform point-density: The distribution of the point cloud is not uniform, since the point cloud is scanned from different scanning stations. The point cloud data of the panoramic mining area were obtained by multi-station stitching, i.e., by registering the point clouds of the different scanning stations in a unified coordinate system. Because Terrestrial Laser Scanning (TLS) scans objects through sensor rotation, the point cloud has a scattering characteristic: the point-density is high when the object is close to the sensor, and vice versa. Consider an area containing many similar objects, such as the grove in Figure 1. When a TLS station is closer to the objects, the density is greater, as shown in Figure 1a; when another TLS station is farther from the objects, the density is relatively small, as shown in Figure 1b. Therefore, after stitching, the non-uniform point-density, i.e., the difference in point cloud density of the same object in the overlapping area, becomes obvious, as shown in Figure 1c.

Besides these two constraints, the points theoretically maintain their relative positions to each other when the point cloud becomes sparse. This spatially-local correlation reflects the structural features of the object, which can be used to facilitate classification and segmentation.


Related Work
The development of 3D convolutional neural networks (3D CNNs) has promoted the semantic analysis of point clouds. Standard convolutional neural networks (CNNs) require dense input representations on uniform grids, but point clouds are sparse and irregular in 3D space. To overcome this, early researchers used volumetric [20][21][22], multi-view [6][23][24][25], or other feature representations to first build 3D models. In recent years, the trend has shifted to using raw 3D data directly [26][27][28][29][30][31]. Therefore, we divide these approaches into two groups: feature-based and coordinate-based networks.

Feature-Based Network
Earlier 3D feature-based networks were based on 2D image convolutional models. Many algorithms transform the point cloud into features first and then feed them into established CNNs, for example by transforming the point cloud into regular 3D grids [20,21] or projecting it into a 2D view format [23]. However, these approaches generate a huge volume of unnecessary data and lead to high computational cost during training. To enable efficient convolutions, data structures like octrees [32] and voxels [33,34] are utilized, with sophisticated strategies to avoid redundant computations. The methods in [35,36] encode each nonempty voxel with six statistical quantities derived from all the points contained within the voxel. VoxelNet [34] designs an end-to-end voxel feature encoding layer to learn a discriminative feature representation from the point cloud and predicts accurate 3D bounding boxes. PointPillars [37] encodes features on vertical columns of the point cloud to predict 3D oriented boxes for 3D object detection.
Recently, more and more networks [15][38][39][40] have used multi-feature or multi-modal network models to compensate, to some extent, for the deficiencies of feature transformation.

Coordinate-Based Network
Coordinate-based networks are end-to-end 3D convolutional models that are applied directly to the point cloud. PointNet [26] is the pioneer in the direct use of raw point clouds for 3D semantic analysis. It makes skillful use of a max-pooling layer, a symmetric function whose output is unaffected by the order of the points. Max-pooling aggregates point features into a global feature vector in an order-invariant manner. Although the max-pooling idea has proven effective, it lacks the ability to encode local structures with varying density. Further improvement followed in PointNet++ [27], in which the weak performance of PointNet in fine-scale segmentation tasks was addressed by multi-scale extraction. PointCNN [12] uses a coordinate transformation of the point cloud to simultaneously weight and permute the input features. SO-Net [29] explicitly utilizes the spatial distribution of the input point cloud during hierarchical feature extraction. EdgeConv [30] uses a dynamic graph CNN operator to better capture the local geometric features of the point cloud while maintaining permutation invariance. However, these networks all focus on the analysis of dense point clouds scanned in small-scale scenarios such as indoor spaces (ModelNet40, ScanNet [41]). The application of point clouds in large-scale scenes has not been well investigated.

Contributions
This paper proposes a data preprocessing method, the Grid Trajectory Box (GT-Box), and a lightweight density feature-based (density-based) network, the Rotated Density Network (RD-Net), to analyze large-scale sparse point clouds. The remainder of the paper is organized as follows: the formulation of the GT-Box preprocessing and RD-Net is introduced in Section 2. In Section 3, the proposed method is tested and compared with existing methods. The results and discussion are presented in Section 4. Finally, conclusions are offered in Section 5.

Methodology
First, we introduce the GT-Box preprocessing, which divides the input sparse point cloud into K-Dimensional-Tree blocks (KD-Tree blocks) as training data, in Section 2.1. Then, our network, the Rotated Density Network (RD-Net), is described in Section 2.2. Finally, in Section 2.3, the most efficient semantic analysis method, RD-Net+PointNet, is applied to achieve high-precision classification and segmentation of large-scale sparse point clouds. The overall information flow of RD-Net+PointNet is shown in Figure 3. RD-Net+PointNet is a multi-feature (structural feature, global feature, local feature) network combining the GT-Box preprocessing, the density-based network RD-Net, and the coordinate-based network PointNet. The complete information flow is described in detail below.

GT-Box Preprocess
The GT-Box preprocessing is developed to maintain the spatial structure of the point cloud and to transform the large-scale point cloud into a standard input size without any overlap areas. Through the GT-Box preprocessing, the point cloud is separated into many regular KD-Tree blocks, which facilitates computing high-dimensional information in a convolution subsampling operator. Specifically, the GT-Box preprocessing includes two steps: building GT-Boxes and generating KD-Tree blocks based on K-Nearest Neighbors (KNN). The purpose of building GT-Boxes is to fix the search range of the KNN algorithm and prevent it from getting stuck in a local area when finding nearest neighbors in the second step. In Step 2 (generating KD-Tree blocks), we first set an appropriate search radius R and stipulate that each round of search must traverse all points in the current GT-Box, making sure that no points remain within distance R of the previous point before entering the next GT-Box. As the point spacing in a large-scale sparse point cloud varies considerably, omitting Step 1 (building GT-Boxes) can cause a failure in which the KNN algorithm searches only within a certain local area and cannot jump onward when the point spacing is greater than the search radius R, as shown in Figure 4.


Building GT-Boxes
A GT-Box is a traversal box that divides the point cloud into boxes of the same size. When choosing the dimensions of the GT-Boxes, we should avoid, as much as possible, the fault in which point clouds gather at opposite edges of a GT-Box with an obvious gap in the middle; the S-type traversal rule is used to identify the boxes, as shown in Figure 5a. For example, the large-scale point cloud of the mining subsidence basin scene in Section 3 has a size of 1342.46 × 1245.2 × 85.75 m³. Among the objects we want to classify or segment are many strip structures, such as buildings and trees. Therefore, we also give the unit GT-Boxes a strip structure. The length and width of the unit GT-Box were defined as one-hundredth of the boundary of the sparse point cloud, and the height as one-fifth of the height of the sparse point cloud, i.e., 13.42 × 12.45 × 17.15 m³. With this operation, the point cloud is divided into a number of fixed search regions, named GT-Boxes. The identification of the GT-Boxes is used in the sampling process of the next step.
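The partition rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assigns every point an S-type (boustrophedon) traversal ID given the 1/100 × 1/100 × 1/5 box sizing; the function name and grid parameters are illustrative.

```python
import numpy as np

def build_gt_boxes(points, nx=100, ny=100, nz=5):
    """Assign each point a GT-Box traversal ID. Box length/width are 1/100
    of the cloud's xy extent and box height is 1/5 of its z extent, as in
    the text; names and defaults are illustrative assumptions."""
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    size = (maxs - mins) / np.array([nx, ny, nz])
    # integer box coordinates per point (clip so boundary points stay in range)
    idx = np.clip(((points - mins) / size).astype(int), 0, [nx - 1, ny - 1, nz - 1])
    ix, iy, iz = idx[:, 0], idx[:, 1], idx[:, 2]
    # S-type traversal: reverse the x order on every other y row
    ix_s = np.where(iy % 2 == 0, ix, nx - 1 - ix)
    return (iz * ny + iy) * nx + ix_s  # one traversal ID per point

# usage: group points by box ID before the KNN step
pts = np.random.rand(1000, 3) * [1342.46, 1245.2, 85.75]
box_ids = build_gt_boxes(pts)
```

Grouping the point indices by `box_ids` then yields the fixed search regions for the KD-Tree block generation step.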

Generating KD-Tree Blocks
The KNN clustering algorithm is used to divide each GT-Box into many KD-Tree blocks. First, we take the first point in the box as the starting point and search for the nearest K points. The first K-1 points and the starting point are combined as a KD-Tree block. Then, the Kth point is used as the starting point for the next combination. When the number of points in the box is less than the threshold K (K depends on the number of points input to the KNN), the processing of this GT-Box is finished.

After this operation, all objects in the sparse point cloud are divided into many KD-Tree blocks, which have the same number of points but different spatial sizes, as displayed in Figure 5b,c. If the structure of an object is simple and the point-density is uniform, the KD-Tree block is large and the point distribution is uniform. On the other hand, if the structure is complex and the point-density is uneven, the KD-Tree block is small. Through the GT-Box preprocessing, we separate the sparse point cloud into training samples in a standard format while maintaining the structural features of the objects.
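The block-generation loop described above can be sketched as follows, assuming the points of one GT-Box as input. Brute-force distance computation stands in for a real KD-tree/KNN query, and the function name and default K are illustrative, not taken from the paper's code.

```python
import numpy as np

def generate_kd_tree_blocks(points, k=4):
    """Sketch of Step 2: combine a starting point with its K-1 nearest
    remaining points into a KD-Tree block, then continue from the
    Kth-nearest point, stopping when fewer than K points remain."""
    remaining = set(range(len(points)))
    blocks = []
    start = 0
    while len(remaining) > k:
        cand = np.array(sorted(remaining - {start}))
        order = cand[np.argsort(np.linalg.norm(points[cand] - points[start], axis=1))]
        block = [start] + order[:k - 1].tolist()   # starting point + its K-1 nearest
        blocks.append(block)
        start = int(order[k - 1])                  # the Kth point seeds the next block
        remaining -= set(block)
    return blocks

# usage: each block has exactly k point indices, with no overlap
blocks = generate_kd_tree_blocks(np.random.rand(20, 3), k=4)
```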

RD-Net
RD-Net is a lightweight density-based network. It combines two key units: the point-density unit and the rotation unit. This network enhances the spatially-local correlation by extracting structural features from point-density information. The point-density information reflects the relationships among the points.
Specifically, RD-Net operation includes the rotation unit, the density unit, and the lightweight training layers, as shown in Figure 6.


Rotation Unit
An object usually consists of several KD-Tree blocks, as shown in Figure 5c. The rotation unit changes the density information of the object by rotating the KD-Tree blocks by random angles about the rotation axis (the Z-axis in Figure 7). Specifically, in Figure 7, we use color to divide the point cloud into structure points (whose distribution is consistent with the rotation axis, blue points) and non-structure points (whose distribution is inconsistent with the rotation axis, red points). When the data pass through the rotation unit, the point-density of the structure points changes little, because their distribution is consistent with the rotation axis, as shown by the blue curve in Figure 7. On the contrary, rotation of the non-structure points causes dramatic changes in the point-density distribution, as shown by the red curve in Figure 7. Based on this phenomenon, the delta of the point-density distribution before and after the rotation can be used to identify the structural feature.
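A minimal sketch of the per-block Z-axis rotation follows. Each block gets its own random angle: within a block the geometry is preserved (a rotation is rigid), but neighbourhoods that span differently rotated blocks are rearranged, which is what changes the point-density of non-structure points. Function and parameter names are illustrative.

```python
import numpy as np

def rotate_blocks_z(blocks, rng=None):
    """Rotate each KD-Tree block by its own random angle theta about the
    Z axis, a sketch of the rotation unit (not the paper's exact code)."""
    rng = np.random.default_rng() if rng is None else rng
    rotated = []
    for block in blocks:                      # each block: (k, 3) array of xyz
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        rot = np.array([[c, -s, 0.0],
                        [s,  c, 0.0],
                        [0.0, 0.0, 1.0]])    # R(theta): Z-axis rotation matrix
        rotated.append(block @ rot.T)
    return rotated

# usage: z coordinates and distances to the origin are preserved per block
block = np.random.rand(10, 3)
out = rotate_blocks_z([block], rng=np.random.default_rng(0))[0]
```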


Density Unit
According to [13], the external shape feature and the internal structure feature of an object are independent. In order to separate the two, we define the point-density information by calculating the point-density of each point and organizing the result into a one-dimensional matrix.

To calculate the density of each point, the radius parameter R of the region needs to be selected properly, as shown in Figure 8. Through the k-query algorithm [42], we traverse each point in the input KD-Tree block sample and calculate the number of points n in the spherical volume V with radius R, together with their relative distances {d} from the center point. Dividing n (pts) by V (m³) gives the point-density information used as the input data; dividing {d} (m) by n (pts) gives the distance information used as the bias of the CNN layers in RD-Net. The point-density and the volume of the unit sphere are as shown in Equations (1) and (2):

Density(P_i) = n / V, (1)

V = (4/3)πR³, (2)

Through the density unit, we divide the points in the KD-Tree blocks into two types according to the point-density: structural points and non-structural points. Structural points lie at the intersections of object structures and have high, uneven point-density. Non-structural points carry a single structural feature and have uniform point-density. Therefore, similar objects have similar structural features.
where P_i is the ith point in the KD-Tree block, n represents the number of adjacent points found by the k-query algorithm, and R is the radius of the unit sphere.
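The per-point density and bias described above can be sketched as follows. A brute-force pairwise distance matrix stands in for the k-query algorithm; names are illustrative, not the paper's implementation.

```python
import numpy as np

def point_density(points, radius):
    """Density unit sketch: count the n neighbours of each point inside a
    sphere of radius R, return density = n / V with V = (4/3)*pi*R^3, and
    the mean neighbour distance {d}/n used as the bias term."""
    volume = 4.0 / 3.0 * np.pi * radius ** 3
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    inside = (dist <= radius) & (dist > 0.0)        # neighbours, excluding self
    n = inside.sum(axis=1)
    density = n / volume                            # pts per cubic metre
    mean_d = (dist * inside).sum(axis=1) / np.maximum(n, 1)
    return density, mean_d

# usage: three collinear points spaced 1 m apart, R = 1.5 m
density, mean_d = point_density(np.array([[0., 0., 0.], [1., 0., 0.], [2., 0., 0.]]), 1.5)
```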


The Implementation of RD-Net
First, the point cloud is rotated in the rotation unit, as shown in Equations (3) and (4):

R(θ) = [cos θ, −sin θ, 0; sin θ, cos θ, 0; 0, 0, 1], (3)

R(x,y,z) = P(x,y,z) · R(θ), (4)

where R(θ) is the rotation matrix with rotation angle θ about the Z-axis, P(x,y,z) represents the input point cloud, and R(x,y,z) represents the rotated point cloud.

Then, the two sets of point clouds, before and after the rotation, are transferred to the density unit to calculate the point-density, as shown in Equation (5):

P_d = [Density(P(x,y,z)), Density(R(x,y,z))], (5)

where P_d represents the matrix of point-cloud densities. Finally, the point-density information is transferred into the lightweight convolution layers for training, as shown in Equations (6) and (7):

F = mlp(P_d), (6)

P = g(F), (7)

where F is the high-dimensional per-point information produced by the network, P is its pooled output, mlp represents the multi-layer perceptron, and g is the symmetric function of max-pooling.
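The density-to-feature pipeline of Equations (5)-(7) can be sketched numerically. This is a hedged illustration: the weights `w1`, `w2`, layer widths, and ReLU activations are stand-ins for the trained lightweight layers, which the paper does not specify here.

```python
import numpy as np

def rd_net_forward(p_density, r_density, w1, w2):
    """Sketch of Equations (5)-(7): stack the densities of the original and
    rotated clouds into P_d, lift each point with a shared two-layer
    perceptron (mlp), then max-pool (the symmetric function g) into one
    structural feature per block. Weights are illustrative placeholders."""
    p_d = np.stack([p_density, r_density], axis=-1)   # (N, 2): Equation (5)
    h = np.maximum(p_d @ w1, 0.0)                     # shared per-point mlp, layer 1
    h = np.maximum(h @ w2, 0.0)                       # shared per-point mlp, layer 2 (Eq. (6))
    return h.max(axis=0)                              # g: order-invariant max-pool (Eq. (7))

# usage: the pooled vector is the block's structural feature
rng = np.random.default_rng(0)
w1, w2 = rng.standard_normal((2, 16)), rng.standard_normal((16, 32))
p, r = rng.random(128), rng.random(128)
feat = rd_net_forward(p, r, w1, w2)
```

Because `g` is symmetric, permuting the points leaves the structural feature unchanged, which is the order-invariance property the text relies on.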


Semantic Analysis with RD-Net+Coordinate-Based Network
We combine the result from RD-Net with the PointNet to perform the semantic analysis. PointNet is an end-to-end network that directly analyses point cloud, based on coordinates, which transforms coordinate information into three different features (Global Feature, Point Feature, and Local Feature), as shown in Figure 3.
Through RD-Net and PointNet, we obtain four distinct features: the global feature (B, 1024), the point feature, the local feature (B, N, 256), and the structural feature (B, 1024). The global feature is large-scale and represents the entire spatial shape of the KD-Tree blocks, while the local feature is small-scale and represents the shape of a local region of the KD-Tree blocks. The structural feature comes from RD-Net; it is large-scale and captures the topological relationship of the entire sample block. We duplicated the structural feature and the global feature N times each, to form matrices of shape (B, N, 1024), and then stacked them with the local feature as input data (B, N, 2304) for classification or segmentation in the concatenate layer, as shown in Figure 3. We then applied different training layers to these two kinds of tasks, as shown in Figure 9. For classification, a network with sufficiently strong feature-analysis capabilities performs steadily; the entire input is classified into different categories. Through the multi-layer perceptron (MLP), the high-dimensional information (2048) is converted into a prediction over the C categories of the input point cloud, and the output is a C × 1 one-hot vector, with C representing the number of classification categories. Segmentation, in contrast, requires sensitivity to changes in the features: every point in a KD-Tree block must be predicted, so we labeled every point in the input separately. Through CNN layers and the MLP, the output is a B × N × C matrix, where N is the number of points.
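The tiling and concatenation in the concatenate layer can be sketched as follows; the batch size and point count are placeholder values, and random arrays stand in for the real network features.

```python
import numpy as np

B, N = 4, 1024  # illustrative batch size and points per KD-Tree block
global_feat = np.random.rand(B, 1024)       # PointNet global feature
structural_feat = np.random.rand(B, 1024)   # RD-Net structural feature
local_feat = np.random.rand(B, N, 256)      # PointNet local feature

# Duplicate the two block-level features N times so every point carries them,
# then concatenate with the per-point local feature along the channel axis.
g = np.repeat(global_feat[:, None, :], N, axis=1)      # (B, N, 1024)
s = np.repeat(structural_feat[:, None, :], N, axis=1)  # (B, N, 1024)
fused = np.concatenate([g, s, local_feat], axis=-1)    # (B, N, 2304)
```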

Datasets
The point clouds of two scenes, a rural scene and a large-scale mining subsidence basin scene, are used as the experimental data to evaluate the proposed method. The data of each scene are divided into different point-density scales by down-sampling.
(1) Rural scene: We selected rural scene data from Semantic3D [43]. Through down-sampling, the data are processed into four point-density levels: high, medium, low, and sparse density point cloud, as shown in Table 1. The object labels cover 8 categories: man-made terrain, natural terrain, high vegetation, low vegetation, buildings, hard scape, scanning artifacts, and cars.
(2) Large-scale mining subsidence basin scene: The measured point cloud of the Shandong mining area in Ordos, China, is used as the data set. The original point cloud was obtained by multi-station registration, using a Riegl VZ-4000 terrestrial laser scanner (TLS) with high-resolution 300 MHz scanning. Three surveys of the same mining area, spanning four months from summer to autumn (28 May 2018, 3 July 2018, and 10 September 2018), were used. The data were sub-sampled into four point-density levels, high, medium, low, and sparse density point cloud, as displayed in Table 1. The point cloud of the mining area mainly contains terrain, vegetation, and building categories. The sizes of the files are also shown in Table 1.
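The paper does not state which sub-sampling algorithm produced the four density levels; a minimal hedged sketch is uniform random sub-sampling to a target density (a voxel-grid filter would give a more spatially uniform result). The function name and parameters are illustrative assumptions.

```python
import numpy as np

def downsample_to_density(points, target_density, area_m2, seed=None):
    """Randomly subsample a cloud to roughly `target_density` points per square meter.

    `area_m2` is the horizontal footprint of the scene; both the random strategy
    and the interface here are assumptions, not the paper's procedure.
    """
    rng = np.random.default_rng(seed)
    n_keep = min(len(points), int(target_density * area_m2))
    idx = rng.choice(len(points), size=n_keep, replace=False)
    return points[idx]
```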

Implementation Details
We used the same framework and training strategy to perform independent experiments on the four density types of point clouds (high, medium, low, and sparse). Each density type of data has two different scenes of point cloud, rural and mining subsidence basin. In the GT-Box preprocessing phase, we divided the point cloud into GT-Boxes, then generated KD-Tree blocks based on those GT-Boxes as experiment samples. In particular, we applied a unit GT-Box with dimensions of 2.46 × 2.25 × 12.89 m³ for the rural scene and 13.42 × 12.45 × 17.15 m³ for the mining subsidence basin scene, respectively. We divided the rural scene and the mining subsidence basin scene point clouds into 2500 GT-Boxes and 9400 GT-Boxes, respectively, for each density type. The KD-Tree blocks generated by these GT-Boxes were used as the samples for our experiment, as shown in Figure 10. In order to observe more objectively the influence of the different density types of point cloud on our proposed network, we selected 33,200 KD-Tree blocks (10,000 from the rural scene and 23,200 from the mining subsidence basin scene) for each density type of point cloud as experimental samples. For classification tasks, each sample is labeled with a one-dimensional one-hot vector ([1, 3] for the mining subsidence basin scene, [1, 8] for the rural scene). For segmentation tasks, each sample is labeled with an n-dimensional one-hot vector, where n represents the number of points ([n, 3] for the mining subsidence basin scene, [n, 8] for the rural scene). We then divided the data into 25,000 training samples and 8200 test samples by random sampling, according to a ratio of approximately 3:1. Before the training phase, we used the rotated unit to compare the change of the point cloud density before and after rotation, on a small part of the point cloud samples of each category, to find the most suitable radius parameter R. The loss function is the multi-category cross entropy.
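The GT-Box division can be approximated by an axis-aligned grid of fixed-size boxes; the real GT-Box also follows the scanning trajectory, which this hedged sketch omits.

```python
import numpy as np

def grid_boxes(points, box_size):
    """Group points into axis-aligned grid cells of size (dx, dy, dz).

    Returns a dict mapping a cell index tuple to the points inside that cell.
    Each non-empty cell would then be resampled into a KD-Tree block.
    """
    mins = points.min(axis=0)
    cells = np.floor((points - mins) / np.asarray(box_size)).astype(int)
    boxes = {}
    for key, p in zip(map(tuple, cells), points):
        boxes.setdefault(key, []).append(p)
    return {k: np.asarray(v) for k, v in boxes.items()}
```

For the rural scene, `box_size` would be (2.46, 2.25, 12.89); for the mining subsidence basin scene, (13.42, 12.45, 17.15).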
The experimental tests were conducted on a computer equipped with a 64-bit Intel Core i5-6300HQ CPU at 2.3 GHz, a GeForce GTX 950M GPU, and 12 GB RAM, running the Ubuntu 18.04 operating system. All proposed methods were implemented using Tensorflow-GPU 2.0.0. In our proposed method, the activation function of the hidden layers is ReLU, the activation function of the output layer is SoftMax, and the training process uses the Adam optimizer with an initial learning rate of 0.005. The learning parameters were optimized using dropout, batch normalization, and an exponentially decaying learning rate.
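The exponentially decaying learning rate mentioned above can be written as a simple schedule. The text gives only the initial rate of 0.005, so the decay step count and decay rate below are placeholder assumptions.

```python
def exp_decay_lr(step, initial_lr=0.005, decay_steps=20000, decay_rate=0.7):
    """Exponential decay: lr = initial_lr * decay_rate ** (step / decay_steps).

    decay_steps and decay_rate are assumed values, not the paper's settings.
    """
    return initial_lr * decay_rate ** (step / decay_steps)
```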

Design of Experiments
To evaluate the performance of RD-Net and the impact of point-density information on the deep learning process, three series of experimental tests were conducted.
For classification, the running time and classification accuracy were recorded for each epoch. The classification accuracy is equal to the number of correct predictions divided by the total number of samples, and the average accuracy over samples is used to evaluate the classification task. For segmentation, the accuracy of a block is equal to the number of correctly predicted points in the block divided by the total number of points in the block. The average segmentation accuracy and the mean Intersection over Union (mean-IoU) are used to evaluate the segmentation task performance.
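The two evaluation metrics can be sketched directly from their definitions; the handling of classes absent from both prediction and label (skipped here) is an assumption.

```python
import numpy as np

def block_accuracy(pred, label):
    """Fraction of points in a block that are predicted correctly."""
    return float(np.mean(pred == label))

def mean_iou(pred, label, num_classes):
    """Mean Intersection over Union over classes with a non-empty union."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (label == c))
        union = np.sum((pred == c) | (label == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```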

Experiment 1: Performance Evaluation on RD-Net
The proposed RD-Net is an auxiliary density-based network. It can be used efficiently in the mining scene: by quickly identifying the density characteristics of the point cloud and combining them with a pre-trained coordinate-based network, such as PointNet or PointCNN, it can achieve an efficient recognition level for large-scale sparse point clouds. Based on this view, we combined RD-Net with the currently popular end-to-end coordinate-based networks, PointNet and PointCNN, to form RD-Net+PointNet and RD-Net+PointCNN. We designed two experiments (E1.1, E1.2) to analyze the performance of RD-Net and the properties of point-density information from different perspectives.
(E1.1) We applied five different networks (PointNet, PointCNN, RD-Net, RD-Net+PointNet, and RD-Net+PointCNN) in comparative classification and segmentation experiments. Classification accuracy and mean-IoU were used to evaluate the performance. All five methods were applied to the sparse point cloud of both scenes. Then, the best performing network in this experiment, RD-Net+PointNet, was applied to the mining subsidence basin data set, to compare the performance on classification.
(E1.2) Different combination modes of the four internal features of RD-Net+PointNet are tested in this experiment, with the goal of identifying the importance of each feature. The combination modes, shown in Figure 11, are: G-L, the Global Feature + Local Feature mode; S-G-L, the Structural Feature + Global Feature + Local Feature mode; G-S, the Global Feature + Structural Feature mode; and G-H, the Global Feature + Hybrid Feature mode.

Experiment 3: Performance Evaluation on Different Density Type of Point Cloud
To verify the impact of different data volumes on efficiency, we applied the same training strategy to classify the four different density types of point cloud (high, medium, low, and sparse), using the best performing network from Experiment 1, RD-Net+PointNet. Furthermore, we also tested the point clouds of different densities with the four different feature-combination modes.

Results and Discussion
In this section, the experimental results are presented and discussed.

Results and Discussion on Experiment 1
(E1.1) The classification and segmentation results of RD-Net+PointNet for the mining subsidence basin data set are shown in Figure 12. The evaluation results for the different methods are shown in Table 2. The density-based network, RD-Net, significantly improved the performance of large-scale sparse point cloud recognition for the coordinate-based networks PointNet and PointCNN. For the mining subsidence basin scene, RD-Net+PointNet showed the best performance in both classification and segmentation tasks, with a classification accuracy of 94.06% and a mean-IoU of 83.41%, followed by PointNet with 90.10% classification accuracy and 80.73% mean-IoU. The same pattern was observed in the rural scene. RD-Net recognizes objects from the perspective of spatial structure by analyzing density characteristics. It does not depend on the coordinates of the points themselves but on the relative positional relationships between points, which makes it independent of the coordinate-based network. Therefore, compared with a traditional end-to-end coordinate-based network such as PointNet or PointCNN, a network combined with RD-Net is more robust. In addition, RD-Net is a lightweight network with few weight parameters and low computation costs; combining it with traditional networks, as in RD-Net+PointNet, helps researchers apply them more efficiently in different scenarios. The superior performance demonstrates that combining RD-Net and PointNet yields high-precision semantic analysis of large-scale sparse point clouds. Table 2 also illustrates that RD-Net should be used together with a coordinate-based network (PointNet or PointCNN).
(E1.2) As shown in Table 3 and Figure 13, the S-G-L combination mode performs best among all methods. Compared with the G-H method, the S-G-L method increases the classification accuracy by 4.3%.
This is because of the optimal utilization of point-density information. Both the G-H and S-G-L methods use point-density and coordinate information. However, G-H stacks the two kinds of information together for training, while S-G-L separates them, using PointNet to train on the coordinate information and RD-Net to train on the point-density information independently. This result proves that treating point-density as independent information, rather than as a complement to the coordinate information, can significantly improve performance. Table 3 also indicates that the mean-IoU of G-L is 10% higher than that of G-S. This phenomenon shows the critical effect of local feature analysis. Specifically, the local feature represents small-scale local region shapes of objects, which are composed of adjacent points and complement each other, whereas the structural feature has a large scale and represents the topological relationship of the whole object.
When we used the hybrid feature of the small-scale point feature and the large-scale point-density feature (G-H) in the segmentation experiments, the mean-IoU did not grow, as shown in Table 3. Likewise, when we used the mixed local feature and global feature in these experiments, the classification performance still did not grow. This is precisely because of the independence of the point-density information.

Results and Discussion on Experiment 2
The selection of the radius parameter R in this experiment is shown in Table 4. As illustrated, the accuracy is affected by the choice of radius. For example, when R is set to 0.2 m, the classification accuracy for the mining subsidence basin is 78.73%. As R increases, the classification accuracy improves and reaches its optimum at R = 2 m; beyond this, the accuracy decreases as R grows. This is likely because a radius of about 2 m best expresses the change of the density distribution of the current point cloud: when the radius is too large, the unit sphere around each point masks the change of point density before and after the rotation of the point cloud, and when it is too small, the sphere captures too few neighbors to reflect that change.

Results and Discussion on Experiment 3
As shown in Table 5, the classification accuracy of the sparse point cloud, with a small amount of data, is slightly lower than that of the medium-density point cloud and slightly higher than that of the dense point cloud with a large amount of data. However, its calculation efficiency is five times that of the medium-density type and ten times that of the high-density type of point cloud. The data size does not significantly impact the classification accuracy when using RD-Net with a coordinate-based network. This demonstrates that point-density information is more sensitive for sparse point clouds, and that the proposed method, RD-Net+PointNet, can proficiently process sparse point clouds with a small file size, which significantly reduces the computing cost and improves work efficiency. This is of great practical value for studying large-scale surface processes, such as mining subsidence in mining areas. As displayed in Figure 13, as the point cloud becomes denser, the gap between the four different feature-combination methods decreases gradually. This result proves that the spatially-local correlation between points increases gradually as the point cloud becomes dense. When the point-density reaches 10 pts/m², the coordinates carry enough spatial information for classification and segmentation, and the effect of point-density information is reduced, as in Figure 13. However, a high-density point cloud means a heavy volume of data and a long computation time.

Conclusions
This paper proposed a preprocessing method, GT-Box, to retain spatial information, and a novel lightweight network, RD-Net, that can efficiently improve the performance of classification and segmentation on large-scale sparse point clouds. Specifically, the classification accuracy and the mean-IoU of RD-Net+PointNet are 94.06% and 83.41%, respectively. Unlike a coordinate-based network, RD-Net does not rely on the coordinates of the points themselves when analyzing the point cloud, but on the relative positional relationships between points. The learning parameters of RD-Net and of a traditional coordinate-based network are independent of each other; therefore, the two types of networks can complement each other to improve performance. By comparing the different feature-combination models connected in the concatenate layer, we showed that point-density is independent, scale-sensitive spatial information and plays an important role in the recognition of sparse point clouds. In addition, compared with analyzing a dense point cloud with a large amount of data, using our proposed method, RD-Net+PointNet, to classify and segment a large-scale sparse point cloud achieved similar accuracy while increasing the efficiency ten times. This is practical for geoscientists engaged in regular research, such as mining subsidence monitoring. Remarkably, RD-Net+PointNet brings a significant improvement on large-scale sparse point clouds but little improvement on dense point clouds. We believe that the reason for this limitation lies in the difficulty of understanding the density-sensitive characteristics of the point cloud. The recognition of such density-sensitive features for point clouds of different densities is an exciting future direction to further improve RD-Net performance.