Incorporating Handcrafted Features into Deep Learning for Point Cloud Classification

Abstract: Point cloud classification is an important task in point cloud data analysis. Traditional point cloud classification is conducted primarily on the basis of specific handcrafted features with a specific classifier and is often capable of producing satisfactory results. However, the extraction of crucial handcrafted features hinges on sufficient knowledge of the field and substantial experience. In contrast, while powerful deep learning algorithms can learn features automatically, they normally require a complex network architecture and considerable calculation time to attain better classification accuracy. In order to combine the advantages of both methods, in this study, we integrated handcrafted features, whose benefits were confirmed by previous studies, into a deep learning network, in the hope of solving the problem of insufficient extraction of specific features and enabling the network to recognise other effective features through automatic learning. This was done to achieve the performance of a complex model by using a simple model and fulfil the application requirements of the remote sensing domain. As indicated by the experimental results, the integration of handcrafted features into the simple and fast-calculating PointNet model could generate a classification result that bore comparison with that generated by a complex network model such as PointNet++ or KPConv.


Introduction
The development of 3D scanning and 3D imaging technologies has resulted in easier acquisition and a wider range of application of point cloud data. In the domain of remote sensing and geoinformation, the point cloud was at first primarily utilised to produce the digital surface model (DSM) and the digital terrain model (DTM) [1,2]. Nowadays, the point cloud has become a major data source for 3D model reconstruction and 3D mapping [3,4]. In the domain of computer vision and robotic research, the point cloud can be utilised in object detection, tracking, and 3D modelling [5,6]. In terms of forestry applications, the point cloud provides the measurements required for forest type classification and tree species identification [7,8]. In addition, the point cloud can be used to record the shape and exterior of ancient relics or historical buildings for the purpose of digital preservation [9,10]. In recent years, because of the need for autonomous driving, the point cloud has been extensively utilised to detect and identify all types of traffic objects for the purpose of road inventory and the production of high-definition maps (HD maps) [11][12][13]. Among these applications, a very common task is point cloud classification [14,15], also known as point cloud semantic segmentation [16,17].
The primary objective of point cloud classification is to assign a semantic class label to each point of the point cloud. However, the automatic classification of the point cloud is rendered challenging by some of the data characteristics of the point cloud, such as the irregularity of point distributions, the enormous number of points, the non-uniform point density, and the complexity of the observed scenes.

Classification Based on Deep Learning
In recent years, many researchers have begun to develop DL algorithms applicable to point cloud classification, motivated by the successful development of DL in image classification and analysis [26]. Unlike handcrafted features, which need to be designed on the basis of specific application requirements and domain knowledge, DL is characterised by its ability to learn features automatically, and the learned features are generally more suitable for the specific application requirements. The convolutional neural network (CNN) is a popular DL algorithm often applied in image classification; however, because of the unordered and irregular nature of the point cloud, a general CNN cannot be directly applied to point cloud classification. As a result, many researchers convert the 3D point cloud to regular raster representations, such as 2D images [14,28] or 3D voxels [29,46], and then apply a CNN to these rasterised data for classification [18]. As pioneers of algorithms of this type, volumetric CNNs convert the point cloud into regularly distributed voxels, and then a 3D CNN is applied to the classification and object identification of the voxels [29,47]. However, this approach is vulnerable to point sparsity, and 3D convolution is a rather time-consuming calculation process. In contrast, multiview CNNs project the point cloud to 2D images from different perspectives and then utilise CNNs in the classification; this approach, however, is vulnerable to object occlusion and changes in point cloud density [47,48].
To avert the influence of data conversion on the point cloud, it is most ideal to classify the point cloud data directly. In recent years, many point-based 3D deep learning architectures have been put forth, among which PointNet, proposed by Qi, Su, Kaichun and Guibas [27], is regarded as a pioneer of methods of this type. The PointNet architecture consists of a classification network and a segmentation network. The classification network uses max pooling as a symmetric function to aggregate point features from the input points into a global feature, and then a multi-layer perceptron (MLP) is used to classify the global feature. The segmentation network is an extension of the classification network: it combines the global and the local point features and then outputs the classification score of each point. Since PointNet does not use a convolution operator for feature extraction, it is praised for its few model parameters, conservation of memory, and fast training speed. Nevertheless, its major problem lies in its inability to capture detailed local features, limiting its ability to recognise fine-grained structures and its generalisability to complex scenes. To tackle the deficiencies of PointNet, PointNet++ follows the hierarchical structure of the CNN and progressively executes the subsampling, grouping, and feature extraction of the point cloud to obtain local features at multiple scales [30]. In the subsampling, farthest point sampling (FPS) is used instead of random sampling for complete coverage of the point set. In the grouping, a set of local regions is constructed, where multi-resolution grouping (MRG) is used for tackling the non-uniform point density. In the feature extraction, a PointNet is utilised for learning local features within the grouped local regions. Such local features are further grouped into larger regions and processed recursively to produce higher-level features.
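The symmetric-function idea at the heart of PointNet can be illustrated with a short NumPy sketch (ours, not the authors' implementation; the learned per-point MLP is stood in for by a single random linear layer with ReLU):

```python
import numpy as np

rng = np.random.default_rng(0)

def pointnet_global_feature(points, weights):
    """Toy PointNet aggregation: a shared per-point transform followed by
    channel-wise max pooling, a symmetric (order-invariant) function."""
    per_point = np.maximum(points @ weights, 0.0)  # shared one-layer "MLP" with ReLU
    return per_point.max(axis=0)                   # max pooling over the points

points = rng.normal(size=(2048, 3))  # one block of 2048 points (x, y, z)
W = rng.normal(size=(3, 64))         # stand-in for learned MLP weights

g1 = pointnet_global_feature(points, W)
g2 = pointnet_global_feature(points[::-1], W)  # same points, different order
assert np.allclose(g1, g2)  # the global feature ignores point ordering
```

Because max pooling discards everything except the per-channel maximum, the result is invariant to point order, but it also explains the weakness noted above: local neighbourhood structure does not survive the pooling step.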
Unlike PointNet and PointNet++, both of which execute feature extraction via MLPs, kernel point convolution (KPConv) uses a 3D convolution kernel to extract ample local features [33]. The 3D convolution kernel consists of a set of weight-carrying kernel points, each of which has an influence distance to control the overlap between the kernel points' areas of influence and ensure good space coverage. In addition, kernel point positions are critical to the convolution operator: apart from forming rigid kernels via optimisation, deformable kernels can be constructed through learning. Although KPConv can generate better classification results by utilising deformable kernels, both the learning of the kernel point positions and the calculation of the 3D convolution require a considerable amount of calculation time. Table 1 compares the architectures and the properties of the three aforementioned deep learning networks. While the input of PointNet acquires a fixed quantity of points within a block via random sampling, PointNet++ conducts sampling by using the farthest point sampling (FPS) algorithm. Both use MLPs as the method for feature extraction. In consideration of the advantages of the two models, such as the simplicity of the model, the small number of parameters, and speedy calculation, in this study, we focused on examining these two deep learning models and assessing the effects on the efficacy of point cloud classification after the models incorporated the handcrafted features. In contrast, while KPConv utilises a 3D convolution kernel for feature extraction and does not limit the quantity of the point cloud input, a k-dimensional tree (KD-tree) has to be executed beforehand for subsampling the point cloud when there is an excessive number of points. In view of the complex model, the large number of parameters, the relatively great extent of memory occupation, and the relatively long calculation time, it is not fitting to add extra features to KPConv.
However, in consideration of its considerable classification efficacy, in this study, we compared the classification result generated by KPConv with those generated by the other models. Figure 1a illustrates the traditional feature-based point cloud classification process. Assume that p_i = (x_i, y_i, z_i) is a point in the point cloud, with (x_i, y_i, z_i) being the 3D coordinates of the point p_i. N(p_i) represents the neighbourhood of point p_i, and F(p_i) is the handcrafted feature set derived from the K points in the neighbourhood, p_j ∈ N(p_i), j = 1, 2, …, K. These handcrafted features are the major inputs of the classifier; therefore, the quality of the classification result hinges on whether the handcrafted features possess identifiability. Figure 1b illustrates the classification process based on 3D deep learning, in which the 3D coordinates of the points are directly input and the point cloud classification is executed after the features are self-learned via the DL network. This method possesses a higher level of automation, and generally, the more complex the network architecture, the better the classification results achieved. Figure 1c illustrates the point cloud classification process proposed in this study, which combines the advantages of both the handcrafted features and the learned features, in the hope of achieving the effects produced by a complex deep learning model (e.g., PointNet++ or KPConv) by means of a simple deep learning model (e.g., PointNet).

Methodology
Remote Sens. 2020, 12, 3713

Furthermore, in order to evaluate the applicability of the proposed method to different point cloud datasets, in this study, we conducted a classification and result analysis of the ALS and MLS point cloud data, respectively. Although both scanning approaches generate point cloud data presented in the form of 3D coordinates, there is a vast distinction between their properties. While the ALS point cloud generally has a lower point density, a higher acquisition speed, and a wider covering area, the MLS point cloud normally has a higher density and more distinct details. Another major distinction lies in the different scanning directions toward the ground objects.
While the ALS scans are performed in a downward vertical direction at a high altitude, resulting in a sparse point cloud on the vertical planes of the ground objects (e.g., the walls of buildings), the MLS scans are conducted in a horizontal direction at the ground level toward the ground objects and may omit the points of the horizontal planes of several ground objects (e.g., the roofs of buildings).

Extraction of Handcrafted Features
The network architecture of PointNet indicates that it is characterised by its simple model, few parameters, and fast training speed, yet it is deficient in the extraction of local features. As a result, the handcrafted features used in this study were mainly the local features of the point cloud, for the purpose of making up for the deficiencies of PointNet. In addition, the intrinsic properties of the point cloud data, such as return intensity and elevation information, were utilised as features, whose effects on the classification efficacy were evaluated. Table 2 presents the handcrafted features used in this study. Their definition and calculation method are as follows.

Covariance Features
Covariance features are the most representative type of common local features, and many researchers have confirmed their positive effect on classification [49][50][51]. The covariance features of point p_i are generated mainly by calculating the 3 × 3 covariance matrix of the coordinates of all the points in its neighbourhood [51]. In this study, we first found the neighbourhood N(p_i) via K-nearest neighbours (KNN) and calculated the covariance matrix of all the points p_j (j = 1, 2, …, K) in the neighbourhood. Then, by using eigen decomposition, we found the three eigenvalues of the covariance matrix, arranged from small to large as (λ0, λ1, λ2), and the three corresponding eigenvectors (v0, v1, v2). Many geometric features can be derived from the eigenvalues, among which the three common shape features, linearity (L), planarity (P), and scattering (S), as proposed by Demantké, Mallet, David and Vallet [50], can be utilised to determine the shape behaviour of the points within the neighbourhood N(p_i). Their calculation methods are illustrated in Formulas (1) to (3). When the point cloud in the neighbourhood was in a linear form, λ2 ≫ λ1 ≈ λ0 ≈ 0, and the value of linearity (L) was close to 1; when the point cloud in the neighbourhood was in a planar form, λ2 ≈ λ1 ≫ λ0 ≈ 0, and the value of planarity (P) was close to 1; when the point cloud in the neighbourhood was in a dispersed and volumetric form, λ2 ≈ λ1 ≈ λ0, and the value of scattering (S) was close to 1. Furthermore, in this study, we used the verticality (V) put forth by Guinard and Landrieu [52]; its calculation is illustrated in Formula (4). Verticality was utilised to determine how vertical the point distribution is: a horizontal neighbourhood produced a value close to 0, a vertical linear neighbourhood produced a value close to 1, and a vertical planar neighbourhood (e.g., a façade) produced an intermediate value (≈0.7).
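The formula images did not survive extraction. With the ascending ordering λ0 ≤ λ1 ≤ λ2 used here, the standard eigenvalue features of [50] take the forms below; for (4), the exact formula of [52] is not reproduced in this excerpt, so the eigenvalue-weighted formulation shown is a reconstruction consistent with the behaviour described above (0 for horizontal, 1 for vertical linear, ≈0.7 for vertical planar neighbourhoods):

```latex
L = \frac{\lambda_2 - \lambda_1}{\lambda_2} \quad (1) \qquad
P = \frac{\lambda_1 - \lambda_0}{\lambda_2} \quad (2) \qquad
S = \frac{\lambda_0}{\lambda_2} \quad (3)
```

```latex
\mathbf{u} = \sum_{i=0}^{2} \lambda_i \, \lvert \mathbf{v}_i \rvert , \qquad
V = \left[ \frac{\mathbf{u}}{\lVert \mathbf{u} \rVert} \right]_z \quad (4)
```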
In addition, the normal vector of each point in the point cloud is generally regarded as one of the important features by many researchers [49]. The normal vector can be computed using many different calculation methods [53]. In this study, we used the eigenvector v0 corresponding to the smallest eigenvalue λ0 in the covariance matrix of point p_i as the normal vector N of the point and decomposed it along the 3D coordinate axes into three normal components serving as the normal features of the point, as demonstrated in Formula (5).
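These neighbourhood features can be sketched in a few lines of NumPy (a hedged illustration, not the paper's code; `covariance_features` is our name, and the verticality follows the eigenvalue-weighted formulation matching the behaviour described for [52], since Formula (4) is not reproduced in this excerpt):

```python
import numpy as np

def covariance_features(p, cloud, k=20):
    """Eigenvalue-based local features of point p from its k nearest
    neighbours, following the definitions in the text (non-degenerate
    neighbourhoods assumed, i.e. lambda_2 > 0)."""
    d = np.linalg.norm(cloud - p, axis=1)
    nbrs = cloud[np.argsort(d)[:k]]      # KNN neighbourhood N(p)
    cov = np.cov(nbrs.T)                 # 3x3 covariance matrix
    lam, vec = np.linalg.eigh(cov)       # ascending: lam = (l0, l1, l2)
    l0, l1, l2 = lam
    L = (l2 - l1) / l2                   # linearity,  Formula (1)
    P = (l1 - l0) / l2                   # planarity,  Formula (2)
    S = l0 / l2                          # scattering, Formula (3)
    # verticality: z-component of the eigenvalue-weighted sum of absolute
    # eigenvector components (our reconstruction of Formula (4))
    u = np.abs(vec) @ lam
    V = (u / np.linalg.norm(u))[2]
    n = vec[:, 0]                        # normal = eigenvector of l0, Formula (5)
    return L, P, S, V, n
```

For a vertical line of points this yields L ≈ 1 and V ≈ 1; for a horizontal plane it yields P ≈ 1, V ≈ 0, and a near-vertical normal, matching the behaviour described in the text.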

Height Features
Another effective feature involves elevation information of the point cloud. Both the ALS and the MLS data contain a large number of points on the ground surface. Almost all the other above-ground objects are connected to the ground, and the junction is where classification errors are most likely to occur. In order to solve this problem, many researchers filtered out the points on the ground before classifying the above-ground objects [54], while some introduced the height difference ∆z between the ground and the above-ground object as a feature for classification [38,40,54]. As the complete automation of the filtration of the points on the ground surface cannot be attained at the present stage, it remains a time-consuming and laborious task to thoroughly filter out the points on the ground [15]. In view of this, in this study, we utilised the height difference ∆z as the height feature, which was computed by subtracting the height of the lowest point in the scene from the height of the point.
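The height feature above amounts to a single line of code (a sketch; the function name is ours):

```python
import numpy as np

def height_feature(cloud):
    """Height feature dz: height of each point above the lowest point in
    the scene, avoiding explicit ground filtering."""
    return cloud[:, 2] - cloud[:, 2].min()

# three points at heights 5 m, 2 m and 9 m
cloud = np.array([[0.0, 0.0, 5.0], [1.0, 0.0, 2.0], [0.0, 1.0, 9.0]])
dz = height_feature(cloud)  # -> [3., 0., 7.]
```

Note that a single scene-wide minimum, as used here per the text, is a coarse reference on large or sloped scenes; it trades geometric fidelity for full automation.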

Intensity Features
Apart from acquiring the 3D coordinates of the target point, the laser scanner often records the return intensity I of the laser at the same time, the value of which may be affected by the texture and roughness of the target surface, the laser wavelength, the emitted energy, and the incidence angle, and can therefore facilitate classification to some extent [39,49]. Figure 2a illustrates the ALS point cloud data utilised in this study. These test data constitute a subset of the ALS data collected over Tainan, Taiwan, in August 2010. The data acquisition was carried out with the RIEGL-Q680i scanner, at a flight altitude of approximately 800 m and with a point density of approximately 10 pt/m². The primary application of these data was 3D urban mapping, and four classes, namely Ground, Building, Car, and Tree, were manually labelled beforehand and utilised as the reference data for training and testing, as illustrated in Figure 2b. These test data contain a total of 2,262,820 points; the number of points and the percentage belonging to each class are shown in Table 3.

Feature Selection and Model Configuration for ALS Point Cloud Classification
On the basis of the two deep networks PointNet and PointNet++, in this study, we integrated the different types of handcrafted features discussed in the previous section and produced different models for the ALS data, as shown in Table 4. According to the design of the PointNet and PointNet++ models, the input point cloud was divided into several blocks first; then, a fixed number of points from each block were selected for training. Considering that the original studies do not offer suggestions about the most ideal size of the blocks, in this study, we found the best block size via experiments (see Section 4.1 for details) and divided the ALS point cloud data into 15 m × 15 m blocks, with each block containing 2048 sampled points. The training strategy for the models basically corresponded with the setup suggested by the original studies. The settings of the relevant hyperparameters are illustrated in Table 5. The definitions and effects of these hyperparameters can be found in [55].
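The block-partitioning and fixed-size sampling step described above can be sketched as follows (our illustration of the input preparation, not the original code; function and parameter names are ours):

```python
import numpy as np

def partition_and_sample(cloud, block=15.0, npts=2048, seed=0):
    """Partition a point cloud into block x block (m) tiles in the xy-plane
    and draw a fixed number of points from each non-empty tile, mirroring
    the PointNet-style input preparation described in the text."""
    rng = np.random.default_rng(seed)
    origin = cloud[:, :2].min(axis=0)
    ij = np.floor((cloud[:, :2] - origin) / block).astype(int)  # tile indices
    tiles = []
    for key in np.unique(ij, axis=0):
        idx = np.flatnonzero((ij == key).all(axis=1))
        # sample with replacement when a tile holds fewer than npts points
        pick = rng.choice(idx, size=npts, replace=idx.size < npts)
        tiles.append(cloud[pick])
    return np.stack(tiles)  # shape: (n_tiles, npts, n_columns)
```

Handcrafted features would simply be appended as extra columns of `cloud` before partitioning, so each sampled point carries its (x, y, z) plus feature vector into the network.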
Unlike the two aforementioned models that require the process of dividing the point cloud into blocks, KPConv directly conducts the classification of the point cloud of the scene. However, an excessive number of points might result in insufficient memory and subsequently lead to a failure in the calculation, hence the need for the subsampling of the original point cloud. Moreover, the convolution kernels in KPConv were classified as the rigid type and the deformable type. The training strategy for the model followed the setup from the original study, with a max epoch of 500, a batch size of 12, 15 kernel points, and the radius of influence of the point convolution set as 4 m [33].

Table 4. Feature combinations of the models for ALS point cloud classification.

PointNet model / PointNet++ model: input features
ALS_PointNet_1 / ALS_PointNet++_1: (x, y, z)
ALS_PointNet_2 / ALS_PointNet++_2: (x, y, z) + (Nx, Ny, Nz)
ALS_PointNet_3 / ALS_PointNet++_3: (x, y, z) + (λ0, λ1, λ2)
ALS_PointNet_4 / ALS_PointNet++_4: (x, y, z) + (L, P, S)
ALS_PointNet_5 / ALS_PointNet++_5: (x, y, z) + (L, P, S, V)
ALS_PointNet_6 / ALS_PointNet++_6: (x, y, z) + (Nx, Ny, Nz) + (L, P, S, V)
ALS_PointNet_7 / ALS_PointNet++_7: (x, y, z) + (L, P, S) + ∆z
Figure 3a illustrates the MLS point cloud data utilised in this study. The test area is located at the Tainan High Speed Rail Station District (Shulan) in Tainan, Taiwan. These MLS data were collected with the Optech Lynx-M1 scanner in December 2017, with a point density of approximately 200 pt/m², and were primarily used for road inventory and the production of HD maps. In comparison with the ALS data, the observed scene in the MLS data contained a larger number and greater variety of ground objects and object classes, comprising a total of eight classes: Ground, Tree, Street lamp, Traffic sign, Traffic light, Island (divisional island), Car, and Building, as shown in Figure 3b. There are 14,899,744 points in this observed scene; the number of points and the percentage belonging to each class are listed in Table 6.

Feature Selection and Model Configuration for MLS Point Cloud Classification
On the basis of the two models PointNet and PointNet++, we tested different feature combinations for the MLS point cloud classification, as shown in Table 7; these differed from those used for the ALS data in the addition of the intensity feature and the removal of the normal features.

In consideration of the high point density and large volume of the MLS data, in order to effectively acquire the local geometric features of the point cloud, we first executed subsampling and determined via experiments that the best block size was 5 m × 5 m, with 4096 points extracted from each block for training. The training strategy for the PointNet and PointNet++ models followed the setup from the original studies. The settings of the relevant hyperparameters are illustrated in Table 8. The setup of KPConv basically resembled that described in Section 3.2. However, in view of the relatively complex scene and the relatively large number of points in the MLS point cloud data, the max epoch was set to 600, while the batch size was set to 8.

Classification Performance Evaluation
In order to assess the efficacy of each model in point cloud classification, we used classification performance metrics frequently employed in machine learning: overall accuracy (OA), precision, recall, F1-score, and the Matthews correlation coefficient (MCC) [56]. The calculation of these indicators for binary classification can be expressed as follows, where TP, FP, FN, and TN represent true positive, false positive, false negative, and true negative, respectively, all of which can be calculated from the point-based confusion matrix.
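The equation images referenced here did not survive extraction; the standard forms of these metrics are given below, numbered to match the text's later references to Equations (7)-(9):

```latex
\mathrm{OA} = \frac{TP + TN}{TP + FP + FN + TN} \quad (6) \qquad
\mathrm{Precision} = \frac{TP}{TP + FP} \quad (7) \qquad
\mathrm{Recall} = \frac{TP}{TP + FN} \quad (8)
```

```latex
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (9) \qquad
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (10)
```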
In the case of a multi-class problem with K classes, the macro-averaging procedure is commonly employed to calculate the overall mean of the per-class measures for the different indicators [57]. By this procedure, the precision, recall, and F1-score are computed for each class according to Equations (7)-(9) and then averaged via the arithmetic mean. In addition, a multi-class extension of the MCC in terms of the confusion matrix was also considered in this study [58], which is defined as

MCC_k = (c·s − Σ_k p_k·t_k) / (√(s² − Σ_k p_k²) · √(s² − Σ_k t_k²)),

where c is the total number of correctly predicted samples, s is the total number of samples, p_k is the number of samples predicted as class k, and t_k is the number of samples that truly belong to class k.
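Using the quantities c, s, p_k, and t_k defined above, the multi-class MCC of [58] can be computed directly from a K × K confusion matrix (a sketch; the function name is ours):

```python
import numpy as np

def multiclass_mcc(confusion):
    """Multi-class MCC from a KxK confusion matrix (rows: true class,
    columns: predicted class), following the definition in the text."""
    C = np.asarray(confusion, dtype=float)
    c = np.trace(C)      # correctly predicted samples
    s = C.sum()          # total samples
    p = C.sum(axis=0)    # samples predicted as each class k
    t = C.sum(axis=1)    # samples truly belonging to each class k
    num = c * s - p @ t
    den = np.sqrt(s**2 - p @ p) * np.sqrt(s**2 - t @ t)
    return num / den if den else 0.0
```

For K = 2 this reduces to the familiar binary MCC, which makes it a convenient sanity check for an implementation.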
In comparison to the OA, the average F1-score is less vulnerable to the problem of imbalanced data. As a result, many studies have used the average F1-score in the assessment of point cloud classification performance [14,18,59]. As an alternative measure unaffected by the imbalanced-dataset issue, MCC_k is more informative than the average F1-score and OA in evaluating classification problems [45,60].

Effect of Block Size
When applied to image classification, a CNN first partitions an image into many patches in order to extract the local features to be utilised in the classification. Similarly, the PointNet and PointNet++ models first partition the point cloud into numerous blocks, each of which serves as a unit during training. It is evident that the size of the blocks has a direct impact on the classification result. However, the original studies on PointNet and PointNet++ do not offer specific suggestions on this matter. In order to conduct the succeeding experiments with the best block size, we partitioned the ALS data into blocks of five different sizes: 2.5 m × 2.5 m, 5 m × 5 m, 10 m × 10 m, 15 m × 15 m, and 20 m × 20 m. We also conducted classification tests on the point cloud data before and after the addition of the geometric features (L, P, S, V). The experimental results are shown in Figure 4, in which the triangles with dashed lines represent the classification results produced by using only the (x, y, z) coordinates, and the circles with solid lines represent the results produced after the addition of the geometric features. Each colour represents a different classification indicator, and the grey vertical bars indicate the calculation time required.

Figure 4 suggests that, prior to the addition of the geometric features, an increase in the block size results in an evident decline in each classification indicator. This could be attributed to the fact that larger blocks in the PointNet architecture leave insufficient local structure in the global features obtained via max pooling, hence the decrease in classification efficacy [61]. However, note that when the blocks were 2.5 m × 2.5 m in size, some of the indicators were lower than those for blocks 5 m × 5 m in size, which was likely associated with the size of the objects to be classified in the ALS point cloud.
When a larger object was divided into numerous small pieces, its overall structure was destroyed, hence the inability of the network to learn the geometric structure of the object as a complete entity. In contrast, the classification results of the point cloud into which the geometric features were integrated (represented by the solid lines in Figure 4) indicated that the addition of the geometric features led to an increase in all of the classification indicators and that the block size had less impact on the classification, with a block size of 15 m × 15 m generating a better classification result. As a result, in the succeeding experiments, the point cloud was partitioned into blocks of size 15 m × 15 m. There was no overlap between blocks, and regions without point data were excluded. The resulting partition is shown in Figure 5.
Unlike the ALS point cloud data, the MLS data are characterised by a high point density and a small object scale. Theoretically, the blocks should be smaller in size if the local geometric structures of the point cloud in each block are to be effectively extracted. In view of this, we conducted experiments using blocks of four sizes: 2.5 m × 2.5 m, 5 m × 5 m, 10 m × 10 m, and 15 m × 15 m.
Figure 6 illustrates the classification results of the point cloud data before and after the addition of the features. According to the results produced after the addition of the geometric features, the classification performance with blocks measuring 2.5 m × 2.5 m was similar to that with blocks measuring 5 m × 5 m. However, the smaller the blocks, the longer the calculation time required. As a result, we conducted the succeeding experiments using blocks measuring 5 m × 5 m. The construction following the completion of the partition is shown in Figure 7.

Effects of the Handcrafted Features Using PointNet
In this section, we focused on assessing the influence of adding the handcrafted features to the ALS point cloud data on the classification efficacy of PointNet. As shown in Table 4, different types of features were added to the original point cloud coordinates; 80% of the data were utilised in training and 20% in testing. The hyperparameters for model training are listed in Table 5, and the classification results are illustrated in Figure 8, in which the classification indicators, including overall accuracy, average recall, average precision, average F1-score, and MCC_k, are displayed in column charts for easy comparison. Overall, the classification results generated after the addition of the features surpassed those produced using only the (x, y, z) coordinates. While the ALS_PointNet_2 model, with the addition of the normal features, effectively improved the average recall, it resulted in a decrease in the average precision. In contrast, ALS_PointNet_3, to which the three eigenvalues were added, and ALS_PointNet_4, to which the shape features (L, P, S) were added, exhibited a noticeable increase in both the average recall and the average precision. However, with the inclusion of the verticality (V) feature (i.e., ALS_PointNet_5), the average recall increased, while the average precision decreased. Furthermore, the classification results of ALS_PointNet_6, to which the normal vector (Nx, Ny, Nz) and the covariance features (L, P, S, V) were simultaneously added, did not exhibit distinct differences from those of ALS_PointNet_4. Finally, the study produced ALS_PointNet_7 by adding the height feature ∆z to ALS_PointNet_4, which resulted in a slight enhancement in the classification results.
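The covariance features compared here are derived from the eigenvalues of each point's local neighbourhood. Below is a minimal sketch using the common eigenvalue-based definitions of linearity, planarity, sphericity, and verticality; the study's exact formulas (Section 3.1) may differ slightly:

```python
import numpy as np

def covariance_features(neigh):
    """Eigenvalue-based shape features of a local neighbourhood (k x 3 array):
    linearity L, planarity P, sphericity S, and verticality V."""
    cov = np.cov(neigh.T)                 # 3 x 3 covariance of the neighbourhood
    eigval, eigvec = np.linalg.eigh(cov)  # eigenvalues in ascending order
    l3, l2, l1 = eigval                   # so that l1 >= l2 >= l3
    L = (l1 - l2) / l1                    # linearity
    P = (l2 - l3) / l1                    # planarity
    S = l3 / l1                           # sphericity
    normal = eigvec[:, 0]                 # eigenvector of the smallest eigenvalue
    V = 1.0 - abs(normal[2])              # 0 for horizontal, 1 for vertical surfaces
    return L, P, S, V

# A perfectly flat, horizontal patch: planarity approaches 1, verticality 0.
g = np.linspace(0.0, 1.0, 15)
xx, yy = np.meshgrid(g, g)
patch = np.stack([xx.ravel(), yy.ravel(), np.zeros(xx.size)], axis=1)
L, P, S, V = covariance_features(patch)
```

The same eigenvector of the smallest eigenvalue also serves as the estimated surface normal (Nx, Ny, Nz) used by models such as ALS_PointNet_2.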

Effects of the Handcrafted Features Using PointNet++
The experiment focused on assessing the influence of the addition of the handcrafted features to PointNet++ on the classification efficacy of the ALS point cloud, by adding features identical to those used in the previous experiment (as shown in Table 4). Table 5 shows the training hyperparameters of the model, with 80% of the data utilised in training and 20% in testing. The testing results are illustrated in Figure 9. Similar to the PointNet model, adding the normal features alone (ALS_PointNet++_2) did not noticeably improve the classification efficacy. In contrast, the addition of the eigenvalues (ALS_PointNet++_3) or the shape features (ALS_PointNet++_4) could enhance the overall classification results, but only to a limited extent. Even when different types of handcrafted features were included in PointNet++ (ALS_PointNet++_5 and ALS_PointNet++_6), some classification indicators decreased rather than increased. It is worth noting that when the height feature ∆z was added to the model (ALS_PointNet++_7), all classification metrics improved and the best classification results among these models were achieved.
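The height feature ∆z can be approximated per point as its elevation above the lowest point within a local grid cell; the following sketch uses that assumption (the exact definition used in the study is the one given in Section 3.1 and may differ):

```python
import numpy as np

def height_feature(points, cell_size=5.0):
    """Approximate height above ground (delta z): each point's z minus the
    lowest z inside its grid cell. A common proxy; the study's exact
    definition may differ."""
    xy = points[:, :2]
    idx = np.floor((xy - xy.min(axis=0)) / cell_size).astype(np.int64)
    keys = idx[:, 0] * (idx[:, 1].max() + 1) + idx[:, 1]
    dz = np.empty(len(points))
    for key in np.unique(keys):
        mask = keys == key
        dz[mask] = points[mask, 2] - points[mask, 2].min()
    return dz

# Two cells: a roof point 9 m above a ground point, and a tree point 4 m up.
pts = np.array([[0.0, 0.0, 1.0],   # ground
                [1.0, 1.0, 10.0],  # roof, same cell
                [7.0, 7.0, 2.0],   # ground, second cell
                [8.0, 7.0, 6.0]])  # tree, second cell
dz = height_feature(pts, cell_size=5.0)
```

Taking the per-cell minimum as a crude ground estimate makes ∆z invariant to the absolute terrain elevation, which is why it helps separate elevated objects from ground points.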

Comparison with other Methods
This section compares the classification results generated via the method proposed in this paper, RF, and KPConv. RF is the machine learning classification approach provided by the commercial software LiDAR360, with its input including the point cloud coordinates (x, y, z) and the covariance features (L, P, S, V). PointNet utilised ALS_PointNet_5, to which the covariance features (L, P, S, V) were added, and is represented by PointNet(F). With respect to PointNet++, because of the limited effects of the additional features, ALS_PointNet++_1, to which no feature was added, was utilised and is represented simply by PointNet++. Lastly, the deformable KPConv was executed with an influence distance of 4 m. In order to maintain an identical testing environment for the four methods, the training and testing samples were reselected for this experiment. The classification results are illustrated in Figure 10. In terms of OA, PointNet(F), to which the handcrafted features were added, produced the best result, while RF came in second but exhibited the lowest average precision. In terms of the F1-score, PointNet(F) produced the best result, and KPConv came in second. In terms of MCC_k, PointNet(F) still performed the best, followed by RF. The experimental results indicated that when equipped with the handcrafted features, PointNet was indeed effective in enhancing the classification efficacy for the ALS data, which possessed a lower point density and a smaller number of classes, and could even surpass more complex models, such as PointNet++ and KPConv.
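All of the indicators compared in these experiments (OA, average recall, average precision, average F1-score, and MCC_k) can be computed from a confusion matrix. A sketch follows, assuming MCC_k denotes the standard multiclass Matthews correlation coefficient:

```python
import numpy as np

def summary_metrics(cm):
    """Classification indicators from a confusion matrix (rows = true class,
    columns = predicted class): overall accuracy, macro-averaged recall,
    precision and F1-score, and the multiclass Matthews correlation
    coefficient (assumed here to be what the paper calls MCC_k)."""
    cm = np.asarray(cm, dtype=float)
    s = cm.sum()                       # total samples
    c = np.trace(cm)                   # correctly classified samples
    t = cm.sum(axis=1)                 # true counts per class
    p = cm.sum(axis=0)                 # predicted counts per class
    oa = c / s
    recall = np.diag(cm) / t
    precision = np.diag(cm) / p
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (c * s - p @ t) / np.sqrt((s**2 - p @ p) * (s**2 - t @ t))
    return oa, recall.mean(), precision.mean(), f1.mean(), mcc

# A perfectly classified 3-class example: every indicator equals 1.
cm = np.diag([50, 30, 20])
oa, rec, prec, f1, mcc = summary_metrics(cm)
```

Macro-averaging the per-class values (rather than weighting by class size) is what makes the average recall and precision sensitive to small classes such as Car, which is why they move more than OA when a feature is added.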

Effects of Handcrafted Features Using PointNet
The main focus of the experiment was to assess the influence of the addition of the handcrafted features to PointNet on the classification results for the MLS point cloud, using a strategy similar to that applied to the classification of the ALS point cloud data. The MLS_PointNet_1 to MLS_PointNet_5 models were devised by adding different types of features to the MLS point cloud data (as shown in Table 7). Training and testing were then conducted for each model. The classification results are illustrated in Figure 11. According to the results, after the addition of the features, there was a noticeable increase in each of the indicators of the classification result. MLS_PointNet_4, to which the intensity feature I and the covariance features (L, P, S, V) were added, produced the best result. In comparison to the result produced without the addition of any feature (MLS_PointNet_1), the average recall of MLS_PointNet_4 increased by 0.271, the average precision by 0.203, the overall accuracy by 0.032, and the F1-score by 0.260, altogether showing substantial improvement. In contrast, MLS_PointNet_5, which was formed by adding the height features to MLS_PointNet_4, resulted in a decrease in the average precision and was of no benefit to the classification result.

Effects of Handcrafted Features Using PointNet++
This section mainly focuses on testing the influence of the addition of different features to PointNet++ on the classification efficacy of the MLS point cloud, by adding features identical to those in the PointNet experiment discussed in the previous section. The classification results are illustrated in Figure 12. In comparison to the MLS_PointNet++_1 model, in which only the point coordinates were used as the input, the addition of either the intensity feature I or the covariance features (L, P, S, V) increased each of the classification indicators only to a limited extent, with some of them even declining. The experimental results indicated that the self-learned local features of PointNet++ were already effective in processing a complex scene with a higher point density and a larger number of classes; thus, the inclusion of the handcrafted features produced very limited benefits. In addition, the height features were found to be of no benefit for MLS point cloud classification.

Comparison with other Methods
This section compares the experimental results of the proposed method with the classification results of RF and KPConv. The input of RF consisted of the original point coordinates (x, y, z), the covariance features (L, P, S, V), and the intensity (I). PointNet used MLS_PointNet_4, equipped with the additional covariance features (L, P, S, V) and the intensity (I), and is here represented by PointNet(F). PointNet++ used MLS_PointNet++_1, to which no feature was added. For the deformable KPConv, the kernel point influence distance was set to 4 m. The classification results are shown in Figure 13. In terms of OA, each method generated a satisfactory result. In terms of the F1-score, RF produced the worst result, while PointNet(F) and PointNet++ produced very similar classification results, and KPConv produced the best result. In terms of MCC_k, KPConv performed the best, followed by PointNet and PointNet++, and RF had the worst results.

Effects of Handcrafted Features for PointNet
For the PointNet model, the experimental results illustrated in Figures 8 and 11 show that the classification performance for the ALS and MLS point cloud data can be improved regardless of which type of handcrafted feature introduced in Section 3.1 is added. Moreover, the amount of improvement in classification performance clearly depends on the selected handcrafted features. In this section, we discuss the classification results of PointNet with respect to the properties of the selected handcrafted features.
The normalized confusion matrix of the classification result for the ALS point cloud, shown in Table 9, indicates that the main problem of PointNet without handcrafted features is the low recall value for the class of Car. In addition, there is ambiguity between the classes of Building and Tree. When the normal features were included in ALS_PointNet_2, the problem of cars being misclassified as ground and buildings was reduced. The presumed reason is that a normal represents the direction of a surface, and the surface normals of a car vary more than the normals of the ground or a building. However, this could lead to trees with rough surfaces being misclassified as cars, resulting in a low precision value for the class of Car. In contrast, ALS_PointNet_3, to which the three eigenvalues were added, provided the local distribution information of the point cloud and therefore led to a noticeable increase in both the average F1-score and the MCC_k. The shape features (L, P, S) in ALS_PointNet_4 were essentially conversions of the three eigenvalues and therefore yielded classification results similar to those of ALS_PointNet_3. Once the verticality (V) feature was added (i.e., ALS_PointNet_5), the average recall increased, while the average precision decreased. Furthermore, while the classification results of ALS_PointNet_6, to which the normal vector (Nx, Ny, Nz) and the covariance features (L, P, S, V) were simultaneously added, did not exhibit distinct differences from those of ALS_PointNet_4, the addition of more features resulted in a longer training time. Finally, the study produced ALS_PointNet_7 by adding the height feature ∆z to ALS_PointNet_4, which resulted in a slight enhancement in the classification results. In most cases, the height features are added to reduce the interference from the points on the ground.
Nevertheless, most of the classification errors caused by the points on the ground had already been rectified in the ALS_PointNet_4 model, to which (L, P, S) was added, hence the limited beneficial result generated by the addition of the height features.
In order to demonstrate the practical benefits produced by the handcrafted features for the PointNet model, we compared ALS_PointNet_1, to which no feature was added, and ALS_PointNet_4, to which the shape features (L, P, S) were added. Tables 9 and 10 illustrate the normalized confusion matrices of the classification results of the two models, respectively. Note that the addition of the shape features (L, P, S) considerably rectified the misclassification problems, i.e., buildings being misclassified as trees or ground, ground points being misclassified as buildings and trees, and trees being misclassified as buildings. Moreover, the recall for the class of Car increased considerably, by approximately 8.5%, after the addition of the features. The visualized classification results of the two models are shown in Figures 14 and 15. In Figure 14, note that the overall classification errors considerably decreased after the addition of the features, with only a few classification errors still occurring on the periphery of a small number of buildings. Figure 15a indicates that after the addition of the features, the problem of buildings being misclassified as trees was rectified. Figure 15b indicates that, prior to the addition of the features, misclassification was likely to occur at the junction of building edges and trees. Nevertheless, such problems were rectified after the addition of the features. When the PointNet model with handcrafted features was applied to the MLS point cloud, there was also a significant improvement in all metrics of the classification results, as shown in Figure 11. Both the intensity feature and the shape features are beneficial for MLS point cloud classification, with the shape features being the more effective. When they were added to the model together, i.e., MLS_PointNet_4, better classification results were obtained than when they were added separately.
This is due to the fact that intensity and shape features are essentially independent of each other. In contrast, adding the height feature ∆z to the PointNet model is unproductive for point cloud classification. This is similar to the case of ALS point cloud classification.
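The shape features discussed above are typically derived from the eigenvalues of the covariance matrix of each point's local neighborhood. The sketch below is only an illustration, assuming the common eigenvalue-based definitions of linearity, planarity, and sphericity; the neighborhood size and the authors' exact implementation are not reproduced here.

```python
import numpy as np

def shape_features(neighborhood):
    """Linearity (L), planarity (P), sphericity (S) of a point's local
    neighborhood, using the common eigenvalue-based definitions
    (illustrative sketch, not the paper's exact implementation)."""
    pts = np.asarray(neighborhood, dtype=float)
    cov = np.cov(pts.T)                    # 3x3 covariance of the neighborhood
    eig = np.linalg.eigvalsh(cov)[::-1]    # eigenvalues, descending: l1 >= l2 >= l3
    l1, l2, l3 = eig
    L = (l1 - l2) / l1                     # high for linear, pole-like structures
    P = (l2 - l3) / l1                     # high for planar structures (roofs, ground)
    S = l3 / l1                            # high for volumetric, scattered structures
    return L, P, S
```

For a perfectly collinear neighborhood this yields L close to 1, while a flat planar patch yields P close to 1, which is why these features help separate pole-like objects, roofs, and vegetation.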
To further discuss the classification results for each class, Tables 11 and 12 present the normalized confusion matrices for MLS_PointNet_1 and MLS_PointNet_4, respectively. The results in Table 11 show that MLS_PointNet_1, to which no handcrafted features were added, had several misclassification problems, including pole-like objects such as street lamps, traffic signs, and traffic lights being misclassified as trees; some cars being classified as trees, islands, and ground; and many buildings being misclassified as trees. The recall and precision values for Traffic sign were only 0.127 and 0.054, respectively, the worst classification results in the entire MLS scene. However, with the addition of the intensity and shape features, all the overall classification metrics increased, and most of the above misclassification problems were significantly mitigated. For example, the recall value of Traffic sign improved from 0.127 to 0.915, and the precision value improved from 0.054 to 0.656. In addition, the recall values of Street light, Car, and Building improved from 0.321, 0.509, and 0.577 to 0.865, 0.913, and 0.900, respectively. Although the addition of the features significantly improved the classification efficacy for the MLS point cloud, some problems remain, such as some traffic lights being misclassified as traffic signs and trees being misidentified as traffic lights, resulting in low precision values for Traffic sign and Traffic light.
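The per-class recall and precision values cited above follow directly from the confusion matrix. The snippet below is a minimal sketch of the standard definitions, not the authors' evaluation code; it assumes every class occurs at least once in both the reference labels and the predictions.

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix plus per-class recall and precision
    (illustrative sketch of the standard definitions)."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                              # rows: true class, columns: predicted
    recall = np.diag(cm) / cm.sum(axis=1)          # TP / (TP + FN), per true class
    precision = np.diag(cm) / cm.sum(axis=0)       # TP / (TP + FP), per predicted class
    norm = cm / cm.sum(axis=1, keepdims=True)      # each row sums to 1
    return norm, recall, precision
```

A low precision with a high recall, as reported for Traffic sign, means most reference points of that class were found, but many points of other classes were also assigned to it.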

Effects of Handcrafted Features for PointNet++
Because the PointNet++ model inherently has the ability to extract local features, the addition of the shape features yields limited benefits in both ALS and MLS point cloud classification, as illustrated in Figures 9 and 12, respectively. It is worth noting that the height features are beneficial for ALS data classification, but not for MLS data. This is presumably because the ALS point cloud has fewer points on vertical surfaces, making the height information somewhat distinguishable between classes. Although the average F-1 score and MCC did not improve much after adding the handcrafted features for the MLS point cloud, the classification accuracy of some pole-like objects, such as traffic signs and traffic lights, improved significantly. An example is shown in Figure 16a, where a portion of a traffic light is misclassified as ground; this was improved by adding the height features, as shown in Figure 16b. This demonstrates that the features learned by PointNet++ are still inadequate for the interpretation of pole-like objects, but the handcrafted features included in this study can moderately compensate for this weakness. However, it was also found that the inclusion of handcrafted features reduced the precision values of some individual classes, an issue that could be addressed by future research.
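The height feature ∆z discussed above can be approximated as each point's elevation above the lowest point in its horizontal neighborhood. The sketch below is a hypothetical implementation using a 2D grid; the cell size and the exact definition of ∆z are assumptions for illustration, not taken from the paper.

```python
import numpy as np
from collections import defaultdict

def height_above_lowest(points, cell=2.0):
    """Height feature dz: each point's elevation above the lowest point in
    its horizontal grid cell (hypothetical sketch; cell size is an assumption)."""
    pts = np.asarray(points, dtype=float)
    # assign each point to a 2D cell by flooring its x, y coordinates
    keys = [tuple(k) for k in np.floor(pts[:, :2] / cell).astype(int)]
    zmin = defaultdict(lambda: np.inf)   # lowest elevation seen per cell
    for k, z in zip(keys, pts[:, 2]):
        zmin[k] = min(zmin[k], z)
    return np.array([pts[i, 2] - zmin[k] for i, k in enumerate(keys)])
```

For nearly flat ALS terrain, ∆z separates ground from elevated classes well, whereas dense vertical surfaces in MLS scenes make the feature less discriminative, consistent with the observation above.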

Comparison with other Methods
The classification results of Figure 10 in Section 4.2.3 were compared via visualization, as shown in Figure 17. RF erred in misclassifying buildings as trees or cars, while the errors made by PointNet(F) mostly occurred at the junction of a building and a tree. Despite the addition of specific geometric features, some of the more detailed local structures of the point cloud could still not be captured. In comparison to PointNet++ and KPConv, PointNet(F) was less likely to produce large-area classification errors on the ground surface or on building roofs. While PointNet++ could rectify PointNet's problem of the periphery of a building being misclassified as a tree via its self-learned local features, it misclassified large areas of ground points as buildings, and buildings with an irregular structure as trees. Lastly, while KPConv could acquire more detailed features via point convolution and was less likely to produce classification errors on the periphery of objects, as confirmed by the experiment, it often misclassified a large building roof as ground, a problem that might be associated with the location of the kernel points and the kernel size of the convolution. Figure 18 shows a comparison of the visualized classification results on the MLS point cloud using RF and the three deep learning models. For the classes Ground, Tree, and Car, each model produced a satisfactory result. However, in identifying the other classes, the models displayed varying degrees of false positives. A comparison with the reference data in Figure 18a,b shows that RF misclassified a large number of divisional islands as ground points and buildings as trees.
In Figure 18c,d, although many of RF's incorrect classifications were noticeably rectified by PointNet and PointNet++, some problems remained unsolved, such as street lamps and traffic lights being misclassified as trees, as well as inaccuracy in classifying traffic signs of smaller size and with fewer points. In comparison to PointNet, PointNet++ took more detailed local features into account and was therefore less likely to misclassify traffic lights as street lamps, yet it still made misclassifications in detecting pole-like objects. Finally, with respect to KPConv, as shown in Figure 18e, while it produced the best classification result in the identification of each class, such as the divisional islands and the traffic signs, a few faults could still be found in its identification of building walls.


Comparison of Computational Efficiency
Finally, we compared the calculation time and the classification efficacy of the deep learning models used in this study. Figure 19 illustrates and compares the classification results and the training time of each model for the ALS and the MLS point cloud data. As indicated by the figure, the calculation time required for PointNet(F) did not increase much after adding the handcrafted features, but its classification performance improved significantly. Furthermore, PointNet(F) produced the best classification result with little calculation time for the simple ALS scene, while KPConv produced the best classification result but required a large amount of calculation time for the complex MLS scene. In contrast, the classification efficiency and efficacy of PointNet++ for the MLS scene were relatively balanced. As a result, for classifying a simple scene, or when adequate prior knowledge for feature extraction is available, PointNet(F) would be more beneficial in practical applications.


Summary and Conclusions
This study focused on the two deep learning networks PointNet and PointNet++ and analyzed the effects of adding various types of handcrafted features on point cloud classification efficacy. In addition, two point cloud datasets, an ALS dataset covering a simple scene and an MLS dataset covering a complex scene, were used to test the performance of the proposed method.
For the PointNet model, the various types of handcrafted features introduced in this study are clearly useful for classifying ALS and MLS point cloud data. In particular, the shape features, which contain local geometric structure information, yielded the most significant improvement in classification performance. For ALS point cloud classification, the addition of the shape features considerably rectified the misclassification problems, i.e., buildings being misclassified as trees or ground, ground being misclassified as buildings and trees, and trees being misclassified as buildings. For MLS point clouds, the misclassification of pole-like objects such as street lamps, traffic signs, and traffic lights was significantly rectified by adding the intensity and shape features to the PointNet model. In addition, the inclusion of these local features also effectively solved the problem of cars and buildings being misclassified as trees. For PointNet++, despite its intrinsic ability to extract local features, the addition of the handcrafted features improved the classification performance only to a limited extent for both ALS and MLS data. In addition, we found that the height features are beneficial for ALS data classification, but not for MLS data, which is likely due to the different point distributions of ALS and MLS point clouds.
By comparing the aforementioned results with the results produced by RF and KPConv, we found that PointNet, with the addition of the features, performed better in the case of the ALS data, while KPConv, equipped with the 3D convolution kernel, performed better in the case of the complex MLS data, but had a complex model architecture and required a considerable amount of calculation time. With the addition of local features, PointNet could attain results in the MLS data classification similar to those produced by PointNet++ and KPConv, but with the advantages of a simple model architecture and a short calculation time. As a result, the PointNet model incorporating handcrafted features will be more beneficial for practical applications in classifying simple observed scenes or analyzing complex scenes efficiently.
Through the experiments, we found that there is ample room for discussion and improvement. First, regarding the influence of the number of ground object classes, only four types of ground objects were classified in the ALS experiment, whereas many real-world scenes are considerably more complex. Therefore, tests and discussions on more complex ALS scenes containing more ground object classes should be conducted in the future. Furthermore, through the practical applications, we observed that in both the ALS and the MLS data, the ground points constituted a majority of the data and consequently caused a data imbalance problem. If this problem is solved, better results and performance can be expected. Finally, in this study, we tested only some point-based features. The efficacy of other features, such as contextual features [54], object-based features [22], and full-waveform features [38], should be examined in the future.
Author Contributions: P.-H.H. originally proposed the idea of incorporating handcrafted features into deep learning for point cloud classification; he also guided the study and experiment design, and contributed to manuscript writing and revision. Z.-Y.Z. contributed to the codes of the algorithm, experiments implementation, and partial paper writing. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the Ministry of Science and Technology, Taiwan, grant number 108-2621-M-002-005-.