Large-Scale ALS Data Semantic Classification Integrating Location-Context-Semantics Cues by Higher-Order CRF

We designed a location-context-semantics-based conditional random field (LCS-CRF) framework for the semantic classification of airborne laser scanning (ALS) point clouds. For ALS datasets of high spatial resolution but with severe noise pollution, context and semantics cues, in addition to location information, can be exploited to compensate for the reduced discriminative power of individual features. This paper focuses on the semantic classification of ALS data using mixed location-context-semantics cues, which are integrated into a higher-order CRF framework by modeling the probabilistic potentials. The location cues, modeled by the unary potentials, provide basic information for discriminating the various classes. The pairwise potentials capture spatial contextual information by establishing neighboring interactions between points to favor spatial smoothing. The semantics cues are explicitly encoded in the higher-order potentials, which operate at the level of clusters with similar geometric and radiometric properties, safeguarding classification accuracy through semantic rules. To demonstrate the performance of our approach, two standard benchmark datasets were utilized. Experiments show that our method achieves superior classification results, with an overall accuracy of 83.1% on the Vaihingen Dataset and 94.3% on the Graphics and Media Lab (GML) Dataset A, compared with other classification algorithms in the literature.


Introduction
Semantic classification has been, and still is, of significant interest to the Light Detection and Ranging (LiDAR) processing and machine learning communities. Airborne laser scanning (ALS) systems can acquire both geometric and radiometric information of geo-objects, which has been widely used in semantic classification [1]. An increasing number of applications require semantic classification results, ranging from object detection to automatic three-dimensional (3D) modeling. Automated urban object extraction from remotely sensed data, especially from ALS point clouds, is a very challenging task due to complex urban environments and the unorganized nature of point cloud data. We also consider finding different types of objects within a small local neighborhood in this paper, which makes reliable extraction obviously difficult. Rather than a binary decision process, each 3D point in the irregularly distributed point cloud is assigned a semantic object label in this work. However, due to the obvious defects of ALS point clouds (e.g., noise, inhomogeneity, loss of sharp features, and outliers), current methods are not robust to cluttered scenes and heterogeneous ALS point cloud data obtained from such scenes.

Methodology
It is the goal of this paper to present an efficient CRF-based framework for semantic classification of ALS point cloud data without the use of image data providing spectral information. Firstly, multiple features of the ALS point cloud are derived, mainly based on point locations, which can efficiently improve the results of the point-based classification process. Secondly, a Random Forests (RF) classifier is employed to produce the soft labeling results. Some outliers remain in this initial semantic result, so a CRF framework is introduced to smooth the result using context information between neighboring points. However, we find that accuracy remains low for small objects, especially cars. LCS-CRF is proposed to solve this problem and achieves higher overall accuracy via a higher-order potential. Cluster-based features are extracted on clusters obtained by a constrained mean-shift clustering method, and semantic rules are defined. Then, based on the common knowledge encoded in these semantic rules, we define the higher-order potentials. Finally, the location, context, and semantics cues are, respectively, encoded by unary, pairwise, and higher-order potentials. Once fused, they provide complementary information from varying perspectives, improving the ALS point cloud semantic classification performance. A mean-field approximate inference method is employed to obtain the semantic classification results. Figure 1 shows the flowchart of the proposed method.


Point-Based Feature Extraction
Three types of features are employed in this section: geometric features from the ALS point cloud properties, local shape features from the structure tensor, and primitive features from the data source. Since the distinctiveness of point-based features strongly depends on the respective neighborhood encapsulating the 3D points, a data-driven approach is used to determine the neighborhood size by selecting the number of nearest neighbors in the local 3D neighborhood of each individual point with eigenentropy-based scale selection [7]. The neighborhood size is determined by minimizing the eigenentropy over varying values of the scale parameter: k*_i = argmin_{k ∈ κ} E_{i,λ}(k), where E_{i,λ}(k) is the eigenentropy of the ith point based on the scale parameter k, and k*_i represents the optimal value for the ith point. The three eigenvalues (λ_s, s = 1, 2, 3) are derived from the symmetric positive semi-definite 3D structure tensor T ∈ R^{3×3}, which is obtained from the k nearest neighbors of each point. In the scope of our work, scale parameters within an interval κ = [k_min, k_max] are considered, with a lower boundary of k_min = 10 neighbors to ensure statistical robustness [25][26][27] and an upper boundary of k_max = 100 to limit the computational effort.
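For illustration, the eigenentropy-based scale selection described above can be sketched as follows. This is a minimal numpy implementation; the brute-force neighbor search and the step size over candidate k values are our own simplifications, not part of the original method.

```python
import numpy as np

def eigenentropy(neighbors):
    """Eigenentropy of a local neighborhood (k x 3 array): -sum(e_s * ln e_s),
    where e_s are the normalized eigenvalues of the 3D structure tensor T."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)       # 3x3 structure tensor
    eigvals = np.linalg.eigvalsh(cov)
    e = np.clip(eigvals / eigvals.sum(), 1e-12, None)  # normalize, avoid log(0)
    return float(-np.sum(e * np.log(e)))

def optimal_k(point, cloud, k_min=10, k_max=100, step=10):
    """Select the neighborhood size k minimizing the eigenentropy."""
    d = np.linalg.norm(cloud - point, axis=1)
    order = np.argsort(d)                              # nearest neighbors first
    ks = range(k_min, min(k_max, len(cloud)) + 1, step)
    return min(ks, key=lambda k: eigenentropy(cloud[order[:k]]))
```

In practice a k-d tree would replace the brute-force sort, and every k in [10, 100] could be tested rather than a coarse grid.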
After the recovery of local neighborhoods, we assemble a set of features well suited to the semantic classification of ALS point clouds. The features used in our work are shown in Table 1. The point-based feature vector comprises 34 elements. Table 1. Three types of point-based features used in this work.

Type | Components
Geometric features | H, ∆H, σ_H, r, D, σ_D, k1, σ_k1, k2, σ_k2, C_g, σ_Cg, C_m, σ_Cm, N, σ_N, C, σ_C, V, r_2d, D_2d, σ(D_2d)
Local shape features | L, P, S, O, A, E, E_s, ∆C, E_s,2d, R_2d
Primitive features | I, σ(I)

Height H above the Digital Terrain Model (DTM) is a discriminating feature to distinguish different classes. The DTM can be generated based on the local topography of the scene [26]. General geometric properties are represented by the radius r of the sphere encompassing the k nearest neighbors and the maximum height difference ∆H within the neighborhood. Density D, principal curvatures k1 and k2, Gaussian curvature C_g, mean curvature C_m, and verticality V [28] describe the basic properties of ALS data; their efficiency has been demonstrated by feature importance analysis. Normal vector relationships N and curvature C (i.e., normal change rate) are also derived in this work. σ(·) denotes the variance of the above geometric features in a sphere of radius r. With the k nearest neighbors of each point, the 3D structure tensor T ∈ R^{3×3} yields 8 local shape features: linearity L, planarity P, scattering S, omnivariance O, anisotropy A, eigenentropy E, sum of eigenvalues E_s, and change of curvature ∆C. Intensity I, obtained directly by the ALS laser, and its variance σ(I) in a sphere of radius r comprise the primitive feature set. In analogy to the 3D case, a 2D projection of the 3D points onto the XY-plane can reveal complementary information, especially for perfectly vertical structures. Then, r_2d defined by the circle encompassing the k nearest neighbors, features of the 2D structure tensor T ∈ R^{2×2} (sum of eigenvalues E_s,2d, ratio of eigenvalues R_2d), density D_2d [29], and its variance σ(D_2d) are also included in the point-based feature vector.
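As a concrete illustration of the eigenvalue-based local shape features listed above, the following sketch derives them from a neighborhood's structure tensor. The dictionary keys and the small numerical floor are our own; eigenvalues are taken in descending order (λ1 ≥ λ2 ≥ λ3), the usual convention for these definitions.

```python
import numpy as np

def local_shape_features(neighbors):
    """Eight eigenvalue-based shape features of a (k x 3) neighborhood."""
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)   # 3D structure tensor T
    w = np.linalg.eigvalsh(cov)                    # ascending order
    l1, l2, l3 = (max(v, 1e-12) for v in (w[2], w[1], w[0]))  # descending
    s = l1 + l2 + l3
    e = np.array([l1, l2, l3]) / s                 # normalized eigenvalues
    return {
        "linearity":        (l1 - l2) / l1,
        "planarity":        (l2 - l3) / l1,
        "scattering":       l3 / l1,
        "omnivariance":     (l1 * l2 * l3) ** (1.0 / 3.0),
        "anisotropy":       (l1 - l3) / l1,
        "eigenentropy":     float(-np.sum(e * np.log(e))),
        "eigen_sum":        s,
        "curvature_change": l3 / s,
    }
```

A linear structure (e.g., a power line) yields linearity near 1, while a planar patch (e.g., a roof) yields planarity near 1 and scattering near 0.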

CSF with RANSAC
To increase the efficiency of LCS-CRF, off-ground points are employed to extract cluster-based features for the higher-order potentials. The Cloth Simulation Filter (CSF) [30] algorithm can be used to extract off-ground points from LiDAR data and has shown superior performance compared with other ground filtering methods.
Two difficulties must be overcome when filtering ground from ALS point clouds: (i) insufficient information on small objects for clustering, which noticeably affects per-class accuracy and the overall accuracy of the classification result [17]; and (ii) misjudgment between ground and classes of low height (e.g., low vegetation). RANSAC [31] is therefore integrated with CSF to solve these problems, segmenting ground and off-ground points simultaneously. Pseudocode for the RANSAC-based CSF algorithm (Algorithm 1) is shown in Appendix A.
The off-ground point set is generated by Algorithm 1, and the result is shown in Figure 2. More information on small objects (e.g., cars) is preserved, and fewer samples are misassigned between ground and the other classes. Clustering is then performed on the off-ground points.
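The plane-fitting step that RANSAC contributes to Algorithm 1 can be sketched as below. This shows only a generic RANSAC plane fit, not the full CSF integration (which is given in Appendix A); the iteration count and distance tolerance are illustrative defaults of our own.

```python
import numpy as np

def ransac_plane(points, n_iter=200, tol=0.2, rng=None):
    """Fit a dominant plane by RANSAC; returns (normal, d, inlier_mask).
    tol is the point-to-plane distance threshold (assumed to be in meters)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best_mask, best_model = np.zeros(len(points), bool), None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                    # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        mask = np.abs(points @ normal + d) < tol   # points close to the plane
        if mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model[0], best_model[1], best_mask
```

In the ground-filtering context, the inlier mask of a near-horizontal dominant plane refines the ground set, while the outliers join the off-ground set used for clustering.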

Off-Ground Points Clustering
In this section, we first derive an over-segmentation of the ALS point cloud by applying the mean-shift algorithm [32,33], a hill-climbing algorithm based on kernel density estimation that does not require the number of clusters to be specified in advance. An adaptive gradient ascent is applied in the iterations of this algorithm, where the shift vector m is larger in areas of low point density and smaller in areas of high point density [4]. An isotropic Gaussian kernel Γ is adopted, and the shift vector m of point x can be defined as: m(x) = [Σ_{x_i ∈ S_r} Γ(‖(x_i − x)/γ‖²) x_i] / [Σ_{x_i ∈ S_r} Γ(‖(x_i − x)/γ‖²)] − x, where S_r represents the set of the current point's neighbors within the radius r, and γ denotes the kernel width, selected based on the point distribution of the considered scene.
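A minimal sketch of this mean-shift iteration is shown below, assuming the Gaussian kernel and the radius/bandwidth parameters r and γ from the text; the convergence tolerance and iteration cap are our own illustrative choices.

```python
import numpy as np

def mean_shift_vector(x, cloud, r=2.0, gamma=1.0):
    """Shift vector m(x): Gaussian-weighted mean of neighbors within radius r, minus x."""
    d = np.linalg.norm(cloud - x, axis=1)
    S = cloud[d < r]                                     # neighbor set S_r
    w = np.exp(-(np.linalg.norm(S - x, axis=1) / gamma) ** 2)
    return (w[:, None] * S).sum(axis=0) / w.sum() - x

def mean_shift_mode(x, cloud, r=2.0, gamma=1.0, tol=1e-3, max_iter=100):
    """Climb the kernel density estimate until the shift magnitude falls below tol."""
    for _ in range(max_iter):
        m = mean_shift_vector(x, cloud, r, gamma)
        x = x + m
        if np.linalg.norm(m) < tol:
            break
    return x
```

Points converging to the same mode are grouped into one initial cluster, producing the over-segmentation refined later by the constrained merging step.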

Figure 2. Results contrast between Cloth Simulation Filter (CSF) and Algorithm 1: (a) ground points obtained from the CSF algorithm, where details of misjudgment are also shown; (b) off-ground points obtained from the CSF algorithm, where much information on small objects is lost; (c) ground points obtained from Algorithm 1, where error samples between ground and other classes are clearly refined; (d) off-ground points obtained from Algorithm 1, which preserve enough information on small objects, directly affecting the results, for the higher-order potential.
In this work, off-ground ALS data is heterogeneous, and it is hard to distinguish different classes close to each other in distance space (e.g., car and building, or building and vegetation). A constrained mean-shift algorithm is therefore proposed, i.e., a post-processing step for the initial over-segmentation performed by the mean-shift algorithm, in which two initial clusters with low dissimilarity are combined into one cluster. Two constraints are used to discriminate the dissimilarity between initial clusters:
• Constraint 1: local connectivity. Local connectivity is measured by the minimum Euclidean distance between points p_1 ∈ c_m and p_2 ∈ c_n: min_{p_1 ∈ c_m, p_2 ∈ c_n} d(p_1, p_2) < th_d, where d(·) is the Euclidean distance between the initial clusters c_m and c_n, and th_d is the threshold of the constraint.
• Constraint 2: structure correlation. ‖log(T_m) − log(T_n)‖_F < th_t, where T_m ∈ R^{3×3} and T_n ∈ R^{3×3} are the 3D structure tensors of the mth and nth clusters, log(·) is the matrix logarithm operator, ‖·‖_F the Frobenius norm [34], and th_t the threshold of the constraint.
The pseudocode of Algorithm 2, which shows the details of the constrained mean shift algorithm, is presented in Appendix B.
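The two merge constraints can be illustrated by the following sketch. It is a simplified stand-in for Algorithm 2: the threshold defaults are illustrative, the pairwise distance is computed by brute force, and a small regularization is added before the matrix logarithm, all our own assumptions.

```python
import numpy as np

def structure_tensor(points):
    """3D structure tensor (covariance) of a (k x 3) cluster."""
    c = points - points.mean(axis=0)
    return c.T @ c / len(points)

def should_merge(cm, cn, th_d=0.5, th_t=1.0):
    """Merge test for two clusters under the two constraints of the text."""
    # Constraint 1: local connectivity (minimum point-to-point distance)
    d_min = np.min(np.linalg.norm(cm[:, None, :] - cn[None, :, :], axis=2))
    if d_min >= th_d:
        return False
    # Constraint 2: structure correlation via the matrix logarithm
    def logm_spd(T):
        w, V = np.linalg.eigh(T + 1e-9 * np.eye(3))  # regularize near-singular tensors
        return V @ np.diag(np.log(w)) @ V.T
    dist_t = np.linalg.norm(
        logm_spd(structure_tensor(cm)) - logm_spd(structure_tensor(cn)), "fro")
    return dist_t < th_t
```

Two adjacent patches with similar local geometry pass both tests and are merged, while distant or geometrically dissimilar clusters stay separate.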
Clusters of different classes exhibit different characteristics, which can be used to extract more discriminative cluster-based features. Clusters derived from the plain mean-shift algorithm, as shown in Figure 3a, are scattered and cluttered and cannot reveal class-specific information. In contrast, as shown in Figure 3b, the constrained clusters provide more accurate and discriminative information for cluster-based feature extraction.


Cluster-Based Feature Extraction
In contrast to point-based feature extraction, features are extracted at the cluster level in this section. Point-based features describe the details of a single point, whereas object-level information for the different classes can be obtained from clusters and used to derive higher-order potentials. Herein, five features are extracted from each cluster:

• Height F_H: Height above ground, measured at the barycenter of the cluster, is used to distinguish roofs from other classes (e.g., cars, low vegetation), as even the lowest roofs are generally higher than cars or low vegetation.

• Distribution of ground points F_G: A circular region centered on the cluster center is divided into angular bins. The distribution of ground points is described by the proportion of bins containing ground points [35]. This feature can be used to classify objects that are adjacent to the ground.

• Roughness F_R: Roughness is determined by the variance of the distances between the points and the plane fitted at the kernel size, namely the scale of a sphere containing the nearest points. Smooth surfaces, such as roofs and facades, can be distinguished by this feature from other classes (e.g., cars, vegetation).

• Compactness F_C: Compactness is measured by the volume of the convex hull divided by the area of each cluster, where the number of points in the cluster is taken as the area. Small compactness values are obtained for erect or small objects.

• Normal correlation F_N: This feature is measured by the correlation between the normal vectors of the cluster and the vertical direction of the horizontal plane, and performs better for regular classes than for other classes.
All of the above cluster-based features have been proven effective in distinguishing one or more classes from the others. As shown in Figure 4, each feature's discriminative capacity is visualized with color-coded values.
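Three of the five cluster-based cues can be sketched compactly from a cluster's structure tensor and barycenter, as below. This is a simplified illustration: F_G (angular ground bins) and F_C (convex-hull compactness) need additional geometry code and are omitted, and the plane normal is approximated by the smallest-eigenvalue direction.

```python
import numpy as np

def cluster_features(cluster, ground_height=0.0):
    """Sketch of F_H (height), F_R (roughness), F_N (normal correlation)
    for a (k x 3) cluster; ground_height is the local DTM height (assumed known)."""
    c = cluster - cluster.mean(axis=0)
    T = c.T @ c / len(cluster)                 # 3D structure tensor
    w, V = np.linalg.eigh(T)                   # ascending eigenvalues
    normal = V[:, 0]                           # plane normal: smallest-eigenvalue vector
    F_H = cluster[:, 2].mean() - ground_height # barycenter height above ground
    F_R = float(np.var(c @ normal))            # variance of point-to-plane distances
    F_N = abs(normal[2])                       # |cosine| with the vertical direction
    return F_H, F_R, F_N
```

A flat roof patch yields a large F_H, small F_R, and F_N near 1, while a vertical facade yields F_N near 0, matching the qualitative behavior described for these features.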


The LCS-CRF Model
To conveniently describe the semantic classification problems, we first establish the notations and definitions used throughout the paper. Consider the input ALS point cloud x = {v_1, v_2, · · · , v_N}, where v_i represents a 3D point corresponding to a vertex in a graphical model, and N is the total number of points. A labeled point cloud can be represented by a vector y ∈ Ω containing the labels y_i for all points. y_i takes its value from the label set L = {1, 2, · · · , l}, where l denotes the number of classes. Edges e_ij ∈ E are used to model the relations between pairs of adjacent points v_i and v_j. Then, an undirected graphical model with graph G(V, E), consisting of nodes V and edges E, can be constructed.

Pairwise CRF Model
The pairwise CRF model is widely used in semantic classification [13,36,37] to model the spatial interaction in both the labels and the observed values, which is of importance in semantic classification. It is a discriminative classification approach that directly models the posterior probability of the label y conditioned on the observed data x [38,39]. No more than two kinds of cliques are defined in a pairwise CRF. By the Hammersley-Clifford theorem, the CRF model can be expressed as a Gibbs distribution: P(y|x) = (1/Z(x)) exp(−Σ_{c ∈ C_G} φ_c(y_c|x)), where Z(x) is the partition function, C_G the set of all cliques, and φ_c(y_c|x) the potential function defined over the clique c to model the relationship of the random variables. An assignment of all the random variables (i.e., a labeling) takes values from Ω := L^N. Based on the Bayesian maximum a posteriori rule, the most likely labeling y* is inferred from the given observation: y* = argmax_{y ∈ Ω} P(y|x) = argmin_{y ∈ Ω} E(y|x). The semantic classification problem with the pairwise CRF model is therefore equivalent to minimizing the Gibbs energy function E(y|x), which can be described by the sum of the unary and pairwise potentials. As a special case of Equation (6), E(y|x) is formulated as: E(y|x) = Σ_i φ_i(y_i|x) + Σ_{i,j} φ_ij(y_i, y_j|x), where φ_i is the unary potential term, a proxy for the initial probability distribution across semantic classes, and φ_ij is the pairwise potential term that keeps smoothness and consistency between predictions.
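The Gibbs energy that the pairwise CRF minimizes can be evaluated for a candidate labeling as in the short sketch below. The tabular representation of the unary and pairwise costs is our own simplification for illustration.

```python
import numpy as np

def gibbs_energy(labels, unary, edges, pairwise):
    """E(y|x) = sum_i phi_i(y_i) + sum_(i,j) phi_ij(y_i, y_j).
    labels: list of class indices; unary: (N, L) cost table;
    edges: list of (i, j) index pairs; pairwise: (L, L) cost matrix."""
    E = sum(unary[i, labels[i]] for i in range(len(labels)))
    E += sum(pairwise[labels[i], labels[j]] for i, j in edges)
    return E
```

With a Potts-style pairwise matrix (zero on the diagonal, positive off it), spatially smooth labelings receive lower energy than fragmented ones, which is exactly the behavior the inference exploits.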


LCS-CRF Model
Compared with the pairwise CRF, richer statistics of the point cloud can be captured by LCS-CRF. The problem of misclassification among different classes can be efficiently addressed by encoding higher-order semantics information, which can be employed in the CRF model to improve the semantic classification performance. In our work, the potential functions are divided into three parts (i.e., unary, pairwise, and higher-order potentials) based on various cliques: E(y|x) = Σ_i φ_i(y_i|x) + Σ_{i,j} φ_ij(y_i, y_j|x) + Σ_{c ∈ C} φ_c(y_c|x), where C represents the set of higher-order cliques, and φ_c are the higher-order potentials defined over those cliques. Then, the mean-field approximate inference algorithm is employed to optimize the energy function and obtain the final labels. Specifically, the location, context, and semantics cues are congregated in a higher-order CRF model, and the flowchart of the LCS-CRF-based semantic classification implemented in our study is shown in Figure 5.


Point-based Features for Unary Potentials
The location information of point v_i and its optimal neighbors is used to determine the point-based feature vectors, by which the unary potential φ_i, linking the point to the class labels, determines the most probable label for a single point. The unary potential φ_i can be defined by any discriminative classifier with a probabilistic output [40].
An ensemble learning method, the RF classifier, is employed to produce the soft labeling results for the unary potentials. The RF classifier, which constructs a multitude of decision trees during training and integrates the class probabilities of the individual trees at the testing stage, has shown superior performance in terms of robustness, high accuracy, and feasibility for ALS data [9]. In the implementation, each decision tree casts a vote for the most likely class. If the number of votes cast for a class l is N_l, the unary potential is defined by φ_i(y_i = l|x) = −ln(N_l / N_T), where N_T is the total number of decision trees. Based on the point-based features, the location cues are directly used to discriminate the ALS points by their class membership probabilities.
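A minimal sketch of this vote-to-potential conversion is shown below, assuming the negative-log form common for CRF unary potentials; the small probability floor that avoids −ln(0) for classes with zero votes is our own addition.

```python
import numpy as np

def rf_unary(votes, n_trees):
    """Unary potential from RF votes: phi_i(l) = -ln(N_l / N_T).
    votes: per-class vote counts; n_trees: total number of decision trees N_T.
    A small floor guards against -ln(0) when a class receives no votes."""
    p = np.clip(np.asarray(votes, float) / n_trees, 1e-6, 1.0)
    return -np.log(p)
```

The class with the most votes receives the lowest potential (energy), so minimizing the unary term alone reproduces the RF's hard decision.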

Weighted Potts Model
The pairwise potential φ_ij incorporates the contextual cues based on the spatial smoothing dependence principle: based on prior spatial knowledge, neighboring points are expected to take the same label. The weighted Potts model has been shown to work well for semantic classification in many previous studies [41,42]. Herein, the pairwise potential takes the form: φ_ij(y_i, y_j) = µ(y_i, y_j)[w_1 exp(−‖p_i − p_j‖²/2θ_1²) + w_2 exp(−‖p_i − p_j‖²/2θ_2² − ‖x_i − x_j‖²/2θ_3²)], where x and p represent the observed values and 3D coordinates, respectively. The label compatibility function µ(·), the weights w_1 and w_2 of the spatial and bilateral kernels, and the parameters θ_1, θ_2, and θ_3 of the Gaussian kernels are learned on the training set with the implementation provided in Reference [43]. Based on the spatial relationship, contextual relations between classes can be modeled, and weighting factors are defined depending on how likely two classes are to occur near each other.

Higher-Order Potentials
Higher-order potentials are incorporated in the CRF model to capture richer perception between features and classes with semantics cues. In our work, the higher-order potentials are directly modeled from the cluster-based features with a sigmoid function. The sigmoid function, often used as the activation function in classification methods [44][45][46], is shown in Figure 6.


Before computing the higher-order energy of the CRF defined in Equation (9), the cluster-based features are normalized to [0, 1] to balance the perception between features and classes. Furthermore, because some features are discriminative and beneficial only for specific classes, the perception of all of the cluster-based features with regard to the labels on the two test datasets, described in Section 3.1, is summarized in Table 2. To simplify the description, the perception between a normalized feature f and each label y, R[·], can be modeled by: R[f] = 1/(1 + exp(−λ(f − ε))), where λ is the scale parameter, and ε the translation parameter. Table 2. Perception of the features for each class. The symbols "∝" and "−∝" indicate, respectively, that the feature value tends to be large or small in the corresponding class.
Specifically, some semantic rules are defined to adjust the higher-order potentials. Discriminative thresholds τ_H and τ_G for F_H and F_G, respectively, can be used to classify buildings and vehicles. Buildings and facades have a lower value of F_R, which must be smaller than a threshold τ_R. The values of τ_H, τ_G, and τ_R are semantically defined based on common knowledge and are generally suitable in all scenes. The higher-order potentials are then defined over the normalized set of cluster-based features, and we consider that all off-ground points in a cluster share the same higher-order potential. To reduce the complexity of inference, the higher-order potentials can be rewritten as class membership probabilities and turned into unary potentials [43]. The integrated unary potentials can be written as: φ̂_i(y_i|x) = ζφ_i(y_i|x) + (1 − ζ)φ_c(y_i|x), where ζ is a free parameter between 0 and 1 that balances the location cues and the semantics cues.
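The sigmoid perception and the ζ-weighted blending of location and semantics cues can be sketched as follows; the default λ, ε, and ζ values are illustrative, not values from the paper.

```python
import numpy as np

def perception(f, lam=10.0, eps=0.5):
    """Sigmoid perception R[f] = 1 / (1 + exp(-lam * (f - eps)))
    on a feature f normalized to [0, 1]; lam is the scale, eps the translation."""
    return 1.0 / (1.0 + np.exp(-lam * (np.asarray(f, float) - eps)))

def integrated_unary(phi_location, phi_semantic, zeta=0.5):
    """Blend location and semantics cues into one unary term:
    phi_hat = zeta * phi_i + (1 - zeta) * phi_c, with zeta in [0, 1]."""
    return zeta * np.asarray(phi_location) + (1 - zeta) * np.asarray(phi_semantic)
```

With ζ = 1 the model falls back to the pure RF-based unary; with ζ = 0 only the cluster-level semantics cues drive the labeling.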

Evaluation Metrics
For evaluation, we compare the derived semantic labeling to the ground truth on a per-point basis. The confusion matrix and five commonly used measures are employed. The evaluation metrics are represented by overall accuracy (OA), Kappa coefficient (KA), recall (R), precision (P), and F 1 -score. Generally, the number of examples per class is inhomogeneous in the test data, and then OA and KA are used to reflect the overall performance and the degree of consistency. Meanwhile, R represents a measure of completeness or quantity, and P represents a measure of exactness or quality. The F 1 -score is a compound metric which combines P and R with equal weights. Appendix C describes the formulas in detail.
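These metrics follow the standard definitions from the confusion matrix (detailed in Appendix C) and can be computed as in the sketch below; the row/column orientation convention is our own assumption.

```python
import numpy as np

def metrics(conf):
    """OA, Kappa, and per-class precision, recall, F1 from a confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    conf = np.asarray(conf, float)
    n = conf.sum()
    oa = np.trace(conf) / n                                   # overall accuracy
    pe = (conf.sum(0) * conf.sum(1)).sum() / n ** 2           # chance agreement
    kappa = (oa - pe) / (1 - pe)
    recall = np.diag(conf) / np.maximum(conf.sum(1), 1e-12)   # completeness
    precision = np.diag(conf) / np.maximum(conf.sum(0), 1e-12)  # exactness
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return oa, kappa, precision, recall, f1
```

The small denominators guard against division by zero for classes absent from the reference or the prediction.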

Experimental Analysis
To evaluate the performance of the proposed LCS-CRF algorithm, experiments on two ALS datasets were performed on a Windows 10 64-bit machine with an Intel Core i7-4790K 4.00 GHz processor and 32 GB of RAM, using the Python language.

Study Areas
Two labeled benchmark datasets, the Vaihingen Dataset (Figure 7) and GML Dataset A (Figure 8), are employed to evaluate our methodology on ALS data of different characteristics. The GML Dataset A is provided by the Graphics & Media Lab, Moscow State University, and is publicly available. This dataset was acquired with an ALTM 2050 system (Optech Inc.) and contains about 2.077 M labeled 3D points, whereby the reference labeling was performed with respect to five semantic classes (namely, ground, building, car, tree, and low vegetation). For this dataset, a split into a training scene and a test scene is provided. For each point, its XYZ-coordinates are provided without an intensity value. The Vaihingen Dataset is provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) and was acquired with a Leica ALS50 system over Vaihingen, Germany, with an average point density of 4 points/m². In the scope of the ISPRS Benchmark on 3D Semantic Labeling, a reference labeling was performed with respect to nine semantic classes (namely, power line, low vegetation, impervious surfaces, car, fence/hedge, roof, facade, shrub, and tree). Thereby, each point in the dataset is labeled accordingly [9]. For this dataset, containing about 1.166 M points in total, a split into a training scene (about 754 k points) and a test scene (about 412 k points) is provided. For each point, its XYZ-coordinates and intensity value are provided.


Qualitative Comparison
In this section, we mainly focus on the analysis of three stages, i.e., ground point filtering, off-ground point clustering, and LCS-CRF inference.
To visually compare our proposed Algorithm 1 with the CSF method, some small parts with meaningful information are selected from the Vaihingen Dataset and GML Dataset A, as shown in Figure 9. Each group in Figure 9 (Figure 9a-h) compares the off-ground point filtering results of the CSF method and our proposed Algorithm 1. We can observe that some confusing object information, especially for small-sized objects, can be extracted from the ground point set obtained by the CSF method. Not only can our method extract off-ground points from the ground point set, but it can also enhance the reliability of the higher-order potentials by eliminating misclassification between off-ground and ground points. Yet, it has two shortcomings: (1) a fraction of ground points are filtered as off-ground points, which causes a coarse cluster-based classification result; and (2) different parameters must be explored for diverse ALS data. To overcome these shortcomings, we further treat the ground as one of the objects classified in the calculation of the higher-order potentials. Besides, a sensitivity analysis for the parameters is given in Section 3.4.1.
Compared with point-based features, the cluster-based features provide new attributes upon which semantics cues can be effectively employed. We define five cluster-based features for the derivation of the higher-order potentials, which relate closely to the clustering results of the off-ground points. Figure 10 presents the clustering results for the test data from the Vaihingen Dataset and GML Dataset A, based on the off-ground points extracted with Algorithm 1. As shown in Figure 10, three kinds of classes tend to be aggregated into single clusters: class roof (green in Figure 10a)/building (blue in Figure 10c), which is far from the ground and has a smooth surface; class car (cyan in Figure 10a and reseda in Figure 10c), which has a high correlation with the ground; and class tree (yellow in Figure 10a)/high vegetation (orange in Figure 10c), which has a rough surface. We can make the utmost of semantics cues on these clusters. Due to the similarity of attributes between some different classes, mis-clusters, i.e., clusters containing multiple classes, also exist in the clustering results. We therefore employ the clustering result to define the higher-order potential in the LCS-CRF model, rather than as the final semantic classification result. In the LCS-CRF model, we integrate the point-based and cluster-based features, which capture different attributes for each point and complement each other.
To better evaluate the effectiveness of the LCS-CRF model, the qualitative results of three classification algorithms (i.e., RF, CRF, and LCS-CRF) on the two test datasets are shown in Figures 11 and 12, respectively. To learn the RF models, 400 trees are sufficient in our work. One thousand training samples for each class are randomly chosen from the reference ground-truth data of the Vaihingen Dataset and GML Dataset A. The performance of RF in the case of limited training samples is shown in Figure 11a,b and Figure 12a,b.
The soft labeling results for each class, produced by RF, are considered as the unary term of CRF and LCS-CRF.
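As a sketch of this step, the soft labels can be obtained from scikit-learn's RandomForestClassifier via predict_proba and converted to unary energies with a negative logarithm. The synthetic feature matrix, the five-class setup, and the negative-log link are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch: train an RF on per-point features and use its class
# posteriors as (negative-log) unary potentials for the CRF. The feature
# values below are synthetic stand-ins for the 34 point-based features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 34))          # 1000 training points, 34 features
y_train = rng.integers(0, 5, size=1000)        # 5 hypothetical classes
X_test = rng.normal(size=(200, 34))

rf = RandomForestClassifier(n_estimators=400, random_state=0).fit(X_train, y_train)
proba = rf.predict_proba(X_test)               # soft labels, one row per point
unary = -np.log(np.clip(proba, 1e-10, 1.0))    # unary energies for the CRF
```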
As can be seen in Figure 11a,b, RF results in a discontinuous shape with many discrete points, due to the lack of consideration of spatial contextual information. By considering the contextual information to alleviate the effect of noise, CRF can deliver a smoother classification map. Although the classification performance of a CRF model can be promoted dramatically by incorporating contextual information compared with the RF method, the two models differ in how well they keep useful details. Due to the similarity between the point-based features of different classes (e.g., ground and low vegetation, tree and roof), misclassified points are aggregated together, as shown in Figure 11d, which directly affects the accurate interpretation of the various classes. It is a challenging task to accurately discriminate similar classes. On the whole, however, our proposed LCS-CRF model achieves semantic classification results with fewer misclassified regions and less salt-and-pepper classification noise by employing location-context-semantics cues. As shown in Figure 11e,f, the proposed model shows competitive visual performance and can preserve useful detail information.
To verify the robustness of our method, another high-resolution ALS dataset acquired with a different sensor is used to assess the performance of the proposed method. Similarly, the semantic classification results of GML Dataset A obtained by the three methods, i.e., RF, CRF, and LCS-CRF, are shown in Figure 12. Similar to the above test, CRF delivers smoother results than RF and an improvement in classification accuracy. Compared with the RF model, CRF tends to greatly reduce the classification noise based on context cues; however, some potentially useful details may also be eliminated. In this experiment, there is only a slight difference between the point-based features of the classes car and low vegetation, which are therefore easily confused, as shown in Figure 12c,d. With the proposed LCS-CRF model, not only the location and context information are considered, but semantics cues are also fused to alleviate this misclassification effectively. The visual results in Figure 12e,f show an improvement for the car and low vegetation classification.
It is observed that our proposed method outperforms RF and CRF. An improvement in the quantitative metrics will be analyzed in the next section, in which the quantitative performances of Vaihingen Dataset and GML Dataset A are also reported.


Quantitative Comparison
In this section, the corresponding quantitative performances of Vaihingen Dataset and GML Dataset A are reported and analyzed. In accordance with Figure 11e,f and Figure 12e,f, our method can correctly label most of the test data. It can achieve a high OA of 83.1% and KA of 78.5% on the Vaihingen Dataset with eight categories of objects and a high OA of 94.3% and KA of 89.3% on the GML Dataset A with five categories of objects.
We classify semantic classification methods for the Vaihingen Dataset into two categories: traditional machine learning-based and deep learning-based. We compare our method with the result provided in Reference [26] and with the submitted results with published papers provided by the ISPRS Semantic Labeling Benchmark. References [5,47,48] adopted traditional machine learning classifiers to classify ALS point clouds, while References [49-52] leveraged deep learning for the semantic classification. For the sake of clarity and readability, the results achieved by each research group and by our model (namely, LCS-CRF) are listed for comparison in Table 3, which reports the scores per class for each method and the corresponding overall accuracy (OA) and F1 (%). We perform experiments on another ALS dataset, i.e., GML Dataset A, to verify the effectiveness of our method. The LCS-CRF model ranks first in terms of OA and F1 compared with the other methods listed in Table 4.

Sensitivity Analysis for Parameters
In our experiments, the LCS-CRF model obtained a good classification performance. However, many parameters in the LCS-CRF model have to be determined, and they play an important role in the classification. These parameters are distributed across three parts, i.e., Algorithm 1, Algorithm 2, and the higher-order potentials.

Parameters for Algorithm 1
The implementation of CSF requires three essential parameters: GR, which determines the number of particles; CT, which thresholds the distances between points and the simulated terrain; and MI, which ends the simulation process. To study the sensitivity of GR and CT for the CSF algorithm, MI is set to 200, which is sufficient for our scenes. GR varies from 0.2 to 1.2 and from 0.2 to 1.0 for the test data of the Vaihingen Dataset and GML Dataset A, respectively, with a step of 0.2. CT is selected from 0.3 to 1.3 and from 0.4 to 2.4 for the test data of the Vaihingen Dataset and GML Dataset A, respectively, with a step of 0.4. The sensitivity analysis for these parameters is presented in Figure 13. As can be observed, the best results, which are used as the initial input of Algorithm 1, are obtained with GR equal to 0.6 and CT equal to 0.5 for the Vaihingen Dataset, and GR equal to 0.4 and CT equal to 1.2 for GML Dataset A.
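The sweep itself is a plain two-parameter grid search. A minimal sketch follows, in which `run_csf` and `evaluate_oa` are hypothetical stand-ins for the CSF filter and the ground/off-ground OA evaluation; the toy score peaking near GR = 0.6 is only for illustration.

```python
import itertools

def grid_search(run_csf, evaluate_oa, gr_values, ct_values):
    """Exhaustively sweep GR and CT, keeping the pair that yields the highest
    ground/off-ground overall accuracy."""
    best = None
    for gr, ct in itertools.product(gr_values, ct_values):
        oa = evaluate_oa(run_csf(gr, ct))
        if best is None or oa > best[0]:
            best = (oa, gr, ct)
    return best

# Toy stand-ins: run_csf forwards the parameters, and the score peaks near
# GR = 0.6 (the Vaihingen optimum reported above).
oa, gr, ct = grid_search(
    run_csf=lambda gr, ct: (gr, ct),
    evaluate_oa=lambda p: 1.0 - abs(p[0] - 0.6) - abs(p[1] - 0.5),
    gr_values=[0.2, 0.4, 0.6, 0.8, 1.0, 1.2],
    ct_values=[0.3, 0.7, 1.1],
)
```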
Sensors 2020, 20, 1700 20 of 29
Although the ground point filtering accuracy reaches 95.1% and 96.0% for the Vaihingen Dataset and GML Dataset A, respectively, more detailed off-ground object information, especially for objects with small size and low height, is essential in our scene to improve the semantic classification results. We therefore apply RANSAC to the ground points obtained by CSF to enrich the off-ground information while maintaining sufficient filtering accuracy. The behavior of RANSAC for each point is mostly determined by two thresholds: the maximum distance, which distinguishes initial inliers among the current point's neighbors, and the minimum inlier ratio, which determines whether the current point is an element of the ground point set on the premise that it belongs to the initial inliers.
To find appropriate values of the maximum distance and the minimum inlier ratio, we test the procedure with the maximum distance varying from 0.1 to 0.4 with a step of 0.05 and from 0.1 to 0.8 with a step of 0.1 for the test data of the Vaihingen Dataset and GML Dataset A, respectively. It is worth noting that 0.4 and 0.8 are not cut-off values of the maximum distance; they only delimit the range over which the variation tendency of the OA for ground/off-ground points is observed. The minimum inlier ratio varies from 0.5 to 0.8 with a step of 0.05 for the two test datasets. To evaluate the filtering results, we use the OA for ground/off-ground separation. The analysis of these two parameters is shown in Figure 14. We can observe that the OA for ground/off-ground points converges to a certain value, since higher maximum distances and lower minimum inlier ratios have only a slight influence on the OA. To make the results more reliable, the visual results, parts of which are shown in Figure 9 and which show the details directly, are also considered when determining the values of these two parameters. In order to obtain more details of the off-ground object information while keeping the OA, the parameters listed in Table 5 are used to perform Algorithm 1. These parameters are determined based on the experimental results (as shown in Figures 9 and 14) and the properties of the input ALS point cloud.
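A minimal sketch of such a RANSAC refinement step, under the assumption that the local ground surface is modeled as a plane z = a·x + b·y + c (the paper does not specify the exact surface model):

```python
import numpy as np

def ransac_plane_inlier_ratio(pts, max_dist, n_iter=100, seed=0):
    """RANSAC-fit a plane z = a*x + b*y + c to a neighborhood `pts` (N x 3) and
    return the best inlier ratio together with the inlier mask. In the
    refinement sketched here, a CSF ground point would stay in the ground set
    only if the ratio exceeds the minimum inlier ratio and the point itself is
    an inlier."""
    rng = np.random.default_rng(seed)
    best_ratio, best_mask = 0.0, np.zeros(len(pts), dtype=bool)
    for _ in range(n_iter):
        sample = pts[rng.choice(len(pts), size=3, replace=False)]
        A = np.c_[sample[:, :2], np.ones(3)]
        if abs(np.linalg.det(A)) < 1e-12:
            continue                       # degenerate (xy-collinear) sample
        a, b, c = np.linalg.solve(A, sample[:, 2])
        resid = np.abs(pts[:, 2] - (a * pts[:, 0] + b * pts[:, 1] + c))
        mask = resid <= max_dist           # points within the distance threshold
        if mask.mean() > best_ratio:
            best_ratio, best_mask = mask.mean(), mask
    return best_ratio, best_mask
```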

Parameters for Algorithm 2
Algorithm 2 is proposed to produce clusters of off-ground points, which can be used to extract discriminative cluster-based features. In the first step, two parameters, r and γ, are selected for the mean-shift algorithm based on prior knowledge about the expected point distribution of the scene we consider. Then, parameters k, th_d, and th_t, described in Section 2.1.3, are determined for the post-processing step.
Herein, the performance of Algorithm 2 is mainly evaluated based on the visual result, and an experimental example is shown in Figure 3. We therefore only provide the configuration of these parameters for the Vaihingen Dataset and GML Dataset A, as shown in Table 6.
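Assuming a standard mean-shift implementation such as scikit-learn's MeanShift (with the bandwidth playing the role of r; the toy coordinates below are illustrative, not ALS data), the clustering step can be sketched as:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Two synthetic off-ground blobs standing in for a roof and a tree crown.
rng = np.random.default_rng(1)
roof = rng.normal(loc=[0.0, 0.0, 10.0], scale=0.3, size=(50, 3))
tree = rng.normal(loc=[15.0, 15.0, 8.0], scale=0.3, size=(50, 3))
pts = np.vstack([roof, tree])

# Mean shift groups points by spatial density; a post-processing pass could
# then merge or split small clusters as in Algorithm 2.
labels = MeanShift(bandwidth=2.0).fit_predict(pts)   # one label per point
```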

Parameters for Higher-Order Potentials
In the LCS-CRF model, the higher-order potentials are derived with semantics cues based on a Sigmoid function. Two parameters determine the formulation of the Sigmoid function, denoted as λ and ε, respectively. Parameter λ mainly controls the scaling of the Sigmoid function, while ε controls its translation. In this section, we also normalize the cluster-based features into [0,1], and parameter ε is then set to 0.5 to be consistent with the distribution of the cluster-based features. The Sigmoid function with different values of parameter λ is shown in Figure 15. The datum line, represented by a red straight line, is treated as a reference to the Sigmoid function; it means that the values of the cluster-based features are used directly for the calculation of the higher-order potentials. The different curves in the figure represent the projected values of the cluster-based features through the Sigmoid function with different λ. We employ the Sigmoid function to enhance the discrimination of the cluster-based features and thus obtain a better classification result. However, there are a few misjudgments in the cluster-based features, which are used to obtain the higher-order potential based on the regulations described in Section 2.3.3. The corresponding analysis for parameter λ is therefore given to test its effect in the LCS-CRF algorithm.
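One plausible parameterization consistent with this description (our assumption, not necessarily the paper's exact formula) scales the Sigmoid by λ and translates it by ε = 0.5:

```python
import numpy as np

def sigmoid_transform(f, lam, eps=0.5):
    """Project a cluster-based feature f in [0, 1] through a Sigmoid whose
    scale is controlled by lam and whose translation by eps. Values above eps
    are pushed toward 1 and values below toward 0, sharpening the
    discrimination of the feature; larger lam sharpens more. At lam -> 0 the
    curve flattens, while the datum line corresponds to using f directly."""
    return 1.0 / (1.0 + np.exp(-lam * (f - eps)))
```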
In order to study the sensitivity of the parameter λ for our method, the other parameters are set to constant values.
Experiments are conducted to analyze the effect of the parameter λ, which is varied from 2 to 12 with a step of 2 for the Vaihingen Dataset and GML Dataset A. The sensitivity analysis for the parameter λ is presented in Figure 16. To make the results more concise, we also compute the variation tendency of the OA under different settings of λ, as shown in Figure 16a,b. The parameter λ shows an obvious impact on the OA compared with employing the datum function, and the relative importance of the higher-order potential increases as λ increases. We can observe that the OA first increases with λ, since the semantic rules are properly exploited by the Sigmoid function to enhance the discrimination of the cluster-based features. The OA then stops increasing at a certain value of λ (around 6 for the Vaihingen Dataset and around 8 for GML Dataset A) and even shows a slight decreasing trend, since large variations of the cluster-based features can accumulate noise from the cluster-based features and cause misjudgments of clusters. The red dotted lines in Figure 16, serving as a reference, represent the classification results based on the higher-order potentials derived by the datum function.
Another parameter, ζ, which mainly controls the influence of the higher-order potentials in the classification, is also analyzed on the Vaihingen Dataset and GML Dataset A. As shown in Figure 17, ζ is selected from 0 to 1 with a step of 0.1, while the other parameters are held constant. The OA gradually increases at first with increasing ζ, where the semantic rules dominate over the location information in the unary potential. After ζ reaches a certain value (around 0.6 for the Vaihingen Dataset and around 0.7 for GML Dataset A), the OA shows a slight decreasing trend, since the unary potential becomes dominant as ζ increases further. When ζ equals 1, the overall accuracies for the Vaihingen Dataset and GML Dataset A reach 0.783 and 0.924, respectively, where the classification result is obtained by the CRF model alone. An obvious improvement of the classification results is observed on both test datasets by integrating higher-order potentials, compared with the results directly derived by the CRF model.
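A minimal sketch of this trade-off, under the assumption that ζ linearly weights the unary/pairwise terms against the higher-order term (chosen so that ζ = 1 reduces to the plain CRF, matching the reported behavior; the paper's exact combination rule may differ):

```python
def combined_energy(e_unary, e_pairwise, e_higher, zeta):
    """Blend the location/context energies with the semantics-based
    higher-order energy. At zeta = 1 the higher-order term vanishes and the
    model reduces to the plain CRF; at zeta = 0 only the semantics term
    remains. Intermediate values trade the two off."""
    return zeta * (e_unary + e_pairwise) + (1.0 - zeta) * e_higher
```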

Discussion
From Table 3, we can observe that the OA of the LCS-CRF model is the best among all of the traditional machine learning-based methods. As far as the eight specific classes are concerned, our method ranks first in the imp_sur, car, and shrub classes within the traditional machine learning-based methods, and its P surpasses the previous highest results by clear margins (+1.1%, +2.6%, and +6.1%). The RF model performs semantic classification for ALS data mainly based on the point-based features, which are derived from the location cues of the points. The CRF model integrates the location and contextual cues and shows a smoother result compared with the RF model (as shown in Figure 11). The LCS-CRF model shows a superior result by incorporating location, context, and semantics cues into a higher-order CRF model. Especially for the car class, a great improvement in P is obtained by adding semantics cues. The class low_veg, with a higher P, mainly benefits from Algorithm 1. The OA of the LCS-CRF model ranks first among the traditional machine learning-based methods and third among the deep learning-based methods, with minor disadvantages (1.8% and 2.1% lower than the second and the first OA, respectively). Though some deep learning-based methods perform better than our method, the LCS-CRF model can satisfy the general demand with lower training costs.
In Table 4, the P of the car class with the LCS-CRF model surpasses the results of the RF and CRF models by 26% and 22.5%, respectively, which means that semantics cues play an important role in the semantic classification.
We perform the methods RF+LBP and RF+α-exp by adding a regularization framework to smooth the semantic results derived by the RF model. Though significant improvements are shown in the building, car, and low vegetation classes compared with the RF model, the OA of RF+LBP and RF+α-exp is still less than 90%. The P of the car class for our method is superior to the others, and plausible results are shown in the ground, building, tree, and low vegetation classes, which validates our proposed method.
In comparison to other approaches, our method shows several strengths. We compare the results achieved with our methodology to those obtained by recent approaches. Reference [5] proposed a hierarchical higher-order CRF framework, in which spatial and context information were integrated via a two-layer CRF. The Robust P^n Potts model was utilized to build the higher-order potential in their first-layer CRF. Their framework iterated and mutually propagated context to improve the classification results. The results of their framework on the Vaihingen Dataset are listed in Table 3 (LUH); they show outstanding performance in F1 and reveal a rather high quality of the results in several classes. In contrast, our methodology additionally integrates the semantics cue in a higher-order CRF, which is a one-layer CRF with neither iteration nor propagation of context, and shows obvious increases in class car and in OA by 5.8% and 1.5%, respectively. Currently, the only approach delivering semantic classification results of higher quality (with OA = 85.2% and F1 = 69.3%) for the Vaihingen Dataset is the one presented in Reference [52], which leverages deep learning for the semantic labeling of ALS point clouds. Yet, a multi-convolutional neural network (MCNN) was trained to automatically learn deep features of each point from the generated contextual images across multiple scales, which was time-consuming in the training process and had relatively high hardware requirements, while the proposed LCS-CRF framework only employs explicit point-based and cluster-based features. Comparable results can be observed in Table 3, with P in the classes imp_sur (+0.1%), car (+8.8%), facade (−1.9%), and shrub (−0.5%), and with the OA (−2.1%). Compared with References [49] and [50], which also adopted deep learning for the semantic classification, the OA is raised by 1.6% and 1.5%, respectively, in our framework, and P shows better performance in several classes, especially in class car.
Due to the consideration of multi-scale neighborhoods, Reference [26] obtained an improved performance on the GML Dataset A by exploring contextual information across different scales in the respectively extracted features, while we obtain the optimal neighbors with the algorithm proposed in Reference [7] and integrate meaningful semantics cues. As shown in Table 4, our method increases the OA by 3.8% and the F1 by 11.7%, and the P of three of the five classes is improved. The methods RF+LBP and RF+α-exp, which were performed based on the methodology proposed in Reference [25], constructed graph models and employed structured regularization for spatially smooth semantic labeling of point clouds. In our method, not only is spatial information utilized, but context and semantics cues are also integrated in a posterior probability model. In contrast with these two methods, our method better addresses some hard-to-retrieve classes, such as car and low vegetation, and increases the OA by 8.3% and 6.5%, as observed in Table 4.
Experimental results suggest that the LCS-CRF model shows superior performance in the semantic classification of ALS data. However, there are still some misclassifications in the results. For the Vaihingen Dataset, the classes fence and facade are at a disadvantage due to their attributes, including their small cardinality, sparsity, and similar characteristics to some other classes. A close-up visual inspection shows that the class fence is often classified as low_veg or shrub, which adversely affects the OA and F1. For the GML Dataset A, the classes building and car produce lower precisions compared with the classes ground and tree. Based on a visual inspection of the test data, buildings with small height show attributes similar to the classes ground and car, due to their planarity and clustering. Low vegetation with smaller clusters is easily classified as car; the P of class car is very sensitive to this, due to the extremely small size of the class compared with the whole test dataset.
As shown in Section 3.3, the parameters of three parts, i.e., Algorithm 1, Algorithm 2, and the higher-order potentials, are analyzed. Most parameter values are tested over a general interval based on the attributes of the point clouds and common experience. With the hardware described in Section 3, it takes about 1.5 h to calibrate the parameters of the first and second parts on both the Vaihingen Dataset and the GML Dataset A. Determining the parameters of the third part incurs a heavier time cost due to the large scale of the ALS point clouds, and each inference on the LCS-CRF model takes about 1.2 h. Parallel computing is therefore utilized to speed up the process to a great extent. Once the parameters are determined, automatic interpretation can be performed on large-scale ALS point clouds. In addition, it takes only about 0.5 h to train a CRF model on the Vaihingen Dataset in our work, whereas training in a deep learning framework takes about three to six days [54].

Conclusions
In this paper, we presented an LCS-CRF model for ALS data semantic classification. The main novelty of this framework consists of the integration of location, context, and semantics cues from irregularly distributed ALS points into a higher-order CRF framework to semantically label point clouds. The method proceeds in three main stages, i.e., (i) feature extraction; (ii) off-ground points extraction and clustering; and (iii) classification. A total of 34 point-based features from the point locations and 5 cluster-based features from off-ground point clusters are extracted to form the feature space. To effectively employ the semantics cues, off-ground points extraction and clustering are performed for the cluster-based feature extraction. Based on the location and semantics cues, the unary potentials and higher-order potentials are derived with the RF classifier and the sigmoid function. The context information between neighboring points is then integrated into the higher-order CRF as a pairwise potential to smooth the classification results. Therefore, the location, context, and semantics cues are, respectively, formulated as unary, pairwise, and higher-order potentials within the probabilistic LCS-CRF model to alleviate misclassification. The experiments with two ALS point cloud datasets confirm the competitive semantic classification performance of the proposed method in both the qualitative and quantitative evaluations.
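The decomposition summarized above can be illustrated with a minimal energy function that sums the three potential families. This is a simplified sketch, not the paper's exact formulation: the unary costs are assumed to be negative log-probabilities (e.g., from an RF classifier), the pairwise term is reduced to a Potts penalty on disagreeing neighbors, and the higher-order term is a robust cluster-consistency penalty; the function names and weights are illustrative.

```python
import numpy as np

def lcs_crf_energy(unary, pairwise_edges, pairwise_weight,
                   clusters, higher_order_weight, labels):
    """Total energy of a simplified LCS-CRF: unary + pairwise + higher-order.

    unary: (n_points, n_classes) per-point costs (location cue).
    pairwise_edges: list of (i, j) neighbor pairs (context cue); a Potts
        penalty is applied when the two labels disagree.
    clusters: list of point-index lists (semantics cue); the penalty grows
        with the number of points deviating from the cluster's majority label.
    """
    n = len(labels)
    # Location cue: sum of per-point unary costs for the chosen labels.
    e_unary = sum(unary[i, labels[i]] for i in range(n))
    # Context cue: Potts smoothing over neighboring point pairs.
    e_pair = pairwise_weight * sum(
        1.0 for i, j in pairwise_edges if labels[i] != labels[j])
    # Semantics cue: penalize points disagreeing with their cluster majority.
    e_ho = 0.0
    for cluster in clusters:
        counts = np.bincount([labels[i] for i in cluster])
        e_ho += higher_order_weight * (len(cluster) - counts.max())
    return e_unary + e_pair + e_ho
```

Inference then amounts to finding the labeling that minimizes this energy, which the paper performs on the full graphical model rather than by enumeration.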
However, the classification results are sensitive to the parameter values. In our future work, further improvements aim at preserving more potentially useful details to improve the results with fewer parameters. We also intend to investigate the potential of deep learning adapted to ALS point cloud data.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The parameters Max Iterations (MI), Classification Threshold (CT), and Grid Resolution (GR) are utilized in the CSF algorithm and have been analyzed to obtain a better initial result in Section 3. The number of neighbors for a query point is determined by the parameter k. th1 and th2 are the thresholds of the RANSAC algorithm used to determine the inliers and to evaluate the inlier proportion, respectively.
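The roles of th1 and th2 can be sketched with a minimal RANSAC plane fit. This is an illustrative stand-in, not the paper's implementation: th1 is assumed to be a point-to-plane distance threshold for counting inliers, and th2 a minimum inlier proportion for accepting the fitted plane; the function name and defaults are hypothetical.

```python
import numpy as np

def ransac_plane_inliers(points, n_iter=100, th1=0.2, th2=0.5, rng=None):
    """Fit a plane to (n, 3) points by RANSAC.

    th1: point-to-plane distance below which a point counts as an inlier.
    th2: minimum inlier proportion for the best plane to be accepted.
    Returns (accepted, inlier_mask) for the best plane found.
    """
    rng = np.random.default_rng(rng)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                        # degenerate (collinear) sample
            continue
        normal /= norm
        dist = np.abs((points - p0) @ normal)   # point-to-plane distances
        mask = dist < th1                       # inlier test with th1
        if mask.sum() > best_mask.sum():
            best_mask = mask
    accepted = best_mask.mean() >= th2          # proportion test with th2
    return accepted, best_mask
```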

Appendix B
The parameters r and γ, the scales for the neighbor size and the Gaussian kernel, respectively, are utilized in the mean-shift algorithm and have been analyzed to obtain a better initial result in Section 3. The number of neighbors for a query cluster is determined by the parameter k. th_d and th_t are the constraints for local connectivity and structure correlation.

Algorithm 2. Constrained mean-shift algorithm
Input: ALS off-ground point set {NG}
Parameters: r, γ, k, th_d, th_t
1: Derive the initial clusters and cluster centers with the mean-shift algorithm: {C} & {C_cen} ← (NG, r, γ)
2: While true do
3:   For j = 1 to size{C} do
4:     If C_j ∈ {C} then
5:       Find neighbors for each cluster: N_j ← (C_cen, k)
6:       Compare C_j with cluster n_j ∈ N_j
7:       If local connectivity < th_d and structure correlation < th_t then
8:         C_j ← C_j ∪ n_j, {C} ← {C}\n_j
9:       End If
10:      End If
11:    End For
12:    If no merging happened, break
13: End While
Output: Final cluster set {C}

Appendix C
The OA, KA, R, P, and F 1 -score can be computed from the confusion matrix as follows:

OA = \frac{\sum_{i} x_{ii}}{\sum_{i}\sum_{j} x_{ij}}, \quad KA = \frac{OA - p_e}{1 - p_e}, \quad p_e = \frac{\sum_{i}\left(\sum_{j} x_{ij}\right)\left(\sum_{j} x_{ji}\right)}{\left(\sum_{i}\sum_{j} x_{ij}\right)^{2}}

R_i = \frac{x_{ii}}{\sum_{j} x_{ij}}, \quad P_i = \frac{x_{ii}}{\sum_{j} x_{ji}}, \quad F_1 = \frac{2\, P_i R_i}{P_i + R_i}

where x_ij represents the element of the confusion matrix in the ith row and jth column.
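These standard metrics can be computed directly from the confusion matrix; the short sketch below assumes the common convention that row i holds the reference class and column j the predicted class (the function name is illustrative).

```python
import numpy as np

def metrics_from_confusion(cm):
    """Compute OA, kappa (KA), per-class recall (R), precision (P),
    and F1 from a confusion matrix cm, where cm[i, j] counts
    reference-class-i samples predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                       # overall accuracy
    # Cohen's kappa: agreement corrected for chance agreement p_e
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
    kappa = (oa - pe) / (1 - pe)
    recall = np.diag(cm) / cm.sum(axis=1)           # R_i over row sums
    precision = np.diag(cm) / cm.sum(axis=0)        # P_i over column sums
    f1 = 2 * precision * recall / (precision + recall)
    return oa, kappa, recall, precision, f1
```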