Full-Waveform LiDAR Point Clouds Classification Based on Wavelet Support Vector Machine and Ensemble Learning

Light Detection and Ranging (LiDAR) produces 3D point clouds that describe ground objects, and has been used to make object interpretation in many cases. However, traditional LiDAR only records discrete echo signals and provides limited feature parameters of point clouds, while full-waveform LiDAR (FWL) records the backscattered echo in the form of a waveform, which provides more echo information. With the development of machine learning, support vector machine (SVM) is one of the commonly used classifiers to deal with high dimensional data via small amount of samples. Ensemble learning, which combines a set of base classifiers to determine the output result, is presented and SVM ensemble is used to improve the discrimination ability, owing to small differences in features between different types of data. In addition, previous kernel functions of SVM usually cause under-fitting or over-fitting that decreases the generalization performance. Hence, a series of kernel functions based on wavelet analysis are used to construct different wavelet SVMs (WSVMs) that improve the heterogeneity of ensemble system. Meanwhile, the parameters of SVM have a significant influence on the classification result. Therefore, in this paper, FWL point clouds are classified by WSVM ensemble and particle swarm optimization is used to find the optimal parameters of WSVM. Experimental results illustrate that the proposed method is robust and effective, and it is applicable to some practical work.


Introduction
The remote sensing technique is based on the sensor recorded energy that is reflected or emitted from the Earth's surface [1]. Therefore, remote sensing can acquire information about ground objects without physical contact and can be divided into two categories: passive and active. Passive remote sensing data mostly exist in the form of spectral images, which makes it difficult for them to describe 3D space features and they usually require geometric correction.
Light Detection and Ranging (LiDAR) is a type of active remote sensing technique. Different from the 2D information displayed by spectral images, the advantage of LiDAR is to directly generate point clouds with large area coverage and high-precision coordinates [2]. In most cases, LiDAR point clouds are used to produce elevation data, such as the Digital Elevation Model (DEM) and Digital Surface Model (DSM), which is typically employed as ancillary information to assist passive remote sensed data in classification [3][4][5][6]. With the hardware equipment of LiDAR, more and more features are extracted each classifier constructed by them, respectively, is used twice. At the same time, particle swarm optimization (PSO), which is a swarm intelligence algorithm with a good performance and stable convergence to the optimal solution, is utilized to optimize the SVM parameters.

Feature Extraction for FWL
According to the type of device, LiDAR point cloud classification can be conducted on the basis of many features, such as intensity, geometric, textural, and multispectral features. Compared with traditional LiDAR, FWL is able to extract more points, showing more details on structure. Therefore, geometric features are adopted to show the positional relationship between point clouds. However, it reflects the characteristics of discrete point clouds and neglects the interaction of the entire laser pulse with the ground objects. Hence, waveform features can be seen as a powerful complement to the classification process. For FWL, waveform decomposition of original data, whose essence is to fit the waveform with a function, is a useful technique to extract data information and should be performed before the process of classification. In this paper, before waveform decomposition, filtration of noise in the raw data [34] is conducted to reduce the noise interference for classification results, and then a set of Gaussian functions [35] is employed to fit the waveform. The relevant parameters of the function thus obtained are called waveform features of point clouds, reflecting the position and backscattering property of the targets during laser transmission.
Geometric features chosen for classification include the elevation (h), elevation standard deviation (σ h ), volume density (ρ), curve (c), vertical angle (N Z ), and vertical angle variance (σ N Z ). Before the extraction of geometric features, a restricted 3D neighborhood ω is defined. A covariance matrix of 3D coordinates and its eigenvalues λ 1 , λ 2 , and λ 3 (λ 1 < λ 2 < λ 3 ) are computed on the basis of 3D points in ω. The corresponding eigenvector of λ 1 is the normal vector (N X , N Y , N Z ). Geometric features take advantage of the geometric shape intuitively and show geometric relationships between point clouds. However, these features cannot give the radiometric information expressing the reflectance properties of ground objects. While waveform features provide both geometric information and radiometric information, which is used to distinguish different surface materials. Furthermore, compared with the geometric features extracted from a specific neighborhood, the waveform features can reflect the characteristics of the pulse emission direction directly.
Waveform features extracted in this paper are the echo amplitude (A), width (σ), peak location (u) and intensity (I). Amplitude is the peak value of the echo signal, width is the standard deviation of each Gaussian component, peak location is the maximum position of the Gaussian function and intensity is the backscattered energy of a single Gaussian component, which is calculated by integrating each Gaussian component. All used features and their explanations are shown in Table 1. [19].  ρ ρ = n/v ω v ω is the volume of ω. Generally speaking, ρ of building walls and trees is lower than others.
c reflects the shape of the surface of the object, and the canopy usually has a high value.
N Z / Deviation angle of a normal vector from the vertical direction, reflecting the flatness of the ground object.
The variance of the vertical angles of 3D points in ω, reflecting the shape of the ground object.  I / I is the amount of energy returned by the laser pulse interacting with the ground objects whose characteristic is similar to A.

Construction of WSVM Model
SVM is a machine learning method which classifies data by creating a hyper-plane. Assume a set of training data X = {(x 1 , y 1 ), (x 2 , y 2 ), ..., (x n , y n )}, where X i ∈ R d , and y i ∈ {−1, +1} is a class label. Then, the training process can be transformed into the optimization of the following expressions [36,37]: where ω is the normal vector of the hyper-plane, and i ( i ≤ 0) is a slack variable used for measuring classification errors. C is the penalty parameter, b is the bias or threshold, and φ is a function that maps the input data to a higher-dimensional space where data can be linearly separable. To avoid high-dimensional calculations, the mapping step can be skipped by kernel function, which is able to calculate the mapping directly. Additionally, the kernel function is defined as follows: The series of commonly used kernel functions are global functions rather than local functions, however SVM is an approximation model where a local function basis is better than a global basis [38,39]. The essence of wavelet analysis is to approximate signals using a set of functions generated by dilations and translations of a mother wavelet function [40], which means that a good localization property in the time and frequency domain is presented and the details of a signal are extracted [41]. It also has the advantages of a low redundancy, high stability, and well adaptability to high-dimensional data. The construction process of the wavelet kernel function is as follows [42,43]: The one-dimensional wavelet function can be described as: Then, the dot-product of the wavelet kernel is where h(x) is a mother wavelet, x, x ∈ R d , a is a dilation factor, and c is a translation factor. According to the translation invariant kernel theorem K(x, x ) = K(x − x ), Equation (5) can be transformed as: Then, the Gaussian, Shannon, Mexican Hat, Morlet, and Harmonic wavelet functions are used as mother wavelets to construct wavelet kernel functions and the wavelet kernel functions are defined as follows [44][45][46]: Gaussian: Shannon: Mexican Hat: Morlet: Harmonic: Wavelet kernel functions are able to approximate arbitrary functions with a good localization property and have the ability of multi-scale analysis [47]. They amplify the difference between the values of the samples' features and improve the stability of classifier model. Moreover, unlike the commonly used kernel functions that are correlative and redundant, the wavelet kernel functions are orthogonal [48]. These advantages mean that WSVM has a better generalization ability, higher accuracy, and lower computational complexity.

Parameter Optimization
SVM parameters have a great influence on the classification results. SVMs constructed by the above five wavelet kernel functions all have the penalty parameter C and kernel function parameter σ. C controls the generalization capacity of the classifier, while σ determines the distribution of data after mapping it into a new feature space. PSO is a population-based parallel search algorithm using a group of particles. It has been noticed that members of a group seem to share information among them, a fact that leads to increase the efficiency of the current group. A particle moves toward the optimum according to its present velocity, its previous experience, and the experience of its neighbors. In a n-D search space, the position and velocity of the ith particle are represented as vectors X i = x i,1 , ..., x i,n and V i = v i,1 , ..., v i,n , where each element is coded by real values. Let Pbest i and Gbest be the best position of the ith particle and the group's best position thus far, respectively. The velocity and position of each particle are updated as follows [49,50]: where V k i is the velocity of the ith particle at iteration k, ω is the inertia weight factor, c 1 and c 2 are the acceleration coefficients, r 1 and r 2 are random numbers between 0 and 1, and X k i is the position of the ith particle at iteration k. In the velocity updating process, the parameters such as ω, c 1 and c 2 should be determined in advance, which makes it cumbersome to solve large-scale optimization problems.
As PSO is used for SVM parameter optimization, whereby the particle whose vector value can acquire the highest classification accuracy is the optimal solution. Each dataset is divided into two parts for base classifiers: training and testing. Each iteration, the training samples are used for the construction of a base classifier, then a set of parameters is generated, and the testing samples are used to prove the validity of the parameters. After that, the classification accuracy of these parameters can be achieved by comparing the predicted and original labels. The optimal classification accuracy obtained in the current generation is recorded and compared with the highest classification accuracy in the last iteration. If the former wins, the optimal solution is replaced with the current solution. When the maximum number of iterations is reached, the final parameters are obtained.

WSVM Ensemble
Ensemble learning trains a series of base classifiers and combines their outputs by a fusion strategy, improving the generalization performance [51]. As mentioned in Section 1, an ensemble system that uses SVMs as base classifiers is called SVM ensemble [52]. Considering the heterogeneity and homogeneity of the system [53], while reducing the running time, the bagging algorithm is adopted to aggregate the base classifiers and five kernel functions mentioned in Section 3.1 are utilized to construct SVMs. Furthermore, each kernel function is used twice and a total of 10 WSVMs are utilized for the ensemble learning.
After each WSVM has output a predicted label, it is important to adopt a fusion strategy that determines the final label of the ensemble system. In this paper, majority voting is employed as the fusion strategy. Each classifier generates a predicted label, which is taken as a vote, and the final result is determined by the vote of each classifier. For example, there is a classification problem with n classes (Y = 1, 2, ..., n), a training dataset, an instance x to be classified and T classifiers. Each classifier outputs a label L i of x, where i = 1, 2, ..., T and L i ∈ Y. Let V be the final label of x, then where y ∈ Y; if a is true, p(a) = 1, otherwise p(a) = 0 and a is the output of base classifier. Majority voting treats the output of each classifier equally, determining the final result by counting the number of occurrences for each class. When encountering the same number of occurrences, in cases where it is difficult to decide, a random selection of these classes is adopted to determine the final result.
Following the feature extraction in Section 2, bootstrap sampling is used to generate sub-datasets from the original data, and the number of instances in each sub-dataset is not greater than that in the original data. Then, on the basis of the corresponding sub-dataset, each SVM obtains the optimal parameters by the continuous process of training and testing samples combined with PSO. After that, each base classifier predicts the labels with these parameters. Finally, majority voting is utilized to generate the final result. The schematic diagram of the proposed method is shown in Figure 1.

Implementation of the Proposed Method
The classification process of FWL point clouds is completed in this paper, employing WSVM ensemble. The detailed steps of the proposed method are as follows:

•
Step 1: Acquire FWL data and filter noise in the data.

•
Step 2: Decompose full-waveform LiDAR data and extract the features displayed in Table 1.

•
Step 3: Use bootstrap sampling to generate sub-datasets and assign them to each base classifier.

•
Step 4: Train base classifiers with the parameters of each particle in the population.

•
Step 5: Obtain the classification accuracy of each particle and update the population using Equations (12) and (13).

•
Step 6: Update global optimal accuracy and corresponding parameters.
Step 6.1: Compare the classification accuracy of the particles in the current generation with the global optimal accuracy.
Step 6.2: Determine whether to update the global optimal accuracy and the corresponding parameters on the basis of comparison result.
Step 6.3: Return to Step 4 until the number of iterations has reached the maximum value.

•
Step 7: Save the parameters achieved by the global optimal accuracy of each base classifier according to Steps 4-6, and take them as the parameters of each classifier.

•
Step 8: Each base classifier predicts the labels of data with their parameters.

•
Step 9: Output the final results with majority voting.

Experimental Results and Discussion
The experimental environment in this study was a computer with a 2.30GHz CPU and 8G of RAM. The data-processing operation was realized using MATLAB 2016a and VS2017 software. The manual classification process was accomplished using LiDAR software and visual interpretation by researchers with relevant working experience.

Experimental Platform and Data Information
The data used in this paper were acquired by airborne LiDAR system ALS60, in 2009, and in the form of Las 1.3. There were three study areas and the experimental data were colored according to elevation values, as shown in Figure 2a-c. Study Area 1 is flat and open, with a low point cloud density; Study Area 2 is a dense residential area; and Study Area 3 is mainly distributed with large buildings. The number of points, training and testing samples of the study areas and some other information about the study areas are shown in Table 2, and training and testing samples were generated by manual classification. To quicken the process, the number of points in each training subset generated by bootstrap sampling was one-tenth of the whole dataset. After the classification for all of the point clouds was completed, testing samples of these three study areas were used to validate the classification accuracy of the ensemble system.

Classification Results for Point Clouds
The classification results of the proposed classification method for FWL point clouds were compared with those of basic SVM, optimal single WSVMs, RF, ISODATA, and WSVM ensemble without parameter optimization. The classification results are shown in Figures 3-5, while the CPU time of each method is shown in Table 3 (unit: s), and the classification accuracy of each method is shown in Table 4. The ground objects were classified into low vegetations (blue color), trees (green color), buildings (yellow color) and others (red color). Low vegetations includes grass and crops, while others includes mainly artificial surfaces such as roads and concrete floors.   As shown in Table 3, the CPU time of WSVM ensemble without parameter optimization was lower than any other compared methods, and those of RF and optimal WSVMs were relatively high. The CPU time of the proposed method was 85.0167, 78.4931 and 97.0145 s, respectively, which was significantly reduced when compared to optimal single WSVM. As shown in Table 4, the proposed method performed better than RF and ISODATA; it also obviously outperformed basic SVM and WSVM ensemble without parameter optimization, which illustrates the importance of SVM kernel function and parameter optimization. In addition, the classification accuracy of optimal single WSVMs proved the applicability of WSVM in such classification problems. Finally, from the perspective of the ensemble, the classification accuracy of WSVM ensemble was higher than optimal single WSVMs used in the ensemble system. The classification accuracies via optimal single WSVMs of the three study areas were 96.5720%, 94.5937% and 92.1143%, which were 1.1757%, 0.5018% and 1.7179% lower, respectively, than those of the proposed method. Considering its CPU time, as presented in Table 3, this method has good application prospect.
As shown in Figures 3-5, the results of ISODATA and basic SVM had obvious misclassifications, which proved that unsupervised classification and improper kernel functions of SVM have difficulty distinguishing the objects with approximate feature values. It was also found that the results of WSVM ensemble without parameter optimization had some noise (see, for example, left center of Figure 3e and the left half of Figure 5e), which illustrates that the random parameters may be not suitable in such classification problems. Optimal single WSVMs and the proposed method had similar results, but further comparison of these two methods' results revealed that the latter method had better classification ability for details. As shown in Figure 3, the road in the lower left corner extracted by the proposed method is more complete than optimal single WSVM. As shown in Figure 5, the low vegetations in the lower left corner are misclassified as others by optimal single WSVM, while these errors do not exist in the result of the proposed method.

Conclusions
A FWL point cloud classification method employing WSVM ensemble is proposed in this paper, which utilizes five different wavelet kernel functions and ensemble learning. In the process of classification, geometric and waveform features are adopted, and bagging algorithm is used to integrate the base classifiers. Further, in each base classifier, PSO is employed to optimize WSVM parameters to obtain satisfactory classification results. The classification accuracy of all study areas was over 93%, which proved the feasibility of the proposed method. In addition, it was higher than that of basic SVM, optimal single WSVMs, RF, ISODATA and WSVM ensemble without parameter optimization. Moreover, in the case of a similar classification accuracy, the CPU time of the WSVM ensemble was less than single WSVMs. In conclusion, the proposed method can acquire accurate and effective results of FWL point cloud classification and has strong application potential for large-scale classification problems. On the basis of this work, future research such as the optimization of FWL feature extraction will be conducted.

Conflicts of Interest:
The authors declare no conflict of interest.