Systematic Comparison of Power Corridor Classification Methods from ALS Point Clouds

Abstract: Power corridor classification using LiDAR (light detection and ranging) point clouds is an important means for power line inspection. Many supervised classification methods have been used for classifying power corridor scenes, such as random forest (RF) and JointBoost. However, these studies did not systematically analyze all the relevant factors that affect the classification, including the class distribution, feature selection, classifier type, and neighborhood radius for classification feature extraction. In this study, we examine these factors using point clouds collected by an airborne laser scanning (ALS) system. Random forest shows strong robustness to various pylon types. When classifying complex scenes, the gradient boosting decision tree (GBDT) shows good generalization. Overall, considering both performance and efficiency, RF is well suited for power corridor classification. This study shows that balanced learning leads to poor classification performance in the current scene, so resampling the original unbalanced dataset may not be necessary. The sensitivity analysis shows that the optimal neighborhood radius for feature extraction may differ between object types; scale invariance and automatic scale selection methods should be studied further. Finally, it is suggested that RF, the original unbalanced class distribution, and the complete feature set should be used for power corridor classification in most cases.


Introduction
In recent years, there has been rapid progress in the construction of smart grids [1,2], and the requirements for rapid monitoring and maintenance management have continuously increased. Airborne laser scanning (ALS) is an active remote sensing technology that can directly and efficiently acquire the three-dimensional spatial information of objects. It is widely used in digital city construction [3,4], forestry surveys [5,6], and power line inspection [7-12]. ALS eliminates some of the limitations of traditional inspection, such as high labor intensity, low efficiency, and low accuracy, and it has gradually become an important technique for electric power line inspection. Classification of light detection and ranging (LiDAR) datasets has traditionally been performed for ground filtering only, which already presents several difficulties; more recently, approaches have also appeared for classifying the objects above the ground, for example, classifying the LiDAR dataset of a power corridor into vegetation, power line, pylon, and other objects. Point cloud classification has gradually become the basis for transmission line security analysis and power scene reconstruction [12].
The current classification methods for power line corridor scenes can be divided into the following two groups: (i) Rule-based classification [10,12-15] and (ii) machine learning classification [7,8,16,17].

Datasets

Table 2 shows the details of the datasets, including the area, density, and class distribution. Figures 1 and 2 show the class information and geographic information of the datasets, respectively; Figure 2 represents the extent of each dataset. As shown in Figure 1, the color of each point is rendered according to its class. The training set consists of several power corridors with an area of ~907 × 90 m² and contains 6,199,950 points. There are five classes: Ground, vegetation, power line, pylon, and building. The building class accounts for a very small proportion of the data, 0.81%, and the power facility classes account for approximately 5.61%. The pylon class data are shown in Figure 3a. Site I is one of the test sets, with 2,819,021 points. It is a single power corridor with an area of ~535 × 90 m². The class distribution is similar in all datasets, with vegetation and ground being the dominant classes. Site II is another test set, with an area of ~397 × 90 m² and 3,697,447 points. It has two pylon types (Figure 3a,b) that differ from the pylons in the training set. The vegetation and ground points make up the majority of the dataset, accounting for nearly 90%. Site III is more complex and wider, covering an area of ~937 × 80 m² with three pylons. The pylon type in this dataset is different from that in the training dataset (Figure 3c).
In summary, these datasets have the following characteristics: (1) They all include five classes: Ground, vegetation, power line, pylon, and building; (2) they have diverse pylon types (Figure 3); (3) the classes are unevenly distributed: the vegetation and ground point clouds account for nearly 90% of the total, while the proportions of the power facilities, such as power lines and pylons, vary from 0.7% to 5.6%. Therefore, the power corridor laser scanning dataset is unbalanced, which causes a typical unbalanced classification problem. Based on the purpose of classification, the main components of the power corridor, namely the ground, vegetation, pylon, power line, and building classes, are identified as the target objects for classification.

Brief Overview of the Method
We conducted a comparative study of scene classification methods for power corridors using ALS data. In this work, supervised classification of the power corridor scenes is the basic framework. The effects of the classifier, class distribution, feature set, and neighborhood radius for feature extraction on the classification accuracy are systematically compared, with the aim of obtaining the optimal combination of parameters and the best classification results. The classification framework in this study is as follows: First, outlier filtering is carried out to remove outliers. Second, a point-based feature vector is constructed according to the characteristics of the target objects. Then, the feature vectors are input into the classifiers to obtain the classification results. Our main experiments include a comparison between classifiers, an analysis of balanced versus unbalanced learning, a comparison between feature sets, and a sensitivity analysis of the neighborhood radius for feature extraction, which are explained in Figure 4 and the following subsections.
Figure 3. The pylon types of the datasets: (a) Pylon belonging to the line in Figure 1a; (b) pylon belonging to the line in Figure 1c; (c) pylon belonging to the line in Figure 1d. Three different types of pylon are included.

Figure 4. The complete flowchart of our study. The flowchart is divided into two parts: The left box is the basic classification framework, and the right box lists the main experiments, including the comparison among classifiers, balanced versus unbalanced learning, the comparison between feature sets, and the sensitivity analysis of the neighborhood radius. All the classifiers, feature sets, and class distributions tested are listed in the dotted box.

Point Cloud Outliers Filtering
In the data acquisition process of an airborne LiDAR system, noise points are inevitably generated due to the influences of the instrument and the environment. Most of the noise points deviate significantly from the elevation of the target scene. The power line is an important class in power corridors; it is a point set suspended in the air, which is similar to the noise points. Therefore, this characteristic should be considered when designing the denoising algorithm, so that the integrity of the power line is preserved.

A method combining the K-means clustering technique [39] and statistical analysis is used to filter the outliers. Based on a preset cluster number, K, the point clouds are clustered into K clusters. Each cluster is then traversed, and the distance between each point and the cluster centroid is calculated. A threshold is set according to the 3σ principle [40], and a point whose distance is greater than the threshold is regarded as a noise candidate. However, the candidate points include both true noise points and the edge points of clusters. The difference between the two is that the edge points are dense, while the actual noise points are sparse. The average distance between the N points nearest to each cluster centroid is taken as the non-noise point spacing, and T times this spacing is used as the discriminating threshold. When the distance between a noise candidate and its neighbors is larger than this threshold, the candidate is identified as a noise point and removed.
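The two-stage filter described above can be sketched as follows. This is a minimal illustration, not the authors' C++ implementation: scikit-learn's `KMeans` and `NearestNeighbors` stand in for the original neighborhood search, and the parameter names (`k_clusters`, `n_ref`, `t_fold`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def filter_outliers(points, k_clusters=8, n_ref=30, t_fold=3.0, seed=0):
    """Two-stage outlier filter: K-means + 3-sigma distance test,
    then a local-spacing check that keeps dense cluster-edge points."""
    labels = KMeans(n_clusters=k_clusters, n_init=10,
                    random_state=seed).fit_predict(points)
    keep = np.ones(len(points), dtype=bool)
    nn = NearestNeighbors(n_neighbors=2).fit(points)
    for c in range(k_clusters):
        idx = np.where(labels == c)[0]
        cluster = points[idx]
        centroid = cluster.mean(axis=0)
        d = np.linalg.norm(cluster - centroid, axis=1)
        # 3-sigma rule: points far from the centroid are noise candidates
        cand = d > d.mean() + 3.0 * d.std()
        if not cand.any():
            continue
        # reference spacing: mean nearest-neighbor distance among the n_ref
        # points closest to the centroid (assumed non-noise)
        core = cluster[np.argsort(d)[:n_ref]]
        core_nn, _ = NearestNeighbors(n_neighbors=2).fit(core).kneighbors(core)
        spacing = core_nn[:, 1].mean()
        # a candidate whose own nearest-neighbor distance exceeds
        # T-fold the non-noise spacing is sparse, hence true noise
        cand_d, _ = nn.kneighbors(cluster[cand])
        keep[idx[cand]] = cand_d[:, 1] <= t_fold * spacing
    return points[keep]
```

Edge points of a cluster survive this filter because their nearest-neighbor spacing is close to the non-noise spacing, while isolated high-elevation noise is not.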

Point Cloud Resampling
Point cloud resampling is a common technique for changing the point density. Undersampling methods decrease the number of points according to some rule, while oversampling methods increase it according to the principles of the algorithm. Random undersampling and SMOTE oversampling are combined here to create balanced datasets.
In our experiments, point cloud resampling is a preprocessing step for balanced learning. A class-balanced training set is constructed by changing the class distribution through resampling of the original training set. It is worth noting that the resampling algorithms rely on the ground-truth class labels, so we do not resample the test datasets.

Random undersampling
Since the vegetation and ground objects account for most of the point clouds, we regard them as the majority classes. Random undersampling eliminates samples from these majority classes at random to balance the class distribution [41]. This also reduces the amount of data in the training set to a certain extent and improves the training speed. However, feature information that is potentially important for constructing the classifier may be discarded in the process, and the accuracy may suffer as a result.
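A minimal NumPy sketch of random undersampling; the function and parameter names (`majority_classes`, `n_target`) are illustrative, not from the paper.

```python
import numpy as np

def random_undersample(X, y, majority_classes, n_target, seed=0):
    """Randomly keep at most n_target samples of each majority class;
    minority classes are passed through untouched."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        if c in majority_classes and len(idx) > n_target:
            idx = rng.choice(idx, size=n_target, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]
```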

SMOTE oversampling
For the relatively small classes, such as the pylon and building classes, the SMOTE algorithm proposed by Chawla et al. is applied [20]. Its basic idea is to create new minority samples by interpolating between existing samples and adding them to the original dataset. In the algorithm below, P(c_i) represents the points of minority class c_i, {P_raw} is the raw point set, C_min is the set of minority classes, m is the size of C_min, k_neigh is the number of neighbors, N_s is the oversampling ratio, P_syn is the synthetic point set, P_neigh is the set of n_i chosen neighbors of a point, p_nn is a point in P_neigh, and p_new is a synthetic point.

Algorithm. SMOTE
Input: Raw point set {P_raw}, minority classes C_min.
Parameters: Number of neighbors, k_neigh; oversampling ratio, N_s = {n_1, n_2, ..., n_m}, where m is the size of C_min.
Initialize the synthetic point set P_syn ← ∅
For c_i in C_min do
    For p_j in P(c_i) do
        Find the k_neigh nearest neighbors of p_j based on the Euclidean distance
        Randomly choose n_i samples from the k_neigh nearest neighbors as P_neigh
        For p_nn in P_neigh do
            p_new = p_j + rand(0, 1) · (p_nn − p_j)
            Append p_new to P_syn
        End for
    End for
End for
Output: All laser points {P_raw} ∪ P_syn

When constructing the balanced training set, random undersampling and SMOTE oversampling resample each class to a common reference value. To determine this sampling reference value, three RF models, with approximately 50,000, 100,000, and 150,000 points per class, are tested. The model selection is performed by k-fold cross validation [42]: the training data are divided into k folds, each set of k−1 folds is used for training, and the remaining fold is used for validation. The average score of the k experiments is taken as the measure of the current model. The accuracy score and time consumption are used as the performance measures to choose an appropriate sampling reference value.
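The pseudocode above can be sketched in Python for a single minority class. Here `n_per_point` plays the role of n_i (and must not exceed `k_neigh`); the names are illustrative, and scikit-learn's `NearestNeighbors` provides the Euclidean neighbor search.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(points, n_per_point, k_neigh=5, seed=0):
    """SMOTE for one minority class: each synthetic point interpolates
    between a sample and one of its k nearest neighbors."""
    rng = np.random.default_rng(seed)
    # k_neigh + 1 because the nearest neighbor of a point is itself
    nn = NearestNeighbors(n_neighbors=k_neigh + 1).fit(points)
    _, idx = nn.kneighbors(points)
    synthetic = []
    for j, p in enumerate(points):
        # choose n_per_point of the k nearest neighbors (self excluded)
        for nb in rng.choice(idx[j, 1:], size=n_per_point, replace=False):
            # p_new = p_j + rand(0, 1) * (p_nn - p_j)
            synthetic.append(p + rng.uniform(0.0, 1.0) * (points[nb] - p))
    return np.vstack([points, np.array(synthetic)])
```

Because each synthetic point lies on the segment between two existing samples, the oversampled class stays inside the convex hull of the original minority points.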

Feature Extraction
The definition and calculation of LiDAR point cloud features is an important support for classification; the accuracy of a point cloud classification algorithm is closely related to the effectiveness of the features [43]. Generally, due to large differences in surface characteristics, such as laser penetration, surface roughness, and physical size, the point cloud features of different objects can be distinguished by visual interpretation. Therefore, we constructed the point cloud feature vectors by studying the spatial distribution characteristics and surface characteristics of the laser points [8,16,37,44].
Combining the spherical neighborhood S and the cylindrical neighborhood C with radius r (Figure 5), 21 features are defined. They fall into four categories: Eigenvalue-based [37], density-based [10], height-based [16], and vertical profile-based [8]. First, based on the points in S, the covariance matrix of the center point is calculated, and its eigenvalues are obtained as λ1, λ2, and λ3, with λ1 > λ2 > λ3. Then, the total number of points in S (denoted by N_3D) and the number of points in the projected circular area of S (denoted by N_2D) are calculated. The mean height (denoted by H_ave) and the height of the barycenter (denoted by H_f, Equation (1)) in C are also calculated. Finally, the feature vector of each laser point is constructed by a comprehensive analysis of these quantities.
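Assuming a neighborhood query has already gathered the points inside S, the eigenvalue-based features can be sketched as below. The formulas follow the common definitions cited in [37]; the function name and dictionary keys are illustrative.

```python
import numpy as np

def eigen_features(neighbors):
    """Eigenvalue-based shape features of one spherical neighborhood S.
    `neighbors` is an (n, 3) array of the points inside S."""
    cov = np.cov(neighbors.T)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # lam1 >= lam2 >= lam3
    l1, l2, l3 = lam
    return {
        "linearity":  (l1 - l2) / l1,   # high for power lines, building edges
        "planarity":  (l2 - l3) / l1,   # high for roofs and ground
        "sphericity": l3 / l1,          # high for volumetric vegetation
        "anisotropy": (l1 - l3) / l1,   # low for isotropic point distributions
    }
```

For a perfectly linear neighborhood λ2 and λ3 vanish and linearity approaches 1; for a planar patch λ3 vanishes and planarity dominates.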
H_f = (Σ_{i=1}^{N} N_i · H_i) / (Σ_{i=1}^{N} N_i), (1)

where N is the number of segments, N_i is the number of points in the i-th segment, and H_i is the height of the barycenter of the i-th segment, i = 1, ..., N. Table 3 shows the details of all the features. Listed below are some of them:
• Planarity [37]: A measure of the planar characteristics of the point cloud. This feature takes a high value for objects with a planar structure; it is most pronounced for buildings due to the direct reflection from the roof surface.
• Linearity [37]: A measure of the linear characteristics of the point cloud. The power lines and the edges of buildings are distinctly linear structures, so the feature values of these points are high.
• Anisotropy [37]: A measure of how unevenly the points are distributed along three mutually perpendicular axes. This feature helps to separate anisotropic structures, such as power lines and buildings, from vegetation.
• Sphericity [37]: A measure of how spherical an object is. This feature is more significant for vegetation due to the relatively uniform distribution of vegetation points in all directions.
• Point density [10]: Computed from the spherical neighborhood S. Generally, the density of the ground and building roofs is the highest, and the density of vegetation is higher than that of power lines, so this feature can be used to classify vegetation. The pylon has a distinct, vertically continuous spatial structure, so this feature can also be used for pylon extraction.
Table 3. Feature vector description. The first column is the basis of the feature computation, the second column is the feature name, the third column is the feature abbreviation, the fourth column is the feature computation method, and the last column is the description of the feature meaning.

Density Ratio, DR, (3/(4r)) · N_3D/N_2D: The ratio of the point density in S to that of its projection in the plane.
Height Above, HA, Z − Z_min: The height difference between the current point and the lowest point in C.
Height Below, HB, Z_max − Z: The height difference between the current point and the highest point in C.

Different classes show almost no difference in certain features, so those features contribute relatively little to the classification accuracy. Feature selection retains the features that best characterize the target objects, either directly or as linear combinations of them [37,45,46]. This can reduce excessive feature dimensionality and enhance the generalization of the model; moreover, it improves the understanding of the features and their values [45]. In this section, the following three feature sets are considered:
• Feature set F_0 is the complete feature set, which contains all the features and is described in Table 3.

• Feature set F_PCA is derived from a principal component analysis (PCA). PCA is a commonly used method of dimensionality reduction that maps n-dimensional features to k dimensions (k < n). By calculating the covariance matrix of F_0, the eigenvalues and eigenvectors are obtained [47], and the k components with the largest eigenvalues (i.e., the largest variances) are selected. These k-dimensional features, called principal components, are new, mutually orthogonal, and uncorrelated. The principal components are linear combinations of the original features and reflect the influence of the original features to a large extent. The principal components with a cumulative variance contribution of 95% are selected to form F_PCA.
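A sketch of constructing F_PCA with scikit-learn. Standardizing the features first is our assumption (the 21 features have mixed units); passing a float to `n_components` keeps the smallest number of components reaching that cumulative variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_feature_set(F0, variance=0.95):
    """Project the complete feature set F0 onto the principal components
    that retain `variance` of the total variance (feature set F_PCA)."""
    Fs = StandardScaler().fit_transform(F0)  # assumed: features have mixed units
    pca = PCA(n_components=variance, svd_solver="full")
    return pca.fit_transform(Fs), pca
```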

• Feature set F_RF is obtained from RF. This feature set construction method was proposed by Díaz-Uriarte for biological applications [39]. In random forest, due to the use of a repeated random sampling technique called bootstrap, an average of 1/3 of the samples are not included in the sample set drawn for each tree. These samples are called out-of-bag (OOB) data [7,48]. The OOB data can be used to estimate the generalization ability of the RF model, which is called the OOB estimate: the smaller the generalization error, the better the performance of the classifier, and vice versa. The importance of each feature is calculated from the difference between the OOB error and the error after noise is added to that feature [48]; the larger the difference, the greater the influence of the feature on the prediction result. The features are then sorted by importance, and a fraction of the features is removed based on the rejection ratio, η [48]. After repeating this process, the feature set with the lowest out-of-bag error rate is selected as the final set, F_RF. Three features, namely, planarity (PL), height above (HA), and maxptsnumdev (MPD), were removed by this feature selection method.
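A sketch of the RF-driven backward elimination. Note one substitution: scikit-learn's `feature_importances_` is Gini-based rather than the permutation-style OOB importance described above, so this approximates the procedure; `eta` is the rejection ratio η.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_feature_set(X, y, names, eta=0.1, seed=0):
    """Backward feature elimination driven by RF importances, keeping
    the subset with the lowest OOB error (feature set F_RF)."""
    keep = list(range(X.shape[1]))
    best_keep, best_err = keep[:], 1.0
    while len(keep) > 1:
        rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                    random_state=seed).fit(X[:, keep], y)
        err = 1.0 - rf.oob_score_
        if err < best_err:
            best_err, best_keep = err, keep[:]
        # drop the eta fraction of least important remaining features
        n_drop = max(1, int(eta * len(keep)))
        order = np.argsort(rf.feature_importances_)
        keep = [keep[i] for i in sorted(order[n_drop:])]
    return [names[i] for i in best_keep], best_err
```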

K-Nearest Neighbor (KNN)
The K-nearest neighbor algorithm (KNN) is a basic algorithm for digital image classification [49] and is widely used as a reference classification method in pattern recognition [50,51]. The KNN algorithm uses the original training data directly without training a separate classifier model. For a new input instance, p_i, the distance between p_i and each instance in the training set is obtained, which indicates the similarity of the two instances. Then, the K instances closest to p_i are found in the training dataset. Finally, the label that appears most often among the K instances is used as the label for p_i [52]. The distance used in this paper is the Minkowski distance [53]. The KNN algorithm is often used because it is simple and easy to understand; this paper uses its classification results as a baseline for the subsequent classifier comparisons.

Logistic Regression (LR)
Logistic regression (LR) is a linear model developed from linear regression [54]. According to the training set, the best-fitting parameters are calculated by an optimization algorithm, the decision boundaries that separate the classes are fitted, and the model is then used to classify the test set [52]. For a multiclass case, such as scene classification, the logistic regression model uses a one-vs-all strategy to obtain a separate classifier for each class. During training, the current class, C_i, is treated as the positive class, and all other classes are treated as negative classes; for N classes, N classifiers are obtained. During testing, the test instances are input into all the classifiers, the probabilities of the positive classes are calculated, and the class with the highest probability is output as the label. The advantage of the LR classifier is that its computational complexity is low, and its idea and implementation are relatively simple. The disadvantage is that it is affected by the number of input features: when there are few features, the model is prone to underfitting, which results in lower classification accuracy [52]. L2 regularization is used in the cost function, and 'liblinear' is set as the optimization algorithm. The power corridor scene classification in this paper involves 21 features with different meanings, each with approximately the same effect on the final classification result. The easiest and most efficient way to use such features is to weight them linearly, so the LR method was tested as one of the classifiers.

Random Forest (RF)
RF is the most commonly used classifier for power corridor scene classification [48]. It is an ensemble classifier proposed by Breiman that grows many independent decision trees. Starting from the root node, a certain feature of the instance is tested, and the instance is assigned to a child node according to the test result; the instances are tested recursively until the leaf nodes are reached and the decision tree is built. When the training set for the current tree is drawn by sampling with replacement, approximately one-third of the cases are left out of the sample [55]. Each tree gives a classification, the votes are aggregated, and the label with the most votes is output as the final judgment [56]. The out-of-bag data are used to obtain a running unbiased estimate of the classification error as trees are added to the forest, as well as estimates of variable importance [48]. The Gini index is used as the criterion for partitioning the node datasets. RF has been used in many laser point cloud classification experiments and has achieved good classification accuracy, so we regard this classifier as representative of the state of the art.

Gradient Boosting Decision Tree (GBDT)
GBDT is a type of ensemble learning proposed by Jerome Friedman [57]. Its basic unit is the decision tree, as in RF. Due to its high accuracy, low time consumption, and small memory footprint, GBDT is widely used for data mining problems [58]. GBDT determines the class by calculating the value of a scalar score function, similar to the cost function of logistic regression. The basic principle of GBDT is that each decision tree learns the residuals of all the previous trees; the purpose of each iteration is to reduce the previous residual, that is, to move the model residual in the negative gradient direction. The mean squared error with Friedman's improvement score is used to evaluate potential splits. Since the power corridor scene may involve a variety of terrain conditions, point cloud densities, and pylon types, a classifier with a high generalization ability is required. Accordingly, GBDT was tested as one of the classifiers.
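The four classifiers can be compared side by side as follows. This is a stand-in sketch: `make_classification` replaces the real 21-feature corridor data, and the hyperparameters shown are illustrative, not the paper's tuned values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic stand-in for the 21-feature, 5-class corridor data
X, y = make_classification(n_samples=1000, n_features=21, n_informative=8,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

classifiers = {
    "KNN":  make_pipeline(StandardScaler(),
                          KNeighborsClassifier(n_neighbors=5, p=2)),  # Minkowski
    "LR":   make_pipeline(StandardScaler(),
                          LogisticRegression(penalty="l2", solver="liblinear")),
    "RF":   RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(random_state=0),
}
# 5-fold cross validation with macro F1, as in the paper's evaluation
scores = {name: cross_val_score(clf, X, y, cv=5, scoring="f1_macro").mean()
          for name, clf in classifiers.items()}
```

The scaling pipeline matters only for the distance- and margin-based models (KNN, LR); the tree ensembles are invariant to monotone feature scaling.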

Experiments
We developed a framework for the power corridor classification method using the C++ and Python languages. Specifically, the point cloud outlier filtering and feature extraction are written in C++, and the other parts, including point cloud resampling and classification, are written in Python. Based on the Point Cloud Library (PCL) [59], an open-source C++ programming library, a KD (k-dimensional) tree is constructed for the point cloud to achieve fast neighborhood searches. The classification experiments mainly rely on Sklearn (scikit-learn) [60], a machine learning library in Python.
The experiments are conducted on a laptop running Microsoft Windows 10 (×64) with 4-Core Intel I5-8250U, 8GB Random Access Memory (RAM) and 256G SSD.
We compared four important factors to obtain the optimal classification solution for power corridor classification from LiDAR points. Four different classifiers (KNN, LR, RF, and GBDT) were tested. Two opposite class distributions (the original unbalanced distribution and a balanced distribution after resampling) were compared. Three feature sets (F_0, F_PCA, F_RF) and the neighborhood size for calculating the point features were also analyzed. Four comparative experiments were carried out:
(1) In the comparison experiment for the classifiers, KNN, LR, RF, and GBDT were tested for the classification of power corridor scenes. The feature vectors of the training set were fed into each classifier to obtain a trained model. The test set features were then passed to the trained models, and the predicted label of each point was output. Finally, the classification results were evaluated using the performance measures. The original training set and the feature set F_0 were used in this experiment. In addition, 5-fold cross validation was used to evaluate the models and tune the hyperparameters.
(2) In the comparison experiment for the class distribution, the original dataset with an unbalanced class distribution and the dataset with a balanced class distribution after resampling were used. Based on the class distribution of the LiDAR points, the pylon, power line, and building classes were oversampled, and the vegetation and ground classes were undersampled. The sampling reference value was determined to be 100,000 by 5-fold cross validation. The optimal classifier from experiment (1) and the feature set F_0 were used.
(3) In the comparison experiment for feature set selection, the feature sets F_0, F_PCA, and F_RF were tested. In the construction of F_PCA, to retain as much feature information as possible, the principal components with a cumulative variance contribution rate of 95% were selected. When F_RF was constructed, the rejection ratio was set to 10%. The original unbalanced training dataset, which performed better in experiment (2), was used, and the optimal classifier from experiment (1) was used for classification.
(4) In the comparison experiment for the neighborhood size, radii in the range of 1.5-5.5 m with an interval of 1 m were tested. The selection of the neighborhood radius range is based on the work of Guo et al. [7], who tested neighborhood radii of 1-6 m for feature extraction and found 2.5 m to be optimal. In this experiment, the sensitivity of the neighborhood radius in the same range is analyzed, and 2.5 m is used as one of the test values. All the classifiers from experiment (1) were used to study the neighborhood sensitivity of each classifier, and the optimal feature set from experiment (3) was used.
To evaluate the influence of the different factors, several common performance measures are introduced:
• Confusion matrix. The misclassification results are recorded in matrix form: each column represents the number of samples predicted to belong to a class, while each row represents the number of samples that actually belong to that class. Four types of records are distinguished: Correctly classified positive samples (TP), incorrectly classified positive samples (FN), correctly classified negative samples (TN), and incorrectly classified negative samples (FP) [61].
• Precision rate, PRE. This measure represents the proportion of the samples classified into a class that truly belong to that class, which is an important measure in the classification of power corridor scenes. When calculating the rate for a certain class, all its samples are regarded as positive samples, and the other samples are regarded as negative samples. The proportion of correct samples of the vegetation, pylon, power line, and other classes can thus be analyzed [62]. This rate is computed as in Equation (2).

• Recall rate, REC. This is the ratio of correctly classified samples to the total number of samples in the class [62]. This rate is computed as in Equation (3).
• F1-score, F1. When the classification requires both high PRE and REC values, the F1-score can be introduced as a performance measure [63]. This metric is computed as in Equation (4).
Since many classes are involved in power corridor classification, macro averaging was used to evaluate the overall classification. This measure is the arithmetic mean of a performance measure over all the classes. Combined with the above performance measures, the macro precision P (Equation (2)), macro recall R (Equation (3)), and macro F1 F (Equation (4)) were used as the performance measures of the overall classification results. The basic performance measures and their macro averages are as follows:

PRE = TP / (TP + FP), (2)
REC = TP / (TP + FN), (3)
F1 = 2 · PRE · REC / (PRE + REC), (4)
P = (1/n) Σ PRE_i, R = (1/n) Σ REC_i, F = (1/n) Σ F1_i,

where n is the number of classes.
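With scikit-learn, the confusion matrix and macro-averaged measures can be computed directly on a toy label vector; note that `f1_score(average="macro")` averages the per-class F1 values, which is the usual macro-F1 convention.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)

# toy ground truth and predictions for three classes
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2, 1])

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, cols: predicted
P = precision_score(y_true, y_pred, average="macro")  # mean per-class PRE
R = recall_score(y_true, y_pred, average="macro")     # mean per-class REC
F = f1_score(y_true, y_pred, average="macro")         # mean per-class F1
```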

Comparison among the Classifiers
The misclassifications of KNN, LR, RF, and GBDT appear to be quite similar. The classification results of sites I, II, and III are shown in Figures 6, 7, and 8, respectively. The black boxes mark partial areas with obvious misclassifications. The vegetation points are seriously confused with the ground points. A possible explanation is that features such as the elevation, density, and dispersion of low vegetation points are similar to those of ground points, which makes them difficult to distinguish. Some power line points are confused with vegetation and pylon points. Building points are always mixed with high vegetation points, making them easy to misclassify as vegetation. These misclassifications exist in the results of all four classifiers. Table 4 shows the classification performance of the different classifiers. For site I, the R value of LR is only 69.93%, while the other R values are above 80%; the classification accuracy of RF is slightly better than that of GBDT and LR (P = 82.16%, R = 83.15%, F = 82.33%). For site II, the R value of LR is only 65.58%, and its F value is also the lowest; all the measures of RF are approximately 1% higher than those of GBDT. For site III, the R values of all the classifiers decrease significantly compared with the other test sets. The P value of KNN is the highest among the four classifiers, at 81.90%, but compared with the other classifiers, GBDT performs better overall on this dataset (P = 78.52%, R = 76.35%, F = 76.59%). Considering the classification performance on the three test sets together, we can conclude that RF and GBDT both perform well in power corridor classification: RF provides better results in sites I and II, while GBDT is better in site III. Meanwhile, GBDT performs slightly better than RF when the average accuracy is taken into account (P = 83.46%, R = 80.76%, F = 81.13%).
The classification performance of these two classifiers is similarly superior. In addition, the efficiency of power inspection is an important factor to consider. A random forest consists of multiple decision trees. Because the decision trees are independent of each other, they can be generated in parallel, which greatly improves the time efficiency of the algorithm. GBDT constructs a set of weak learners through boosting iterations, which is an iterative learning method. It is difficult to train in parallel due to the dependencies between the weak learners, and this serial process gives GBDT a high computational complexity. For the current training set, training a random forest classifier takes only about 18 minutes, while building a GBDT classifier takes more than 10 hours. Considering both accuracy and time consumption, RF would be the optimal classifier among those tested. Taking random forest as an example, the misclassification of all the classes is analyzed in detail. Tables 5-7 present the confusion matrices and performance measures of sites I, II and III, respectively. For sites I and II, the REC values of the pylon and building classes are low. The pylon points are mainly confused with vegetation points, and the building points with vegetation and ground points. Since there are three pylons in site III, the terrain complexity is higher than that of sites I and II. RF performed well in classifying the ground and vegetation points, and its performance in classifying the pylon and building points is also similar to that achieved on the first two datasets.
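The efficiency gap between the two ensembles can be reproduced in miniature. The sketch below, assuming scikit-learn and synthetic data standing in for the 21-dimensional point features, contrasts the parallelizable RF fit with the inherently serial GBDT fit:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic stand-in for the per-point feature vectors (21 features, 5 classes).
X, y = make_classification(n_samples=1000, n_features=21, n_informative=10,
                           n_classes=5, random_state=0)

# RF: trees are independent, so fitting parallelizes across cores (n_jobs=-1).
rf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
t0 = time.perf_counter()
rf.fit(X, y)
rf_time = time.perf_counter() - t0

# GBDT: each tree fits the residual of the previous ones, so training is serial.
gbdt = GradientBoostingClassifier(n_estimators=50, random_state=0)
t0 = time.perf_counter()
gbdt.fit(X, y)
gbdt_time = time.perf_counter() - t0
```

On realistic point-cloud volumes this serial dependency is what stretches GBDT training into hours while RF finishes in minutes.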
Hence, it seems that this algorithm can also adapt to complex terrain scenes. The precision of the power line classification decreased significantly. This may be because site III contains a small region without ground points (in the original data, there is a lake), and the classifiers seem to be affected by this abnormal region. When defining the features, the power scene is assumed to have a vertical structure: the power line points are suspended in the air, at higher elevations in the vertical profile, while the ground points cover the entire scene at the lowest elevation. Therefore, the lack of ground points directly destroys the vertical profile features of the power line and leads to abnormally high values of the DR (density ratio) feature. The serious misclassification between power lines and other classes above this area is due to this anomaly (Figure 8). Different classifiers show differences in pylon classification. In particular, RF performs better when faced with different pylon types. The two pylon types are denoted as type-I and type-II (Figure 9b,c). All the pylons are mainly confused with buildings and vegetation, especially at the bottom of the pylons (Figures 10 and 11). LR is the most sensitive to the pylon types: both types show serious misclassifications, and the integrity of the pylon shape cannot be guaranteed for type-I. These results may be explained by the fact that LR is a linear model with limited adaptability to the data scene. RF shows strong robustness in classifying the pylon types, and high accuracy can be guaranteed for both. Since RF and GBDT are ensemble classifiers whose final result is determined by the votes of multiple trees, their generalization ability is stronger than that of LR. In summary, among the four classification algorithms, RF would be the best classifier for power corridor scenes.

Balanced Versus Unbalanced Learning
The classification of power corridor scenes has a typical problem of unbalanced class distribution. Most supervised learning classification methods have limitations when such problems exist: the correct classification of the majority classes tends to be learned, while the classes with fewer samples may be neglected. This can reduce the classification accuracy of the minority classes. Accordingly, this section discusses the impact of the class distribution on the classification results, using feature set F0 and RF, the optimal classifier identified in Section 4.1.
Sampling to 100,000 points per class for the training set may lead to better classification performance. Table 8 shows the classification performance for the different sampling references, using the accuracy score and time consumption as performance measures. When all the classes of the training data are sampled to 50,000 points, the classification accuracy is improved by 2.18% compared with that of the unresampled data, and training takes only 77 s, nearly 40 times less than the original time. When the data are sampled to 100,000 points per class, the mean score reaches 97.24% and the time increases by 102 s relative to 50,000 points. When the number of sampling points is further increased from 100,000 to 150,000, the accuracy improves very little, by only 0.08%, while the time consumption increases by 112 s. This indicates that increasing the number of samples beyond this point has little effect on accuracy; instead, it increases the computation time and reduces efficiency. Therefore, 100,000 was used as the sampling reference value. The pylon, power line, and building classes were oversampled to the reference value, and the ground and vegetation classes were under-sampled. The information of the resampled dataset is shown in Table 9. The oversampled results for the building, pylon and power line classes are shown in Figure 12, where the blue points are the original points and the red points are synthesized by the SMOTE algorithm. The synthesized building points retain the planar characteristics of the original points, while the pylon points retain the original pylon shape and vertical structure information. The distribution and range of the synthetic points are similar to those of the original points. The SMOTE oversampling method thus makes the pylon, power line and building classes denser while maintaining the original point cloud structure.
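The core of SMOTE is linear interpolation between a minority-class point and one of its k nearest neighbors, which is why the synthetic points stay within the original structure. A minimal numpy sketch of this idea (an illustration, not the exact implementation used here):

```python
import numpy as np

def smote_like(points, n_new, k=5, rng=None):
    """Generate n_new synthetic samples by interpolating each randomly chosen
    seed point toward one of its k nearest neighbours (the core SMOTE idea)."""
    rng = np.random.default_rng(rng)
    n = len(points)
    out = np.empty((n_new, points.shape[1]))
    for i in range(n_new):
        p = points[rng.integers(n)]
        # k nearest neighbours of p (excluding p itself), by brute force.
        d = np.linalg.norm(points - p, axis=1)
        nbrs = np.argsort(d)[1:k + 1]
        q = points[rng.choice(nbrs)]
        # The new point lies on the segment between p and its neighbour.
        out[i] = p + rng.random() * (q - p)
    return out
```

Because every synthetic point is a convex combination of two nearby originals, planar building patches stay planar and pylon points keep their vertical structure, as observed in Figure 12.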
Contrary to expectations, balanced learning leads to poor performance. Table 10 shows the classification performance for the different class distributions, where the symbol ∆ denotes the difference between the measures obtained by balanced and unbalanced learning. The P values of the three test sets fell by more than 5%, while the R values increased by 10% on average. These results indicate that balanced learning can increase the recall rates. However, this comes at the expense of precision and finally leads to a drop in the F value. Power corridor scene classification pays more attention to the precision rate.
Although balanced learning can partly improve the recall rate, it may not be suitable for the current scene. The classification accuracy of all the classes is affected by the class distribution. Table 11 shows the variation of the F1 values for the different classes and class distributions. The F1 values of each class in sites I and II dropped. Among the three data sets, the building class, a minority class, declined the most, by 14.2% on average. The pylon class, another minority class, also shows an obvious variation (site I: −3.75%, site II: −5.74%). The majority classes, namely the ground and vegetation classes, are less affected by the class distribution: the variation of their F1 values between balanced and unbalanced learning is less than 1.5%. A possible explanation might be that the algorithm is affected by the density of the point cloud. Resampling the training set artificially changes this density, so the training set and the test set no longer match, which may lead to a decrease in accuracy. When processing a power corridor dataset with an unbalanced distribution, data resampling may not be necessary; the point clouds with the original unbalanced distributions are more favorable for power corridor scene classification.

Comparison Between Feature Sets
This section discusses the impact of the different feature sets on the accuracy of power corridor scene classification with unbalanced learning. The eight largest principal components were selected with PCA; that is, the dimension of the feature space was reduced from 21 to 8. Table 12 shows that, in terms of all the measures, the complete feature set F0 performs best in the current experiment. The FPCA feature set, which is based on the PCA transform, performs poorly (in site I, P = 77.67%, R = 79.01%, F = 77.62%; in site II, P = 88.24%, R = 77.71%, F = 81.51%; in site III, P = 70.69%, R = 62.92%, F = 65.04%). This is mainly because the complete feature set F0 is designed based on the characteristics of each class, whereas PCA only reduces the feature dimension by data variance and loses the physical meaning of the original features. Feature importance is considered when constructing the FRF feature set, but the classification results of F0 are slightly better than those of FRF. Therefore, F0 would be the optimal feature set.
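For reference, the 21-to-8 reduction behind FPCA can be sketched with an SVD of the centred feature matrix (a generic PCA illustration, not the authors' code):

```python
import numpy as np

def pca_reduce(X, n_components=8):
    """Project feature matrix X (n_points x 21) onto its leading principal
    components, computed via SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are the principal axes, ordered by explained variance.
    return Xc @ Vt[:n_components].T
```

The projection maximizes retained variance, but the resulting axes are linear mixtures of the hand-designed features, which is exactly why their class-specific physical meaning is lost.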

Sensitivity Analysis of Neighborhood Radius for Feature Extraction
The sensitivity of each class to the neighborhood radius for feature extraction is discussed in this section to find the optimal neighborhood. Since the cylinder neighborhood C and the sphere neighborhood S are used in the calculation of the feature vector, the feature values are related to the neighborhood radius. A large radius may include points of other classes in the neighborhood, while a small radius may include an insufficient number of same-class points. Both affect the accuracy of the scene classification. Therefore, a sensitivity analysis of this parameter is required to find the optimal neighborhood radius.
To show the sensitivity of the classes to the neighborhood radius more intuitively, RF was tested. To balance the assessment of recall and precision, the F1 value was used as the performance measure (Table 13). All the classes are affected by the neighborhood radius. Specifically, the pylon class is the most sensitive, with a change range of 22.3%, while the power line class is the least sensitive, with a change range of only 2.05%. The F1 values of the vegetation and ground classes fluctuate because of their dominance in the neighborhood (Figure 13). As the radius increases, the number of points of other classes increases, weakening the features; when the radius exceeds a certain value, vegetation points account for the vast majority, so the accuracy increases again. These results are likely related to object size. Buildings and pylons are regular objects of relatively fixed size, unlike vegetation, ground and power lines, and their classification performance shows an obvious upward and then downward trend. A neighborhood radius that is too large or too small worsens the classification because of the changing proportions of the classes within a neighborhood. The optimal neighborhood radius differs among the classes; for the mean F1-score, it is 4.5 m. In general, the scale problem is important for feature extraction and can affect the final classification performance. Therefore, scale invariance and automatic scale selection methods may contribute to classification improvement.
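To make the neighborhood definitions concrete, the sketch below counts the points inside a sphere S and a vertical cylinder C of the same radius around a query point; ratio-type features built from such counts (e.g., a density ratio) are what react to the vertical structure discussed above. This is an illustrative brute-force version, not the paper's implementation:

```python
import numpy as np

def neighbourhood_counts(points, center, radius):
    """Count points inside a sphere S (3-D distance) and a vertical cylinder C
    (2-D xy distance, unbounded in z) of the same radius around `center`."""
    d3 = np.linalg.norm(points - center, axis=1)          # sphere: full 3-D distance
    d2 = np.linalg.norm(points[:, :2] - center[:2], axis=1)  # cylinder: xy distance only
    return int((d3 <= radius).sum()), int((d2 <= radius).sum())
```

The cylinder always contains the sphere, so for a vertically extended object (a pylon, or a power line above ground) the cylinder count grows much faster with radius than the sphere count, which is the kind of size-dependent behavior the sensitivity analysis exposes.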


Conclusions
Based on the airborne LiDAR data of a power corridor, we systematically compared the important parameters of scene classification algorithms for power corridors with five kinds of target objects: Ground, vegetation, power line, pylon, and building. We classified the ALS point clouds via a framework with three stages: (i) Point cloud outlier filtering; (ii) feature extraction and selection; and (iii) classification. Specifically, we focused on comparing the results of different classifiers, class distributions, feature sets and neighborhood sizes for feature extraction. Through these comparative analyses, we proposed and validated a simple workflow for power corridor classification. We found that the classification method composed of the RF classifier, the original unbalanced class distribution, and the complete feature set could be an optimal solution for higher classification accuracy. The sensitivity analyses showed that the optimal neighborhood radius for feature extraction differs among object classes and that the pylon class is the most sensitive to neighborhood changes.
By analyzing the algorithms, we discuss future directions for improving power corridor scene classification. For example, the echo characteristics could be considered to improve the classification accuracy of vegetation, and the ground points could first be filtered using traditional ground extraction methods. Features derived from radiometric data have also been found useful, in other research, for improving the classification accuracy of vegetation and buildings; their application therefore seems worth testing. In addition, scale invariance and automatic scale selection methods should be further studied for feature extraction. In fact, power utilities usually know the exact geographic location of each pylon from geographic information systems. To improve the classification accuracy of the pylon and power line classes, post-processing such as min-cut algorithms can be applied, and important parameters like the neighborhood radius can easily be adjusted based on this prior knowledge. Moreover, there has been increasing interest in deep learning for point cloud scene classification. The next step is to consider applying deep learning to the classification of power corridor scenes.
Author Contributions: C.W., S.P. and P.W. designed the experiments; S.P. analyzed the data and wrote the paper; X.X. and P.D. revised the paper. S.N. and X.X. provided fund supports.