Classification of Transmission Line Corridor Tree Species Based on Drone Data and Machine Learning

Abstract: Tree growth in power line corridors poses a threat to power lines and requires regular inspections. In order to achieve sustainable and intelligent management of transmission line corridor forests, a transmission line corridor tree barrier management system is needed, and tree species classification is an important part of this. In order to accurately identify tree species in transmission line corridors, this study combines airborne LiDAR (light detection and ranging) point-cloud data and synchronously acquired high-resolution aerial image data to classify tree species. First, individual-tree segmentation and feature extraction are performed. Then, the random forest (RF) algorithm is used to rank and filter features by importance. Finally, two non-parametric classification algorithms, RF and support vector machine (SVM), are selected, and 12 classification schemes are designed to perform tree species classification and accuracy evaluation. The results show that classification with RF feature filtering is better than classification without feature filtering, and the overall accuracy improves by 3.655% on average. The highest classification accuracy is achieved when using SVM after feature filtering on the combination of a digital orthophoto map (DOM) and LiDAR, with an overall accuracy of 85.16% and a kappa coefficient of 0.79.


Introduction
Excessive growth of trees around the transmission line corridor tends to obstruct transmission lines. Therefore, trees that grow to a height that threatens transmission lines need to be regularly inspected and removed [1]. In order to achieve sustainable and smart management of forests in transmission line corridors, trees in transmission line corridors are not cut down all at once, but systematically through the establishment of a transmission line corridor tree barrier management system. By inputting tree obstruction information into the information base, a model of tree growth is created to facilitate inquiries about tree obstruction hazards, so that planned felling can be developed. Therefore, it is important to know the tree species. With the continuous development of remote-sensing technology, tree species classification has also been applied to transmission line corridors. However, most of the data sources used in the research on tree species classification of transmission line corridors are single data sources [2], and the classification accuracy is not sufficient to effectively prevent hidden dangers caused by trees in these corridors. The classification of tree species based on multi-source remote sensing has advantages in other fields [3][4][5][6][7], so this study considers using multi-source unoccupied aerial vehicle (UAV) data to classify tree species in transmission line corridors to improve classification accuracy.
Machine learning (ML) algorithms can be used to solve the non-linear sample classification problem of tree species classification. Many scholars have used ML to identify or classify tree species [8][9][10][11]. For instance, Franklin et al. [12] used the multi-spectral data obtained by drones combined with ML algorithms to classify deciduous tree species, with an overall classification accuracy of 78%. Ahmed et al. [13] placed three multispectral cameras on a UAV and used the acquired data to identify Sequoia; the results showed that the identification accuracy was as high as 89%. Chan et al. [14] compared the classification accuracy of different classification algorithms based on hyperspectral data, and the results showed that the classification accuracy of AdaBoost classification and random forest (RF) classification algorithm was almost the same (close to 70%); the difference was less than 1%, which was higher than that of the neural network classifier that has an overall accuracy of 63.7%. Puttonen et al. [15] collected LiDAR data and hyperspectral data at the same time based on the Sensei system of the Finnish Geodetic Institute to classify coniferous and broad-leaved species. The results show that the classification accuracy using only spectral features was 90.5%, while the overall accuracy of classification combined with spectral and structural features reached 95.8%. Considering airborne hyperspectral and LiDAR data obtained at the same time and the support vector machine (SVM) classifier, Liu Yijun et al. [16] classified the dominant tree species in the Pu'er Mountain experimental area forest. The results showed that the overall accuracy of the fusion data classification reached 80.54%, compared with only using spectral information. In summary, the preceding research shows that using multi-source remote-sensing data combined with ML can enable effective identification of tree species. 
In the past, studies on tree species classification mostly used low-resolution remote-sensing images and a single data source. More recently, using multiple remote-sensing data sources and ML algorithms to classify tree species has become a research hotspot [2,17,18,19,20]. However, relatively few studies have been conducted on the classification of tree species in transmission line corridors.
Accurate spatial information on tree species is essential for forestry management and is crucial for sustainable management of forest resources and effective monitoring of species diversity, which can help solve a wide variety of application problems faced by forestry management. In this study, experiments were conducted to address the issue of how to improve the accuracy and efficiency of forest species classification using remote sensing technology. On the one hand, the complementary effect of the superior features of airborne LiDAR point clouds and DOMs (digital orthophoto maps) is realized, and the classification accuracy of woody species is improved by feature screening. In addition, various classification methods are analyzed and compared, which has important theoretical significance. On the other hand, this helps to obtain finer tree species information of the transmission channel more accurately and quickly and provides a reference basis for the tree obstacle potential management system. It is of great practical significance for establishing tree growth models, as well as querying and timely cleaning of tree barrier hazards in transmission line corridors.
This study fully utilizes the advantages of ML classification algorithms in high-dimensional feature classification and addresses the problem of low classification accuracy of tree species in transmission line corridors. First, the vertical information provided by the LiDAR data and the horizontal information provided by the DOM are combined to segment the canopy and extract the canopy features. Then, the RF algorithm is used for feature selection. Finally, the RF and SVM algorithms are used to classify tree species, and high-precision classification of tree species in the transmission line corridor is achieved.

Study Area
The study area is located in the northeastern part of Chizhou City, Anhui Province, at an altitude between 1.8 m and 112.2 m. Its geographical position is 117°46′-117°56′ east longitude and 30°39′-30°41′ north latitude. It has a warm and humid subtropical monsoon climate with four distinct seasons, sufficient rainfall, an annual average temperature of 16.5 °C, annual average precipitation of 1400-2200 mm, a long period of sunshine, a short frost-free period, and approximately 40 rainy days. The study area is rich in vegetation types. The dominant tree species include broad-leaved tree species such as fir, bamboo, maple, and oak, mainly in middle-aged and mature forests. The specific location of the study area is shown in Figure 1.
Sustainability 2022, 14, 8273

Aerial Image and LiDAR Data
The data used in this study include airborne LiDAR point-cloud data and synchronized high-resolution digital orthophotos. The flight time was June 2016, under clear weather conditions with good visibility. The airborne LiDAR point-cloud data were collected using the Optech ALTM Galaxy system. The parameters are shown in Table 1. The downlink channel of one of the towers in the study area was selected as the test area. The original LiDAR point-cloud data and orthophotos of the specific study area are shown in Figure 2 and Supplementary Materials File S1.

Methods
This study combines the horizontal characteristics of the DOM and the vertical characteristics of LiDAR data and selects ML algorithms to classify the tree species around the transmission line corridor. The main steps are as follows: (1) LiDAR point-cloud data are used to generate a CHM (canopy height model). (2) The watershed algorithm is used for CHM-based individual-tree segmentation. (3) The RF algorithm is used to select the best feature combination for individual-tree species classification, and the impact of feature selection on tree species classification is analyzed and compared. (4) Classification schemes are designed, the effect of multi-source UAV data on individual-tree species classification is studied, and the ability of different non-parametric learning algorithms to classify tree species at the individual-tree level is evaluated. The technical process is shown in Figure 3.

Data Preprocessing
In this study, the LiDAR point-cloud data are already classified. Point clouds of extraneous objects such as transmission lines and tower bases are removed before individual-tree canopy segmentation, and only vegetation points and ground points are retained. The ground points are used as feature points and interpolated to construct a digital elevation model (DEM), and the first-echo vegetation points are interpolated to construct a digital surface model (DSM). Both surfaces are interpolated with a triangulated irregular network (TIN), which constructs triangles from a series of points and preserves surface details in topographically complex areas. The difference between the DSM and DEM rasters gives the elevation-normalized canopy height model (CHM). The original CHM contains black or gray invalid holes caused by abnormal changes in height, which affect treetop detection and crown delineation. In this study, a median filter is therefore used to smooth the original CHM and fill the invalid values, generating an optimized CHM.
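The differencing and smoothing steps can be illustrated with a minimal sketch. It assumes that the DSM and DEM have already been interpolated onto the same grid; the function names, the toy raster values, and the clamping of negative heights to zero are illustrative assumptions and are not taken from the processing chain used in this study.

```python
from statistics import median

def canopy_height_model(dsm, dem):
    """CHM = DSM - DEM cell by cell; negative heights are clamped to 0 (assumption)."""
    return [[max(s - g, 0.0) for s, g in zip(srow, grow)]
            for srow, grow in zip(dsm, dem)]

def median_filter(grid, size=3):
    """Median-smooth a 2D grid; border cells use the part of the window inside the grid."""
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [grid[rr][cc]
                      for rr in range(max(r - half, 0), min(r + half + 1, rows))
                      for cc in range(max(c - half, 0), min(c + half + 1, cols))]
            out_row.append(median(window))
        out.append(out_row)
    return out

# Toy 4 x 4 rasters in metres (hypothetical values).
dsm = [[12.0, 12.5, 13.0, 12.0],
       [12.5, 2.0, 13.5, 12.5],   # 2.0 mimics an invalid pit inside a crown
       [12.0, 13.0, 14.0, 13.0],
       [11.5, 12.0, 12.5, 12.0]]
dem = [[2.0] * 4 for _ in range(4)]

chm = canopy_height_model(dsm, dem)
smoothed = median_filter(chm, size=3)
```

In this toy raster the pit at row 1, column 1 has a height of zero in `chm`, and the 3 × 3 median replaces it with a value taken from the surrounding crown cells.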

Individual-Tree Canopy Segmentation
Before individual-tree canopy segmentation, point clouds of irrelevant objects on the ground such as transmission lines and tower bases are removed, and only vegetation points and ground points in the point cloud are retained, thus improving the accuracy of tree segmentation.
The watershed segmentation algorithm, proposed by Vincent [21], is a mathematical-morphology segmentation method based on topology theory. It treats the image as a topographic surface and segments it along the watershed lines, and it responds well to weak edges. It is one of the most common segmentation methods. In this paper, the watershed segmentation algorithm is applied to the CHM to segment individual tree canopies, with a Gaussian smoothing factor of 1 and a 5 × 5 smoothing window.
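As an illustration of the idea, the following is a minimal marker-based watershed sketch that floods the inverted CHM from treetop markers with a priority queue. It is not the implementation used in this study: the Gaussian smoothing step is omitted, and the function names, the 4-neighbour flooding, the `min_height` threshold, and the toy CHM are assumptions made for the example.

```python
import heapq

def local_maxima(chm, min_height=2.0):
    """Return (row, col) cells strictly higher than all of their 8 neighbours."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [chm[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, rows))
                          for cc in range(max(c - 1, 0), min(c + 2, cols))
                          if (rr, cc) != (r, c)]
            if all(h > n for n in neighbours):
                tops.append((r, c))
    return tops

def watershed(chm, markers, min_height=2.0):
    """Grow labelled crowns outward from the markers, visiting the highest cells first."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means background / unlabelled
    heap = []
    for label, (r, c) in enumerate(markers, start=1):
        labels[r][c] = label
        heapq.heappush(heap, (-chm[r][c], r, c))  # negate heights: the heap pops the tallest cell
    while heap:
        _, r, c = heapq.heappop(heap)
        for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= rr < rows and 0 <= cc < cols
            if inside and labels[rr][cc] == 0 and chm[rr][cc] >= min_height:
                labels[rr][cc] = labels[r][c]
                heapq.heappush(heap, (-chm[rr][cc], rr, cc))
    return labels

# Toy CHM in metres with two crowns (hypothetical values).
chm = [[0, 3, 0, 0, 0, 3, 0],
       [3, 8, 5, 3, 5, 9, 3],
       [0, 3, 0, 0, 0, 3, 0]]
tops = local_maxima(chm)
labels = watershed(chm, tops)
```

Each treetop becomes one label, and cells below `min_height` stay at 0, so ground gaps between crowns are not assigned to any tree.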

Feature Extraction
In this study, three types of features are extracted based on DOM: spectral, textural, and geometric features. Thereafter, point-cloud and CHM features are extracted based on the LiDAR point clouds. The detailed list is shown in Tables 2-6.

Geometric features:
Feature               Description                                     Symbolic representation
Area                  Area of segmented object                        Area
Perimeter             Perimeter of segmented object                   Perimeter
Area-perimeter ratio  Ratio of area of segmented object to perimeter  A_P

A large number of features introduces redundancy. Even for a classifier that is not sensitive to dimensionality, redundant features can decrease classification accuracy, and feature screening can solve this problem [22]. This study selects the RF algorithm for feature screening because it can rank the importance of variables before classification [23]. Only the most important features are retained for classification, which solves the problem of having too many original features. The specific steps are as follows. First, the Gini index $G_k$ is calculated for each node $k$ in each tree, where $p_k$ is the estimated probability that a sample belongs to any class at node $k$.
The importance of a node, $I_{\Delta k}$, is determined by the change in the Gini index before and after the node is split, where $G_{k1}$ and $G_{k2}$ are the Gini indices of the two child nodes generated from node $k$. For each tree in the forest, this criterion is applied recursively to obtain $I_{\Delta k}$.
Next, samples and variables are randomly selected to generate a forest, which is assumed to contain a total of $T$ trees.
If the variable $X_i$ appears $M$ times in the $t$-th tree, its importance in that tree is the sum of $I_{\Delta k}$ over those $M$ nodes, and its importance in the entire forest is obtained by combining its importance over all $T$ trees. Finally, the variables are selected according to their importance.
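These steps can be written in one standard form consistent with the symbols defined above. The sketch below is an assumption rather than the exact expressions of this study: it uses the two-class form of the Gini index (for several classes, $G_k = 1 - \sum_c p_{kc}^2$ is the usual generalization), an unweighted decrease at each split, and the hypothetical symbols $I_{it}$ and $I_i$ for the importance of $X_i$ in the $t$-th tree and in the whole forest.

```latex
\begin{align}
G_k &= 2\,p_k\,(1 - p_k) \\
I_{\Delta k} &= G_k - G_{k1} - G_{k2} \\
I_{it} &= \sum_{j=1}^{M} I_{\Delta j} \\
I_i &= \frac{1}{T} \sum_{t=1}^{T} I_{it}
\end{align}
```

Features are then ranked by $I_i$, and only the top-ranked ones are passed to the classifiers.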

Tree Species Classification Based on Machine Learning
According to field survey data, the main tree species in the study area are paulownia, oak, fir, moso bamboo, maple poplar, and others. The final classification system is divided into four categories, namely, paulownia, oak, fir, and other tree species (including bamboo, maple poplar, shrubs, and other relatively small tree species).
The RF algorithm integrates a large number of trees into a forest, avoiding the one-sidedness and inaccuracy caused by the classification of a single decision tree, while the SVM does not require large samples and has great advantages in high-dimensional feature recognition. Therefore, this study applies RF and SVM in tree species classification.
The main steps of RF-based tree species classification are the following: (1) Random samples are created. Each time, n samples are drawn with replacement from the original sample set, and k draws are performed in total. (2) Decision trees are established. When each decision tree is generated, d (d < D) features are selected from the D features in the feature space to form a new feature set, and the new feature set is used to generate the tree. (3) The k generated decision trees are combined, and the final classification category is obtained by voting over the classification results of the individual trees.
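The three steps can be sketched in a few lines. To keep the example short, each "tree" below is a one-split decision stump, which is a deliberate simplification of a full decision tree; the function names, the toy crown features, and the parameter values are assumptions for illustration and do not reproduce the classifier configuration used in this study.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split among feature_ids with the fewest training errors."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            left_class = Counter(left).most_common(1)[0][0]
            right_class = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_class for y in left)
                      + sum(y != right_class for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_class, right_class)
    if best is None:  # no usable split: fall back to the majority class of the sample
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(rows, labels, k=25, d=1, seed=0):
    rng = random.Random(seed)
    n, total_features = len(rows), len(rows[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]        # step 1: draw n samples with replacement
        feats = rng.sample(range(total_features), d)      # step 2: choose d of the D features
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Step 3: every stump votes and the majority class wins."""
    votes = Counter(lc if row[f] <= t else rc for f, t, lc, rc in forest)
    return votes.most_common(1)[0][0]

# Toy samples: (crown width in m, tree height in m), hypothetical values.
rows = [(1.0, 10.0), (1.2, 11.0), (0.9, 9.5), (3.0, 20.0), (3.2, 21.0), (2.9, 19.5)]
labels = ["fir", "fir", "fir", "oak", "oak", "oak"]
forest = fit_forest(rows, labels, k=25, d=1, seed=0)
```

Calling `predict(forest, (0.5, 5.0))` and `predict(forest, (4.0, 25.0))` then returns the majority vote of the 25 stumps for a small and a large crown, respectively.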
The SVM-based tree species classification process transforms the non-linear sample space into a linear space through a kernel function so that the samples can be separated. In this study, the radial basis function is chosen as the kernel function [24]; in its expression, $x$ and $x_i$ refer to the unknown vector and the support vector, respectively, and $\delta$ is the width of the function.
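A common form of the radial basis function is $K(x, x_i) = \exp\left(-\lVert x - x_i \rVert^2 / (2\delta^2)\right)$. The sketch below evaluates this form; the factor of 2 in the denominator is an assumption, since some formulations use $\delta^2$ alone, and the function name is hypothetical.

```python
import math

def rbf_kernel(x, x_i, delta=1.0):
    """Gaussian RBF kernel: exp(-||x - x_i||^2 / (2 * delta^2)), assumed form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * delta ** 2))

same = rbf_kernel((1.0, 2.0), (1.0, 2.0))             # identical vectors
near = rbf_kernel((0.0, 0.0), (1.0, 1.0), delta=1.0)  # squared distance 2
far = rbf_kernel((0.0, 0.0), (3.0, 4.0), delta=1.0)   # squared distance 25
```

The kernel value decays from 1 toward 0 as the unknown vector moves away from the support vector, and a larger `delta` makes the decay slower.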
Based on the segmented image objects and the extracted features, 12 classification schemes are formed, as shown in Table 7. When only DOM is used, schemes I and II apply the RF and SVM classifiers, respectively, without feature screening, whereas schemes III and IV apply the RF and SVM classifiers, respectively, after feature screening. When only LiDAR is used, schemes V and VI apply the RF and SVM classifiers, respectively, without feature screening, whereas schemes VII and VIII apply them after feature screening. When LiDAR and DOM are combined, schemes IX and X apply the RF and SVM classifiers, respectively, without feature screening, whereas schemes XI and XII apply them after feature screening.
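The 12 schemes are the Cartesian product of three data sources, two screening settings, and two classifiers, which can be enumerated as in the following sketch. The dictionary layout and key names are illustrative assumptions; only the numbering follows the description above.

```python
from itertools import product

DATA_SOURCES = ["DOM", "LiDAR", "DOM + LiDAR"]
SCREENING = [False, True]          # without / with RF feature screening
CLASSIFIERS = ["RF", "SVM"]
NUMERALS = ["I", "II", "III", "IV", "V", "VI",
            "VII", "VIII", "IX", "X", "XI", "XII"]

# product() varies the last factor fastest, which matches the scheme numbering.
schemes = {
    numeral: {"data": data, "feature_screening": screened, "classifier": clf}
    for numeral, (data, screened, clf)
    in zip(NUMERALS, product(DATA_SOURCES, SCREENING, CLASSIFIERS))
}
```

For example, `schemes["XII"]` describes SVM classification on the combined DOM and LiDAR features after screening.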

Accuracy Evaluation Indicators
In this study, stratified sampling is used to randomly select 40% of the data from each tree species as test data. In total, 232 training samples and 155 test samples are available in the sample plots.
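A stratified 60/40 split can be sketched as follows. The function name, the random seed, the rounding rule, and the toy class counts are assumptions for illustration and do not reproduce the actual sample selection of this study.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.4, seed=0):
    """Return (train_indices, test_indices) with test_fraction drawn from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                            # random order within the class
        n_test = round(len(indices) * test_fraction)    # per-class test size
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Toy label list with the four classes of this study (hypothetical counts).
labels = ["paulownia"] * 10 + ["oak"] * 10 + ["fir"] * 5 + ["other"] * 5
train_idx, test_idx = stratified_split(labels)
```

Because the split is made class by class, each tree species keeps the same 60/40 proportion in the training and test sets.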
After obtaining the tree species classification results of different schemes, we need to verify the correctness to evaluate the effect of the individual-tree species classification of each scheme. The stratified sampling method is adopted, and the verification samples are selected through a combination of field investigation and visual interpretation. Constructing a confusion matrix is a common method to quantify classification accuracy [25]. In addition, MAE is selected for metrics in this study [26][27][28]. The indicators used to measure are shown in Table 8.
The kappa coefficient is a precision statistic used to determine the degree of agreement between the actual feature category and the classification result, and it can weaken the influence of sample selection on the accuracy verification. MAE measures the difference between the predicted and actual values of the model.
In these indicators, $x_{ii}$ is the number of samples that were correctly classified, $x_{i+}$ is the total number of samples classified into class $i$, $x_{+i}$ is the total number of samples of class $i$ in the reference samples, $r$ is the total number of classes, $N$ denotes the total number of samples drawn, $y_i$ is the actual expected output, and $\hat{y}_i$ is the model prediction.
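With these symbols, the usual confusion-matrix indicators are $OA = \sum_i x_{ii} / N$, $PA_i = x_{ii} / x_{+i}$, $UA_i = x_{ii} / x_{i+}$, $Kappa = (N \sum_i x_{ii} - \sum_i x_{i+} x_{+i}) / (N^2 - \sum_i x_{i+} x_{+i})$, and $MAE = \frac{1}{n} \sum_i |y_i - \hat{y}_i|$. The sketch below computes them from a toy matrix; these standard definitions, the function names, and the example counts are assumptions rather than a transcription of Table 8.

```python
def accuracy_metrics(matrix):
    """matrix[i][j]: samples classified as class i whose reference class is j."""
    r = len(matrix)
    total = sum(sum(row) for row in matrix)                                # N
    diagonal = [matrix[i][i] for i in range(r)]                            # x_ii
    classified = [sum(matrix[i]) for i in range(r)]                        # x_i+
    reference = [sum(matrix[i][j] for i in range(r)) for j in range(r)]    # x_+i
    chance = sum(classified[i] * reference[i] for i in range(r))
    return {
        "OA": sum(diagonal) / total,
        "PA": [diagonal[i] / reference[i] for i in range(r)],
        "UA": [diagonal[i] / classified[i] for i in range(r)],
        "Kappa": (total * sum(diagonal) - chance) / (total ** 2 - chance),
    }

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)

# Toy two-class confusion matrix (hypothetical counts).
metrics = accuracy_metrics([[45, 5],
                            [15, 35]])
mae = mean_absolute_error([1, 2, 3], [1, 3, 5])
```

The sketch assumes that every class appears at least once in both the classified and the reference samples, so no division by zero occurs.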

Optimized CHM Extraction Results
Because the canopy width is small, a 3 × 3 filter window retains the original information to the greatest extent. This study therefore uses a 3 × 3 filter window to perform median smoothing of the original CHM raster data. A comparison of the local effect of the median filter (Figure 4) shows many discontinuously distributed low values at the edge of the canopy in the original image. The image after median filtering is smoother, and invalid values in the image are effectively removed. Therefore, the median filter is selected to smooth the CHM data to reduce the impact of invalid values on accuracy. In the final canopy height model (Figure 5), the image brightness changes from black to white as the canopy height increases. Figure 4b shows a part of Figure 5.

Individual Tree Segmentation Results
The optimized CHM is segmented by the watershed segmentation algorithm. In combination with the field survey, the optimized results of partial tree crown segmentation and selected samples are shown in Figure 6.

Feature Screening Results
In this study, the RF algorithm is used to rank and filter the importance of a feature set composed of 160 features of five types extracted from the DOM and LiDAR point-cloud data. In total, 15 and 13 features were retained by RF screening when using only LiDAR and only DOM, respectively, and 18 features were retained by feature screening after combining the two types of data. The ranking of the importance of the features retained after screening is shown in Figure 7. Analysis of feature importance revealed that the spectral mean and standard deviation scores for each band in the spectral features were the most stable and contributed the most, whether the classification was performed using only DOM or DOM combined with LiDAR. The texture features also contributed substantially to the classification, with contrast and correlation ranked highest in importance among the texture features. In the combination of DOM and LiDAR, point-cloud features, CHM features, and geometric features all played more important roles in the classification.

Classification Results and Accuracy Evaluation of Individual Tree Species
According to the results of the previous individual-tree crown segmentation and feature extraction, the individual tree species are classified based on the 12 designed schemes, and the classification algorithms are implemented using Python. Samples are selected through a combination of field investigation and visual interpretation. Then, 60% of the data are selected as the training set for training the model, and the remaining 40% are used as verification data to test the model reliability. After the tree species classification results are obtained, the test samples are used to evaluate the accuracy of the results, and the best classification scheme is determined after analysis and comparison. The classification accuracy is shown in Table 9. The results of classifying trees according to scheme XII, which has the highest overall accuracy, are shown in Figure 8.

Results Analysis
Analysis of the accuracy of the schemes based on the data in Table 9 shows the following: (1) When using DOM only, scheme III had the highest classification accuracy, with an overall accuracy of 79.35%, a Kappa coefficient of 0.71, and an MAE of 0.29. After feature selection, the accuracy of both classifiers improved. The classification schemes with feature selection improved the accuracy of classification using RF and SVM by 5.16% and 1.93%, respectively, compared to the schemes without feature selection. (2) When using LiDAR only, none of the classification results of schemes V-VIII were very good, and none of the overall accuracies reached 55%. For this study area, the effect of using LiDAR only for tree species classification was not satisfactory. (3) When using the combination of DOM and LiDAR for classification, scheme XII had the best classification results, with an overall accuracy of 85.16% and a Kappa coefficient of 0.79. The accuracy of classification using RF and SVM improved by 3.23% and 6.45%, respectively, after feature selection compared to that in the schemes without feature selection. (4) In terms of tree species, paulownia was more affected by feature selection, and in most cases, PA and UA improved after feature selection. Oak and fir were more affected by feature selection when LiDAR and DOM were combined for classification, and there was a significant improvement in PA and UA. The classification accuracy of the other tree species category was not ideal because it contains many different species, and it may be necessary to divide this category into several more detailed categories in order to improve the accuracy.

The Impact of Feature Screening on Classification
Feature screening is very important for classification research. Feature screening can reduce multicollinearity among features and improve computational efficiency and classification accuracy. The results show that the accuracy and Kappa coefficient of RF and SVM classification improved after feature screening, and RF feature screening achieved good results in both RF and SVM classification. Therefore, RF feature screening is reliable. Using multispectral and LiDAR data for classification, Pham et al. [29] explored the role of RF feature screening for classification. When the multi-source data were combined, the OA after RF screening reached 85.4%, and the Kappa coefficient was 0.81, which were 0.05 and 0.07 higher, respectively, than those without feature screening; this is very similar to the results of this study.

The Impact of the Classification Algorithm on the Accuracy
For this study, when DOM was combined with LiDAR for classification, the SVM algorithm was more accurate after feature filtering. This may be because the SVM model can solve high-dimensional problems well and is better for machine learning in the case of small samples. The RF algorithm has been shown to overfit in some noisy classification or regression problems.

Contribution of Different Features to Classification
When DOM was combined with LiDAR for classification, intensity and height features were extracted from LiDAR, spectral and texture features were extracted from DOM, and the performance of these features was evaluated. The results show that the spectral features contributed the most to the classification. Among them, the green band was very important in distinguishing tree species, probably because of the different pigment contents of different tree species; the contents of chlorophyll, carotenoid, anthocyanin, and lutein are closely related to the reflectance of the green band. Texture features, such as the contrast and correlation within the convolution kernel, also contributed greatly. Texture features are global features that can describe the surface properties of the scene corresponding to the image area, so they have great potential for classification. The LiDAR point-cloud features provided three-dimensional information on trees for classification. The first-echo intensity features and height features of the LiDAR data were sensitive to canopy conditions, represented the tree canopy structure and morphological features well, and contributed greatly to tree species classification.

Effect of Observation Season on the Classification Accuracy
Huaipeng Liu [30] classified urban tree species based on four seasons of RedEdge-MX data, and the results showed that among the four seasons of the year, the classification of tree species based on spring data was the best. The accuracy of tree species classification can be improved by combining data from two, three, and four seasons. Other studies on tree species classification were conducted in summer or autumn and also achieved good accuracy, very similar to the results of the present study [31,32]. In future studies, more data from different periods will be applied to the study of tree species classification so that the relationship between seasons and the accuracy of tree species classification can be discussed in more depth.

Conclusions
To solve the problem of tree species classification in transmission line corridors, this study used multi-source UAV data and ML methods to effectively overcome the problem of low tree species classification accuracy and realized the extraction and classification of individual trees in transmission line corridors. The results show that feature selection is an important task in classification research on tree species. After feature screening, the accuracy and kappa coefficient of RF and SVM classification improved. Thus, RF feature screening achieved good results in both RF and SVM classification, which shows that this type of feature screening is reliable.
During the experiment, the extraction of features was the most important, and the contribution of various features to the classification results was different. The research results show that spectral features contributed the most to classification. In addition, texture features played a very important role in classification, such as the correlation and contrast in the convolution kernel of the green band and blue band. The features extracted from LiDAR data were used to supplement the 3D information of the individual tree and were also indispensable in the classification. The research results show that the first echo intensity feature and height feature of LiDAR data also had a high contribution to the classification. In future research, more data sources will be selected to achieve large combinations so that more effective features can be extracted to distinguish tree species. This will provide important information for the establishment of an intelligent early warning system for tree barriers in transmission line corridor areas, thus enabling sustainable management of forest resources and effective monitoring of species diversity in these corridors.