Object-Based Tree Species Classification Using Airborne Hyperspectral Images and LiDAR Data

Abstract: The identification of tree species is one of the most basic and important tasks in forest resource monitoring. It is of great significance in practical forest resource surveys and can substantially improve monitoring efficiency. Related research has mainly focused on single tree species without considering multiple tree species, so the ability to classify tree species in complex stands is not clear, especially in the subtropical monsoon climate region of southern China. This study combined airborne hyperspectral data with simultaneously acquired LiDAR data to evaluate the capability of feature combinations and of k-nearest neighbor (KNN) and support vector machine (SVM) classifiers to identify tree species in southern China. First, a stratified classification method was used to remove non-forest land. Second, feature variables were extracted from the airborne hyperspectral image and LiDAR data, including independent component analysis (ICA) transformation images, spectral indices, texture features, and the canopy height model (CHM). Third, random forest and recursive feature elimination methods were adopted for feature selection. Finally, we selected different feature combinations and used the KNN and SVM classifiers to classify tree species. The results showed that the SVM classifier achieved higher classification accuracy than the KNN classifier, with a highest classification accuracy of 94.68% and a Kappa coefficient of 0.937. Feature elimination further improved the classification accuracy and performance of the SVM classifier, and recursive feature elimination based on SVM performed better than random forest. Among the spectral indices, the newly constructed slope spectral index, SL2, contributed to improving the classification accuracy of tree species. Texture features and CHM height information effectively distinguished tree species with similar spectral features. The height information played an important role in improving the classification accuracy of other broad-leaved species. In general, combining different features improved the classification accuracy, and the proposed strategies and methods are effective for identifying tree species in complex forest types in southern China.


Introduction
As a renewable resource, forest plays an important role in the survival and development of human civilization [1]. A timely understanding of the stock and distribution of forest resources is the basis for the sustainable development of forestry [2]. Nowadays, tree species classification of complex forests is becoming a very important research direction. However, with the changing climatic conditions and the interference of natural and human factors, the richness of forest species has been decreasing [3], which has seriously affected the sustainable development of the forests [4]. In addition, the accuracy of the estimation of forest on-ground carbon stocks is also dependent on the accuracy of tree species identification [5,6]. In the past, tree species were mainly identified with fieldwork, which was time consuming, laborious, and costly. With the rapid development of remote sensing technology, remote sensing image data plays an important role in forestry species identification [7,8].
At present, wide-spectrum, medium- and low-resolution remote sensing data are widely used [9,10]. Because of their low spatial and spectral resolution, only forest types can be identified. Hyperspectral imagery contains near-continuous spectral information of ground objects, which makes it possible to distinguish objects with fine spectral differences and therefore improves the recognition accuracy of tree species [11][12][13]. Zhang et al. [14] used wavelet transform to process HYDICE hyperspectral data and identify tree species in tropical forests and found that hyperspectral data after wavelet transform can improve the recognition accuracy. Dian et al. [15] used airborne hyperspectral images for forest tree species classification and showed that combining spatial and spectral information can improve the accuracy of tree species classification. Fagan et al. [16] used hyperspectral and multitemporal Landsat imagery to classify forest and tree plantations in northeastern Costa Rica. The results indicated that hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; however, for fine classification of tree species, classification based on hyperspectral images alone remains limited.
The classification of remote sensing images is mainly based on pixels or objects. Pixel-based classifiers have been widely used over the past decades [17,18], whereas the object-based method for tree species identification has emerged more recently, driven by the development of high spatial resolution data [19][20][21]. The key technology of object-based classification is image segmentation. Segmentation quality and precision depend on the algorithms used and on how the optimal segmentation scale is determined, qualitatively or quantitatively [22,23]. Immitzer et al. [24] found that the object-based method is better than the pixel-based method when using very high spatial resolution data. Wang et al. [25] combined the decision tree (DT) method with the object-based method to carry out vegetation research in the Yushu area, which overcame the "salt and pepper" effect and effectively improved the classification accuracy.
Because the distribution of ground objects has a certain continuity, adjacent pixels in a remote sensing image are correlated. Hyperspectral data can only characterize the forest in the horizontal direction, which gives rise to the phenomena of different objects having the same spectrum and the same objects having different spectra [24]. Therefore, even with spectral images of high spatial resolution, it is difficult to classify all tree species. As an important supplementary feature, spatial structure information makes it possible to classify tree species at a finer scale [26,27]. Airborne LiDAR data can characterize the vertical structure of the stand, which has obvious advantages for identifying forest types and describing forest structure characteristics [12,28]. Hollaus et al. [29] used LiDAR data to extract the canopy height of single trees. The results showed that the correlation between LiDAR tree height and field tree height was very good. Heinzel et al. [30] used high density full-waveform LiDAR data for tree species classification and found that up to six tree species were classified with an overall accuracy of 57%. Since airborne LiDAR mainly captures three-dimensional information on the vertical structure of trees, it contributes less information about tree species in the horizontal direction [31]. Because the species of an individual tree cannot be accurately determined from tree height or crown information alone, airborne LiDAR data need to be combined with hyperspectral data for tree species identification.
Therefore, the combination of hyperspectral imagery with LiDAR data to achieve complementary advantages and application in forestry has become a new research hotspot. Some scholars have carried out related studies in this field [32,33]. Voss et al. [34] combined multitemporal AISA hyperspectral data with LiDAR data and used object-oriented classification methods for tree species classification in an urban environment; the final classification accuracy is higher than the accuracy of a single data source classification. Liu et al. [35] used support vector machine (SVM) classifier to identify complex forest species in northern China based on airborne LiDAR and hyperspectral data fusion and found that the classification accuracy of the fused data was higher than using spectral data alone, and the overall accuracy reached 83.88%, and the Kappa coefficient was 0.80. Cao et al. [36] used unmanned aerial vehicle (UAV) hyperspectral images and digital surface model (DSM) to classify mangrove species. The results showed that height information played an important role in improving the classification accuracy.
Considering the advantages of airborne hyperspectral imagery and LiDAR point cloud in structure, the combination of the two data was applied to the recognition and classification of ground objects. Related studies have mainly focused on single tree species without considering multiple tree species, and therefore the ability to classify tree species in complex forest type is not clear, especially in the subtropical monsoon climate region of southern China. Therefore, the main objectives of this study include the following: To combine hyperspectral images with simultaneously acquired LiDAR data, using different combinations of features and classifiers for object-based classification; to evaluate the capability of airborne hyperspectral and LiDAR data for accurate identification of tree species in complex forest stand in the subtropical monsoon climate region of southern China; and also to compare and analyze the contributions of different feature variables and classifiers.

Study Area
This study was carried out in Jiepai Forest Farm in Gaofeng Forest Farm of Nanning City, Guangxi Province, China (22°56′41″-23°0′21″ N, 108°19′47″-109°23′16″ E), as shown in Figure 1. The area is a hilly landform, the altitude is 100 to 300 m, and the slope is 6 to 35° [37]. It belongs to the south subtropical monsoon climate, with abundant sunshine and rainfall. The annual average temperature is about 21 °C, rainfall is 1200 to 1500 mm, and the annual evaporation is 1250 to 1620 mm [38], which is suitable for the growth of tropical and subtropical tree species. The forest type has typical characteristics of southern China forest [39]. For the study site, we chose an area with abundant tree species, covering an area of 128 ha (as the yellow polygon in Figure 1). The tree species mainly include Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), eucalyptus (Eucalyptus robusta Smith), Illicium verum (Illicium verum Hook.f.), mytilaria laosensis (Mytilaria laosensis Lec.), slash pine (Pinus elliottii Engelm.), and masson pine (Pinus massoniana Lamb.). At the study site, the broadleaved species are varied and the samples are limited, and therefore they are not subdivided into tree species, but rather called other broad leaves. In addition, some forest land for seedling storage and cutting sites were collectively referred to as other forest land, and non-forest land was divided into water, road, and buildings. Table 1 is the specific classification system of the study area.

Data Collection
Datasets used in this study were acquired by the CAF's (the Chinese Academy of Forestry) LiCHy (LiDAR, CCD, and Hyperspectral) Airborne Observation System [40], which collects hyperspectral images, LiDAR data, and CCD images synchronously. Hyperspectral images were collected using an AISA Eagle II (Spectral Imaging Ltd., Oulu, Finland) hyperspectral sensor for LiCHy system [41]. It is a push broom imaging system and covers the VNIR spectral ranges from 400 nm to 1000 nm. The RIEGL LMS-Q680i with a full waveform LiDAR system was carried as the laser sensor [42] and provided high-precision digital elevation model (DEM) and digital surface model (DSM). A medium-format airborne digital camera system (DigiCAM-60) was selected as the CCD sensor [43] with 0.2 m spatial resolution. The data was collected on 13 January and 30 January 2018 at Jiepai Forest Farm in Nanning, Guangxi province. The actual flight altitude was approximately 1000 m, and the data acquisition day was sunny and cloudless. Detailed parameters of the three earth observation sensors are shown in Table 2. At the same time, the field survey was carried out from 16 January 2018 to 5 February 2018, mainly in the pure Chinese fir forest, pure eucalyptus forest, and other mixed forests. A total of 19 plots were investigated in the study area, of which 6 plots are pure eucalyptus forest, 7 plots are pure Chinese fir forest, and the rest are other mixed forests, a total of 1657 trees. The tree species includes eucalyptus, Chinese fir, Illicium verum, masson pine, etc. The plots are 25 × 25 m in size, recorded data include the location, plot number, aspect, tree species, tree height, tree crown width, and other basic measurement factors. During the field work, we collected some sampling points and recorded the precise location using a handheld GPS device, including latitude and longitude and tree species information. 
The positioning accuracy of sampling points is available, which can be used for extraction of training and validation samples. Additional reference data included subcompartment data. The main reference factors included types of ground objects and dominant tree species information, which were used as reference for vegetation cover types and tree species distribution in the forest farm.

Data Preprocessing
The airborne hyperspectral data were preprocessed by the data providers, including radiometric calibration, geometric correction, and orthorectification. We then carried out mosaicking and cropping, atmospheric correction, denoising, and geometric registration with the LiDAR data. Because the flight altitude was relatively low, the hyperspectral data were less affected by the atmosphere, which simplified the correction. In this study, we used the MODTRAN 4+ radiation transfer model [44] supported by ENVI to perform atmospheric correction on the hyperspectral data, which can correct the cascade effect caused by diffuse reflection and adjust the spectrum smoothing caused by artificial suppression. The minimum noise fraction (MNF) rotation was used to remove image noise because it is well suited to denoising hyperspectral data.
According to the field survey data and CCD orthoimages, we selected typical samples of the seven tree species above and extracted their mean spectral reflectance curves (Figure 2). The vegetation spectral curves show the characteristic valleys and peaks, and the near-infrared bands form a distinct high reflection peak, which is consistent with the spectral characteristics of vegetation. Comparing the spectral reflectance of the tree species, separability is greater in the near-infrared region, and the spectral reflectance values of broad-leaved species are generally higher than those of conifer species.
The LiDAR data provided more control to users for vertical information analysis. The horizontal accuracy of the LiDAR data is about 0.5 m and the vertical accuracy is about 0.3 m, as determined by comparison with typical observation targets. The consistency of the LiDAR and CCD products is within 1 pixel for gentle slope areas and 1 to 2 pixels for hilly areas. The canopy height model (CHM) extracted from airborne LiDAR data is an important feature variable. The processing mainly consists of filtering and classification to separate ground and non-ground points in the point cloud data. Before ground points are classified, abnormal points should be filtered out, including points that are significantly lower than the ground or higher than the surface targets, and points on moving objects. A digital elevation model (DEM) was created by triangulated irregular network (TIN) [45] interpolation of the points already classified as ground points. Meanwhile, the digital surface model (DSM) was generated by interpolating the first return points. The elevation-normalized CHM was obtained by calculating the grid difference between the DSM and the DEM [35].
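The grid difference between the DSM and the DEM can be illustrated with a minimal sketch. It is not the processing chain used in the study: the function name `canopy_height_model`, the use of `None` for no-data cells, and the clamping of small negative differences to zero are all assumptions made for illustration.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def canopy_height_model(dsm: Grid, dem: Grid, clamp_negative: bool = True) -> Grid:
    """Cell-by-cell difference DSM - DEM on two co-registered grids.

    None marks a no-data cell (an assumption of this sketch).
    """
    if len(dsm) != len(dem) or any(len(a) != len(b) for a, b in zip(dsm, dem)):
        raise ValueError("DSM and DEM must share the same grid shape")
    chm: Grid = []
    for dsm_row, dem_row in zip(dsm, dem):
        row: List[Optional[float]] = []
        for surface, ground in zip(dsm_row, dem_row):
            if surface is None or ground is None:
                row.append(None)  # propagate no-data
                continue
            height = surface - ground
            if clamp_negative and height < 0:
                height = 0.0  # assumed handling of interpolation artifacts
            row.append(height)
        chm.append(row)
    return chm


# Toy elevations in metres (illustrative values, not data from the study)
dsm = [[112.5, 118.0], [109.8, None]]
dem = [[100.5, 101.0], [110.0, 102.0]]
chm = canopy_height_model(dsm, dem)
```

In practice the two rasters would be read with a geospatial library and subtracted as arrays; nested lists are used here only to keep the sketch free of dependencies.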
The hyperspectral data and the LiDAR CHM data must be coregistered. We used the nearest neighbor resampling method [46] and selected 20 representative control points on the two images, located along roads or in flat areas, for coregistration. The average error of the control points was less than one pixel; thus, the coregistration result of the two datasets was reliable.

Sample Collection
According to the field survey data and the tree species positions recorded in the field, combined with the high spatial resolution CCD orthoimages and subcompartment data, we randomly chose 522 image-object samples in the study site for training. Each image-object sample covered several complete canopies, which made the training samples more representative. We selected 372 image-object samples for validating the results. These samples were evenly distributed throughout the study site, and training and validation samples did not overlap. Table 3 shows the number of training and validation samples for each tree species.

Workflow Description
The workflow of this study is illustrated in Figure 3. We used airborne hyperspectral images and LiDAR data for object-based tree species classification. The classification process includes five major steps: (1) Stratified classification is used to remove non-forest land and avoid confusion with tree species; (2) extraction of feature variables from airborne hyperspectral image and LiDAR data and using random forest and recursive feature elimination method for selecting the optimal combination of feature variables; (3) selection of the optimal segmentation parameters for image segmentation; (4) using KNN and SVM classifier to classify object-based tree species by combination of different features; and (5) classification accuracy evaluation by analyzing the differences of various features combination.

Image Segmentation
The object-based classification method is an image automatic analysis method. The accuracy of image segmentation significantly affects the classification accuracy [47]. In this study, the multiscale segmentation algorithm in eCognition Developer software is used for segmentation. It starts with a single pixel as a bottom-up region merging algorithm. After numerous iterations, the small objects are merged into a complete larger object [48]. The key of multiscale segmentation is to set the parameters such as band weight, segmentation scale, shape index, and compactness index. In this study, we set a series of different segmentation parameters and analyzed all segmentation results to determine the optimal segmentation parameters.

Stratified Classification
In the study area, there is a large area of non-forest land in addition to forest land. If training samples of every category are selected directly for classification, the classification workload increases greatly and the classification accuracy decreases. Stratified classification of forest land and non-forest land avoids the interference of non-forest land spectral information. The normalized difference vegetation index (NDVI) is sensitive to changes in the soil background [49,50]. By comparing the NDVI values of non-forest land and forest land, we found that forest land can be identified well when NDVI ≥ 0.52; therefore, 0.52 was set as the threshold for forest land identification, and the non-forest land was simply divided into water area and road or buildings. Within the forest land, some land used for seedling storage and cutting generally has no tree growth or canopy cover, so distinguishing tree species there is unnecessary or difficult. Such land is also separated and collectively referred to as other forest land. Other forest land is best distinguished when 0.52 ≤ NDVI < 0.7 and CHM < 2.
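The two-level thresholding described above can be written as a short rule set. This is a sketch under stated assumptions: the function name `classify_stratum` and the class labels are hypothetical, the CHM threshold of 2 is assumed to be in metres, and the rule is applied to the mean NDVI and CHM of one image object.

```python
def classify_stratum(ndvi: float, chm_m: float,
                     ndvi_forest: float = 0.52,
                     ndvi_sparse: float = 0.7,
                     chm_min_m: float = 2.0) -> str:
    """Assign one image object to a stratum from its mean NDVI and CHM."""
    if ndvi < ndvi_forest:
        # Water, road, and buildings fall below the forest threshold
        return "non-forest land"
    if ndvi < ndvi_sparse and chm_m < chm_min_m:
        # Seedling storage and cutting land: vegetated but without canopy height
        return "other forest land"
    return "forest land"


# Illustrative objects: (mean NDVI, mean CHM in metres)
examples = [(0.30, 0.0), (0.60, 1.0), (0.60, 8.0), (0.85, 1.0)]
labels = [classify_stratum(ndvi, chm) for ndvi, chm in examples]
```

Only objects labelled as forest land would then be passed on to tree species classification.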

Feature Variables Extraction and Selection
The spectral features of tree species differ from each other; at the same time, combining spatial information and other auxiliary information can distinguish tree species more accurately [15]. In this study, we extracted four sets of features.

Independent Components Analysis
Independent component analysis (ICA) is a commonly used method of dimensionality reduction that converts a group of mixed signals into independent components [51]. We performed independent component analysis on the processed hyperspectral image and found that the first five independent components after conversion included 99% information of all spectral bands. Therefore, we selected the first five independent components to participate in the classification as spectral feature variables.

Spectral Index
According to the indices related to canopy structure, chlorophyll content and water content, we selected nine vegetation indices, including normalized difference vegetation index NDVI, plant senescence reflectance index PSRI, modified red edge simple ratio index MRESRI, modified red edge normalized difference vegetation index MRENDVI, normalized green difference vegetation index GNDVI, photochemical reflectance index PRI, structure insensitive pigment index SIPI, vogelmann red edge index VOG1, and anthocyanin reflectance index ARI1.
The vegetation indices are concentrated in the visible and near-infrared bands region. NDVI and GNDVI are related to the chlorophyll content of plants [52]. NDVI increases the difference between the scattering of green leaves in the near-infrared region and the chlorophyll absorption in the red band region [53]. PSRI is related to the ratio of carotenoids to chlorophyll, reflecting canopy stress and vegetation senescence [54]. MRESRI and MRENDVI are sensitive to canopy changes and senescence, taking into account the mirror reflection effect of leaves [55]. PRI is related to the changes of plant carotenoid, leaf stress, and carbon uptake efficiency [56]. SIPI reflects the sensitivity of the ratio of carotenoids to chlorophyll in the reduction of canopy structure, which is related to the stress of vegetation changes in canopy structure [57]. VOG1 is sensitive to the combination of chlorophyll concentration, canopy layer, and water content [58]. ARI1 indicates the change in absorption of anthocyanins in the green band relative to the red band [59]. In summary, the selected spectral indices indicate the difference in leaf, canopy structure, chlorophyll content, and water content of tree species, therefore, these indices have certain discrimination for seven tree species.
We analyzed the spectral reflectance curves of each tree species and found that there are large differences in red edge and near infrared region. The slopes of red band and red edge versus the slopes of red band and near-infrared band were different from each other. Therefore, constructing new spectral indices about slopes may be helpful for tree species identification. There are also differences in the area of the triangle formed in red band, red edge, and near-infrared band. Therefore, constructing an area-dependent spectral index may also contribute to the identification of tree species.
The spectral reflectance and first derivative reflectance of each tree species (Figures 4 and 5) show that the spectral reflectance values at 760 nm and 890 nm clearly differ among species, and that the first derivative reflectance increases continuously from zero at 687 nm, indicating that the reflectance changes significantly afterwards; 687 nm can therefore be considered the starting point of the platform. Therefore, we constructed three new spectral indices: the slope of the spectrum between wavelengths 687 nm and 760 nm, the slope between 687 nm and 890 nm, and the area of the triangle enclosed by wavelengths 687 nm, 760 nm, and 890 nm. The schematic diagram is shown in Figure 4. The equations of the slope (SL) and the enclosed triangle area (TA) are as follows.
The slope between wavelengths 687 nm and 760 nm is:
$$SL_1 = \frac{\rho_{760} - \rho_{687}}{\Delta\lambda_1} \tag{1}$$
where SL1 is the slope between wavelengths 687 nm and 760 nm, ρ is the spectral reflectance value of the corresponding band, and Δλ1 is the difference in wavelength between 687 nm and 760 nm.
The slope between wavelengths 687 nm and 890 nm is:
$$SL_2 = \frac{\rho_{890} - \rho_{687}}{\Delta\lambda_2} \tag{2}$$
where SL2 is the slope between wavelengths 687 nm and 890 nm, ρ is the spectral reflectance value of the corresponding band, and Δλ2 is the difference in wavelength between 687 nm and 890 nm.
The triangle area enclosed by wavelengths 687 nm, 760 nm, and 890 nm, that is, the area of the triangle with vertices (687, ρ687), (760, ρ760), and (890, ρ890), is:
$$TA = \frac{1}{2}\left|\Delta\lambda_2\,\rho_{760} - \Delta\lambda_3\,\rho_{687} - \Delta\lambda_1\,\rho_{890}\right| \tag{3}$$
where TA is the area enclosed by the wavelengths 687 nm, 760 nm, and 890 nm, ρ is the spectral reflectance value of the corresponding band, Δλ1 is the difference in wavelength between 687 nm and 760 nm, Δλ2 is the difference in wavelength between 687 nm and 890 nm, and Δλ3 is the difference in wavelength between 760 nm and 890 nm.
Nine vegetation indices and three newly constructed spectral indices compose the set of spectral index features used for tree species classification. Table 4 shows the formulations of the spectral indices calculated from the hyperspectral images.
Note: ρ denotes the spectral reflectance of the hyperspectral image band at the given wavelength; the wavelength range is 400 nm to 1000 nm.
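As a worked illustration of the three constructed indices, the sketch below computes the two slopes and the triangle area from reflectance values at 687 nm, 760 nm, and 890 nm. It assumes that TA is the geometric area of the triangle whose vertices are the three (wavelength, reflectance) points, which is how the text describes it; the function name and the sample reflectances are hypothetical.

```python
from typing import Tuple


def slope_and_area_indices(rho_687: float, rho_760: float,
                           rho_890: float) -> Tuple[float, float, float]:
    """Return (SL1, SL2, TA) from reflectance at 687, 760 and 890 nm."""
    d1 = 760.0 - 687.0  # wavelength difference 687 -> 760 nm
    d2 = 890.0 - 687.0  # wavelength difference 687 -> 890 nm
    d3 = 890.0 - 760.0  # wavelength difference 760 -> 890 nm
    sl1 = (rho_760 - rho_687) / d1
    sl2 = (rho_890 - rho_687) / d2
    # Triangle area from its three vertices (shoelace formula rearranged
    # in terms of the three wavelength differences); assumed definition.
    ta = 0.5 * abs(d2 * rho_760 - d3 * rho_687 - d1 * rho_890)
    return sl1, sl2, ta


# Illustrative reflectances (not measured values from the study)
sl1, sl2, ta = slope_and_area_indices(0.05, 0.40, 0.45)
```

With these sample values the slopes are in reflectance units per nanometre, and TA is zero whenever the three points lie on one straight line, which is a quick sanity check on the area formula.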

Textural Feature
Texture is an important factor in object-based classification [60,61]. Making full use of the texture information of the image can effectively address the phenomenon of the same objects having different spectra. We used the grey level co-occurrence matrix (GLCM) to calculate eight second-order texture features [62]: mean, variance (VAR), homogeneity (HOM), contrast (CON), dissimilarity (DIS), entropy (ENT), second moment (SM), and correlation (COR).
Following previous studies [36,63], we selected three bands of the hyperspectral image, 482 nm, 550 nm, and 650 nm, as the RGB bands for texture analysis. The texture window size was varied from 3 × 3, 5 × 5, 7 × 7, …, to 31 × 31 with a step length of 1, and the eight texture features above were averaged over four moving directions (0°, 45°, 90°, and 135°). For each window size, we tested the classification accuracy of the texture images using the ICA transformation features and the SVM classifier. As shown in Figure 6, the overall accuracy varied with the texture window size and was highest when the window size was 17 × 17. Therefore, we selected the 17 × 17 texture window in this study, and the 24 extracted texture features (eight features for each of the three bands) were used for the subsequent feature combination and feature selection.
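To make the GLCM step concrete, the following sketch builds a symmetric, normalized co-occurrence matrix for one window and averages a few texture measures over the four directions. It is an illustration, not the software used in the study: the function names are hypothetical, grey levels are assumed to be already quantized to small integers, the feature definitions are the commonly used ones, and only five of the eight features are shown for brevity (mean, variance, and correlation are omitted).

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[int, int]

# (row step, column step) for 0, 45, 90 and 135 degrees at distance 1
DIRECTIONS: List[Pair] = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def glcm(window: List[List[int]], offset: Pair) -> Dict[Pair, float]:
    """Symmetric, normalized grey level co-occurrence matrix for one offset."""
    d_row, d_col = offset
    n_rows, n_cols = len(window), len(window[0])
    counts: Counter = Counter()
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both orders to keep the matrix symmetric
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p: Dict[Pair, float]) -> Dict[str, float]:
    """Contrast, dissimilarity, homogeneity, entropy and second moment."""
    return {
        "CON": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "DIS": sum(v * abs(i - j) for (i, j), v in p.items()),
        "HOM": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "ENT": -sum(v * math.log(v) for v in p.values() if v > 0),
        "SM": sum(v * v for v in p.values()),
    }


def mean_over_directions(window: List[List[int]]) -> Dict[str, float]:
    """Average each texture feature over the four directions."""
    per_direction = [texture_features(glcm(window, d)) for d in DIRECTIONS]
    return {name: sum(f[name] for f in per_direction) / len(per_direction)
            for name in per_direction[0]}


# Small 4 x 4 window with four grey levels (illustrative values)
window = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
horizontal = glcm(window, (0, 1))
features = mean_over_directions(window)
```

Applied with a sliding 17 × 17 window to each of the three bands, this kind of computation yields one texture image per feature and band.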

Canopy Height Model from LiDAR Data
The structure and height of tree species vary with their growth habits. Because different objects can have the same spectrum, tree species with similar spectra are difficult to distinguish [64,65]. To address this problem, we used the canopy height model obtained from the LiDAR data as a feature variable, denoted CHM, which reflects the height information of each tree species.
We calculated the CHM of each tree species sample, binned the heights at 1 m intervals, and obtained the height distribution frequency chart for the tree species (Figure 7). The height distribution curves of Illicium verum, masson pine, slash pine, Chinese fir, and mytilaria laosensis show obvious gaps, whereas the heights of eucalyptus are not concentrated, mainly because of different planting years. The other broad leaves class includes a variety of broadleaf species, and therefore its heights are also distributed over a wide range. According to statistics, the average tree height of each tree species is as follows: Illicium verum 5.01 m, masson pine 8.51 m, slash pine 14.82 m, Chinese fir 11.22

Selection of Optimal Variable Combination
In this study, we extracted four sets of feature variables (Table 5), which constituted a larger dataset and increased the dimension of the data used for classification. These feature variables could be highly correlated or redundant and increase the computational complexity. For some classifiers, this can lead to the curse of dimensionality, also called the "Hughes phenomenon" [66]. A classifier affected by the Hughes phenomenon cannot reach its true performance [67], and therefore it is very important to avoid it. According to previous studies [68,69], selecting features in high-dimensional datasets and identifying the most important ones can improve model interpretability and speed up sample training. We used two methods to select feature variables.
Random forest (RF) [68,70] is an algorithm based on decision trees. It modifies the candidate split features of the decision trees, aggregates the results of the individual trees, and then completes the prediction and classification of samples. RF was used because it can calculate the importance of each feature variable and rank all feature variables in descending order of importance. The indicators include mean decrease accuracy (MDA) and mean decrease Gini (MDG). Generally, the larger the value, the more important the variable. On the basis of mean decrease accuracy, we selected the top-ranked feature variables as the optimal feature variables to participate in the classification.
Recursive feature elimination (RFE) is a feature selection method based on feature ranking [71,72]. It performs backward sequential reduction starting from all input features, eliminates the least relevant features at each step, and finally obtains the optimal feature subset [69]. Recursive feature elimination based on the SVM classifier (SVM-RFE) was first applied in molecular biology by Guyon et al. [73] and then in remote sensing [74], but it has rarely been applied to the classification of hyperspectral data. In this study, the SVM-RFE algorithm was used for feature selection. By comparing all the features, those that contributed to the classification were retained, and those that inhibited improvement of the classification accuracy were deleted, finally forming the optimal feature subset.
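The backward elimination loop of RFE can be sketched independently of the classifier. In SVM-RFE the ranking weights would come from a linear SVM refitted on the remaining features at every step; in the sketch below that fit is replaced by a loudly labelled stand-in (`toy_fit`, based on the difference of class means) so that the example runs with the standard library only. The function names, feature names, and toy values are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def recursive_feature_elimination(
    feature_names: Sequence[str],
    fit_weights: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
) -> Tuple[List[str], List[str]]:
    """Drop the lowest-ranked feature one at a time until n_keep remain.

    fit_weights refits a model on the remaining features and returns one
    weight per feature; features are ranked by squared weight.
    """
    remaining = list(feature_names)
    eliminated: List[str] = []
    while len(remaining) > n_keep:
        weights = fit_weights(remaining)  # refit on the current subset
        weakest = min(remaining, key=lambda name: weights[name] ** 2)
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Toy, already-scaled feature values for two classes (illustrative only)
samples = {
    "CHM": ([0.10, 0.20, 0.15], [0.90, 0.80, 0.85]),
    "SL2": ([0.30, 0.35, 0.25], [0.70, 0.65, 0.75]),
    "ICA1": ([0.40, 0.50, 0.45], [0.60, 0.50, 0.55]),
    "GLCM_mean": ([0.50, 0.50, 0.50], [0.52, 0.50, 0.51]),
}


def toy_fit(names: List[str]) -> Dict[str, float]:
    """Stand-in for a linear SVM fit: weight = difference of class means."""
    weights = {}
    for name in names:
        class_a, class_b = samples[name]
        weights[name] = sum(class_b) / len(class_b) - sum(class_a) / len(class_a)
    return weights


kept, dropped = recursive_feature_elimination(list(samples), toy_fit, n_keep=2)
```

A real implementation would evaluate classification accuracy for each subset size and keep the subset with the best accuracy, rather than fixing `n_keep` in advance.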

Classification Method
Object-based classification technology aggregates adjacent pixels with the same or similar attributes into one object, in which the image objects are used as the object-based classification unit [75]. On the basis of the segmentation objects, selected features are used for classification that can effectively distinguish categories. In this study, we used feature variables extracted from hyperspectral image and LiDAR data as the classification criteria, and selected k-nearest neighbor and support vector machine classifiers to classify tree species.
The k-nearest neighbor (KNN) classifier is an instance-based learning method that is generally considered one of the simplest machine learning classifiers [36,76]. It has been widely used in high resolution and hyperspectral image classification [77]. The main idea of KNN is to compute the difference between the sample to be classified and each training sample, sort these differences in ascending order, and take the K training samples with the smallest differences. The sample is then assigned to the class that occurs most often among these K neighbors. The optimal neighborhood value K for the classification experiment was determined by performing multiple tests on training samples of different feature subsets.
Support vector machine (SVM) is a supervised machine learning method based on statistical learning theory that has been widely used in hyperspectral image classification [78]. Its main idea is to establish an optimal decision hyperplane that maximizes the distance between the plane and the samples of the two classes closest to it, which provides good generalization for classification. For a multidimensional sample set, the system randomly generates a hyperplane and moves it continuously until the samples belonging to different categories are located on the two sides of the hyperplane, which can solve the problem with a limited number of training samples and improve the generalization performance.
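The KNN decision rule described above (sort by distance, take the K nearest training samples, vote) fits in a few lines. This is a minimal sketch assuming Euclidean distance and unscaled toy features; the function name `knn_predict` and the sample values are hypothetical, and in practice features with different units would be normalized first.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training samples closest to the query."""
    # Sort training samples by Euclidean distance to the query (ascending)
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    # On a tie, the class encountered first (the nearest one) wins
    return votes.most_common(1)[0][0]


# Toy objects described by (mean CHM in metres, a scaled spectral index)
train_x: List[List[float]] = [
    [5.0, 0.20], [5.3, 0.25], [4.7, 0.22],
    [14.8, 0.60], [15.1, 0.58], [14.2, 0.65],
]
train_y = ["Illicium verum"] * 3 + ["slash pine"] * 3

low_object = knn_predict(train_x, train_y, [6.0, 0.30], k=3)
tall_object = knn_predict(train_x, train_y, [13.5, 0.55], k=3)
```

Choosing K then amounts to repeating the prediction for several values of K and keeping the one with the best accuracy on held-out samples.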

Determination of Classification Scheme
In order to evaluate the performance of different feature combinations and feature selection, six schemes were proposed for KNN and SVM object-based tree classification (Table 6).

Accuracy Assessment of Classification Results
The accuracy of the KNN and SVM classification results was evaluated using the 372 selected verification samples, and a confusion matrix was used to evaluate the classification result of each feature combination. From the confusion matrix we derived the overall accuracy (OA), Kappa coefficient, producer accuracy (PA), and user accuracy (UA). Overall accuracy and the Kappa coefficient describe the overall classification performance, whereas producer accuracy and user accuracy are used to evaluate individual classes [79].
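As a sketch of how these four measures follow from a confusion matrix, the Python function below computes them using their standard definitions. The matrix layout (rows as reference classes, columns as predicted classes), the function name `accuracy_metrics`, and the 3-class example matrix are assumptions for illustration and are not data from this study.

```python
def accuracy_metrics(matrix):
    """Compute OA, Kappa, PA and UA from a square confusion matrix.

    Assumed layout: rows are reference classes, columns are predicted classes.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n_classes)]
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n_classes)) for c in range(n_classes)]

    overall_accuracy = sum(diagonal) / total
    # Chance agreement expected from the row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    # PA: correct / reference total; UA: correct / predicted total
    producer = [d / r if r else 0.0 for d, r in zip(diagonal, row_totals)]
    user = [d / c if c else 0.0 for d, c in zip(diagonal, col_totals)]
    return overall_accuracy, kappa, producer, user

# Hypothetical 3-class confusion matrix (illustrative values only)
example = [
    [45, 3, 2],
    [4, 40, 6],
    [1, 5, 44],
]
oa, kappa, pa, ua = accuracy_metrics(example)
```

For this illustrative matrix, 129 of 150 samples lie on the diagonal, so the overall accuracy is 0.86 and the Kappa coefficient is 0.79.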

Image Segmentation Results
In the multiscale segmentation process, the segmentation parameters must be set, including the weight of the input layers, segmentation scale, shape index, and compactness. The segmentation scale directly determines the size and fragmentation of the objects. In general, the smaller the scale, the smaller the objects and the larger the number of segments. First, we defined the range and step length of each parameter as follows: the scale ranged from 1 to 5 with a step length of 1, the shape index ranged from 0.1 to 0.5, and the compactness ranged from 0.1 to 1, both with a step length of 0.1. We then combined the different parameter values to segment the hyperspectral images.
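The parameter ranges given above define a small search grid, which can be enumerated as in the following sketch. The variable names are hypothetical, and the weights of the input layers are left out because their values are not stated here; the segmentation itself would be run in the segmentation software for each combination.

```python
from itertools import product

# Parameter ranges as stated in the text
scales = [1, 2, 3, 4, 5]                                    # step length of 1
shapes = [round(0.1 * i, 1) for i in range(1, 6)]           # 0.1 to 0.5, step 0.1
compactnesses = [round(0.1 * i, 1) for i in range(1, 11)]   # 0.1 to 1.0, step 0.1

# Every (scale, shape, compactness) combination to be tested
parameter_grid = list(product(scales, shapes, compactnesses))
n_combinations = len(parameter_grid)
```

Under these ranges the grid contains 5 × 5 × 10 = 250 combinations, each of which would be inspected against the actual tree species boundaries.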
We compared and analyzed all the segmentation results and found that, when the segmentation scale was four or five, Chinese fir, masson pine, and other broad-leaved tree species could not be separated well. When the segmentation scale was one or two, the segmented objects were too fragmented, which reduced the image processing efficiency. After a series of interactive segmentation experiments, we determined that the boundaries of the segmented objects best fit the boundaries of the actual tree species when the segmentation scale was set to three, the shape index to 0.1, and the compactness to 0.4. As shown in Figure 8, each object includes one complete canopy or several canopies. Therefore, segmenting the hyperspectral image with these parameters gave the visually best results.
Figure 9 shows the result of removing non-forest land and cutting land according to NDVI and CHM. The first step was to remove non-forest land. Water areas, roads, and buildings could be distinguished well when the NDVI was <0.52; therefore, we used 0.52 as the NDVI threshold for non-forest land. Second, the forest land also contains some land used for seedling storage and cutting land, which we call other forest land. These land covers generally have no trees or canopies, and therefore other forest land could be distinguished well when 0.52 ≤ NDVI < 0.7 and CHM < 2. The remaining area is the forest land used to classify tree species. To facilitate observation, the forest land was merged. Figure 9a (left) shows that water areas and some roads and buildings are well distinguished from the forest land. Figure 9b (right) shows the separation between cutting land and forest land. Stratified classification therefore effectively avoids confusion between non-forest land and tree species.
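The two-step stratification rule above can be written as a simple decision function. The sketch below uses the thresholds stated in the text; the function name `stratify`, the class labels, and the assumption that the CHM threshold is in meters are introduced here for illustration.

```python
NDVI_NON_FOREST = 0.52    # below this: water areas, roads, buildings
NDVI_OTHER_FOREST = 0.70  # upper NDVI bound for other forest land
CHM_MIN_HEIGHT = 2.0      # canopy height threshold (meters assumed)

def stratify(ndvi, chm):
    """Assign one segmented object to a stratum from its mean NDVI and CHM."""
    if ndvi < NDVI_NON_FOREST:
        return "non-forest land"
    # Here ndvi >= 0.52, so this tests 0.52 <= NDVI < 0.7 and CHM < 2
    if ndvi < NDVI_OTHER_FOREST and chm < CHM_MIN_HEIGHT:
        return "other forest land"
    return "forest land"

# Hypothetical objects as (mean NDVI, mean CHM)
labels = [stratify(0.30, 0.0), stratify(0.60, 1.2), stratify(0.60, 8.0), stratify(0.85, 15.0)]
```

Only objects labeled as forest land would be passed on to the tree species classifiers.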

Comparison of Tree Species Classification Results
The classification results of the different schemes using the KNN and SVM classifiers are shown in Figure 10. Both classifiers spatially distinguish the different tree species within the study area. Comparing the classification results shows that eucalyptus and Chinese fir are clearly separated in space, whereas Illicium verum and other broad-leaved species are obviously mixed in the first two schemes because the spectra of the two classes are similar. When texture features and CHM height information were added, this mixing was greatly reduced. Some mixed objects remain on both sides of the road; we attribute these to the influence of the road and buildings on the spectral reflectance of nearby trees. According to visual judgment, the SVM classifier performs better than the KNN classifier. For example, some Castanopsis hystrix intercropped with Chinese fir (the red polygon in Figure 10) in the southeast corner of the study area is better distinguished by the SVM classifier, while the KNN classifier produced a fragmented result and misclassified some Castanopsis hystrix as Chinese fir.
Table 7 summarizes the overall accuracy of the different classification schemes using the KNN and SVM classifiers. The results show that the SVM classifier is better than the KNN classifier. With the SVM classifier, scheme F has the best classification accuracy, with an overall accuracy of 94.68% and a Kappa coefficient of 0.937. With the KNN classifier, scheme D has the highest classification accuracy, 90.28%, and a Kappa coefficient of 0.884. As compared with scheme A, which is based only on ICA transformation features, the classification accuracy improved by 9.76% (KNN) and 11.6% (SVM), and the Kappa coefficients improved by 0.117 (KNN) and 0.139 (SVM). The producer accuracy and user accuracy of the different schemes based on the two classifiers are shown in Tables 8 and 9.
For classification scheme A, based on ICA transformation features, the overall accuracy of the SVM classifier is 83.08% with a Kappa coefficient of 0.798, and the overall accuracy of the KNN classifier is 80.51% with a Kappa coefficient of 0.767. Eucalyptus has the highest classification accuracy, and the difference between its producer accuracy and user accuracy is very small, which shows that both classifiers recognize eucalyptus well and stably. Other broad-leaved species and slash pine have lower classification accuracy. The other broad-leaved species are mainly misidentified as Illicium verum and Mytilaria laosensis. Because this class contains a wide variety of species, its spectra are very diverse; Illicium verum and Mytilaria laosensis have similar spectral curves, and many samples were misclassified into these two classes.
After adding the spectral index features (scheme B), the classification accuracy of both classifiers improved, with the overall accuracy increasing by 3.17% and 3.74%, respectively. The producer accuracy of slash pine increased the most, from 62.09% and 71.25% to 81.17% and 90.08% for the two classifiers, respectively. This indicates that the spectral indices contribute to tree species classification. When texture features were added in classification scheme C, the classification accuracy of each tree species improved. The overall accuracy of the SVM classifier reached 90.86%, and the accuracy of slash pine, masson pine, and Illicium verum improved more than that of the other broad-leaved species.
In scheme D, with added CHM features, a comparison of the classification accuracy of each tree species shows that the accuracy of other broad-leaved species improved significantly. The producer accuracy of other broad-leaved species is 89.57% with the SVM classifier, an improvement of 15.24%, and 81.68% with the KNN classifier, an improvement of 12.96%. The wide variety of other broad-leaved species have spectral curves similar to that of Illicium verum, and therefore, without height information, most of them are misclassified as Illicium verum. Illicium verum is low, at about 5 meters, while other broad-leaved species are generally taller than 10 meters; adding the height information therefore improved the separability of the two classes.
For scheme E and scheme F, the features selected by the random forest method and the recursive feature elimination method were used for classification. The classification accuracy of the KNN classifier using the selected features is close to its accuracy using all features, whereas the SVM classifier achieved higher classification accuracy with the selected features than with all features. A comparison of the two feature selection methods showed that recursive feature elimination based on SVM is slightly better than the random forest method and is more suitable for the classifier used in this study. The two selected feature subsets share 13 features, nine of which are spectral features. Among the spectral indices, NDVI, PRI, GNDVI, SL2, and PSRI appear in both selected feature subsets.
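The general loop behind recursive feature elimination can be sketched as follows: a model is refit on the remaining features, the features are ranked, and the weakest one is dropped until the desired number remains. This is a generic illustration and not the SVM-based procedure used in this study. The ranking function `toy_rank` is a stand-in that returns fixed scores instead of refitting a classifier, and the feature names and scores are hypothetical.

```python
def recursive_feature_elimination(features, rank_fn, n_keep):
    """Drop the lowest-ranked feature one at a time until n_keep remain.

    rank_fn(remaining) is expected to refit a model on the remaining
    features and return a {feature: importance} mapping.
    """
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        importance = rank_fn(remaining)  # re-rank in every round
        weakest = min(remaining, key=lambda name: importance[name])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated

# Hypothetical importance scores (illustrative only, not results of this study)
ILLUSTRATIVE_SCORES = {
    "NDVI": 0.9, "PRI": 0.6, "texture_mean": 0.4, "CHM": 0.8, "noise_band": 0.1,
}

def toy_rank(remaining):
    # Stand-in for refitting a classifier: look up fixed scores
    return {name: ILLUSTRATIVE_SCORES[name] for name in remaining}

selected, dropped = recursive_feature_elimination(
    list(ILLUSTRATIVE_SCORES), toy_rank, n_keep=3
)
```

With a real classifier, `rank_fn` would return, for example, the magnitude of the weights of a linear SVM trained on the remaining features, so the ranking can change after each elimination.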

Comparison of Classification Results Based on Two Classifiers
In this study, object-based classification was used to classify tree species, and the classification results of the two classifiers were compared and analyzed. As shown in Figure 10, the SVM classifier better distinguishes Castanopsis hystrix intercropped with Chinese fir, while the KNN classifier yields a fragmented result with low classification accuracy, and some Castanopsis hystrix is misclassified as Chinese fir. This is because the KNN classifier depends strongly on the distance to the training samples. For the strip-shaped Castanopsis hystrix stands, the objects to be classified are easily affected by the Chinese fir samples, which leads to misclassification. According to Table 7, the classification accuracy of the SVM classifier is higher than that of the KNN classifier. As a mature supervised classification method, SVM requires few training samples and is easy to operate. For a multidimensional sample set, the system generates a hyperplane and moves it continuously until an optimal decision hyperplane is established, and then classifies the samples. Therefore, the SVM classifier performs well when the number of training samples is limited, which reduces misclassification. For example, slash pine and Mytilaria laosensis have high producer accuracy and user accuracy. In scheme E and scheme F, the classification accuracy of the SVM classifier using the selected features is higher than that using all the features. Similar conclusions were obtained in previous studies [36,80]: when there are excessive spectral features and other ancillary features, classifier performance and efficiency can be improved by eliminating redundant features.
With the increasing availability of high spatial resolution images, more and more studies have adopted object-based methods. Previous studies have shown that object-based methods provide better classification accuracy than pixel-based methods when using high spatial resolution images [20]. In our study, we used object-based classification to avoid the "salt and pepper" phenomenon, and effectively overcame this drawback by considering the spatial, shape, and texture features of the image. However, the object-based method has its own problems: the segmentation scale parameters are difficult to determine adaptively, and the classification accuracy is affected by the segmentation accuracy. Therefore, rapid optimization and improvement of the segmentation parameters are also important for improving classification accuracy.

The Role of Spectral Index Features
A comparison of the spectral reflectance curves of different tree species shows that the shape of the reflectance curve is similar among conifers and between conifers and broad-leaved species [35]. We also found that the reflectance difference between tree species is large in the visible region and larger still in the near-infrared region, which makes the near-infrared region conducive to tree species identification. In view of this, we built the new spectral indices in the near-infrared region.
In this study, ICA transformation features and spectral indices extracted from the hyperspectral imagery were used as spectral features. In scheme B, after adding the spectral index features, the overall classification accuracies of the two classifiers improved by 3.17% and 3.74%, respectively. The feature subsets used in schemes E and F share 13 features, including nine spectral features, which indicates that spectral features play an important role in the classification. Among the spectral indices, NDVI, PRI, GNDVI, SL2, and PSRI appear in both subsets, which suggests that the newly constructed index SL2 helps to improve the classification accuracy of tree species. Of the other selected indices, NDVI and GNDVI are related to the chlorophyll content of plants, PRI is associated with changes in carotenoids, and PSRI is related to the ratio of carotenoids to chlorophyll. This indicates that the preferred spectral indices are closely related to the chlorophyll and carotenoids of the vegetation, and four of the indices are related to vegetation reflectance in the near-infrared band. These factors can effectively distinguish different tree species. Therefore, the spectral index, as a remote sensing parameter of vegetation, plays an important role in forest resource monitoring. In this study, we added spectral index features to the ICA transformation images to classify tree species. Previous studies have also shown that when ground objects are subdivided into fine classes such as tree species, it is difficult to identify them by spectral index alone [81]. Therefore, in practical applications, combining spectral indices with relevant auxiliary data is an effective way to extract tree species information.
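For reference, four of the indices named above can be computed from band reflectances as in the following sketch. The formulas are the definitions commonly given in the literature, and the band centers (500, 531, 550, 570, 680, 750, and 800 nm) are assumptions that may differ from the bands used in this study. SL2 is omitted because its definition is not given in this section. The reflectance values are illustrative only.

```python
def normalized_difference(a, b):
    """Normalized difference (a - b) / (a + b) of two reflectances."""
    return (a - b) / (a + b)

def spectral_indices(r500, r531, r550, r570, r680, r750, r800):
    """Common literature definitions; band centers in nm are assumed."""
    return {
        "NDVI": normalized_difference(r800, r680),   # near-infrared vs. red
        "GNDVI": normalized_difference(r800, r550),  # near-infrared vs. green
        "PRI": normalized_difference(r531, r570),    # carotenoid-related
        "PSRI": (r680 - r500) / r750,                # carotenoid/chlorophyll ratio
    }

# Hypothetical canopy reflectances (illustrative values only)
indices = spectral_indices(
    r500=0.04, r531=0.06, r550=0.08, r570=0.07, r680=0.05, r750=0.40, r800=0.45
)
```

NDVI, GNDVI, and PSRI each include a near-infrared band (750 or 800 nm) in these definitions, whereas PRI uses only visible bands.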

The Role of Texture Features
With the improvement in the spatial resolution of remote sensing data, spatial information is increasingly used alongside spectral information. Because the structure and growth state of their canopies differ, conifer and broad-leaved species produce different texture features. In this study, texture information was added in scheme C, and the classification accuracy of both classifiers improved significantly; the overall accuracy of the SVM classifier reached 90.86%. This shows that texture features contribute to the classification. At present, there are many methods for texture analysis [82,83], among which the gray level co-occurrence matrix (GLCM) is recognized as the most widely used and best applied. When textures are extracted using the GLCM, the results depend closely on the window sliding direction, window sliding distance, and window size. In this study, texture features were extracted for multiple window sizes, and we selected the 17 × 17 window, which gave the highest classification accuracy. The classification accuracy for slash pine, masson pine, and Illicium verum was higher than that for other broad-leaved species, which is related to the crown width of each tree species. The canopy of the coniferous species is generally small, Illicium verum is low and its crown is also narrow, whereas the crowns of other broad-leaved species are generally wide; this shows that the selected texture window size is more suitable for small-crowned tree species. Therefore, in subsequent studies, the tree species could be stratified according to canopy characteristics and then classified using textures from different window sizes.
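To make the role of the sliding direction and distance concrete, the sketch below builds a symmetric, normalized GLCM for one pixel offset and derives two common texture measures from it. It is a minimal illustration and not the texture extraction used in this study: the function names, the 4 × 4 test image with four gray levels, and the choice of contrast and homogeneity are assumptions. In practice the matrix would be computed inside each sliding window (for example 17 × 17 pixels) of a quantized image band.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx  # neighbor at the chosen direction and distance
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count both directions (symmetric GLCM)
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def contrast(p):
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

def homogeneity(p):
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))

# Small test image with gray levels 0..3 (illustrative only)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4, dx=1, dy=0)  # horizontal neighbors at distance 1
texture_contrast = contrast(p)
texture_homogeneity = homogeneity(p)
```

Changing `dx` and `dy` changes the direction and distance of the pixel pairs, which is one reason the extracted textures depend on these settings.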

The Role of Canopy Height Model
Airborne hyperspectral imagery generally has high spatial and spectral resolution and has obvious advantages in tree species identification [26]. However, when only hyperspectral imagery is used for vegetation classification, different objects can have the same spectrum and the same objects can have different spectra. Airborne LiDAR data can provide accurate three-dimensional structural information and is well suited to describing the vertical height of complex forest. The CHM extracted from LiDAR data is an important feature variable, because different tree species generally have a specific height range.
Previous studies have shown that combining canopy height characteristics can improve tree species classification [28,84]. In this study, scheme D added CHM height information, and the classification accuracy improved as compared with the scheme without height information. The classification accuracy of other broad-leaved species increased significantly, with the producer accuracy improving by 15.24% and 12.96% for the SVM and KNN classifiers, respectively. This indicates that adding CHM features can effectively improve the classification accuracy of vegetation in complex forest areas, and it also shows that height information can play an important role when the spectra of tree species are similar. The addition of vertical structure information places the training samples in a relatively independent feature space with fewer interfering factors, and therefore the accuracy of the training samples is the key to improving the accuracy of image classification.
Tree height can distinguish the tree species very well. One reason is that stands of the same tree species generally have the same forest age, and therefore height information plays a greater role in improving the classification accuracy of the tree species. At the same time, we found that the accuracy of eucalyptus decreased after adding height features, because eucalyptus planted in different years differs in height. Therefore, in applications, tree height can be used as a key feature variable when distinguishing tree species in stands in which each species belongs to the same age group; otherwise, tree height information can introduce extra disturbances that lower the classification accuracy. In addition, the terrain of the study area selected in this study is relatively flat, so the pixel values of the CHM can be used to indirectly reflect the vegetation height, which can effectively improve the accuracy of tree species classification. However, in hilly areas, the error of the LiDAR point cloud data can increase, causing the generated DEM and DSM to be inaccurate, and therefore the CHM may reflect incorrect tree heights. The impact of adding CHM data on the classification accuracy of tree species in hilly areas therefore needs further analysis. Moreover, different point cloud densities can produce different CHM information, and the impact of point cloud density on tree species classification should be analyzed in subsequent studies.
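The dependence of the CHM on the DEM and DSM can be seen from the usual way a CHM is derived, namely as the per-pixel difference between the surface model and the terrain model. The sketch below assumes this standard definition; the function name, the clipping of negative differences to zero, and the elevation values are assumptions for illustration.

```python
def canopy_height_model(dsm, dem):
    """Per-pixel canopy height as DSM minus DEM, with negatives clipped to zero."""
    return [
        [max(surface - ground, 0.0) for surface, ground in zip(dsm_row, dem_row)]
        for dsm_row, dem_row in zip(dsm, dem)
    ]

# Hypothetical 2 x 2 elevation grids in meters (illustrative values only)
dsm = [[112.0, 118.5], [105.0, 104.8]]
dem = [[100.0, 100.5], [105.0, 105.0]]

chm = canopy_height_model(dsm, dem)
```

Because each CHM value is a difference of two interpolated surfaces, an error in either the DEM or the DSM carries over directly into the estimated tree height.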
In general, spectral features, texture features, and CHM height information play a role in improving the classification accuracy of tree species. The addition of each feature increases the separability between categories. Among them, the height information has a significant effect on improving the classification accuracy of other broad-leaved species. When the spectral information of the tree species is similar, complete utilization of the height information and texture features can significantly improve the identification accuracy for the even-aged forest.

Conclusions
In this study, we used airborne hyperspectral images and LiDAR data for object-based tree species classification. Independent components, spectral indices, and texture features were extracted from the airborne hyperspectral data, new spectral indices were constructed by analyzing the spectral curves of the tree species, and CHM features were extracted from the LiDAR data. On the basis of feature combination and feature selection, we compared and analyzed the contribution of different features and classifiers to object-based classification. The following conclusions can be drawn:
(1) Compared with the KNN classifier, the SVM classifier has higher classification accuracy, with the highest classification accuracy of 94.68% and a Kappa coefficient of 0.937. This shows that the SVM classifier performs better when the number of training samples is limited. By eliminating redundant features, the classification accuracy and performance of the SVM classifier can be further improved, and recursive feature elimination based on SVM is a better feature selection method than random forest.
(2) Among the spectral indices, NDVI, PRI, GNDVI, SL2, and PSRI are in the selected feature subsets, indicating that the newly constructed SL2 spectral index plays a role in improving classification accuracy. At the same time, the preferred spectral indices are closely related to vegetation chlorophyll and carotenoids, and four indices are related to the near-infrared band. These factors can effectively distinguish different tree species.
(3) With the addition of texture features, the classification accuracy of both classifiers is significantly improved. The classification accuracy of slash pine, masson pine, and Illicium verum was higher than that of other broad-leaved species. Therefore, the selected texture window size is more suitable for small-crowned tree species, which implies that using a single texture window size has certain limitations. Considering the type of forest, using multiscale texture window sizes should be a new research topic in improving tree species classification.
(4) CHM height information has a significant effect on improving the classification accuracy of tree species, especially other broad-leaved species. It can effectively distinguish tree species with similar spectral features but different tree heights. The accuracy of the CHM is affected by the terrain; in hilly areas, the CHM may reflect incorrect tree heights. In addition, the CHM is related to the LiDAR point cloud density, and therefore the influence of point cloud density and terrain factors on the CHM and on tree species classification needs further analysis.
(5) Object-based classification can avoid the "salt and pepper" phenomenon, but its classification accuracy is affected by the segmentation accuracy. Because segmentation scale parameters are difficult to determine adaptively, rapid optimization and improvement of the segmentation parameters are important for improving classification accuracy.