Object-Based Approach Using Very High Spatial Resolution 16-Band WorldView-3 and LiDAR Data for Tree Species Classiﬁcation in a Broadleaf Forest in Quebec, Canada

Abstract: Species identification in Quebec, Canada, is usually performed with photo-interpretation at the stand level, and often results in a lack of precision which affects forest management. Very high spatial resolution imagery, such as WorldView-3 and Light Detection and Ranging have the potential to overcome this issue. The main objective of this study is to map 11 tree species at the tree level using an object-based approach. For modeling, 240 variables were derived from WorldView-3 with pixel-based and arithmetic feature calculation techniques. A global approach (11 species) was compared to a hierarchical approach at two levels: (1) tree type (broadleaf/conifer) and (2) individual broadleaf (five) and conifer (six) species. Five different model techniques were compared: support vector machine, classification and regression tree, random forest (RF), k-nearest neighbors, and linear discriminant analysis. Each model was assessed using 16-band or first 8-band derived variables, with the results indicating higher precision for the RF technique. Higher accuracies were found using 16-band instead of 8-band derived variables for the global approach (overall accuracy (OA): 75% vs. 71%, Kappa index of agreement (KIA): 0.72 vs. 0.67) and tree type level (OA: 99% vs. 97%, KIA: 0.97 vs. 0.95). For broadleaf individual species, higher accuracy was found using first 8-band derived variables (OA: 70% vs. 68%, KIA: 0.63 vs. 0.60). No distinction was found for individual conifer species (OA: 94%, KIA: 0.93). This paper demonstrates that a hierarchical classification approach gives better results for conifer species and that using an 8-band WorldView-3 instead of a 16-band is sufficient.


Introduction
Forest characterization in Quebec, Canada, is usually assessed based on photo-interpretation using three-dimensional appearance. This approach has been used since the last century and is still in use for forest planning and forest composition analysis [1]. New techniques, such as image enhancement, have been developed over the years using aerial imagery and user-friendly software, and the information provided has been well accepted by and proven useful for foresters [2,3]. However, species identification with these newer methods still lacks precision and varies among photo-interpreters, mainly because this characterization is made at the stand level, as species identification at the tree level would be time consuming and expensive [3,4]. Recently, very high spatial resolution satellite imagery has become more available and could be used to classify tree species at the tree level across different biomes [5-7]. In addition, with an airborne laser scanner or "LiDAR" (light detection and ranging), an infrared laser can scan the surface of the earth, generating a 3D point cloud that can be used to analyze the tree structure [8,9].
For the purposes of the project and due to time limitations, three study areas totaling 26.1 km² were selected to collect field data, train and apply the models. Those areas contain mature forest stands composed of dominant tree species with diverse structural stands and topography.
Remote Sens. 2020, 12, x FOR PEER REVIEW


Imagery and Airborne Laser Scanner Data
Very high spatial resolution 16-band WorldView-3 imagery was acquired for the study areas on 26 August 2016 with a nadir view angle of 12.9° and a solar elevation of 54.3° (Table 1). Two cloud-free images were collected and preprocessed. The images were geometrically corrected with Rational Polynomial Coefficients (RPCs), radiometrically calibrated (radiance to reflectance) and then pansharpened using the least squares algorithm [82]. No atmospheric correction or dehaze reductions were applied since many artifacts were produced, and this preprocessing has proven to be less effective than pansharpening for tree classification purposes using high spatial resolution imagery [6,30,82]. The orthorectification was done using a 5 m LiDAR Digital Elevation Model with OrthoEngine of PCI Geomatica [6,7]. For the purpose of the study, all sixteen bands were rescaled to 30 cm, similarly to Li et al. [45]. Finally, the images were mosaicked with a bundle adjustment and then fitted with the CHM using 100 tie points to reduce the offset in the canopy [83]. A second-order polynomial regression was used to create the final mosaic with a root mean square error of 0.97 m. Inspired by Zhou and Qiu [84] and Hartling et al. [31], deep shadow was extracted from the mosaic using a maximum likelihood classification with a shadow index [85]. The Bhattacharyya index showed a separability of 1.94 to detect deep shadow against other elements. This result was used to mask the mosaic for subsequent processes. Those preprocesses were carried out using PCI Geomatica (version 2016) and ENVI (version 5.4).
LiDAR data were acquired on 17 June 2015, with a point density of 10 pts/m². The sensor used was a Riegl Q-780 system with a pulse repetition frequency of 400 kHz and a laser wavelength of 1064 nm. For the acquisition, a Cessna 172 airplane was flown at a mean height of 1200 m above ground level with a flight speed of 185 km/h. The absolute accuracy in xyz was 30 cm. The point cloud was classified by the provider into five classes: Unassigned, Ground, High vegetation, Building and Water. We used the Ground class (lowest points) to create a 50-cm Digital Elevation Model (DEM) and the High vegetation class (highest points) to produce a 50-cm Digital Surface Model (DSM). We subtracted the DEM from the DSM to obtain the 50-cm CHM. This procedure was performed using the LAS Dataset To Raster function and the Raster Calculator tool in ArcGIS Desktop 10.6.
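The CHM derivation above (DSM minus DEM on matching 50 cm grids) can be sketched with NumPy. This is a minimal illustration: the raster I/O performed in ArcGIS is omitted, and the toy arrays stand in for the 50 cm grids.

```python
import numpy as np

def compute_chm(dsm: np.ndarray, dem: np.ndarray) -> np.ndarray:
    """Canopy Height Model = Digital Surface Model - Digital Elevation Model.
    Negative differences (noise or interpolation artifacts) are clipped to 0."""
    chm = dsm - dem
    return np.clip(chm, 0.0, None)

# Toy 3x3 example: flat terrain at 100 m elevation, one 20 m tree in the centre cell.
dem = np.full((3, 3), 100.0)
dsm = dem.copy()
dsm[1, 1] = 120.0
chm = compute_chm(dsm, dem)
```

The same subtraction is what the Raster Calculator expression performs cell by cell.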

Field Survey and Data Collection
A total of 515 dominant trees were identified and positioned in the study areas using a high precision Trimble Pro6H GPS with a mean precision of 1 m after post-processing. We targeted individual trees or groups of trees presenting the same species with a minimum crown diameter of 5 m. Geographical coordinates and physical parameters, such as tree heights and crown sizes, were measured or interpreted from WorldView-3 and aerial images [1]. Based on GPS positions, a manual delineation was made by photo-interpretation to fit the crown to the correct tree on the WorldView-3 images (Figure 2) [6,7,86]. Thus, 185 broadleaf and 153 coniferous tree samples (total of 338) remained after this manual delineation exercise, as only visible and identifiable crowns were kept (Table 2). Mean crown sizes vary between species (22-85 m²) and the mean height is over 16 m in all cases.

Figure 2. Manually delineated crowns by photo-interpretation and their corresponding GPS points collected during the field campaign. The background image displays WorldView-3 in false colors (infrared, green and blue). The species' abbreviations are described in Table 2.

Table 2. Tree samples per species.

Species | Abbr. | Type | Mean Crown Size (m²) | Mean Height (m) | – | Training | Test | Total
Big Tooth Aspen | BT | BL | 42 | 25 | 4 | 13 | 5 | 18
Red Oak | RO | BL | 60 | 24 | 3 | 24 | 10 | 34
Sugar Maple | SM | BL | 85 | 24 | 3 | 37 | 9 | 46
Yellow Birch | YB | BL | 63 | 22 | 4 | 36 | 10 | 46
Balsam Fir | BF | CN | 22 | 16 | 3 | 13 | 3 | 16
Eastern White Cedar | EC | CN | 31 | 21 | 5 | 16 | 5 | 21
Eastern Hemlock | HK | CN | 39 | 23 | 3 | 29 | 9 | 38
Red Pine | RP | CN | 59 | 28 | 3 | 15 | 5 | 20
White Pine | WP | CN | 64 | 26 | 4 | 38 | 10 | 48
White Spruce | WS | CN | 35 | 20 | 5 | 7 | 3 | 10
Total | | | | | | 259 | 79 | 338

Derived Variables
WorldView-3 imagery data were used to extract, calculate and adapt variables based on the available literature. The details of these variables are given in Appendix A. The spectral variables were computed from the 16 bands so as to cover diverse spectral elements (vegetation, wetness, openness etc.). Spectral variables of reflectance values were calculated by pixel [6] and object-based methods using R [87], SAS software [88] and eCognition Developer. Object-based indices consist of a series of customized arithmetic features calculated using the mean of pixel values within an object of specific bands [16,19,80]. Arithmetic features were also calculated for the 95th percentile highest pixels within each object in order to use the brightest (sunlit) parts of each crown. Although they could be correlated with pixel-based indices, arithmetic features are quick to calculate, which is a significant advantage when working with massive data. Textural variables were extracted using eCognition Developer 9.2.
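The arithmetic features described above — the mean of all pixel values within an object, and the mean of the brightest (sunlit) pixels at or above the 95th percentile — can be sketched in NumPy. The crown reflectance values below are hypothetical.

```python
import numpy as np

def arithmetic_features(pixels: np.ndarray, top_fraction: float = 0.05):
    """Object-based arithmetic features for one crown object:
    the mean of all pixel values, and the mean of the brightest 5%
    (pixels at or above the 95th percentile, i.e., the sunlit crown)."""
    mean_all = pixels.mean()
    threshold = np.percentile(pixels, 100 * (1 - top_fraction))
    mean_top = pixels[pixels >= threshold].mean()
    return mean_all, mean_top

# Hypothetical reflectance values for the pixels of one segmented crown,
# mostly shaded with a few bright sunlit pixels.
crown = np.array([0.10, 0.12, 0.11, 0.13, 0.12, 0.40, 0.42, 0.11, 0.12, 0.13,
                  0.12, 0.11, 0.10, 0.13, 0.12, 0.11, 0.12, 0.13, 0.11, 0.45])
mean_all, mean_sunlit = arithmetic_features(crown)
```

The sunlit mean isolates the bright crown top, which is why it was computed in addition to the plain object mean.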

Tree Crown Segmentation from Fused Data
The first step of the object-based approach is to segment the territory into contextual objects, such as single tree crowns. ITC segmentation using CHM is challenging in complex forest stands [89]. We decided to analyze trees over 17 m in height to reduce the effect of understory on canopy gaps [29,90] and as a trade-off based on the field survey (see Table 2). A 2 m buffer around mature trees (>17 m) was incorporated to keep pixels that are part of the same crown but that have smaller heights; those less than 7 m in height were eliminated and used as a mask for further processing [16,29]. Preprocessed CHM is usually utilized for ITC segmentation [15,21,91-93]. Various spatial filters with different window sizes (two or four pixels) and shapes (rectangular or circular) were tested, and the one that best fit the tree crowns visually was selected. We then combined the original and filtered CHMs with the Jakubowski et al. [16] method to keep both the edges of the original CHM and the smooth central crown of the filtered CHM. Topological operations were also undertaken to merge small crowns [93].
Three CHMs were evaluated in this study: original, filtered and corrected. Similarly to Pham et al. [32] and Koch et al. [15], an inverse watershed segmentation algorithm was used within eCognition Developer. This algorithm uses a CHM to find local maxima and to grow a region according to the heights of neighboring pixels until it touches another object [94]. Several parameters were tested in order to select the optimal values for the height differences between local maxima and minima and for object areas, using trial and error with various combinations. A neighborhood of eight pixels was used in order to produce disconnected objects. Considering the high resolution of the CHM (50 cm), three thresholds were selected: (1) an overflow area of 25 pixels (6.25 m²); (2) an overflow depth (difference between maximum and minimum height) of 0.5 m; and (3) an overflow volume of 15 m³. Objects below those thresholds were merged with their neighboring objects.
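The three overflow thresholds can be expressed as a small helper function. This is a sketch: the text does not specify whether an object must fall below one or all thresholds to be merged, so here we assume an object is kept only when it passes all three.

```python
# Thresholds from the text, for a 50 cm CHM (pixel area 0.25 m^2).
MIN_AREA_PX = 25        # overflow area: 25 pixels = 6.25 m^2
MIN_DEPTH_M = 0.5       # overflow depth: max height - min height
MIN_VOLUME_M3 = 15.0    # overflow volume

def keep_as_object(area_px: int, max_h: float, min_h: float, volume_m3: float) -> bool:
    """Return True when a candidate watershed object passes all three
    thresholds; objects failing any threshold are merged with a neighbor."""
    return (area_px >= MIN_AREA_PX
            and (max_h - min_h) >= MIN_DEPTH_M
            and volume_m3 >= MIN_VOLUME_M3)
```

For example, a 30-pixel object spanning 21.0-22.0 m with 20 m³ of canopy volume is retained, while a 10-pixel, 0.2 m deep object is merged away.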
The approach based on the CHM depends on height variation detection and is therefore related to the shape of the crowns. However, different neighbor tree species may have overlapping branches and could produce confusion when using imagery data to classify tree species. We added a multiresolution segmentation to make the objects generated with CHM more precise by using imagery at a sublevel [5,6]. This segmentation is a bottom-up region merging technique. Combining CHM and imagery to make ITC segmentation allows pure species objects to be delineated. The multiresolution segmentation algorithm with the most significant bands identified with statistical analysis was used, as suggested by Koch et al. [15] (Section 2.5.1). This algorithm requires several empirical parameters, including: (1) the weights of each selected bands; (2) a color/shape weight associated with the spectral/shape homogeneity; (3) a compactness/smoothness weight according to the object shape; and (4) a scale parameter referring to the average size of objects. In this study, to integrate non-correlated visible and infrared information, and by trial and error, four bands (3, 5, 6 and 7) were selected and assigned the same weight (1). The weights of color and compactness were set as 0.3 and 0.7, respectively, in order to insist on spectral variations while retaining more compact objects such as tree crowns. A scale parameter of 100 was selected with visual assessments in order to limit under-segmentation. It has been shown that under-segmentation affects the classification accuracy more than over-segmentation [95]. Those parameters were empirically chosen to ensure best results for ITC segmentation [19]. The performance of ITC segmentation based on CHM alone and then with imagery was assessed using 30 random non-isolated crowns with photo-interpretation.
We first evaluated if a segment represented a single crown. Considering the offset between the data and the complexity of the forest structure, a single crown was identified when the segment contained at least 75% of the corresponding tree [15]. Then, we evaluated the number of species encompassed within the segment. The ITC segmentation technique which gave the best accuracy was retained and used for classification over the study areas.

Variable Selection
In accordance with the literature, a total of 240 variables were used as predictor variables, including 64 band statistics (mean, skewness, standard deviation), 112 spectral indices and 64 textures (detailed in Appendix A). Variable selection is a crucial procedure before modeling. Using a large number of predictor variables is time consuming, requires high computing capacity, reduces reproducibility, and yields results that are not easily interpretable. Furthermore, the use of a large number of predictor variables does not necessarily produce the best results, since the model performance can vary widely according to the variables utilized.
As a first step, outliers were removed from the dataset, identified based on the work of Brillinger [96]. We first calculated the interquartile range (IQR) for our data. Only values falling between the median plus 2.5 times the IQR and the median minus 2.5 times the IQR were conserved. A datum was removed when it was identified as an outlier for more than eight variables.
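The outlier rule above can be sketched in NumPy. The multiplier 2.5 and the eight-variable cutoff follow the text; the array shapes and names are illustrative.

```python
import numpy as np

def outlier_mask(X: np.ndarray, k: float = 2.5, max_flags: int = 8) -> np.ndarray:
    """Flag samples following the procedure in the text: a value is an outlier
    for one variable when it falls outside [median - k*IQR, median + k*IQR];
    a sample is removed when it is an outlier for more than `max_flags`
    variables. X is (n_samples, n_variables); returns a mask of samples to KEEP."""
    med = np.median(X, axis=0)
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    low, high = med - k * iqr, med + k * iqr
    flags = (X < low) | (X > high)       # per-sample, per-variable outlier flags
    return flags.sum(axis=1) <= max_flags

# Toy data: sample 0 is extreme in all 10 variables and is dropped.
X = np.zeros((5, 10))
X[0, :] = 100.0
keep = outlier_mask(X)
```

Applying `X[keep]` then yields the cleaned dataset used for the correlation analysis.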
Secondly, to reduce the dimensionality of the data we proceeded with a correlation analysis to avoid redundant information. Pairs of variables with correlation coefficients greater than 0.85 were considered to be highly correlated, and the variable presenting the highest mean correlation with the remaining predictor variables was discarded from further analysis [97]. To simplify the analysis, those procedures were applied to parametric and non-parametric models. For all models except LDA, further procedures were applied. At the end of the correlation analysis, this reduced set of variables was introduced as input to the Boruta algorithm [98]. Boruta is a wrapper algorithm that seeks to capture all the important, interesting features in a dataset with respect to an outcome variable. The 15 most significant variables were selected. Finally, as machine learning algorithms do not always retain the relevant variables [99,100], we created a loop among the 15 selected variables from Boruta to evaluate the performance of models created by all possible combinations of a number of variables (3, 4, 5, 6, 7, 8, 9 and 10). For the LDA, variable selection was done with the Stepdisc procedure in SAS using the stepwise selection method and a variable entry and staying significance level of 0.005. Those iterations were processed within the training dataset. We then retained the combination that gave the best performance for each classification technique. Those procedures were independently done for 16-band and 8-band WorldView-3 derived variables.
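The correlation filter described above can be sketched as follows. This is a simplified NumPy version of the rule (drop, from each pair with |r| > 0.85, the variable with the highest mean correlation to the rest); the data are synthetic.

```python
import numpy as np

def drop_correlated(X: np.ndarray, threshold: float = 0.85) -> list:
    """Iteratively remove variables from highly correlated pairs (|r| > threshold);
    from each pair, the variable with the higher mean absolute correlation to the
    remaining variables is discarded. Returns the kept column indices."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = list(range(X.shape[1]))
    changed = True
    while changed:
        changed = False
        sub = corr[np.ix_(keep, keep)]
        n = len(keep)
        for i in range(n):
            for j in range(i + 1, n):
                if sub[i, j] > threshold:
                    # mean correlation of each candidate with all other kept variables
                    mi = (sub[i].sum() - 1) / (n - 1)
                    mj = (sub[j].sum() - 1) / (n - 1)
                    keep.pop(i if mi >= mj else j)
                    changed = True
                    break
            if changed:
                break
    return keep

# Synthetic data: columns 0 and 1 are nearly identical, column 2 is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
X = np.column_stack([a, a + 0.01 * rng.normal(size=100), rng.normal(size=100)])
kept = drop_correlated(X)
```

One of the two redundant columns is dropped while the independent column survives, mirroring the reduction from 240 variables to the 50 (8-band) and 75 (16-band) uncorrelated sets reported later.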

Modeling Process
We began by attempting to model all 11 tree species, five broadleaf and six coniferous, in a global approach. Next, we used a hierarchical approach with group classifications, similarly to Wessel et al. [60]. In the first step, we attempted to separate the two tree types (broadleaf and conifer trees). In the second step, we proceeded to the modeling of individual tree species belonging to each type. For the classification, four modeling procedures were implemented under R: RF [101], SVM [62], k-NN [102], CART [102,103] and one under SAS: LDA [76,77].
To avoid overfitting of the classification models, independent validation was conducted by dividing the available reference data into two sets, where 80% of the total samples were used for model calibration [25] and the remaining samples were used as the test set (Table 2). Tuning was applied differently to each model. To find the best gamma and penalty parameters for SVM we used a grid search over a supplied parameter range, and the combination of parameters that maximized model performance was retained. The CART model was tuned with different complexity parameter (cp) values evaluated using cross-validation; the best cp was defined as the one that maximized the cross-validation accuracy. For the RF model, we selected the number of trees to grow (ntree) as the number from which the error converged and remained stable based on the out-of-bag (OOB) error. Careful selection of this parameter is key, as we want enough trees to stabilize the error but not so many that we over-correlate the ensemble. The number of variables randomly sampled as candidates at each split (mtry) was selected based on the value that minimized the OOB error and maximized the model performance. The neighborhood parameter (k) for the k-NN algorithm was selected based on the model achieving the best accuracy when varying k. For the LDA, selected variables were included using the Discrim procedure in SAS with a parametric method based on a multivariate normal distribution within each class to derive a linear discriminant function [88]. Five models were developed in order to compare their performance using various combinations of predictor variables and tuning parameters. To determine if the eight extra bands of WorldView-3 imagery (SWIR 1 to 8) would improve classification accuracy, those procedures were first applied with variables derived from 8-band WorldView-3 and then repeated with variables derived from 16-band WorldView-3.
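The split-and-tune loop above can be illustrated with a minimal k-NN example in pure NumPy. The data are synthetic, and for brevity k is selected on the held-out set rather than by the cross-validation used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical feature matrix: two classes well separated along every axis.
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(4, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# 80/20 split, as used for all five models in the study.
idx = rng.permutation(len(y))
cut = int(0.8 * len(y))
train, test = idx[:cut], idx[cut:]

def knn_predict(Xtr, ytr, Xte, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (ytr[nearest].mean(axis=1) > 0.5).astype(int)

# Tune k: retain the neighborhood size giving the best held-out accuracy.
accs = {k: (knn_predict(X[train], y[train], X[test], k) == y[test]).mean()
        for k in (1, 3, 5, 7, 9)}
best_k = max(accs, key=accs.get)
```

The same pattern — fit on the 80% calibration set, score candidate parameter values, keep the best — applies to the SVM grid search, the CART cp sweep and the RF ntree/mtry selection.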
Once the best-performing model was identified it was used to map the tree species' distribution throughout the study areas.

Model Performance
The classification performance of each model was assessed based on confusion matrices computed using reference data (20%) selected as a test dataset. Comparing the observed and the predicted data allowed us to assess producer and user accuracies. Overall accuracy (OA) was calculated by averaging accuracies among all classes [104]. We calculated Cohen's Kappa index of agreement (KIA) to evaluate the possibility of an agreement occurring simply by chance [105]. The KIA is a robust statistic useful for reliability testing. Similar to correlation coefficients, it can range from −1 to +1, where 0 represents the amount of agreement that can be expected from random chance, and 1 represents perfect agreement [106,107]. We compared all the models and selected the optimal ones offering the highest OA and KIA.
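The OA and KIA computations above can be written directly from a confusion matrix. A NumPy sketch follows; the two-class matrix is hypothetical.

```python
import numpy as np

def oa_and_kappa(cm: np.ndarray):
    """Overall accuracy and Cohen's Kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    n = cm.sum()
    po = np.trace(cm) / n                                  # observed agreement (OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return po, kappa

# Hypothetical two-class confusion matrix.
cm = np.array([[45, 5],
               [10, 40]])
oa, kia = oa_and_kappa(cm)
```

Here the observed agreement is 0.85 while chance agreement is 0.5, so the Kappa of 0.7 discounts the accuracy expected from random assignment.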

Individual Tree Crown Segmentation and Assessment
Three CHMs were used in the ITC segmentation: the original CHM, a filtered CHM and a corrected CHM. Figure 3 presents a 3D profile for each CHM. It shows that the original CHM has a high z range and the filtered CHM smooths the z variation. The corrected CHM is composed of both the original local variety and the filtered CHM smoothness.
As a second step, imagery was added to the ITC segmentation. The assessment shows similar results for filtered and corrected CHMs (Table 3). They produced better delineation than the original CHM for single crown (63% vs. 40%) and single species (73% vs. 70%) segmentation.
Our results indicate that the use of CHMs alone for ITC segmentation can lack precision, especially when different species' crowns are interlaced or when their neighbors are at the same height. The combination of filtered CHM and imagery showed the best result for single crown delineation (68%). Combinations of original and corrected CHMs with imagery indicated that 56% and 64% of the objects fitted a single crown, respectively, showing over-segmented crowns. ITC segmentation assessment showed that the best results for single species were found by combining a corrected CHM with imagery. For this combination, 82% of the objects represented a single species, in contrast to original (74%) and filtered (75%) combinations. For the residual objects (18%), a majority showed a single species for 50% of their area. Figure 4 shows an example of ITC segmentation using a corrected CHM combined with imagery, the combination chosen for tree mapping.


Classification and Assessment
For a visual analysis, the mean spectral values were calculated for each tree species of the training dataset (Figure 5). The species are most discriminated in the NIR region. Conifers (Figure 5B) are more separable than broadleaf species (Figure 5A), as their curves are more distinguishable.
Variable selection was conducted prior to the classification. From the 240 original derived variables, 50 and 75 were not correlated using the first eight bands and using all the bands, respectively, and were processed in the Boruta algorithm. The 15 most significant variables were then selected and utilized to run all possible combinations for all models except LDA. Among those 15 variables, three were especially notable: (A) the ARI_mean_95pc_higher index (Anthocyanin Reflectance Index); (B) the GLCM_Entropy_Band_7; and (C) the GMI2_mean index (Simple NIR/Red-edge Ratio) (Figure 6). The ARI was calculated with the 95th percentile highest pixels and allows an estimation of the anthocyanin (water-soluble vacuolar pigments) accumulation in intact stressed and senescing leaves [108]. Broadleaf trees have higher values than conifers, which is consistent with the fact that leaves, unlike needles, accumulate more anthocyanin to protect them from sunlight [109]. The GLCM (Grey Level Co-occurrence Matrix) entropy texture index was calculated using band 7 (832.5 nm); this index has a high value when all pixels are of similar magnitude [110,111]. Its values show that the Sugar Maple (SM) species presents the most uniform pattern. A crown with small bumps and shallow cavities, with an appearance similar to that of broccoli, is indeed a characteristic used in photo-interpretation to identify this species [112]. The GMI2 (Gitelson and Merzlyak Index) is a simple ratio that allows chlorophyll content estimation over a wide range of pigment variation using insensitive (B7: near-infrared) and sensitive (B6: red-edge) bands, and could be considered an improved Normalized Difference Vegetation Index (NDVI) [113]. This index shows values that vary between conifer species. White Pine (WP) is the species with the highest value, indicating a higher chlorophyll content than the other conifer species [114].
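The ARI and GMI2 indices discussed above reduce to simple band arithmetic per object. A NumPy sketch follows; the reflectance values are hypothetical, with band 3 as green, band 6 as red-edge and band 7 as near-infrared, following the WorldView-3 band order used in the paper.

```python
import numpy as np

def ari(green_b3: np.ndarray, red_edge_b6: np.ndarray) -> np.ndarray:
    """Anthocyanin Reflectance Index: 1/green - 1/red-edge (bands 3 and 6)."""
    return 1.0 / green_b3 - 1.0 / red_edge_b6

def gmi2(nir_b7: np.ndarray, red_edge_b6: np.ndarray) -> np.ndarray:
    """Gitelson and Merzlyak Index 2: simple NIR / red-edge ratio (bands 7 and 6)."""
    return nir_b7 / red_edge_b6

# Hypothetical mean object reflectances for one crown.
b3, b6, b7 = np.array([0.08]), np.array([0.25]), np.array([0.45])
ari_value = ari(b3, b6)
gmi2_value = gmi2(b7, b6)
```

Computed on object means they yield the "variable_mean" features; computed on the top 5% brightest pixels they yield the "variable_mean_95pc_higher" features.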

Figure 6. Boxplots of the (A) ARI_mean_95pc_higher index; (B) GLCM_Entropy_Band_7; and (C) GMI2_mean index for tree types, broadleaves and conifers, respectively. The species' abbreviations and the number of training samples used to calculate the mean are described in Table 2. The variables' abbreviations are described in Appendix A.
For the global modeling approach, RF (ntree: 2000; mtry: 3) was selected as the best model based on the performance assessment, as it gives an OA of 75% and a KIA of 0.72 when using 16-band derived variables (Table 4). This performance was achieved using nine variables. SVM also performed well, with 71% OA and a KIA value of 0.68. The k-NN, CART and LDA approaches offered less-precise classification, with OAs below 61%. All the model performances declined or were stable when using 8-band instead of 16-band derived variables, except for LDA, which increased by 5% (61% to 66%). For the hierarchical modeling approach, RF (ntree: 2000; mtry: 2) presented the best performance, with an OA of 99% and a KIA of 0.97 for tree type (broadleaf/conifer) modeling (Table 5). This performance was achieved using four variables derived from 16-band WorldView-3. If only 8-band instead of 16-band derived variables were used, RF and k-NN achieved the best performances (OA: 97%, KIA: 0.95). All the model performances declined or were stable when using 8-band derived variables. For broadleaf and conifer species modeling, RF (ntree: 2000; mtry: 2) gave the best performances using 8-band derived variables, with OAs of 70% and 94% using six and seven variables, respectively (Table 6). For broadleaf modeling with RF, the KIA value (0.63) indicated a moderate agreement, which is considered a substantial agreement as presented by McHugh [106]. The KIA was greater than 0.90 for conifer species (0.93), which is considered an almost perfect agreement [106]. For RF, using 16-band instead of 8-band derived variables did not increase OAs for broadleaves (70% vs. 68%) or conifers (94% in both cases).

Table 5. Summary of the performance assessment using training and test sets for the five modeling techniques (RF, SVM, k-NN, CART and LDA) for the tree type (broadleaf/conifer) modeling approach.
For tree mapping, the selected modeling technique with the highest overall accuracy (OA) is highlighted in dark. KIA: Kappa Index of agreement.

Without considering the global approach, the three models selected to classify tree species used a total of 16 variables: nine spectral indices, three simple bands, one standard deviation and three textures (Table 7). Out of the 16 WorldView-3 bands, a total of nine bands were used in our selected models (Table 8).

The following error matrices come from the best models (RF) using 16-band and 8-band derived variables for the global approach and tree type, and for individual species, respectively. Although the RF model presented an OA of 75% for the global approach, its precision per species fluctuated highly: user's accuracy in the confusion matrix varied between 58% for SM and 100% for Big Tooth Aspen (BT), Eastern Hemlock (HK) and White Spruce (WS), while producer's accuracy varied between 33% for WS and 100% for HK and WP (Table 9). The tree type model (Table 10) had almost perfect results (OA: 99%), with a single error: one conifer was classified as a broadleaf. For broadleaf species identification (OA: 70%) (Table 11), user's accuracy was over 70% for all species except Red Oak (RO) and Yellow Birch (YB) (67%), and producer's accuracy was over 60% for all species. For conifer species (OA: 94%) (Table 12), all species were perfectly classified (100%), except WP (83%) for user's accuracy and Eastern White Cedar (EC) and Red Pine (RP) (80%) for producer's accuracy.
The RF model was used to map tree types and individual broadleaf and conifer tree species over the study areas. Figure 7 illustrates the classification map for an island composed of both tree types. Tall WP trees are especially visible on the border of the island. The inland is mainly composed of SM and YB. Smaller trees (<17 m) were not mapped.
Table 7. Description of the sixteen variables used in the selected models for the hierarchical approach. For spectral indices calculated with the 95th percentile highest pixels, the abbreviation is "variable_mean_95pc_higher"; for arithmetic features calculated for spectral indices, the abbreviation is "variable_mean". Band numbers are described in Table 1.

Abbreviation | Vegetation Index | Adapted Formula | Models | References
ARI_mean | Anthocyanin Reflectance Index | 1/B3_mean − 1/B6_mean | Conifer | [115]
ARI_mean_95pc_higher | Anthocyanin Reflectance Index | Arithmetic mean of the 5% highest pixel values of the object with ARI | Tree type | [115]
Band_1_mean | Layer values | Mean value of band 1 of the pixels forming the object | Broadleaf | [6]
Band_12_95pc_higher | Layer values | Arithmetic mean of the 5% highest pixel values of the object using band 12 | Tree type | [6]
Band_5_95pc_higher | Layer values | Arithmetic mean of the 5% highest pixel values of the object using band 5 | Broadleaf | [6]
GLCM_Entropy_Band_7 | Texture values | Entropy calculated with the values of band 7 of the pixels forming an object | Broadleaf; Conifer | [110,111]
GLCM_Homogeneity_Band_3 | Texture values | Homogeneity calculated with the values of band 3 of the pixels forming an object | Conifer | [110,111]
GLCM_Homogeneity_Band_4 | Texture values | Homogeneity calculated with the values of band 4 of the pixels forming an object | Conifer | [110,111]
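The first two rows of Table 7 illustrate the two calculation styles and can be written out as follows; this is a sketch only, and the band arrays are hypothetical reflectance values, not the study's data:

```python
import numpy as np

def ari_mean(b3, b6):
    """ARI_mean: arithmetic feature, 1/B3_mean - 1/B6_mean per object."""
    return 1.0 / b3.mean() - 1.0 / b6.mean()

def ari_mean_95pc_higher(b3, b6):
    """ARI_mean_95pc_higher: mean of the 5% highest per-pixel ARI values."""
    ari = 1.0 / b3 - 1.0 / b6            # pixel-based ARI within the object
    top = ari >= np.percentile(ari, 95)  # keep the top 5% of ARI values
    return ari[top].mean()
```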

Discussion
This study compares five different models for mapping 11 tree species in a natural North American forest based on WorldView-3 imagery and LiDAR data. The proposed method is characterized by three main aspects: (1) an object-based segmentation technique using imagery and LiDAR; (2) a hierarchical classification approach with more than ten species; and (3) model iterations for optimal selection. ITC segmentation is usually implemented when mapping species at the tree level, and studies have often used LiDAR data [13,119] or imagery [6,11,120]. Using only LiDAR or only imagery at the tree level results in objects with merged tree crowns [121], especially in a mature broadleaf forest like the one in the present study. Both data types can be used together to limit this effect. As an example, Heinzel and Koch [121] delineated ITCs using a pixel-based classification within the objects to avoid neighbor tree errors. While Alonso-Benito et al. [39] used LiDAR and imagery for segmentation, they did not classify at the tree level. Koukoulas and Blackburn [83] also used both data types, but with a succession of complex GIS procedures to find treetops. The ITC segmentation proposed here follows a watershed algorithm [122] applied to LiDAR data, similarly to Weinacker et al. [93] and Koch et al. [26]. Significant bands for tree types (broadleaf and conifer) were then used to refine the segmentation using a multiresolution algorithm, as suggested by Pham et al. [32] and Koukoulas and Blackburn [83]. This approach has similarities with multiscale approaches for separating species in a dense and complex forest. Indeed, raster-based ITC segmentation approaches do not allow object overlaps, yet offer a more realistic representation of a broadleaf natural forest [16].
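The CHM-based watershed step described above can be sketched as follows. This is a minimal stand-in (not the study's implementation, which further refines objects with a multiresolution algorithm), assuming a raster CHM in metres and the 17 m height mask:

```python
import numpy as np
from scipy import ndimage as ndi

def segment_itc(chm, min_height=17.0, sigma=1.0):
    """Watershed-style ITC segmentation from a LiDAR canopy height model (CHM)."""
    # 1. Smooth the CHM to suppress branch-level local maxima.
    smoothed = ndi.gaussian_filter(chm, sigma=sigma)
    # 2. Treetops: local maxima above the height threshold (17 m mask in the study).
    peaks = (smoothed == ndi.maximum_filter(smoothed, size=3)) & (smoothed >= min_height)
    markers, n_trees = ndi.label(peaks)
    # 3. Watershed on the inverted CHM: crowns grow downhill from each treetop.
    denom = max(float(smoothed.max()), 1e-6)
    cost = (255 * (denom - smoothed) / denom).astype(np.uint8)
    crowns = ndi.watershed_ift(cost, markers.astype(np.int32))
    crowns[smoothed < min_height] = 0  # keep only tall trees, as in the study
    return crowns, n_trees
```

On a synthetic CHM with two Gaussian "trees" taller than 17 m, this returns two labeled crowns.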
As shown in Table 3, the results indicate that using a filtered or corrected CHM delineates single crowns and species better than using the original CHM (accuracy increases of at least 20% for single crowns and 3% for single species). Adding imagery to the ITC segmentation leads to over-segmentation, creating many objects for large crowns when compared to their corresponding manually delineated crowns, which can reduce single crown delineation accuracy. In such a situation, one option would be to merge similar small objects [24] using spectral difference as a second step [80], although over-segmentation is generally preferred to under-segmentation [40,95]. For this assessment, no isolated tree crowns were used, which could be another reason why accuracies did not exceed 70% for single crown delineation. For single species delineation, accuracy improved with imagery, by up to 9% for the corrected CHM. For the filtered and original CHMs, accuracy improved only slightly with imagery (2–4%). This could be related to the fact that ITC segmentation using the filtered CHM alone produced bigger objects, which were then divided into smaller parts that were not covered (at least 75%) by a single species.
The Kenauk Nature property is composed of a complex mixed forest with more than 25 tree species. A number of studies have used high spatial resolution sensors to map tree species in a natural forest environment at the tree level, but these involved the recognition of relatively few species [6,29,32,43,44,93,121]. For example, Immitzer et al. [7] classified 10 species while concentrating on pure stands for reference data, where spectral variability could be limited. The high number of species in our study area forced us to survey only the dominant species (11). Misclassification could therefore be influenced by the complex forest environment, which made it difficult to target suitable reference data. For this reason, we manually delineated tree samples by stereo photo-interpretation to obtain reliable data, as suggested by Immitzer et al. [7].
Previous studies generally limited their classification to a global approach without newer machine learning techniques such as SVM and RF. For example, Waser et al. [6] classified seven species with an OA of 83% using a global approach with LDA. The present study demonstrates that a hierarchical classification approach is an effective procedure for classifying and mapping 11 tree species. This approach preserves the integrity of the tested algorithms in a hierarchical perspective by first classifying tree types and then the individual species within each type. Our results show that the hierarchical approach performs better than a single global approach, especially for conifers, which is consistent with other studies [123]. However, the hierarchical approach needed more variables (16) than the global approach (nine). The hierarchical approach presented here also shows that using multiple modeling techniques at each level allows the best model, which may differ between levels, to be selected. This approach can therefore reduce unbalanced accuracies between species, as reported by studies using a global approach [7]. In our case, RF was the best model at all levels, followed by SVM. This observation contrasts with other studies working with coarser imagery, such as Sentinel-2 [60].
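A minimal sketch of this two-level scheme is given below; class and variable names are illustrative (the study selected different variables for each level), and the toy labels in the usage example are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class HierarchicalTreeClassifier:
    """Level 1 predicts tree type (broadleaf/conifer); level 2 predicts the
    species with a separate model trained only on samples of that type."""

    def __init__(self, n_estimators=2000, max_features=2):
        # ntree -> n_estimators, mtry -> max_features, as in the RF settings above.
        self._make = lambda: RandomForestClassifier(
            n_estimators=n_estimators, max_features=max_features, random_state=0)
        self.type_model = self._make()
        self.species_models = {}

    def fit(self, X, tree_type, species):
        self.type_model.fit(X, tree_type)
        for t in np.unique(tree_type):
            sub = tree_type == t
            self.species_models[t] = self._make().fit(X[sub], species[sub])
        return self

    def predict(self, X):
        types = self.type_model.predict(X)
        out = np.empty(len(X), dtype=object)
        for t, model in self.species_models.items():
            idx = types == t
            if idx.any():
                out[idx] = model.predict(X[idx])
        return out
```

Keeping one model per level means the best technique could in principle be swapped independently at each level, which is the flexibility discussed above.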
Another interesting element is that this approach allows relevant variables and specific modeling techniques to be selected for each classification level. The variables selected for each model were not the same for broadleaf and conifer species. For example, broadleaf species are more distinguishable using texture variables because their branch structures are much more varied (Figure 6(B)). A similar technique was used by Krahwinkler and Rossmann [124], who built a binary decision tree hierarchical structure by classifying each single species. Our approach permits a simpler way to classify species by type with satisfying results, and limits the hierarchical structure to two levels. Moreover, instead of using only SVM, we tried five different models to optimize accuracy. It is worth noting that SVM and RF are generally the best algorithms according to their OA. For tree species classification, SVM is generally recognized to be more effective when working with a small number of samples and an imbalanced distribution [45]. It should be pointed out that ancillary variables (topographic position index, topographic wetness index, water proximity, etc.) could also improve classification accuracy [32,53,125], although it would be important to collect stratified samples evenly distributed among those variables.
Mixed improvements were observed when comparing 16-band and 8-band WorldView-3 derived variables. The eight additional bands (SWIR 1 to 8) slightly enhanced the global approach (OA: 75% vs. 71%, KIA: 0.72 vs. 0.67) and tree type classifications (OA: 99% vs. 97%, KIA: 0.97 vs. 0.95), but did not improve individual species classification. This is partially consistent with other studies that observed improved classification accuracy when adding new bands, especially with a large number of tree species [7,40]. For example, Ferreira et al. [126] simulated WorldView-3 bands for tree species discrimination and found that incorporating SWIR bands significantly increased the average accuracy. Despite their low spatial resolution compared to the other multispectral bands (5.25 × 7.5 m vs. 0.84 × 1.2 m), the spectral information of the SWIR bands was significant for certain inter-species separability. Their integration should nevertheless be made with caution when mapping smaller trees, because their crowns could be covered by only a few pixels. Adding the SWIR bands also made it possible to integrate spectral indices developed in hyperspectral studies [127][128][129][130]. Finally, the small accuracy improvement suggests that using only 8-band derived variables may be sufficient, which simplifies the method.
The model iteration procedure for optimal selection is an important contribution of this study compared to other similar studies. Studies generally integrate all variables without an oriented variable selection, or use complicated methods such as linear mixed-effects modeling and genetic algorithms [32,45,[131][132][133]. However, this selection step is essential to ensure reproducibility for operational purposes [14]. Moreover, our results showed that using fewer variables could actually improve the classification. We proposed a simple method that uses all the variables to select the 15 most significant variables provided by the Boruta algorithm [98], and then eliminates the inter-correlated variables, similarly to Budei et al. [14]. We then computed all combinations to determine the one that obtained the best results using the fewest possible variables.
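The correlation filter and the exhaustive combination search of this procedure can be sketched as below. This is a simplified stand-in: the Boruta ranking step is omitted, and the 0.9 correlation threshold, fold count and toy data are assumptions, not the study's settings:

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drop_correlated(X, threshold=0.9):
    """Keep column indices, dropping any variable highly correlated
    (|r| > threshold) with an already kept one."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

def best_combination(X, y, max_vars=3, cv=3):
    """Cross-validate every variable combination up to max_vars and return
    (score, column indices) of the best one; ties keep the smaller subset."""
    best = (0.0, ())
    for r in range(1, max_vars + 1):
        for combo in itertools.combinations(range(X.shape[1]), r):
            rf = RandomForestClassifier(n_estimators=50, random_state=0)
            score = cross_val_score(rf, X[:, combo], y, cv=cv).mean()
            if score > best[0]:
                best = (score, combo)
    return best
```

Because the exhaustive search grows combinatorially, pre-filtering to a short list (15 variables in the study) is what keeps it tractable.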
Spectral variable calculation techniques are also an important aspect of this procedure. Most recent studies use a pixel-based calculation technique to compute spectral variables [6,44,45]. We used two different calculation techniques: pixel-based and arithmetic feature (mean of all pixels, or of the 95th percentile highest pixels, within each object). For example, a tree crown can have an NDVI value that differs depending on whether it is calculated from the means of the red and near-infrared bands (arithmetic feature) or as the mean of the NDVI calculated per pixel. The arithmetic feature allows spectral variables to be calculated rapidly, while the pixel-based technique makes it possible to calculate textural variables, for example. Indeed, an arithmetic feature has the advantage of creating variables rapidly within R or SAS instead of adding a new raster band each time, which would make massive data management difficult. Additionally, using the 95th percentile of the highest pixel values allowed us to keep the sunlit parts of crowns and thereby limit the shadow effects that could affect classification accuracy [7].
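The NDVI example above can be made concrete. This sketch uses hypothetical reflectance values (not the study's data) to show that the two calculation techniques give different object-level values:

```python
import numpy as np

def ndvi_arithmetic_feature(red, nir):
    """Arithmetic feature: NDVI computed from the per-object band means."""
    return (nir.mean() - red.mean()) / (nir.mean() + red.mean())

def ndvi_pixel_based(red, nir):
    """Pixel-based: mean of the per-pixel NDVI values within the object."""
    return ((nir - red) / (nir + red)).mean()

# Hypothetical reflectances for a two-pixel object: the two techniques differ.
red = np.array([0.1, 0.2])
nir = np.array([0.5, 0.6])
a = ndvi_arithmetic_feature(red, nir)  # 0.4 / 0.7
p = ndvi_pixel_based(red, nir)         # mean of 2/3 and 0.5
```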
Machala et al. [19] raised concerns about using maximum values in features where objects are heterogeneous (e.g., high and low trees), but this is not the case in our study, since ITC segmentation aims to produce homogeneous objects. When testing correlations between the two calculation techniques, we obtained high coefficients for many variables. For the arithmetic feature of NDVI using the mean of all pixels within each object, we found correlations of 0.99 and 0.93 with the corresponding pixel-based and 95th-percentile-highest-pixel variables, respectively. This method allowed more variables to be implemented in the modeling process.
Although the proposed approach is robust for identifying 11 tree species, three main limitations were identified. The first is that unevenly distributed samples among the 11 species made it difficult to correctly use machine learning models such as RF. This limitation was also identified by Tao et al. [134] and Farquad and Bose [135]. It is known that an unbalanced training dataset tends to favor the prediction accuracy of the dominant classes, implying lower accuracies for the less represented classes [60]. To limit this impact, new samples could balance the dataset, but this simple solution is also the most expensive, involving additional field surveys and photo-interpretation. As suggested by Farquad and Bose [135], another solution could be to automatically over- or under-sample the dataset [136].
The second limitation concerns the spatial and spectral resolutions and the georeferencing of the imagery. The research presented here was based on 16-band WorldView-3 imagery. Firstly, the spectral quality could have been affected by rescaling. WorldView-3 bands have various spatial resolutions, from 0.31 m for the panchromatic band up to 7.5 m for the shortwave infrared bands. The panchromatic band ranges from 450 to 800 nm, covering the first seven bands. Although the nine other bands were out of this range, for methodological purposes all bands were rescaled and pansharpened. These last nine bands could therefore have been degraded, which may have affected the modeling and the reproducibility of the method [137]. To limit this impact, the last nine bands should not be used for the texture variables. Secondly, despite preprocessing, an offset between the imagery and the LiDAR CHM persisted (RMS: 0.97 m) and could affect the ITC segmentation and classification modeling. The alignment at ground level was almost perfect, but the misalignment of the crowns was sometimes over 3 m. Tree crowns tilted in the image could be used for stereo-reconstruction when at least two images are available [78,138], but using a single image caused complex situations where segmented LiDAR crowns did not match their corresponding trees in the WorldView-3 image. A digital surface model derived from LiDAR could also be used to orthorectify the image [25,29]. However, after several tests we decided not to use this technique because it created many artefacts with high spatial resolution images such as WorldView-3. In this study, where mature trees were present throughout the area, manual points were collected to fit the CHM and thereby reduce this offset. To limit this effect, a threshold of 17 m was set as a mask so that only tall and large trees were analyzed. The imagery was also integrated in the ITC segmentation to divide objects containing more than one species, as a way to mitigate the offset between data sources.
The third limitation of the proposed approach concerns the fact that the territory contains more than 11 species. Given that the species modeling does not include the full diversity, a marginal species will be classified into one of the 11 species classes used in the modeling. Also, small trees were not mapped, as a 17 m threshold was applied. It would be interesting to integrate more species classes in the modeling, considering groups of age or height [26]. Although more species classes would make the model more complex, functional groups could be tested in the hierarchical approach [139], multi-temporal imagery could be used [40,41,45], or more advanced algorithms such as deep learning techniques could be applied [31]. Li et al. [45] argued that using bi-temporal WorldView imagery could improve the classification by 10.7% on average. He et al. [40] found their best results when combining late-spring, mid-summer and early-fall images. Hartling et al. [31] demonstrated that deep learning techniques could improve broadleaf species classification by at least 30% compared to RF and SVM. Adding other variables such as LiDAR metrics or topological measures could also improve the classification [8,14,16,22,39,131]. Finally, an expert procedure could be implemented to select a maximum number of variables from each category to limit over-representation [136]. For example, this procedure would avoid automatically selecting only LiDAR variables and would instead allow a mix of LiDAR, spectral indices, topological variables, etc.

Conclusions
This study proposes a method to map individual tree species at the tree level using machine learning techniques with very high spatial resolution imagery (WorldView-3) in a complex natural North American forest. An object-based approach at multiple scales was applied. We found that adding spectral information to the CHM improved ITC segmentation. We were able to successfully classify five broadleaf species and six conifer species using a hierarchical approach, with OAs of 70% and 94%, respectively. This hierarchical approach produced better accuracies for conifers than the global approach (OA of 75%). Only sixteen variables, corresponding to nine spectral bands, were used across the three models (tree type, broadleaf and conifer). Among the five tested machine learning techniques, RF provided the best results in all cases. This method could also be applied on a large scale with limited manipulation. The resulting maps represent a valuable tool with which to analyze forest composition and to guide forest planners. However, ITC segmentation could be enhanced with automatic evaluation techniques allowing additional iterations. Ancillary variables such as topographic and hydrographic indices could also improve the classification accuracy. This approach could be further enriched by dataset balancing and expert procedures, by integrating LiDAR metrics and multi-temporal imagery, or by combining other sensors such as hyperspectral sensors.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Table A1. Variables' details. For pixel-based spectral indices, the abbreviation is "variable_mean_pixel"; for spectral indices calculated with the 95th percentile highest pixels, the abbreviation is "variable_mean_95pc_higher"; for arithmetic features calculated for spectral indices, the abbreviation is "variable_mean". Band numbers are described in Table 1.
GLCM_Dissimilarity_Band_X | Band 1 to 16 | Dissimilarity calculated with the pixels forming an object | [111] | × 16
GLCM_Entropy_Band_X | Band 1 to 16 | Entropy calculated with the pixels forming an object | [111] | × 16
GLCM_Homogeneity_Band_X | Band 1 to 16 | Homogeneity calculated with the pixels forming an object | [111] | × 16
Total: 232