1. Introduction
Forest characterization in Quebec, Canada, is usually based on stereoscopic (three-dimensional) photo-interpretation. This approach has been used since the last century and is still used for forest planning and forest composition analysis [1]. New techniques, such as image enhancement, have been developed over the years using aerial imagery and user-friendly software, and the information they provide has been well accepted by, and proven useful for, foresters [2,3]. However, species identification with these newer methods still lacks precision and varies among photo-interpreters, mainly because the characterization is made at the stand level; species identification at the tree level would be time-consuming and expensive [3,4]. Recently, very high spatial resolution satellite imagery has become more available and could be used to classify tree species at the tree level across different biomes [5,6,7]. In addition, airborne laser scanning, or LiDAR (light detection and ranging), uses an infrared laser to scan the surface of the earth, generating a 3D point cloud that can be used to analyze tree structure [8,9].
Furthermore, LiDAR data allow a forest to be characterized at the tree level, which can lead to better estimates of timber volume and hence better planning by foresters [10]. Individual tree crown (ITC) segmentation has received increasing attention [11,12,13,14]. Forest segmentation can be done with two different techniques: (1) point cloud-based and (2) raster-based, using the canopy height model (CHM) [8,15,16]. The first technique generally gives good results, but it is time-consuming and complex, and it requires advanced LiDAR sensors [17]. The second technique has been studied much more, both at the stand level [18,19] and at the tree level [20,21,22], because a variety of algorithms provide rapid ITC segmentation with satisfactory results [14,16,23].
Many studies have investigated tree species mapping at the tree level in a forest environment; however, they usually process hyperspectral data [24,25,26,27,28]. Fewer studies have tried to map tree species with satellite imagery at the tree level [6,29,30,31]. Pham et al. [32] combined imagery, LiDAR and GIS topographic indices, which led to better results than using a single data source. In previous projects, we used aerial hyperspectral data fused with LiDAR and GIS data to map individual tree species [4,33,34]. The results showed overall precisions above 93% when classifying ash and spruce against 14 other species in an urban environment, and precisions of 62% and higher when classifying seven species in a forest environment. In the latter case, yellow birch and hemlock were the species identified with the best accuracy (mean precisions of 77% and 83%, respectively). Both studies were carried out with an experimental hyperspectral sensor. While aerial hyperspectral data give interesting results, the complexity of the processing and the high acquisition costs over large areas must be taken into account [35].
Satellite multispectral imagery is relevant for tree species mapping. Indeed, satellite data have been widely used to classify tree species [30,36,37,38,39], but none of these efforts used very high spatial resolution (<2 m) imagery as detailed as that of DigitalGlobe's WorldView-3. Moreover, the eight additional short-wave infrared (SWIR) bands of WorldView-3 may improve tree species classification [7,40]. In remote areas, such precise satellite images can become an alternative to aerial photography [41]. These images provide more spectral bands for analysis at a relatively competitive acquisition cost. Some studies have also combined satellite imagery with LiDAR and demonstrated that combining both can significantly increase classification accuracy [19,38,42,43], but they essentially worked at the stand level. Others have used fused data to classify tree species at the tree level using high spatial resolution sensors [32,43,44]. More recently, Li et al. [45] classified tree species with WorldView-3 and LiDAR at the tree level in an urban context with isolated trees. Nevertheless, the number of tree species in those studies was limited, usually to fewer than ten.
Recently, machine learning techniques, including the support vector machine (SVM), classification and regression tree (CART), random forest (RF), k-nearest neighbors (k-NN) and linear discriminant analysis (LDA), have been gaining popularity for classifying forest characteristics. These techniques have been widely used in remote sensing for species classification [46,47], vegetation health assessment [48,49,50,51], biomass mapping [52,53,54], wetland mapping [55,56,57] and landslide risk evaluation [58,59]. He et al. [40] also used RF in a hierarchical approach to classify tree species. However, few studies have evaluated the use of multiple techniques in a hierarchical approach at the tree level [60].
The SVM algorithm, initially proposed by Vapnik [61], maximizes the margin around the hyperplane that separates the classes in feature space [62]. For classes that are not linearly separable, the SVM uses a kernel function, such as the radial basis function (Gaussian) kernel, which implicitly maps the data into a higher-dimensional space where a linear separation becomes possible. The penalty parameter (C) and the kernel parameter gamma (γ) of the radial basis function kernel should be optimized, because they can heavily affect the classification accuracy of SVM models [63]. C determines the trade-off between margin maximization and training error minimization [64], while γ defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’ [65,66,67].
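The roles of C and γ can be made concrete with a small pure-Python sketch. It is not an SVM solver: it only evaluates the RBF kernel K(x, z) = exp(-γ‖x - z‖²) and the primal soft-margin objective of a linear SVM, 0.5‖w‖² + C·Σ hinge loss, which are the two quantities these parameters control. The crown feature vectors and parameter values are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def soft_margin_objective(w, b, X, y, C):
    """Primal objective of a linear soft-margin SVM (labels y in {-1, +1}):
    0.5 * ||w||^2 rewards a wide margin; C * sum(hinge losses) penalizes training errors."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return margin_term + C * hinge_total

# Two hypothetical, already-scaled crown feature vectors one unit apart.
crown_a, crown_b = [0.0, 0.0], [1.0, 0.0]
for gamma in (0.01, 1.0, 100.0):
    # Low gamma: similarity stays near 1 (influence reaches far);
    # high gamma: similarity collapses toward 0 (influence stays close).
    print(f"gamma={gamma}: K={rbf_kernel(crown_a, crown_b, gamma):.4f}")
```

A larger C makes each hinge-loss unit more expensive, so the optimizer accepts a narrower margin to reduce training errors.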
The CART approach recursively splits the data until terminal nodes are reached according to pre-set criteria [68]. CART begins by examining all explanatory variables and determining which binary division of a single explanatory variable best reduces the deviance of the response variable [69]. The main elements of CART, and of any decision tree algorithm, are: (1) rules for splitting the data at a node based on the value of one variable; (2) stopping rules for deciding when a branch is terminal and cannot be split further; and (3) a prediction for the target variable in each terminal node.
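The split search described above can be sketched for one node as follows. This minimal version uses Gini impurity as the splitting criterion; the deviance criterion mentioned in the text works the same way with a different impurity function. Candidate thresholds are midpoints between consecutive distinct values, and the two-variable crown data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(X, y):
    """Return (feature_index, threshold, weighted_impurity) of the binary split
    'x[f] <= threshold' that most reduces impurity; (None, None, gini(y)) if none helps."""
    n = len(y)
    best = (None, None, gini(y))
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (f, t, score)
    return best

# Hypothetical crowns described by two variables.
X = [[0.2, 5.0], [0.3, 7.0], [0.8, 6.0], [0.9, 4.0]]
y = ["conifer", "conifer", "broadleaf", "broadleaf"]
print(best_split(X, y))
```

A full CART applies `best_split` recursively to each child node until a stopping rule (for example, a minimum node size) is met.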
Introduced by Breiman [70], RF is an ensemble classifier built from decision trees: it consists of many CARTs. Each tree is grown on a bootstrap sample of the training data, with a random subset of the variables considered at each split, and a new instance is classified by a majority vote of the trees in the forest [71]. Although RF was originally developed within the machine learning community, its accuracy has led to rapidly growing interest in ecology [72] and in the classification of remotely sensed imagery [73].
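The bootstrap-and-vote mechanism can be illustrated with a deliberately small sketch. To keep it short, one-split trees (decision stumps) stand in for full CARTs, and each split is chosen by counting correctly classified samples rather than by deviance. The number of trees (25), the single variable drawn per split and the data are all hypothetical choices.

```python
import random
from collections import Counter

def fit_stump(X, y, features):
    """One-split tree: among `features`, pick the threshold split whose two sides,
    each predicting its majority class, classify the most training samples correctly."""
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = Counter(yi for row, yi in zip(X, y) if row[f] <= t).most_common(1)[0]
            right = Counter(yi for row, yi in zip(X, y) if row[f] > t).most_common(1)[0]
            correct = left[1] + right[1]
            if best is None or correct > best[0]:
                best = (correct, f, t, left[0], right[0])
    if best is None:
        # No variation in the sampled variable: always predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap: n draws with replacement
        feats = rng.sample(range(p), n_features)      # random subset of variables for the split
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest

def predict(forest, x):
    """Majority vote over all trees in the forest."""
    votes = Counter(lc if x[f] <= t else rc for f, t, lc, rc in forest)
    return votes.most_common(1)[0][0]

X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.90, 0.80], [0.80, 0.90], [0.85, 0.95]]
y = ["conifer"] * 3 + ["broadleaf"] * 3
forest = fit_forest(X, y)
print(predict(forest, [0.12, 0.15]), predict(forest, [0.88, 0.90]))
```

Individual trees trained on unlucky bootstrap samples can be wrong, but the vote across many trees smooths out their errors.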
The k-NN algorithm [74] is a non-parametric method that assigns to an unseen point the dominant class among its k nearest neighbors in the training set. Unlike most other classification methods, k-NN is a lazy learner, which means that there is no explicit training phase before classification. Classification with k-NN follows three steps: (1) compute the distance between the item to be classified and every item in the training data set; (2) choose the k closest data points (the items with the k lowest distances); and (3) conduct a “majority vote” among those data points to decide the final class.
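The three k-NN steps map directly onto a few lines of Python. This sketch assumes Euclidean distance (`math.dist`, Python 3.8+) and uses hypothetical two-variable crown data. In practice, the variables would usually be scaled first so that no single variable dominates the distance.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    # Step 1: distance between x and every item of the training set (Euclidean here).
    distances = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Step 2: keep the k closest items.
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 3: majority vote among their labels.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train_X = [[0.10, 0.30], [0.20, 0.35], [0.15, 0.40], [0.60, 0.10], [0.65, 0.15], [0.70, 0.05]]
train_y = ["conifer"] * 3 + ["broadleaf"] * 3
print(knn_classify(train_X, train_y, [0.18, 0.33]))
```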
LDA has been widely used in tree species classification studies [75,76,77,78]. LDA projects the original features onto a lower-dimensional space in three steps [79]: (1) calculate the separability between classes, called the between-class variance; (2) calculate the distance between each sample and the mean of its class, called the within-class variance; and (3) construct the lower-dimensional space that maximizes the between-class variance and minimizes the within-class variance.
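For two classes and two variables, the three LDA steps reduce to Fisher's discriminant direction w = S_W^-1 (m1 - m0), where S_W is the within-class scatter matrix and m0, m1 are the class means. The pure-Python sketch below computes that direction with an explicit 2 × 2 inverse; multi-class LDA generalizes it with an eigen-decomposition. The two point clouds are hypothetical.

```python
def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter(rows, m):
    """2 x 2 scatter matrix: sum of (x - m)(x - m)^T over 2-D samples."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Within-class scatter S_W = S_0 + S_1 (step 2 in the text).
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    # Between-class separation enters through the difference of class means (step 1).
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # w = S_W^-1 (m1 - m0) maximizes between- over within-class variance (step 3).
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    return w[0] * x[0] + w[1] * x[1]

class0 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
class1 = [[6.0, 5.0], [7.0, 8.0], [8.0, 6.0], [7.0, 6.0]]
w = fisher_direction(class0, class1)
print(w, [round(project(w, x), 2) for x in class0 + class1])
```

Each sample is reduced to one projected value, and the two classes occupy separate intervals along w.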
The main objective of this study is to map 11 tree species using an object-based approach with WorldView-3 imagery and LiDAR data. Unlike the pixel-based approach [80], object-based image analysis groups homogeneous pixels into meaningful objects based on their spectral values; these objects can then be analyzed by their shape, size, texture and contextual information [19]. We implemented the modeling techniques in both a global and a hierarchical approach. More specifically, this study aims to (1) delineate ITCs using fused data (WorldView-3 imagery and LiDAR data); (2) compare models at each classification level (global and hierarchical); (3) evaluate the classification improvement from using 16-band instead of 8-band WorldView-3 imagery; and (4) apply the best models to map tree species over the study areas. This requires delineating ITCs in order to extract spectral signatures, and assigning a species class to each object. For ITC segmentation, we used three different techniques. We propose an ITC segmentation using fused data (CHM and satellite imagery) to refine the delineation of tree species' crowns [32]. The classification of species is divided into three parts on two levels: (1) tree types and (2) broadleaf and conifer tree species. In the present study, we used five different models (SVM, CART, RF, k-NN and LDA) to reduce the uncertainty associated with any single model, given that results can vary depending on the modeling technique.
4. Discussion
This study compared five different models to map 11 tree species in a natural North American forest based on WorldView-3 imagery and LiDAR data. The proposed method has three main features: (1) an object-based segmentation technique using imagery and LiDAR; (2) a hierarchical classification approach with more than ten species; and (3) model iterations for optimal variable selection.
ITC segmentation is usually implemented when mapping species at the tree level, and studies have often used LiDAR data [13,119] or imagery [6,11,120]. Using only LiDAR or only imagery at the tree level can result in objects with merged tree crowns [121], especially in a mature broadleaf forest like the one in the present study. Both data types can be used together to limit this effect. For example, Heinzel and Koch [121] delineated ITCs using a pixel-based classification within the objects to avoid errors caused by neighboring trees. While Alonso-Benito et al. [39] used LiDAR and imagery for segmentation, they did not classify at the tree level. Koukoulas and Blackburn [83] also used both data types, but with a succession of complex GIS procedures to find treetops. The ITC segmentation proposed here applies a watershed algorithm [122] to LiDAR data, similarly to Weinacker et al. [93] and Koch et al. [26]. Significant bands for tree types (broadleaf and conifer) were then used to refine the segmentation with a multiresolution algorithm, as suggested by Pham et al. [32] and Koukoulas and Blackburn [83]. This approach has similarities with multiscale approaches used to separate species in a dense and complex forest. Raster-based ITC segmentation approaches do not allow overlapping objects, yet they offer a more realistic representation of a natural broadleaf forest [16]. As shown in Table 3, a filtered or corrected CHM delineated single crowns and species better than the original CHM (accuracy increases of at least 20% for single crowns and 3% for single species). Adding imagery to the ITC segmentation led to over-segmentation, splitting large crowns into more objects than their corresponding manually delineated crowns, which could reduce single-crown delineation accuracy. In such a situation, one option would be to merge similar small objects [24] using spectral difference as a second step [80], although over-segmentation is generally preferred to under-segmentation [40,95]. For this assessment, no isolated tree crowns were used, which could be another reason why single-crown delineation accuracy did not exceed 70%. Single-species delineation accuracy improved with imagery, by up to 9% for the corrected CHM. For the filtered and original CHMs, accuracy improved only slightly with imagery (2–4%). This could be because ITC segmentation using the filtered CHM alone produced bigger objects, which were then divided into smaller parts that were not entirely covered (at least 75%) by a single species.
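The LiDAR side of the segmentation can be illustrated with a simplified marker-controlled watershed. This pure-Python sketch is not the implementation used in the study. It assumes that treetop markers are strict 3 × 3 local maxima, that the CHM is already filtered, and that cells below the 17 m height mask (described in the limitations below) are excluded. Each crown grows from its treetop by always expanding the highest unlabeled cell first, which is equivalent to flooding the inverted CHM. The 1 × 8 transect is hypothetical.

```python
import heapq

def local_maxima(chm, min_height):
    """Treetop markers: cells at least `min_height` tall and strictly higher than all 8 neighbors."""
    rows, cols = len(chm), len(chm[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbors = [chm[rr][cc]
                         for rr in range(max(0, r - 1), min(rows, r + 2))
                         for cc in range(max(0, c - 1), min(cols, c + 2))
                         if (rr, cc) != (r, c)]
            if all(h > n for n in neighbors):
                peaks.append((r, c))
    return peaks

def watershed_crowns(chm, min_height):
    """Grow one crown label per treetop, highest cells first; cells below `min_height` stay 0."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    for label, (r, c) in enumerate(local_maxima(chm, min_height), start=1):
        heapq.heappush(heap, (-chm[r][c], r, c, label))  # negate height: heapq is a min-heap
    while heap:
        _, r, c, label = heapq.heappop(heap)
        if labels[r][c]:
            continue  # already claimed by a crown that reached it earlier
        labels[r][c] = label
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and not labels[rr][cc] and chm[rr][cc] >= min_height:
                heapq.heappush(heap, (-chm[rr][cc], rr, cc, label))
    return labels

# Hypothetical 1 x 8 CHM transect (meters) crossing two crowns; the first cell is below 17 m.
chm = [[15.0, 18.0, 20.0, 18.0, 17.5, 19.0, 22.0, 19.0]]
labels = watershed_crowns(chm, min_height=17.0)
print(labels)
```

In the study, these LiDAR-derived objects were then refined with the imagery.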
The Kenauk Nature property is composed of a complex mixed forest with more than 25 tree species. A number of studies have used high spatial resolution sensors to map tree species at the tree level in a natural forest environment, but they recognized relatively few species [6,29,32,43,44,93,121]. For example, Immitzer et al. [7] classified 10 species while concentrating on pure stands for reference data, where spectral variability could be limited. The high number of species in our study area forced us to survey only the dominant species (11). Misclassification could therefore be influenced by the complex forest environment, which made it difficult to select suitable reference data. For this reason, we manually delineated tree samples by stereo photo-interpretation to obtain reliable data, as suggested by Immitzer et al. [7].
Previous studies generally limited their classification to a global approach without recent machine learning techniques such as SVM and RF. For example, Waser et al. [6] classified seven species with an OA of 83% using a global approach based on LDA. The present study demonstrates that a hierarchical classification approach is an effective procedure for classifying and mapping 11 tree species. This approach preserves the integrity of the tested algorithms within the hierarchy by first classifying tree types and then classifying individual species within their corresponding type. Our results show that the hierarchical approach performs better than a single global approach, especially for conifers, which is consistent with other studies [123]. However, the hierarchical approach needed more variables (16) than the global approach (nine). The hierarchical approach presented here also shows that testing multiple modeling techniques at each level allows the best model to be selected for that level, since the best model may differ from one level to another. This approach can therefore reduce the unbalanced accuracies between species reported by studies using a global approach [7]. In our case, RF was the best model at all levels, followed by SVM. This observation contrasts with other studies working with coarser imagery, such as Sentinel-2 [60].
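The two-level structure can be sketched as follows. A first model predicts the tree type; then a separate model for that type, with its own variables, predicts the species. To keep the sketch self-contained, a hypothetical nearest-centroid classifier stands in for the SVM, CART, RF, k-NN and LDA models compared in the study. The variable layout ([nir, texture, swir]), the species labels and the values are all invented for illustration.

```python
import math

def nearest_centroid(train_X, train_y, features):
    """Stand-in classifier: predict the class whose mean (on `features` only) is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    def predict(x):
        v = [x[f] for f in features]
        return min(centroids, key=lambda c: math.dist(v, centroids[c]))
    return predict

def fit_hierarchical(train_X, species, species_to_type, type_features, species_features):
    """Level 1 predicts the tree type; level 2 has one model per type, each with its own variables."""
    types = [species_to_type[s] for s in species]
    level1 = nearest_centroid(train_X, types, type_features)
    level2 = {}
    for t in set(types):
        idx = [i for i, ti in enumerate(types) if ti == t]
        level2[t] = nearest_centroid([train_X[i] for i in idx], [species[i] for i in idx],
                                     species_features[t])
    def predict(x):
        tree_type = level1(x)            # level 1: broadleaf vs. conifer
        return level2[tree_type](x)      # level 2: species within the predicted type
    return predict

# Hypothetical crowns with three variables: [nir, texture, swir].
X = [[0.60, 0.80, 0.10], [0.62, 0.82, 0.12], [0.58, 0.40, 0.11], [0.61, 0.42, 0.13],
     [0.30, 0.50, 0.20], [0.32, 0.52, 0.22], [0.31, 0.50, 0.40], [0.29, 0.52, 0.42]]
species = ["broadleaf_1"] * 2 + ["broadleaf_2"] * 2 + ["conifer_1"] * 2 + ["conifer_2"] * 2
species_to_type = {"broadleaf_1": "broadleaf", "broadleaf_2": "broadleaf",
                   "conifer_1": "conifer", "conifer_2": "conifer"}
model = fit_hierarchical(X, species, species_to_type,
                         type_features=[0],
                         species_features={"broadleaf": [1], "conifer": [2]})
print(model([0.60, 0.79, 0.30]), model([0.30, 0.80, 0.41]))
```

Because each level is fitted independently, any model type can be plugged in at any level. This is what allows the best technique and variable set to be chosen per level.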
Another interesting element is that this approach allows relevant variables and specific modeling techniques to be selected for each classification level. The variables selected for each model were not the same for broadleaf and conifer species. For example, broadleaf species are more distinguishable using texture variables because their branch structures are much more varied (Figure 6(B)). A similar technique was used by Krahwinkler and Rossmann [124], who built a hierarchical binary decision tree structure that classifies one species at a time. Our approach offers a simpler way to classify species by type with satisfactory results, and it limits the hierarchical structure to two levels. Moreover, instead of using only the SVM, we tried five different models to optimize accuracy. It is worth noting that SVM and RF were generally the best algorithms in terms of OA. For tree species classification, SVM is generally recognized to be more effective when samples are few and unevenly distributed [45]. Ancillary variables (e.g., topographic position index, topographic wetness index and proximity to water) could also improve classification accuracy [32,53,125], although it would be important to collect stratified samples evenly distributed across those variables.
Mixed results were obtained when comparing 16-band and 8-band WorldView-3 derived variables. The additional eight bands (SWIR 1 to 8) slightly enhanced the global approach (OA: 75% vs. 71%, KIA: 0.72 vs. 0.67) and tree type classifications (OA: 99% vs. 97%, KIA: 0.97 vs. 0.95), but did not improve individual species classification. This is partially consistent with other studies that observed improved classification accuracy when adding new bands, especially with a large number of tree species [7,40]. For example, Ferreira et al. [126] simulated WorldView-3 bands for tree species discrimination and found that incorporating SWIR bands significantly increased the average accuracy. Despite their lower spatial resolution compared with the other multispectral bands (5.25 × 7.5 m vs. 0.84 × 1.2 m), the SWIR bands provided spectral information that was significant for separating certain species, although they should be integrated with caution when mapping smaller trees, whose crowns may be covered by just a few pixels. Adding the SWIR bands also made it possible to integrate spectral indices that were developed in hyperspectral studies [127,128,129,130]. Finally, the small accuracy improvement suggests that using only the 8-band derived variables may be sufficient and would simplify the method.
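The OA and KIA values quoted above are both computed from a confusion matrix. OA is the share of samples on the diagonal. The kappa index of agreement (Cohen's kappa) discounts the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). This is why KIA is lower than OA for the same results. The two-class matrix below is hypothetical.

```python
def overall_accuracy(cm):
    """OA: share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's kappa (KIA): agreement corrected for the agreement expected by chance.
    cm[i][j] = number of reference samples of class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical two-class (broadleaf, conifer) confusion matrix.
cm = [[45, 5],
      [5, 45]]
print(overall_accuracy(cm), kappa(cm))
```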
The model iteration procedure for optimal variable selection is an important contribution of this study compared with other similar studies. Studies generally integrate all variables without targeted variable selection, or they use complicated methods such as linear mixed-effects modeling and genetic algorithms [32,45,131,132,133]. However, variable selection is essential to ensure reproducibility for operational purposes [14]. Moreover, our results showed that using fewer variables could actually improve the classification. We proposed a simple method: starting from all the variables, we selected the 15 most significant variables identified by the Boruta algorithm [98] and eliminated inter-correlated variables, similarly to Budei et al. [14]. We then computed all combinations to determine the one that obtained the best results with the fewest variables.
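The last two selection steps can be sketched as follows, assuming that the importance ranking (Boruta, in the study) is already available. The sketch does not implement Boruta. The 0.9 correlation threshold, the variable names and the scoring table are hypothetical. In real use, the `score` callable would train and validate a model on the given variables. With 15 candidate variables, the exhaustive search evaluates at most 2^15 - 1 = 32,767 combinations.

```python
import itertools
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def drop_correlated(ranked_vars, data, threshold=0.9):
    """Walk variables from most to least important and keep a variable only if
    its |r| with every variable already kept is below `threshold` (assumed value)."""
    kept = []
    for v in ranked_vars:
        if all(abs(pearson(data[v], data[k])) < threshold for k in kept):
            kept.append(v)
    return kept

def best_subset(variables, score):
    """Evaluate every non-empty combination and return the best-scoring one.
    Sizes are visited in increasing order and only a strictly better score
    replaces the current best, so ties go to the smaller subset."""
    best_combo, best_score = None, float("-inf")
    for size in range(1, len(variables) + 1):
        for combo in itertools.combinations(variables, size):
            s = score(combo)
            if s > best_score:
                best_combo, best_score = combo, s
    return best_combo, best_score

# Hypothetical values of three ranked candidate variables over four training crowns.
data = {"ndvi": [0.10, 0.20, 0.30, 0.40],
        "ndvi_p95": [0.12, 0.21, 0.33, 0.41],
        "texture": [3.0, 1.0, 4.0, 1.5]}
kept = drop_correlated(["ndvi", "ndvi_p95", "texture"], data)
# Stand-in for "train a model on these variables and return its accuracy".
toy_scores = {("ndvi",): 0.70, ("texture",): 0.60, ("ndvi", "texture"): 0.70}
print(kept, best_subset(kept, toy_scores.__getitem__))
```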
Spectral variable calculation techniques are also an important aspect of this procedure. Most recent studies use a pixel-based technique to calculate spectral variables [6,44,45]. We used two different calculation techniques: pixel-based and arithmetic feature (the mean of all pixels, or the 95th percentile of the highest pixel values, within each object). For example, the NDVI of a tree crown can differ depending on whether it is calculated from the object means of the red and near-infrared bands (arithmetic feature) or as the mean of the NDVI calculated for each pixel. The first case allows spectral variables to be calculated rapidly, while the second case makes it possible to calculate textural variables, for example. Indeed, an arithmetic feature has the advantage of being created rapidly within R or SAS instead of requiring a new raster band each time, which would make massive data management difficult. Additionally, using the 95th percentile of the highest pixel values allowed us to keep the sunlit parts of crowns and thereby limit shadow effects that could affect classification accuracy [7].
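The difference between the two NDVI calculations can be checked on a toy crown. The reflectances below are hypothetical, and the last pixel is assumed to be shaded. The "95th percentile" is interpreted here as the interpolated 95th-percentile value of a band within the object. This interpretation is an assumption, and R or SAS defaults may use a slightly different percentile definition.

```python
import math

def ndvi(red, nir):
    return (nir - red) / (nir + red)

def mean(values):
    return sum(values) / len(values)

def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

# Hypothetical reflectances of the four pixels inside one crown object; the last pixel is shaded.
red = [0.04, 0.05, 0.03, 0.02]
nir = [0.40, 0.45, 0.35, 0.10]

arithmetic = ndvi(mean(red), mean(nir))                      # NDVI computed from the band means
pixel_based = mean([ndvi(r, n) for r, n in zip(red, nir)])   # mean of per-pixel NDVI values
sunlit_nir = percentile(nir, 95)                             # upper percentile favors sunlit pixels
print(round(arithmetic, 4), round(pixel_based, 4), round(sunlit_nir, 4))
```

The arithmetic version needs only per-object band statistics, so no per-pixel NDVI raster has to be created.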
Machala et al. [19] were concerned about using maximum values in features where objects are heterogeneous (e.g., containing both high and low trees), but this is not the case in our study, since ITC segmentation aims to produce homogeneous objects. When testing correlations between the two calculation techniques, we obtained high coefficients for many variables. For example, the arithmetic NDVI feature based on the mean of all pixels within each object had correlations of 0.99 and 0.93 with the corresponding pixel-based and 95th-percentile variables, respectively. This method allowed more variables to be included in the modeling process.
Although the proposed approach is robust for identifying 11 tree species, three main limitations were identified. The first limitation was that the uneven distribution of samples among the 11 species made it difficult to use machine learning models such as RF correctly. This limitation was also identified by Tao et al. [134] and Farquad and Bose [135]. An unbalanced training dataset tends to bias predictions toward the dominant classes, which results in lower accuracies for the less represented classes [60]. To limit this impact, new samples could be collected to balance the dataset, but this simple solution is also the most expensive, involving additional field surveys and photo-interpretation. As suggested by Farquad and Bose [135], another solution could be to automatically over- or under-sample the dataset [136].
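Automatic over-sampling can be as simple as the sketch below. Randomly drawn samples of each smaller class are duplicated until every class matches the largest one. This plain random over-sampling is a generic technique and not necessarily the method of the cited works. It should be applied to the training split only, so that duplicated samples do not inflate validation accuracy. The class names and values are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_oversample(X, y, seed=0):
    """Duplicate randomly drawn samples of each smaller class until every class
    has as many samples as the largest one. Apply to the training split only."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out

# Hypothetical training set: 5 crowns of a dominant species, 2 of a rare one.
X = [[0.61], [0.63], [0.60], [0.64], [0.62], [0.35], [0.38]]
y = ["dominant"] * 5 + ["rare"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Under-sampling works the other way, discarding samples of the larger classes. This avoids duplicates but throws away training data.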
The second limitation concerns the spatial and spectral resolutions and the georeferencing of the imagery. The research presented here was based on 16-band WorldView-3 imagery. Firstly, the spectral quality could have been affected by rescaling. WorldView-3 bands have spatial resolutions ranging from 0.21 m for the panchromatic band up to 7.5 m for the shortwave infrared bands. The panchromatic band ranges from 450 to 800 nm, covering the first seven bands. Although the nine other bands fall outside this range, all bands were rescaled and pansharpened for methodological purposes. Those last nine bands could therefore have been degraded, which may have affected the modeling and the reproducibility of the method [137]. To limit this impact, the last nine bands should not be used for texture variables. Secondly, despite preprocessing, an offset between the imagery and the LiDAR CHM persisted (RMS: 0.97 m), which could affect ITC segmentation and classification modeling. The alignment at ground level was almost perfect, but crown misalignment sometimes exceeded 3 m. Tilted tree crowns could be used for stereo-reconstruction when at least two images are available [78,138], but using a single image caused complex situations in which segmented LiDAR crowns did not match their corresponding trees in the WorldView-3 image. A digital surface model derived from LiDAR could also be used to orthorectify the image [25,29]. However, after several tests we decided not to use this technique, because it created many artefacts with high spatial resolution images such as WorldView-3. In this study, where mature trees were present throughout the area, manual points were collected to fit the CHM and thereby reduce this offset. To further limit this effect, a 17 m height threshold was applied as a mask so that only tall and large trees were analyzed. The imagery was also integrated into the ITC segmentation to divide objects containing more than one species, as a way to compensate for the offset between data sources.
The third limitation of the proposed approach is that the territory contains more than 11 species. Because the species modeling does not include the full diversity, any marginal species will be classified into one of the 11 species classes used in the modeling. Also, small trees were not mapped because of the 17 m height threshold. It would be interesting to integrate more species classes in the modeling, considering age or height groups [26]. Although more species would make the model more complex, functional groups could be tested in the hierarchical approach [139], multi-temporal imagery could be used [40,41,45], or more advanced algorithms such as deep learning could be applied [31]. Li et al. [45] found that using bi-temporal WorldView imagery could improve the classification by 10.7% on average. He et al. [40] obtained their best results when combining late-spring, mid-summer and early-fall images. Hartling et al. [31] demonstrated that deep learning techniques could improve broadleaf species classification by at least 30% compared with RF and SVM. Adding other variables, such as LiDAR metrics or topological measures, could also improve the classification [8,14,16,22,39,131]. Finally, an expert procedure could be implemented to cap the number of variables selected from each category, limiting the over-representation of any one category [136]. For example, this procedure would prevent the automatic selection from retaining only LiDAR variables and would instead allow a mix of LiDAR, spectral index, topological and other variables.