Individual Tree Crown Segmentation and Classification of 13 Tree Species Using Airborne Hyperspectral Data
Remote Sens. 2018, 10, 1218

Knowledge of the distribution of tree species within a forest is key for multiple economic and ecological applications. This information is traditionally acquired through time-consuming and therefore expensive field work. Our study evaluates the suitability of a visible to near-infrared (VNIR) hyperspectral dataset with a spatial resolution of 0.4 m for the classification of 13 tree species (8 broadleaf, 5 coniferous) on an individual tree crown level in the UNESCO Biosphere Reserve ‘Wienerwald’, a temperate Austrian forest. The study also assesses the automation potential for the delineation of tree crowns using a mean shift segmentation algorithm in order to permit model application over large areas. Object-based Random Forest classification was carried out on variables that were derived from 699 manually delineated as well as automatically segmented reference trees. The models were trained separately for two strata: small and/or conifer stands and high broadleaf forests. The two strata were delineated beforehand using CHM-based tree height and NDVI. The predictor variables encompassed spectral reflectance, vegetation indices, textural metrics and principal components. After feature selection, the overall classification accuracy (OA) of the classification based on manual delineations of the 13 tree species was 91.7% (Cohen’s kappa (κ) = 0.909). The highest user’s and producer’s accuracies were most frequently obtained for Weymouth pine and Scots pine, while European ash was most often associated with the lowest accuracies. The classification that was based on mean shift segmentation yielded similarly good results (OA = 89.4%, κ = 0.883). Based on the automatically segmented trees, the Random Forest models were also applied to the whole study site (1050 ha). The resulting tree map of the study area confirmed a high abundance of European beech (58%) with smaller amounts of oak (6%) and Scots pine (5%).
We conclude that highly accurate tree species classifications can be obtained from hyperspectral data covering the visible and near-infrared parts of the electromagnetic spectrum. Our results also indicate a high automation potential of the method, as the results from the automatically segmented tree crowns were similar to those that were obtained for the manually delineated tree crowns.


Introduction
The number of remote sensing papers focusing on tree species classification has increased substantially over the past few decades, mainly due to a higher availability of multi- and hyperspectral data [1]. While forest managers and the timber industry have an obvious interest in data on timber [...] hyperspectral data reviewed by Fassnacht et al. [1], the Random Forest [29] and the computation-intensive [30] Support Vector Machines (SVM) classifiers were by far the most commonly used nonparametric classifiers. However, the number and quality of the reference samples are probably more important than the chosen classifier [1,31].
This study aims to apply the Random Forest classification algorithm to hyperspectral and tree height data in order to automatically classify tree species within an object-based approach. The project area is located in the UNESCO Biosphere Reserve 'Wienerwald' near Vienna, Austria. The temperate forest hosts a variety of tree species and is therefore a suitable test site for tree species classification experiments. The following research questions are studied: (1) Are accurate classifications of individual, manually delineated tree crowns possible with hyperspectral VNIR data? (2) If so, is this still possible with automatically segmented tree crowns, so as to permit wall-to-wall mappings? (3) Which of the remotely sensed spectral variables are particularly important for classification accuracy? (4) Which tree species can be classified best/worst?

Study Site
The study site is located in the Austrian province of Lower Austria and covers an area of 1050 ha (Figure 1). The area is a composite of forested, agricultural and urban areas with buildings and roads. The elevation of the hilly terrain ranges between 250 and 600 m above sea level. The area can be allocated to the colline and submontane altitudinal belts. The annual rainfall ranges from 700 to 1,000 mm and precipitation peaks in July. The dominating soils are luvisol and pseudogley. Under natural conditions, the forest is dominated by sessile oak-hornbeam forests, alder-ash forests as well as beech forests with an admixture of sessile oak, ash and maple [32]. In the areas with ongoing timber production, pure spruce, pine and larch stands can be found. The study site is located in the 'Wienerwald', which became a UNESCO Biosphere Reserve in 2005. The reserve is divided into multiple patches. Each one is allocated to one of three different zones, depending on the priorities for the implementation of the biosphere reserve objectives. A large part in the south of the study site lies within a core region. Core regions are the supposed 'primeval forests of tomorrow', which is why silvicultural management ceased there in 2005 [33,34]. A small stretch in the north-west is located in one of the buffer zones, which are not inherently connected to management limitations. The forested part of the transition area of the study site is managed by the Austrian state forest enterprise. The geographical location of the study site is presented in Figure 1.

Remote Sensing Data
Airborne data were recorded by Airborne Technologies. Hyperspectral data were acquired on 25 August 2016 under cloudless conditions with a Hyspex VNIR 1600 push broom sensor that was mounted on a Tecnam MMA aircraft. The sensor covers the electromagnetic spectrum from 415 nm to 991 nm and provides data with 160 spectral bands, each with a spectral width of 3.7 nm. During the overflight, 18 flight strips were generated with solar azimuth angles between 145° and 165° and solar zenith angles between 47.8° and 51.1°. The average flying altitude was 830 m a.s.l. and the average flying speed was 56 m/s. However, the high flight speed and altitude interfered with the high spectral resolution of the VNIR sensor. Therefore, two VNIR bands at a time needed to be fused during the pre-processing, resulting in a reduction of the original 160 spectral bands to 80 bands. The pixel size was approximately 0.4 m.
Airborne laser scanning data were acquired during the same flight using a RIEGL LMS-Q680i sensor. The point cloud featured an average point density of 15 points/m² and was used to create Digital Surface and Digital Terrain Models (DSM, DTM) of 0.5 m spatial resolution.
The hyperspectral data were pre-processed involving atmospheric correction and mosaicking. First, individual images were corrected for atmospheric effects using ATCOR4 [35,36]. After this, flight strips were aligned to create a seamless combined image (mosaicking) using the ENVI software (Exelis Visual Information Solutions, Boulder, CO, USA).

Forest Mask
To isolate the forested areas within the hyperspectral data for subsequent segmentation, a forest mask was created. The final forest mask was built from two sub-masks: (1) a canopy height model (CHM)-based mask, and (2) a Normalized Difference Vegetation Index (NDVI)-based mask. To derive the CHM layer, the difference between DSM and DTM was calculated. The CHM-based mask included only pixels with a height of at least 3 m. With respect to the second sub-mask, pixels with NDVI values ≥0.6 were considered as being forested. For the final forest mask, the intersection of the CHM- and NDVI-based masks was used. According to this forest mask, 8.4 km² of the 10.5 km² study site are covered by trees. Subsequently, small grasslands, road strips and shadows were eliminated. Elimination was based on the results of a pixel-based Random Forest classification which distinguished the following classes: reference tree classes, shadows, streets, and grasslands. We checked visually that the forested area was well covered.
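The masking rule described above can be illustrated with a minimal Python sketch. It assumes that the DSM, DTM and the red and near-infrared reflectance layers have already been resampled to a common grid (the rasters in the study differ in resolution). The function names and the toy values are hypothetical and are not taken from the study; the subsequent removal of grasslands, roads and shadows is not covered.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for a single pixel."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def forest_mask(dsm, dtm, red, nir, min_height=3.0, min_ndvi=0.6):
    """Return a grid that is True where CHM >= min_height and NDVI >= min_ndvi."""
    mask = []
    for dsm_row, dtm_row, red_row, nir_row in zip(dsm, dtm, red, nir):
        row = []
        for surface, terrain, r, n in zip(dsm_row, dtm_row, red_row, nir_row):
            chm = surface - terrain  # canopy height = surface minus terrain
            # Intersection of the height-based and the NDVI-based sub-masks
            row.append(chm >= min_height and ndvi(r, n) >= min_ndvi)
        mask.append(row)
    return mask


# Toy 2 x 2 example (all values are made up for illustration)
dsm = [[320.0, 305.0], [312.0, 301.0]]   # surface elevation in m
dtm = [[300.0, 300.0], [300.0, 300.0]]   # terrain elevation in m
red = [[0.04, 0.05], [0.20, 0.03]]       # red reflectance
nir = [[0.45, 0.40], [0.30, 0.42]]       # near-infrared reflectance
mask = forest_mask(dsm, dtm, red, nir)
```

In the toy example, the lower-left pixel is tall enough but fails the NDVI threshold, and the lower-right pixel is green enough but too low, so only the two upper pixels remain in the mask.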

Reference Data
A reference shapefile with 287 forest stand polygons and information on the relative shares of 38 tree species was provided by the Austrian state forest enterprise. The shapefile refers to the situation in 2008 and covers the study area. However, the study was mainly focused on middle-aged and mature stands, and therefore only minor changes in species composition over time can be expected. Of the 38 recorded species, 23 were attributed to at least one of the 287 polygons. Only very few species grew in pure stands.
Using the reference polygons that were provided by the forest enterprise, detailed reference data were collected during field work in summer 2017. To achieve a decent representation of all tree species in the classification, we decided to exclude all species with fewer than 20 reference samples. Subsequently, the reference trees were delineated manually in ArcGIS (ESRI 2011). Examples are shown in Figure 2. Upon delineation, only sunlit portions of each crown were considered, which served to decrease intra-crown and intra-species variation. The number and species of the reference trees are given in Table 1. In total, 699 reference polygons were finally available for the study, and each was assigned to one of 13 tree species (eight broadleaf, five coniferous).

Figure 2. Illustration of the manual crown delineation approach. Due to our intent to exclude shaded areas of the tree crowns from the reference sample, some reference tree polygons (like the one at the bottom left) have a contorted shape. Background: Color infrared composite of the hyperspectral data.
Table 1. Overview of the general characteristics of the reference tree species and the respective class sizes.

Workflow Description
In this work, a three-step approach was followed (Figure 3). First, a Random Forest classifier was trained with the 699 manually delineated reference trees using 202 predictor variables. This step served to evaluate the classification performance for optimally delineated tree crowns. After this, the entire study area, and thus all of the trees within it, was automatically segmented into image objects. The segments corresponding to the manually delineated crowns (highest amount of overlapping area) were selected and were used to create an additional Random Forest model. By comparing the classification results, we assessed how strongly the classification accuracy was impaired by the automatic segmentation of the reference trees. To generate a wall-to-wall tree species map, the Random Forest model that was trained on the segmented reference trees was applied to all of the automatically generated segments.
Figure 3. Workflow overview. In the first two steps, Random Forest models based on both manually delineated and segmented reference trees were trained. In the third step, the model based on segmented reference trees was applied to the whole hyperspectral image to produce a classified tree map.

Noise Removal
Smoothing was applied because even atmospherically corrected reflectance data contain noise. Without smoothing, the noise would be passed on to all classification features. Smoothing is based on the assumption that, in the absence of noise, the process underlying the data would result in a smooth curve [28]. The Whittaker smoother is well suited for this task [37]. The smoother balances fidelity to the data with the smoothness of the resulting curve and is able to automatically interpolate across missing values [38,39]. The algorithm was executed on the mean reflectance values of each tree crown using the R function miwhitatzb1 [40]. The smoothing was done in one iteration and the smoothing parameter (λ) was set to 12.
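To make the trade-off between fidelity and smoothness concrete, the following Python sketch solves the standard Whittaker problem, i.e., it minimises the sum of squared deviations from the data plus λ times the sum of squared differences of the smoothed curve. It is not the R routine used in the study: the second-order difference penalty, the dense solver and the toy spectrum are assumptions made for illustration, and the handling of missing values via weights is omitted.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def whittaker_smooth(y, lam=12.0, order=2):
    """Return z minimising sum((y - z)^2) + lam * sum((D z)^2).

    D is the difference matrix of the given order (order=2 is an assumption).
    """
    n = len(y)
    # Start from the identity and difference its rows `order` times.
    d = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        d = [[d[i + 1][j] - d[i][j] for j in range(n)] for i in range(len(d) - 1)]
    # Normal equations: (I + lam * D^T D) z = y
    a = [[(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
          for j in range(n)] for i in range(n)]
    return solve(a, list(y))


# Toy mean crown spectrum with noise (values are made up)
noisy = [0.05, 0.07, 0.04, 0.08, 0.06, 0.30, 0.42, 0.39, 0.45, 0.41]
smoothed = whittaker_smooth(noisy, lam=12.0)
```

A larger λ puts more weight on smoothness; with a second-order penalty, a perfectly linear spectrum is returned unchanged.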

Feature Extraction
A large number of features was extracted from the hyperspectral data cube (Table 2). Besides the smoothed reflectance values, these were statistics of the first derivative of the mean spectral signature (minimum, maximum, mean, standard deviation), vegetation indices (Table A1), textural metrics and principal components. All of the features were generated at object level only, using the previously smoothed spectra. Object-level information was generated using simple value averaging.
Texture describes tone changes within an image ('granularity'), i.e., how 'smooth' or 'coarse' an image appears. With a scale large enough to picture objects of interest over the extent of multiple pixels, texture differences between these objects frequently constitute a distinctive feature [41].
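The object-level averaging and the derivative statistics listed among the features can be sketched as follows. This is only an illustration under stated assumptions: the derivative is approximated by finite differences between neighbouring bands, the sample standard deviation is used, the smoothing step is skipped, and the band centres and pixel spectra are invented.

```python
import statistics


def object_mean_spectrum(pixel_spectra):
    """Average the spectra of all pixels in one crown, band by band."""
    n_pixels = len(pixel_spectra)
    return [sum(band_values) / n_pixels for band_values in zip(*pixel_spectra)]


def derivative_statistics(wavelengths_nm, reflectance):
    """Min, max, mean and sample standard deviation of the first derivative."""
    derivative = [
        (reflectance[i + 1] - reflectance[i]) / (wavelengths_nm[i + 1] - wavelengths_nm[i])
        for i in range(len(reflectance) - 1)
    ]
    return {
        "min": min(derivative),
        "max": max(derivative),
        "mean": statistics.mean(derivative),
        "sd": statistics.stdev(derivative),
    }


# Two hypothetical crown pixels with four bands each
wavelengths = [500.0, 510.0, 520.0, 530.0]
crown_pixels = [
    [0.03, 0.05, 0.04, 0.08],
    [0.05, 0.07, 0.06, 0.10],
]
mean_spectrum = object_mean_spectrum(crown_pixels)
stats = derivative_statistics(wavelengths, mean_spectrum)
```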
To calculate texture metrics, the 'Haralick Texture [42] Extraction' (HTE) application as implemented in the Orfeo ToolBox (OTB) was run [43]. Red (700 nm) and near-infrared (846 nm) bands were selected for this application. The radius for texture extraction was set to three pixels, as this setting produced the most convincing output layers upon visual inspection. Corresponding to the maximum pixel values of each band, the input image maxima were specified as 15,132 (700 nm) and 15,010 (846 nm). For all other parameters, the default values were kept. The 'simple' mode of the HTE application was chosen for analysis as its texture features showed the most variability. The 'simple' mode yields eight textural feature layers for each spectral band (Energy, Entropy, Correlation, Inverse Difference Moment, Inertia, Cluster Shade, Cluster Prominence and Haralick Correlation).
To create variables with potentially higher (condensed) information content, a Principal Component Analysis (PCA) was performed on the atmospherically corrected and mosaicked VNIR data [44]. In this transformation, the information contained in the input bands is redistributed into separate, uncorrelated principal components [19]. As the computation of principal components for the whole VNIR data would have required large amounts of memory, the transformation matrix was determined for an image subset (0.6 km²), which featured only forest. The PCA was done in R using the function prcomp from the built-in 'stats' package. Scaling was enabled, which eliminates the unit of measurement and thereby makes reflectance values from different bands with different value ranges comparable [45]. The resulting transformation matrix was used to calculate principal components across the image extent.

Segmentation
The (large scale) mean shift algorithm was applied to the hyperspectral data twice: first to define strata of (i) small broadleaf or conifer stands and (ii) high broadleaf stands, and a second time to delineate the individual tree crowns within the strata. The first segmentation was necessary because, when the mean shift algorithm was applied to the complete hyperspectral data, segmentation parameters that were suitable for conifers and trees with a rather small crown left trees with a wide crown over-segmented. We therefore decided to apply different segmentation parameters to the two groups (small broadleaf or conifer, high broadleaf). A conifer of any height was assigned to the first group (small broadleaf or conifer), while a broadleaf tree was assigned to either the first or the second group depending on its height. Tree height was considered a proxy for tree age, which in turn was considered a proxy for tree crown size. Once the strata were established, the mean shift algorithm was applied a second time to delineate the individual tree crowns within the strata (using strata-specific parameters) and thereby to automate the tree crown delineation process.
For both segmentation purposes, the mean shift algorithm was used. The mean shift algorithm [46,47] is a non-parametric method for locating the maxima of a density function [48]. In mean shift image segmentation, the mean of all the pixel values within a defined circular window around the starting point is calculated. The extent of the window is defined by a spatial and a spectral distance. The center of the window is then shifted to the point corresponding to the previously determined mean. This process is repeated until the maximum number of iterations is reached or the window no longer moves significantly, where an adjustable convergence threshold defines what is considered a significant move. Upon clustering, those pixels whose windows were shifted to the same location are grouped together.
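The iteration described above can be sketched in a few lines of Python. The sketch is deliberately reduced to a single feature dimension with a flat window, so the joint spatial and spectral window of an image segmentation is not reproduced; the parameter names and the sample values are hypothetical.

```python
def mean_shift_modes(values, radius, max_iter=50, threshold=1e-3):
    """Shift a window from every value to the local mean until it stops moving."""
    modes = []
    for start in values:
        center = start
        for _ in range(max_iter):
            window = [v for v in values if abs(v - center) <= radius]
            new_center = sum(window) / len(window)
            moved = abs(new_center - center)
            center = new_center
            if moved < threshold:  # convergence: no significant move any more
                break
        modes.append(center)
    return modes


def group_by_mode(modes, tolerance):
    """Give the same label to all values whose windows ended at the same location."""
    labels, representatives = [], []
    for mode in modes:
        for label, representative in enumerate(representatives):
            if abs(mode - representative) <= tolerance:
                labels.append(label)
                break
        else:
            representatives.append(mode)
            labels.append(len(representatives) - 1)
    return labels


# Six hypothetical reflectance values forming two groups
values = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41]
modes = mean_shift_modes(values, radius=0.05)
labels = group_by_mode(modes, tolerance=0.01)
```

The first three values converge to a mode near 0.11 and the last three to a mode near 0.41, so they end up in two separate groups.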
For applications over large areas, it has to be considered that the basic mean shift algorithm does not produce stable results when images are processed tile-wise. Besides other modifications, the large scale mean shift (LSMS) algorithm [48] generates overlapping tiles, ensuring that the surroundings of the border segments from one tile are also explored in another. This prevents a false generation of segment borders at tile borders. Hence, for the generation of the strata, the four-stage LSMS as implemented in the Orfeo ToolBox was used. In recent years, LSMS was used for several forest-related analyses of earth observation data [49–51]. However, to our knowledge there are still no studies using the mean shift algorithm for the delineation of individual tree crowns.

Stratification
Before delineating individual crowns, we first used the CHM to differentiate stands with mainly high and mainly small trees. This permits combining small broadleaf stands and conifer stands into one stratum for optimum segmentation and classification results at tree level (same for the high broadleaf stands). Acceptable segmentations (visual inspection) were obtained using a spatial radius of 12, a range radius of 3, a minimum object size of 20,000 and a maximum of 50 iterations (Table 3). Figure 4 gives an idea of the sensitivity of the segmentation. Segments of the high and small trees were separated by applying a threshold of 10 m on the mean height of each segment.
Conifers were separated from broadleaved trees using a pixel-based Random Forest classification. The model was trained using reference data of the following classes: broadleaf trees, coniferous trees, shadow, non-forest vegetation and infrastructure/houses. The pixel-based classification of the VNIR data cube had an overall accuracy of 98.4% (CI = 98.3%, 98.4%) and a Cohen's kappa of 0.974. The producer's and user's accuracies of the conifer class were 87.2% and 94.0%, respectively.
Finally, the shapes indicating small broadleaf or conifer stands were merged. The merger served to indicate the areas in the VNIR data that were designated for the segmentation with a more conservative set of parameters, while the inverse of these areas was used for the segmentation of the high broadleaf stands.
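The stratification logic of this section can be summarised in a short Python sketch. It assumes that each segment already carries a conifer flag derived from the pixel-based classification (how pixel labels are aggregated per segment is not specified here) and that a mean height of exactly 10 m counts as high; both are assumptions, as are the function names and sample heights.

```python
def segment_mean_height(chm_values_m):
    """Mean canopy height of all CHM pixels that fall into one segment."""
    return sum(chm_values_m) / len(chm_values_m)


def assign_stratum(mean_height_m, is_conifer, height_threshold_m=10.0):
    """Conifers of any height and low broadleaf segments share one stratum."""
    if is_conifer or mean_height_m < height_threshold_m:
        return "small broadleaf or conifer"
    return "high broadleaf"


# Hypothetical segments: (CHM pixel heights in m, conifer flag)
segments = [
    ([24.0, 26.5, 25.1], False),  # tall broadleaf stand
    ([6.2, 7.9, 5.4], False),     # young broadleaf stand
    ([28.0, 30.2, 29.1], True),   # tall conifer stand
]
strata = [assign_stratum(segment_mean_height(h), conifer) for h, conifer in segments]
```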

Segmentation of Individual Tree Crowns
Within the two strata, the mean shift algorithm was used to delineate the individual tree crowns. The strata-specific parameters are listed in Table 3 (lower part). The parameters take into account that young (small) broadleaf trees and conifers are best delineated using a small parameter for minimum object sizes (and range radius), compared to larger crowns that are typical for higher trees and broadleaf species.
Mean shift segmentation generated 1,596,727 segments with a mean area of 3.65 m². The average object size in the small broadleaf or conifer stratum was 2.65 m², compared to 4.06 m² for the high broadleaf stratum. Figure 5 gives an idea of the performance of the selected parameters.
Table 3. Parameters of the mean shift segmentations for the conifer or small broadleaf and high broadleaf stands (upper part), and the delineation of the individual tree crowns (lower part).
Figure 5. Mean shift segmentation with the parameters in Table 3. (a) depicts a high broadleaf forest. Conifers are featured in (b).

Random Forest Classification
The Random Forest algorithm [29] has gained popularity in remote sensing and is frequently used for the classification of tree species [1,4]. The algorithm constructs hundreds of decision trees based on bootstrap samples that were randomly created from the original dataset. Observations within each bootstrap sample account for about two-thirds of all observations within the original dataset and are drawn with replacement. Random Forest uses each decision tree for the classification of the observations that were not part of the corresponding bootstrap sample (Out-of-bag (OOB) data) [29].
Random Forest deviates from classical bagging algorithms insofar as, to prevent a correlation between the decision trees, it does not consider all input variables (features) for the construction of each decision tree, but only a random selection of these. For a dataset with p input variables and a categorical response variable, the number of features that are considered at each split is √p by default, with the features being randomly selected anew each time [52]. Only one of the √p candidate variables is used for the split at each node [53,54].
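Two of the quantities mentioned above are easy to reproduce numerically: the default number of candidate features for p predictors, and the share of distinct observations that ends up in a bootstrap sample. The Python sketch below is an illustration only; the helper names are hypothetical, and rounding √p down follows the default of the randomForest R package for classification.

```python
import math
import random


def default_mtry(p):
    """Default number of candidate features per split for classification."""
    return max(1, math.floor(math.sqrt(p)))


def expected_in_bag_fraction(n):
    """Expected share of distinct observations in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def bootstrap_split(n, rng):
    """Draw n indices with replacement; the untouched indices are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


mtry = default_mtry(202)                  # 202 predictors as in this study
fraction = expected_in_bag_fraction(699)  # 699 reference trees as in this study
in_bag, out_of_bag = bootstrap_split(699, random.Random(42))
```

For 202 predictors this gives 14 candidate features per split, and the expected in-bag share is about 63%, which is the "about two-thirds" referred to above.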
In the training stage of the Random Forest model, reference data are used, i.e., all observations feature a true class label. Each decision tree classifies only its corresponding OOB data set. The proportion of matches between the majority vote from the OOB results and the true class label is used for accuracy assessment. Due to this built-in validation measure, it becomes unnecessary to set aside a test set [53,55,56].
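A minimal Python sketch of this OOB accuracy estimate is given below. It assumes that the per-tree predictions and the in-bag index sets are already available; the data structures, the function name and the toy values are hypothetical.

```python
from collections import Counter


def oob_accuracy(tree_predictions, in_bag_sets, truth):
    """Share of observations whose OOB majority vote matches the true label.

    tree_predictions[t][i] is the label that tree t predicts for observation i;
    in_bag_sets[t] holds the indices of the observations used to train tree t.
    """
    correct = 0
    counted = 0
    for i, true_label in enumerate(truth):
        votes = Counter(
            prediction[i]
            for prediction, bag in zip(tree_predictions, in_bag_sets)
            if i not in bag  # only trees that did not see observation i may vote
        )
        if not votes:
            continue  # observation was in-bag for every tree
        counted += 1
        if votes.most_common(1)[0][0] == true_label:
            correct += 1
    return correct / counted


# Three hypothetical trees, four observations
truth = ["a", "a", "b", "b"]
in_bag_sets = [{0}, {1}, {2}]
tree_predictions = [
    ["a", "a", "a", "b"],
    ["a", "a", "a", "a"],
    ["a", "a", "a", "b"],
]
accuracy = oob_accuracy(tree_predictions, in_bag_sets, truth)
```

In the toy data, the third observation is misclassified by both trees that are allowed to vote on it, so three of four OOB majority votes are correct.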
The Random Forest algorithm offers different ways of estimating the importance of each input variable for the classification result. One of them measures the impact that the input variable has on the classification result and was originally called the Margin measure [53]; it is referred to as Mean Decrease in Accuracy (MDA) in, e.g., the randomForest R package [57]. The margin of an observation is defined as the number of votes for the true class of the observation minus the number of votes for other classes. In the first step, OOB data are passed down the corresponding decision trees. In the second step, the values of one specific input variable are randomly permuted within the OOB data. Each of the modified OOB datasets is then run down its corresponding unaltered decision tree. The average lowering of the margin across all of the observations upon the permutation of variable values is a measure of the relative importance of the input variable for classification. A large decrease of the average margin corresponds to a high importance [53]. As is customary for classification studies, the results of a classification are displayed in a confusion matrix, featuring the producer's, the user's and the overall accuracy as well as Cohen's kappa coefficient.
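The accuracy measures read from a confusion matrix can be computed as in the following Python sketch. The orientation of the matrix (rows are reference classes, columns are predicted classes) is an assumption of this example, and the 2 x 2 matrix is made up.

```python
def accuracy_metrics(matrix):
    """Overall, producer's and user's accuracy plus Cohen's kappa.

    matrix[i][j] counts reference class i predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(n_classes)]
    row_sums = [sum(row) for row in matrix]  # reference totals
    col_sums = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]

    overall = sum(diagonal) / total
    producers = [d / r if r else float("nan") for d, r in zip(diagonal, row_sums)]
    users = [d / c if c else float("nan") for d, c in zip(diagonal, col_sums)]

    # Agreement expected by chance from the marginal totals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa


# Hypothetical two-class confusion matrix
overall, producers, users, kappa = accuracy_metrics([[8, 2], [1, 9]])
```

Here 17 of 20 observations lie on the diagonal (overall accuracy 0.85), chance agreement is 0.5, and kappa is therefore 0.7.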
For this study, the algorithm was executed in R version 3.3.3 [58]. We used the randomForest() function from the identically named package by Liaw and Wiener [57]. The function confusionMatrix from the 'caret' package [59] was used for the extraction of Cohen's kappa and the 'raster' package [60] was used for data preparation. The default settings of the two tuning parameters for randomForest were kept: the number of decision trees to grow (ntree = 500) and the number of variables tested at each split (mtry = √p). These settings not only ensure comparability with other studies, but also proved to lead to reasonable results in studies that experimented with these values [4,61].
To find the optimum feature combination, a recursive feature selection was applied [62]. This backward feature selection starts with a model that is based on all of the input features and reduces it step-wise by removing, at each step, the least important variable according to the MDA values, which are recalculated at each step [63–65]. In the end, the model with the lowest number of input features that reaches at least 97.5% of the maximum OOB overall accuracy is used.
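The final selection rule can be expressed compactly. The Python sketch below assumes that the backward elimination has already produced one OOB overall accuracy per model size; the function name and the accuracy values are invented for illustration and are not results of this study.

```python
def select_model_size(oob_accuracy_by_size, fraction=0.975):
    """Smallest model whose OOB accuracy reaches `fraction` of the best accuracy."""
    best = max(oob_accuracy_by_size.values())
    eligible = [
        size for size, accuracy in oob_accuracy_by_size.items()
        if accuracy >= fraction * best
    ]
    return min(eligible)


# Hypothetical trace of a backward elimination: number of features -> OOB accuracy
trace = {40: 0.90, 30: 0.92, 20: 0.91, 10: 0.90, 5: 0.85}
chosen_size = select_model_size(trace)
```

With a best accuracy of 0.92, the threshold is 0.897, so the model with 10 features is the smallest one that still qualifies.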
To obtain a quality indicator in the model application step, classification reliability was calculated as the share of votes for the class with the highest number of votes minus the share for the class with the second highest number of votes [50,65]. This yields a reliability value ranging between 0 and 1, with 1 indicating a high and 0 indicating a low classification reliability.
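As a small worked example of this reliability measure, the Python sketch below computes it from the vote counts of a single segment. The function name and the vote counts are hypothetical.

```python
def classification_reliability(votes):
    """Vote share of the winning class minus the share of the runner-up."""
    total = sum(votes.values())
    shares = sorted((count / total for count in votes.values()), reverse=True)
    runner_up = shares[1] if len(shares) > 1 else 0.0
    return shares[0] - runner_up


# Hypothetical votes of 500 trees for one segment
votes = {"European beech": 400, "oak": 75, "European ash": 25}
reliability = classification_reliability(votes)
```

With 80% of the votes for the winning class and 15% for the runner-up, the reliability is 0.65; a unanimous vote yields 1, and a tie between the two leading classes yields 0.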

Spectral Signatures of Tree Species
The complete spectral signatures for each of the 13 investigated tree species are featured in Figure 6a. The reflectance values of the different species diverge most strongly at wavelengths > 800 nm. The reflectance of sycamore maple is comparatively high, while the opposite is true for Norway spruce. For the spectral interval between 800 and 1000 nm, the reflectance values of broadleaves are generally higher than the corresponding values of conifers. However, there is some overlap, e.g., due to the comparatively low reflectance values of wild cherry in the spectral interval between 800 and 991 nm. For the visible range, the spectral signatures around the green peak at 560 nm are shown in Figure 6b. Directly at the peak, the ranking of the maximum reflectance values is very different. Three of the four highest reflectance values were obtained for coniferous species, while the lowest ones were featured by wild cherry.

Classification of Manually Delineated Reference Trees
The results of the classification of manually delineated reference trees are given in Table 4. Hereafter, this classification will be referred to as 'VNIR (all)', as opposed to the results that were obtained from the automatically segmented tree crowns (labeled as 'VNIR (all-mean shift)'). The lowest user's accuracy was obtained for sycamore maple (78.7%) and is mainly due to misclassifications of European ash. The maximum value (100.0%) was achieved for Weymouth pine, silver fir and wild cherry. The classification of Weymouth pine and silver birch yielded a producer's accuracy of 100.0%. The lowest producer's accuracy was associated with European ash (72.7%), which was mainly due to misclassifications as sycamore maple and black alder. With one misclassification as European beech, silver fir is the only coniferous species that was classified as a broadleaf. There was also only one misclassification of a broadleaf species as a conifer, namely wild cherry as European larch.
Of all 699 reference trees, 58 were misclassified (8.3%). Of these, nine trees were conifers, which corresponds to 3.0% of all conifers in this classification, while a total of 49 broadleaves were misclassified (12.2% of all broadleaves). The overall accuracy of the classification is 91.7% (confidence interval (CI) = 89.4%, 93.6%), while Cohen's kappa reaches 0.909.
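The overall accuracy and an approximate confidence interval can be recomputed from the two counts given above. The following is a minimal sketch that assumes a Wilson score interval at the 95% level; the text does not state which interval method was used, so the limits may differ slightly in the last digit from the reported ones. The function name `wilson_interval` is chosen here for illustration only.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


n_trees = 699   # reference trees (from the text)
n_wrong = 58    # misclassified trees (from the text)
n_correct = n_trees - n_wrong

overall_accuracy = n_correct / n_trees
ci_low, ci_high = wilson_interval(n_correct, n_trees)  # assumed method, see lead-in

print(f"OA = {overall_accuracy:.1%}, approx. 95% CI = ({ci_low:.1%}, {ci_high:.1%})")
```

Under this assumption the recomputed interval lies close to the reported one, which suggests that the reported limits are consistent with the stated counts.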
The importance of the 24 variables that are part of the best model is illustrated in Figure 7a. Most of the variables are principal components (17).
With an overall accuracy of 89.3% (CI = 86.7%, 91.5%) and a Cohen's kappa of 0.882, the best results among the setups that combine two variable groups were obtained for 'VNIR (bands, PCs)'. The very best results were obtained by using variables from all of the groups ('VNIR (all)'). Combining bands, indices and principal components ('VNIR (bands, indices, PCs)') resulted in an overall accuracy of 91.0% (CI = 88.6%, 93.0%) and a Cohen's kappa of 0.901.

Classification of Mean Shift-Segmented Reference Trees
Using reference trees that were segmented with the mean shift algorithm ('VNIR (all-mean shift)'), the lowest user's accuracy was obtained for oak (79.4%). As in the VNIR (all) classification, Weymouth pine was associated with the maximum producer's and user's accuracy (100.0%). The lowest producer's accuracy was obtained for European ash (75.0%), mainly due to misclassifications as black alder, European beech and oak. Silver fir is the only coniferous species that was misclassified as a broadleaf species, namely European beech. Half of the broadleaf species were misclassified as conifers at least once. Among these, sycamore maple was most frequently misclassified as a conifer (silver fir). Most of the broadleaf species that were misclassified as conifers were assigned to European larch.
Of all 699 reference trees, 74 were misclassified (10.6%). Of these 74 trees, 14 were conifers, which corresponds to 4.7% of all conifers in this classification, while a total of 60 broadleaves were misclassified (15.0% of all broadleaves). The overall accuracy of the classification is 89.4% (CI = 86.9%, 91.6%), and Cohen's kappa reaches 0.883. The confusion matrix is provided in Table 5.

Table 5. Confusion matrix of the VNIR (all-mean shift) classification based on the automatically segmented tree crowns. The gray lines separate coniferous and broadleaf trees. Tree species are abbreviated as follows: EB = European beech, OS = oak species, EA = European ash, EH = European hornbeam, SM = sycamore maple, SB = silver birch, WC = wild cherry, BA = black alder, NS = Norway spruce, EL = European larch, SP = Scots pine, SF = silver fir, WP = Weymouth pine. Other abbreviations in the table: PA = producer's accuracy, OA = overall accuracy, UA = user's accuracy, κ = Cohen's kappa.
Most of the variables in the best model of this classification are principal components (18). Again, the vegetation index PRI is the most important variable, followed by BR. In total, five vegetation indices are included in the model. The complete variable importance plot is provided in Figure 7b.
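The accuracy measures reported alongside the confusion matrices (overall accuracy, Cohen's kappa, producer's and user's accuracy) can all be derived from the matrix itself. The sketch below illustrates this on a small three-class matrix with made-up counts; it does not reproduce Table 4 or Table 5. It assumes that rows hold the reference classes and columns the predicted classes; the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(matrix):
    """Return overall accuracy, Cohen's kappa, producer's and user's accuracies.

    Assumed layout: matrix[i][j] = number of trees of reference class i
    that were classified as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]  # reference totals
    col_sums = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    diagonal = [matrix[k][k] for k in range(n_classes)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    kappa = (overall - expected) / (1.0 - expected)

    producers = [d / r for d, r in zip(diagonal, row_sums)]  # correct / reference total
    users = [d / c for d, c in zip(diagonal, col_sums)]      # correct / classified total
    return overall, kappa, producers, users


# Hypothetical counts for three classes, for illustration only
toy_matrix = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 2, 42],
]

overall, kappa, producers, users = accuracy_metrics(toy_matrix)
print(f"OA = {overall:.3f}, kappa = {kappa:.3f}")
print("PA:", [round(v, 3) for v in producers])
print("UA:", [round(v, 3) for v in users])
```

A low producer's accuracy for a class means that many of its reference trees were assigned elsewhere, whereas a low user's accuracy means that many trees of other classes were assigned to it.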
Table 6 summarizes the results of all classifications of the manually delineated and automatically segmented reference trees. In general, the results of the VNIR (all-mean shift) classification were only slightly worse than those obtained for the equivalent manual delineation scenario. For the classifications based on manually delineated trees, the results were best when all variables were included, closely followed by the classification using bands, indices and principal components. Using only reflectance values (VNIR (bands)) yielded the worst results. A large difference was found among the four classifications that combine spectral reflectance values with one of the four other variable groups (1st derivative, texture, indices, principal components): while the scenarios with indices and principal components generate good results, adding texture or statistics of the first derivative hardly changes the accuracy.
Table 6. Overall accuracy (OA) and Cohen's kappa (κ) for classifications of reference trees. The dotted line separates the classifications of the manually delineated reference trees (above) from the classifications of the segmented reference trees (below). Abbreviations are as follows: CI = Confidence Interval, Ind. = Indices, PCs = Principal components, Text. = Texture, 1st Deriv. = 1st Derivative. The best classification is printed in bold.
The classification accuracies of the individual tree species varied between the different classifications. However, some species were repeatedly associated with the highest or lowest accuracies. The producer's accuracy of European ash was the lowest in all eight classifications. The species associated with minimum user's accuracies were European ash, oak and sycamore maple. Both Weymouth pine and Scots pine had the maximum user's accuracies in four cases.
Scots pine, Weymouth pine, silver birch and European larch were each associated with the highest producer's accuracies twice. There were only three cases where maximum accuracy values were obtained for a broadleaf species (producer's accuracy of silver birch (twice), user's accuracy of wild cherry).

Importance of Wavelengths for Principal Components
Figure 8 presents the wavelength loadings for the first two principal components (PC 1 and PC 2), as well as the four principal components that were associated with the highest importance in the best model of the VNIR (all) classification (PC 23, PC 11, PC 15 and PC 13). The average spectral signature of European beech was added to the graphs to ease interpretation. The loadings of the first two PCs, particularly PC 1, are very uniform (Figure 8a,b). This implies that these two PCs capture the overall spectral variability in the data cube without necessarily improving the discrimination of the species. In contrast, the loadings of the four most important principal components (Figure 8c-f) feature several peaks, indicating that different parts of the electromagnetic spectrum contribute to the separability of the classes. Besides peaks around green wavelengths (500-550 nm), we also note maxima located around blue wavelengths, as well as in the red region of the light spectrum. Compared to the visible spectral range, the near-infrared range seems to store less of the information that is needed for species discrimination (in this particular case).

Wall-to-Wall Mapping on Mean Shift-Segmented VNIR Image
The wall-to-wall tree species and classification reliability maps are presented in Figures 9 and 10. The classification map (Figure 9) shows the high species richness and biodiversity. The corresponding classification reliabilities (Figure 10) demonstrate a large spatial variability. In the upper map detail (A) of Figure 10, an accumulation of segments with low reliabilities can be found (in red). Around this area, high classification reliabilities were recorded (in green). This pattern also appears in the upper map detail (A) of Figure 9. High classification reliabilities correspond to an area where all segments were classified as European beech. In contrast, the area of low classification reliability is made up of segments that were mainly classified as black alder, European ash and oak. A similar match between tree species distribution patterns and classification reliabilities is depicted in the second map detail (B), where Scots pine is associated with the highest reliabilities.
The average classification reliability (i.e., the share of votes for the class with the highest number of votes minus the share for the class with the second-highest number of votes) of all of the segments is 0.36, with substantial differences between the species (Figure 11). The highest and lowest mean classification reliability values of the broadleaf species were obtained for European beech (0.44) and sycamore maple (0.13). The corresponding values of conifers were 0.45 (Scots pine) and 0.08 (silver fir).
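The reliability measure defined above is a simple margin between the two largest vote shares of a segment. A minimal sketch, using hypothetical vote counts for a single segment (the function name and the numbers are illustrative assumptions, not values from the study):

```python
def classification_reliability(votes):
    """Share of votes for the winning class minus the share for the runner-up."""
    total = sum(votes.values())
    shares = sorted((count / total for count in votes.values()), reverse=True)
    runner_up = shares[1] if len(shares) > 1 else 0.0
    return shares[0] - runner_up


# Hypothetical Random Forest votes for one segment (1000 trees in the ensemble)
segment_votes = {"European beech": 620, "oak": 210, "European ash": 170}

predicted_species = max(segment_votes, key=segment_votes.get)
reliability = classification_reliability(segment_votes)
print(predicted_species, round(reliability, 2))
```

A value near 1 indicates that almost all votes went to a single class, whereas a value near 0 indicates that the two leading classes received nearly the same share.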

Discussion
The study shows that accurate tree species classification results can be obtained with hyperspectral data, with overall accuracies exceeding 90%. Differences between segmented and manually delineated tree crowns were small and not statistically significant. At the same time, despite the high number of classified tree species (13), our classification accuracies exceeded those reported in other studies.

Classification Accuracies
The high OOB classification accuracies can be attributed first and foremost to the fact that the acquired hyperspectral dataset featured both a high spectral resolution (80 bands, 7.3 nm wide) and a high spatial resolution (0.4 m). In other studies, a comparable resolution is usually, if at all, only given for either the spectral or the spatial domain. For example, a pixel-based classification of five tree species with hyperspectral data of 8 m spatial resolution and 125 bands resulted in an overall accuracy of 86% based on an independent validation data set [66]. Fassnacht et al. [67] achieved overall accuracies of 84% to 92% (seven species) and 86% to 97% (five species) with hyperspectral data with 125 bands of 3 to 4 m spatial resolution using different feature selection approaches and an iterative bootstrap classification approach. Richter et al. [27] separated ten broadleaved tree species based on data with 367 spectral bands and 2 m spatial resolution with an overall accuracy of 78.4%. Dalponte et al. [68] obtained a cross-validated Cohen's kappa of 0.890 in a pixel-based classification that was based on hyperspectral data with a spatial resolution of 0.4 m and 160 bands (band width: 3.6 nm), hence using data more similar to ours. However, only three tree species were classified, demonstrating that the excellent spatial and spectral resolution of our dataset cannot be the only reason for our good results. All of these studies used pixel-based approaches to classify hyperspectral data covering the VNIR as well as the SWIR region.
On the other hand, studies based on multispectral data with high spatial resolution (2 m) have also achieved good classification results. Using WorldView-2 data for the object-based classification of ten tree species, Immitzer et al. [4] obtained an OOB overall accuracy of 82.0%. Other WorldView-2 studies in Central Europe achieved similar results [15,69].

As evident in Table 6, vegetation indices and principal components strongly contributed to the success of our approach. These findings are in line with other studies [66,67]. The high number of reference samples per class (24 (Weymouth pine) to 99 (European beech)) was also beneficial. At the same time, only the sunlit areas of each tree crown were considered when delineating the reference samples. This has also been recommended by other studies [4,23,66] and was done to decrease the intra-crown and intra-class variation of the spectral signal caused by varying illumination within the canopy. Similarly, the parameters of the segmentation algorithm were set to mimic these manual delineations. Together, these choices have very likely contributed positively to the classification results, although we did not attempt to quantify the contribution of each individual factor.

Suitability of Hyperspectral Dataset and Segmentation Algorithms
One main objective of the study was to check how much the quality of the classification is affected by replacing manually delineated polygons with segmented reference polygons. Any observed decrease in accuracy should be weighed against the potential benefits of an automated crown delineation process. For example, only automatic segmentation makes it possible to apply per-tree classification models over large areas, leveraging the full environmental and economic potential of remotely sensed data [70]. With this in mind, the observed loss in classification accuracy was acceptable. Indeed, we only found a slight decrease from the VNIR (all) classification results to those based on automatically segmented reference data (∆ overall accuracy = −2.3 pp, ∆κ = −0.026). This decline in classification accuracy can be considered very small, especially in view of the overlapping confidence intervals (Table 6). Similar classification results for manually and automatically delineated objects were also reported by Dalponte et al. [69]. However, these results were based on only three different tree species.
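The differences quoted above follow directly from the values reported for the two classifications. The short sketch below recomputes them and checks whether the two confidence intervals overlap; the dictionary layout and the helper name `intervals_overlap` are illustrative choices. Overlapping intervals are used here only as the informal indication mentioned in the text, not as a formal significance test.

```python
# Values as reported in the text for the two classifications
manual = {"oa": 91.7, "ci": (89.4, 93.6), "kappa": 0.909}       # VNIR (all)
mean_shift = {"oa": 89.4, "ci": (86.9, 91.6), "kappa": 0.883}   # VNIR (all-mean shift)

# Differences (mean shift minus manual), rounded to the reported precision
delta_oa_pp = round(mean_shift["oa"] - manual["oa"], 1)
delta_kappa = round(mean_shift["kappa"] - manual["kappa"], 3)


def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


ci_overlap = intervals_overlap(manual["ci"], mean_shift["ci"])
print(delta_oa_pp, delta_kappa, ci_overlap)
```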
To generate reasonable tree crown objects with acceptable effort, while avoiding the pitfalls and challenges of more sophisticated approaches (e.g., [25,71,72]), we chose to implement a stratified approach. We first separated the high broadleaf stands from the small and/or conifer stands. For this, a LiDAR-based CHM was used, which could, however, easily be replaced by CHMs generated from photogrammetry. The high broadleaf stands were usually made up of relatively large crowns, whereas small trees and conifers generally had small crowns. The tree crown segmentation parameters were fixed accordingly for the two strata. Parameter settings were chosen so that the polygons resulting from the segmentation process were as congruent as possible with the manually delineated polygons. In doing so, over-segmentation was preferred over under-segmentation. Therefore, the majority of the segments are presumably smaller than the corresponding manual objects. In the reference dataset, the average size of the automatically generated segments was around 40% smaller than that of the manually delineated crowns. As only sunlit areas were included in the manually delineated polygons, it was to be expected that a smaller extent of this polygon (as obtained from our segmentation) would still yield good classification results. No further attempts were made to optimize the tree crown segmentation process, which would be necessary if the number of trees were also to be counted. One possibility for the optimization of the parameters could be based on the classification results [65]. Another option would be to use unsupervised approaches [73]. Also, with respect to the chosen mean shift segmentation algorithm, we did not study the various existing alternatives [25]. In our understanding, the most appealing advantage of the mean shift algorithm is the fact that minimum object sizes can be deliberately set [74].

Classification of Tree Species
In all but the VNIR (all) classification, the user's and producer's accuracies of the five coniferous species were higher than those of the eight broadleaf species. Similar findings were reported by other studies [23,31,67]. By far the best classification results were obtained for Weymouth pine and Scots pine. The classes with the lowest user's accuracies, i.e., the classes to which most misclassified trees were assigned, are European ash, oak and sycamore maple.
The clear superiority of the two coniferous species Weymouth pine and Scots pine is probably related to the distinctiveness of their spectral signatures. By contrast, the intra-class spectral signatures of the mentioned broadleaf species are very diverse and overlap with those of other tree species. Consequently, four of the six trees that were misclassified as black alder in the VNIR (all) classification were European ash, which is a species with a very similar spectral signature.
In the VNIR (all-mean shift) classification, oak was associated with the lowest user's accuracy, mainly owing to European beech trees that were misclassified as oak. This was not expected, as there seemed to be relatively large differences between the two respective spectral curves (Figure 6). It is possible that the data of some of the reference trees were noisy, which remained unrecognized upon inspection of the average spectral signatures. For example, signal from the soil, smaller trees and other (understory) vegetation might also be featured in the reference dataset due to a loose canopy structure [1,75].
Another possible reason for the high classification accuracy of both Weymouth and Scots pine could be related to the light requirements of these species. Dalponte et al. [6] argue that the low shade tolerance of pines allows them to grow only under good light conditions, which is why these trees are usually less suppressed by neighboring trees. In our study, this could have minimized external perturbations of the spectral signatures of the two pine species. Indeed, the stand where all of the reference Weymouth pines were sampled was comparatively open. Some correlation between light requirement and the accuracies in the VNIR (all) classification is suggested, e.g., by the high producer's accuracies of the highly light-demanding species silver birch (100.0%), Scots pine (98.7%) and European larch (97.6%). Also, the user's accuracies of sycamore maple (78.7%), European ash (88.9%) and European hornbeam (92.9%) are in line with their rising light requirement (for a ranking of the light requirements of the tree species, see Ellenberg and Leuschner [76]). However, the fact that there are multiple deviations from this supposed relationship suggests that the light requirement of a species is only one of several factors that are relevant for the classification accuracy.
It is a well-known issue that the Random Forest classifier favors classes with a large number of reference samples, often at the expense of those with fewer reference samples [77]. In our work, however, both user's and producer's accuracies were relatively independent of the sample number. Even the species with relatively few reference samples (approximately n ≤ 30: silver fir, Weymouth pine) apparently still had enough samples not to be strongly affected. In our study, the classification success thus seems to have depended more on the distinctiveness of the spectral signatures and much less on the number of reference samples.
Similar to other studies, we evaluated our models only with respect to the chosen classes. The 13 tree species that were included in our study cover the vast majority of the forest. However, we know that more species are present in the forest. For some very rare species that were reported in the dataset from the Austrian state forest enterprise, we were not able to collect a sufficient number of reference samples (≥20). Those species are therefore automatically misclassified, but do not appear in our confusion matrices. The same shortcoming exists for young and/or covered trees. For example, tree age can have effects on the spectral signatures of tree species [12,14]. There were hardly any young trees in the reference dataset, as their usually small crowns were hard to identify in the data, often overlapping with neighboring tree crowns or even completely hidden. Therefore, it is probable that the accuracy of the model is positively biased.
As expected, the accuracies improved when additional variables were used for the classification alongside the spectral reflectance values, in combination with a simultaneous feature selection [27,31,68,78]. In particular, the addition of spectral indices and principal components drastically improved the classification results. By contrast, the inclusion of textural metrics and first derivatives only had a small effect. With respect to the textural metrics, it seems that the ratio of object to pixel size was not large enough to produce meaningful additional information [79].
The gain in accuracy was more pronounced for broadleaves than for conifers (VNIR (all) minus VNIR (bands): ∆ UA broadleaf = +42.1 pp, ∆ PA broadleaf = +45.6 pp, ∆ UA conifer = +31.1 pp, ∆ PA conifer = +27.6 pp). These results suggest that conifers can already be classified decently with very basic variables, while additional variables are particularly beneficial for the correct classification of broadleaf trees. The correction for background effects through vegetation indices, for example, might be helpful for the classification of relatively open and transparent broadleaf trees, but less beneficial for the classification of the relatively compact conifer canopies.
The achieved classification reliability is additional information that can be important for interpreting or revising the classification results [64]. Schultz et al. [65] showed that high reliabilities are positively correlated with a higher share of correct classifications. Species that obtained higher class-specific accuracies were frequently classified with a higher reliability in the wall-to-wall mapping. It seems that the reliability results are also affected by class occurrence (both in the model and in reality). However, we found (not shown) that the use of identical sample sizes in each class achieved similar results. Compared to crop classifications [51,65], we note that the obtained reliability values were not very high.

Variable Importance
Recursive feature selection was used to narrow down the number of variables to the ones resulting in the highest classification accuracies. The approach is based on the importance values and the OOB accuracies from the Random Forest models. We obtained satisfactory results, which are in line with other studies [50,55,64,65]. However, as it uses OOB results, the feature selection procedure can introduce some bias. Although the variables that were kept for the different models were never fully identical, there was a high level of agreement, not only among the variable groups (e.g., principal components, vegetation indices), but also among specific variables. This not only adds to the confidence in the reproducibility and meaningfulness of the feature selection, but also makes it possible to draw general conclusions from the feature selection results.
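The general idea of such a recursive selection can be sketched as a backward elimination loop: fit a model, drop the least important variable, and remember the subset that scored best. The code below is a generic illustration under stated assumptions, not the exact procedure of this study. The `evaluate` callback stands in for fitting a Random Forest and returning its OOB accuracy and variable importances; `toy_evaluate` and its importance values are entirely made up so that the example runs on its own.

```python
def recursive_feature_selection(features, evaluate, min_features=1):
    """Backward elimination guided by importance values.

    `evaluate(subset)` is a caller-supplied function (hypothetical here) that
    returns (score, importances), e.g. OOB accuracy and per-variable importance.
    """
    current = list(features)
    best_subset, best_score = list(current), float("-inf")
    while len(current) >= min_features:
        score, importances = evaluate(current)
        if score >= best_score:  # on ties, prefer the smaller subset
            best_subset, best_score = list(current), score
        if len(current) == min_features:
            break
        weakest = min(current, key=lambda name: importances[name])
        current.remove(weakest)
    return best_subset, best_score


# Made-up importance values, for illustration only
TOY_IMPORTANCE = {"PRI": 0.9, "PC 23": 0.7, "band_550": 0.3, "texture": 0.0, "noise": 0.0}


def toy_evaluate(subset):
    """Stand-in for a model fit: useful signal minus a small cost per variable."""
    score = sum(TOY_IMPORTANCE[name] for name in subset) - 0.05 * len(subset)
    importances = {name: TOY_IMPORTANCE[name] for name in subset}
    return score, importances


selected, selected_score = recursive_feature_selection(list(TOY_IMPORTANCE), toy_evaluate)
print(selected)
```

Because the same score is used both to choose the subset and to report its accuracy, a loop of this kind can yield somewhat optimistic estimates, which matches the caveat about bias noted above.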
In all cases where the vegetation indices were included in the classification (VNIR (all), VNIR (bands, indices), VNIR (bands, indices, PCs), VNIR (all-mean shift)), the Photochemical Reflectance Index (PRI) was the most important variable. Furthermore, the vegetation indices BR, PSRI, GNDVI and mND705 were also featured in all four of these models. Of the five most frequently featured vegetation indices, four (PRI, BR, PSRI, GNDVI) included a green wavelength and three of them (BR, PSRI, GNDVI) related it to one from the near-infrared part of the spectrum.
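To illustrate how such indices relate a green wavelength to another part of the spectrum, the sketch below computes PRI and GNDVI as normalized differences from a mean crown spectrum. It uses commonly cited forms of the two indices (PRI from bands near 531 and 570 nm, GNDVI from bands near 800 and 550 nm); the exact band choices used in this study are not given in this section, so these wavelengths, the helper names and the reflectance values are assumptions.

```python
def nearest_band(spectrum, wavelength_nm):
    """Reflectance of the band whose centre wavelength is closest to the request."""
    centre = min(spectrum, key=lambda w: abs(w - wavelength_nm))
    return spectrum[centre]


def normalized_difference(a, b):
    return (a - b) / (a + b)


def pri(spectrum):
    # Commonly cited form: (R531 - R570) / (R531 + R570); band choice assumed
    return normalized_difference(nearest_band(spectrum, 531), nearest_band(spectrum, 570))


def gndvi(spectrum):
    # (NIR - green) / (NIR + green); 800 nm and 550 nm assumed as band centres
    return normalized_difference(nearest_band(spectrum, 800), nearest_band(spectrum, 550))


# Hypothetical mean crown reflectance, keyed by band centre in nm
crown_spectrum = {531: 0.060, 550: 0.080, 570: 0.070, 680: 0.030, 800: 0.450}

pri_value = pri(crown_spectrum)
gndvi_value = gndvi(crown_spectrum)
print(round(pri_value, 4), round(gndvi_value, 4))
```

With real data, the dictionary would hold the sensor's actual band centres, and `nearest_band` would pick the closest available band for each nominal wavelength.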
At least in terms of quantity, principal components were the most important group of variables. Interestingly, three of the four models including principal components did not feature any of the first two principal components, which by definition explain most of the variation in the image data (exception: PC 2 in VNIR (bands, PCs)). This points to a main criticism of using principal components in regression problems (in contrast to partial least squares regression (PLSR) and independent component analysis (ICA)): the most useful information is not necessarily contained in the first components, but might be contained in later ones [44,80]. This drawback was avoided in our study by considering all PCs and using the implemented feature selection approach to select those contributing most to the separability of the classes.

Conclusions
In this work, 699 manually delineated reference trees of 13 tree species were classified with the Random Forest classifier. The reference dataset comprised a variety of explanatory variables, including reflectance values, vegetation indices, textural metrics, statistics of the first derivative of the mean spectral signature and principal components. After the classification of the manually delineated polygons, the same procedure was carried out for the corresponding segments that were generated with automated mean shift segmentation. The resulting Random Forest models were finally applied to the segmented VNIR data and classified tree maps of the study area were obtained. This setup ensured that useful insights could be gained with respect to (i) the importance of the different variables, (ii) species-related differences in classification accuracy, and (iii) the automation potential of the proposed method.
Despite the fact that we studied 13 different species, we obtained very high classification accuracies (out-of-bag overall accuracy > 0.90). We attribute these results to the richness of the acquired hyperspectral dataset (80 spectral bands), which was moreover recorded at a very high spatial resolution (0.4 m). As expected, not all of the species were equally well classified. Notably, European ash and oak could not be well separated from the remaining species (mainly European beech), presumably due to high intra-class variabilities and overlapping spectral signatures. In general, conifers were better identified than broadleaf species. No relation was found between the number of available reference samples (24 ≤ n ≤ 99) and the species-specific user's and producer's accuracy.
With respect to the predictive variables, we found that vegetation indices and principal components were the most important. Among the vegetation indices, PRI, BR, PSRI, GNDVI and mND705 were of major significance for a successful classification. The first four of these indices include a green wavelength and three of them relate it to near-infrared reflectance values. The most important principal components share a high reliance on blue and green wavelengths.
In this work, the mean shift algorithm has been shown to generate decently classifiable segments, which can be seen as a first step towards an automated tree species classification approach. It seems worthwhile to apply more advanced and objective evaluation methods and possibly also to evaluate other algorithms. More research is also needed to better understand the effects of shortcomings in the reference data, such as unequal sample sizes or an inhomogeneous distribution of reference samples over the study site. Ultimately, the applicability of tree species classification models over much larger geographical areas should be investigated.