Machine Learning Techniques for Tree Species Classification Using Co-Registered LiDAR and Hyperspectral Data

The use of light detection and ranging (LiDAR) techniques for recording and analyzing tree and forest structural variables shows strong promise for improving established hyperspectral-based tree species classifications; however, previous multi-sensoral projects were often limited by error resulting from seasonal or flight path differences. The National Aeronautics and Space Administration (NASA) Goddard’s LiDAR, hyperspectral, and thermal imager (G-LiHT) now provides co-registered data on experimental forests in the United States, which are associated with established ground truths from existing forest plots. Free, user-friendly machine learning applications like the Orange Data Mining Extension for Python have recently simplified the process of combining datasets, handling variable redundancy and noise, and reducing dimensionality in remotely sensed datasets. Neural networks, CN2 rules, and support vector machine methods are used here to achieve a final classification accuracy of 67% for dominant tree species in experimental plots of Howland Experimental Forest, a mixed coniferous–deciduous forest with ten dominant tree species, and 59% for plots in Penobscot Experimental Forest, a mixed coniferous–deciduous forest with 15 dominant tree species. These accuracies are higher than those produced using LiDAR or hyperspectral datasets separately, suggesting that combined spectral and structural data contain more complementary information than either dataset alone. Using greatly simplified datasets created by our dimensionality reduction methodology, machine learner performance remains comparable to or higher than that achieved using the full dataset. 
Across forests, the identification of shared structural and spectral variables suggests that this methodology can successfully identify parameters with high explanatory power for differentiating among tree species, and opens the possibility of addressing large-scale forestry questions using optimized remote sensing workflows.


Introduction
The use of geographic information systems (GIS) and remote sensing techniques for forestry applications has been a major concern in geography since these tools were first developed, and a technical revolution over the last three decades has allowed increasingly sophisticated analysis of forest structure, composition, and dynamics. Although optical, multispectral, and hyperspectral remote-sensing techniques are traditionally used to gather data on forests, the incorporation of data on tree and canopy structure can improve analysis of forest biomass and health, carbon sequestration potential, and range, potentially even at the species level [1]. Light detection and ranging (LiDAR) technologies are increasingly being employed to collect data on structural features of tree canopies and branching patterns, forest structure and succession, and even estimates of tree physiological metrics such as leaf area index [2]. LiDAR sensors, usually mounted on airborne platforms such as small airplanes, have also proved to be a boon to commercial forest resource monitoring and valuation. The ability to accurately estimate tree height and other forest parameters, such as basal area or timber volume, from a single flyover has greatly simplified the valuation of forests grown for timber [3].
Changing weather patterns will reshape the ranges of species worldwide. The ability to monitor changes in the community dynamics of trees and other plants, which play a fundamental role in overall ecosystem functioning and composition, will be key to understanding trends in terrestrial biomes and to creating effective strategies for conservation, resilience, and human livelihoods [4]. Whether a forest is assessed for conservation or commercial purposes, one key element of forest systems remains difficult to quantify: individual LiDAR data points say little about the species identity of a given tree. Nonetheless, tree species differ in their canopy architecture, overall shape factor, and foliage type. Such morphological characteristics have been used as the basis for LiDAR-based differentiation between deciduous and coniferous trees [5] and for more detailed species-level classifications [6][7][8]. When LiDAR data are collected, a single laser pulse may be reflected off multiple canopy structures and recorded as several returns with different light intensities [9]. Full-waveform LiDAR datasets contain information on these multiple within-canopy echoes and, thus, provide robust information on complex canopy architecture and forest composition. In addition to species-level classification, such datasets have been used to estimate additional forest biomass parameters [10]. However, discrete-return LiDAR data were found to provide additional information on forest structure beyond that provided by full-waveform LiDAR data [11]. They can offer valuable insight into tree and forest canopies, for example, through summary metrics calculated by binning a LiDAR point cloud into percentiles, deciles, or other values summarizing individual ground and canopy returns [12]. Similar structural indices have been used successfully in surveys of vegetation biodiversity [13]. It is, thus, possible to take advantage of high-density LiDAR data to examine branching patterns and other architectural data on a single-tree or tree-stand basis, and to relate these data to the species of individual trees or to the predominant species in a stand.
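To make the idea of binning concrete, the Python sketch below computes height percentiles and density-decile-style fractions for the returns that fall in one grid cell. It is illustrative only. The following are all assumptions modeled on common area-based LiDAR metrics, not the G-LiHT processing chain:

- the 2-m canopy threshold;
- the metric names (canopy_cover, P50, D5);
- the decile definition, i.e. the share of canopy returns above each of nine breaks that split the height range into ten equal slices.

```python
import math


def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("percentile of an empty list")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cell_metrics(heights, canopy_threshold=2.0, percentiles=(25, 50, 75, 90, 100)):
    """Summary metrics for the return heights (m above ground) in one grid cell.

    Assumed, illustrative definitions (not G-LiHT's):
      canopy_cover -- share of all returns at or above canopy_threshold
      P<q>         -- q-th percentile of canopy return heights
      D<k>, k=1..9 -- share of canopy returns above the k-th of nine breaks that
                      split [canopy_threshold, tallest return] into ten equal slices
    """
    if not heights:
        return {}
    canopy = sorted(h for h in heights if h >= canopy_threshold)
    metrics = {"canopy_cover": len(canopy) / len(heights)}
    if not canopy:
        return metrics
    metrics.update({f"P{q}": percentile(canopy, q) for q in percentiles})
    step = (canopy[-1] - canopy_threshold) / 10.0
    for k in range(1, 10):
        lower = canopy_threshold + k * step
        metrics[f"D{k}"] = sum(h > lower for h in canopy) / len(canopy)
    return metrics


# Example: eight returns in one cell, heights in metres.
m = cell_metrics([0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
print(m["canopy_cover"], m["P50"], m["P100"], m["D5"])
```

Real area-based workflows differ in details such as first-return versus all-return counts and how height breaks are placed. The pattern of sorting the heights, then summarizing them by quantile and by cumulative share, is the common core.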
The structural information provided by discrete-return LiDAR has been used successfully to characterize species richness [14] and predict species composition of individual stands [15] in tropical forests, and to differentiate among tree species or taxonomic groups [16][17][18]. Classifications attempted on forests with only a few tree species [19] and those using a high density of LiDAR points [20][21][22] have achieved high accuracies. However, LiDAR summary metrics proved less useful when similar classifications were extended to a larger number of tree species [23]. Additionally, debate continues over the optimal resolution of LiDAR data relative to individual tree crown size. Some researchers have warned against creating species-level maps from data coarser than the individual tree level [24]. Others have asserted that some within-species variability is unavoidable because of an individual-tree signature that explains up to 65% of intraspecies variability [25]. Some researchers have succeeded in using individual tree detection (ITD) to closely approximate the location and size of trees and tree crowns [26,27]. Even so, ITD remains a field with uncertainties that must be addressed before it can be implemented for robust forest inventory or valuation [28]. Until such methods are perfected, it remains prudent to work at the scale of aggregated tree stands, plots, or groups for large-scale forest inventory or classification when using discrete-return LiDAR data.
In addition to the structural information offered by LiDAR datasets, remote tree species classifications may take advantage of the differential reflectance of different wavelengths of light within heterogeneous forest canopies. Multispectral [29,30] and optical [31,32] datasets have previously been used in combination with LiDAR for species-level classifications. Hyperspectral data in particular have been used for tree species classification because leaves with species-specific pigment concentrations reflect light differently [33]. Using a similar principle, recently developed multispectral LiDAR systems can gather wavelength-dependent structural information, and they have identified tree species with higher accuracies than single-wavelength LiDAR data [34][35][36][37]. The use of LiDAR data to identify gaps in the tree canopy has also bolstered the accuracy of some hyperspectral classifications [38][39][40]. Fused hyperspectral and LiDAR datasets also show promise for ecological problems other than species classification, including biomass estimation [41], habitat characterization [42], and forest fire risk evaluation [43]. However, very few studies (Reference [44] being an exception) have had the opportunity to use a co-registered hyperspectral and LiDAR dataset.
Recently available sensors like the National Aeronautics and Space Administration (NASA) Goddard's LiDAR, hyperspectral, and thermal (G-LiHT) imager are helping to improve the utility and viability of projects combining LiDAR and hyperspectral data by providing co-registered data on established experimental forests [45]. These G-LiHT flights provide a double benefit for investigating advances in the remote sensing of vegetation. Firstly, co-registered data have historically been rare in multi-sensoral projects, but they are desirable because they can reduce error resulting from seasonal or flight path differences. Secondly, experimental forests are recommended as ground truth areas for remote-sensing projects for two reasons: the management of such areas is already well documented, and the results of remote-sensing analyses can be connected to other existing research [46].
Datasets collected by G-LiHT also provide an excellent opportunity for evaluating new analytical methods. Remotely sensed datasets are often constrained or complicated by variable redundancy and noise, and traditional dimensionality reduction techniques require computationally intensive and expensive software programs. Additionally, most remotely sensed datasets, particularly hyperspectral ones, are very large. Only a fixed number of ground-truth sites or pixels typically exist for a particular project, so this dimensionality leads to problems such as the Hughes phenomenon [47]. With a fixed number of training samples, classifier accuracy tends to improve as input variables are added only up to a point and then declines. Hyperspectral data on a forest with only a small number of ground-truth areas may therefore be more redundant than insightful. For this reason, machine learning and data mining techniques for dimensionality reduction and pattern finding are often employed in species classification studies, as well as in predictive models of species distributions or habitat suitability [46,48–52]. These techniques show strong promise for future work.
Large datasets like those collected by G-LiHT represent a classic case of the Hughes phenomenon and, therefore, an ideal opportunity to assess dimensionality reduction techniques. Previous researchers have recommended several techniques for reducing a large dataset to its most informative subset [53]:

- data filtering, i.e. retaining only a subset of a dataset for analysis, for example by removing incomplete cases or irrelevant variables;
- factor analysis;
- separability indices.

The goal is to improve the efficiency of machine learners and to reduce correlation among the variables used for classifier training [53]. Variables may be selected using parametric [54] or nonparametric [55] statistical methods. More complex statistical methods may also be used, aimed at variable selection, grouping [56], or iterative inter-comparisons of potential variable combinations for accurate and parsimonious model creation [57,58]. Additionally, machine learning methods, including random forests and support vector machines, have been used as tools for pattern recognition and variable selection when working with large datasets [59,60]. Such techniques are becoming increasingly easy to implement. Free, user-friendly data mining and machine learning applications like the Orange Data Mining Extension for Python (Orange) [61] have recently simplified the process of combining and analyzing remotely sensed datasets for researchers at all levels. Thus, datasets and techniques exist that are well suited to meeting the need for optimized dimensionality reduction, particularly as large datasets are increasingly collected in forest ecosystems. Here, we describe a methodology for assessing a suite of LiDAR and optical metrics, refined by machine learning techniques, to perform species-level tree classifications that optimize the contribution of both structural and spectral information.
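As a minimal illustration of the data filtering and correlation reduction described above, the Python sketch below works in two steps. It first drops incomplete cases. It then greedily discards any variable whose absolute Pearson correlation with an already retained variable exceeds a threshold. This generic filter is not the classification-tree selection used later in this study. The 0.95 threshold and the keep-the-earlier-variable ordering are arbitrary assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_variables(rows, names, max_abs_r=0.95):
    """Return (complete_rows, kept_names).

    rows: list of records, one value per name; None marks a missing value.
    Variables are visited in the given order, so an earlier variable is kept
    and a later, highly correlated one is dropped.
    """
    complete = [r for r in rows if all(v is not None for v in r)]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases")
    columns = {name: [r[i] for r in complete] for i, name in enumerate(names)}
    kept = []
    for name in names:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return complete, kept


# Hypothetical bands: b2 is exactly twice b1, so it carries no new information.
rows = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [None, 1, 1]]
complete, kept = filter_variables(rows, ["b1", "b2", "b3"])
print(len(complete), kept)
```

Greedy filters like this are order-dependent. In practice the visiting order is often set by a relevance score, so that the more informative member of a correlated pair is the one retained.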

Data Collection
The G-LiHT imager is composed of several compatible off-the-shelf navigation, spectrometry, imaging, and laser sensor products [45]. Flyovers relevant to this study were conducted in June 2012, and all data can be found in the G-LiHT data archive at ftp://fusionftp.gsfc.nasa.gov/G-LiHT. Discrete-return LiDAR data were originally collected at a density of six returns per square meter. They are available in raster format, in which returns are aggregated to 13-m² pixels (see Section 2.2 and Table 1 for more details). Hyperspectral data are available at a 1-m² resolution. In total, 190 individual variables are available as part of the G-LiHT outputs for each forest (32 LiDAR metrics, 114 hyperspectral reflectance bands, and 44 vegetation indices). × 25 m. Data on the species, tree height, and diameter at breast height (DBH) were recorded for each tree above 10 cm in DBH in these plots [62].
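Vegetation indices are arithmetic combinations of individual reflectance bands. The Python sketch below shows the general pattern for two well-known normalized-difference indices, each evaluated on the band nearest its target wavelengths:

- NDVI, using about 800 and 670 nm;
- the photochemical reflectance index (PRI), using 531 and 570 nm.

The regular band grid (114 centers starting at 418 nm with 4.4-nm spacing) is an approximation based on the 418–918-nm range and band interval reported for these data. Neither index is claimed to match G-LiHT's product definitions or the list in Table 2.

```python
def band_centers(start_nm=418.0, step_nm=4.4, n_bands=114):
    """Approximate band centre wavelengths (nm); an assumed regular grid."""
    return [start_nm + i * step_nm for i in range(n_bands)]


def nearest_band(centers, target_nm):
    """Index of the band whose centre is closest to target_nm."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - target_nm))


def normalized_difference(spectrum, centers, nm_a, nm_b):
    """(R_a - R_b) / (R_a + R_b), using the bands nearest nm_a and nm_b."""
    ra = spectrum[nearest_band(centers, nm_a)]
    rb = spectrum[nearest_band(centers, nm_b)]
    return (ra - rb) / (ra + rb)


def ndvi(spectrum, centers):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    return normalized_difference(spectrum, centers, 800.0, 670.0)


def pri(spectrum, centers):
    """Photochemical reflectance index: (R531 - R570) / (R531 + R570)."""
    return normalized_difference(spectrum, centers, 531.0, 570.0)


# Synthetic pixel: flat reflectance except a bright NIR band and a dark red band.
centers = band_centers()
pixel = [0.10] * len(centers)
pixel[nearest_band(centers, 800.0)] = 0.50
pixel[nearest_band(centers, 670.0)] = 0.05
print(round(ndvi(pixel, centers), 3), round(pri(pixel, centers), 3))
```

Snapping each target wavelength to the nearest band is the simplest choice. Averaging neighbouring bands or interpolating between them are common alternatives when band centers fall between the nominal wavelengths of an index.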

Data Preparation and Exploration
Binned point clouds generated by G-LiHT's scanning LiDAR sensor were processed into standard metrics (as described in previous publications [63,64]; definitions in Table 1), available as raster files with 13-m² resolution.
Available hyperspectral data include at-sensor reflectance data covering a spectrum between 418 and 918 nm, with an interval of approximately 4.4 nm between bands, for a total of 114 individual bands. A total of 44 different vegetation indices calculated from these reflectance measurements are also available (those discussed in this article are listed in Table 2). To prepare these data for analysis, a dominant species for each whole plot and subplot was determined and associated with subplot polygons (Table 3). Howland Experimental Forest has remained largely undisturbed in the 140 years since its establishment [83]. In Penobscot Experimental Forest, plots used in this analysis were outside areas used in cutting and forest management studies [84], and only minor disturbance from spruce budworm was reported [85]. Forest age and tree size distributions can be assumed to be relatively stable in these areas, and trees under 10 cm in DBH had already been removed from analysis. The species with the greatest number of individual trees (stem count) was therefore chosen as the dominant species of each subplot. In four cases, two or more species were tied; the species that was dominant in a neighboring subplot or throughout the entire plot was then chosen.
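The stem-count rule and tie-breaking procedure just described can be sketched in a few lines of Python. The tree records and the species codes in the example are hypothetical. So is the `fallback` argument, an ordered list such as the neighboring subplot's dominant species followed by the whole plot's. The 10-cm DBH cutoff follows the text; treating exactly 10 cm as included is an assumption.

```python
from collections import Counter


def dominant_species(trees, fallback=(), min_dbh_cm=10.0):
    """Species with the most stems among trees with DBH >= min_dbh_cm.

    trees: iterable of (species_code, dbh_cm) pairs for one subplot.
    fallback: species to prefer, in order, when two or more species tie,
        e.g. (dominant species of a neighbouring subplot, of the whole plot).
    Returns None if there are no qualifying trees or the tie is unresolved.
    """
    counts = Counter(sp for sp, dbh in trees if dbh >= min_dbh_cm)
    if not counts:
        return None
    top = max(counts.values())
    tied = {sp for sp, n in counts.items() if n == top}
    if len(tied) == 1:
        return tied.pop()
    for sp in fallback:
        if sp in tied:
            return sp
    return None  # unresolved tie: flag the subplot for manual review


# Hypothetical subplot: ABBA and PIRU tie at two stems; the 8-cm TSCA is excluded.
subplot = [("ABBA", 12.0), ("ABBA", 15.2), ("PIRU", 20.1), ("PIRU", 11.4), ("TSCA", 8.0)]
print(dominant_species(subplot, fallback=("PIRU", "ABBA")))
```

Swapping the `Counter` of stems for a sum of basal areas, π(DBH/2)², would give a basal-area-dominance variant of the same rule.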

Machine Learning Methods, Accuracy, and Validation
Numerous methods for machine learning are available, spanning a wide range of data analysis techniques. Overall, the methods used here fall into three groups:

- classification tree methods (decision trees, random forest);
- methods based on grouping and separability (support vector machines (SVM), k-nearest neighbors);
- methods based on rule creation and application (CN2 rules, naïve Bayes, neural networks).

The Orange Data Mining Extension for Python, version 2.7 [61] was used to test each of these classification methods (Figure 1a). Classifier performance was assessed for each combination by calculating four measures: overall classification accuracy (CA), area under the receiver operating characteristic curve (AUC-ROC) [86], Brier scores [87], and Cohen's kappa coefficient [88,89]. When machine learning classifiers are run, Orange automatically generates and reports AUC-ROC values, Brier scores, a confusion matrix, and a classification accuracy value. Custom Python code was written to calculate the kappa coefficient from the confusion matrix.

As a baseline for comparison with the explanatory power of simplified datasets, all machine learners described above were first tested on the full datasets. These consisted of 32 LiDAR metrics, 114 hyperspectral bands, 44 vegetation indices, or all 190 variables. After testing, the two best-performing machine learners were selected for use in further analyses. Performance was judged by classification accuracy, AUC-ROC, and kappa coefficient (higher is better) and by Brier score (lower is better). In all cases, classification accuracies were determined by applying each trained machine learner to subsets of input data with known dominant species identity. Specifically, cross-validation resampling was used: each subset of the data is held out as test data in one of multiple rounds while the classifier is trained on the remaining data. The resulting confusion matrices were used to calculate each overall classification accuracy.
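The kappa calculation mentioned above, together with the cross-validated confusion matrix it is computed from, can be sketched in plain Python. This is neither the authors' custom code nor Orange's implementation:

- a nearest-centroid classifier stands in, deliberately simply, for the learners listed above;
- folds are assigned by round-robin stratification;
- the Brier score uses the original multi-class form, the sum of squared differences between predicted probabilities and the one-hot outcome averaged over samples, which some software scales differently.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Round-robin fold index per sample, so each class is spread across folds."""
    seen = defaultdict(int)
    folds = []
    for y in labels:
        folds.append(seen[y] % k)
        seen[y] += 1
    return folds


def fit_centroids(X, y):
    """Mean feature vector per class (a stand-in for a real classifier)."""
    sums, counts = {}, defaultdict(int)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(centroids, x):
    """Class whose centroid is nearest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def cross_val_confusion(X, y, classes, k=5):
    """Pooled confusion matrix (rows = true class, columns = predicted) over k folds."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    folds = stratified_folds(y, k)
    for f in range(k):
        train = [i for i in range(len(y)) if folds[i] != f]
        test = [i for i in range(len(y)) if folds[i] == f]
        if not test or not train:
            continue
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        for i in test:
            cm[index[y[i]]][index[predict(model, X[i])]] += 1
    return cm


def accuracy(cm):
    """Overall classification accuracy: diagonal total over grand total."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def cohen_kappa(cm):
    """kappa = (p_o - p_e) / (1 - p_e) from a square confusion matrix."""
    n = sum(map(sum, cm))
    p_o = sum(cm[i][i] for i in range(len(cm))) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined: one class in both truth and predictions
    return (p_o - p_e) / (1.0 - p_e)


def brier_score(probs, true_idx):
    """Mean over samples of sum_k (p_k - o_k)^2, with o the one-hot true class."""
    total = 0.0
    for p, t in zip(probs, true_idx):
        total += sum((pk - (1.0 if k == t else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


# Toy data: two well-separated "species" in a 2-D feature space.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = ["A"] * 4 + ["B"] * 4
cm = cross_val_confusion(X, y, ["A", "B"], k=2)
print(cm, accuracy(cm), cohen_kappa(cm))
```

Kappa discounts the agreement expected by chance from the row and column totals. It can therefore rank classifiers differently from raw accuracy when some species dominate the subplots.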
For use in combination with the two best machine learning techniques, the list of input variables was also reduced to a simplified list, optimized to include the most informative variables available for each forest. The classification tree run on each dataset during the initial exploratory analysis was examined using the Tree Viewer widget in Orange. These trees were used to construct lists of variables that represent informative breaks in the dataset. LiDAR metrics, vegetation indices, or reflectance bands found in the first five levels of classification tree nodes were compiled to create simplified lists. Lists were also made from the first ten levels of classification tree nodes, but these longer lists were found to be no more informative than those from the first five levels; thus, they are not discussed in the results section.
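In this study, candidate variables were read off the classification trees in Orange's Tree Viewer widget. The same selection rule, every split variable appearing in the first five levels of a tree, can be expressed as a breadth-first traversal. The sketch below makes three assumptions:

- it uses a hypothetical minimal `Node` class rather than Orange's own tree objects;
- it counts the root as level 1;
- its variable names, including the reflectance band "R531", are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: `feature` is the split variable, None for a leaf."""
    feature: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def features_in_top_levels(root, max_depth=5):
    """Split variables in levels 1..max_depth (root = level 1), in first-seen order."""
    selected = []
    level, depth = [root], 1
    while level and depth <= max_depth:
        next_level = []
        for node in level:
            if node.feature is not None and node.feature not in selected:
                selected.append(node.feature)
            next_level.extend(node.children)
        level, depth = next_level, depth + 1
    return selected


# Illustrative six-level tree; the D9 split at level 6 falls outside the cut-off.
tree = Node("P50", [
    Node("NDVI", [Node("R531", [Node("FCover", [Node("P100", [Node("D9", [Node()])])])])]),
    Node(),
])
print(features_in_top_levels(tree))
```

Raising `max_depth` to 10 reproduces the longer lists that were also tested. De-duplication matters because the same variable often reappears at several depths of one tree.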
Variables identified from classification tree breaks were then used to construct five simplified lists of input data per forest:

- a simplified list of select reflectance bands, built from the classification tree run on the reflectance bands alone;
- a simplified vegetation index list, built in the same way from the tree run on the full list of vegetation indices;
- a simplified list containing variables of all data types, built from the tree run on the entire dataset of 190 LiDAR and hyperspectral variables;
- a LiDAR metric list for each individual forest;
- a common list of LiDAR metrics shared across forests.

The common list was an attempt to identify generalizable aspects of LiDAR data that may have strong explanatory power in other forests (Table 4). Using only these simplified lists of metrics as inputs, the two best classification and resampling methods determined above were rerun and reassessed on the basis of CA, AUC-ROC, Brier score, and kappa coefficient (Figure 1b). To compare this method of dimensionality reduction with an established statistical technique, principal component analysis (PCA) was also performed. The input was a raster stack of all hyperspectral reflectance bands, and PCA was run using the Forward PC Rotation function in ENVI Classic. PCA was restricted to this dataset for two reasons. The LiDAR metric rasters contained missing values. Principal components created from the vegetation indices, which already apply mathematical transformations to the reflectance data, would also have been difficult to interpret. The resulting principal components with eigenvalues greater than one (10 principal components for each forest) were exported as raster files and used as inputs to machine learning classifiers as described above for other datasets.
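The eigenvalue-greater-than-one rule used to choose the principal components can be reproduced outside ENVI. The NumPy sketch below does so in three steps:

1. standardize each band;
2. eigen-decompose the resulting correlation matrix;
3. keep the components whose eigenvalues exceed one (the Kaiser criterion).

The text does not state whether ENVI's Forward PC Rotation was run on the covariance or the correlation matrix. This sketch assumes the correlation matrix, under which the cutoff of one equals the variance of a single standardized band. It also assumes no missing values or constant bands.

```python
import numpy as np


def kaiser_pca(X):
    """PCA on standardized columns, keeping components with eigenvalue > 1.

    X: array of shape (n_pixels, n_bands) with no missing values or constant bands.
    Returns (scores, kept_eigenvalues), components sorted by decreasing eigenvalue.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)            # correlation matrix of the original bands
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    return Z @ eigvecs[:, keep], eigvals[keep]


# Synthetic "bands": three near-copies of one signal and two of another.
rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)


def jitter(signal):
    """Add a little independent noise so the copies are not perfectly collinear."""
    return signal + 0.01 * rng.normal(size=signal.size)


X = np.column_stack([a, jitter(a), jitter(a), b, jitter(b)])
scores, kept = kaiser_pca(X)
print(scores.shape, np.round(kept, 2))
```

With two underlying signals, two components pass the cutoff. Their eigenvalues are close to the sizes of the two correlated groups (about 3 and 2).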

Results
An initial assessment of species-specific structure shows that individual tree DBH and height measurements in Howland and Penobscot Experimental Forests vary in absolute magnitude and in degree of within-species variability (Figure 2). This variability is unsurprising given that these biometry data span a range of tree ages and growing conditions. Nonetheless, interspecies variability in these parameters illustrates key characteristics of tree community composition at each forest site. In both forests, the initial exploratory analysis showed that the full dataset, containing both spectral and structural data, gave higher classification accuracies than any of the three individual data types alone (Figure 3). These accuracies were 0.6371 for Howland Experimental Forest and 0.5914 for Penobscot Experimental Forest. Across forests, using LiDAR data alone resulted in slightly lower classification accuracies than either type of hyperspectral data, and the use of vegetation indices as machine learning inputs resulted in higher accuracies than using raw reflectance data. Indeed, classification accuracies achieved using vegetation indices were nearly equal to those from the full hyperspectral and LiDAR dataset in Howland Experimental Forest (CA = 0.6367) (Figure 3). Although the performance of individual machine learning techniques varied by data type and forest, k-nearest neighbors, random forest, and neural network classifiers tended to outperform other options. In both forests, all machine learning techniques produced higher accuracies when run with cross-validation resampling; only these results are shown in Figures 3 and 4.
Finally, classification accuracies from data on Howland Experimental Forest (maximum CA = 0.6371) were higher across the board than those from Penobscot Experimental Forest (maximum CA = 0.5914).
Table 4 shows the simplified lists of inputs used for the second round of machine learning analysis. Dimensionality is greatly reduced compared with the full dataset. The reduction is most evident for the hyperspectral reflectance bands, where only 13 (Howland Experimental Forest) or 16 (Penobscot Experimental Forest) of the 114 bands were retained. This represents an 86–89% reduction in the number of input variables. For the reflectance-only dataset, selected bands covered the full range of available wavelengths. A wide range of vegetation indices optimized for chlorophylls, carotenoids, anthocyanins, and xanthophylls were also identified as important variables for interspecies distinction. The LiDAR metrics identified by this method include:

- FCover, a general measure of forest density and extent;
- Fract_All, which quantifies the relative number of multiple LiDAR collisions with vegetation due to within-canopy structure;
- height percentile and density decile parameters, which provide detailed information on the vertical distribution of canopy elements.

Five parameters, D9, FCover, Fract_All, P50, and P100, were selected in both forests. This suggests that the methodology can identify key structural characteristics of species, as well as pigment-related reflectance differences.

The use of the combined list of hyperspectral and LiDAR inputs yielded higher classification accuracies (Figure 4) and kappa coefficients (Figure 5) than any individual dataset alone, demonstrating that LiDAR and hyperspectral datasets contain complementary information. Among machine learners run with inputs from individual datasets, those using the simplified lists of vegetation indices also achieved high classification accuracies. Machine learners trained on a simplified list of reflectance bands outperformed those trained on the principal components created from the reflectance dataset. This demonstrates that our methodology can reduce dimensionality while retaining superior separability among species. In Figure 4, the greatest classification accuracies from the exploratory analysis are overlaid in gray on results from analyses on the simplified lists. All six available machine learning techniques were tried on these PCA datasets, as in the exploratory analysis step. For ease of comparison, however, only the two with the highest classification accuracy or kappa coefficient are shown in Figures 4 and 5. For Howland Experimental Forest, classification accuracies improved or remained comparable to those achieved using the full dataset, even with the substantial dimensionality reduction performed here. In data from Penobscot Experimental Forest, simplified lists were slightly outperformed by runs using the full dataset in all cases. Nevertheless, machine learners run on a much smaller dataset performed similarly or, in some cases, better. This implies that our selection methodology can produce a list of inputs optimized for high separability among tree species.

Discussion
Results from combined LiDAR, vegetation index, and hyperspectral reflectance datasets across forests suggest that the combination of spectral and structural information is richer in detail than any individual dataset alone. This improvement is in line with other studies that found a similar effect [20,90]. The incorporation of LiDAR data improved the hyperspectral-based classifications of tree species, particularly at Howland Experimental Forest, which speaks to the utility of machine learning techniques in solving problems like this one. Some researchers have previously postulated that LiDAR datasets suffer less from ill-posed problems and very high dimensionality [91]. On that view, they are better suited to classification techniques that would not necessarily be optimal for other remote-sensing work [91]. This, along with our dimensionality reduction methodology, may account for some of the differences between the results described here and those of previously published studies that did not find improvements in classification accuracy when adding LiDAR data to hyperspectral datasets [39,92].
Nonetheless, there remain some limitations to the analysis as presented here.Firstly, the inclusion of LiDAR metrics, such as the mean and standard deviation rasters, which are necessarily specific to the tree heights in the forest on which they were calculated, may limit the generalizability of this analysis.Secondly, this analysis necessitated use of aggregated data.While this is not a constraint that will necessarily apply to all future studies, aggregation of data to a subplot level was required in this case because of the lack of data on the coordinates of individual trees within either forest.The aggregation of this biometry data by subplot-level stem count is just one of several ways in which data could have been meaningfully summarized [93].Although initial exploration of aggregation methods revealed that the majority of subplots would be assigned the same dominant species regardless of method, this choice necessarily affects the exact classification accuracies

Discussion
Results from combined LiDAR, vegetation index, and hyperspectral reflectance datasets across forests suggest that the combination of spectral and structural information is richer in detail than any individual dataset alone. This improvement is in line with other studies that found a similar effect [20,90]. The fact that incorporating LiDAR data improved the hyperspectral-based classifications of tree species, particularly at Howland Experimental Forest, speaks to the utility of machine learning techniques for problems like this one. Some researchers have postulated that LiDAR datasets suffer less from ill-posed problems and very high dimensionality and are, therefore, better suited to classification techniques that would not necessarily be optimal for other remote-sensing work [91]. This, along with our dimensionality reduction methodology, may account for some of the differences between the results described here and previously published studies that did not find improvements in classification accuracy when adding LiDAR data to hyperspectral datasets [39,92].
Nonetheless, the analysis as presented here has some limitations. First, the inclusion of LiDAR metrics, such as the mean and standard deviation rasters, which are necessarily specific to the tree heights in the forest on which they were calculated, may limit the generalizability of this analysis. Second, this analysis necessitated the use of aggregated data. While this constraint will not necessarily apply to all future studies, aggregation of data to the subplot level was required in this case because the coordinates of individual trees were not available for either forest. Aggregating these biometry data by subplot-level stem count is just one of several ways in which the data could have been meaningfully summarized [93]. Although initial exploration of aggregation methods revealed that the majority of subplots would be assigned the same dominant species regardless of method, this choice necessarily affects the exact classification accuracies achieved in this analysis. Additionally, any aggregation means that some detail is lost, particularly from the field campaign dataset, which provided data on height and DBH at the individual tree level, and from the hyperspectral datasets. Within each subplot, several hundred 1-m² pixels were averaged together during the aggregation process, meaning that much of the detail on differential reflectance within individual tree crowns could not be used. Numerous researchers have confronted this problem in the past, since G-LiHT is certainly not the only dataset to include data aggregated to different sizes or to rely on ground-truth data with some limitations. Some authors have argued that attempts to identify or classify species at anything above the individual tree level will be met with difficulty [24], but other researchers have published classifications with accuracies of up to 90% for tree stands [94]. In the case of the data used here, aggregation to a mean subplot value necessarily creates some error, both through loss of detail and through the contaminating effect of the spectral signatures of non-dominant species, as well as any visible shrub understory or bare ground, which could not be fully accounted for in this classification. Even so, a classification accuracy of over 67% demonstrates that such datasets can still be used to generate reliable results. This is encouraging, given that previous researchers reported stand effects explaining a similar amount of variance in LiDAR returns as species identity [25].
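The two aggregation steps described above (assigning each subplot its dominant species by stem count, and averaging the 1-m² hyperspectral pixels within each subplot) can be sketched as follows. The record layouts, subplot IDs, and species labels are hypothetical; ties in stem count are broken by whichever species was recorded first, which is an assumption, since the tie rule used in the study is not stated.

```python
from collections import Counter
from statistics import mean


def dominant_species_by_stem_count(trees):
    """Map each subplot to its most frequent species.

    trees: iterable of (subplot_id, species_code) records, one per stem.
    Ties go to the species first seen in that subplot (Counter insertion order).
    """
    counts = {}
    for subplot, species in trees:
        counts.setdefault(subplot, Counter())[species] += 1
    return {subplot: c.most_common(1)[0][0] for subplot, c in counts.items()}


def subplot_mean_reflectance(pixels):
    """Average per-band reflectance over all pixels falling in each subplot.

    pixels: iterable of (subplot_id, [reflectance for each band]) records.
    """
    grouped = {}
    for subplot, spectrum in pixels:
        grouped.setdefault(subplot, []).append(spectrum)
    # zip(*spectra) regroups the values band by band before averaging.
    return {
        subplot: [mean(band) for band in zip(*spectra)]
        for subplot, spectra in grouped.items()
    }
```

Swapping the stem count for another weighting (for example, summed basal area computed from DBH) changes only the first function, which is one way to test how sensitive the dominant-species labels are to the aggregation choice.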
Although the combination of spectral and structural data in this and future analyses will likely always necessitate data aggregation or spatial mismatch, our analysis suggests that the benefits of dataset fusion outweigh these costs. Both simplified lists of inputs combining data from all three data types include an intriguing mix of variables. Across forests, these simplified lists contain numerous variables related to leaf greenness and pigment concentrations. All hyperspectral reflectance bands selected for the simplified list containing all data types fall between 500 and 600 nm, the green portion of the spectrum. Similarly, the majority of vegetation indices included in the simplified lists relate to anthocyanin, carotenoid, and chlorophyll concentrations, either directly or as a measure of the red edge of the vegetation reflectance spectrum. As a complement, the LiDAR metrics in these lists include parameters representing broad structural features within forest canopies, including the P50 and P100 parameters widely used to quantify forest biomass and height in LiDAR inventory studies [95], as well as the Fract_All and density parameters, which provide insight into crown and canopy structure. The consistent selection of these parameters across sites indicates that the methodology used here can identify vegetation characteristics that are both fundamentally important and useful in distinguishing between tree species within a single region of forest canopy.
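To make the two kinds of selected variables concrete, the sketch below computes two pigment-related indices and two height percentiles for one subplot. The index formulas are the widely used Gitelson-style definitions, ARI = 1/R550 − 1/R700 and CRI1 = 1/R510 − 1/R550; whether these exact indices appear in Table 2 is an assumption. The percentile uses linear interpolation over all return heights, which may differ from G-LiHT's own P50/P100 definition (for example, in any minimum-height threshold). Band lookup by nearest centre wavelength is likewise an assumption.

```python
import math


def reflectance_at(wavelengths_nm, reflectance, target_nm):
    """Reflectance of the band whose centre wavelength is closest to target_nm."""
    i = min(range(len(wavelengths_nm)), key=lambda j: abs(wavelengths_nm[j] - target_nm))
    return reflectance[i]


def anthocyanin_reflectance_index(wl, refl):
    # ARI = 1/R550 - 1/R700
    return 1.0 / reflectance_at(wl, refl, 550) - 1.0 / reflectance_at(wl, refl, 700)


def carotenoid_reflectance_index_1(wl, refl):
    # CRI1 = 1/R510 - 1/R550
    return 1.0 / reflectance_at(wl, refl, 510) - 1.0 / reflectance_at(wl, refl, 550)


def height_percentile(heights, p):
    """p-th percentile (0-100) of LiDAR return heights, interpolating between ranks."""
    if not heights:
        raise ValueError("no returns in this subplot")
    h = sorted(heights)
    pos = (len(h) - 1) * p / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(h) - 1)
    return h[lo] + (h[hi] - h[lo]) * (pos - lo)
```

With this interpolation, P100 is simply the tallest return, which is why it tracks canopy height, while P50 reflects where the bulk of returns sit in the vertical profile.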
The choice of machine learner is also a key factor in determining the success of tree species classifications. In this analysis, neural networks, k-nearest neighbors, and random forest methods generally outperformed the other learners available through Orange. Historically, support vector machines (SVMs) have been used successfully on remotely sensed datasets, including in other recent attempts at tree species classification [37]. This is likely because SVMs are designed to handle datasets of very high dimensionality, which has made them the established standard for hyperspectral remote-sensing work [96]. On our datasets with reduced dimensionality, however, the strengths of other machine learning techniques may have led to their superior performance. Early work on neural networks highlighted their suitability for multisource datasets [97], and the neural network principle rests on the capacity of each neuron in the network to adjust as the network processes more or new data [98]. Similarly, the CN2 rules algorithm was designed to create rules that can be applied to data points that fit known classes well but imperfectly, rather than excluding all imperfect matches [99]. The benefit of such flexibility is apparent given the variability in growth form and leaf reflectance among individuals of the same species in a forest, although this analysis by no means confirms it as the precise reason for the high performance of these techniques here. Further work should explore within-species variability as an important factor in machine learning for tree species classification at the landscape scale.
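As an illustration of how one of the better-performing learners scores a small, reduced-dimensionality dataset, the sketch below implements k-nearest neighbors from scratch and evaluates it with leave-one-out resampling. It is not Orange's implementation; the Euclidean distance, k = 3, and the toy two-feature data are all assumptions chosen for clarity.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x."""
    # Sort (distance, label) pairs so the nearest neighbours come first.
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    # Ties go to the label seen first, i.e. the one with the nearest member.
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample in turn, predict it from the rest, and score."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

Because kNN relies on raw distances, mixing LiDAR heights in metres with reflectance values between 0 and 1 lets the height metrics dominate; in practice the features would be standardized, ideally using statistics from the training fold only.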

Conclusions
In this analysis, neural networks, k-nearest neighbors, and random forest methods were used to achieve high classification accuracies when distinguishing among tree species using simplified and optimized lists of hyperspectral and LiDAR variables. This analysis supports a growing body of knowledge on the utility of datasets containing complementary structural and spectral information. Given the potential for land-cover classification using LiDAR data on land surface properties [100], such fused datasets may better reveal the structure and shadowing effects of canopy gaps or other irregularities that would otherwise hinder species classification using spectral data alone. Previous work showed that combining aboveground biomass data with information on forest structure generated by the laser vegetation imaging sensor (LVIS) improves the ability of models to predict the size of forest carbon stocks [21]. It now appears that the combination of these two data types may also help identify tree species, opening up the possibility of generating species-specific carbon estimates with a similar combined dataset. Other researchers looking to the future of remote sensing have also highlighted the utility of LiDAR data in addressing large-scale questions, such as deforestation and carbon sequestration, in whole forests on a species-specific basis [1,31].
When looking to the future of multi-sensoral and fused datasets, one commonly cited challenge is the development or discovery of analytical methods that can properly integrate data collected by different sensors or by entirely different projects. While the variable reduction techniques used here showed mixed results depending on the exact set of inputs to each machine learner, dimensionality reduction based on classification tree nodes appears to be a technique worth trying on fused or multisource remote sensing datasets. In summary, data mining and machine learning interfaces like Orange offer a powerful means of optimizing classification workflows. Further work should be done to optimize the production of simplified datasets combining information from a variety of sensors in order to better understand, monitor, and quantify heterogeneously distributed tree species.
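The tree-node selection rule summarized in Table 4 (keep every variable that appears in a split within the first five levels of a classification tree fitted to the full dataset) can be sketched as a depth-limited traversal. The `Node` class below is a hypothetical stand-in for a fitted tree; Orange's tree objects are not reproduced here, and in scikit-learn the equivalent information lives in the fitted estimator's `tree_` arrays (`feature`, `children_left`, `children_right`). Feature names in the example are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; feature=None marks a leaf."""
    feature: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def features_in_top_levels(root: Node, max_depth: int = 5) -> List[str]:
    """Split variables used at depths 1..max_depth, in pre-order, without repeats."""
    selected: List[str] = []
    stack = [(root, 1)]  # the root counts as level 1
    while stack:
        node, depth = stack.pop()
        if node is None or node.feature is None or depth > max_depth:
            continue
        if node.feature not in selected:
            selected.append(node.feature)
        # Push right first so the left subtree is visited first.
        stack.append((node.right, depth + 1))
        stack.append((node.left, depth + 1))
    return selected
```

For a tree whose left branch chains P50, ARI, CRI1, P100, density, and R550 down six levels, with Fract_All as the root's right child, the function returns `['P50', 'ARI', 'CRI1', 'P100', 'density', 'Fract_All']`: R550 is dropped because it only appears at level six.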

Figure 1. Sample Orange workflow for comparing machine learning methods. In the exploratory analysis step (a), all available machine learning methods were used in combination with the full suite of available data. In the simplified analysis (b), only the two best-performing machine learners were used on a simplified list of input variables, one example of which is shown here.

Figure 2. Individual tree diameters at breast height (DBH) and heights by species. Summary of DBH data (a) and individual tree height data (b) for trees in experimental plots in Howland Experimental Forest and Penobscot Experimental Forest. Dots represent individual trees; overlaid box-and-whisker plots summarize the distribution of values by species for each forest.

Figure 3. Comparison of resampling techniques and machine learning methods using complete lists of metrics. The figure shows classification accuracies achieved by machine learners run on full datasets from Howland Experimental Forest (a) and Penobscot Experimental Forest (b). From left to right, columns represent classification accuracies produced with light detection and ranging (LiDAR) data, hyperspectral reflectance data, vegetation indices (VIs) calculated from these reflectance data, and a combined dataset of LiDAR and both types of hyperspectral data.

Figure 4. Classification accuracies achieved by machine learners run on simplified datasets from Howland Experimental Forest (a) and Penobscot Experimental Forest (b). Dot color represents the machine learning technique used in each case. Gray dots represent classification accuracies achieved during the exploratory analysis step using the full dataset and are shown for comparison.

Figure 5. Kappa coefficients achieved by machine learners run on simplified datasets from Howland Experimental Forest (a) and Penobscot Experimental Forest (b). Dot color represents the machine learning technique used in each case.

Table 1. Full list of light detection and ranging (LiDAR) metrics and abbreviations.

at 44°85′20″ N, 68°62′00″ W. Both sites are mixed coniferous–deciduous, predominantly evergreen forests. Data were collected in forest plots (11 plots in Howland Experimental Forest and 12 plots in Penobscot Experimental Forest) of 50 m × 200 m, each of which was divided into 16 subplots of approximately 25 m

Table 2. Select list of hyperspectral vegetation indices and abbreviations used in final analyses.

Table 3. Tree species abbreviations. Species dominant in one or more subplots in either forest are denoted with a letter indicating the forest: H for Howland Experimental Forest and P for Penobscot Experimental Forest. Species not followed by a letter are not the dominant species in any subplot in either forest.

Table 4. Simplified lists of machine learning inputs. Variables used in nodes within the first five levels of classification trees constructed on the full dataset were added to simplified lists serving as inputs for further machine learning analyses. Metrics used to construct the common list of LiDAR metrics are highlighted in gray to indicate shared status across forests.