A Comparison of Methods for Determining Forest Composition from High-Spatial-Resolution Remotely Sensed Imagery

Remotely sensed imagery has been used to support forest ecology and management for decades. In modern times, the propagation of high-spatial-resolution image analysis techniques and automated workflows has further strengthened this synergy, leading to inquiry into more complex, local-scale ecosystem characteristics. To appropriately inform decisions in forest ecology and management, the most reliable and efficient methods should be adopted. For this reason, our research compares visual interpretation to digital (automated) processing for forest plot composition and individual tree identification. During this investigation, we qualitatively and quantitatively evaluated the process of classifying species groups within complex, mixed-species forests in New England. This analysis included a comparison of three high-resolution remotely sensed imagery sources: Google Earth, National Agriculture Imagery Program (NAIP) imagery, and unmanned aerial system (UAS) imagery. We discovered that, although the level of detail afforded by the UAS imagery spatial resolution (3.02 cm average pixel size) improved the visual interpretation results (7.87–9.59%), the highest thematic accuracy was still only 54.44% for the generalized composition groups. Our qualitative analysis of the uncertainty for visually interpreting different composition classes revealed the persistence of mislabeled hardwood compositions (including an early successional class) and an inability to consistently differentiate between ‘pure’ and ‘mixed’ stands. The results of digitally classifying the same forest compositions produced a higher level of accuracy for both detecting individual trees (93.9%) and labeling them (59.62–70.48%) using machine learning algorithms including classification and regression trees, random forest, and support vector machines. These results indicate that digital, automated classification produced an increase in overall accuracy of 16.04% over visual interpretation for generalized forest composition classes. Other studies, which incorporate multitemporal, multispectral, or data fusion approaches, provide evidence for further widening this gap. Further refinement of the methods for individual tree detection, delineation, and classification should be developed for structurally and compositionally complex forests to supplement the critical deficiency in local-scale forest information around the world.


Introduction
The accurate identification of tree species is an important component of successful forest management [1,2]. For hundreds of years, societies have prepared land-cover maps to better understand and manage the distribution of vegetation communities [3][4][5]. While the methodologies to produce such spatial representations have changed dramatically, it is apparent that these generalizations still serve as an important tool for solving a number of environmental problems [6][7][8]. Many known drivers of ecosystem change and degradation stem from land-cover and land-use conversion at the local scale. For forested areas, this can mean a considerable reduction in neighboring area functionality and resource availability, in addition to the influences of direct land-cover transformation. With land-cover maps, and especially forest-cover-type maps, serving to guide critical management decisions and research understanding, it is important that their representations are as reliable and as detailed as possible [2,9]. Remotely sensed data have come to provide some of the most accurate and cost-effective ways of producing such forest composition information [1,10]. Modern high-spatial-resolution imagery, with 1 m or smaller pixel sizes, is becoming more attainable and, as such, is spurring a multitude of precision forestry applications [11][12][13][14]. Freely available high-resolution imagery from sources such as Google Earth provides users with one such tool for compiling local-scale information [15][16][17]. Despite the undeniable benefits that this imagery provides, the best practices to generate reliable and detailed forest-cover information are yet undetermined.
The classification of remotely sensed imagery generates thematic maps (or layers) by distinguishing individual features based on a selected classification scheme using the spectral, textural, and temporal characteristics of those map classes. The creation of thematic maps is one of the most common applications of remotely sensed imagery [6,18]. While there is a rich history of manually interpreted thematic layers, countless techniques have been developed using computer-based algorithms for reliably automating this procedure [10,14,19,20,21]. Identifying tree species through visual interpretation takes a trained specialist and remains time consuming for larger areas [2,10]. Today, it is more common for information on forest species to be produced using automated approaches and high-resolution remotely sensed data [2,14]. To sufficiently handle the increasing amount of digital remotely sensed data, an approach called digital image processing has also been developed to analyze and explore the characteristics of the acquired imagery [12,22,23]. The techniques for image classification are defined by several characteristics including simple or advanced, supervised or unsupervised, pixel-based or object-based [23,24]. The first distinction, simple or advanced, specifies whether the algorithm integrates machine learning as a function for separating the defined classes. Following breakthroughs in computer science, classification algorithms used in thematic mapping began to integrate artificial intelligence (AI) or machine learning in the mid-1990s [20,24,25]. Common and powerful examples of such classifications include decision trees (e.g., Classification and Regression Trees (CART) or random forests) and the support vector machine (SVM) algorithm [18,20,26,27].
The second distinction, supervised or unsupervised, specifies whether the algorithm relies on training data to base its assignments (supervised classifications) or if the user defines some clustering parameters used to divide the sample units to maximize separability (unsupervised classifications) [23]. While conventional, supervised and unsupervised algorithms are still used frequently for remote sensing image classification, machine learning methods have been found to generally perform better [20,28,29]. For the final distinction, pixel-based classifications (PBC) denote algorithms which operate on the smallest divisible unit of digital images, the pixel [23]. Object-based classifications (OBC), also known as OBIA, or GEOBIA, operate on homogenous image primitives, also termed image areas, polygons, objects, or segments [30][31][32][33]. PBC relies heavily on spectral data to assign class labels, taking into account only the spectral response of the individual pixels [34][35][36][37]. The increasing spatial resolution of remotely sensed data has caused subsequently greater challenges for positional registration. Due to these challenges, classification methods have shifted towards using homogenous windows (e.g., 3 × 3 or 5 × 5 pixels) and/or image objects [38,39]. OBC uses region-growing, thresholding, or clustering algorithms to segment images into more holistic units of analysis (e.g., individual tree crowns) [31,40]. OBC incorporates greater context into each individual unit, such as size, compactness, spectral or geometric heterogeneity, and spectral averages, while maintaining user defined thresholds for between object variability [41]. Like the preference for OBC over PBC, machine learning algorithms often allow for a greater number of inputs, reducing the reliance on spectral properties of individual pixels alone [41,42]. 
Deciding between remote sensing platforms (e.g., satellite, airborne, UAS, etc.), algorithms, and classification approaches is a choice dictated by the specific needs of the project and the characteristics of the source imagery [36,43]. To confront the constraints of time, money, and effort on site-specific (i.e., precision) forestry data collection, improved technologies need to be embraced [44][45][46][47]. No longer considered 'Dangerous, Dirty, and Dull contraptions' [48], UAS have been used in recent years for numerous high-precision applications [45,49,50,51,52]. Apart from the collection of raw imagery and videos, UAS imagery provides valuable information from 3D models created using Structure from Motion (SfM). The mathematical process behind SfM provides a nearly autonomous workflow for reconstructing 3-dimensional (3D) surfaces from numerous 2D projections (images) [53][54][55].
In our study we have two objectives. We compared the proficiency of visual interpretation to digital processing for forest plot composition and individual tree identification using UAS and other high-spatial-resolution remotely sensed imagery, such as satellite and airborne imagery. A similar investigation by Holbling et al. [56] compared manual and semi-automated classification approaches for landslide mapping. They determined that while there were obvious trade-offs in the techniques, the final accuracy varied depending on the study site. Therefore, in our first objective, we quantify the accuracy achieved when visually interpreting forest composition classes from three different sources of remotely sensed imagery (Google Earth, National Agriculture Imagery Program (NAIP), and Unmanned Aerial Systems (UAS)). We also provide a qualitative assessment of the uncertainty in forest composition mapping from visual interpretation when using these image sources. In our second objective, to provide a comparison of these results with digital (automated) approaches, we quantified the individual tree identification accuracy achieved using the NAIP and UAS imagery. Three supervised classification algorithms were used for this test: CART, random forests, and SVM. This investigation provided a critical evaluation of the methods used to support local-scale forest management, which, for many parts of the world, faces a severe deficiency in coverage [14,57,58]. We also specifically targeted UAS applications which can be adopted by a broad audience by implementing only true color sensors and straightforward classification frameworks. Our research counters studies which have adopted multispectral, multi-temporal, Light Detection and Ranging (lidar), or hyperspectral data for UAS-based classifications of individual tree species [59][60][61][62].
In doing so, we provided a novel investigation of the capability for UAS to enhance forest inventory assessments and extend the availability of structurally diverse and species rich forest species composition data and the most relevant methods to do so [45].

Study Areas
A combination of nine forested properties, located in southeastern New Hampshire were studied during this research. The properties included a total of 605.15 hectares (ha) of forested land comprising a variety of species compositions, forest successional classes, and stand structures (Figure 1). These sites were selected due to their availability of field-based inventory data (i.e., Continuous Forest Inventory (CFI) plots), and because of their limited management. The average size of these properties is 70.36 ha, while the smallest (Moore Fields) contains 17.2 ha of forested land cover. All but one of the properties, the Blue Hills Foundation lands, are owned and managed by the University of New Hampshire (Figure 2). These include College Woods, Kingman Farm, Thompson Farm, Moore Fields, East Foss Farm, West Foss Farm, Dudley, and Burley-Demeritt [63]. The Blue Hills study site is a contiguous forest conservation land, managed by the Harvard Forest.

Figure 1. Representation of the forest diversity found within each of the woodland properties. Displayed are the plot center (blue dot) for a singular Continuous Forest Inventory (CFI) plot, as well as a 30 × 30 m buffer placed around this plot. This UAS image shows a complex New England forest with a variety of species including eastern white pine (Pinus strobus, Linnaeus), eastern hemlock (Tsuga canadensis, (L.) Carrière), northern red oak (Quercus rubra, F. Michaux), red maple (Acer rubrum, L.), and American beech (Fagus grandifolia, (Ehrh.)). Each study site consisted of similarly high stem densities, with diverse and overlapping tree crowns.

Field Reference Data
Ground-based inventory designs are unique for each land manager. For UNH properties, forest inventory data are collected so that communities can be managed to maintain research integrity and characteristics of the broader New England region [63]. Individual CFI plots are positioned systematically throughout each property with one plot per hectare (2.47 acres) [64]. At each plot, an angle gauge methodology is used to select each measured tree with a probability proportional to its size [65]. The UNH woodlands office follows the regional recommendation of a basal area factor (BAF) of 4.59 m²/ha (20 ft²/acre) for the inventory [64,66]. Any tree with a sufficient basal area and proximity to the plot center is recorded as a representative of the broader forest stand. Such methods give each plot a variable radius instead of a fixed size. Each selected tree had several biophysical measurements taken, including species name, diameter at breast height (dbh), collection date, and a silvicultural code (i.e., live or dead). Bearing and distance from the plot center were also recorded for all measured trees. For several of the UNH woodlands included in this study, we elected to resample the plot locations and attributes ourselves to correct specific uncertainties. The newly resampled locations were chosen because the recorded positional accuracy appeared poor during exploratory data analysis and initial study [38]. The GPS receivers now available include Wide Area Augmentation System (WAAS) positional averaging for improved registration with remotely sensed imagery. These study sites included College Woods, Kingman Farm, East Foss Farm, and Thompson Farm.
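The variable-radius selection rule described above can be sketched in a few lines. This is a minimal illustration and not the inventory software used in the study: it assumes the standard horizontal point sampling relationship BAF = 2500 (d/R)², with dbh d and limiting distance R both in metres, and the function names are hypothetical.

```python
import math


def limiting_distance_m(dbh_cm, baf_m2_per_ha=4.59):
    """Largest plot-centre distance (m) at which a tree of this dbh is tallied.

    Assumes BAF = 2500 * (d / R)^2 with d and R in metres, which rearranges
    to R = 0.5 * dbh_cm / sqrt(BAF).
    """
    return 0.5 * dbh_cm / math.sqrt(baf_m2_per_ha)


def is_tallied(dbh_cm, distance_m, baf_m2_per_ha=4.59):
    """True when the tree is close enough to the plot centre for its size."""
    return distance_m <= limiting_distance_m(dbh_cm, baf_m2_per_ha)


def trees_per_hectare(dbh_cm, baf_m2_per_ha=4.59):
    """Stems per hectare represented by one tallied tree of this dbh."""
    basal_area_m2 = math.pi * (dbh_cm / 200.0) ** 2  # radius in metres = dbh_cm / 200
    return baf_m2_per_ha / basal_area_m2
```

Under these assumptions, a 30 cm tree is tallied out to roughly 7 m from the plot center, while a 60 cm tree is tallied out to roughly 14 m, which is why each plot has a variable radius.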
At the Blue Hills conservation lands in Strafford, NH, CFI plots follow a randomly generated distribution. Plot data, first collected in 2008, were distributed across upland forests following a GIS analysis which removed areas within 50 m of parcel boundaries and non-forested land cover. To minimize spatial autocorrelation and capture a larger extent of the forest, a 50 m minimum spacing between plots was also defined. Individual inventory plots were resampled in 2010 and 2017, with the addition of 20 new plots in 2017. At each plot location, fixed area (20 × 20 m) plots were generated in which all trees taller than 1.4 m and with a dbh greater than or equal to 2.5 cm were measured (in cm). During our inventory processing, however, vegetation recorded with a dbh smaller than 12.7 cm (5 inches) was filtered out and non-tree vegetation was removed, to present an estimate of species composition following a procedure similar to that used for the other study sites.
The training data used for analysis of the digital classification approaches (i.e., individual tree classifications) in this study were generated from a combination of (1) ground-based inventory trees that were remeasured specifically for this analysis and (2) visual interpretations of CFI plot measured trees that were cross-referenced by two experienced, undergraduate technicians [67]. A high-precision EOS Arrow 200 RTK GPS (EOS Positioning Systems Inc, Montreal, QC, Canada) with positional averaging was used to gather the locations of individual training trees for each class across several study areas [68]. These individual tree measurements (reference data) consisted of trees from a variety of sizes (dbh and height), as well as both dominant and co-dominant canopy classes. All of these trees were located within the core area of the contiguous forest and, as such, were a part of the contiguous forest canopy (see Figure 1 for stand heterogeneity). To ensure that a minimum number of both training and validation trees for each class were available, visual interpreters used a combination of the ground-based inventory trees, their local forestry knowledge (i.e., elements of visual interpretation for coniferous and deciduous trees), and specifically generated species-based training keys to generate additional reference data for several classes. For each class, 70 reference trees were collected for use in both the NAIP and the UAS supervised classifications. These reference data included each composition class found within our forest inventory plot classification scheme: white pine (Pinus strobus), eastern hemlock (Tsuga canadensis), other conifers (e.g., red pine (Pinus resinosa, Ait.)), American beech (Fagus grandifolia), oaks (Quercus spp.), red maple (Acer rubrum), other hardwoods (e.g., shagbark hickory (Carya ovata, (P. Mill. Koch))), and early successional forest species.

Remotely Sensed Imagery
To evaluate the use of visual interpretation for forest plot composition, three image types were selected in this study: Google Earth imagery, NAIP imagery, and UAS imagery. To evaluate the use of digital image analysis for forest plot composition and tree identification, only the NAIP and UAS imagery were used. Our analysis began with evaluating visual interpretation because numerous research projects opt for visual interpretation of remotely sensed imagery as their source for reference data (e.g., Google Earth or airborne imagery) [14,16,69,70]. These data yield a synoptic view, can be cost effective, in modern times are high resolution, and in some cases provide multi-date or multispectral inferences. The Google Earth images are based on true color (RGB) high-resolution satellite imagery composites with the most current, cloud free, and seamless appearance [71]. For our study areas these included satellite imagery captured during the beginning of October 2018 and October 2020, with a variety of sensors including Landsat, Sentinel, and digital aerial photography (https://support.google.com/earth/answer/6327779?hl=en#zippy=%2Csatellite-aerial-images, last accessed 18 September 2021). The maximum resolution for the global coverage in Google Earth, however, is 15 m, with many areas featuring a much higher spatial resolution. The 2018 U.S. National Agriculture Imagery Program (NAIP) imagery maintains the same specifications as the imagery collected in 2016 [12,72]. That is, New Hampshire was collected at a 60 cm spatial resolution with 4 spectral bands (Blue, Green, Red, and near infrared (NIR)). For our study sites, NAIP imagery was collected between 6 August and 16 October 2018.
Two fixed-wing unmanned aircraft, the senseFly eBee Plus and eBee X, deployed with true-color sensors, were used to capture the UAS imagery for this research [73,74]. The eBee Plus was deployed with its associated Sensor Optimized for Drone Applications (SODA) while the eBee X was operated with the senseFly Aeria X sensor [75,76]. While the eBee X flight characteristics and camera quality are an improvement over the eBee Plus, hardware and logistical constraints required that several study areas were flown using the eBee Plus to ensure that summer leaf-on imagery (e.g., May-August in 2018, 2019, and 2020) could be captured. Both UASs were piloted using the eMotion flight management software (v3.15 and v3.19) (eMotion 2021), (senseFly, Genève, Switzerland) [77]. The preferred flight parameters were based on the results of previous research [34,78]. All missions were conducted with 85% forward overlap, 90% side overlap, winds perpendicular to the flight lines, consistent sun angle and exposure, and flight height at the Federal Aviation Administration (FAA) sUAS limit of 121.92 m (400 ft) [34,51,78].
Following the collection of the UAS imagery, the individual image locations were post-processed using the National Oceanic and Atmospheric Administration (NOAA) Continuously Operating Reference Stations (CORS) network RINEX files and the given eBee flight log [79]. These positionally corrected images were then transferred to Agisoft MetaShape v1.5.5 (Agisoft LLC, St. Petersburg, Russia) for SfM modeling. Our processing workflow started with a 'high-accuracy' image alignment to ensure that the maximum number of images could be aligned while still maintaining a precise alignment. Next, the 'ultrahigh-quality' settings were selected to create the dense point cloud, digital elevation model (DEM), and orthomosaic. This maximum-quality setting ensured that the DEM, which is the foundation of the segmentation process in the next section, was generated using the full resolution of the imagery [80]. For each study area, an ultrahigh-resolution true-color (RGB) orthomosaic and DEM were generated. These spatial data products ranged in spatial resolution from 2.53 cm to 3.6 cm with an average pixel size (ground sampling distance) of 3.02 cm.

Classification Scheme
The characterization of New England forest cover types is inherently difficult because of the density and species diversity of the trees [58,81]. Due to New England being a transition zone between boreal forests (to the north) and temperate hardwoods (to the south), there is a heterogeneous distribution of communities which must be captured even over small areas [43]. Several classification schemes exist for forest-cover types in this region including Eyre [82], Pugh [43], Justice et al. [83], and MacLean et al. [84]. Each classification scheme uses the overstory tree species composition as a means of subdividing community types. The goal of our classification was to provide knowledge of the distribution of ecologically and economically similar forest stands. To best suit this goal and capture prominent and unique communities, we adopted and modified the scheme given by Pugh [43]. We began by defining forested land-cover areas and individual trees. Here, we used the definition of forest by Anderson [85]: areas that have 10% or more aerial tree-crown density (coverage), are capable of producing timber, and are influential on the climate or water regime. Our definition of trees, based on the above forest-inventory methods, reflects woody vegetation with a minimum height of 3 m and a minimum diameter of 12.7 cm (5 inches). The first level of our classification hierarchy (i.e., the generalized composition classes) distinguishes coniferous forests, mixed forests, deciduous forests, and early successional forests. Coniferous forests are forests which are dominated by tree species, comprising an overstory with greater than 66.6% basal area per unit area coniferous species. Mixed forests are forests which are dominated by tree species, comprising an overstory with less than or equal to 66.6% basal area per unit area and greater than or equal to 33.3% basal area per unit area coniferous species.
Deciduous forests are forests which are dominated by tree species, comprising an overstory with less than 33.3% basal area per unit area coniferous species. Lastly, early successional forests include forests which represent highly distinct tree composition and structure, are representative of unique ecosystem function and management, and are a key element of the New England landscape [86]. Here, we included birch (Betula spp., Marsh.), ash (Fraxinus spp., L.), and aspen (Populus spp. Michx. (Salicaceae)) mixtures (not found in the previous classification scheme) within this 'early successional' category as an example of distinct early successional forests. The full definitions of each class within the next, more specific, level of forest classification are as follows:

Coniferous (Softwood)
White pine-any forested land surface dominated by tree species, comprising an overstory canopy with greater than 70% basal area per unit area eastern white pine. Hemlock-any forested land surface dominated by tree species, comprising an overstory canopy with greater than 70% basal area per unit area eastern hemlock. Mixed conifer-any forested land surface dominated by tree species, comprising coniferous species other than white pine or eastern hemlock (or a combined mixture of these species) that comprises greater than 66% basal area per unit area of the overstory canopy.

Mixed Forest
Mixed forests-any forested land surface dominated by tree species, comprising a heterogenous mixture of deciduous and coniferous species each comprising greater than 20% basal area per unit area composition. Important species associations include eastern white pine and northern red oak (Quercus rubra), red maple (Acer rubrum), white ash (Fraxinus americana, Marsh.), eastern hemlock, and birches.

Deciduous (Hardwood)
Red maple-any forested land surface dominated by tree species, comprising an overstory canopy with greater than 50% basal area per unit area red maple. Oak-any forested land surface dominated by tree species, comprising an overstory canopy with greater than 50% basal area per unit area white oak (Quercus alba, L.), black oak (Quercus velutina, Lam.), northern red oak, or a mixture. American beech-any forested land surface dominated by tree species, comprising an overstory canopy with greater than 25% basal area per unit area American beech composition. This unique class takes precedence over other mentioned hardwood classes if present. Mixed hardwoods-any forested land surface dominated by tree species, comprising deciduous species other than red maple, oak, or American beech (or a combined mixture of these species) that comprises greater than 66% basal area per unit area of the overstory canopy.

Early Successional
Early successional-any forested land surface dominated by tree species, comprising an overstory composition that is highly distinct including areas dominated by early successional species such as paper birch (Betula papyrifera, Marsh.), white ash (Fraxinus americana), or aspen (Populus spp.).
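
The first-level (generalized) thresholds in this classification scheme can be expressed as a short decision rule. The sketch below is illustrative only: the function name is hypothetical, and because the scheme gives no numeric cut-off for the early successional class, that decision is passed in as a flag set by the analyst (an assumption of this example).

```python
def generalized_class(conifer_basal_area, total_basal_area, early_successional=False):
    """Assign a generalized composition class from overstory basal area.

    Thresholds follow the text: conifer share > 66.6% is coniferous,
    33.3% to 66.6% (inclusive) is mixed, and < 33.3% is deciduous.
    The early_successional flag is an assumption of this sketch.
    """
    if total_basal_area <= 0:
        raise ValueError("total basal area must be positive")
    if early_successional:
        return "early successional"
    conifer_share = 100.0 * conifer_basal_area / total_basal_area
    if conifer_share > 66.6:
        return "coniferous"
    if conifer_share >= 33.3:
        return "mixed"
    return "deciduous"
```

For example, a plot with 8 of 10 m²/ha in conifers falls in the coniferous class, while a plot with 4 of 10 m²/ha in conifers falls in the mixed class.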

Forest Composition from Visual Interpretation
Accuracy/Uncertainty in Visual Interpretation
At each CFI plot location, a 30 × 30 m fixed area was registered to the plot center. Two trained forest technicians interpreted and independently assigned a forest composition class to each NAIP, Google Earth, and UAS inventory plot sample. Any plot that was not interpretable in the imagery or was not labeled forest on any of the imagery sources was removed. This filtering of poor-image-quality locations resulted in a final sample size of 408 inventory plots. Each individual sample was interpreted a minimum of three times by each technician so that a combined consensus for each interpreter (rather than a single estimation) was determined. The majority composition, or mixture of classes, was then used to label the final plot composition for each source of imagery (see Fraser and Congalton [38]). A thematic map accuracy assessment error matrix was then used to quantitatively compare the plot-level agreement for each imagery source to the field reference data [39]. To aid the manual interpretation, training keys for each composition class were created for selected CFI plot locations for each image source. These training keys provided clear examples of each individual species and a distinct threshold between the forest classes. Additionally, both visual interpreters were trained using local reference imagery and the elements of visual interpretation regarding both coniferous and deciduous forest canopy characteristics. To ensure that each inventory plot was labeled on the basis of a consensus and not a single visual assessment, both interpreters classified each sample three times, leading to a total of six trials for each source of imagery. The consensus of these six trials was used to label the final composition for each inventory plot. The agreement (or conversely variability) of these six trials was investigated during our qualitative analysis of visual interpretation uncertainty.
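The consensus labeling and error-matrix comparison described above can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: the function names are hypothetical, and reporting a tie as a joined "mixture" label is an assumption of this example.

```python
from collections import Counter


def consensus_label(trials):
    """Return the majority label of the interpretation trials and its agreement.

    Ties are reported as a joined label such as 'oak/red maple'; this tie
    rule is an assumption of the sketch.
    """
    counts = Counter(trials)
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    return "/".join(winners), top / len(trials)


def error_matrix(reference, mapped):
    """Count plots for every (mapped label, reference label) pair."""
    return Counter(zip(mapped, reference))


def overall_accuracy(reference, mapped):
    """Share of plots whose mapped label matches the field reference label."""
    correct = sum(1 for ref, lab in zip(reference, mapped) if ref == lab)
    return correct / len(reference)
```

With six trials per plot, the agreement value returned by `consensus_label` ranges from 1/6 to 1 and gives a simple measure of how variable the interpretations were for that plot.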
Following the quantitative analysis of visual interpretation accuracy using each of the three remotely sensed imagery sources, we conducted a qualitative assessment of both specific and generalized composition class uncertainty. This qualitative assessment included a review of a minimum of four inventory plots, randomly selected from each composition class. In total, 36 of the original 408 plots were sampled. We then analyzed the variability and misclassification of such plots across each of the three interpretation trials that both visual interpreters conducted (six in total). This test was completed for each of the three imagery sources so that similarities and differences in their ability to label individual classes could be better understood. We applied this qualitative analysis to both the more specific scheme of nine composition classes and the generalized scheme of four composition classes. A flow chart for both the visual interpretation and the digital classification methodologies can be seen in Figure 3.

Image Segmentation and Tree Detection
To evaluate the digital classification approaches, both NAIP and UAS imagery were segmented and classified using three supervised classification algorithms. The Google Earth imagery was not classified using these methodologies as the data were only hosted within the Google Earth Pro software v7.3.4 (Google, Mountain View, CA, USA) and, therefore, could not be digitally processed.
We used a multiresolution segmentation technique found within eCognition v9.1 (Trimble, Munich, Germany) to delineate individual tree crowns on the NAIP imagery. This segmentation algorithm was selected due to the lack of tree crown morphology data (such as a DEM) matching the resolution of this imagery. A range of segmentation scale, color/shape, and compactness/smoothness parameters were tested (e.g., scale ranging from 10 to 600 (intervals of 10), color/shape ranging from 0.1 to 0.7 (intervals of 0.1), and compactness/smoothness ranging from 0.3 to 0.8 (intervals of 0.1)). The results of these segmentation parameter combinations were evaluated both qualitatively (i.e., visually) and quantitatively in comparison to manually digitized reference trees (i.e., polygons) at several of our study areas. For the quantitative assessment, we calculated the over-segmentation accuracy (Oa), under-segmentation accuracy (Ua), and quality rate (QR) of each parameter combination (see Gu et al. [80]) for over 200 digitized reference trees [87,88]. The equations for Oa, Ua, and QR are included below. The goal of this segmentation was to provide pure tree species segments, which dictated that over-segmentation took priority over the other evaluation metrics. A subset of the best-performing (quantitatively) results was then visually reviewed to select the best fit.
Following the selection of an optimal parameter combination, individual tree crowns were delineated on the NAIP imagery. A total of 29 object-level features (spectral, textural, and geometric attributes) were calculated for use in the supervised classification algorithms (see Table A1 in Appendix A). Two of these features, the mean and the standard deviation of the near-infrared (NIR) band, were unique to the NAIP imagery.
In these three equations, r_i denotes the i-th reference polygon, and s_i represents the i-th segmented polygon that overlaps with r_i. The symbol ∩ denotes the intersection of two polygons, while ∪ denotes their union [80,87,88].
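To make the overlap-based segmentation metrics concrete, the sketch below computes them for reference and segment polygons represented as sets of raster cells. The formulas used here are an assumption, not taken from the text: Oa as |r ∩ s| / |r|, Ua as |r ∩ s| / |s|, and QR as 1 − |r ∩ s| / |r ∪ s|, each averaged over the matched pairs. The function names and the toy polygons are hypothetical.

```python
def block(row0, col0, height, width):
    """Return a rectangular 'polygon' as a set of (row, col) raster cells."""
    return {(r, c) for r in range(row0, row0 + height)
            for c in range(col0, col0 + width)}


def segmentation_metrics(pairs):
    """Average overlap metrics over (reference_cells, segment_cells) pairs.

    Assumed definitions (not confirmed by the source):
      Oa = |r ∩ s| / |r|,  Ua = |r ∩ s| / |s|,  QR = 1 - |r ∩ s| / |r ∪ s|
    """
    oa = ua = qr = 0.0
    for reference, segment in pairs:
        intersection = len(reference & segment)
        union = len(reference | segment)
        oa += intersection / len(reference)
        ua += intersection / len(segment)
        qr += 1 - intersection / union
    n = len(pairs)
    return oa / n, ua / n, qr / n


# Toy case: a 4 x 4 reference crown covered by a segment half its size.
reference_crown = block(0, 0, 4, 4)
over_segment = block(0, 0, 2, 4)
oa, ua, qr = segmentation_metrics([(reference_crown, over_segment)])
print(oa, ua, qr)
```

In this toy case the segment lies entirely inside the reference crown, so the segment is pure (high Ua) but covers only half of the crown (low Oa), which is the kind of trade-off accepted when pure single-species segments are the priority.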
Segmentation of the UAS imagery was conducted using a marker-controlled watershed segmentation (MCWS) technique [80,88]. This MCWS workflow consisted of several stages, each reliant on the 3D tree crown data available for each study area. We began by creating an ultrahigh-resolution canopy height model (CHM) based on the DEMs. A 2 m New Hampshire lidar bare earth dataset was used to adjust the SfM DEMs to height above elevation (i.e., terrain) values [89]. Next, we applied a Gaussian filter to this raster dataset to diminish excessive pits and peaks (i.e., noise) in the data [80,88,90]. To begin the individual tree detection and delineation (ITDD) process, we applied a local maxima filter, with a fixed window size, to the final filtered CHM to establish the MCWS markers (i.e., individual tree crowns). A fixed, circular window size of 45 cells (~1.65 m) was chosen for this step. This window size was selected during initial testing because it met our objective of avoiding under-segmentation (omission error) as much as possible, thus allowing the generation of tree segments that represented only single species. Other similar studies for this region selected larger fixed window sizes for the purpose of determining the best performance for individual tree delineation as represented by QR at the expense of greater omission error [52,67,80]. To quantify the individual tree detection error, we calculated the object detection rate (ODR), as well as the over-detection (over-segmentation or commission) and under-detection (under-segmentation or omission) by comparison with over 200 digitized reference trees [67,80,91]. The next stage in the MCWS process consisted of masking the non-forested areas and large canopy gaps on the basis of a minimum height threshold. A height threshold of 6 m was applied to the CHM prior to delineating the individual tree crowns [92].
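The marker-detection stages above can be illustrated with a small pure-Python sketch. It assumes the CHM is the surface model minus the bare-earth model, omits the Gaussian smoothing step for brevity, and folds the height threshold into the marker search as a simplification. The function names, the toy rasters, and the tiny window radius are hypothetical and do not reproduce the authors' workflow or the 45-cell window.

```python
def canopy_height_model(surface, terrain):
    """Subtract the bare-earth raster from the surface raster, cell by cell."""
    return [[s - t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(surface, terrain)]


def local_maxima(chm, radius, min_height):
    """Return (row, col) cells that are strictly higher than every other
    cell inside a circular window of the given radius (in cells)."""
    rows, cols = len(chm), len(chm[0])
    offsets = [(dr, dc)
               for dr in range(-radius, radius + 1)
               for dc in range(-radius, radius + 1)
               if (dr or dc) and dr * dr + dc * dc <= radius * radius]
    treetops = []
    for r in range(rows):
        for c in range(cols):
            height = chm[r][c]
            if height < min_height:
                continue  # below the canopy threshold: not a candidate marker
            neighbors = [chm[r + dr][c + dc] for dr, dc in offsets
                         if 0 <= r + dr < rows and 0 <= c + dc < cols]
            if all(height > other for other in neighbors):
                treetops.append((r, c))
    return treetops


# Toy rasters (elevations in metres); values are illustrative only.
surface = [[105, 106, 105, 104, 105],
           [106, 112, 106, 105, 107],
           [105, 106, 105, 107, 115],
           [104, 105, 104, 106, 107]]
terrain = [[100] * 5 for _ in range(4)]

chm = canopy_height_model(surface, terrain)
treetops = local_maxima(chm, radius=1, min_height=6)
print(treetops)
```

A smaller window yields more markers (and more over-segmentation), while a larger window merges neighboring crowns into one marker, which mirrors the omission/commission trade-off discussed in the text.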
The final stage of the MCWS process applied the segmentation algorithm, which was initialized at the given markers and delineated tree boundaries using the height gradients from the CHM [80]. Similar to the NAIP segmentation results, the final UAS tree segments were quantitatively and qualitatively evaluated against manually digitized reference trees using the Oa, Ua, and QR metrics [88]. After this assessment of segmentation quality, 26 spectral, geometric, and textural features were generated for each tree segment using eCognition, which were then available for use in the digital classification approaches (see Table A1 in Appendix A).
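As an illustration of how a segmentation can grow outward from markers, the sketch below implements a simplified priority-flood region growing on a toy CHM. This is a stand-in for marker-controlled watershed segmentation, not the algorithm or software the study used: cells are claimed in order of descending height, starting at the markers, and cells below a height threshold are left unlabeled. All names and values are hypothetical.

```python
import heapq


def marker_flood(chm, markers, min_height):
    """Grow one labeled region per marker, visiting cells from tallest to
    shortest; cells below min_height keep the background label 0."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    heap = []
    for label, (r, c) in enumerate(markers, start=1):
        labels[r][c] = label
        heapq.heappush(heap, (-chm[r][c], r, c))  # negate: tallest pops first
    while heap:
        _, r, c = heapq.heappop(heap)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if labels[nr][nc] == 0 and chm[nr][nc] >= min_height:
                labels[nr][nc] = labels[r][c]
                heapq.heappush(heap, (-chm[nr][nc], nr, nc))
    return labels


# Toy CHM (metres) with two crowns separated by a low gap in the middle.
chm = [[7, 9, 6, 8, 10],
       [8, 12, 5, 9, 14],
       [7, 9, 6, 8, 10]]
markers = [(1, 1), (1, 4)]  # hypothetical treetop locations

segments = marker_flood(chm, markers, min_height=6)
for row in segments:
    print(row)
```

The 5 m cell between the two crowns stays unlabeled because it falls below the 6 m threshold, analogous to masking canopy gaps before delineation.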

Automated Classifications
Three supervised classification algorithms were applied to tree segments generated from the NAIP and UAS data to label each segment as one of the classes in the classification scheme. Multiple classification algorithms were implemented because of their often contradictory performance in other studies [20]. First, we applied a single decision tree (CART) to determine whether the complexity of our forests could be differentiated by a simpler classifier [18,27]. Second, we applied a random forest (RF) ensemble classifier, made up of 500 decision trees, to these same tree segments [20,26]. We used the Gini index for this classification to control the decision tree splits [93,94]. For both of these decision tree-based classifications, the mean decrease in impurity (MDI) was calculated for each of the included features so that an optimal combination of input data could be selected. In other words, individual features with the lowest scores could be pruned to both reduce the dimensionality of the classification and improve the overall accuracy. For the final supervised classification algorithm, we implemented a support vector machine (SVM) classifier based on the one-against-one form [29,95]. A linear kernel was selected for the kernel function [20]. All three of these classifications were performed in Python using the scikit-learn package and with all of the available geometric, spectral, and textural features [29,96–98]. Using this package, a number of procedures for selecting the training and validation samples were implemented.
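For readers unfamiliar with the Gini index, the following sketch shows the textbook impurity calculation and the impurity decrease produced by a single split. MDI, as commonly defined, accumulates such decreases over every split that uses a given feature and averages them across the trees in the ensemble. The class labels and the example split are hypothetical, and this is not the scikit-learn implementation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node with four conifer (C) and four deciduous (D) segments.
parent = ["C"] * 4 + ["D"] * 4
left = ["C", "C", "C", "D"]    # segments on one side of a feature threshold
right = ["C", "D", "D", "D"]   # segments on the other side

print(gini(parent), impurity_decrease(parent, left, right))
```

A feature whose splits rarely reduce impurity receives a low MDI score, which is the basis for pruning it from the feature set.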
These included (1) splitting the reference data to achieve a minimum validation sample size of 30 samples per class (i.e., 55% training and 45% validation), (2) splitting the reference data to achieve a minimum validation sample size of 30 samples per class and removing negatively influential features based on the MDI scores, (3) splitting the reference data to achieve a 65% training/35% testing split, and (4) conducting a permutation-based out-of-bag validation with 3% of the total sample size selected for validation. We then elected to apply the procedure that both achieved the highest overall accuracy and maintained a statistically valid accuracy assessment [39]. Each accuracy assessment for each of the classification methods and imagery sources was performed 10 times so that an average of the overall accuracy could be recorded.
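The first splitting procedure can be sketched as a per-class (stratified) split that holds out a fixed share of each class for validation but never fewer than 30 samples. This is a minimal illustration under assumptions: the function, its defaults, and the synthetic reference data are hypothetical, and the authors' actual scikit-learn calls are not given in the text.

```python
import random
from collections import Counter, defaultdict


def stratified_split(samples, validation_fraction=0.45, min_validation=30, seed=0):
    """Split (features, label) pairs per class into training and validation lists."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))

    train, validation = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Hold out the requested share, but never fewer than the minimum.
        n_val = max(min_validation, round(len(members) * validation_fraction))
        if n_val >= len(members):
            raise ValueError(f"class {label!r} has too few samples to split")
        validation.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, validation


# Synthetic reference data: feature vectors are placeholders.
samples = [((i,), "C") for i in range(100)]
samples += [((i,), "D") for i in range(80)]
samples += [((i,), "ES") for i in range(60)]

train, validation = stratified_split(samples)
validation_counts = Counter(label for _, label in validation)
print(validation_counts)
```

Repeating the split with different seeds and averaging the resulting overall accuracies corresponds to the 10 repeated assessments described above.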

Accuracy/Uncertainty in Visual Interpretation
The accuracy achieved when visually interpreting forest inventory plot-level compositions using the Google Earth, NAIP, and UAS imagery was evaluated for both the nine-class and the four-class composition schemes. The sample sizes and labels for these classes can be seen in Table 1. In total, 408 forest inventory plots were classified for each of the three imagery sources. A large portion of these plots, according to the field-inventory data, were coniferous (a combination of white pine, eastern hemlock, and mixed conifer composition classes). Plot-level classification accuracies using each of the three high-resolution imagery sources were low given the species complexity of these forests (see Table A2 in Appendix A). The overall accuracy for interpreting nine classes using the Google Earth imagery was 29.9%. Classes such as AB, EH, RM, and OAK showed the lowest thematic accuracies. When generalized to only four classes, the overall accuracy using the Google Earth imagery increased to only 44.85%. Interpreting these same plots using the NAIP imagery resulted in a similar performance. Our nine-class thematic accuracy was 31.86%, while the generalized four-class assessment resulted in an overall accuracy of 46.57%. Both the nine-class and the four-class interpretation accuracies were higher when using the highest-spatial-resolution UAS imagery. The forest inventory plot compositions reached an overall accuracy of 39.46% for nine classes (Table 2) and 54.44% for four composition classes (Table 3).
We conducted a qualitative assessment of the uncertainty incurred during the visual interpretation of complex, mixed-species forests. Our assessment included the labeling of plot-level composition across Google Earth, NAIP, and UAS imagery. Table 4 shows a subsample of 36 plots where the results of the field data are compared to the visual interpretation results.
For example, we see that the first OAK plot (Table 4) comprised 81.2% OAK, with the remainder of the composition (18.2%) being American beech. With the proportion of OAK being greater than 50%, according to the field data, each of the interpretations should have also labeled the plot as OAK; however, there were several instances in which the interpreter labeled the plot as mixed hardwoods (MH). An MH classification would indicate that the plot was visually interpreted as having greater than 66% deciduous composition, while also consisting of less than 50% OAK composition and less than 20% AB composition. The final eastern hemlock (EH) plot, containing 87.5% EH, was mislabeled once as mixed forest (MF) and twice as mixed conifer (MC).
These interpretations indicated that the interpreters did not recognize a composition containing greater than 70% EH. For each of the four AB plots, with six interpretation trials each, these plots were only mislabeled as coniferous-dominated once. The most common misclassification of AB plots was MH. This misclassification of AB as MH indicated that both interpreters did not recognize forest compositions containing greater than 20% AB. Visual interpretations conducted using the Google Earth imagery showed large amounts of uncertainty for all plots other than those heavily dominated by white pine (WP) or mixed forest (MF). Of the 36 plots that were included in this analysis, only three reported a consensus (four or more labels in agreement) for the correct forest composition. The NAIP imagery visual interpretations fared slightly better. For these assessments, most classes were correctly labeled at least once for most plots. Composition classes such as WP, MH, and early successional (ES) were correctly identified more often with the NAIP imagery than the Google Earth imagery. Nevertheless, only five of the 36 plots were interpreted with a majority agreement for the correct composition. When interpreting the UAS imagery, there was a noticeable decrease in the uncertainty for identifying individual species (e.g., American beech and red maple). MH, however, showed a noticeable drop in successful identifications when using the UAS imagery. Although individual classes were correctly identified more often, there was still a low percentage of classes which formed an agreement for the correct forest composition. Six of the 36 plots (16.7%) interpreted using the UAS imagery resulted in a majority agreement for the correct composition class.
We also assessed the uncertainty in visual interpretations when the forest classes were generalized to conifer forest (C), deciduous forest (D), mixed forest (MF), and early successional forest (ES). For the Google Earth and NAIP interpretation assessments of forest composition, there was a less obvious contrast between the uncertainty incurred in labeling four classes and the uncertainty in labeling nine classes. Much of the misclassification for both imagery sources resulted in commission to the MF class, instead of a similar species dominance. Using the Google Earth imagery, nine of the 36 inventory plots were labeled correctly, according to a majority agreement.
With the NAIP imagery, 11 of the 36 plots reported a majority agreement for the correct forest composition. In Table 5, we see the plot-level interpretations using the UAS imagery. Classes such as WP, OAK, and American beech (AB) had fewer misclassifications at this level of generalization. The third ES plot (from the top), containing a 100% ES basal area composition, was still mislabeled as deciduous during all trials. The third WP plot (from the top) was incorrectly labeled MF during five of the six trials, despite containing only 8.3% OAK and 8.3% ES composition. Many of the MF plots were incorrectly labeled as either coniferous- or deciduous-dominated. Using the UAS imagery to visually interpret four generalized forest composition classes at the plot level resulted in the lowest amount of uncertainty. Overall, 28 of the 36 plots (77.78%) were labeled with a consensus for the correct forest composition.
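The consensus rule used in this qualitative assessment (four or more of the six interpretation labels in agreement) can be expressed in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the trial labels are constructed to match two cases described in the text rather than copied from Table 5.

```python
from collections import Counter


def consensus_label(trial_labels, min_agreement=4):
    """Return the most frequent label if at least `min_agreement` trials
    agree on it; otherwise return None (no consensus)."""
    label, votes = Counter(trial_labels).most_common(1)[0]
    return label if votes >= min_agreement else None


# A white pine (WP) plot labeled mixed forest (MF) in five of six trials.
wp_trials = ["MF", "MF", "MF", "MF", "MF", "WP"]
# An early successional (ES) plot labeled deciduous (D) in all six trials.
es_trials = ["D"] * 6
# A plot on which the interpreters split evenly.
split_trials = ["C", "C", "C", "MF", "MF", "MF"]

print(consensus_label(wp_trials), consensus_label(es_trials), consensus_label(split_trials))

# Share of plots with a correct consensus when 28 of 36 plots qualify.
print(round(100 * 28 / 36, 2))
```

Note that a consensus can form around an incorrect label, as in the first two cases, so agreement among interpreters is necessary but not sufficient for a correct plot label.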

Image Segmentation and Tree Detection
Quantitative metrics (Oa, Ua, and QR) were used to determine an optimal set of multiresolution segmentation parameters to delineate individual tree crowns within the NAIP imagery. The optimal selection of segmentation parameters (for use in eCognition for the multiresolution segmentation technique) included a scale parameter of 10, a color/shape of 0.2, and a compactness/smoothness of 0.5. Measuring the correspondence of these tree segments to 230 reference trees resulted in an Oa of 0.382, a Ua of 0.849, and a QR of 0.657.
For the MCWS of the UAS CHM and orthomosaic, we began by assessing the individual tree detection accuracy. A total of 231 samples were used for this assessment (Table 6). The 45-cell fixed window size led to an overall detection accuracy of 93.9%. This detection rate combines the reference trees that were detected as a single canopy (correct or 1:1 detection) with those that were detected as multiple trees. In other words, only 6.1% of the 231 reference trees were not detected (under-detection or omission error). While a smaller window size did eventually remove the omission error, it caused every tree to be heavily over-segmented. A larger window size increased the omission error (under-detection) to greater than 10%.
Table 6. Individual tree detection accuracy for the unmanned aerial system (UAS) imagery segmentation.
85        132       14       231
36.80%    57.14%    6.1%
Overall Detection Accuracy: 93.9%
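The detection figures in Table 6 can be reproduced with simple arithmetic. In this sketch, the 14 undetected trees follow from the 6.1% omission error stated in the text; the two remaining counts are both treated as detected trees, because this extract does not indicate which of them is the 1:1 category. Variable names are illustrative.

```python
# Counts from Table 6. The two detected categories (1:1 detection and
# detection as multiple trees) are not distinguished here.
detected_counts = [85, 132]
undetected = 14

total = sum(detected_counts) + undetected
detection_rate = 100 * sum(detected_counts) / total
omission_rate = 100 * undetected / total
category_shares = [round(100 * count / total, 2) for count in detected_counts]

print(total, round(detection_rate, 1), round(omission_rate, 1), category_shares)
```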
Continuing through the MCWS process, we quantitatively evaluated the final segmentation results against these same 231 reference samples [93]. These UAS tree segments resulted in an Oa of 0.73, a Ua of 0.523, and a QR of 0.6438.

Digital Classifications
Both NAIP and UAS imagery were evaluated for their effectiveness in identifying individual trees using three supervised digital classification algorithms. The sample sizes for each of the eight composition classes for both imagery sources are included in Table 7. Since this approach was conducted for labeling individual trees, the mixed forest class was not possible. Individual tree digital classifications using the segmented NAIP imagery were generated using the CART, RF, and SVM classifiers. Following the examination of feature importance scores (see Figure A1 in Appendix A), we removed the gray-level co-occurrence matrix (GLCM) contrast, GLCM dissimilarity, border index, and gray-level difference vector (GLDV) contrast for the NAIP imagery CART and RF classifications. This removal resulted in an increase in overall accuracy of 1.13% and 1.55% for CART and RF, respectively. The overall accuracy of labeling eight classes for the three classifiers was 21.44% (CART), 29.23% (RF), and 29.36% (SVM).
The digital classification of eight composition classes using UAS imagery resulted in higher overall accuracies for each of the three supervised classifiers. For this imagery, the least important features were asymmetry, density, shape index, radius of the short ellipsoid, and compactness (see Figure A2 in Appendix A). The removal of these features improved the overall accuracies by 0.235% (CART) and 1.33% (SVM). The overall accuracies for eight composition classes using the UAS imagery, based on an average of 10 iterations, were 33.27% (CART) (see Table A3 in Appendix A), 46.67% (RF) (see Table A4 in Appendix A), and 46.90% (SVM) (Table 8). These UAS thematic accuracies represented, on average, a 15.60% increase over the same methods when using the NAIP imagery.
The overall classification accuracies for both NAIP and UAS imagery increased when the eight classes were collapsed to conifer, deciduous, and early successional. We again evaluated the feature importance for both the NAIP and the UAS image classifications (see Figure A3 in Appendix A) to determine the optimal feature selection for classifying coniferous, deciduous, and early successional cover types. Both imagery sources showed a general consensus for the most important (e.g., greenness and brightness) and least important (e.g., border index and compactness) features. The NAIP imagery correctly classified, on average, 45.32% of the tree segments using the CART algorithm. Using the RF and SVM algorithms, the average overall accuracies increased to 53.58% and 52.69%, respectively. Classifying these same image segments using the UAS imagery produced average overall accuracies of 59.62% (CART), 70.48% (RF) (Table 9), and 68.59% (SVM).
Table 9. Thematic map accuracy assessment error matrix for individual trees using the UAS imagery and the RF algorithm for three classes.
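The average gains of the UAS imagery over the NAIP imagery reported in this section can be checked directly from the per-classifier overall accuracies. The short sketch below uses only the values stated in the text; the dictionary and function names are illustrative.

```python
# Overall accuracies (%) reported in the text for each classifier.
naip_eight = {"CART": 21.44, "RF": 29.23, "SVM": 29.36}
uas_eight = {"CART": 33.27, "RF": 46.67, "SVM": 46.90}
naip_three = {"CART": 45.32, "RF": 53.58, "SVM": 52.69}
uas_three = {"CART": 59.62, "RF": 70.48, "SVM": 68.59}


def mean_gain(uas, naip):
    """Average per-classifier difference in overall accuracy (UAS minus NAIP)."""
    gains = [uas[name] - naip[name] for name in naip]
    return sum(gains) / len(gains)


gain_eight = mean_gain(uas_eight, naip_eight)
gain_three = mean_gain(uas_three, naip_three)
print(round(gain_eight, 2), round(gain_three, 2))
```

The eight-class mean gain is approximately 15.60 percentage points and the three-class mean gain is approximately 15.70 percentage points.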

Analysis of Visual Interpretation Uncertainty
The qualitative analysis of visual interpretation uncertainty showed regular progression in the ability to differentiate composition classes within complex forests. When classifying more specific composition classes (i.e., nine groups), we saw that all three remotely sensed imagery sources struggled to provide a consensus across six interpretation trials. Such a consensus is needed to provide both an accurate and a confident label for the composition of each inventory plot. The UAS imagery also showed slightly less variability in the identification of more pure species classes, in comparison to the Google Earth and NAIP imagery. The perceived ability to identify individual species, however, also led to a lower percentage of plots labeled as mixed hardwoods or mixed conifers. Other classes, such as EH, demonstrated that, even with nearly absolute plot composition (>85%), there was a significant amount of confusion and misclassification with other species. Such classes likely require further training or revision of the classification scheme [2,10]. When the forest composition was generalized to only four classes, all three imagery sources showed a considerable reduction in misclassifications. While there was still some confusion between specific mixtures or dominance, many of the plots for each source of imagery could be identified at least in these basic compositional groups. Additional classification rules such as forming a hierarchical classification by first identifying the plot as coniferous, deciduous, mixed, or other forest could have bridged this gap in misclassifications [18]. One potential source of confusion in the labeling of these inventory plots could have been the presence and visual perception of large trees. Large trees are known to disproportionately account for stand structure and function [67,99]. 
A few large trees (or even a single tree in some cases) could have accounted for a large portion of the perceived plot composition based on the synoptic view of the visual interpretations. These same trees, however, may not be representative of the same compositional dominance when measured using the variable plot radius design that was used to collect our field-based reference data. This research was conducted within the transition forest region of New England forests [58]. These mixed-species forests comprise a rich diversity of hardwood species at local scales, as well as contain a common white pine and eastern hemlock component. The lower-spatial-resolution Google Earth and NAIP imagery may suffer from this tendency for species mixtures, as both resulted in a large amount of MF commission error, even during the labeling of four composition classes. Lastly, certain classification scheme edge cases (e.g., a plot with 33% coniferous composition which could be interpreted as deciduous-dominated or MF depending on the interpreter) were found during this qualitative analysis.
When looking at the overall thematic accuracies for the Google Earth, NAIP, and UAS plot-level interpretations, we formed several important insights. For both the nine-class composition accuracy and the four-class composition accuracy, the Google Earth and NAIP imagery produced approximately equal results. Both sources of imagery demonstrated a considerable amount of commission error for the MF class. The NAIP imagery acquisition (influencing phenology) and image characteristics were not consistent, leading to challenges in interpretations across study areas [12]. Further spatial data exploration and preprocessing before using the NAIP imagery could be integrated to influence species classification success. The UAS imagery exhibited an ability to discern nine classes with a higher accuracy than even the generalized composition accuracy for either of the other imagery sources. Despite the finer spatial resolution of 3.02 cm provided by the UAS imagery, however, the highest overall accuracy achieved using visual interpretation was still only 54.44%. As with other studies, specific hardwood classes and early successional species mixtures (ES) showed a high amount of thematic classification error [59].

Analysis of Digital Classifications
Despite watershed segmentation being one of the most common and powerful methods for delineating tree crowns given the availability of 3D data, the visual assessment of tree segment quality was never absolute for all species [80,88]. Our individual tree detection accuracy produced a final omission error of 6.1%, similar to other studies conducted using remotely sensed data [1,61]. During the manual refinement of the digital classification training samples, it was observed that many tree segments still contained some portion of a species mixture. The occurrence of mixed species tree segments was especially common for the large coniferous trees, which displayed the lowest classification accuracy. The individual segments for these large coniferous trees commonly absorbed neighboring subdominant canopy deciduous trees. A more advanced segmentation technique could be adopted in future studies to better produce pure tree segments [52].
Turning to the automated individual tree classification results, the UAS imagery produced, on average, a 15.65% increase in overall accuracy over the NAIP imagery. Digital classification of the NAIP imagery, as with the interpretation analysis, likely suffers from inconsistencies in collection date and spectral characteristics [12]. The highest overall accuracy for eight classes was achieved using the UAS imagery and the SVM classifier, at 46.90%. This classification accuracy is 7.44% higher than the best nine-class visual interpretation accuracy at the plot level. However, both NAIP and UAS imagery supervised classifications still resulted in low accuracies for more specific classes such as EH and RM. In the automated classification of generalized (three) classes, we again observed an increase in performance for the UAS imagery over the NAIP imagery. The accuracy of the UAS imagery was, on average, 15.70% higher for the three supervised algorithms in comparison with the NAIP imagery. The highest overall accuracy for the three-class automated classification was produced using the UAS imagery and the RF algorithm, at 70.48%, which is 16.04% higher than the four-class visual interpretation accuracy. Achieving a higher overall accuracy for eight classes using the SVM algorithm and for three classes using the RF algorithm is not inconsistent with other findings. Many studies either evaluated the results of multiple machine learning algorithms or found that the best classifier is application-dependent [20,100,101]. As part of our initial testing, we compared various procedures for training and validating these individual tree classifications (Table 10).
These methods included (1) splitting the reference data to achieve a minimum validation sample size of 30 samples per class, (2) splitting the reference data to achieve a minimum validation sample size of 30 samples per class and removing negatively influential features, (3) splitting the reference data to achieve a 65% training/35% testing split, and (4) conducting a permutation-based out-of-bag validation with 3% of the total sample size selected for validation. On the basis of both the performance and the statistical validity, we applied the second method for each of the digital classification evaluations [21,39]. Similar studies employing multispectral and multitemporal UAS imagery have been known to produce higher overall accuracies. Gini et al. [60] reported accuracies ranging from 58% to 87%. These findings, however, were for the classification of several hardwood species within a private nursery, which is different from the species-rich New England forests evaluated here. Xu et al. [61] produced comparable accuracies for eight subtropical species (conifer and deciduous) by incorporating both multispectral imagery and the photogrammetric point cloud. For the classification of eight conifer and deciduous species, they found a 65% overall accuracy and an 80% overall accuracy for labeling only coniferous and deciduous species. The inclusion of multispectral bands and indices or simply an increase in spectral resolution would likely increase the classification accuracy when using the UAS imagery [60,102–104]. One of the most important features, as reported in Figures A1 and A3 (see Appendix A), for the NAIP imagery individual tree classifications was the NIR band. Numerous studies have outlined the importance of NIR reflectance in tree species classification [12,105,106].
Our results, however, show that true color 'photogrammetric' sensors, which may provide a more efficient and sometimes more effective platform for surveying contiguous forests, can be used with a decrease in classification accuracy of approximately 10% [34,75,76]. One important factor for this success was the selection of and reduction in classification features [107]. Our MDI test and feature reduction, while only resulting in a 2% difference in classification accuracy here, will become more important as the number of features and the spectral complexity are increased [21,108]. Lastly, image segmentation quality improvements could be explored to enhance individual tree classification. High-resolution image segmentation techniques and individual tree detection and delineation methods are being developed at a rapid pace [52,88,109–111]. The ability to accurately detect and delineate the range of tree species and crown morphologies present in this landscape would provide more representative training samples for each species and, therefore, enhance the potential of each classification algorithm.

Future Perspectives
Future research should continue to investigate the best methods for adopting UAS for fine-scale (i.e., precision) forest management [57–59]. Data fusion techniques, such as the integration of both satellite and UAS data [112] or of optical and lidar data [1,62], present methods for overcoming the limitations of UAS digital photogrammetry and achieving high accuracies for individual tree identification. Advanced classification algorithms may also present a variety of methods for better handling of the data dimensionality. However, such techniques would require a far greater amount of training data and technical expertise to complete [20,21]. The extension of forest composition data from one location for classification of another could provide several advantages to forest managers, such as semiautomated classifications, considerable gains in time, cost reductions, and lower expert user knowledge required when given proper consideration for potential sources of uncertainty [113]. Unlike satellite-based generalizations of forest composition data across study sites, UASs are not prone to the same dissimilarities in image characteristics [18,113,114]. Instead, UAS applications face a myriad of rapidly evolving computer vision and data science challenges and solutions [115]. The continued development of these disciplines and tools should help achieve sufficient tree-level accuracies, which can then be aggregated to the plot or forest stand levels.

Conclusions
Trends in automated and semiautomated forest classifications using high-resolution remotely sensed data have made the thematic classification of individual trees a realistic aspiration. In this study, we evaluated, both qualitatively and quantitatively, the application of Google Earth, NAIP, and UAS imagery for plot composition and individual tree identification. For this analysis, we compared visual interpretation and digital processing approaches. Our results indicated that supervised machine learning classifiers outperformed visual interpreters for specific (+7.44%) and generalized (+16.04%) species composition. While visual interpretation is commonly applied for broad-scale inferences of forest composition, the uncertainty in labeling more specific classes, as well as the costs required to train interpreters, makes fine-scale assessments impractical. Our results indicate that automated machine learning approaches can be a capable alternative for local-scale forest surveys, even with only single-date true-color imagery. In comparison with other research, the inclusion of multitemporal imagery, multispectral imagery, or more advanced segmentation techniques would likely further increase this divide. Subsequent studies should continue to examine diverse forests and geospatial analysis techniques for delineating the trees within them.
Table A4. Thematic map accuracy assessment error matrix for individual trees using the UAS imagery and the RF algorithm for eight classes.