Improved Classification of Urban Trees Using a Widespread Multi-Temporal Aerial Image Dataset

Urban tree identification is often limited by the accessibility of remote sensing imagery but has not yet been attempted with the multi-temporal commercial aerial photography that is now widely available. In this study, trees in Detroit, Michigan, USA are identified using eight high resolution red, green, and blue (RGB) aerial images from a commercial vendor and publicly available LiDAR data. Classifications based on these data were compared with classifications based on World View 2 satellite imagery, which is commonly used for this task but also more expensive. An object-based classification approach was used whereby tree canopies were segmented using LiDAR, and a street tree database was used for generating training and testing datasets. Overall accuracy using multi-temporal aerial images and LiDAR was 70%, which was higher than the accuracy achieved with World View 2 imagery and LiDAR (63%). When all data were used, classification accuracy increased to 74%. Taxa identified with high accuracy included Acer platanoides and Gleditsia, and taxa that were identified with good accuracy included Acer, Platanus, Quercus, and Tilia. Our results show that this large catalogue of multi-temporal aerial images can be leveraged for urban tree identification. While classification accuracy rates vary between taxa, the approach demonstrated can have practical value for socially or ecologically important taxa.


Introduction
Urban trees provide ecosystem services such as air pollution removal, cooling, and storm water retention as well as ecosystem disservices, including the release of volatile organic compounds and allergenic pollen [1,2]. The services and disservices provided by trees vary substantially between taxa (e.g., [3][4][5]) and they also vary across space [6,7]. Comprehensive maps of urban trees at the species or genus level would be tremendously useful for urban foresters, planners, ecologists, and public health practitioners. For example, maps of ash trees have been used to help manage emerald ash borer outbreaks in urban settings [8,9], and maps of wind-pollinated trees could help to predict allergenic pollen exposure [10].
Most data on urban trees is collected using plot-based sampling methods such as i-Tree Eco [11], and the United States Forest Service has initiated a plot-based survey program in over 100 urban forests [12]. These plot-based sampling approaches provide unbiased city-wide estimates of urban tree composition [11], but do not generate the comprehensive maps of trees that would be most useful for managers and researchers. Surveys of street trees (i.e., city owned trees in the right of way) are common in medium and large cities and often measure all street trees in a city [13,14]. However, street trees

Data Description: Tree Census
Davey Tree Company surveyed 169,011 street trees from 2011-2016, which was a project undertaken by the Michigan Department of Natural Resources and funded by the USDA Forest Service State and Private Forestry Program. Trees in right of ways were included but trees in parks, on private property, and in Highland Park and Hamtramck were not. Species were identified and tree locations were recorded using meter-accurate GPS. When compared to manually calculated tree canopy centroids from the high-resolution LiDAR dataset (described below) the RMSE was 2.0 m (this included both measurement error and differences between stem locations and tree canopy centers). Tree diameter at breast height (DBH) was also recorded. The most common tree genera were Acer, Gleditsia, Platanus, Ulmus, and Quercus (Table 1).  study area were omitted from this study.) We used 8 scenes that were collected between 2014 and 2018 ( Table 2; image collection dates were: 23 October 2014, 2 November 2014, 1 May 2015, 10 October 2016,  17 April 2017, 23 September 2017, 28 November 2017, and 25 October 2018). These images are available at very high resolution (up to 7.5 cm), but here we used them at a resolution of 0.6 m. Image tiles were downloaded from the Nearmap API, mosaicked, and reprojected to UTM zone 17 N. The average positional accuracy RMSE between the Nearmap scenes was 1.44 m (using an average of 56 control points between each pair of scenes) and the average positional accuracy RMSE compared to LiDAR was 1.06 (Table 2). We also explored aerial images from the National Agricultural Imagery Program but image registration was unsatisfactory. A World View 2 image taken on 13 June 2011 was obtained from Digital Globe ( Table 2). The satellite swath width is 16 km, so we restricted our analysis to that area. Standard atmospheric corrections were applied within Erdas Imagine (Hexagon Geospatial, Madison, AL, USA) and the area with clouds (4.24 km 2 ) was removed from all analyses to ensure fair comparisons between datasets.
LiDAR data have been widely used for tree identification and are now publicly available for much of the United States. In particular, LiDAR can be used to segment tree canopies [31], which replaces the time-consuming task of manually delineating tree crowns that has been undertaken in several studies [17,24,28,29], facilitating GEOBIA approaches on a large spatial scale. We used LiDAR data collected and processed by Sanborn Mapping Company for the State of Michigan, which is now publicly available through the Southeast Michigan Council of Governments. Sanborn Mapping Company collected LiDAR data between 12-23 April 2017, with a Leica ALS70 MPiA LiDAR instrument and an integrated IPAS20 GPS.INS system mounted within an Aero Commander twin engine airplane. Airborne GPS data were postprocessed with Intertial Explorer and LEICA CloudPro and the georeferenced LiDAR data were classified and edited with Terrasolid Terrascan software. Non-vegetated Vertical Accuracy (NVA) was assessed by comparing the digital elevation model with 185 ground control points; the data met Quality Level 2 standards (19.6 cm NVA). Nominal point density was 2.2 points/m 2 ( Table 2).

Workflow
The analysis steps are described in Figure 2 and examples of the datasets are shown in Figure 3. LiDAR point cloud data were used to create a pit-free digital surface model (DSM) through a triangulated irregular network approach [38] that included buffering individual points and Gaussian smoothing, implemented within the R package lidR [39]. Other products created included digital elevation models, digital surface models (DSM), and normalized digital surface models (nDSM). Two-dimensional return intensity rasters were provided with the 2017 LiDAR data and were also used in the classification.

(See Supplementary Materials).
LiDAR point cloud data were used to create a pit-free digital surface model (DSM) through a triangulated irregular network approach [38] that included buffering individual points and Gaussian smoothing, implemented within the R package lidR [39]. Other products created included digital elevation models, digital surface models (DSM), and normalized digital surface models (nDSM). Two-dimensional return intensity rasters were provided with the 2017 LiDAR data and were also used in the classification. (See Supplementary Materials)  Tree segmentation of the nDSM was also conducted in lidR. Tree tops were detected by passing a circular local maximum filter over the nDSM raster; the tallest point in the window was assumed to be a tree top. We followed others in assuming that taller trees have larger canopies and therefore varied the local maximum window size as a function of focal pixel height [39,40]. After qualitatively exploring linear and quadratic functions and various parameter values, we selected a linear equation with an intercept of 4.00 and a coefficient of 0.02. Tree crown polygons were then delineated using a seed and region growing algorithm in which each local maximum is considered the seed of a tree crown and neighboring pixels are added to that crown if their heights are less than a certain percentage of the focal pixel's height; this algorithm is described in detail elsewhere [39,41]. Tree crown polygons that contained more than one street tree point were not used in training or testing data.
To prevent mismatches between points in the street tree database and canopy cover, we discarded trees that were: <10 cm DBH, <5 m tall in 2017, or that may have been under another tree's canopy (based on manual examination of the data, this was defined as being >12 m tall but having a DBH < 20 cm.) Acer platanoides, the most common tree species, was identified separately from other Acer species, as it is of ecological interest (it is sometimes considered invasive [42]) and spectrally distinct while flowering. Individuals of uncommon genera (<100 individuals in the analysis dataset) were grouped together ("other") and comprised 0.7% of the dataset.
Several common spectral indices for 3 and 8 band images were calculated [43,44] and are listed in SI 1; for example, for WorldView 2 several measures of NDVI were calculated and for Nearmap images examples of calculated indices included GEI and GRVI [43,44]. The mean, median, and standard deviation of each of these indices were extracted for each tree crown polygon. Similarly, we also extracted the mean, median, and standard deviation from the other available variables, including the original bands from each Nearmap and WorldView 2 image, the LiDAR intensity, and the nSDM. We also explored several textural indices similar to those used by others [24,45] for each dataset using the R package GLCM [46] for pixel windows of size 3, 5, and 9 pixels. However, these derived texture variables contributed little accuracy (<2%) and so we do not include them in the final analysis.
Tree polygons were divided into training data (70%) and testing data (30%) for classification. Classification was conducted using the randomForest package in R [47] based on the remote sensing variables that had been extracted for each polygon. Confusion matrices were computed for testing data. Analyses were conducted using data from all sources (i.e., WorldView 2, Nearmap, and LiDAR data) and then separate analyses were conducted to compare the accuracy of models using different Remote Sens. 2020, 12, 2475 6 of 14 subsets of data (e.g., WorldView 2 vs. Nearmap). Variable importance was measured by the mean decrease in accuracy and Gini values when a variable was omitted from the model. Predictions were created across the study area by extracting data for each segmented polygon that was more than 50% tree cover; creation of that tree cover layer is described elsewhere [36]. Analyses and data exploration were conducted with R 3.5 [48]  Tree segmentation of the nDSM was also conducted in lidR. Tree tops were detected by passing a circular local maximum filter over the nDSM raster; the tallest point in the window was assumed to be a tree top. We followed others in assuming that taller trees have larger canopies and therefore varied the local maximum window size as a function of focal pixel height [39,40]. After qualitatively exploring linear and quadratic functions and various parameter values, we selected a linear equation with an intercept of 4.00 and a coefficient of 0.02. Tree crown polygons were then delineated using a

Classification Results
Overall model accuracy reached 74% when all data sources were included (Table 3). Nearmap imagery by itself had an overall accuracy of 68%, whereas WorldView 2 imagery had an accuracy of 58%; when LiDAR-derived metrics (i.e., height and intensity) were also included, accuracy rates rose to 70% for Nearmap imagery rose and to 63% for WorldView 2. Thus, including Nearmap imagery improved tree identification accuracy. User accuracy was far higher than producer accuracy; in the highest accuracy model these were 82% and 38% respectively. The kappa statistic for this model was 0.68. Classification accuracy for the focal taxa using the model that included all of the datasets (Table 4) was highest for Acer platanoides (92% producer accuracy and 89% user accuracy), and Gleditsia tricanthos (92% and 76%). Other taxa had somewhat lower accuracy, including Platanus (68% and 85%), Acer (84% and 56%), and Catalpa (64% and 92%), Quercus (53% and 83%), Ulmus (34% and 72%), and Tilia (59% and 80%).

Variable Importance
Of the top ten most important variables (defined here by ranking the mean decrease in accuracy when that variable was omitted), eight were from Nearmap and two were from WorldView 2 (SI 2: Variable importance). Mean and median summaries of polygons accounted for three and six of the top ten variables respectively. The top Nearmap indices were gei, grvi, and sgreen (SI 2: Variable importance). The three Nearmap images that were represented in the top ten variables were 25 October 2018, 28 November 2017, and 2 November 2014.

Aerial Images for Tree Classification
Multi-temporal aerial images were useful for urban tree classification and performed better than commonly used WorldView 2 satellite imagery. The Nearmap image catalogue is expanding rapidly and could improve tree identification in urban areas. It is also affordable, scalable, and avoids the need for atmospheric correction and cloud cover masks.
Multi-temporal imagery can increase tree classification accuracy [24,49]. This is likely due to differences in phenology and reflectance that only occur during certain phenological stages (e.g., flowering). For example, Acer platanoides displays distinctive neon green foliage and flowers in early spring, Gleditsia triacanthos has yellow leaves that senesce relatively early in the fall, and Quercus leaves often stay on trees long after other deciduous genera have dropped their leaves. Including more images in an analysis increases the likelihood of observing phenological differences that can aid identification.

Accuracy Compared to Other Studies
The tree identification accuracy of the model that included LiDAR and aerial images but not satellite imagery (70%) is comparable to other studies of urban trees using satellite imagery. For example, a study of fairly uniformly grown trees in Beijing achieved accuracies of 80-92% using bi-temporal images from WorldView 2 and WorldView 3 and manually delineated tree canopies [24]. A study in Tampa, Florida, USA achieved overall accuracy rates of 56% with WorldView 2 data [22]. Differences in accuracy between studies may also be due to both the amount of spectral and structural similarity between the study taxa and the imagery analyzed. Accuracy is also similar to studies that investigated tree identity in non-urban settings, even though urban areas offer unique challenges such as building shadows and mixed signatures from surfaces beneath trees. For example, studies using WorldView 2 imagery have reported accuracy rates of 30-95% [17], 83% [18], 92% [19], 40-100% [20], 70-90% [50], 67-95% [21], and 65-85% [23].

Leveraging Urban Tree Censuses
Urban tree databases are becoming increasingly common [13,14] and are often GIS-based. In cases, these have already been leveraged for use in tree identification with remote sensing [26,29]. Study designs that require additional field work, such as manually delineating tree crowns, have been generally limited to relatively small sample sizes [24,28,29]. The larger sample sizes provided by automated extraction of tree polygons from street tree databases allow for larger and more sophisticated models. However, this approach does have several potential limitations: canopy segmentation error, temporal mismatches between street tree surveys and remote sensing data, and records that could impede tree identification (e.g., understory trees that are overtopped by large trees). Street tree databases are also not representative of overall tree composition. In Detroit, several early successional genera, including Ulmus, Populus, and Morus, appear more common among non-street trees (personal observation). This has the potential to bias classification of taxa that are disproportionately common as either street trees or non-street trees. Future studies may be able to address this bias by selecting training or testing datasets that are representative of the area's tree composition, for example by consulting data from the Urban Forest Inventory and Analysis program, which will be available soon. Despite the challenges and limitations inherent in leveraging street tree data, this approach can yield predictions with real value.

Applications
The usefulness of the predictions generated will largely depend on whether the well-identified taxa are of importance, e.g., through the provisioning of ecosystem services or disservices. For example, Quercus (user accuracy: 0.83) canopy cover was predicted to vary considerably throughout the city, and these cover estimates are highly effective for predicting neighborhood-level differences in airborne Quercus pollen (manuscript in preparation), which is both extensive and of considerable interest to those with pollen allergies [10]. These predictions could also be useful for guiding future responses to host-specific tree pests and pathogens or for understanding the spread of potentially invasive species, such as Acer platanoides (user accuracy: 0.89) in to natural areas. However, prediction accuracy for many rare tree genera (i.e., Aesculus, Ailanthus, Celtis, Ginko, Populus, and Morus) were poor, thus these predictions should not be used to assess urban tree diversity.

Limitations
Spatial misalignment can interfere with combining multiple data sources in a single analysis [51]. In the present study, the average RMSEs for the aerial and satellite imagery compared to the LiDAR data were both low (1.62 and 1.06 m), especially compared to the average size of classified polygons (71 m 2 ). A GEOBIA approach may have helped to reduce the effect of minor misalignment of datasets in this analysis; preliminary analyses at the pixel level returned accuracies <50% (data not shown). Use of object-based approach in urban environments using LiDAR is a rapidly increasing [52], and so registration errors may be less of a barrier for data fusion in future studies. A challenge for applying GEOBIA approaches to tree identification in urban areas is that individual tree segmentation still is usually inaccurate [53], especially when there are many types of trees growing in different conditions and when trees have intermingled canopies and irregular forms [52]. The resulting polygons may sometimes not include an entire individual tree (i.e., over segmentation), but this is not problematic if each polygon is large enough to have sufficient information for tree identification and if the predictions are used at the stand level (e.g., for calculating total canopy area per taxon). A limitation of our reliance on previously collected datasets is that we did not have manually delineated tree canopies available to quantitatively assess the accuracy of the segmentation algorithm; this limitation is shared by most application-oriented uses of tree segmentation algorithms [52].
One potential limitation associated with using phenological signals to identify plants is the variability of phenology, even within a city. For example, we found that the range of peak flowering times for oak trees within Detroit varied by several weeks [54]. The usefulness of phenological signals to identify plants will be limited when images encompass that variation. Most cities contain thermal gradients [55,56] that affect phenology [57,58] and future studies would do well to consider this when conducting analyses that depend on phenological differences.

Conclusions
This study shows that common commercially available multi-temporal aerial images can improve classification of urban trees. Predictions using both aerial images and LiDAR had an overall accuracy of 70% whereas predictions using LiDAR and World View 2 imagery had an accuracy of 63%. Including both aerial images and satellite imagery increased accuracy to 74%. These results highlight the benefits of adding widespread aerial photography to urban tree identification analyses and showcase some practical uses for the prediction of well-identified taxa across the study area.