Assessing the Accuracy and Potential for Improvement of the National Land Cover Database’s Tree Canopy Cover Dataset in Urban Areas of the Conterminous United States

The National Land Cover Database (NLCD) provides time-series data characterizing the land surface for the United States, including land cover and tree canopy cover (NLCD-TC). NLCD-TC was first published for 2001, followed by versions for 2011 (released in 2016) and 2011 and 2016 (released in 2019). As the only nationwide tree canopy layer, there is value in assessing NLCD-TC accuracy, given the need for cross-city comparisons of urban forest characteristics. Accuracy assessments have only been conducted for the 2001 data and suggest substantial inaccuracies for that dataset in cities. For the most recent NLCD-TC version, we used various datasets that characterize the built environment, weather, and climate to assess its accuracy in different contexts within 27 cities. Overall, NLCD underestimates tree canopy in urban areas by 9.9% when compared to estimates derived from high-resolution land cover datasets. Underestimation is greater in higher-density urban areas (13.9%) than in suburban areas (11.0%) and undeveloped areas (6.4%). To evaluate how NLCD-TC error in cities could be reduced, we developed a decision tree model that uses various remotely sensed and built-environment datasets such as building footprints, urban morphology types, NDVI (Normalized Difference Vegetation Index), and surface temperature as explanatory variables. This predictive model removes bias and improves the accuracy of NLCD-TC by about 3%. Finally, we show the potential applications of improved urban tree cover data through the examples of ecosystem accounting in Seattle, WA, and Denver, CO. The outputs of rainfall interception and urban heat mitigation models were highly sensitive to the choice of tree cover input data. Corrected data brought results closer to those from high-resolution model runs in all cases, with some variation by city, model, and ecosystem type.
This suggests paths forward for improving the quality of urban environmental models that require tree canopy data as a key model input.


Introduction
Urban trees are an important feature in evaluating urban landscapes and ecosystem services. The US does not have any nationwide tree canopy cover data product. The Multi-Resolution Land Characteristics (MRLC) Consortium produces the National Land Cover Database (NLCD) for the United States using Landsat satellite imagery and other supplementary resources such as high-resolution imagery and digital elevation models [1]. In this consortium, the US Geological Survey (USGS) leads the development of NLCD land cover and imperviousness datasets, National Oceanic and Atmospheric Administration photo-interpretation to define whether each point contained tree canopy or not. They used National Agriculture Imagery Program (NAIP) imagery (1 m resolution) as the response variable and Landsat 5 imagery and a digital elevation model as explanatory variables. Over 63,200 locations were interpreted to produce the response variable (Yang et al., 2018). Their predictive model was based on random forest regression, and the root mean square error (RMSE) of their model varied between 10 and 18%. Coulston et al. [22] used five NLCD mapping zones as sampling zones to train their model and reported that the error is greater in urban areas. However, their sampling zones do not represent different urban density profiles in urban areas; therefore, their method does not provide evidence on how the error varies across different density ranges.
As described above, the basic definitions, data, and algorithms used to develop the NLCD-TC 2001 and 2011 data were fundamentally different in terms of sampling and response variables. Critically, the approach to identifying tree canopies in the response imagery was also different. In the 2001 data, vegetation was classified as trees if it was measured to be 5 m or taller. In the 2011 data, trees were classified as a life form with no height threshold [23]. Due to these substantial differences, the 2001 and 2011 datasets are not comparable and, therefore, not appropriate for longitudinal studies. Currently, the NLCD website does not include the 2001 data in its catalog.
In fall 2019, a third-generation NLCD-TC database was published, including tree canopy data that were generated for the years 2011 and 2016 [24]. The 2019 NLCD-TC version uses methods similar to those of the previous version. To produce the 2016 products, "resources were not available to re-interpret tree canopy cover of all of the locations used in the [original] 2011 product. Rather 3% of the original locations were re-interpreted using newer NAIP imagery based on the occurrence of wildfires or large NDVI changes detected in Landsat-derived time series" [24] (p. 113). To build the explanatory variables for the production of the 2011 product, Landsat 5 Thematic Mapper imagery was used, whereas for the 2016 product, Landsat 8 Operational Land Imager imagery was used. The main predictive algorithm remains random forest regression.
Three studies have assessed the accuracy of the 2001 NLCD-TC dataset. First, Homer et al. [25] evaluated the accuracy of the NLCD-TC 2001 data in three NLCD mapping zones in Virginia, Minnesota, and Utah. They found mean absolute error in each zone of 9.9, 14.1, and 8.4%, respectively. A second study by Greenfield et al. [16] used aerial photos to identify trees in selected geographies and compared them with NLCD-TC values.
To determine the assessment locations, they reclassified the 65 NLCD mapping zones into five larger regions (Southeast, Northeast, Midwest, Mountain West, and Arid West). Within four randomly selected NLCD mapping zones in each region, they selected seven incorporated areas or census-designated places of varying population densities. They randomly distributed 200 points across these locations. Greenfield et al. found that the NLCD-TC 2001 data underestimate tree cover by about 9.7% on average, with a consistent error rate across the conterminous United States and no statistical differences among different regions. A third accuracy assessment by Nowak and Greenfield [17] used manual photo-interpretation in a manner similar to Greenfield et al. [16] but distributed their samples in all 65 NLCD mapping zones. Nowak and Greenfield [17] found that the 2001 NLCD-TC data underestimate tree cover in 64 of the zones by 9.7% on average. These studies are based on dispersed sampling across different zones and climates. Notably, they did not identify the error distribution across the gradient of settlement density. The accuracy of the NLCD-TC 2011 data also has not been independently studied, either for the first 2011 dataset released in 2016 or for the more recently released 2019 product, which covers the years 2011 and 2016. Studies like these can shed light on the ways we use NLCD products.
Urban areas are typically more heterogeneous than rural areas. Any given urban NLCD 30 m grid cell often aggregates a wider variety of land cover classes into a single type than a more homogeneous non-urban environment. Further, "canopies" composed of a single tree, often smaller than a single pixel, are common in urban areas, and it is unclear if and when NLCD-TC data detect these isolated tree patches.
Given that no other national urban forest data sources exist, there is a need to better understand and quantify the patterns and sources of error in the NLCD-TC datasets for urban areas so that the potential uses and limits of this dataset in this highly important context can be better understood. The objective of this paper is to do so by comparing NLCD-TC against more accurate, high-resolution urban tree cover data by city and across multiple variables describing the built environment, climate, and ecoregion characteristics. Because high-resolution tree canopy data can be used to derive a percent tree cover for the same pixel area, albeit using finer scale building blocks, it is valid to compare the two for accuracy. However, it is worth noting that no matter how accurate NLCD-TC is in its pixel-level tree percentages, it can never characterize where the tree canopies are within that pixel when coverage is less than total, meaning that NLCD-TC will never be appropriate for analyses that require a high level of spatial precision.
We next evaluate methods of accuracy enhancement using additional nationally available explanatory variables derived from other sources, including NLCD land cover, regional climate data, Landsat 8 imagery, and building footprint data. This approach provides an exploratory application of how effectively remotely sensed products and more extensive training data (i.e., high-resolution tree canopy cover datasets) can address NLCD-TC spatial error in urban areas. Finally, we evaluate the implications of this error assessment for a modeling application where NLCD-TC is a key model input: in our case, city-scale ecosystem accounting/ecosystem service assessment [13,14]. This analysis shows how uncertainties in the NLCD-TC data affect research conclusions; though limited to a specific application, we expect our results to have relevance for the modeling of other urban phenomena dependent on tree canopy cover.

Methods
In this section, we first describe the data that we used in this work, then our data compilation and analysis procedures. The datasets that we used served two purposes: (1) accuracy assessment of the NLCD-TC data in different urban areas and (2) construction and testing of a predictive model to improve the quality of NLCD-TC and its applications to environmental modeling in urban areas.

Data
For the accuracy assessment, we used the most recent version of the 2011 NLCD-TC for the conterminous United States from the MRLC [1] as the target variable of interest.
We used high-resolution (1 m) land cover data (that includes tree canopy cover as a class) for 27 cities to build a 30 m layer (hereafter referred to as the "high-resolution derived TC" data) for accuracy assessment of the NLCD-TC data in urban areas as described in Section 2.2. For 10 cities, we used data from the US Environmental Protection Agency (EPA) EnviroAtlas portal. The reported fuzzy accuracy of these layers ranges from 83 to 95% at the 1 m scale [26]. For 17 cities, we received the data produced by the University of Vermont Spatial Analysis Laboratory (UVM SAL: 2011-2017; EPA data for Cleveland, OH, and Chicago, IL, USA, were also originally developed by the University of Vermont). Only a few cities provide a formal accuracy assessment for these layers. The tree cover layers for New York City and Philadelphia are 99 and 97% accurate [27]. Since the UVM SAL uses a similar procedure for all cities, including manual correction, it is expected that tree cover data for other cities will have similar accuracy levels. Nine of the datasets covered entire counties (e.g., Baltimore County, MD, USA); others covered small (e.g., Cambridge, MA, USA) to large (e.g., New York City, NY, USA) cities. City/county data represented the years 2007 through 2016, though nearly 60% of data are for 2010 and 2011. To evaluate how representative the data were across diverse climatic conditions, we classified each city/county dataset by EPA Level II Ecoregion [28] (Table 1).
Table 1. Cities with high-resolution (1 m) land cover by US Environmental Protection Agency Level II Ecoregion [28]. City/county data are sourced from either the • US Environmental Protection Agency EnviroAtlas [26] or * the University of Vermont [27].
We used six additional characteristics, listed below, as independent variables to construct a predictive model of tree canopy cover that aims to improve on the native NLCD-TC product in cities.
Additionally, we used surface temperature, NDVI, building footprint, and urban density data to assess how NLCD-TC accuracy varies based on environmental and built environment characteristics (Section 2.2).

(a) Surface temperature and NDVI. Tree cover has a negative correlation with surface temperature but a positive correlation with normalized difference vegetation index (NDVI) [29,30]. Therefore, we produced surface temperature and NDVI datasets from Landsat 8 images for each city or county (Table 1) as explanatory variables to evaluate NLCD-TC error. To produce these, we downloaded the four least cloud-covered summer images for the years 2013-2015. Considering the limited number of cloud-free images in humid parts of the US in the summertime, we judged four images per tile to provide sufficient variation and manageable computational demand. We calculated surface temperature using Landsat 8 band 10 and NDVI using bands 4 and 5 of the images based on USGS guidelines [31]. We extracted median values for surface temperature and NDVI for all four images in each scene.
(b) Building footprints. In urban environments, trees and buildings create a heterogeneous environment, which makes tree detection using remotely sensed data challenging. To create a gradient of built density, we used the area of building footprints in each cell; this dataset is extracted from Microsoft building footprint data [32]. Microsoft reported that these data have 99.3% precision and 93.5% pixel recall accuracy. Heris et al. [33] evaluated the accuracy of this dataset and found it to detect 96, 93, and 94% of buildings over 100 m² in Denver, CO, New York City, NY, and Los Angeles County, CA, respectively. We used three of the six summary datasets generated by Heris et al. [33]: (1) total building footprint coverage per cell (m² per 900 m² cell); (2) number of buildings that intersect each cell; and (3) area of the average building intersecting the cell (m²). These data have been converted into raster datasets that summarize building data for 30 m cells aligned with NLCD data, better meeting the needs of national-scale models. Because Microsoft used aerial photos from different years to generate this dataset, they did not provide a specific date for these data.
(c) Urban density. We used an urban morphology classification produced by Heris [34], which is based on Census and impervious surface data for the years 2000 and 2010 (we used the 2010 product in this study). This classification is based on the neighborhood density of each 30 m cell for the conterminous US for five densities: high, medium, and low-density urban areas, urban fringe, and suburbs. This dataset helps to stratify the distribution of NLCD-TC error across different urban morphologies in built environments as well as natural (non-built) areas falling within cities or counties of interest. We also used this dataset to separate built and undeveloped areas. For undeveloped cells, we applied a query to exclude cells that have an impervious surface cover greater than 0%.
(d) Climate data. To incorporate into the NLCD-TC predictive model the variation in climate across cities, which helps explain differences in urban tree occurrence, we extracted the average annual high and low temperature and average annual precipitation for each city (1990-2018) from the US Climate Data website [35].
(e) Year built of structures. We used the median year built of structures from the 2010 Census Block Group data [36] to incorporate the age of neighborhoods in our NLCD-TC predictive model assessment. This accounts for the fact that the maturity and size of urban tree canopies often correlate with the age of establishment of residential neighborhoods [37].
(f) National Land Cover Database (NLCD) land cover. We used the most recent edition of the 2011 NLCD land cover [24] in the NLCD-TC predictive model.
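As an illustration of the NDVI step in item (a), the following minimal sketch computes NDVI from red (Landsat 8 band 4) and near-infrared (band 5) reflectance and then takes a median across several images. It is not the authors' code: the function names, the flat-list representation of each band, and the choice of a per-pixel median are assumptions made for illustration, and a real workflow would operate on raster arrays.

```python
from statistics import median


def ndvi(red, nir):
    """NDVI for one pixel: (NIR - red) / (NIR + red).

    For Landsat 8, red is band 4 and near-infrared is band 5.
    Returns None where the index is undefined (both bands zero).
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def median_ndvi(images):
    """Per-pixel median NDVI across several images (hypothetical helper).

    `images` is a list of (red_band, nir_band) pairs; each band is a flat
    list of reflectance values in the same pixel order for every image.
    """
    n_pixels = len(images[0][0])
    result = []
    for i in range(n_pixels):
        values = [ndvi(red[i], nir[i]) for red, nir in images]
        values = [v for v in values if v is not None]
        result.append(median(values) if values else None)
    return result
```

Taking a median rather than a mean limits the influence of a single image affected by residual cloud or haze, which is one plausible reason for summarizing four images this way.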

Dependent and Independent Variables
To compare NLCD-TC values with the high-resolution derived TC data, we aggregated the amount of tree cover in 1 m cells to 30 m cells and calculated the percent cover for every 30 m cell. To evaluate the accuracy of the year 2011 NLCD-TC in US urban areas across different regions and landscapes, we initially calculated error (the difference between high-resolution derived TC and NLCD-TC at 30 m resolution). We report the mean error, RMSE, and Kolmogorov-Smirnov score. Our subsequent analysis has three parts. First, we evaluated the distribution of error in urban areas, across cities, and in different landscapes. Next, we developed a predictive model to improve the accuracy of the NLCD-TC dataset in urban areas. Finally, we showed how the predictive model can improve the accuracy of NLCD-TC in modeling applications for Denver, CO, and Seattle, WA.
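The aggregation and error definition described above can be sketched as follows. This is an illustrative pure-Python version, not the authors' implementation: it assumes the 1 m land cover has already been reduced to a binary tree mask (1 = tree canopy) aligned with the 30 m NLCD grid, and the function names are hypothetical.

```python
def percent_tree_cover(tree_mask, block=30):
    """Aggregate a binary 1 m tree mask to percent cover per block x block cell.

    `tree_mask` is a list of rows of 0/1 values. Rows and columns that do not
    fill a complete block are ignored in this sketch.
    """
    n_rows, n_cols = len(tree_mask), len(tree_mask[0])
    out = []
    for r0 in range(0, n_rows - block + 1, block):
        row_out = []
        for c0 in range(0, n_cols - block + 1, block):
            count = sum(
                sum(tree_mask[r][c0:c0 + block]) for r in range(r0, r0 + block)
            )
            row_out.append(100.0 * count / (block * block))
        out.append(row_out)
    return out


def cell_error(high_res_tc, nlcd_tc):
    """Error as defined in the text: high-resolution derived TC minus NLCD-TC."""
    return [
        [h - n for h, n in zip(h_row, n_row)]
        for h_row, n_row in zip(high_res_tc, nlcd_tc)
    ]
```

With this sign convention, a positive error means that NLCD-TC underestimates tree cover in that cell.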
To evaluate NLCD-TC error across 27 cities and counties, we generated scatterplots and histograms of the values of tree cover and their error (high-resolution derived TC minus NLCD-TC) in different landscapes. The categories we compared include EPA ecoregions, built versus undeveloped areas, cities, gradients for urban density, tree canopy cover, NDVI, total building footprint coverage, and surface temperature. Plotting these distributions helped us to understand error distribution patterns of the NLCD-TC 2011 data. For example, the total building footprint variable can help determine whether the error is larger in cells that have greater building cover.
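A stratified comparison of this kind amounts to grouping cell-level errors by a categorical label and averaging within each group. The sketch below shows that step under simple assumptions: errors and labels are parallel lists, and the class names in the usage note are placeholders rather than values from the study.

```python
from collections import defaultdict


def mean_error_by_class(errors, classes):
    """Mean error per category (e.g., urban density class or ecoregion).

    `errors` and `classes` are parallel sequences with one entry per 30 m cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for err, cls in zip(errors, classes):
        sums[cls] += err
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}
```

For example, `mean_error_by_class([10.0, 20.0, 5.0, 15.0], ["high", "high", "suburb", "suburb"])` groups the four cells into two classes and averages each. Continuous gradients such as NDVI or surface temperature could be handled the same way after binning.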

Predictive Model
We developed a predictive model that aims to reduce tree canopy error rates in cities using a series of explanatory variables. Our decision tree model used high-resolution derived TC as the response variable and the following explanatory variables: NLCD-TC, NLCD land cover, building coverage, surface temperature, NDVI, urban density, median year built of housing units, average precipitation, average high temperature, and built/undeveloped. To compile the explanatory variables, we used the NLCD-TC grid structure to convert all layers to raster datasets with matching resolution, extent, and projection system. Ensuring that all raster layers have the same properties enabled the conversion of the raster layers to Python NumPy arrays, which allows the use of a wide range of optimized libraries. We converted all explanatory layers, clipped from national layers to the 27 cities and counties, to NumPy arrays and then compiled them in a single pandas DataFrame in which every row is a cell (data point) and every column is a variable. That primary data frame contains 34.8 million records after excluding cells with open water or no-data values.
We used the scikit-learn package (version 0.24.2) for Python [38] to run the decision tree regressions. To avoid overfitting, we used a two-level incremental sampling method to evaluate the performance of the model. This sampling method also mitigates potential spatial autocorrelation. At the first level, the incremental sampling method randomly sampled a fraction of the data. The fractions started at 0.1% and, in 17 steps, reached 100% of the data. For each fraction, we performed the second level of sampling, in which we randomly sampled 80% of the data for training the regression model and used the remaining 20% for testing the results. This two-level sampling method therefore created a 17-step incremental training fraction that runs from 0.08% to 80% of the full dataset. The model performance was stable at 75-76% for sample sizes greater than five million (14% of the entire data; Figure 1).
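The two-level sampling scheme can be sketched in a few lines. The text does not state how the 17 fractions are spaced, so the geometric spacing below is an assumption, as are the function names and the fixed random seed. In practice each train/test split would be passed to a regressor such as scikit-learn's `DecisionTreeRegressor`; that step is omitted here so the sketch depends only on the standard library.

```python
import random


def incremental_fractions(start=0.001, stop=1.0, steps=17):
    """Sample fractions from `start` (0.1%) to `stop` (100%) in `steps` steps.

    Geometric spacing is an assumption; the source gives only the endpoints
    and the number of steps.
    """
    ratio = (stop / start) ** (1.0 / (steps - 1))
    fractions = [start * ratio ** i for i in range(steps)]
    fractions[-1] = stop  # guard against floating-point drift
    return fractions


def two_level_split(n_records, fraction, train_share=0.8, seed=0):
    """Level 1: draw a random fraction of record indices.
    Level 2: split that sample into training and testing indices.
    """
    rng = random.Random(seed)
    n_sample = min(n_records, max(2, round(n_records * fraction)))
    sample = rng.sample(range(n_records), n_sample)  # random order, no repeats
    n_train = round(n_sample * train_share)
    return sample[:n_train], sample[n_train:]
```

Plotting the test score against the size of each first-level sample is what reveals the point beyond which adding data no longer changes model performance.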

Validation of the Predictive Model
To examine the predictive power of our model, we separated the data of two cities in widely differing climate zones (Denver, CO, and Seattle, WA, USA) from our training dataset and used the model to correct the bias of NLCD-TC in these cities. We report and present the improved NLCD-TC error distributions and fitted curves.

Use Case: Running Corrected Data vs. Native NLCD-TC for Two Ecosystem Accounting Models
We evaluated the impact of correcting NLCD-TC on two ecosystem service models that depend on TC as a key input, which we ran for Denver and Seattle. These two models quantify (1) the amount of rainfall intercepted by urban trees and (2) the quantity of cooling energy savings that trees provide through urban heat mitigation [14]. We used three different datasets to show the sensitivity of such models to input tree canopy datasets. The input tree cover data include (1) NLCD-TC 2011 (30 m resolution), (2) corrected NLCD-TC from our predictive model (30 m resolution), and (3) high-resolution tree cover datasets for the two cities (1 m resolution). The interception model uses tree cover and year 2011 precipitation data to quantify rainfall interception by trees during daily storm events throughout the year, accounting for leaf-on and leaf-off seasons (with leaf area index values reduced during the leaf-off season). The heat mitigation impact model uses tree cover, buildings, surface temperature, and weather station data to estimate the cooling energy savings provided by trees [14]. We used the methods and terminology recommended by the United Nations for implementing ecosystem accounting, an internationally standardized approach to systematically quantify ecosystems and the services they provide to the economy. Along with tracking changes over time in ecosystem extent and condition, the System of Environmental-Economic Accounting Ecosystem Accounting (SEEA EA) quantifies, typically using models, the ecosystem services produced by specific ecosystems and used by economic units (businesses, households, and government), using a framework consistent with national economic accounting principles [39]. We used NLCD land cover data as a proxy for ecosystem types and stratified our outcomes using this layer to estimate the services that each ecosystem type provided.
In this context, we generally assumed that the high-resolution tree cover data will typically provide more accurate estimates for modeled ecosystem services, though a true accuracy assessment of ecosystem service model results would require calibration data that are seldom available at city and national scale.

Code Availability
We used Python to program the analysis procedure and generate a variety of figures; we include several key figures representative of our analysis in the Results section below. We included the Jupyter Notebook that contains the code and all figures in a code repository [40].

General Error Distribution
Our assessment of NLCD-TC 2011 against high-resolution derived TC data from 27 cities and counties shows a mean error of 9.9%, a mean absolute error of 14.9%, and an RMSE of 23.3. The Kolmogorov-Smirnov (K-S) statistic, which reports the distance between the NLCD-TC distribution function and the high-resolution derived TC data distribution, is 0.28. The NLCD-TC 2011 tends to underestimate overall tree cover in cities (Figure 2). Across individual cities, overall error distribution patterns and differences in error vary (Figure 3 shows patterns for nine cities; additional cities are shown in the code repository). The common pattern in most cities is a greater underestimation than overestimation of tree cover on a grid cell basis (positive error values). This pattern is more extreme in cities such as Denver, Boise, and New York.
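For readers who want to reproduce these summary statistics on their own data, the sketch below computes the mean error, mean absolute error, RMSE, and a two-sample Kolmogorov-Smirnov statistic from paired 30 m cell values. It is a plain-Python illustration with hypothetical function names, not the code from the repository; for large rasters a vectorized library routine would be the practical choice.

```python
import math
from bisect import bisect_right


def error_metrics(high_res, nlcd):
    """Return (mean error, mean absolute error, RMSE) for paired cell values.

    Error follows the text's convention: high-resolution derived TC minus NLCD-TC.
    """
    errors = [h - n for h, n in zip(high_res, nlcd)]
    n = len(errors)
    mean_error = sum(errors) / n
    mean_abs_error = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, mean_abs_error, rmse


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest vertical distance between the
    empirical cumulative distribution functions of `a` and `b`."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The maximum gap between two step functions occurs at an observed value.
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(fa - fb))
    return d
```

The mean error captures bias (systematic under- or overestimation), whereas the mean absolute error and RMSE also reflect cell-level scatter, which is why the three values reported above differ so much.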

Error Distribution across Different Landscape Characteristics
Multidimensional error analysis provides insight into the uncertainty of the original data products and the choice of appropriate explanatory variables to build a predictive model to reduce the tree canopy error in cities. In this section, we assess how NLCD-TC 2011 accuracy varies across the following variables: (a) EPA ecoregions, (b) built versus undeveloped areas, (c) urban density, (d) tree cover gradients, (e) cities, (f) NDVI, (g) surface temperature, and (h) building footprint coverage.
(a) Error distribution across EPA ecoregions: Average error showed considerable differences across EPA ecoregions. Figure 4 shows the EPA Level II ecoregions of the US. Cities in the warm deserts, Mediterranean, and Ozark/Ouachita-Appalachian forests regions (8-4, 10-2, and 11-1) have the lowest error, while those in the cold deserts and central plains (10-1, 8-1, and 8-2) have the largest error values.
(b) Error distribution across built versus undeveloped areas: Mean error is considerably higher in built areas (11.8%) than in undeveloped areas (6.4%). NLCD-TC 2011 underestimates tree cover consistently in built areas.
(c) Error distribution across urban density: Average NLCD-TC error also increases with urban density. The only exception to this pattern is that error in high-density areas is slightly less than in medium-density areas (Figure 5).
(d) Error distribution across tree cover gradients: In all cities, in cells with greater tree canopy cover, the average error is also larger (Figure 6). The slope of regression lines varies across cities. For instance, in Baltimore County, MD, and Austin, TX, the slopes are generally small, whereas in Denver, CO, and Boise, ID, the slope is relatively large. These differences reflect the average tree canopy area in different cities and climates.
(e) Error distribution by cities/counties: NLCD-TC error varies considerably among cities/counties (Figure 7). In all cities but Memphis, TN, NLCD-TC underestimates tree cover. Washington, DC, has the largest average error (21%).
(f, g, h) Error distribution by NDVI, surface temperature, and building footprint gradients: For areas with higher surface temperatures, which typically have less tree canopy cover, NLCD-TC error is smaller (Figure 8a). Two clusters of error emerge relative to surface temperature: one at lower temperatures (29-30 °C, with 15-18% error) and one at moderate temperatures (30-31 °C) centered around zero error. The second cluster likely indicates areas with understory vegetation and no tree cover. The NDVI scatterplot also shows an error cluster (approximately 15-20%) at higher NDVI levels (Figure 8b). This is most likely associated with the underestimation of tree cover. The building footprint coverage scatterplot shows a somewhat greater error in cells with less building coverage (Figure 8c). When building footprint area rises above 30%, the error is smaller. This indicates areas with more building coverage and less tree cover.

Predictive Model Performance
The R² (performance score) of the decision tree regression is 0.76 (Table 2). Besides NLCD-TC itself, which is by far the most important predictor, NLCD land cover, NDVI, and average precipitation are the strongest predictors. When we eliminated NLCD-TC from the predictors, NDVI and NLCD land cover became the strongest predictors, yielding a model with an R² of 0.68.
Table 2. Decision tree model outcomes (the importance/coefficient of explanatory variables).
The predictive model improves the estimate of tree canopy cover relative to the native (uncorrected) NLCD-TC product (i.e., the red smoothed line, representing corrected NLCD-TC, is closer to the high-resolution green smoothed line than the native NLCD-TC blue smoothed line, Figure 9). The model predicts 0% canopy cover cells very effectively. It also improves the prediction of cells with 100% canopy cover compared to the native NLCD-TC. However, it still slightly underpredicts them relative to the high-resolution derived tree canopy data, and it overestimates tree canopy cover at values of 1-20% and 90-99%. The corrected tree cover data have a better error distribution centered around zero (Figure 10). All metrics (mean error, mean absolute value of error, RMSE, and Kolmogorov-Smirnov test score) show that the predictive model has improved the accuracy of NLCD-TC in cities (Table 3).
[Table 3, surviving row: Kolmogorov-Smirnov score 0.25, 0.27]
Figure 9. Comparison of National Land Cover Database-Tree Cover, predicted tree cover, and high-resolution derived tree cover histograms; the smoothed lines show the normal density line; the Y axis shows the percentage of data counts.
Figure 10. Comparison of National Land Cover Database-Tree Canopy (NLCD-TC) and corrected tree cover error (deviation by % when compared to high-resolution derived data) distribution; the smoothed lines are the normal density lines.
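The four accuracy metrics named in this section can be computed as in the following sketch. It assumes that error is defined as high-resolution minus estimated tree cover and that the Kolmogorov-Smirnov score is the two-sample statistic comparing the distribution of estimated values with that of the high-resolution values; neither detail is confirmed here, and the sample values are invented.

```python
import bisect
import math

def error_metrics(reference, estimate):
    """Mean error, mean absolute error, and RMSE of estimate vs. reference."""
    errors = [r - e for r, e in zip(reference, estimate)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(x) for x in errors) / n,
        "rmse": math.sqrt(sum(x * x for x in errors) / n),
    }

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in set(a) | set(b))

# Hypothetical tree cover values (percent) for four cells.
highres = [30, 34, 4, 2]
native = [10, 20, 0, 0]
corrected = [28, 36, 6, 0]

native_metrics = error_metrics(highres, native)
corrected_metrics = error_metrics(highres, corrected)
ks_native = ks_statistic(highres, native)
ks_corrected = ks_statistic(highres, corrected)
```

In this toy sample the corrected values have zero mean error but a nonzero mean absolute error, which mirrors the distinction between removing bias and reducing per-cell error.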

Validation of the Predictive Model in Denver, CO, and Seattle, WA, to Correct NLCD-TC Bias
To test the performance of a predictive NLCD-TC model in two cities in widely different EPA ecoregions (Denver in the South-Central Semiarid Prairies and Seattle in the Marine West Coast Forests), we trained a model on a dataset that excludes these two cities. The predictive model improves the accuracy of NLCD-TC in both cities substantially. It performs well in reducing the number of zero-cover grid cells where NLCD-TC underestimates tree cover considerably in both cities (Figure 11, top). The predictive model also produces an even error distribution around zero (Figure 11, bottom). The average NLCD-TC error in Seattle is 11.9%, while that produced by the predictive model is 1.3%; these values for Denver are 5.3% and −1.5%, respectively. The predictive model thus reduced the underestimation of NLCD-TC in both cities, though it still slightly overestimates tree cover in Denver and underestimates it in Seattle (by 1.5% or less in both cases).
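The hold-out design described above can be sketched as follows. This is a toy, pure-Python regression tree trained on invented records, not the model or software used in the study; the two predictors (`nlcd_tc`, `ndvi`), the city labels "A" and "B", and all values are hypothetical. It shows only the mechanics: exclude the test cities from training, fit the tree, and compare native and corrected error in the excluded cities.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth=2):
    """Tiny regression tree; each leaf predicts the mean of its training targets."""
    leaf = {"value": sum(y) / len(y)}
    if depth == 0 or len(set(y)) == 1:
        return leaf
    best = None
    for f in range(len(X[0])):
        # Skipping the smallest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in X})[1:]:
            left = [yi for row, yi in zip(X, y) if row[f] < t]
            right = [yi for row, yi in zip(X, y) if row[f] >= t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has more than one distinct value
        return leaf
    _, f, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[f] < t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[f] >= t]
    return {
        "feature": f,
        "threshold": t,
        "left": fit_tree([r for r, _ in lo], [v for _, v in lo], depth - 1),
        "right": fit_tree([r for r, _ in hi], [v for _, v in hi], depth - 1),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]

# Hypothetical records: (city, [nlcd_tc, ndvi], highres_tc); values invented.
records = [
    ("A", [0, 0.2], 3), ("A", [10, 0.5], 25), ("A", [40, 0.7], 55),
    ("B", [0, 0.1], 1), ("B", [20, 0.6], 38), ("B", [60, 0.8], 72),
    ("Denver", [0, 0.3], 8), ("Seattle", [20, 0.6], 35),
]
held_out = {"Denver", "Seattle"}
train = [r for r in records if r[0] not in held_out]
test = [r for r in records if r[0] in held_out]

tree = fit_tree([x for _, x, _ in train], [y for _, _, y in train])

# Error = high-resolution minus estimate (assumed sign convention).
native_err = [y - x[0] for _, x, y in test]
corrected_err = [y - predict(tree, x) for _, x, y in test]
```

Because the held-out cities never enter `train`, `corrected_err` measures how well the correction transfers to cities the model has not seen.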
Figure 11. Distribution of tree cover (top) and error (deviation by % when compared to high-resolution derived data) (bottom) for National Land Cover Database-Tree Canopy (NLCD-TC), high-resolution, and corrected tree cover data in Seattle, WA, and Denver, CO; the smoothed lines are the normal density lines.

NLCD-TC Data Correction: Effects on Ecosystem Accounting Model Results
Denver and Seattle's urban forests have different tree cover and patch configurations, with both tree cover and patch size being larger in Seattle. The average tree patch sizes for Denver and Seattle are 133 m² and 193 m², respectively. Mean citywide tree cover at 30 m resolution in the native NLCD-TC is 4.5% in Denver and 16.2% in Seattle. As reported above, our predictive model improves NLCD-TC in both cities. In Seattle, the predictive model increased citywide tree cover at the 30 m cell level from 16.2 to 27.2%. In Denver, it increased citywide tree cover from 4.5 to 12.6% at the same resolution.
Gains obtained from using corrected or high-resolution tree canopy cover inputs relative to native NLCD-TC are complex and depend on the model. In Denver, rainfall interception was estimated at 0.8 million, 2.1 million, and 17.2 million m³ of rainfall when native NLCD-TC, corrected NLCD-TC, and high-resolution tree canopy data were used, respectively (Table 4). By contrast, the heat mitigation model was less sensitive to tree canopy input data, with estimated energy savings of 47,937, 51,289, and 59,140 MWh using native, corrected, and high-resolution data, respectively. In both models for Denver, the corrected NLCD-TC produces estimates closer to the high-resolution ones. However, the gains are incremental, and the corrected data still substantially underestimate rainfall interception.
In contrast, using corrected NLCD-TC brings ecosystem service values for Seattle much closer to those generated using high-resolution data (Table 4). Rainfall interception estimates using native NLCD-TC, corrected NLCD-TC, and high-resolution tree cover are 3.5, 5.4, and 6.0 million m³ of water, respectively. In other words, in Seattle, estimated rainfall interception increased from 58% of the high-resolution total using native NLCD-TC to 89% using the corrected version (accuracy gains were smaller for Denver). Corresponding values for energy savings are 34,428, 40,280, and 51,335 MWh. Stratifying the results by land cover type (as a proxy for ecosystem type) shows for which cities, ecosystem services, and ecosystem types the correction was most influential. Generally, the predictive model improved results over the native input tree cover data, though large gaps remained for a few combinations of ecosystem type and ecosystem service (e.g., energy savings in high-density developed areas of Seattle and overall results of the Denver rainfall interception model, Table 4). Overall, the corrected tree cover dataset is an important step forward in running more accurate ecosystem services and accounting models in urban areas, though more city-scale calibration data will be needed to assess model accuracy properly.
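To make the comparison across cities and models concrete, the sketch below expresses each native and corrected estimate as a percentage of the corresponding high-resolution estimate, using only the citywide totals quoted in the text. The dictionary layout and function name are illustrative choices. Because the inputs are the rounded published totals, the results can differ slightly from percentages computed from unrounded values (for example, 90% here versus the 89% reported for corrected rainfall interception in Seattle).

```python
# Citywide totals quoted in the text, ordered as (native, corrected, high-resolution).
# Rainfall interception is in million cubic meters; energy savings are in the
# energy units reported in the text.
estimates = {
    ("Denver", "rainfall interception"): (0.8, 2.1, 17.2),
    ("Denver", "energy savings"): (47937, 51289, 59140),
    ("Seattle", "rainfall interception"): (3.5, 5.4, 6.0),
    ("Seattle", "energy savings"): (34428, 40280, 51335),
}

def share_of_highres(native, corrected, highres):
    """Native and corrected estimates as a percentage of the high-resolution one."""
    return round(100 * native / highres, 1), round(100 * corrected / highres, 1)

shares = {key: share_of_highres(*vals) for key, vals in estimates.items()}

for (city, service), (native_pct, corrected_pct) in shares.items():
    print(f"{city:8s} {service:22s} native {native_pct:5.1f}%  corrected {corrected_pct:5.1f}%")
```

Read this way, the Denver rainfall interception estimates stay far below the high-resolution total even after correction, while both Seattle estimates and the Denver energy savings estimate move much closer to it.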

Discussion
As a nationwide dataset, NLCD-TC is the only available product for tracking urban tree cover throughout every city in the US. However, its relatively coarse resolution makes its accuracy and precision questionable in urban areas, especially where landscapes are heterogeneous and have a wide variety of cover types. This analysis sought to more systematically assess NLCD-TC error and possible error correction methods, in order to help establish the usability of NLCD-TC for urban forest analysis applications. While a series of papers evaluated errors associated with the first (2001) version of the NLCD-TC dataset [16,17,25], we are unaware of similar analyses for the two more recent NLCD-TC datasets. Past studies (e.g., [16]) relied on a limited set of observations and called for more research into the magnitude of NLCD-TC uncertainty.
We leveraged the availability of newer validation datasets and analytical techniques to better understand tree canopy error in heterogeneous urban environments and evaluate approaches to its correction and use in urban modeling. Access to high-resolution data for multiple cities enabled us to train a decision tree algorithm on over 35 million sample points to more accurately estimate tree cover. We suggest that a machine learning algorithm can have two major benefits, by (1) improving the accuracy of the native dataset and (2) producing a better error distribution centered around zero across urban landscapes (i.e., with better geographic balance). Our study also shows the value of additional datasets (e.g., NDVI, climate, and building data) that can potentially be useful in generating more accurate future NLCD-TC layers.
We found that NLCD-TC 2011 error is not evenly distributed across different geographies, particularly in highly heterogeneous environments like cities where 30 m grid cells often encompass multiple land cover types. Stratifying the error across different characteristics, such as density gradients, built versus undeveloped land, and ecoregions, showed that NLCD-TC tends to have a larger error (underestimation) in medium- and high-density urban areas. In other words, the error distribution is skewed to the right. The high frequency of zero error is also reasonable because NLCD-TC predicted so many cells correctly. This finding aligns with those of previous analyses [16,17,25] and also with a recent study showing that NLCD impervious cover is typically overestimated in cities [41]. We suspect that this error could be due to (1) shadowing from buildings in urban areas, which causes noise in the remotely sensed data; (2) more heterogeneous surfaces, which result in a larger error when remotely sensed data are used for detecting tree canopies; and (3) the fact that in cities, individual tree crowns are often isolated and small enough that they might not be detected in the classification of 30 m resolution pixels. We also evaluated NLCD-TC data for 2016 and found them to have an error distribution and bias similar to those of the 2011 data, which we expected since both were produced with the same process.
Given current data availability, there is an opportunity to build more sophisticated models that incorporate urban heterogeneity to more accurately predict tree canopy cover in cities. High-resolution land cover data are a cost-effective way to produce an intensified sample at a spatial resolution capable of capturing the heterogeneity inherent in urban landscapes. By combining high-resolution land cover with diverse datasets in a decision tree model, we reduced the mean error from 8.1 to 0% and the mean absolute error from 13.5 to 10.6%, with an R² value of 0.77. We found that climatic variables such as temperature and precipitation and built environment variables such as buildings and development density could improve the tree cover accuracy marginally but not considerably. Including such variables in future algorithms is recommended.
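One property of regression trees may help explain why the mean error can fall to zero while the mean absolute error falls only modestly. The following short derivation assumes, without confirmation from the text, that the tree is fit with a squared-error criterion, so that each leaf $\ell$ predicts the mean of the training targets $y_i$ in its sample set $S_\ell$; the symbols are introduced here for illustration only.

```latex
\hat{y}_\ell = \frac{1}{|S_\ell|} \sum_{i \in S_\ell} y_i
\quad\Longrightarrow\quad
\sum_{i \in S_\ell} \bigl( y_i - \hat{y}_\ell \bigr) = 0 \ \text{ for every leaf } \ell
\quad\Longrightarrow\quad
\frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \hat{y}_{\ell(i)} \bigr) = 0 .
```

Under that assumption, the residuals cancel within every leaf of the training sample, so the bias vanishes by construction, whereas the mean absolute error $\frac{1}{n}\sum_i |y_i - \hat{y}_{\ell(i)}|$ depends on the spread of values within each leaf and carries no such guarantee. On data not used for training, the mean error would be expected to be close to, but not exactly, zero.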
This accuracy assessment informs future studies about the usability of NLCD-TC data for ecosystem assessment models in urban contexts. NLCD-TC data might not be an appropriate input for a model that cannot tolerate a 10-15% underestimation of tree cover. In such cases, if high-resolution data are not available, our correction method may be helpful. In our application of native, corrected, and high-resolution tree canopy data to urban ecosystem accounting models, we found these models to be highly sensitive to the quality of urban tree canopy layers. Understanding the impact of tree canopy data quality on a given model can be complex since these models often use nonlinear relationships, and the relationship between the built environment and ecosystems is complex in cities [42]. For example, in our heat mitigation model, energy savings will be realized if trees are located close to buildings. If tree input data underestimate tree cover in cells with buildings, then the results would be substantially affected. In this case, the accuracy of values in cells without buildings would not matter. By contrast, the rainfall interception model quantifies interception by trees regardless of their location in a city; interception was underestimated more strongly in a city with a smaller and more dispersed canopy (Denver) than a larger and more connected one (Seattle). These points highlight the importance of understanding the spatial distribution of error.
Researchers relying on tree canopy data in cities should be aware of six points raised by our study. First, as we show in Section 3.2, error rates are higher in certain ecoregions, urban density classes, and cities than in others. This makes efforts to correct tree canopy data more important in some places than others. Second, when evaluating tree cover in a region that includes developed and undeveloped land uses, error and uncertainty will be distributed unevenly across the landscape. Researchers should consider an average underestimation error of 5% in undeveloped areas and 11% in developed areas as a reasonable expectation when using the native (uncorrected) NLCD-TC products for urban and regional scale analysis. Third, since there is also a systematic error in NLCD impervious surface data [41], researchers using models that take both NLCD-TC and impervious data as inputs should be aware of the potentially interactive effects of the respective errors of these datasets. Fourth, while our predictive model removed bias in the data, the mean absolute error in urban areas improved only modestly (from 13.5 to 10.6%), so high-resolution data will remain preferable when available. Fifth, if high-resolution data are available for part but not all of a study area, it may be useful to use the high-resolution data to train an algorithm to correct NLCD-TC in the rest of the study area. Finally, urban models often benefit from being run at a higher resolution than 30 m [43,44], though this may not always be possible in large-scale comparative studies.
Machine learning algorithms offer the potential to improve the mapping of urban areas [45], including estimates of tree cover elsewhere in the world, particularly in heterogeneous environments like cities and in data-poor regions. Datasets derived from satellites like Landsat 8 and the Sentinel program can provide the basis for such studies. Adding other data for population or housing density or building footprints (now available for an increasing number of countries, including Australia, Austria, Canada, Germany, Tanzania, Uganda, and the U.K. [46][47][48]) could improve estimates of tree cover. Opportunities may, thus, exist to improve tree canopy cover estimates in cities for other parts of the world using methods similar to ours, improving on global [49] or continental-scale [50] tree canopy cover datasets. Now that more extensive climatic (e.g., temperature and precipitation), socio-economic (e.g., housing density), and built environment data are available, machine learning algorithms such as random forests can be used to combine such data to build localized models. Such corrected datasets may provide a more accurate view of urban processes that depend on tree canopy cover data as key model input, including urban climate, climate resiliency, and air and water quality. Our work shows how understanding the error distribution of tree canopy cover data and applying methods to improve its accuracy can improve urban ecosystem accounting models, with examples for multiple US cities and model types.
The most notable limitation of our study was that the city-level, high-resolution land cover datasets that we used for validation were not evenly distributed across the US, nor were they all available for the same year. A greater number of our samples were located in the East, particularly in the mid-Atlantic region. Our evaluation would be more complete if we had access to high-resolution data for more major cities in the Plains, Intermountain West, and Pacific Coast regions. We believe our model could be improved using cities more fully representative of diverse climatic regions, particularly with more examples from hotter and drier regions. High-resolution land cover data for some cities come from different years than 2011 (Table 1). Given the need to incorporate data from cities spanning as large a gradient as possible of climate/ecological zones as well as city size and age, and because time series of high-resolution data do not frequently exist for cities, we included data for a range of years. For 63% of our cities, that range is ±1 year from 2011, for 78% of cities the range is ±3 years from 2011, and for 96% of cities the range is ±4 years from 2011. Though this may produce some error, annual tree cover change in cities is often small, averaging 1% nationally over a five-year period though occasionally being as high as 0.9% [51] to 2% [52] per year. Future work could also evaluate uncertainty datasets that are produced alongside the NLCD-TC data.

Conclusions
In the absence of a nationwide dataset to characterize urban forests, the NLCD urban tree canopy layer is often used by default to map urban trees where no alternative high-resolution data exist or in cases where multi-city, regional, or national comparisons are needed. Our analysis sought to assess the accuracy of NLCD-TC in urban contexts and whether it can be improved through a multivariate modeling approach. Through validation using high-resolution land cover datasets, our study shows how the error of the US NLCD-TC 2011 is distributed in heterogeneous urban environments. This work and our subsequent predictive model can be useful in improving tree canopy cover estimates to track changes over time in urban tree canopies and associated ecosystem services in the absence of high-resolution data. This may be particularly useful in cities lacking the resources for periodic monitoring, whether small cities in developed nations like the US, or in the developing world, where approaches analogous to ours could be developed and applied.

Funding:
We acknowledge the support of the NASA Biodiversity and Ecological Forecasting Program (grant no. 80NSSC18K0341) for Heris' time. Support for Bagstad's time was provided by the US Geological Survey Land Change Science Program. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.