Small Field Plots Can Cause Substantial Uncertainty in Gridded Aboveground Biomass Products from Airborne Lidar Data

: Emerging satellite radar and lidar platforms are being developed to produce gridded


Introduction
Global estimates of the aboveground woody biomass density (AGB) and the changes in AGB over time are important for understanding how climate and land use change affect the global carbon cycle [1].Multiple current or upcoming spaceborne missions will support the production of global or near-global gridded AGB products.Notably, AGB estimation is an objective for the Global Ecosystem Dynamics Investigation (GEDI) lidar instrument on the International Space Station [2], the Biomass P-band synthetic aperture radar (SAR) satellite [3], and the NASA-ISRO Synthetic Aperture Radar (NISAR) L-and S-band SAR satellite [4].Together, data from these missions are intended to improve our understanding of AGB carbon stocks and fluxes.
Broad-scale gridded AGB products based on remotely sensed data are estimates produced by models that are calibrated using field data.Among these field data, generally, individual tree size and species observations from forest plots are used to predict biomass using allometric models and are then aggregated to estimate plot-level average AGB.However, the spatial resolution of forest plots is often smaller than the spatial resolution of gridded AGB products, which is constrained by instrument and sampling properties.For example, GEDI, Biomass, and NISAR will support gridded AGB products at 100 ha, 4 ha, and 1 ha, respectively [2][3][4].These missions will rely largely on contributed plot data collected for other efforts from different vegetation types globally, which are often less than 1 ha in size.Geolocation errors in plots and satellite pixels and the side-looking configuration of radar imagery can cause substantial errors in relating ground measurements of biomass to lidar and radar observations directly [5,6].Therefore, the direct use of field plots for the calibration and validation of spaceborne missions may introduce large uncertainty to both local predictions and the predicted regional spatial distribution of AGB [7,8].
To address this challenge, NISAR will use airborne laser scanning (ALS) data to scale up field data from smaller, sparse field plots and create 1 ha resolution, landscape-scale AGB estimates that can be used for the calibration of biomass algorithms and validation of NISAR's AGB estimates [9].In this approach, fine-resolution (<1 m point spacing) ALS data, as the so-called "gold standard" remote sensing technology for AGB estimation, are combined with the ground-estimated AGB in a model to estimate the AGB at the resolution of forest plots across the entire landscape [10].ALS-based AGB estimates are then aggregated to provide estimates of the AGB in 1 ha grid cells to match the NISAR AGB product pixel size [11][12][13].This process removes the large error associated with the direct use of plot data caused by mismatches in the size and location of plots with respect to radar imagery.However, this process also introduces additional sources of uncertainty related to the ALS-based model used to predict AGB and the resampling of ALS AGB pixels to 1 ha grid cells [14 -16].In particular, multiple studies have found that uncertainty in AGB-ALS models decreases with increasing plot size, as some sources of error become less important, including spatial mismatches between tree crowns and trunks, and plot/tree geolocation errors [8,12,17,18].
To estimate uncertainty in AGB predictions from NISAR, it is necessary to rigorously account for all sources of errors and to propagate uncertainty from ground plots to ALS and to radar-based estimations of biomass.NISAR's biomass Algorithm Theoretical Basis Document and Calibration and Validation ("CAL/VAL") Plan describes the process of the ALS-assisted calibration and validation of the NISAR AGB algorithm [9,19].
In this paper, we develop a workflow to estimate the 1 ha pixel-level uncertainty in AGB predictions associated with scaling AGB estimates from small field plots to larger landscapes with the aid of ALS data, an intermediate step in the NISAR CAL/VAL methodology.We apply a modified version of the methodology outlined in McRoberts et al. to the practical example of producing reference AGB maps, and associated uncertainty maps, using a set of 13 sites in temperate forests across four ecoregions of North America [20].To properly characterize uncertainty, we also examine: (1) how uncertainty can be reduced by using site-specific or multi-site models associated with similar vegetation types in terms of structure and species and (2) the effect of plot size on the model prediction accuracy.To achieve this goal, we focus on incorporating uncertainty associated with prediction uncertainty and residual variability [20].Prediction uncertainty is due to the presence of incomplete or imperfect information in the sample data used to train and evaluate models, and residual variability is associated with the distribution of training data around the model predictions.
prediction accuracy.To achieve this goal, we focus on incorporating uncertainty associated with prediction uncertainty and residual variability [20].Prediction uncertainty is due to the presence of incomplete or imperfect information in the sample data used to train and evaluate models, and residual variability is associated with the distribution of training data around the model predictions.
. More detailed information about each study site, including latitude/longitude, is provided in Table 1.More detailed information about each study site, including latitude/longitude, is provided in Table 1.
Table 1.Study sites with the National Ecological Observation Network (NEON).NEON plots were measured for woody vegetation structure on a rotating basis, with a subset of plots at each site measured each year, resulting in a range of years for those data per site.

Field Estimates of AGB
We estimated the AGB in NEON plots using woody plant vegetation structure data (DP1.10098)from field surveys [24].We used all plots with recorded tree and sapling data.NEON uses two types of field plots for woody vegetation structure, "distributed plots" and "tower plots".Both distributed and tower plots are nominally 40 × 40 m in size (0.16 ha), but distributed and tower plots have different subplot sizes where trees are actually measured.In the distributed plots, all trees ≥10 cm in diameter at breast height (DBH) were measured (diameter, species, and stem location) in only one 20 × 20 m (0.04 ha) subplot, while in tower plots, trees are measured in either one or two subplots of 0.04 ha, depending on the vegetation type.At the LENO site, we performed additional fieldwork to measure trees in all four 0.04 ha subplots.To ensure consistency in the sampled area across plot types, all analyses with NEON plot data used 0.04 ha subplots.
Small trees (1 cm ≤ DBH < 10 cm) were also measured in nested subplots large enough to include at least 20 individuals, or throughout the entire plot when small trees were sparse.Our analysis included trees that were ≥2.5 cm DBH, and we estimated the AGB density of small trees by assuming that nested subplots were representative of their plots.
We also estimated the AGB in two ForestGEO megaplots at SCBI and SERC that were 25.6 and 16 ha in size, respectively.In ForestGEO plots, all trees that were ≥1 cm DBH were measured throughout the plot area.We estimated the AGB density in gridded subplots of either 0.04 ha or 1.0 ha, and also included trees that were ≥2.5 cm in our analysis.
For each measured tree, we used allometric models to predict the AGB from the DBH and species the 'allodb' package in R.This package predicts AGB using multiple allometries and weights allometric model predictions based on the geographic location of plots [25].We chose this package because it included allometries based on species included in 24 extratropical megaplots in North America, Europe, and Asia, and allowed consistent AGB predictions across other extratropical NISAR calibration/validation sites.All AGB predictions from ground measurements and allometries were assumed to have no uncertainty; previous studies have found that the effects of allometric model prediction uncertainty are negligible relative to other sources of variability [26,27].

ALS Data
To scale AGB estimates from field plots to landscapes, we used small-footprint ALS data from the NEON Airborne Observation Platform (AOP).The NEON AOP carries an Optech Gemini ALS sensor and collects data over at least 10,000 ha at each site, with a point density of at least 5.2 pts m −2 (Table 1).For most sites, we used the most recent available ALS data at the time of analysis.For LENO, ORNL, and TALL, we used the ALS data closest in time to the UAVSAR campaigns detailed elsewhere in this special issue.We used the NEON-classified ALS data and excluded points classified as noise (DP1.30003.001)[28].
We processed ALS data into gridded canopy height models using the 'lidR' package in R [29,30].First, we calculated the normalized height above ground for each point with the 'normalize_height' function using the k-nearest neighbor approach with inverse-distance weighting.Next, we predicted canopy height with the 'grid_canopy' function at a 1 m resolution, using the points-to-raster method to select the tallest ALS return in each pixel with a 0.2 m buffer around the return locations.Last, we predicted the mean canopy height (MCH) as the average of the canopy height values within a larger area, either 0.04 ha plot boundaries or gridded across the landscape at 0.04 ha spatial grain.

AGB-ALS Model
We used a power function to describe the relationship between the plot-estimated AGB and forest structure (MCH) from ALS data: where a 1 and b 1 are the parameters to be estimated and ∼ N 0, σ 2 is a random residual term.The power function was fit with the least-squares approach using the 'nls' function in R and the Gauss-Newton algorithm.Models were fit using weighted least squares because the AGB prediction residuals were heteroscedastic [20].The weights were the inverse of the predicted residual variance, where residual variance was predicted using the procedure outlined in [20] for estimating heteroscedastic residual variance: 1.
First, the plots were ordered by their MCH and sorted into equal-sized groups of at least 10 plots each; 2.
Next, the average MCH and biomass residual variance were calculated for each group; 3.
Finally, a model was fit describing the increase in residual variance as a power function of average MCH: where σ 2 RV is the group average residual variance, ---MCH is the group average MCH, a 2 and b 2 are the parameters to be estimated, and δ ∼ N 0, σ 2 is a random residual term.
The model prediction of σ 2 RV was used to estimate the residual variance for each sample unit (training data) observation.
As weights depended on the fitted model, we used an iterative approach, first fitting Equation (1) without weights and then iteratively refitting using new weights until the estimates of parameters a 1 and b 1 both varied by less than 1% across five successive iterations.
We fit a single model first, hereafter the "general model", using AGB and MCH data from all plots at all sites.Next, we fit ecoregion-specific and site-specific models by using the relationship between MCH and AGB for all plots within an ecoregion and site, respectively.We evaluated the initial model fit at each site by reporting the root-mean-square error (RMSE) and R 2 of Equation (1) using 10-fold cross-validation-plots were randomly divided into 10 groups and the AGB for each plot was predicted from a model trained only on the other nine groups.All subsequent calculations using Equation ( 1) residual values were performed using each plot's residual value from 10-fold cross-validation.

Predicted Gridded Biomass across Landscapes
We predicted AGB at the landscape scale for each site by applying Equation ( 1), parameterized with field plot data, to gridded ALS-derived MCH data of the same spatial resolution (0.04 ha).We produced three maps of predicted AGB for each site by applying the general model, the relevant ecoregion-specific model, and the site-specific model [31][32][33].We predicted AGB at the 1 ha spatial resolution of the NISAR biomass product by first applying the Equation (1) estimated parameters to 0.04 ha resolution pixels (Tables S1 and S2) and then using the 'aggregate' function in the 'raster' package, taking the mean value of 25 contiguous pixels [34].

Estimating AGB Uncertainty at 1 ha Resolution
The variance (uncertainty) of the 1 ha gridded mean AGB was estimated using three separate variances associated with the prediction uncertainty, residual variability, and residual spatial autocorrelation: where σ2 Total is the total estimated uncertainty, σ2 PU is the prediction uncertainty, σ2 RV is the residual variability, and σ2 RSA is the uncertainty associated with the residual spatial autocorrelation.
Prediction uncertainty.We estimated the prediction uncertainty using a "wild bootstrapping" approach [31,32], where the value for each field plot AGB observation was resampled for each bootstrap iteration: where y i,j is the resampled AGB value for plot i and iteration j, ŷi is the predicted AGB value for plot i from Equation (1), ε i is the residual value for plot i from Equation (1), and ν i,j is randomly selected from a standard normal distribution with mean 0 and standard deviation 1 for each plot i and iteration j.Iterations continued until the standard deviation in the mean estimated landscape AGB across all iterations varied by less than 0.5% compared with the standard deviation of the mean estimated landscape AGB calculated for the preceding 50% of iterations [33].An example of the distribution of model parameter estimates from bootstrapping is provided in Figure S1.For each 1 ha pixel, σ2 PU was estimated as the variance in the predicted biomass (at 1 ha spatial grain) across all bootstrap iterations.
Residual uncertainty.The residual variance was calculated for each 0.04 ha pixel ( σ2 RV,0.04 ha ) from the MCH using the approach from [20] and described above for estimating residual variance during iterative weighted model fitting (Equation (2); Figures S2 and S3).The residual variance of AGB for each 1 ha grid cell σ2 RV ) was then estimated as: where n = 25, the total number of 0.04 ha pixels within each 1 ha pixel.
Residual spatial autocorrelation uncertainty.We estimated the variance from residual spatial autocorrelation using a correlogram constructed from observed model residual values.We constructed the empirical correlogram using two steps.First, we used the 'correlog' function from the 'ncf ' package in R to estimate point values describing the correlation of model residuals for plots at different distances apart, with 20 m distance increments to match the spatial grain of the field plot data [35].Next, we fit a spline with 3 degrees of freedom to the empirical correlogram data so that the spatial correlation could be predicted at any distance (Figure S4).Owing to the limited number of field plots and the amount of data required to estimate the spatial autocorrelation, we constructed one empirical correlogram from the observed residual values from the general model for all plots and sites combined; we assumed that the magnitude and distance of the spatial autocorrelation did not vary among sites.We estimated the residual spatial autocorrelation for each 1 ha pixel ( σ2 RSA ) by considering the sum of the products of the residual standard deviations and spatial correlations for each pair of 0.04 ha subpixels: σRV,0.04 ha i σRV,0.04ha j ρij (6) where ρij is the correlation between the residuals for the pairs of pixels predicted from the distance between the pixels using the empirical correlogram and n = 25, the total number of 0.04 ha pixels within each 1 ha pixel.

The Effect of Plot Size on AGB Uncertainty
To explore the relationship between field plot size and uncertainty in the 1 ha resolution gridded AGB products, we leveraged the SCBI and SERC megaplots that allowed us to fit AGB-ALS models at multiple resolutions-we compared the 1 ha pixel-level estimates of uncertainty for models scaled up from 0.04 ha subplots and 0.25 ha subplots, and for models directly fit with 1 ha subplots.As the SCBI and SERC megaplots contained no subplots smaller than 100 Mg ha −1 AGB at 1 ha resolution, we added additional simulated smaller-biomass data by randomly selecting MCH values between 0 and 20 m and AGB predictions based on the AGB-MCH model (Equation ( 1)) and the residual variance model (Equation ( 2)), assuming that the models fit to real plot data would predict the relationship for smaller MCH subplots (Figure S5).Ten hectares of additional plot data were simulated for each resolution, keeping the total plot area constant across models of different plot sizes.For each plot size, we calculated estimates of the AGB and associated uncertainty at a 1 ha resolution as described above.When estimating models directly from 1 ha field plots, the pixel-level uncertainty estimates did not include a contribution from residual spatial autocorrelation.

Plot-Based Biomass Estimates
We used 529 plots of 0.04 ha from 11 sites (excluding megaplot sites SCBI and SERC) to compare the performance of general, ecosystem-specific, and site-specific AGB-ALS models, and to demonstrate our upscaling approach.The plot-based biomass estimates ranged from 0.01 to 827.2 Mg ha −1 .The large biomass range was mostly due to the small size of the plots and the diversity of sites across temperate broadleaf and mixed forests with greater biomass (ORNL and LENO) and the boreal (DEJU) and temperate conifer (OSBS) sites with smaller biomass (Table 1; Figure 2).models, and to demonstrate our upscaling approach.The plot-based biomass estimates ranged from 0.01 to 827.2 Mg ha −1 .The large biomass range was mostly due to the small size of the plots and the diversity of sites across temperate broadleaf and mixed forests with greater biomass (ORNL and LENO) and the boreal (DEJU) and temperate conifer (OSBS) sites with smaller biomass (Table 1; Figure 2).

Plot-Based Model Performance
The general model, describing the relationship between the AGB and MCH in 0.04 ha inventory plots for all sites, had RMSE = 68.4Mg ha −1 using 10-fold cross-validation.
The relationships between AGB and MCH varied among ecoregions (Figure 2).The ecoregion-specific AGB-MCH models reduced the RMSE for all ecoregions (Table 2).However, the magnitude of RMSE reduction varied among ecoregions, from <2% for the grassland and savanna ecoregions to 20% for boreal forests.

Plot-Based Model Performance
The general model, describing the relationship between the AGB and MCH in 0.04 ha inventory plots for all sites, had RMSE = 68.4Mg ha −1 using 10-fold cross-validation.
The relationships between AGB and MCH varied among ecoregions (Figure 2).Ecoregion-specific AGB-MCH models reduced the RMSE for all ecoregions (Table 2).However, the magnitude of RMSE reduction varied among ecoregions, from <2% for the grassland and savanna ecoregions to 20% for boreal forests.The relationships between AGB and MCH also varied among sites (Figure 3).At the site level, the ecoregion-specific or site-specific model RMSE differed from the general model RMSE by up to 67% for the NIWO site.However, more specific models did not universally reduce the site-level RMSE-depending on the site, either the general, ecoregionspecific, or site-specific model had the smallest RMSE (Table 3).The relationships between the AGB and MCH also varied among sites (Figure 3).At the site level, the ecoregion-specific or site-specific model RMSE differed from the general model RMSE by up to 67% for the NIWO site.However, more specific models did not universally reduce the site-level RMSE-depending on the site, either the general, ecoregion-specific, or site-specific model had the smallest RMSE (Table 3).Table 3. Site-scale performance of models relating AGB and MCH for 0.04 ha inventory plots.All RMSE values were calculated using 10-fold cross-validation, and are presented in units of AGB and as a percentage of the mean AGB of the respective site.

One-Hectare-Resolution Estimates of Uncertainty
To characterize how pixel-level uncertainty in the 1 ha gridded AGB estimates varied across sites, among models (general, ecoregion-specific, or site-specific), and for different AGB values, we estimated the average pixel-level uncertainty for each site and model for all pixels in three different AGB ranges chosen to include the AGB range of NISAR's AGB validation requirement (≤100 Mg ha −1 ): 0-20 Mg ha −1 , 40-60 Mg ha −1 , and 80-100 Mg ha −1 .For all sites and models, the total uncertainty increased in magnitude with increasing predicted AGB, from 1.1-12.8Mg ha −1 for pixels with AGB ≤ 20 Mg ha −1 , 6.1-28.3 for pixels with AGB 40-60 Mg ha −1 , and 7.8-33.3for pixels with AGB 80-100 Mg ha −1 (Figure 4).The total uncertainty was smallest for the site-specific models in conifer forest sites, for the ecoregion-specific models in temperate broadleaf/mixed forest sites and boreal forest sites, and for the general model in temperate grassland/savanna sites (Figure 4).
We also characterized the relative contribution of each uncertainty component (prediction uncertainty, residual variability, and spatial autocorrelation in residuals) to the total model uncertainty (Figure 5).Residual spatial autocorrelation dominated the total model uncertainty for the general model across all levels of predicted AGB (Figure 5a,d,g), contributing ~70-90% of the total uncertainty; residual variability had the second-largest contribution to the total uncertainty for the general model.The prediction uncertainty was relatively more important for pixels with a smaller predicted AGB in ecoregion-specific models, contributing up to ~80% of the total uncertainty for temperate broadleaf/mixed forest sites and ~50% for temperate grassland/savanna sites (Figure 5b,e,h).The relative contributions among uncertainty components were more variable for site-level models across the range of predicted AGB, although residual variability was consistently smaller than ~50%, while prediction uncertainty and residual spatial autocorrelation could reach up to ~100% and ~80%, respectively, depending on the site (Figure 5c,f,i).

Plot Size Effects on Gridded AGB Uncertainty
The pixel-level prediction uncertainty, residual uncertainty, and residual spatial autocorrelation uncertainty were all greater for the models estimated from 0.04 ha plots than for the models estimated from 0.25 ha plots and 1 ha plots (Figure 6).The prediction uncertainty and residual uncertainty were the smallest for 0.25 ha plots, but the 1 ha plots had the smallest total pixel-level uncertainty due to the lack of contribution from residual spatial autocorrelation uncertainty.For pixels with AGB 80-100 Mg ha −1 AGB at SCBI and SERC, the average total uncertainty ( σTotal ) for AGB estimates for 1 ha plots was 10.1 Mg ha −1 , 45% less than the average for 0.04 ha plots (18.2 Mg ha −1 ); the average for AGB estimates for 0.25 plots (10.9 Mg ha −1 ) was 40% less than that for 0.04 ha plots.We also characterized the relative contribution of each uncertainty component (prediction uncertainty, residual variability, and spatial autocorrelation in residuals) to the total model uncertainty (Figure 5).The residual spatial autocorrelation dominated the total model uncertainty for the general model across all levels of predicted AGB (Figure 5a,d,g), contributing ~70-90% of the total uncertainty; residual variability had the secondlargest contribution to the total uncertainty for the general model.The prediction uncertainty was relatively more important for pixels with a smaller predicted AGB in ecoregionspecific models, contributing up to ~80% of the total uncertainty for temperate broadleaf/mixed forest sites and ~50% for temperate grassland/savanna sites (Figure 5b,e,h).The relative contributions among uncertainty components were more variable for site-level models across the range of predicted AGB, although residual variability was consistently smaller than ~50%, while prediction uncertainty and residual spatial autocorrelation could reach up to ~100% and ~80%, respectively, depending on the site (Figure 5c,f,i).

Plot Size Effects on Gridded AGB Uncertainty
The pixel-level prediction uncertainty, residual uncertainty, and residual spatial autocorrelation uncertainty were all greater for the models estimated from 0.04 ha plots than certainty and residual uncertainty were the smallest for 0.25 ha plots, but the 1 ha plots had the smallest total pixel-level uncertainty due to the lack of contribution from residual spatial autocorrelation uncertainty.For pixels with AGB 80-100 Mg ha −1 AGB at SCBI and SERC, the average total uncertainty ( ̂ ) for AGB estimates for 1 ha plots was 10.1 Mg ha −1 , 45% less than the average for 0.04 ha plots (18.2 Mg ha −1 ); the average for AGB estimates for 0.25 plots (10.9 Mg ha −1 ) was 40% less than that for 0.04 ha plots.

Discussion
We demonstrated that the pixel-level total uncertainty of gridded AGB products estimated from field plots and ALS data could be substantial compared with predicted AGB values.When ALS was used as an intermediate step in gridded spaceborne AGB product calibration, the uncertainty in ALS and plot-based AGB influenced the uncertainty of the resulting spaceborne product as well.
In this analysis, relatively large uncertainty could be attributed, in part, to the sparse and small field plots.Other studies have also demonstrated that the RMSE in AGB-MCH models decreases with increasing plot area [8,12,17].While small plots may be efficient and sufficient for other research purposes in forest ecology, larger plots are preferable for reducing uncertainty in model-based remote sensing products, particularly those from spaceborne sensors.
To demonstrate this point, we used the examples of the SCBI and SERC megaplots to develop AGB-MCH models from different plot sizes.We show, in Figure 6, that field plots ≥ 0.25 ha in size are necessary to reduce the uncertainty of lidar-estimated AGB and allow the estimates to be useful for the calibration and validation of other sensors, such as NISAR.The relationship between plot size and lidar model prediction uncertainty likely varies among ecoregions, and plots < 0.25 ha (but larger than 0.04 ha, based on our results) may be acceptable in ecoregions with lower biomass and/or higher stem density of trees (e.g., boreal forests).It can be challenging to know a priori what plot size is needed in a particular context, and ≥0.25 ha is a conservative choice that is likely to be sufficient in most systems.Our recommendations are consistent with the Committee on Earth

Discussion
We demonstrated that the pixel-level total uncertainty of gridded AGB products estimated from field plots and ALS data could be substantial compared with predicted AGB values.When ALS is used as an intermediate step in gridded spaceborne AGB product calibration, the uncertainty in ALS and plot-based AGB will influence the uncertainty of the resulting spaceborne product as well.
In this analysis, relatively large uncertainty could be attributed, in part, to the sparse and small field plots.Other studies have also demonstrated that the RMSE in AGB-MCH models decreases with increasing plot area [8,12,17].While small plots may be efficient and sufficient for other research purposes in forest ecology, larger plots are preferable for reducing uncertainty in model-based remote sensing products, particularly those from spaceborne sensors.
To demonstrate this point, we used the examples of the SCBI and SERC megaplots to develop AGB-MCH models from different plot sizes.We show, in Figure 6, that field plots ≥ 0.25 ha in size are necessary to reduce the uncertainty of lidar-estimated AGB and allow the estimates to be useful for the calibration and validation of other sensors, such as NISAR.The relationship between plot size and lidar model prediction uncertainty likely varies among ecoregions, and plots < 0.25 ha (but larger than 0.04 ha, based on our results) may be acceptable in ecoregions with lower biomass and/or higher stem density of trees (e.g., boreal forests).It can be challenging to know a priori what plot size is needed in a particular context, and ≥0.25 ha is a conservative choice that is likely to be sufficient in most systems.Our recommendations are consistent with the Committee on Earth Observation Satellites aboveground woody biomass working group recommendation of plot sizes of ≥0.25 ha in higher-biomass forests [18].
The model (general, ecoregion-specific, or site-specific) that most reduced the total pixel-level uncertainty varied between sites in this analysis.For example, for all conifer forest sites, the site-specific AGB predictions had less total uncertainty than either the ecoregion-specific or general AGB predictions (Figure 4).However, for the temperate broadleaf/mixed forest sites, total AGB prediction uncertainty was the lowest for the ecoregion model (Figure 4).This occurred because there was more variation in the AGB-MCH relationships among sites for boreal forest sites than for temperate broadleaf/mixed forest sites (Figure 3), resulting in greater residual variance for the boreal ecoregion model compared with the broadleaf/mixed model (Figure S2).When structural differences among sites are less apparent, using ecoregion-specific or general models calibrated with more data can reduce the contribution of prediction uncertainty to the total uncertainty (Figure 5).In addition to the total uncertainty in pixel-level AGB estimates, other considerations may also be important for model choice, such as systematic prediction error or bias (Figure S6).
There may be practical challenges to applying this framework to rigorously estimate model prediction uncertainty resulting from the volume of data needed to estimate intermediate model parameters.For example, the model for predicting the residual variance from MCH (Equation ( 2)) was fit using grouped data, where each group should have ≥10 separate observations.Therefore, at least 30 observations (i.e., field plots) were required to fit Equation ( 2)-having insufficient data to properly fit this model could result in inaccurate estimates of residual variation.Estimation of the correlogram-describing the correlation between residual values among plots/pixels as a function of distance-requires not only a large number of field plots, but also field plots that are arranged in space across an appropriate range of distances, particularly small distances, where the spatial autocorrelation is greatest.Here, we assumed that one empirical correlogram was applicable to all sites and models.The correlograms were likely to have varied among sites, but we lacked the data to fit site-specific correlograms.
We acknowledge approaches that could potentially further reduce the total model prediction uncertainty.For example, including additional ALS metrics, such as alternative height or canopy cover metrics, can improve the performance of AGB-ALS models for some ecoregions [36].We limited this analysis to the mean top of canopy height, because this metric tends to be relatively stable across instrument and data collection characteristics [37].Further, improving field plot and ALS co-registration can reduce uncertainty in field plot ALS models-one example using National Forest Inventory data from Spain found that increasing the accuracy of field plot geolocation reduced the RMSE of models predicting the stand volume and basal area by ~10% [38]; another study found that using survey-grade global positioning system (GPS) receivers with sub-meter accuracy reduced the standard error of large-area AGB estimates by over 20% compared with GPS receivers with location errors of up to 5-10 m [39].In this framework, models with additional parameters may reduce residual variability, but may also increase prediction uncertainty; the overall effect on the total uncertainty of adding additional metrics and parameters warrants further study.Here, we also assumed that the species-level AGB-diameter allometries exhibited no systematic prediction error.These allometries can have large uncertainties for individual trees (particularly large trees); therefore, the plot-level AGB for small plots can be uncertain as well, contributing to large residual variations [18,40].Although other approaches can be more accurate for individual tree AGB estimation (such as terrestrial lidar), these data are not yet widely available for landscape-level applications due to current limitations in data collection and processing at scale [41].Therefore, using ALS-based AGB estimates is necessary at this time to obtain enough 1 ha AGB samples to train and test SAR models, and to avoid errors associated with geolocation mismatches between field plots and satellite pixels [5,6].
This analysis is meant to demonstrate the workflow that will be applied for constructing gridded reference AGB products for the calibration and validation of the NISAR mission-the actual AGB estimates and associated uncertainty for NEON sites are likely to change from additional data collection before NISAR's launch and calibration/validation period.Our results support recent calls to have consistent, high-quality field data for the cal/val of remote sensing AGB products and large-scale Earth system models [13,[42][43][44].

Conclusions
We provide a framework for estimating pixel-level uncertainty in gridded products from AGB-lidar models; this framework will be used as part of the calibration/validation effort for the NISAR AGB algorithm.
The uncertainty of predicted AGB from ALS increases in absolute values with AGB, and varies among sites depending on whether the AGB-MCH model is developed based on site-specific, ecoregion-specific, or general (using all sites/ecoregions) data.
Our results show that plot size has a substantial impact on the uncertainty of AGB predicted from ALS data.Predictions from models created using small field plots have large uncertainty due to several factors, among them including the uncertainty of groundestimated biomass in small plots, difficulty of height capturing the variability of volume and tree size, potential extension of tree crowns outside the lidar pixels of plot area, and potential plot location errors.Larger field plots can reduce the uncertainty of AGB predicted from lidar height metrics.We recommend large (≥0.25 ha) field plots for use in developing AGB-lidar models with reasonably low uncertainty.

Figure 1 .
Figure 1.Location of study sites and the distribution of NISAR ecoregions in the United States and Canada.More detailed information about each study site, including latitude/longitude, is provided in Table1.

Figure 1 .
Figure 1.Location of study sites and the distribution of NISAR ecoregions in the United States and Canada.More detailed information about each study site, including latitude/longitude, is provided in Table1.

Figure 2 .
Figure 2. Comparison of the general model relating aboveground woody biomass to the mean canopy height (fit to all data) with ecoregion-specific models.Each point represents one 0.04 ha field inventory plot.

Figure 2 .
Figure 2. Comparison of the general model relating aboveground woody biomass to the mean canopy height (fit to all data) with ecoregion-specific models.Each point represents one 0.04 ha field inventory plot.

Figure 3 .
Figure 3.Comparison of the general model (fit to all data), ecoregion-specific models, and site-specific models relating the aboveground woody biomass to mean canopy height.Each point represents one 0.04 ha field inventory plot.Gray points are plot data from other ecoregions, included for comparison.

Figure 3 .
Figure 3.Comparison of the general model (fit to all data), ecoregion-specific models, and site-specific models relating the aboveground woody biomass to mean canopy height.Each point represents one 0.04 ha field inventory plot.Gray points are plot data from other ecoregions, included for comparison.

Figure 4 .
Figure 4. Variation in the average total pixel-level model uncertainty ( ̂ , Equation (3)) across sites, models (general model, ecoregion-specific models, and site-specific models), and predicted AGB values.Each point represents the average pixel-level uncertainty for a given site for all 1 ha gridded pixels within 1 of 3 predicted AGB ranges: 0-20 Mg ha −1 (a,b), 40-60 Mg ha −1 (c,d), and 80-100 Mg ha −1 (e,f).The left-hand panels (a,c,e) compare uncertainty from the general model (x-axis) with uncertainty from the relevant ecoregion-specific model (y-axis), and the right-hand panels (b,d,f) compare uncertainty from site-specific models (x-axis) with uncertainty from the relevant ecoregion-specific model (y-axis).Each panel includes one point per site, colored by ecoregion.

Figure 4 .
Figure 4. Variation in the average total pixel-level model uncertainty ( σTotal , Equation (3)) across sites, models (general model, ecoregion-specific models, and site-specific models), and predicted AGB values.Each point represents the average pixel-level uncertainty for a given site for all 1 ha gridded pixels within 1 of 3 predicted AGB ranges: 0-20 Mg ha −1 (a,b), 40-60 Mg ha −1 (c,d), and 80-100 Mg ha −1 (e,f).The left-hand panels (a,c,e) compare uncertainty from the general model (xaxis) with uncertainty from the relevant ecoregion-specific model (y-axis), and the right-hand panels (b,d,f) compare uncertainty from site-specific models (x-axis) with uncertainty from the relevant ecoregion-specific model (y-axis).Each panel includes one point per site, colored by ecoregion.

Figure 5 .
Figure 5. Ternary plots showing the relative contribution of the prediction uncertainty, residual variability, and residual spatial autocorrelation to the total model uncertainty ( ̂ , Equation (3)) among sites and for different AGB ranges.Each point represents the site-level average across all 1 ha pixels with low (0-20 Mg ha −1 , a-c), medium (40-60 Mg ha −1 , d-f), or high (80-100 Mg ha −1 , g-i) predicted biomass values within the range of interest (0-100 Mg ha −1 ) for the NISAR mission.Results are shown for the general model (left column), ecoregion-specific model (center column), or sitespecific model (right column) applied to each site.

Figure 5 .
Figure 5. Ternary plots showing the relative contribution of prediction uncertainty, residual variability, and residual spatial autocorrelation to the total model uncertainty ( σTotal , Equation (3)) among sites and for different AGB ranges.Each point represents the site-level average across all 1 ha pixels with low (0-20 Mg ha −1 , a-c), medium (40-60 Mg ha −1 , d-f), or high (80-100 Mg ha −1 , g-i) predicted AGB values within the range of interest (0-100 Mg ha −1 ) for the NISAR mission.Results are shown for the general model (left column), ecoregion-specific model (center column), or site-specific model (right column) applied to each site.

Figure 6 .
Figure 6.Variation in prediction uncertainty ( ̂ in Equation (3), a), residual variability ( ̂ in Equation (3), b), residual spatial autocorrelation ( ̂ in Equation (3), c), and total model uncertainty ( ̂ in Equation (3), d) for 1 ha gridded AGB estimates derived from 0.04 ha, 0.25, ha, or 1 ha subplots at the SCBI and SERC sites with large ForestGEO megaplots.The residual spatial autocorrelation does not contribute to the total pixel-level uncertainty of 1 ha plots because the models were fit to the data of the same spatial resolution.All uncertainty values are in absolute units (Mg ha −1 ).

Figure 6 .
Figure 6.Variation in prediction uncertainty ( σPU in Equation (3), a), residual variability ( σRV in Equation (3), b), residual spatial autocorrelation ( σRSA in Equation (3), c), and total model uncertainty ( σTotal in Equation (3), d) for 1 ha gridded AGB estimates derived from 0.04 ha, 0.25, ha, or 1 ha subplots at the SCBI and SERC sites with ForestGEO megaplots.Residual spatial autocorrelation does not contribute to the total pixel-level uncertainty of 1 ha plots because the models were fit to the data of the same spatial resolution.All uncertainty values are in absolute units (Mg ha −1 ).

Table 2 .
Ecoregion-scale performance of models relating AGB and MCH for 0.04 ha inventory plots.Sites within each ecoregion are given in Table1.All RMSE values were calculated using 10-fold cross-validation, and are presented in units of AGB and as a percentage of the mean AGB of the respective ecoregion.

Table 2 .
Ecoregion-scale performance of models relating AGB and MCH for 0.04 ha inventory plots.Sites within each ecoregion are given in Table1.All RMSE values were calculated using 10-fold cross-validation, and are presented in units of AGB and as a percentage of the mean AGB of the respective ecoregion.