Article Uncertainty of Forest Biomass Estimates in North Temperate Forests Due to Allometry: Implications for Remote Sensing

Estimates of above ground biomass density in forests are crucial for refining global climate models and understanding climate change. Although data from field studies can be aggregated to estimate carbon stocks on global scales, the sparsity of such field data, temporal heterogeneity and methodological variations introduce large errors. Remote sensing measurements from spaceborne sensors are a realistic alternative for global carbon accounting; however, the uncertainty of such measurements is not well known and remains an active area of research. This article describes an effort to collect field data at the Harvard and Howland Forest sites, set in the temperate forests of the Northeastern United States in an attempt to establish ground truth forest biomass for calibration of remote sensing measurements. We present an assessment of the quality of ground truth biomass estimates derived from three different sets of diameter-based allometric equations over the Harvard and Howland Forests to establish the contribution of errors in ground truth data to the error in biomass estimates from remote sensing measurements.


Introduction
Understanding the global carbon cycle and its influences on atmospheric greenhouse gases is among the most pressing issues in ecosystem science. Forests act as substantial terrestrial carbon sinks: they are estimated on average to absorb 2.7 petagrams of carbon per year (PgC·yr⁻¹). However, the uncertainties associated with these estimates are on the order of 1 PgC·yr⁻¹ [1]. Systematic and spatially continuous estimation of global carbon stocks is key to reducing this overarching uncertainty [2]. Of these stocks, the carbon held in above- and below-ground forest woody vegetation, i.e., forest biomass carbon (MgC·ha⁻¹), is a key component. Forest biomass carbon has been estimated by numerous scientific field studies and national programs [3–6], but aggregating data from inconsistent and spatially limited studies to generate global estimates of biomass may lead to errors [7]. Remote sensing offers a potential route to greater global consistency of estimates. Although biomass carbon (hereafter referred to as biomass) estimates derived from remote sensing may be less accurate at the plot scale than field (ground) measurements, remote sensing is technically capable of spatially continuous biomass estimates over the entire globe at some set level of spatial detail. Thus it has the potential to eliminate inconsistencies due to differences in measurement programs between diverse countries or agencies. It could also eliminate the need for sampling and extrapolation, which has been shown to constitute as much as 98% of total biomass estimation error [8].
Given the need to mitigate the uncertainty in estimates of forest biomass, new spaceborne sensors have been proposed. Recent examples have been the proposed NASA DESDynI (Deformation, Ecosystem Structure and Dynamics of Ice) mission [1,9], the proposed ESA BIOMASS mission [10] and the efforts associated with the JAXA ALOS satellites [11]. An important consideration for these sensor programs has been the need to demonstrate observation strategies and test algorithms for combining observations similar to those expected from proposed satellite sensor missions in order to estimate forest biomass and assess its accuracy. To that end, large-scale coordinated field surveys have been developed. During the summer of 2009 such a field study was carried out over several sites in the Northeastern United States. At these sites, forest composition, structure and biomass data were collected for a large set of hectare-sized fixed-area plots concurrently with remote sensing observations from radar and lidar sensors.

Study Goal and Objectives
Algorithms to derive biomass estimates from radar and lidar sensors are currently calibrated using field biomass data collected over forest plots. Thus, in order to assess the quality of estimates from remote sensing instruments, an analysis of the uncertainty of ground-truth estimates is necessary. At the level of a field plot, such as the hectare-sized plots in the Northeastern United States sites, one source of error that may reduce the accuracy of field (and hence remote sensing) biomass estimates is that associated with applying allometric biomass equations to the new field data. Therefore, our goal was to examine the different sets of existing diameter-based allometric equations used for estimating above-ground forest biomass and to analyze the uncertainties associated with using these different equations to estimate forest biomass. We carried this out using diameter data collected at two well-measured sites as part of the 2009 Northeastern United States field surveys. Three compilations or studies [12–14] that present existing diameter-biomass regressions for North American forests were chosen for inter-comparison. There are several existing studies [8,15] focusing on tropical forest biomass that account for error sources such as allometric uncertainty, measurement and sampling errors. However, such studies over the temperate North American forests are either outdated or non-existent. Our work will provide a greater understanding of the uncertainties in remote sensing estimates of forest biomass for north temperate forests arising from allometric uncertainties.

Allometry
Dimensional analysis, or allometry, refers to the relationships between certain elements of a natural object's size and shape. In forestry, for example, the diameter and volume of a tree are related. This allows a tree's volume, and by association its mass, to be predicted from a much simpler and more practical measurement of its diameter. In simple Euclidean terms the volume, $V$, of an object is proportional to a product of its diameter, $D$, and height, $H$:
$$V \propto D^{2} H \tag{1}$$
However, most natural objects such as trees are not well described by simple Euclidean shapes, especially given their complex structures (e.g., tree crowns). The use of fractal geometry (suggested in [16]) provides a more realistic alternative. Various studies have demonstrated the usefulness of this approach in relating tree diameter to crown dimensions in particular [17,18] and to the overall structure of trees in general, so that the relationship between volume and a diameter-height product is given by
$$V \propto D^{\alpha} H^{\beta} \tag{2}$$
with both $\alpha$ and $\beta$ positive and generally regarded to be bounded by $2 < \alpha + \beta < 3$. To take this a step further, analyses such as in [19] use biomechanics to report that height scales as a function of diameter, such that
$$H \propto D^{\gamma} \tag{3}$$
with $0 < \gamma \le 1$. So Equation (2) becomes
$$V = k\,D^{\alpha + \beta\gamma} \tag{4}$$
with $k$ as the proportionality constant. Since mass (or biomass when talking of trees) is a product of density ($\rho$) and volume, the total above-ground biomass of a tree, $M$, can be written as
$$M = \rho\,k\,D^{\alpha + \beta\gamma} = a D^{b} \tag{5}$$
In general a theoretical value of around 8/3 has been suggested for the coefficient $b$ [20]. In practice, both $a$ and $b$ have been shown to vary with tree species and ecological conditions, among other factors. Whenever possible, these coefficients are empirically determined for the various species encountered in a particular forest and documented in the form of allometric equations.
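As a minimal sketch of how a diameter-only power-law allometry of the form M = a·D^b is applied, the Python snippet below evaluates tree biomass for two diameters. The function name `tree_biomass` and the coefficient `A_COEF` are hypothetical and chosen only for illustration; the exponent uses the theoretical value of 8/3 mentioned above rather than any fitted value.

```python
def tree_biomass(dbh_cm, a, b):
    """Above-ground dry biomass (kg) from diameter (cm), assuming M = a * D**b."""
    if dbh_cm <= 0:
        raise ValueError("diameter must be positive")
    return a * dbh_cm ** b


# Hypothetical coefficients, not taken from any published equation.
A_COEF = 0.1
B_COEF = 8.0 / 3.0  # theoretical exponent quoted in the text

m10 = tree_biomass(10.0, A_COEF, B_COEF)
m20 = tree_biomass(20.0, A_COEF, B_COEF)

# Doubling the diameter multiplies the biomass by 2**b, independent of a.
ratio = m20 / m10
```

Because the exponent is well above one, small relative changes in diameter produce much larger relative changes in estimated biomass, which is why both the coefficients and the diameter measurement matter for the error budget.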

Methods for Developing Allometric Relationships
Interest in determining above-ground biomass of forests has led to a fairly large body of studies where allometric equations have been developed and documented for different species in various ecological regions or biomes. Allometric biomass equations typically use tree diameter as the independent variable, although it is fairly common to use height as a second independent variable along with diameter. In general, allometric equations for a tree species are developed using destructive sampling methods. The independent variables, including diameter at breast height (dbh; 1.37 m above the ground [21]), are measured for a representative sample (usually over a range of diameters or ages) of trees of a species. These trees are then felled and separated into different components including the main stem (trunk), stem bark, branches and foliage. The fresh weight of each component is measured. Since the intent is to determine dry biomass, the components are then dried in ovens (throughout this article tree weight or biomass refers to the above-ground dry biomass). However, it is impractical to dry an entire large tree. Instead, sampling is used. The stem (or trunk) is cut into smaller pieces (e.g., 1 to 2 m in length) and the fresh weight of each of the smaller sections is recorded. Discs (of a few centimeters in length) are cut from each section, labeled, weighed and dried in ovens. The dry weight of the discs is measured and the dry weight of the stem from which they were cut is estimated using the ratio of dry to wet weights [22]. Bark weight is obtained in a similar manner. Some studies account for stump weights by cutting the tree very close to the ground; those studies that do not, use similar weight ratio methods to estimate stump weight as well. To estimate the weight of a tree crown, most studies adopt some type of stratified sampling approach. Such approaches involve cutting branches into sections of a certain size and separating them into classes or strata based on branch diameter, measured at some distance from the base. Branches are randomly chosen from each diameter class for drying. Branches that belong to larger diameter classes are weighed much like the main stem whereas smaller branches are dried intact. Foliage from the weighed branches is also dried and weighed. The entire crown weight is estimated using ratios of dry to wet weight.
Typically, in the next step, dbh measurements for the destructively sampled trees are regressed against the dry weight of each component (e.g., main stem, branches, etc.) or against the total above-ground dry tree weight. It should be noted that in most cases the weight predicted by the total-tree allometry differs from the sum of the weights predicted by the component allometries. However, some recent studies (such as [23]) have outlined more statistically sound methods for using the sum of components for estimating total biomass. In either case, component or total weights are related to dbh using regression techniques. Since the variation of tree weight is heteroscedastic, that is to say the variation increases with increasing diameter, the use of simple linear regression becomes complicated. Traditionally, this problem is circumvented by taking the logarithm of Equation (5), such that
$$\log M = \log a + b \log D \tag{6}$$
and using linear regression to estimate the coefficients $\log a$ and $b$. This solves the problem of heteroscedasticity; however, the conversion from logarithmic back to arithmetic units causes a bias in the mean estimated biomass. To correct for this artifact, Baskerville [24] suggested the following correction based on properties of lognormal probability distributions,
$$M_c = e^{\mu + \sigma_{se}^{2}/2} \tag{7}$$
where $M_c$ is the corrected mean weight, $\sigma_{se}$ is the standard error in logarithmic units and $\mu = \log M$. The factor $e^{\sigma_{se}^{2}/2}$ is usually referred to as the bias correction factor and is published by most studies; however, there is contention that this correction itself is biased for small sample sizes [13], so it is not uniformly published or used. In more recent studies, such as [14] and [23], the problem of heteroscedasticity is accounted for by modeling the variance and using more sophisticated regression techniques.
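The traditional log-log fitting procedure described above can be sketched as follows. This is only an illustration on synthetic data: the "true" coefficients, the noise level and the variable names are assumptions, and NumPy's `polyfit` stands in for whatever regression software an individual study used.

```python
import numpy as np

# Synthetic destructive-sampling data with multiplicative (lognormal) noise.
# All numerical values here are hypothetical.
rng = np.random.default_rng(0)
a_true, b_true, sigma_log = 0.1, 2.5, 0.2
dbh = rng.uniform(5.0, 50.0, size=200)  # cm
mass = a_true * dbh**b_true * np.exp(rng.normal(0.0, sigma_log, size=dbh.size))

# Linear regression in logarithmic units: ln M = ln a + b ln D.
# polyfit returns the slope first, then the intercept.
b_hat, ln_a_hat = np.polyfit(np.log(dbh), np.log(mass), 1)

# Standard error of estimate (SEE) in natural-log units, two fitted parameters.
resid = np.log(mass) - (ln_a_hat + b_hat * np.log(dbh))
see = np.sqrt(np.sum(resid**2) / (dbh.size - 2))

# Bias correction factor for the back-transformation to arithmetic units.
cf = np.exp(see**2 / 2.0)


def predict_biomass(d_cm):
    """Bias-corrected biomass prediction in arithmetic units (kg)."""
    return cf * np.exp(ln_a_hat) * d_cm**b_hat
```

The correction factor is always at least one, so omitting it yields systematically low mean biomass predictions; for an SEE of about 0.2 in natural-log units the factor is roughly 1.02.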

Using Existing Allometric Equations
Since it is rarely feasible to develop diameter-biomass allometries for a particular area of interest, it is common to use previously developed allometric equations. Because of the large number of studies that document such equations, it becomes important to be able to correctly identify the most representative equations. Typically, biomass allometry studies focus on one or a select few species across multiple regions or biomes [25–28]; on multiple species that belong to a particular region or ecosystem [3,22,29–32]; or on compiling multiple studies [12,13,33,34]. The cited examples are far from exhaustive, since studies, especially of the first two types, easily number in the hundreds. It is beyond the scope of this work to summarize all existing equations; however, it would be remiss not to look at more than one of the studies that summarize multiple sets of biomass allometries. Therefore, we chose three studies for comparison that are the key comprehensive compilations appropriate for Northeastern United States temperate forests. Even though these studies summarize coefficients for equations of the same form, Equation (5), they approach the analysis in distinct ways. The first, Ter-Mikaelian and Korzukhin [12], lists species-specific coefficients previously developed by multiple independent studies. The second, Jenkins et al. [13], develops new coefficients for species grouped into general categories, synthesized from multiple regressions previously developed by other groups via destructive sampling. The third, Lambert et al. [14], lists species-specific coefficients calculated using raw data in a more rigorous statistical framework, allowing a more accurate assessment of error. A short discussion of the three studies follows.

The Ter-Mikaelian Equations
The work by Ter-Mikaelian and Korzukhin [12] summarizes equations for sixty-five North American tree species from studies conducted in the United States. All coefficients are reproduced or recalculated for allometric equations of the form given in Equation (5). Some species are represented by multiple equations derived from different published destructive sampling efforts. The original objective of this study was to identify the reasons behind the observed variation between different allometric coefficients for species common to this region. Their subsequently widely used compilation of a large number of allometric equations was a byproduct of this effort. A consequence of the intent to conduct a quantitative comparison was the documentation of standard error for most of the equations. This was a major reason for our selecting this study over previously established works such as Tritton and Hornbeck [33] that otherwise provide a similar compilation for the Northeastern USA.
In the use of allometric equations, there is no strict rationale for which set of coefficients to choose for a particular species if more than one set exists. It is common to choose coefficients based on the proximity between the location where the equation was developed and the location where it is to be applied. In some cases multipliers based on, e.g., different soil types are given, and these can be matched by equation users to local conditions. However, these are not always the only considerations, since large biases can be introduced if equations are used beyond the range of diameter values that were used for developing the original regressions; thus the age and size of trees at the application site are sometimes factored into the decision. Furthermore, in some cases, species-specific equations may not be available, and applying coefficients developed for another species may be necessary. It is hard to ascertain exactly how much error is introduced by using equations for different species or even non-site-specific equations.

The Jenkins Equations
Jenkins et al. [13] attempt to rectify the spatial variability among allometric equations seen in compilations such as the ones listed in [12]. This study aims to develop generalizable equations that would be applicable for a large set of species across varied biomes. It adopts a meta-analysis approach, as described in [35], for combining results from multiple existing allometric studies. In short, it involves generating pseudo-data from published equations and combining all the pseudo-data to generate new regression coefficients. Here, instead of having species-specific coefficients, as in Ter-Mikaelian and Korzukhin [12], species were categorized into ten groups based on similarities in structure and allometric coefficients. The allometric coefficients generated in this study are all of the same form given in Equation (5). Although the authors are meticulous in categorizing a large number of important species and careful to include a wide range of diameters, the drawbacks include the potential for introducing biases resulting from the use of non-species-specific and non-site-specific equations. Furthermore, estimates of standard error, derived from the pseudo-data, are sub-optimal and may be dominated by spatial variability in the diameter-biomass relationship.
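The pseudo-data idea can be illustrated with a heavily simplified sketch: evaluate several published equations over their stated diameter ranges, pool the resulting points, and refit a single power law. The three equations below are hypothetical, and the even spacing of pseudo-data points is an assumption of this sketch, not a description of the exact procedure in [13] or [35].

```python
import numpy as np

# Hypothetical published equations for one species group:
# (a, b, minimum dbh in cm, maximum dbh in cm).
published = [
    (0.08, 2.60, 5.0, 40.0),
    (0.12, 2.45, 10.0, 60.0),
    (0.10, 2.50, 5.0, 50.0),
]

log_d_parts, log_m_parts = [], []
for a, b, d_min, d_max in published:
    # Generate pseudo-data only within the diameter range of each equation.
    d = np.linspace(d_min, d_max, 25)
    log_d_parts.append(np.log(d))
    log_m_parts.append(np.log(a * d**b))

log_d = np.concatenate(log_d_parts)
log_m = np.concatenate(log_m_parts)

# Refit one generalized equation ln M = ln a + b ln D to the pooled pseudo-data.
b_group, ln_a_group = np.polyfit(log_d, log_m, 1)
a_group = np.exp(ln_a_group)
```

Because the pseudo-data points are generated from smooth curves rather than measured trees, the residual scatter of the pooled fit reflects differences between equations and not tree-to-tree variability, which is one way to see why standard errors obtained this way are hard to interpret.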

The Lambert Equations
Lambert et al. [14] attempt a more statistically rigorous approach to generate best linear unbiased estimators of biomass. The approach, based on methods proposed in [23], provides species-specific equations generated by fitting actual raw diameter and biomass data collected over many sites across Canada. The use of raw data, instead of the pseudo-data approach used in Jenkins et al. [13], also allows for a more rigorous characterization of error. The approach outlined in [23] is a departure from the standard approaches in two distinct ways. First, it does not use the logarithms of diameter and biomass to circumvent the problem of heteroscedasticity. Instead it models the variance in a power-law sense, much like the diameter-biomass relationship itself. Secondly, it recognizes the possibility of correlation between the component biomasses themselves. The process of separating a tree into its components, such as main stem, branches, canopy, etc., to estimate its dry weight during the development of allometric equations leads to biased estimates of total biomass. To account for this bias, Lambert et al. [14] use an approach known as Seemingly Unrelated Regression (SUR) so that the sum of component biomass estimates can be used to generate unbiased estimates of total biomass. Furthermore, it allows the use of a variance-covariance matrix to account for correlations between components when estimating the total error. Lambert et al. [14] list coefficients for each component, so that the total biomass can be estimated using
$$M_{total} = \sum_{i} M_i = \sum_{i} a_i D^{b_i} \tag{12}$$
where $M_i$ is the dry biomass for the $i$th component, defined here as either stem, bark, foliage, branches or total, $D$ is the diameter at breast height (dbh) and $a_i$, $b_i$ are regression coefficients for a particular component.
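A short sketch of the sum-of-components calculation is given below. The component coefficients are hypothetical placeholders and are not the values published in [14]; only the structure (one power law per component, summed to give the total) follows the description above.

```python
# Hypothetical (a_i, b_i) pairs for the four components; not from [14].
COMPONENTS = {
    "stem": (0.06, 2.40),
    "bark": (0.01, 2.30),
    "branches": (0.02, 2.20),
    "foliage": (0.03, 1.70),
}


def component_biomass(dbh_cm):
    """Dry biomass (kg) of each component for a tree of the given dbh (cm)."""
    return {name: a * dbh_cm**b for name, (a, b) in COMPONENTS.items()}


def total_biomass(dbh_cm):
    """Total above-ground biomass as the sum of the component estimates."""
    return sum(component_biomass(dbh_cm).values())


parts_25 = component_biomass(25.0)
total_25 = total_biomass(25.0)
```

Keeping the component estimates available separately is convenient when the component errors and their correlations are needed later for the total error.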

Error Propagation Analysis
Several factors determine the error in estimating the biomass of a tree from a measurement of its diameter. These factors include (but are not limited to) [13] error in estimating the coefficients of the allometric equations during their development, the use of those coefficients outside of the species and ecosystem from which they were developed, inconsistencies in methodologies between different species equations, and the error in measuring the diameter of the tree to which an allometric equation will be applied. It is not possible to account perfectly for every error source, but an analysis of the error sources that can be characterized in some mathematical framework is necessary. Since the sources of error may depend on the choice of allometry, a treatment for the three allometries chosen for this work is presented.

Errors in Estimates from the Ter-Mikaelian Equations
Three potential error sources in forest biomass estimates are considered for this type of allometry: biomass error due to an error in measurement of tree diameter ($\sigma_m$) during application of an allometric equation, error in determining allometric coefficients ($\sigma_a$) during the development of the allometric equation, and errors in using these allometric coefficients at a novel site ($\sigma_s$). Assuming that the equations are chosen properly, these three error sources should account for most of the error in biomass estimates [15]. It is assumed that the three sources of error are independent, so that the total error in the estimates of tree weights, $\sigma_t$, can be written as
$$\sigma_t^{2} = \sigma_m^{2} + \sigma_a^{2} + \sigma_s^{2} \tag{13}$$
Here, $\sigma_m$ refers to the error in weight due to an error in measurement of dbh during field surveys where the intent is to use existing allometric equations. While the dbh measurements are also expected to be error prone when the trees were destructively sampled to generate the allometric equations themselves, it is assumed for the purposes of this analysis that this error is subsumed under $\sigma_a$. The dbh measurement error, $\sigma_D$, can then be propagated to an error in tree weight by using a Taylor series expansion of the allometric model, as suggested in [15]. Given an allometric model that consists of only diameter as the independent variable, the error in the above-ground biomass can be written as a function of the measurement error as
$$\frac{\sigma_m}{M} = \frac{\partial \ln f}{\partial \ln D}\,\frac{\sigma_D}{D} \tag{14}$$
where $\sigma_D$ is the uncertainty in the measurement of diameter ($D$) and $\partial \ln f / \partial \ln D$ is the partial derivative of the natural log of the allometric model function, $f$, with respect to the natural log of the diameter. Since the allometric model function is of the form $f = aD^{b}$, the measurement error, $\sigma_m$, is given by
$$\sigma_m = b\,M\,\frac{\sigma_D}{D} = a\,b\,D^{b-1}\,\sigma_D \tag{15}$$
During the development of allometric equations a complex interplay of factors determines the uncertainty of model coefficients, $\sigma_a$, including the natural variability in tree structure, sampling methodology and measurement uncertainties, among others. Most studies just report the standard error in estimate (SEE) or the root sum of squares of the fit residuals, $\sigma_{se}$, as a means of estimating $\sigma_a$ without addressing each factor separately. Since standard errors depend on the methodology chosen for fitting diameter and weight, $\sigma_{se}$ is not simply equal to $\sigma_a$. Most studies summarized in [12] use log-transformed variables for regression and the standard error is also reported in logarithmic units. In such cases, $\sigma_{se}$ must be transformed into arithmetic units. This is not as simple as using the inverse logarithm, since the statistics of random variables are skewed during this transformation. Baskerville [24] suggested the following conversion, derived from properties of lognormal distributions, to estimate the allometric error from the standard error in logarithmic units,
$$\sigma_a^{2} = e^{2\mu + \sigma_{se}^{2}}\left(e^{\sigma_{se}^{2}} - 1\right) \tag{16}$$
where $\mu$ is the logarithm of estimated biomass, i.e., $\mu = \log M$. Some studies use base-10 logarithms; in such cases a similar correction is used but with the corresponding anti-log function. A few studies in [12] provide the standard error in arithmetic units; in those cases it is assumed that $\sigma_a = \sigma_{se}$.
The third component of error, $\sigma_s$ (site error), captures the error in biomass estimates introduced by using coefficients developed at a site different from the one to which the equation is later applied. Few studies, if any, have attempted to quantify this error even though it is widely recognized as a potential uncertainty primarily driven by soil conditions and climate [34,36]. However, an imperfect estimate of $\sigma_s$ can be obtained by employing a bootstrap-type approach [37]. The single-stage bootstrap is a technique from the non-parametric class of methods used in statistics for arriving at estimates of variation (or error) in data. It relies on resampling the data and using the spread of mean values from the various combinations of the resampled data to estimate the variance of errors. This approach, adopted for the analysis presented here, takes advantage of the fact that Ter-Mikaelian and Korzukhin [12] usually include more than one set of allometric coefficients per species, developed by different researchers in different locations. For example, this compilation [12] lists nine sets of coefficients for red maple (Acer rubrum) and seven for paper birch (Betula papyrifera). Locations where these equations were developed extend from West Virginia in the south to Nova Scotia in the north, in essence representing a range of ecoregions that may exist in the Northeastern United States. Different biomass estimates of a tree can be generated for a particular diameter by using different subsets of the equations and averaging the estimates for each subset. In that manner, various combinations of these equations can be used to generate a large number of biomass estimates for a particular tree. If all combinations are used, 126 different biomass estimates can be obtained using combinations of five equations from a total of nine for red maple, while 35 different biomass estimates can be obtained for paper birch using combinations of four equations from a total of seven, providing enough samples for a crude estimate of the variance in allometric coefficients due to geographic variability.
Figure 1 shows biomass estimates from the nine red maple equations and the seven paper birch equations, together with mean values from different combinations of these equations. The sample standard deviation over the full set of combinations is computed, and a quadratic fit of these standard deviations against diameter is used as an estimate of $\sigma_s$.
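The combination scheme used for the site error can be sketched for a single diameter as follows. The nine biomass values are hypothetical stand-ins for the outputs of nine equations evaluated at one dbh, and the helper name `site_error` is an assumption of this sketch.

```python
from itertools import combinations
from math import comb
from statistics import mean, stdev


def site_error(estimates, subset_size):
    """Spread of subset-mean biomass over all combinations of the equations.

    Returns the sample standard deviation of the subset means and the
    number of combinations that were averaged.
    """
    subset_means = [mean(c) for c in combinations(estimates, subset_size)]
    return stdev(subset_means), len(subset_means)


# Hypothetical biomass estimates (kg) for one diameter from nine equations.
nine_estimates = [95.0, 102.0, 110.0, 88.0, 120.0, 105.0, 99.0, 115.0, 92.0]

sigma_s, n_means = site_error(nine_estimates, 5)

# Combination counts quoted in the text: C(9, 5) and C(7, 4).
n_red_maple = comb(9, 5)
n_paper_birch = comb(7, 4)
```

Repeating the calculation over a grid of diameters gives one standard deviation per diameter, and these values are what a quadratic fit against diameter would then smooth.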
Figure 1. Variation between estimated biomass values from the different allometric equations summarized in [12] for red maple (Acer rubrum) and paper birch (Betula papyrifera).

To obtain uncertainties in estimates of forest biomass at some spatial scale, $\sigma_{sp}$, the per-tree errors ($\sigma_t$) are aggregated over the area of interest (assuming that the individual tree errors are uncorrelated), such that
$$\sigma_{sp}^{2} = \sum_{i=1}^{N} \sigma_{t_i}^{2} \tag{17}$$
where $N$ is the total number of trees for the particular area in question, and $\sigma_{t_i}^{2}$ is the total error in tree weight for the $i$th tree.
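The per-tree error budget and its aggregation can be sketched as below. Two assumptions are made and should be checked against the original equations: the allometric term uses the standard lognormal standard deviation with the SEE in natural-log units, and the plot-level error is taken as the root sum of squares of uncorrelated per-tree errors with no normalization by plot area. All function names are hypothetical.

```python
import math


def measurement_error(a, b, dbh_cm, sigma_d_cm):
    """First-order propagation of dbh error through M = a * D**b."""
    mass = a * dbh_cm**b
    return b * mass * sigma_d_cm / dbh_cm


def allometric_error_from_log_see(mass_kg, see_ln):
    """Lognormal standard deviation in arithmetic units (assumed form).

    Uses mu = ln(M) and sigma = SEE in natural-log units.
    """
    mu = math.log(mass_kg)
    var = math.exp(2.0 * mu + see_ln**2) * (math.exp(see_ln**2) - 1.0)
    return math.sqrt(var)


def tree_error(sigma_m, sigma_a, sigma_s=0.0):
    """Total per-tree error for independent error sources."""
    return math.sqrt(sigma_m**2 + sigma_a**2 + sigma_s**2)


def plot_error(tree_errors):
    """Plot-level error for uncorrelated per-tree errors (root sum of squares)."""
    return math.sqrt(sum(s**2 for s in tree_errors))


# Hypothetical tree: a = 0.1, b = 2.0, dbh = 10 cm, dbh error = 0.1 cm.
sigma_m_example = measurement_error(0.1, 2.0, 10.0, 0.1)
sigma_a_example = allometric_error_from_log_see(10.0, 0.2)
sigma_t_example = tree_error(sigma_m_example, sigma_a_example, sigma_s=1.0)
```

Setting `sigma_s` to zero reproduces a two-term budget, which is the situation when site variability is already folded into the reported standard error.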

Error in Estimates from the Jenkins Equations
The treatment of error in the Jenkins equations is similar to that described in Section 4.1 with the exception of the site error, $\sigma_s$. The equations developed in Jenkins et al. [13] are derived from studies that encompass all of the continental United States. The standard error for each group of equations includes variability due to site conditions and captures the intrinsic variability between the species grouped together. The standard error is therefore expected to be much larger. However, due to the correlated nature of the pseudo-data used to re-estimate allometric coefficients, standard errors tend to be biased and not entirely reliable [13,14]. In the absence of a better method of estimating variance, the standard error reported in [13] is used as an estimate of biomass uncertainty using the corrections described in Equation (16), so that the total error is given by
$$\sigma_t^{2} = \sigma_m^{2} + \sigma_a^{2} \tag{18}$$
with $\sigma_m$ estimated using Equation (15).

Error in Estimates from the Lambert Equations
The procedure outlined in Parresol [23] is more mathematically involved than those traditionally employed. Instead of relying on standard error estimates from regressions between diameter data and an estimate of tree weight, as done for equations summarized in Ter-Mikaelian and Korzukhin [12] and Jenkins et al. [13], this method attempts to include the error in estimated tree weight as well as the regression error. Because the method estimates total tree weight as a function of the estimated component weights, the correlations of errors among the components cannot simply be ignored. This is accounted for by using variance-covariance matrices for the coefficients of the component equations within a statistical framework known as NSUR (non-linear seemingly unrelated regression). Furthermore, the variance of component weights is not constant over all observations either (heteroscedasticity). Instead of relying on the traditional approach of using logarithms to avoid the problem of heteroscedasticity, here the regression error for each component equation, $e_i$, is modeled such that if
$$M_i = a_i D^{b_i} + e_i \tag{19}$$
then $e_i$ is functionally related to the diameter through a power-law heteroscedasticity function,
$$\psi_i(D) = D^{c_i} \tag{20}$$
In the above formulation, the coefficient $c_i$ is estimated by fitting diameter to the residuals $e_i$ for each component and also for the total biomass. The allometric error in total biomass estimates, $\sigma_a$, is given by Equation (21), where $\sigma_{SUR}^{2}$ is the SUR system variance or the residual sum of squares from the multiple non-linear regression analysis, $S_{M_t}^{2}$ is the estimated variance in total biomass due to errors in estimating the coefficients of the component biomass equations and $\sigma_{ii}$ is the residual root sum of squares from the particular equation of interest (in this case that of total biomass). The term $\psi_t(D)$ refers to the function that models heteroscedasticity, which takes the form of Equation (20), i.e., $\psi_i(D) = D^{c_i}$. Lambert et al. publish the coefficients $c_i$ in [14] for each of the component equations and also for the total biomass. The variance of total biomass, $S_{M_t}^{2}$, is estimated using [23]
$$S_{M_t}^{2} = F_{ab}\,\Sigma_{ab}\,F_{ab}^{T} \tag{22}$$
where $\Sigma_{ab}$ is the estimated variance-covariance matrix of the set of coefficients $a_i$, $b_i$. These 8 × 8 matrices, for two parameters of each of the four component equations, that is, four $a_i$ and four $b_i$ rows/columns, are estimated using the raw tree weight and diameter data. The vector $F_{ab}$ is a row vector of the derivatives of the model function with respect to the fit parameters. Since the model function, given in Equation (12), is a summation of components, the elements of $F_{ab}$ can be calculated using
$$\frac{\partial M_{total}}{\partial a_i} = D^{b_i}, \qquad \frac{\partial M_{total}}{\partial b_i} = a_i D^{b_i} \ln D \tag{23}$$
where $F_{ab}$ is an eight-element row vector. All parameters needed to estimate the total variance of the weight of a tree using Equation (21) through Equation (23) are published for each species in Lambert et al. [14].
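The coefficient-uncertainty term, a quadratic form of the gradient vector and the coefficient variance-covariance matrix, can be sketched with NumPy. The coefficients and the diagonal covariance matrix below are hypothetical, and the parameter ordering (all a_i followed by all b_i) is an assumption that must match whatever ordering the published matrices use.

```python
import numpy as np


def gradient_total(dbh_cm, a, b):
    """Derivatives of M_total = sum(a_i * D**b_i) with respect to a_i and b_i.

    Assumed ordering of the returned vector: (a_1..a_n, b_1..b_n).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d_a = dbh_cm**b                        # dM/da_i = D**b_i
    d_b = a * dbh_cm**b * np.log(dbh_cm)   # dM/db_i = a_i * D**b_i * ln(D)
    return np.concatenate([d_a, d_b])


def variance_total(dbh_cm, a, b, cov_ab):
    """Variance of total biomass due to coefficient uncertainty: F Sigma F^T."""
    f = gradient_total(dbh_cm, a, b)
    return float(f @ cov_ab @ f)


# Hypothetical coefficients for four components and a diagonal 8 x 8 covariance.
a_coef = [0.06, 0.01, 0.02, 0.03]
b_coef = [2.4, 2.3, 2.2, 1.7]
cov = np.eye(8) * 1e-4

var_20 = variance_total(20.0, a_coef, b_coef, cov)
```

With a real, non-diagonal covariance matrix the off-diagonal terms carry the correlations between component coefficients, which is the information a simple sum of independent component errors would miss.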
The site-specific error, discussed in Section 4.1, is not included for the Lambert equations because the variability modeled by Equation (21) has been shown [14] to encompass the variability between the different equations summarized in [12]. The error due to measurement of diameter (when applying these equations), however, needs to be propagated through the model function. This takes a slightly different form than Equation (15) and can be estimated using the Taylor series approximation of the model function given by
$$\sigma_m = \left|\frac{\partial M_{total}}{\partial D}\right| \sigma_D \tag{24}$$
where $M_{total}$ is given by Equation (12). This simplifies to
$$\sigma_m = \sigma_D \sum_{i} a_i\,b_i\,D^{b_i - 1} \tag{25}$$
where $\sigma_D$ is the error in measuring tree diameter. The total error in estimating the tree biomass is given by
$$\sigma_t^{2} = \sigma_m^{2} + \sigma_a^{2} \tag{26}$$
where $\sigma_m$ and $\sigma_a$ are given by Equation (25) and Equation (21), respectively.
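The propagation of a diameter measurement error through a sum-of-components model can be checked numerically. The sketch below compares the analytic first-order expression with a finite-difference derivative; the component coefficients and the 0.1 cm measurement error are hypothetical.

```python
def total_biomass(dbh_cm, coeffs):
    """Sum-of-components model: M_total = sum(a_i * D**b_i)."""
    return sum(a * dbh_cm**b for a, b in coeffs)


def measurement_error_total(dbh_cm, sigma_d_cm, coeffs):
    """First-order error in M_total from a dbh error: sigma_D * dM_total/dD."""
    slope = sum(a * b * dbh_cm ** (b - 1.0) for a, b in coeffs)
    return sigma_d_cm * slope


# Hypothetical component coefficients (a_i, b_i).
coeffs = [(0.06, 2.4), (0.01, 2.3), (0.02, 2.2), (0.03, 1.7)]

sigma_m = measurement_error_total(30.0, 0.1, coeffs)

# Central finite-difference check of the analytic derivative.
step = 1e-4
numeric_slope = (
    total_biomass(30.0 + step, coeffs) - total_biomass(30.0 - step, coeffs)
) / (2.0 * step)
sigma_m_numeric = 0.1 * numeric_slope
```

For a single component the expression reduces to the one-equation case, b·M·σ_D/D, so the two treatments of measurement error are consistent.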
In the following sections we will use these three different sets of allometric equations on tree diameter data collected at two sites, the Harvard Forest in Massachusetts and the Howland Forest in Maine. The purpose is to demonstrate, quantify and compare the uncertainties in biomass estimates generated from different sets of allometric equations.

The Harvard Forest
The Harvard Forest near Petersham, MA is an ecological research facility that has been managed by Harvard University since 1907. It is spread over 3,000 hectares and is split mainly into three tracts: Prospect Hill, Tom Swamp and Slab City (see Figure 2). The forest is representative of the Transition Hardwoods of central New England [38], with dominant species of red oak (Quercus rubra), red maple (Acer rubrum), white birch (Betula papyrifera), white pine (Pinus strobus) and eastern hemlock (Tsuga canadensis). Most of the forest was artificially planted in the first half of the twentieth century over reclaimed agricultural land [39]. Permanent sites for research in a wide array of fields such as biodiversity, conservation, forest-atmosphere carbon exchange and soil warming, to name a few, are distributed throughout the forest.
The orientation of each plot was chosen to be either 5 degrees for vertical plots, or 95 degrees for horizontal plots, in order to align with radar overflight tracks. The shape of the plots was chosen to maximize the overlap between the square radar pixels and forest measurement data. Of the 15 plots, ten were in Prospect Hill (titled PH01 to PH10), two in Tom Swamp (TS01, TS02) and one in Slab City (SC01). The remaining two plots were set in the nearby Federated Women's Club State Forest (SF02 and SF04). Because the Prospect Hill tract is the main research site at the Harvard Forest, contains the most diversity in species and has more low-relief and accessible areas, most of the plots were placed in Prospect Hill, as seen in Figure 2. Otherwise, the locations of the plots were also chosen based on species composition, age/structure, topography, accessibility and lidar/radar coverage. Some plots were set in homogeneous stands: PH1 and PH7 are set inside stands of predominantly tall and dense red pine trees, PH4 and PH2 are set within a hemlock stand, and PH6 and PH10 are set within stands of predominantly young deciduous trees. Plots and subplots were geolocated using a combination of GPS, compass and tape. Because of the thick canopy cover at the Harvard Forest, the absolute accuracy of the final GPS reference locations was on the order of 4 m. For each of the 240 subplots, the diameter, species and condition (live or dead) of each tree with a diameter larger than 10 cm were recorded.

Allometric Equations for the Harvard Forest
Table A1 summarizes field diameter data for the 23 species and site-specific allometric equations used for each to estimate tree biomass. All parameters and statistics listed are for allometric equations of the form given in Equation (5) with diameters expressed in centimeters and estimated biomass in kilograms. In Table A1, the entries labeled Min D, Max D and NT summarize the diameter data collected over the Harvard Forest. Min D and Max D are the minimum and maximum diameters recorded for a total number of trees (NT) of the corresponding species. The column 'Type' refers to the type of equations summarized in Ter-Mikaelian [12] used here to determine the weight of the trees surveyed at the Harvard Forest. With only one very minor species exception this was always AB: total above ground biomass (Table A1). For the Ter-Mikaelian analysis, AB equations were used to estimate biomass for 99.7% of the total tree count at the Harvard Forest site.
The columns a, b are the coefficients of the allometric equations. The term 'Range' in field 'Range/Over' refers to the range of diameter values that either the chosen study reported or was estimated by [12], and 'Over' refers to the number of trees in the diameter data collected over the Harvard Forest that exceed this limit. Columns 'MTD', '$R^2$', 'SEE' and 'N' are representative statistics highlighting the performance of the regression between tree diameters and weights reported by the researchers that developed these allometric coefficients. The column 'MTD' refers to the method used in fitting the two variables, which include the use of log-transformed (ln: $\log_e$, log: $\log_{10}$) data for linear regression or the use of weighted non-linear (abs) regression for estimating the allometric coefficients. $R^2$, the coefficient of determination, is reported by most studies and summarized in [12]. The SEE (standard error of estimate) is either reported by the authors or in a few cases had to be calculated from the data summaries themselves. It is listed in units based on the fit methodology, i.e., either log, ln or abs. The parameter 'N' refers to the number of samples (trees) used in the regression of tree diameter and weight data for developing the allometric coefficients, while 'CF' is the correction factor suggested in [24] to correct for biases caused by the conversion between arithmetic and logarithmic units. Finally, the last two columns in Table A1 refer to the location where the equation was developed and the equation authors and their original publication.
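As a minimal sketch of how the tabulated coefficients might be applied, the snippet below assumes that Equation (5) has the common power-law form M = a·D^b (D in cm, M in kg) and that CF enters as a multiplicative factor. Both are assumptions made for illustration. The function names and coefficient values are hypothetical and are not taken from Table A1.

```python
def tree_biomass_kg(d_cm, a, b, cf=1.0):
    """Above-ground biomass (kg) from diameter (cm).

    Assumes the power-law form M = cf * a * D**b; the exact form of
    Equation (5) and the way CF is applied are assumptions here.
    """
    if d_cm <= 0:
        raise ValueError("diameter must be positive")
    return cf * a * d_cm ** b


def count_over_range(diameters_cm, d_max_cm):
    """Count trees whose diameter exceeds the upper limit of an
    equation's diameter range (the 'Over' entry described above)."""
    return sum(1 for d in diameters_cm if d > d_max_cm)


# Hypothetical coefficients and diameters, for illustration only.
A_HYP, B_HYP = 0.1, 2.4
diameters = [12.0, 25.5, 41.0, 63.2]

weights = [tree_biomass_kg(d, A_HYP, B_HYP) for d in diameters]
over = count_over_range(diameters, d_max_cm=55.0)

for d, w in zip(diameters, weights):
    print(f"D = {d:5.1f} cm -> M = {w:8.1f} kg")
print("trees beyond equation range:", over)
```

Counting the trees beyond the regression range alongside the biomass estimate makes it easy to see how much of a plot relies on extrapolation.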
When more than one equation was available, particular care was given to choosing equations so that the weights and errors for most of the trees recorded during the Harvard Forest survey could be accurately estimated. Ideally there would be coefficients and statistics for every species recorded during the Harvard Forest survey, but that was not always the case. In the instances where allometric coefficients and statistics for a particular species were not found in the Ter-Mikaelian study, substitute equations and statistics were used. Very few individuals of any species for which there was not a specific equation were encountered during the survey at the Harvard Forest. Thus we assume that the use of substitute equations would not significantly impact the validity of the results presented here. For instance, the two statistics SEE and CF for black birch (Betula velutina) were not listed in Ter-Mikaelian, so they were estimated from the summary in Tritton and Hornbeck [33]. The allometric coefficients for American chestnut (Castanea dentata) were not listed at all in [12], so the coefficients from sugar maple (Acer saccharum) were used instead, as suggested in [3]. Similarly, coefficients for paper birch (Betula papyrifera) were used for hophornbeam (Ostrya virginiana, an infrequently occurring understory species). The SEE statistic for total weight was not available for red pine (Pinus resinosa), so the standard error of the 'stem-bark' equation was used instead. The summary in Ter-Mikaelian does not list the error from any of the equations in Young et al. [22]; in cases where those coefficients were used, SEE was estimated for this article from the data summary in [22].
Coefficients of the Jenkins equations [13] are summarized in Table A2. Both coefficients a, b are for allometric equations of the form given in Equation (5). This study groups the most important species seen in the United States into eleven general classes. All 23 species measured at the Harvard Forest fall into eight of those eleven classes. Since Jenkins et al. [34] pay attention to the range of diameter values over which the coefficients are estimated (especially towards the higher end), very few trees measured at the Harvard Forest exceed the limits of these equations.
The coefficients of the biomass equations from Lambert et al. [14], of the form given in Equations (8) through (12), are summarized in Table A3. The coefficients for the four component equations are listed as $a_i$, $b_i$ for the $i$th component (stem, bark, branches or foliage, respectively). The error parameters $\sigma_{SUR}$ and $\sigma_{tt}$ are the SUR system variance and error in the total biomass equation as described in Section 4.3. The field $\psi_t(D)$ refers to the coefficient of the function that models heteroscedasticity of error in the total biomass equation, i.e., coefficient 'c' in $\psi_t(D) = D^c$. The column 'Range' refers to the range of diameter values the equations were regressed over and 'N' to the number of samples or trees of each species used in the regressions. For species at Harvard Forest for which specific equations were not provided in Lambert et al. [14], coefficients for general softwood (pines and other gymnosperms) or hardwood (maples and other angiosperms) equations were used.
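The component structure described above can be sketched as follows, assuming each component equation is a power law in diameter and that total biomass is the sum of the four components. Equations (8) through (12) are not reproduced in this section, so that form is an assumption. All coefficient values below are hypothetical placeholders, not entries from Table A3.

```python
COMPONENTS = ("stem", "bark", "branches", "foliage")


def component_biomass_kg(d_cm, coeffs):
    """Biomass (kg) of each component, assuming y_i = a_i * D**b_i.

    `coeffs` maps a component name to its (a_i, b_i) pair.
    """
    return {name: coeffs[name][0] * d_cm ** coeffs[name][1]
            for name in COMPONENTS}


def total_biomass_kg(d_cm, coeffs):
    """Total above-ground biomass (kg) as the sum of the components."""
    return sum(component_biomass_kg(d_cm, coeffs).values())


def heteroscedasticity_weight(d_cm, c):
    """psi_t(D) = D**c, the diameter-dependent function named in the text.
    How it scales the total-biomass error is defined in Section 4.3 and is
    not modelled here."""
    return d_cm ** c


# Hypothetical coefficients, for illustration only.
coeffs_hyp = {
    "stem": (0.08, 2.4),
    "bark": (0.02, 2.2),
    "branches": (0.01, 2.5),
    "foliage": (0.03, 1.8),
}

parts = component_biomass_kg(30.0, coeffs_hyp)
print({name: round(value, 1) for name, value in parts.items()})
print("total (kg):", round(total_biomass_kg(30.0, coeffs_hyp), 1))
```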

Comparison of Biomass Estimates and Errors
Individual tree weights and associated uncertainties were estimated for the Harvard Forest 2009 dataset using coefficients listed in Tables A1-A3 with the appropriate biomass and error equations discussed in Sections 3 and 4 respectively. Figure 3 shows the mean tree weights (in kg) for four of the major tree species encountered at the Harvard Forest as a function of the diameters measured during the field survey. Error bars using 95% confidence intervals are also shown. Since a majority (90 percent) of the measured diameters are less than 40 cm, the diameter range in Figure 3 is truncated at 40 cm. In all cases a diameter measurement accuracy, $\sigma_D$, of 2% was chosen [40].
As expected, the mean tree biomass of softwoods (dominantly red pines and hemlocks) is lower than the mean tree biomass of hardwoods (dominantly red maples and red oaks). Within the hardwood category, red maples have lower specific wood density than the oaks, so they tend to have lower biomass values. This is noticeable in Figure 3, with the biomass of red maples only slightly greater than that of hemlocks and pines of the same size. The more noticeable trend in Figure 3 is the large uncertainty in tree biomass estimates. In all four cases shown here, the error exceeds 100% of the mean tree biomass for larger diameters. The mean biomass and error estimates from the Lambert equations tend to be less than or equal to estimates from the other two allometric equations for most species. These per-tree biomass estimates were summed over all trees in a subplot (one subplot = 0.0625 ha) and multiplied by the appropriate scale factors to generate estimates of biomass in units of tons/ha. The biomass errors for the subplots, obtained by using the root sum of squares of the individual tree errors, were also multiplied by the same scaling factors to convert the errors into units of tons/ha. Figure 4 compares the total biomass estimates for all the subplots obtained using the three different allometries. Estimates from the Jenkins allometry [13] are plotted against subplot-level biomass estimates from the Ter-Mikaelian [12] and Lambert [14] allometries. The errors in biomass estimates from each of the allometries are shown as error-bars of widths corresponding to 95% confidence intervals. The mean estimates from the Ter-Mikaelian and Jenkins equations are consistent, but the estimates from the Lambert equations are consistently lower. The subplot-level biomass estimates over the Harvard Forest range from 50 tons/ha to 500 tons/ha, with mean values of roughly 200 tons/ha from the Lambert and 250 tons/ha from the other two allometries.
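The subplot aggregation described above can be sketched as follows. The sketch assumes that "tons" means metric tons (1000 kg), so the scale factor from kilograms per subplot to tons/ha is 1 / (1000 × 0.0625). The function names and the tree values are hypothetical.

```python
import math

SUBPLOT_AREA_HA = 0.0625   # one 25 m x 25 m subplot
KG_PER_TON = 1000.0        # assumption: metric tons


def subplot_biomass_tons_per_ha(tree_biomass_kg):
    """Sum per-tree biomass (kg) and scale to tons/ha."""
    return sum(tree_biomass_kg) / KG_PER_TON / SUBPLOT_AREA_HA


def subplot_error_tons_per_ha(tree_error_kg):
    """Combine per-tree errors (kg) by root sum of squares, then apply
    the same scale factor as for the biomass."""
    rss_kg = math.sqrt(sum(e * e for e in tree_error_kg))
    return rss_kg / KG_PER_TON / SUBPLOT_AREA_HA


# Hypothetical trees in one subplot: biomass and error in kg.
biomass_kg = [500.0, 750.0]
error_kg = [30.0, 40.0]

print(subplot_biomass_tons_per_ha(biomass_kg))  # 1250 kg on 0.0625 ha
print(subplot_error_tons_per_ha(error_kg))      # RSS of 30 and 40 is 50 kg
```

Because the errors add in quadrature while the biomass adds linearly, the relative error of a subplot total is smaller than the relative error of its individual trees.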
Figure 5 shows the per-tree weights aggregated for the corresponding plots to generate hectare-level biomass totals from the three allometric equations. Estimates of errors are shown as error-bars using 95% confidence intervals. The biomass values for the fifteen 1-ha plots over the Harvard Forest range from 115 to 350 tons/ha. The confidence intervals for these hectare-level biomass estimates are fairly narrow, with mean errors of 2 tons/ha for the Lambert and Ter-Mikaelian allometries and slightly higher errors of 4 tons/ha for estimates using the Jenkins allometry.

Figure 4. Comparison of subplot-level biomass estimated over the Harvard Forest using the Ter-Mikaelian [12] (Table A1), Jenkins [13] (Table A2) and Lambert [14] (Table A3) allometric equations.

Figure 6 shows a comparison of the estimated errors as a function of mean biomass at the 1-ha (plot) and 0.0625-ha (subplot) level for the three sets of allometric equations. The error from the Jenkins equations is higher, with a mean value of 18.8 tons/ha for subplots and 4.95 tons/ha for 1-ha plots. The mean error for both the Ter-Mikaelian and Lambert allometries is roughly the same at 2 tons/ha and 7 tons/ha for plot and subplot level estimates respectively. The maximum error in subplot level estimates can be as high as 86 tons/ha for the Jenkins equations, while the maximum for both the Lambert and Ter-Mikaelian equations is on the order of 30 tons/ha. At the ha-level the Jenkins equations again display the largest error, with a maximum of approximately 8 tons/ha. The biomass error at the subplot-level has a strong dependence on the mean biomass value for each allometry. Errors from the Jenkins equations also display an increase in their variance as a function of the mean biomass values at both subplot and ha-levels. This effect is less noticeable for the other two allometries.

Figure 6. Subplot (left) and hectare level (right) biomass error as a function of mean biomass from the Ter-Mikaelian [12] (Table A1), Jenkins [13] (Table A2) and Lambert [14] (Table A3) allometric equations.
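The drop in error from subplot to hectare scale can be illustrated with a short calculation. The sketch assumes that the errors of the sixteen equal-area subplots in a plot are independent and combine by root sum of squares, which is a simplification for illustration rather than the procedure documented here. The function name is hypothetical.

```python
import math


def plot_error_from_subplots(subplot_errors_t_ha):
    """Plot-level error (tons/ha) from the errors of its equal-area
    subplots (tons/ha), assuming independent errors.

    Each subplot error converts to tons by multiplying by the subplot
    area; the root sum of squares over n subplots is then divided by the
    plot area (n subplot areas), which leaves sqrt(sum(e**2)) / n.
    """
    n = len(subplot_errors_t_ha)
    if n == 0:
        raise ValueError("at least one subplot error is required")
    return math.sqrt(sum(e * e for e in subplot_errors_t_ha)) / n


# Sixteen subplots with the mean subplot errors quoted in the text.
print(plot_error_from_subplots([7.0] * 16))    # Ter-Mikaelian / Lambert
print(plot_error_from_subplots([18.8] * 16))   # Jenkins
```

With equal subplot errors the plot-level error is the subplot error divided by four, which gives about 1.75 and 4.7 tons/ha for the two cases, broadly in line with the plot-level means of roughly 2 and 4.95 tons/ha reported above.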

Howland Forest, Maine
The Howland Forest research facility, managed by the University of Maine since 1989, is spread over 200 ha in central Maine near the town of Howland, 50 km north of Bangor, Maine. Figure 7 shows its two tracts, one near the town of Howland and the other near Penobscot. The forest, a boreal-northern hardwood transitional forest, consists mainly of spruce, fir, hemlock, pines and maples. The topography of the region is generally flat and the field sites are also laid out in low relief areas. At Howland and Penobscot, plots of the same dimensions as those established at the Harvard Forest were laid out and surveyed. The field surveys over Howland/Penobscot and Harvard were part of the same data collection effort, so the methodology over these two sites was deliberately similar. We use data from both the Howland and Penobscot Forest surveys; however, for simplicity they are hereafter referred to as the Howland Forest.

Howland Forest Field Survey
At the Howland Forest, eleven 1-ha plots were surveyed during the summer of 2009. Data from twelve 1-ha plots were also collected at Penobscot. A total of 28 species were recorded during the two field surveys. Of those, eight account for roughly 90% of the total tree stem count and biomass. The field data recorded species, diameters (dbh) and condition over all subplots within the multiple 1-ha plots.

Allometric Equations for the Howland Forests
Of the 28 species recorded at the Howland Forest, 17 had been encountered at the Harvard Forest as well. For species common to both the Harvard and Howland Forest datasets, allometric equations summarized in Section 5.2 were used to generate biomass and error estimates over the Howland Forest. For the species that were only seen at the Howland Forest and not at the Harvard Forest, coefficients from the Ter-Mikaelian equations [12] were selected (Table A4). Coefficients for green ash (Fraxinus pennsylvanica) were not summarized in [12], therefore those of black ash (Fraxinus nigra) were used. Similarly, coefficients from bigtooth aspen (Populus grandidentata) were used for balsam poplar (Populus balsamifera). For a small number of stems, species was not recorded. In such cases the coefficients for balsam fir (Abies balsamea) were used since it was the most widespread species. The eleven species unique to the Howland Forest dataset belong to the same eight classes for the Jenkins equations as summarized in Table A2. Balsam fir (Abies balsamea) belongs to the True Fir/Hemlock class, mountain maple (Acer spicatum) to the Soft Maple/Birch class, while black ash (Fraxinus nigra), green ash (Fraxinus pennsylvanica) and American elm (Ulmus americana) are classified as Mixed Hardwoods in [13]. For the two spruce species, Norway and black (Picea abies, Picea mariana), coefficients from the Spruce class were used, while coefficients from the Cedar/Larch class were used for white cedar (Thuja occidentalis). Both balsam poplar (Populus balsamifera) and trembling aspen (Populus tremuloides) are classified into the Aspen/Alder/Willow category. Biomass for the unidentified trees was estimated using the True Fir/Hemlock equation.
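The species-to-class assignment described above amounts to a lookup table with a fallback for stems whose species was not recorded. A minimal sketch follows; the dictionary lists only the species named in the paragraph above, and the data structure and function name are hypothetical choices for illustration.

```python
# Jenkins class for each Howland-only species named in the text.
JENKINS_CLASS = {
    "Abies balsamea": "True Fir/Hemlock",
    "Acer spicatum": "Soft Maple/Birch",
    "Fraxinus nigra": "Mixed Hardwood",
    "Fraxinus pennsylvanica": "Mixed Hardwood",
    "Ulmus americana": "Mixed Hardwood",
    "Picea abies": "Spruce",
    "Picea mariana": "Spruce",
    "Thuja occidentalis": "Cedar/Larch",
    "Populus balsamifera": "Aspen/Alder/Willow",
    "Populus tremuloides": "Aspen/Alder/Willow",
}

# Stems without a recorded species use the True Fir/Hemlock equation.
UNIDENTIFIED_CLASS = "True Fir/Hemlock"


def jenkins_class(species):
    """Return the Jenkins class for a species name.

    A missing species (None or empty string) falls back to the class used
    for unidentified trees; an unknown name raises KeyError so that a
    typo is not silently mapped to the fallback.
    """
    if not species:
        return UNIDENTIFIED_CLASS
    return JENKINS_CLASS[species]


for name in ("Picea mariana", "Thuja occidentalis", None):
    print(name, "->", jenkins_class(name))
```

Raising an error for unknown names, rather than defaulting, keeps the fallback limited to the case the text describes: stems for which no species was recorded.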
Allometric coefficients from Lambert et al. [14] for the species unique to the Howland Forest are summarized in Table A5. Here, coefficients for most of the species were available, except for mountain maple (Acer spicatum), green ash (Fraxinus pennsylvanica) and Norway spruce (Picea abies). For the first two, coefficients from the general hardwood equations were used, while the coefficients from the softwood equation were used for Norway spruce. The biomass of all the unidentified trees was estimated using a general hardwood equation as well.

Comparison of Biomass Estimates and Errors
Total biomass estimates for the 23 plots and the corresponding 368 subplots at the Howland Forest were obtained by summing the tree biomass estimates from the three different allometric equations. Unlike the Harvard Forest dataset, a larger number of low-biomass sites were sampled during the field surveys over the Howland Forest. Figure 8 shows the comparison of subplot-level biomass estimates from the three allometric equations: Jenkins [13], Ter-Mikaelian [12] and Lambert [14]. Errors in these subplot-level biomass estimates are shown as error-bars using 95% confidence intervals.

As was the case with the Harvard Forest estimates, the Lambert allometry (shown here in green) yields consistently lower biomass values than the Jenkins equations. Mean biomass estimates from the Ter-Mikaelian and Jenkins equations, however, are similar. The range of subplot-level biomass values at the Howland Forest is similar to that of the Harvard Forest (roughly 450 tons/ha); however, there are more subplots with fairly low biomass values (less than 50 tons/ha) that are not present in the Harvard Forest dataset.
Figure 9 shows the biomass estimates for the twenty-three hectares at the Howland Forest obtained from the three different allometric equations. The biomass estimates from these sites, with the descriptor H for Howland and P for Penobscot, range from close to zero (plot P9) to about 270 tons/ha (plot P1). The high biomass values at Howland are not as high as those at the Harvard Forest (of up to 350 tons/ha), primarily due to the larger number of hardwoods at the Harvard Forest site. However, the low biomass sites, such as P9 at Howland, are much lower than any found at the Harvard Forest, primarily because they are composed of small trees at very low density (about 77 trees in total with diameters ranging from 6 to 13 cm). Similar site structures were not surveyed at the Harvard Forest.

Figure 9. Comparison of hectare-level biomass estimated using the Ter-Mikaelian [12] (Table A4), Jenkins [13] (Table A2) and Lambert [14] (Table A5) allometric equations over the Howland Forest.

Figure 10. Subplot and plot level biomass error as a function of mean biomass from the Ter-Mikaelian [12] (Table A4), Jenkins [13] (Table A2) and Lambert [14] (Table A5) allometric equations over the Howland Forest.

Figure 10 shows the error estimates from the three allometries as a function of mean estimated biomass at the two spatial scales (subplots and hectares). These errors, much like those for the Harvard Forest, increase as a function of the mean biomass value for both the subplot and ha-levels. The variance of the errors from the Jenkins equations increases as a function of the mean biomass as well. Of the three allometric equations, the mean errors are highest for the Jenkins equations at 8.7 tons/ha and 2.3 tons/ha for subplots and hectares respectively. Much like the Harvard Forest, both the Ter-Mikaelian and Lambert allometries yield lower and consistent mean error values of roughly 1 ton/ha and 4 tons/ha for hectares and subplots respectively.

Implications for Remote Sensing
Remote sensing instruments such as radar and lidar do not measure tree weights or forest biomass directly, but rather measure some aspect of forest structure, which is in turn related to forest biomass. Typically, algorithms used to translate remote sensing observations to forest biomass estimates rely on field data (often called "ground-truth") for calibration and validation. The quality of remote sensing estimates of forest biomass is therefore directly affected by the quality of the ground-truth data. Of particular interest to any analysis of remote sensing algorithms used to predict biomass are the bias and uncertainty in those biomass estimates. Assuming that remote sensing algorithms can be properly calibrated, the amount of bias is likely dominated by the difference between field estimates of biomass and the actual forest biomass. Analyzing this bias is a difficult proposition, as it implies cutting down and weighing a sub-sample of the measured trees, which would be time-consuming and prohibitively expensive. Researchers are left with a large set of allometric equations that may or may not predict forest biomass accurately. It is therefore instructive to at least compare the variability between biomass estimates from the various allometric equations to understand the range of possible ground-truth values. We did so by comparing the biomass estimates from the Ter-Mikaelian, Jenkins and Lambert allometric equations and analyzing their associated uncertainties. In addition, we have provided an estimate of the contribution of errors in ground-truth data to the total error budget of remote sensing biomass estimation algorithms. There was good agreement between the biomass values estimated using the Ter-Mikaelian and Jenkins allometries, at both subplot- and hectare-levels, at both the Harvard and Howland sites. The Lambert biomass estimates were consistently lower, in some cases by up to 30 percent. Two factors distinguish the Lambert equations. First, the methodology used to develop the Lambert equations is somewhat different, and second, the destructive sampling tree data were collected at more northern latitudes, across Canada instead of within the United States. While it is tempting to choose the Ter-Mikaelian or Jenkins equations because of their apparent agreement, the Lambert study has its merits because of the thorough and sound nature of the statistical framework analysis presented there.
Tables 1 and 2 summarize statistics for errors in biomass estimates at subplot- and hectare-scales respectively at the Howland and Harvard Forests. The statistics in terms of the mean, the maximum and the minimum for each of the three allometries are expressed in units of tons/ha. The errors in biomass estimates at the Howland Forest are lower in general than the errors at the Harvard Forest. This difference can primarily be attributed to the lower biomass estimates at the Howland Forest. Errors from the Jenkins allometry seem to be largest among the three allometries, with errors in estimates from the Lambert allometry consistently the lowest. At the subplot-level, the largest error is approximately 30 percent of the estimated biomass value for that subplot, which can exceed remote sensing requirements [1,41], necessitating a careful analysis of the quality of ground-truth data. At the ha-level, however, biomass error is almost always within 5 percent of the biomass estimate, suggesting that at larger spatial scales the quality of ground-truth data improves significantly, possibly to the point that these data can be used for calibrating remote sensing algorithms. For instance, the best case RMS error between lidar measurements and field estimates of biomass over the Harvard and Howland Forests is on the order of 30 tons/ha [42]. Average errors of less than 5 tons/ha in ground-truth estimates are thus only a small part of the total error budget and likely acceptable for calibration/validation purposes.
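To make the error-budget argument concrete, the following sketch assumes that the ground-truth error and the remaining error sources are independent and add in quadrature. That additive-variance model is an assumption introduced for illustration and is not a result of this article. The function names are hypothetical.

```python
import math


def variance_share(component_error, total_error):
    """Fraction of the total error variance attributable to one
    component, assuming independent errors that add in quadrature."""
    return (component_error / total_error) ** 2


def residual_error(total_error, component_error):
    """Error remaining once the component is removed in quadrature."""
    return math.sqrt(total_error ** 2 - component_error ** 2)


TOTAL_RMS_T_HA = 30.0     # best-case lidar vs. field RMS error [42]
GROUND_TRUTH_T_HA = 5.0   # upper bound on average hectare-level error

share = variance_share(GROUND_TRUTH_T_HA, TOTAL_RMS_T_HA)
rest = residual_error(TOTAL_RMS_T_HA, GROUND_TRUTH_T_HA)
print(f"ground-truth share of variance: {share:.1%}")
print(f"error from other sources: {rest:.1f} tons/ha")
```

Under this assumption a 5 tons/ha ground-truth error accounts for less than 3 percent of the variance in a 30 tons/ha RMS error, and removing it entirely would lower the total only to about 29.6 tons/ha.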

Conclusions
In this article, an effort was made to characterize the uncertainty of biomass estimates from data collected during field surveys over the Harvard and Howland Forests, representing typical eco-regions of the Northeastern United States. The field data consisted of diameter and species information from every tree larger than 10 cm over a total of 38 one-ha plots spread throughout the two sites. Three different sets of allometric equations were used to estimate mean biomass and the errors associated with those estimates. A comparison of the biomass and error estimates over the Harvard and Howland sites using the three allometries at subplot (25 × 25 m) and hectare (50 × 200 m) scales was presented. At the Harvard Forest, the biomass estimates ranged from 50 to 500 tons/ha for subplots and 113 to 250 tons/ha for one-ha plots, while at the Howland Forest biomass values of up to 450 tons/ha and 250 tons/ha were estimated at subplot and one-ha plot scales respectively. The average error in subplot-scale biomass estimates varied between 4 and 20 tons/ha, while at one-ha scales these errors were smaller and ranged between 1 and 5 tons/ha.

Figure 2. Harvard Forest tracts and plots. The inset shows a 1-ha plot with its sixteen 25 m by 25 m subplots numbered one through sixteen.

Figure 3. Estimated tree weights with 95% confidence intervals for the four major tree species at the Harvard Forest.

Figure 7. Howland Forest Research facility includes two sites, one near the town of Howland and the other near Penobscot in central Maine.

Table 1. Subplot scale comparison of biomass errors from the three allometries at the two study sites. All errors are listed in units of tons/ha.

Table 2. Hectare scale comparison of biomass errors from the three allometries at the two study sites. All errors are listed in units of tons/ha.

Table A3. Summary of coefficients from the Lambert allometric equations [14] for the twenty-three species catalogued at the Harvard Forest.

Table A4. Summary of diameter data for the eleven species unique to the Howland Forest dataset and the corresponding coefficients chosen from the Ter-Mikaelian allometry [12].

Table A5. Summary of coefficients from the Lambert allometric equations [14] for the eleven species catalogued only at the Howland Forest.