Modeling Forest Aboveground Biomass and Volume Using Airborne LiDAR Metrics and Forest Inventory and Analysis Data in the Pacific Northwest

The United States Forest Service Forest Inventory and Analysis (FIA) Program provides a diverse selection of data used to assess the status of the nation’s forests using sample locations dispersed throughout the country. Airborne laser scanning (ALS) systems are capable of producing accurate measurements of individual tree dimensions and also possess the ability to characterize forest structure in three dimensions. This study investigates the potential of discrete return ALS data for modeling forest aboveground biomass (AGBM) and gross volume (gV) at FIA plot locations in the Malheur National Forest, eastern Oregon utilizing three analysis levels: (1) individual subplot (r = 7.32 m); (2) plot, comprising four clustered subplots; and (3) hectare plot (r = 56.42 m). A methodology for the creation of three point cloud-based airborne LiDAR metric sets is presented. Models for estimating AGBM and gV based on LiDAR-derived height metrics were built and validated utilizing FIA estimates of AGBM and gV derived using regional allometric equations. Simple linear regression models based on the plot-level analysis out OPEN ACCESS Remote Sens. 2015, 7 230 performed subplot-level and hectare-level models, producing R2 values of 0.83 and 0.81 for AGBM and gV, utilizing mean height and the 90th height percentile as predictors, respectively. Similar results were found for multiple regression models, where plot-level analysis produced models with R2 values of 0.87 and 0.88 for AGBM and gV, utilizing multiple height percentile metrics as predictor variables. Results suggest that the current FIA plot design can be used with dense airborne LiDAR data to produce area-based estimates of AGBM and gV, and that the increased spatial scale of hectare plots may be inappropriate for modeling AGBM of gV unless exhaustive tree tallies are available. Overall, this study demonstrates that ALS data can be used to create models that describe the AGBM and gV of Pacific Northwest FIA plots and highlights the potential of estimates derived from ALS data to augment current FIA data collection procedures by providing a temporary intermediate estimation of AGBM and gV for plots with outdated field measurements.


Introduction
Light detection and ranging (LiDAR) is a laser-based, active remote sensing system, which collects ranging data utilizing the speed of light and information about the flight time of a laser pulse [1].In this context, flight time refers to the time it takes for a given laser pulse to travel from a system, backscatter from an object, and return back to the system.A wide variety of LiDAR systems currently exist, and data have been successfully collected utilizing systems mounted to space-borne, aerial, and terrestrial (tripod or vehicle-based) platforms.
It should be noted that the ability to acquire three-dimensional data is not unique to LiDAR remote sensing systems.This type of data can also be obtained by radar systems (another active remote sensing system) or through the use of photogrammetric techniques in conjunction with stereoscopic image pairs collected by airborne or satellite systems.A variety of studies have provided comparisons of LiDAR and radar forest measurements to ground measurements.For example, Sexton et al. [33] used linear regression to examine LiDAR canopy height measurements and radar canopy height measurements and concluded that LiDAR provided more precise results (R 2 = 0.83).Hyde et al. [34] used LiDAR, synthetic aperture radar (SAR), and interferometric synthetic aperture radar (InSAR) to individually and synergistically predict AGBM for a southwestern ponderosa pine forest, and found, through individual comparison, that LiDAR predicted AGBM best, accounting for almost 84% of the variability.
Airborne laser scanners (ALS) can be broadly grouped into two categories: discrete return and full waveform digitizers.These categories can be further specified by the type of system (profiling or scanning), laser footprint size, and the number of recorded returns for each laser pulse.Previous ALS studies have demonstrated that both large-footprint waveform and small-footprint discrete return ALS data, can be used to derive measurements (e.g., tree height, crown dimensions, tree location) at the stand level [5,25,30,35] and plot level [8,19,36,37].Additionally, small-footprint LiDAR is also capable of deriving measurements at the individual tree level [10,11,21,22,28,[38][39][40][41][42][43][44].These direct ALS measurements can then be used in conjunction with known allometric relationships or statistical analysis procedures to estimate parameters such as diameter at breast height (DBH), AGBM, or gross volume (gV).
LiDAR research for forestry applications has largely focused on the development of methodologies to employ LiDAR data as a surrogate for various ground measurements.ALS data can be collected over larger areas with a reduced amount of effort compared to traditional field measurements.However, the high level of complexity present within many forests (e.g., large number of species and variable canopy densities) can complicate the retrieval of such measurements.In Norway, researchers have developed and implemented methods to produce measurements of interest for stand-based forest inventories, and were able to account for 84% to 89% of the variance when predicting stand volume [45].A summary of stand-based variables of interest, study characteristics, and results from investigations by Scandinavian researchers are listed in [45].
Since ALS systems collect data looking down on the forest, forest measurements other than tree height or crown dimensions (e.g., diameter at breast height, biomass) are typically indirectly estimated.Popescu [21], used regression analysis to estimate the DBH of individual trees, using the LiDAR-derived height and crown diameter measurements provided by TreeVaW (an individual tree detection software package) as independent variables in a regression analysis.Individual tree detection algorithms implemented in TreeVaW are described in Popescu and Wynne [46].In traditional forestry, biomass estimation requires destructive sampling, or the use of species-specific [47], regional, or national [48] allometric equations.Allometric equations can also be applied to LiDAR data, if the required information is available.Popescu [21] outlined a method for obtaining individual tree AGBM estimates using allometric equations and estimates of individual tree DBH from ALS data.Examples of other studies that have also predicted AGBM using LiDAR data include [17,20,34,49].
The United States Forest Service (USFS) Forest Inventory and Analysis (FIA) program provides forest inventory measurements used to assess the status of the nation's forests.Forest resource managers and researchers commonly use these measurements to estimate forest biophysical parameters such as, gV, AGBM, or Carbon stocks (C) at local, regional, and national scales.This direct link between data provider and end user makes the FIA Program the primary information provider for many of the gV estimates, AGBM budgets, and C budgets created in the United States.
The collection of forest inventory data at a national level is a challenging and complex undertaking.Models relating ALS data to FIA parameters hold great potential to contribute to this task, by: (1) supplementing ground-based FIA measurements or biophysical parameter estimates with estimates produced from ALS data, especially in recently disturbed areas; (2) providing an increased amount of data for areas of interest that contain only a small number of FIA sample locations; or (3) aiding data collection in remote areas where challenging environmental or terrain conditions make ground-based measurements exceedingly dangerous, time consuming, and costly.
The overall objective of this study is to model forest AGBM and gV utilizing LiDAR metrics from individual subplots, four clustered subplots (hereafter referred to as a plot), and hectare plots using AGBM and gV estimates for individual subplots and plots calculated from ground-based FIA measurement data and regional allometric equations and subsequently compared to LiDAR-derived height percentile, height bin, and density bin metrics calculated for individual subplots, plots, and hectare plots.Plot AGBM and gV estimates were compared to plot and hectare plot LiDAR metrics as exhaustive tree tallies were not collected for the hectare plots.Since the data collected by ALS systems are capable of describing the three dimensional structure of the forest, they can be used to estimate forest biophysical parameters of interest such as AGBM and gV.Specific study objectives include: (1) development of a methodology to derive area-based airborne LiDAR metrics related to forest biophysical parameters for FIA subplots, plots, and hectare plots; (2) identification of relationships between the LiDAR metric sets and FIA subplot and plot estimates of forest AGBM and gV calculated using regional allometric equations; (3) investigation of the effectiveness of individual and multiple point cloud metrics to predict AGBM and gV within the context of the FIA plot design; and (4) identification of the most appropriate LiDAR metrics and analysis level for estimating AGBM and gV in the conditions present in the western forests in the US.While the remote sensing literature abounds with forestry LiDAR studies, our study brings novel elements that include: (1) development of an ALS-based methodology for estimating AGBM and gV utilizing the national forest inventory in the US, the USFS FIA plot design and ground measurements; (2) investigation of the effectiveness of previously developed point cloud metrics within the context of the FIA plot design; and (3) comparison of AGBM and gV estimates over three analysis scales: individual subplots (r = 7.32 m), plots (n = 4 subplots, each with r = 7.32 m), and hectare plots (r = 56.42m).

Study Area
The study areas for this project are located in the Malheur National Forest in eastern Oregon.ALS data were acquired for two areas, referred to as the Malheur and Camp Creek acquisitions, and cover a total of approximately 189,468 hectares (Figure 1).Elevation ranges from 1236 to 2593 m, with a mean slope of 22 degrees.The sites were selected due to availability of FIA ground measurements and recent ALS data, and the presence of a wide variety of forest structure and physiographic conditions.The forests located within the study area are composed of mostly Ponderosa pine (Pinus ponderosa), Douglas-fir (Pseudotsuga menziesii), western larch (Larix occidentalis), grand fir (Abies grandis), and lodgepole pine (Pinus contorta).

Data
This study utilized FIA ground crew in situ measurements and discrete return, small-footprint, airborne LiDAR (ALS) data.

Forest Inventory and Analysis Data
The USFS provided FIA data for all FIA plots within the study area (177 plots, 708 subplots, and more than 4400 trees).Each FIA plot contains a cluster of four circular ~0.016 hectare subplots (radius = 7.32 m).Subplot one is centered over the plot center for the entire FIA plot.Subplots two, three, and four are located 36.58 m from the center of subplot one at azimuths of 0°, 120°, and 240°, respectively.Until recently, the Pacific Northwest FIA region's field plot design included, unlike other FIA regions, a one-hectare plot (r = 56.24m, Figure 2).All individual trees with DBH equal or greater than 12.7cm are measured and recorded if they are located within the boundaries of a subplot.Measurements collected for each of these trees include: DBH, height, tree condition (live/dead), crown class (open grown, dominant, codominant, intermediate, or overtopped), species, and species group.Measurements for describing tree location relative to plot center (e.g., azimuth and distance) are also collected.While trees were measured within hectare plots, the tally was not exhaustive (a larger minimum DBH threshold was employed).Tree measurements from hectare plots were not utilized in this study.The regional equations used to calculate individual tree AGBM and gV estimates are discussed in detail in [50].AGBM estimates can be derived from gV estimates by simple linear scaling, where the scaling factor is the mean specific gravity for a particular species.Thus, estimates of AGBM and gV produced with regional equations tend to be highly correlated.
Subplot-level estimates of total AGBM and gV were calculated by summing the AGBM and gV estimates for all live trees (meeting FIA measurement requirements) within each subplot.Total subplot AGBM and gV were then scaled to a per hectare estimates.Plot and hectare plot estimates for total AGBM and gV were calculated using the total AGBM and gV for all subplots within each hectare plot.Total AGBM and gV were then scaled to a per hectare estimate.In essence, the area from the four subplots is scaled up to one hectare.It is important to note that the same estimates of field-based AGBM and gV are utilized for the plots and hectare plots, as field data for all of the trees within hectare plots were not available.
The FIA ground crew data utilized in this study (collected in 2007, 2008, and 2009) contained a total of 232 subplots and 58 hectare plots (Table 1).Temporal disparities between FIA ground data and the ALS data utilized for this study ultimately resulted in a 67% reduction in sample size for this study.The 33% of the data remaining was deemed conducive to a valid comparison with modeled derivatives of the ALS data, which were collected between 2008 and 2009.Furthermore, a total of 35 subplots and 3 hectare plots did not contain trees meeting the FIA measurement criteria.Identified subplots and hectare plots were considered to be non-forested and excluded from this analysis, resulting in the use of 197 subplots and 55 hectare plots.

ALS Data
A total of three ALS data acquisitions were flown to acquire data for the two study areas.The data for the Camp Creek study area were collected from 19 August 2008 to 27 August 2008.Data for the Malheur study area were acquired with two separate acquisition missions, the first covered the western half of the study area (between 19 November 2008 and 11 December 2008) and a second covering the eastern half (between 1 July 2009 and 5 July 2009).All data collection missions utilized two LiDAR instruments, a Leica ALS50 Phase II and a Leica ALS60, mounted in a single engine fixed-wing survey aircraft, both operating with pulse frequency of 105 kHz and capable of recording up to 4 returns per pulse.Post processing eliminated all returns from pulses with scan angle exceeding 14°.Analysis of the ALS data for both study areas shows a mean pulse density of 9 pulses per m 2 .Although the first data collection mission was performed during months where leaf-off conditions could be present, the majority of species within our study area do not lose their foliage during the winter months.Three hectare plots utilized in this study had LiDAR data acquired during both the leaf-on and leaf-off acquisitions.These plots were utilized to examine if major differences between LiDAR data collected during leaf-on and leaf-off conditions existed.Comparison of hectare plot-level metrics for these data did not identify large differences.For example, mean plot heights for leaf-off conditions were 7.6 m, 7.7 m, and 5.8 m.Corresponding mean heights from leaf-on conditions were 7.8 m, 7.8 m, and 5.9 m.

Point Cloud-Based ALS Metrics
To minimize effects from temporal discrepancies between FIA field plot visit and LiDAR acquisition due to tree growth, mortality, etc., ALS point cloud metrics were calculated for plots measured by FIA ground crews in 2007, 2008, and 2009; resulting in a total of 232 subplots and 58 hectare plots, of which a total of 197 subplots and 55 hectare plots were forested and utilized for this study.Considering that the spatial allocation of FIA plots is random, the selected plots are believed to represent the range of slopes and forest conditions present in the Malheur National Forest.Tables 1 and 2 provide summaries of the data collection year for each subplot and the frequency of tree species in each crown class.Descriptive statistics for FIA measured DBH and tree height, as well as subplot and hectare plot mean AGBM and gV are presented in Tables 3 and 4.

Extraction of Subplot and Hectare Plot Point Clouds
Global positioning system (GPS) coordinates for all subplot center locations were obtained by using a real time differential, wide area augmentation system enabled survey-grade GPS receiver and post-processed with information from a base station.Plot coordinates were furnished with estimates of precision known to correspond to the 95th percentile three-dimensional distance threshold around the plot center location (Table 5).The resulting geolocation precision of plots used in this study is likely superior to the coordinates obtained during the 2007-2009 regular FIA field visits, when FIA field crews were utilizing recreational-grade GPS receivers.
Circular plot buffers for subplots (r = 7.32 m) and hectare plots (r = 56.42m) were generated using subplot center coordinates and plot center coordinates, respectively.The buffers for each cluster of four individual subplot comprising a plot were also merged into one file.Individual subplots, plots, and hectare plot point clouds were subsequently extracted from the vendor-provided LiDAR tiles utilizing FUSION [51] (version 3.30, Figure 3), a LiDAR viewing and analysis software suite developed by the USFS.Previous studies have shown that geolocation error can increase variation in ALS metrics as well as biophysical parameter estimates based on these metrics [52].Frazer et al. [53] found that plot size could amplify or reduce the severity of geolocation error.Increased robustness of larger area plots to geolocation error is a direct result of the increased amount of overlap between ground and ALS data, the ability to capture more variability from ground measurements, and the reduction of the perimeter of the plot with respect to the area within the plot.The conclusions presented in these studies evidence the possible need to use hectare plot-level point cloud metrics (as opposed to subplot-level point cloud metrics) since the larger plot area should mitigate and negative effects stemming from geolocation errors.

Height Percentiles
Height percentiles, calculated from ALS data, have been previously used to relate ALS data to forest biophysical parameters [18][19][20].For this study, mean height, max height, and the 25th, 50th, 75th, 90th, and 95th height percentiles were calculated for all subplot and hectare plot point clouds.4).FUSION provided height bin outputs as the total number of points within each height bin, as well as the ratio of points in a given height bin to the total number of points in a given file (e.g., subplot, hectare plot).The ratio was used for modeling purposes since it provides a normalized measure of the number of points in each height bin for all plots.

Density Metrics
Density metrics, similar to those utilized in [45], were also calculated for each subplot and hectare plot.Density metrics for this study describe cumulative above-threshold point frequency, and were calculated using the same static height bin intervals mentioned in Section 2.3.5.The primary difference between the density and height bins is that all points greater than or equal to a given bin threshold (i.e., minimum height value) are counted, e.g., density 1 (d1) contains the count of all points with heights greater than or equal to 0 m, density 2 (d2) contains the count of all points greater than or equal to 5.0 m, etc. Counts are normalized by dividing the total number of points in a density metric by the total number of points within the subplot (Figures 4 and 5).The ALS data processing approach is summarized in Figure 6.

Regression Analysis
Simple linear regression (SLR) models were used to examine the relationship between the ALS-derived point cloud metrics and the subplot-, plot-, and hectare-level (per hectare) estimates of AGBM and gV.Additionally, multiple regression (MR) analysis, using mixed stepwise elimination (based on AIC criterion [54]), was implemented with all variables listed in Table 6.Variance inflation factors (VIFs) were calculated to check for the presence of multicollinearity among the remaining predictor variables in the model.Predictor variables with VIFs greater than ten were considered an indicator of multicollinearity in the model [55].
Models were cross-validated using the prediction sum of squares (PRESS) statistic [56].The PRESS statistic is a leave-one-out cross-validation method, where a model is fit using n-1 observations (where n is the total number of observations in the dataset).The withheld observation is then used to calculate a prediction error.This process is repeated until a prediction error is calculated for each observation.The PRESS statistic is calculated by summing the squared prediction errors.Statistics calculated for each model can be compared, and the model with the lowest PRESS statistic will have the highest goodness of fit.

Results
Individual SLR models were created for the AGBM and gV estimates using each of the point cloud-based metrics calculated for the individual subplots, plots, and the hectare plots.Diagnostic plots from the initial SLR models confirmed the existence of heteroscedasticity, and further examination found the histograms of the AGBM and gV data were positively skewed.Both issues violate the assumptions of linear regression, and indicate that the original models were not appropriate, and that a transformation was needed for the data to satisfy the normality and constant variance assumptions of linear regression.Such transformations are also commonly used to normalize positively skewed distributions and reduce heteroscedasticity [55].All SLR models were rerun using the natural log and square root transformed AGBM and gV data.Diagnostic plots indicated that the square root transformation (AGBMsqrt and gVsqrt) was most appropriate for these data.
When utilizing the subplot square root transformed AGBM and gV data, the best independent variables from the independent subplot point cloud metric sets were: mean height (Figures 7a and 8a), hb20-25, and d3, which accounted for 77%, 52%, and 67% of the variability in field estimated AGBM and 73%, 49%, and 62% of the variance in field estimated gV (Table 7).All previously mentioned models were significant at the α = 0.05 level.The best independent variables for predicting AGBM from the plot point cloud metrics were also mean height (Figure 7c), hb20-25, and d3, accounting for 83%, 56%, and 76% of the variability in field estimated AGBM.For gV, the best independent variables from the clustered subplot point cloud metrics were the p90 (Figure 8c), mean height, hb20-25, and d3, accounting for 81%, 80%, 54%, and 74% of the variability in the field estimated gV.Mean height was also included in the best independent variable list since models produced using p90 and mean height were very similar (Table 8).Models produced utilizing the plot point cloud metrics were all significant at the α = 0.05 level.The best independent variables for the hectare plot point cloud metric sets were mean height (Figure 7e), hb15-20, and d3, accounting for 73%, 63%, and 73% of the variability in field estimated AGBM.The best independent variables for predicting gV from the hectare plot point cloud metric sets were p90 (Figure 8e), hb15-20, and d3, accounting for 73%, 62%, and 71% of the variability in field estimated gV (Table 9).Models for the hectare plot data were all significant at the α = 0.05 level.MR models for the subplot data were created utilizing independent variables selected with a mixed stepwise selection method (AIC criterion-based).The MR models showed only minor improvement in predictive ability when including more than one of the height percentile metrics.However, models based on multiple height bin metrics or multiple density metrics did improve the predictive ability of models (Tables 7 and 10).The AGBM and gV MR models based on the height bin metric set included hb5-10, hb10-15, hb15-20, hb20-25, and hbgt25.All variables in both models were significant at the α = 0.05 level, and all VIFs were less than 10 indicating no multicollinearity issues.The height bin-based model was able to explain 78% of the variability in field estimated AGBM at the subplot-level.The height bin based gV MR model was able to explain 75% of the variability in gV.The MR model for predicting subplot AGBM from density metrics utilized d2, d3, d5, and d6, while the MR model for predicting subplot gV from density metrics utilized d2, d5, and d6.All model variables were significant at the α = 0.05 level, and VIFs were below 10.The MR models were able to explain 78% and 74% of the variance in field estimated AGBM and gV, respectively.
For MR models based on plot metric sets (Table 11), the addition of multiple independent variables improved models for every metric set.Models utilizing percentile metrics as independent variables were able to account for the greatest variability in the field estimated AGBM and gV, 87% and 88%, respectively.The most notable improvement was seen in models utilizing height bin metrics, where the use of multiple height bin metrics allowed height bin-based models of AGBM and gV to account for 86% and 85% of the variability in the field estimated AGBM and gV, as compared to SLR models (utilizing hb20-25) which accounted for 56% and 54% of the variability in field estimated AGBM and gV.AGBM and gV models employing density metrics both utilized two density bins (d2 and d6), and were able to account for 86% and 84% of the variability in the field estimated AGBM and gV.All models reported in Table 11 were significant at the α = 0.05 level, and showed no multicollinearity issues (all VIFs were lower than 10).
For MR models based on hectare plot metric sets, the addition of multiple independent variables only improved models based on height bin metrics or density metrics.The percentile-based MR models (for ABGM and gV) identified several independent variables (mean height and p90 for AGBM and mean height p90, and p25 for gV), but not all of the selected variables were significant at the α = 0.05 level, and had VIFs larger than 10 (indicating multicollinearity).The resulting model had a lower R 2 value and higher RMSE than the SLR models developed for AGBM and gV SLR.The initial height bin-based MR model for AGBM (utilizing the square root transformation) utilized hb5-10, hb10-15, and hb15-20 as independent variables.This model showed no multicollinearity issues (VIFs below 10).All independent variables utilized were significant at the α = 0.05 level.The height bin-based model for gV (utilizing the square root transformation) employed hb10-15 and hb20-25, and showed no indication of multicollinearity (VIFs below 10).All independent variables were significant at the α = 0.05 level.The height bin-based MR models for AGBM and gV accounted for 74% and 73% of the variability in field estimated AGBM and gV, respectively.Analysis of the VIFs for the initial density-based MR models for the square root transformed AGBM and square root transformed gV did not indicate multicollinearity issues.All selected independent variables (d2 and d4 for AGBM and d2 and d5 for gV) were significant at the α = 0.05 level.Density-based MR models for AGBM and gV accounted for 75% and 73% of the variability in the field estimated AGBM and gV, respectively.
Modeling results presented above show that model R 2 values improved from models based on individual subplots to models based on plots.When the scale of the analysis level is increased to the hectare plot-level, a decrease in model R 2 values was observed.This pattern was visible in both SLR models and MR models (Tables 7-9 and 10-12, respectively).Overall, the best R 2 values were obtained when using plot-level analysis.

Discussion
Previous studies have found that ALS-derived variables describing the height of trees can produce accurate AGBM, gV, and other forest biophysical parameter predictions [15][16][17][18][19][20][21][22][23][24][25][26][27].To our knowledge, few studies have compared the ability of height percentile, height bin, and density LiDAR metrics to estimate AGBM and gV for the same dataset.In one such study, Naesset and Gobakken [57] compare similar LiDAR metrics, but focused on forest growth using mean tree height, basal area, and volume under different forest conditions than those present in the Malheur National Forest.Our study is also unique in its analysis utilizing point cloud metrics from individual subplots, plots comprising a cluster of 4 subplots, and hectare plots to determine which dataset would best estimate AGBM and gV.Hectare plots were also utilized due to concern regarding geolocation and edge effect errors commonly associated with small area plots [51,52].Table 5 provides descriptive statistics for the subplot center coordinates used to extract plot-level LiDAR data.Plot center GPS data utilized for this study exhibit superior geolocation precision when compared to the recreational-grade GPS receivers usually employed by FIA field crews.
After our analyses, it can be inferred that geolocation and edge effect errors did not have noticeable effect on the ability of subplot point cloud metrics to estimate AGBM and gV within our study area.However, future research efforts will attempt to quantify how geolocation and edge effect errors affected our AGBM and gV estimates.The best predictors of AGBM and gV at the individual subplot-level were mean height, hb20-25, and d3.Individual subplot SLR models based on hb20-25 exhibited a moderate relationship to AGBM and gV, while models based on d3 and mean height exhibited a stronger relationship.A visual examination of a subset of subplot point clouds identified that subplot-level mean height is commonly located near a high density cluster of LiDAR returns in the vertical distribution (as expected), while d3 typically covers a majority of the points within the upper forest canopy (Figure 9).This suggests that mean height is an accurate predictor of AGBM and gV because it accurately describes tree height for the majority of trees on a plot.The d3 metric typically provides information on the taller trees within a plot, although in the presence of mostly short trees it may fail to capture such information (Figure 9c,f).One possible reason for poor performance of SLR models based on subplot hb20-25 is the limited number of returns occurring within the 20-25 m.Thus, when larger trees are present within a plot this metric provides information useful for describing subplotlevel AGBM or gV (since these trees will typically contain a large portion of the AGBM or gV within a subplot).However, when subplots contain trees less than 20 m tall, this metric essentially reports that no vegetation layers are present within a plot (Figure 9b).This suggests that multiple height bin metrics might be required to correctly model AGBM and gV for forests with conditions similar to those exhibited in the Malheur National Forest.Our subplot MR models utilizing height bin-based independent variables (Table 9) provide evidence for this finding, with the final model utilizing multiple height bin metrics, exhibiting no multicollinearity issues, and accounting for a similar amount of variance in field estimated AGBM and gV as the best subplot SLR models.
Overall, the plot SLR and MR models were able to account for the largest amounts of variability in the field estimate AGBM and gV (Table 8).LiDAR-derived independent variables for the clustered subplot SLR models were mostly the same as those identified for the individual subplot models (mean, hb20-25, and d3).However, the best SLR model for gV utilized the percentile metric p90.This also occurred in the hectare plot SLR gV model.The SLR model utilizing the mean height value is very similar to the model based on p90, which is also similar to the SLR modeling results based on the hectare plot point cloud metrics.
Hectare plot-level SLR models utilizing height percentile metrics performed similarly to subplot-level SLR models, while density and height bin metric-based models were improved (Table 9).The best SLR models in each point cloud metric set were mean, hb20-25, and d3 for AGBM and p90, hb15-20, and d3 for gV.Table 9 also reports results for the gV model based on the mean, as this model was similar to the model based on p90.Models based on the previously mentioned independent variables accounted for 73%, 63%, and 73% of the variability in the field estimated AGBM and 75%, 62%, and 71% of the variability in field estimated gV.We believe this improvement is the result of scaling the subplot estimates of AGBM and gV to the hectare plot-level, since the average of conditions observed for four relatively small subplots would more closely resemble the conditions found within the larger hectare plot.In some cases, the best plot and hectare plot-level SLR models for a given metric set utilized a different independent variable than the corresponding individual subplot-level SLR models.For example, the hectare plot-level height bin-based models for AGBM and gV models utilized hb15-20, while the subplot-level SLR utilized hb20-25.A visual examination of the two height bins in one of the hectare plots suggests that hb15-20 provides more information about the forest conditions on the hectare plot than hb20-25 (Figure 10).This conclusion is also supported by the descriptive statistics for tree species height (Table 3), which shows eight of the eleven tree species (including those composing the majority of trees within our plots) have mean heights within the hb15-20 range of 15 to 20 m.At the individual subplot-level, stepwise MR analysis identified that models based on height bin metrics and density metrics could benefit from multiple predictor variables, while improvements to models based on height percentile metrics were nearly negligible.In height bin and density-based models, the inclusion of multiple predictors increased the predictive ability of the models to a level similar to the mean height-based SLR model, with R 2 values ranging from 0.74 to 0.78.AGBM and gV models utilizing height bin metrics always required more predictor variables than models utilizing density metrics, in part due to the compartmentalized nature of height bin metrics.Density metrics exhibit strong spatial correlation, and as such the inclusion of a large number of density bins leads to multicollinearity issues.
At the plot analysis level, stepwise MR analysis identified that models for each metric set could benefit from multiple predictor variables.While multiple independent variables were utilized in models for each metric set, a decrease in the number of variables in the height bin and density based models is observed.At the hectare plot-level, stepwise MR analysis identified that models utilizing height bin or density metrics would benefit from multiple predictor variables, while no additional predictor variables were identified for models based on height percentiles.However, it should be noted that density metric based models received only a minor improvement when a second predictor variable was added.
Results for the MR analyses show that point cloud metrics derived from the clustered subplots were able to account for the most variability in field estimated AGBM and gV, followed by models based on individual subplot metrics, and models based on hectare plot metrics.We hypothesize that the extent of hectare plot point cloud metrics was too large to adequately describe the average conditions present in the four subplots contained within each hectare plot.Future studies will investigate if a more appropriate plot size for LiDAR data covering the four subplots exists.
While subplot-level and hectare plot-level analyses produced adequate SLR and MR models, the best SLR and MR models were produced with the plot-level analysis.We believe there are several reasons why the plot level analysis produced the best results.When moving from subplot-to plot-level, the sample size is increased three-fold.If AGBM or gV is spatially distributed as a uniform random process, where tree locations approximating a Poisson simple point pattern and tree sizes approximating a distribution skewed to the right (considered to be the usual case) we expect estimation accuracy to increase rapidly with initial increases in sample size.However, when the spatial extent of the analysis is increased from the plot-level to the hectare plot-level we are forced to assume that the population is distributed identically within the hectare as within the subplots.This assumption might not be valid for all hectare plots utilizing the current field data, which was collected for each of the subplots.Thus, reductions in model R 2 values are a result of the discrepancies in the population distribution between hectare plots and plots.Such discrepancies would not be an issue if exhaustive tree tallies were available for the hectare plots.Overall, this finding provides evidence that the current FIA plot design can be used with dense airborne LiDAR data to produce area-based estimates of AGBM and gV.

Conclusions
This study represents an initial attempt to model biophysical parameters of interest to the FIA program utilizing the standard FIA plot design, data from FIA ground crews, and ALS data.As previously mentioned, other studies have successfully modeled similar forest parameters with ALS data [15][16][17][18][19][20][21][22][23][24][25][26][27]34,46,49,55].However, to our knowledge none have done so for multiple analysis level (i.e., subplot, plot, and hectare plot) comparing the performance of percentile, height bin, and density metrics, and specifically focused on modeling AGBM and gV for FIA purposes.Models based on subplot point cloud metrics, for AGBM and gV provided results similar to those presented in the literature and cited throughout this study.
Overall results from this study show that ALS can be used to create models that describe the AGBM and gV of Pacific Northwest FIA subplots with results comparable in accuracy to those of other published studies.Point cloud metrics based on PNW FIA hectare plots or individual subplots were not able to describe AGBM and gV as well as those based on plot point cloud metrics.The ability of ALS to collect data over large areas coupled with these results demonstrates the potential of ALS systems to augment current FIA data collection procedures by providing a temporary intermediate estimation of AGBM and gV for plots with outdated field measurements.We hypothesize that these intermediate estimates could prove beneficial for increasing the accuracy of forest C budgets by reducing variance caused by temporal measurement discrepancies in the currently available FIA measurements.However, more work must be done to quantify the amount of variance that would be reduced due to temporal measurement discrepancies, as well as the contribution of ALS-based estimates to this variance.
Future research identified by this study, includes: (1) utilization of regional species-specific growth and yield models to mitigate error caused by out-of-date tree measurements, while increasing the number of samples available for use in modeling efforts; and (2) investigations of how GPS accuracy and ALS data can be utilized to identify plots to be withheld from future modeling exercises.

Figure 1 .
Figure 1.Malheur National Forest study area in eastern Oregon with slope depicted in degrees.Black squares represent locations of selected Forest Inventory and Analysis plots field-visited between 2007 and 2009.Actual plot locations have been obscured due to confidentiality constraints.

Figure 2 .
Figure 2. The location and dimensions of the subplots (r = 7.32 m) and one-hectare (r = 56.42m) plot used by the Pacific Northwest Forest Inventory and Analysis region.Figure not drawn to scale.
Figure 2. The location and dimensions of the subplots (r = 7.32 m) and one-hectare (r = 56.42m) plot used by the Pacific Northwest Forest Inventory and Analysis region.Figure not drawn to scale.

Figure 3 .
Figure 3.Light detection and ranging (LiDAR) analysis levels.Point clouds were extracted for individual subplots, clusters of four subplots, and hectare plots.

Figure 4 .
Figure 4. Conceptual illustration of the separation of vertical space (left), and example of the vertical distribution and separation (by metric type) of light detection and ranging (LiDAR) returns in a hectare plot (right).

Figure 5 .
Figure 5. Cross sectional plot of one subplot point cloud visualizing the minimum height thresholds for density metrics, with subfigures a through f highlighting all points above the minimum height thresholds of 0 m (a), 5 m (b), 10 m (c), 15 m (d), 20 m (e), and 25 m (f), respectively.

Figure 6 .
Figure 6.Flowchart of the light detection and ranging (LiDAR) data processing approach.

Figure 7 .
Figure 7. Scatter plot of simple linear regression results for the best simple linear regression aboveground biomass models (transformed and non-transformed) for individual subplots (a,b), plots (c,d), and hectare plots (e,f).* indicates p-values of less than 0.05.

Figure 8 .
Figure 8. Scatter plot of simple linear regression results for the best simple linear regression gross volume models (transformed and non-transformed) for individual subplots (a,b), plots (c,d), and hectare plots (e,f).* indicates p-values of less than 0.05.

Figure 9 .
Figure 9. Cross-sectional plots of two subplot point clouds displaying the location of the best aboveground biomass and gross volume predictor variables.The left subfigures identify the mean height for each example subplot.The middle subfigures identify the location of height bin 20-25 for each example subplot.The right subfigures identify the location of density 3 for each example subplot.

Figure 10 .
Figure 10.A visual comparison of height bin 15-20 and height bin 20-25 metrics for a hectare plot.Height bin15-20 provides more information about the conditions present on the hectare plot than height bin 20-25.(a) Cross sectional plot of one hectare plot and the location of height bin 15-20 and height bin 20-25; (b) Top-down representation of all points within the hectare plot; (c) Top-down representation of the points falling within height bin 15-20; and (d) Top-down representation of the points falling within height bin 20-25.

Table 1 .
Total number of Forest Inventory and Analysis subplots, plots, and hectare plots by year.

Table 2 .
Tree species crown class frequencies.
a Number of trees; b Open grown crown class; c Dominant crown class; d Codominant crown class; e Intermediate crown class; and f Overtopped crown class.

Table 3 .
Tree species diameter at breast height and height descriptive statistics.
a number of trees; b standard deviation; and c coefficient of variation percentage.

Table 4 .
Subplot and hectare plot descriptive statistics for mean estimated biomass and volume.Descriptive statistics for subplot and hectare plot aboveground biomass (AGBM) and gross volume (gV) were calculated excluding subplots and hectare plots with no Forest Inventory and Analysis reported ABGM or gV.

) d Subplot AGBM (Mg•ha −1 )
a number of plots; b reduced number of plots; c standard deviation; and d coefficient of variation percentage.

Table 5 .
Descriptive statistics for the mean accuracies (m) for subplot center coordinates.
a Number of FIA subplots; b Standard deviation; and c Coefficient of variation percentage.

Table 7 .
Summary of the best subplot-level aboveground biomass and gross volume simple linear regression models for each point cloud metric set.* indicate p-values of less than 0.05.

Table 8 .
Summary of the best clustered subplot-level aboveground biomass and gross volume simple linear regression models for each point cloud metric set.* indicates p-values of less than 0.05.

Table 9 .
Summary of the best hectare plot-level aboveground biomass and gross volume simple linear regression models for each point cloud metric set.* indicates p-values of less than 0.05.

Table 10 .
Subplot-level multiple regression analysis results.* indicates p-values of less than 0.05.

Table 11 .
Plot-level multiple regression analysis results.* indicates p-values of less than 0.05.

Table 12 .
Hectare plot-level multiple regression analysis results.* indicates p-values of less than 0.05.