Next Article in Journal
CAWRES: A Waveform Retracking Fuzzy Expert System for Optimizing Coastal Sea Levels from Jason-1 and Jason-2 Satellite Altimetry Data
Previous Article in Journal
A Remote Sensing Approach for Regional-Scale Mapping of Agricultural Land-Use Systems Based on NDVI Time Series
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA

1
Department of Forest Resources, University of Minnesota, 1530 Cleveland Ave. North, St. Paul, MN 55108, USA
2
Northern Research Station, Forest Inventory and Analysis, USDA Forest Service, 1992 Folwell Ave., St. Paul, MN 55108, USA
3
Pacific Northwest Research Station, USDA Forest Service, University of Washington, Seattle, WA 98195, USA
4
USDA Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR 97731, USA
5
Northern Research Station, Center for Research on Ecosystem Change, USDA Forest Service, 271 Mast Road, Durham, NH 03824, USA
*
Author to whom correspondence should be addressed.
Remote Sens. 2017, 9(6), 598; https://doi.org/10.3390/rs9060598
Submission received: 14 April 2017 / Revised: 10 June 2017 / Accepted: 11 June 2017 / Published: 13 June 2017

Abstract

:
Large-area assessment of aboveground tree biomass (AGB) to inform regional or national forest monitoring programs can be efficiently carried out by combining remotely sensed data and field sample measurements through a generic statistical model, in contrast to site-specific models. We integrated forest inventory plot data with spatial predictors from Landsat time-series imagery and LiDAR strip samples at four sites across the eastern USA—Minnesota (MN), Maine (ME), Pennsylvania-New Jersey (PANJ) and South Carolina (SC)—in statistical modeling frameworks to analyze the performance of generic (all sites combined) versus site-specific models. The major objective was to evaluate the prediction accuracy of generic and site-specific models when applied to particular sites. Pixel-level polynomial model fitting was applied to the time-series of near-anniversary date Landsat variables to obtain projected metrics in the target year 2014 for which LiDAR strip samples were available. Two forms of models based on ordinary least-squares multiple linear regressions (MLR) and the random forest (RF) machine learning approach were developed for each site and for the pooled (i.e., generic) reference data frame. The models were evaluated using national forest inventory (NFI) data for the USA. We observed stronger fit statistics with the MLR than with RF for both the site-specific and the generic models. The proportions of variances explained (adjusted R2) with the site-specific models were 0.86, 0.78, 0.82 and 0.92 for ME, MN, PANJ and SC, respectively while the generic model had adjusted R2 = 0.85. A test of statistical equivalence of observed and predicted AGB for the NFI locations did not reveal equivalence with any of the models, possibly due to the different resolutions of the observed and predicted data. In contrast, predictions by the generic and site-specific models were equivalent. We conclude that a generic model provides accuracies comparable to the site-specific models for large-area AGB assessment across our study sites in the eastern USA.

1. Introduction

Forest ecosystems store the majority (~80%) of terrestrial biomass and carbon [1], as well as contribute significantly to global carbon emissions (~33% over the past 150 years) through land use and land cover changes [2,3,4]. Sustainable land use and forestry practices focused on enhancing carbon storage in live woody biomass have been recognized as effective means to reduce the effects of emissions and mitigate the effects of climate change, because forests are estimated to absorb about 25% of annual anthropogenic carbon-dioxide emissions [5]. The United Nations Framework Conventions on Climate Change (UNFCCC) has endorsed programs for reducing emissions from deforestation and forest degradation (REDD+), and mandated that member countries periodically report forest carbon estimates via national greenhouse gas inventories (NGHGI). The REDD+ initiatives and NGHGI will benefit from wall-to-wall (spatial) inventories of aboveground forest biomass (AGB) at regional and/or national scales. Indeed, estimation and monitoring of AGB over a large space and time are crucial to inform strategic and operational-scale forest management plans. It can be hypothesized that a large-area spatial inventory of AGB can be implemented more efficiently with a generic model for diverse forest types, as site-specific data or models may not be available for individual forest types in any area of interest.
Spatially explicit (wall-to-wall) inventories of AGB for regional or national scales can be made cost-effective by integrating inventories of field sample plots with freely available remotely sensed predictors. The cutting-edge light detection and ranging (LiDAR) technology has been extensively used in forestry practices including AGB mapping, because these active sensors are highly sensitive to structural attributes and provide accurate three-dimensional coordinates of the pulse returns from intercepting canopy and terrain elements [6,7]. In addition, the freely available archive of Landsat data, especially multispectral imagery by the Thematic Mapper (TM) and successor sensors in use since 1984, have been integral in portraying the spatio-temporal distribution of AGB. More recently, the Sentinel-2 land monitoring satellite mission of the European Space Agency is publicly providing imagery of improved spectral and spatial resolutions that also supports Landsat data continuity. Landsat surface reflectance bands and derived indices have long been used to produce spatio-temporal biomass maps and to differentiate forest structure and composition over landscapes [8,9]. Although the passive Landsat optical data are less sensitive to complex forest conditions [10], their continuous data collection for large scenes since 1972, moderate spatial resolution (30-m × 30-m) and synoptic coverage characteristics can improve cost-efficiency in forest inventories across large spatial and temporal domains [11,12]. Furthermore, improved accuracy of AGB prediction can be attained when Landsat data are combined with LiDAR-derived fine resolution metrics, because Landsat spectral data can more accurately predict species composition [13,14]. LiDAR data acquired via discrete or full waveform instruments, with small or large footprints and scanning or profiling characteristics, have been extensively used in past studies for AGB mapping [14,15].
AGB estimation using LiDAR depends on canopy characteristics such as tree branching and foliage structure, because these elements intercept laser pulses in various canopy height layers. Consequently, the accuracy of LiDAR-based AGB models can vary with species and forest types, as well as the strength of relationships among tree diameter at breast height (DBH), total height, and crown dimensions. Fortunately, highly accurate absolute elevations of canopy elements and overall three-dimensional canopy structures can be estimated from LiDAR data. The general procedure is to separate the LiDAR point cloud data into ground and non-ground returns, create a digital terrain model (DTM) from the ground returns, and then subtract the DTM from the elevations of canopy returns to obtain actual return heights [16]. The height information of LiDAR returns is ultimately used to develop area-level metrics that can be used to predict statistical distributions of canopy elevations, strata density, and cover. LiDAR data should be binned to regular grid cell sizes approximately matching field sample plot sizes to enable wall-to-wall mapping of the target response attribute [17,18].
For remotely sensed AGB mapping, formulation of a relationship (model) between a sample of field inventory data and co-located spatial predictors is crucial. The AGB of sample plots may be obtained using localized species-specific or generalized allometric models dependent on commonly measured tree variables such as DBH and total height. Because localized allometric models are not available for most forest ecosystems around the world, generalized models are commonly applied to the tree list of local forest inventory data with the tree AGB estimates being aggregated to the plot level. For example, the tree AGB models developed by Jenkins et al. [19] for 10 species groups in the continental USA are widely used in North America, despite large biogeoclimatic gradients across the continent. The concerted efforts in recent years to create legacy tree databases based on past studies across a large region (e.g., Falster et al. [20]; www.legacytreedata.org) have also facilitated formulation of generic models. However, a drawback of these generalized models is that they ignore the effects of local site characteristics on tree growth [21,22]. Further, the use of DBH alone in an allometric model assumes that the relationship between tree diameter and height is invariant, which is unrealistic [23]. With the widespread use of generalized tree allometric models in design-based frameworks for estimating AGB at multiple spatial scales, the scope for generalized (or generic) remote sensing-based models in large-area inventories can be emphasized. This is because of the increasing availability of consistently collected remotely sensed data in the public domain (e.g., Landsat, and satellite based LiDAR), which generally reduces costs and improves accuracy compared to approaches based on field data alone.
Many LiDAR-based AGB estimation applications have focused on local models for two basic reasons: (i) the lack of a consistent LiDAR dataset over a large range of diverse forest types, and (ii) concerns for ecological variations (i.e., biogeoclimatic conditions) over large geographic areas (Wulder et al., 2012). However, with pending spaceborne LiDAR such as the Global Ecosystem Dynamics Investigation (GEDI) and other missions, which are expected to provide consistent data across regions, there is potential for pooling data from different forest types to construct a generic model for enhancing the efficiency of LiDAR-based inventories. Such efficiency is achieved in the sense that LiDAR processing will require derivation of only a subset of predictors at the spatial resolution that matches the size of field inventory plots compared to a varying suite of predictors for site-specific models. While the validity of a generic model is improved by including a large number of data points from across the representative ecological sites [24,25], extension of the geographic domain complicates model selection and parameterization with the increasing complexity of the reference training frame [26,27,28,29]. Nevertheless, regional biomass assessments based on remotely sensed data could benefit from a generic model, because the need for numerous site-specific models and the aggregation of local estimates for large-area totals would be avoided. Performance of a remote sensing-based generic model relative to site-specific models for AGB estimation remains largely unknown. A thorough analysis of prediction errors with generic and site models would inform users as generic models do not accommodate the effects of differences in local growing conditions.
Empirical parametric or non-parametric relationships between sample plot AGB as the response and LiDAR and Landsat derived variables as the predictors can be constructed and subsequently applied in the same population at any area where only the remotely sensed data are available. However, the form of the models and importance rankings of predictors can vary with forest types [30]. In any case, formulation of a generic model requires selection of an optimal set of predictors [29]. Careful selection of predictors is important in parametric regression, since a large number of strongly correlated variables can produce overfitting (i.e., unstable estimates of parameters) of models that poorly reflect the general trend in data and adversely affect predictions accuracy outside of the training data [31]. Non-parametric modeling techniques, such as random forest, are free from many statistical assumptions, and have the ability to rank importance of predictors and estimate mean square error through internal cross-validation [32,33,34]. Random forest techniques have been frequently used in AGB modeling by integrating samples from field inventories, remote sensing, and other auxiliary data [35]. For large-area estimation in a parametric mode, model-assisted approaches to inference are largely preferred because (i) the reference (training) data frame relies on a sample of population units selected randomly from a consistent probability sampling design, and (ii) the relationship between sampling unit (plot) observations and co-located auxiliary variables is exploited to achieve a reduction in error variance (compared to what is possible without auxiliary data) in the estimate of a population parameter [36,37]. Simple or stratified random sampling designs can be employed to acquire the sample and infer the population mean or aggregate by expressing the sampling error in the form of, for example, standard error of estimates [38].
The Carbon Monitoring System (CMS) program of NASA in collaboration with the Forest Inventory and Analysis (FIA) program of the U.S. Department of Agriculture Forest Service aims to assess forest carbon sinks and fluxes in the USA. CMS has emphasized large-area (continent-scale) assessment for strategic purposes using Landsat time-series data, and small-area assessment for operational plans using fine resolution remotely sensed data such as LiDAR. The FIA program has focused on methods that provide consistent inventory estimates across time and space. The broad objective of this study is to integrate field sampling, LiDAR sampling, and Landsat time-series data to formulate parametric and non-parametric models for plot-level AGB prediction within the context of CMS. Here, we investigate the utility and performance of a remote sensing-based generic model and site-specific models for estimating plot-level AGB across a wide range of forest ecosystems in five states across the eastern USA. The specific objectives are threefold:
(1)
to compare the effectiveness of LiDAR and Landsat variables for estimating plot-level AGB using ordinary multiple linear and random forest based models at four study sites (i.e., Landsat scenes) across the eastern USA;
(2)
to evaluate the accuracy of site-specific and generic AGB models across the four study sites; and
(3)
to validate the performance models using FIA data.

2. Methods

2.1. Study Area

The study area was located in the eastern USA and covered four Landsat scenes corresponding to the WRS-2 paths/rows of 12/28, 27/27, 14/32 and 16/37 (Figure 1). Scene 14/32 covers parts of Pennsylvania and New Jersey (PANJ) while scenes 12/28, 27/27 and 16/37 lie completely within Maine (ME), Minnesota (MN) and South Carolina (SC), respectively. The scenes are spread over diverse landscapes with substantial differences in bioclimatic characteristics (Table 1). The terrain elevations (based on LiDAR strip samples) were found to range from 0 to 674, 151 to 561, 98 to 1138 and 0 to 114 m above sea level in PANJ, MN, ME, and SC, respectively.
The study area in ME lies in a transition zone between boreal forests to the north and eastern deciduous forests to the south, and forest composition is divided nearly equally between hardwoods and softwoods with spruce/fir, maple/beech/yellow birch, white/red/jack pine, and aspen/white birch as the dominant forest types [39]. Differences in harvesting intensity across the landscape have impacted stand-size classes across different areas of ME. Laurentian mixed forests characterize the MN site that lies in the transition zone between the Canadian boreal forests to the north and the broadleaf deciduous forests to the south [40]. Aspen, pine, oak, elm/ash/cottonwood, and spruce/fir are the dominant forest types in MN where nearly two-thirds of the growing stock are hardwoods, and the rest are softwoods. Mixed-oak, northern hardwoods and loblolly/pitch/shortleaf pine forest-type groups dominate the PANJ site where hardwoods are largely distributed to the northern areas and softwoods are concentrated primarily in the south [41,42]. Loblolly-shortleaf pine, oak-hickory and oak-gum-cypress are the dominant forest-type groups in SC where a large part of the forest is young and naturally regenerated [43].

2.2. LiDAR Strip Samples and Processing

LiDAR strip samples were acquired in summer (28 June–3 September) 2014 at the four study sites using a Leica Reigl 680i airborne laser scanner onboard twin engine aircrafts (Cessna 310, Piper Aztec and Piper Navajo) flown at an average altitude of 731.5 m above the terrain level. The LiDAR strips covered about 2590, 3056, 4454 and 3571 square kilometers in 24, 24, 35 and 29 flight lines for ME, MN, PANJ, and SC, respectively. The average width of the strips was about 1.2 km. The multi-return LiDAR system captured up to six returns per pulse and generated point cloud data with densities of four to six pulses per square meter and average precision of 5 cm for elevations. The LiDAR datasets were processed to produce 55 grid metrics (Table A1) at 30-m spatial resolutions, following McGaughey [16]. These metrics represented the descriptive statistics such as elevation range and percentiles based on the height information of all LiDAR returns falling within each 30-m × 30-m grid cell. The heights of aboveground pulse returns were also summarized to obtain the grid metrics for canopy cover and vertical strata density as defined in Table A1 and Deo et al. [17].

2.3. Landsat Time-Series Data and Pixel Level Curve Fitting

Near-anniversary date time-series Landsat surface reflectance high-level data products, managed in the USGS climate data records, were obtained for each of the sites from the Earth Resources Observation and Science (EROS) Center Science Processing Architecture (ESPA) on-demand interface (http://espa.cr.usgs.gov/). The time-series data were acquired by Landsat-5 Thematic Mapper (TM) and Landsat-8 Operational Land Imager (OLI) sensors at the peak of the growing seasons (July–September) from 1984 to 2015 (Table 2). The high-level products included surface reflectance of individual bands (visible and infrared), some derived spectral indices, and cloud masks for each image (https://landsat.usgs.gov/cdr-ecv) at 30-m spatial resolution. The image data were terrain corrected (L1T) and radiometrically preprocessed at the source (USGS EROS [44] ), and we enhanced the input quality by selecting only images having less than 10% cloud cover. The peak growing season and annual quality of imagery provided data with consistent landscape conditions and phenology within a site due to similar solar geometry. Any narrow cloud and shadow patches in some of the images were masked out and substituted with clear imagery acquired in adjacent years. The collection consisted of 16, 18, 18 and 17 images for ME, MN, PANJ, and SC sites, respectively, where data gaps as long as three years were present. The data products are ready-to-use in terms of predicting forest attributes.
We considered 12 spatial predictors derived from the Landsat data for predicting plot-level AGB: shortwave infrared (SWIR) surface reflectance, normalized difference vegetation index (NDVI), normalized burn ratio (NBR), integrated forest z-score (IFZ), tasseled cap brightness (TCB), greenness (TCG) and wetness (TCW) indices, tasseled cap angle (TCA), disturbance index (DI), enhanced vegetation index (EVI), normalized difference moisture index (NDMI), and soil adjusted vegetation index (SAVI). The predictors were either SWIR band surface reflectance, or ratio of bands, or any combination of surface reflectance bands. The SWIR band, NDVI, NBR, NDMI, SAVI, and EVI were obtained directly from the USGS climate data records via the ESPA online interface, while IFZ, TCA, and DI were derived as described in Huang et al. [45], Pflugmacher et al. [11] and Healey et al. [46], respectively. The coefficients for the derivation of tasseled cap indices were adopted from Crist et al. [47] for Landsat-5 data and Baig et al. [48] for Landsat-8 data. The derivations of IFZ and DI metrics required average and standard deviation values of forested samples of the imagery. Hence, several homogeneous polygons (>20) were digitized manually over dense forest areas of each image in ArcGIS. The dense forest areas were identified using the dense-dark vegetation raster available for each Landsat image in the surface reflectance high-level data products of the USGS [44]. The descriptions of individual predictors are given in Deo et al. [8] and USGS EROS [49]. These predictors were considered to simplify model and minimize error, although published works have reported varying sensitivity of the same predictors in different forest types [31,50,51].
Pixel-level polynomial (third-degree) models were fit to the time-series Landsat metrics (12 predictors at each of the four sites) to characterize spectral trends over time and to obtain fitted (projected) metrics for any target year that did not have cloud-free Landsat data. A previous study revealed that AGB models dependent on the polynomial-projected Landsat variables produce accuracies comparable to the models using actual Landsat metrics [8]. The pixel-level temporal model fittings have the potential to minimize noise due to atmospheric factors, such as cloud and shadows. The temporal model fit was performed using R statistical software packages [52], including ‘raster’ and ‘rgdal’, which generated pixel-level parameter estimates of model fit and coefficients of determination (R2). The model parameters (as rasters) were used to obtain projected metrics for the Landsat variables in the year 2014 when LiDAR was also flown. The projected Landsat variables were used in AGB modeling, because cloud-free imagery was not available in 2014 for all sites, and the projected variables are most likely to provide consistency in the predictive power over time. However, other approaches to obtaining projected predictors from a time-series of surface reflectance imagery have also been published by others (e.g., [45,53,54]).

2.4. Field Data for Model Training

Field data for model training were collected in summer 2015 and consisted of approximately 50 specialized forest inventory plots within the LiDAR strips of each site (Table 3). Although there is a one year gap between the field and LiDAR data, the plots were located in undisturbed forest conditions, so there was minimal difference in forest structure and composition in the two datasets. A total of 197 plots, hereafter called Carbon Monitoring System (CMS) plots, across the four sites (48 in ME, 50 in MN, 49 in PANJ and 50 in SC) were established in forest lands and were distributed across 15 LiDAR-based structural strata (3 cover classes × 5 height classes). The plots were fixed-area circular (16.1 m radius), and closely matched the 30-m pixel size of the Landsat data and LiDAR grid metrics. The coordinates of plot centers were estimated using a Javad Triumph-2 survey-grade global positioning system and differential correction post-processing that resulted in an average horizontal error of less than a meter. The field measurements included records of species, DBH, and total heights of trees with DBH of at least 12.7 cm within the plots, following Forest Inventory and Analysis (FIA) measurement protocols [55]. The standard FIA algorithms were used to predict AGBs for individual sample trees [56] which were aggregated to obtain plot-level estimates.

2.5. Field Data for Model Validation

Inventory data collected on FIA plots within the LiDAR strips measured in 2014 (i.e., the year of LiDAR acquisition) were used for model validation. The FIA program has a national network of permanent plots established using a sampling design that produce equal probability random samples [57]. In addition, FIA plots located within the strips and measured in the previous and subsequent years of LiDAR acquisition were added to the validation dataset because there were few FIA plots inventoried within strips in the same year of acquisition. Thus, FIA plots measured in 2013, 2014 and 2015 were used for the validation of site-specific models in ME and MN, while the plots measured in 2012, 2013 and 2014 were used for the validation of site-specific models in PANJ and SC, per the availability of measurements in the FIA database. The number of FIA plots used for model validation was 86 in ME, 102 in MN, 42 in PANJ, and 75 in SC, such that each plot has at least one tree (i.e., plot-level basal area > 0). Actual coordinates for these plots were obtained from the Northern Research Station of FIA in St. Paul, Minnesota.
The FIA plot configuration consists of a cluster of four subplots with a central and three peripherals at 36.58-m (120-ft) horizontal distance and 0°, 120° and 240° azimuths from the central subplot. Each subplot has a 7.32-m (24-ft) radius in which all trees with DBH of at least 12.7 cm are measured for various attributes, including species, DBH, and total height. The tree-size measurements were used with species-specific allometric models to predict tree AGB [56] which were then summed to obtain plot-level estimates (Table 3). The FIA plots are distributed over all land-use classes and ownership, and the sampling intensity on the ground was up to three plots per 2400 ha; NJ portion of the area had intensity of only one plot per 2400 ha.

2.6. Modeling and Validation

The predicted Landsat variables and actual LiDAR grid metrics of 2014 were intersected with actual CMS and FIA plot locations in a GIS environment. In the spatial joining of response and predictor variables, we attached the digital value of co-located pixels of predictors to the CMS data, while the mean spectral values from each 3 × 3 pixel-window were attached to the FIA plot data. This was performed because the CMS plots matched the spatial resolution of predictors but small FIA subplots are distributed over a square block of nine 30-m by 30-m pixels [57]. For this reason, the FIA plots were not used for model training, because using plots of smaller size than the spatial resolution of predictors has negative effects on model prediction accuracy [58]. Hence, a site-specific training data frame, based on the CMS field measurements, was developed for each site and a generalized training data frame was produced by pooling over all the CMS data points from the four sites. Both training data frames had 67 predictors (12 Landsat fitted and 55 LiDAR grid metrics, Table A1).
We applied ordinary least squares multiple linear regression (MLR) and the k-nearest neighbors technique using an RF-based distance metric (RF-kNN) to each of the training data frames to construct site-specific and generic AGB models. In the RF-kNN method available in the ‘yaImpute’ package of R, a set of training points with known values of response and predictors are used to predict a response at any target point (where only spatial predictors are known) as a weighted average of k-nearest neighbors from the reference set where neighbors are identified via the RF algorithm [33]. The spectral similarity of the predictors at the target and reference points provides the basis for a nearness or proximity measure. In the process, RF creates an ensemble of multiple regression tree models (for continuous variables), each constructed from a bootstrap subsample of the reference frame, and the nearness is estimated as the proportion of all trees in the ensemble that associates a target point with a particular subset of k points in the reference set. RF-kNN can accommodate a large number of predictors, rank their importance in the models, and generate a measure of model accuracy through internal cross-validation. The RF-kNN model generates mean square error (MSE) as the average value of the errors met with the out-of-bag data corresponding to approximately 33% of the plots that are withheld from the bootstrap subsample when constructing the tree models. Our previous studies [8,59] have revealed that using about 2000 trees and three neighbors (i.e., the value of k) in the RF-kNN model provides optimal results; so, we used these numbers and other parameters were set to the default values in the R package [33].
Although RF is considered robust to the spatial auto-correlation of predictors, multi-collinear variables were identified and removed before fitting the models. First, the variables were pruned using a multivariate variable screening process based upon QR-matrix decomposition [60,61]. Subsequently, an RF-based model selection procedure [61,62] was applied to obtain a parsimonious set of model predictor variables for each training data frame. The RF model selection procedure uses a percentage increase in model MSE to select a suite of the least number of predictor variables that explains the greatest proportion of variation in AGB (e.g., [29,63]. The set of predictors identified in the RF model selection procedure were further subjected to both forward and backwards elimination methods of stepwise regression [59,64] to remove predictors that did not contribute statistically significantly to improving the fit of the model to the data. The optimal sets of predictors thus selected for each site and for the pooled (generic) dataset were used separately in multiple-linear regression (MLR) and the RF-kNN frameworks to construct AGB models. In addition, the same set of predictor variables was used for both MLR and RF-kNN modeling and the accuracies were compared with respect to adjusted R2 and root mean square error (RMSE) statistics; the RF-kNN only provides pseudo-R2 and MSE. The pseudo-R2 is the percent variance explained that measures how well out-of-bag predictions explain the target variance of the training set [34]. The square root of MSE estimate from the RF was taken to obtain RMSE. The importance ranking of predictors based on RF is shown in Figure 2.
The models were developed for two scenarios, one using both LiDAR and Landsat predictors and the other using only Landsat predictors. Because LiDAR acquisition is still a big constraint for a large-area spatio-temporal assessment of AGB, and Landsat data are freely available (since 1970s), we wanted to see how the combined variables from LiDAR and Landsat sensors improve model accuracy compared to when only LiDAR or Landsat data are used. The models with minimum RMSE (as the fit statistic) for both the scenarios were applied to each site to obtain AGB maps that were intersected with the FIA plots within the LiDAR strips to evaluate the AGB predictions from the generic and site-specific models with the field observations, with emphasis on mean deviation (bias) and the standard error of the mean. The accuracy of predictions at the FIA plot locations were calculated in terms of mean deviation, RMSE and Pearson’s correlation coefficient (r). The model assisted regression estimators [38], especially standard errors of estimates, were used to evaluate the confidence interval for the large-area estimates. In addition, equivalence tests [65,66] for the predictions based on generic and site-specific models at the FIA plot locations were conducted. The equivalence testing uses two one-sided tests and relies on a subjective choice of a region in which differences between predicted and observed values are considered negligible. A region of equivalence, for example, set to ±25% of the standard deviation around the mean infers that predicted and observed values do not differ if the absolute value of the mean of the differences is less than 25% of the standard deviation.

3. Results

The simple multiple linear regression analysis dependent on both LiDAR and Landsat (LiD.LS) variables produced more accurate models in terms of fit statistics than the RF-based models when the same optimal set of predictors was used in both the approaches (Table 4). The site-specific MLR models produced adjusted R2 ≥ 0.78, and the generic model from the pooled reference frame explained a comparable amount of variance (adj. R2 = 0.85). In fact, the generic model had larger adjusted R2 than the site-specific models for MN and PANJ, although the numbers of selected predictors and type of metrics were different among the models. The largest and smallest proportions of explained variance with the LiD.LS dependent model were obtained for SC and MN, respectively. As expected, the RMSE for the site-specific models followed the average and standard deviation trends of AGB distributions in the sample plots (e.g., the PANJ site displayed the largest mean and standard deviation in the training data and had largest RMSE; Table 3 and Table 4). The models constructed using only Landsat (LS) variables resulted in a large reduction in the proportion of variance explained relative to the LiD.LS models. The proportion of explained variance for the generic LiD.LS model was more than twice of the proportion for the generic LS model. The LS model with minimum RMSE and maximum adjusted R2 was obtained in ME (adj. R2 = 0.61), and the opposite was found in MN (adj. R2 = 0.26). One notable observation was that the Landsat metrics were least important for the LiD.LS models and only a single predictor was found to contribute statistically significantly in one of the site-specific models, as well as in the generic model (i.e., SWIR band in ME and TCB in the generic, Figure 2).
The importance rankings and the number of predictor variables making statistically significant contributions for the generic and site-specific LiD.LS models were not consistent across the sites (Figure 2). However, the cover metrics and density of LiDAR returns in vertical height strata were important for all the models. Interestingly, the proportions of LiDAR returns in the first five vertical strata (sum of which is a proxy for canopy volume) along with the Landsat-derived TCB index were found to make significant contributions for the generic LiD.LS model. When the six predictors selected for the generic LiD.LS model were used to construct models for the individual sites in the RF framework, the fit statistics (i.e., pseudo-R2 and RMSE) degraded somewhat with respect to the site-specific models produced using the optimal set of LiDAR and Landsat predictors (Table 4 and Table 5). However, the same six predictors in simple MLR analyses provided similar or improved fit statistics compared to the site-specific models based on the optimal set of LiD.LS predictors.
When scatterplots of CMS plot-level AGB against some selected LiDAR metrics such as cover (CovDBH) and average elevation (ElevAv) were made (figures not shown), large correlations were observed. The Pearson’s correlation coefficients (r) between AGB and ElevAv were 0.81, 0.70, 0.79 and 0.88, and between AGB and CovDBH were 0.77, 0.70, 0.76 and 0.63 in ME, MN, PANJ, and SC, respectively. However, Landsat variables such as the SWIR band reflectance provided r = −0.64, −0.21, −0.18 and −0.54, and TCB provided r = −0.63, −0.19, 0.04 and −0.53 for ME, MN, PANJ, and SC, respectively. The negative correlations here imply that with increasing biomass, the SWIR band reflectance and TCB of canopy diminishes (e.g., [9]).
Compared to RF models, the MLR models resulted in larger R2 and smaller RMSE at each site (Table 4). Hence, the site-specific and generic MLR models were used to develop AGB maps for the LiDAR coverage, and the predictions were evaluated with the observations at FIA plot locations (Table 6). The maps dependent on LiD.LS are not shown because of narrow LiDAR strips; only the maps dependent on LS models are shown in Figure A1 and Figure A2. The generic model compared to site-specific models resulted in similar or larger correlations with the field observations over the FIA plots at the four sites (Table 6). The mean deviations (bias) of site-specific models in MN and SC were less than in the generic model; otherwise the generic model had smaller mean deviation at the other two sites (i.e., ME and PANJ). In terms of RMSE, the generic model produced smaller error than the site-specific models for ME and PANJ, while the results were opposite for MN and SC. Equivalence tests of the observed AGB against the predicted AGB either by generic or site-specific models at FIA plot locations did not reveal equivalence at 25% region of similarity and 5% level of significance (α = 0.05). When AGB was predicted at each FIA plot location by the generic and site-specific models, the two sets of predictions did not differ at the 25% region of similarity and α = 0.05 (Figure 3). The correlation coefficients of AGB predictions by the generic and site-specific models were 0.95, 0.93, 0.97, and 0.96 for ME, MN, PANJ and SC, respectively. The training data frame of the generic LiD.LS model (i.e., 197 CMS plots with the six predictors as in Table 5) when subjected to an iterative analysis that trained a model using 177 plots and applied to the remaining 20 plots, produced r = 0.91, bias = −0.13 Mg·ha−1 and RMSE = 27.40 Mg·ha−1 after 1000 iterations.
The standard errors (SE) of estimates of the population means, an important statistic for model inference in the form of a confidence interval to the mean for large area estimation [38], using the site-specific models were comparable to the SE obtained when applying the generic model to the individual sites (Table 7 and Table 8). The prediction estimates at the CMS plot locations revealed that the SE values of the site-specific models are similar but smaller than the SE values for the generic model (Table 7). In contrast, the prediction estimates at the FIA plot locations revealed that the generic model produce smaller SE values than site-specific models for all sites (Table 8). The estimated bias (mean deviation of the residuals) with each site-specific model was zero when the models were applied to the CMS plot locations (i.e., the same data used to construct the models); but the site-specific models when applied to the FIA plot locations produced positive estimated bias for all sites. The generic model also produced positive estimated bias for CMS and FIA plot locations of individual sites. The estimated biases of generic model predictions at the FIA plots were 4.8%, 41.2%, 12.8% and 41.1% of the mean of sample observations for ME, MN, PANJ and SC, respectively. The biases with site-specific models were 18.5%, 22.8%, 13.7% and 35.1% of the mean of sample observations for ME, MN, PANJ and SC, respectively. The differences of model-assisted regression estimates (MARE) of population mean using the generic and site-specific models at the FIA plots were 4.7%, 8.2%, 14.6% and 6.6% of the mean of sample observations for ME, MN, PANJ and SC, respectively.

4. Discussion

Previous studies have shown strong relationships between LiDAR metrics and coincident in situ measurements, and most of the models are representative of small test sites with little variation in biophysical environments. This study also revealed the strength of LiDAR dependent AGB models that are applicable to regional assessment. We did not attempt to produce forest type or strata-specific generic models (because of limited field plot data), although published works have shown better performance of such models. For example, Latifi et al. [28] classified field data into broadleaved, coniferous and mixed forest types, and found better performance of a strata-specific model compared to a model from a pooled reference data frame. White et al. [67] evaluated various models based on LiDAR data acquired in both leaf-off and leaf-on canopy conditions in coniferous, deciduous and mixed forest types in the Rocky Mountains of Alberta, Canada. They found that pooling leaf-on and leaf-off data for a selected forest type did not adversely impact model accuracy (as relative RMSE and bias of pooled models differed by <2% compared to either leaf-on or leaf-off models). They also noted poor accuracy of the generic model built by pooling all data regardless of forest types and canopy conditions at the time of LiDAR acquisitions; RMSE of the generic model was twice as large as that of forest type-specific models. White et al. [67] and Villikka et al. [68] found that mixing models and data types (i.e., applying leaf-off models to leaf-on data, and vice versa) resulted in large error and bias in model outcomes. Considering these observations, we did not apply site-specific models to the other sites (e.g., using developed models in ME to predict AGB in SC), but found generic and site-specific models to perform similarly.
Comparable to this study, some large-area studies employing LiDAR strip datasets do exist, but comprehensive analyses of site-specific and generic AGB models are lacking. For example, Lefsky et al. [69] used large-footprint spaceborne profiling LiDAR data to predict maximum canopy height in three different ecosystems: tropical broadleaf, temperate broadleaf and temperate conifer forests. They found that the generic model explained 48% of the variability while site-specific models explained 59% to 68% of the variability. Nelson et al. [70] used airborne profiling LiDAR in the entire state of Delaware, USA, and developed separate AGB prediction models for deciduous, conifer and mixed forest types, along with a generic model across all strata. Their generic model explained higher variance (R2 = 0.66) compared to the forest type-specific models where R2 ranged from 0.28 for the mixed forest to 0.44 for the conifer forest. Maltamo et al. [71] used LiDAR strip samples (200 m wide and 1500 km long) combined with geo-climatic variables and major species information for AGB modeling in mountain forests in Norway. However, their study categorized only three forest types: spruce-, pine- and deciduous-dominated. They found that plots across all species can be pooled for modeling and inventory of spruce forests, but site-specific models are optimal in areas dominated by pine or deciduous species. They also noticed that geo-climatic variables, serving as proxies for growing conditions, are less influential compared to species information when combined with LiDAR metrics in AGB modeling (addition of geo-climatic variables improved RMSE by <2% and species information improved RMSE by >19%, with respect to the model based on only LiDAR metrics). In Maltamo et al. [71], the RMSE was 47.8% for the generic model and 33.1 to 47.3% for the forest-specific models. In the absence of species information in our study, the MLR generic model explained 85% of the variance compared to a range of 78% to 92% in the site-specific models dependent on both LiDAR and Landsat predictors. However, our generic model produced AGB estimates that are equivalent to the estimates of site-specific models at 25% region of similarity (Figure 3). It should be noted that the equivalence is based on a subjective choice of a region of similarity, and users should decide on how large a difference is acceptable to conclude that the generic provide predictions equivalent to the predictions of the site-specific model.
Laser pulses have varying levels of interactions in the canopy of broadleaf and conifer trees due to the differences in crown shape. Hence, it is logical to assume that species-specific models would perform better. If we assume that Landsat data capture more species information than LiDAR data, then it is logical to state that models including both LiDAR- and Landsat-derived predictors have high accuracies. However, only one Landsat metric was shown to be useful at one site (i.e., ME). Maltamo et al. [71] explained that species- and location-wise variations in crown biomass are largely accounted for by LiDAR, because the echo distribution is directly dependent on the three-dimensional crown structure. In the leaf-on canopy conditions for our LiDAR datasets, we expected accurate estimates of canopy heights, as pulses are more likely intercepted in upper canopy levels. Particularly in deciduous forests, leaf-on data can be expected to provide better models [67]. However, White et al. [67] and Næsset [72] found that leaf-off LiDAR data correlate better to field observed AGB than leaf-on data, so we recommend further study for sites with leaf-off data to reevaluate the performance of generic model. Considering the limitations of remotely sensed data (e.g., limited spatial coverage and incapability for species mapping with LiDAR, and signal saturation of radar or passive optical imagery in high biomass areas), several studies have underscored the need for data fusion from different sources to accurately estimate AGB over a large area (e.g., [30,73,74,75]).
The importance rankings of predictors in the site-specific models varied with sites. This was also revealed from the pixel-level polynomial model fitting to the time-series of observed metrics of Landsat data. The polynomial fit statistics of the same metrics at different sites varied significantly. For example, the fitting to the SWIR surface reflectance produced R2 values ≥0.40 in 71% of the pixels at ME, but only in 19% of pixels at the PANJ site (Table 9). It may be inferred that the low average R2 for a predictor metric at a given site is due to larger sensitivity to disturbances. This is because the same set of Landsat imagery was used to obtain several predictors at a given site and when pixel-level polynomial were fit, some of the metrics resulted in large and others small R2.
LiDAR metrics among the selected variables in the LiD.LS model at each site are much more influential than Landsat variables. Canopy height, canopy density, and cover metrics derived from LiDAR data are commonly used and the variables selected in this study match the key variables reported elsewhere in forest biomass inventories [76]. Lefsky et al. [69] found that maximum canopy height alone explained about 73% of the variance in AGB. Maltamo et al. [71] found that LiDAR metrics related to canopy density were useful in the prediction of AGB in spruce-dominated forests. Nelson et al. [70] found that average height, quadratic mean height, and canopy cover accounted for the most of the variation in AGB. Lim and Treitz [77] concluded that laser-based quantiles of canopy heights are useful in predicting forest biomass.
The reduced accuracy estimates, such as the percent variance explained (pseudo-R2) and RMSE, of RF-models compared to the MLR models are justified. The MLR relies on all observations in the training data to minimize the residuals of prediction, while RF performs using bootstrap samples. The RF prediction is the average of the predictions by numerous tree models, each built from a bootstrap sample of the training data, applied to out-of-bag data for a target observation [34]. An estimate of error rate in the RF algorithm is obtained from the difference of out-of-bag predictions and corresponding observations. Since RF predictions are out-of-bag cross validated, the variance of predictions is generally larger in RF compared to MLR [78]. This implies that pseudo-R2 in RF would be less than the R2 in MLR, when the same set of predictors is used in both approaches. Because the CMS data were representative of all forest types in the study sites and satisfied the assumptions of parametric regression (e.g., normal distribution of plot-level AGB), the MLR models obtained after screening of collinear and insignificant predictors are reliable for application in operational forestry settings. In addition, MLR models are easy to apply using GIS platforms to quickly produce maps, whereas implementation of RF models can be time-intensive for large geographical areas. The better performances of MLR than RF are also observed in other studies (e.g., [78,79]).
Spatial and temporal match of field sampling and remote sensing data significantly influence model accuracy [80,81]. Intuitively, very accurate models need to be site-specific, species-specific and scale-dependent. This implies that models are first formulated by integrating field and remotely sensed data of closely matching resolutions, and subsequently extended spatially in the same coverage area using spatial predictors at the same resolutions [82]. Non-equivalence of all model predictions and FIA field observations can be attributed to spatial and temporal mismatch of FIA plot data as well as inaccuracy in the precise coordinates of FIA plot locations. The FIA plot validation data contained measurements that occurred between 2012 and 2015, while the remote sensing predictors were for the year 2014. The spatial resolution of predictors in the study was 30-m while each FIA plot with four subplots is spread over a block of a 3 × 3 pixel window. A single FIA subplot is 168 m2 which is less than one-fifth of the area of 30-m pixel (900 m2). It is understandable that the smaller size of FIA subplots resulted in a poor match with pixel-level predictions because of the mixed spectral signals of 30-m pixels. Furthermore, reduced precision in the coordinates of FIA plot centers poorly links in situ data with corresponding remote sensing variables. Published studies have shown that larger circular plots (10–25 m radius) improve model fit and prediction accuracy of forest inventories, because a higher degree of spatial overlap between ground data and remotely sensed metrics are attained for any given GPS error [17,82,83]. Furthermore, edge-effects (i.e., counting of trees along plot boundary as in or out) are reduced with larger plot sizes [58]. One more source of error in our models was the 12.7 cm minimum DBH for sample tree measurements in the CMS plots. This DBH minimum restricted field measurements of smaller trees while LiDAR metrics captured those in the reference data. Published studies have also indicated that the selection of predictors and model types plays a critical role in biomass modeling and different methods may be necessary for diverse forest conditions [29,84,85]. In the remote sensing community, there is differing opinion regarding the replication potential of a statistical model and the predictor metrics across different forest types. One contribution of this study is the finding that data from the different forest types across the eastern USA can be integrated with large-scale LiDAR acquisitions (provided the sensors have similar configuration) or Landsat data with little impact on model performance compared to site-specific models. There is minimal penalty in the application of LiD.LS generic versus site-specific models, if the stratification is ignored due to a small number of field plots (for both model training and validation) in a regional assessment.

5. Conclusions

Within the context of remote sensing-assisted NGHGIs, this study highlights the potential use of a generic model for estimation of forest AGB across diverse forest types. The overall conclusion is that the generic model provided accuracy comparable to site-specific models in which local forest types and growing conditions were pooled. The generic model using both LiDAR and Landsat variables explained more than twice the variance of the generic model using only Landsat variables. In conditions where the samples of field plots by forest or species types are insufficient for model training, it is useful to design a generic model that renders a minor penalty with respect to site-specific models, if the objective is to reduce costs for application over large geographic areas. Some published studies have shown that leaf-off data produce better models than leaf-on data for AGB mapping; however, we recommend future studies that apply leaf-off data to examine a similar modeling framework as conducted in this study.

Acknowledgments

This research was supported by the US Department of Agriculture Forest Service Information Resources Direction Board through the Landscape Change Monitoring System and the NASA Carbon Monitoring System (NASA reference number: NNH13AW62I), in addition to the US Forest Service—Northern Research Station (#14-JV-11242305-012), and the Minnesota Agricultural Experiment Station (project MIN-42-063). We thank Ronald McRoberts, Brian Clough, Jennifer Corcoran, David Wilson and three external reviewers for their comments that improved this manuscript.

Author Contributions

R.K.D., M.B.R. and G.M.D. conceived and designed the research. W.B.C. and H.-E.A. provided data; R.K.D. processed remote sensing data, performed statistical analysis and drafted the manuscript; M.B.R., G.M.D., W.B.C., H.-E.A. and C.W.W. reviewed and edited the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Aboveground biomass distribution maps in the study areas in Maine (ME) and Minnesota (MN), based on the site-specific (left column) and generic models (right column) dependent only on Landsat derived predictors.
Figure A1. Aboveground biomass distribution maps in the study areas in Maine (ME) and Minnesota (MN), based on the site-specific (left column) and generic models (right column) dependent only on Landsat derived predictors.
Remotesensing 09 00598 g004
Figure A2. Aboveground biomass distribution maps in the study areas in Pennsylvania-New Jersey (PANJ) and South Carolina (SC), based on the site-specific (left column) and generic models (right column) dependent only on Landsat derived predictors.
Figure A2. Aboveground biomass distribution maps in the study areas in Pennsylvania-New Jersey (PANJ) and South Carolina (SC), based on the site-specific (left column) and generic models (right column) dependent only on Landsat derived predictors.
Remotesensing 09 00598 g005
Table A1. Description of 54 different LiDAR metrics used in the study.
Table A1. Description of 54 different LiDAR metrics used in the study.
LiDAR MetricsDescription
DenStra1Total returns in the vertical height interval of 1 to 4 m divided by 3
DenStra2Total returns in the vertical height interval of 4 to 8 m divided by 4
DenStra3Total returns in the vertical height interval of 8 to 16 m divided by 8
DenStra4Total returns in the vertical height interval of 16 to 32 m divided by 16
DenStra5Total returns in the vertical height interval of 32 to 64 m divided by 32
Stratum0Total return proportion in the vertical height interval 0 to 1 m
Stratum1Total return proportion in the vertical height interval 1 to 4 m
Stratum2Total return proportion in the vertical height interval 4 to 8 m
Stratum3Total return proportion in the vertical height interval 8 to 16 m
Stratum4Total return proportion in the vertical height interval 16 to 32 m
Stratum5Total return proportion in the vertical height interval 32 to 64 m
CovMeanPercentage of all returns above mean (all returns above mean × 100/total count of all returns)
CovModePercentage of all returns above mode per pixel (all returns above mode × 100/total count of all returns)
CovDBHPercentage of all returns above DBH per pixel (all returns above DBH × 100/total count of all returns)
ElevAADAverage absolute deviation of elevations of all returns above DBH
ElevAvAverage of elevations of all returns above DBH
CRRCanopy relief ratio ((mean − min)/(max − min)) of elevations of all returns
ElevCMCubic mean of elevations of all returns
ElevCVCoefficient of variation of elevations of all returns above DBH
ElevIQInterquartile range of elevations of all returns above DBH
ElevKurtKurtosis of elevations of all returns above DBH
ElevL1First L-moment of elevations of all returns above DBH
ElevL2Second L-moment of elevations of all returns above DBH
ElevL3Third L-moment of elevations of all returns above DBH
ElevL4Fourth L-moment of elevations of all returns above DBH
ElevLCVL-moment coefficient of variation of elevations of all returns above DBH
ElevLkurtL-moment kurtosis of elevations of all returns above DBH
ElevLskewL-moment skewness of elevations of all returns above DBH
EMADmedMedian of the absolute deviations from the overall median of elevations
EMADmodMode of the absolute deviations from the overall mode of elevations
ElevMaxMaximum of elevations of all returns above DBH
ElevMinMinimum of elevations of all returns above DBH
ElevModMode of elevations of all returns above DBH
ElevPiith percentile of elevations of all returns above DBH, where i = 1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 95 and 99
ElevQMQuadratic mean of elevations of all returns
ElevSkewSkewness of elevations of all returns above DBH
ElevSDStandard deviation of elevations of all returns above DBH
ElevVarVariance of elevations of all returns above DBH
FPVFilled potential volume (ratio of the volume under the canopy to volume under a surface anchored at the max height in the cell)
rumpleRatio of the surface area of the canopy surface to the flat area of the cell
CHMCanopy height model (canopy surface model normalized with bare-earth model)

References

  1. Houghton, R.A.; Hall, F.; Goetz, S.C. Importance of biomass in the global carbon cycle. J. Geophys. Res. 2009, 114, 1–13. [Google Scholar] [CrossRef]
  2. Houghton, R.A.; House, J.I.; Pongratz, J.; Van der Werf, G.R.; DeFries, R.S.; Hansen, M.C.; Le Quere, C.; Ramankutty, N. Carbon emissions from land use and land-cover change. Biogeosciences 2012, 9, 5125–5142. [Google Scholar] [CrossRef]
  3. Köhl, M.; Lasco, R.; Cifuentes, M.; Jonsson, Ö.; Korhonen, K.T.; Mundhenk, P.; de Jesus Navar, J.; Stinson, G. Changes in forest production, biomass and carbon: Results from the 2015 UN FAO Global Forest Resource Assessment. For. Ecol. Manag. 2015, 352, 21–34. [Google Scholar] [CrossRef]
  4. Le Quéré, C.; Peters, G.P.; Andres, R.J.; Andrew, R.M.; Boden, T.A.; Ciais, P.; Friedlingstein, P.; Houghton, R.A.; Marland, G.; Moriarty, R.; et al. Global carbon budget 2013. Earth Syst. Sci. Data 2014, 6, 235–263. [Google Scholar] [CrossRef]
  5. Pan, Y.; Birdsey, R.A.; Fang, J.; Houghton, R.; Kauppi, P.E.; Kurz, W.A.; Phillips, O.L.; Shvidenko, A.; Lewis, S.L.; Canadell, J.G.; et al. A large and persistent carbon sink in the world’s forests. Science 2011, 333, 988–993. [Google Scholar] [CrossRef] [PubMed]
  6. White, J.C.; Wulder, M.A.; Varhola, A.; Vastaranta, M.; Coops, N.C.; Cook, B.D.; Pitt, D.; Woods, M. A Best Practice Guide for Generating Forest Inventory Attributes from Airborne Laser Scanning Data Using an Area-Based Approach. Available online: http://www.cfs.nrcan.gc.ca/pubwarehouse/pdfs/34887.pdf (accessed on 12 June 2017).
  7. Wulder, M.A.; White, E.M.; Nelson, R.F.; Næsset, E.; Ørka, H.O.; Coops, N.C.; Hilker, T.; Bater, C.W.; Gobakken, T. LiDAR sampling for large-area forest characterization: A review. Remote Sens. Environ. 2012, 121, 196–209. [Google Scholar] [CrossRef]
  8. Deo, R.K.; Russell, M.B.; Domke, G.M.; Woodall, C.W.; Falkowski, M.J.; Cohen, W.B. Using Landsat time-series and LiDAR to inform aboveground forest biomass baselines in northern Minnesota, USA. Can. J. Remote Sens. 2017, 43, 28–47. [Google Scholar] [CrossRef]
  9. Deo, R.K. Modeling and mapping of aboveground biomass and carbon sequestration in the cool temperate forest of north-east China. Master’s Thesis, ITC, Enschede, The Netherlands, 2008. [Google Scholar]
  10. Lefsky, M.A.; Cohen, W.B.; Parker, G.G.; Harding, D.J. LiDAR remote sensing for ecosystem studies. BioScience 2002, 52, 19–30. [Google Scholar] [CrossRef]
  11. Pflugmacher, D.; Cohen, W.B.; Kennedy, R.E.; Yang, Y. Using Landsat-derived disturbance and recovery history and LiDAR to map forest biomass dynamics. Remote Sens. Environ. 2014, 151, 124–137. [Google Scholar] [CrossRef]
  12. Zald, H.S.; Ohmann, J.L.; Roberts, H.M.; Gregory, M.J.; Henderson, E.B.; McGaughey, R.J.; Braaten, J. Influence of LiDAR, Landsat imagery, disturbance history, plot location accuracy, and plot size on accuracy of imputation maps of forest composition and structure. Remote Sens. Environ. 2014, 143, 26–38. [Google Scholar] [CrossRef]
  13. Hudak, A.T.; Lefsky, M.A.; Cohen, W.B.; Berterretche, M. Integration of LiDAR and Landsat ETM+ data for estimating and mapping forest canopy height. Remote Sens. Environ. 2002, 82, 397–416. [Google Scholar] [CrossRef]
  14. White, J.C.; Coops, N.C.; Wulder, M.A.; Vastaranta, M.; Hilker, T.; Tompalski, P. Remote Sensing Technologies for Enhancing Forest Inventories: A Review. Can. J. Remote Sens. 2016, 42, 619–641. [Google Scholar] [CrossRef]
  15. Næsset, E.; Gobakken, T. Estimation of above- and below- ground biomass across regions of the boreal forest zone using airborne laser. Remote Sens. Environ. 2008, 112, 3079–3090. [Google Scholar] [CrossRef]
  16. McGaughey, R.J. FUSION/LDV: Software for LiDAR Data Analysis and Visualization; Version 3.21; U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station; University of Washington: Seattle, WA, USA, 2014.
  17. Deo, R.K.; Froese, R.E.; Falkowski, M.J.; Hudak, A.T. Optimizing variable radius plot size and LiDAR resolution to model standing volume in conifer forests. Can. J. Remote Sens. 2016, 42, 428–442. [Google Scholar] [CrossRef]
  18. Wallerman, J.P.; Holmgren, J. Estimating field-plot data of forest stands using airborne laser scanning and SPOT HRG data. Remote Sens. Environ. 2007, 110, 227–234. [Google Scholar] [CrossRef]
  19. Jenkins, J.C.; Chojnacky, D.C.; Heath, L.S.; Birdsey, R.A. National-scale biomass estimators for United States tree species. Forest Sci. 2003, 49, 12–35. [Google Scholar]
  20. Falster, D.S.; Duursma, R.A.; Ishihara, M.I.; Barneche, D.R.; FitzJohn, R.G.; Varhammar, A.; Aiba, M.; Ando, M.; Anten, N.; Aspinwall, M.J.; et al. BAAD: A biomass and allometry database for woody plants. Ecology 2015, 96. [Google Scholar] [CrossRef]
  21. Chave, J.; Réjou-Méchain, M.; Búrquez, A.; Chidumayo, E.; Colgan, M.S.; Delitti, W.B.C.; Duque, A.; Eid, T.; Fearnside, P.M.; Goodman, R.C.; et al. Improved allometric models to estimate the aboveground biomass of tropical trees. Glob. Chang. Biol. 2014, 20, 3177–3190. [Google Scholar] [CrossRef] [PubMed]
  22. Domke, G.M.; Woodall, C.W.; Smith, J.E.; Westfall, J.A.; McRoberts, R.E. Consequences of alternative tree-level biomass estimation procedures on U.S. forest carbon stock estimates. For. Ecol. Manag. 2012, 270, 108–116. [Google Scholar] [CrossRef]
  23. Weiskittel, A.R.; Hann, D.W.; Kershaw, J.A.; Vanclay, J.K. Forest Growth and Yield Modeling; Wiley: Hoboken, NY, USA, 2011. [Google Scholar]
  24. Hollaus, M.; Wagner, W.; Schadauer, K.; Maier, B.; Gabler, K. Growing stock estimation for alpine forests in Austria: A robust LiDAR-based approach. Can. J. For. Res. 2009, 39, 1387–1400. [Google Scholar] [CrossRef]
  25. Westfall, J. Spatial-scale considerations for a large-area forest inventory regression model. Forestry 2015, 88, 267–274. [Google Scholar] [CrossRef]
  26. Gleaspm, C.J.; Im, J. Forest biomass estimation from airborne LiDAR data using machine learning approaches. Remote Sens. Environ. 2012, 125, 80–91. [Google Scholar]
  27. Latifi, H.; Nothdurft, A.; Koch, B. Non-parametric prediction and mapping of standing timber volume and biomass in a temperate forest: application of multiple optical/LiDAR-derived predictors. Forestry 2010, 83, 395–407. [Google Scholar] [CrossRef]
  28. Latifi, H.; Fassnacht, F.E.; Hartig, F.; Berger, C.; Hernández, J.; Corvalán, P.; Koch, B. Stratified aboveground forest biomass estimation by remote sensing data. Int. J. Appl. Earth Obs. 2015, 38, 229–241. [Google Scholar] [CrossRef]
  29. Moser, P.; Vibrans, A.C.; McRoberts, R.E.; Næsset, E.; Gobakken, T.; Chirici, G.; Mura, M.; Marchetti, M. Methods for variable selection in LiDAR-assisted forest inventories. Forestry 2016, 89, 1–13. [Google Scholar] [CrossRef]
  30. Chen, Q.; Laurin, G.V.; Battles, J.J.; Saah, D. Integration of airborne LiDAR and vegetation types derived from aerial photography for mapping aboveground live biomass. Remote Sens. Environ. 2012, 121, 108–117. [Google Scholar] [CrossRef]
  31. Brosofske, K.D.; Froese, R.E.; Falkowski, M.J.; Banksota, A. A review of methods for mapping and prediction of inventory attributes for operational forest management. Forest Sci. 2014, 60, 1–24. [Google Scholar] [CrossRef]
  32. Breiman, L. Random forest. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  33. Crookston, N.L.; Finley, A.O. yaimpute: An R package for kNN imputation. J. Stat. Softw. 2008, 23, 1–16. [Google Scholar] [CrossRef]
  34. Liaw, L.A.; Wiener, M. Classification and regression by random forest. R News. 2002, 2, 18–22. [Google Scholar]
  35. Zald, H.S.; Wulder, M.A.; White, J.C.; Hilker, T.; Hermosilla, T.; Hobart, G.W.; Coops, N.C. Integrating Landsat pixel composites and change metrics with LiDAR plots to predictively map forest structure and aboveground biomass in Saskatchewan, Canada. Remote Sens. Environ. 2016, 176, 188–201. [Google Scholar] [CrossRef]
  36. Magnussen, S. Arguments for a model-dependent inference? Forestry 2015, 88, 317–325. [Google Scholar] [CrossRef]
  37. Gregoire, T.G.; Næsset, E.; McRoberts, R.E.; Ståhl, G.; Andersene, H.E.; Gobakken, T.; Ene, L.; Nelson, R. Statistical rigor in LiDAR-assisted estimation of aboveground forest biomass. Remote Sens. Environ. 2016, 173, 98–108. [Google Scholar] [CrossRef]
  38. McRoberts, R.E.; Næsset, E.; Gobakken, T. Inference for LiDAR-assisted estimation of forest growing stock volume. Remote Sens. Environ. 2013, 128, 268–275. [Google Scholar] [CrossRef]
  39. McCaskill, G.L.; McWilliams, W.H.; Barnett, C.J.; Butler, B.J.; Hatfield, M.A.; Kurtz, C.M.; Morin, R.S.; Moser, W.K.; Perry, C.H.; Woodall, C.W. Maine’s Forests 2008; Resour. Bull. NRS-48; U.S. Department of Agriculture: Newtown Square, PA, USA, 2011.
  40. Miles, P.D.; Heinzen, D.; Mielke, M.E.; Woodall, C.W.; Butler, B.J.; Piva, R.J.; Meneguzzo, D.M.; Perry, C.H.; Gormanson, D.D.; Barnett, C.J. Minnesota’s Forests 2008; Resour. Bull. NRS-50; U.S. Department of Agriculture: Newtown Square, PA, USA, 2011.
  41. Crocker, S.J. Forests of New Jersey 2013; Resour. Update FS-27; U.S. Department of Agriculture: Newtown Square, PA, USA, 2014.
  42. McCaskill, G.L.; McWilliams, W.H.; Alerich, C.A.; Butler, B.J.; Crocker, S.J.; Domke, G.M.; Griffith, D.; Kurtz, C.M.; Lehman, S.; Lister, T.W.; et al. Pennsylvania’s Forests 2009; Resour. Bull. NRS-82; U.S. Department of Agriculture: Newtown Square, PA, USA, 2013.
  43. Rose, A.K. South Carolina’s Forests 2011; Resour. Bull. SRS-208; U.S. Department of Agriculture Forest Service: Asheville, NC, USA, 2016.
  44. USGS EROS. Landsat Surface Reflectance Higher-Level Data Products. U.S. Geological Survey, Earth Resources Observation and Science Center. 2017. Available online: https://landsat.usgs.gov/landsat-surface-reflectance-high-level-data-products (accessed on 16 March 2017).
  45. Huang, C.; Goward, S.N.; Masek, J.G.; Thomas, N.; Zhu, Z.; Vogelmann, J.E. An automated approach for reconstructing recent forest disturbance history using dense Landsat time series stacks. Remote Sens. Environ. 2010, 114, 183–198. [Google Scholar] [CrossRef]
  46. Healey, S.P.; Cohen, W.B.; Zhiqiang, Y.; Krankina, O.N. Comparison of Tasseled Cap-based Landsat data structures for use in forest disturbance detection. Remote Sens. Environ. 2005, 97, 301–310. [Google Scholar] [CrossRef]
  47. Crist, E.P.; Cicone, R.C. A physically based transformation of Thematic Mapper data—The TM Tasseled Cap. IEEE Trans. Geosci. Remote Sens. 1984, GE-22, 256–263. [Google Scholar] [CrossRef]
  48. Baig, M.H.A.; Zhang, L.; Shuai, T.; Tong, Q. Derivation of a tasselled cap transformation based on Landsat 8 at-satellite reflectance. Remote Sens. Lett. 2014, 5, 423–431. [Google Scholar] [CrossRef]
  49. USGS EROS. Landsat Surface Reflectance-Derived Spectral Indices; Product Guide Version 2.9; U.S. Geological Survery, Earth Resources Observation and Science Center: Garretson, SD, USA, 2016. Available online: https://landsat.usgs.gov/sites/default/files/documents/si_product_guide.pdf (accessed on 11 April 2016).
  50. Heiskanen, J. Estimating aboveground tree biomass and leaf area index in a mountain birch forest using ASTER satellite data. Int. J. Remote Sens. 2006, 27, 1135–1158. [Google Scholar] [CrossRef]
  51. Lu, D.; Chen, Q.; Wang, G.; Moran, E.; Batistella, M.; Zhang, M.; Vaglio Laurin, G.; Saah, D. Aboveground Forest Biomass Estimation with Landsat and LiDAR Data and Uncertainty Analysis of the Estimates. Int. J. For. Res. 2012, 2012, 1–16. [Google Scholar] [CrossRef]
  52. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2016. Available online: http://www.R-project.org/ (accessed on 5 August 2016).
  53. Cohen, W.B.; Yang, Z.; Kennedy, R. Detecting trends in forest disturbance and recovery using yearly Landsat time series: 2. TimeSync—Tools for calibration and validation. Remote Sens. Environ. 2010, 114, 2911–2924. [Google Scholar] [CrossRef]
  54. Zhu, Z.; Woodcock, C.E.; Holden, C.; Yang, Z. Generating synthetic Landsat images based on all available Landsat data: Predicting Landsat surface reflectance at any given time. Remote Sens. Environ. 2015, 162, 67–83. [Google Scholar] [CrossRef]
  55. O’Connell, B.M.; LaPoint, E.B.; Turner, J.A.; Ridley, T.; Pugh, S.A.; Wilson, A.M.; Waddell, K.L.; Conkling, B.L. The Forest Inventory and Analysis Database: Database Description and User Guide Version 6.0.2 for Phase 2. Available online: https://www.fia.fs.fed.us/library/database-documentation/historic/ver6/FIADB%20User%20Guide%20P2_6-0-2_final-opt.pdf (accessed on 1 August 2016).
  56. Woodall, C.W.; Heath, L.S.; Domke, G.M.; Nichols, M.C. Methods and Equations for Estimating Aboveground Volume, Biomas, and Carbon for Trees in the US Forest Inventory 2010; Gen. Tech. Rep. NRS-88; USDA, Forest Service, Northern Research Station: Newtown Square, PA, USA, 2011; p. 30.
  57. McRoberts, R.E. The national forest inventory of the United States of America. J. For. Sci. 2008, 24, 127–135. [Google Scholar]
  58. Næsset, E.; Bollandsås, O.M.; Gobakken, T.; Solberg, S.; McRoberts, R.E. The effects of field plot size on model-assisted estimation of aboveground biomass change using multitemporal interferometric SAR and airborne laser scanning data. Remote Sens. Environ. 2015, 168, 252–264. [Google Scholar] [CrossRef]
  59. Deo, R.K. Application of an imputation method for geospatial inventory of forest structural attributes across multiple spatial scales in the Great Lake States, USA. Ph.D. Thesis, Michigan Technological University, Houghton, MI, USA, 2014. [Google Scholar]
  60. Cížková, L.; Cížek, P. Numerical linear algebra. In Handbook of Computational Statistics: Concepts and Methods; Gentle, J.E., Hardle, W.K., Mori, Y., Eds.; Springer: Heidelberg, Germany, 2012; pp. 105–137. [Google Scholar]
  61. Evans, J.S.; Murphy, M.A. rfUtilities: Random Forests Model Selection and Performance Evaluation. 2015. Available online: http://cran.r-project.org/package=rfUtilities (accessed on 10 November 2016).
  62. Murphy, M.A.; Evans, J.S.; Storfer, A. Quantifying Bufo boreas connectivity in Yellowstone National Park with landscape genetics. Ecology 2010, 91, 252–261. [Google Scholar] [CrossRef] [PubMed]
  63. Fekety, P.A.; Falkowski, M.J.; Hudak, A.T. Temporal transferability of LiDAR-based imputation of forest inventory attributes. Can. J. For. Res. 2015, 45, 422–435. [Google Scholar] [CrossRef]
  64. Venables, W.N.; Ripley, B.D. Modern Applied Statistics with S, 4th ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
  65. Robinson, A.P.; Duursma, R.A.; Marshall, J.D. A regression-based equivalence test for model validation: shifting the burden of proof. Tree Physiol. 2005, 25, 903–913. [Google Scholar] [CrossRef] [PubMed]
  66. Robinson, A. Equivalence: Provides Tests and Graphics for Assessing Tests of Equivalence. R Package Version 0.7.2. 2016. Available online: https://cran.r-project.org/web/packages/equivalence (accessed on 17 January 2017).
  67. White, J.C.; Arnett, J.T.; Wulder, M.A.; Tompalski, P.; Coops, N.C. Evaluating the impact of leaf-on and leaf-off airborne laser scanning data on the estimation of forest inventory attributes with the area-based approach. Can. J. For. Res. 2015, 45, 1498–1513. [Google Scholar] [CrossRef]
  68. Villikka, M.; Packalén, P.; Maltamo, M. The suitability of leaf-off airborne laser scanning data in an area-based forest inventory of coniferous and deciduous trees. Silva Fennica. 2012, 46, 99–110. [Google Scholar] [CrossRef]
  69. Lefsky, M.A.; Harding, D.J.; Keller, M.; Cohen, W.B.; Carabajal, C.C.; Espirito-Santo, F.D.B.; Hunter, M.O.; de Oliveira, R. Estimates of forest canopy height and aboveground biomass using ICESat. Geophys. Res. Lett. 2005, 32, 1–4. [Google Scholar] [CrossRef]
  70. Nelson, R.; Krabill, W.; Tonelli, J. Regional forest inventory using an airborne profiling LiDAR. Jpn. Soc. For. Plan. 2008, 13, 287–294. [Google Scholar]
  71. Maltamo, M.; Bollandsås, O.M.; Gobakken, T.; Næsset, E. Large-scale prediction of aboveground biomass in heterogeneous mountain forests by means of airborne laser scanning. Can. J. For. Res. 2016, 46, 1138–1144. [Google Scholar] [CrossRef]
  72. Næsset, E. Assessing sensor effects and effects of leaf-off and leaf-on canopy conditions on biophysical stand properties derived from small-footprint airborne laser data. Remote Sens. Environ. 2005, 98, 356–370. [Google Scholar] [CrossRef]
  73. Fayad, I.; Baghdadi, N.; Bailly, J.S.; Barbier, N.; Gond, V.; Hérault, B.; El Hajj, M.; Fabre, F.; Perrin, J. Regional scale rain-forest height mapping using regression-kriging of spaceborne and airborne LiDAR data: Application on French Guiana. Remote Sens. 2016, 8, 240. [Google Scholar] [CrossRef]
  74. Fayad, I.; Baghdadi, N.; Guitet, S.; Bailly, J.S.; Hérault, B.; Gond, V.; El Hajj, M.; Minh, D.H.T. Aboveground biomass mapping in French Guiana by combining remote sensing, forest inventories and environmental data. Int. J. Appl. Earth Obs. 2016b, 52, 502–514. [Google Scholar] [CrossRef]
  75. Shendryk, I.; Hellström, M.; Klemedtsson, L.; Kljun, N. Low-density LiDAR and optical imagery for biomass estimation over boreal forest in Sweden. Forests 2014, 5, 992–1010. [Google Scholar] [CrossRef]
  76. Zhao, K.; Popescu, S.C.; Nelson, R. LiDAR remote sensing of forest biomass: A scale-invariant estimation approach using airborne lasers. Remote Sens. Environ. 2009, 113, 182–196. [Google Scholar] [CrossRef]
  77. Lim, K.S.; Treitz, P.M. Estimation of above ground forest biomass from airborne discrete return laser scanner data using canopy-based quantile estimators. Scand. J. For. Res. 2004, 19, 558–570. [Google Scholar] [CrossRef]
  78. Smith, P.F.; Ganesh, S.; Liu, P. A comparison of random forest regression and multiple linear regression for prediction in neuroscience. J. Neurosci. Methods. 2013, 220, 85–91. [Google Scholar] [CrossRef] [PubMed]
  79. Pourrahmati, M.R.; Baghdadi, N.N.; Darvishsefat, A.A.; Namiranian, M.; Fayad, I.; Bailly, J.S.; Gond, V. Capability of GLAS/ICESat data to estimate forest canopy height and volume in mountainous forests of Iran. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 5246–5261. [Google Scholar] [CrossRef]
  80. Huang, W.; Swatantran, A.; Johnson, K.; Duncanson, L.; Tang, H.; Dunne, J.O.N.; Hurtt, G.; Dubayah, R. Local discrepancies in continental scale biomass maps: A case study over forested and non-forested landscapes in Maryland, USA. Carbon Balance Manag. 2015, 10, 1–16. [Google Scholar] [CrossRef] [PubMed]
  81. McRoberts, R.E. The effects of rectification and Global Positioning System errors on satellite image-based estimates of forest area. Remote Sens. Environ. 2010, 114, 1710–1717. [Google Scholar] [CrossRef]
  82. Næsset, E. Predicting forest stand characteristics with airborne laser using a practical two-stage procedure and field data. Remote Sens. Environ. 2002, 80, 88–99. [Google Scholar] [CrossRef]
  83. Frazer, G.W.; Magnussen, S.; Wulder, M.A.; Niemann, K.O. Simulated impact of sample plot size and co-registration error on the accuracy and uncertainty of LiDAR-derived estimates of forest stand biomass. Remote Sens. Environ. 2011, 115, 636–649. [Google Scholar] [CrossRef]
  84. Lu, D. The potential and challenge of remote sensing-based biomass estimation. Int. J. Remote Sens. 2006, 27, 1297–1328. [Google Scholar] [CrossRef]
  85. Powell, S.L.; Cohen, W.B.; Healey, S.P.; Kennedy, R.E.; Moisen, G.G.; Pierce, K.B.; Ohmann, J.L. Quantification of live aboveground forest biomass dynamics with Landsat time-series and field inventory data: A comparison of empirical modelling approaches. Remote Sens. Environ. 2010, 114, 1053–1068. [Google Scholar] [CrossRef]
Figure 1. Location of four study sites in the eastern USA. The sites are represented by Landsat scenes of path/rows 12/28, 27/27, 14/32, and 16/37 in Maine (ME), Minnesota (MN), Pennsylvania-New Jersey (PANJ) and South Carolina (SC), respectively. The vertical and horizontal bands are LiDAR strip samples collected on about 7 to 12% coverage of the Landsat scenes.
Figure 1. Location of four study sites in the eastern USA. The sites are represented by Landsat scenes of path/rows 12/28, 27/27, 14/32, and 16/37 in Maine (ME), Minnesota (MN), Pennsylvania-New Jersey (PANJ) and South Carolina (SC), respectively. The vertical and horizontal bands are LiDAR strip samples collected on about 7 to 12% coverage of the Landsat scenes.
Remotesensing 09 00598 g001
Figure 2. Importance ranking of the LiDAR and Landsat predictors used in site-specific and generalized models at four sites in the eastern USA. TCB and SWIR represent tasseled cap brightness and short-wave infrared reflectance bands, respectively while the other metrics are described in the Table A1.
Figure 2. Importance ranking of the LiDAR and Landsat predictors used in site-specific and generalized models at four sites in the eastern USA. TCB and SWIR represent tasseled cap brightness and short-wave infrared reflectance bands, respectively while the other metrics are described in the Table A1.
Remotesensing 09 00598 g002
Figure 3. Equivalence plot of aboveground biomass (AGB) predictions by the generic and site-specific models at FIA plot locations. The black line represents the line of best fit, the dashed gray lines represent the 25% region of similarity for slope, and the shaded gray polygon represents the 25% region of similarity for intercept. The black and white vertical bars represent 95% confidence intervals for slope and intercept, respectively.
Figure 3. Equivalence plot of aboveground biomass (AGB) predictions by the generic and site-specific models at FIA plot locations. The black line represents the line of best fit, the dashed gray lines represent the 25% region of similarity for slope, and the shaded gray polygon represents the 25% region of similarity for intercept. The black and white vertical bars represent 95% confidence intervals for slope and intercept, respectively.
Remotesensing 09 00598 g003
Table 1. Average temperature, precipitation and degree days of the four study sites located across the eastern USA, 1985–2016.
Table 1. Average temperature, precipitation and degree days of the four study sites located across the eastern USA, 1985–2016.
SiteAverage Annual Precipitation (mm)Average Annual Temperature °C Cooling Degree Days °C Heating Degree Days °C
ME11705.21004750
MN7124.41605200
PANJ123710.63803150
SC120018.211801250
Source: http://www.ncdc.noaa.gov/cag/; Heating degree days are annual summations of negative differences between the mean daily temperature and 18 °C as the base; and cooling degree days are annual summations of positive differences between the mean daily temperature and 18 °C as the base (see, www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/degree_days/ddayexp.shtml).
Table 2. Sensor-year-Julian day of the time-series Landsat imagery for four sites in the eastern USA.
Table 2. Sensor-year-Julian day of the time-series Landsat imagery for four sites in the eastern USA.
ME (Path 12, Row 28) MN (Path 27, Row 27)PANJ (Path 14, Row 32)SC (Path 16, Row 37)
L5-1985-205L5-1986-233L5-1984-265L5-1984-247
L5-1986-256L5-1987-236L5-1987-257L5-1985-185
L5-1988-246L5-1989-257L5-1988-212L5-1986-236
L5-1990-251L5-1990-212L5-1990-185L5-1987-239
L5-1992-257L5-1991-263L5-1991-172L5-1988-258
L5-1993-259L5-1995-210L5-1993-241L5-1991-202
L5-1995-249L5-1997-247L5-1995-231L5-1993-191
L5-1998-225L5-1998-218L5-1997-172L5-1994-258
L5-1999-244L5-1999-205L5-2000-261L5-1995-229
L5-2003-255L5-2000-256L5-2002-186L5-1996-184
L5-2004-258L5-2001-226L5-2003-237L5-1998-173
L5-2008-237L5-2003-248L5-2006-197L5-2000-227
L5-2010-242L5-2006-256L5-2007-248L5-2002-200
L5-2011-229L5-2007-259L5-2008-235L5-2005-208
L8-2014-237L5-2008-230L5-2010-240L5-2007-230
L8-2015-176L5-2010-235L5-2011-243L5-2011-273
L5-2011-254L8-2014-219L8-2014-233
L8-2013-259L8-2015-206
L5 stands for Landsat 5 Thematic Mapper (TM) and L8 stands for Landsat 8 Operational Land Imager (OLI).
Table 3. Summary of aboveground biomass (Mg·ha−1) from field plot measurements across four sites in the eastern USA.
Table 3. Summary of aboveground biomass (Mg·ha−1) from field plot measurements across four sites in the eastern USA.
SitesMeanMinimumMaximumStandard DeviationCoefficient of Variation (%)
CMS DataFIA Data CMS DataFIA DataCMS DataFIA DataCMS DataFIA DataCMS DataFIA Data
ME69.0478.040.842.13218.71201.8860.2943.9987.3156.36
MN51.2950.541.883.33185.24146.4645.6730.3889.0560.12
PANJ105.38136.570.660388.15396.7986.37102.4381.9675.01
SC83.4197.360.140329.62226.8686.6452.56103.8753.98
CMS = Carbon Monitoring System; FIA = Forest Inventory and Analysis. CMS data are plot data used for model fitting; FIA data are plot data used for model validation. There were 48, 50, 49 and 50 CMS plots measured in ME, MN, PANJ and SC, respectively. The number of FIA (validation) plots was 59 in ME, 70 in MN, 30 in PANJ, and 60 in SC. The FIA plots were selected such that each represents a single condition and per hectare basal-area and basal area weighted height is above zero.
Table 4. Fit statistics of site-specific and generic models using random forests (RF) and multiple linear regressions (MLR) across four sites in the eastern USA.
Table 4. Fit statistics of site-specific and generic models using random forests (RF) and multiple linear regressions (MLR) across four sites in the eastern USA.
Site-Specific & Generic ModelsModel Type Variables SelectedRF-ModelMLR-Model
Pseudo R2RMSE (Mg·ha−1)Adj.R2RMSE (Mg·ha−1)
MELiD.LSElevMax, CovDBH, DenStra4, SWIR, Stratum3, Stratum00.7926.80.8620.5
MNLiD.LSCovDBH, Stratum3, Stratum4, ElevP050.6526.70.7819.9
PANJLiD.LSCovMean, Stratum1, Stratum2, Stratum3, Stratum5, ElevSD0.6848.20.8233.0
SCLiD.LSCovDBH, DenStra4, ElevCM0.8631.20.9223.3
GenericLiD.LSTCB, Stratum1, Stratum2, Stratum3, Stratum4, Stratum00.7635.70.8527.5
MELSTCB, DI, EVI, IFZ0.4544.30.6135.4
MNLSTCB, DI, TCG, IFZ, NDMI, TCA0.1847.50.2636.4
PANJLSSWIR, EVI, TCG, NDMI, NDVI, TCA0.4563.40.4460.0
SCLSSWIR, DI, TCG, NBR, TCW0.2773.10.4460.7
GenericLSSWIR, TCB, EVI, IFZ, NBR, NDVI, TCW0.3260.80.3359.2
LiD.LS model uses both LiDAR and Landsat predictors, and LS model uses only Landsat predictors for the year 2014. Coefficients of all the predictors in the models have significant p-values at α = 0.05 in multiple linear regressions (MLR). The predictors were selected by applying the QR-decomposition algorithm for pruning the collinear variables followed by the random forest (RF) model selection approach (that uses standardized importance values of predictors and model improvement ratio) and both direction elimination methods in stepwise regression. Pseudo-R2 = % explained variance in the RF models. RMSE = root mean square error.
Table 5. Fit statistics of site-specific random forest (RF) and multiple linear regression (MLR) models dependent on the same variables as in the generic LiDAR and Landsat (LiD.LS) model for four sites across the eastern USA.
Table 5. Fit statistics of site-specific random forest (RF) and multiple linear regression (MLR) models dependent on the same variables as in the generic LiDAR and Landsat (LiD.LS) model for four sites across the eastern USA.
SiteModel Variables Selected RF-ModelMLR-Model
Pseudo R2RMSE (Mg·ha−1)Adj.R2RMSE (Mg·ha−1)
MELiD.LSTCB, Stratum1, Stratum2, Stratum3, Stratum4, Stratum00.7728.20.8819.3
MNLiD.LS0.6427.20.7819.6
PANJLiD.LS0.5557.10.8332.9
SCLiD.LS0.8334.70.9320.1
Coefficients of a few predictors in the models were not significant at α = 0.05.
Table 6. Relationship of observed and predicted aboveground biomass values at the FIA plot locations within the LiDAR strips of the four target sites in the eastern USA. Predictions by the generic model and site-specific model are compared with the FIA observations.
Table 6. Relationship of observed and predicted aboveground biomass values at the FIA plot locations within the LiDAR strips of the four target sites in the eastern USA. Predictions by the generic model and site-specific model are compared with the FIA observations.
SiteNo. of PlotsCorrelation Coef. (r)Bias (Mg·ha−1) RMSE (Mg·ha−1)
Generic ModelSite-Spec. ModelGeneric ModelSite-Spec. ModelGeneric ModelSite-Spec. Model
ME590.890.883.8 (4.8)14.7 (18.5)21.8 (27.5)26.3 (33.2)
MN700.800.8120.4 (41.3)11.3 (22.9)28.1 (56.9)23.3 (47.1)
PANJ300.920.9215.9 (12.8)17.0 (13.7)39.2 (31.7)42.3 (34.2)
SC600.850.8240.0 (41.1)34.2 (35.1)51.4 (52.8)51.1 (52.5)
Poo-led2190.87 19.30 (23.89) 63.34 (44.99)
Bias is calculated as predicted minus observed aboveground biomass and the numbers in parentheses represent relative bias (%). Numbers in parentheses represent relative root mean square error (RMSE, %).
Table 7. Summary of AGB predictions (Mg·ha−1) at the CMS plots within the LiDAR strips by the generic and site-specific MLR models dependent on LiDAR and Landsat derived variables.
Table 7. Summary of AGB predictions (Mg·ha−1) at the CMS plots within the LiDAR strips by the generic and site-specific MLR models dependent on LiDAR and Landsat derived variables.
SitesMean AGB of Sample Plot DataStd. Error of Agb in Sample ObservationsGeneric ModelSite-Specific Model
Mean AGB Prediction Over All PixelsEst. BiasMARE §Std. ErrorMean AGB Prediction Over All PixelsEst. BiasMARE §Std. Error
ME69.08.789.6−6.996.43.096.7096.72.9
MN51.36.469.310.858.53.064.4064.42.8
PANJ105.412.389.8−9.799.55.2109.00109.04.8
SC83.412.2102.35.496.93.3103.00103.03.3
§ MARE is model assisted regression estimate calculated as mean AGB prediction minus estimated bias.
Table 8. Summary of AGB predictions (Mg·ha−1) at the FIA plots within the LiDAR strips by the generic and site-specific MLR models dependent on LiDAR and Landsat derived variables.
Table 8. Summary of AGB predictions (Mg·ha−1) at the FIA plots within the LiDAR strips by the generic and site-specific MLR models dependent on LiDAR and Landsat derived variables.
SitesMean AGB of Sample Plot DataStd. Error of AGB in Sample ObservationsGeneric ModelSite-Specific Model
Mean AGB Prediction Over All PixelsEst. BiasMARE §Std. ErrorMean AGB Prediction Over All PixelsEst. BiasMAR §Std. Error
ME79.36.189.63.885.82.896.714.782.02.9
MN49.43.769.320.448.92.364.411.353.12.4
PANJ123.617.689.815.973.96.6109.017.092.07.2
SC97.46.8102.340.062.34.2103.034.268.84.9
§ MARE is model assisted regression estimate calculated as mean AGB prediction minus estimated bias.
Table 9. Proportion of total pixels that attained R2 in the range of 40–100% with the pixel-level polynomial curve fittings at four sites in the eastern USA.
Table 9. Proportion of total pixels that attained R2 in the range of 40–100% with the pixel-level polynomial curve fittings at four sites in the eastern USA.
Landsat MetricsMEMNPANJSC
IFZ76.854.330.749.8
SWIR71.255.529.143.1
NDMI68.524.237.539.0
DI68.141.337.148.3
Greenness65.411.234.822.8
Wetness65.231.435.040.1
SAVI63.410.335.922.2
Brightness57.936.423.141.9
TCA54.217.549.334.7
NDVI52.813.153.633.1
EVI48.710.126.523.5
NBR239.717.426.528.6

Share and Cite

MDPI and ACS Style

Deo, R.K.; Russell, M.B.; Domke, G.M.; Andersen, H.-E.; Cohen, W.B.; Woodall, C.W. Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA. Remote Sens. 2017, 9, 598. https://doi.org/10.3390/rs9060598

AMA Style

Deo RK, Russell MB, Domke GM, Andersen H-E, Cohen WB, Woodall CW. Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA. Remote Sensing. 2017; 9(6):598. https://doi.org/10.3390/rs9060598

Chicago/Turabian Style

Deo, Ram K., Matthew B. Russell, Grant M. Domke, Hans-Erik Andersen, Warren B. Cohen, and Christopher W. Woodall. 2017. "Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA" Remote Sensing 9, no. 6: 598. https://doi.org/10.3390/rs9060598

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop