Airborne Lidar Sampling Strategies to Enhance Forest Aboveground Biomass Estimation from Landsat Imagery

: Accurately estimating aboveground biomass (AGB) is important in many applications, including monitoring carbon stocks, investigating deforestation and forest degradation, and designing sustainable forest management strategies. Although lidar provides critical three-dimensional forest structure information for estimating AGB, acquiring comprehensive lidar coverage is often cost prohibitive. This research focused on developing a lidar sampling framework to support AGB estimation from Landsat images. Two sampling strategies, systematic and classiﬁcation-based, were tested and compared. The proposed strategies were implemented over a temperate forest study site in northern New York State and the processes were then validated at a similar site located in central New York State. Our results demonstrated that while the inclusion of lidar data using systematic or classiﬁcation-based sampling supports AGB estimation, the systematic sampling selection method was highly dependent on site conditions and had higher accuracy variability. Of the 12 systematic sampling plans, R 2 values ranged from 0.14 to 0.41 and plot root mean square error (RMSE) ranged from 84.2 to 93.9 Mg ha − 1 . The classiﬁcation-based sampling outperformed 75% of the systematic sampling strategies at the primary site with R 2 of 0.26 and RMSE of 70.1 Mg ha − 1 . The classiﬁcation-based lidar sampling strategy was relatively easy to apply and was readily transferable to a new study site. Adopting this method at the validation site, the classiﬁcation-based sampling also worked e ﬀ ectively, with an R 2 of 0.40 and an RMSE of 108.2 Mg ha − 1 compared to the full lidar coverage model with an R 2 of 0.58 and an RMSE of 96.0 Mg ha − 1 . This study evaluated di ﬀ erent lidar sample selection methods to identify an e ﬃ cient and e ﬀ ective approach to reduce the volume and cost of lidar acquisitions. The forest type classiﬁcation-based sampling method described in this study could facilitate cost-e ﬀ ective lidar data collection in future studies.


Introduction
Forest ecosystem management requires comprehensive, timely, and accurate monitoring efforts [1]. Above ground biomass (AGB) is an important indicator in monitoring the change of forest carbon stocks. Airborne lidar has been applied successfully to estimate forest biophysical parameters and has provided accurate AGB estimation in many studies [2,3], particularly when used in coordination with data from passive sensors. Commonly used remote sensing data sources, such as Landsat [4], Moderate Resolution Imaging Spectroradiometer (MODIS) [5], and radar [6], tend to reach a saturation point that limits their effectiveness in estimating higher AGB levels [7]. The saturation level of radar and 2000 m. Systematic sampling is easy to design and apply but it might fail to represent the full data range, especially if only a small portion of data are sampled. Classification-based sampling can help compensate for this situation by better representing all value ranges. In classification-based sampling, a classification map is created and then applied to assist lidar sample selection. Chen and Hay [19] aimed to model forest canopy height from lidar samples that were selected by combining pseudo-height classification from QuickBird imagery with several other inputs in a rule-based model. The rules included non-overlapping transects, covering all height classes, and selecting pseudo-height histograms with the highest correlation to the pseudo-height histogram derived from all data. Previous studies have considered both systematic sampling and classification-based sampling, though there has not been a comparison of these two strategies.
The overall aim of this study was to deepen our understanding of lidar sampling for AGB estimation. While the value of lidar sampling has been well documented and various lidar sampling strategies have been proposed, there are no widely accepted protocols for cost-effective lidar sampling for AGB estimation. Additionally, while forest type has long been recognized as a factor in AGB estimation, prior studies have not documented the use of forest type classification for lidar sampling selection within this field. This paper presents a methodological framework to map AGB in temperate forests by combining ground-based inventory data, comprehensive Landsat data, and lidar samples acquired using a variety of methods. We particularly focused on: (1) Assessing whether lidar samples can substitute for comprehensive lidar data collection, (2) characterizing the differences in AGB estimation based on systematic and classification-based sampling lidar sampling strategies, and (3) providing a protocol for lidar sampling acquisition, implementation, and evaluation.

Main Study Area: Huntington Wildlife Forest
Our main study area was the Huntington Wildlife Forest (Huntington) in the central Adirondack Park in northern New York State. The Huntington property provided a location for evaluating the value of different lidar sampling procedures and developing a sampling protocol. Huntington is managed by the State University of New York College of Environmental Science and Forestry (SUNY ESF; 43 • 58 19"N, 74 • 13 18"W; Figure 1). Huntington covers approximately 60 km 2 with mountainous topography ranging in elevation from 466 m to 859 m above mean sea level. Huntington had a mean annual temperature of 4.4 • C and mean annual precipitation of 1010 mm [21]. Huntington contained both undisturbed natural communities and managed forest stands with major species being American beech (Fagus grandifolia), yellow birch (Betula alleghaniensis Britt.), sugar maple (Acer saccharum Marshall.), red spruce (Picea rubens Sarg.), red maple (Acer rubrum L.), and hemlock (Tsuga spp.).

Field Inventory Data
SUNY ESF maintained continuous forest inventory (CFI) plots within Huntington and Heiberg forests, with comprehensive data collected during the summer of 2011 and 2010, respectively. The CFI plots are approximately 405 m 2 circular regions, with the center of each plot located using a global positioning system receiver. All trees in the plot with a diameter at breast height (DBH) of 11.7 cm or greater were measured in Huntington and 9.1 cm or greater were measured in Heiberg. The information recorded for each tree included tree species, DBH, and location relative to the plot center.
Based on the field observations, tree-level AGB was calculated using species-specific DBH allometric equations from Jenkins et al. [23]. Plot-level AGB was calculated as the average AGB per unit area within each plot in megagrams per hectare (Mg ha −1 ). This was calculated by dividing the tree-level AGB total by the plot area. The United Nations Economic Commission for Europe (UNECE) Food and Agriculture Organization (FAO) [24] defined a stand as a mixed forest where neither broadleaved nor coniferous trees account for more than 75% of the tree crown area. We adapted the UNECE/FAO approach and defined a plot as hardwood if the hardwood AGB within the plot was over 75% of the total AGB. Softwood plots were similarly defined when at least 75% of the total AGB was softwood AGB. Mixed forest plots had neither softwood nor hardwood accounting for more than 75% of the total AGB. Table 1 provides descriptive statistics for plot level AGB in hardwood, softwood, and mixed plots in Huntington and Heiberg forests.

Lidar Data and Processing
Airborne lidar data were acquired for Huntington and Heiberg on 10 September 2011 and 10 August 2010, respectively. ALS60 lidar systems were used to simultaneously collect both discrete return point clouds and the waveforms of the returned signals. Characteristics of the lidar data collections for Huntington and Heiberg are summarized in Table 2. Raw laser data was post-processed by Kucera International using Terrasolid's TerraScan software [25]. All further point-cloud processing tasks were performed within FUSION software [26]. Lidar variables were derived from the lidar points within each inventory plot using the CloudMetrics function in FUSION [27]. Return-based, height-based, and density-based variables were derived (Table 3).

Landsat Data and Processing
We selected orthorectified Landsat TM Level-1 images acquired on 19 August 2011 (path/row: 15/29) and 18 July 2010 (path/row: 15/30) that covered the Huntington and Heiberg forest areas, respectively. The images were downloaded from the U.S. Geological Survey Earth Explorer [28]. Although the Landsat images were collected earlier in the growing season than the lidar datasets, they were the cloud-free images that best coincided with the forest inventory data collection.
Using the metadata associated with the downloaded Landsat images, radiometric correction was applied to convert digital numbers into reflectance aiming to mitigate the impact of scene illumination and viewing geometry. Dark object subtraction was applied for atmosphere correction, which was intended to remove the effects of atmosphere scattering and absorption. Radiometric and atmosphere correction were both performed using ENVI 5.2 [29]. Landsat bands 1-5 and 7 (blue, green, red, near infrared, and 2 shortwave infrared), reflectance values, and vegetation indices calculated from these bands were used for model variable selection. Five commonly used vegetation indices were applied in this study: Differenced Vegetation Index (DVI), Ratio Vegetation Index (RVI), Normalized Vegetation Difference Index (NDVI), Soil Adjusted Vegetation Index (SAVI), and Modified Soil Adjusted Vegetation Index (MSAVI) ( Table 4).

Overview
We used AGB data developed from full lidar coverage as a baseline to see if Landsat-based AGB models that used lidar samples could achieve accuracies that approached that of models that used the more expensive full lidar coverage. We also sought to determine how accuracy varied with sampling strategy and if there was a way to establish a protocol to guide lidar sample collection. The workflow for this study is shown in Figure 3. The baseline for comparison in our study was an AGB model developed from the comprehensive lidar data coverage. Forest inventory plot and lidar data were applied to build a first stage regression model that was then used to estimate AGB for the Huntington study area. The impact of different lidar sampling strategies was explored using second stage regression models, which established a relationship between samples of the lidar estimated AGB values and Landsat derived variables. Two categories of lidar sampling strategies were explored: Systematic sampling and classification-based sampling. The classification-based sampling approach was based on a Random Forest (RF) forest type classification. A study previously performed at the same site found that RF had better performance in forest type classification than support vector machine and decision tree algorithms [35]. To assess the accuracy of different sampling strategies, Landsat estimated AGB values generated from second stage regression models were validated using plot and lidar estimated AGB values using mean absolute error (MAE), root mean square error (RMSE), and relative root mean square error (RRMSE). Lidar estimated AGB values covering the full study area were estimated using the first stage regression model and used for testing the Landsat estimated AGB values. of the research process including data, sampling strategies, methods, and results.

Regression and Variable Selection
This study explored the relationship between AGB and remote sensing derived variables using regression models based on the equation below: where is the intercept, are model coefficients, and represents the remote sensing derived predictors. As discussed in the prior section, regression models were built in two distinct steps within the workflow (Figure 3). In the first stage regression model, the dependent variable was AGB for the 270 plots within the Huntington area and the predictors were selected from lidar derived variables using the forward variable selection method. Like prior studies [36,37], using the natural logarithm of both dependent and predictor variables led to better performance for the first stage regression model. The second component of the analysis applied Equation (1) to develop regression models for a series of different sampling strategies (described in the next section). Shown as second stage regression models in Figure 3Error! Reference source not found., these models used a sample of the lidar estimated AGB values as the dependent variable and Landsat variables as predictors without variable selection. All variables were used to facilitate comparison by ensuring all second stage regression models had the same predictors.
There are several commonly used variable selection methods when applying multiple linear regression: Forward, backward, and stepwise selection. Forward selection starts with the most

Regression and Variable Selection
This study explored the relationship between AGB and remote sensing derived variables using regression models based on the equation below: where β 0 is the intercept, β j are model coefficients, and X ij represents the remote sensing derived predictors. As discussed in the prior section, regression models were built in two distinct steps within the workflow (Figure 3). In the first stage regression model, the dependent variable was AGB for the 270 plots within the Huntington area and the predictors were selected from lidar derived variables using the forward variable selection method. Like prior studies [36,37], using the natural logarithm of Remote Sens. 2019, 11,1906 8 of 20 both dependent and predictor variables led to better performance for the first stage regression model. The second component of the analysis applied Equation (1) to develop regression models for a series of different sampling strategies (described in the next section). Shown as second stage regression models in Figure 3, these models used a sample of the lidar estimated AGB values as the dependent variable and Landsat variables as predictors without variable selection. All variables were used to facilitate comparison by ensuring all second stage regression models had the same predictors.
There are several commonly used variable selection methods when applying multiple linear regression: Forward, backward, and stepwise selection. Forward selection starts with the most significant variable in the model and sequentially adds the next most significant variable into the model until none of the remaining variables are significant. Backward selection starts with all variables in the model and successively removes the least significant variable until all the variables in the model are significant at a chosen level. Stepwise selection adds or removes one variable at each step to ensure all variables in the model are significant, while no variable outside the model is significant enough to enter the model. Forward selection was applied when building the first stage regression model because it supported the easy application of subsequent procedures.

Lidar Sampling Strategies
Two sampling strategies were adopted in this study: Systematic and classification-based sampling. In systematic sampling, combinations of 3 sampling patterns (point, strip, and grid) and 4 sampling intervals (500 m, 1000 m, 1500 m, and 2000 m) were applied to acquire 12 systematic lidar samples ( Figure 4). A northwest-southeast alignment was applied to be consistent with the airplane flight path used during the lidar acquisition. The classification-based sampling used the same sampling pattern and amount of data as the best performing systematic sampling strategy. However, instead of using a pre-defined distance interval, the classification-based sampling selected data based on the forest type distribution within the samples. instead of using a pre-defined distance interval, the classification-based sampling selected data based on the forest type distribution within the samples. Based on the 542 m lidar data acquisition swath width, a 500 × 500 m square area was chosen as the basic sampling unit at Huntington. However, given the smaller forest extent of our test site, for the Heiberg area, a 200 × 200 m square area was chosen as the basic sampling unit. By reducing the basic sampling unit at Heiberg, we kept the overall area percentage sampled consistent with the Huntington analysis. RF is a non-parametric machine learning algorithm that was implemented in this study using the "RandomForest" package [38] within the R software environment [39,40]. RF can be used for regression or classification depending on the type of variable to be estimated [41,42]. Compared with Based on the 542 m lidar data acquisition swath width, a 500 × 500 m square area was chosen as the basic sampling unit at Huntington. However, given the smaller forest extent of our test site, for the Heiberg area, a 200 × 200 m square area was chosen as the basic sampling unit. By reducing the basic sampling unit at Heiberg, we kept the overall area percentage sampled consistent with the Huntington analysis.

RF Classification of Forest Type for Classification-Based Sampling
RF is a non-parametric machine learning algorithm that was implemented in this study using the "RandomForest" package [38] within the R software environment [39,40]. RF can be used for regression or classification depending on the type of variable to be estimated [41,42]. Compared with linear regression techniques, RF has a lower bias and avoids overfitting [43][44][45][46][47]. RF grows many trees to vote for a result, which makes it insensitive to outliers and noise [44,45]. For each tree, approximately two-thirds of the original data was randomly chosen to build the tree, and the remaining data were used for estimating out-of-bag error and calculating variable importance. In this study, RF was applied to develop a forest type classification map using forest inventory plots as reference data and Landsat derived variables as predictors. The forest type classification map identified 3 classes: Hardwood, mixed, and softwood forests. Default RF parameters were applied: 500 for ntree, square foot of the total predictors for mtry, and 1 for node size.

Chi-Square Test for Selecting Classification-Based Samples
For the classification-based sampling, the sampling pattern and percentage of the sampled area were chosen according to the performance of different systematic sampling plans. Our testing demonstrated that there was a need to identify a sample that represented the overall distribution of forests within the study site. There are multiple approaches that can be used to explore the relationship between a sample and the population. The chi-square goodness of fit test is used to determine whether an observed categorical variable frequency distribution differs from an expected distribution.
where O i is the observed frequency, E i is the expected frequency, N is the total number of observations, and p i is the percentage of type i in the expected distribution. The similarity between observed and expected distribution can be inferred from the χ 2 value. Smaller χ 2 values indicate more similar distributions. In this study, forest type distribution from the sampled area was our observed distribution and forest type distribution from the whole study area was our expected distribution. We divided the study site into multiple non-overlapping strips. Using this method, we calculated χ 2 values between the whole study area and each strip based on forest type distribution. Smaller χ 2 values correspond to strips with forest type composition that was closer to the whole study site.

Accuracy Assessment for Second Stage Regression Models
We used 10-fold cross-validation to assess the quality of AGB estimation of the first stage regression model. The second stage regression models were assessed using model fitting R 2 . In addition, the Landsat AGB estimations generated from second stage regression models were compared to plot and lidar estimated AGB with accuracy reported using MAE, RMSE, and RRMSE. The plot estimated AGB was calculated from ground inventory plots and the lidar estimated AGB was the estimated AGB value generated by applying the first stage regression model to the whole area. Plot estimated AGB was considered the best estimate of actual AGB. Therefore, plot tested RMSE was given more importance in terms of model comparison.
where AGB Landsat,k is Landsat derived AGB from second stage regression models, AGB re f ,k is plot or lidar derived AGB, m is the number of validation data points (k = 1, 2, . . . , m).

Full Lidar Coverage AGB Estimation
Huntington forest inventory plots and lidar derived variables were applied to establish the first stage regression model, which was validated using a 10-fold cross validation. The regression equation for the final model selected is shown in Equation (6). This equation shows the two variables selected through the forward variable selection process: ht-P90 (90th percentile of lidar point heights) and Per-first-mean (percentage of first returns above mean return height within each plot). The model has an R 2 of 0.57, RMSE of 64.8 Mg ha −1 , and RRMSE of 34.7% from the 10-fold cross-validation. Figure 5 shows a scatter plot illustrating the relationship between the field-based plot AGB and the lidar estimated AGB for the Huntington site.  Raster layers of ht_P90 and Per_first_mean covering the whole area were created from the lidar point data. A cell size of 30 m was adopted for both raster layers to be consistent with the Landsat spatial resolution. The two raster layers were then applied in Equation (6)   Raster layers of ht_P90 and Per_first_mean covering the whole area were created from the lidar point data. A cell size of 30 m was adopted for both raster layers to be consistent with the Landsat spatial resolution. The two raster layers were then applied in Equation (6)  Raster layers of ht_P90 and Per_first_mean covering the whole area were created from the lidar point data. A cell size of 30 m was adopted for both raster layers to be consistent with the Landsat spatial resolution. The two raster layers were then applied in Equation (6) to generate a lidar estimated AGB map for Huntington (Figure 6). Lidar estimated AGB values at Huntington ranged from 0 to 784.89 Mg ha −1 with less than 0.3% of pixel values beyond the plot AGB maximum value of 440.3 Mg ha −1 .

Systematic Sampling AGB Estimation for the Huntington Area
We used the AGB data developed from the full lidar coverage using Equation (6) as a baseline to see if the Landsat-based AGB model using lidar samples could achieve accuracies that approached that of the more expensive full lidar coverage AGB estimation. Several second stage regression models were built for each sampling strategy. The model for each sampling strategy was evaluated by looking at the model fitting R 2 , MAE, RMSE, and RRMSE values calculated using both the field-based plot AGB and the lidar estimated AGB as references ( Table 5). The number of pixels applied for building the regression models is also summarized in Table 5. The first stage regression model is shown in Equation (6)  Overall, although the point sampling generally had higher R 2 values, the strip sampling approach had smaller MAE, RMSE, and RRMSE values when assessed using the field-based AGB values. Strip sampling also matched the nature of airplane flight paths, which rendered it easy to adopt from a practical viewpoint. Therefore, the strip pattern was applied for further analysis.
The location of the starting point for the systematic sampling determined the location of all subsequent samples. To evaluate the sensitivity of the AGB estimates to this starting point and examine the stability of systematic sampling, we tested five different starting points for the strip sampling using a 1500 m interval. Figure 7 illustrates the arrangement of the five systematic strip sampling layouts with 500 m swath width and a distance interval of 1500 m. Given the variability shown in these five alternatives, we also explored the variability based on a random selection of 3 of the 13 non-overlapping strips available for this property. This led to a total of 286 combinations, with plot-based RMSE values summarized in Figure 8. The plot-based RMSE values that came from randomly selecting three strips ranged from 80.1 to 102.0 Mg ha −1 .  Given the variability shown in these five alternatives, we also explored the variability based on a random selection of 3 of the 13 non-overlapping strips available for this property. This led to a total of 286 combinations, with plot-based RMSE values summarized in Figure 8. The plot-based RMSE values that came from randomly selecting three strips ranged from 80.1 to 102.0 Mg ha −1 . Given the variability shown in these five alternatives, we also explored the variability based on a random selection of 3 of the 13 non-overlapping strips available for this property. This led to a total of 286 combinations, with plot-based RMSE values summarized in Figure 8. The plot-based RMSE values that came from randomly selecting three strips ranged from 80.1 to 102.0 Mg ha −1 .

Classification-Based Sampling AGB Estimation for the Huntington Area
The second sampling approach explored a classification-based framework. We used a strip sampling structure at 1500 m distance intervals to select three strips from the forest type map. The forest type map was generated from the Landsat data using an RF classification with an out-of-bag (OBB) error rate of 18.9%. As with the systematic sampling, the Huntington study site was covered with 13,500 m wide non-overlapping strips. The distribution of strips and strip IDs are shown in Figure 8. Boxplot summarizing plot-based RMSE values from 286 possible sampling outcomes that were generated by randomly selecting 3 of the 13 strips on the Huntington site.

Classification-Based Sampling AGB Estimation for the Huntington Area
The second sampling approach explored a classification-based framework. We used a strip sampling structure at 1500 m distance intervals to select three strips from the forest type map. The forest type map was generated from the Landsat data using an RF classification with an out-of-bag (OBB) error rate of 18.9%. As with the systematic sampling, the Huntington study site was covered with 13,500 m wide non-overlapping strips. The distribution of strips and strip IDs are shown in Figure 9. In order to select strips that best represented the entire study site, the frequency of each forest type was summarized within each strip and in the full dataset and Chi-square goodness of fit values were calculated. Strips with smaller chi-square values had a forest class distribution that was closer to the full data than strips with larger chi-square values. Strips six, seven, and eight had the smallest chi-square values (Figure 10), thus were selected to provide the classification-based lidar sample. In order to select strips that best represented the entire study site, the frequency of each forest type was summarized within each strip and in the full dataset and Chi-square goodness of fit values were calculated. Strips with smaller chi-square values had a forest class distribution that was closer to the full data than strips with larger chi-square values. Strips six, seven, and eight had the smallest chi-square values (Figure 10), thus were selected to provide the classification-based lidar sample.
Lidar estimated AGB within strips six, seven, and eight were used to build a regression model with Landsat derived variables as predictors. The model results are shown in Table 6. The R 2 for the classification-based sampling was generally higher than any of the 12 systematic sampling strategies and the plot and lidar tested MAE, RMSE, and RRMSE values were generally smaller. Overall, the classification-based sampling outperformed 75% of the systematic sampling strategies. In order to select strips that best represented the entire study site, the frequency of each forest type was summarized within each strip and in the full dataset and Chi-square goodness of fit values were calculated. Strips with smaller chi-square values had a forest class distribution that was closer to the full data than strips with larger chi-square values. Strips six, seven, and eight had the smallest chi-square values (Figure 10), thus were selected to provide the classification-based lidar sample.

Testing Classification-Based Sampling for the Heiberg Data
A first stage regression model was built between plot AGB for all 43 Heiberg forest inventory plots and lidar derived variables following the same procedure used for the Huntington site. The regression model is shown in Equation (6). The two lidar variables identified through the forward selection process were the 95th percentile of lidar point heights (ht_P95) and the percentage of first returns above 5 m (Per-first-5 m). The model had an R 2 of 0.58, RMSE of 96.0 Mg ha −1 , and RRMSE of 44.7%. Raster layers for ht_P95 and Per-first-5 m were created from the Heiberg lidar points with a pixel size of 30 m. The two raster layers were applied to Equation (6) to acquire lidar estimation of AGB for Heiberg.
To test the transferability of the classification-based sampling method, we applied the procedure developed at Huntington to the Heiberg study area. The forest type classification map with three forest classes (hardwood, mixed, and softwood forests) was produced using RF based on forest inventory plot and Landsat data with an OBB error rate of 23.7%. The Heiberg site was smaller than the Huntington site, hence was divided into seven, 200 m wide strips along the flight path used to acquire the lidar data. Chi-square values were calculated between full data and each strip based on the distributions of forest type classes. As shown in Figure 11, strip two, four, and seven had the smallest chi-square values hence lidar estimated AGB values in those strips were used as the dependent variable and Landsat variables were used as predictors in the regression model.
The regression model built using the sample strips was then applied to the Landsat data covering the Heiberg study area to acquire AGB estimates. Landsat AGB estimates were tested using plot and lidar estimated AGB values (Table 7). Compared with using full lidar coverage, the classification-based sampling decreased the R 2 value from 0.58 to 0.40. Plot and lidar tested MAE, RMSE, and RRMSE values also increased. inventory plot and Landsat data with an OBB error rate of 23.7%. The Heiberg site was smaller than the Huntington site, hence was divided into seven, 200 m wide strips along the flight path used to acquire the lidar data. Chi-square values were calculated between full data and each strip based on the distributions of forest type classes. As shown in Figure 11, strip two, four, and seven had the smallest chi-square values hence lidar estimated AGB values in those strips were used as the dependent variable and Landsat variables were used as predictors in the regression model. Figure 11. Chi-square values between all data and each strip. X axis is strip name and the Y axis is chi-square value.

Discussion
In this study, we aimed to determine if samples of lidar data could be combined with forest inventory data and Landsat imagery to produce viable wall-to-wall maps of AGB. In particular, this study aimed to assess the stability of sampling techniques in order to develop a strategy to identify lidar samples that could be fused with Landsat data to estimate AGB without substantially compromising accuracy when compared to a full lidar based model. In our study, both systematic sampling and classification-based sampling were compared to AGB derived from full lidar coverage. For our main Huntington site, when compared to having full lidar coverage, the RMSE from systematic strip sampling and classification-based sampling both had a higher RMSE (by 30% or more). One possible factor to consider in reducing this difference may relate to the proportion of data [48][49][50]. In both sampling approaches, we limited samples to under 25% of the study area. Chen et al. [51] compared the fusion of QuickBird imagery and different sized lidar samples and concluded that model performance for estimating forest canopy heights increased with lidar sampled area.
Another weakness of the sampling-based approach lies in the use of multiple regression models [52,53]. Most of our plot AGB values were in a similar range but a few had much lower values that highly influenced the multiple regression model built ( Figure 5). Oversampling lower AGB plots may improve the model performance. In contrast to the AGB estimation based on full lidar coverage that used one regression model, in the sampling-based fusing approach, we used two regression models. By adding the second regression model, we introduced additional uncertainties from both Landsat data and the second statistical model [54]. In addition, the Landsat estimated AGB values from the second regression model have narrower AGB ranges compared to lidar estimated AGB values indicating the impacts of saturation. The saturation effect can be caused by the limitations of Landsat spatial, spectral, and temporal resolutions, impacts of forest type, forest structure, and topographic features [55]. The saturation effect limits the ability of Landsat to estimate high AGB levels and leads to low AGB estimation accuracy, which has been reported in other studies [4,[56][57][58]. While the RMSE-based validation characterized errors associated with the regression models they do not consider errors related to forest measurement data collection or the allometric equations applied, both of which add uncertainty to the study. We minimized uncertainty associated with the allometric equations by using species-specific equations with consideration of the applicable tree DBH range. We were unable to perform uncertainty analysis on forest measurement due to the lack of repeated tree measurement and destructive AGB measurements though this is certainly a factor to consider in future studies.
Multiple studies have performed lidar sampling, with strips being the most commonly used sampling pattern among the studies [16,19,20,59]. Sampling using data strips is consistent with the nature of airplane flight planning, which makes it a good compromise between ease of use, lower cost, and accuracy. The problem faced when using systematic sampling is inconsistency. Chen and Hay [19] stated that different lidar transects would generate different results, which is consistent with our outcomes as reported in Table 5 that shows the variability in the AGB estimates from the 12 systematic sampling strategies. Systematic sampling using strips at 1500 m intervals showed better performance in terms of RMSE than the other systematic sampling strategies at Huntington study area tested with plot and lidar estimated AGB.
Systematic sampling strategy outcomes are highly connected with site conditions, modeling technique, and the use of auxiliary data [60,61]. To test the uncertainty associated with site condition variability, we explored the impact of the systematic sampling start point for strip sampling as well as the random selection of sampling strips in terms of plot-based RMSE. Our study demonstrated that even with consistency in terms of modeling technique and auxiliary data inputs, RMSE values varied substantially (Figure 7) when we used different starting points to sample three strips at a constant distance interval. RMSE values also varied substantially when we varied the location sampled through random strip selection (Figure 8). These RMSE values varied from outperforming all other systematic sampling strategies in Table 5 to be the worst sampling strategy. With such high variability of RMSE, it is hard to discern the impact of sample percentages on the AGB estimation. From a practical standpoint, it would be almost impossible to discern which systematic strategy would return a good outcome since you cannot typically explore multiple systematic sampling combinations and would not be considering sampling if the full lidar coverage was available. In our study, there was no general trend in terms of the changes in accuracy with variation in systematic sampling intervals and sampling pattern. This variability may have been linked to differences in forest condition in different regions. Gregoire et al. [62] recommended considering the AGB gradient during the sampling stage. Although Chen and Hay [19] got similar performance from N-S and E-W direction lidar samplings, this might be attributed to the complexity of the forest ecosystem in their study site, which had no general trend in any direction. If there was a general trend shown in a site, as might be the case for plantation areas, considering sampling direction is highly recommended. Our study supported prior work that demonstrated that systematic sampling is easy to apply, but the instability of the outputs suggests it has lower transferability for AGB estimation at other sites.
We applied classification-based sampling with the goal of using readily available Landsat data to select samples for acquiring lidar data that were representative of the entire study area. Land cover is an important factor in modeling AGB [63,64] and it is easy to overlook some forest types especially over large and heterogeneously distributed areas. Zheng et al. [64] showed that developing individual regression models for each forest type could improve model accuracy. In general, hardwoods have high canopy cover resulting in more horizontal expansion compared to softwood [63,65]. Selecting lidar strips based on forest type classification result could avoid over or under representation of certain forest types. The classification-based sampling outperformed 75% of the systematic sampling strategies in Huntington study area, and more importantly, provided a means to plan lidar acquisition that was lacking in the systematic sampling approach. Adopting this method to our test Heiberg area, the classification-based sampling also worked effectively, with R 2 and RMSE values acquired from the classification-based sampling only moderately impacted when compared to the full lidar coverage model. The classification-based sampling method provides a means to substantially reduce lidar acquisition without a major compromise in accuracy while providing a preprocessing step to guide application in new study areas. The need to perform the classification does require additional analysis; however, the random nature of systematic sampling can lead to substantial, and unknown a priori, sample variability that potentially decreases transferability of this approach.

Conclusions
The framework in this study provides an approach to obtain wall-to-wall estimates of AGB by merging lidar samples with Landsat imagery and forest inventory data. We focused on AGB estimation accuracy based on systematic and classification-based lidar sampling strategies. While systematic lidar sampling can achieve promising AGB estimates and is easy to implement, there was high model outcome variability among systematic sampling strategies. Moreover, the results attained from systematic sampling strategies were highly dependent on site condition, which provides challenges in planning lidar acquisitions. Classification-based lidar sampling provides a planning framework that is more readily transferable to new sites by guiding the selection of lidar samples representative of the study site. The fusion of lidar samples and Landsat data had lower accuracies in AGB estimation compared with full lidar coverage, which can be exacerbated by the uncertainties introduced by the addition of Landsat data and the use of a second regression model. This study performed a methodical comparison of systematic and classification-based lidar sampling approaches to support AGB estimation. We propose using forest type classification as a means to guide lidar sampling selection within this field. We anticipate the results of this study could facilitate cost-effective lidar data collection for use in future studies.