Analyzing the Uncertainty of Estimating Forest Aboveground Biomass Using Optical Imagery and Spaceborne LiDAR

Accurate estimation of forest aboveground biomass (AGB) is important for carbon accounting. Forest AGB estimation has been conducted with a variety of data sources and prediction methods, but many uncertainties still exist. In this study, six prediction methods, including Gaussian processes, stepwise linear regression, nonlinear regression using a logistic model, partial least squares regression, random forest, and support vector machines were used to estimate forest AGB in Jiangxi Province, China, by combining Geoscience Laser Altimeter System (GLAS) data, Moderate Resolution Imaging Spectroradiometer (MODIS) data, and field measurements. We compared the effect of three factors (prediction methods, sample sizes of field measurements, and cross-validation settings) on the predictive quality of the methods. The results showed that the prediction methods had the most considerable effect on the prediction quality. In most cases, random forest produced more accurate estimates than the other methods. The sample sizes had an obvious effect on accuracy, especially for the random forest model. The accuracy increased with increasing sample sizes. The random forest algorithm with a large number of field measurements, was the most precise (coefficient of determination (R2) = 0.73, root mean square error (RMSE) = 23.58 Mg/ha). Increasing the number of folds within the cross-validation settings improved the R2 values. However, no apparent change occurred in RMSE for different numbers of folds. Finally, the wall-to-wall forest AGB map over the study area was generated using the random forest model.


Introduction
Forest ecosystems play a pivotal role in the global carbon cycle, and the contribution of the forest to carbon cycle is often quantified by forest aboveground biomass (AGB) [1]. AGB has been identified as a biodiversity variable suitable for estimating ecosystem functions [2]. There is particular interest in mapping forest AGB so that spatial variations in carbon stock and ecosystem functions can be monitored across a range of scales.
Forest field measurements are an essential prerequisite for biomass estimation. Most researchers assume that field measured data, which are also called reference data, are the most accurate. The field measured data are used to develop biomass estimation models, compare different biomass estimation models, and conduct uncertainty analysis [3]. Accordingly, the size of the reference data set could be a crucial factor influencing the precision of forest AGB prediction. The collection of large amounts developed through linking reference data to the potential variables derived from remotely sensed images [27]. Empirical algorithms are widely used for forest AGB estimation, which include parametric and nonparametric algorithms. Parametric algorithms assume that the relationship between biomass and prediction variables derived from remote sensing data have explicit model structures that can be specified. Examples of these are simple or multiple linear regression models, and nonlinear models. After the regression model relating the forest AGB and the dependent variables has been developed, the regional biomass can be easily predicted based on the model and the remotely sensed variables [28][29][30]. However, forest AGB is affected by many factors, and predicting the relationship between AGB and remote sensing variables is difficult using parametric algorithms. An alternative approach is to use nonparametric algorithms (e.g., K-nearest neighbors, random forest, and support vector machine) to estimate AGB with remote sensing data [31][32][33][34][35][36][37]. Nonparametric algorithms predict the model structure in a data-driven manner instead of explicitly predefining the model structure.
Uncertainty analysis for forest AGB estimates is important [38]. The root mean square error (RMSE) and the coefficient of determination (R 2 ) are often used to assess the accuracy of biomass estimates. Traditionally, the data are split into two parts: one to develop the model, and one for the evaluation. Cross-validation is widely used to assess the forest AGB estimates using reference data. One of the common types of cross-validation is k-fold cross-validation. In this method, the reference data are randomly partitioned into k equal sized subsamples. Of the k subsamples, k − 1 subsamples are used as the training data, and the remaining single subsample is retained as the validation data to test the model. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as validation data. The k results can then be averaged to produce a single estimation [39]. In practice, not all earlier studies included cross-validation or similar analyses of model performance. Instead, some studies reported the predictive accuracy for the data that were applied to fit the biomass model, for which a larger error is to be expected [40]. The value of k in cross-validation settings varied across different studies, which may affect the model diagnostics. Therefore, the model accuracy reports should be treated carefully when comparing different studies. The effect of cross-validation settings were tested in this study.
Despite many studies conducted on forest AGB estimation, various uncertainties remain in the current model estimates. Comparing the existing studies is difficult due to the diversity of data sources, modeling methods, and calculating standards. In this study, we compared the performance of prediction methods for estimating forest AGB while varying the sample size of the reference data and the cross-validation settings. Six commonly used modeling algorithms were applied to estimate AGB through the integration of ground inventory data, ICESat/GLAS measurements, and optical imagery. Our main objectives were: (1) to evaluate the influence of the modeling algorithms, sample data sizes, and cross-validation settings on biomass estimation performance; (2) compare the modeling algorithms for forest AGB density estimation for the study area; and (3) map spatially continuous regional forest AGB using the selected biomass model through a combination of forest inventory data and remote sensing data.

Study Area
This study was conducted in Jiangxi Province, Southeast China ( Figure 1). The MODIS land cover product (MOD12Q1) was applied as the vegetation classification data, as shown in Figure 1 [41]. This province is located at 24 • 29 to 30 • 04 N and from 113 • 34 to 118 • 28 E. Jiangxi Province has an area of 1.67 × 10 5 km 2 and a population of 45 million. It is located in a subtropical moisture monsoon climate zone and has an annual average temperature of~16.3-17.5 • C and annual total precipitation of~1341-1943 mm. The topography of Jiangxi is characterized by mountainous regions in the east, south, and west; plain and drainage in the north; and hilly and basin areas in the central part. Jiangxi Province currently has the second largest total forested area in China. According to the latest forest survey data, Jiangxi Province has a forest area of 1.021 × 10 5 km 2 , with rich vegetation and typical in the east, south, and west; plain and drainage in the north; and hilly and basin areas in the central part. Jiangxi Province currently has the second largest total forested area in China. According to the latest forest survey data, Jiangxi Province has a forest area of 1.021 × 10 5 km 2 , with rich vegetation and typical subtropical evergreen broadleaf forests. Needle leaf forests, mixed needle and broadleaf forests, evergreen broadleaf forests, broadleaf mixed evergreen and deciduous forests, bamboo forests, and shrublands dominate the forested land.

Field Measurements
Forest reference data are critical for biomass estimation. In this study, the reference data were obtained from the Jiangxi Province section of National Forest Resource Inventory database for China [42]. The dataset was collected around 2006 ( Figure 1). The plots were uniformly distributed over the forested land. Portable global positioning system (GPS) instruments and static differential GPS are widely used in field investigation [42]. The positioning error of the GPS was less than 7 m. Tree species were identified and the diameter at breast height (dbh) was measured for every tree with a dbh > 5 cm. Tree height was measured for 3-5 average trees in each plot. Heights were estimated by height-diameter models for trees that had not been measured in height. The biomass of each tree in the plots was estimated via allometry models for each tree species [43]. The biomass of each plot was the sum of the biomass predictions of all trees in the plot.

LiDAR Data
The GLAS staged in the ICESat emits a 1064 nm laser pulse at an elliptical ground footprint of ~64 m in diameter, and the echo waveform of the laser pulse is recorded by the sensor system [44]. The ellipsoidal footprints of ~65 m in diameter were spaced every 170 m along the track and at tens of kilometers across tracks [45]. In this study, we selected GLAS data from the operating periods L2B and L3A (from February 2006 to November 2006) for our forest AGB mapping procedure. Two GLAS products, GLA01 and GLA14, were downloaded from the ICESat/GLAS data pool and we recorded the full-waveform, surface elevation, location, and data quality for each laser shot. Four criteria were applied to the GLAS footprints to ensure their quality: (1) cloud free, (2) no saturation

Field Measurements
Forest reference data are critical for biomass estimation. In this study, the reference data were obtained from the Jiangxi Province section of National Forest Resource Inventory database for China [42]. The dataset was collected around 2006 ( Figure 1). The plots were uniformly distributed over the forested land. Portable global positioning system (GPS) instruments and static differential GPS are widely used in field investigation [42]. The positioning error of the GPS was less than 7 m. Tree species were identified and the diameter at breast height (dbh) was measured for every tree with a dbh > 5 cm. Tree height was measured for 3-5 average trees in each plot. Heights were estimated by height-diameter models for trees that had not been measured in height. The biomass of each tree in the plots was estimated via allometry models for each tree species [43]. The biomass of each plot was the sum of the biomass predictions of all trees in the plot.

LiDAR Data
The GLAS staged in the ICESat emits a 1064 nm laser pulse at an elliptical ground footprint of 64 m in diameter, and the echo waveform of the laser pulse is recorded by the sensor system [44]. The ellipsoidal footprints of~65 m in diameter were spaced every 170 m along the track and at tens of kilometers across tracks [45]. In this study, we selected GLAS data from the operating periods L2B and L3A (from February 2006 to November 2006) for our forest AGB mapping procedure. Two GLAS products, GLA01 and GLA14, were downloaded from the ICESat/GLAS data pool and we recorded the full-waveform, surface elevation, location, and data quality for each laser shot. Four criteria were applied to the GLAS footprints to ensure their quality: (1) cloud free, (2) no saturation effects, (3) greater signal to noise ratios (SNRs), and (4) not significantly higher than the land surface elevation [46].
Pre-processing included error elimination, data decompression, voltage conversion, and filtering. The waveform was filtered using wavelet transform, which removes high-frequency noise and smooths the data. The leading edge extent, trailing edge extent, and waveform extent were computed from the full-waveform information of each laser shot. These three variables have been shown to be correlated with canopy height variability, terrain slope, and canopy height, respectively [47]. The methods used here to extract the GLAS waveform characteristics and correct the terrain followed those used by Xi et al. [22]. The wavelet analysis method was used to position the peak of the waveform and to extract waveform length information of the vegetation height peak.

MODIS Normalized Difference Vegetation Index Product (MOD13Q1)
Both the MODIS and Landsat TM data were downloaded and analyzed. Cloud-free, good-quality TM images that covered the study area in 2006 were lacking. The correlation coefficients between biomass and TM-Normalized Difference Vegetation Index (NDVI) were much less than MODIS-NDVI during March and September 2006. Therefore, we selected MODIS-Terra MOD13Q1 data as the source of the NDVI data. MOD13Q1 data are provided every 16 days at a 250-m spatial resolution. Jiangxi Province is covered by 3 MODIS path/row tiles (h28v05, h28v06, and h27v06). The data were corrected for the geometric and radiometric distortions of the images to the standard Level 1G before delivery. In this study, to explore the effectiveness of seasonal time series data for estimating forest AGB, the data from 15 MODIS images were acquired in different seasons (spanning from 81st to the 305th day of the year) for each path/row tile.

MODIS Percent Tree Cover Product (MOD44B)
The vegetation continuous fields collection (MOD44B) was used as the source of tree cover percentage data, which contains proportional estimates for woody vegetation cover types. Tree cover is predicted based on multi-temporal metrics for a full year using a regression tree model [48]. The RMSE is reported to be 9.06% for this collection. These data were included as an explanatory variable for the correlation between forest AGB and coverage.

Methods
The methodology used to estimate forest AGB in this study is shown in Figure 2. The major steps include (1) extrapolating GLAS metrics and selecting MODIS variables (Section 3.1), (2) the creation of subsamples using bootstrapping technique (Section 3.2), (3) estimating forest AGB using different algorithms (Section 3.3), (4) comparing and evaluating the forest AGB estimating results, and (5) mapping the forest AGB over Jiangxi Province using the selected AGB algorithm.

Variables from GLAS and MODIS Data for Biomass Estimation
Since the spatial distance between two GLAS footprints was large, the probability of the collected reference data overlapping GLAS footprints was very small. To relate the GLAS variables to the reference data, three variables (leading edge extent, trailing edge extent, and waveform extent) were estimated into spatial continuous data using the random forest (RF) algorithm. Auxiliary predictors such as cumulative NDVI, elevation, slope, climate factors, and vegetation map were used in the RF modeling process. Detailed procedures used to estimate the three GLAS-derived variables can be found in Su et al. [8] and Lefsky [47], so they will not be discussed here.

Variables from GLAS and MODIS Data for Biomass Estimation
Since the spatial distance between two GLAS footprints was large, the probability of the collected reference data overlapping GLAS footprints was very small. To relate the GLAS variables to the reference data, three variables (leading edge extent, trailing edge extent, and waveform extent) were estimated into spatial continuous data using the random forest (RF) algorithm. Auxiliary predictors such as cumulative NDVI, elevation, slope, climate factors, and vegetation map were used in the RF modeling process. Detailed procedures used to estimate the three GLAS-derived variables can be found in Su et al. [8] and Lefsky [47], so they will not be discussed here.
For the NDVI data, most existing studies chose the NDVI in peak season to estimate forest AGB; however, NDVI data collected at different times may provide different information regarding forest structure. Therefore, the capabilities of NDVI before or after peak season for forest AGB estimation should also be investigated. In this paper, correlation analysis was used to examine the relationships between ground-measured forest AGB and NDVI at different times. Figure 3 shows the correlation coefficients (r) between the forest field inventory biomass data and NDVI values at different times. The NDVI values in spring and autumn were more closely related to AGB than in the peak season in May, June, July, and August. In the peak season, the saturation issue for canopies with large density reduced the correlation between NDVI and AGB. Although vegetation indices in the months of the peak growth season have often been used in previous studies, they may not be the reasonable choice for biomass mapping. Data in March before the green up date and in November after the leaf senescence date are more closely related to AGB, because they contain information on tree stems and branches, and their density is related to AGB. In this study, the r value between NDVI in spring and autumn and AGB were larger ( Figure 3). Therefore, NDVI in spring and autumn were selected as potential variables in the forest AGB prediction model. For the NDVI data, most existing studies chose the NDVI in peak season to estimate forest AGB; however, NDVI data collected at different times may provide different information regarding forest structure. Therefore, the capabilities of NDVI before or after peak season for forest AGB estimation should also be investigated. In this paper, correlation analysis was used to examine the relationships between ground-measured forest AGB and NDVI at different times. Figure 3 shows the correlation coefficients (r) between the forest field inventory biomass data and NDVI values at different times. The NDVI values in spring and autumn were more closely related to AGB than in the peak season in May, June, July, and August. In the peak season, the saturation issue for canopies with large density reduced the correlation between NDVI and AGB. Although vegetation indices in the months of the peak growth season have often been used in previous studies, they may not be the reasonable choice for biomass mapping. Data in March before the green up date and in November after the leaf senescence date are more closely related to AGB, because they contain information on tree stems and branches, and their density is related to AGB. In this study, the r value between NDVI in spring and autumn and AGB were larger ( Figure 3). Therefore, NDVI in spring and autumn were selected as potential variables in the forest AGB prediction model.
Finally, six predictors were selected based on earlier experience and the literature review, including three GLAS variables, two NDVI, and the tree cover percent data. The value of these six predictors at the site of the reference data was estimated, and a 801 × 7 matrix of responses and predictor variables was prepared. The first column is reference data, and columns from the second to the seventh are the six predictors. There were 801 reference data points, corresponding to 801 rows. Remote Sens. 2019, 11, x FOR PEER REVIEW 7 of 17 Finally, six predictors were selected based on earlier experience and the literature review, including three GLAS variables, two NDVI, and the tree cover percent data. The value of these six predictors at the site of the reference data was estimated, and a 801 × 7 matrix of responses and predictor variables was prepared. The first column is reference data, and columns from the second to the seventh are the six predictors. There were 801 reference data points, corresponding to 801 rows.

Creation of Subsamples
Step 1: The matrix of the response and predictor variables was first ordered according to the biomass values in the first column. Then, the matrix was subdivided into five equal size subgroups. The size of the five subgroups was n/5, where n = 801.
Step 2: Bootstrapping was used to create subsamples from the stratified dataset in step 1 [49]. Four sample size classes were generated to test the influence of sample size on the performance of the prediction models. For each of the four desired sample size classes (nclass1 = n/4, nclass2 = n/3, nclass3 = n/2, nclass4 = n), 500 datasets were generated using replacement. The stratification procedure in step 1 ensured that samples from the full range of biomass values were included in each bootstrapped dataset.

Biomass Modeling Methods
For each sample size class, the 500 input datasets were fitted by six prediction algorithms: Gaussian processes (GPs), stepwise linear models (LMSTEP), partial least squares regression (PLS), nonlinear regression using a logistic model (NRL), random forest (RF), and support vector machine (SVM).
GPs provide a probabilistic approach for learning generic regression problems with kernels [50] and use a weighting strategy for the optimization, which is relevant to the predictor variables. The inverse of the specific variable controlling the spread of the relations for each particular predictor represents the relevance of that predictor variable. That is, the larger the variable, the more extended the relationships along that predictor (i.e., containing less informative content).
LMSTEP regression removes the least correlated variables using an iterative strategy, and finally obtains a regression model with the variables that most accurately explain the forest AGB [51]. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some pre-specified criterion, such as adjusted R 2 and the Akaike information criterion.
PLS is based on linear transitions from several original variables to relatively few orthogonal factors or latent variables. It is useful for establishing predictive models when the predictor

Creation of Subsamples
Step 1: The matrix of the response and predictor variables was first ordered according to the biomass values in the first column. Then, the matrix was subdivided into five equal size subgroups. The size of the five subgroups was n/5, where n = 801.
Step 2: Bootstrapping was used to create subsamples from the stratified dataset in step 1 [49]. Four sample size classes were generated to test the influence of sample size on the performance of the prediction models. For each of the four desired sample size classes (nclass1 = n/4, nclass2 = n/3, nclass3 = n/2, nclass4 = n), 500 datasets were generated using replacement. The stratification procedure in step 1 ensured that samples from the full range of biomass values were included in each bootstrapped dataset.

Biomass Modeling Methods
For each sample size class, the 500 input datasets were fitted by six prediction algorithms: Gaussian processes (GPs), stepwise linear models (LMSTEP), partial least squares regression (PLS), nonlinear regression using a logistic model (NRL), random forest (RF), and support vector machine (SVM).
GPs provide a probabilistic approach for learning generic regression problems with kernels [50] and use a weighting strategy for the optimization, which is relevant to the predictor variables. The inverse of the specific variable controlling the spread of the relations for each particular predictor represents the relevance of that predictor variable. That is, the larger the variable, the more extended the relationships along that predictor (i.e., containing less informative content).
LMSTEP regression removes the least correlated variables using an iterative strategy, and finally obtains a regression model with the variables that most accurately explain the forest AGB [51]. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some pre-specified criterion, such as adjusted R 2 and the Akaike information criterion.
PLS is based on linear transitions from several original variables to relatively few orthogonal factors or latent variables. It is useful for establishing predictive models when the predictor variables are collinear [52]. As metrics extracted from GLAS may be collinear, a PLS model was developed.
NRL should not be confused with the binomial logistic regression that is often used with categorical dataset. The mathematical form of the NRL model is given by [53].
SVM is a typical nonparametric algorithm [54] that transforms a nonlinear regression into a linear regression using a kernel function. In this paper, the radial basis function kernel was selected for forest AGB prediction, since it requires fewer parameters and can reduce the complexity of numerical calculation. RF regression has been successfully used in ecological remote sensing areas such as biomass estimation. RF constructs numerous independent regression trees, and a bootstrapping sample is Remote Sens. 2019, 11, 722 8 of 16 chosen for each regression tree. The regression tree continues to grow until it reaches the largest size. The results are estimated by averaging the predictions of all regression trees [55].
For each of the prediction algorithms, as well as each sample size class, k-fold cross-validation was performed, which produced the diagnostics RMSE and coefficient of determination (R 2 ). In this study, cross-validation was applied with three settings (3-fold, 5-fold, and 10-fold) to test the influence of different settings on the results. The relative error index was calculated using the following equation to measure the accuracy of prediction: where y i * is the predicted AGB, y i is the observed value, and n is the number of validation samples.
Analysis of variance (ANOVA) was used to estimate the contribution of each of the three factors (prediction method, sample size, and number of folds) and their interactions to the model diagnostics (R 2 and RMSE) [56].

ANOVA Results
The maximum sum of squares values are the reported for the prediction method (76.34) and sample size (12.65) as well as for their interaction (18.52) ( Table 1). The sum of squares value indicates the contribution of the factor in describing the variance of R 2 and RMSE; a larger value indicates a more important contribution. The number of folds in the k-fold cross-validation settings was less important than the prediction method and sample size, as indicated by the smaller sum of squares (3.58). ANOVA results for RMSE showed the same pattern, with the prediction model and sample size as well as their interaction having the larger sum of squares values (5013184, 22048, and 53584, respectively). The number of folds did not notably contribute to the model diagnostics (the sum of squares value was 148 for RMSE).

Model Performance
Figures 4 and 5 summarize the model performance obtained from all model runs. The violin plots in Figure 4 illustrate the distribution of the R 2 values obtained by five-fold cross-validation for the six methods. Each of the methods had four sample size classes. The R 2 values for each of the four sample size classes were obtained from 500 bootstrapped models. The median R 2 values for each of the four sample size classes are presented with colored horizontal stripes. Class 1 to 4 from left to right are shown in red, green, orange, and blue respectively. The violin plots in Figure 5 illustrate the distribution of the RMSE values, and explanations follow those for Figure 4. A comparison of the six prediction methods indicated that RF provided more accurate AGB estimates than the other five methods, especially when a larger sample size was available. Concerning the sample size, a stable increase in accuracy and decrease in RMSE were observed along with an increasing number of sample units (colored horizontal stripes from left to right). Exceptions were the RMSE values of the GP and PLS methods, where a slight increase in RMSE was observed when switching from class 3 to class 4 with the largest number of sample units. The absolute differences in R 2 and RMSE between RF and the second model ranged from 0.04 and 23.78 Mg/ha (class 1, second model GP) to 0.18 and 32.16 Mg/ha (class 4, second model GP). The absolute differences in R 2 and RMSE between RF and the least accurate prediction model (PLS) were 0.24 and 40.02 Mg/ha, respectively. GP and SVM produced larger R 2 and smaller RMSE when compared to NRL, PLS, and LMSTEP in most cases. the six methods. Each of the methods had four sample size classes. The R values for each of the four sample size classes were obtained from 500 bootstrapped models. The median R 2 values for each of the four sample size classes are presented with colored horizontal stripes. Class 1 to 4 from left to right are shown in red, green, orange, and blue respectively. The violin plots in Figure 5 illustrate the distribution of the RMSE values, and explanations follow those for Figure 4. A comparison of the six prediction methods indicated that RF provided more accurate AGB estimates than the other five methods, especially when a larger sample size was available. Concerning the sample size, a stable increase in accuracy and decrease in RMSE were observed along with an increasing number of sample units (colored horizontal stripes from left to right). Exceptions were the RMSE values of the GP and PLS methods, where a slight increase in RMSE was observed when switching from class 3 to class 4 with the largest number of sample units. The absolute differences in R 2 and RMSE between RF and the second model ranged from 0.04 and 23.78 Mg/ha (class 1, second model GP) to 0.18 and 32.16 Mg/ha (class 4, second model GP). The absolute differences in R 2 and RMSE between RF and the least accurate prediction model (PLS) were 0.24 and 40.02 Mg/ha, respectively. GP and SVM produced larger R 2 and smaller RMSE when compared to NRL, PLS, and LMSTEP in most cases.   sample size classes were obtained from 500 bootstrapped models. The median R 2 values for each of the four sample size classes are presented with colored horizontal stripes. Class 1 to 4 from left to right are shown in red, green, orange, and blue respectively. The violin plots in Figure 5 illustrate the distribution of the RMSE values, and explanations follow those for Figure 4. A comparison of the six prediction methods indicated that RF provided more accurate AGB estimates than the other five methods, especially when a larger sample size was available. Concerning the sample size, a stable increase in accuracy and decrease in RMSE were observed along with an increasing number of sample units (colored horizontal stripes from left to right). Exceptions were the RMSE values of the GP and PLS methods, where a slight increase in RMSE was observed when switching from class 3 to class 4 with the largest number of sample units. The absolute differences in R 2 and RMSE between RF and the second model ranged from 0.04 and 23.78 Mg/ha (class 1, second model GP) to 0.18 and 32.16 Mg/ha (class 4, second model GP). The absolute differences in R 2 and RMSE between RF and the least accurate prediction model (PLS) were 0.24 and 40.02 Mg/ha, respectively. GP and SVM produced larger R 2 and smaller RMSE when compared to NRL, PLS, and LMSTEP in most cases.   A downside was that RF showed the largest variances in the obtained R 2 values when four classes of sample size were all used (Figure 4). However, the ranges of the variation in R 2 and RMSE reduced when the sample sizes increased ( Figure 6).
The modeling accuracy results of the six prediction methods created with different cross-validation settings are presented in Figures 7 and 8. The violin plots in Figures 7 and 8 illustrate the distribution of R 2 and RMSE values, respectively, from the 500 bootstrapped models as obtained by 3-fold, 5-fold, and 10-fold cross-validation for each method using the class 4 dataset. The median values of R 2 or RMSE for 3-fold, 5-fold, and 10-fold cross-validation are presented from left to right in red, green and orange horizontal stripes. In most cases, a larger number of folds led to an obvious increase in R 2 . However, the difference in RMSE resulting from the different number of folds was very small (Figure 8). A downside was that RF showed the largest variances in the obtained R 2 values when four classes of sample size were all used (Figure 4). However, the ranges of the variation in R 2 and RMSE reduced when the sample sizes increased (Figure 6). The modeling accuracy results of the six prediction methods created with different cross-validation settings are presented in Figures 7 and 8. The violin plots in Figures 7 and 8 illustrate the distribution of R 2 and RMSE values, respectively, from the 500 bootstrapped models as obtained by 3-fold, 5-fold, and 10-fold cross-validation for each method using the class 4 dataset.
The median values of R 2 or RMSE for 3-fold, 5-fold, and 10-fold cross-validation are presented from left to right in red, green and orange horizontal stripes. In most cases, a larger number of folds led to an obvious increase in R 2 . However, the difference in RMSE resulting from the different number of folds was very small (Figure 8).   The modeling accuracy results of the six prediction methods created with different cross-validation settings are presented in Figures 7 and 8. The violin plots in Figures 7 and 8 illustrate the distribution of R 2 and RMSE values, respectively, from the 500 bootstrapped models as obtained by 3-fold, 5-fold, and 10-fold cross-validation for each method using the class 4 dataset.
The median values of R 2 or RMSE for 3-fold, 5-fold, and 10-fold cross-validation are presented from left to right in red, green and orange horizontal stripes. In most cases, a larger number of folds led to an obvious increase in R 2 . However, the difference in RMSE resulting from the different number of folds was very small (Figure 8).  To validate the prediction accuracy of the different methods, the mean relative error indices were calculated ( Table 2). The sample sizes in class 1 to class 4 were n/4, n/3, n/2, and n respectively. A prediction technique with a smaller relative error is more accurate. The results showed that RF was more accurate than the other methods, followed by GPs, NRL, SVM, LMSTEP, and PLS in sequence. The relative error decreased when the sample sizes increased in the RF method, but this To validate the prediction accuracy of the different methods, the mean relative error indices were calculated ( Table 2). The sample sizes in class 1 to class 4 were n/4, n/3, n/2, and n respectively. A prediction technique with a smaller relative error is more accurate. The results showed that RF was more accurate than the other methods, followed by GPs, NRL, SVM, LMSTEP, and PLS in sequence. The relative error decreased when the sample sizes increased in the RF method, but this trend was not obvious for the other methods. Table 2. The means of relative error for Gaussian processes (GPs), stepwise linear models (LMSTEP), nonlinear regression using a logistic model (NRL), partial least squares regression (PLS), random forest (RF), and support vector machine (SVM).

Mapping Forest Aboveground Biomass
The spatial distribution of forest AGB over the study area is shown in Figure 9, which was obtained using the RF model with the largest number of sample units. The forest AGB density was distributed mainly within the 60-180 Mg/ha. Figure 9 shows clear spatial patterns in the study area of the AGB values. Greater AGB values were distributed in South and Northeast Jiangxi, which are mountainous areas. The total AGB of all forest types estimated with RF was 9.19 × 10 8

Discussion
Although many studies of forest AGB estimation have been conducted, substantial uncertainties remain in the current estimations. In this study, we analyzed the effects of the prediction method (six cases), sample size (four cases), and cross-validation (three cases) on the

Discussion
Although many studies of forest AGB estimation have been conducted, substantial uncertainties remain in the current estimations. In this study, we analyzed the effects of the prediction method (six cases), sample size (four cases), and cross-validation (three cases) on the prediction quality of forest AGB estimation from LiDAR and optical remote sensing data. We generated subsamples using stratified bootstrapping. The ANOVA method was used to rank the importance of the three factors on the accuracy of the biomass predictions reflected in the computed diagnostics (RMSE and R 2 ).
The ANOVA results showed that the prediction method was the most important factor affecting the quality of the biomass estimation. This is in line with earlier studies [33,57,58] that reported notable differences in the performance of the prediction methods.
The performance of the six prediction methods was compared in this study, including three parameter methods (LMSTEP, PLS, and NRL) and three nonparametric methods (GPs, RF, and SVM). Among the parameter methods, LMSTEP and PLS are classified as linear regression and NRL is classified as nonlinear regression. RF provided the greatest estimation accuracy in terms of the largest R 2 , the smallest RMSE, and the smallest relative error, especially when a larger sample size was used. This result complements those of other studies [10,40,57,58]. One of the main advantages of RF is its flexibility due to its conceptual design, which differs from all the other five models. A downside of RF may be that when a small size sample is applied in the algorithm, the variance of estimates is considerable. An increase in sample size can lead to a reduction in the range of variation. Among the three nonparametric algorithms in this study, GPs generated suboptimal predicted results in terms of R 2 , RMSE, and relative error. The SVM showed the least favorable accuracy, which is consistent with the conclusions from recent studies [59,60].
The linear regression methods LMSTEP and PLS, provided less accurate estimates than the nonparametric algorithms RF, GPs, and SVM, in most cases, which was in line with the findings of other studies [34]. In addition, no improvement occurred when larger size samples were used for these two methods. This can be explained by the fact that the relationship between remote sensing predictors and observed forest biomass is likely to be nonlinear and therefore not well modeled. Linear regression based on remote sensing data may be more suitable for predicting vegetation AGB in other biomes such as grasslands. The NRL was more accurate than LMSTEP and PLS in terms of relative error.
The sample size was less important than the prediction method for explaining the variance of model diagnostics RMSE and R 2 . The R 2 values increased along with the increasing number of sample units for the nonparametric algorithms, especially for RF. The RMSE decreased with increasing number of sample units, except for GPs from class 3 to class 4. Therefore, model estimates can be improved by increasing the sample size rather than improving the prediction algorithms. Since field data are costly to obtain and limited, other choices such as realistic synthetic datasets present an interesting alternative to field data [61]. Huang et al. [23] constructed allometric models using limited field plots and transmitted the ecological information to GLAS data. They enriched the AGB samples into the number of GLAS spots. They also stated that more training samples may be helpful in developing a robust RF model for mapping forest AGB, despite the uncertainties introduced by AGB in GLAS spots.
The number of folds was found to have an effect on the explained variance in R 2 . Increasing the number of folds within the cross-validation settings improved the R 2 values, especially for the three nonparametric algorithms (RF, GPs, and SVM). For model runs with a large number of folds (for example, 10 folds) may produce larger R 2 values, which overestimate the predictive power of the models. We concluded that the small sample size of the hold out sample during the cross-validation using a large number of folds caused this result. Most previous studies ignored the influence of the number of folds. When the results from different studies are compared, the number of folds in the cross-validation settings should be considered. RMSE values were more robust than R 2 values to the validation settings, and no apparent change occurred in RMSE under different numbers of folds.
The results should be interpreted cautiously as the study was limited to one site. This conclusion may change if other sites with different forest structure, tree species, and topography are tested.
In addition, ANOVA assumes that the dependent variable is subjected to non-negligible error. The response variables in this study were sample-based estimates subjected to sampling variability. ANOVA was used as a diagnostic tool in this study, but the significance level was small. Furthermore, the allometric models led to uncertainties in biomass estimates. We tried to improve the accuracy via using allometric models for each tree species of the study area. However, there were still some uncertainties because tree densities, soil texture, and climate conditions may influence the growth of tree height and dbh, thus influencing the accumulation of biomass.
Further studies could investigate whether other datasets, such as radar, could alter our finding. An additional research issue to be tackled is the comparison of methods relative to the width of the confidence intervals rather than to the quality of fit of the models to the data. Considering the limitations of the available reference samples, improving the statistical algorithms or developing a new prediction method would be helpful for minimizing the predictive error of forest AGB. We are now developing a new prediction algorithm, called high accuracy surface modeling, which has the potential to improve the prediction accuracy.

Conclusions
We illustrated the forest AGB estimation through a comparative analysis of different prediction methods (GPs, NRL, LMSTEP, PLS, RF, and SVM) based on field measurements, LiDAR and optical data. The effects of the prediction method, sample size, and cross-validation setting were compared and analyzed using two remote sensing datasets combined with stratified bootstrapping. We found that the prediction method had a considerable effect on the accuracy of the biomass estimates, being more important than sample size. The number of folds in the cross-validation setting was a less important factor than prediction method and sample size. The model diagnostic R 2 was influenced notably by the number of folds. The estimate produced with the RF algorithm was more accurate than the other five AGB models in this study. In most cases, the increase in the sample size led to an increase in the accuracy of the estimation. The RF algorithm notably benefits from a larger sample size. Finally, the forest AGB was mapped using the RF model with all the reference data. Choosing an appropriate prediction method is recommended to improve the predictive accuracy. The influence of the number of folds should be considered when comparing model performance.