Estimating Forest Volume and Biomass and Their Changes Using Random Forests and Remotely Sensed Data

: Despite the popularity of random forests (RF) as a prediction algorithm, methods for constructing conﬁdence intervals for population means using this technique are still only sparsely reported. For two regional study areas (Spain and Norway) RF was used to predict forest volume or aboveground biomass using remotely sensed auxiliary data obtained from multiple sensors. Additionally, the changes per unit area of these forest attributes were estimated using indirect and direct methods. Multiple inferential frameworks have attracted increased recent attention for estimating the variances required for conﬁdence intervals. For this study, three di ﬀ erent statistical frameworks, design-based expansion, model-assisted and model-based estimators, were used for estimating population parameters and their variances. Pairs and wild bootstrapping approaches at di ﬀ erent levels were compared for estimating the variances of the model-based estimates of the population means, as well as for mapping the uncertainty of the change predictions. The RF models accurately represented the relationship between the response and remotely sensed predictor variables, resulting in increased precision for estimates of the population means relative to design-based expansion estimates. Standard errors based on pairs bootstrapping within or internal to RF were considerably larger than standard errors based on both pairs and wild external bootstrapping of the entire RF algorithm. Pairs and wild external bootstrapping produced similar standard errors, but wild bootstrapping better mimicked the original structure of the sample data and better preserved the ranges of the predictor variables.


Introduction
National forest inventories (NFIs) were initiated to collect information that could be used to ensure sustainable use of wood resources. As the forest functions of interest broadened, a substantial number of new variables were introduced to NFI field surveys [1] with growing forest stock volume and biomass now among the most commonly assessed. These variables are of fundamental importance for ensuring maintenance of wood production, formulating and implementing global forest policy, assessing sustainability, and improving decision-making regarding management decisions such as harvests. Traditionally, data obtained for ground plots that are spatially distributed using statistical sampling designs have satisfied the need for accurate forest parameter estimates, albeit at great cost data. The Norwegian data included plot-level biomass datasets for two times for the same plots and temporally consistent ALS data (1999,2010). Mean volume change per unit area was estimated with indirect methods for the Spanish study area and mean biomass change per unit area with direct methods for the Norwegian study area. Evaluation of the methods for two climatically different study areas, two response variables, and two approaches to estimating changes contributes to the robustness of the results and conclusions.

La Rioja, Spain
The study area is located in the La Rioja region of northern Spain ( Figure 1) and consists of Scots pine (Pinus sylvestris L.) forests. According to the Spanish National Forest Inventory (SNFI), it is the second largest species formation in La Rioja, representing an area of approximately 27,000 hectares (ha), predominantly in zones above 1000 m elevation which cover the highest areas of the mountain regions. In general, La Rioja forests consist of pure and dense stands and were repopulated mainly due to the importance of their timber production. In addition, human depopulation of rural areas during recent decades has led to the natural expansion of the species. A Scots pine species mask was constructed based on the Spanish National Forest Map (SNFM25) (scale: 1:25,000). Forest land polygons of 1 ha minimum size in which Scots pine was the most dominant species were selected, resulting in a set of areas in which Scots pine was either monospecific or dominant but mixed with other species. The data used to construct the RF volume models were obtained from two different surveys (Table 1): One characterized as La Rioja 2010 which consisted of ground data for a systematic sample of plots from the 4th SNFI and the first ALS acquisition (Section 2.1.1, first campaign), and the other characterized as La Rioja 2016 which consisted of ground data for a purposive sample of plots and the second ALS acquisition (Section 2.1.2, second campaign).

La Rioja Sampling Design
•

First campaign
The 4th SNFI in La Rioja was conducted between 2011 and 2012 using permanent sample plots established systematically at the intersections of a 1 × 1 km grid in areas identified as forest land by the SNFM25 [36]. The SNFI sample units consist of four circular concentric plots of radius 5, 10, 15,

La Rioja, Spain
The study area is located in the La Rioja region of northern Spain ( Figure 1) and consists of Scots pine (Pinus sylvestris L.) forests. According to the Spanish National Forest Inventory (SNFI), it is the second largest species formation in La Rioja, representing an area of approximately 27,000 hectares (ha), predominantly in zones above 1000 m elevation which cover the highest areas of the mountain regions. In general, La Rioja forests consist of pure and dense stands and were repopulated mainly due to the importance of their timber production. In addition, human depopulation of rural areas during recent decades has led to the natural expansion of the species. A Scots pine species mask was constructed based on the Spanish National Forest Map (SNFM25) (scale: 1:25,000). Forest land polygons of 1 ha minimum size in which Scots pine was the most dominant species were selected, resulting in a set of areas in which Scots pine was either monospecific or dominant but mixed with other species.
The data used to construct the RF volume models were obtained from two different surveys (Table 1): One characterized as La Rioja 2010 which consisted of ground data for a systematic sample of plots from the 4th SNFI and the first ALS acquisition (Section 2.1.1, first campaign), and the other characterized as La Rioja 2016 which consisted of ground data for a purposive sample of plots and the second ALS acquisition (Section 2.1.2, second campaign). The 4th SNFI in La Rioja was conducted between 2011 and 2012 using permanent sample plots established systematically at the intersections of a 1 × 1 km grid in areas identified as forest land by the SNFM25 [36]. The SNFI sample units consist of four circular concentric plots of radius 5, 10, 15, and 25 m. Trees with diameter at breast height (DBH) of at least 75 mm are measured in a 5 m radius plot; trees with DBH >125 mm are measured in a 10 m radius plot; trees with DBH >225 mm are measured in a 15 m radius plot; and only trees with a DBH >425 mm are measured in the 25 m radius plot [36,37]. For each tree, DBH, total height, species, and the tree's position relative to the plot center (direction and distance) are recorded. Species-specific allometric models were used to predict individual tree volumes which were weighted to predict the total plot stock volume [31]. We ignored the uncertainty associated with individual tree allometric volume predictions based on the findings of McRoberts et al. [38,39]. When the allometric models are based on large calibration datasets and when the model fits the data well, the effects of this model prediction uncertainty are small relative to the effects of plot-to-plot sampling variability. However, if smaller calibration samples sizes are used as in La Rioja 2016, the final uncertainty value could be underestimated.
All SNFI plots in the study area were filtered by selecting those for which the forest class under consideration was dominant (more than 80% of total volume). Furthermore, only plots whose maximum height (the height of the tallest tree, hmax) were consistent with the 99th height percentile, derived from ALS 2010, were considered due to inaccuracy of plot center coordinates (5-15 m [40,41]). This inaccuracy could lead to discrepancies between the field measured attribute and the population unit remote sensing metrics. SNFI plots were removed from the analysis when their respective hmax was different in the range of plus/minus 4 m with respect to the 99th height percentile. The resulting sample after filtering by both criteria consisted of 155 field plots.

Second campaign
Field data for the second survey were acquired during the summer of 2018 for 49 circular plots with a 14.1 m radius. The center of each plot was recorded using a dual-frequency global navigation satellite system receiver GEO7x (Trimble ® GNSS) with sub-meter horizontal accuracy (0.4 m) after post-processing correction. DBH and species were recorded for all trees with DBH >75 mm. Height was measured for six randomly selected trees per plot. The heights of the remaining trees for each plot were predicted using a generalized height-diameter model based on the tree heights measured on all field plots. The total plot stock volume was predicted using the allometric models constructed by Crecente-Campo et al. [42] for Scots pine using 2682 plots located in the major mountain ranges of Spain.
The 49 circular plots were purposively selected considering the variability of the forest stands and their distribution over the study area. The aim was to achieve a specific distribution that encompasses different ranges and variability of forest structural attributes including height, forest canopy cover, canopy relief ratio, elevation, and slope. Additionally, these plots were selected based on their accessibility for field work, favoring those areas closest to forest tracks.

Remotely Sensed Data
ALS data were acquired in 2010 and 2016 during leaf-on conditions by the Spanish National Programme of Aerial Orthophotography with a mean density of 0.5 points per m 2 and vertical RMSE <0.4 m for the former and 2 points per m 2 and vertical RMSE <0.2 m for the latter. ALS tiles were processed using FUSION software [43] to generate a 2-m digital elevation model from the ground points, which facilitates estimation of height above the ground surface for each vegetation point. Fifteen forest structure metrics were calculated, having previously removed the points below 2 m, both for the field plots and 25-m × 25-m cells that tessellated the study area and served as population units. ALS metrics included mean, variance (varia), standard deviation (stdev), coefficient of variation (cv), interquartile range (iq), kurtosis (kurto), percentiles (ranging from the 1st to 99th percentile: p1, p5, p25, p50, p75, p95, and p99), canopy relief ratio (crr), and forest canopy cover (lfcc). The locations of the 25-m × 25-m cells were exactly the same for both ALS datasets. The 625 m 2 (25-m × 25-m) size for the population units was chosen because it is similar to the smallest sample plots (14.1 m radius; 625 m 2 ) used for the second campaign. Even the smallest plots and the ALS cells had enough ALS points to estimate ALS metrics reliably. With a cell size of 625 m 2 and a minimum point density of 0.5 points per m 2 , there would be, on average, 312.5 points per plot and cell. Based on Vauhkonen et al. [44], more than 300 points per plot should be sufficient.
The study area is covered by three Landsat scenes with paths (p) and rows (r): p201 r031, p200 r031, and p200 r030 ( Figure 1). For each scene, two predominantly cloud free images (less than 10% cloud cover) were downloaded corresponding to the years for the two ALS datasets (2010 and 2016). From 18 July for p200 r030 and 23 June for the rest of Landsat scenes. Surface reflectance of individual bands (blue, green, red, near infrared (NIR), and two shortwave infrared (SWIR)) and their respective quality assurance (QA) bands were calculated. QA bands were used to remove the remaining cloud and cloud shadows.
Each image was resampled using the nearest neighbor resampling method from 30-m × 30-m to the corresponding 25-m × 25-m ALS cell size. From the cloud-free Landsat mosaic, the normalized difference vegetation index (ndvi), the normalized burn ratio (nbr), and the normalized difference moisture index (ndmi) were calculated [17,31,45]. Landsat bands as well as spectral vegetation index values were calculated for each field plot.

Våler, Norway
The study area was located in a boreal forest region (approximately 850 ha) in Våler municipality ( Figure 1) in southeastern Norway with Norway spruce (Picea abies (L.) Karst.) and Scots pine as the dominant species [7]. The dataset consisted of aboveground biomass (AGB) observations for 176 circular plots (200 m 2 ) measured in 1999 and 2010 (hereafter, Våler 1999 and Våler 2010) and ALS metrics derived from two ALS datasets acquired in the same years as the field campaigns. Field data ( Table 1) were acquired using a systematic sampling design and aggregated into four classes: Regenerated forest (31 plots), young forest (55 plots), mature spruce dominated forest (58 plots), and mature pine dominated forest (32 plots). Tree-level AGB was predicted for both 1999 and 2010 using allometric models based on measurements of DBH and tree height. Plot-level AGB was predicted as the sum of individual tree-level AGB predictions. The ALS metrics represented a range of height percentiles (from the 0th to 90th percentile: p00, p10, p20, p30, p40, p50, p60, p70, p80, and p90) as well as canopy densities (ranging from the 0th to 90th canopy density: d00, d10, d20, d30, d40, d50, d60, d70, d80, and d90) and were calculated for each field plot and for 200-m 2 square cells that tessellated the area and constituted the population under study. For further details see [12,46].

Random Forests Prediction Models
RF models were constructed using the R package RandomForest [47]. For this study RF models were calibrated with the default settings of regression trees (n tree = 500) and the default value for the number of variables to test at each node (mtry = p/3, where p is the total number of independent variables). An ensemble of 500 regression trees was sufficient for estimates to stabilize. For the La Rioja datasets, mean volume (VOL) per unit area was predicted using the information from the field plot datasets, the set of ALS metrics and the Landsat auxiliary information. For the Våler datasets, RF models were used to describe the relationships between mean AGB per unit area and the ALS metrics, as well as to predict AGB change (Section 2.5). The change in biomass (∆AGB) was modeled directly using change in biomass (between 2010 and 1999) on each field sample plot as the response variable, and differences in ALS metrics between the two points in time as predictor variables. The importance of each predictor variable was assessed through the RF importance metric percentage increase in the model mean square error (%IncMSE) along with exploring the relationship between the response and predictor variables by means of correlation analysis. RF %IncMSE and the correlation analysis results were analyzed until four non-collinear (r < 0.80) predictor variables were selected.

Expansion Estimator
Assuming an equal probability simple random or systematic sampling design, the expansion (Exp) estimator,μ Exp , [48,49] (p.51) and its standard error, SE μ Exp , were calculated as: where n is the sample size and y i is the observed value for each ith sample unit.

Model Assisted Estimator
Model-assisted (MA) estimators use models based on auxiliary data to enhance precision but rely on probability samples for validity [24]. The model-assisted estimator,μ MA , was expressed as: where N is the total number of population units (cells). The first term 1 N N i = 1ŷ i is the mean of the RF model predictions,ŷ i , for all population units, and the second term, 1 n n i = 1 (ŷ i − y i ), is a correction factor (CF), also characterized as the bias estimate, based on the sample unit observations and predictions to adjust for estimated systematic RF prediction error [50].
The model-assisted standard error, SE(μ MA ), was calculated as: where ε i =ŷ i − y i and ε = 1 n n i = 1 ε i . Although the term regression is often used to characterize these model-assisted estimators, multiple other prediction techniques including nonlinear regression models and non-parametric prediction techniques have also been used [51][52][53][54][55][56].

Model-Based Estimator
Because the La Rioja 2016 data were not acquired using a probability sampling design, a model-based (MB) inference framework was necessary. The model-based estimator of the population mean,μ MB , is based on population unit predictions,ŷ i .
Bootstrap procedures were used to estimate the MSE and the standard error, SE = √ MSE of the model-based estimate of the population mean and the mean population change. Two bootstrapping approaches were considered: The wild bootstrap [57,58] and the pairs or non-parametric bootstrap [59]. All bootstrap methods depend on the notion of a bootstrap sample which must mimic the original sampling design [57,[59][60][61][62] (p. 2). In this sense, an advantage of the wild bootstrap is that it preserves the systematic nature of the original sample, whereas pairs bootstrap generates resamples which do not necessarily mimic the original sample structure.
Bootstrapping was used at two levels with RF. First, RF uses the pairs bootstrap to select a resample for use with each new regression tree. With pairs bootstrap, we created a matrix containing in each row the observed values of the dependent variable and the values of the corresponding predictor variables. For each RF tree, a new bootstrap sample (of the same size as the original data) was constructed by selecting rows randomly with replacement and was used to construct a new random forests model and to estimate the population mean for the population. The standard deviations of the 500 estimates of the population mean, one for each of the 500 default RF trees, can be used to estimate the MSE for the estimate of the population mean. Because the RF bootstrap resample is randomly selected with replacement, there is little control over the sample used to build each tree. This level of bootstrapping is characterized by Mentch et al. [63] (p. 2) as internal.
Second, bootstrapping can be used to iterate over the entire RF algorithm to estimate the MSE. Mentch et al. [63] (p. 11) characterize this level of bootstrapping as external. The pairs bootstrap is used as previously described to select a resample of the original sample data for each iteration of the RF algorithm. This bootstrap resample is then resampled again for each tree within the RF algorithm. With the wild bootstrap, residual values for each sample plot unit are estimated as the difference between the observation and random forests model prediction. For each sample unit, the estimated residual is multiplied by a random number from a distribution with mean 0 and variance 1, but for this study constrained to the interval between −2 and 2. Finally, this product is added to the model prediction to construct the wild bootstrap resample observation. A new wild resample is constructed for each bootstrap iteration of the RF algorithm with the wild resample then resampled again with the algorithm. For both the external wild and pairs bootstraps, n boot = 10,000 iterations were performed for each dataset and each study area.
Regardless of the external bootstrap used, the bootstrap mean,μ boot , was calculated as: whereμ b is the estimated population mean for the bth iteration. The SE was calculated as: Remote Sens. 2019, 11,1944 8 of 21

Population Change Estimation
For the La Rioja data, volume change (∆VOL) for each population unit was predicted as the difference between the two VOL predictions for the population unit, one for La Rioja 2010 and one for La Rioja2016 (Equation (8)). Because VOL observations were not available at two points in time for the same plots, change could not be estimated using direct methods. For the Våler data, if the same method were applied, an estimate of the covariance between the means for the two dates would be necessary because population unit predictions are based on data for the same plots, albeit at different times [64]. This calculation is cumbersome for RF models because the bootstrap sample used for each tree from the first dataset may be different from the resample from the second dataset. Therefore, for the Våler dataset, ∆AGB was predicted directly using RF (Equation (10)), differences between AGB measurements for the same plots and differences for the explanatory variables (ALS metrics) derived from the 1999 and 2010 ALS acquisitions.
For the La Rioja study area, the indirect estimate of mean ∆VOL per unit area was calculated as: For the Våler study area, the direct estimate of mean ∆AGB per unit area was calculated as: from Equation (7). A spatially continuous uncertainty map for ∆VOL and ∆AGB was constructed, respectively, for the La Rioja and the Våler study area. For the La Rioja dataset, the total ∆VOL uncertainty for each population unit was estimated as: where SE(ŷ i ) was the square root of the MSE estimate for each population unit and each dataset using mean VOL predictions for each population unit in each wild bootstrap iteration. For the Våler dataset, population unit uncertainty was estimated as the square root of the MSE estimate of the mean ∆AGB predictions for each population unit.

Overall Workflow
A flowchart depicting the steps conducted in this study and previously described is presented ( Figure 2). dataset, population unit uncertainty was estimated as the square root of the MSE estimate of the mean ΔAGB predictions for each population unit.

Overall Workflow
A flowchart depicting the steps conducted in this study and previously described is presented (Figure 2).

Random Forests Regression Models
For the La Rioja datasets, RF %IncMSE indicate that the most important variables for both La Rioja datasets were the 25th and 50th percentiles and the ALS height mean (Figure 3). However, because of the large correlations among these variables, only the 25th percentile was used (Figure 4). Overall, all the ALS metrics had large correlations with VOL observations in the sample units (Figure 4) with the exception of var, iq, cv, and stdev. Somewhat smaller correlations between VOL and the Landsat spectral bands and vegetation indices were observed for La Rioja 2010 than for La Rioja 2016. The final models for La Rioja 2010 that used the 25th and 99th percentiles, lfcc and crr as predictor variables and for La Rioja 2016 that used the 25th and 99th percentile, lfcc and ndmi explained 82-86% of the variability.
Rioja 2010 and 2016, ALS metrics, with the exception of 0th height percentile, had strong correlations with field AGB with greater correlations for the percentile metrics than for the density metrics ( Figure  4). Among all the height percentile metrics, the 20th, 30th, 40th, 50 th , and 60th had the greatest correlations with AGB which agrees with the RF variable importance assessments. In fact, for the Våler change analysis the most important predictor variables were the differences in the 20th, 50th, and 30th ALS height percentiles. RF models constructed with the four most important non-collinear predictor variables explained less variability than when using all ALS metrics. Therefore, AGB and ΔAGB models were fitted using all the ALS metrics. The latter biomass models explained 83%-86% of the variability for AGB and 82% of the variability for ΔAGB.

Estimates of Population Parameters for Each Point in Time
The For the Våler datasets, the 50th ALS height percentile was the most important predictor variable for both datasets. However, similar importance values were observed for the 50th, 20th, 30th, 60th ALS height percentiles and the 00th and 10th canopy densities for Våler 2010. For Våler 1999, the 40th ALS height percentile and the 00th and 10th canopy densities also ranked high ( Figure 3). As for La Rioja 2010 and 2016, ALS metrics, with the exception of 0th height percentile, had strong correlations with field AGB with greater correlations for the percentile metrics than for the density metrics (Figure 4). Among all the height percentile metrics, the 20th, 30th, 40th, 50th, and 60th had the greatest correlations with AGB which agrees with the RF variable importance assessments. In fact, for the Våler change analysis the most important predictor variables were the differences in the 20th, 50th, and 30th ALS height percentiles. RF models constructed with the four most important non-collinear predictor variables explained less variability than when using all ALS metrics. Therefore, AGB and ∆AGB models were fitted using all the ALS metrics. The latter biomass models explained 83-86% of the variability for AGB and 82% of the variability for ∆AGB.

Estimates of Population Parameters for Each Point in Time
The  The MA estimates of the means for both study areas were very similar to but smaller than the Exp estimates, especially for La Rioja 2010 and 2016 (Table 2). In all cases, introduction of the remotely sensed auxiliary variables for use with the MA estimators reduced the SEs with SE(μ MA ) = 4.33 m 3 /ha  Figure 5) showed most points were located close to the 1:1 line, although there were a few with greater deviations, particularly for smaller and greater prediction ranges. An inherent feature in RF contributes to this phenomenon which is exacerbated by small sample sizes as in La Rioja 2016. The mean for any node will be the mean of the number of sample observations in each node. Therefore, no prediction can be smaller than the smallest sample observation or greater than the greater sample observation.

Estimates of Parameters
For the La Rioja study area, ΔVOL population unit predictions ranged from -397. 43   External bootstrapping produced SEs that were smaller than internal bootstrap SEs by factors ranging from 1.63 to 3.03. At the external level, the two bootstrap approaches produced similar SEs, albeit slightly smaller for the pairs bootstrap (Table 2). However, smaller should not be construed to indicate more accurate; in particular, because wild bootstrapping mimics the structure of the original sample, which the pairs bootstrapping does not for systematic samples, wild bootstrapping was  Figure 6 shows the change maps for each study area along with their wall-to-wall uncertainty maps. Cold colors (blue) indicate a loss of VOL or AGB while warm colors (yellow) indicate an increase in the reported variables. Greater uncertainty values are associated with the greatest change predictions. For the La Rioja study area, population units with negative ∆VOL predictions coincide with areas where management practices such as clear-cutting or thinning were conducted. Overall, there has been an increased in AGB in the Våler study area and the population units with negative ∆AGB predictions coincide with the deforestation activities classified in McRoberts et al. [46].

RF Optimitation: Landsat Variables
Even though SNFI plots are characterized by the lack of precision for plot center coordinates, the RF models adequately described the relationship between the plot-level VOL observations and the remotely sensed data resulting in smaller SE estimates. Nevertheless, for future analysis SNFI plot and ALS cell sizes should be the same. The differences between the Exp and the MA estimates for La Rioja 2010 and for La Rioja 2016 might be due to differences in the sample and population distributions of the auxiliary variables [35]. As shown in Table 3, the auxiliary variables for the population units have a greater range than for the sample. Unlike regression models, RF cannot extrapolate beyond the range of the response variable in the sample.
This study demonstrated that the use of remotely sensed auxiliary data led to smaller standard error estimates. Overall, Landsat vegetation indices based on NIR and SWIR bands were the most important variables among optical data in the La Rioja study area. This finding is consistent with Dube and Mutanga [65] who reported that SWIR bands and vegetation indices were most important for predicting AGB. In addition, Condés and McRoberts [31] confirmed that SWIR band and ndvi were the most correlated with volume change. However, none of the optical variables were in the final RF volume model for La Rioja 2010 due to their smaller correlations with the volume variable. This result could be attributed to the less accurate plot center coordinates for La Rioja 2010 resulting in NFI field plot observations that do not exactly match with the corresponding Landsat band values [66]. SNFI plots with complete forest cover could be located in population units that are not completely forested with the result that the Landsat reflectance signal for the plot could be mixed with the bare soil reflectance.

RF Optimitation: Landsat Variables
Even though SNFI plots are characterized by the lack of precision for plot center coordinates, the RF models adequately described the relationship between the plot-level VOL observations and the remotely sensed data resulting in smaller SE estimates. Nevertheless, for future analysis SNFI plot and ALS cell sizes should be the same. The differences between the Exp and the MA estimates for La Rioja 2010 and for La Rioja 2016 might be due to differences in the sample and population distributions of the auxiliary variables [35]. As shown in Table 3, the auxiliary variables for the population units have a greater range than for the sample. Unlike regression models, RF cannot extrapolate beyond the range of the response variable in the sample.
This study demonstrated that the use of remotely sensed auxiliary data led to smaller standard error estimates. Overall, Landsat vegetation indices based on NIR and SWIR bands were the most important variables among optical data in the La Rioja study area. This finding is consistent with Dube and Mutanga [65] who reported that SWIR bands and vegetation indices were most important for predicting AGB. In addition, Condés and McRoberts [31] confirmed that SWIR band and ndvi were the most correlated with volume change. However, none of the optical variables were in the final RF volume model for La Rioja 2010 due to their smaller correlations with the volume variable. This result could be attributed to the less accurate plot center coordinates for La Rioja 2010 resulting in NFI field plot observations that do not exactly match with the corresponding Landsat band values [66]. SNFI plots with complete forest cover could be located in population units that are not completely forested with the result that the Landsat reflectance signal for the plot could be mixed with the bare soil reflectance.

Statistical Inference and Bootstrap Techniques
The novelty of this study resides in the fact that model-based MSE estimation with the RF algorithm is assessed through different bootstrapping approaches. Both the population mean and the MSE of the estimate of the population mean are necessary for inferences for population parameters in the form of confidence intervals. Because RF regression trees are constructed using resamples selected with replacement from the original sample (i.e., pairs bootstrapping), the resample might not mimic the original sample as is required for bootstrapping [33,58,[60][61][62]. Further, resampling with replacement may reduce the ranges of the predictor variables, particularly for small samples, thereby introducing bias because of the inability of RF to extrapolate beyond the range of the sample or resample data. These features could contribute to greater SEs when the internal bootstrapping pairs is used, as was observed in La Rioja 2016, although greater SEs were only observed when Landsat variables were included. When only the ALS metrics for La Rioja 2016 were used, SE(μ RF boot ) = 12.13 m 3 /ha as opposed to SE(μ RF boot ) = 24.88 m 3 /ha. In this study, because the RF models were constructed with only four predictor variables, only a single variable was considered by the algorithm for splitting a node. Many of those nodes could have been split with the less-correlated predictor variable (ndmi) with the response VOL variable among all the predictor variables with which the RF model was constructed. This could lead to greater loss of volume prediction precision, and thereby, greater MSE estimates. These findings lead us to exercise caution if internal pairs bootstrapping procedures are used to estimate the model-based MSE. Because model-based MSE estimates were smaller when pairs bootstrapping was used at both internal and external levels, further work should be directed to the wild bootstrapping approach at both internal and external levels.
The greater SE(μ w boot ) for La Rioja 2016 could be attributed to the smaller calibration data set size used [20,39]. Nevertheless, even though the sampling intensity in La Rioja 2016 was smaller by a factor of 3 compared to La Rioja 2010, the SE differences were not meaningful. Model-based SEs depend not only on sample size but also on the model, the model estimates, the covariance for the model estimates, and values of independent variables [24]. This suggests that little may be lost by using a smaller purposively selected calibration data set. This feature is an appealing option for conducting forest inventories in remote areas where the acquisition of field data for probability sampling designs may be prohibitive, either for economic or logistic reasons. Reduction in field-plot sample sizes could allow more money for remotely sensed data acquisition whose integration leads to greater precision in estimates. Alternatively, RF models for La Rioja 2010 could be temporally transferred to the La Rioja 2016 ALS data, thereby capitalizing on the greater SNFI sampling intensity and the greater point density of the 2016 ALS data [16,67]. However, the effectiveness of the RF model based on earlier data could depend on the particular forest attribute predicted and the differences between point cloud characteristics for the ALS data used to construct the model and the ALS data used when applying the model [68]. Tompalski et al. [68] observed up to 6% differences in mean predicted volume when ALS data of different pulse densities were used. Besides point density, other factors related to flight design or sensor configuration can affect the main forest variables estimation at least as much as differences in pulse densities [69]. Nationwide flight campaigns cover a wide range of forest types and topographic structures, with highly variable and not always optimal flying conditions. Holmgren et al. [70] found that canopy cover was more affected by scanning angle than laser height percentiles and Montaghi [71] indicated that most plot-level prediction metrics were relatively unaffected by high scanning angles, up to 20 degrees. Furthermore, the effects of climate change could change the natural conditions between the data acquisition dates and compromise the effectiveness of an RF model constructed using earlier data [72].

Population Change
A large number of studies have demonstrated the value of multi-temporal ALS data for estimating changes in forest attributes [12,16,73]. Change maps are important for understanding the state and dynamics of forests, and they can be used as an initial information source for reporting carbon emissions. Even though our methodology focuses on a regional area, it could have national implications because it mostly leverages open source data gathered to compile data for a better understanding of the forest status. Numerous countries continue to acquire second coverage ALS data resulting in a greater number of territories with multi-ALS flights in which our workflow could be replicated. The similarity of the results obtained for this study (∆AGB = 15.11 Mg/ha with SE (∆AGB) = 2.61 Mg/ha (17.27%)) with previous studies on change estimation with remotely sensed data lend support for the suitability of the RF algorithm. Nӕsset et al. [74] and McRoberts et al. [15] analyzed the same Våler data using linear regression models with a direct change approach. The former reported ∆AGB = 11.9 Mg/ha with SE (∆AGB) = 1.6 Mg/ha (13.44%), and the latter ∆AGB = 13.62 Mg/ha with SE (∆AGB) = 2.21 Mg/ha (16.23%). Zhao et al. [14] found similar differences between linear regression models and RF when estimating ∆AGB.
Although direct approaches are often reported as preferable in the scientific literature because only errors from one model are incorporated [12,13,74], other studies have found greater accuracies with indirect methods. Økseter et al. [75] estimated ∆AGB in boreal forest using multi-temporal ALS data in Norway and reported the indirect approach as a more satisfactory. McRoberts et al. [15] showed indirect methods produced greater precision when using nonlinear models with the same Våler data as used for this study. Zhao et al. [14] estimated ∆AGB in a forested landscape in Scotland with the indirect approaches showing slightly more precise performances. In addition, the indirect modeling has been claimed to be less sensitive to extrapolating and it has been found to be a better alternative to estimate changes more accurately at stand-level [73]. In light of the diversity of results, the most accurate and/or precise approach may have to be determined by the data available for each study.

Data Considerations
Direct approaches require repeated measurements of the same field plots which are not always possible due to economic constraints. Further, changes in sampling protocol or field crews between measurements may complicate comparisons of repeated measurements [76]. Costs associated with indirect approaches could be reduced by predicting forest attributes using models with temporal transferability [16] but bearing in mind that temporal transferability should be conducted carefully as previously mentioned.
Nevertheless, the notion of worldwide, multi-temporal, wall-to-wall ALS data available free of charge is likely unrealistic. A substantial investment is required to acquire ALS data for large forest areas, and subsequent processing requires large data storage capability. Because these factors hinder development of forest inventories based on this technology, other alternatives should be considered to mitigate the costs. The increasing access to satellite images may contribute to development of more cost-efficient forest inventories. Basal area removal has been modeled successfully with RF using repeated measurements of NFI plots and Landsat based disturbance products over a 30-year period [77]. Landsat time series have also been explored for mapping AGB dynamics [78]. An alternative to wall-to-wall ALS data could be acquisition of ALS transects in combination with other optical remotely sensed data to estimate forest attributes in a spatially continuous manner [29,79]. Additionally, due to the increased availability of satellite images, an upscaling approach to integrate remotely sensed data from multiple sources has arisen as a suitable alternative for mapping forest attributes at large scales [20,80].

Conclusions
Four main conclusions can be drawn from this study: (1) RF models adequately described the relationship between field plot measurements of volume and biomass per unit area and remotely sensed data; (2) model-assisted and model-based estimators based on RF predictions produced similar estimates of population means and change estimates and smaller mean square error estimates than the expansion estimators, thus indicating the utility of remotely sensed data for enhancing forest inventories; (3) because pairs bootstrapping does not mimic the original sample structure and does not preserve the range of predictor variables, wild bootstrapping for MSE estimation is recommended; and (4) the essence of the RF bootstrapping technique could impact on the model-based MSE estimation when internal pairs bootstrapping techniques are applied. By using multi-temporal ALS data, estimates of change for each study area were obtained. For the La Rioja study area the population volume change mean was −10.41 m 3 /ha (−26.85, 6.03 m 3 /ha), whereas for the Våler study area the population biomass change mean was 15.11 Mg/ha (9.89, 17.37 Mg/ha). These results reflect a biomass gain in Våler from 1999 to 2010, while in La Rioja we cannot say whether from 2010 to 2016 there has been a loss or gain in the volume stock of Scots pine forest as the volume change confidence interval contains the zero.