A Generalized Lidar-Based Model for Predicting the Merchantable Volume of Balsam Fir of Sites Located along a Bioclimatic Gradient in Quebec, Canada

Lidar-based models rely on an optimal relationship between the field and the lidar data for accurate predictions of forest attributes. This relationship may be altered by the variability in the stand growth conditions or by the temporal discrepancy between the field inventory and the lidar survey. In this study, we used lidar data to predict the timber merchantable volume (MV) of five sites located along a bioclimatic gradient of temperature and elevation. The temporal discrepancies were up to three years. We adjusted a random canopy height coefficient (accounting for the variability amongst sites), and a growth function (accounting for the growth during the temporal discrepancy), to the predictive model. The MV could be predicted with a pseudo-R2 of 0.86 and a residual standard deviation of 24.3 m3 ha−1. The average biases between the field-measured and the predicted MVs were small. The variability of MV predictions was related to the bioclimatic gradient. Fixed-effect models that included a bioclimatic variable provided similar prediction accuracies. This study suggests that the variability amongst sites, the occurrence of a bioclimatic gradient and temporal discrepancies are essential in building a generalized lidar-based model for timber volume.


Introduction
Forest stands often develop a diversity of canopy structures across space and time. The diversity may result from the natural dynamics in stands through time (e.g., growth, mortality, succession, stand composition), the variability of biophysical factors (e.g., soil, climate, elevation) or human activities [1,2]. Boucher et al. [1] for example, stated that disturbance regimes and climate have an effect on canopy structures. Laflèche et al. [3] showed that site characteristics, species composition, and climate influence site productivity.
Canopy structures can be described through forest attributes measured during field inventories (e.g., canopy cover, height, diameter, stand volume) or through remote sensing tools such as the light detection and ranging (lidar) [4]. Lidar is an active sensor which uses pulses of laser light to measure the distance to a target and record the strength of light backscattering from this target. It generates a point cloud which is a three-dimensional representation of the volumetric interaction between pulse photons and the target. It has been extensively used in forestry to describe the vertical structure of stands [5][6][7]. Metrics can be calculated from the point clouds to statistically describe them (e.g., height metrics, canopy cover metrics). They are often used as explanatory variables to model forest attributes such as basal area, timber merchantable volume, biomass, etc. [8][9][10][11].
Spatially extensive acquisitions of lidar datasets are now possible, making this tool attractive for large-scale inventories [12,13]. Such inventories often require several lidar surveys. Combining the resulting datasets is challenging, especially when there is variability amongst the sites distributed across a given survey area, or when time has elapsed between the field inventory and the lidar survey (temporal discrepancy). In this situation, it is difficult to transfer a site-specific model to another location and therein obtain the same prediction accuracy.
Recent research has ways to accurately predict forest attributes from lidar over large-scale areas. Hansen et al. [2] for example, included biophysical variables (climate, topography, and soil) in their model when predicting lidar canopy height and canopy cover for five ecoregions located along a 4000 km transect. Naesset and Gobakken [14] and Nilsson et al. [15] calibrated site-specific models to predict forest attributes from lidar and combined the models' outputs to make predictions at a large-scale. Their approach allowed for accurate predictions locally but required several models for working at the large-scale level. and opted for a general mixed-effect model to predict timber volume, biomass or dominant height of multiple sites. Both authors obtained an overall goodness comparable to the site-specific (local) models. This approach offered the advantage of building a single predictive model which could be adapted to the specific conditions found in the different sites.
Several questions remain when predicting forest attributes of different sites in an operational context, including the following: (1) Does the combination of multiple lidar datasets in a generalized predictive model increase the bias at each site? (2) Does accounting for the variability in sites limit the model biases at each site? (3) How should forest attributes be predicted outside the range of the studied sites? and (4) Do temporal discrepancies between field and lidar acquisitions have an effect on the model prediction accuracy? This study, therefore, aims at analyzing the effects of variability amongst sites and temporal discrepancies on the prediction accuracy of timber merchantable volume of sites located along a bioclimatic gradient.

Study Sites
Various bioclimatic domains exist throughout the province of Quebec, Eastern Canada [16]. Five study sites populated by balsam fir (Abies balsamea (L.) Mill) were selected along a bioclimatic gradient of temperature, precipitation, elevation and growing degree days (calculated with a 5 • C reference temperature), Figure 1  The field and the lidar data were obtained from the Quebec provincial forest ministry -Ministère des Forêts, de la Faune et des Parcs (MFFP) and the Montmorency Research Forest [17][18][19]. The field data were acquired following a uniform inventory protocol while the lidar data were generated through surveys with variable acquisition parameters.

Field Plot Data
The field data were collected in circular plots (radius = 11.28 m, area = 400 m 2 ). The plot center positions were georeferenced using a precision GPS + Glonass receiver. Field plots were selected following these criteria: (1) balsam fir was the dominant species (>50%); and (2)  . Trees with a diameter at breast height (dbh) above 9 cm (merchantable trees) were measured using a diameter tape. Tree height was calculated using a height-diameter based-equation following Bégin and Raulier [20]: where H ij is the total height of the i th tree of the j th plot, DBH ij the diameter at breast height, β 0 −β k the model parameters, z k the k explanatory variables, and ε ij the error term and δ j the plot random coefficients. The height-diameter ratio (H-D ratio) of each plot was calculated by dividing the average plot height (weighted against the basal area) by the average plot dbh (weighted against the basal area). This ratio accounts for the tree morphology resulting from growth conditions [21]. The field merchantable volume (MV) defined as the volume under bark for stems greater than 9 cm diameter excluding the stump and tip, was calculated following Fortin et al. [22]. The climatic data (temperature, precipitation, growing degree days (gdds)) were derived from the MFFP forest inventory database using the BioSIM software [19]. Table 1 provides a summary of the field data for the study sites.

Lidar Data and Variable Generation
The lidar surveys were conducted in leaf-on conditions in 2014 at the Mauricie_south and Mauricie_north sites, 2011 at the Montmorency site, 2013 at the SLSJ_south site, and 2013-2014 at the SLSJ_north site. They were supplied by various survey providers. Ground-classified returns were interpolated to construct a digital terrain model (dtm) using LAStools software (version 150526 unlicensed, rapidlasso GmbH, Gilching, Germany) [23]. The height of returns was normalized by measuring the elevation difference relative to the underlying terrain model. A 2 m threshold was applied to remove non-canopy returns [9,14]. Point clouds were clipped to the extent of each field plot boundary. Metrics describing them were calculated using the USFS's FUSION software (version 3.42, US Forest Service, Salt Lake City, UT, USA) [24]. They were classified into three standard groups of variables: canopy height CH variables (percentiles of heights, averages), canopy density CD variables (percentage of first returns above or below a 2 m, 5 m or 7 m threshold), canopy structural heterogeneity CSH variables (standard deviation of height, variance of height, coefficient of variation). The average elevation of each plot was also calculated from the lidar raw data. The elevation and the climatic data were included in a fourth group of variables, identified as BIO variables. Table 2 provides a summary of the lidar data for the study sites. a Average and standard deviation of the lidar characteristics; TD: temporal discrepancy with field data; prf: pulse repetition frequency; agfa: above ground flight altitude; msa: maximum scanning angle off-nadir; d: return density above 2 m; havgf = average height of first returns above 2 m; p2f = percentage of first returns above 2 m; hstdf = standard deviation of the height of first returns above 2 m.

Merchantable Volume Modeling
Several generalized MV models were tested. The simplest model (Model 1) ignored the effects of site, growth, and the bioclimatic gradient, while the most complex model (Model 6) included all of these. All possible combinations of candidate metrics (one variable per standard group) were tested in a 3-metric model (Model 1) adapted from Yoga et al. [25]. We selected this model as it has been specifically developed for balsam fir stands and was successfully tested at the Montmorency site: where MV jm is the merchantable volume of the j th field plot of study site m, CH jm the canopy height, ∆ 0 the canopy height intercept (mean height adjustment where MV jm = 0), CD jm the canopy density, CSH jm the canopy structural heterogeneity, β 0 −β 3 the model parameters, and ε jm the error term. Each 3-metric subset was tested separately for each study site. The model's mean square error (MSE) was then calculated. The candidate subset with the smallest MSE across all study sites was selected as the best subset and then used to elaborate the more complex models.
Temporal discrepancies existed amongst the field inventories and the corresponding lidar surveys ( Table 2). Equation (3) describes a growth adjustment function (G), based on Pothier and Savard [26] yield tables, that can be used to account for the discrepancies: where TD is the temporal discrepancy between the field and the lidar data acquisitions (in years), H a height variable, and γ 1 −γ 2 the model parameters. A time-corrected model MV model (Model 2) was then built by adding Equation (3) to Equation (2): The effect of site variability was accounted by using a random coefficient regression [27,28]. Site random coefficients were either added to all the model parameters (b 1−3jm , Model 3) or to the canopy height parameter only (∆ m , Model 4): Bioclimatic variables were added iteratively to Model 4 to account for the bioclimatic gradient (Model 5): Finally, the site random coefficient of Model 5 was ignored to specifically analyze the accuracy of the resulting bioclimatic fixed-effect model (Model 6): (8) All the models were fitted by maximum likelihood. The statistical analyses were carried out using the "nlme" package [29] of the R software [30]. The residual heteroscedasticity was taken into account by adding a variance function to the models following Yoga et al. [25]. In that study, the variance function was comprised of variables describing the average height and the spatial distribution of returns. In our present study, the lidar data were acquired with different settings. We, therefore, opted to omit the return spacing variable of Yoga et al. [25] when accounting for the heteroscedasticity. The random error term ε jm , therefore, had the following form: var(ε jm ) = σ 2 havgf jm β 5 (9) where σ 2 is the residual variance, havgf jm the variance the average height of first returns above 2 m and β 5 the variance parameter.

Model Validation
The best models were selected based on the Akaike Information Criterion AIC (lowest), the residual standard deviation RSD (lowest), the coefficient of determination (pseudo-R 2 , highest), the level of significance (p-value) and the standard deviation of the random coefficients (rstd, lowest). The leave-one-out cross-validation (LOOCV) was used to assess the models' accuracy [31]. A new dataset was created by removing one field observation from any site. The MV model was fitted to the new dataset (training data) and used to predict the MV of the removed observation (test data). The procedure was repeated until predicted values were obtained for all the field observations.

Optimal Subset of Explanatory Variables
The subset of lidar-derived variables, which generated the smallest MSE, included the average height of first returns above 2 m (havgf), the percentage of first returns above 2 m (p2f) and the standard deviation of the height of first returns above 2 m (hstdf). Figure 2 presents the correlation between the field MV and each of the explanatory variables per study site. They were of 0.87, 0.41 and 0.41 respectively.  Table 3 presents a model comparison amongst the MV models. Model 1 was built with the best subset of lidar-derived variables only. It was significantly improved by including the growth adjustment function (Model 2: p-value < 0.0001). The AIC decreased from 2861.8 to 2774.9 and the RSD from 33.3 to 27.2 m 3 ha −1 while the pseudo-R 2 increased from 0.75 to 0.83. Table 4 presents the average biases between the field and the predicted merchantable volumes before (Model 1) and after (Model 2) the addition of the growth adjustment function. The absolute biases decreased for all sites except the SLSJ_south.  Model 2 was further improved by accounting for the variability amongst sites (Models 3 and 4: pseudo-R 2 = 0.86). Similar improvements were obtained between Model 3 (built with multiple random coefficients) and Model 4 (built with the canopy height random coefficient). Model 4, however, had the smallest AIC (2722.7 versus 2746.9). The RSD was 24.3 m 3 ha −1 initially and 25.0 m 3 ha −1 after cross-validation (see also Table 4 for site biases). However, the model residuals were heteroscedastic. The RSD was estimated as 2.8 m 3 ha −1 multiplied by the variance function when accounting for the heteroscedasticity.

General MV Models
The fixed coefficients of Model 4 are presented in Table 5 and the random coefficients, in Table 6. The random coefficients were negative for the Mauricie_south, Mauricie_north and SLSJ_south sites (−0.48 m, −0.37 m, −0.36 m) and positive for the Montmorency and SLSJ north sites (0.70 m and 0.51 m). Their standard deviation was of 0.56 m. It decreased further with the MV models built with the 95th percentile of height (rstd = 0.47). However, these models were less accurate (pseudo-R 2 = 0.82). The rstd decreased substantially when a bioclimatic variable was added to Model 4 ( Table 6: Models 5a and 5b, random coefficients~0 m). The models included the gdd and the elevation variable respectively. The AICs were smaller (Table 3: 2699.5 and 2687.9).  Model 4 random coefficients were plotted against the H-D ratio and the bioclimatic variables ( Figure 3). The Mauricie_south, Mauricie_north, and SLSJ_south, which had low elevations but high gdds and H-D ratios, also had a lower random coefficient. Conversely, the Montmorency and SLSJ_north sites, which had high elevations but low gdds and H-D ratios also, had a higher random coefficient.
The fixed-effect models (Models 6a and 6b) produced similar accuracies as Model 4 (pseudo-R 2 = 0.86). Model 6b which included the elevation variable had the lowest AIC value (2688.9). The average biases remained small ( Table 4). The RSD was of 24.2 initially and of 24.8 m 3 ha −1 after cross-validation. It was estimated to 2.1 m 3 ha −1 multiplied by the variance function when accounting for the heteroscedasticity.

Discussion
Our study aimed at building a generalized lidar-based model to predict the timber merchantable volume of various study sites located along a gradient of bioclimatic factors. The field and the lidar data were acquired at different dates. We opted for a nonlinear random coefficient model to predict the MV as several studies have shown that mixed-effect models can accurately predict forest attributes both at small and large-scale levels [32][33][34]. The model needed to be adjusted given the variability amongst study sites and the temporal discrepancies.
A crucial step in building the MV models was the search of optimal explanatory variables. The combination of havgf, p2f and hstdf provided the best subset of variables. The addition of a growth adjustment function improved the model significantly (p-value < 0.0001). This function accounted for the MV growth that occurred during the temporal discrepancy. It also enabled us to accurately combine the different lidar datasets. It included a height variable (havgf) as the effect of time on merchantable volume varies with the stand development stage. This result confirms the need to adjust predictive models when temporal discrepancies occur [35]. Models 4 and 6b were the best candidate models. Model 4 contained a site random coefficient (∆ m ) accounting for the variability amongst sites. The model performed as efficiently as Model 3, which contained several coefficient adjustments (p-value = 0.9913). This result showed that a local adjustment of canopy height was therefore enough to account for the variability amongst sites. This can be explained by the fact that the correlation between the MV and havgf was high compared to the other explanatory variables (Figure 2). It was of 0.87 versus 0.41 for p2f and hstdf. Modifying havgf only would, therefore, have a comparable effect to modifying all the model parameters.
Model 4 random coefficients varied with the characteristics of the study sites: they were positive at the Mauricie_south, Mauricie_north and SLSJ_south sites and negative at the Montmorency and the SLSJ_north sites. The Mauricie_south, Mauricie_north and SLSJ_south sites also had higher H-D ratios (H-D = 0.99, 0.98, 0.96; Table 1) compared to the Montmorency and the SLSJ_north sites (H-D = 0.94 and 0.88). Thinner stems are characterized by higher H-D ratios and have lower merchantable volumes. For a given height, trees were therefore possibly thinner at the Mauricie_south, Mauricie_north and SLSJ_south sites compared to the Montmorency and the SLSJ_north sites. For a given average height, canopy density and canopy structural heterogeneity, havgf needed to be decreased at the Mauricie_south, Mauricie_north and SLSJ_south sites and increased at the Montmorency, SLSJ_north sites when predicting the MV with the same equation. Table 1 shows a gradient of bioclimatic factors amongst study sites: higher temperatures and gdds were observed at the Mauricie_south, Mauricie_north and SLSJ_south sites compared to the Montmorency and SLSJ_north sites. The random coefficient values (Models 5a and 5b) decreased substantially when a bioclimatic variable was added to the canopy height parameter. This result suggests a correlation between the site variability and the bioclimatic gradient in this case study. The fixed-effect Model 6b which included the elevation variable had a similar prediction accuracy to Model 4 but a lower AIC value. The relationship between the environmental conditions of a study area and the canopy structure variability has also been assessed in other studies [2,36]. Gdd is an indicator of accumulated heat and can be correlated with tree growth [37]. Warmer temperatures and longer growing seasons have a positive effect on tree height growth [38] as well as lower elevations [39]. Trees will, therefore, have a higher H-D ratio and a lower merchantable volume for a given height. Conversely, lower gdds or higher elevations have a negative effect on tree height growth. Trees will have a lower H-D ratio and a higher merchantable volume for a given height. The H-D ratio was smallest at the SLSJ_north site and Montmorency sites.
The canopy variability may be due to other factors such as tree mortality, fire disturbance, site species composition, lidar sensor settings, return density, etc. Sites were composed of similar species, predominantly balsam fir and spruce (Table 1). We tested Models 4 and 6b on a sample of field plots where the proportion of balsam fir was > 60% (256 plots). We obtained similar MV predictions (pseudo-R 2 = 0.87 and 0.86; RSD = 24.4 m 3 ha −1 and 24.5 m 3 ha −1 respectively). We, therefore, considered that the species composition had little effect on the MV predictions in this case study.
Several studies have also demonstrated that the lidar settings can have an effect on the precision of derived forest attributes [40][41][42]. The lidar data were acquired separately and with variable acquisition settings in this case study (Table 2). We could not analyze the settings' effects on the MV models as we did not have enough datasets to draw conclusions. However, their effects would likely be minor compared to the bioclimatic gradient, which had a direct effect on the MV. Further research should be done on this issue.
Our study has several practical uses in an operational context. It suggests flexible models to predict the merchantable volume in study sites where spatial and temporal variabilities are observed. The proposed models are adaptable enough to combine lidar datasets acquired at different time periods. They can accurately predict the merchantable volume both at local (site-specific) and large-scale levels (multiple sites). Using a random coefficient model, such as Model 4, is more advantageous than building several local lidar-based models when predicting forest attributes at a large-scale level. When the variability amongst sites is related to a bioclimatic gradient, a simpler fixed-effect model, such as Model 6b, could then be used with a similar prediction accuracy. The fixed-effect models offer the advantage to be generalizable to other sites of known gdd/elevation, even when field inventory has not been done. As lidar data will be available for the entire province of Quebec by 2021, new lidar study sites could be tested to confirm the relationship between the bioclimatic gradient and the MV variation. The data could also be combined with satellite data to predict the MV at large scale [43,44].

Conclusions
Canopy structure variations can be caused by natural forest dynamics occurring over time or by variations of bioclimatic factors (including temperature and elevation). We tested a lidar-based model to predict the merchantable volume (MV) of balsam fir sites located along a bioclimatic gradient of temperature and elevation. Temporal discrepancies up to three years also existed between the field inventory and the lidar data acquisitions. A random coefficient model was adjusted to include a random canopy height coefficient (accounting for site variability) and a growth function (accounting for growth during the temporal discrepancy). The pseudo-R 2 was of 0.86 and the RSD of 24.3 m 3 ha −1 . The model performed as efficiently as a model with adjustments on all the parameters (p-value = 0.9913). The site variability was found to be related to the bioclimatic gradient. Fixed-effect models built with bioclimatic variables produced similar prediction accuracies (pseudo-R 2 = 0.86, RSD ≤ 24.5 m 3 ha −1 ). The effects of site variability, bioclimatic gradients, and temporal discrepancies were found to be essential to generalized lidar-based modelings.