1. Introduction
Airborne laser scanning (ALS, or lidar) has proven over the past decades to be a useful 3D tool for measuring forest attributes. It generates point clouds, which are a three-dimensional representation of the volumetric interaction between pulse photons and illuminated objects. The spatial distribution of returns within the point cloud depends on the characteristics of the lidar system and the target object [1,2]. The scan mechanisms (e.g., oscillating mirror, rotating mirror, nutating mirror) mounted on lidar scanners produce different scanning patterns of pulses on the ground [1]. For example, the bi-directional scan mechanism of oscillating mirrors produces a see-saw scanning pattern where pulses tend to be more homogeneously distributed along the center of the flight line than at the borders. Consequently, an alternation of local clumps of returns and local gaps will be observed within the point cloud [3]. Scanning angle is another setting that influences the spatial distribution pattern. For example, high scanning angles increase the distance traveled by the incident pulse. Objects located further from the lidar sensor are likely to be occluded or undetected [4], thus producing an irregular spatial distribution of returns.
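The link between scanning angle and pulse travel distance can be illustrated with simple geometry. The sketch below assumes flat terrain and a nadir-referenced scan angle; the function name and the flying height used in the example are hypothetical and are not taken from the survey described here.

```python
import math


def slant_range(flying_height_m, scan_angle_deg):
    """One-way distance from sensor to ground over flat terrain (assumption).

    The scan angle is measured from nadir, so 0 degrees points straight down.
    """
    return flying_height_m / math.cos(math.radians(scan_angle_deg))


# Hypothetical flying height of 1000 m above ground
range_nadir = slant_range(1000.0, 0.0)
range_20deg = slant_range(1000.0, 20.0)
extra_path_m = range_20deg - range_nadir  # additional one-way path at 20 degrees
```

Under these assumptions the path length grows with 1/cos(angle), so the increase is small near nadir and grows quickly at wide angles.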
The effects of the surface characteristics of scanned objects on the scanning pattern are not controlled by the user. Gatziolis et al. [1], for example, demonstrated that higher return densities are recorded in forest stands compared to green pastures under similar lidar settings. Korpela et al. [2] noted that leaf size and orientation and foliage density affect the intensity of lidar returns. Balsa-Barreiro et al. [5] noted that topography and land cover influence the density and spacing of returns.
Metrics derived from point clouds (e.g., height metrics, canopy cover metrics) are often used to model forest attributes such as volume [6,7]. Highly correlated models between lidar data and field volume have been achieved in many studies [8,9,10,11]. Studies on lidar modeling focus on identifying the explanatory variables that best explain the variability of the forest attributes. The relationship between the explanatory and dependent variables is obtained by means of a regression model. Least-squares procedures are commonly used to estimate the parameters of the regression model [12]. These procedures minimize the sum of squared residuals. Heteroscedasticity occurs in a model when the variance of the residuals is not constant [12,13]. Possible causes of heteroscedasticity include measurement errors in the data, autocorrelation, or misspecification of the model. Heteroscedasticity can lead to a significant loss of precision: the estimated parameters of the model are inefficient, although unbiased. The standard errors and confidence intervals are also unreliable. In other words, the forest attribute can still be predicted, but the predictions are uncertain. In an operational context, uncertain predictions of a forest attribute such as the merchantable volume have a direct effect on the evaluation of its economic value [14]. Nonetheless, heteroscedasticity may be corrected using different methods [12,15,16]. Variable transformation, for example, modifies the measurement scale of variables in a model, which may correct heteroscedasticity [12,17]. Variance modeling identifies the sources of heteroscedasticity in a model and attempts to better estimate the model uncertainty through a variance function [15].
Few studies can be found on the effects of lidar on the uncertainty of predictive models (e.g., [4,18,19]). Yet, with the increasing use of lidar in forestry, reliable estimates of forest attributes derived from lidar are necessary. There is, therefore, a need to investigate whether factors like the spatial pattern of returns can also influence the predictions or the uncertainty of modeled forest attributes. This study, therefore, aims to analyze the effects of three lidar attributes (return density, return spacing, scanning angle) on the predictions and the uncertainty of merchantable volume estimates.
3. Results
We found a very good correlation between field and predicted merchantable volumes for balsam fir stands even before the introduction of the SP variables (pseudo-R² = 0.91; Table 3 and Figure 4). The best predictors were the mean height of first returns (Mean_h), the proportion of first returns below 2 m (Prop_2m) and the rumple index (Ri). The variance inflation factor (VIF) was below 10. The RSD was 28.0 m³ ha⁻¹ for the regression model and 28.98 m³ ha⁻¹ after cross-validation. The best equation was:
Figure 5 shows plots of the model residuals versus the predicted MV, Mean_h and Skew_area variables. The residuals had an outward-opening funnel shape: they became more variable as the predicted MV increased. When the absolute residuals were regressed against the predicted MV, the MV parameter was significant (p = 0.02). This indicated the presence of heteroscedasticity, which needed to be corrected.
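The check described above, regressing absolute residuals on predicted values, can be sketched as follows. This is a minimal illustration in plain Python rather than the analysis code used in the study; the function name and the sample values are hypothetical. It returns the slope and its t-statistic; converting the t-statistic to a p-value would need a Student t distribution with n − 2 degrees of freedom, which is left out here.

```python
import math


def abs_residual_slope_test(predicted, residuals):
    """Regress |residual| on the predicted value by ordinary least squares.

    Returns (slope, t_statistic). A clearly positive slope suggests that
    the residual spread grows with the predicted value.
    """
    n = len(predicted)
    y = [abs(r) for r in residuals]
    mean_x = sum(predicted) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(predicted, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the auxiliary regression
    sse = sum((v - (intercept + slope * x)) ** 2 for x, v in zip(predicted, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical predicted volumes (m3/ha) and model residuals, for illustration only
predicted = [50.0, 100.0, 150.0, 200.0, 250.0, 300.0]
residuals = [-4.0, 9.0, -16.0, 19.0, -27.0, 29.0]
slope, t_stat = abs_residual_slope_test(predicted, residuals)
```

With the funnel-shaped sample data above, the slope is positive and the t-statistic is large, which is the pattern the text interprets as heteroscedasticity.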
The spatial distribution of returns barely influenced the predicted MV. We found no significant improvement when adding an SP predictor to the fixed part of the MV model.
Table 4 shows a model comparison between the basic MV model and three MV models built with an additional SP variable:
The additional SP variables did not improve the basic model (AICc = 1137.79, 1137.48 and 1139.32 for Equations (7)–(9), respectively, versus 1137.23 for Equation (6)). The density variable (Equation (9)) was the least significant (p = 0.74). The RSDs were, however, similar for all models.
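For readers less familiar with AICc, the sketch below shows the standard small-sample correction and how the reported values compare. The `aicc` function is the textbook formula; the log-likelihoods, parameter counts and sample size of the fitted models are not given in this section, so only the differences between the reported AICc values are computed.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion.

    k is the number of estimated parameters and n the number of observations.
    """
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# AICc values as reported in the text
reported = {
    "Equation (6)": 1137.23,  # basic model
    "Equation (7)": 1137.79,
    "Equation (8)": 1137.48,
    "Equation (9)": 1139.32,
}
best = min(reported.values())
delta_aicc = {name: round(value - best, 2) for name, value in reported.items()}
```

The basic model has the lowest AICc, and every model with an added SP variable scores slightly higher, in line with the conclusion that the extra predictors did not improve the fixed part of the model.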
Table 5 shows a comparison between MV models before (basic model) and after addition of a variance function (corrected models). Adding SP variables to the random part of the MV model improved the precision. The corrected MV models had lower AICc values (1110.32, 1123.58 and 1106.72 versus 1137.23 for the basic model). Mean_h and Skew_area had the greatest effect on the unexplained variance, as residuals were more variable when these variables increased (Figure 5, Table 6). The MV uncertainty was better assessed by adding the following variance function to Equation (6):
The RSD, which was initially 28.0 m³ ha⁻¹, was estimated at 3.7 m³ ha⁻¹ multiplied by the variance function once the heteroscedasticity of the residuals was taken into account.
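To make "3.7 m³ ha⁻¹ multiplied by the variance function" concrete, the sketch below computes a plot-level residual standard deviation. The functional form is an assumption: an exponential variance function of Mean_h and Skew_area is used purely for illustration, and the coefficients `delta_h` and `delta_s` are hypothetical, not the fitted values of the study.

```python
import math

SIGMA = 3.7  # m3/ha, baseline residual standard deviation reported in the text


def residual_sd(mean_h, skew_area, delta_h, delta_s, sigma=SIGMA):
    """Plot-level residual SD under an ASSUMED exponential variance function.

    sd_i = sigma * exp(delta_h * Mean_h_i + delta_s * Skew_area_i)
    """
    return sigma * math.exp(delta_h * mean_h + delta_s * skew_area)


# Hypothetical coefficients and plot values, chosen only for illustration
sd_low_stand = residual_sd(mean_h=5.0, skew_area=1.0, delta_h=0.1, delta_s=0.2)
sd_high_stand = residual_sd(mean_h=15.0, skew_area=3.0, delta_h=0.1, delta_s=0.2)
```

Under this assumed form, the residual standard deviation is no longer a single number: it increases with stand height and with the irregularity of the return pattern, which is what allows uncertainty to be mapped plot by plot.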
4. Discussion
This study aimed to develop an efficient lidar model to predict merchantable volume in balsam fir stands and to better assess the model uncertainty. As uncertainty increases, the model parameters can still be estimated but the confidence intervals are unreliable [12,27]. Uncertain predictions induce errors in the estimated volume of stands and, consequently, in the evaluation of their economic value [14]. The uncertainty of merchantable volume should, therefore, be correctly estimated.
As expected, a highly significant model for predicting the MV could be developed using standard lidar metrics (pseudo-R² = 0.91, RSD = 28.0 m³ ha⁻¹). This is consistent with other studies indicating that lidar variables can accurately predict forest structure attributes in the eastern Canadian boreal forest [11,28,29]. The optimal explanatory variables of the model were the mean height of first returns, the proportion of first returns below 2 m and the rumple index. Mean height is a good predictor of volume, as shown in many studies (e.g., [8,11]). The variable was positively correlated with the MV in the model, as shown in Table 3.
The proportion of first returns below 2 m (Prop_2m) could be correlated with the fraction of canopy gaps. Canopy gaps are small openings in stands where trees are smaller or absent [30]. They are often found in mature stands and reduce the overall stand volume. Several plots of our study site were within mature forest stands. The variable was negatively correlated with the MV in our model, as shown in Table 3.
The rumple index measures the canopy surface roughness or structural complexity. It has also been defined as a 3D measure of canopy heterogeneity [23]. It can be used to characterize multi-cohort stands such as those present in our study site. Mature trees, which have larger volumes, are more likely to be present in multi-aged stands [31,32]. The variable was positively correlated with the MV in the model, as shown in Table 3.
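As an illustration of how a rumple index can be obtained, the sketch below uses the common definition of the index as the ratio of canopy surface area to projected ground area, computed on a gridded canopy height model (CHM). The exact procedure used in the study is not described in this section, so the gridding, the triangle split of each cell and the sample CHM values are all assumptions.

```python
import math


def triangle_area_3d(a, b, c):
    """Area of a 3D triangle from the cross product of two edge vectors."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    vx, vy, vz = (c[i] - a[i] for i in range(3))
    cx = uy * vz - uz * vy
    cy = uz * vx - ux * vz
    cz = ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def rumple_index(chm, cell_size=1.0):
    """Canopy surface area divided by projected ground area.

    chm is a list of rows of canopy heights (m) at grid nodes; each grid
    cell is split into two triangles to approximate the canopy surface.
    """
    rows, cols = len(chm), len(chm[0])
    surface = 0.0
    for r in range(rows - 1):
        for c in range(cols - 1):
            p00 = (c * cell_size, r * cell_size, chm[r][c])
            p10 = ((c + 1) * cell_size, r * cell_size, chm[r][c + 1])
            p01 = (c * cell_size, (r + 1) * cell_size, chm[r + 1][c])
            p11 = ((c + 1) * cell_size, (r + 1) * cell_size, chm[r + 1][c + 1])
            surface += triangle_area_3d(p00, p10, p11)
            surface += triangle_area_3d(p00, p11, p01)
    ground = (rows - 1) * (cols - 1) * cell_size ** 2
    return surface / ground


# Hypothetical 3 x 3 canopy height grids (m)
flat_canopy = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
rough_canopy = [[10.0, 14.0, 9.0], [15.0, 8.0, 16.0], [9.0, 13.0, 10.0]]
ri_flat = rumple_index(flat_canopy)
ri_rough = rumple_index(rough_canopy)
```

A perfectly flat canopy gives an index of 1, and the index rises as the canopy surface becomes rougher, which is why it can separate structurally complex stands from even-aged ones.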
Figure 2 shows that the spatial distribution of returns can remain irregular despite a high return density or the presence of overlapping strips. The consequences of an irregular spacing pattern are directly evident when building, for example, canopy surface models or digital elevation models [3,33,34]. In our case study, the irregular pattern induced an alternation of local clumps (overscanned areas) and local gaps (areas without data) within the point cloud. This pattern did not affect the accuracy of the model as adding an SP variable to the fixed part did not improve the predictions. However, the residuals were affected by the pattern. This shows that an irregular spatial distribution of returns can increase the uncertainty of predictive models.
The variance function helped to identify the sources of heteroscedasticity of residuals. Residuals were more heteroscedastic with increases in the mean height of returns (Mean_h) and the skewness of the area distribution of triangulated returns (Skew_area); see Figure 5 and Table 6. This suggests a higher MV uncertainty in high stands (e.g., mature forest stands) and with irregular lidar scanning patterns. Natural high stands tend to have a more heterogeneous canopy height structure compared to low stands [35]. An irregular scanning pattern over these stands could substantially over- or under-characterize the height distribution. Yet these stands are preferentially cut during forest operations. Their MV, therefore, needs to be precisely predicted. Conversely, low stands tend to have a more homogeneous canopy height structure. The over- or under-characterization of the height distribution due to the irregular scanning pattern would be less substantial.
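The Skew_area variable, the skewness of the area distribution of triangulated returns, can be sketched as follows. The code assumes that the returns have already been triangulated in the horizontal plane (for example with a Delaunay triangulation) and that the triangles are supplied as index triples; the function names and sample coordinates are hypothetical, and the population (biased) skewness estimator is an assumption.

```python
def triangle_area_2d(p, q, r):
    """Planar area of a triangle from its (x, y) vertices."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0.0:
        return 0.0  # all triangles equal in area: perfectly regular pattern
    return m3 / m2 ** 1.5


def skew_area(points, triangles):
    """Skewness of triangle areas for returns triangulated in the XY plane."""
    areas = [triangle_area_2d(points[i], points[j], points[k])
             for i, j, k in triangles]
    return skewness(areas)


# Hypothetical returns: a tight clump next to a sparsely sampled area
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (4.0, 0.0), (4.0, 4.0)]
regular_triangles = [(0, 1, 2), (1, 3, 2)]
irregular_triangles = [(0, 1, 2), (1, 3, 2), (1, 4, 3), (3, 4, 5)]
skew_regular = skew_area(points, regular_triangles)
skew_irregular = skew_area(points, irregular_triangles)
```

Evenly spaced returns produce triangles of similar area and a skewness near zero, whereas clumps next to gaps yield many small triangles and a few large ones, hence a positive skewness.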
We further tested the robustness of the corrected MV model for a low-density scenario, as found in some operational contexts (pulse density ≤ 2 m⁻²). We obtained results similar to those of the previous analyses. The RSD was initially 27.4 m³ ha⁻¹ and was estimated at 3.7 m³ ha⁻¹ multiplied by the variance function once the heteroscedasticity of the residuals was taken into account. Mean_h and Skew_area remained the best combination of covariates for the variance function. The spatial distribution of returns should therefore also be considered when planning lidar surveys or, at least, when analyzing acquired lidar data.
Pooling lidar strips together at the plot level increased the return density (6.4 m⁻² on average). However, our analysis shows that the return density did not influence the prediction of the merchantable volume for the study site (p = 0.74). This result is in accordance with other studies (e.g., [10,11,36]), including some conducted in low return density contexts.
We tested variable transformation to correct for heteroscedasticity. The response and explanatory variables of the MV model were strictly positive (see Table 1 and Table 2), and three variables (MV, Prop_2m, and Ri) had a right-skewed distribution. We thus tested a square root, a cube root, and a logarithmic transformation. The heteroscedasticity was best reduced with the logarithmic transformation. The p-value for the predicted MV parameter was improved; however, it remained significant (p = 0.01). The model had the following form:
Acquiring lidar data with a small scanning angle would also be an ideal solution to obtain more homogeneous data, but the acquisition costs would be greatly increased. Other authors have chosen to exclude from the point clouds either pulses transmitted at high scanning angles [18,37] or irregularly spaced sectors within strips [3]. However, these changes would substantially alter the point cloud structure. A statistical approach, such as the one we have proposed, has the advantage of alleviating these problems while entailing no additional costs. We recommend the use of an additional spatial distribution variable when modeling forest attributes from lidar point clouds that have an irregular spatial distribution of returns.
A practical application of our study is the establishment of reliable uncertainty maps of predicted merchantable volumes. Users can then confidently assess the reliability of the estimates and consequently better plan their harvesting operations.