Forest Site and Type Variability in ALS-Based Forest Resource Inventory Attribute Predictions over Three Ontario Forest Sites

Over the last decade, spatially-explicit modeling of landscape-scale forest attributes for forest inventories has greatly benefitted from airborne laser scanning (ALS) and the area-based approach (ABA) to derive wall-to-wall maps of these forest attributes. Which ALS-derived metrics to include when modeling forest inventory attributes, and how prediction accuracies vary over forest types depends largely on the structural complexity of the forest(s) being studied. Hence, the purpose of this study was to (i) examine the usefulness of adding texture and intensity metrics to height-based ALS metrics for the prediction of several forest resource inventory (FRI) attributes in one boreal and two Great Lakes, St. Lawrence (GLSL) forest region sites in Ontario and (ii) quantify and compare the site and forest type variability within the context of the FRI prediction accuracies. Basal area (BA), quadratic mean diameter-at-breast height (QMD), and stem density (S) were predicted using the ABA and a nonparametric Random Forests (RF) regression model. At the site level, prediction accuracies (i.e., expressed as RMSE (Root Mean Square Error), bias, and R2) improved at the three sites when texture and intensity metrics were included in the predictor set, even though no significant differences (p > 0.05) could be detected using the nonparametric RMANOVA test. Stem density benefitted the most from the inclusion of texture and intensity, particularly in the GLSL sites (% RMSE improved up to 6%). Combining site and forest type results indicated that improvements in site level predictions, due to the addition of texture and intensity metrics to the ALS predictor set, were the result of changes in prediction accuracy in some but not all forest types present at a site and that these changes in prediction accuracy were site and FRI attribute specific. The nonparametric Kruskal–Wallis test indicated that prediction errors between the different forest types were significantly different (p ≤ 0.01). In the boreal site, prediction accuracies for conifer forest types were higher than for deciduous and mixedwoods. Such patterns in prediction accuracy among forest types and FRI attributes could not be observed in the GLSL sites. In the Petawawa Research Forest (PRF), we did detect the impact of silvicultural treatments especially on QMD and S predictions.


Introduction
Over the last decade, spatially-explicit modeling of landscape-scale forest attributes for forest inventories has greatly benefitted from airborne laser scanning (ALS), either as the only data source to model forest inventory attributes [1][2][3], in combination with multi-or hyperspectral imagery [4], or in combination with digital aerial photogrammetry (DAP) to update forest inventories [5] (for an overview see Wulder et al. [6]).ALS has shown great utility for capturing the three-dimensional structure of forest canopies [7] with ALS metrics, such as mean height of ALS returns, serving as suitable predictors of forest structural attributes.In the context of forest resource inventories (FRI), attributes such as timber volume (VOL), aboveground biomass (AGB), basal area (BA), quadratic mean diameter-at-breast height (QMD), and stem density (S) have been successfully estimated using ALS metrics, especially with area-based approaches (ABA), in a variety of forest ecosystems worldwide (e.g., [2,3,[8][9][10][11][12][13][14]).These attributes are crucial for the assessment of carbon and growing stock, size-class distributions and timber assortment, and sustainable forest management (SFM) at local, provincial, national, and global scales.
Many studies have focused on what ALS metrics to include in the predictor set to improve FRI attribute predictions.Furthermore, an increasing number of studies are focusing on testing the generality and transferability of ALS-based predictions of FRI attributes over larger landscapes and for different forest sites to determine if relationships between key ALS metrics and FRI attributes remain consistent over multiple sites and multiple forest cover conditions [1,3,8,11,[15][16][17]. Kotivuori et al. [17] found that forest structure and the specific ALS sensor and acquisition parameters have considerable effect on FRI attribute predictions when they compared nationwide to regional-level prediction models.The identification of key ALS metrics for the prediction of FRI attributes is important because over time, a large set of ALS metrics have been developed from normalized height (z) information contained in the ALS data.Given this plethora of ALS metrics, various authors have identified points of concern (e.g., [16,18,19]).For example, strong intercorrelations between metrics can cause problems in feature selection, i.e., the selection of a subset of predictors for a parsimonious and robust model, and model generalization for sites with different vegetation and terrain conditions [18][19][20].They can also hamper the interpretation of the relationships between ALS metrics and predicted FRI variables [21].Large numbers of highly correlated and redundant metrics can increase the likelihood that the selected predictors are sensitive to stochastic "noise" rather than general trends in the data [22].This, in turn, may adversely affect model generalization, i.e., the quality of the predictions may decrease when applied to new data [23,24].Based on reviews of the literature, White et al. [25,26] conclude that ALS metrics related to height, the variability of height, and canopy cover/density are often the most reliable predictors of FRI attributes.Others, e.g., Bouvier et al. [16], have argued that ALS metrics used for the prediction of FRI attributes should also incorporate ALS metrics that characterize the vertical structure of forest stands.Even though in most studies these ALS metrics explain the majority of variation within FRI attributes, additional ALS metrics may be required to improve FRI attribute predictions in more complex forest ecosystems, e.g., forest ecosystems with a large variety of tree species or ones undergoing partial-harvest silviculture, or when modeling more complex forest stand attributes such as diameter distributions.
Based on the utility of texture metrics derived from optical imagery for forestry applications (e.g., [27,28]), texture metrics have also been developed from ALS-based canopy height models (CHM) [29][30][31][32].These texture metrics allow for a more detailed characterization of horizontal forest structure or upper canopy variability, i.e., in the prediction of species [29], growing stock attributes [31,33], and the spatial arrangement of trees [30,32].Ozdemir and Donoghue [31] showed that texture metrics generated from ALS-derived canopy height models (CHM) explained tree size diversity (tree height and QMD diversity).Niemi and Vauhkonen [33] found that texture metrics improved predictions of VOL and BA in Finland's boreal forest and helped cluster forest stands into groups with increasing maturity and stocking characteristics.Hence, texture appears to capture upper canopy variability which can be indicative of stand development stage [33].Intensity metrics are another set of ALS metrics that, until recently, have generally been excluded from ALS metric sets for the prediction of FRI attributes because of the radiometric heterogeneities in uncalibrated intensity values due to system-and environment-induced distortions [34].These distortions degrade the intensity quality and hence may reduce the performance and robustness of the predictive models applied.However, some studies have incorporated intensity-based ALS metrics to improve ABA growing stock [35], diameter distribution models [36], tree species classification [37], and gap fraction [38].The utility of intensity data is primarily attributed to its ability to discriminate among tree species and characterize foliage distribution within the canopy [39].Shang et al. [36] postulated that intensity metrics may capture the permeability or layering of the canopy.For example, early forest stand development stages are characterized by simple forest structure which results in a relatively low overall canopy density/high canopy permeability.In such stands the authors observed low variance of intensity values.
The variability in FRI attribute prediction accuracy within forest types has been addressed either by predicting FRI attributes separately by forest type [9,10,13,[40][41][42][43][44][45] or by predicting FRI attributes over the entire forest site and subsequently extracting predictions (and error) by forest type (e.g.[13,44,45]).For example, Pitt et al. [44] found that the quality of FRI attribute predictions varied with forest type regardless of which prediction method used.Studies in boreal and temperate regions in Europe (e.g., [40,41,43,46]) found more accurate predictions of FRI attribute estimates for conifers than deciduous or mixed forest types.However, Spriggs et al. [45] found that BA and S were more accurately predicted for broadleaved species than conifers in the Great Lakes, St. Lawrence (GLSL) forest region in Ontario.The authors postulated that estimation models fit dominant forest type better.Hence, prediction accuracies for coniferous species are better in boreal regions (where conifers are dominant), whereas prediction accuracies for deciduous species are better in deciduous dominated forests such as the GLSL.In addition, Heurich and Thoma [42] found that not only forest species composition but also forest management practices influenced the relationships between FRI attribute prediction accuracies and the ALS-based predictors.Their FRI prediction accuracies showed no significant differences between deciduous and coniferous forest areas in a montane mixed forest in Germany which they attributed to selective harvesting activities within some of the conifer stands and that subsequently lowered the overall predictive accuracies in their conifer forest area.
Here, we examine the inclusion of CHM texture and intensity metrics on FRI attribute prediction of BA, QMD, and S because these metrics characterize stand structural properties not characterized by ALS metrics derived from the normalized height information.We test these metrics at three study sites in Ontario that vary in terms of forest complexity.We then compare forest site and forest type variability in the FRI attribute prediction accuracies within and between these sites to determine how FRI predictions and errors vary.We ask the following questions.
(1) How does the inclusion of CHM texture and/or intensity information affect the FRI attribute prediction accuracy at the overall site level?We hypothesize that the addition of texture and intensity information will benefit predictions more for complex forest ecosystems, i.e., the GLSL, than for structurally simpler forest ecosystems, i.e., the boreal forest region.(2) How does inclusion of CHM texture and/or and intensity information affect FRI attribute prediction accuracy at the forest type level?Are FRI attribute prediction errors within forest types significantly different from each other within and between sites and how does it change when additional texture and/or intensity metrics are used in the predictions.

Study Sites
We conducted our research across three forest sites in northern and central Ontario, Canada to capture some of the diversity that exists within Ontario's managed forest ecosystems (Figure 1).One forest site is representative of the boreal forest region (i.e., Hearst Forest (HF)) and two sites represent the GLSL forest region (i.e., Petawawa Research Forest (PRF) and Haliburton Forest and Wildlife Reserve (HFWR)), the transitional zone between the northern boreal forest and the eastern deciduous forest of North America.Even though PRF and HFWR are both in the GLSL forest region, they do differ in terms of species composition.Due to differences in annual precipitation, soils, and management practices, PRF tends to be dominated by conifer species while HFWR is more hardwood dominated.For forest management planning purposes, Ontario recognizes eight provincial forest types that have been grouped by ecological and site characteristics such as species composition and ecosite: (i) white and red pine (white and red pine mixedwood stands); (ii) jack pine; (iii) upland conifers (predominantly mixed spruce, jack pine, and fir stands on upland sites); (iv) lowland conifers (predominantly black spruce stands on low, poorly drained sites); (v) mixedwood (mixed stands containing spruce, jack pine, fir, poplar, and white birch); (vi) poplar; (vii) white birch; and (viii) tolerant hardwoods (predominantly maple and oak).Variability does exist within these forest types, e.g., mixedwoods and tolerant hardwoods are more heterogeneous than jack pine.This can be attributed to varying stand development and seral stages, stand age, understory species composition, natural disturbances, and the impact of silvicultural practices.The three study sites overlap in provincial forest types to some extent, i.e., forested wetlands (lowland conifers), spruce forests (upland conifers), intolerant hardwoods (poplar and white birch), and mixedwoods stands can be found at all three sites.However, these forest types and sites are managed differently, i.e., spruce forests are managed at HF and unmanaged at HFWR.Tolerant hardwoods are only found in the GLSL forest region sites and the pine forest types are only encountered at HF and PRF.

Hearst Forest (HF)
The Hearst Forest (49 • 40 N, 83 • 40 W) is located in northeastern Ontario and is the largest of the three sites (1,200,000 hectares) of which more than 1,000,000 ha is productive forest managed by Hearst Forest Management Inc. (HFMI) under a Sustainable Forest License (SFL) [47] (Figure 1a).This means that stands are predominantly managed under a clear cut silvicultural system in an attempt to emulate natural disturbance regimes, i.e., fire.The region's climate is classified as modified continental with mean daily temperatures of −17.9 • C and 17.4 • C in January and July, respectively, and an average precipitation of 830 mm.The greatest amount of precipitation occurs in July (average of 102 mm) and there is an average of 101 frost-free days annually.The most dominant tree species in the Hearst Forest is black spruce (Picea mariana Mill.B.S.P.), being present as a major component in various forest types that cover 67% of the land base.On better drained, more productive lowland transitional and upland sites, black spruce may be found with white spruce (Picea glauca [Moench.]Voss), jack pine (Pinus banksiana Lamb.), balsam fir (Abies balsamea [L.] Mill), and trembling aspen (Populus tremuloides Michx.).In lowland areas (in the clay belt section of the forest) with poor drainage, pure black spruce stands and black spruce in association with cedar (Thuji occidentalis L.) and tamarack (Larix laricina [Du Roi] Koch) are prevalent, whereas in wet but very well drained lowland areas black spruce is found in combination with cedar, tamarack, and white spruce.In upland sites, mixedwood stands with jack pine, black and white spruce, trembling aspen, balsam poplar (Populus balsamifera L.), white birch (Betula papyrifera Marsh.) and balsam fir; intolerant hardwood stands (predominantly trembling aspen and white birch); and jack pine dominated stands are found [47].

Petawawa Research Forest (PRF)
The PRF (45 • 57 N, 77 • 34 W) (Figure 1b) encompasses approximately 10,000 ha and is located within the GLSL forest region.This region's climate is continental and subhumid with mean monthly temperatures of −11.8 • C and 20.3 • C in January and July, respectively, and an average precipitation of 859 mm per year.The greatest amount of precipitation occurs in June with an average of 87 mm and there is an annual average of 139 frost-free days.The overstory of this mixed mature forest is characterized by eastern white (Pinus strobus L.), red (Pinus resinosa Ait.) and jack pine on dry, nutrient-poor sites, and trembling aspen and white birch on sandy to clayey upland sites.Tolerant hardwood species such as sugar (Acer saccharum Marsh) and red maple (Acer rubrum L.) are also abundant, especially on nutrient-rich uplands.Red oak (Quercus rubra L.), albeit in lower abundances, is found on upper, south-, and west-facing slopes with shallow soils while shade-tolerant conifers, such as eastern hemlock (Tsuga canadensis L.) are more commonly found in valleys and on northand east-facing slopes [48,49].The PRF was established to improve the scientific basis for forest management, hence plantations of different species and initial planting densities, white and red pine stands treated with a uniform shelterwood system, and natural unharvested control stands can be found in PRF [10].

Haliburton Forest and Wildlife Reserve (HFWR)
The HFWR (45 1c) is also located in the GLSL forest region.It is a privately owned Forest Stewardship Council (FSC)-certified forest covering 23,800 ha.The forest has an extensive management history, starting in the mid-1800s with the selective removal of white pine.Since the 1960s, the forest has been managed using various forms of partial harvesting to maintain a balance between economic value and ecological integrity.This has resulted in uneven-aged forest stands in addition to some old-growth stands on steeper terrain [50,51].Like the PRF, the climate is continental and subhumid with mean monthly temperatures of −9.9 • C and 18.7 • C in January and July, respectively, and an average precipitation of 1074 mm per year.The greatest amount of precipitation (mixed) occurs in November (average of 116 mm) and there is an average of 104 frost-free days annually.The HFWR is a mixed forest dominated by sugar maple, but with a significant conifer (eastern hemlock and balsam fir) and hardwood presence (American beech (Fagus grandifolia Ehrh.) and yellow birch (Betula alleghaniensis Britt.))[51].

Field Data Collection
For all three study sites, field data were collected close to the year of the ALS data acquisition.All field data were collected in forest stands (average tree height ≥5 m.) using a stratified random sampling design, albeit varying by site.In HF, the study site was stratified by forest type (eight) and development stage (young to over-mature conditions) using an ALS metric (vertical complexity index: VCI) but excluded forest stands impacted by harvest, insect infestation, fire, and blowdown [13].In the PRF a subset of previously established field plots [52] was selected that covered the majority of the provincial forest types and a range of BA conditions.The HFWR was stratified by seven ecotypes and then further subdivided into six levels of canopy openness [53].In the established circular plots, information on tree species, status (living or dead), and diameter at breast height (DBH: 1.3 m) were collected for all trees over a DBH threshold of 9 cm.The center of each plot was geo-referenced using a GPS system (Table 1).The largest number of tree species were present in PRF and HFWR, i.e., both sites included 23 species.Ten tree species were found in the HF dataset.The relative abundance of tree species in the sampled plots varied by site (Table 1).The dominant tree species were eastern white pine, sugar maple, and black spruce for PRF, HFWR, and HF, respectively.To be able to compare FRI attribute prediction accuracies among forest types, each plot was assigned to one of eight Ontario provincial forest types.Because plot numbers in the poplar and white birch were low, these two forest types were merged into one forest type (i.e., intolerant hardwoods).We also merged the jack pine and red and white pine forest types into one pine forest type because HF was the only site including jack pine dominated plots and PRF the only site that included red and white pine mixedwood plots.It should be noted however that the jack pine forest type in HF and the white and red pine forest types in PRF are very different in terms of species characteristics and ecosites that they occupy and hence no comparison between the HF and PRF sites with respect to this merged forest type were made in this study.Additional field data collection and sampling specifications are listed in Table 1.A summary of the field data is presented in Table 2.The cumulative distributions and overall means of BA, QMD, and S in the GLSL forest sites (PRF and HFWR) are more comparable to each other than to the boreal forest site (HF) where overall mean BA and QMD was lower than in the GLSL sites (22.4 m 2 ha −1 compared to 27.5 and 29.3 m 2 ha −1 for BA, and 16.7 cm compared to 22.3 and 22.6 cm for QMD, for HF, HFWR, and PRF, respectively) and overall mean S was higher (1019 stems ha −1 compared to 757 and 881 stems ha −1 for HF, HFWR, and PRF, respectively).We also see a larger range of BA and S values in the boreal compared to the GLSL sites (Table 2).

ALS Data Acquisition and Processing
ALS data for all three forest sites were collected during the growing season (i.e., during leaf-on conditions).Each of these data sources were acquired in different years with different sensors and sensor specifications (Table 3).The ALS data for each site had been preprocessed to varying levels.The HF and HFWR ALS data were classified into ground and nonground returns by North West Geomatics Ltd. (Calgary, AB, Canada) and Teledyne Optech Inc. (Vaughan, ON, Canada) respectively and normalized by subtracting the elevation of a modeled triangular irregular network (TIN) based on ALS points classified as ground returns from each ALS nonground point [13,36].For the PRF, the vendor (Teledyne Optech Inc., Vaughan, ON, Canada) collected multispectral ALS data of which only the near infrared channel was used in this study.The latter were not classified into ground and nonground returns by the vendor but were normalized using ground returns from a previous ALS data acquisition over PRF.ALS metrics were calculated at the plot level, using all returns and without any height threshold to filter the point data.The latter was based on the results of Woods et al. [10,12] who found that FRI attribute models using ALS metrics without a height threshold performed better than models with a height threshold.To date, the most common algorithms that can be used for radiometric processing of the ALS intensity data, aim to account for the influence of varying path lengths on the ALS returns [37,54,55].These algorithms require at a minimum the range between the sensor and individual returns, which is contained within the trajectory file of the plane.This trajectory file, i.e., range information, was not available for two of the ALS datasets (HF and HFWR) and, hence, no radiometric normalization was performed on the intensity data in this study.This lack of normalization may limit the application of FRI attribute models that include intensity-based predictors to landscape level predictions.
We divided our set of ALS metrics into three groups: normalized height-based ALS metrics, CHM-based texture metrics, and intensity-based metrics.The first group included measures of central tendency, measures of dispersion, height percentiles, canopy cover/density, and stand complexity metrics based on ALS normalized height (z) values.The second group included eight gray level co-occurrence matrix (GLCM) texture metrics [56] derived from the 2-m gridded CHM.The GLCM is a tabulation of how often different combinations of pixel gray levels, i.e., CHM pixel values, occur at a specific distance and orientation [57].The GLCM texture metrics that can subsequently be calculated can be categorized as contrast, orderliness and descriptive statistics.We used two to three texture metrics per category since many of the GLCM texture metrics are correlated [57].The last group included ALS intensity metrics consisting of measures of central tendency, dispersion and cumulated intensity at height percentiles.See Table A1 for the complete list of ALS metrics.
We performed the majority of calculations to derive the ALS metrics in R version 3.4.1 [58] using the lidR package version 1.5.0 [59].The GLCM texture metric calculations were performed in Definiens eCognition (version 9.0) [60].The GLCM was calculated using all CHM pixels within each object, i.e., each of the sample plot extents, unlike window-based GLCM texture metrics, and for all directions (0 • , 45 • , 90 • , 135 • ) to ensure that the texture metrics were directionally invariant.To reduce border effects, CHM pixels bordering the sample plots directly were also used in the GLCM calculation [60].

Statistical Analysis
Our major analyses are summarized in Figure 2. First nonparametric models were constructed to predict BA, QMD, and S at our three sites using four different sets of ALS metrics as predictor variables.With the four sets of predictor variables we set out to test whether BA, QMD and S predictions in boreal and GLSL sites improved with additional characterization of the upper canopy (also known as texture) and/or additional characterization of canopy permeability (also known as intensity).We then assessed the performance of these four RF models in each study site and at the forest type level using %RMSE, %Bias, and R 2 .Lastly, we tested if the prediction errors on the site and forest type level were significantly different from each other using Friedman's and Kruskal-Wallis tests.
The methods to predict and map FRI attributes vary greatly and range from parametric ordinary least square (OLS) regression to nonparametric approaches such as Random Forests (RF [61]) [26].There is no consensus on which of these methods performs best as this depends on how model performance is defined, the purpose of the analysis, available resources, data characteristics, and underlying biophysical conditions of the forest [62].For the operational prediction of FRI attributes, e.g., in Ontario [13,44] and elsewhere [63][64][65], nonparametric machine learning approaches have been shown to offer advantages.These studies found that RF, with its ability to handle complex, high-dimensional interactions between predictors [66], had equivalent results when compared to parametric methods in terms of accuracy and precision of the FRI attributes predicted for a boreal forest in northern Ontario [13].Penner et al. [13] stated that one important advantage of RF modeling was the use of cross-validation during model development which resulted in relatively robust models.RF's relative ease of wall-to-wall mapping of the predictions, excluding the need for (i) precise forest-type information at the spatial resolution of the predictive model(s), (ii) forest polygons with tree species information, or (iii) stratifying the [13,65], gives the RF modeling approach a significant advantage from an operational perspective.
Hence, to predict the three FRI attributes over the three different sites we applied RF.In the RF approach, a virtual forest or ensemble of regression trees is built using bootstrapped (with replacement) sample plot calibration data.The ensemble of regression trees is allowed to grow to maximum predictive accuracy [67], while bootstrapping minimizes the correlation between residuals of single trees [68].
Forests 2018, 9, x FOR PEER REVIEW 9 of 26 forest in northern Ontario [14].Penner et al. [14] stated that one important advantage of RF modeling was the use of cross-validation during model development which resulted in relatively robust models.RF's relative ease of wall-to-wall mapping of the predictions, excluding the need for (i) precise forest-type information at the spatial resolution of the predictive model(s), (ii) forest polygons with tree species information, or (iii) stratifying the [14,67], gives the RF modeling approach a significant advantage from an operational perspective.Hence, to predict the three FRI attributes over the three different sites we applied RF.In the RF approach, a virtual forest or ensemble of regression trees is built using bootstrapped (with replacement) sample plot calibration data.The ensemble of regression trees is allowed to grow to maximum predictive accuracy [69], while bootstrapping minimizes the correlation between residuals of single trees [70].We used leave-one-out cross-validation (LOOCV) to assess model performance over the full range of forest conditions at each of the forest sites because plot numbers varied greatly among the three sites and plot numbers were low at PRF.We calculated the average bias (absolute and relative), RMSE (absolute and relative) as a measure of error spread, and the coefficient of determination (R 2 ) to evaluate the predictive performance of the RF models at the site and forest type level.Bias, RMSE and R 2 were calculated as follows.We used leave-one-out cross-validation (LOOCV) to assess model performance over the full range of forest conditions at each of the forest sites because plot numbers varied greatly among the three sites and plot numbers were low at PRF.We calculated the average bias (absolute and relative), RMSE (absolute and relative) as a measure of error spread, and the coefficient of determination (R 2 ) to evaluate the predictive performance of the RF models at the site and forest type level.Bias, RMSE and R 2 were calculated as follows.
Forests 2019, 10, 226 where N is the number of validation plots, y i is the observed value for plot i, ŷi is the predicted value for plot I, and ȳ is the mean of the observed variable, i.e., BA, QMD, or S.
To compare the FRI attribute prediction error of the different RF models within each site we used the RMANOVA (repeated-measures analysis of variance) [69].The nonparametric RMANOVA test, i.e., Friedman's Test including a post hoc test, evaluates the null hypothesis that the medians of the prediction errors for each of the FRI attributes are the same among the four RF models that used different sets of predictor variables [69].If significant differences exist, the Wilcoxon-Nemenyi-McDonald-Thompson post hoc test performs a comparison on each paired RF model.We used a nonparametric version of the RMANOVA because not all FRI attributes over the three study sites fulfilled the assumptions of a normal distribution and equal variances of the residuals [69].
To compare the FRI attribute prediction errors at the forest type level, we used the nonparametric Kruskal-Wallis test (including a post hoc test).The Kruskal-Wallis test can be used when groups do not have the same sample sizes [69].The Kruskal-Wallis test evaluates the null hypothesis that the medians of the FRI attribute prediction errors are the same among the forest types within each of the sites.If significant differences in medians exist among the forest types, the pairwise Mann-Whitney U-test can be used as a post hoc analysis to identify which forest type pairs have significantly different medians.The nonparametric Kruskal-Wallis test was used because not all of the FRI predictions over the forest types fulfilled the assumption of a normal distribution and equal variances of the residuals.
All statistical analyses were performed in R version 3.4.1 [58] using the caret package version 6.0-77 [70] and randomForest package version 4.6-12 [71] to predict FRI variables.A function by Galili [72] was applied to compare the FRI attribute predictions generated with the four different ALS datasets.The kruskal.test and pairwise.wilcox.testfunctions to perform the Kruskal-Wallis and post hoc tests for the FRI attribute prediction comparisons among forest types are found in the native stats package in R.

FRI Attribute Prediction at the Forest Site Level
Although the prediction accuracy metrics (i.e., %RMSE, %Bias, and R 2 ) for the four predictor variable sets showed considerable variation over the three forest sites and FRI attributes, some tendencies were observed (Figure 3, Table 4).At all three sites, the addition of texture and/or intensity metrics to the predictor set improved %RMSE, %Bias, and R 2 .For the PRF, including intensity metrics in the predictor set always improved (i.e., lowered) %RMSE.In HFWR, including texture and intensity metrics resulted in the lowest %RMSE.For HF, the inclusion of texture metrics improved %RMSE more than the inclusion of intensity metrics; however, the lowest %RMSEs in HF for BA, QMD, and S were observed when all predictors were combined.The largest improvements in %RMSE were observed in S predictions for HFWR and PRF (6% and 3.4%, respectively) when intensity metrics were added to the set of predictor variables.In terms of %Bias, the lowest average bias was observed over all three FRI attributes and sites when texture metrics were combined with our baseline predictors, with the exception of BA in PRF where the lowest average bias was observed using only height-based metrics.Figure 3 illustrates that all RF models were on average over-predicting BA, QMD, and S regardless of which predictor variable set was being used, with the exception of QMD predictions in PRF which were on average underpredicted.Patterns in R 2 (Table 4) corresponded closely with patterns observed in %RMSE.For PRF, R 2 's for BA, QMD, and S were highest when intensity metrics were included.In HFWR and HF, RF models that included both texture and intensity metrics often resulted in the highest R 2 values.The largest improvements in R 2 were observed in S predictions when the predictor sets in HFWR and PRF included intensity metrics (0.48 to 0.67 and 0.27 to 0.41, respectively).Overall, S predictions benefited the most by including texture and intensity metrics, particularly for the GLSL forest region sites, whereas BA and QMD predictions show minimal change/improvement, especially in HF.When prediction accuracy metrics were compared among the three sites, no trends were observed, i.e., %RMSE's for all FRI attributes were lowest in HFWR, the lowest %Bias values were observed in PRF (BA) and HFWR (QMD and S), and R 2 's for all FRI attributes were highest in HF, regardless of which predictor variable sets used.The Friedman's test indicated that none of the distributions of our FRI prediction errors significantly differed from each other (p > 0.05) (Table 5).

FRI Attribute Prediction at the Forest Type Level
At the forest type level we also observed considerable variation in %RMSE, %Bias, and R 2 among FRI attributes and sites (Figures 4 and 5).In PRF and HFWR, we observed the greatest improvements in %RMSE and %Bias in lowland conifers and intolerant and tolerant hardwood forest types when texture and intensity metrics were included in the predictor set.For example, in PRF, %RMSE improved by 10% and 7% in predicted S for lowland conifers and tolerant hardwoods, respectively.In HFWR, %RMSE and %Bias improved in intolerant and tolerant hardwoods by 10% and 9% (%RMSE), respectively, and 10% and 6% (%Bias), respectively, for S predictions.Meanwhile, the least amount of change in %RMSE and %Bias was observed for HF (all forest types show <5% change), when texture and intensity metrics were included.
Including texture and intensity metrics in the predictor set also impacted R 2 values (Figure 5).Changes in R 2 are observed in S predictions for PRF and HFWR, BA predictions for HFWR, and QMD predictions for PRF, but vary among forest types.For example, in PRF we see improvements in R 2 in QMD and S predictions in intolerant hardwoods, red/white pines, and tolerant hardwoods, and decreasing R 2 values of QMD in lowland conifers.In HFWR, improvements in R 2 values were seen in BA and S predictions for intolerant and tolerant hardwoods.Hence, when we combine both site and forest type level results, it appears that changes in site level %RMSE, %Bias, and R 2 , due to the addition of texture and intensity metrics to the predictor set, are the result of the improvement in some but not all forest types present at a site and that improvements are site and FRI attribute specific.
In terms of comparing prediction accuracy between forest types, we see that in the boreal site (i.e., HF), prediction accuracies for conifer forest types (CL, CU, and PIN) were better than for intolerant hardwoods and mixedwoods for all three FRI attributes.For example, the highest R 2 values for BA, QMD, and S were observed in jack pine followed by upland and lowland conifers.In the GLSL forest sites (i.e., HFWR and PRF), such consistent patterns in prediction accuracies among forest types and FRI attributes were not evident, with PRF showing the most variability in prediction accuracy among forest types.For example, for the predictions using all ALS metrics (height-based, texture, and intensity), the differences between the forest types with the lowest and highest %RMSE in PRF were 19.4% (BA), 31.3%(QMD), and 35.4% (S).In contrast, in HFWR and HF the differences in %RMSE between certain forest types were much lower, i.e., 6.7%, 8.6%, and 12.3% in HFWR, and 9.5%, 11.9%, and 15.9% in HF for BA, QMD, and S, respectively.However, no consistency in forest types displaying lowest or highest %RMSE could be determined for any of the sites.
When we tested for differences in BA, QMD and S prediction errors between forest types, all three FRI variables and sites showed significant differences (p < 0.01 to p <= 0.0001) (Figure 6).The Mann-Whitney U post hoc tests indicated that pairwise differences between forest types varied within and between sites for the three FRI attributes.However, in HF and HFWR we observed more significant differences between forest types than in PRF (Table A2).)).

Discussion
Our results demonstrate that FRI attribute predictions improved at all three sites in terms of RMSE, bias, and R 2 , albeit to varying degrees, when texture and/or intensity metrics were integrated with normalized height metrics derived from ALS data.These findings support previous studies (e.g., [33,35,37,38]).Ozdemir and Donoghue [33] found that the CHM-based texture metrics improved predictions of height, DBH, and crown variability over using point-based ALS metrics alone in their  A2).

Discussion
Our results demonstrate that FRI attribute predictions improved at all three sites in terms of RMSE, bias, and R 2 , albeit to varying degrees, when texture and/or intensity metrics were integrated with normalized height metrics derived from ALS data.These findings support previous studies (e.g., [31,33,35,36]).Ozdemir and Donoghue [31] found that the CHM-based texture metrics improved predictions of height, DBH, and crown variability over using point-based ALS metrics alone in their study sites in northeast England.They also noted that improvements in prediction accuracy varied with tree-size diversity indices.Niemie and Vauhkonen [33] tested a wide range of CHM-based texture metrics and found that GLCM texture metrics were one of the texture metrics groups that were moderately successful for predicting VOL, BA, and QMD in a southern boreal forest site in Finland, especially when combined with density metrics derived from the ALS point data.They found, however, that texture metrics predominantly described dominant canopy properties such as total VOL, BA, and QMD, rather than the variability within these properties.
Garcia et al. [35] found that normalized intensity-related metrics provided more accurate estimates of biomass and explained more of the variance in their biomass models in a mixed species Mediterranean forest.Metrics that incorporated intensity at different height percentiles (which can be viewed as a combination of height and intensity information) along with intensity-based measures of canopy closure were selected in their overall and species-specific biomass models.Shang et al. [36] compared the utility of height and intensity metrics for the prediction of size class specific stem densities in the GLSL HFWR site.They found that intensity-based metrics and height percentiles had high RF importance scores for the majority of the size classes and total stem density, indicating their relevance in predicting stem diameter distributions.They contributed this to the fact that the variance of intensity captured the permeability or layering of the canopy which enabled the differentiation between high and low stem density stands.These and other studies [38,39] attributed the improvement in prediction accuracy of forest attributes to the ability of intensity metrics to take into account the amount of foliage, variation in foliage structure, and species composition, aspects that ALS-based height metrics do not necessarily capture.
In our study, at the forest site level, the combination of ALS height-based, texture and intensity metrics (HF and HFWR) or ALS height-based and intensity metrics (PRF) improved the RMSE and R 2 of the BA, QMD and S predictions, whereas the addition of texture metrics in almost all sites and for all FRI attributes improved the average bias.Like Ozdemir and Donoghue [31] we also found that improvements in prediction accuracy, due to the added texture and intensity metrics, varied by FRI attribute, e.g., improvements up to 6% RMSE were observed in S predictions, whereas improvements in BA and QMD never exceeded 1.5% RMSE.For all three sites, RF models to predict S were always the weakest models, which may explain why these models gained the most by having texture and intensity metrics included in the predictor sets.Another indicator of the utility of texture and intensity metrics in our FRI attribute predictions was the RF variable importance measure.For the prediction of S, texture (GLCM dissimilarity (HFWR), GLCM entropy (PRF), GLCM correlation (HF)), and intensity (ISD (PRF and HFWR)) metrics were highly ranked in terms of variable importance at all sites.For BA predictions in HFWR, the ISD metric was the second most important metric when all ALS metrics were incorporated in the predictor set.Another study in PRF also found that CHM-derived texture and other metrics (Roughness, Ruggedness, and Variance) were often highly ranked in terms of RF variable importance [52].
Differences in prediction accuracy improvements among the three sites were also observed, with greater changes being observed more frequently in the GLSL (i.e., HFWR and PRF) sites than in the boreal site (i.e., HF).For example, the %RMSE of the S predictions improved by 3.4% in PRF and 6% in HFWR, whereas S predictions in HF improved by 1.7% when texture and intensity metrics were included in the predictor set.Similar to Hollaus et al. [43] and Bouvier et al. [16], we observed better R 2 values in the boreal, conifer dominated, forest site for all three FRI attributes than in the broadleaved dominated (HFWR) and more mixed (PRF) GLSL forest sites.One reason for observing better predictive accuracies in both HF and HFWR, compared to the PRF, may be attributed to sample size and sample plot size at each of the forest sites.With 387 sampling plots, HF had the largest number of plots, followed by HFWR with 185, and 84 for PRF.In terms of plot size, HFWR had the largest plot size (2500 m 2 ), followed by PRF (1000 m 2 ) and then HF (400 m 2 ).Shang et al. [73] found that sample plot size had a stronger impact on model performance than the number of plots in their diameter distribution predictions in HFWR.However, Fassnacht et al. [74] found that when a combination of increased plot sizes and reduced number of plots was used, the prediction accuracy benefits of the increased plot size decreased.They attributed this mostly to the decreased model performance of a model calibrated with lower sample sizes.In our study, the 84 sample plots in PRF were unable to cover the full range of stand conditions found in PRF; hence, QMD and S prediction accuracies, in particular, were substantially lower compared to prediction accuracies observed in HFWR and HF.A comparison of prediction accuracies with a previous study in PRF [52], which included 223 field plots (plot size of 625 m 2 ), is inconclusive in terms of attributing the decreased prediction accuracy in PRF to the number of sampling plots only.QMD prediction accuracies showed a %RMSE difference of 7% between their predicted QMD % RMSE (17%) and the %RMSE (25%) found in this study.However, in terms of BA predictions, the %RMSE found in this study (22% RMSE) was better than in theirs (25% RMSE).Differences in ALS sensor, wavelength of the ALS sensors, and ALS-based predictor variable sets used in both studies may also contribute to the differences between these studies.In addition, this study used existing field datasets to calibrate the FRI attribute models and hence some field datasets may have fitted the purpose of this study better than others.For example, in HF, the field data were initially collected to support an OLS modeling approach to predict FRI attributes, and purposely sampled all forest types present in HF.In HFWR, the field data collection focused on upland ecosites, which comprises 90% of the HFWR landscape.Some lowland and less common ecosite types were not represented in their sampling design, explaining the lack of lowland conifer data for this forest site.In PRF, the stratified sampling scheme was based on a principal component analysis of a large set of ALS metrics with the intent to capture a wide range of forest structural conditions and hence did not explicitly focus on sampling all forest types [52].In addition, only a subset of these plots were remeasured for the 2016 ALS acquisition and used in the analysis presented here.Lastly, since relationships between stand structure and forest attributes are species-dependent, many studies have found that model prediction accuracy decreases in multi-or mixed-species forest stands [2,8,16,75].However, lower prediction accuracies have also been observed in stands with poorer growth conditions [17] and in forest stands that are more heavily influenced by forest management practices [42].Among our three sites, structural complexity and species composition were expected to be less variable for the boreal forest site, i.e., HF is largely dominated by conifers (70% versus 30% mixed and intolerant hardwoods) [13], and more complex for the temperate forest sites.In HFWR, however, species composition was fairly homogenous, i.e., the dominant species in almost all plots is sugar maple [45].Hence prediction accuracies in both HF and HFWR were better than in PRF, which was the most complex forest site in terms of species composition and structure.
An aspect that negatively affected prediction accuracy at all three sites and across all forest types may be attributed to the modeling approach we chose to use in this study.Despite RF's advantages for wall-to-wall mapping of predicted FRI attributes, the RF approach is also known to underestimate high predicted values and overestimate low predicted values and is not able to extrapolate beyond the range of the training data [25].This bias is exacerbated when sample size is small [76].Fundamentally, RF is a nearest neighbor method where observations are predicted using the averages of response values of neighboring observations [77].Especially in small datasets, data points in the tails of the response variable distribution may differ substantially from its nearest neighbors, which causes the predictions of these data points to move towards the center of the training data, i.e., large response values are underestimated and small values overestimated [76,77].In our study, this bias was observed in all three sites (Figure 3), but especially in PRF, and also over all forest types .RF bias correction strategies that could potentially improve RF predictions have-as of yet-not been well-studied [76], nor have they been implemented in many of the available RF software packages.Hence, no bias correction of the FRI attribute predictions was performed in this study.However, from an operational forest management perspective, knowledge of the proportion of the landscape that is affected by this bias and the location of the forest stands that are impacted by this bias is important.Coulston et al. [78] proposed a Monte Carlo approach to quantify prediction uncertainty within RF regression models.
Maps of such estimates of the RF prediction uncertainty, in addition to the FRI attribute prediction surfaces for each forest sites could subsequently be used to address this.
The %RMSE observed in this study were comparable to other studies that modeled similar FRI attributes at the same forest sites [10,13,44,45,52].These prediction accuracies were calculated at the plot level (or pixel level once wall-to-wall prediction surfaces of the FRI attributes are generated).For operational forest management, however, prediction accuracies at the stand level, i.e., encompassing several pixels, are more relevant and may very well be lower.Naesset [2,79] found in their two-phase modeling approach of FRI attributes that the standard deviation between predicted and observed plot values were substantially larger than the standard deviations in the same attributes for entire forest stands.At the plot level, FRI attribute predictions can be highly variable, however, once plot estimates are averaged over a stand (encompassing several pixels) the variability is reduced and prediction accuracy improves [79][80][81].However, Iqbal et al. [81] showed that %Bias increased when prediction accuracies were calculated at the stand level, possibly because at the stand level those errors are more likely to accumulate.For this study we did not have stand level validation data available for all three sites and hence could not test this for our forest sites or test the existence of variability among sites.
Uniform increases or decreases in prediction performances in BA, QMD, and S were not observed when texture and intensity metrics were included in the predictor set.Instead, site level prediction improvements arising due to texture and intensity metrics were observed due to improvements in certain forest types but not in all, i.e., improvements varied by site and FRI attribute.If we focus on the S predictions, as these predictions improved the most by the addition of texture and intensity metrics, these metrics appeared to be most beneficial for forest types whose prediction performances were lower initially, i.e., lowland conifers (from 56.2% to 46.3% RMSE) and tolerant hardwoods (from 35.9% to 27.9% RMSE) in PRF, intolerant (from 28.8% to 17.3% RMSE) and tolerant hardwoods (from 32.6% to 23.5% RMSE) in HFWR, and intolerant hardwoods (from 40.3% to 36.4% RMSE) and mixedwoods (from 43.1% to 40% RMSE) in HF.The variability in prediction accuracy between forest types and the variability in improvement in prediction accuracy within forest types when additional ALS metrics are incorporated has been observed in other studies (e.g., [13,15,16,42]).Naesset [75], Heurich and Thoma [42], and Kotivuori [17] have pointed out that the structural diversity between forest types (in terms of tree species, crown shape and width, canopy cover and layering, etc.) have considerable effects on the ALS-based metrics and hence on FRI attribute predictions.For example, crown shape differences and the different properties of needles and leaves influence the penetration of laser pulses into the canopy surface and hence influence the height distributions calculated from the ALS data [44,76].In addition, variation in crown width and number of trees, due to differences in site quality, competition and silvicultural practices within different forest types, may greatly affect the relationship between FRI attributes and ALS-based density metrics [75].For example, many plots classified as tolerant hardwoods (with larger tree crowns and lower stem densities) had lower values for the ALS metrics D7-9 compared to upland and lowland conifer plots (with smaller tree crowns and higher stem densities).It is no surprise that we noticed similar variability in relationships between texture and intensity-based metrics and FRI attributes within the different forest sites and forest types as well.Intensity-based metrics may be able to discriminate between tree species with different leaf angle distributions/orientations, e.g., erectophile versus planophile leaf angle distributions in aspens and maples, respectively.The relationships between ALS metrics and forest type specific FRI attributes, especially for lowland conifers, upland conifers, intolerant hardwoods, and mixedwoods in PRF, should be treated with caution because of the low sample sizes (i.e., <10 sample plots) within these forest types.In addition, some ambiguity existed in the definition of forest types, especially for mixedwoods and hardwoods forest types, which may have affected this relationship.Furthermore, relationships between intensity metrics and the FRI attributes were influenced by the different ranges in intensity values among the three ALS datasets which hampers the interpretation.In order to properly compare and understand the relationship between FRI variables and the intensity metrics, a radiometric correction of the ALS intensity values is required [34].Since the key input features, such as the range between the sensor and individual returns, required for the radiometric correction procedure were not available for two of our ALS datasets, radiometric correction was not performed.Hence, more research is required to further our understanding of ALS intensity metrics in relation to the FRI variables and their consistency over different forest sites and forest types and across different sensors and acquisition conditions.
Conifer FRI attribute predictions in HF (i.e., boreal) were consistently better, i.e., lower %RMSE, lower %Bias, and higher R 2 than in mixedwoods and intolerant hardwoods.For example, jack pine predictions consistently had the best R 2 values over all three FRI attributes and mixedwood predictions consistently the worst.This is in line with previous findings [2,41,44,75].For example, Pitt et al. [44] observed that the quality of FRI attribute predictions varied with forest type and that the largest errors were observed in boreal mixedwoods in HF, regardless of which prediction method used.Contrary to the findings in the boreal zone, Spriggs et al. [45] found that in HFWR, prediction accuracy was better for broadleaved species and postulated that estimation models were most suited to the dominant forest type of a site.In the GLSL forest sites, we did not observe either of these trends consistently over the three FRI attributes and two sites.When we compared Spriggs et al. [45] conifer class to our upland conifer forest type and their broadleaved class to our tolerant hardwood forest type, our BA prediction accuracies at the forest type level were similar, i.e., BA RMSE conifer = 7.39 versus 8.26 m 2 /ha, RMSE broadleaved = 5.93 versus 4.72 m 2 /ha.Hence, in both studies, broadleaved stands had lower RMSEs.However, S predictions between the two studies did not demonstrate this trend as clearly.At the forest type level, Spriggs et al. [45] reported an RMSE of 268 stems/ha in conifer stands versus 156 stems/ha in broadleaved stands whereas in this study the difference was much smaller, i.e., 157 and 151 stems/ha, respectively.Also, in terms of R 2 , upland conifers performed better than tolerant hardwoods in our study.Spriggs et al. [45] did not include this as one of their accuracy measures.Differences in the prediction modeling approach and how the field data were split in conifer versus broadleaved versus our four forest types may explain the differences in S prediction accuracies found between the two studies.
Zolkos et al. [15] attributed the variability within and between forest types, in part, to the fact that it is possible to have similar forest stand structures yielding different BA, QMD, and S values, while Heurich and Thoma [42] pointed out that silvicultural treatment within forest types affected prediction accuracies.In our study we noticed this within the pine forest types in PRF.Pine forests were extremely diverse as it incorporated pine plantations, stands that have undergone partial harvesting (uniform shelterwood regeneration cuts) and natural unmanaged red pine/white pine stands.For example, Figure 7 shows that within this forest type in PRF, different BA-QMD, BA-S, and QMD-S relationships were observed than for the other forest types across the three sites.For example, the high QMD variance at low BA values is explained by pine plantations having low QMD and seed tree stands having high QMDs.Similarly, many of the normalized height-based ALS metrics had difficulties distinguishing between pine plantations and stands with residual trees remaining after a partial harvest even though these two stand types yield very different QMD and S values, i.e., pine plantations had lower QMDs and higher stem densities compared to partially harvested stands.Some of the texture and intensity-based metrics seemed to better separate plantations, silviculturally-treated, and natural stands; however, the red pine/white pine QMD and S predictions for PRF displayed high %RMSEs and only small changes in prediction accuracy were observed when texture and intensity metrics were included in the predictor sets.To investigate the prediction accuracy in the red pine/white pine forest type in PRF further, we recalculated %RMSE and %Bias when sample plots were split into natural unmanaged and silviculturally-treated plots (plantations and partial harvested stands).Both %RMSE and %Bias were substantially larger in silvicultural treated plots than in natural unmanaged plots for all three FRI attributes, i.e., %RMSEs for natural and silviculturally-treated plots were 19.1% and 29.3% (BA), 13.8% and 42.2% (QMD), and 32.2% and 71.6% (S), respectively and %Bias were −3% and −6% (BA), 1% and −15% (QMD), and 3% and −9% (S), respectively.Prediction accuracies for unmanaged natural plots were closer to the ones found for the tolerant hardwood forest type, whereas the silviculturally-treated plots together with lowland conifers had the lowest prediction accuracies.Woods et al. [10] observed the effect of stand spacing and silvicultural treatments on the prediction accuracies of ALS-based models of FRI attributes for multiple forest sites throughout central Ontario.Similarly, Heurich and Thoma [42] found that selective harvesting in coniferous stands had a negative influence on the prediction accuracies of FRI attributes in the Bavarian Forest National Park in Germany.These studies may indicate that "standard" ALS-based metrics used to predict FRI attributes are not able to sufficiently characterize silviculturally-treated stand conditions.Ayrey and Hayes [82] demonstrated that conditional neural networks (CNN), which use 3D volumetric pixels or voxels derived from the ALS point cloud rather than a set of traditional ALS-based metrics as model input, outperformed RF and linear mixed models for some of their above ground biomass, tree count, and percent conifer predictions.They indicated that some of the features that the CNN uses to make predictions relate to edges, surfaces and possibly branches within the voxelized point cloud, features that ABA-ALS-based metrics do not capture [82].Their study, however, predicted these attributes over several sites and did not focus on differences in prediction accuracies among different forest types or stand conditions within these sites.attributes in the Bavarian Forest National Park in Germany.These studies may indicate that "standard" ALS-based metrics used to predict FRI attributes are not able to sufficiently characterize silviculturally-treated stand conditions.Ayrey and Hayes [83] demonstrated that conditional neural networks (CNN), which use 3D volumetric pixels or voxels derived from the ALS point cloud rather than a set of traditional ALS-based metrics as model input, outperformed RF and linear mixed models for some of their above ground biomass, tree count, and percent conifer predictions.They indicated that some of the features that the CNN uses to make predictions relate to edges, surfaces and possibly branches within the voxelized point cloud, features that ABA-ALS-based metrics do not capture [83].
Their study, however, predicted these attributes over several sites and did not focus on differences in prediction accuracies among different forest types or stand conditions within these sites.
In this study, we have quantified the variability in FRI prediction accuracy between three Ontario forest sites and between the forest types within these sites.However, we acknowledge that additional research is needed to get better insights into the relationships between prediction error and forest types especially in the more complex GLSL.

Conclusions
In this study we demonstrate that FRI attribute predictions for BA, QMD, and S over three contrasting Ontario sites improves with the addition of texture and intensity-based ALS metrics (i.e., over using height metrics alone).Although no significant differences between predictions were detected using the nonparametric RMANOVA test at the site level, the nonparametric Kruskal-Wallis test indicates that significant differences exist in prediction errors between the different forest types (p < 0.01 to p ≤ 0.0001).Texture metrics seem more relevant for the boreal site but improvements were relatively small, whereas intensity metrics improved predictions in the GLSL sites, especially in terms of stem density.Since the intensity metrics used in this study were derived from raw ALS intensity data, the impact of intensity normalization/radiometric correction on the model predictions requires further research especially since multiple ALS sensors were used in this study.When we combine site and forest type level results, we demonstrate that improvements in site level predictions, due to the addition of texture and intensity metrics to the ALS predictor set, are the result of changes in some but not all forest types present at a site and that these changes are site and FRI attribute specific.Our study found that boreal site prediction accuracies were most consistent among FRI attributes and forest types, and conform to other studies performed in the boreal forest zone.Such patterns in prediction accuracy among forest types and FRI attributes was not observed in the GLSL sites.Varied silvicultural treatments for the pine forest types (i.e., pine plantations, silviculturally- In this study, we have quantified the variability in FRI prediction accuracy between three Ontario forest sites and between the forest types within these sites.However, we acknowledge that additional research is needed to get better insights into the relationships between prediction error and forest types especially in the more complex GLSL.

Conclusions
In this study we demonstrate that FRI attribute predictions for BA, QMD, and S over three contrasting Ontario sites improves with the addition of texture and intensity-based ALS metrics (i.e., over using height metrics alone).Although no significant differences between predictions were detected using the nonparametric RMANOVA test at the site level, the nonparametric Kruskal-Wallis test indicates that significant differences exist in prediction errors between the different forest types (p < 0.01 to p ≤ 0.0001).Texture metrics seem more relevant for the boreal site but improvements were relatively small, whereas intensity metrics improved predictions in the GLSL sites, especially in terms of stem density.Since the intensity metrics used in this study were derived from raw ALS intensity data, the impact of intensity normalization/radiometric correction on the model predictions requires further research especially since multiple ALS sensors were used in this study.When we combine site

Figure 1 .
Figure 1.The three study sites in Ontario, Canada.

Figure 2 .
Figure 2. A flow diagram of the methods used in this study.

Figure 2 .
Figure 2. A flow diagram of the methods used in this study.

Figure 3 .
Figure 3. FRI predictive models for BA, QMD, and S for each of the sites developed using the RF model including all ALS metrics (z, texture and intensity) and leave-one-out cross-validation.Linear fitted lines by site and the 1:1 line are added to show the different model fits by forest site.

Figure 3 .
Figure 3. FRI predictive models for BA, QMD, and S for each of the sites developed using the RF model including all ALS metrics (z, texture and intensity) and leave-one-out cross-validation.Linear fitted lines by site and the 1:1 line are added to show the different model fits by forest site.

Figure 5 .
Figure 5. R 2 of the RF models using different predictor variable sets (H: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) by FRI attribute (top: BA, QMD, and S), forest site (right: PRF, HFWR, and HEF), and provincial forest type (CL, CU, INT, MW, PIN, and THW).

Figure 5 .
Figure 5. R 2 of the RF models using different predictor variable sets (H: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) by FRI attribute (top: BA, QMD, and S), forest site (right: PRF, HFWR, and HEF), and provincial forest type (CL, CU, INT, MW, PIN, and THW).

Figure 5 .
Figure 5. R 2 of the RF models using different predictor variable sets (H: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) by FRI attribute (top: BA, QMD, and S), forest site (right: PRF, HFWR, and HEF), and provincial forest type (CL, CU, INT, MW, PIN, and THW).

Figure 5 .
Figure 5. R 2 of the RF models using different predictor variable sets (H: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) by FRI attribute (top: BA, QMD, and S), forest site (right: PRF, HFWR, and HEF), and provincial forest type (CL, CU, INT, MW, PIN, and THW).

Figure 5 .
Figure 5. R 2 of the RF models using different predictor variable sets (H: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) by FRI attribute (top: BA, QMD, and S), forest site (right: PRF, HFWR, and HEF), and provincial forest type (CL, CU, INT, MW, PIN, and THW).

Figure 6 .
Figure 6.Site and forest type specific BA, QMD, and S predictive model residuals for the four different ALS metric datasets (H only: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics).p-values of the prediction error differences between forest types were acquired using the unpaired nonparametric Kruskal-Wallis test (p-values of the Mann-Whitney U post hoc tests can be found in TableA2).

Figure 6 .
Figure 6.Site and forest type specific BA, QMD, and S predictive model residuals for the four different ALS metric datasets (H only: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics).p-values of the prediction error differences between forest types were acquired using the unpaired nonparametric Kruskal-Wallis test (p-values of the Mann-Whitney U post hoc tests can be found in TableA2).

Table 1 .
Field data specifications.

Table 2 .
Field data summaries (mean and standard deviation) for all sample plots and by provincial forest types.

Table 4 .
Site level BA, QMD, and S predictive RF model accuracies (%RMSE, %Bias, and R 2 , respectively) for the four different predictor sets (H only: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics) using leave-one-out cross-validation.Best models in terms of %RMSE, %Bias or R 2 are indicated in bold.

Table 5 .
Site level prediction error differences between the RF models with different predictor sets (H only: normalized height metrics; H+T: normalized height and CHM texture metrics; H+I: normalized height and intensity metrics; and H+T+I: normalized height, CHM texture and intensity metrics).p-values of the prediction error differences were acquired using the paired, nonparametric Friedman's test.

Table A2 .
Forest type prediction error differences between the RF models (p-values of the prediction error differences between forest types were acquired using the unpaired nonparametric Kruskal-Wallis test, including Mann-Whitney U post hoc tests).