A Comparison of Airborne Laser Scanning and Image Point Cloud Derived Tree Size Class Distribution Models in Boreal Ontario

Airborne Laser Scanning (ALS) metrics have been used to develop area-based forest inventories; these inventories generally include estimates of stand-level, per-hectare values and mean tree attributes. Tree-based ALS inventories contain desirable information on individual tree dimensions and how much they vary within a stand. Adding size class distribution information to area-based inventories helps to bridge the gap between area- and tree-based inventories. This study examines the potential of ALS and stereo-imagery point clouds to predict size class distributions in a boreal forest. With an accurate digital terrain model, both ALS and imagery point clouds can be used to estimate size class distributions with comparable accuracy. Nonparametric imputations were generally superior to parametric imputations; this may be related to the limitation of using a unimodal Weibull function on a relatively small prediction unit (e.g., 400 m²).


Introduction
Forest Resource Inventories in Ontario have been designed to meet long-term (20-year) strategic management planning needs and have traditionally contained photo-interpreted estimates of species composition, age, height, and site occupancy. Recent studies have shown that these data may be augmented with area-based estimates of growing stock (basal area, volume) and average tree size (height, diameter, volume) derived from either Airborne Laser Scanning (ALS) [1,2] or stereo image point clouds (IPC) [3], to facilitate tactical (5-year) and operational (1-year) planning of forest operations. Since harvesting operations have become more mechanized and processing facilities are increasingly optimized for particular products and sizes of raw materials, there is considerable interest in adding information to inventories on the size assortment of the stems.
Modelling, or fitting size class distributions to empirical data, has received considerable attention. Much of the earlier work focused on selection and fitting of an appropriate distribution. Bailey and Dell [4] proposed using the Weibull function for size class distributions. Most of the commonly used distribution functions are unimodal, while many forest size class distributions are irregular and not well characterized by a unimodal function. Mixture models combine two or more distribution functions, resulting in multi-modal distribution functions, which have been used to characterize more complex forests (e.g., [5]). Predicting the modeled size class distribution from ancillary data (e.g., [6]) is generally the next step. This may include ensuring the size class distribution is compatible with other inventory attributes, including total stems and basal area [7]. Alternatively, the diameter at breast height (Dbh) and height distributions can be predicted together [8]. The advantage of using a parametric distribution model is the typically low (one to four) number of parameters that have to be predicted; the key disadvantage is a restriction in model form [9].
Nearest neighbour imputation methods have become popular in forest inventory efforts [10]. Imputation is used to associate expensive but sparse data with inexpensive and spatially comprehensive data [11]. The response variable is measured on a subset of the prediction units in the population (the reference data set), and auxiliary or predictor variables are available for the entire population. Generally, a prediction for a target unit is calculated from a weighted combination of the response variables from observations in the reference data set that are most similar, or nearest neighbours, to the target in terms of auxiliary variables. Nearest neighbour techniques can be used to predict categorical and continuous variables and univariate or multivariate response variables [10]. This ability to predict multivariate response variables makes nearest neighbour imputation particularly promising for the prediction of probability density functions, especially for complex stands with multiple species and a variety of tree sizes [12]. The size class distributions for these stands tend to be multimodal and not easily represented by parametric functions [10]. When the reference dataset is large, k-Nearest Neighbour (kNN) estimation shows promise in predicting relatively broad Dbh classes [13]. However, in many forest inventory applications, ground observations are relatively few and are generally captured from small plot areas (typically 400 m²). RandomForest, another nonparametric imputation method, has shown promise in forest applications using ALS [14].
ALS has been used to predict size class distributions. When height is the size attribute, early work [15] focused on predicting the distribution of tree heights from the distribution of ALS return heights by first generating a canopy model for the calibration data. The canopy area model or three-dimensional canopy volume model was developed for the calibration data using stem-mapped data, including crown measurements. This canopy model was used to generate a theoretical distribution of ALS returns, which was then compared to the actual returns. More recent work [16,17] also predicted the distribution of canopy heights. Rather than requiring tree location in the calibration data, assumptions were made about the spatial arrangement of the trees. Another approach, particularly when diameter is the size attribute, is to predict the size class distribution directly from ALS data without first generating a crown map. Bollandsås et al. [18] used an ALS point density of ~0.7/m² in a Norwegian boreal forest to predict the deciles of the Dbh distribution using most similar neighbour (MSN) and seemingly unrelated regression (SUR). The MSN and SUR predictions generated an unbiased prediction of total basal area, but MSN was better at predicting the number of large trees. Some authors have also included predictors from aerial photographs. Packalén and Maltamo [19] used ALS predictors as well as spectral values and textural features from aerial photos that had been radiometrically corrected against a Landsat 7 ETM+ image.
Recently, ALS-like point clouds have been derived from stereo imagery using pixel matching. A previous study compared ALS and IPC for predicting forest inventory attributes in the Ontario boreal forest [3]. Using an area-based modeling approach, that study obtained comparable accuracies for predictions of forest inventory attributes including basal area, merchantable stem volume, top height and quadratic mean Dbh, but found some loss of precision with the IPC. To date, no studies have compared the use of ALS and IPC in the prediction of size class distributions.
The objective of this study was to investigate and compare the potential of ALS and IPC metrics to predict size class distributions for a management area in a northeastern Ontario boreal forest. Both parametric and nonparametric approaches are evaluated in this comparison.

Materials
The study area and data were used in a previous study [3] to estimate traditional forest inventory attributes.

Study Area
The Hearst Forest is located in northeastern Ontario (Figure 1) and has more than 1,000,000 ha of productive forest, classified into eight forest types (Table 1). The following description is detailed in the 2007 forest management plan [20]. The predominant tree species on the Hearst Forest is black spruce (Picea mariana (Mill.) B.S.P.). Sixty-seven percent of the land base is composed of forest types in which black spruce is a major component. The better-drained, more productive lowland transitional and upland sites, where the Spruce Pine (SP) and Spruce Fir (SF) forest types are found, make up 30 percent of the land base. On these sites, black spruce may be found with white spruce (Picea glauca (Moench) A. Voss), jack pine (Pinus banksiana Lamb.), balsam fir (Abies balsamea (L.) Mill.), and trembling aspen (Populus tremuloides Michx.). The black spruce (SB) forest type makes up 34 percent of the land base and consists of black spruce on lowland areas in pure stands and in association with cedar (Thuja occidentalis L.) and tamarack (Larix laricina (Du Roi) K. Koch). These lowland sites are characterized by poor drainage and moderately-deep to deep (more than 20 cm) organic soil over clay. The productivity of these areas is low to moderate. Lowland conifer (LC) makes up about 3 percent of the area and is a very wet but well drained forest type with strong groundwater seepage dominated by black spruce and often supporting cedar, tamarack and white spruce. Approximately 25 percent of the land base occurs on mineral soils on upland sites associated with mixedwood stands consisting of jack pine, black and white spruce, trembling aspen, balsam poplar (Populus balsamifera L.), white birch (Betula papyrifera Marsh.) and balsam fir. The soils are fine loam to sandy clay, topped by less than 20 cm of organic material. The mixedwood forest type is split into mixedwood conifer (MWC) or mixedwood hardwood (MWH), depending on whether conifers or hardwoods are the majority. Intolerant hardwoods (IH) make up 5.5 percent of the land base and consist mainly of trembling aspen and white birch. Approximately 3 percent of the land base consists of stands that are dominated by jack pine on mineral soils (PJ).

Ground (Field) Data
The majority of the ground data were collected on the Hearst Forest during the summer of 2010, according to a previously documented field protocol [2]. A total of 446 circular, 400-m² temporary sample plots were established throughout the range of development stages within eight forest types (Table 1). As in prior analyses, four plots dominated by cedar with unusually high basal areas were not used. In late 2012, an additional 64 plots, eight in each forest type, were established using the same field protocol. A lower Dbh limit of 9 cm was used. Trees with Dbh > 9 cm were put into 2-cm-wide Dbh classes. Plots with three or fewer Dbh classes were withheld from analyses. Plots with less than 3 m²/ha of live basal area were also excluded, leaving a total of 401 plots. Veteran trees, defined as solitary, large trees that were significantly taller and older than the main canopy, were removed from the ground plot tallies.
The vertical complexity index (VCI) [21] was computed from the ALS data as a means of stratifying the ground plots within forest types. VCI summarizes the vertical distribution of the ALS returns on a scale of 0 to 1 and is computationally similar to the Shannon [22] evenness index, which is used to quantify species diversity and evenness. The index is at a maximum when the frequency distribution is a uniform distribution and decreases as the distribution becomes more peaked. In general, plots with smaller VCI tended to be in younger stands with a right-tailed Dbh distribution. As VCI increases, the distributions tend toward a more symmetric, unimodal distribution. As the VCI approaches 1, the distributions tend to flatten with no clear mode or multiple modes. Plots with high VCI tend to be associated with overmature conditions where the overstorey is starting to break up and an understorey is developing. The stratification is similar to that used by Bollandsås and Næsset [23], who used the Gini coefficient to group by distribution types, ranging from normal, to uniform, to reverse-J. Alternatives to VCI that better characterize forest structural types [24] are available. VCI was used here to ensure the validation data covered a range of conditions, not necessarily to identify forest structural types, and was felt to be adequate for this use.
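As an illustration of how a Shannon-evenness style index behaves, the following minimal Python sketch computes a VCI-like value from a list of normalized return heights. The bin width, the optional fixed maximum height, and the function name are assumptions made for this example; they are not taken from the study or from [21].

```python
import math


def vertical_complexity_index(heights, bin_width=1.0, max_height=None):
    """Shannon-evenness style index of the vertical distribution of returns.

    heights    : normalized return heights (m above ground)
    bin_width  : height bin width in metres (assumed value, not from the study)
    max_height : top of the highest bin; defaults to the tallest return
    """
    if not heights:
        raise ValueError("at least one return is required")
    top = max_height if max_height is not None else max(heights)
    n_bins = max(2, math.ceil(top / bin_width))
    counts = [0] * n_bins
    for h in heights:
        # Clamp so ground returns and the tallest return fall in valid bins.
        idx = min(max(int(h // bin_width), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    # Dividing by ln(number of bins) rescales the entropy to the range 0..1.
    return entropy / math.log(n_bins)


# Returns spread evenly over four 1-m bins give the maximum value.
even = vertical_complexity_index([0.5, 1.5, 2.5, 3.5], max_height=4.0)
# Returns concentrated in a single bin give the minimum value.
peaked = vertical_complexity_index([0.5, 0.6, 0.7, 0.8], max_height=4.0)
```

In this sketch the evenly spread returns score 1 and the fully concentrated returns score 0, which matches the qualitative behaviour described above.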
Eight plots in each forest type, randomly selected throughout the range of VCI, were reserved for validation (Tables 2 and 3).

ALS Data
The ALS data were acquired between 4 July and 4 September 2007 according to the specification provided in Table 4. ALS predictor variables were derived from point-cloud statistics (Table 5) following the methods described by Woods et al. [2].

Image-Based Data
Aerial imagery for the Hearst Forest was acquired with the Leica ADS40 sensor during July and August 2007 as part of the provincial Forest Resources Inventory acquisition effort [25,26]. The data included stereo coverage with panchromatic, red, green, blue, and infrared bands, acquired at a ground sampling distance of 16 cm, and later resampled to 20 cm for the panchromatic and 35 cm for the 4-band multispectral data (Table 4; Figure 1). Photogrammetric pixel matching was completed by the image vendor, using the semi-global matching algorithm [27–30] on 80 cm resolution, 4-band multispectral data. The resulting IPC had an average sampling density of 2.4 pixel matches/m² and described the surface captured on the image (ground, low vegetation, or trees), but the points were not classified as such. These IPCs were then normalized against the ALS-derived digital terrain model (DTM). Normalized IPC statistics and a digital canopy height model were generated for both datasets.

Dependent Variables
The prediction unit was the 400-m² plot. The parametric dependent variable was relative basal area (BA), the fraction of total BA by 2-cm Dbh class on the 400-m² plot. The choice of 2-cm wide Dbh classes was somewhat arbitrary. For the calibration data, it led to an average of 9 Dbh classes with trees present. The lower limit for Dbh was 9 cm, the lower limit considered for merchantability. The total BA per hectare, as well as the fraction of basal area in trees with Dbh > 9 cm, were predicted [31], and used herein to convert relative BA by Dbh class to BA by Dbh class.
The nonparametric dependent variable was the BA/ha by Dbh class on the 400-m² plot. We considered predicting the relative BA by Dbh class, but these predictions would have required scaling to ensure that they summed to one. Since both the relative BA and the absolute BA would require scaling, it was decided to select the raw BA for modeling. These BA predictions were then converted to relative BA by dividing the BA by Dbh class by the sum of the predicted BA by Dbh class. Parametric and nonparametric predictions of relative BA were then compared.
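The two conversions described in this section can be sketched in a few lines of Python. The function names and the example numbers are hypothetical; the total BA and merchantable fraction would come from separate prediction equations such as those in [31].

```python
def relative_ba(ba_by_class):
    """Scale BA (m2/ha) by Dbh class so that the classes sum to one."""
    total = sum(ba_by_class.values())
    if total <= 0:
        raise ValueError("total basal area must be positive")
    return {dbh: ba / total for dbh, ba in ba_by_class.items()}


def absolute_ba(rel_by_class, total_ba_per_ha, merch_fraction):
    """Convert relative BA by class back to BA (m2/ha) by class.

    total_ba_per_ha : predicted total BA (m2/ha), from a separate model
    merch_fraction  : predicted fraction of BA in trees with Dbh > 9 cm
    """
    merch_ba = total_ba_per_ha * merch_fraction
    return {dbh: rel * merch_ba for dbh, rel in rel_by_class.items()}


# Illustrative imputed BA (m2/ha) for three 2-cm classes (made-up numbers).
rel = relative_ba({9: 2.0, 11: 6.0, 13: 2.0})
ba = absolute_ba(rel, total_ba_per_ha=20.0, merch_fraction=0.5)
```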

Independent ALS and Optical CHM Predictors
Both the ALS and IPC data require normalization against a DTM. The IPC point cloud is concentrated in the upper canopy envelope, compared to the ALS point cloud, which is distributed throughout the canopy (Figure 2). The better canopy penetration of ALS makes it much more suitable to the generation of a DTM [32] in a forested environment. For this study, a DTM was generated from the ALS data and the ALS and IPC data were normalized against this DTM.
The ALS predictor variables (Table 5) were derived from point-cloud statistics following previously described methods [2]. Veteran trees generally result in a few high ALS returns that have a large influence on some of the measures of spread, such as standard deviation of ALS returns (STD_DEV) and vertical complexity index (VCI). To reduce the influence of these trees on measures of spread, these statistics were calculated by first removing the top 5% of the ALS returns, and basing the statistic on the remaining 95% of returns. The remaining statistics were calculated using all ALS returns. The IPC data were intersected with the ground plots and independent predictors were generated (Table 5). Not all statistics generated with the ALS point clouds were generated from the IPC. Exceptions were the statistics that are dependent on ratios of the number of points distributed through the canopy (first return divided by all returns [DA], first and only return/all returns [DB], and coefficient of variation [CanCovar]), which could not be calculated for the IPC data.
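The trimming step can be illustrated with a short Python sketch. The function name is hypothetical, and the use of the population standard deviation and integer rounding of the number of dropped returns are assumptions made for this example, not details from the study.

```python
import statistics


def trimmed_std(heights, trim_percent=5):
    """Standard deviation of return heights after dropping the highest returns.

    trim_percent : percentage of the highest returns to discard
    """
    ordered = sorted(heights)
    n_drop = len(ordered) * trim_percent // 100  # integer count to discard
    kept = ordered[: len(ordered) - n_drop]
    if len(kept) < 2:
        raise ValueError("too few returns left after trimming")
    # Population standard deviation is an assumption for this sketch.
    return statistics.pstdev(kept)


# Nineteen canopy returns plus one very high return from a veteran tree.
returns = [float(h) for h in range(1, 20)] + [100.0]
with_veteran = statistics.pstdev(returns)
without_veteran = trimmed_std(returns)
```

With these made-up heights, the single high return roughly quadruples the untrimmed standard deviation, which is the kind of leverage the trimming is meant to remove.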

Parametric or Non-Linear Regression (NLS)
The choice of a 2-cm diameter class interval meant that some Dbh classes had no trees. These could either be treated as missing observations, and not used in the statistical analysis, or as zeroes, and used in the statistical analysis. In this study, we chose to set missing values to 0 for Dbh classes within the range of Dbhs for the plot. For example, if the Dbh range for a plot went from 10 to 32 cm, any Dbh classes within that interval with no trees were assigned zero trees rather than treated as missing. We set the relative frequency one size class larger than the largest Dbh class to zero as well. The result was that missing values were only allowed for Dbh classes larger than the largest Dbh class + 2 cm.
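One way to read the zero-filling rule is sketched below in Python. This is an interpretation rather than the authors' code: it follows the closing statement above, so every class up to one class beyond the largest occupied class is kept (as zero when empty), and only larger classes are marked missing. The function name and the use of `None` for missing values are choices made for this example.

```python
def zero_fill(observed, classes, class_width=2):
    """Assign a value to every Dbh class: observed value, 0.0, or None.

    observed    : {class midpoint (cm): relative frequency} for occupied classes
    classes     : all class midpoints considered, in cm
    class_width : Dbh class width in cm
    """
    occupied = [c for c in classes if observed.get(c, 0.0) > 0.0]
    if not occupied:
        raise ValueError("plot has no trees in any class")
    # Classes up to one class beyond the largest occupied class are retained.
    zero_limit = max(occupied) + class_width
    filled = {}
    for c in classes:
        if c <= zero_limit:
            filled[c] = observed.get(c, 0.0)
        else:
            filled[c] = None  # missing: excluded from the fit
    return filled


filled = zero_fill({11: 0.4, 15: 0.6}, classes=range(9, 22, 2))
```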
Our methods draw heavily from Cao [6], who used the three-parameter Weibull (Equation (1)) to model Dbh distributions. The Weibull probability density function predicts the relative frequency of x, given location parameter "a", shape parameter "b", and scale parameter "c":

$$f(x) = \frac{b}{c}\left(\frac{x-a}{c}\right)^{b-1}\exp\left[-\left(\frac{x-a}{c}\right)^{b}\right], \quad x \ge a \qquad (1)$$

We used the parameter prediction method [6]. First, we fit Equation (1) to each calibration plot using PROC NLIN in SAS, with a = 9.0. Then we used stepwise regression (SAS routine GLMSELECT) to predict the parameters of the plot-level fit from ALS or IPC attributes by forest type. We used a logarithmic transformation to ensure the parameter predictions were always positive. We predicted the natural logarithm of the shape parameter (ln(b)) and the scale parameter (ln(c)) as linear combinations of the natural logarithm of the predictor variables (X1, ..., Xp-1) from Table 5:

$$\ln(b) = \beta_0 + \sum_{i=1}^{p-1}\beta_i \ln(X_i), \qquad \ln(c) = \gamma_0 + \sum_{i=1}^{p-1}\gamma_i \ln(X_i) \qquad (2)$$

Next, we fit the original model as a single equation expressing the difference between two cumulative distribution functions (F(x)), with the location parameter a set to 9.0 cm:

$$y_j = F(u_j) - F(l_j), \qquad F(x) = 1 - \exp\left[-\left(\frac{x-a}{c}\right)^{b}\right] \qquad (3)$$

where y_j is the relative BA in Dbh class j, l_j and u_j are the lower and upper limits of that class, and b and c are expressed through Equation (2). We removed non-statistically significant (probability ≥ 0.05) parameters from the model.
Model (3) was fit by forest type.
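The parametric prediction step can be sketched in Python as follows. The sketch assumes the standard three-parameter Weibull cumulative distribution function, F(x) = 1 - exp(-((x - a)/c)^b), with location a, shape b, and scale c as named in the text. The function names, the predictor name `p90`, and all coefficient values are made up for illustration and are not fitted values from the study.

```python
import math


def predict_parameter(intercept, slopes, predictors):
    """exp() of a linear combination of log predictors, so always positive."""
    linear = intercept + sum(
        slope * math.log(predictors[name]) for name, slope in slopes.items()
    )
    return math.exp(linear)


def weibull_cdf(x, a, b, c):
    """Three-parameter Weibull CDF: location a, shape b, scale c."""
    if x <= a:
        return 0.0
    return 1.0 - math.exp(-(((x - a) / c) ** b))


def class_fractions(a, b, c, midpoints, width=2.0):
    """Fraction of the distribution in each Dbh class, as a CDF difference."""
    half = width / 2.0
    return {
        m: weibull_cdf(m + half, a, b, c) - weibull_cdf(m - half, a, b, c)
        for m in midpoints
    }


# Hypothetical coefficients and a hypothetical plot-level predictor value.
plot = {"p90": 12.0}
shape = predict_parameter(0.0, {"p90": 0.2}, plot)
scale = predict_parameter(0.5, {"p90": 0.8}, plot)
fractions = class_fractions(9.0, shape, scale, range(9, 70, 2))
```

Because the Weibull has positive density for all x greater than a, the class fractions up to a finite upper class sum to slightly less than one; the remainder lies in the right tail.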

Non Parametric or RandomForest Nearest Neighbour Prediction
We used nearest neighbour methods [11] to impute the size class distributions for the target prediction units. The distance measure to identify nearest neighbours was calculated using randomForest and the predictions are referred to as randomForest Nearest Neighbour (RFNN). RandomForest [33] is a nonparametric technique that generates a "forest" of regression trees. Each regression tree is grown using binary partitioning so that at each node, the training data are split into two groups using a single predictor. This binary partitioning continues until each final group ("node" or "leaf") contains a user-specified number of data points. Each tree is grown with a random subset of the training data and the decision variable at each node is drawn from a random subset of the potential predictor variables. The distance between a target prediction unit and each point in the reference data set is one minus the proportion of trees where the target prediction unit is in the same terminal node as the reference observation.
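The distance definition in the last sentence can be illustrated without fitting a forest. The Python sketch below assumes that the terminal-node (leaf) identifier of each observation in each tree is already available; the function names, plot labels, and leaf identifiers are hypothetical.

```python
def proximity_distance(target_leaves, reference_leaves):
    """One minus the share of trees in which two units share a terminal node.

    Each argument lists the terminal-node id of a unit in every tree of the
    forest, in the same tree order.
    """
    if not target_leaves or len(target_leaves) != len(reference_leaves):
        raise ValueError("leaf lists must be non-empty and of equal length")
    shared = sum(1 for t, r in zip(target_leaves, reference_leaves) if t == r)
    return 1.0 - shared / len(target_leaves)


def nearest_reference(target_leaves, reference_table):
    """Identifier of the reference plot with the smallest distance (k = 1)."""
    return min(
        reference_table,
        key=lambda plot_id: proximity_distance(
            target_leaves, reference_table[plot_id]
        ),
    )


# Toy forest of four trees; the leaf ids are made up for illustration.
references = {"A": [3, 7, 1, 4], "B": [3, 7, 2, 9], "C": [0, 0, 0, 0]}
target = [3, 7, 2, 5]
best = nearest_reference(target, references)
```

Here plot B shares a terminal node with the target in three of four trees, so it is the nearest neighbour with a distance of 0.25.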
The R package yaImpute [34] was used to identify the nearest neighbour (k = 1) in the reference dataset, which was then used to impute the array of BA by 2-cm Dbh class, ranging from Dbh9, Dbh11, …, Dbh69, where Dbhi is the Dbh class (i - 1) ≤ Dbh < (i + 1). The function "yai" was used with method = "randomForest" and the supplied defaults, including the number of regression trees = 500 and mtry (the number of predictor variables picked at random) equal to the square root of the number of predictor variables. Unlike the parametric predictions, the data were not stratified by forest type.
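The response array can be built from a plot tally as in the following Python sketch, which applies the class definition (i - 1) ≤ Dbh < (i + 1) with odd midpoints from 9 to 69 cm. The function names are hypothetical, and dropping trees below 9 cm and rejecting trees above the top class are assumptions about how the limits are handled.

```python
import math


def dbh_class(dbh_cm):
    """Odd class midpoint i such that (i - 1) <= Dbh < (i + 1)."""
    midpoint = 2 * int(dbh_cm // 2) + 1
    if midpoint > 69:
        raise ValueError("Dbh is above the largest class (Dbh69)")
    return midpoint


def ba_per_ha_by_class(dbhs_cm, plot_area_m2=400.0, min_dbh=9.0):
    """BA (m2/ha) by 2-cm Dbh class for one plot."""
    expansion = 10000.0 / plot_area_m2  # trees/ha represented by one tree
    table = {i: 0.0 for i in range(9, 70, 2)}
    for d in dbhs_cm:
        if d < min_dbh:  # assumed handling of the lower Dbh limit
            continue
        tree_ba_m2 = math.pi * (d / 200.0) ** 2  # Dbh in cm to BA in m2
        table[dbh_class(d)] += tree_ba_m2 * expansion
    return table


table = ba_per_ha_by_class([20.0, 20.0, 5.0])
```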

Evaluating Fit
The predictions were evaluated by visually comparing the predicted and observed distributions as well as two measures of fit.
The first measure of fit was the index developed by Reynolds et al. [35], which was used to measure the closeness of Dbh predictions to the data. Let $\hat{F}(x)$ be the cumulative distribution function (cdf) of diameters (x) on a plot predicted by the model and $F^{*}(x)$ be the observed cdf. Let w(x) be a weight function and N the number of trees/ha. Reynolds et al. [35] suggested setting the weight to the volume of a tree in diameter class x or the dollar value of the tree. We set the weight to the basal area of a tree with Dbh x. The error index was calculated as the weighted absolute differences in frequencies summed over all diameter classes:

$$e = N \sum_{j=1}^{k} w(x_j)\,\left|\left[\hat{F}(u_j) - \hat{F}(l_j)\right] - \left[F^{*}(u_j) - F^{*}(l_j)\right]\right|$$

where l_j and u_j are the lower and upper limits of diameter class j, x_j is its midpoint, and k is the number of classes. The statistical properties of the index are unknown but the smaller the index, the better the agreement between the predicted and observed distribution.
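A discrete version of such an index can be sketched in Python as a weighted sum of absolute differences between predicted and observed class frequencies. This is an illustrative form, not necessarily the exact computation used in the study; the function names and the default basal-area weight are assumptions for the example.

```python
import math


def tree_basal_area(dbh_cm):
    """Basal area (m2) of a single tree with the given Dbh (cm)."""
    return math.pi * (dbh_cm / 200.0) ** 2


def error_index(predicted, observed, n_per_ha=1.0, weight=tree_basal_area):
    """Weighted sum of absolute frequency differences over Dbh classes.

    predicted, observed : {class midpoint (cm): relative frequency}
    n_per_ha            : number of trees per hectare
    weight              : function giving the weight of a tree in a class
    """
    classes = set(predicted) | set(observed)
    return n_per_ha * sum(
        weight(c) * abs(predicted.get(c, 0.0) - observed.get(c, 0.0))
        for c in classes
    )


pred = {11: 0.5, 21: 0.5}
obs = {11: 0.25, 21: 0.75}
unweighted = error_index(pred, obs, n_per_ha=100.0, weight=lambda d: 1.0)
weighted = error_index(pred, obs, n_per_ha=100.0)
```

With a basal-area weight, a given frequency error counts for more in the large Dbh classes than in the small ones.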
A second measure of fit was the closeness of the quadratic mean diameter (DQ) calculated from the predicted distribution compared to the actual DQ.This measure is relatively insensitive to the shape of the distribution.
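DQ can be recovered from a BA-by-class distribution, as in the minimal Python sketch below. It assumes every tree in a class has the class midpoint as its diameter, which is a simplification made for this example. Because the constant linking Dbh² to basal area cancels, the same function works with absolute or relative BA.

```python
import math


def quadratic_mean_diameter(ba_by_class):
    """DQ (cm) from BA by Dbh class, using class midpoints as diameters.

    Stems in a class are proportional to BA / midpoint**2, so
    DQ = sqrt(sum(BA) / sum(BA / midpoint**2)).
    """
    total_ba = sum(ba_by_class.values())
    stem_term = sum(ba / (d * d) for d, ba in ba_by_class.items() if ba > 0)
    if stem_term == 0:
        raise ValueError("distribution contains no basal area")
    return math.sqrt(total_ba / stem_term)


# One stem at 10 cm and one at 20 cm: BA is proportional to 100 and 400.
dq = quadratic_mean_diameter({10: 100.0, 20: 400.0})
```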
Our 400 m² prediction unit is relatively small, resulting in some jagged Dbh distributions (e.g., Figure 3c). Moreover, graphical summaries have limited usefulness when plots contain a relatively small (<40) number of trees [9]. Larger plots or areas of prediction, such as stands or blocks, may be expected to have smoother distributions. Therefore, we grouped validation plots by forest type as well as VCI class, a measure of the entropy of the vertical distribution of the ALS returns, to better assess prediction results.
The error index and DQ prediction errors were subjected to repeated measures analysis of variance. In this study, four error indices and four DQ prediction errors were calculated for each plot corresponding to two remote sensing methods (IPC vs. ALS) and two prediction methods (SUR vs. RFNN). The following hypotheses were tested.
H0: ALS = IPC. The error index does not depend on remote sensing technique (ALS vs. IPC).
H1: ALS ≠ IPC. The error index depends on remote sensing technique (ALS vs. IPC).
and
H0: SUR = RFNN. The error index does not depend on statistical technique (SUR vs. RFNN).
H1: SUR ≠ RFNN. The error index depends on statistical technique (SUR vs. RFNN).
As well, the interaction between remote sensing and statistical technique was tested. Similar hypotheses were tested for DQ prediction errors. Forest type was included as a fixed-effect, blocking variable.

Parametric Predictions
The parametric predictions of relative BA were converted to BA (m²/ha) by diameter class using the prediction equations developed in [31] to predict total BA and the proportion of BA in trees with Dbh > 9.0 cm. We found that the results were sensitive to the size of the Dbh class interval and the size uniformity of the trees on the sample plots. Single-storey, even-aged stands could generally be well represented with a unimodal distribution (Figure 3a). With a prediction unit as small as 400 m², size distributions were often not unimodal, particularly with small (2-cm wide) Dbh classes, and were poorly predicted with a Weibull function (Figure 3b). If wider Dbh class intervals were used, it is possible that the plot in Figure 3b would resemble a uniform distribution. Figure 3c is an example of a possible two-storied stand. These sample distributions illustrate one of the difficulties of parametric prediction: the parametric approach to predicting distributions is limited by the capability of the distribution function to adequately represent the true distribution. Part of this is due to the relatively small plot size used, but the complexity of the tree size distributions also has an impact. The variables selected by stepwise regression (Equation (2)) varied by forest type, resulting in final parametric models that varied by these strata. The ALS and IPC predictions were similar and reasonably close to the actual distribution for unimodal distributions (Figure 4a). The prediction of more complex distributions was less satisfactory (Figure 4b,c). For the ALS-based predictions, one or more of VDR_95, VCI_95, and p90 appeared in each model except in the LC forest type. For the IPC-based predictions, CC values, particularly CC2, CC14, and CC16, occurred in most models. There was more similarity in predictors across strata within data sources (ALS vs. IPC) than across data sources for the same stratum. A concern of the parametric modeling approach was the overestimation of basal area in the larger Dbh classes as the model form tries to smoothly feather the basal area distribution.

Nonparametric Predictions
The nonparametric predictions were more irregular than the parametric predictions (Figure 5). RFNN essentially looks for nearest neighbours in the reference dataset in terms of the ALS predictor attributes (P40, d10, etc.). The reference dataset did not appear to be large enough to find close neighbours for all the observed size classes (e.g., Figure 5b) but, in contrast to the parametric modeling approach, the nonparametric models were able to predict multi-modal distributions. In addition, the nonparametric models do not predict tree sizes outside the range in the reference data set. The imputations here come from the single nearest neighbour (k = 1), resulting in jagged distributions. The number of neighbours used could be increased and weighted by the distance from the target observation, which would likely result in smoother distributions. We retained a single nearest neighbour because it yields a realistic, if jagged, distribution at the plot level. When the imputations from the individual prediction units are aggregated to a stand or block, the resulting distribution will be smoother. The differences between the ALS and IPC nonparametric predictions were greater than those observed for the parametric predictions. When validation plots were grouped by VCI class (i.e., representing an area larger than 400 m²), the actual distributions were smoother and the ALS and IPC parametric predictions were very close (Figure 6).
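The smoothing effect of using more than one neighbour can be sketched in Python. This variant was not used in the study (which used k = 1); the inverse-distance weighting, the small constant added to avoid division by zero, and the function name are all assumptions made for illustration.

```python
def weighted_imputation(neighbours, eps=1e-6):
    """Distance-weighted average of the neighbours' BA-by-class arrays.

    neighbours : list of (distance, {class midpoint: BA}) for the k nearest
                 reference plots
    eps        : small constant so a zero distance does not divide by zero
    """
    if not neighbours:
        raise ValueError("at least one neighbour is required")
    weights = [1.0 / (dist + eps) for dist, _ in neighbours]
    total_weight = sum(weights)
    classes = sorted({c for _, ba in neighbours for c in ba})
    return {
        c: sum(w * ba.get(c, 0.0) for w, (_, ba) in zip(weights, neighbours))
        / total_weight
        for c in classes
    }


# Two equally distant neighbours with different occupied classes.
blended = weighted_imputation([(0.5, {11: 4.0}), (0.5, {13: 2.0})])
```

Averaging spreads basal area over the classes occupied by any of the neighbours, which smooths the imputed distribution but can also place basal area in classes that no single reference plot combines.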

Measures of Fit
The error index was computed using 2-cm Dbh intervals. The smaller the index, the better the agreement between the predicted and observed distribution. The differences in error index were small and within the range of variation within the strata. Again, the plot size, 400 m², is small for estimating meaningful size class distributions. Predictions are expected to perform better when prediction units are aggregated over a larger range, for example, at the stand polygon or harvest block level. Results from accuracy assessments vary with the scale (in terms of area) of the assessment [36]. Therefore, the error indices were also calculated after combining all the validation plots within a stratum (Figure 7). We obtained similar results when comparing the actual DQ to the DQ calculated from the predicted Dbh distribution (Figure 8). The DQ errors for lowland black spruce (SB) were particularly small, due in part to the smaller DQ. For nonparametric predictions, the average bias, when calculated at the plot level, was larger than when the bias was calculated by strata. The opposite was true for the parametric predictions. In general, the nonparametric predictions were marginally better (lower error index) than the parametric predictions. The differences between the ALS and IPC predictions were smaller. The largest error indices were associated with lowland black spruce (SB), the forest type with the most samples, but a relatively small average tree size.
Repeated measures analysis of variance results show statistically significant differences between statistical techniques (p = 0.0145) but not between remote sensing techniques or forest types (p ≥ 0.1812) for DQ error (Table 6). For the error index, there are statistically significant differences by forest type and for statistical technique (p ≤ 0.0199) but not by remote sensing technique (p = 0.6048).

Table 6. The repeated measures analysis of variance results are given. The probability that the null hypothesis (H0) is supported by the data is given. Statistically significant differences (probability < 0.05) are shaded.

Discussion
The differences between the ALS and IPC predictions are minor and not statistically significant when compared in terms of error index or DQ. Comparable results for ALS and IPC suggest that the choice of remote sensing can be based on other considerations. If an appropriate quality DTM is not available, IPC is not a viable option because the point cloud requires such a DTM for normalization. If a DTM is available, IPC may be the preferred method, since the point-cloud data may be generated at minimal additional cost when the same imagery is required for species interpretation to support inventory work.
The differences between parametric and nonparametric predictions when compared in terms of error index and DQ were statistically significant. Agreement between the actual and predicted size class distributions was poor for some plots, due in part to the small prediction unit size and relatively narrow Dbh class width used. Ground sampling is expensive and plot size is generally balanced by the number of samples. Prediction unit size is a function of the spatial resolution desired for the inventory attributes, and ~400 m² has been found to be a suitable size for both ALS [2] and IPC [3] predictions in the boreal forest. Larger prediction units may be used, but as the size increases, so does the cost of calibration and validation. Decisions regarding Dbh class width can affect parametric estimates. However, the resulting models can be used to predict the relative BA for any Dbh interval. In contrast, nonparametric imputation is tied to the Dbh class width associated with the training data: classes can be aggregated but not split. The inflexibility of the nonparametric predictions with respect to Dbh class width led to the choice of a relatively narrow Dbh class width. Alternatively, broader Dbh classes could have been used and finer intervals could be interpolated from the nonparametric predictions.
Parametric distributions have positive relative frequencies for all positive diameters, creating the need to truncate the right side of the distribution to avoid the over-estimation of BA into larger Dbh classes. Several options exist, including predicting the maximum Dbh for each pixel, setting relative frequencies below some threshold to zero, and capping predictions at the maximum observed Dbh in the calibration data. Parametric predictions often benefit from stratification into similar forest types, which can increase modeling costs. Nonparametric imputations generally use reference observations that are close or similar by some measure, obviating the need for stratification.
BA (m²/ha) by Dbh class, rather than relative BA by Dbh class, is generally of interest. For the parametric models, this requires a prediction of total BA in trees larger than the minimum Dbh. For this study, those predictions were developed earlier [31]. The error index and DQ used here to evaluate the predictions used relative BA.
This study deliberately focused on merchantable-sized trees (Dbh > 9.0 cm). Small errors in the BA associated with small trees can lead to unreasonably large estimates of stems/ha. A previous study [31] derived estimates of total BA and the relative BA in merchantable-sized trees. The BA in smaller trees can thus be estimated. It could also be partitioned into size classes, but we have found that error rates are high.
Diameter distribution predictions are best suited to aggregates of prediction units (e.g., stands or other areas of interest), which complicates validation since the spatial definition and measurement of large plots on the ground can be difficult and prohibitively expensive. In this study, we attempted to get around this obstacle by aggregating and validating predictions by forest type or VCI class. Recent advancements in harvesting equipment (e.g., MultiDat™ data loggers [37]) allow the measurement and recording of the stem diameter (and many other parameters, including stem taper and product volume) of each tree as it is harvested, presenting the best opportunity for validating inventory predictions.
Alternatively, a tree-based approach can be used. First, tree crowns are delineated using ALS [38], and Dbh is estimated from the tree crowns [39]. The relative accuracies of tree-based and area-based estimates vary, depending on the attribute [40] and the degree to which crowns are visible from above, leading to research into combining tree- and area-based approaches (e.g., [41]).

Conclusions
Area-based forest inventories have been developed using ALS metrics and generally include estimates of per hectare values (BA, volumes, etc.) as well as mean tree attributes (e.g., DQ). Tree-based ALS inventories contain much desired information on individual tree dimensions. The addition of size class distributions to area-based inventories bridges some of the gap between area- and tree-based inventories. This study examined the potential of ALS and IPC to predict size class distributions in a boreal forest. Given an accurate digital terrain model, both ALS and digital stereo aerial photos provided size class distributions that were not statistically different in terms of error index and DQ error. Nonparametric imputations were associated with lower error index and DQ error values than parametric imputations. This may be related to the limitation of using a unimodal Weibull function on a relatively small prediction unit size. Generally, it is expected that predictions based on aggregated prediction units will perform better than comparisons on a single prediction unit.

Figure 1. The location of the Hearst Forest is given. Insets provide examples of the aerial imagery used in the study.

Figure 2. Point clouds are illustrated as three-dimensional plots for a sample 400 m² forested plot (ALS left panel; IPC right panel). ALS returns are distributed throughout the canopy and include the forest floor. In contrast, IPC elevation measures exist only for features shown on the image; in this case, the canopy surface. If a quality DTM exists, then the IPC-derived canopy surface measures can be translated into actual height values.

Figure 3. The actual size class distributions are compared to predictions using Equation (1) for three sample plots: plot 34 (a), plot 6 (b), and plot 44 (c). The dependent variable is the relative proportion of BA within each diameter class.

Figure 4. The actual size class distributions are compared to predictions using Equation (1) for three sample plots: plot 34 (a), plot 6 (b), and plot 44 (c). The dependent variable is the relative proportion of BA by diameter class. The error indices for (a) are 1.76 (ALS) and 1.61 (IPC); for (b) are 3.88 (ALS) and 3.18 (IPC); and for (c) are 1.43 (ALS) and 1.78 (IPC). Parametric predictions have been converted to BA by Dbh class to permit comparisons.

Figure 5. The actual size class distributions are compared to nonparametric predictions for three sample plots: plot 34 (a), plot 6 (b), and plot 44 (c). The dependent variable is the relative proportion of BA by diameter class. The error indices for (a) are 1.45 (ALS) and 1.90 (IPC); for (b) are 2.94 (ALS) and 3.78 (IPC); and for (c) are 1.98 (ALS) and 1.98 (IPC).

Figure 6. Parametric and nonparametric relative basal area by diameter class results are given for the black spruce and mixed hardwood validation plots aggregated by VCI class.

Figure 7. The stratum-level error index is given for the validation data. All the plots within a stratum were combined prior to calculating the index.

Figure 8. The average difference between the DQ calculated from the actual distribution and the DQ calculated from the predicted size class distribution is shown. The difference is given for the validation data, averaged by stratum. All the plots within a stratum were combined prior to calculating the bias.

Table 1. The forest types used in the analysis are described.

Table 2. The number of plots is given by forest type and vertical complexity index (VCI) class. The number of calibration plots is followed by the number of validation plots (in brackets).

Table 3. The ground plots are summarized by forest type (including the eight plots from each forest type that were reserved for validation). The mean is followed by the range (in brackets).

Table 4. The ALS specifications are given.

Table 5. ALS and image point cloud predictor variables are defined. Field names with a "_95" ending are based on the lower 95% of returns.