Remote Sensing Lidar Sampling Density for Forest Resource Inventories in Ontario, Canada

Over the past two decades there has been an abundance of research demonstrating the utility of airborne light detection and ranging (LiDAR) for predicting forest biophysical/inventory variables at the plot and stand levels. However, to date there has been little effort to develop a set of protocols for data acquisition and processing that would move governments or the forest industry towards cost-effective implementation of this technology for strategic and tactical (i.e., operational) forest resource inventories. The goal of this paper is to initiate this process by examining the significance of LiDAR data acquisition (i.e., point density) for modeling forest inventory variables for the range of species and stand conditions representing much of Ontario, Canada. Field data for approximately 200 plots, sampling a broad range of forest types and conditions across Ontario, were collected for three study sites. Airborne LiDAR data, characterized by a mean density of 3.2 pulses m −2 were systematically decimated to produce additional datasets with densities of approximately 1.6 and 0.5 pulses m −2. Stepwise regression models, incorporating LiDAR height and density metrics, were developed for each of the three LiDAR datasets across a range of forest types 831 to estimate the following forest inventory variables: (1) average height (R 2

However, standards for the acquisition, processing and application of LiDAR data for forestry and natural resources inventory and management are not well defined, nor are they likely to be standardized across all inventory variables or forest types.For example, data acquisition standards that determine the optimal acquisition of LiDAR data for forestry (in terms of forest variable estimation and cost efficiency) have not been universally defined, nor is there documentation of expert knowledge defining suitable acquisition criteria (i.e., survey design) for estimating forest variables.These standards are required for the forest industry to gain the best possible return from the technology across a range of forest conditions and for specific operational requirements, as well as to maintain consistency across surveys within regions.This deficiency must be addressed to provide the forest sector, both in industry and government, with a distinct competitive advantage in achieving truly sustainable forest management that encompasses economic, ecological, and social values.
The overall goal of our research has been to examine acquisition standards for collecting, processing and analyzing LiDAR data to derive forest inventory attributes that lead to the production of an eFRI for Ontario forests.A number of researchers have examined the impacts of different sensor and survey parameters on estimating forest inventory variables [23][24][25][26][27][28][29][30][31][32][33].It has been shown that the plot-level vertical distribution of LiDAR pulse returns remains relatively consistent with flying altitude, albeit with some subtle differences [27,28].Naesset [32] also examined the effects of different sensors, flying altitudes and pulse repetition frequencies on LiDAR-derived metrics for estimating mean tree height and timber volume for Norway Spruce (Picea abies (L.) Karst) and Scots Pine (Pinus sylvestris L.).Results revealed minor differences in precision for the various acquisition parameters and systematic differences between acquisitions of up to 2.5% for mean tree height and 10.7% for timber volume.However, it is not clear as to what impact pulse repetition frequency has on these estimates, since this variable could not be isolated between acquisitions, due to integrated effects of different sensors and flying heights.
Further, it has been demonstrated that pulse power has significant impacts on canopy attribute characterization, a variable that will vary with sensor pulse repetition frequency and flying altitude [26].The minimum distance between first and last returns also appears to increase with increasing flying altitude, potentially altering the statistical distribution of LiDAR returns within a forest canopy [31].However, Lim et al. [34] examined the statistical nature of 23 LiDAR-derived height and density metrics for two LiDAR sampling densities (data acquired on separate acquisitions at different altitudes).Only a very small number of metrics corresponding to the tails of the distribution of the laser canopy heights differed between the two surveys, indicating that plot-level data characterized by higher laser sampling densities do not necessarily result in richer data for biophysical variable estimation.Similarly, Bater et al. [33] also observed that most LiDAR first return vegetation height metrics did not differ between flight lines of identical sensor and survey parameters, but with differing point densities in areas of overlap.The authors concluded that when sensor setting and data acquisition parameters are held constant, and time dependent forest dynamics have not changed, LiDAR data are suitable for forest monitoring.
The above studies provide substantial insight into the effects of sensor characteristics and survey designs on LiDAR data point distributions, metrics and variable estimation.However, it is difficult to isolate single sensor or data acquisition parameters when trying to examine their effects, due to their integrated nature and co-dependency.In addition, these studies also exhibit different experimental designs across contrasting forest environments, which make comparison difficult [32,33].
Specifically, a key question that has yet to be isolated and fully addressed, and that the forest industry continues to ask as it considers operationalizing the use of LiDAR in forest resource inventories, is: What is the optimal point density for predicting forest inventory variables?It is still not clear how LiDAR data collected at different point densities impacts the estimation of a full range of forest biophysical variables for forest ecosystems across Ontario.Point density is a function of flight and sensor parameters which continue to evolve with the development of new sensor technologies.These developments will continue to impact data acquisition costs.This research focuses specifically on sampling density in order to determine the impacts of LiDAR point density on the prediction of forest inventory variables, independent of sensor or flight parameters.It is assumed that lower LiDAR point densities will translate into reduced data acquisition costs, always a consideration when conducting forest resource inventories.To investigate this question, we examined the impact of three point densities (3.2, 1.6, and 0.5 pulses m −2 ) derived from the same LiDAR data acquisition on the prediction of several forest inventory variables for forest types common across Ontario.In this manner, we were able to isolate the effect of sampling density on the estimation of forest biophysical variables for a range of forest ecosystems.

Study Areas
The three study areas were chosen to exemplify the majority of Ontario's commercial forest landbase.These included the Swan Lake Research Forest (SLRF), Petawawa Research Forest (PRF) and Romeo Malette Forest (RMF) (Figure 1).

Swan Lake Research Forest
The SLRF is a 2000 ha forest located 250 km north of Toronto within the Algonquin Provincial Park (45°28′N, 78°45′W) (Figure 1).Elevation at the site ranges from 412 to 587 m above sea level (a.s.l.).The site lies on the Precambrian Shield and is characterized by rolling hills and high rocky ridges that are separated by valleys, scoured by glaciation.Outwash flats, ablation moraines, and drumlinoid deposits provide soil deposits ranging from coarse to medium texture [21].The Algonquin Dome, due to its elevation, has a climate that is generally more cool and wet than its surrounding areas [35].Based on climate normals from the climate station at Huntsville, Ontario, the SLRF has a mean annual temperature of 5.5 °C (January mean (−10.2 °C); July mean (19.4 °C)).The average annual precipitation is 1,032 mm with 746 mm falling as rain whereas the average annual snowfall is 286 cm [36]

Petawawa Research Forest
The PRF is located approximately 200 km west of Ottawa and 180 km east of North Bay, just east of Chalk River, Ontario (Figure 1).Climate normals for PRF include a mean annual temperature of 4.3 °C (January mean (−13.0 °C); July mean (19.2 °C)).The average annual precipitation is 853 mm with 651 mm falling as rain.Average annual snowfall is 204 cm [36].PRF lies on the southern edge of the Precambrian Shield with its topography strongly influenced by glaciation and post-glacial outwashing.The terrain is dominated by: (i) extensive sand plains of mostly deltaic origin; (2) imposing hills, shallow, sandy soils, and bedrock outcrops; and (3) gently rolling hills with moderately deep, loamy sand containing numerous boulders [21].Elevations in the area range from 140 to 280 m a.s.l.The research forest encompasses 10,000 ha of mixed mature natural and plantation forest that is representative of the Great Lakes-St.Lawrence Forest and is characterized by eastern white pine, red pine (Pinus resinosa Ait.), trembling aspen, and white birch.Red oak (Quercus rubra L.) dominates poor, dry soils in the area.Boreal forest species from the north and shade-tolerant hardwoods from the south exist on suitable sites.

Romeo Malette Forest
The RMF is located in the northeast portion of Ontario's Boreal Forest near Timmins, Ontario (Figure 1).It has a relatively cool climate with a mean annual temperature of 1.3 °C (January mean (−17.5 °C); July mean (17.4 °C)).The average annual precipitation is 831 mm with 558 mm falling as rain.Average annual snowfall is 313 cm [36].It is an active forest management unit with approximately 532,000 productive forest hectares.The forest is characterized by extensive coniferous stands on poorly drained lowlands and gently rising uplands.The northern portion of the study area (approximately 40% of the forest area) is located on clay sites, best described as relatively flat to gently rolling, interspersed with depressions and eskers [37].The elevation in the north has a narrow range (i.e., from 305 to 320 m), resulting in a high water table and poor drainage across extensive clay deposits [37].In the southern area (i.e., approximately 60% of the study area), the forest consists of glacial deposits of boulder sand till overlying bedrock with elevation ranging from 305 to 381 m a.s.l.Topography is typically rolling with the soils exhibiting good drainage [37].The dominant species are black spruce [Picea mariana (Mill) B.S.P.], white birch, trembling aspen, jack pine [Pinus banksiana Lamb.], eastern white cedar, white spruce, eastern larch, and balsam fir.Species occurring less frequently include black ash, yellow birch, soft maple and red and white pine.

LiDAR Data
Airborne LiDAR data were collected in August 2007 for each of the study areas on a strip basis using an Optech ALTM 3100 mounted in a Cessna Grand Caravan aircraft.The base mission was flown at 1,000 m altitude with a 20° field of view (½ angle), scan rate of 54 Hz, and a maximum pulse repetition frequency of 100,000 Hz.This configuration resulted in a cross-track spacing of 0.499 m, an along track spacing of 0.572 m, an average pulse density of 3.2 pulses m −2 , and a swath width of approximately 475 m.The LiDAR data were classified as ground or non-ground returns by the vendor using the TerraScan software and proprietary algorithms.

Ground Reference Data
The forest types sampled were: (i) tolerant hardwoods (i.e., sugar maple, beech, yellow birch) (Tol-Hwd); (ii) Great Lakes-St.Lawrence pine communities (i.e., white pine, red pine, jack pine) (GrtLks-Pine); (iii) Boreal black spruce (Boreal-SB); (iv) Boreal jack pine (Boreal-PJ); (v) Boreal intolerant hardwoods (i.e., white birch, trembling aspen) (Boreal-IH); and (vi) Boreal mixed woods (Boreal-MW).Ground reference data were collected for the three study areas during the periods of November-December 2007 and May-October 2008.A circular, fixed area plot of 400 m 2 (11.28 m radius) was used for sampling all forest types except the tolerant hardwood group, where a 1,000 m 2 (17.84 m radius) plot size was used to better represent the uneven-aged size class structures present.The centre of each circular plot was geo-referenced with a Trimble Pro XT™ kinematic GPS unit connected to a Hurricane™ antenna, mounted on a tripod.A minimum of 300 GPS points were collected for each post position and later post-processed against a base station to achieve sub-meter accuracy.
Each plot had all trees larger than or equal to 10.0 cm measured for DBH with a diameter tape.Each tree was assessed for species, status (i.e., live or dead), crown class (i.e., dominant, co-dominant, etc.) and visual quality.A Vertex™ hypsometer was used to measure tree height for each tree in the plot.Heights of deciduous species were measured during leaf-off conditions to obtain the most accurate height measurements possible.The forest variables included in the analysis are presented in Table 1.
A total of 32 plots were established in the SLRF and assigned to the Tol-Hwd forest type.Similarly, 35 plots were established in the PRF and assigned to the GrtLks-Pine forest type while 136 plots were established in the RMF, with each plot assigned to one of four forest types (i.e., Boreal-IH, Boreal-MW, Boreal-PJ and Boreal-SB).A summary of the field data for each study site and based on forest type are presented in Table 2.

Data Processing
All LiDAR returns were normalized against a triangulated irregular network (TIN) that was developed using the LiDAR returns that were classified as ground.The process involved subtracting from the original z-value of a return the z-value on the TIN matching its x-y coordinates.No height threshold (e.g., 2 m; [40]) was used to filter the LiDAR data.Preliminary tests indicated that application of a height threshold did not improve model performance.
LiDAR data were decimated according to the methodology described by Raber et al. [41].A decimation level 0 (D0) represented the original dataset characterized by a point density of approximately 3.2 pulses m −2 .The decimation level 1 (D1) LiDAR dataset was derived by taking alternating pulses along each scan line with each scan line retained, thereby increasing the cross track spacing by a factor of two.For the decimation level 2 (D2) LiDAR dataset, every fourth point along each scan line was retained, thereby increasing the cross track spacing by a factor of 4, and every other scan line was retained, resulting in an increase in the along-track spacing by a factor of 2. The systematic decimation resulted in the D1 and D2 LiDAR datasets possessing point densities of approximately 1.6 and 0.5 pulses m −2 , respectively (Figure 2).
Predictor variables were derived statistics from the normalized LiDAR data.In each case, "all" returns were used, and height thresholds were not used to filter point data.Potential predictors included basic statistics of mean height (mean), standard deviation (std_dev), and absolute deviation around the mean height (abs_dev), as well as deciles of LiDAR canopy height (i.e., p10…p90) and the maximum (max) LiDAR height.For each plot, the range of LiDAR height measurements was divided into 10 equal intervals and the cumulative proportion of LiDAR returns found in the first nine intervals provided as many additional predictors (i.e., d1…d9).The final two predictors were calculated as the number of first returns divided by all returns intersecting a sample plot (Da), and the number of first and only returns divided by all returns intersecting a sample plot (Db).The entire RMF was then subdivided into contiguous 400 m 2 (20 m × 20 m) tiles or prediction units (PUs) and the above suite of predictor variables calculated for each, creating a 20 m × 20 m raster surface populated with LiDAR predictors.In this manner, model prediction surfaces are generated for the entire LiDAR coverage (Figure 3).

Statistical Analyses
Multiple stepwise regressions with a significance level of 0.05 were used for constructing models for predicting forest inventory variables.A diagnosis of each model was performed to determine if parametric statistical assumptions were satisfied.The Shapiro-Wilk Test was used to determine if residuals were normally distributed and the Brown-Forsythe Test was used to check for the presence of heteroscedasticity (i.e., unequal error variance).As LiDAR predictors have been reported to be highly correlated, the variance inflation factors (VIFs) for the predictors used in each model were examined.Candidate models where predictors exhibited VIFs greater than 10 were discarded, as these would suggest the presence of multi-collinearity in the predictor data [42].

Effect of Decimation Treatments
For each forest variable, predictions from the models constructed for the different decimation levels were compared to respective observed values to create a multivariate response vector of absolute prediction errors for each plot: where, e0 i(j) is the absolute error associated with the undecimated prediction for plot j within each forest/type i (Tol-Hwd, GrtLks-Pine, Boreal-PJ, Boreal-SB, Boreal-IH and Boreal-MW): and e1 i(j) and e2 i(j) denote similar errors for the decimated predictions, 1.6 and 0.5 pulses m −2 , respectively.Prediction errors were then subjected to repeated measures analyses of variance (RMANOVA), treating the different forests/types as fixed effects, and making comparisons of the different decimation levels (within-subject effects) with multivariate tests (Wilks' lambda).RMANOVA is used when all members of a sample are measured repeatedly under a number of different conditions.Within the context of this study, it was a particular forest variable for a plot that was repeatedly predicted using LiDAR data of three varying point densities.Given repeated measurements, the use of a standard ANOVA is not appropriate [43].We contrasted the different decimation levels to test for increases in prediction error that were proportional and disproportional to the decimation levels applied (i.e., linear and quadratic contrasts, respectively, associated with increasing decimation).With this approach, the following null hypothesis was tested for each forest variable: H 0 : Decimation of the LiDAR point cloud from 3.2 to 1.6 and 0.5 pulses m −2 does not reduce prediction precision; versus H a1 : Decimation of the LiDAR point cloud reduces prediction precision proportional to decimation level; or H a2 : Decimation of the LiDAR point cloud reduces prediction precision disproportional to decimation level.In cases where a decimation × forest/type interaction was indicated, similar analyses by forest/type were used to reveal the source of interaction.

Results and Discussion
For illustration purposes, the models developed for each variable for Boreal-SB and Boreal-IH plot data are presented in Tables 3 and 4. For black spruce, the models typically exhibit very high coefficients of determination (i.e., R 2 and R 2 (adj)) with RMSEs, expressed as a percentage of the predicted means, ranging from approximately 4-19%, typically with only one or two input variables (Table 3).Models developed for variables based on intolerant hardwood plots in the RMF also exhibited high adjusted coefficients of determination (i.e., R 2 (adj) = 0.201-0.939)with RMSEs ranging from 4 to 23%.Variable-by-stand type results indicate that height-related models (i.e., AVGHT; TOPHT; QMDBH) tended to perform well (i.e., RMSEs < 10%) and volume/biomass-related models (i.e., SUMBA; SUMGTV, SUMGMV, SUMBIO) performed moderately well (RMSEs typically 10-20%).Density models tended to exhibit the highest RMSEs (Table 5).
For all forest variables tested, overall model prediction precision varied strongly with forest/type (p ≤ 0.02; Table 6).Boreal-SB variables tended to be predicted with the greatest precision (lowest mean absolute errors) and GrtLks-Pine variables with the least precision.The Boreal-SB stands were quite similar in that the majority were upland sites with mature black spruce of natural origin, whereas the GrtLks-Pine communities were more diverse in terms of species, management, and origin.The range of conditions sampled included unmanaged white pine, shelterwood white and red pine, thinned and unmanaged red pine plantations, as well as some natural jack pine stands.In future work, this group will be subdivided further to better consider the volume/height relationships for these species and management conditions.
With the exception of 2 of the 8 forest variables tested, we generally found little evidence to reject the null hypothesis that decimation of the LiDAR point cloud has no affect on model precision (p > 0.10; Table 6).In overall analyses, mean prediction errors tended to increase with decimation for the variables DENSITY and AVGHT (p ≤ 0.02), but there was evidence to suggest that this situation was not consistent across all forest/types studied (p ≤ 0.10).More specifically, DENSITY prediction errors for Boreal-PJ increased from 226 stems ha −1 at 3.2 pulses m −2 , to 299 and 324 stems ha −1 through decimation to 1.6 and 0.5 pulses m −2 respectively (decimation linear, p < 0.01).
To a lesser extent, prediction errors for GrtLks-Pine increased from 171 stems ha −1 at 3.2 pulses m −2 , to 211 and 192 stems ha −1 through decimation to 1.6 and 0.5 pulses m −2 respectively (decimation quadratic, p = 0.09).Thinning treatments had been applied to these forests/types potentially giving rise to increased error as a function of insufficient sample size to account for a suitable range of density conditions.
Similarly, AVGHT prediction errors for Boreal-MW and GrtLks-Pine tended to increase sharply (30 to 40%) with the highest level of decimation (i.e., 0.5 pulses m −2 ) (decimation quadratic, p ≤ 0.10).However, these examples appear rare in the context of the overall data set and one may argue that with a significance level of 10%, we might expect to observe trends that suggest rejection of H 0 up to 10% of the time simply through random chance alone.Thus, we feel that it is reasonable to conclude that decimation of the LiDAR point cloud from 3.2 to 1.6 and 0.5 pulses m −2 did not reduce the prediction precision of the forest variables tested.
The ability to significantly reduce LiDAR pulse density for forest inventory modeling without affecting prediction accuracy or precision provides significant financial savings in data acquisition and processing.Although not tested in this study, it is anticipated that accurate and precise digital elevation models (DEMs) of a finer scale than possible in the past can also be derived from low density LiDAR data, even in leaf-on conditions.Table 3. Models (and associated statistical descriptors) developed for variables, based on black spruce plots in the RMF.Models are presented for each of the decimation levels (i.e., D0 ~3.2 pulses m −2 ; D1 ~1.6 pulses m −2 ; D2 ~0.5 pulses m −2 ).Similar sets of models were developed for SLRF (Tol-Hwd), PRF (GrtLks-Pine), and RMF (Boreal-PJ, Boreal-SB, Boreal-IH and Boreal-MW).Overall mean absolute error:

Conclusions
The results from this research demonstrate that a point density of 0.5 pulses•m −2 is sufficient for the estimation of forest inventory variables at the plot and stand levels for the different forest types considered in this study.In cases where a decimation effect was observed for a forest variable, the effect may be attributed to differences in model form, specifically as it relates to number of predictors used.This study provides further evidence that low-density LiDAR-based predictions offer significant potential for integration into tactical forest resource inventories for a range of forest ecosystems across Ontario.LiDAR data can provide a number of important surfaces (i.e., bare earth digital elevation model; forest inventory predictor surfaces) that are critical to tactical forest management and planning.Further research into the generation of additional surfaces related to terrain should provide more precise characterization of moisture and nutrient regimes for modeling of forest ecosites.

Figure 1 .
Figure 1.Locations of the three study sites in Ontario, Canada.

π 1 ) 2 ,
Value obtained by summing the squared DBH of each tree in a 0.04-ha plot and converting this sum to area measure in m2   SUMBAGrossTotal Volume (m 3 ha −1 ) [38] SUMGTV = b 1 * DBH 2 *(1 − 0.04365* b 2 ) 2 /( b 3 + (0.3048 * b 4 /Ht)) where b 1..4 are regression coefficients that vary by species, and Ht is tree height in m.Per hectare value is obtained by summing the volume of each tree and dividing by the plot area (ha).SUMGTV Gross Merchantable Volume (m 3 ha −1 ) [38] SUMGMV = SUMGTV * (b 1 + b 2 (X) + b(X 2 )) where b 1..3 are regression coefficients that vary by species, and X = [(1+Hs/Ht)(Dtop 2 /DBH 2 )]; Hs = Stump Height (0.2 m); Ht = Total Tree Height (m); Dtop = Minimum Top Diameter (inside bark) (10 cm).Per hectare value is obtained by summing the volume of each tree and dividing by the plot area (ha).SUMGMV Density (stems ha −Number of live trees 10.0 cm DBH and larger, expressed per hectare.DENSITY Quadratic Mean DBH (cm) where n is stems per plot.QMDBH Average Height (m) The average height of all trees 10.0 cm DBH and larger.AVGHT Top Height (m) The average height of the 100 stems per hectare of largest DBH.TOPHT Aboveground Biomass (kg ha −1 ) [39] SUMBIO = b 1 * DBH b2 where b 1..2 are regression coefficients that vary by species.Per hectare value is obtained by summing the biomass of each tree and dividing by the plot area (ha).SUMBIO

Table 1 .
The forest variables considered for this analysis.

Table 2 .
Field data statistics for each study site and forest type.

Table 5 .
Root Mean Square Errors (RMSEs) for model developed for variables, based on forest inventory plots in the Romeo Malette Forest, Swan Lake Research Forest and Petawawa Research Forest.RMSEs are presented for each of the decimation levels (i.e., D0 ~3.2 pulses m −2 ; D1 ~1.6 pulses m −2 ; D2 ~0.5 pulses m −2 ).Percent RMSEs are presented in brackets.

Table 6 .
Summary of mean absolute errors for the decimation levels tested and p-values from RMANOVA (Wilks' lambda) testing for decimation and forest/type effects and their interaction.(Note: values in bold suggest a loss of precision with increasing decimation for some forest/types).