Using Tree Detection Algorithms to Predict Stand Sapwood Area , Basal Area and Stocking Density in Eucalyptus regnans Forest

Managers of forested water supply catchments require efficient and accurate methods to quantify changes in forest water use due to changes in forest structure and density after disturbance. Using Light Detection and Ranging (LiDAR) data with as few as 0.9 pulses m−2, we applied a local maximum filtering (LMF) method and normalised cut (NCut) algorithm to predict stocking density (SDen) of a 69-year-old Eucalyptus regnans forest comprising 251 plots with resolution of the order of 0.04 ha. Using the NCut method we predicted basal area (BAHa) per hectare and sapwood area (SAHa) per hectare, a well-established proxy for transpiration. Sapwood area was also indirectly estimated with allometric relationships dependent on LiDAR derived SDen and BAHa using a computationally efficient procedure. The individual tree detection (ITD) rates for the LMF and NCut methods respectively had 72% and 68% of stems correctly identified, 25% and 20% of stems missed, and 2% and 12% of stems over-segmented. The significantly higher computational requirement of the NCut algorithm makes the LMF method more suitable for predicting SDen across large forested areas. Using NCut derived ITD segments, observed versus predicted stand BAHa had R2 ranging from 0.70 to 0.98 across six catchments, whereas a generalised parsimonious model applied to all sites used the portion OPEN ACCESS Remote Sens. 2015, 7 7299 of hits greater than 37 m in height (PH37) to explain 68% of BAHa. For extrapolating one ha resolution SAHa estimates across large forested catchments, we found that directly relating SAHa to NCut derived LiDAR indices (R2 = 0.56) was slightly more accurate but computationally more demanding than indirect estimates of SAHa using allometric relationships consisting of BAHa (R2 = 0.50) or a sapwood perimeter index, defined as (BAHaSDen)1⁄2 (R2 = 0.48).


Introduction
Many benefits of forests are strongly influenced by forest structure and density.For example, forest regeneration after timber harvesting or wildfire affects forest structure and density, which can have dramatic effects on forest hydrology, forest growth rates, carbon stores and wildlife habitat.Advancements in Light Detection and Ranging (LiDAR) sensors are providing opportunity for detailed analysis of forest structure and density at unprecedented spatial scales.At landscape spatial scales, LiDAR data are being collected from forests across the world with fine scale information on individual trees and stand structure.New algorithms for analysing these large datasets are being developed and applied to produce information useful for forest management, particularly in the boreal forests of the northern hemisphere [1].
In south-eastern Australia, wildfires have burnt approximately 3 million hectares of forested land since 2003, and in addition, intensive forest harvesting in recent decades has stimulated large-scale changes in forest age.In tall eucalypt forests, replacing old forest with young healthy regenerating forest leads to an overall decline in catchment water yields over the forest regeneration period [2][3][4][5].Understanding and managing water security in Australia depends critically on the availability of efficient methods for quantifying the relationship between forest water use and changes in forest structure and density over the post-disturbance regeneration period.
Characterising forests with LiDAR data is commonly done on two spatial scales: (i) area based, where the biophysical variables are averaged over an area encompassing several trees; and (ii) tree based, where individual tree dimensions are estimated using individual tree detection (ITD) algorithms.In ecohydrology research, forest transpiration (T) is commonly derived from tree-scale measurements of sapwood area (SA1.3) at 1.3 m, and therefore using ITD algorithms to characterise biophysical properties of forests at the individual tree-scale may prove useful for scaling T from a tree-to-landscape level.
Compared to LiDAR research in boreal forests (see [1] for review), relatively few studies have published LiDAR based estimates of forest biophysical properties in eucalypt forests of Australia.Haywood and Sutton [6] estimated plot-level basal area per hectare (BAHa) in Eucalyptus regnans forests of south eastern Australia using simple predictor variables such as height percentiles and summary statistics of the vertical vegetation profile (r 2 = 0.56; RMSE = 14.7 m 2 ).Jaskierniak et al. [7] characterised a multilayered E. regnans forest structure using mixture distribution functions to predict plot-level BAHa with r 2 ranging from 0.6 to 0.89 in 18, 37 and 70 year old stands.In mixed species eucalyptus forests (MSEF), [8] derived plot level LiDAR indices using a Weibull distribution function, height percentiles, and summary statistics to predict stand BAHa and sapwood area per hectare (SAHa) with r 2 of 0.88 and 0.7 respectively.
Kaartinen et al. [9,10] provide a comprehensive review of ITD methods, all of which have been applied in the northern hemisphere.Kaartinen and Hyyppa [11] evaluated the quality, accuracy and feasibility of ITD methods in boreal forests and reported large variation in the quality of published methods with varying laser point densities.It has been shown that ITD performs best with laser scanning density of 5-10 pulses•m −2 , which is higher than the 0.9 and 4 pulses•m −2 available over our study sites.Several studies undertaken in Scandinavian and European forests have demonstrated that it is possible to detect 40%-70% of all trees using an ITD approach [12,13].In deciduous forests, the complex crown shape and structure results in generally lower success rates [14,15].In North American forests, success rates have been largely dependent on canopy density [16].Persson et al. [12,17] used the same ITD algorithm and managed detection rates of 71% for Scandinavian forest dominated by spruce and pine, 51% for coniferous forests, 45% for Bavarian Forest National Park, and 40% for deciduous forests.These studies suggest that ITD rates are highly dependent on forest structure.To our knowledge, we demonstrate for the first time the potential use of ITD methods in native eucalypt forests, which represent a forest structure substantially different to those of previous studies.
Most ITD algorithms generate canopy height models (CHM) to locate local height maxima in LiDAR data, which are then considered representative of the stem positions within the forest [18][19][20].Using this approach, point clouds are typically segmented into crown diameters using watershed algorithms [21], slope-based segmentation methods [12], or gridded methods smoothed with Gaussian filters [20].More advanced ITD methods recognise that heterogeneous multilayered forests, such as E. regnans forests in our study, require a 3D approach to address the segmentation problem [22][23][24].In E. regnans forest, an ITD algorithm needs to be robust enough to differentiate different vegetation strata down the vertical profile, whilst accurately handling different sized overstorey crowns across a heterogeneous forest landscape.
The overarching aim of our project is to acquire an accurate and robust measure of spatial variability in T per hectare across Melbourne's forested water catchments.This required us to develop a methodology that uses LiDAR indices to estimate SAHa, a well-established proxy for T. Our research was conducted in forested catchments supplying approximately 90% of the water used by the city of Melbourne.Considering Melbourne's water catchments consist of approximately 155,000 ha of forested land, predicting treespecific SA1.3 across the landscape is not strictly required by the water management agency-a relationship between plot-level LiDAR indices and SAHa is sufficient for water yield assessment purposes.We argue that a more robust and accurate relationship between plot-level LiDAR indices and field measured SAHa will eventuate if an ITD algorithm removes within-plot LiDAR points that represent trees outside the plot, and includes LiDAR points outside of plots if they represent measured trees withinplots.To confirm this, we compare our ITD derived BAHa estimates to those of [7] who characterised the forest structure across five of the same permanent plot catchments using an area-based mixture distribution function method.
To the authors' knowledge, [8] is the only study that has used LiDAR to capture within catchment variation in SAHa and demonstrated that forest heterogeneity across an elevation gradient was associated with a threefold change in annual T. They applied a SA1.3/BA1.3relationship (R1.3) to predict plot-level SAHa from plot-level BA1.3 measurements, and regressed the SAHa estimate against explanatory variables derived from LiDAR data to spatially extrapolate SAHa across two catchments (136 ha and 87 ha).
Adopting this approach requires the assumption that R1.3 is constant across Melbourne's water catchments.Recently, [25] have shown high spatial variability in R1.3 and the SAHa/BAHa relationship (RHa) across the landscape, and provide a more consistent species-specific relationship observed between SAHa and an index of total sapwood perimeter per hectare, defined as , where SDen is stocking density of overstorey trees per hectare.In order to determine how well LIDAR may estimate , we evaluate our ability to predict BAHa and SDen across all sites.We also measured SA1.3 across an intensively studied 5 ha site in order to determine how well the perimeter index may be used to scale tree-level SA1.

Harvested 5 Hectare Forest
A complete inventory of overstorey SA1.3 and BA1.3 was mapped over a 5 ha harvested site located along the upper reaches of the Great Dividing Range in Toolangi State Forest, approximately 80 km northeast of Melbourne, Australia.Refer to [26] for a detailed description of the site's topography, climate, and soil conditions.The overstorey consists of a pure E. regnans forest stand, originated from an even-aged regeneration caused by a wildfire in 1939, and the regrowth was harvested in the summer of December 2011 (73 years old).We estimated SA1.3 and BA1.3 for all stems across the site with measured stump dimensions data consisting of stump height (HS), diameter at stump height (DS), and diameter 20 cm below stump height (DS20), as well as photo images of stump cross-sections collected two months after harvest in order to differentiate sapwood from heartwood.Sapwood area and BA at stump height (SAS and BAS respectively) were calculated by correcting for lens distortion in each stump's crosssectional image, and then scaling the images to actual cross-sectional area.As described by [26], a mixed effects model was then used to predict SA1.3 and BA1.3 by coupling SAS and BAS estimates with stump dimensions data, and tree and sapwood taper data.
Table 1 provides diameter distributions of overstorey trees across all sites (see Section 2.1.2for description of other sites), and shows the 5 ha site had highly variable SDen and BAHa suitable for calibrating both ITD algorithms.To apply the ITD algorithms, each stump's position was accurately located using a 2 cm resolution orthophoto mosaic of the site after the post-harvest burn.To generate the orthophoto mosaic, images collected using a UAV platform (multi-rotor OktoKopter) were georectified following [25,27].In brief the method applied a photogrammetric technique called structure from motion (SfM) to orientate individual images and generate a dense point cloud with aerial triangulation.The generated point cloud was used to geometrically correct and mosaic the individual images.A detailed outline of the UAV platform's specifications and post-processing procedure are outlined in [25].

Forest Hydrology Research Catchments
Data from the long-term forest hydrology research catchments includes five catchments with 151 permanent growth plots consisting of 1939 wildfire E. regnans regrowth.The catchments were treated in the 1970s to investigate the impact of forest density and age on forest water use.Treatments across the catchments consisted of: selective forest thinning with 50% and 33% BA removal; removal of understorey only; and patch cut removing 54% of the area with clear fell harvesting.The permanent plots were revisited in the summer of September 2008 for measurement of diameter at breast height over bark (DBHOB) of all eucalyptus trees.Exact tree locations within plots were not mapped, and instead trees were allocated into 5 m × 5 m sub-plots.Table 1 provide summary statistics indicating a diverse range of forest conditions across the sites.

LiDAR Data
The LiDAR data were acquired in 2007 and 2008 for the 5 ha site and permanent plot sites respectively.Table 2 provides the flight details and sensor configurations for both flights.Data acquisition over the 5 ha site consisted of 0.9 pulses•m −2 as the flight covered an area encompassing approximately 500,000 ha of the Central Highlands Forest Management Area, whereas the flight over the research catchments was limited to the study site and consisted of 4 pulses•m −2 .In both cases, the LiDAR point cloud was measured relative to sea level and needed to be converted into height above ground in order to represent the vegetation height.For this purpose, the ground was classified with an iterative procedure that built a triangulated model of the ground surface [27,28].All ground classified points were then applied to a thin-plate spline interpolator, using the ANUDEM algorithm [29], to produce a digital terrain model (DTM).The DTM was subtracted from the height of the LiDAR points to produce a point cloud representing the vegetation height.
Overlapping LiDAR flight paths may distort the density of points when some sites are represented by one flight path and others by two overlapping flight paths.To address this problem, a point density map identified strips of overlapping paths, which were delineated and intersected down the middle to define the boundary used to adjoin adjacent flight paths.The dataset's GPS timestamp was then used to segregate the LIDAR into flight paths in order to remove any overlap.

Normalised Cut Segmentation
Individual tree segments are derived using the normalised cut (NCut) method known from image segmentation [30] and developed for LIDAR applications by [23].The NCut method was used to identify overstorey tree segments in LiDAR data, which were used to estimate SDen and BAHa across all sites and SAHa across the 5 ha site.The procedure first subdivides the region of interest (ROI) into a voxel structure with voxels and a voxel space (vp) of 1 m [23].The LiDAR points are assigned to voxels (l, m, n) (l =1, ..., ; m=1, ..., ; n =1, ..., ) of size vp 3 and only voxels with at least one point are used in the segmentation procedure.The reduced voxel space is then represented as a weighted undirected graph G = (V, E), where V is a set of nodes, one for each voxel, and E is a set of edges formed between each pair of nodes.The edges are weighted to represent the affinity between connected nodes, and the weight function wij is defined as [23]: The notation in Equations ( 1) and ( 2) is explained in the following two paragraphs.The Xij and Zij weight the Euclidian distance between voxels, with and representing the horizontal and vertical distances weighted separately to allow for consideration of the prior knowledge of a typical 3D shape of trees of interest.The fraction Gij allows the segmentation to use prior knowledge about stem positions, where is the maximum horizontal distance for each pair of voxels to the nearest stem position.The formulation of Gij assures that voxels nearest to a stem position will most probably belong to the same tree.Section 2.2.4 outlines how prior knowledge of stem positions was determined with a local maximum filtering method.
In Equation ( 2), the parameters, σ , , and σ , are calibrated to determine an appropriate amount of influence Xij, Zij, and Gij have on the weight matrix (W).To keep the computation at a reasonable size, the threshold parameter (rxy) in Equation ( 1) is used to recognise that the similarity between two voxels decreases with increasing distance between them, and all pairs beyond a distance rxy may be assumed to have zero affinity.Introducing rxy avoids the computation of wij for nodes far apart, which represents a great majority of pairs if the ROI is large and vp is small.
The NCut procedure partitions the weighted graph G into two disjoint segments, G1 and G2, by maximising the similarity of all voxels within a segment, and minimising the similarity between segments G1 and G2.For this purpose, Shi and Malik [30] have demonstrated that the following cost function should be minimised: where , ∑ ∈ , ∈ is the sum of weights between segments and , and , ∑ ∈ , ∈ is the sum of weights of all edges ending in segment .Shi and Malik [23,30] show that the minimisation of , may be solved with the following generalised eigenvalue problem: (4) where W is the weighting matrix that represents the weights wij between all nodes of the graph G, D holds the degree of a node i at the diagonal element ∑ , and is an indicator vector that corresponds to the second smallest eigenvalue , from which one can deduce whether a node belongs to , .A frequency histogram of eigenvector values shows that they are real-valued and therefore need to be binarised into segments and by introducing an optimal threshold parameter (Thresbin) that divides the histogram of into two.To optimise Thresbin, we evaluated 50 equally spaced values within the range of in order to select the value that minimised the NCut value in subdividing the graph into segments G1 and G2.Each segment, G1 and G2, was used to form a new graph G, which was applied iteratively using the procedure outlined above to generate smaller segments.In order to prevent the graph from being over-segmented, the procedure stopped if , becomes less than a maximum allowable NCut , value (NCutMax), or the number of voxels representing a segment drops below a minimum (Vmin) value, defined as: (5) where is the minimum crown volume.In Equation ( 4), the size of W and D is n*n, where n is the number of nodes in graph G.It is evident that as the ROI size increases, the size of D and W increases exponentially.Considering all pairs beyond a distance rxy are assumed to have zero affinity, both D and W become sparse matrices and should be stored accordingly to reduce the required computational memory for large ROI calculations.

Calibration of the NCut Procedure
Several parameters control the NCut segmentation process, and their values needed to be calibrated with consideration for computational requirements, LiDAR data resolution, and forest structure of interest.Due to high computational demand, the calibration process required the use of Victoria's Life Sciences Computation (VLSCI) mainframe facility and was restricted to a circular plot 130 m in diameter (area = 1.33 ha).The plot was selected within the 5 ha site for its broad range of stocking density conditions over a relatively small forested region.
Table 3a provides a list of parameter values used to identify the parameter set that produced the most accurate ITD segments across the 5 ha site.The vp and rxy values are largely influenced by the LiDAR resolution, whereas the other parameters are influenced by the forest structure.The calibration process involved a structured search that started off with the largest vp and smallest rxy value in order to evaluate the parameters influenced by forest structure with the least amount of processing time.The NCut algorithm was generally unstable with small rxy values, and consistently became more stable until rxy = 70 m, irrespective of the other parameter values applied.Using rxy of 70 m and the largest vp value, we attempted to reduce the plausible range of all other parameter values before evaluating them with a reduced vp value, keeping in mind that some parameters interact and therefore reducing vp required the Vmin value to be increased, for example.To evaluate the performance of the NCut algorithm using each evaluated set of parameter values, it was necessary to estimate each detected tree's location, which was defined as the detected tree's canopy centroid and was calculated as the mean x-coordinate and mean y-coordinate of all points representing an ITD segment.Tree height was calculated as the ITD segment's maximum height value.The evaluation process also required each ITD segment to be catagorised as a eucalyptus or non-eucalyptus tree.There is generally a large vertical space between the middle-storey and overstorey canopy profile in 70 year old E. regnans forest (see [31] for photos of forest structure).We were therefore able to select only one parameter from Table 3b to represent the minimum height value of the percentile index most predictive at distinguishing eucalyptus from non-eucalyptus tree.In order to identify the parameter set most effective at detecting individual eucalyptus trees, we maximised the following expression to increase the ITD rate: where M is the number of stems matched with only one detected tree closer to a stem than any other stem; C is commission error, defined as the number of stems with more than one detected tree closer to a stem than any other stem; and O is omission error, defined as the number of stems with no detected trees closer to a stem than any other stem.Twenty parameter sets most effective at detecting trees within the circular ROI (130 m in diameter) were evaluated across the 5 ha site.To make the computation more efficient, the 5 ha site was tiled into five sections with a 15 m overlapping buffer to remove erroneous artefacts along the tile boundaries when merging tiles.The parameter set most effective at maximising Equation (3) across the whole 5 ha site was used to predict SDen and BAHa across all sites and SAHa across the 5 ha site.

Cleaning ITD Segments
Predicting overstorey BAHa and SAHa required the LiDAR indices to be strata specific.Seventy-three year old E. regnans forests generally have overstorey canopy base higher than the tops of the middle storey.Preliminary investigations found some of our ITD segments classified as eucalyptus trees had a bimodal distribution of LiDAR points that included the middle-storey vegetation profile.For this reason, we were required to make the ITD segments more strata specific with the use of the Generalised Additive Model of Location, Shape and Scale (GAMLSS) package in R [32].
Following [7], the GAMLSS package was used to summarise the density of each ITD segment with bimodal mixture models made up of a paired combination of the Weibull (WEI), Gumbel (GU) and Normal (NO) distribution functions.This generated six bimodal distributions for each ITD segment and the Akaike Information Criterion (AIC) value was used as a goodness of fit measure to identify the mixture model that summarised the density of ITD segments most accurately [33].The mixture model with the lowest AIC value was then used to estimate the height of the overstorey canopy base.It was assumed that the overstorey canopy base was equal to the mixture model's local minima, unless the lowest local maxima was greater than 30 m, for which the overstorey canopy base was assumed to be the minimum height of the density profile.The eucalypt-specific component of each ITD segment was then calculated as LiDAR points greater than the canopy base.

Local Maximum Filtering
The local maximum filtering method was applied for two purposes in this study.First, it is used to efficiently estimate SDen across all sites for comparison against the more computationally demanding NCut algorithm.Second, it provides the required prior knowledge of stem positions needed to compute , of Equation (2) in the NCut procedure.Following the Metla method in [9], the LMF procedure subdivided the ROI into a grid with 1 m cells to extract the highest LiDAR point value within each cell.The resolution of the dataset meant some cells had no points and were therefore assigned a "NoData" value.Cells with a "NoData" value and with at least two out of eight adjacent cells with a height value were given the median value of adjacent cells within the 3 m × 3 m grid window.The rest of the NoData cells were assigned a zero value.Low valued cells were then defined as those with six out of eight neighbours within their 3 m × 3 m grid window containing a value five m higher than the subject cell.Cells defined as low valued cells were replaced with a median value computed using the neighbouring grid cells with a value five meters higher than the subject cell.
As the resulting canopy height model (CHM) exhibited unnatural looking pits (i.e., cells with much lower values than adjacent cells due to data resolution used), a pit filling algorithm developed by Ben-Arie et al. [34] was used to remove the pits.The CHM was then smoothed using a Gaussian filter that was calibrated to maximise Expression (6) across the 5 ha site by varying the kernel size (k) between 11 and 17 m with 1 m increments, and a standard deviation (σ) between 0.8 and 1.5 m with 0.1 m increments.A Gaussian kernel with k of 13 m and σ of 0.8 m produced the final CHM.A negative image of the CHM was then produced to undertake a watershed segmentation with a drainage direction algorithm that identified the local minimum values [35], the centre of which were assumed to represent tree locations used to estimate SDen and applied to , of Equation (2).

Producing Explanatory Variables
Previous applications of the NCut method have focused on evaluating the algorithm's ability to detect locations of overstorey and understorey trees [23,24,36].In this study we assigned ITD segments to field measured plots in order to directly relate plot-level LiDAR indices to SAHa, as well as indirectly estimate SAHa with LiDAR derived SDen and BAHa estimates using allometric relationships provided by [25].The plots within the long-term research catchments provide an adequate size to predict area based forest attributes, whereas the 5 ha site was tiled into 20 m × 20 m grids in order to generate 100 plots with a complete inventory of overstorey BAHa and SAHa measurements.In total, 251 plots were used in the study.

Assigning ITD Segments to Plots
Regressing plot-level forest attributes with LiDAR indices required each ITD segment and its corresponding tree measurements to be assigned to the same plot.For the 151 permanent plots, information on tree locations was limited to a 5 m × 5 m sub-plot resolution.For this reason, measured plot-level forest attributes were simply regressed against explanatory variables derived from ITD segments with centroids within the corresponding plots.
The 5 ha site consisted of information on stem location, which provided the opportunity to assign ITD segments to the nearest stem by calculating the Euclidean distance between the ITD segment's centroid location and nearest stem location.Considering some stems had omission and commission error, the plot level analysis required us to; (i) merge all ITD segments that were assigned to the same stem, based on the assumption that the NCut algorithm produced commission error when a stem's canopy was oversegmented; and (ii) merge BA1.3 and SA1.3 measurements of stems without an ITD segment with the nearest stem with an ITD segment, based on the assumption that the NCut algorithm's omission error is due to the stem's canopy being assigned to a neighbouring tree's ITD segment.This produced a dataset with one measure of forest attributes assigned to each LiDAR segment.
The LiDAR segments with corresponding field measurements were then used to produce plot-level stand (PLS) segments.Using plot 73 as an example, Figure 1a shows the location of each stem (green dots), centroid of each LiDAR segment (black star), and the lines that assign each stem to each LiDAR segment.To illustrate how LiDAR segments and their corresponding stem measurements were always assigned to the same plot, Figure 1b shows a PLS segment (large purple dots) amalgamated using all LiDAR segments with their centroid in the plot.For example, stem 118 was located in plot 73 whereas the stem's LiDAR segment had a centroid located outside the plot and therefore the stem's field measurements were assigned to the adjacent plot with the stem's LiDAR segment.Stem 112 had the opposite circumstance as it was measured in an adjacent plot but assigned to plot 73 considering the LiDAR segment associated with the stem had a centroid located in plot 73.For these reasons, Figure 1b shows that the LiDAR points representing stem 118 are not included in plot 73, whereas LiDAR points representing stem 112 are included.

Figure 1. (a)
Using the cell that corresponds to plot 73, illustration shows how LiDAR segments are assigned to stems (blue lines) and how omitted stems are assigned to the nearest stem with an ITD segment (red lines) and (b) example shows how LiDAR points representing plot 73 (large purple dots) may be located outside the plot if the stem assigned to the plot has a canopy overhanging the plot, and vice versa, LiDAR segments with points inside the plot may be assigned to the adjacent plot if a stem's ITD segment has a centroid outside the plot.

Explanatory Variables
Table 4 shows a list of explanatory variables derived from ITD segments and regressed against field measured SDen, BAHa and SAHa.In Table 4, plot-level LiDAR indices one to seven were generated by amalgamating LiDAR segments into PLS segments and calculating summary statistics of PLS segments, whereas indices eight to 14 analysed each LiDAR segment within a plot and generating summary statistics of LiDAR segments.Most LiDAR indices in Table 4 are self-explanatory and aim to capture the dimensions of the overstorey canopy structure, whereas LiDAR indices 13 and 14 used the Matlab alphavol function to generate alpha shapes [37], which are generalisations of the convex hull of a spatial point dataset.For a sufficiently large alpha (α) parameter, the alpha shape is identical to the convex hull, and as α decreases, the shape shrinks and gradually develops cavities.We have generated 2-dimensional alpha shapes to estimate canopy spread, and 3-dimensional alpha shapes to estimate canopy volume.

Regression Analysis of LiDAR Indices against Field Measured Forest Characteristics
To predict SDen, BAHa and SAHa with the LiDAR indices in Table 4, we avoided standard regression techniques such as least-squares and stepwise selection considering the high number of candidate predictor variables and the inherent collinearity between many of them.To address our high-dimensional predictor space, we generated models with a shrinkage regression technique called ridge regression, and in doing so, avoided over-fitting with the use of penalties on the sum of the squares of the regression coefficient estimates [38].A short summary of the procedure is provided below, and for a more detailed outline refer to [7].
As the ridge regression procedure keeps all parameters in the model, a parsimonious solution required a pre-screening step that selected a list of 2, 3, 4, and 5 predictor variables with the highest absolute conditional correlation with the response variable.For each list of predictor variables, a family of competing models was generated using a range of parameter shrinkage levels.From each family of models with a predetermined number of predictor variables, models with the most optimal level of shrinkage were identified with a Generalised Cross Validation procedure [39], and the model that offered the best predictive accuracy was identified with the smallest difference between the observed values and those predicted by the model.Considering the quality of the model's estimates used the same data to fit then assess the model, we adopted the "0.632+" bootstrap method to correct for misleading estimates with a pseudo cross-validation procedure [40].

NCut Calibration Process
Our 5 ha NCut calibration site has 75 m tall E. regnans forest, and the species is known to have attained heights over 100 metres across some parts of Melbourne's Water catchments as it is the tallest flowering plant on Earth [41].The forest height at our site makes the vertical voxel space comparatively larger and computationally more demanding than other forest types.Table 5 shows the required computational time and Random Access Memory (RAM) for computing the NCut algorithm with a range of ROI and vp sizes.Table 5 suggests that the computational time may be further reduced by subdividing a ROI into overlapping tiles and then subsequently merging the individual results.The computation time is largely dependent on the implementation of the eigenvalue solution.However it may benefit greatly from technological advancements such as the Intel(R) Math Kernel Library.The threshold rxy parameter reduces the maximum horizontal distance for calculating wij, and we found the NCut algorithm to be unstable using an rxy value of 4.5 m, as suggested by [23], and were required to increase rxy to 70 m.The discrepancy in rxy is likely attributed to the low point density (0.9 points•m −2 ) and therefore low penetration rate of our LiDAR compared to [22].The high rxy value significantly increased the required RAM and processing time for our site.After evaluating all combinations of parameter values listed in Table 3, the following parameter values maximised equation ( 6

Tree Detection for Stocking Density Estimates
Table 6 summarises the performance of the LMF and NCut algorithm across the 5 ha site, which respectively shows 72% and 68% of stems where correctly identified, 25% and 20% of stems were missed, and 2% and 12% of stems were over-segmented.For the NCut method, Figure 2 superimposes the results over a stocking density map to show that the omission error was concentrated within higher stocking density stands (mean (µ): 203 trees•ha −1 , standard deviation (σ): 45 trees•ha −1 ), whereas commission error was more likely to occur within lower stocking density stands (µ: 138 trees•ha −1 , σ: 47 trees•ha −1 ).For each commission error, a line (purple or blue) is drawn showing the distance and direction of predicted trees relative to actual stem location.The stems with omission error are identified with a red line linking them to the nearest stem with an ITD segment.The insert within Figure 2 shows that in some instances the NCut algorithm predicted the correct number of stems for two adjacent stems but the stems had omission and commission error when the two ITD centroids were closest to one stem.This mismatch is likely to occur in tall forests with stems leaning towards canopy gaps.Although these circumstances reduced the number of correctly matched stems in Table 6, omission and commission error of this nature was unlikely to have a large effect on the spatial distribution of SDen per hectare.Table 6.Segmentation accuracy using (a) LMF and (b) NCut method, which shows the number of stems correctly identified (M), with commission error (C) and with omission error (O) over the 5 ha site.
* Using NCut algorithm, the ROI is subdivided into five one ha areas to improve computational efficiency.

Figure 2.
Using the NCut method, location of stems with omission and commission error superimposed over an observed SDen map, and the insert shows circumstances where the algorithm predicts the correct number of stems for two neighbouring stems, but omission and commission error is recorded when both ITD centroids are closest to one stem.
Regarding the LMF method, omission error was also more likely in stands with high stocking density (µ: 185 trees ha −1 , σ: 47 trees ha −1 ), whereas commission error was small and more randomly distributed.Using LMF tree locations across the 5 ha site, a 1 m resolution map of predicted SDen was generated with a 15 m radius search window around each cell in order to relate predicted and observed SDen of superimposed cell values (Figure 3).A systematic error in LMF SDen estimates is evident in Figure 3 and the resulting relationship was used to correct the bias.Across the 5 ha site, the average distance of all stems to the nearest neighbouring stem was 4.3 m, and Table 7a provides the mean and standard deviation of distances between actual and predicted stem locations.Figure 4 illustrates the performance of both algorithms for different sized trees, whereas Table 7b provides the average diameter of stems: correctly matched, with commission error, and omission error.In summary, small stems with close neighbouring trees within high stocking density forest were more likely to be omitted, whereas larger trees within low stocking density forest were more likely to be over-segmented.
The 151 permanent plots did not have actual stem locations mapped, which restricted our tree detection analysis to a plot level.Considering 137 of these plots had a narrow 15 m plot width there was a high likelihood that the canopy of eucalypts >60 m in height leaned in and out of the plot areas.For this reason our results provided poor predictions of plot-level SDen using stem location estimates alone (R 2 = 0.14 and 0.21 using LMF or NCut method respectively).Alternatively, using the LiDAR indices calculated from the NCut ITD segments, Table 8 shows that the ridge regression models were more effective at predicting SDen, and the alpha shape indices that represent the area of canopy spread and canopy volume were often useful predictors.
Table 7.(a) Distance between observed and predicted tree locations for stems correctly matched and stems with commission error; and (b) diameter of stems correctly matched, with commission error, and omission error.

Method Correctly Matched Commission Error Omission Error
(a) Distance (m) (µ ± σ) LMF 1.5 ± 0.9 3.9 ± 1.  Table 8.Across the permanent plot sites, RMSE and R 2 of the ridge regression models and the list of predictor variables used to predict SDen.For the model with all sites merged, the number of trees per plot is predicted.

Plot Basal Area Predictions
Table 9 provides coefficients of determination to show that predictions of BAHa across the permanent plot sites were more accurate using the NCut algorithm compared to the area based methods used by [7].Using the same number of explanatory variables used by [7], the NCut algorithm explained between 7% and 30% (median: 9%) of variation in BAHa that [7] could not explain.In order to produce a parsimonious model that may be applied across a range of E. regnans forest conditions, we merged all plot data into one dataset.Considering each catchment contained plots with varying sizes, Figure 5a shows how the LiDAR indices were first related to the total BA of each plot (BAP) (R 2 = 0.91), rather than BAHa.In order to remove the influence of plot size on the coefficient of determination, each plot's size (AP) and predicted BAP was used to calculate predicted BAHa = BAP(10000/AP), which was related to observed BAHa in Figure 5b (R 2 = 0.74).Relating plot-level BAHa directly to LiDAR indices was only undertaken for indices not confounded by plot size, such as the portion of hits above 36 to 40 m (PH36 ... PH40).Using indices not influenced by plot size, Figure 5c shows that PH37 provided the greatest explanatory power (R 2 = 0.68).Table 9. RMSE and R 2 of the ridge regression model and the list of predictor variables used to predict eucalyptus basal area for all sites.For the permanent plot sites, the R 2 values using the NCut algorithm are compared to R 2 values using mixture distribution functions [7].

Plot Sapwood Area Predictions
Using 20 m × 20 m plots across the 5 ha site, Figure 6 provides a set of relationships between observed and predicted SAHa.It is evident in Figure 6a that with small 0.04 ha plots, SAHa is predicted more accurately using observed BAHa (R 2 = 0.84) than the observed perimeter index (R 2 = 0.58).This result is not consistent with [25] who used larger plots with variable plot sizes ranging from 0.15 to 0.31 ha (µ = 0.21; σ = 0.035) to show is more predictive than BAHa (R 2 = 0.77 and 0.62 respectively).This suggests that the perimeter index requires larger plot sizes for SDen estimates to be representative of the stand's actual stocking density.In Figure 6b we show that predicting SAHa directly with LiDAR indices is more accurate (R 2 = 0.56) than indirect estimates using a two-step procedure that predicts SAHa with LiDAR derived BAHa estimates and a site-specific SAHa/BAHa (RHa) relationship (R 2 = 0.50).In Figure 6c we show that using RHa derived from 15 external sites [25] and LiDAR derived BAHa estimates resulted in SAHa being systematically underestimated due to the 5 ha site's high RHa relative to the broader landscape's RHa.In Figure 6c, we also show that predicting SAHa with the relationship derived from the same external sites and LiDAR derived BAHa and SDen estimates resulted in slightly poorer predictions (R 2 = 0.48) due to the limitations of predicting SDen in small 0.04 ha plots.

Discussion
In recent decades, the increased frequency of wildfires and timber harvesting in forests of southeastern Australia has meant that managers of forested catchments need a better understanding of how such disturbances will affect streamflow.There is a growing body of evidence that spatial variation in T across the landscape is largely due to the spatial pattern in stand SA [8,25,42,43], whereas decadal variation in mean annual T is largely due to temporal changes in stand SA [4,44,45].Recent LiDAR data acquisition flights across more than 500,000 ha of southeastern Australian forests provide a rich source of information on forest structure, which may be used to scale T from a tree-or stand-level to a landscapelevel with an improved understanding of how forest structure relates to SA.
To the authors' knowledge, the present study is the first attempt to use an ITD algorithm in native eucalypt forests, and in doing so we predicted SDen with the LMF method and SDen, BAHa and SAHa with the NCut method.We found the ITD algorithms performed well in E. regnans forests as our ITD rates of 72% and 68%, using only 0.9 LiDAR pulses•m −2 , were similar to studies in Scandinavian and European forests that used LiDAR with 3-10 pulses•m −2 [15,20,46].Our LiDAR data resolution was more comparable to [47], who used 1 pulse•m −2 in forests with Scots pine (Pinus sylvestris), Norway spruce (Picea abies) and birch (Betula sp.) to detect 52% of trees with a minimum curvature-based regional detector applied to a smoothed CHM.

Improving the NCut Algorithm
In this study, we have calibrated the NCut algorithm with one set of parameter values across a 5 ha 69 year old E. regnans forest.In order to apply the NCut algorithm across a broad range of existing age classes, age-specific calibration of the NCut algorithm will be required due to changes in forest structure with age.Even within one age class, substantial variation in canopy crown size was evident due to the spatial distribution of SDen and BA1.3, resulting in higher omission and commission rates within respectively high and low stocking density stands.Improved ITD rates may be achieved with a procedure that spatially varies the NCut parameter values with prior knowledge of the distribution of tree-crown forms across the landscape.
Figure 7 shows that this may be achieved with an efficient preliminary screening procedure capable of capturing the heterogeneity of canopy gaps across the landscape.Using each stem's LiDAR points greater than 37 m, a 2D convex hull is provided in Figure 7 to show how well the heterogeneity in tree-crown form explains the distribution of stem commission and omission errors.Commission errors were more frequent in stands with large canopy gaps, as over-segmentation occurred for large tree canopies that extended towards large adjacent gaps.
To correct for this systematic SDen bias, it would be necessary to segment the forested landscape into homogenous units characterised with canopy gap features and then apply varying sets of NCut parameter values calibrated to best segment forests with specific canopy gap features (see [48]).St-Onge et al. [49] have reviewed LiDAR based techniques that may be applied to automatically detect and characterise canopy gaps.The accuracy and resolution output of existing techniques is adequate for our purposes, considering [50] used an automated object-based technique to delineate canopy gaps with an accuracy of 96%, whereas [51] used multi-temporal LiDAR to identify newly opened gaps less than 1 m 2 in size.

Extrapolating SAHa over Large Water Supply Catchments
Stand BA represents a forest attribute commonly predicted with LiDAR data [52] and is regularly used in allometric relationships to predict SA [53].We demonstrate that the NCut algorithm used in area-based estimates of BAHa is more accurate than the area-based mixture distribution function method, as we explained approximately 9% (median across five catchments) more BAHa variation.The improved performance was attributed to the NCut algorithm's ability to generate plot-level indices with PLS segments that removed within-plot LiDAR points associated with trees found in adjacent plots and included LiDAR points in adjacent plots if they represented trees within the plots.
As well as predicting BAHa across a broad range of forest conditions (R 2 = 0.74; RMSE= 10.94), we provide a parsimonious model that is computationally efficient at accurately predicting BAHa with one explanatory variable, PH37 (R 2 = 0.68; RMSE = 14.67).For extrapolating BAHa across large forested landscapes, the latter model is more appealing due to the intensive computational requirements of the NCut algorithm in the former model.For one hectare resolution BAHa estimates, we applied a pragmatic approach that involved calculating PH37 from LiDAR points within 20 × 20 m grid cells, and then averaging the estimates using superimposed 100 m × 100 m grid cells.Averaging PH37 reduced the propagated error generated when using a simplified calculation of PH37 without the NCut algorithm to partition LiDAR points into PLS segments.Using this procedure to calculate PH37, we applied the parsimonious model in Figure 5c to estimate BAHa across Melbourne's Maroondah water supply catchments in Figure 8a.Although it was more accurate to predict SAHa directly from LiDAR indices, the computational requirements of the NCut procedure (Table 5) may not justify the improved estimates (Figure 6).Using grids greater than 0.15 ha, [25] has shown that the perimeter index, , accurately predicts SAHa (R 2 = 0.77, n = 15).In Table 6, we have shown that the computationally more efficient LMF method is capable of accurately predicting SDen over a 5 ha site, which may further be improved using the relationship in (Figure 3 to correct for the systematic underestimation.Figure 8b provides the resulting SDen map using this procedure.In (Figure 8c, SAHa across Melbourne's Maroondah water supply catchment was calculated as [25]: SAHa = 0.0778 + 1.05 (R 2 = 0.77, n = 15) (7)

Conclusions
For even-aged forests, LiDAR data with as few as 0.9 pulses•m −2 was able to be used with the LMF and NCut methods to determine SDen and BAHa with moderate to high accuracy over large areas.Among 619 trees in a 69 year old E. regnans forest, the respective ITD rates were 72% and 68% using the LMF and NCut methods, with 25% and 20% of stems missed, and 2% and 12% of stems over-segmented.Given these similar ITD rates, we conclude that significantly lower computational requirements make the bias-corrected LMF method more suitable for predicting SDen across large forest areas.
For estimating BAHa with plot resolution of the order of 0.04 ha, the NCut algorithm was a significant improvement over an area based method with mixture distribution functions, explaining 7% to 30% (median: 9%) more variation in BAHa for individual catchments.When applied to all catchments, we were able to produce a model that explained 68% of BAHa using only the PH37 index that applies the NCut ITD segments.We demonstrate that this model may be extrapolated for one ha resolution estimates without the need to compute ITD segments.The two-step procedure involves calculating BAHa estimates by averaging PH37 values calculated for each 20 m × 20 m grid cell over 100 m × 100 m grid cells.
Although the relationship between observed and predicted SAHa was most accurate when directly relating SAHa to LiDAR indices (R 2 = 0.56), a two-step procedure that used the stem perimeter index to predict SAHa (R 2 = 0.48) was computationally more efficient.We conclude that a pragmatic approach for extrapolating SAHa across large water catchments involves using a one ha resolution grid to estimate: (i) SDen with the bias-corrected LMF method; (ii) BAHa with a parsimonious model consisting of one explanatory variable, PH37; and (iii) SAHa with the perimeter index using the BAHa and SDen estimates.

Figure 3 .
Figure 3. Relationship between observed and predicted SDen using 1 m resolution maps generated with a 15 m radium search window around each cell (R 2 = 0.58).

Figure 4 .
Figure 4. Using the (a) NCut and (b) LMF method, bar plot showing distribution of stem diameters correctly matched, with omission error and with commission error.

Figure 5 .
Figure 5.Using all plot data, relationship between: (a) predicted and observed plot-level BA; (b) predicted BAHa using the best fit model and observed BAHa; (c) portion of hits >37 m (PH37) and observed BAHa.

Figure 6 .
Figure 6.Relationship between observed and predicted SAHa, where SAHa is predicted: (a) using observed BAHa and SDen (b) directly using LiDAR indices and indirectly using LiDAR derived BAHa estimates and a site-specific RHa relationship; and (c) indirectly using LiDAR derived BAHa and SDen estimates and a generalised RHa and perimeter index relationship.

Figure 7 .
Figure 7.A convex hull of each LiDAR segment, used to represent the crown canopy of each stem, showing that stems with commission error were adjacent to larger canopy gaps than stems with omission error.

Figure 8 .
Figure 8. Maps and histograms showing distribution of estimated (a) BAHa; (b) SDen; and (c) SAHa of 1939 regrowth Ash forest across Melbourne's Maroondah water supply catchment.
3 to a stand-and landscape-level using LiDAR data.The following questions are addressed in the present study: 1. Can LiDAR data with less than 1 pulse per m 2 identify individual trees within a dense E. regnans forest, and if so what is the ITD rate? 2. Can ITD methods improve predictions of stand BAHa in E. regnans forests compared to previous area based methods applied to the same sites?3. Is it more accurate to predict stand SAHa with: (i) a RHa relationship and BAHa estimates derived from LiDAR; (ii) a stand perimeter index that relies on accurate SDen and BAHa estimates; or (iii) LiDAR indices directly related to stand SAHa?

Table 1 .
Summary statistics of sites exposed to a range of silviculture treatments.

Table 2 .
Flight details and sensor configuration for the two LiDAR data acquisition flights.

Table 3 .
List of parameters used and values tested to calibrate the NCut algorithm in E. regnans forest.

Table 4 .
List of LiDAR indices generated for each plot.

Table 5 .
Influence of region of interest (ROI) and voxel size on required computation time and memory.
* The RAM value is a coarse indication, as it represents the minimum amount of committed (not used) RAM within a mainframe computer for a completed run of the NCut algorithm.