www.mdpi.com/journal/remotesensing Retrieval of Forest Aboveground Biomass and Stem Volume with Airborne Scanning LiDAR

Airborne scanning LiDAR is a promising technique for efficient and accuratebiomass mapping due to its capacity for direct measurement of the three-dimensionalstructure of vegetation. A combination of individual tree detection (ITD) and an area-basedapproach (ABA) introduced in Vastaranta et al. [1] to map forest aboveground biomass(AGB) and stem volume (VOL) was investigated. The main objective of this study was totest the usability and accuracy of LiDAR in biomass mapping. The nearest neighbourmethod was used in the ABA imputations and the accuracy of the biomass estimation wasevaluated in the Finland, where single tree-level biomass models are available. The relativeroot-mean-squared errors (RMSEs) in plot-level AGB and VOL imputation were 24.9%and 26.4% when field measurements were used in training the ABA. When ITDmeasurements were used in training, the respective accuracies ranged between 28.5%–34.9%and 29.2%–34.0%. Overall, the results show that accurate plot-level AGB estimates can beachieved with the ABA. The reduction of bias in ABA estimates in AGB and VOL wasencouraging when visually corrected ITD (ITDvisual) was used in training. We conclude that itis not feasible to use ITDvisual in wall-to-wall forest biomass inventory, but it could provide acost-efficient application for acquiring training data for ABA in forest biomass mapping.


Introduction
Remote sensing (RS) methods, such as optical and microwave satellite imaging, digital aerial photography and light detection and ranging (LiDAR), are highly useful in various forest-monitoring tasks.Until recently, knowledge of forest biomass and its changes has been based on ground measurements and coarse or medium resolution satellite images.Therefore, the accuracy of biomass estimations, especially at the local level (e.g., forest stands or sample plots), is poor.Stand biomass is highly correlated with tree heights that can be determined accurately by means of LiDAR, e.g., [2].It is expected that LiDAR applications will enhance the accuracy of forest biomass estimates at all levels from single-tree to nationwide inventory applications.
LiDAR measurements can be divided into profiling and scanning approaches.Airborne scanning LiDAR (small-footprint) has two main methods for deriving forest information: an area-based approach (ABA, [3]) and individual tree detection (ITD, [4]).LiDAR is a promising technique for efficient and accurate biomass detection, due to its capacity for direct measurement of vegetation structure and forest characteristics at different levels (from tree to stand level), e.g., [5,6].With LiDAR's ability to directly measure forest structure, including canopy height (CH) and crown dimensions, it is increasingly being used for forest inventories at different levels.Previous studies have shown that LiDAR data can be used to estimate a variety of forest inventory attributes, including tree, plot, and stand level estimates for tree height [4,[7][8][9], stem volume (VOL) [3,10,11], basal area (BA) [12][13][14] and tree species [15][16][17][18][19].In relation to biomass changes, LiDAR is also a promising method for monitoring forest hazards and defoliation, due to its ability to derive vegetation structure properties [20][21][22][23][24].
Inventory of stand aboveground biomass (AGB) can be based on LiDAR data at single time points and multitemporal LiDAR can be used in monitoring biomass changes.Lefsky et al. [12] showed that LiDAR-derived features such as quadratic mean CH could explain 80% of the variance in AGB.The structure of the forest canopy and leaf area index (LAI) affect the penetration of laser pulses in tree crowns [25].The LAI value represents the area of leaf surface per unit area of ground surface.Changes in AGB have also been estimated, using changes in LAI.The ground truth for LAI can be determined, using special measuring devices, or by estimation from the LiDAR data [21,[25][26][27].Popescu et al. [28] studied the accuracy of individual tree crown diameter at the plot-level and its effect on AGB estimation from LiDAR data.The study area was covered by coniferous, deciduous and mixed stands of varying age classes.The effect on biomass estimation was notable; the estimate of crown diameter alone explained 78% of the variance in biomass and the R 2 values and root-mean-squared errors (RMSEs) improved by up to 0.24 and 7 t/ha, respectively.
Popescu et al. [29] combined small-footprint LiDAR and multispectral airborne data to estimate the plot-level VOL and AGB in deciduous and pine (Pinus L.) forests, using ITD in which the average VOL ranged between 123 and 163 m 3 /ha.The maximum R 2 values for AGB were 0.32 for deciduous trees and 0.82 for pines.The respective RMSEs were 44 t/ha and 29 t/ha.Bortolot and Wynne [30] also used ITD for AGB estimation in young forests (ages between 11 and 16 yr) in which the correlation (r) varied from 0.59 to 0.82 and RMSEs from 13.6 to 140.4 t/ha.Van Aardt et al. [31] estimated the VOL and AGB with LiDAR point height metrics as predictors in a per-segment estimation in deciduous, coniferous and mixed forests.The adjusted R 2 and RMSE values for deciduous AGBs were 0.58 and 37.41 t/ha.Naesset [32] used regression methods to estimate AGB for 143 sample plots in young and mature coniferous forests.The sample plot data were divided into three strata (I: young forest, II: mature forest with poor site quality and III: mature forest with favourable site quality).The regression models explained 92% of the variability in the AGB for all the forest types.Jochem et al. [33] used a semiempirical model that was originally developed for VOL estimation to estimate the AGB in Norway spruce (Picea abies L.) dominated alpine forests.The model was extended with tree canopy transparency parameters (CTPs) extracted from LiDAR.The model was calibrated, using 196 selected sample plots.The R 2 values for the fitted AGB models were 0.70 with no CTP and varied from 0.64 to 0.71 with different CTPs.The standard deviations (stds) varied from 87.4 t/ha (35.8%) to 101.9 t/ha (41.7%).Latifi et al. [34] tested the ABA in southwestern Germany in VOL and AGB mapping.They found that the random forest method was superior to other nearest neighbour (NN) methods and achieved relative errors of 23.3%-31.4% in plot-level VOL and 22.4%-33.2% in AGB prediction, depending on the feature sets and feature selection used.
Breidenbach et al. [35] introduced a method, semi-ITC (Individual Tree Crown), that combines a modified ITD approach with ABA.The study introduced a practical solution for solving the problems with tree detection and the estimation bias of ITD.The semi-ITC method predicts volume for the entire segment, which can include none, one or more trees.The use of ABA in forest parameter estimation requires large amounts of training data, which are currently collected mainly with field measurements.Since field measurements are expensive and slow to make, new applications for acquiring training data are needed.Vastaranta et al. [1] introduced a method that combined ITD and ABA for forest parameter retrieval.They also used visual interpretation to diminish tree detection problems in ITD.Breidenbach et al. [36] used a new data collection strategy employing high-density scanning LiDAR from overlapping flight strips to improve the accuracy of k nearest neighbour (kNN) estimates in the areas with low-density scanning LiDAR.The semi-ITC method was used with high-density LiDAR to make forest parameter predictions and the accuracy was tested, using external data with AGB as a response variable.The relative RMSEs varied from 26.7% to 41.56%, depending on the selection of reference plots used.
The objective of this study was to use a combination of two airborne scanning LiDAR-based applications, ITD [4] and the ABA [3,14], in a manner similar to that used in [1], to map AGB and VOL estimates in the same study area [1].The ABA was trained separately with field and ITD measurements.The main goal of the study was to find the usability and accuracy of the previously introduced method in AGB and VOL mapping.The accuracy of the biomass estimation was evaluated in Finland, where single-tree-level biomass models are available.The models used were evaluated with values of tree biomass measured in the laboratory.The study focuses on developing a new cost-efficient biomass mapping that would provide spatially explicit estimates.

Study Area and Field Data
The study area is located at Evo, Finland (61.19°N, 25.11°E).The area belongs to the southern Boreal Forest Zone and is comprised of mainly managed boreal forests, but Evo is also a popular recreation area.Therefore a broad mixture of forest stands, varying from natural to intensively managed, is available.Scots pine (Pinus sylvestris L.) and Norway spruce are the dominant tree species in the study area, contributing approximately 78.2% of the total VOL.See Vastaranta et al. [1] for more detailed description of the study area.
The field data comprised 509 circular plots of radius 10 m within all trees with a diameter-at-breast-height (DBH) larger than 7 cm were measured.Sampling the field plots was based on pre-stratification of existing stand inventory data to distribute plots over various site types, tree species and stand development classes.Pre-stratification was done separately in 2007 and 2009 [37].The field reference data in 2007 were sampled, using stratified sampling [37].The stratification of the study area was based on previous stand-level inventory data.An initial grid of points with 100 m × 100 m spacing was created, each point representing the centre of a plot, for which the stand variables used in the stratification were allocated.The input variables used in the stratification of the initial plots were volumes of pine, spruce, larch (Larix decidua Mill.) and deciduous species and characteristics of the site type (mineral/peatland and site fertility class).All the input variables were standardized to a mean of 0 and std of 1.
The latter set of field plots measured in 2009 were sampled on the basis of the airborne scanning LiDAR and aerial image data acquired in 2007 using an initial grid of first phase points generated equidistantly in the middle of the grid of 2007 [37].The plots were stratified on the basis of a smaller subset of the airborne scanning LiDAR and aerial image features selected by Holopainen et al. [38].The features used in the stratification included three LiDAR height features, two LiDAR texture features and two aerial image features.The procedure applied in the stratification was otherwise similar to that used in 2007.
The initial plots were clustered into 40 strata using the k-means clustering algorithm.Proportional allocation was applied in deriving the field sample, i.e., the number of the field plots allocated to each stratum was based on the number of initial plots in the strata, with the exception of very small strata in which additional plots were added to the proportional share (at least two plots per stratum).Within the strata, the sample plots were allocated systematically, based on their geographical location.
The field plots were located with a GEOXM 2005 global positioning system (GPS) device (Trimble Navigation Ltd., Sunnyvale, CA, USA), and the locations were post-processed with local base station data resulting accuracy of approximately 0.6 m [39].The field measurements were collected in 2007 and 2009 from the study plot.The SIMO (Simulation and Optimization) forest management planning calculation system [40] was used for adding tree-level growth to the trees measured in 2007.

Calculation of AGB
The tree-level AGB was calculated with models introduced by Repola [41,42] for Scots pine, Norway spruce and birch (Betula pendula Roth and Betula pubenscens Ehrh.) in Finland.The models were developed for total biomass of the tree and for following individual tree components: stemwood and bark, living and dead branches, foliage or needles, stump and roots.The model predictors were tree species, DBH and height.The AGBs for trees were predicted, using total AGB models (1-3) and field-measured DBH and height.The total AGB models are listed below.
Scots pine [42]: (1) Norway spruce [42]: Birch [41]: where y ki is the total biomass for tree i in stand k (kg), d Ski is 2 + 1.25d ki (d ki = the tree DBH for tree i in stand k, cm) and h ki is the tree height for tree i in stand k (m).Factor u k is the variance of random stand k parameter and e ki is the residual error for tree i in stand k.The AGB model parameters are presented in Table 1.
Table 1.Parameters for aboveground biomass for models 1-3 by Repola [41,42].Plot-level AGB estimates were obtained by summing the tree data.The VOLs were calculated with standard Finnish models [43].The field data were divided randomly for the training (255 plots) and test datasets (254 plots).The statistics of the forest variables of AGB, weighted mean diameter (Dg), weighted mean height (Hg) and VOL, in the test and training datasets are presented in Table 2.The frequency distributions of the AGBs and VOLs within the test and training plots are shown in Figure 1.

Tree Analyses for Evaluating the Biomass Models Used
The data for evaluating the existing biomass models developed by Repola [41,42] were collected in summer 2010 from the study area.A total of 38 sample trees were measured: 19 Scots pines and 19 Norway spruce (Table 3).The field measurements and biomass calculations are described in further detail in Kankare et al. [44] and were conducted on the same principle as that described in Repola [41].A short summary is presented below.The field measurements were collected in a destructive manner.All the sample trees were felled, thinned and the trunks were cut into logs.The following information was recorded: tree height, heights of the living and dead crowns, weight of the branches (including cones and needles) and logs, diameters along the trunk and moisture content of the bark, stem wood and branches.Samples of bark, stem wood and branches were dried in an oven at 70 °C for 2-3 days.The biomass components were modelled, using ratio estimation methods.The biomass is referred to as dry weight in Table 3.

LiDAR Data
The LiDAR data was acquired in midsummer 2009, using a Leica ALS50-II SN058 system (Leica Geosystems AG, Heerbrugg, Switzerland).The data was acquired with following parameters: (1) the flying altitude was 400 m at a speed of 80 knots, (2) an open angle of 30°, (3) beam divergence of 0.15 mrad and (4) a pulse rate of 150 kHz.The pulse density within the plots was 10 hits per m 2 .A digital terrain model (DTM) was computed by the data provider.

Automatic Tree Detection and Visual Interpretation
A summary of individual tree detection methods used follows and the methods used are the same as introduced elsewhere in detail [1].A raster canopy height model (CHM) was created from normalized data (DTM subtracted from the height of the vegetation points).Single tree segmentations were performed in the CHM images, using a minimum curvature-based region detector (see [45,46]).During the segmentation processes, the tree crown shapes and locations of individual trees were determined.The CHM was smoothed with a Gaussian filter to remove small variations on the crown surface.The degree of smoothness was determined by the value of the std (Gaussian scale) and kernel size of the filter.An std of 0.7 and bandwidth of five pixels (pixel size 0.5 m) were used based on trial and error.Then the minimum curvatures were calculated (see [46]).The minimum curvature is one of the principal curvatures and, for a surface such as a CHM, a higher value of minimum curvature describes the treetop.The smoothed CHM image was linearly scaled, based on the minimum curvature computed, resulting in a smoothed, yet contrast-stretched image.Finally, the local maxima were searched in a sliding neighbourhood of 5 × 5 pixels.They were considered as treetops and used as markers in the following marker-controlled watershed transformation [47] for tree crown delineations.Each segment was considered to represent a singletree crown and the highest laser point height within the segment was used as an estimate for tree height.
The results of above described automatic tree detection (ITD auto ) were visually inspected in TerraScan software (TerraSolid Ltd, Helsinki, Finland).The omission and commission errors (Figure 2) were searched and reduced in a three-dimensional (3D) environment with visual interpretation.This procedure combining ITD auto and visual interpretation is further referred to as visual ITD (ITD visual ).

Retrieval of AGB and VOL from Individual Tree Detection
The plot-level AGB and VOL, based on the ITD results, were calculated for ITD auto and ITD visual , with no field measurements.A DBH was predicted for the trees, using the laser-based tree height and model developed by Kalliovirta and Tokola [48] for boreal forest stands in southern Finland.The AGB and VOL were calculated, using the laser-based tree height and predicted DBH, with the same models [41][42][43] used in calculation of the AGB and VOL of the field measurements (see Section 2.2).In AGB and VOL modelling, the main tree species of the plots were assumed to be known.

Area-Based Approach
The ABA used statistical LiDAR features as predictors that were calculated from the CH or vertical distribution of the laser returns [14,49] for estimating the plot-level AGB and VOL.The AGB and VOL were imputed for the test plots, using ABA with three differently acquired training sets, field measurements, ITD auto and ITD visual .Methods are referred as ABA field , ABA ITDauto and ABA ITDvisual .The following features were derived individually per plot from the normalized laser data, minimum height (Hmin), maximum height (Hmax), mean height (Hmean) calculated as the arithmetic mean of the laser heights, STD of laser heights (Hstd), coefficient of variation (CV), penetration, percentiles calculated from 10% to 100% of the CH distribution at 10% intervals (h10-h100) and canopy cover percentiles expressed as proportions of the first returns below a given percentage (10-90%) of the total height (p10-p90).
The nearest neighbour (NN) imputations were applied in ABA.The selected NN imputation method was the k-most similar neighbour (k-MSN) [50].In k-MSN, the similarity is based on canonical correlations and the Mahalanobis distance [50].The R yaImpute library [51] was applied in the NN imputations.The number of NNs was set at 5. LiDAR features used as predictors were selected, using the least absolute shrinkage and selection operator (LASSO) method introduced by Tibshirani [52].LASSO regression was applied, using the glmnet-package [53] in R software for statistical computing [54].The method uses penalizing of the absolute size of the regression coefficients and this can produce a situation in which the size of the coefficients can shrink to zero, which enables feature selection.The method recommended for selecting optimal penalty value (lambda) is cross-validation, 10-fold cross-validation was used in this study.The features were selected separately for ABA field , ABA ITDauto and ABA ITDvisual .

Statistical Analyses
The accuracies of the tree-level biomass models and imputed plot-level AGBs and VOLs were evaluated by calculating the bias and RMSE: % 100 (7) where n is the number of observations, y i the value estimated from the field data for observation i, the predicted value for observation i and the mean of the variable in question.

Validation of the Biomass Estimation Models Used
The accuracies of the biomass models used were evaluated at the tree level, using 38 trees measured in the laboratory.The estimates of AGB were calculated, using field-measured DBH, h and species.These estimates were compared with the biomasses determined in the laboratory (Figure 3).The residuals showed a systematic shift in estimates larger than 250 kg.In this case, the biases for the Scots pines were 9.1 kg (5.8%) and for the Norway spruce 35.2 kg (15.0%).The RMSEs, combining both bias and deviation, were 15.0 kg (9.5%) and 54.6 kg (23.2%), respectively.The errors in tree-level AGB estimations are shown in Figure 4.

Accuracy of the ABA
The accuracy of the ABA in plot-level AGB and VOL imputation was assessed in 254 test plots.Training data for the ABA were measured in the field and with ITD from 255 plots.Tree-level AGBs were calculated with the models reported by Repola [41,42].The final statistical LiDAR features used in the imputations were selected, using LASSO (Table 4).The difference for selection of the best features was relatively small between the ABA methods, depending on the training data, and the number of predictors varied from 8 to 11.
The accuracies of the various ABAs are presented in Table 5 and residual plots in Figure 5.The ABA ITDvisual AGB imputation accuracy was similar to that of ABA field .The RMSEs were 24.9%, 34.9% and 28.5% for ABA field , ABA ITDauto and ABA ITDvisual , respectively.The ABA ITDvisual resulted in even smaller bias for the AGB than did ABA field .The adjusted R 2 values (Table 6, Figure 6) for the ABA methods in the AGB imputations varied from 0.71 to 0.72.
VOL was imputed most accurately with ABA field (Table 5) and an RMSE of 26.4% was achieved.The ABA ITDvisual imputation accuracy was better than that of ABA ITDauto , especially the bias, which was reduced from −12.5% to −1.9%.

Discussion
The estimation of forest AGB and VOL with airborne scanning LiDAR was tested in the present study.The results showed that LiDAR is capable of retrieving AGB accurately at the plot level.Validated current tree-level biomass models proved to be useful in AGB calculations, providing accuracies similar to those of general tree VOL models.Systematic shifts were found in validation of the current tree-level biomass models for biomasses larger than 250 kg, which meant that the large biomasses were underestimated.The bias in larger trees can be the reason for the increasing deviance that is seen in Figure 6.Therefore the usability of these models should be studied in forests with varying development levels.
The AGB imputation accuracies (RMSEs) varied in this study from 24.9% (23 t/ha) to 34.9% (32.3 t/ha) at the plot level.The results were in line with previous studies, e.g., [29,30,55,56], in which the RMSEs varied from 14% to 42%.Latifi et al. [34] achieved plot-level accuracies of 22.2%−45.5%,using LiDAR data, and the best results were achieved with a random forest approach in comparison to other NN methodologies.The AGB imputation accuracy [34] for the k-MSN method varied from 43.3% to 45.8%, depending on the variable set used.The results for the AGB imputation with the ABA ITD -methods were, in general, similar to those in Latifi et al. [34], but to make adequate comparison of the NN methods used, all external factors (e.g., site-specific or the selection of model features) should be removed.
The best features for each ABA method were selected, using LASSO regression.The number of features selected was dependent on the penalty value (lambda), which was searched using cross-validation.The values selected were quite small and therefore the number of features in the models was large.This could cause overfitting, especially if highly correlated feature groups are present.With higher lambda value, fewer features were selected for the model, but this resulted in rapid increase in mean squared error (MSE) in cross-validation.Detailed comparison of the various feature selection methods when LiDAR features are used would be an interesting research topic since the correlated features here were selected for the optimal feature set.The impact of variable selection was also investigated by calculating the estimates for ABA ITDauto and ABA ITDvisual using selected features from ABA field .Only small increase in bias and RMSE was found, e.g., the relative RMSE increased 1.6% and 0.5% for ABA ITDauto and ABA ITDvisual , respectively.
The AGBs imputed with the various ABA methods explained 71%-72% of the variation at the plot level.The R 2 values for the Scots pine, Norway spruce and deciduous plots varied between 0.73 and 0.76, 0.68 and 0.69 and 0.71 and 0.72.In comparison to previous studies, the total R 2 values were similar or somewhat better, e.g., [28,30].
The accuracy of tree VOL imputation at different levels (tree, plot or stand) is one of the most studied of forestry subjects, e.g., [4,13,14,34].The accuracy of plot-level VOL imputation in the present study varied from 26.4% (48.4 m 3 /ha) to 34.0% (62.4 m 3 /ha), depending on the ABA training data used.Latifi et al.'s [34] imputation accuracies for VOL varied between 23.3% and 31.4%,depending on the feature set and selection method used.Popescu et al. [29] achieved VOL RMSEs of 52.84 m 3 /ha and 47.9 m 3 /ha for deciduous and pine plots, respectively.The VOL accuracy obtained in the present study with the ABA ITD methods was similar to that in the above-mentioned studies.The best result was from ABA field , but the results from the ABA ITD methods were not far behind.
In large-scale forest biomass inventories, acquisition of extensive field references may be a major source of expense.In the present study, the training datasets for ABA were measured from LiDAR data and the imputation accuracies were compared with the accuracies imputed, using traditional field data.This type of approach could provide savings, especially in areas with difficult terrain or sparse road networks, and still maintain the spatial accuracy of the estimates.However, it is crucial to obtain unbiased estimates, which often presents a bottleneck if ITD is used in acquiring training data for the ABA.In the present study, ABA ITDvisual resulted in even smaller bias for AGB and VOL than ABA field .The relative bias was 0.8%-1.2%smaller for ABA ITDvisual .This was most likely caused by the growth simulation used with the sample plots measured in 2007.It should be noted that the knowledge of tree species has a large impact on ABA estimation results and it was given as an auxiliary input data for the ITD training datasets in this study.The bias estimates might therefore be too low since tree species were not estimated using the LiDAR data.However, multiple promising studies have been published on this subject [16][17][18][19] and tree species classification should be integrated to the method in operational forestry.
Tree detection has been a major error source in ITD.Under Nordic forest conditions, tree detection accuracies with ITD auto methods have varied from 20% to 90% [57].Other sources of bias in ITD methods include prediction of DBH and tree species classification [58].In this study, the composition of tree species was relatively simple.In more diverse forests, the biomass estimation and especially biomass model development would be more challenging.We used visual interpretation to reduce bias caused by commission and omission errors of tree detection.The bias for AGB was −13.9% with ABA ITDauto .Visual interpretation reduced the bias to −3.1%.The bias in ITD could also be reduced by using plot-level biomass calibration data measured in the field or more sophisticated DBH prediction models [24,58,59].

Conclusions
The airborne scanning LiDAR-based forest mapping method [1], which utilizes ITD to measure training data for ABA, was tested in plot-level AGB and VOL imputations in the same study area used in [1].The main goal was to find the usability and accuracy of the introduced method in AGB mapping in Finland where biomass models are available.The results show that spatially accurate AGB estimates can be achieved, using the ABA.Biomass inventory can be even more accurate and effective if ITD measurements are used in training.The use of ITD as training data for the ABA provides a new application for forest biomass inventories in areas that are difficult to reach.Accurate local biomass models were used in this study, but in most of the areas these types of models are not available.Therefore, the use of tree-level LiDAR features should be studied in AGB estimation and model creation.

Figure 1 .
Figure 1.(a) AGB and (b) VOL frequency distributions for test and training sets with class interval of 20 t/ha (AGB) and 50 m 3 /ha (VOL).

Figure 2 .
Figure 2. ITD auto -detected trees plotted with black solid squares.Omission tree marked (shallow black square) from understorey.Commission error from plot with a single tree (shallow black circle) is easily reduced in visual interpretation.

Figure 4 .
Figure 4. Residual plot for AGB at tree level.AGB prediction error in kg is presented in y-axis for Scots pine (red) and for Norway spruce (green).

Table 2 .
Plot-level statistics for field measurements.254 circular, fixed-radius (r = 10 m).The plots were used as the test set and 255 plots as the training set for ABA.

Table 4 .
Statistical LiDAR features used in k-MSN imputations.

Table 5 .
AGB and volume imputation accuracies, RMSE and bias, for the test plots (n = 254).

Table 6 .
Adjusted R 2 for ABA methods in AGB imputations compared with test set.