Using Small-Footprint Discrete and Full-Waveform Airborne LiDAR Metrics to Estimate Total Biomass and Biomass Components in Subtropical Forests

An accurate estimation of total biomass and its components is critical for understanding the carbon cycle in forest ecosystems. The objectives of this study were to explore the performances of forest canopy structure characterization from a single small-footprint Light Detection and Ranging (LiDAR) dataset using two different techniques focusing on (i) 3-D canopy structural information by discrete (XYZ) LiDAR metrics (DR-metrics), and (ii) the detailed geometric and radiometric information of the returned waveform by full-waveform LiDAR metrics (FW-metrics), and to evaluate the capacity of these metrics in predicting biomass and its components in subtropical forest ecosystems. This study was undertaken in a mixed subtropical forest in Yushan Mountain National Park, Jiangsu, China. LiDAR metrics derived from DR and FW LiDAR data were used alone, and in combination, in stepwise regression models to estimate total as well as above-ground, root, foliage, branch and trunk biomass. Overall, the results indicated that three sets of predictive models performed well across the different subtropical forest types (Adj-R = 0.42–0.93, excluding foliage biomass). Forest type-specific models (Adj-R = 0.18–0.93) were generally more accurate than the general model (Adj-R = 0.07–0.79) with the most accurate results obtained for coniferous stands (Adj-R = 0.50–0.93). In addition, LiDAR metrics related to vegetation heights were the strongest predictors of total biomass and its components. This OPEN ACCESS Remote Sens. 2014, 6 7111 research also illustrates the potential for the synergistic use of DR and FW LiDAR metrics to accurately assess biomass stocks in subtropical forests, which suggest significant potential in research and decision support in sustainable forest management, such as timber harvesting, biofuel characterization and fire hazard analyses.


Introduction
As a primary reservoir of carbon dioxide (CO 2 ) in terrestrial ecosystems, forests play a key role in the global carbon cycle [1].Globally, forest ecosystems contain approximately 80% of the world's aboveground and 40% of belowground carbon stocks [2].The subtropical forests biome accounts for approximately a quarter of the area of China and are particularly important for improving regional ecological environment and maintaining global carbon balance [3].Despite their importance, there is still considerable uncertainty about carbon budgets within these subtropical forests, despite several national-scale studies using historical forest inventory data [4].In addition, the partitioned biomass components (i.e., trunk, branch, foliage and root) also provide important information for forest management decisions such as the estimation of stem and branch biomass for biofuels calculations, as well as analyzing foliage biomass for studying forest growth and assessing crown biomass (i.e., branch and foliage biomass) for predicting fire hazard [5,6].
Remote sensing techniques can provide quantitative spatial explicit information and "wall-to-wall" observations for carbon stock mapping and monitoring [7].However, since subtropical forests are typically structurally-complex and carbon-dense ecosystems, sensors (such as Landsat and shorter waveform RADAR, Radio Detection and Ranging) tend to saturate which inhibits reliable forest carbon stock estimates in these regions [8].In addition, optical sensors can only provide limited information of the vertical structure of the forests [9].Light Detection and Ranging (LiDAR) is an active remote sensing laser technology capable of providing detailed, spatially explicit, three-dimensional information on vegetation structure [10].A large number of studies have demonstrated the potential of LiDAR to accurately estimate biophysical and structural properties over a wide range of forest types [11].
Over the past decades, many conventional airborne LiDAR systems produce small-footprint discrete data (system points) [12].These systems record multi-returns (n ≤ 5) per transmitted pulse with each return representing the 3D position and intensity of the reflected light corresponding to the surfaces where the light has been reflected [13].Discrete data can also be derived from the LiDAR waveforms (waveform points), which provide a denser point cloud in contrast to conventional systems [14].However, research has indicated that the difference between the system versus waveform points for estimating canopy height is negligible, as the additional points (from waveform points) mainly describe the internal crown structure and the low vegetation stratum, and as a result the DTM and CHM-based heights are not significantly improved [15].These discrete data-based approaches have been successfully used to describe a wide range of forest characteristics [16].For instance, airborne LiDAR estimated tree heights were found to be of similar or better accuracy than corresponding field-based measurements [17].
Additional attributes, such as leaf area index [18], canopy volume [19] and biomass [20,21], are also well characterized with discrete LiDAR data at both the individual-tree and plot scale [22].
Due to limitations in the electronics of many conventional airborne LiDAR systems, only surfaces that are sufficiently spaced apart can be distinguished into separate returns [13].Recently, small-footprint full-waveform LiDAR systems have become available commercially; these systems provide new opportunities for forestry studies [23].These full-waveform systems digitize and record the entire backscattered signal of each emitted pulse, and allow the recording of detailed geometric and physical properties of intercepted objects [24].The analysis of full-waveform data has improved height estimations [25], enhanced point extraction [26] and additional target information [15].This extended return information has been successfully applied to forestry applications, such as single tree detection [27], tree species classification [28,29] and the estimation of forest structure characteristics [30,31].
In this paper, we examine the relationship between forest biomass (i.e., above-ground and total biomass) and its components (i.e., foliage, branch, trunk and root biomass) with small-footprint discrete (DR) and full-waveform (FW) airborne LiDAR derived metrics in three typical subtropical forest types (i.e., coniferous, broadleaved and mixed forests).In addition, we examine the estimation capability of DR and FW metrics based models for biomass estimation individually and in combination.Finally, the accuracy of the relationships is evaluated within each forest type by comparing biomass estimates with field measurements and characterizes the environments using additional topographic and vegetation conditions.

Study Site Description
Yushan Forest, a state-operated forest and national forest park, is located near the town of Changshu in Jiangsu province, southeast China (120°42′9.4″E,31°40′4.1″N).It covers approximately 1103 ha, with an elevation range of approximately 20-261 m above sea level (Figure 1).This site is situated in the north-subtropical monsoon climatic region with an annual mean precipitation of 1062.5 mm; the highest monthly precipitation occurs in June (171 mm) and July (147 mm).The forest in Yushan belongs to the North-subtropical mixed secondary forest with three main forest types: coniferous-dominated, broad-leaved dominated and mixed forests [32].The main conifers species include Masson pine (Pinus massoniana Lamb.) and Chinese fir (Cunninghamia lanceolata (Lamb.)Hook.) as the apex species, and a small percentage of Slash pine (Pinus elliottii Engelm.) and Japanese Blackbark Pine (Pinus thunbergii Parl.).The primary broad-leaved species are Sawtooth oak (Quercus acutissima Carruth.) and Chinese sweet gum (Liquidambar formosana Hance), mixed with some evergreen broad-leaved tree species belonging to the Fagaceae, Lauraceae and Theaceae.

Field Data
Using historical discrete return airborne LiDAR data (2007), a mean height and a cover layer at 30 m cell resolution were developed to be used as an initial stratification of the study area.The height and canopy layers were classified into five equal interval classes and in combination with the GIS-based forest inventory data (2012) to guide the selection of ground sample plot.A total of 65 square plots  Plot data were collected during the 2012 (June to August) and 2013 (August) growing seasons under leaf-on conditions.Plot corners were located using Trimble GeoXH6000 Handheld GPS units, and the high precision real-time differential signals were received from the Jiangsu Continuously Operating Reference Stations (JSCORS), resulting in submeter accuracy [33].At each plot, species of all measured trees and the status of the tree (live or dead), along with the crown class (e.g., dominant or co-dominant) were recorded.For all live trees with a DBH > 5 cm, diameter, height, height to crown base, and crown width in both cardinal directions were measured.Small trees (DBH < 5 cm) and dead wood were also tallied for total stem number, but not used in biomass calculations.DBH was measured on all trees using a diameter tape.Heights of all trees were measured using a Vertex IV hypsometer.Crown widths were determined by measuring two crown radii, one being the major axis (to the tip of the outmost branch on the longest side of the crown) and the other being 90° (perpendicular) to the first axis.
Several plot-level forest variables were calculated based on the individual tree data, including basal area (m 2 •ha −1 ) and Lorey's height (i.e., the basal area weighted height).Species-specific exponential allometric equations referenced from Feng et al. (1999) were used (Allometric equations developed from tree inventory data from local or nearby provinces were selected for this research) [34], with biomass (i.e., total and above-ground biomass) and biomass components (i.e., stem, root, branch and foliage) calculated for individual trees within each plot based on the field-measured DBH and height, and then summed to obtain total plot biomass.A summary of the derived ground biomass estimates is provided in Table 1.

LiDAR Data
Small-footprint LiDAR data were acquired on 17 August 2013 using a Riegl LMS-Q680i scanner (which is a full-waveform digitizing laser scanner), onboard the LiDAR, CCD and Hyperspectral (LiCHy) Airborne Observation System [35].The system was operated at 900 m above ground level, with a flight line side-lap of ≥60%.Data was acquired at a 400 kHz pulse rate with a scanning angle of ±30° from nadir (pulses transmitted at scan angles that exceeded 15° were excluded from the final dataset).The returned waveforms were recorded with a temporal sample spacing of 1 ns (approximately 15 cm) and beam footprint size around 0.25 m.The dataset had an average pulse spacing of 0.74 m and pulse density of 1.4 pulse•m −2 in a single strip, with approximate three times higher pulse density in the overlapped regions.
Only one dataset was used for the entire analysis, the discrete data (with XYZ coordinates, intensities and additional parameters) were extracted from the full-waveform.The aim of this research is not to investigate the benefits of using the higher density point clouds provided by the full waveform laser scanning (see Reitberger et al. 2009 [36]), rather to explore how discrete and full-waveform data differ in their estimation of total biomass and specific biomass components.The discrete data were extracted using decomposition algorithms provided by the RIEGL RiAnalyze ® software (Horn, Austria).The algorithms are based on fitting one or multiple Gaussian pulses to the waveform data and extracting the temporal position (and range) of the pulses, the individual signal strength (amplitudes) and signal broadening (echo width) for each echo [37].The full-waveform and discrete LiDAR data were both stored in LAS format (American Society for Photogrammetry and Remote Sensing, Bethesda, MD, USA) as final products by the data provider.

Data Pre-Processing
A 1-m digital terrain model (DTM) was created in two steps from the DR data: first, the data was filtered to remove the above-ground returns (using algorithm adapted from Kraus and Pfeifer, 1998) [38], and secondly the DTM was created by calculating the average elevation from the remaining (ground) LiDAR returns within a cell (cells that contain no points were filled by interpolation using neighboring cells).The DR point clouds were then normalized against the ground surface height and extracted for each plot using the coordinates of the lower left and upper right corners.All point cloud data processing was performed using the FUSION software package [39].
To avoid the effects of off-nadir pointing on the FW (e.g., the waveform is stretched as off-nadir angle increases) and to integrate non-vertical waveforms from different flight trajectories, a voxel-based approach developed by Hermosilla et al. (2014) was applied to the full-waveform data [40].This approach synthesized multiple raw waveforms into composite waveforms through the vertical space partitioning of forest canopies by voxels, and uses the maximum amplitude value to construct new pseudo-vertical waveforms.The background noise of each waveform was first suppressed by a de-noising process and then smoothed by a Gaussian filter [40].Each pre-processed waveform was then spatially located in the three-dimensional space and normalized by subtracting the derived DTM height from height of each bin in the corresponding positions.The vertical space was partitioned into voxels (0.25 × 0.25 × 0.15 m) according to the footprint size and temporal sample spacing.A maximum amplitude value was calculated and assigned to each intersected voxel.The composite waveforms were finally extracted as each vertical column of voxels [31] (See the technical route of this research in Figure 2).

Discrete Data Based Metrics
Discrete data based metrics are descriptive structure statistics calculated from the height normalized LIDAR point cloud measurements in three-dimensional space [41].In this study, 15 metrics for each plot within the 30 × 30 m area of the 65 plots were calculated including: (i) selected height measures, i.e., percentile height (h 10 , h 25 , h 50 , h 75 , h 90 and h 95 ), maximum height (h max ) and mean height (h mean ); (ii) variability of height measures, i.e., coefficient of variation of heights (h cv ); (iii) selected canopy return density measures, i.e., canopy return density (d 2 , d 4 , d 6 and d 8 ) and (iv) canopy cover measures, i.e., canopy cover above 2 m (CC 2m ) and canopy cover above mean height (CC mean ).
All of the discrete data-based metrics were generated from first returns, which have been found to be more stable for estimating forest biophysical attributes than all returns [42].Metrics of heights and canopy return densities were computed with a height threshold of 2-m above ground-level to ensure that the non-canopy and below-canopy returns were excluded from metric calculation [43,44].

Full-Waveform Based Metrics
Full-waveform data based metrics describe the radiometric and geometric attributes of the return waveforms based on the spatio-temporal analysis [40].In this research, 15 waveform metrics adapted from previous studies of large-footprint full-waveform data (i.e., SLICER, LVIS and GLAS) were incorporated into the pseudo-vertical waveform analysis due to their success in previous studies [28][29][30][31]45].These metrics include height of median energy (HOME), waveform distance (WD), height/median ratio (HTMR), number of peaks (NP), roughness of outermost canopy (ROUGH), front slope angle (FS), return waveform energy (RWE) and vertical distribution ratio (VDR) (Figure 3).
HOME is calculated as the distance from waveform centroid to the ground (i.e., corresponding DTM height), and has been found to be sensitive to canopy openness and the vertical arrangement of canopy elements [46].WD, often related to the LiDAR canopy height, is computed as the distance from waveform begin to the ground [47].HTMR, which is the HOME divided by the WD, was used to depict the change of HOME relative to the canopy height [46].ROUGH, which is the distance from the waveform beginning to the first peak, was applied to describe the roughness of the upper-most canopy and the spatial arrangement of plant surfaces [25,48].FS, which is the vertical angle from waveform beginning to the peak of canopy return energy, provides the variability of the upper canopy [49].NP is the number of detected peaks within each composite waveform.RWE represents the total received energy.VDR is calculated as the differences between the WD and the HOME, divided by WD [30].The plot level waveform metrics (except VDR) were summarized as the mean and standard deviation of all the waveform metrics (of individual composite waveform) within each plot.A summary of the metrics with corresponding descriptions is shown in Table 2.

Statistical Analyses
The strength of relationships between biomass (and each biomass components) and the 30 LiDAR metrics were tested by the square of the Pearson correlation coefficient (R 2 ).Multiple linear regression models, which include the two sets of LIDAR metrics (i.e., discrete and full-waveform based metrics) as predictor variables, were developed independently and in combination for estimating the biomass and biomass components among different forest types (i.e., all forests, coniferous forests, broadleaved forests and mixed forests).To ensure the independent variables were not highly correlated, collinearity was first evaluated using Principal Component Analysis (PCA) based on the correlation matrix.Models with condition number (κ) lower than 30 were accepted to ensure that there was no serious collinearity in the selected models [50,51].Standard backward stepwise regression was performed to select variables for the final models with predictor variables left in the models at the 5% significance level.The best fitting models were then selected based on the lowest Akaike information criterion value [52].Once the best models were chosen, leave-one-out cross validation was used to validate them.Three sets of predictive models, i.e., discrete data metrics based model (DR-model), full-waveform metrics based model (FW-model) and Combo model (includes the combination of both discrete and full-waveform metrics) were fitted.These prediction models were assessed using adjusted coefficient of determination (Adj-R 2 ), Cross-validation coefficient of determination (CV-R 2 ), Root-Mean-Square Error (RMSE), and relative RMSE (rRMSE), defined as the percentage of the ratio of RMSE and the observed mean values (Table 3).

Results
Most of the biomass and its components (hereafter written as "biomass components") have a stronger correlation with the discrete data based metrics than the waveform model based metrics (Figure 4).Within the set of discrete data based metrics, except for the "all forests" and "broadleaved forests" foliage biomass components, most of the height-related metrics (especially the percentile heights and the coefficient of variation of heights) and biomass components are strongly correlated.In comparison, measures of canopy return density and canopy cover have a relatively poorer relationship with the biomass components.The strength of the relationships between biomass components and the waveform metrics were also generally weaker.The waveform metrics related to canopy height (e.g., WD), vertical arrangement (e.g., HOME) and roughness of upper-most canopy (e.g., ROUGH) had a more significant relationship with biomass components than others.Details on the three sets of predictive models (i.e., general and forest-type specific models developed from discrete data based metrics, waveform based metrics and their combinations) are shown in Tables A1-A3 and Table 3 summarizes their accuracies.Overall, most of the biomass components are generally well predicted (Adj-R 2 = 0.42-0.93,rRMSE = 11.08%-33.68%)except for foliage biomass (Adj-R 2 = 0.07-0.82,rRMSE = 32.81%-62.94%).The combo models performed the best; explaining 60%-79% (for the general model) and 65%-93% (for each forest type-specific model) of the variability of most biomass components (exclude foliage biomass).Compared with full-waveform models (Adj-R 2 = 0.42-0.79,rRMSE = 16.88%-33.68%),the discrete data models (Adj-R 2 = 0.50-0.88,rRMSE = 13.61%-28.27%)have a relatively higher performance, except for the trunk biomass (in coniferous forests) and foliage biomass (in broadleaved forests) (see Table 3).The fit for the general model (developed from all plots) is relatively low (Adj-R 2 = 0.07-0.79).In comparison, the relationship were generally improved for forest type-specific models (Adj-R 2 = 0.18-0.93).Overall, most of the fitted models had higher correlations in coniferous forests (Adj-R 2 =0.50-0.93)and broadleaved forests (Adj-R 2 = 0.43-0.89)than in mixed forest types (Adj-R 2 = 0.41-0.82).The relative RMSE was generally in accordance with adjusted R 2 , with lower values in coniferous forests (rRMSE = 11.08%-41.1%)and broadleaved forests (rRMSE = 13.34%-36.36%)than in mixed forest types (rRMSE = 15.85%-45.93%).
Scatter plots of the plot-level field-measured and LiDAR-estimated above-ground and root biomass are shown in Figure 5 with linear fits and prediction confidence intervals (95%).The linear models demonstrate the increased correlations in the Combo models (R 2 = 0.89 for above-ground biomass; R 2 = 0.80 for root biomass) in comparison to the DR model (R 2 = 0.88; R 2 = 0.78) and FW model (R 2 = 0.70; R 2 = 0.58).The widths of the confidence intervals in the DR model (AGB) and Combo model (AGB) are relatively narrow, indicating better quality of model fitting.Some outliers, which may be the result of prediction bias, should be noted in the FW model (AGB) and DR model (Root).The selected metrics of the two sets of biomass estimation models (i.e., the discrete data models and full-waveform models) are shown in Figure 6.For the discrete data models, height-related metrics (h mean , h 50 , h 75 and h 95 ) calculated from the point cloud are the most frequently selected by the regression models, followed by canopy return density and canopy cover metrics.The h mean statistic is sensitive to all of the biomass components in coniferous forests (selected by all of the 6 conifer models).In comparison, the statistics of h 50 and h 95 are more sensitive to the biomass components of "all forests" and mixed forests (selected by 8-9 of 12 models of the "all forests" and mixes), whereas d 6 and d 8 are more sensitive to the biomass components of broadleaved forests (selected by 5 of 6 models of the broadleaves).The h 75 statistic appears sensitive to above-ground (W AGB ), trunk (W trunk ) and total biomass (W total ), often selected by the predictive models (in 3 of the 4 forest types).For the full-waveform models, the statistic of HOME μ calculated from composite waveforms is the most frequently selected variable.The HTMR μ and NP μ statistics appear sensitive to foliage biomass since they are selected to predict this component in 3-4 of the 4 forest types.FS σ are significantly sensitive to the branch biomass since they are selected in 4 of all the forest types.Note: See Table 2 for the codes and descriptions of metrics.W a = W AGB ; W s = W trunk ; W b = W branch ; W f = W foliage ; W r = W root ; W t = W total .(See Table 1 for biomass components' codes).
Figure 7 represents the selected metrics of the combo models, i.e., using both discrete and full-waveform metrics.Each predictive model includes at least one variable from the DR and FW metric pools, whereas more discrete metrics (11 out of the total 15) are used in the combo models than full-waveform metrics (7 out of 15).Height-related metrics of h 50 and h 95 remain sensitive to the biomass components of "all forests" and mixed forests (selected by 8-9 of 12 models of the "all forests" and mixes), which agrees with the previous analysis of the discrete data only models.VDR is the most frequently used full-waveform metrics in the combo models (it is selected at least once for each forest type-specific models).  2 for the codes and descriptions of metrics.W a = W AGB ; W s = W trunk ; W b = W branch ; W f = W foliage ; W r = W root ; W t = W total .(See Table 1 for biomass components' codes).
Within specific forest types, the biomass components were calculated using the individual models (i.e., forest type-specific models) and then averaged (Figure 8).Compared with the field measurements (the first left column in each forest type), most of the biomass components were similar to the field-measured values except for an overestimation of branch biomass by all the models within all plots; an underestimation of trunk biomass by full-waveform and combo models within coniferous forests; and an overestimation of trunk biomass by full-waveform and combo models within the mixed forests.
Across all 65 field plots, the relationship between field-measured and LiDAR-estimated biomass components is shown in Figure 9. Overall, most of the estimated biomass components of foliage, branch and root are well predicted by the combo models.Five plots with the largest bias in trunk biomass are highlighted and analyzed against four topographic and vegetation factors which may influence the accuracy of biomass estimation [53][54][55].Both slope (calculated by DTM) and canopy cover (Percentage of first returns above the first return 2-m heights) were derived from discrete LiDAR data, while the mean height and stem number of trees were calculated based on the field-measured data.All five plot outliers have high levels of canopy cover, and most are located on steep slopes (Figure 9b).

Discussion
In similar forests, Fu et al. (2011) estimated above-ground biomass components using low hit density (point distance = 1.08 m) small foot-print discrete LiDAR data in 78 smaller circle plots (radius = 7.5 m or 15 m) in mountainous subtropical forests, southwest China [56].Compared with our results, less variability was explained for above-ground biomass within those broad-leaved (43%) and coniferous (68%) forests, and there was no significant correlation found in the mixed forests.Both the larger plot size (see Zhao et al. 2009 for details on the effects of plot size) [57] and the higher hit density may explain the more significant correlations found in this study.Studies predicting root biomass from small footprint LiDAR especially in subtropical forest are rare.Naesset, (2004) reported a R 2 = 0.86 for below-ground biomass in boreal forests [58].Pang and Li (2012) fitted regression models for biomass components in temperate forests in Northeast China and explained 85% of the variability in root biomass in conifers, which is slightly higher than reported in this study [59].The reason for the slightly lower accuracy in our study may be because of the complexity of multilayered subtropical forest, which has a far greater variability in stem height and density compared to boreal and temperate forests.Neuenschwander et al. (2012) extracted waveform metrics (e.g., amplitude, pulse width, integrated canopy energy) from FW data and found that the FW-metrics can help improving the characterization and classification of the vegetation [30].In this study, Most of the combo models (include both DR and FW metrics) show a higher variation explanation capability for predicting the biomass components than the DR-metrics (only) models, especially for of trunk biomass (in coniferous forests; with a adjusted-R 2 increase of 0.34) and foliage biomass (in broadleaved forests; with a adjusted-R 2 increase of 0.15), which is in agreement with their findings.Sumnall et al. (2012) investigated the estimation of forest structural variables from both discrete (system points) and full waveform LiDAR data [60].Height and intensity-related metrics were derived from DR data (system points), metrics of heights (derived from waveform points), amplitude and pulse widths of the waveforms were all derived from FW data.Results indicated that most of the FW-models (using both waveform points and full-waveform based metrics as independent variables) had a higher accuracy than DR models (using system points based metrics as independent variables) for estimating biophysical properties of a semi-natural forest in southern England, which agrees with our research which shows that most of the combo models had a higher performance than DR-models, and the full-waveform metrics contribute to the improvements by characterizing the detailed vertical structure of the forest stands.
For the discrete and full-waveform metrics analyzed in this study, these metrics related to vegetation heights, i.e., mean height, upper height percentiles (i.e., h 50 , h 75 , h 95 ), HOME (i.e., waveform centroid to the ground) and VDR (calculated from HOME and waveform distance), were the strongest predictors of biomass and its components.Larson (1963) found that the bole form and stem increment are strongly connected to the crown geometry and crown position [61].The LIDAR sensors measure 3-D structure characteristics of the forest canopy, which provides a good foundation for strong relationships between the LIDAR-derived metrics and forest biomass.Since most LiDAR returns are from dominant trees, the LiDAR mean height likely represents the height of overstory trees.The inclusion of waveform-derived HOME (which depicts the vertical arrangement of canopy elements) and VDR (describes the change of HOME relative to the canopy height) [25], helps to account for intermediate tree crowns in the midstory and suppressed trees in the understory, therefore potentially improving the characterizing of vertical structure.The predictive capabilities of these metrics are strong (in both the single and combo models) for estimating most of the biomass components, and they therefore provide a precise quantitative description of stand structure characteristics.This finding is consistent with the small-footprint discrete data [20], small-footprint full-waveform data [31,40] and large-footprint SLICER data [41,46] studies, in which mean height, height percentiles and HOME (and HOME-related waveform metrics) were found to explain most of variability in forest structure characteristics.
Compared to the individual DR and FW models, the combo models performed the best, explaining a large amount of variability for most biomass components, and indicating significant synergies in small-footprint DR and FW LiDAR data across the three types of subtropical forests.Fu et al. (2011) explored several regression models relating both percentile heights and canopy return density variables derived from airborne DR LiDAR for the estimation of above-ground biomass components in a similar forest type [56].Compared with our study, their results showed relatively low relationships for above ground biomass (R 2 = 0.37-0.68),and no significant correlation between foliage biomass and the DR metrics in the subtropical mixed forests.In this study, FW models have a higher variation explanation capability for predicting foliage biomass than DR models.The waveform statistic of HTMR μ and NP μ also seems to be sensitive to foliage biomass, and FS σ are significantly sensitive to the branch biomass.The results demonstrate promising results for improving the estimation of crown (i.e., branch and foliage) components, which would be useful for modeling forest growth and fire behavior.
In this study, the forest type-specific models represent a better performance than the general model (developed from all plots).This is consistent with the previous studies which demonstrate that forests with highly variable species composition should be stratified to ensure accurate biomass predictions [56,59].Naesset (2004) reported that forest type did not have any significant impact on the estimated biomass models in boreal forest (in southeast Norway, dominated by Norway spruce and Scots pine) [58].However, in our study, the complex forest conditions in subtropical forests contained greater species diversity, making the effects of tree-species composition (classified as forest types) significant.Overall, the models were more accurate for coniferous and broadleaved forests than in mixed forests across all models.The dominating coniferous and broadleaved species in our research site have a relatively even-aged tree structure and higher homogeneous composition, i.e., less variation in tree height and density at the plot level.In contrast, the mixed forest has more variability and less accurate LiDAR-derived heights within plots.Meanwhile, individual tree biomass in coniferous and broadleaved plots was calculated by relatively accurate species-specific allometric equations, while the mixed forest biomass was estimated from a less accurate, generalized allometric equation.Thus, the homogeneous forest conditions coupled with the accurate field-measured biomass is expected to strengthen the linear fit and reduce bias.Bater and Coops (2009) explored the relationship between the satellite-derived NDVI and LiDAR ground return density in a coastal western hemlock dominated forest, and found that ground return density decreased as NDVI (an estimate of vegetation cover) increased [55].Clark et al. (2004) also demonstrated that, in an evergreen tropical rain forest, RMS error of DEM on steep slopes was 0.67 m greater than on flat slopes [53].Meanwhile, the LiDAR ground retrieval was complicated by dense, multi-layered evergreen canopy in old-growth forests, causing an overestimation of 1.95 m.In this study, five plots with the largest bias of trunk biomass were analyzed by the factors of topographic and vegetation conditions.It was found that all of the five plot outliers had high levels of canopy cover, and most were located on steeper slopes.Thus, the lower predictive accuracy may due to less canopy penetration by LiDAR and consequently less accurate definition of the digital terrain model and canopy height model.Bater and Coops (2009) suggested that in those areas with high slope and low point density, the prediction uncertainty maps may provide insights into optimal DEM selection and improve the vegetation height estimation [55].

Conclusions
This study implemented two different approaches i.e., discrete data (DR) and full-waveform (FW) based approaches to extract plot-level LiDAR metrics, and evaluated the capacity of these metrics in predicting biomass and its components in three types of subtropical forests.The results indicated that the three sets of predictive models performed well across the different subtropical forest types (Adj-R 2 = 0.42-0.93,excluding foliage biomass).Forest type-specific models (Adj-R 2 = 0.18-0.93)were generally more accurate than the general model (Adj-R 2 = 0.07-0.79)with the most accurate results obtained for coniferous stands (Adj-R 2 = 0.50-0.93) in homogeneous forest conditions.In addition, the analysis of bias in the estimation of trunk biomass demonstrated that the lower accuracy may be due to less canopy penetration by the LiDAR returns, resulting in a less accurate digital terrain model and canopy height model.
According to the results, for the discrete and full-waveform metrics analyzed in this study, metrics related to vegetation heights, i.e., mean height, upper height percentiles, HOME and VDR, were the strongest predictors of biomass and its components.The FW-metrics were sensitive to the trunk biomass (in coniferous forests) and foliage biomass (in broadleaved forests).The results also demonstrated that most of the combo models (coupling of both FW and DR metrics) had improved performance than DR (only) models, and the FW metrics contributed to the improvements by characterizing the detailed vertical structure of the forest stands.
This study suggests that although the discrete and full-waveform LiDAR data extract forest canopy information in different ways, i.e., discrete data characterize the 3-D canopy structure, while full-waveform data recorded the detailed geometric and radiometric information of the canopy return, both discrete and full-waveform metrics can provide valuable information for accurately estimating specific biomass components in certain forest type of the subtropical forests.However, one should note that the choice of LiDAR metrics does impact the performance of models used for the biomass estimation, and different biomass pools are correlated with different LIDAR metrics.As a result, the combination of DR and FW metrics and the exploration of new sets of full-waveform metrics contributed to the development of more robust predictive models.
LiDAR biomass estimation has been conducted in temperate and boreal forests and the transition region of these forests [62][63][64].Published studies from tropical (and subtropical) forests are few.As a result, this study contributes to the accurate and quantitative estimation of biomass using state-of-the-art airborne LiDAR technology in the subtropical forests.In addition, to our best knowledge, this is the first time different forest biomass components (of the subtropical forests) have been tested and estimated by the pseudo-vertical waveform analysis approach.
Future works should focus on exploring more full-waveform metrics derived from the small-footprint full-waveform LiDAR data using the pseudo-vertical waveform analysis approach.For a improved (and direct) interpretability of the full-waveform derived variables, the full-waveform information could be aggregated at the individual tree level, and validated with field measurements at this fine spatial scale.This proposed work could provide a bridge between crown delineation algorithms and pseudo-vertical waveform analysis approaches, and would be useful for individual-tree based biophysical properties estimation and tree-species classification.62. Lim

( 30 ×
30 m) were established within the study site, covering a range of species composition, age classes, and site indices.These plots were divided into three types based on species composition: (i) coniferous forest (n = 15, dominated by Masson pine and Chinese fir); (ii) broad-leaved forest (n = 18, dominated by Sawtooth oak and Chinese sweet gum); and (iii) mixed species forest (n = 32, a mixture of coniferous and broad-leaved species).

Figure 1 .
Figure 1.Yushan forest study site and the locations of the plots of three main forest types: (a) Coniferous dominated forest; (b) Broadleaved dominated forest; (c) Mixed forest.Pictures and hemispherical photographs of three typical forest stands are shown beside the map of research area.

Figure 2 .
Figure 2. Technological route of this research.

Figure 3 .
Figure 3. Illustration of a typical Pseudo-vertical waveform in one sample plot.In this example, the full-waveform record has been normalized by computing the difference between their height and the DTM.Eight distinct peaks are apparent in the graph.The front slope angle (FS) is computed as the vertical angle from waveform beginning to the peak of first canopy return energy.

Figure 4 .
Figure 4. Intensity graph of the square of the Pearson's correlation coefficient (R 2 ) between each biomass components and LiDAR-derived metrics.The values of the coefficients are transformed into pixels within a blue-red color range.

Figure 5 .
Figure 5. Plot-level observed and estimated above-ground and root biomass for the combination of forest-type specific models developed form discrete data based metrics, waveform based metrics and their combinations.(a) Above ground biomass (AGB) estimated by discrete data(DR) based model; (b) AGB estimated by waveform(FW) based model; (c) AGB estimated by combo model; (d) Root biomass estimated by DR model; (e) Root biomass estimated by FW model; (f) Root biomass estimated by Combo model.

Figure 6 .
Figure 6.The selected LiDAR metrics of the discrete data model vs. full-waveform model in various forest types.

Figure 7 .
Figure 7.The selected LiDAR metrics of the combo model in various forest types.

Figure 8 .
Figure 8. Relationship between plot-level field measured and LiDAR-estimated biomass components of three sets of models (i.e., discrete data based model, full-waveform based model and the combo model) within each type of forest.Each biomass components are calculated as the average value of the plots within specific forest type.

Figure 9 .
Figure 9. Relationship between the amount of field-measured and LiDAR-estimated biomass components across all field plots (n = 65) (abscissa: all of the 65 field plots; ordinate: the amount of biomass components): (a) the field-measured biomass components are displayed as area.The biomass components estimated by the combo model are displayed as dots; five plots with the largest bias in trunk biomass are highlighted and analyzed against four topographic and vegetation factors; (b) slope (degree) and canopy cover (%) of all plots with the five outlier plots highlighted.

Table 1 .
Summary statistics of field measured biomass and biomass components in the study area (n = 65 plots).

Table 2 .
Summary of LiDAR metrics computed from discrete and full-waveform LiDAR data.

Table 3 .
The accuracy assessment results of three sets of predictive models among various forest types.

2 C-R 2 RMSE rRMSE Aj-R 2 C-R 2 RMSE rRMSE Aj-R 2 C-R 2 RMSE rRMSE
Note: Aj-R 2 : adjusted R-square; C-R 2 : The cross-validation coefficient of determination; RMSE: Root mean square error; rRMSE (%): relative RMSE, i.e., percentage of the ratio of RMSE and the observed mean value.See Table1for codes of biomass and its components.