Forest Canopy LAI and Vertical FAVD Profile Inversion from Airborne Full-Waveform LiDAR Data Based on a Radiative Transfer Model

Forest canopy leaf area index (LAI) is a critical variable for the modeling of climates and ecosystems over both regional and global scales. This paper proposes a physically based method to retrieve LAI and foliage area volume density (FAVD) profile directly from full-waveform Light Detection And Ranging (LiDAR) data using a radiative transfer (RT) model. First, a physical interaction model between LiDAR and a forest scene was built on the basis of radiative transfer theories. Next, FAVD profile of each laser shot of full-waveform LiDAR was inverted using the physical model. In addition, the missing LiDAR data, caused by high-density forest and LiDAR system limitations, were filled in based on the inverted FAVD and the ancillary CHM data. Finally, LAI of the study area was retrieved from the inverted FAVD at a 10-m resolution. CHM derived LAI based on the Beer-Lambert law was compared with the LAI derived from full-waveform data. Also, we compared the results with the field measured LAI. The values of correlation coefficient r and RMSE of the estimated LAI were 0.73 and 0.67, respectively. The results indicate that full-waveform LiDAR data is a reliable data source and represent a useful tool for retrieving forest LAI. OPEN ACCESS Remote Sens. 2015, 7 1898


Introduction
Forest canopy leaf area index (LAI) is a critical variable in modeling climates and ecosystems and is required by many terrestrial ecosystem models [1].LAI of high accuracy will promote accurate modeling of energy, carbon, water, and climate [2,3].Remotely estimating forest structure potentially provides an ecologically significant advance over laborious ground-based estimation methods and has become widely used to characterize forest structure [4].LAI is often retrieved from passive optical remote sensing, and there are two primary types of LAI inversion methods.The first is based on developing the empirical or semi-empirical relationship between the vegetation indices and the LAI, and the other is based on radiation transfer models using a look-up table (LUT) or a neural network (NN) [5], or an optimation method such as Levenberg-Marquardt algorithm [6].However, estimated LAI values tend to saturate in areas of dense vegetation and high biomass.
Light Detection And Ranging (LiDAR), which is rapidly emerging as an active remote sensing technique, can provide direct measurement of the vertical structure of canopy that traditional optical data cannot, and has been successfully used to estimate the canopy height, LAI, biomass, and other variables [7][8][9][10][11] and is shown to be capable of measuring dense canopies [12].
Normally, LiDAR systems are classified as discrete-return or full-waveform depending on their analysis of the reflected signal.Many discrete-return systems capture between two and five values for every return laser pulse [13].Full-waveform LiDAR provides more information regarding the structure and physical properties of the illuminated surface than discrete-return LiDAR.In the past, the full waveform data were often decomposed to produce 3D point clouds.Hofton et al. [14]decomposed the LVIS waveform into a series of Gaussian components using a nonnegative least-squares method (LSM), and Persson et al. [15] processed the Gaussian decomposition based on the expectation maximization (EM) algorithm for the TopEye Mark II system.Wagner et al. [16] tested LMS-Q560 data and demonstrated that the decomposition was successful for approximately 98% of waveform profiles.These clouds are subsequently used as the discrete data [17], or by building regression models using the calculated waveform metrics with field measurements to estimate forest parameters [18][19][20].However, neither approach uses all of the information provided in the waveform data.Adams et al. [21] explored whether the waveform metrics of Gaussian peak height, half-height width, and exponential decay rate can distinguish the return types of foliage, understory, and ground.However, the result was that these three metrics could not determine the return type.The metrics alone did not bear a direct relationship to foliage density, even though foliage density correlated well with the decay rate using a simple model based on Beer-Lambert law.
The interaction between LiDAR and vegetation should be explored further to make better use of the waveform information.Several studies have been performed on modeling waveforms in forest areas, which is particularly difficult due to the complexity of the internal structure of the trees [22].
Blair and Hofton [23] first simulated the return waveform under the following assumption: the waveform represents the vertical distribution of the intercepted surfaces with individual backscattering characteristics but with equal reflectivity, which is typical for dense forests, and their model is based on aggregating discrete return data rather than using an RT model.Sun and Ranson [24] developed a 3D model that successfully linked full waveform data to the structural and optical properties of forest by building a 3D forest stand scene divided into cells with specific characteristics.Although Koetz et al. [25] successfully retrieved LAI from a synthetic dataset using the 3D model, this study failed to retrieve the vertical foliage distribution of the forest.Ni-Meister et al. [26] developed a method to derive the gap probability and canopy cover from LiDAR waveforms by modifying a hybrid geometric optical-radiative transfer (GORT) model.The basic assumption of the model is that gap probability is the reverse of the vertical canopy profile because laser energy can only penetrate to the lower canopy layer or ground through gaps.However, due to the clumping effect for natural vegetation, the actual foliage profile cannot be determined; only the gap probability and apparent foliage profile can be directly retrieved from the LiDAR return [27].Armston, et al. [28] and Chen, et al. [29] used a simple RT model for direct retrieval of canopy gap probability from waveform LiDAR.This model can improve the estimation of canopy gap probability from waveform LIDAR data and is self-calibrating for the ratio of the reflectance from crown and ground.
Morsdorf et al. [30] simulated LiDAR return waveforms by combining three different modeling components: the leaf optical properties, the tree structure, and the LiDAR measurement process (which builds on the open-source ray-tracing program POVRAY).Using the multi-spectral canopy LiDAR data, the NDVI profiles of the trees were obtained to monitor the chlorophyll content of the forests.Lewis [31] built a canopy reflectance model that can create a 3D structure model of individual plants using Monte Carlo ray tracing.North et al. [32] used the Monte Carlo radiative transfer model of GLAS waveforms for 3D canopies within the framework of the FLIGHT model; the integrated waveform energy and directly derived bidirectional reflectance factors (BRFs) from FLIGHT were in good agreement.Hancock et al. [33] used the Monte Carlo ray tracer from Lewis [31] to create a full waveform LiDAR simulator of dual wavelength to separate canopy from ground over steep topography.Though Monte Carlo ray tracers make far fewer assumptions than did Sun and Ranson [24] and Ni-Meister et al. [26] in their methods, they are computationally complex.
Figure 1 illustrates the basic process of the LiDAR-forest interaction.When the LiDAR pulse penetrates the canopy, the return waveform reflects the inner structure of the canopy.This study aims to map the spatial and vertical distribution of forest structural parameters using waveform LiDAR data on the basis of a physical inversion model.The canopy that the laser pulse shoot was divided into many layers and the model assumes that either foliage or air is in the slice.The LAI and foliage area volume density (FAVD) of each layer (shown in small cubes in Figure 1) is addressed in the inversion process, and the final result of the LAI is calculated based on the FAVD retrieval results.The main steps are illustrated in Figure 2. First, the full-waveform data are denoised and decomposed to rebuild the emitting and return waveforms using the Gaussian decomposition parameters.Then, FAVD of each LiDAR shot is inverted based on the physical radiative transfer model between LiDAR and the forest scene.Due to the sampling missing data between footprints of the waveform data, FAVD of the missing sampling shots is filled in using a canopy height model (CHM) of the study area, which was generated from the discrete LiDAR data concurrently obtained with the full-waveform data.Finally, LAI was retrieved using the FAVD at a 10-m resolution and compared with the CHM-derived LAI and the in situ LAI measurements.Note that in this paper, the leaves and the trunk are not considered separately, so the FAVD and LAI are actually equivalent to plant area volume density and plant area index.

Study Sites
The study area, Dayekou forest (38.6°N, 100°E), is located in the Qilian Mountain area of Gansu province in the arid region of northwestern China.The mean altitude of this study area is approximately 2400 m, and the slope of the study area is relatively gentle with a mean value of approximately 8.2°.
The forest is influenced by the temperate continental mountain climate, and the annual precipitation occurs mainly in summer.Prevalent vegetation types at this site are mountain pastures and forests, and the dominant forest type is Qinghai Spruce (Piceacrassifolia).Figure 3 shows an RGB photograph and CHM image of the study area with a spatial resolution of 0.5 m.

Field Measurements
Field measurements were acquired during June 2008 in 15 sample plots with a size of 25 m ×25 m (shown as red crosses in Figure 3).LAI at each plot was measured using LAI-2000 Plant Canopy Analyzer (LAI-2000) [34] and Tracing Radiation and Architecture of Canopies (TRAC) [35], as described in detail in [36].
In our study, the true LAI was used as the test data, which is a combination of LAI measured using LAI-2000 and LAI measured using TRAC, and the ratio, namely foliage clumping index, which can convert effective LAI to true LAI, is derived from TRAC.

Airborne LiDAR Data
A full-waveform LiDAR system digitizes the complete waveform of the emitting laser pulse as well as the return laser pulse scattered back from the target.The discrete-return LiDAR system only records single or multiple returns often taken as the peak of a pulse.The airborne full-waveform LiDAR data for the study area was acquired on June 28, 2008 using a RIEGL LMS-Q560 laser scanner; the LiDAR system and flight parameters are listed in Table 1 [37].Though the scan angle range is ±30°, only the scan route that was under nadir was chosen, i.e. the scan angle of chosen dataset was less than 10°.
In addition, a CHM (Figure 3b) that represents the forest canopy height distribution was built by filtering the discrete LiDAR data concurrently obtained by the scanner.The discrete raw data was processed using the Terrasolid software to classify the data into two types, i.e., ground points and non-ground points, which can produce a digital elevation model (DEM) and a digital surface model (DSM), respectively.The CHM was created at a resolution of 0.5 m by subtracting the DEM from DSM.

LAI Derived from CHM
A 10 m LAI map derived from CHM was used as a dataset to compare with LAI derived from the full-waveform data.Here LAI was calculated based on the Beer-Lambert law [38,39]: where G(θ) is the foliage projection coefficient characterizing the foliage angular distribution, in a canopy with a spherical leaf angle distribution, G(θ) is approximated by 0.5, θ is the zenith angle of the incoming radiation, which is ignored since only data with scan angle <10° was used in this study, Ω is the clumping index, since the study site is mainly covered by Qinghai spruce, Ω was assigned to an constant 0.69 calculated from measurement by TRAC.fcover is the canopy cover factor.In this study, CHM pixels greater than zero are considered to be canopy pixels, and fcover is calculated as the ratio between the number of canopy pixels and the number of corresponding CHM pixels within the 10 m-resolution pixel.

Decomposition of Waveforms
Gaussian decomposition provides estimates of the location and scattering properties of the targets within the travel path of the laser beam, and it assumes that the cross-sectional profile can be represented by a series of Gaussian functions [16].The input parameters for the inversion model must be prepared by decomposition of the waveform data into a sum of Gaussian components.First, we assume the emitting laser pulse is Gaussian, and the return waveform is a series of Gaussian components: where l is the number of Gaussians, and αi, xi, and σi is the amplitude, position, and half-width of the Gaussian component, respectively.The essence of Gaussian decomposition is to solve 3l unknowns of L observation equations (L>3l) using the nonlinear least-squares method (LMS).Before decomposition of the waveform, the noise should be removed by calculating the threshold for each waveform.In our study, Persson's [15] algorithm, which is based on computing the median absolute deviation (MAD), was adopted to calculate the noise threshold.The equation is as follows: where wave is the original waveform, and median means calculation of the MAD from samples.The samples of the original waveform below the noise threshold were set to zero.Then, to eliminate isolated values, samples were set to zero if both left and right neighbors were also zero.To better estimate the number, amplitude, position, and half-width of the sub-Gaussian components, the initial values of these parameters should be defined.The initial values of l, xi, and αi were calculated based on the number, position, and value, respectively, of the local maxima of the denoised waveform, and the initial value of σi was set to 2, because inspection of the emitting waves indicated that their half-widths were near 2.
Equation ( 4) is the object function of the LMS optimization algorithm, where f(xi) is the observation value, yi is the calculated value, and wi is the observation error.Optimization is the process of constantly adjusting model parameters to minimize the objective function value.Sequential quadratic programming (SQP), which is an excellent optimization algorithm for solving nonlinear programming problems [40], is adopted to search for the cost function minimum.

/
(4) Figure 4 shows a pair of emitting and return waveforms before and after Gaussian decomposition.The emitting wave can be modeled by a Gaussian wave using nine sampling values.According to Dubayah and Drake [41] and Mallet and Bretar [22], for small-footprint LiDAR systems, the laser beam has difficulty penetrating the vegetation and has a high possibility of missing the ground under dense canopy.In this paper, we assume that the last return Gaussian wave from either ground or tree branch and is shown in the right figure of Figure 4 by the sub-Gaussian wave whose position is 40.4.The distance between the start of the entire signal and the start of the last wave is defined as the canopy range (Hr), shown as the distance between the two blue lines in Figure 4.The number of canopy layers is defined as the ratio between canopy range and the vertical sampling resolution of the LiDAR system (0.15 m).Moreover, a series of parameters after decomposition should be determined as the input parameters of the FAVD inversion model: the number of canopy layers m, the number of emitting wave samples n, and the reconstructed emitting and return waveforms.

Model of LiDAR Waveform of Forest
The physical model of the interaction between LiDAR emitting laser pulse and forest was built based on RT theories and a 3D model [24].In this model, the forest canopy is described as a turbid scattering medium that is parameterized by its FAVD, μ, the Ross-Nilson G-factor, and the foliage reflectance, rleaf.The background (soil) within the forest scene is parameterized by its reflectance, rsoil.
Because LiDAR is working under hot spot conditions, the bidirectional gap probability p of the canopy in the direction Ωi, at the canopy depth z, can be expressed as: where is the density of a one-sized leaf area at depth , , Ω is the G-factor, the mean projection of unit foliage area at a depth z' in the direction Ωi.
The emitting laser pulse was divided into n narrow pulses ( , 1 … , with the duration time set to the time resolution of the system (Δt 1 ns), and the canopy was divided into m layers from top to bottom, with the thickness Δz the same as the vertical range resolution of the system (Δz = Δt × c/2 = 0.15 m, c is the speed of light).Supposing the LiDAR return pulse begins at a time when the first sub-emitting pulse reaches the first canopy layer, then, when the ith sub-pulse reaches the jth sub-canopy layer, the time delay is: In addition, the returned energy is: where Γ • • is the back-scatter factor of the jth canopy, ω is a parameter determined by the LiDAR system, Γ is the scattering phase function of the canopy Γ / , and is the extinction above this canopy layer exp ∑ • • Δ [24].
When the ith sub-pulse reaches the m+1 soil layer, the returned energy is calculated from Equation (8).
To set the LiDAR system parameter, ω, the soil return sub-Gaussian wave extracted from the full return waveform was used.So, in this model, the calibration of the LiDAR system is not necessary.The return signal is recorded in m + n digital bins according to the time delay.

Inversion of FAVD and LAI
The input parameters of this model include the reflectance of leaves, the reflectance of soil, and the G-factor.The reflectance of leaves and soil were set to 0.55, and 0.2, respectively, according to their field measurements [42].G-factor was assumed to be 0.5, the same as in Section 2.4.The LMS method was used for the inversion process, and the cost function described in Equation ( 9) was used.(9) In Equation ( 9), Iobserve is the LiDAR observed return signal, Imodel is the result estimated by the interaction model, and n is the number of observation samples; moreover, the vertical FAVD within a laser shot, 1 ••• , was set to be a nonnegative to control the optimization process.

Filling in Missing LiDAR Data
After displaying the sampling positions of laser shots, it was found that many of the laser shots missed the denser canopy areas, as shown in Figure 5a.Note that all of the samples along the waveform are missing; this may occur either because the forest is so dense and high that the laser cannot penetrate or because the system fails to detect the return signal due to an excess of energy attenuation after penetration.
To obtain the complete forest information, the corresponding missing FAVD was filled in as described in this section.First, the number and position of the missing laser shots was calculated based on the temporal and geolocational information of the existing ones.The statistics of the temporal information for emitting laser shots indicate that the normal emitting time interval, Δt, is 0.00002 s; therefore, the missing laser shot number n and its corresponding coordinates , , , • ) where ΔT is the time interval between two existing neighboring laser shots, and (X1,Y1,Z1), (X2,Y2,Z2) are their coordinates.Then, the corresponding forest canopy height of the missing laser shot was searched on the CHM according to its coordinates.When the corresponding height was confined to the range [1.9, 50], the missing shot area was considered to be canopy, and in this case, FAVD of this shot could be filled in.The range [1.9, 50] is chosen based on the tree height measurements from the sample plots (1656 trees in total).To minimize the error, the nearest existing shot that had the same or similar canopy height value was found, and its inverted FAVD was assigned to the missing one.Following this step, the FAVD reconstruction for one missing shot was completed.Figure 5b shows the positions of the filled-in shots; the shapes of these filled-in shots are round circles corresponding to forest canopy.
Figure 5c shows the positions of the original and filled-in shots combined, and the number of empty holes is greatly reduced.Besides, among the filled FAVD profiles, about half of them have the height exceeding 8 m.
To validate this gap-filling method, 1500 out of 8000 waveforms were randomly removed from the site, and were filled up again using the above method.It was found that 939 waveforms (about 63%) can be filled.In reality, it is the areas with dense and high forest coverage that lacks the full-waveform data; this validation method cannot detect this bias.However, areas with tree height above 8 m were examined, and 82% waveforms can be filled.Besides, aggregated FAVD values of the 939 original and filled waveforms were compared, and results indicate that they have a correlation of 0.64.

Inversion of LAI from FAVD
The diameter of the laser shot is considered a function of the laser incidence angle (α) and the relative flight height (Hf) approximately expressed as: In this equation, α is determined by the laser beam divergence and the illuminating surface slope; because the laser shot is small, the surface slope effect is not considered in our study, and α is assigned as 0.5 mrad, according to the laser beam divergence (see Table 1).According to the flight parameters around the sample plots, D is primarily in the range of (0.3~0.4 m).
To retrieve the 2D LAI information from the 3D FAVD of the inversion results, a cube was constructed, with the size determined by the coordinate range of the LiDAR samples.To guarantee that there were enough samples within the voxels in the cube, the size of the voxel was set to be 10 m × 10 m × 15 m.Supposing the LiDAR system takes a random sample after the reconstruction work, the FAVD of one voxel is the volume weighted average of the FAVD of the samples (u1, u2•••, uk) within the voxel, as described by Equation (16).
In Equation ( 14), Hk is the height from the sensor to the target, the value of which can be extracted from the LiDAR data.
After calculation of the voxel FAVD values, the 2D LAI information can be obtained by integration of FAVD values in the height.

Vertical FAVD Profile on Laser Shot Scale
A total of 40,022 return waveforms were inverted by the model in this paper.Figure 6 shows the FAVD inversion results of two waveforms: the left images are the original observed return waveforms and the modeled waveforms, and the right images are the results of FAVD of each canopy layer, with a total of 18 and 13 values, respectively.The red waves were modeled by the LiDAR-forest radiative transfer model.The modeled waveforms and the original waves align well.To better illustrate the inverted FAVD of our study area, Figure 7 gives a vertical profile of the FAVD distributed in the rectangular slice area in Figure 3a.It can be found from Figure 7 that trees are very sparse on the southern side of the mountain, which is consistent with the field survey and the photograph in Figure 3. Figure 8 is the cumulative frequency distribution of FAVD for the whole study area; 42.5% of FAVD values are below 0.1, 80.1% are below 1, and 94.7% are below 3.However, little correlation was observed between the CHM derived LAI and full-waveform derived LAI (r is 0.20, Figure 10a).The histogram of CHM derived LAI shows that there is only one LAI peak located at 3.13 (Figure 10b).This may result from using simply canopy cover to calculate LAI, ignoring the vertical information of the canopy, and thus cannot detect the differences between high trees and low grass.Moreover, full-waveform derived LAI is larger than CHM derived LAI at higher LAI values (LAI > 2.5), and CHM derived LAI is more easily saturated.

LAI Inversion and Comparison with field measurements
In addition, the estimated LAI values were compared with the field measured LAI in 15 sample plots to address the accuracy of our method; as shown in Figure 9c, the value of the correlation coefficient, r, is 0.73, and the RMSE value is 0.67.

Discussion
This study represents an attempt to retrieve vertical forest information from full-waveform LiDAR data.This technique can still be improved in several aspects.First, this method must be tested on different forest types and expanded to a larger study area; due to the lack of computing capacity, only a small area was chosen in this study.Second, due to the lack of field measurements of forest FAVD distributions in our study area, the FAVD inversion results should be validated in a future study.Furthermore, this LiDAR-forest interaction model is only suitable for use with small-footprint full-waveform LiDAR data; if large-footprint LiDAR data, such as GLAS [43] or LVIS [44] data are used, the slope effect must be considered and corrected [45]; besides, multi-scattering is ignored in this RT model since the laser beam divergence is less than 0.5 mrad, while for large-footprint LiDAR data, the multi-scattering effect should be included.Lastly, the gap filling method used in this paper will cause underestimation of LAI, since the true FAVD cannot be obtained due to the loss of full-waveform data in dense and high forest.In future work, a more detailed assessment of the potential biases of this method is needed.However, if high-density sampling LiDAR data is available, then this gap filling procedure and its potential biases could be avoided.With the development of LiDAR systems and the feasibility of LiDAR data, more attention will be devoted to studies concerning the physical interaction mechanism between LiDAR and the target, or the combination of LiDAR data with other remote sensing data sources, such as hyperspectral and high spatial resolution data.

Conclusions
Recently, various approaches have been undertaken to retrieve canopy structural information directly from LiDAR data sources, or by integration of optical remote sensing data [46].However, current work has failed to make full use of the waveform information, nor to explore the physical mechanism of the interaction between LiDAR and vegetation.The main advantage of this study consisted in direct and rapid retrieval of forest canopy vertical FAVD profiles and LAI from airborne full-waveform LiDAR data, based on the physical LiDAR-canopy interaction model.The method discussed in this paper makes it possible to take advantage of the geolocational, temporal and intensity information of the full-waveform data, besides, with the aid of CHM data, the inconsistency and data gap could be eliminated to some extent.At last, we compared the full-waveform derived LAI with field measurements, the values of correlation coefficient r and RMSE were 0.73 and 0.67, respectively.This result indicated that LAI inverted from waveform LiDAR data has the ability to serve as one of the validation sources for other products and data.

Figure 1 .
Figure 1.Basic process of the Light Detection And Ranging (LiDAR)-forest interaction.

Figure 2 .
Figure 2. Flow chart of leaf area index (LAI) retrieval from full-waveform LiDAR data.

Figure 3 .
Figure 3. Distribution of the study area and field sites on (a) RGB photograph and (b) CHM image.Both images have a spatial resolution of 0.5 m.

Figure 4 .
Figure 4.A pair of emitting and return waveforms before and after Gaussian decomposition.

Figure 5 .
Figure 5. Positions of the laser shots (a) original (b) filled using the ancillary CHM data (c) original and filled.

Figure 6 .
Figure 6.FAVD (Foliage Area Volume Density) inversion results of two different waveforms.

Figure 7 .
Figure 7. FAVD distribution in the green rectangle area in Figure 3. Trees on the southern side of the mountain are very sparse.

Figure 8 .
Figure 8. Cumulative frequency distribution of FAVD for the whole study area; 42.5% of FAVD values are below 0.1, 80.1% are below 1, and 94.7% are below 3.

Figure
Figure 9a,b show the LAI inversion results of our study area.A comparison of (a) and (b) illustrates that LAI distribution is consistent with the canopy density.There are two LAI peaks (0.9 and 3.7) shown in Figure 9b corresponding to the two different landcover types in the study area, which are grass and Qinghai spruce, respectively.

Figure 9 .
Figure 9. (a) LAI inversion results of the study area (with the spatial resolution of 10 m).(b) Histogram of LAI.(c) Scatter plot of the inverted LAI and the field measured LAI (r = 0.73, RMSE = 0.67).