Combined Use of Airborne Lidar and DBInSAR Data to Estimate LAI in Temperate Mixed Forests

: The objective of this study was to determine whether leaf area index (LAI) in temperate mixed forests is best estimated using multiple-return airborne laser scanning (lidar) data or dual-band, single-pass interferometric synthetic aperture radar data (from GeoSAR) alone, or both in combination. In situ measurements of LAI were made using the LiCor LAI-2000 Plant Canopy Analyzer on 61 plots (21 hardwood, 36 pine, 4 mixed pine hardwood; stand age ranging from 12-164 years; mean height ranging from 0.4 to 41.2 m) in the Appomattox-Buckingham State Forest, Virginia, USA. Lidar distributional metrics were calculated for all returns and for ten one meter deep crown density slices (a new metric), five above and five below the mode of the vegetation returns for each plot. GeoSAR metrics were calculated from the X-band backscatter coefficients (four looks) as well as both X- and P-band interferometric heights and magnitudes for each plot. Lidar metrics alone explained 69% of the variability in LAI, while GeoSAR metrics alone explained 52%. However, combining the lidar and GeoSAR metrics increased the R 2 to 0.77 with a CV-RMSE of 0.42. This study indicates the clear potential for X-band backscatter and interferometric height (both now available from spaceborne sensors), when combined with small-footprint lidar data, to improve LAI estimation in temperate mixed forests.


Introduction
Leaf area index (LAI) is an important canopy descriptor used to estimate growth and productivity in forest ecosystems. Watson [1] stated one of the early definitions of LAI as the total one-sided area of leaf tissue per unit of ground surface area. Thus, LAI is a dimensionless index that represents an important method to quantify the amount of photosynthesizing tissue in forests. Leaves are radiation receivers (depending on the amount of productive leaves and their specific surface area, they absorb between 80 to 90% of the light assimilated by forests). Leaves are the main photosynthesizing organ in forest stands, thus variations in leaf production and light interception are directly related to forests growth and development. Accordingly, LAI is a key variable that can be used to monitor current forest stand growth and has become a key explanatory variable for ecosystem process models.
Remote sensing estimation of LAI has been mostly based on empirical modeling, using vegetation indices, generally developed with the spectral reflectance from the near-infrared and red wavelengths, and their correlations with ground-truth estimates. However, the use of optical imagery carries some disadvantages, as follows: (1) it is only suitable when evaluating horizontal variation; (2) optical passive sensors are unable to obtain data from the ground when it is obscured by clouds, and most importantly; (3) vegetation indices calculated using optical imagery tend to reach a saturation point when LAI values are above 3 [2][3][4]. This last limitation can be particularly important when estimating LAI in the eastern US hardwood and mixed forests where reported estimations have ranged from 3.9 to 7.3 [5] and from 3.5 to 5.1 [6].
Two fairly recent technologies could potentially improve the estimate of LAI in these forests where canopies can vary greatly not only horizontally but also vertically, and the likelihood of reaching a reflectance saturation point is high. Light detection and ranging (lidar) sensors measure the time between the emission and reception of laser pulses to estimate the location and height of the target feature. They thus acquire information in three dimensions (x, y, and z coordinates) and provide the means to evaluate variation across a vertical profile. Previous studies in which LAI was estimated, in mixed forests (distributed around the world) using lidar data, have reported R 2 's between 0.37 and 0.93 related to a number of indices and lidar metrics. For a coniferous and mixed beech forest in Italy, the laser penetration index (LPI, taking into account the transmission of the laser beams through the canopy, as the proportion of laser pulses that hit the ground to the total number of pulses) was created explaining 89% of the variation of LAI [7]. Later, Kwak et al. [8] using the LPI and an interception index (LII) could explain 86% of the variation in a South Korean mixed forest. In Japan, using the ground fraction of returns, Sasaki et al. [9] reported an adjusted R 2 of 0.80 for an evergreen and broad-leaved forest. Solberg et al. [10] reported correlations with laser penetration from 0.37 to 0.93 (depending on plot size), in a mixed forest in Norway. Hopkinson and Chasmer [11] evaluated coniferous and mixed forests in Canada, and found R 2 's ranging from 0.58 to 0.78, when using either lidar returns ratios or intensity of returns ratios. In United States, Zhao and Popescu [12] found an R 2 of 0.84 in a mixed hardwoods and coniferous (including plantations) from Texas, and Richardson et al. [13] reported R 2 values from 0.49 to 0.66 for a Pacific Northwest mixed forest. Other efforts to estimate LAI with lidar, using different approaches than LPI and in either coniferous or hardwood forests only, have shown similar promising results [14][15][16][17][18][19]. In addition, none of these prior studies have reported a maximum LAI or saturation problem using lidar.
Dual-band interferometric synthetic aperture radar (DBInSAR) can now be collected using the geographic synthetic aperture radar (GeoSAR) airborne radar mapping system. GeoSAR acquires X-band (VV, 9.7 GHz) and P-band (HH, 0.35 GHz) simultaneously over 11 km swaths [20]. GeoSAR has emerged as a potential instrument to be used to estimate forest attributes, such as canopy height [21] and biomass [20,22]. Long wavelengths from the P-band (0.85 m) penetrate the upper canopy and can reach the ground; short wavelengths (0.03 m) from the X-band are scattered at the top of the canopy. This technology has been widely used in tropical regions where forest canopies are usually under clouds most of the year [23,24].
Previous attempts to estimate LAI using SAR (Synthetic Aperture Radar) data have found low correlations between ERS-2 (European Remote Sensing Satellite-2) SAR backscatter and LAI or biomass, but significant correlations between a green leaf biomass index (calculated using ERS-2 SAR backscatter) and LAI, in Mediterranean vegetation [25]. Sun et al. [26] evaluated the correlations of biomass with the Laser Vegetation Imaging Sensor (LVIS) and with SAR signatures. Their results showed that the predictive models that used both lidar samples and radar imagery explained between 63% and 71% of the variability in biomass. In addition, as an effort to simulate the X-band interferometric heights using airborne lidar, Solberg et al. [27] reproduced InSAR heights with an R 2 of 0.78 by using a simulator that could be used for understanding how vegetation changes may affect the InSAR data. Other researchers have found saturation problems for the C-band (radar band that operates at a wavelength of 4-8 cm) backscatter with high values of LAI in tundra ecosystems and plantation forests [28,29]. Manninen et al. [30] used a C-band backscatter ratio from ENVISAT (ENVIroment SATellite)/ASAR (Advanced Synthetic Aperture Radar) in a mixed forest obtaining an RMSE of 0.27.
While these and other studies have used radar backscatter to estimate LAI, none to date has assessed the potential utility of interferometric heights for LAI estimation. Given that lidar data have been shown to enable robust LAI estimation, and both lidar and DBInSAR can be used to estimate canopy heights, we posited that the DBInSAR data from GeoSAR could be useful for remote sensing of LAI. As such, the objective of this study was to determine whether leaf area index in temperate mixed forests is best estimated using multiple-return airborne laser scanning (lidar) data or dual-band, single-pass interferometric synthetic aperture radar data (from GeoSAR) alone or both in combination.

Study Site
The study area was located in the Appomattox-Buckingham State Forest in Virginia, at 37°25'9"N and 78°40'30"W ( Figure 1). This state forest is the largest in Virginia, covering approximately 8,094 ha. It is located within the Piedmont physiographic region. The elevation in the area ranges between 159 to 238 m, with a gentle relief often described as rolling slopes and flat terrain. The mean annual precipitation is between 78 to 120 mm; and the mean annual temperatures can vary from −4 (winter) to 31 °C (summer). The state forest is managed as multiple use, with forestry and hunting the predominant activities. It is divided in 27 compartments that include 1,244 stands. This forest is composed of coniferous, hardwoods, and mixed stands. Three pine species are present, as follows: loblolly pine (Pinus taeda L.), shortleaf pine (Pinus echinata Mill.), and Virginia pine (Pinus virginiana Mill.). Among the deciduous trees are northern red oak (Quercus rubra L.), white oak (Quercus alba L.), red maple (Acer rubrum L.), yellow poplar (Liriodendron tulipifera L.), blackgum (Nyssa sylvatica Marsh.), and American beech (Fagus grandifolia Ehrh.). The forest stands had been previously classified based on the species groups and their relative stoking [31], in six forest vegetation types: bottomland hardwoods, upland hardwoods, pine-hardwoods, loblolly pine, shortleaf pine, and Virginia pine. The measurement plots were installed following the US National Forest Inventory and Analysis (FIA) guidelines [32] and were of two types: fixed radius plots and variable radius plots, the latter based on basal area guidelines. The fixed radius plots were installed in 1999 and the variable radius plots were installed in 2002 using a basal area factor of 10 (BAF) and following a grid of 201.17 m [33]. The plots were composed of 4 circular sub-plots (one in the center and three in a triangle shape around the center); each sub-plot has a radius of 7.32 m (for tree measurement), and they are 36.58 m apart from each other. For more details about this study design (see [34]). The center of the plots was found using GPS navigation. All plots (fixed and variable radius) were measured during the 2008 dormant season. Out of the 81 fixed radius plots initially installed, only 51 were measured for this research, since 12 of them had been harvested. Additionally, from the 219 variable radius plots already installed, only 30 could be measured, and those were distributed mainly near the access roads ( Figure 1). The plots measured were distributed among 25 stands covering over 405 ha.
Total tree height (ht) and diameter at breast height (DBH) were assessed for every individual with a DBH > 2.54 cm within the measurement plots using a Haglöf Vertex hypsometer and diameter tape. A 20 m buffer was applied to each plot (from its center) to generate circular plots of 1,256.6 m 2 of size.

Leaf Area Measured with an Optical Sensor
Leaf area index was indirectly estimated during late summer (23-25 September 2008), using the LiCor LAI-2000 Plant Canopy Analyzer. Although, the ground data collection was undertaken in September, the vegetation was mostly green at all levels of the forest vertical profile. The LAI-2000 measures the gap fraction (the probability that a ray of light will go through the canopy without intercepting any plant element), and applies the Beer-Lambert law to estimate leaf area, assuming that the foliage elements (independent of whether they are leaves, needles or shoots) are randomly distributed, and that the leaf size is small compared to the size of the canopy. Thus, when these assumptions are not met, underestimations can occur. However, this instrument has been widely used in previous research, due to it being one of the non-destructive methods to estimate leaf area index, and due to its portability, especially when sampling plots with high amount of understory located in remote areas. Above-canopy readings were recorded remotely every 15 s by placing an instrument in an open field adjacent to the stand during the same date and time that measurements were taken inside the stand. The measurements inside the stand, below-canopy readings, were made holding the instrument at the height of 1 m facing upwards. This same procedure was repeated in every plot regardless of the presence of understory or mid-story vegetation. Due to the instrument design, measurements were taken under diffuse sky conditions to ensure that the sensor used indirect light only. Thus, measurements were taken during the dawn and pre-dusk periods, with the above instrument facing north and using a 90° view cap. Sampling points were distributed in the following manner: one reading at the center of the plot, and one reading at 5 m away from the center in each cardinal direction (north, south, east and west), for a total of 5 readings per plot. The calculation of LAI was accomplished using the FV-2000 software which averaged all the readings per plot. The canopy model used to calculate LAI was Horizontal [35]; the ring number 5 was masked to reduce the error introduced by the stem and branches of trees; and the option of skipping records with transmittance >1 was used in order to avoid bad readings that can alter the mean values of LAI per plot. The above and below canopy records were matched by time [36]. Table 1. Explanatory variables derived from lidar and GeoSAR. Return hag refers to the return height above the ground. Statistics in subscripts were as follows: frequency (total), mean, mode, standard deviation (stdv), coefficient of variation (cv), minimum (min), maximum (max), and height percentiles (10th, 20th,…,90th). The metrics Gr total , All total , Veg total , Gr returns , All pulses , and Veg pulses were determined for calculation of other metrics (i.e., proportions of returns), but were not used for model development.

Lidar Metrics
Symbol Total number of ground returns Gr total

All returns (return hag > 0.2 m)
Units are meters for all metrics except for All total and All cv .
All total , All mean , All stdv, All cv, All min , All max , All 10th ,…, All 90th Vegetation returns (return hag > 1 m) Units are meters for all metrics except for Veg total and Veg cv .

Intensity values (returns hag > 1 m) Units are watts for all metrics except for I cv .
I mean , I min , I max , I stdv , I cv

Lidar Data
Small footprint lidar data were acquired in late August 2008. The system was an Optech ALTM 3100 with an integrated Applanix DSS 4K × 4K DSS camera. The data have multiple returns with a sampling density of 5 pulses per square meter, with 4 or fewer returns per pulse. The scan angle was less than 15 degrees. Instrument vertical accuracy over bare ground is 15 cm, and horizontal accuracy is 0.5 m. The inverse distance weighted interpolation method was used to generate a digital elevation model (DEM) with the data classified as ground returns [34]. Vegetation returns were classified using a 1 m threshold because the instrument used to estimate LAI in situ was held at approximately 1 m above the ground. The metrics derived from the ground returns class (Gr) were: frequency (count) of returns and frequency (count) of pulses ( Table 1). The metrics derived from the all returns class (All) were: frequency (count), mean height, standard deviation, coefficient of variation, minimum, maximum, percentiles (10,20,25,40,50,75, and 90; as the height or value at which a percentage-10, 20, etc.-of the total number of returns are below that value), and frequency (count) of pulses [34,37]. The metrics derived from the vegetation returns class (Veg) were the same described for the all returns class with the addition of the mode. The distribution of intensity values (I) were described using the mean, minimum, maximum, standard deviation, and coefficient of variation. First, second, third and fourth returns were classified as such and divided by the total number of "vegetation returns" (R). The Laser Penetration Index (LPI) [38] was calculated per plot as the proportion of ground pulses to the total pulses. Density metrics (d) were calculated following [39], as the proportion of returns found on each of 10 sections equally divided within the range of heights of vegetation returns for each plot. Additionally, another set of metrics, crown density slices (Cd), was calculated using the mode value of vegetation returns. Ten 1-m sections of vegetation returns (5 above and 5 below the mode value) were classified and proportion of returns to the total number of returns, mean, standard deviation, and coefficient of variation were calculated ( Figure 2).

Figure 2. (a) Hypothetical representation of crown density slices derived from lidar
Veg mode value (height to live crown was not measured on the ground). Five 1-m sections above and below the mode were defined, and the descriptive statistics (i.e., frequency, mean, standard deviation, and coefficient of variation) from the returns within each section were obtained. See Table 1 for variable names and how they were calculated. (b) Crown density values for an upland hardwood plot.
Frequency of returns (count), calculated from each of the lidar data point classes, was used only to estimate other metrics, such as proportions of returns, but was not used in the development of the models (Table 1). Ground plots were overlaid on digital orthophotographs acquired at the same time as the lidar data. Sixteen plots partially encompassed roads or herbaceous areas. These plots were eliminated from the dataset.

GeoSAR Data
GeoSAR data were acquired in late summer 2008. The system recorded data from two microwave bands, X (VV, 9.7 GHz) with a 0.03 m wavelength and P (HH, 0.35 GHz) with a 0.85 m wavelength, in single passes. Postings from the X-band were 3 m; those from the P-band were 5 m. GeoSAR X-band interferometry yields a digital surface model (DSM) and P-band interferometry is used to create a digital elevation model (DEM). Previous research has used the difference between the DSM and DEM to create a canopy height model used to estimate forest biomass [20]. The provider (Fugro EarthData, Inc.) performed the preprocessing, including both the interferometry and generation of two orthorectified magnitude images: (1) the magnitude from bands X and P, expressed as the square root of the intensity values and (2) the sigma-0 (σ 0 ) or backscatter coefficient from all four looks (North, South, East, West), defined as the backscatter power per unit area on the ground. Analogous to those used with lidar-derived heights and intensities, GeoSAR metrics were developed using the following approach (see also Table 1):  In order to evaluate the vegetation height, the difference between X-band (mostly backscattered from the vegetation/canopy surface) and P-band (mostly from the ground and lower tree branches) interferometric heights was calculated. In addition, the X-band was divided by the P-band with the purpose of evaluating any other relationship between the two bands.  The high resolution DEM created from the lidar data was used to generate the heights above ground for the X and P bands.  No changes were made to the magnitude or the σ 0 bands.  The cell values from all the rasters created (10 in total) were extracted and the frequency, mean, standard deviation, coefficient of variation, minimum, maximum, and percentiles (10th to 90th) were calculated for all plots.

Statistical Analysis
A dataset of 81 plots was compiled for all lidar-derived, GeoSAR-derived and ground-truth metrics. However, after deleting plots for proximity to roads and for being outliers (but not influential), the number of plots was reduced to 61. Pearson correlation coefficients were used to evaluate relationships among lidar metrics, GeoSAR metrics and ground estimated LAI. Multiple regressions were used to fit the dataset. Best subset regression models were examined using the RSQUARE method for best subsets model identification [40]. This method generates a set of best models for each number of variables (1, 2,…,6, etc.). The criterion to choose the models with the best group of variables was a combination of several conditions, as follows:  Low residual mean square (RMSE).  Similarity between the adjusted coefficients of determination R 2 adj' and R 2 values. The R 2 adj' is a rescaling of R 2 by degrees of freedom, hence involves the ratio of mean squares instead of sum of squares.  Mallows' C p statistic values [41]. When the model is correct, the C p is close to the number of variables in the model.  Low values from two information criteria, the [42] Information Criterion (AIC) and [43] Bayesian Criterion (SBC). The AIC is known for its tendency to select larger subset sizes than the true model; hence the SBC was used for comparison, since it penalizes models with larger number of explanatory variables more heavily than AIC.
The best models chosen per each subset size (based on number of variables in the models) were evaluated for collinearity issues. Near-linear dependencies between the explanatory variables were evaluated using computational stability diagnostics. In order to make independent variables orthogonal to the intercept and therefore remove any collinearity that involves the intercept, independent variables were centered by subtracting their mean values [44,45]. The variance inflation factor (VIF) with a threshold of 10 was used to quantify how much the variance of an estimated regression coefficient was inflated. However, VIF neither detects multiple near-singularities nor identifies the source of singularities. Hence, the condition index (CI), as the square root of the ratio of the largest eigenvalue to the corresponding eigenvalue from the dataset matrix, was also evaluated for all variables within the models. Similar to VIF, the CI indicates weak dependencies when higher than 10 but lower than 30, and severe dependencies when higher than 30 [46].
Additional data to test the models were not available, thus cross-validation analysis was performed using the prediction sum of squares (PRESS), which is the sum of squares of the difference between each observation and its prediction when that observation was not used in the prediction equation [47]. The root mean square error from the cross validation analysis (CV-RMSE, also known as RMSE pred ) was then calculated as the square root of the ratio between the PRESS statistic and the number of observations. The CV-RMSE is an indicator of the predictive power of the model. The significance level used for all the statistical tests was α = 0.05 (p-value < 0.05). This p-value was used to evaluate if the variables included in the model were statistically significant as well. The squared semipartial correlation coefficients (SSCC) were calculated using partial sum of squares to determine the contribution from each variable to the models, while controlling the effects of other independent variables within the model. These coefficients represent the proportion of the variance of the dependent variable associated uniquely with the independent variable.
Although the statistical analyses applied to the dataset of 61 plots did not show the presence of outliers, three of these plots with measured low LAI values (1.34 to 1.43) could potentially be influencing the dependent vs. independent variable relationships. Therefore, best subset analyses were also applied to the dataset after removing these three observations (n = 58).

Summary Statistics from Ground Measurements and Lidar Metrics
The 61 plots were distributed within the different forest types as follows: 3 in bottomland hardwoods, 18 in upland hardwoods, 4 in mixed pine-hardwoods, 24 in loblolly pine, 6 in shortleaf pine, and 6 in Virginia pine. For all forest types, stand age ranged between 10 and 164 years.
Mean tree height ranged from 13 m to 16 m, and mean dbh from 13 cm to 24 cm. Mean leaf area index values estimated on the ground, for all forest types, were between 3.1 to 4.1 ( Table 2). For all groups of forest types, the mean number of lidar ground returns ranged between 222 and 555 returns/plot area (1,256.6 m 2 ), and for all returns (hag > 0.2 cm) from 4,343 to 5,278 returns/plot area. Mean lidar heights above ground were between 9.9 m to 13.2 m, with standard deviations ranging from 4.5 m to 6.8 m (Table 3).  Minimum heights were set to 0.2 m, and maximum values ranged from 25.3 m to 37.6 m. Intensity mean values from vegetation returns (hag > 1 m) were observed between 37 to 51 watts/plot. Standard deviations from the intensity values were over 20 watts/plot for all groups of plots. Laser penetration index (LPI) was lowest (0.003) for the pine-hardwoods and shortleaf pine group of plots, and highest (0.039) for the upland hardwood plots.
The mean number of cells per plot from the GeoSAR P-band was 49, and for the X-band was 138. Mean heights from the P-band ranged from 5.46 m to 10.48 m, while for the X-band they ranged from 10.84 m to 16.06 m (Table 4). Mean heights from the X-band were always higher than lidar returns, except for the upland hardwood plots. However, maximum values from the lidar returns were as much as 10 m higher than the maximum values from the X-band. P-band mean height values were high (up to 18 m) from the ground, which made the difference of X and P bands to be low, sometimes as low as half the mean height observed from the lidar returns. A comparison between lidar returns and GeoSAR heights for an upland hardwood plot can be visualized in Figure 3. The range of magnitude values from the P-bands was larger than from the X-band, as shown by the standard deviation. The vertical profile (distribution of heights vs. frequency) from lidar returns ( Figure 4) showed two peaks, one at the mode value (13 m) and a second one at lower height. The latter might be related to a well-defined understory stratum in the forest stands. Also, a graph was obtained from the distribution of GeoSAR X-band heights, showing two peaks. Although, the highest peak (corresponding to the overstory vegetation) was at a similar height than the peak from lidar, the lower peak (corresponding to the understory vegetation) was a few meters higher than the peak acquired from lidar ( Figure 4). This could be due to the level (height) at which the X-band penetrates to the forest, which unavoidably misses a large amount of vegetation present in the understory. Table 4. Means of GeoSAR cell values per forest type. P and X band heights were calculated by subtracting the values from a DEM created from the lidar returns (n = 61). Column annotation: X − P (X-band minus P-band), P mag (P-band magnitude values), X mag (X-band magnitude values), n (number of observations or plots), Stdv (standard deviation), and Max (maximum value).   Pearson correlation coefficients were summarized for the variables included in the best models (Table 5). Laser penetration index (LPI) had the highest correlation with LAI (−0.698), followed by All 10th (0.638) and X 50th (0.609). Also, d 2 (−0.347) and X cv (−0.485) were statistically significant. The 10th and 20th percentiles (height values) were the only percentiles of any type significantly correlated with LAI. It is important to mention that in the past, the relationship between LAI and LPI has been reported as linear [38] and as curvilinear (as the logarithmic transformation of LPI) [48]. In this research, we evaluated both relationships as variables in the models. The Pearson correlation values were similar, −0.689 for the curvilinear model vs. −0.698 for the linear approach. Also, the results from the subset analyses showed consistently similar or higher R 2 's values when using the linear relationship than when using the logarithmic transformation of LPI. This is probably due to a combination of factors, such as ecosystem type and range of LAI values. As a result, the models reported include the variable from the linear relationship only, which not only performed similarly to the curvilinear, but also makes the models easier to use and interpret. The best models from lidar metrics had R 2 values up to 0.69 with 4 variables in the model. Adding more variables increased the R 2 and resulted in no collinearity problems. However, there was always at least one variable not contributing significantly to the model. Hence, only models with 2 and 4-variables were reported (Table 6). Common variables in these models were LPI and All 10th , the increase in R 2 (from 0.58 to 0.69) was given by the d 10 and Cd-3 metrics. The largest contribution in both models was from the LPI (0.174 and 0.202), and in the 4-variable model the other three variables (All 10th , d 10 , and Cd-3) had a similar contribution (0.053, 0.064, and 0.059). Predicted values from the 4-variable model were plotted against the measured LAI ( Figure 5). The results from the best subset analyses for GeoSAR metrics showed that although the R 2 values increased when adding more variables to the model, the R 2 adj' did not, therefore only a 4-variable model with an R 2 of 0.52 is shown in Table 6. The variable that contributed the most was X 50th (0.127), followed by X cv (0.098), sn01xl cv (0.047), and Xmag stdv (0.035). All variables included in the lidar only and GeoSAR only models had a VIF and CI lower than 5.  Table 6. Best predictive models of LAI using lidar metrics only and GeoSAR metrics only, n = 61. The statistics R 2 adj' , CV-RMSE, SSCC, VIF, and CI are the adjusted coefficient of determination, the RMSE from the cross validation analysis, the squared semipartial correlation coefficient from partial sum of squares, the variance inflation factor and the condition index, respectively. For a description of the variable names refer to Table 1. All variables in the models were highly significant at a p-value < 0.001.  Table 7. Best predictive models of LAI using lidar metrics (including crown density slices) and GeoSAR metrics, n = 61. The statistics R 2 adj' , CV-RMSE, SSCC, VIF, and CI are the adjusted coefficient of determination, the RMSE from the cross validation analysis, the squared semipartial correlation coefficient from partial sum of squares, the variance inflation factor and the condition index, respectively. All variables in the models were highly significant at a p-value < 0.0001. For a description of the variable names refer to

Variable Selection and Modeling
The best-performing models from the best subsets regressions using the metrics from lidar and GeoSAR combined are reported in Table 7. The R 2 values ranged from 0.66 from a 2-variable model to 0.77 from a 6-variable model. The All 50th and X 50th variables were included in all models; the latter was the only variable from GeoSAR that was included. Other variables included in these models from lidar were LPI, d 2 , and two crown density metrics (Cd-1, and Cd-3 stdv ). The largest contributions (always higher than 0.1) were from the All 50th and X 50th variables. Between the 5 and 6-variable model, the R 2 and R 2 adj' increased and the RMSE decreased with an extra variable, but the CV-RMSE stayed the same. There were no collinearity issues flagged by the VIF and CI, which were under 5 for all variables. Predicted values from the 4-variable model and 6-variable model are shown in Figures 6 and 7 for comparison. The difference in R 2 values between these two is only 0.04, but the observations from the 6-variable model are distributed closer to the 1:1 line, suggesting a better fit. The best models obtained from the best subset regression analyses applied to the dataset without the low LAI plots (n = 58), consistently included the same variables as the best models obtained when using the dataset of 61 plots. The R 2 values were lower (0.1 lower) than the R 2 values observed when using the 61 plots (Figure 8), however, this reduction of the R 2 values can be attributed to the reduced number of plots representing the low levels of the LAI range. In addition, the fact that the best models included the exact same variables than the models from the 61 plots, and that the reduction of the R 2 values is only 0.1 confirms that such plots are not influential enough to drive the relationship in the models. Therefore, since the exclusion of these three plots did not affect the relationship of measured LAI with the lidar and GeoSAR metrics, most of the results reported in this research used the dataset with 61 plots. Crown density metrics were included in the best models using 5 or more variables. These were removed as independent variables, and the data re-analyzed. The results from these analyses are shown in Table 8. It was noticeable that in the absence of the crown metrics from lidar, more variables from GeoSAR were included in the models, to the point of obtaining R 2 and RMSE values comparable to the models in Table 7. The additional metrics from GeoSAR were Pmag stdv and Pmag max . The VIF values from these two models increased to 7.6 compared to the models with the crown metrics, due to the high correlation between Pmag stdv and Pmag max (0.931). Table 8. Best predictive models of LAI using lidar metrics (excluding crown density slices) and GeoSAR metrics, n = 61. The statistics R 2 adj' , CV-RMSE, SSCC, VIF, and CI are the adjusted coefficient of determination, the RMSE from the cross validation analysis, the squared semipartial correlation coefficient from partial sum of squares, the variance inflation factor and the condition index, respectively. All variables in the models were highly significant at a p-value < 0.0001. For a description of the variable names refer to Table 1

Discussion
The LAI range of values, among all plots, was large enough to develop a relationship with lidar metrics. There were few representatives (3) at the low range of LAI. These three plots were influential, and therefore, were not deleted from the dataset. It is important to highlight that even when the estimations from the LiCor LAI-2000 could be conservative and underestimating the actual values of LAI for the plots sampled [49][50][51], there was consistency in the estimates acquired and the amount of vegetation observed (especially, in the extremes of the LAI range). Therefore, the prediction models developed using these estimations can capture relative differences in LAI among the observed forest conditions.
In the past, models for LAI prediction in mixed hardwood and coniferous forests using only lidar data have reported R 2 values ranging from 0.8 to 0.9, using either very few plots (between 10 to 18) or small plot sizes (400 m 2 to 500 m 2 ) [8,9,12,38]. The results reported in this research, using 61 plots of 1,257 m 2 size, reveal an R 2 of 0.69 (CV-RMSE = 0.48) for lidar only models, and an increased R 2 value of 0.77 (CV-RMSE = 0.42) when using lidar and GeoSAR data together.
The high correlation of LPI with leaf area index was expected [7]. Laser penetration index, defined as the proportion of ground returns to the total number of pulses, is directly related to the amount of leaves and canopy thickness. The more open the canopy the more pulses reach the ground, and vice versa. This variable was included in the models that were developed with either lidar metrics alone or with the combination of lidar and GeoSAR metrics. There were two models where LPI was not included, in which the 50th percentile of lidar returns took its place.
Lidar return percentiles are height values calculated based on the vertical density of returns [39]. They describe the height of the vegetation density across the stand vertical profiles. In other words, such heights relate to the target heights on the ground, as more targets (i.e., branches, leaves, etc.) the laser encounters at certain height or section from the ground, more returns are obtained from that section of the stand. For example, the 50th percentile value means that 50% of the return heights are above or below that height. In addition, the 10th percentile was included in the lidar only models, this metric ranged from 0.40 m to 8.08 m, with a mean of 3.54 m for all 61 plots. At this height (of the vertical profile) in the measured forest stands, mostly understory was present, making this stratum an important contributor to the LAI value of the plot.
Similar to the 10th percentile of the lidar returns, the density metric d 10 [39], defined as the proportion of returns found at the top of the canopy with respect to the total number of returns from the vegetation, was included in the lidar metric models only. The top of the canopy is directly related to tree crowns, and hence LAI. Almost opposite to d 10 , the density metric d 2 was selected in the models using lidar and GeoSAR metrics together. This variable relates to the low section of the vertical profile of the stand.
Crown density slice metrics are descriptors of tree crowns, and metrics related with the proportions of returns and standard deviation of the return heights at 1 and 3 meters below the mode value were included in the models. These variables contributed as much as the density metrics. Interestingly, the combination of all returns percentiles, densities, and crown density metrics in the models managed to describe the vegetation at the top, medium, and low level of the vertical profile. For instance, d 10 , Cd-3, and All 10th were together in the 4-variable model for lidar metrics only.
The interferometric heights from the X-band, after being corrected by the DEM developed from lidar data, showed the largest correlations with LAI. The 50th percentile of the height values per plot was positively correlated with LAI. The coefficient of variation from all the height values within a plot correlated negatively, suggesting more variability among the height values in plots with low LAI values. In addition, the metrics of the layer generated from the difference between X-band and P-band (X-minus P-band), and the metrics from the P-band interferometry showed significant correlations with LAI but they were not included in the best models. Moreover, the coefficient of variation obtained from the values of one of the σ 0 layers contributed significantly to the model when only GeoSAR data were used.

Conclusions
The results from this study support previous success in estimating LAI in mixed forests using lidar metrics [11][12][13], R 2 values were 0.58 (CV-RMSE=0.53) and 0.69 (CV-RMSE=0.48) for two and four lidar metrics in the model, respectively. Our evaluation showed that the sole use of dual-band synthetic aperture radar to estimate LAI is not as promising as the sole use of lidar data, since the best model had an R 2 of 0.52 (CV-RMSE=0.58) by including four metrics in the model. However, the evaluation of the two technologies together showed an important synergistic gain in the explanation of LAI variability, the R 2 values increased up to 0.77 with a CV-RMSE of 0.42. The most important metric in the combined model was the 50th percentile of the X-band interferometric height from DBInSAR The set of plots used in this research, comprised a broad range of stand ages (10 to 164 year old), forest types (21 plots of hardwoods, 36 plots of pure pine, and 4 plots of pine-hardwood), and measured LAI values (1.3 to 4.9). Considering this variability, the models developed represent a robust and accurate way to estimate LAI in the temperate mixed forests of Virginia.
Ideally, the future inclusion of additional sampled plots covering a wider range of species associations and LAI values for this particular forest type would increase the robustness and accuracy of these models and result in a more trustworthy tool for the estimation and monitoring of leaf area in other states or regions. In addition, X-band interferometry is currently possible using spaceborne sensors, which shows clear utility for LAI estimation at landscape to regional scales.
At present, the growing hardwood utilization industry requires decision support tools that can accommodate a diverse set of management, planning, and policy-making strategies and goals. Leaf area index is a key variable for the estimation of wood production and carbon storage when using such tools. Consequently, robust and accurate models to remotely estimate this variable are essential. The results of our research demonstrate that lidar and DBInSAR data can be important factors in the development and improvement of such models, particularly when the datasets are used in tandem.