Discrimination of Canopy Structural Types in the Sierra Nevada Mountains in Central California

Abstract: Accurate information about ecosystem structure and biogeochemical properties is essential for better estimates of ecosystem functioning. Airborne LiDAR (light detection and ranging) is the most accurate way to retrieve canopy structure. However, accurately obtaining both biogeochemical traits and structure parameters requires concurrent measurements from imaging spectrometers and LiDARs. Our main objective was to evaluate the use of imaging spectroscopy (IS) to provide vegetation structural information. We developed models to estimate structural variables (i.e., biomass, height, vegetation heterogeneity and clumping) using IS data with a random forests model from three forest ecosystems (i.e., an oak-pine low-elevation savanna, a mixed conifer/broadleaf mid-elevation forest, and a high-elevation montane conifer forest) in the Sierra Nevada Mountains, California. We developed and tested general models to estimate the four structural variables with accuracies greater than 75% for the structurally and ecologically different forest sites, demonstrating their applicability to a diverse range of forest ecosystems. The model R² for each structural variable was lower in the mixed conifer/broadleaf forest than in either the low-elevation savanna or the montane conifer forest. We then used the structural variables we derived to discriminate site-specific, ecologically meaningful descriptions of canopy structural types (CSTs). Our CST results demonstrate how IS data can be used to create comprehensive and easily interpretable maps of forest structural types that capture their major structural features and trends across different vegetation types in the Sierra Nevada Mountains. The mixed conifer/broadleaf forest and montane conifer forest had the most complex structures, containing six and five CSTs, respectively. The identification of CSTs within a site allowed us to better identify the main drivers of structural variability in each ecosystem. CSTs in open savanna were driven mainly by differences in vegetation cover; in the mid-elevation mixed forest, by the combination of biomass and canopy height; and in the montane conifer forest, by vegetation heterogeneity and clumping.


Introduction
Terrestrial ecosystems regulate the exchange of energy and matter between the land and the atmosphere [1]. Trees fundamentally define the three-dimensional structural and energetic properties of forest ecosystems including the amount of light energy intercepted for photosynthesis, which is subsequently available for respiration, and evapotranspiration [2,3]. Furthermore, the structural and biogeochemical traits within ecosystems may promote complementarity in energy use, including higher

Study Site
Our study region is located on the western slopes of the Sierra Nevada Mountains of central California. We focus on three sub-regions (49.94 km², 24.39 km², and 134.02 km²) that coincide with the National Science Foundation (NSF) National Ecological Observatory Network (NEON) terrestrial sites in Domain 17. These sub-regions also overlap two sites, and lie near the third site, of the NSF Southern Sierra Critical Zone Observatory (CZO) (Figure 1). Our study areas are limited by the availability of LiDAR data acquired concurrently with IS data. The three sites represent forest types that are extensively distributed throughout the Sierra Nevada Mountains in California.
The lower elevation site is located in the San Joaquin Experimental Range (abbreviated as SJER; located at 37°5′ N, 119°43′ W) and covers an elevation range from 210 m to 520 m above sea level. The area is characterized by open woodland savanna dominated by Quercus douglasii, Quercus wislizeni and Pinus sabiniana, generally associated with shrub species of Ceanothus sp., Arctostaphylos sp. and Aesculus californica, as well as various annual grass species. Summers in this Mediterranean climate are hot and dry, with mean temperatures between 24 °C and 27 °C. Winters are cool and wet, with mean temperatures ranging from 4 °C to 10 °C. The annual precipitation is around 486 mm and is restricted to the extended winter period, from October/November to April/May [54], producing an annual drought of six or more months in the summer-autumn period.
The mid-elevation site is Soaproot Saddle (abbreviated as SOAP; located at 37°2′ N, 119°15′ W) in the Sierra National Forest, situated approximately 998 m to 1383 m above sea level. The site is dominated by evergreen and deciduous broadleaf trees and mixed conifer forest. The main conifer tree species are Pinus ponderosa, Calocedrus decurrens, and Pinus lambertiana, and the most common broadleaf tree species are the deciduous Quercus kelloggii and the evergreen Quercus chrysolepis. Forest density varies from dense to relatively open. Summers are hot and dry, and winters are mild and wet. The mean temperature varies from 5.5 °C to 18.0 °C, and the annual precipitation is around 805 mm per year [55].
The highest elevation site is the Teakettle Experimental Forest (abbreviated as TEAK; 36°58′ N, 119°1′ W), located within the Sierra National Forest at elevations from 2000 m to 2800 m above sea level and mainly characterized by montane old-growth conifer forest dominated by Abies concolor, Abies magnifica and Pinus jeffreyi. Forest mosaics range from patches of dense canopy to scattered trees or shrub-dominated areas, together with open rocky areas. Summers are cooler than at lower elevations but are also dry, and winters are cool and moist, with snow typically covering the ground from November to June. Most of the 1250 mm annual precipitation falls as snow between November and May [56].

Data and Methods
Figure 2 shows the methodology flowchart. Two remotely sensed datasets were used in this research: (1) small-footprint, discrete-return LiDAR (Optech Gemini) from the NSF NEON program and (2) IS data from the National Aeronautics and Space Administration's (NASA) Airborne Visible/Infrared Imaging Spectrometer (AVIRIS-classic). The former was used to produce the reference structural data, while the latter was used to train models that predict the structural variables measured by LiDAR. In addition, LiDAR metrics were used to validate the IS models because of the accuracy of the LiDAR data and the lack of field data for the four predicted structural variables. Previous studies have used LiDAR as reference data for validation [57]. We evaluated the accuracy of the LiDAR canopy height prediction by comparing it to studies in the literature [18,58-67], which reported coefficients of determination (R²) between 0.70 and 0.95 [61-67] in ecosystems similar to our study sites [67]. Several studies estimated the heights of conifers like those present in our study sites with errors of less than two meters [61,66,68-70], including in sites with species from mixed and broadleaf forests with different vegetation densities and tree species compositions [71,72]. To evaluate our LiDAR height data directly, we compared maximum LiDAR heights with field observations (see Supplementary Materials Figure S1), obtaining an agreement of R² = 0.80 and an RMSE of 2.38 m, which validates the use of the LiDAR data as reference data for this analysis.

The research took place in two phases (Figure 2). In the first phase, four structural variables were selected from LiDAR. We built general, non-site-specific models to estimate the four structural variables (i.e., biomass, canopy height, vegetation heterogeneity, and clumping) using optical metrics from AVIRIS-classic and using LiDAR data as reference data to train, test and validate the model. In the second phase, combinations of the structural variables were used to define spatially explicit, site-dependent CSTs. The procedure was carried out in parallel using IS-estimated structural variables and LiDAR-derived structural variables. In a final step, CST results from IS-derived data and LiDAR-derived data were compared to validate the procedure.

2.2.1. Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Light Detection and Ranging (LiDAR) Preprocessing
AVIRIS-classic data were acquired on 12 June 2013 by the Jet Propulsion Laboratory (JPL). The sensor was flown onboard an ER-2 aircraft at approximately 20 km altitude, resulting in a pixel spatial resolution of 18 m. AVIRIS-classic has 224 spectral bands ranging from 370 nm to 2500 nm with a nominal bandwidth of 10 nm [73]. JPL provided radiometric, geometric and atmospheric correction to apparent surface reflectance. In addition, a topographic correction was applied following the model proposed by Soenen [74], based on the sun-canopy-sensor (SCS) correction proposed by Gu and Gillespie [75]. The noisy atmospheric water bands from 1342 nm to 1482 nm and from 1800 nm to 1966 nm were removed, leaving 193 bands for the analysis.
Small-footprint, discrete-return Optech Gemini LiDAR data from the NEON Airborne Observation Platform (AOP) were collected for the three study sites on 9-15 June 2013, coincident with the NASA IS data [76]. The areas covered by the LiDAR acquisition are shown in the red boxes in Figure 1. The data have a point density of ~19 pts/m². In addition, a bare-earth digital elevation model (DEM) at 1 m spatial resolution was created by NEON and was used to normalize the height of the returns. All LiDAR-derived structural metrics were calculated on an 18 m grid that was co-registered with the AVIRIS-classic pixels using 100 ground control points, with an RMSE of < 0.5 pixel. We removed pixels without green vegetation using a two-step masking process: (1) pixels with a Normalized Difference Vegetation Index (NDVI) < 0.2 were identified, and (2) of these, pixels that were considered non-photosynthetic vegetation (non-green plant material) when unmixed using multiple endmember spectral mixture analysis (MESMA) [77] were masked. The low NDVI threshold was chosen so that a common criterion would be valid for all three study sites. This was especially important at the SJER site because of the low canopy cover in the savanna ecosystem, together with the summer timing of the data acquisition (i.e., when the understory is dominated by dry grass). These steps masked rock outcrops and water bodies in the images but did not remove all the roads, because road pixels were partly covered by vegetation.

Optical and LiDAR Metrics
The optical metrics were based on Huesca [32] and comprise four groups.
(1) Sub-pixel cover fractions of green vegetation (representing different forest cover types), soil, non-photosynthetic vegetation (NPV), and shade were calculated using MESMA implemented in the Visualization and Image Processing for Environmental Research (VIPER) Tools package in the ENVI image analysis software [78]. MESMA was run with two constraints: the maximum allowable RMSE was 2.5%, and endmember fractions had to fall between −0.05 and 1.05. Spectra of NPV and soil endmembers were independently collected during the overflight using an Analytical Spectral Devices (ASD) full-range FieldSpec3 spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO, USA), while green vegetation endmembers were selected from the AVIRIS-classic images. Several endmember spectra were selected for each cover type. The final endmembers chosen had high count-based endmember selection (COB) [79] values and low minimum average spectral angle (MASA) [80] and endmember average RMSE (EAR) [81] values.
(2) Narrow-band indices representing three types of spectral information were used: those sensitive to the presence of photosynthetic pigments, namely NDVI [82], the Red Edge Normalized Difference Vegetation Index (NDVI705) [83,84], the Modified Red Edge Normalized Difference Vegetation Index (mNDVI705) [84,85], and the Enhanced Vegetation Index (EVI) [86]; those sensitive to water content, namely the Normalized Difference Water Index (NDWI) [87] and the Normalized Difference Infrared Index (NDII) [88]; and an index sensitive to dry plant matter content, the Cellulose Absorption Index (CAI) [89]. Despite the similarity of these indices, they provide different projections through the data space related to these processes, and each provides useful information, as described later in the Results section. The formulas of the narrow-band indices used in this research are shown in Supplementary Materials Table S1.
(3) Spectral canopy water absorption derivatives: derivative analysis was used to measure the wavelength position and magnitude of the NIR water absorption edge [90], abbreviated here as Wtr1EdgeWvl and Wtr1EdgeMag; the canopy water absorption features between 958-1073 nm and 1105-1168 nm, abbreviated as Wtr1AbAr and Wtr2AbAr, respectively; and the physically derived equivalent water thickness (EWT; the depth of water per pixel area) [90-92].
(4) Principal component analysis (PCA) [93] was performed using the full spectral range as well as the visible, near-infrared, and shortwave-infrared regions independently, in order to summarize significant information in all three regions of the spectrum. The PCA components that provided strong relationships with canopy structure [32] were included in this study.
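The normalized-difference indices in group (2) share one functional form, which can be sketched as follows. This is a minimal illustration, not the formulation in Supplementary Materials Table S1: the band centers used here (670, 705, 750, 800, 819, 857, 1241 and 1649 nm) are commonly cited choices and are assumptions, as are the helper names `nearest_band`, `normalized_difference` and `narrow_band_indices`.

```python
def nearest_band(reflectance, target_nm):
    """Return the reflectance of the band whose center is closest to target_nm."""
    center = min(reflectance, key=lambda wavelength: abs(wavelength - target_nm))
    return reflectance[center]


def normalized_difference(first, second):
    """Return (first - second) / (first + second), or None if the sum is zero."""
    total = first + second
    return None if total == 0 else (first - second) / total


def narrow_band_indices(reflectance):
    """Compute four indices from a {wavelength_nm: reflectance} mapping.

    Band centers are assumed values, not taken from the source.
    """
    def band(nm):
        return nearest_band(reflectance, nm)

    return {
        "NDVI": normalized_difference(band(800), band(670)),
        "NDVI705": normalized_difference(band(750), band(705)),
        "NDWI": normalized_difference(band(857), band(1241)),
        "NDII": normalized_difference(band(819), band(1649)),
    }


# Hypothetical green-vegetation spectrum (reflectance as a fraction).
spectrum = {670: 0.05, 705: 0.10, 750: 0.40, 800: 0.45,
            819: 0.45, 857: 0.46, 1241: 0.40, 1649: 0.25}
indices = narrow_band_indices(spectrum)
```

Selecting the nearest available band center keeps the sketch usable with the 193 retained AVIRIS-classic bands, whose centers rarely coincide exactly with the nominal index wavelengths.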
The LiDAR metrics selected were those in Huesca [32], including maximum height (Hmax), mean height (Hmean), median height (Hmedian) and standard deviation of canopy height (Hstd), which were derived from the height distribution of the LiDAR returns. To avoid outliers, the 99th percentile was used to assign the maximum height. The standard deviation of the canopy height model (CHMstd) was derived from a canopy height model at 1 m spatial resolution. The fractional cover (FC) [21,94] and the fractional cover from the first returns (FC_1ret) [21] represent the vertical projection of the vegetation canopy onto the ground, calculated as the ratio between vegetation and ground returns. FC is based on all returns, while FC_1ret considers only the first returns. Leaf area index (LAI) [95] is estimated from the canopy cover. The integral of the vegetation vertical profile (VVPint), at 0.5 m vertical resolution, represents the proportion of canopy returns for each height bin and is used to estimate biomass [96,97]. In addition, we computed the clumping index using the method of García [23] to estimate the degree of randomness of the spatial distribution of canopy materials. The term clumping/randomness refers to the spatial distribution of the vegetation within our 18 m pixels. Although the index is usually applied to the degree of randomness of leaves/needles or shoots, it is not restricted to a specific scale and can be applied to the degree of clumpiness or randomness of the crowns or, as here, of the vegetation within the pixel. The clumping index was not included in Huesca [32] because of distortion in some of the flight lines. This problem was fixed by the NEON team, and the corrected data were used in this analysis [76].
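The four height metrics can be illustrated with a short sketch for a single 18 m pixel. It assumes the input is a list of ground-normalized return heights in meters; the function names, the linear-interpolation percentile, and the use of the population standard deviation are assumptions made for illustration and are not specified in the source.

```python
import statistics


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def height_metrics(heights_m):
    """Summarize the normalized return heights that fall inside one pixel."""
    return {
        # The 99th percentile stands in for the maximum to limit outliers.
        "Hmax": percentile(heights_m, 99),
        "Hmean": statistics.fmean(heights_m),
        "Hmedian": statistics.median(heights_m),
        # Population standard deviation (an assumed choice).
        "Hstd": statistics.pstdev(heights_m),
    }


# Hypothetical return heights for one pixel.
metrics = height_metrics([2.0, 4.0, 6.0, 8.0, 10.0])
```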

LiDAR-Derived Structural Variables
The relationships among all 10 LiDAR metrics were measured by Pearson's correlation coefficient (R) (Supplementary Materials Table S2). Results showed strong inter-correlations among LiDAR metrics in the three study sites, ranging from −0.7 to 1.00. Thus, we grouped the highly correlated metrics into variables related to biomass (composed of VVPint, LAI, FC, and FC_1ret, with cross-correlations between 0.91 and 1.00), variables related to canopy height (composed of Hmean, Hmedian and Hmax, with cross-correlations between 0.79 and 1.00), variables related to vegetation heterogeneity or canopy roughness (composed of Hstd and CHMstd, with cross-correlations between 0.96 and 1.00), and the clumping index, which had the lowest correlation with the other structural variables and was treated as an independent variable.
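The screening of inter-correlated metrics can be sketched as below. This is an illustrative example only: the threshold of 0.79 is borrowed from the lowest within-group correlation reported above, the metric values are invented, and `pearson_r` and `correlated_pairs` are hypothetical helper names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(metrics, threshold=0.79):
    """List metric pairs whose correlation is at or above the threshold."""
    names = sorted(metrics)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(metrics[first], metrics[second])
            if r >= threshold:
                pairs.append((first, second, r))
    return pairs


# Invented per-pixel values for three metrics.
example = {
    "Hmean": [1.0, 2.0, 3.0, 4.0],
    "Hmax": [2.0, 4.0, 6.0, 8.5],
    "Clumping": [0.9, 0.2, 0.7, 0.4],
}
pairs = correlated_pairs(example)
```

In this toy input only the two height metrics pass the threshold, mirroring how the clumping index remained separate from the height, biomass and heterogeneity groups.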
In this study, one metric was chosen to represent each metric group; the selection was made from the mid-elevation site and was based on the metric that presented the highest correlations. Biomass was most strongly represented by the vertical distribution of canopy elements (VVPint). Hmean represented canopy height, and CHMstd represented vegetation heterogeneity. CHMstd is a continuous metric that represents vegetation heterogeneity based on canopy heights. The clumping index represents how the vegetation is distributed in space, that is, the degree of clumpiness or randomness of the vegetation. Of the four selected structural variables, biomass and height can be considered "vertical vegetation variables", and vegetation heterogeneity (CHMstd) and clumping can be considered "horizontal vegetation variables".

Modeling Structural Variables with Optical Metrics
We developed a single, cross-site model for each of the final four structural variables (i.e., biomass, vegetation height, vegetation heterogeneity and clumping), estimating them with the optical metrics defined in Section 2.2.2 (Optical and LiDAR Metrics) using the random forests model, a non-parametric, non-linear, tree-based regression model [98]. The number of decision trees was initially selected by running the procedure 1000 times and analyzing the regression error. The error stabilized at 200 trees; to set a conservative limit, we chose 500 trees to ensure that every observation was predicted multiple times. The random forests model draws random subsets of four optical metrics from the 24 possible metrics (Table 1) and selects the best fit among them. These combinations can be different in each tree; therefore, over the 500 trees, the relative strength of each metric on model performance was established.
Table 1 (partial). Optical and LiDAR metrics and their references.
Optical metrics: Normalized Difference Water Index (NDWI) [87]; Normalized Difference Infrared Index (NDII) [88]; Cellulose Absorption Index (CAI) [89]; Wavelength position of the NIR water absorption edge (Wtr1EdgeWvl) [90]; Magnitude of the NIR water absorption edge (Wtr1EdgeMag) [90]; Canopy water absorption feature between 958-1073 nm (Wtr1AbAr) [90]; Canopy water absorption feature between 1105-1168 nm (Wtr2AbAr) [90]; Equivalent Water Thickness (EWT) [90-92]; Principal components PC1, PC2, PC1_visible, PC2_visible, PC1_NIR, PC2_NIR, PC1_SWIR1 and PC1_SWIR2 [93].
LiDAR metrics: Fractional Cover from the first returns (FC_1ret) [18]; Leaf Area Index (LAI) [92]; Vegetation Vertical Profile integral (VVPint) [93,94]; Clumping Index [23].
To evaluate the reliability of each model, one set of data was used to develop the model (i.e., the training-testing dataset), and another, independent dataset was used to evaluate the model's reliability (i.e., the validation dataset). Training and validation data were selected to be sufficiently far apart to avoid spatial autocorrelation. The Moran's I correlogram was used to estimate the minimum distance that validation points needed to be from training points to avoid spatial autocorrelation. At each iteration, the training step used 1/3 of the independent training-testing data, and the remaining 2/3 was used for internal testing of the model. The training-testing dataset was selected from north to south over the western part of each site, and the validation dataset from north to south over the eastern part, with a separation of 2.5 km, 1.0 km, and 1.4 km for SJER, SOAP and TEAK, respectively, in order to capture the maximum variability of each structural variable while avoiding spatial autocorrelation.
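The west/east split with a separation buffer can be sketched as follows. The sketch assumes pixel centers are given as (easting, northing) pairs in meters and that the buffer is centered on the middle of the site's east-west extent; both are assumptions for illustration, and `split_west_east` is a hypothetical helper name.

```python
def split_west_east(pixels, gap_m):
    """Split pixel centers into western (training-testing) and eastern
    (validation) sets separated by at least gap_m meters in easting."""
    eastings = [easting for easting, _ in pixels]
    center = (min(eastings) + max(eastings)) / 2.0
    half_gap = gap_m / 2.0
    training = [p for p in pixels if p[0] <= center - half_gap]
    validation = [p for p in pixels if p[0] >= center + half_gap]
    return training, validation


# Hypothetical pixel centers along one row, using the 1.0 km SOAP separation.
row = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 0.0), (3000.0, 0.0), (4000.0, 0.0)]
training, validation = split_west_east(row, gap_m=1000.0)
```

Pixels that fall inside the buffer are assigned to neither set, which is what keeps the two datasets beyond the autocorrelation distance.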
The complex terrain and difficult access to areas with steep slopes, compounded by large gradients and clustered species distribution patterns, made it difficult to find an unbiased sampling plan for the training, testing, and validation datasets. We employed a dual-validation method to evaluate (1) the R² and (2) the spatial extendibility of the models. (1) Each model's accuracy and precision were evaluated by its R² and its error, using the RMSE and the mean percent standard error (MPSE). RMSE can be difficult to interpret because it depends on the range of the variable; therefore, we also report the range of each variable among other basic statistics (Supplementary Materials Table S3). (2) The spatial extendibility of the model was evaluated over independent validation data for each study site by assessing the relationships, measured by R² and RMSE, between the AVIRIS-classic estimated structural variables and the LiDAR data. By spatial extendibility we mean that a statistical model developed from the training-testing data was compared to an independent validation dataset, which was obtained from the image at locations separated by more than the distance determined by the spatial autocorrelation test at each site, but within its elevation zone community type. This validation evaluates the robustness of the model at each site. We also tested for spatial autocorrelation of the errors for each structural variable in each study site using Moran's I index.
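The two agreement measures used most often here can be written compactly. The sketch below assumes R² is computed as one minus the ratio of residual to total sum of squares; the source does not state which definition was used, so this is an assumption, and the numbers are invented.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted values."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed form)."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot


# Invented LiDAR-derived (reference) and IS-derived (predicted) heights in m.
lidar_height = [10.0, 20.0, 30.0, 40.0]
is_height = [12.0, 18.0, 33.0, 39.0]
error_m = rmse(lidar_height, is_height)
fit = r_squared(lidar_height, is_height)
```

Because RMSE carries the units of the variable, it is only comparable across variables when read against their ranges, which is why the ranges are reported alongside it.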
We present the ranking of the most important metrics for each structural variable, based on the increase in the random forests mean square error that results when a metric is removed.
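One common way to obtain such a ranking is permutation importance, in which the values of one metric are shuffled and the resulting increase in mean square error is recorded. The sketch below illustrates that variant under the assumption that it approximates the ranking described above; it is not necessarily the procedure used in this study, and the stand-in model and data are invented.

```python
import random


def mean_squared_error(observed, predicted):
    """Mean of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def permutation_importance(predict, rows, observed, n_metrics, seed=0):
    """Increase in MSE after shuffling each metric column in turn.

    predict: any callable mapping one row (list of metrics) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(observed, [predict(r) for r in rows])
    increases = []
    for j in range(n_metrics):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        mse = mean_squared_error(observed, [predict(r) for r in shuffled])
        increases.append(mse - baseline)
    return increases


# Stand-in model that depends only on the first metric.
def toy_model(row):
    return 2.0 * row[0]


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0]]
observed = [2.0, 4.0, 6.0, 8.0]
increases = permutation_importance(toy_model, rows, observed, n_metrics=2)
```

A metric the model ignores produces no increase in error when shuffled, so it falls to the bottom of the ranking.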

Canopy Structural Types Definition
We wished to understand how these four structural variables (i.e., biomass, vegetation height, vegetation heterogeneity and clumping) related to different forest types in the three study sites. While the generic model was developed to produce the four structural variables using data from all three study sites, we used these variables to develop a set of CSTs that were defined site by site. Our objective here is to develop a procedure to identify CSTs. This procedure was applied independently to the LiDAR-derived structural variables (i.e., those used as reference data) and to the IS-derived structural variables (i.e., those used as predicted structural variables). We then compared the CSTs derived from these two data sources. In addition, we wanted to evaluate how CSTs varied within and between sites that are characterized by differences in climate, topography, soils, community composition, community types, and frequency of disturbance.
CSTs are defined by the combination of vertical and horizontal structure variables. Vertical structure is represented by biomass and height, while horizontal structure is represented by vegetation heterogeneity and clumping. In a first step, each variable was divided into several classes. The number of classes is defined by the user and depends on the requirements of a specific research objective. For this study, we classified biomass and height into three classes that represented low, medium and high values. Vegetation heterogeneity was classified into two classes that represent homogeneous (i.e., low values) and heterogeneous (i.e., high values) pixels. Clumping was also classified into two classes that represent a clumped distribution of trees (i.e., low values) versus a random distribution of trees (i.e., high values).
An unsupervised IsoData classification [99] was used to classify biomass, height and vegetation heterogeneity. This procedure calculates evenly distributed class means within the dataset and then iteratively clusters the pixels into the nearest class. Clumping, however, was classified using a static threshold of 0.5, with values above 0.5 indicating regular to random distribution and values below 0.5 indicating clumped aggregation (where 0 = completely clumped and 1 = completely random) [23].
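The two classification rules can be sketched in a few lines. The clustering function below is a simplified, one-dimensional stand-in for IsoData that keeps only the two steps described above (evenly distributed initial means, then iterative nearest-mean assignment) and omits the class splitting and merging of the full algorithm. The function names and sample values are hypothetical, and assigning a value of exactly 0.5 to the clumped class is an assumption.

```python
def cluster_1d(values, n_classes, max_iter=100):
    """Simplified IsoData-style clustering of a single variable.

    Returns one class label per value (0 = lowest class).
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    # Evenly distributed initial class means across the data range.
    means = [lo + step * (k + 0.5) for k in range(n_classes)]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(n_classes), key=lambda k: abs(v - means[k]))
            for v in values
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(n_classes):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                means[k] = sum(members) / len(members)
    return labels


def classify_clumping(value, threshold=0.5):
    """Static threshold: above 0.5 is random, otherwise clumped."""
    return "random" if value > threshold else "clumped"


# Invented per-pixel values of one structural variable.
labels = cluster_1d([1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.0], n_classes=3)
```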
The combination of biomass and height classifications created a maximum of 9 vertical structure classes (3 biomass classes combined with 3 height classes). Each class was analyzed in terms of representativeness (i.e., number of pixels and spatial continuity) and the proximity among classes with the intent to reduce the number of combinations to an interpretable and ecologically meaningful set. The final selection of classes represents what we called "vertical canopy structural types" (VCSTs). The combination of vegetation heterogeneity and clumping resulted in a maximum of 4 horizontal structure classes (2 vegetation heterogeneity classes combined with 2 clumping classes). Each class was analyzed following the same criteria explained above. The final selection represents what we called "horizontal canopy structural types" (HCSTs). In a final step, VCSTs and HCSTs were combined. Each class was analyzed in terms of representativeness (i.e., number of pixels and spatial continuity) and the proximity among classes to define the final CSTs.
A class was selected as a potential VCST, HCST, or final CST when it represented more than 15% of the pixels of the study site and showed spatial continuity (or coherence). Classes representing less than 15% were merged with one of the selected classes based on the average class proximity and spatial coherence. One additional rule in the merging process was that a high class value for any structural variable cannot be combined with a low class value. This restriction means that a high-biomass pixel cannot be joined with a low-biomass pixel, nor high height with low height. The remaining classes that could not be grouped and represented less than 15% of the pixels of the study site were evaluated to decide whether they should be grouped together or considered as a different class.
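The representativeness test and the merging restriction can be sketched as below. The sketch assumes each pixel carries a tuple of class levels such as ("low", "medium") for biomass and height; it covers only the 15% rule and the high/low restriction, not the spatial continuity and class proximity criteria, and the helper names are hypothetical.

```python
from collections import Counter


def split_by_representativeness(pixel_classes, min_fraction=0.15):
    """Separate classes covering more than min_fraction of the pixels
    from the smaller classes that are candidates for merging."""
    counts = Counter(pixel_classes)
    total = len(pixel_classes)
    retained = {c for c, n in counts.items() if n / total > min_fraction}
    candidates = set(counts) - retained
    return retained, candidates


def can_merge(class_a, class_b):
    """A high level may never be combined with a low level of the same variable."""
    return all({a, b} != {"low", "high"} for a, b in zip(class_a, class_b))


# Invented (biomass, height) class combinations for ten pixels.
pixels = ([("low", "low")] * 5 + [("high", "high")] * 4
          + [("low", "medium")] * 1)
retained, candidates = split_by_representativeness(pixels)
```

Here the ("low", "medium") class covers 10% of the pixels, so it is a merge candidate; under the restriction it may join ("low", "low") but not ("high", "high").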
The next step was to ecologically interpret the CSTs for each study site using the U.S. Forest Service's vegetation map of California (CALVEG), updated in 2015 [100]. It provides information on vegetation cover types and attributes, such as crown size and height. CALVEG is the best source of spatial information about these sites, is produced at the same spatial scale and for the same time period, and provided independent information on the ecological status of these forests. Two attributes within CALVEG were used to conduct this analysis: (1) Vegetation Cover Type and (2) National Vegetation Classification Standard (NVCS). Within the latter, only the classes "closed tree canopy" and "open tree canopy" were considered. Each CST is described by the percentage of each class attribute. The "vegetation cover type" classes within each CST sum to 100%; however, the two selected classes from the NVCS classification are not required to sum to 100% because they do not represent all classes within this attribute. Table 2 presents the classes defined within the vegetation cover type attribute.

Modeling Structural Variables with Optical Metrics
Random forest regression models for all four structural variables achieved high goodness-of-fit, with R² values of 0.83, 0.80, 0.78 and 0.76, and RMSEs of 0.01, 1.08, 2.82 and 0.01 for biomass, canopy height, vegetation heterogeneity, and clumping, respectively (Table 3, Figures 3-6). The spatial patterns of the structural variables derived from LiDAR and from optical data closely match for all four variables at all three sites, even with the banding observed in the clumping images (Figure 6). Additional information about the correlations between the LiDAR variables and structural variables at each study site is provided in Supplementary Materials Figure S2. The relationships between the four structural variables and the most important optical metrics are shown in Supplementary Materials Table S4.
To evaluate the site-specific performance of the general models, we see they produced IS-derived maps that were similar to LiDAR-derived maps at each site, although some differences are observed (Table 3). By examining the R 2 correlations within the training-testing and validation datasets compared to the reference data (i.e., LiDAR-derived structural variables) for each of the four structural variables, R 2 values were greater than 0.70 for the training dataset for all variables except height in SJER and TEAK, and equal or above 0.58 in SOAP. Lowest accuracies were found using the validation dataset for each study site, which was especially evident in SOAP, but nontheless with low RMSE. However, considering the independent validation data for the three study sites together, the R 2 values were 081, 0.69, 0.72 and 0.70 for biomass, canopy height, vegetation heterogeneity and clumping respectively. Biomass was the structural variable with highest R 2 at all study sites (i.e., R 2 of 0.80, 0.66 and 0.8 in SJER, SOAP and TEAK respectively) and height was the structural variable with lower R 2 in SJER and SOAP (i.e., R 2 of 0.67 and 0.59, respectively) and clumping in TEAK (R 2 of 0.72).  (Figure 3a,b, Figure 4a This site has the most complex vegetation in terms of species number and types, but it is the smallest area by 2-5 times the other two sites, providing a potential explanation for its lower performance. Examining the spatial distribution of model errors at each site, we found no coherent spatial pattern for errors in SJER (Supplementary Materials, Figure S3a-d) except for clumping. This was also true for canopy height and heterogeneity at SOAP (Supplementary Materials, Figure S3f Figure S3d,h,l). Moran's I index showed significant positive spatial autocorrelation that is in agreement with the spatial patterns observed. 
The highest errors in biomass were found in the southwest and northwest parts of the TEAK study area, the highest errors in canopy height were also in a southwest to northeast band, and the highest errors in vegetation heterogeneity were in the southwest (Supplementary Materials, Figure S3j,k). Those areas correspond to extreme values of each variable (i.e., either very high or very low), and the errors are at least partially due to the regression approach used for estimating these structural variables.
Figures 3-6. See Supplementary Materials Table S5 for the ranking of the input metrics. Note regions are not to scale due to optimizing images and site size differences.

Canopy Structural Types (CSTs)
Results from the analysis of CSTs are shown in Figure 7 and Table 4. Each CST represents a different pattern of biomass, height, vegetation heterogeneity and clumping within the forest (Supplementary Materials Figure S4). The coherent spatial distribution of each CST shows that the class may consistently represent a forest type or condition. Table 5 shows the percentage of each vegetation cover type from CALVEG within each CST. Only the results from the IS-derived data are displayed in Table 5 for SJER and SOAP because the differences between LiDAR-derived and IS-derived data were negligible. For TEAK, both are displayed (i.e., LiDAR-derived and IS-derived data) because there is a disagreement in one of the CST classes. Supplementary Materials Figure S5 presents a red-green-blue (RGB) combination, equivalent to the fractional composition of each of the three most important structural variables at each site. We present this figure to demonstrate that the number of CSTs identified in our analysis (Figure 7) is consistent with the patterns seen in Supplementary Materials Figure S5 and that these results are spatially coherent. Supplementary Materials Tables S6 and S7 show the agreement between CSTs derived from LiDAR and IS data.
Figure 7. Spatial distribution of the canopy structural types (CSTs) in SJER using reference data, i.e., LiDAR-derived data (a), and IS-derived data (b); similarly in SOAP these are (c) and (d), and in TEAK (e) and (f). Note regions are not to scale due to optimizing images and site size differences.
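One plausible way to quantify the agreement between a LiDAR-derived and an IS-derived CST map is the per-class percentage of spatially coincident pixels. The sketch below illustrates that idea on flattened pixel lists; the exact computation behind Supplementary Table S7 is not given in this section, so the function name and the toy labels are assumptions.

```python
from collections import Counter


def coincidence_by_class(reference_map, predicted_map):
    """Percent of pixels in each reference class that receive the same class
    in the predicted map (both maps given as flattened label sequences)."""
    totals = Counter(reference_map)
    matches = Counter(r for r, p in zip(reference_map, predicted_map) if r == p)
    return {cls: 100.0 * matches[cls] / totals[cls] for cls in sorted(totals)}


# Hypothetical CST labels for eight co-located pixels.
lidar_cst = [1, 1, 1, 1, 2, 2, 3, 3]
is_cst = [1, 1, 1, 2, 2, 2, 3, 1]
print(coincidence_by_class(lidar_cst, is_cst))
```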

Structural Types Defined in the San Joaquin Experimental Forest
Of the nine potential classes from the combinations of biomass and canopy height (i.e., vertical structure), we found three different vertical canopy structural types (VCSTs) at SJER, representing low biomass and low height, medium biomass and medium height, and high biomass and tall height.
Within the four possible classes from the combinations of vegetation heterogeneity and clumping (i.e., horizontal structure), just two horizontal canopy structural types (HCSTs) were discriminated because most pixels were classified within the low clumping class. Thus, differences between these two combined classes are based mainly on vegetation heterogeneity. Finally, the six classes from combining the VCSTs and HCSTs were grouped into three unique CSTs (Figure 7a,b and Table 4).
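The class arithmetic above (three levels of biomass times three levels of height gives nine potential vertical classes, of which only some are occupied) can be illustrated with a short sketch. The thresholds and pixel values below are hypothetical and are not taken from the paper; the actual class limits and the rules for grouping classes are defined in the methods section.

```python
from bisect import bisect_right
from collections import Counter


def level(value, thresholds):
    """Index of the class a value falls in, given ascending class boundaries."""
    return bisect_right(thresholds, value)


def combined_classes(pixels, thresholds_a, thresholds_b):
    """Count pixels in each (class of variable a, class of variable b) pair."""
    return Counter((level(a, thresholds_a), level(b, thresholds_b))
                   for a, b in pixels)


# Hypothetical class boundaries: three levels per vertical variable.
BIOMASS_T = [0.2, 0.3]   # low / medium / high (biomass scaled 0-1)
HEIGHT_T = [4.5, 6.0]    # low / medium / tall (m)

# Hypothetical (biomass, canopy height) pairs for six pixels.
pixels = [(0.06, 1.9), (0.10, 2.5), (0.18, 4.3),
          (0.25, 5.0), (0.35, 7.2), (0.40, 8.0)]
vertical = combined_classes(pixels, BIOMASS_T, HEIGHT_T)
print(len(vertical), "of", 3 * 3, "possible vertical classes are occupied")
```

The same counting applies to the horizontal variables (two levels each, four potential classes) and to the final combination of VCSTs and HCSTs.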
Differences among CSTs at SJER were driven mainly by vertical structure. In general terms, low to medium biomass (i.e., mean biomass between 0.06 and 0.18) and canopy height (i.e., mean canopy height between 1.86 and 4.28 m) were associated with more homogeneous areas (i.e., mean vegetation heterogeneity between 0.62 and 1.49), while high biomass and tall canopy height (i.e., mean biomass and canopy height higher than 0.30 and 6 m respectively) characterized more heterogeneous areas (i.e., mean vegetation heterogeneity greater than 2.5). Following the procedure described in the methods section, the CSTs were independently determined using LiDAR-derived structural variables and IS-estimated structural variables (Figure 7a,b). Table 5 shows the percentage of each vegetation cover type from CALVEG within each CST. Only the results from the IS-derived data are displayed in Table 5 because the differences between LiDAR-derived and IS-derived data were negligible (see Table 4). The first CST is dominated by grass species (74%). The second CST is dominated by a mix of hardwood species and grass (43% hardwood and 53% grass). The third canopy structural type is mainly dominated by hardwood species (55% hardwood, 27% hardwood/conifer mixed).
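Percentages such as those quoted above amount to a cross-tabulation of CST labels against land-cover labels. The following sketch shows one way such a table could be built from co-located pixel labels; the function name and the sample labels are hypothetical and do not reproduce the CALVEG data.

```python
from collections import Counter, defaultdict


def cover_percent_by_cst(cst_labels, cover_labels):
    """Percentage of each cover type among the pixels of each CST."""
    counts = defaultdict(Counter)
    for cst, cover in zip(cst_labels, cover_labels):
        counts[cst][cover] += 1
    return {
        cst: {cover: 100.0 * n / sum(c.values()) for cover, n in c.items()}
        for cst, c in counts.items()
    }


# Hypothetical labels for six pixels.
cst = [1, 1, 1, 1, 2, 2]
cover = ["grass", "grass", "grass", "hardwood", "hardwood", "grass"]
print(cover_percent_by_cst(cst, cover))
```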

Canopy Structural Types Defined in Soaproot Saddle
At SOAP, analysis of the nine possible classes from vertical structure (i.e., the combinations of biomass and canopy height) resulted in five VCSTs. The four classes resulting from the horizontal structure (i.e., the combinations of vegetation heterogeneity and clumping) were sufficiently representative and distinct that they were not grouped. The 20 final classes from the combinations of VCSTs and HCSTs were grouped into six different CSTs (Figure 7c,d and Table 4).
The similarity between CSTs obtained from LiDAR-derived variables and from IS-derived variables is clear (Figure 7). CST1 and CST6 are characterized by low and high values of structural variables, respectively. The other four CSTs represent intermediate situations. CST4 and CST5 represent medium biomass (i.e., mean biomass between 0.57 and 0.70) and high vegetation heterogeneity (i.e., mean vegetation heterogeneity between 7.61 and 8.93) together with a regular to random distribution of the vegetation (i.e., mean clumping between 0.44 and 0.65). The difference between these CSTs is in canopy height: CST4 is characterized by medium canopy height (i.e., mean height around 12-14 m), whereas CST5 is dominated by taller canopies (i.e., mean height around 15-16 m). There is a difference in terms of biomass between CST2 and CST3 from LiDAR-derived and IS-derived data. CST2 and CST3 are represented by medium and high biomass, respectively, with LiDAR-derived data but by low and medium biomass, respectively, with IS-derived data. Both CSTs are characterized by homogeneous vegetation; however, while CST2 is associated with medium canopy height and clustered vegetation aggregations (i.e., mean canopy height around 11-12 m and mean clumping around 0.4), CST3 is characterized by low canopy height and by a regular to random distribution of the vegetation (i.e., mean canopy height around 8-9 m and mean clumping around 0.59-0.70). Table 5 shows the percentage of each CALVEG vegetation cover type within each CST together with the attributes of closed and sparse tree canopy from the national vegetation classification standards. Only the results from the IS-derived data are shown in Table 5 for SJER and SOAP because the differences between LiDAR-derived and IS-derived data were negligible (see Table 4). All CSTs are dominated by mixed forest (Table 5) according to CALVEG data, with the percentage increasing from CST1 to CST6 (from 49% in CST1 to 73% in CST6).
Hardwood species dominate the first three CSTs while conifer species dominate the last three CSTs. The first CST is associated with low values of the four structural variables and has a relatively high percentage of sparse tree canopy and non-vegetated areas (i.e., 29% and 17% respectively) in relation to the other CSTs.
The main difference between CST2 and CST3 is based on the vegetation distribution (i.e., clumping) (Table 5). According to CALVEG, CST3 has a higher percentage of hardwood forest, and it is mainly associated with more closed forests than CST2, which has a high percentage of open forest that is associated with a clumped distribution. The differences between CST4 and CST5 are in canopy height and clumping, which are medium and high, respectively, for CST4, and high and low, respectively, for CST5. These results can be explained in terms of vegetation distribution. Although CST5 has a higher percentage of coniferous forest, it is more open than CST4; thus CST5 presents a greater degree of clumping, as represented by lower values of the clumping index. CST6 is characterized by high values of all structural variables; this is in agreement with the CALVEG classes because these pixels are mainly associated with a high percentage of conifer forest (26%) with a closed tree canopy (90%).

Structural Types Defined in Teakettle Experimental Forest
At TEAK, analysis of the nine possible classes from the vertical structure resulted in three different VCSTs that represent low, medium and high biomass and height. The four HCSTs resulting from the horizontal structure were again sufficiently different to be representative and distinct; therefore, they were not grouped. Finally, the 12 resulting classes from the combination of VCSTs and HCSTs were grouped into five different CSTs (Figure 7e,f and Table 4).
CST1 is characterized by low values of the four structural variables. CST2 and CST3 are characterized by medium biomass, but the rest of the structural variables differ. While CST2 represents short, homogeneous vegetation with a regular to random distribution (i.e., mean canopy height between 5 and 8 m, mean vegetation heterogeneity between 3.93 and 5.12 and mean clumping higher than 0.5), CST3 is characterized by tall canopies that are heterogeneous in terms of tree heights and by clustered aggregations of forest trees (i.e., mean height taller than 14 m, mean vegetation heterogeneity greater than 7.0 and mean clumping less than 0.5). At this site, the LiDAR-derived and IS-derived data disagreed on clumping in CST4 and CST5: CST4 represents a random distribution in LiDAR-derived data but a clumped distribution in IS-derived data, and the opposite occurs in CST5. Table 5 shows the percentage of each CALVEG vegetation cover type within each CST together with the attributes of closed and sparse tree canopy from the national vegetation classification standards. At this site, the differences between LiDAR-derived and IS-derived variables were not negligible in one of the CSTs; thus Table 5 provides both results. TEAK is mainly dominated by coniferous forests, but the degree of that dominance differs among structural types. CST1 has low values of the four structural variables and is associated with a higher percentage of sparse low-stature vegetation (28% and 32% for LiDAR-derived and IS-derived data, respectively) and non-vegetated area (17% and 18% for LiDAR-derived and IS-derived data, respectively). CST2 has higher values of the structural variables than CST1 (i.e., medium biomass, low canopy height and vegetation heterogeneity, and high clumping) and is associated with the highest percentage of mixed forest (13% and 20% for LiDAR-derived and IS-derived data, respectively) and the lowest percentage of conifer forest (55% and 51% for LiDAR-derived and IS-derived data, respectively).
Pixels within CST3, CST4, and CST5 are more than 70% conifer forest. CST3 is associated with more open and sparse tree canopy (16% and 15% for LiDAR-derived and IS-derived data, respectively) than CST4 and CST5. This explains the lower values of biomass and canopy height in CST3. The main difference between CST4 and CST5 is clumping. Both CSTs are dominated by a high percentage of coniferous forest, but CST5 has a greater closed tree canopy percentage (70% and 84% for LiDAR-derived and IS-derived data, respectively) than CST4 (59% and 58% for LiDAR-derived and IS-derived data, respectively). The main disagreement between LiDAR-derived and IS-derived data is the proportion of mixed and conifer forests in CST4 and CST5. While the percentages of conifer and mixed forest are similar for both CSTs using IS-derived data (97% and 93% of conifer for CST4 and CST5, respectively, and 1% and 6% of mixed forest for CST4 and CST5, respectively), the conifer percentage is much higher and the mixed forest percentage is much lower for CST5 than for CST4 using LiDAR-derived data (77% and 94% of conifer for CST4 and CST5, respectively, and 14% and 5% of mixed forest for CST4 and CST5, respectively).

Discussion
Our results demonstrate how IS data can be used to create comprehensive and easily interpretable maps of canopy structural types across the major forest ecosystems of the Sierra Nevada Mountains, California. The models developed provide accurate estimates of four structural variables (i.e., biomass, canopy height, vegetation heterogeneity and clumping) across three study sites, demonstrating their applicability to a diverse range of ecological types, from an open savanna, to a dense conifer/broadleaf mixed forest, to a clumped distribution of trees in a high-elevation montane conifer forest, using a single, general model for each variable. We illustrated the spatial extendibility of each model using an independent validation dataset that (1) was tested to show it had low spatial autocorrelation and (2) was not used in the training phase of model development.
We developed and tested a general model to estimate structural variables using IS data for three structurally and ecologically different forested sites in the central Sierra Nevada Mts. Our study provides a methodology to discriminate ecologically meaningful "canopy structural types" (CSTs), further refining and testing the approach of Huesca [32] to map forest structural variability based on CSTs derived from IS data. The CSTs are site-specific and capture the major structural differences and trends within each ecosystem. The number of CSTs provides information about the canopy complexity of the ecosystems. Accurate maps of forest canopy structures are critical for many environmental monitoring applications and for understanding ecosystem change, from improving estimates of ecosystem processes and estimating carbon sequestration patterns to understanding habitat conditions for other organisms in the forest and providing better information for management decisions.
CST maps have multiple ecological applications; for example, the CSTs can be used in a species classifier along with AVIRIS-classic spectral data to improve classified estimates of spectrally similar vegetation classes. The spatial variability in these CST classes provides more information about landscape structure. For example, vegetation of low stature can represent herbaceous vegetation, while slightly taller vegetation can represent a shrub layer with or without a tree overstory. A forest type like "hardwood/conifer" may contain patches of herbaceous vegetation, shrubs and trees of different sizes (e.g., from regrowth after fire or logging). Thus, this type of contextual information provides additional details about the age since disturbance, carbon pools, vegetation types, and other characteristics that can inform interpretations about habitat condition or connectivity of habitat units. It can also be used with the IS information to constrain the physiochemical traits and their physiological processes as described in the introduction. Another application could use CSTs to evaluate past disturbance events by developing a spatially explicit map of successional stage classes.
LiDAR is the most accurate method available to retrieve vegetation structure, and this role is widely accepted. Passive remote sensing data have not been seen as providing satisfactory estimates of structural variables, such as biomass or tree height, because of the difficulty of accurately measuring 3D vegetation structure from relatively large pixels and wide spectral bands [101]. IS data may be better suited than multispectral data because the narrow spectral bands better resolve absorption and scattering across the spectrum. The results from this analysis show a new method to produce both physiological data and structural data measured concurrently from IS. This might be advantageous, even if it has more error than LiDAR data, because LiDAR is often not concurrently available with IS data. These data, perhaps periodically recalibrated against LiDAR, could allow seasonal to interannual changes in vegetation condition to be evaluated.
Huesca [32] analyzed the relationships between single spectral bands and several combined spectral metrics with structural variables derived from LiDAR at the SOAP site. Their results demonstrated that multiple optical metrics are needed to derive good estimates of structure and showed the importance of using the entire spectrum for resolving the variables. Their analysis of single spectral bands showed that biomass was negatively correlated with bands in the visible and shortwave infrared (SWIR) parts of the spectrum and positively correlated with bands in the NIR. For height and vegetation heterogeneity, no clear pattern of correlation emerged across the spectrum; however, they found that the best combination of optical bands had high correlations around 1100 nm, between the two water absorption features (970 nm and 1240 nm) in the near-infrared region. Most terrestrial LiDARs use a band around 1050 nm, so the high correlation with structure in the 1100 nm region is consistent with the wavelengths used for terrestrial LiDAR observations.
The correlation of individual metrics with structure also showed that the shade fraction is important to include for estimating vegetation structure; it is generally the first principal component in image analysis, represents variable scene illumination, and is often used as a proxy for albedo. These results were used as the starting point for this analysis. In this study we show the importance of the metrics found in Huesca [32] but also of other new metrics, such as the percentage of surface soil exposed in the images, especially in the savanna ecosystem, and the dry plant material detected by the CAI metric, which was especially valuable in the high-elevation conifer forest.
To define CSTs, we needed to accurately estimate the four key structural variables: biomass, height, vegetation heterogeneity and clumping. Of these, the most commonly estimated variable using optical remote-sensing data is biomass [102]. The high R² of the models developed in this study, specifically at TEAK for vegetation height, demonstrates the capability of IS to estimate structural variables that are usually obtained from LiDAR data. Multispectral data often have broad bands that blur the scattering and absorption wavelengths, do not cover the full wavelength region from 400 nm to 2500 nm, or have much larger pixel sizes. One additional reason for limited success is that field plots are frequently too few in number and/or limited to a small part of the study area. Using LiDAR as reference data greatly expanded the number and areal coverage of our training and validation data. Using 18 m pixels, we obtained more than 150,000, 75,000, and 400,000 observations for SJER, SOAP and TEAK, respectively. Of these, the models were trained on approximately 35,694 pixels from across the sites, tested on 71,388 and validated on 107,083 pixels. The lower overall R² at the mid-elevation SOAP site is a function of the smaller number of pixels that comprise this site in addition to its greater complexity.
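A quick check of the pixel counts quoted above shows that the three subsets stand in a ratio of roughly one-sixth, one-third and one-half of the sampled pixels. The snippet only performs this arithmetic on the reported numbers; the variable names are illustrative.

```python
# Pixel counts reported in the text for the three subsets.
train, test, validation = 35_694, 71_388, 107_083

total = train + test + validation
shares = {
    "train": train / total,
    "test": test / total,
    "validation": validation / total,
}
for name, share in shares.items():
    print(f"{name}: {share:.3f}")
```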
To be useful, maps of CSTs should capture the major structural features, differences and trends within each ecosystem. The CSTs defined in each map are ecosystem-specific because each ecosystem has different means and ranges of these structural variables. Based on the existing literature, we defined our CSTs using four widely used key structural variables (i.e., biomass, height, vegetation heterogeneity, and clumping) to ensure their utility for a range of ecological applications. The resulting CSTs were interpretable when compared to the existing CALVEG database for these sites. For instance, CSTs associated with low values of the structural variables were dominated by annual herbaceous vegetation in SJER, and they were associated with a high percentage of sparse tree canopy and non-vegetated areas in SOAP and TEAK. By contrast, CSTs characterized by high values of the structural variables were mainly associated with high percentages of tall conifer trees and a closed canopy cover fraction in SOAP and TEAK. Moreover, the interactions of the structural variables in CSTs allow us to identify structural variability across ecosystems.
Although CSTs are based on structure and not on species, more species may co-occur in more complex ecosystems, and thus in sites with higher numbers of CSTs. In general, our results show that the number of CSTs follows the number of tree species at each site, with SJER having the fewest, TEAK an intermediate number and SOAP the largest number of tree species. For our study sites, the savanna ecosystem at SJER had the least structural complexity with just three major CSTs: one dominated by grass species, one by a mix of hardwood species and grasses, and the third by hardwood species. At the other end of the spectrum is SOAP, a mixed conifer-hardwood forest that is significantly overstocked due to extended fire suppression and that in recent years has experienced various types of disturbance, from fire and bark beetle kill to management modifications. The combination of these factors resulted in the greatest structural complexity, with six unique CSTs. All structural types were dominated by characteristics of a mixed forest, but with different percentages of conifer and broadleaf trees, according to the CALVEG data. Three of the CSTs were dominated by hardwood species, while the other three were dominated by conifer species. TEAK also had high structural complexity with five different CSTs. We might have expected less structural variability due to the strong dominance of a true fir-pine conifer forest in this ecosystem. Our results showed that the dominance of conifer species was combined with variation in canopy height and vegetation complexity (as defined by vegetation heterogeneity and clumping), which together created multiple CSTs. These structural variables are related to biotic competition, fire suppression, and disturbance history.
Identification of different CSTs within a site allows us to better identify the main drivers of structural variability (i.e., the main sources of CST variation) in each ecosystem. In the low-elevation SJER savanna ecosystem, CSTs were mainly driven by differences in vegetation cover (i.e., biomass was the most relevant structural variable in CST discrimination), perhaps in part because of lower overall species diversity. This forest-savanna type forms the boundary between the grasslands of the Central Valley floor and the higher elevation forests. Its structure is constrained primarily by the Mediterranean climate, with long hot summers and extended periods of drought lasting six to eight months or more, and secondarily by the shallow soils with low nutrients and organic matter.
In the high-elevation montane conifer forest at TEAK, CSTs were driven mainly by vegetation complexity (i.e., vegetation heterogeneity and clumping were the most important structural variables). This forest is mostly constrained by physical conditions: steep terrain, large areas of granite bedrock exposed at or just below the surface, shallow soils with low organic matter, and a short growing season due to freezing conditions for six to eight or more months of the year.
We identified several sources of error that should be considered when evaluating the use of optical data for structure variables. These errors fall mainly into two broad groups: (1) measurement errors from LiDAR and IS data, and (2) errors from our modeling approach. Acquisition factors, such as altitude, scan angle, and point density, can affect LiDAR data; other factors, such as terrain, vegetation composition and canopy closure, may also introduce measurement errors in the data [103,104]. IS data are affected by the sun-view angles, flight direction, georegistration, and atmospheric calibration, as well as topographic factors, vegetation composition and canopy closure. Among the IS-derived structural variables, clumping was the least well predicted. This could be due to an impact of low LiDAR point density on this variable (see Supplementary Materials, Figure S3), or because the optical metrics do not capture vegetation clumping as well as the other variables. It is important to note that we did not observe the point density effects in the IS-derived clumping (see Figure 6b,d,f).
Other errors come from our modeling approach. In particular, the selection of LiDAR-derived reference variables and the selection of optical metrics each carry their own caveats. That being said, the metrics for this study were selected to balance model parsimony with completeness or complexity. Moreover, there are additional general limitations of random forest regressions, such as poor predictions of extreme values or of values outside the range of the training data. In addition, the training-testing and validation datasets were selected to avoid spatial autocorrelation, but they may not capture the full spatial variability of the four structural variables due to topography, soils, and other site conditions.
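The limitation with extreme values follows from how regression trees predict: each leaf returns the mean of the training targets it contains, so a prediction can never fall outside the range of the training targets, and a random forest, which averages many such trees, inherits the same bound. The sketch below demonstrates this with a deliberately simple one-dimensional tree written for illustration only; it is not the model used in this study, and the data are synthetic.

```python
def fit_tree(points, min_leaf=2):
    """Fit a 1-D regression tree. `points` is a list of (x, y) sorted by x.
    Returns a float (leaf mean) or a tuple (split, left_subtree, right_subtree)."""
    ys = [y for _, y in points]
    if len(points) < 2 * min_leaf:
        return sum(ys) / len(ys)  # leaf: mean of the training targets
    best_sse, best_i = None, None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        left, right = ys[:i], ys[i:]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - mean_l) ** 2 for y in left)
               + sum((y - mean_r) ** 2 for y in right))
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i
    split = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (split,
            fit_tree(points[:best_i], min_leaf),
            fit_tree(points[best_i:], min_leaf))


def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        split, left, right = tree
        tree = left if x <= split else right
    return tree


# Synthetic training data: y = 2x for x = 0..20, so targets span 0 to 40.
training = [(float(x), 2.0 * x) for x in range(21)]
tree = fit_tree(training)

# At x = 30 the true value would be 60, but the tree cannot exceed 40.
print(predict(tree, 30.0))
```

This is why pixels with very high or very low reference values tend to be pulled toward the center of the training distribution.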
Both measurement error and modeling error may have different impacts on each study site. For instance, in SJER the main driver of CSTs was differences in canopy density, indicating that errors in biomass are most relevant for this ecosystem. Alternatively, errors in vegetation heterogeneity and clumping are the most important in study sites such as TEAK, where these variables are the main drivers of CSTs. Accuracy in biomass and canopy height is vital in SOAP since the combination of both is the main driver of its CSTs. Discretization of continuous structural variables into CSTs for which ecologically meaningful thresholds cannot be defined is another source of error. Pixels with similar values on either side of a threshold will be classified into adjacent CSTs, and this error increases with the number of classes of each structural variable. Some differences between LiDAR-derived CSTs and IS-derived CSTs at TEAK could be due to this issue.
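The growth of discretization error with the number of classes can be illustrated with a toy calculation. The sketch assumes equal-width classes and a uniform spread of values over the variable's range, and it applies a fixed shift of 1% of the range to mimic a small estimation error; all of these choices are hypothetical simplifications rather than the thresholds used in the study.

```python
def class_index(value, n_classes, scale=1000):
    """Equal-width class of an integer value on a 0..scale range."""
    return min(value * n_classes // scale, n_classes - 1)


def changed_pixels(n_classes, shift=10, scale=1000):
    """Count the values in 0..scale-1 whose class changes when shifted."""
    return sum(
        class_index(v, n_classes, scale) != class_index(v + shift, n_classes, scale)
        for v in range(scale)
    )


# Values are expressed in thousandths of the range; the shift is 1% of it.
for k in (2, 4, 5):
    print(k, "classes:", changed_pixels(k), "of 1000 values change class")
```

Each additional threshold adds another band of values that a small error can push into the neighbouring class.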
Finally, it is important to emphasize that while the model for each structural variable is applicable to all three ecosystems studied, the CSTs are site-dependent. The methodology takes into account the variability within each study site and defines the CSTs based on the entire range of the structural variables within a specific study site. If we can discriminate more CSTs in one area than in another, it means that the site has greater structural complexity. Although we can analyze the main drivers of vegetation structure, we cannot extrapolate the CSTs from one site to another with different structural characteristics.
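A small sketch makes the site dependence concrete: when class limits are derived from each site's own range, the same canopy height can fall in the highest class at one site and the lowest class at another. Equal-width classes and the height ranges below are assumptions made purely for illustration; the actual class definitions are given in the methods section.

```python
def site_class(value, site_min, site_max, n_classes=3):
    """Equal-width class (0 = lowest) of a value within a site's own range."""
    if value <= site_min:
        return 0
    if value >= site_max:
        return n_classes - 1
    width = (site_max - site_min) / n_classes
    return min(int((value - site_min) / width), n_classes - 1)


# Hypothetical canopy-height ranges (m) for two contrasting sites.
ranges = {"savanna": (0.0, 9.0), "montane conifer": (0.0, 30.0)}
for site, (low, high) in ranges.items():
    print(site, "-> class", site_class(7.0, low, high))
```

Under these assumed ranges, a 7 m canopy is in the tallest class at the savanna site and the shortest class at the conifer site, which is why CST labels cannot be transferred between sites.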

Conclusions
Our results demonstrate how IS data can be used to create comprehensive and easily interpretable maps of canopy structural types across the ecosystems in the Sierra Nevada Mountains. The models developed accurately estimate four structural variables (i.e., biomass, canopy height, vegetation heterogeneity and clumping) across study sites, demonstrating their applicability to a diverse ecological range of cover types from an open savanna, to a conifer/broadleaf mixed forest to a montane conifer forest, using a single, general model for each variable.
Furthermore, our work provides a methodology to use these structural variables to discriminate ecologically meaningful "canopy structural types" (CSTs). The CSTs are site-specific and capture the major structural trends and differences within each ecosystem. The number of CSTs gives us information about the degree of canopy complexity of the ecosystems. Moreover, results from the CST analysis also allowed us to identify the main drivers that define the structural variability in each ecosystem. The next steps in this study will be to extend the developed models to a larger diversity of forest ecosystems and to analyze the applicability and portability of the models across scales. It could also be useful to evaluate how CSTs change over time to determine whether the main drivers are always the same for a specific type of ecosystem. Accurate maps of forest canopy structure are critical for many environmental monitoring applications. As we look to the future of forest management, understanding the spatial distribution of canopy structural types will play a key role in our ability to adapt and respond to change.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/11/9/1100/s1. Figure S1: Relationship between maximum tree height from LiDAR measurements and field data. Figure S2: Relationship between LiDAR and IS-derived data for the four structural variables (i.e., biomass (VVPint), canopy height (Hm), vegetation heterogeneity (CHMstd) and clumping) for the three study sites (SJER, SOAP and TEAK). The data look continuous because of the high point density. The different gray levels represent the ranges >50%, 51%-75%, 76%-90%, 91%-95% and >96% of all observations. Figure S3: Spatial distribution of errors for biomass (a), canopy height (b), vegetation heterogeneity (c) and clumping (d) in SJER. Panels (e), (f), (g), and (h) respectively for SOAP, and (i), (j), (k), and (l) respectively for TEAK. Black, white and grey shades represent underestimation, overestimation and no error, respectively. The coordinates in UTM 11N of the lower left corner of SJER, SOAP and TEAK are (254502, 4104126), (295848, 4098420), (317268, 4090230), respectively. The coordinates of the upper right corner of SJER, SOAP and TEAK are (260100, 4113000), (301572, 4102650), (327186, 4103676), respectively. Figure S4: Histogram of the four structural variables (i.e., biomass, canopy height, vegetation heterogeneity and clumping) of each CST for SJER, SOAP and TEAK. Biomass and clumping values range from 0 to 1, height from 0 to 50 m and vegetation heterogeneity from 0 to 22 m. Red, blue and green represent high, medium and low values, respectively. Figure S5: RGB composition based on scaled (0-1) ranges for the three most important canopy structural variables. Spatial/color patterns follow the CST classes for each site. Table S1: Formulas of the narrow-band indices used in this research. Table S2: Relationships among all pairs of LiDAR metrics using Pearson's R coefficient for the three study sites (SJER, SOAP and TEAK) (VVPint: vegetation vertical profile, LAI: leaf area index, FC: fractional cover, FC_1ret: fractional cover from the first returns, Hmax: maximum height, Hmean: mean height, Hmedian: median height, Hstd: standard deviation of height, CHMstd: standard deviation of canopy height model). Colored coefficients represent best correlations for biomass (red-bold), height (red-underlined) and vegetation heterogeneity (red-italic). Table S3: Statistics summary of training-testing (TT) and validation (V) datasets for SJER, SOAP and TEAK for each of the four structural variables. VVP and clumping scaled from 0 to 1, Hmean and CHMstd in meters. Table S4: Relationship measured by R² between the four structural variables (i.e., biomass (VVPint), canopy height (Hm), vegetation heterogeneity (CHMstd) and clumping) and the most important optical metrics according to the ranking of Random Forest. Table S5: The rank of the highest 10 optical metrics for prediction based on the increase of mean square error (parentheses) when that metric was removed from the structural variable. Table S6: Similarity between classes of VVPint, Hm, CHMstd and clumping of LiDAR-derived data and IS-derived data for SJER, SOAP and TEAK. Table S7: Percentage of spatial coincidence between CSTs defined with LiDAR-derived data and IS-derived data for SJER, SOAP and TEAK.
Author Contributions: M.H. and S.L.U. conceived the manuscript and did most of the writing; M.H. performed all analyses on LiDAR and IS data and drafted the manuscript; K.L.R. participated in the methodology definition in early stages; M.G. provided the methodology for the clumping index and pre-processing of LiDAR data. All authors read and commented on various versions of this manuscript.
Funding: This research was funded by the National Aeronautics and Space Administration (NASA), Terrestrial Ecology program in preparatory research for the proposed HyspIRI satellite (S.L. Ustin PI, grant number NNX12AP87G).