Monitoring Key Forest Structure Attributes across the Conterminous United States by Integrating GEDI LiDAR Measurements and VIIRS Data

: Accurate information on the global distribution and the three-dimensional (3D) structure of Earth’s forests is needed to assess forest biomass stocks and to project the future of the terrestrial Carbon sink. In spite of its importance, the 3D structure of forests continues to be the most crucial information gap in the observational archive. The Global Ecosystem Dynamics Investigation (GEDI) Light Detection and Ranging (LiDAR) sensor is providing an unprecedented near-global sampling of tropical and temperate forest structural properties. The integration of GEDI measurements with spatially-contiguous observations from polar orbiting optical satellite data therefore provides a unique opportunity to produce wall-to-wall maps of forests’ 3D structure. Here, we utilized Visible Infrared Imaging Radiometer Suite (VIIRS) annual metrics data to extrapolate GEDI-derived forest structure attributes into 1-km resolution contiguous maps of tree height (TH), canopy fraction cover (CFC), plant area index (PAI), and foliage height diversity (FHD) for the conterminous US (CONUS). The maps were validated using an independent subset of GEDI data. Validation results for TH ( r 2 = 0.8; RMSE = 3.35 m), CFC ( r 2 = 0.79; RMSE = 0.09), PAI ( r 2 = 0.76; RMSE = 0.41), and FHD ( r 2 = 0.83; RMSE = 0.25) demonstrated the robustness of VIIRS data for extrapolating GEDI measurements across the nation or even over larger areas. The methodology developed through this study may allow multi-decadal monitoring of changes in multiple forest structural attributes using consistent satellite observations acquired by orbiting and forthcoming VIIRS instruments.


Introduction
Human activities and natural processes are rapidly changing the global distribution and structure of forests [1]. The subsequent changes in forest biomass are thought to dominate changes in net terrestrial carbon flux, altering the global carbon cycle, affecting species diversity, and changing Earth's climate [1][2][3]. However, the total amount of carbon contained in forest's biomass and subsequently the future state of the terrestrial Carbon sink remain highly uncertain [4]. Constraining this uncertainty is paramount to safeguarding the future of our planet not least through informing sustainable local forest management policies, forest-based natural climate solutions, and international carbon emission reduction initiatives [5][6][7][8].
Spatially contiguous information on forest 3D structure is needed to better resolve forest biomass distribution and net terrestrial carbon flux [9][10][11][12][13]. Indeed, a number of quantitative and observable 3D forest structure characteristics (canopy cover, canopy height, leaf area index, and vertical foliage distribution) are essential for modeling the true magnitude of global forest biomass stocks as they exist today [14], modeling biomass ping methods developed for VIIRS should be applicable for MODIS data, thus providing the opportunity for operational multi-decadal monitoring of key forest structural attributes from the 2000 MODIS first images until the nominal life expectancy of the fourth VIIRS instrument expected to last through 2031 [41].

VIIRS Multitemporal Metrics
The VIIRS has 16 bands with a nadir resolution of 750 m (M-bands) and five bands with a 375 m nadir resolution (I-bands). Following previous MODIS and VIIRS-based global vegetation and land cover studies [22,24,39,[42][43][44], 9 of the 16 VIIRS M-bands were processed (Table 1) to create a set of multi-temporal spectral metrics to serve as consistent land surface phenology inputs for mapping 3-dimensional canopy structure. This section provides an overview of VIIRS data processing. More details have been described in previous studies [22,39,44]. The VIIRS swath level surface reflectance data for the year 2019 were obtained from the NOAA's Comprehensive Large Array-Data Stewardship System (CLASS). The swath data were then projected into a sinusoidal projection and mosaicked into global daily data following the method described in [39]. The spatial resolution of the global mosaic daily data was 926.65 m. These daily mosaics were then aggregated to create 5-day and 32-day composites using the self-adaptive compositing approach which employs spectral and temporal information to determine the general surface cover condition (e.g., vegetation/land, water, snow/ice) of a given pixel and adaptively select the most suitable compositing criterion for that pixel [44]. The monthly composited values were used to create the annual spectral and phenology metrics using the methods described in [19,39]. Compositing reduces temporal inconsistencies associated with clear-sky data availability, cloud shadow and snow cover contamination [45] and hence may allow more consistent mapping of vegetation structure [19,22,24].
In addition to the composite spectral data and phenology statistics, the VIIRS-derived annual surface type data [8,10,14], data from the VIIRS Global Land Surface Phenology product (VNP22Q2) [23], and the geographic position of each pixel encoded in coordinates of latitude (−90-90 • ) and longitude (−180-180 • ) were also incorporated into the multitemporal metrics. VNP22Q2 data were resampled from its native resolution of 463.325 m to the MODIS sinusoidal 926.65 m grid by averaging the values of the 4 spatially coincident 463.325 m pixels. The incorporated data from the VNP22Q2 product were the salient transition dates of the phenology cycle (start, peak, onset of senescence, and end of the plant growth season), the length of the plant growth cycle, the growth season integrated two band enhanced vegetation index (EVI2) [46], and the rates of change in EVI2 values during the vegetation growth and senescence phases. The inclusion of geographic position was based on the hypothesis that because large scale climate patterns exert strong control on the geographic distribution of vegetation biomes, geographic position serves as a good predictor of land cover and vegetation class at continental to global scales [43]. Table 2 provides a full list of the multi-temporal metrics used to produce the wall-to-wall maps of CFC, CH, PAI, and FHD. Table 2. Multi-temporal phenology metrics used to produce wall-to-wall maps of tree canopy structure. Band x refers to one of the bands listed in Table 2.

Metric Name Description
Annual

The GEDI data
GEDI detectors record the amount of laser energy reflected by earth 3D structures at different heights above the ground [10,29]. The recorded energy as a function of distance is the GEDI waveform. The GEDI instrument is comprised of 3 lasers, one of which is split into two beams ("coverage" beams), while the other two remain at full power ("power" beams). At any one time, 4 beams, each with footprint diameter of approximately 25 m, are incident on the ground. Each of these 4 beams is then dithered every other shot across track, to produce 8 tracks of data, separated by about 600 m across the flight track direction. For any one track, footprint centers are separated by 60 m along track. Over the 2-year mission, GEDI is expected to acquire waveform data for~10 billion cloud-free footprints [10].
The distribution of the laser energy within the GEDI~25 m footprints was used in [29,30] to determine the height, canopy cover, and vertical distribution of plant material. A detailed description of the GEDI L2B footprint canopy cover and vertical profile metrics can be found in [30]. In the GEDI L2B product, the first and last modes (i.e., the detected signal above noise) within the GEDI waveform have been associated with the highest and lowest perceived reflecting surfaces [30]. Adam et al. [35] compared GEDI L2B canopy height data calculated from the first and last modes of the GEDI waveform (rh100) to the maximum of ALS-derived crown canopy height. Potapov et al. [9], on the other hand, compared GEDI relative height metrics data (rh90, rh95, and rh100) to the 90th percentile distribution of ALS-based forest canopy height data. The comparisons in [9] produced similar precision values (r 2 =~0.7) but the metric rh90 underestimated canopy height (mean difference 2.3 m) and rh100 overestimated (mean difference −2.7 m) compared to rh95 (mean difference 0.7 m). However, vegetation growth from the time of acquisition of ALS reference data (2012-2017) to the time of acquisition of GEDI data (2019) might have influenced the comparisons in [9]. In this study, the relative height difference between the first and last modes (rh100) was used to represent canopy height (CH).
In addition to plant canopy height, the GEDI LiDAR waveform was used in [30] to derive canopy fraction cover (CFC), plant area index (PAI), and foliage height diversity (FHD). The GEDI-derived CFC is the percent of the ground covered by the vertical projection of canopy material (i.e., leaves, branches, and stems). CFC accounts for the gaps between and within the canopy [10,30] and is therefore different from other commonly used definitions such as the "crown cover" definition in [47] which does not account for within plant canopy gaps.
The GEDI-derived PAI is related to Leaf Area Index (LAI) defined as one half of the total leaf area per unit ground surface but differs from LAI in that it incorporates, in addition to leaves, all canopy structural elements [30]. Finally, the GEDI-derived FHD is calculated from the PAI vertical profile (Equation (1)) and is a measure of the complexity of canopy structure with higher FHD values often associated with multiple canopy layers [48].
where pi is the proportion of vertical LAI profile in the i th layer, summed over the number of layers. The GEDI products also include several quality flags that indicate the overall validity of a waveform for measuring surface structure. This section provides an overview of the quality flags used in this study. A more complete description of the quality flags is provided in [29,30]. The waveform "sensitivity" flag provides an estimate for the relative minimum percentage of the waveform energy return that needs to be present in the ground for it to be detected. The "sensitivity" flag therefore aids in identifying the ability of waveforms to penetrate canopies and detect the ground level. The "quality" flag is an aggregation of several individual quality assessment parameters and is intended to provide end users with the ability to easily remove erroneous and/or lower quality returns. The "degrade" flag indicates whether the recorded waveform contains saturated intensities. Saturated waveforms are flagged since they do not accurately represent the return photon flux and might degrade the precision of the ground elevation and canopy height measurements.

GEDI Data Processing
The GEDI level 02B product data granules [49] over North America were obtained for the two months of June and July 2019. Data from June and July were selected since for most of the conterminous US, except for some arid areas, tree canopy during the selected period is at or near full leaf out [23]. The obtained data contained 254,550,672 GEDI level 02B footprints of tree height, canopy cover, and vertical distribution of plant material. To select the highest quality data, the GEDI data were further filtered to remove the waveforms flagged as unsuitable for measuring canopy structure ("Quality flag" = 1), flagged as detector-saturated ("Degrade flag" = 1), or had sensitivity values <0.95. The sensitivity value of 0.95 was selected to reduce the proportion of false positive ground returns [50]. The application of these filters removed most of the data collected from the GEDI "coverage" beams, cloud-contaminated observations, and observations taken over water bodies [9,10]. The filtering resulted in a subset of 27,361,244 GEDI footprints or 10.75% of the data collected over the study area during the two months of June and July. The remaining GEDI level 02B data were then mapped into the VIIRS standard global 1-km Sinusoidal grid system. This resulted in 1,324,640 of collocated VIIRS-GEDI paired data records. The number of GEDI footprints falling within a collocated VIIRS grid cell ranged between 1 and 29 observations except for the VIIRS grid cells where the ascending and descending GEDI tracks intersected. In the latter case, the maximum number of GEDI observations falling within a VIIRS grid cell reached a maximum of 59 observations. Some 349,634 (or 26.4%) of the collocated VIIRS-GEDI grid cells contained 20 or more GEDI footprints and 580,744 (or 43.8%) of the collocated VIIRS-GEDI grid cells contained 10 or less GEDI footprints. The remaining VIIRS grid cells (394,471) contained 11 to 19 GEDI footprints.
The average values of GEDI products within the coincident VIIRS grid cells were calculated in order to investigate the relation between VIIRS and GEDI data products. The output are metrics of GEDI-VIIRS paired collections. However, only GEDI-VIIRS paired collections calculated from 20 or more GEDI observations were retained. The selection of the cutoff value of 20 was made to minimize the errors associated with under-sampling of GEDI observations within a VIIRS grid cell while maintaining a sufficiently large sample of GEDI-VIIRS paired collections that can capture variations in canopy structure across the study area. The resulting GEDI-VIIRS metrics contained 349,634 paired records relating VIIRS data to GEDI-derived CH, CFC, PAI, and FHD.
The GEDI-VIIRS paired collections were split into calibration and validation subsets. The calibration subset was drawn from a random sampling of half of the downloaded GEDI tracks. The validation subset were derived from the remaining GEDI tracks ( Figure 1). Selecting independent GEDI tracks for model calibration and validation, especially when the geographic position is included as an explanatory variable, should reduce the impacts of spatial autocorrelation on model validation results [43,51]. The resulting GEDI-VIIRS training subset contained 162,477 paired records, whereas the validation metrics contained 187,157 paired VIIRS-GEDI records.

Calibration and Validation of the Random Forest Regression Models
The paired GEDI-VIIRS calibration data were used to train four random forest regression models [52] that can predict GEDI-like CH, CFC, PAI, and FHD. In total, 159 VIIRSderived multi-temporal phenology metrics (Table 2) were used as input features (i.e., independent variables) in model construction.
Random forest regression models are an ensemble learning method. The method grows a user-defined number of regression trees and averages their predictions. In random forests, each tree in the ensemble (i.e., forest) is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. Furthermore, when splitting each node during the construction of a tree, the best split is found either from all input features or from a random subset of inputs defined by the user [52]. The size of the subset of inputs and other regression tree model hyper-parameters (e.g., number of trees in the forest, the maximum depth of the regression trees, the minimum number of training samples required to split a node, and the minimum number of training sample to form a leaf) that produce optimal model performance can be estimated by an exhaustive search over user specified parameter values or through a randomized sampling from a user specified distribution of parameters settings [53].
To optimize model performance, the hyper-parameter values of the random forest models were approximated through a randomized sampling of 300 parameter setting. The 300 hyper-parameter samples were obtained from the distributions outlined in Table 3. Accordingly, a total of 1200 random forest models were evaluated (300 for each of the four random forest regression models). A randomly selected subset of the calibration data (15% of the calibration dataset) were used to search for the optimal hyper-parameter values. The use of only 15% of calibration data to tune the hyper-parameters of the models was done to maintain a reasonable utilization of computing resources. The hyper-tuned random forest model parameters that better explained the out of bag sample variances of the GEDI-derived CH, CFC, PAI, and FHD (i.e., the models with the highest evaluated r 2 value) were retained ( Table 4). The retained hyper-parameters were subsequently used to calibrate four random forest regression models using the complete training dataset of 162,477 paired GEDI-VIIRS records. Model hyper-parameter and model calibration were performed using the Scikit-learn platform in Python [53]. Model validation was performed on the independent validation sample of 187,157 GEDI-VIIRS paired records derived from the GEDI granules not used in model calibration (see Section 2.3 for more details). The spatial distribution of the validation sample is shown in Figure 1. The four calibrated random forest regression models were then used to predict the four canopy structural elements CH, CFC, PAI, and FHD using VIIRS phenology metrics as model inputs. These VIIRS-derived CH, CFC, PAI, and FHD were compared to their corresponding GEDI-derived values in order to evaluate the performances of the four random forest regression models. The evaluation criteria were the ability of the models to explain the variances observed in the GEDI-derived canopy structural elements (R squared values as a measure of the precision of the models); and the accuracy of the four models measured by calculating the mean and median absolute errors as well as the root meansquared error (RMSE). We also investigated model bias by calculating the distribution of differences between model-derived canopy structural elements and their corresponding GEDI-derived values at multiple intervals of the variables' distribution.

Derivation and Assessment of Vegetation Structure Attributes for CONUS
The calibrated random forest regression models were applied to the 2019 multitemporal phenology metrics (Table 2) in order to produce wall-to-wall maps of canopy height, fraction canopy cover, plant area index, and foliage height diversity index for the conterminous US. In producing the wall-to-wall maps, water bodies were not processed since the GEDI data used in model calibration did not include observations over water. The 926.65-m grid size water mask was calculated from the global 231.66-m land/water product [54]. Any 926.65-m was flagged as a water pixel if any of the 16 spatially coincident pixels in the original product were classified as water pixels.
Summary statistics and histograms were used to assess variations in the spatial distribution of the VIIRS-derived forest attribute values between ecoregions. The ecoregion boundaries used were the Environmental Protection Agency (EPA) level 1 and level 3 ecoregion boundaries. EPA ecoregions are identified by analyzing the patterns and composition of biotic and abiotic phenomena that affect or reflect differences in ecosystem quality and integrity [55][56][57]. Spatial patterns of forest structure attributes were expected to conform to the boundaries of EPA ecoregions since the biotic and abiotic phenomena used to delineate the ecoregions include geology, landforms, soils, vegetation, climate, and land use; all factors known to influence potential natural vegetation, its type and structure.

Random Forest Model Validation
Results from the comparison of the VIIRS-derived canopy attributes with their corresponding GEDI-derived attributes are summarized in Table 5. The precision (degree to which variations in GEDI-derived values were explained by the models) of the FHD model was higher (r 2 = 0.83) than that for the other models with the PAI model having the lowest precision (r 2 = 0.76). When calculated over the entire data range, comparisons between the modeled and GEDI-derived values had negligible bias and offset values ( Figure 2). However, when the distribution of the differences between the VIIRS-derived and the GEDI-derived values were plotted for strata representing intervals drawn from the range of each canopy attribute, it was noticed that the canopy height model underestimated GEDI-derived canopy height data for tall (>27 m) forests and the Plant Area Index model underestimated the GEDIderived Plant Area Index at index values greater than 2 ( Figure 3). Underestimation of fraction tree cover in denser forests were less pronounced while the comparisons of FHD values showed good correspondence between the modeled and reference data for all strata (Figure 4).

Wall-to-Wall Maps of Canopy Structure for the Conterminous US
The four random forest regression models were used to extrapolate from GEDI products wall-to-wall maps of canopy height, canopy fraction cover, plant area index, and foliage height diversity for the conterminous US using the phenology metrics derived from the observations acquired by the VIIRS instrument onboard the NOAA-20 satellite for the year 2019. Overlaying the EPA ecoregions on these maps showed distinct variations in the spatial patterns of forest structure attributes between EPA level III ecoregions (Figures 5 and 6). This was expected since the factors that were used to delineate the ecoregion boundaries reflect the aggregate of geographic characteristics (both natural and anthropogenic) that determine potential vegetation, its type, horizontal distribution, and vertical profiles. The remainder of this section further examines the relation of EPA ecoregions to the VIIRS-derived forest attributes.      The VIIRS-modeled canopy height map (Figure 5a) demonstrated canopy height (CH) variations across natural forest and ecoregion types in the conterminous US. The tallest modeled tree heights were in the Klamath Mountains and the Cascades of the Northwestern Forested Mountains as well as in the coast range of the Marine West Coast Forests (Figure 5c). These ecoregions are habitats for very tall conifer species including the coast Douglas-fir (Pseudotsuga menziesii ssp. menziesii), the Western Red Cedar (Thuja plicata), the coast Redwood (Sequoia sempervirens), and Sitka Spruce (Picea sitchensis). The next modeled tallest trees were in the North Cascades, Eastern Cascade slopes and foothills, and in the northern reaches of the Sierra Nevada range. These regions are also home to the tall coniferous species of Douglas-fir, Western Red Cedar, and Jeffrey pine (Pinus jeffreyi). Similarly, modeled tree height results suggest that some forested regions in the Northern Rockies are also taller than the Eastern temperate forests and the Northern Hardwood Forests of the US (Figures 5a and 7a). Modeled canopy height values in the Eastern Temperate forests also show height variation related to natural forest types, clearly separating taller Appalachian and Blue Ridge Mountain forests from lower canopy height forests of the Piedmont Plateau, the Southeastern Plains, and the Southern Coastal Plains (Figure 5a). However, the forests of the Piedmont Plateau were, in general, taller than the northern hardwood forests of the Great Lakes (Figure 7b).
Results from the canopy fraction cover (CFC) model showed several ecoregions with expansive forests characterized by high percentage canopy cover (Figure 5b). For example, the landscapes of the Blue Ridge and the Central Appalachians ( Figure 5d) had a median VIIRS-derived percentage canopy cover greater than 55%. In general, most of the natural forest types of the Eastern Temperate forests had a median VIIRS-derived canopy fraction cover exceeding that of the forests of the Northwestern Forested Mountains except for the sub ecoregions of the cascades and the Coast Range (Figure 7c). The latter sub-regions had median percentage cover values comparable to the Eastern Temperate Forests and the Northern hardwood forests.
VIIRS-modeled Plant Area Index (PAI) values were found to be higher in forests with tall trees and high percentage canopy cover (Figure 6a). Large areas of the Blue Ridge and the Central Appalachians had PAI values greater than 2.75. In comparison, fewer forested areas had PAI values greater than 2.75 in the South-Western Appalachians, the Western Allegheny Plateau, and in the Northwestern parts of the Interior Plateau (Figure 6c). Some patches of the tall forests in the Cascades, Coast Range, and to a lesser extent in the Klamath mountains ecoregions had VIIRS modeled PAI values between 2.0 and 2.75. Large areas with relatively high VIIRS-modeled PAI values were also found in Ouachita/Boston Mountains ecoregions, and Northern Appalachian/Maritime Highlands. Except for the Cascades and Coast range, most of the Northwestern Forested Mountains had lower PAI than the Northern Hardwood Forests and the Eastern Temperate Forests (Figure 7d).
The spatial arrangements of the VIIRS-modeled foliage height diversity (FHD) values showed different spatial patterns from those for PAI, CFC, and CH (Figure 6b). Compared to other ecoregions with similar canopy height, and canopy fraction cover, the FHD values over large areas of the Sierra Nevada range were surprisingly high (Figure 6d). Similarly, the relatively shorter and sparse coniferous forests of the Middle and Southern Rockies ecoregions had, on average, similar FHD values to the taller and denser canopies of Klamath Mountains. In general, median FHD values were highest in the Northwestern Forested Mountains followed by the Eastern temperate forest, and the lowest median FHD values were modeled for the Northern Hardwood Forests (Figure 7e). The distribution of very high FHD values were in the sub-regions of the Appalachians, Ouachita/Boston Mountains ecoregion, the Cascades, Klamath Mountains, and large swaths of the Sierra Nevada range (Figure 6b). The multiband color rendering of CH (red band), CFC (green band), and PAI (blue band) further illustrates how these three forest structural attributes vary across the CONUS (Figure 7a). The color intensity of the Red-Green-Blue bands in Figure 7a is linearly related to the spatial distribution of its corresponding forest structure attribute values. Compared to other forests in the CONUS, forests with relatively high CH, CFC, and PAI values are rendered in white tones such as in the Marine West Coast Forests, the Cascades, and Central Appalachians. Darker red/orange tones such as in the central and northern Rockies represent more open canopy with relatively tall trees. Darker green tones such as in the Piedmont Plateau represent dense canopy with relatively shorter trees. Black tones in Figure 7a are non-forested areas.

Discussion
After its nominal two-year lifetime, GEDI will have covered the majority of 1-km cells between 51.6 • N and S with two or more ground tracks [34]. However, cloud contamination, track spacing, clumping in the ISS orbital tracks, and removal of GEDI waveforms unsuitable for measuring canopy structure can result in large coverage gaps at the 1-km grid [31,58]. In this study, only 10.75% of the GEDI footprints were retained after filtering out waveforms flagged as "low quality" and waveforms with a sensitivity value <0.95. The estimation of forest structure attributes for the 1-km grid cells that have few or zero "high quality" GEDI waveforms can benefit from using statistical models that integrate GEDI-derived forest structure data with wall-to-wall satellite images [31,58]. Similarly, the GEDI L4B 1-km gridded science data product will use statistical models that utilize wall-to-wall satellite images to estimate above ground biomass in grid cells with less than two coincident GEDI ground tracks [10].
This study demonstrated the potential of using random forest regression models and VIIRS data to extrapolate the GEDI-derived forest structure attributes into wall-to-wall maps of CH, CFC, PAI, and FHD. It should be noted that the GEDI and VIIRS-derived PAI are different from True PAI and LAI since large-footprint LiDAR systems cannot directly measure leaf clumping or leaf angle distribution. End users are therefore recommended to apply conversions based on their best knowledge to convert GEDI's PAI to true PAI and LAI for running earth system models [30].
The accuracy of the modeled results in this study depended on several factors, notably the quality and availability of reference LiDAR data, the quality and consistency of the optical VIIRS data, and any limitations of the ability of the statistical models, to accurately predict the outputs without bias. One factor which could have affected the generalization ability of the models was the number of cloud-free "high-quality" GEDI observations used to calculate the mean values of the canopy structural attributes within the 1-km VIIRS grid cell. The number of GEDI observations ranged between 20 and 59. Especially in heterogeneous landscapes, under-sampling can result in high mean standard errors [59] and thus in inaccurate representation of canopy structural attributes at the 1-km resolution grid cell [34,43]. The availability of GEDI observations is expected to increase as soon as the data collected during the full leaf-out period in 2020 become publically available. Furthermore, the expected future availability of more GEDI data will increase the size of the GEDI-VIIRS training sample for some of the currently underrepresented surface types (e.g., human settlements and landscapes with frequent cloud cover). Therefore, the expected increase in GEDI data availability should increase the accuracy of model outputs.
In addition to GEDI data availability, other issues still exist in GEDI data that can affect model performance. Surface slopes, for instance, can affect tree height estimation by LiDAR instruments [60,61]. While the GEDI footprint size was selected to limit the vertical mixing of vegetation and ground signals caused by surface slope [10], comparisons of the GEDI height metric (RH95) with airborne laser scanning (ALS) data in [9] revealed that RH95 tends to overestimate canopy height in areas of complex topography and sparse tree cover. Additional corrections of the footprint-level data may therefore be required to reduce slopeinduced canopy height uncertainty [9]. Furthermore, residual geolocation errors of GEDI data were also found to affect the calibration and validation of the statistical models [9]. Currently available GEDI data from [62] has a horizontal geolocation accuracy between 10 and 20 meters [10] which was found to affect the performance of Landsat-based tree height models [9]. Future calibration process is expected to reduce the geolocation error down to 8 m horizontal [10]. Nonetheless, the impacts of surface slopes and residual geolocation errors on the results of this study should be less pronounced than the impacts surmised in studies utilizing higher spatial resolution optical data [9]. Aggregating LiDAR data to a 1-km grid cell size is expected to reduce model uncertainty due to spatial averaging of errors [5,59,[63][64][65][66]. For instance, the impacts of LiDAR positioning errors on biomass prediction errors [63,64] and on the prediction errors of tree canopy attributes [66] were all found to decrease exponentially with mapping resolution [59]. A theoretical analysis on how spatial aggregation can reduce spatial random noise and improve the accuracy and precision of predictive models is presented in [67].
In addition to inherent errors in GEDI data, validation results using the reserved sample of 187,157 VIIRS-GEDI observations showed an underestimation of canopy height for tall (>27m) forests and an underestimation of Plant Area Index at index values greater than 2.5. Other studies integrating LiDAR and optical data using regression forests (RF) or regression trees similarly reported an underestimation of tree height in tall forests [9,68]. This might be due to (1) the problematic behavior of the random forest regression models in situations where the training and prediction inputs differ in their range and/or distributions (i.e., covariate shift); (2) the low sampling rate in the training data of very tall forests; (3) the loss of sensitivity of optical data, especially in the visible spectrum, to increasing LAI/PAI values [26,27,69]; and (4) due to the previously discussed errors in the GEDI training data. Highly localized calibration of Landsat-based models of canopy height were found in [70] to improve model performance and reduce model biases at either edge of the data range. Substituting RF models with Deep Learning models might reduce model biases at the higher end of the distribution of canopy height and PAI values. Indeed, recent applications of deep learning (DL) regression models to retrieve continuous canopy structural attributes [68,71,72] showed promising results. DL models were found in [68] to outperform RF models in mapping canopy height in Northwestern China. Convolutional Neural Networks (CNNs) applied to Sentinel-2 data to estimate vegetation height were demonstrated to have better sensitivity to the taller canopy heights in [72] than RF models applied to Landsat data in [9] and to VIIRS-data in this study.
It should be stressed that while the resulting 1-km VIIRS-derived maps of forest attributes showed good agreements with the GEDI-validation data, the GEDI-VIIRS comparisons cannot be construed as accuracy assessments of the derived 1-km maps. These comparisons are a representation of the ability of the models to extrapolate GEDI-like forest structure attributes across the CONUS. A robust accuracy assessment should consider the errors in the GEDI-derived products. Results from the GEDI pre-launch calibration and validation studies suggested GEDI ability to measure subtle variations in the complex vertical canopy structure [50] at sub-meter vertical accuracy [10,30]. Several studies also demonstrated the added value of incorporating simulated GEDI data into models that map forest structure and biomass from space-born Interferometric Synthetic Aperture Radar data [31,58,73]. However, the accuracies of post launch GEDI data and products might differ from the accuracies of the ALS-simulated GEDI data due to, for example, geolocation inaccuracies, and spatiotemporal variations in atmospheric attenuation. A robust postlaunch validation is needed to assess the accuracy of GEDI science data products. A robust post-launch validation requires contemporaneous reference data (e.g., Terrestrial Laser Scanning and Airborne Laser Scanning data) that are collocated with on-orbit GEDI data, and encompasses a broad range of tropical and temperate vegetation types. The acquisition and processing of these data is a global collaborative effort that is being coordinated by the GEDI calibration and validation program. These reference data, once available, will be used by the GEDI team to validate GEDI performance against design requirements, assess the repeatability of the GEDI level 1 and 2 product variables, and verify their accuracy [10].
Other approaches are currently being tested to produce 1-km or finer scale wall-towall maps of 3D vegetation structure. These include a spatially interpolated GEDI-derived level 3 gridded canopy vegetation metrics [10], the fusion of GEDI with interferometric SAR [31,58,74,75], and the fusion of GEDI and ICESAT-2 data with moderate (10-30 m) resolution optical remote sensing data [9,68,72]. While Sentinel-1/2 and Landsat data were utilized to derive tree height and tree cover products at much finer spatial detail [9,68,72] than the VIIRS-derived products, they can be subject to data gaps in cloudy regions and in regions with sparse Landsat/Sentinel-2 data record. Compared to finer spatial resolution sensors, the VIIRS very high data acquisition rate reduces the effects of clouds and incomplete atmospheric correction on the temporal consistency of VIIRS data which should in turn reduce spatial and inter-annual fluctuations in modeled results that would otherwise be caused by atmospheric contamination. Furthermore, the generalization ability of parametric and non-parametric statistical models was found to be inversely related to spatial scale [59,64] as a results of the spatial aggregation of errors [5,[63][64][65] and because aggregation reduces the contamination of a pixel reflectance values from reflected light originating beyond the pixel's area (i.e., aggregation reduces the impacts of the sensor's point spread function or PSF) [76].
Improving the accuracy of the statistical models will enable a more precise analysis of inter-annual and multi-decadal forest dynamics. Robust models will enable the annual monitoring of changes in forest structural attributes. This is essential for monitoring post disturbance forest regrowth, forest degradation, and carbon emissions from land conversions. It is expected that more robust models can be achieved by (1) the increased availability of the GEDI data; (2) the planned improvements to the GEDI data and the quality of its products [9,10]; (3) and tuning the hyper-parameters and parameters of deep learning algorithms.

Conclusions
Accurate, multi-year, 3D vegetation structure data at spatial resolutions of 1-km or less are needed by the ecosystem science community to constrain the errors associated with modeling above ground biomass (AGBM), Net Ecosystem Exchange (NEE), and Carbon fluxes associated with forest disturbance events and post-disturbance forest recovery [1]. The higher the accuracy of the derived 3D vegetation structure, the more feasible it is to precisely model biomass and Carbon fluxes especially where the scale of mapping matches the important scales of vegetation dynamics and the underlying environmental gradients [1].
This study demonstrated the feasibility of integrating GEDI-derived tree canopy structure data with VIIRS-derived multi-temporal phenology metrics to produce contiguous maps of tree canopy height, canopy fraction cover, plant area index, and foliage height diversity. Unfortunately, the limited availability of reference data did not allow for a robust accuracy assessment that considers the propagation of the errors that resulted from using the modeled GEDI forest structure products. Nevertheless, if the GEDI-derived tree canopy data do provide a valid representation of canopy 3D structure as suggested by GEDI pre-launch validation studies, the VIIRS-derived maps produced in this study can assist in modeling the true magnitude of global forest biomass stocks and for prognosticating future trends in the strength of the land carbon sink.
Future endeavors will focus on increasing the robustness of the statistical models to enable multi-decadal monitoring of forest dynamics at global scale using the entire VIIRS-NPP and MODIS data records. Extending the methods presented in this study to global coverage may require training separate models for each biome or for each vegetation type as suggested in [9,72] in order to overcome the limitations of model overfitting when employing very large training datasets.
Furthermore, it will be worthwhile to evaluate the potential of developing global 3D vegetation structure data using the VIIRS 350m I bands with a precision that is sufficient to calculate about 80% accurate 500 m biomass maps as this would represent a significant advance forward and provide the most accurate baseline map of carbon stocks to date [10].
Funding: This study was supported by NOAA grant NA19NES4320002 (Cooperative Institute for Satellite Earth System Studies -CISESS) and NOAA grant NA14NES4320003 at the University of Maryland/ESSIC. Additional support for Khaldoun and Huang was provided by NASA through grant 80NSSC18K0833 and US Fish and Wildlife Service through grant F18AC00973.

Data Availability Statement:
Publicly available datasets were analyzed in this study. GEDI data were obtained from https://search.earthdata.nasa.gov/ and VIIRS data were obtained from NOAA's Comprehensive Large Array-data Stewardship System (https://www.avl.class.noaa.gov/).