Highly Local Model Calibration with a New GEDI LiDAR Asset on Google Earth Engine Reduces Landsat Forest Height Signal Saturation

: While Landsat has proved to be e ﬀ ective for monitoring many elements of forest condition and change, the platform has well-documented limitations in measuring forest structure, the vertical distribution of the canopy. This is important because structure determines several key ecosystem functions, including: carbon storage; habitat suitability; and timber volume. Canopy structure is directly measured by LiDAR, and it should be possible to train Landsat structure models at a highly local scale with the dense, global sample of full waveform LiDAR observations collected by NASA’s Global Ecosystem Dynamics Investigation (GEDI). Local models are expected to perform better because: (a) such models may take advantage of localized correlations between structure and canopy surface reﬂectance; and (b) to the extent that models revert to the mean of the calibration data due to a lack of discrimination, local models will revert to a more representative mean. We tested Landsat-based relative height predictions using a new GEDI asset on Google Earth Engine, described here. Mean prediction error declined by 23% and important prediction biases at the extremes of the range of canopy height dropped as model calibration became more local, minimizing forest structure signal saturation commonly associated with Landsat and other passive optical sensors. Our results suggest that Landsat-based maps of structural variables such as height and biomass may substantially beneﬁt from the kind of local calibration that GEDI’s dense sample of LiDAR data supports.


Introduction
The spatial, temporal, and spectral resolution of data collected by the Landsat platform has proved appropriate for a range of important ecosystem monitoring applications [1,2]. However, Landsat has been used to study forest structure variables with only moderate success. As a passive optical sensor, Landsat's reflectance signal is primarily determined by reflective properties of the top of the forest canopy; structural elements related to height are generally not directly sensed. The signal saturation that occurs when trees continue to grow below a closed canopy can limit differentiation of biomass levels [3][4][5] and other aspects of forest structure [4,6,7].
There is nevertheless ample motivation to develop strategies for using Landsat to map structure. No other space-based platform offers Landsat's potential for historical perspective related to forest height and structure. Relying upon consistent radiometric processing since the mid-1980s [8], several studies have calibrated a forest structure-related model in one part of a time series and applied that model to imagery from different years [9][10][11]. Landsat can also contribute, when combined with a sample of high-quality reference data, to statistical estimates of population parameters such as mean biomass [12][13][14].
While Landsat may not respond to sub-canopy forest structure directly, structure is often at least somewhat correlated with Landsat reflectances, particularly in the shortwave infrared range. Landsat reflectances, or metrics derived from these, such as local texture, may in some cases be sensitive to canopy traits involving shading or moisture content, which are themselves correlated with structural variables [15,16]. Structure models are most likely to be accurate when restricted to local areas or specific forest types. Zhao et al. [17] found that stratifying Landsat-based biomass models by topographic aspect increased the amount of variation explained. Foody et al. [18] found that accurate Landsat-based biomass models could be trained at the site level, but that transfer of those models to other sites substantially reduced performance. They speculated that the contribution of local forest structure and composition to the remotely sensed response differed in important ways across sites. Locally trained models may minimize this cross-site uncertainty. Additionally, in extreme cases where a model has no basis for differentiation and simply predicts the mean of the training data, error will likely be lower if training and prediction occur over a narrower ecological range.
Spaceborne LiDAR datasets have the potential to provide canopy height measurements at densities sufficient to train a different height model every few kilometers. Several studies have used waveforms from the GLAS sensor on the ICESat platform to greatly expand the amount of training data available for height or biomass models using Landsat or MODIS as independent variables [19][20][21]. NASA's Global Ecosystem Dynamics Investigation (GEDI) is a waveform LiDAR sensor mounted on the International Space Station, which overflies latitudes between 51.6 • N & S. GEDI will provide significantly more waveform LiDAR observations than GLAS, using an acquisition strategy designed to maximize coverage of the world's tropical and temperate forests [22].
The width of GEDI's observation footprint, approximately 25 m [23], roughly corresponds to Landsat's 30 m pixel size, allowing more straightforward footprint-to-pixel matching. We hypothesize that prediction error for canopy height, as a representative structure variable, will decline as Landsat-based models are fitted with GEDI measurements at increasingly local scales, such that longstanding signal saturation problems will be substantially diminished. Testing this hypothesis on a global scale requires a tool that allows flexible combination of the Landsat and GEDI data records. Accordingly, we introduce a raster-based GEDI asset on Google Earth Engine (GEE) [24], a cloud-based platform that provides both access to the entire Landsat archive (and other key datasets) and a powerful application programming interface. This publicly accessible GEDI asset, described below, will be maintained and updated throughout the mission.

GEDI Relative Height Data on Google Earth Engine
The GEDI GEE asset is constructed from a subset of the variables contained in the GEDI L2A Global Footprint-Level Elevation and Height Metrics Product [25], which is publicly available from the NASA Land Processes Distributed Active Archive Center (DAAC). This subset includes 12 identification and quality flag variables and 10 relative height (RH) values thought to be useful for forest structure modeling ( Table 1). The reader is directed to the DAAC for definitions and processing details for each variable. For each mission month, all footprints having a quality value of 1 and having a "leaf-on" label according to the phenology map used by GEDI are rasterized in a 25-m resolution, 22-band image collection available through GEE. This image collection is not an official NASA product, but instead represents a reformatted version of publicly available GEDI data that is designed to work seamlessly with other spatial data on GEE. For pixels containing more than one "quality" footprint in a given month, the rasterized image represents data from the footprint with the higher sensitivity value. Individual raster images were created over each UTM grid zone (6 • longitude × 8 • latitude; Figure 1). The most relevant variables for this paper are the fields showing relative height (rhNN, in meters), representing the height above the ground from below which a specified percentage of the laser's energy returns (e.g., rh50 is the midpoint of the height distribution of returned energy). The GEDI GEE asset will be updated as new data are released and the entire archive is reprocessed. Documentation on GEE will note any changes to asset specifications. For pixels containing more than one "quality" footprint in a given month, the rasterized image represents data from the footprint with the higher sensitivity value. Individual raster images were created over each UTM grid zone (6° longitude × 8° latitude; Figure 1). The most relevant variables for this paper are the fields showing relative height (rhNN, in meters), representing the height above the ground from below which a specified percentage of the laser's energy returns (e.g., rh50 is the midpoint of the height distribution of returned energy). The GEDI GEE asset will be updated as new data are released and the entire archive is reprocessed. Documentation on GEE will note any changes to asset specifications.

Determining the Effect of Highly Local Forest Height Model Calibration Using GEDI Waveforms
A global sample of GEDI waveforms was drawn to test the performance of canopy height models calibrated at different scales. The sampled population of waveforms included data acquired from 25 March to 30 September 2019, which was all of the data publicly available at the time of this analysis. Only waveforms with a quality flag (see quality in Table 1) of 1 and falling in a forested cover class according to the CGLS LC100 land cover map from the Copernicus Global Land Survey [26] were considered here. The sample consisted of 10,000 forested waveforms, distributed randomly across UTM grid zones in numbers proportional to grid zone share of global forest cover according to the LC100 map (Figure 1).
At each waveform in the sample, 100 neighboring waveforms were randomly selected at 3 different scales: within 3 km, within 30 km, and on the same continent. Waveforms without at least 100 neighbors at the 3 km scale were excluded from the comparison, leaving 7615 of what are hereafter called "focal sites" (Figure 2). The rh98 value for each focal site was predicted with independent variables from Landsat (described below) three times, using a Random Forests model [27] calibrated with 100 randomly selected footprints at each of the three scales. Thus, three different models were built at each of 7615 focal sites: one with rh98 predicted from 100 neighboring GEDI shots within 3 km, one with 100 neighbors within 30 km, and one with 100 GEDI shots randomly selected from the same continent.

Results and Discussion
GEDI was designed to provide a sample, not a census, of the structure of the world's temperate and tropical forests [22]. One of the mission's primary science deliverables is a 1 × 1 km grid of biomass estimates (with estimated uncertainty), created with a hybrid model-based mode of inference that explicitly considers GEDI's sampling design [32]. However, while GEDI's spatially discontinuous design allows for clear, transparent estimation of biomass and other forest structure variables without the production of high-resolution maps, there are nevertheless important applications for such maps.
For instance, Tyukavina et al. [33] created a global canopy height map, similar to the maps created here, for the purpose of creating spatial strata to be used in the estimation of deforestation impacts on carbon emissions. Jantz et al. [34] used carbon maps to identify corridors between protected areas. Hansen et al. [35] overlaid a map of structurally complex forests with a map of lower human pressure to identify forests of high conservation value. Good Landsat-based maps will also be needed by the GEDI mission itself. In 1 km grid cells where the number of clear footprints is insufficient for making a sample-based estimate, a process called Hierarchical Model-Based inference will be used as a contingency. This approach uses biomass predictions at nearby GEDI footprints to train a second level of models to predict biomass using a wall-to-wall dataset such as Landsat [13]. The second-level model could be trained with the kind of 3 km search radius used here, for example, and applied to Landsat data to create GEDI biomass estimates for 1 km cells where sample-based estimates are not possible.
The results of the experiment reported here suggest that GEDI's dense sample will partially mitigate well-known limitations due to Landsat forest structure signal saturation. The interquartile range of residual errors and the RMSE of predictions produced for the 7615 target cells declined as a function of decreasing size of the calibration window ( Table 2). The impact of more local model calibration was even more evident when considering bias alone. When models lack a basis for discriminating different levels of the dependent variable, minimization of error (through whatever function the model employs) usually dictates predictions near the mean of the training data. The resulting overprediction of the low end of forest structure metrics and the underprediction of the high end have been noted elsewhere with the use of Landsat [36]. This phenomenon can result in maps that do not show the extreme conditions of a forest population. Furthermore, it can violate assumptions of homoskedasticity made in the formal modes of inference mentioned above. This Predictions across the 7615 focal sites at each scale were compared to observed (and independent) focal site rh98 values. The effect of calibration scale was quantified by deriving both the root mean squared error (RMSE) and interquartile range of residual errors at each scale. Additionally, predictions were plotted in relation to observed values to assess systematic over-or under-prediction in regions along the range of rh98. Predictions were also evaluated by five global plant functional types, as mapped by Diaz et al. [28].
The Landsat predictor data were generated using harmonic functions [29] fitted to all cloud-free spectral values between 1999 and 2019 for pixels corresponding to each of the focal and neighbor sites. Specifically, these functions were fit with the ordinary least squares method and included terms that accommodate cyclical reflectance changes due to vegetation phenology. The function-fitting algorithm [29] features the same breakpoint generation function employed in the CCDC algorithm (Continuous Change Detection and Classification) [30]. Breaks generally indicate a land cover change [31], and the work reported here used only the harmonic function fit to imagery acquired since the date of the most recent breakpoint. Inputs to the Random Forest models included 60 independent variables: 8 parameters describing the harmonic fit for 6 reflectance bands (48 inputs); the RMSE of the fitted harmonic function for each band (6 inputs); and reflectance values for each of 6 Landsat bands predicted for 15 July 2019 by the harmonic functions. That date was arbitrary, but was near the mid-point of the GEDI data used in this analysis. No non-spectral variables were included in the models. Random Forest models used 500 decision trees, with one-third of the variables tried for each split.
It should be noted that rh98 does not equate to canopy top height. Differences between canopy height and rh98 may stem from: topography within the footprint, problems in correctly identifying ground elevation, and signal to noise uncertainty at the top of the canopy. The rh98 variable was selected here as a simple proxy for other forest structure variables.

Results and Discussion
GEDI was designed to provide a sample, not a census, of the structure of the world's temperate and tropical forests [22]. One of the mission's primary science deliverables is a 1 × 1 km grid of biomass estimates (with estimated uncertainty), created with a hybrid model-based mode of inference that explicitly considers GEDI's sampling design [32]. However, while GEDI's spatially discontinuous design allows for clear, transparent estimation of biomass and other forest structure variables without the production of high-resolution maps, there are nevertheless important applications for such maps.
For instance, Tyukavina et al. [33] created a global canopy height map, similar to the maps created here, for the purpose of creating spatial strata to be used in the estimation of deforestation impacts on carbon emissions. Jantz et al. [34] used carbon maps to identify corridors between protected areas. Hansen et al. [35] overlaid a map of structurally complex forests with a map of lower human pressure to identify forests of high conservation value. Good Landsat-based maps will also be needed by the GEDI mission itself. In 1 km grid cells where the number of clear footprints is insufficient for making a sample-based estimate, a process called Hierarchical Model-Based inference will be used as a contingency. This approach uses biomass predictions at nearby GEDI footprints to train a second level of models to predict biomass using a wall-to-wall dataset such as Landsat [13]. The second-level model could be trained with the kind of 3 km search radius used here, for example, and applied to Landsat data to create GEDI biomass estimates for 1 km cells where sample-based estimates are not possible.
The results of the experiment reported here suggest that GEDI's dense sample will partially mitigate well-known limitations due to Landsat forest structure signal saturation. The interquartile range of residual errors and the RMSE of predictions produced for the 7615 target cells declined as a function of decreasing size of the calibration window ( Table 2). The impact of more local model calibration was even more evident when considering bias alone. When models lack a basis for discriminating different levels of the dependent variable, minimization of error (through whatever function the model employs) usually dictates predictions near the mean of the training data. The resulting overprediction of the low end of forest structure metrics and the underprediction of the high end have been noted elsewhere with the use of Landsat [36]. This phenomenon can result in maps that do not show the extreme conditions of a forest population. Furthermore, it can violate assumptions of homoskedasticity made in the formal modes of inference mentioned above. This pattern, clearly visible when Landsat models were calibrated at a continental scale, was diminished in models calibrated at 30 km and especially with models calibrated at 3 km ( Figure 3).
Prediction uncertainty ( Table 2) was high relative to other space-based LiDAR-based height models (e.g., [21]). There are several possible reasons for this. Our objective was to assess the effect of local calibration on models of canopy height based on Landsat. Other independent variables such as forest type and topography may have improved model performance but were excluded as peripheral to this objective. Additionally, the sample used here was subject to less screening than the samples used in many other papers. For example, we omitted shots determined through GEDI's processing system to be contaminated by clouds, but we did not filter out shots on steep slopes, which is a step common to most published work with GLAS. Sensitivity of canopy height RMSE to even valid outliers was demonstrated by Simard et al. [37], who documented height RMSE going from 6.1 to 4.4 m with the exclusion of seven outliers. Our inclusion of sample locations in high-relief areas likely led to higher levels of assessed error, but it also led to an error metric more representative of global performance than one might produce with a more filtered dataset. valid outliers was demonstrated by Simard et al. [37], who documented height RMSE going from 6.1 to 4.4 m with the exclusion of seven outliers. Our inclusion of sample locations in high-relief areas likely led to higher levels of assessed error, but it also led to an error metric more representative of global performance than one might produce with a more filtered dataset.   Improvements in high-low bias were not equal across the sample. Predictions of rh98 in evergreen needleleaf forests (ENT; 16% of all focal footprints) were closer to unity (predicted = observed) than other forest types when using continental calibration data (Figure 4). Better continental-scale calibration in this forest type may be related to the simpler or perhaps more uniform canopy form of coniferous trees. Coniferous forests have been observed to exhibit better relationships between Landsat and biomass than deciduous forests [17]. The biggest improvement occurred in evergreen broadleaf tree ecosystem (EBT), which are represented by half of the focal site sample. Figure 4 demonstrates little relationship between rh98 and the Landsat predictors for EBT when considered at the continental scale, with predictions occurring in an undifferentiated band across the entire range of observed rh98.
Predictions of rh98 using a calibration range of 3 km, while not as tightly correlated with observed values as in other vegetation types, nevertheless represented a substantial gain in prediction accuracy, especially important because of the large extent of EBT. the range of ecological variability, thus simplifying the prediction task; and (2) ensuring that when models revert to prediction of the mean of the training data they at least revert to a more locally representative mean. This experiment randomly drew 100 neighbors to create separate pixel-level rh98 models at each scale of calibration. This ensured comparability across scales, but would not be operationally efficient for creating a global map. Extension to mapping applications would likely involve using a local tiling system, where all constituent clear footprints were used to train a tile-level model. The results presented here suggest that tiling should be as local as possible. Operational mapping may also include imagery from other sensors or stationary predictors related to topography or ownership class. Texture metrics applied to the Landsat-like Copernicus Sentinel-2 platform have also proven effective in training multispectral imagery from LiDAR returns [38].  The effectiveness of highly local calibration is likely due to some combination of: (1) restricting the range of ecological variability, thus simplifying the prediction task; and (2) ensuring that when models revert to prediction of the mean of the training data they at least revert to a more locally representative mean. This experiment randomly drew 100 neighbors to create separate pixel-level rh98 models at each scale of calibration. This ensured comparability across scales, but would not be operationally efficient for creating a global map. Extension to mapping applications would likely involve using a local tiling system, where all constituent clear footprints were used to train a tile-level model. The results presented here suggest that tiling should be as local as possible. Operational mapping may also include imagery from other sensors or stationary predictors related to topography or ownership class. Texture metrics applied to the Landsat-like Copernicus Sentinel-2 platform have also proven effective in training multispectral imagery from LiDAR returns [38].
It is noteworthy that only 7615 out of 10,000 sample locations had at least 100 neighbors within a 3 km range. Locations with insufficient neighbors might have occurred in isolated forest patches, such as on islands or in linear riparian areas, or they may simply have not been well sampled by the GEDI data collected prior to October, 2019 (representing approximately 25% of the measurements in GEDI's original 2-year period of operation). Coverage should increase in both cases as the mission continues, and the results here suggest that finer scale calibration supported by additional LiDAR measurements will result in important reductions in Landsat forest structure signal saturation.

Conclusions
Highly local calibration of Landsat-based models of canopy height reduced both overall error and the tendency to over-predict the high end of the range and under-predict the low end. This suggests that spaceborne LiDAR data has potential to reduce long-acknowledged structure-mapping limitations of passive optical imagery. Combination of GEDI and Landsat data was facilitated by a new GEE GEDI asset that is announced and described in this Technical Note. Similar uses of GEDI waveform data with other wall-to-wall data sources optimized to be responsive to forest structure-new and upcoming radar instruments, for example-should likewise be possible and fruitful using cloud-based archives. As more GEDI data are collected, it will become possible to use smaller local neighborhoods, which has the potential to further improve maps of forest height and biomass across the globe.