Estimating Time Since the Last Stand-Replacing Disturbance (TSD) from Spaceborne Simulated GEDI Data: A Feasibility Study

: Stand-level maps of past forest disturbances (expressed as time since disturbance, TSD) are needed to model forest ecosystem processes, but the conventional approaches based on remotely sensed satellite data can only extend as far back as the ﬁrst available satellite observations. Stand-level analysis of airborne LiDAR data has been demonstrated to accurately estimate long-term TSD (~100 years), but large-scale coverage of airborne LiDAR remains costly. NASA’s spaceborne LiDAR Global Ecosystem Dynamics Investigation (GEDI) instrument, launched in December 2018, is providing billions of measurements of tropical and temperate forest canopies around the globe. GEDI is a spatial sampling instrument and, as such, does not provide wall-to-wall data. GEDI’s lasers illuminate ground footprints, which are separated by ~600 m across-track and ~60 m along-track, so new approaches are needed to generate wall-to-wall maps from the discrete measurements. In this paper, we studied the feasibility of a data fusion approach between GEDI and Landsat for wall-to-wall mapping of TSD. We tested the methodology on a ~52,500-ha area located in central Idaho (USA), where an extensive record of stand-replacing disturbances is available, starting in 1870. GEDI data were simulated over the nominal two-year planned mission lifetime from airborne LiDAR data and used for TSD estimation using a random forest (RF) classiﬁer. Image segmentation was performed on Landsat-8 data, obtaining image-objects representing forest stands needed for the spatial extrapolation of estimated TSD from the discrete GEDI locations. We quantiﬁed the inﬂuence of (1) the forest stand map delineation, (2) the sample size of the training dataset, and (3) the number of GEDI footprints per stand on the accuracy of estimated TSD. The results show that GEDI-Landsat data fusion would allow for TSD estimation in stands covering ~95% of the study area, having the potential to reconstruct the long-term disturbance history of temperate even-aged forests with accuracy (median root mean square deviation = 22.14 years, median BIAS = 1.70 years, 60.13% of stands classiﬁed within 10 years of the reference disturbance date) comparable to the results obtained in the same study area with airborne LiDAR.

resolution, such as Landsat, are worth exploring. Landsat provides global coverage-ideal for vegetation landscape mapping-and has an adequate spatial resolution (i.e., 30 m) for stand-level analysis at regional scales. Moreover, Landsat data have been used for image segmentation in studies related to the estimation of stand structure attributes, such as height, biomass, and TSD [33,[40][41][42][43].
We hypothesized that the fusion of GEDI and Landsat data could be suitable to map TSD wallto-wall at the stand level, by implementing a GEOBIA approach to delineate the stands. However, the feasibility of the object-based data fusion approach would depend on the combined effects of the GEDI sampling grid, the delineation of the forest stands, and the size and spatial pattern of standlevel disturbances. These considerations are directly related to the number of GEDI footprints available to estimate TSD per stand. In this study, we used GEDI data simulated over the two-year programmed mission [44] to estimate TSD in forest stands delineated from Landsat data. The main goals were to assess whether (1) Landsat data can be used to delineate even-aged forest patches to generate wall-to-wall TSD maps, and (2) the GEDI sampling configuration is a limiting factor to map stand-level TSD.

Study Area
The study area is in the Nez Perce-Clearwater National Forest in Idaho, USA ( Figure 1). It is a temperate mixed-conifer forest covering ~52,500 ha of the Clear Creek, Selway River, and Elk Creek watersheds. The study area is mountainous, with elevation ranging from ~400 to 2000 m, and steep, with slopes often higher than 30%. Predominant tree species are Douglas fir (Pseudotsuga menziesii (Mirb.) Franco.) and grand fir (Abies grandis (Dougl. Ex D. Don) Lindl.), often accompanied by ponderosa pine (Pinus ponderosa C. Lawson) and western red cedar (Thuja plicata Donn ex D.Don). The landscape is largely characterized by the presence of even-aged stands resulting primarily from (1) stand-replacing wildfires that have disturbed the study area since the 19th century and (2) clearcut harvests during the latter half of the 20th century [45,46]. Other anthropogenic disturbances The landscape is largely characterized by the presence of even-aged stands resulting primarily from (1) stand-replacing wildfires that have disturbed the study area since the 19th century and (2) clearcut harvests during the latter half of the 20th century [45,46]. Other anthropogenic disturbances shaping the landscape of the study area include commercial thinning, seed tree, and shelterwood cuts [46].

LiDAR Data and GEDI Simulated Data
Small-footprint discrete return airborne LiDAR data were acquired in October 2009 and June-July 2012 on the Clear Creek watershed and the Selway River and Elk Creek watersheds, respectively. A Leica ALS60 instrument in multi-pulse mode and at a pulse rate of 69,400 Hz was used in the Clear Creek area and another at 88,000 Hz in the Selway area. The average pulse density was greater than 4 points/m 2 in both datasets.
Large-footprint LiDAR waveforms were simulated from the airborne LiDAR point clouds using the GEDI simulator, developed for the calibration of algorithms and assessment of GEDI mission accuracy [44,47]. The waveforms were simulated on 22-m-diameter footprints (i.e., the mean expected GEDI footprint diameter), the location of which is determined by the nominal ISS orbits, combined with the GEDI sampling configuration.
Two grids of simulated GEDI footprints were generated over the study area: 1. 66,177 simulated footprints; the simulation assumes that the GEDI instrument will be firing 60% of the time over the two-year programmed mission (henceforth, 'full grid'). 2.
28,602 simulated footprints; the simulation assumes that, besides the 60% firing rate, an additional data loss of approximately 50% of the footprints is expected due to snow and cloud cover (henceforth, 'cloud grid').
From the waveforms simulated for each footprint, 57 summary metrics were calculated (Table 1). There are several ground finding algorithms being tested by the GEDI mission. We mainly used a set of simulated waveform metrics where the ground is estimated from the center of gravity of the waveform intensity between the lowest two inflection points on a denoised and smoothed waveform. Default GEDI calibration database settings were used, with a smoothing width at 0.7 m and minimum feature width at 3 bins, and no noise was selected for the simulations. Table 1. List of the 57 derived metrics from simulated GEDI waveforms [44], calculated on each footprint of the two grids ( Figure 1).
rhInfl 0-100 Relative height (rh) metrics, ranging from 0%-100% and computed at 2% steps, using ground from inflection points (m) maxHalfCov Canopy cover (fraction) from double the energy beneath the lowest maximum ground infHalfCov Canopy cover (fraction) from double the energy beneath the inflection point ground Leading edge extent Leading edge extent (m), related to the slope [48] Trailing edge extent Trailing edge extent (m), related to canopy elevation [48] BlairSense Blair's sensitivity metric. Canopy cover at which this SNR would have 90% chance of detecting ground Topographic aspect has an important influence on microclimate gradients that constrain forest productivity and composition; therefore, the solar radiation aspect index 'SRAI' [49], defined as a linear transformation of the topographic aspect, was calculated for each GEDI simulated footprint. The SRAI ranges between 0 and 1 and relates to solar insolation: lower values represent cooler and moister areas that are north-northeast oriented, and higher values represent warmer and drier areas Remote Sens. 2020, 12, 3506 5 of 25 that are south-southwest oriented. SRAI was calculated on a 30-m raster grid (Equation (1)) and for each simulated GEDI footprint location, the SRAI value at the footprint center was extracted as: where α is the topographic aspect that was obtained from the digital terrain model (DTM) interpolated at a 30-m resolution from the ground returns of the airborne LiDAR point cloud using FUSION software [50]. Note that SRAI was only used for stratification and selection of the training dataset, i.e., it was not included as a predictor variable to estimate TSD (see Section 3.2).

Landsat Data
A cloud-free Landsat 8 Operational Land Imager (OLI) image (path 42, row 28) close in time (23 July 2013) to the airborne LiDAR surveys was downloaded from the USGS Earth Explorer (https://earthexplorer.usgs.gov). Landsat 8 was preferred over Landsat 5 or 7, both overlapping in time with the airborne LiDAR acquisitions, because of the poor radiometric resolution of the TM instrument on Landsat 5, and the failure in 2003 of the Scan Line Corrector (SLC) of the ETM+ sensor on Landsat 7.
The Landsat top of atmosphere reflectance multispectral data were pan-sharpened using the panchromatic band and the near neighbor diffusion (NNDiffuse) algorithm implemented in ENVI software [51]. Tasseled cap (TC) indices of brightness (TCB), greenness (TCG), and wetness (TCW) were then calculated using the coefficients published in Baig et al. [52] for the TC transformation based on Landsat 8 top of atmosphere reflectance. The three indices were rescaled between 0-100 applying a 2% linear stretch to be used as input data for image segmentation (see Section 2.4).

Landsat Image Segmentation
Landsat segmentation was performed using the multiresolution segmentation (MRS) algorithm [53] implemented in eCognition 9.1 software. The MRS segmentation output is adjusted through the specification and weighting of input data, and three parameters: scale (unitless, unbounded), compactness, and shape (both unitless, values defined between 0 and 1). The parameters control the spectral homogeneity, size, border smoothness, and compactness of the image-objects. TC indices were the input data for segmentation since they are more correlated to successional stages and recent disturbances (particularly the TCW) compared to other Landsat-derived indices [54][55][56]. Additionally, they have been previously used for the segmentation of forest patches in studies related to the estimation of forest structural parameters, such as biomass or TSD [33,40]. The three TC indices were included as input for segmentation and four combinations assigning different layer weights in the MRS algorithm were tested (one combination assigning equal weight to the three TC layers and another three combinations assigning double weight to one of the layers). For each combination of input data layers and weights, a set of 135 segmentations were obtained varying the values of the scale parameter (between 10 and 100 in increments of 2 units) and the compactness parameter (values of 0.1, 0.5, and 0.9). The shape parameter was kept constant at a low value (0.1) as recommended in Kavzoglu and Yildiz [57].
Subsequently, a two-stage object-based evaluation strategy was followed to identify an optimal segmentation [58]. Area-based dissimilarity metrics of oversegmentation and undersegmentation were used to identify the best combination of input layer weights [58,59]. Subsequently, measures of spatial autocorrelation [58,[60][61][62] were used to identify, out of the 135 segmentations generated, the one with the maximum degree of intra-segment homogeneity and inter-segment heterogeneity. The optimal segmentation was obtained, giving the TCB and TCG equal weights in the MRS segmentation algorithm, while TCW was given double weight, and setting compactness = 0.1, shape = 0.1, and scale parameter = 46. For a more detailed explanation of the selection workflow, which combines both unsupervised and supervised object-based evaluation methods, refer to Sanchez-Lopez et al. [58]. The segmentation resulted in 2098 image-objects. Of these, 39 were smaller than 2 ha, which is commonly used as a threshold between stand and patch forest management units in areas of the Pacific Northwest [46], and were therefore merged with the most spectrally similar neighboring image-object, applying the 'remove' eCognition algorithm. Figure 2 shows the resulting segmentation composed of 2059 image-objects.
Remote Sens. 2020, 12, x FOR PEER REVIEW  6 of 27 image-object, applying the 'remove' eCognition algorithm. Figure 2 shows the resulting segmentation composed of 2059 image-objects.

Airborne LiDAR Segmentation
An airborne LiDAR-based map of even-aged forest stands was used as a baseline. The airborne LiDAR datasets are described in Section 2.2 (for more details on the processing steps, see Sanchez-Lopez et al. [12]). The map of even-aged forest stands, described and validated in Sanchez-Lopez et al. [12,58], was generated through the segmentation of the gridded 95th percentile summary metric of the airborne LiDAR canopy height returns binned at a 30-m resolution, identified as the optimal LiDAR metric for the delineation of even-aged forest stands. The map defines 1470 forest stands with an average size of 35.6 ha and a median size of 29 ha ( Figure 3).

Airborne LiDAR Segmentation
An airborne LiDAR-based map of even-aged forest stands was used as a baseline. The airborne LiDAR datasets are described in Section 2.2 (for more details on the processing steps, see Sanchez-Lopez et al. [12]). The map of even-aged forest stands, described and validated in Sanchez-Lopez et al. [12,58], was generated through the segmentation of the gridded 95th percentile summary metric of the airborne LiDAR canopy height returns binned at a 30-m resolution, identified as the optimal LiDAR metric for the delineation of even-aged forest stands. The map defines 1470 forest stands with an average size of 35.6 ha and a median size of 29 ha ( Figure 3).

Ancillary Reference Data
Records of stand-replacing disturbances that occurred in the study area between 1870 and 2005 were available from two different data sources: (1) A dataset of burned area perimeters of historical wildfires that occurred from 1870 to 2000, photointerpreted using fire atlas data and historical aerial photographs [45]. The dataset includes a classification of fire severity classes of unburned, low, moderate, and high severity. We kept for

Ancillary Reference Data
Records of stand-replacing disturbances that occurred in the study area between 1870 and 2005 were available from two different data sources: (1) A dataset of burned area perimeters of historical wildfires that occurred from 1870 to 2000, photo-interpreted using fire atlas data and historical aerial photographs [45]. The dataset includes a classification of fire severity classes of unburned, low, moderate, and high severity. We kept for analysis the areas labeled as moderate and high severity, which equate to stand-replacing fire. The dataset includes 166 fires, the greater majority recorded prior to 1940 (157 fires, representing 99.95% of the area burned). (2) The FACTS (Forest ACtivity Tracking System) timber harvest dataset, which is maintained by the U.S. Forest Service and contains spatial and temporal records of planned and accomplished forest management activities, such as clearcuts. It consists of polygon features with embedded metadata that include the fiscal year in which the activity was done. The dataset contained harvest records since 1956 for the study area [46].
The two datasets were harmonized, merged, and manually refined to amend potential unreported post-disturbances (e.g., small areas with evident biomass removal and other post-logging activity, especially within fire perimeters) as described in Sanchez Lopez et al. [12]. TSD was assigned to each polygon, as the number of years between the stand-replacing disturbance and 2012, used as the reference year based on the acquisition dates of the airborne LiDAR (used to simulate the GEDI data) (Section 2.2).
To ensure that the reference dataset included stands of both categories, i.e., disturbed after 1870 and undisturbed since 1870, 20 additional undisturbed forest stands (i.e., not disturbed since before 1870) were digitized and visually interpreted from the 2011 National Agriculture Imagery Program (NAIP) digital imagery (1-m spatial resolution using the closest available date to the LiDAR data) and the airborne LiDAR point clouds. The stands were characterized by dense canopy closure and relatively tall height of the dominant cohort of the forest canopy [12]. The reference dataset covered 29,154 ha (55.5% of the study area) ( Figure 4). For each combination of GEDI footprint grid and segmentation-derived stand maps, a sample of forest stands of known TSD was extracted through stratified random sampling from the total population of candidate stands, defined as all the stands that (1) overlapped at least 75% of their area with the ancillary reference dataset ( Figure 4); and (2) enclosed at least one GEDI footprint.
Topographic and canopy height-related metrics were used as stratification variables, adapting the stand-level stratification defined in Sanchez-Lopez et al. [12]. LiDAR-assisted stratification for forest inventories is cost-effective to reduce the amount of data required for model calibration in comparison to random sampling [63,64]. Our GEDI-assisted stratification was used to ensure the representativeness of the training set and minimize the number of required stands with known TSD. In practice, the time of the disturbance may only be known for a small percentage of stands that comprise the landscape and that potentially can be used to locate training GEDI footprints. The dataset was compiled from records of historical burns digitized from aerial photographs [45] and digitized perimeters of the clearcut management units reported in the FACTs (Forest Service Activity Track System) harvest dataset [46]. Twenty forest stands undisturbed since before 1870 were delineated through visual interpretation of 1-m 2011 NAIP (National Agriculture Imagery Program) imagery and added to the dataset.
The topographic aspect-derived metric 'SRAI' was adopted as the first stratification variable, and the 98% relative height GEDI metric ('rhInfl98′, Table 1) was selected as a second stratification variable. For each candidate stand, the mean 'SRAI' and mean 'rhInfl98′ were calculated considering all the footprints belonging to the stand. Two 'SRAI' strata (mesic/xeric) of equal size were defined, using the median 'SRAI' of the candidate stands as a threshold. Within each 'SRAI' stratum, five 'rhInfl98′ strata were defined, corresponding to the quintiles of the 'rhInfl98′ distribution. As a result, The dataset was compiled from records of historical burns digitized from aerial photographs [45] and digitized perimeters of the clearcut management units reported in the FACTs (Forest Service Activity Track System) harvest dataset [46]. Twenty forest stands undisturbed since before 1870 were delineated through visual interpretation of 1-m 2011 NAIP (National Agriculture Imagery Program) imagery and added to the dataset.

GEDI Coverage Analysis
A preliminary assessment was conducted to establish whether the GEDI footprint grid was sufficiently dense, relative to the size of the forest stands, to allow for the implementation of an object-oriented data fusion approach. The simulated GEDI footprints were combined with the stand map obtained from image segmentation, to compute the distribution of the number of GEDI footprints per stand. A parameter of particular importance is the number of forest stands not intersected by any GEDI footprints, i.e., the stands where no direct TSD estimation from GEDI would be possible.
The analysis was performed by considering the six possible combinations of the two stand maps derived from image segmentation (i.e., the Landsat segmentation described in Section 2.4, and the LiDAR segmentation described in Section 2.5) and three subsets of the GEDI simulated footprints, namely (a) the 'full grid' defined in Section 2.2; (b) the subset of the 'full grid' obtained by selecting only the footprints that are fully enclosed in a single forest stand, i.e., omitting footprints on stand edges (henceforth 'full grid-within stand'); and (c) the subset of the 'cloud grid' obtained by selecting only the footprints that are fully enclosed in a single forest stand (henceforth 'cloud grid-within stand'). Both image segmentations were considered because, as discussed in Sections 1 and 2.4, it is reasonable to expect the Landsat segmentation to be a sub-optimal stand map. Therefore, the use of the optimal LiDAR segmentation, already validated in Sanchez-Lopez et al. [58], allows verification of whether any limitations of this study are to be attributed to the GEDI sampling, or to the use of Landsat data. Similarly, it is expected that eliminating from the subsequent steps of the analysis the footprints that straddle stand boundaries might increase the accuracy of the TSD estimation but at the cost of a reduced number of available measurements.

Estimation of Time Since Disturbance
The TSD estimation was performed by adapting to GEDI simulated waveforms the object-oriented methodology introduced by Sanchez-Lopez et al. [12] for airborne LiDAR data. TSD was estimated through (a) the selection of a training set of GEDI footprints belonging to forest stands with known TSD, used to train a random forest classifier; (b) the imputation of TSD to the entire set of GEDI footprints covering the study area; (c) the subsequent extrapolation to the stand level by using the stand polygons to calculate median TSD from the footprint-level estimations; and finally (d) accuracy assessment comparing the estimated TSD to the ancillary reference dataset.
The TSD estimation was performed independently using the six combinations of segmentation-derived stand maps (i.e., Landsat and LiDAR segmentation) and GEDI footprint grids defined above (i.e., 'full grid', 'full grid-within stand', and 'cloud grid-within stand'). The six resulting scenarios ranged from a conservative scenario in the study area (i.e., 'cloud grid-within stand' for the footprint-level imputation, extrapolated to the sub-optimal Landsat-derived stand map) to the baseline representing ideal conditions (i.e., 'full grid' of GEDI footprints, extrapolated to the optimal LiDAR-derived stand map).
For each of the six combinations, a sensitivity analysis was performed, to determine the sensitivity of the accuracy metrics to the size of the training sample, i.e., the number of forest stands with known TSD used to locate the training set of GEDI footprints. The training sample was extracted through random sampling using a GEDI-assisted stratification. Ten sample sizes were tested, and the random extraction was repeated 100 times for each defined sample size. The entire classification procedure was performed for each instance of the random extraction (6 grid/segmentation combinations, 10 sample sizes, 100 random extractions resulting in 6000 total classifications), the accuracy of each classification was assessed, and summary nonparametric statistics (median, 25th, and 75th percentiles) of the accuracy metrics were calculated for each grid/segmentation combination and for each sample size. Section 3.2.1 describes in detail the experimental design of the sensitivity analysis and the stratified random sampling, and Section 3.2.2 reports the accuracy metrics used. For each combination of GEDI footprint grid and segmentation-derived stand maps, a sample of forest stands of known TSD was extracted through stratified random sampling from the total population of candidate stands, defined as all the stands that (1) overlapped at least 75% of their area with the ancillary reference dataset ( Figure 4); and (2) enclosed at least one GEDI footprint.
Topographic and canopy height-related metrics were used as stratification variables, adapting the stand-level stratification defined in Sanchez-Lopez et al. [12]. LiDAR-assisted stratification for forest inventories is cost-effective to reduce the amount of data required for model calibration in comparison to random sampling [63,64]. Our GEDI-assisted stratification was used to ensure the representativeness of the training set and minimize the number of required stands with known TSD. In practice, the time of the disturbance may only be known for a small percentage of stands that comprise the landscape and that potentially can be used to locate training GEDI footprints.
The topographic aspect-derived metric 'SRAI' was adopted as the first stratification variable, and the 98% relative height GEDI metric ('rhInfl98 , Table 1) was selected as a second stratification variable. For each candidate stand, the mean 'SRAI' and mean 'rhInfl98 were calculated considering all the footprints belonging to the stand. Two 'SRAI' strata (mesic/xeric) of equal size were defined, using the median 'SRAI' of the candidate stands as a threshold. Within each 'SRAI' stratum, five 'rhInfl98 strata were defined, corresponding to the quintiles of the 'rhInfl98 distribution. As a result, 10 strata of equal size were identified and used for the selection of training data ( Figure 5).
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 27 Figure 5. Scheme of the stratified random sampling to extract the training set of stands of known TSD. The mean topographic solar-radiation aspect index ('SRAI') and the mean GEDI 98% relative height (from inflection points) ('rhInfl98′) were used as stratification variables. This is a specific example with sample size = 7 (seven stands per stratum, reported as red triangles), resulting in a total of 70 stands sampled to create the training dataset, for the Landsat segmentation and 'full grid-within stand' scenario. In this case, the quintiles of the GEDI 98% relative height are 17. The extraction was replicated 100 times for each sample size, with the sample size varying from 1 to 10 in each stratum (i.e., 10 to 100 training stands in total). Therefore, for each scenario, 1000 random extractions were performed. Note that, onwards, we refer to the sample size as the number of stands sampled in each stratum.
The R 'yaImpute' package [65] was used to train an RF classifier and predict TSD for each GEDI footprint. The training data used in the RF consisted of the GEDI footprint observations enclosed within the perimeters of the sampled stands. TSD, as observed in the ancillary reference data, was the dependent variable and the 57 GEDI derived metrics of the simulated GEDI waveforms were the predictor variables ( Table 1). The number of decision trees was left at the default value (i.e., 500), as well as the number of predictor variables available to split at each tree node, which was seven. Once the RF classifier was trained, TSD was imputed to the grid of GEDI simulated footprints.
TSD at the stand level was calculated as the median value of the TSD predictions of the GEDI Figure 5. Scheme of the stratified random sampling to extract the training set of stands of known TSD. The mean topographic solar-radiation aspect index ('SRAI') and the mean GEDI 98% relative height (from inflection points) ('rhInfl98 ) were used as stratification variables. This is a specific example with sample size = 7 (seven stands per stratum, reported as red triangles), resulting in a total of 70 stands sampled to create the training dataset, for the Landsat segmentation and 'full grid-within stand' scenario. In this case, the quintiles of the GEDI 98% relative height are 17. The extraction was replicated 100 times for each sample size, with the sample size varying from 1 to 10 in each stratum (i.e., 10 to 100 training stands in total). Therefore, for each scenario, 1000 random extractions were performed. Note that, onwards, we refer to the sample size as the number of stands sampled in each stratum.
The R 'yaImpute' package [65] was used to train an RF classifier and predict TSD for each GEDI footprint. The training data used in the RF consisted of the GEDI footprint observations enclosed within the perimeters of the sampled stands. TSD, as observed in the ancillary reference data, was the dependent variable and the 57 GEDI derived metrics of the simulated GEDI waveforms were the predictor variables (Table 1). The number of decision trees was left at the default value (i.e., 500), as well as the number of predictor variables available to split at each tree node, which was seven. Once the RF classifier was trained, TSD was imputed to the grid of GEDI simulated footprints.
TSD at the stand level was calculated as the median value of the TSD predictions of the GEDI footprints intersecting or fully enclosed within each forest stand.

Accuracy Assessment Metrics
The accuracy of the predictions was assessed at the stand level using three accuracy metrics, defined as follows: The root mean square difference between the estimated and observed TSD of the stands (RMSD): The bias (BIAS): The percentage of stands with absolute TSD error of less than 10 years (Perct.10). Ten years was selected as a benchmark to compute this accuracy metric given (1) the complexity of accurately estimating stand age using structural and topography-related derived indicators [13], (2) the analysis performed in the same study area using airborne LiDAR [12], and the potential suitability of decadal maps of stand-replacing TSD disturbance to categorize successional forest stages: In Equations (2)-(4), n is the number of stands, andŶ i and Y i are respectively the estimated and observed TSD of stand i.
The validation dataset used to compute these metrics comprised all the stands of known TSD that were not used for training. The stands overlapping with the manually interpreted undisturbed forest stands (i.e., not disturbed since 1870) were also excluded because their age was unknown.

GEDI Coverage Analysis
The cumulative distribution of the percentage of stands and percentage of the study area, reported as a function of the minimum number of footprints per stand, is shown in Figure 6 for the six combinations of forest stand segmentation and GEDI footprint grids. The representative parameters of the distributions (total number of GEDI footprints, 95th, 75th, 50th, and 25th percentiles, and number of stands with no GEDI footprints) are reported in Table 2.

GEDI Coverage Analysis
The cumulative distribution of the percentage of stands and percentage of the study area, reported as a function of the minimum number of footprints per stand, is shown in Figure 6 for the six combinations of forest stand segmentation and GEDI footprint grids. The representative parameters of the distributions (total number of GEDI footprints, 95th, 75th, 50th, and 25th percentiles, and number of stands with no GEDI footprints) are reported in Table 2.  Comparison between the Landsat and the equivalent LiDAR-based scenarios confirms that oversegmentation translates into fewer footprints per stand available to estimate TSD ( Figure 6). The increase in the number of stands costs a reduction in the average stand size and the number of intersecting footprints. The effect is accentuated when only fully enclosed footprints are considered (i.e., 'full grid-within stand' and 'cloud grid-within stand') because there are more segment boundaries for GEDI footprints to straddle.
Both the 'full grid' and the 'full grid-within stand' scenarios have good overall coverage of GEDI footprints per stand. Stands intersecting 20 footprints or more represent 75% of the study area regardless of the segmentation. The 'cloud grid -within stand' scenario, on the other hand, is the most restrictive case. Note that the 'cloud grid' had approximately 50% fewer footprints than the 'full grid' (Section 2.2). Notwithstanding this, stands with more than 7 and 12 GEDI footprints observations represented 75% of the study area in the Landsat and LiDAR segmentations, respectively.
The number of stands not intersected by any GEDI footprint is always lower than 15%, and represent less than 5% of the study area in all considered scenarios (Table 2); direct estimation of TSD and associated uncertainties with the current approach is not possible in these stands.

Sensitivity Analysis of the Estimated Time Since Disturbance
The experimental design detailed in Section 3.2 resulted in 6000 separate TSD estimations, which are discussed in detail in Section 4.2.1 through Section 4.2.3. The number of stands and GEDI footprints with known TSD available for training the RF and validating the proposed methodology is displayed in Table 3. Figure 7 shows, as an illustrative example, one of the instances of the TSD estimation obtained using the Landsat segmentation and the 'full grid-within stand' GEDI footprints. Table 2. Summary parameters of the cumulative distributions of the number of footprints per stand, expressed as a percentage of the number of stands (i.e., red curves in Figure 6), and percentage of the study area (i.e., blue curves in Figure 6), for the six combinations of segmentation-derived stand maps and subsets of the GEDI simulated footprints.  Table 3. Dataset of stands with known TSD available for training and validation of the proposed methodology for the six combinations of segmentation-derived stand map and GEDI footprint grid. All these stands overlapped at least 75% of their area with the ancillary reference dataset (Figure 4), and enclosed at least one GEDI footprint. increase in the number of stands costs a reduction in the average stand size and the number of intersecting footprints. The effect is accentuated when only fully enclosed footprints are considered (i.e., 'full grid-within stand' and 'cloud grid-within stand') because there are more segment boundaries for GEDI footprints to straddle. Both the 'full grid' and the 'full grid-within stand' scenarios have good overall coverage of GEDI footprints per stand. Stands intersecting 20 footprints or more represent 75% of the study area regardless of the segmentation. The 'cloud grid -within stand' scenario, on the other hand, is the most restrictive case. Note that the 'cloud grid' had approximately 50% fewer footprints than the 'full grid' (Section 2.2). Notwithstanding this, stands with more than 7 and 12 GEDI footprints observations represented 75% of the study area in the Landsat and LiDAR segmentations, respectively.

Quantiles of the % Stand
The number of stands not intersected by any GEDI footprint is always lower than 15%, and represent less than 5% of the study area in all considered scenarios ( Table 2); direct estimation of TSD and associated uncertainties with the current approach is not possible in these stands.

Sensitivity Analysis of the Estimated Time Since Disturbance
The experimental design detailed in Section 3.2 resulted in 6000 separate TSD estimations, which are discussed in detail in Sections 4.2.1 through 4.2.3. The number of stands and GEDI footprints with known TSD available for training the RF and validating the proposed methodology is displayed in Table 3. Figure 7 shows, as an illustrative example, one of the instances of the TSD estimation obtained using the Landsat segmentation and the 'full grid-within stand' GEDI footprints.  . Illustrative example of the methodology for the stand-level TSD estimation, showing the results of one of the 100 instances generated using the Landsat segmentation, the 'full grid -within stand' GEDI subset of footprints, and a sample size of 7 training stands per stratum. Top row: location of the randomly selected training stands, with on the right a detail of the GEDI footprints intersecting the training stands (i.e., the footprints used to train the imputation). Bottom row: estimated TSD for the whole study area, with on the right a detail of the footprint-level estimations, which were then aggregated to generate the stand-level estimation. In this specific case, RMSD was 18.9 years, BIAS was 0.4 years, and Perct.10 was 67.0% (71.4% when considering only the stands disturbed since 1910).
Even-aged stands are relatively homogenous, but there is inherent within-stand variability of the vegetation in terms of age and attributes, such as canopy cover and height. This variability increases with TSD, which translates into increased variability in the GEDI footprint-level observations, as illustrated by the distribution of the´rhInfl98´GEDI metric (Figure 8). The metric shows a relatively  Even-aged stands are relatively homogenous, but there is inherent within-stand variability of the vegetation in terms of age and attributes, such as canopy cover and height. This variability increases with TSD, which translates into increased variability in the GEDI footprint-level observations, as illustrated by the distribution of the ´rhInfl98´ GEDI metric (Figure 8). The metric shows a relatively strong correlation with observed TSD-higher at the stand-level (R = 0.768)represented by the median value of estimates of the enclosed GEDI footprints-than at the individual footprint level (R = 0.674).  Figure 9 shows RMSD, BIAS, and Perct.10 boxplots of the TSD estimations obtained from the 100 random extractions of training stands for each sample size (i.e., from 1 to 10 training stands per stratum). In all six combinations of segmentation and footprint grids, the accuracy metrics show a  Figure 9 shows RMSD, BIAS, and Perct.10 boxplots of the TSD estimations obtained from the 100 random extractions of training stands for each sample size (i.e., from 1 to 10 training stands per stratum). In all six combinations of segmentation and footprint grids, the accuracy metrics show a similar monotonic pattern, with increasing accuracy as the number of training stands and GEDI footprints increases. As expected, there is a linear relationship between the number of training sampled stands and the number of GEDI footprints used to train the RF. In all cases, the sensitivity of the accuracy metrics to the sample size is initially high, and then steeply declines for sample sizes ≥ 4 stands per stratum. For instance, with the most conservative 'cloud grid-within stand' and Landsat segmentation, at sample size = 4, the median Perct.10 is 55.77% (i.e., 55.77% of the stands are classified within 10 years of the reference disturbance date), whereas at sample size = 10, Perct.10 is 60.56%. The interquartile range of all the accuracy metrics also decreases monotonically, with the greatest decrease observed for sample sizes ≤ 6.

Sensitivity to Training Sample Size
As a consequence, a training sample size of 6 stands per stratum was identified as the best compromise between the cost of generating training data and the reduction in classification errors. Figure 10 shows for sample size = 6, the accuracy metrics stratified by stand height (i.e., the 'rhInfl98' GEDI metric).

Sensitivity to the Stand Map
Comparison of the results obtained using the Landsat segmentation (red in Figure 9) and the LiDAR segmentation (blue in Figure 9) shows, with all grids and all sample sizes, that the former had a slightly worse performance as measured by RMSD and Perc.10, and slightly better performance as measured by the BIAS. The difference is relatively small, indicating that despite the oversegmentation discussed in Sections 2.4 and 4.1, the Landsat segmentation can provide a suitable stand map. At a sample size = 6, the largest difference between the two segmentations was observed in the most conservative 'cloud grid-within stand' scenario, where median RMSD, median BIAS, and median Perc.10 were, respectively, 22.01 years, 0.01 years, and 57.77% for the Landsat segmentation, and 19.13 years, 0.42 years, and 63.53% for the LiDAR segmentation (Table 4). Nevertheless, and as expected, the LiDAR segmentation depicts overall higher accuracies compared to the Landsat segmentation when analyzing stratified stand heights (Figure 10), which is accentuated on stands with median 'rhInfl98' lower than 10 m. Table 4. RMSD, BIAS, and Perct.10 for the six combinations of segmentation-derived stand map and GEDI footprint grid for the 100 replicates obtained for a sample size of 6 stands per stratum. A second-order effect of the stand oversegmentation using Landsat is that more footprints are discarded when only footprints fully enclosed within a stand are considered. Using the Landsat segmentation, 13,731 footprints (21% of the total) were discarded when considering the 'full grid-within stand' scenario instead of the 'full grid' scenario, whereas 10,417 (15% of the total) were discarded with the LiDAR segmentation (Table 2). In both cases, however, accuracy was higher with the 'full grid-within stand' than the 'full grid' scenario, indicating that eliminating footprints straddling two or more stands increases the accuracy even though fewer observations are available to estimate TSD. This was confirmed by extracting the training set from the 'full grid-within stand' to train the RF, and then applying the same RF to the 'full grid' and 'full grid-within stand'. The accuracy metrics ( Figure 11) indicate that, in all cases, the exclusion of the footprints straddling stands leads to a small increase in accuracy.

Sensitivity to the Number of Footprints per Stand
The absolute number of available footprints per stand, and the density of footprints relative to the stand area, are in principle critical factors in evaluating the feasibility of the proposed object-based approach for the generation of a wall-to-wall TSD map. Comparison of the results obtained with the three grids (i.e., comparison of the three rows of Figure 9 and Table 4) shows that despite the large difference in available footprints, the three grids yielded very similar results, with the 'cloud grid-within stand' scenario only slightly less accurate than the 'full grid-within stand' scenario but more accurate than the 'full grid' scenario, due to the benefit of removing footprints that straddle stand edges.
Remote Sens. 2020, 12, x FOR PEER REVIEW 19 of 27 Figure 11. Accuracy metrics ( , , and Perct.10) stratified by training sample size (1 to 10 stands per stratum) of the TSD estimations obtained by extracting the training set from the 'Full gridwithin stand' to train the RF, and then applying it to the 'full grid-within stand' (in blue) and the 'full grid' (in red). The thick central line represents the median, the edges of the box are the first (i.e., the 25th percentile), and third (i.e., 75th percentile) quartiles, and the whiskers are 1.5 times the range of the upper and lower quartiles.

Sensitivity to the Number of Footprints per Stand
The absolute number of available footprints per stand, and the density of footprints relative to the stand area, are in principle critical factors in evaluating the feasibility of the proposed objectbased approach for the generation of a wall-to-wall TSD map. Comparison of the results obtained with the three grids (i.e., comparison of the three rows of Figure 9 and Table 4) shows that despite the large difference in available footprints, the three grids yielded very similar results, with the 'cloud grid-within stand' scenario only slightly less accurate than the 'full grid-within stand' scenario but more accurate than the 'full grid' scenario, due to the benefit of removing footprints that straddle stand edges.
In Figure 12, the accuracy of the TSD classifications obtained with sample size = 6 is stratified by the number of footprints available in each stand, and in Figure 13, the same results are reported as a function of the density of footprints per unit area. As expected, the accuracy of the TSD predictions increases with the number of footprints per stand, but the results are stable with about 15 footprints per stand (Figure 12), which translates to a density of more than 0.5 footprint per ha ( Figure 13). In Figure 12, the accuracy of the TSD classifications obtained with sample size = 6 is stratified by the number of footprints available in each stand, and in Figure 13, the same results are reported as a function of the density of footprints per unit area. As expected, the accuracy of the TSD predictions increases with the number of footprints per stand, but the results are stable with about 15 footprints per stand (Figure 12), which translates to a density of more than 0.5 footprint per ha ( Figure 13).   , and Perct.10 stratified by the number of available GEDI footprints per stand (rightmost three columns); all results are obtained with training sample size = 6 stands. The thick central line represents the median, the edges of the box are the first (i.e., the 25th percentile) and the third (i.e., 75th percentile) quartiles, and the whiskers are 1.5 times the range of the upper and lower quartile.

Discussion
The GEDI spaceborne LiDAR instrument is providing unprecedented observations of global vegetation structure. Its sampling configuration makes it a priority to develop approaches for the spatial extrapolation of footprint-level observations to generate wall-to-wall maps of vegetation characteristics. In this research, we studied the feasibility of an object-oriented fusion of GEDI

Discussion
The GEDI spaceborne LiDAR instrument is providing unprecedented observations of global vegetation structure. Its sampling configuration makes it a priority to develop approaches for the spatial extrapolation of footprint-level observations to generate wall-to-wall maps of vegetation characteristics. In this research, we studied the feasibility of an object-oriented fusion of GEDI footprints and Landsat data for the estimation of time since disturbance at the stand level; the analysis was conducted using simulated GEDI data generated from a small-footprint airborne LiDAR dataset.
We adapted to GEDI the approach for TSD estimation developed for airborne LiDAR in Sanchez-Lopez et al. [12]. The resulting methodology involves the extraction of a training set of stands with known TSD, followed by RF classification to obtain footprint-level TSD estimates, and finally object-oriented fusion with a Landsat-derived stand map.
The proposed stratification for the selection of the training set is based solely on topography and GEDI-observed height. In the present study,´SRAI´was derived from the DEM generated from the airborne LiDAR acquisition, but in principle, it could be generated from any globally available surface elevation data (e.g., SRTM); thus, it is not expected to limit the findings or future application of this research since it was only used for stratification and selection of the training data. A minimum of six stands per stratum (60 stands in total) was needed to obtain acceptable accuracies, and to constrain the standard errors of the accuracy metrics. The 'rhInfl98 metric, representing the height of the upper layer of the canopy, was selected for stratification based on the expected high correlation between canopy height and TSD [12], which was confirmed in the present analysis ( Figure 8); nevertheless, alternative metrics accounting also for canopy cover might be well suited as stratification variables, especially when considering relatively old stands (i.e., TSD > 100 years).
The stratification ensures that the full range of forest conditions is sampled, while requiring the collection of TSD training data (usually expensive and time-consuming to gather in the field) only on the sampled stands. In other study sites where information on TSD is not available, forest change maps [66] and available optical remotely sensed imagery could be used to identify recent disturbances. In areas where clearcut harvesting is commonly practiced and field data records of forest operations are available (e.g., in the U.S.A, Canada, or the Scandinavian countries), those can be in a TSD reference dataset, as well as information from national forest inventories that often include dendrochronology records of specific stands. If none of these records exist, the proposed methods could still be implemented, by estimating TSD from tree age cores on the sampled stands.
Landsat data have an adequate spatial resolution (i.e., 30 m) to detect typical forest management units, such as forest stands. However, a few limitations should be considered. It is well established that the sensitivity of Landsat reflectance to forest structure is low in dense canopies [54,67]. In particular, there is an asymptotic relationship between optical reflectance and stand age, with low separability of mature forest stands since the signal saturates above intermediate biomass levels [68]. As expected, a degree of mismatch was observed between the perimeters of the Landsat image-objects and the actual forest stands, especially in stands disturbed many decades ago (i.e., TSD>80 years). Overall, the Landsat-based stand map was oversegmented in comparison to the airborne LiDAR-based one. While oversegmentation is considered less problematic compared to undersegmentation in object-based image analysis, in the specific case of data fusion with a sampling instrument, oversegmentation has the undesirable effect of reducing the number of GEDI footprints available to estimate TSD within each stand. The 25-m diameter of the GEDI footprints results in a number of observations straddling the boundaries between of stands of different TSD. The results indicate that using only footprints fully enclosed within a single forest stand increased accuracy, despite the tradeoff with the reduced number of available observations (e.g., Figures 9 and 11). It therefore is important to ensure an optimal delineation of the forest stands, to avoid oversegmentation, thus minimizing the number of discarded GEDI footprints.
The total number of footprints used for training the RF had, overall, less effect on accuracy than the number of training stands sampled in each stratum (as noted above, the gain in accuracy is very considerable for small sample sizes of up to six stands per stratum). This is demonstrated in Figure 9: the accuracy of the results obtained using the three grids is substantially identical at each given sample size, even though the total number of footprints used for training (Figure 9, left column) varies greatly across grids. This is not unexpected, as a larger number of training stands would ensure that the training dataset better represents the full range of forest conditions in the study area.
The spatial distribution of GEDI footprints, with regularly spaced observations along and across the orbits of the International Space Station, effectively ensures that larger stands will include more footprints than smaller stands. The accuracy of the results monotonically increases with the number of footprints per stand, but our results indicate that the accuracy improves only marginally above 15 footprints per stand (corresponding to a density of 0.5 footprints per ha).
The number of footprints available to estimate TSD in each stand at the end of the two-year programmed mission will depend, among other factors, on cloud coverage. We observed only a very modest decrease in accuracy when comparing the results obtained with the full grid of simulated observations to the grid with 50% cloud cover; and our analysis suggests that only 5% of the study area will be represented by stands unsampled by a GEDI footprint. When considering the present results for future studies, note should be taken that footprint density is also a function of latitude. The study area is located on~46N, where a higher track density is expected compared to lower latitudes nearer the tropics [37]. Additionally, the size and shape of the forest stands will differ in other ecosystems, due to different stand-replacing disturbance dynamics and harvesting practices. As a result, the number of stands not intersected by any GEDI footprint might increase or decrease, and wall-to-wall mapping of TSD with the proposed methodology might require more complex data fusion strategies with alternative sources of remotely sensed data.
This research builds upon the study of Sanchez-Lopez et al. [12], who used airborne LiDAR data to predict stand-level TSD in the same study area. When accounting for clouds, two years of GEDI footprints are expected to cover only~4% of the study area, whereas the airborne LiDAR point clouds provided complete coverage with a return density of >4 points/m 2 . Even so, the change in accuracy between the two studies was not pronounced. The overall accuracy obtained using the LiDAR segmentation and the 'cloud grid-within stand' scenario (median RMSD = 19.13 years, median BIAS = 0.42 years, and median Perct.10 = 63.53% (Table 4)) was comparable to the results obtained from airborne LiDAR in the former study: RMSD = 17.5 years, BIAS = 0.8 years, and Perct.10 = 72.8%.
Even-aged stands are structurally homogenous, but there exists an intrinsic degree of heterogeneity driven by, among other things, regeneration and regrowth patterns, micro-site conditions, and competition, which becomes accentuated with age as an uneven-aged distribution of the trees comprising the stand develops. Sanchez-Lopez et al. [12] argued that the introduction of meaningful spatial units of analysis in the form of stand boundaries reduced the inherent variability of cell-level analyses, to improve estimates of forest attributes (such as TSD) such that they vary less within stands than between stands. Similarly, here, the use of stand perimeters provided contextual information to aggregate TSD at the stand level, reducing the variability observed by individual GEDI footprints ( Figure 8). It should be noted that part of the variability observed in the footprint-level GEDI metrics might be attributed to the ground filtering algorithm used to retrieve the GEDI metrics. Default settings, such as the smoothing width, could be adjusted for this specific biome to make ground detection more precise.
Thanks to the availability of the burn history dataset [45] and clearcut harvest records (i.e., FACTs dataset) [46], our analysis considered TSD up to 140 years. Some limitation of the reference dataset ought, however, to be discussed. The distribution of disturbances changed through time, with a very large number of disturbed stands in some decades (e.g., 1910s) and no disturbed stands in others (e.g., 1900s, 1890s). Moreover, most of the older disturbed stands (TSD > 60 years) resulted from large fires that occurred in the 1910s and 1930s, whereas younger disturbed stands (TSD < 60 years) resulted from smaller clearcuts. It should also be considered that the reference dataset included fires of medium to high severity [12,45]. Stand regeneration and regrowth patterns might significantly vary as a function of fire severity, especially when combined with other factors, such as topography and micro-climate conditions. It is expected that that the incorporation of additional variables in the RF, linked to the type and magnitude of disturbance, or to climate and topography, would increase the accuracy of the TSD predictions [12]. It is also likely that the FACTs dataset has omission errors (i.e., additional clearcuts and some post-fire logging that might not have been recorded), but these would not affect the accuracy metrics.
The validation results (RMSD~20 years and Perct.10~60% on TSD up to 140 years) indicate that the TSD estimation is sufficiently precise for a broad characterization of stand age (old-growth, primary forest from non-disturbed and industrial areas), and to identify successional states (e.g., open, stand initiation, young multistory, mature multistory, old multistory) [27] but not for providing the exact decade of the disturbance.
Overall, the present study indicates that GEDI will provide a relatively dense grid of measurements after its two-year programmed mission and its combination with a stand delineation provides a means to map stand-level TSD synoptically, at least in areas with similar disturbance dynamics and at relatively high latitudes. The method has been specifically designed for stand-replacing disturbances. Further research would be required to address the suitability of the method including more subtle disturbances, such as partial harvests, low-severity fires, droughts, or insect outbreaks.

Conclusions
Upscaling footprint estimates to objects that represent even-aged forest stands overcomes the limitations of the GEDI sampling configuration and allows for the estimation of stand-level TSD. In contrast to the use of cells of constant size for data fusion with spaceborne LiDAR data (e.g., 1 km, 500 m) [31,32,69], the use of image-objects is preferable to map forest attributes, such as TSD, in managed forests that vary less within stands than between stands even decades after stand-replacing disturbance events. Thus, footprints enclosed within the same stand likely represent canopy structures of similar TSD. Datasets of the stand-level disturbance history of forest ecosystems are required to fully understand global carbon cycle dynamics and assess the actual role of forests in the mitigation of climate change.