Terrain Slope Effect on Forest Height and Wood Volume Estimation from GEDI Data

: The Global Ecosystem Dynamics Investigation LiDAR (GEDI) is a new full waveform (FW) based LiDAR system that presents a new opportunity for the observation of forest structures globally. The backscattered GEDI signals, as all FW systems, are distorted by topographic conditions within their footprint, leading to uncertainties on the measured forest variables. In this study, we explore how well several approaches based on waveform metrics and ancillary digital elevation model (DEM) data perform on the estimation of stand dominant heights ( 𝐻𝐻 𝑑𝑑𝑑𝑑𝑑𝑑 ) and wood volume (V) across different sites of Eucalyptus plantations with varying terrain slopes. In total, five models were assessed on their ability to estimate 𝐻𝐻 𝑑𝑑𝑑𝑑𝑑𝑑 and four models for V. Results showed that the models using the GEDI metrics, such as the height at different energy quantiles with terrain data from the shuttle radar topography mission’s (SRTM) digital elevation model (DEM) were still dependent on the topographic slope. For 𝐻𝐻 𝑑𝑑𝑑𝑑𝑑𝑑 , an RMSE increase of 14% was observed for data acquired over slopes higher than 20% in comparison to slopes between 10 and 20%. For V, a 74% increase in RMSE was reported between GEDI data acquired over slopes between 0–10% and those acquired over slopes higher than 10%. Next, a model relying on the height at different energy quantiles of the entire waveform ( 𝐻𝐻𝑇𝑇 𝑛𝑛 ) and the height at different energy quartiles of the bare ground waveform ( 𝐻𝐻𝐺𝐺 𝑛𝑛 ) was assessed. Two sets of the 𝐻𝐻𝐺𝐺 𝑛𝑛 metrics were generated, the first one was obtained using a simulated waveform representing the echo from a bare ground, while the second one relied on the actual ground return from the waveform by means of Gaussian fitting. Results showed that both the simulated and fitted models provide the most accurate estimates of 𝐻𝐻 𝑑𝑑𝑑𝑑𝑑𝑑 and V for all slope ranges. The simulation-based model showed an RMSE that ranged between 1.39 and 1.66 m (be-tween 26.76 and 39.26 m 3 ·ha −1 for V) while the fitting-based method showed an RMSE that ranged between 1.26


Introduction
In the last couple of decades, due to its accurate Earth observation capabilities, remote sensing has increasingly been used for the estimation, on local and global scales, of forest biophysical characteristics, namely forest heights and above ground biomass (AGB). The estimation of forest characteristics is not restricted to a particular remote sensing technique, as it has been obtained using either passive optical sensing such as from optical imagery, or using active sensors such as synthetic aperture radar (SAR) or light detection and ranging (LiDAR) data. Nonetheless, LiDAR has proven to be better suited for the estimation of AGB and canopy heights than SAR (with available wavelengths to date: L, C and X bands), Global Navigation Satellite System Reflectometry [1], and optical imagery [2,3]. LiDAR data show lower signal saturation with AGB than optical and radar data. In general, the literature reports saturation thresholds with optical and radar imageries (with X, C and L-bands for SAR data) rarely exceeding 150 Mg/ha [4,5] whereas Li-DAR data have shown AGB estimation capabilities up to 1200 t/ha [6]. Yet, the AGB estimation levels from LiDAR data are based on canopy vertical structure metrics, and the relationship between height structure and ABG itself may saturate at high AGB, thus sometimes a start of under-estimation of AGB using LiDAR data is observed [7].
LiDAR systems capture vertical structures by measuring the time taken for an emitted laser pulse to return. Over vegetated areas, the emitted pulse will start reflecting as soon as it hits the top of canopy (given a large enough top of canopy surface), and the final return will theoretically be from the ground (if the laser pulse can penetrate through the gaps [8]). The representation of such vertical structures depends on the LiDAR system used. Discrete LiDAR systems usually have small footprints (<1 m) and record multiple returns representing different targets within the travel path. The returned laser echoes are next recorded as a series of 3D coordinates known as point clouds. On the other hand, full waveform (FW) LiDAR systems record all the reflected signals over a given footprint, and not only the first one. They therefore provide a continuous vertical profile representing the heights of the different objects within their footprints, which is usually larger than 10 m. Therefore, FW LiDAR systems provide much richer information about the spatial arrangement of structures within their waveforms [9]. The recorded vertical echoes of objects within the waveform are represented as a series of multiple connected temporal peaks. These peaks might therefore represent reflections from a single object (e.g., top of canopy cover) or different objects with relatively the same heights (e.g., short understory and the ground) [10]. To measure vegetation characteristics, the vegetation and ground portion of the waveform need to be identified and separated. As such, given the wide footprint of FW LiDAR systems, a source of uncertainty on the estimation of forest characteristics such as canopy heights and biomass could occur based on the local terrain morphology. For instance over terrain with a high relief, the ground return might get mixed with the vegetation leading to an overestimation of the relevant vegetation characteristics [11].
Over the last years, many studies were carried out on FW data acquired by the Geoscience Laser Altimeter System (GLAS) on board the Ice, Cloud, and land Elevation satellite (ICESat) for the estimation of forest characteristics. ICESat/GLAS was the first spaceborne FW LiDAR system, and operated from 2003 until 2009 [12]. ICESat/GLAS pulsed one of its three 1064 nm lasers at a time, at 40 Hz, producing ellipsoidal shaped footprints with a diameter of ~60 m at ~170 m along-track intervals, and several kilometers across tracks. ICESat/GLAS was used during its operational and post-operational period in many studies for the precise estimation of forest characteristics such as canopy height and biomass [11,[13][14][15][16][17][18][19]. However, most of the previously mentioned studies focused on forests with relatively flat terrain since ICESat/GLAS with its large footprint size was susceptible to overestimating forest characteristics (e.g., canopy heights, and AGB) over terrains with high relief [20,21]. Nonetheless, a number of studies have been carried out to address this particular issue, and presented several methods to minimize the effects of slope on the waveforms. These studies can be grouped into three categories. The first group of studies attempted to retrieve slope information from the waveforms in the form of waveform metrics, such as the trailing edge extent (representing terrain variability) and the leading edge extent (representing vegetation variability) [11,13,20,22] and terrain indices (range of ground surface elevations within a sampling window) retrieved from a digital elevation model (DEM). This technique provided increased accuracies on the estimation of forest heights over sloping terrain [11], but the squared correlation coefficient (R 2 ) decreased with increased slope values, and was only viable for slopes lower than 15° (R 2 = 0.63) [23]. The second type of studies, such as the study of Yang et al. [24], minimized the effects of slope on large footprint LiDAR by modifying the geometric optical and radiative transfer (GORT) vegetation LiDAR model [25] to take into account the impacts of surface topography. Their approach showed that slope-corrected ICESat/GLAS footprints had an accuracy on the estimation of canopy heights of 2.2 m (R 2 of 0.77). More recently, Wang et al. [26] developed a method based on new waveform metrics to minimize the effects of slopes on the estimation of forest AGB. The developed method relies on quantile heights (the height above ground at which n% of the waveform energy falls below) [27], and quantile heights of the ground return only. The method developed by Wang et al. [26] can be decomposed into three major steps. (1) A LiDAR waveform over bare grounds but with similar slope as the studied waveform is first simulated in order to derive a ground return. This was necessary in the study of Wang et al. [26] as they worked on ICESat/GLAS waveforms with a ~70 m diameter footprints alongside simulated waveforms with 25 m diameter footprints. For ICESat/GLAS, the ground return disappeared with slopes bigger than 15° [28] (i.e., returns from ground and vegetation become mixed up over sloping terrain). Next, the simulated waveform is aligned with the studied waveform at the signal end, and finally the related metrics were derived. The methodology developed by Wang et al. [26] gave significant increase in accuracy in comparison to previous methodologies. Indeed, for the estimation of above ground biomass, a decrease of 20.83 Mg/ha in terms of RMSE (32% increase in R 2 ) was observed for slope ranges of 0-40°.
ICESat/GLAS was succeeded in 2018 by ICESat-2 that carried the Advanced Topographic Laser Altimeter System (ATLAS) with a goal to measure ice-sheet topography, cloud and atmospheric properties, and global vegetation. In contrast to ICESat/GLAS, AT-LAS is equipped with a single 532 nm wavelength laser that emits six beams (arranged into three pairs). Beam pairs are separated by ~3 km across-track with a pair spacing of 90 m. The nominal footprint of ATLAS is 17 m with a spacing interval of 0.7 m along-track. Moreover, unlike ICESat/GLAS, ATLAS uses a photon counting system rather than a full waveform system, and has the ability to detect single echoed photons. However, given the wavelength of the equipped laser (532 nm), ATLAS has lower reflectance over vegetation in comparison to ice [29], coupled with the low number of reflected photons, ICE-Sat-2 might be limited over tropical forests for direct canopy height retrievals due to the canopy cover [29].
Recently, the Global Ecosystem Dynamics Investigation (GEDI) on board the International Space Station (ISS) started collecting LiDAR data globally since April 2019. GEDI's mission is to provide information about canopy structure, biomass and topography, and is estimated to acquire 10 billion cloud free shots in its two years mission [30]. GEDI shows many similarities to ICESat/GLAS, however, given GEDI's higher sampling rate (242 vs. 40 Hz for ICESat-1), and its much smaller footprint size (~25 vs. ~60 m for ICESat-1), GEDI will provide a highly improved coverage, and improved measurements over forested areas with high relief in comparison to ICESat/GLAS. Nonetheless, given that GEDI is a FW LiDAR system, it is expected to also be affected by relief. However, since GEDI has a much smaller footprint than ICESat/GLAS while getting equivalent vertical resolution (1 ns), the effect of slopes on the waveform should be less pronounced.
As far as we know, no studies have yet been dedicated to analyze the effects of slopes on GEDI data. Therefore, the objective of this study is two-fold. First, the effects of the terrain slope on the estimation of both canopy heights and wood volume of fast-growing Eucalyptus plantations in Brazil will be analyzed. Next, slope-effect minimization techniques from previous literature will be used in order to determine which methodology yields the best forest characteristics estimates over sloping terrain. The choice of the in situ dataset was decided since the physical structure of Eucalyptus is simple enough, and very homogeneous within the GEDI footprint, thus reducing uncertainties unrelated to the GEDI sensor itself.
The rest of this paper is organized as follows: Section 2 describes the study area and lists the data used. Section 3 presents a thorough description of the methods for the estimation of canopy heights and wood volume over sloping terrain. Sections 4 and 5 present the results and the discussion, respectively. Finally, in Section 6, we summarize and conclude our study.

Study Area
The study area is located in Brazil, in five regions across a large latitudinal gradient ( Figure 1), covering different climate and soil types. Maranhão (MA) is located in a typical equatorial region with different intensities of monsoon rainfall (1200 to 2500 mm/year. Bahia (BA) and Espírito Santo (ES) are located in a tropical coastal region with strong rainfall anisotropy (800 to 1500 mm/year) that directly affects wood productivity from the nearshore towards the hinterland. Mato Grosso do Sul (MS) is located in a tropical region (1200 to 1500 mm/year) but with some subtropical features (rare frost), it is the most environmentally homogeneous among the five study areas, resulting in less variation in wood productivity within the region. São Paulo (SP) is mainly in a subtropical region (1100 to 2000 mm/year with orographic effects), heavy frost-days are frequent in the southern part, has complex relief and a wide range of deep and shallow tropical soils, resulting in a huge variability in wood productivity across the region. Across the five regions, clonal seedlings of mainly E. grandis (W. Hill) and E. urophylla (S.T. Blake) and different types of hybrids are planted in rows at a density of 1000-1667 trees/ha. The wood productivity of the plantations was on average 40 m 3 /ha/year, with 80% of the stands being between 30-50 m 3 /ha/year and some stands could reach values as high as 60 m 3 /ha/year. At harvest time, the dominant height of around 80% of the stands is in the 20-35 m range with a stand volume between 180 and 300 m 3 /ha. The plantations were managed locally by stand units (~50 ha), where the same management is applied for each stand: planting, harvesting and weed control, genetic material, soil preparation and fertilization. The plantations are generally characterized by a sparse understory and herbaceous strata. Eucalyptus plantations exhibit a simple structure, with a tree crown strata of 3 to 10 m in width above a "trunk strata" with few Eucalyptus leaves and few understories. The "soil strata" is mainly constituted of litter accumulation of branches and leaves, with some patches of herbaceous species. More than 82% of the Brazilian Eucalyptus plantations are cultivated on flat to gentle slopes due to huge harvesting and logging operation costs on high slopes [31]. In some areas, such as the states of Minas Gerais, Paraná, Santa Catarina, and São Paulo (Paraíba Valley), high slopes are however the rule. Figure 1. Location of the five study sites; the zoomed-in rectangle shows an example of GEDI tracks over some stands.

In Situ Data
A total of 168 Eucalyptus stands were selected with field inventories performed between 1 April 2019 and 1 June 2019. The stands were selected due to the presence of acquired GEDI shots within this period. Moreover, these stands are generally located on a terrain with slopes of varying degrees. Specifically, 86.4% of stands are located over slopes lower than 10%, 11.2% of stands are located on sloping terrain with slope ranges between 10 and 20%, and the remainder of the stands (2.4%) are located on a terrain with slopes between 20 and 45%. An additional 50-m internal buffer strip from the stand borders was used to account for any footprint geolocation errors and to avoid footprints that match the boundary between the stand of interest and the surrounding medium. Permanent inventory plots had an area of approximately 400 m 2 and were systematically distributed throughout the stand with a density of one plot per 10 ha. They included 30 to 100 trees with an average of 58 trees. During a field inventory, the diameter at breast height (DBH, 1.3 m above the ground) of each tree in the inventory plot, the height of a central subsample of 10 trees, and the height of the four largest trees in terms of DBH (dominant trees) were measured. The mean height of these dominant trees defined the dominant height of the plot ( ). , basal area and age on the inventory date were then used in local volume equations to estimate the plot's total and merchantable volume (the merchantable volume is a tree's volume up to 6 cm stem diameter with bark). Table 1 shows the distribution of field measurements of and wood volume.

GEDI Dataset
GEDI is equipped with three 1064 nm onboard lasers designed to sample the Earth's surface at a ~60 m interval along the track with a cross track separation of ~600 m. One of the lasers is split into two beams. These four beams are then dithered across track to produce eight parallel tracks of observations. The GEDI lasers fire at a frequency of 242 Hz and, on the ground, measure 3D structures over a 25-m wide footprint. The received waveforms are digitized to a maximum of 1246 bins with a vertical resolution of 1 ns (15 cm), corresponding to a maximum of 186.9 m of height ranges, with a vertical accuracy over relatively flat, non-vegetated surfaces of ~3 cm [30].
GEDI data are already processed by the Land Processes Distributed Active Archive Center (LP DAAC, https://lpdaac.usgs.gov/tools/data-pool/ (accessed on 27 May 2021)). GEDI data are provided through three data products, L1B [32], L2A [33], and L2B [34]. The L1B data product, contains among other, the geolocated (longitude, latitude, and elevation) raw transmitted and received waveforms as well as information on mean and standard deviation of the noise, and acquisition time. The L2A product contains data of elevation and height metrics of the vertical structures within the waveform. These height metrics are issued from the processing of the received waveforms from the L1B product. Finally, the L2B data product [34] provides footprint-level vegetation metrics, such as canopy cover, vertical profile metrics, Leaf Area Index (LAI), and foliage height diversity (FHD).
The extracted metrics from each waveform are the results of several processing steps [32,33]. First, the raw received waveforms are smoothed to reduce the noise in the signal, and thus permitting the determination of the useful part of the waveform within the corresponding footprint [32]. Waveform smoothing is performed by means of a Gaussian filter with a current width of 6.5 ns. The smoothing permits the determination of searchstart and searchend, which are, respectively, the first and last positions in the signal where the signal intensity is above the following threshold [32]: where "mean" is the mean noise level, "std" is the standard deviation of the noise of the smoothed waveform, and "v" is a constant currently set at 4. Inside the window delimited by searchstart and searchend, the highest (toploc) and lowest (botloc) detectable returns are determined ( Figure 2) [32]. toploc and botloc respectively represent the highest and lowest locations inside the waveform extent where two adjacent intensities are above a threshold. The threshold equation used to determine toploc and botloc is the same as Equation (1), with "v" an integer fixed at 2, 3, 4 and 6. Two values of "v" are used to determine the toploc ("Front_threshold") and botloc ("Back_threshold"). Finally, the location of the distinctive peaks or modes in the waveform, such as the ground peak, or top of canopy peaks are determined using a second Gaussian filtering of the waveform section between toploc and botloc, and then finding all the zero crossings of the first derivative of the filtered waveform ( Figure 2) [33]. The width of the second Gaussian filter ("Smoothwidth_zcross") is fixed to either 3.5 or 6.5 ns. Currently, the LP DAAC provides six configurations (a1 to a6) for the estimation of the waveform metrics. The difference in these configurations are the values used for the thresholds presented earlier. For studies on Eucalyptus plantations, Fayad et al. [7] determined that algorithm a1 (Smoothwidth_zcross = 6.5, Front_threshold = 3, Back_threshold = 6) provided the best metrics for the estimation of canopy heights and wood volume. ) at different quantiles "n". The left and right red dashed lines represent, respectively, the position of the vegetation (Vloc) and ground peaks (Gloc). Note that 1 ns corresponds to 15 cm sampling distance in the waveform. The waveform amplitudes are counts from the analog to digital converter (ADC) on the instrument and normalized to be between 0 and 1.
In this study, variables from both L1B and L2A were extracted. From L1B, we extracted the raw received waveforms, their geolocation (longitude, and latitude), as well as their acquisition date and times. From the L2A data product, we extracted the following variables: (1) the position within the waveform of toploc and botloc, and (2) the relative height metrics at 10% intervals from botloc (0%) to toploc (100%) ( , 10% ≤ ≤ 100%, 10%). represent the height between botloc and the location at n% of cumulative energy ( Figure 2) [33].
Initially, 2128 GEDI shots acquired over the 168 Eucalyptus stands were selected. These GEDI shots corresponded to acquisitions within a window of two months (before or after) the stand inventory. The two-month window is necessary to overcome tree growth differences that occur between inventory times and acquisition times. In fact, on these fast-growing plantations, a 2-month difference could result in an up to 50 cm growth in and 10 m 3 ·ha −1 in V (depending on the genetic material, pedoclimatic conditions and age). However, this reasonable hypothesis allows keeping a large number of stands, including a large variability of age and growing conditions.
Finally, not all GEDI acquisitions are viable, as atmospheric conditions (e.g., clouds) can affect them. Therefore, a waveform was not investigated further if it met any of the following criteria: • Waveforms with reported elevations that are significantly higher than the corresponding elevations in the SRTM DEM [15]. In essence, we removed all waveforms where the absolute difference is higher than 100 m; • Waveforms with a difference between waveform extent ( , height between toploc and botloc, [13]) and (Gloc-Vloc) higher than 400 bins (corresponding to 60 m).

Digital Elevation Model Metrics
The Shuttle Radar Topography Mission's (SRTM) Digital Elevation Model (DEM) with a spatial resolution of 30 m was used is this study. Two variables were derived from the DEM for each GEDI footprint: slope (S), and surface Roughness (ROUG). The surface roughness map was obtained by computing the standard deviation of the elevation in a 3 × 3 pixelmoving window.

Stand Scale Dominant Heights Estimation
In this section, we evaluate five models for the estimation of stand-scale dominant heights ( ) from GEDI data acquired over terrain with different slopes. Two models are based on GEDI metrics and terrain information from the SRTM DEM and is estimated through linear regression, while the remaining three will be exclusively based on GEDI metrics where is estimated using random forest regression algorithms. The first tested model is based on the formulation by Lefky et al. [13]: where is the waveform extent (height difference between botloc and toploc) in meters, S is the terrain slope in degrees, a is the coefficient applied to the waveform height index, b is the coefficient applied to the slope, and c is a constant.
In the study of Fayad et al. [7], it has been shown that the relative height metric (RH100) (Figure 2) is better correlated to in situ than the waveform extent. Therefore, a modified version of Equation (2) will also be tested. Equation (3) will use the RH100 instead of the : We will also attempt to estimate through nonlinear nonparametric regressions by means of a Random Forest regressor (RFH). Random Forests are an ensemble of machine learning algorithms used for classification or regressing by fitting a number of decision trees on various sub-samples of the dataset, and use averaging to improve the predictive accuracy and control over-fitting [35]. Compared to linear models, RF is advantageous for being able to model also nonlinear relationships (threshold effect) between the variable to explain and the explanatory variables. The suggested model will use the relative height metrics {10% ≤ ≤ 100%, 10%}, as well as the terrain roughness (ROUG), and terrain slope (S).
The models integrating terrain information, such as the models provided by Lefsky et al. [13], have been proven to increase the accuracy on estimation over sloping terrain [11]. However, the effect of slopes is not completely eliminated. For instance, in the study of Xing et al. [23], the accuracy of the method proposed by Lefsky decreased with the increasing slope. Indeed, the RMSE increased from of 2.87 m (R 2 of 0.89) for slope ranges 0-5° to 5.97 m (R 2 of 0.08) for slope ranges 0-30°. Therefore, Wang et al. [26] proposed a method relying on new waveform metrics in order to reduce the effects of slopes. The method of Wang et al. [26] comprises three steps in order to generate the new waveform metrics. First, a waveform over bare grounds with the same slope value as the studied waveform is simulated. The simulated waveform is based on a Gaussian function, resembling the Laser pulse used by FW LiDARs, and thus has the following form (assuming a nadir-viewing angle): where σ and A are respectively the standard deviation and amplitude of simulated echoed waveform, and x the waveform sample locations at 1 ns (15 cm) intervals. For simplicity, the amplitude (A) is set to one. The standard deviation of the waveform, which affects its width, is dictated by the characteristics of the LiDAR system used. Over flat bare grounds, the standard deviation of the echoed waveform is defined as follows: where c is the speed of light (3 ⋅ 10 8 m/s), t is the pulse width of the LiDAR system (t = 15.6 ns for GEDI [30]), and PT = 0.5 (half the amplitude points of the pulse width).
Over sloping terrain, the waveform extent will increase with the terrain slope even if the canopy is the same, and the echoed waveform will exhibit a broadening of the ground return (Figures 3 and 4). Therefore, the standard deviation value to simulate a bare ground return (Equation (5)) should be broadened in accordance to the terrain slope. The standard deviation on a slope terrain is computed from by adding a broadening effect: where is obtained from Equation (5), β = 0.5 for waveforms simulated over forest stands [36], γ is the footprint diameter (γ = 25 m for GEDI), and finally, θ is the terrain slope in degrees (°).  After determining the shape of the waveform over a bare ground with known slope (Figure 5.a), the simulated ground return is superposed over the studied GEDI waveform by the position of the signal end (botloc) of both waveforms ( Figure 5.b). For the simulated waveform, toploc is determined as the first position where the amplitude of the simulated waveform is above zero, while botloc is the last position where the amplitude is above zero. After the superposition of the original waveform and the simulated ground waveform, the metrics ("s" for simulated) can be generated, where = − , and is the height between botloc and the position at n% energy of the original waveform, and is the height between botloc and the position at n% energy of the simulated ground waveform (Figure 5.b).
To estimate the AGB, Wang et al. [26] relied on a linear regression model using four ( 25 , 50 , 75 , and 100 ). However, since random forests are more accurate than linear regression models for the estimation of forest characteristics (e.g., canopy heights, AGB) [7], in this study, and will be used in a random forest regression model using nine and nine values ( 20% ≤ ≤ 100%, 10%). The method proposed by Wang et al. [26] relies on slope information from the SRTM DEM which has a resolution of 30 m while the diameter of GEDI is 25 m. Therefore, in order to analyze the pertinence of the Wang method, we propose a methodology that relies on the same metrics, but instead of simulating a ground return, the metrics will be calculated from the ground return of the original waveform (henceforth referred to as , "f" for fitted). The original ground return will be fitted by means of an automated Gaussian decomposition of the original waveform [37]. Figure 6 shows the difference over the ground return between a simulated ground return, and fitted ground return. A summary of the models that will be tested for the estimation of are presented in Table 2. For the random forest-based models, they are built using a set of 500 trees (higher tree count slightly increased the model accuracy), with a tree depth equal to the square root of the number of available factors. Model performance is assessed using a 5fold cross validation, and the Eucalyptus stands used for training or validation were selected randomly regardless of the terrain slope. In addition, we imposed that footprints belonging to the same stand were assigned exclusively to one of the data partitions (train or test) with the aim to avoid possible non-independence of the data due to spatial proximity in the evaluation procedure. Finally, model performances are evaluated by means of the coefficient of determination (R 2 ), the root mean square error (RMSE), the bias (difference between estimated and in situ variables), and the root mean squared percentage error (RMSPE). RMSPE is defined as: where is the measured variable and is the predicted variable.

Wood Volume Estimation
Four models will be tested for the estimation of wood volume. The first model is adapted from Saatchi et al. [38] and uses a power law relationship between the volume and Lorey's height. The model will also use the terrain slope (S) in order to compensate for the terrain slope effect: where HL is Lorey's height and S is the terrain slope. In this study, the relationship defined in Equation (8) was used by replacing Lorey's height with 100 (representing the dominant height ) as both height values were similar (HL was lower than by a maximum of 0.9 m at the end of the rotation of the Eucalyptus plantation) [15].
The second model that will be tested is based on a random forest regressor using the relative height metrics {10% ≤ ≤ 100%, 10%}, as well as the terrain roughness (ROUG), and terrain slope (S).
Finally, the metrics generated by Wang et al. [26] will be used in a random forest regressor in order to estimate the wood volume. Two sets of metrics will be used: (1) the and as defined by Wang et al. [26], and (2) the and which rely on a Gaussian decomposition. A summary of the tested models for the estimation of wood volume is presented in Table 3. Table 3. List of the models used for the estimation of wood volume (V).

Estimation of Dominant Stand Heights ( )
The results presented in Figure 7 and Table 4 show the accuracy of the estimation of from the models presented in Table 2 over three slope ranges (0-10%, 10-20%, and between 20 and 45%). For slope ranges 0-10%, the accuracy of the estimation of using MH1 was the lowest with an RMSE of 2.06 m (R 2 = 0.81). For the remaining models, the RMSE of the estimates were similar and ranged between 1.35 m (R 2 = 0.93, ) and 1.42 m (R 2 = 0.93, fRFHRHT+HG). For slope ranges 10-20%, the models did not show any decrease in performance due to slope, except for the Wang model (sRFHRHT+HG) which had a 30 cm increase in RMSE (Table 4) and a 1% increase in RMSPE. For slopes higher than 20%, all models except for (fRFHRHT+HG) had a decrease in accuracy with increased slopes. Indeed, for slopes higher than 20%, the RMSE ranged between 1.65 m (R 2 of 0.86, RFHRH) and 3.26 m (R 2 of 0.45, Model 1) (Table 4). Moreover, the bias (estimated -in situ ) was ~1.2 m using Models MH1 and MH2, and 0.65 m for (RFHRH). The (sRFHRHT+HG) and (fRFHRHT+HG) models were the two models that did not show high sensitivity to terrain slopes (no bias) even for slopes higher than 20% (Figure 7, Table 4). Nonetheless, the model (fRFHRHT+HG) was slightly more accurate for slope ranges higher than 10%, where the RMSPE on the estimation of remained 6% (RMSE of 1.26 m, R 2 of 0.92), with a slight bias of 0.11 m. Figure 7. Comparison of the measured vs. estimated dominant height from the models presented in Section 3.1 ( Table 2) using GEDI metrics extracted with the processing configuration "a1" of GEDI data. An analysis of the slope effects on the accuracy of is presented in Figure 8, which shows the variability of the difference between the estimated and in situ . The results shown in Figure 8 indicate that the slope effect on estimates using the models (MH2) and (RFHRH) were not completely eliminated for slopes higher than 15%. Indeed, for both models, the median difference between estimated and in situ increased from −0.08 and −0.06 for, respectively, the models (MH2) and (RFHRH) for slope ranges ]10-15%] to 1.2 and 0.59 m for slope ranges higher than 20% (between 20 and 45%). In contrast, the slope effects on estimates using the Wang-based methodology (sRFHRHT+HG, or fRFHRHT+HG) was greatly reduced, and the median difference between estimated and in situ ranged between −0.096 and 0.0.12 m for (sRFHRHT+HG) and between −0.25 and 0.17 m for (fRFHRHT+HG).

Figure 8.
Box-whiskers plots of the difference between estimated and in situ using four estimation models across five slope ranges. Numbers in parenthesis represent the number of points in each slope class.

Estimation of Wood Volume (V)
Four models were tested for the estimation of the wood volume (V) in Eucalyptus stands. The results presented in Figure 9 and Table 5 show that the models (MV1) and (RFVRH) are sensitive to slopes with increasing RMSE on the estimation of V (decrease of R 2 ) with increasing slope. For both models, the RMSE increased from about 27.5 m 3 ·ha −1 (R 2 ~0.9) to about 48.7 m 3 ·ha −1 (R 2 ~0.75) when the terrain slope increases from [0-10%] to slope values higher than 20% (between 20 and 45%). The mean difference between estimated V and in situ V (bias, Table 5) also decreased from, respectively, 0.21 m 3 ·ha −1 and −1.60 m 3 ·ha −1 for MV1 and RFVRH for slopes between 0-10% to, respectively, 13.95 m 3 ·ha −1 and 11.42 m 3 ·ha −1 for MV1 and RFVRH for slopes higher than 20%. The Wang-based methodology (sRFHRHT+HG) did not show an increased overestimation by increased slopes (bias between −2.66 and −5.91 m 3 ·ha −1 , Table 5). Nonetheless, the (sRFHRHT+HG) model showed lower accuracy for slopes higher than 10%. Indeed, for slopes between 0-10%, the RMSE of the estimation of V using the (sRFHRHT+HG) model was 27.57 m 3 ·ha −1 (R 2 of 0.91) and decreased to 53.82 m 3 ·ha −1 (R 2 of 0.79) for slopes between 10-20% and 49.12 m 3 ·ha −1 (R 2 of 0.74) for slopes higher than 20%. Finally, the modified Wang model (fRFHRHT+HG) which relies on metrics derived from the fitted ground return of the waveforms, similarly to estimates, was the most accurate model. Indeed, the results presented in Figure 9 and Table 5 show that the estimation of V using the model (fRFHRHT+HG) was the most accurate in comparison to the three other models with an RMSE ranging between 26.78 m 3 ·ha −1 (RMSPE of 20%, R 2 of 0.92) for slopes between 0-10% to 36.29 m 3 ·ha −1 (RMSPE of 20%, R 2 of 36.29 m 3 ·ha −1 ) for slopes higher than 20%. Moreover, the mean difference between the estimated V and in situ V using the model (fRFHRHT+HG) remained relatively stable with a mean difference of 1.32 m 3 ·ha −1 for slopes between 0-10% to −2.65 m 3 ·ha −1 for slopes higher than 20% (Table  5).  9. Comparison of measured vs. estimated wood volume from the models presented in Section 3.2 using GEDI metrics extracted using algorithm a1. RMSE is expressed in m 3 .ha -1 .
The variability of the difference between the estimated and in situ V for the four tested models across five slopes ranges is presented in Figure 10. As seen previously, MV1 and RFVRH both show sensitivity to slopes higher than 10%. This is evident by the increased median difference between the estimated and in situ V. For the model (MV1, Figure 10), the median difference between the estimated and in situ V increased from −0.23 m 3 ·ha −1 for slopes between 0-5% to 24.32 m 3 ·ha −1 for slopes higher than 20%. Similarly, for the model (RFVRH), the median difference between the estimated and in situ V increased from −1.53 m 3 ·ha −1 for slopes between 0-5% to 20.20 m 3 ·ha −1 for slopes higher than 20%. The models sRFVRHT+HG and fRFVRHT+HG were both insensitive to slopes with a median difference between the estimated and in situ V ranging from 0.58 m 3 ·ha −1 (slopes ∊ [0-5]%) and 5.42 m 3· ha −1 (slopes > 20%) for the model sRFVRHT+HG and from 1.21 m 3 ·ha −1 (slopes ∊ [0-5]%) and 2.16 m 3 ·ha −1 (slopes > 20%) for the model fRFVRHT+HG ( Figure 10). Nonetheless, the model sRFVRHT+HG showed higher variability on the estimates of V for slopes higher than 10% in comparison to the model fRFVRHT+HG ( Figure 10). Figure 10. Box-whiskers plots of the difference between the estimated and in situ using four estimation models across five slope ranges. Numbers in parenthesis represent the number of points in each slope class.

Discussion
The results presented in this study show that the models relying on the SRTM DEM generated terrain metrics provide a limited slope-effect correction of and wood volume (V) estimates from GEDI data, especially for high slope values (e.g., higher than 20%). Indeed, for MH2 and RFHRH the effect of the slopes was minimal for slope ranges 0-20% and increased for slopes higher than 20%, with an increase on the RMSE of 63 cm for model 2 and 19 cm for the model RFHRH. Nonetheless, the effect of the terrain slope was more pronounced on the estimation of V, and a decrease in accuracy (RMSE) was observed for slopes as low as 10% (e.g., for the RFVRH model, the RMSE decreased from 26.55 m 3 ·ha −1 for terrain slopes lower than 10% to more than 46 m 3 ·ha −1 for terrain slopes higher than 10%). These results indicate that the SRTM DEM with its 30-m resolution is not adequate for the 25-m wide GEDI footprints. Therefore, a finer resolution DEM, for example 10 m, is required for the 25-m GEDI footprints. However, the results showed that for V, the R 2 remained high and that the highest volumes were underestimated for high values for all slope classes. The difference in RMSE between the slope classes may therefore come from the fact that there were proportionally more data with high V for the two slope classes 10-20% and 20-45% than for the 0-10% class.
The methodology proposed by Wang using metrics generated from either simulated or fitted ground returns provided the best results, as both approaches showed that the effects of slopes were minimized for all slope classes available in this study. Indeed, among all the generated metrics from GEDI waveforms, the and metrics were the only metrics that were independent from the slope. Figure 11 shows that the distribution of errors (GEDI-in situ) of 100 (Figure 11.a) and 100 (Figure 11.b) was slope dependent, especially for slopes higher than 10% with an intercept of, respectively, ~0.04 and ~0.07. In contrast, the distribution of errors for 100 and 100 (Figures 11.c, and  11.d) was constant across all slope gradients (intercept ≈ 0.003). Nonetheless, the estimation of and V using and was slightly less accurate than and . The uncertainties on the estimation of both variables using the and metrics could be attributed to two factors. Firstly, the simulated ground return used to calculate the and is dependent on the slope of the studied waveform, which is calculated from the SRTM DEM. Given the 30-m resolution of the SRTM DEM, and the 25-m footprint diameter, terrain information within the footprint could not be accurately calculated. This is evident in Figure 12 which compares the full width at half maximum (FWHM) of the simulated and fitted ground returns. The results presented in Figure 12 show that the FWHM of both the simulated and fitted ground returns are highly correlated for slopes lower than 10% but for slopes higher than 10%, the FWHM of the simulated ground returns are lower than the FWHM of the fitted ground returns, leading to the uncertainties on the estimation of and V. Secondly, in the formulation of Wang et al. [26], the ground returns were simulated using a symmetrical Gaussian function [39,40] assuming a plane slope within the 25-m footprint. Nevertheless, this formulation is not always satisfactory as the return echo components recorded by FW LiDAR systems are asymmetrical. Thus, the fitting process has lower accuracy when the echoed asymmetrical components are fitted using a symmetrical Gaussian function. Moreover, GEDI waveforms display a sharp rising part and a slower descending one. As such, a lognormal function, which is characterized by a short rise time and a tailing, could be a better fit for the simulation of the ground echo than a Gaussian.  Finally, while the metrics proposed by Wang et al. [26] seem to greatly minimize the effect of slopes over our study area, there are several uncertainties still unaccounted for. First, within the Eucalyptus stands, tree heights are very homogeneous (i.e., the tree height is evenly distributed within the footprints). However, it was reported by Hyde et al. [41] that the uneven distribution of canopy structure could increase the uncertainty on the estimation of canopy structure with FW LiDAR as the dominant trees at the edge of the LiDAR footprint might not be detectable due to the lower laser energy at the edge of the footprint in comparison to its center. Therefore, in a future work, the metrics proposed by Wang et al. [26] should be assessed with GEDI acquisitions over natural forests. Another source of uncertainty could be present for slopes higher than 20%. In this study, there were very few acquisitions with slopes higher than 20%, and while the slope effects were greatly reduced for slope ranges 0-20%, it is necessary to analyze the pertinence of the Wang et al. [26] methodology for high slopes.

Conclusions
In this study, several approaches were tested in order to minimize the uncertainties due to the presence of slopes on the estimation of forest canopy heights and wood volume from GEDI data. The tested approaches can be classified into two groups. The first group are methods incorporating traditional GEDI waveform metrics (e.g., , ) and ancillary SRTM DEM data (e.g., terrain slope S, surface roughness ROUG), while the second group is based solely on new GEDI waveform metrics. The results showed that the methods relying on ancillary SRTM DEM data provided limited correction capabilities for slopes higher than 20% and this for both canopy heights and wood volume estimates. Indeed, the random forest regression model (RFHRH) using the relative height metrics ( ) as well as the S and ROUG variables extracted from the SRTM DEM presented an increase of 14% in terms of RMSE (8% decrease in R 2 ) on the estimates for acquisitions over terrain slopes between 10-20% and slopes higher than 20%. Moreover, the same model when used to estimate the wood volume (RFVRH) showed a decrease in accuracy (increase in RMSE) for slopes higher than 10%. Indeed, for the model RFVRH an RMSE increase on the estimation of the wood volume of 74% was reported between GEDI acquisitions with terrain slopes between 0-10% and GEDI acquisitions with terrain slopes higher than 10%. These results indicate that the 30-m resolution SRTM DEM is not suitable for the 25-m wide GEDI footprints.
Next, building on the model of Wang et al. [26], two sets of metrics were generated for GEDI waveforms, the first set of metrics ( and ) were generated using a simulated ground return that varied based on the slope of the GEDI footprint. The second set of metrics ( and ) were generated using a fitted Gaussian from the ground return. Estimation approaches of and V using these sets of metrics and the Random Forest technique provided the most accurate estimates of and V for all terrain slope ranges. The ( and )-based model showed an RMSE that ranged between 1.39 and 1.66 m (between 26.76 and 39.26 m 3 ·ha −1 for V) while the ( and )-based method showed an RMSE that ranged between 1.26 and 1.34 m (between 26.78 and 36.29 m 3 ·ha −1 for V). Moreover, the dependency of the GEDI metrics on slopes (e.g., intercept of 0.069 for 100 ) was greatly reduced for the two set of metrics (e.g., intercept of ≈ 0.003 for both 100 and 100 ). Nonetheless, the model based on the ( and ) performed slightly worse than the ( and )-based model for the estimation of both forest variables. The decrease in accuracy of the ( and )-based model is due to the use of the 30-m SRTM DEM (the only available global DEM) for the 25-m GEDI footprints. On the other hand, the ( and ) metrics rely on the presence of a distinct ground return in the received waveform which might not always be detectable over high sloping terrain. Therefore, it is recommended that for the estimation of and V over moderately sloping terrain (i.e., presence of distinct ground peak) the ( and ) metrics should be used, while the ( and ) metrics should be used for high sloping terrain (i.e., undetectable ground return).
Finally, the effect of slopes on the 25-m GEDI footprints is rather low as the estimation on canopy heights degraded by a maximum of 1 m for slopes between 20 and 45%. In regard to the wood volume estimation, the effect of slopes was more pronounced, and a degradation on the accuracy (increased RMSE) of a maximum of 20 m 3 ·ha −1 was observed for slopes between 20 and 45%.