Analysis of GEDI Elevation Data Accuracy for Inland Waterbodies Altimetry

: The Global Ecosystem Dynamics Investigation (GEDI) Light Detection And Ranging (LiDAR) altimetry mission was recently launched to the International Space Station with a capability of providing billions of high-quality measurements of vertical structures globally. This study assesses the accuracy of the GEDI LiDAR altimetry estimation of lake water levels. The di ﬀ erence between GEDI’s elevation estimates to in-situ hydrological gauge water levels was determined for eight natural lakes in Switzerland. The elevation accuracy of GEDI was assessed as a function of each lake, acquisition date, and the laser used for acquisition (beam). The GEDI elevation estimates exhibit an overall good agreement with in-situ water levels with a mean elevation bias of 0.61 cm and a standard deviation (std) of 22.3 cm and could be lowered to 8.5 cm when accounting for instrumental and environmental factors. Over the eight studied lakes, the bias between GEDI elevations and in-situ data ranged from − 13.8 cm to + 9.8 cm with a standard deviation of the mean di ﬀ erence ranging from 14.5 to 31.6 cm. Results also show that the acquisition date a ﬀ ects the precision of the GEDI elevation estimates. GEDI data acquired in the mornings or late at night had lower bias in comparison to acquisitions during daytime or over weekends. Even though GEDI is equipped with three identical laser units, a systematic bias was found based on the laser units used in the acquisitions. Considering the eight studied lakes, the beams with the highest elevation di ﬀ erences compared to in-situ data were beams 1 and 6 (standard deviations of − 10.2 and + 18.1 cm, respectively). In contrast, the beams with the smallest mean elevation di ﬀ erence to in-situ data were beams 5 and 7 ( − 1.7 and − 2.5 cm, respectively). The remaining beams (2, 3, 4, and 8) showed a mean di ﬀ erence between − 7.4 and + 4.4 cm. The standard deviation of the mean di ﬀ erence, however, was similar across all beams and ranged from 17.2 and 22.9 cm. This study highlights the importance of GEDI data for estimating water levels in lakes with good accuracy and has potentials in advancing our understanding of the hydrological signiﬁcance of lakes especially in data scarce regions of the world.


Introduction
Freshwater resources are a renewable resource that is vital for the sustainability of all life forms on Earth. Surface fresh waters, which are found in the form of snow and glaciers, rivers, lakes,

In-Situ Water Levels from Gauge Stations
Water level records from in-situ gauge stations over lakes were obtained free of charge from the Hydrology Department of the Federal Office for the Environment (FOEN) (www.hydrodaten.admin.ch). FOEN currently monitors the quantity and quality of surface water and groundwater through a network of 260 gauging stations across Switzerland. In this study, the comparison between GEDI footprint elevations and the in-situ lake water elevations was done by comparing the footprint elevation at GEDI acquisition time and the reported in-situ daily-mean water level elevation corresponding to the GEDI acquisition date.

GEDI Data Products
The Global Ecosystem Dynamics Investigation (GEDI) onboard the International Space Station (ISS), which commenced operations in early 2019, uses three onboard lasers that produce eight parallel tracks (beams) of observations. GEDI lasers illuminate a surface or footprint on the ground with a 25 m diameter, at a frequency of 242 Hz, over which 2D or 3D structures are measured. The footprints are separated by~60 m (center to center) along the beam, and the beams are separated by~600 m. As the ISS is not maintained in a repeating orbit [25], the repeat cycle of GEDI acquisitions are not guaranteed. However, GEDI has the ability to rotate the instrument up to six degrees, allowing the lasers to be pointed as much as 40 km on either side of the ISS's ground track [25]. Over our studied lakes, there are on average two acquisition series per lake during the first two months of available GEDI data (Table 1), which correspond to the time period between mid-April 2019 and mid-June 2019. GEDI measures vertical structures using a 1064-nm laser pulse, and the echoed waveforms are digitized to a maximum of 1246 bins with a vertical resolution of 1 ns (15 cm), corresponding to a maximum of 186.9 m of height ranges, with a vertical accuracy over relatively flat, non-vegetated surfaces of~3 cm [30].
In order to measure 3D structures, GEDI uses its onboard telescope that collects the light reflected by the ground, vegetation, and even clouds. The collected light, which represents the amount of laser energy reflected from surface objects within the footprint at different heights, is converted to voltage, and recorded as a function of time in 1 ns intervals. Then, object heights are calculated by multiplying the recorded time by the speed of light, which produces the full-waveform. The recorded waveform can then be used to derive a variety of height metrics, such as vegetation canopy heights, canopy vertical profiles, and relative height (RH, i.e., vertical distribution relative to the ground).
GEDI is also capable of deriving topographic elevations in the same manner as conventional radar altimeters, i.e., by calculating the range from the surface to the system, which is then converted to the sea surface's height above a reference ellipsoid. Over flat surfaces, such as water surfaces, or bare-grounds, the recorded waveform has a Gaussian form (single peak or mode) similar in shape to the transmitted pulse (Figure 2a). Waveforms recorded within footprints over complex geometries (e.g., forests) will be multi-modal in shape, with each mode representing a reflection from a distinct surface height (Figure 2b). Therefore, in order to precisely estimate water surface elevation, the location of the mode in the waveform, representing the water surface, should be determined as accurately as possible.
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 23 Typical GEDI waveforms over a lake (a) and a forest stand (b). An unusable waveform due probably due to cloud conditions is shown in (c).
Before any metric can be determined in the received waveforms, the first waveform processing step is, as described in the Algorithm Theoretical Basis Document (ATBD) [31,32], consists in the smoothing of waveforms. Waveform smoothing allows minimizing the noise in the signal, and thus permitting the determination of the useful part of the waveform within the corresponding footprint. Waveform smoothing is performed by means of a Gaussian filter with various widths. As mentioned in the ATBD, currently a width of 6.5 ns was used for the Gaussian filter (smooth width). After smoothing, two locations in the waveform denoted as searchstart and searchend are determined ( Figure  3). searchstart and searchend are, respectively, the first and last positions in the signal where the signal intensity is above the following threshold: where 'mean' is the mean noise level, 'σ' is the standard deviation of noise of the smoothed waveform, and 'v' is a variable, currently set at 4. After determining the locations of searchstart and searchend, the region between them, denoted as the waveform extent, is extended by a predetermined number of sample bins, currently set to 100 bins at both sides. Inside the waveform extent, the highest (toploc) and lowest (botloc) detectable returns are determined ( Figure 3). toploc and botloc, respectively, represent the highest and lowest locations inside the waveform extent were two adjacent intensities are above a threshold. The threshold equation used to determine toploc and botloc is the same as Equation (1), with 'v' an integer fixed to 2, 3, 4, and 6. In the ATBD, the value of 'v' used to determine toploc is named 'Front_threshold' and 'Back_threshold for botloc. Currently, six configurations or algorithms, representing different threshold and smoothing settings, were used to determine Typical GEDI waveforms over a lake (a) and a forest stand (b). An unusable waveform due probably due to cloud conditions is shown in (c).
Before any metric can be determined in the received waveforms, the first waveform processing step is, as described in the Algorithm Theoretical Basis Document (ATBD) [31,32], consists in the smoothing of waveforms. Waveform smoothing allows minimizing the noise in the signal, and thus permitting the determination of the useful part of the waveform within the corresponding footprint. Waveform smoothing is performed by means of a Gaussian filter with various widths. As mentioned in the ATBD, currently a width of 6.5 ns was used for the Gaussian filter (smooth width). After smoothing, two locations in the waveform denoted as searchstart and searchend are determined (Figure 3). searchstart and searchend are, respectively, the first and last positions in the signal where the signal intensity is above the following threshold: where 'mean' is the mean noise level, 'σ' is the standard deviation of noise of the smoothed waveform, and 'v' is a variable, currently set at 4. After determining the locations of searchstart and searchend, the region between them, denoted as the waveform extent, is extended by a predetermined number of sample bins, currently set to 100 bins at both sides. Inside the waveform extent, the highest (toploc) and lowest (botloc) detectable returns are determined ( Figure 3). toploc and botloc, respectively, represent the highest and lowest locations inside the waveform extent were two adjacent intensities are above a threshold. The threshold equation used to determine toploc and botloc is the same as Equation (1), with 'v' an integer fixed to 2, 3, 4, and 6. In the ATBD, the value of 'v' used to determine toploc is named 'Front_threshold' and 'Back_threshold for botloc. Currently, six configurations or algorithms, representing different threshold and smoothing settings, were used to determine waveform metrics with high precision in a variety of acquisition scenarios (Table 2). Finally, the location of distinctive peaks or modes in the waveform, such as the ground peak, or top of canopy peaks are determined using a second Gaussian filtering of the waveform section between toploc and botloc, and then finding all the zero crossings of the first derivative of the filtered waveform ( Figure 3). The width of the second Gaussian filter (Smoothwidth_zcross) is fixed to either 3.5 or 6.5 ns. Finally, the position of the ground return within the waveform is determined using the position of the last detected peak. Therefore, the geolocation (longitude, latitude, and elevation) of the ground return is interpolated using its offset to the start of the received waveform. The six different algorithms used for the detection of the waveform metrics, generally lead to six different elevations of the ground return. However, since waveforms acquired over water are in general uni-modal waveforms, only two different sets of algorithms produce different elevations. Sets 1 and 4 are similar, and sets 2, 3, 5, and 6 are similar. Therefore, in this study, which is carried on waveforms acquired over water surfaces, only the elevations produced from algorithms 1 and 2 were analyzed.
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 23 return is interpolated using its offset to the start of the received waveform. The six different algorithms used for the detection of the waveform metrics, generally lead to six different elevations of the ground return. However, since waveforms acquired over water are in general uni-modal waveforms, only two different sets of algorithms produce different elevations. Sets 1 and 4 are similar, and sets 2, 3, 5, and 6 are similar. Therefore, in this study, which is carried on waveforms acquired over water surfaces, only the elevations produced from algorithms 1 and 2 were analyzed.  GEDI data used in this study are already processed and published by the Land Processes Distributed Active Archive Center (LP DAAC). Currently, there are three data products (L1B, L2A, and L2B) that are available for download. The L1B data product [30] contains detailed information about the transmitted and received waveforms, the location and elevation of each waveform footprint, and other ancillary information, such as mean and standard deviation of the noise, and acquisition time. The L2A data product [31] contains data of elevation and height metrics of the vertical structures within the waveform. These height metrics are issued from the processing of the received waveforms from the L1B data product. Finally, the L2B data product [32] provides footprintlevel vegetation metrics, such as canopy cover, vertical profile metrics, Leaf Area Index (LAI), and foliage height diversity (FHD). In this study, the received waveforms, their geolocation (longitude, and latitude), as well as their acquisition times, were extracted from the L1B data product. In the L2A  GEDI data used in this study are already processed and published by the Land Processes Distributed Active Archive Center (LP DAAC). Currently, there are three data products (L1B, L2A, and L2B) that are available for download. The L1B data product [30] contains detailed information about the transmitted and received waveforms, the location and elevation of each waveform footprint, and other ancillary information, such as mean and standard deviation of the noise, and acquisition time. The L2A data product [31] contains data of elevation and height metrics of the vertical structures within the waveform. These height metrics are issued from the processing of the received waveforms from the L1B data product. Finally, the L2B data product [32] provides footprint-level vegetation metrics, such as canopy cover, vertical profile metrics, Leaf Area Index (LAI), and foliage height diversity (FHD). In this study, the received waveforms, their geolocation (longitude, and latitude), as well as their acquisition times, were extracted from the L1B data product. In the L2A data product, the derived metrics are also grouped by algorithm. Therefore, for each beam, the metrics derived from each of the six algorithms, as well as the parameters used for each algorithm, are available. Therefore, we extracted from L2A for each beam, and for each of algorithms 1 and 2, the following variables: (1) the position within the waveform, as well as the elevation of toploc and botloc, (2) the latitude and longitude, as well as the elevation of the lowest peak or mode, (3) the amplitude of the smoothed waveform's lowest detected mode (zcross_amp), (4) the width of Gaussian fit of the received waveform (rx_gwidth), and (5) the number of detected modes (num_detectedmodes). No metrics were extracted from the L2B data product as they were not relevant to this study.

Filtering of GEDI Waveforms
Not all GEDI acquisitions are viable, as atmospheric conditions and clouds can affect them ( Figure 2c). Therefore, two filters were applied to remove erroneous lower quality returns. The first filter applied removes waveforms with reported elevations that are significantly higher than the corresponding Shuttle Radar Topography Mission (SRTM) DEM elevation [33] (i.e., we removed all waveforms were |GEDI elevation-SRTM| > 100 m). Since we are only interested with waveforms that are acquired over water, we removed all waveforms having two or more peaks or modes. A multi-modal waveform is a strong indication that the waveform was acquired over areas with complex geometry (e.g., vegetation or considerable relief). Information regarding the number of detected modes for each waveform were acquired from the L2A data product. Over the eight studied lakes, 21242 GEDI shots were available for comparison with the lake gauge data ( Figure 1, Table 1). From these shots, only 4637 (21.8%) provided exploitable waveforms.
GEDI data accessible through NASA's LP DAAC contain a quality flag (quality_flag) for each acquired waveform. A waveform with a quality flag set to '1 indicates that the waveform meets certain criteria based on energy, sensitivity, amplitude, and real-time surface tracking quality, and thus can be processed further. However, in this study, after the application of the SRTM DEM filter, waveforms with either value of the quality_flag (0 or 1) showed similar characteristics (e.g., defined single peak, high signal to noise ratio, etc.). Therefore, all waveforms were analyzed regardless of the value of the quality_flag.

Transformation of GEDI Elevations
In order to conduct a consistent analysis between the elevations provided by GEDI and water elevation from gauge stations, the heights from both datasets must refer to the same vertical datum. In this study, the geolocated GEDI waveform elevations are relative to the WGS 84 ellipsoid, while gauge stations are provided as orthometric heights with reference to the French height system (NGF-IGN69) for lakes Geneva and Neuchâtel, and the Swiss height measurement reference system (LN02) for the other lakes. The tide gauge at Marseille determines the 'zero level' for all elevations in France, while the reference for all height measurements in Switzerland is the "Repère Pierre du Niton" in the harbor of Geneva (stone). The elevation of this stone was evaluated in 1902 to be 373.6 m over sea level.
The conversion to orthometric heights of GEDI shots acquired over lakes Geneva and Neuchâtel was made using the following equation: Here, H IGN69 is the derived orthometric height of GEDI footprints from leveling with respect to NGF-IGN69, h wgs84 is the GEDI footprint elevation above the WGS 84 ellipsoid, and N IGN69 are the French gravimetric geoid heights (e.g., between 48.49 and 50.41 m for Lake Geneva). The value for Remote Sens. 2020, 12, 2714 9 of 22 N IGN69 was obtained by bilinear interpolation of a 1 km NGF-IGN69 Geoid Height Grid provided by the French National Institute of Geographic and Forest Information (IGN) (geodesie.ign.fr).
For the remaining lakes, to convert between ellipsoidal elevations and orthometric heights with respect to LN02, a two-step process is required. First, ellipsoidal elevations of GEDI footprints were converted to orthometric elevations with respect to the new Swiss height system LHN95 (Landeshöhennetz 1995) using the following equation: where H LHN95 is the converted GEDI footprint elevation with respect to LHN05, and N CHGEO2004 the Swiss gravimetric geoid heights. Then, GEDI footprint elevations, which are now orthometric elevations with respect to LHN95, are converted to the Swiss height system (LN02) by means of three grids. Three grids are required, as height conversion between LHN95 and LN02 cannot be modeled by a single offset. This is due to their different way of gravity reduction, the treatment of vertical movements, and the constraints introduced in LN02. Therefore, the conversion between orthometric LHN95 heights and LN02 heights was made using the following equation [34]: where H ln02 are the GEDI footprint elevations with respect to the Swiss height system (LN02), H norm is a 1 km grid describing the difference between LN02 and normal heights, H scale is a 1 km grid scale factor used to transform between normal heights and orthometric heights, ∆g boug is a 1 km grid representing the Bouguer anomalies, and g is the average normal gravity equal to 980,000 mGal. The Swiss geoid grid (CHGeo2004), as well as the three grids used in the transformation between LHN95 and LN02 heights, were obtained from the Swiss Federal Office of Topography (www.swisstopo.admin.ch).

Results
This section will begin with an analysis of two exemplary GEDI waveforms acquired on lakes ( Figure 2). Then, the remainder of the results section will analyze the quality of the GEDI elevations for each lake, date, and finally for each beam. Figure 2a shows a perfect example of a viable GEDI waveform over water surfaces (usable waveform with a high signal-to-noise ratio). The waveform presents a single distinct peak corresponding to the water surface, with very low noise level. In contrast, Figure 2c shows a GEDI waveform with very high noise level and no distinctive peaks, which renders such waveforms useless. The example waveform shown in Figure 2c could correspond to acquisitions in the presence of clouds over our study area.
The comparison between GEDI elevations and in-situ elevations registered from the hydrological gauge stations shows that the parameters used in algorithm a1 (Smoothwidth_zcross of 6.5 ns, Table 2) provide more precise elevations in comparison to algorithm a2 (Smoothwidth_zcross of 3.5 ns, Table 2). Using the entire database from all the lakes in this study (8 lakes and 4637 viable waveforms), GEDI footprint elevations in comparison to in-situ gauge station elevations showed a mean elevation difference of 0.61 cm with a1 and 7.8 cm with a2. The standard deviation of the mean difference between GEDI footprint elevations and gauge station readings is 22.3 cm using a1 and 23.7 cm using a2. The root mean square error (RMSE) on GEDI elevations is slightly higher using a2 with a value of 24.9 cm against 22.3 cm using a1.

Analysis of GEDI Waveforms for Each Lake
The precision of elevations estimated from GEDI waveforms was studied separately for each lake using all GEDI beams from all acquisition dates. Table 3 shows a mean difference (MD) between GEDI and in-situ elevations that varies between −13.8 cm (under-estimation by GEDI) and +9.8 cm (over-estimation by GEDI). The reported standard deviation from MD varied between 14.5 cm and 31.6 cm. Table 3. Summary statistics of elevations from GEDI acquisitions for each of the 8 studied lakes (Mean MD, standard deviation std, and root mean square error RMSE of the difference between GEDI elevations and in-situ elevations) using data from all acquisition dates given in Table 1 and from all the beams.  Figure 4a shows an example of GEDI data for a transect with its 8 beams, acquired on May 29th 2019 at 9:25 p.m. over lake Neuchâtel in Switzerland (GEDI elevations for all lakes can be found in Appendix A, Figure A1). This example shows what has also been observed over the other lakes, albeit with different elevation precision depending on the acquisition date, or beam. Over the transect in Figure 4a, the mean difference (MD) between elevations from GEDI and those reported by the gauge station varied between −6.4 (under-estimation by GEDI) for beam 1 and +45.2 cm (over-estimation by GEDI) for beam 6 ( Figure 4b). The standard deviation from MD varied between 5.5 cm for beam 7 and 17.5 cm for beam 4. Using elevations from all the beams acquired on May 29th 2019 over Lake Neuchâtel, the calculated MD was in the order of +6.1 cm (over-estimation by GEDI) with a standard deviation of 16.1 cm. Moreover, over some GEDI footprints, we observed on some beams, elevations that deviated greatly from the mean of all GEDI elevations, with some of these elevations being 50 cm further from the mean. Despite all verification, we were unsuccessful in explaining the reason for such elevation differences, even though these points were acquired in the middle of the lake, and their corresponding waveforms showed very high signal to noise ratio, and resembled in form to other waveform from other footprints. Table 4 shows the mean difference and the standard deviation between elevations from GEDI and in-situ gauge records, using data over all lakes, grouped by date. Results show that the mean difference (MD) between elevations from GEDI and in-situ gauges varied between −26.8 cm (under-estimation by GEDI) and +15.2 cm (over-estimation by GEDI). The lowest bias corresponded to data acquired the mornings of April 28, and May 02 and 04, or late at night on May 22. The highest recorded bias was observed on acquisitions that were made around noon (e.g., April 21, May 28, and June 08), in the early evening (May 29), GEDI acquisitions taken over the weekend (e.g., April 20 and 21, June 08), or before a holiday (e.g., May 22). These strong biases could be due to several phenomenon. (1) Increased perturbations of the water surface due to human activities taking place at these times. The reported standard deviation from MD shows that it varies between 12.7 cm and 24.9, with a standard deviation lower than 15 cm for morning acquisitions (e.g., April 28, May 02, and 04), with the exception of June 08, which corresponds to acquisitions taken around noon. (2) Currents generated by thermal effects or winds [35,36].

Analysis of GEDI Waveforms by Date
Neuchâtel, the calculated MD was in the order of +6.1 cm (over-estimation by GEDI) with a standard deviation of 16.1 cm. Moreover, over some GEDI footprints, we observed on some beams, elevations that deviated greatly from the mean of all GEDI elevations, with some of these elevations being 50 cm further from the mean. Despite all verification, we were unsuccessful in explaining the reason for such elevation differences, even though these points were acquired in the middle of the lake, and their corresponding waveforms showed very high signal to noise ratio, and resembled in form to other waveform from other footprints.     Figure 5 shows the summary of the statistics calculated from the difference between GEDI and hydrological gauge elevations for each date and each beam, using data from all lakes. These statistics were first calculated for each GEDI beam and for each date. Only the statistics with at least 30 GEDI shots for each date/beam pair are reported in this section. Results show that the bias (elevations from GEDI-elevations from gauge stations) varied depending on the acquisition date, and the beam. For certain beams, at certain dates, elevation from GEDI had the same order of magnitude as in-situ elevations, while, for other dates, and even for the same beam, the bias was very significant (Figure 5a). For example, on 20 April 2019, elevations from GEDI acquired over beam 1 showed a bias of −2.9 cm which increased to +21.9 cm on 8 June 2019. Similarly, Figure 5b shows that the standard deviation from the mean difference between GEDI and gauge station elevations varied between 4.6 cm (beam 6,  The statistics were then calculated for each GEDI beam, using all the dates. Results in Table 5 show that the difference between elevations from GEDI and in-situ gauge records differed according to the laser they were acquired with. The mean difference between GEDI and in-situ elevations varied between −10.2 cm (under-estimation by GEDI) and +18.1 cm (over-estimation by GEDI). The beams with the highest difference were beams 1 (coverage beam) and beams 6 (full power beam) with a mean difference of, respectively, −10.2 cm and +18.1 cm. In contrast, the beams that captured elevations with the smallest diversion from in-situ elevations were beams 5 and 7 (both full power beams), with a mean difference of −1.7 cm, and −2.5 cm, respectively, while the remaining beams (2,3, 4, and 8) show the mean observed difference varied between −7.4 cm and +4.4 cm. Finally, the standard deviation from the mean difference between elevations from GEDI and elevations from insitu gauges were similar across beams, with a standard deviation that varied between 17.2 cm and 22.9 cm. The statistics were then calculated for each GEDI beam, using all the dates. Results in Table 5 show that the difference between elevations from GEDI and in-situ gauge records differed according to the laser they were acquired with. The mean difference between GEDI and in-situ elevations varied between −10.2 cm (under-estimation by GEDI) and +18.1 cm (over-estimation by GEDI). The beams with the highest difference were beams 1 (coverage beam) and beams 6 (full power beam) with a mean difference of, respectively, −10.2 cm and +18.1 cm. In contrast, the beams that captured elevations with the smallest diversion from in-situ elevations were beams 5 and 7 (both full power beams), with a mean difference of −1.7 cm, and −2.5 cm, respectively, while the remaining beams (2, 3, 4, and 8) show Remote Sens. 2020, 12, 2714 13 of 22 the mean observed difference varied between −7.4 cm and +4.4 cm. Finally, the standard deviation from the mean difference between elevations from GEDI and elevations from in-situ gauges were similar across beams, with a standard deviation that varied between 17.2 cm and 22.9 cm. Table 5. Summary statistics for each GEDI beam (Mean MD and standard deviation std) for the difference between GEDI and in-situ elevations using all acquired GEDI waveforms (from all acquisition dates over all lakes). Finally, we present an analysis of the distribution of the difference between GEDI elevation estimates for each beam in comparison to in-situ elevations. The difference (D) between GEDI and in-situ elevations has been grouped into five intervals: (−100, −25), [−25, −10), [−10, +10), [+10, +25), and [+25, +100) cm. Figure 6 shows that the lowest elevation differences were obtained using beams 3, 4, and 8. Overall, GEDI elevations from beam 8 were the most accurate, followed by beams 3 and 4, then beams 1, 2, and 7, and finally beams 5 and 6 showed large differences between elevations from GEDI and those from in-situ gauge stations. For beam 8, 57% of the shots had a difference with in-situ elevations between −10 and 10 cm, followed by a small percentage of shots with D between −100 and −10 cm and between 10 and 100 cm. For beams 3 and 4, the difference between GEDI and in-situ elevations was between −10 and +10 cm for~50% of the shots, with a small percentage of shots with D between +25 and +100 cm (less than 5%), and between −100 and −25 cm (~14%). The difference in elevations D for beams 1, 2, and 7 was between −25 and +25 cm for, respectively, 78, 75 and 81% of the shots. Finally, for beam 6, 44% of the shots showed very high over-estimation of GEDI elevations (D between +25 and +100 cm), while, for beam 5, the elevation differences were distributed almost equally among the five classes of D. Moreover, results showed that almost 43% of the shots with an elevation difference D in the range (−100, −25 cm] or in the range [+25, +100 cm) were obtained from beams 5 and 6 (19.7% from beam 5, and 22.1% from beam 6). In contrast, only 5.3% and 6.5% of shots in beams 3 and 4 showed an elevation difference D in the range (−100; −25 cm] or in the range [+25;+100 cm).

GEDI-Hydrological
Remote Sens. 2020, 12, x FOR PEER REVIEW 14 of 23 Finally, we present an analysis of the distribution of the difference between GEDI elevation estimates for each beam in comparison to in-situ elevations. The difference (D) between GEDI and in-situ elevations has been grouped into five intervals: (−100, −25), [−25, −10), [−10, +10), [+10, +25), and [+25, +100) cm. Figure 6 shows that the lowest elevation differences were obtained using beams 3, 4, and 8. Overall, GEDI elevations from beam 8 were the most accurate, followed by beams 3 and 4, then beams 1, 2, and 7, and finally beams 5 and 6 showed large differences between elevations from GEDI and those from in-situ gauge stations. For beam 8, 57% of the shots had a difference with insitu elevations between −10 and 10 cm, followed by a small percentage of shots with D between −100 and −10 cm and between 10 and 100 cm. For beams 3 and 4, the difference between GEDI and in-situ elevations was between −10 and +10 cm for ~50% of the shots, with a small percentage of shots with D between +25 and +100 cm (less than 5%), and between −100 and −25 cm (~14%). The difference in elevations D for beams 1, 2, and 7 was between −25 and +25 cm for, respectively, 78, 75 and 81% of the shots. Finally, for beam 6, 44% of the shots showed very high over-estimation of GEDI elevations (D between +25 and +100 cm), while, for beam 5, the elevation differences were distributed almost equally among the five classes of D. Moreover, results showed that almost 43% of the shots with an elevation difference D in the range (−100, −25 cm] or in the range [+25, +100 cm) were obtained from beams 5 and 6 (19.7% from beam 5, and 22.1% from beam 6). In contrast, only 5.3% and 6.5% of shots in beams 3 and 4 showed an elevation difference D in the range (−100; −25 cm] or in the range [+25;+100 cm).

GEDI Waveform Metrics and Elevation Accuracy
In this section, we present the effect of some GEDI waveform metrics that could potentially be affected by sensor saturation, and thus have an effect on the GEDI elevation estimations. The correlation between the amplitude of the smoothed waveform at the lowest detected mode (zcross_amp from the L2A product) and the precision on the elevation has been analyzed. This analysis was carried on only two dates (20 April, and 02 May) which had the maximum number of GEDI acquisitions (~1000 shots for each acquisition date). On April 20th (Figure 7a), the variable zcross_amp varied between 280 and 4000 amplitude counts (AC). For zcross_amp between 280 and 600 AC, the difference D (difference between GEDI and in-situ elevations) increased with zcorss_amp from −50 to about +40 cm. For zcross_amp between 600 and 3000 AC, the difference D was stable with a value around +50 cm for beam 5 and 0 cm for beam 3. For values of zcross_amp between 3000 and 4000 AC the difference between GEDI and in-situ elevations decreased from +40 cm to around −10 cm. A similar pattern was observed for acquisitions taken on 2 May 2019 (Figure 7b), especially for zcross_amp between 3000 and 4000 AC. For zcross_amp between 600 and 3000 AC on May 2, the difference D was stable and was around 0 cm. However, it decreased to −60 cm for zcross_amp around~4000 AC. For zcross_amp, less than 600 AC, the difference D remained stable but with strong fluctuations for low zcross_amp values (zcross_amp less than 400 AC). The increased uncertainties for waveforms with higher values of zcross_amp (higher than 3000 AC) are most probably due to the specular reflection of the water that saturates the detector [29]. Moreover, a large portion of the waveforms with zcross_amp higher than 3000 AC were also observed as having a wider peak which indicates some form of saturation. Indeed, the analysis of the difference D and the width of Gaussian (width of the return peak in the case of unimodal waveforms) fit to the received waveforms (rx_gwidth, available in the L2A data product) shows two clusters (Figure 8). The first cluster corresponds to rx_gwidth values between 4.5 and 9 ns, and the second cluster corresponds to rx_gwidth between 11 and 17 ns. Figure 8 shows that the waveforms from the second cluster have a slight under-estimation of elevations of around 10 cm.

GEDI Waveform Metrics and Elevation Accuracy
In this section, we present the effect of some GEDI waveform metrics that could potentially be affected by sensor saturation, and thus have an effect on the GEDI elevation estimations. The correlation between the amplitude of the smoothed waveform at the lowest detected mode (zcross_amp from the L2A product) and the precision on the elevation has been analyzed. This analysis was carried on only two dates (20 April, and 02 May) which had the maximum number of GEDI acquisitions (~1000 shots for each acquisition date). On April 20th (Figure 7a), the variable zcross_amp varied between 280 and 4000 amplitude counts (AC). For zcross_amp between 280 and 600 AC, the difference D (difference between GEDI and in-situ elevations) increased with zcorss_amp from −50 to about +40 cm. For zcross_amp between 600 and 3000 AC, the difference D was stable with a value around +50 cm for beam 5 and 0 cm for beam 3. For values of zcross_amp between 3000 and 4000 AC the difference between GEDI and in-situ elevations decreased from +40 cm to around −10 cm. A similar pattern was observed for acquisitions taken on 2 May 2019 (Figure 7b), especially for zcross_amp between 3000 and 4000 AC. For zcross_amp between 600 and 3000 AC on May 2, the difference D was stable and was around 0 cm. However, it decreased to −60 cm for zcross_amp around ~4000 AC. For zcross_amp, less than 600 AC, the difference D remained stable but with strong fluctuations for low zcross_amp values (zcross_amp less than 400 AC). The increased uncertainties for waveforms with higher values of zcross_amp (higher than 3000 AC) are most probably due to the specular reflection of the water that saturates the detector [29]. Moreover, a large portion of the waveforms with zcross_amp higher than 3000 AC were also observed as having a wider peak which indicates some form of saturation. Indeed, the analysis of the difference D and the width of Gaussian (width of the return peak in the case of unimodal waveforms) fit to the received waveforms (rx_gwidth, available in the L2A data product) shows two clusters (Figure 8). The first cluster corresponds to rx_gwidth values between 4.5 and 9 ns, and the second cluster corresponds to rx_gwidth between 11 and 17 ns. Figure 8 shows that the waveforms from the second cluster have a slight under-estimation of elevations of around 10 cm.

Modelling GEDI Estimation Erorrs
In the previous sections, we showed that there were several instrumental and environmental factors affecting the acquired GEDI waveforms, thus producing an important difference between in-situ and GEDI estimated elevations. Among these factors, the provided zcross_amp, rx_gwidth from the L2a data product, and the derived GEDI viewing angle (θ) has been examined. zcross_amp and rx_gwidth were chosen as they are an indicator of saturation as seen in the previous section, whereas the viewing angle has been demonstrated in Urban et al. [37] to increase elevation errors for ICESat-1 GLAS when the viewing angle deviates from nadir due to precision attitude determination. In this paper, the GEDI viewing angle (θ) has been estimated using the following equation: where d is the distance between an acquired GEDI shot and the position of GEDI projected at nadir onto the WGS84 reference ellipsoid and H is the elevation of GEDI over the referenced ellipsoid. In addition to the previous factors, several additional environmental factors have also been considered since in-situ water levels do not necessarily provide water elevation across the surface of lakes as standing waves (seiches), and wind-generated waves are commonly present over lakes. Over the studied lakes, no direct information about waves were available, therefore, they were substituted by proxy variables. In essence, we chose the factors that influence the creation and the form of standing, and wind-generated waves (e.g., wave heights and wave direction). These factors include wind speed, wave direction, and lake depth. Wind speeds were acquired at each GEDI acquisition date using meteorological data from the nearest weather stations. Wave direction and average lake depth at each GEDI footprint were acquired from the LATLAS project (swisslakes.net) using GEDI footprint coordinates for lake depth, and wind direction and GEDI footprint coordinates to determine the wave direction. Nonetheless, these factors were only available for five of the eight studied lakes (Geneva, Neuchâtel, Zürich, Obersee (Zürich), and Lucerne).
In Section 3.2, it was shown that GEDI acquisition dates and times could influence the accuracy of GEDI elevation estimates. Therefore, two additional factors were considered for the modelling of GEDI estimation errors. (1) The acquisition time of a GEDI shot (Time of Day, TOD) was converted to a value between 1 and 3 representing, respectively, acquisitions taken in the morning (6 a.m. to 12 a.m.), afternoon (12 a.m. to 6 p.m.), and evening (6 p.m. to 12 p.m.). (2) The acquisition date of a GEDI shot was converted to a value between 1 and 7 representing the acquisition day (Day of Week, DOW).
Finally, GEDI estimation errors were modeled using the previously mentioned factors in a Random Forest regressor (RF). Random Forests are an ensemble of machine learning algorithms used for classification or regressing by fitting a number of decision trees on various sub-samples of the dataset, and uses averaging to improve the predictive accuracy and control over-fitting [38]. Compared to linear models, RF is advantageous for being able to model also nonlinear relationships (threshold effect) between the variable to explain and the explanatory variables. For this study, the number of trees in the RF were set to 100 trees, with a tree depth equal set to the square root of the number of available factors. In order to train and assess the model accuracy, we randomly split the database into 70% for training and 30% for validation (and accuracy estimation).
The random Forest regression results for the five lakes combined showed that we were able to explain~82% of the error (GEDI-in-situ elevations) variance and reduce the RMSE on the elevation estimation from 20.2 to 8.4 cm (Figure 9).
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 23 Figure 9. Comparison between the modeled elevation difference as a function of instrumental and environmental factors and the obtained elevation difference between GEDI and in-situ elevations.
Moreover, the factors that contributed the most on the difference between GEDI and in-situ elevations were determined. This process was conducted using the percentage increase in the mean square error of the regressions (%IncMSE, estimated with out-of-bag cross validation) from the factor importance test for the random forests model (average and standard deviations of 50 repetitions) ( Figure 10). The factor importance test shows that the most important factors for the modeling of errors is related to the viewing angle of GEDI (47.6%), followed by zcross_amp (47.2%) wave direction (45.5%), depth (43.5%), wind speed (40.2%), and rx_gwidth (30.2%). The least important factors are the effect of the acquisition time (TOD 22.3%) and date (DOW 18.0%). The modeling of GEDI errors for each lake separately did not show any differences specific to the location, geography, or geometry of the lake. For the five lakes tested, the random forest regressor was able to explain between 70.1 to 83.3% of the error (GEDI-in-situ elevations) variance, with an accuracy on the GEDI elevations between 5.6 and 10 cm (Table 6). Moreover, the factors that contributed the most on the difference between GEDI and in-situ elevations were determined. This process was conducted using the percentage increase in the mean square error of the regressions (%IncMSE, estimated with out-of-bag cross validation) from the factor importance test for the random forests model (average and standard deviations of 50 repetitions) ( Figure 10). The factor importance test shows that the most important factors for the modeling of errors is related to the viewing angle of GEDI (47.6%), followed by zcross_amp (47.2%) wave direction (45.5%), depth (43.5%), wind speed (40.2%), and rx_gwidth (30.2%). The least important factors are the effect of the acquisition time (TOD 22.3%) and date (DOW 18.0%).
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 23 Figure 9. Comparison between the modeled elevation difference as a function of instrumental and environmental factors and the obtained elevation difference between GEDI and in-situ elevations.
Moreover, the factors that contributed the most on the difference between GEDI and in-situ elevations were determined. This process was conducted using the percentage increase in the mean square error of the regressions (%IncMSE, estimated with out-of-bag cross validation) from the factor importance test for the random forests model (average and standard deviations of 50 repetitions) ( Figure 10). The factor importance test shows that the most important factors for the modeling of errors is related to the viewing angle of GEDI (47.6%), followed by zcross_amp (47.2%) wave direction (45.5%), depth (43.5%), wind speed (40.2%), and rx_gwidth (30.2%). The least important factors are the effect of the acquisition time (TOD 22.3%) and date (DOW 18.0%). The modeling of GEDI errors for each lake separately did not show any differences specific to the location, geography, or geometry of the lake. For the five lakes tested, the random forest regressor was able to explain between 70.1 to 83.3% of the error (GEDI-in-situ elevations) variance, with an accuracy on the GEDI elevations between 5.6 and 10 cm (Table 6). The modeling of GEDI errors for each lake separately did not show any differences specific to the location, geography, or geometry of the lake. For the five lakes tested, the random forest regressor was able to explain between 70.1 to 83.3% of the error (GEDI-in-situ elevations) variance, with an accuracy on the GEDI elevations between 5.6 and 10 cm (Table 6). Table 6. Accuracy of elevations from GEDI acquisitions for each of the 5 lakes without and with accounting for the sources of error (RMSE), using data from all acquisition dates given in Table 1 and from all the beams.

Lake
Error

Discussion
Using GEDI data extracted from the algorithms developed by the GEDI team, the accuracy of the GEDI water surface elevation estimates seems to be high enough. Overall, the standard deviation from the mean difference between GEDI and in-situ elevations is~22 cm with no apparent bias. Moreover, given GEDI's small footprint diameter (25 m), GEDI should provide better elevation estimates in comparison to, for example, radar altimeters for narrower water surfaces, such as rivers. On the other hand, while GEDI uses the same laser specs as those used for GLAS on board ICESat-1, the precision obtained by GEDI is inferior to that obtained using ICESat-1. In fact, Baghdadi et al. [29] in their study over Swiss lakes, observed an accuracy (RMSE) of elevation estimates in the order of~5 cm using ICESat-1 data.
GEDI's smaller footprint means that GEDI waveforms within the footprints could easily be affected by small disturbances coming, for instance, from water surface roughness, which leads to uncertainties in the estimation of water surface elevations. For example, GEDI acquisitions with the highest mean difference to in-situ elevations, and highest standard deviation were acquisitions taken during the weekend (e.g., April 21, June 08), or before a holiday (e.g., May 22). The uncertainties at these acquisition dates could be explained in part by the increased human activities over the water surface (e.g., ships) which pollutes the return waveform. Moreover, these uncertainties are also the result of small currents generated by thermal effects [35] or winds [36], that disrupts the water surface. Finally, over large lakes, water surface is not entirely flat due to the presence of wind-generated waves and seiches. Therefore, GEDI, depending on the angle of incidence, can over-or under-estimate the water surface level by providing elevations from the trough or the top of the waves. In our study of the effects of GEDI's viewing angle over each lake and each date, we observed that, generally, uncertainties on the estimation of elevations increased with increasing viewing angle. Moreover, the acquisitions with the large deviation to the mean elevation difference between GEDI and in-situ elevations were acquisitions with the largest viewing angle.
The time of GEDI acquisitions also introduces uncertainties on the elevation estimates. For example, during sunlit GEDI acquisitions, photons from the sun reflecting at the water surface could contaminate the returned echo and increase the noise. In this study, we found that GEDI elevation estimates with the lowest standard deviation and bias to the in-situ elevations were acquisitions taken during the morning or late at night when lake water surfaces are usually calm, cooler with low wind speeds. For these acquisition times (April 28 and May 02, 04, and 22), the mean difference between GEDI and in-situ elevations was −6 ± 15 cm.
The analysis of the precision of elevations from GEDI according to the used laser in the acquisition did not show any difference between coverage and full power laser. Nonetheless, some beams showed systematic differences in comparison to others. In general, the most accurate elevations came from beams 1-4 (coverage laser) and beams 7-8 (full power laser). Beams 5-6, which also correspond to a full power laser, showed the least precise elevations across all dates. Moreover, there were some differences on the elevation estimation accuracies across the beams. For example, footprints acquired from beam 8 had better precision than footprints from beam 7, albeit both beams are produced using the same laser unit onboard GEDI. Similar observations have been noted for beams 3 and 4, which were found to be more precise than beams 1 and 2. However, these uncertainties could be mostly explained by the errors introduced from the viewing angle of GEDI, which differs from one beam to another on a given acquisition date.
In general, the differences between GEDI and in-situ elevations are due to both instrumental and environmental factors. In our modeling of the errors, the two most contributing factors were the viewing angles of GEDI, and the saturation on some of the acquired waveforms assuming this occurs through the zcross_amp indicator. However, these factors could be corrected in unimodal waveforms. Another form of uncertainties is related to the uneven water surface due to standing and wind-generated waves. This was apparent by the high contribution of environmental factors, like the wave direction, depth of the lake at each GEDI footprint, and the wind speed. Lake depth is a direct indicator of the wave heights as waves near the shore (low depth) are higher than waves farther away, while wind speed controls the height of the generated wave. In this study, the contribution of wind speed on the errors appears to be lower than other factors. However, this is due to the low wind speeds at the present GEDI acquisition (maximum encountered wind speed of 12 km/h), which suggests small wind-generated waves. In addition to instrumental and environmental factors, the acquisition date and time of GEDI can also directly affect its accuracy. The time of the acquisition during the day, as well as the day of the week on which an acquisition took place, both can have effects on the echoed waveforms as seen previously. The effect of these two factors is related to the noise from the sun on the GEDI receiver, and the increased human activities over each lake during certain days.
Finally, the provided quality_flag from the L1B data product could help, in theory, to select GEDI data with higher accuracy on elevations. Using data issued after the application of our filter (|SRTM elevation-GEDI elevation| > 100 m), we observed only a slight decrease of the root mean square error on GEDI elevation estimates when we considered only the waveforms with a quality_flag = 1. For algorithm a1, we observed a decrease in the RMSE of around 1.4 cm (from 22.3 cm using data with both quality_flag = 0 and 1 to 20.9 cm using only quality_flag =1). Similarly, for a2, the decrease in the RMSE was around 0.8 cm (from 24.9 cm using data with both quality_flag = 0 and 1 to 24.1 cm using only quality_flag = 1).

Conclusions
In this study, we analyzed GEDI data in order to determine its accuracy of elevation estimation over lake surfaces using algorithms provided by the Land Processes Distributed Active Archive Center (LP DAAC). The objective was to study the quality of the first two months of GEDI data (min-April until mid-June 2019) acquired over several lakes in Switzerland. Overall, 4367 GEDI shots out of 21,242 available shots were exploitable and analyzed over the eight studied lakes.
This first analysis of GEDI data from the first two months of acquisitions showed a very low mean elevation difference between GEDI and in-situ gauge station elevations, in the order of 0.61 ± 22.3 cm for one standard deviation. While GEDI's reported vertical accuracy in this study was well below the 50 cm vertical accuracy provided in Dubayah et al. [31], it still remains higher than what was previously obtained using the ancient LiDAR satellite ICESat-1. In fact, the vertical accuracy of GLAS onboard ICESat-1 was better than 10 cm, as demonstrated by Baghdadi et al. [29].
The analysis of GEDI data by lake showed that the vertical precision varied from under-estimation by GEDI of −13.8 cm for certain lakes, to over-estimations of +9.8 cm for others, with a varying standard deviation between 14.5 and 31.6 cm.
The investigation of GEDI's vertical accuracy by date showed a mean difference between GEDI and in-situ gauge station elevations varying between −26.8 and +15.2 cm. The lowest bias corresponded to data acquired in the morning or late at night. The highest recorded bias was observed on acquisitions that were made around noon, in the early evening, and over the weekend. Moreover, the difference in GEDI's elevation accuracy according to the acquisition date is also affected, in part, by the instrument's viewing angle at acquisition time (larger viewing angle leads to lower accuracies). However, the full effects of the viewing angle were not studied in its entirety due to the small number of available acquisitions at the time of this writing.
The analysis of GEDI data by beam number showed that the difference between GEDI and gauge stations' elevations varied depending on the acquisition date and the beam. Certain beams at certain dates showed that elevations from GEDI were very similar to in-situ readings (fluctuations of few cm). Summary statistics calculated for each GEDI beam using acquisitions from all dates showed that the beams with the highest elevation differences to in-situ readings were beams 1 and 6 (−10.2 and +18.1 cm, respectively). In contrast, the beams with the smallest mean elevation difference to in-situ readings were beams 5 and 7 (−1.7 and −2.5 cm, respectively). The remaining beams (2, 3, 4, and 8) showed a mean difference between −7.4 and +4.4 cm. The standard deviation of the mean difference, however, was similar across all beams (between 17.2 and 22.9 cm).
The analysis of the metrics, such as the amplitude or width of the modes, did not allow further investigation of GEDI elevation estimation accuracy, even though a certain dependence was found between these metrics and the quality of GEDI data.
Nonetheless, accounting for instrumental and environmental factors increased the accuracy (RMSE) of GEDI estimates to 8.4 cm for all lakes and from 5.6 to 10 cm (with no apparent bias) when modeling the errors for each lake independently.
Following the first analysis done on the first GEDI data sets, we can conclude that GEDI has a strong potential for precise estimation of water surfaces of any size. Moreover, a better estimate of GEDI metrics by the LP DAAC can be expected in the near future, allowing for reprocessed data with a better elevation precision.