Antarctic Supraglacial Lake Detection Using Landsat 8 and Sentinel-2 Imagery: Towards Continental Generation of Lake Volumes

: Melt and supraglacial lakes are precursors to ice shelf collapse and subsequent accelerated ice sheet mass loss. We used data from the Landsat 8 and Sentinel-2 satellites to develop a threshold-based method for detection of lakes found on the Antarctic ice shelves, calculate their depths and thus their volumes. To achieve this, we focus on four key areas: the Amery, Roi Baudouin, Nivlisen, and Riiser-Larsen ice shelves, which are all characterized by extensive surface meltwater features. To validate our products, we compare our results against those obtained by an independent method based on a supervised classiﬁcation scheme (e.g., Random Forest algorithm). Additional veriﬁcation is provided by manual inspection of results for nearly 1000 Landsat 8 and Sentinel-2 images. Our dual-sensor approach will enable constructing high-resolution time series of lake volumes. Therefore, to ensure interoperability between the two datasets, we evaluate depths from contemporaneous Landsat 8 and Sentinel-2 image pairs. Our assessments point to a high degree of correspondence, producing an average R 2 value of 0.85, no bias, and an average RMSE of 0.2 m. We demonstrate our method’s ability to characterize lake evolution by presenting ﬁrst evidence of drainage events outside of the Antarctic Peninsula on the Amery Ice shelf. The methods presented here pave the way to upscaling throughout the Landsat 8 and Sentinel-2 observational record across Antarctica to produce a ﬁrst-ever continental dataset of supraglacial lake volumes. Such a dataset will improve our understanding of the inﬂuence of surface hydrology on ice shelf stability, and thus, future projections of Antarctica’s contribution to sea level rise.


Introduction
Supraglacial lakes form when liquid meltwater collects on the surface of a glacier, ice shelf, or ice sheet. Lakes require two prerequisites for their formation: liquid water and a sufficiently impermeable bottom. These two criteria are widely available in Greenland [1][2][3][4], but less prevalent in Antarctica. While supraglacial lakes in Greenland occur at mid-elevations, but generally still within the ablation

Data and Methods
Here, we describe the collection and preprocessing of the L8 and S2 imagery, introduce the study sites, and then delve into description of the techniques we used for automated lake area delineation and volume calculation. We build on commonly-used, successful techniques in the literature to detect supraglacial lakes and fine-tune them to make them specifically applicable to all locations in Antarctica where ponding occurs (mostly on and around ice shelves).

Study Area
While the overarching objective of our research is to produce an Antarctic-wide lake area and volume product, here, we focus on several key areas to develop our methods ( Figure 1). These sites include: the (1) Amery, (2) Roi Baudouin, (3) Nivlisen, and (4) Riiser-Larsen ice shelves, which have all been characterized by extensive surface hydrological networks for many decades [21,[28][29][30][31][32][33]. The Amery Ice Shelf drains grounded ice from the interior of the Lambert Glacier drainage basin, which covers 16% of the entire mass of the East Antarctic Ice Sheet [34]. The largest supraglacial lake, which is reported to be~80 km long, is located on the Amery Ice Shelf [21]. Extensive and active surface hydrology is known to exist across Roi Baudouin ice shelf in Dronning Maud land, East Antarctica, as the result of persistent katabatic winds driving a melt-albedo feedback [33]. On the neighboring Nivlisen Ice Shelf, surface lakes are observed to drain >70 km across the ice shelf [21,31]. The Riiser-Larsen Ice Shelf is about 400 km long and is located on the coast of Queen Maud Land. Similar to the aforementioned ice shelves, melt ponds have appeared on the Riiser-Larsen Ice Shelf across decades of satellite imagery [21]. The selection of these ice-shelf study areas provides a wide range of surface conditions and thus spectral variability related to different surface classes. Large variation in melt ponding characteristics and behavior make these ice shelves ideal locations for testing and refining our methods.

Data and Methods
Here, we describe the collection and preprocessing of the L8 and S2 imagery, introduce the study sites, and then delve into description of the techniques we used for automated lake area delineation and volume calculation. We build on commonly-used, successful techniques in the literature to detect supraglacial lakes and fine-tune them to make them specifically applicable to all locations in Antarctica where ponding occurs (mostly on and around ice shelves).

Study Area
While the overarching objective of our research is to produce an Antarctic-wide lake area and volume product, here, we focus on several key areas to develop our methods ( Figure 1). These sites include: the (1) Amery, (2) Roi Baudouin, (3) Nivlisen, and (4) Riiser-Larsen ice shelves, which have all been characterized by extensive surface hydrological networks for many decades [21,[28][29][30][31][32][33]. The Amery Ice Shelf drains grounded ice from the interior of the Lambert Glacier drainage basin, which covers 16% of the entire mass of the East Antarctic Ice Sheet [34]. The largest supraglacial lake, which is reported to be ~80 km long, is located on the Amery Ice Shelf [21]. Extensive and active surface hydrology is known to exist across Roi Baudouin ice shelf in Dronning Maud land, East Antarctica, as the result of persistent katabatic winds driving a melt-albedo feedback [33]. On the neighboring Nivlisen Ice Shelf, surface lakes are observed to drain >70 km across the ice shelf [21,31]. The Riiser-Larsen Ice Shelf is about 400 km long and is located on the coast of Queen Maud Land. Similar to the aforementioned ice shelves, melt ponds have appeared on the Riiser-Larsen Ice Shelf across decades of satellite imagery [21]. The selection of these ice-shelf study areas provides a wide range of surface conditions and thus spectral variability related to different surface classes. Large variation in melt ponding characteristics and behavior make these ice shelves ideal locations for testing and refining our methods.

Image Data Collection and Preprocessing
We first compiled a list of imagery over each study site from each sensor using the Google Earth Engine cloud computing platform and automatically downloaded the data from Google Cloud (L8

Image Data Collection and Preprocessing
We first compiled a list of imagery over each study site from each sensor using the Google Earth Engine cloud computing platform and automatically downloaded the data from Google Cloud (L8 images accessed via gs://gcp-public-data-landsat/ and S2 images accessed via: gs://gcp-public-data-sentinel-2/). For each study site, we used all available L8/S2 images over the 2013-2019 austral summers (November-February) that were acquired at sun elevation angles greater than 20 • . A companion paper based on Landsat 8 imagery by Halberstadt et al. [36] showed that below this sun elevation angle, lakes are not significantly spectrally different from their surroundings and therefore cannot be resolved using common classification schemes. We did not filter images based on their cloud cover, to test our methods even more.
We used Landsat 8 Level 1 products, which were geometrically-corrected and radiometrically-calibrated data. Using coefficients provided in image metadata, we converted L8 Level-1T data to top-of-atmosphere (TOA) reflectance, known to accurately represent surface conditions over ice sheets [5,6,23], and previously used for glaciological [27,35] and ice sheet hydrological investigation [5,6,23,37,38]. The S2 images were Level-1C products, which means they were orthorectified, map-projected images containing TOA reflectance data, therefore no additional preprocessing is required.

Threshold-Based Classification of Lakes, Rocks, and Clouds
Here, we present new techniques for automated lake classification using Landsat 8 and Sentinel-2 satellite imagery. The method developed for each sensor combines separate threshold-based algorithms that classify the image into three main classes: (1) lakes, (2) clouds, and (3) rocks. We determine threshold values used in the methodology by creating a training dataset based on selected L8 and S2 images. Training images cover a wide range of illumination conditions, cloud cover, geology, and spectral characteristics and were acquired over multiple sites in Antarctica as outlined in Section 2.1. As part of our classification scheme, we examined spectral properties of different classes using individual bands and also band combinations. Most notably, we used the Normalized Difference Water Index (NDWI) to extract pixels containing liquid water [5,6,13]. The blue and red bands in Equation (1) correspond to B2, and B4, respectively, in both Landsat 8 and Sentinel-2: To help isolate clouds and rocks in individual images, we used the Normalized Difference Snow Index (NDSI) [39,40] as part of both L8 and S2 methods. The green and SWIR bands in Equation (2) correspond to B3 and B6 in Landsat 8 and B3 and B11 in Sentinel-2: Since L8 provides thermal information, we use the TIRS1/Blue ratio [27] to further identify rocks and clouds. The following sections detail procedures developed for each sensor.

Landsat 8
Using 15 L8 scenes acquired over the study sites outlined in Section 2.1 we created a training dataset (n ≈ 14.5 × 10 6 ) representing spectral properties of eight classes, namely: lakes (ponds, streams, and rivers) (n ≈ 0.5 × 10 6 ), slush (water-saturated snow) (n ≈ 0.2 × 10 6 ), snow (n ≈ 2 × 10 6 ), shaded snow (n ≈ 0.1 × 10 6 ), sunlit rocks (n ≈ 0.1 × 10 6 ), shaded rocks (n ≈ 0.1 × 10 6 ), clouds (n ≈ 11 × 10 6 ), and cloud shadows (n ≈ 0.5 × 10 6 ). Pixel values were extracted across several bands or band combinations to determine the spectral thresholds that best distinguished these eight surface types ( Figure 2). The first step in our lake detection procedure, as outlined in Figure 3, is to identify and remove areas of exposed rock outcrop. Although the NDSI is most widely used for detection of sunlit rocks, it often misclassifies clouds and does not capture areas of shaded rock. Consequently, we employ a mask based on L8's thermal infrared band (TIRS1) and the blue band. Low values for the TIRS1/Blue ratio [27] represent cold and highly reflective surfaces such as snow and clouds. In contrast, higher values indicate warmer and darker surfaces such as rocks or seawater. As can be noted in Figure 2, there is a good separation between sunlit and shaded rock areas and other classes in the TIRS1/Blue chart. We used a scaled TIRS1/Blue threshold value of >0.65 (>650 for non-scaled TIRS1 brightness temperature and blue reflectance values) to mask out areas of exposed rock outcrop. This threshold represents the lower 5th percentile for sunlit rocks. Application of this threshold may result in inclusion of some shaded snow areas, as it also corresponds to the upper 99% percentile of the shaded snow class. To avoid such a misclassification, we used an absolute threshold for blue TOA reflectance B2 < 0.35 to produce a rock/seawater mask (seawater reflects less than 20% of the incoming solar radiation in the blue band). Therefore, the application of this threshold (B2 < 0.35) adequately masks out both rock and open ocean environments, that are spectrally similar to darker, and thus deeper, lakes. The first step in our lake detection procedure, as outlined in Figure 3, is to identify and remove areas of exposed rock outcrop. Although the NDSI is most widely used for detection of sunlit rocks, it often misclassifies clouds and does not capture areas of shaded rock. Consequently, we employ a mask based on L8's thermal infrared band (TIRS1) and the blue band. Low values for the TIRS1/Blue ratio [27] represent cold and highly reflective surfaces such as snow and clouds. In contrast, higher values indicate warmer and darker surfaces such as rocks or seawater. As can be noted in Figure 2, there is a good separation between sunlit and shaded rock areas and other classes in the TIRS1/Blue chart. We used a scaled TIRS1/Blue threshold value of >0.65 (>650 for non-scaled TIRS1 brightness temperature and blue reflectance values) to mask out areas of exposed rock outcrop. This threshold represents the lower 5th percentile for sunlit rocks. Application of this threshold may result in inclusion of some shaded snow areas, as it also corresponds to the upper 99% percentile of the shaded snow class. To avoid such a misclassification, we used an absolute threshold for blue TOA reflectance B2 < 0.35 to produce a rock/seawater mask (seawater reflects less than 20% of the incoming solar radiation in the blue band). Therefore, the application of this threshold (B2 < 0. 35  To produce a cloud mask for every scene, we employ Landsat 8's B6 (Short-Wave Infrared SWIR) TOA data. We mark pixels as cloudy when their B6 TOA reflectance value exceeds 0.1. This threshold captures 80% of clouds, while excluding 99% of snow-covered areas in the training dataset. To aid further snow filtering in our cloud mask, we used the NDSI below a threshold of 0.8, which again corresponds to the upper 99th percentile for snow ( Figure 2).
After application of the rock/seawater and cloud masks, we delineated lake areas using the most widely-applied method, NDWI. We chose a threshold of 0.19 (NDWI > 0. 19), which is an intermediate value between the lower 5th percentile for lakes and the upper 99th percentile for slush. However, the NDWI method alone is unable to fully discern lakes from shaded-snow and cloud-shadowed areas ( Figure 2). The chosen threshold could result in gross overestimation of lake areas, as it also corresponds to the upper 85th percentile for cloud shadows, which at times can cover large areas of the image. To reduce such misclassification errors, we subtract B4 (red) TOA reflectances from those in B3 (green) TOA reflectances from those in B2 (blue) to highlight the difference between light attenuation properties in water versus cloud-shadowed snow surfaces. As there is a significant change in water absorption properties going from the blue band to the green and red bands, the subtracted (B3-B4) and (B2-B3) values for lakes will be higher than those of the more uniformly-absorptive surfaces such as cloud-shadowed snow areas, which appear light to dark grey in the true-color composite images. A threshold of (B3-B4) > 0.07 and (B2-B3) > 0.11 are then applied to exclude cloud shadows and shaded snow areas from lake classification results. We selected these thresholds to exclude as much of the cloud shadows and shaded snow areas as possible, while preserving as many lake pixels as possible (95% of lake pixels). Using these thresholds, we created binary (i.e., lake and non-lake) masks for each L8 scene. From the binary lake products, we removed areas that are <5 pixels (30 m resolution) in total and linear features that are narrower than 2 pixels, since these features are likely to represent areas of mixed slush or streams rather than lakes [5,41].
After application of the rock/seawater and cloud masks, we delineated lake areas using the most widely-applied method, NDWI. We chose a threshold of 0.19 (NDWI > 0. 19), which is an intermediate value between the lower 5th percentile for lakes and the upper 99th percentile for slush. However, the NDWI method alone is unable to fully discern lakes from shaded-snow and cloud-shadowed areas ( Figure 2). The chosen threshold could result in gross overestimation of lake areas, as it also corresponds to the upper 85th percentile for cloud shadows, which at times can cover large areas of the image. To reduce such misclassification errors, we subtract B4 (red) TOA reflectances from those in B3 (green) TOA reflectances from those in B2 (blue) to highlight the difference between light attenuation properties in water versus cloud-shadowed snow surfaces. As there is a significant change in water absorption properties going from the blue band to the green and red bands, the subtracted (B3-B4) and (B2-B3) values for lakes will be higher than those of the more uniformly-absorptive surfaces such as cloud-shadowed snow areas, which appear light to dark grey in the true-color composite images. A threshold of (B3-B4) > 0.07 and (B2-B3) > 0.11 are then applied to exclude cloud shadows and shaded snow areas from lake classification results. We selected these thresholds to exclude as much of the cloud shadows and shaded snow areas as possible, while preserving as many lake pixels as possible (95% of lake pixels). Using these thresholds, we created binary (i.e., lake and non-lake) masks for each L8 scene. From the binary lake products, we removed areas that are <5 pixels (30 m resolution) in total and linear features that are narrower than 2 pixels, since these features are likely to represent areas of mixed slush or streams rather than lakes [5,41].

Sentinel-2
Following a similar methodology to L8, we create a training dataset (n ≈ 135 × 10 6 ) by selecting 10 S2 images of different latitudes, geology, sun illumination, and cloud cover collected over each of our study sites. Broad areas are manually classified as lakes (n ≈ 1.3 × 10 6 ), slush (n ≈ 6 × 10 6 ), snow (n ≈ 40 × 10 6 ), shaded snow (n ≈ 0.2 × 10 6 ), sunlit rocks (n ≈ 0.4 × 10 6 ), shaded rocks (n ≈ 0.4 × 10 6 ), clouds (n ≈ 84 × 10 6 ), and cloud shadows (n ≈ 8 ×  Similar to L8, the first step in our lake detection procedure, as presented in Figure 5, is to create a rock/seawater mask. Since S2 does not provide a thermal band, we employed the NDSI as our primary method to mask out areas of rock exposure (NDSI > 0.85). We applied a B2 (blue) < 0.4 filter to exclude snow and clouds from the rock outcrop/seawater map ( Figure 4). We used a cloud-masking procedure similar to that for L8, in which pixels are marked as clouds when the B11 (SWIR) TOA reflectance value exceeds a threshold of 0.1 (the upper 95th percentile value for snow) and the B10 (SWIR-Cirrus) TOA is greater than 0.01 (the lower 20th percentile for clouds and the upper 95th percentile for snow). Prior to cloud masking, we interpolated B11 and B10 data via a bilinear method to match the 10 m resolution of S2's optical bands. After rock and cloud masking was performed, we applied a NDWI > 0.18 filter to map lake extents (95% of NDWI values for lakes are greater than this threshold). To exclude cloud shadows and shaded snow areas from classification results, a threshold of (B3-B4) > 0.09 was applied (0.09: the intermediate value between the upper 95th percentile for cloud shadows and the lower 5th percentile for lakes). Similar to L8, we created binary (i.e., lake and non-lake) masks for each S2 scene and remove areas <45 pixels (10 m resolution) in total and linear features <6 pixels wide, to further exclude slush or streams from the final classification results. Similar to L8, the first step in our lake detection procedure, as presented in Figure 5, is to create a rock/seawater mask. Since S2 does not provide a thermal band, we employed the NDSI as our primary method to mask out areas of rock exposure (NDSI > 0.85). We applied a B2 (blue) < 0.4 filter to exclude snow and clouds from the rock outcrop/seawater map ( Figure 4). We used a cloud-masking procedure similar to that for L8, in which pixels are marked as clouds when the B11 (SWIR) TOA reflectance value exceeds a threshold of 0.1 (the upper 95th percentile value for snow) and the B10 (SWIR-Cirrus) TOA is greater than 0.01 (the lower 20th percentile for clouds and the upper 95th percentile for snow). Prior to cloud masking, we interpolated B11 and B10 data via a bilinear method to match the 10 m resolution of S2's optical bands. After rock and cloud masking was performed, we applied a NDWI > 0.18 filter to map lake extents (95% of NDWI values for lakes are greater than this threshold). To exclude cloud shadows and shaded snow areas from classification results, a threshold of (B3-B4) > 0.09 was applied (0.09: the intermediate value between the upper 95th percentile for cloud shadows and the lower 5th percentile for lakes). Similar to L8, we created binary (i.e., lake and non-lake) masks for each S2 scene and remove areas <45 pixels (10 m resolution) in total and linear features <6 pixels wide, to further exclude slush or streams from the final classification results.  Figure 5. Flowchart for lake detection from Sentinel-2 satellite imagery. Rock/seawater and cloud masks are created and applied before lake detection scheme is implemented.

Lake Depth Retrieval and Volume Estimation
For each L8 or S2 image, we calculated lake depths and volumes using a physically-based model that is commonly used in the similar studies of lakes in Greenland [5,6,26,42]. This model is based on the premise that light passing through a water column is attenuated with depth, due to absorption and scattering processes. The following expression [42] was used to determine water depth from passive optical data: where is the albedo of the lake bed, is the reflectance of optically deep water (>40 m), is the observed water reflectance (TOA reflectance), and is lake depth. The quantity is a two-way attenuation coefficient that accounts for losses in both upward and downward directions including absorption and scattering. Once the three model parameters ( , , ) have been determined, Equation (1) provides a means of retrieving depth from measured surface reflectance. These model parameters vary depending on the wavelength or band which is used; due to different attenuation rates, longer wavelengths (e.g., red) will be sensitive to shallower depths while shorter wavelengths (e.g., blue) would be more appropriate for significantly deeper areas. Lake volume ( ) is then calculated, according to Equation (4), as the sum of lake depths ( ), multiplied by the pixel area ( ), within the lake outlines: We calculated lake depths by averaging depths derived from L8's red and panchromatic band TOA reflectance data within the boundaries of the lakes, following the recommendations of Pope et al. [5]. To do so, we resampled the 15 m resolution panchromatic band to match the 30 m resolution of the red band.
is the TOA reflectance from each band. is calculated for each lake as the average reflectance of the 1-pixel ring around the margin of each identified lake. We used values for the relevant Landsat 8 bands from Pope et al. [5].
The commonly-used method for derivation of the value relies on extracting reflectance values of optically deep water on a scene-by-scene basis for each band; if ocean water pixels are present in a scene, then the median value of their reflectances can be chosen as [5,26]. However, Figure 5. Flowchart for lake detection from Sentinel-2 satellite imagery. Rock/seawater and cloud masks are created and applied before lake detection scheme is implemented.

Lake Depth Retrieval and Volume Estimation
For each L8 or S2 image, we calculated lake depths and volumes using a physically-based model that is commonly used in the similar studies of lakes in Greenland [5,6,26,42]. This model is based on the premise that light passing through a water column is attenuated with depth, due to absorption and scattering processes. The following expression [42] was used to determine water depth from passive optical data: where A d is the albedo of the lake bed, R ∞ is the reflectance of optically deep water (>40 m), R w is the observed water reflectance (TOA reflectance), and z is lake depth. The quantity g is a two-way attenuation coefficient that accounts for losses in both upward and downward directions including absorption and scattering. Once the three model parameters (A d , R ∞ , g) have been determined, Equation (1) provides a means of retrieving depth from measured surface reflectance. These model parameters vary depending on the wavelength or band which is used; due to different attenuation rates, longer wavelengths (e.g., red) will be sensitive to shallower depths while shorter wavelengths (e.g., blue) would be more appropriate for significantly deeper areas. Lake volume (V lake ) is then calculated, according to Equation (4), as the sum of lake depths (d i ), multiplied by the pixel area (A pixel ), within the lake outlines:

Landsat 8
We calculated lake depths by averaging depths derived from L8's red and panchromatic band TOA reflectance data within the boundaries of the lakes, following the recommendations of Pope et al. [5].
To do so, we resampled the 15 m resolution panchromatic band to match the 30 m resolution of the red band. R w is the TOA reflectance from each band. A d is calculated for each lake as the average reflectance of the 1-pixel ring around the margin of each identified lake. We used g values for the relevant Landsat 8 bands from Pope et al. [5].
The commonly-used method for derivation of the R ∞ value relies on extracting reflectance values of optically deep water on a scene-by-scene basis for each band; if ocean water pixels are present in a scene, then the median value of their reflectances can be chosen as R ∞ [5,26]. However, this method can only be applied to a small percentage of all the L8 images around the continent, because not all scenes contain coastal areas unobscured by cloud cover or sea ice. In such cases, the closest coastal image (both in space and time) can be used for determination of R ∞ , but adopting such an approach for a project of this scale requires automation.
Therefore, we used Google Earth Engine to create a look-up table of R ∞ values from all L8 red and panchromatic bands covering a 100-km buffer around the continent's coastline. For L8, we used 8000 images to compile a dataset of bottom 5% reflectances in each band. We filtered the dataset so that it only included non-cloudy scenes containing optically-deep water by applying a threshold for acceptable R ∞ values (R ∞ < 0.1 since deep water is very dark) ( Figure 6). To calculate lake depth, we chose the R ∞ value of the closest image in space and time (with respect to the scene of interest). this method can only be applied to a small percentage of all the L8 images around the continent, because not all scenes contain coastal areas unobscured by cloud cover or sea ice. In such cases, the closest coastal image (both in space and time) can be used for determination of , but adopting such an approach for a project of this scale requires automation.
Therefore, we used Google Earth Engine to create a look-up table of values from all L8 red and panchromatic bands covering a 100-km buffer around the continent's coastline. For L8, we used ~8000 images to compile a dataset of bottom 5% reflectances in each band. We filtered the dataset so that it only included non-cloudy scenes containing optically-deep water by applying a threshold for acceptable values ( < 0.1 since deep water is very dark) ( Figure 6). To calculate lake depth, we chose the value of the closest image in space and time (with respect to the scene of interest).

Sentinel-2
As with L8, the application of the physically-based model to the S2 red band TOA reflectance data produces the most accurate depth results [41]. Due to finer spatial detail of S2 compared to L8, we derived by averaging reflectances over a three-pixel wide ring around a lake to minimize the impact of errors associated with lake outlines. For red band data, we used = 0.83 from Williamson et al. [41]. Following a similar procedure to L8, we compile a look-up table for S2 using a total of 12321 images acquired during 2016-2019 ( Figure 6). The dataset was then filtered to only reflect acceptable values derived from non-cloudy coastal images ( < 0.1). Similar to L8, for each S2 scene, is extracted from the table using spatiotemporal adjacency criteria.

Assessment of Lake Extents
We employed several strategies to examine the applicability of our procedure described above to different environments with diverse lake characteristics, a prerequisite for widespread application of our methods across Antarctica. The following provides details on the successes and remaining challenges of these strategies and presents a discussion of our lake, rock, and cloud detection results (samples of which are presented in Figure 7).

Sentinel-2
As with L8, the application of the physically-based model to the S2 red band TOA reflectance data produces the most accurate depth results [41]. Due to finer spatial detail of S2 compared to L8, we derived A d by averaging reflectances over a three-pixel wide ring around a lake to minimize the impact of errors associated with lake outlines. For red band data, we used g = 0.83 from Williamson et al. [41]. Following a similar procedure to L8, we compile a R ∞ look-up table for S2 using a total of 12321 images acquired during 2016-2019 ( Figure 6). The dataset was then filtered to only reflect acceptable R ∞ values derived from non-cloudy coastal images (R ∞ < 0.1). Similar to L8, for each S2 scene, R ∞ is extracted from the table using spatiotemporal adjacency criteria.

Assessment of Lake Extents
We employed several strategies to examine the applicability of our procedure described above to different environments with diverse lake characteristics, a prerequisite for widespread application of our methods across Antarctica. The following provides details on the successes and remaining challenges of these strategies and presents a discussion of our lake, rock, and cloud detection results (samples of which are presented in Figure 7).

Visual Inspection of Results across 1000 Images and Four Ice Shelves
In the absence of in-situ validation data, we visually assessed the result of our supervised lake classification procedure for ~1000 L8 and S2 images collected over four ice shelves (Section 2.1). Since visual interpretation of spectrally-similar classes such as lakes, slush, blue ice, shaded snow, etc. is rather subjective, four independent individuals manually inspected the results. For both L8 and S2 sensors, our lake classification scheme successfully maps lake areas across different environments, from the western Roi Baudouin ice shelf covered mostly with smaller and shallower lakes to the Amery Ice shelf including larger and deeper lakes.
Our assessments reveal consistent and reliable lake area delineations for images acquired at solar elevations above 20° for both L8 and S2 sensors. Visual inspection confirmed that the majority of delineated lake areas represent pooled liquid water on the surface of the ice shelves; completely frozen lakes were almost always excluded using our classification approach. Active radar sensors such as Sentinel-1 could be used to detect these lakes.
We note that lakes covered by cloud shadows/under thin clouds were included in the classification results. For these two lake classes, depth retrieval will be impacted by the presence of cloud shadows/clouds, as they alter lake reflectance. Nevertheless, a thorough map of all visible lakes in images of ice shelves is a valuable product when assessing the evolution of lake areas.
Our cloud classification procedures for both S2 and L8, which are applied prior to lake detection methods, produce reliable maps of cloud occurrence. Through visual inspection, we found that nearly all visible clouds were captured in both L8 and S2 images. Some thinner colder clouds were missed, and at times, bright snow patches were misclassified as clouds. Overall, cloud masks provide critical information about where clouds could potentially obscure lakes, an important parameter to consider when analyzing temporal variations of lake areas and volumes.
Due to spectral similarities between shaded rocks and lakes, clipping out rock exposures is a critical step before application of lake detection methods. While the static rock outcrop masks such as those developed by Burton-Johnson et al. [27] provide a through picture of rock areas across the continent, dynamic rock detection methods are required to capture variability in rock areas due to seasonal snow cover. Visual assessment of our results confirmed that our dynamic rock/seawater masking approach for both L8 and S2 sensors allows accurate and thorough removal of sunlit and shaded rocks as well as ocean water, on a scene-by-scene basis.

Visual Inspection of Results across 1000 Images and Four Ice Shelves
In the absence of in-situ validation data, we visually assessed the result of our supervised lake classification procedure for~1000 L8 and S2 images collected over four ice shelves (Section 2.1). Since visual interpretation of spectrally-similar classes such as lakes, slush, blue ice, shaded snow, etc. is rather subjective, four independent individuals manually inspected the results. For both L8 and S2 sensors, our lake classification scheme successfully maps lake areas across different environments, from the western Roi Baudouin ice shelf covered mostly with smaller and shallower lakes to the Amery Ice shelf including larger and deeper lakes.
Our assessments reveal consistent and reliable lake area delineations for images acquired at solar elevations above 20 • for both L8 and S2 sensors. Visual inspection confirmed that the majority of delineated lake areas represent pooled liquid water on the surface of the ice shelves; completely frozen lakes were almost always excluded using our classification approach. Active radar sensors such as Sentinel-1 could be used to detect these lakes.
We note that lakes covered by cloud shadows/under thin clouds were included in the classification results. For these two lake classes, depth retrieval will be impacted by the presence of cloud shadows/clouds, as they alter lake reflectance. Nevertheless, a thorough map of all visible lakes in images of ice shelves is a valuable product when assessing the evolution of lake areas.
Our cloud classification procedures for both S2 and L8, which are applied prior to lake detection methods, produce reliable maps of cloud occurrence. Through visual inspection, we found that nearly all visible clouds were captured in both L8 and S2 images. Some thinner colder clouds were missed, and at times, bright snow patches were misclassified as clouds. Overall, cloud masks provide critical information about where clouds could potentially obscure lakes, an important parameter to consider when analyzing temporal variations of lake areas and volumes.
Due to spectral similarities between shaded rocks and lakes, clipping out rock exposures is a critical step before application of lake detection methods. While the static rock outcrop masks such as those developed by Burton-Johnson et al. [27] provide a through picture of rock areas across the continent, dynamic rock detection methods are required to capture variability in rock areas due to seasonal snow cover. Visual assessment of our results confirmed that our dynamic rock/seawater masking approach for both L8 and S2 sensors allows accurate and thorough removal of sunlit and shaded rocks as well as ocean water, on a scene-by-scene basis.

Comparison with Manually-Digitized Lake Polygons
To complement our visual assessment of the results, we produced a validation dataset of manually-digitized lake boundaries. Distinction of visually-ambiguous classes, such as water-saturated snow, blue ice, and shallow lakes, is subject to user interpretation. Therefore, we only traced deeper lake areas that were least susceptible to such errors. Doing so, however, biases our accuracy assessments toward larger and deeper lakes.
For L8/ S2, we constructed a dataset of~340 km 2 /190 km 2 lake polygons manually derived from six/three scenes. Table 1 provides a summary of classifier accuracy for each scene. Overall accuracies of >94% for L8 and >97% for S2 imply reliable and accurate detection of distinct lake areas. Through visual inspection, we attribute most of the differences between the classifier and the validation data to the wrongful inclusion of frozen lake pixels in the validation data. In this regard, our classifier outperformed user's visual interpretation of lakes. Visually-ambiguous classes, such as slushy snow or partially-frozen lakes, also explain some of the observed differences, but in the absence of in-situ data, it is difficult to evaluate the classification performance in those regions. 98.4% Total traced lake area (Landsat 8)~340 km 2 Total traced lake area (Sentinel-2)~190 km 2 Average Accuracy (Landsat 8)~94.5% Average Accuracy (Sentinel-2)~97.5%

Comparisons with Other Methods
To further assess the performance of our method, we compared our results against those obtained by Halberstadt et al. [36]. They employed a supervised classification scheme based on Random Forest algorithms (50 decision trees), which were trained on datasets comprising spectrally diverse unsupervised clusters clumped into 11 classes (shallow lakes, deep lakes, slush, blue ice, trough ice, snowy ice, two types of cloud shadows, sunlit rocks, and shaded rocks). When compared to manually-traced lake areas (~150 km 2 ), this supervised classification method produced an average accuracy of 97%. We applied both our method and their 11-class Random Forest algorithm to all available L8 images collected at a single footprint (path:128, row:111) over the Amery Ice Shelf throughout the observational record. Images were filtered to remove scenes collected with sun elevations <20 • , and our automated cloud removal was applied. As can be noted in Figure 8, the two methods produce consistent lake areas for each scene (R 2 = 0.91; p-value = 1e −6 ), and overall provide a cohesive picture of lake evolution across each melt season over the 2013-2018 period. Comparison of 42 scenes and 6000 km 2 of detected lakes indicated no bias between the two methods and yielded an average RMSE of~70 km 2 , which translates to less than 1% of total lake area in all scenes. Given that the algorithm by Halberstadt et al. [36] was trained on unsupervised clustering of L8 scenes rather than a pre-existing definition of lake spectral characteristics, convergence of lake areas confirms that our spectral thresholds are valid (Figure 8). Generally, the differences in lake boundaries delineated by the two methods reflect uncertainty in the subjective definition and visual interpretation of supraglacial lakes. The thresholding method is better able to exclude cloud shadows and dirty ice near rock outcrops from classification results. This explains the overestimation of lake areas by the supervised classification method (Figure 8) in scenes with fewer lakes (October-December). On the other hand, the thresholding method is more sensitive to shallow or partially-frozen lake environments. Such sensitivity produces greater lake areas in January and February images as compared to those by Halberstadt et al. [36]. elevations <20°, and our automated cloud removal was applied. As can be noted in Figure 8, the two methods produce consistent lake areas for each scene (R 2 = 0.91; p-value = 1e −6 ), and overall provide a cohesive picture of lake evolution across each melt season over the 2013-2018 period. Comparison of 42 scenes and 6000 km 2 of detected lakes indicated no bias between the two methods and yielded an average RMSE of ~70 km 2 , which translates to less than 1% of total lake area in all scenes. Given that the algorithm by Halberstadt et al. [36] was trained on unsupervised clustering of L8 scenes rather than a pre-existing definition of lake spectral characteristics, convergence of lake areas confirms that our spectral thresholds are valid (Figure 8). Generally, the differences in lake boundaries delineated by the two methods reflect uncertainty in the subjective definition and visual interpretation of supraglacial lakes. The thresholding method is better able to exclude cloud shadows and dirty ice near rock outcrops from classification results. This explains the overestimation of lake areas by the supervised classification method (Figure 8) in scenes with fewer lakes (October-December). On the other hand, the thresholding method is more sensitive to shallow or partially-frozen lake environments. Such sensitivity produces greater lake areas in January and February images as compared to those by Halberstadt et al. [36]. We also compared our results with other threshold-based methods. As can be noted in Figure 9, the more conservative NDWI values such as 0.25 [13] and 0.3 [22] do not capture shallow lakes on the Nivlisen Ice Shelf, while also misclassifying cloud shadows as lakes. Our method reduces cloud shadow misclassification errors by including another thresholding method (B3-B4 > 0.07), while also allowing for shallow lake areas to be detected by using a lower NDWI threshold (0.19) (Figure 9). We also compared our results with other threshold-based methods. As can be noted in Figure 9, the more conservative NDWI values such as 0.25 [13] and 0.3 [22] do not capture shallow lakes on the Nivlisen Ice Shelf, while also misclassifying cloud shadows as lakes. Our method reduces cloud shadow misclassification errors by including another thresholding method (B3-B4 > 0.07), while also allowing for shallow lake areas to be detected by using a lower NDWI threshold (0.19) (Figure 9).

Cross-Validation of Landsat 8 and Sentinel-2 Derived Lake Areas and Depths
To ensure the interoperability of the full L8 and S2 dataset, we examined the results from spatio-temporally coincident L8 and S2 image pairs. These image pairs were either acquired on the same day or were one day apart. To do so, we clipped the image pairs to overlapping areas, and down-sampled the S2 depth products to match the resolution of the L8 products. Figure 10 and Table 2 summarize the results of depth and volume calculations for four sites located on the Amery, Roi Baudouin, and Riiser-Larsen ice shelves. Figure 10 indicates close agreement between depth measurements from the two sensors (R 2 = 0.85) with almost no bias (average RMSE of 0.2 m). The average volumetric difference for all the sites is less than 2%. To compare lake areas derived from each sensor, we computed the Sørensen-Dice similarity coefficient (Dice, 1945). Given two binary masks, the dice coefficient equals twice the number of elements common to both masks divided by the sum of the number of elements in each mask. The average 83% similarity between binary masks for each site point to a high level of consistency between lake detection methods applied to the two types of imagery. Therefore, despite the inherent differences between the two sensors and the nature of detail they capture, we are confident that our methods reproduce same lake areas and volumes. Future work will include evaluation of depth measurements against data from the ICESat-2 Satellite (Ice, Cloud, and land Elevation Satellite-2).

Cross-Validation of Landsat 8 and Sentinel-2 Derived Lake Areas and Depths
To ensure the interoperability of the full L8 and S2 dataset, we examined the results from spatio-temporally coincident L8 and S2 image pairs. These image pairs were either acquired on the same day or were one day apart. To do so, we clipped the image pairs to overlapping areas, and down-sampled the S2 depth products to match the resolution of the L8 products. Figure 10 and Table 2 summarize the results of depth and volume calculations for four sites located on the Amery, Roi Baudouin, and Riiser-Larsen ice shelves. Figure 10 indicates close agreement between depth measurements from the two sensors (R 2 = 0.85) with almost no bias (average RMSE of 0.2 m). The average volumetric difference for all the sites is less than 2%. To compare lake areas derived from each sensor, we computed the Sørensen-Dice similarity coefficient (Dice, 1945). Given two binary masks, the dice coefficient equals twice the number of elements common to both masks divided by the sum of the number of elements in each mask. The average 83% similarity between binary masks for each site point to a high level of consistency between lake detection methods applied to the two types of imagery. Therefore, despite the inherent differences between the two sensors and the nature of detail they capture, we are confident that our methods reproduce same lake areas and volumes. Future work will include evaluation of depth measurements against data from the ICESat-2 Satellite (Ice, Cloud, and land Elevation Satellite-2).

Lake Drainage Events on the Amery Ice Shelf
Surface meltwater lakes on the Antarctic ice shelves are believed to impact ice shelf stability [16,17,20,33,[43][44][45][46][47]. Ice shelf flexure as a result of lake filling and drainage events can lead to formation of fractures both within and outside the lake basins [44,48]. These fractures can trigger a chain reaction of further lake drainage events if they intersect nearby lake basins, potentially contributing to ice shelf break-up [17,49].
Here, we present the first observations of lake drainage outside of the Antarctic Peninsula. These drainage events occurred on the Amery ice shelf during the 2017 and 2018 melt seasons. Figure 11 presents various lake drainage types. Lake a (~3 km in diameter with an average depth of ~1 m) appears to have drained downstream into a river. Lakes b and c appear to be connected via a stream in the 4 January 2017 image. Sixteen days later (20 January 2017), Lake b is empty while Lake c has grown four times its original size. It is possible that meltwater from Lake b flowed downstream into Lake c.

Lake Drainage Events on the Amery Ice Shelf
Surface meltwater lakes on the Antarctic ice shelves are believed to impact ice shelf stability [16,17,20,33,[43][44][45][46][47]. Ice shelf flexure as a result of lake filling and drainage events can lead to formation of fractures both within and outside the lake basins [44,48]. These fractures can trigger a chain reaction of further lake drainage events if they intersect nearby lake basins, potentially contributing to ice shelf break-up [17,49].
Here, we present the first observations of lake drainage outside of the Antarctic Peninsula. These drainage events occurred on the Amery ice shelf during the 2017 and 2018 melt seasons. Figure 11 presents various lake drainage types. Lake a (~3 km in diameter with an average depth of~1 m) appears to have drained downstream into a river. Lakes b and c appear to be connected via a stream in the 4 January 2017 image. Sixteen days later (20 January 2017), Lake b is empty while Lake c has grown four times its original size. It is possible that meltwater from Lake b flowed downstream into Lake c. Lake d is a 4 km long lake on Amery Ice Shelf which is unconnected to Lakes a, b, and c. It is on average 2 m deep and holds 8.6 million cubic meters of meltwater. Lake d appears to have drained through the ice shelf over a 10-day period. Sentinel-1 radar images confirm lake drainage as opposed to a freeze-up ( Figure 11). In the case of these four lakes, our product allows for calculation of meltwater volumes that were either advected on the surface via streams and river (Lakes a, b, c) or vertically through ice shelf fractures (Lake d). Table 3 provides a summary of lake statistics before and after drainage events. Identification of such filling and drainage episodes, and quantification of lake volumes will improve our understanding of surface hydrology and its impact on ice shelf stability. Table 3. Statistics on filling and drainage events for four lakes on the Amery Ice Shelf as outlined in Figure 11.

Lake Evolution
Having assessed and verified the reliability of lake masks produced by our techniques for both L8 and S2 sensors, we examined seasonal changes of lake volumes for the Amery Ice Shelf. We applied our methods to calculate meltwater volumes contained in lakes for the 2013-2019 melt seasons (1 November-1 March) for every day when L8 images were available. Figure 12 presents a time series of lake water volumes for a portion of the Amery Ice Shelf outlined in the red box ( Figure  12A). Grey bars in the time series chart ( Figure 12C) indicate cloudiness per scene as estimated by our cloud estimation procedure and are meant to help with interpreting the observational records. Some dates with low total lake volumes are explained by low visibility due to clouds, which remain the main limiting factor in creating a high-resolution time-series record. When looking at smaller Lake d is a 4 km long lake on Amery Ice Shelf which is unconnected to Lakes a, b, and c. It is on average 2 m deep and holds 8.6 million cubic meters of meltwater. Lake d appears to have drained through the ice shelf over a 10-day period. Sentinel-1 radar images confirm lake drainage as opposed to a freeze-up ( Figure 11). In the case of these four lakes, our product allows for calculation of meltwater volumes that were either advected on the surface via streams and river (Lakes a, b, c) or vertically through ice shelf fractures (Lake d). Table 3 provides a summary of lake statistics before and after drainage events. Identification of such filling and drainage episodes, and quantification of lake volumes will improve our understanding of surface hydrology and its impact on ice shelf stability. Table 3. Statistics on filling and drainage events for four lakes on the Amery Ice Shelf as outlined in Figure 11.

Lake Evolution
Having assessed and verified the reliability of lake masks produced by our techniques for both L8 and S2 sensors, we examined seasonal changes of lake volumes for the Amery Ice Shelf. We applied our methods to calculate meltwater volumes contained in lakes for the 2013-2019 melt seasons (1 November-1 March) for every day when L8 images were available. Figure 12 presents a time series of lake water volumes for a portion of the Amery Ice Shelf outlined in the red box ( Figure 12A). Grey bars in the time series chart ( Figure 12C) indicate cloudiness per scene as estimated by our cloud estimation procedure and are meant to help with interpreting the observational records. Some dates with low total lake volumes are explained by low visibility due to clouds, which remain the main limiting factor in creating a high-resolution time-series record. When looking at smaller spatial scales, overlapping L8 and S2 images can improve temporal resolution to less than 5 days [41]. [41].
The Amery time series shows almost no melt ponding during the 1 November-1 January period across 6 years. This points to early January being the start of melt surface storage in this region, consistent with findings by Langley et al. [32]. Starting in January, meltwater volumes steadily increase before peaking in early February, and then decrease through the remainder of the melt season. While there is significant inter-annual variability, superimposing all available years together present a coherent picture of season supraglacial melt storage behavior on ice shelves Antarctica.

Conclusions
We present the results of a first approach to generate records of lake area and volume evolution in Antarctica across large spatial and temporal scales, exploiting data from optical satellite sensors, Landsat 8 and Sentinel-2 A and B. Assessment of our improved threshold-based lake identification scheme across four different meltwater systems, namely the Amery, Roi Baudouin, Nivlisen, and Riiser Larsen ice shelves, point to its potential to produce continent-wide maps of lake occurrence and evolution. This has been made possible by not only fine-tuning the NDWI-based thresholding method to detect lakes in Antarctica, but also by reduction of misclassification errors resulting from inclusion of spectrally-similar classes in the classification results (e.g., shaded rocks and cloud shadows). Prior to lake detection, two masking procedures are applied: (1) to identify clouds in each image, necessary for interpretation of temporal variations of lake areas and volumes, and (2) to detect areas of exposed on a per-scene basis instead of using static rock outcrop masks, which do not capture variability in rock areas due to variable seasonal snow cover. Our dynamic rock/seawater classification approach for both L8 and S2 sensors also allows for accurate masking of seawater, on a scene-by-scene basis.
Visual inspections of nearly ~1000 L8 and S2 images reveal consistent and reliable lake area delineations for images acquired at solar elevations above 20°. To complement our visual assessments, we compared the lake classification results against a validation dataset of manuallytraced lake boundaries. Overall accuracies of >95% for L8 and >97% for S2 further confirm reliable Grey bars indicate the percentage of all of the pixels within the entire image that are entirely obscured by clouds (cloud cover percentage per scene derived using our cloud detection method).
The Amery time series shows almost no melt ponding during the 1 November-1 January period across 6 years. This points to early January being the start of melt surface storage in this region, consistent with findings by Langley et al. [32]. Starting in January, meltwater volumes steadily increase before peaking in early February, and then decrease through the remainder of the melt season. While there is significant inter-annual variability, superimposing all available years together present a coherent picture of season supraglacial melt storage behavior on ice shelves Antarctica.

Conclusions
We present the results of a first approach to generate records of lake area and volume evolution in Antarctica across large spatial and temporal scales, exploiting data from optical satellite sensors, Landsat 8 and Sentinel-2 A and B. Assessment of our improved threshold-based lake identification scheme across four different meltwater systems, namely the Amery, Roi Baudouin, Nivlisen, and Riiser Larsen ice shelves, point to its potential to produce continent-wide maps of lake occurrence and evolution. This has been made possible by not only fine-tuning the NDWI-based thresholding method to detect lakes in Antarctica, but also by reduction of misclassification errors resulting from inclusion of spectrally-similar classes in the classification results (e.g., shaded rocks and cloud shadows). Prior to lake detection, two masking procedures are applied: (1) to identify clouds in each image, necessary for interpretation of temporal variations of lake areas and volumes, and (2) to detect areas of exposed on a per-scene basis instead of using static rock outcrop masks, which do not capture variability in rock areas due to variable seasonal snow cover. Our dynamic rock/seawater classification approach for both L8 and S2 sensors also allows for accurate masking of seawater, on a scene-by-scene basis.
Visual inspections of nearly~1000 L8 and S2 images reveal consistent and reliable lake area delineations for images acquired at solar elevations above 20 • . To complement our visual assessments, we compared the lake classification results against a validation dataset of manually-traced lake boundaries. Overall accuracies of >95% for L8 and >97% for S2 further confirm reliable and accurate delineation of distinct lake areas. To assess the performance of our method even more, we compare our results over the Amery Ice Shelf against those obtained by a supervised classification scheme developed in a companion paper (Halberstadt et al., [36]). The two methods provide a consistent and coherent picture of lake evolution across each melt season over the 2013-2018 period, producing an RMSE~70 km 2 (less than 1% of total lake areas).
To ensure the interoperability of the full L8 and S2 dataset, we examined the results from (near-)contemporaneous L8 and S2 image pairs. Despite the inherent differences between the two sensors, our comparisons indicate close agreement between their depth measurements (R 2 = 0.85) with almost no bias and an average RMSE of 0.2 m.
Future work will involve evaluation of depth measurements against data from the Ice, Cloud, and land Elevation Satellite-2 (ICESat-2). Furthermore, we will apply our classification methodology to the entire Antarctic Ice Sheet using L8 and S2 data over all regions of the continent that contain melt ponds and lakes. We will analyze all available data acquired throughout the austral summers from 2013 to 2019 (regardless of cloud coverage), which will yield a continent-wide product of lake occurrence, lake depths, and volumes, at 10-30 m spatial resolutions. Our dual-sensor approach will enable construction of high-resolution time series of lake areas and volumes at large spatial scales, by filling the gaps resulting from either cloud cover or non-ideal solar illumination conditions for either sensor. This product, which will be hosted by the Antarctic Glaciological Data Center (AGDC), will be crucial for better understanding of surface hydrological processes and their influence on Antarctica's contribution to sea level rise.