Fully Automated Detection of Supraglacial Lake Area for Northeast Greenland Using Sentinel-2 Time-Series

The usability of multispectral satellite data for detecting and monitoring supraglacial meltwater ponds has been demonstrated for western Greenland. For a multitemporal analysis of large regions or entire Greenland, largely automated processing routines are required. Here, we present a sequence of algorithms that allow for an automated Sentinel-2 data search, download, processing, and generation of a consistent and dense melt pond area time-series based on open-source software. We test our approach for a ~82,000 km2 area at the 79 ◦N Glacier (Nioghalvfjerdsbrae) in northeast Greenland, covering the years 2016, 2017, 2018 and 2019. Our lake detection is based on the ratio of the blue and red visible bands using a minimum threshold. To remove false classification caused by the similar spectra of shadow and water on ice, we implement a shadow model to mask out topographically induced artifacts. We identified 880 individual lakes, traceable over 479 time-steps throughout 2016–2019, with an average size of 64,212 m2. Of the four years, 2019 had the most extensive lake area coverage with a maximum of 333 km2 and a maximum individual lake size of 30 km2. With 1.5 days average observation interval, our time-series allows for a comparison with climate data of daily resolution, enabling a better understanding of short-term climate-glacier


Introduction
The accelerating ice loss of the Greenland Ice Sheet, especially during the last decade, has been detected in multiple studies and marked the island as a focus of cryospheric, atmospheric and oceanographic research and modeling [1][2][3]. Supraglacial lakes (SGL) characterize the melt area (upstream the grounding line) of several marine-terminating outlet glaciers [4,5]. SGL influence surface melts through lowering the albedo and can affect ice flow velocities after drainage through reducing basal friction [6,7]. Furthermore, the variability of their spatial extent and coverage has been reported to be linked to regional variations of surface temperatures [4]. For the monitoring of these lakes, multispectral satellite-based remote sensing is a reasonable method, as the areas that need to be monitored are vast, and the SGL, except in the case of cloud cover, can be mapped efficiently using single bands or band combinations. As a consequence, in (south-) west Greenland, numerous studies have related spectral information from different sensors (Table 1) to either in situ measurements of lake depths or sinks derived from digital elevation models (DEMs, [8] and references therein).
Though potentially, there are numerous multispectral sensors available, each technique has merits and flaws, mostly regarding resolution, acquisition frequency, spatial coverage and accessibility (Table 1). Thus, in most studies, a compromise must be made. Sentinel-2 A/B (S-2), with its multispectral instrument (MSI), combines the advantages of the sensors above, having 10 m resolution with less than two days revisit interval for polar areas available at no charge. Thus, even though data are only available since the launch of S-2 A in June 2015, it offers an unprecedented chance to detect and monitor SGL in remote and polar regions at high spatial and temporal resolution. Until now, there are only very few studies that employ S-2 for the detection of supraglacial lakes or rivers [9,10]. A common problem hindering an automated detection of SGL with multispectral sensors over Greenland is clouds. This problem is enhanced by the gap between the revisits of the satellite, especially for the Landsat series, as, if one or more acquisitions are unusable, large gaps in the time-series can occur. Furthermore, standard cloud detection products (Sen2cor, fmask) are optimized for vegetated and built-up areas but have deficiencies in reliably separating ice and snow from clouds over polar areas (see Sections 3 and 4). Thus, checking of data must be done manually, especially if only parts of the area of interest (AOI) are cloud-covered. While for small areas and/or single days, this is a valid procedure, processing of large areas or longer time-series requires a different solution.
The majority of all SGL detection studies for Greenland were carried out in western Greenland, in the region between Disco Bay (~70 • N) and Kangerlussuaq (~67 • N) (Figure 1), mainly because (a) of the number of SGL that appear here on the lower ice sheet each year, and (b) this part of Greenland is comparatively well accessible and logistics for groundtruthing of, e.g., lake depth is available [11]. Only very few studies were conducted outside this area, e.g., at Petermann Glacier or northwest Greenland ( Figure 1). The east coast of Greenland, and especially the northeastern part, has been subject to only one previous high-resolution multispectral study on SGLs [12]. Only one study using MODIS in 2009 covered the melting seasons 2002-2008 and found a maximum seasonal total lake area of 149.6 km 2 [13]. Contrasting to the spatial distribution of studies, northeast Greenland is expected to show the greatest inland expansion of SGL in the coming decades [14]. However, continuous monitoring of SGLs is currently lacking.
With this study, we aim to: 1. develop a method that allows for continuous and automated detection of SGLs using solely open-source software; 2.
provide an up-to-date and high-resolution dataset of SGLs for the Northeast Greenland Ice Stream (NEGIS) outlet glaciers, Nioghalvfjerdsbrae and Zachariae Isstrøm; 3.
detect characteristic features of melt pond development in northeast Greenland.  [5,13,[15][16][17][18][19][20][21][22][23][24][25][26][27][28]. Studies are color-coded by the year of publication. In case no exact coordinates were given in the respective study or a large area was monitored, the point was approximated to the center of the area of interest. In the case of multiple publications of the same first author for the same area of interest, only the first publication is shown. Studies covering the whole ice sheet, e.g., [29], are not shown.

Area of Interest
The study region of~82,000 km 2 covers the ice sheet upstream the grounding lines of Nioghalvfjerdsbrae (hereafter 79 • N Glacier) and Zachariae Isstrøm between sea level and~1500 m a.s.l. The extent of~150 km inland and, due to the widening of the glacier bed into the ice sheet,~300 km latitudinal, is covered by ten S-2 tiles ( Figure 1). The glacial dynamics of the region have been strongly affected by climate warming, e.g., through the collapse of the floating tongue of Zachariae Isstrøm in 2012 [31]. In contrast to the more southern Greenland east coast, the catchment of NEGIS shows comparatively gentle slopes, which enable the retention of surface meltwater upon the ice and thus the formation of numerous lakes during the melting season.

Preprocessing of Sentinel-2 Data, Screening, and Preselection
In total, 39,919 S-2 A/B level-1C (L1C) scenes were downloaded from the Google cloud storage using the Google Cloud SDK repository for Ubuntu (https://cloud.google. com/storage/docs/public-datasets/sentinel-2?hL=de, last accessed 24 May 2020). We requested a minimum data coverage of 90%, which was checked using quickviews, and a full set of 10 granules; all data with higher no-data content and no full coverage were removed. For all remaining days, the granules of bands 2 (blue, 0.460-0.520 µm), 3 (green, 0.534-0.582 µm) and 4 (red, 0.655-0.684 µm) were first reprojected to EPSG 3413 and then separately merged. In order to circumvent misclassifications in the next steps, we removed the areas not covered by the glacier (rock, ocean) using the GIMP land classification map [30], which was edited manually using two true-color S-2 A images from July 2016 to accurately delineate the up-to-date ice margin.
S-2 data are available between mid of March and mid of September for northeast Greenland, though the first and last day of the year (DOY) for which a complete set of images was found varies between DOY 82-93, and 262-263, respectively. Though S-2 A was launched in June 2015, the data coverage and quality for its first year were found too low to facilitate a meaningful time-series analysis. With S-2 B starting in March 2017, the year 2016 had the lowest data coverage among the 2016-2019 period, while the highest number of complete scenes was found for 2019 (Table 2). On average, 120 complete scene sets were found per year, resulting in an average interval of 1.49 days.

Water Area Delineation
The complete algorithm developed to detect and calculate lake areas is summarized in Figure 2. Following Pope and colleagues (2016), we applied a static band ratio of the blue (band 2) and red (band 4) top-of-atmosphere reflectance, as successfully used in previous studies for Landsat and MODIS [4,11,24,32]. We also tested the normalized difference water index (NDWI) as used by Williamson and colleagues [9], but continued with the band ratio due to comparable results but significantly less computation time (see Section 4.1.1). For determination of the threshold which delineates ice/slush from water, we empirically tested values between 1.0 and 2.4 in 0.2 steps and compared the resulting masks to true color images for several dates and found >1.6 as the best fit. Subsequently, for all dates, the B/R ratio was calculated, and the threshold of 1.6 applied, resulting in binary water masks with 0 as ice and 1 as water.

Postprocessing
The removal of noise is the main aim of postprocessing. Four main sources of noise were identified and addressed: • Water-soaked snow and meltwater channels; • Lake ice on SGLs, resulting in "donut lakes" [33]; • Topographic shadows misclassified as water; • Clouds and cloud shadows, covering lakes or being classified as water areas.
The effects of the subsequent steps are exemplarily visualized in Figure 3.

Area Reduction
To allow for faster computation, the binary B/R ratio images were first cropped to the ablation area of 79 N. We define the eastern extent by the glacier outline mask (Section 2.2) combined with the grounding line based on ERS-II SAR [34], and the northern and southern boundaries are delimited by the extent of S-2 granules 26XMQ, and 26XMN/26XNN, respectively. The western extent is limited by the surface mass balance equaling zero, calculated using the regional atmospheric climate model version 2.3 (RACMO2.3) [35].

Topographic Shadow Masks
Especially in spring and autumn (March-May and September/October), due to the low solar elevation angles of >70 • from nadir ( Figure A1), large shadows are cast on the ice even though small topographic features such as hills, crevasses or seracs. These shadows have a similar spectral blue-red ratio as water and thus are commonly misclassified using the band ratio method [36]. To correct for these misclassifications, we applied a topographic shadow model using the R package "insol" [37] ( Figure A2). We used the ArcticDEM with 10 m resolution to fit the resolution of the S-2 derived water mask. Additionally, the sun position (elevation and azimuth) is retrieved for the exact time of image acquisition from the S-2 metadata. As the model requires substantial memory resources, we divided the DEM first into parts covering each granule, and subsequently split each part into nine (3 × 3) equal pieces and calculated the topographic shadows for each part and day separately. Afterward, the small binary shadow masks were merged again to cover the whole AOI. A random sample of 30 of the resulting binary masks from all years and seasons was compared visually to the natural-color image of the respective date to ensure fitting and quality of the calculation; afterward, the shadow mask was subtracted from the previous water mask.
As the computation of shadow masks is computationally expensive, we developed a general function for the sun's elevation and azimuth based on S-2 metadata of 2016, 2017 and 2018 to speed up this process. Validity-tested using the 2019 data, the pre-computed shadow masks serve as a look-up table for future processing. To achieve that, we calculated the mean for both parameters for each available acquisition and used a local regression to predict the missing values and smoothen the existing ones in order to achieve an idealized solar cycle ( Figure A1). These estimated values of elevation-and azimuth angles were used for the shadow calculations of 2019 and are also implemented in the continuously running algorithm.

De-Noising and Filling
The shadow-corrected rasters were subsequently filtered using the sieve tool embedded in the geospatial data abstraction library (GDAL, [38]). To find the lowest possible pixel threshold to retain the maximum information and remove the maximum percentage of noise, we incremented the size threshold starting with 10 at a connectedness of 8 (diagonal pixels are counted as connected). We found the best tradeoff between signal and noise to be reached at 150 pixels, thus setting the minimum detectable lake size to 15,000 m 2 or 0.015 km 2 .
To accelerate computation, the conversion from raster to polygons should be done at the earliest stage possible. We implemented this step after the removal of small pixel clusters (sieving). The conversion to polygons included the filling of circular multipolygons ("donuts") by unifying multipart polygons into single parts.

Masking with Topographic Sinks
Supraglacial lakes are known to form in numerous surface sinks of the Greenland Ice Sheet during each melt season. As sinks are generated by bedrock undulations, SGL form at more or less the same position each year, rather than moving with the ice. It has been suggested that their position is largely controlled by the underlying bedrock topography [39]. Therefore, it is possible to detect potential lake locations by analyzing topographic data of the ice sheet. For this, we employed the ArcticDEM, gridded to 100 m spatial resolution [40]. In classical hydrologic applications, surface depressions are often assumed to be artificial and are filled prior to the calculation of, e.g., stream networks. This assumption does not hold for the Greenland Ice Sheet where surface depressions exist. In order to detect these topographic sinks, we deployed the fill_depressions tool implemented in the open-source geospatial analysis library WhiteboxToolsTM [41]. The resulting depressionless DEM was then subtracted from the original DEM and differences <0 were masked as sinks. Finally, we employed a 3 × 3 majority filter on the sink mask and assigned an explicit ID to each sink.
To be able to track individual lakes, a spatial join was conducted to attribute the lakes to their respective sinks. This resulted in 1035 individual sinks, covering between 0.01 and 97.4 km 2 , of which 880 were water-filled at least once.

Cloud Detection
Currently, no reliable procedure for S-2 exists that can separate all kinds of clouds from ice-or snow-covered areas. The partial transparency of cirrus-type clouds further complicates the issue. The scene classification map generated during the computation to Level 2A generally allows for a separation of different cloud types and snow. Nonetheless, the delineation of clouds over ice gives, based on a sample taken from different years, months and weather conditions, different but not more accurate results compared to the L1C cloud mask over northeast Greenland. Tests with other methods (e.g., fmask [42] or SWIR reflectance [9]) resulted in similar unsatisfying cloud masks (see Section 4.1.2). An approach based on image time-series, as implemented in MAJA [43] or integrated into a deep learning environment as with s2cloudless by Sentinel Hub (https://medium.com/ sentinel-hub/sentinel-hub-cloud-detector-s2cloudless-a67d263d3025) are promising, but were not implemented in this study as they would have required substantial additional technical effort.
As an intermediate solution, we implemented a two-step approach that, following the MAJA approach, is based on the detection of changes in ground features over time.
Time-Dependent Lake Visibility First, we combine the sink mask polygons with the polygonized lake mask of the date in question and those of the previous and following 15 scenes. Iteratively, the sink IDs get classified as -0, if no area intersection is detected at day x and (a) was not detected since the beginning of the year, or (b) is not detected in the following 15 scenes; -1, if no area intersection is detected at day x but has already been detected since the beginning of the year and reappears during the next 15 scenes; -2, if an intersection is detected (lake is visible).
Scenario 0 applies to the start (a) and the end (b) of the melt season and avoids wrong classifications of still/already frozen or empty SGL. IDs classified as 1 are labeled as cloud covered, and the (non-existing) area is replaced by the mean of the last and the next area. Potential errors arise from fast-draining SGL or partial cloud coverage; these are discussed in Section 4.1.2. This is a false-negatives correction intended to identify probably existing but cloud-covered lakes.

Detection by Spatial Cloud Extent
The second step is based on the assumption that thick clouds (or their shadows) often cover areas larger than a single SGL. To catch these larger clouds, we compare the water polygons of each date before clipping with the sink mask to the polygons within the sink mask to check if any water polygon overlaps more than one sink. If this is the case, the respective lake IDs which are covered get labeled with 1 (cloudy, see above) for this date. This is a false-positives correction, preventing dark clouds or cloud shadows from being labeled as lakes.

Total Error Assessment
Potentially, several error sources can influence the outcome of the processing chain and may add up or elevate each other out. The most important uncertainties arise from (a) the B/R threshold choice, (b) the accuracy of the sink mask, (c) the rigidity of the sieving, (d) the ability of the cloud filters to remove false classifications and (e) the accuracy of the shadow masks. As each of the error sources are difficult to even estimate, we compare the final product of the algorithm to a set of 100 manually identified lakes, randomly chosen from the entire time series. This approach relates to the error assessment of Selmes and colleagues [32] and was chosen as it ensures a completely independent dataset independent of preconditions such as low cloud cover or large lake size. It should nonetheless be noted that also a manual delineation introduces errors, as digitizing is (a) subject to interpretation and (b) not constrained to pixel borders. To estimate this additional error, we adapted the recommendations given by Paul and colleagues for glacier outlines [44] and repeated the digitizing of 10 lakes of different sizes two times in order to get three individual and independent sets. Additionally, for each sample scene date, we classified the degree of cloud cover by visual inspection, ranging from 1 (no clouds at all) to 6 (extremely cloudy) to be able to relate large errors to cloud conditions (other classes: 2 = rare/thin clouds; 3 = few cloudy patches; 4 = medium cover; 5 = strong/thick cloud cover). We also classified the quality of the merged RGB, based on, e.g., artifacts caused by fast moving clouds, or partial missing data, in classes from 1 (perfect image) to 6 (barely useable).

Results
For the whole period, lakes appeared in 880 of the 1035 sinks. The majority of all lakes (93.73%) have an area below 1 km 2 ( Figure 4B). These lakes account for 48.63% of the total detected lake area between 2016 and 2019. The largest single lake detected has an area of 30.49 km 2 and appeared on 29 July 2019. The mean/median SGL size is 274,540/64,212 m 2 , with a standard deviation of 775,750 m 2 . In total, the 10 S-2 granules cover an area of 82,427 km 2 , with the ablation mask covering 27,504 km 2 . The mean/median error is 2505/3870 m 2 per lake, or 0.044/0.024 m 2 per pixel (100 m 2 ).  The melt onset can be defined as the date when free water is continuously present in the snowpack [45]. After a few days of continuous melt days, SGL becomes visible through the melt of covering ice or the confluence of mobile water. GL appeared earliest in 2017 and disappeared latest in 2018, the latter being delayed by 25 and 14 days compared to 2017 and 2016, respectively (Table 3). Though 2018 thus has the latest appearance and simultaneously the lowest maximum total lake area, maximum lake extent and number generally do not seem correlated with the timing of the melt season. Table 3. Lake area characteristics per year. If there is still > 1 km 2 of lake area at the end of the sensing period, the melt season end cannot be determined (-).

Year
Max Lake Area (km 2 )

Northeast Greenland
Two studies have previously focused on SGL detection in the northeast Greenland sector, though the study by Selmes and colleagues [32] also contains data from northeast Greenland.
The first study by Sundal and colleagues [13] using MODIS covers the period 2003 to 2007. They report maximum daily SGL areas between 81.1 (2006) and 149.6 (2003) km 2 , which is significantly less than our assessment (Table 2), though the lowest maxima are comparable (76.66 km 2 in 2018). This can be explained by mainly three factors: (1) the resolution of MODIS is 250 m/pixel ( Figure A3); thus, the minimum detectable SGL size is 62,500 km 2 , thus lakes smaller than 0.1 km 2 were not included in the study. For our dataset, these lakes account for 58.78% of all lakes in number and 7% of the total SGL area. 53% of this day's SGL area. Nonetheless, it should be noted that our study probably underestimates the true lake area (Section 4.2) and that the difference may be even higher. Regarding timing, the results are generally in agreement: the maxima are reached between DOY 202 and 223 for the former study, and between DOY 206 and 233 in this study, with the dates being subject to data availability. The active melt season varies between DOY 160-180 (143-164) and DOY 258-278 (253-263) for the former study (this study).
The second and newer study by Schröder and colleagues [12] mainly employs polarimetric SAR (Sentinel-1), but also involves data from Sentinel-2 and covers the period 2017-2020. Their research area covers 76 to 80 • N and −22 to −34 • W, up to >2000 m a.s.l. As two datasets with different properties and results are involved, they need to be compared individually to the outcome of this study. In general, the SAR-based method results in significantly larger water areas (up to a factor of 3) than the S-2 based detection methods. We attribute this difference mainly to (a) the difference between the measured electromagnetic spectra and thus the measured properties of the ground, (b) the inability of multispectral sensors to penetrate ice and snow, and (c) the different definitions of SGL. The classification of the Sentinel-1 backscatter is mainly based on the total reflection of the microwaves away from the sensor and a resulting minimal backscatter due to the side-looking geometry of the SAR sensor. Additionally, microwaves from the C-band are able to penetrate snow, dependent on its water content and ice [46]. Potentially, these properties can be utilized to measure the total water content of the glacier surface, including liquid water covered by frozen surfaces. The latter is a major advantage over Sentinel-2 based classifications, where only visible water can be detected. On the other hand, the SAR detection is heavily influenced by liquid precipitation and the high water content in snow during the main melt season [12]. Consequently, it is probable that, during summer, SAR SGL detection overestimates lake areas, whereas multispectral methods underestimate the true lake area during the early and late melt season. Therefore, a direct comparison of the total lake area is challenging. For the Sentinel-2 dataset in the former study, the total lake area is generally smaller than within our study, except for the maximum extent of 2019. We attribute this mainly to the use of the NDWI with a threshold tested for west Greenland. Due to the difference in radiation between 70 • N and 79 • N, a tuning of the threshold to local conditions would probably have resulted in larger lake areas [47].

Comparison with Other Areas in Greenland
A wealth of studies focusing on SGLs were performed in the perimeter of Jakobshavn Glacier, West Greenland ( Figure 1) using MODIS, Landsat, ASTER, or a combination of these sensors. Due to the large number which would require a distinct review article to compare, and the fact that most of the studies focus on determining lake depths and volumes, we concentrate on studies (a) of great spatial extent, e.g., the whole ice sheet, and (b) time-series studies, as these may deliver similar parameters such as melting season, annual maxima, etc.
Of the sheet-wide studies, Selmes and colleagues [32] found a mean/median lake area of 0.80/0.56 km 2 (0.63 km 2 for northeast Greenland), which is large compared to our results but may be influenced by the minimum resolution of MODIS, preventing a large number of small lakes from being detected (Section 3). Additionally, they report a maximum lake size of 16.8 km 2 for the whole Greenland Ice Sheet (GrIS), and a mean total lake area of 268 ± 79 km 2 for northeast Greenland between 2005 and 2009. We attribute the differences, especially in maximum lake size, to an upward migration of the ELA, especially in warm years, and consequently an increase of the ablation area in higher altitudes. This is in accordance with Liang and colleagues [48] as well as Gledhill and Williamson [4], who confirmed that SGL area in high altitudes could vary largely, and single lakes may even drive the dynamics of the whole SGL area for a region. As a contrast, we found lakes close to the grounding line to exhibit similar areas independent of the year, which is in accordance with previous time-series studies [22,48,49].
The peaks in the total lake area as reached in late July/early August, appear roughly two to four weeks later than at Petermann Glacier [19], as well as in West Greenland [23], though Fitzpatrick and colleagues report a spread of the date of maximum total lake volume between 5 June and 8 August for the period 2002 to 2012 [15].

Static Band Ratio
The similarity of the spectral information of topographic shadows and water in the blue and red band, compared to the near-and shortwave infrared, is a major disadvantage of the B/R ratio and requires correction in postprocessing. When processing time-series, the computation of the binary topographic shadow masks are required for each acquisition and each granule to be analyzed. For the S-2 NDWI, either band 8 (NIR, 10 m resolution) or band 12 (SWIR, 20 m resolution) needs to be resampled for each date. Consequently, from a computation time perspective, NDWI is better suited for short time-series, as processing shadow masks is time-consuming. Longer time-series, however, profit from the pre-computed shadow masks so that the B/R ratio outperforms NDWI at a certain point, which is subject to hardware resources and a number of scenes in question. For areas without considerable topographic shadows, NDWI and B/R ratio without correction can be expected to deliver similar results [47].
The majority of studies on SGLs use a band ratio with dynamic thresholding [29,32,50]. In theory, this approach is superior to static thresholding, as it is less vulnerable to low contrast, as, e.g., with low sun angles, or if half-transparent cirrus clouds are present. Thus, it may prevent outliers and is potentially applicable to larger areas [29]. Our decision for a static threshold was mainly driven by the computational effort, which is less for static band ratios and thus preferable for large datasets given limited resources. Leeson and colleagues [51] suggested the combination of three different methods for the best accuracy regarding lake size and number, which would have tripled the processing and analysis time. In addition, Williamson and colleagues found that B/R ratio static thresholding resulted in lower RMSE values than other SGL area algorithms if tuned to the study region [47]. Thus, we are confident that our approach is adequate given the high number of scenes, but if applied to other regions, the static threshold should be adapted or replaced by a dynamic threshold if simultaneous analyses of different areas are desired.

Cloud Correction
Step 1 of the cloud correction implements a time-dependent control that is based on the dynamic nature of weather over northeast Greenland, e.g., with respect to the position and strength of the jet stream and the NAO or the occurrence of katabatic winds or piteraqs [52,53]. Though it is reasonable to assume that a lake is not masked by the same cloud for several days, an obscuring for more than one scene is likely. The relatively long threshold of ±15 days, covering ±10 scenes on average, increases the chance of finding a cloud-free observation. This comes at the cost of the detection of rapid lake drainage and refilling, which may happen in less than seven days [48,54], and the dis-and reappearance of a lake within this period will be classified as clouded. As such, this classification step, as well as step 2, acts as a low-pass filter for individual lakes. In addition, it is not possible to detect partial cloud coverage of lakes using this procedure. The differentiation between partial cloud cover and partial drainage does necessarily involve data from either other bands or data sources.
To quantify the performance of cloud detection algorithms is a challenging subject, as, aside from manually labeled datasets, no validation data exists. As manual classification is a time-consuming task that additionally requires training, result validation is typically done on a limited number of scenes, of which only a small percentage, if at all, covers snowor ice-capped surfaces [55][56][57][58]. As a comparison of algorithm performance, or an error estimation for either the L1C-, Level-2A (L2A) cloud masks or fmask over ice are currently lacking, we compared the results of our cloud correction algorithms for 50 randomly chosen S2 scenes (granules 26XMP and 26XNP) to fmask and the L1C-and L2A classification masks ( Figure 8). The different cloud masks have individual properties, e.g., fmask differentiates between several land cover types with clouds as one category, the L2A cloud mask grades clouds by thickness/type, and the L1C cloud mask is simply binary. To compare these different types, we generated binary cloud masks, in the case of fmask containing only the categories clouds and else, and in the case of the L2A mask, all values deviating from 0 were classified as clouds. For our cloud correction, we used the cloud classification matrix to classify the sinks as clouded or cloud-free, regardless of actual lake size. Again, a classification of a cloud mask as even partially wrong is ambivalent, as, through thin clouds, lakes may be detectable. Thus, a potentially high percentage of detected lakes that are classified as cloud-covered is likely. To quantify the potential error adequately, we therefore also compared the lakes classified as cloudy by our procedure to their cloud cover by the other masks ( Figure 9). The standard Level 1C cloud mask performs best on average. The fit of fmask, especially for the cloud-free lakes, is strongly influenced by few heavy outliers ( Figure 9B), whereas the L2A cloud mask obviously overestimates cloud cover. The latter is presumably skewed by our classification of light clouds as thick clouds. Comparing the fit of the three cloud masks among each other, the L1C mask and fmask generally exhibit the least differences (RMSE 10.89 for cloud-free lakes and 6.60 for clouded lakes, see Figure A4).
As these relations are relative to the cloud detection implemented in our processing chain, these values must be regarded with caution, as we cannot assume the cloud detection works perfectly and is probably still prone to errors, e.g., arising from partial cloud cover, cloud shadows, or drainage events. Thus, the need for a better performing cloud detection over ice and snow persists, and the results of this study would benefit substantially from new and more exact approaches in this field.

Lake Drainage Detection
Rapid lake drainage events are one of the most interesting features of SGL, as these may lead to a (local) increase in ice velocity [6,59,60]. Our SGL time-series offers, through its high temporal and spatial resolution, the chance to detect rapid lake drainage events occurring in less than two days. We tested the potential of the improved temporal and spatial resolution on a known drainage event analyzed by Schröder and colleagues [12]. The lake (ID 577 in our dataset), located at N 78.87 W 21.73, has been found to drain completely between 20 July 2017 and 26 July 2017 ( Figure 10). Within the seven-day period, the lake size is reduced from 2.34 km 2 to 0.04 km 2 , equal to 1.83% of the original lake area. Due to the daily resolution of our dataset within this period, the timing of the drainage event can be narrowed down to have occurred between 20 July and 23 July 2017, thus complying with the criteria for rapid drainage events by Morriss et al. [23], Fitzpatrick et al. [15] and Liang et al. [48]. A more exact determination of the event is hindered by thick cloud cover, a problem that has been discussed in detail by Cooley and Christoffersen [28]. According to the classification proposed in their study, a lake must drain 90% of its maximum area within 24 h (given daily observations with MODIS) to be classified as rapidly draining, while cloud-covered observations are excluded from this period. As stated in Section 4.2.2, our two-step cloud detection prevents, in probably most cases, the classification of an area reduction as rapid lake drainage due to the interpolation interval of maximum15 days. This is perfectly illustrated by the example in Figure 10, as the lake loses more than 90% of its area within 48 h; without consultation of the cloud masks, this would, following Cooley and Christoffersen [28], not be classified as rapidly draining. As 21 July is flagged as cloud-covered by cloud mask #1 and 22 July by cloud mask #2 (Figure 10), the drainage event should be classified as rapid. This corroborates the findings that there is an observation bias in rapid drainage event analyses based on multispectral satellite data due to cloud cover, hindering rapid drainage events from being defined as such. We strongly agree that cloud masks, if available, should be considered in the definition of the speed of lake drainage events.

Error Discussion
The calculated root means squared error (RMSE) between both sample datasets sums up to 189,377.2 m 2 ( Figure 11A). This compares well to the root mean square deviation (RSMD) of 0.22 km 2 given by Sundal and colleagues [13], who compared a sample of 53 lakes obtained from MODIS and ASTER imagery. In contrast to the RMSD of 0.007 km 2 given by Williamson and colleagues [9] using S-2 and Landsat 8, the RMSE is large; this can be reasoned by several points: (i) the cloud correction implemented is still inferior to manual checks, (ii) manually delineated lakes as ground truth sample introduce a larger error compared to Landsat-derived lakes, where the same methodology was applied and (iii) large errors in our sample dataset that mainly stem from low contrasts, with static thresholding probably producing less exact results in these cases ( Figure 11C). Setting a threshold to delineate ice from water is potentially risky, as the transition from solid to fluid is not separated by a sharp border, but continuously, with intermediate states such as slush; also, if the water is covered by an ice layer, the depth of the underlying lake influences the reflectance, both of which have a continuous measure. As to this day, the only way to minimize classification errors is an expert-driven manual mapping; every automated procedure is a compromise between quantity and quality. Due to increasing quality, detail and sources of satellite data, this conflict could potentially be minimized through the increasing use of deep learning and/or artificial intelligence, both of which are able to include the time domain in the analysis, and thus the evolution of the glacier surface. Our approach to reducing the possible error through empirical identification of the most precise threshold and the integration of the time domain in postprocessing nevertheless aims at the best possible and most realistic results, though an exact error cannot be quantified.

Conclusions
Multiple studies mapping SGLs in Greenland have used MODIS, albeit its low spatial resolution, or Landsat, which suffers from long repeat cycles, preventing short-term glaciological or climate-induced processes from being detected. For all multispectral sensors, cloud cover, especially over snow and ice, is an obstacle that is hard to overcome using band combinations alone, thus hindering fully automated image processing and requiring manual inspection of the data.
In this study, we explore the potential of S-2 to detect SGL in a fully automated way. S-2 combines the merits of both MODIS (high temporal resolution) and Landsat (high spatial resolution), especially in high latitudes. For the melt seasons 2016 to 2019, we detected 880 lakes with a minimum size of 0.015 km 2 and an average temporal resolution of 1.5 days. The main improvements on the few previous studies employing S-2 for this purpose are (a) the consideration of topographic shadows, limiting the error induced by the spectral similarity of shadows and water on ice, and (b) the implementation of a two-step cloud filter that does not require manual inspection, and takes spatiotemporal characteristics of clouds and lakes into account. The high temporal density of the dataset enables the detection of rapid lake drainages as well as the analysis of the glacier surface's changes due to local weather phenomena, potentials which have only been initially tapped in this study and demand for in-depth exploration of the data. As with all studies employing multispectral data, clouds still introduce a potential source of error; thus, the accuracy of the algorithms will benefit from any future progress in cloud detection using S-2 [61].
The whole processing chain was developed using free and open-source software alone. It enabled us to detect and track 880 SGLs over four years, allowing insights into spatial and temporal properties of surface melt of the NEGIS. The processing chain is designed to be converted into a continuously running version with only minor changes, so near real-time changes can be detected. Given the availability of cloud computing and the continuous increase of storage and computation capabilities, the algorithms are also scalable to investigate SGLs over larger areas or even the whole Greenland Ice Sheet. Data Availability Statement: Lake outline polygons for 79 N as well as cloud masks are currently available on request and will be uploaded to Pangaea Data Center after manuscript publication. Figure A2. Algorithm for calculating topographic shadows for the whole area of interest (AOI) per day, executed for each date with full coverage. Rectangles are input/output data; diamonds are processes. Figure A3. Influence of pixel size on lake detection: contours of the largest detected lake using Sentinel-2 on 29 July 2019, against MODIS band 1 reflectance of the same day (MOD09GQ, 250 m resolution). Figure A4. Relations for the clouded (a-c) and cloud-free (d-f) test data for the L1C-, fmask-and L2A cloud masks, with the respective RMSE. Color-code as displayed in Figure 9.