Antarctic Supraglacial Lake Identification Using Landsat-8 Image Classification

Abstract: Surface meltwater generated on ice shelves fringing the Antarctic Ice Sheet can drive ice-shelf collapse, leading to ice sheet mass loss and contributing to global sea level rise. A quantitative assessment of supraglacial lake evolution is required to understand the influence of Antarctic surface meltwater on ice-sheet and ice-shelf stability. Cloud computing platforms have made the required remote sensing analysis computationally trivial, yet a careful evaluation of image processing techniques for pan-Antarctic lake mapping has yet to be performed. This work paves the way for automating lake identification at a continental scale throughout the satellite observational record via a thorough methodological analysis. We deploy a suite of different trained supervised classifiers to map and quantify supraglacial lake areas from multispectral Landsat-8 scenes, using training data generated via manual interpretation of the results from k-means clustering. Best results are obtained using training datasets that comprise spectrally diverse unsupervised clusters from multiple regions and that include rock and cloud shadow classes. We successfully apply our trained supervised classifiers across two ice shelves with different supraglacial lake characteristics above a threshold sun elevation of 20°, achieving classification accuracies of over 90% when compared to manually generated validation datasets. The application of our trained classifiers produces a seasonal pattern of lake evolution. Cloud-shadowed areas hinder large-scale application of our classifiers, as in previous work. Our results show that caution is required before deploying 'off the shelf' algorithms for lake mapping in Antarctica, and suggest that careful scrutiny of training data and desired output classes is essential for accurate results.
Our supervised classification technique provides an alternative and independent method of lake identification to inform the development of a continent-wide supraglacial lake mapping product.

under 'worst case' greenhouse-gas emissions scenarios [9]. In contrast, mass loss from the Antarctic Ice Sheet has the potential to raise global mean sea level by tens of meters in future centuries (e.g., [10][11][12][13]) and is projected to dominate global sea level rise in the near future [14,15]. Surface meltwater plays a central role in ice sheet contributions to sea level through both direct surface meltwater runoff and indirect ice dynamical impacts. Liquid meltwater produced on the surface of an ice sheet can pool into bodies of water, such as lakes or streams, if underlain by an icy surface that is sufficiently impermeable. The influence of supraglacial hydrology on ice-sheet dynamics has been extensively explored for the Greenland Ice Sheet, where rapid drainage of surface lakes via hydrofracture to the ice-sheet bed during the early melt season and the development of perennial river networks that drain into moulins during the mid to late season influence subglacial effective pressures and ice flow velocities on short and longer timescales (e.g., [16][17][18][19][20][21][22][23][24][25][26][27][28]).
Supraglacial lakes are of particular importance in Antarctica, where their presence has been shown to induce ice-shelf collapse [29][30][31][32][33], which impacts the flow of upstream ice (e.g., [34][35][36][37][38]) and can trigger dynamic instabilities [39][40][41][42][43]. As Antarctic air temperatures increase, supraglacial lakes will become an increasingly important component of ice sheet mass balance through both direct export via drainage by surface streams [44,45] and meltwater-induced hydrofracturing that can trigger ice-shelf collapse and rapid sea-level rise due to associated dynamic acceleration of interior ice [13]. Supraglacial lakes are generally visually distinct in satellite images; dark blue lakes generally stand out against a white background of snow and ice. Individual lakes are relatively easy to identify manually, but mapping lakes reliably, repeatably, and efficiently across large spatial and temporal scales requires a systematic remote sensing approach.
Previous efforts to identify supraglacial lakes from multispectral satellite data have primarily focused on the Greenland Ice Sheet. A threshold-based approach (i.e., classifying lakes by whether or not reflectances exceed a threshold) has been successful for delineating supraglacial lakes using multispectral satellite data across multiple instruments. Varying thresholds by region have been applied to individual band reflectance values as well as the ratios between bands, such as the normalized difference water index, in order to identify supraglacial lakes (e.g., [25,[46][47][48][49][50][51][52][53][54][55][56][57][58][59][60]). Another approach, dynamic band thresholding, compares the red reflectance of each pixel to the mean red value within a moving window surrounding the pixel (e.g., [61,62]). Williamson et al. [63,64] incorporate this technique into their Fully Automated Supraglacial lake Tracking (FAST) algorithm for capturing lake drainage events. A dynamic moving window approach has also been successfully applied using histograms rather than a pixel mean [65,66]. Lake boundaries can be refined by assuming a bimodal distribution of band ratios [52], and textural analysis has been used to identify supraglacial lakes based on a maximum-likelihood algorithm [67]. Finally, object-oriented classification incorporates size and shape criteria in addition to band reflectance thresholds [49].
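As a concrete illustration of the band-ratio thresholding described above, a normalized difference water index computed from blue and red reflectance can be thresholded to flag lake pixels. The sketch below uses NumPy; the function names and the 0.25 threshold are illustrative only, since published thresholds vary by region and sensor.

```python
import numpy as np

def ndwi_ice(blue, red):
    """Normalized difference water index adapted for ice surfaces,
    computed from blue and red reflectance arrays."""
    blue = np.asarray(blue, dtype=float)
    red = np.asarray(red, dtype=float)
    # small epsilon guards against divide-by-zero over saturated pixels
    return (blue - red) / (blue + red + 1e-12)

def lake_mask(blue, red, threshold=0.25):
    """Binary lake mask from a fixed NDWI threshold (illustrative value)."""
    return ndwi_ice(blue, red) > threshold
```

Dynamic variants of this approach replace the fixed threshold with a statistic computed in a moving window around each pixel.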
Although Antarctic lakes are smaller and shallower than Greenland lakes [54], they can potentially exert a much larger influence on ice-sheet stability and global sea level, because most of the Antarctic Ice Sheet is marine-terminating and surrounded by large floating ice shelves (covering 1.6 million km² [68]) that are vulnerable to hydrofracturing [29][30][31][32][33]. Antarctic supraglacial lakes have been mapped on individual ice shelves using band thresholding methods [44,54,69,70] but have only recently gained wide appreciation. Kingslake et al. [71] were the first to note widespread surface lakes across Antarctica by manually mapping meltwater features across a temporal composite of cloud-free Landsat imagery. Stokes et al. [72] also mapped lakes in East Antarctica from a composite of cloud-free Landsat imagery, and provided a minimum estimate of lake area during one month of a high-melt summer (January 2017) using a normalized difference water index threshold. Both of these spatially widespread mapping approaches employ image mosaics that merge scenes from different time periods, and therefore capture a time-integrated snapshot of lakes rather than providing detailed information about lake evolution through time. These mosaics use only imagery acquired under ideal conditions (i.e., cloud-free, high sun elevation), which is not representative of the majority of imagery available over Antarctica.
In Antarctica, a high degree of user intervention and effort (i.e., manual interpretation of images) has been required to identify and map supraglacial lakes from individual images, prohibiting broad spatial and temporal coverage. Our goals for this study are to: (1) develop a method for accurate lake identification that is broadly applicable in space and time and is robust for different ice environments; (2) assess the sensitivity of lake identification to training data and especially training classes; (3) assess the sensitivity of lake identification to classification algorithms; and (4) investigate the transferability of our classification scheme across two ice shelves. Specifically, we develop a completely automated method to identify and map lakes in Google Earth Engine, using a combination of unsupervised and supervised image classification techniques to eliminate much of the manual input required. We first run unsupervised k-means clustering, which honors the full spectral diversity present in supraglacial settings and accounts for spectral information a user cannot interpret. We then interpret this output and use these interpreted classes as training data to generate trained supervised classifiers. To our knowledge, this approach has not been previously applied. Ultimately, we exhaustively test six different combinations of training classes across two different ice shelves. We describe the process of creating this classification scheme and test the sensitivity of this method to the user's choice of training data and classification algorithm. We compare lake areas produced by our method with lake areas produced by previously published classification algorithms and assess the spatial and temporal transferability of our classification scheme.

Overview and Study Area
We developed our method for two areas in East Antarctica: Amery Ice Shelf and Roi Baudouin Ice Shelf (Figure 1). Amery Ice Shelf is fed by the largest ice stream in the world (Lambert Glacier; [73]), discharging about 16% of grounded ice in East Antarctica [74]. Surface meltwater features have been documented on Amery Ice Shelf for many decades (e.g., [71,75,76,77]), and meltwater has also been observed on the surface of Roi Baudouin Ice Shelf [70,71,78]. From these study area locations, we selected 20 training scenes and 52 application scenes from the available Landsat imagery to represent a wide range of sun elevations and acquisition dates and therefore spectral characteristics (Table S1). All processing was implemented within the Google Earth Engine cloud computing platform (http://code.earthengine.google.com) using scenes collected during the austral summers of 2013-2014 to 2017-2018.
Although mapping the plethora of ice surface types is not the focus of this study, successful classification of lakes relies on accurately identifying multiple ice surface characteristics (Table 1). Much of the spectral variability on the surface of the Antarctic Ice Sheet results from spatial variability in the physical properties of the ice sheet surface. We distinguish the fast-flowing ice stream environment ("flowing ice", transported from the continent interior out to the marine margin at velocities of many meters per year) from "firn", where snowfall gradually densifies with depth. "Blue ice" occurs where the white upper layer of snow has been removed, often scoured by wind, revealing a blueish ice surface. Instead of pooling into lakes, surface meltwater can also saturate firn to form "slush". The delineations between these supraglacial features are sometimes difficult to interpret; slush can appear very similar to shallow lakes or to wind-scoured "blue ice" regions. Additional features imaged across the Antarctic Ice Sheet surface include rock outcrops, clouds, and cloud shadows. Our supervised classification approach was built to optimize lake identification; non-lake classes were carefully curated to avoid commission errors with lakes, but accurate classification of other environments (e.g., slush, blue ice, rocks, clouds, and cloud shadows) was not the focus of this work. Throughout this work, we evaluate only the lake class product from supervised classification.

Table 1. Training datasets, their class counts, and the classifiers generated from them. Training data were selected from Amery Ice Shelf.

t6A (6 classes; classifier c6A): lake (blue regions with distinct boundaries); slush (blue regions with diffuse boundaries); blue ice (slightly blue areas with a homogenous appearance over large areas, often on the edge of the ice stream); two different kinds of flowing ice (whiter- and darker-colored ice that appears similar to slush or blue ice but covers large regions across the ice stream); and firn (white non-ice-stream areas).
t7A (7 classes; c7A): uses the t6A classes, but the 'lake' class is separated into shallow lakes (k-means clusters that group areas of high-confidence lake interpretation together with uncertain or frozen/lidded lake areas) and deep lakes (high-confidence lake areas only).
t9A (9 classes; c9A): uses the t7A classes, with the addition of two cloud shadow classes (more opaque and less opaque cloud shadows).
t11A (11 classes; c11A): uses the t9A classes, with the addition of two rock classes (sunlit and shadowed rock outcrops).
tOB11A (11 classes; cOB11A): uses the t11A classes, but this object-based ("OB") training dataset also includes shape parameters (area, perimeter, area-to-perimeter ratio, width-to-height ratio) in addition to band reflectance.

Training Data Generation
Our approach was to use an unsupervised clustering algorithm to generate statistically differentiable classes and then interpret this output as meaningful inputs into a supervised classification algorithm. Specifically, we used an unsupervised k-means clustering algorithm [80] with efficient estimation of the number of clusters [81] to first identify spectrally unique clusters. We specified a minimum and maximum number of resultant clusters (5 and 80, respectively) for the unsupervised k-means algorithm, which generally produced 30-40 unique clusters. We manually interpreted and consolidated these k-means clusters to generate specific classes of interest (Figure 2; Table 1).
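The clustering step can be sketched as follows. The Earth Engine clusterer we used additionally estimates the number of clusters between user-set bounds (5 and 80 in this study); for brevity, the minimal NumPy implementation below runs plain Lloyd's k-means with a fixed k and simple deterministic seeding, and is an illustration rather than the production implementation.

```python
import numpy as np

def kmeans(pixels, k, n_iter=50):
    """Plain Lloyd's k-means over an (n_pixels, n_bands) reflectance array.
    Returns per-pixel cluster labels and the final cluster centers."""
    pixels = np.asarray(pixels, dtype=float)
    # deterministic seeding: spread initial centers across the array
    # (k-means++ or random restarts would be more robust)
    idx = np.linspace(0, len(pixels) - 1, k).astype(int)
    centers = pixels[idx].copy()
    for _ in range(n_iter):
        # assign each pixel to its nearest cluster center
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers; keep an empty cluster's old center
        new_centers = np.array([pixels[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

The resulting clusters are then manually interpreted and consolidated into the classes of Table 1.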

Figure 2. Schematic workflow for creating training data (green), generating trained supervised classifiers (orange), and applying the classifiers over Amery and Roi Baudouin ice shelves (blue). Landsat product ID and sun elevation for each training and application scene used in this study are provided in Table S1. An example of the training process is shown for scene LC08_L1GT_128111_20140211. In this example we show results from an 11-class training dataset, although we test numerous training datasets with different training classes in this study (Table 1). In this scene subset, only 4 of the 11 classes are depicted ('Deep lake', two types of 'Flowing ice', and 'Firn').
Since meltwater features can be spectrally ambiguous, we tested the sensitivity of our lake mapping algorithm to different numbers of input classes. We randomly sampled 1500 pixels from each class to form training datasets (Table 1) that were named according to the number of training classes and the location of training images. An initial 6-class scheme was based on visual categorization of spectrally unique k-means clusters that represent important drivers of ice sheet processes (lake, slush, blue ice, two kinds of flowing ice, firn). We observed that some k-means clusters contained pixels from very shallow lake environments along with slush pixels or non-lake pixels, so we tested splitting the 'lake' class into two classes ('deep lake' and 'shallow/frozen lake') to form a 7-class training dataset. Liquid water pooling in lakes over an ice shelf or ice sheet can re-freeze at the lake surface (e.g., [69,82]), creating a transitional meltwater environment that has been omitted from most classification schemes [49]. We grouped these frozen lake environments into the 'shallow/frozen lake' training class. Two cloud shadow classes and two rock classes were added to form 9- and 11-class training datasets, respectively; we manually mapped cloud-shadowed regions to form the cloud shadow classes, and for the rock classes, we deployed the Antarctic-wide rock mask (produced by Burton-Johnson et al. [83], accessed through the Antarctic Digital Database [https://www.add.scar.org] and buffered by 1 km to ensure complete rock coverage). Table 1 provides a summary of these training schemes.
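The balanced sampling of 1500 pixels per interpreted class can be sketched as below (the function name and array layout are illustrative assumptions, not the study's actual code):

```python
import numpy as np

def sample_training(pixels, class_labels, n_per_class=1500, seed=0):
    """Randomly draw up to n_per_class pixels from each interpreted class
    to form a balanced training dataset (features X, labels y).
    `pixels` is (n_pixels, n_bands); `class_labels` is (n_pixels,)."""
    rng = np.random.default_rng(seed)
    xs, ys = [], []
    for c in np.unique(class_labels):
        idx = np.flatnonzero(class_labels == c)
        # sample without replacement; cap at class size for small classes
        take = rng.choice(idx, size=min(n_per_class, len(idx)), replace=False)
        xs.append(pixels[take])
        ys.append(np.full(len(take), c))
    return np.concatenate(xs), np.concatenate(ys)
```

Balancing the classes this way prevents spatially extensive classes (e.g., firn) from dominating the training signal.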
We aim to develop an automated mapping procedure broadly applicable in space and time. Therefore, we investigate spatial transferability of supervised classifiers across both Amery and Roi Baudouin ice shelves. We supplemented the full 11-class training dataset from Amery Ice Shelf (t11A) with an additional training dataset created from Roi Baudouin Ice Shelf (t11B). A combination dataset (t11AB) contains training data from both regions (Table 1). Finally, we also explored object-based (rather than pixel-based) image analysis by constructing a training dataset that includes shape parameters (cluster area, perimeter, compactness, and elongation) in addition to the spectral properties of the individual pixels. We resampled our 11-class training dataset with this extra information about cluster shape produced through image segmentation (tOB11A and tOB11B) [84].

Supervised Classification
With our training datasets in hand, we proceeded to supervised classification (Figure 2). We generated a suite of different trained supervised classifiers by using each training dataset in Table 1 as input to a supervised classification algorithm. To develop an approach that can be easily upscaled in space and time, we used a suite of established classification algorithms accessible within the Google Earth Engine cloud computing platform: Random Forest, Minimum Distance, Classification and Regression Trees, Naive Bayes, Maximum Entropy, and Support Vector Machine. We then tested all possible combinations of training datasets and supervised classification algorithms (Table S3).
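As a minimal stand-in for one member of this suite, the Minimum Distance algorithm reduces to a nearest-centroid classifier. The NumPy sketch below (outside Earth Engine, class name our own) illustrates the train/predict interface shared by all the classifiers we tested; Random Forest and the other algorithms would be swapped in via the platform's classifier library.

```python
import numpy as np

class MinimumDistanceClassifier:
    """Nearest-centroid classifier: each pixel is assigned the class whose
    mean training spectrum is closest in Euclidean distance."""

    def fit(self, X, y):
        # X: (n_samples, n_bands) reflectance; y: (n_samples,) class labels
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        # distance from every sample to every class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

Each trained classifier is then applied scene-by-scene, and only the lake class is carried forward for evaluation.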
We compare classification results across a set of application images for both Amery and Roi Baudouin ice shelves (Figure 2). Rock outcrops were masked, and we also manually removed cloudy and cloud-shadowed regions from images prior to applying the unsupervised clustering algorithm, as these can introduce classification ambiguities. Comparing classification results across these heavily curated scenes reflects classifier performance under idealized conditions, without the confounding effect of clouds, cloud shadows, and rock. However, individual scene preparation is not feasible at large spatial and temporal scales. Thus, to lay the groundwork for large-scale application of this approach, we also applied all of our 11-class supervised classifiers (including rock and cloud-shadow classes) to scenes with only automated cloud removal based on a multi-band cloud threshold developed by Moussavi et al. [85].

Validation
By necessity, classification error was quantified from limited validation data because in-situ measurements are not available for validation. We therefore constructed two validation datasets: manually mapped high-confidence lake polygons that allow us to assess only the lake/non-lake areas that can be clearly interpreted visually, and a pixel-level dataset that was randomly sampled and then manually interpreted as lake or non-lake.
For the first validation dataset, we constructed high confidence lake polygons instead of a traditional shoreline trace; tracing lake outlines requires visual distinction of an often gradual transition between slush and ponded meltwater using just true-color images, a task with inherent subjective variability (e.g., [86]). Thus, we instead used only the centers of lakes to ensure that any error is due to classification and not validation data. We selected potentially ambiguous areas (slush, cloud shadow, blue ice) to comprise non-lake high-confidence polygons.
To construct our second validation dataset, we manually interpreted individual pixels. We generated 200 randomly sampled pixels within each of 6 scenes, and manually labelled each pixel as either a lake or non-lake pixel. Some of these pixels were visually ambiguous, and could not be interpreted as lake or non-lake with certainty. Instead of discarding these pixels, which would create a sampling bias in our validation dataset by disproportionately representing high confidence pixels and ignoring lake margins that are difficult to classify, we included these ambiguous pixels in the dataset with our best guess at interpretation. We report the percent of pixels with low-confidence interpretations to quantify the amount of uncertainty associated with visual interpretation.
We compared lake identification across a set of 52 application scenes: 33 over the Amery Ice Shelf and 19 over the Roi Baudouin Ice Shelf, spanning a range of sun elevations and collection dates (Table S1). The goal of this manuscript is to reliably map Antarctic lakes, so we also compared our lake identification against lakes identified through multi-band thresholding presented in the companion paper by Moussavi et al. [85] and previously published lake thresholds from polar regions.

Error Assessment
We evaluate classifier accuracy against our two manually interpreted lake validation datasets: first, the high-confidence lake polygons assessing only the lake/non-lake areas that can be clearly interpreted visually; and second, the lake pixel dataset, which was randomly sampled prior to visual interpretation, representing the full variability of the scene. These validation datasets are used to assess the performance of classifiers generated from different combinations of classification algorithm and training dataset.
Many of the tested supervised classification algorithms produced high accuracies (Tables S2 and S3). Classifiers generated with the Random Forest algorithm consistently produced the highest accuracies (e.g., 94.9% and 89.5% for t11AB on Amery and Roi Baudouin ice shelf validation pixels, respectively; Table S3), so we selected Random Forest as the best supervised algorithm and report results from this algorithm hereafter unless specified otherwise. A full description of the intercomparison of 11 classification algorithms is given in Section SI.1, which addresses aim (3) of the paper and allows us to focus on aims (1), (2), and (4) rather than reporting differences in classification algorithm performance here.
We assess the accuracy of classifiers generated from different training classes using our two validation datasets. Classifier accuracy tends to be very high when the high-confidence lake and non-lake polygons were used as the validation dataset (Table 2), demonstrating that our classification scheme reliably identifies distinct lake and non-lake environments. Manually constructed lake polygons encompass mostly the deep centers of large lakes, because these environments can be more easily visually interpreted with high confidence. This presents a possible bias towards assessing accurate classification of deep lakes over shallow lakes.

Table 2. High-confidence manual lake and non-lake polygons are produced for five Landsat scenes. The pixels contained by manually traced polygons are classified to calculate the lake/non-lake accuracy assessments shown here. Percentages are classifier lake/non-lake overall accuracy applied to the lake/non-lake polygon dataset. Validation scenes are identified by date (year-month-day); full names are listed in Table S1-C.

For the pixel-level validation dataset, all pixels are assigned as either lake or non-lake, and we report the number of low-confidence pixels where this labeling was difficult (Table 3). Low-confidence pixels range from only 3.5% to 12.0% of the total validation pixels, so we believe the overall accuracies in Table 3 are a good representation of image processing accuracy. Table 3 shows that classifier accuracy generally increases with more training classes, although the object-based classifiers (cOB11A and cOB11B) produce lower accuracies than the pixel-based classifiers with the same training classes (c11A and c11B). Accuracy varies across the six sampled Landsat scenes in Table 3 due to differing characteristics (i.e., sun angle, cloud cover); for example, the 2016-12-26 Amery Ice Shelf scene includes more clouds and cloud shadows than the other scenes.
Our classifier c11AB, generated from training data from both ice shelves, produces high accuracies over both regions.
Comparing the two validation datasets, we find that the polygon dataset produces higher accuracies than the pixel-based dataset. Mean accuracy for the polygon dataset is 98.6% (standard deviation 2.1%) for all training datasets applied to the corresponding ice shelf, while the mean and standard deviation of accuracy for the pixel dataset are 92.9% and 6.9%, respectively. In addition, average overall accuracy is much higher for Roi Baudouin than Amery for the pixel dataset, while the polygons show similar accuracy across both environments.

Table 3. We use a visually interpreted dataset of individual pixels to validate trained classifiers (Table 1), as well as the multi-band thresholding method from Moussavi et al. [85]. We also report the percentage of low-confidence pixels within our validation dataset, reflecting the potential inaccuracies associated with visual interpretation (Section 2.4). Percentages are classifier lake/non-lake overall accuracy applied to the lake/non-lake pixels. Validation scenes are identified by date (year-month-day); full names are listed in Table S1-C.
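The overall accuracies reported in Tables 2 and 3 reduce to a simple comparison of classified and manually interpreted lake/non-lake labels. A sketch (function name our own; commission and omission counts included for completeness):

```python
import numpy as np

def overall_accuracy(predicted_lake, validation_lake):
    """Overall lake/non-lake accuracy (percent), plus counts of commission
    errors (non-lake classified as lake) and omission errors (missed lake),
    given boolean lake/non-lake labels for each validation pixel."""
    predicted_lake = np.asarray(predicted_lake, dtype=bool)
    validation_lake = np.asarray(validation_lake, dtype=bool)
    correct = int((predicted_lake == validation_lake).sum())
    commission = int((predicted_lake & ~validation_lake).sum())
    omission = int((~predicted_lake & validation_lake).sum())
    accuracy_pct = 100.0 * correct / len(validation_lake)
    return accuracy_pct, commission, omission
```

Because only the lake class is evaluated, every non-lake training class collapses to a single "non-lake" label here.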

Lake Areas from Supervised Classification
Following our stated aims, we investigated the sensitivity of classified lake areas to different choices a user could make in mapping Antarctic lakes. In addition to the supervised classification algorithm (Section SI.1), these choices include the number of training classes and the locations of training classes. Figure 3a shows differences in lake area across training datasets for the Amery Ice Shelf application images. The initial 6-class classifier over Amery Ice Shelf (c6A: lake, slush, blue ice, two kinds of flowing ice, firn; Table 1) often misclassified large swaths of visually slushy or frozen regions as lake (Figure 4), producing relatively high lake areas (Figure 3a). With a distinction between shallow/frozen lakes and deeper lakes, the c7A classifier produced lower lake areas than c6A (Figure 3a; Figure 4). Two cloud shadow classes added to the c9A classifier led to even lower lake areas (Figure 3a; Figure 4). Figure 3a also demonstrates a strong control of sun angle on identified lake area: the shaded grey regions, denoting sun elevations < 20°, contain lake areas that vary widely across training classes or produce much larger lake areas than are physically possible. The confidence intervals on each bar in Figure 3 reveal that while each set of training data produces high-confidence lake areas, the resulting areas are sometimes quite different.

Figure 4. Classification results from classifiers (Table 1) applied to scene LC08_L1GT_128111_20170118. A rock mask was applied prior to classification when using the c6A, c7A, and c9A classifiers but not for c11A or cOB11A.
Throughout the melt season, all classifiers produced a consistent lake evolution pattern: a gradual increase in lake area, peaking during the late season, which matches the melt pattern observed at Larsen Ice Shelf [32]. Early melt season scenes (e.g., November and December) recorded zero or very low lake areas; scenes with a small number of identified lakes often captured a few meltwater ponds, or small amounts of cloud shadow misclassification. Classifiers generated from different training datasets diverged in their ability to classify low-sun-elevation scenes; with an increasing number of training classes, lake area decreased for these scenes, although significant misclassification remains. We obtained the lowest amount of misclassification for low-sun-elevation scenes by combining training data from both Amery and Roi Baudouin Ice Shelf (c11AB; Figure 3b).

Figure 3 shows total lake area across entire scenes, but we are also interested in a more controlled experiment. Thus, we calculate lake area across all our validation polygons, applying each classifier only within the bounds of these high-confidence lake/non-lake areas to test the effect of training data on lake area calculation. In theory, a correct classification should have the same lake area as our manual lake polygons. Figure 5 compares the summed area of our lake polygons with results from supervised classifiers applied to both lake/non-lake polygons. Two date-specific patterns emerge, including one where classifiers match lake area better as more classes are added.

We also investigated adding object-based classification to our supervised classification scheme. With shape parameters added as training classes, cOB11A produced lower lake areas compared to the otherwise identical pixel-based c11A classifier (Figure 3a); the higher lake areas from cOB11A in the 25 Feb 2017 scene shown in Figure 5 represent cloud shadow commission error.
With object-based classification, image segmentation creates coherent lake regions, ensuring that large numbers of isolated lake pixels do not artificially inflate the calculated lake areas, but it misses some detailed lake patterns evident in the 'noisy' pixel-by-pixel classification ( Figure 6).
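The lake areas discussed throughout this section follow directly from classified pixel counts. A minimal sketch for Landsat's 30 m multispectral pixels (function and constant names are our own, for illustration):

```python
import numpy as np

LANDSAT_PIXEL_M = 30.0  # Landsat-8 multispectral ground sample distance

def lake_area_km2(class_map, lake_class):
    """Total lake area (km^2) from a classified map, counting only the
    target lake class (e.g., the 'deep lake' class in our scheme)."""
    n_lake_pixels = np.count_nonzero(np.asarray(class_map) == lake_class)
    return n_lake_pixels * LANDSAT_PIXEL_M ** 2 / 1e6
```

Restricting the count to the deep lake class is what makes the computed areas sensitive to how the lake training classes were defined.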

Classification Transferability across Ice Shelves
An aim of this paper is to investigate the transferability of our classification scheme across ice shelf locations. Lakes on Amery Ice Shelf are deeper and larger than on Roi Baudouin Ice Shelf, and this physical difference is important to classify correctly. First, we compare a classifier generated from Amery Ice Shelf training data (c11A) with a classifier generated from Roi Baudouin Ice Shelf data (c11B) and a combination classifier generated from both regions (c11AB), by applying these classifiers to both Amery and Roi Baudouin Ice Shelf application images (Figure 7a). We observe that the combination classifier (c11AB) produces similar lake areas to both regionally trained classifiers with the same number of training classes (c11A or c11B; Figure 7a). On Roi Baudouin Ice Shelf, c11B identifies more shallow lake extents than c11AB (Figure 7b), although both capture deep lake environments. Second, we investigate the spatial transferability of supervised classifiers by using training data from one ice shelf as validation data for the other, which renders them independent from one another (for example, a c6A trained classifier was generated from the t6A training dataset and is therefore independent from the t9B training dataset; Table 4). Classifiers generated from Amery Ice Shelf training datasets are relatively unsuccessful when applied to a dataset of interpreted pixels sampled from Roi Baudouin Ice Shelf (i.e., our Roi Baudouin training data), and vice versa. When faced with a combination 11-class dataset containing pixels from both regions, the best-performing regional classifiers (c11A from Amery and c11B from Roi Baudouin) only achieved accuracies of 78% and 66%, respectively. Conversely, the combination classifier c11AB was able to correctly classify the 11-class training datasets from both regions with an accuracy of at least 99%.

Table 4. Cross-over classification accuracy (reported as a percentage) for Amery and Roi Baudouin ice shelf trained classifiers, using training datasets as validation. Bold text denotes resubstitution accuracy (after the classifier is generated, it is then applied to the original training dataset). The bottom row shows the combination classifier, generated from training data from both ice shelves, which can accurately classify training data from both locations.
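The cross-over matrix of Table 4 can be sketched as follows, given fitted predict functions keyed by training region and the regional training datasets reused as validation (all names here are illustrative; the actual computation ran in Earth Engine):

```python
import numpy as np

def cross_over_accuracy(predict_fn, datasets):
    """Score each trained classifier against every region's training
    dataset used as validation. `datasets` maps region name -> (X, y);
    `predict_fn` maps region name -> fitted predict callable. Diagonal
    entries (trained_on == validated_on) are resubstitution accuracy."""
    regions = sorted(datasets)
    table = {}
    for trained_on in regions:
        for validated_on in regions:
            X, y = datasets[validated_on]
            pred = predict_fn[trained_on](X)
            table[(trained_on, validated_on)] = 100.0 * float(np.mean(pred == y))
    return table
```

High diagonal but low off-diagonal values in this matrix are exactly the signature of regionally overfit classifiers seen for c11A and c11B.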

Sensitivity of Lake Area Results to User Choices and Training Data
We are ultimately interested in accurate supraglacial lake mapping. Our results indicate that seemingly trivial user choices can have significant impacts on calculated lake areas, which, in turn, could lead to erroneous scientific conclusions. Our experiment design has made the effect of each of these choices explicit. Principally, the number of training classes significantly impacts the lake areas identified by trained supervised classifiers (Figure 3a; Figure 4), and more accurate lake areas are obtained by distinguishing deep lakes from shallow or frozen lakes (Figure 5). The addition of cloud shadow classes further improves classifier accuracy; Figure 4 demonstrates how the addition of cloud shadow classes can decrease lake misclassification (commission error). Calculated lake area is relatively insensitive to classification algorithm; similar lake areas are produced across a handful of the best-performing algorithms (Figure S1), although we use Random Forest exclusively throughout this study. Elucidating these impacts required the extensive investigation that we provide here, and would not be apparent if we had selected an 'off the shelf' classification method and proceeded without further analysis.
Our approach separates shallow/frozen lakes and deep lakes into different training classes (Table 1), using only the deep lake class to calculate lake area. This approach increases confidence in (deep) lake area calculations but possibly introduces omission errors for shallow lakes. Thus, the user decision regarding what constitutes a 'lake' can lead to significant discrepancies in classified lake area. This is clear in Figure 3a, where the use of different training classes led to widely different lake areas despite the relatively high accuracy of each classifier (Table 2; Table 3). In other words, our classifiers accurately detect what they have been trained to detect, but they may not have been trained to classify the same lake environment. This is especially important in areas with mostly shallow lakes, such as the Roi Baudouin Ice Shelf. Furthermore, the inclusion of frozen or partially frozen lakes can significantly impact meltwater volume calculations. The definition of when 'slushy' or 'shallow/frozen lake' pixels become 'deep lake' pixels has implications for the wider community use of a lake mapping product and should be made explicit in a final product.
Our results also show that classifiers generated from training datasets that contain only one lake class (c6A, c6B) can produce lake areas that are much too large (see c6A in Figure 3a), despite generally high classification accuracy (Table 2; Table 3). Thus, classification accuracy by itself can be misleading; high accuracy may not reflect improved understanding of the supraglacial environment, as in this example the high image classification accuracy would have badly misrepresented lake area. We assert that training classes must therefore be curated to best capture the desired lake environment, but this does not mean that adding more classes is always desirable. We argue that there are diminishing returns from adding further classes; we have split rocks, cloud shadows, and flowing ice into two sub-classes each, but further division of classes blurs their statistical separability. Future work should explicitly seek to determine whether there is an optimal number of classes for pan-Antarctic study.
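The disconnect between overall accuracy and lake area is simple arithmetic, shown here with invented numbers: because lakes cover only a small fraction of a scene, a modest commission error on the dominant non-lake classes inflates lake area severalfold while overall accuracy stays high.

```python
# Invented example: overall pixel accuracy remains ~98% even though the
# classified lake area is roughly triple the true lake area.
total_pixels = 1_000_000
true_lake = 10_000                              # lakes are rare: 1% of pixels

# The classifier finds every true lake pixel but also mislabels 2% of the
# 990,000 non-lake pixels as lake (commission error).
false_lake = int(0.02 * (total_pixels - true_lake))
classified_lake = true_lake + false_lake

accuracy = (total_pixels - false_lake) / total_pixels
inflation = classified_lake / true_lake
print(f"overall accuracy: {accuracy:.1%}, lake area inflated {inflation:.2f}x")
```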
The object-based classifier is less accurate than the pixel-by-pixel classifier. We hypothesize that our addition of shape parameters reduces classifier accuracy by placing less emphasis on reflectance differences. The four shape parameters we incorporate (area, perimeter, the ratio of area to perimeter, and the ratio of width to height) are not unique to lake clusters; many other glacial features (e.g., patches of blue ice, snow, and cloud shadows) exhibit shapes similar to lakes. Furthermore, c11A/c11B/c11AB omission error is low for high-confidence lake polygons, suggesting that pixel-based classification already produces coherent lake areas without the extra step of image segmentation. The success of object-based classification relies on the image segmentation methodology. The initial seed spacing for image segmentation is set at 5 pixels, and superpixel clusters 'grow' from the seeded locations [84]. This spacing is comparable to the minimum lake size for the pixel-based classification method; there is no size threshold for pixel-based classification, but lakes are generally composed of more than one pixel. The size and clustering method for creating polygons during image segmentation may significantly impact object-based classification results, but further exploration is beyond the scope of this work. As Google Earth Engine tools continue to improve, future incorporation of supervised image classification should explore more sophisticated object-based image analysis as part of the training data production and supervised classification process. Object-based classification provides a valuable minimum estimate of lake area, which could form an important baseline when upscaling lake identification spatially and temporally across Antarctica.
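A minimal sketch of the four shape parameters, computed for a single binary object mask; the perimeter convention (object pixels with at least one 4-connected background neighbour) is our own assumption for illustration, not necessarily the one used by the Earth Engine tools.

```python
# The four object shape parameters discussed above, for one binary object.
import numpy as np

def shape_parameters(mask):
    """Return (area, perimeter, area/perimeter, width/height) in pixel units."""
    area = int(mask.sum())
    padded = np.pad(mask, 1)
    # Interior pixels have all four 4-neighbours inside the object.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())
    rows, cols = np.nonzero(mask)
    width = cols.max() - cols.min() + 1
    height = rows.max() - rows.min() + 1
    return area, perimeter, area / perimeter, width / height

mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 1:9] = True            # a 4 x 8 rectangular 'lake' object
print(shape_parameters(mask))    # (32, 20, 1.6, 2.0)
```

None of these values are specific to lakes, which is consistent with the confusion we observe between lakes and other similarly shaped glacial features.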

Spatial and Temporal Upscaling
In this work, we investigated the wider spatial and temporal application of our classification method. We tested the spatial robustness of our classification method by comparing classifier performance across Amery and Roi Baudouin ice shelves. The regional classifier generated from only Roi Baudouin scenes (c11B) is less susceptible to shadow misclassification than the regional Amery classifier (c11A) but misses a few deep lake areas. By incorporating training data from both locations, the combination classifier (c11AB) may underestimate lake area by ignoring some shallower lakes (Figure 7b) but ensures that deeper lakes are included. Combining training datasets leads to better performance for low-sun-elevation scenes (c11AB in Figure 3a) and improves classifier accuracy for cloudy scenes (e.g., scene 2016-12-26 in Table 3).
Across both ice shelves, the combination classifier c11AB performed similarly to the regionally trained classifiers c11A and c11B (Figure 7a), with generally similar accuracy despite the differences in lake characteristics (Table 2; Table 3). Cross-validation of trained classifiers with training datasets for both Amery and Roi Baudouin ice shelves (Table 4) reveals that regionally trained classifiers perform poorly when applied to another ice shelf, but the combination classifier c11AB is accurate across both regions. These observations support the assertion that spatially integrated training data produce more robust lake area identifications. This is encouraging for broad spatial application, as we would otherwise expect regionally trained classifiers to outperform a classifier trained on a multi-region dataset. The strong performance of c11AB gives us confidence for broader training data generation in the future.
Removing scenes with sun elevations below 20° prevents early-season cloud shadow misclassifications. This produces a consistent seasonal lake evolution pattern, evident across both Amery and Roi Baudouin ice shelves (Figure 3, Figure 7a). However, this sun elevation filter can also remove high-melt late-season scenes from analysis, which may be problematic for capturing the full seasonal evolution of surface meltwater. Adding training data from more locations may continue to improve a classifier's ability to correctly identify lakes in low-sun-elevation scenes. Alternatively, we could envision adding a specific 'low angle' set of classes to be applied only to low-angle scenes. This partition into two datasets (based on image date) might allow lakes to be mapped reliably throughout the season, but requires difficult analysis of early-season scenes to generate training data. To continue to successfully upscale across the Antarctic continent, classification techniques will need to be applicable across more variable environments, encompassing not only different lake characteristics but also regions of extensive dirty ice and different rock outcrop lithologies.
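The sun-elevation filter itself is a one-line metadata check. The sketch below uses the SUN_ELEVATION field reported in Landsat MTL metadata, with invented scene entries; treating the 20° cutoff as strict is our assumption about boundary handling.

```python
# Filter candidate scenes by solar elevation before classification.
SUN_ELEVATION_MIN = 20.0  # degrees; scenes at or below this are discarded

scenes = [  # invented placeholder entries
    {"id": "scene_early", "SUN_ELEVATION": 12.4},  # early season: rejected
    {"id": "scene_mid", "SUN_ELEVATION": 31.7},    # mid season: kept
    {"id": "scene_edge", "SUN_ELEVATION": 20.0},   # boundary: rejected here
]

usable = [s for s in scenes if s["SUN_ELEVATION"] > SUN_ELEVATION_MIN]
print([s["id"] for s in usable])  # ['scene_mid']
```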

Comparison with Threshold-Based Lake Identification
We compare our 11-class supervised classifier (c11AB) lake results to a suite of previously published spectral thresholding algorithms that were developed for specific glacial regions on the Antarctic or Greenland Ice Sheet. These methods work by selecting single-band or multi-band index thresholds to differentiate lakes. They are computationally efficient, are easily applied at massive scale, and form the status quo in Antarctic lake mapping. Some of the spectral thresholding methods produce roughly similar lake areas to the supervised classifiers developed here, while others diverge significantly (Figure 8), which is unsurprising given the diversity in lake characteristics across polar regions and the different regions of intended application. The advent of Google Earth Engine has enabled our more computationally intensive methods to be used on the same data volumes as these threshold methods. It is not our intent to explicitly compare and evaluate these methods; rather, we highlight the widely varying lake areas that result from applying different 'off-the-shelf' classifiers, adding weight to our assertion that a careful treatment of supraglacial training data is crucial for the correct identification of lake area. The true-color image shows a subset of scene LC08_L1GT_154109_20170225 with enhanced contrast to highlight lakes partially visible beneath cloud shadows. Lakes identified by our c11AB classifier (c) are compared to previously published threshold-based lake identification methods: (d) multi-band thresholding by Moussavi et al. [85]; (e) red/blue ('R/B') reflectance ratio thresholds from Banwell et al. [54] for the Larsen B Ice Shelf, Antarctica, and Pope et al. [58] for West Greenland; (f) Normalized Difference Water Index ('NDWI') thresholds from Williamson et al. [63] for West Greenland, Yang & Smith [52] for southwestern Greenland, and Miles et al. [59] for West Greenland.
(g) Lake results from c11AB are compared to multi-band thresholding for scene LC08_L1GT_128111_20141228.
We compare our lake results to the multi-band spectral lake identification thresholds developed by Moussavi et al. [85] (Table 2; Table 3; Figure 8). Both methods calculate similar lake areas and produce a consistent pattern of lake evolution across the melt season (Figure 8a).
General similarity of lake areas calculated by the two methods confirms that the spectral signature of lakes established through our supervised classification method is independently consistent with the multi-band spectral thresholds manually derived in Moussavi et al. [85]. The Moussavi et al. [85] spectral thresholds capture more shallow or partially frozen lake environments than our supervised classifier (Figure 8g). The manually interpreted validation datasets (Table 2; Table 3) reveal that our 11-class supervised classifier c11AB is more accurate than multi-band-thresholded lakes [85] for three of the four Amery scenes (lower c11AB accuracy is achieved across the cloudiest Amery scene, 2016-12-26, due to cloud shadow commission error). Both methods used together provide critical information about the range of possible lake areas.
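For reference, the two threshold families discussed above can be sketched as follows; the threshold values are illustrative placeholders, not the published tunings of Moussavi et al. [85], Banwell et al. [54], Pope et al. [58], Williamson et al. [63], or Yang & Smith [52].

```python
# Hedged sketch of two 'off-the-shelf' thresholding families: a red/blue
# band ratio and an ice-sheet NDWI. Threshold values are illustrative only.
import numpy as np

def lake_mask_red_blue(blue, red, ratio_max=0.7):
    """Water absorbs red strongly, so lake pixels have low red/blue ratios."""
    return (red / blue) < ratio_max

def lake_mask_ndwi(blue, red, ndwi_min=0.25):
    """NDWI_ice = (blue - red) / (blue + red); lake pixels have high values."""
    ndwi = (blue - red) / (blue + red)
    return ndwi > ndwi_min

# Synthetic reflectances: one lake-like and one snow-like pixel.
blue = np.array([0.60, 0.80])
red = np.array([0.20, 0.75])
print(lake_mask_red_blue(blue, red), lake_mask_ndwi(blue, red))
```

Because each published method pairs a different index with a different tuned threshold, the varying lake areas in Figure 8 follow directly from these few lines of arithmetic.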

Rock Outcrops
Sunlit rock outcrops emerging from snow- and ice-covered regions have distinct spectral characteristics, but shaded rocks can appear similar to lakes. For many of the application scenes presented here, rock outcrops have been clipped out using the static rock mask. This approach ensures complete rock removal but may omit meltwater generated around exposed rock. Moreover, rock areas may fluctuate annually as snow cover changes over time, so a flexible rock classification is valuable. The incorporation of two rock classes in our training dataset eliminates the need to apply a static rock mask to Landsat scenes prior to classification.
We investigate possible classifier confusion between lakes and shaded rocks by comparing our 9-class and 11-class supervised classifier results (c9A and c11A). We find that c9A and c11A produce very similar lake area calculations when a rock mask is applied (Figure S2), verifying that the c11A classifier does not misclassify deep lakes as rocks; similarly, resubstitution accuracy of the c11A classifier produces virtually zero commission/omission error between rock classes and lake classes. The difference between lake areas classified by c11A with and without the rock mask (Figure S2) reveals the presence of lakes near rock outcrops that are clipped out by the 1-km rock buffering procedure. The successful incorporation of rock classes into our supervised classification scheme suggests that static rock masking is not a necessary procedure for producing consistent supervised lake area classifications.

Cloud Shadows
Especially at low sun elevations, clouds project shadows onto the ice surface that can be mistaken for lakes. We find that scenes with cloud shadows can have significant classification errors, generally errors of commission, where shadowed areas are mistakenly identified as lakes. The inclusion of two unique cloud shadow training classes (and two rock training classes) reduces these errors but does not eliminate them: classifiers with cloud shadow classes (c9A and c11A) correctly calculate smaller lake areas than classifiers without them, although some cloud-shadowed regions are still misclassified as lakes (e.g., Figure S3).
Our pixel validation dataset highlights the difficulty of classifying lakes that are visible but shadowed by clouds; the 2016-12-26 Amery Landsat scene (Table 3) is characterized by clouds and cloud shadows, leading to relatively lower accuracies in Tables 2 and 3. Even the 9- and 11-class supervised classifiers, which include cloud shadow training classes, capture only some of the lake areas that appear to underlie cloud shadow. In general, the confounding effect of cloud shadow can lead to both omission and commission errors by our trained supervised classifiers. Lake areas shadowed by clouds are especially problematic because these regions exhibit characteristics of both lake and cloud shadow environments; since a pixel can only belong to one class, trained supervised classifiers generally split these areas into partially lake and partially cloud shadow. Cloud shadow commission error remains an obstacle to large-scale application of supervised classification techniques; the best results are achieved through user intervention, with careful scene selection and post-classification quality control.

Conclusions
Mapping the extent and evolution of surface meltwater is crucial for understanding the role of supraglacial hydrology in Antarctic ice-shelf stability and provides important boundary conditions for assessing the stability of the Antarctic Ice Sheet. Our goals for this manuscript were to: (1) present a method for accurate lake identification that is broadly applicable in space and time and is robust across different ice environments; (2) assess the sensitivity of lake identification to training data and training classes; (3) assess the sensitivity of lake identification to classification algorithms; and (4) explore the transferability of our classification scheme across two ice shelf locations. In this work, we developed a method that interprets unsupervised clustering to generate training data for a scalable supervised classification. We extensively iterated our approach across numerous supervised classification algorithms and training datasets to provide a complete analysis of Antarctic lake classification.
We draw the following principal conclusions. For scenes collected with sun elevations greater than 20°, accurate supervised classification of lakes is demonstrated across multiple training datasets and classification algorithms using our method. Although misclassification of cloud shadows remains a hindrance to large-scale application of supervised classifiers, our trained classifiers achieve lake/non-lake classification accuracies of over 90% based on manual lake validation datasets. We show that trained supervised classifiers can be applied across two ice shelf environments (Amery and Roi Baudouin Ice Shelves) and produce a coherent melt season signal in lake area, despite differences in lake characteristics between the two regions. We also tested the sensitivity of our classification method to the choice of training dataset and classification algorithm. We found that the best classification is achieved using training datasets that distinguish deep lakes from shallow/frozen lake environments and include unique training classes for cloud shadows. The Random Forest classification algorithm performed best, although lake area results are similar for other high-performing algorithms. This work represents a successful step towards building supervised classifiers that can be fully upscaled across the Antarctic Ice Sheet. Our method is computationally feasible at large scales and can be easily ported across the entire continent within the Google Earth Engine platform. These results provide a valuable comparison point for informing and cross-validating other methods of lake identification, ultimately geared towards creating and improving continent-wide Antarctic maps of surface meltwater evolution throughout the satellite observational record.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/8/1327/s1, Table S1: Landsat product ID and sun elevation for training and application scenes, Section SI.1. Supervised Classification Algorithms, Table S2: Resubstitution accuracy for supervised classifiers using different classification algorithms, Table S3: Validation accuracy for all combinations of training dataset and supervised classification algorithm, Figure S1: Comparison of best-performing supervised classification algorithms across the set of Amery Ice Shelf application scenes, Figure S2: Comparison of rock masking versus classification using rock training classes, Figure S3: Supervised classifiers generated from different numbers of training classes applied to an Amery Ice Shelf scene.