On the Automated Mapping of Snow Cover on Glaciers and Calculation of Snow Line Altitudes from Multi-Temporal Landsat Data

: Mapping snow cover (SC) on glaciers at the end of the ablation period provides a possibility to rapidly obtain a proxy for their equilibrium line altitude (ELA) which in turn is a metric for the mass balance. Satellite determination of glacier snow cover, derived over large regions, can reveal its spatial variability and temporal trends. Accordingly, snow mapping on glaciers has been widely applied using several satellite sensors. However, as glacier ice originates from compressed snow, both have very similar spectral properties and standard methods to map snow struggle to distinguish snow on glaciers. Hence, most studies applied manual delineation of snow extent on glaciers. Here we present an automated tool, named ‘ASMAG’ (automated snow mapping on glaciers), to map SC on glaciers and derive the related snow line altitude (SLA) for individual glaciers using multi-temporal Landsat satellite imagery and a digital elevation model (DEM). The method has been developed using the example of the Ötztal Alps, where an evaluation of the method is possible using ﬁeld-based observations of the annual equilibrium line altitude (ELA) and the accumulation area ratio (AAR) measured for three glaciers for more than 30 years. The tool automatically selects a threshold to map snow on glaciers and robustly calculates the SLA based on the frequency distribution of elevation bins with more than 50% SC. The accuracy of the SC mapping was about 90% and the SLA was determined successfully in 80% of all cases with a mean uncertainty of ± 19 m. When cloud-free scenes close to the date of the highest snowline are available, a good to very good agreement of SC ratios (SCR) / SLA with ﬁeld data of AAR / ELA are obtained, otherwise values are systematically higher / lower as useful images were often acquired too early in the summer season. However, glacier speciﬁc di ﬀ erences are still well captured. Snow mapping on glaciers is impeded by clouds and their shadows or when fresh snow is covering the glaciers, so that more frequent image acquisitions (as now provided by Sentinel-2) would improve results.


Introduction
Glacier mass balance is a key measure to determine the contribution of glaciers to regional hydrology [1,2] and global sea level [3][4][5]. As the number of glaciers with direct measurements of mass balance (using the glaciological method) is limited (e.g., [6]) and their representativeness for the larger mountain ranges often unknown [7], it is appealing to obtain glacier mass changes from remote sensing data. Over larger regions the differencing of digital elevation models (DEMs) from two points in time (geodetic method) has been applied widely at approximately decadal resolution to determine total volume and mass changes of glaciers [8][9][10][11]. However, for several geoscientific and socio-economic applications, the availability of region-wide glacier mass changes with annual resolution would be beneficial. As direct measurements reveal a high correlation of glacier mass balance with (a) the snow cover on a glacier and (b) the elevation of the snow line altitude (SLA), mapping snow cover (SC) on glaciers from satellite images offers a proxy for glacier mass balance (e.g., [12][13][14]), whereby the remotely sensed snow cover ratio (SCR) is taken as a proxy for the accumulation area ratio (AAR) [12,13,15] and the elevation of the snow line at the end of the ablation period [12,16] as a proxy for the equilibrium line altitude (ELA). To compensate for local variability in elevation, the ELA is defined mathematically as the elevation where the vertical mass balance profile obtained by direct measurements crosses zero [17]. Mapping SC extent has also been applied to reconstruct missing mass balance measurements [18] on the basis of manually selected satellite scenes coinciding with the end of the ablation season, revealing a high correlation between satellite-derived snowlines and ELAs measured in the field.
Snow and (clean) glacier ice together can be easily mapped from multispectral satellite images by application of various band ratios like near infrared (NIR) / shortwave infrared (SWIR) [19], red / SWIR [20], or the normalized difference snow index (NDSI), which is given as (green -SWIR) / (green + SWIR) [21], and a manually selected threshold. In contrast, the classification of snow on glaciers is difficult with these band ratios, as the shape of the spectral curves of ice and snow is very similar and ratios thus result in about the same values for both facies. A robust and widely used approach for mapping SC on glaciers is based on the use of a threshold on either single-band reflectance values, preferably in the NIR to avoid saturated values over snow with 8-bit Landsat data [12,19,[22][23][24][25][26][27][28][29], or albedo products that are readily available [25]. Mapping snow extent on glaciers has thus come a long way, starting in the 1970s with early tests using contrast-enhanced Landsat multispectral scanner (MSS) satellite images [22,23], to the recent use of near-daily MODIS data to derive time series of surface albedo and SC maps for even small glaciers (e.g., [24,25,30]).
For repeat application over large regions, it would be beneficial to derive the snow-covered area on glaciers automatically, including determination of the snow line altitude (SLA) near the end of the ablation season (as a proxy for the ELA). This is made challenging because of the following: (i) Due to the strong impact of terrain orientation (defined by its normal vector) on surface reflectance in mountain topography, deriving any measure of SC from reflectance requires implementation of a topographic normalisation using a DEM; (ii) SC on glaciers is often patchy and snow lines are in general not parallel to elevation contour lines; and (iii) highly variable atmospheric conditions (e.g., due to cloud shadows) imply that the same threshold might not work equally well on all glaciers. It is thus crucial for any automated algorithm to find a threshold that is adjusted to the local conditions, i.e., for each glacier individually, and as a consequence, for smaller glacier samples, snow lines have often been digitized manually (e.g., [13,[31][32][33]).
To map SC on comparably small glaciers such as in the Alps and for analysis over long time periods, the 30 m resolution Landsat data are the best choice as they span 1984 up to the present and have higher spatial resolution than MODIS data (250 m). On the other hand, Landsat data have the disadvantage of a much coarser temporal coverage (every 16 days), so autumn snowfall and cloud cover can make it difficult to obtain useable images from the end of the ablation period in certain years. As surface elevation of glaciers changes over time, there is a need for multi-temporal DEMs to correctly determine SLA values for long time series.
In this study, we present the fully automated snow mapping on glaciers (ASMAG) tool to (a) map SC on glaciers, and (b) derive the SLA from the 30 year time series of multi-temporal Landsat data and a DEM. The method is applied to the glaciers in the southern Ötztal Alps as three glaciers with long-term data and a DEM. The method is applied to the glaciers in the southern Ötztal Alps as three glaciers with long-term mass balance measurements are located there (Hintereisferner, Kesselwandferner, and Vernagtferner) providing data for validation of the tool. As explained above, the specific challenges for an automated approach are the discrimination of snow from glacier ice, consideration of clouds (or their shadows) affecting parts of a glacier, terrain shadow, the often patchy nature of snow cover, glacier surface debris cover, and methodological constrains such as the impact of a time invariant DEM. Our method has been designed to handle these and automatically finds a glacier-specific reflectance threshold to separate snow from ice. As an introduction to our methodological approach, we present the workflow, discuss the main challenges, and offer a comprehensive assessment of the accuracy of the approach using three independent methods of validation. We also present the results for the study region (SC ratio and SLA time series) in comparison to field data and investigate potential reasons for deviations.

Study Region
The tool is tested in the Ötztal Alps, Tyrol, Austria ( Figure 1) covering 26 glaciers in three different basins (Niedertal, Rofental and Gepatsch), all draining northward. The southern part of the study region includes the main Alpine divide, which often acts as a cloud barrier for weather patterns driven by northerly and southerly flow. In summer, orographic convective clouds are frequent above mountain ridges, often casting shadows on glaciers. The glaciers span ~2150 to ~3750 m a.s.l., cover several aspect directions, have different slopes and shading conditions, and variable hypsometries, collectively resulting in a locally variable distribution of SC. Stripes on the right hand side reveal the typically small impact of the scan line error on the glaciers. Landsat 7 ETM+ SLC-off scenes have thus also been used in the study region.

Satellite Data
In total, 63 Landsat scenes (path/row: 193/027) acquired from 1985 to 2016 (between 1st July and mid-October in each year) were selected for this study (Table 1, full list in the supplementary document). Landsat scenes provide the longest time series and their spatial and spectral resolution allows a robust quality assessment in terms of SC, cloud cover and impact of shadows. However, a clear sky image from the end of the ablation period is not available in every year and the difference between the image acquisition date and the date of the highest snow line in each year varies randomly. All Landsat data were orthorectified (level 1T corrected), have less than 40% cloud cover, and were downloaded from Earth Explorer (http://earthexplorer.usgs.gov/) and the Earth Observation Link (EOLi) of the European Space Agency (ESA).  1985  3  1991  1  1996  1  2002  1  2007  2  2012  1  1986  2  1992  2  1997  2  2003  2  2008  1  2013  3  1988  3  1993  1  1999  1  2004  3  2009  6  2015  3  1989  2  1994  1  2000  2  2005  3  2010  2  2016  2  1990  3  1995  3  2001  2  2006  1  2011  4  Total:  63 All Landsat satellite missions have a revisiting time of 16 days but vary in specific characteristics, with spatial resolution and available spectral bands changing through time (Table 2). This presents a challenge for homogenized and automated processing when working with the complete time series. For example, Landsat MultiSpectral Scanner (MSS) has four image bands and three metadata files (no thermal band), Landsat 4 and 5 Thematic Mapper (TM) have three metadata files, whereas Landsat 7 Enhanced Thematic Mapper plus (ETM+) has nine bands and two metadata files. The Landsat 7 scan line corrector (SLC)-off scenes additionally have an extra folder with the gap mask. The Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) has 11 image bands and one metadata file and instead of 8-bit, a 12-bit radiometric resolution in each band. Hence, each Landsat sensor is processed individually.

Digital Elevation Models
Elevation information is essential in ASMAG for two purposes: (1) For topographic correction of illumination effects, and (2) to derive elevation bands to determine the SLA. In this study, we follow the recommendation of Rabatel et al., (2016) [39] and use a DEM from 2000 for the full time series. This reduces biases due to nonlinear changes in glacier extent and may include periods of larger and smaller glaciers. We use the Shuttle Radar Topography Mission (SRTM) 1 arc second global DEM (30 m resolution) from the year 2000 (downloaded from http://earthexplorer.usgs.gov/) which was acquired in about the middle of the investigated period. Hence, for constantly shrinking glaciers the DEM elevations (and derived SLA values) underestimate glacier surfaces at the start of the record and overestimate them at the end. The SRTM DEM is referenced to the Earth Gravitational Model 1996 (EGM96) with a reported mean vertical accuracy of ±16 m [40,41]. As in other regions, only small gaps of glaciers were found in the SRTM DEM (3.15 km 2 ,~0.8%) ( Figure 1) [41]. These gaps were filled with co-registered (see details in Table 3) ASTER Global Digital Elevation Model version 2 (ASTER GDEM V2) data that were also downloaded from Earth Explorer. The ASTER GDEM V2 provides averaged elevation values from 2000-2012 and is available with a 1 arc second (30 m) cell size in geographic coordinates referenced to the EGM96 datum. The vertical accuracy is given as 17 m at a 95% confidence level [42]. For the remainder of the paper, the void-filled SRTM DEM is named reference DEM. All SLA values calculated in this study refer to the reference DEM and represent geoidal heights, so that they can be compared to field measurements of the ELA. Table 3. Co-registration details for the TanDEM-X and ASTER GDEM V2 relative to the reference DEM derived from stable terrain before and after the co-registration. The variables ∆X, ∆Y and ∆Z are the shift vectors, STD is the standard deviation and dh is the mean difference.

Coreg. Parameters
TanDEM To determine trends in surface elevation over Hintereisferner, we used a national DEM (swisstopo DHM25 level 1) from 1992 and a 30 m version of the TanDEM-X DEM that was acquired between 2010 and 2014) [43]. The former DEM is derived from topographic maps (scale 1:50,000) and elevation values refer to the Swiss projection with a local geoid (CH1903) and a user-defined datum. The TanDEM-X DEM is made available by DLR and provided in geographic coordinates with a horizontal resolution of 12 and 30 m. The reported absolute vertical accuracies are <10 m. The vertical datum is WGS84-G1150, i.e., heights are ellipsoidal heights.
Within this study, only the TanDEM-X DEM and the ASTER GDEM V2 were co-registered to the SRTM DEM using the method described by Nuth and Kääb (2011) [44] because the swisstopo DEM already showed elevation differences of less than 1 m for stable terrain (∆Z: -0.65m, standard deviation  (Table 3). The co-registration method analyses elevation differences over stable terrain against terrain aspect and determines shift vectors by fitting (least squares matching) the parameters of a cosine function to the values.

Glacier Outlines
Glacier outlines from 2006 [38,45] were manually updated based on a false colour composite (SWIR, NIR, red as RGB) Landsat 7 ETM image from 29 August 2011 in a GIS environment. We used this satellite scene, as it has minimum SC, which facilitates an unambiguous identification of glacier margins. A size threshold of 0.5 km 2 was applied (26 glaciers remained) to guarantee that glaciers are sufficiently large to cover at least a vertical extent of 100 m. This is essential for the SLA detection method of ASMAG, which is based on the detection of SC in an elevation bin (see Section 4.2.2). All glacier extents were finally converted from vector to raster format (resolution of 60 m for Landsat MSS and 30 m for all other sensors) and clipped to the horizontal extent of the study region to speed up processing.

Mass Balance Data
Direct mass balance measurements and related ELA/AAR values are available for more than 30 years for the three glaciers HEF, KWF and VNF. The measurements follow the fixed date system defined as the period from 1 October to 30 September in the following year. Over the course of the melting period, the transient snow line roughly coincides with the equilibrium line over the glacier. Towards the end of the mass balance year, however, fresh snow fall could cover the ablation area, or exposed snow from previous years (i.e., firn) could cause a difference between the SLA and the ELA. Because the repeat cycle of Landsat impedes a continuous SLA monitoring, we use the annual maximum SLA of one summer season to approximate the ELA. In addition, we use the minimum SCR as a proxy for the AAR of a melting period. The ELA values from 1985 to 2016 are provided in geoidal heights for HEF, KWF and VNF by the World Glacier Monitoring Service (WGMS).

Methods (ASMAG Algorithm)
The general workflow of ASMAG is presented in Figure 2. The processing relies on three input datasets: (a) Glacier outlines, (b) a DEM, and (c) time series of multi-temporal Landsat satellite scenes. In an outer loop, the algorithm accesses all Landsat folders, reads metadata from each scene, determines cloud cover, converts digital numbers into top of atmosphere reflectance (TOAR) and performs a topographic correction. An inner loop cycles through all glaciers, assesses the cloud coverage per glacier and, if cloud cover is less than 10%, the SC and the SLA is calculated. ASMAG has been tailored to handle each Landsat sensor differently, as metadata can be located at different positions in the file, calibration values for calculating radiance and reflectance differ, the radiometric resolution changes, and the number of bands varies. ASMAG is written in IDL (Exelis Visual Information Solution) and various tools (layer stacking, resampling, topographic modeling including slope, aspect and shaded relief) from the ENVI remote sensing software package are integrated.

Data Pre-Processing
4.1.1. Import of Metadata and Landsat Image Sub-Setting ASMAG first imports all metadata (given in the *MTL.txt file) for each Landsat scene (sun elevation, sun azimuth, acquisition date) and each spectral band (bias, gain, pixel size). These numbers are used for the subsequent topographic normalization. Afterwards, all Landsat bands of a scene are cropped to the extent of the study region. The Otsu method is also included in IDL and automatically performs clustering-based image thresholding by assuming that the image contains two classes (snow and ice) and has thus a bimodal histogram [54,56]. As long as the amount of debris on a glacier is small compared to the clean ice area, debris will be included in the darker ice class. The optimum threshold for separating the two classes is found when their combined spread is minimal, or their inter-class variance is maximal [52]. In Figure 3 for three glaciers (no. 5, 17 and 20 in Figure 1) and two dates we show the Ekstrand corrected images, the mapped SC, the reflectance histograms along with the thresholds, and the location of the points (red) used for validation of the results. The latter are selected in obvious and in transition regions to provide a meaningful validation of the method. The SCR is calculated for each glacier and Landsat scene by dividing the snow-covered area by the total glacier area, similar to the accumulation area ratio (AAR) on glaciers [17].

Topographic Normalization with the Ekstrand Correction
Sensor data in the original digital numbers (DNs) are converted into physical values (radiance) and normalized values (reflectance) with band-specific radiometric calibration constants. The retrieval of brightness temperature in degrees kelvin (used for cloud cover mapping) and TOAR follows a two-step process. First, the at-satellite radiance (W/(m 2 sr µm)) is calculated for all bands based on the solar zenith angle at the time of satellite overpass. In a second step, the satellite radiance is converted once into kelvin (thermal band only) and into TOAR once [46][47][48]. The TOAR values are influenced by topography that alters the local incidence angle of solar radiation. This causes differences in reflectance between otherwise similar surface cover types on a pixel-by-pixel basis. As the selected glaciers are located in high mountain regions with partly steep topography, a topographic correction of the TOAR values is applied. Several methods have been developed [49][50][51] and a previous study [52] concluded that the Ekstrand correction provides the best results especially in scenes with SC or glaciers in steep mountain terrain. To perform the Ekstrand correction, the slope, aspect and a hillshade of the DEM for the Landsat acquisition date and time are calculated. This allows the assimilation of pixels considering (a) different illumination conditions, and (b) atmospheric propagation [49].

Cloud Cover Mapping
Although the satellite scenes selected for this study are mostly cloud-free, a cloud detection algorithm is applied to detect potential clouds over glaciers. Here we follow the approach of Irish (2000) [53] that detects clouds in two stages. At first, clouds are classified by excluding dark surfaces, snow is identified and removed with the NDSI (reflectance of snow and clouds is similarly high in the red part of the spectrum), and warm surfaces with high reflectance, vegetation, soil, etc., are identified using 5 out of 7 Landsat bands. In the second stage, a different threshold is applied to the thermal band to discriminate cold clouds from snow surfaces that usually have a higher temperature (at or closer to 0 • C). The cloud cover mask is calculated for the entire Landsat scene and intersected with the glacier mask to determine the cloud cover percentage for each glacier. If cloud cover over individual glaciers is less than 10%, the glacier is included in the processing of the related scene. The method by [53] works for all Landsat sensors, except the (cloud-free) MSS scene because MSS has no thermal band.

Snow Cover Mapping and the Snow Cover Ratio (SCR) for Individual Glaciers
The snow cover mapping algorithm is the core part of ASMAG. As a single threshold applied to the topographically corrected TOAR image of the entire study region has not provided useful results, the method is applied to individual glaciers where it selects a threshold automatically. We follow the approach presented by Bippus (2011) [52] and create a histogram of the Ekstrand corrected NIR band TOAR for each glacier after a mask (derived from the manually adjusted glacier inventory) has been applied. The NIR band also has the advantage of (in general) not being saturated over snow in 8-bit Landsat data [49]. The histogram is then used to identify a threshold to separate snow from ice. As histograms and thus optimum thresholds strongly vary, we tested the suggestion of Yin et al. (2013) [54] and use the Otsu algorithm [55] for an automatic detection of the threshold.
The Otsu method is also included in IDL and automatically performs clustering-based image thresholding by assuming that the image contains two classes (snow and ice) and has thus a bimodal histogram [54,56]. As long as the amount of debris on a glacier is small compared to the clean ice area, debris will be included in the darker ice class. The optimum threshold for separating the two classes is found when their combined spread is minimal, or their inter-class variance is maximal [52]. In Figure 3 for three glaciers (no. 5, 17 and 20 in Figure 1) and two dates we show the Ekstrand corrected images, the mapped SC, the reflectance histograms along with the thresholds, and the location of the points (red) used for validation of the results. The latter are selected in obvious and in transition regions to provide a meaningful validation of the method. The SCR is calculated for each glacier and Landsat scene by dividing the snow-covered area by the total glacier area, similar to the accumulation area ratio (AAR) on glaciers [17].

Snow Line Altitude (SLA) Determination
The snow line on a glacier is rarely a continuous line, as windblown snow, avalanche deposits, hollows and local shading can all cause locally variable snow accumulation and thus an inhomogeneous distribution of remaining snow patches. Hence, the lower boundary of the snow cover is often patchy and could be seen as a zone covering several elevation bands rather than a line at a distinct altitude ( Figure 4). The method developed to calculate the SLA accounts for these variable characteristics and follows the recommendation by the World Meteorological Organization [57] to interpret this region as a zone rather than a line. As a first step, the reference DEM is converted to 20 m (100 m for Landsat MSS) elevation bins. The SC map from each glacier is then digitally intersected with each elevation bin starting at the lowest one. The algorithm checks if the first five elevation bins all have a SC larger than 50%. In this case, the lowest bin is selected as the SLA. If there is no result with five bins throughout the glacier, the algorithm selects only four bins, and finally only three. If there is again no result, the algorithm takes the lowest elevation bin where a SC percentage of more than 50% is found. This procedure of considering isolated snowfields is similar to the principle of deriving mass balance from field measurements where such regions are considered as well. However, whether such snowfields will result in a positive mass balance for the related elevation bin or not will depend on specific conditions [58] like snow thickness and density. Even if an elevation bin has more than 50% SC, it might still be possible that the mass balance of the elevation bin is negative or vice versa (i.e., positive if less than 50% is snow covered).

Snow Line Altitude (SLA) Determination
The snow line on a glacier is rarely a continuous line, as windblown snow, avalanche deposits, hollows and local shading can all cause locally variable snow accumulation and thus an inhomogeneous distribution of remaining snow patches. Hence, the lower boundary of the snow cover is often patchy and could be seen as a zone covering several elevation bands rather than a line at a distinct altitude (Figure 4). The method developed to calculate the SLA accounts for these variable characteristics and follows the recommendation by the World Meteorological Organization [57] to interpret this region as a zone rather than a line. As a first step, the reference DEM is converted to 20 m (100 m for Landsat MSS) elevation bins. The SC map from each glacier is then digitally intersected with each elevation bin starting at the lowest one. The algorithm checks if the first five elevation bins all have a SC larger than 50%. In this case, the lowest bin is selected as the SLA. If there is no result with five bins throughout the glacier, the algorithm selects only four bins, and finally only three. If there is again no result, the algorithm takes the lowest elevation bin where a SC percentage of more than 50% is found. This procedure of considering isolated snowfields is similar to the principle of deriving mass balance from field measurements where such regions are considered as well. However, whether such snowfields will result in a positive mass balance for the related elevation bin or not will depend on specific conditions [58] like snow thickness and density. Even if an elevation bin has more than 50% SC, it might still be possible that the mass balance of the elevation bin is negative or vice versa (i.e., positive if less than 50% is snow covered).

Abramov Glacier (Kyrgyzstan) as an Independent Validation Region
The ASMAG approach has also been applied to Abramov glacier in Kyrgyzstan to have a further independent measure of validation under different environmental conditions. We compared the results of ASMAG to manually digitized snowlines from Landsat satellite imagery [59] which were used to calibrate a mass balance model throughout the melting season. To achieve comparable results, we used the same glacier outlines, DEM and Landsat imagery as in this previous study.

Snow Cover Mapping
The performance of the SC mapping on the individual glaciers was determined by randomly selecting two Landsat scenes from each of the four different types (L5 TM, L7 SLC-on, L7 SLC-off

Abramov Glacier (Kyrgyzstan) as an Independent Validation Region
The ASMAG approach has also been applied to Abramov glacier in Kyrgyzstan to have a further independent measure of validation under different environmental conditions. We compared the results of ASMAG to manually digitized snowlines from Landsat satellite imagery [59] which were used to calibrate a mass balance model throughout the melting season. To achieve comparable results, we used the same glacier outlines, DEM and Landsat imagery as in this previous study.

Snow Cover Mapping
The performance of the SC mapping on the individual glaciers was determined by randomly selecting two Landsat scenes from each of the four different types (L5 TM, L7 SLC-on, L7 SLC-off and L8 OLI) and identifying ten validation points before performing the automatic classification (see Figure 3 for location). As this should be a reference dataset for accuracy assessment, we took care, that at least 3 to 5 points for each glacier were in an obvious snow or ice zone, whereas the other points were in a transition zone with a more challenging (but still possible) class assignment. For accuracy assessment, all points were classified visually into snow and no snow ( Figure 3) and then compared to the results of the automatic method.

Snow Line Altitude (SLA)
The uncertainty of the SLA was determined for each dataset and glacier. Uncertainties result from: (1) The pixel size of the images, which range from 30 to 60 m, (2) the slope of the glacier near the SLA ranging between 6.8 • and 26.5 • , (3) the vertical accuracy of the DEM (±16 m) [12,14,41], and (4) the temporal difference between image acquisition and DEM date. To accommodate for (1) and (2), an elevation bin of 20 m (by adding ±10 m to the current SLA) is calculated. The mean slope of this elevation bin is derived in percent and multiplied with the respective grid size. For (3), DEM uncertainty was added using error propagation (root of the quadratic sum of the different independent errors). The impact of (4) is discussed in Section 6.2.2 as this source introduces a systematic bias that has not been corrected. The resulting total uncertainty of all SLAs varied between 16 and 66 m, depending on the scene and the glacier. The resulting mean uncertainty is ±19 m, which is taken as a measure of uncertainty for the SLA. It is important to distinguish this uncertainty value from the elevation bias of the SLA in any given year due to glacier surface height changes before and after 2000 (see Section 6.2.2).
The performance of the computed SLA has been determined for four glaciers (HEF, KWF, VNF and Hochjochferner (HJF); no. 11, 13, 23 and 26 in Figure 1) by a regression analysis comparing SLAs derived by manual assessment with those derived automatically by ASMAG. The manual SLA value was derived visually in the GIS by overlaying the satellite image on the reference DEM and subsequent determination of elevation values along the snow line visible in the image. All values were rounded to the nearest 20 m interval for comparison with the automatic value. The potential of ASMAG to approximate the ELA as derived by field observations (for the three glaciers with direct measurements) in Chapter 5.3, was restricted to years with at least two Landsat images available and years where the ELA was above the highest elevation of a glacier are eliminated. This assures that at least two values of the SLA are available for one summer. Finally, in each year the highest Landsat derived SLA value is selected to provide an approximation of the ELA [12].

Performance of the Snow Cover Mapping and Snow Line Altitude Detection Method
The comparison of the automatically-and manually-derived SLAs for the four test glaciers reveals a good agreement with a coefficient of determination (r 2 ) > 0.8 for all test glaciers ( Figure 5). Inconsistencies occur when a glacier is covered by a thin fresh snow layer. On HEF in Figure 5a differences up to 380 m (arrow 1) and 220 m (arrow 2) can be detected, whereas for HJF in Figure 5d one difference of 220 m (arrow 3) and three times of more than 100 m are visible. In this case, there was no clear minimum detectable in the histogram and Otsu failed. Another reason for failure is related to small clouds and their shadows. In the case of HJF, this once caused a 400 m difference (Figure 5d, arrow 4) and for HEF a 260 m difference (Figure 5a, arrow 5). The extensive cloud shadow caused a large dark area on the glacier and the SLA detection found at least three elevation bins in a row with SC above it. Also cast shadows by topography caused differences. In four cases, this caused an SLA difference of more than 100 m. As a consequence, the r 2 of HJF (0.82) is the lowest one of all four glaciers. The analysis of the classification accuracy for the selected glaciers shown in Figure 3 is summarized in Table 4 and reveals an overall classification accuracy of 90.6%. Table 4. Performance of the snow cover mapping for eight Landsat scenes and four glaciers (5,8,17,20). The columns 'Points (A/V)' give the number of points (see Figure 3) classified by ASMAG (A) and visually (V) as snow. In total, 43 out of 48 points (90.6%) are classified correctly.

Snow Cover Ratio and Snow Line Altitude through Time
The minimum SCR per season for HEF, KWF and VNF is shown along with the measured AAR in Figure 6. Both show a high year-to-year variability with SCR values in general somewhat higher than AAR due to the earlier (seldom later) acquisition date. Average AAR values for the first (1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000) and second periods (2001-2016) are 62% and 54%, respectively. Whereas the AAR of KWF is always the highest value, for HEF and VNF the ranking is variable. In most cases, the ranking is the same for the SCR values, indicating that glacier specific differences are established early on. Larger deviations are visible for all three glaciers in the years 1994, 2003, 2006, 2012 and 2015, whereas good to very good matches can be found in 1993,1999,2000,2001,2004,2009,2010 and 2016. Very good matches for individual glaciers occur more often for HEF than for the other glaciers. When averaged for all glaciers and for the entire study period, the mean value of the SCR is 58% (STD of 11%) whereas it is 40% for the AAR (18%). The analysis of the classification accuracy for the selected glaciers shown in Figure 3 is summarized in Table 4 and reveals an overall classification accuracy of 90.6%. Table 4. Performance of the snow cover mapping for eight Landsat scenes and four glaciers (5,8,17,20). The columns 'Points (A/V)' give the number of points (see Figure 3) classified by ASMAG (A) and visually (V) as snow. In total, 43 out of 48 points (90.6%) are classified correctly.

Snow Cover Ratio and Snow Line Altitude through Time
The minimum SCR per season for HEF, KWF and VNF is shown along with the measured AAR in Figure 6. Both show a high year-to-year variability with SCR values in general somewhat higher than AAR due to the earlier (seldom later) acquisition date. Average AAR values for the first (1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000) and second periods (2001-2016) are 62% and 54%, respectively. Whereas the AAR of KWF is always the highest value, for HEF and VNF the ranking is variable. In most cases, the ranking is the same for the SCR values, indicating that glacier specific differences are established early on. Larger deviations are visible for all three glaciers in the years 1994, 2003, 2006, 2012 and 2015, whereas good to very good matches can be found in 1993,1999,2000,2001,2004,2009, 2010 and 2016. Very good matches for individual glaciers occur more often for HEF than for the other glaciers. When averaged for all glaciers and for the entire study period, the mean value of the SCR is 58% (STD of 11%) whereas it is 40% for the AAR (18%).
Remote Sens. 2019, 9, x FOR PEER REVIEW 13 of 24 Figure 6. Snow cover ratio (SCR) for all glaciers (grey lines connecting one SCR value with the following one) within the >30 years study period. Colored points mark the minimum SCR for HEF, KWF and VNF for the respective season. The accumulation area ratio (AAR) values derived in the field for the three glaciers are shown as lines. The time series of the SLA values for all glaciers, as well as the maximum SLA and the ELA measured in the field for HEF, KWF and VNF are shown in Figure 7. A mean SLA of 2979 m is derived for the full sample and study period. From 1985-2000 to 2001-2016 the SLA increased from 2951 m to 3008 m (by 57 m). The maximum SLA in each season is in the majority of the cases lower than the ELA and fluctuates strongest for HEF. HEF often has the lowest SLA, whereas VNF often has the highest SLA. This is in line with the ELA variability but slightly different from the AAR and Figure 6. Snow cover ratio (SCR) for all glaciers (grey lines connecting one SCR value with the following one) within the >30 years study period. Colored points mark the minimum SCR for HEF, KWF and VNF for the respective season. The accumulation area ratio (AAR) values derived in the field for the three glaciers are shown as lines.
The time series of the SLA values for all glaciers, as well as the maximum SLA and the ELA measured in the field for HEF, KWF and VNF are shown in Figure 7. A mean SLA of 2979 m is derived for the full sample and study period. From 1985-2000 to 2001-2016 the SLA increased from 2951 m to 3008 m (by 57 m). The maximum SLA in each season is in the majority of the cases lower than the ELA and fluctuates strongest for HEF. HEF often has the lowest SLA, whereas VNF often has the highest SLA. This is in line with the ELA variability but slightly different from the AAR and SCR variations shown in Figure 6. Hence, ELA/AAR (SLA/SCR) values are not exactly mirrored. Four peaks in the ELA can be seen for the years 1991, 1994, 2003 and 2006/07. All these years had exceptionally warm summer periods where ELAs were partly above the highest elevations of the glaciers. Differences to the SLA were very large in these years, in particular for HEF which often suffered from cloud cover on the scenes closest to maximum ablation. In 1991 and 1994 the scenes available for calculating an SLA for HEF had particularly low values due to snowfall events that occurred shortly before image acquisition.

Maximum Snow Line Altitude Versus Equilibrium Line Altitude
The comparison of the maximum SLA (only for years with more than one acquisition) with the ELA of the respective year reveals a variable relationship (r 2~0 .48) among the three glaciers ( Figure 8). As is the case for the mean SLA values (Figure 7  . Snow cover ratio (SCR) for all glaciers (grey lines connecting one SCR value with the following one) within the >30 years study period. Colored points mark the minimum SCR for HEF, KWF and VNF for the respective season. The accumulation area ratio (AAR) values derived in the field for the three glaciers are shown as lines.  variations shown in Figure 6. Hence, ELA/AAR (SLA/SCR) values are not exactly mirrored. Four peaks in the ELA can be seen for the years 1991, 1994, 2003 and 2006/07. All these years had exceptionally warm summer periods where ELAs were partly above the highest elevations of the glaciers. Differences to the SLA were very large in these years, in particular for HEF which often suffered from cloud cover on the scenes closest to maximum ablation. In 1991 and 1994 the scenes available for calculating an SLA for HEF had particularly low values due to snowfall events that occurred shortly before image acquisition.

Maximum Snow Line Altitude Versus Equilibrium Line Altitude
The comparison of the maximum SLA (only for years with more than one acquisition) with the ELA of the respective year reveals a variable relationship (r 2 ~ 0.48) among the three glaciers ( Figure 8). As is the case for the mean SLA values (Figure 7

Abramov Glacier (Kyrgyzstan) SLA Results
Comparing SLA values obtained by ASMAG with those derived manually [59] for Abramov Glacier in Kyrgyzstan revealed accuracies of nearly 90% (Figure 9). The only outlier in Figure 9a (red arrow) is due to clouds on the tongue of the glacier, which was classified as snow covering more than five elevation bins. Figures 9b and 9c show that ASMAG sometimes includes more SC than the manual digitization. This might be related to difficulties in distinguishing firn from snow but also can arise if the manual classification is inaccurate. A very good agreement of the snowlines is shown in Figure 9d.

Abramov Glacier (Kyrgyzstan) SLA Results
Comparing SLA values obtained by ASMAG with those derived manually [59] for Abramov Glacier in Kyrgyzstan revealed accuracies of nearly 90% (Figure 9). The only outlier in Figure 9a (red arrow) is due to clouds on the tongue of the glacier, which was classified as snow covering more than five elevation bins. Figure 9b,c show that ASMAG sometimes includes more SC than the manual digitization. This might be related to difficulties in distinguishing firn from snow but also can arise if the manual classification is inaccurate. A very good agreement of the snowlines is shown in Figure 9d.  [59] is shown for three different years.

Snow Cover Ratio SCR and AAR
The mapped SCR for each glacier depends very much on the acquisition date of the scene and will be larger than the AAR in all years where the week of maximum ablation is not met. As this is seldom the case, our SCR values are generally higher. However, with very few exceptions, the ranking is identical indicating that glacier specific differences in the ablation pattern are well captured by the ASMAG mapping. On the other hand, in years where (cloud-free) scenes from near the end of the ablation period are available (mostly after 1997) very good matches of the SCR with the AAR have been achieved for all three glaciers. This indicates the high potential of the method for high quality results when denser time series of satellite data are available. Despite the high year-to-year variability, a negative and nearly identical trend of AAR values is observed over the two study periods (1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016) for the three glaciers (from -0.06 to -0.08). Such trends were also found before 1998 [60] and are in line with a temperature increase since 1979 at the climate station in Vent (0.07 °C /year) [58] and a general decrease of SC in the Alps [61,62].
Less or even no SC on glaciers is expected to become more frequent in the future especially for mid-latitude glaciers. One example representing such conditions was the very hot summer in 2003, where the measured ELA was above the highest point of the glacier. Under such conditions, ASMAG is expected to produce erroneous results from arbitrary snow patches or bright spots on the glacier, although in our case ASMAG worked well in 2003 as only images from mid and end of July (still with snow on glaciers) were available.

Snow Cover Ratio SCR and AAR
The mapped SCR for each glacier depends very much on the acquisition date of the scene and will be larger than the AAR in all years where the week of maximum ablation is not met. As this is seldom the case, our SCR values are generally higher. However, with very few exceptions, the ranking is identical indicating that glacier specific differences in the ablation pattern are well captured by the ASMAG mapping. On the other hand, in years where (cloud-free) scenes from near the end of the ablation period are available (mostly after 1997) very good matches of the SCR with the AAR have been achieved for all three glaciers. This indicates the high potential of the method for high quality results when denser time series of satellite data are available. Despite the high year-to-year variability, a negative and nearly identical trend of AAR values is observed over the two study periods (1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016) for the three glaciers (from −0.06 to −0.08). Such trends were also found before 1998 [60] and are in line with a temperature increase since 1979 at the climate station in Vent (0.07 • C /year) [58] and a general decrease of SC in the Alps [61,62].
Less or even no SC on glaciers is expected to become more frequent in the future especially for mid-latitude glaciers. One example representing such conditions was the very hot summer in 2003, where the measured ELA was above the highest point of the glacier. Under such conditions, ASMAG is expected to produce erroneous results from arbitrary snow patches or bright spots on the glacier, although in our case ASMAG worked well in 2003 as only images from mid and end of July (still with snow on glaciers) were available.

Maximum SLA and ELA
Similar to the SCR and AAR but with less variability, values for the highest SLA within a season are systematically lower than ELA values. In years with a good match, the ranking of SLA values is also in agreement with ELA values albeit less often than for the SCR/AAR pairs. Considering the simplicity of the SLA calculation and the different way the ELA is derived from field measurements, this agreement is promising for applying the method to more frequently acquired satellite data. As for the SCR, most of the SLA outliers (e.g., in 1991, 1994, 2003, 2007 and 2012) can be explained with the absence of appropriate (cloud-free) images. If images from the end of the ablation season are not available, the automatically mapped maximum SLA cannot be used as a proxy for the ELA nor the glacier mass balance. However, all ASMAG derived snow cover maps can still be used to calibrate and/or validate results of mass balance models [29]. Moreover, snow cover maps are an important input for albedo or degree-day factor parameterizations in mass balance models and have thus further applications.

Comparison to Other Studies
Rabatel and colleagues [13] showed high correlations between the ELA and SLA (r 2 > 0.89) arguing that satellite images can be used to determine ELA. However, they worked with a manually selected sample of scenes that was acquired closest to the date of the highest SLA. Our automated approach gives larger deviations and lower correlations as our selection of summer images also includes years with a large temporal difference between the date of the image and that of the highest snow line. Hence, manual selection of images close to the end of the ablation season [13] can represent the ELA better. However, this might require utilizing Sentinel-2, SPOT or ASTER scenes to increase temporal sampling and achieve a denser time series. This is confirmed by the high agreement of SLA values derived by ASMAG for Abramov glacier when scenes are manually selected. In this case, SLA values are even slightly higher than the manually derived ones, indicating a more conservative interpretation, i.e., firn is likely excluded. When using only Landsat imagery, as in our workflow, the limited temporal sampling becomes an issue.

Satellite Data
ASMAG uses freely available Landsat data that have been acquired since 1985 and can be downloaded from different sources for most regions in the world. Alternatively, freely available ASTER (1999-2008) and Sentinel-2 data (since 2015) can be used, but their time series are shorter. Data from MODIS are generally too coarse for the small glaciers in the Alps, but several studies exist where MODIS on small glaciers and ice caps has been used to derive time series of surface albedo values and regional snow line elevations to approximate their annual mass balance [24,25,30,[63][64][65]. The main drawback of Landsat data when mapping the maximum SLA is the long revisit time of 16 days. If clouds hide glaciers at the time of image acquisition, the number of useful images from the end of the ablation period is often insufficient to capture the highest snowline elevation. Over the full study period, only in the year 2009 six Landsat images (Landsat 5 and 7 SLC-off scenes) were available. Such conditions facilitate capturing the minimum SC and maximum SLA on single glaciers and results in a much smaller difference to values measured in the field. In this regard, it can be expected that applying the ASMAG approach to images from the Sentinel-2A/B satellite constellation with a revisit time of 5 days or shorter will improve future data availability drastically [66], thus overcoming the temporal limitations encountered here with Landsat data.

Digital Elevation Model
A systematic bias is introduced by using the reference DEM from 2000 for the full study period, as this is neither representative of the glacier surface from 1985 nor of recent times. Assuming a more or less constant surface lowering with time introduces a bias, in particular in rapidly down-wasting glacier parts at lower elevations. We have thus investigated this effect on HEF as this glacier is covered in a 1992 DEM from swisstopo and a DEM from the TanDEM-X mission (acquired around 2013). Figure 10 shows elevation profiles extracted along a centre line from all three DEMs (DEM 1992, SRTM DEM, TanDEM-X), revealing increasing differences towards lower elevations. When averaging differences in four elevation bins, the obtained values in the lowest interval (2500-2600 m) are about 64 m higher in 1992 and about 60 m lower with TanDEM-X (Table 5). Such systematic differences have to be considered when interpreting the time series depicted in Figure 7. Strictly, only the SLA values derived around the year 2000 are unbiased.
Remote Sens. 2019, 9, x FOR PEER REVIEW 17 of 24 glacier parts at lower elevations. We have thus investigated this effect on HEF as this glacier is covered in a 1992 DEM from swisstopo and a DEM from the TanDEM-X mission (acquired around 2013). Figure 10 shows elevation profiles extracted along a centre line from all three DEMs (DEM 1992, SRTM DEM, TanDEM-X), revealing increasing differences towards lower elevations. When averaging differences in four elevation bins, the obtained values in the lowest interval (2500-2600 m) are about 64 m higher in 1992 and about 60 m lower with TanDEM-X (Table 5). Such systematic differences have to be considered when interpreting the time series depicted in Figure 7. Strictly, only the SLA values derived around the year 2000 are unbiased.  With DEMs from three points in time available it would be possible to apply elevation dependent correction factors of surface elevation for each year. However, multiple DEMs are not yet generally available for most regions, so here we have only investigated the theoretical effect of neglecting such changes. In the above calculation of differences (that compares an optical DEM to Cand X-Band radar DEMs) we have not corrected elevation differences due to radar penetration [67,68] as they are about an order of magnitude smaller. In effect, SLAs before 2000 have to be corrected by up to +64 m and lowered to max -60 m after 2000. Such a correction would bring values from the first period closer to the measurements but would move those from the second period further away. Nevertheless, most of the SLA values derived in this work stem from around 3000 m (see Figure 7) where the elevation difference are less prominent.
In flat and smooth terrain the vertical accuracy of the DEM is often much higher than in steep rugged terrain [69]. As glacier surfaces are comparably flat (see Figure 1 for mean slope values per glacier) and smooth (30 m cell size) it can be assumed that the reported vertical accuracy of the SRTM DEM (±16 m) is not exceeded over glaciers. Still remaining differences can be due to radar penetration (as shown by [70]) or local artefacts. The systematic differences due to general surface lowering, however, have to be analysed, in particular when values are in the same order of magnitude as the observed trends in elevation change through time.

Methodological Constraints
In this study we presented a method to map snow on glaciers based on techniques developed in previous studies [49,52], and added automated processing lines for threshold selection (using the Otsu method) and SLA determination. The approach works very well when snow and ice facies  With DEMs from three points in time available it would be possible to apply elevation dependent correction factors of surface elevation for each year. However, multiple DEMs are not yet generally available for most regions, so here we have only investigated the theoretical effect of neglecting such changes. In the above calculation of differences (that compares an optical DEM to C-and X-Band radar DEMs) we have not corrected elevation differences due to radar penetration [67,68] as they are about an order of magnitude smaller. In effect, SLAs before 2000 have to be corrected by up to +64 m and lowered to max -60 m after 2000. Such a correction would bring values from the first period closer to the measurements but would move those from the second period further away. Nevertheless, most of the SLA values derived in this work stem from around 3000 m (see Figure 7) where the elevation difference are less prominent.
In flat and smooth terrain the vertical accuracy of the DEM is often much higher than in steep rugged terrain [69]. As glacier surfaces are comparably flat (see Figure 1 for mean slope values per glacier) and smooth (30 m cell size) it can be assumed that the reported vertical accuracy of the SRTM DEM (±16 m) is not exceeded over glaciers. Still remaining differences can be due to radar penetration (as shown by [70]) or local artefacts. The systematic differences due to general surface lowering, however, have to be analysed, in particular when values are in the same order of magnitude as the observed trends in elevation change through time.

Methodological Constraints
In this study we presented a method to map snow on glaciers based on techniques developed in previous studies [49,52], and added automated processing lines for threshold selection (using the Otsu method) and SLA determination. The approach works very well when snow and ice facies have good contrast resulting in a bimodal histogram. The method can thus be applied to all optical sensors where a NIR band is available and to all glaciers showing an optical contrast between ice and snow. If a thin SC is present over the entire glacier, there is no optical contrast and the Otsu method has problems in finding a threshold (Figure 11a). This is acceptable as the determination of the SCR or SLA has limited glaciological meaning in such cases. However, for hydrological applications a complete SC is also of interest and it could be set to 100% based on threshold criteria. If polluted firn is present, it is likely classified as ice and if the firn is very bright as snow (see Figure 9b,c). In the latter case, this will add to the uncertainty as firn belongs to the ablation area. have good contrast resulting in a bimodal histogram. The method can thus be applied to all optical sensors where a NIR band is available and to all glaciers showing an optical contrast between ice and snow. If a thin SC is present over the entire glacier, there is no optical contrast and the Otsu method has problems in finding a threshold (Figure 11a). This is acceptable as the determination of the SCR or SLA has limited glaciological meaning in such cases. However, for hydrological applications a complete SC is also of interest and it could be set to 100% based on threshold criteria. If polluted firn is present, it is likely classified as ice and if the firn is very bright as snow (see Figures 9b and 9c). In the latter case, this will add to the uncertainty as firn belongs to the ablation area. Figure 11. Snow cover mapping performance for two problematic cases. (a) HEF with fresh snow cover and its associated wrong mapping. (b) HEF with clouds and its related shadows, which also cause the algorithm to fail. Right panels show the respective histograms of the Ekstrand corrected image. A clear selection of a threshold is difficult in these cases.
The influence of debris cover on the quality of the results is difficult to determine here, as only a few glaciers with debris-covered tongues are present in the study region. An assessment with the debris-covered dead ice body at HEF revealed that the impact on the classification is small, as debris on glaciers generally falls into the class ice, using the Otsu classification. If the method is applied to glaciers with a high share of debris cover in their ablation zone (e.g., in the Himalaya), a simple clean-ice classification with a band ratio can be performed to first exclude the debris-covered parts from the glacier mask. This is facilitated by the availability of a debris cover mask in the global Randolph Glacier Inventory since version 6.0 [71]. Temporary debris cover in the accumulation zone caused by ash or rock fall may also affect the quality of the SC mapping and SLA determination as these regions are then excluded from the SC map. The SCR would be underestimated and the SLA might be too high in such cases. Compared to clouds and their shadows, however, such events are rare and thus their overall impact is small.
Automated cloud detection is a challenging issue in automated processing of optical remote sensing images, as they exhibit diverse optical properties (water, ice) and densities (fog, cirrus, cumulus) so that the landscape below might partly shine through. Whereas thin ice clouds can be detected over ice and snow by their higher SWIR reflectance [21], the cloud tops of optically thick water clouds (cumulus) might be detected by their lower thermal infrared (TIR) emission (i.e., they are cooler). However, for the optically thin boundaries of a cloud the difference to the reduced TIR emission over ice and snow might be small so that cloud parts might remain over glaciers and be (a) (b) Figure 11. Snow cover mapping performance for two problematic cases. (a) HEF with fresh snow cover and its associated wrong mapping. (b) HEF with clouds and its related shadows, which also cause the algorithm to fail. Right panels show the respective histograms of the Ekstrand corrected image. A clear selection of a threshold is difficult in these cases.
The influence of debris cover on the quality of the results is difficult to determine here, as only a few glaciers with debris-covered tongues are present in the study region. An assessment with the debris-covered dead ice body at HEF revealed that the impact on the classification is small, as debris on glaciers generally falls into the class ice, using the Otsu classification. If the method is applied to glaciers with a high share of debris cover in their ablation zone (e.g., in the Himalaya), a simple clean-ice classification with a band ratio can be performed to first exclude the debris-covered parts from the glacier mask. This is facilitated by the availability of a debris cover mask in the global Randolph Glacier Inventory since version 6.0 [71]. Temporary debris cover in the accumulation zone caused by ash or rock fall may also affect the quality of the SC mapping and SLA determination as these regions are then excluded from the SC map. The SCR would be underestimated and the SLA might be too high in such cases. Compared to clouds and their shadows, however, such events are rare and thus their overall impact is small.
Automated cloud detection is a challenging issue in automated processing of optical remote sensing images, as they exhibit diverse optical properties (water, ice) and densities (fog, cirrus, cumulus) so that the landscape below might partly shine through. Whereas thin ice clouds can be detected over ice and snow by their higher SWIR reflectance [21], the cloud tops of optically thick water clouds (cumulus) might be detected by their lower thermal infrared (TIR) emission (i.e., they are cooler). However, for the optically thin boundaries of a cloud the difference to the reduced TIR emission over ice and snow might be small so that cloud parts might remain over glaciers and be classified as snow by ASMAG. Moreover, the (static) band thresholds for the cloud cover mapping procedure are currently calibrated for only one scene per Landsat type, so that the calculated percentages of cloud cover above a glacier are not always reliable. This could be improved in future versions of ASMAG, preferably in a scene-dependent manner.
Shadows from clouds and cast shadows from steep topography cause problems for the Ekstrand correction. Such shadowed areas on glaciers are classified as ice and result in smaller SCR and incorrect SLA values if they hide snow-covered regions. However, cloud shadows are rare in the workflow and topographic shadows are relatively small in the study region in August and early September. In more extreme relief, the algorithm may profit from inclusion of a topographic shading correction. On the other hand, bright clouds over the ablation region of a glacier are mapped as snow when they are not identified by the cloud mapping procedure (Figure 11b), resulting in an overestimation of the SCR and too low SLA values. However, this only happens when the misclassified 'snow' area is covering at least 50% in five elevation bins.
ASMAG uses a glacier mask from 2011 with smaller glaciers than in the years before (Figure 12a,b). The main reason for this selection was the good quality of the scene (no perennial snowfields and clouds) for glacier extent mapping. As the static mask introduces a bias, we have analyzed the impact of glacier extent on the SCR with a mask from 1986. This revealed the same trends for the four test glaciers (HEF, KWF, VNF and HJF), but SCR values are always smaller than with the 2011 extents as the total glacier area was larger. The mean of the differences for all four test glaciers is 1%. Hence, the impact of the mask from 2011 on the SCR is small but should be considered if SCR is used to approximate glacier mass balance. The accuracy of the SC mapping is also shown in Figure 12b with an overlay (red outlines) on a very high resolution satellite image. Shadows from clouds and cast shadows from steep topography cause problems for the Ekstrand correction. Such shadowed areas on glaciers are classified as ice and result in smaller SCR and incorrect SLA values if they hide snow-covered regions. However, cloud shadows are rare in the workflow and topographic shadows are relatively small in the study region in August and early September. In more extreme relief, the algorithm may profit from inclusion of a topographic shading correction. On the other hand, bright clouds over the ablation region of a glacier are mapped as snow when they are not identified by the cloud mapping procedure (Figure 11b), resulting in an overestimation of the SCR and too low SLA values. However, this only happens when the misclassified 'snow' area is covering at least 50% in five elevation bins.
ASMAG uses a glacier mask from 2011 with smaller glaciers than in the years before ( Figures  12a and 12b). The main reason for this selection was the good quality of the scene (no perennial snowfields and clouds) for glacier extent mapping. As the static mask introduces a bias, we have analyzed the impact of glacier extent on the SCR with a mask from 1986. This revealed the same trends for the four test glaciers (HEF, KWF, VNF and HJF), but SCR values are always smaller than with the 2011 extents as the total glacier area was larger. The mean of the differences for all four test glaciers is 1%. Hence, the impact of the mask from 2011 on the SCR is small but should be considered if SCR is used to approximate glacier mass balance. The accuracy of the SC mapping is also shown in Figure 12b with an overlay (red outlines) on a very high resolution satellite image.

Conclusions
In this study, we presented a method (ASMAG) for the multi-temporal (1985-2016) retrieval of snow cover ratios (SCR) and snow line altitudes (SLA), and the results of its application to selected glaciers in the Ötztal Alps as well as Abramov Glacier in the Pamir Alay, Kyrgyzstan. The SC mapping is applied to each glacier individually using an automatically selected top of atmosphere reflectance (TOAR) threshold based on the Otsu method. Subsequently, the SC map is intersected with 20 m elevation bins of the DEM and its SLA is retrieved.
The overall performance of ASMAG can be seen as good to very good (90% accuracy), but local outliers exist. The SLA detection capability is better than 80% with an accuracy of ±19 m. If the acquisition date of the satellite scene is close to the date of the highest snowline, a good agreement of the derived SCR (SLA) values with the measured AAR (ELA) values is achieved, otherwise SCR (SLA) values are systematically higher (lower). However, glacier specific differences are still

Conclusions
In this study, we presented a method (ASMAG) for the multi-temporal (1985-2016) retrieval of snow cover ratios (SCR) and snow line altitudes (SLA), and the results of its application to selected glaciers in the Ötztal Alps as well as Abramov Glacier in the Pamir Alay, Kyrgyzstan. The SC mapping is applied to each glacier individually using an automatically selected top of atmosphere reflectance (TOAR) threshold based on the Otsu method. Subsequently, the SC map is intersected with 20 m elevation bins of the DEM and its SLA is retrieved.
The overall performance of ASMAG can be seen as good to very good (90% accuracy), but local outliers exist. The SLA detection capability is better than 80% with an accuracy of ±19 m. If the acquisition date of the satellite scene is close to the date of the highest snowline, a good agreement of the derived SCR (SLA) values with the measured AAR (ELA) values is achieved, otherwise SCR (SLA) values are systematically higher (lower). However, glacier specific differences are still correctly captured.
Problems arise when clouds and their shadows are covering parts of glaciers or when fresh snow is present. A small amount of debris cover has no impact on the result, but for large debris covered glaciers a separate ice mask could be useful. A DEM from one point in time is important and useful for the topographic correction of the NIR band. When analysing a time period of more than ten years, a DEM of one point in time introduces errors of the SLA values especially in low laying areas of the glacier. In contrary, the unchanged glacier mask only has a minor impact on the SCR in long time series.
We conclude that ASMAG is a promising tool to derive SCR and the SLA for large glacier samples. The derived snow cover maps can be used to calibrate/validate modelled snow cover evolution and as a proxy for glacier mass balance if cloud-free satellite scenes acquired close to the date of the highest snowline are available. However, Landsat data alone is too sparse for accurate retrieval of maximum snow line elevations, particularly in mountain regions with frequent cloud cover. The inclusion of higher frequency optical satellite data such as the forthcoming abundance of Sentinel-2 imagery will help overcome this restriction and could be relatively readily incorporated into the tool.