Structure-from-Motion Using Historical Aerial Images to Analyse Changes in Glacier Surface Elevation

The application of structure-from-motion (SfM) to generate digital terrain models (DTMs) derived from different image sources has strongly increased, the major reason for this being that processing is substantially easier with SfM than with conventional photogrammetry. To test the functionality in a demanding environment, we applied SfM and conventional photogrammetry to archival aerial images from Zmuttgletscher, a mountain glacier in Switzerland, for nine dates between 1946 and 2005 using the most popular software packages, and compared the results regarding bundle adjustment and final DTM quality. The results suggest that by using SfM it is possible to produce DTMs of similar quality as with conventional photogrammetry. Higher point cloud density and less noise allow a higher ground resolution of the final DTM, and the time effort from the user is 3–6 times smaller, while the controls of the commercial software packages Agisoft PhotoScan (Version 1.2; Agisoft, St. Petersburg, Russia) and Pix4Dmapper (Version 3.0; Pix4D, Lausanne, Switzerland) are limited in comparison to ERDAS photogrammetry. SfM performs less reliably when few images with little overlap are processed. Even though SfM facilitates the largely automated production of high quality DTMs, the user is not exempt from a thorough quality check, at best with reference data where available. The resulting DTM time series revealed an average change in surface elevation at the glacier tongue of −67.0 ± 5.3 m. The spatial pattern of changes over time reflects the influence of flow dynamics and the melt of clean ice and that under debris cover. With continued technological advances, we expect to see an increasing use of SfM in glaciology for a variety of purposes, also in processing archival aerial imagery.


Introduction
Digital terrain models (DTMs), as representations of the earth's surface, are an indispensable source of information in the geosciences.Structure-from-motion (SfM), especially in combination with multi-view-stereo (MVS) algorithms that substantially increase point cloud densities (thus correctly termed SfM-MVS [1]), has become an increasingly popular technology (overviews by [1,2]).Many studies have a methodological nature, focusing on the new technology and its possibilities, whilst others applied SfM-MVS, e.g., to geomorphological [3,4] and glaciological [5,6] problems.Usually, small areas (10 2 -10 4 m 2 ) are considered and terrestrial or low-altitude image platforms such as UAVs, blimps and kites are used.Bakker et al. (2016) [3] have investigated whether SfM-MVS can be applied to plane-based aerial images.This is more challenging due to fewer images, a minor change in viewing angle (usually nadir), and a smaller image overlap.Bakker et al. (2016) [3] showed that SfM-MVS-derived DTMs have similar qualities as DTMs derived with conventional photogrammetry, even when using archival imagery acquired decades ago.The accuracies of the relatively small area of interest (AOI), a braided riverbed, were comparable, and changes in the order of decimetres could be detected.
In glaciology, DTMs are widely used to observe ongoing processes and monitor changes [7].Differences between DTMs from different dates (years to decades apart) provide valuable information on local or glacier-wide surface elevation changes, e.g., the geodetic mass balance (e.g., [8,9]).The application of SfM-MVS on historical, archival aerial imagery to detect mountain glacier changes has not yet been systematically investigated.In alpine areas, topography is highly challenging because of the steep slopes and the large elevation gradient.Generally, glaciers cover larger areas (in the order of square kilometres), making it harder to derive high accuracies in steep topography (e.g., [10]), because of the reduced area of potentially stable terrain and the smaller possibility for good distribution of ground control points (GCPs), which are essential for generating a high-quality DTM.Additionally, extracting information from homogeneous (snow or ice-covered) surfaces is challenging due to low image texture.Archival aerial imagery is a valuable source of information, notably for long-term changes in glacier volume.Imagery is potentially available for many parts of the world from dates of more than 50 and sometimes even more than 100 years back in time.SfM-MVS could make the analysis of such data attractive because of the smaller effort of time and expert knowledge required, and the larger point cloud that potentially results in higher ground resolution.
The major goal of this study was to investigate whether it is possible to retrieve DTMs from aerial images using widely-used commercial SfM-MVS software, and whether these DTMs are of sufficient quality to analyse changes in glacier elevation.Therefore, we compared the resulting DTMs with those produced with conventional photogrammetry using the same input data, and assessed their quality with statistics from the bundle adjustment process and the performance over stable terrain using an independent reference DTM.Thereby, we could highlight the strengths and weaknesses of the two technologies for glaciological purposes.The final question addressed is whether it will be possible to reconstruct long-term glacier elevation changes or even geodetic mass balances with just a few commands by executing automated processes if the corresponding glacier outlines are available.

Study Area
The study area is the debris-covered Zmuttgletscher and its immediate surroundings located in the western Swiss Alps (45 • 59.8 N, 7 • 37.8 E), close to the Matterhorn (4478 m a.s.l.) (Figure 1).The glacier has a size of ~16 km 2 and its elevation ranges from ~2200 m to over 3700 m.Zmuttgletscher is to a large extent surrounded by high rock walls.The steep terrain is often unstable and the constant rock fall combined with glacier retreat has led to a heavily debris-covered glacier tongue.The described location poses a number of difficulties for photogrammetric analysis, e.g., lower accuracy of DTMs in steeper areas [10,11] and the demanding detection of GCPs [12].

Target Imagery
The regular updates of the Swiss national map series are one reason why aerial images have been acquired in our study region over the past decades.Another reason is glacier monitoring purposes that have focused on the tongues of a number of Swiss glaciers [13].The overflights for both purposes have been performed by the federal topographic institute Swisstopo.The images were taken with large-format metric cameras, with a scale of ~1:25,000 (mapping campaigns) and ~1:11,000 (glacier monitoring campaigns; see Table 1).However, the scale varies considerably throughout the study area, which covers terrain between 2100 and >4000 m.In addition, there are seven images from 1946, acquired by American military planes after the end of World War II, which are of lower quality and resolution.The archival aerial imagery was scanned with a resolution of 14 micrometres and can be purchased from Swisstopo.For our analysis, we used a total of 95 aerial images with varying degree of quality, scale, overlap and number of images per date (Table 1).

Reference Data
We used the SwissALTI 3D (SA3D) as the reference DTM [15], which in our study area is based on photogrammetric DTMs derived from aerial imagery acquired in summer 2010, with a pixel resolution of 2 m and a vertical accuracy of ±1-3 m [15].From the difference between SA3D and the slave DTMs, the vertical accuracy was calculated using areas of stable terrain.Stable terrain is required to show no elevation change over time and additionally cover various terrain expositions to adequately represent the slopes of the study area.It was challenging to fulfil these requirements in a relatively small, alpine, glaciated basin (see Figure 2), but areas around the lower part of the tongue could serve as stable terrain for all dates.The Swissimage, a composite of aerial images over all of Switzerland [16], was used to locate GCPs (the elevation of which was taken from SA3D) and areas of stable terrain.

Conventional Photogrammetry Using ERDAS-IP 2015
For the production of the conventional photogrammetric DTMs we used the popular ERDAS imagine photogrammetry [17] (ERDAS-IP).The conventional stereo-photogrammetric approach requires information on interior and exterior camera orientation, i.e., focal length, image size, radial distortion, planar coordinates and camera acquisition elevation.In our case, camera locations are not known but could be calculated in an automatic resection process using the GCP-based geometric model.Additionally, the two tilt angles and the flight direction are required (pitch, roll, yaw); for aerial images pitch and roll can initially be set to zero.With this information, the geometric relations between the images are set up which are then used to solve the collinearity equations.The quality of this input information is crucial for the solution to be as accurate as possible.Therefore, a precise photogrammetric camera is used that was calibrated before the flight campaigns.By locating at least three GCPs in every image, the exact coordinates for all other image pixels can be calculated.
The processing steps are (1) Defining the camera model: Information about the sensor and from the camera calibration report are used to set camera parameters and flying height and delineate the single images; (2) Performing the bundle adjustment: GCPs are manually located in all images, followed by an automatic calculation of tie points and the subsequent aerotriangulation, including adjustments of previously set camera parameters; (3) Generating the DTM: Stereo-matching algorithms use the geometric model and calculate pixel coordinates, resulting in a point cloud that is resampled into a DTM grid with regular pixel size.

SfM-MVS
For the SfM-MVS technology, a similarly popular software is Agisoft PhotoScan [20] (e.g., used by [21,22]).This commercial package contains all the steps from image input to DTM and orthophoto outputs, and thus allows for the setting of GCPs and the possibility to recalibrate initially calculated model parameters, and includes a module for georeferenced DTM generation.A comparably comprehensive software is Pix4Dmapper with large-frame extension [23], which has become more popular recently (e.g., [3,24]).Since the exact algorithms of the two packages are disclosed and potentially different, we included them both in order to get an idea of the differences.
The SfM-MVS process is based on the same photogrammetric principles of image bundle adjustment and the 3D localization of the single pixels, but is different in terms of step sequence.
(1) Matching the input images: In contrast to conventional photogrammetry, the processing starts with the image matching, which is applied to the raw, unstructured images.Key points are detected in the images that are stable under viewpoint and lighting conditions and are described in relation to their neighbourhood (e.g., using the Scale-invariant feature transform algorithm; [25]).Following this, the algorithm extracts information on the correspondences of these key points in the different images.(2) Performing the bundle adjustment: This step runs analogous to step (2) of conventional photogrammetry.The calculation of camera parameters can be supported by the input of GCPs; highest accuracy is retrieved with a homogeneous spatial distribution.
Apart from setting and adapting GCPs, the whole process is automatic, strongly reducing the manual effort compared to ERDAS-IP.
Finally, in every software, a seven-parameter transformation (three translation, three rotation, one scaling) is applied to bring the point cloud to object coordinates [28], here being the Swiss National Grid CH-LV1903 (EPSG: 21781).

Ground Control Points
For conventional photogrammetry, a minimum of three points per image is necessary, while SfM point clouds only need three points in total to be scaled and georeferenced.However, a higher number increases accuracy, especially in areas of steep terrain [21].
We collected the GCPs from Swissimage using path crossings or individual rocks, while rock or vegetation patterns facilitated the point identification from different angles and under different illumination conditions.The rocky terrain complicates the search for GCPs, and the high relief leads to changing patterns of shadow and light across the different dates; changing snow conditions also complicate the location of GCPs.Since glaciers cover large parts of the images, only a small part of the total area is potentially available for locating GCPs.A set of base GCPs were collected that were stable and visible over time and changing conditions during the study period.With this subset of points (see Figure 1) we achieved a homogeneous distribution in elevation (ranging from below the glacier tongue up to over 3500 m a.s.l.) and could constrain the glacier on all sides.This pool of GCPs served as a homogeneous base for the georeferencing of all DTMs and many of these points were used for every or a majority of the DTM constructions.The number of GCPs that were finally used was mainly determined by the ground area that was covered by the flight campaign.On average, we used 18 GCPs (min: eight, max: 30) to produce the DTMs, more for ERDAS-IP if the number of input images was higher.

Final DTM
The point cloud density defines the potential DTM resolution.We set relatively low accuracy thresholds for ERDAS-IP in order to produce DTMs with high quality, which results in lower resolution due to stronger point filtering.The two SfM methods achieve denser point clouds of high quality, finally resulting in higher resolution DTMs with little noise and few artefacts.Only areas constrained by GCPs in several directions can yield reliable numbers.We thus used manually derived concave hull polygons to constrain the DTM extent, rather than using the full DTM output area (see Figure 2).In order to compare the different DTMs from the same date, the overlapping area of the concave hull polygons was chosen as a final mask for analysis.For the glacier signal comparison of all DTMs we used one minimum overlapping polygon.

Coregistration
A number of scientific publications stress the need to do a coregistration of DTMs, especially when comparing them to a reference DTM (e.g., [29]).The coregistration was based on the assumption of terrain being stable outside of glaciers, their forefields and steep areas.For the coregistration we applied the method of Nuth and Kääb (201) [29].A potential spatial trend between the master (SA3D) and slave DTMs was eliminated using a second order trend surface (cf.[30,31]).In order to estimate the quality of the original and the coregistered DTM, we analysed the differencing image between each DTM and the master DTM and pixel value statistics.The western images for the dates 14 September 197714 September , 1983, and 1995 cover a much smaller area and contain mostly snow/ice surfaces, making it difficult to establish a correct connection to 3D points in an existing coordinate system.Thus, a tilt of the DTM may be introduced which can be detected visually but is not obvious from the stable terrain statistics.Combining the visual inspection and the terrain statistics, a choice was made between the original and the coregistered version of the DTM.Out of a total of 46 DTMs (23 original/coregistered) from nine dates and three different software packages, nine were not found to show improved quality after coregistration.In the other 14 cases, only one to three iterations were necessary and the applied shift was small, ranging between zero and five meters (average 0.34 m).

Quality Estimation
We estimated the quality of each DTM per date and software with a raster-by-raster comparison with the reference DTM over stable terrain [32].For this stable terrain (see Figure 2) the mean difference and the pixel standard deviation were calculated, yielding an estimate of the general DTM quality.
DTM artefacts can be recognized by visual inspection.They appear in noisy areas where the key point detection and tie point matching is difficult because of reflectance or topographic effects, which is the case in very bright or flat areas (fresh snow, lakes), very steep areas (rock walls), or shadows.In case they are not contained by the AOI, rock walls and respective artefacts can also be neglected in quality checks using stable terrain.
During the bundle adjustment, the final geometric model is established and can be compared to the previously identified GCPs.The difference between the manually set GCP and the location of the same GCP calculated by the model gives an indication of the internal bundle adjustment quality.The location error of the x, y and z coordinate can be expressed as a root mean square error (RMSE) and is available for every software.

Bundle Adjustment
In total, 23 out of 27 possible DTMs could be generated with a satisfying level of quality, i.e., an RMS error that is lower than the glacier signal for each period and a ground resolution that enables the analysis of change patterns (Table 2).After an iteration of quality checks and reprocessing, this quality could be achieved for every date and software, where a matching of images over the glacier tongue was possible.Using the RMSE, we found no significant difference between different software.However, DTMs from glacier monitoring flights with lower flying altitudes show smaller internal errors (RMSE avg = 1.10 m) than those from the general mapping flights (RMSE avg = 2.03 m).The ground resolution of the DTM reflects the finally calculated point density and also the quality of the filtered point cloud.ERDAS-IP yields a lower average resolution (5.67 m) than PhotoScan (3.33 m) and Pix4Dmapper (2.67 m).A similar number of GCPs were used per software, but more were required for ERDAS-IP with increasing number of input images.The number of automatically detected tie points, however, is two orders of magnitude higher for PhotoScan and Pix4Dmapper.For the year 1946 only PhotoScan was able to achieve sufficient tie point correspondence for successful image matching.With ERDAS-IP we were not successful in establishing a geometric model for aerotriangulation because of a lack of camera parameters, image delineation indications and a relatively low image quality.It is a clear advantage that SfM-MVS methods do not rely on this information.
When only few images with little overlap are available, both SfM-MVS software have difficulties with the image matching.For 2001 (eight images) and 08 September 1977, the RMSE of the ERDAS DTM is the smallest, the quality on stable terrain is comparable, and the reconstructed surface area is larger than that from PhotoScan and Pix4Dmapper.For 2005, Pix4Dmapper was not successful in image matching.These experiences confirm the robust performance of conventional photogrammetry for this typical type of application, where a DTM is produced from a limited set of input images.
We found little correlation between the number of input images and the RMSE, but a small positive correlation between the number of GCPs and the RMSE (Figure 3a,b).This is not surprising, since the number of images and thus the ground area increase with the number of GCPs, and, hence, automatically include a stronger horizontal and vertical distribution and thus higher elevation ranges.The Shannon entropy is a measure of image texture [14] and only shows a small, non-significant correlation with the RMSE; disregarding one outlier DTM (14 September 1977), the correlation becomes much stronger (R 2 = 0.55 (p ≤ 0.05); see Figure 3c,d).Image overlap is also expected to have an effect on DTM quality [3].We could see this in our results, where the glacier monitoring flights with an overlap of ~80% yield a mean RMSE of 1.30, while the national mapping flights with an overlap of ~60% yield a mean RMSE of 1.83.The high overlap of the former accommodates the SfM concept, while the mapping flights have an overlap/camera spacing accommodating conventional photogrammetry, with the parallax at the maximum and thus smaller inaccuracies in elevation calculations.The geometric model is constructed using a combination of several parameters.Different combinations of focal length, image size, and flying height lead to similar results.Unlike for conventional photogrammetry, in the SfM-MVS technique it is possible to apply different image parameters and interior camera orientation settings for every input image in order to achieve the best fitting function.Without providing any input information, we observed that both PhotoScan and the Pix4Dmapper assumed a focal length between 25-30 mm.This is different from the real focal length (commonly 115-150 mm, depending on the camera that was used), but is compensated by different assumptions on image size and flying height.

DTM Quality over Stable Terrain
Generally, there is high confidence in the models considering the stable terrain statistics (Table 3).The mean difference values lie between −0.39 m (ERDAS-IP) and 0.3 m (Pix4Dmapper), with an average of −0.04 m.Analogously, the standard deviation is 2.3 m (ERDAS-IP), 2.4 m (PhotoScan), and 1.3 m (Pix4Dmapper), with an average of 2.04 m.The range shows that, even with a higher resolution, the SfM-MVS-produced DTMs contain less noise than the ERDAS DTMs.Standard deviation is typically taken as DTM accuracy, and can in this case also help to select the best DTM per date.

DTM Intercomparison
A DTM difference was calculated between each DTM and the reference DTM, and the volume change over the glacier between the respective period is compared among the DTM sources.Summing up all periods yields a maximum difference of 15 m or 9.6% between the three software packages (Table 4).This is mainly caused by two DTMs, which show a clearly different signal than the others from the same date, PhotoScan 1961 and ERDAS 1988, which both have relatively high errors (see Table 3).Excluding them from the comparison reduces the total difference to 3% of the sum of periods.Average statistics are one way to compare the DTMs.Another important characteristic is whether the patterns of volume change are similar.This can be compared by analysing the differences between the glacier signal maps from the same period revealed by the DTMs from the different software packages.The general patterns are very similar in every DTM difference image (Figure 4).When looking at a smaller scale of ±10 m, certain characteristics became obvious: small artefacts of the DEM production, difficulties at lake surfaces and DTM edges (see Figure 5).A comparison of these DTMs revealed which DTM has problems in which areas.For example, there is a small bulge of several meters in the Photoscan 1983 DTM (see Figure 5b,c), or a relatively strong effect of >10 m at a proglacial lake surface.

Orthophotos
Orthophotos can be produced by rectifying the aerial images.Thus, their geometric quality depends on the underlying DTM.The horizontal accuracy of the orthophotos is on average below 1 m at the GCP locations but can be locally higher, e.g., in areas of steeper terrain.The pixel resolution of the orthophotos depends on the camera resolution and the flying height and is with 0.2-1 m commonly several times higher than that from the DTMs.We have not investigated the radiometric and geometric quality of the orthophotos in detail, but they undoubtedly contain further information that can be exploited.

Uncertainties
The smallest detectable changes are linked to the combined accuracy of the two DTMs at the start and end of a period and vary according to DTM quality.With the availability of a reference DTM, the standard deviation of values in the stable terrain can be used as an accuracy measure.For a period between two dates, the uncertainty of both DTMs needs to be combined using the square root of the sum of squared standard deviations.Doing this for the example period of 1961-2001 of the ERDAS DTMs, reveals a total uncertainty of σ 1961 2 + σ 2001 2 = ±1.94m while the elevation change signal over the glacier is 26.8 m.Relevant multi-annual or decadal variations of glacier elevation change (over the full glacier or only parts of it) are typically in the range of meters or more (e.g., [33]).Thus, the DTMs with such high spatial resolution and high accuracy are a very useful base for investigating the quantity of volume change and its patterns over time (see Figure 6).

Glacier Evolution
Over the time span of 64 years the glacier tongue showed an average elevation change of −67.0 ± 5.3 m (average lowering rate −1.1 ± 0.08 m/year) inside the overlapping area, with a maximum of approximately −137 m close to today's terminus.The trend is not temporally homogeneous but shows a period of strong negative change from 1946-1961, which is followed by over one decade of an average elevation increase (Figure 7a).Since 1988 the elevation change has become negative again.The most striking spatial patterns are locally emphasized elevation changes, which can be linked to the presence of supraglacial debris cover as well as ice cliffs and supraglacial flow channels.To exemplify the different periods and the effect of debris and ice cliffs, we selected three small areas (~0.01 km 2 ) of different surface cover types (clean ice, debris-covered ice, and an area with ice cliffs/flow channels), and followed their evolution over time (Figure 8).In most periods, the debris-covered area shows the smallest elevation change.The years of positive mass balances in the Alps [34] in the 1970s and 1980s resulted in an increase in ice mass that was transported downglacier.
It reached the cliff area (close to the terminus) later than the areas of the other surface cover types and with smaller intensity.

Discussion
Apart from 1988, there is no notable difference between DTMs from different software packages and the same date.The standard deviation serves as a measure of DTM accuracy (±0.8-5.5 m; average ±2 m), which is sufficient to detect glacier volume changes for periods from several years to decades, and to investigate patterns of spatial change.The high quality of metric cameras (high radial resolution, small lens distortion, see also [35]) is certainly a decisive factor in achieving high DTM quality, because it eases the self-calibration of lens distortion during the SfM-MVS process (this has also been pointed out by [3]).SfM-MVS seems to be weaker with a small number of input images and less image overlap, while this is the strength of conventional photogrammetry.The successful generation of the DTM for 1946 shows the advantage of SfM-MVS to cope with challenging input data even without the use of camera parameters or image properties.Multiple processing of the same images can deliver slightly different results, which has its roots in the random seeding processes of the matching algorithms.Thus, we believe that controlling the model quality during the process is important because adapting GCP selection and placement as well as considering different quality thresholds in the matching process can strongly improve the result.
Restituted lens parameters and camera positions are the product of a number of parameters from image size to flying height.They might not represent the reality (e.g., focal length) but may still lead to high-quality results because of the compensation of one parameter by another.Providing the fixed lens parameters consequently leads to "correct" dependent parameters (e.g., flying height).SfM-MVS has the ability to assume a different camera for each image, so that the combination of parameters yields the best triangulation results.This can also be an advantage for aerial imagery in case of varying image quality.In addition, using PhotoScan and Pix4Dmapper substantially reduces processing time (3-6 times smaller), especially with a larger number of input images.
Despite the mentioned advantages of SfM-MVS, it is important to highlight the similar requirements considering DTM production and quality control.In terms of processing time and DTM quality, it has proven to be efficient to establish a set of base GCPs to be used as input for all DTMs.Just like for conventional photogrammetry, good spatial distribution (horizontally and vertically) of the GCPs is crucial, while quality is more important than quantity in the challenging glacier surroundings.Our experience showed that after a number of 10-15 GCPs the additional benefit to DTM accuracy decreases, even though additional GCPs of high quality will always also increase DTM quality [36,37].For the DTMs in this study, 10-15 GCPs per DTM result in a GCP density per km 2 (and per ground sampling distance (GSD)) of 0.7-2 GCP/km 2 (0.2-9.0 × 10 −8 GCP/GSD), which coincides well with the results from other studies (e.g., [38,39]).Apart from the average GCP density and the total GCP number, the distance to the closest GCP also affects DTM accuracy.Several studies have investigated the effect of a small number of unevenly distributed GCPs on DTM accuracy (e.g., [38]) and found a substantial decrease of accuracy, but still in a range of ~1-3 m absolute accuracy.Gindraux et al. (2017) [38] and Tonkin and Midgley (2016) [40] found a decrease in accuracy of 0.09 and 0.1 m per 100 m distance increase from the next GCP, respectively.In our case, no reference information from the glacier surface was available.However, by combining the base GCPs with GCPs specifically adapted for each date, we achieved an appropriate horizontal and vertical GCP distribution around the glacier margins for all DTMs.Consequently, most pixels on the tongue of Zmuttgletscher are usually within a distance of 100-300 m of a GCP, while some smaller areas reach a maximum distance of 700-800 m.Using a decrease in accuracy of 0.1 m per 100 m distance results in uncertainties of 0.1-0.8m, which is still smaller than uncertainties estimated from the stable terrain comparison.Due to the appropriate GCP distribution and the similar characteristics of stable terrain and the glacier surface, we also expect the error to be on average in the same range for both areas.
We found that the lowest DTM resolution without gross artefacts on the glacier surface is lower for PhotoScan and Pix4Dmapper than for ERDAS-IP.This is likely linked to the multi-vision algorithms that result in a much higher point cloud density (e.g., [41]) which can be an advantage when investigating patterns or smaller features (e.g., moraine breaches, crevasses, thermokarst).
Our results confirm findings by others, such as the ability of SfM-MVS to establish a "dynamic" relation between different parameters required to set up a geometric model [3], or the necessity of abundant and high-quality GCP input, and a final quality control [3,12].The application of SfM-MVS for glaciological purposes is increasing.The technique has been used for a variety of questions, like the characterization of surface features such as ponds and ice cliffs [5], the investigation of calving dynamics by extracting surface ice flow from repeat flight campaigns [42], or the study of supraglacial drainage [6].Piermattei et al. (2016) [43] have also applied it to derive surface elevation changes over an entire glacier, thus determining the mass balance with the geodetic method by using UAV photography.Our study showed that SfM-MVS is well suited to derive geodetic mass balances by also using aerial images because quality and accuracy of the derived DTMs are comparable to modern photogrammetrical DTMs.This shows an opportunity to calculate elevation changes and produce geodetic mass balances for many glaciers, and also to extend existing time series further back.Originally, aerial images were often used for mapping purposes.Using the original images instead of information from the derived maps leads to denser and more precise elevation data, and makes it also possible to produce orthophotos that can be used for other glaciological purposes (e.g., the long-term change of debris-cover on Zmuttgletscher).
By comparing the output of two techniques and three of the currently most popular software packages in the geosciences, we were able to get a good idea of the precision of the resulting DTMs.Not only is the quality within the SfM technique high and, above all, robust, but the differences to the results from conventional photogrammetry are also robust and small.The uncertainty ranges are similar to the ones from the reference DTM, demonstrating that the SfM technology (namely PhotoScan and Pix4Dmapper) is mature enough to be used for scientific (glaciological) purposes under the prerequisites of high GCP quality and sound quality control.SfM-MVS technology might even be preferential to conventional photogrammetry (namely ERDAS-IP), which shows similar quality but lower resolution and longer processing times.
The analysis of the resulting time series of elevation changes over the tongue of Zmuttgletscher reveals an average lowering of approximately 67 m.The change is neither spatially homogeneous nor strongly correlated to absolute elevation, but is rather governed by an interplay of ice dynamics and debris cover, which will be the topic of further investigations.

Conclusions
High-quality DTMs can be achieved by applying SfM-MVS to a small set of aerial images.Their accuracy is comparable to that from DTMs resulting from conventional photogrammetry, but thorough quality control of the results, potentially adapting settings and input data, and reprocessing the data, are inevitable.Therefore, we conclude that the automatic production of geodetic glacier mass balances are still some way ahead.It will, however, become considerably faster for more researchers to produce DTMs over glacial areas that currently lack mass balance data, thus strongly increasing information on decadal glacier changes.We showed this potential with the example of a 64-year time series of elevation changes over the tongue of Zmuttgletscher, revealing rates of change as well as spatial change patterns over several time periods.Additionally, we can imagine a number of other glaciological applications, like methodological investigations of glaciers with an existing glaciological mass balance monitoring programme, or the extraction of information from high-resolution orthophotos.Because of the ease-of-use and the ongoing algorithm improvements, we expect to see more studies applying SfM-MVS to archival aerial imagery, investigating glaciological and other problems in high-mountain areas.

Figure 1 .
Figure 1.The study site, showing a subset of base GCPs used for most DTMs, as well as the rough footprint of the aerial images from 14 September 1977 and 07 September 1988, which are also representative for the other glacier and mapping campaigns, respectively.

Figure 2 .
Figure 2. Example of a color-coded DTM differencing map for the period 1961-2010.The 1961 DTM was generated with Pix4Dmapper.

Figure 3 .
Figure 3. Parameters potentially influencing the DTM Root mean square error: the number of input images (a); the number of GCPs (b); and image texture/entropy (c); excluding one outlier (d).
Figure 5. Internal deviations of the three DTMs from 1983, thereby revealing some problems with model artefacts, steep walls and lake surfaces: Pix4Dmapper vs. ERDAS (a); ERDAS vs. Photoscan (b); and Pix4D vs. Photoscan (c).Note the different classifications between Figures 4 and 5.

Figure 6 .
Figure 6.Example DTM difference between the ERDAS DTMs from 1961 and 2001.It shows a pattern of volume changes that is heterogeneous.The biggest changes happen in the area of ice cliffs and lakes and where debris cover is absent or sparse (compare with Figure 1).

Figure 8 .
Figure 8. Elevation change for each period (a) and cumulative (b) over the tongue (black line) and selected areas with different surface cover types.

Table 1 .
[14]al imagery input data.The Shannon entropy is a measure of image texture[14].

Table 2 .
Resulting DTMs for each software.

Table 3 .
Stable terrain statistics for each DTM.The standard deviation serves as accuracy measure.

Table 4 .
Mean elevation difference (dH) of the overlapping glacier areas from the DTM difference between the reference DTM and the DTM of the respective date and software."Total all" shows the sum of dH values where all three software packages yield results. "Total selected" sums up only the DTMs with sufficient quality for further assessments.