1. Introduction
Alpine lakes are lakes located at high altitudes in mountainous zones [1] and include glacial and nonglacial lakes. Glacial lakes are water bodies influenced by the presence of glaciers. They include ice-contact lakes, which are next to glacier ice, and distal lakes, which are distant from the originating glaciers or ice sheets but are still influenced by them [2]. Glacier recession regulates the formation of and changes in glacial lakes, especially in glacier-rich regions such as the Himalayas [3,4,5]. These changes are often considered one of the most visible signs of global warming [6] and increase the risk of glacial lake outburst floods [7,8]. These outbursts can release a large mass of water and sediment in a short time [9], representing a serious hazard to downstream human life, property, and ecosystems [10].
Satellite data provide a more effective approach for surveying alpine lakes in large regions because alpine environments are remote and difficult to visit. However, visual interpretation has been the common approach for identifying alpine lakes in satellite images due to the lack of automated approaches for accurately recognizing small alpine lakes [11]. Because of the amount of labor and time involved, manual visual interpretation is not suitable for covering large areas.
Water-sensitive indexes, such as the Normalized Difference Water Index (NDWI) [12], Modified Normalized Difference Water Index (MNDWI) [13], and Automated Water Extraction Index (AWEI) [14], have been shown to be effective for detecting water in multispectral satellite data. Water indexes have also been used for alpine lake identification in countries such as Nepal [15]. However, because of the complex terrain and weather conditions in alpine environments, water indexes can result in considerable errors when used for alpine lake identification [16]. Combining water indexes with other datasets has been found to improve alpine lake identification. For example, topographic features derived from digital elevation models (DEMs) were found helpful for separating mountain shadows from glacial lakes [17]. It is also common to combine water indexes with different segmentation methods to improve the accuracy of alpine lake identification. For example, Non-local Active Contour (NLAC) has been used to deal with regional image heterogeneity [13].
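As a minimal illustration of the indexes cited above, the sketch below computes NDWI and MNDWI as normalized differences of green reflectance with near-infrared and shortwave-infrared reflectance, respectively. The function names, the sample reflectance values, and the zero threshold are illustrative assumptions and are not taken from this study.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), using 0 where the denominator is 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndwi(green, nir):
    """NDWI: (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)

def mndwi(green, swir):
    """MNDWI: (Green - SWIR) / (Green + SWIR)."""
    return normalized_difference(green, swir)

# Hypothetical 2 x 2 reflectance patches (not real data).
green = np.array([[0.10, 0.08], [0.30, 0.00]])
nir = np.array([[0.04, 0.30], [0.20, 0.00]])
swir = np.array([[0.02, 0.24], [0.10, 0.00]])

# Illustrative threshold of 0; in practice the threshold is scene dependent.
water_mask = mndwi(green, swir) > 0.0
print(ndwi(green, nir))
print(water_mask)
```

Because both indexes are purely spectral, a threshold of this kind cannot by itself separate lakes from rivers or from dark shadowed slopes, which is the limitation that the additional datasets and segmentation methods are meant to address.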
Cloud contamination can severely limit the use of optical images in mountainous regions for lake identification. Synthetic Aperture Radar (SAR) observations can penetrate clouds and are sensitive to water. Therefore, SAR observations have been adopted for water mapping and monitoring, including for alpine lakes [18,19]. Despite their all-weather capability [20], SAR observations also present some disadvantages for water detection, such as inherent topography-induced effects and speckle noise [19].
A detailed classification system of glacial lakes was proposed based on their formation mechanism, topographic features, and geographical location [21], including glacial erosion lakes, moraine-dammed lakes, ice-blocked lakes, supraglacial lakes, subglacial lakes, and other glacial lakes. Methods have been developed to identify alpine lakes and their types using satellite images by examining certain conditions such as distance to glaciers [22], minimum elevation [23], or whether the lakes are located in a glacial development area [24]. However, these simple conditions are usually insufficient for accurately identifying alpine lakes. Random forest or other machine learning algorithms have been used for establishing more sophisticated models to distinguish between glacial and nonglacial lakes by considering more features of lakes [25]. Glacial lakes are usually dammed by debris from glacial movement, and these dams are very fragile but distinguishable. Therefore, considering the structure and environment of an alpine lake could improve the ability to identify glacial or nonglacial lakes. However, these features usually only consider the water instead of the surrounding environment. Deep learning presents a stronger learning ability by considering more features of the targets and their environments, providing a promising approach for detecting alpine lakes and their types.
Deep learning algorithms, represented by Convolutional Neural Networks, have evolved rapidly in recent years. Compared to the pixel-based traditional remote sensing analytics, deep learning has demonstrated a strong spatial feature extraction ability by establishing relationships between pixels. The potential of deep learning in water identification, especially glacial lake identification, has been explored. Using UNet and very high-resolution satellite (VHRS) imagery, more than 5000 water bodies were identified in the Hindu Kush, Karakoram, and Himalayas (HKKH) regions, which is much higher than the number in existing inventories [26]. By improving and optimizing the UNet model, the established algorithms can extract glacial lakes more effectively [27,28,29]. However, UNet is an early deep learning network with limited performance. Advanced deep learning networks that have been developed based on more effective architectures have demonstrated improved identification accuracy compared with UNet [30,31]. Despite the ability of deep learning networks, limited sample data lead to inadequate model training and result in false positive errors. However, there are alpine lake datasets available that can be used as training data for deep learning networks. Therefore, the current focus of using deep learning to identify glacial lakes should take full advantage of the existing networks and data.
This paper presents an automatic method for the identification of glacial lakes based on deep learning combined with multi-source satellite data to overcome the limitations of field investigations and manual visual interpretation. In addition to detecting alpine lakes and their outlines, the method is capable of distinguishing between glacial and nonglacial lakes by extracting the characteristics of their surrounding environments. The advantages of multi-source satellite data (true color images, water indexes, SAR, and DEMs) were integrated to enhance the performance of the identification model. A high-quality deep learning label set was created containing typical glacial lakes and various other objects in the Eastern Himalayas. Knowing the type of alpine lakes could greatly improve our knowledge of these ecosystems as well as provide critical information for assessing potential hazards created by these lakes.
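To make the idea of multi-source input concrete, the following sketch stacks co-registered layers (RGB, MNDWI, SAR VV, and a DEM-derived relief layer) into a single multi-channel array of the kind a segmentation network could consume. It is only a sketch under stated assumptions: the per-band min-max scaling, the channel order, the function names, and the random sample arrays are hypothetical and do not describe the preprocessing actually used in this study.

```python
import numpy as np

def scale01(band):
    """Min-max scale one band to [0, 1]; a constant band maps to 0."""
    band = np.asarray(band, dtype=np.float32)
    lo, hi = float(band.min()), float(band.max())
    if hi == lo:
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)

def stack_inputs(rgb, mndwi, sar_vv, relief):
    """Stack RGB (H, W, 3) with three single-band layers into (H, W, 6)."""
    rgb = np.asarray(rgb)
    h, w, _ = rgb.shape
    layers = [scale01(rgb[..., i]) for i in range(3)]
    for band in (mndwi, sar_vv, relief):
        band = np.asarray(band)
        if band.shape != (h, w):
            raise ValueError("all layers must share the same grid")
        layers.append(scale01(band))
    return np.stack(layers, axis=-1)

# Hypothetical co-registered 4 x 5 tiles filled with random values.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3))
mndwi = rng.uniform(-1.0, 1.0, size=(4, 5))
sar_vv = rng.uniform(-25.0, 0.0, size=(4, 5))   # backscatter in dB
relief = rng.uniform(0.0, 500.0, size=(4, 5))   # metres

x = stack_inputs(rgb, mndwi, sar_vv, relief)
print(x.shape)
```

Scaling each band separately keeps layers with very different units (digital numbers, index values, decibels, metres) on a comparable range before they are presented to a network.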
5. Discussion
The combination of high-resolution satellite data and deep learning can accurately capture the distribution and characteristics of alpine lakes, particularly in the case of glacial lakes [31]. We evaluated the effectiveness of the presented method in identifying alpine lakes with multi-source data and its performance in distinguishing types of alpine lakes (glacial and nonglacial lakes). The results suggest that multi-source satellite data, especially SAR and optical data, can greatly improve the detection rate and reduce false positives in alpine lake identification. The validation loss curves fluctuated downwards, but the variation decreased as more input variables were added, suggesting that the increase in information reduced the uncertainty of the segmentation (Figure 7b).
The segmentation accuracy was greatly affected by quality issues due to clouds, shadows, and other anomalies when only RGB was used as the input. These issues usually led to false positives (Figure 9a). Our experiment suggested that MNDWI could considerably improve water detection, but it is also prone to shadow effects (Figure 9b). Optical satellite data cannot penetrate clouds, leading to an underestimate of the number of lakes (Figure 9c); however, SAR data (VV) can overcome this limitation of optical data (Figure 9d). SAR data, however, are not sensitive to shallow waters. Relief data can help correct the false positives caused by terrain shadows (Figure 9e).
The Sentinel satellite data were effective in identifying alpine lakes; however, the 10-m resolution data struggled with small moraine thaw lakes, which are hard to identify even through visual interpretation. Errors and inconsistencies in visual interpretation could be propagated into the model and affect the performance of segmentation and classification. Satellite data with higher resolutions could improve the ability to identify alpine lakes, particularly small ones. High-resolution images could also help improve the quality of interpretation by producing high-quality training data, which are key to the success of mapping alpine lakes in extreme mountainous environments.
Both deep learning models with a single RGB input (DL-RGB) and multiple inputs (DL-Multisource) produced significantly higher accuracy than the traditional approach of applying a threshold to the satellite-derived water index (MNDWI), suggesting that a deep learning method performs better for water detection (Table 7).
The MNDWI method captured most lakes in the test region, demonstrating its effectiveness in detecting water. However, the method produced many false positives, including a large number of rivers. This was caused by a limitation of the spectral algorithm, which is not sensitive to the spatial characteristics of water bodies. The deep learning algorithm can integrate the spectral and spatial characteristics of water bodies to achieve better identification. The DL-RGB method produced fewer false positives than the MNDWI method but was conservative in lake detection, capturing the fewest lakes among the three methods. The DL-Multisource method detected slightly fewer lakes than the MNDWI method, with considerably fewer false positives than the other two methods, indicating that deep learning with multi-source inputs performed the best among the three methods in terms of both water detection and misidentification.
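The comparison above counts detected lakes and false positives as objects rather than pixels. One way such counts could be obtained is sketched below: connected water objects are labelled in a predicted mask and in a reference mask, and each object is scored by whether it overlaps an object in the other mask. The overlap rule, the function names, and the toy masks are assumptions made for illustration and do not reproduce the evaluation protocol of this study.

```python
from collections import deque

import numpy as np

def label_components(mask):
    """Label 4-connected objects in a boolean mask; returns (labels, count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c]:
                continue
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count

def object_scores(predicted, reference):
    """Count reference lakes that were hit and predicted objects that hit none."""
    predicted = np.asarray(predicted, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    ref_labels, n_ref = label_components(reference)
    pred_labels, n_pred = label_components(predicted)
    detected = set(np.unique(ref_labels[predicted]).tolist()) - {0}
    matched = set(np.unique(pred_labels[reference]).tolist()) - {0}
    return {
        "reference": n_ref,
        "detected": len(detected),
        "false_positives": n_pred - len(matched),
    }

# Toy masks: two reference lakes; the prediction finds one lake and one
# river-like strip along the bottom row.
reference = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])
predicted = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
])
scores = object_scores(predicted, reference)
print(scores)
```

In this toy case the river-like strip counts as one false positive regardless of its length, which is why object-level counts can differ markedly from pixel-level accuracy.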
In this paper, we compiled an inventory of alpine lakes during 2016–2020 for the Eastern Himalayas, identifying 4584 lakes, including 2795 glacial lakes. We extracted the portions of two published datasets that cover the region and compared our inventory with them: the glacial lakes in the Third Pole Environment (TPE) (V1.0) (2010) [39] and the inventory data of glacial lake in western China (2015) [40]. The two datasets reported 501 and 533 lakes, less than 20% of the lakes identified by our inventory (Table 8). Although the two datasets were produced for different years (2010 and 2015), the slow temporal dynamics of alpine lakes are unlikely to cause significant differences between the datasets. Both datasets were produced from visual interpretation of 30-m resolution Landsat images. Examples showed glacial lakes that clearly existed between 2009 and 2018 but were missed in both datasets (Figure 10a). The coarser resolution of Landsat data could have contributed to the omission of small lakes, especially glacial thaw lakes (Figure 10b), and to the lower number of reported lakes compared to our inventory, which was produced from Sentinel data.
Our inventory captured nearly all glacial lakes identified by the two reference datasets, suggesting low commission errors with our inventory compared to the two datasets. However, only 120 glacial lakes were reported by both datasets (Figure 10c). The lack of consistency between them is likely due to differences in the methods and guidelines adopted by the two datasets. The comparison also signals the urgency of mapping glacial lakes at finer scales with higher-resolution satellite data to fill the gaps in our understanding of glacial lakes, especially those that are small in size but large in number.
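The proportions quoted in this comparison can be checked with a few lines of arithmetic using only the counts stated in the text. The Jaccard index computed at the end is an illustrative agreement measure introduced here and is not a statistic reported in this study; the reference counts are expressed as a share of both the glacial and the total lake counts because the text does not specify which denominator was used.

```python
# Counts stated in the text.
ours_total = 4584               # all alpine lakes in our inventory
ours_glacial = 2795             # glacial lakes in our inventory
reference_counts = (501, 533)   # lakes reported by the two reference datasets
in_both = 120                   # glacial lakes reported by both reference datasets

# Share of our inventory represented by each reference dataset.
share_of_glacial = [n / ours_glacial for n in reference_counts]
share_of_total = [n / ours_total for n in reference_counts]

# Jaccard index between the two reference datasets (illustrative only):
# lakes in both, divided by lakes in at least one.
union = sum(reference_counts) - in_both
jaccard = in_both / union

for n, g, t in zip(reference_counts, share_of_glacial, share_of_total):
    print(f"{n} lakes: {g:.1%} of our glacial lakes, {t:.1%} of all our lakes")
print(f"agreement between reference datasets (Jaccard): {jaccard:.1%}")
```

Under either denominator both reference datasets fall below the 20% mark, and the low agreement between the two reference datasets is consistent with the inconsistency noted above.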
6. Conclusions
In this paper, we presented an automatic method for the identification of glacial lakes based on deep learning and multi-source satellite data, including optical and SAR data. Compared to traditional spectral-based water detection methods, the deep learning-based methods considerably improved water detection, with significantly reduced overestimation. The inclusion of a water index (MNDWI), a SAR band (Sentinel-1 VV), and a terrain band (Relief), in addition to RGB images, as inputs for the deep learning method further improved the model performance. Although the transferability of the model has been evaluated in the Eastern Himalayas, the model may not perform well in regions outside of the Himalayas due to limited representation in the training data. Incorporating training data from other regions would further improve the model's ability to identify alpine lakes worldwide.
An alpine lake inventory consisting of 2795 glacial and 1789 nonglacial lakes was compiled for the Eastern Himalayas; the number of glacial lakes is about five times that reported in either of the two existing datasets. The inventory unveiled a large number of glacial lakes that were missed by the existing datasets, especially small glacial thaw lakes, indicating considerable knowledge gaps. The combination of deep learning and multi-source high-resolution satellite data demonstrated great potential for mapping small alpine lakes in extreme environments in the Himalayas and other parts of the world, such as Greenland and Antarctica. The results will provide critical information for understanding these ecosystems and for early warning of glacial lake outburst floods.