Wet and Dry Snow Detection Using Sentinel-1 SAR Data for Mountainous Areas with a Machine Learning Technique

: Traditional studies on mapping wet snow cover extent (SCE) often feature limitations, especially in vegetated and mountainous areas. The aim of this study is to propose a new total and wet SCE mapping strategy based on freely accessible spaceborne synthetic aperture radar (SAR) data. The approach is transferable on a global scale as well as for di ﬀ erent land cover types (including densely vegetated forest and agricultural regions), and is based on the use of backscattering coe ﬃ cient, interferometric SAR coherence, and polarimetric parameters. Furthermore, four topographical factors were included in the simple tuning of random forest-based land cover type-dependent classiﬁcation strategy. Results showed the classiﬁcation accuracy was above 0.75, with an F-measure higher than 0.70, in all ﬁve selected regions of interest located around globally distributed mountain ranges. Whilst excluding forest-type land cover classes, the accuracy and F-measure increases to 0.80 and 0.75. In cross-location model set, the accuracy can also be maintained at 0.80 with non-forest accuracy up to 0.85. It has been found that the elevation and polarimetric parameters are the most critical factors, and that the quality of land cover information would also a ﬀ ect the subsequent mapping reliability. In conclusion, through comprehensive validation using optical satellite and in-situ data, our land cover-dependent total SCE mapping approach has been conﬁrmed to be robustly applicable, and the holistic SCE map for di ﬀ erent months were eventually derived.


Introduction
Snow cover is an important parameter in the context of water availability, the global radiation balance, and natural disasters such as floods and avalanches. It is also a crucial resource for tourism and hydropower generation [1][2][3][4]. Climate change has been affecting snow cover extent (SCE), its amount, and its distribution globally, generally leading to a decline of the aforementioned parameters [5][6][7][8]. This was also stated in the Synthesis Report of Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) [9]. The latest IPCC special report of 2018 reveals severe potential risks due to more extreme SCE changes and melting caused by global warming, which would affect winter tourism and hydrology [10]. As summarized by Beniston et al. [5] and Brown et al. [7], because of the rising temperature as well as changes in atmospheric circulation patterns, snow depth (SD) and snow cover duration (SCD) have exhibited a decreasing trend in the past decades. Additionally, based on climate model simulations and the IPCC special report [6], a declining trend of SCE must be expected, with an even more severe impact on mountain regions [5,11].

SAR Data
As this study aims at using openly accessible data sources, Sentinel-1 imagery provided by the European Space Agency (ESA) were employed. The conventional interferometric wide swath (IW) acquisition mode was used in this study. Both level 1 Single Look Complex (SLC) and Ground Range Detected (GRD) products were used, with the latter merging all sub-swath and de-bursting.
To validate our algorithm across different years and months/seasons, a total of three datasets (including one for the first hydrological year and two for the second hydrological year) were selected for each study area (details are shown in Figure 2 and Table 2). In order to represent snow cover conditions as well as to maximize the similarity of period sample used in each region, the months of November, February and May of the first year were selected in each region. However, for Mount Whitney (MW) and Landtang Lirung (LL) sites, the months of February, April, and May were chosen instead. This is due to the fact that in the MW and LL regions, the imagery was stable only after February 2017, as a result of a much rarer revisiting frequency of Sentinel-1, according to their orbit design, in comparison to European regions. Regarding Aconcagua (AG), the only site located in the southern hemisphere, has opposite season, the corresponding months samples were used. Two different months of imagery were used for external validation in examining the month-dependency of the established model set.

SAR Data
As this study aims at using openly accessible data sources, Sentinel-1 imagery provided by the European Space Agency (ESA) were employed. The conventional interferometric wide swath (IW) acquisition mode was used in this study. Both level 1 Single Look Complex (SLC) and Ground Range Detected (GRD) products were used, with the latter merging all sub-swath and de-bursting.
To validate our algorithm across different years and months/seasons, a total of three datasets (including one for the first hydrological year and two for the second hydrological year) were selected for each study area (details are shown in Figure 2 and Table 2). In order to represent snow cover conditions as well as to maximize the similarity of period sample used in each region, the months of November, February and May of the first year were selected in each region. However, for Mount Whitney (MW) and Landtang Lirung (LL) sites, the months of February, April, and May were chosen instead. This is due to the fact that in the MW and LL regions, the imagery was stable only after February 2017, as a result of a much rarer revisiting frequency of Sentinel-1, according to their orbit design, in comparison to European regions. Regarding Aconcagua (AG), the only site located in the southern hemisphere, has opposite season, the corresponding months samples were used. Two different months of imagery were used for external validation in examining the month-dependency of the established model set.  Table 2. Summary of the SAR data (Sentinel-1, S1) and optical data (Sentinel-2, S2; Landsat-7/8, L7/8) used for training and validation set of this study. Note: reference image of each region for calculating wet snow cover extent (SCE) is marked with an asterisk (*); optical images employed in validation set are included in brackets after the used SAR image dates.

Auxiliary Data
In addition to SAR-derived information, it is also critical to include both land cover as well as topographical information for building a model for the total SCE classification. Considering the global availability, accessibility and data sustainability, we relied on the version 4 of the digital elevation model (DEM) based on National Aeronautics and Space Administration (NASA)'s Shuttle Radar Topographic Mission (SRTM), as well as the land cover data provided by ESA Climate Change Initiative (CCI) product version 2.0.7. SRTM DEM was used to derive elevation, slope, and aspect and curvature topographical information.
Even though the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) DEM has 30 m spatial resolution and global coverage, which is better when compared to the SRTMv4 DEM (around 80% global coverage and 90 m spatial resolution), SRTM was chosen as a source for the DEM as it provides higher vertical accuracy [35][36][37][38]. For land cover data, we selected the ESA CCI land cover product because CCI not only provides a global coverage with 300 m resolution and quality flags denoting the number of valid observations used for classification, but has also been annually updated since 1992, with the latest update in 2015.
The ESA CCI land cover product typology follows the land cover classification system (LCCS) defined by the United Nations (UN) Food and Agriculture Organization (FAO) [39]. It originally categorizes 37 different land cover types including some detailed distinguishments, such as different ratio of mosaic tree and shrub with herbaceous cover; hence all original 37 classes were re-categorized to 16 classes for our study (definition shown in Appendix A), to maintain class differences while avoiding complexity.
To train and validate our models, DLR's Global SnowPack [14] was employed as it provides daily full globe SCE in 500 m spatial resolution without the influence of cloud coverage or polar darkness. Moreover, its daily SCE availability allows for a same-day training and validation when the Sentinel-1 scenes are acquired.

SAR Imagery Processing
The processing of SAR-based observations requires several steps in order to support the three mainstream SAR-based SCE detecting approaches comprising backscattering, InSAR, and PolSAR-based methods.
For backscattering-based information, each GRD SAR image was subset to the extent of the study region. Each image was calibrated, had its thermal noise removed, and underwent speckle filtering with Lee sigma filter [40] (7 × 7 window size). It was then corrected for terrain flattening and distortion with the SRTM DEM, and converted to decibel (dB) units. Finally, each season's SAR dB dataset was set in relation to the summertime's reference dB image (i.e., theoretically bare/snow-free ground) ( Table 2). It is important that, as the quantitative analysis is involved (ratio), the original intensity value or digital numbers (DN) of GRD images need to be calibrated. This is to represent the genuine brightness value based on the incidence angle information [41].
Two SLC images were co-registered for InSAR processing to estimate the coherence value. As IW mode Sentinel-1 imagery is sensed at TOPS mode, it is necessary to respectively split each sub-swath and merge each burst, before and after co-registration. The coherence map was then terrain corrected to remove distortions. For InSAR, the temporal baseline between master and slave images should be shortened to reduce the magnitude of temporal variability that may affect the coherence value considerably [42]. Thus, the temporally nearest SAR imagery of a selected master scene was used as slave image.
Finally, for polarimetric information, each SLC image was calibrated, debursted, and the polarimetric matrix was constructed. As the employed Sentinel-1 IW mode imagery only provides dual polarizations (VV and VH polarizations in all five study regions), the incoherent polarimetric C2 (2 × 2) complex covariance matrix formed by co-and cross-polarized complex scattering amplitudes was used [43,44]. The refined Lee filter [45], with 7 × 7 window size, was then applied to de-speckle, and a decomposition was performed before the final terrain correction. In the presented study, the H/A/α decomposition proposed by Cloude and Pottier [27] was selected due to the availability of dual-polarizations as well as the intensive employment frequency and its viability to detect SCE [46][47][48][49]. Under the commonly used assumption of reciprocal targets (i.e., scattering amplitude of VH and HV are the same) [44], H/A/α decomposition utilizes the eigenvalues and eigenvectors calculated from the coherency matrix, which then composes the H/A/α, representing the entropy (randomness of depolarization of scattering type), anisotropy (normalized difference of importance of scattering type), and scattering angle (dominant scattering angle and type). The comprehensive descriptions of decomposition can refer to [27,44].

Random Forest Classification
As the aim of this study is to detect snow cover on the land surface of various regions around the globe, a simple tuning classifier which is capable of efficiently solving a two-class classification problem is necessary. According to the Global Observing System for Climate (GCOS) [50], a spatial resolution of at least 100 m is necessary for snow cover mapping in mountain regions. Therefore, it was set as the output of the implemented model setup. In this study, the random forest (RF) [51] was chosen to classify snow cover based on SAR observations. Compared to other supervised classifiers, RF has multiple advantages, including a relatively short training time, low computation load, capability to handle high dimensional data with missing values, support of parallel processing, simple tuning of parameters, avoiding of over-fitting, providing importance indexes of input variables, and providing an internal quality index (out-of-bag (OOB) error) [51][52][53][54][55][56][57]. Due to these advantages, RF has been widely applied to many remote sensing-based classification topics [32,[57][58][59][60][61][62][63][64]. However, it is still surprising that, so far, studies relying on RF classifiers for SCE mapping are very rare.
In the present study, the two required tuning parameters for saving memory consumption and computation time while yielding the highest classification performance after several test runs, that is, the parameters of the number of trees L and number of random features assigned to each tree θ k , are respectively set to 600 and the default value (square root of total feature numbers). Although several studies demonstrated that L has less influence than θ k on the classification accuracy [57,65,66], different L values were tested (ranging from 50 to 1000) in Monte Rosa (MR) region and the resultant classification accuracy and processing time are presented in Appendix A. It was found that the RF model with 600 trees can efficiently achieve on optimized accuracy. This finding agrees with previous studies suggesting that a tree number of around 500 is generally sufficient [67,68].
The RF models were established separately for each land cover class. Namely, each modelling trial consists of N RF models, with N referring to the number of land cover classes in the study region. To test the influence of the three SAR observations and the four topographical factors on the RF performance, they were combined in varying setups. Firstly, uni-, dual-, and three SAR observation-combined based models were tested. Then, the highest accuracy model was added with single and all topographical information. Eventually, the best observation-factor combination model was then applied in two individual RF model scenarios. In the first case, each region's first year's data was used to build the model for each region individually, which were then correspondingly applied to the region's second year's data. Consequently, in the second case, the input data of all test regions were merged from the first year's data to build one aggregated model, which was applied to the second year.

Data Splitting for Model Training and Validation, Calculation of Accuracy Evaluation Measurements
To establish the RF model as well as to achieve a comprehensive validation, in the presented study many different sources of data were used, including GSP, optical-based SCE, and snow depth data of meteorological stations. The overall data splitting for each region's land cover class's model training and validation is illustrated in Figure 3.
First, to train the RF model, 70% of the pixels of the first year's data (including SAR-based observations and topographical factors mentioned in 3.2.2) were randomly selected and facilitated to build the model. The established model was then internally validated with 30% of the first year's GSP data, which were manually omitted from building the model. It must be noted that the OOB samples, which account for around 37% of input data, which were automatically not employed in the training sample when building the RF model and assigning features to the model trees, were not used for cross-validation in the presented study. The coarse resolution of GSP (500 m) would inevitably lead to spatial autocorrelation between training and OOB samples. Namely, the value between two sample sets are dependent, which could lead to overestimation of OOB accuracy [67].
Then the established RF model (only trained by first year's GSP data) was used to predict the SCE of the second year, which was then validated with additional external validation sources, including the second year's GSP (two sets of data, as shown in Table 2), Landsat/Sentinel-2-based SCE, and meteorological data. When comparing the different model inputs and setups, one of the most critical steps is to evaluate the specific model performances. Hence, an accuracy assessment was introduced, including overall accuracy, F-measure [69], and "Area Under the Receiver Operating Characteristic Curve (ROC)" (AUC) [70]. These indexes provide different evaluation aspects [71,72] as overall accuracy. F-measure focuses on minimizing the number of errors based on thresholding and qualitative evaluation. In contrast, AUC's purpose is to show how accurate the classes are separated, by checking how the model ranks the examples. Moreover, as it is commonly known that forest-covered areas could produce erroneous results (based on SAR data snow cover classifications [47,[73][74][75], the abovementioned accuracy measurements were also used to calculate non-forested regions separately (the criteria of forest type listed as Appendix A) in order to examine the influence of vegetation.
Considering the overall accuracy, F-measure, and AUC, all of them can be derived from the confusion matrix. In this study, overall accuracy is defined as: Overall Accuracy = pixels within true total SCE classi f ied as total SCE + pixels in snow f ree classi f ied as snow f ree all pixels (1) The F-measure score (a.k.a. F1 score), is a composed weighted measurement which considers both recall as well as precision. In this study, these accuracy indices were calculated as following: Precision = pixels within true total SCE classi f ied as total SCE pixels classi f ed as total SCE (2) Recall = pixels within true total SCE classi f ied as total SCE pixels within true total SCE (3) The last quality index that is calculated to evaluate the performance of the model is the AUC score, which is based on ROC. The ROC curve is plotted with false positive rate and true positive rate on the xand y-axis, respectively. The specificity can be defined as: pixel not in true total SCE classi f ied as not total SCE pixel not in true total SCE Remote Sens. 2019, 11, 895 9 of 29 ROC can be interpreted as the probability of ranking randomly selected actual positive samples over randomly selected negative samples. AUC is defined as the area between the ROC and y = x function, which has a value range from 0.5 to 1.0. According to Metz [76], AUC values range from 0.5 to 0.6, 0.6 to 0.7, 0.7 to 0.8, 0.8 to 0.9, and 0.9 to 1.0, representing bad, satisfactory, good, very good, and excellent model performance, respectively.

Tests for Selecting the Optimized Model Input Variables
As described in Section 3.2.2, the possible inputs for the implemented model to detect total SCE include SAR-based observations (backscattering ratio, InSAR coherence and PolSAR H/A/α parameters, abbreviated as B, I and P), topographical parameters (elevation, aspect, slope, and curvature, abbreviated as e, a, s, and c), and land cover information originating from the CCI land cover product. To select the optimized input combination for the model, Monte Rosa (MR) in the Swiss Alps was chosen as a pilot study site for an in-depth accuracy assessment, as it is characterized by complex topography and land cover.
To test the influence of the SAR parameters on the model performance, different combinations of SAR-based observations were employed to train MR's model set. To assess their uncertainties, 20 iterations were processed for each trial. Resultant accuracy assessments are illustrated in Figure 4a. It was obvious that for all land cover classes, the combination of all available SAR parameters (B + I + P) produced the highest accuracy. When relying only on one SAR-based observation, using P leaded to higher classification accuracy than relying on I or B. The accuracy improvement between using only single SAR parameters and the combination of all available parameters (B + I + P) was significant. Moreover, it was observed that for land cover classes that pose considerable challenges in detecting snow cover using SAR (e.g., forested areas), combining all available SAR parameters leaded to a significant improvement of classification accuracy.
For evaluating the influence of topographical parameters on the model performance, elevation (e), aspect (a), slope (s), and curvature (c) were added as input parameters to the ML approach separately. Their influence on the classification accuracy is presented in Figure 4b. Results showed that by successively adding more topographical parameters the performance of the model increased, leading to its highest performance if all parameters are added collectively ("asce" case in Figure 4b).
The overall accuracy for all land cover classes could be improved by around 8.4%, while the F1 score improved by 9.4% when compared to using only the combination of B + I + P as inputs. Even in non-forested areas, the accuracy (non-forest classes' overall accuracy (NF OA)) raised by around 7.7%. Additionally, the AUC value could be improved by more than 12%. Of all tested topographical parameters, elevation had the biggest impact on the model performance. For example, using only B + I + P to classify snow cover within the mixed shrub and herbaceous land cover, the accuracy only reached around 0.62. This value could be improved to 0.72 by only adding elevation (e) as an auxiliary input. By adding the remaining three topographical parameters (s + a + c), the accuracy increased to 0.74. Testing the influence of different combinations of input parameters (SAR as well as topographic), it became clear that the combination of all available SAR-based observations and all topographical parameters leaded to the best model performance for classifying total snow cover. The same tests have been performed for all test sites listed in Table 2, which are not illustrated here for clarity, but leading to the same results as for the Monte Rosa test site. Therefore, all available SAR-based observations and all topographical parameters have been used as model inputs for the subsequently presented results.
it became clear that the combination of all available SAR-based observations and all topographical parameters leaded to the best model performance for classifying total snow cover. The same tests have been performed for all test sites listed in Table 2, which are not illustrated here for clarity, but leading to the same results as for the Monte Rosa test site. Therefore, all available SAR-based observations and all topographical parameters have been used as model inputs for the subsequently presented results.  SAR-based observations including backscattering ratio, interferometric SAR (InSAR) coherence, and polarimetric (PolSAR) H/A/α parameters are abbreviated as B, I, and P, respectively. Topographical factors including aspect, slope, curvature, and elevation are abbreviated as a, s, c, and e, respectively. Accuracy of each land cover type modeled with SAR-based observations only or together with topographical factors is shown in (a) and (b), respectively. Non-forest classes' overall accuracy, F measure, and area under the receiver operating characteristic curve are abbreviated as NF OA, NF F1, and NF AUC, respectively.

Holistic (Total + Wet) SCE Detection and Overall Workflow Overview
The wet SCE detection was performed, in addition to the RF-based classification of the total SCE, in order to exploit the full potential of the SAR data and to discriminate between wet and dry snow. To map the wet SCE, the backscattering coefficient-based "Nagler's method" was applied. As summarized in the introduction, this method requires two SAR images sensed at snow-covered and snow-free conditions. It utilizes a backscattering threshold (−3 dB) to the ratio map of the two images for detecting the wet SCE. In the present study, according to the feasibility of using a cross-temporal reference image, proved by Luojus et al. [20], both years' SAR data observed during the snow-covered season were combined with the reference scene (summertime observation) of the first year ( Table 2).
The overall workflow of mapping total SCE, by applying a supervised RF classification model set as well as detecting wet SCE with "Nagler's method", is illustrated in Figure 5.

Results
Based on the optimized input variable combinations tested in Section 3.2.4, the same modelling approach was applied to all five study areas. Additionally, to examine the transferability of the model setups, the results were comprehensively validated with different data sets (Landsat/Sentinel-2 and in-situ snow depth information). Finally, the holistic (total + wet) SCE was derived based on the modeled total SCE and wet SCE calculated from "Nagler's method".

Accuracy Assessment of Modeled Total SCE
As presented in Section 2.1, five different study areas located around the globe were selected and applied with the proposed model. The total SCE mapping model set can be trained by either: (1) Using each region's data to train an individual model set and; therefore, training a region-dependent model set; or (2) merging all regions' data to train one universal model set, which was then applicable for all regions simultaneously. Depending on the intentional application of the model, it might be sufficient to have a spatially confined model that only works for a specific region. However, if it is required to derive snow cover information for several varying locations, a universal model set has to be implemented. For both cases, the performance of the model has to be known and; therefore, has to be assessed.
For the first (individual) case, the results of the accuracy assessment are presented in Figure 6a-e. The overall accuracy of Monte Rosa (MR), Zugspitze (ZG), Mount Whitney (MW), Landtang Lirung (LL), and Aconcagua (AG) were around 80%, 75%, 90%, 90%, and 80%, respectively. The F1 score for each test site was higher than 70%. AUC scores for all regions were around 70% or higher which indicated a good model performance; with the exception of ZG, with around 60%, which was still satisfying. The amount and distribution of land cover vary between the test sites and affect the overall accuracy of the model performance. The complexity of land cover types within the CCI land cover product used as auxiliary data in this study was considerably higher in the Europeans Alps and Asian Himalaya when compared to the Andes and Rocky Mountains (see Figure 1). As already reported in earlier SAR-based snow cover studies [17,18], lower accuracy values are achieved in grassland as well as forested areas (classes marked as * in Figure 6), with the exception of the LL test site. To examine the negative effect of grassland and forest areas of the classification performance, the overall accuracy, F measure, and AUC score values have also been calculated for non-forest classes (indicated in Figure 6 and abbreviated with NF OA, NF F1, and NF AUC).

External Validation
Although previous studies often neglect the importance of a detailed validation approach, it is indispensable to comprehensively examine the robustness and correctness of a proposed model with internal and external validation efforts. Consequently, several validation steps have been incorporated into the presented study. It complements the internal validation illustrated in Section 4.1, with addition to external validation based on the GSP, optical-based SCE, and in-situ snow depth data, as illustrated in the data splitting of Figure 3 and workflow of Figure 5. The results of these accuracy assessments are presented in the following sections.

Results of the External Validation with the Global SnowPack and Landsat/Sentinel-2-Derived Snow Cover Maps
In order to examine each land cover types' influence on the model's accuracy, the comparison between the "training accuracy" (the overall accuracy of the internal validation, calculated based on 30% of the first year's data manually kept (Figure 3)) and the "validation accuracy" (the overall accuracy of the external validation, calculated based on the second year's GSP) were calculated and presented as blue and red bars in Figure 7, respectively. Most of the land cover types showed a slightly lower accuracy for all test sites of the second-year validation accuracy, with the exception of merging case (f). SAR-based observations including backscattering ratio, InSAR coherence, and PolSAR H/A/α parameters are abbreviated as B, I, and P, respectively. Topographical factors including aspect, slope, curvature, and elevation are abbreviated as a, s, c, and e, respectively.
As outlined above, building individual models for each test site separately can guarantee a more customized classification with higher accuracy. The overall accuracy of these individual cases including forests and grassland ranged from 75.2% to 89.6% (of which 79.4%-98.6% excluding forests and grassland) with F1 scores ranging from 71.9% to 89.6% (of which 76.0%-98.6% excluding forests and grassland, see Figure 6a-e). All input data were also merged to build a universally applicable model, which showed only slightly lower accuracy values of 81.1% of the overall accuracy (of which 84.5% excluding forests and grassland) and an F1 score of 80.4% (of which 83.9% excluding forests and grassland, see Figure 6f). These values proved a good model performance. This evaluation revealed the robustness of the classification approach proposed in this study whilst merging all input data to create a globally transferable model setup; even if the model was trained universally. The sturdiness can also be exposed from the neglected accuracy difference of 20 trials for each land cover class of each region.

External Validation
Although previous studies often neglect the importance of a detailed validation approach, it is indispensable to comprehensively examine the robustness and correctness of a proposed model with internal and external validation efforts. Consequently, several validation steps have been incorporated into the presented study. It complements the internal validation illustrated in Section 4.1, with addition to external validation based on the GSP, optical-based SCE, and in-situ snow depth data, as illustrated in the data splitting of Figure 3 and workflow of Figure 5. The results of these accuracy assessments are presented in the following sections.

Results of the External Validation with the Global SnowPack and Landsat/Sentinel-2-Derived Snow Cover Maps
In order to examine each land cover types' influence on the model's accuracy, the comparison between the "training accuracy" (the overall accuracy of the internal validation, calculated based on 30% of the first year's data manually kept (Figure 3)) and the "validation accuracy" (the overall accuracy of the external validation, calculated based on the second year's GSP) were calculated and presented as blue and red bars in Figure 7, respectively. Most of the land cover types showed a slightly lower accuracy for all test sites of the second-year validation accuracy, with the exception of the ZG region. However, almost all land cover types within all study areas were still characterized by a satisfying accuracy (>0.6), with the exception of forested classes of the MR region. This affirmed the robustness of the employed model set.  The 500m spatial resolution GSP product may not accurately depict the actual SCE in mountain regions due to missed pixel effects and the negative influence of temporal interpolation for cloudgap filling. Therefore, in addition to this product, higher resolution optical imagery originating from Landsat-7/8 (30 m) and Sentinel-2 (20 m) were also employed to derive SCE for validation. All available observations from these two sensors were screened for suitable data (i.e., cloud-free). The observations chosen for processing are listed in Table 2. The aim is to keep the observation date of the optical sensor as close as possible to the SAR data's date, which could be achieved for most cases The 500m spatial resolution GSP product may not accurately depict the actual SCE in mountain regions due to missed pixel effects and the negative influence of temporal interpolation for cloud-gap filling. Therefore, in addition to this product, higher resolution optical imagery originating from Landsat-7/8 (30 m) and Sentinel-2 (20 m) were also employed to derive SCE for validation. All available observations from these two sensors were screened for suitable data (i.e., cloud-free). The observations chosen for processing are listed in Table 2. The aim is to keep the observation date of the optical sensor as close as possible to the SAR data's date, which could be achieved for most cases with temporal gaps of less than one week.
To extract the optical-based total SCE, the Fmask algorithm was employed [77]. Fmask is an algorithm that aims to extract clouds, cloud shadows, snow, and waterbodies from Landsat 4-8 and Sentinel-2 imagery. Thanks to its mountainous Fmask function, which utilizes additional DEM information, the accuracy of mapping cloud shadow and removing terrain shadow in mountainous regions is significantly improved. For parameter setting, the dilation size of cloud, cloud shadow, and snow were set to 3, 3, and 0, respectively. The threshold for cloud probability was set to 10.0% for Landsat-7, 17.5% for Landsat-8, and 20.0% for Sentinel-2. The resultant total SCE and original true color composites for each study site are shown in Figure 8. Overall, the Fmask classification results were reasonable, except for AG region showing misclassification of waterbody. However, it was also obvious that there was still serious cloud contamination. Additionally, due to the orbit design and sensing swath of the satellites, the study area was not always completely covered by the Landsat/Sentinel-2 observation.
To compare two total SCE results (the SAR-based one based on the RF classification, and the optical-based classification based on Fmask), a total of 10,000 random points were equally distributed in both snow-covered as well as bare ground areas (5000 points each). The points were distributed based on the result of the Fmask product, excluding cloud-covered areas by treating them as no-data. Although some studies reported that Fmask may tend to over-estimate cloud [78,79], in the present random-point-allocating method it would not damage the reliability of validation. A confusion matrix as well as evaluation indexes (overall accuracy and F1 score) were then computed for each test site and each month of the second year, and the results of this assessment are presented in Figure 9. To simplify the interpretation, the numbers of points in each column/row of the confusion matrix were converted into a percentage of the total 10,000 points. All ten months' cases showed good accuracy and F1 score (>0.72); except for month1 (month not used in the training of the first year, referring to Table 2) of AG region, which showed significant under-estimation. Under-estimation was also shown in non-used month (month1) of the first year in ZG and LL region. Furthermore, the month2 (month included in training set) of AG region showed a mild over-estimation. color composites for each study site are shown in Figure 8. Overall, the Fmask classification results were reasonable, except for AG region showing misclassification of waterbody. However, it was also obvious that there was still serious cloud contamination. Additionally, due to the orbit design and sensing swath of the satellites, the study area was not always completely covered by the Landsat/Sentinel-2 observation. 9. To simplify the interpretation, the numbers of points in each column/row of the confusion matrix were converted into a percentage of the total 10,000 points. All ten months' cases showed good accuracy and F1 score (> 0.72); except for month1 (month not used in the training of the first year, referring to Table 2) of AG region, which showed significant under-estimation. Under-estimation was also shown in non-used month (month1) of the first year in ZG and LL region. Furthermore, the month2 (month included in training set) of AG region showed a mild over-estimation.

Validation Relying on In-Situ Data Originating from Meteorological Stations
In addition to the validation based on satellite observations, snow depth (SD) measurements originating from meteorological stations are available for some study areas. For the European region, few countries' national meteorological agencies, for example in Switzerland, provide free access to these data. Additional measurements of daily SD are available from the European Climate

Validation Relying on In-Situ Data Originating from Meteorological Stations
In addition to the validation based on satellite observations, snow depth (SD) measurements originating from meteorological stations are available for some study areas. For the European region, few countries' national meteorological agencies, for example in Switzerland, provide free access to these data. Additional measurements of daily SD are available from the European Climate Assessment and Dataset Project (ECA&D) [80]. The study sites MR and ZG were chosen to be validated with 28 stations of MeteoSwiss and 114 stations of ECA&D, respectively. The locations of the meteorological stations are illustrated in Figure 10. However, as more than 50% of the stations are located in relatively low elevation zones, which are generally completely snow-free in May, only March observations were validated for the presented study. For the study site MW, only one meteorological station was available, while for LL and AG, no suitable and freely accessible station data could be acquired. Therefore, these three regions were not validated with meteorological records. The confusion matrix, displaying the accuracy derived from the comparison of the SAR-based SCE classification with station data, is illustrated in Figure 11. Whilst not considering the skewed observations (station records contain less "snow-covered" data (i.e., SD higher than 0 cm)), both regions achieved overall accuracy values of more than 87%. However, a slight under-estimation can be found in the ZG region. This could be because there might exist a certain threshold of SD for the spaceborne SAR-based approach to detect that a pixel is snow-covered, such as 1 cm for MODIS snow cover product [81] and 2 cm for passive sensors [82]. classification with station data, is illustrated in Figure 11. Whilst not considering the skewed observations (station records contain less "snow-covered" data (i.e., SD higher than 0 cm)), both regions achieved overall accuracy values of more than 87%. However, a slight under-estimation can be found in the ZG region. This could be because there might exist a certain threshold of SD for the spaceborne SAR-based approach to detect that a pixel is snow-covered, such as 1 cm for MODIS snow cover product [81] and 2 cm for passive sensors [82].

Holistic SCE Maps Including Discrimination Between Wet and Dry Snow
According to the comprehensive internal and external validation approaches relying on different data sources, all study regions turned out to achieve reliable accuracy values; with the exception of AG region, which showed comparably weaker results. As the RF classification approach relied on training data originating from optical-based products (GSP), it is not possible to train for the discrimination between wet and dry snow. However, as this information is required, we also detected the wet SCE based on Nagler's algorithm [17]. For the subsequent merge of the total SCE derived from the RF classification and the wet SCE derived from Nagler's classification, the presence of wet SCE was only allowed for regions where the RF classification had successfully detected snow. Figure 12 contains the results of this merging approach for all test sites, including month 1 and 2, to illustrate the dynamics of SCE. During March (month 1, Figure 12a,c,e,g, and i), the total extent of snow cover was larger but wet SCE was proportionally small. As snow melt proceeded during the spring time, wet SCE increased (month 2, Figure 12b,d,f, and h, with AG region constituting an exception) while the total SCE decreased.

Holistic SCE Maps Including Discrimination between Wet and Dry Snow
According to the comprehensive internal and external validation approaches relying on different data sources, all study regions turned out to achieve reliable accuracy values; with the exception of AG region, which showed comparably weaker results. As the RF classification approach relied on training data originating from optical-based products (GSP), it is not possible to train for the discrimination between wet and dry snow. However, as this information is required, we also detected the wet SCE based on Nagler's algorithm [17]. For the subsequent merge of the total SCE derived from the RF classification and the wet SCE derived from Nagler's classification, the presence of wet SCE was only allowed for regions where the RF classification had successfully detected snow. Figure 12 contains the results of this merging approach for all test sites, including month 1 and 2, to illustrate the dynamics of SCE. During March (month 1, Figure 12a,c,e,g,i), the total extent of snow cover was larger but wet SCE was proportionally small. As snow melt proceeded during the spring time, wet SCE increased (month 2, Figure 12b,d,f, and h, with AG region constituting an exception) while the total SCE decreased.

Influence of Different Input Factors on the Classification Accuracy
As shown in Section 3.2.4, the highest classification accuracy can be obtained when including all three SAR-based observations and all four topographical factors. This finding is not surprising as the influence of the abovementioned parameters has already been reported in previous studies [24,[83][84][85][86]. Nevertheless, the impact of these variables has only been evaluated separately. The combination of all parameters in one single approach, as it was implemented for the presented study, turned out to achieve the best results. Topography influences the melting and distribution of snow in mountain regions and was; therefore, included in several approaches to map dry SCE based on wet SCE [17,83,[87][88][89]. Hence, its influence on the RF-based approach implemented for the presented study is also significant. Moreover, among all three SAR-based observations and the four topographical

Influence of Different Input Factors on the Classification Accuracy
As shown in Section 3.2.4, the highest classification accuracy can be obtained when including all three SAR-based observations and all four topographical factors. This finding is not surprising as the influence of the abovementioned parameters has already been reported in previous studies [24,[83][84][85][86]. Nevertheless, the impact of these variables has only been evaluated separately. The combination of all parameters in one single approach, as it was implemented for the presented study, turned out to achieve the best results. Topography influences the melting and distribution of snow in mountain regions and was; therefore, included in several approaches to map dry SCE based on wet SCE [17,83,[87][88][89]. Hence, its influence on the RF-based approach implemented for the presented study is also significant. Moreover, among all three SAR-based observations and the four topographical parameters, PolSAR and elevation resulted in having the highest impact on the classification accuracy, as illustrated in Figure 4. Comparing the classification accuracy relying on only one SAR-based observation (Figure 4a), PolSAR parameters (P) can generate the highest accuracy comparing to backscattering (B) and InSAR coherence (I). For instance, in the urban land cover type, the accuracy of the B/I/P-based cases is around 0.71, 0.72, and 0.79, respectively. Comparing the accuracy testing with only adding one topographical factor (Figure 4b), elevation shows the most significant improvement as the accuracy of grass land can improve from original 0.71 (using only SAR-based observations) to 0.83 when only adding elevation information, but only enhance to 0.72, 0.73, and 0.76 when adding aspect, curvature, and slope information, respectively. This might be because PolSAR contains much more physical information of the ground conditions, such as characteristics of ground surface, than the relatively noisy backscattering and the InSAR coherence.
The importance of both PolSAR parameters and elevation can also be revealed by the ranks of importance of the variables generated by the RF model. The importance of each input variable of each land cover-targeted RF model was weighted, averaged by the percentage of land cover class for each region. The average importance of each variable for all five study areas is illustrated in Figure 13. Although the importance of the variables in each region is different, it is obvious that for all of the regions, elevation and PolSAR parameters have the highest importance. InSAR coherence also shows considerable contribution to the model accuracy comparing to less important variables such as slope, backscatter, aspect, and curvature. However, due to the fact that: (1) all those input variables were confirmed related to snow cover by previous studies; (2) even the least important variable (i.e., curvature) still has an importance of more than 5%; (3) the importance of each input variable differs for each region; and (4) the goal of the proposed approach is to guarantee global transferability, no input variable was trimmed out in the proposed modeling approach to ensure broader suitability and flexibility.

Influence of Vegetation and Land Cover Types on the Classification Accuracy
The evaluation of the RF performance revealed that the snow cover classification accuracy varies significantly between land cover classes (compare Figures 4, 6 and 7). Densely vegetated regions tend to yield lower SCE accuracy when compared to, for example, bare area, sparse vegetation, urban areas, or even grasslands. These findings match with previous studies' conclusions: Koskinen et al. [73] stated that the existence of vegetation will greatly reduce the difference between SCE and other land cover features. It has further been confirmed by simulation that the stem volume of forests directly relates to SAR signals [74], which influences land cover classes with even less biomass, such as grassland or cropland [22,90]. Duguay and Bernier [91] found that the vegetation height would impact the SAR backscattering as well. Moreover, Dedieu et al. [47] suggested that the structure and phenological change of trees would also affect how the SAR signal reflects and bounces, as it was also observed by Park et al. [83]. Venkataraman [75] realized that the random motion of vegetation also causes dramatic loss of InSAR coherence.

Challenges for Validating SAR-Based Snow Cover Classifications; the Case of Aconcagua (AG) Region
After validating the modeled total SCE results, some regions ended up performing better than others. The most apparent example where several factors come into play is the AG test site. The external validation with optical-derived SCE maps (Figure 9) proves the under-estimation of the actual SCE in month1 and the over-estimation in month2. One of the possible reasons is that the land cover product provided by of ESA CCI might be biased for this region. Some of the classes (e.g., scrubland) appear to be overestimated when checked manually, relying on very high-resolution imagery from, for example, Google Earth. Around 57% of the AG region is labeled scrubland in the land cover product. When reviewing the quality flag of the ESA CCI land cover product, a lack of available input data can be constituted. As shown in Figure 14, the AG region's land cover product is based on considerably less input data than any other region, which might be an indicator that the quality is also poor. In addition to that, according to the ESA CCI land cover product user guidance [39], some specific classes, such as sparse vegetation (which accounts for around 23.5% of the AG region), have relatively low classification accuracies. Considering that the total SCE mapping approach proposed in this study is land cover dependent, possible biases in the land cover product would inevitably reduce the robustness of model. As a result, a significant under-and over-estimation could be connected to this issue. To prove this assumption, we also tested the influence of erroneous land cover classes on the classification accuracy in the Monte Rosa (MR) region. First, land cover class labels were randomly shuffled, and the first year's model internal validations as well as the second year's external validation of the resultant SCE were examined (according to Figure 3). A decline of around 3% in overall accuracy was found in the internal cross-validation using first year GSP data, and around 20% and 31% of overestimation were shown in month1/2, validated with optical-based SCE, respectively. Based on this test, it is confirmed that the quality of land cover labelling is important. observed by Park et al. [83]. Venkataraman [75] realized that the random motion of vegetation also causes dramatic loss of InSAR coherence.

Challenges for Validating SAR-Based Snow Cover Classifications; the Case of Aconcagua (AG) Region
After validating the modeled total SCE results, some regions ended up performing better than others. The most apparent example where several factors come into play is the AG test site. The external validation with optical-derived SCE maps (Figure 9) proves the under-estimation of the actual SCE in month1 and the over-estimation in month2. One of the possible reasons is that the land cover product provided by of ESA CCI might be biased for this region. Some of the classes (e.g., scrubland) appear to be overestimated when checked manually, relying on very high-resolution imagery from, for example, Google Earth. Around 57% of the AG region is labeled scrubland in the land cover product. When reviewing the quality flag of the ESA CCI land cover product, a lack of available input data can be constituted. As shown in Figure 14, the AG region's land cover product is based on considerably less input data than any other region, which might be an indicator that the quality is also poor. In addition to that, according to the ESA CCI land cover product user guidance [39], some specific classes, such as sparse vegetation (which accounts for around 23.5% of the AG region), have relatively low classification accuracies. Considering that the total SCE mapping approach proposed in this study is land cover dependent, possible biases in the land cover product would inevitably reduce the robustness of model. As a result, a significant under-and overestimation could be connected to this issue. To prove this assumption, we also tested the influence of erroneous land cover classes on the classification accuracy in the Monte Rosa (MR) region. First, land cover class labels were randomly shuffled, and the first year's model internal validations as well as the second year's external validation of the resultant SCE were examined (according to Figure 3). A decline of around 3% in overall accuracy was found in the internal cross-validation using first year GSP data, and around 20% and 31% of overestimation were shown in month1/2, validated with optical-based SCE, respectively. Based on this test, it is confirmed that the quality of land cover labelling is important. The influence of the topographical parameters on the classification performance is equally significant, as outlined in Section 3.2.4 (compare also Figures 4 and 13). Possible errors in the SRTM version 4 DEM could; therefore, also contribute to miss-classifications of SCE. As reported by Rodriguez et al. [92], the SRTM DEM can have absolute height errors of more than 30 m. Since all four topographical parameters are directly derived from the height measurements, biases could cause false model presumptions that would further translate into classification errors.
Finally, the snow cover product (GSP, which is based on MODIS) utilized to train the model could be biased, too. GSP was chosen as an input because it provides daily snow cover information, which is desirable as snowfall events can occur spontaneously. The relatively coarse resolution (500 m), and the fact that clouds can cover large proportions of the study regions, contribute to uncertainties when training the model.
Any of these factors could be responsible for the relatively poor SCE classification accuracy of the AG test region. It is difficult to determine which is the pivotal factor though, as more tests would be required; while also for some input datasets (like SRTM and GSP) no quality flag is available, which quantitatively characterizes the pixel-wise errors.

Uncertainties of Mapping Wet and Dry SCE in a Holistic Map; the Case of Monte Rosa (MR) Region
As shown in Figure 12b, it is clear that, during the melting season, there is a dry SCE boundary buffering around the wet SCE, especially for the Alpine Monte Rosa region. The possible reasons could include both the over-estimation of our modeled total SCE and the under-estimation of wet SCE detected by "Nagler's method".
According to the confusion matrix comparing "modeled" and "optical-based" total SCE, as shown in Figure 9, there is a slight over-estimation observed in month2 of the Monte Rosa region caused by classification error. However, Nagler's backscattering-based "hard" thresholding approach was also commonly found to under-estimate wet SCE, especially around the snow line where wet snow is patchy [17,[93][94][95]. In detail, because snow has a significantly lower scattering cross-section than bare ground, the stronger backscattering signals of bare soil, rock, and vegetation would; thus, dominate that mixed pixel [94][95][96]. This situation mostly can only be compensated by enhancing the spatial resolution of SAR image, as tested by Malnes et al. [96].

Improvements Achieved in this Study
Although there have been some studies employing machine-learning to classify total SCE, the presented study utilized a novel fusion of multiple SAR-based observations and topographical factors to achieve a higher reliability. The proposed land cover-dependent model has successfully been applied in different study areas. In detail, the improvements when compared to earlier studies can be summarized as follows: (1). The method provides global suitability and transferability To achieve a universal, global transferability, the method needs to satisfy both data availability and model robustness. First, for data availability, all datasets being used in this study are available globally. More importantly, all input data are openly accessible, including the spaceborne SAR and optical imagery, the DEM, and the land cover product. Additionally, no field measurements for snow depth, temperature or forest's stem volume are needed. When it comes to the model's robustness, most of the previous studies applied the SCE mapping to a single regional site [30,97,98], which implies that they might be site-dependent approaches. In contrast, the presented method is tested in five independent regions located in different mountain ranges and even hemispheres. The achieved accuracy confirms that the proposed method can be applied to map SCE around the globe, which ensures the transferability. Moreover, merging the five regions' data to build a universal model set, a stable classification accuracy could also be confirmed.
(2). The method can efficiently deal with all land cover types, including forests Because of challenging SAR-vegetation interactions, many existing studies simply masked out the forest or even agricultural regions [17,18,30,[99][100][101][102]. As illustrated in Figure 1, vegetation is a Remote Sens. 2019, 11, 895 22 of 29 dominant part of land cover in all five study regions. Masking out the vegetation would significantly reduce the suitability of the SCE mapping approach to provide snow cover information for further applications. To overcome this limitation, the presented study includes all land cover types, comprising vegetation-dominated classes such as forest, scrub-, grass-, and crop-land. The model is trained individually for each land cover type to form a land cover-dependent model set. Thanks to the simplicity and robustness of random forest classifiers, they can efficiently process various SAR observations under low computation demands.
(3). The method provides holistic wet and total SCE Existing methods to detect snow cover from SAR data often focus on wet SCE only. The presented study proposes a method to derive both wet and total SCE, relying on a combination of an RF-based classifier to map the total SCE and a backscattering-based approach to detect wet SCE (Nagler's method). Thanks to the performance of the RF classifier, the workflow offers an option for near real-time and automatic classification of wet and dry SCE, thus enhancing the viability of spaceborne SAR data to map SCE. Since the method provides both wet and dry SCE, it could offer new ways of how remote sensing-based snow cover products can be incorporated into hydrological models.

Future Work
Considering the PolSAR data, existing studies have demonstrated that quad-polarization imagery can provide more satisfying results than dual-polarization, as it can reduce topographical distortions [46] and is also more sensitive to the state of snow [47]. Thus, the potential of utilizing the TerraSAR-X or Radarsat-2 should be explored. Moreover, their ultra-high-resolution sensing mode can also enhance the accuracy of wet SCE mapping as discussed in Section 5.4.
Regarding the auxiliary data, one of the future developments to improve the classification accuracy in vegetated areas will include the use of quantitative vegetation measurements, such as biomass density, the normalized difference vegetation index, or the leaf area index. The local incidence angle (LIA) of the SAR imagery related to the local topography may also be included to justify the impact of topography.

Conclusions
There are many studies available for detecting wet snow cover extent (SCE) utilizing spaceborne SAR imagery. However, there exists no properly-validated, globally transferable, total SCE mapping approach that would be applicable to all land cover classes. The presented study proposed a novel approach to map total and wet SCE based on openly accessible datasets (including SAR imagery, DEM, and land cover product) with global coverage. Thanks to the simplicity and efficiency of the implemented random forest classifier, it is possible to incorporate non-linear and complex SAR-based observations (InSAR, PolSAR, backscattering). In regards to the influence of topography, four topographical factors, including elevation, slope, aspect, and curvature, were also included as model inputs. The method has been applied to different mountainous locations (five regions located in different mountain ranges, continents, and hemispheres) over a period of different months and years. By extending the RF-based classification with a backscattering-based thresholding approach, information about wet SCE was additionally supplemented. Internal as well as external independent validation have been comprehensively conducted, confirming the robustness and transferability of the proposed method.
In addition to the global transferability, another merit of the presented approach is the capability to detect snow cover in vegetated areas, such as forests, grass-, scrub-, and crop-lands. Compared to previous studies, which often masked out the forest or even agricultural regions, this study proposed a land cover-dependent classification procedure to solve this limitation. Namely, total SCE was classified with SAR observations and topographical factors in each land cover class separately, before merging each classes' model to form a universal model set. To explore the influence of all inputs, the accuracy of each model set, built by different combinations of SAR observations and topographical factors, was evaluated. Results prove that the model set including all three SAR-based observations (InSAR, PolSAR, backscattering) as well as all four topographical factors (elevation, slope, aspect, curvature) yields the highest classification accuracy for all land cover types. Overall accuracy, F-measure, and AUC score can reach around 80%, 80%, and 70% for the majority of the regions. The accuracy can even be higher, around 5%, if vegetation land cover classes are excluded. It was also found that PolSAR and elevation provide more information than the remaining parameters. Additionally, when merging all input data for all test sites into one universal, general model (i.e., building a universally applicable model set), the accuracy remains satisfying. Future developments will include the incorporation of quantitative vegetation parameters (leaf area index, biomass, normalized difference vegetation index) as well as the local incidence angle of the SAR observations. Acknowledgments: Support by the German Academic Exchange Service (DAAD) fellowship to Ya-Lun S. Tsai is gratefully acknowledged. The authors thank the providers of the meteorological data used in this article, including MeteoSwiss (the Swiss Federal Office of Meteorology and Climatology) as well as the ECA&D project (data and metadata available at http://eca.knmi.nl).

Conflicts of Interest:
The authors declare no conflicts of interest.  Figure A1. Comparison of random forest model's performance with different numbers of trees (tested with Monte Rosa (MR)) region.