Spatial Modeling of Snow Avalanche Using Machine Learning Models and Geo-Environmental Factors: Comparison of Effectiveness in Two Mountain Regions

Although snow avalanches are among the most destructive natural disasters, and result in losses of life and economic damages in mountainous regions, far too little attention has been paid to the prediction of the snow avalanche hazard using advanced machine learning (ML) models. In this study, the applicability and efficiency of four ML models: support vector machine (SVM), random forest (RF), naïve Bayes (NB) and generalized additive model (GAM), for snow avalanche hazard mapping, were evaluated. Fourteen geomorphometric, topographic and hydrologic factors were selected as predictor variables in the modeling. This study was conducted in the Darvan and Zarrinehroud watersheds of Iran. The goodness-of-fit and predictive performance of the models was evaluated using two statistical measures: the area under the receiver operating characteristic curve (AUROC) and the true skill statistic (TSS). Finally, an ensemble model was developed based upon the results of the individual models. Results show that, among individual models, RF was best, performing well in both the Darvan (AUROC = 0.964, TSS = 0.862) and Zarrinehroud (AUROC = 0.956, TSS = 0.881) watersheds. The accuracy of the ensemble model was slightly better than all individual models for generating the snow avalanche hazard map, as validation analyses showed an AUROC = 0.966 and a TSS = 0.865 in the Darvan watershed, and an AUROC value of 0.958 and a TSS value of 0.877 for the Zarrinehroud watershed. The results indicate that slope length, lithology and relative slope position (RSP) are the most important factors controlling snow avalanche distribution. The methodology developed in this study can improve risk-based decision making, increases the credibility and reliability of snow avalanche hazard predictions and can provide critical information for hazard managers. Remote Sens. 2019, 11, 2995; doi:10.3390/rs11242995 www.mdpi.com/journal/remotesensing Remote Sens. 2019, 11, 2995 2 of 26


Introduction
One of the most dangerous hazards in mountainous regions is the snow avalanche. These events can greatly impact individual users of mountain environments, residential areas in avalanche zones, ecosystems, and public infrastructure [1][2][3]. From a geomorphological viewpoint, snow avalanches can produce a range of different erosional and depositional landforms (also termed as snow avalanche impact landforms) which have been well-documented [4][5][6]. Also, snow avalanches play a vital role in sedimentary mass transfer [7,8]. New constructions of roads and highways and the expansion of urbanization in mountainous areas have led to an increase in avalanche hazard [9]. Therefore, predicting the zones of this hazard is essential for the reduction of risk (the probability of avalanche occurrence) and for the mitigation of avalanche hazards (the probability of impacts on people and their interests) [10,11]. Different approaches have been employed for producing a snow avalanche susceptibility map (SASM), including multi-criteria decision analysis (MCDA) methods [12][13][14] and fuzzy-frequency ratio models [15][16][17]. However, these methods are expert opinion-based and subjective. Some researchers have used numerical methods and dynamic models to analyze the snow avalanche hazard [18][19][20][21], however a lot of uncertainty is involved in all the modeling parameters when applying to large regions at smaller scales. Other research methods are based upon remote-sensing techniques [14,[22][23][24][25]. Though remote sensing can provide useful information about snow surface and land surface conditions, analysis of the complicated relationships between snow avalanches and geomorphometrics has been often ignored.
In the case of snow avalanches, to the best of our knowledge, too little attention has been paid to the use of machine learning models for mapping. Little is known about the efficiency of these models, and it is not clear which geomorphometric and topohydrological factors affect snow avalanche distribution and activity. Recently, Kumar et al. [49] used a simple data-mining method, probabilistic occurrence ratio (POR) (also known as a frequency ratio), and a snow avalanche inventory to prepare a snow avalanche hazard map in the Lahaul region of the western Himalayas. Their results indicated that this simple model performed well in the task of identifying snow avalanche-prone areas. Choubin et al. [24] have successfully applied support vector machines (SVM) and multivariate discriminant analysis (MDA) to spatially predict this snow avalanche hazard. Their results demonstrated that both models achieve excellent predictive capacity for snow avalanche mapping. However, the predictive performances of other machine learning models in this field of study have not yet been investigated. To address this important gap, several state-of-the-art machine learning models are used to map the snow avalanche hazard. Two study regions, the Darvan and Zarrinehroud watersheds, were selected to test the robustness and generalizability of the models. In addition, different geomorphometric and topohydrological factors, such as, elevation, the topographic position index (TPI), topographic ruggedness index (TRI), topographic wetness index (TWI), vector ruggedness measure (VRM) wind exposition index (WEI), length-slope (LS), relative slope position (RSP), slope degree, aspect, profile curvature and the distance from the stream, were included as potential effective factors to spatially model the snow avalanche hazard. The main objectives of this research are to: (1) Scrutinize the efficiency of SVM, random forest (RF), naïve Bayes (NB), and generalized additive model (GAM); (2) then compare their robustness and generalizability of the two case studies; and (3) investigate the relative importance of the predictive factors in snow avalanche predictive modeling.

Study Area
This research was conducted in two regions of Iran: the Zarrinehroud (4514 km 2 ) and the Darvan (938 km 2 ) watersheds. Both watersheds are in western Kurdistan Province (Figure 1). The Zarrinehroud Watershed is located between 35°40' and 36°24' N and 46°44' and 46°48' E. The Darvan Watershed is between 34°45' and 35°47' N and 46°02' and 48°04' E. These watersheds were selected because they are snowy in some months of the year, and because snow avalanches frequently occur ( Table 1). Steep-sided valley profiles and physiographic characteristics make them highly susceptible to snow avalanche activity. About 70% of annual precipitation occurs between November and May, and is often snow. Rural settlements in both study areas are often exposed to snow avalanches. Heavy snowfall and avalanches block rural and urban road networks during some parts of the year. Snow avalanches killed four people and injured 28 more in 2015-2018.    Collecting and documenting previous snow avalanches is important for modeling the snow avalanche hazard [49,50]. Avalanche occurrence is the dependent variable in the modeling process. A database of snow avalanche occurrences was prepared through field observations and surveys in the winters of 2016, 2017 and 2018 ( Figure 2).

Snow Avalanche Inventory
Collecting and documenting previous snow avalanches is important for modeling the snow avalanche hazard [49,50]. Avalanche occurrence is the dependent variable in the modeling process. A database of snow avalanche occurrences was prepared through field observations and surveys in the winters of 2016, 2017 and 2018 ( Figure 2).
These snow avalanches were mapped with a global positioning system (GPS) and Google Earth imagery. Only one point was mapped per snow avalanche to provide equal treatment of large and small snow avalanche occurrences, to reduce the effect of spatial autocorrelation between observations (especially when some points are considered for a snow avalanche), to avoid uncertainties in mapping snow avalanche boundaries. Some reports from the Road Organization added some avalanche locations to the database. A total of 72 and 105 snow avalanche locations were identified in the Zarrinehroud and Darvan watersheds, respectively.

Factors Influencing Snow Avalanches
Geomorphological, topographical and environmental factors influence the snow avalanche occurrence. In spatial modeling, these factors are independent variables (hereafter termed predictive factors). There are no universal guidelines to select the main predictive factors.
Therefore, we selected them by reviewing the scholarly literature [1,12,13,[50][51][52][53][54] and considering data availability. Kumar [49] stated that though meteorological conditions vary from year to year, the terrain and its characteristics are constant. Over an extended timeframe, these factors are suitable for snow avalanche susceptibility mapping. In this study, the predictive factors are elevation, topographic position index (TPI), topographic ruggedness index (TRI), topographic These snow avalanches were mapped with a global positioning system (GPS) and Google Earth imagery. Only one point was mapped per snow avalanche to provide equal treatment of large and small snow avalanche occurrences, to reduce the effect of spatial autocorrelation between observations (especially when some points are considered for a snow avalanche), to avoid uncertainties in mapping snow avalanche boundaries. Some reports from the Road Organization added some avalanche locations to the database. A total of 72 and 105 snow avalanche locations were identified in the Zarrinehroud and Darvan watersheds, respectively.

Factors Influencing Snow Avalanches
Geomorphological, topographical and environmental factors influence the snow avalanche occurrence. In spatial modeling, these factors are independent variables (hereafter termed predictive factors). There are no universal guidelines to select the main predictive factors. Therefore, we selected them by reviewing the scholarly literature [1,12,13,[50][51][52][53][54] and considering data availability. Kumar [49] stated that though meteorological conditions vary from year to year, the terrain and its characteristics are constant. Over an extended timeframe, these factors are suitable for snow avalanche susceptibility mapping. In this study, the predictive factors are elevation, topographic position index (TPI), topographic ruggedness index (TRI), topographic wetness index (TWI), vector ruggedness measure (VRM), wind exposition index (WEI), length-slope (LS), relative slope position (RSP), slope degree, aspect, profile curvature and distance from the stream (Figure 3).   (1) Elevation Some measures of weather conditions such as temperature and wind speed vary with elevation. Therefore, elevation indirectly affects the initiation of snow avalanches [12,50]. An elevation map with a resolution of 20 m was obtained from the Iranian Department of Water Resources Management (IDWRM). Elevations range from 703 to 3328 m in the Darvan Watershed, and from 1372 to 3141 m in the Zarrinehroud Watershed ( Figure 3a).
(2) TPI The TPI reflects the corresponding position of each pixel in the landscape. In fact, TPI measures the difference between the elevation at the central point of the pixel, and the mean elevation of a predetermined neighborhood of pixels as the relative topographic position. This variable has been applied in the fields of hydrology [55], geomorphology [56], climatology [57], risk management [58] and snow avalanche risk assessment [50]. In addition, it is useful for landform classification and can be computed with De Reu [59] (Equation (1)): where E pixel and E surrounding are the elevation of the cell and the mean elevation of the neighboring pixels, respectively. TPI maps were produced for both study areas in SAGA-GIS [60]. (3) TRI TRI is increasingly being used in a topographical analysis to characterize the convex or concave forms of slopes [60,61]. It was developed by Riley [62] and can be computed with (Equation (2)): where x shows the elevation of each neighbor cell (0,0). In addition, min and max depict the smallest and largest elevation value among neighboring pixels, respectively. TRI also was calculated and mapped in SAGA-GIS [26]. TRI values in the Darvan Watershed range from 0 to 98.1 m, and the range is <82.3 m in the Zarrinehroud Watershed ( Figure 3c).
(4) TWI TWI characterizes the influence of topography on hydrological processes and is used to infer the spatial distributions of wetness conditions [63] and snow avalanche activity [50]. It can be calculated with (Equation (3)) [64]: where α denotes upslope area which drains to a point, and β is the slope angle at the pixel. TWI has a range between 1.9 and 25 in the Darvan Watershed and from 2 to 24.9 in the Zarrinehroud Watershed ( Figure 3d).

(5) VRM
VRM is a geomorphometric factor often used in geomorphological studies [65]. VRM measures the ruggedness of the landscape and provides vital information for snow avalanche prediction [50]. As explained by Kumar et al. [64][65][66], terrain roughness prevents the formation of a uniformly weak layer of snow and can block the downward movement of the snowpack. On the other hand, terrain roughness may destabilize this snowpack by increasing the stress on it [66]. VRM ranges from 0 to 0.206 in the Darvan Watershed, from 0 to 0.2 in the Zarrinehroud Watershed ( Figure 3e).

(6) WEI
Wind not only causes drifting and redistribution of snow (i.e., snow-depth distribution), but also influences energy exchange, and can strongly influence the melting and freezing of snow and ice [67]. The wind is believed to be one of the main snow avalanche predictive factors, since when its velocity is reduced leeward of the ridges, additional snowfall accumulates there [68,69]. The wind exposition index (WEI) was calculated and mapped in SAGA-GIS [26]. WEI calculates the average wind effect from all directions using an angular step [70] (Figure 3f).
LS is a compound geomorphometric factor that has been used in geomorphological and environmental hazard studies [71][72][73]. LS was computed using one of the modules available in SAGA-GIS [26]. In the Darvan Watershed, LS ranged from 0 to 183.7. While the range LS was from 0 to 54.3 in the Zarrinehroud Watershed ( Figure 3g).

(8) RSP
RSP measures topographic exposure and is bounded by 0 and 1, which represent valley bottom and ridge top, respectively [74,75]. The position of each pixel in a terrain reflects its relative hydrological and geomorphological processes which consequently influence stability [76]. RSP varies from 0 to 1 in both watersheds ( Figure 3h).

(9) Slope
The slope is one of the most important terrain factors for mapping the snow avalanche susceptibility [12]. It mediates the vector of the acceleration due to gravity with surface support and friction, and can dictate the shear stress and shear strength that interacts to initiate avalanches. In this study, slope degree was calculated with a digital elevation model (DEM). The slope varies between 0 • and 78.1 • in the Darvan Watershed and from 0 • to 76.3 • in the Zarrinehroud Watershed ( Figure 3i).

(10) Aspect
There is a significant correlation between aspect and snow avalanches [12,77]. Aspect strongly affects solar energy concentration, wind exposure, snowpack accumulation, snowmelt dynamics and vegetation. In both Darvan and Zarrinehroud watersheds, slopes facing northward are mostly shady, and snowpack is unstable for more extended periods annually. Southward facing slopes receive more insolation, and their snowpack is comparatively stable and resistant to snow avalanches [78]. The maps of aspect were generated from the DEM data ( Figure 3j).

(11) Profile curvature
Profile curvature is vital in avalanche risk analysis because it directly influences shear stress and snowpack movement [49,79,80]. Plan curvature is classified as either concave, flat, or convex. Generally, concave surfaces are stable, while convex slopes promote instability in snow cover. The profile curvature maps were generated from the DEM (Figure 3k).

(12) Distance from stream
Distance from stream contributes to hydrological processes and various natural hazards [81,82]. This factor can be used to infer the spatial patterns of soil moisture and subsurface runoff dynamics, which affect the types of vegetation present in a landscape and their conditions. This factor was calculated using the Euclidian Distance tool in ArcGIS (Figure 3l). Lithology dictates many surface features. In general, rocky outcrops increase surface roughness and decrease the downslope movement of the snowpack [1]. Lithology maps of the watersheds at a scale of 1:100,000 were extracted from a digitized geology map of Kurdistan Province ( Figure 3m).

(14) Land use
Land use is one of the most critical factors in snow avalanche assessment [83]. For example, dense forest can prevent the initiation of snow avalanches. A 1:50,000 land-use map was acquired from the Iranian Department of Water Resources Management (IDWRM). Its data were verified with ground-truthing field investigations. The predominant land uses in the two watersheds are dissimilar. For example, while the dominant land cover in the Darvan Watershed is forested land, there is no forested land in the Zarrinehroud Watershed (Figure 3n).

Methodology
The steps ( Figure 4) to complete analysis of the conditions promoting avalanches are: (1) map the 14 potential predictive factors, (2) develop four machine-learning models for avalanche prediction, (3) evaluate the accuracy of the models using statistical criteria, and (4) determine the relative importance of all factors for predicting the location of snow avalanche development.
Remote Sensing 2019, 11, x FOR PEER REVIEW 9 of 24

Methodology
The steps ( Figure 4) to complete analysis of the conditions promoting avalanches are: (1) map the 14 potential predictive factors, (2) develop four machine-learning models for avalanche prediction, (3) evaluate the accuracy of the models using statistical criteria, and (4) determine the relative importance of all factors for predicting the location of snow avalanche development.

Generating Snow avalanche hazard Maps
Four advanced ML models-SVM, RF, NB and GAM-were employed to predict snow avalanches.
(1) Support vector machine (SVM) The SVM, which is known as the maximum-margin method, is used for modeling and spatial prediction of snow avalanche potential. SVM is based on statistical learning theory that can find an optimal clustering hyperplane among the input data by transforming the input space into a higherdimensional feature space [84,85]. This transformation employs non-linear transformers to specify the best separating hyperplane. The best hyperplane is found when the Euclidean distance of the separating margins between the defined classes of the problem is at a maximum [86]. Kernel functions have been developed and used for solving non-linear life phenomena [87]. The most popular kernel functions are given in Table 2. The variables d, σ and b are kernel parameters for each kernel function, and these parameters, along with the penalize parameter (C), affect the efficiency of SVM. The selection of kernel function is an important step for the generalizability of SVM. The RBF is the most commonly used kernel function, especially for the non-linear problems like natural-hazard probability modeling. In this study, the RBF was therefore selected. Increasing the C parameter leads

Generating Snow Avalanche Hazard Maps
Four advanced ML models-SVM, RF, NB and GAM-were employed to predict snow avalanches.
(1) Support vector machine (SVM) The SVM, which is known as the maximum-margin method, is used for modeling and spatial prediction of snow avalanche potential. SVM is based on statistical learning theory that can find an optimal clustering hyperplane among the input data by transforming the input space into a higher-dimensional feature space [84,85]. This transformation employs non-linear transformers to specify the best separating hyperplane. The best hyperplane is found when the Euclidean distance of the separating margins between the defined classes of the problem is at a maximum [86]. Kernel functions have been developed and used for solving non-linear life phenomena [87]. The most popular kernel functions are given in Table 2. The variables d, σ and b are kernel parameters for each kernel function, and these parameters, along with the penalize parameter (C), affect the efficiency of SVM. The selection of kernel function is an important step for the generalizability of SVM. The RBF is the most commonly used kernel function, especially for the non-linear problems like natural-hazard probability modeling. In this study, the RBF was therefore selected. Increasing the C parameter leads to an over-fitted model, while the σ parameter controls the shape of the clustering hyperplane, and consequently, the overall classification accuracy [88]. An optimization technique based on the genetic algorithm was used to determine the most optimal values of the RBF kernel. The optimal values of C and σ were 165 and 0.6, respectively.

Characteristic Function
Linear kernel The RF model is an ensemble classifier, which combines multiple decision trees to classify the input dataset [89,90]. Training-set overfitting of decision trees' habit was corrected by RF; alternatively, it can be described as a set of decision trees that take a dataset and iteratively re-sample it based on the training data [91]. In this method, n tree bootstrap samples are created from the original dataset. Then, an unpruned classification or a regression tree is established for each of the bootstrap samples. The outcome is the average of the results of all of the n trees. A random set of features is selected each time to predict output, and each output is weighted by the value derived from the votes that it receives [30]. The majority voting based on the outputs of estimated decision trees converges to a single decision tree for the final classification [92]. Using the single decision tree overcomes the uncertainty problem and results in higher prediction accuracy [93,94]. Deriving the high variance from different decision trees is crucial for classification by RF. The literature reports that RF classifications yield good results when used to classify satellite imagery. Hence, RF is one of the most operative, non-parametric, ensemble learning approaches in susceptibility modeling and mapping. We repeated the re-sampling process ten times to get the best result. A set of observations that are not used for tree growth are termed the 'out-of-bag (OOB)' sample. The OOB sample is used to estimate the prediction error and then to determine variable importance. Each tree provides a prediction for its out-of-bag sample, and the average of these results for all trees performs an in situ cross-validation that is known as the 'out-of-bag validation' [91]. In addition, there are three training parameters for RF modeling: ntree (the number of trees in the forest), mtry (the number of predictors at each split) and nodesize (the minimal size of the terminal nodes of the trees). When ntree becomes small, the results deteriorate considerably. If ntree is increased above the optimum value, it causes a general increase in computational cost, though the results will not be improved notably. When mtry is very small, not enough predictor variables are entered at each split, and consequently, the predictive capability of each tree decreases [89]. The default value of ntree is 500 trees, whereas one third of total number of the variables. The default value of nodesize is one. In this study, the ntree values were tested from 500 to 5000 with 100 intervals, and mtry was tested from 1 to 14 using a single interval. The optimization was done using the calibration dataset and the root mean square error (RMSE). A ntree of 4200 and a mtry of 6 resulted in the highest accuracy for snow avalanche modeling.
(3) Naïve Bayes (NB) The naïve Bayes (NB) ML model is a widely used classifier, primarily in data mining studies. The popularity of this model is due to the simplicity of its construction and runtime [95,96]. Furthermore, the NB model is robust and is less affected by noise and irrelevant attributes [97]. The NB classifier is a probabilistic model based on the Bayesian probability and Bayes' theorem [98]. In this model, there is an assumption that all the attributes are fully independent, which is called the conditional independence assumption [99].
This model is used to determine and select the classes that maximize the posterior probabilities using the following steps: (a) Collecting the samples, (b) assessing a prior probability for any defined class, (c) assessing the mean value of defined classes, (d) creating the covariance matrix and assessing the inverse and determinant for any defined class [100] and (e) structuring the discriminant function for any defined class [101]. There exist different kernel functions such as linear, polynomial and Gaussian [102]. In this study, the polynomial kernel was used and it was defined previously in Table 2. The generalized additive model (GAM), another ML technique introduced by Hastie and Tibshirani [54], was also used to predict the likely locations of snow avalanches. The GAM is a semi-parametric extension version of the generalized linear model that can combine both linear and nonlinear relationships between the input predictors and the inventory dataset to determine the target data and the responses. Nonlinear problems apply smooth functions of covariance to transform the predictors [103]. The summation of smoothers in the predictors is considered the response [104]. Therefore, the GAM generates a flexible classification of the dependence of the response on the covariates, which makes it practical for analysis of nonlinear responses to changing site conditions. It is, consequently, useful for spatial problems like natural hazard susceptibility modeling and mapping [96]. In the GAM model, the additive predictor η(X) is used as an alternative to the common linear predictor in the GLM. The simple form of the GAM is described by Equation (4): where the expectation of the variable Y is defined by E(Y); the function of g(.) is considered as the link function; β 0 is the intercept; and f i (X i ) is the smoothing function of the input factor variable of X i [105]. This model follows the three steps: (a) Determination of the distribution of the response variable and its corresponding link function, (b) determination of the parameters as smoothing functions and the basis dimension "k", and (c) model optimization using contour plots.

Accuracy Assessments
The performances of the snow avalanche hazard models were evaluated with a k-fold spatial cross-validation approach, as described by Goetz et al. [50,51]. The inventory dataset is randomly partitioned into k disjoint sets, and one subset at a time is used for model validation while the combined remaining k-1 subsets are utilized for model calibration/training. The R package "sperrorest" developed by Brenning [106] was used for designing a ten-fold cross-validation method.
In this study, two standard evaluation metrics, the area under the receiver operating characteristics curve (AUROC) and true skill statistics (TSS), were used. AUROC is a cutoff-independent metric, whereas TSS is a cutoff-dependent metric.
(1) The AUROC metric The accuracy assessment evaluated the effectiveness of each ML model by analyzing the uncertainty among the resulting susceptibility maps as compared to the inventory data set. The (AUROC) curve technique was selected for this comparison. It is commonly used to characterize the quality of the resulting susceptibility-prediction maps, particularly in the geosciences [107,108]. The plotted curves display the trade-offs between the false positive rate (FPR) and the true positive rate (TPR), on the X and Y axis, respectively. AUROC is an accuracy index. Values closer to one indicate that the output map provides higher quality predictions [109,110]. AUROC curves were calculated for all resulting snow avalanche susceptibility maps. To assess goodness-of-fit, the training pixels were used. Predictive performance of the models, however, was assessed with the validation pixels.
(2) TSS metric The true skill statistics is the other accuracy assessment technique used. It was first proposed by Allouche et al. [111], and is based on the components of a confusion matrix which is generated from the number of matches and mismatches between the inventory data and the modeled results [111].
The TSS can be calculated with Equation (5): where TPR and TNR are defined by Equations (6) and (7): where TN (true negative) and TP (true positive) are the number of pixels that are correctly classified whereas FN (false negative) and FP (false positive) are the numbers of pixels erroneously classified. Another performance metric is Matthews correlation coefficient (MCC) that can be calculated as follows (Equation (8) The values of the TSS range from −1 to 1; the closer the value is to 1, the better the performance [112]. Both AUROC and TSS evaluation metrics were calculated using a GIS-based extension-performance measure tool (PMT)-which were developed by Rahmati et al. [92]. Besides, these evaluation metrics were calculated in terms of both goodness-of-fit (accuracy of the model in the training step) and predictive performance (accuracy of the model in the validation step).

Ensemble Modeling
In general, all the ensemble techniques use the weighted integration of individual models to come up with the outcome. However, the way of calculating these weights are different. In this study, an ensemble method which has proposed by Pourghasemi et al. [113] was used. The ensemble map can be generated using Equation (9): where EM is the resulted ensemble model, AUROC i is the AUROC value of the ith single model (M i ).

Statistical Comparison of Models Friedman Test
The Friedman test is a non-parametric test that has been known as one of the most reliable tests for multiple comparisons to discover significant differences among models [114]. As one of the most important characteristics, it does not assume normal distributions or homogeneity of variance for the compared random variables. The null hypothesis (H 0 ) assumes no difference between the performances of the snow avalanche models. In other words, the null hypothesis of the Friedman test is that all models are equivalent based on their average ranks. If the p-value is less than the significance level (α = 0.05), and the chi square value (χ 2 ) is higher than the standard value (3.841), the null hypothesis is rejected. Since this test does not provide pairwise comparison of these models, it only provides statistical differences for all the models.

Wilcoxon Signed-Rank Test
In order to statistically compare the differences of models, the Wilcoxon signed-rank test, a nonparametric method, was performed. This method is used to test the hypothesis that the means of the first sample is equal to the means of the second sample. This test evaluates the significance of differences between some models based on two statistical components including z and p. When the z value exceeds the critical range (from −1.96 to +1.96) and also the p value is less than the significant level (α = 0.05), then the null hypothesis is rejected (i.e., these models are statistically different). Similar to the Friedman test, the Wilcoxon signed-rank test does not assume normal distributions or homogeneity of variance for the compared random variables.
This relaxed assumption ensures the generality of the test. As discussed by Lin et al. [115], it is appropriate that we conduct the Wilcoxon signed-rank test instead of the paired t-test when comparing two models.

Snow Avalanche Hazard Maps
The avalanche hazard maps predicted by the four models generally exhibit the same spatial patterns, although the details of each model differ ( Figure 5). In the Zarrinehroud Watershed, areas near the ridgeline (i.e., the drainage divide of the watershed), especially in the southern parts of the watershed, are identified as most susceptible to snow avalanches. In the Darvan Watershed, a strip along the southwestern border of the watershed, extending to the center of the watershed, is the corridor of highest risk. Low-susceptibility zones in both watersheds are, unsurprisingly, in the lowlands.
Predicted snow avalanche pixels in all four maps of the study area were categorized into five susceptibility classes based on their susceptibility values: very low (0-0.2), low (0.2-0.4), medium (0.4-0.6), high (0.6-0.8) and very high (0.8-1.0) ( Figure 5). The areas contained in the very high class vary between 7% and 16%, where the average is 11.5% for the models ( Table 2). The high-susceptibility zones are approximately 10.5% of the basin. In total, about one-fifth of the study area could be regarded as avalanche-prone. In both basins, more than 60% of the study areas were categorized as low and very low susceptibility.
After preparing the snow avalanche hazard map by the ensemble model, it was reclassified to five classes: very low (0-0.2), low (0.2-0.4), medium (0.4-0.6), high (0.6-0.8) and very high (0.8-1) ( Figure 6). Unsurprisingly, both individual and ensemble models show the same overall spatial pattern with some differences. The relative area of each class was shown in Table 3.
susceptibility classes based on their susceptibility values: very low (0 -0.2), low (0.2 -0.4), medium (0.4 -0.6), high (0.6 -0.8) and very high (0.8 -1.0) ( Figure 5). The areas contained in the very high class vary between 7% and 16%, where the average is 11.5% for the models ( Table 2). The highsusceptibility zones are approximately 10.5% of the basin. In total, about one-fifth of the study area could be regarded as avalanche-prone. In both basins, more than 60% of the study areas were categorized as low and very low susceptibility.   Figure 6). Unsurprisingly, both individual and ensemble models show the same overall spatial pattern with some differences. The relative area of each class was shown in Table 3.

Performance of Models
Considering      The predictive capacities and generalizability of the models cannot be evaluated based on the goodness-of-fit, as it uses the data with which the models were built [43]. Therefore, the accuracy of the models is best assessed with the validation data.

Performance of Models
Analysis of the validation step reveals that the ensemble model (AUROC = 0.966 in Darvan and AUROC = 0.958 in Zarrinehroud) was a slightly better predictive performance than RF (AUROC = 0.964 in Darvan and AUROC = 0.956 in Zarrinehroud) for both the watersheds (Table 4, Figure S2 Although the accuracies of SVM, NB and GAM were slightly lower than the ensemble and RF models in both study areas, all four models rate "very good" (0.8 < AUROC < 0.9) or "excellent" (AUROC > 0.9) in their avalanche-predicting performance [116,117].

Friedman Test
The results of the Friedman test in the Zarrinehroud and Darvan watersheds (Table 5) showed that both the p-value and chi square value (χ 2 ) of four models are far from the standard values of 3.841 and 0.05, respectively. Therefore, the null hypothesis is rejected. This clearly indicates that the differences among the models are statistically significant. However, since the results of the Friedman test highlights only the significant differences of performance of all the employed models, the result is not able to provide comparisons between these two models.

Wilcoxon Signed-Rank Test
Results of the Wilcoxon signed-rank test in both Darvan and Zarrinehroud watersheds were summarized in Table 6. In the Darvan Watershed, not only the p values were far from the standard value of 0.05, but also these z values were out of the standard range (from −1.96 to +1.96). These conditions were observed for both the Zarrinehroud and Darvan watersheds. Therefore, the differences of all four models are statistically significant.

Variable Importance
The degree to which each contributing factor helps to discern locations of snow avalanche risk was determined with the RF model (  14.2% in Zarrinehroud) and aspect (13.2% in Darvan, 12.9% in Zarrinehroud) had more moderate contributions to the models. Distance from a stream and TWI were the least important factor in both watersheds (i.e., less than ten percent).

The Performance of Models
Since the performances of RF, NB and GAM in the field of snow avalanche hazard have not been evaluated previously, it is impossible to directly compare these results to previous studies. The accuracy of SVM, however, confirms results achieved by Choubin et al. [24]. Numerous studies have applied AHP models which are based on the knowledge and judgments of experts, but the reliance on subjective expert input can be the primary drawback of using them [118]. In the case of avalanche modeling with AHP, the weighting of avalanche-promoting factors used in the models can be very inefficient. Results of this study showed that the ensemble model outperformed individual models in both watersheds. Among individual models, RF had the highest predictive performance and was slightly weaker than the ensemble model. In this study, SVM was used as a benchmark for comparing its results with other individual and ensemble models. Our results also confirmed the study of Choubin et al. [24], who suggested SVM for snow avalanche hazard mapping.
The outstanding performance of RF is due to its key advantages: capability to determine variable importance, non-parametric relationships, robustness to outliers and noise, high classification accuracy and internal, unbiased determination of the generalization error through the out-of-bag (OOB) factor. Rahmati et al. [92] have confirmed the capacities of ML models for analyzing complicated relationships of geoenvironmental, topohydrological and extreme natural processes or events. The two reasons for the better performance of ML models over bivariate and multivariate models is that no statistical assumptions are made, and they can handle data from various measurement methods [26,119,120].

Statistical Comparison of Models
In addition to the evaluation of the performance of the models, the exploration of their statistical differences is necessary [121]. As discussed by Bui et al. [16], these differences originated from the variability of model structures and their spatial prediction patterns. This issue has not yet been investigated for snow avalanche modeling; hence, it is not easy to compare our achievements with previous studies. This motivated us to select two case studies with the same predictive factors and models to fairly judge the statistical differences between the models. This issue was addressed in this study with the evaluation and comparison of different ensemble and standalone machine learning models. All standalone and ensemble models were statistically different for both Darvan and Zarrinehroud watersheds.

Variable Importance
It is logical to say that not all the selected conditioning factors (independent variables) are strong predictors, and in some cases, some of them generate noise and reduce the accuracy of predictions. Therefore, feature selection should be investigated [122]. Generally speaking, users and decision makers feel more comfortable in the practical application of a model if they have a clear understanding of the predictive factors that favorably contribute to the modeling process. Previous researchers have used a wide array of geomorphometric and topohydrological factors to spatially model snow avalanches. Our findings are broadly consistent with Choubin et al. [24]. They found that the TPI strongly influenced the spatial distribution of snow avalanches. This is also in line with the results of Kumar [64,65], who used the AHP method to map the snow avalanche hazard; their study indicated that slope was the most consequential terrain parameter, and ground cover was the least important. Previous studies, however, have not considered a more comprehensive set of geomorphometric factors, such as LS and RSP, so more profound comparisons are not possible. The results of variable importance highlighted that even factors such as distance from the stream and TWI that had less contributions to the modeling process can improve the accuracy of the model.

Conclusions
This study used an innovative method to scrutinize the capability, and efficiency of four ML approaches-RF, SVM, NB and GAM-to spatially model snow avalanches. To determine how robust and generalizable they are, this study was undertaken in two watersheds (the Darvan and the Zarrinehroud) in Kurdistan Province, Iran. A wide array of geomorphometric, topo-hydrologic and geo-environmental factors were employed in combination with a snow avalanche inventory to develop the models. The following conclusions can be drawn: (1) The results demonstrate that the ensemble model slightly outperformed all individual models in terms of the predictive performance in both Darvan (AUROC = 0.966, TSS = 0.865, MCC = 0.861) and Zarrinehroud (AUROC = 0.958, TSS = 0.877, MCC = 0.841) watersheds. Among individual models (i.e., RF, SVM, NB and GAM), the RF was the best model for predicting locations susceptible to snow avalanches in both the Darvan (AUROC = 0.964, TSS = 0.862, MCC = 0.865) and the Zarrinehroud (AUROC = 0.956, TSS = 0.881, MCC = 0.854) watersheds. This can be determined with both cutoff-dependent and cutoff-independent evaluation metrics. Importantly, the model performance of RF was relatively similar to the ensemble model, and always achieved a top two estimation in both goodness-of-fit and predictive performance, usually just behind the ensemble model. SVM and NB also performed at excellent levels (AUROC > 0.9). GAM produced a very good result (0.8 < AUROC < 0.9) for both watersheds. Since all four machine learning models achieved very good or excellent results, these models can be regarded as effective choices for spatially modeling the snow avalanche hazard. Furthermore, the statistical analysis clearly indicates that the results of the models are significantly different.
(2) The snow avalanche hazard maps for the Darvan and Zarrinehroud watersheds indicate the regions that are most likely to see snow avalanches, and they also provide useful information to undergird snow avalanche risk assessments, land use planning and infrastructure development decision making. The ensemble map indicates that approximately 26% of the Darvan Watershed and 21% of the Zarrinehroud Watershed is classified as having high or very high avalanche risk. Some areas along the main roads through the watersheds fall into these areas, and need to be carefully evaluated for safety.
(3) The variable-importance assessment also highlights that LS, lithology and RSP are the most significant factors for predicting a snow avalanche hazard. Avalanche prediction is a complex procedure; therefore, the results of this study can improve our understanding of the snow avalanche hazard. It is Remote Sens. 2019, 11, 2995 20 of 26 highly recommended that this information be employed by decision-makers and used in future studies. Managers should take these findings into account and apply this information for the management of avalanche hazards in the high-and very-high-risk zones.
(4) Although the efficiency and applicability of four advanced machine learning models have been evaluated, there are still other modeling approaches that should be used for snow avalanche hazard mapping. Therefore, further studies focusing on the capability of other machine learning models (e.g., XGBoost, logistic model tree, etc.) in this field of study is suggested.