Flash Flood Susceptibility Modeling Using New Approaches of Hybrid and Ensemble Tree-Based Machine Learning Algorithms

Flash flooding is considered one of the most dynamic natural disasters for which measures need to be taken to minimize economic damages, adverse effects, and consequences by mapping flood susceptibility. Identifying areas prone to flash flooding is a crucial step in flash flood hazard management. In the present study, the Kalvan watershed in Markazi Province, Iran, was chosen to evaluate the flash flood susceptibility modeling. Thus, to detect flash flood-prone zones in this study area, five machine learning (ML) algorithms were tested. These included boosted regression tree (BRT), random forest (RF), parallel random forest (PRF), regularized random forest (RRF), and extremely randomized trees (ERT). Fifteen climatic and geo-environmental variables were used as inputs of the flash flood susceptibility models. The results showed that ERT was the most optimal model with an area under curve (AUC) value of 0.82. The rest of the models’ AUC values, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage for very high to moderate flash flood susceptible area was 582.56 km2 (28.33%), and the rest of the portion was associated with very low to low susceptibility zones. It is concluded that topographical and hydrological parameters, e.g., altitude, slope, rainfall, and the river’s distance, were the most effective parameters. The results of this study will play a vital role in the planning and implementation of flood mitigation strategies in the region.


Introduction
Floods are among the most destructive natural disasters [1]. The term flash flood can be defined as a phenomenon in which river water flows from its natural levees and causes inundation of the surrounding areas for a specific time [2]. It can be noted that flash floods are an unfavorable combination of different environmental parameters, i.e., meteorological, hydrological, geomorphological, and human intervention in the collapse of flash flood protection structures [3]. Over the last few decades, ongoing global climate change has been associated with an increase in the frequency and magnitude of global flash flood hazards. This is not the only reason why large-scale human intervention in the environment, such as forest ecosystems, e.g., deforestation, sedimentation in riverbeds, and riverbeds' encroachment by human settlements and dam construction along with unhealthy development of urbanization, are responsible for devastating flash floods. In recent times, the flash flood intensity pattern has changed due to a gradual increase in the global population, especially in developing countries [4][5][6]. Flash flooding can cause significant socio-economic damages, i.e., loss of human settlement, fatality, infrastructural damages, i.e., agricultural land, buildings, communications, i.e., roads, railways [7][8][9]. Therefore, it has been estimated that 31% of total global economic losses with $104 billion are caused by flood hazards, known as the most costly natural disaster, among others [10]. In 2010, the World Statistics Survey also revealed that more than 178 million people were largely affected by devastating flash floods [11]. Flash flooding is also responsible for approximately 20,000 deaths every year, with 75 million people becoming homeless [12]. Iran is among the countries most susceptible to the flash flood. The Kalvan watershed has been affected by flooding annually; studies show that the Kalvan watershed is vulnerable to flash floods.
Multiple factors are responsible for the occurrence of a destructive flash flood, such as high intensity of rainfall, the tendency to generate runoff, rate of the rainfall-runoff process, soil properties and infiltration rate, poorly maintained flow pattern of a river system, and land-use changes. The frequency of flash floods is an essential method of analyzing flood hazards by predicting future flash floods. Therefore, flash flood frequency is measured using different historical flash flood data, i.e., discharge, rainfall, runoff [13]. As a result of devastating flash flood damage, various types of structural and non-structural measures should be taken to mitigate and prevent flash floods in a sustainable manner. Flash flood is considered to be one of the most dynamic natural disasters. and measures need to be taken to minimize economic losses and adverse effects. One of such measures is mapping the susceptibility areas for flash floods [14]. Accurate flash flood modeling and flash flood susceptibility mapping (FSM) analysis is the main concern among scientists and governments around the globe [15]. Statistical regression and time series analysis has been used for flash flood modeling for large basin areas [16]. The Hydrologic Engineering Center's river analysis system (HEC-RAS) model [17,18] and MIKE model [19] are also used to predict spatio-temporal floods. The recognition of flood potential zones and the identification of different susceptible areas, i.e., high, moderate, low, are therefore necessary conditions for the mitigation and management of devastating flood hazards. In the last few decades, remote sensing and geospatial technology have been intensively used in flash flood susceptibility studies. This geospatial technology has been used for flash flood inundation mapping, site identification for flash flood shelter [13], and flash flood hazard assessment of tropical rainy areas [20]. Later on, different statistical methods in combination with geospatial technology were used for flash flood susceptibility analysis. Among these, multi-criteria decision analysis (MCDA), analytical hierarchy process (AHP), evidential belief function (EVF), etc. are notable methods. In recent decades, different types of machine learning (ML) models have received more attention among researchers throughout the world to predict flash flood susceptibility because of their high accuracy and capacity to handle complex input data structures. Different types of machine learning (ML) algorithms such as logistic regression (LR), artificial neural network (ANN), support vector machine (SVM), random forest (RF), boosted regression trees (BRT), generalized linear model (GLM), etc., along with different ensemble approaches, have been used to predict flash flood susceptibility. Extensive literature In recent times, studies have shown that hybrid or combined models have been used to evaluate flash flood susceptibility mapping rather than using a single ML model [24,25]. Thus, hybrid models are generally developed by integrating several statistical or ML models [26]. On the other hand, the ensemble methods were used to achieve the best performance with high predictive accuracy and were primarily developed by boosting or bagging more than one ML algorithm [25]. Due to the problem of overfitting in previous models such as random forest (RF) and decision tree [27], in this study, we used hybrid parallel and regularized methods to reduce errors in the RF model. In addition, the ensemble decision tree model including extremely randomized trees (ERT) was used in predicting flood susceptibility. In summary, the core aims of this study are: (i) Comparing hybrid parallel random forest (PRF), regularized random forest (RRF), and ERT with RF and boosted regression tree (BRT) as a benchmark model, (ii) prepare flash flood susceptibility maps based on hybrid and ensemble models, (iii) identifying most important variables on flash flood susceptibility in the Kalvan watershed.

Description of the Study Area
The Kalvan watershed is in the northwestern part of Markazi Province, Iran. The study area is located at 34°20′ and 34°50′ N and at 48°54′ and 49°29′ E and covers approximately 2056.75 km 2 . Elevations change from 1602 to 2660 m above sea level ( Figure 1). Based on the Meteorological Department of Markazi Province, the watershed receives an average annual rainfall of 299 mm. The climate is arid to semi-arid. The significant volume of precipitation in the watershed happens during January and February, and the highest recorded 24-h rainfall total was about 47 mm. The dominant land uses, or land covers (LU/LC) in the Kalvan region are agriculture, rangeland, orchard, bare land, rock land, and urban; agriculture accounts for the most considerable portion of the region ( Figure  3h).

Methodology
The methodology of the study is shown in the flowchart ( Figure 2). The steps followed below summarize the approach used.

Methodology
The methodology of the study is shown in the flowchart ( Figure 2). The steps followed below summarize the approach used.

III.
The multi-collinearity analysis was done among the flash flood susceptibility conditioning factors using the variance inflation factor (VIF) and tolerance (TOL) techniques. IV.
Random forest model hybrid with two algorithmic regularizations and parallel for flash flood modeling was used. V.
Extremely randomized trees such as an ensemble of a decision tree were used for flash flood modeling. VI.
Flash flood susceptibility maps were prepared using BRT, RF, parallel and regularization methods on random forest and ensemble for extremely randomized trees (ERT). VII.
Different flash flood susceptibility models' performances were validated through statistical indices along with receiver operating characteristic (ROC)-AUC analysis.

Dataset Preparation for Spatial Modeling
In this study, flash flood susceptibility locations were provided based on flash flood events that occurred and were recorded by the Department of Regional Water of Markazi Province. A total of 256 flash flood points were used in this study. In order to determine the non-ditch points, geographic information system (GIS) software was used, and 256 points were randomly selected. The digital elevation model (DEM) map was obtained with a pixel size of 12.5 m from the ALOSPALSAR sensor, and the slope map, direction curve, plan curvature, profile curvature in the GIS software environment were prepared based on the DEM. The map of the distance from the waterway and the distance from the road based on the Euclidean extension was obtained in GIS software. A drainage density map was prepared using line density extension. System for Automated Geoscientific Analyses Geographic Information System (SAGA GIS) software was used to map the stream power index (SPI) and topographic wetness index (TWI). The region's soil depth map was obtained based on the map prepared by the Administration of Natural Resources of Markazi Province. The lithological map was prepared based on the geological map of 1:100000 of the country's mapping organization. Land use maps were prepared based on Landsat satellite images, Operational Land Image (OLI) measurement, and using the maximum likelihood algorithm with a Kappa value of 0.87 in the environment for visualizing images (ENVI) software environment. The precipitation map of the area was prepared

I.
A total of 256 flash flood susceptibility points were recognized based on field visits and information of the Department of Regional Water of Markazi Province. II. Thirteen flash flood susceptibility conditioning factors were chosen for modeling FSM. III. The multi-collinearity analysis was done among the flash flood susceptibility conditioning factors using the variance inflation factor (VIF) and tolerance (TOL) techniques. IV. Random forest model hybrid with two algorithmic regularizations and parallel for flash flood modeling was used. V.
Extremely randomized trees such as an ensemble of a decision tree were used for flash flood modeling. VI. Flash flood susceptibility maps were prepared using BRT, RF, parallel and regularization methods on random forest and ensemble for extremely randomized trees (ERT). VII. Different flash flood susceptibility models' performances were validated through statistical indices along with receiver operating characteristic (ROC)-AUC analysis.

Dataset Preparation for Spatial Modeling
In this study, flash flood susceptibility locations were provided based on flash flood events that occurred and were recorded by the Department of Regional Water of Markazi Province. A total of 256 flash flood points were used in this study. In order to determine the non-ditch points, geographic information system (GIS) software was used, and 256 points were randomly selected. The digital elevation model (DEM) map was obtained with a pixel size of 12.5 m from the ALOSPALSAR sensor, and the slope map, direction curve, plan curvature, profile curvature in the GIS software environment were prepared based on the DEM. The map of the distance from the waterway and the distance from the road based on the Euclidean extension was obtained in GIS software. A drainage density map was prepared using line density extension. System for Automated Geoscientific Analyses Geographic Information System (SAGA GIS) software was used to map the stream power index (SPI) and topographic wetness index (TWI). The region's soil depth map was obtained based on the map prepared by the Administration of Natural Resources of Markazi Province. The lithological map was prepared based on the geological map of 1:100000 of the country's mapping organization. Land use maps were prepared based on Landsat satellite images, Operational Land Image (OLI) measurement, and using the maximum likelihood algorithm with a Kappa value of 0.87 in the environment for visualizing images (ENVI) software environment. The precipitation map of the area was prepared from the rainfall records of six climatological monitoring stations for the period of 28 years and based on the inverse distance weighting (IDW) interpolation method.
In this study, 13 geo-environmental variables that directly affect the flood process were chosen for the model setup and analysis. These variables or factors were altitude, slope, aspect, plan curvature, profile curvature, distance from river, distance from road, land use, lithology, soil depth, rainfall, stream power index (SPI), and topographic wetness index (TWI) (Figure 3a Here, the slope aspect map was classified into nine classes (Figure 3b). Slope was also an important factor for the FSM, as it has a major influence on the runoff pattern and the flow of water. As a result, flash floods in the lower angle slope area are more frequent than in the higher angle slope area. The slope map is shown in Figure 3c and the percentage of slope ranges from 0 to 159.23. Topographic characteristics of an area were basically understood by plan and profile curvature. Here, the plan and profile curvature value ranged from −10.32 to 10.73 ( Figure 3d) and −13.33 to 12.14 ( Figure 3e).
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 24 used to know hydrological processes over topographical controls of a particular area. High TWI values represent more flash flood-prone areas than low TWI values. The following equation has been used to calculate TWI [32]: where indicates a particular river basin area and indicates the slope gradient in degree. SPI and TWI variables were prepared in SAGAGIS 2.6 software.  The distance from the river was the most important factor in the flash flood susceptibility analysis. Flash flood frequency and magnitude were much high near the river and vice versa. Basically, the extension of flash flood and its magnitude largely depended on the distance from river [28]. The value of the distance from river map ranged from 0 to 11,582.3 m (Figure 3f). The distance from road network was also a critical flash flood conditioning factor. Roads are constructed based on a high-altitude area, so the pattern of water flow changes closer to the road as well, although high-intensity flash floods generally destroy the road structure. The distance from the road network map (Figure 3g) indicated that its value ranged from 0 to 12,806 m. The flash flood occurrence of an area largely depends on the land use/land cover (LULC) of that area. There was a negative correlation between the vegetation density and the occurrences of flash flood. Therefore, vegetation-prone areas had less runoff than the non-vegetation covered areas. The LULC map was prepared based on Landsat satellite images, OLI measurement, and the maximum probability algorithm in the ENVI software environment (Figure 3h). The present study was classified into six categories: i.e., agriculture, bare land, orchard, range, rock surface, and urban areas. Water percolation and its stagnation are largely dependent on the characteristics of the rocks. Therefore, lithology is an important parameter for the occurrence of flash floods in the area. The lithology of the present study area was classified into 12 classes (Figure 3i). The lithology map was provided based on the geological map of 1:100,000 of the Iranian National Cartographic Center (NCC). The significance of the soil type in the event of a flash flood is very high. Basically, the water storage capacity, percolation and permeability rate, and the drainage structure determined by soil types accelerate the area's flash flooding [29]. The soil types map (Figure 3j) of the present study area was classified into four categories, namely rocky outcrops/Entisols, rocky outcrops/Inceptisols, Aridisols, and Inceptisols. The soil type map was obtained from the Administration of Natural Resources of Markazi Province. Rainfall is a direct factor in the occurrence of flash floods and the intensity of rainfall is a major factor in the magnitude of the flash floods. The amount of rainfall varies from 229.23 to 365.85 mm (Figure 3k). The precipitation map of the Kalvan watershed was prepared from the data of 6 climatological stations around the case study with a statistical period of 28 years (1991-2019) based on the IDW method.
The SPI can be defined as the power of erosion and degree of water discharge of a particular area within a watershed. It is established that if the SPI value is high, the flash flood power will also be high, and vice versa [2]. The SPI values have been calculated by using the following equation [30]: where A i indicates a particular river basin area and tanβ indicates the slope gradient in degree. The SPI map of the present study area is shown in Figure 3l. TWI basically represents the moisture condition of the soil, water depth, and saturation zone of a specific topography [31]. TWI is widely used to know hydrological processes over topographical controls of a particular area. High TWI values represent more flash flood-prone areas than low TWI values. The following equation has been used to calculate TWI [32]: where A i indicates a particular river basin area and tanβ indicates the slope gradient in degree. SPI and TWI variables were prepared in SAGAGIS 2.6 software.

Multi-Collinearity Analysis
Different geo-environmental factors have been used to predict flash flood susceptibility mapping using various kinds of models. It is necessary among all of these conditioning factors to find out which two or more than two factors are highly correlated with each other. If there is a higher correlation between two or more conditioning factors than the model, it is less valid to evaluate and to automatically reduce the accuracy of the output result. Therefore, multi-collinearity analysis helps to identify these highly correlated factors. Basically, multi-collinearity is a statistical analysis among more than two variables and represents a linear dependency between these variables [33]. Thus, high multi-collinearity conditioning factors need to be removed from the models for better prediction of the result [34]. Generally, variance inflation factor (VIF) and tolerance (TOL) techniques have been used to analyze the multi-collinearity. Different literature studies have shown high multi-collinearity found among all of these factors which have VIF values >5 or 10 and TOL values of <0.10 or 0.20 [35,36]. The TOL and VIF of a multi-collinearity analysis can be calculating using the following equation: where R 2 j represents the regression value of j on other different variables in a dataset. BRT is a combination of statistical (egression trees) and machine learning (gradient boosting trees) techniques. Thus, BRT is also known as stochastic gradient boosting. BRT is an important data mining technique of a nonparametric method used to measure the association between dependent and independent variables [37]. It is very much used for the determination of independent variables and classification and forecasting analysis [38]. Among the two techniques used in BRT, boosting is used to improve model accuracy through appropriate new trees for residual error [39]. In addition to boosting, regression trees are used in BRT model to categorize the classification system from the decision tree groups in the model [40]. Generally, three parameters are needed for the optimization of a BRT model and these are a number of boosting tree iterations, interaction tree depth, and shrinkage [41]. The shrinkage is emphasized by the contribution of trees to the cultivated model. The parameter of the interaction tree depth is determined through the individual trees [42]. The function of BRT is basically based on the predictive variables X = {x 1 , . . . . . . x n } and a response variable y. The BRT model can be processed by using the training sample of y i , X i , i = 1, . . . N of known y and X values. We also wish to find out a function, i.e., F * (X), which generally maps X to y. Thus, Equation (5) is used to minimize the values of a loss function among all the values of (y, X) [43]: The following equation is used for gradient boosting approximates F(X) : where g(X; α m ) indicates a regression tree of a particular node, α m indicates the parameters of the tree, i.e., different splitting variables and split points, and β m indicates coefficients.
Finally, the BRT model is described by following equation given by Friedman in 2001 [43]: where h(X; a m ) indicates the classification function with α parameters and X variables, m represents the variable of the stage of the model, and β m indicates the coefficient in the stage of m.

Random Forest (RF)
The algorithm of RF was developed by Breiman in the year 2001 [44]. It is an ensemble classifiers system based on binary decision trees. It can easily handle a large number of variables and it is basically a statistically-based approach [45]. By using the training dataset in the RF model, it generates numerous trees based on random bootstrapping, i.e., a random subset of the original training data [44]. The random bootstrapped sample also allows data not included, i.e., the remaining fallow subset data known as out of the bag (OOB) data [46]. This OOB data is used to assess the general errors in the model. The RF model is also used to analyze the dynamic trends known to non-linear interactions between explanatory and response variables. Besides this, any kind of assumption does not need to establish the relationship among explanatory and response variables in this model. Therefore, the RF algorithm has been used to analyze hierarchical and non-linear interactions in the big dataset along with better prediction of new-fangled data cases [47]. The RF requires three parameters to be tuned, i.e., number of variables (mtry), number of trees (ntree), and utmost number of terminal Remote Sens. 2020, 12, 3568 9 of 23 nodes (nodesize) [48]. The algorithm of RF is based on tree-structured classifiers and has been shown as follows: h(x, i k ), k = 1, 2, . . . n (8) where i k represents flash flood occurrence conditioning factors; and 1, 2, . . . n are input vector x.
In an RF, the general errors can be defined as follows by Masetic and Subasi [49]: where x and y indicate the different flash flood occurrence conditioning factors, and mg represents the margin function. Again, the margin function can be described as follows:

Parallel Random Forest (PRF)
Traditional RF algorithms are not suitable for the analysis of big dataset. Therefore, the traditional RF algorithms for the analysis of large mass dataset parallelized design have been developed and are popularly known as PRF. The RF is generally an ensemble of different classification and regression tree (CART) decision trees and the implementation of parallel in traditional RF is better to analyze big datasets [50]. The RF model may be developed in parallel due to the ensemble of decision trees in the RF model. Basically, PRF is a modern version of RF based on decision tree computation [51]. In the real-time situation, RF was parallelized to minimize the execution time and to produce RF predictive output faster than that of the previous one [52]. In the PRF analysis, at first RF was divided into a variety of sub-forests. After that, these sub-forests were created in parallel and finally, all these sub-forests joined together in a larger forest at the core of the process. In PRF, massive dataset analysis is done in two ways: The first is the map, i.e., it creates different key value pairs in which the key indicates the index of a specific data value, and the second is the reduce, i.e., the analysis of key value pairs generated by the map function and the output of the final result [50].
The training dataset in the PRF model has been divided into different subsets of features due to splitting. Let us consider that the training dataset size is represented in S which is N x M, then independents variables represent x 0 , x 1 , . . . x m−2 and dependent variables are x m−1 in that dataset. Later on, the dataset may be split into an (M − 1) feature subset. After that, each and every subset of features is loaded into the different data node. Finally, each subgroup of features is handled by a map mission, and trees are created in parallel.

Regularized Random Forest (RRF)
The RRF was generally developed for feature selection with one ensemble method [53] rather than several ensemble methods [54]. In this model, every element of the training dataset was analyzed at every tree node. It is also known that the process of selecting the features of the RRF model is greedy. In the RF model, the tree regularization method used to develop RRF could choose the compression function subset of the model [55]. Basically, the RRF algorithm has been developed in the way of RF, but the major distinction is that in RRF, the regularized information gain, i.e., Gain R (X i , v), is used.
where F represents the set of features used for splitting the tree nodes, λ ∈ [0, 1] represents penalty coefficient, and i F represents the coefficient penalized for the ith feature of splitting node v. Here λ represents larger penalty. In RRF, minimum regularization occurs when λ = 1 and it is referred to as RRF (1). In RRF (1), the selection of feature subset is known as the least regularized subset, which indicates minimum regularization from RRF.

Extremely Randomized Trees (ERT)
The ERT model, also known as extra trees (ET), is a set of techniques based on a tree-based decision-making model. This is a specific randomization method proposed by Geurts et al. [56]. This model develops a number of regression trees or decision trees from the overall dataset [57]. The main difference between the RF and the ERT is that the ERT emphasizes randomness during training [58]. Randomization in the tree diversity helps to minimize the correlation; it means decision trees become more independent. In the ERT model, each tree node is randomly divided by the variable index and the splitting value. The main fundamental principle in the ERT model is the use of different decision trees, which are basically individually fragile learners, but when they are combined they become extremely robust learners [59]. The ERT algorithm is constructed on the basis of three parameters, i.e., the number of set decision trees (M), the number of randomly selected features (K), and the number of instances required to split the node (n min ). The training dataset in the ERT model, . . , f d is a D-dimensional vector and f j represents the feature and j ∈ {1, 2, . . . D}, and finally ET creates M-independent decision trees for the model.
The ERT model gives more accuracy and a superior result than the RF because this model removes discretization threshold values through optimization [56]. This model also has an important advantage of minimum computing times and can be easily implemented.

Methods of Validation and Accuracy Assessment
The validation and accuracy assessment of FSM using the machine learning model is very much necessary to evaluate the predictive result. Therefore, in this study, different statistical indices along with area under receiver operating characteristic (AUROC) curves were used to evaluate the five machine learning output results. In statistical indices, sensitivity (SST), specificity (SPF), positive predictive values (PPV), and negative predictive values (NPV) were used. If the result of these statistical indices showed a higher value, then every machine learning model gave a better result and vice versa [60]. The four statistical indices which were used in this study can be calculated by following equations where TP represents true positive, TN represents true negative, FP represents false positive, and FN represents false negative. On the other hand, the standard tool widely used for model validation and accuracy assessment is ROC-AUC. The ROC curve plotted on the X and Y axis is popularly known as sensitivity and 1-specificity. This X and Y axes represent true positive and false positive in the graph and the optimum value in both cases is 1 [61]. The ROC-AUC range varies from 0.5 to 1, indicating poor performance to excellent model validation performance. The ROC-AUC has been computed by using following equation: where S AUC is the area under curve, X k is the 1-specificity, and S k is the sensitivity of the receiver operating characteristic (ROC) curve.

Multi-Collinearity Analysis
For this analysis, the multi-collinearity test of 13 flash flood causative factors was done considering the VIF and TOL limit ( Table 1). The range of VIF was about 1.07 to 2.44, and the highest and lowest values of VIF were associated with slope and aspect. In the case of TOL, the range was about 0.41 to 0.93. The highest and lowest values of the TOL limit were associated with aspect (0.93) and slope (0.41). There was no such problem of multi-collinearity in the selected variables for estimating the flash flood susceptibility. Therefore, all 13 explanatory variables were used for modeling the flash flood susceptibility in the present study area.

Flash Flood Susceptibility Modeling
In the BRT model, the areal coverage of very high to high flash flood susceptible areas was 425.13 and 683.91 km 2 , respectively, and these zones were mainly located in the middle portion of the watershed ( Table 2). The rest of the portion of this watershed was associated with moderate, low, and very low susceptible zones, and the areal coverage of these zones was 442.73 (21.53%), 404.28 (19.66%), and 100.70 km 2 (4.90%), respectively (Figure 4a).

Evaluation of Parameters
The results of parameter evaluation versus error rate (RMSE) in the two RRF and ERT models are shown in Figures 5 and 6. Based on Figure 5, it was found that in the RF model, the optimal amount of regularization was 0.01 and the mtry number 2 had the least error in flood modeling. In addition, the results of Figure 6 showed in the ERT model the optimal number of random cut 2 and the number of mtry 7 were determined with the least error in order to model the flood. In the PRF model, the areal coverage of very high, high, and moderate flash flood susceptible areas was 239.72 (11.66%), 396.72 (19.29%), and 461.12 km 2 (22.42%), and these susceptible zones were located mainly in the eastern, middle, and northern portions of the watershed. Apart from this, the rest of this region was associated with low to very low flash flood susceptible zones, and the areal extent of these zones was 440.96 (21.44%) and 518.23 km 2 (25.20%), respectively (Figure 4c).
In the case of the RRF model, the areal coverage of very high, high, and moderate flash flood susceptible areas was 247.98 (12.06%), 373.75 (18.17%), and 466.56 km 2 (22.68%), and these susceptible zones were located mainly in the eastern, middle, and northern portions of the watershed. Apart from this, the rest of this region was associated with low to very low flash flood susceptible zones, and the areal extent of these zones was 359.57 (17.48%) and 608.89 (29.60%), respectively (Figure 4d).
In the ERT model, the very high, high, and moderate flash flood susceptibility zones were associated mainly in the northern, eastern, and middle portions of the watershed, and the areal coverage of these zones was 249.84 (12.15%), 332.72 (16.18%), and 391.51 km 2 (19.04%), respectively. The rest of the part of this watershed was associated with low and very low flash flood susceptible areas, and the areal coverage of these regions was 431.14 (20.96%) and 651.54 km 2 (31.68%), respectively (Figure 4e).

Evaluation of Parameters
The results of parameter evaluation versus error rate (RMSE) in the two RRF and ERT models are shown in Figures 5 and 6. Based on Figure 5, it was found that in the RF model, the optimal amount of regularization was 0.01 and the mtry number 2 had the least error in flood modeling. In addition, the results of Figure 6 showed in the ERT model the optimal number of random cut 2 and the number of mtry 7 were determined with the least error in order to model the flood.

Validation of the Models
The validation of all the models was done with the help of the AUC values from ROC and different statistical indices. Considering the AUC, the best model was ERT and the AUC of this model was 0.82. Apart from this, the AUC values of the rest of the models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively (Figure 7). The AUC value for this perspective was estimated

Validation of the Models
The validation of all the models was done with the help of the AUC values from ROC and different statistical indices. Considering the AUC, the best model was ERT and the AUC of this model was 0.82. Apart from this, the AUC values of the rest of the models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively (Figure 7). The AUC value for this perspective was estimated

Validation of the Models
The validation of all the models was done with the help of the AUC values from ROC and different statistical indices. Considering the AUC, the best model was ERT and the AUC of this model was 0.82. Apart from this, the AUC values of the rest of the models, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively (Figure 7). The AUC value for this perspective was estimated considering the true positive (TP) and false positive (FP) values of susceptibility modeling. The vertical axis and horizontal axis of this curve were representing the TP and FP values of susceptibility modeling. TPs were pixels which were correctly estimated to be susceptible to flash flooding and, otherwise, FPs were pixels which were incorrectly estimated to be susceptible to flash flooding. Seventy per cent of the overall data were considered to be model or training and the remaining thirty per cent was considered for validation purposes. The higher AUC values represented the higher accuracy, and vice versa. Here, all the models were associated with higher accuracy, but the ERT model was the most optimal for predicting the flash flood susceptibility. Apart from these, different statistical indices (i.e., sensitivity, specificity, PPV, and NPV) also indicated the same characteristics of all the predicted models (Table 3).
considering the true positive (TP) and false positive (FP) values of susceptibility modeling. The vertical axis and horizontal axis of this curve were representing the TP and FP values of susceptibility modeling. TPs were pixels which were correctly estimated to be susceptible to flash flooding and, otherwise, FPs were pixels which were incorrectly estimated to be susceptible to flash flooding. Seventy per cent of the overall data were considered to be model or training and the remaining thirty per cent was considered for validation purposes. The higher AUC values represented the higher accuracy, and vice versa. Here, all the models were associated with higher accuracy, but the ERT model was the most optimal for predicting the flash flood susceptibility. Apart from these, different statistical indices (i.e., sensitivity, specificity, PPV, and NPV) also indicated the same characteristics of all the predicted models (Table 3).

Importance Value
The importance of the variables of all the predicted models was estimated and is shown in Table  4. In the BRT model, the most important variables for predicting flash flood susceptibility were

Importance Value
The importance of the variables of all the predicted models was estimated and is shown in Table 4. In the BRT model, the most important variables for predicting flash flood susceptibility were distance from river (100), rainfall (94.97), altitude (87.36), and slope (86.22). In the case of the RF model, the dominating variables for flash flood susceptibility were rainfall (100), altitude (81.89), and distance from river (75.03). In the PRF model, the most importance variables for flash flood susceptibility assessment were rainfall (100), altitude (79.53), and distance from river (74.81). In the RRF model, the dominating variables for predicting the flash flood susceptibility were rainfall (100) and distance from river (94.21), respectively. In the case of the ERT model, the most importance variables for generating the flash flood susceptibility model were distance from river (100), rainfall (94.97), altitude (87.36), and slope (86.22), respectively (Table 4). Other variables in all the predicted models were associated with moderate to lower importance for flash flood susceptibility assessment.

Discussion
In this study, various machine learning algorithms (i.e., BRT, RF, PRF, RRF, and ERT) were considered for estimating the flash flood susceptible areas with optimal accuracy. In most of the research work related to this field, the focus was on the use of various appropriate machine learning algorithms to estimate the effectiveness of susceptibility modeling. The application of machine learning and artificial intelligence techniques not only saved time and were less expensive, but were also associated with significant accuracy. For this reason, our main objective of this study was to identify the most efficient machine learning algorithm for estimating the flash flood susceptibility in a semi-arid environment. In this outcome, the ERT model (AUC = 0.82) was most optimal according to its predictive capacity, considering the values of AUC and different statistical indices, though there was a slight variation among the predicted models and its associated AUC values. This method created an ensemble of extremely randomized trees for the determination or estimation as per the traditional technique. Its primary differences from the other tree-based learning algorithms were that it divided clusters by randomly selecting cut-points, and it utilized the entire learning dataset to grow trees [56]. Forecasts of trees were also compiled to achieve the best estimation by most of the supports on supervised learning and mean estimation in the case of regression. From the perspective of bias-variance, a justification for the extra trees model was that the implied randomization of the cut-point and assign mixed with the ensemble average had to be able to minimize the deviation quite positively than lesser randomization strategies considered by different techniques. So far, many studies have proven the high ability of the ERT model in various fields of study. Zhou et al. [62] used extremely randomized trees for image classification for malware detection in comparison with KNN and RF, which showed the results were quite impressive with high accuracy rate. Eslami et al. [63] applied the ERT model to air quality forecasting and showed the high efficiency of the ERT model. Regarding the comparison of the ERT model with the random forest model, it should be noted that this model created a large number of trees and divided the nodes using random subset features the same as the random forest, but they had two main differences. In the ERT model, each tree node was randomly divided by the variable index and the splitting value. The ERT model's main fundamental principle was the use of different decision trees, which were individually fragile learners, but when they were combined, they became extremely robust learners [59]. Although this method has been used in other fields of sciences, such as land cover classification [64] and modeling of daily lake surface water temperature [65], the effectiveness of this model in studying natural disasters, especially flash floods, has not been proven. The key aspect is using more sophisticated approaches like the ERT model to consider flash floods because the relationship between flood occurrence and its causes is not linear and requires very strong and complex models [66].
The importance of hydrological and topographical factors is more optimistic in flash flood susceptibility modeling. In the semi-arid environment, flash floods are one of the common natural hazards and have a serious impact on society. The hydrological factors like rainfall and river distance are the most influential elements in flash flood susceptibility. Apart from this, the topographical elements such as altitude and slope are the most important elements of susceptibility to flash flooding. There are different types of flash flood susceptibility research, which suggests the importance of topographic and hydrological elements as the most influential factors for flash flood susceptibility [18,60,67,68]. In the present study, the most important variables for predicting flash flood susceptibility are the distance from river, rainfall, altitude, and slope. One of the factors considered in flood vulnerability studies is the distance from the river. The areas near to river and stream are more sensitive to flooding and this is a fact that many studies have proved [26,69,70]. Our study showed a high correlation between areas close to the river and flood sensitivity, which was consistent with other studies. Rainfall is a main influencing factor in flood susceptibility mapping, which has been considerably observed in the other studies [71]. Among the topographic variables, slope has the greatest effect on the amount of surface runoff and, therefore, is known as the most critical influential variable in many flash flood studies [71][72][73]. Costache [74] showed that slope is the most crucial factor affecting the flash flood distribution in the Prahova river catchment, Romania. Altitude is one of the most important factors in flash flood modeling in this study area. According to the prepared flood susceptibility maps, it is clear that low-altitude areas, which are usually close to the riverbank, are affected by floods due to low slope and proximity to the river during floods. Various researchers have introduced the altitude factor as one of the factors affecting floods and have confirmed that low altitude areas are more sensitive to flooding [2,75,76].
Natural hazards such as flash floods have a severe impact on people and their livelihoods and the associated infrastructure [11,22]. This type of natural hazard cannot be prevented entirely, but its impact can be minimized by implementing appropriate strategies [77]. For this purpose, the modeling of flash flood susceptible areas with the incorporation of proper strategy is an essential part of monitoring and managing this hazard [78]. On the other hand, efficient modeling and mapping can minimize the severe impact of flash floods and reduce the chance of destroying people's livelihoods, economy, and infrastructure [79]. Modeling flash floods' susceptibility is one type of preparedness to reduce the impact of flash floods by implementing appropriate strategies [15]. However, the river basin level's current outcome appears to help flash flood control officials once again. For this more comprehensive scale, including a river field, it is suggested that an integrated structural design be created that could more effectively consider the impact of anthropocentric variables to determine flash flood events for projected water discharge of varying probability. The main task of the future research is to develop the most suitable algorithm for creating such a type of hybrid models which could predict the scenario in a maximum optimal level after incorporating the lesser number of variables. The use of deep learning approaches due to their high accuracy is one of the solutions proposed for the future of natural hazard studies, especially flash floods studies.

Conclusions
This study was conducted to estimate the flash flood susceptibility of the Kalvan watershed in Iran. This region is generally associated with a semi-arid environment and very much prone to flash floods. Thus, the assessment of this type of hazard was necessary by incorporating a suitable algorithm for reducing the damages from it. For this purpose, we considered five hybrid parallel and regularized approaches to estimate the flash flood susceptibility in a more authentic way and with maximum possible accuracy. In this analysis, the most optimal model was ERT with an AUC value of 0.82. The rest of the models' AUC values, i.e., RRF, PRF, RF, and BRT, were 0.80, 0.79, 0.78, and 0.75, respectively. In the ERT model, the areal coverage for a very high to moderate flash flood susceptible area is 582.56 km 2 (28.33%), and the rest of the portion was associated with very low to low susceptible zones. Therefore, in order to avoid this kind of scenario, careful consideration and correct approaches need to be taken in this area. In the proposed susceptibility modeling, the importance of the topographical and hydrological parameters like altitude, slope, rainfall, and distance from the river were more effective than the other parameters considered in this study. Plan curvature, profile curvature, SPI, land use, etc. were associated with lower importance on flash flood susceptibility assessment. This outcome's main novelty was the application and development of hybrid parallel and regularization approaches for estimating the flash flood susceptibility in a semi-arid environment. These models can be applied in any climatic condition and any type of susceptibility assessment. This type of outcome can help the regional planers and local administrators implement the development strategies for escaping this type of situation.