Flash Flood Susceptibility Modeling and Magnitude Index Using Machine Learning and Geohydrological Models: A Modified Hybrid Approach

In arid regions, flash floods (FF), as a response to climate change, are among the most hazardous events, causing massive destruction and losses to farms, human lives and infrastructure. A first step towards securing lives and infrastructure is mapping FF susceptibility and predicting the sites where FF occur. Several studies have applied ensemble machine learning models (EMLM), but measuring FF magnitude using a hybrid approach that integrates machine learning (MCL) and geohydrological models has not been widely applied. This study aims to modify a hybrid approach by testing three machine learning models, namely boosted regression tree (BRT), classification and regression trees (CART), and naive Bayes tree (NBT), for FF susceptibility mapping in the northern part of the United Arab Emirates (NUAE). This is followed by applying a group of accuracy metrics (precision, recall and F1 score) and the receiver operating characteristic (ROC) curve. The results demonstrated that the BRT has the highest performance for FF susceptibility mapping, followed by the CART and NBT. The FF map produced using the BRT was then modified by dividing it into seven basins, and a set of new FF conditioning parameters, namely alluvial plain width, basin gradient and mean slope, was calculated for each basin to measure FF magnitude. The results showed that the mountainous and narrower basins (e.g., RAK, Masafi, Fujairah, and Rol Dadnah) have the highest probability of FF occurrence and the highest FF magnitude, while the wider alluvial plains (e.g., Al Dhaid) have the lowest probability of FF occurrence and the lowest FF magnitude. The proposed approach is an effective way to improve susceptibility maps of FF, landslides, land subsidence, and groundwater potential obtained using ensemble machine learning, which is widely used in the literature.


Introduction
Flash floods are a temporary overflow of rivers or valley plains as a natural response to unusually heavy rains. They can cause damage to infrastructure and human life [1,2]. FF occur frequently in narrow mountain valleys (wadis), on alluvial fans at the foot of mountains, and in narrow coastal areas, as a response to climate change and intensive rainfall over impermeable surfaces [3,4]. Globally, about one-third of the Earth's surface (where more than 70% of the world population resides) frequently experiences flash flooding [5].
The UAE, including the study area, has not escaped this natural hazard, since it experiences several flash floods on a regional scale. The northern part of the UAE recorded huge amounts of rain between 9 January and 12 January 2020. The heaviest rainfall was 24 years ago in Khor Fakkan with 144 mm (5.66 inches) of accumulated rainfall (https://www.ncm.ae). In Ras Al Khaimah (RAK), one woman was crushed to death after a wall collapsed during a violent storm.
In the Ghalilah and Al Fahlain villages of RAK, flash floods destroyed roads and farms and flooded the village graveyard (Figure 1). Away from the mountainous areas, the cities of Sharjah and Dubai have experienced severe floods that inundated roads and vital areas such as Terminal 1 of Dubai International Airport, shopping malls and Jabal Ali (https://www.ncm.ae). Flash flooding events depend on several terrain and geohydrological parameters such as alluvial plain width, mountainous valley width, altitude, topographic slope, topographic curvature, stream density, topographic relief, the angle of repose and, of course, the intensity of rainfall. The angle of repose or talus slope ranges between 25° and 40°, depends upon the nature and type of the rocks, and is directly proportional to the flash flood magnitude [6].
Remote Sens. 2020, 12, x FOR PEER REVIEW
These consequences can be controlled or, at least, reduced by constructing a regional and precise susceptibility map and analysis [7] and by calculating the angle of repose or talus for each hydrological basin. Thus, building an accurate geohazard model and measuring flash flood magnitude over a regional scale is one of the important tasks of researchers and decision-makers [8]. Susceptibility can be defined as a prediction of where a future hazardous event is likely to occur [9,10]. The wide availability of free remote sensing data and machine learning algorithms has allowed researchers to map susceptibility and predict flash floods over a regional scale efficiently and economically [11-14].
Most of these studies have focused on susceptibility mapping of FF using ensemble machine learning or a comparative assessment of machine learning algorithms. However, these studies have not considered FF conditioning parameters (FFCPs) such as alluvial plain width, valley width and basin slope. Additionally, the magnitude of FF has not been taken into consideration. This study aims to modify a hybrid integration approach for flash flood susceptibility mapping in an arid region. Here, we first compared the BRT, CART, and NBT models for FF susceptibility mapping for the first time. The best FF susceptibility map was chosen and then modified by dividing it into seven basins. Each basin has its own FF magnitude. The FF magnitude was calculated using four new FFCPs, namely alluvial plain width, valley width, basin gradient and mean slope. The proposed approach represents a step forward in refining predicted maps of FF, landslides, land subsidence and groundwater potential produced using machine learning models. The modified approach can be of great help to risk management specialists and geohazard prevention scientists.

Study Area
The study area stretches from longitude 54° [...] (Figure 2). Most of the built-up area is concentrated on coastal strips and waterfronts such as creeks and artificial lakes, while the agricultural area is limited to the alluvial plains, wherever rainfall and paleochannels (wadis) are found.
The area is characterized by narrow alluvial coastal plains in the north-western and eastern parts of the study area, with a width ranging from 2 to 5 km and reaching a maximum at Falahyeen and Al Dhaid villages (No. 9 and 19 in Figure 2). Lithologically, the upper streams (mountainous areas) are dominated by igneous and metamorphic rocks in the east and carbonate rocks in the north, with alluvial deposits at the foot of the mountainous areas [13]. The climate varies from hot and humid in summer to warm in winter (Figure 3a). The annual rainfall varies from 30 mm in the south-eastern desert near the city of Dubai to 180 mm in the mountainous areas in the north and east [37,38]. The maximum number of rainfall days over the study area is four to six days per month during the period from December to March (Figure 3b). The maximum daily precipitation value is 1.2 mm during March (Figure 3c) (Giri and Singh 2015). The estimated annual rainfall over the mountainous and coastal areas was about 97% of total rainfall over the NUAE [38]. Hydrologically, the area comprises carbonate, ophiolite, coastal, and alluvial aquifers. The aquifers are drained by several surface wadi courses, which commonly trend in the NW-SE, NNW-SSE, NE-SW and NNE-SSW directions [39,40]. These features play an important role in flash floods by accumulating rainwater from upstream and destroying houses and farms downstream [39].

Datasets and Methodology
The proposed approach can briefly be described as the following steps: (i) constructing a flash flood inventory map (dependent variable), (ii) constructing flash floods conditioning parameters (independent variables), (iii) spatially analyzing the relationship between each conditioning parameter and flash flood events, (iv) optimal parameterization and flash flooding susceptibility mapping, (v) evaluating the performance and assessing the accuracy of machine learning models, and (vi) dividing the area into seven basins and calculating flash floods magnitude for each basin. A flowchart of the methodology adopted in the current study is shown in Figure 4.

Construction of Flash Floods Inventory Map (FFIM)
The FFIM is an excellent indicator for FF susceptibility mapping. Here we used several sources, including Google searches, the Google Earth application, and local newspaper and weather reports. These reports were collected and downloaded via the webpage of the National Centre of Meteorology (https://www.ncm.ae/Radar_UAE_Merge). Since 1990, 61 flash flood events have been reported across the study area, and the most severe event happened between 9 and 12 January 2020 with 144 mm (5.66 inches) of rainfall.
Most of the FF locations were reported in the mountainous valleys, narrow alluvial coastal plains and alluvial fans at the foot of the mountainous areas (Figure 2). These FF locations were used to investigate the spatial relationship between flash flood conditioning parameters and flash flood occurrence, to train the machine learning models, and to evaluate the performance and assess the accuracy of the three models.

Construction of FFCPs
This study aims to map the susceptibility of flash floods and measure their magnitudes in an arid mountainous region with a minimum number of essential FFCPs, to reduce errors and computational time and to enhance the performance of the BRT, CART and NBT models [41,42]. The FFCPs were chosen based on their degree of influence on FF occurrence and are grouped as terrain and geohydrological parameters. The terrain parameters include altitude, topographic slope, relief and topographic minimum curvature, while the geohydrological parameters include lithology, stream network (wadi courses), stream density, and distance from stream courses (Figures 5 and 6). Thematic maps of FFCPs such as altitude, topographic slope, topographic relief, topographic curvature, and stream networks (wadi courses) were generated from the ALOS DEM with a spatial resolution of 30 m, using the raster surface tools of 3D analysis and the hydrology tools of spatial analysis implemented in ArcGIS v.10.2 software. First, maps of altitude, slope, relief and topographic curvature were calculated by importing the 30 m DEM, converting it into a raster grid and applying the raster surface tools to the grid. Altitude and relief range from 100 m to 1800 m (m.s.l.); the slope map was classified into five classes: (i) 0°-5°, (ii) 5°-15°, (iii) 15°-30°, (iv) 30°-60°, and (v) >60°; and curvature ranges from −200 to 50. Second, the stream network was derived from the DEM using the D8 algorithm implemented in the hydrology tool. The algorithm starts by filling gaps (central pixels with no data) and determines into which neighboring pixel any water in a central pixel will flow. After that, the flow direction and downhill slope from a central pixel to one of its eight neighbors were calculated. Then, flow accumulation was calculated, followed by deriving the major stream networks using a threshold value of 45 [14]. This value was optimal to reveal the major stream networks in the study area.
After that, drainage basins were delineated using the calculated flow direction theme. Third, the distance from stream networks and the density of the stream network were constructed using the distance and density tools of the spatial analyst implemented in ArcGIS.
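The D8 step described above can be illustrated with a minimal sketch. It is not the ArcGIS implementation: it assumes a small, already filled DEM stored as a list of lists, hypothetical function names (`d8_flow_direction`, `flow_accumulation`), and the common convention that the drop to a diagonal neighbor is divided by a longer distance than the drop to a cardinal neighbor. The elevation values are invented for illustration.

```python
import math

# Offsets (row, col) of the eight neighbours considered by D8.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
              (0, -1),           (0, 1),
              (1, -1),  (1, 0),  (1, 1)]


def d8_flow_direction(dem, cell_size=30.0):
    """For each cell, return the (drow, dcol) offset of the neighbour with
    the steepest downhill slope, or None if no neighbour is lower."""
    rows, cols = len(dem), len(dem[0])
    directions = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope, best_offset = 0.0, None
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    continue  # neighbour lies outside the grid
                # Diagonal neighbours are farther away than cardinal ones.
                distance = cell_size * math.hypot(dr, dc)
                slope = (dem[r][c] - dem[nr][nc]) / distance
                if slope > best_slope:
                    best_slope, best_offset = slope, (dr, dc)
            directions[r][c] = best_offset
    return directions


def flow_accumulation(directions):
    """Count, for each cell, how many other cells drain through it."""
    rows, cols = len(directions), len(directions[0])
    acc = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cr, cc = r, c
            # Follow the flow path downstream until a pit is reached.
            while directions[cr][cc] is not None:
                dr, dc = directions[cr][cc]
                cr, cc = cr + dr, cc + dc
                acc[cr][cc] += 1
    return acc


# Hypothetical 3 x 3 DEM (metres) that drains towards the lower-right corner.
dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 1]]
directions = d8_flow_direction(dem)
accumulation = flow_accumulation(directions)
print(accumulation)
```

A stream mask would then follow by keeping the cells whose accumulation reaches a chosen threshold; the value of 45 used in this study applies to its own DEM and would not transfer to the toy grid above.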

Spatial Analysis
Altitude and topographic slope are the most important conditioning parameters for FF occurrence, as they control water flow, flow direction, surface runoff and infiltration rate [25,42]. Sites at a lower altitude have a higher probability of FF because water flows down from the upper streams [43]. The topographic slope has a crucial influence on surface water flow, flow direction, runoff, infiltration rate and FF occurrence. As topographic slope increases, runoff potential increases, resulting in FF [44]. Topographic curvature has a similar influence on FF occurrence. Sites with negative curvature values are zones of water accumulation and thus have a higher probability of FF occurrence, while sites with positive curvature values are zones of water dispersion and thus have a lower probability of FF occurrence [25]. Lithology and its physical characteristics (e.g., porosity and permeability) strongly influence infiltration rate, runoff potential, stream network distribution, and thus FF occurrence [29]. Other FF conditioning parameters such as stream density and distance from streams also play a significant role in FF occurrence. As the distance from streams decreases, the probability of FF occurrence increases [45]. Factors such as aspect, land use/land cover (LULC), NDVI, topographic wetness index and the erosion power index are secondary parameters that introduce bias and error during the modeling process and can be ignored [12,46,47]. The FFCPs were chosen based on the geoenvironmental characteristics of the study area and are widely used in the literature. These parameters can help in distinguishing flash flood-affected areas from the surrounding areas, since flash flood occurrence varies greatly with the intensity of rainfall, altitude, slope and stream network [48,49].

Boosted Regression Tree (BRT)
The BRT is an ensemble technique and differs statistically from traditional methods. The BRT combines machine learning and statistical techniques designed to improve the accuracy and performance of a single model by fitting a group of models and then combining them for classification and prediction [50]. The BRT model merges regression trees from classification and regression trees (CART) with boosting techniques to produce a combined model. Boosting is a technique designed to enhance the performance of regression trees, similar to model averaging [51]. However, the BRT implements a stepwise process, where the models are fitted to a subset of the training dataset. The subset used at every iteration of the model fit is chosen at random without replacement.
The shrinkage parameter or learning rate determines the contribution of each tree to the growing model, while the number of nodes in a tree (tree complexity) decides whether interactions are fitted [52]. Together, these parameters determine the total number of trees required for prediction [53].
Elith et al. (2008) [53] described the model as the following steps:
1. Initialize the weights to be equal, $w_i = 1/n$.
For $m = 1$ to iter, with classifier $C_m$:
2. Fit classifier $C_m$ to the weighted data.
3. Compute the weighted misclassification rate $r_m$.
4. Set the classifier weight.
6. Classify by majority vote.
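The steps listed above read like the discrete AdaBoost procedure, and a minimal sketch under that assumption may help to fix ideas. The formulas for the classifier weight and the re-weighting step are not given in the text, so the standard AdaBoost forms are assumed here: a classifier weight of log((1 − r_m)/r_m) and a multiplicative increase of the weights of misclassified samples. The weak learner is a one-dimensional decision stump, and all function names and data are hypothetical. This is not the gradient boosting procedure fitted by packages such as gbm.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, n_iter=3):
    """Labels must be +1 / -1. Returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n                        # equal initial weights
    ensemble = []
    for _ in range(n_iter):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        rate = err / sum(weights)                  # misclassification rate r_m
        rate = min(max(rate, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = math.log((1 - rate) / rate)        # assumed classifier weight
        ensemble.append((alpha, threshold, polarity))
        # Assumed re-weighting: misclassified samples gain influence.
        new_weights = []
        for w, x, y in zip(weights, xs, ys):
            wrong = stump_predict(x, threshold, polarity) != y
            new_weights.append(w * math.exp(alpha) if wrong else w)
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote over all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can classify perfectly.
xs = [1, 2, 3, 4]
ys = [1, -1, 1, 1]
model = adaboost(xs, ys, n_iter=3)
print([predict(model, x) for x in xs])
```

On this toy dataset a single stump misclassifies one sample, whereas the weighted vote of three stumps reproduces all four labels, which is the behavior boosting is designed to achieve.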

Classification and Regression Trees (CART)
The CART is one of the most common algorithms for the classification of data. It is resistant to missing data, and its variables do not need to have a normal distribution [51,54]. It is a binary recursive partitioning procedure capable of processing continuous and nominal attributes as targets and predictors and was developed by Friedman (1975) [55], Breiman (1984) [56], and Breiman and Stone (1978) [57].
The algorithm has been successfully applied in medical applications to predict the value of a dependent variable based on the different values of independent variables [58], economics applications [59], photogrammetry [60], environmental protection [61], food science and chemistry [62,63], landslide susceptibility mapping [64], and groundwater potential mapping [65]. Classification trees are used when the target variable is categorical, while regression trees are used when the target variable is continuous and its value is to be predicted (Figures 5 and 6). The CART algorithm is designed as a sequence of trees whose ends are terminal nodes. It consists of three elements: (i) rules for splitting data at a node based on the value of one variable, (ii) stopping rules for deciding when a branch is terminal and can be split no more, and (iii) a prediction for the target variable in each terminal node (Figure 7). The major problem in building a useful tree is finding the proper guidelines to prune the tree.
At the first stage, classification is created and leads to producing a tree with several branches. The number of branches of any tree depends on the degree of dispersion of the data. The size of the tree depends on specific parameters such as the minimum population in the successive nodes, the minimum population of children, the maximum number of levels and the maximum number of nodes [51]. It is worth noting that there is no relationship between the size of the tree and the accuracy of classification. Correct classification can be achieved by reducing the overfitting of the training set.
The pruning phase starts by generating the largest possible tree and then reduces the total number of leaves, which tends to increase the accuracy of classification. The final phase is the selection of a tree with a lower number of misclassifications and a higher accuracy. This higher accuracy can be realized with the application of cross-validation using Equation (3), where $y_i$ is the number of points in the testing set (real variable), $x_i$ is the number of points in the testing set (variable classified with the model $d$), and $N$ is the number of cases in the testing set. The results of the predicted model were evaluated using a set of testing samples. The cross-validation measure $R_\alpha(T)$ is a linear dependence between the complexity of the tree and the cost of misclassifications, Equation (4) [51]:
$$R_\alpha(T) = R(T) + \alpha |T| \qquad (4)$$
where $R_\alpha(T)$ is the cost-complexity measure, $R(T)$ is the cost of misclassifications, $|T|$ is the complexity of the tree, measured as the number of terminal nodes in the tree, and $\alpha$ is a parameter of tree complexity (it assumes values from 0 for a maximal tree to 1 for a minimal tree). The produced regression rule set was then applied to all FFCPs to map flash flood susceptibility. It is worth noting that the dependence between the complexity of the tree and the accuracy of classification should be taken into consideration. A low complexity of the tree usually leads to a low accuracy of classification.
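As a rough illustration of how the cost-complexity measure trades misclassification cost against tree size, the sketch below assumes the usual form R_α(T) = R(T) + α|T| and selects, from a set of candidate pruned trees, the one with the lowest value. The candidate costs and sizes are invented for illustration, and `select_tree` is a hypothetical helper rather than part of any package used in this study.

```python
def cost_complexity(misclassification_cost, n_terminal_nodes, alpha):
    """Assumed form of the measure: R_alpha(T) = R(T) + alpha * |T|."""
    return misclassification_cost + alpha * n_terminal_nodes


def select_tree(candidates, alpha):
    """candidates: list of (R(T), |T|) pairs, one per pruned subtree.
    Returns the pair that minimises the cost-complexity measure."""
    return min(candidates,
               key=lambda tree: cost_complexity(tree[0], tree[1], alpha))


# Hypothetical sequence of pruned trees: larger trees misclassify less.
candidates = [(0.02, 20), (0.05, 8), (0.10, 4), (0.30, 1)]

for alpha in (0.0, 0.01, 0.1):
    print(alpha, select_tree(candidates, alpha))
```

With α = 0 the largest tree is preferred, and as α grows the selection moves towards smaller trees, which matches the description of α running from a maximal to a minimal tree.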
The output of CART is a hierarchical binary tree which subdivides the prediction space into several regions ($R_m$) where the response factors have similar values ($\equiv a_m$), based on Equation (5):
$$f(x) = \sum_{m=1}^{M} a_m \, I(x \in R_m) \qquad (5)$$

Naive Bayes Tree (NBT)
Naive Bayes (NB) is a machine learning classifier that creates a probability-based model. It is based on Bayes' theorem. The NBT uses a decision tree (DT) for its structure and fits an NB model at every leaf node of the constructed DT [66]. The NBT exhibits a significant classification performance and accuracy [67,68].
During the NB process, the impact of an attribute value on a specific class is assumed to be independent of the values of the other attributes, which is known as class conditional independence. This assumption makes training faster: NB treats all attributes as independent and applies Bayes' rule [69]. Bayes' rule can be expressed as follows (Equation (6)):
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)} \qquad (6)$$
where $P(A \mid B)$ is the conditional probability of A given B, $P(B \mid A)$ is the conditional probability of B given A, $P(A)$ is the probability of event A, and $P(B)$ is the probability of event B. The model starts by estimating the probability of each class in the model, calculating the variance-covariance matrix, and building the discriminant function for each class [70][71][72].
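A small worked example may clarify how Bayes' rule and class conditional independence combine. The sketch below is not the NBT implementation used in this study; it assumes two hypothetical classes, invented priors and invented per-attribute likelihoods, and simply multiplies the likelihoods within each class before normalizing by the evidence P(B).

```python
def naive_bayes_posterior(priors, likelihoods):
    """priors: {class: P(class)}
    likelihoods: {class: [P(attribute_j | class), ...]}
    Returns {class: P(class | attributes)} under conditional independence."""
    joint = {}
    for cls, prior in priors.items():
        p = prior
        for likelihood in likelihoods[cls]:
            p *= likelihood            # attributes treated as independent
        joint[cls] = p
    evidence = sum(joint.values())     # P(B): total probability of the attributes
    return {cls: p / evidence for cls, p in joint.items()}


# Hypothetical numbers: prior probability of a flood site, and the likelihood
# of observing "steep slope" and "near stream" in each class.
priors = {"flood": 0.3, "no_flood": 0.7}
likelihoods = {"flood": [0.8, 0.6], "no_flood": [0.2, 0.3]}

posterior = naive_bayes_posterior(priors, likelihoods)
print(posterior)
```

Here the joint terms are 0.3 × 0.8 × 0.6 = 0.144 and 0.7 × 0.2 × 0.3 = 0.042, so the posterior probability of the flood class is 0.144 / 0.186, roughly 0.77, even though its prior was only 0.3.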

Optimal Model Parameterisation and Flash Flood Susceptibility Mapping
As a first step, the CART, BRT and NB models were fitted in STATISTICA v. 7 [73], the Salford system [74,75], and R (R Development Core Team 2006) v.3.0.2 [76], using the gbm, dismo, rpart, and random forest packages [77]. These tools provide stochastic gradient boosting trees, which are widely used for regression problems related to predicting and mapping continuous dependent variables [73]. After that, all parameters were set and optimized. These parameters were the learning rate, the number of additive trees, the proportion of sub-sampling, and so forth.
Here, the optimal value for the learning rate was set to 0.1, the number of additive trees to 185, and the maximum size of the tree to five. These values may lead to accurate results [74]. In this study, values at random points were extracted from each FFCP layer for the presence and absence conditions of FF. After that, all three machine learning models were run using the open-source tools. Using these tools, the flash flood susceptibility map (FFSM) value was calculated for each pixel in the thematic maps of FFCPs and then converted into text files. Finally, these text and dbase files were imported into SPSS v.25 to evaluate the models' performance and to generate the FFSM in the GIS environment of ArcGIS v.10.2 software.
During prediction, the models used the FFCPs, and the regression tree separated the FFCPs into two groups [78,79]. One group, such as distance from streams, altitude, and slope in the upper part of the regression tree, indicates an approximate area with a higher probability of FF occurrence. Another group, such as altitude, slope, and topographic curvature in the lower part of the regression tree, allowed the recognition of areas with a higher probability of FF occurrence. Among several interval methods, the quantile method, which is widely used in the literature, was chosen to classify the FFSM [12,14,36]. The produced FFSM was then classified into four classes, namely low, moderate, high, and very high.
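The quantile classification into four classes can be sketched as follows. This is an illustrative simplification, not the ArcGIS classifier: the break rule (taking the value at position n·k/4 of the sorted scores) is one of several possible quantile conventions and is an assumption here, as are the function names and the susceptibility scores.

```python
from bisect import bisect_right

LABELS = ["low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=4):
    """Return n_classes - 1 break values that split the sorted scores
    into groups of (roughly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // n_classes] for k in range(1, n_classes)]


def classify(values, n_classes=4):
    """Assign each score the label of the quantile class it falls into."""
    breaks = quantile_breaks(values, n_classes)
    return [LABELS[bisect_right(breaks, v)] for v in values]


# Hypothetical susceptibility scores for eight pixels.
scores = [0.8, 0.1, 0.5, 0.3, 0.7, 0.2, 0.6, 0.4]
print(quantile_breaks(scores))
print(classify(scores))
```

Because the breaks follow the ranks of the scores rather than fixed thresholds, each class receives about the same number of pixels, which is the defining property of the quantile method.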

Evaluation of the Models Performance
To evaluate the models' performance, we used 61 FF locations. The dataset was divided into 43 locations (70%) for model training and 18 (30%) for model validation. The locations were selected randomly using the Hawth's Tool implemented in the ArcGIS v. 10.2 software. We calculated the accuracy metrics for each model. These metrics include accuracy, precision, recall and F1 score. The F1 score was found to be the best technique and is widely used in the literature [13,14,80]. The metrics were calculated from four quantities, namely true positive (TP), true negative (TN), false positive (FP), and false negative (FN), using Equations (7)-(11):
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where $p_o$ is the observed agreement ratio, $p_e$ is the expected agreement, TP is the true positive, TN is the true negative, FP is the false positive and FN is the false negative. The performance of the models was evaluated using the open-source R 4.0.0 software. Further validation was performed using the receiver operating characteristic (ROC) curve, which is widely used in the literature due to its simplicity and high accuracy [81]. The curve has been successfully used by several researchers in applications such as groundwater potential mapping [82] and land subsidence susceptibility mapping [12]. The obtained FF prediction maps sometimes contain errors. These errors can come from deficiencies in the quality of the FFCPs and from the structure of the models [46,83].
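The 70/30 split and the accuracy metrics can be sketched in a few lines. The code below is only an illustration under stated assumptions: the split uses Python's random module rather than the Hawth's Tool, the metric definitions are the standard ones (including Cohen's kappa for the agreement terms p_o and p_e), and the confusion-matrix counts are invented, not results of this study.

```python
import random


def split_events(event_ids, train_fraction=0.7, seed=0):
    """Randomly split event identifiers into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(event_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement versus agreement expected by chance.
    p_o = accuracy
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# 61 inventory events, as in the text; identifiers are hypothetical.
train, validation = split_events(range(1, 62))
print(len(train), len(validation))

# Hypothetical confusion-matrix counts for one model.
print(classification_metrics(tp=16, tn=15, fp=3, fn=2))
```

With 61 events the split yields 43 training and 18 validation locations, matching the proportions reported above.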
The accuracy of the produced prediction maps was measured using the area under the curve (AUC) [84]. The AUC ranges from 0 to 1. A value of 1 indicates a good prediction, and a value of 0 indicates that the model is not efficient and cannot predict FF occurrence. Both the success and prediction rates were computed to assess the accuracy of the FFSM [85]. The value of AUC can be estimated via the following equation [86]:
$$\mathrm{AUC} = \frac{\sum TP + \sum TN}{P + N} \qquad (12)$$
where TP (true positive) and TN (true negative) are the numbers of pixels that are correctly classified, P is the total number of pixels with flash floods, and N is the total number of pixels with no flash floods.
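For comparison with the pixel-count estimate above, the area under the ROC curve can also be computed directly from predicted scores. The sketch below uses the rank-based definition, in which the AUC equals the probability that a randomly chosen flood location receives a higher score than a randomly chosen non-flood location (ties count one half). This is an alternative formulation rather than the equation used in this study; under it, a value near 0.5 corresponds to random guessing. The scores and the function name `roc_auc` are hypothetical.

```python
def roc_auc(scores_positive, scores_negative):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties contribute one half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical susceptibility scores at flood and non-flood locations.
flood_scores = [0.9, 0.8, 0.4]
non_flood_scores = [0.7, 0.3, 0.2]
print(roc_auc(flood_scores, non_flood_scores))
```

In this toy example eight of the nine pairs are ranked correctly, giving an AUC of 8/9.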

Geohydrological Model for FFMI and Filling the Gaps in MLC Maps
Although ensemble-based machine learning models have been widely used in FF susceptibility mapping due to their greater accuracy, these models still have some limitations regarding FFCPs. In particular, they do not account for parameters such as the length of the basin, basin area, the gradient of each basin, alluvial plain width, and mean slope. These new parameters are very important in measuring the FF magnitude. Here, we first delineated drainage basins from a DEM using a hydrological tool implemented in the ArcGIS v. 10.2 software. After that, each basin was treated as a separate FF zone and its magnitude was measured by calculating the following parameters (Figure 8 and Table 1): (v) calculating the alluvial plain width ($A_w$) for each basin manually in a GIS; (vi) calculating the mean slope (Ms) for each basin using a moment statistic; (vii) calculating the FF magnitude for each basin with the following equation:

Evaluation of the Models Performance and Validation
Visual inspection shows that there are some differences among the FFSM maps produced using machine learning models. Thus, it is important to evaluate model performance and assess the prediction accuracy. The results from the evaluation of the model performance show that the BRT model had the highest accuracy, followed by the CART and the NB models. The BRT yields an F1 score value of more than 0.91 for all FFS classes, followed by the CART with an F1 score value of more than 0.90 for high and very high classes (Figure 9).
The NB had the lowest F1 score for all FF classes. Thus, the validation results confirmed a positive agreement between the observed and predicted values for the BRT and CART models. Additionally, the slight difference between the F1 scores of the BRT and the CART models reflects the small gap between the two models and is not statistically significant [87]. The BRT model offers reliable information regarding the FF to be predicted [42]. The BRT uses a boosting approach that can build on an existing AI method and has the dual advantage of boosting and decision trees [87]. Further quantitative validation using the ROC curve was performed to examine the reliability of the obtained FFSM [88]. Consistent with the F1 score, the BRT model has the highest AUC value (0.92), followed by the CART model (0.90) and the NB model (0.79). The high performance of the BRT is because it combines the CART with a boosting algorithm (Figure 10).

Spatial Analysis and Flash Floods Susceptibility Mapping
The results of the spatial analysis show that the extreme FF events occurred in the narrow alluvial plains of the mountainous and coastal areas. These areas are characterized by steep slopes, high relief, high surface run-off and a high density of streams. A higher stream density reflects rocks with lower permeability, which have a higher probability of FF occurrence. The most important FFCPs affecting FF occurrence were altitude and slope (Figure 5a,b). Both parameters strongly influence relief, topographic curvature (Figure 5c,d), soil moisture and surface run-off. For topographic curvature, convex classes (>0) have a very low influence on FF occurrence, whereas concave slopes (<0) had the strongest impact on FF occurrence (Figure 5c). About 90% (40 FF events) of the past FF events occurred at elevations from 300 m to 1400 m and on slopes between 10° and 15° (Figure 5a). Another important FFCP affecting flooding was lithology. The upper streams are dominated by igneous and metamorphic rocks, while the lower streams are dominated by alluvial deposits. Most of the past FF events occurred in the alluvial plains and fans (flooded plains) at the foot of the mountainous areas (igneous and metamorphic rocks) (Figure 6a). For distance from streams and stream density, the highest number of past FF events occurred in areas within 1000 m of the major stream networks (wadi courses) that are characterized by a low density of streams (Figure 6b,c).
Parameters such as LULC, aspect and plan curvature make no significant contribution to the modeling process and could reduce the accuracy of the model's predictions [13,44,89]. These parameters were therefore excluded from the modeling process, since the aspect is already calculated during the extraction of stream networks and the area is characterized by low urban development [13,42].
The FFSMs were constructed by dividing the study area into separate pixels. Each pixel was categorized as a flood or non-flood class. The FFS index for each map was then calculated for all pixels, and each pixel was assigned a unique susceptibility index [12,13,36]. Testing of several classification methods (equal interval, geometrical interval, natural break and quantile) showed that the quantile and interval methods were the most appropriate for classifying flooded and non-flooded areas, respectively. This finding agrees well with Khosravi et al. (2016) [36], who tested several classification methods for different susceptibility mapping. The FF susceptibility maps produced using the BRT, CART and NBT models are shown in Figure 11. The susceptibility indices were categorized into four class intervals using the quantile technique, which is used widely in the literature [12,36,90]. The resulting susceptibility classes, namely very high, high, moderate and low, were used to construct the FFSMs (Figure 11).
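The quantile technique mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the break points are taken as the 25th, 50th and 75th percentiles with linear interpolation between sorted values, and the function names and sample indices are hypothetical. GIS packages may use slightly different break conventions.

```python
LABELS = ("low", "moderate", "high", "very high")


def quantile_breaks(values, n_classes=4):
    """Return the n_classes - 1 break points that split values into equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes):
        pos = k * (n - 1) / n_classes          # fractional position in the sorted list
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        # Linear interpolation between the two neighbouring sorted values.
        breaks.append(ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo))
    return breaks


def classify(value, breaks, labels=LABELS):
    """Assign the first class whose upper break is not exceeded by value."""
    for upper, label in zip(breaks, labels):
        if value <= upper:
            return label
    return labels[-1]


# Hypothetical susceptibility indices for eight pixels.
indices = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95]
breaks = quantile_breaks(indices)
classes = [classify(v, breaks) for v in indices]
print(breaks)
print(classes)
```

With this convention each class receives roughly the same number of pixels, which is why quantile maps tend to show all four classes even when the index distribution is skewed.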
The maps demonstrate that the high and very high susceptibility classes are commonly located in the wadi courses and alluvial plains of the mountainous areas in the east and north. Some portions of the very high and high classes are located at the foot of the mountainous areas. About 54% (3196.4 km²) of the total area was classified as high and very high FF susceptibility, 19.3% (1136 km²) as moderate susceptibility, and 26.5% (1561 km²) as low susceptibility. The effectiveness of the proposed MCL models was confirmed by their high F1 score and AUC values.
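As a quick consistency check on the reported class areas, the percentages can be recomputed from the areas themselves. The sketch assumes that the total mapped area equals the sum of the three reported class areas; this total is not stated explicitly in the text.

```python
# Class areas in km² as reported in the text.
areas_km2 = {
    "high and very high": 3196.4,
    "moderate": 1136.0,
    "low": 1561.0,
}

# Assumption: the study area is the sum of the three class areas.
total_km2 = sum(areas_km2.values())
shares = {name: 100.0 * area / total_km2 for name, area in areas_km2.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of {total_km2:.1f} km²")
```

Under that assumption the recomputed shares round to the reported values of about 54%, 19.3% and 26.5%.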

Geohydrological Model for FFMI and Filling the Gaps in MCL Maps
Although the BRT model yields the highest performance, the geographical and spatial variability of the valley depth and alluvial plain width parameters has not been taken into consideration. In this study, the FF magnitude index (FFMI) was calculated using a set of new terrain parameters for each derived basin (Table 1). These parameters include basin area (A) (Figure 12a), basin length (Lb) (Figure 12b), relief (Bh) (Figure 12c), alluvial plain width (Aw) (Figure 13a), gradient (G°) (Figure 13b), and mean slope (Ms) (Figure 13c). Figure 12a shows that the area is divided into seven flash flood basins (zones) of two types. The first type is narrow coastal zones such as RAK in the northwest, and Masafi, Rol Dadnah-Dibba and Fujairah-Kalba in the east. The second type is wide inland basins (zones) such as Falahyeen and Al Dhaid in the west and Hatta-Houylate in the south (Figures 1, 2 and 12a). Except for the Al Dhaid and Falahyeen basins, all basins are small in area, short in length, drained by dendritic streams and have narrow alluvial plains. These zones and their adjoining areas have high gradient angles ranging from 10° to 33°, high relief values of more than 900 m, a mean slope of more than 30°, and an alluvial plain width of less than 5 km (Figures 12 and 13). Lithologically, all upper streams are dominated by igneous, metamorphic and carbonate rocks, while the lower streams are dominated by alluvial deposits. These parameters directly influence the magnitude of FF destruction and have a greater impact on the occurrence of FF in an arid region. For example, higher relief and runoff potential in a basin (zone) indicate rocks with lower permeability and steeper slopes; combined with a narrow alluvial plain, these conditions increase susceptibility to floods [91]. Figure 14a shows the modified FF map produced using the proposed hybrid approach. The map shows different FF zones.
Each zone has its own FF magnitude. The estimated FF magnitude values for the RAK and Masafi basins were 3.24 and 3, respectively (Table 1 and Figure 14a). Villages, roads and farms in these basins were severely affected. These basins cover an area of about 1379 km² (23.4%). Rol Dadnah and Fujairah-Kalba basins that cover an area of 1055.6 ... To validate the produced FFMI, the past FF events were draped over the FFMI and spatial analysis was performed. The results showed that most of the past FF events (40 FF events) occurred in high and very high FF susceptibility zones. Further analysis, performed by draping the existing infrastructure and agricultural areas over the FFMI, shows that most of the villages and farms in the mountainous areas and the RAK are located in areas at a higher risk. This result is plausible since all settlements, farms and roads have been constructed in the high and very high susceptibility zones.
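The overlay step used for validation, in which past FF events are draped over a classified map, can be sketched as a simple grid lookup. The grid, the event positions and the function name below are hypothetical; in practice the events would first be converted from map coordinates to raster row and column indices.

```python
from collections import Counter


def events_per_class(class_grid, events):
    """Count past events by the susceptibility class of the cell they fall in."""
    counts = Counter()
    for row, col in events:
        counts[class_grid[row][col]] += 1
    return counts


# Hypothetical 3 x 3 classified map and event locations given as (row, col).
class_grid = [
    ["low", "moderate", "high"],
    ["moderate", "high", "very high"],
    ["high", "very high", "very high"],
]
events = [(0, 2), (1, 1), (1, 2), (2, 1), (2, 2), (0, 1)]

counts = events_per_class(class_grid, events)
# Share of events that fall in the high or very high classes.
hit_rate = (counts["high"] + counts["very high"]) / len(events)
print(dict(counts), round(hit_rate, 3))
```

A high share of past events in the high and very high classes supports the map, although it does not by itself test how well non-flood areas are identified.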
The proposed approach allows the FFCPs to be updated at any time as new parameters become available.

Evaluation of the Models Performance and Validation
In this study, a hybrid approach that integrates machine learning and geohydrological models was modified to map FF susceptible areas and measure their FF magnitude in an arid mountainous region. We first used three machine learning models, which can map the susceptibility of natural phenomena with nonlinear relationships without the need for prior statistical assumptions or data transformation [12,92,93]. These types of models can fit complex nonlinear relationships between FF locations and conditioning parameters, and their efficiency was compared based on accuracy metrics (precision, recall and F1 score) and AUC-ROC [14].
The results demonstrated that the BRT model had the highest performance, while the CART had a higher accuracy compared with the NBT [53]. This finding is consistent with Rahmati et al. (2020) [94], who used a machine learning approach for spatial modeling of agricultural droughts. They concluded that the BRT and CART models showed the best performance and prediction accuracy compared with NBT and linear supervised classifiers. Our findings also agree well with Naghibi et al. (2016) [65], who concluded that the BRT model produced the best prediction results, followed by the CART and RF models. These machine learning models, used widely in the literature, were applied because of their simplicity of description, their accuracy, and their straightforward interpretation [7,8,13,14,22,23,29-31,33,53,94,95]. However, few studies have applied a hybrid approach that integrates machine learning models with morphological and geohydrological parameters to map FF susceptibility and measure its magnitude for each basin in the FFSM.

Spatial Analysis and Flash Floods Susceptibility Mapping
FF is one of the main destructive phenomena that occur in mountainous areas and narrow alluvial coastal areas, especially in the NUAE. FF susceptibility mapping using remote sensing and MCL algorithms is considered a crucial step in reducing the destructive impact of any future FF event [36,80,96]. Spatial analysis showed that most of the built-up and agricultural areas of the Emirates of RAK in the northwest and Fujairah in the east (95%), and some parts of the Emirates of Ajman and Sharjah (20%), are located in high and very high susceptibility zones. Thus, most of the roads, dams, farms and human population are highly susceptible to FF because they are located in the wadi courses of the mountainous areas and at the foot of the mountainous areas. These areas receive intensive rainfall due to the impact of climate change [38]. In these zones, a proper urban planning scheme is very important to reduce the risk of any future FF event (Bathrellos et al., 2017).
Numerous previous studies have proposed a combination of MCL models for FFS mapping. They built susceptibility maps using several conditioning factors that are relatively complex [28,36,38,86,96]. Other studies have shown that intensive precipitation, LULC and geohydrological parameters are important factors controlling FF occurrence [28,36,96]. Further studies have shown that factors such as human activities are significant in FF occurrence [25,94]. Factors such as LULC and human activities could not be considered significant in the study area because of its low population and limited human activity. Additionally, the FFSMs obtained using MCL are, in reality, essentially altitude and/or slope maps. Thus, it is important to modify them using a geohydrological model within a hybrid approach.

Geohydrological Model for FFM Indexing and Filling the Gaps in MCL Maps
To measure FF magnitude and fill the gaps in the MCL maps, it is important to apply a geohydrological model. Until now, there has been no standard rule for choosing FFCPs or flood and non-flood locations. Here, the result obtained using the proposed approach and the new FFCPs is consistent with the constructed FF inventory map. It demonstrates that the proposed approach was able to map FF susceptible areas and measure their magnitudes in an arid region more accurately and reliably than the ensemble machine learning approaches that are widely used for susceptibility mapping of groundwater potentiality [82], land subsidence [12], landslides [3,42,85], and flash floods [3,23,26,29-31]. The susceptibility maps obtained using MCL can be upgraded and re-categorized using the proposed approach, which was able to create a satisfactory FFM. The results show that the highest number of past FF events in the study area occurred in the major mountainous streams (wadi courses) and the narrow coastal strip in the east and northwest. These areas are lowlands covered by alluvial deposits, located at the foot of the Oman mountains and characterized by gentle slopes.
Based on the new FFMI map and its related infrastructure map (Figure 14b), about 153.34 km of mountainous roads and roads at the foot of the mountainous areas are dangerous and potentially deadly. Roads in residential areas are also dangerous and have a higher probability of being destroyed (Figure 14b). In Ras Al Khaimah (RAK), one woman was crushed to death after a wall collapsed during a violent storm (NCM, 2020). In the Ghalilah and Al Fahlain villages of the RAK, flash floods destroyed roads and farms and flooded the village graveyard (Figure 1). The risk of damage can be reduced by constructing valley dams and a real-time alert system in the mountainous areas. The existing human settlements at the valley mouths should be relocated to terrain at a lower elevation with a very gentle slope. Here, the produced FFSM and FFMI can be used as a reference for decision-makers and urban planners.
The results of the proposed approach permit, for the first time, a better understanding of the natural hazard setting of the study area. The results also facilitate the detection of sites with a higher probability of FF occurrence and help identify infrastructure located at high risk. The geohydrological approach can be used to fill the gaps in the FFSMs obtained using MCL models and represents an effective approach for FFSM and for measuring FF magnitude, particularly in the NUAE, which has not been investigated previously. This finding agrees well with Chen et al. (2019) [97], who demonstrated the superiority of hybrid models. However, some limitations were encountered during the modeling process. These limitations include the spatial resolution and number of FF conditioning parameters as well as the optimal parameterization of the machine learning algorithms [12,13,95]. Therefore, future work will focus on FF susceptibility mapping using new FFCPs such as alluvial plain width, the depth of the mountainous valley, and the gradient of the basin. Future work will also focus on constructing a real-time meteorological system, which is needed to predict areas with a higher probability of FF occurrence. Plantation of Prosopis cineraria forests and installation of steel wedges and screens on the wadi slopes are also needed to reduce runoff potential.

Conclusions
In this study, a hybrid approach that integrates machine learning (the BRT, CART and NBT) and geohydrological models was applied for FF susceptibility mapping and constructing the FFMI. The proposed approach was applied, for the first time, to the NUAE. Eight FFCPs, namely altitude, topographic slope, topographic curvature, relief, stream density, lithology, and distance from streams, were chosen for FFSM. The parameters were selected based on their level of influence on FF occurrence, the geo-environmental characteristics of the study area, the geological background of the authors, and their wide use in the literature. Parameters such as LULC, aspect, plan curvature, and NDVI were ignored since the aspect (flow direction) was already calculated during stream network extraction, and the study area is characterized by low population, human activity, and large vegetation cover.
The performance of the machine learning models was evaluated by calculating accuracy metrics, using the F1 score and the ROC curve for each model. The results showed that the BRT had the highest performance, followed by the CART and NBT models. The FFSM produced using the BRT was modified by applying a geohydrological approach, and the results showed that the area consists of seven FF zones. Each FF zone has its own geohydrological characteristics and FF magnitude. The highest FF magnitude was found in the zones of the RAK, Masafi, Rol Dadnah, and Fujairah-Kalba, while the lowest FF magnitude was found in the zones of Al Dhaid and Falahyeen in the west. These magnitudes can be further refined by applying the proposed approach to sub-basins using remote sensing data with a higher spatial resolution. New FFCPs such as alluvial plain width, stream depth, basin gradient and mean slope can be considered in any future study, especially in an arid region. In conclusion, the proposed approach and the new FFCPs from this study demonstrated the superiority of hybrid models, and the obtained FFSMs can assist urban planners, geohazard specialists and decision-makers in reducing the risk of FF in an arid region.