Prediction Success of Machine Learning Methods for Flash Flood Susceptibility Mapping in the Tafresh Watershed, Iran

Floods are some of the most destructive and catastrophic disasters worldwide. Development of management plans needs a deep understanding of the likelihood and magnitude of future flood events. The purpose of this research was to estimate flash flood susceptibility in the Tafresh watershed, Iran, using five machine learning methods, i.e., alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP), and quadratic discriminant analysis (QDA). A geospatial database including 320 historical flood events was constructed and eight geo-environmental variables (elevation, slope, slope aspect, distance from rivers, average annual rainfall, land use, soil type, and lithology) were used as flood influencing factors. Based on a variety of performance metrics, the ADT method outperformed the other methods. The FT method ranked second, followed by the KLR, MLP, and QDA. Given the small differences between the goodness-of-fit and prediction success of the methods, we concluded that all five machine-learning-based models are applicable for flood susceptibility mapping in other areas to protect societies from devastating floods.
Despite these applications, other data-driven techniques have been rarely explored for their capability in flood modeling. This study explored the prediction success of five machine learning techniques (alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP) and quadratic discriminant analysis (QDA)) for flood susceptibility mapping in the Tafresh watershed, Iran. Our specific objectives are to: (i) explore and compare the efficiency of these five techniques to produce flood susceptibility maps; and (ii) generate a flood susceptibility map for the study area.


Introduction
Floods are destructive disasters that endanger human life and cause global economic losses of about 60 billion USD annually [1]. In general, floods are divided into five types based on their locations and causes, including riverine flooding, urban drainage, ground failures, fluctuating lake

Study Area
The Tafresh watershed is located in the Markazi Province, Iran (Figure 1). The 1605 km² watershed is characterized by a mountainous topography, cold winters and relatively moderate summers. Mean annual rainfall and evaporation are 304 and 1921 mm, respectively. The average temperatures in summer and winter were recorded as 19.2 and 6.4 °C, respectively, with 73 freezing days over the year [26]. The major rivers of this city are: Abkamer, which originates from the southern highlands of Tafresh and joins the Qara Chai River about 4 km northwest of Tafresh; Farminin, which originates from Rudbar village and, after irrigating parts of the city, flows into the Salt Lake; and Qara Chai, which irrigates parts of the north of the city and then flows into the Salt Lake of Qom [27]. Due to heavy rainfall in winter, the discharge of the rivers increases and causes severe overbank flooding [27]. In addition, due to the Mediterranean climate of the province, which causes heavy rainfall during spring and autumn, flash floods occur frequently during these seasons [26].

Methodology
The flood susceptibility mapping methodology consists of four main steps (Figure 2): (i) construction of a geospatial database for influential factors and historical flood events; (ii) development of the machine learning models; (iii) model validation against historical flood events; and (iv) generation of flood susceptibility maps. A detailed description of each step is presented in the following subsections.

Inventory Map of Historical Floods
The Tafresh watershed is a flood-prone watershed that is frequently affected by floods due to its topographic and climatic characteristics. Numerous floods have occurred in the past and caused severe damage to human life and buildings. One of the major floods in the watershed occurred in 2017, with a 24-hour rainfall depth of 90 mm, resulting in over one million USD in damages (Figure 3). Here, the geographic locations of 320 historical floods were obtained from the Regional Water Organization of Markazi Province to develop a flood inventory map. These floods were then divided into two groups. The first group included 70% of the flood data and was used as the training dataset, and the second group included the remaining 30% and was used as the validation dataset.
Elevation is an important variable in flood occurrence [5,6,18]. In general, flooding and elevation have an inverse relationship, as low-lying areas are more prone to flooding [8,11]. Here, the elevation map of the study area was extracted from a digital elevation model (DEM) with a 12.5 m pixel size that was obtained from the ALOS PALSAR sensor (https://earth.esa.int/web/guest/home) (Figure 4a). Slope is one of the influential factors in the occurrence of floods due to its direct impact on surface runoff and infiltration potential [8]. Flood-prone areas are often located within flat landscapes [6,22], where floods tend to last longer and the resulting water stagnation creates an environmental hazard. The slope map was derived from the DEM and classified into five classes (Figure 4b). Aspect is another influencing factor in flood occurrence because it is directly associated with the convergence and direction of water flow [18,22,29]. The aspect map of the research area was extracted from the DEM and classified into nine classes (Figure 4c).
Distance from rivers has a significant effect on the probability and magnitude of flooding because terrestrial water storage is highly associated with flood events [6,7,18,22]. The map of this factor was prepared based on the Euclidean distance and divided into six classes (Figure 4d). Rainfall depth is a key factor that could have the greatest effect on flooding [6,7,18,22]. The spatial distribution of average annual rainfall in the Tafresh watershed was prepared using meteorological data for the period 1993-2018 [26] (Figure 4e).
Land use has a significant effect on flood susceptibility [30,31]. The land-use map was derived from the OLI Landsat satellite imagery (https://landsat.gsfc.nasa.gov/operational-land-imager-oli/) of 24 June 2017 using supervised classification with the maximum likelihood algorithm in the ENVI software [32] (Figure 4f).
Soil type was another influencing factor because it controls infiltration and runoff [7,21]. The soil type map was obtained from the Natural Resources Office of the Markazi Province, Iran (Figure 4g). The last factor was lithology, which represents units of rocks and soils that affect infiltration and runoff [6,7,33]. We obtained the lithology map from the Administrative Office of the Natural Resources of Markazi Province, Iran (Figure 4h, Table 1).

Training and Validation Datasets
The geospatial database that was constructed in the first step of the modeling methodology was used to generate training and validation datasets for the modeling process. To this end, the flood inventory map was randomly divided into two sets: one set, with 70% of the historical flood locations, was used for training, and the other set, with the remaining 30% of the flood locations, was used for validation [6,7,33]. Similar to the flood locations, 360 unflooded samples were selected from the unflooded portions of the study area and used to complete the training and validation datasets. Flooded and unflooded datasets were overlaid to generate the final training and validation datasets.
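The random partitioning described above can be illustrated with a minimal sketch. It assumes that the unflooded samples are split in the same 70/30 proportion as the flood locations, which the text implies but does not state; the point identifiers, the seed and the helper name `split_samples` are hypothetical.

```python
import random

def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and cut it into (training, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical point IDs standing in for the mapped locations.
flooded = [f"flood_{i}" for i in range(320)]
unflooded = [f"unflooded_{i}" for i in range(360)]

flood_train, flood_valid = split_samples(flooded)
dry_train, dry_valid = split_samples(unflooded)  # assumed 70/30 as well

# Attach the binary target (1 = flooded, 0 = unflooded) and merge.
training = [(p, 1) for p in flood_train] + [(p, 0) for p in dry_train]
validation = [(p, 1) for p in flood_valid] + [(p, 0) for p in dry_valid]

print(len(training), len(validation))
```

In a GIS workflow the identifiers would be replaced by point features carrying the values of the influencing factors at each location.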

Spatial Relationship
Using the frequency ratio (FR) method, we investigated the spatial relationship between the historical floods and each of the eight influencing factors. For each class of the influencing factors, the FR was calculated using the following equation [7,34,35]:
$$FR = \frac{a/b}{c/d}$$
where a is the number of flood pixels within class i of a given factor; b is the total number of flood pixels in the domain; c is the number of pixels in class i of a given factor; and d is the total number of pixels in the domain.
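As a worked illustration, the sketch below computes the FR for each class of one factor, taking FR as the ratio of a/b to c/d, which is the usual form and matches the definitions of a, b, c and d given here. The class labels and pixel counts are invented for the example and are not values from this study.

```python
def frequency_ratio(flood_pixels_by_class, total_pixels_by_class):
    """Return FR = (a / b) / (c / d) for every class of one factor."""
    b = sum(flood_pixels_by_class.values())  # all flood pixels in the domain
    d = sum(total_pixels_by_class.values())  # all pixels in the domain
    fr = {}
    for cls, c in total_pixels_by_class.items():
        a = flood_pixels_by_class.get(cls, 0)  # flood pixels in this class
        fr[cls] = (a / b) / (c / d)
    return fr

# Hypothetical slope classes (degrees) with invented pixel counts.
flood_counts = {"0-5": 60, "5-15": 30, ">15": 10}
pixel_counts = {"0-5": 2000, "5-15": 3000, ">15": 5000}

fr = frequency_ratio(flood_counts, pixel_counts)
for cls, value in fr.items():
    print(f"{cls}: FR = {value:.2f}")
```

An FR above 1 means that floods are over-represented in the class relative to its share of the area, and an FR below 1 means that they are under-represented.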

Machine Learning Methods
Here, we briefly describe the five machine learning methods used in this study. A full description of each method can be found in the literature [36,37,38].

Alternating Decision Tree (ADT)
ADT is an integration of decision trees and boosting procedures proposed to increase the prediction accuracy of binary classification problems [36]. This method alternates between decision nodes, which specify a predicate condition, and prediction nodes, which contain a single number. ADT is grown using a boosting algorithm for numeric prediction, in which a decision node and its two prediction nodes are constructed at each boosting iteration step [39]. Each prediction node is assigned a weight that represents the contribution of the node to the final prediction score. The summation of all the contribution weights along the paths that a sample satisfies yields the final prediction. This procedure differs from other decision-tree-based methods such as classification and regression tree (CART) or C4.5, in which a sample follows only one path through the tree [40].
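The scoring step of an alternating decision tree can be sketched as follows. This is only an illustration of how prediction-node values are summed over every path a sample satisfies; it does not grow a tree by boosting. The rule structure, thresholds, weights and factor names are all hypothetical, and the sign of the summed score is read as the class, which is the usual convention for ADT.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]

@dataclass
class Rule:
    """One decision node together with its two prediction nodes."""
    precondition: Callable[[Sample], bool]  # is the parent prediction node reached?
    condition: Callable[[Sample], bool]     # predicate tested at the decision node
    value_true: float                       # prediction value if the predicate holds
    value_false: float                      # prediction value otherwise

def adt_score(root_value: float, rules: List[Rule], x: Sample) -> float:
    """Sum the prediction values on all paths that sample x satisfies."""
    score = root_value
    for rule in rules:
        if rule.precondition(x):
            score += rule.value_true if rule.condition(x) else rule.value_false
    return score

# Hypothetical tree: two decision nodes under the root, and a third one
# nested under the "slope < 5" branch of the first.
rules = [
    Rule(lambda x: True, lambda x: x["slope"] < 5.0, 0.6, -0.4),
    Rule(lambda x: True, lambda x: x["river_dist"] < 200.0, 0.5, -0.3),
    Rule(lambda x: x["slope"] < 5.0, lambda x: x["rain"] > 300.0, 0.2, -0.1),
]
root_value = -0.1

flat_near_river = {"slope": 3.0, "river_dist": 150.0, "rain": 320.0}
steep_far_away = {"slope": 20.0, "river_dist": 1000.0, "rain": 400.0}

score_flat = adt_score(root_value, rules, flat_near_river)
score_steep = adt_score(root_value, rules, steep_far_away)
print("flooded" if score_flat > 0 else "unflooded")
print("flooded" if score_steep > 0 else "unflooded")
```

The first sample reaches all three decision nodes, while the second never reaches the nested one, which shows how a sample can contribute through several paths at once rather than through a single path as in CART or C4.5.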

Functional Tree (FT)
FT is a multivariate decision tree that uses combinations of attributes at the leaves and/or internal nodes to build a hierarchical framework for handling classification problems [37]. For these problems, FT uses a logistic regression function for splitting at the functional inner nodes and for prediction at the functional leaves. This is the main advantage of the FT model over conventional hierarchical models that use only the original input attributes.

Kernel Logistic Regression (KLR)
KLR is a kernel-based extension of traditional logistic regression (LR) that minimizes the negative log-likelihood function, using the Broyden-Fletcher-Goldfarb-Shanno algorithm, to estimate probabilistic outcomes. In contrast to LR, KLR can classify linearly inseparable problems by mapping the input features to a higher-dimensional space through a kernel [41]. KLR, which requires solving only an unconstrained quadratic problem, provides class probabilities and extends naturally to multi-class classification problems. With proper parameter tuning, KLR is a computationally efficient method. We used the statistical software R to implement KLR and tuned the parameter by trial and error.

Multilayer Perceptron (MLP)
MLP is a layered network that consists of an input layer, an output layer and one or more intermediate (hidden) layers, which are not directly connected to the input data or the final outputs. The input layer units are only responsible for distributing the inputs to the next layer, and the output layer provides the response of the network. The numbers of neurons in the input and output layers equal the numbers of inputs and outputs, respectively, while the hidden layers link the input and output layers. In MLP, there is no definite algorithm for determining the number of hidden layers and the number of neurons, and this is often done by trial and error [42,43].
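The layered structure can be made concrete with a minimal forward pass through one hidden layer. The sigmoid activation, the three inputs, the two hidden neurons and every weight below are assumptions chosen for illustration; they are not the configuration used in this study, and no training is shown.

```python
import math

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Propagate one input vector through a single hidden layer."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical network: 3 scaled inputs, 2 hidden neurons, 1 output.
hidden_weights = [[0.8, -0.5, 0.3], [-0.6, 0.9, 0.1]]
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.7]
output_bias = 0.05

x = [0.2, 0.7, 0.5]  # e.g. scaled slope, rainfall and distance from rivers
susceptibility = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"{susceptibility:.3f}")

# With all weights and biases set to zero the output is exactly 0.5.
neutral = mlp_forward(x, [[0.0] * 3, [0.0] * 3], [0.0, 0.0], [0.0, 0.0], 0.0)
```

Changing the number of rows in `hidden_weights` changes the number of hidden neurons, which is the quantity that is usually tuned by trial and error.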

Quadratic Discriminant Analysis (QDA)
QDA is a conventional classification technique that uses a quadratic decision surface to handle classes with different covariances. In QDA, the measurements in each class are assumed to be normally distributed. An advantage of this method is that it does not assume the same covariance for every class. QDA is an easy-to-use and attractive classification technique because it requires no parameter tuning. This method has been successfully applied in various modeling practices [44,45].
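For intuition, the sketch below applies QDA to a single predictor, where each class has its own mean and variance and the class with the largest Gaussian discriminant score wins. The univariate simplification, the elevation values and the equal priors are assumptions made for the example; the study itself uses eight factors, for which the variances become covariance matrices.

```python
import math
import statistics

def fit_class(values):
    """Estimate the mean and sample variance of one class."""
    return statistics.mean(values), statistics.variance(values)

def discriminant(x, mean, variance, prior):
    """Univariate quadratic discriminant score (log-posterior up to a constant)."""
    return -0.5 * math.log(variance) - (x - mean) ** 2 / (2.0 * variance) + math.log(prior)

def qda_predict(x, params, priors):
    """Return the label whose discriminant score is largest."""
    return max(params, key=lambda label: discriminant(x, *params[label], priors[label]))

# Invented elevations (m) for flooded and unflooded training points.
params = {
    "flooded": fit_class([1700.0, 1750.0, 1800.0, 1720.0]),
    "unflooded": fit_class([2100.0, 2300.0, 2500.0, 1900.0]),
}
priors = {"flooded": 0.5, "unflooded": 0.5}

low_site = qda_predict(1760.0, params, priors)
high_site = qda_predict(2400.0, params, priors)
print(low_site, high_site)
```

Because the two classes have different variances, the boundary between them is quadratic in x rather than linear, which is what distinguishes QDA from linear discriminant analysis.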

Receiver Operating Characteristic (ROC) Curve
The ROC curve is one of the most commonly used procedures for checking the performance of predictive models [46,47]. It is a two-dimensional curve that plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis. The performance of a model is quantified using the area under the curve (AUC), with values typically ranging from 0.5 (no better than random) to 1.0 (perfect discrimination). A higher AUC indicates a better model performance [48,49].
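The construction of the curve and its area can be sketched in a few lines: thresholds are swept from the highest predicted score downward, each step adds a (FPR, TPR) point, and the AUC is obtained with the trapezoidal rule. The labels and scores are invented for the example, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Invented validation samples: 1 = flooded, 0 = unflooded.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

model_auc = auc(roc_points(labels, scores))
perfect_auc = auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))
print(f"{model_auc:.3f}")
```

The AUC equals the fraction of flooded/unflooded pairs in which the flooded sample receives the higher score, which is 8 of 9 pairs in this example.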

Statistical Indices
Seven statistical indices were used for further assessment of the model performances: positive predictive value (PPV), negative predictive value (NPV), sensitivity (SST), specificity (SPF), accuracy (ACC), Kappa and root-mean-square error (RMSE). Given the nature of a flood modeling problem, which is treated as a binary pattern recognition problem (0 = unflooded, 1 = flooded), these indices are calculated as [50,51]:
$$PPV = \frac{A}{A+B}, \quad NPV = \frac{C}{C+D}, \quad SST = \frac{A}{A+D}, \quad SPF = \frac{C}{B+C}, \quad ACC = \frac{A+C}{A+B+C+D}$$
where A, B, C and D are the numbers of true positives, false positives, true negatives and false negatives, respectively.
$$Kappa = \frac{P_a - P_{est}}{1 - P_{est}}$$
where $P_a$ is the relative observed agreement among raters and $P_{est}$ is the hypothetical probability of chance agreement.
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}_i\right)^2}$$
where n is the number of samples, and $X_i$ and $\hat{X}_i$ are the actual and predicted values of the outputs, respectively.
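These indices follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions with A, B, C and D as true positives, false positives, true negatives and false negatives, and computes the chance agreement for Kappa from the row and column totals; the counts and predicted values are invented for the example.

```python
import math

def classification_indices(A, B, C, D):
    """Indices from true positives A, false positives B,
    true negatives C and false negatives D."""
    n = A + B + C + D
    acc = (A + C) / n
    # Chance agreement: predicted and observed totals of each class.
    p_est = ((A + B) * (A + D) + (C + D) * (B + C)) / n ** 2
    return {
        "PPV": A / (A + B),
        "NPV": C / (C + D),
        "SST": A / (A + D),
        "SPF": C / (B + C),
        "ACC": acc,
        "Kappa": (acc - p_est) / (1.0 - p_est),
    }

def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted outputs."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Invented counts for a hypothetical validation run.
indices = classification_indices(A=90, B=10, C=85, D=15)
error = rmse([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])

for name, value in indices.items():
    print(f"{name}: {value:.3f}")
print(f"RMSE: {error:.3f}")
```

Kappa corrects the accuracy for agreement expected by chance, so it is lower than ACC whenever chance agreement is above zero.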

Spatial Relationship
Spatial relationship between historical floods and influencing factors (measured via the FR method) revealed that the most flood-prone portions of the Tafresh watershed are located on the orchards (FR

Model Performance
Based on the performance metrics (Section 3.5), all models performed well in recognizing the general pattern of flood susceptibility (i.e., training performance) in the study area. The ADT method, with the lowest RMSE (0.247; Figure 6) and the greatest PPV (90.4%), NPV (95.2%), SST (95%), SPF (90.9%), ACC (92.8%) (Figure 7) and Kappa (0.856) (Table 2) indices, had the best performance. This method correctly classified 90.4% of the flooded cells and 95.2% of the unflooded cells, indicating an excellent agreement between predicted and observed flood events. In the validation phase, however, the performance of the methods was inconsistent.

Flood Susceptibility Maps
We applied the validated models to estimate flood susceptibility values in the study watershed. The flood susceptibility values were then reclassified into five susceptibility classes (very low, low, moderate, high and very high) using the geometrical interval classification scheme. This resulted in five flood susceptibility maps (Figure 9), one for each machine learning method. Among the five methods, the QDA assigned the greatest portion (26.1%) of the watershed to the very high susceptibility class, whereas the ADT assigned the smallest portion (12.9%) to this class (Figure 9a). Despite the difference in the performance of the models, all the models suggested that the low-lying areas along the rivers, orchards and the residential areas (western part of the watershed) are the most flood-prone portions of the study watershed. Overall, nearly 30% of the Tafresh watershed is covered by the high and very high flood susceptibility classes, indicating that mitigation strategies (e.g., warning systems) and monitoring plans should primarily focus on these portions of the watershed.
A further analysis of the susceptibility maps showed that in each map, the greatest number (Figure 10b) and the highest FR (Figure 10c) of the flood pixels belong to the very high susceptibility class, followed by the high, moderate, low and very low classes, in that order. This indicated that the models performed satisfactorily in demarcating various levels of flood susceptibility across the study watershed.

Discussion and Conclusions
Identifying and zoning flood-prone areas is one of the important measures for the development of mitigation plans and proper resource allocation in response to future flood events. Despite the widespread application of machine learning techniques for the prediction of floods, generating a reliable flood susceptibility map is still a challenging task. In this study, we applied five machine learning methods (ADT, FT, KLR, MLP and QDA) and compared their predictive performance in the Tafresh watershed, Iran. Eight flood influencing factors were used in flood susceptibility mapping. Our results demonstrated that the ADT method outperformed the other four methods in terms of overall training and validation performance. This finding was in agreement with past flood susceptibility mapping studies. For example, Khosravi et al. [52] demonstrated the superiority of the ADT model over the logistic model tree (LMT), reduced error pruning tree and Naïve Bayes tree models for flood prediction in the Haraz watershed, Iran. In a recent study, Costache [53] found that the most accurate flood susceptibility map for the center of Romania was derived from the ADT model, which outperformed the weights of evidence and LMT models. Further, our results are supported by previous findings that machine learning methods yield predictive flood models with high capability and reliability [54]. For example, using the multivariate discriminant analysis (MDA), classification and regression trees (CART), SVM [22], genetic algorithm rule-set production (GARP), quick unbiased efficient statistical tree (QUEST) [28], ANN [23], adaptive neuro fuzzy inference system (ANFIS) [55] and boosted regression trees (BRT) [9] methods, researchers successfully predicted and mapped the distribution of flood susceptibilities within different regions of Iran. Similar results have also been reported from the USA [56], Australia [57], China [58], Vietnam [6] and Romania [59].
Additionally, the literature contains several successful applications of machine learning methods to the prediction of landslides [39], wildfires [50] and gully erosion [60].
ADT is a robust algorithm against the potential errors of a modeling process and provides significant improvement in classification error [36]. In addition to a robust classification scheme, ADT provides a measure of confidence, known as the classification margin, which helps the model to easily learn alternating trees from the training dataset [50]. The overall advantage of the ADT model is ease of implementation because this method has few hyper-parameters to tune and modelers deal only with the number of boosting iterations [52]. Nonetheless, in line with previous works that acknowledged the efficiency of these five machine learning methods for modeling different types of natural hazards [39,50,60], we found that all five methods are fairly straightforward and easy to implement within the open-source WEKA software [61] for flood susceptibility mapping. We suggest that these methods could be applied in many different landscapes for flood susceptibility mapping.
Our results revealed that the high flood susceptibility classes are associated with those portions of the research area that are characterized by human activities, such as orchards and residential areas. The extensive levee systems within the study area have significantly reduced the land area available for floodwater storage, suggesting the need for redesigning the levee system to increase floodwater storage and at the same time provide the population and infrastructure with widespread flood protection. It is noteworthy that, due to substantial human activities in the Tafresh watershed, the probability of floods does not seem to be constant over a long period of time, highlighting the need for periodic assessments of flood susceptibility to support better informed flood mitigation strategies.
Our results have implications for regional flood planning and watershed protection. In the study watershed, stream gauges with adequate observations are not available, which makes it impossible to develop reliable hydraulic models. The five data-mining methods could be effective alternatives for flood susceptibility mapping in these situations (limited hydrologic and geomorphologic observations) and in ungauged watersheds. The flood susceptibility maps generated here could be used to identify areas that need mitigation actions. In addition to the pre-flood management stage that was illustrated here, the computationally efficient data mining methods could be applied for flood-related studies where fast computation is crucial. A potential application could be real-time flood forecasting, where the application of hydraulic models would require a massive amount of time. The data-driven methods presented here could be effectively used for emergency management to guide evacuation plans, which are directly linked to public safety.

Summary
In this study, we evaluated five machine learning methods for flash flood susceptibility mapping in the Tafresh watershed, Iran. The methods showed only small differences in performance, and we therefore conclude that all of them are suitable for flood susceptibility mapping. Since the study watershed lacks stream gauges with adequate observations, it is not possible to develop reliable hydraulic models. Therefore, the five data-driven methods used here could be effective alternatives for flood susceptibility mapping.
Modeling frameworks such as the one presented here are useful for optimal flood management and for the sustainable protection of human society. They also help planners, managers and engineers to review their conservation plans in response to future floods. The results are, however, only valid for the study watershed and cannot be extrapolated to other areas. Various sources of uncertainty also exist in this study. These include the selection of influencing factors, the subjective classification of flood influencing factors, the spatial resolution of datasets, the training and validation datasets and the choice of performance metrics. Each of these requires further research to show how these uncertainties affect the ultimate flood susceptibility maps and subsequent decision making. Future work should investigate the impact of these uncertainties by selecting other flood factors such as daily or sub-daily rainfall, height above nearest drainage (HAND), stream power index and topographic wetness index, classifying the flood factors together with stakeholders [62,63], performing a sensitivity analysis on the partitioning of the observed dataset (other than 70% and 30% for training and validation) and evaluating the efficiency of the five methods via alternate goodness-of-fit measures [64].