New Hybrids of ANFIS with Several Optimization Algorithms for Flood Susceptibility Modeling

This study presents three new hybrid artificial intelligence optimization models—namely, adaptive neuro-fuzzy inference system (ANFIS) with cultural (ANFIS-CA), bees (ANFIS-BA), and invasive weed optimization (ANFIS-IWO) algorithms—for flood susceptibility mapping (FSM) in the Haraz watershed, Iran. Ten continuous and categorical flood conditioning factors were chosen based on the 201 flood locations, including topographic wetness index (TWI), river density, stream power index (SPI), curvature, distance from river, lithology, elevation, ground slope, land use, and rainfall. The step-wise weight assessment ratio analysis (SWARA) model was adopted for the assessment of relationship between flood locations and conditioning factors. The ANFIS model, based on SWARA weights, was employed for providing FSMs with three optimization models to enhance the accuracy of prediction. To evaluate the model performance and prediction capability, root-mean-square error (RMSE) and receiver operating characteristic (ROC) curve (area under the ROC (AUROC)) were used. Results showed that ANFIS-IWO with lower RMSE (0.359) had a better performance, while ANFIS-BA with higher AUROC (94.4%) showed a better prediction capability, followed by ANFIS0-IWO (0.939) and ANFIS-CA (0.921). These models can be suggested for FSM in similar climatic and physiographic areas for developing measures to mitigate flood damages and to sustainably manage floodplains.


Introduction
Floods that occur in a short duration with high peak discharge [1] are considered as the worst weather-related natural hazard worldwide causing huge losses of life and property as well as deep social impacts [2][3][4]. Floods are more hazardous than other natural catastrophic hazards, because they affect more than 20000 lives per year and adversely impact nearly 75 million people worldwide, especially through social impacts such as homelessness, major changes in human lives, and the environment [5,6]. In Asia, more than 50% of economic damages as well as over 90% of all human deaths are caused due to floods [7,8].
Many floods have recently occurred in Iran, especially in the northern parts, such as floods in 2012 at Noshahr, in 2013 at Neka and Behshahr, in 2013 and 2015 at Sari, and 2016 at Noshahr, which caused huge financial and human losses [9,10]. The frequencies and damages of these floods may increase in future due to severe climate change and extensive land-use changes in the country [11]. It is noted that a complete flood prevention is not possible; however, its spatial prediction can help mitigate its human and socio-economic losses [12]. Thus, one of the key points in flood management plans is the identification of flood prone areas so that damages can more likely be reduced by avoiding more construction and physical development in these regions. Therefore, it seems logical to seek approaches that can more and more accurately detect flood prone areas within watersheds. Flood measurement and modeling have been always considered two options for such an identification [13]. Since measurement of flood characteristics specifically during the event is hard, costly, and time consuming [14], modeling has been extensively used by scientists especially from the age of digital hydrology. A vast array modeling approaches, ranging from simple linear and empirical models to sophisticated non-linear physically-based models, have been used for flood simulations since then; however, a comprehensive and integrated flood modeling approach has not been achieved yet, due to the complexity, non-linearity, and dynamic structure of floods and their watersheds. Thus, the issue of flood occurrence forecasting and its mapping using physically-based rainfall-runoff models has remained as a challenge [15,16]. The weakness of one-dimensional, linear, and empirical hydrological methods is that watershed river morphology is not stable, and has dynamic characteristics, and they also take a black box view to the watershed such that they cannot provide any insight into the flooding process [17]. These models are also unable to depict rapid watershed responses [18][19][20]. Even though physically-based models can, to some extent, handle watershed and flood complexity, they cannot fully describe total complexity of watersheds and their hydrological phenomena [21][22][23]. They also require field work and a huge bulk of data as well as prohibitive computational costs and parameter estimation [12,14,24]. These shortcomings of classic hydrological models are required to be overcome by new approaches. Tehrany et al. [2] and Bui et al. [25] reported that in recent years, due to aforementioned drawbacks, GIS with data-driven and data mining techniques have offered new insights into natural disaster prediction which can be brought to bear on modeling multi-dimensional floods, an issue which little is known about.
A literature review reveals that all studies in the field of flood hazard modeling can be classified into two main groups including geological-geomorphological and hydrological-hydraulic methods. The first group is based on the field surveys and monitoring of evidence of overflows using remote sensing data [13]. The flood in the second group is simulated and mapped based on the peak flows for specific events or for return periods, thereby obtaining the extent of the water surface [18]. A need to huge amount of data which are not readily available in developing countries and also a need to a great deal of time for model calibration in ungagged hydrometric stations are the weaknesses of the hydrological-hydraulic methods. The generated results by geological-geomorphological methods are also more reasonable than the 1D hydrological-hydraulic methods when river channel is changeable over time and has a high erosive potential [19]. In this regard, some studies have been conducted on flood modeling using hydrological-hydraulic methods including Horritt et al. [21], Di Baldassarre et al. [22], Grimaldi et al. [23], Nguyen et al. [20], and Tyrna et al. [26].
However, many studies have recently been conducted to identify natural hazard prone areas around the world [9,[27][28][29][30][31] and to FSM using geological-geomorphological methods such as frequency ratio (FR) [32], weights-of-evidence (WOE) [28], logistic regression (LR) [33], analytical hierarchy process (AHP) [34], artificial neural network (ANN) [35], decision tree (DT) [36], adaptive neuro-fuzzy inference system (ANFIS) [25], ANFIS-genetic algorithm (GA) [37], bagging-LMT hybrid model and random forest [38], and ANFIS-particle swarm optimization (PSO) [39]. Application of machine learning methods in flood studies has been shown by many researchers [40]; however, no general agreement has yet been reached on the selection of the best model for any natural hazard assessment, such as flood susceptibility. Thus, new models are needed and should be tested. Tehrany et al. 2014 [28] have claimed that machine learning is the main source of methods for data driven modeling which can be applied for flooding modeling. Because floods are complex, it is difficult to model them [41], but because data mining and artificial intelligence models have a non-linear structure, they are more proper than other methods. Artificial neural networks have been widely used for natural hazard assessment among other machine learning tools because of their computational efficiency [25,42,43]; however, the modeling may face with errors in some cases due to their poor prediction [25]. Therefore, to bring the disadvantages of the ANN model under control; because ANFIS model high accuracy-which is a hybrid of ANN and fuzzy logic-it has been proposed by some researchers [44][45][46][47]. The ANFIS model has a better performance than the two individual models [48,49], but it has some limitations due to its weakness to find the best weight parameters which heavily influence its prediction performance [25] and it is better that these weights searched and determined by using optimization of soft computing techniques. However, there are some optimization methods that have different structures and function distributions to find the weighs of parameters. Additionally, the results of each optimization algorithm are different in each study area due to change in the geo-environmental factors. Therefore, detecting new hybrid algorithms to find the best weights and accessing the reasonable results is a crucial issue in the flood modeling process. The main differences between the current study and other studies in the field of flood susceptibility assessment is that this study uses new hybrid algorithms including ANFIS-CA, ANFIS-BA, and ANFIS-IWO for spatial prediction of floods in the Haraz watershed in the northern part of Iran. Although some optimization and machine learning algorithms have been applied for flood modeling over the world, these optimization methods have not been before explored for flood assessment.

Study Area
The Haraz watershed is one of the most flood-influenced basins in the Mazandaran province, which was selected for pilot study area. The watershed location is at the south part of Sari a capital city of Mazandaran Province, Iran ( Figure 1). It lies between longitudes of 51 • 43 and 52 • 36 E, and the latitudes of 35 • 45 and 36 • 22 N, covering an area of about 4014 km 2 . The altitude varies from 300 to 5595 m. The climate of the study area based on De Martonne climatic classification system is very humid and its average annual rainfall is 430 mm. The entire study region is high land and mountainous area, for which slopes of 0-6 • cover only 5% and ground slope varies from 0-66 • . Rangelands cover 92% of the study area, and-geologically-the area is predominantly occupied by Jurassic formations.

Data Preparation and Analysis
The methodology of the present study is shown in Figure 2, including the following steps.

Flash Flood Inventory
An inventory map is indispensable for future spatial prediction of any natural hazard assessment [50] considering single or multiple events in a specific region for recent and past events [51]. Therefore, the first step in any natural hazard susceptibility assessment is the preparation of the inventory map containing historical records [52]. In the current research, flood inventory map with 201 flood locations was generated using flood historical data from 1995 to 2015 that was finally checked during an extensive field survey.

Data Preparation and Analysis
The methodology of the present study is shown in Figure 2, including the following steps.

Data Preparation and Analysis
The methodology of the present study is shown in Figure 2, including the following steps.

Flash Flood Inventory
An inventory map is indispensable for future spatial prediction of any natural hazard assessment [50] considering single or multiple events in a specific region for recent and past events [51]. Therefore, the first step in any natural hazard susceptibility assessment is the preparation of the inventory map containing historical records [52]. In the current research, flood inventory map with 201 flood locations was generated using flood historical data from 1995 to 2015 that was finally checked during an extensive field survey.

Flash Flood Inventory
An inventory map is indispensable for future spatial prediction of any natural hazard assessment [50] considering single or multiple events in a specific region for recent and past events [51]. Therefore, the first step in any natural hazard susceptibility assessment is the preparation of the inventory map containing historical records [52]. In the current research, flood inventory map with 201 flood locations was generated using flood historical data from 1995 to 2015 that was finally checked during an extensive field survey.

Dataset Collection for Spatial Modeling
In any natural hazard prediction, spatial relationship between hazard occurrence and conditioning factors should be analyzed [53]. In this research, 10 flood conditioning factors were based on the literature review, data availability, and characteristics of the study area, selected for flooding assessment-including ground slope, altitude, curvature, SPI, TWI, land-use, rainfall, river density, distance to river together with lithology-in a raster format with spatial resolution of 30 m in Environmental Systems Research Institute (ESRI) ArcGIS 10.2 [11,54].
It is very likely for one factor to have a high impact on flooding in a specific watershed while it may not show any influence in another watershed [35]. Since topographical factors are highly significant to identifying flood prone areas and also have a direct impact on the results of modeling [55], a digital elevation model (DEM) of the study area was extracted from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global DEM with a 30 × 30 m grid size. Five geomorphic factors-such as ground slope, altitude, curvature, SPI, and TWI-were then constructed from DEM. When the ground slope increases, runoff infiltration duration decreases as well and flow velocity increases. Therefore, high volume of runoff enters the river and as result causes floods [2]. The ground slope map was constructed with eight categories [29]: 0-0.5, 0.5-2, 2-5, 5-8, 8-13, 13-20, 20-30, and larger than 30 • (Figure 3a). According to previous studies [9,10,29,32], altitude has been considered as one of the most effective factors, and since water flows from higher to lower altitudes in mountainous areas, areas of watershed located in lower altitudes have a higher potential for flood occurrence [18]. The altitude map was prepared with nine classes (Figure 3b): 328-350, 350-400, 400-450, 450-500, 500-1000, 1000-2000, 2000-3000, 3000-4000, and > 4000 m. Flat and concave areas have a higher potential for flooding [2,28,32]. Curvature of the study area was constructed in three categories ( Figure 3c): <−0.1, −0.1-0.1, and >0.1, namely concave, flat, and convex, respectively.
River density and distance to river play notable roles in extend and magnitude of flood occurrence [59]. River network used to prepare river density and distance to river maps. River density was calculated by dividing the river length (m) by the basin area (km 2 ) [48] and grouped into six categories, including (Figure 3e): 0-0.401, 0.401-1.17, 1.92-2.67, 2.67-3.66, and 3.66-7.3 km/km 2 . Field survey revealed that there are many flood occurrences adjacent to rivers, and the more the distance to river, the lower the probability of flood occurrence. The distance to river map was constructed using river and multiple ring buffer command in ArcGIS 10.2 and divided into eight classes ( Figure 3h): 0-50, 50-100, 100-150, 150-200, 200-400, 400-700, 700-1000, and > 1000 m. A lithology map of the study area with 1:100,000 scale showed six groups of formations including: Teryas, Quaternary, Permain, Cretaceous, Jurassic, and Tertiary ( Figure 3i). Land-use type is considered as a conditioning factor that has a significant role in flooding [9]. The areas with higher vegetation density, such as forest regions, can control surface runoff and infiltrate the water; therefore, there is negative spatial relationship between vegetation density and flood occurrence [11]. Land-use map was generated from Landsat 8 Operational Land Imager (OLI) imagery for 2013 in Environment for Visualizing Images (ENVI) 5.1 software (The Board of Trustees of the University of Illinois, Illinois, IL, USA) and classified into seven classes-namely grassland (rangeland), bare land, forest, garden, farming land, residential, and water body-using neural network algorithm and supervised classification (Figure 3j). Rainfall is the most prominent conditioning factor for flood occurrence. About 20 years, from 1991 to 2011, meteorological data was used in order to prepare rainfall maps and then classified into six classes ( Figure 3g

Preparation of Training and Testing Dataset
It should be noted that FSM is considered to have a binary classification in which flood index is divided into two classes, including flood or non-flood; therefore, 201 non-flood locations were first selected by Google Earth on hilly and mountainous regions that are not inundated during flood events [9,10]. To validate model capabilities, the inventory dataset should be divided into two groups [60], one for model construction (training dataset) and the other for model validation (testing dataset). For building the training dataset, 70% of both flood and non-flood locations were randomly selected (141 locations) and then combined together, afterwards to build the testing dataset, 30% of the remaining (60 locations) were incorporated together. Both of training and testing datasets transformed to raster format and overlaid with ten flood conditioning factors at the end. For both datasets, a value of 1 was assigned to flood pixels and a value of 0 to non-flood pixels.

Analysis of Spatial Correlation
SWARA method was used to display the spatial relationship between flood occurrence locations together with each conditioning factors. The SWARA method is one of the new multiple-criteria decision-making (MCDM) techniques developed in 2010 [61]. In this technique, the highest rank is apportioned to the most valuable criteria and the lowest rank is allocated to the least valuable criterion and finally, the mean values of ranks are taken into account to obtain the overall ranks [62]. The advantage of this method is that it enables the expert's opinion in relation to the accuracy of weighting to be assessed in model performance [61]. Another advantage of this method in comparison to other MCDM techniques is that experts can converse and work together, which makes for more accurate results. The SWARA method view point is different from other MCDM methods such as AHP, Analytic network process (ANP), etc. [63]; however, this method leads policy makers to make better decisions according to their goals. The Step-wise weight assessment ratio analysis (SWARA) method was used only for determining the initial weights as input for modeling. In this study, we invited some hydrology experts to prioritize and rank the order of sub-classes of each conditioning factor.
According to Keršuliene et al. [61], the importance relativity of the mean value, Sj was determined by where n stands for the number of experts, i is counter in sigma, A shows the offered ranks by the experts for each factor, and j represents the number of the factor. Afterwards, the coefficient K is determined by

Preparation of Training and Testing Dataset
It should be noted that FSM is considered to have a binary classification in which flood index is divided into two classes, including flood or non-flood; therefore, 201 non-flood locations were first selected by Google Earth on hilly and mountainous regions that are not inundated during flood events [9,10]. To validate model capabilities, the inventory dataset should be divided into two groups [60], one for model construction (training dataset) and the other for model validation (testing dataset). For building the training dataset, 70% of both flood and non-flood locations were randomly selected (141 locations) and then combined together, afterwards to build the testing dataset, 30% of the remaining (60 locations) were incorporated together. Both of training and testing datasets transformed to raster format and overlaid with ten flood conditioning factors at the end. For both datasets, a value of 1 was assigned to flood pixels and a value of 0 to non-flood pixels.

Analysis of Spatial Correlation
SWARA method was used to display the spatial relationship between flood occurrence locations together with each conditioning factors. The SWARA method is one of the new multiple-criteria decision-making (MCDM) techniques developed in 2010 [61]. In this technique, the highest rank is apportioned to the most valuable criteria and the lowest rank is allocated to the least valuable criterion and finally, the mean values of ranks are taken into account to obtain the overall ranks [62]. The advantage of this method is that it enables the expert's opinion in relation to the accuracy of weighting to be assessed in model performance [61]. Another advantage of this method in comparison to other MCDM techniques is that experts can converse and work together, which makes for more accurate results. The SWARA method view point is different from other MCDM methods such as AHP, Analytic network process (ANP), etc. [63]; however, this method leads policy makers to make better decisions according to their goals. The Step-wise weight assessment ratio analysis (SWARA) method was used only for determining the initial weights as input for modeling. In this study, we invited some hydrology experts to prioritize and rank the order of sub-classes of each conditioning factor.
According to Keršuliene et al. [61], the importance relativity of the mean value, S j was determined by where n stands for the number of experts, i is counter in sigma, A i shows the offered ranks by the experts for each factor, and j represents the number of the factor. Afterwards, the coefficient K j is determined by The recalculated weight Q j is calculated as The comparative weights of evaluation criteria are verbalized as where W j presents the relative weight of the jth criterion, and m shows the total number of criteria.

Flood Spatial Prediction Modeling
In the present study, three new ensemble models-ANFIS-CA, ANFIS-BA, and ANFIS-IWO-were selected to determine the most susceptible flood areas as well as their comparison.

Adaptive Neuro-Fuzzy Inference System
Takagi and Sugeno [64] presented ANFIS that was obtained from ANN and fuzzy logic [65] by catching the advantages of both in one framework. The ANN's learning capability is automatic. However, this model cannot describe how it acquires the output from decision making. On the contrary, the fuzzy logic can produce output out of fuzzy logic decision, but it does not have the ability to automate learnings [66]. An ANFIS, derived from nature, generates input and output data pairs, so it has been successfully used in diverse fields at solving nonlinear issues and indicating problems [67]. The architecture of ANFIS training is shown in Figure 4. The ANFIS is a feed forward neural network with multi-layer structure. In general, the layers of ANFIS are constructed from six layers with the following function: Layer 1 (input layer): Layer 1 is the layer of input with the amount of x and y passed to the number of neurons in the next layer.
Layer 2 (fuzzification layer): Every node i in Layer 2, also called the 'fuzzification layer', consists of an adaptive node (square node) which has a node function A i or B i−2 as the linguistic label related to the input node i. Therefore, in Layer 2, fuzzy membership function, is computed which determines 'full', 'partial', or 'none' membership levels. The output function is calculated according to the equations Several membership functions for fuzzifying used many research such as triangular, trapezoidal, Gaussian, and Bell functions. In this research, we used the Bell function such that µ A i (x 1 ) is given by where a i , b i , and c i are the parameters of the Bell function that are so-called premise parameters [65,68]. Layer 3 (antecedent layer): All nodes in this layer are fixed nodes labeled as Water 2018, 10, x FOR PEER REVIEW 8 of 28 The recalculated weight Q is calculated as The comparative weights of evaluation criteria are verbalized as where W presents the relative weight of the jth criterion, and m shows the total number of criteria.

Flood Spatial Prediction Modeling
In the present study, three new ensemble models-ANFIS-CA, ANFIS-BA, and ANFIS-IWOwere selected to determine the most susceptible flood areas as well as their comparison.

Adaptive Neuro-Fuzzy Inference System
Takagi and Sugeno [64] presented ANFIS that was obtained from ANN and fuzzy logic [65] by catching the advantages of both in one framework. The ANN's learning capability is automatic. However, this model cannot describe how it acquires the output from decision making. On the contrary, the fuzzy logic can produce output out of fuzzy logic decision, but it does not have the ability to automate learnings [66]. An ANFIS, derived from nature, generates input and output data pairs, so it has been successfully used in diverse fields at solving nonlinear issues and indicating problems [67]. The architecture of ANFIS training is shown in Figure 4. The ANFIS is a feed forward neural network with multi-layer structure. In general, the layers of ANFIS are constructed from six layers with the following function: Layer 1 (input layer): Layer 1 is the layer of input with the amount of x and y passed to the number of neurons in the next layer.
Layer 2 (fuzzification layer): Every node i in Layer 2, also called the 'fuzzification layer', consists of an adaptive node (square node) which has a node function Ai or Bi−2 as the linguistic label related to the input node i. Therefore, in Layer 2, fuzzy membership function, is computed which determines 'full', 'partial', or 'none' membership levels. The output function is calculated according to the equations Several membership functions for fuzzifying used many research such as triangular, trapezoidal, Gaussian, and Bell functions. In this research, we used the Bell function such that μ (x ) is given by where ai, bi, and ci are the parameters of the Bell function that are so-called premise parameters [65,68]. Layer 3 (antecedent layer): All nodes in this layer are fixed nodes labeled as ᴨ. In this layer, the aim is to calculate the firing strength for each rule, called w and the following equation computes the outputs Layer 4 (strength normalization layer): All nodes in this layer are fixed nodes (square nodes) labeled N. Each node calculates the individual rules firing strength ratio to the total number of all rules firing strengths. Output is called the normalized firing strength, determined for each node by the equation . In this layer, the aim is to calculate the firing strength for each rule, called w i and the following equation computes the outputs Layer 4 (strength normalization layer): All nodes in this layer are fixed nodes (square nodes) labeled N. Each node calculates the individual rules firing strength ratio to the total number of all rules firing strengths. Output is called the normalized firing strength, determined for each node by the equation Layer 5 (consequent layer): This layer is the adaptive layer for every node which is known as the defuzzification layer. Every node is an adaptive square node with the node function. It can be defined as where p i , q i , and r i are the consequent parameters of function fuzzy inference system (f i ). Layer 6 (inference layer): This layer is a single node which calculates the total number of all received signals from the defuzzification layer in order to generate the overall output shown by circle and labeled ∑ and f out is final output. It can be described as Several researches have recently optimized the parameters of fuzzy membership in ANFIS by using metaheuristic algorithms which has improved the results [69,70]. In this study, cultural algorithms (CA), bees algorithm (BA), and invasive weed optimization algorithm (IWO) were used for optimizing premise parameters of bell function.
Water 2018, 10, x FOR PEER REVIEW 9 of 28 Layer 5 (consequent layer): This layer is the adaptive layer for every node which is known as the defuzzification layer. Every node is an adaptive square node with the node function. It can be defined as where pi, qi, and ri are the consequent parameters of function fuzzy inference system (fi). Layer 6 (inference layer): This layer is a single node which calculates the total number of all received signals from the defuzzification layer in order to generate the overall output shown by circle and labeled ∑ and fout is final output. It can be described as Several researches have recently optimized the parameters of fuzzy membership in ANFIS by using metaheuristic algorithms which has improved the results [69,70]. In this study, cultural algorithms (CA), bees algorithm (BA), and invasive weed optimization algorithm (IWO) were used for optimizing premise parameters of bell function. Hybrid models can find the relationships between the SWARA values of each conditioning factor through training dataset. In the modeling process, the optimization algorithms have been performed in three steps ( Figure 5) including (i) dividing dataset into two datasets in a 70/30% ratio for model building (training dataset) and model validation (testing dataset), respectively; (ii) construction of bell function in an ANFIS as a membership function and determination of the clusters were performed. Each cluster follows the bell function, that each function parameter of bell function has been optimized by cultural algorithm (CA), bees algorithm (BA), invasive weed optimization algorithm (IWO). We defined the cost function (RSME) and utilize training dataset for optimizing parameters of the bell function of ANFIS; iii) the prediction power of each model has been calculated by the testing dataset. Hybrid models can find the relationships between the SWARA values of each conditioning factor through training dataset. In the modeling process, the optimization algorithms have been performed in three steps ( Figure 5) including (i) dividing dataset into two datasets in a 70/30% ratio for model building (training dataset) and model validation (testing dataset), respectively; (ii) construction of bell function in an ANFIS as a membership function and determination of the clusters were performed. Each cluster follows the bell function, that each function parameter of bell function has been optimized by cultural algorithm (CA), bees algorithm (BA), invasive weed optimization algorithm (IWO). We defined the cost function (RSME) and utilize training dataset for optimizing parameters of the bell function of ANFIS; iii) the prediction power of each model has been calculated by the testing dataset.

Cultural Algorithm
Cultural algorithm (CA) is an evolutionary algorithm introduced by Reynolds [71]. CA expresses an ideal framework of various theories of social evolution with the concept of collective intelligence developed in the 19th century. CA is a computational model of cultural evolution in solving optimization problems that need a vast amount of domain knowledge to steer the collective decisions of individuals in the population. CA has been applied to problems by extensive data, numerous domain limitations, many objectives, and multiple agents in a vast distributed social network. Concluded from social structures, CA compounds evolutionary systems and agents using multiple knowledge sources for the evolution process. Cultural algorithms include two major parts: the space of population and the atmosphere of belief [72]. These two spaces are connected via a communication protocol describing how to link both spaces with the rules for people who believe in their own experiences. These interactions are shown in Figure 6. The population space can include any population-based computational models, like genetic algorithms and evolutionary programming [73]. The belief space supports the information reservoir which every one of their experiences is for other individuals to learn them indirectly.

Cultural Algorithm
Cultural algorithm (CA) is an evolutionary algorithm introduced by Reynolds [71]. CA expresses an ideal framework of various theories of social evolution with the concept of collective intelligence developed in the 19th century. CA is a computational model of cultural evolution in solving optimization problems that need a vast amount of domain knowledge to steer the collective decisions of individuals in the population. CA has been applied to problems by extensive data, numerous domain limitations, many objectives, and multiple agents in a vast distributed social network. Concluded from social structures, CA compounds evolutionary systems and agents using multiple knowledge sources for the evolution process. Cultural algorithms include two major parts: the space of population and the atmosphere of belief [72]. These two spaces are connected via a communication protocol describing how to link both spaces with the rules for people who believe in their own experiences. These interactions are shown in Figure 6. The population space can include any population-based computational models, like genetic algorithms and evolutionary programming [73]. The belief space supports the information reservoir which every one of their experiences is for other individuals to learn them indirectly.

Cultural Algorithm
Cultural algorithm (CA) is an evolutionary algorithm introduced by Reynolds [71]. CA expresses an ideal framework of various theories of social evolution with the concept of collective intelligence developed in the 19th century. CA is a computational model of cultural evolution in solving optimization problems that need a vast amount of domain knowledge to steer the collective decisions of individuals in the population. CA has been applied to problems by extensive data, numerous domain limitations, many objectives, and multiple agents in a vast distributed social network. Concluded from social structures, CA compounds evolutionary systems and agents using multiple knowledge sources for the evolution process. Cultural algorithms include two major parts: the space of population and the atmosphere of belief [72]. These two spaces are connected via a communication protocol describing how to link both spaces with the rules for people who believe in their own experiences. These interactions are shown in Figure 6. The population space can include any population-based computational models, like genetic algorithms and evolutionary programming [73]. The belief space supports the information reservoir which every one of their experiences is for other individuals to learn them indirectly.

Bees Algorithm
Another heuristic algorithm is the bees algorithm, which is bee swarm-based. Originally, this algorithm introduced by Pham et al. [75], which is inspired by the foraging behavior of a group of bees when they try to find food resources around their hives [75]. First, the scout bees are distributed uniformly and randomly in different directions to be able to find flower patches in each spot. After identifying the flower patches, the scout bees go back to the hive and start a special dance which is known as the 'waggle dance' and it is used for communicating with other group members to share information regarding the flower patches they have already identified. This information may include the direction, the distance to the hive, as well as the amount of nectar existing in the flower patches. The accumulated information shared by these scout bees helps the hive or, in other words, the colony, have a proper assessment of all flower patches available. After accomplishing this phase, the scout bees escort another kind of bees known as the 'recruit bees' to go to the discovered flower patches. Different quantities of recruit bees are assigned to each scout bee based on the distance of flower patches as well as the amount of available nectar. In other words, if one flower patch has a higher quality than another one, more recruit bees follow them. Afterwards, the recruit bees constantly assess the flower patch quality while doing the harvest process until the flower patch quality declines. Then, they will leave the flower patch immediately. However, if patch has a steady and satisfactory quality, it will be announced via another waggle dance. Figure 7 shows the flowchart for bee algorithm. First of all, to implement this algorithm, n numbers of bees are uniformly distributed in a random manner throughout the search space. The algorithm then starts assessing the fitness for each determined site by the scout bees until the most prope-or optimized-bee is chosen as the 'elite bee'. The sites of these bees are selected form vicinity of searches' area and the algorithm explores in the selected sites bees in order to find the best bees at the time when numbers of bees are at its high level. For each site, only the most appropriate bee is being chosen to survive for the next generation. The remaining bees are randomly assigned around the scouting area for new potential solutions. The phase continues so long as the algorithm reaches convergence.
Water 2018, 10, x FOR PEER REVIEW 11 of 28

Bees Algorithm
Another heuristic algorithm is the bees algorithm, which is bee swarm-based. Originally, this algorithm introduced by Pham et al. [75], which is inspired by the foraging behavior of a group of bees when they try to find food resources around their hives [75]. First, the scout bees are distributed uniformly and randomly in different directions to be able to find flower patches in each spot. After identifying the flower patches, the scout bees go back to the hive and start a special dance which is known as the 'waggle dance' and it is used for communicating with other group members to share information regarding the flower patches they have already identified. This information may include the direction, the distance to the hive, as well as the amount of nectar existing in the flower patches. The accumulated information shared by these scout bees helps the hive or, in other words, the colony, have a proper assessment of all flower patches available. After accomplishing this phase, the scout bees escort another kind of bees known as the 'recruit bees' to go to the discovered flower patches. Different quantities of recruit bees are assigned to each scout bee based on the distance of flower patches as well as the amount of available nectar. In other words, if one flower patch has a higher quality than another one, more recruit bees follow them. Afterwards, the recruit bees constantly assess the flower patch quality while doing the harvest process until the flower patch quality declines. Then, they will leave the flower patch immediately. However, if patch has a steady and satisfactory quality, it will be announced via another waggle dance. Figure 7 shows the flowchart for bee algorithm. First of all, to implement this algorithm, n numbers of bees are uniformly distributed in a random manner throughout the search space. The algorithm then starts assessing the fitness for each determined site by the scout bees until the most prope-or optimized-bee is chosen as the 'elite bee'. The sites of these bees are selected form vicinity of searches' area and the algorithm explores in the selected sites bees in order to find the best bees at the time when numbers of bees are at its high level. For each site, only the most appropriate bee is being chosen to survive for the next generation. The remaining bees are randomly assigned around the scouting area for new potential solutions. The phase continues so long as the algorithm reaches convergence.

Invasive Weed Optimization Algorithm
Based on the method which originally developed by Mehrabian and Lucas [77] to optimize growth's place of weeds and their reproduction, IWO algorithm is another meta-heuristic algorithm which has been used to imitate the colonizing weeds' behavior. Among the specifications of this algorithm, its simple form, few input parameters, strong robustness, as well as ease of understanding have made it popular for application in problematic nonlinear optimization [78][79][80]. Furthermore, the IWO algorithm has more optimized solutions for problems in comparison with other algorithms such as particle swarm optimization (PSO) and shuffled frog leaping algorithm (SFLA) and in some cases even it has a better performance than the stated algorithms [78]. The IWO algorithm has five components as follows: (1) Initialization It involves a random distribution of restricted weeds in the searching space with d dimension that it is number of factors of flood, which in fact is the earliest population of solutions.
(2) Reproduction During the growing period, weeds are allowed to reproduce a specific number of seeds according to their fitness. As a matter of fact, numbers of reproduced seeds or their S min starts from worth fitness and increases to reach S max for the weeds with the best fitness, as shown in Figure 8. Based on the method which originally developed by Mehrabian and Lucas [77] to optimize growth's place of weeds and their reproduction, IWO algorithm is another meta-heuristic algorithm which has been used to imitate the colonizing weeds' behavior. Among the specifications of this algorithm, its simple form, few input parameters, strong robustness, as well as ease of understanding have made it popular for application in problematic nonlinear optimization [78][79][80]. Furthermore, the IWO algorithm has more optimized solutions for problems in comparison with other algorithms such as particle swarm optimization (PSO) and shuffled frog leaping algorithm (SFLA) and in some cases even it has a better performance than the stated algorithms [78]. The IWO algorithm has five components as follows: (1) Initialization It involves a random distribution of restricted weeds in the searching space with d dimension that it is number of factors of flood, which in fact is the earliest population of solutions.
(2) Reproduction During the growing period, weeds are allowed to reproduce a specific number of seeds according to their fitness. As a matter of fact, numbers of reproduced seeds or their Smin starts from worth fitness and increases to reach Smax for the weeds with the best fitness, as shown in Figure 8. (3) Spatial Dispersal Reproduced seeds are being distributed by chance all over the searching region. So that they can be located close to their family with the average value which is equal to zero and has non-identical variances. Additionally, in every iteration, the standard deviation (SD) decreases from σmin to σmax and is calculated with via non-linear equation which is where itermax is the number of last iteration, σiter is the corresponding iteration's SD, n is the nonlinear modulation index between 2 and 3, σmax is maximum value's SD and σmin is minimum value's SD [81].

(4) Competitive Exclusion
All weeds together with their seeds are combined to create the population for the next generation. If the population surpasses a definite maximum, the weeds with less fitness will be (3) Spatial Dispersal Reproduced seeds are being distributed by chance all over the searching region. So that they can be located close to their family with the average value which is equal to zero and has non-identical variances. Additionally, in every iteration, the standard deviation (SD) decreases from σ min to σ max and is calculated with via non-linear equation which is where iter max is the number of last iteration, σ iter is the corresponding iteration's SD, n is the nonlinear modulation index between 2 and 3, σ max is maximum value's SD and σ min is minimum value's SD [81].

(4) Competitive Exclusion
All weeds together with their seeds are combined to create the population for the next generation. If the population surpasses a definite maximum, the weeds with less fitness will be eradicated.
Reproduction and competition pave the way for the reproduction of the fittest weeds. Therefore, if they produce fitter seeds, seeds can remain alive in the competition.

(5) Termination Condition
Steps 2 to 4 are repeated when the iteration reaches to the maximum defined amount and weeds have the maximum fitness that are closest to optimal solution.

Performance Assessment
Quantitative approaches to determine the accuracy of the models are different between observed and estimated values, defined as a forecasting error. In the current study, the capability of models for flood prediction was evaluated using a statistical criterion namely root-mean-squared error (RMSE) as where O i and E i are observation and prediction of flood probability values, respectively, in training and testing datasets, and N is all samples.

Model Validation and Comparisons
All in all, forecasting capability of flood spatial modeling was analyzed for training and testing datasets [11] using ROC and AUROC, as a standard useful technique to evaluate the prediction capability of models [82,83], the ROC curve is a graph with specificity on the x-axis and sensitivity on the y-axis. Specificity is the number of incorrectly classified floods per total predicted non-floods while sensitivity is the number of correctly classified floods per total predicted floods [84]. The higher the AUROC value is, the better the prediction capability of models will be better [85]. The AUROC can be formulated as AUROC = ∑ TP + ∑ TN P + N where TP and TN are the number of floods that correctly classified as floods and non-floods, respectively. P and N are the number of total pixels which defines as floods and non-floods, respectively [38]. Results of model performance on training data (success rate) shows a degree of fit of a flood model with the training dataset, indicating how suitable the built model is for flood susceptibility evaluation; therefore, this is not an appropriate method to show the capability of model prediction [51,86,87]. Performance of model using testing/validating dataset (prediction rate) shows how good a model is; thus, this approach should be used for evaluation of model prediction capability. In this research, both success and prediction rates were performed using the training and testing datasets with flood susceptibility index.

Freidman Test
In the present research, a non-parametric test-namely the Freidman [88] test which is one of the most credible tests for multiple comparisons [89]-was used to find significant differences between models. This test has ranking for each row by considering the rank values of each columns. The null hypothesis (H 0 ) for the current research shows that there is no difference between prediction capabilities of flood models. If the amount of the p-value (significance) is smaller than the significance level (α = 0.05), then the null hypothesis is rejected. The biggest weakness of this technique is that it only shows whether there is difference among the models performance or not, and it does not have any capability to display pairwise comparisons among the models.

Wilcoxon Test
To overcome Freidman's test weakness, another non-parametric test-Wilcoxon test-was used. This test is used when the aim is to compare two related samples, matched samples, or paired data. The Wilcoxon test prepares pairwise comparisons between all performed flood susceptibility models. The null hypothesis for Wilcoxon test is similar to Friedman's test. The p-value and z-score were applied to assess the differences between flood susceptibility models.

Spatial Relationship between Flood Occurrence and Conditioning Factors
The spatial correlations between flood occurrence and conditioning factors were evaluated, as shown in Table 1. The highest value of SWARA belonged to the first class of 0-0.5 (0.4); therefore, the steeper the ground slope, the lower the flood occurrence probability. The SWARA values decrease when elevation increases and the lowest elevation of 328-350 m had the highest impact (0.63) on the flood occurrence. Generally, the lowest and highest elevations had the highest and lowest influences on flood occurrence, respectively. In the case of curvature, the concave landscape had the highest influence on flooding (0.46), followed by flat (0.43) and then convex (0.11). For SPI, the highest value belonged to class of 2000-3000 (0.32) and values decreased by SPI reducing. The TWI value had a direct impact on flood occurrences events; the greater the TWI, the higher the flood occurrence probability. For the present study, the highest (0.08) and lowest (0) SWARA values belonged to the highest (6.9-11.5) and lowest (1.9-3.9) TWI values. For river density, the class of 2.67-3.66 and 3.66-7.3 showed the highest probability (0.37) and the class of 0-0.4 had the lowest probability (0) of flooding. Results revealed that the more the river density, the higher the flooding probability. The SWARA values showed a decreasing trend when the distance to rivers increased, as the highest and lowest SWARA values belonged to the distance of 0-50 m (0.59) and more than 700 m (0), respectively. According to [2,9,10,28], the most prone areas to flood occurrence were the areas with the lowest elevation, lowest ground slope, flat area, and that were closest to rivers. In the case of lithology, Teryas formations had the highest impact on flooding (0.31), followed by Quaternary (0.21), Permain (0.21), Cretaceous (0.15), Jurassic (0.07), and Tertiary (0.06) formations. Results showed that the land use of water bodies had the highest influence on flooding (0.75), followed by residential area (0.15), garden (0.06), forestlands (0.02), grasslands (0.01), farmlands (0), and barren lands (0). The lowest amount of rainfall (188-333 mm) had the highest impact (0.4) on flooding. In the study area, the more the rainfall, the lower the flooding probability, due to the fact that-in mountainous areas-rainfall would increase with elevation increase; however, flooding occurs at lower elevations.

Model Comparison between the Proposed New ANFIS Ensemble Models
The newly designed three ANFIS ensemble optimization models-namely ANFIS-CA, ANFIS-BA, and ANFIS-IWO-were used with MATLAB R2016 and ArcGIS 10.2. These models were trained, like other smart models, based on a part of data. Therefore, all data of flood and non-flood points were divided into two categories with the ratio of 30 and 70 percent used as training and test data, respectively.
Training and testing datasets were used as fundamental elements of these models. The training dataset for these three hybrid models was applied to find the relationship between SWARA and values of flood (1) and non-flood (0) locations and finally using testing data, the accuracy of the built model was investigated. Accuracy of training and testing is shown in Figure 9a-f. According to Figure 9a,c,e, the RMSE for ANFIS-CA, ANFIS-BA, and ANFIS-IWO in the training was 0.314, 0.274, and 0.067, respectively. Thus, the hybrid model of ANFIS-IWO showed a better performance with training dataset and had a higher degree of fit; however, the best optimized model was the one which predicted the results of the test data with a higher accuracy. The values of RMSE for testing datasets are shown in Figure 9b,d,f. The values of RMSE for ANFIS-CA, ANFIS-BA, and ANFIS-IWO in testing process were 0.449, 0.365, and 0.359, respectively. Therefore, the ANFIS-IWO optimization algorithm indicated a better performance for both training and testing phases, followed by ANFIS-BA and ANFIS-CA.
were divided into two categories with the ratio of 30 and 70 percent used as training and test data, respectively. Training and testing datasets were used as fundamental elements of these models. The training dataset for these three hybrid models was applied to find the relationship between SWARA and values of flood (1) and non-flood (0) locations and finally using testing data, the accuracy of the built model was investigated. Accuracy of training and testing is shown in Figure 9a-f. According to Figure 9a,c,e, the RMSE for ANFIS-CA, ANFIS-BA, and ANFIS-IWO in the training was 0.314, 0.274, and 0.067, respectively. Thus, the hybrid model of ANFIS-IWO showed a better performance with training dataset and had a higher degree of fit; however, the best optimized model was the one which predicted the results of the test data with a higher accuracy. The values of RMSE for testing datasets are shown in Figure 9b,d,f. The values of RMSE for ANFIS-CA, ANFIS-BA, and ANFIS-IWO in testing process were 0.449, 0.365, and 0.359, respectively. Therefore, the ANFIS-IWO optimization algorithm indicated a better performance for both training and testing phases, followed by ANFIS-BA and ANFIS-CA. In addition, it should be noted that the processing speed of the models is also important nowadays. The processing time of 1000 iterations was then estimated as 1, 120, 260 s and 1100 s for the ANFIS-BA, ANFIS-CA, and ANFIS-IWO hybrid models by coding, respectively. As a result, the processing speed of ANFIS-CA had the least time, and the ANFIS-BA model had the highest process time. On the other hand, we examined the convergence of each model in the training phase ( Figure  10). The convergence curve was gained by graphing the calculated function of cost in each iteration of three models (Figure 11). Results showed that cost-function values of the ANFIS-CA model were fixed in the 25th iteration, which indicates an iteration convergence of this model in comparison to other models. On the contrary, ANFIS-BA and ANFIS-IWO models converged in 450th and 625th iterations indicating the slow speed of these models in achieving convergence, respectively. In addition, it should be noted that the processing speed of the models is also important nowadays. The processing time of 1000 iterations was then estimated as 1, 120, 260 s and 1100 s for the ANFIS-BA, ANFIS-CA, and ANFIS-IWO hybrid models by coding, respectively. As a result, the processing speed of ANFIS-CA had the least time, and the ANFIS-BA model had the highest process time. On the other hand, we examined the convergence of each model in the training phase ( Figure 10). In addition, it should be noted that the processing speed of the models is also important nowadays. The processing time of 1000 iterations was then estimated as 1, 120, 260 s and 1100 s for the ANFIS-BA, ANFIS-CA, and ANFIS-IWO hybrid models by coding, respectively. As a result, the processing speed of ANFIS-CA had the least time, and the ANFIS-BA model had the highest process time. On the other hand, we examined the convergence of each model in the training phase ( Figure  10). The convergence curve was gained by graphing the calculated function of cost in each iteration of three models (Figure 11). Results showed that cost-function values of the ANFIS-CA model were fixed in the 25th iteration, which indicates an iteration convergence of this model in comparison to other models. On the contrary, ANFIS-BA and ANFIS-IWO models converged in 450th and 625th iterations indicating the slow speed of these models in achieving convergence, respectively. The convergence curve was gained by graphing the calculated function of cost in each iteration of three models (Figure 11). Results showed that cost-function values of the ANFIS-CA model were fixed in the 25th iteration, which indicates an iteration convergence of this model in comparison to other models. On the contrary, ANFIS-BA and ANFIS-IWO models converged in 450th and 625th iterations indicating the slow speed of these models in achieving convergence, respectively.

Model Configuration and Generating of FSMs Using ANFIS Ensemble Models
As the main aim of flood susceptibility modeling was the reorganization of prone areas with higher probability of flooding; therefore, three optimization models-namely cultural, bees, and invasive weed optimization algorithms-were combined with SWARA-ANFIS to optimize the model for identifying flood prone areas with higher accuracy. ANFIS hybrid models were built using the training dataset and SWARA method values which were standardized between 0 and 1 in MATLAB R2016 software, The MathWorks, Inc, Massachusetts, MA, USA. Then, the constructed models were applied to the entire study area to create flood susceptibility probability (indices) and finally these indices for each pixel size (pixel-by-pixel) was used to create the final FSMs in ArcGIS 10.2 for the Haraz watershed. In the next step, indexes were reclassified into five classes (very low, low, moderate, high, and very high susceptibility) using the quantile method [18]. Three FSMs were then developed for comparative visualization, as shown in Figure 12a-c. Results demonstrated that the areas near rivers with lower slopes and altitudes had higher probabilities of flooding.

Model Configuration and Generating of FSMs Using ANFIS Ensemble Models
As the main aim of flood susceptibility modeling was the reorganization of prone areas with higher probability of flooding; therefore, three optimization models-namely cultural, bees, and invasive weed optimization algorithms-were combined with SWARA-ANFIS to optimize the model for identifying flood prone areas with higher accuracy. ANFIS hybrid models were built using the training dataset and SWARA method values which were standardized between 0 and 1 in MATLAB R2016 software, The MathWorks, Inc, Massachusetts, MA, USA. Then, the constructed models were applied to the entire study area to create flood susceptibility probability (indices) and finally these indices for each pixel size (pixel-by-pixel) was used to create the final FSMs in ArcGIS 10.2 for the Haraz watershed. In the next step, indexes were reclassified into five classes (very low, low, moderate, high, and very high susceptibility) using the quantile method [18]. Three FSMs were then developed for comparative visualization, as shown in Figure 12a-c. Results demonstrated that the areas near rivers with lower slopes and altitudes had higher probabilities of flooding.

Validation of Flood Susceptibility Maps
For validation of the three FSMs, both success and prediction rate curves were applied. The ROC plots are shown in Figure 13a,b. For the training phase, the ANFIS-IWO had the highest AUROC (0.948), followed by ANFIS-BA (0.946) and ANFIS-CA (0.942), implying that ANFIS-IWO had a fitter degree with training dataset. The validation of three obtained maps for the testing dataset revealed that ANFIS-BA (0.944) had a higher flood prediction capability for the Haraz watershed, followed by ANFIS-IWO (0.939) and ANFIS-CA (0.932). According to the relationship between AUROC and

Validation of Flood Susceptibility Maps
For validation of the three FSMs, both success and prediction rate curves were applied. The ROC plots are shown in Figure 13a,b. For the training phase, the ANFIS-IWO had the highest AUROC (0.948), followed by ANFIS-BA (0.946) and ANFIS-CA (0.942), implying that ANFIS-IWO had a fitter degree with training dataset. The validation of three obtained maps for the testing dataset revealed that ANFIS-BA (0.944) had a higher flood prediction capability for the Haraz watershed, followed by ANFIS-IWO (0.939) and ANFIS-CA (0.932). According to the relationship between AUROC and prediction capabilities of models, these models showed excellent performances based on the following classification: 0.5-0.6 (poor), 0.6-0.7 (average), 0.7-0.8 (good), 0.8-0.9 (very good), and 0.9-1 (excellent) [25,90].
Water 2018, 10, x FOR PEER REVIEW 20 of 28 prediction capabilities of models, these models showed excellent performances based on the following classification: 0.5-0.6 (poor), 0.6-0.7 (average), 0.7-0.8 (good), 0.8-0.9 (very good), and 0.9-1 (excellent) [25,90]. As the most appropriate model to predict FSM for the Haraz watershed was the ANFIS-BA algorithm, the other two hybrid models had overestimated the results in very low, high, and very high susceptibility classes and underestimated in low and moderate classes ( Figure 14). It can be seen that, according to the ANFIS-BA algorithm, the very high susceptibility class covered 17% of the study area, and analysis of flood location revealed that about 63% of total flood locations was located in this class. Overall, the very high and high classes contained about 85% of the total flood locations. The low and moderate classes covered about 50% of the study area according to the ANFIS-BA algorithm.
Results of the Freidman and Wilcoxon tests are shown in Tables 2 and 3. They revealed that since the p-value was less than 0.05 (0.00) and chi-square was more than 3.84 (standard value), the null hypothesis was rejected, indicating there were significant differences among the three flood susceptibility models.
The Wilcoxon test was carried out to check the statistical significance of pairwise differences between flood models. Based on the judgment, there were significant differences between ANFIS-CA and ANFIS-BA and ANFIS-CA and ANFIS-IWO; however, there was no significant difference between ANFIS-BA and ANFIS-IWO algorithms. As the most appropriate model to predict FSM for the Haraz watershed was the ANFIS-BA algorithm, the other two hybrid models had overestimated the results in very low, high, and very high susceptibility classes and underestimated in low and moderate classes ( Figure 14). It can be seen that, according to the ANFIS-BA algorithm, the very high susceptibility class covered 17% of the study area, and analysis of flood location revealed that about 63% of total flood locations was located in this class. Overall, the very high and high classes contained about 85% of the total flood locations. The low and moderate classes covered about 50% of the study area according to the ANFIS-BA algorithm.
Results of the Freidman and Wilcoxon tests are shown in Tables 2 and 3. They revealed that since the p-value was less than 0.05 (0.00) and chi-square was more than 3.84 (standard value), the null hypothesis was rejected, indicating there were significant differences among the three flood susceptibility models.
The Wilcoxon test was carried out to check the statistical significance of pairwise differences between flood models. Based on the judgment, there were significant differences between ANFIS-CA and ANFIS-BA and ANFIS-CA and ANFIS-IWO; however, there was no significant difference between ANFIS-BA and ANFIS-IWO algorithms.

Discussion
Flooding is known as the most frequent and destructive natural hazard. The occurrence of floods is increasing worldwide and its human losses and socio-economic damages pose huge pressure on communities. This trend is occurring in northern Iran as well, especially in the Haraz watershed. This huge burden of floods on human societies has brought about geophysicists, hydrologists, water resources engineers, geologists, and geomorphologists to study this phenomenon from different aspects such that its management can be met and its financial damages can be mitigated. One of the most important tasks of these scientists is to identify the areas within a watershed through different tools that are very vulnerable to flood generation. Field work might be preferable; however, it is costly, time-consuming, and hard to conduct. Therefore, a modeling approach is always an alternate tool to direct measurement of flood characteristics. The present study introduces three novel hybrid models-ANFIS-CA, ANFIS-BA, and ANFIS-IWO algorithms-for identifying flood prone areas of the Haraz watershed with a high precision and compares their prediction capabilities. In this study, the SWARA method was adopted to select the most effective factors and the most effective class of each of 10 conditioning factors. Our finding indicated that all factors had significant relationships with floods such that they were all selected for flood susceptibility modeling. Three flood susceptibility maps were generated using three above-mentioned novel hybrid models and they were then classified into five classes.
A comparison between the results of modeling process and five flood susceptibility classes including very low, low, moderate, high, and very high susceptibility classless on the terrain was

Discussion
Flooding is known as the most frequent and destructive natural hazard. The occurrence of floods is increasing worldwide and its human losses and socio-economic damages pose huge pressure on communities. This trend is occurring in northern Iran as well, especially in the Haraz watershed. This huge burden of floods on human societies has brought about geophysicists, hydrologists, water resources engineers, geologists, and geomorphologists to study this phenomenon from different aspects such that its management can be met and its financial damages can be mitigated. One of the most important tasks of these scientists is to identify the areas within a watershed through different tools that are very vulnerable to flood generation. Field work might be preferable; however, it is costly, time-consuming, and hard to conduct. Therefore, a modeling approach is always an alternate tool to direct measurement of flood characteristics. The present study introduces three novel hybrid models-ANFIS-CA, ANFIS-BA, and ANFIS-IWO algorithms-for identifying flood prone areas of the Haraz watershed with a high precision and compares their prediction capabilities. In this study, the SWARA method was adopted to select the most effective factors and the most effective class of each of 10 conditioning factors. Our finding indicated that all factors had significant relationships with floods such that they were all selected for flood susceptibility modeling. Three flood susceptibility maps were generated using three above-mentioned novel hybrid models and they were then classified into five classes.
A comparison between the results of modeling process and five flood susceptibility classes including very low, low, moderate, high, and very high susceptibility classless on the terrain was taken place. Basically, in the very high and high susceptibility classes, all hybrid models covered low lands (flats) and the areas around the rivers. Most of observation floods are occurred due to overbanking the flow. Also, the floodplains over the study area are surrounded by this class of susceptibility. This region located generally in the elevations less than 400 m above sea level, slope angle less than 8% in concave slopes with rainfall more than about 400 mm and the distance less than 200 m from the rivers. The obtained results of modeling process indicated that the ANFIS-CA and ANFIS-IWO have occupied more pixels of very high susceptibility class in comparison to ANFIS-BA algorithm.
In the low and moderate susceptibility class, with increasing the distance from the rivers, the areas where are covered by floods will be decreased. This region generally occupied by slope between 8% and 12%, elevation above sea level between 400 and 2000 m, concave and convex slopes, and the distance from the river between 200 and 700 m. The results of modeling process concluded that the ANFIS-BA hybrid model has covered most areas of the study area in comparison to the ANFIS-CA and ANFIS-IWO hybrid models.
The very low susceptibility class mainly included the mountainous areas and hill slopes which are much far away from the rivers compared to the other susceptibility classes. In term of topography, this zone is usually located in the areas where the slope angle and elevation above-sea level are more than 20% and 2000 m, respectively. The slope shape in this class is mainly concave and convex slopes and the distance from the river is more than 700 m. The modelling process revealed that the ANFIS-BA hybrid model has the least areas over the study area unlike to the ANFIS-CA and ANFIS-IWO hybrid models.
Three ANFIS ensemble optimization approaches-including ANFIS-CA, ANFIS-BA and ANFIS-IWO-were designed to flood modeling of the Haraz watershed for which the entire dataset was divided into two series; 70% of the data for training of models and the rest for testing. Results of model evaluation using RMSE for the best degree of fit in the training phase and the best prediction accuracy in the testing phase revealed that the ANFIS-IWO optimization algorithm indicated a better performance in training and testing phases, followed by ANFIS-BA and ANFIS-CA, while all three models provided acceptable results for flood modeling. However, results of the AUC showed that ANFIS-IWO had the best degree of fit in the training phase which was coincident with the results of RMSE; whereas, in the testing phase, the results of AUC revealed that ANFIS-BA had the best prediction power than other models. The most important disadvantage of RMSE was that it acted based only on error assessments; therefore, better approaches must be found to resolve this weak spot, as the model should be chosen based on its abilities. Treating the ROC and AUC based on true positive (TP), true negative (TN), false positive (FP), and false negative (FN) showed more accurate evaluation of the models than RMSE [39]. According to AUC, the ANFIS-IWO had the fittest degree to the training data; however, the ANFIS-BA had the highest prediction capabilities compare other methods in testing phase. As the testing dataset was not used for modeling, the model evaluation must be performed based on the testing dataset. The challenge here is determining how it is possible for a model to show highest degree of fit in the training phase (ANFIS-IWO) but not to have the highest prediction power in the testing phase. Termeh et al. [39] stated that this situation can take place in the models with the over fitting problem. The BA algorithm used a series of parameters including number of scout bees, best patches, as well as elite patches in the selected patches, amount of employed bees in the elite patches, number of recruited bees in the non-elite best patches, the neighborhood size for each patch, number of iterations, together with difference between first and last iterations value; make it robust. The bees algorithm has local and global searching capabilities that find the best locations. Finally, three designed models were also evaluated for processing speed and convergence criterion. Results indicated that although ANFIS-CA had the lowest prediction power than other models but acquired the lowest processing speed and the highest speed of convergence.
Some studies focused on spatial prediction of floods using machine learning and optimization algorithms over the world. In this case, Chapi et al. [38] introduced a new machine learning algorithm, namely bagging-logistic model tree (BA-LMT) for flood modelling. They concluded that the proposed ensemble model had outperformed the LMT, the logistic regression, the Bayesian logistic regression, and the random forest algorithms. Khosravi et al. [36] compared some soft computing benchmark machine learning algorithms-such as logistic model trees (LMT), reduced error pruning trees (REPT), naïve Bayes trees (NBT), and alternating decision trees (ADT)-for flash flood susceptibility mapping. They revealed that the ADT model had the highest prediction capability, followed by the NBT, LMT, and REPT algorithms, respectively. The results of above mentioned studies indicated the capability of machine learning algorithms in spatial prediction of floods. However, there are some other studies which conducted on optimization as evolutionary algorithms for flood susceptibility assessment such as Bui et al. [25], Hong et al. [24], Ahmadlou et al. [70] and Termeh et al. [39]. Among these studies, Bui et al. [25] introduced a new hybrid model of neural fuzzy inference system and metaheuristic optimization including evolutionary genetic and particle swarm optimization (MONF) for flood mapping. The obtained results were confirmed and compared with the J48 decision tree, the random forest, the multi-layer perceptron neural network, the support vector machine, and the adaptive neuro fuzzy inference system (ANFIS) algorithms. Additionally, Hong et al. [24] concluded that the new hybrid optimization algorithm of ANFIS-differential evolution (ANFIS-DE) outperformed and outclassed of the ANFIS-genetic algorithm (ANFIS-GA). Also, the results of Termeh et al. [39] indicated the highest capability of ANFIS-PSO ensemble model in comparison to ANFIS-DE and ANFIS-GA models. The mentioned studies indicated that the optimization algorithms for flood modeling in the different study areas over the world had more power prediction in comparison to machine learning algorithms. Basically, the results of this study which illustrated the ability of new optimization algorithm-namely bees-for flood susceptibility assessment in the study area are in agreement with other mentioned studies over the world. The bees algorithm is a more robust optimization algorithm in comparison to the CA and IWO algorithms in the study area.
What is pointed out from the above-mentioned studies is that each model has its own advantages and disadvantages. Thus, the new models must be applied and the model with the highest prediction power selected for further and future decisions. Moreover, it is better to state that some of machine learning models have weaknesses; thus, new hybrid models should be continuously introduced to resolve this problem [32]. The current research like other studies has some uncertainties especially in (1) variable factors, such as rainfall, land-use; (2) impact of climate change on flood occurrences; (3) impact of human activities on flood occurrences; and (4) determination of flood inundation map along with flood susceptibility maps. Thus, for future cases, it is recommended that 2-D flood inundation maps of flood-prone areas achieved by the current research should be produced using some commonly hydraulic models such as Hydrologic Engineering Center-River Analysis System (HEC-RAS) software.

Conclusions
FSM especially in ungauged watersheds has a scientific and practical value in the context of basin-scale water resources management and is a base for hazard and risk mapping. It can be considered as a useful tool for land-use planning, decision making, and flood disaster management. Due to complexity and non-linearity structure of watersheds, floods cannot be modeled by employing classic statistical and physically-based distributed methods; therefore, for the determination of flood prone areas in the Haraz watershed, Iran, with high accuracy, new hybrid artificial intelligence models were used. To fulfill this purpose, 201 flood locations were collected and randomly divided into two groups for training and validation dataset, and 10 flood conditioning factors were then selected. ANFIS weighted by SWARA method was applied to make an initial flood susceptibility model and three optimization models-namely CA, BA, and IWO-were adopted to optimize the models. These optimization techniques find optimal parameters and also decrease the problems of local minimum and are therefore appropriate for the training of artificial intelligence methods and optimizing their results as well, especially for solving complex problems such as flood prediction. It is obvious that more precise flood susceptibility maps can decrease the cost and damages from flooding. Finally, FSMs were constructed using ANFIS-CA, ANFIS-BA, and ANFIS-IWO and then classified into five categories using the quantile method in order to produce the susceptibility maps and the results of the achieved map represented very low, low, moderate, high, and very high susceptibility areas. Finally, these three new hybrid optimization models were evaluated using ROC and AUROC. Results revealed that ANFIS-IWO algorithm had a better degree of fit with the training dataset but ANFIS-BA had a higher prediction capability in FSM using the testing dataset followed by ANFIS-IWO and ANFIS-CA.