Determining Flood Zonation Maps, Using New Ensembles of Multi-Criteria Decision-Making, Bivariate Statistics, and Artiﬁcial Neural Network

: Golestan Province is one of the most vulnerable areas to catastrophic ﬂood events in Iran. The ﬂood severity in this region has grown dramatically during the last decades, demanding a major investigation. Accordingly, an authentic map providing detailed information on ﬂoods is required to reduce future ﬂood disasters. Three ensemble models produced by the combination of Evaluation Based on Distance from Average Solution (EDAS) and Multilayer Perceptron Neural Network (MLP) with Frequency Ratio (FR), and Weights of Evidence (WOE) are used to quantify the map ﬂood susceptibility in Golestan Province, in the north of Iran. Ten ﬂood effective criteria, namely altitude, slope degree, slope aspect, plan curvature, distance from rivers, Topographic Wetness Index (TWI), rainfall, soil type, geology, and land use, are considered for the modeling process. The ﬂood zonation maps are validated by the receiver operating curve (ROC). The results show that the most precise model is MLP-FR (AUROC = 0.912), followed by EDAS-FR-AHP (AUROC = 0.875), and EDAS-WOE-AHP (AUROC = 0.845). The high accuracies of all methods applied to illustrate their capability in predicting ﬂood susceptibility in future studies.


Introduction
Among climate-related hazards, flood is considered to have the largest impact on humans and the environment, globally [1]. In the last decades, most cities have experienced a population explosion at an alarming rate, leading to a dramatic increase in flood potential [2]. Population growth and, therefore, urbanization results in the change of agricultural land and natural vegetation into mainly impervious built-up environments, leading to higher storm-water runoff and flooding [3,4]. Flood hazards are expected to increase in frequency and severity in many parts of the world through the impacts of global change on climate, severe weather in the form of heavy rains, and river discharge conditions [5]. For all of the above reasons, it is crucial to perform flood hazard, vulnerability, and risk assessments and incorporate the results in urban planning and management to mitigate future risks [6]. Producing flood zonation maps to diagnose at-risk areas is one of the primary steps for flood hazard mitigation [7].
Iran is one of the most flood-prone countries in the world, owing to its geological and geographical setting. Characteristics of the rivers and topographic conditions of Iran, especially in the north of the country, greatly contribute to the extent and intensity of

Research Overview
In the present study, 10 flood conditioning factors were selected, and the flood in ventory map was achieved according to flood and non-flood locations. For the next stage 70% of data were selected for the modeling process, and flood susceptibility maps of three ensemble models were produced. Besides, the models' performance was determined by the remaining 30% of the data. As a result, the most accurate model was determined. Fig  ure 2 illustrates the hierarchy diagram of the applied methodology, in which the details of each step are shown.

Research Overview
In the present study, 10 flood conditioning factors were selected, and the flood inventory map was achieved according to flood and non-flood locations. For the next stage, 70% of data were selected for the modeling process, and flood susceptibility maps of three ensemble models were produced. Besides, the models' performance was determined by the remaining 30% of the data. As a result, the most accurate model was determined. Figure 2 illustrates the hierarchy diagram of the applied methodology, in which the details of each step are shown.

Research Overview
In the present study, 10 flood conditioning factors were selected, and the flood inventory map was achieved according to flood and non-flood locations. For the next stage, 70% of data were selected for the modeling process, and flood susceptibility maps of three ensemble models were produced. Besides, the models' performance was determined by the remaining 30% of the data. As a result, the most accurate model was determined.

Flood Inventory Map
In order to predict the future flood occurrences in an area, it is necessary to have records of the past flood occurrences in that region [32]. In the current research, a total of 240 past flood locations were obtained from the Golestan Water organization from 2014 to 2019. Moreover, 240 non-flood locations were randomly selected in high-elevation regions with a low possibility of flooding. According to several research studies, these locations were randomly classified into two samples of training (70% of data) and testing (30% of data) [33][34][35][36] (Figure 1).

Flood Influential Criteria
In order to identify the flooding risk for any region, it is necessary to recognize the most probable hazard forming and vulnerability indicators. Besides, relative weightage should be given prudently to the chosen indicators by considering their contributing role in aggravating the flood risk. In the present study, 10 flood influential factors were selected ( Figure 3) [5,19,37]. For simulation, all effective criteria are converted to a 30 × 30 m resolution raster. Flood criteria were classified based on the Natural Break technique [6,14]. Altitude and slope degree are significant factors, which have an inverse relationship with flooding. The slope aspect has an impact on hydrologic processes, rainfall direction, and evapotranspiration [38]. Curvature indicates the ground shape and is divided into three classes, including concave, flat, and convex. Distance from rivers influences flood magnitude and frequent lateral migration [39]. TWI shows the flow propensity for going downslope due to gravity (Equation (1)) [40] and is calculated as follows: where A s is the specific area ( m 2 m ) and β is the slope angle (degree). Rainfall is the chief source of surface runoff. The intensity and volume of channel discharge are largely governed by rainfall. In this study, 10 years of meteorological data (2009-2019) from hydrometric stations were used. Types and sources of all applied factors are presented in Table 1.

Feature Selection
Determination of the importance of influential factors in flood modeling is a necessary step. In this study, before the modeling process, Information Gain (IG) test was used to find the most effective influential factors in flood susceptibility determination [41]. Equation (2) shows the IG value formulation.
where x is each attribute that belongs to dataset H with subsets Hi. This method helps to increase the prediction power and model's efficiency [42]. The value of IG varies from 0 to 1, in which values close to 1 indicate a high prediction of flood intensity factors, while a value of 0 means that flood influential factors do not affect flood occurrence.

Feature Selection
Determination of the importance of influential factors in flood modeling is a necessary step. In this study, before the modeling process, Information Gain (IG) test was used to find the most effective influential factors in flood susceptibility determination [41]. Equation (2) shows the IG value formulation.
where x is each attribute that belongs to dataset H with subsets H i . This method helps to increase the prediction power and model's efficiency [42]. The value of IG varies from 0 to 1, in which values close to 1 indicate a high prediction of flood intensity factors, while a value of 0 means that flood influential factors do not affect flood occurrence.

Frequency Ratio
Frequency ratio (FR), as a widely used bivariate statistical model, determines the probabilistic relationship between flood occurrences and different variables [14,43,44]. FR for each class of various factors is determined by Equation (3) [45].
where A is the number of flood pixels per class; B is total flood pixels; C is the number of pixels per class, and D is the total pixels of the study area.

Weights of Evidence
Weights of Evidence (WOE) determines the relationship between various criteria and flood occurrences [11,[46][47][48]. This method calculates the weights of each class as follows (Equations (4) and (5)): where B and B are indicative of the presence and absence of every criterion, in respective order. in addition, A and A are reagents of the presence and absence of flood events. The standard deviation is determined by Equation (6).
where S 2 W + and S 2 W − are variances of positive and negative weights, computed using the Equations (7) and (8), as the following: It is noted that N represents the number of unit cells. W final is also determined by Equation (9):  [26]. To evaluate different alternatives, the EDAS model employs positive and negative distances from an average solution. This technique includes six steps which are described below: Step 1. Determining the decision matrix: where n shows alternatives and m is indicative of criteria. x ij also shows the status of the i alternative in the j criterion.
Step 2. Calculation of the average solution of criteria: AV j is calculated using Equation (12).
Step 3. Computation of the positive and negative distance from the average value: If the criteria have a positive impact on flooding, the distances from the average value are calculated using Equations (13) and (14): where PDA is the positive distance from the average value and NDA is the negative distance from the average value.
On the other hand, in the case of the negative impact of criteria on flooding, positive and negative distances are computed using Equations (15) and (16).
Step 4. Determination of the optimistic (SP i ) and pessimistic (SN i ) weighted sum of the positive and negative distances for alternatives: For this step, Equations (17) and (18) are calculated as follows: w j is the weight of the j th criterion, in which ∑ m j=1 w j = 1. It should be highlighted that the assigned weights are determined using the AHP technique [49]. The implementation of this method is explained in the literature [17,50,51].
Step 5. Normalization of the SP and SN values: Using Equations (19) and (20), the normalized values of the optimistic and pessimistic weighted sum of the positive and negative distances for all alternatives are computed, respectively.
Step 6. Alternatives ranking: The final score of each alternative (AS i ) is determined using Equation (21) and then they are ranked.
It should be noted that an alternative with the greatest AS i value is the best choice among all alternatives.

Multilayer Perceptron Layer
Multilayer perceptron layer (MLP) as an artificial neural networks model is a precise technique in natural hazard prediction [52,53]. It includes three stages as the following: Step 1. Building the architecture of the network: In this study, the input layer contains ten neurons, corresponding to flood influential criteria. The output layer is composed of two neurons, indicating flood and non-flood pixels.
Step 2. Training the network: In this stage, in order to raise the accuracy of outputs, a backpropagation algorithm was employed [20]. In this procedure, weights of layers were computed via reverse calculation processes, and therefore, neuron numbers in the hidden layer were assigned by trial and error. This process continues until reaching the minimum RMSE [34].
In the current research, 70% of input data were chosen randomly for training and 30% for testing purposes to verify the network performance.
Step 3. Testing: The remaining 30% of data were selected to determine the model's efficiency in this stage (Equation (22)). Besides, the two factors of sensitivity and specificity were used to evaluate the model's predictive ability (Equations (23) and (24)).
where TP indicates correctly grouped flood pixels, TN shows correctly grouped non-flood pixels, and, FP and FN express numbers of incorrectly grouped pixels.

Validation of Models
Validation of the applied models was evaluated on both training and testing datasets using the ROC curve. Validating by training dataset indicates the goodness-of-fit of the models, while validating through testing datasets indicates the predictive capability of the models. ROC curve determines the model's performance in diagnosing flood-prone areas and has been employed in several studies [24,[54][55][56]. The area under the ROC curve (AUC) implies the precision of methods, varying between 0.5 and 1 [35,55].

Importance of Influential Factors on Flood Occurrence
The values of IG for flood influential factors are presented in Table 2. It can be observed that the slope factor had the highest importance on flooding due to the largest IG value (IG = 0.78), followed by distance from rivers (IG = 0.73), altitude (IG = 0.69), land use (IG = 0.57), lithology (IG = 0.51), plan curvature (IG = 0.47), TWI (IG = 0.42), rainfall (IG = 0.35), soil type (IG = 0.23), and slope aspect (IG = 0.12), respectively. As indicated, all the values are higher than 0, which implies the effectiveness of all factors on flood occurrences. Even though rainfall is an initial and key factor in flooding, it has one of the lowest IG values, which can be explained by the fact that morphological and topographical conditions may have more influence on flooding than rainfall intensity. The influential factors derived from DEM have a high impact on the flow accumulation process because the surface runoff moves to regions with low slope degrees and low altitudes where flow accumulates. Moreover, the distance from rivers factor had a high IG value because the rivers are the sources of flooding.

Coefficients Calculation
Results of AHP, FR, and WOE coefficient computations are displayed in Table 3 for each class. As indicated, according to the AHP technique, altitude, distance from rivers, plan curvature, slope, rainfall, and land use had the highest influence on flood occurrence, in respective order. On the other hand, soil type, TWI, lithology, and slope aspect had the lowest impact on flooding. The CR value for pairwise comparison was equal to 0.019 (<0.1), which expresses the reasonable compatibility of weighting.  Higher FR values imply a greater relationship between flood criteria and occurrences [14,43]. The results showed that the last class of altitude factors had the greatest effect on flooding due to the highest FR value. On the other hand, the other four classes had lower FR values. It can be concluded that the flood probability decreases in the case of altitude increases. Likewise, the FR weight for the slope factor was the highest in the last class and it had a lower value in other classes. It is due to the fact that the lower slopes have more possibilities of flooding and vice versa. Regarding the slope aspect factor, the greatest FR value was obtained by the flat category. However, the lowest value was determined by the southeast class. This finding is in accordance with the findings of Termeh et al.  [44], as it is found that the flood potential in the flat aspect is higher than in other categories. Results of plan curvature indicated that the flat shape had the greatest FR value, followed by convex and concave. Flat shapes had more possibility of flooding, compared to other categories, due to keeping surface runoff for more time. For the distance from rivers factor, the last two classes had the greatest influence, owing to higher FR weights. In the case of the TWI factor, the last class had the highest weight, while the first class achieved the lowest value. As is shown for the rainfall criterion, the second and last categories positively influence flood occurrence. The growth in rainfall intensity had no impact on flooding, owing to the fact that by increasing altitude, precipitation would increase as well. The most susceptible category of soil factor was silty loam, followed by Mollisols. Also, other categories negatively affect flooding. In the case of the lithology factor, Proterozoic and Cenozoic classes had a great effect on flooding, followed by Mesozoic and Paleozoic, respectively. For the land-use factor, the highest weight was observed in urban-coastal areas. In general, these residential zones are more prone to flood risk, due to their proximity to rivers and having large populations [57]. Furthermore, the dense forests-mountainous areas had the least FR values and less influence on flood events.
According to the WOE model, the last category of altitude had the greatest weight and positively influenced flooding. At the same time, the other categories had negative weights. As indicated for the slope factor, by increasing the degree, final weights decrease. Regarding the slope aspect, the highest weight was obtained by the flat category. In contrast, the southeast category had the least WOE weight. For distance from rivers, classes of <500 m and 500-1000 m had the highest effect. Although the last two classes of the TWI factor had positive weights, the other three classes had negative impacts on flood occurrences. Similar to the FR method, the second and last categories of rainfall factors had positive effects on flooding, but other categories had adverse impacts. The WOE analysis for the soil type displayed the more susceptible category related to Mollisols, followed by silty loam. In contrast, other categories negatively affect flood events. The result of lithology revealed that the Cenozoic class had the greatest effect on floods, but the Paleozoic category had a negligible impact. In the case of land use factor, the highest and lowest WOE values were obtained for urban-coastal and dense forests-mountainous areas, in respective order.

EDAS-FR-AHP and EDAS-WOE-AHP Methods
In order to produce flood zonation maps of EDAS-FR-AHP and EDAS-WOE-AHP methods, values of FR-AHP and WOE-AHP were multiplied by pixel values of the EDAS technique. Ultimately, flood zonation maps were categorized into five classes using the Quantile algorithm in GIS [13,23,58]. The following maps indicate the flood-prone areas by the EDAS-FR-AHP and EDAS-WOE-AHP models. As is shown, very low and low categories covered 29.82% and 37.62% of the region. The moderate category was also spread over 17.3% and 20.50% of the Province, and finally, very high and high categories covered 52.88% and 41.88% of the Golestan Province as shown in Figures 4 and 5.

EDAS-FR-AHP and EDAS-WOE-AHP Methods
In order to produce flood zonation maps of EDAS-FR-AHP and EDAS-WOE-AHP methods, values of FR-AHP and WOE-AHP were multiplied by pixel values of the EDAS technique. Ultimately, flood zonation maps were categorized into five classes using the Quantile algorithm in GIS [13,23,58]. The following maps indicate the flood-prone areas by the EDAS-FR-AHP and EDAS-WOE-AHP models. As is shown, very low and low categories covered 29.82% and 37.62% of the region. The moderate category was also spread over 17.3% and 20.50% of the Province, and finally, very high and high categories covered 52.88% and 41.88% of the Golestan Province as shown in Figures 4 and 5.

MLP-FR Method
After several trials and errors, the minimum RMSE value stood at 0.015 and remained stable. Resultantly, the final MLP-based structure was obtained as the following ( Figure 6):

MLP-FR Method
After several trials and errors, the minimum RMSE value stood at 0.015 and remained stable. Resultantly, the final MLP-based structure was obtained as the following ( Figure  6): In order to evaluate this model's performance, statistical measures were calculated for both training and testing datasets, which are displayed in Table 4. Regarding the results, the MLP-FR method had high precision in both training and testing samples. Hence, this method is suitable for predicting the flood-prone zones in this study area.  In order to evaluate this model's performance, statistical measures were calculated for both training and testing datasets, which are displayed in Table 4. Regarding the results, the MLP-FR method had high precision in both training and testing samples. Hence, this method is suitable for predicting the flood-prone zones in this study area. The result of the MLP-FR model is presented in Figure 7, which is also grouped into five classes. Concerning this map, very low and low categories accounted for 27.17%, while the moderate category covered around 19.02% of the area. Additionally, the largest part of the study location (53.81%) was covered by high and very high classes of flood potential.

Validation
In order to represent the validation results, both success and prediction rate, with th help of training and validating datasets were investigated (Figure 8). Regarding the suc cess rate, MLP-FR was the most accurate model among the three ensemble methods (AUC = 0.

Validation
In order to represent the validation results, both success and prediction rate, with the help of training and validating datasets were investigated (Figure 8

Discussions
Producing accurate flood zonation maps is considered a crucial way for pr measures and flood mitigation across the world. The latest destructive flooding tan Province led authors to apply several new models to identify flooding region three ensemble models, namely EDAS-FR-AHP, EDAS-WOE-AHP, and MLPemployed to generate flood zonation maps. Flood occurrences are affected by effective factors, demanding to recognize them. According to the feature method, the slope factor was the most effective factor, followed by distance fro altitude, and land use. Among different factors, the least effective one was the pect. This result is in agreement with some previous studies [9,19,23,43]. The v findings showed that the proposed MCDM-based methods (EDAS) predict the f tential of the study location with high precision. The main advantage of the tech to present optimistic and pessimistic solutions that provide flexibility in the fina tion for decision-makers. Also, the EDAS approach is performant in solving s problems due to the computation of the average solution by the arithmetic mean The upper crescent part of Golestan province had low altitude and low slope As a result, areas with low altitudes and low slopes receive more runoff during rainfall. These are the main reasons behind the large proportion of high and v susceptible areas to flood. According to the maps generated by three models, t

Discussions
Producing accurate flood zonation maps is considered a crucial way for preventive measures and flood mitigation across the world. The latest destructive flooding in Golestan Province led authors to apply several new models to identify flooding regions. Hence, three ensemble models, namely EDAS-FR-AHP, EDAS-WOE-AHP, and MLP-FR were employed to generate flood zonation maps. Flood occurrences are affected by different effective factors, demanding to recognize them. According to the feature selection method, the slope factor was the most effective factor, followed by distance from rivers, altitude, and land use. Among different factors, the least effective one was the slope aspect. This result is in agreement with some previous studies [9,19,23,43]. The validation findings showed that the proposed MCDM-based methods (EDAS) predict the flood potential of the study location with high precision. The main advantage of the techniques is to present optimistic and pessimistic solutions that provide flexibility in the final evaluation for decision-makers. Also, the EDAS approach is performant in solving stochastic problems due to the computation of the average solution by the arithmetic mean [26].
The upper crescent part of Golestan province had low altitude and low slope degrees. As a result, areas with low altitudes and low slopes receive more runoff during and after rainfall. These are the main reasons behind the large proportion of high and very high susceptible areas to flood. According to the maps generated by three models, the areas with very low to moderate flood susceptibility are mainly located in the southern and southwestern parts of the region, where the Alborz Mountain range acts as a barrier and prevents the entry of humidity derived from the Caspian Sea into these areas. Thus, these regions have low rainfall and a dry climate. As a result, the possibility of flood occurrence in this part of Golestan Province is low. Besides, the areas with high flood susceptibility are located in the northern and northwestern parts. Evaporation of the Caspian Sea increases the humidity in these regions giving rise to heavy rainfall that can lead to floods. The proximity of the water table to the ground surface, as well as the saturated soil, increases the flood intensity.
A comparison of bivariate statistical methods such as FR and WOE was performed by several researchers. In a study in Golestan Province, the flood susceptibility was assessed using FR (AUC = 76.47%) and WOE (AUC = 74.74%), in which both models had reasonable accuracy and almost similar results [44]. It should be highlighted that the ensemble models of this study had higher accuracy, which proved their capability in the study area. Likewise, for flood zonation determination across Mazandaran Province in Iran, the FR model showed higher prediction performance than the WOE method, which is in line with the reported studies [14]. In another study in Golestan Province, the use of a novel MCDM model combined with bivariate statistics was investigated and both MAIRCA-FR and MAIRCA-WOE indicated reasonable precision [10]. Research in Bangladesh found that the precision of ensemble methods was more than standalone methods [59]. In another research study in the China basin, it was revealed that multilayer-based methods had better efficiency in comparison to the MCDM methods, such as TOPSIS and VIKOR [23], which concurred with the findings of the current study. This is because of the independence of flood factor weights based on the experts' judgment [37,60]. The evaluation of the validity of standalone and ensemble of the MLP method in a study in Golestan Province indicated that the accuracy of the standalone method was not nearly as much as that of the ensemble model. It was also shown that the northern parts of Golestan Province were prone to severe floods rather than other areas [61].
Considering that in the current study, FR-AHP and WOE-AHP were combined with a new MCDM model (EDAS). The FR method was also applied as an ensemble with the MLP technique, and the precision of the flood susceptibility was improved compared to the previously used methods.

Conclusions
This research focused on the determination of flood risks across the Golestan Province in Iran, using three ensemble models, including EDAS-FR-AHP, EDAS-WOE-AHP, and MLP-FR. A flood inventory map was generated by 240 flooding and non-flooding locations, and then they were divided randomly into two categories of training (70% of data) and testing (30% of data). Ten flood effective criteria were chosen for the modeling process. Results showed that 52.88, 41.88, and 53.81% of the study region was covered by high and very high flood hazard risk for EDAS-FR-AHP, EDAS-WOE-AHP, and the MLP-FR, respectively. Accordingly, the MLP-FR model was the most efficient, owing to the highest AUROC value in training (0.934) and testing (0.912) datasets, followed by EDAS-FR-AHP and EDAS-WOE-AHP, with the success precision of 0.901 and 0.883 and predictive precision of 0.875 and 0.844, respectively. Newly employed combined methods (EDAS-based techniques) show reasonable precision in flood zonation prediction, as well as the MLP ensemble model. Resultantly, the MLP-based model was more precise than the MCDMbased model in this research owing to over-fitting prevention and noise problems, and also no dependence on the experts' judgment for weights assignments, causing uncertainty in MCDM models, which is the most important limitation of this work. It is recommended to use EDAS-based models in other study areas in further research to ensure its validity in predicting flood potential areas.