Modeling Spatial Flood using Novel Ensemble Artificial Intelligence Approaches in Northern Iran

Abstract: The uncertainty of flash floods makes them highly difficult to predict through conventional models. Physical hydrologic models of flash flood prediction are very difficult to compute for any large area, as they require a lot of data and time. Therefore, remote sensing data-based models (from statistical to machine learning) have become highly popular due to open data access and shorter prediction times. There is a continuous effort to improve the prediction accuracy of these models by introducing new methods. This study is focused on flash flood modeling through novel hybrid machine learning models, which can improve the prediction accuracy. Hybrid machine learning ensemble approaches that combine three meta-classifiers (Real AdaBoost, Random Subspace, and MultiBoosting) with J48 (a tree-based algorithm that can be used to evaluate the behavior of the attribute vector for any defined number of instances) were used in the Gorganroud River Basin of Iran to assess flood susceptibility (FS). A total of 426 flood positions as dependent variables and a total of 14 flood conditioning factors (FCFs) as independent variables were used to model the FS. Several threshold-dependent and threshold-independent statistical tests were applied to verify the performance and predictive capability of these machine learning models, such as the receiver operating characteristic (ROC) curve of the success rate curve (SRC) and prediction rate curve (PRC), efficiency (E), root-mean-square error (RMSE), and true skill statistics (TSS). The valuation of the FCFs was done using AdaBoost, frequency ratio (FR), and Boosted Regression Tree (BRT) models. In the flooding of the study area, altitude, land use/land cover (LU/LC), distance to stream, normalized differential vegetation index (NDVI), and rainfall played important roles.
The Random Subspace J48 (RSJ48) ensemble method, with an area under the curve (AUC) of 0.931 (SRC) and 0.951 (PRC), outperformed the other models. Based on the findings of the validation methods, the FS maps prepared using the machine learning ensemble techniques have high robustness and can be used to advise flood management initiatives in flood-prone areas.

considered to be the most destructive flood in Iranian history [17]. Another example is a flood that occurred on 17 March 2019, during which 12,000 houses, facilities, agricultural land, and gardens, most of the towns, including Aqqala and Gomishan, and at least 70 villages were affected [67]. Given such devastating impacts arising from flooding, identifying the most susceptible areas of this flood-prone region is crucial for targeting management efforts to reduce flood damage.


Methodology
Flood susceptibility maps were prepared as part of this study using three novel ensemble ML algorithms. The analysis was carried out in a number of main steps. Firstly, two kinds of data, i.e., flood incidence and flood conditioning factors (FCFs), were gathered. Secondly, the flood inventory map and thematic layers of the FCFs were prepared (elevation, slope, plan curvature, topographic position index (TPI), topographic wetness index (TWI), convergence index (CI), stream power index (SPI), drainage density (DD), stream distance, precipitation, lithology, land use/land cover (LU/LC), normalized difference vegetation index (NDVI), and soil type). Third, 426 flood locations were reported, 70% of which were used to prepare the FS models and 30% of which were used to validate the models produced from the training datasets. In step four, the tolerance and variance inflation factor (VIF) were used to test multicollinearity among the selected FCFs. The VIF estimates the multicollinearity problem among the independent factors; if it is present, then that factor should not be used for modeling. In step five, the contribution of the factors was analyzed using the frequency ratio (FR), boosted regression tree (BRT), and AdaBoost. In step six, in order to produce the flood susceptibility map (FSM), training data was first applied to predict the spatial flood susceptibility (FS) using the base classifier J48. Then, FSMs were prepared using ensemble models built from three meta-classifiers (Real AdaBoost J48, Random Subspace J48, and MultiBoosting J48). Finally, we validated the FSMs generated by the ensemble models using the ROC curve, efficiency, accuracy, and true skill statistics (TSS), and analyzed the sensitivity of the individual flood conditioning factors to the four models. Figure 2 summarizes the key steps in our methods.
The J48, Real AdaBoost J48, Random Subspace J48, and MultiBoosting J48 models were run in RStudio using the CRAN RWeka, party, and regRSM packages. Finally, the layouts of the flood susceptibility maps were prepared using ArcGIS software.

Flood Inventory Map (FIM)
The flood locations were regarded as the dependent variable for the calibration of the FSMs, and the effective FCFs were regarded as the independent variables. Using the dependent and independent variable data, the FSMs were prepared based on the J48 and ensemble models. It is necessary to prepare a FIM for spatial flood probability prediction [10,36]. In previous works, several sources were used for preparing the FIM, such as governmental records, newspaper articles, field surveys, etc. The Modified Normalized Difference Water Index (MNDWI) derived from Sentinel-2 satellite images through the Google Earth Engine can also be used for preparing the FIM [56]. In this study, for the period 2001-2019, we used records from the Iranian Ministry of Water, investigation documents on disaster management in the Golestan Province, and three field surveys (carried out on 8 February 2018, 2 April 2019, and 21 May 2019) at randomly selected points to construct the flood inventory map. Flood points were selected randomly among the flood sites having a flood frequency of one flood per year. We listed 426 flood sites in the research region (Figure 3), which we randomly divided into two datasets, i.e., a training dataset (70%) and a test dataset (30%). We also selected the same number of non-flood sites, randomly, for training as well as validating the models.
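As an illustration (not the authors' code), the 70/30 random partition of the inventory described above can be sketched in Python with NumPy. The seed and variable names are our assumptions; the 426 flood sites and the equal number of non-flood sites follow the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical inventory: 426 flood points (label 1) and an equal number
# of randomly sampled non-flood points (label 0), as described in the text.
n_flood = 426
labels = np.concatenate([np.ones(n_flood), np.zeros(n_flood)])
indices = rng.permutation(len(labels))

# 70% training / 30% testing split of the shuffled inventory.
split = int(0.7 * len(indices))
train_idx, test_idx = indices[:split], indices[split:]

print(len(train_idx), len(test_idx))  # 596 256
```

The split is performed on shuffled indices so that flood and non-flood points are mixed before partitioning.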


Generating Flood Conditioning Factors (FCFs)
Many factors control the flood susceptibility (FS) of an area, but there are no standard criteria for selecting the FCFs to depict the spatial FS. We initially selected 16 FCFs on the basis of previous literature, but after multicollinearity analysis eventually selected six topographical factors (elevation, slope, plan curvature, SPI, TPI, and CI), four hydrological factors (TWI, DD, stream distance, and rainfall), two environmental factors (LU/LC and NDVI), as well as lithology and soil type for FS modeling [10,35]. A geological map from the Geological Survey of Iran, rainfall data from the Islamic Republic of Iran Meteorological Organization, Landsat 8 Operational Land Imager (OLI) imagery from the United States Geological Survey (USGS) (https://earthexplorer.usgs.gov/), the Phased Array type L-band Synthetic Aperture Radar Digital Elevation Model (PALSAR DEM) of 12.5 × 12.5 m resolution from the Alaska Satellite Facility, and soil data from the Mazandaran Agricultural and Natural Resources Research Centre were obtained for the preparation of the thematic FCF layers. The PALSAR DEM is considered the most accurate DEM available free of charge [68]. The level of vertical accuracy is most important for mapping the spatial flood susceptibility. Following Gesch et al. [69], the uncertainty in the DEM was calculated by comparing Advanced Land Observing Satellite (ALOS) DEM elevations with those of ground control points (GCPs) [69].
In this analysis, ALOS DEM was used with a vertical precision of 0.3 m.
For geomorphological research and modeling natural disasters, a reliable topographical data source is essential [70]. The interferometric synthetic aperture radar (InSAR) system and the ALOS-PALSAR data (slave image: 12 August 2009; master image: 18 September 2012) were used in our research to construct a high-precision DEM. The most critical phases in the generation of an InSAR DEM include phase estimation and the transformation from phase to height [71]. In Figure 4, the steps for generating a DEM using InSAR are presented. PALSAR sensor images for the study area were downloaded (www.nasa.gov) to provide a DEM of the region. For the entire study area, the PALSAR DEM was generated using the mosaicing method incorporated in ENVI 5.1 software. To boost the accuracy and reduce the errors of the DEM derived from PALSAR, filters were used. The right filters were chosen using experiments, defects, and recommendations from other researchers [72]. A basic filter (smooth 2 × 2) was used to eliminate errors and enhance detection precision, and the hydrological calculation precision was improved with the Fill Sinks filter.
Slope: flooding is specifically related to the gradient of slopes [73-75]. The surface slope of the area (Figure 5a) is an important factor influencing the frequency of flooding, since there is very little opportunity for rainwater to penetrate steep hills and much more opportunity for rainwater to penetrate plains and gently sloping land [30]. The slope determines not only the velocity of the flow, but also the intensity and volume of run-off.
Steeper slopes are more vulnerable to flash floods, but gentle slopes are more likely to experience longer duration flooding. Ultimately, the combination of steep slopes in the upper part of the watershed and gentle slopes in the lower portions magnifies the local flood hazard. The stagnation of water for longer periods mostly affects the flat regions and the most devastating floods also occur in the lower sloping areas. Therefore, the slope of the study area significantly affects the intensity and duration of any flood event.
Elevation: among the different FCFs (Figure 5b), there is a reciprocal relationship between flooding and elevation [30]. The areas at a low altitude are more susceptible to floods than the higher altitude regions [76]. In reality, flooding in high elevation areas is almost impossible [77]. The height of the basin varies from 64 to 3686 m, suggesting the existence of low-lying flat plains as well as hilly terrain and mountains, while the bulk of the sample region is flat plains (Figure 5b).
Plan curvature: the plan curvature is a morphological element regulating the vulnerability to flooding by controlling the water flow on a region's surface [44]. The plan curvature in the region analyzed ranges from −25.79 to 31.61 (Figure 5c). The area's plan curvature is categorized into three types (concave, flat, and convex), whereby concave areas preserve water and convex areas induce runoff. Interestingly, both concave and convex areas make up similar proportions of the study basin (45% each), which indicates that the basin is susceptible to both high run-off and flood retention.
Topographic wetness index (TWI): introduced by Beven and Kirkby [78], the TWI defines the spatial wetness status in the basin, which influences the susceptibility to flooding [28]. The TWI represents the effect of topography at every point in the catchment on the output of runoff and the volume of flow accumulation [79]. The TWI is calculated from the following Equation (1):

TWI = ln(A_s / tan β) (1)

where A_s represents the upslope area per unit contour length (m²/m) and β is the slope gradient (degrees). The TWI in this basin ranges from 0.55 to 24.09 (Figure 5d).
Topographic position index (TPI): TPI is defined as the altitudinal difference within a specified radius (R) between the central point (Z_0) and the average elevation (Z̄) around it [80,81]. Utilizing Equations (2) and (3), the TPI is measured:

TPI = Z_0 − Z̄ (2)

Z̄ = (1/n_R) Σ_{i∈R} Z_i (3)

where n_R is the number of cells within radius R.
The TPI value depends on the radius (R) of the landscape unit considered. A small R reveals small valleys and ridges [82]. TPI ranges from −142.59 to 152.29 within the study area (Figure 5e).
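The TWI and TPI definitions above can be sketched numerically. This is a minimal illustration with function names and toy inputs of our own choosing, assuming a specific catchment area in m²/m and slope in degrees:

```python
import numpy as np

def twi(upslope_area, slope_deg):
    """TWI = ln(A_s / tan(beta)); A_s is the specific catchment area (m^2/m)."""
    beta = np.radians(slope_deg)
    return np.log(upslope_area / np.tan(beta))

def tpi(z_center, z_neighbors):
    """TPI = elevation of the central cell minus the mean elevation
    of the cells within the chosen radius."""
    return z_center - np.mean(z_neighbors)

print(round(twi(100.0, 5.0), 2))                   # 7.04
print(tpi(110.0, [100.0, 105.0, 95.0, 100.0]))     # 10.0 (positive -> ridge-like)
```

A positive TPI marks ridge-like positions and a negative TPI valley-like positions, which is why low (negative) TPI classes concentrate flood cells later in the analysis.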
Convergence index (CI): the CI is a major terrain element showing the relief structure of the study region as a series of channels (convergent areas) and ridges (divergent areas). It was proposed by Kiss [83]. The CI is calculated using the following Equation (4):

CI = (1/8) Σ_{i=1}^{8} θ_i − 90° (4)

where θ_i is the angle between the aspect of the ith neighboring cell and the direction to the central cell. The value of CI varies between −100 and +100 in the study region (Figure 5f).
Stream power index (SPI): the SPI demonstrates the erosive capacity of surface running water [84], which has been demonstrated to affect flood vulnerability [85]. The DEM was used to derive the SPI using the following Equation (5):

SPI = A_s × tan β (5)

where A_s is the upstream contributing area and β is the slope gradient. The spatial distribution of the SPI ranges from 6.27 to 26.35 (Figure 5g) in the basin.
Distance to stream: the Euclidean distance buffer method in ArcGIS was applied to extract stream buffers (Figure 5h). Areas along the main river are more susceptible to floods, while remote areas are less susceptible [86].
Drainage density (DD): drainage density (Figure 5i) strongly influences the frequency of floods and was calculated from the DEM using the ArcGIS line density tool [87]. This research identified four DD classes (<0.33, 0.33-0.51, 0.51-0.7, and >0.7).
Rainfall: for the purpose of plotting average rainfall, thirty years (1986 through 2016) of rainfall data were obtained from 44 meteorological stations (Figure 1). The thin plate spline method, first suggested by Wahba, is an effective and efficient interpolation technique. In order to prepare the raster layer from the rainfall data collected at the different weather stations, a modified orthogonal least-square thin plate spline (TPS-M) was used. This technique is more reliable than traditional interpolation methods. The work of Boer et al. [88] offers a detailed explanation of the technique. Rainfall plays a central role in flood occurrence within a region, as it is the source of water in the channels as well as in the ground [89]. We defined five rainfall categories for the purpose of this study, namely <419.71 mm, 419.71-547.87 mm, 547.87-682.61 mm, 682.61-820.63 mm, and >820.63 mm.
Lithology and soil type: lithological and soil maps are of considerable significance when assessing flood-prone areas, as the characteristics of the soil, such as density, degree of permeability, composition, and form of substrate, directly influence the drainage cycle. The lithology and soil of a region regulate flooding by regulating erosion and infiltration in a watershed [85]. The lithological and soil maps (Figure 5k,l) for the study area were drawn up on the basis of an available geological map (scale 1:100,000) and a soil map (1:250,000) from the Research Centre for Agricultural and Natural Resources in Mazandaran. Nine lithological units (Supplementary Table S1) and seven soil classes (Entisols, Inceptisols, Salt Flats, Alfisols, Aridisols, Inceptisols, and Mollisols) were identified.
LU/LC and NDVI: Landsat 8 OLI images were used for producing the LU/LC and NDVI maps (Figure 5m,n) in ArcGIS. Before preparing the LU/LC and NDVI layers, the satellite images were pre-processed through atmospheric and radiometric correction. LU/LC classes with bare land and concrete surfaces increase overland flow, while dense forest and vegetation limit overland flows [26]. Moreover, 632 ground control points (GCPs) were identified to test the classification accuracy of the LU/LC map. Out of the 632 GCPs, 190, 239, 6, 10, 4, 56, 116, 8, and 3 GCPs were taken for the forest cover, agriculture, residential, orchard, bare land, dry farming, rangeland, wood land, and water/wetland classes, respectively. The Kappa coefficient was calculated using the following Equation (6) [90]:

κ = (P_o − P_e) / (1 − P_e) (6)

where P_o is the observed agreement and P_e is the agreement expected by chance. The average precision of the LU/LC map was reasonable (Kappa coefficient = 0.972). Nine LU/LC categories were created for the study basin. The NDVI was calculated from the Landsat 8 OLI using the near-infrared and red bands. A value above zero (towards +1) indicates vegetation, and a value below zero (towards −1) indicates a water body in the region. Three NDVI categories were defined (i.e., <0.201, 0.201-0.369, and >0.369) to represent vegetation density. An NDVI value >0 indicates the presence of vegetation, and with increasing values, the probability of the area being forest increases.
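The Kappa coefficient used to validate the LU/LC classification can be computed from a confusion matrix as follows. This is a generic sketch with a made-up two-class matrix, not the paper's actual accuracy data:

```python
import numpy as np

def kappa(confusion):
    """Cohen's Kappa from a confusion matrix (rows = reference, cols = mapped):
    kappa = (P_o - P_e) / (1 - P_e)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                        # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 2-class check: 90% observed agreement, 50% chance agreement.
cm = [[45, 5],
      [5, 45]]
print(round(kappa(cm), 2))  # 0.8
```

The same formula extends to the nine LU/LC classes by supplying a 9 × 9 confusion matrix built from the 632 GCPs.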

Multicollinearity Test of Effective Factors
Pradhan [91] reported a standard measure for detecting collinear variables among the factors used in probability modeling. Multicollinearity is a condition in statistics where one predictor variable can be predicted linearly from the others, with a large degree of accuracy, in a multiple regression model. The independent variables that can be used further in the model can be identified with the help of variance inflation factors (VIF) [92-94]. A VIF value greater than 5 suggests a concern with multicollinearity [95,96]. If the test identifies any collinear variable, then it should not be used for predictive modeling. In this study, fourteen independent variables were used for the flood susceptibility analysis based on previous flood-related research, meaning that assessment of multicollinearity was important. The VIF was determined among the variables using the following Equation (7):

VIF_j = 1 / (1 − R_j²) (7)

where R_j² is the coefficient of determination obtained by regressing the jth factor on all the remaining factors; the tolerance is defined as 1 − R_j².
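The VIF computation in Equation (7) can be sketched directly with ordinary least squares. This is our own minimal implementation on synthetic data, not the authors' workflow:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing the jth
    factor on all remaining factors (OLS with intercept)."""
    X = np.asarray(X, dtype=float)
    values = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        values.append(1.0 / (1.0 - r2))
    return np.array(values)

# Nearly independent synthetic factors should yield VIFs close to 1;
# strongly correlated factors would push the VIF above the threshold of 5.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))
print(vif(X).round(2))
```

Duplicating a column (perfect collinearity) would drive R_j² toward 1 and the VIF toward infinity, which is exactly the situation the screening step excludes.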

Analysis of the Relationship between FCFs and Flood Occurrences Using the Frequency Ratio (FR) Model
The FR was used to determine the relation between the FCFs and the frequency of floods. For a given class of a variable, the FR can be defined as the ratio of the likelihood of flood to non-flood, which can be calculated as the percentage share of flood pixels divided by that of the pixels of the domain class [97].
The FR values represent the importance of the class variables in terms of flood probability:

FR = (L_i / L) / (C_i / C)

where L_i is the number of flood cells in the ith category, C_i is the total number of cells in the ith category, L is the total number of flood cells, and C is the total number of cells. An FR value >1 shows a greater flood cell concentration in a given class as compared to the total data layer. On the other hand, a value below 1 for a certain category suggests a small concentration of flood cells in that class of the data layer.
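The FR definition above translates directly into a per-class computation. The class counts below are invented for illustration only:

```python
import numpy as np

def frequency_ratio(flood_cells, total_cells):
    """FR_i = (L_i / L) / (C_i / C) per class, following the definition above."""
    L_i = np.asarray(flood_cells, dtype=float)
    C_i = np.asarray(total_cells, dtype=float)
    return (L_i / L_i.sum()) / (C_i / C_i.sum())

# Hypothetical elevation classes (low, mid, high): flood cells and total cells.
fr = frequency_ratio([80, 15, 5], [400, 300, 300])
print(np.round(fr, 2))  # [2.   0.5  0.17]
```

Here the lowest class holds 80% of the flood cells but only 40% of the area, giving FR = 2, i.e., a strong over-concentration of floods, consistent with the interpretation of FR > 1 in the text.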

J48 Decision Tree
J48 is a decision tree algorithm that can be used to determine the attribute vector's behavior for any given number of instances [98]. The algorithm creates the rules for target variable prediction and, with the help of the tree classification and the data distribution, can also be simply conceptualized [95]. The add-on features in J48 include its ability to account for missing values, tree pruning, and setting rules. It is also able to generate data mining results with higher accuracy compared to other tree-based algorithms. For the generation of decision trees, J48 uses the C4.5 algorithm, which is very efficient in statistical classification [99]. Attribute selection is performed using the input information and entropy equations on any type of data with class labels [100].
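The entropy-based attribute selection at the heart of C4.5/J48 can be sketched as an information-gain computation; the split that maximizes the gain is the one the tree chooses. This is a generic illustration, not the Weka implementation:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, as used by C4.5/J48 for splitting."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, left, right):
    """Entropy reduction from splitting `labels` into `left` + `right`."""
    n = len(labels)
    child = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child

# A 50/50 flood/non-flood node split perfectly gives the maximum gain of 1 bit.
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(information_gain(y, y[:4], y[4:]))  # 1.0
```

A useless split (both children still 50/50) would yield a gain of 0, so the attribute offering the largest gain is placed highest in the tree.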

Real AdaBoost
Freund and Schapire [101] proposed this ML ensemble model to turn multiple weak classifiers into one strong classifier through the process of boosting. Training data subsets can easily be obtained from such ensemble models. The AdaBoost algorithm is based on the following mechanism: firstly, a subset of the training samples is developed, and the initial classifier is built with equal instance weights. Secondly, all training dataset instances are evaluated with this base model; the misclassified cases are then weighted more heavily, while the correctly classified instances keep their weights. Thirdly, the weights are normalized over all training instances and a new subset is randomly generated to develop the next classifier-based model. This process continues iteratively until a stopping criterion is reached. Ultimately, a strong classifier is obtained as the weighted sum of all the classifier-based models developed previously.
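The boosting loop just described can be sketched as a minimal discrete AdaBoost over one-dimensional threshold stumps. All names and the toy data are ours; a real run would use the FCF attribute vectors and J48 as the base learner:

```python
import numpy as np

def fit_adaboost(x, y, rounds=3):
    """Minimal discrete AdaBoost with 1-D threshold stumps (illustration only).
    Labels y must be in {-1, +1}."""
    n = len(x)
    w = np.full(n, 1.0 / n)          # start with equal instance weights
    stumps = []
    for _ in range(rounds):
        # Pick the threshold/polarity with the lowest weighted error.
        best = None
        for t in x:
            for pol in (1, -1):
                pred = np.where(pol * (x - t) > 0, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, t, pol, pred)
        err, t, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)          # up-weight misclassified cases
        w /= w.sum()                            # normalize the instance weights
        stumps.append((alpha, t, pol))
    return stumps

def predict(stumps, x):
    """Strong classifier = sign of the weighted sum of stump votes."""
    score = sum(a * np.where(p * (x - t) > 0, 1, -1) for a, t, p in stumps)
    return np.where(score >= 0, 1, -1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = fit_adaboost(x, y)
print((predict(model, x) == y).all())  # True on this separable toy set
```

Each round re-weights the data so the next weak learner focuses on the cases the ensemble still gets wrong, which is exactly the mechanism the paragraph above describes.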
The Real AdaBoost is a modified and advanced form of AdaBoost that can be used to obtain a strong classifier using fewer decision trees [58]. Friedman, Hastie, and Tibshirani [101] first introduced this ensemble method from a formal statistical perspective. Their work showed that Real AdaBoost removes the necessity of searching for a coefficient, taking the optimal coefficient α_j = 1 [58]. It combines AdaBoost with a real-valued classifier, each member contributing half of the weighted log-odds predicted by the classifier; therefore, it gives a more accurate result than the AdaBoost model. In the final stage, Real AdaBoost combines all the decision tree values into one value; hence, the model can be represented as a scorecard [101].

Random Subspace
The problem of over-fitting the training data in highly accurate decision tree classifiers can be overcome using Random Subspace [56]. This method can be used to create a tree-based classifier that preserves the maximum accuracy on the training data while improving generalization performance as complexity grows. This ensemble method is quite similar to Bagging, except that the trees are constructed in randomly selected subspaces of the feature space [102-105]. Therefore, feature subspaces were randomly chosen from the original training datasets, and the final result was generated from the classifiers trained on each subset through a combination of voting methods [106,107].
The key benefit of the Random Subspace learning approach is that it uses sets of samples with varying feature characteristics to achieve diversity among the sub-classifiers [59-62].
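The two mechanisms described above, random feature subsets and vote combination, can be sketched as follows. The subset sizes and member predictions are invented for illustration; in the paper each subset would train a J48 tree on 7 of the 14 FCFs (the size 7 here is our assumption, not stated in the text):

```python
import numpy as np

def random_subspace_indices(n_features, subspace_size, n_models, seed=0):
    """Draw the random feature subsets used to train each ensemble member."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_features, size=subspace_size, replace=False)
            for _ in range(n_models)]

def majority_vote(predictions):
    """Combine member predictions (rows = models, cols = samples) by voting."""
    votes = np.asarray(predictions)
    return (votes.sum(axis=0) >= votes.shape[0] / 2).astype(int)

# Ten members, each seeing 7 of the 14 FCFs.
subsets = random_subspace_indices(n_features=14, subspace_size=7, n_models=10)
print(len(subsets), len(subsets[0]))  # 10 7

# Voting among three hypothetical member predictions for four samples:
combined = majority_vote([[1, 0, 1, 0],
                          [1, 1, 0, 0],
                          [0, 1, 1, 0]])
print(combined)  # [1 1 1 0]
```

Because each member sees a different projection of the data, their errors are less correlated than those of trees trained on the full feature set, which is where the generalization gain comes from.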

MultiBoosting
MultiBoost is an enhancement of the AdaBoost concept that uses the power of both AdaBoosting and Wagging to solve the issue of over-fitting [63,64]. MultiBoosting can generate a decision tree with a lower error than both AdaBoost and Wagging. It is also able to perform parallel executions, which makes it more advanced compared to AdaBoost. MultiBoosting uses a random bootstrap sampling approach to establish training subsets and allocates random weights to each subset [64]. The step-by-step process of MultiBoosting is as follows. Firstly, using the training dataset, a set of training subsets is formed through random selection with replacement, and these are used to build the classifier-based models. Secondly, the weights are determined based on the overall accuracy of each classifier-based model. Thirdly, new subsets are developed on the basis of continued sampling and instance weighting to train new classifier-based models, and the final result is generated by combining the classifiers.
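Two structural ingredients of MultiBoost can be sketched in isolation: the Wagging-style random instance weights (drawn from the continuous Poisson, i.e., standard exponential, distribution) and the partition of the boosting iterations into sub-committees, at whose boundaries the weights are re-drawn. The function names and sizes are ours:

```python
import numpy as np

def wagging_weights(n, rng):
    """Wagging: random instance weights from the continuous Poisson
    (standard exponential) distribution, normalized to sum to 1."""
    w = rng.standard_exponential(n)
    return w / w.sum()

def multiboost_schedule(total_iters, n_committees):
    """Split the boosting iterations into sub-committees; in MultiBoost the
    instance weights are reset by wagging at each committee boundary."""
    size = int(np.ceil(total_iters / n_committees))
    return [list(range(s, min(s + size, total_iters)))
            for s in range(0, total_iters, size)]

rng = np.random.default_rng(1)
w = wagging_weights(6, rng)
print(np.isclose(w.sum(), 1.0))     # True: a valid weight vector
print(multiboost_schedule(10, 3))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Within each sub-committee, boosting proceeds as in AdaBoost; restarting from fresh wagged weights between committees injects the variance-reducing behavior of Wagging, and the committees can in principle be trained in parallel.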

Model Validation Techniques
Validation of the flood hazard susceptibility maps generated by the models (J48, MultiBoosting J48 (MJ48), Real AdaBoost J48 (RJ48), and Random Subspace J48 (RSJ48)) is an essential part of the work required to obtain a meaningful conclusion. To quantitatively validate each model, the receiver operating characteristic (ROC) curve was applied [30,108]. For various types of predictive models, ROC is a widely known validation method [109]. The ROC curve defines the reliability of the model by the AUC (area under the curve). The AUC values range between 0.5 and 1, where 0.5 reveals that the model cannot estimate the susceptibility to flooding, and 1 implies a perfect forecast [110]. Over all the flood inventory points, the flood susceptibility index values were binarized considering the threshold flood probability index value, which is "the optimal point on the ROC curve closest to the point (0, 1)", using the 'OptimalCutpoints' package in R [111]. For both the training and testing datasets, the ROC curve was used to test the flood hazard susceptibility maps. The ROC curves of the training datasets are called success rate curves for existing flood hazards, whereas the ROC curves of the validation datasets are termed prediction rate curves for future flood hazard probabilities. The ROC compares the true positives (TP) with the false positives (FP) at different thresholds. The true positive rate (TPR) corresponds to the sensitivity, and the false positive rate (FPR) to 1 − specificity, i.e., the probability of a false alarm. A higher TPR indicates a more reliable model, while a higher FPR represents model prediction errors. The equations for the TPR and FPR are as follows:

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)

where TN, FP, TP, and FN are the true negatives, false positives, true positives, and false negatives, respectively [112]. The TPR ranges from 0 to 1, with the following categorical subdivisions: excellent (0.9-1.0), very good (0.8-0.9), good (0.7-0.8), moderate (0.6-0.7), and weak (0.5-0.6) [107].
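The ROC construction and the "closest to (0, 1)" cutpoint selection can be sketched as follows (a generic illustration with made-up scores, not a reimplementation of the R 'OptimalCutpoints' package):

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """(FPR, TPR) at each threshold; score >= t is classified as flood."""
    pts = []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        tn = np.sum(~pred & (labels == 0))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return np.array(pts)

def closest_to_corner(pts, thresholds):
    """Optimal cutpoint: the ROC point nearest to the ideal corner (0, 1)."""
    d = np.hypot(pts[:, 0], 1 - pts[:, 1])
    return thresholds[int(np.argmin(d))]

# Hypothetical susceptibility scores for 6 inventory points (1 = flood).
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1, 1, 1, 0, 1, 0])
ths = np.array([0.2, 0.5, 0.85])
pts = roc_points(scores, labels, ths)
print(closest_to_corner(pts, ths))  # 0.5
```

The chosen threshold is then used to binarize the continuous susceptibility index into flood/non-flood classes before computing the threshold-dependent metrics.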
RMSE (Equation (10)) was used to determine the difference between the actual and expected flood results:

RMSE = sqrt( (1/n) * Σ (X_predicted − X_actual)² )

where n is the number of samples. Lower values of RMSE indicate better models [106,113].
In addition, two threshold-dependent evaluation criteria, i.e., efficiency (E) and true skill statistics (TSS) [114,115], were implemented to test model robustness. Efficiency ranges from 0 to 1, while TSS ranges from −1 to 1; in both cases, values closer to 1 indicate better performance. These two evaluation metrics were calculated from the confusion-matrix counts as follows:

E = (TP + TN) / (TP + TN + FP + FN)
TSS = TP / (TP + FN) + TN / (TN + FP) − 1
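A minimal sketch of these threshold-dependent metrics, together with the RMSE of Equation (10), might look as follows in Python. The confusion-matrix counts are hypothetical; the formulas mirror the standard definitions rather than any project-specific code.

```python
import math

def efficiency_and_tss(tp, tn, fp, fn):
    """Efficiency (E) and true skill statistics (TSS) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    e = (tp + tn) / (tp + tn + fp + fn)   # E: overall fraction classified correctly
    tss = sensitivity + specificity - 1   # TSS in [-1, 1]
    return e, tss

def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted flood values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# hypothetical confusion-matrix counts for one model
print(efficiency_and_tss(tp=42, tn=40, fp=8, fn=10))
```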

Sensitivity Analysis (SA)
The uncertainty in the preparation of the data layers is very difficult to eradicate completely [116]. Sensitivity analysis has been used in numerous studies [117][118][119] to calculate the impact of input variations on model outputs, thereby allowing for a systematic estimation of the relative significance of sources of uncertainty. The map removal sensitivity analysis (MRSA) method introduced by Chen et al. [120] was used in this research. The MRSA approach measures the sensitivity of flood susceptibility maps by eliminating one or more parameters at a time. Various studies have used this technique to examine the relative importance of the effective factors [121][122][123]. The percentage contribution (PC) of each FCF to the model output was calculated using the MRSA process through the following equation [124]:

PC = |AUC_all − AUC_i| / AUC_all × 100 (14)
where AUC_all and AUC_i designate the AUC values obtained from the flood susceptibility model using all FCFs and from the model with the ith FCF excluded, respectively.
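Equation (14) reduces to a one-line computation. The sketch below is an illustrative Python version (the study itself worked in an R/GIS environment), with made-up AUC values.

```python
def percentage_contribution(auc_all, auc_without_i):
    """Map removal sensitivity analysis (Equation (14)):
    PC = |AUC_all - AUC_i| / AUC_all * 100."""
    return abs(auc_all - auc_without_i) / auc_all * 100

# hypothetical AUCs: full model vs. the model refit without one FCF
print(round(percentage_contribution(0.931, 0.85), 2))  # -> 8.7
```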

Considering Multicollinearity of Effective Factors
Based on the VIF and tolerance values, it can be inferred that no variable suffers from multicollinearity, as all VIF values are less than 10 and all tolerance values are greater than 0.2 (Table 1). Therefore, the variables are independent and can be used for spatial flood susceptibility modeling.
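The VIF/tolerance screening can be reproduced with a small least-squares sketch: each factor is regressed on all the others, and tolerance = 1 − R² with VIF = 1/tolerance. This is an illustrative Python/NumPy implementation on a hypothetical factor matrix, not the software used in the study.

```python
import numpy as np

def vif_and_tolerance(X):
    """(VIF, tolerance) for each column of a factor matrix X.
    Each factor is regressed on the others by least squares;
    tolerance = 1 - R^2 and VIF = 1 / tolerance."""
    n, k = X.shape
    out = []
    for i in range(k):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])  # intercept + other factors
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        tol = 1 - r2
        out.append((1 / tol, tol))
    return out

# hypothetical standardized factor matrix: 200 samples, 3 independent factors
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
print([round(v, 2) for v, _ in vif_and_tolerance(X)])  # VIFs near 1 for independent factors
```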

Spatial Relationship between Flood Probability and FCFs
The second part of the work investigated the relationship between the independent variables and flood susceptibility using the FR bivariate statistical method, as outlined in Table 2. The FR values represent the importance of the class variables in terms of flood probability. The highest FR value (2.43) for the elevation variable is in the <287 m class, meaning the lowest elevation range has the highest flood susceptibility, whereas the FR is 0 for the >784 m elevation class, indicating no flood probability above this elevation. For the slope factor, the highest flood probability is in the <5.8° range, as indicated by FR = 2.12, the highest for this variable, while it is 0 for slopes between 14.2° and 32.5° (Table 2). The FR values for plan curvature exhibit a fairly uniform response, but the value is still highest (1.15) for flat plan curvature and lowest (0.92) in convex areas, while concave areas have a significant FR value (1.05). In terms of the CI, the maximum FR value (1.94) was found for the <−52.9 class and the lowest (0.69) for the −16.07 to 14.5 class; it can therefore be argued that the <−52.9 CI class plays a significant role in controlling susceptibility to floods. The FR values for SPI show that the lowest SPI class (<8.87) has the maximum FR value (1.88): as the stream power reduces, the probability of a flood rises. The FR value for SPI is lowest (0.26) in the 12.8 to 15.7 class, which indicates a lower flood probability in this SPI category. For TPI, the highest FR value (1.45) was estimated for the −3.71 to 2.34 class and it was zero for the >9.62 class, meaning that positive TPI values are associated with lower or even zero flood susceptibility. In the negative range, however, the flood probability is quite high, which implies that the low-lying central part of the study landscape is the most prone to flooding.
For the TWI the opposite is true, in that increasing TWI values indicate a higher flood probability. Here, the highest FR value (1.58) among the TWI classes is in the 7.49 to 11.08 category, whilst the lowest (0.23) is for the <5.07 class, which indicates that topographic wetness plays a significant role in flood occurrence: a wetter surface hinders the infiltration of water and increases the amount of overland flow, thus encouraging flooding. The FR of the drainage density (DD) classes was highest in the larger DD classes, i.e., 1.48 for the >0.7 km/km² class, and lowest in the smallest DD class. For the distance to stream variable, the FR value is highest (3.56) in the <100 m class and lowest (0.31) in the >400 m class; it may therefore be inferred that proximity to the river matters most in terms of flood susceptibility. Quite interestingly, for rainfall the highest FR value was estimated for the lower ranges, i.e., an FR of 2.16 in the 419.7 to 547.8 mm class, while the FR was 0 in the highest rainfall range of >820.6 mm. The highest rainfall zones are mainly located in the higher elevation areas, which reduces the chance of flood pixels being located there. Thus, in this region the lowest rainfall zones are the most flood-prone and, owing to the altitude factor, the highest rainfall regions are the least flood susceptible: the high rainfall in the upper part of the region actually causes flash floods in the downstream area through slope-induced high-velocity runoff. The wetland and residential areas in the LU/LC divisions had higher FR values of 22.86 and 11.73, respectively, while the equivalent value was 0 for orchards, flat land, and forest. In terms of the lithology, only the 'Qsw, Qft2, Qm, Qft1, Qs, d, Qal' category has an FR value because this lithological unit is a low-lying area. A summary of the lithology of the basin is provided in the Supplementary Information.
The lowest NDVI class (<0.201) has the highest FR value (1.49) and the highest class (>0.369) has the lowest FR value (0.04), which indicates that flood proneness is lower in the more forested and vegetated areas. In terms of the soil types, the Aridisols have the highest flood probability (FR = 2.18), whereas the Inceptisols, Salt Flat, and Alfisols do not affect flooding. From this analysis we can therefore identify the relative importance of the different factors and their classes for flood susceptibility in the study area.
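For readers who wish to reproduce the FR calculation, a minimal sketch (Python, with a hypothetical raster) is shown below: each class's FR is the share of flood locations it holds divided by the share of the study area it occupies, so FR > 1 means the class is over-represented among flood points.

```python
from collections import Counter

def frequency_ratio(pixel_classes, flood_pixel_ids):
    """FR per class = (flood pixels in class / all flood pixels)
                    / (pixels in class / all pixels)."""
    total_pixels = len(pixel_classes)
    class_counts = Counter(pixel_classes)
    flood_counts = Counter(pixel_classes[i] for i in flood_pixel_ids)
    total_floods = len(flood_pixel_ids)
    return {c: (flood_counts[c] / total_floods) / (class_counts[c] / total_pixels)
            for c in class_counts}

# hypothetical elevation-class raster (100 pixels) with 8 flood locations,
# all of which fall in the 'low' class
pixel_classes = ["low"] * 50 + ["mid"] * 30 + ["high"] * 20
flood_pixel_ids = list(range(8))
print(frequency_ratio(pixel_classes, flood_pixel_ids))  # -> {'low': 2.0, 'mid': 0.0, 'high': 0.0}
```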

Flood Susceptibility Models (FSMs)
Our flood hazard susceptibility models were developed using ML ensemble methods, which are capable of forecasting flood susceptibility with the aid of various categorical factors. The flood susceptibility maps (FSMs) were divided into five subclasses based on Jenks' natural break mechanism: very high, high, moderate, low, and very low. The Jenks optimization method, also called natural break classification, is a data segmentation process that estimates the optimal value structure of the different classes. This classification method minimizes the variance within each class while maximizing the variance between classes; it thus reduces intraclass disparities and maximizes interclass disparities. A higher index rating represents areas that are particularly vulnerable to flooding, and vice versa. According to the J48 model, 46.63% of the basin area comes under the low flood susceptibility zone, whereas 18.67% and 29.13% of the area are very high and high flood-prone areas, respectively. The FSM generated using MJ48 suggested that 62.81% of the area falls into the very low flood susceptibility zone, whereas 24.6% of the basin is very highly prone to flooding (Table 3). The RJ48 FSM suggested that 76.38% of the study basin has a very low flood occurrence probability, whereas 16.23% has a very high chance of flooding; the other categories have significantly lower shares of flood probability. Lastly, the RSJ48 ensemble method gives a less skewed result, predicting that 48.72% of the study area is a very low and 9.21% a very high flood susceptibility zone (Figure 6). According to this model, 20.98% (Table 3) of the basin area is moderately flood-prone, which is the highest of all the models tested.
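Jenks' natural break classification can be sketched as a short dynamic program that minimizes the total within-class sum of squared deviations. This is an illustrative Python implementation (most GIS packages ship an equivalent); it returns the upper bound of each class for a hypothetical set of index values.

```python
def jenks_breaks(values, n_classes):
    """Jenks natural breaks: choose class limits that minimize the total
    within-class sum of squared deviations (dynamic programming on sorted data).
    Returns the upper bound of each class."""
    data = sorted(values)
    n = len(data)
    prefix, prefix_sq = [0.0], [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def ssd(i, j):  # sum of squared deviations of data[i:j] from its mean
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    INF = float("inf")
    dp = [[INF] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = dp[k - 1][i] + ssd(i, j)
                if c < dp[k][j]:
                    dp[k][j], back[k][j] = c, i
    breaks, j = [], n
    for k in range(n_classes, 0, -1):  # walk back through the optimal splits
        breaks.append(data[j - 1])
        j = back[k][j]
    return breaks[::-1]

# three clearly separated value clusters -> three natural classes
print(jenks_breaks([1, 2, 3, 10, 11, 12, 100, 101], 3))  # -> [3, 12, 101]
```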
As per the output of the models, it is clear that the western and northwestern parts of the study basin are highly flood-susceptible, along with a few flanks of the southeastern and northeastern parts. The lowest elevation, slope, CI, SPI, and NDVI values are located in these western and south-western parts of the study basin, which is the reason for the high to very high flood susceptibility along these zones. The rivers route water from the uphill areas and converge in this lower elevation region, which creates floods during high rainfall periods. The natural setting of the region does not allow rapid transmission of flood water, resulting in numerous flood events. Accordingly, a combination of geographic, geologic, and meteorological conditions is the main reason for the higher propensity for flooding along this western part of the study basin.

Validation of Machine Learning Ensemble Models
Often a single validation approach is inadequate because of the limited samples used to validate model results. In this analysis, the performance of the ML models was tested using the ROC curve, efficiency, and TSS methods. For the success rate curves of the training sample (70%), the RSJ48 model was better fitted than the others, as its AUC value was the maximum (0.931), followed by MJ48 (0.901), RJ48 (0.889), and J48 (0.850) (Figure 7a). The highest efficiency value was achieved by the RSJ48 model, followed by MJ48, RJ48, and J48 (Table 4). Similarly, for the training data, the TSS and sensitivity values are highest for RSJ48. The RMSE values of J48, MJ48, RJ48, and RSJ48 were 0.33, 0.35, 0.39, and 0.30, respectively.
A robust conclusion regarding the accuracy of predictions cannot be reached based on only training datasets. Prediction rate curves denote the predictive capacity of the models for future flood hazard susceptible zones in the study basin. The findings of the prediction rate curve suggest that the RSJ48 model is the most effective as the AUC was maximum (0.951) followed up by MJ48 (0.929).
For the validation datasets, the efficiency values of the J48, MJ48, RJ48, and RSJ48 models were 0.84, 0.84, 0.85, and 0.86, whereas the TSS values were 0.68, 0.70, 0.71, and 0.71, respectively. The sensitivity values for J48, MJ48, RJ48, and RSJ48 were 0.83, 0.80, 0.83, and 0.84, and the corresponding RMSE values were 0.35, 0.40, 0.34, and 0.33, respectively (Table 4). Figure 8 shows the FR values for the selected models. Sub-class FR values were calculated as the ratio of the proportion of flood location pixels falling within each FS class to the proportion of the map's total pixels occupied by that class. Here, a higher FR value for the very high FS class indicates the accuracy of a model, and vice-versa. In our study, RSJ48 returned the highest FR value for the very high FS class and the lowest value for the very low FS class, although the other models also returned high FR values for the very high FS class.
On the basis of the above, it may be said that the ML ensemble models show consistent results for both the success rate and prediction rate curves and that these are reasonably accurate. These models can therefore demarcate the flood hazard susceptibility zones in the study basin with higher precision than the models used in previous research. According to the AUC of the ROC curve and the efficiency and TSS methods, the RSJ48 model maps the FS within the study area more accurately than the other ensemble methods we evaluated.

Sensitivity Analysis
A sensitivity analysis was conducted to determine the impact of the FCFs on the estimation of flood susceptibility. The results were summarized as percentage contributions (PC; Table 5 and Figure 9). Flood susceptibility maps produced by the ensemble ML methods had the greatest sensitivity to elevation, followed by distance to stream, NDVI, slope, LU/LC, rainfall, TWI, SPI, drainage density, TPI, lithology, convergence index, and soil type (Figure 9). Sensitivity analysis helps to identify the significant geo-environmental factors, which is very relevant for understanding the model structure in question. According to the sensitivity evaluation, in the case of RSJ48 every factor had a greater percentage contribution than in the other ML models. Based on this analysis, it can therefore be said that RSJ48 is better than the other models we tested.

Model Performance and Comparison
Identification and zonation of flood-prone areas in a watershed are essential for reducing the damage caused by flooding. Several methodologies have been developed by researchers around the world for FSM [35,108,125], each with its own drawbacks and advantages. Remote sensing (RS) and geographic information systems (GIS) are quite powerful and effective in analyzing multidimensional incidents like flooding that are affected by many controlling factors [35,126,127]. Widely used conventional machine learning models like artificial neural networks (ANN), SVM, and Random Forest (RF) are well capable of predicting floods, but an ensemble of machine learning algorithms with statistical or other machine learning techniques provides better results than a single method [32,33,37]. Methods like ANN, SVM, and RF give flexibility in nonlinear applications with large databases and are very effective in flood modeling, but when ensemble classifiers are applied with them, the problems of over-fitting and prediction error are reduced significantly. In this study, we developed flood susceptibility models using ML methods (hybrid J48 approaches) with one of three ensemble techniques (Real AdaBoost, Random Subspace, and MultiBoost). The ensemble approaches combine classifiers in a data-mining framework, while J48 is a tree classifier considered to have good performance in forecasting flood susceptibility. Hybrid ML ensemble frameworks have been shown to deliver preferential outcomes for spatial flood estimation [128,129]. The imperative to implement such models and to develop an appropriate framework for flood hazard risk management derives from the fact that floods are responsible for significant economic loss and destruction of human infrastructure and natural environments [30,31].
The findings of our study revealed that the accuracy of the J48 model (SRC = 0.850, PRC = 0.871, TSS = 0.72, E = 0.86) was enhanced after the ensemble with all three meta-classifiers. Our findings are rational, as the ensemble classifiers reduce overfitting problems in FS modeling and improve the efficiency of the base classifier [128,130]. The ensemble models used in this study showed better accuracy than the single models used in previous flood studies [8,10,24,31]. The results of our analysis show that the RSJ48 model (AUC = 0.931, E = 0.89, TSS = 0.78, sensitivity = 0.87, and RMSE = 0.30 for the training dataset, and AUC = 0.951, E = 0.86, TSS = 0.71, sensitivity = 0.84, and RMSE = 0.33 for the validation dataset) has the maximum predictive power, followed by the MJ48 and RJ48 models. The explanation for this is that the Random Subspace ensemble is more effective in minimizing uncertainty and discrimination than other ensemble approaches [131,132] and takes samples at different spatial layers from randomly chosen subspaces [58,101]. All of the models used in this research are efficient enough for predicting flood susceptibility. A main advantage of these methods is that they can handle both continuous and categorical data simultaneously. The RWeka, party, and regRSM packages (CRAN) in RStudio were used to prepare these models. The RMSE values of these models are low and, particularly for RSJ48, very low. Besides, in terms of execution time and computational resource consumption, the RSJ48 model performs better than the other models used here.
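To make the Random Subspace idea concrete, the sketch below trains each base learner on a randomly chosen subset of the conditioning factors and combines their votes. It is a deliberately simplified Python illustration on hypothetical data: a one-level decision stump stands in for the J48 tree, whereas the study itself used the WEKA/RWeka implementations.

```python
import random

def train_stump(X, y, feats):
    """Best single-feature threshold split over the allowed features
    (a tiny stand-in for the J48 base classifier)."""
    best, best_err = (feats[0], 0.0, 1, 0), len(y) + 1
    for f in feats:
        for t in sorted({row[f] for row in X}):
            for lab_ge in (0, 1):
                pred = [lab_ge if row[f] >= t else 1 - lab_ge for row in X]
                err = sum(p != target for p, target in zip(pred, y))
                if err < best_err:
                    best_err, best = err, (f, t, lab_ge, 1 - lab_ge)
    return best

def stump_predict(model, row):
    f, t, lab_ge, lab_lt = model
    return lab_ge if row[f] >= t else lab_lt

def random_subspace_ensemble(X, y, n_estimators=9, subspace=2, seed=0):
    """Random Subspace: each base learner is trained on a random subset of
    the conditioning factors; the class is decided by majority vote."""
    rng = random.Random(seed)
    models = [train_stump(X, y, rng.sample(range(len(X[0])), subspace))
              for _ in range(n_estimators)]
    def predict(row):
        votes = sum(stump_predict(m, row) for m in models)
        return 1 if 2 * votes >= len(models) else 0
    return predict

# hypothetical data: factor 0 separates flood (1) from non-flood (0)
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 3.0], [0.9, 7.0]]
y = [0, 0, 1, 1]
predict = random_subspace_ensemble(X, y, n_estimators=5, subspace=2, seed=1)
print(predict([0.85, 2.0]), predict([0.3, 9.0]))  # -> 1 0
```

Training each learner in a random feature subspace decorrelates the ensemble members, which is the mechanism behind the reduced overfitting noted above.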

Factor Contribution Analysis
Flood hazard susceptibility delineation depends on the interplay of the FCFs. In this analysis, Real AdaBoost and the boosted regression tree (BRT) were used to calculate the relative importance of the chosen variables. BRT is a non-parametric ML approach that defines a relationship between the dependent and independent variables [133]. It is a key tool for evaluating the relevance of a particular variable and for addressing classification and forecasting problems [134]. Details regarding BRT are available in Breiman et al. [134], Therneau et al. [135], and Friedman et al. [101]. We tested fourteen factors and, based on the results, identified their relative importance for flood susceptibility. According to the Real AdaBoost model, NDVI, elevation, LU/LC, rainfall, and distance to stream are the top five most significant flood conditioning factors in descending order, whereas SPI, TPI, and TWI are the least relevant variables (Figure 10). Using the BRT, however, elevation was determined to be the most important factor controlling flood hazard susceptibility with the highest importance value of 0.534, followed by distance to stream (0.093), NDVI (0.080), rainfall (0.048), and LU/LC (0.043), while plan curvature (0.003) and SPI (0.007) were the two least important factors for flooding susceptibility.
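The way boosting-based models attribute importance can be illustrated with a small aggregation sketch: each boosting round credits its weight to the factor its base learner split on, and the totals are normalized to sum to 1. The factor names and round weights below are hypothetical, and this is a simplification of how Real AdaBoost/BRT importances are actually derived.

```python
def relative_importance(boosting_rounds):
    """Aggregate a boosting-style importance score: each round credits its
    weight to the factor its base learner split on; scores are normalized."""
    totals = {}
    for factor, weight in boosting_rounds:
        totals[factor] = totals.get(factor, 0.0) + weight
    grand_total = sum(totals.values())
    return {f: w / grand_total for f, w in totals.items()}

# hypothetical boosting rounds: (factor chosen by the round's base learner, round weight)
rounds = [("elevation", 1.2), ("distance_to_stream", 0.6),
          ("elevation", 0.9), ("NDVI", 0.3)]
print(relative_importance(rounds))
```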

Conclusions
Flood hazards are among the most devastating natural hazards in the world. Recurring flood events cause huge losses of human life and damage to property, which should be mitigated at every possible level, and flash floods in particular are a real threat to the sustainability of our societies. Therefore, accurate prediction of susceptibility to flood events is necessary to improve management practices in an effort to reduce flood damage. This work investigated the application of ML ensemble models for predicting flood hazard susceptibility zones in the study basin, along with the identification of the key controlling variables. The ML models suggested that 9.21% to 18.67% of the study basin has a very high susceptibility to flooding. The meta-classifiers we tested increased the accuracy of the J48 machine learning model, and of the three new ensemble models, RSJ48 had the best accuracy. Remote sensing and geographic information systems integrated with ML algorithms provide a strong foundation for the management of different natural hazards. The present work should provide guidance to other researchers wanting to apply ML ensemble methods to investigate susceptibility to flooding; however, further applications of the methods reported here are needed to assess their suitability beyond this case study.
