Modeling Flood Susceptibility Utilizing Advanced Ensemble Machine Learning Techniques in the Marand Plain

Asghar Rostami, Ali; Taghi Sattari, Mohammad; Apaydin, Halit; Milewski, Adam

doi:10.3390/geosciences15030110

by Ali Asghar Rostami ¹, Mohammad Taghi Sattari ¹,²,*, Halit Apaydin ² and Adam Milewski ³,*

¹ Department of Water Engineering, Faculty of Agriculture, University of Tabriz, Tabriz 5166616471, Iran

² Department of Agricultural Engineering, Faculty of Agriculture, Ankara University, Ankara 06110, Turkey

³ Department of Geology, University of Georgia, 210 Field Street, Athens, GA 30602, USA

* Authors to whom correspondence should be addressed.

Geosciences 2025, 15(3), 110; https://doi.org/10.3390/geosciences15030110

Submission received: 10 December 2024 / Revised: 11 January 2025 / Accepted: 10 March 2025 / Published: 18 March 2025

Abstract

Flooding is one of the most significant natural hazards in Iran, primarily due to the country’s arid and semi-arid climate, irregular rainfall patterns, and substantial changes in watershed conditions. These factors combine to make floods a frequent cause of disasters. In this case study, flood susceptibility patterns in the Marand Plain, located in East Azerbaijan Province in northwest Iran, were analyzed using five machine learning (ML) algorithms: the M5P model tree, Random SubSpace (RSS), Random Forest (RF), Bagging, and Locally Weighted Learning (LWL). The modeling process incorporated twelve meteorological, hydrological, and geographical factors affecting floods at 485 identified flood-prone points. The data were analyzed using a geographic information system, with the dataset divided into 70% for training and 30% for testing to build and validate the models. An information gain ratio and multicollinearity analysis were employed to assess the influence of various factors on flood occurrence, and flood-related variables were classified using quantile classification. The frequency ratio method was used to evaluate the significance of each factor. Model performance was evaluated using statistical measures, including the Receiver Operating Characteristic (ROC) curve. All models demonstrated robust performance, with an area under the ROC curve (AUROC) exceeding 0.90. Among the models, the LWL algorithm delivered the most accurate predictions, followed by RF, M5P, Bagging, and RSS. The LWL-generated flood susceptibility map classified 9.79% of the study area as very highly susceptible to flooding, 20.73% as high, 38.51% as moderate, 29.23% as low, and 1.74% as very low. The findings of this research provide valuable insights for government agencies, local authorities, and policymakers in designing strategies to mitigate flood-related risks. This study offers a practical framework for reducing the impact of future floods through informed decision-making and risk management strategies.

Keywords:

flood susceptibility mapping; Iran; geographical information systems (GIS); machine learning; flood hazard; flood vulnerability

1. Introduction

Floods are among the most catastrophic natural disasters, leading to a significant loss of life, extensive property damage, and the destruction of essential infrastructure [1]. Asia, being highly susceptible to flooding, accounts for nearly 90% of all disaster-related deaths attributed to flood events [2]. Over the last three decades, over 80% of global natural disasters have occurred in just nine regions across Asia and Africa [3]. The frequency and intensity of flooding are expected to rise in the next few decades due to global warming, land-use changes, and population growth, potentially causing up to USD 1 trillion in damages [4,5].

Despite facing significant water shortages across nearly two-thirds of its territory [6], Iran has recently suffered considerable destruction from floods [7]. Between 17 March and 1 April 2019, after prolonged drought conditions, the country was struck by widespread, severe flooding over two weeks. Heavy rainfall affected vast areas, impacting 10 million people, causing 76 deaths, and damaging 3800 cities and villages [8,9,10]. In earlier floods, from 2015 to 2017, 112 lives were lost, and houses were demolished. Also, these floods resulted in significant damage to urban infrastructure, agricultural lands, and transportation networks [11]. The increasing frequency of floods is closely linked to human activities such as urban expansion, deforestation, and land-use changes [9,12]. In the northern, western, and southern parts of the country, recent floods have been exacerbated by rapid urbanization, the construction of dams and levees, agricultural expansion, and deforestation, all of which have disrupted natural watershed and river drainage systems [9,13]. Identifying flood-prone areas is crucial for creating effective strategies to mitigate risks and minimize damages [2,14]. However, practical field experiments or theoretical studies on floods are often unfeasible. Therefore, it is vital to develop models, algorithms, and analytical techniques for logical, conceptual, and numerical flood assessments.

Flood susceptibility maps are essential for identifying and assessing areas vulnerable to flooding based on their physical features [2,15]. These tools are crucial in minimizing flood-related losses and play a vital role in disaster management and risk mitigation [16]. Recent advancements in technologies such as remote sensing, Geographic Information Systems (GISs), machine learning (ML), and statistical methods have greatly enhanced the precision of these maps [17,18,19]. Nonetheless, creating precise flood susceptibility maps demands a thorough understanding of flood dynamics, the careful selection of relevant flood-triggering factors, an evaluation of their effects, and the use of appropriate models for analysis.

Recent studies on flood susceptibility mapping have used a range of methods, including the Soil and Water Assessment Tool (SWAT), hydrological models, HSPF models, multi-criteria decision analysis (MCDA), IHACRES models, hydrodynamic models, statistical techniques, and ML methods frequently integrated with GISs [20,21,22,23,24,25,26,27,28]. However, using these models effectively requires substantial field data collection and thorough parameterization [29]. Additionally, most models provide only localized flood risk estimates because they depend on streamflow data from hydrometric stations, which limits their use for larger regional assessments [12,30]. Hydrological and hydrodynamic models can also be time-consuming and difficult to calibrate, affecting their ability to precisely identify flood-prone zones [29]. While MCDA models are widely used and have shown accuracy in various studies [25], methods like the Analytic Hierarchy Process may introduce uncertainties due to the subjective nature of decision-making [16].

ML techniques have attracted considerable interest from researchers for flood mapping [31]. Among the most commonly employed ML models for this purpose are Support Vector Machines (SVMs) [32,33,34], Parallel Random Forest (PRF) [35], Random Forest (RF) [36,37], Artificial Neural Networks (ANNs) [38,39,40], and decision trees (DTs) [32,38]. These models have proven particularly effective in forecasting flash floods in regions assessed as high risk.

The remarkable efficacy of ML algorithms has led to their increased use and refinement in various models related to natural hazards. However, there is still no widespread agreement on the best ML methods for modeling events like landslides and flood mapping. Studies have shown the importance of creating and evaluating new approaches for flood mapping and other hazard simulations [31]. Techniques that combine remote sensing with ML algorithms and independent models have shown high accuracy by dividing data into training and testing sets for calibration and validation. These advanced methodologies significantly improve our capacity to forecast future flood occurrences, allowing for the implementation of effective strategies to minimize disasters [4].

This research has two main objectives: (i) to compare the performance of the RF, Random SubSpace (RSS), Locally Weighted Learning (LWL), M5P model tree, and Bagging models in generating accurate flood susceptibility maps, and (ii) to investigate the factors that lead to flooding in order to develop flood susceptibility maps for the Marand Plain. This research examines a region highly susceptible to flash floods, utilizing it as a case study to develop a flash flood map. The development of a highly accurate flood susceptibility map for the Marand Plain is a major contribution of this research, being the first initiative of its kind in the region. The results of this study are intended to assist regional and local authorities, along with policymakers, in reducing flood-related risks and implementing effective strategies to mitigate potential damages.

2. Materials and Methodology

2.1. Description of the Case Study

The Marand watershed, found in the East Azerbaijan Province, covers an area of approximately 2030 square kilometers. It is classified as a sub-basin of the Caspian Sea Basin, experiencing an average annual rainfall of about 242.7 mm. The region’s average annual temperature is recorded at 11.4 °C, with temperature variations ranging from 36 °C in July to −12.4 °C in January [41]. According to the De Martonne climate classification, this area exhibits a semi-arid climate characterized by cold winters [1]. The Marand Plain is of considerable environmental and economic significance, largely due to its robust agricultural sector and a population density that is higher than in other regions of the country.

2.2. Methodology

2.2.1. Flood Inventory

Flood inventory maps provide crucial information on flood-affected areas and the key characteristics of past flood events. In this research, the flood inventory map was created by utilizing ground control points from historical flood records at 485 sites, as documented in prior reports and confirmed through field surveys (see Figure 1). To validate these flood points, we drew on multiple sources, including historical records, direct field observations, interviews with local residents, and imagery obtained from Google Earth®. A flowchart of the flood modeling methodology is given in Figure 2. The 485 identified flood points were randomly divided into two sets, with 70% allocated for training the flood susceptibility models and 30% reserved for testing purposes. A flood layer was established as the dependent variable, using continuous values ranging from 0 to 1 to represent flood occurrence likelihood. Specifically, 0 indicated very low flood susceptibility, 0.25 indicated low susceptibility, 0.5 indicated moderate susceptibility, 0.75 indicated high susceptibility, and 1 represented very high susceptibility. The flood points correspond to specific sites where flooding has been consistently reported, while the non-flood points refer to locations with no historical flood events. To reduce bias, we used an equal number of non-flood points (485) [42]. Data related to twelve parameters that trigger floods were extracted from the training dataset using ArcGIS 10.5. This dataset was then imported into the WEKA software (version 3.9.6) for ML modeling tasks.
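The 70/30 partition described above can be illustrated with a minimal sketch. It assumes that flood and non-flood points are split separately so that both subsets keep the 1:1 class balance; the text does not state whether the split was done per class, and the point identifiers, function name, and seeds below are hypothetical.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of points and split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = points[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 485 flood and 485 non-flood sites.
flood = [("flood", i) for i in range(485)]
non_flood = [("non_flood", i) for i in range(485)]

# Assumption: each class is split on its own to preserve the class balance.
flood_train, flood_test = split_points(flood, seed=42)
non_train, non_test = split_points(non_flood, seed=43)

train = flood_train + non_train
test = flood_test + non_test
print(len(flood_train), len(flood_test), len(train), len(test))
```

With 485 points per class, truncating 70% gives 339 training and 146 testing points per class.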

2.2.2. Flood Influencing Factors

There is currently no consensus on standardized criteria for identifying factors that influence flooding [43,44]. However, numerous studies have identified a variety of significant variables that contribute to flooding. The parameters considered are slope, plan and profile curvature, elevation, proximity to rivers, rainfall levels, aspect, soil characteristics, land use/cover types, and lithological features [12,14,16,45]. Figure 3 depicts the spatial distribution of these parameters within the study area. All identified variables were transformed into raster format at a spatial resolution of 30 × 30 m using the ArcGIS platform.

Elevation

Elevation (Figure 3a) serves as a fundamental topographic measure [6], exhibiting an inverse relationship with flood susceptibility. Lower-lying areas demonstrate higher flood probability due to gravitational water flow from higher elevations [12,17,32].

Slope

Slope (Figure 3b) directly influences both runoff volume and velocity. Steeper gradients typically result in reduced water infiltration rates and increased direct runoff to drainage systems [22,46,47].

Aspect

Aspect (Figure 3c) significantly affects hydrological response units [12] and influences both soil moisture content and local climatic conditions [6,32].

Topographic Wetness Index (TWI)

The TWI is often used to quantify the influence of topography on hydrological processes. It serves as an indicator of water accumulation at a given location and reflects the tendency of water to flow downward under the force of gravity. Water infiltration affects soil strength and is influenced by factors like soil permeability and pore water pressure [44]. This index highlights the correlation between the slope of the terrain and the moisture content in the surface soil. The TWI is determined using the formula presented in Equation (1):

\mathrm{TWI} = \frac{A_{s}}{\tan \delta}, \quad A_{s} = \frac{A}{L} \qquad (1)

In this equation, A_s indicates the drainage at a specific location, while tan δ corresponds to the slope of the terrain. The variable A represents the area of land that drains into the point of interest, and L is the slope length in the direction of water flow. Higher TWI values indicate greater moisture content and enhanced runoff generation potential [44]. For the calculation of the TWI, a DEM in raster format was processed using SagaGIS 9.0.1 software (Figure 3d).
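As a rough illustration of Equation (1) for a single raster cell, the sketch below computes A_s = A / L and then the ratio A_s / tan δ. The function names, the sample values, and the minimum-slope clamp for flat cells are assumptions made for this example. The commonly cited form of the index takes the natural logarithm of this ratio; that step does not appear in Equation (1) and is shown here only as an optional assumption.

```python
import math

def specific_catchment_area(area_m2, length_m):
    """A_s = A / L, the second part of Equation (1)."""
    return area_m2 / length_m

def twi_ratio(a_s, slope_deg, min_slope_deg=0.1):
    """A_s / tan(delta) as written in Equation (1).

    The slope is clamped to a small minimum (an assumption) so that
    perfectly flat cells do not cause a division by zero.
    """
    slope_rad = math.radians(max(slope_deg, min_slope_deg))
    return a_s / math.tan(slope_rad)

def twi_log(a_s, slope_deg):
    """Natural-log form of the index (assumed here, not shown in Equation (1))."""
    return math.log(twi_ratio(a_s, slope_deg))

# Hypothetical cell: ten 30 x 30 m cells drain into it, with L = 30 m.
a_s = specific_catchment_area(area_m2=9000.0, length_m=30.0)
print(a_s, twi_ratio(a_s, 45.0), twi_log(a_s, 45.0))
```

Gentler slopes and larger contributing areas both raise the index, which matches the statement that higher TWI values indicate greater moisture content.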

Precipitation

Precipitation (Figure 3e) serves as the primary watershed input, with both quantity and intensity determining flood risk [25,40]. For this study, an average annual precipitation map was created utilizing a 30-year dataset (from 1990 to 2019) obtained from five rain stations. The precipitation data were acquired from the Iran Water Resources Management Company (IDWRMC) and the Iran Meteorological Organization (IRIMO), as shown in Figure 3e.

Distance to River

Distance to river significantly influences flood probability, as flood events frequently occur near waterways [45,48,49]. Research consistently shows higher flood incidents in river-adjacent areas [40].

Stream Power Index (SPI)

The SPI (Figure 4a) evaluates the relationship between stream power and erosion [40,50]. Lower SPI values typically indicate increased erosion potential and flood susceptibility, particularly in areas prone to stream accumulation during intense rainfall events [44,45]. In the present research, the SPI was calculated using a DEM in raster format at varying resolutions, processed within the SagaGIS software environment [45].

Curvature

Curvature, comprising both plan and profile components (Figure 4b,c), affects water flow velocity and erosion–sedimentation processes [51]. Plan curvature indicates flow convergence or divergence along contour lines, while profile curvature reveals velocity variations along the flow path. Areas with negative curvature values typically correlate with increased runoff activity [50].

Land Use/Cover

Land use/cover (LULC) (Figure 4d) fundamentally affects hydrological dynamics [4,45,52]. Key transformations include the conversion of pasturelands to agriculture, forest to farmland, shifts in irrigation practices, and the urban development of forested areas, each significantly impacting runoff patterns. The land-use/cover map was acquired from IDWRMC.

Soil

Soil characteristics critically affect surface runoff generation and flooding processes. Soil composition changes can significantly impact water retention and runoff generation [44]. Porous and shallow soils, particularly in upper watershed sections, typically generate greater runoff compared to denser soil types [48]. This soil map was compiled from various reports and soil texture maps provided by the East Azerbaijan Regional Water Organization (RWOEA), the Natural Resources Organization (NRO), and the Agricultural Research Center of East Azerbaijan Province (ARCEA), as depicted in Figure 4e.

Lithology

Lithology influences runoff characteristics through its effects on infiltration rates and sediment transport. Different geological formations show varying flood sensitivities [12], with finer soil textures typically reducing permeability and increasing runoff. The geological map of the study area, acquired from the IDWRMC, comprises 31 distinct geological units (Figure 5), each contributing differently to flood susceptibility.

2.3. Flood Modeling Methods

2.3.1. Random Forest (RF)

RF is an ensemble learning algorithm developed by Breiman [53]. It functions through two primary mechanisms [4]: it constructs multiple DTs on randomly selected subsets of the data, and it considers randomly selected subsets of features when building each tree. The primary advantage of RF is that it reduces the likelihood of overfitting by aggregating the predictions from many decision trees, making it highly effective for noisy or incomplete datasets. RF is particularly adept at handling both categorical and continuous variables, and it provides an estimate of feature importance, which is valuable in identifying critical flood-related parameters [4,45].

2.3.2. Random SubSpace Method (RSS)

The RSS method, an ensemble ML technique introduced by Ho in 1998, was designed to tackle various environmental issues [54]. Rather than employing the complete feature set, this method trains each ensemble member on a subspace built from randomly selected feature subsets. This approach effectively reduces overfitting and eliminates redundant features in extensive datasets [15,55]. A detailed description of this algorithm can be found in [56]. Due to its ability to manage datasets with many irrelevant features, it is highly recommended for minimizing the risks associated with overfitting [4]. The RSS method is particularly effective in flood susceptibility modeling, especially when datasets include numerous irrelevant variables. By concentrating on random subsets of features, the RSS maintains the model’s robustness, even when dealing with complex environmental datasets.
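The random-subspace idea can be sketched in a few lines. This is an illustrative toy, not the WEKA implementation used in the study: the base learner is a deliberately simple nearest-centroid classifier, and the feature values, function names, and ensemble settings are all hypothetical.

```python
import random

def train_centroid(X, y, features):
    """Per-class mean over the chosen feature indices (a simple base learner)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, features, row):
    """Return the class whose centroid is closest in the selected subspace."""
    def dist(c):
        return sum((row[f] - c[k]) ** 2 for k, f in enumerate(features))
    return min(centroids, key=lambda label: dist(centroids[label]))

def random_subspace_fit(X, y, n_models=15, subspace_size=2, seed=0):
    """Train each ensemble member on a random subset of the features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    models = []
    for _ in range(n_models):
        features = rng.sample(range(n_features), subspace_size)
        models.append((features, train_centroid(X, y, features)))
    return models

def random_subspace_predict(models, row):
    """Majority vote over all ensemble members."""
    votes = [predict_centroid(c, f, row) for f, c in models]
    return max(set(votes), key=votes.count)

# Hypothetical normalized features: elevation, precipitation,
# distance to river, and an uninformative fourth column.
X = [[0.1, 0.9, 0.2, 0.4], [0.2, 0.8, 0.1, 0.6],   # flood points
     [0.9, 0.2, 0.8, 0.6], [0.8, 0.1, 0.9, 0.4]]   # non-flood points
y = [1, 1, 0, 0]

models = random_subspace_fit(X, y)
print(random_subspace_predict(models, [0.15, 0.7, 0.3, 0.5]))
print(random_subspace_predict(models, [0.9, 0.1, 0.7, 0.5]))
```

Because each member sees only two of the four columns, no single uninformative feature can dominate every member of the ensemble.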

2.3.3. Bagging

Bagging, a technique introduced by Breiman [57], enhances model accuracy through bootstrap aggregation. This approach involves training multiple classification models on distinct bootstrap samples and then combining their predictions to improve overall performance. This method effectively boosts predictive accuracy, even with minor variations in the training data [30,58]. The Bagging process comprises three essential steps: (1) randomly drawing bootstrap samples from the original training dataset to generate multiple subsets; (2) constructing classification models based on these subsets; (3) aggregating the predictions from all models to produce the final output. Bagging is commonly applied in flood susceptibility prediction, particularly when the data are noisy or contain numerous outliers. By generating multiple models and aggregating their predictions, Bagging contributes to the development of a more robust and accurate flood risk model [30,58].
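The three Bagging steps listed above can be sketched as follows. The base learner here is a one-nearest-neighbour rule on a single hypothetical feature (normalized distance to river), chosen only to keep the example short; the study itself used WEKA, and all names and values below are assumptions.

```python
import random

def bootstrap_sample(data, rng):
    """Step 1: draw len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

def nearest_neighbour_label(sample, x):
    """Step 2: a minimal base model, the label of the closest sampled point."""
    return min(sample, key=lambda p: abs(p[0] - x))[1]

def bagging_fit(data, n_models=25, seed=1):
    """Build one bootstrap sample (and hence one base model) per member."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(n_models)]

def bagging_predict(models, x):
    """Step 3: aggregate by averaging the members' 0/1 predictions."""
    votes = [nearest_neighbour_label(sample, x) for sample in models]
    return sum(votes) / len(votes)

# Hypothetical (distance_to_river, flood_label) pairs.
data = [(0.10, 1), (0.15, 1), (0.20, 1), (0.25, 1),
        (0.70, 0), (0.80, 0), (0.85, 0), (0.90, 0)]

models = bagging_fit(data)
p_near = bagging_predict(models, 0.12)   # close to the river
p_far = bagging_predict(models, 0.88)    # far from the river
print(p_near, p_far)
```

Averaging the member votes yields a value between 0 and 1, which can be read as a susceptibility score rather than a hard class.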

2.3.4. M5P Model Tree Algorithm (M5P)

The M5P algorithm is an advanced iteration of Quinlan’s original M5 algorithm, which was introduced in 1992, and it combines DTs with linear regression to enhance predictive accuracy [59]. It is particularly well suited for large datasets, effectively manages incomplete data, and minimizes errors through techniques like pruning and smoothing [60]. The algorithm also suits flood modeling because it can handle diverse predictors and nonlinear relationships, such as interactions between terrain and climate. By integrating decision trees with regression, it offers more interpretable and accurate models, particularly when assessing flood susceptibility in complex and varied environments. The algorithm operates through four key phases: (i) segmenting the input space, (ii) creating linear regression models for the segments, (iii) discarding unnecessary branches from the decision tree to streamline the model, and (iv) applying smoothing techniques to refine the predictions and enhance overall accuracy [61].

2.3.5. Locally Weighted Learning (LWL)

LWL is a method employed for the localized approximation of target functions. In flood susceptibility modeling, LWL is effective for predicting flood risk by considering localized variations in environmental factors like slope and elevation. It is particularly valuable in regions with diverse and spatially variable characteristics, where local patterns play a crucial role in influencing flooding events. It is often regarded as an extension of the k-nearest neighbors algorithm, which is a sample-based approach in ML [62]. This technique emphasizes the importance of nearby data points, allowing for tailored predictions that reflect local variations in the dataset. It operates by retrieving stored sample data during computation and applying weights to nearby data points to approximate complex functions. LWL is versatile, working well for both linear and nonlinear regression, and its performance can be optimized by adjusting various distance metrics [63,64]. For more in-depth information on the LWL algorithm and its latest applications, see [62].
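To make the idea of weighting nearby samples concrete, the following sketch fits a weighted straight line around each query point using a Gaussian kernel. It is a generic one-dimensional locally weighted regression, not the WEKA LWL classifier used in the study; the kernel choice, the bandwidth values, and the step-shaped toy data are assumptions.

```python
import math

def lwl_predict(xs, ys, query, bandwidth=1.0):
    """Predict at `query` from a line fitted with distance-based weights."""
    # Gaussian kernel: samples close to the query get weights near 1.
    weights = [math.exp(-((x - query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)

xs = [float(i) for i in range(11)]

# Exactly linear data: the local fit reproduces the line.
linear = [2 * x + 1 for x in xs]
print(lwl_predict(xs, linear, 3.3))

# Step-shaped data: with a narrow bandwidth, only local samples matter.
step = [0.0 if x < 5 else 1.0 for x in xs]
print(lwl_predict(xs, step, 1.0, bandwidth=0.5))
print(lwl_predict(xs, step, 9.0, bandwidth=0.5))
```

A single global line fitted to the step data would miss both plateaus, whereas the local fits follow each one, which is the behaviour the text attributes to LWL in spatially variable terrain.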

Table 1 provides a summary of the features of five flood prediction models, highlighting their advantages and disadvantages. The overview discusses the strengths and limitations of each model in terms of accuracy, computational requirements, data handling, and applicability to flooding. These models are highly suitable for flood prediction as they perform exceptionally well in managing large datasets, dealing with complex patterns in the data, and reducing errors that may arise from overfitting or noisy inputs.

2.4. Investigating Flood Influencers via Information Gain Ratio and Multicollinearity Methodologies

Evaluating the significance of various factors that influence flooding is a crucial preliminary step before training and validating any predictive model [65]. Each factor’s importance is determined by analyzing its statistical properties and its correlation with instances of flooding. A range of spatial techniques and models has been developed and utilized to assess flood vulnerability and identify areas at risk, facilitating the detection of flood zones. Understanding the relevant set of parameters that contribute to flooding is essential for improving the reliability of the model’s performance [31]. The information gain ratio (IGR) is employed to identify key parameters associated with flood vulnerability due to its effectiveness in evaluating the importance of each influencing factor [66,67]. Its straightforward nature, effectiveness, and capacity to manage multiple variables make it a popular option for environmental and geospatial analyses. In flood susceptibility modeling, IGR proves especially beneficial by identifying the variables with the strongest predictive influence. This method emphasizes the most significant parameters in the modeling process, thereby enhancing the precision and dependability of the results. The IGR is calculated using Equation (2).

\mathrm{GainRatio}(x, Z) = \frac{\mathrm{Entropy}(Z) - \sum_{i=1}^{n} \frac{|Z_{i}|}{|Z|} \mathrm{Entropy}(Z_{i})}{- \sum_{i=1}^{n} \frac{|Z_{i}|}{|Z|} \log \frac{|Z_{i}|}{|Z|}} \qquad (2)

In this study, IGR was used to evaluate the relative importance of various factors affecting flood susceptibility. The training data points (Z) were separated into subsets (Z_i, i = 1, 2, 3, …, n) according to the values of an attribute (x) that represents a specific flood-related characteristic. These attributes include key factors such as elevation, slope, proximity to rivers, and land use, which play crucial roles in flood prediction. The IGR assigns a numerical value to each parameter, with higher values reflecting a greater contribution to flood susceptibility. This numerical ranking aids in prioritizing the most important factors in the modeling process, leading to a more focused and efficient analysis. To ensure the reliability of the IGR analysis and that only relevant predictors are used in the flood ML models, this study evaluated multicollinearity among predictor variables using the Variance Inflation Factor (VIF) and tolerance. Multicollinearity arises when two or more variables are strongly correlated, which can skew the results and compromise the model’s predictive accuracy. By detecting and addressing multicollinearity, this study ensures that the chosen parameters are independent and that each contributes uniquely to the analysis [68]. A VIF value exceeding 5 indicates significant multicollinearity [68]. Tolerance is derived from the multiple correlation coefficient: the variable under evaluation is treated as the dependent variable, while the other predictors are treated as independent variables in a regression analysis. Tolerance values range from 0 to 1, with a value of zero indicating that the variable is entirely predictable from the other variables, implying complete multicollinearity. On the other hand, a tolerance value of one suggests no correlation with the other independent variables [58].
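A small sketch of Equation (2) may help: the numerator is the information gain from splitting Z by an attribute, and the denominator is the split information. The function names and the toy class labels are hypothetical, and base-2 logarithms are an assumption (the ratio itself does not depend on the base, since numerator and denominator scale together).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the attribute divided by its split information."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)      # the subsets Z_i
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    split_info = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets.values())
    if split_info == 0:
        return 0.0                                       # attribute has one value only
    return (entropy(labels) - conditional) / split_info

labels = [1, 1, 0, 0]                        # flood / non-flood
elevation_class = ["low", "low", "high", "high"]
aspect_class = ["N", "S", "N", "S"]

print(gain_ratio(elevation_class, labels))   # attribute separates the classes
print(gain_ratio(aspect_class, labels))      # attribute carries no information
```

In this toy case the first attribute separates the two classes perfectly and scores 1, while the second leaves each subset evenly mixed and scores 0.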

2.5. Evaluation of Models Performance

When estimating the spatial probability of flooding in a specific area, it is crucial to rigorously evaluate and validate the model’s performance. The Receiver Operating Characteristic (ROC) curve is a widely used evaluation tool in geographic hazard modeling [69] and is presented in a two-dimensional format. In this plot, the X-axis represents 1 − specificity (the false positive rate), while the Y-axis represents sensitivity. Sensitivity indicates the proportion of flood pixels correctly classified as floods, whereas specificity measures the accurate identification of non-flood pixels [30]. The area under the ROC curve (AUC) is a key performance indicator derived from this curve. An AUC value of 0.5 indicates that the model’s performance is equivalent to random guessing, while values above 0.5 up to 1 suggest improved model accuracy [15]. For further insights into ROC analysis, readers can refer to the work of Pontius and [70]. In addition, to assess the performance of the models, the correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE) were employed as evaluation metrics.
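The four metrics named above can be computed as in the sketch below. The AUC is obtained from its rank interpretation (the share of flood/non-flood pairs in which the flood point receives the higher score, with ties counted as one half), which equals the area under the ROC curve. The scores and labels are made-up illustrative values, not results from the study.

```python
import math

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def pearson_r(pred, obs):
    """Correlation coefficient R between predictions and observations."""
    mp, mo = sum(pred) / len(pred), sum(obs) / len(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    return cov / (sp * so)

# Hypothetical predicted susceptibilities and observed labels (1 = flood).
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0, 0]

print(auc(scores, labels), rmse(scores, labels),
      mae(scores, labels), pearson_r(scores, labels))
```

For these values, eight of the nine flood/non-flood pairs are ranked correctly, so the AUC is 8/9.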

3. Results

3.1. Flood Influential Factors

The results of the analysis are presented in Figure 6, where the IGR values for each variable were determined using the 10-fold cross-validation method. Among the variables examined, elevation (0.174), precipitation (0.1456), and proximity to rivers (0.1206) were found to have the highest IGR values, signifying their major influence on flood susceptibility. These factors are crucial in predicting flood events in the study area. Moderately influential variables, with IGR values between 0.0658 and 0.0820, include aspect (0.0820), plan curvature (0.08), TWI (0.0792), slope (0.0769), profile curvature (0.0758), and SPI (0.0658). While these factors are less impactful than elevation, precipitation, and proximity to rivers, they still play an important role in determining flood risk. In contrast, land use/land cover (0.0378), lithology (0.0361), and soil type (0.0234) were assigned the lowest IGR values, suggesting their relatively minor influence on flood susceptibility in the study region. Elevation and slope are identified as the most critical factors in flood occurrence, which aligns with findings by Bui [40]. This study illustrates that areas with lower elevation, converging curvatures, and gentle slopes are more vulnerable to flooding. These topographic features tend to form depressions where water accumulates, increasing the likelihood of floods. On the other hand, the aspect variable, which indicates the direction a slope faces, did not emerge as a reliable predictor of flood events. This result suggests that flood-prone areas in this study region are mostly flat or gently sloping, reducing the importance of aspect in determining flood risk. Unlike steep slopes, areas with little to no slope do not exhibit significant directional exposure, making aspect less relevant in this context.

Strong correlations between independent variables can lead to inaccurate predictions. To assess these correlations, the VIF and tolerance metrics were utilized. A threshold of 5 was applied to evaluate the degree of multicollinearity. The findings revealed that all VIF values for the variables analyzed were below this limit, indicating that the variables are largely independent, and multicollinearity is not a concern. Table 2 presents the analysis results, where the elevation factor had the highest VIF of 4.921, and the slope factor recorded the lowest at 1.03.
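As a worked illustration of how the two diagnostics relate, the sketch below converts the reported extreme VIF values into tolerance and the implied R² of each predictor regressed on the others. It relies on the standard definitions tolerance = 1 − R² and VIF = 1 / tolerance; the function names are hypothetical, and only the two VIF values quoted above are used.

```python
VIF_THRESHOLD = 5.0   # threshold applied in the text

def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif

def r_squared_from_vif(vif):
    """R^2 of the predictor regressed on all other predictors."""
    return 1.0 - 1.0 / vif

# Highest and lowest VIF values reported in the text (Table 2).
reported_vif = {"elevation": 4.921, "slope": 1.03}

for name, vif in reported_vif.items():
    print(name,
          round(tolerance_from_vif(vif), 4),
          round(r_squared_from_vif(vif), 4),
          vif < VIF_THRESHOLD)
```

Under these definitions, elevation's VIF of 4.921 corresponds to a tolerance of about 0.203, meaning roughly 80% of its variance is explained by the other predictors, which is close to but still under the threshold; slope's VIF of 1.03 corresponds to a tolerance of about 0.971.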

3.2. Overview of Flood Parameters

Accurate modeling is essential for understanding the spatial relationships between natural hazards and their contributing factors [39]. Flood occurrences, in particular, are influenced by a wide range of factors [67]. This study identified twelve critical variables that impact flood risk: LULC, distance to rivers, elevation, slope, TWI, SPI, profile curvature, plan curvature, precipitation, aspect, soil type, and lithology. The analysis reveals that elevation is the most influential factor, followed by precipitation and proximity to rivers. Other factors, ranked by importance, include aspect, plan curvature, TWI, slope, profile curvature, SPI, LULC, lithology, and soil type. These findings are largely consistent with prior research by Talukdar [67] and Chapi [71], although those studies identified LULC and slope as the primary factors affecting flood susceptibility.

As illustrated in Figure 3a, there is a clear inverse relationship between elevation and flood probability. The elevation map was generated using a 30 × 30 m resolution DEM within the ArcGIS platform. The study area’s elevation ranges from 921 to 3294 m, with floods being recorded across this range. The data show that flood risk decreases as elevation increases: flood susceptibility is highest at the lowest elevations, near 921 m, whereas the highest areas, approaching 3294 m, are significantly less prone to flooding. The results are in agreement with those reported by Hong [72].

Figure 3b shows the slope response curve, indicating that the probability of flooding increases as the slope angle decreases. This finding is consistent with earlier studies [6,17], which also highlight that flat areas are more vulnerable to flooding. The slope angles in flood-prone regions range from 0 to 79.69 degrees, with the risk of flooding decreasing as the slope becomes steeper. Flatter terrains, in particular, exhibit a much higher susceptibility to flooding.

In this research, an aspect (slope direction) map was generated and categorized into nine distinct intervals: (0–22.5), (22.5–67.5), (67.5–112.5), (112.5–157.5), (157.5–202.5), (202.5–247.5), (247.5–292.5), (292.5–337.5), and (337.5–360) degrees, as illustrated in Figure 3c.
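The nine-interval classification can be expressed as a simple lookup, sketched below. The compass labels attached to each interval and the rule that a boundary value falls into the upper interval are assumptions added for illustration; the text only gives the numeric breaks.

```python
import bisect

# Upper bounds of the first eight intervals, in degrees.
BREAKS = [22.5, 67.5, 112.5, 157.5, 202.5, 247.5, 292.5, 337.5]
# Assumed compass labels; the first and last intervals both face north.
LABELS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW", "N"]

def aspect_class(aspect_deg):
    """Return (interval number 1..9, assumed compass label) for an aspect."""
    if not 0 <= aspect_deg <= 360:
        raise ValueError("aspect must lie between 0 and 360 degrees")
    idx = bisect.bisect_right(BREAKS, aspect_deg)
    return idx + 1, LABELS[idx]

for angle in (10.0, 90.0, 200.0, 350.0):
    print(angle, aspect_class(angle))
```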

The SPI and TWI are two essential hydrological factors that significantly influence the spatial distribution of floods. The SPI, illustrated in Figure 4a, indicates soil moisture content and the potential for erosion from floodwaters in downstream areas of the watershed [73]. Generally, lower SPI values correspond to a higher probability of flooding, as regions with greater stream accumulation potential typically have reduced SPI values [34]. An SPI value near zero signifies a markedly increased flood risk, consistent with the observation that most flood events in the study area are linked to regions with lower SPI values. The maximum SPI recorded in this study was 600 (Figure 4a). In contrast, the TWI, as depicted in Figure 3d, serves as a measure of saturation levels and water accumulation within the watershed. Areas with higher TWI values tend to be more susceptible to flooding. In this research, TWI values ranged from −18.31 to 19.67 (Figure 3d).

Previous studies have established a clear link between increased precipitation and the occurrence of floods [74]. We found that precipitation levels of about 500 mm contribute significantly to flood events by generating runoff that flows downstream into rivers. Additionally, precipitation amounts around 250 mm are associated with the highest probabilities of flooding, particularly in downstream areas where runoff from the upper watershed collects. The annual precipitation in the study area ranges from 250 to 500 mm, decreasing from the eastern to the western regions of the plain (Figure 3e).

Floods commonly result from overflow from riverbeds, which makes the distance from rivers (Figure 3f) an essential factor for predicting areas likely to experience flooding within a basin [40]. Areas situated near rivers tend to experience quicker responses to heavy rainfall, leading to increased vulnerability to flooding. We found that distances of up to 1540 m from rivers have the most significant effect on flood occurrence. As the distance to the river decreases, the likelihood of flooding increases. This finding is consistent with observations made by Chapi [71] and Hong [72], who noted a strong correlation between proximity to rivers and flood events. The farthest distance to the river in this study is 1540 m (Figure 3f).

In this study, we identified land surfaces with curvature values between 1.0 and 2.0 as areas prone to flooding [12]. The curvature maps, obtained from the DEM, show plan curvature values ranging from −1.29 to 4.44 and profile curvature values from −6.37 to 8.66 (Figure 4b,c). These curvature values are primarily linked to flat terrains. Our results reveal that curvature values below one are consistently associated with a reduced probability of flooding. In contrast, convex slopes, characterized by curvature values greater than zero, demonstrate an increased likelihood of flooding. The findings of [23,71] align with this observed relationship between curvature values and flooding susceptibility.

Barren lands play a significant role in increasing runoff, highlighting the important influence of land cover on flood occurrences in the study area [4]. In our research, LULC was classified into ten categories, including forest, agricultural land, orchards, dry farming, pasture, and water bodies (Figure 4d). Of these categories, agricultural land was found to have the most considerable effect on flooding, while orchards had the least impact. Barren lands are especially prone to flooding because they lack vegetation, which would otherwise slow water flow and promote infiltration. In contrast, forests showed very little influence on flooding, as they effectively reduce runoff and improve water absorption.

Soil data play a vital role in assessing excess rainfall and infiltration [75]. This study identified five distinct soil types, including Aridisol, Inceptisol, and Entisol, as shown in Figure 4e.

In relation to lithology (Figure 5), areas characterized by resistant rocks or underground materials with high permeability typically exhibit lower drainage density, leading to a reduced probability of flooding [29]. In this analysis, while the sediments of foothill plains and valley terraces indicated a higher likelihood of flood occurrence, other lithological sediments displayed similar importance in contributing to flooding events (Figure 5).

3.3. Assessment of Flood Risk Models

Flood vulnerability maps were developed at the pixel level within the watershed using five ML models: RSS, RF, LWL, M5P, and Bagging. Among these models, the LWL model was identified as the most accurate predictor for the geographic datasets based on the empirical findings. For the analysis, various reclassification techniques were available in the ArcGIS software [22]. Among the reclassification techniques for flood vulnerability, the quantile and natural breaks methods have been extensively discussed in the literature [4]. The quantile method, in particular, is well regarded for its capacity to deliver more precise results compared to alternative techniques, which makes it a preferred option for reclassifying flood vulnerability maps [22]. We adopted the quantile reclassification method to categorize flood vulnerability into five specific classes (Figure 7) as very low, low, moderate, high, and very high [4].
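As a rough illustration of quantile reclassification into five classes, the sketch below derives four break values from a set of susceptibility scores and reports the share of cells in each class. It is a minimal NumPy sketch, not the ArcGIS procedure used in the study. The function names and the synthetic `scores` array are assumptions for illustration only.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=5):
    """Return the n_classes - 1 break values that split `values` into equal-count classes."""
    probs = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    return np.quantile(np.asarray(values, dtype=float), probs)


def class_shares(values, breaks):
    """Percentage of cells falling into each class defined by `breaks`."""
    idx = np.digitize(np.asarray(values, dtype=float), breaks)
    counts = np.bincount(idx, minlength=len(breaks) + 1)
    return 100.0 * counts / counts.sum()


# Hypothetical susceptibility scores for 100 cells.
scores = np.arange(100) / 100.0
breaks = quantile_breaks(scores)
shares = class_shares(scores, breaks)
```

When the breaks are taken from the same values that are being classified, each class holds roughly one fifth of the cells. The shares differ from 20% only when the breaks are derived from a different sample than the cells being mapped.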

Figure 7 presents five flood sensitivity models, each categorized into distinct classes, highlighting significant variations in regional coverage for these sensitivity levels. In the RSS model, the very-low-sensitivity category encompasses the smallest section at 3.14%, while low sensitivity covers 10.88%. Moderate sensitivity occupies 20.20%, high sensitivity makes up 56.85%, and very high sensitivity accounts for 8.93%. The predominant high-sensitivity class underscores a considerable flood risk in these regions. Similarly, the Bagging model exhibits substantial coverage for the high-sensitivity class at 55.45%. The proportions for the other classes are notably lower, with very low sensitivity at 1.11%, low sensitivity at 7.53%, moderate sensitivity at 26.57%, and very high sensitivity at 9.34%. This pattern mirrors that of the RSS model, where high-sensitivity areas are prevalent. In the M5P model, coverage for sensitivity classes is as follows: very low sensitivity accounts for 1.79%, low sensitivity for 11.29%, moderate sensitivity for 24.94%, high sensitivity for 52.97%, and very high sensitivity for 9.00%. Once again, the high-sensitivity class dominates, consistent with trends seen in earlier models. The RF model offers a more balanced distribution, with the largest area occupied by the low-sensitivity class at 31.64%. This is followed by moderate sensitivity at 27.03%, high sensitivity at 24.90%, very high sensitivity at 11.61%, and very low sensitivity at 4.82%. This distribution suggests a more equitable risk allocation among the low-, moderate-, and high-sensitivity areas compared to the other models. In contrast, the LWL model performs exceptionally well, with the moderate-sensitivity class covering the largest proportion of 38.51%. The low-sensitivity class accounts for 29.23%, while the high-sensitivity class comprises 20.73%. The very low and very high classes occupy 1.74% and 9.79%, respectively.
Unlike the other models, the LWL model showcases a more precise distribution of flood risk, particularly in moderate- and low-sensitivity areas, indicating superior overall performance. While the RSS, Bagging, and M5P models place a strong emphasis on high-sensitivity areas, the LWL model provides a more balanced and accurate representation of flood sensitivity, making it particularly well suited for detailed flood risk analysis. Although the RF model offers a more equitable perspective, it does not achieve the same level of accuracy as the LWL model. Overall, the LWL model stands out for its comprehensive depiction of flood sensitivity across various risk levels.

The maps in Figure 8 reveal that areas with a higher sensitivity to flooding are primarily located in the flat plains adjacent to the main river, corroborating findings from [22]. Significant concentrations of these sensitivity classes were noted in the western, northwestern, and southeastern parts of the river basin. Across the five different models, a nearly uniform spatial distribution pattern of flood susceptibility regions was observed, particularly for those categorized as high and very high sensitivity. However, the LWL model assigned relatively lower percentages to these sensitivity categories than the other models. Together with its higher accuracy on the test data, this suggests that the LWL model offers a more practical and reliable outcome. The insights derived from these results hold substantial implications for effective planning in the region, potentially streamlining time and costs associated with land-use planning and policymaking. By utilizing the findings from the LWL model, decision-makers can make more informed choices regarding flood risk management and resource allocation in vulnerable areas.

The regions adjacent to the river basin and the primary river channel showed significant vulnerability to flooding, classified within the very-high-sensitivity category on the flood sensitivity map [76]. The areas marked in red on the map correspond to the western part, which is characterized by both very high and high flood sensitivity (Figure 8). Additionally, Arabameri [58] noted a comparable distribution of flood-sensitive zones in relation to agricultural lands in their investigation, further corroborating these results.

3.4. Evaluation and Validation of Models

The ROC curve was utilized to validate the five flood sensitivity models using both training and testing data points (Figure 1). This analysis evaluated the flood sensitivity maps produced by the models in relation to the training dataset, which contributed to the development of the existing flood success rate curve, and the testing dataset, which assessed the accuracy of the flood event predictions. To enhance this analysis, both flood training and testing points were superimposed on the flood sensitivity maps. The accuracy of each model was determined by calculating the AUC; a higher AUC indicates better predictive performance. The ROC curves of the five flood sensitivity models for the training dataset are illustrated in Figure 9, with detailed metrics provided in Table 3 and Figure 7. The results demonstrate that the RF model attained the highest AUC of 0.988, accompanied by an SE of 0.003 and a 95% CI between 0.982 and 0.994. The Bagging model closely followed, achieving an AUC of 0.984 (SE: 0.003, 95% CI: 0.978–0.991). Similarly, the LWL model produced an AUC of 0.981 (SE: 0.006, 95% CI: 0.964–0.984). The RSS model demonstrated robust performance with an AUC of 0.975 (SE: 0.005, 95% CI: 0.965–0.984), while the M5P model recorded an AUC of 0.972 (SE: 0.005, 95% CI: 0.962–0.982). All models exhibited strong predictive capabilities, with AUC values exceeding 0.97, highlighting the RF model as the most effective in accurately predicting flood events on the training dataset in this study.

Figure 9 presents the ROC curves utilized to evaluate five flood sensitivity models based on the test dataset, with detailed metrics shown in Table 4 and Figure 7. The results demonstrated that the LWL model achieved the highest AUC of 0.965, with an SE of 0.018 (95% CI: 0.929–1.000). The RF model closely followed, recording an AUC of 0.961 (SE: 0.018, 95% CI: 0.925–0.997). The Bagging model also performed well, attaining an AUC of 0.953 (SE: 0.023, 95% CI: 0.908–0.999). In comparison, the RSS and M5P models showed similar outcomes, with AUC values of 0.940 (SE: 0.017, 95% CI: 0.906–0.974) and 0.941 (SE: 0.022, 95% CI: 0.899–0.984), respectively. These results indicate that all models exhibited strong predictive accuracy, with AUC values of at least 0.94. Importantly, the LWL model distinguished itself by providing superior predictions of flood sensitivity for the validation dataset, while the RF and Bagging models showed comparable effectiveness. Overall, these findings suggest that the LWL model is particularly effective for flood sensitivity assessment, although the other models remain valuable alternatives for evaluating flood risk in the study area.
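The relationship between an AUC, its standard error, and a 95% confidence interval can be sketched as follows. The text does not state how the SE values were obtained, so the Hanley and McNeil approximation used here is an assumption, as are the function names and the example scores.

```python
import math

import numpy as np


def auc_mann_whitney(scores_pos, scores_neg):
    """AUC as the probability that a flood point outscores a non-flood point (ties count half)."""
    pos = np.asarray(scores_pos, dtype=float)
    neg = np.asarray(scores_neg, dtype=float)
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC under the Hanley-McNeil approximation (assumed method)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    var = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc**2)
        + (n_neg - 1) * (q2 - auc**2)
    ) / (n_pos * n_neg)
    return math.sqrt(var)


def normal_ci(auc, se, z=1.96):
    """Normal-approximation confidence interval, clipped to the valid range [0, 1]."""
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical model scores at flood and non-flood points.
auc = auc_mann_whitney([0.9, 0.8, 0.7], [0.1, 0.2, 0.75])
se = hanley_mcneil_se(auc, n_pos=3, n_neg=3)
lower, upper = normal_ci(0.965, 0.018)
```

With an AUC of 0.965 and an SE of 0.018, the normal approximation gives an interval of about 0.930 to 1.000 once the upper bound is clipped at 1.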

As summarized in Table 5, the performance of the five flood susceptibility models was assessed on both the training and testing datasets using key evaluation metrics, including RMSE, MAE, and R². The RF model achieved the best results during the training phase, recording the lowest RMSE (0.032), the lowest MAE (0.023), and the highest R² (0.980). The ranking of the models based on performance in this phase is as follows: RF > LWL > Bagging > RSS > M5P. The LWL model outperformed the others during the testing phase, achieving an R² of 0.960, an RMSE of 0.082, and an MAE of 0.060. The RF model followed closely, with an R² of 0.956 and an RMSE of 0.088. The performance rankings for the validation phase are as follows: LWL > RF > M5P > Bagging > RSS. Throughout both the training and validation phases, the LWL and RF models displayed robust predictive capabilities, highlighting their effectiveness in flood susceptibility assessment in this research. In contrast, the RSS model showed the poorest performance, particularly during validation, with the highest RMSE of 0.155 and MAE of 0.127, making it the least effective model for flood susceptibility mapping. These results underscore the dependability of the hybrid combined model approach for evaluating flood susceptibility maps, aligning with the findings of [77].
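The three error metrics can be computed as in the minimal sketch below. R² is taken here as the coefficient of determination, 1 − SS_res/SS_tot; the text does not say whether this form or the squared correlation coefficient was used, so that choice is an assumption, as are the function name and the example values.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 (coefficient of determination) as a dict."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals**2)))
    mae = float(np.mean(np.abs(residuals)))
    ss_res = float(np.sum(residuals**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical flood (1) / non-flood (0) targets and model outputs.
metrics = regression_metrics([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2])
```

Lower RMSE and MAE and a higher R² indicate a closer fit, which is the basis for the model rankings reported above.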

This research sought to evaluate the effectiveness of the LWL model relative to several other models, including RSS, RF, Bagging, and M5P. To achieve optimal performance from these advanced models, meticulous parameter tuning was necessary, as outlined in Table 6. The LWL, RSS, RF, Bagging, and M5P models were employed for flood susceptibility prediction, focusing on the most influential flood-related factors. Evaluation metrics were calculated to compare the performance of the LWL model against the benchmark models, using both training and testing datasets (Table 5).

4. Discussion

Flooding is a highly destructive natural disaster, particularly in mountainous regions, with Iran being significantly affected [9]. Mapping flood vulnerability is essential for the effective prevention and management of future flooding events. Nevertheless, accurately predicting the occurrence of floods poses a significant challenge due to the complex nature of contributing factors.

This study investigated a variety of climatic, hydrological, and geo-environmental factors that contribute to flooding, employing GIS techniques for effective data preparation. Prior research, such as that by [25,32,58,78], has shown the benefits of integrating multiple flood-related variables to improve the accuracy of flood vulnerability assessments. Our results corroborate this viewpoint, highlighting the significance of incorporating a broad range of climatic, hydrological, and geographical factors to enhance the reliability of flood vulnerability evaluations.

4.1. Evaluating Uncertainties in Hydrological Modeling

This study employed the IGR method to evaluate the impact of various parameters associated with flooding. The significance of these flood-related factors can differ by geographical context, as noted by Arora [79]. Such discrepancies arise from the diverse conditions that lead to flooding in different regions [80]. The analysis indicated that elevation is an important factor influencing flood vulnerability, while geology, soil features, and LULC were identified as less significant. Key variables in flood vulnerability modeling include elevation, slope, and proximity to water bodies. Generally, regions characterized by low elevation, gentle slopes, and proximity to rivers or streams are at a heightened risk of flooding [32,58]. This finding is supported by the research of [47,81], who also emphasized the importance of elevation in flood risk assessments. Researchers such as [22,71] reached similar conclusions. Furthermore, as [31,58] mentioned, heavy rainfall has been recognized as a key external factor contributing to flood events. This observation aligns with earlier studies [76,78], which stress the significance of rainfall in the causes of floods. Additionally, the expansion of urban areas and the reduction in vegetation cover have been linked to an increased likelihood of flooding due to the presence of impermeable surfaces, which result in elevated surface runoff [76].
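The information gain ratio of a classified conditioning factor can be sketched as below, assuming the usual definition: the information gain about flood occurrence divided by the split information (the entropy of the factor classes). The function names and toy arrays are illustrative and do not reproduce the study's computation.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def information_gain_ratio(factor_classes, flood_labels):
    """Information gain of a classified factor divided by its split information."""
    x = np.asarray(factor_classes)
    y = np.asarray(flood_labels)
    conditional = 0.0
    for value in np.unique(x):
        mask = x == value
        conditional += mask.mean() * entropy(y[mask])
    split_info = entropy(x)
    if split_info == 0:
        return 0.0  # a factor with a single class carries no information
    return (entropy(y) - conditional) / split_info


# Hypothetical example: two elevation classes and flood (1) / non-flood (0) labels.
igr_informative = information_gain_ratio([0, 0, 1, 1], [1, 1, 0, 0])
igr_uninformative = information_gain_ratio([0, 1, 0, 1], [1, 1, 0, 0])
```

A factor whose classes separate flood from non-flood points perfectly scores 1, while a factor unrelated to flood occurrence scores 0; factors can then be ranked by this value.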

In this study, plan curvature and profile curvature were found to range from −1.29 to 4.44 and −6.47 to 8.66, respectively. According to Talukdar [67], curvature values between 1.0 and 2.0 indicate an increased risk of flooding, a conclusion that is consistent with research by [23,58]. The aspect values measured in this study ranged from 337.5 to 360 degrees, resulting in an IGR value of 0.0820. Both aspect and slope emerged as significant factors in flood modeling, supporting findings by Islam [4] and Rahmati [6]. The slope varied from 0 to 79.69 degrees, influencing water velocity, with [58] highlighting its importance in flood generation. Furthermore, Tehrany [17] noted that lower slopes are associated with a higher likelihood of flooding. These findings suggest that the relatively low slope in the area significantly increases the probability of flooding.

SPI and TWI are critical components in analyzing the spatial variability of flooding events. In this study, TWI values ranged from −18.31 to 19.67, highlighting the significant role of topography in influencing flood occurrences, as observed by Chen [31]. Additionally, LULC and soil type were recognized as important factors in flood analysis, which aligns with the findings of [82]. Moreover, Azareh [42] pointed out that factors such as soil texture, land use, elevation, and frequent heavy rainfall play significant roles in flooding in Iran, further supporting the conclusions drawn in this research.

4.2. Impacts of Flood Variability on the Modeling Processes

Researchers have stressed the need for advanced techniques to produce accurate and reliable flood vulnerability maps [46,83]. As evidenced by previous studies [35,67], ML models have consistently proven their effectiveness and high accuracy in flood vulnerability assessments. Five robust ML models (RF, RSS, M5P, Bagging, and LWL) were employed to evaluate flood vulnerability in the Marand Plain of Iran. The results align with findings from earlier research, reinforcing the exceptional accuracy and capabilities of ML models in assessing flood vulnerability.

LWL excels in locally approximating the target function, allowing it to effectively respond to variations in data at the local level. In the context of flood modeling, this ability is highly beneficial for managing spatial heterogeneity, where flood patterns can vary significantly across different areas or terrains. Floods are influenced by many local parameters (like soil type, vegetation, topography, and proximity to water sources), and LWL’s ability to model these variations separately helps improve its prediction accuracy.

The LWL model assigns weights to data points within a local neighborhood, prioritizing those closer to the prediction point. In flood modeling, this approach is advantageous as it gives greater importance to nearby factors, such as rainfall or elevation, which are more relevant to flood susceptibility at a specific location. By focusing on local conditions, the LWL model enhances its ability to make more accurate predictions, making it particularly effective in environments where local variations significantly impact flood risk. This weighting mechanism ensures better alignment with the spatial characteristics of the area being studied. In this study, basing the weighting function on Euclidean distance increases accuracy, as it concentrates predictions on geographically or contextually closer observations. Floods are often highly dependent on location-specific factors, making this feature critical.

Flood prediction often involves nonlinear and complex relationships between variables (such as rainfall intensity, elevation, and land use). LWL can model this complexity by approximating the function locally and using a different regression model for each neighborhood. This means that LWL does not assume a single global model for the entire dataset but can apply simpler models that are suitable for local areas. This is reflected in the performance of the LWL model, which reached high accuracy in both the training (R² = 0.980) and testing (R² = 0.960) phases. The model effectively generalizes to new, unseen data while capturing the local variations that impact flood susceptibility. Such capabilities render the LWL model particularly effective for predicting flood risk.
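The idea of fitting a separate, distance-weighted linear model around each prediction point can be sketched as follows. This is a generic locally weighted linear regression with a Gaussian kernel on Euclidean distance, written for illustration only. It is not the implementation used in the study, and the function name, the kernel choice, and the `bandwidth` parameter are assumptions.

```python
import numpy as np


def lwl_predict(X_train, y_train, x_query, bandwidth=1.0):
    """Predict at x_query with a linear model fitted by distance-weighted least squares.

    Training points are weighted by a Gaussian kernel of their Euclidean
    distance to the query point, so nearby observations dominate the fit.
    """
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    q = np.asarray(x_query, dtype=float)

    distances = np.linalg.norm(X - q, axis=1)
    weights = np.exp(-0.5 * (distances / bandwidth) ** 2)

    # Weighted least squares via row scaling by sqrt(weight).
    design = np.hstack([np.ones((X.shape[0], 1)), X])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)

    return float(np.concatenate(([1.0], q)) @ coef)


# Hypothetical one-factor example in which the response is exactly linear.
X_demo = [[0.0], [1.0], [2.0], [3.0]]
y_demo = [1.0, 3.0, 5.0, 7.0]
prediction = lwl_predict(X_demo, y_demo, [1.5], bandwidth=1.0)
```

A new model is fitted for every query point, so a smaller bandwidth follows local variation more closely at the cost of using fewer effective observations per fit.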

The LWL model utilizes a local fitting approach instead of a global one, which reduces its susceptibility to overfitting in comparison to other modeling techniques. The weights assigned to closer points act as a form of regularization, helping the model avoid fitting noise or irrelevant fluctuations in the data. The robustness of the model is reflected in the low RMSE values for both the training (0.010) and testing (0.082) phases, indicating high prediction accuracy with minimal error variance.

In this study, LWL delivered the highest ROC-AUC score on the test data (0.965), demonstrating a better ability to distinguish flood-prone areas. While the RF model performed better during training, LWL slightly outperformed RF in the testing phase, showing a more balanced trade-off between sensitivity and specificity on the ROC curve. This suggests that LWL’s local fitting strategy may generalize flood predictions better across different regions.

A considerable body of research has utilized AUC values derived from ROC curves to assess the performance of various models [32,47]. For example, ref. [17] employed a combination of frequency ratio and logistic regression (LR) techniques to evaluate flood vulnerability in Kelantan, Malaysia. Their findings indicated that the FR-LR model achieved a 90% success rate and an 83% prediction rate, outperforming the DT model, which had an 87% success rate and an 82% prediction rate. Similarly, Chapi [71] introduced a hybrid method that merged Bagging with a logistic model tree (Bagging–LMT) to identify flood-prone areas in Iran’s Haraz Basin. This approach utilized Bayesian logistic regression and RF, resulting in enhanced predictive accuracy. Additionally, Chen [21] investigated flood vulnerability predictions using a Reduced Error Pruning Tree (REPTree) along with Bagging and RSS methods within a GIS framework. Their results demonstrated that RSS-REPTree provided the highest predictive performance, achieving AUC values of 0.949 and 0.907, with the lowest standard error and narrowest confidence intervals for both training and validation datasets. Ref. [65] explored flood vulnerability in Bangladesh by employing Bagging ensembles that included REPTree, RF, M5P, and Random Tree models. Their analysis concluded that the M5P model within the Bagging ensemble delivered the most effective results. In this study, the LWL model outperformed the other four models on the test data, achieving AUC values of 0.981 for training and 0.965 for testing. Refs. [4,25,76] highlight the effectiveness of various ML techniques, such as ANN, linear regression (LR), RSS, DT, RF, and Bagging. These approaches exhibited prediction accuracies (R²) ranging from 70% to 88% for different types of flooding.

4.3. Human Interventions in Flood Management

This study offers several key benefits, such as the incorporation of diverse climatic, hydrological, and geographical factors to assess flood vulnerability comprehensively. It provides an in-depth analysis of five ML models, highlighting their performance in identifying regions at the greatest risk of flooding. Furthermore, the research evaluates and compares these five ML approaches under consistent conditions, utilizing four distinct accuracy metrics to enhance the robustness of the findings.

This study has several limitations, particularly concerning the flood event database utilized. To enhance the comparison and validation of the models developed, it would be beneficial to broaden the database to include a wider range of river flood events specific to the study area. Additionally, while this research focuses on the potential for river flooding concerning land characteristics and LULC, it fails to adequately consider important factors like flow dynamics and precipitation patterns. To improve the assessment of river flood risk in the study area, incorporating an analysis of flood discharge probabilities over specific return periods would be beneficial [84]. Additionally, variability in spatial resolution among the features—such as DEMs, LULC, geology, and soil layers—presents another significant concern. Addressing these inconsistencies is essential for enhancing the accuracy of the flood risk evaluation. This inconsistency introduces uncertainties that are challenging to quantify. For the sake of uniformity, all thematic layers were resampled to a resolution of 30 m; nevertheless, the results obtained remain useful for mapping flood probabilities. It is important to recognize that outcomes can differ significantly across various regions and with different DEMs [44]. Future research should consider replicating this methodology by comparing multiple models across diverse regions and datasets. The findings indicate that increasing spatial resolution does not automatically improve the accuracy of flood probability models; in fact, higher resolutions (such as 5 m) can lead to reduced accuracy, a trend noted by other researchers [31]. Moreover, different scales of DEMs can yield varying results, highlighting the importance of testing DEMs at multiple scales to identify the optimal one for flood probability mapping. 
Additionally, changes in the DEM influence only the factors associated with it [45], while other elements such as lithology and precipitation remain unaffected. As a result, the decline in model performance may stem from the absence of variability in factors that are independent of the DEM.

The multi-hazard flood probability maps created in this study serve as essential resources for authorities to detect and prioritize areas at risk of various flooding scenarios. This strategy enhances the decision-making process when implementing effective flood management initiatives. Furthermore, the methodology employed facilitates the swift and efficient production of flood probability maps, even in regions where data may be scarce. The resulting maps can be integrated with hydrodynamic models, thereby improving the precision of flood risk forecasts and evaluations, making them relevant for application in countries with comparable geographic and environmental settings. In addition, our methodology can be modified to evaluate the potential for other environmental hazards, including landslides, snow avalanches, debris flows, land subsidence, and gully erosion.

5. Conclusions

This research utilized five modeling approaches (Bagging, M5P, RF, RSS, and LWL) to generate flood vulnerability maps for the first time in the Marand Plain, Iran. We analyzed a dataset of 485 locations affected by flooding, focusing on twelve parameters that contribute to flood risk: slope, elevation, profile curvature, plan curvature, aspect, SPI, TWI, LULC, precipitation, proximity to rivers, lithology, and soil types. The significance of these parameters was evaluated using the IGR method. To validate the performance of the flood vulnerability models, we employed the ROC curve, which demonstrated that the LWL model, together with RF, achieved the highest performance, yielding an AUC of about 0.96. However, all models exhibited strong capabilities in mapping flood vulnerability. The results underscored the effectiveness of LWL, M5P, Bagging, and RF in modeling flood risks, revealing that approximately 9.79% of the region falls within the very-high-sensitivity class. A primary limitation of this study is the lack of consideration for temporal variations in certain dynamic factors, such as SPI and LULC. Future research will seek to incorporate these temporal elements by utilizing available time-series data for these variables. Additionally, sensitivity analyses of the influencing parameters could further enhance the models’ accuracy. The integration of LWL with the RF algorithm provides distinct advantages, including a smaller set of candidate parameters, improved optimization processes, and quicker convergence in generating flood vulnerability maps. Effective flood management is increasingly critical, especially in Iran, where flash floods occur annually. Despite the urgent need for effective interventions, many basins and regions remain unassessed for flood control strategies.
This study leverages advanced ML algorithms to offer valuable insights that can assist local authorities and stakeholders in formulating effective flood reduction strategies and guiding land-use planning policies.

This study aimed to identify the factors affecting floods in the study area using data-driven models. Identifying these factors aids in mapping flood-prone areas and in determining regions that may be at risk in potential future floods. By combining map-based and data-driven approaches, a more comprehensive perspective for flood management is provided, offering better tools for managers to combat natural disasters.

Author Contributions

A.A.R. performed the conceptualization, methodology, validation, writing (original draft preparation), supervision, data curation, software, formal analysis, and visualization stages; M.T.S. and H.A. performed the conceptualization, methodology, writing (review and editing), visualization, and supervision phases; A.M. performed the conceptualization, writing (review and editing), visualization, and supervision phases. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Mousavi, S.M.; Rostamzadeh, H. Estimation of flood land use/land cover mapping by regional modelling of flood hazard at sub-basin level case study: Marand basin. Geomat. Nat. Hazards Risk 2019, 10, 1155–1175.
Pradhan, B.; Lee, S.; Dikshit, A.; Kim, H. Spatial flood susceptibility mapping using an explainable artificial intelligence (XAI) model. Geosci. Front. 2023, 14, 101625.
UNISDR. Global Assessment Report on Disaster Risk Reduction. International Strategy for Disaster Reduction (ISDR); UNISDR: Geneva, Switzerland, 2011.
Islam, A.R.M.T.; Talukdar, S.; Mahato, S.; Kundu, S.; Eibek, K.U.; Pham, Q.B.; Kuriqi, A.; Linh, N.T.T. Flood susceptibility modelling using advanced ensemble machine learning models. Geosci. Front. 2021, 12, 101075.
Jahandideh-Tehrani, M.; Zhang, H.; Helfer, F.; Yu, Y. Review of climate change impacts on predicted river streamflow in tropical rivers. Environ. Monit. Assess. 2019, 191, 752.
Rahmati, O.; Kornejady, A.; Samadi, M.; Nobre, A.D.; Melesse, A.M. Development of an automated GIS tool for reproducing the HAND terrain model. Environ. Model. Softw. 2018, 102, 1–12.
Vaghefi, S.A.; Keykhai, M.; Jahanbakhshi, F.; Sheikholeslami, J.; Ahmadi, A.; Yang, H.; Abbaspour, K.C. The future of extreme climate in Iran. Sci. Rep. 2019, 9, 1464.
Shokri, A.; Sabzevari, S.; Hashemi, S.A. Impacts of flood on health of Iranian population: Infectious diseases with an emphasis on parasitic infections. Parasite Epidemiol. Control 2020, 9, e00144.
Alborzi, A.; Zhao, Y.; Nazemi, A.; Mirchi, A.; Mallakpour, I.; Moftakhari, H.; Ashraf, S.; Izadi, R.; AghaKouchak, A. The tale of three floods: From extreme events and cascades of highs to anthropogenic floods. Weather Clim. Extrem. 2022, 38, 100495.
Shahabi, H.; Shirzadi, A.; Ghaderi, K.; Omidvar, E.; Al-Ansari, N.; Clague, J.J.; Geertsema, M.; Khosravi, K.; Amini, A.; Bahrami, S.; et al. Flood detection and susceptibility mapping using sentinel-1 remote sensing data and a machine learning approach: Hybrid intelligence of bagging ensemble based on k-nearest neighbor classifier. Remote Sens. 2020, 12, 266.
Habibian, F. Increased number of floods in Iran. WWW Document. Econ. News Database 2018.
Dodangeh, E.; Choubin, B.; Eigdir, A.N.; Nabipour, N.; Panahi, M.; Shamshirband, S.; Mosavi, A. Integrated machine learning methods with resampling algorithms for flood susceptibility prediction. Sci. Total Environ. 2020, 705, 135983.
Razavi, S.; Gober, P.; Maier, H.R.; Brouwer, R.; Wheater, H. Anthropocene flooding: Challenges for science and society. Hydrol. Process. 2020, 34, 1996–2000.
Gharakhanlou, N.M.; Perez, L. Flood susceptible prediction through the use of geospatial variables and machine learning methods. J. Hydrol. 2023, 617, 129121.
Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping. J. Environ. Manag. 2018, 217, 1–11.
Bera, S.; Das, A.; Mazumder, T. Evaluation of machine learning, information theory and multi-criteria decision analysis methods for flood susceptibility mapping under varying spatial scale of analyses. Remote Sens. Appl. Soc. Environ. 2022, 25, 100686.
Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. 2013, 504, 69–79.
Hosseini, F.S.; Choubin, B.; Bagheri-Gavkosh, M.; Karimi, O.; Taromideh, F.; Mako, C. Susceptibility assessment of groundwater nitrate contamination using an ensemble machine learning approach. Groundwater 2023, 61, 510–516.
Zhao, G.; Pang, B.; Xu, Z.; Yue, J.; Tu, T. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 2018, 615, 1133–1142.
Bicknell, B.R.; Imhoff, J.C.; Kittle, J.L., Jr.; Donigan, A.S., Jr.; Johanson, R.C. Hydrological Simulation Program—Fortran, User’s Manual for Version 11. EPA/600/R-97/080; U.S. Environmental Protection Agency, National Exposure Research Laboratory: Athens, GA, USA, 1997; p. 755. [Google Scholar]
Arnold, J.G.; Srinivasan, R.; Muttiah, R.S.; Williams, J.R. Large area hydrologic modeling and assessment part I: Model development. J. Am. Water Resour. Assoc. 1998, 34, 73–89. [Google Scholar] [CrossRef]
Tehrany, M.S.; Pradhan, B.; Mansor, S.; Ahmad, N. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 2015, 125, 91–101. [Google Scholar] [CrossRef]
Khosravi, K.; Pourghasemi, H.R.; Chapi, K.; Bahri, M. Flash flood susceptibility analysis and its mapping using different bivariate models in Iran: A comparison between Shannon’s entropy, statistical index, and weighting factor models. Environ. Monit. Assess. 2016, 188, 656. [Google Scholar] [CrossRef]
Samanta, S.; Pal, D.K.; Palsamanta, B. Flood susceptibility analysis through remote sensing, GIS and frequency ratio model. Appl. Water Sci. 2018, 8, 66. [Google Scholar] [CrossRef]
Rahman, M.; Ningsheng, C.; Islam, M.M.; Dewan, A.; Iqbal, J.; Washakh, R.A.A.; Shufeng, T. Flood Susceptibility Assessment in Bangladesh Using Machine Learning and Multi-criteria Decision Analysis. Earth Syst. Environ. 2019, 3, 585–601. [Google Scholar] [CrossRef]
Pham, B.T.; Phong, T.V.; Nguyen, H.D.; Qi, C.; Al-Ansari, N.; Amini, A.; Ho, L.S.; Tuyen, T.T.; Yen, H.P.H.; Ly, H.B.; et al. A Comparative Study of Kernel Logistic Regression, Radial Basis Function Classifier, Multinomial Naïve Bayes, and Logistic Model Tree for Flash Flood Susceptibility Mapping. Water 2020, 12, 239. [Google Scholar] [CrossRef]
Wagenaar, D.; Curran, A.; Balbi, M. Invited perspectives: How machine learning will change flood risk and impact assessment. Nat. Hazards Earth Syst. Sci. 2020, 20, 1149–1161. [Google Scholar] [CrossRef]
Oney, M.; Anlı, A. Regional Drought Analysis with Standardized Precipitation Evapotranspiration Index (SPEI): Gediz Basin, Turkey. J. Agric. Sci. 2023, 29, 1032–1049. [Google Scholar] [CrossRef]
Fenicia, F.; Savenije, H.H.G.; Matgen, P.; Pfister, L. Understanding catchment behavior through stepwise model concept improvement. Water Resour. Res. 2008, 44. [Google Scholar] [CrossRef]
Tien, B.D.; Pradhan, B.; Nampak, H.; Bui, Q.T.; Tran, Q.A.; Nguyen, Q.P. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 2016, 540, 317–330. [Google Scholar] [CrossRef]
Chen, W.; Hong, H.; Li, S.; Shahabi, H.; Wang, Y.; Wang, X.; Ahmad, B.B. Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with random subspace and random subspace ensembles. J. Hydrol. 2019, 575, 864–873. [Google Scholar] [CrossRef]
Choubin, B.; Moradi, E.; Golshan, M.; Adamowski, J.; Sajedi-Hosseini, F.; Mosavi, A. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. 2019, 651, 2087–2096. [Google Scholar] [CrossRef]
Termeh, S.V.R.; Kornejady, A.; Pourghasemi, H.R.; Keesstra, S. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 2018, 615, 438–451. [Google Scholar] [CrossRef] [PubMed]
Turoglu, H.; Dolek, I. Floods and their likely impacts on ecological environment in Bolaman River basin (Ordu, Turkey). Res. J. Agric. Sci. 2011, 43, 167–173. [Google Scholar]
Ngo, P.-T.; Pham, T.D.; Nhu, V.-H.; Le, T.T.; Tran, D.A.; Phan, D.C.; Hoa, P.V.; Amaro-Mellado, J.L.; Bui, D.T. A novel hybrid quantum-PSO and credal decision tree ensemble for tropical cyclone induced flash flood susceptibility mapping with geospatial data. J. Hydrol. 2020, 596, 125682. [Google Scholar] [CrossRef]
Paul, G.C.; Saha, S.; Hembram, T.K. Application of the GIS-Based Probabilistic Models for Mapping the Flood Susceptibility in Bansloi Sub-basin of Ganga-Bhagirathi River and Their Comparison. Remote Sens. Earth Syst. Sci. 2019, 2, 120–146. [Google Scholar] [CrossRef]
Vafakhah, M.; Loor, S.M.H.; Pourghasemi, H.; Katebikord, A. Comparing performance of random forest and adaptive neuro-fuzzy inference system data mining models for flood susceptibility mapping. Arab. J. Geosci. 2020, 13, 417. [Google Scholar] [CrossRef]
Moghaddam, D.D.; Pourghasemi, H.R.; Rahmati, O. Assessment of the Contribution of Geo-Environmental Factors to Flood Inundation in a Semi-Arid Region of SW Iran: Comparison of Different Advanced Modeling Approaches. Natural Hazards GIS-Based Spatial Modeling Using Data Mining Techniques; Springer: Cham, Switzerland, 2019; pp. 59–78. [Google Scholar]
Pham, B.T.; Phong, T.V.; Nguyen-Thoi, T.; Parial, K.; Singh, K.; Ly, H.B.; Nguyen, K.T.; Ho, L.S.; Le, H.V.; Prakash, I. Ensemble modeling of landslide susceptibility using random subspace learner and different decision tree classifiers. Geocarto Int. 2020, 37, 735–757. [Google Scholar] [CrossRef]
Bui, D.T.; Tsangaratos, P.; Ngo, P.T.T.; Pham, T.D.; Pham, B.T. Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci. Total Environ. 2019, 668, 1038–1054. [Google Scholar] [CrossRef]
Rostami, A.A.; Isazadeh, M.; Shahabi, M.; Nozari, H. Evaluation of geostatistical techniques and their hybrid in modelling of groundwater quality index in the Marand Plain in Iran. Environ. Sci. Pollut. Res. 2019, 26, 34993–35009. [Google Scholar] [CrossRef]
Tang, X.; Li, J.; Liu, M.; Liu, W.; Hong, H. Flood susceptibility assessment based on a novel random Naïve Bayes method: A comparison between different factor discretization methods. Catena 2020, 190, 104536. [Google Scholar] [CrossRef]
Azareh, A.; Sardooi, E.R.; Choubin, B.; Barkhori, S.; Shahdadi, A.; Adamowski, J.; Shamshirband, S. Incorporating multi-criteria decision-making and fuzzy-value functions for flood susceptibility assessment. Geocarto Int. 2019, 36, 2345–2365. [Google Scholar] [CrossRef]
Hosseini, F.S.; Choubin, B.; Mosavi, A.; Nabipour, N.; Shamshirband, S.; Darabi, H.; Haghighi, A.T. Flash-flood hazard assessment using ensembles and Bayesian-based machine learning models: Application of the simulated annealing feature selection method. Sci. Total Environ. 2020, 711, 135161. [Google Scholar] [CrossRef]
Avand, M.; Kuriqi, A.; Khazaei, M.; Ghorbanzadeh, O. DEM resolution effects on machine learning performance for flood probability mapping. J. Hydro-Environ. Res. 2022, 40, 1–16. [Google Scholar] [CrossRef]
Bui, Q.T.; Nguyen, Q.H.; Nguyen, X.L.; Pham, V.D.; Nguyen, H.D.; Pham, V.M. Verification of novel integrations of swarm intelligence algorithms into deep learning neural network for flood susceptibility mapping. J. Hydrol. 2020, 581, 124379. [Google Scholar] [CrossRef]
Khosravi, K.; Daggupati, P.; Alami, M.T.; Awadh, S.M.; Ghareb, M.I.; Panahi, M.; Pham, B.T.; Rezaie, F.; Qi, C.; Yaseen, Z.M. Meteorological data mining and hybrid data-intelligence models for reference evaporation simulation: A case study in Iraq. Comput. Electron. Agric. 2019, 167, 105041. [Google Scholar] [CrossRef]
Papaioannou, G.; Vasiliades, L.; Loukas, A. Multi-criteria analysis framework for potential flood prone areas mapping. Water Resour. Manag. 2015, 29, 399–418. [Google Scholar] [CrossRef]
Predick, K.I.; Turner, M.G. Landscape configuration and flood frequency influence invasive shrubs in floodplain forests of the Wisconsin River (USA). J. Ecol. 2008, 69, 91–102. [Google Scholar] [CrossRef]
Avand, M.; Moradi, H. Spatial modeling of flood probability using geo-environmental variables and machine learning models, case study: Tajan watershed, Iran. Adv. Space Res. 2021, 67, 3169–3186. [Google Scholar] [CrossRef]
Malik, S.; Chandra Pal, S.; Chowdhuri, I.; Chakrabortty, R.; Roy, P.; Das, B. Prediction of highly flood prone areas by GIS based heuristic and statistical model in a monsoon dominated region of Bengal Basin. Remote Sens. Appl. Soc. Environ. 2020, 19, 100343. [Google Scholar] [CrossRef]
Wei, C.; Dong, X.; Ma, Y.; Zhang, K.; Xie, Z.; Xia, Z.; Su, B. Attributing climate variability, land use change, and other human activities to the variations of the runoff-sediment processes in the Upper Huaihe River Basin, China. J. Hydrol. Reg. Stud. 2024, 56, 101955. [Google Scholar] [CrossRef]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
Skurichina, M.; Duin, R.P.W. Bagging, boosting and the random subspace method for linear classifiers. Pattern Anal. Appl. 2002, 5, 121–135. [Google Scholar] [CrossRef]
Kuncheva, L.I. Full-class set classification using the Hungarian algorithm. Int. J. Mach. Learn. Cybern. 2010, 1, 53–61. [Google Scholar] [CrossRef]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
Arabameri, A.; Saha, S.; Chen, W.; Roy, J.; Pradhan, B.; Bui, D.T. Flash flood susceptibility modelling using functional tree and hybrid ensemble techniques. J. Hydrol. 2020, 587, 125007. [Google Scholar] [CrossRef]
Solomatine, D.P.; Siek, M.B.L. Flexible and optimal M5 model trees with applications to flow predictions. World Sci. 2004, 2, 1719–1726. [Google Scholar]
Behnood, A.; Behnood, V.; Gharehveran, M.M.; Alyamac, K.E. Prediction of the compressive strength of normal and high-performance concretes using M5P model tree algorithm. Construct. Build. Mater. 2017, 142, 199–207. [Google Scholar] [CrossRef]
Wang, Y.; Witten, I.H. Induction of Model Trees for Predicting Continuous Classes; University of Waikato: Hamilton, New Zealand, 1996. [Google Scholar]
Tanyu, B.F.; Abbaspour, A.; Alimohammadlou, Y.; Tecuci, G. Landslide susceptibility analyses using Random Forest, C4.5, and C5.0 with balanced and unbalanced datasets. Catena 2021, 203, 105355. [Google Scholar] [CrossRef]
Atkeson, C.G.; Moore, A.W.; Schaal, S. Locally weighted learning for control. In Lazy Learning; Springer: Berlin/Heidelberg, Germany, 1997; pp. 75–113. [Google Scholar]
Hong, H. Landslide susceptibility assessment using locally weighted learning integrated with machine learning algorithms. Expert Syst. Appl. 2024, 237, 121678. [Google Scholar] [CrossRef]
Sameen, M.I.; Sarkar, R.; Pradhan, B.; Drukpa, D.; Alamri, A.M.; Park, H.J. Landslide spatial modelling using unsupervised factor optimisation and regularised greedy forests. Comput. Geosci. 2020, 134, 104336. [Google Scholar] [CrossRef]
Bui, D.T.; Hoang, N.D.; Martínez-Álvarez, F.; Ngo, P.T.T.; Hoa, P.V.; Pham, T.D.; Costache, R. A novel deep learning neural network approach for predicting flash flood susceptibility: A case study at a high frequency tropical storm area. Sci. Total Environ. 2020, 701, 134413. [Google Scholar]
Talukdar, S.; Ghose, B.; Shahfahad, S.R.; Mahato, S.; Pham, Q.B.; Linh, N.T.T.; Costache, R.; Avand, M. Flood susceptibility modeling in Teesta River basin, Bangladesh using novel ensembles of bagging algorithms. Stoch. Environ. Res. Risk Assess. 2020, 34, 2277–2300. [Google Scholar] [CrossRef]
Daoud, J.I. Multicollinearity and regression analysis. J. Phys. Conf. Ser. 2017, 949, 012009. [Google Scholar] [CrossRef]
Shahabi, H.; Hashim, M. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment. Sci. Rep. 2015, 5, 9899. [Google Scholar] [CrossRef] [PubMed]
Pontius, R.G.; Parmentier, B. Recommendations for using the relative operating characteristic (ROC). Landsc. Ecol. 2014, 29, 367–382. [Google Scholar] [CrossRef]
Chapi, K.; Singh, V.P.; Shirzadi, A.; Shahabi, H.; Bui, D.T.; Pham, B.T.; Khosravi, K. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 2017, 95, 229–245. [Google Scholar] [CrossRef]
Hong, H.; Panahi, M.; Shirzadi, A.; Ma, T.; Liu, J.; Zhu, A.-X.; Chen, W.; Kougias, I.; Kazakis, N. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 2017, 621, 1124–1141. [Google Scholar] [CrossRef]
Cao, C.; Xu, P.; Wang, Y.; Chen, J.; Zheng, L.; Niu, C. Flash flood hazard susceptibility mapping using frequency ratio and statistical index methods in coalmine subsidence areas. Sustainability 2016, 8, 948. [Google Scholar] [CrossRef]
Todini, F.; De Filippis, T.; De Chiara, G.; Maracchi, G.; Martina, M.; Todini, E. Using a GIS approach to asses flood hazard at national scale. In Proceedings of the European Geosciences Union, 1st General Assembly, Nice, France, 25–30 April 2004. [Google Scholar]
Johnson, S.; La Porta, R.; Lopez-de-Silanes, F.; Shleifer, A. Tunneling. Am. Econ. Rev. 2000, 90, 22–27. [Google Scholar] [CrossRef]
Rahman, M.; Chen, N.; Elbeltagi, A.; Islam, M.M.; Alam, M.; Pourghasemi, H.R.; Tao, W.; Zhang, J.; Shufeng, T.; Faiz, H.; et al. Application of stacking hybrid machine learning algorithms in delineating multi-type flooding in Bangladesh. J. Environ. Manag. 2021, 295, 113086. [Google Scholar] [CrossRef]
Tehrany, M.S.; Jones, S. Evaluating the variations in the flood susceptibility maps accuracies due to the alterations in the type and extent of the flood inventory. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 4. [Google Scholar] [CrossRef]
Adnan, M.S.G.; Abdullah, A.Y.M.; Dewan, A.; Hall, J.W. The effects of changing land use and flood hazard on poverty in coastal Bangladesh. Land Use Policy 2020, 99, 104868. [Google Scholar] [CrossRef]
Arora, A.; Pandey, M.; Siddiqui, M.A.; Hong, H.; Mishra, V.N. Spatial flood susceptibility prediction in Middle Ganga Plain: Comparison of frequency ratio and Shannon’s entropy models. Geocarto Int. 2019, 36, 2085–2116. [Google Scholar] [CrossRef]
Rubinato, M.; Nicholas, A.; Peng, Y.; Zhang, J.M.; Lashford, C.; Cai, Y.P.; Lin, P.Z.; Tait, S. Urban and river flooding: Comparison of flood risk management approaches in the UK and China and an assessment of future knowledge needs. Water Sci. Eng. 2019, 12, 274–283. [Google Scholar] [CrossRef]
Khosravi, K.; Nohani, E.; Maroufinia, E.; Pourghasemi, H.R. A GIS-based flood susceptibility assessment and its mapping in Iran: A comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards 2016, 83, 947–987. [Google Scholar] [CrossRef]
Shen, Z.; Deng, H.; Arabameri, A.; Santosh, M.; Vojtek, M.; Vojteková, J. Mapping potential inundation areas due to riverine floods using ensemble models of credal decision tree with bagging, dagging, decorate, multiboost, and random subspace. Adv. Space Res. 2023, 72, 4778–4794. [Google Scholar] [CrossRef]
Chen, S.; Gu, C.; Lin, C.; Zhang, K.; Zhu, Y. Multi-kernel optimized relevance vector machine for probabilistic prediction of concrete dam displacement. Eng. Comput. 2020, 37, 1943–1959. [Google Scholar] [CrossRef]
Xu, L.; Yang, X.; Cui, S.; Tang, J.; Ding, S.; Zhang, X. Co-occurrence of pluvial and fluvial floods exacerbates inundation and economic losses: Evidence from a scenario-based analysis in Longyan, China. Geomat. Nat. Haz. Risk 2023, 14, 2218012. [Google Scholar] [CrossRef]

Figure 1. Location of the Marand Plain.

Figure 2. Summary of methodology for flood modeling.

Figure 3. Local flood parameters: (a) elevation, (b) slope, (c) aspect, (d) TWI, (e) rainfall, and (f) distance to the river.

Figure 4. Local flood parameters: (a) SPI, (b) plan curvature, (c) profile curvature, (d) LULC, and (e) soil types.

Figure 5. Flood parameters: lithology.

Figure 6. Importance of flood conditioning factors based on IGR.

Figure 7. Percentage of area covered by each class in the five flood maps.

Figure 8. Flood maps produced by ML models: (a) RSS, (b) Bagging, (c) M5P, (d) RF, and (e) LWL.

Figure 9. Assessment of five flood vulnerability models through ROC analysis, featuring training points in (a) and testing points in (b).

Table 1. Summary of the five models’ features.

Model	Features	Advantages	Disadvantages
Random Forest	Ensemble of DTs, random subsets of data and features	Avoids overfitting, handles missing data and chaotic inputs well	Can be computationally expensive, not interpretable like single DTs
Random Subspace	Builds classifiers on random feature subsets, integrates classifiers for final prediction	Reduces overfitting, handles unnecessary features effectively	May require large feature spaces for effective training
Bagging	Bootstrap aggregation with random sampling of training data	Increases accuracy with multiple classifiers, reduces variance, handles noisy datasets	Can suffer from higher computational costs due to multiple model training
M5P Model Tree	Combines DTs with regression functions, pruning and smoothing	Efficient with large datasets, reduces errors, deals well with missing data	Sensitive to noisy data and may not perform well with highly nonlinear relationships
Locally Weighted Learning	Approximates complex functions locally with weighting of nearby data points	Flexible with linear and nonlinear problems, adaptable with different distance metrics (e.g., Euclidean, Mahalanobis)	Computationally intensive, heavily depends on the choice of distance metric and weight function
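
The local-weighting idea in the last row of Table 1 can be illustrated with a minimal sketch. It is not the configuration used in the study (Table 6 lists an RF base classifier and a linear nearest-neighbor search); it assumes the simplest possible local model, a Gaussian-kernel weighted mean with Euclidean distance. The function name, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math

def locally_weighted_predict(train_x, train_y, query, bandwidth=1.0):
    """Predict at `query` as a kernel-weighted mean of the training targets.

    Assumption: Gaussian weight function and Euclidean distance; other
    kernels and distance metrics are equally valid choices.
    """
    weights = []
    for x in train_x:
        dist = math.dist(x, query)  # Euclidean distance to the query point
        weights.append(math.exp(-0.5 * (dist / bandwidth) ** 2))
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the global mean.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Hypothetical toy data: two conditioning factors scaled to [0, 1];
# target 1.0 marks a flood point, 0.0 a non-flood point.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
train_y = [1.0, 1.0, 0.0, 0.0]

near_flood = locally_weighted_predict(train_x, train_y, (0.15, 0.15), bandwidth=0.2)
near_dry = locally_weighted_predict(train_x, train_y, (0.85, 0.85), bandwidth=0.2)
print(round(near_flood, 3), round(near_dry, 3))
```

A small bandwidth makes the prediction depend almost entirely on the nearest training points, which is why the choice of distance metric and weight function matters so much for this family of methods.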

Table 2. Multicollinearity statistics for the flood conditioning factors.

Predictors/Factors	Tolerance	VIF
Slope	0.971	1.03
Aspect	0.939	1.065
SPI	0.641	1.561
Plan curvature	0.549	1.822
Profile curvature	0.817	1.224
TWI	0.775	1.29
Distance to river	0.291	3.437
Elevation	0.203	4.921
Soil	0.854	1.171
Lithology	0.872	1.147
Land use/cover	0.753	1.327
Rainfall	0.834	1.199
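
The two columns of Table 2 are linked by the standard identity VIF = 1 / Tolerance. The sketch below recomputes the VIF from the tabulated tolerances and screens the factors against a limit. The limit of 5 is an assumed rule of thumb (10 is also commonly used), not a value taken from this table.

```python
# Tolerance values copied from Table 2.
tolerance = {
    "Slope": 0.971, "Aspect": 0.939, "SPI": 0.641,
    "Plan curvature": 0.549, "Profile curvature": 0.817, "TWI": 0.775,
    "Distance to river": 0.291, "Elevation": 0.203, "Soil": 0.854,
    "Lithology": 0.872, "Land use/cover": 0.753, "Rainfall": 0.834,
}

# Variance inflation factor is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

VIF_LIMIT = 5.0  # assumed screening threshold (rule of thumb)
flagged = [name for name, value in vif.items() if value > VIF_LIMIT]

for name, value in sorted(vif.items(), key=lambda item: -item[1]):
    print(f"{name:18s} VIF = {value:.3f}")
print("Factors above the assumed limit:", flagged)
```

The recomputed values agree with the tabulated VIFs to within the rounding of the tolerances, and under the assumed limit no factor is flagged.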

Table 3. Summary of ROC curve parameters for the training dataset.

Model	Area	SE	Asymptotic 95% CI Lower Bound	Asymptotic 95% CI Upper Bound
M5P	0.972	0.005	0.962	0.982
RF	0.988	0.003	0.982	0.994
Bagging	0.984	0.003	0.978	0.991
RSS	0.975	0.005	0.965	0.984
LWL	0.981	0.006	0.964	0.984

Table 4. Summary of ROC curve parameters for the testing dataset.

Model	Area	SE	Asymptotic 95% CI Lower Bound	Asymptotic 95% CI Upper Bound
M5P	0.941	0.022	0.899	0.984
RF	0.961	0.018	0.925	0.997
Bagging	0.953	0.023	0.908	0.999
RSS	0.940	0.017	0.906	0.974
LWL	0.965	0.018	0.929	1.000
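
The interval bounds in Table 4 are close to what a normal approximation gives, Area ± 1.96 × SE, clipped to the valid AUROC range [0, 1]. The sketch below assumes that this is how the asymptotic interval was formed; the function name and the clipping step are illustrative choices.

```python
def asymptotic_ci(area, se, z=1.96):
    """Normal-approximation confidence interval for an AUROC, clipped to [0, 1]."""
    lower = max(0.0, area - z * se)
    upper = min(1.0, area + z * se)
    return lower, upper

# (Area, SE) pairs copied from Table 4 (testing dataset).
testing = {
    "M5P": (0.941, 0.022),
    "RF": (0.961, 0.018),
    "Bagging": (0.953, 0.023),
    "RSS": (0.940, 0.017),
    "LWL": (0.965, 0.018),
}

intervals = {model: asymptotic_ci(area, se) for model, (area, se) in testing.items()}
for model, (lower, upper) in intervals.items():
    print(f"{model:8s} {lower:.3f} to {upper:.3f}")
```

Because Area and SE are themselves rounded to three decimals, the recomputed bounds can differ from the tabulated ones in the last digit.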

Table 5. Evaluation of accuracy for five flood susceptibility models using various error metrics on training and testing data.

Models	Training R²	Training MAE	Training RMSE	Validation R²	Validation MAE	Validation RMSE
M5P	0.950	0.060	0.082	0.930	0.071	0.100
RF	0.980	0.023	0.032	0.956	0.068	0.088
Bagging	0.968	0.045	0.070	0.916	0.078	0.110
RSS	0.964	0.072	0.092	0.903	0.127	0.155
LWL	0.980	0.069	0.010	0.960	0.060	0.082
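
For reference, the three metrics in Table 5 can be computed as in the sketch below. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot); some studies report the squared Pearson correlation instead, and the table does not say which was used. The observed and predicted values are hypothetical toy data.

```python
import math

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical example: 1 = flood point, 0 = non-flood point.
observed = [1.0, 0.0, 1.0, 0.0]
predicted = [0.9, 0.1, 0.8, 0.2]

print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

For any set of errors, RMSE is at least as large as MAE, and the two are equal only when every absolute error is the same.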

Table 6. Parameters of ML models for flood susceptibility analysis.

Models	Description
RSS	Classifier: REPTree-M2; Minimum Instances: 2; Seed: 3; Maximum Depth: −1; Minimum Variance Proportion: 0.001; Execution Slots: 1; Iterations: 10; Subspace Size: 0.5.
RF	Batch Size: 100; Seed: 4; Iterations: 100; Maximum Depth: 3; Out-of-Bag Calculation: Enabled; Attribute Importance Calculation: Enabled.
M5P	Batch Size: 100; Minimum Instances: 4.
Bagging	Classifier: Random Tree; Batch Size: 100; Bag Size Percentage: 80; Maximum Depth: 2; Minimum Instances: 1; Minimum Variance Proportion: 0.003; Seed: 5; Execution Slots: 2; Iterations: 20; K-Value: 2.
LWL	Classifier: RF; Batch Size: 100; Bag Size Percentage: 80; KNN: 1; Maximum Depth: 2; Seed: 3; Execution Slots: 1; Iterations: 20; Attribute Importance Calculation: TRUE; Nearest Neighbor Search Algorithm: Linear NN Search; Distance Function: Euclidean Distance.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
