1. Introduction
Floods are among the most widespread and destructive natural hazards, frequently causing severe loss of life and significant economic damage. As climate change makes weather patterns more volatile, the frequency and severity of flood events are projected to rise [1]. This growing threat underscores the urgent need for preventive and responsive strategies to mitigate the impact of flooding on communities and infrastructure. Preventive measures aim to assess and communicate the likelihood of flooding in specific areas, often through flood depth maps, which illustrate potential inundation levels under various conditions [2]. In contrast, emergency measures are implemented immediately before, during, or after a flood event and require near-real-time information on flood extent and affected areas to enable timely and effective decision-making [3].
Flood Early Warning Systems (FEWSs) play a crucial role in minimizing flood impacts through timely alerts and preparedness. However, their effectiveness is often hindered by high computational demands and by difficulties in accessing and integrating real-time hydroclimatic data, especially in developing regions. Traditional FEWSs depend on complex hydrological and hydrodynamic models that require intensive calibration, data assimilation, and expert knowledge, limiting their scalability and operational efficiency [4]. These processes are time-consuming and computationally intensive, particularly when ensemble simulations or real-time scenario forecasting are needed. In many regions, the lack of high-quality input data, such as LiDAR-based topography or accurate rainfall estimates, further reduces model reliability. Improving model efficiency, enhancing computational infrastructure, and automating data assimilation workflows are critical steps toward increasing the predictive accuracy and practical feasibility of FEWSs [4, 5]. To overcome these limitations, developers and practitioners have recommended applying machine learning (ML) algorithms to flood modeling and analysis [6].
Machine learning (ML) is a subset of artificial intelligence in which algorithms improve their performance by learning from increasing amounts of data and repeated task execution [7], uncovering hidden patterns in the data along the way. In hydraulics and flood studies, ML models have been applied to both spatial and temporal problems [8]. While most flood research has focused on temporal modeling (such as rainfall-runoff), spatial flood mapping remains underdeveloped. Understanding flood spatial dynamics is essential for predicting inundation and informing emergency response.
Because traditional models are computationally expensive, ML is gaining traction as a faster alternative. However, there is still a significant gap in developing robust, scalable ML models explicitly tailored for spatial flood prediction. Addressing this gap is vital for advancing practical flood risk management solutions [9]. Accurate prediction of maximum flood depth in ungauged catchments remains a critical challenge, limiting the effectiveness of current flood risk management strategies. Flood depth maps provide estimates of inundation depth and spatial coverage for different rainfall scenarios and return periods. These maps are typically generated using numerical hydrodynamic models, which simulate flood behavior by discretizing the governing equations and the spatial domain. In addition to depth, such models can also simulate flow velocities, offering a more comprehensive representation of flood dynamics.
Over the past three decades, numerous numerical hydrodynamic modeling tools, such as HEC-RAS and MIKE 21, have been developed for this purpose [10, 11]. With the widespread availability of high-resolution spatial data, particularly in low-relief terrain, two-dimensional (2D) hydraulic models have become increasingly widely used. These models discretize the computational domain into a mesh of cells, allowing for detailed simulation of flood dynamics across complex topographies. Because they resolve the lateral components of the shallow water equations, 2D models are well suited to floodplain mapping and flood depth estimation [12, 13, 14, 15].
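For reference, the 2D shallow water equations that such models discretize can be written in a standard conservative form. The notation below follows a common textbook convention and is not taken from any specific package: h is water depth, u and v are depth-averaged velocities, z_b is bed elevation, g is gravitational acceleration, and n is Manning's roughness coefficient. The Manning friction closure and the omission of rainfall and infiltration source terms are assumptions made here for brevity; individual solvers may use different closures or simplified (for example, diffusion-wave) variants.

```latex
\begin{aligned}
\frac{\partial h}{\partial t}
  + \frac{\partial (hu)}{\partial x}
  + \frac{\partial (hv)}{\partial y} &= 0, \\[4pt]
\frac{\partial (hu)}{\partial t}
  + \frac{\partial}{\partial x}\!\left(hu^{2} + \tfrac{1}{2} g h^{2}\right)
  + \frac{\partial (huv)}{\partial y}
  &= -g h \frac{\partial z_b}{\partial x}
     - \frac{g n^{2} u \sqrt{u^{2} + v^{2}}}{h^{1/3}}, \\[4pt]
\frac{\partial (hv)}{\partial t}
  + \frac{\partial (huv)}{\partial x}
  + \frac{\partial}{\partial y}\!\left(hv^{2} + \tfrac{1}{2} g h^{2}\right)
  &= -g h \frac{\partial z_b}{\partial y}
     - \frac{g n^{2} v \sqrt{u^{2} + v^{2}}}{h^{1/3}}.
\end{aligned}
```

The first equation expresses mass conservation; the second and third express momentum conservation in the x and y directions, with bed slope and friction as source terms.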
Although current 2D numerical methods are considered reliable and effective for flood analysis, both expert and non-expert modelers often encounter challenges in achieving rapid and accurate simulations [12]. The computational intensity of these models results in long processing times, posing a significant barrier to their use in time-sensitive applications [12]. Various efforts have been made to shorten simulation times, such as the adoption of parallel computing techniques [12, 16, 17]. However, these approaches typically require access to high-performance computing resources, which may not be widely available or cost-effective [12]. These computational constraints are a critical limitation in developing near-real-time flood forecasting and response tools. They affect the reliability and timeliness of flood early warning systems, which are crucial for reducing flood impacts, improving emergency preparedness, and enhancing community resilience. For instance, the small stream flood early warning system (SSFEWs) developed by Cheong et al. (2024) [5] showed root mean squared errors (RMSEs) of up to 0.619 m³/s for discharge and 0.016 m for water depth. However, the system’s accuracy declined when rainfall forecasts extended beyond one hour, highlighting the sensitivity of such models to input uncertainty and the limits of their applicability to near-real-time flood forecasting [5]. In this context, ML-based approaches offer a practical pathway to overcoming these challenges, enhancing the responsiveness and adaptability of FEWSs by reducing computational overhead and enabling more timely predictions [5].
ML surrogate models, such as U-FLOOD, have emerged as promising alternatives to traditional numerical approaches, offering faster and more computationally efficient solutions without significantly compromising predictive accuracy. These models are particularly advantageous in operational settings where rapid forecasting is essential [9].
Costache et al. (2024) proposed an ensemble modeling approach combining deep learning, Harris Hawk Optimization, and stacking-based machine learning for flood mapping in Romania’s Buzău River basin [18]. Their study used 12 predictors and 410 data points, achieving high accuracy, particularly with the developed model. While effective locally, the approach illustrates the difficulty of generalizing such models to different regions, emphasizing the need for further research to enhance global applicability [18].
Dai et al. (2024) developed an ensemble Artificial Neural Network (EANN) model to enhance urban flood prediction in coastal areas such as Macao, China [19]. The model effectively predicted flood depths during typhoon events, demonstrating that short training datasets can yield high accuracy. However, their study also highlights the challenges posed by uncertainties in input data and model parameters, which remain critical for accurate flood forecasting [19].
Seleem et al. (2023) used Convolutional Neural Networks (CNNs) and Random Forest (RF) models to predict urban pluvial floodwater depth in Berlin [20]. RF performed well within the training domain using inputs such as rainfall, topography, and land use, but showed poor transferability due to overfitting. In contrast, CNNs, especially U-Net-based architectures, adapted better to new areas via transfer learning. However, their reliance on a geographically limited dataset constrained the broader applicability of the results to urban regions with different hydrological conditions [20].
Esmaeili-Gisavandani et al. (2023) utilized three data-driven models, namely RF, Adaptive Network-based Fuzzy Inference System (ANFIS) [21, 22], and a decision tree algorithm, to perform regional flood frequency analysis (RFFA) in ungauged catchments in the Karkheh River basin, Iran. Compared with traditional multivariate regression, the RF model yielded the most accurate predictions of peak flows across various return periods. Their study demonstrated RF’s effectiveness in handling hydrological data uncertainty but also noted limitations in transferring the model to catchments with different hydrological characteristics [22].
Balestra et al. (2022) applied deep neural networks in Southern Italy and demonstrated that such models can rapidly delineate flood-prone areas by relying on globally reproducible conditioning factors [23]. This approach is particularly valuable in regions where traditional hazard maps are unavailable [23]. In parallel, Chen et al. (2019) proposed a hybrid ensemble framework that integrated reduced-error pruning trees with bagging and random subspace ensembles, achieving superior predictive performance and highlighting the effectiveness of ensemble techniques for flood susceptibility modeling [24]. Collectively, these studies emphasize both the scalability of neural networks and the robustness of ensemble-based approaches, underscoring the importance of integrating diverse ML strategies into flood risk assessment.
Overall, using ML in flood prediction can significantly improve flood management and mitigation efforts worldwide, helping to save lives and reduce damage from flooding events [9]. However, efforts to develop a general or global model remain limited because of the problem’s complexity and the constraints of available data, which restrict an ML model’s ability to perform well in unseen regions [9, 25].
Combining unsupervised and supervised machine learning techniques has been shown to improve the generalization and transferability of models. Unsupervised techniques, such as clustering and dimensionality reduction, can help identify patterns and relationships in the data that may not be apparent through manual inspection. Supervised techniques, such as classification and regression, can then be used to build models that predict outcomes based on the identified patterns, allowing them to perform better on new and unseen data [24, 26, 27, 28].
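The cluster-then-regress pattern described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the pipeline of any cited study: the data are synthetic, a single hypothetical feature (relative elevation) drives the clustering, and a per-cluster least-squares line stands in for a more capable regressor such as Random Forest. All function and variable names are invented for this example.

```python
def nearest(value, centroids):
    """Index of the centroid closest to a scalar value."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values, k, iters=100):
    """Lloyd's k-means on scalars, initialised at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    if not xs:
        return 0.0, 0.0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_cluster_models(pixels, k):
    """Cluster pixels on elevation, then fit depth ~ rainfall per cluster."""
    centroids = kmeans_1d([p[0] for p in pixels], k)
    models = []
    for j in range(k):
        members = [p for p in pixels if nearest(p[0], centroids) == j]
        models.append(fit_line([m[1] for m in members],
                               [m[2] for m in members]))
    return centroids, models


def predict_depth(elev, rain, centroids, models):
    """Route a pixel to its cluster's model and predict flood depth."""
    a, b = models[nearest(elev, centroids)]
    return a + b * rain


# Synthetic pixels: (relative elevation [m], rainfall [mm], depth [m]).
# Low-lying pixels respond five times more strongly to rainfall (assumed).
pixels = []
for i in range(40):
    rain = 20.0 + 4.5 * i
    low = i % 2 == 0
    elev = 0.1 * (i % 20) if low else 8.0 + 0.1 * (i % 20)
    depth = (0.010 if low else 0.002) * rain
    pixels.append((elev, rain, depth))

centroids, models = fit_cluster_models(pixels, k=2)
low_depth = predict_depth(1.0, 100.0, centroids, models)
high_depth = predict_depth(9.0, 100.0, centroids, models)
```

Because each cluster gets its own regression, the low-lying pixel is predicted to flood roughly five times deeper than the elevated one for the same rainfall, a distinction that a single global line fitted to all pixels would blur.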
While HEC-RAS 2D remains a widely used tool for flood inundation modeling, its simulations are computationally intensive and time-consuming. This study aims to develop a data-driven surrogate modeling approach that delivers comparable flood depth predictions in a fraction of the time, achieving notable improvement in computational efficiency. The approach is particularly valuable for vulnerable ungauged catchments, where timely flood prediction is crucial for risk mitigation. Despite the growing use of machine learning in hydrology, existing studies often rely heavily on observed data, which are sparse in many regions, and few integrate clustering and model generalization to scale flood predictions spatially.
This research addresses these gaps by introducing a novel, cluster-integrated machine learning framework trained on hydrodynamically simulated data. It combines geospatial feature extraction, unsupervised clustering, and regression modeling to enable accurate, near-real-time, pixel-level flood depth prediction. The novelty lies not only in the methodological integration but also in the framework’s operational value: it supports early warning systems and enables faster, more informed decision-making in flood-prone areas.
4. Discussion
This study presented a detailed methodology for developing a global flood depth prediction model by leveraging machine learning algorithms and extensive hydrodynamic data. The data preparation process integrated high-resolution datasets with global coverage, such as LULC 2020-ESRI, AW3D30 DEM, and HYSOGs 250m, which were essential for accurate flood modeling. Numerical modeling using 2D hydrodynamic simulations provided maximum flood depth values, and spatial and parametric analyses were performed to extract and refine features for model training. A crucial step was the selection of 45 globally distributed catchments, ensuring wide variability in flood-driving parameters. These catchments were delineated and analyzed based on elevation, land use, land cover, and soil type, providing a comprehensive dataset for model training and validation. An RF classification model was developed to predict pixel inundation and was evaluated using a confusion matrix. The classification model performed strongly, with accuracies of 86% and 84% and F1-scores of 0.83 and 0.81 for the entire and test datasets, respectively (see Table 3). These results highlight the model’s effectiveness in distinguishing between inundated and non-inundated pixels.
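As a reminder of how accuracy and the F1-score follow from a binary confusion matrix, the sketch below computes both for a toy set of labels. The labels are invented for illustration (1 = inundated, 0 = not inundated) and are unrelated to the values in Table 3; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 = inundated."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score of the inundated class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy example: eight pixels, one missed flood pixel and one false alarm.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Accuracy counts every correctly labelled pixel, whereas F1 balances precision and recall for the inundated class only, which makes it the more informative of the two when flooded pixels are a small share of the domain.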
A comprehensive evaluation was conducted of six machine learning algorithms (ANN, CNN, RNN, LSTM, Random Forest, and XGBoost) for pixel-level maximum flood depth prediction. This comparative analysis served as the foundation for selecting the optimal algorithm for the dataset. Among all models tested, the Random Forest algorithm demonstrated superior generalization performance, achieving the highest test R² of 0.69 and the lowest test RMSE of 0.483 m (see Table 5). These results indicate that the Random Forest approach provides the most reliable predictive accuracy for flood depth estimation in this study.
The methodology also included developing and evaluating three regression models (trials): a pixel-based regression model, a clustered pixel-based regression model, and a clustered catchment-based regression model. In Trial 1, the pixel-based depth regression model achieved R² values of 0.91 for training and 0.69 for testing, with validation on unseen catchments yielding an average NSE of 0.55 and an R² of 0.60. Trial 2 introduced K-means clustering to group pixels based on physical characteristics, resulting in an overall testing R² of 0.75 and improved validation metrics, with an average NSE of 0.69 and an R² of 0.66 across validation catchments. Trial 3 further advanced the methodology by incorporating catchment-specific parameters for clustering, achieving a training R² of 0.97 and a testing R² of 0.83. The validation results for this model indicated high accuracy, with an average NSE of 0.79 and an R² of 0.82.
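The three metrics quoted above can be computed as in the following sketch. It assumes that R² denotes the squared Pearson correlation between reference and predicted depths; that reading is consistent with NSE and R² being reported as different numbers, but it is an assumption rather than something the text states. The depth values are invented for illustration.

```python
import math


def rmse(ref, pred):
    """Root mean squared error between reference and predicted depths."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def nse(ref, pred):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over ref variance."""
    mean_ref = sum(ref) / len(ref)
    num = sum((r - p) ** 2 for r, p in zip(ref, pred))
    den = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - num / den


def r_squared(ref, pred):
    """Squared Pearson correlation (assumed definition of R2 here)."""
    n = len(ref)
    mr, mp = sum(ref) / n, sum(pred) / n
    cov = sum((r - mr) * (p - mp) for r, p in zip(ref, pred))
    var_r = sum((r - mr) ** 2 for r in ref)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_r * var_p)


# Hypothetical reference depths [m] and a prediction with a constant bias.
ref = [0.0, 0.5, 1.0, 1.5, 2.0]
biased = [r + 0.5 for r in ref]

bias_rmse = rmse(ref, biased)
bias_nse = nse(ref, biased)
bias_r2 = r_squared(ref, biased)
```

In this toy case the prediction is perfectly correlated with the reference, so the squared correlation is 1, yet the constant 0.5 m offset pulls NSE down to 0.5. This is why reporting NSE alongside R² is informative: NSE penalises systematic bias that correlation alone does not detect.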
The capabilities of the developed ML model were evaluated for flood depth estimation using unseen catchment benchmarks. The model generates flood depth predictions in just 6 min, a dramatic improvement over traditional HEC-RAS simulations, which typically require hours to complete. This rapid prediction capability is critical for operational flood warning systems, enabling timely emergency responses. The ML model provides flood depth predictions with statistically robust 95% confidence intervals (Figure 5), supporting uncertainty quantification for decision-making. Comparative analysis shows close correspondence between the ML-predicted inundation maps and the HEC-RAS benchmark maps, supporting the ML model’s spatial prediction capabilities. The model’s combination of speed (sub-10 min predictions) and reliability (quantified uncertainty) represents a significant advancement for operational flood forecasting systems. While this study demonstrates the potential of machine learning models to serve as surrogates for hydrodynamic simulations and achieve significant computational efficiency gains (over 225×), it is important to note that the models are trained on HEC-RAS outputs, which, although derived from well-calibrated baseline scenarios (Soliman et al., 2022) [31], may not fully represent real-world flood behavior. This dependency is a key limitation. To address it, future research should include validation using observed flood data or satellite-derived flood extents to improve real-world applicability. Furthermore, future work should explore the use of higher-resolution data, advanced clustering algorithms, and the incorporation of actual rainfall event characteristics (depth and duration), along with real-time data integration, to enhance model accuracy and scalability for global flood risk management and mitigation.
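The relationship between a speedup factor and the corresponding run times can be made explicit with a short worked calculation. It assumes, purely for illustration, that the 6 min prediction time and the 225× factor refer to the same simulation; the symbols S, t_ML, and t_HEC are introduced here and do not appear elsewhere in the text.

```latex
\begin{aligned}
S &= \frac{t_{\mathrm{HEC}}}{t_{\mathrm{ML}}}
  \quad\Longrightarrow\quad
  t_{\mathrm{HEC}} = S \, t_{\mathrm{ML}}
  = 225 \times 6~\text{min} = 1350~\text{min} = 22.5~\text{h}, \\[4pt]
\text{time reduction} &= 1 - \frac{t_{\mathrm{ML}}}{t_{\mathrm{HEC}}}
  = 1 - \frac{1}{S}
  = 1 - \frac{1}{225} \approx 0.996 .
\end{aligned}
```

Under that assumption, the equivalent hydrodynamic run would take roughly 22.5 h, which is consistent with the statement that HEC-RAS simulations typically require hours to complete.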
5. Conclusions
This study demonstrated significant advances toward a scalable machine learning framework for flood depth prediction with potential for global generalization. The study began with the careful selection of 45 geographically diverse catchments to capture wide variations in key flood-influencing factors, followed by comprehensive collection of the relevant hydrological parameters. Six prominent machine learning approaches, namely Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), Random Forest (RF), and XGBoost, were rigorously evaluated for their pixel-scale flood depth prediction capabilities. Through systematic comparison, the Random Forest algorithm demonstrated superior performance, establishing itself as the most effective of the tested models for this application.
This study systematically evaluated three progressive modeling approaches: (1) a basic pixel-based regression (Trial 1), (2) a clustered pixel-based depth regression (Trial 2), and (3) a clustered catchment-based depth regression (Trial 3). Trial 3 emerged as the superior model, with a training R² of 0.97 and a testing R² of 0.83. When validated on six unseen catchments, the developed ML model maintained strong performance, achieving an average Nash–Sutcliffe Efficiency (NSE) of 0.79 and an R² of 0.82, supporting its reliability for flood depth prediction in ungauged basins. The main advantages of the developed ML model can be summarized as follows:
Computational Efficiency
- Achieves complete flood depth spatial distribution predictions within 6 min;
- Provides a 225× speed improvement over HEC-RAS 6.3 simulations;
- Represents a 90–95% time reduction compared with HEC-RAS simulations;
- Provides a foundation for flood early warning system implementation.
Prediction Accuracy
- Delivers estimates with statistically robust 95% confidence intervals (Figure 5);
- Shows strong agreement with the HEC-RAS 6.3 benchmark depth maps (Figure 6).
Operational Value
- Establishes a foundation for emergency response decision-making;
- Maintains accuracy while dramatically reducing computational requirements.
Together, these advancements establish the developed ML model as a rapid and reliable alternative to conventional hydrodynamic modeling and as the basis of a scalable framework for flood depth prediction.