Review

Predicting Coastal Flooding and Overtopping with Machine Learning: Review and Future Prospects

by Moeketsi L. Duiker 1,2,*, Victor Ramos 1,2, Francisco Taveira-Pinto 1,2 and Paulo Rosa-Santos 1,2,*
1 Department of Civil Engineering and Georesources, FEUP - Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal
2 Interdisciplinary Centre of Marine and Environmental Research of the University of Porto (CIIMAR), Avenida General Norton de Matos, s/n, 4450-208 Matosinhos, Portugal
* Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(12), 2384; https://doi.org/10.3390/jmse13122384
Submission received: 5 November 2025 / Revised: 10 December 2025 / Accepted: 13 December 2025 / Published: 16 December 2025
(This article belongs to the Special Issue Coastal Disaster Assessment and Response—2nd Edition)

Abstract

Flooding and overtopping are major concerns in coastal areas due to their potential to cause severe damage to infrastructure, economic activities, and human lives. Traditional methods for predicting these phenomena include numerical and physical models, as well as empirical formulations. However, these methods have limitations, such as the high computational costs, reliance on extensive field data, and reduced accuracy under complex conditions. Recent advances in machine learning (ML) offer new opportunities to improve predictive capabilities in coastal engineering. This paper reviews ML applications for coastal flooding and overtopping prediction, analyzing commonly used models, data sources, and preprocessing techniques. Several studies report that ML models can match or exceed the performance of traditional approaches, such as empirical EurOtop formulas or high-fidelity numerical models, particularly in controlled laboratory datasets where numerical models are computationally intensive and empirical methods show larger estimation errors. However, their advantages remain task- and data-dependent, and their generalization and interpretability may lag behind physics-based methods. This review also examines recent developments, such as hybrid approaches, real-time monitoring, and explainable artificial intelligence, which show promise in addressing these limitations and advancing the operational use of ML in coastal flooding and overtopping prediction.

1. Introduction

Coastal flooding and wave overtopping represent major hazards for coastal communities, with far-reaching socio-economic and environmental consequences. These events can damage critical infrastructure, disrupt transportation networks, and, in extreme cases, result in loss of life (e.g., [1,2,3]). In densely urbanized coastal areas, overtopping-induced flooding leads to prolonged economic disruption and places considerable pressure on emergency response systems [4]. Beyond their immediate impacts on human activities, overtopping events also accelerate coastal erosion, drive sediment redistribution, and degrade natural habitats, thereby threatening biodiversity and ecosystem functioning.
The occurrence and intensity of coastal flooding and overtopping are controlled by multiple interacting drivers. Key contributors include storm surges induced by low atmospheric pressure and strong winds, extreme wave conditions, astronomical tidal cycles, coastal morphology, and long-term sea-level rise associated with climate change. Among these, surface gravity waves play a particularly critical role. Generated in the open ocean by wind forcing, waves propagate toward the coast, transporting substantial amounts of energy [5,6]. As they shoal and break in the nearshore zone, part of their energy is dissipated through turbulence, wave–current interactions, and water-level fluctuations [7]. The remaining energy reaches the shoreline, where it is absorbed or reflected depending on coastal boundary conditions, thereby influencing wave setup, overtopping volumes, and longer-term processes such as sediment transport and morphological change [8]. These processes are expected to intensify under global climate change, as sea-level rise and more frequent extreme events increase the likelihood and magnitude of overtopping events [9].
In this context, the accurate assessment of coastal defense structures (CDs) is essential for mitigating flooding and overtopping hazards. Reliable predictions of mean overtopping discharges (MODs) and acceptable threshold values under given design conditions are fundamental to the design and evaluation of CDs [10,11,12]. A range of approaches has been developed for this purpose. Numerical modeling plays a central role, employing formulations based on the Reynolds-averaged Navier–Stokes (RANS) equations, Boussinesq-type models, or shallow-water equations to simulate wave–structure interactions and capture the hydrodynamic processes governing overtopping [13,14]. While these models provide detailed insights, they are computationally demanding and sensitive to boundary conditions, limiting their use for real-time forecasting. Physical models, conducted in laboratories, offer valuable experimental validation of numerical predictions, but they are both costly and time-consuming [15]. Finally, empirical formulations derived from experimental and full-scale datasets, such as the EurOtop equations, remain widely used because of their simplicity and practicality, but often lack site-specific accuracy [16].
Against this backdrop, machine learning (ML) approaches present an innovative alternative to traditional prediction models. Unlike numerical models that rely on explicitly defined physical equations, ML models analyze large datasets to identify patterns and make predictions. This capability allows ML techniques to integrate diverse data sources, including satellite imagery, remote sensing observations, and real-time sensor data, to improve forecasting accuracy [17]. Furthermore, ML models can learn from historical overtopping events to refine prediction capabilities and support decision-making in coastal risk management [18]. As a result, the advantages of ML over conventional models include reduced computational cost while keeping the ability to handle complex interactions between multiple variables [19]. ML techniques such as artificial neural networks (ANNs), gradient boosting models, and convolutional neural networks (CNNs) have demonstrated success in predicting coastal flooding and overtopping with higher accuracy than empirical models [20,21]. Additionally, hybrid ML models that combine physics-based approaches with deep learning methods offer new possibilities for improving prediction accuracy while maintaining interpretability [22].
Despite the potential of ML, important challenges remain in optimizing its use for coastal applications. The accuracy of ML predictions is highly dependent on the availability and quality of training data, highlighting the need for integrating diverse, high-resolution datasets [23,24]. Effective model development also requires domain expertise to ensure that outputs are physically consistent and applicable to real-world conditions. A further limitation concerns interpretability: many advanced ML models operate as “black boxes,” making their predictions difficult to explain and therefore less suitable for regulatory or operational decision-making [25].
With these opportunities and challenges in mind, this paper explores the application of ML in predicting coastal flooding and overtopping, identifying the main trends, methodologies, and knowledge gaps in the field. By evaluating recent advances in ML techniques and their integration with traditional models, this study highlights the potential of data-driven approaches for improving coastal resilience and flood risk management. The remainder of this paper is structured as follows: Section 2 presents the methodology and bibliographic analysis. Section 3 provides an overview of the ML models most commonly used in studies of coastal flooding and overtopping and reports the findings of the literature review, detailing studies that have employed various models and data sources. Finally, Section 4 and Section 5 present the discussion and conclusions, respectively.

2. Methodology and Bibliographic Analysis

In recent years, research on the integration of ML tools in coastal engineering has intensified. This trend is particularly evident in major academic databases such as Scopus (Sc) and Web of Science (WoS). A bibliometric analysis was conducted to assess studies on coastal wave overtopping and flood prediction using ML. This analysis involved searching three scientific databases with systematically selected keywords combined using “AND” and “OR” operators. The search queries were as follows:
  • (“Machine learning” OR “artificial intelligence”) AND overtopping AND prediction;
  • (Susceptibility OR map*) AND “Machine learning” AND coastal AND (flood OR inundation) AND NOT (risk OR hazard).
This structured approach ensured a comprehensive dataset while restricting the search to publications from the last 19 years. Since this study focuses specifically on coastal overtopping, coastal flood susceptibility, and flood-extent mapping, the “AND NOT” operator was used to exclude studies primarily related to risk and hazard assessments. Risk- and hazard-focused papers typically evaluate additional components such as population vulnerability, exposure, economic valuation, or flood damage intensity, which fall outside the scope of flood-extent delineation or susceptibility estimation. The Sc database was utilized to search scientific documents published between 2005 and 2024, restricting the search to two subject areas: environmental science and engineering. Similarly, in the ScienceDirect (SD) database, the search was confined to journals within the fields of ocean and coastal engineering. Meanwhile, the WoS search was conducted within article abstracts and limited to the fields of civil engineering, ocean engineering, oceanography, marine engineering, environmental sciences, and environmental studies. For all three databases, the search was limited to English-language publications, and the data were organized by publication year to generate Figure 1 and Figure 2.
Figure 1 illustrates that publications on ML applications for wave overtopping have increased significantly since 2017, with a steady annual rise in published documents. Similarly, Figure 2 demonstrates that ML-based flood studies have steadily increased in volume since 2013, with SD leading in publications, followed by WoS in 2017 and Sc in 2019.
It should be noted that prior to implementing the systematic search methodology, a broader preliminary search was carried out to ensure that relevant studies from other journals were not overlooked. The systematic search presented here reflects the refined scope of the review rather than an attempt to catalog all hazard- or vulnerability-based ML studies.
Table 1 summarizes the number of relevant documents obtained from each database for coastal flooding and overtopping studies, along with the number of documents reviewed in this study. The retrieved documents were classified into three categories: research articles, review articles, and other materials (e.g., conference papers, book chapters, etc.). To ensure the relevance and quality of the review, a document selection process was applied as follows:
  • Inclusion criteria: studies focused on coastal overtopping prediction, coastal flood susceptibility, or flood-extent mapping from coastal storm events; and studies using ML models applicable to coastal hydrodynamics or remote sensing.
  • Exclusion criteria: studies on risk or vulnerability assessment; studies focused exclusively on pluvial flooding, unless they introduced methodological advances directly transferable to coastal contexts; studies applying similar ML models to identical datasets without additional methodological contributions; and hazard-mapping studies emphasizing social vulnerability indices.
Table 1. Number of documents obtained and used from 3 databases (Science Direct (SD), Scopus (Sc), and Web of Science (WoS)) for coastal flooding and overtopping studies.
                  Coastal Overtopping               Coastal Flooding
Document type     SD    Sc    WoS   Total  Used     SD    Sc    WoS   Total  Used
Articles          70    28    19    117    19       69    13    35    117    29
Review articles   2     1     1     4      3        2     1     3     6      2
Others            11    2     3     16     2        0     1     2     3      0
As a result, approximately 82% of the initially retrieved documents related to coastal overtopping and 75% of those related to coastal flooding were excluded from the final review dataset. Many excluded overtopping papers reused identical datasets with minimal methodological differences, while many excluded flooding papers focused exclusively on inland pluvial processes or multi-criteria hazard evaluations. Some papers presented visually similar ML workflows without contributing new insights into model design, preprocessing strategies, or coastal-specific applications. Including such papers would inflate the dataset without adding meaningful diversity.
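The exclusion rates quoted above follow from the counts in Table 1. The short sketch below reproduces that arithmetic, assuming the table is read as (total retrieved, used) per document type, which gives 137 retrieved and 24 used documents for overtopping and 126 retrieved and 31 used for flooding. The dictionary layout and the function name are illustrative only.

```python
# Counts read from Table 1 as (total retrieved, used in the review) per document type.
overtopping = {"articles": (117, 19), "reviews": (4, 3), "others": (16, 2)}
flooding = {"articles": (117, 29), "reviews": (6, 2), "others": (3, 0)}


def excluded_share(counts):
    """Return the percentage of retrieved documents that were not used."""
    total = sum(t for t, _ in counts.values())
    used = sum(u for _, u in counts.values())
    return 100.0 * (total - used) / total


overtopping_excluded = excluded_share(overtopping)  # 113 of 137 documents
flooding_excluded = excluded_share(flooding)        # 95 of 126 documents
print(f"overtopping: {overtopping_excluded:.1f}%  flooding: {flooding_excluded:.1f}%")
```

Rounded to whole percentages, these values agree with the approximately 82% and 75% stated in the text.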

3. Application of ML in Coastal Flooding and Overtopping Studies

ML has gained increasing prominence in coastal engineering as a means of reducing the computational costs of traditional numerical models and enhancing predictive capabilities. Applications span a wide range of topics, including storm surge forecasting [26], sediment transport and morphodynamics [27], and wave energy resource assessment [28]. ML techniques have also been employed to predict wave patterns, water-level fluctuations [23], and wave overtopping characteristics [29], as well as to optimize the design of coastal defense structures [30]. Beyond structural design, ML contributes to improved early warning systems for extreme weather events [31], thereby strengthening disaster preparedness. Collectively, these advancements underscore the growing role of ML in addressing complex coastal engineering challenges and highlight its potential to support decision-making and risk reduction in vulnerable coastal regions.

3.1. Brief Description of ML Models

This subsection presents a concise overview of the ML models most commonly applied in coastal flooding and overtopping studies (Figure 3). These models exhibit distinct strengths depending on the type of coastal process being modeled. Rather than presenting general definitions, this section highlights coastal-specific considerations that influence model selection. For further details on the mathematical formulation and implementation of the ML models, readers are referred to [32,33,34].
Artificial neural networks (ANNs) are widely used in overtopping studies due to their ability to capture complex nonlinear relationships between hydraulic parameters (e.g., wave height, period, structure geometry) [35,36]. Their ability to handle high-dimensional datasets [37] enables accurate flood and overtopping predictions; however, they require large datasets for training and can be computationally expensive, particularly for deep architectures [38].
Gradient boosting decision trees (GBDTs) have shown strong performance in overtopping prediction because they handle mixed-parameter datasets effectively and provide insights into variable importance [39,40]. They generally outperform ANNs when datasets are small to medium in size [41], but they are sensitive to hyperparameters and may underrepresent long-tailed extreme events [42].
Convolutional neural networks (CNNs) have gained prominence in coastal flooding and overtopping studies due to their capability to process spatial data (e.g., satellite imagery) [43,44]. Their ability to recognize spatial patterns enables improved representation of coastal hydrodynamics, especially in remote sensing applications [38,45]. Despite their effectiveness, CNNs’ limitations include the need for substantial labeled data (rare in coastal settings) and difficulty in interpreting physical drivers behind pixel-level classification.
Random forests (RFs) are robust for flood susceptibility mapping because they handle heterogeneous environmental predictors [46]. They are less effective for image-based flood extent mapping but remain popular for GIS-based susceptibility modeling in coastal environments due to their ability to handle missing data [47].

3.2. Evaluation Metrics

The performance of the models presented in the studies of coastal flooding and overtopping is evaluated using a range of statistical and classification metrics, each highlighting different aspects of predictive accuracy. The root-mean-squared-error (RMSE) quantifies the average magnitude of prediction errors:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
The centered RMSE (cRMSE) removes systematic bias by centering the data around their means:
$$cRMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\left(y_i - \bar{y}\right) - \left(\hat{y}_i - \bar{\hat{y}}\right)\right]^2}$$
The normalized RMSE (nRMSE) expresses the RMSE relative to the mean of the observed data, allowing for comparison across datasets with different scales:
$$nRMSE = \frac{RMSE}{\bar{y}}$$
The coefficient of determination ($R^2$) expresses the proportion of variance in the observed data explained by the model, where values closer to 1 denote stronger explanatory power:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the observed (measured) value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the observed values, $\bar{\hat{y}}$ is the mean of the predicted values, and $n$ is the total number of observations. For imbalanced datasets, the average precision (AP) summarizes precision across recall levels as the area under the precision–recall curve, which makes it a more robust indicator of model performance than accuracy-based measures:
$$AP = \int_{0}^{1} p(r)\,dr = \sum_{n}\left(R_n - R_{n-1}\right)P_n,$$
where $p(r)$ represents precision as a function of recall, and $R_n$ and $P_n$ are the recall and precision at the $n$th threshold. In classification-oriented studies, Recall quantifies the proportion of actual positive instances that are correctly identified:
$$Recall = \frac{TP}{TP + FN}$$
Several additional measures are used in flood prediction studies. Precision (Pr) measures the proportion of true positives among predicted positives:
$$Pr = \frac{TP}{TP + FP}$$
Accuracy (Ac) and/or overall accuracy (OAc) quantify the proportion of correct predictions across all cases but can be sensitive to class imbalance:
$$Ac = OAc = \frac{TP + TN}{TP + TN + FP + FN}$$
The Intersection over Union (IoU) metric, often used in spatial and image-based flood mapping, measures the overlap between the predicted flooded area and the observed (ground-truth) flooded area relative to their combined extent:
$$IoU = \frac{TP}{TP + FP + FN}$$
Finally, the area under the curve (AUC), typically referring to the receiver operating characteristic (ROC) curve, summarizes the trade-off between true positive and false positive rates by integrating the true positive rate (TPR) over the false positive rate (FPR), with values closer to 1 indicating stronger discriminative ability:
$$AUC = \int_{0}^{1} TPR(FPR)\,d(FPR) \approx \sum_{i}\left(FPR_{i+1} - FPR_i\right) \times \frac{TPR_{i+1} + TPR_i}{2}$$
where $TP$ (true positives) are correctly identified positive cases, $TN$ (true negatives) are correctly predicted negative cases, $FP$ (false positives) are negative cases incorrectly predicted as positive, $FN$ (false negatives) are positive cases incorrectly predicted as negative, $TPR = TP/(TP + FN)$ is the sensitivity or recall, and $FPR = FP/(FP + TN)$ is the false alarm rate.
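To make the definitions in this subsection concrete, the following is a minimal pure-Python sketch of the metrics as written above. The function names and the sample values are illustrative assumptions, not taken from any reviewed study, and production work would normally rely on an established library instead.

```python
import math


def rmse(y, y_hat):
    """Root-mean-squared error between observed y and predicted y_hat."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def crmse(y, y_hat):
    """Centered RMSE: both series are centered on their own means first."""
    y_bar = sum(y) / len(y)
    y_hat_bar = sum(y_hat) / len(y_hat)
    return math.sqrt(
        sum(((a - y_bar) - (b - y_hat_bar)) ** 2 for a, b in zip(y, y_hat)) / len(y)
    )


def nrmse(y, y_hat):
    """RMSE divided by the mean of the observed values."""
    return rmse(y, y_hat) / (sum(y) / len(y))


def r2(y, y_hat):
    """Coefficient of determination."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def classification_metrics(tp, tn, fp, fn):
    """Recall, precision, accuracy and IoU from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "iou": tp / (tp + fp + fn),
    }


def average_precision(recall, precision):
    """Sum of (R_n - R_{n-1}) * P_n, with recall sorted increasing and R_0 = 0."""
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def auc_trapezoid(fpr, tpr):
    """Trapezoidal ROC AUC; fpr must be sorted in increasing order."""
    return sum(
        (fpr[i + 1] - fpr[i]) * (tpr[i + 1] + tpr[i]) / 2 for i in range(len(fpr) - 1)
    )


# Hypothetical predictions with a constant bias of +0.5.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]
print(rmse(y_obs, y_pred), crmse(y_obs, y_pred), nrmse(y_obs, y_pred), r2(y_obs, y_pred))

# Hypothetical imbalanced case: 100 flooded samples among 1020.
print(classification_metrics(tp=80, tn=900, fp=20, fn=20))
```

In the biased example, the RMSE is 0.5 while the cRMSE is zero, because centering removes the constant offset. In the imbalanced example, accuracy is about 0.96 although recall and precision are both 0.8 and the IoU is about 0.67, which illustrates the sensitivity of accuracy to class imbalance noted above.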

3.3. Predicting Coastal Overtopping

As mentioned above, a range of ML approaches, including ANNs, CNNs, GBDTs, and support vector machines (SVMs), have been successfully applied to predict overtopping discharges and runup heights [48,49,50]. These models vary considerably even in terms of data sources, data processing methods, and research objectives. To capture this variability, this subsection reviews recent applications of ML to wave overtopping, with particular emphasis on the diversity of methodologies employed.
Van Gent et al. [51] developed a neural network (NN) model to estimate wave overtopping discharges for a wide range of coastal structures. Their dataset, consisting of 8372 tests from physical model experiments within the European project CLASH [52], included 15 input variables and 1 output variable, the mean overtopping discharge. Their results demonstrated that the NN model successfully captured the complex relationships between input variables and overtopping discharge, achieving an RMSE of 0.29. Den Bieman et al. [53] employed the same dataset to develop a prediction model for overtopping rates using extreme gradient boosting decision trees (XGB). Their study showed that the prediction error of XGB was 2.8 times lower than that of the NN model developed by van Gent et al. [51]. The improved performance stemmed from advanced data preprocessing and model tuning, including permutation importance analysis and hyperparameter optimization. These findings suggest that GBDT models can serve as reliable alternatives to traditional ANN models, particularly when interpretability and computational efficiency are required.
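Permutation importance, mentioned above as part of the preprocessing, can be illustrated with a short sketch. The idea is to shuffle one input column at a time and record how much the prediction error grows. Everything below is an assumption made for illustration: the data are synthetic, the "model" is a fixed toy function that uses only the first feature, and the function names are hypothetical. It does not reproduce the procedure of den Bieman et al. [53].

```python
import math
import random


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [model(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the dataset with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [model(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def model(row):
    """Toy surrogate (hypothetical coefficients): uses feature 0 only."""
    return -1.0 - 1.1 * row[0]


# Synthetic data: feature 0 drives the target, feature 1 is irrelevant noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0.5, 3.0), data_rng.uniform(0.0, 1.0)] for _ in range(200)]
y = [model(row) for row in X]

importances = permutation_importance(model, X, y)
print(importances)
```

Shuffling the informative column increases the error, whereas shuffling the ignored column leaves it unchanged, so the second importance is exactly zero. In practice, features with near-zero importance are candidates for removal before retraining.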
Similarly, Hosseinzadeh et al. [54] explored Gaussian process regression (GPR) and support vector regression (SVR) for predicting mean overtopping discharge at inclined breakwaters. Using an updated CLASH [15] dataset, they demonstrated that GPR outperformed ANN models, achieving a cRMSE of 0.24, compared with 0.28 for the ANN and 0.63 and 1.20 for the empirical formulas in [15,55], respectively. Their study emphasized the importance of variable selection: an optimal combination of 7 inputs and 1 output, instead of the 15 inputs used by van Gent et al. [51], improved prediction accuracy beyond conventional approaches. Habib et al. [50] examined the predictive performance of multiple ML models, finding that RF achieved the best accuracy (RMSE = 0.0025), while ANN performed worst (RMSE = 0.0031). Furthermore, Elbisy [21] evaluated decision trees (DTs), GBDT, and SVM for predicting overtopping discharge in coastal structures. The results, based on the EurOtop database, indicated that GBDT outperformed previous models by Formentin et al. [56] and van Gent et al. [51]. The scatter index for GBDT was 0.392, compared with 0.52 for Formentin et al. [56] and 0.77 for van Gent et al. [51], reaffirming the potential of ML to improve traditional overtopping assessment methods. This variation in performance across different ML models underscores the crucial role of dataset composition and preprocessing strategies in model accuracy. Factors such as the quality and quantity of input variables, feature selection techniques, and normalization methods significantly impact predictive capabilities. While some models, such as RFs and GBDTs, excel in interpretability and on structured datasets, the effectiveness of a model is inherently tied to the characteristics of the data it processes, highlighting the need for careful selection of preprocessing techniques tailored to specific coastal engineering applications.
Even though ML techniques are becoming more popular in coastal engineering studies, selecting an appropriate model for a given dataset remains a significant challenge. The complexity of coastal overtopping processes and site-specific conditions, along with the variability in wave–structure interactions, requires careful consideration when choosing a model that balances accuracy, interpretability, and computational efficiency. These challenges have led some researchers to explore multiple ML models to determine which yields the best performance for a given dataset. For example, Kim and Lee [57] applied eight ML models (XGB, GPR, AdaBoost, RF, SVR, ANN, Lasso, and linear regression) to the same dataset, evaluating model performance under different tuning conditions. Their findings revealed that XGB achieved the highest accuracy (RMSE = 0.276), while linear regression (LR) was the least effective (RMSE = 0.691). This study also highlighted the potential biases in ML models due to unequal dataset distribution across different coastal structure types in EurOtop. Comparative analyses by Alshahri and Elbisy [20] further reinforced these insights. Their study evaluated multilayer perceptron neural networks (MPNNs), cascade correlation neural networks (CCNNs), general regression neural networks (GRNNs), and SVM with a radial-basis function. Using the EurOtop database of 2009, they found that GRNN yielded the most accurate results (RMSE = 0.0003), outperforming SVM (RMSE = 0.0005), CCNN (RMSE = 0.0007), and MPNN (RMSE = 0.001). These findings illustrate how different ML models cater to varying complexities in overtopping prediction by leveraging their unique strengths. 
While gradient boosting models like XGB demonstrated superior predictive performance in some studies due to their ability to efficiently process structured datasets and optimize feature selection, neural network-based models such as GRNN and CNN have shown higher adaptability in handling complex hydrodynamic interactions.
As a result, a growing number of studies have applied deep learning techniques for overtopping analysis. CNNs, in particular, are well-suited for this task given their capacity to capture spatial and temporal patterns in complex datasets. For example, Tsai and Tsai [58] developed a CNN architecture incorporating bottleneck residual blocks, layer normalization, and dropout layers, with hyperparameters optimized through Bayesian search. Their model achieved an RMSE of 0.112, outperforming both ANN and GBDT approaches. Notably, when validated against real-world prototype datasets, rather than controlled laboratory data, the CNN maintained its predictive advantage (RMSE = 0.555), highlighting its potential for practical overtopping assessment. Alongside these predictive advances, other deep learning applications have focused on real-time overtopping monitoring. For example, Alvarellos et al. [59] trained an ANN-based model on 3709 video-recorded overtopping events (2015–2022). Their final models achieved training accuracies of 0.73–0.84 and testing accuracies of 0.67–0.86, demonstrating their reliability in detecting overtopping incidents. Beyond predictive modeling, ML has been explored for early warning systems and pedestrian safety in overtopping events. Carro et al. [60] developed an RF-based predictive model, achieving an F2-score of 0.89 during validation and 0.95 during testing. Their study highlighted how ML models can enhance safety and efficiency in maritime operations, serving as robust early warning tools for ports.
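The F2-score reported by Carro et al. [60] is not defined in Section 3.2. Assuming it follows the standard F-beta definition, it can be written in the notation of that section as a weighted harmonic mean of precision and recall, where $\beta = 2$ weights recall more heavily than precision:

```latex
F_\beta = \frac{\left(1 + \beta^2\right)\, Pr \cdot Recall}{\beta^2\, Pr + Recall},
\qquad
F_2 = \frac{5\, Pr \cdot Recall}{4\, Pr + Recall}
```

Emphasizing recall suits early warning applications, where a missed overtopping event is usually costlier than a false alarm.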
Recent studies have also compared ML models against empirical formulations. For instance, den Bieman et al. [48] assessed the performance of an XGB model for wave overtopping prediction in comparison with the empirical approaches proposed in [15,61]. Their results showed that XGB achieved a markedly lower RMSE of 0.098 on the training dataset, while the empirical formulas produced higher errors of 1.089 and 1.313, respectively. Similar findings were reported by Formentin et al. [56], who trained an ANN model on an extended database of physical model tests and achieved substantially higher accuracy ($R^2$ = 0.85–0.98) than the empirical formulas traditionally used for estimating overtopping discharge, reflection, and transmission coefficients ($R^2$ = 0.51–0.83). Lee and Suh [62] further demonstrated that ML approaches can outperform conventional formulations by developing new Group Method of Data Handling (GMDH)-based predictive formulas for inclined seawalls using CLASH and EurOtop datasets; the resulting models achieved nRMSE values of 0.0155 and 0.003, compared with 0.0258 and 0.0036 for empirical formulas, and performed comparably to the EurOtop ANN (nRMSE = 0.0172 and 0.003). Additionally, Etemad-Shahidi et al. [63] applied data-mining techniques coupled with scaling arguments to derive overtopping-rate prediction formulas and reported at least 18% and 31% reductions in RMSE for laboratory and field datasets, respectively, relative to existing empirical equations. Collectively, these studies reinforce the growing evidence that ML-based or ML-informed formulations can match or exceed the accuracy of traditional empirical approaches, underscoring their potential value in coastal engineering applications.
Figure 4 presents a summary of different families of ML models reviewed in this paper, showing that within the deep learning and neural network family, ANNs were the most frequently applied, while gradient boosting models dominated the tree-based family. Figure 5 illustrates the models that achieved the highest performance when compared to other ML models within the same study, with XGB emerging as the best-performing approach in three separate research studies.
Table 2 summarizes studies applying ML models to coastal overtopping prediction, outlining the algorithms used, dataset sizes, preprocessing techniques, and performance metrics (primarily RMSE). The results show that predictive accuracy is highly dependent on model choice, preprocessing strategy, and dataset composition, emphasizing the importance of selecting models suited to coastal structure characteristics and data quality. Notably, SVR and GPR have demonstrated strong predictive abilities, as seen in [54], where they outperformed empirical formulas. Across most studies, datasets were derived from similar sources, leading to the adoption of comparable preprocessing methods, including weight factor reduction (WFR), Froude’s similarity law scaling (FSL), permutation importance (PI), data normalization (DN), logarithmic transformation of overtopping discharge (Log(q)), and normalization of coastal structure parameters by wavelength and wave height (SPNWH).
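Several of the preprocessing steps listed above reduce to short transformations. The sketch below illustrates three of them under stated assumptions: Froude scaling of a discharge per unit width (which grows with the length scale to the power 1.5), nondimensionalization of the discharge by $\sqrt{g H_{m0}^3}$ followed by a base-10 logarithm, and normalization of structure dimensions by wave height or wavelength. The function names, the use of the deep-water wavelength, the small floor applied before the logarithm, and the sample values are illustrative choices, not the exact procedure of any reviewed study.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def froude_scale_discharge(q_model, length_scale):
    """Scale q (m^3/s per m) from model to prototype under Froude similarity."""
    return q_model * length_scale ** 1.5


def dimensionless_discharge(q, hm0):
    """q* = q / sqrt(g * Hm0^3), with Hm0 the significant wave height (m)."""
    return q / math.sqrt(G * hm0 ** 3)


def log_discharge(q_star, floor=1e-9):
    """log10 transform; the floor (an assumption) avoids log10(0) for dry tests."""
    return math.log10(max(q_star, floor))


def deep_water_wavelength(period):
    """Deep-water wavelength L0 = g * T^2 / (2 * pi)."""
    return G * period ** 2 / (2.0 * math.pi)


def normalize(value, reference):
    """Divide a structure dimension by a wave height or a wavelength."""
    return value / reference


# Hypothetical flume test at a 1:25 length scale.
q_model, hm0_model, scale = 1e-4, 0.10, 25.0
q_prototype = froude_scale_discharge(q_model, scale)

# The dimensionless discharge is the same at model and prototype scale.
q_star_model = dimensionless_discharge(q_model, hm0_model)
q_star_prototype = dimensionless_discharge(q_prototype, hm0_model * scale)

features = {
    "log_q_star": log_discharge(q_star_model),
    "Rc_over_Hm0": normalize(0.15, hm0_model),                 # crest freeboard
    "B_over_L0": normalize(0.30, deep_water_wavelength(1.5)),  # berm width
}
print(q_prototype, q_star_model, q_star_prototype, features)
```

Because the dimensionless discharge is invariant under Froude scaling, tests performed at different scales can be pooled into one training set, and the logarithm compresses the several orders of magnitude that overtopping discharges typically span.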
Figure 6 provides a visual summary of the studies presented in Table 2, organized according to the data processing categories defined in Table 3. The figure reveals that feature scaling or transformation is the most prevalent preprocessing step, identified in 16 studies, underscoring the importance of nondimensionalizing overtopping discharge and structural parameters to enhance model generalization. Data partitioning, such as dividing datasets into training and testing subsets, is also widely employed, appearing in 14 of the 19 studies. In contrast, feature engineering and filtering were relatively uncommon, appearing in only five studies and one study, respectively.
Overall, the performance of ML models in overtopping prediction varies depending on dataset composition, feature selection, and the complexity of coastal structures considered. The reviewed studies collectively show that no single ML algorithm consistently outperforms all others; instead, model performance is strongly influenced by preprocessing choices, data volume, and the physical characteristics of the structures represented in each dataset.

3.4. Predicting Coastal Flooding

Flood prediction plays a fundamental role in coastal risk management, enabling proactive measures to minimize damage and losses. Traditional flood forecasting techniques, including hydrodynamic simulations and empirical models, have been extensively used to predict flood extents and water levels. However, these approaches often require high computational resources and may struggle with adaptability to diverse coastal environments. Therefore, ML methods have emerged as efficient alternatives that enhance predictive accuracy by leveraging large-scale datasets and automated learning algorithms.
On these grounds, recent studies have explored various ML models to improve coastal flood prediction (e.g., flood extent and susceptibility mapping). Deep learning models have been widely utilized due to their capability to model nonlinear relationships among meteorological, oceanographic, and hydrological variables [35,37]. By training on historical flood data, deep learning models can learn patterns in water level fluctuations, rainfall intensity, and tidal influences, improving real-time forecasting accuracy [38]. To maintain a clear coastal focus, the flood-related studies reviewed here are grouped into two categories: (i) coastal flood extent and surge/tide/wave-driven flooding; and (ii) general flood susceptibility or riverine/pluvial mapping where ML methods have direct methodological transferability to coastal settings.

3.4.1. Studies Related to Coastal Flooding

Recent research has demonstrated the effectiveness of ML in tackling a broad range of coastal flood-related tasks, from extent mapping and inundation depth estimation to compound flood characterization. A major research area involves the use of deep learning for coastal flood mapping using remote sensing data. For example, Qin et al. [70] compared multiple ML models, including backpropagation neural networks (BPNN), polynomial regression (PR), DT, RF, and K-nearest neighbors (KNN), for generating point waterlogging depth predictions. Their findings indicated that the BPNN model achieved higher accuracy in floodwater depth estimation, with RMSE values ranging from 0.003 to 0.045 across preprocessing strategies and dataset compositions. Importantly, Qin et al. also reported that the ML-based predictions outperformed traditional 1D/2D and CEDH hydrodynamic models while running up to five times faster. These results highlight the potential of ML to serve as a complementary or surrogate approach to physics-based numerical models for coastal flood applications, particularly in scenarios where rapid forecasting or data-driven updates are required. Liu et al. [71] developed SARCFMNet, a deep CNN designed to process synthetic aperture radar (SAR) imagery for coastal inundation mapping during Hurricane Harvey. Their physics-aware input design and regularization achieved an OAc of 0.98 and an F1 score of 0.88. Similarly, Peng et al. [72] addressed the challenge of near-real-time flood delineation in highly heterogeneous urban coastal settings by introducing the Patch Similarity CNN (PSNet), which uses multispectral surface reflectance imagery before and after flooding. Applied to the Harvey and Florence storms, the model maintained strong performance (F1 = 0.87; OAc = 0.93), illustrating how carefully designed CNN architectures can overcome illumination inconsistency, urban clutter, and spectral variability.
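Because overall accuracy and F1 are reported side by side for the flood-mapping models above, it may help to see how the two are computed from a binary flood mask. The sketch below is a plain-Python illustration; the function names and the ten-pixel example are hypothetical and do not reproduce any dataset from the cited studies.

```python
def confusion_counts(truth, pred):
    """Count true/false positives and negatives for binary flood labels (1 = flooded)."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def overall_accuracy(truth, pred):
    """Fraction of pixels labelled correctly, flooded or not."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(truth, pred):
    """Harmonic mean of precision and recall for the flooded class."""
    tp, fp, fn, _ = confusion_counts(truth, pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical ten-pixel scene in which only three pixels are flooded.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

oa = overall_accuracy(truth, pred)   # 8 of 10 pixels correct
f1 = f1_score(truth, pred)           # 2*2 / (2*2 + 1 + 1)
```

In this toy scene, the overall accuracy (0.8) exceeds the F1 score (about 0.67) because correctly labelled dry pixels dominate the count. The same mechanism can make overall accuracy look more favorable than F1 whenever flooded pixels are a minority class.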
Tree-based models have also been extensively employed in flood susceptibility and vulnerability studies, achieving high accuracy in distinguishing between flood-prone and non-flood-prone areas. For instance, Hasan et al. [73] compared RF, XGB, and KNN for flood susceptibility mapping using 400 observations with 9 flood-controlling factors, processed in ArcGIS. Their study reported accuracy scores ranging from 85.5% to 86.7%. Zahura and Goodall [74] further demonstrated the capability of ML to complement or emulate numerical models by using an RF surrogate to replicate a 1D/2D physics-based hydrodynamic model for predicting surface water depths in an urban coastal watershed. Their surrogate model successfully reproduced flood extent and depth for both pluvial and tidal flooding events and was able to differentiate between locations dominated by rainfall-driven flooding, tide-driven flooding, or a combination of both. Validation using flood reports from the Waze mobile application showed a 90% agreement with the RF-based predictions, highlighting the potential of ML surrogates for operational coastal flood forecasting. Moreover, Saravanan and Abijith [75] assessed susceptibility along the northern coastal zone of Tamil Nadu, India, using XGB, SVM, gradient boosting machine (GBM), rotation tree (RTF), and Naïve Bayes (NB), integrating Sentinel-1 and other geospatial layers. Their feature-refined GBM model reached an AUC of 0.92, demonstrating the suitability of ensemble models for coastal zone susceptibility assessments characterized by heterogeneous environmental attributes.
Despite these advancements, several challenges hinder the widespread application of ML in flood prediction. One major issue is data scarcity, particularly in regions with limited historical flood records. While ML models thrive on large datasets, the absence of extensive training data can hinder model generalization and accuracy. In some cases, synthetic datasets generated through numerical models help mitigate this issue, but discrepancies between simulated and real-world data can still affect prediction reliability. Moreover, model interpretability remains a persistent challenge, as deep learning architectures like CNNs and long short-term memory (LSTM) networks function as black-box models, making it difficult for decision-makers to understand the underlying logic behind predictions [76].
In this context, hybrid approaches that integrate physics-based numerical models with ML techniques have been proposed to address these challenges. For example, Naeini and Snaiki [77] proposed a physics-informed ML approach for simulating and predicting time series of wave runup relevant to coastal flooding. Their method leverages the two computational modes of the XBeach model: the Surfbeat (XBSB) mode, which generates wave runup outputs quickly but with limited precision, and the non-hydrostatic (XBNH) mode, which provides high-fidelity and more physically accurate results, albeit at a significantly higher computational cost. To leverage these strengths, they employed a conditional generative adversarial network (cGAN) that transforms the fast, lower-fidelity Surfbeat outputs into high-quality predictions comparable to those of the non-hydrostatic mode. Results demonstrated the robustness and efficiency of this framework, highlighting its potential for risk assessment and coastal flood management.

3.4.2. Non-Coastal Flood Studies Relevant to Coastal Applications

Although numerous ML studies focus on inland or riverine flooding rather than coastal processes, many of their methodological contributions remain highly transferable to coastal flood prediction. These studies are essential to include because they introduce model architectures, preprocessing workflows, and feature-selection strategies that directly apply to coastal problems once oceanographic variables (e.g., wave overtopping discharge, storm surge height, tidal level) replace typical inland predictors such as rainfall or soil moisture.
A large group of transferable studies centers on flood susceptibility modeling. Vu et al. [78] demonstrated the effectiveness of SVM and RF classifiers using Sentinel-1 imagery, achieving an AUC of 0.98. Prasad et al. [79] further advanced susceptibility modeling by implementing a base classifier, Adabag (AB), combined with RTF, KNN, boosted regression tree (BRT), and LogitBoost (LB) classifiers. They applied Boruta feature selection and showed that careful variable refinement can substantially improve prediction maps.
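The AUC values quoted for susceptibility classifiers can be read as the probability that a randomly chosen flooded location receives a higher susceptibility score than a randomly chosen non-flooded one. The sketch below computes AUC directly from that pairwise definition; the function name and the four-point example are illustrative assumptions, and production work would normally rely on an established library routine instead.

```python
def roc_auc(labels, scores):
    """AUC as the share of (flooded, non-flooded) pairs ranked correctly; ties count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one flooded and one non-flooded sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical susceptibility scores for two dry (0) and two flooded (1) locations.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
auc = roc_auc(labels, scores)  # three of the four pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred inventory points but not for full raster maps, where a rank-based implementation is preferable.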
With the increasing availability of remote sensing data, CNNs have gained attention for flood extent mapping. CNNs excel in processing spatial datasets, such as multispectral satellite imagery, SAR data, and high-resolution topographic maps, allowing for improved detection of flooded regions. For instance, Shastry et al. [80] applied CNN-based models for flood extent classification, achieving 98% precision by combining Maxar WorldView satellite imagery, United States Geological Survey (USGS) flood data, and land cover information from the National Land Cover Database (NLCD). In addition to evaluating ML performance alone, the authors compared the satellite-derived ML flood maps with a hydraulic model and found that inundation was underestimated by approximately 62% when relying solely on optical remote sensing imagery. Using NLCD and cloud-mask information, they attributed 79% of this underprediction to obstructions such as vegetation (74%), clouds (9%), and combined effects (4%). Their findings highlight that although CNN-based methods perform well in image classification, integrating hydrodynamic model outputs alongside remote sensing data remains essential for producing more complete and realistic flood-extent maps. Similarly, Nemni et al. [81] used a combination of CNN architectures (XNet, U-Net, and ResNet) on Sentinel-1 SAR imagery, obtaining accuracy scores between 91% and 97% after preprocessing steps such as orthorectification, calibration, and speckle filtering. Another major area of research is the use of fully convolutional networks (FCNs) for flood prediction. Kang et al. [82] applied VGG-16-based FCNs to SAR images from the Gaofen-3 satellite, achieving an accuracy range of 99.1% to 99.6% in flood extent classification. Similarly, Sarker et al. [83] used F-CNNs on 14 Landsat images, normalizing pixel intensity values and integrating Water Observations from Space (WOfS) data, achieving classification accuracy between 80% and 92%.
Another promising development in ML-based flood forecasting is the use of LSTM networks [84], a type of recurrent neural network designed for time-series prediction. LSTMs have been applied in flood forecasting to model complex temporal dependencies between rainfall, tides, and water levels. For instance, Li et al. [85] developed an LSTM-based flood prediction model trained on real-time sensor data, achieving a significant reduction in forecasting errors compared to conventional models. Overall, while these studies are not explicitly coastal, their modeling frameworks, data-driven strategies, and feature-selection techniques remain highly applicable to coastal flood prediction when ocean-driven inputs are incorporated.
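Before an LSTM or any other sequence model can be trained, a continuous record of rainfall, tide, and water level has to be cut into fixed-length input windows paired with a future target. The sketch below shows one common way to do this; the function name, the three-step window, the one-step horizon, and the sample values are assumptions for illustration and are not the configuration used in the cited studies.

```python
def make_windows(rows, target_index, n_lags, horizon=1):
    """Turn a time-ordered list of feature rows into (window, future target) pairs.

    Each input window holds n_lags consecutive rows; the target is the value of
    column target_index located horizon steps after the end of the window.
    """
    inputs, targets = [], []
    last_start = len(rows) - n_lags - horizon
    for start in range(last_start + 1):
        window = rows[start:start + n_lags]
        future_row = rows[start + n_lags + horizon - 1]
        inputs.append(window)
        targets.append(future_row[target_index])
    return inputs, targets

# Hypothetical hourly record: (rainfall in mm, tide level in m, water level in m).
rows = [
    (0.0, 0.1, 1.0),
    (2.0, 0.3, 1.1),
    (5.0, 0.6, 1.4),
    (1.0, 0.8, 1.9),
    (0.0, 0.5, 1.7),
    (0.0, 0.2, 1.3),
]
X, y = make_windows(rows, target_index=2, n_lags=3, horizon=1)
```

Each element of X is a window of three consecutive time steps, and the matching element of y is the water level one step after that window ends, so six records yield three training samples.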

3.4.3. Summary of Reviewed Flood-Related Studies

One of the key aspects of ML-based flood prediction is data preprocessing, which ensures the quality and relevance of input variables. Feature selection is a crucial step that reduces redundancy and enhances model performance, often applied using techniques such as recursive feature elimination (RFE) and principal component analysis (PCA) [86]. Normalization and standardization help scale numerical values into comparable ranges, improving the stability of ML models. For example, rainfall data measured in millimeters and wind speed measured in kilometers per hour require normalization to avoid bias in model training [34]. Hidayah et al. [87] employed Normalized Difference Vegetation Index (NDVI) and feature selection methods in flood susceptibility mapping, improving model accuracy to 92.1–97.7%. Dong et al. [88] developed a FastGRNN-FCN hybrid model using flood control data, incorporating 80,997 flood events with multiple hydrological variables. Their approach, which utilized time-series segmentation and feature selection, achieved an accuracy of 95.8–97.8%, highlighting the critical role of preprocessing strategies in enhancing the effectiveness of predictive flood analytics.
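To make the rainfall and wind speed example concrete, the sketch below standardizes two variables with very different units so that neither dominates training purely because of its numeric range. The scaling statistics are estimated on the training subset only and then reused for new data, which is a common precaution against information leakage. All names and values are hypothetical.

```python
import statistics

def fit_standardizer(train_values):
    """Estimate mean and population standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    if std == 0:
        std = 1.0  # constant feature: avoid division by zero
    return mean, std

def apply_standardizer(values, mean, std):
    """Apply previously fitted statistics to any subset (training, test, or live data)."""
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

# Hypothetical training records in different units.
rainfall_mm_train = [0.0, 12.0, 48.0, 120.0]
wind_kmh_train = [10.0, 25.0, 40.0, 65.0]

rain_mean, rain_std = fit_standardizer(rainfall_mm_train)
wind_mean, wind_std = fit_standardizer(wind_kmh_train)

rain_z = apply_standardizer(rainfall_mm_train, rain_mean, rain_std)
wind_z = apply_standardizer(wind_kmh_train, wind_mean, wind_std)

# New observations are scaled with the training statistics, not their own.
rain_new_z = apply_standardizer([60.0], rain_mean, rain_std)
```

After standardization, both variables have zero mean and unit variance on the training set, regardless of whether they were recorded in millimeters or kilometers per hour.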
Looking ahead, the integration of multi-source data fusion and real-time adaptive ML models presents promising directions for flood prediction research. Combining satellite observations with crowd-sourced flood reports and Internet of Things (IoT)-based water level sensors could further enhance the robustness of ML-based forecasting systems [89]. Additionally, the development of explainable artificial intelligence (XAI) techniques seeks to address the black-box nature of complex ML models by making their predictions more interpretable. In the context of flood forecasting, XAI can highlight which variables (e.g., wave height, storm surge, or CD structural parameters) strongly influence predictions, thereby fostering greater trust in model outputs. This transparency is vital for coastal management, where decisions involve high economic and safety stakes, and it enables policymakers and engineers to detect potential biases or errors, ultimately supporting more reliable and defensible decision-making [90].
Figure 7 provides an overview of the ML model families applied in the reviewed studies, indicating that within the deep learning and neural network family, CNN-based models (e.g., U-Net, VGG16, SegNet, Mask R-CNN, etc.) were the most commonly employed, while RF dominated the tree-based family. Figure 8 highlights the models that achieved the highest performance when compared with other ML models evaluated in the same study, with RF emerging as the best-performing approach in three separate cases.
Table 4 provides a summary of key studies that have applied ML models for coastal flood prediction. It highlights the ML techniques used, the datasets employed, the preprocessing methods applied, and the overall model performance. The table demonstrates how different models, including ANNs, RF, CNNs, and hybrid deep learning approaches, have been leveraged to enhance flood susceptibility mapping, real-time forecasting, and flood extent classification.
Figure 9 presents a synthesis of the studies listed in Table 4, categorized according to the data processing methods defined in Table 5. The analysis reveals that image labeling and segmentation are the most frequently applied techniques, appearing in 10 studies, reflecting a strong emphasis on supervised flood extent detection. Feature engineering, including selection and normalization, is also commonly employed, as identified in approximately six to seven studies. More advanced corrections, such as geometric and radiometric adjustments, are predominantly associated with SAR-based studies.
Across the reviewed flood-related studies, ML models demonstrate strong predictive capabilities for flood extent mapping, susceptibility classification, and short-term forecasting. However, their accuracy remains sensitive to data quality, preprocessing choices, and the spatial resolution of remote sensing inputs. These trends are explored further in Section 4.

4. Discussion

The findings from the reviewed studies indicate that ML has made significant contributions to the prediction of coastal flooding and overtopping. The integration of ML techniques has not only improved predictive accuracy but also enhanced computational efficiency, real-time forecasting capabilities, and the ability to handle large-scale and diverse datasets [74]. However, several key trends, challenges, and limitations emerge from the reviewed literature and are discussed in Sections 4.1–4.4.

4.1. ML Applications in Coastal Overtopping

The reviewed overtopping studies collectively reveal that ML techniques have substantially improved the predictive skill of models used to estimate mean overtopping discharge across a wide range of coastal structures. Tree-based models, particularly XGB and RF, consistently achieved some of the lowest RMSE values reported in the literature, especially when trained on large, heterogeneous datasets derived from the EurOtop and CLASH databases. Their strong performance can be attributed to their ability to handle nonlinear interactions between hydraulic and structural variables while offering built-in mechanisms for variable importance assessment, which enhances interpretability and aids feature selection.
Neural network-based models also demonstrated competitive performance, especially when applied to datasets with extensive experimental coverage. Classical ANN architectures often performed well when supported by careful preprocessing steps such as feature scaling, Froude’s similarity law transformations, and normalization of structural parameters [65,66,68].
Across the overtopping literature, data preprocessing emerged as a fundamental determinant of model performance. Techniques such as Log(q), normalization of structural parameters, permutation importance for feature selection, and dimensionality reduction (e.g., PCA) significantly influenced predictive accuracy [50,65]. The consistent use of feature scaling or transformation across most studies (Figure 6 and Table 3) underscores the importance of expressing both hydraulic and structural variables in nondimensional form to enhance the transferability of ML models across different test conditions.
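Permutation importance, mentioned above, can be sketched in a few lines: each input column is shuffled in turn, and the resulting increase in RMSE is taken as a measure of how much the model depends on that input. The toy predictor, the synthetic data, and the function names below are assumptions for illustration only and do not represent any model from the reviewed studies.

```python
import random

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            permuted = rmse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Toy predictor that depends only on the first feature (purely illustrative).
def toy_model(row):
    return 3.0 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_model(row) for row in X]

importance = permutation_importance(toy_model, X, y)
```

Shuffling the second column leaves the toy predictions unchanged, so its importance is zero, while shuffling the first column degrades the RMSE. In practice, the same procedure is applied to a fitted model on held-out data, and correlated inputs can share or mask importance.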
Despite these advances, several challenges persist. Most available overtopping datasets originate from laboratory experiments rather than field measurements, limiting the ability of ML models to fully capture the variability of real-world hydrodynamic conditions. Additionally, model performance varies notably with dataset composition, highlighting that no single ML algorithm universally outperforms others. Instead, the suitability of a model depends on data quality, distribution across structure types, and the relevance of selected input variables. These considerations stress the need for improved data collection efforts, hybrid modeling strategies that incorporate physical principles, and greater consistency in preprocessing methods to enhance the robustness of overtopping predictions.

4.2. ML Applications in Coastal Flooding

The studies reviewed on coastal flooding demonstrate that ML methods have become valuable tools for predicting flood extent, susceptibility, and short-term inundation dynamics. Deep learning models, particularly CNNs and FCNs, consistently achieved the highest performance for flood extent mapping when applied to remote sensing data such as Sentinel-1 SAR, Landsat multispectral imagery, UAV-based observations, and fused datasets [81,95]. Their ability to extract spatial patterns from large and heterogeneous image sets allowed CNN-based models to outperform other classifiers such as SVM, DT, KNN, and even non-convolutional ANNs in most comparative studies. Model accuracy in these applications was strongly influenced by the spatial resolution of input data, emphasizing the role of high-quality satellite and aerial imagery in operational flood detection.
Additionally, tree-based models (RF and XGB) proved particularly effective for flood susceptibility mapping and classification tasks based on environmental, hydrological, and meteorological variables. These models excelled at integrating diverse data sources (e.g., DEMs, soil permeability, tide levels), providing strong predictive performance across multiple studies, and outperforming traditional hydrological models in predictive reliability [100]. Their inherent interpretability, combined with feature ranking capabilities, made them especially useful for identifying key flood-controlling factors.
As with overtopping studies, data preprocessing played a critical role in model performance. Image-based studies commonly relied on segmentation, filtering, and geometric or radiometric correction to enhance input quality (Figure 9 and Table 5). GIS-related preprocessing, such as rasterization, factor reclassification, and resolution harmonization, was essential in susceptibility mapping studies. Feature selection techniques, including PCA, RFE, Boruta, and permutation importance, contributed to substantial improvements in model generalization and computational efficiency (Table 4). These findings highlight that preprocessing is not merely a preparatory step but a core component of ML-based flood prediction pipelines.
Although several studies included in this review focus on non-coastal flooding, their methodologies remain highly relevant to coastal environments. Many ML techniques used for riverine or pluvial flooding, such as CNN-based image segmentation, tree-based susceptibility modeling, and LSTM time-series analysis, are equally applicable to coastal flood prediction once coastal forcing variables are introduced. Additionally, the increasing occurrence of compound events, where rainfall and oceanic drivers interact, blurs the distinction between coastal and non-coastal flooding. As a result, inland ML models provide valuable methodological foundations for coastal applications, especially in contexts requiring data fusion, real-time forecasting, or high-resolution flood-extent mapping.
Despite these promising advancements, several limitations hinder the full integration of ML into coastal flood frameworks. Data scarcity remains a major issue in certain regions, particularly where flood records are insufficient or inconsistent. While ML models thrive on large datasets, the absence of extensive training data can hinder model generalization and accuracy. Several recent studies have explored integrating multi-source data, including IoT-based flood sensors and crowd-sourced data, to improve real-time adaptability and prediction accuracy [89]. In some cases, synthetic datasets generated through numerical models help mitigate this issue, but discrepancies between simulated and real-world data can still affect prediction reliability. Moreover, the reliance on historical data may not always capture evolving climate change impacts, requiring adaptive learning models that can incorporate real-time updates. Additionally, model interpretability remains a persistent challenge, as deep learning models, such as CNNs and LSTM networks, function as black-box systems, making it difficult for policymakers and engineers to understand how predictions are made [25,76].
Overall, ML applications in coastal flooding have demonstrated substantial promise, particularly in flood extent mapping and susceptibility assessment. Continued progress will depend on improved data availability, expanded use of hybrid physics-informed frameworks, and stronger emphasis on model interpretability and generalization across diverse coastal environments.

4.3. Fundamental Scientific Challenges and Open Questions

Although ML has demonstrated strong predictive capabilities for coastal overtopping and flooding, the literature reveals several unresolved scientific challenges. A central issue is the limited physical interpretability of many ML architectures, which makes it difficult to understand whether learned patterns reflect true hydrodynamic relationships or simply statistical correlations within laboratory datasets. This raises questions about model robustness under unobserved or extreme conditions, particularly when applied beyond the parameter space of controlled experiments.
Another key uncertainty concerns the amount and quality of data required to effectively train ML models. The studies reviewed in this paper highlight that data quality often outweighs quantity, although expanding datasets can improve coverage and address information gaps in some cases. Model selection further influences this relationship, as certain algorithms exhibit stronger performance with smaller, high-quality datasets, while others benefit from larger training volumes. As illustrated in Figures 10–12, the performance of the models did not always correlate directly with dataset size. Some models achieved higher accuracy with smaller datasets, while others required larger volumes of data to reach optimal performance. This indicates that the interplay between data volume, quality, and model architecture remains complex and context-dependent, underscoring the importance of developing strategies that balance these factors to enhance predictive reliability.
Another challenge concerns data representativeness. Most available overtopping datasets originate from wave flume experiments, while flood datasets rely heavily on satellite imagery with varying spatial and temporal resolutions. The scarcity of high-quality field measurements complicates efforts to benchmark ML models against real-world processes. This leads to a broader theoretical question: to what extent can ML models trained on idealized or remote sensing datasets capture the full complexity of nearshore hydrodynamics?
An emerging trend in the literature is the combination of physics-based numerical models with ML approaches. Hybrid models seek to leverage the strengths of both methodologies, integrating the predictive power of ML with the established physical principles of hydrodynamic simulations [101]. These hybrid approaches have demonstrated improved accuracy in flood extent mapping and overtopping discharge estimation, addressing some of the limitations associated with standalone ML models. For instance, hybrid ML models incorporating numerical wave models have outperformed traditional physics-based models alone by incorporating real-time learning components [102]. However, while these approaches are emerging, the literature lacks systematic methodologies for integrating physical constraints directly into ML training pipelines. Developing frameworks that blend data-driven flexibility with physical interpretability remains an open research direction central to advancing ML applications in coastal engineering.
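One generic way to express the idea of embedding physical constraints in ML training is a composite loss that adds a penalty for physically implausible predictions to the usual data misfit. The formulation below is an illustrative sketch under stated assumptions and is not drawn from the reviewed studies; the weight \lambda, the collocation points \mathbf{x}_j, and the example residual are all hypothetical choices.

```latex
\mathcal{L}(\theta)
  = \underbrace{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{y}(\mathbf{x}_i;\theta) - y_i\bigr)^{2}}_{\text{data misfit}}
  \;+\; \lambda\,
    \underbrace{\frac{1}{M}\sum_{j=1}^{M} r(\mathbf{x}_j;\theta)^{2}}_{\text{physics penalty}},
\qquad
r(\mathbf{x};\theta) = \max\!\left(0,\ \frac{\partial \hat{q}(\mathbf{x};\theta)}{\partial R_c}\right)
```

Here \hat{y} is the model prediction with parameters \theta, y_i are the N observations, and r is a residual evaluated at M points that need not have observations. The example residual assumes that, all else being equal, the predicted overtopping discharge \hat{q} should not increase with crest freeboard R_c, so only violations of that trend are penalized. Choosing \lambda and the form of r remains problem-specific, which reflects the lack of systematic methodologies noted above.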

4.4. Coastal Engineering-Specific Future Prospects

Future research should prioritize the development of ML frameworks that reflect the unique physical and operational constraints of coastal engineering. A key priority is improving the availability and quality of field-scale overtopping and flooding measurements, which remain scarce due to logistical and safety challenges. ML models trained mainly on laboratory datasets or remote sensing imagery often struggle to generalize to full-scale, storm-driven conditions.
Another important direction is addressing the nonstationary nature of coastal hazards under climate change. Traditional ML models assume that the statistical relationships learned from historical data remain valid in the future. However, coastal forcing conditions are evolving: storm surges are becoming more frequent and intense, compound events involving surge–rainfall–river interactions are increasing, long-term morphological changes alter local hydrodynamics, and sea-level rise shifts baseline water levels and modifies wave transformation processes. These changes can gradually erode the validity of relationships learned from past datasets, leading to reduced predictive accuracy or systematic biases when models are applied to new conditions. At the same time, it is important to recognize that obtaining continuous, high-quality field measurements that capture this nonstationarity is extremely challenging in practice. Most overtopping datasets are laboratory-based, and coastal flood datasets often rely on remote sensing with limited temporal resolution. As a result, adaptive ML frameworks that update their parameters with new observations, while conceptually promising, must be designed to operate effectively even when available data are sparse, irregular, or indirect. Developing ML systems that can assimilate whatever real-time coastal information is available (e.g., tide gauges, wave buoys, or limited field surveys) therefore remains a key practical and scientific challenge for ensuring robustness under changing coastal hazard conditions.
In this regard, IoT-enabled flood sensors and real-time tide gauge networks can provide continuous updates to ML models, enabling adaptive learning and enhancing forecasting accuracy [103]. Additionally, leveraging crowd-sourced flood data and advanced satellite monitoring systems could further improve model performance in dynamic coastal environments [104,105]. Several recent studies (e.g., [106]) have demonstrated that combining IoT-based water level sensors with deep learning models can enable more effective flood management efforts.
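The notion of adaptive learning from continuous sensor updates can be illustrated with a model whose parameters are nudged after every new observation. The sketch below uses a linear model updated by stochastic gradient descent as a deliberately simple stand-in; the class name, learning rate, synthetic tide-to-water-level relation, and the imposed baseline shift are all assumptions, and operational systems would use far richer models and quality control.

```python
class OnlineLinearModel:
    """Linear model updated one observation at a time by gradient descent."""

    def __init__(self, n_features, learning_rate=0.05):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Take one gradient step on the squared error of a single observation."""
        error = self.predict(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error
        return error

model = OnlineLinearModel(n_features=1)

# Phase 1: hypothetical sensor stream where water level = 2.0 * tide + 1.0.
for step in range(5000):
    tide = (step % 10) / 10.0
    model.update([tide], 2.0 * tide + 1.0)
before_shift = model.predict([0.5])

# Phase 2: the baseline rises by 0.5 (e.g., a shifted mean water level).
for step in range(5000):
    tide = (step % 10) / 10.0
    model.update([tide], 2.0 * tide + 1.5)
after_shift = model.predict([0.5])
```

Because the model keeps learning from the stream, its prediction for the same tide level moves from the old relation to the new one after the baseline shift, without retraining from scratch. The learning rate controls the trade-off between tracking change quickly and being disturbed by noisy readings.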
Additionally, the use of federated learning approaches, where ML models are trained across decentralized data sources, has been proposed to address privacy concerns while leveraging multi-regional flood data [107,108]. Figure 13 summarizes the emerging research directions and future perspectives of ML applications in coastal flooding and overtopping prediction. The diagram highlights how multi-source data integration, hybrid physics-based and data-driven modeling, and XAI approaches are converging to form next-generation predictive frameworks.
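The core aggregation step in federated learning can be sketched as a sample-weighted average of locally trained parameters, in the style of federated averaging: only parameters leave each site, not the raw flood records. The function name, the two hypothetical regions, and their parameter vectors below are illustrative assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-site parameter vectors, weighting each site by its sample count."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client and at least one client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(weights[k] * size for weights, size in zip(client_weights, client_sizes)) / total
        for k in range(n_params)
    ]

# Hypothetical parameter vectors trained locally in two regions.
region_a = [1.0, 0.0]   # trained on 100 local flood records
region_b = [3.0, 2.0]   # trained on 300 local flood records

global_weights = federated_average([region_a, region_b], [100, 300])
```

In a full workflow, the averaged parameters would be sent back to each site for another round of local training, and the cycle repeated until the shared model stabilizes.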
Furthermore, coastal environments exhibit strong site-specificity due to variations in structural geometry, sediment characteristics, and nearshore bathymetry. ML models must be designed to incorporate spatial transferability or include physics-informed constraints that enforce hydrodynamic consistency across sites.
Finally, operational forecasting frameworks must balance model accuracy with computational efficiency. Coastal monitoring often requires near-real-time predictions using limited on-site hardware. Lightweight architectures, hybrid numerical–ML surrogates, and XAI approaches [97,109] tailored to engineering requirements will be critical to achieving practical, deployable systems. Efforts to improve transparency in ML-based flooding and overtopping prediction, using methods such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME) [110,111], will help to build trust among engineers and policymakers.
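SHAP attributions are grounded in Shapley values, which split the difference between a prediction and a reference prediction among the input features. For a handful of inputs, they can be computed exactly by enumerating feature subsets, as in the sketch below. The toy predictor, the single all-zero baseline, and the feature interpretation are assumptions for illustration; libraries such as SHAP rely on approximations and background datasets instead of brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline input."""
    n = len(x)

    def value(subset):
        # Features in the subset take their actual value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                contributions[i] += weight * (value(members | {i}) - value(members))
    return contributions

# Toy predictor with one additive input and one interacting pair (illustrative).
# Inputs could stand for wave height, storm surge, and a structural parameter.
def toy_model(v):
    return 2.0 * v[0] + v[1] * v[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The additive input receives its full effect, the interaction term is shared equally between the two interacting inputs, and the contributions sum to the difference between the prediction at x and at the baseline. The cost grows exponentially with the number of inputs, which is why approximate methods are used in practice.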
In sum, ML has shown the potential to be a transformative tool in coastal flooding and overtopping prediction, offering innovative solutions to traditional modeling challenges. The growing integration of hybrid modeling, real-time adaptive learning, and explainable AI techniques is paving the way for more resilient and effective coastal management strategies. While challenges related to data availability, interpretability, and computational efficiency persist, ongoing research into multi-source data integration and intelligent monitoring frameworks is helping bridge these gaps. Future studies should continue exploring these directions to enhance disaster preparedness and mitigate the impacts of climate-induced flooding and overtopping.

5. Conclusions

This review highlights the significant role of machine learning (ML) in predicting coastal flooding and overtopping. The increasing severity of climate change-induced hazards has driven the need for more accurate and efficient predictive models, and ML has emerged as a powerful alternative to traditional numerical and empirical approaches. The reviewed studies show that ML models, particularly artificial neural networks (ANNs), convolutional neural networks (CNNs), support vector machines (SVMs), extreme gradient boosting decision trees (XGB), and random forests (RFs), have greatly improved the ability to forecast extreme coastal events.
A key advantage of ML models is their ability to integrate large-scale and diverse datasets, such as remote sensing data, numerical model outputs, and tide gauge measurements, to produce high-resolution predictions. Preprocessing techniques, including feature selection, normalization, and augmentation, play a crucial role in enhancing model accuracy and reducing computational costs. The development of hybrid ML models that combine physics-based numerical simulations with data-driven learning has further enhanced predictive performance, addressing some of the limitations associated with traditional standalone models. Additionally, the incorporation of IoT-based real-time monitoring systems and explainable AI (XAI) techniques presents promising opportunities for advancing coastal hazard prediction and management, making ML models more interpretable and usable for policymakers and engineers.
An important distinction between ML-based systems and traditional numerical models lies in their response to nonstationary coastal conditions. Traditional models are grounded in physical equations calibrated under assumptions of stationarity, meaning that their predictive skill may gradually weaken as sea-level rise, changing storm regimes, and long-term morphological adjustments alter the underlying coastal dynamics. In contrast, ML models, particularly those connected to real-time observational networks or adaptive learning workflows, can be continually updated as new data becomes available. This capacity for incremental learning offers a pathway for ML approaches to remain aligned with evolving climate-driven conditions. Nevertheless, such adaptability requires sustained data streams and careful retraining strategies to avoid propagating bias or noise, underscoring the need for integrated frameworks that combine physical understanding with data-driven flexibility.
Despite these advancements, several unresolved scientific challenges remain. One of the most pressing issues is the limited physical interpretability of many ML models. Because coastal flooding and overtopping are governed by complex hydrodynamic processes, it is often difficult to determine whether ML models are learning meaningful physical relationships or simply capturing patterns specific to laboratory or remote sensing datasets. This raises fundamental questions about model robustness under extreme or previously unseen conditions. Another challenge concerns data representativeness: overtopping datasets are predominantly derived from laboratory experiments, while flood datasets rely heavily on satellite imagery with variable spatial and temporal resolution. These limitations complicate model generalization and underscore the need for more field-scale measurements.
The relationship between dataset size, quality, and model architecture also remains complex. As illustrated across the reviewed studies, larger datasets do not always yield superior performance, and smaller high-quality datasets can, in some cases, produce more reliable predictions. This suggests that future research should prioritize strategies that improve data quality, enhance feature engineering, and better balance data volume with model requirements. Hybrid approaches integrating physical constraints within ML training are emerging as a promising pathway, but still lack systematic frameworks for coastal applications.
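One possible way to integrate a physical constraint into ML training, sketched here under stated assumptions, is to add a penalty term to the data-misfit loss. The example below assumes, purely for illustration, that predicted mean overtopping discharge should not increase as crest freeboard increases (all else being equal) and penalizes predictions that violate this. The function names, the penalty weight, and the toy data are hypothetical and do not represent a framework from the reviewed literature.

```python
import math


def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def monotonicity_penalty(model, freeboards):
    """Mean squared amount by which predictions rise as freeboard increases."""
    predictions = [model(rc) for rc in sorted(freeboards)]
    rises = [max(0.0, nxt - cur) for cur, nxt in zip(predictions, predictions[1:])]
    return sum(r ** 2 for r in rises) / max(1, len(rises))


def physics_guided_loss(model, freeboards, targets, weight=1.0):
    """Data misfit plus a weighted penalty for violating the assumed constraint."""
    data_term = mean_squared_error([model(rc) for rc in freeboards], targets)
    return data_term + weight * monotonicity_penalty(model, freeboards)


# Hypothetical toy data: discharge decaying exponentially with freeboard.
freeboards = [0.5, 1.0, 1.5, 2.0]
targets = [math.exp(-2.0 * rc) for rc in freeboards]


def consistent(rc):
    return math.exp(-2.0 * rc)


def inconsistent(rc):
    return 0.1 * rc  # rises with freeboard, violating the assumed constraint


loss_consistent = physics_guided_loss(consistent, freeboards, targets)
loss_unpenalized = physics_guided_loss(inconsistent, freeboards, targets, weight=0.0)
loss_penalized = physics_guided_loss(inconsistent, freeboards, targets, weight=10.0)
```

The penalty weight controls how strongly the constraint competes with the data term; choosing it, and deciding which constraints are defensible for a given structure type, is part of what a systematic framework would need to specify.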
This review contributes new value to the coastal engineering literature by explicitly synthesizing ML applications across both coastal overtopping and coastal flooding, two processes that are often treated separately in existing reviews. Unlike earlier ML flood surveys that focus predominantly on general flood mapping, this work emphasizes coastal-specific forcing mechanisms, data limitations, and operational constraints, while also clarifying which inland ML approaches can be meaningfully transferred to coastal settings. By framing the discussion around coastal engineering needs rather than generic ML performance, the review offers clearer guidance for future methodological development and real-world implementation.
If the challenges highlighted in this paper are addressed and recent advancements are embraced, ML can move beyond proof-of-concept studies toward robust, operational tools. In doing so, it has the potential to significantly strengthen coastal risk management and contribute to the development of safer, more resilient coastal communities in the face of accelerating climate-driven change.

Author Contributions

M.L.D.: Conceptualization, methodology, validation, formal analysis, investigation, software, visualization, data curation, writing—original draft preparation/review and editing; V.R.: Conceptualization, methodology, validation, writing—review and editing, supervision, resources. P.R.-S. and F.T.-P.: Conceptualization, methodology, validation, writing—review and editing, supervision, resources, funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ph.D. scholarship grant from the Portuguese Foundation of Science and Technology (FCT), with reference 2024.02869.BD. Furthermore, during this research study, Victor Ramos was supported by the program of Stimulus of Scientific Employment Individual Support (CEECIND/03665/2018) from the Portuguese Foundation of Science and Technology (FCT).

Data Availability Statement

Data sharing is not applicable.

Acknowledgments

The authors would like to thank the University of Porto as the hosting institution.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AB: Adaptive Boost and Bagging
AdaBoost: Adaptive Boost
Ac: Accuracy
AHP: Analytical Hierarchy Process
AI: Artificial Intelligence
ANN: Artificial Neural Network
AP: Average Precision
ARD: Analysis-Ready Data
AUC: Area Under the Curve
BPNN: Backpropagation Neural Network
BRT: Boosted Regression Tree
CCNN: Cascade Correlation Neural Networks
CDs: Coastal Defense Structures
CEMSRM: Copernicus Emergency Management Service Rapid Mapping
cGAN: Conditional Generative Adversarial Network
CNNs: Convolutional Neural Networks
cRMSE: Centered Root-Mean-Squared Error
DDMMS: Department of Disaster Management, Maharashtra State
DEM: Digital Elevation Models
DN: Data Normalization
DT: Decision Tree
FEMA: Federal Emergency Management Agency
FSL: Froude’s Similarity Law scaling
GAN: Generative Adversarial Network
GBDTs: Gradient Boosting Decision Trees
GBM: Gradient Boosting Machine
GE: Google Earth
GIS: Geographic Information System
GPR: Gaussian Process Regression
GRNN: General Regression Neural Networks
HCFCD: Harris County Flood Control District
HRSD: Hampton Roads Sanitation District
IoT: Internet of Things
IoU: Intersection over Union
KNN: K-Nearest Neighbors
KNNIM: K-Nearest Neighbor Imputation Method
LB: LogitBoost
LIME: Local Interpretable Model-Agnostic Explanations
LR: Linear Regression
LSTM: Long Short-Term Memory
LUP: Land Use Planning
MAE: Mean Absolute Error
ML: Machine Learning
MDA: Maximum Dissimilarity Algorithm
MLP: Multilayer Perceptron
MOD: Mean Overtopping Discharge
MPNN: Multilayer Perceptron Neural Networks
MSE: Mean Squared Error
NB: Naïve Bayes
NBSS: National Bureau of Soil Survey
NLCD: National Land Cover Database
NOAA: National Oceanic and Atmospheric Administration
OAc: Overall Accuracy
OSM: OpenStreetMap
PCA: Principal Component Analysis
PI: Permutation Importance
PR: Polynomial Regression
Pr: Precision
R²: Coefficient of Determination
RANS: Reynolds-Averaged Navier–Stokes
RFE: Recursive Feature Elimination
ResNet: Residual Neural Network
RFs: Random Forests
RMSE: Root Mean Squared Error
ROC: Receiver Operating Characteristic curve
RTF: Rotation Tree
SAR: Synthetic Aperture Radar
Sc: Scopus
SD: ScienceDirect
SHAP: SHapley Additive exPlanation
SOM: Self-Organized Maps
SPNWH: Structure Parameter Normalized by Wave Height
SRTM: Shuttle Radar Topography Mission
STORM: System to Track, Organize, Record, and Map
SVM: Support Vector Machine
SVR: Support Vector Regression
UAV: Unmanned Aerial Vehicles
UNOSAT: United Nations Satellite Centre
USGS: United States Geological Survey
VDSMGI: Vietnam Department of Survey, Mapping, and Geographic Information
WFR: Weight Factor Reduction
WoS: Web of Science
WRM: Wrapper Reduction Method
XAI: eXplainable Artificial Intelligence
XBNH: XBeach’s Non-Hydrostatic Mode
XBSB: XBeach’s Surfbeat Mode
XGB: eXtreme Gradient Boosting
XT: Extreme Randomized Tree

References

  1. Xie, D.; Zou, Q.-P.; Mignone, A.; MacRae, J.D. Coastal Flooding from Wave Overtopping and Sea Level Rise Adaptation in the Northeastern USA. Coast. Eng. 2019, 150, 39–58. [Google Scholar] [CrossRef]
  2. Qiang, Y.; He, J.; Xiao, T.; Lu, W.; Li, J.; Zhang, L. Coastal Town Flooding Upon Compound Rainfall–Wave Overtopping–Storm Surge During Extreme Tropical Cyclones in Hong Kong. J. Hydrol. Reg. Stud. 2021, 37, 100890. [Google Scholar] [CrossRef]
  3. Carneiro-Barros, J.E.; Plomaritis, T.A.; Fazeres-Ferradosa, T.; Rosa-Santos, P.; Taveira-Pinto, F. Coastal Flood Mapping with Two Approaches Based on Observations at Furadouro, Northern Portugal. Remote Sens. 2023, 15, 5215. [Google Scholar] [CrossRef]
  4. Kirezci, E.; Young, I.R.; Ranasinghe, R.; Muis, S.; Nicholls, R.J.; Lincke, D.; Hinkel, J. Projections of Global-Scale Extreme Sea Levels and Resulting Episodic Coastal Flooding Over the 21st Century. Sci. Rep. 2020, 10, 11629. [Google Scholar] [CrossRef]
  5. Symonds, G.; Huntley, D.A.; Bowen, A.J. Two-Dimensional Surf Beat: Long Wave Generation by a Time-Varying Breakpoint. J. Geophys. Res. Ocean. 1982, 87, 492–498. [Google Scholar] [CrossRef]
  6. De Bakker, A.T.M.; Tissier, M.F.S.; Ruessink, B.G. Shoreline Dissipation of Infragravity Waves. Cont. Shelf Res. 2014, 72, 73–82. [Google Scholar] [CrossRef]
  7. Dodet, G.; Melet, A.; Ardhuin, F.; Bertin, X.; Idier, D.; Almar, R. The Contribution of Wind-Generated Waves to Coastal Sea-Level Changes. Surv. Geophys. 2019, 40, 1563–1601. [Google Scholar] [CrossRef]
  8. Central Water Commission. Status Report on Coastal Protection & Development in India; Government of India: New Delhi, India, 2016.
  9. IPCC. Technical Summary. In IPCC Special Report on the Ocean and Cryosphere in a Changing Climate; Pörtner, H.-O., Roberts, D.C., Masson-Delmotte, V., Zhai, P., Poloczanska, E., Mintenbeck, K., Tignor, M., Alegría, A., Nicolai, M., Okem, A., et al., Eds.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2019; pp. 39–69. [Google Scholar] [CrossRef]
  10. Koosheh, A.; Etemad-Shahidi, A.; Cartwright, N.; Tomlinson, R.; van Gent, M.R.A. Individual Wave Overtopping at Coastal Structures: A Critical Review and the Existing Challenges. Appl. Ocean Res. 2021, 106, 102476. [Google Scholar] [CrossRef]
  11. van der Meer, J.; Pullen, T.; Allsop, W.; Bruce, T.; Schüttrumpf, H.; Kortenhaus, A. Prediction of Overtopping. In Handbook of Coastal and Ocean Engineering; Kim, Y.C., Ed.; World Scientific: Singapore, 2009; pp. 341–382. [Google Scholar] [CrossRef]
  12. Geeraerts, J.; Troch, P.; De Rouck, J.; Verhaeghe, H.; Bouma, J.J. Wave Overtopping at Coastal Structures: Prediction Tools and Related Hazard Analysis. J. Clean. Prod. 2007, 15, 1514–1521. [Google Scholar] [CrossRef]
  13. Contardo, S.; Lowe, R.J.; Hansen, J.E.; Rijnsdorp, D.P.; Dufois, F.; Symonds, G. Free and Forced Components of Shoaling Long Waves in the Absence of Short-Wave Breaking. J. Phys. Oceanogr. 2021, 51, 1465–1487. [Google Scholar] [CrossRef]
  14. Lowe, R.J.; Altomare, C.; Buckley, M.L.; da Silva, R.F.; Hansen, J.E.; Rijnsdorp, D.P.; Domínguez, J.M.; Crespo, A.J.C. Smoothed Particle Hydrodynamics Simulations of Reef Surf Zone Processes Driven by Plunging Irregular Waves. Ocean Model. 2022, 171, 101945. [Google Scholar] [CrossRef]
  15. EurOtop. Manual on Wave Overtopping of Sea Defences and Related Structures; van der Meer, J.W., Allsop, N.W.H., Bruce, T., de Rouck, J., Kortenhaus, A., Pullen, T., Schüttrumpf, H., Troch, P., Zanuttigh, B., Eds.; EurOtop Consortium: Delft, The Netherlands, 2018; Available online: www.overtopping-manual.com (accessed on 20 October 2025).
  16. Pullen, T.; Bruce, T.; Simm, J.; Allsop, W.; Kortenhaus, A. EurOtop: Manual on Wave Overtopping of Sea Defences and Related Structures—An Overtopping Manual Largely Based on European Research, but for Worldwide Application, 2nd ed.; Environment Agency: Bristol, UK, 2019.
  17. Den Bieman, J.P.; de Ridder, M.P.; van Gent, M.R.A. Deep Learning Video Analysis as Measurement Technique in Physical Models. Coast. Eng. 2020, 158, 103689. [Google Scholar] [CrossRef]
  18. Asiri, M.M.; Aldehim, G.; Alruwais, N.; Allafi, R.; Alzahrani, I.; Nouri, A.M.; Assiri, M.; Ahmed, N.A. Coastal Flood Risk Assessment Using Ensemble Multi-Criteria Decision-Making with Machine Learning Approaches. Environ. Res. 2024, 245, 118042. [Google Scholar] [CrossRef] [PubMed]
  19. Mosavi, A.; Ozturk, P.; Chau, K. Flood Prediction Using Machine Learning Models: Literature Review. Water 2018, 10, 1536. [Google Scholar] [CrossRef]
  20. Alshahri, A.H.; Elbisy, M.S. Assessment of Using Artificial Neural Network and Support Vector Machine Techniques for Predicting Wave-Overtopping Discharges at Coastal Structures. J. Mar. Sci. Eng. 2023, 11, 539. [Google Scholar] [CrossRef]
  21. Elbisy, M.S. Estimation of Wave Overtopping Discharges at Coastal Structures with Combined Slopes Using Machine Learning Techniques. Eng. Technol. Appl. Sci. Res. 2024, 14, 14033–14038. [Google Scholar] [CrossRef]
  22. Willard, J.; Jia, X.; Xu, S.; Steinbach, M.; Kumar, V. Integrating Physics-Based Modeling with Machine Learning: A Survey. arXiv 2020, arXiv:2003.04919. [Google Scholar]
  23. Abouhalima, M.; das Neves, L.; Taveira-Pinto, F.; Rosa-Santos, P. Machine Learning in Coastal Engineering: Applications, Challenges, and Perspectives. J. Mar. Sci. Eng. 2024, 12, 638. [Google Scholar] [CrossRef]
  24. Babati, A.H.; Isa, Z.; Abdussalam, A.F.; Baba, S.U.; Mustapha, B.B.; Musa, A.S. Application of Machine Learning for Coastal Flooding. Discov. Cities 2025, 2, 80. [Google Scholar] [CrossRef]
  25. Rudin, C. Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
  26. Qin, Y.; Su, C.; Chu, D.; Zhang, J.; Song, J. A Review of Application of Machine Learning in Storm Surge Problems. J. Mar. Sci. Eng. 2023, 11, 1729. [Google Scholar] [CrossRef]
  27. Goldstein, E.B.; Coco, G.; Plant, N.G. A Review of Machine Learning Applications to Coastal Sediment Transport and Morphodynamics. Earth-Sci. Rev. 2019, 194, 97–108. [Google Scholar] [CrossRef]
  28. Clemente, D.; Teixeira-Duarte, F.; Rosa-Santos, P.; Taveira-Pinto, F. Advancements on Optimization Algorithms Applied to Wave Energy Assessment: An Overview on Wave Climate and Energy Resource. Energies 2023, 16, 4660. [Google Scholar] [CrossRef]
  29. Habib, M.A.; O’Sullivan, J.J.; Salauddin, M. Prediction of Wave Overtopping Characteristics at Coastal Flood Defences Using Machine Learning Algorithms: A Systematic Review. IOP Conf. Ser. Earth Environ. Sci. 2022, 1072, 012003. [Google Scholar] [CrossRef]
  30. Emmanuel, I.A.; Ojewumi, M.; Huang, W. A Comprehensive Review of AI and ML Applications in Coastal Engineering. Preprints 2025, 2025010600. [Google Scholar] [CrossRef]
  31. Garzon, J.L.; Ferreira, O.; Plomaritis, T.A.; Zózimo, A.C.; Fortes, C.J.E.M.; Pinheiro, L.V. Development of a Bayesian Network-Based Early Warning System for Storm-Driven Coastal Erosion. Coast. Eng. 2024, 189, 104460. [Google Scholar] [CrossRef]
  32. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R, 2nd ed.; Springer: New York, NY, USA, 2021. [Google Scholar] [CrossRef]
  33. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  34. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  35. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  36. Schmidhuber, J. Deep Learning in Neural Networks: An Overview. Neural Netw. 2015, 61, 85–117. [Google Scholar] [CrossRef]
  37. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  38. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  39. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  40. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  41. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3146–3154. [Google Scholar]
  42. Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and Tuning Strategies for Random Forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef]
  43. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  44. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  45. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  46. Cutler, D.R.; Edwards, T.C., Jr.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random Forests for Classification in Ecology. Ecology 2007, 88, 2783–2792. [Google Scholar] [CrossRef] [PubMed]
  47. Biau, G.; Scornet, E. A Random Forest Guided Tour. Test 2016, 25, 197–227. [Google Scholar] [CrossRef]
  48. Den Bieman, J.P.; van Gent, M.R.A.; van den Boogaard, H.F.P. Wave Overtopping Predictions Using an Advanced Machine Learning Technique. Coast. Eng. 2021, 166, 103830. [Google Scholar] [CrossRef]
  49. Kim, T.; Lee, W.-D. Review on Applications of Machine Learning in Coastal and Ocean Engineering. J. Ocean Eng. Technol. 2022, 36, 194–210. [Google Scholar] [CrossRef]
  50. Habib, M.A.; O’Sullivan, J.J.; Abolfathi, S.; Salauddin, M. Enhanced Wave Overtopping Simulation at Vertical Breakwaters Using Machine Learning Algorithms. PLoS ONE 2023, 18, e0289318. [Google Scholar] [CrossRef]
  51. van Gent, M.R.A.; van den Boogaard, H.F.P.; Pozueta, B.; Medina, J.R. Neural Network Modelling of Wave Overtopping at Coastal Structures. Coast. Eng. 2007, 54, 586–593. [Google Scholar] [CrossRef]
  52. Steendam, G.J.; van der Meer, J.W.; Verhaeghe, H.; Besley, P.; Franco, L.; van Gent, M.R.A. The International Database on Wave Overtopping. Coast. Eng. 2004, 1, 4301–4313. [Google Scholar] [CrossRef]
  53. den Bieman, J.P.; Wilms, J.M.; van den Boogaard, H.F.P.; van Gent, M.R.A. Prediction of Mean Wave Overtopping Discharge Using Gradient Boosting Decision Trees. Water 2020, 12, 1703. [Google Scholar] [CrossRef]
  54. Hosseinzadeh, S.; Etemad-Shahidi, A.; Koosheh, A. Prediction of Mean Wave Overtopping at Simple Sloped Breakwaters Using Kernel-Based Methods. J. Hydroinform. 2021, 23, 1030–1049. [Google Scholar] [CrossRef]
  55. Jafari, E.; Etemad-Shahidi, A. Derivation of a New Model for Prediction of Wave Overtopping at Rubble Mound Structures. J. Waterw. Port Coast. Ocean. Eng. 2012, 138, 42–52. [Google Scholar] [CrossRef]
  56. Formentin, S.M.; Zanuttigh, B.; van der Meer, J.W. A Neural Network Tool for Predicting Wave Reflection, Overtopping and Transmission. Coast. Eng. J. 2017, 59, 1750006-1–1750006-31. [Google Scholar] [CrossRef]
  57. Kim, T.; Lee, W.-D. Prediction of Wave Overtopping Discharges at Coastal Structures Using Interpretable Machine Learning. Coast. Eng. J. 2023, 65, 433–449. [Google Scholar] [CrossRef]
  58. Tsai, Y.-T.; Tsai, C.-P. Predictions of Wave Overtopping Using Deep Learning Neural Networks. J. Mar. Sci. Eng. 2023, 11, 1925. [Google Scholar] [CrossRef]
  59. Alvarellos, A.; Figuero, A.; Rodríguez-Yáñez, S.; Sande, J.; Peña, E.; Rosa-Santos, P.; Rabuñal, J. Deep Learning-Based Wave Overtopping Prediction. Appl. Sci. 2024, 14, 2611. [Google Scholar] [CrossRef]
  60. Carro, H.; Sande, J.; Figuero, A.; Alvarellos, A.; Peña, E.; Rabuñal, J.; Guerra, A.; Pérez, J.D. Machine Learning Tool for Wave Overtopping Prediction Based on the Safety-Operability Ratio. Ocean Eng. 2024, 312, 119006. [Google Scholar] [CrossRef]
  61. Technical Advisory Committee for Flood Defence in The Netherlands (TAW). Wave Run-Up and Wave Overtopping at Dikes; Technical Report; TAW: Delft, The Netherlands, 2002. [Google Scholar]
  62. Lee, S.B.; Suh, K.-D. Development of Wave Overtopping Formulas for Inclined Seawalls Using GMDH Algorithm. KSCE J. Civ. Eng. 2019, 23, 1899–1910. [Google Scholar] [CrossRef]
  63. Etemad-Shahidi, A.; Shaeri, S.; Jafari, E. Prediction of Wave Overtopping at Vertical Structures. Coast. Eng. 2016, 109, 42–52. [Google Scholar] [CrossRef]
  64. Zanuttigh, B.; Formentin, S.M.; van der Meer, J.W. Prediction of Extreme and Tolerable Wave Overtopping Discharges Through an Advanced Neural Network. Ocean Eng. 2016, 127, 7–22. [Google Scholar] [CrossRef]
  65. Elbisy, M.S. Machine Learning Techniques for Estimating Wave-Overtopping Discharges at Coastal Structures. Ocean Eng. 2023, 273, 113972. [Google Scholar] [CrossRef]
  66. Oliver, J.M.; Esteban, M.D.; López-Gutiérrez, J.-S.; Negro, V.; Neves, M.G. Optimizing Wave Overtopping Energy Converters by ANN Modelling: Evaluating the Overtopping Rate Forecasting as the First Step. Sustainability 2021, 13, 1483. [Google Scholar] [CrossRef]
  67. Verhaeghe, H.; De Rouck, J.; van der Meer, J. Combined Classifier–Quantifier Model: A Two-Phase Neural Model for Prediction of Wave Overtopping at Coastal Structures. Coast. Eng. 2008, 55, 357–374. [Google Scholar] [CrossRef]
  68. Zanuttigh, B.; Formentin, S.M.; van der Meer, J.W. Advances in Modelling Wave-Structure Interaction. Coast. Eng. 2014, 34, 1–14. [Google Scholar] [CrossRef]
  69. Mares-Nasarre, P.; Molines, J.; Gómez-Martín, M.E.; Medina, J.R. Explicit Neural Network-Derived Formula for Overtopping Flow on Mound Breakwaters in Depth-Limited Breaking Wave Conditions. Coast. Eng. 2021, 164, 103810. [Google Scholar] [CrossRef]
  70. Qin, J.; Gao, L.; Lin, K.; Shen, P. A Novel and Efficient Method for Real-Time Simulating Spatial and Temporal Evolution of Coastal Urban Pluvial Flood Without Drainage Network. Environ. Model. Softw. 2024, 172, 105888. [Google Scholar] [CrossRef]
  71. Liu, B.; Li, X.; Zheng, G. Coastal Inundation Mapping From Bitemporal and Dual-Polarization SAR Imagery Based on Deep Convolutional Neural Networks. J. Geophys. Res. Ocean. 2019, 124, 9101–9113. [Google Scholar] [CrossRef]
  72. Peng, B.; Meng, Z.; Huang, Q.; Wang, C. Patch Similarity Convolutional Neural Network for Urban Flood Extent Mapping Using Bi-Temporal Satellite Multispectral Imagery. Remote Sens. 2019, 11, 2492. [Google Scholar] [CrossRef]
  73. Hasan, M.H.; Ahmed, A.; Nafee, K.M.; Hossen, M.A. Use of Machine Learning Algorithms to Assess Flood Susceptibility in the Coastal Area of Bangladesh. Ocean Coast. Manag. 2023, 236, 106503. [Google Scholar] [CrossRef]
  74. Zahura, F.T.; Goodall, J.L. Predicting Combined Tidal and Pluvial Flood Inundation Using a Machine Learning Surrogate Model. J. Hydrol. Reg. Stud. 2022, 41, 101087. [Google Scholar] [CrossRef]
  75. Saravanan, S.; Abijith, D. Flood Susceptibility Mapping of Northeast Coastal Districts of Tamil Nadu, India Using Multi-Source Geospatial Data and Machine Learning Techniques. Geocarto Int. 2022, 37, 15252–15281. [Google Scholar] [CrossRef]
  76. Samek, W.; Wiegand, T.; Müller, K.R. Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. arXiv 2017, arXiv:1708.08296. [Google Scholar] [CrossRef]
  77. Naeini, S.S.; Snaiki, R. A Physics-Informed Machine Learning Model for Time-Dependent Wave Runup Prediction. Ocean Eng. 2024, 295, 116986. [Google Scholar] [CrossRef]
  78. Vu, V.T.; Nguyen, H.D.; Vu, P.L.; Ha, M.C.; Bui, V.D.; Nguyen, T.O.; Hoang, V.H.; Nguyen, T.K.H. Predicting Land Use Effects on Flood Susceptibility Using Machine Learning and Remote Sensing in Coastal Vietnam. Water Pract. Technol. 2023, 18, 1543–1555. [Google Scholar] [CrossRef]
  79. Prasad, P.; Loveson, V.J.; Das, B.; Kotha, M. Novel Ensemble Machine Learning Models in Flood Susceptibility Mapping. Geocarto Int. 2021, 37, 4571–4593. [Google Scholar] [CrossRef]
  80. Shastry, A.; Carter, E.; Coltin, B.; Sleeter, R.; McMichael, S.; Eggleston, J. Mapping Floods from Remote Sensing Data and Quantifying the Effects of Surface Obstruction by Clouds and Vegetation. Remote Sens. Environ. 2023, 291, 113556. [Google Scholar] [CrossRef]
  81. Nemni, E.; Bullock, J.; Belabbes, S.; Bromley, L. Fully Convolutional Neural Network for Rapid Flood Segmentation in Synthetic Aperture Radar Imagery. Remote Sens. 2020, 12, 2532. [Google Scholar] [CrossRef]
  82. Kang, W.; Xiang, Y.; Wang, F.; Wan, L.; You, H. Flood Detection in Gaofen-3 SAR Images via Fully Convolutional Networks. Sensors 2018, 18, 2915. [Google Scholar] [CrossRef]
  83. Sarker, C.; Mejias, L.; Maire, F.; Woodley, A. Flood Mapping with Convolutional Neural Networks Using Spatio-Contextual Pixel Information. Remote Sens. 2019, 11, 2331. [Google Scholar] [CrossRef]
  84. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  85. Li, J.; Wu, G.; Zhang, Y.; Shi, W. Optimizing Flood Predictions by Integrating LSTM and Physical-Based Models with Mixed Historical and Simulated Data. Heliyon 2024, 10, e33669. [Google Scholar] [CrossRef] [PubMed]
  86. Jolliffe, I.T.; Cadima, J. Principal Component Analysis: A Review and Recent Developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef]
  87. Hidayah, E.; Indarto; Lee, W.K.; Halik, G.; Pradhan, B. Assessing Coastal Flood Susceptibility in East Java, Indonesia: Comparison of Statistical Bivariate and Machine Learning Techniques. Water 2022, 14, 3869. [Google Scholar] [CrossRef]
  88. Dong, S.; Yu, T.; Farahmand, H.; Mostafavi, A. A Hybrid Deep Learning Model for Predictive Flood Warning and Situation Awareness Using Channel Network Sensors Data. Comput.-Aided Civ. Infrastruct. Eng. 2021, 36, 402–420. [Google Scholar] [CrossRef]
  89. William, P.; Oyebode, O.J.; Ramu, G.; Lakhanpal, S.; Gupta, K.K.; Al-Jawahry, H.M. Framework for IoT-Based Real-Time Monitoring System of Rainfall Water Level for Flood Prediction Using LSTM Network. In Proceedings of the 2023 3rd International Conference on Pervasive Computing and Social Networking (ICPCSN), Salem, India, 7–8 June 2023; pp. 1321–1326. [Google Scholar]
  90. Gunning, D.; Aha, D.W. DARPA’s Explainable Artificial Intelligence (XAI) Program. AI Mag. 2019, 40, 44–58. [Google Scholar] [CrossRef]
  91. Salatin, R.; Chen, Q.; Raubenheimer, B.; Elgar, S.; Gorrell, L.; Li, X. A New Framework for Quantifying Alongshore Variability of Swash Motion Using Fully Convolutional Networks. Coast. Eng. 2024, 192, 104542. [Google Scholar] [CrossRef]
  92. Hou, J.; Li, X.; Bai, G.; Wang, X.; Zhang, Z.; Yang, L.; Du, Y.; Ma, Y.; Fu, D.; Zhang, X. A Deep Learning Technique-Based Flood Propagation Experiment. J. Flood Risk Manag. 2021, 14, e12718. [Google Scholar] [CrossRef]
  93. Gebrehiwot, A.; Hashemi-Beni, L.; Thompson, G.; Kordjamshidi, P.; Langan, T.E. Deep Convolutional Neural Network for Flood Extent Mapping Using Unmanned Aerial Vehicles Data. Sensors 2019, 19, 1486. [Google Scholar] [CrossRef]
  94. Ichim, L.; Popescu, D. Segmentation of Vegetation and Flood from Aerial Images Based on Decision Fusion of Neural Networks. Remote Sens. 2020, 12, 2490. [Google Scholar] [CrossRef]
  95. Muñoz, D.F.; Muñoz, P.; Moftakhari, H.; Moradkhani, H. From Local to Regional Compound Flood Mapping with Deep Learning and Data Fusion Techniques. Sci. Total Environ. 2021, 782, 146927. [Google Scholar] [CrossRef]
  96. Di Bacco, M.; Contento, A.; Scorzini, A.R. Exploring the Compound Nature of Coastal Flooding by Tropical Cyclones: A Machine Learning Framework. J. Hydrol. 2024, 645, 132262. [Google Scholar] [CrossRef]
  97. Nhangumbe, M.; Nascetti, A.; Georganos, S.; Ban, Y. Supervised and Unsupervised Machine Learning Approaches Using Sentinel Data for Flood Mapping and Damage Assessment in Mozambique. Remote Sens. Appl. Soc. Environ. 2023, 32, 101015. [Google Scholar] [CrossRef]
  98. Dodangeh, E.; Choubin, B.; Eigdir, A.N.; Nabipour, N.; Panahi, M.; Shamshirband, S.; Mosavi, A. Integrated Machine Learning Methods with Resampling Algorithms for Flood Susceptibility Prediction. Sci. Total Environ. 2020, 705, 135983. [Google Scholar] [CrossRef]
  99. Ramayanti, S.; Nur, A.S.; Syifa, M.; Panahi, M.; Achmad, A.R.; Park, S.; Lee, C.W. Performance Comparison of Two Deep Learning Models for Flood Susceptibility Map in Beira Area, Mozambique. Egypt. J. Remote Sens. Space Sci. 2022, 25, 1025–1036. [Google Scholar] [CrossRef]
  100. Sanders, W.; Li, D.; Li, W.; Fang, Z.N. Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages. Water 2022, 14, 747. [Google Scholar] [CrossRef]
  101. Tu, H.; Moura, S.; Wang, Y.; Fang, H. Integrating Physics-Based Modeling with Machine Learning for Lithium-Ion Batteries. Appl. Energy 2023, 329, 120289. [Google Scholar] [CrossRef]
  102. Khosravi, K.; Ali, M.; Heddam, S. Near Real-Time Significant Wave Height Prediction Along the Coastline of Queensland Using Advanced Hybrid Machine Learning Models. Int. J. Environ. Sci. Technol. 2024, 22, 5309–5326. [Google Scholar] [CrossRef]
  103. Chang, C.-H.; Rahmad, R.; Wu, S.-J.; Hsu, C.-T.; Chung, P.-H. Enhancing Flood Verification Using Signal Detection Theory (SDT) and IoT Sensors: A Spatial Scale Evaluation. J. Hydrol. 2024, 636, 131308. [Google Scholar] [CrossRef]
  104. Helmrich, A.M.; Ruddell, B.L.; Bessem, K.; Chester, M.V.; Chohan, N.; Doerry, E.; Eppinger, J.; Garcia, M.; Goodall, J.L.; Lowry, C.; et al. Opportunities for Crowdsourcing in Urban Flood Monitoring. Environ. Model. Softw. 2021, 143, 105124. [Google Scholar] [CrossRef]
  105. Vitousek, S.; Buscombe, D.; Vos, K.; Barnard, P.L.; Ritchie, A.C.; Warrick, J.A. The Future of Coastal Monitoring Through Satellite Remote Sensing. Camb. Prism. Coast. Futures 2023, 1, e10. [Google Scholar] [CrossRef]
  106. Hashemi-Beni, L.; Puthenparampil, M.; Jamali, A. A Low-Cost IoT-Based Deep Learning Method of Water Gauge Measurement for Flood Monitoring. Geomat. Nat. Hazards Risk 2024, 15, 2364777. [Google Scholar] [CrossRef]
  107. Farooq, M.S.; Tehseen, R.; Qureshi, J.N.; Omer, U.; Yaqoob, R.; Tanweer, H.A.; Atal, Z. FFM: Flood Forecasting Model Using Federated Learning. IEEE Access 2023, 11, 24472–24483. [Google Scholar] [CrossRef]
  108. Qi, P.; Chiaro, D.; Guzzo, A.; Ianni, M.; Fortino, G.; Piccialli, F. Model Aggregation Techniques in Federated Learning: A Comprehensive Survey. Future Gener. Comput. Syst. 2024, 150, 272–293. [Google Scholar] [CrossRef]
  109. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges Toward Responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  110. Doshi-Velez, F.; Kim, B. Towards a Rigorous Science of Interpretable Machine Learning. arXiv 2018, arXiv:1702.08608. [Google Scholar]
  111. Adadi, A.; Berrada, M. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
Figure 1. Number of publications related to overtopping studies per year from 3 scientific databases.
Figure 2. Number of publications related to coastal flooding studies per year from 3 scientific databases.
Figure 3. Frequently used ML models in the studies of coastal flooding and overtopping.
Figure 4. A summary of 6 ML families used in different wave overtopping studies.
Figure 5. Models with the highest performance when compared with other ML models from the same wave overtopping study.
Figure 6. Number of overtopping studies that used the same data processing methods.
Figure 7. Summary of 6 ML families used in different coastal-flooding related studies.
Figure 8. Models with the highest performance when compared with other ML models from the same coastal-flooding related study.
Figure 9. Number of coastal-flooding related studies that used the same data processing methods.
Figure 10. Scatter plot of ML model performance vs. the amount of training data in overtopping studies. The metrics used include root-mean-squared error (RMSE), mean-squared error (MSE), average precision (AP), and coefficient of determination (R²).
Figure 11. Scatter plot of ML model performance vs. the number of images used to predict the flooded area. The metrics used include accuracy (Ac), root-mean-squared error (RMSE), average precision (AP), and precision (Pr).
Figure 12. Scatter plot of ML model performance vs. the number of observations or location points used to predict the flood-susceptible area. The metrics used include area under the curve (AUC), root-mean-squared error (RMSE), accuracy (Ac), mean-squared error (MSE), and precision (Pr).
Figure 13. Conceptual framework of future directions in machine learning for coastal flooding and overtopping prediction.
Table 2. Summary of reviewed coastal overtopping studies.

| Authors | ML Models/Algorithm | Data Size and Variables | Data Preprocessing Methods | Performance |
|---|---|---|---|---|
| van Gent et al. [51] | ANN [Various structures] | 8372 (train); 14 features; 1 output (MOD) | WFR; FSL; Log(q) | RMSE = 0.29 |
| den Bieman et al. [53] | XGB [Various structures] | 8372 (train); 14 features; 1 output (MOD) | WFR; FSL; Log(q); PI (feature generation) | RMSE = 0.168–0.104 |
| den Bieman et al. [48] | XGB [Various structures] | 6943 (train) and 1736 (test); 15 features; 1 output (MOD) | WFR; FSL; Log(q); PI (feature generation) | RMSE = 0.284–0.098 |
| Hosseinzadeh et al. [54] | SVR and GPR [Simple sloped breakwater] | 1220 (70% train and 30% test); 12 features; 1 output (MOD) | WFR; FSL; Log(q); SPNWH | cRMSE = 0.260 |
| Kim and Lee [57] | XGB, GPR, SVR, AdaBoost, RF, ANN, Lasso, and linear regressor [Mixed structure] | 8679 (80% train and 20% test); 17 features; 1 output (MOD) | WFR; FSL; Log(q); DN into a [0, 1] range | RMSE = 0.691–0.276 |
| Tsai and Tsai [58] | CNN [Various structures and mixed structures] | 8653 (80% train and 20% test); 16 features; 1 output (MOD) | WFR; DN into a [0, 1] range; SPNWH | RMSE = 0.112 |
| Zanuttigh et al. [64] | ANN [Various structures] | 8633 (train) and 871 (test); 15 features; 1 output (MOD) | WFR; DN into a [0, 1] range; SPNWH | RMSE = 0.090–0.047 |
| Alshahri & Elbisy [20]; Elbisy [65] | MPNN, CCNN, GRNN, and SVM [Simple straight sloped] | 4401 (train); 12 features; 1 output (MOD) | WFR; FSL; SPNWH; DN into a [0, 1] range; Log(q) | RMSE = 0.001–0.0003 |
| Habib et al. [50] | RF, XGB, SVR, and ANN [Vertical seawalls] | 1318 (70% train) (30% test); 20 features; 1 output (MOD) | PCA; WRMs; KNNIMs | RMSE = 0.0031–0.0025 |
| Elbisy [21] | DT, GBDT, and SVM [Rubble mound breakwater] | 4737; 13 features; 1 output (MOD) | SPNWH | RMSE = 0.0043–0.002 |
| Formentin et al. [56] | ANN [Various structures] | 8194; 15 features; 1 output (MOD) | WFR; DN into a [0, 1] range; SPNWH; Log(q) | RMSE = 0.05–0.03 |
| Etemad-Shahidi et al. [63] | M5′ model tree [Vertical smooth structure] | 688 (2/3 train and 1/3 test); 12 features; 1 output (MOD) | WFR; Tests of incident waves with an incident angle > 5°, non-smooth/permeable structures, and inclined structures with cot(α) > 0.1 were excluded | RMSE = 0.46–0.39 |
| Oliver et al. [66] | ANN [Sloping breakwater and seawall] | 9997 (70% train, 15% test, and 15% validate); 15 features; 1 output (MOD) | PCA; SOM; FSL | MSE = 3.85 × 10⁻⁵–3.82 × 10⁻⁵ |
| Verhaeghe et al. [67] | ANNs [Various structures] | 8154 (85% train and 15% test); 13 features; 1 output (MOD and binary values for the other model) | WFR; Log(q); FSL | RMSE = 0.8442–0.5249 |
| Lee and Suh [62] | GMDH algorithm [Inclined seawalls without a berm] | 3609 + 303 dataset with oblique wave (70% train and 30% test); 5 to 6 features; 1 output (MOD) | WFR; FSL; SPNWH | nRMSE = 0.0155–0.0030 |
| Zanuttigh et al. [68] | ANN [Various structures] | 16,000, 50 different random splits of data into train, test, and validation; 13 features; 3 outputs (MOD) | WFR; FSL; SPNWH; DN into a [0, 1] range; Log(q) | RMSE = 0.046–0.028 |
| Alvarellos et al. [59] | ANN [Vertical, composite, and sloping breakwater] | Data 1 and 2: 8989; Data 3: 15,064 (75% train and 25% test); 10 features; 1 output (binary: probability of overtopping event) | Merged and used the sine and cosine for the 0–360 degree range values | AP = 0.48–0.86 |
| Mares-Nasarre et al. [69] | ANN [Mound breakwaters] | 235 (70% train, 15% test, and 15% validate); 4 features; 2 outputs (OLT and OFV exceeded by 2% of the incoming waves) | DN by other experimental parameters | R² = 0.789–0.903 |
| Carro et al. [60] | RF [Mixed rubble mound] | 38,265 (train) 9487 (test); 7 features; 1 output (MOD) | MDA is used to separate data into training and testing | Recall = 0.89–0.95 |
Table 3. Categorized summary of data preprocessing methods in overtopping studies.

| Category | Description or Common Techniques |
|---|---|
| Feature scaling or transformation | Weight factor reduction (WFR), Froude similarity law (FSL), log-transformations (Log(q)), structure parameter normalization by wave height (SPNWH), data normalization (DN to [0, 1] range), sine–cosine feature transformation, or standard scaling |
| Data partitioning or merging | Dataset division (for training/test), data merging across datasets |
| Feature generation, selection, or importance | Permutation importance (PI), principal component analysis (PCA), self-organized maps (SOM), wrapper reduction method (WRM), maximum dissimilarity algorithm (MDA), KNN imputation method (KNNIM), and manual feature elimination or synthesis |
| Filtering or data cleaning | Exclusion of outliers, removing non-suitable cases (e.g., waves with incident angle > 5° or non-smooth structures) |
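
As a minimal illustration of the feature scaling and transformation category in Table 3, the sketch below shows four of the listed operations: a log-transformation of the overtopping discharge q, normalization to a [0, 1] range, a sine–cosine encoding of a 0–360 degree variable, and normalization of a structure parameter by wave height. The function names, the `floor` value used to guard against log(0), and the sample numbers are assumptions made for illustration only; they are not taken from any of the reviewed studies.

```python
import math


def log_discharge(q, floor=1e-9):
    """Return log10(q); values below `floor` (an assumed guard) are clipped
    so that zero-overtopping records do not produce log(0)."""
    return math.log10(max(q, floor))


def min_max_scale(values):
    """Rescale a list of numbers linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_direction(deg):
    """Encode an angle in degrees as (sin, cos) so that 0 and 360 coincide."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


def normalize_by_wave_height(length, wave_height):
    """Express a structural length (e.g., crest freeboard) in wave heights."""
    return length / wave_height


if __name__ == "__main__":
    print(log_discharge(1e-3))
    print(min_max_scale([2.0, 4.0, 6.0]))
    print(encode_direction(90.0))
    print(normalize_by_wave_height(3.0, 2.0))
```

The sine–cosine pair is used instead of the raw angle because a direction of 359° is physically close to 1°, which a single linear feature cannot represent.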
Table 4. Summary of reviewed coastal flooding-related studies.

| Author | Model/Algorithm | Data Source/Type | Data Preprocessing or Techniques | Performance |
|---|---|---|---|---|
| den Bieman et al. [17] | SegNet CNN [Measuring surface elevation, wave runup, and bed level development] | 760 video imagery from physical model test (25% test) | Image segmentation and augmentation | Ac = 95.2–99.3% |
| Salatin et al. [91] | V-BeachNet FCN [Swash spectra, significant wave heights, and wave-driven setup] | Observations: Scanning lidar system (32 images) | Video segmentation, image rectification and enhancement | RMSE = 0.33–0.14 |
| Naeini and Snaiki [77] | cGAN [Wave runup for coastal flooding] | Wave runup time-series data generated from XBSB and XBNH | Data transformation into an RGB image, a random mirroring technique, and image normalization | MAE = 0.01–0.0001 |
| Hasan et al. [73] | RF, XGB, & KNN [Flood susceptibility mapping] | 400 observations with 9 variables (30% test) | Flood controlling factors processed on ArcGIS | Ac = 85.5–86.7% |
| Hou et al. [92] | Mask R-CNN (ResNet-101 + RPN) [Flood extent mapping] | 1930 images from physical model tests (40% test) | Image labeling and segmentation | AP = 0.927 |
| Gebrehiwot et al. [93] | VGG-based FCN-16s [Flood extent mapping] | UAV images (100 samples) | Image labeling and segmentation | Ac = 97.5% |
| Ichim & Popescu [94] | YOLO, GAN, AlexNet, LeNet, and ResNet [Segmentation of vegetation and flood extension] | UAV images. 24,000 samples (30% validation) | Image labeling and segmentation | OAc = 85.9–94.4% |
| Peng et al. [72] | CNN (PSNet), SVM, DT, RF, and Adaboost [Flood extent mapping] | Satellite multispectral surface reflectance imagery and aerial VHR by NOAA. (1500 training samples) | Image labeling and segmentation | OAc = 93–98% |
| Sarker et al. [83] | F-CNNs [Flood extent mapping] | 14 Landsat satellite images (generated 4.5 million patches) | Normalizing pixel intensity values of images and labeling | OAc = 80–92% |
| Muñoz et al. [95] | CNN [Flood extent mapping] | Landsat ARD, dual-polarized SAR data, and coastal DEMs. 22,476 patch images (20% validation) | Data fusion (cloud/shadow removal, speckle filtering, rescaling) and rescaling of images | OAc = 97% |
| Liu et al. [71] | CNN [Flood extent mapping] | Six pairs of Sentinel-1 SAR images (10,049 patches) from CEMSRM, GE, and OSM | Orbit file, sliding-window filtering, radiometric calibration, Range-Doppler terrain correction | Pr = 88% |
| Kang et al. [82] | FCN (VGG16) [Flood extent mapping] | 8 Gaofen-3 SAR images divided into 3528 samples | GF-3 data converted to digital numbers and scaled | Ac = 99.1–99.6% |
| Zahura and Goodall [74] | RF [Flood extent mapping and depth] | Virginia HRSD and NOAA (rainfall and tide level). City of Norfolk’s STORM. 16 rainfall and 8 tidal events (20% test) | Feature generation and interpolation | Pr = 72–92% |
| Qin et al. [70] | BPNN, PR, DT, RF, & KNN [Generate point waterlogging depth] | Observation from the monitoring station | Stratified random sampling and clustering method. Reduction factor determination process | RMSE = 0.539–0.003 |
| Di Bacco et al. [96] | XT and RF [Flood extent mapping and depth] | 10 features with 5000 location points. Meteorological, hydrological, and observed impact data from USGS and FEMA | Permutation feature importance | MSE = 0.66–0.47 |
| Shastry et al. [80] | CNN [Flood extent mapping] | Maxar WorldView 2/3 (Satellite). Five spatiotemporal strata. USGS Flood, NLCD, and cloud masks. 80 WV images for training and 20 for validation | Used flood models with remote sensing data to get the most realistic flood extents. Data labeling and segmentation | Pr = 98% |
| Dong et al. [88] | FastGRNN-FCN [Flood warning and inundation probability] | Data from HCFCD. 9 variables. 80,997 data instances | The flood data is divided into a multivariate time series and used as the model input. Data labeling and feature selection | Ac = 95.8–97.8% |
| Nemni et al. [81] | CNN (XNet, U-Net & ResNet) [Flood extent mapping] | UNOSAT Flood Dataset, Copernicus Sentinel-1 SAR imagery. Train, test, & validate (80:10:10) | Orthorectification, calibration, speckle filtering, tiling, compression, and normalizing | Ac = 91–97% |
| Nhangumbe et al. [97] | SVM, RF & Boosted Tree [Flood extent mapping] | DrivenData Labs based on Sentinel-1 (S1) SAR imagery and Cloud to Street | Focal mean filter and data labeling | IoU = 0.568 |
| Hidayah et al. [87] | MLP & RF [Flood susceptibility mapping] | 11 flood factors from periodic flood polygons (2010–2020). 2368 data points (30% validation) | Normalized difference vegetation index and feature selection methods | AUC = 0.921–0.967 |
| Dodangeh et al. [98] | GAM, BTR & MARS (integrating RS and BT) [Flood susceptibility] | Data from Iran Meteorological Organization (294 data points) | Variables were transformed into a raster format | AUC = 0.95–0.98 |
| Ramayanti et al. [99] | GMDH & CNN [Flood susceptibility mapping] | SAR Sentinel-1 satellite images, GIS. 29,291 flood data points (50:50 train & test). DEM data gathered from the SRTM | Selection of flood conditioning factors, geometric and radiometric correction; a supervised classification technique; all factors reclassified by ArcMap software | AUC = 0.87–0.90 |
| Saravanan & Abijith [75] | GBM, XGB, RTF, SVM, & NB [Flood susceptibility mapping] | Sentinel-1 satellite images, Landsat 8 from GEE Chirps from GEE SRTM. NBSS and LUP (70:30 train & test) | Recursive feature elimination, data normalization, and labeling | AUC = 0.8–0.92 |
| Prasad et al. [79] | RTF, NSC, KNN, BRT, & LB [Flood susceptibility mapping] | DDMMS and Landsat images. 210 flood locations, satellite images, and field surveys. (30% test) | Factor selection by the Boruta algorithm. Weight of Evidence floods (WoE) | AUC = 0.892–0.966 |
| Vu et al. [78] | SVM & RF [Flood susceptibility mapping] | VDSMGI Sentinel 1A images. 1864 flood locations and 11 conditional factors. (30% validation) | Data normalization, factors transformed into raster format at 10 m resolution | AUC = 0.97–0.98 |
| Asiri et al. [18] | AHP-SVM & AHP-DT ensembles [Flood susceptibility mapping] | Various datasets and data sources for coastal flooding risk factors, such as flood hazard, social vulnerability, and exposure | Multi-collinearity test; weight indicators; proxy variables normalization (MinMax) | AUC = 0.75–0.98 |
Table 5. Categorized summary of data preprocessing methods in coastal-flooding studies.

| Category | Description or Common Techniques |
|---|---|
| Image labeling and segmentation | Manual or automated flood extent labeling, image segmentation |
| Feature selection or generation | Recursive feature elimination, Boruta, permutation importance, or manually derived conditioning factors |
| Normalization or scaling | Normalizing variables or pixel intensities, Min–Max, or standardization |
| GIS preprocessing | Data rasterization, factor reclassification, ArcGIS processing |
| Noise reduction or filtering | Speckle filtering, sliding-window or focal mean filters, image rectification, cloud/shadow removal |
| Data fusion or integration | Combining multisource data (SAR, DEM, optical), feature merging |
| Geometric or radiometric correction | Radiometric calibration, orbit correction, terrain correction, geometric correction |
| Statistical or sampling technique | Stratified sampling, clustering, and reduction factor estimation |
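
To make three of the image-oriented steps in Table 5 concrete, the following sketch illustrates pixel-intensity normalization, a focal mean (moving-window) filter, and tiling of an image into patches. It is a simplified stand-in written with plain Python lists: the function names, the clamped-edge handling in the filter, and the choice to discard incomplete border tiles are assumptions for illustration, and the reviewed studies relied on their own remote-sensing and GIS toolchains rather than this code.

```python
def normalize_pixels(img):
    """Min-max scale a 2-D image (list of rows) to the [0, 1] range."""
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: nothing to scale
        return [[0.0] * len(row) for row in img]
    return [[(v - lo) / (hi - lo) for v in row] for row in img]


def focal_mean(img, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighbourhood.
    Neighbours outside the image are clamped to the nearest edge pixel
    (an assumed boundary rule)."""
    h, w = len(img), len(img[0])
    count = (2 * radius + 1) ** 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / count)
        out.append(row)
    return out


def tile(img, patch):
    """Split an image into non-overlapping patch x patch tiles,
    discarding incomplete tiles at the right and bottom borders."""
    h, w = len(img), len(img[0])
    tiles = []
    for y0 in range(0, h - patch + 1, patch):
        for x0 in range(0, w - patch + 1, patch):
            tiles.append([row[x0:x0 + patch] for row in img[y0:y0 + patch]])
    return tiles


if __name__ == "__main__":
    image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
    print(len(tile(image, 2)))
    print(normalize_pixels([[0, 5], [10, 20]]))
```

A moving-window mean of this kind trades spatial detail for noise suppression, which is why the window size is usually kept small for speckle-affected SAR imagery.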
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Duiker, M.L.; Ramos, V.; Taveira-Pinto, F.; Rosa-Santos, P. Predicting Coastal Flooding and Overtopping with Machine Learning: Review and Future Prospects. J. Mar. Sci. Eng. 2025, 13, 2384. https://doi.org/10.3390/jmse13122384

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
