Article

Probabilistic Prediction of Local Scour at Bridge Piers with Interpretable Machine Learning

1 Department of Civil & Environmental Engineering, Pusan National University, Busan 46241, Republic of Korea
2 K-Watercraft, Busan 46241, Republic of Korea
3 Department of Fire Protection Engineering, Pukyong National University, Busan 48513, Republic of Korea
* Authors to whom correspondence should be addressed.
Water 2025, 17(24), 3574; https://doi.org/10.3390/w17243574
Submission received: 7 November 2025 / Revised: 9 December 2025 / Accepted: 14 December 2025 / Published: 16 December 2025
(This article belongs to the Section Hydraulics and Hydrodynamics)

Abstract

Local pier scour remains one of the leading causes of bridge failure, calling for predictions that are both accurate and uncertainty-aware. This study develops an interpretable data-driven framework that couples CatBoost (Categorical Gradient Boosting) for deterministic point prediction with NGBoost (Natural Gradient Boosting) for probabilistic prediction. Both models are trained on a laboratory dataset of 552 measurements of local scour at bridge piers using non-dimensional inputs (y/b, V/Vc, b/d50, Fr). Model performance was quantitatively evaluated using standard regression metrics, and interpretability was provided through SHAP (Shapley Additive Explanations) analysis. Monte Carlo–based reliability analysis linked the predicted scour depths to a reliability index β and exceedance probability through a simple multiplicative correction factor. On the held-out test set, CatBoost offers slightly higher point-prediction accuracy, while NGBoost yields well-calibrated prediction intervals with empirical coverages close to the nominal 68% and 95% levels. This framework delivers accurate, interpretable, and uncertainty-aware scour estimates for target-reliability, risk-informed bridge design.

1. Introduction

Scour occurring at bridge foundations has long been recognized as one of the most severe hydraulic risk factors threatening the stability of the structure [1]. Wardhana et al. reported that scour is a major cause of bridge failures in the United States, with an annual average of 22 bridges collapsing or closing due to severe deformation. They also noted that 53% of the 503 bridge failure cases in the US from 1989 to 2000 were due to hydraulic factors such as floods and scour [2]. According to an investigation by the Federal Highway Administration, 60% of the 823 bridge failures since 1972 were related to flow hydraulics [3]. Such structural damage leads to reconstruction costs, transportation network disruptions, and, in extreme cases, loss of life, resulting in enormous socioeconomic repercussions [4]. This necessitates more accurate and reliable scour prediction techniques.
The key challenges facing the field of scour prediction can be summarized in two main parts. The first is to accurately predict the complex hydraulic phenomenon, and the second is to quantify the uncertainty of the prediction results to enable safety-oriented design. Scour occurring around a bridge pier is governed by a complex vortex structure created by the flow being disturbed by the structure. The approach flow forms a pressure gradient at the front of the pier, and the resulting downflow generates a powerful horseshoe vortex system at the foundation. The high shear stress exerted on the riverbed by this vortex begins to remove sediment particles, and a scour hole forms and deepens when the outflowing sediment transport rate exceeds the inflowing rate [5,6]. In addition to the complexity of this physical process, the diversity of field conditions (pier geometry, flow conditions, bed material properties, etc.) makes accurate prediction even more difficult. Moreover, a single point estimate is insufficient for design practice; a methodology is required that allows for the determination of an appropriate factor of safety by explicitly considering the uncertainty inherent in the prediction.
Efforts to accurately predict scour depth have proceeded in various directions over a long period. The most traditional approach is the empirical formula, developed based on laboratory or field measurement data. Representative empirical formulas, such as the CSU formula documented in HEC-18 and Melville’s formula, have been widely used in practice due to their simplicity of calculation and intuitiveness [6,7]. However, as these formulas were inherently derived from limited experimental conditions, they do not sufficiently reflect the complex hydraulic and geomorphological characteristics of actual rivers. In particular, factors like complex pier geometries, unsteady flow during floods, and the behavior of cohesive sediment often fall outside the applicable scope of these formulas. Furthermore, it has been consistently reported that applying different empirical formulas to the same conditions produces significant discrepancies in prediction results, and they tend to overestimate scour depth when applied to real rivers composed of uniform sediment [8,9,10,11].
In practice, many bridge foundations do not consist of isolated simple piers but involve complex geometries such as pile groups and column–pile-cap systems. Local scour at such complex piers results from the interaction of column, pile-group and pile-cap scour components and depends sensitively on the shape and elevation of the structural elements, making it substantially more difficult to predict than scour at simple cylindrical piers. Recent experimental studies have confirmed that these geometric effects can markedly alter both the magnitude and the pattern of scour and require dedicated prediction approaches beyond traditional formulas developed for simple piers [12,13].
To overcome the limitations of empirical formulas, numerical analysis techniques based on Computational Fluid Dynamics (CFD) were introduced. This approach simulates the 3D flow field and sediment transport process around the pier in detail by directly solving the fluid governing equations. Roulund et al. successfully reproduced the horseshoe vortex around a circular pier using a 3D numerical model that included a turbulence model; however, they failed to consider unsteady flow effects, resulting in an underestimation of downstream scour [14]. Similarly, a study by Khosronejad et al. using a URANS model showed good agreement with experiments for sharp-edged piers, but it failed to properly simulate the strong vortices for blunt shapes, exhibiting a limitation of overestimating scour [15]. While numerical analysis methods provide deep insight into the physical mechanisms, they require massive computational cost and time, and they carry the fundamental problem that the accuracy of the results is highly sensitive to the choice of turbulence model and sediment transport formula [16].
Recently, data-driven machine learning (ML) techniques have garnered attention as an alternative that can dramatically improve predictive accuracy [17]. Early studies utilizing Artificial Neural Networks (ANNs) demonstrated superior performance over traditional empirical formulas. Kambekar et al. showed that ANNs were clearly superior to statistical regression analysis in predicting scour around marine pile structures [18]. Studies by Betani et al. and Lee et al. confirmed that various neural network models, such as MLP, RBF, and BPN, could achieve prediction accuracy that significantly surpassed existing empirical formulas [19,20]. Kaya demonstrated the superiority of ANNs in predicting live-bed scour conditions using field data and suggested, through sensitivity analysis, that high accuracy could be obtained using only key variables like pier width and flow velocity [21]. These studies have shown that ML can significantly enhance prediction accuracy by effectively learning complex non-linear relationships.
Achieving high predictive accuracy alone is not sufficient for engineering decision-making. In design practice, it is necessary to determine an appropriate factor of safety by explicitly considering the uncertainty inherent in the predicted values, and it must be possible to understand the basis on which the model produced a specific prediction. However, most machine learning models operate as ‘black boxes’, possessing a fundamental limitation in that they cannot provide a physical interpretation of the prediction process. This undermines the reliability of the model and becomes an obstacle to directly utilizing the prediction results as design criteria. There have been attempts to solve these problems. Gene Expression Programming (GEP) is a technique that automatically generates explicit mathematical formulas from data, and Azamathulla et al. showed that GEP could provide interpretable formulas while demonstrating superior predictive performance to ANNs [22]. Pal et al. utilized the M5 model tree to demonstrate the advantage of providing linear relationship equations while maintaining predictive performance equivalent to ANNs [23]. However, these techniques generate a single global formula or rule, so they cannot sufficiently reflect local data characteristics or condition-dependent changes in variable interactions. Against this backdrop, interpretable machine learning attempts incorporating feature-attribution-based techniques like SHAP have recently been increasingly reported [24,25]. SHAP (SHapley Additive exPlanations) utilizes the Shapley value concept from game theory to quantify the contribution of each input variable to an individual prediction. This technique can isolate the independent impact of each variable while considering interactions between them, and it has the strength of providing both global trend analysis and local prediction interpretation simultaneously. However, research utilizing SHAP in the field of scour prediction is still in its early stages.
In parallel with these deterministic and data-driven developments, a number of studies have examined scour and pile performance within probabilistic and reliability-based frameworks. Homaei and Najafzadeh carried out a reliability-based probabilistic evaluation of wave-induced scour depth around marine structure piles, explicitly incorporating uncertainties in wave and seabed characteristics to estimate failure probabilities and reliability indices [26]. Vatani et al. proposed a hybrid random-forest-based subset simulation (RFSS) method for probabilistic assessment of wave-induced scour around pile groups, using a random forest surrogate of a metaheuristically optimized scour equation to estimate reliability indices with Monte Carlo-level accuracy at much lower computational cost [27]. Jafari-Asl et al. performed a comparative study of several reliability methods for the probabilistic analysis of local scour at a bridge pier in clay–sand-mixed sediments, showing how the estimated failure probability depends on the chosen reliability algorithm when deterministic scour equations are used as performance functions [28]. More recently, Hosseini et al. incorporated random pier scouring into a probabilistic seismic safety assessment of bridges by treating scour depth as a key uncertain parameter in seismic fragility analysis [29]. These contributions clearly highlight the importance of probabilistic and reliability-based thinking for scour-related problems; however, in most of these studies scour depth itself is still represented by deterministic empirical or mechanical equations, and uncertainty is introduced only at the level of input variables or reliability methods. In contrast, the present work develops a probabilistic machine-learning model that directly learns the conditional distribution of local scour depth at bridge piers from data, and then uses the learned distribution parameters to derive risk measures such as exceedance probabilities and a reliability index β.
This study proposes an integrated framework to simultaneously solve the two key challenges in scour prediction: improving accuracy and quantifying uncertainty. To this end, CatBoost, an advanced ensemble learning technique, and NGBoost, which is capable of probabilistic prediction, are utilized. CatBoost is an algorithm that automatically handles categorical variables and effectively prevents overfitting, enabling the achievement of high predictive performance even without complex, separate preprocessing. NGBoost offers the unique advantage of being able to explicitly quantify prediction uncertainty and provide confidence intervals by predicting a probability distribution instead of a single point estimate. In the present application to bridge-pier scour, NGBoost is used to learn the conditional probability distribution of scour depth given the key non-dimensional hydraulic and sediment parameters, so that both the conditional mean and variance of scour can be obtained for any given input condition. Going beyond simply achieving high predictive accuracy, the developed model is systematically analyzed using SHAP to determine which hydraulic and geomorphological characteristics it learns as important. This analysis provides a physical interpretation of the model’s prediction process and investigates the impact of each input variable on scour depth. Furthermore, a reliability analysis is performed to enhance the engineering applicability of the prediction results. Through Monte Carlo Simulation, 10,000 scenarios reflecting the statistical characteristics of the input variables are generated, and the resulting probability distribution of the predictions is analyzed. In this reliability analysis, the learned distribution parameters (conditional mean and variance) are used directly, so that risk measures such as exceedance probability and the reliability index β are evaluated consistently with the original probabilistic prediction. 
Assuming this distribution is normal, the reliability index β is calculated, and correction factors corresponding to different reliability levels are proposed. Through this integrated approach, the aim is to contribute to establishing more rational and safety-oriented bridge design criteria by presenting a scour prediction methodology that is accurate, interpretable, and explicitly accounts for uncertainty.

2. Methods

2.1. Database

2.1.1. Data Collection

The predictive accuracy and generalization ability of machine learning models are highly dependent on the diversity and distribution of the data used for training; therefore, it is essential to construct a dataset that encompasses a wide range of hydraulic and geomorphological scenarios. This is because models essentially learn the patterns inherent in the data, making the breadth and sufficiency of the information covered by the training data key variables that govern the model’s performance. The data used for model training in this study was sourced from the USGS (United States Geological Survey) bridge scour database [30]. The database contains 569 laboratory datasets and 1858 field datasets; in this study, we used only 552 laboratory datasets. The database includes variables related to scour, as shown in Equation (1).
$y_s = f(b, V, V_c, y, \sigma_g, d_{50})$, (1)
where ys is the scour depth, b is the pier width perpendicular to the flow, V is the approach flow velocity, Vc is the critical velocity for the sediment, y is the approach flow depth, σg is the geometric standard deviation of the sediment size distribution, and d50 is the median sediment particle size. The range of the variables used in the study is shown in Table 1.

2.1.2. Correlation Analysis Between Variables

To understand the relationships between the variables and to guide the construction of non-dimensional inputs, a correlation analysis was performed on the dimensional variables. The Pearson correlation coefficient was used to measure the strength of the linear relationships between b, y, V, Vc, d50 and ys, and the results are summarized in Figure 1.
A strong positive correlation (r = 0.86) was observed between the pier width (b) and the scour depth (ys), as scour depth is nearly proportional to pier width in shallow flows. Melville, for instance, summarized that in shallow flows, scour depth is proportional to the width b [31]. Flow depth (y) and scour depth (ys) showed a moderate positive correlation (r = 0.53), which is consistent with the classical pattern that scour intensifies as flow depth increases [31]. The correlation between average flow velocity (V) and scour depth (ys) was relatively weak (r = 0.14), because scour is governed by the relative velocity (V/Vc) rather than the absolute velocity. HEC-RAS notes that Vc is a key parameter for distinguishing between clear-water and live-bed conditions, and V/Vc is used for live-bed scour [32]. The critical velocity (Vc) and scour depth (ys) showed a low positive correlation (r = 0.21), contrary to the theoretical expectation that a higher Vc would suppress scour. The correlation between the median particle size (d50) and scour depth (ys) was very weak (r = 0.11), which aligns with previous research: according to the FHWA report, the effect of d50 is manifested through the relative roughness (b/d50) or the particle size distribution (non-uniformity), rather than its absolute value [33]. Finally, Vc and d50 showed a very strong positive correlation (r = 0.95). HEC-RAS presents the critical-velocity formula $V_c = K y^{1/6} D^{1/3}$, which implies that Vc increases with d50 [32]. In contrast, correlations among the remaining pairs of independent variables, such as b-V and b-d50, are very small (with |r| values close to zero) and do not suggest any meaningful linear dependence. Their non-zero values simply reflect the finite sample size and the fact that the laboratory experiments were not designed as a perfectly factorial combination of all parameter settings.
In this study, the correlation matrix is therefore used as an exploratory diagnostic of the database and as a check for obvious multicollinearity, rather than as a basis for deriving causal laws from pairwise relationships.
Based on the physical principles and inter-variable relationships identified through the correlation analysis, the final input and output variables were structured as non-dimensional parameters to enhance the model’s generalization and performance and to mitigate multicollinearity. The high correlation between ys and b is a “scale effect,” and its influence was removed by normalizing the scour depth by the pier width, i.e., using ys/b. Given the strong correlation between d50 and Vc and the importance of the “relative velocity” pattern, V/Vc was used instead of V and Vc individually to compress the information and increase variable independence. The particle-size effect was reflected through the relative roughness b/d50 instead of d50 alone. The Froude number ($Fr = V/\sqrt{gy}$), representing the ratio of inertial to gravitational forces, was also included. These non-dimensional groups substantially reduce scale effects and alleviate collinearity among the original variables, although some dependence inevitably remains because Fr shares V and y with V/Vc and y/b, and Vc itself is linked to d50 through incipient-motion relationships. Nevertheless, we retain Fr as a separate predictor because the flow regime strongly influences the strength of the downflow and horseshoe vortices acting on the bed, and thus the overall scour mechanism, whereas V/Vc, y/b and b/d50 primarily represent relative velocity, relative depth and relative roughness. We therefore keep y/b, V/Vc, b/d50 and Fr as distinct inputs that summarize complementary hydraulic aspects, and we interpret variable importance and SHAP-based explanations with their mutual dependence in mind. Consequently, the final variables for this study were selected as shown in Equation (2).
$\dfrac{y_s}{b} = f\left(\dfrac{y}{b}, \dfrac{V}{V_c}, \dfrac{b}{d_{50}}, Fr\right)$, (2)
where ys/b is the normalized scour depth, y/b is the relative flow depth (a geometry and force parameter), V/Vc is the flow intensity, representing the relative velocity of the current flow compared to the critical condition for the bed material, b/d50 is the relative roughness, and Fr is the Froude number, a dimensionless parameter indicating the state of the flow.
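As a concrete illustration, the mapping from the raw laboratory variables to the four non-dimensional predictors of Equation (2) can be sketched as follows (a minimal Python sketch; the numerical values in the example are hypothetical and not taken from the database):

```python
import math

G = 9.81  # gravitational acceleration [m/s^2]

def nondimensional_inputs(b, V, Vc, y, d50):
    """Map raw variables to the non-dimensional predictors of Eq. (2):
    relative depth y/b, flow intensity V/Vc, relative roughness b/d50,
    and the Froude number Fr = V / sqrt(g*y)."""
    return {
        "y/b": y / b,
        "V/Vc": V / Vc,
        "b/d50": b / d50,
        "Fr": V / math.sqrt(G * y),
    }

# Hypothetical example: a 0.10 m pier in 0.20 m of flow at 0.30 m/s over 1 mm sand
x = nondimensional_inputs(b=0.10, V=0.30, Vc=0.35, y=0.20, d50=0.001)
```

The same transformation would be applied row-wise to the 552 laboratory records before training.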

2.1.3. Data Partitioning

The whole dataset was randomly split into training, validation, and test sets for model training, validation, and final performance evaluation. In this study, the total data was partitioned into ratios of 80%, 10%, and 10%, respectively.
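The 80/10/10 random partition can be sketched with a stdlib-only index shuffle (the seed value is an arbitrary choice for reproducibility, not one reported by the study):

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=42):
    """Randomly partition n sample indices into train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # deterministic shuffle for reproducibility
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(552)
```

For 552 samples this yields 441 training, 55 validation, and 56 test indices.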

2.2. ML Models

2.2.1. CatBoost

CatBoost is a machine learning framework based on the gradient boosting decision tree algorithm. Like typical boosted decision tree models, CatBoost sequentially adds decision trees, weak learners fitted to the residuals of the previous stage’s model, to progressively improve the performance of the overall model. In the $m$-th step, the predicted value $\hat{y}^{(m)}$ is updated by adding the prediction of the new tree, $f_m(x)$, scaled by the learning rate $\eta$, to the prediction from the previous step, $\hat{y}^{(m-1)}$:

$\hat{y}^{(m)} = \hat{y}^{(m-1)} + \eta f_m(x)$, (3)
The objective function of CatBoost is composed of the sum of a loss function $L$ and a regularization term $\Omega$ that controls model complexity, and training proceeds in the direction that minimizes this objective function:

$Obj^{(m)} = \sum_{i=1}^{n} L(y_i, \hat{y}_i^{(m)}) + \sum_{j=1}^{m} \Omega(f_j)$, (4)
These update rules and objective function follow the standard gradient boosting framework implemented in CatBoost [34]. This study aims to perform precise deterministic prediction for scour depth by utilizing CatBoost’s fast training speed and high prediction accuracy. The CatBoost model was trained using Mean Squared Error (MSE) as the loss function.
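The additive residual-fitting update above can be illustrated with a deliberately tiny, stdlib-only sketch in which each “tree” is replaced by the simplest possible weak learner, a constant equal to the mean residual. This is a toy stand-in for intuition only, not the actual CatBoost tree learner:

```python
def boost_means(y, n_rounds=50, eta=0.1):
    """Toy gradient boosting: each round fits a constant weak learner
    (the mean residual) and adds it with learning rate eta, so the
    prediction follows y_hat_new = y_hat_old + eta * f_m."""
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]   # errors of the current model
        f_m = sum(residual) / len(residual)               # weak learner: one constant
        pred = [pi + eta * f_m for pi in pred]            # additive update
    return pred

preds = boost_means([1.0, 2.0, 3.0])
```

With a constant learner the predictions converge geometrically toward the sample mean (2.0 here); real boosting replaces the constant with a depth-limited tree so that predictions can vary per sample.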

2.2.2. NGBoost

NGBoost is a gradient boosting framework specialized for probabilistic prediction, which can quantify prediction uncertainty. Unlike traditional models that produce a single point estimate, NGBoost predicts the parameters of the probability distribution that the target is expected to follow. For example, assuming a normal distribution, the model predicts two parameters $\theta$: the mean $\mu$ and the standard deviation $\sigma$. The training objective of NGBoost is to maximize the likelihood of the observed data under the predicted probability distribution, which is equivalent to minimizing the negative log-likelihood (NLL):

$\mathcal{L}(\theta) = -\sum_{i=1}^{n} \log P_{\theta_i}(y_i \mid x_i)$, (5)
To achieve this, NGBoost uses the natural gradient $g_{nat}$, which accounts for the geometric structure (curvature) of the space of probability distributions, instead of the standard gradient. The natural gradient is computed by multiplying the standard gradient by the inverse of the Fisher information matrix $F_\theta$, providing a more stable and efficient learning path:

$g_{nat} = F_\theta^{-1} g, \quad \text{where} \quad g = \nabla_\theta \log P_\theta(y \mid x)$, (6)
The NLL loss and the natural gradient update used in this study follow the probabilistic boosting framework of NGBoost [35]. This study aims to evaluate the inherent uncertainty in scour prediction using NGBoost, which was trained using the NLL as the loss function.
In this study, NGBoost was implemented using the open-source Python library (v.3.12.12) with tree-based base learners and the Normal distribution as the output density. The model was trained on the training dataset using the negative log-likelihood as the loss function, and the hyperparameters (e.g., number of boosting iterations, learning rate, minibatch fraction, column subsampling ratio, and tree-depth parameters) were tuned by Bayesian optimization as described in Section 2.3. For the present dataset (552 samples and four input variables), a single NGBoost training run on a standard desktop computer was completed within a few seconds, indicating that the computational burden is modest for this problem size. During both the hyperparameter optimization and the final training, we did not encounter numerical errors such as NaN losses, divergence, or training failures, suggesting that the optimization process of NGBoost is numerically stable for the current application.
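The per-sample NLL for a Normal output density can be written out directly. The sketch below (with hypothetical numbers) shows why minimizing this loss produces calibrated uncertainty: a predicted σ that matches the actual error magnitude scores better than one that is far too wide or far too narrow:

```python
import math

def normal_nll(y, mu, sigma):
    """Per-sample negative log-likelihood of y under N(mu, sigma^2),
    the quantity NGBoost minimizes when a Normal output is assumed."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

# Observation 1.2 predicted with mean 1.0 (error 0.2):
tight = normal_nll(1.2, mu=1.0, sigma=0.2)    # sigma matches the error
loose = normal_nll(1.2, mu=1.0, sigma=2.0)    # sigma far too wide
narrow = normal_nll(1.2, mu=1.0, sigma=0.05)  # sigma far too narrow
```

Both overconfident and underconfident σ values are penalized relative to the well-matched one, which is the mechanism behind NGBoost's calibrated prediction intervals.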

2.3. Hyperparameter Tuning

The performance of a machine learning model is highly dependent on the hyperparameter settings, which determine the model’s structure and learning method. Unlike parameters that are learned from the training data, hyperparameters are values that must be specified by the user in advance. The process of finding the optimal hyperparameter combination is essential to prevent the model from overfitting to the given data and to maximize its generalization performance on unseen data. In this study, Bayesian optimization, an efficient search technique, was applied to secure optimal predictive performance. The criterion for evaluating the performance of each hyperparameter combination during the search process was the validation loss on the validation dataset, defined consistently with the training objective of each model. That is, a model trained on the training dataset with a specific hyperparameter combination was applied to the validation dataset to measure its performance. This process was repeated to finally select the combination that demonstrated the best performance on the validation dataset.
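The select-by-validation-loss loop described above can be sketched as follows. Plain random sampling is used here as a stdlib-only stand-in for the Bayesian optimizer, and the search space and toy objective are hypothetical, not the settings used in the study:

```python
import random

def random_search(train_eval, space, n_trials=30, seed=0):
    """Sample candidate hyperparameter settings and keep the one with the
    lowest validation loss (a stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        loss = train_eval(params)   # train on the training set, return validation loss
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Hypothetical objective that is best at depth=6, learning_rate=0.1
space = {"depth": [4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
objective = lambda p: abs(p["depth"] - 6) + abs(p["learning_rate"] - 0.1)
best, loss = random_search(objective, space)
```

A Bayesian optimizer differs only in how the next candidate is proposed (a surrogate model of the loss surface instead of uniform sampling); the train-on-train, score-on-validation loop is identical.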

2.4. Model Performance Evaluation

2.4.1. Point Prediction Performance Evaluation

To evaluate the models’ point-prediction performance, the following metrics were used (Equations (7)–(12)). RMSE is the square root of the mean squared error and represents the average magnitude of the errors. SI is the RMSE divided by the mean of the observations, which is useful for comparing relative error magnitudes. R2 is the coefficient of determination; the closer it is to 1, the better the linear fit between observations and predictions. B is the bias, which shows how much the model overestimates or underestimates on average; if B = 0, the model is considered to have no average bias. Se is the standard error of the residuals and expresses the average fluctuation of the predictions. Finally, MAPE is the mean absolute percentage error; the closer it is to 0, the closer the predictions are to the actual values overall.
$RMSE = \sqrt{\dfrac{\sum_{i=1}^{N}(x_i - y_i)^2}{N}}$, (7)

$SI = \dfrac{1}{\bar{x}}\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2}$, (8)

$R^2 = \left(\dfrac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}\right)^2$, (9)

$B = \dfrac{\sum_{i=1}^{N}(y_i - x_i)}{N}$, (10)

$S_e = \sqrt{\dfrac{\sum_{i=1}^{N}\left((y_i - x_i) - B\right)^2}{N - 2}}$, (11)

$MAPE = \dfrac{1}{N}\sum_{i=1}^{N}\left|\dfrac{y_i - x_i}{x_i}\right|$, (12)
In Equations (7)–(12), $x_i$ is the measured value, $y_i$ is the model’s predicted value, $N$ is the number of samples, and $\bar{x}$ and $\bar{y}$ are the corresponding sample means. These point prediction metrics have been widely used in previous bridge pier scour studies for evaluating empirical and data-driven models [36,37].
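Equations (7)–(12) translate directly into a short helper; the example values at the end are made up purely to exercise the function:

```python
import math

def point_metrics(x, y):
    """Point-prediction metrics of Eqs. (7)-(12): x = measured, y = predicted."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    rmse = math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n)
    si = rmse / xbar                                              # scatter index
    b = sum(yi - xi for xi, yi in zip(x, y)) / n                  # bias
    se = math.sqrt(sum(((yi - xi) - b) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    mape = sum(abs((yi - xi) / xi) for xi, yi in zip(x, y)) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = math.sqrt(sum((xi - xbar) ** 2 for xi in x) * sum((yi - ybar) ** 2 for yi in y))
    r2 = (num / den) ** 2
    return {"RMSE": rmse, "SI": si, "R2": r2, "B": b, "Se": se, "MAPE": mape}

m = point_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```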

2.4.2. Probability Prediction Performance Evaluation

In this study, calibration was evaluated using the central two-sided prediction interval $\mu \pm k\sigma$, based on the predicted mean $\mu$ and standard deviation $\sigma$ (assuming normality) produced by NGBoost. The probability levels were set at $k = 1$ (central 68.27%) and $k = 1.96$ (central 95%). This choice was made because $\mu \pm k\sigma$ represents a well-defined central interval of a normal distribution, allowing the goodness-of-fit of the central part and the tails to be checked separately. The evaluation procedure is as follows. For each sample $i$, the empirical coverage was calculated by tallying whether the measured value $y_i$ fell within $\mu_i \pm k\sigma_i$. To reflect the sampling variability of the proportion estimate given the small sample size, the 95% confidence interval for this coverage proportion was calculated using the Wilson method and reported as well. Additionally, the mean and standard deviation of the standardized residuals were reported to check the appropriateness and bias of the prediction-interval width. The evaluation was performed on the test set, which was not used for training.
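The coverage tally and the Wilson score interval can be sketched in a few stdlib-only lines (the observation, mean, and sigma arrays below are hypothetical, not the study's test-set values):

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for an empirical proportion k/n,
    used to report sampling uncertainty of the coverage estimate."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical observations with NGBoost-style (mu, sigma) predictions
y_obs = [1.0, 2.5, 3.0]
mu = [1.1, 2.0, 3.0]
sigma = [0.2, 0.2, 0.1]

# Count samples inside the central 95% interval mu +/- 1.96*sigma
hits = sum(1 for yi, mi, si in zip(y_obs, mu, sigma) if abs(yi - mi) <= 1.96 * si)
cov = hits / len(y_obs)
lo, hi = wilson_interval(hits, len(y_obs))
```

With only three samples the Wilson interval is very wide, which is exactly the sampling-variability effect the method is meant to expose on a small test set.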

2.5. SHAP

ML models like CatBoost provide high prediction accuracy, but their internal decision-making processes are complex, leading them to be considered ‘black box’ models. To trust the prediction results of such models and gain physical insights into the scour phenomenon, an interpretation technique that explains the basis of the predictions is essential. This study aims to interpret the developed model by applying SHAP, one of the most widely used techniques in the field of eXplainable AI (XAI). SHAP utilizes the Shapley Value, based on Cooperative Game Theory, to quantitatively calculate how much each input variable contributed—either positively or negatively—to a single prediction value. That is, it has an additive property, meaning the model’s prediction value can be decomposed into the sum of the contributed values from all variables. The explanation model g for a specific prediction is expressed as follows:
$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i$, (13)

In Equation (13), $z'$ is a binary vector representing the presence or absence of each variable, $M$ is the total number of input variables, $\phi_0$ is the base value (the prediction when no input variables are present), and $\phi_i$ is the prediction contribution of variable $i$, i.e., its SHAP value. This additive explanation model corresponds to the SHAP feature attribution framework based on Shapley values [38].
Through this approach, SHAP provides tools to analyze the overall trends of the model. Model interpretation is performed to understand which variables the model learned as important across the entire dataset. This is performed by calculating the feature importance by taking the mean absolute SHAP value for each data point. The SHAP summary plot not only shows the variable importance in order but also visualizes the distribution and directionality of each variable’s impact on the SHAP value. This allows for an intuitive understanding of the model’s overall learned patterns, such as “a higher value of variable X tends to increase the scour prediction.”
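For a model with very few features, the Shapley values underlying Equation (13) can be computed exactly by enumerating all coalitions. The sketch below uses a hypothetical two-feature toy model (absent features are replaced by baseline values, one simple way of "removing" a feature); production SHAP implementations use far more efficient tree-specific algorithms:

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for a small model by coalition enumeration.
    Features absent from a coalition take their baseline value."""
    n = len(x)
    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for r in range(n):
            for S in combinations(others, r):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                contrib += w * (value(set(S) | {i}) - value(set(S)))
        phi.append(contrib)
    return phi

# Hypothetical toy model with an interaction term between the two inputs
model = lambda z: 2.0 * z[0] + 0.5 * z[1] + 0.1 * z[0] * z[1]
phi = shapley_values(model, x=[1.0, 2.0], baseline=[0.0, 0.0])
```

The additivity (efficiency) property holds by construction: the SHAP values sum to the difference between the prediction at x and the baseline prediction, which is exactly the decomposition stated in Equation (13).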

2.6. Probabilistic Reliability Analysis Based on Limit States

The single point estimate provided by a point-estimate model has the limitation of not reflecting the uncertainty inherent in the prediction process. This study aims to overcome this limitation by comprehensively evaluating the uncertainty originating from both the model and the data, and, on that basis, to propose rational, risk-based design criteria. To this end, a probabilistic reliability analysis was conducted using the reliability index $\beta$ and the probability of exceedance $P_e$ as core metrics within the framework of Limit State Design (LSD). The goal of this analysis is to evaluate how safely the predicted value exceeds the measured value. When the limit state function is defined as $g = R - S$, the system’s safety level is represented by the reliability index $\beta$, as in classical structural reliability theory [39]:

$\beta = \dfrac{\mu_R - \mu_S}{\sqrt{\sigma_R^2 + \sigma_S^2}}$ (14)
In Equation (14), $\mu_R$ and $\sigma_R$ are the mean and standard deviation of the model prediction (resistance), and $\mu_S$ and $\sigma_S$ are the mean and standard deviation of the measurement (load). The higher the $\beta$ value, the safer the system, and the probability of exceedance $P_e$, i.e., the probability that the actual scour will exceed the prediction, is calculated using the standard normal cumulative distribution function $\Phi$ [28]:

$P_e = \Pr[g \le 0] = \Phi(-\beta)$ (15)
To adjust the conservatism of the model prediction, a correction factor α is introduced, and a modified limit state function g α = α R S is considered. Equations (16)–(19) are obtained in this study by applying the same β -index definition in Equations (14) and (15) to the modified limit state g α = α R S . In this case, the mean, variance, reliability index, and probability of failure according to the correction factor are as follows:
μ_g(α) = α μ_R − μ_S,   (16)
σ_g²(α) = α² σ_R² + σ_S²,   (17)
β(α) = μ_g(α) / σ_g(α),   (18)
P_e(α) = Φ(−β(α))   (19)
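Equations (14)–(19) can be sketched as a small helper, assuming independent, approximately normal R and S; the illustrative call uses the aggregate statistics reported later in Section 3.4 (μ_R = 1.400, σ_R = 0.361, μ_S = 1.446, σ_S = 0.473):

```python
from math import sqrt
from statistics import NormalDist

def reliability(mu_R, sigma_R, mu_S, sigma_S, alpha=1.0):
    """Reliability index and exceedance probability for g_alpha = alpha*R - S,
    assuming independent normal R and S, per Eqs. (14)-(19)."""
    mu_g = alpha * mu_R - mu_S                             # Eq. (16)
    sigma_g = sqrt((alpha * sigma_R) ** 2 + sigma_S ** 2)  # Eq. (17)
    beta = mu_g / sigma_g                                  # Eq. (18)
    p_e = NormalDist().cdf(-beta)                          # Eq. (19): P_e = Phi(-beta)
    return beta, p_e

# Reproduces the alpha = 1 case of Section 3.4: beta ~ -0.077, P_e ~ 0.53
beta, p_e = reliability(1.400, 0.361, 1.446, 0.473)
```

With alpha = 1 this recovers Equations (14) and (15) exactly, so a single function covers both the raw and corrected limit states.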
The overall statistical characteristics (μ_R, σ_R) of the model prediction were estimated through Monte Carlo simulation (MCS). The probability distributions of the four input variables X = (b/d50, y/b, Fr, V/Vc) were determined by analyzing the distributions of the measured data. Based on these distributions, 10,000 input scenarios were generated and constrained to remain within the measured range. For each scenario X_i, a predictive distribution is obtained from the probabilistic prediction model:
R | X_i ~ N(μ(X_i), σ²(X_i))   (20)
In Equation (20), μ(X_i) is the mean prediction and σ²(X_i) is the model's own predictive variance. This predictive distribution directly follows the output parameterization of the NGBoost model used in this study. Applying the law of total variance, the mean and variance of the overall model prediction were aggregated as follows:
μ_R = E[μ(X)],   (21)
σ_R² = Var[μ(X)] + E[σ²(X)]   (22)
Equation (22) decomposes the final prediction variability into a component due to the variability of the input variables, Var[μ(X)], and a component due to the model's own uncertainty, E[σ²(X)]. The statistical properties of the measurements (μ_S, σ_S) were calculated directly from the full set of measured data. Based on these probabilistic characteristics and the NGBoost-based uncertainty estimates, this study quantitatively evaluates the reliability of the model prediction and analyzes correction measures for a target safety level.
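The law-of-total-variance aggregation of Equations (21) and (22) can be sketched as follows; the per-scenario arrays below are illustrative toy values, not the actual MCS output:

```python
import numpy as np

def aggregate_prediction_stats(mu, sigma):
    """Overall (mu_R, sigma_R) from per-scenario predictive outputs via the law of
    total variance: Var_total = Var[mu(X)] + E[sigma^2(X)] (Eqs. (21)-(22))."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    mu_R = mu.mean()                           # Eq. (21): E[mu(X)]
    var_R = mu.var() + np.mean(sigma ** 2)     # Eq. (22): Var[mu(X)] + E[sigma^2(X)]
    return float(mu_R), float(np.sqrt(var_R))

# Toy per-scenario means and standard deviations (illustrative only)
mu_R, sigma_R = aggregate_prediction_stats([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])
```

The two terms of `var_R` correspond directly to the decomposition described above: input-driven variability and model uncertainty, respectively.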

3. Results

3.1. Evaluation of Model Performance

Table 2 shows the performance evaluation results of the developed machine learning models. The CatBoost model showed high performance on the training dataset (RMSE 0.18, R2 0.88) and recorded RMSE 0.22 and R2 0.76 on the test dataset, which was not used for training. The NGBoost model showed similar trends, with RMSE 0.21 and R2 0.82 on the training dataset and RMSE 0.23 and R2 0.75 on the test dataset. Both models thus showed only a minimal drop in performance between the training and test datasets, confirming stable generalization. This can also be seen in Figure 2, where the predictions of both models cluster tightly around the ideal 1:1 line. Considering all metrics together, CatBoost was slightly superior in point-prediction performance.
However, unlike models that provide only a single point estimate, NGBoost provides a probability distribution for each prediction. This is visualized in Figure 3, which shows each predicted mean along with its uncertainty bounds. Notably, the length of the error bar is not uniform across predictions: NGBoost evaluates the reliability of each prediction separately for each input condition and quantifies the corresponding uncertainty. These per-point probabilistic results can serve as input for evaluating system reliability or quantifying risk in subsequent analyses. Therefore, despite the slight difference in point-prediction performance, the NGBoost model was selected as the final analysis model because its probabilistic outputs enable the subsequent reliability analysis.
The probabilistic predictions of NGBoost on the test data (n = 56) were checked using central two-sided intervals (Figure 4). The measured coverage for μ ± σ (nominal ≈ 68%) was 69.6% (39/56), and for μ ± 1.96σ (nominal ≈ 95%) it was 94.6% (53/56); both are very close to the nominal levels (68.27%, 95.00%). The Wilson 95% confidence intervals, which reflect sampling variability, were 56.7–80.1% (μ ± σ) and 85.4–98.2% (μ ± 1.96σ), both of which contain the nominal levels. The coverage promised by the model is therefore statistically consistent with the test data. The error structure also aligns: the mean of the standardized residuals, Z = (y − μ)/σ, was −0.125, indicating small bias, and their standard deviation was 1.077, close to the ideal value of 1. In other words, the variance estimated by the model generally matches the actual variability. Although mild heteroscedasticity is observed (the interval widens slightly as the response increases), the measured coverage remains near the nominal level across the entire range. Only 3 points fell outside ±1.96σ, indicating no excessive underestimation or overestimation in the tails.
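The coverage check and Wilson interval used above can be reproduced with a short sketch; the Wilson score formula is standard, and the illustrative call uses the paper's count of 39 in-band points out of 56:

```python
import numpy as np
from math import sqrt

def empirical_coverage(y, mu, sigma, k):
    """Fraction (and count) of observations inside the central band mu +/- k*sigma."""
    inside = np.abs((np.asarray(y) - np.asarray(mu)) / np.asarray(sigma)) <= k
    return float(inside.mean()), int(inside.sum())

def wilson_interval(hits, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = hits / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# The 68% band: 39 of 56 test points inside -> Wilson CI ~ 56.7-80.1%
lo68, hi68 = wilson_interval(39, 56)
```

The Wilson interval is preferred over the simple normal approximation here because the sample is small (n = 56) and the proportion is away from 0.5 for the 95% band.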

3.2. Model Interpretation

ML models provide excellent predictive accuracy, but their internal workings are complex and difficult to interpret. This lack of transparency is a significant obstacle to establishing trust in the model's predictions and applying them in practice. In line with recent advances in XAI, SHAP has also been used to interpret complex hybrid deep-learning architectures in other domains, where feature-level visualizations show how individual inputs increase or decrease the model output [40]. In this study, SHAP was used as a model interpretation tool to analyze how each hydraulic variable increases or decreases the predicted scour depth in the NGBoost model.
Figure 5a shows the global importance of each input variable on the model prediction, represented by the mean absolute SHAP value. The NGBoost model recognizes y/b as the most critical factor when predicting scour depth, followed by b/d50, V/Vc, and Fr. The finding that y/b has the greatest influence on dimensionless scour depth can also be found in previous studies [25,41].
While Figure 5a only shows the overall magnitude of each variable’s influence, Figure 5b simultaneously displays the directionality of the influence according to the variable’s value. Each dot represents a single data point, the color indicates the magnitude of the variable’s value, and the horizontal axis position represents the impact on that prediction. In the case of y/b, red dots (high values) are skewed toward positive SHAP values, indicating that a high y/b contributes to increasing the scour prediction. This is consistent with the hydraulic expectation that an increase in water depth intensifies scour. However, for the remaining variables, the boundary between negative and positive SHAP values is indistinct, suggesting that a non-linear correlation exists between the input variables and the target variable.
Figure 6 shows the effect of each input variable on the predicted non-dimensional scour depth via SHAP dependence plots. y/b, the most dominant variable, showed a three-stage pattern (Figure 6a): a negative contribution at very shallow depths, a sharp transition to a positive contribution as the depth increases, and a plateau once the depth becomes sufficiently deep. This confirms that the model has learned, consistently with previous studies, that scour depth increases with flow depth at shallow depths but that beyond a certain threshold the rate of increase diminishes sharply, becoming nearly horizontal [5,23,42].
V/Vc exhibits a non-monotonic pattern (Figure 6b). At low V/Vc values the SHAP value is generally negative, but it rapidly transitions to positive as the flow approaches the critical condition (V/Vc ≈ 1). After a local peak near the critical condition, a mitigation phase appears in which the contribution to scour depth temporarily decreases; at high V/Vc values, a clear positive upward trend is observed again. This aligns with previous research demonstrating a phased behavior: below the critical velocity, upstream sediment transport is suppressed; once the threshold is exceeded, sediment supply from upstream resumes, temporarily mitigating scour; and as the flow velocity increases further, scour intensifies again [31].
As b/d50 increases, the SHAP value shows an inverted U-shape, reaching a peak in the b/d50 = 30 range and then decreasing at values smaller or larger than that (Figure 6c). Some points are also observed to drop to a negative contribution in each region. This pattern might be a signal that its influence on scour depth is weakened. This is likely because, given the normalized response ys/b and the high d50-Vc correlation, the variance of b and V in the data is relatively smaller than that of d50, causing some of the effect from high-velocity segments to be absorbed by the V/Vc ratio.
In the case of Fr (Figure 6d), previous research has reported that, all else being equal, the average scour simply increases as Fr increases [2]. However, Fr = V/√(gy), so Fr combines both y and V. If an increase in Fr in the data occurs mainly through a decrease in y rather than an increase in V, the Fr increase coincides with a decrease in y/b, which can produce a trend of decreasing average scour. Furthermore, the re-increase at high Fr values can be interpreted as the Fr increase being led predominantly by an increase in V, or as y/b already being large enough that its influence on scour depth has saturated. The SHAP analysis of these input variables confirms that the NGBoost model has learned the hydraulic mechanisms reported in previous research in a consistent manner, demonstrating that the model is not a simple statistical regression but a physically interpretable model that reflects the actual hydraulics of scour.

3.3. Comparison with Existing Empirical Formulas

To further validate the predictive performance of the proposed machine learning model, its predictions were compared with four major existing empirical formulas: Wilson [41], Melville [43], HEC-18 [6], and Briaud [44]. The mathematical expressions and variable definitions of these formulas are summarized in Table 3, and the performance comparison is presented in Table 4. Evaluated on the same dataset, the machine learning model showed a distinct improvement over the empirical formulas across all metrics. In predictive accuracy (RMSE), the machine learning model achieved 0.22, approximately 3 times lower than HEC-18 (0.62) and more than 5 times lower than Wilson (1.07) and Briaud (1.15). Its R2 of 0.80 indicates that the model explains 80% of the data's variability, whereas the empirical formulas showed low explanatory power, ranging from 0.10 to 0.35. In the bias (B) metric in particular, the empirical formulas showed a clear tendency to overestimate scour depth, with consistent positive biases between 0.41 and 0.76, while the machine learning model's bias of 0.0006 is essentially zero. The machine learning model also recorded lower SI (0.15) and Se (0.22) values than the empirical formulas. The NGBoost model thus demonstrated superior performance in all evaluation metrics, including accuracy, agreement, and stability, suggesting that it can precisely predict the scour phenomenon by effectively learning the actual hydraulic relationships.
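The comparison metrics can be sketched as below, assuming the common definitions (bias B as the mean residual, scatter index SI as RMSE normalized by the observed mean, and Se as the standard deviation of the residuals); the paper's exact definitions are given in its methods section and may differ in detail:

```python
import numpy as np

def scour_metrics(y_obs, y_pred):
    """Comparison metrics under assumed common definitions: RMSE; R^2;
    bias B = mean(pred - obs); SI = RMSE / mean(obs); Se = std of residuals."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_pred - y_obs
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = float(1.0 - np.sum(resid ** 2) / np.sum((y_obs - y_obs.mean()) ** 2))
    return {"RMSE": rmse, "R2": r2, "B": float(resid.mean()),
            "SI": rmse / float(y_obs.mean()), "Se": float(resid.std())}

# A constant positive offset shows up entirely in B (and RMSE), not in Se
m = scour_metrics([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
```

Separating B and Se in this way is what lets the comparison distinguish the systematic overestimation of the empirical formulas from random scatter.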

3.4. Probabilistic Reliability Analysis

For the probabilistic reliability analysis, we first specified probabilistic models for the four non-dimensional input variables y/b, V/Vc, b/d50, and Fr. The empirical distributions of these variables were examined using histograms of all 552 laboratory measurements (Figure 7). All four variables take only positive values and exhibit right-skewed shapes with long upper tails. Based on this positivity and right-skewness, each variable was assumed to follow a lognormal distribution. For each variable X, the log-transformed data Z = ln X were used to estimate the distribution parameters: the sample mean μ_ln and standard deviation σ_ln of Z were computed and adopted as the lognormal parameters. To prevent unrealistically small or large values during sampling, each lognormal distribution was truncated at the empirical 2.5th and 97.5th percentiles of the observed data, denoted x_min and x_max, which contain approximately 95% of the measurements. A total of 10,000 input scenarios were then generated from these truncated lognormal distributions and propagated through the NGBoost model. The adopted distribution types and estimated parameters are summarized in Table 5.
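The truncated lognormal sampling can be sketched with numpy rejection sampling; the parameters below are illustrative placeholders for a single variable, not the fitted values from Table 5:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_truncated_lognormal(mu_ln, sigma_ln, x_min, x_max, n):
    """Rejection sampling from lognormal(mu_ln, sigma_ln) truncated to
    [x_min, x_max] (the empirical 2.5th/97.5th percentile bounds)."""
    samples = np.empty(0)
    while samples.size < n:
        draw = rng.lognormal(mean=mu_ln, sigma=sigma_ln, size=n)
        keep = draw[(draw >= x_min) & (draw <= x_max)]
        samples = np.concatenate([samples, keep])
    return samples[:n]

# Placeholder parameters (hypothetical, NOT the Table 5 estimates)
scenarios = sample_truncated_lognormal(mu_ln=0.0, sigma_ln=0.5,
                                       x_min=0.4, x_max=2.7, n=10_000)
```

Rejection sampling is efficient here because the truncation bounds sit at the 2.5th and 97.5th percentiles, so roughly 95% of raw draws are accepted on each pass.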
The NGBoost model developed in this study was used as the probabilistic prediction model. The overall prediction statistics, aggregated by applying the law of total variance to the per-scenario predictions (μ(X_i), σ²(X_i)), were μ_R = 1.400 and σ_R = 0.361. Compared with the measurement statistics (μ_S = 1.446, σ_S = 0.473), the model on average underestimates by about 3% and also estimates a somewhat smaller overall variability. Without a correction factor (α = 1), the reliability index was β ≈ −0.077, corresponding to a probability of exceedance of P_e = Φ(0.077) ≈ 0.53. This implies a 53% probability that the actual scour will exceed the prediction when the raw predicted value is used.
Figures 8 and 9 quantitatively show how the system's safety level (reliability index and probability of exceedance) changes as the conservatism of the prediction is increased through the correction factor α. The α–β curve shows that β increases consistently with α (Figure 8), confirming the intuitive expectation that inflating the prediction improves safety. Likewise, the α–P_e curve shows that P_e decreases as α increases (Figure 9). An important feature of both curves, however, is that the efficiency of safety improvement gradually diminishes as α increases. The slope of the α–β curve steadily decreases (Figure 8), meaning that a higher safety level requires a disproportionately larger increase in the correction factor; the α–P_e curve, plotted on a log scale, likewise flattens (Figure 9). This trade-off, in which the cost of additional safety grows as the target safety level rises, allows the optimal correction factor α to be determined by setting a target reliability index or probability of exceedance:
β = 0.5 (reliability ≈ 69%) → α ≈ 1.268
β = 1.0 (reliability ≈ 84%) → α ≈ 1.557
β = 1.5 (reliability ≈ 93%) → α ≈ 1.937
β = 2.0 (reliability ≈ 97.7%) → α ≈ 2.478
β = 2.5 (reliability ≈ 99.4%) → α ≈ 3.353
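Because β(α) = (αμ_R − μ_S)/√(α²σ_R² + σ_S²) is monotonically increasing in α for positive statistics, the correction factor for a target reliability index can be found by simple bisection. A sketch using the aggregate statistics from the analysis above:

```python
from math import sqrt

def alpha_for_target_beta(beta_target, mu_R, sigma_R, mu_S, sigma_S,
                          lo=0.1, hi=10.0, tol=1e-8):
    """Bisection for alpha such that beta(alpha) = beta_target under the modified
    limit state g = alpha*R - S. Note beta(alpha) -> mu_R/sigma_R as alpha -> inf,
    so targets above that ceiling are unreachable."""
    def beta(a):
        return (a * mu_R - mu_S) / sqrt((a * sigma_R) ** 2 + sigma_S ** 2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta(mid) < beta_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# With mu_R = 1.400, sigma_R = 0.361, mu_S = 1.446, sigma_S = 0.473,
# a target beta = 0.5 yields alpha ~ 1.268, matching the tabulated value
alpha_05 = alpha_for_target_beta(0.5, 1.400, 0.361, 1.446, 0.473)
```

Repeating the call for each target β reproduces the α values listed above within rounding, which serves as a consistency check on the reported curves.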

4. Discussion

Both CatBoost and NGBoost showed good generalization performance in predicting local scour depth around bridge piers. CatBoost was slightly superior in terms of point prediction accuracy, while NGBoost was somewhat more stable in MAPE and bias on the test dataset, and both models maintained consistent performance without overfitting. However, NGBoost was selected as the final analysis model as it is more suitable for subsequent probabilistic design analysis, thanks to its advantage of providing individual conditional prediction distributions. In contrast to earlier pier-scour studies that mainly relied on deterministic machine learning models for point predictions, the present study adopts NGBoost to directly learn the conditional distribution of scour depth given the hydraulic variables. As an additional benefit, this probabilistic formulation naturally provides uncertainty-aware predictions that can be propagated into the proposed reliability-based design framework.
Validation of the probabilistic predictions showed that the measured coverage on the test set for μ ± σ (68%) was 69.6%, and for μ ± 1.96σ (95%) it was 94.6%, both close to the nominal levels (68.27%, 95.00%). The corresponding 95% Wilson binomial confidence intervals were 56.7–80.1% and 85.4–98.2%, respectively, both of which include the nominal levels. The mean of the standardized residuals was −0.125 and their standard deviation 1.077, confirming that the prediction interval widths generally match the actual variability.
The SHAP analysis results showed that y/b was the primary influencing variable, followed by b/d50, V/Vc, and Fr. y/b showed a three-stage pattern in which it increased scour up to a certain level and then plateaued, while V/Vc showed a transition near the critical condition and a re-ascent in the high-velocity region. b/d50 showed a peak contribution in the mid-range followed by mitigation, and Fr showed non-monotonicity due to the combined effect of y and V. These patterns are consistent with prior hydraulic knowledge, suggesting that the model learned the physical mechanisms rather than merely fitting statistics. Furthermore, local interpretation demonstrated NGBoost's ability to quantify the data's inherent variability, reporting the limits of information unexplained by the model as high standard deviations.
In the performance comparison against empirical formulas, NGBoost significantly outperformed the four empirical models (Wilson, Melville, HEC-18, Briaud) on all metrics, such as RMSE and R2, and virtually eliminated bias. This is presumed to be because it captured and explained the complex interactions more effectively than the empirical formulas.
In the probabilistic reliability analysis, the model mean underestimated the measurements by about 3%, and the variance was also slightly underestimated. At a correction factor of α = 1, the reliability index was β ≈ −0.077 with P_e ≈ 53%. The α–β and α–P_e curves allow the α corresponding to a target reliability level to be determined, confirming that the efficiency of additional conservatism decreases as the safety level increases. In practice, one first sets a target probability of exceedance or reliability level and then selects the corresponding correction factor.
This study also has several limitations. Because the analysis was conducted based solely on laboratory data, it does not fully reflect the complex geomorphological and hydraulic conditions of actual rivers. In addition, some potentially important explanatory variables were not considered in the model, so there is a possibility of expanded prediction uncertainty when these factors vary. Future work should therefore include extended validation using field data as well as model development that incorporates additional variables and more diverse hydraulic conditions.

5. Conclusions

This study developed gradient-boosting models (CatBoost and NGBoost) for predicting local scour depth around bridge piers using non-dimensionalized laboratory data and embedded them in a probabilistic reliability framework. Although CatBoost showed slightly better point-prediction accuracy, NGBoost achieved comparable generalization performance and was ultimately adopted as the final model because it directly learns the conditional distribution of scour depth and supports uncertainty-aware predictions for design.
The probabilistic predictions of NGBoost were reasonably well calibrated, and the learned uncertainty bands were broadly consistent with the observed variability. SHAP-based global and local interpretations showed that the influence patterns of y/b, V/Vc, b/d50, and Fr on scour depth are physically plausible and consistent with prior hydraulic knowledge, indicating that the model captures meaningful mechanisms rather than merely fitting noise. Compared with conventional empirical formulas, the proposed model substantially reduced prediction errors and bias, highlighting the benefit of data-driven modeling for complex hydraulic–sediment interactions.
By combining interpretable machine learning, probabilistic prediction, and reliability analysis, this study provides a rational basis for reliability-informed design against local scour at bridge piers. Future work should extend the framework to field datasets and incorporate additional hydraulic and geomorphological variables to further improve robustness and practical applicability.

Author Contributions

Conceptualization, J.C. and J.K.; methodology, formal analysis and validation, J.C. and T.K.; writing—original draft preparation, J.C. and J.K.; writing—review and editing, J.C., S.K. and T.K.; supervision, S.K. and T.K.; funding acquisition, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “Graduate school of Green Restoration specialization” of Korea Environmental Industry & Technology Institute grant funded by the Ministry of Environment, Republic of Korea.

Data Availability Statement

The data presented in this study are available in [A Pier-Scour Database: 2427 Field and Laboratory Measurements of Pier Scour] at [pubs.usgs.gov/ds/0845/], reference number [30]. These data were derived from the following resources available in the public domain: [pubs.usgs.gov/ds/0845/].

Conflicts of Interest

Author Jongyeong Kim was employed by the company K-Watercraft. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Briaud, J.L.; Ting, F.C.K.; Chen, H.C.; Gudavalli, R.; Perugu, S.; Wei, G. SRICOS: Prediction of scour rate in cohesive soils at bridge piers. J. Geotech. Geoenviron. Eng. 1999, 125, 237–246. [Google Scholar] [CrossRef]
  2. Wardhana, K.; Hadipriono, F.C. Analysis of Recent Bridge Failures in the United States. J. Perform. Constr. Facil. 2003, 17, 144–150. [Google Scholar] [CrossRef]
  3. Melville, B.; Coleman, S.E. Bridge Scour; Water Resources Publications, LCC: Highlands Ranch, CO, USA, 2000. [Google Scholar]
  4. Brandimarte, L.; Paron, P.; Baldassarre, G.D. Bridge Pier Scour: A Review of Processes, Measurements and Estimates. Environ. Eng. Manag. J. 2012, 11, 975–989. [Google Scholar] [CrossRef]
  5. Deng, L.; Cai, C.S. Bridge Scour: Prediction, Modeling, Monitoring, and Countermeasures-Review. Pract. Period. Struct. Des. Constr. 2010, 15, 125–134. [Google Scholar] [CrossRef]
  6. Arneson, L.A.; Zevenbegan, L.W.; Lagasse, P.F.; Clopper, P.E. Evaluating Scour at Bridges, 5th ed.; Hydraulic Engineering Circular No. 18 (HEC-18); FHWA-HIF-12-003; U.S. Department of Transportation, Federal Highway Administration: Washington, DC, USA, 2012.
  7. Melville, B.W.; Sutherland, A.J. Design Method for Local Scour at Bridge Piers. J. Hydraul. Eng. 1988, 114, 1210–1226. [Google Scholar] [CrossRef]
  8. Sheppard, D.M.; Miller, W., Jr. Live-Bed Local Pier Scour and Experiments. J. Hydraul. Eng. 2006, 132, 635–642. [Google Scholar] [CrossRef]
  9. Gaudio, R.; Grimaldi, C.; Tafarojnoruz, A.; Calomino, F. Comparison of Formulae for the Prediction of Scour Depth at Piers. In Proceedings of the 1st IAHR Europe Congress, Edinburgh, UK, 4–6 May 2010. [Google Scholar]
  10. Johnson, P.A. Comparison of Pier-Scour Equations Using Field Data. J. Hydraul. Eng. 1995, 121, 626–629. [Google Scholar] [CrossRef]
  11. Jones, J.S. Comparison of Prediction Equations for Bridge Pier and Abutment Scour. In Proceedings of the 2nd Bridge Engineering Conference, Washington, DC, USA, 24–26 September 1984. [Google Scholar]
  12. Amini, A.; Mahmmod, T.A.; Ghazali, A.H. Optimizing an Evaluation of Scour Depth Prediction Techniques for Columns as Component of Complex Bridge Piers. Adv. Civ. Eng. Environ. Sci. 2024, 1, 64–69. [Google Scholar]
  13. Amini, A.; Mohammad, T.A.; Aziz, A.A.; Ghazali, A.H.; Huat, B.B.K. A local scour prediction method for pile caps in complex piers. Proc. ICE Water Manag. 2011, 164, 73–80. [Google Scholar] [CrossRef]
  14. Roulund, A.; Sumer, B.M.; Fredsoe, J.; Michelsen, J. Numerical and experimental investigation of flow and scour around a circular pile. J. Fluid Mech. 2005, 534, 351–401. [Google Scholar] [CrossRef]
  15. Khosronejad, A.; Kang, S.; Sotiropoulos, F. Experimental and computational investigation of local scour around bridge piers. Adv. Water Resour. 2012, 37, 73–85. [Google Scholar] [CrossRef]
  16. Sumer, B.M.; Fredsoe, J. The Mechanics of Scour in the Marine Environment; World Scientific: Singapore, 2002. [Google Scholar]
  17. Sharafi, H.; Ebtehaj, I.; Bonakdari, H.; Zaji, A.H. Design of a support vector machine with different kernel functions to predict scour depth around bridge piers. Nat. Hazards 2016, 84, 2145–2162. [Google Scholar] [CrossRef]
  18. Kambekar, A.R.; Deo, M.C. Estimation of pile group scour using neural networks. Appl. Ocean Res. 2003, 25, 225–234. [Google Scholar] [CrossRef]
  19. Bateni, S.M.; Borghei, S.M.; Jeng, D.S. Neural network and neuro-fuzzy assessments for scour depth around bridge piers. Eng. Appl. Artif. Intell. 2007, 20, 401–414. [Google Scholar] [CrossRef]
  20. Lee, T.L.; Jeng, D.S.; Zhang, G.H.; Hong, J.H. Neural Network Modeling for Estimation of Scour Depth Around Bridge Piers. J. Hydrodyn. 2007, 19, 378–386. [Google Scholar] [CrossRef]
  21. Kaya, A. Artificial neural network study of observed pattern of scour depth around bridge piers. Comput. Geotech. 2010, 37, 413–418. [Google Scholar] [CrossRef]
  22. Azamathulla, H.M. Gene-expression programming to predict scour at a bridge abutment. J. Hydroinform. 2012, 14, 324–331. [Google Scholar] [CrossRef]
  23. Pal, M.; Singh, N.K.; Tiwari, N.K. M5 Model tree for pier scour prediction using field dataset. KSCE J. Civ. Eng. 2012, 16, 1079–1084. [Google Scholar] [CrossRef]
  24. Kim, T.; Shahriar, A.R.; Lee, W.D.; Gabr, M.A. Interpretable machine learning scheme for predicting bridge pier scour depth. Comput. Geotech. 2024, 170, 106302. [Google Scholar] [CrossRef]
  25. Kim, T.; Shahriar, A.R.; Lee, W.D.; Choi, Y.; Kwon, S.; Gabr, M.A. Field data-based prediction of local scour depth around bridge piers using interpretable machine learning. Transp. Geotech. 2025, 52, 101567. [Google Scholar] [CrossRef]
  26. Homaei, F.; Najafzadeh, M. A reliability-based probabilistic evaluation of the wave-induced scour depth around marine structure piles. Ocean Eng. 2020, 196, 106818. [Google Scholar] [CrossRef]
  27. Vatani, A.; Jafari-Asl, J.; Ohadi, S.; Hamzehkolaei, N.S.; Ahmadabadi, S.A.; Correia, J.A.F.O. An efficient surrogate model for reliability analysis of the marine structure piles. Marit. Eng. 2023, 176, 176–192. [Google Scholar] [CrossRef]
  28. Jafari-Asl, J.; Seghier, M.E.A.B.; Ohadi, S.; Dong, Y.; Plevris, V. A Comparative Study on the Efficiency of Reliability Methods for the Probabilistic Analysis of Local Scour at a Bridge Pier in Clay-Sand-Mixed Sediments. Modelling 2021, 2, 63–77. [Google Scholar] [CrossRef]
  29. Hosseini, A.R.M.; Razzaghi, M.S. Probabilistic seismic safety assessment of bridges with random pier scouring. Proc. ICE Water Manag. 2024, 177, 838–855. [Google Scholar] [CrossRef]
  30. Benedict, T.; Caldwell, A.W. A Pier-Scour Database: 2,427 Field and Laboratory Measurements of Pier Scour; U.S. Geological Survey Data Series 845; U.S. Geological Survey: Reston, VA, USA, 2014.
  31. Melville, B. The physics of local scour at bridge piers. In Proceedings of the 4th International Conference on Scour and Erosion (ICSE-4), Tokyo, Japan, 5–7 November 2008. [Google Scholar]
  32. U.S. Army Corps of Engineers-Hydrologic Engineering Center (USACE-HEC). HEC-RAS Hydraulic Reference Manual; Version 6.4.1; USACE-HEC: Davis, CA, USA, 2023.
  33. Molinas, A. Bridge Scour in Nonuniform Sediment Mixtures and in Cohesive Materials: Synthesis Report; FHWA-RD-03-083; U.S. Department of Transportation, Federal Highway Administration: McLean, VA, USA, 2004.
  34. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the 32nd Conference on Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  35. Duan, T.; Anand, A.; Ding, D.Y.; Thai, K.K.; Basu, S.; Ng, A.; Schuler, A. NGBoost: Natural Gradient Boosting for Probabilistic Prediction. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), Virtual Event, 13–18 July 2020. [Google Scholar]
  36. Amini, A.; Hamidi, S.; Shirzadi, A.; Behmanesh, J.; Akib, S. Efficiency of artificial neural networks in determining scour depth at composite bridge piers. Int. J. River Basin Manag. 2021, 19, 327–333. [Google Scholar] [CrossRef]
  37. Najafzadeh, M.; Barani, G.A.; Azamathulla, H.M. GMDH to predict scour depth around a pier in cohesive soils. Appl. Ocean Res. 2013, 40, 35–41. [Google Scholar] [CrossRef]
  38. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  39. Nowak, A.S.; Collins, K.R. Reliability of Structures, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  40. Chechkin, A.; Pleshakova, E.; Gataullin, S. A Hybrid KAN-BiLSTM Transformer with Multi-Domain Dynamic Attention Model for Cybersecurity. Technologies 2025, 13, 223. [Google Scholar] [CrossRef]
  41. Wilson, K.V. Scour at Selected Bridge Sites in Mississippi; Water-Resources Investigations Report 94-4241; U.S. Geological Survey: Jackson, MS, USA, 1995.
  42. Raudkivi, A.J.; Ettema, R. Clear-Water Scour at Cylindrical Piers. J. Hydraul. Eng. 1983, 109, 338–350. [Google Scholar]
  43. Melville, B.W. Pier and Abutment Scour: Integrated Approach. J. Hydraul. Eng. 1997, 123, 125–136. [Google Scholar] [CrossRef]
  44. Briaud, J.L. Scour Depth at Bridges: Method Including Soil Properties. 1: Maximum Scour Depth Prediction. J. Geotech. Geoenviron. Eng. 2014, 141, 04014104. [Google Scholar] [CrossRef]
Figure 1. Correlation analysis between input variables.
Figure 2. Comparison of Predicted and Measured values: (a) CatBoost Train; (b) CatBoost Validation; (c) CatBoost Test; (d) NGBoost Train; (e) NGBoost Validation; (f) NGBoost Test.
Figure 3. Comparison of Predicted and Measured values with Conditional Intervals: (a) Train; (b) Validation; (c) Test.
Figure 4. Predictive intervals on the test set.
Figure 5. Analysis of variable importance for predicted value: (a) SHAP feature importance; (b) SHAP summary plot.
Figure 6. SHAP values of input variables: (a) y/b; (b) V/Vc; (c) b/d50; (d) Fr.
Figure 7. Distribution feature of input data: (a) y/b; (b) V/Vc; (c) b/d50; (d) Fr.
Figure 8. Relation between correction factor and reliability index.
Figure 9. Relation between correction factor and probability of exceedance.
Table 1. Range of variables in data used in the study.
| Statistic | b (m) | V (m/s) | Vc (m/s) | y (m) | d50 (mm) | ys (m) | Data |
|---|---|---|---|---|---|---|---|
| Min | 0.0152 | 0.1494 | 0.2225 | 0.0213 | 0.22 | 0.0030 | 552 |
| Max | 0.9144 | 2.1580 | 1.2741 | 1.8989 | 7.80 | 1.4112 | |
| Mean | 0.1067 | 0.5121 | 0.4359 | 0.2682 | 1.19 | 0.1341 | |
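The four non-dimensional model inputs (y/b, V/Vc, b/d50, Fr) follow directly from the dimensional quantities in Table 1. A minimal sketch of that mapping, assuming d50 is given in mm and Fr is the approach-flow Froude number (function and variable names are illustrative, not from the paper's code):

```python
import math


def dimensionless_inputs(b, V, Vc, y, d50_mm, g=9.81):
    """Map dimensional flume variables to the four model inputs.

    b      : pier width (m)
    V      : mean approach velocity (m/s)
    Vc     : critical velocity for sediment motion (m/s)
    y      : approach flow depth (m)
    d50_mm : median sediment size (mm)
    """
    d50 = d50_mm / 1000.0  # mm -> m so that b/d50 is dimensionless
    return {
        "y/b": y / b,                      # flow shallowness
        "V/Vc": V / Vc,                    # flow intensity
        "b/d50": b / d50,                  # relative sediment coarseness
        "Fr": V / math.sqrt(g * y),        # approach-flow Froude number
    }


# Example using the mean values from Table 1
x = dimensionless_inputs(b=0.1067, V=0.5121, Vc=0.4359, y=0.2682, d50_mm=1.19)
```

With the Table 1 means, this yields y/b ≈ 2.5 and V/Vc ≈ 1.2, i.e., a typical experiment sits near the clear-water/live-bed threshold.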
Table 2. Performance evaluation results of CatBoost, NGBoost.
| Model | Data | RMSE | R² | SI | Bias | Se | MAPE (%) |
|---|---|---|---|---|---|---|---|
| CatBoost | Train | 0.18 | 0.86 | 0.13 | 0.0002 | 0.18 | 11.64 |
| CatBoost | Validation | 0.24 | 0.77 | 0.17 | −0.0078 | 0.24 | 15.72 |
| CatBoost | Test | 0.22 | 0.76 | 0.15 | 0.0312 | 0.22 | 17.93 |
| NGBoost | Train | 0.21 | 0.82 | 0.14 | −0.0010 | 0.21 | 14.38 |
| NGBoost | Validation | 0.25 | 0.75 | 0.18 | −0.0047 | 0.25 | 17.39 |
| NGBoost | Test | 0.23 | 0.75 | 0.16 | 0.0184 | 0.23 | 17.17 |
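The metrics in Table 2 can be reproduced from paired observed/predicted scour depths. The sketch below assumes the standard definitions (SI as RMSE normalized by the observed mean, Bias as the mean error, Se as the sample standard deviation of the residuals); the paper may use slightly different conventions:

```python
import numpy as np


def regression_metrics(obs, pred):
    """Point-prediction error metrics, assuming standard definitions."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    resid = pred - obs
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {
        "RMSE": rmse,
        "R2": r2,
        "SI": rmse / obs.mean(),                     # scatter index
        "Bias": resid.mean(),                        # mean error
        "Se": resid.std(ddof=1),                     # std. dev. of residuals
        "MAPE": 100.0 * np.mean(np.abs(resid / obs)),
    }


m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

A small near-zero Bias alongside Se ≈ RMSE, as in Table 2, indicates that the model errors are dominated by scatter rather than by a systematic offset.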
Table 3. Empirical formulas used for model comparison.
| Empirical Formula | Expression | Variables | Key Aspects |
|---|---|---|---|
| Wilson | ys/b = 0.9 (y/b)^0.4 | b = projected pier width | Data = Field; ys = 0.2–6.2 m; y = 0.7–11.2 m; V = 0.4–3.2 m/s; d50 = 0.3–7.6 mm |
| Melville | ys = K_yb K_I K_d K_s K_θ K_G | K_yb = pier depth–size factor; K_I = flow intensity factor; K_d = sediment size factor; K_s = pier nose shape factor; K_θ = pier alignment factor; K_G = channel geometry factor (= 1 for pier) | Data = Field, Lab; Pier type = cylindrical pier |
| HEC-18 | ys/b = 2 K1 K2 K3 (y/b)^0.35 Fr^0.43 | K1 = pier nose shape factor; K2 = pier alignment factor; K3 = bed condition factor; Fr = Froude number | Data = Field, Lab; d50 = 0.2–0.5 mm |
| Briaud | ys/b = 2.2 K_pw K_psh K_pa K_psp (2.6 F_pier − F_c(pier))^0.7 | K_pw = water depth influence factor; K_psh = pier shape influence factor; K_pa = aspect ratio influence factor; K_psp = pier spacing influence factor; F_pier = pier Froude number; F_c(pier) = critical pier Froude number | Data = Lab; d50 = 0.1–0.6 mm; critical shear stress = 0.1–0.8 Pa |
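The HEC-18 expression from Table 3 is simple enough to sketch directly. The helper below is illustrative only: the default correction factors are placeholders (e.g., K3 = 1.1 is a common clear-water choice), not design recommendations:

```python
def hec18_scour_depth(b, y, Fr, K1=1.0, K2=1.0, K3=1.1):
    """HEC-18 pier scour equation, ys/b = 2 K1 K2 K3 (y/b)^0.35 Fr^0.43.

    b  : pier width (m)
    y  : approach flow depth (m)
    Fr : approach-flow Froude number
    K1, K2, K3 : pier nose shape, alignment, and bed-condition factors
    (defaults here are illustrative placeholders).
    Returns the scour depth ys in metres.
    """
    return b * 2.0 * K1 * K2 * K3 * (y / b) ** 0.35 * Fr ** 0.43


ys = hec18_scour_depth(b=0.1, y=0.3, Fr=0.3)
```

Because both exponents are positive, predicted scour grows monotonically with relative depth and Froude number, which matches the SHAP trends reported for V/Vc and Fr.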
Table 4. Comparison of empirical formulas and machine learning model.
| Model | RMSE | R² | Bias | SI | Se |
|---|---|---|---|---|---|
| Wilson | 1.07 | 0.11 | 0.41 | 0.74 | 0.99 |
| Melville | 0.78 | 0.35 | 0.68 | 0.54 | 0.40 |
| HEC-18 | 0.62 | 0.32 | 0.42 | 0.43 | 0.46 |
| Briaud | 1.15 | 0.10 | 0.76 | 0.79 | 0.86 |
| NGBoost | 0.22 | 0.80 | 6 × 10⁻⁴ | 0.15 | 0.22 |
Table 5. Probabilistic models assumed for input variables in the Monte Carlo simulation.
| Variable | Distribution Type | μln | σln | xmin | xmax |
|---|---|---|---|---|---|
| y/b | Lognormal | 4.5753 | 1.1266 | 9.9798 | 901.1322 |
| V/Vc | Lognormal | 0.9507 | 0.9680 | 0.3979 | 13.3041 |
| b/d50 | Lognormal | −1.1582 | 0.6148 | 0.1091 | 1.0588 |
| Fr | Lognormal | 0.0858 | 0.5179 | 0.5035 | 3.9048 |
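Given the lognormal parameters in Table 5, Monte Carlo input samples can be drawn as sketched below. Truncating each variable to its observed range [xmin, xmax] by rejection is one plausible way to honor the bounds; the paper does not specify the exact truncation mechanism, so treat this as an assumption:

```python
import numpy as np


def sample_truncated_lognormal(mu_ln, sigma_ln, xmin, xmax, n, rng=None):
    """Draw n lognormal samples, rejecting values outside [xmin, xmax].

    mu_ln and sigma_ln parameterize the underlying normal of ln(X),
    matching the μln/σln columns of Table 5. Rejection sampling keeps
    every accepted draw inside the observed data range.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(0)
    while out.size < n:
        x = rng.lognormal(mean=mu_ln, sigma=sigma_ln, size=n)
        out = np.concatenate([out, x[(x >= xmin) & (x <= xmax)]])
    return out[:n]


# e.g. Fr ~ Lognormal(0.0858, 0.5179), truncated to [0.5035, 3.9048]
fr = sample_truncated_lognormal(0.0858, 0.5179, 0.5035, 3.9048, n=10_000)
```

Sampling all four inputs this way and pushing them through the trained model gives the scour-depth distribution from which the reliability index β and exceedance probability are estimated.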

Share and Cite

Choi, J.; Kim, J.; Kwon, S.; Kim, T. Probabilistic Prediction of Local Scour at Bridge Piers with Interpretable Machine Learning. Water 2025, 17, 3574. https://doi.org/10.3390/w17243574