A Hybrid Surrogate Model for the Prediction of Solitary Wave Forces on the Coastal Bridge Decks

Abstract: To facilitate the establishment of a probabilistic model for quantifying the vulnerability of coastal bridges to natural hazards, and to support the associated risk assessment and mitigation activities, it is imperative to develop an accurate and efficient method for wave force prediction. With the rapid development of computer science, surrogate modeling techniques have been commonly used as an effective alternative to computational fluid dynamics for establishing predictive models in coastal engineering. In this paper, a hybrid surrogate model is proposed for the efficient and accurate prediction of the solitary wave forces acting on coastal bridge decks. The underlying idea of the proposed method is to enhance the prediction capability of the constructed model by introducing an additional surrogate to correct the errors made by the main predictor. Specifically, the regression-type polynomial chaos expansion (PCE) is employed as the main predictor to capture the global feature of the computational model, whereas the interpolation-type Kriging is adopted to learn the local variations of the prediction error from the PCE. An engineering case is employed to validate the effectiveness of the hybrid model. The prediction performance of the hybrid model (in terms of root mean square error and correlation coefficient) is superior to that of the optimal PCE and the artificial neural network (ANN) for both horizontal and vertical wave forces, even though the maximum PCE degrees used in the hybrid model are lower than the optimal degrees identified in the pure PCE model. Moreover, the proposed hybrid model also enables the extraction of explicit predictive equations for the parameters of interest. It is expected that the hybrid model could be extended to more complex wave conditions and structural shapes to facilitate the life-cycle structural design and analysis of coastal bridges.


Introduction
With the development of coastal communities and the tourist economy, the construction of coastal bridges is indispensable for establishing a complete and efficient transportation network that meets daily commuting needs and facilitates rescue efforts after an extreme natural disaster. However, coastal bridges are often exposed to severe environmental conditions during their service life, and recent extreme natural events have demonstrated the vulnerability of coastal bridges to the wave forces generated by hurricanes and tsunamis, especially for low-lying bridges that are inadequately designed for storm surge and wave-induced forces [1][2][3][4]. Indeed, many coastal regions have sustained devastating bridge damage under the impact of extreme waves, e.g., more than 182 bridge spans were completely removed from their supporting structures along the Gulf Coast of Louisiana and Mississippi in Hurricane Katrina in 2005, and a total of 252 bridges were washed away in the 2011 Great East Japan Tsunami. The destruction of bridges may severely impact the recovery and prosperity of coastal communities [5,6]; it is therefore necessary to evaluate the magnitude of wave forces and the bridge capacity before appropriate preventive measures are taken. In this regard, a method that can promptly and accurately predict the wave forces on bridge decks is essential for stakeholders to make critical decisions prior to the landfall of hurricanes [7]. Moreover, an effective prediction method can also facilitate the safety assessment of the bridge under a probability-based framework, e.g., structural reliability analysis [8], and enable efficient structural analysis under the action of other extreme loads such as seismic load [9][10][11][12].
Over the last two decades, numerous research efforts have been devoted to the use of computational fluid dynamics (CFD) methods for investigating the wave forces acting on bridge decks [13][14][15][16][17]. The effect of lateral restraining stiffness on bridge deck-wave interaction was studied by embedding custom code into ANSYS Fluent [18]. Tsunami waves impinging on bridge superstructures were simulated using the smoothed particle hydrodynamics (SPH) method [19]. Using OpenFOAM, the tsunami-like wave forces on box-girder and T-girder bridges were compared [20]. The immersed boundary method was also employed to study wave-bridge deck interactions [21]. Because wave loading is a time-varying dynamic process, the time-frequency characteristics of waves play an important role in wave-structure interaction, and many scholars have carried out relevant studies [22][23][24][25]. The influence of different wave frequencies on the motion of floating bridges was investigated in [26], which demonstrated that the second-order difference-frequency wave loads contribute significantly to sway motion, axial force, and strong-axis bending moments along floating bridges. The vertical wave forces acting on bridge decks were analyzed spectrally using Fourier, wavelet, and Hilbert-Huang transform (HHT) methods, and an empirical formula was then proposed to predict the vertical wave forces [25]. Wavelet transforms were introduced to analyze the local characteristics of the incident waves, the inline forces, and the transfer functions between them [27], demonstrating that the nonlinear wave-structure interactions are significant for the wave components in the diffraction regime. Although various simulation models and analysis methods are available for investigating the wave forces exerted on bridge decks, obtaining a prediction remains time-consuming and cumbersome owing to the intrinsic complexity of the bridge deck-wave interaction.
With the development of computer science and machine learning theory, the use of advanced surrogate modeling techniques in coastal engineering has drawn increasing attention in recent years [28][29][30][31][32]. By combining the M5 model tree and nonlinear regression techniques, the prediction of non-broken wave run-up on single piles was investigated in [32]. A novel model based on the Extreme Learning Machine (ELM) and laboratory experiments was proposed to estimate the tsunami wave forces on coastal bridges [33]. Three different machine learning techniques for predicting the wave loads on bridge decks were also compared [34], and it was shown that machine learning techniques can provide guidance for time-history prediction requirements. A new data-driven method based on the conditional Generative Adversarial Network (GAN) principle was proposed in [35], through which the three-dimensional nonlinear wave loads and run-up on a fixed structure can be predicted accurately. To predict the wave forces more efficiently, the artificial neural network (ANN) was employed in [36] to establish the link between model parameters (i.e., the still-water level, wave height, and bottom elevation of the girder/superstructure) and wave forces, through which the vertical and horizontal forces can be predicted in seconds. ANN was also used to quantify the loading effects with multiple surge and wave parameters [37]. Based on a wind-wave-bridge system, the effects of non-stationary winds and waves on the stochastic response of cable-stayed bridge girders were investigated using ANN [38]. It is noted, however, that the above-mentioned approaches require fine-tuning of the parameters involved in the neural network, which is a cumbersome task involving trial and error. To address this issue, a model that is easy to implement and capable of providing a predictive equation is highly desirable.
In this paper, a hybrid surrogate model based on the polynomial chaos expansions (PCE) and Kriging is proposed to establish the predictive model for the solitary wave forces acting on coastal bridge decks. The underlying idea of the proposed method is to enhance the prediction capability of the constructed model by introducing an additional surrogate to correct the errors made by the main predictor. Specifically, this hybrid model adopts the regression-type PCE to capture the global feature of the computational model and the interpolation-type Kriging to capture the local variations of the prediction error. With the availability of the predictive model, the establishment of the probabilistic models for quantifying the vulnerability of the coastal bridges under natural hazards and the associated risk assessment can proceed easily and efficiently.

Polynomial Chaos Expansions
The polynomial chaos expansion (PCE) was originally proposed by Wiener to expand a stochastic process using a set of Hermite polynomials with Gaussian random variables as the input parameters, and was later generalized to account for other commonly used distributions [39]. PCE has gained popularity for uncertainty quantification in the modern engineering community, including an ever-increasing number of applications in CFD simulations [40,41]. More recently, it has been shown that a PCE surrogate model trained purely on a data set can reach point-wise predictions with accuracy comparable to that of other machine learning models, e.g., support vector regression and neural networks [42]. This supports the application of PCE for wave force prediction in this study, where the data set is selected a priori.
In PCE, the simulator output (model response) is expanded onto a space spanned by a set of bases consisting of multivariate polynomials that are orthogonal with respect to the joint probability density function (PDF) of the input variables $X$, and the model response can be expressed as:
$$\mathcal{M}(X) = \sum_{\alpha \in \mathbb{N}^n} \eta_\alpha \psi_\alpha(X) \tag{1}$$
where the $\eta_\alpha$ are the unknown coefficients to be determined and $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in \mathbb{N}^n$ is a multidimensional index vector that indicates the components of the multivariate polynomials $\psi_\alpha(X)$, which are constructed as a tensor product of orthogonal univariate polynomials:
$$\psi_\alpha(X) = \prod_{i=1}^{n} \phi^{(i)}_{\alpha_i}(X_i)$$
For instance, if the variable $X_i$ follows a Gaussian distribution, $\phi^{(i)}_{\alpha_i}(X_i)$ is the Hermite polynomial of degree $\alpha_i$, whereas Laguerre polynomials are used for the Gamma distribution. Based on this definition, the elements of the multidimensional index vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ can also be interpreted as the degrees of the univariate polynomials, and $|\alpha| = \alpha_1 + \alpha_2 + \ldots + \alpha_n$ is the total degree of the corresponding multivariate polynomial.
The spectral representation of the model response in Equation (1) involves an infinite number of polynomial bases, which is impractical. For computational purposes, a truncation scheme is introduced for Equation (1) such that only those polynomials with total degree up to $p$ are retained, i.e., $0 \le |\alpha| \le p$ [43]:
$$\mathcal{M}(X) \approx \sum_{0 \le |\alpha| \le p} \eta_\alpha \psi_\alpha(X) = \eta^{T} \psi(X)$$
where $\eta = [\eta_0, \eta_1, \ldots, \eta_{P-1}]^{T}$ is the polynomial coefficient vector and $\psi(X) = [\psi_{\alpha_0}(X), \psi_{\alpha_1}(X), \ldots, \psi_{\alpha_{P-1}}(X)]^{T}$ is the vector that gathers all the orthonormal polynomial bases satisfying $\{\psi_\alpha, 0 \le |\alpha| \le p\}$. The above formulation leads to the so-called full PCE model, where the total number of terms involved in the expansion is given by
$$P = \binom{n+p}{p} = \frac{(n+p)!}{n!\,p!}$$
Once the polynomial terms are selected, all that remains is to determine the expansion coefficients $\eta_\alpha$ using the information contained in the experimental design (data set) generated from the simulator. In this study, the regression method in the category of non-intrusive approaches is employed, which can be formulated as the following least-squares minimization problem [44]:
$$\hat{\eta} = \arg\min_{\eta} \frac{1}{N} \sum_{i=1}^{N} \left( \mathcal{M}(x_i) - \eta^{T} \psi(x_i) \right)^2 \tag{4}$$
Given a data set with the inputs $\mathcal{X} = [x_1, x_2, \ldots, x_N]^{T}$ and the corresponding model responses $\mathcal{Y} = [\mathcal{M}(x_1), \mathcal{M}(x_2), \ldots, \mathcal{M}(x_N)]^{T}$, the PCE coefficients can be estimated by solving Equation (4) using the ordinary least-squares method, which gives:
$$\hat{\eta} = (\Psi^{T} \Psi)^{-1} \Psi^{T} \mathcal{Y}$$
where the $N \times P$ data matrix $\Psi$ collects the values of the polynomial bases at the experimental design points and has the following form:
$$\Psi = \begin{pmatrix} \psi_{\alpha_0}(x_1) & \cdots & \psi_{\alpha_{P-1}}(x_1) \\ \vdots & \ddots & \vdots \\ \psi_{\alpha_0}(x_N) & \cdots & \psi_{\alpha_{P-1}}(x_N) \end{pmatrix}$$
It is noted that the data set should be sufficiently large to ensure that the above data matrix is well-conditioned, such that the regression problem is well-posed. Therefore, it is necessary to use an experimental design whose size $N$ is greater than the total number of terms $P$ in the PCE, i.e., $P < N$. In practical applications, $N = kP$ model evaluations with $k \ge 2$ are generally required to reach an approximation with sufficient accuracy.
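The regression-based PCE described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the UQLab implementation used in the study: the function names (`total_degree_indices`, `orthonormal_legendre`, `design_matrix`, `fit_pce`, `predict_pce`) and the toy response are hypothetical, and the inputs are assumed to be uniform and already scaled to [-1, 1], so that Legendre polynomials normalized by sqrt(2k + 1) form the orthonormal basis.

```python
import itertools
import math

import numpy as np
from numpy.polynomial import legendre


def total_degree_indices(n_vars, p):
    """All multi-indices alpha with |alpha| <= p (full PCE truncation)."""
    return [a for a in itertools.product(range(p + 1), repeat=n_vars)
            if sum(a) <= p]


def orthonormal_legendre(x, degree):
    """Legendre polynomial of the given degree, orthonormal for U(-1, 1)."""
    coeffs = np.zeros(degree + 1)
    coeffs[degree] = 1.0
    return np.sqrt(2 * degree + 1) * legendre.legval(x, coeffs)


def design_matrix(X, indices):
    """Psi[i, j] = psi_{alpha_j}(x_i), built as a tensor product."""
    Psi = np.ones((X.shape[0], len(indices)))
    for j, alpha in enumerate(indices):
        for k, deg in enumerate(alpha):
            Psi[:, j] *= orthonormal_legendre(X[:, k], deg)
    return Psi


def fit_pce(X, y, p):
    """Ordinary least-squares estimate of the PCE coefficients."""
    indices = total_degree_indices(X.shape[1], p)
    Psi = design_matrix(X, indices)
    eta, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    return indices, eta


def predict_pce(X, indices, eta):
    return design_matrix(X, indices) @ eta


# Toy check (hypothetical response, not wave-force data): a degree-2
# polynomial in three inputs is reproduced by a degree-2 PCE.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(40, 3))


def toy_response(X):
    return 1.0 + 2.0 * X[:, 0] - X[:, 1] * X[:, 2] + 0.5 * X[:, 2] ** 2


indices, eta = fit_pce(X_train, toy_response(X_train), p=2)
X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
print(len(indices), math.comb(3 + 2, 2))  # number of basis terms, two ways
print(np.max(np.abs(predict_pce(X_new, indices, eta) - toy_response(X_new))))
```

For physical inputs with a bounded range, a linear map from the support to [-1, 1] would be applied before calling `fit_pce`. `np.linalg.lstsq` is used instead of forming the normal equations explicitly, which gives the same least-squares solution with better numerical behavior.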

Kriging
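Kriging is a stochastic interpolation method in which the model response is assumed to be a realization of a random function. The Kriging model consists of a regression part and a stochastic process as follows [45]:
$$\mathcal{M}(x) \approx f(x)^{T} \beta + Z(x)$$
where $f(x) = [f_1(x), f_2(x), \ldots, f_m(x)]^{T}$ is a vector of regression functions, and $\beta = [\beta_1, \beta_2, \ldots, \beta_m]^{T}$ is the vector of the corresponding regression coefficients; $Z(x)$ represents a Gaussian process with zero mean and the following covariance function:
$$\mathrm{Cov}\left[ Z(x_i), Z(x_j) \right] = \sigma_z^2 \, R(x_i, x_j; \theta)$$
where $\sigma_z^2$ is the variance of the Gaussian process; $R(x_i, x_j; \theta)$ denotes the spatial correlation function between samples $x_i$ and $x_j$, and $\theta$ is a vector of hyper-parameters to be determined. The commonly used Gaussian correlation function can be expressed as follows:
$$R(x_i, x_j; \theta) = \exp\left( -\sum_{k=1}^{n} \theta_k \left( x_{ik} - x_{jk} \right)^2 \right)$$
where $\theta_k$ is the $k$-th correlation parameter in $\theta$; $x_{ik}$ and $x_{jk}$ are the $k$-th coordinates of samples $x_i$ and $x_j$, respectively. Once the correlation parameters are determined, the regression coefficients $\beta = [\beta_1, \beta_2, \ldots, \beta_m]^{T}$ and the Gaussian process variance $\sigma_z^2$ can be obtained as follows:
$$\hat{\beta} = \left( \mathbf{F}^{T} \mathbf{R}^{-1} \mathbf{F} \right)^{-1} \mathbf{F}^{T} \mathbf{R}^{-1} \mathcal{Y}$$
$$\hat{\sigma}_z^2 = \frac{1}{N} \left( \mathcal{Y} - \mathbf{F} \hat{\beta} \right)^{T} \mathbf{R}^{-1} \left( \mathcal{Y} - \mathbf{F} \hat{\beta} \right)$$
where $\mathbf{F}$ is the $N \times m$ matrix with entries $F_{ij} = f_j(x_i)$, $\mathbf{R}$ is the $N \times N$ correlation matrix with entries $R_{ij} = R(x_i, x_j; \theta)$, and $\mathcal{Y}$ is the vector of responses at the training points. With the availability of the associated parameters, the best linear unbiased prediction of the response at a new sample point $x^*$ can be computed as:
$$\hat{y}(x^*) = f(x^*)^{T} \hat{\beta} + r(x^*)^{T} \mathbf{R}^{-1} \left( \mathcal{Y} - \mathbf{F} \hat{\beta} \right)$$
where $r(x^*) = [r_1, r_2, \ldots, r_N]^{T}$ is the vector of correlations between the new sample point $x^*$ and the points in the training data set $\mathcal{X}$, i.e., $r_i = R(x^*, x_i; \theta)$, $i = 1, \ldots, N$.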
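To make the Kriging predictor concrete, the following is a minimal ordinary-Kriging sketch (constant trend, i.e., $f(x) = 1$) in NumPy. Several points are assumptions rather than details from the paper: the correlation parameters `theta` are fixed by hand instead of being estimated (in practice they are usually obtained by maximizing the likelihood), the Gaussian correlation is parameterized as exp(-sum_k theta_k d_k^2), a small `nugget` is added to the diagonal purely for numerical stability, and the function names and toy data are hypothetical.

```python
import numpy as np


def gaussian_corr(A, B, theta):
    """R[i, j] = exp(-sum_k theta_k * (A[i, k] - B[j, k])**2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-np.sum(theta * diff ** 2, axis=2))


def fit_kriging(X, y, theta, nugget=1e-10):
    """Ordinary Kriging with fixed correlation parameters (assumption)."""
    N = X.shape[0]
    F = np.ones((N, 1))                      # constant regression function
    R = gaussian_corr(X, X, theta) + nugget * np.eye(N)
    Rinv_F = np.linalg.solve(R, F)
    Rinv_y = np.linalg.solve(R, y)
    # Generalized least-squares estimate of the trend coefficient.
    beta = np.linalg.solve(F.T @ Rinv_F, F.T @ Rinv_y)
    resid = y - F @ beta
    weights = np.linalg.solve(R, resid)      # R^{-1} (Y - F beta)
    sigma2 = float(resid @ weights) / N      # process variance estimate
    return {"X": X, "theta": theta, "beta": beta,
            "weights": weights, "sigma2": sigma2}


def predict_kriging(model, X_new):
    """Best linear unbiased prediction at new points."""
    r = gaussian_corr(X_new, model["X"], model["theta"])
    return model["beta"][0] + r @ model["weights"]


# Toy 2-D example on a 5 x 5 grid (hypothetical data).
g = np.linspace(0.0, 1.0, 5)
X_grid = np.array([[a, b] for a in g for b in g])
y_grid = np.sin(2.0 * np.pi * X_grid[:, 0]) + X_grid[:, 1]
kriging = fit_kriging(X_grid, y_grid, theta=np.array([20.0, 20.0]))
print(np.max(np.abs(predict_kriging(kriging, X_grid) - y_grid)))
```

Because Kriging is an interpolator, the prediction at a training point reproduces the observed response up to the nugget, which is what the final line checks.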

Proposed Hybrid Surrogate Model
In the application of surrogate modeling techniques, the relationship between the observed response $y$ and the response $\hat{y}$ predicted by a specific surrogate model can be expressed as:
$$y = \hat{y} + \varepsilon$$
where $\varepsilon$ is an error term that measures the deviation of the predicted value from the true one. In general, the surrogate model is first constructed from a training set and the prediction is then made directly from the model, without considering the error term during model construction and response prediction. This, however, can introduce large prediction errors if an unsuitable surrogate model is chosen for the problem at hand, especially when the given data set is small. To address this issue, a hybrid surrogate model is proposed here to establish approximating models for both the structural response and the prediction error. Specifically, the PCE is adopted to capture the global feature of the computational model and the Kriging model is employed to model the local variations of the prediction error, i.e.,
$$y \approx \hat{y}_{\mathrm{PCE}} + \hat{\varepsilon}_{\mathrm{Kriging}} \tag{15}$$
In the proposed hybrid model, the first term $\hat{y}_{\mathrm{PCE}}$ on the right-hand side of Equation (15) serves as the main predictor of the structural response owing to the excellent global fitting property of PCE, whereas the second term $\hat{\varepsilon}_{\mathrm{Kriging}}$ aims to remove (or reduce) the errors arising from $\hat{y}_{\mathrm{PCE}}$. Thus, given a training data set $(\mathcal{X}, \mathcal{Y})$ for establishing the PCE, the corresponding data set for the construction of the Kriging model is $(\mathcal{X}, \mathcal{Y} - \hat{y}_{\mathrm{PCE}})$. With the availability of the PCE and the Kriging model, the prediction of the response at a new sample point can be easily obtained from Equation (15).
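The two-step construction in Equation (15) can be illustrated on a one-dimensional toy problem. The sketch below is an assumption-laden simplification, not the authors' implementation: it uses an unnormalized Legendre basis from `numpy.polynomial.legendre.legvander` (which spans the same space as the normalized one, so the least-squares fit is identical), ordinary Kriging with a hand-picked correlation parameter `theta`, and hypothetical function names and test function.

```python
import numpy as np
from numpy.polynomial import legendre


def corr(a, b, theta):
    """Gaussian correlation between two 1-D point sets."""
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)


def fit_hybrid(x, y, degree, theta, nugget=1e-10):
    # Step 1: low-degree regression PCE (least squares) as main predictor.
    Psi = legendre.legvander(x, degree)
    eta, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    residual = y - Psi @ eta          # training targets for the error model
    # Step 2: ordinary Kriging fitted to the PCE residuals.
    R = corr(x, x, theta) + nugget * np.eye(x.size)
    ones = np.ones(x.size)
    beta = (ones @ np.linalg.solve(R, residual)) / (ones @ np.linalg.solve(R, ones))
    weights = np.linalg.solve(R, residual - beta)
    return {"x": x, "degree": degree, "theta": theta,
            "eta": eta, "beta": beta, "weights": weights}


def predict_pce_part(model, x_new):
    return legendre.legvander(x_new, model["degree"]) @ model["eta"]


def predict_hybrid(model, x_new):
    # y ~ y_PCE + eps_Kriging
    eps = model["beta"] + corr(x_new, model["x"], model["theta"]) @ model["weights"]
    return predict_pce_part(model, x_new) + eps


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def truth(x):
    # Smooth global trend plus a local oscillation (hypothetical response).
    return x ** 2 + 0.3 * np.sin(8.0 * x)


x_train = np.linspace(-1.0, 1.0, 15)
y_train = truth(x_train)
x_test = np.linspace(-0.95, 0.95, 50)

hybrid = fit_hybrid(x_train, y_train, degree=2, theta=30.0)
rmse_pce = rmse(truth(x_test), predict_pce_part(hybrid, x_test))
rmse_hybrid = rmse(truth(x_test), predict_hybrid(hybrid, x_test))
print(f"PCE only: {rmse_pce:.4f}   PCE + Kriging: {rmse_hybrid:.4f}")
```

In this toy setting the degree-2 polynomial captures only the parabolic trend, and the Kriging term recovers the oscillation left in the residuals, which mirrors the division of labor described above.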
In the sequel, the prediction of wave forces on a typical bridge deck-wave interaction case will be employed to investigate the applicability and validity of the proposed hybrid model.

Engineering Background and Data Preparation
To investigate the effectiveness of the proposed method for the prediction of wave forces, a two-dimensional bridge deck-wave interaction model as shown in Figure 1 is considered. The prototype bridge deck of this model is similar to the damaged I-10 bridge across Escambia Bay, and solitary waves are used to represent tsunamis and storm surge. According to the study performed in [46], the horizontal force $F_h$ and vertical force $F_v$ can be expressed as functions of the involved parameters:
$$F_h, F_v = f(H, C, \alpha, d, \mu, \rho, W, d_b, L_d, Z_c, d_r, Z_{ele}) \tag{16}$$
where the wave height $H$, the wave celerity $C$ and the angle of incidence to the structure $\alpha$ are the wave variables in the model; the water depth $d$, the dynamic viscosity $\mu$ and the water density $\rho$ are the fluid-related parameters; and the structural parameters are the deck width $W$, the deck height $d_b$, the deck length $L_d$, the deck clearance $Z_c$, the rail height $d_r$ and the elevation of the bridge girder $Z_{ele}$.

In this study, extensive CFD simulations are performed using ANSYS Fluent, and a total of 472 sampling pairs are generated for the construction of the hybrid surrogate model. For training surrogate models, the sampling pairs are generally composed of all the involved parameters (input) and the associated wave forces (output). However, some variables are depending on each other and/or may have a negligible effect on the evaluation of wave forces. Moreover, the required number of samples in PCE increases dramatically with the number of input parameters. Therefore, similar to the study carried out in [36], only the three critical parameters, namely the water depth d, the elevation of the bridge girder Z ele , and the wave height H, are used as the input for establishing the prediction model. More details regarding the data preparation and the assumptions made on the bridge deck-wave interaction simulation model can be found in [15,18].

Surrogate Model Initiation and Assessment Metrics
The three input variables are assumed to follow a uniform distribution with a specified supporting range, as illustrated in Table 1. Thus, the normalized Legendre polynomials are used to derive the PCE, which can easily be achieved using the UQLab toolbox [44]. The degree-adaptive algorithm is employed to automatically select the optimal degree of the PCE according to the available data set. The Kriging module in UQLab [45] is also employed to establish the surrogate for the prediction error of the PCE, in which ordinary Kriging is selected for modeling the trend.
The use of appropriate evaluation metrics is important for evaluating the performance of a surrogate model. The commonly used metrics include the mean absolute error (MAE), mean squared error (MSE), root mean square error (RMSE), mean relative error (MRE) and correlation coefficient (R), to name a few. Among these metrics, MAE is less biased for higher values, yet it may not adequately reflect the performance when dealing with large error values. In contrast, RMSE better reflects the performance when dealing with large error values and is more useful when lower residual values are preferred. R is a useful index that detects the linear correlation between the true and predicted values, and is thus well-suited for measuring the performance of a surrogate model. In this regard, only the RMSE and R are employed as the error metrics in the current study, and they are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( y_i - \hat{y}_i \right)^2}$$
$$R = \frac{\sum_{i=1}^{M} \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right)}{\sqrt{\sum_{i=1}^{M} \left( y_i - \bar{y} \right)^2 \sum_{i=1}^{M} \left( \hat{y}_i - \bar{\hat{y}} \right)^2}}$$
where $M$ is the number of samples in the test data set; $y_i$ and $\hat{y}_i$ are the true response value and the response predicted by the surrogate model, respectively; $\bar{y} = \frac{1}{M} \sum_{i=1}^{M} y_i$ and $\bar{\hat{y}} = \frac{1}{M} \sum_{i=1}^{M} \hat{y}_i$. In the training process, the data set is split into 3 folds, where one fold is left out as the test set and the other two folds are used as the training set. Thus, three different values of RMSE (R) are obtained after the model is trained, and the mean value of RMSE (R) is then used as the indicator of the model accuracy, i.e., a model with R close to 1 and RMSE close to 0 is deemed to have excellent prediction ability.
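The two metrics and the 3-fold procedure can be sketched as follows. The helper names, the random fold assignment, and the linear toy surrogate are assumptions made for illustration; the text does not state how the folds were drawn, and the RMSE values reported later are given in percent, whereas this sketch computes the plain RMSE in the units of the response.

```python
import math

import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error over the test samples."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def corr_coef(y_true, y_pred):
    """Linear correlation coefficient R between true and predicted values."""
    dy = y_true - y_true.mean()
    dp = y_pred - y_pred.mean()
    return float(np.sum(dy * dp) / np.sqrt(np.sum(dy ** 2) * np.sum(dp ** 2)))


def three_fold_scores(X, y, fit, predict, seed=0):
    """Mean RMSE and mean R over a 3-fold split (random folds assumed)."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), 3)
    scores = []
    for k in range(3):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(3) if j != k])
        model = fit(X[train], y[train])
        y_hat = predict(model, X[test])
        scores.append((rmse(y[test], y_hat), corr_coef(y[test], y_hat)))
    mean_rmse, mean_r = np.mean(scores, axis=0)
    return float(mean_rmse), float(mean_r)


# Toy surrogate (hypothetical): a linear least-squares fit.
def fit_linear(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


X_demo = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
y_demo = 3.0 + 2.0 * X_demo[:, 0]
mean_rmse, mean_r = three_fold_scores(X_demo, y_demo, fit_linear, predict_linear)
print(mean_rmse, mean_r)
```

Any surrogate exposing a `fit(X, y)` and `predict(model, X)` pair can be passed to `three_fold_scores`, so the same routine could score a pure PCE and a hybrid model on identical folds.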

Results and Discussion
Given the available data set, the PCE with different maximum degrees are constructed to investigate the effects of polynomial degrees on prediction accuracy. The predicted wave forces using PCE with degrees varying from 2 to 6 and the true ones in the test data set are compared in Figures 2 and 3, and the variations of R and RMSE with the PCE degrees for horizontal and vertical wave forces prediction are listed in Tables 2 and 3, respectively. As is seen from Figure 2, the horizontal wave forces can well be predicted by the PCE with a maximum degree of 2, and increasing the maximum degree up to 5 can further improve the prediction accuracy. However, for this particular case, the PCE with a maximum degree higher than 6 does not necessarily result in a better generalization ability, in that more samples might be required to accommodate the dramatically increased number of terms in PCE. This argument is also verified from the results of assessment metrics (R and RMSE) shown in Table 2, where the R of PCE with degree 7 (R = 0.9855) is even smaller than that with degree 2 (R = 0.9943) and the RMSE of PCE with degree 7 is the largest among the investigated degrees. Although the performance of PCE for vertical wave forces prediction is slightly worse than that for horizontal wave forces prediction, as shown in Figure 3 and Table 3, the overall trend of the prediction accuracy variation is similar to that observed in Figure 2 and Table 2, except that the optimal PCE degree is 6 for vertical forces prediction.
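The remark that higher degrees need more samples can be checked with simple arithmetic. The sketch below counts the terms of a full PCE with three inputs for degrees 2 to 7 and compares them with the approximate training-set size; taking two thirds of the 472 sampling pairs as the training size follows from the 3-fold split described earlier, while reading the ratio N/P as an indicator of over-fitting risk is an interpretation, not a result reported by the authors.

```python
from math import comb

N_TOTAL = 472                       # sampling pairs reported in the text
N_TRAIN = round(N_TOTAL * 2 / 3)    # two of three folds used for training
N_VARS = 3                          # d, Z_ele and H


def n_terms(n_vars, p):
    """Number of terms P in a full PCE of total degree p."""
    return comb(n_vars + p, p)


for p in range(2, 8):
    P = n_terms(N_VARS, p)
    print(f"p = {p}: P = {P:3d}, N_train / P = {N_TRAIN / P:.1f}")
```

The number of terms grows from 10 at degree 2 to 120 at degree 7, so the number of training samples available per coefficient drops by more than a factor of ten over the investigated range.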
Moreover, it is noted that the prediction performance of PCE on the horizontal wave force is better than that on the vertical force. This might be because impinging force induced by the entrapped air underneath the bridge deck makes the relationship between the input parameters and vertical wave force more complicated. A feasible way to improve the prediction accuracy on the vertical wave force is using more samples with different wave scenarios, albeit this will require more effort in data preparation.
The prediction results using the proposed hybrid surrogate model are shown in Figure 4, where the optimal PCE degree for horizontal forces is found to be 2 and that for vertical forces is found to be 3. Although the maximum PCE degrees used in the hybrid model are lower than the optimal degrees identified in the pure PCE model (degree 5 for horizontal wave forces and degree 6 for vertical wave forces), the prediction performance of the hybrid model is superior to the optimal PCE for both horizontal and vertical wave forces. Specifically, for the horizontal wave forces prediction, the R and RMSE of the hybrid model are found to be 0.9975 and 3.70%, respectively; and these two values are found to be 0.9910 and 4.00% for the vertical wave forces prediction. Moreover, the results of the optimal ANN reported in [32] are also illustrated here for comparison purposes, as shown in Figure 5. The proposed hybrid model exhibits better performance than the optimal ANN for horizontal wave forces prediction, and the two models achieve comparable accuracy in predicting the vertical forces. Overall, the results verify the effectiveness of the error correction term in the proposed hybrid model in reducing the prediction error made by the PCE. In addition, it should be noted that the proposed hybrid model is easy to implement, without the need to tune the numerous hyper-parameters and model structures required in the ANN.
Figure 5. Correlation between the predicted wave forces using the optimal ANN and the true ones in the test data set.

Conclusions
To facilitate the establishment of the probabilistic model for quantifying the vulnerability of coastal bridges to natural hazards and support the associated risk assessment and mitigation activities, a hybrid surrogate model is proposed for efficient and accurate prediction of the solitary wave forces acting on coastal bridge decks and the corresponding predictive equations are obtained from the trained model. Unlike traditional surrogate models, this hybrid model includes an error correction term to reduce the prediction error from the main predictor. Specifically, the regression-type polynomial chaos expansion (PCE) is employed as the main predictor to capture the global feature of the computational model, whereas the interpolation-type Kriging is adopted to capture the local variations of the prediction error from the PCE. The prediction of wave forces on a typical bridge deck-wave interaction model is carried out and compared with other methods to demonstrate the effectiveness of the hybrid surrogate model. According to the obtained results, the following conclusions can be drawn:
1. The comparison among the predictive results of the PCE, the hybrid model, and those from the ANN indicates the enhanced performance of the proposed method. In other words, this hybrid model can capture the underlying physical complexities in the bridge deck-wave interaction, and can thus be used to replace the original time-consuming CFD models for the wave forces prediction and the associated lifecycle-based probabilistic modeling.
2. The use of PCE and Kriging in this study offers several desirable advantages, e.g., the number of tuning parameters can be relatively small. In other words, only the maximum polynomial degree p needs to be tuned in the PCE, enabling the easy implementation of this approach. Moreover, the time required to establish the PCE and Kriging is only a few seconds on a standard laptop, making the prediction of wave forces rather efficient. These features distinguish the proposed hybrid model from other well-known machine learning approaches such as ANNs, which are known to be highly sensitive to their hyper-parameters and require an appropriate and generally cumbersome calibration procedure.
3. The prediction performance of PCE on the horizontal wave force is better than that on the vertical force. This might be because the impinging force induced by the entrapped air underneath the bridge deck makes the relationship between the input parameters and vertical wave force more complicated. A feasible way to improve the prediction accuracy on the vertical wave force is using more samples with different wave scenarios, albeit this will require more effort in data preparation.
The limitations of the current study and future work are as follows:
1. In the proposed hybrid model, only the PCE is used as the main predictor. However, this choice may not be appropriate when the number of training data is small, especially for engineering cases with many input parameters. Thus, the use of other effective surrogate models (e.g., support vector regression, radial basis function) or ensemble models as the main predictor may further enhance the applicability of the hybrid model.
2. Since the training data in the engineering case is predefined, the number of samples in the data set might be too large or too small for the problem at hand, which could jeopardize the overall performance of the established surrogate model. Thus, the use of an adaptive algorithm that sequentially adds training samples to refine the surrogate model is a topic worth further exploring.
Funding: This research was funded by NSFC, grant number 52078425.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The details of the proposed methodology and of the specific values of the parameters considered have been provided in the paper. Hence, we are confident that the results can be reproduced. Readers interested in the source code are encouraged to contact the authors by email.