Scour Propagation Rates around Offshore Pipelines Exposed to Currents by Applying Data-Driven Models

Abstract: Offshore pipelines are occasionally exposed to scouring processes, which inevitably have detrimental impacts on their safety. The process of scouring propagation around offshore pipelines is naturally complex and is mainly due to currents and/or waves. There is a considerable demand for the safe design of offshore pipelines exposed to scouring phenomena, so scouring propagation patterns require close study. In the present research, machine learning (ML) models are applied to obtain equations for the prediction of the scouring propagation rate around pipelines due to currents. The approaching flow Froude number, the ratio of embedment depth to pipeline diameter, the Shields parameter, and the current angle of attack to the pipeline were considered the main dimensionless factors, based on reliable literature. ML models were developed based on various setting parameters and optimization strategies coming from evolutionary and classification concepts. Moreover, the explicit equations yielded by the ML models were used to demonstrate how the proposed approaches are in harmony with experimental observations. The performance of the ML models was assessed using statistical benchmarks. The results revealed that the equations given by the ML models provided reliable and physically consistent predictions of scouring propagation rates when compared with scouring tests.


Introduction
Offshore pipeline structures are commonly employed to transport various fluids, such as oil, gas, oil-gas mixtures, and water. In offshore projects, there is a need to consider many engineering aspects, such as ecology, geo-hazards, and environmental loading [1]. Once a submarine pipeline is constructed, several physical factors needing close attention are introduced, such as the state of the seabed, seabed mobility, submarine landslides, waves, and currents [2][3][4]. The seabed has two states: it is either relatively smooth or corrugated (uneven), with high and low elevations. On an uneven seabed, the pipeline forms free spans where it bridges two high points, leaving the section between them unsupported. As an unsupported section lengthens, the bending stress exerted on it may increase. In addition, vibrations induced by current-related vortices must be taken into consideration. Seabed leveling and post-installation support are protective measures for unsupported pipeline spans [2][3][4]. A further risk to pipelines is seabed mobility, with sand waves, in particular, being a major feature. Sand waves migrate over time. A pipeline laid on a crest of the seabed during construction may find itself in a trough later during its operational lifespan [2]. As the third physical factor, submarine landslides occur when the sediment transport rate is high. Submarine landslides take place on steeper slopes. When the landslide process is intensified by displacement, offshore pipelines might be exposed to severe bending, with possible consequent tensile failure [3]. Other mechanical factors are related to currents and waves. High-intensity currents and waves have repercussions on the laying operations [4,5].
The above-mentioned physical factors require great attention during pipeline installation and operation because they can compromise the safety of the pipeline itself. Underestimating these key elements might lead to scouring processes. A large number of attempts have been made during the past three decades to understand the mechanical factors (e.g., free spans, seabed sediment mobility, geomorphology of the seabed, waves, and currents) of scouring beneath seabed pipelines [6][7][8][9]. Overall, the scouring process below pipelines has a three-dimensional flow structure in the field. Experimental observations have demonstrated that free spans are the primary cause of the scouring phenomenon. The length of free spans is controlled by various factors such as the flow conditions, soil conditions, sinking of the pipeline at the span shoulders, and sagging of the pipeline in the scour hole (e.g., [10][11][12][13]). Free spanning occurs in a suspended submarine pipeline segment where there is no contact between the pipeline and the surface of the seabed due to various causes, such as scour by ocean currents, an uneven seabed surface, hazardous human activities, and residual/thermal stresses [14].
The 3D scouring propagation around offshore pipelines exposed to currents has been addressed in only a few studies, mainly experimental, during the last decade. Firstly, Cheng et al. [6] studied the three-dimensional scour rate of propagation along offshore pipelines after the scouring process was initiated. They found that some experimental variables (e.g., embedment depth of the offshore pipeline, live-bed motion state of seabed sediments due to currents, and flow incident angle) had meaningful impacts on the scouring propagation velocities along the pipeline. They proposed a regression-based formulation for the approximation of the scouring propagation rates. Later, Wu and Chiew [7,8] carried out experimental investigations under clear-water conditions to understand the three-dimensional mechanism of scouring propagation beneath pipelines of various diameters. They emphasized that the rate of scouring propagation was controlled by the Froude number, the Shields parameter, and the initial embedment depth of the offshore pipeline. Cheng et al. [9] studied the scour rate propagation along offshore pipelines for combined wave and current flow conditions. Finally, they presented an empirical equation with an acceptable level of precision to estimate the longitudinal rate of scouring propagation.
From the above-mentioned investigations, it can be inferred that a deeper understanding of three-dimensional scouring rates would require further efforts to achieve more comprehensive empirical models with more accurate performance than the current formulas presented in the literature, which are generally obtained by regression analysis or traditional techniques. In the case of current-induced scour, conventional methods based on experimental observations at the laboratory scale have revealed that the main governing variables are the geometric properties of the pipeline, the motion state of the seabed, and the approaching flow state (e.g., [7][8][9]).
During recent years, various machine learning (ML) models have been utilized to predict three-dimensional scour rates at seabed pipelines with promising results. Firstly, Najafzadeh and Saberi-Movahed [15] improved the group method of data handling (GMDH) by gene-expression programming (GEP) to predict the scouring rate propagation below offshore pipelines under wave conditions. They found that GMDH-GEP provided more accurate results than nonlinear regression-based equations and artificial neural networks (ANNs). Similarly, Ehteram et al. [16] optimized the structure of an ANN with the cooling body algorithm (CBA). They concluded that the ANN-CBA model provided better performance for wave-induced scouring rates than simple ANNs. In a recent study, Najafzadeh and Oliveto [17] presented new empirical equations based on four robust ML models (i.e., multivariate adaptive regression splines (MARS), evolutionary polynomial regression (EPR), model tree (MT), and gene-expression programming (GEP)) for the prediction of 3D scouring rates below pipelines exposed to regular waves. ML techniques, especially those used in the latest research by Najafzadeh and Oliveto [17], rely on their inherent advantages (e.g., reduction of complexity, fast learning process, and preparation of physical patterns of observational variables) to provide effective performance.
The literature review reveals a lack of ML applications that provide physically consistent equations for the prediction of the scouring propagation rates beneath offshore pipelines subjected to currents. Therefore, this study is organized as follows: (i) experimental variables for feeding the ML models are provided; (ii) the ML models are implemented while controlling the setting parameters of each model; (iii) empirical equations are derived from the ML models; (iv) statistical analysis is performed to evaluate the empirical equations given by the ML models; and (v) the physical consistency of the ML models' performance with experimental observations is checked. A general overview of the organization of this study is outlined in Figure 1.

Dimensional Analysis
Based on the experimental studies from the literature, it appears that the approach flow intensity, the geometry of the offshore pipeline, the bed sediment mobility, and the physical properties of seabed sediments play a key role in the scouring rate of propagation due to currents [6][7][8]. Specifically, the following function, $H_0$, which implies a relationship between the three-dimensional scour process and the governing variables, is assumed:

$$H_0\left(V_L, u_*, e, D, U_C, d_{50}, \varphi, \alpha, \rho, \rho_s, \mu, g\right) = 0 \quad (1)$$

where $V_L$ is the scour rate along the pipeline, $u_*$ is the flow shear velocity at the seabed, $e$ is the embedment depth, $D$ is the pipeline diameter, $U_C$ is the flow velocity due to the current, $d_{50}$ is the median sediment size, $\varphi$ is the angle of repose for the bed sediment, $\alpha$ is the flow incident angle to the pipeline, $\rho$ is the mass density of water, $\rho_s$ is the mass density of bed sediment, $\mu$ is the dynamic viscosity of water, and $g$ is the acceleration due to gravity [6][7][8].
As mentioned in Cheng et al. [6] and Wu and Chiew [7,8], the analysis of scouring rates should be carried out considering effective dimensionless parameters. Furthermore, they investigated the physical consistency of their approaches using the experimental observations to corroborate the control of the selected variables on the scouring rate of propagation. These analyses were performed with dimensionless variables, ameliorating the scale effects related to the regression-based equations from laboratory observations. In terms of the application of ML models, the use of non-dimensional parameters in the prediction of the rate of scouring propagation could enhance the ML models' performance compared to predictive models based on raw variables (e.g., Ehteram et al. [16]; Najafzadeh and Oliveto [17]). Therefore, the present investigation applies the π-theorem to identify a set of dimensionless parameters to be considered in ML models.
Among the raw variables in Equation (1), three variables ($\rho$, $D$, $V_L$) were utilized as repeating variables to obtain nine dimensionless parameters through the Buckingham theorem. These parameters enter the functional relationship of Equation (2) and are specified individually in Equation (3). Before finalizing the dimensional analysis, some of the dimensionless parameters could be converted to a more convenient form, as highlighted in previous studies (Cheng et al. [6]; Wu and Chiew [8]; Najafzadeh and Oliveto [17]). The first and second parameters in Equation (3) were converted to $1-e/D$ and $1+\sin\alpha$, respectively, in Cheng et al. [6]. According to Wu and Chiew [7], the fourth and fifth parameters in Equation (3) could be converted to $\rho_s/\rho-1$ and the pipeline Froude number ($Fr_P$), respectively. Moreover, the dimensionless parameters in Equation (3) could be combined so that more meaningful parameters are brought out. Therefore, the dimensionless parameters in Equation (3) were re-arranged, and the functional relationship (2) was made explicit. In the resulting expression, $V_L^*$ is the dimensionless scouring propagation rate along the offshore pipeline, $\theta_C$ is the Shields parameter due to the current, and $Re_P$ is the pipeline's Reynolds number.
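As a minimal sketch of how raw measurements could be turned into the four model inputs named in this study ($Fr_P$, $\theta_C$, $e/D$, $1+\sin\alpha$), the snippet below uses the standard Shields parameter $u_*^2/[g(S-1)d_{50}]$. The pipeline Froude number is taken here as $U_C/\sqrt{gD}$, which is an assumption made for illustration and may differ from the definition used in the cited experiments. The function names and the sample values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def shields_parameter(u_star, s, d50):
    """Standard Shields parameter: u*^2 / (g (S - 1) d50)."""
    return u_star ** 2 / (G * (s - 1.0) * d50)


def pipeline_froude(u_c, d):
    """ASSUMED definition for illustration: U_C / sqrt(g D)."""
    return u_c / math.sqrt(G * d)


def model_inputs(u_star, u_c, e, d, d50, s, alpha_deg):
    """Collect the four dimensionless inputs from raw variables (SI units)."""
    return {
        "Fr_P": pipeline_froude(u_c, d),
        "theta_C": shields_parameter(u_star, s, d50),
        "e/D": e / d,
        "1+sin(alpha)": 1.0 + math.sin(math.radians(alpha_deg)),
    }


# Hypothetical test condition, not taken from the datasets
print(model_inputs(u_star=0.02, u_c=0.3, e=0.01, d=0.05,
                   d50=0.0005, s=2.65, alpha_deg=0.0))
```

Working with these ratios rather than with the raw variables is what allows tests with different pipe diameters and sediments to be pooled in one model.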

Description of Experimental Data
In this study, the experimental observations were obtained from three reliable studies. Wu and Chiew [7] performed 51 experimental investigations considering uniform sand bed sediments with $d_{50}$ = 0.56 mm and a geometric standard deviation ($\sigma_g$) of 1.4. All experimental studies were conducted in a glass-sided flume whose length, width, and depth were 19 m, 1.6 m, and 0.45 m, respectively. The sediment recess section was 1.8 m long and 0.15 m deep and was placed 14 m downstream from the flume inlet. A total of 8 PVC pipelines, with $D$ from 22 to 116 mm, were used. All experiments were performed under clear-water conditions and organized into four classes, each of which focused on the impact of one of the dimensionless parameters $e/D$, $Fr_P$, $\theta_C$, and the ratio of the water depth ($y$) to the pipeline diameter. For each class, the selected dimensionless parameter was varied, whereas all the other dimensionless parameters remained unchanged. Scouring propagation rates were not observed in four experiments, which are therefore excluded from the proposed modeling by ML techniques.
Cheng et al. [6] carried out 81 experiments to study the scouring propagation rates in a wave flume whose length, width, and depth were 50 m, 4 m, and 2.5 m, respectively. A concrete sandpit, 4 m long, 4 m wide, and 0.25 m deep, was constructed in the test section. A clear pipeline with a smooth surface, a diameter of 50 mm, and a wall thickness of 8 mm was tested. All experiments were conducted under live-bed conditions ($\theta_C$ = 0.046-0.104), and the embedment depth $e$ varied in the range from 0.1$D$ to 0.5$D$. The $d_{50}$ value and the relative density of sediment grains ($S = \rho_s/\rho$) were 0.37 mm and 2.7, respectively. The sediment angle of repose, $\varphi$, was kept constant and equal to 32°, the angle of attack $\alpha$ ranged from 0 to 45°, and the dynamic viscosity of water, $\mu$, was equal to 0.001 Pa·s. Among the 81 experiments, piping processes occurred in 2 experiments, and, additionally, scouring propagation rates were not observed in 35 experiments. Therefore, 44 scouring tests were considered in the proposed modeling by ML techniques. For the sake of completeness, Cheng et al. [6] suggested a relationship, Equation (8), to estimate the scour propagation rate, in which $K$ is a constant that depends on $\alpha$. Equation (8) relates to current conditions, and it is the only one available in the literature.
Hansen et al. [18] carried out 4 scouring experiments with $d_{50}$ = 0.20 mm and $S$ = 2.65 under clear-water conditions. They considered 2 pipeline diameters (i.e., 20 and 50 mm), and the approach flow depth, equal to 0.22 m, was kept constant during all experiments.
Ultimately, 95 experimental observations were obtained from Wu and Chiew [7] (47 datasets), Cheng et al. [6] (44 datasets), and Hansen et al. [18] (4 datasets), which have been considered to develop predictive models of the scour propagation rate in current conditions. Table 1 provides the statistical characterization of the variables from the above-mentioned datasets. For the training and testing stages of the ML techniques, 75% (71 datasets) and 25% (24 datasets) of the observations were selected randomly, respectively. As seen in Table 1, the range of the Reynolds number $Re_P$ is indicative of fully turbulent flows, and this would imply that the effect of $Re_P$ on the prediction of $V_L^*$ is negligible. Histograms for all the dimensionless parameters are illustrated in Figure 2a-e. These histograms show the frequency distributions for the explored (independent) parameters governing scouring processes, giving a succinct view of the experimental datasets. Incidentally, the analysis of these histograms could help researchers in selecting unexplored ranges of investigation. In Figure 2a, the $\alpha$ parameter had a rather fragmented frequency distribution, with the highest relative frequency (70%) for $\sin\alpha$ = 0. As depicted in Figure 2b, the distribution of the $e/D$ parameter is fully fragmented and is not symmetrical. Just 2 experiments were performed with $e/D$ = 0.286, whereas about 50% of the experimental observations were carried out with $e/D$ = 0.143. Moreover, Figure 2c shows that the approach flow Froude number, $Fr_P$, was not deeply explored, as indicated by a largely uniform distribution, and the end tail of the histogram reaches a value of $Fr_P$ equal to 0.636. Figure 2d,e also shows that, in the case of the Shields parameter ($\theta_C$) and the dimensionless scouring propagation rate ($V_L^*$), the distributions of the values are not symmetrical, with maximum values for the relative frequencies around 50% and 55%, respectively. Additionally, most of the scouring tests were characterized by values of $\theta_C$ and $Fr_P$ approximately equal to 0.023 and 0.40, respectively. It is essential to mention that all the scouring tests were carried out utilizing various bed sediments and both states for the approach bed sediment mobility (i.e., live-bed and clear-water conditions).
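The random 75%/25% partition described above can be sketched as follows. This is only an illustration of the split sizes (71 training and 24 testing records out of 95); the seed and the function name are assumptions, and the actual records assigned to each subset in the study are not known from the text.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Randomly partition record indices into training and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle (seed is arbitrary)
    n_train = int(n_samples * train_fraction)  # 95 * 0.75 = 71.25, truncated to 71
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = split_indices(95)
print(len(train_idx), len(test_idx))
```

Fixing the seed keeps the partition reproducible, which matters when several models are compared on the same testing subset.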

Gene-Expression Programming
GEP is known as one of the most powerful ML models; it works based on evolutionary algorithms (EAs). GEP utilizes a parse tree configuration to explore solutions, and it can provide a predictive formulation to interpret input-output systems. In addition, the overall formulation given by GEP includes a fair number of genetic operators [19,20]. The GEP model employs tree-like diagrams of varying complexity, and it benefits from the merits of mathematical relationships to express a genome. To begin with, the GEP model needs some setting parameters to be assigned, such as the population of individuals, number of generations, number of chromosomes, mutation rate, and linking functions among genes. These factors play an important role in the goodness of the GEP performance during the training stage. GeneXproTools software was used to implement the GEP model. In GEP, the validity of the training and testing phases is controlled by the fitness values. In this study, the root-mean-square error (RMSE) was selected as the fitness function to evaluate the GEP performance for each generation. Fitness values were scaled to a maximum of 1000 through 1000/(1 + RMSE). In the GEP model, genetic operators are generally used through four strategies: optimal evolution (OE), constant fine-tuning (CFT), model fine-tuning (MFT), and sub-set selection (SSS). In this study, these four alternatives were considered. The MFT (565.876) and OE (544.516) strategies reached higher fitness values in training than CFT (467.144) and SSS (536.205). Therefore, in the case of scouring problems below pipelines, the optimal evolution methodology had promising usability; this study used the OE strategy, as put forth in Najafzadeh and Oliveto [17]. The best value of the fitness function was equal to 544.516 in the training stage; the corresponding generation number was 1713. Since there are four input variables for training the GEP model, four genes can be considered the maximum number. Increasing the number of genes in the structure of the GEP model leads, overall, to a more complicated relationship. To reduce the complexity of the final equation, three genes were first considered. It was then found that the GEP model with three genes provided a better value of the fitness function than a GEP model developed with four genes. Table 2 indicates the setting parameters of the GEP model utilized in the optimal relationship for the prediction of the $V_L^*$ parameter.
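The fitness scaling 1000/(1 + RMSE) quoted above is easy to invert, which gives a quick consistency check on the reported values. The sketch below is illustrative only; the function names are hypothetical and do not correspond to any GeneXproTools interface.

```python
def gep_fitness(rmse):
    """Scaled fitness as described in the text: 1000 / (1 + RMSE)."""
    return 1000.0 / (1.0 + rmse)


def rmse_from_fitness(fitness):
    """Invert the scaling to recover the RMSE behind a fitness value."""
    return 1000.0 / fitness - 1.0


# Training fitness values reported for the four strategies
for name, fitness in [("MFT", 565.876), ("OE", 544.516),
                      ("SSS", 536.205), ("CFT", 467.144)]:
    print(name, round(rmse_from_fitness(fitness), 3))
```

For the OE strategy, a fitness of 544.516 corresponds to an RMSE of about 0.836, which agrees with the training RMSE reported for GEP in the statistical performance section.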

Multivariate Adaptive Regression Splines
Multivariate adaptive regression splines (MARS) is a globally recognized ML model that generates mathematical expressions by extending linear regression [21]. The MARS model generally includes second-order spline regression. Formulas are created through cross-validation, which can automatically control the nonlinearities of the obtained equation and the interactions among variables. To generate an equation based on spline regression, the MARS model generally makes use of a number of basis functions (BFs) with their related weighted coefficients [21]. Each BF is defined by a variable and a knot. MARS creates an expression during two phases: the forward and the backward passes. In the forward pass, the MARS model starts with a model that consists of just the intercept/bias term, which is the mean of the output parameter values. Then, the MARS model repeatedly adds BFs in pairs to the second-order regression spline model. At each step, it detects the pair of BFs that results in the maximum reduction in the sum-of-squares residual error. This process of adding BFs continues until the variation in residual error is too small to continue or until the maximum number of BFs is met. In this research, the MARS technique approximates the scouring propagation rate through the following equation:

$$V_L^* = T_0 + \sum_{j=1}^{NBF} T_j \, BF_j(r)$$

in which $T_0$, $T_j$, $BF$, $r$, and $NBF$ are the bias, the constant coefficients related to basis functions, the basis function, the set of dimensionless parameters, and the number of basis functions, respectively. The forward pass occasionally generates an over-fitted model. To generate an expression with better generalization potential, the backward pass is performed to prune the initially extended MARS expression. This process deletes terms one by one, eliminating the least effective term at each stage until it obtains the best sub-model. The performance of model subsets is evaluated using the generalized cross-validation (GCV) criterion discussed in the literature [21].
In the present research, the MARS technique was run by a computer code written in MATLAB software. The MARS model was initially created with 10 BFs and 23.5 effective parameters. To reduce the complexity level of the initially developed MARS model, the analysis of GCV was performed. In addition, the number of folds (k) was equal to 10, such that the MARS technique was performed 10 times. The forward and backward stages were carried out for each fold; additionally, the number of basis functions in the final MARS model and the total effective number of parameters were obtained. The definition of the 10 performances of the MARS model is provided in Table 3, in which the results during the forward and backward stages are shown. Hence, 15 BFs were obtained to predict the scour propagation rate below offshore pipelines through Equation (14), in which the $BF_1$ to $BF_{15}$ formulations are presented in Table 3. All four input parameters ($Fr_P$, $\theta_C$, $e/D$, $1+\sin\alpha$) contributed to Equation (14) for estimating $V_L^*$, as can be inferred from Table 4. The optimal value of GCV (0.5662) reduced the probability of over-parametrization of the MARS expression. This means that the MARS expression could efficiently detect input variables that have lower importance in the prediction of the scouring propagation rate. All coefficients in Equation (14) were fitted by the particle swarm optimization (PSO) algorithm, providing MSE = 0.2252 as the best result.
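To make the structure of a MARS expression concrete, the following sketch evaluates a bias plus a weighted sum of basis functions and computes the GCV score in its usual form, mean squared error divided by $(1 - C/N)^2$, with $C$ the effective number of parameters. It is a simplified illustration that uses piecewise-linear hinge functions and their products; the knots and coefficients are hypothetical and are not those of Equation (14) or Table 3.

```python
def hinge(x, knot, direction=1):
    """Hinge basis: max(0, x - knot) if direction is +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(bias, terms, inputs):
    """Evaluate bias + sum_j T_j * BF_j.

    terms: list of (coefficient, factors); each factor is (variable, knot, direction).
    A basis function with two factors is a product of hinges (an interaction term).
    """
    total = bias
    for coefficient, factors in terms:
        basis_value = 1.0
        for variable, knot, direction in factors:
            basis_value *= hinge(inputs[variable], knot, direction)
        total += coefficient * basis_value
    return total


def gcv(residuals, effective_params):
    """Generalized cross-validation: MSE / (1 - C/N)^2."""
    n = len(residuals)
    mse = sum(r * r for r in residuals) / n
    return mse / (1.0 - effective_params / n) ** 2


# Hypothetical two-term model, for illustration only
example_terms = [
    (2.0, [("Fr_P", 0.3, 1)]),
    (-1.5, [("theta_C", 0.05, -1), ("e/D", 0.1, 1)]),
]
print(mars_predict(1.0, example_terms, {"Fr_P": 0.5, "theta_C": 0.03, "e/D": 0.2}))
```

The GCV denominator shrinks as the effective number of parameters grows, so a larger model must reduce the residual error substantially to be preferred in the backward pass.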

Evolutionary Polynomial Regression
EPR, a robust data-driven model (DDM), is developed by the multi-objective genetic algorithm (MOGA) and can be applied in four ways: (i) data analysis from the input-output records, (ii) data modeling for both static and dynamic systems, (iii) symbolic expressions for mathematical models, and (iv) decision support for the model selection [22][23][24]. Overall, EPR is an integrative model that combines the efficacy of MOGA with numerical regression techniques to extract knowledge in the form of simple mathematical expressions [25,26].
For the prediction of the scouring propagation rate, two types of mathematical expressions were used to develop the EPR model. Both have the ordinary EPR structure of a bias term plus a sum of coefficient-weighted terms in X, where X is the vector of the input variables, y is the approximated output, and a is the set of coefficients. In this way, the following mathematical relationships were selected:

$$V_L^* = O_0 + \sum_{j=1}^{m} O_j \cdot (\theta_C)^{ES(j,1)} \cdot (1-e/D)^{ES(j,2)} \cdot (1+\sin\alpha)^{ES(j,3)} \cdot (Fr_P)^{ES(j,4)} \cdot f\left((\theta_C)^{ES(j,5)} \cdot (1-e/D)^{ES(j,6)} \cdot (1+\sin\alpha)^{ES(j,7)} \cdot (Fr_P)^{ES(j,8)}\right) \quad (15)$$

$$V_L^* = O_0 + \sum_{j=1}^{m} O_j \cdot (\theta_C)^{ES(j,1)} \cdot (1-e/D)^{ES(j,2)} \cdot (1+\sin\alpha)^{ES(j,3)} \cdot (Fr_P)^{ES(j,4)} \cdot f\left((\theta_C)^{ES(j,5)}\right) \cdot f\left((1-e/D)^{ES(j,6)}\right) \cdot f\left((1+\sin\alpha)^{ES(j,7)}\right) \cdot f\left((Fr_P)^{ES(j,8)}\right) \quad (16)$$

In Equations (15) and (16), $O_0$ is the bias term, $m$ is the maximum number of mathematical terms, $O_j$ is a set of coefficients, $f$ is a user-defined function, and the ES function is a range of exponents explored by the EPR model. Previous investigations demonstrated that EPR expressions with natural logarithmic inner functions were more effective in predicting the scouring propagation rates below offshore pipelines exposed to regular waves than predictions obtained without an inner function [17]. Therefore, six terms and logarithmic inner functions in Equations (15) and (16) were considered for the EPR model. The number of generations was 4800. This value was computed automatically from factors such as the number of input parameters, the kind of inner function, and the number of dataset rows. Tables 5 and 6 indicate the ultimate EPR expressions during training stages based on Equations (15) and (16), respectively. According to Table 5, the first model (Model #1) had the least complex expression, with four logarithmic terms, whereas its MSE value (0.582) indicated the lowest accuracy in the prediction of scouring propagation rates. Model #5 included 6 terms with 5 logarithmic terms, providing the best results (MSE = 0.341) along with the most complicated expression. It was inferred from Table 5 that an increase in the presence of the input variables in each logarithmic term could increase the precision level of approximation. For instance, the first term of Model #5 included 4 input parameters (i.e., $\theta_C$, $1-e/D$, $1+\sin\alpha$, $Fr_P$); from Model #1 to Model #5, the more complicated the mathematical expression, the higher the precision level. Based on the expressions in Table 6, Model #5 had the highest level of accuracy (MSE = 0.585) in the training phase, whereas Model #1 had the lowest accuracy with an MSE of 0.747. Similarly, the complexity of terms and the number of terms played a key role in improving the efficiency of the expression returned by Equation (16). For instance, Model #2 (see Table 6) had one term with an MSE of 0.646, whereas Model #3 yielded more accurate predictions (MSE = 0.620) with two algebraic terms. Table 6 indicates that the third to fifth expressions (Model #3 to Model #5) had the same number of algebraic terms while providing various performance levels in the prediction of the scouring propagation rate, owing to the different complexity levels of each expression. Generally, the EPR expressions given in Table 5 had more complexity and a larger number of algebraic terms than those presented in Table 6. Consequently, the equations returned by EPR based on Equation (15) had more promising results than the equations given in Table 6.

Table 6. Mathematical expressions developed by Equation (16) with logarithmic inner function.
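As an illustration of how an EPR expression with a logarithmic inner function might be evaluated, the sketch below assumes that each term is a coefficient times a power product of the four inputs ($\theta_C$, $1-e/D$, $1+\sin\alpha$, $Fr_P$) times the natural logarithm of a second power product, which is one reading of the structure behind Equation (15). The coefficients and exponents are hypothetical placeholders, not the entries of Table 5.

```python
import math


def epr_predict(bias, terms, x):
    """Evaluate bias + sum_j O_j * prod(x_i^outer_i) * ln(prod(x_i^inner_i)).

    terms: list of (coefficient, outer_exponents, inner_exponents).
    x: the four positive inputs, ordered as (theta_C, 1 - e/D, 1 + sin(alpha), Fr_P).
    If all inner exponents are zero the logarithm is ln(1) = 0 and the term
    vanishes in this simplified sketch.
    """
    y = bias
    for coefficient, outer, inner in terms:
        outer_product = math.prod(v ** p for v, p in zip(x, outer))
        inner_product = math.prod(v ** p for v, p in zip(x, inner))
        y += coefficient * outer_product * math.log(inner_product)
    return y


# Hypothetical single-term model: V_L* = 1 + 2 * ln(Fr_P)
example_terms = [(2.0, (0, 0, 0, 0), (0, 0, 0, 1))]
print(epr_predict(1.0, example_terms, (0.05, 0.8, 1.0, 0.4)))
```

Because all four inputs are strictly positive in the ranges explored here, the power products stay positive and the logarithm is always defined.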

M5 Model Tree
M5MT, as a newly established system, is generally used to learn models that estimate values. Similar to ML techniques based on classification concepts, such as MARS and classification and regression tree (CART) models, M5 is capable of creating tree-based expressions; these regression trees provide real values at their leaves. The trees given by M5 can provide multivariate linear expressions. The M5 version of MT efficiently solves problems with high dimensionality. Compared with MARS and CART, M5MT can interpret nonlinear systems with faster execution times. The main merit of M5MT over the CART model is that trees have a smaller size (in terms of leaves and nodes) and a higher level of precision in multi-task systems [27]. In M5MT, tree-like structures are provided by the divide-and-conquer technique. M5 splits the search space of data points into several subdivisions. Then, multilinear regression models are fitted on data points in each subdivision. M5MT is implemented based on several steps such as tree structure construction, error estimation, linear modeling, linear model simplification, pruning, and smoothing [27][28][29].
In this study, Weka3.9 software was utilized to develop M5MT for estimating $V_L^*$ values around offshore pipelines. The following multilinear regression model is generally utilized in developing M5MT:

$$V_L^* = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4$$

in which $x_1$ to $x_4$ denote the four input parameters, $a_0$ is the bias term, and $a_1$ to $a_4$ are weighting coefficients that are computed by the least-squares technique. This research used four alternatives of M5MT based on whether the pruning and smoothing stages are applied: if the pruning (or smoothing) stage is considered, its state is "True" (T); otherwise, it is "False" (F). The impacts of the pruning and smoothing stages on the performance of M5MT were evaluated by RMSE values during the training and testing stages. The first alternative, M5MT#1, is the pruned (T) and smoothed (T) M5MT with four rules, as seen in Table 7. The Shields parameter was adopted as the sole splitting parameter for constructing the four rules. Table 7 indicates that all the input parameters were used to provide the multilinear regression equations related to these rules. Table 8 reports the results of M5MT#2, the pruned (T) and unsmoothed (F) alternative. As in Table 7, M5MT#2 yielded four rules (see Table 8) and, consequently, four multilinear regression relationships. As demonstrated in Table 8, the first rule provided only a bias term (0.0512), and additionally, $Fr_P$ was incorporated to model a linear equation for the second rule. In the case of unpruned (F) and smoothed (T) M5MT, a list of rules and relevant multilinear regression equations for M5MT#3 is provided in Table 9. As seen in Table 10, all the input parameters were incorporated into 29 regression equations and search spaces. Regarding unpruned (F) and unsmoothed (F) M5MT, 29 rules were obtained, and M5MT#4 consequently provided 29 bias terms without incorporating the input parameters. Details of M5MT#4 performance are given in Table 11.
Table 7. List of M5MT#1 details in $V_L^*$ estimates.

Generally, once M5MT was performed without the pruning stage, the size of the tree structure increased. Although this can occasionally improve the accuracy level of predictions, it also raises the risk of overfitting. For instance, in M5MT#3, the pruning phase was excluded during the training stage, resulting in an RMSE = 0.846 for the training stage and 0.554 for the testing stage. Additionally, in M5MT#4, simultaneous exclusion of the pruning and smoothing stages caused overfitting and high growth of the model tree. RMSE values for the training and testing stages were equal to 0.846 and 0.554, respectively. M5MT#1 provided better predictions of the scouring propagation rate, with an RMSE = 0.547 and 0.599 for training and testing, respectively, when compared with M5MT#2 (RMSE = 0.6886 and 0.713). For the analysis of complicated systems, it is reasonable to select an M5MT alternative that includes both the pruning and smoothing phases. In this way, M5MT#1 was selected as the superior model to estimate $V_L^*$ in this study.
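The way a pruned model tree produces a prediction can be sketched as an ordered list of rules, each pairing a splitting condition with a multilinear regression model. The sketch below splits on the Shields parameter only, in line with the description of M5MT#1, but every threshold and coefficient is a hypothetical placeholder rather than a value from Table 7, and the smoothing step of M5 is not reproduced.

```python
def linear_model(bias, weights=None):
    """Return a leaf model: bias + sum of weight * input over the listed inputs."""
    weights = weights or {}

    def predict(x):
        return bias + sum(w * x[name] for name, w in weights.items())

    return predict


# Hypothetical rules: thresholds and coefficients are placeholders only
EXAMPLE_RULES = [
    (lambda x: x["theta_C"] <= 0.03, linear_model(0.05)),
    (lambda x: x["theta_C"] <= 0.06, linear_model(0.10, {"Fr_P": 2.0})),
    (lambda x: True, linear_model(0.20, {"Fr_P": 1.0, "e/D": -0.5})),
]


def m5_predict(rules, x):
    """Apply the first rule whose condition holds (divide-and-conquer lookup)."""
    for condition, model in rules:
        if condition(x):
            return model(x)
    raise ValueError("no rule matched the input")


print(m5_predict(EXAMPLE_RULES, {"theta_C": 0.05, "Fr_P": 0.4, "e/D": 0.1}))
```

A leaf that carries only a bias term, like the first rule here, returns a constant for every record that falls in its subdivision, which is the situation described for the unpruned and unsmoothed alternative.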

Statistical Measures
To measure the efficacy of the ML models' performance for both the training and testing stages in the estimation of scouring propagation rates around offshore pipelines, the index of agreement (IOA), RMSE, mean absolute error (MAE), and scatter index (SI) have been utilized. In the definitions of these statistical measures, $V^*_{L(Obs)}$, $V^*_{L(Pre)}$, and $\overline{V^*_L}$ are the observed, predicted, and average values of $V_L^*$, respectively, and $N$ is the number of experiments. The ideal value of IOA is equal to 1, whereas the worst one is zero. In addition, the RMSE, MAE, and SI values are error functions, varying from 0 to +∞.
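The four measures can be sketched with their commonly used definitions: IOA in Willmott's form, RMSE and MAE as mean-based errors, and SI as RMSE divided by the mean observed value. These forms are assumed here and may differ in detail from the expressions used in the study; the SI definition is at least consistent with the reported training results, where the ratio of RMSE to SI is about 1.47 for all four models. The sample data are hypothetical.

```python
import math


def rmse(obs, pre):
    """Root-mean-square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pre)) / len(obs))


def mae(obs, pre):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pre)) / len(obs)


def scatter_index(obs, pre):
    """RMSE normalized by the mean observed value (assumed definition)."""
    return rmse(obs, pre) / (sum(obs) / len(obs))


def index_of_agreement(obs, pre):
    """Willmott's index of agreement: 1 is perfect agreement, 0 is none."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pre))
    denominator = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, p in zip(obs, pre))
    return 1.0 - numerator / denominator


# Hypothetical observed and predicted values of V_L*
observed = [0.5, 1.0, 2.0, 4.0]
predicted = [0.6, 0.9, 2.3, 3.5]
print(index_of_agreement(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted), scatter_index(observed, predicted))
```

With mean-based definitions, MAE can never exceed RMSE for the same set of predictions, which is a useful sanity check when reading tabulated results.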

Statistical Performance of ML Models
Table 12 reports the performance of the ML models in estimating scouring propagation rates (V_L*) in the training (calibration) and testing (validation) stages. In the training stage, the MARS expression (Equation (14)), with an RMSE of 0.474 and an MAE of 1.619, gave the best performance, followed by EPR (RMSE = 0.557 and MAE = 2.754), GEP (RMSE = 0.836 and MAE = 3.514), and M5MT (RMSE = 0.847 and MAE = 5.347). The IOA and SI values likewise confirmed the superiority of the MARS model (IOA = 0.972 and SI = 0.322) over the EPR (IOA = 0.912 and SI = 0.378), GEP (IOA = 0.914 and SI = 0.567), and M5MT (IOA = 0.962 and SI = 0.574) techniques. According to IOA, RMSE, and SI, the GEP and M5MT models performed almost equally in predicting V_L* in the training stage.

In the testing stage, EPR (see Table 5, Model #5) performed best (RMSE = 0.342 and MAE = 0.300), ahead of MARS (RMSE = 0.379 and MAE = 0.449), GEP (RMSE = 0.556 and MAE = 0.490), and M5MT (RMSE = 0.599 and MAE = 0.922). The IOA and SI values also indicated that the EPR expression with the natural logarithmic inner function had the greatest potential for estimating V_L* in comparison with MARS (IOA = 0.969 and SI = 0.330), GEP (IOA = 0.935 and SI = 0.483), and M5MT (IOA = 0.924 and SI = 0.507).

Figure 3 illustrates the qualitative performance of the ML techniques for both the training and testing phases. Almost all training data points in Figure 3 fell within the ±25% band of acceptable error. The M5MT and GEP techniques clearly underestimated V_L* at the observed values of 4.5 and 7, whereas the MARS and GEP techniques produced moderate overestimation. In addition, the scatter of the testing data points in Figure 3b shows that the M5MT and GEP techniques yielded moderate underpredictions and overpredictions for observed V_L* less than 2, whereas, for V_L* = 2.5-5, almost all M5MT and GEP data points were underpredicted in comparison with the EPR and MARS models.
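The benchmarks used above can be sketched in a few lines of Python. This is a minimal illustration only: the formulas are the commonly used definitions of RMSE and MAE, Willmott's index of agreement for IOA, and RMSE divided by the mean observed value for SI, and they are assumed here rather than taken from the paper's own definitions, which may differ. The function names and the sample values are hypothetical and are not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def index_of_agreement(obs, pred):
    """Willmott's index of agreement (assumed definition of IOA)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1.0 - num / den

def scatter_index(obs, pred):
    """Scatter index, assumed here to be RMSE over the mean observed value."""
    return rmse(obs, pred) / (sum(obs) / len(obs))

def fraction_within_band(obs, pred, band=0.25):
    """Share of points whose prediction lies within +/- band of the observation."""
    inside = sum(1 for o, p in zip(obs, pred) if abs(p - o) <= band * abs(o))
    return inside / len(obs)

# Hypothetical values for illustration only (not from Table 12 or Figure 3).
observed = [1.0, 2.0, 4.0, 5.0]
predicted = [1.1, 1.8, 4.4, 3.5]
band_share = fraction_within_band(observed, predicted)
```

With these definitions, lower RMSE, MAE, and SI and an IOA closer to 1 indicate better agreement, which is how the model rankings above are read.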

Comparisons between ML Models and Related Works Regarding Complexity
In this part of the investigation, the results of the ML models are compared with those of previous investigations with respect to the complexity of the general structure, the accuracy level, and the usability of optimization models in improving ML models' performance. Regarding structural complexity, the EPR expression (see Table 5, Model #5) had a more complex mathematical structure, including six algebraic terms and a natural logarithmic inner function, than the MARS expression (Equation (14)) and the multilinear regression equations by M5MT. With the present setting parameters and MOGA, the EPR model predicted V_L* more accurately than M5MT, which has simpler expressions and a shorter execution time. Furthermore, Equation (14), given by MARS, comprised 15 sets of second-order polynomials obtained through 10 forward and backward stages, giving more complex expressions and a longer execution time than the multilinear expressions developed by M5MT (see Table 7). In this respect, the MARS expression compared favorably with all four alternatives of the M5MT models (including bias terms and linear expressions). In the GEP model, three typical inner functions (i.e., Tanh, Exp, Atan) with three algebraic terms did not yield mathematical expressions more complex than the EPR expressions.
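To make the notion of "second-order" MARS terms concrete, the sketch below shows how a generic MARS-type expression is evaluated as a sum of hinge basis functions and products of two hinges. It is a generic illustration of the technique, not Equation (14): the coefficients, knots, variable names, and the helper functions `hinge` and `mars_predict` are all hypothetical.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction=+1,
    max(0, knot - x) for direction=-1."""
    return max(0.0, direction * (x - knot))

def mars_predict(x, intercept, terms):
    """Evaluate a MARS-type model.

    x      : dict mapping variable name to value
    terms  : list of (coefficient, factors); each factor is
             (variable name, knot, direction). One factor gives a linear
             piecewise term, two factors give a second-order interaction.
    """
    total = intercept
    for coef, factors in terms:
        value = coef
        for var, knot, direction in factors:
            value *= hinge(x[var], knot, direction)
        total += value
    return total

# Hypothetical model with one first-order and one second-order term.
example_terms = [
    (2.0, [("theta_c", 0.02, 1)]),
    (-3.0, [("theta_c", 0.02, 1), ("fr_p", 0.5, -1)]),
]
example_x = {"theta_c": 0.05, "fr_p": 0.4}
example_value = mars_predict(example_x, intercept=1.0, terms=example_terms)
```

Each additional basis function adds one coefficient and one or two knots, which is why a 15-term second-order expression is more complex to write down and slower to fit than the piecewise linear equations of a model tree.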
Since no previous work has applied ML models to the estimation of scouring propagation rates due to currents, the present results were compared with previous investigations only in terms of ML model complexity. All the related works were tested on scouring propagation data for regular waves and live-bed sediment conditions. Najafzadeh and Saberi-Movahed [15] embedded the GEP model in the structure of the GMDH model and predicted wave-induced scouring propagation rates around offshore pipelines more accurately than the standalone GMDH and GEP techniques. The findings of the present study are consistent with the results obtained by Najafzadeh and Saberi-Movahed [15]. Moreover, an increase in the complexity of ML models increases the precision of GMDH (or of other ML models such as EPR and GEP). For instance, Table 5 indicates that adding logarithmic terms to the EPR expression led to more accurate results, improving performance from Model #1 with an MSE = 0.582 to Model #5 with an MSE = 0.341. In another related work, Ehteram et al. [16] used three optimization algorithms (i.e., CBA, WA, and PSO) to improve an ANN structure for the estimation of scouring propagation rates. All three ANN models were less complex than the EPR expression obtained by the present study (see Table 5); in addition, the ANN models were developed only for wave conditions, which differ from those of the present study. The results of this study showed that EPR, with its high degree of complexity, provided more accurate results than ANN-CBO (MAE = 0.721), ANN-WA (MAE = 0.745), and ANN-PSO (MAE = 0.814). These ANN models generally run faster in predicting scouring propagation rates; however, the ability of MOGA to select algebraic terms and exponents of variables (interactions among variables) increases the accuracy of ML models in this case. In the research of Najafzadeh and Oliveto [17], EPR expressions developed with the natural logarithmic inner function provided higher complexity and accuracy than the GEP model.

Effects of the Pipeline Embedment Depth
The variation of V_L* with the e/D ratio is illustrated in Figure 4. This behavior was studied at four levels of θ_C and α, as seen in Figure 4a-d. For θ_C = 0.018 and α = 0°, Figure 4a indicates that an increase in e/D leads to a decrease in V_L*. All the ML models follow this decreasing trend, consistent with the observed values. For instance, MARS showed that V_L* declined from 1.970 at e/D = 0.02 to 0.577 at e/D = 0.08. Likewise, for θ_C = 0.0091-0.061 and α = 0°, Figure 4b shows a downward trend of V_L* with e/D. Figure 4c presents the variation of V_L* with e/D from a different viewpoint. Overall, the experimental observations showed a downward trend from V_L* = 2.453 at e/D = 0.1 to V_L* = 1.398 at e/D = 0.3. V_L* then remained nearly constant up to e/D = 0.4 and decreased again up to e/D = 0.5. Figure 4c shows that all ML models followed the observed behavior of V_L* versus e/D for e/D = 0.1-0.4, whereas the M5MT and EPR models did not reproduce the decrease in V_L* between e/D = 0.4 and 0.5, giving relatively significant overpredictions. Similar to Figure 4c, the MARS and EPR expressions successfully simulated the behavior of V_L* versus e/D for e/D = 0.1-0.2. As seen in Figure 4d, the M5MT and MARS models showed upward trends at e/D = 0.3, indicating overpredictions. The quantitative results of the ML models for different ranges of θ_C and α are presented in Table 13. As seen in Table 13, the MARS technique performed best (RMSE = 0.197 for θ_C = 0.081-0.104 and α = 15°) for all ranges except θ_C = 0.018 and α = 0°, whereas M5MT gave the lowest accuracy in the prediction of V_L* for all ranges except θ_C = 0.046-0.104 and α = 30-45°, for which the GEP expression performed worst.

In the case of e/D = 0.01-0.035 and α = 0°, Figure 5b shows that all ML models capture the upward trend of V_L* with θ_C for θ_C = 0.021-0.054; V_L* then decreases at θ_C = 0.064. Additionally, the M5MT and GEP models produced remarkable underpredictions for θ_C = 0.054. Figure 5c 5g), whereas GEP and M5MT underpredicted V_L* for e/D = 0.2 and α = 30° in Figure 5h. Overall, the values of V_L* predicted by the ML models were in good agreement with experimental observations. To allow quantitative comparison of the ML models' performance in the four ranges of e/D and α, the RMSE results are presented in Table 13. Table 13 indicates that EPR expression had the most successful performance for e/D = 0.

Effects of the Approach Flow Froude Number
Figure 6 shows the variation of V_L* with Fr_P for three ranges of θ_C and α. In Figure 6a, the ML models were in good agreement with experimental observations for Fr_P = 0.34, 0.47, and 0.5. Overall, the qualitative results of the ML models indicated that V_L* increased for Fr_P = 0.346-0.4 and then decreased for Fr_P = 0.4-0.5. In addition, the values of V_L* given by the ML techniques increased at Fr_P = 0.63. As seen in Figure 6a, all the ML models except the MARS expression noticeably underpredicted V_L* at Fr_P = 0.4, whereas the M5MT and GEP techniques overpredicted V_L* at Fr_P = 0.63. In Figure 6b, the experimental observations indicated a declining trend of V_L* for Fr_P = 0.5-1.26, whereas the EPR and M5MT techniques overpredicted significantly compared to the GEP and MARS expressions. As illustrated in Figure 6c, the MARS and EPR models indicated a downward trend for Fr_P = 0.5-1.71, whereas GEP and M5MT overpredicted significantly. In terms of quantitative performance, the MARS model gave the most successful prediction of V_L* for all ranges (θ_C = 0.0091-0.13).

Conclusions
In the present research, various ML models based on evolutionary computing and classification concepts have been utilized to predict the scouring propagation rate around offshore pipelines exposed to currents. Effective dimensionless parameters (i.e., 1 - e/D, 1 + sin α, θ_C, Fr_P) obtained from dimensional analysis of the scouring tests were directly incorporated into the presented equations through the performance of ML models. The main findings of this research can be summarized as follows.
ML models provided explicit formulas with promising performance for the estimation of the scouring propagation rate. The EPR expression had the mathematical structure with the highest degree of nonlinearity, followed by Equation (14) (given by the second-order polynomial MARS model) and the multilinear regression equation by M5MT. In the GEP model (Equation (9)), the type of inner function played a key role in increasing precision through an optimal evolutionary technique. The predictive equations resulting from the ML models are somewhat complex and have several fitting parameters, although these are not so numerous given the geometrical and hydraulic conditions (including both live-bed and clear-water approaching flows) considered in this study. However, these equations have the considerable advantage of fitting the experimental data with performances that are significantly higher than those of the formulas currently available in the literature. Their structure inhibits an immediate identification of the role played by each governing parameter (though a sensitivity analysis could serve in this regard), but they can be easily implemented in numerical codes. Moreover, according to Table 1, their ranges of application are: 0 ≤ e/D ≤ 0.5, 0° ≤ α ≤ 45°, 0.22 ≤ Fr_P ≤ 0.63, and 0.01 ≤ θ_C ≤ 0.10.
The ML models considered in this study clearly perform better than the current empirical models. However, their performance differs from one model to another depending on the structure/algorithm of the model itself. This study also remarks on these differences (highlighting which model would perform better) in a physical context characterized by a limited number of data (95 experimental observations) despite the complexity of the phenomena. Statistical measures indicated that MARS, with 15 basis functions (BFs) (including linear and polynomial equations), gave the best performance in the training stage, whereas, in the testing stage, the EPR expression with natural logarithmic functions gave the highest level of precision in the prediction of V_L*.
Quantitative and qualitative representations of the variation of V_L* with e/D were provided to describe the physical behavior for varying θ_C and α. Generally, the MARS and GEP expressions simulated the downward trend of V_L* with e/D better than the EPR and M5MT techniques. These downward variations were in good agreement with experimental variations for different levels of θ_C and α (for instance, θ_C = 0.018 and α = 0°; θ_C = 0.0091-0.061 and α = 0°).
A parametric study of the variation of V_L* with θ_C was carried out at various levels of e/D and α. All ML models provided increasing trends for these levels of e/D and α. Generally, the predictions of scouring propagation rates given by the ML models were in good agreement with the experiments by Hansen et al. [18], Cheng et al. [6], and Wu and Chiew [7].
The physical variations of V_L* with the pipeline Froude number (Fr_P) were examined for three levels of θ_C and α. MARS and EPR expressions showed a downward trend for some values of Fr_P, and the ML models could generally follow the decreasing trends for θ_C = 0.0091-0.104 and α = 0°, θ_C = 0.046-0.104 and α = 15°, and θ_C = 0.046-0.104 and α = 30-45°, probably due to live-bed conditions.
The equations produced by the ML models exhibited physical consistency with experimental investigations, so they can give reliable estimates of scouring propagation rates for use in the practical design of offshore pipelines, with a focus on preventative measures against erosion and scouring. The data were analyzed using dimensionless governing parameters of great impact in sediment transport phenomena (e.g., the pipeline Froude number and the Shields parameter). This approach allows the extension of the proposed predictive equations to the field because the experimental data used in this study would appear free from scale effects. However, more experimental observations would be useful and desirable, especially in the unexplored ranges highlighted in the histograms in Figure 2. The combined action of currents and waves needs more attention, as does the collection of field data, which are not currently available to the authors' knowledge.
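Since the conclusions note that the equations could be easily implemented in numerical codes within stated ranges of application, a small input guard may be useful in such an implementation. The sketch below is an assumption-laden illustration: the ranges are those quoted above from Table 1, the input combinations (1 - e/D, 1 + sin α, θ_C, Fr_P) are those listed in the conclusions, and the function and variable names are hypothetical. The predictive equations themselves are not reproduced here.

```python
import math

# Ranges of application quoted in the conclusions (from Table 1).
APPLICABILITY = {
    "e_over_d": (0.0, 0.5),     # embedment depth to pipeline diameter, e/D
    "alpha_deg": (0.0, 45.0),   # current angle of attack, degrees
    "fr_p": (0.22, 0.63),       # approach flow Froude number, Fr_P
    "theta_c": (0.01, 0.10),    # Shields parameter due to current
}

def out_of_range(e_over_d, alpha_deg, fr_p, theta_c):
    """Return the names of the inputs that fall outside the quoted ranges."""
    values = {
        "e_over_d": e_over_d,
        "alpha_deg": alpha_deg,
        "fr_p": fr_p,
        "theta_c": theta_c,
    }
    return [name for name, (low, high) in APPLICABILITY.items()
            if not low <= values[name] <= high]

def model_inputs(e_over_d, alpha_deg, fr_p, theta_c):
    """Build the dimensionless inputs (1 - e/D, 1 + sin(alpha), theta_C, Fr_P)."""
    alpha_rad = math.radians(alpha_deg)
    return (1.0 - e_over_d, 1.0 + math.sin(alpha_rad), theta_c, fr_p)

# Hypothetical design case for illustration.
violations = out_of_range(0.2, 30.0, 0.4, 0.05)
inputs = model_inputs(0.2, 30.0, 0.4, 0.05)
```

Returning the list of offending inputs, rather than raising an error, lets a caller decide whether extrapolation is acceptable for a given design case.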

Figure 1. Flow chart of the present research work.

Figure 2. Frequency and cumulative relative frequency for the non-dimensional parameters considered in the present research: (a) flow attack angle α to the offshore pipeline; (b) ratio of the pipeline embedded depth to pipeline diameter, e/D; (c) approaching flow Froude number to the offshore pipeline, Fr_P; (d) Shields parameter due to current, θ_C; (e) dimensionless scouring propagation rate around offshore pipeline along the longitudinal direction, V_L*.

Figure 3. Performance of ML models for prediction of V_L* in the (a) training and (b) testing stages.

Figure 5 demonstrates the variation of V_L* values versus the Shields parameter θ_C. As seen in Figure 5a, all ML models provided overall increasing trends for e/D = 0.1 and α = 0°.

Table 2. Setting parameters of the proposed structure of GEP models.

Table 3. Characterizations of MARS models during the forward and backward stages.

Table 4. Coefficients and basis functions for MARS models in scour rate prediction.

Table 11. List of bias terms given by M5MT#4 in V_L* estimates.

Table 12. Statistical performances of the ML models considered in this study.