Prediction of Pile Bearing Capacity Using XGBoost Algorithm: Modeling and Performance Evaluation

Abstract: The major criterion that controls pile foundation design is the pile bearing capacity (Pu). The load-bearing capacity of piles is affected by various soil characteristics and by multiple parameters related to both the soil and the foundation. In this study, a new model for predicting bearing capacity is developed using the extreme gradient boosting (XGBoost) algorithm. A total of 200 static load test-based case histories of driven piles were used to construct and verify the model. The results of the developed XGBoost model were compared with those of several commonly used algorithms, namely Adaptive Boosting (AdaBoost), Random Forest (RF), Decision Tree (DT) and Support Vector Machine (SVM), using various performance metrics such as the coefficient of determination, mean absolute error, root mean square error, mean absolute relative error, Nash–Sutcliffe model efficiency coefficient and relative strength ratio. Furthermore, a sensitivity analysis was performed to determine the effect of the input parameters on Pu. The results show that all of the developed models were capable of making accurate predictions; however, the XGBoost algorithm surpasses the others, followed by AdaBoost, RF, DT, and SVM. The sensitivity analysis shows that the SPT blow count along the pile shaft has the greatest effect on Pu.


Introduction
A pile is a long structural element used to transfer structural loads to the soils at a depth below the structure's base. Axial, lateral, and moment loads are examples of structural loads. The load transmission mechanism is based on pile toe and pile shaft resistances [1]. Deep foundations are another term for pile foundations often used in practice. Pile foundations are used to support structures that cannot be supported economically on shallow foundations. The most significant factor when designing a pile foundation is the pile carrying capacity (Pu) [2]. Various ways to determine pile carrying capacity have been developed over the years of research [3][4][5][6][7][8][9][10][11][12][13], including dynamic analysis, high-strain dynamic testing, pile load tests, cone penetration tests (CPT) and other in situ tests. Some research claims that the aforementioned correlations exaggerate the bearing capacity [14]. The pile load test is considered one of the best methods to determine pile bearing capacity; however, because this strategy is costly for small-scale projects and time-consuming [10], it is critical to find a more practical approach. As a result, many studies using in situ test data to assess pile carrying capacity have been performed [9].
Lopes and Laprovitera [15] and Decort [16] proposed different formulas for determining pile carrying capacity for several soils, including clay and sand. Conventional approaches have used numerous main parameters to determine the mechanical properties of piles, including the pile diameter, pile length, soil type, and the SPT blow counts of each layer. Nevertheless, the selection of relevant parameters, along with the failure to cover other parameters, has led to disagreement among the results given by the various approaches [17]. As a result, the development of an optimal model for selecting an appropriate set of parameters is critical.
A recently developed approach based on data mining techniques has been increasingly employed to resolve real-world problems over the past half-decade, particularly in the field of civil engineering [18][19][20][21][22][23][24][25][26][27][28]. Several practical problems have already been effectively solved using machine learning algorithms, paving the way for new prospects in the construction industry. Furthermore, a variety of machine learning algorithms, for example, random forest, artificial neural network (ANN), decision tree, adaptive neuro-fuzzy inference system (ANFIS), AdaBoost, SVM and XGBoost, have been developed for addressing technical issues, such as predicting the mechanical behavior of piles.
Goh et al. [29,30] produced an ANN-based algorithm for piles driven in clays to predict the friction capacity, using on-field data records to train the algorithm. Furthermore, Shahin et al. [31][32][33][34] employed ANN-based models for forecasting pile load capacity using data that included in situ load testing and cone penetration test (CPT) results. Similarly, Nawari et al. [35] published an ANN approach that uses SPT data and shaft geometry to estimate the settlement of drilled shafts. Pham et al. [17] produced ANN and RF models to predict the capacity of driven piles. Momeni et al. [36] created an ANN model modified with a Genetic Algorithm (GA), which selects appropriate biases and weights, for predicting pile bearing capacity. Based on CPT data, Kordjazi et al. [37] employed an SVM model to forecast the ultimate load-bearing capacity of piles. Liu et al. [21] developed XGBoost, Backpropagation Neural Network (BPNN) and RF algorithms to estimate the bearing capacity of driven piles. Liang et al. estimated the stability of hard rock pillars applying XGBoost, gradient boosting decision tree (GBDT), and light gradient boosting machine (LightGBM) algorithms [23]. Pham et al. [38] also developed a Deep Learning Neural Network to estimate the carrying capacity of piles.
In addition to the machine learning (ML) techniques mentioned above, the GBDT method demonstrates excellent results in a variety of disciplines [39][40][41]. As one of the ensemble learning algorithms, it uses the boosting strategy to combine many DTs into a strong learner [42]. DTs belong to the ML approaches that employ a tree-like framework to handle a wide range of input types while tracing each path to the prediction outcome [43]. DTs, on the other hand, are easy to overfit and sensitive to dataset noise. Because the errors of the individual DTs offset one another, the overall prediction performance of GBDT improves with the integration of DTs. XGBoost [44] and LightGBM [45] have recently been proposed in the context of GBDT, and they have attracted a lot of attention as a result of their outstanding performance. These three techniques, in particular, operate well with small datasets. To some extent, overfitting, which occurs when results match existing data very closely but fail to correctly estimate future trends, can also be prevented [43].
The aim of the present study is to develop a robust model to estimate axial pile bearing capacity using the XGBoost algorithm based on reliable pile load test results. The scope of the present research includes the following:

• To develop a model that is able to learn the complex relationship between axial pile bearing capacity and its influencing factors with reasonable precision.
• To validate the proposed model by comparing its efficacy with prominent modeling techniques, such as AdaBoost, RF, DT, and SVM, in terms of performance metrics.
• To conduct sensitivity analyses to determine the effect of each input parameter on Pu.
The framework of the paper is as follows: In Section 2, data collection and preparation are presented. Section 3 describes the machine learning approaches. The construction of the prediction model is presented in Section 4. Results and discussion are given in Section 5. Lastly, some closing remarks are made.

Dataset
In this study, a dataset of 200 reinforced concrete piles at a test site in Ha Nam province, Vietnam (the complete database is available in Table A1) was used to train and test the models. As a first step, all known parameters affecting Pu were taken into account. It was noted that the majority of traditional methods utilize three categories of parameters: pile geometry, pile material quality, and soil attributes [3]. To obtain the measurements, hydraulic pile presses were used to drive pre-cast square-section piles with closed tips into the ground at a constant rate of penetration. The testing began at least seven days after the piles were driven, and the experimental setup is shown in Figure 1. The load was increased gradually in each pile test. Depending on the design requirements, the load could be increased up to 200 percent of the design pile load. The time it takes to reach 100, 150, and 200 percent of the load could range from around 6 to 12 h or 24 h, depending on the load [38]. Two principles were used to determine pile bearing capacity: (i) the pile bearing capacity was taken as the failure load when the settlement of the pile top at the current load level was five times or more the settlement of the pile top at the previous load level; (ii) when the load-settlement curve became linear at the last test load, condition (i) was not used. In such a case, the test load at which progressive movement occurs, or at which the total settlement exceeds 10% of the pile diameter or width, was taken as the pile bearing capacity.

As a result, previous studies (e.g., [38]) show that the pile bearing capacity (Pu) is a function of (1) the diameter of the pile (D); (2) the depth of the first layer of soil embedded (X1); (3) the depth of the second layer of soil embedded (X2); (4) the depth of the third layer of soil embedded (X3); (5) the pile top elevation (Xp); (6) the ground elevation (Xg); (7) the extra pile top elevation (Xt); (8) the pile tip elevation (Xm); (9) the SPT blow count at the pile shaft (NS); and (10) the SPT blow count at the pile tip (Nt), as shown in Figure 2. Therefore, in the current study, these input variables were used to develop the proposed models.
The collected data were divided into training and testing sets. Researchers have used different percentages of the available data as the training set for different problems. For instance, Pham et al. [38] used 60%; Liang et al. [23] used 70%; while Ahmad et al. [28] used 80% of the data for training. The statistical consistency of the training and testing datasets has a substantial impact on the results when using soft computing techniques; consistent datasets improve the performance of the model and help in evaluating it better [22,46]. To choose the most consistent representation, statistical analyses of the input and output variables of the training and testing data were performed. This was accomplished through a trial-and-error strategy, and the most statistically consistent combination was selected. The data division was performed in such a way that 140 (70%) samples were used for training and 60 (30%) samples were used for testing the models considered in this study. The results of the statistical analysis of the finally selected combination are shown in Table 1, which includes the minimum, mean, maximum and standard deviation of the input and output variables.
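The trial-and-error search for a statistically consistent 70/30 split can be sketched in pure Python. The `split_consistent` helper and the synthetic `pu` values below are illustrative stand-ins, not the paper's actual procedure or data:

```python
import random
import statistics

def split_consistent(values, train_frac=0.7, trials=200, seed=42):
    """Trial-and-error search for a train/test split whose two subsets
    have similar mean and standard deviation (statistical consistency)."""
    rng = random.Random(seed)
    n_train = int(len(values) * train_frac)
    best, best_score = None, float("inf")
    for _ in range(trials):
        shuffled = values[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        # score = mismatch in mean and std between the two subsets
        score = (abs(statistics.mean(train) - statistics.mean(test))
                 + abs(statistics.stdev(train) - statistics.stdev(test)))
        if score < best_score:
            best, best_score = (train, test), score
    return best

# Hypothetical stand-in for the 200 measured Pu values (kN)
pu = [800 + 5 * i for i in range(200)]
pu_train, pu_test = split_consistent(pu)
print(len(pu_train), len(pu_test))  # 140 60
```

Each trial shuffles the 200 case histories, and the split with the smallest mismatch in mean and standard deviation is retained, mirroring the trial-and-error strategy described above.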

Correlation Analysis
Correlation (ρ) was used to verify the strength of the correlation between different parameters (see Table 2). For a given pair of random variables (m, n), ρ is computed as

ρ = cov(m, n) / (σ_m σ_n),

where cov denotes the covariance, σ_m denotes the standard deviation of m, and σ_n denotes the standard deviation of n. |ρ| > 0.8 represents a strong relationship between m and n, values between 0.3 and 0.8 represent a medium relationship, and |ρ| < 0.3 represents a weak relationship [47]. According to Song et al. [48], a correlation is considered "strong" if |ρ| > 0.8. Table 2 displays the correlations between the input and output variables. The correlation coefficient has a maximum absolute value of 0.989, as shown in Table 2. There are "strong" to "weak" relationships among the various variable combinations, so none of the input variables was removed.
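A minimal pure-Python sketch of ρ and of the verbal strength categories used in the text (the helper names are illustrative):

```python
import math

def pearson(m, n):
    """Pearson correlation: rho = cov(m, n) / (sigma_m * sigma_n)."""
    k = len(m)
    mean_m = sum(m) / k
    mean_n = sum(n) / k
    cov = sum((a - mean_m) * (b - mean_n) for a, b in zip(m, n)) / k
    sigma_m = math.sqrt(sum((a - mean_m) ** 2 for a in m) / k)
    sigma_n = math.sqrt(sum((b - mean_n) ** 2 for b in n) / k)
    return cov / (sigma_m * sigma_n)

def strength(rho):
    """Verbal strength categories from [47]."""
    r = abs(rho)
    return "strong" if r > 0.8 else "medium" if r >= 0.3 else "weak"

print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0 (perfect positive correlation)
```

In practice this check is run over every pair of input and output columns to build Table 2.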

Extreme Gradient Boosting Algorithm
Chen and Guestrin [44] proposed the XGBoost algorithm, which is built on the GBDT framework. It has attracted a lot of attention as a result of its outstanding results in Kaggle's ML competitions [49]. Unlike GBDT, the XGBoost objective function includes a regularization term to avoid overfitting. The main objective function is described as follows:

L = Σ_i l(x_i, x̂_i) + Σ_k R(f_k) + C,

where R(f_k) represents the regularization term at iteration k and C is a constant that can be removed selectively. The regularization term R(f_k) is written as

R(f_k) = αH + (1/2) η Σ_{j=1}^{H} ω_j²,

where α is the complexity cost of the leaves, H denotes the number of leaves, η signifies the penalty variable, and ω_j represents the output of leaf node j. Leaves denote the expected categories based on the classification criteria, whereas a leaf node denotes a tree node that cannot be divided further. Furthermore, unlike GBDT, XGBoost employs a second-order Taylor expansion of the main function rather than only the first-order derivative. If the loss function is the mean square error (MSE), the main function at iteration t may be written as

L^(t) ≈ Σ_i [ g_i ω_{q(x_i)} + (1/2) h_i ω²_{q(x_i)} ] + R(f_t),

where q(x_i) is a function that maps data points to leaves, and g_i and h_i represent the first and second derivatives of the loss function, respectively. The final loss value is calculated by adding all of the loss values together. Because the samples in the DT correspond to leaf nodes, the ultimate loss value can be calculated by summing over the leaf nodes. As a result, the main function can be written as

L^(t) = Σ_{j=1}^{H} [ P_j ω_j + (1/2)(Q_j + η) ω_j² ] + αH,

where P_j = Σ_{i∈I_j} g_i, Q_j = Σ_{i∈I_j} h_i, and I_j is the set of samples in leaf node j.

To summarize, the challenge of optimizing the main function is reduced to finding the minimum of a quadratic function. Due to the added regularization term, XGBoost has a stronger capability to avoid overfitting. The structure of XGBoost is shown in Figure 3.
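Minimising the quadratic per-leaf objective P_j ω + (1/2)(Q_j + η) ω² gives the closed-form leaf output ω_j* = −P_j / (Q_j + η). A minimal pure-Python sketch of this step (illustrative only, not the xgboost library):

```python
def leaf_weight(g, h, eta):
    """Optimal XGBoost leaf output: w* = -sum(g) / (sum(h) + eta),
    the minimiser of P_j*w + 0.5*(Q_j + eta)*w**2."""
    return -sum(g) / (sum(h) + eta)

# MSE loss l = (y - yhat)**2: first derivative g = 2*(yhat - y), second h = 2
y = [10.0, 12.0, 14.0]
yhat = [0.0, 0.0, 0.0]          # initial prediction
g = [2 * (p - t) for p, t in zip(yhat, y)]
h = [2.0] * len(y)
print(leaf_weight(g, h, eta=0.0))  # 12.0, i.e., the mean, as expected for MSE
```

With η = 0 the leaf recovers the sample mean; a positive η shrinks the leaf output toward zero, which is precisely the regularization effect described above.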

Random Forest (RF) Algorithm
Because of its simplicity and versatility, RF is one of the most widely applied ML methods. Breiman developed this supervised learning approach for classification and regression analysis in 2001 [50]. RF is an ensemble learning strategy that aggregates the outputs of many individual DTs and improves prediction accuracy by using majority voting or the mean of the results, depending on the task.
Assume an input dataset Q = {q1, q2, q3, ..., qn}, where n is the number of samples. An RF model is a set of T trees T1(Q), T2(Q), T3(Q), ..., Tn(Q), whose predicted outcomes are R1, R2, ..., Rn. For a regression problem, the eventual output of the RF model is the average of the prediction outcomes of all the trees. The tree-growing algorithms are constructed by splitting the initial training set into smaller bootstrap sets, with only a few predictor variables picked at random in each split. Because the decision trees are not pruned, they continue to grow until a predetermined stopping criterion, such as the Gini diversity index, RMSE or MSE, is met. Trees with appropriate predictions are retained in the final RF model, and trees with poor predictive outcomes are excluded. The overfitting problem of a single DT model is mitigated by randomly selecting the predictor parameters and by averaging over the final set of DTs [50,51]. Figure 4 illustrates the random forest's structure.
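The bagging idea above can be sketched in pure Python, assuming depth-1 trees (stumps) as the base learners. The helpers `fit_stump` and `random_forest` are illustrative, not a production RF implementation:

```python
import random
import statistics

def fit_stump(xs, ys):
    """Depth-1 regression tree: the single split minimising squared error."""
    if len(set(xs)) < 2:                      # degenerate bootstrap sample
        m = statistics.mean(ys)
        return lambda x: m
    best = (float("inf"), None, None, None)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = statistics.mean(left), statistics.mean(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def random_forest(xs, ys, n_trees=25, seed=0):
    """Bagging: each tree is fit on a bootstrap resample; the forest
    prediction is the mean of the individual tree predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.mean(t(x) for t in trees)
```

For example, a forest fit on increasing data preserves the ordering of the targets while smoothing over individual bootstrap samples.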

AdaBoost Algorithm
AdaBoost, or adaptive boosting, is a sequential ensemble technique based on the principle of developing several weak learners using different training subsets drawn randomly from the original training dataset [52,53]. During each training round, weights are assigned and used when learning each hypothesis. The weights are used to compute the error of the hypothesis on the dataset and are an indicator of the comparative importance of each instance. The weights are recalculated after every iteration, so that instances incorrectly classified by the last hypothesis receive higher weights. This enables the algorithm to focus on instances that are more difficult to learn. Assigning revised weights to the incorrectly classified instances is the most vital task of the algorithm. Unlike in classification, in regression the instances are not simply correct or incorrect; rather, they carry a real-valued error. By comparing the computed error to a predefined threshold prediction error, an instance can be labeled as an error or not, and thus the AdaBoost classifier can be applied. Instances with larger errors on previous learners are more likely (i.e., have a higher probability) to be selected for training the subsequent base learner. Finally, the weighted average or median of the individual base learner predictions is used to provide the ensemble prediction [54].
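The reweighting step described above can be sketched in pure Python following an AdaBoost.R2-style update. The exact rule used here (relative errors, β = L̄/(1 − L̄)) is one common variant and an assumption, not necessarily the paper's implementation:

```python
def update_weights(weights, errors):
    """One AdaBoost reweighting step (AdaBoost.R2 flavour): instances
    with larger relative error keep comparatively larger weights."""
    e_max = max(errors)
    rel = [e / e_max for e in errors]               # relative errors in [0, 1]
    avg_loss = sum(w * r for w, r in zip(weights, rel)) / sum(weights)
    beta = avg_loss / (1 - avg_loss)                # confidence of this learner
    new = [w * beta ** (1 - r) for w, r in zip(weights, rel)]
    total = sum(new)
    return [w / total for w in new]                 # renormalise to sum to 1

w = update_weights([0.25] * 4, [0.1, 0.2, 0.4, 0.8])
print(w)  # the worst-predicted instance ends up with the highest weight
```

Because β < 1 when the average loss is below 0.5, well-predicted instances (small relative error, exponent near 1) are shrunk more than poorly predicted ones, which is exactly the focusing behaviour described above.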

Support Vector Machine (SVM) Algorithm
Vapnik invented the SVM in 1995 [55]; it is a popular and successful learning algorithm for linear and nonlinear classification and regression problems. The SVM algorithm delivers reliable prediction outcomes, is practicable for high-dimensional feature spaces, is robust, and has good noise resistance [56,57]. Many effective SVM implementations for classification and regression problems have been documented across disciplines [58][59][60]. The following is a summary of SVM's basic theory.
As illustrated in Figure 5, a training set {(u_k, v_k), k = 1, 2, ..., n} is chosen for an SVM model, where u_k = [u_1k, u_2k, ..., u_nk] ∈ R^{n_h} is the input data, v_k ∈ R^{n_m} is the output data corresponding to u_k, and n is the number of training samples. The goal of the SVM is to identify an optimal hyperplane function f(u) (defined by the weight vector w and the offset b) that passes through all data items within the insensitive loss coefficient ε (based on the two supporting hyperplanes w·u − b = ε and w·u − b = −ε). In nonlinear regression, the function f(u) is expressed through the Lagrange multipliers α_i and α_i* (with the penalty constant C managing the penalty error) and a kernel function K(u_i, u_j), which implicitly applies a nonlinear mapping function F. The most often used kernel functions are the linear kernel K(u_i, u_j) = u_i·u_j, the polynomial kernel K(u_i, u_j) = (u_i·u_j + c)^d, the sigmoid kernel K(u_i, u_j) = tanh(γ u_i·u_j + c), and the Gaussian kernel K(u_i, u_j) = exp(−‖u_i − u_j‖² / (2σ²)).
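The four commonly used SVM kernel functions, and the resulting kernel expansion for f(u), can be sketched in pure Python. All parameter defaults (c, d, gamma, sigma) are illustrative assumptions:

```python
import math

def linear(u, v):
    """Linear kernel: K(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))

def polynomial(u, v, c=1.0, d=3):
    """Polynomial kernel: K(u, v) = (u . v + c)^d"""
    return (linear(u, v) + c) ** d

def sigmoid(u, v, gamma=0.1, c=0.0):
    """Sigmoid kernel: K(u, v) = tanh(gamma * u . v + c)"""
    return math.tanh(gamma * linear(u, v) + c)

def gaussian(u, v, sigma=1.0):
    """Gaussian (RBF) kernel: K(u, v) = exp(-||u - v||^2 / (2 sigma^2))"""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * sigma ** 2))

def f(u, support, coeffs, b, kernel=gaussian):
    """Kernel expansion f(u) = sum_i (alpha_i - alpha_i*) K(u_i, u) + b,
    where coeffs holds the differences of the Lagrange multipliers."""
    return sum(c * kernel(s, u) for s, c in zip(support, coeffs)) + b
```

Here `support` stands for the support vectors found by the training procedure; the sketch only evaluates the decision function, it does not solve the underlying quadratic program.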

Decision Tree (DT) Algorithm
A decision tree is a tool with a tree-like structure that predicts likely outcomes, resource costs, utility costs, and potential consequences. One of the benefits of this machine learning approach over traditional statistical approaches such as regression is that it can handle more than two-dimensional data. Many researchers have adopted tree-based approaches for data-driven prediction analysis of diverse geotechnical problems [20,61,62]. Accordingly, tree-based ML techniques such as DT were used in this work to build models and identify the key predictors of pile-soil friction. A DT can be presented graphically, showing specific decision requirements as well as the complicated branching that occurs in a constructed decision. It is one of the most popular and commonly used supervised learning techniques for building accurate forecasting models.
A DT is capable of performing tasks including recognition, classification, and prediction. It is a tree-shaped structure made up of a succession of questions, each of which is described by a set of parameters. A real tree comprises roots, branches, and leaves. Similarly, the graph of a DT is comprised of nodes, which are the leaves, and branches, which represent the connections between nodes [63]. During the DT process, a variable is chosen as the root, also known as the initial node. With reference to the appointed features, the initial node is divided into many internal nodes. A DT is a top-down tree, meaning the root is at the very top; branches end in further branches and nodes [64]. Each node can be divided into two branches, each node is related to a specific characteristic, and the branches are defined by a specific range of the input. Figure 6 depicts a flowchart of the DT approach.
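The top-down splitting described above can be sketched as a small recursive regression tree. `build_tree` and `predict` are illustrative helpers, not a library API:

```python
import statistics

def build_tree(xs, ys, depth=2):
    """Recursive top-down DT: the root is split into internal nodes until
    the maximum depth or a pure node; leaves store the mean of their samples."""
    if depth == 0 or len(set(xs)) < 2:
        return {"leaf": statistics.mean(ys)}
    best = None
    for t in sorted(set(xs))[:-1]:
        l = [(x, y) for x, y in zip(xs, ys) if x <= t]
        r = [(x, y) for x, y in zip(xs, ys) if x > t]
        lm = statistics.mean([y for _, y in l])
        rm = statistics.mean([y for _, y in r])
        err = (sum((y - lm) ** 2 for _, y in l)
               + sum((y - rm) ** 2 for _, y in r))
        if best is None or err < best[0]:
            best = (err, t, l, r)
    _, t, l, r = best
    return {"split": t,
            "left": build_tree([x for x, _ in l], [y for _, y in l], depth - 1),
            "right": build_tree([x for x, _ in r], [y for _, y in r], depth - 1)}

def predict(node, x):
    """Walk from the root down the branches to a leaf."""
    if "leaf" in node:
        return node["leaf"]
    return predict(node["left"] if x <= node["split"] else node["right"], x)
```

On a simple step function the tree recovers the two levels exactly, e.g. `build_tree([1, 2, 3, 4], [0.0, 0.0, 10.0, 10.0])` splits at x = 2.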


Construction of Prediction Models
Orange software was used to create the proposed models for predicting pile bearing capacity. Orange is an open-source software package. Machine learning, preprocessing, and visualization methods are included in the default installation, which is divided into six widget sets, i.e., Data, Visualize, Classify, Regression, Evaluate and Unsupervised. Orange is visual programming software for machine learning, visualization, data mining and data analysis.
The predictor variables were provided via an input set (x) defined by x = {D, X1, X2, X3, Xp, Xg, Xt, Xm, NS, Nt}, while the target variable (y) is Pu. The most important task in every modeling step is to pick the right proportions of training and testing data. As a result, 70% of the whole dataset was chosen to generate the models in this study, with the developed models being tested on the remaining data. In other words, 140 and 60 sets were utilized for creating and testing the models, respectively. All models (XGBoost, AdaBoost, RF, DT, and SVM) were tuned to optimize the Pu prediction using a trial-and-error process. Figure 7 shows how the prediction models were built.
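Assembling the input set x and target y and applying the 140/60 division can be sketched as follows; the rows here are synthetic placeholders standing in for the 200 real case histories:

```python
# Feature names matching the paper's input set x and target y
FEATURES = ["D", "X1", "X2", "X3", "Xp", "Xg", "Xt", "Xm", "NS", "Nt"]
TARGET = "Pu"

def train_test_split(rows, train_n=140):
    """Deterministic 70/30 split: first 140 rows train, remaining 60 test."""
    return rows[:train_n], rows[train_n:]

# Synthetic placeholder rows (NOT the real measurements)
rows = [{**{f: float(i) for f in FEATURES}, TARGET: 1000.0 + i}
        for i in range(200)]

train_rows, test_rows = train_test_split(rows)
X_train = [[r[f] for f in FEATURES] for r in train_rows]
y_train = [r[TARGET] for r in train_rows]
print(len(X_train), len(test_rows))  # 140 60
```

The same X/y matrices would then be fed to each of the five learners (XGBoost, AdaBoost, RF, DT, SVM) in turn, as depicted in Figure 7.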

Hyperparameter Optimization
ML algorithms have parameters that must be tuned. The optimization procedure seeks to find ideal settings for XGBoost, AdaBoost, RF, DT, and SVM to achieve accurate predictions. This study tunes various critical parameters of the XGBoost, AdaBoost, RF, DT and SVM models and clarifies the definitions of these hyperparameters. The tuning parameters for the models were chosen and then changed in trials until the best metrics, shown in Table 3, were achieved.

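The trial-and-error tuning loop can be sketched generically in pure Python: fit one model per hyperparameter combination and keep the setting with the lowest validation RMSE. The `shrunk_mean` toy model below is purely illustrative, not one of the paper's learners:

```python
import itertools
import math

def trial_and_error(model_fn, grid, X_tr, y_tr, X_va, y_va):
    """Exhaustive trial-and-error tuning: fit one model per hyperparameter
    combination and keep the one with the lowest validation RMSE."""
    best = (float("inf"), None)
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        predict = model_fn(X_tr, y_tr, **params)
        rmse = math.sqrt(sum((predict(x) - y) ** 2
                             for x, y in zip(X_va, y_va)) / len(y_va))
        if rmse < best[0]:
            best = (rmse, params)
    return best

# Toy "model": shrunk mean, with the shrinkage lam as its only hyperparameter
def shrunk_mean(X, y, lam):
    m = sum(y) / (len(y) + lam)
    return lambda x: m

rmse, params = trial_and_error(shrunk_mean, {"lam": [0.0, 1.0, 5.0]},
                               [0] * 4, [10, 10, 10, 10], [0] * 2, [10, 10])
print(params)  # {'lam': 0.0} -- no shrinkage fits this toy data best
```

With real models, `grid` would hold the candidate values for the hyperparameters listed in Table 3, and the loop would be repeated until no setting improves the validation metrics.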

Model Evaluation Indexes
The results of the proposed models are evaluated using R², MAE, RMSE, MARE, NSE and RSR, the criteria most commonly used in the literature. These metrics are calculated as follows:

$$R^2 = \left( \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\hat{x}_i - \bar{\hat{x}})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (\hat{x}_i - \bar{\hat{x}})^2}} \right)^2$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \hat{x}_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2}$$

$$\mathrm{MARE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| x_i - \hat{x}_i \right|}{x_i}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$\mathrm{RSR} = \frac{\mathrm{RMSE}}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

where n denotes the number of points; x_i and x̂_i denote the actual and predicted outputs of the i-th sample, respectively; and x̄ is the mean of the actual outputs. R² ranges from 0 to 1, and a higher R² value indicates a more efficient model; the model is considered effective when R² is greater than 0.8 and close to 1 [22]. RMSE is the root of the mean squared difference between predicted outputs and targets, while MAE is the mean magnitude of the errors; for both, values closer to 0 indicate better performance, so the model's accuracy is greater when MAE and RMSE are small. The model yielding the lowest MARE value has the superior predictive power. RSR ranges from 0 to a large positive number; a lower RSR implies a lower RMSE relative to the spread of the observations, indicating a more productive model. RSR and NSE performance is classified as very good, good, satisfactory, and unsatisfactory for 0 ≤ RSR ≤ 0.5, 0.5 < RSR ≤ 0.6, 0.6 < RSR ≤ 0.7 and RSR > 0.7, and for 0.75 < NSE ≤ 1, 0.65 < NSE ≤ 0.75, 0.5 < NSE ≤ 0.65 and NSE ≤ 0.5, respectively [65].
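These evaluation criteria can be implemented directly from their standard definitions; in the sketch below, MARE is expressed as a fraction rather than a percentage, which is an assumption since conventions vary.

```python
import math

def evaluation_metrics(actual, predicted):
    """Compute MAE, RMSE, MARE, NSE and RSR for a set of predictions."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(sse / n)
    mare = sum(abs(a - p) / a for a, p in zip(actual, predicted)) / n
    nse = 1.0 - sse / sst                  # Nash-Sutcliffe efficiency
    rsr = rmse / math.sqrt(sst / n)        # RMSE / observed std. deviation
    return {"MAE": mae, "RMSE": rmse, "MARE": mare, "NSE": nse, "RSR": rsr}

# A perfect prediction gives MAE = RMSE = MARE = RSR = 0 and NSE = 1
print(evaluation_metrics([100.0, 200.0, 300.0], [100.0, 200.0, 300.0]))
```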
In addition, the Taylor diagram was used to compare the models' performance visually [66]. A Taylor diagram shows how similar patterns are and how closely a model pattern matches the reference. The standard deviation (σ), the correlation coefficient, and the centered RMS difference are three related model performance statistics that can be shown on a single two-dimensional plot using the law of cosines. The Taylor diagram is particularly well suited to comparing the performance of several models.
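The law-of-cosines relation underlying the Taylor diagram, linking the centered RMS difference $E'$, the model and observed standard deviations $\sigma_m$ and $\sigma_o$, and the correlation coefficient $R$, is:

$$E'^2 = \sigma_m^2 + \sigma_o^2 - 2\,\sigma_m \sigma_o R$$

This is why all three statistics can be read off a single polar plot: $\sigma_m$ is the radial distance, $R$ fixes the azimuthal angle, and $E'$ is the distance from the observed point.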

Comparison of Models
This section evaluates the efficacy of the models. Figures 8 and 9 depict the prediction performance on the training and testing datasets in regression form, respectively, while Tables 4 and 5 provide a summary of the relevant data.
Comparing the above performance measures, the proposed XGBoost model performed better than AdaBoost, RF, DT and SVM. From this statistical analysis and these prediction capabilities, we can state that the XGBoost model predicts pile bearing capacity with good accuracy.
The sensitivity of the XGBoost model to its inputs was assessed using Yang and Zang's [67] method for quantifying the impact of input variables on Pu. This approach, which has been used in several investigations [22,28,68–70], is as follows:

$$r_{ij} = \frac{\sum_{m=1}^{n} x_{im} x_{om}}{\sqrt{\sum_{m=1}^{n} x_{im}^2 \sum_{m=1}^{n} x_{om}^2}}$$

where n represents the number of values (i.e., 140), and x_im and x_om denote the input and output variables, respectively. For each input parameter, the r_ij value ranges from zero to one, with higher r_ij values indicating a stronger effect on the output variable (i.e., Pu).
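Yang and Zang's score described above can be sketched as follows, assuming the standard cosine-amplitude form of the method (the function and variable names are illustrative):

```python
import math

def sensitivity_score(x_input, x_output):
    """Strength of relation r_ij between one input variable and the output."""
    numerator = sum(xi * xo for xi, xo in zip(x_input, x_output))
    denominator = math.sqrt(sum(xi ** 2 for xi in x_input) *
                            sum(xo ** 2 for xo in x_output))
    return numerator / denominator

# An input proportional to the output gets the maximum score of 1.0
print(sensitivity_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```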
Figure 10 shows the r_ij scores for all input variables and demonstrates that the SPT blow count along the pile shaft (NS) (r_ij = 0.985) has the greatest effect on Pu.
With the use of the Taylor diagram (see Figure 11), we investigated the models' efficiency further. The closer each model's point is to the observed point location, the better its performance. All models demonstrated good predictive capability, while the XGBoost method had the highest correlation and the lowest RMSE.

Comparison with Other Researchers
Table 6 shows findings from studies on machine learning applications to pile bearing capacity. On the test dataset, the reported performance of ML algorithms predicting foundation load mostly ranges in R² from 0.71 to 0.918 according to previous studies, while in the present study it is 0.955. However, because different datasets were used, a direct comparison between these results is unwarranted. A project that draws on these different datasets is needed to provide a generalized model for foundation engineering.

Conclusions
Pile bearing capacity values were estimated in this paper using five models. The prediction model was built with ten input parameters and one output parameter. The modeling results show that the XGBoost model has the best capability for accurate prediction of Pu when compared to the other models, AdaBoost, RF, DT and SVM. The following are the major findings of this study:

1. In the testing phase, the XGBoost model (R² = 0.955, MAE = 59.929, RMSE = 80.653, MARE = 6.6, NSE = 0.950, and RSR = 0.225) has the highest performance capability compared to the other soft computing techniques considered in this study, i.e., AdaBoost, RF, DT and SVM, as well as the models used in the literature.

2. Sensitivity analysis results show that the SPT blow count along the pile shaft (NS) was the most important parameter in predicting pile bearing capacity.

3. The Taylor diagram also verified that all the models are good, but the XGBoost algorithm had a higher correlation and a lower RMSE.

4. Based on the results and analysis, the XGBoost model can also be applied to solve a variety of geotechnical engineering problems.
Furthermore, the XGBoost technique has the advantage of being easily updated. The proposed model is therefore open to further development, and the collection of more data will result in significantly stronger prediction capability, avoiding the expertise and time needed to update an existing design aid or equation.

Figure 1. Schematic layout of pile load test.
(1) diameter of the pile (D); (2) depth of the first layer of soil embedded (X1); (3) depth of the second layer of soil embedded (X2); (4) depth of the third layer of soil

Figure 5. SVM for a regression problem.

Figure 7. The flowchart for applying a data-driven technique to anticipate pile bearing capacity.

Figure 10. Sensitivity analysis of input variables.

Figure 11. Taylor diagram of the models.

Table 1. Statistical study of inputs and output data.

Table 4. Summary of the training model. In the training part, XGBoost produced lower MARE and RSR values and a higher NSE compared to AdaBoost, RF, DT and SVM.

Table 6. Comparison with other studies.