Prediction of Pile Axial Bearing Capacity Using Artificial Neural Network and Random Forest

Axial bearing capacity of piles is the most important parameter in pile foundation design. In this paper, artificial neural network (ANN) and random forest (RF) algorithms were utilized to predict the ultimate axial bearing capacity of driven piles. An unprecedented database containing 2314 driven pile static load test reports was gathered. The input variables were the pile diameter, length of pile segments, natural ground elevation, pile top elevation, guide pile segment stop driving elevation, pile tip elevation, average standard penetration test (SPT) value along the embedded length of the pile, and average SPT blow count at the tip of the pile, whereas the ultimate load on the pile top was the output variable. The dataset was divided into training (70%) and testing (30%) parts for the construction and validation phases, respectively. Three error criteria, namely the mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R2), were used to evaluate the performance of the RF and ANN algorithms. In addition, the predictions were compared with five empirical equations from the literature and with classical multi-variable regression. The results showed that RF outperformed ANN and the other methods. A sensitivity analysis revealed that the average SPT value and the pile tip elevation were the most important factors in predicting the axial bearing capacity of piles.


Introduction
In applied engineering, piles are used as foundation elements, and the axial bearing capacity of the pile (Pu) is considered the most important parameter in the design of a pile foundation. Normally, the axial bearing capacity of piles can be determined by five approaches, namely static analysis, dynamic analysis, dynamic testing, pile load testing, and in-situ testing [1]. Of these methods, the pile load test is considered the best way to determine the pile bearing capacity. However, it is time-consuming, and its cost is often difficult to justify for ordinary or small projects, whereas the other methods have lower accuracy. As a result, several approaches have been developed to predict the axial bearing capacity of piles or to enhance the prediction accuracy. These methods rely on simplifications, assumptions, or empirical approaches with respect to the soil stratigraphy, the soil-pile interaction, and the distribution of soil resistance along the pile. In such studies, test results were used as complementary elements to improve the prediction accuracy. Meanwhile, the European standard (Eurocode 7) [2] recommends using the following field tests: DP (dynamic probing test), SS (press-in and screw-on probe test), SPT (standard penetration test), PMT (pressuremeter test), PLT (plate loading test), DMT (flat dilatometer test), FVT (field vane test), and CPTu (cone penetration test with measurement of pore pressure).

Significance of the Research Study
Accurately predicting the axial bearing capacity of piles is of crucial importance because of its many possible advantages and contributions to foundation engineering. Numerical or experimental approaches in the available literature still face some limitations, for instance, small datasets (Momeni et al. [44] with 36 samples; Bagińska and Srokosz [45] with 50 samples; Teh et al. [46] with 37 samples), limited accuracy assessment and improvement of the ML algorithms, or a lack of comparison with classical prediction methodologies. Therefore, the contribution of the present work can be summarized as follows: (i) the largest dataset, to the best of the author's knowledge, was used for the construction of the ML models, comprising 2314 experimental tests; (ii) two ML algorithms, namely ANN and RF, were compared with each other, with classical MVR, and with five formulas from the literature to fully assess the prediction performance of each approach; (iii) the performance of the ML algorithms was evaluated under random splitting of the dataset, which gives a truer picture of their efficiency; and (iv) a sensitivity analysis was performed to reveal the role of each input parameter in predicting the axial bearing capacity of piles.

Experimental Measurement of Bearing Capacity
In this work, pile load tests were conducted on 2314 reinforced concrete piles at a test site in Ha Nam province, Vietnam, whose location is shown in Figure 1a. Pre-cast square section piles with closed tips were driven into the ground by a hydraulic pile press machine at a constant rate of penetration. The piles were assembled from 1 to 3 pile segments, connected by welding through intermediate steel plates and fixed steel plates at the top of each pile segment. The pile top was driven into the ground to the design pile top elevation by the use of the guide pile segment. The pile load tests started at least 7 days after the piles were driven, and the experimental layout is depicted in Figure 1b-d. In each pile test, the load was increased progressively. If, after one hour of monitoring, the settlement of the pile top was less than 0.20 mm, the load was increased to the next level. Depending on the design requirements, piles could be loaded up to 200% of the design load. The time required for 100%, 150%, and 200% of the load could last for more than 6 h, 12 h, or 24 h, respectively. The bearing capacity of the piles was determined as follows:
(i) If the settlement of the pile top at a given load level was 5 times higher than the settlement of the pile top at the previous load level, or if the settlement of the pile top at a given load level increased continuously while the load did not increase, the pile bearing capacity was taken as that failure load. The number of piles corresponding to this situation was 688, representing about 30% of the samples. An example of this situation is given in Appendix A (Figure A1).
(ii) When the pile capacity was too large for the pile to be loaded to failure, the load (P)-settlement (S) curve was plotted in log(P)-log(S) coordinates. The intersection point of the two lines was considered as the failure point and taken as the pile bearing capacity, according to De Beer (1968) [47]. The number of piles corresponding to this situation was 1225 (accounting for more than 50% of the samples). An example of this situation is given in Appendix A (Figures A2 and A3).
(iii) For the remaining samples, the log(P)-log(S) relationship was linear, so that no intersection point could be found as in the previous case. The pile bearing capacity was then taken as the load level at which the settlement of the pile top exceeded 10% of the pile diameter.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 21
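Criteria (i) and (iii) above are simple threshold rules on the load-settlement record and can be sketched in a few lines. The following is a minimal illustration, not the procedure used by the authors: the function names are hypothetical, the settlements are assumed to be cumulative pile-top settlements in mm recorded at the end of each load level, and the "settlement increases under constant load" branch of criterion (i) and the De Beer construction of criterion (ii) are not covered.

```python
from typing import Optional, Sequence


def failure_load_by_settlement_jump(
    loads: Sequence[float], settlements: Sequence[float], ratio: float = 5.0
) -> Optional[float]:
    """Criterion (i), first branch: return the first load level whose pile-top
    settlement is more than `ratio` times the settlement at the previous level.
    Assumes cumulative settlements (hypothetical reading of the criterion)."""
    for i in range(1, len(loads)):
        previous = settlements[i - 1]
        if previous > 0 and settlements[i] > ratio * previous:
            return loads[i]
    return None  # no such jump observed


def failure_load_by_diameter(
    loads: Sequence[float],
    settlements: Sequence[float],
    diameter_mm: float,
    fraction: float = 0.10,
) -> Optional[float]:
    """Criterion (iii): return the first load level at which the pile-top
    settlement exceeds `fraction` (10%) of the pile diameter."""
    limit = fraction * diameter_mm
    for load, settlement in zip(loads, settlements):
        if settlement > limit:
            return load
    return None  # limit never exceeded


# Made-up record: loads in kN, settlements in mm, pile diameter 300 mm.
loads = [200, 400, 600, 800]
settlements = [1.0, 2.5, 6.0, 35.0]
print(failure_load_by_settlement_jump(loads, settlements))  # -> 800
print(failure_load_by_diameter(loads, settlements, 300))    # -> 800
```

Both functions return `None` when the record never meets the criterion, which corresponds to the cases that have to be handled by another rule.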

Data Preparation
In order to correctly predict the bearing capacity of piles, a thorough understanding of the factors that affect it is needed. Most traditional methods for determining the pile bearing capacity include the following parameters: pile geometry, pile material properties, and soil properties [4,5]. The depth of the water table was not included in this study, as its effect is believed to be already accounted for in the SPT blow counts [43]. Since the bearing capacity of piles depends on the soil compressibility, and the SPT is one of the most commonly used tests in practice (indicating the in situ compressibility of soils), the SPT blow count per 300 mm (N) along the embedded length of the pile was used as a measure of soil compressibility. In this study, the average SPT blow counts along the pile shaft (Ns) and at the pile tip (Nt) were calculated. To obtain the average SPT value around the pile tip (Nt), Meyerhof's recommendation (1976) [4] was followed: Nt was averaged over 8D above and 3D below the pile tip, where D represents the pile diameter.
Hence, the factors used for the ML simulation were (Figure 2): (i) pile diameter (D); (ii) length of pile tip segment (L1); (iii) length of second pile segment (L2); (iv) length of pile top segment (L3); (v) the natural ground elevation (Eg); (vi) pile top elevation (Ep); (vii) guide pile segment stop driving elevation (Et); (viii) pile tip elevation (Zm); (ix) the average SPT blow count along the embedded length of the pile (Ns); and (x) the average SPT blow count at the tip of the pile (Nt). The bearing capacity (Pu) was the single output variable in this study.
Owing to the large quantity of data (2314 samples), the dataset used in this study is only partly presented and summarized in Table 1, along with the statistical information of the input and output variables. As observed in Table 1, the pile diameter (D) ranged from 300 to 400 mm. The length of the pile tip segment (L1) ranged from 3 m to 8.4 m. The length of the second pile segment (L2) ranged from 1.47 m to 8 m. The length of the pile top segment (L3) ranged from 0 m to 3.95 m, where a value of 0 means that the segment did not exist. The natural ground elevation (Eg) varied from −1.6 m to 3.4 m.
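The averaging window used for Nt (8D above and 3D below the pile tip) can be illustrated with a short sketch. It assumes that SPT values are available at discrete depths measured positive downward and that a plain arithmetic mean of the readings inside the window is taken; the function name and the sample borehole log are hypothetical, and a thickness-weighted average would be an equally plausible reading.

```python
from typing import Sequence


def average_spt_near_tip(
    depths_m: Sequence[float],
    n_values: Sequence[float],
    tip_depth_m: float,
    diameter_m: float,
    above: float = 8.0,
    below: float = 3.0,
) -> float:
    """Mean SPT blow count between `above`*D above and `below`*D below the tip.
    Depths are positive downward (assumption of this sketch)."""
    top = tip_depth_m - above * diameter_m
    bottom = tip_depth_m + below * diameter_m
    window = [n for z, n in zip(depths_m, n_values) if top <= z <= bottom]
    if not window:
        raise ValueError("no SPT reading inside the averaging window")
    return sum(window) / len(window)


# Made-up borehole log: one SPT reading per metre.
depths = [8, 9, 10, 11, 12, 13, 14]
blows = [5, 8, 12, 15, 20, 25, 30]
# Tip at 12 m, D = 0.4 m: window from 8.8 m to 13.2 m, readings at 9-13 m.
print(average_spt_near_tip(depths, blows, tip_depth_m=12.0, diameter_m=0.4))  # -> 16.0
```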

In this work, the collected dataset was divided into training and testing datasets. The training part (approximately 70% of the total data) was used to train the ML models, whereas the testing part (the remaining approximately 30%) was used to validate their performance. Different from the original data, the training dataset (including 10 inputs and 1 output) was scaled to the [−1; 1] range (Table 2). By considering all variables in a uniform range, bias within the dataset between inputs could be minimized. In the present study, the range [−1; 1] was selected to better capture the non-Gaussian distribution of the input variables. The scaling parameters, i.e., the minimum and maximum values of the training data, were also used to scale the testing dataset. The scaling of input and output variables was applied using Equation (1). The histograms of all the data, including 10 inputs and 1 output, are presented in Figure 3.
$$x_{scaled} = \frac{2\,(x - \alpha)}{\beta - \alpha} - 1 \qquad (1)$$
where α and β represented the minimum and maximum values of the corresponding variable, and x denoted the value of the selected variable to be scaled.
Table 2. Statistical values of the normalization process of the training dataset.
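The split-then-scale procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' code: the helper names and the seed are hypothetical, and the mapping is the usual min-max scaling to [−1, 1], with α and β taken from the training data only and then reused for the testing data.

```python
import random
from typing import List, Sequence, Tuple


def split_indices(n: int, train_fraction: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into training and testing parts."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(train_fraction * n)
    return indices[:n_train], indices[n_train:]


def fit_minmax(train_column: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta): minimum and maximum of a training column."""
    return min(train_column), max(train_column)


def scale(x: float, alpha: float, beta: float) -> float:
    """Map x from [alpha, beta] to [-1, 1]."""
    return 2.0 * (x - alpha) / (beta - alpha) - 1.0


def unscale(s: float, alpha: float, beta: float) -> float:
    """Inverse mapping, used to bring predictions back to physical units."""
    return alpha + (s + 1.0) * (beta - alpha) / 2.0


train_idx, test_idx = split_indices(2314)
print(len(train_idx), len(test_idx))  # -> 1620 694

# Pile diameter example: alpha and beta come from the training data only.
alpha, beta = fit_minmax([300.0, 350.0, 400.0])
print(scale(350.0, alpha, beta))   # -> 0.0
print(unscale(0.0, alpha, beta))   # -> 350.0
```

Because α and β come from the training part, a testing value outside the training range maps slightly outside [−1, 1]; that is expected with this scheme.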

Random Forest (RF)
Random forests (RF) designate a family of ML methods composed of different algorithms for inducing a set of decision trees, such as the Breiman Forest algorithm presented by Breiman [48] and often used in the literature as a benchmark model. In this algorithm, two principles of randomization are used, namely bagging and random feature selection. The learning step therefore consists of building a set of decision trees, each derived from a bootstrap sample of the original learning set (the bagging principle) and grown with a tree induction method called random tree. Such an induction algorithm, usually based on the classification and regression trees (CART) algorithm [49], modifies the partitioning procedure of the nodes of the tree so that the selection of the feature used as the partitioning criterion is partially random. In other words, at each node of the tree, a subset of features is generated randomly, from which the best partition is chosen. To summarize, in the RF method, each decision tree is constructed according to the following steps: Step 1: For N data points in the learning set, randomly draw N individuals with replacement. The resulting sample is the one used for the induction of the tree; Step 2: For M features, a number K << M is specified so that, at each node of the tree, a subset of K features is drawn randomly, among which the best is then selected for partitioning; Step 3: The tree is grown until it reaches its maximum size. No pruning is done. In this process, the induction of the tree is mainly governed by one hyperparameter, the number K, which makes it possible to introduce more or less randomness into the induction.
In particular, when K = M, the choice of the partitioning feature is not randomized at all, and bagging is the only remaining source of randomness. Each tree in the forest has a structure and properties that cannot be known in advance. The randomness in RF induction can take advantage of the complementarity of trees, but there is no guarantee that adding a tree to the forest will actually improve the performance. Only a few research studies in the literature have looked at the number of decision trees to be built within a forest. When Breiman introduced the RF formalism in [48], he also demonstrated that beyond a certain number of trees, adding more did not systematically improve the performance of RF. This result indicates that the number of trees in an RF does not necessarily have to be as large as possible to produce an efficient regressor. The results of [50,51] have experimentally confirmed this assertion.
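Steps 1 to 3 above can be made concrete with a deliberately small, pure-Python regression forest. It is a teaching sketch under simplifying assumptions, not Breiman's reference implementation or the model used in this study: all function names are hypothetical, splits are scored by the sum of squared errors, and a node becomes a leaf early if none of its K randomly drawn features can split it.

```python
import random


def bootstrap_sample(n, rng):
    """Step 1: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, rows, features):
    """Best (score, feature, threshold) among the candidate features, or None."""
    best = None
    for f in features:
        values = sorted({X[r][f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y[r] for r in rows if X[r][f] <= threshold]
            right = [y[r] for r in rows if X[r][f] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def grow_tree(X, y, rows, k, rng):
    """Steps 2-3: grow an unpruned tree, drawing k features at each node."""
    targets = [y[r] for r in rows]
    mean = sum(targets) / len(targets)
    if len(set(targets)) == 1:
        return mean  # pure leaf
    features = rng.sample(range(len(X[0])), k)  # random subset of K features
    split = best_split(X, y, rows, features)
    if split is None:
        return mean  # drawn features cannot split this node
    _, f, threshold = split
    left = [r for r in rows if X[r][f] <= threshold]
    right = [r for r in rows if X[r][f] > threshold]
    return (f, threshold, grow_tree(X, y, left, k, rng), grow_tree(X, y, right, k, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


def fit_forest(X, y, n_trees=25, k=1, seed=0):
    rng = random.Random(seed)
    return [grow_tree(X, y, bootstrap_sample(len(X), rng), k, rng) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Forest prediction: average of the individual tree predictions."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Made-up data: the target depends on feature 0 only.
X = [[i, i % 3] for i in range(30)]
y = [2 * i for i in range(30)]
forest = fit_forest(X, y, n_trees=10, k=1, seed=3)
print(predict_forest(forest, [10, 1]))
```

Setting `k` equal to the number of features removes the feature randomization and leaves bagging as the only source of diversity between trees.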

Artificial Neural Network (ANN)
Artificial neural networks (ANN) have emerged over the past four decades as a powerful and versatile computational tool for organizing and correlating knowledge [52]. ANN has proved to be a useful prediction tool for many types of problems that are usually difficult to address using conventional numerical and statistical approaches [53]. McCulloch, a mathematician, and Pitts, a neuroscientist, were the first two scientists to develop the idea of ANN, based on the structure of the human brain [54]. ANN exhibits a strong ability to handle complex problems in which the relationships between input(s) and output(s) are complex or nonlinear [55]. This computational technique is capable of recognizing, capturing, and mapping features, the so-called patterns, in a set of data, primarily owing to the high number of neuronal interconnections that process much information in parallel [53]. The structure of an ANN consists of several layers (input, hidden, and output layers) connected together by different link weights through hidden nodes [55]. In each node, an activation function is applied. The net input of a node is obtained by summing the weighted inputs of its connections and adding a bias [44]. Back-propagation is the most popular of the different learning algorithms used to train ANN [56]. In addition, many other learning rules have been introduced and extended to this day [57,58].
This shows that ANN has many advantages and is widely used in different fields, especially in the field of construction engineering [59,60]. Moreover, numerous investigations on the determination of bearing capacity using ANN have been conducted [53,55,57,58,60]. In the work of Pal and Deswal [61], the authors predicted the total capacity of concrete spun pipe piles by using stress-wave data to build an ANN-based predictive model. According to their conclusion, the predictive performance given by ANN was more reliable than that of support vector machines (SVM). Benali and Nechnech [62] proposed an ANN-based predictive model of the pile bearing capacity in cohesionless soil. A total of 80 axial load cases were used, and the results showed a correlation coefficient (R) of 0.92, demonstrating the reliability of the ANN-based predictive model. Besides, numerous investigations on the behavior of pile bearing using ANN also pointed out a better prediction performance compared with analytical and empirical methods [53,60-62]. A schematic diagram of a neural network is illustrated in Figure 4. In the ANN algorithm, the multi-layer network is generally shown to operate most effectively, because it can simulate nonlinear processes. The main principle of neural computing is the decomposition of the relationship between inputs and output into a series of linearly separable steps using hidden layers. Three distinct steps in developing an ANN-based solution can be summarized as [63]: Step 1: Transformation or scaling of the data; Step 2: Definition of the network architecture, where the number of hidden layers, the number of neurons in each layer, and the connectivity between neurons are determined. The selected network architecture is depicted in Figure 4; Step 3: Training of the network to respond correctly to a given set of inputs; this step is considered as the development of the neural network (Figure 5).
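The node computation described above (weighted sum of the inputs plus a bias, passed through an activation function) and its stacking into layers can be sketched as a forward pass. The snippet below is only an illustration: the tanh hidden activation, the linear output node, and all weights are assumptions chosen for readability and are not the architecture or parameters of the network used in this study; training by back-propagation is not shown.

```python
import math


def neuron(inputs, weights, bias, activation=math.tanh):
    """Net input = sum of weighted inputs + bias; output = activation(net input)."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


def forward(inputs, hidden_layer, output_neuron):
    """Forward pass through one hidden layer and a linear output node.

    hidden_layer: list of (weights, bias) pairs, one per hidden node.
    output_neuron: (weights, bias) of the single output node.
    """
    hidden = [neuron(inputs, w, b) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return neuron(hidden, w_out, b_out, activation=lambda z: z)


# Hypothetical 2-input, 2-hidden-node, 1-output network with scaled inputs.
hidden_layer = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
output_neuron = ([1.0, 1.0], 0.5)
print(forward([0.0, 0.0], hidden_layer, output_neuron))  # -> 0.5
```

With inputs scaled to [−1, 1] as in Step 1, the tanh nodes operate in their sensitive range, which is one common reason for scaling before training.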

Performance Evaluation
In this paper, three indicators accounting for the error between the actual and predicted values of Pu were used, namely the mean absolute error (MAE), the root mean square error (RMSE), and the squared correlation coefficient, or coefficient of determination (R2). R2 measured the squared correlation between the predicted and actual Pu values, with values in the range [0, 1]. Low RMSE and MAE values indicated better accuracy of the proposed ML algorithms. RMSE is the square root of the mean squared difference, whereas MAE is the mean absolute difference, between the predicted and actual Pu values. These quantities can be calculated using the following equations [64-68]:
$$\mathrm{MAE} = \frac{1}{k}\sum_{i=1}^{k}\left|v_i - \hat{v}_i\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(v_i - \hat{v}_i\right)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{k}\left(v_i - \hat{v}_i\right)^2}{\sum_{i=1}^{k}\left(v_i - \bar{v}\right)^2}$$
where k is the number of samples, $v_i$ and $\hat{v}_i$ are the actual and predicted outputs, respectively, and $\bar{v}$ is the average value of the $v_i$.
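The three indicators are straightforward to compute. The sketch below follows the standard definitions of MAE, RMSE, and the coefficient of determination (with R2 written as one minus the ratio of residual to total sum of squares); the function names and the sample values are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up bearing capacities in kN.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(mae(actual, predicted))   # -> 10.0
print(rmse(actual, predicted))  # -> 10.0
print(r2(actual, predicted))    # approximately 0.992
```

In this example every prediction is off by exactly 10 kN, so MAE and RMSE coincide; RMSE exceeds MAE as soon as the errors differ in magnitude, because squaring weights large errors more heavily.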

Comparison of RF and ANN
The effectiveness of the RF and ANN models is evaluated in this section. The parameters of ANN and RF used in this study are given in Tables 3 and 4, respectively. The prediction performance in regression form is shown in Figures 6 and 7 for the training and testing datasets, respectively, and a summary of the corresponding information is given in Table 5. It is worth mentioning that the results presented herein were transformed back into the original (unscaled) range.
Considering the testing dataset, the RF model yielded the best prediction results with respect to R2, RMSE, MAE, the mean of error (merror), and the standard deviation of error (StDerror): R2 = 0.866 and 0.809; RMSE = 98.161 and 116.366; MAE = 2.924 and 3.190; merror = 0.573% and 1.202%; StDerror = 9.461% and 10.786% for RF and ANN, respectively. The MAE value of RF was slightly higher in the training part but much lower in the testing part compared to ANN, because the performance of such a model might be influenced by the choice of the sample indexes of the training data.
From a statistical point of view, the performance of the ANN and RF algorithms needs to be fully evaluated. The above results showed that the RF model was better than ANN in predicting the bearing capacity of piles. As mentioned in the simulation procedure, 70% of the experimental data were randomly selected in order to construct and train the ANN and RF black boxes. The performance of such a model might be influenced by the choice of the sample indexes used to construct the training dataset. Therefore, a total of 1000 numerical simulations were next carried out, taking into account the random splitting effect in the dataset. Repeating a simulation while taking into account the random effect of the input is known as Monte Carlo simulation, which is well established in the literature [69]. The R2 values of these simulations are plotted in Figure 8a,c, and the corresponding histograms are plotted in Figure 8b,d. It was observed that the proposed RF model gave satisfactory R2 values within the range of 0.83 to 0.87. The most frequent R2 obtained over 1000 simulations was R2 = 0.855, with a frequency of about 170. ANN showed a lower accuracy, with R2 values ranging from 0.78 to 0.84 and a most frequent value of R2 = 0.82 (frequency of about 260). Summarized values of the accuracy of the two models for the testing part are presented in Table 6.
It could thus be concluded that both the RF and ANN algorithms had high potential to predict the bearing capacity of driven piles. However, the RF technique yielded better results, with an average R2 = 0.861 compared to ANN (average R2 = 0.811). In conclusion, from the statistical analysis and prediction errors, the RF algorithm was the better model to predict the bearing capacity of piles.
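The repeated random-splitting experiment described above can be outlined as a loop that reshuffles the data, refits the model on 70% of the samples, and records R2 on the remaining 30%. In the sketch below a one-variable least-squares line stands in for the RF and ANN models purely to keep the example self-contained; this substitution, the synthetic data, the function names, and the seed are all assumptions of the illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(actual, predicted):
    mean_actual = statistics.mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def monte_carlo_r2(xs, ys, runs=1000, train_fraction=0.7, seed=0):
    """Repeat the random train/test split `runs` times; return the test R2 values."""
    rng = random.Random(seed)
    n = len(xs)
    n_train = round(train_fraction * n)
    scores = []
    for _ in range(runs):
        indices = list(range(n))
        rng.shuffle(indices)  # new random split at every run
        train, test = indices[:n_train], indices[n_train:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predicted = [slope * xs[i] + intercept for i in test]
        scores.append(r2_score([ys[i] for i in test], predicted))
    return scores


# Synthetic, nearly linear data with a small deterministic perturbation.
xs = list(range(100))
ys = [2 * x + ((x * 37) % 11 - 5) for x in xs]
scores = monte_carlo_r2(xs, ys, runs=200, seed=42)
print(min(scores), statistics.mean(scores), max(scores))
```

The spread between the minimum and maximum score is the quantity of interest here: it shows how much of a reported R2 is due to the particular split rather than to the model itself.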

Comparison with Empirical Equations and Multi-Variable Regression
In this section, the RF model is compared with traditional formulas for predicting the bearing capacity of driven piles. These formulas use the SPT value to estimate the bearing capacity of a driven pile, as in the works of Meyerhof (1976) [4], ref. [7], Decourt (1995) [10], ref. [8], and AIJ (2004) [11]. The values for unit base (Qb) and unit shaft (Qs) resistance are summarized in Table 7; for instance, the formula of [7] uses Qb (kN/m2) = 300 Nt and Qs (kN/m2) = 2 Ns.
The value of the bearing capacity could then be estimated as a function of Qb and Qs, following the well-known formula
$$P_u = A_p\,Q_b + U \sum_{i=1}^{n} Q_{s(i)}\,L_i$$
where A_p and U denote the cross-sectional area and the perimeter of the pile, n refers to the number of soil layers, L_i denotes the thickness, and Q_s(i) indicates the unit shaft resistance of the i-th soil layer that the pile penetrates.
In addition, classical MVR was also applied to predict the bearing capacity of piles. The multi-variable regression technique is commonly used in several studies, such as Egbe et al. [70] and Silva et al. [71], to predict the properties and chemical composition of soil. The regression coefficients are shown in Table 8.
Figure 9 and Table 9 show several results and detailed information on the measured pile bearing capacity of the testing part, along with the bearing capacity predicted by the RF model, MVR, and the five empirical formulas. The results also showed that MVR was less accurate than the RF model, but both models gave better accuracy than the traditional formulas. It is worth noting that these formulas were developed for granular soil. However, the effect of soil type on the bearing capacity of piles was neglected in this study, although the bearing capacity is known to depend on the soil type. The main purpose of this section was to demonstrate the higher prediction capacity of the ML approach compared with empirical equations, even without information on soil types.
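As a worked illustration of how a unit base resistance Qb and unit shaft resistances Qs combine into a pile capacity, the sketch below applies the base-plus-shaft summation to a square pile, using the unit resistances quoted for [7] (Qb = 300 Nt and Qs = 2 Ns, both in kN/m2). The function name, the square cross-section of side D (area D², perimeter 4D), and the soil profile are assumptions of the example; the other empirical formulas would only change the two unit-resistance expressions.

```python
def pile_capacity_kn(side_m, n_tip, layers):
    """Ultimate capacity (kN) of a square pile from SPT-based unit resistances.

    side_m: side of the square pile section, in m.
    n_tip:  average SPT blow count around the tip (Nt).
    layers: list of (thickness_m, n_shaft) pairs, one per soil layer.
    """
    base_area = side_m ** 2          # m2, square section (assumption)
    perimeter = 4.0 * side_m         # m
    q_b = 300.0 * n_tip              # kN/m2, unit base resistance of [7]
    shaft = sum(2.0 * n_shaft * thickness for thickness, n_shaft in layers)  # kN/m
    return base_area * q_b + perimeter * shaft


# Made-up profile: 5 m of soil with Ns = 8 over 7 m with Ns = 15, Nt = 20.
capacity = pile_capacity_kn(0.4, 20, [(5.0, 8), (7.0, 15)])
print(round(capacity, 1))  # -> 1424.0
```

In this example the base contributes 0.16 × 6000 = 960 kN and the shaft 1.6 × (80 + 210) = 464 kN.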

Among the 10 variables used to predict the bearing capacity of piles with RF, the average SPT blow count along the embedded length of the pile (Ns) was the most important, with an increase in MSE of 33% (Figure 10). Indeed, Ns is an important indicator of pile bearing capacity because it is related to the ultimate friction along the pile shaft. The pile tip elevation Zm was the second most important variable, with an increase in MSE of 25%. From a soil mechanics point of view, this means that even when the soil properties change little, the pile tip elevation plays an important role in the bearing capacity of the pile, since this variable is involved in the pile tip resistance. The variables L2, L1, and Et ranked third to fifth, with increases in MSE ranging from 5% to 19% (Figure 10). The other predictor variables included in the model (Zg, Nt, Zp, L3, and D) each showed an increase in MSE below 5%. This observation is in good agreement with the MVR results, where the coefficients of the input variables L3 and Zp were equal to 0 and D had a relatively small coefficient of 2.33 (see Table 8).

Conclusions
In this study, the capability of the RF and ANN algorithms to predict the bearing capacity of piles was examined. An unprecedented database containing 2314 instances from field measurements of pile load tests was used to develop and evaluate the two proposed ML models. The results showed that RF outperformed ANN with satisfactory accuracy (R2 = 0.866, RMSE = 98.161 kN, MAE = 2.924 kN using RF, compared with R2 = 0.809, RMSE = 116.366 kN, MAE = 3.190 kN using ANN). Moreover, the RF algorithm predicted the pile bearing capacity more accurately than the traditional approaches, namely the five empirical formulas considered (including AIJ (2004)) and classical MVR. In addition, a sensitivity analysis using RF indicated that the average SPT value along the pile shaft Ns, the pile tip elevation, and L2, L1, and Et had the most significant effect on the predicted bearing capacity of piles.
Overall, the RF algorithm, like many other machine learning algorithms, has an additional advantage over conventional methods: once the model is constructed, it can be used as an accurate and quick numerical tool for estimating the bearing capacity of piles. Because the performance of such a tool is crucial in foundation engineering, improving the prediction accuracy is one perspective of the present work, for instance by using hybrid ML algorithms or deep neural networks to predict the bearing capacity of piles.

Funding: This research received no external funding.

Conflicts of Interest: The authors declare no conflict of interest.
Appendix A
Figure A1. Load-settlement relationship for pile number P_511.
Figure A2. Load-settlement relationship for pile number P_140.