On the Potential of Using Random Forest Models to Estimate the Seismic Bearing Capacity of Strip Footings Positioned on the Crest of Geosynthetic-Reinforced Soil Structures

: Geosynthetic-reinforced soil structures are often used to support shallow foundations of various infrastructure systems including bridges, railways, and highways. When such infrastructures are located in seismic areas, their performance is linked to the seismic bearing capacity of the foundation. Various approaches can be used to calculate this quantity such as analytical solutions and advanced numerical models. Building upon a robust upper bound limit analysis, we created a database comprising 732 samples. The database was then used to train and test a model based on a random forest machine learning algorithm. The trained random forest model was used to develop a publicly available web application that can be readily used by researchers and practitioners. The model considers the following input factors: (1) the ratio of the distance of the foundation from the edge and the width of the foundation (D/B), (2) the slope angle ( β ), (3) the horizontal seismic intensity coefﬁcient (k h ), and (4) the dimensionless geosynthetic factor, which accounts for the tensile strength of the geosynthetic. Leveraging the model developed in this study, we show that the most important features to predict the seismic bearing capacity of strip footings positioned on the crest of geosynthetic-reinforced soil structures are D/B and k h .


Introduction
Shallow foundations are sometimes built on geosynthetic-reinforced soil structures, such as mechanically stabilized earth and artificial slopes.These structures are often used to support bridges, electrical transmission towers, railways, and highways.One of the benefits of using geosynthetic reinforcements when designing these structures is that they provide a substantial improvement to the foundation's bearing capacity under static and seismic loadings.
The complex task of predicting the bearing capacity of shallow foundations lying on geosynthetic-reinforced soils was recently analyzed using AI techniques [30][31][32][33][34].Other studies focused on analyzing the bearing capacity of shallow foundations on natural slopes using machine learning (ML) techniques [35][36][37].However, none of the existing studies focused on the application of ML methods for the prediction of the seismic bearing capacity of shallow foundations positioned on the crest of geosyntethic-reinforced soil structures.Ausilio et al. [38] recently proposed a simple mathematical expression to calculate the seismic bearing capacity of shallow foundations located close the crest of geosyntethicreinforced soil structures.This equation can be efficiently implemented as part of a broader performance-based design approach.The proposed expression is valid for both static and seismic conditions.However, this equation, by itself, does not allow users to gain insights into the hierarchy of the importance of input parameters, preventing from performing datainformed budget allocation and prioritization analyses.Furthermore, such equation is static in the sense that as more data become available, it cannot be directly modified, but it needs alternate and/or updated functional forms and/or equation coefficients.ML approaches solve both issues, allowing for robust feature importance analyses and generating dynamic solutions which do not rely upon a fixed functional form and equation coefficients but can be seamlessly modified as more data become available.
In this study, for the first time, an ML algorithm is utilized for predicting the seismic bearing capacity of shallow foundations located close the crest of geosyntethic-reinforced soil structures.The random forest algorithm was selected and used because it is very versatile and is one the most employed in geotechnical engineering applications [39][40][41][42].The algorithm is based on a database, generated as part of this study, developed using a robust upper-bound limit analysis.In addition to proposing a novel algorithm, we also investigate the importance of key input features, unveiling the most important factors for such problems.

Dataset
Limit analysis approaches are often used to analyze the bearing capacity of foundations built on natural slopes [43,44].The dataset used in this study was developed using a robust upper-bound limit analyses to calculate the seismic bearing capacity of strip footings placed close to the crest of geosynthetic-reinforced structures, developed by Ausilio [18].This method uses the kinematic theorem of the plasticity theory to evaluate the seismic bearing capacity of a strip footing of width B on a geosynthetic reinforced soil slope with angle β at a distance D from the edge.Figure 1 shows the schematic representation of the problem.In this approach, seismic actions are considered as pseudo-static equivalent horizontal and vertical forces acting on both the foundation and the soil underneath it.The kinematically admissible mechanism considered in this approach is characterized by a log-spiral failure surface [45], which is assumed to pass through the right edge of the strip footing on the surface of the reinforced slope with r 0 , which is the radius at initial angle θ 0 , while θ h is the final log-spiral angle (Figure 1).
This approach uses three parameters to characterize the soil portion of the system: (1) γ (soil unit weight), (2) c (cohesion), and (3) ϕ (friction angle).In this approach, a uniformly placed geosynthetic reinforcement is considered with average tensile strength per unit cross-section (k t = T/d) equal to the tensile strength of a single reinforcement layer per unit width (T) divided by the vertical distance between the layers of the reinforcement layers (d).The solution (q) is then obtained by equating the rate of external work to the rate of energy of dissipation: Geosciences 2023, 13, 317 3 of 11 where functions f 1 -f 6 are dependent on ϕ, β, θ 0 , and θ h .Equation (1) represents a lowerbound solution for the ultimate bearing capacity of a footing placed at the crest of a reinforced slope considering a log-spiral failure mechanism.The best estimate of q should then be derived using Equation ( 1) by minimizing it with respect to θ 0 and θ h .Once these angles are found, the limit load is calculated, substituting these values into Equation (1) [18].
This approach uses three parameters to characterize the soil portion of the system: (1) γ (soil unit weight), (2) c (cohesion), and (3) φ (friction angle).In this approach, a uniformly placed geosynthetic reinforcement is considered with average tensile strength per unit cross-section (kt = T/d) equal to the tensile strength of a single reinforcement layer per unit width (T) divided by the vertical distance between the layers of the reinforcement layers (d).The solution (q) is then obtained by equating the rate of external work to the rate of energy of dissipation: where functions f1-f6 are dependent on φ, β, θo, and θh.Equation (1) represents a lowerbound solution for the ultimate bearing capacity of a footing placed at the crest of a reinforced slope considering a log-spiral failure mechanism.The best estimate of q should then be derived using Equation 1 by minimizing it with respect to θo and θh.Once these angles are found, the limit load is calculated, substituting these values into Equation (1) [18].
The dataset developed and used in this study uses the same input parameters presented by Ausilio et al. [34] and comprises: γ, φ, T, d, B, D, β, and the horizontal seismic intensity coefficient (kh).Rather than directly using the direct value of these input parameters, we conveniently adopted the following three dimensionless factors: (1) the dimensionless edge distance (D/B), (2) the geosynthetic factor (γB/kt), (3) the horizontal seismic coefficient (kh), along with the slope angle (β).The resulting values in terms of bearing capacity were also transformed into a dimensionless form (q/γB).The resulting dataset comprises a total of 732 data points.The range of variables used in the model is presented in Table 1.In this study, we decided to keep φ constant and equal to 35°.This choice was made because in most real applications backfills comprise sandy materials characterized The dataset developed and used in this study uses the same input parameters presented by Ausilio et al. [34] and comprises: γ, ϕ, T, d, B, D, β, and the horizontal seismic intensity coefficient (k h ).Rather than directly using the direct value of these input parameters, we conveniently adopted the following three dimensionless factors: (1) the dimensionless edge distance (D/B), (2) the geosynthetic factor (γB/k t ), (3) the horizontal seismic coefficient (k h ), along with the slope angle (β).The resulting values in terms of bearing capacity were also transformed into a dimensionless form (q/γB).The resulting dataset comprises a total of 732 data points.The range of variables used in the model is presented in Table 1.In this study, we decided to keep ϕ constant and equal to 35 • .This choice was made because in most real applications backfills comprise sandy materials characterized by values that are often equal or close to this value.Figure 2 shows histograms of these input parameters, providing a global overview of the parameter space that will be used to construct the prediction model.parameter space.Values of γB/kt were chosen to identify the role of strong (0.4) and weak (2.226) reinforcements.These two values were selected for standard values of T/d (tensile strength divided by the between-reinforcement distance).Other values were added with a random distribution to fill the parameter space gap between these two values.Values of D/B were initially selected to sample the parameter space homogeneously.However, as this parameter goes up, the number of slopes converging to a solution goes down.As a result, the final distribution of D/B roughly decreases as D/B increases.The total number of data points (732) was obtained considering various combinations of each individual input parameter.As a result, the distributions of such dimensionless parameters are somewhat sparse and do not follow any specific probability density function.Some background on the rationale used to build such dataset is provided below.Half of the processing was completed for static conditions (k h = 0).As a result, the k h = 0 bar in Figure 2c represents half of the dataset.The remaining portion of the analysis is representative of pseudo-static conditions.In such cases, most of the processing was completed for k h = 0.2, which was selected as a representative value of the conditions often encountered in professional applications in active seismic zones for typical earthquake return periods.The calculations for the remaining k h values (other than 0 and 0.2) were performed to obtain a balanced dataset among them.However, when the kh increases, the number of slopes converging to a solution decreases.As a result, for higher k h values, the dataset has a smaller number of data points.The slope angle distribution follows a log-normal distribution that was built around four discrete values: (1) 30 • , (2) 45 • , (3) 60 • , and (4) 75 • .Such distribution was defined to have a modal value equal to 45 • .Values between these angles were added to have a sufficient quantity of data points in gap zones of the parameter space.Values of γB/k t were chosen to identify the role of strong (0.4) and weak (2.226) reinforcements.These two values were selected for standard values of T/d (tensile strength divided by the between-reinforcement distance).Other values were added with a random distribution to fill the parameter space gap between these two values.Values of D/B were initially selected to sample the parameter space homogeneously.However, as this parameter goes up, the number of slopes converging to a solution goes down.As a result, the final distribution of D/B roughly decreases as D/B increases.

Random Forest Model
As mentioned earlier, the use of ML algorithms is becoming increasingly popular in geotechnical earthquake engineering applications.ML can be used in regression problems to predict continuous response value.In general, each variable is called a feature, while the prediction is called a target.In a tree-based method, such as the random forest (RF) model used in this study, the prediction is made based on the division of the entire domain into smaller regions, each able to capture different relationships among features.This method develops multiple decision trees using different parts of the datasets and/or a subset of features.The combination of all trees, taken as the mean among them and which forms the so-called "forest", defines the final prediction model.A proper validation of an ML model requires the subdivision of the dataset into train and test data.The latter is used after the train phase to evaluate the performance of the model.This performance analysis is performed using data that were not used to train the algorithm.In this study, the size of the test data is 20% of the whole dataset.In addition to this division, cross validation is performed during the training of the model.This automated procedure randomly subdivides the dataset into (k) smaller sets, and it uses (k − 1) sets to train the model and one set to validate it.In this study, this procedure is repeated k = 10 times during the training of the model.RF algorithms are also often used to identify the importance of each feature based on their predictive power.When using ML models, a hyperparameter optimization algorithm is used to set the set of model parameters.The variables considered are: (i) maximum depth of each tree, (ii) number of estimators (trees), (iii) maximum number of features considered for node splitting, and (iv) the function to measure the quality of a dataset split (criterion).In this study, the overall performance of the RF model is evaluated by means of a residuals analysis.Residuals are computed as the difference of the natural log of q/γB from the analytical procedure and that predicted using the RF model.

Results
This section presents the results of the application of the RF model presented in the previous section.The RF model was trained and tested using the dataset summarized in the "Dataset" section, with the input parameters shown in Table 1 and Figure 2. The resulting RF model is published online and available in the DesignSafe [46] online data repository [47].The average cross validation score (which is a measure of the reliability of the RF model) is 0.992.The training score (which describes how well the RF model predicts the target in the training phase) is equal to 0.998.The testing score (which describes how well the RF model predicts the target using the portion of the dataset excluded from the training phase) is equal to 0.988.The overall score is 0.997.All of these scores are very high, suggesting that the model works well in the parameter space covered by the data in the used dataset.Another desirable feature of this model is that the cross validation, train, and test scores are similar.This is proof that there is no model overfitting (a high train score with a low test score would be present in case of model overfitting).
Figure 3 shows an ensemble representation of predictions made using the developed RF model (crosses) and data points calculated using Equation (1) (circles).Figure 3 shows both train and test data points used in the development of the model.A simple visual inspection of Figure 3 suggests that predictions made using the RF model are consistent with the target values.However, to better inspect the data and identify potential trends, which would suggest model prediction biases, we also performed a residuals analysis.As mentioned previously, residuals were computed as the difference of the natural log of q/γB from the analytical procedure and that predicted using the RF model and are shown in Figure 4.The analysis of Figure 4 confirms that the RF model is able to faithfully reproduce the target results.This is evident by looking at binned means and standard deviation values.The former do not have a persistent trend, although they tend to be more negative as γB/k t increases.The latter show a low dispersion that is generally higher for γB/k t values greater than 1.5.Based on this quantitative analysis, it is possible to conclude that the presented RF model robustly captures the physical behavior of the system being analyzed and that it can be confidently used for forward predictions analyses.
in Figure 4.The analysis of Figure 4 confirms that the RF model is able to faithfully reproduce the target results.This is evident by looking at binned means and standard deviation values.The former do not have a persistent trend, although they tend to be more negative as γB/kt increases.The latter show a low dispersion that is generally higher for γB/kt values greater than 1.5.Based on this quantitative analysis, it is possible to conclude that the presented RF model robustly captures the physical behavior of the system being analyzed and that it can be confidently used for forward predictions analyses.Comparison between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio [18].
duce the target results.This is evident by looking at binned means and standard deviation values.The former do not have a persistent trend, although they tend to be more negative as γB/kt increases.The latter show a low dispersion that is generally higher for γB/kt values greater than 1.5.Based on this quantitative analysis, it is possible to conclude that the presented RF model robustly captures the physical behavior of the system being analyzed and that it can be confidently used for forward predictions analyses.Another important outcome of this study, which confirms the validity of using ML models to analyze these types of problems, is the feature importance analysis.This analysis presents the percentage importance of each model feature.These percentages are based on the mean hierarchal position of each input feature in all of the decision trees comprising the final model (i.e., the "forest").Such analysis is very useful in the design phase of engineering structures such as those analyzed in this study.Furthermore, they can support funding allocation assessments that can be based on the importance of each analyzed feature.Figure 5 presents the results of the RF-based feature importance analysis performed in this study.The dimensionless ratio of the distance of the foundation from the edge and the width of the foundation is the most important input factor, weighing almost half of the total feature percentage value.The second feature by importance is k h , which represents the amplitude of the earthquake shaking.Interestingly, the dimensionless geosynthetic factor is only ranked third.However, this feature still has a relatively high importance (almost 15%).The fourth and least important feature is the slope of the reinforced structure, which only accounts for less than 5% of the total importance.This last outcome may seem counterintuitive as this parameter, when dealing with foundations on natural slopes, significantly influences the bearing capacity of a foundation at the crest of a slope.This is particularly true for foundations built adjacent to the slope (D = 0), in static conditions and on natural slopes (e.g., [44]).However, the case analyzed in this paper considers completely different conditions (distance from the edge of the slope, seismic conditions, and presence of stabilizing forces represented by geosynthetics) from those typically analyzed when looking at natural slopes.Therefore, under these conditions (this is no longer a natural slope at a fixed distance from the crest, but rather a reinforced slope by means of geosynthetics with variable distance from the crest), the slope angle becomes less influential when analyzing the bearing capacity of a shallow foundation positioned on its crest.These results are very important as they indicate that, when designing these structures, their slope is almost nonimportant on the seismic bearing capacity.Given that D/B is usually a given quantity (the footing location with respect to the edge of the reinforced structure is usually fixed), when allocating funding in the design phase, a stronger emphasis should be placed on the seismic hazard characterization (which relates to k h ) and to the reinforcement characteristics.
based on the mean hierarchal position of each input feature in all of the decision trees comprising the final model (i.e., the "forest").Such analysis is very useful in the design phase of engineering structures such as those analyzed in this study.Furthermore, they can support funding allocation assessments that can be based on the importance of each analyzed feature.Figure 5 presents the results of the RF-based feature importance analysis performed in this study.The dimensionless ratio of the distance of the foundation from the edge and the width of the foundation is the most important input factor, weighing almost half of the total feature percentage value.The second feature by importance is kh, which represents the amplitude of the earthquake shaking.Interestingly, the dimensionless geosynthetic factor is only ranked third.However, this feature still has a relatively high importance (almost 15%).The fourth and least important feature is the slope of the reinforced structure, which only accounts for less than 5% of the total importance.This last outcome may seem counterintuitive as this parameter, when dealing with foundations on natural slopes, significantly influences the bearing capacity of a foundation at the crest of a slope.This is particularly true for foundations built adjacent to the slope (D = 0), in static conditions and on natural slopes (e.g., [44]).However, the case analyzed in this paper considers completely different conditions (distance from the edge of the slope, seismic conditions, and presence of stabilizing forces represented by geosynthetics) from those typically analyzed when looking at natural slopes.Therefore, under these conditions (this is no longer a natural slope at a fixed distance from the crest, but rather a reinforced slope by means of geosynthetics with variable distance from the crest), the slope angle becomes less influential when analyzing the bearing capacity of a shallow foundation positioned on its crest.These results are very important as they indicate that, when designing these structures, their slope is almost non-important on the seismic bearing capacity.Given that D/B is usually a given quantity (the footing location with respect to the edge of the reinforced structure is usually fixed), when allocating funding in the design phase, a stronger emphasis should be placed on the seismic hazard characterization (which relates to kh) and to the reinforcement characteristics.

Use of the Cloud-Based Prediction Tool and Example Case Study
This section illustrates where to find and how to use the presented model within robust engineering frameworks.The RF-based prediction tool developed and presented in this study is publicly available in the cloud on the cyber-infrastructure DesignSafe [46,47] capacity value of 511 kPa.This solution matches the result obtained using the Ausilio [18] solution.This value is obtained instantaneously after entering the input parameter and is shown as output in the Jupyter Notebook, as well as a file contained in the DesignSafe project folder, called "Output.txt."This auto-generated file contains all input parameters, as well as the final result.It also provides a log of user's actions and any error messages that may arise from wrong use of the tool (including dimensionless input parameters out of the parameters space).
The proposed algorithm, as implemented in the cloud [47], can be readily used to define the bearing capacity (q) value corresponding to any shallow foundation positioned near the crest of a slope reinforced by means of geosynthetics in static and seismic conditions.However, engineers may want to solve this problem within a performance-based design framework.This is a viable approach when, for example, the seismic coefficient is very high (e.g., for long return periods, corresponding to more intense design ground motions).In these cases, it may be more reasonable to accept that the structure being designed would suffer an acceptable level of permanent displacements.The proposed model can be used in such a framework too.To do so, users will need to first define a target bearing capacity value (q).Then, the seismic yield coefficient (k y ) will need to be calculated by iterations.Alternatively, k y can be calculated using the solution of Ausilio et al. [38].Armed with k y and one or more scenario earthquake time series, users can calculate the corresponding permanent displacement by means of the conventional Newmark's sliding block procedure (or surrogate models).If the resulting permanent displacement is higher than a tolerable threshold (which has to be pre-defined based on the importance/type of the structure and/or infrastructure system being analyzed), then a higher value of the seismic bearing capacity can be obtained by changing design inputs such as the tensile strength of the reinforcements.

Discussion and Conclusions
This paper investigates the potential of using RF techniques in predicting the seismic bearing capacity of a strip footing resting on geosynthetic-reinforced soil structures.It also provides a critical view on the most important parameters affecting this engineering problem.Finally, it presents a robust engineering workflow on how to use the presented model, using a direct approach and within a performance-based design framework.The RF model presented in this study is based on data obtained using a robust limit analysis approach [18].The resulting dataset comprises more than 700 data points, sampling a wide range of typical input parameters.The final model developed in this study relies upon four composite input parameters: (1) D/B, (2) β, (3) k h , and (4) γB/k t .These parameters were used in legacy models (e.g., [18]).
The main outcomes of this study are summarized and discussed in the remainder of this section.The presented RF algorithm works well in predicting the ultimate seismic bearing capacity of the strip footing resting on geosynthetic-reinforced soil structures.The model developed in this study is publicly available [47] and can be readily used in future forward prediction analyses.Such analyses can be performed by directly calculating the seismic bearing capacity of the analyzed system (i.e., directly using the cloud-based implementation of the presented RF-based prediction model [47]), or by means of a rationale performance based-design approach framework, which leverages the model presented in this paper.The latter would need to be performed combining the RF model developed in this study with the following items: (1) one or more scenario earthquake time series and (2) a model to calculate earthquake-induced permanent displacements (e.g., Newmark-type sliding block procedures or surrogate models).
By performing an RF-based feature importance analysis, the following outcomes, on the relative importance of all input parameters used in this study, were obtained.The dimensionless edge distance, the horizontal seismic intensity coefficient, and the dimensionless geosynthetic factor play a major role in predicting the ultimate seismic bearing capacity of strip footing located close the crest of geosynthetic-reinforced soil

Figure 1 .
Figure1.Geometry of the reinforced soil structure with the log-spiral failure surface (adapted from[18]).

Figure 2 .
Figure 2. Histograms showing the distributions of the input parameters comprising the presented dataset: (a) D/B, (b) β, (c) k h , and (d) γB/k t .

Figure 3 .
Figure 3.Comparison between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].

Figure 4 .
Figure 4.Residuals between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].Black triangles show binned means, pink-shaded areas represent means ± a standard deviation.

Figure 3 .
Figure 3.Comparison between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].

Figure 3 .
Figure 3.Comparison between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].

Figure 4 .
Figure 4.Residuals between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].Black triangles show binned means, pink-shaded areas represent means ± a standard deviation.

Figure 4 .
Figure 4.Residuals between predictions made using the RF model presented in this study and results obtained using the analytical procedure of Ausilio[18].Black triangles show binned means, pink-shaded areas represent means ± a standard deviation.

Figure 5 .
Figure 5. Bar chart reporting feature importance of all input parameters used in this study.

Figure 5 .
Figure 5. Bar chart reporting feature importance of all input parameters used in this study.
. The tool was implemented as an interactive Jupyter Notebook.Its interface is intuitive and can be ran by simply running the code from the Jupyter Notebook console.The following input information are necessary to run the algorithm: β, D, B, γ, k t , and k h .To further verify the proposed model, using the cloud-based tool available on DesignSafe, a case study previously presented by Ausilio et al. [38] is analyzed and discussed.The input parameters used in this case study are: β = 45 • , D = 6 m, B = 2 m, γ = 17 kN/m 3 , k t = 34 kPa, and k h = 0.2.They produce the following dimensionless model input parameters: D/B = 3 and γB/k t = 1.The RF model, as implemented in the cloud-based tool, provides a final bearing

Table 1 .
List and ranges of input parameters used in this study.