Probabilistic Design of Retaining Wall Using Machine Learning Methods

Abstract: Retaining walls are geostructures that provide permanent lateral support to vertical slopes of soil, and analyzing the failure probability of such structures is essential. To keep geotechnical practice on par with advances in technology, artificial intelligence techniques are applied here to the reliability analysis of the structure. Designing the structure based on its probability of failure leads to an economical design. The machine learning models used to predict the factor of safety of the wall are the Emotional Neural Network, Multivariate Adaptive Regression Spline, and SOS-LSSVM. The First-Order Second Moment method is used to calculate the reliability index of the wall. The models are assessed on the results they produce, and the best of them is identified for extensive field study in the future. The overall performance evaluation, based on several accuracy measures, determined SOS-LSSVM to be the best model. The results show that the reliability index calculated by the AI methods differs from the reference values by less than 2%. These methodologies simplify the problem while increasing the precision of the result. Artificial intelligence has removed cumbersome calculations in many fields and disciplines. The techniques used in this study are evolved versions of older algorithms. This work aims to clarify the probabilistic approach to designing structures, using artificial intelligence to simplify practical evaluations.


Introduction
Forecasting the failure of geotechnical structures and identifying remedial measures to avoid those failures is a primary concern of researchers today. In geotechnical practice, retaining walls have been used for decades to mitigate excessive movements near deep excavation projects and to prevent subsequent damage to neighboring buildings and infrastructure. Slopes close to deep excavations may experience failures such as rotational, translational, wedge, and compound failures, flows, and spreads. Therefore, retaining wall design is a crucial research area for geotechnical engineers. To quantify the margin against failure, the Factor of Safety (FOS), the ratio of the resisting force to the driving force, is calculated for different criteria such as overturning, sliding, and bearing capacity failure [1]. In this regard, many scholars have conducted reliability analyses on cantilever walls [2][3][4].
Incorporating Artificial Intelligence (AI) into civil engineering has required researchers and engineers from different fields to join efforts in solving practical problems. Geotechnical researchers have applied Artificial Neural Networks (ANNs) to different aspects of these problems, including computing algorithms, reliability methods, and structural design. The application of neural networks to the design of retaining structures dates back to 2005, when Goh and Kulhawy [5] studied the reliability of retaining wall performance using a neural network approach. Chen et al. [3] explored the prediction of safety factor values of retaining walls, as major resistance systems for ground forces, through AI techniques. A probabilistic model was proposed in [6] for estimating failure probabilities during excavations using observation evidence; the authors developed a Bayesian network and distance-based Bayesian model updating, using an ANN to create the response surface relationship. ANNs have thus been widely adopted for structural and other geotechnical problems, but the drawbacks of the technique have led to the development of many other algorithms.
Neural networks (NNs) have been studied extensively, and their cumbersome nature and other drawbacks have led to advanced and modified techniques [7,8]. Three of these techniques are employed in this paper. Each of the three models is trained with 10-fold cross-validation to obtain an optimal model for each algorithm, and the three optimal models are then compared [9]. Incorporating emotional parameters such as anxiety and confidence into a neural network yields the EmNN [10]. This methodology has been applied in the literature to problems such as determining the compressive strength of concrete and modeling rainfall-runoff. MARS is a multivariate technique that extracts complex mappings from high-dimensional data and creates a simple, easy-to-understand model [11,12]. Its major applications in civil engineering include modeling doweled pavement performance, estimating the deformation of asphalt mixtures, and determining the undrained shear strength of clay [13,14]. The hybrid technique SOS-LSSVM comprises two algorithms: an LSSVM for learning and an SOS for optimizing the learning process [15,16].
In this study, the reliability of a cantilever retaining wall is evaluated based on the sliding failure criterion. Several soil parameters enter the design of the wall, such as cohesion, angle of shearing resistance, angle of wall friction, and unit weight of the soil. Retaining wall design is typically based on deterministic formulations that do not consider the natural variability of geotechnical parameters. Because of the inherent uncertainty of soil properties, there is an increasing trend in geotechnical engineering to adopt reliability-based design to mitigate construction uncertainties. The four parameters mentioned above are taken as characteristic variables in this study, and three machine learning algorithms, namely the Emotional Neural Network (EmNN), Multivariate Adaptive Regression Spline (MARS), and Symbiotic Organisms Search-Least Square Support Vector Machine (SOS-LSSVM), are used to model the retaining wall. Reliability analysis measures the ability of a structure to meet its requirements over a specified period [17]. A reliability index, from which the failure probability of the structure follows, is calculated using the First-Order Second Moment (FOSM) method, and the models are then assessed based on their prediction capability, reliability index, adaptability, learning, etc. [18]. There are three types of probabilistic uncertainty analysis methodologies [19], namely (i) analytical solutions, (ii) approximation approaches (e.g., First-Order Second Moment), and (iii) numerical approaches (e.g., Monte Carlo simulation). The FOSM approach is one of the most extensively used in civil engineering applications because of its simplicity. The name FOSM refers to the fact that it employs the first-order terms of the Taylor series expansion about the mean value of each input variable and requires moments of the random variables only up to second order.
Many researchers described the detailed methodologies of FOSM, and the reader is referred to them; for instance, see [18,20,21].
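As an illustration of the FOSM idea described above, the following minimal Python sketch linearizes a performance function about the mean values and propagates the input variances. It assumes independent inputs, a limit state of FOS = 1, and the common definition β = (μ_FOS − 1)/σ_FOS with Pf = Φ(−β); the function name `fosm_beta`, the finite-difference step, and the toy linear FOS function are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def fosm_beta(g, means, stds, limit=1.0, rel_step=1e-4):
    """First-order second-moment estimate of the reliability index.

    g     : callable returning the factor of safety for a list of inputs
    means : mean value of each input variable
    stds  : standard deviation of each input variable (assumed independent)
    limit : value of g at the limit state (FOS = 1 is assumed here)
    """
    mu_g = g(means)  # first-order estimate of the mean of g
    var_g = 0.0
    for i, (m, s) in enumerate(zip(means, stds)):
        h = rel_step * abs(m) if m != 0 else rel_step
        up = list(means)
        dn = list(means)
        up[i] = m + h
        dn[i] = m - h
        dg_dx = (g(up) - g(dn)) / (2.0 * h)  # central-difference gradient
        var_g += (dg_dx * s) ** 2            # first-order variance term
    beta = (mu_g - limit) / var_g ** 0.5
    pf = NormalDist().cdf(-beta)             # failure probability Pf = Phi(-beta)
    return beta, pf


# Toy, purely illustrative FOS function of two inputs (not the wall model).
def toy_fos(x):
    return 0.05 * x[0] + 0.02 * x[1]


beta, pf = fosm_beta(toy_fos, means=[20.0, 30.0], stds=[4.0, 4.5])
print(f"beta = {beta:.3f}, Pf = {pf:.5f}")
```

In the setting of this paper, `g` would be replaced by the trained surrogate model (or the analytical FOS expression), and the means and standard deviations by those of c, φ, δ, and γ.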
This combination of civil engineering and artificial intelligence has led to an interdisciplinary approach in which complex design problems are modeled. Various civil engineering sectors have adopted artificial intelligence algorithms, most commonly neural networks, for robust calculation and modeling. Considering the intrinsic uncertainties in geomaterials, their behavior, and their interaction with other structural elements, performing probabilistic analysis in this field is essential. With the emergence of soft computing in geotechnical engineering, simulation models and the constitutive models involved have become progressively more comprehensive. However, stochastic analyses require a large number of model evaluations, which may make probabilistic analysis computationally unaffordable for practitioners. AI methods such as NNs can be used in design, monitoring, and safety analysis as surrogates for expensive computational models, at lower computational effort. Therefore, a well-trained and validated AI model makes it possible for practicing engineers to perform probabilistic analysis and evaluate the reliability of structures. This paper presents a retaining wall safety analysis, as a case study, using neural networks, support vector machines, and adaptive regression splines. A reliability index is calculated using FOSM to measure the safety of the structure over a given time period. The paper covers the probabilistic design and reliability analysis of a cantilever retaining wall using 10-fold cross-validation. Furthermore, a comprehensive performance assessment is carried out to determine which AI model performs best. The details are presented so that readers can understand the connection between AI and geotechnical engineering and relate their applications.

Emotional Neural Network
Humans have consistently outperformed other living species in decision making. Accounts of emotion emphasize either the external stimuli that trigger an emotion or the internal responses involved in the emotional state [22]. The decision-making process, cognition, and learning in animals and humans have been studied by many researchers [23][24][25][26]. Machine learning has given many fields a new direction, and it is evolving rapidly. Incorporating emotional skills into a neural network leads to the EmNN. The emotional neural network consists of emotional neurons, two emotional parameters (anxiety and confidence), and emotional weights. This simulated emotional behavior helps increase the learning and decision-making capability of a model. The emotional factors are updated during the learning phase, and the emotional weights are used along with the conventional weights to make decisions. The learning algorithm employed in the EmNN is Emotional Back Propagation (EmBP) [27]. In the EmNN, the two emotional neurons are non-processing neurons that receive global average values of the input pattern rather than its individual elements (pixels or segments), as conventional neurons do. In addition, the emotional weights are updated using the two emotional coefficients rather than the conventional learning or momentum rates. Anxiety and confidence, being two of the most crucial decision-making parameters, have been incorporated in the model. The rationale is to recognize a pattern based on the general impression in addition to the precise details of the subject. In general, when a person starts to learn, anxiety is high and confidence is low; after sufficient practice, confidence increases and anxiety becomes almost negligible. EmNN employs this concept in the learning process.
In practice, as the epochs progress and the network is trained, anxiety tells the system to pay less attention to the derivative of the error and to use the average value of the training pattern across all nodes. The other term, confidence, lets the system pay more and more attention to the change in the weights as the training epochs progress; it acts as an increasing inertia term that moderates the change from one pattern to another. EmNN employs EmBP as its learning algorithm, which is an evolved version of conventional Back Propagation (BP). BP has been widely used in neural networks since the work of Rumelhart et al. [28]. BP became popular because of its simplicity of implementation and its quick training when adequate training datasets are available. The reader is referred to [29] for more details about EmNN.

Symbiotic Organisms Search-Least Square Support Vector Machine
In different disciplines, AI approaches have outperformed conventional approaches, and their robustness has advanced research. AI has improved the way of learning. SOS-LSSVM is a hybrid of two computational techniques. In this complementary system, LSSVM operates as a supervised learning-based predictor that establishes an accurate input-output relationship for the dataset, while SOS works to improve the LSSVM parameters [30,31]. In this study, the predictive tool LSSVM and the metaheuristic optimization algorithm SOS are integrated. In addition, cross-validation is employed to validate the training and testing process. Several performance parameters are computed to compare the results and their validity. To describe the method in detail, the LSSVM predictor and the nature-inspired metaheuristic algorithm SOS are explained in the following.

LSSVM
SVM is one of the most widely used predictive models, and researchers have modified it into the LSSVM. SVM solves a quadratic programming problem, whereas LSSVM uses a least-squares loss function that leads to a linear system [32]. The algorithm is based on statistical learning theory, and it differs from SVM in that it uses equality constraints together with the least-squares cost function. In the optimization formulation, w is a vector of undetermined parameters, φ(·) is a non-linear function mapping the input space to a high-dimensional feature space, e_k ∈ R are error variables, and γ is a regularization constant that is always greater than zero.
Equation (3) gives the function estimate of the LSSVM model, where α_k and b are the solution to the linear system in Equation (4). The kernel function used here is the Radial Basis Function (RBF), where σ is the kernel parameter. Both γ and σ must be specified appropriately for the model to give good predictions.
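To make the role of α_k, b, γ, and σ concrete, the following minimal Python sketch fits an LSSVM regressor by solving the standard dual linear system [[0, 1ᵀ], [1, Ω + I/γ]] [b; α] = [0; y] with Ω_ij = K(x_i, x_j), and predicts with y(x) = Σ α_k K(x, x_k) + b. This is the textbook LSSVM formulation and is assumed, not confirmed, to match the equations of this paper; in particular, the RBF form exp(−‖x − x_k‖²/(2σ²)) is an assumption (some implementations divide by σ² instead), and all function names and the toy data are hypothetical.

```python
import math


def rbf(x, z, sigma):
    """RBF kernel; the 2*sigma^2 scaling is an assumed convention."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Return (b, alpha) from the LSSVM dual linear system."""
    n = len(X)
    A = [[0.0] + [1.0] * n]  # first row enforces sum(alpha) = 0
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # regularization on the diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(x, X, alpha, b, sigma):
    """Function estimate y(x) = sum_k alpha_k K(x, x_k) + b."""
    return sum(a * rbf(x, xk, sigma) for a, xk in zip(alpha, X)) + b


# Toy one-dimensional data, purely illustrative.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
b, alpha = lssvm_fit(X_train, y_train, gamma=1e6, sigma=1.0)
print(lssvm_predict([1.5], X_train, alpha, b, sigma=1.0))
```

A large γ makes the model follow the training data closely, while a small γ smooths the fit; σ controls how local the kernel is. This is why both parameters need tuning.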

Symbiotic Organisms Search (SOS)
Symbiotic organisms search is a relatively new metaheuristic algorithm [31,33,34]. It uses symbiotic interactions to move a population of candidate solutions, i.e., the ecosystem of organisms, iteratively toward more promising areas of the search space while searching for the global optimum. Each organism in the ecosystem has a fitness value, which reflects its degree of adaptation to the desired objective. The steps involved in the SOS algorithm are as follows:
• Initialization of the ecosystem.
• Repeat the mutualism phase, the commensalism phase, and the parasitism phase.
• Until the stopping criterion is satisfied.
Researchers have compared SOS with other metaheuristic techniques such as the Genetic Algorithm, Particle Swarm Optimization, and Differential Evolution, and they concluded that SOS performs better than these techniques, being more effective and efficient. In addition, SOS has been applied in many research fields to solve engineering and other problems [15,35,36].
SOS-LSSVM is a hybrid AI technique that creates a symbiotic environment for input and output. LSSVM acts as a supervised learning-based predictor, and SOS optimizes the parameters used in the LSSVM algorithm, i.e., γ and σ.
The SOS-LSSVM technique is completed in eight steps grouped into two phases: an initialization and training phase followed by a testing phase [31].

1. Data are collected for training the model.
2. The LSSVM model is used to capture the uncertain relationship between input and output, with σ and γ as its tuning parameters.
3. SOS algorithm: This algorithm searches over combinations of the σ and γ parameters and identifies the best set of these two parameters. SOS employs the mutualism, commensalism, and parasitism phases to gradually improve the fitness value of the solutions.
4. Evaluation of fitness: A fitness function is developed that measures the accuracy of the learning system. The best combination of γ and σ gives the best fitness value. The dataset is not split randomly; it is divided into learning and validation subsets. In addition, to avoid sampling bias, 10-fold cross-validation is used. The fitness function uses the mean square error (MSE).
5. Criterion of termination: The termination criterion used in this technique is the number of iterations set in the SOS algorithm.
6. Optimal σ and γ parameters: The loop stops, and the optimal σ and γ parameters are obtained.
7. The optimal set of σ and γ parameters is then used to develop the model for testing the data.
8. Data testing: The part of the dataset reserved for testing is evaluated, and the predictions are used to assess the performance and accuracy of the model.
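The search loop in steps 3 to 6 can be sketched as follows. This is a simplified, illustrative Python version of SOS for minimizing a generic fitness function within box bounds; it is not the authors' implementation. The ecosystem size, iteration count, the single-dimension parasite, and the toy quadratic fitness are all assumptions made for the example. In the SOS-LSSVM setting, each organism would be a (γ, σ) pair and the fitness would be the cross-validated MSE of the LSSVM.

```python
import random


def sos_minimize(fitness, bounds, eco_size=20, iterations=50, seed=0):
    """Simplified Symbiotic Organisms Search (minimization)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    eco = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(eco_size)]
    fit = [fitness(o) for o in eco]

    def other(i):
        """Pick a random organism index different from i."""
        j = rng.randrange(eco_size - 1)
        return j if j < i else j + 1

    def try_replace(i, cand):
        """Greedy selection: keep the candidate only if it is fitter."""
        cand = clip(cand)
        f = fitness(cand)
        if f < fit[i]:
            eco[i], fit[i] = cand, f

    for _ in range(iterations):  # termination: fixed iteration count
        for i in range(eco_size):
            best = eco[min(range(eco_size), key=fit.__getitem__)]

            # Mutualism: both organisms move using their mutual vector.
            j = other(i)
            mutual = [(a + b) / 2.0 for a, b in zip(eco[i], eco[j])]
            bf1, bf2 = rng.choice((1, 2)), rng.choice((1, 2))
            new_i = [a + rng.random() * (g - m * bf1)
                     for a, g, m in zip(eco[i], best, mutual)]
            new_j = [b + rng.random() * (g - m * bf2)
                     for b, g, m in zip(eco[j], best, mutual)]
            try_replace(i, new_i)
            try_replace(j, new_j)

            # Commensalism: organism i benefits from j; j is unaffected.
            j = other(i)
            new_i = [a + rng.uniform(-1.0, 1.0) * (g - b)
                     for a, g, b in zip(eco[i], best, eco[j])]
            try_replace(i, new_i)

            # Parasitism: a mutated copy of i tries to displace j
            # (only one dimension is mutated in this simplified sketch).
            j = other(i)
            parasite = eco[i][:]
            d = rng.randrange(dim)
            parasite[d] = rng.uniform(*bounds[d])
            p_fit = fitness(parasite)
            if p_fit < fit[j]:
                eco[j], fit[j] = parasite, p_fit

    k = min(range(eco_size), key=fit.__getitem__)
    return eco[k], fit[k]


# Toy fitness with a known minimum at (3, -1), purely illustrative.
def toy_fitness(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best_params, best_fit = sos_minimize(toy_fitness, [(-10.0, 10.0), (-10.0, 10.0)])
print(best_params, best_fit)
```

Because every replacement is greedy, the best fitness in the ecosystem never worsens from one iteration to the next, which is why a fixed iteration count is a workable stopping rule.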

Multivariate Adaptive Regression Splines
MARS is a non-parametric modeling technique that uses a data-driven approach [37]. In this study, it is used to explore the multidimensionality and intrinsic non-linearity of the data associated with the retaining wall. In this methodology, the training dataset is divided into linear segments (splines) of different gradients (slopes). No assumptions are made about the relationship between dependent and independent variables. A flexible MARS model results from the Basis Functions (BFs), which connect the splines, so the model can handle both the linear and the nonlinear behavior of the problem. The points that connect the pieces are called knots. A knot marks the end of one region and the beginning of another, and knots may lie anywhere in the range of the input variables. MARS searches over the interactions among the variables and over the candidate knots to generate BFs. An adaptive regression algorithm with two phases, forward and backward, is used to place the knots. In the forward phase, BFs are defined by placing knots through the adaptive regression technique. At each step, the model adds the knot and corresponding BFs that give the maximum reduction in the residual sum of squares. BFs are added until their number reaches the maximum limit, which results in a complex model. The backward phase then removes the redundant BFs. Jekabsons' open-source MARS code is used to perform the analysis in this paper [38].
After the optimal MARS model is obtained, it can be used to assess the relative importance of the parameters, based on the input variables and BFs.
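A MARS model is a weighted sum of basis functions, each built from hinge terms of the form max(0, x − t) or max(0, t − x) with knot t. The following Python sketch evaluates such a model once the knots and coefficients are known. It only illustrates the model form; the knots, coefficients, and variable indices below are hypothetical and are not the values reported in Table 3, and the forward and backward selection phases are not implemented.

```python
def hinge(x, knot, direction=1):
    """Hinge function: max(0, x - knot) if direction = +1,
    max(0, knot - x) if direction = -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a MARS model.

    terms: list of (coefficient, factors), where factors is a list of
    (variable_index, knot, direction); several factors in one term
    represent an interaction between variables.
    """
    total = intercept
    for coef, factors in terms:
        bf = 1.0
        for var, knot, direction in factors:
            bf *= hinge(x[var], knot, direction)
        total += coef * bf
    return total


# Hypothetical two-term model on inputs x = [c, phi].
example_terms = [
    (0.04, [(0, 20.0, 1)]),    # 0.04 * max(0, c - 20)
    (-0.02, [(1, 30.0, -1)]),  # -0.02 * max(0, 30 - phi)
]
fos_example = mars_predict([25.0, 32.0], intercept=1.5, terms=example_terms)
print(fos_example)
```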

Cross-Validation
Selecting a model and evaluating its performance play a crucial role in machine learning. Assessing a model as it stands and assessing it at the best of its capability are two different tasks. Researchers have proposed many methods for this purpose and applied them to different models. Cross-validation is widely used to select and evaluate models because of its simplicity and universality. K-fold cross-validation is a sample reuse methodology that allows many experiments with different algorithms to be run on a single dataset. In this research work, 10-fold cross-validation is employed for all three models, i.e., MARS, EmNN, and SOS-LSSVM. The dataset is split into ten sets, each with 90% training data and 10% testing data, which are then fed to all three algorithms. For each of MARS, EmNN, and SOS-LSSVM, the model with the minimum Mean Absolute Error (MAE) is chosen, and the three chosen models are then compared with each other using different measures and graphs. Following the methodology provided in [39], for each dataset and each performance index, a rank of three is assigned to the model with the best value and a rank of one to the model with the worst value. The overall performance rating of each model is then obtained by summing its ranks.
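The 10-fold splitting can be sketched in a few lines of Python. The helper name `k_fold_indices` and the fixed random seed are hypothetical; with the 80 samples used in this study, each fold yields 72 training and 8 testing samples, and every sample is used for testing exactly once.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)       # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


splits = list(k_fold_indices(80, k=10))
for fold_no, (train, test) in enumerate(splits, start=1):
    print(fold_no, len(train), len(test))
```

For each fold, a model would be trained on `train`, its MAE computed on `test`, and the fold with the lowest MAE retained as the optimal model for that algorithm.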

Case Example
Cantilever walls rely on their own weight to resist sliding and overturning, but they also benefit from the weight of the backfill above the heel of the wall. Reinforced concrete cantilever walls come in various geometries. They are much cheaper to erect than gravity walls, and they can be prefabricated and transferred to the site directly. Sliding, as one of the potential modes of collapse, must always be considered in reliability-based design. The minimum safety factor against sliding is taken as 1.5. The most significant component of the sliding force usually comes from the lateral earth pressure acting on the active (backfill) side of the wall. This force may be intensified by the presence of vertical or horizontal loads on the backfill surface. Figure 1 depicts the geometry of the cantilever wall as well as the major variables considered in this article. The safety factor against sliding can be expressed as
$$FS_{sliding} = \frac{\sum F_R}{\sum F_D}$$
in which F_R is the sum of the horizontal resisting forces, and F_D is the sum of the horizontal driving forces.
Here, ∑V is the sum of the vertical forces acting on the wall, and P_a and P_p are the Rankine active and passive pressures, respectively. The active and passive pressures are calculated from Rankine earth pressure theory. As illustrated in Figure 1, the soil profile behind the retaining wall has a slope of i = 20°, which is accounted for when calculating the soil mass. Moreover, c is the cohesion, φ is the angle of shearing resistance, and δ is the angle of wall friction (δ = 2/3 φ). To generate the datasets, the above formulation was scripted in MATLAB to evaluate the FOS for various combinations of the input variables.

Errors and Other Parameters
This study contributes to both AI and geotechnical engineering. It builds on newly developed machine learning techniques that are being employed in different research fields, from medicine to engineering. Cohesion (c), angle of shearing resistance (φ), angle of wall friction (δ), and unit weight (γ) were taken as the primary variables, each following a lognormal distribution. For this study of retaining walls, data were collected, and the Coefficient Of Variation (COV) values were taken from previous studies [39,40]. The COV values for φ and δ are taken as 15%, with mean values of 30° and 20°, respectively. The cohesion is assumed to have a mean value of 20 kPa. The fourth material parameter considered as a random variable is the unit weight of the soil mass γ, with a mean value of 20 kN/m³ and a COV of 8%. Afterwards, 80 datasets were generated randomly, of which 90% were used for training and 10% for testing.
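The random generation of the input sets can be sketched as follows. A lognormal variable with mean m and coefficient of variation V has underlying normal parameters σ_ln = sqrt(ln(1 + V²)) and μ_ln = ln(m) − σ_ln²/2. The Python sketch below uses the means and COVs stated above; the COV of cohesion is not given in the text, so the value 0.20 is a placeholder assumption, and the variables are sampled independently (consistent with the independence assumption of FOSM). The function names and the seed are hypothetical.

```python
import math
import random


def lognormal_params(mean, cov):
    """Convert mean and COV of a lognormal variable to (mu_ln, sigma_ln)."""
    s2 = math.log(1.0 + cov ** 2)
    return math.log(mean) - 0.5 * s2, math.sqrt(s2)


def sample_inputs(n, spec, seed=0):
    """Draw n independent lognormal samples for every variable in spec."""
    rng = random.Random(seed)
    params = {name: lognormal_params(m, v) for name, (m, v) in spec.items()}
    return [
        {name: rng.lognormvariate(mu, s) for name, (mu, s) in params.items()}
        for _ in range(n)
    ]


# (mean, COV) per variable; the COV of c is an assumed placeholder.
SPEC = {
    "c": (20.0, 0.20),      # cohesion, kPa (COV assumed, not from the paper)
    "phi": (30.0, 0.15),    # angle of shearing resistance, degrees
    "delta": (20.0, 0.15),  # angle of wall friction, degrees
    "gamma": (20.0, 0.08),  # unit weight, kN/m^3
}

samples = sample_inputs(80, SPEC)
print(samples[0])
```

Each sampled row would then be passed to the FOS formulation to produce the corresponding output value.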
Three models (MARS, EmNN, and SOS-LSSVM) were used to predict the factor of safety. After modeling the problem, different assessments were carried out to analyze the performance of the three models, and the models were then compared with each other using various error measures. In addition, reliability analysis was performed by calculating the reliability index using the FOSM method, which assumes that all the variables are independent. From this, the failure probability is calculated, which can further be used for design purposes. Table 1 shows the values of the error measures and other performance parameters. These parameters describe the trend and relation between the predicted and observed values. The Scatter Index (SI) employed in this study is a statistical quantity used for measuring the error in the predicted value [41]. To compute it, the RMSE is divided by the mean of the observed values. This quantity indicates whether the values forecast by the models are acceptable. The a20-index is an engineering index used to assess the system's reliability [42]. It is calculated as follows:

$$a20\text{-index} = \frac{m20}{M}$$
where m20 is the number of samples for which the ratio of observed FOS to predicted FOS falls between 0.80 and 1.20, and M is the total number of samples. Physically, this index gives the fraction of predicted values that deviate by no more than ±20% from the observed values. A total rank is also calculated in this paper, based on what is presented in [43]. After all the parameters mentioned above are calculated, the models are ranked for each parameter. The model with the worst value is ranked one, and the model with the best value is ranked three (as three models are used: EmNN, MARS, and SOS-LSSVM). All the ranks are then added to obtain a total rank, and the model with the highest total rank is treated as the best model. This gives an overall view of the prediction capability, trend formation, and performance of a model. The reliability index of each model is calculated and compared with the reliability index of the actual dataset [17]. The three optimal models have different reference β values, and each is compared with its own reference. From Table 2, it is observed that the β value of SOS-LSSVM coincides with its reference value. As the two show the same reliability index, SOS-LSSVM is the better model. In addition, the probability of failure (Pf) is calculated from the reliability index, and the result is consistent with the one obtained using β. As discussed before, MARS produced several basis functions; these and the resulting equation are given in Table 3, where x1, x2, x3, and x4 are the variables c, φ, γ, and δ. The regularization constant and kernel function parameter of LSSVM optimized using SOS are 100,000 and 28.4958, respectively.
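The Scatter Index, the a20-index, and the rank-sum comparison described above can be sketched in Python as follows. The function names are hypothetical, the numerical values are illustrative only and are not results from this study, and ties between models are not treated specially in this sketch.

```python
import math


def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def scatter_index(observed, predicted):
    """SI = RMSE divided by the mean of the observed values."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def a20_index(observed, predicted):
    """Fraction of samples whose observed/predicted ratio lies in [0.8, 1.2]."""
    m20 = sum(1 for o, p in zip(observed, predicted) if 0.8 <= o / p <= 1.2)
    return m20 / len(observed)


def total_rank(metrics, higher_is_better):
    """Sum per-metric ranks: worst model gets 1, best gets n (here n = 3).

    metrics: {metric_name: {model_name: value}}
    higher_is_better: {metric_name: bool}
    """
    totals = {}
    for name, by_model in metrics.items():
        # Sort so that the worst model comes first.
        ordered = sorted(by_model, key=by_model.get,
                         reverse=not higher_is_better[name])
        for rank, model in enumerate(ordered, start=1):
            totals[model] = totals.get(model, 0) + rank
    return totals


# Illustrative values only.
obs = [1.5, 2.0, 2.5, 3.0]
pred = [1.5, 2.2, 2.0, 3.0]
si = scatter_index(obs, pred)
a20 = a20_index(obs, pred)

ranks = total_rank(
    {"RMSE": {"model_a": 0.1, "model_b": 0.2, "model_c": 0.3},
     "R": {"model_a": 0.99, "model_b": 0.95, "model_c": 0.97}},
    {"RMSE": False, "R": True},
)
print(si, a20, ranks)
```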

Table 3. Basis function (BF) equations and the main equation of the MARS model.

Taylor Diagram
A Taylor diagram is a graphical representation of how closely a pattern (or set of patterns) matches the observations, quantified in terms of their correlation, their root mean square error, and the amplitude of their variations (standard deviations). The diagram evaluates different aspects of complex models and supports a comparative analysis of these models against the reference data (observed data) [44]. Figure 2 shows that the predictions of SOS-LSSVM and MARS did not deviate much from the observed values, and both models almost overlap the reference point, whereas EmNN deviates slightly from the reference. EmNN has a lower correlation than the other two models and a higher standard deviation and RMSE.
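The three statistics shown on a Taylor diagram are linked by the relation E'² = σ_r² + σ_m² − 2 σ_r σ_m R, where σ_r and σ_m are the standard deviations of the reference and the model, R is their correlation, and E' is the centered RMSE. This relation is what allows all three to be read from a single point on the diagram. A minimal Python sketch, with a hypothetical function name and illustrative data, is:

```python
import math


def taylor_stats(reference, model):
    """Return (std_ref, std_model, correlation, centered_rmse)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_m = sum(model) / n
    dev_r = [r - mean_r for r in reference]
    dev_m = [m - mean_m for m in model]
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_m)) / (n * std_r * std_m)
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_r, dev_m)) / n)
    return std_r, std_m, corr, crmse


# Illustrative values only.
std_r, std_m, corr, crmse = taylor_stats([1.5, 2.0, 2.5, 3.0],
                                         [1.4, 2.3, 2.5, 2.8])
print(std_r, std_m, corr, crmse)
```

A model plotted close to the reference point therefore has a standard deviation near that of the observations, a correlation near one, and a small centered RMSE at the same time.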

AOC-REC Curve
The Regression Error Characteristic (REC) curve plots the fraction of predictions that fall within a given error tolerance and is used to check the performance of a regression model [45]. The Area Over the Curve (AOC) measures how far the model predictions are from the actual data. Figure 3 shows that the AOC value of the SOS-LSSVM model is the smallest, so it outperforms the other two models. In addition, the AOC value for MARS is smaller than that for EmNN, which makes MARS a better model than EmNN.
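A REC curve and its AOC can be computed as in the following Python sketch, which assumes the absolute error as the error measure and treats the curve as a step function. Under these assumptions, the AOC up to the largest error equals the mean absolute error, which is why a smaller AOC indicates a better model. The function names and data are illustrative.

```python
def rec_curve(observed, predicted):
    """Return (tolerances, accuracies) of the REC step curve.

    accuracy(tol) = fraction of samples with |observed - predicted| <= tol
    """
    errors = sorted(abs(o - p) for o, p in zip(observed, predicted))
    n = len(errors)
    tolerances = [0.0]
    accuracies = [0.0]
    for k, e in enumerate(errors, start=1):
        tolerances.append(e)
        accuracies.append(k / n)
    return tolerances, accuracies


def area_over_curve(tolerances, accuracies):
    """Area between accuracy = 1 and the step curve, up to the largest error."""
    area = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        area += width * (1.0 - accuracies[i - 1])
    return area


# Illustrative values only.
obs = [1.5, 2.0, 2.5, 3.0]
pred = [1.4, 2.3, 2.5, 2.8]
tol, acc = rec_curve(obs, pred)
aoc = area_over_curve(tol, acc)
mae = sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)
print(aoc, mae)
```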

R Curve
The performance curve shows whether the models follow the trend of the reference values. The corresponding R-value (coefficient of correlation) is given in Table 1. Figure 4 shows the performance curve; all three models overlap each other and follow approximately the same trend. A slight deviation can be observed for the EmNN model, and the other criteria also show that EmNN did not perform as well.

Conclusions
The probabilistic design of a retaining wall has been discussed in this paper using different machine learning methods, namely MARS, EmNN, and SOS-LSSVM. Retaining walls are widely used in infrastructure such as road and bridge construction to stabilize slopes. A probabilistic analysis against sliding was performed on a case study to establish the intelligent prediction models. The input and output datasets were generated by considering uncertainties in the soil parameters and were fed into the AI models for training. The predictions were then assessed using performance parameters and graphs such as the Taylor diagram and the Regression Error Characteristic curve. For instance, different errors and efficiencies, such as WMAPE, RMSE, MAE, and NS, were calculated. The results validate the ability of all three methods to evaluate the safety criteria of a cantilever retaining wall with relatively high accuracy. Therefore, the proposed computational intelligence systems can benefit engineers in design and monitoring processes. SOS is an optimization technique used to optimize the parameters of the LSSVM model, and in the current study, it results in a better prediction of the factor of safety. In addition, the reliability indices calculated for the selected optimal models indicate that SOS-LSSVM is the better model and has outperformed the other two. MARS and EmNN give slightly deviated results, but the FOS generated by MARS almost overlaps that of SOS-LSSVM.
Considering the obtained results, SOS-LSSVM can be recommended for future work in this field or for the design of other geostructures, as its parameters are optimized using SOS, which gives this algorithm an advantage over the other two techniques. In addition, to improve the quality of the results, the model can be fed with more training data to better capture the relationship between the input and output datasets. Future research should also study the impact of a higher number of inputs and of spatial randomness in the material properties on the performance of the suggested AI methods. Finally, it should be stated that this study employed synthetic data to train and validate the machine learning methods. Nevertheless, the proposed methodologies can be applied analogously to realistic problems.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.