Machine Learning in the Stochastic Analysis of Slope Stability: A State-of-the-Art Review

In traditional slope stability analysis, it is assumed that some “average” or appropriately “conservative” properties operate over the entire region of interest. This kind of deterministic conservative analysis often results in higher costs; thus, stochastic analysis, which considers uncertainty and spatial variability, was developed to reduce costs. In the past few decades, machine learning has developed rapidly and has been used extensively in stochastic slope stability analysis, particularly as surrogate models to improve computational efficiency. To summarize the current applications of machine learning and guide future research, this paper reviews 159 studies of supervised learning published in the past 20 years. The achievements of machine learning methods are summarized from two aspects: safety factor prediction and slope stability classification. Four potential research challenges and corresponding suggestions are also given.


Introduction
Landslides are sudden and serious disasters that can cause significant damage to nearby facilities, resulting in economic losses and casualties. Therefore, the evaluation of slope stability is a critical prerequisite for disaster prevention and mitigation. In numerical analyses, slope stability is usually expressed as a factor of safety (FOS), which is obtained using deterministic analyses such as limit analysis [1], finite element limit analysis [2], displacement-based finite element analysis combined with the strength reduction method [3,4], or finite element analysis with the gravity increasing method [5,6]. In site investigation and analysis, slope stability is typically classified using empirical formulas or expert judgment.
Slope stability analysis concerns earth materials such as soil and rock. Soil is a three-phase material (solid particles, water, and air) and exhibits some of the most complex physical, mechanical, and chemical behaviors of any engineering material. Due to geological processes and stress history, soils exhibit complex spatial variability and anisotropy, which makes studying soils and predicting their behavior difficult. In geotechnical engineering, two main approaches have traditionally been used to study the mechanical behavior of geotechnical materials: (1) empirical methods, such as laboratory tests and site investigations, and (2) numerical and analytical methods. The precision of experimental assessments depends on the number of tests, but the cost of laboratory and field tests rises with that number, so only a limited set of tests is usually available. Consequently, significant engineering expertise is necessary to judge slope stability based on limited test data. Traditional analytical techniques often make simplifications, such as assuming uniform soil conditions, and thereby overlook the inherent spatial variability of real-world soil [7].
The stability of a slope depends on its material strength, geometric, and hydraulic parameters [8][9][10]. Some studies use deterministic inversion (back analysis) to estimate soil parameters and study slope stability; the key to this method is to find a set of parameters that leads to slope failure [11,12]. However, because soil parameters are uncertain, the parameters deduced using the deterministic method may not represent the real situation, and there may be multiple critical combinations of these parameters [9]. Therefore, probabilistic methods should be considered for inversion (probabilistic back analysis).
With the development of statistical methods, geotechnical researchers began to use random field theory to represent the spatial variability of soil [13] and to quantify the safety margin of slopes from a probabilistic perspective using the failure probability or the equivalent reliability index. The brute-force Monte Carlo simulation (MCS) method has gained popularity in reliability analysis due to its simplicity, flexibility, and ease of use. In principle, MCS can be applied to any problem for which the system response can be evaluated for each sample. For example, Monte Carlo-based probabilistic back analysis can be used to perform parameter inversion, risk assessments, and sensitivity analyses [9,14]. However, when a large number of samples is required (e.g., for small-probability events or when the full probability density function (PDF) is needed), brute-force MCS is time-consuming and computationally intensive [15][16][17]. The response surface method (RSM) was then proposed and widely used [18][19][20][21]. It approximates the limit state function with a polynomial expression fitted to function values at selected points. This analytical function replaces the exact limit state function in Monte Carlo simulations, so the number of expensive calculations required to assess the reliability of structural systems can be significantly reduced. However, RSM requires a suitable fitting function (usually a polynomial) to be specified in advance [22]. Since real-world problems are often very complex, polynomial approximations may not represent the objective function well [23].
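To make the cost argument concrete, the sketch below estimates a failure probability by brute-force MCS and then repeats the estimate with a quadratic response surface fitted to only 30 solver runs. It is a minimal illustration under stated assumptions: the closed-form FOS of a dry infinite slope stands in for an expensive LEM/FEM solver, and the function names and all distributions and parameter values are hypothetical. The coefficient of variation of the MCS estimator, sqrt((1 − P_f)/(N P_f)), shows why small failure probabilities need a very large N.

```python
import numpy as np

def fs_infinite_slope(c, phi_deg, gamma=18.0, z=5.0, beta_deg=30.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope.

    c [kPa], phi_deg [deg], unit weight gamma [kN/m^3], depth z [m], slope angle beta_deg [deg].
    """
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

def sample_inputs(rng, n):
    """Hypothetical input distributions: lognormal cohesion, normal friction angle."""
    c = rng.lognormal(mean=np.log(15.0), sigma=0.3, size=n)
    phi = rng.normal(loc=28.0, scale=3.0, size=n)
    return c, phi

def quad_features(c, phi):
    """Full quadratic polynomial basis in (c, phi) for the response surface."""
    return np.column_stack([np.ones_like(c), c, phi, c**2, phi**2, c * phi])

rng = np.random.default_rng(42)
N = 200_000
c, phi = sample_inputs(rng, N)

# Brute-force MCS: one solver call per sample
pf_mcs = np.mean(fs_infinite_slope(c, phi) < 1.0)
cov_pf = np.sqrt((1.0 - pf_mcs) / (N * pf_mcs))  # coefficient of variation of the Pf estimate

# RSM: fit the polynomial to a small design of 30 solver runs ...
c_d, phi_d = sample_inputs(rng, 30)
coef, *_ = np.linalg.lstsq(quad_features(c_d, phi_d), fs_infinite_slope(c_d, phi_d), rcond=None)

# ... then run MCS on the cheap polynomial instead of the solver
pf_rsm = np.mean(quad_features(c, phi) @ coef < 1.0)
print(f"Pf (MCS) = {pf_mcs:.4f} (CoV {cov_pf:.3f}), Pf (RSM) = {pf_rsm:.4f}")
```

Here the polynomial works well because the stand-in FOS is smooth in only two variables; for strongly nonlinear or high-dimensional responses, a fixed polynomial form is exactly the limitation noted above.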
On the other hand, technological developments have made site investigation data more readily available to engineers, and these measured parameters should be taken into account in parameter estimation. When prior knowledge about parameters such as geometry, strength, and hydraulics is available, Bayesian inference can combine this prior knowledge with observations to obtain the posterior probability distribution of the parameters, which can be used for risk assessment, such as calculating the probability of slope instability [24]. The Bayesian method can also be used to predict slope stability over time by combining historical data with new observations, thereby reducing parameter uncertainty and predicting slope stability at a future point in time [25]. However, the high nonlinearity of the model and the exponential growth in the number of samples and calculations required prevent rapid FOS prediction and accurate classification [24].
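A minimal sketch of probabilistic back analysis follows, assuming a hypothetical dry infinite-slope FOS as the stand-in model and hypothetical prior distributions. The only observation is that the slope has failed (FS ≤ 1), so the likelihood is an indicator function, and the posterior can be sampled exactly by rejecting prior samples that are inconsistent with the observation. Back analyses with noisy measurements usually need MCMC or similar samplers instead. The spread and correlation of the accepted samples also illustrate why deterministic inversion is non-unique.

```python
import numpy as np

def fs_infinite_slope(c, phi_deg, gamma=18.0, z=5.0, beta_deg=30.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope."""
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

rng = np.random.default_rng(0)
N = 500_000
# Prior knowledge (hypothetical): lognormal cohesion [kPa], normal friction angle [deg]
c_prior = rng.lognormal(mean=np.log(15.0), sigma=0.3, size=N)
phi_prior = rng.normal(loc=28.0, scale=3.0, size=N)

# Observation: the slope failed. Likelihood = 1 if FS <= 1 else 0,
# so keeping only consistent prior samples yields exact posterior samples.
failed = fs_infinite_slope(c_prior, phi_prior) <= 1.0
c_post, phi_post = c_prior[failed], phi_prior[failed]

print(f"accepted {failed.sum()} of {N} samples")
print(f"c:   prior mean {c_prior.mean():.1f} -> posterior mean {c_post.mean():.1f} kPa")
print(f"phi: prior mean {phi_prior.mean():.1f} -> posterior mean {phi_post.mean():.1f} deg")
# Many different (c, phi) pairs remain plausible: the inversion is non-unique.
print(f"posterior correlation(c, phi) = {np.corrcoef(c_post, phi_post)[0, 1]:.2f}")
```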
In recent years, the machine learning (ML) method has gained increasing attention from geotechnical researchers as a promising approach for studying slope stability. ML-aided stochastic analyses have been successfully applied in many cases [26][27][28][29][30][31][32]. Research using ML to aid slope stability assessment has been conducted in various countries. These studies often focus on predicting safety factors or classifying slope safety based on data gathered from field tests, and many slope classification studies use field data collected worldwide. For instance, the data in a study by Zhang originate from Yunyang County, Chongqing, China [10], while the data in a study by Zhu are sourced from the South Pars Special Zone, Assalouyeh, Southwest Iran [32]. In geotechnical reliability analysis, ML models can capture complex relationships between input and output data and construct high-dimensional nonlinear functions to directly predict the target output. ML-based surrogate models can accommodate several sources of variability: variability in the data (i.e., the variability of soil and slope parameters), variability in the model (e.g., different initialization values, model structures, or hyperparameters), and variability in evaluation and verification (i.e., different evaluation metrics). For slope stability problems, the inputs are usually slope geometry (boundary) parameters and soil mechanical parameters, and the output is the FOS or a slope stability class. Compared with traditional probabilistic or analytical methods (such as brute-force Monte Carlo simulation or polynomial fitting), ML methods have several reported advantages: high accuracy, no need for prior assumptions about the input-output relationship (which can be a high-order nonlinear function), the ability to handle large datasets, and the capability to handle incomplete data [33]. By using ML techniques, geotechnical engineers can avoid limitations of traditional methods, such as the need for prior assumptions or high computational requirements, and can predict slope stability accurately.
This paper examines 159 publications from the past 20 years that have used supervised ML methods for slope stability problems. The stability analysis methods are briefly introduced in Section 2. Section 3 provides an overview of several commonly used ML algorithms. Section 4 outlines the different applications of ML in slope stability. Section 5 offers some perspectives on future research, and the main conclusions of this review are summarized in Section 6.

Brief Overview of Methods for Slope Stability Analysis
This section provides an overview of methods for slope stability analysis. These methods can be broadly categorized into four groups: empirical methods, the limit equilibrium method (LEM), the finite element method (FEM), and other numerical methods. Figure 1 illustrates the distribution of these methods within the papers examined in this review. The LEM is used in approximately 50% of all studies, much more than any other method, largely due to its early development, ease of use, and high computing speed. The LEM is followed by the FEM and empirical methods. Due to its powerful calculation ability and wide applicability, the FEM has gradually become the most popular numerical method. Compared with the LEM, the FEM not only considers the constitutive model of soil but can also represent stress and deformation outside the sliding surface. However, numerical methods are simplified representations of practical problems and sometimes cannot capture complex field conditions. Therefore, empirical methods remain essential for slope stability problems: they directly provide the FOS or slope stability class, and the resulting data can serve as independent and dependent variables in reliability analysis. In addition, other numerical methods are also used in slope stability analysis, such as the finite difference method, the limit analysis method, and mesh-free methods such as smoothed-particle hydrodynamics (SPH) [34][35][36][37]. The LEM and the FEM are briefly introduced in the following subsections.

The Limit Equilibrium Method (LEM)
In the LEM, a failure surface must be assumed before the problem is solved; in the 2D case, this is a sliding line, usually planar, circular, or a logarithmic spiral. Based on this assumption, the stability problem is transformed into finding the most dangerous position of the failure surface. In addition, assumptions are made about the stress distribution on the failure surface to obtain the overall equilibrium equations, which can then be solved using simple statics. Since the LEM considers only static equilibrium at failure, the constitutive behavior of the soil and the related deformation are ignored in the calculations [38]. The FOS is defined as the ratio of the shear strength of the soil (τ_f) to the shear stress mobilized on the assumed failure surface (τ) [39]. If the soil follows the Mohr-Coulomb failure criterion, the FOS is:

$$\mathrm{FOS} = \frac{\tau_f}{\tau} = \frac{c + \sigma \tan\phi}{\tau}$$

where σ is the effective normal stress on the failure surface, ϕ is the effective friction angle, and c is the effective cohesion. The minimum FOS over the trial failure surfaces is obtained through iterative calculations [40].
The third dimension (perpendicular to the cross-section) is usually considered insignificant in calculating the FOS, so the analysis is typically carried out on a 2D cross-section [6]. Commonly used 2D LEMs include the ordinary method of slices (also known as the Swedish circle or Fellenius method, associated with Petterson and Fellenius), the Bishop simplified method, and the Spencer and Sarma methods. These methods differ mainly in their assumptions about interslice forces and in whether all three equilibrium conditions (force equilibrium in the horizontal and vertical directions and moment equilibrium) are satisfied. The Spencer and Sarma methods satisfy all three conditions and are therefore called rigorous methods; they are generally considered more accurate than non-rigorous methods [41,42].
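The difference between the ordinary method of slices and Bishop's simplified method can be seen in a short sketch. The code below, a minimal illustration rather than a production slope program, divides a hypothetical homogeneous, dry slope into vertical slices along one trial circle and computes the FOS with both methods. Bishop's method needs fixed-point iteration because the FOS appears on both sides of its equation. The geometry, soil parameters, and function names are assumptions; pore pressures are ignored, and no search for the critical circle is performed.

```python
import numpy as np

def ground(x, H=10.0, beta_deg=45.0):
    """Hypothetical slope surface: toe at x = 0, crest at x = H / tan(beta), height H."""
    tan_b = np.tan(np.radians(beta_deg))
    return np.clip(x, 0.0, H / tan_b) * tan_b

def make_slices(xc, yc, R, n=200):
    """Cut the soil above a trial circle (center (xc, yc), radius R) into vertical slices."""
    x = np.linspace(xc - R, xc + R, n + 1)
    xm = 0.5 * (x[:-1] + x[1:])                      # slice mid-points
    b = x[1] - x[0]                                  # slice width
    y_base = yc - np.sqrt(R**2 - (xm - xc) ** 2)     # lower arc of the circle (slip surface)
    h = ground(xm) - y_base                          # slice height
    keep = h > 0                                     # only slices inside the sliding mass
    alpha = np.arcsin((xm[keep] - xc) / R)           # inclination of each slice base
    return b, h[keep], alpha

def fs_fellenius(c, phi_deg, gamma, b, h, alpha):
    """Ordinary method of slices (dry): interslice forces ignored."""
    W = gamma * h * b
    tan_phi = np.tan(np.radians(phi_deg))
    resisting = np.sum(c * b / np.cos(alpha) + W * np.cos(alpha) * tan_phi)
    return resisting / np.sum(W * np.sin(alpha))

def fs_bishop(c, phi_deg, gamma, b, h, alpha, tol=1e-8, max_iter=100):
    """Bishop simplified method (dry): solved by fixed-point iteration on the FOS."""
    W = gamma * h * b
    tan_phi = np.tan(np.radians(phi_deg))
    driving = np.sum(W * np.sin(alpha))
    F = fs_fellenius(c, phi_deg, gamma, b, h, alpha)   # starting guess
    for _ in range(max_iter):
        m_alpha = np.cos(alpha) + np.sin(alpha) * tan_phi / F
        F_new = np.sum((c * b + W * tan_phi) / m_alpha) / driving
        if abs(F_new - F) < tol:
            return F_new
        F = F_new
    return F

b, h, alpha = make_slices(xc=3.0, yc=15.0, R=15.5)
print(f"Fellenius FOS = {fs_fellenius(10.0, 30.0, 18.0, b, h, alpha):.3f}")
print(f"Bishop FOS    = {fs_bishop(10.0, 30.0, 18.0, b, h, alpha):.3f}")
```

For a purely cohesive soil (ϕ = 0), the two formulas reduce to the same expression, which is a convenient check on an implementation.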

The Finite Element Method (FEM)
The FEM is a representative mesh-based method and is probably the most widely used numerical method in geotechnical problems [43][44][45]. Compared with LEMs, FEMs can consider the constitutive behavior of soil and do not require a failure surface to be assumed in advance. Two approaches are commonly combined with the FEM for slope stability analysis: the strength reduction method (SRM) and the gravity increasing method (GIM) [5,6].
In the SRM, the strength parameters of the soil are progressively reduced by a strength reduction factor (SRF) until failure occurs, and the FOS is the value of this factor at failure (equivalently, the reciprocal of the multiplier applied to the strength parameters). Taking the Mohr-Coulomb model as an example, in the ith iteration, the reduced strength parameters are:

$$c_i = \frac{c}{\mathrm{SRF}_i}, \qquad \tan\phi_i = \frac{\tan\phi}{\mathrm{SRF}_i}$$

The SRM and the LEM usually give very similar results for homogeneous slopes [46]. However, the SRM is sometimes sensitive to the nonlinear solution algorithm and the flow rule. In addition, the SRM identifies only the most critical failure mechanism and cannot locate other failure surfaces that may be only slightly less critical [47].
In the GIM, gravity is increased gradually in the calculation until the slope becomes unstable. The GIM therefore seeks the limiting gravity, represented by the gravitational acceleration at failure [48]. The FOS is defined as the ratio of the gravitational acceleration at failure (g_f) to the actual gravitational acceleration (g_i):

$$\mathrm{FOS} = \frac{g_f}{g_i}$$
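The two FEM-based definitions of the FOS can be contrasted with a toy calculation. In the sketch below, which is an illustration only, a closed-form dry infinite-slope check replaces the finite element analysis and its non-convergence failure criterion; this substitution, the function names, and all values are assumptions. Bisection finds the strength reduction factor at which the slope first fails (SRM) and the gravity multiplier g_f/g_i at which it first fails (GIM). For a purely cohesive soil, the two coincide; for a frictional soil, they generally differ, because increasing gravity raises the frictional resistance along with the driving stress.

```python
import math

def fs_infinite_slope(c, phi_deg, gamma, z=5.0, beta_deg=30.0):
    """Hypothetical stand-in for an FE analysis: FOS of a dry infinite slope."""
    b = math.radians(beta_deg)
    return c / (gamma * z * math.sin(b) * math.cos(b)) + math.tan(math.radians(phi_deg)) / math.tan(b)

def first_failure(is_stable, lo, hi, tol=1e-10):
    """Bisection for the factor where is_stable switches from True (at lo) to False (at hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if is_stable(mid) else (lo, mid)
    return 0.5 * (lo + hi)

def fos_srm(c, phi_deg, gamma, lo=1e-3, hi=100.0):
    """SRM: divide c and tan(phi) by SRF until failure; FOS = SRF at failure."""
    def stable(srf):
        phi_red = math.degrees(math.atan(math.tan(math.radians(phi_deg)) / srf))
        return fs_infinite_slope(c / srf, phi_red, gamma) > 1.0
    return first_failure(stable, lo, hi)

def fos_gim(c, phi_deg, gamma, lo=1e-3, hi=100.0):
    """GIM: multiply gravity (unit weight) until failure; FOS = g_f / g_i."""
    def stable(ratio):
        return fs_infinite_slope(c, phi_deg, gamma * ratio) > 1.0
    if stable(hi):
        return math.inf   # no gravity-induced failure within the search range
    return first_failure(stable, lo, hi)

for c, phi in [(50.0, 0.0), (20.0, 25.0)]:
    print(f"c={c}, phi={phi}: FS={fs_infinite_slope(c, phi, 18.0):.3f}, "
          f"SRM={fos_srm(c, phi, 18.0):.3f}, GIM={fos_gim(c, phi, 18.0):.3f}")
```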

Brief Overview of Machine Learning (ML) Methods
The applications of ML methods in slope stability analysis in the last two decades (2003-2022) are shown in Figure 2. The bars represent the annual number of studies, and the black dashed line is the trendline. The statistical data were obtained from the Web of Science database. Using 'machine learning' and 'slope stability analyses' as search keywords, and after checking the relevance between papers and the topic, a total of 159 publications from 2002 to 2022 were obtained. The number of publications has grown dramatically from 2019 to 2022. This shows that in recent years, researchers have paid more attention to the use of ML in slope stability problems. It is important to note that a single publication can use multiple ML methods, resulting in a larger number of methods used than the number of publications reporting those studies.


Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference Systems (ANFISs), and Deep Neural Networks (DNNs)
Artificial neural networks (ANNs) are a commonly used ML method for learning and approximating highly nonlinear relationships between inputs and outputs without assuming an explicit mathematical form [49]. An ANN is made up of a large number of artificial neurons, each of which receives and processes 'signals' and transmits them to adjacent neurons. The 'signals' are real numbers: each neuron applies an activation function to the weighted sum of its inputs, and the weights are adjusted as the network is trained. The ANN structure contains an input layer, hidden layers, and an output layer [31]. The structures of and connections among neurons determine the form of an ANN. The two ANN structures used in most of the examined studies are the multilayer perceptron [50][51][52] and the extreme learning machine [53,54]. ANNs have been widely used in machine learning and can perform a wide variety of tasks, such as classification [32] and regression [31]. Multilayer feedforward networks are universal approximators [55]. ANNs can also handle time series forecasting [56,57] and collaborative filtering [58]. In this review, the term ANN primarily denotes conventional neural network architectures with the most basic structure: networks whose hidden layers are all fully connected (also known as 'dense layers'). They operate without loops, lack inherent memory, and consider only the current input for any given operation.
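As a minimal sketch of how an ANN is typically used as an FOS surrogate, the snippet below trains scikit-learn's `MLPRegressor` (a multilayer perceptron with two fully connected hidden layers) on synthetic data. The targets come from a hypothetical stand-in solver, the closed-form dry infinite-slope FOS, with cohesion, friction angle, slip depth, and slope angle sampled from assumed uniform ranges; in practice, the targets would come from LEM/FEM runs or field records. Inputs are standardized because gradient-based training is sensitive to feature scales.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fs_infinite_slope(c, phi_deg, z, beta_deg, gamma=18.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope."""
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

rng = np.random.default_rng(1)
n = 2000
# Columns: cohesion [kPa], friction angle [deg], slip depth [m], slope angle [deg] (assumed ranges)
X = np.column_stack([rng.uniform(5, 30, n), rng.uniform(20, 40, n),
                     rng.uniform(2, 10, n), rng.uniform(20, 40, n)])
y = fs_infinite_slope(*X.T)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 32), activation="relu", max_iter=2000, random_state=0),
)
ann.fit(X_tr, y_tr)
r2 = ann.score(X_te, y_te)   # coefficient of determination on unseen data
print(f"test R^2 = {r2:.3f}")
```

Once trained, `ann.predict` can replace the solver inside a Monte Carlo loop, which is where the computational savings reported in the reviewed studies come from.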
Adaptive neuro-fuzzy inference systems (ANFISs) are a machine learning method that combines ANNs with Takagi-Sugeno fuzzy inference systems [59][60][61]. Their key feature is a set of fuzzy rules, which are conditional statements of the form 'if x is A then y is B', used to approximate the input-output relationship of a system. An ANFIS processes data in roughly five steps (layers) and is trained with gradient descent and backpropagation [59]:
(1) The input data are transformed into fuzzy sets using membership functions.
(2) The firing strength of each rule is computed from the membership degrees of its antecedents (typically as their product).
(3) The firing strengths are normalized by dividing each by the sum of all firing strengths.
(4) Each rule's output is computed from its consequent (linear) function and weighted by its normalized firing strength; the consequent parameters are tuned during training.
(5) All incoming signals are summed to obtain the overall output.
The primary advantage of ANFIS is its ability to represent both nonlinearity and structured knowledge [59]. However, ANFIS requires a large dataset to train the model, and the selection of appropriate input and output variables is crucial. Moreover, the number of fuzzy rules increases exponentially with the number of input variables, which greatly increases the computational cost and may degrade model performance [62].
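The five ANFIS steps can be traced in a forward pass of a first-order Takagi-Sugeno system. The sketch below is illustrative: it assumes Gaussian membership functions and grid partitioning (every combination of one membership function per input forms a rule), uses hypothetical parameter values, and omits training, which would tune the membership and consequent parameters. With two membership functions for each of two inputs, it produces 2² = 4 rules, which shows how the rule base grows exponentially with the number of inputs.

```python
from itertools import product

import numpy as np

def gauss_mf(x, center, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(x, mf_params, consequents):
    """Forward pass of a first-order Sugeno ANFIS with grid-partitioned rules.

    x: inputs, shape (n_inputs,)
    mf_params: per input, a list of (center, sigma) pairs
    consequents: shape (n_rules, n_inputs + 1), rows [p_1, ..., p_n, r] of f = p.x + r
    """
    # Step 1: fuzzify each input with its membership functions
    mu = [[gauss_mf(x[i], c, s) for c, s in mf_params[i]] for i in range(len(x))]
    # Step 2: firing strength of each rule = product of its antecedent memberships
    rules = list(product(*[range(len(m)) for m in mu]))
    w = np.array([np.prod([mu[i][r[i]] for i in range(len(x))]) for r in rules])
    # Step 3: normalize firing strengths
    w_bar = w / w.sum()
    # Step 4: rule outputs from linear consequents, weighted by normalized strengths
    weighted = w_bar * (consequents[:, :-1] @ x + consequents[:, -1])
    # Step 5: sum all incoming signals
    return float(weighted.sum()), w_bar

mf = [[(0.0, 1.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 1.0)]]   # 2 inputs x 2 MFs -> 4 rules
theta = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
out, w_bar = anfis_forward(np.array([0.3, 0.7]), mf, theta)
print(f"output = {out:.4f}, normalized strengths = {np.round(w_bar, 3)}")
```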
Deep neural networks (DNNs) extend the concept of ANNs and are probably the most successful deep learning method. The term 'deep' refers to networks with more than one hidden layer. A convolutional neural network (CNN) is a commonly used deep learning model that is widely used for image and pattern recognition [63] in computer science and has been extended to other fields, such as foundation bearing capacity problems [15,64] and slope stability problems [65][66][67] in geotechnical engineering. Hidden layers in CNNs usually include convolutional layers, pooling layers, and fully connected layers. CNNs exploit the spatial structure of images and automatically learn the features used for image classification. There are also variants of CNNs, such as locally connected networks [15,66]. Other classic DNN structures include recurrent neural networks (RNNs) and transformers. RNNs have been used for language modeling [68] and speech recognition [69], and long short-term memory (LSTM) is a popular RNN variant. The widely used ChatGPT from OpenAI is based on the transformer architecture [70]. DNNs usually have many layers and millions of parameters, which allows them to fit the training data closely. However, overly complex model structures can lead to overfitting (poor generalization of the trained model). This problem is usually mitigated using regularization, for example, dropout, which randomly deactivates neurons during training. In this review, the term DNN refers to more complex neural architectures, which may contain hidden layers of various types, such as convolutional or pooling layers, and may provide functionality beyond that of basic ANNs. Note that not all hidden layers in DNNs are fully connected.
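To illustrate what convolutional and pooling layers compute, the sketch below applies one 3 × 3 filter, a ReLU activation, and 2 × 2 max pooling to a gridded field, the kind of 2D input that some CNN-based slope studies use. It is a NumPy illustration under assumptions: the field values (uncorrelated lognormal cohesion) and the filter weights are hypothetical and fixed, whereas a CNN learns its filters during training, and deep learning frameworks implement these layers far more efficiently. As in most frameworks, the 'convolution' is computed as a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 2D cross-correlation without padding ('valid' mode)."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(x):
    """Non-overlapping 2 x 2 max pooling (an odd trailing row/column is dropped)."""
    H, W = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:H, :W].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
# Hypothetical 20 x 40 grid of cohesion values [kPa] (uncorrelated here for brevity)
field = rng.lognormal(mean=np.log(15.0), sigma=0.3, size=(20, 40))
# Hypothetical vertical-gradient filter: responds to strength changes between rows
kernel = np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [-1.0, -1.0, -1.0]])

feature_map = np.maximum(conv2d_valid(field, kernel), 0.0)   # convolution + ReLU
pooled = max_pool_2x2(feature_map)                           # pooling
features = pooled.ravel()                                    # would feed fully connected layers
print(field.shape, feature_map.shape, pooled.shape, features.shape)
```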

Support Vector Machine (SVM)
Support vector machine (SVM) was originally developed for binary classification [71] and has since been extended into a supervised learning method that can handle both classification and regression problems [31]. The key idea of SVM is to identify an optimal hyperplane. In classification tasks with linearly separable data, SVM represents the training samples as points in an n-dimensional feature space and seeks the hyperplane that maximizes the margin between the two categories. For regression tasks, SVM seeks the flattest function (hyperplane) that keeps as many data points as possible within a tolerance band around it. The ε-insensitive loss function is introduced to define this acceptable error range ε. It defines a region of width 2ε within which the difference between the predicted and true values is ignored: if the absolute difference between the predicted and actual values is less than ε, the loss is 0; otherwise, the loss is that absolute difference minus ε, i.e., L_ε(y, f(x)) = max(0, |y − f(x)| − ε). Quadratic programming is used to find the hyperplane in both classification and regression problems. For nonlinear data, SVM uses the kernel trick to implicitly map the data into a higher-dimensional space, where an optimal hyperplane can be determined in the transformed feature domain [31].
Popular kernels include the radial basis function (RBF, also called the squared exponential or Gaussian kernel), polynomial, and periodic kernels. SVM has been widely used in many fields, such as text and hypertext classification [72], bioinformatics [73], and anomaly detection [74].
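The ε-insensitive loss and kernel SVR can be sketched in a few lines. The loss function below follows the definition given for SVM regression; the model is scikit-learn's `SVR` with an RBF kernel, trained on hypothetical infinite-slope data used as a stand-in for solver output. The data generator and the values of C and epsilon are assumptions, not recommendations. Standardizing the inputs matters because the RBF kernel depends on distances between samples.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y - f(x)| <= eps, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

def fs_infinite_slope(c, phi_deg, z, beta_deg, gamma=18.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope."""
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

rng = np.random.default_rng(2)
n = 1000
X = np.column_stack([rng.uniform(5, 30, n), rng.uniform(20, 40, n),
                     rng.uniform(2, 10, n), rng.uniform(20, 40, n)])
y = fs_infinite_slope(*X.T)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.02))
svr.fit(X_tr, y_tr)
print(f"test R^2 = {svr.score(X_te, y_te):.3f}")
# Samples strictly inside the epsilon-tube are not support vectors
print(f"support vectors: {svr[-1].support_.size} of {len(X_tr)} training samples")
print(eps_insensitive_loss(np.array([1.0, 2.0]), np.array([1.05, 2.5]), eps=0.1))
```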

Gaussian Process Regression (GPR)
Gaussian process regression (GPR) is a non-parametric, Bayesian supervised learning approach that models the relationship between input and output variables [75,76]. Gaussian processes can be used for both regression and classification problems.
The Gaussian distribution describes the distribution of random variables, whereas the Gaussian process is a generalization of the Gaussian distribution that describes a distribution over functions. A Gaussian process is defined by its mean function m(x) and covariance function k(x, x′) in the function space:

$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big]$$

The Gaussian process can then be written as:

$$f(x) \sim \mathcal{GP}\big(m(x),\, k(x, x')\big)$$

For notational simplicity, the mean function is usually taken to be zero [77,78]. Consider a dataset A with n observations, A = {(x_i, y_i) | i = 1, 2, 3, ..., n}, where x_i is an M-dimensional input vector and y_i is a scalar output. The relationship between the input and output data is assumed to be [75,77,79]:

$$y = f(x) + \varepsilon$$

where f(x) is an arbitrary regression function and ε is Gaussian noise that is independent and identically distributed with zero mean and variance σ_n², i.e., ε ∼ N(0, σ_n²). The input and output data are collected into X = [x_1, x_2, ..., x_n] and y = [y_1, y_2, ..., y_n]^T, so that

$$y \sim \mathcal{N}\big(0,\; K(X, X) + \sigma_n^2 I\big)$$

where K(X, X) is the n × n covariance matrix with entries K_ij = k(x_i, x_j). The training outputs y and the test outputs y_* are jointly distributed as a multivariate normal distribution [78]. Under the prior, the joint distribution of the observed targets y and the predicted values y_* at the test inputs X_* is:

$$\begin{bmatrix} y \\ y_* \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$

GPR then computes the predictive distribution of the function values y_* at the test points by conditioning on the training data:

$$y_* \mid X, y, X_* \sim \mathcal{N}\big(\bar{y}_*,\; \operatorname{cov}(y_*)\big)$$

$$\bar{y}_* = K(X_*, X)\,\big[K(X, X) + \sigma_n^2 I\big]^{-1} y$$

$$\operatorname{cov}(y_*) = K(X_*, X_*) - K(X_*, X)\,\big[K(X, X) + \sigma_n^2 I\big]^{-1} K(X, X_*)$$

Bayesian optimization is one of the best-known applications of Gaussian process regression [80,81]. When evaluating the objective function is very expensive (such as when tuning deep learning hyperparameters), Bayesian optimization efficiently searches for the optimum by building a Gaussian process model of the objective function and using it to select the next evaluation point.
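The GPR predictive mean K(X_*, X)[K(X, X) + σ_n² I]⁻¹ y and the corresponding predictive variance translate almost line by line into code. The sketch below is a minimal NumPy implementation assuming a zero mean function and a squared exponential (RBF) covariance with fixed, hypothetical hyperparameters; in practice, the length scale, signal variance, and noise variance σ_n² are usually estimated by maximizing the marginal likelihood. A Cholesky factorization is used instead of forming the inverse explicitly, which is the numerically preferred approach.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared exponential covariance k(x, x') between rows of A and rows of B."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(X, y, X_star, noise_var=1e-8, **kernel_args):
    """Posterior mean and variance of y_* at test inputs X_* (zero prior mean)."""
    K = rbf_kernel(X, X, **kernel_args) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # [K + sigma_n^2 I]^-1 y
    K_star = rbf_kernel(X_star, X, **kernel_args)          # K(X_*, X)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf_kernel(X_star, X_star, **kernel_args)) - np.sum(v**2, axis=0)
    return mean, var

X = np.linspace(0.0, 5.0, 6)[:, None]     # training inputs (one feature)
y = np.sin(X).ravel()                      # training outputs
X_star = np.array([[2.5], [50.0]])         # one test point inside, one far outside the data
mean, var = gp_predict(X, y, X_star)
print(np.round(mean, 3), np.round(var, 3))
```

Far from the training data, the predictive variance returns to the prior signal variance, which is the uncertainty information that makes GPR attractive for reliability analysis and active learning.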

Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB)
A decision tree (DT) is a supervised learning method used in statistics, data mining, and machine learning. It uses a tree-like structure for both regression and classification problems [82,83]. Nodes with outgoing edges are called internal nodes, the remaining nodes are called leaf nodes (leaves), and branches connect the nodes. DTs can be categorized into two types: classification trees and regression trees. Classification trees predict a discrete set of values (class labels); they recursively split the data into two subsets using the feature and threshold that best separate the classes, repeating this process until a stopping criterion is met. Regression trees predict continuous output variables; they split the data into two subsets so as to best reduce the spread (e.g., the squared error) of the output values and repeat this process until a stopping criterion is met. DTs have many advantages, including being simple to understand and explain [84]. However, DTs can be highly sensitive: small changes in the input data can lead to significant changes in the tree and its final predictions [84]. Because trees are built with a greedy algorithm, DTs may fail to return a globally optimal result [85]. The DT method can also build overly complex trees for specific problems, leading to poor generalization beyond the training data (overfitting).
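A minimal sketch of a classification tree for slope stability uses scikit-learn's `DecisionTreeClassifier` and `export_text`. Each synthetic sample is labeled stable (1) or unstable (0) according to whether a hypothetical stand-in solver (the dry infinite-slope FOS) gives FS > 1, with assumed input ranges. Limiting `max_depth` is one simple way to curb the overfitting described above, and the printed rules show the interpretability that makes trees attractive.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

def fs_infinite_slope(c, phi_deg, z, beta_deg, gamma=18.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope."""
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([rng.uniform(5, 30, n), rng.uniform(20, 40, n),
                     rng.uniform(2, 10, n), rng.uniform(20, 40, n)])
labels = (fs_infinite_slope(*X.T) > 1.0).astype(int)   # 1 = stable, 0 = unstable
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          random_state=0, stratify=labels)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy = {tree.score(X_te, y_te):.3f}")
print(export_text(tree, feature_names=["c", "phi", "z", "beta"]))
```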
A random forest (RF) is an ensemble learning method used for classification, regression, and other tasks. During training, the RF builds a large number of DTs, which are then used together to make predictions [86,87]. In classification tasks, the RF output is the class selected by most of the DTs, whereas in regression problems, the average of the individual tree predictions is returned. One of the main advantages of RFs is their ability to overcome the overfitting problem that often occurs in DTs. As the number of trees in an RF increases, the generalization error converges to a limiting value, so adding more trees does not cause the RF to overfit [86]. However, a disadvantage of RFs is that they lose the intrinsic interpretability of decision trees [88]. Additionally, RFs may not accurately predict extreme values of continuous variables [89,90].
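The following sketch shows scikit-learn's `RandomForestRegressor` used as an FOS surrogate inside a Monte Carlo loop, the typical role of ML models in the stochastic analyses reviewed here. The stand-in solver, the uniform input ranges, and the number of trees are hypothetical choices for illustration. The surrogate's failure probability inherits its prediction errors near FS = 1, so surrogate accuracy should be checked in that region.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def fs_infinite_slope(c, phi_deg, z, beta_deg, gamma=18.0):
    """Hypothetical stand-in solver: FOS of a dry infinite slope."""
    beta = np.radians(beta_deg)
    return (c / (gamma * z * np.sin(beta) * np.cos(beta))
            + np.tan(np.radians(phi_deg)) / np.tan(beta))

def sample_inputs(rng, n):
    """Hypothetical uniform ranges: c [kPa], phi [deg], z [m], beta [deg]."""
    return np.column_stack([rng.uniform(5, 30, n), rng.uniform(20, 40, n),
                            rng.uniform(2, 10, n), rng.uniform(20, 40, n)])

rng = np.random.default_rng(4)
X = sample_inputs(rng, 2000)                 # "expensive" solver runs used for training
y = fs_infinite_slope(*X.T)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2 = rf.score(X_te, y_te)

# Monte Carlo on the surrogate: many cheap predictions instead of solver calls
X_mc = sample_inputs(rng, 50_000)
pf_rf = np.mean(rf.predict(X_mc) < 1.0)
pf_ref = np.mean(fs_infinite_slope(*X_mc.T) < 1.0)   # affordable only because the stand-in is cheap
print(f"test R^2 = {r2:.3f}, Pf (RF) = {pf_rf:.4f}, Pf (reference) = {pf_ref:.4f}")
```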
Gradient boosting (GB) is a popular machine learning algorithm for classification and regression that combines a series of weak prediction models, usually decision trees [91,92]. GB iteratively adds new weak learners to the current model. At each iteration, it computes the negative gradient of the loss function with respect to the predictions of the current ensemble and trains a new learner to fit it, so that the new learner focuses on the samples that the previous learners predicted poorly. For regression, the mean squared error (MSE) is a typical loss function [93], in which case the negative gradient is proportional to the residual. Compared with the RF method, which is also built from DTs, GB is often found to perform better [91,92,94]; this is commonly attributed to GB's ability to reduce both bias and variance, leading to more accurate predictions. However, GB can be computationally expensive and prone to overfitting if the number of iterations is too high.
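The boosting loop can be written out explicitly for the squared-error loss, for which the negative gradient is the residual y − F(x) (using the common ½ scaling). The sketch below is a from-scratch illustration that uses scikit-learn's `DecisionTreeRegressor` as the weak learner; the learning rate, tree depth, number of rounds, and synthetic data are assumptions. Production work would more typically use `sklearn.ensemble.GradientBoostingRegressor` or a dedicated library.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Least-squares gradient boosting with shallow regression trees as weak learners."""
    f0 = float(np.mean(y))                     # initial model: a constant
    pred = np.full(len(y), f0)
    trees, losses = [], [np.mean((y - pred) ** 2)]
    for _ in range(n_rounds):
        residual = y - pred                    # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # shrunken additive update
        trees.append(tree)
        losses.append(np.mean((y - pred) ** 2))
    return (f0, learning_rate, trees), losses

def predict_gradient_boosting(model, X):
    """Sum the initial constant and all shrunken tree predictions."""
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)

rng = np.random.default_rng(5)
X = np.column_stack([rng.uniform(5, 30, 1000), rng.uniform(20, 40, 1000)])  # c [kPa], phi [deg]
beta = np.radians(30.0)   # hypothetical dry infinite slope, z = 5 m, gamma = 18 kN/m^3
y = X[:, 0] / (18.0 * 5.0 * np.sin(beta) * np.cos(beta)) + np.tan(np.radians(X[:, 1])) / np.tan(beta)

model, losses = fit_gradient_boosting(X, y)
print(f"training MSE: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Because each tree is a least-squares fit to the residuals and the learning rate is small, the training loss never increases from one round to the next; the risk of too many rounds is overfitting on unseen data, not divergence on the training set.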
Decision trees have demonstrated high accuracy in disease prediction, especially heart disease prediction [95].Random forest and gradient-boosting classifiers perform very well in a credit-scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets [96].

K-Nearest Neighbor (k-NN)
The k-nearest neighbor (k-NN) method is a popular machine learning algorithm used for regression and classification problems. It assumes that similar samples tend to be located close to each other in the feature space [97]. For a query sample, the k-NN algorithm computes the distance between the query and every sample in the training dataset and selects the k samples with the smallest distances. In classification problems, the output is a class membership (label): the query is assigned the majority class among its k nearest neighbors, and if k = 1, it is simply assigned the class of its single nearest neighbor. In regression problems, the output is the average value of the k nearest neighbors [98]. One advantage of the k-NN method is its simplicity and fast training, since training only requires storing the data. Additionally, the k-NN algorithm can suppress noise to some extent by averaging over several neighbors [99]. However, k is a sensitive parameter: a smaller k may lead to overfitting, while a larger k may lead to underfitting. Additionally, prediction can be computationally expensive, especially for large datasets, because distances to all training samples must be computed [99]. Therefore, hyperparameters must be selected carefully to ensure good performance. Application scenarios for the k-NN method include image recognition [100,101], text classification [102,103], and recommendation systems [104].
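The k-NN procedure fits in a single function. The sketch below is a minimal NumPy illustration (the function name and toy data are hypothetical) that uses Euclidean distance, averages neighbor values for regression, and takes a majority vote for classification. With real slope data, the features should be standardized first, because Euclidean distance would otherwise be dominated by whichever parameter has the largest numerical range.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, task="regression"):
    """Predict for one query point from its k nearest training samples (Euclidean distance)."""
    dist = np.linalg.norm(X_train - x_query, axis=1)   # distance to every training sample
    nearest = np.argsort(dist)[:k]                      # indices of the k smallest distances
    if task == "regression":
        return float(np.mean(y_train[nearest]))         # average of the neighbors' values
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]   # majority vote

# Toy regression: neighbors of 1.1 are 1.0, 2.0, and 0.0, so the prediction is their mean
X_reg = np.array([[0.0], [1.0], [2.0], [10.0]])
y_reg = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_predict(X_reg, y_reg, np.array([1.1]), k=3))

# Toy classification: 0 = unstable, 1 = stable (hypothetical labels)
X_cls = np.array([[0.0], [1.0], [5.0], [6.0]])
y_cls = np.array([0, 0, 1, 1])
print(knn_predict(X_cls, y_cls, np.array([0.4]), k=3, task="classification"))
```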

Multilinear Regression (MLR) and Multivariate Adaptive Regression Splines (MARS)
Multilinear regression (MLR), also known simply as multiple regression, is a simple machine learning method used to model the relationship between two or more predictor variables and a response variable. The model is shown in Equation (14) [105]:

$$y = \alpha_0 + \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n \tag{14}$$

where y is the dependent variable (output data), α_0 is the intercept (constant term), and α_1 to α_n are the coefficients of the independent variables (input data) x_1 to x_n, respectively. MLR estimates the α coefficients by minimizing the sum of squared errors between the predicted and actual values of the dependent variable. The disadvantage of MLR is that it can only capture linear relationships between the input and output data.
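Estimating the α coefficients of Equation (14) is an ordinary least-squares problem. A minimal NumPy sketch follows, assuming noise-free synthetic data generated from known, hypothetical coefficients so that the recovered values can be checked; the column of ones in the design matrix carries the intercept α_0.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares estimate of [alpha_0, alpha_1, ..., alpha_n]."""
    A = np.column_stack([np.ones(len(X)), X])   # leading column of ones -> intercept alpha_0
    alpha, *_ = np.linalg.lstsq(A, y, rcond=None)
    return alpha

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))              # three hypothetical predictors
true_alpha = np.array([1.5, 2.0, -0.5, 0.25])        # hypothetical alpha_0 ... alpha_3
y = true_alpha[0] + X @ true_alpha[1:]
alpha = fit_mlr(X, y)
print(np.round(alpha, 6))
```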

Multivariate adaptive regression splines (MARS) is a nonlinear regression algorithm used for classification and regression problems. MARS models the relationship between multiple inputs and one output using a series of piecewise linear functions called basis functions [106,107]. Taking linear basis functions as an example (other basis functions are analogous), a basis function B_i(x) can take one of three forms: (1) a constant (the intercept term); (2) a hinge function,

$$\max(0,\, x - c) \quad \text{or} \quad \max(0,\, c - x),$$

where the constant c is called the knot; or (3) a product of two or more hinge functions.
The MARS model M can be expressed as a weighted sum of basis functions:

$$M(x) = \sum_{i} t_i B_i(x)$$

where t_i is the constant coefficient of the ith basis function. The model starts with a single basis function and adds new functions iteratively, selecting the optimal location of each hinge in a greedy forward stage-wise manner [108]. MARS has the advantage of generating a simple model that captures complex relationships between inputs and outputs using a small number of basis functions. However, MARS may be sensitive to the choice of the initial basis function and the stopping criteria. MARS is often used in credit scoring [109,110] and species distribution models [111], and it also has applications in time series analysis [112].
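To make the forward stage-wise idea concrete, the sketch below implements a deliberately simplified, one-dimensional MARS forward pass in NumPy: candidate knots are the interior data points, each step adds the mirrored hinge pair that most reduces the residual sum of squares, and the coefficients t_i are fitted by least squares. It is an assumption-laden illustration rather than the full algorithm of [106-108]: there is no backward pruning, no product terms, and the function names and data are hypothetical.

```python
import numpy as np


def hinge(x, c, direction):
    """Hinge basis function: max(0, x - c) for direction=+1, max(0, c - x) for direction=-1."""
    return np.maximum(0.0, direction * (x - c))


def build_basis(x, knots):
    """Intercept column plus a mirrored pair of hinge functions for every knot."""
    cols = [np.ones_like(x)]
    for c in knots:
        cols += [hinge(x, c, +1), hinge(x, c, -1)]
    return np.column_stack(cols)


def mars_forward_1d(x, y, max_pairs=2):
    """Simplified 1-D MARS forward pass (additive hinge pairs, no backward pruning)."""
    knots = []
    for _ in range(max_pairs):
        best_sse, best_knot = None, None
        for c in np.unique(x)[1:-1]:            # candidate knots: interior data points
            if c in knots:
                continue
            A = build_basis(x, knots + [c])
            t, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ t) ** 2))
            if best_sse is None or sse < best_sse:
                best_sse, best_knot = sse, c
        if best_knot is None:
            break
        knots.append(best_knot)
    # Final coefficients t_i of M(x) = sum_i t_i B_i(x); lstsq also tolerates the
    # linear dependence that can arise when several hinge pairs share the intercept
    t, *_ = np.linalg.lstsq(build_basis(x, knots), y, rcond=None)
    return knots, t


def mars_predict(x, knots, t):
    return build_basis(x, knots) @ t


# Hypothetical response with a change of slope at x = 5
x = np.arange(11, dtype=float)
y = 1.0 + 2.0 * np.maximum(0.0, x - 5.0)
knots, t = mars_forward_1d(x, y, max_pairs=1)
```

With this piecewise linear response, the forward pass places the single knot at x = 5, where the hinge pair reproduces the data exactly.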

Sparse polynomial chaos expansion (SPCE) is based on polynomial chaos expansion (PCE) and does not require assumptions about the form of the performance function [113][114][115]. In the random FEM, PCE represents the system response as a series of orthogonal polynomials of the random input variables weighted by a set of coefficients. However, the number of coefficients, and therefore the computational cost, grows rapidly with the number of random variables, which becomes prohibitive for high-dimensional problems. SPCE therefore uses a stepwise regression technique to eliminate nonsignificant polynomial terms [114]. There are several variants of SPCE, such as adaptive sparse polynomial chaos [113,115].
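The sketch below illustrates the sparse idea for standard normal inputs. It builds the full total-degree basis of probabilists' Hermite polynomials (via NumPy's `hermite_e.hermeval`) and then uses a simple forward stepwise selection, adding only terms that noticeably reduce the residual. This forward-selection rule and the tolerance are assumptions standing in for the stepwise and adaptive schemes of [113-115]; the function names and the test response are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def he(n, x):
    """Probabilists' Hermite polynomial He_n(x), orthogonal for standard normal inputs."""
    return hermeval(x, [0.0] * n + [1.0])


def design_column(alpha, xi):
    """Multivariate basis term: product of He_{alpha_j}(xi_j) over all input dimensions."""
    col = np.ones(len(xi))
    for j, n in enumerate(alpha):
        col = col * he(n, xi[:, j])
    return col


def sparse_pce(xi, y, degree=3, tol=1e-8):
    """Forward stepwise selection of PCE terms from the full total-degree basis."""
    dim = xi.shape[1]
    candidates = [a for a in itertools.product(range(degree + 1), repeat=dim) if sum(a) <= degree]
    sst = float(np.sum((y - y.mean()) ** 2))      # total sum of squares, scales the tolerance

    def fit(terms):
        A = np.column_stack([design_column(a, xi) for a in terms])
        c, *_ = np.linalg.lstsq(A, y, rcond=None)
        return c, float(np.sum((y - A @ c) ** 2))

    selected = [(0,) * dim]                        # the constant term is always kept
    coef, sse = fit(selected)
    while len(selected) < len(candidates):
        # Try every remaining term and keep the one that reduces the residual most
        trials = [(a, *fit(selected + [a])) for a in candidates if a not in selected]
        a_best, c_best, sse_best = min(trials, key=lambda trial: trial[2])
        if sse - sse_best < tol * sst:              # negligible improvement: stop
            break
        selected.append(a_best)
        coef, sse = c_best, sse_best
    return selected, coef


# Hypothetical model with two standard normal inputs:
# y = 1 + 2*He_1(xi_1) + 0.5*He_2(xi_2), i.e. only 3 of the 10 candidate terms matter
rng = np.random.default_rng(0)
xi = rng.standard_normal((200, 2))
y = 1.0 + 2.0 * xi[:, 0] + 0.5 * (xi[:, 1] ** 2 - 1.0)
selected, coef = sparse_pce(xi, y)
```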
A Bayesian network (BN) is a probabilistic graphical model that represents a set of random variables and their conditional dependencies using a directed acyclic graph, together with a probability distribution for each node [116,117]. BNs combine graph theory with probability theory and can be used to reason logically about problems involving uncertainty. BNs support forward reasoning (such as estimating the probability of failure) as well as reverse inference (such as evaluating the sensitivity of influencing factors). Bayesian inference can be used to estimate or update the probabilities of certain nodes in the network when other nodes are observed. In addition, BNs have been reported to achieve good prediction accuracy on small samples.
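Forward and reverse inference can be illustrated with a hypothetical three-node chain, rainfall (R) → high water table (W) → slope failure (F). The conditional probability tables below are invented for illustration, and inference is done by brute-force enumeration of the joint distribution, which is only practical for very small networks.

```python
import itertools

# Hypothetical conditional probability tables (1 = event occurs, 0 = it does not)
P_R = {1: 0.3, 0: 0.7}
P_W_given_R = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}    # P_W_given_R[r][w]
P_F_given_W = {1: {1: 0.2, 0: 0.8}, 0: {1: 0.01, 0: 0.99}}  # P_F_given_W[w][f]


def joint(r, w, f):
    """Joint probability factorised along the directed acyclic graph R -> W -> F."""
    return P_R[r] * P_W_given_R[r][w] * P_F_given_W[w][f]


def prob(query, evidence=None):
    """P(query | evidence) by enumerating every combination of node states."""
    evidence = evidence or {}
    num = den = 0.0
    for r, w, f in itertools.product((0, 1), repeat=3):
        state = {"R": r, "W": w, "F": f}
        if any(state[k] != v for k, v in evidence.items()):
            continue                      # inconsistent with the observed nodes
        p = joint(r, w, f)
        den += p
        if all(state[k] == v for k, v in query.items()):
            num += p
    return num / den


p_fail = prob({"F": 1})                         # forward: probability of failure
p_rain_given_fail = prob({"R": 1}, {"F": 1})    # reverse: was rainfall the likely cause?
```

Here P(F = 1) = 0.0689, and observing a failure raises the probability of heavy rainfall from the prior 0.3 to about 0.71.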

The Hyperparameter Optimization (HPO) Algorithm in Machine Learning
In machine learning, hyperparameters (such as the learning rate of an ANN) must be set before the ML model is trained [118]. Hyperparameter optimization (HPO) is the automated tuning of hyperparameters to obtain the best model structure and performance. Grid search is an easy-to-understand HPO method [119] that evaluates every combination of candidate values. However, its efficiency depends strongly on the number of hyperparameters to be optimized and the values chosen on the grid, because the number of combinations grows exponentially with the number of hyperparameters. The random search method was proposed as an alternative to speed up the search process [120]. More advanced HPO methods, which can be called metaheuristic-based optimization algorithms, have also been proposed [89]. These algorithms optimize a problem based on a given heuristic function or cost measure and find a good or acceptable solution within a reasonable amount of time and memory (e.g., particle swarm optimization and ant colony optimization).
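The difference between grid search and random search can be sketched on a small hypothetical problem: tuning the polynomial degree and the ridge penalty λ of a polynomial regression model by validation error. The data, the model, the search ranges and the function names are assumptions for illustration; both searches are given the same budget of 20 model evaluations.

```python
import itertools

import numpy as np

# Hypothetical 1-D regression data split into training and validation sets
rng = np.random.default_rng(42)
x = rng.uniform(-1.0, 1.0, 60)
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]


def validation_mse(degree, lam):
    """Train polynomial ridge regression with the given hyperparameters; return validation MSE."""
    A = np.vander(x_tr, degree + 1)
    w = np.linalg.solve(A.T @ A + lam * np.eye(degree + 1), A.T @ y_tr)
    return float(np.mean((np.vander(x_va, degree + 1) @ w - y_va) ** 2))


def grid_search(grid):
    """Evaluate every combination of the candidate values and return the best one."""
    return min(itertools.product(*grid.values()), key=lambda hp: validation_mse(*hp))


def random_search(n_iter, seed=0):
    """Sample n_iter random configurations: degree uniform in 1..9, lambda log-uniform."""
    r = np.random.default_rng(seed)
    configs = [(int(r.integers(1, 10)), float(10.0 ** r.uniform(-6.0, 1.0))) for _ in range(n_iter)]
    return min(configs, key=lambda hp: validation_mse(*hp))


grid = {"degree": [1, 3, 5, 7, 9], "lam": [1e-6, 1e-3, 1e-1, 1.0]}
best_grid = grid_search(grid)            # 5 x 4 = 20 model evaluations
best_random = random_search(n_iter=20)   # same budget of 20 evaluations
```

Adding a third hyperparameter with five candidate values would multiply the grid to 100 evaluations, whereas the random-search budget can be kept fixed.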

Application of ML Methods in Slope Stability
As the global population continues to grow, mountainous regions are becoming increasingly densely populated. As a result, more and more infrastructure is being built in close proximity to slopes, which are highly susceptible to landslides. These landslides not only pose a significant risk to human life but also result in significant economic losses [31]. Although analytical and numerical methods have been developed and applied in recent years, most of them, such as the aforementioned LEM and FEM, focus on deterministic analysis. Due to geological processes such as weathering, transport, and stress history, geotechnical engineering materials are not homogeneous but exhibit spatial variability [121,122].
To improve the accuracy of slope stability analysis, random field theory is used to represent the spatial variability of geotechnical engineering materials [123]. In the early stage, brute-force MCSs were used to estimate slope stability. However, this method requires a large number of samples to estimate small-probability events. Therefore, more advanced methods, such as the response surface method, were proposed to approximate numerical simulations [19,124,125,126]. These methods essentially fit the relationship between independent and dependent variables, such as material parameters and the slope FOS, with polynomial functions, which can then be evaluated quickly for large numbers of samples [125,127]. Recently, machine learning methods have gradually been applied to slope stability problems because of their better performance in handling complex problems.
Figure 3 shows the distribution of slope stability problems analyzed in the 159 studies retrieved from the Web of Science database. Machine-learning-aided slope stability analysis mainly concerns two problems: prediction of the FOS and slope stability classification. Other problems include searching for the critical failure surface, predicting slope deformation, and establishing failure criteria.

Performance Evaluation Metrics in Regression Problems Using Machine Learning
Evaluation metrics are required to assess the performance of ML models. When machine learning models are used as surrogate models, all evaluation data points are commonly assumed to lie within the range of the training data, i.e., predictions are treated as interpolations. Common evaluation metrics for regression problems can be roughly divided into two categories: the first reflects the overall goodness of fit of the model, such as the coefficient of determination (R²), while the second, known as error parameters, directly expresses the difference between the predicted and observed values. The second category can be further divided into relative and absolute differences, such as the mean absolute percentage error (MAPE) and the root mean square error (RMSE), respectively.
The coefficient of determination R² is a statistical measure of the proportion of the variance in the dependent variable that can be explained by the independent variable(s) included in the model. In other words, it reflects the overall fit of the model. R² can be defined as:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(y_i - y_i^*\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$

where y_i represents the ith observed value, y_i^* represents the ith predicted value, ȳ represents the mean of the observed values, and n is the number of data points. R² increases as model performance improves; if the predicted values exactly match the observed values, R² = 1. However, R² can be misleading if used alone, because it does not directly indicate the magnitude of the individual prediction errors.
Three metrics are introduced for evaluating the absolute difference between the predicted values and the observed values: mean squared error (MSE), root mean square error (RMSE), and mean absolute error (MAE).
Mean squared error (MSE) is a widely used metric for measuring the performance of regression models. It is calculated as the average of the squared differences between the predicted and observed values; it is always non-negative and decreases as model performance improves. MSE can be defined as:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - y_i^*\right)^2$$

Root mean square error (RMSE) is the square root of the MSE. RMSE is often preferred over MSE because it has the same units as the output values. RMSE is always non-negative and decreases as model performance improves. RMSE can be defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - y_i^*\right)^2}$$

Mean absolute error (MAE) reflects the mean magnitude of the errors without considering their direction. RMSE and MAE have the same units, but RMSE is always greater than or equal to MAE because squaring the errors places greater emphasis on large errors. In contrast, MAE is linear in the absolute errors, so each error influences MAE in direct proportion to its magnitude, regardless of its sign [128]. MAE can be defined as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - y_i^* \right|$$

Mean absolute percentage error (MAPE) is a relative error measure between the predicted and observed values. As a dimensionless metric, MAPE allows the performance of different models on different datasets to be compared directly, and it is intuitive and easy to understand. Compared with RMSE, MAPE is less susceptible to outliers and is more robust. However, MAPE is not suitable when observed values are zero or close to zero, because MAPE then becomes undefined or tends to infinity. MAPE can be defined as:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - y_i^*}{y_i} \right|$$

Figure 4 illustrates how the coefficient of determination and the error parameters affect the evaluation of ML models. Several simple functions are chosen to represent observed and predicted values in four particular cases, which are presented in Figure 4:
(a) Low coefficient of determination and high error parameters. In this case, both the overall prediction and the individual predictions are unreliable, which may be caused by outliers, such as wrong predictions of boundary values. In addition, non-linear relationships, heteroscedasticity, high noise, and overfitting/underfitting can also lead to this situation.
(b) Low coefficient of determination and low error parameters. In this case, individual predictions are accurate, but the overall prediction is poor. One reason can be a low slope of the fitting function. By the definition of R², when the observed values vary little around their mean (e.g., when the slope of a linear fitting function is low), the denominator of R² is small, so even small prediction errors drive R² toward 0 (or even below 0). The model then fails to account for the variability in these data. Similarly, non-linear relationships, heteroscedasticity, high noise, and overfitting/underfitting can also lead to this situation.
(c) High coefficient of determination and high error parameters. In this case, the overall prediction is accurate, but the individual predictions are unreliable. This may be due to the use of a linear relationship to fit a non-linear function (an incorrect fitting relationship). Uncertainty in the data, heteroscedasticity, outliers, and overfitting/underfitting can also lead to this situation. For example, if the data have large uncertainties, the MSE may remain large due to the inherent uncertainty of the data, even if the model can explain some of the variance.
It is recommended to consider both the coefficient of determination and the error parameters together to accurately evaluate the performance of a model. Outliers and high noise are often eliminated during data processing; detailed information can be found in Shan's work [129]. Overfitting and underfitting are also common problems in machine learning. Data augmentation, regularization, cross-validation, and early stopping help reduce overfitting, while increasing model complexity and tuning hyperparameters can remedy underfitting.
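The metrics defined in this subsection can be computed with a few lines of NumPy. The sketch below (the function name and the FOS values are hypothetical) also reproduces the situation of case (b): two prediction sets with identical absolute errors of 0.02 give R² = 0 when the observed FOS values barely vary, but R² = 0.995 when the observed values span a wider range.

```python
import numpy as np


def regression_metrics(y_obs, y_pred):
    """R^2, MSE, RMSE, MAE and MAPE (%) as defined in this subsection."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_obs - y_pred
    mse = float(np.mean(err ** 2))
    return {
        "R2": 1.0 - float(np.sum(err ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined if any observed value is zero
        "MAPE": float(100.0 * np.mean(np.abs(err / y_obs))),
    }


# Case (b)-like situation: observed FOS values barely vary around their mean
m_flat = regression_metrics([1.18, 1.22, 1.18, 1.22], [1.20, 1.20, 1.20, 1.20])
# Same error magnitude (0.02), but the observed FOS values span a wider range
m_wide = regression_metrics([0.80, 1.00, 1.20, 1.40, 1.60], [0.82, 0.98, 1.22, 1.38, 1.62])
```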

Failure Probability of Slopes
In slope stability analysis, the safety margin of a slope is usually represented from a probabilistic perspective by its failure probability P_f. Slope stability can be described theoretically by a limit state function G(Ψ): failure occurs when G(Ψ) ≤ 0, where Ψ represents the material parameters, which are commonly regarded as random variables or random fields. The failure probability can be expressed as [130,131]:

$$P_f = P\left[G(\Psi) \le 0\right] = \int_{G(\Psi) \le 0} f(\Psi)\, d\Psi \qquad (22)$$

where f(Ψ) represents the joint probability density function of Ψ. In slope stability analysis, an FOS smaller than 1 usually represents an unstable state, so G(Ψ) can be expressed as:

$$G(\Psi) = \mathrm{FOS}(\Psi) - 1$$

However, the failure probability generally cannot be evaluated directly from the multiple integral in Equation (22), because f(Ψ) is usually unavailable due to the spatial variability of materials in geotechnical problems. Therefore, approximation or simulation methods are needed to evaluate the integral. MCS is often used to approximate the failure probability because it is easy to understand and implement. The failure probability can be obtained using:

$$P_f \approx \frac{1}{N} \sum_{i=1}^{N} \left[\mathrm{FOS}_i < 1\right]$$

where N is the number of realizations and FOS_i represents the ith realization of the FOS. [FOS_i < 1] is the Iverson bracket used to judge the state of the slope: it equals 1 if FOS_i < 1 (failure) and 0 otherwise.
The reliability index β is used to evaluate system reliability without an exact probability distribution function [132]:

$$\beta = \frac{\mu_z}{\sigma_z} \qquad (25)$$

where μ_z = E[G(Ψ)] and σ_z = σ[G(Ψ)] are the mean and standard deviation of the performance function, respectively. β represents the distance between the mean value of G(Ψ) and 0 (the failure point), measured in standard deviations. G(Ψ) is assumed to be normally distributed in Equation (25), and under this assumption the relationship between the failure probability and the reliability index is [77]:

$$P_f = \Phi(-\beta) = 1 - \Phi(\beta)$$

where Φ is the standard normal cumulative distribution function.
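The sketch below estimates P_f by crude MCS with the Iverson-bracket estimator and compares it with the reliability-index approximation Φ(−β). As an assumption for illustration, a dry infinite-slope formula, FOS = c/(γz sin α cos α) + tan φ/tan α, stands in for an LEM/FEM model, the parameters are treated as random variables rather than random fields, and the input distributions and slope geometry are hypothetical. Because the resulting G(Ψ) is not exactly normal, the two estimates are expected to be close but not identical.

```python
import math

import numpy as np


def infinite_slope_fos(c, phi_deg, gamma_z=57.0, slope_deg=35.0):
    """Hypothetical stand-in model: FOS of a dry infinite slope,
    FOS = c / (gamma*z*sin(a)*cos(a)) + tan(phi) / tan(a), with gamma*z in kPa."""
    a = math.radians(slope_deg)
    return c / (gamma_z * math.sin(a) * math.cos(a)) + np.tan(np.radians(phi_deg)) / math.tan(a)


def std_normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = np.random.default_rng(1)
n = 200_000
# Hypothetical input distributions: lognormal cohesion (kPa), normal friction angle (deg)
c = rng.lognormal(mean=math.log(8.0), sigma=0.3, size=n)
phi = rng.normal(loc=30.0, scale=3.0, size=n)

fos = infinite_slope_fos(c, phi)
g = fos - 1.0                                 # performance function G = FOS - 1
pf_mcs = float(np.mean(fos < 1.0))            # crude MCS: fraction of realizations with FOS < 1
beta = float(g.mean() / g.std(ddof=1))        # reliability index beta = mu_z / sigma_z
pf_from_beta = std_normal_cdf(-beta)          # P_f = Phi(-beta), exact only if G is normal
```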
In slope stability analysis, uncertainty arises from many factors, including the nature of the geological material, soil parameters, the water table, and seismic intensity. Because the measurement and estimation of these parameters involve errors and uncertainties, the calculated slope stability is affected by these uncertainties as well, which makes the stability itself uncertain. MCSs are widely used in stochastic analysis to estimate the propagation of uncertainty. With the development of computer science, a large number of machine learning methods (such as random forests and neural networks) have been proposed and applied to build surrogate models that replace time-consuming numerical evaluations. These algorithms directly fit the mathematical relationship between input and output, reducing the need for complex and time-consuming calculations and directly providing the system response. For slope stability analysis, using machine learning surrogate models to evaluate the FOS can yield highly accurate results while greatly reducing the computational effort.

Prediction of the Factor of Safety
A total of 159 slope stability studies published in the last two decades were collected from the Web of Science, 89 of which focused on the FOS. This kind of machine learning-aided slope stability method is closely related to so-called surrogate model methods. The key is to use machine learning to establish high-dimensional nonlinear functions between input and output variables. These nonlinear functions can then produce a large number of predictions for reliability analysis with high efficiency and accuracy.
Taking the research of [31] as an example, a typical machine learning-aided slope stability analysis is presented in Figure 5:
(1) Data generation: A large number of samples of random variables or random fields are generated to represent the uncertainty and spatial variability of the soil. The Karhunen-Loève expansion is a common method for generating random fields [133,134].
(2) Data collection and preprocessing: The random variables or random fields are mapped into numerical models, such as the LEM and FEM, and the associated FOS is calculated with these models. All simulation results should be checked to ensure that the correct FOS is obtained.
(3) Model selection and training: One or more appropriate machine learning models are chosen based on the nature of the problem. The selected models are trained and validated with the random variables or random fields as input data and the FOS as output data. The trained models represent the nonlinear relationship between the samples and the FOS. The conceptual function captured with machine learning is expressed as:

$$\mathrm{FOS} = f(x_1, x_2, \ldots, x_n) \qquad (27)$$

where x_i represents the ith input variable, such as the cohesion or friction angle in the Mohr-Coulomb model.
(4) Model validation and tuning: The trained models are evaluated on a validation or test dataset to assess their performance. Model parameters may be adjusted, or alternative models explored, as necessary to obtain the optimal model or a model that closely approaches optimality.
(5) Model deployment: Using the well-trained models, a large number of FOS values can be predicted quickly, and the failure probability can be counted from the predicted results.
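The five steps above can be sketched end to end in a few lines of NumPy. Several assumptions are made for illustration only: a cheap dry infinite-slope formula (the hypothetical `expensive_fos`) stands in for the LEM/FEM runs of step (2), the inputs are two random variables rather than discretized random fields, and a quadratic least-squares model stands in for the ML surrogate of step (3); any regressor could be substituted.

```python
import numpy as np

rng = np.random.default_rng(7)


def expensive_fos(c, phi_deg):
    """Hypothetical stand-in for an LEM/FEM run: dry infinite slope, 35 deg, gamma*z = 57 kPa."""
    a = np.radians(35.0)
    return c / (57.0 * np.sin(a) * np.cos(a)) + np.tan(np.radians(phi_deg)) / np.tan(a)


def sample_inputs(n):
    # Step (1): cohesion (kPa) and friction angle (deg) with assumed distributions
    return rng.lognormal(np.log(8.0), 0.3, n), rng.normal(30.0, 3.0, n)


def features(c, phi):
    # Quadratic surrogate features: 1, c, phi, c^2, phi^2, c*phi
    return np.column_stack([np.ones_like(c), c, phi, c ** 2, phi ** 2, c * phi])


# Step (2): a small number of "expensive" evaluations for training and testing
c_tr, phi_tr = sample_inputs(150)
c_te, phi_te = sample_inputs(50)
fos_tr, fos_te = expensive_fos(c_tr, phi_tr), expensive_fos(c_te, phi_te)

# Step (3): train the surrogate by least squares
w, *_ = np.linalg.lstsq(features(c_tr, phi_tr), fos_tr, rcond=None)

# Step (4): validate on held-out samples with R^2
pred_te = features(c_te, phi_te) @ w
r2 = 1.0 - np.sum((fos_te - pred_te) ** 2) / np.sum((fos_te - fos_te.mean()) ** 2)

# Step (5): deploy the surrogate on a large Monte Carlo sample to estimate P_f
c_mc, phi_mc = sample_inputs(100_000)
pf_surrogate = float(np.mean(features(c_mc, phi_mc) @ w < 1.0))
# Reference value, affordable here only because the stand-in model is cheap
pf_reference = float(np.mean(expensive_fos(c_mc, phi_mc) < 1.0))
```

In a real application, step (2) dominates the cost, which is why only the small training and test sets are evaluated with the numerical model.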
According to step (3) above, the implicit Equation (27) captured with machine learning is also an application of the response surface method (RSM). The RSM has been proven to be an efficient tool for slope stability analysis [127]. The classic RSM uses polynomial regression (usually quadratic regression) to approximate the actual response, and it may fail to detect the most critical slip surface when multiple failure surfaces are present. An RSM based on machine learning is more powerful when dealing with complex, nonlinear responses or large amounts of data and can perform a global search to find the most dangerous failure surface. For example, a Kriging-based RSM was proposed to study the system reliability of slopes [135]. In that study, the undrained shear strength of the soil was the input, and the method performs a global approximation that allows a more accurate assessment of the system reliability of soil slopes. Even with the same calibration samples, second-order polynomial-based RSMs were less accurate than Kriging-based RSMs.
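The contrast between a quadratic RSM and a Kriging-based RSM can be sketched on a hypothetical system response. Here the system FOS is taken as the minimum of two linear "slip-surface" FOS functions of a standardized input u (a kink at u = 0.75 mimics a switch of the critical surface). The Kriging model is a simple Kriging predictor with an exponential correlation function, a fixed correlation length θ = 2 and the sample mean as the trend; unlike typical Kriging RSM implementations, no hyperparameters are estimated. All names and values are assumptions for illustration.

```python
import numpy as np


def system_fos(u):
    """Hypothetical system FOS: minimum over two candidate slip surfaces (kink at u = 0.75)."""
    return np.minimum(1.2 + 0.25 * u, 1.5 - 0.15 * u)


def fit_quadratic_rsm(u, y):
    """Classic second-order polynomial response surface."""
    w, *_ = np.linalg.lstsq(np.vander(u, 3), y, rcond=None)
    return lambda v: np.vander(v, 3) @ w


def fit_simple_kriging(u, y, theta=2.0, nugget=1e-10):
    """Simple Kriging with exponential correlation R(d) = exp(-|d| / theta), theta fixed."""
    mean = y.mean()
    R = np.exp(-np.abs(u[:, None] - u[None, :]) / theta) + nugget * np.eye(len(u))
    weights = np.linalg.solve(R, y - mean)
    return lambda v: mean + np.exp(-np.abs(v[:, None] - u[None, :]) / theta) @ weights


u_train = np.linspace(-3.0, 3.0, 15)      # calibration samples (same for both models)
y_train = system_fos(u_train)
u_test = np.linspace(-3.0, 3.0, 601)
y_test = system_fos(u_test)


def rmse(model):
    return float(np.sqrt(np.mean((model(u_test) - y_test) ** 2)))


rmse_quadratic = rmse(fit_quadratic_rsm(u_train, y_train))
rmse_kriging = rmse(fit_simple_kriging(u_train, y_train))
```

With the same 15 calibration samples, the Kriging surrogate follows the kink closely, whereas the single quadratic cannot represent the change of critical surface.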
For the problem of predicting the FOS, ANNs were used in 45 studies, accounting for about 50% of the reviewed studies, followed by SVM, which was used in 20 studies. Table 1 lists information on the studies on the prediction of the FOS and gives the notations of the input variables. The rest of this section describes several studies of machine learning methods for predicting the FOS of slopes, including high-dimensional or highly nonlinear regression [136][137][138].
Kang et al. [138] proposed a ν-support vector machine (ν-SVM) method to build a surrogate model for predicting the FOS and evaluating the system failure probability. Mu'azu [61] combined the teaching-learning-based optimization (TLBO) method with two machine learning algorithms (ANNs and ANFISs) to predict the FOS of slopes; the results indicated that ANN-TLBO was the best model, with the lowest error. Ji et al. [139] trained surrogate models based on the least-squares support vector machine (LS-SVM) and accurately estimated the stability of spatially variable slopes. Ahangari Nanehkaran et al. [140] compared five different machine learning models for FOS prediction and evaluated their accuracy using a confusion matrix and an error table. Hsiao et al. [65] proposed a pre-trained model using an ANN and a CNN to directly estimate the safety factor and the trace of the slip surface, and then to quickly predict the probability of failure. Jiang et al. [141] trained a surrogate model with a gradient-boosting regression tree to predict the FOS under heavy rainfall. This model accurately evaluated the FOS of a bench slope under the rainfall intensity of the "20-year rainstorm recurrence period", and its error was smaller than that calculated using a numerical simulation analysis. Lin et al. [142] compared the FOS prediction ability of eleven machine learning methods and suggested that SVM, gradient boosting regression (GBR), and Bagging performed best among them. Meng et al. [136] used an ANN to evaluate the stability of 3D homogeneous dry slopes with different slope surface shapes and developed a graphical user interface (SlopeLab) based on the pre-trained ANN models. Suman et al. [143] studied both FOS prediction and slope stability classification with three machine learning methods: functional networks (FNs), MARS, and multigene genetic programming (MGGP). Two surrogate models, one for prediction and one for classification, were trained using the same input and output dataset, and the prediction model equations were provided. Their results indicated that MARS had better prediction capability than the FN and MGGP models.

Slope Stability Classification
Slope stability classification has also been studied extensively, with a total of 47 publications identified in our review using the Web of Science. Slope stability classification is significant in landslide disaster prevention and mitigation [10], because directly classifying the state of a slope gives an intuitive indication of its stability. Numerical simulations and analytical methods can generally quantify the state of a specific slope rationally and explicitly. However, when many slopes or complex situations must be assessed, performing a stability analysis of every slope would be time-consuming, computationally expensive, or even impossible in practice. Therefore, classifying slope stability based on known information is necessary.
Slope stability classification can be roughly divided into two categories. The first type classifies the state of real slopes [10,32]. Researchers obtain terrain conditions, stress history, environmental and meteorological factors, and other conditions that affect slope stability from site investigations and laboratory measurements, or from the literature and databases [49,52,147]. These conditions and the corresponding stability classes are used as the input and output, respectively, in the ML training process. The resulting models can accurately predict slope stability under specific conditions.
The second type classifies the stability of slopes using a numerical model [27,98], and the resulting classifiers are further used to accelerate stochastic analyses. The FOS obtained from numerical calculations or analytical solutions can be used to judge slope stability. Most of the reviewed studies performed a binary classification of slope stability [10,52]. FOS = 1 is often defined as the critical state between stability and failure, so FOS > 1 indicates a stable slope and FOS < 1 represents failure. Accordingly, some studies directly define the output as stable or unstable. However, some studies reported that slopes may be in intermediate states, or may be unstable when the FOS is greater than 1 and stable when it is less than 1 [49], so a more detailed classification of slope states is needed. In addition, a binary classification method has been proposed to judge directly whether the FOS is greater or less than 1 without calculating the FOS accurately [26].
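For the second type of classification, labels can be derived directly from computed FOS values and a classifier can then be scored with a confusion matrix. In the sketch below, the FOS values for six slopes, both from the "numerical model" and from a surrogate, are hypothetical, and assigning the critical case FOS = 1 to the unstable class is an assumption. The two misclassified slopes are exactly those with FOS close to 1.

```python
import numpy as np


def stability_labels(fos):
    """Binary stability classes: 1 = stable (FOS > 1), 0 = unstable (FOS <= 1)."""
    return (np.asarray(fos, dtype=float) > 1.0).astype(int)


def confusion_matrix(y_true, y_pred):
    """2 x 2 matrix [[TN, FP], [FN, TP]], treating class 1 (stable) as positive."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


# Hypothetical FOS values from a numerical model and from a surrogate for six slopes
fos_numerical = [0.85, 0.97, 1.02, 1.10, 1.35, 0.99]
fos_surrogate = [0.88, 1.01, 0.98, 1.12, 1.30, 0.95]

y_true = stability_labels(fos_numerical)
y_pred = stability_labels(fos_surrogate)
cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()
```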
A typical machine learning-aided slope stability classification follows a process similar to that of FOS prediction, except that the outputs are slope stability classes. Among the studies on slope stability classification, the most common machine learning algorithms were ANNs, which were used in twenty-two studies, followed by SVM in eighteen studies.
Table 2 presents information on the studies on slope stability classification, some of which are introduced below. Zhang et al. [10] used an ensemble learning-based method combining RF and XGBoost to classify slope stability and to explore the importance of 12 influencing variables. Zhu et al. [32] proposed a CNN-based classification framework to categorize rock blocks based on the principles of block theory; the surrogate model can classify three types of rock blocks (key blocks, trapped blocks, and stable blocks) using high-resolution images. Yuan & Moayedi [52] used six metaheuristic algorithms to optimize the classification ability of machine learning methods. Their results showed that the metaheuristic algorithms improved the accuracy of the classification models by 2% to 27%, with the genetic algorithm performing best.
Ensemble learning is also popular in slope stability classification. Zhang et al. [148] built a margin distance minimization selective ensemble (MDMSE) method for slope stability classification, using four individual learners (k-NN, SVM, DT, and ANN). Compared with common single machine learning models and other ensemble models, MDMSE showed better generalization ability, higher recognition accuracy, and faster identification speed. Lin et al. [149] developed an ensemble learning model with eight individual learners for classification problems and conducted a parameter analysis with three types of slope parameters: material, geometry, and hydraulic parameters. Their results showed that material parameters were the most sensitive factors for slope stability, followed by geometry and hydraulic parameters. They therefore suggested improving the cohesion and internal friction angle of the geomaterials when treating landslides.

Future Research Perspective
The number of papers published in the past two decades shows that more and more machine learning methods, including several commonly used methods and some of the latest advanced methods (such as CNNs), are being applied to solve slope stability problems. In addition, machine learning has greatly promoted the implementation of reliability analysis in geotechnical engineering practice. However, several challenges remain in the current literature:
(1) The spatial variation in shear strength parameters and hydraulic parameters is commonly considered in slope reliability and risk analysis. However, research on the hydro-mechanical coupling between these parameters is still limited. As a result, the impact of rainfall or the groundwater table on slope stability is often not fully accounted for, even though both hydraulic and shear strength parameters can have a significant influence.
(2) Previous studies have mainly focused on simplified models, and realistic geotechnical engineering case applications are lacking. Moreover, most studies were limited to 2D slope simulations. Although 2D slope stability analysis has been extensively investigated, it neglects the effect of 3D spatial variation, which is a crucial factor in real-world applications. Despite the technical challenge posed by the larger computational effort, several studies have demonstrated that 3D slope stability results can differ significantly from those obtained using 2D models [136]. Therefore, future research should focus on analyzing existing 3D slope stability data using ML to provide more accurate and reliable results.
(3) A site investigation often captures only one or two properties of material spatial variability, which can make it challenging to accurately estimate fluctuation scales and autocorrelations. The generation of random fields currently relies on selecting a theoretical autocorrelation function, which may not accurately represent the true fluctuation scale and autocorrelation of soil parameters. In the future, machine learning could potentially be used in field surveys to capture more realistic estimates of these parameters, leading to improved reliability in slope stability analysis.
(4) Owing to the ease of establishing and connecting databases, as well as advances in monitoring technologies, more comprehensive and detailed input data, including environmental factors, meteorological information, and hydrogeological conditions, are expected to become available for the slope stability classification problem. This will lead to more precise predictions of slope conditions.

Conclusions
This paper reviews the application of ML to slope stability analysis problems over the past two decades. Slope stability has always been an important branch of geotechnical engineering. The limit equilibrium method and the finite element method are commonly used to calculate the FOS, and random field theory is used to simulate the spatial variability of soil.
Early use of the brute-force Monte Carlo method for reliability analysis was time-consuming and inefficient. With the development of machine learning techniques, a large number of applications have contributed to the assessment of slope stability. Some common machine learning algorithms are introduced in Section 3. ANNs and SVM occupy a mainstream position in the reviewed studies, but various advanced methods, such as ANFIS, are gradually emerging (Figure 2). Each machine learning algorithm has advantages and disadvantages. The selection of a machine learning algorithm should not blindly pursue 'complexity' or an 'advanced' method but should suit the research problem.
Slope stability problems can be broadly categorized into two groups: those predicting the FOS and those classifying slope stability. In addition, a few studies focused on searching for the most critical sliding surfaces, designing failure criteria, and other related issues. Because machine learning is a "black box" technology, evaluation metrics must be selected judiciously when studying these problems. The coefficient of determination characterizes the overall fitting level of the model, while error parameters represent the individual prediction performance of the model. Ignoring either of them may lead to a biased evaluation of the model. Therefore, it is recommended to use both the coefficient of determination and error parameters to ensure a reasonable assessment of a model's performance.
While machine learning has been applied extensively to slope stability problems, several issues require further attention in future research. Hydraulic and shear strength parameters have a significant influence on slope stability, and hydro-mechanical coupling remains a challenge in slope stability analysis. Given the impact of 3D spatial variability on slope stability, machine learning-assisted 3D slope stability analysis is an important topic for geotechnical engineers and researchers. To enhance the reliability of slope stability analysis, it is also worth exploring how machine learning methods can capture more accurate scales of fluctuation and autocorrelation structures. Additionally, proper use of monitoring data and databases can improve the performance of machine learning models.
In general, machine learning has been widely used in slope stability analysis and shows significant potential. With the availability of various open-source machine learning libraries, it has become increasingly accessible to researchers. In the future, machine learning is expected to contribute to more accurate analysis of slope stability problems and to help prevent landslide disasters.

Figure 1. Proportion of different methods used in slope stability analysis (LEM: limit equilibrium method; FEM: finite element method).

Figure 2. Annual number of studies on slope stability using ML; the dashed line is the trendline.

Figure 3. Proportion of slope stability problem types in the studies included in this review.

Modelling 2023, 4, FOR PEER REVIEW 13

Overfitting/underfitting can also lead to this situation. For example, if the data have large uncertainties, the MSE may remain large owing to the inherent uncertainty of the data, even when the model explains some of the variance. (d) High coefficient of determination and low error parameters: in this case, both the overall prediction and the individual predictions are good.

Figure 4. The importance of the coefficient of determination and error parameters: (a) low coefficient of determination and high error parameters; (b) low coefficient of determination and low error parameters; (c) high coefficient of determination and high error parameters; (d) high coefficient of determination and low error parameters. Cases (a)-(c) indicate poor performance of ML models in either overall or individual predictions, while case (d) shows high accuracy in both. Therefore, relying on only one type of metric can lead to a biased evaluation of an ML model, and it is recommended to consider the coefficient of determination and error parameters together.

Outliers and high noise are often eliminated during data processing; detailed information can be found in Shan's work [129]. Overfitting and underfitting are also common problems in machine learning. Data augmentation, regularization, cross-validation, and early stopping help reduce overfitting, while increasing model complexity and tuning hyperparameters can address underfitting.
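As a sketch of two of the overfitting countermeasures listed above used together, the snippet below combines k-fold cross-validation with L2 (ridge) regularization: each candidate penalty is scored by its average validation RMSE across folds, and the penalty with the lowest score is selected. Ridge regression on synthetic data (40 samples, 15 candidate inputs of which only two are informative) is an assumption chosen to keep the example small; the helper names `ridge_fit` and `kfold_rmse` are hypothetical. The same loop applies to tuning ANN or SVM hyperparameters; data augmentation and early stopping are not shown.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept is not penalised."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def kfold_rmse(X, y, lam, k=5, seed=0):
    """Average validation RMSE over k folds for penalty lam."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, b = ridge_fit(X[train], y[train], lam)
        resid = y[val] - (X[val] @ w + b)
        errors.append(np.sqrt(np.mean(resid**2)))
    return float(np.mean(errors))

# Synthetic data: 40 samples, 15 candidate inputs, only the first two matter
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 15))
y = 1.2 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + 0.1 * rng.standard_normal(40)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: kfold_rmse(X, y, lam) for lam in lambdas}
best_lam = min(cv_scores, key=cv_scores.get)
print(cv_scores, "best:", best_lam)
```

A very large penalty underfits (the coefficients are shrunk toward zero), while no penalty lets the 13 uninformative inputs add variance; cross-validation scores both effects on data the model has not seen.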

Figure 5. A typical process of machine learning-aided slope stability analysis (adapted from [31]): (a) traditional statistical reliability analysis; (b) machine learning-aided statistical reliability analysis.

Table 1. Studies of slope stability concerning the prediction of FOS with ML.

Table 2. Studies of slope stability analysis concerning classification with ML (outputs are slope state classifications).