1. Introduction
Slope stability assessment is a fundamental task in geotechnical engineering as it underpins landslide hazard evaluation, slope design, and engineering risk management [1]. For soil slopes and embankments, circular sliding is one of the most common and representative failure modes. Slope stability is commonly quantified by the factor of safety (Fs), which depends on the combined effects of slope geometry, soil strength, and hydraulic conditions. In practice, variables such as unit weight, slope height, pore pressure ratio, cohesion, internal friction angle, and slope angle interact in a highly nonlinear manner, which makes reliable Fs prediction challenging.
To address this problem, machine-learning methods have been increasingly applied to slope stability prediction. Early studies showed that neural networks and other data-driven models can estimate slope safety from geotechnical input variables [2,3,4], and subsequent work extended this comparison to a broader range of machine-learning paradigms for slope stability prediction [5,6,7,8,9]. Collectively, these studies indicate that data-driven methods can effectively capture the nonlinear relationship between slope parameters and Fs. Support-vector-based methods have attracted particular attention because they are well-suited to nonlinear regression problems with limited samples, which is a common characteristic of slope-stability case databases.
Several studies have already explored SVR-based approaches to slope stability prediction. Xue [10] developed a PSO-LSSVM model for slope stability prediction, showing that support-vector-based models can achieve competitive predictive performance. Sari et al. [11] applied SVR directly to factor-of-safety prediction and demonstrated its feasibility for this task. Wei et al. [12] further examined hybrid SVR-based models and also reported favourable results. Together, these studies established the applicability of SVR to slope stability prediction. However, they focused primarily on predictive accuracy (that is, on whether SVR or its hybrid variants could outperform competing models) rather than on the reliability of the reported SVR performance under small-sample conditions.
This distinction matters because slope-stability datasets derived from published case histories are usually limited in size. Under such conditions, model development based on a single train-test split or ordinary k-fold cross-validation can readily produce optimistic estimates if hyperparameter tuning and final performance evaluation are not strictly separated [13]. Consequently, apparently high predictive accuracy does not necessarily reflect reliable out-of-sample performance. From this perspective, the key gap in existing SVR-based slope-stability studies is the lack of a rigorous framework for evaluating the reliability of SVR in data-scarce settings. In other words, the central issue is not whether SVR can fit the available data well, but whether it can deliver stable, reproducible, and trustworthy generalization performance.
To address this gap, this study develops a small-sample-oriented, leakage-aware SVR framework for predicting the factor of safety of circular-failure soil slopes. Based on a database of 80 published cases [14,15], the proposed model uses six input variables: soil unit weight, slope height, pore pressure ratio, cohesion, internal friction angle, and slope angle. To reduce optimistic bias, preprocessing, hyperparameter tuning, and model evaluation are all embedded within a repeated nested cross-validation procedure, in which the inner loop is used for model selection and the outer loop is reserved for out-of-sample assessment. The proposed SVR framework is further benchmarked against back-propagation neural networks (BPNN) and radial basis function neural networks (RBFNN) under identical validation partitions and evaluation settings. The main contribution of this work is to shift the emphasis from accuracy-oriented SVR application to reliability-oriented SVR evaluation, thereby providing a more robust and engineering-relevant framework for preliminary slope stability assessment under limited-data conditions.
3. Construction of Slope Stability Prediction Model Based on SVR
3.1. Problem Definition and Dataset
The objective of this study is to construct a regression framework for predicting Fs of circular-failure soil slopes. The modelling dataset consisted of 80 circular-failure slope cases compiled from published sources. The principal references used for data extraction and cross-checking were Feng [14] and Ma et al. [15]. The cases were reorganized into a unified dataset containing soil unit weight (γ), slope height (H), pore pressure ratio (r_u), cohesion (c), internal friction angle (φ), slope angle (β), and Fs.
As illustrated in Figure 2, soil unit weight, cohesion, and internal friction angle characterize the fundamental strength properties of the slope material, whereas slope height and slope angle describe the geometric conditions of the slope. The pore pressure ratio reflects the hydraulic effect on effective stress and therefore directly influences slope stability.
From a geotechnical perspective, each input variable governs a distinct aspect of circular-failure slope stability. Soil unit weight (γ) acts as the principal driver of gravitational stress and therefore increases the destabilizing shear force, although its net influence on Fs is moderated by the simultaneous increase in normal stress mobilized on the slip surface. Slope height (H) directly amplifies the gravitational driving force and, for a fixed geometry, tends to reduce Fs as H increases, because the cohesive contribution to resistance scales with c/(γH). The pore-pressure ratio (r_u) reduces the effective normal stress on the failure surface and consequently diminishes the mobilized frictional resistance; even modest increases in r_u can produce substantial reductions in Fs under otherwise identical conditions. Effective cohesion (c) provides the stress-independent component of shear strength and exerts a clear positive influence on Fs, particularly in shallow slopes and short slip surfaces where the cohesive contribution dominates. The internal friction angle (φ) controls the stress-dependent component of shear strength; larger φ improves Fs primarily on deeper or longer slip surfaces where the normal stress is large. Slope angle (β) defines the geometry of the slip mass and exerts the most direct geometric control on Fs: steeper slopes simultaneously reduce the available resisting moment and increase the driving moment, producing a strongly nonlinear reduction in stability. The six predictors therefore jointly describe strength (γ, c, φ), geometry (H, β), and pore pressure (r_u), the three classical components of limit-equilibrium slope-stability analysis, and their combined nonlinear interaction motivates the use of a kernel-based regression framework.
The compiled dataset was used for model development, methodological comparison, and performance evaluation under small-sample conditions.
Appendix A provides the full compiled dataset used in the present study. Distributions of the six input variables are shown in Figure 3.
As shown in Figure 3, the six predictors span a wide range characteristic of published circular-failure case histories. Soil unit weight (γ) is concentrated between 18 and 31 kN/m³, reflecting the typical range of cohesive and granular soils encountered in slope-stability studies. Slope height (H) is strongly right-skewed, ranging from 3.66 m for small embankments to 511 m for large hydropower-related slopes; this asymmetry reflects the over-representation of medium-height cases in the published literature. Effective cohesion (c) shows the widest relative variability (0–150 kPa, std 29.5 kPa), capturing both nearly cohesionless granular materials and stiff cohesive soils. The internal friction angle (φ) is concentrated around 30–40°, consistent with typical geotechnical materials. Slope angle (β) ranges from 8° to 53° with a bimodal pattern reflecting both gentle natural slopes and steep engineered cuts. The compiled database (N = 80) was assembled from two recognized geotechnical compilations [14,15] that together cover a representative cross-section of soil, rock-fill, and weathered-rock slopes; although it does not exhaustively sample all geological settings, the input ranges in Figure 3 substantially overlap those encountered in standard preliminary slope-stability assessment, and the framework can be re-trained or extended whenever a domain-specific extension is required. Two limitations of this database should nevertheless be acknowledged. First, the cases are drawn from only two published sources and may therefore inherit selection biases inherent to those sources (e.g., a preference for well-documented failures over routine stable slopes). Second, the pore-pressure ratio (r_u) is markedly concentrated between 0.25 and 0.35 (mean 0.31, std 0.07), with only a small fraction of cases outside this band; consequently, predictions for slopes under markedly different hydraulic conditions (e.g., fully saturated or strongly drained slopes) should be interpreted with caution, as discussed further in Section 4.5.
3.2. Data Inspection and Preprocessing
Before model development, the dataset was subjected to a data-quality inspection, including checks for missing values, abnormal records, variable ranges, and basic descriptive statistics. Missing values, if any, were handled only within the training data in each validation cycle so as to avoid information leakage from the test data. Outlier inspection was first performed by examining the physical plausibility of each record and cross-checking the original data source. Statistical outlier screening was used only as an auxiliary procedure, and no sample was removed solely on the basis of model residuals.
All continuous variables were standardized using statistics derived exclusively from the training folds. For a given variable x, the standardized value z was obtained as
$$z = \frac{x - \mu}{\sigma}$$
where $\mu$ and $\sigma$ denote the mean and standard deviation calculated from the corresponding training fold. The same transformation was then applied to the associated validation or test fold. This fold-wise preprocessing strategy ensured that the reported prediction performance reflected genuine generalization rather than inadvertent reuse of information from held-out data.
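The fold-wise standardization described above can be sketched as follows. This is a minimal illustration, not the study's MATLAB implementation: the function names `fit_scaler` and `apply_scaler` and the toy numbers are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

def fit_scaler(train_rows):
    """Compute per-column (mean, std) from the training rows only."""
    stats = []
    for column in zip(*train_rows):
        mu = mean(column)
        sigma = stdev(column)  # assumption: sample standard deviation
        stats.append((mu, sigma if sigma > 0 else 1.0))  # guard constant columns
    return stats

def apply_scaler(rows, stats):
    """Apply z = (x - mu) / sigma using statistics fitted elsewhere."""
    return [[(x - mu) / sigma for x, (mu, sigma) in zip(row, stats)]
            for row in rows]

# Toy two-column example (hypothetical values, not from the 80-case database).
train = [[18.0, 10.0], [20.0, 30.0], [22.0, 50.0]]
test = [[24.0, 70.0]]

stats = fit_scaler(train)             # fitted on the training fold only
z_train = apply_scaler(train, stats)
z_test = apply_scaler(test, stats)    # held-out fold reuses the training statistics
print(z_test)
```

The point of the split into two functions is that the held-out fold never contributes to the mean or standard deviation, which is what prevents leakage.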
3.3. Correlation Analysis and Variable Relevance
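To gain an initial understanding of the dataset, pairwise Pearson correlation coefficients and scatter plots were used to examine the linear associations among the input variables. The Pearson correlation coefficient r is defined as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are the observed values of two variables, and $\bar{x}$ and $\bar{y}$ are their corresponding sample means.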
Figure 4 presents the pairwise scatter distributions together with the Pearson correlation matrix of the six predictors.
It should be noted that a low pairwise correlation does not imply statistical independence. Therefore, the correlation analysis in this study was used only for descriptive exploration and preliminary screening of variable relationships, rather than as a formal test of independence.
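The caveat that a low Pearson coefficient does not imply independence can be made concrete with a short sketch. The function name `pearson_r` and the toy vectors are hypothetical; the second example uses a purely quadratic relationship, which is fully dependent yet has zero linear correlation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Perfect linear association.
r_linear = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Deterministic but nonlinear (y = x^2 on a symmetric range): r is zero.
r_quadratic = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_linear, r_quadratic)
```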
Quantitative inspection of
Figure 4 reveals several patterns useful for interpreting the modelling results that follow. Among the six predictors, the strongest positive Pearson correlations are observed between
and
H (
r = +0.71),
and
(
r = +0.54),
and
φ (
r = +0.53), and
φ and
β (
r = +0.55). These reflect the over-representation in the database of dense, deep, and high-strength rock-fill slopes that combine high unit weight, large height, and elevated strength parameters. Moderate correlations between
and
H (
r = +0.45) and between
and
β (
r = +0.33) likewise stem from sampling characteristics rather than from any inherent mechanical coupling. The pore-pressure ratio
shows weak negative correlations with
(
r = −0.32),
H (
r = −0.31), and
(
r = −0.29), consistent with the tendency of lower-saturation cases to be reported for stiffer, deeper slopes. The Pearson coefficient, however, measures only linear association and is therefore not fully aligned with the strongly nonlinear nature of the slope-stability problem. To complement the linear analysis, the Spearman rank correlation and a histogram-based normalized mutual-information (MI) matrix were additionally computed on the same 80-case data. The Spearman coefficients confirm the patterns observed under Pearson (
–
H: ρ = +0.78;
φ–
β: ρ = +0.55;
–
: ρ = −0.42), indicating that the dependencies are monotonic and not driven by extreme observations. The MI matrix further identifies
as carrying the highest mutual information with Fs (I = 0.40), followed by
H (I = 0.30),
β (I = 0.26),
(I = 0.26),
(I = 0.25), and
φ (I = 0.20). This ordering reveals that
and
H are the most informative predictors under nonlinear dependence—a finding that is partially masked when only the weaker Pearson correlations between
/
H and Fs are considered. These observations motivate the kernel-based regression framework adopted in
Section 3.4, which can capture such nonlinear input-output dependencies without prescribing a parametric form.
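As a rough illustration of why a rank-based coefficient complements Pearson, the sketch below computes Spearman's ρ as the Pearson correlation of average ranks. The helper names and the toy data are hypothetical, and the tie handling (average ranks) is an assumption, since the text does not describe how ties were treated.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Monotonic but strongly nonlinear toy relationship (y = x^3).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)   # exactly monotonic, so the ranks agree perfectly
r = pearson_r(x, y)        # linear coefficient is lower than rho
tied = average_ranks([10, 20, 20, 30])
print(rho, r, tied)
```

A monotonic nonlinear dependence yields a Spearman coefficient of one while the Pearson coefficient stays below one, which is the kind of discrepancy the rank-based check is meant to expose.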
3.4. Model Development
SVR was adopted as the core regression model in this study. Unlike classification-oriented support vector machines, SVR seeks to determine a regression function that deviates from the observed target values by no more than a predefined ε-insensitive margin while maintaining model flatness. By introducing slack variables and a penalty parameter C, SVR balances model complexity against empirical fitting error. To capture nonlinear relationships between the geotechnical variables and Fs, the RBF kernel was adopted.
Three hyperparameters govern the behaviour of the RBF-SVR model: the penalty parameter C, the kernel parameter γ, and the ε-insensitive loss width ε. Parameter C controls the trade-off between model smoothness and fitting error, γ determines the influence range of individual samples in the transformed feature space, and ε defines the tolerance margin for regression errors. Owing to its strong nonlinear mapping capability and good small-sample generalization, the RBF-SVR model is well-suited to the present slope stability prediction problem.
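The roles of the kernel parameter γ and the loss width ε can be illustrated with two small functions. This is a sketch of the standard definitions only, not of the fitted model: the kernel is written in the form exp(−γ‖x − z‖²) used by LIBSVM, and the function names and numerical values are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Errors inside the epsilon tube cost nothing; outside, cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Similarity is 1 for identical (standardized) inputs and decays with distance;
# a larger gamma makes the decay faster, i.e. each sample has a shorter reach.
k_same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)
k_near = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
k_near_sharp = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=2.0)

# With epsilon = 0.1, an Fs error of 0.05 is ignored; an error of 0.30 costs 0.20.
loss_inside = eps_insensitive_loss(1.20, 1.25, epsilon=0.1)
loss_outside = eps_insensitive_loss(1.20, 1.50, epsilon=0.1)

print(k_same, k_near, k_near_sharp, loss_inside, loss_outside)
```

The penalty parameter then weights the summed out-of-tube losses against the flatness term in the SVR objective.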
3.5. Hyperparameter Optimization and Reproducible Validation
To improve robustness and reproducibility, hyperparameter tuning and model evaluation were conducted within a nested cross-validation framework. The outer loop was used for unbiased performance estimation, whereas the inner loop was used for hyperparameter selection. Specifically, a repeated five-fold outer cross-validation design was adopted, and the random partitioning process was repeated ten times to reduce the influence of any single split on the final evaluation. All models and the validation framework were implemented in MATLAB R2018b+ LIBSVM 2.6.
Within each outer-loop training subset, an inner five-fold grid search was performed to determine the optimal hyperparameters of the RBF-SVR model. All preprocessing procedures, including standardization and optional data cleaning operations, were performed using training data only within each fold. After the optimal hyperparameters had been identified in the inner loop, the SVR model was retrained on the full outer-loop training subset and then evaluated on the corresponding outer-loop test subset. This procedure yielded repeated out-of-sample estimates of predictive performance and thereby reduced the optimism associated with a single random train–test split. Because repeated outer-loop results generate a large number of fold-wise predictions, a single representative hold-out split was additionally selected for graphical presentation and sample-wise error comparison in Section 4, whereas the repeated nested cross-validation framework remained the primary procedure for robust model development and validation. The overall workflow of model construction, hyperparameter tuning, and validation is summarized in Figure 5, and the main model settings and search ranges are listed in Table 1.
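The control flow of the repeated nested procedure (ten repeats of a five-fold outer loop, each with a five-fold inner grid search) can be sketched in a model-agnostic way. Everything named here is an assumption for illustration: `fit_and_score` is a hypothetical callable that fits preprocessing and the model on the first index list and returns an error on the second, and the toy "model" is a shrunken training mean rather than an SVR.

```python
import random

def kfold(indices, k, rng):
    """Shuffle a copy of the indices and deal them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv(n, fit_and_score, grid, repeats=10, k_outer=5, k_inner=5, seed=0):
    """Return one out-of-sample error per outer fold (repeats * k_outer values)."""
    rng = random.Random(seed)
    outer_errors = []
    for _ in range(repeats):
        for test_idx in kfold(range(n), k_outer, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            # Inner loop: model selection uses the outer-training data only.
            inner_folds = kfold(train_idx, k_inner, rng)
            best_params, best_error = None, float("inf")
            for params in grid:
                fold_errors = []
                for val_idx in inner_folds:
                    val_set = set(val_idx)
                    fit_idx = [i for i in train_idx if i not in val_set]
                    fold_errors.append(fit_and_score(fit_idx, val_idx, params))
                mean_error = sum(fold_errors) / len(fold_errors)
                if mean_error < best_error:
                    best_params, best_error = params, mean_error
            # Refit on the full outer-training subset, score on the untouched fold.
            outer_errors.append(fit_and_score(train_idx, test_idx, best_params))
    return outer_errors

# Toy target values and a toy "model" (hypothetical, not the slope database).
y = [1.0 + 0.01 * i for i in range(80)]

def toy_fit_and_score(fit_idx, eval_idx, params):
    level = params["shrink"] * sum(y[i] for i in fit_idx) / len(fit_idx)
    return sum(abs(y[i] - level) for i in eval_idx) / len(eval_idx)

errors = nested_cv(len(y), toy_fit_and_score, [{"shrink": 0.5}, {"shrink": 1.0}])
print(len(errors))  # one estimate per outer fold
```

With 80 cases, each outer test fold holds 16 cases and the run produces 50 outer-fold estimates, whose distribution rather than any single value is the quantity of interest.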
To illustrate the advantage of the nested CV mechanism over conventional validation protocols commonly used in small-sample geotechnical machine learning, a methodological comparison is provided in Table 2. Nested cross-validation separates hyperparameter tuning (inner loop) from performance evaluation (outer loop) to yield unbiased error estimates, which is essential for rigorous evaluation in small-sample scenarios.
3.6. Benchmark Models and Evaluation Metrics
Two benchmark models were considered: the back-propagation neural network (BPNN) and the radial basis function neural network (RBFNN). Both were implemented under the same validation protocol as the SVR model. All three models used the same input variables and outer-loop data partitions, so that the comparison reflected differences in modelling strategy rather than differences in data usage. Crucially, the two benchmark models also underwent identical inner-loop grid-search hyperparameter tuning within the nested cross-validation framework, ensuring that the reported performance ranking is not biased by differential tuning effort. The RBFNN grid covered n_centres ∈ {10, 15, 20, 25} (initialized by k-means clustering), Gaussian-RBF inverse-width γ ∈ {2⁻⁷, 2⁻⁵, 2⁻³, 2⁻¹}, and ridge regularization α ∈ {10⁻³, 10⁻², 10⁻¹} (48 candidates per fold); the BPNN grid covered n_hidden ∈ {8, 16, 24}, weight-decay α ∈ {10⁻⁴, 10⁻³, 10⁻²}, and Adam learning rate lr ∈ {0.01, 0.02} with 400 training epochs (18 candidates per fold). These configurations are reported in Table 1 for reproducibility, so that all three models are tuned to comparable rigour. The Wilcoxon signed-rank test of Section 3.8 should also be interpreted in this context: with only 20 paired observations from a single representative hold-out split, the test has limited statistical power and the reported p-values must be regarded as descriptive evidence rather than confirmatory inference. The 50-fold nested-CV results in Supplementary Material S1 provide the principal evidence on which the reliability claim is based.
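The candidate counts quoted for the two benchmark grids follow directly from the Cartesian product of the listed values, as the sketch below checks. The dictionary keys (`n_centres`, `gamma`, `alpha`, `n_hidden`, `lr`) are hypothetical labels chosen for illustration, not identifiers from the study's code.

```python
from itertools import product

# RBFNN grid: number of centres x inverse width x ridge regularization.
rbfnn_grid = [
    {"n_centres": n, "gamma": g, "alpha": a}
    for n, g, a in product(
        [10, 15, 20, 25],
        [2.0 ** p for p in (-7, -5, -3, -1)],
        [10.0 ** p for p in (-3, -2, -1)],
    )
]

# BPNN grid: hidden units x weight decay x Adam learning rate.
bpnn_grid = [
    {"n_hidden": h, "alpha": a, "lr": lr}
    for h, a, lr in product(
        [8, 16, 24],
        [10.0 ** p for p in (-4, -3, -2)],
        [0.01, 0.02],
    )
]

print(len(rbfnn_grid), len(bpnn_grid))
```

The counts are 4 × 4 × 3 = 48 for the RBFNN and 3 × 3 × 2 = 18 for the BPNN, matching the figures stated in the text.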
Model performance was assessed using four metrics: the coefficient of determination (R²), root mean square error (RMSE), mean absolute error (MAE), and mean relative error (MRE). In general, a larger R² and smaller RMSE, MAE, and MRE indicate better model performance. The repeated outer-loop evaluation served as the principal robustness check during model development, whereas a representative hold-out split was additionally used for graphical illustration and sample-wise comparison in Section 4.
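The four metrics can be written out explicitly as below. The text does not give their formulas, so the definitions used here are assumptions: R² is taken as 1 − SS_res/SS_tot and MRE as the mean of |y − ŷ|/|y|; the function name and the toy values are hypothetical.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MRE for paired observed/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,                 # assumed definition
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in residuals) / n,
        "mre": sum(abs(e) / abs(t) for e, t in zip(residuals, y_true)) / n,
    }

# Toy factor-of-safety values (hypothetical, not taken from the database).
observed = [1.0, 1.2, 1.5, 2.0]
predicted = [1.1, 1.2, 1.4, 1.9]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under this definition R² can be negative on a held-out fold when the model predicts worse than the fold mean, which is worth keeping in mind when averaging it across outer folds.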
3.7. Qualitative Engineering-Consistency Assessment
To enhance engineering credibility while remaining fully consistent with the available evidence, the fitted model was examined from the perspective of physically meaningful response tendencies rather than through formal post hoc interpretability tools [17]. The assessment therefore focused on whether the dominant response directions implied by the prediction results were broadly compatible with established geotechnical principles for circular-failure slopes, particularly the expected roles of cohesion, internal friction angle, pore pressure, slope angle, and slope height.
This engineering-consistency assessment was used as a qualitative credibility check rather than as a complete feature-attribution analysis. In other words, a model was regarded as more convincing when its predictive behaviour remained compatible with the mechanics of shear resistance and driving-force balance, rather than merely producing favourable numerical scores on a particular data split.
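One simple way to screen response directions, offered only as a sketch and not as the procedure used in this study, is to vary a single input around a base case and record whether the prediction rises or falls. The function `response_direction`, the base case, and in particular `toy_predict` are hypothetical: the latter is a crude stand-in for a fitted model and is not a validated stability formula.

```python
import math

def response_direction(predict, base_case, name, values):
    """Sweep one input over `values` and classify the predicted trend."""
    predictions = []
    for value in values:
        case = dict(base_case)
        case[name] = value
        predictions.append(predict(case))
    steps = [b - a for a, b in zip(predictions, predictions[1:])]
    if all(s == 0 for s in steps):
        return "flat"
    if all(s >= 0 for s in steps):
        return "increasing"
    if all(s <= 0 for s in steps):
        return "decreasing"
    return "mixed"

def toy_predict(case):
    """Hypothetical stand-in for a fitted model (illustration only)."""
    cohesive = case["c"] / (case["gamma"] * case["H"])
    frictional = (1.0 - case["ru"]) * math.tan(math.radians(case["phi"]))
    return (cohesive + frictional) / math.tan(math.radians(case["beta"]))

base = {"gamma": 20.0, "H": 10.0, "ru": 0.3, "c": 20.0, "phi": 30.0, "beta": 30.0}

trend_c = response_direction(toy_predict, base, "c", [10.0, 20.0, 30.0])
trend_ru = response_direction(toy_predict, base, "ru", [0.1, 0.3, 0.5])
trend_beta = response_direction(toy_predict, base, "beta", [20.0, 30.0, 40.0])
print(trend_c, trend_ru, trend_beta)
```

In practice `predict` would wrap the trained regressor together with its training-fold scaler, and a "mixed" result would prompt closer inspection rather than automatic rejection.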
3.8. Statistical Comparison Based on Paired Sample-Wise Errors
To determine whether the performance differences among competing models were statistically meaningful rather than incidental to the selected test samples, a non-parametric Wilcoxon signed-rank test was performed on paired prediction errors derived from the representative hold-out test subset reported in Section 4. Absolute error (AE) and relative error (RE) vectors for the 20 hold-out cases were used for pairwise comparison among SVR, RBFNN, and BPNN. This analysis was intended as a descriptive statistical comparison on a common illustrative split, rather than as the sole basis for model validation.
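The mechanics of the paired signed-rank comparison can be sketched as follows. This is an illustrative implementation under stated assumptions, not the routine used in the study: zero differences are dropped, tied magnitudes receive average ranks, and the two-sided p-value uses the normal approximation without tie or continuity correction (an exact small-sample p-value may differ). The error vectors are hypothetical.

```python
import math

def wilcoxon_signed_rank(errors_a, errors_b):
    """Return (W+, W-, approximate two-sided p) for paired error vectors."""
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation to the null distribution of min(W+, W-).
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    p_two_sided = min(1.0, math.erfc(abs(z) / math.sqrt(2)))
    return w_plus, w_minus, p_two_sided

# Hypothetical absolute errors of two models on the same six test cases.
ae_model_a = [0.05, 0.06, 0.04, 0.07, 0.03, 0.08]
ae_model_b = [0.10, 0.09, 0.11, 0.08, 0.12, 0.10]
w_plus, w_minus, p_value = wilcoxon_signed_rank(ae_model_a, ae_model_b)
print(w_plus, w_minus, p_value)
```

Because model A has the smaller error in every pair, all ranks fall on the negative side, which is the pattern a consistent case-wise advantage produces.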
Accordingly, the methodological framework established in this section comprises problem definition and dataset specification, leakage-aware preprocessing, rigorous model development, benchmark comparison, qualitative engineering-consistency assessment, and non-parametric statistical testing of paired prediction errors.
4. Results and Discussion
This section presents the prediction results obtained from a representative hold-out split and discusses the comparative behaviour of SVR, RBFNN, and BPNN for circular-failure slope stability prediction. The purpose of this representative split is to provide a case-level illustration of model performance, including fitted trends, sample-wise errors, and paired error differences. These results should be interpreted as complementary evidence to the repeated nested cross-validation framework described in Section 3, rather than as a substitute for the overall robustness evaluation.
4.1. Representative Prediction Performance of SVR
Figure 6 shows the prediction results of the SVR model on the representative training-test split. The training subset is used to illustrate the fitting behaviour of the model, whereas the hold-out test subset provides a direct visual assessment of its predictive response for unseen cases within the same partition.
For the training subset, the SVR model achieved an R² of 97.58% and an RMSE of 0.0518, indicating that the predicted safety factors closely matched the observed values. This result indicates that the nonlinear mapping between the six input variables and the factor of safety was effectively captured by the SVR model. More importantly, the model also maintained favourable predictive performance on the hold-out test subset, with an R² of 86.56%, RMSE of 0.07497, MAE of 0.0666, and MRE of 5.29%. The predicted trend remained consistent with the measured values, and no test case produced a relative error greater than 10%; the maximum relative error was 9.29%.
These results indicate that the SVR model achieved a good balance between fitting capability and out-of-sample prediction accuracy on the representative split. However, because the dataset is small, the hold-out result should not be interpreted in isolation. Its primary role is to provide a transparent and visual demonstration of model behaviour, whereas the reliability of the modelling strategy is primarily supported by the repeated nested validation design. The approximately 11-percentage-point gap between the training R² (97.58%) and the test R² (86.56%) on the representative split warrants an explicit comment. Such a gap is consistent with mild overfitting: the RBF-SVR with C = 8 and γ = 0.5 (the typical optimal setting selected by the inner-loop search) is sufficiently flexible to fit small local patterns of the 60 training cases more tightly than its true ability to generalize to unseen slopes. The smaller train–test gap observed under repeated nested CV, where the training R² averages 0.92 against an outer-test R² of 0.41 across 50 folds, further indicates that the larger 11-pp gap on this single split is partly stochastic and that any individual hold-out estimate should be interpreted alongside the full distribution rather than in isolation.
4.2. Comparative Performance of SVR, RBFNN, and BPNN
The RBFNN and BPNN models were evaluated using the same representative hold-out split as the SVR model. As shown in Figure 7 and Figure 8, both neural-network-based models reproduced the general trend of the data to some extent, but their predictive performance on the test subset was less stable than that of SVR.
For the RBFNN model, the training-set performance was acceptable, with an R² of 92.84% and an RMSE of 0.0891. However, its test-set performance declined markedly, with an R² of 70.15%, RMSE of 0.11838, MAE of 0.1085, and MRE of 8.88%. Eight test cases produced relative errors greater than 10%, and the largest relative error reached 15.70%. This indicates that the RBFNN model approximated the overall data trend but was more sensitive to local deviations in the hold-out subset.
The BPNN model exhibited a similar pattern. Its training-set performance was close to that of RBFNN, with an R² of 92.63% and an RMSE of 0.0906. On the test subset, the BPNN model achieved an R² of 71.05%, RMSE of 0.10955, MAE of 0.0978, and MRE of 7.91%. Six test cases yielded relative errors greater than 10%, and the maximum relative error reached 16.93%. Compared with SVR, BPNN exhibited larger local prediction errors and weaker consistency between predicted and observed safety factors.
The sample-wise error statistics in Table 3 further support this comparison. SVR generally produced smaller absolute and relative errors across the 20 hold-out test cases, whereas RBFNN and BPNN exhibited several cases with noticeably larger deviations. The summary metrics in Table 4 further confirm that SVR achieved the highest R² and the lowest RMSE, MAE, and MRE among the three models. Therefore, within this representative partition, SVR provided the most stable and accurate prediction of the factor of safety.
4.3. Pairwise Error Comparison and Statistical Evidence
To further examine whether the observed differences among the three models were reflected at the case level, Wilcoxon signed-rank tests were performed on the paired absolute and relative errors of the 20 hold-out test cases. A non-parametric test was chosen because the errors are paired by test case and the test does not require the error differences to be normally distributed.
As summarized in Table 5, SVR produced significantly lower absolute and relative errors than RBFNN and BPNN on the representative hold-out subset. The differences between SVR and RBFNN were significant for both AE and RE, and the same pattern was observed for the comparison between SVR and BPNN. In contrast, the difference between RBFNN and BPNN was not statistically significant at the 0.05 level.
These results provide additional evidence that the advantage of SVR was not limited to the overall performance metrics but was also reflected in paired case-wise errors. Nevertheless, the statistical test was conducted on a single representative hold-out split and should therefore be interpreted cautiously. Its purpose is to support the case-level comparison among models, whereas the broader assessment of generalization performance should rely on the repeated nested cross-validation framework.
4.4. Performance Rationale Under Small-Sample Nonlinear Slope-Stability Prediction
The favourable performance of SVR can be explained by the compatibility between its learning mechanism and the characteristics of the present slope-stability problem. The factor of safety of circular-failure soil slopes is controlled by nonlinear interactions among soil strength parameters, slope geometry, and pore-pressure-related stress reduction. These interactions are difficult to represent using a simple linear regression form, especially when the available data are limited.
The RBF-kernel-based SVR model is well suited to this type of problem because it can project the original input variables into a high-dimensional feature space and approximate nonlinear relationships without explicitly prescribing the functional form of the slope-stability equation. At the same time, the structural-risk-minimization principle and the regularization term in SVR help control model complexity. This is particularly important in small-sample geotechnical datasets, where excessive flexibility can lead to apparently good training performance but poor generalization to unseen cases.
The comparison with RBFNN and BPNN supports this interpretation. Although the two neural-network-based models fitted the training data reasonably well, their test-set performance deteriorated more substantially. This suggests that they were more vulnerable to local sample characteristics and partition-dependent fluctuations. In contrast, SVR maintained lower test errors and fewer large local deviations, indicating a more favourable balance between nonlinear fitting capacity and model restraint. Therefore, the advantage of SVR in this study should be understood not merely as higher numerical accuracy, but as stronger generalization behaviour under limited-data conditions.
This point is central to the contribution of the present work. For small-sample slope-stability prediction, a model with a slightly lower apparent training fit but stronger out-of-sample stability is more valuable than a highly flexible model that performs well only on selected partitions. The proposed SVR framework therefore shifts the emphasis from accuracy-oriented model application to reliability-oriented model evaluation.
4.5. Geotechnical Consistency and Applicability of the Proposed Framework
The prediction results are also consistent with the basic mechanical understanding of circular-failure slope stability. In general, increases in cohesion and internal friction angle are expected to improve shear resistance and increase the factor of safety. In contrast, increases in slope height, slope angle, and pore pressure ratio tend to reduce slope stability by increasing driving effects or reducing effective stress. The fact that the SVR model produced stable predictions within this mechanically meaningful input space strengthens confidence that the learned relationship was not merely a statistical artifact.
This geotechnical consistency is important because machine-learning-based slope-stability models should not be judged only by error metrics. In engineering applications, a useful predictive model should also behave in a way that is compatible with established slope-stability mechanisms. The present results suggest that the proposed SVR framework can capture the combined influence of strength, geometry, and pore-pressure variables in a manner that is broadly consistent with the expected response of circular-failure soil slopes.
However, the applicability of the proposed model should be clearly bounded. The framework is most suitable for rapid preliminary assessment, early-stage screening, and comparative evaluation of circular-failure soil slopes whose input variables fall within or close to the range of the training database. It should not be used as a direct replacement for detailed site investigation, limit-equilibrium analysis, numerical modelling, or design-code-based assessment. Its reliability may decrease when applied to slopes with markedly different geological conditions, non-circular or progressive failure mechanisms, strong spatial heterogeneity, complex three-dimensional geometry, or coupled seepage-deformation processes that are not represented in the present database.
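The requirement that inputs fall within or close to the range of the training database can be checked mechanically before a prediction is used. The sketch below is one possible screening helper, not part of the proposed framework: the function names, the `margin` parameter, and the three example cases are hypothetical placeholders rather than records from the 80-case database.

```python
def training_ranges(cases):
    """Return {variable: (min, max)} over a list of training-case dicts."""
    names = cases[0].keys()
    return {
        name: (min(c[name] for c in cases), max(c[name] for c in cases))
        for name in names
    }

def variables_outside_range(case, ranges, margin=0.0):
    """List the variables of `case` lying outside the training range.

    `margin` widens each range by that fraction of its span, to allow
    inputs that are close to, but not strictly inside, the training range.
    """
    flagged = []
    for name, (low, high) in ranges.items():
        slack = margin * (high - low)
        if not (low - slack <= case[name] <= high + slack):
            flagged.append(name)
    return flagged

# Hypothetical training cases (illustrative placeholders only).
training_cases = [
    {"H": 10.0, "beta": 20.0, "ru": 0.25},
    {"H": 50.0, "beta": 35.0, "ru": 0.30},
    {"H": 120.0, "beta": 45.0, "ru": 0.35},
]
ranges = training_ranges(training_cases)

inside = variables_outside_range({"H": 60.0, "beta": 30.0, "ru": 0.30}, ranges)
outside = variables_outside_range({"H": 300.0, "beta": 30.0, "ru": 0.05}, ranges)
print(inside, outside)
```

A per-variable range check does not detect unusual combinations of otherwise in-range inputs, so it complements rather than replaces engineering judgement.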
Therefore, the proposed SVR framework should be regarded as a data-driven decision-support tool rather than an independent design method. Its main value lies in providing a robust and transparent preliminary prediction of the factor of safety under limited-data conditions. When used together with engineering judgement, site-specific investigation, and conventional stability analysis, it can help improve the efficiency and reliability of early-stage slope stability assessment. A complementary analysis comprising (i) the full distribution of test-set metrics across the 50 outer folds, (ii) a direct comparison with standard 5-fold cross-validation, (iii) partial-dependence plots of the SVR model providing quantitative evidence of geotechnical consistency, and (iv) a brief discussion of the computational cost of the proposed framework is provided in Supplementary Material S1.