1. Introduction
The stability of rock slopes is a critical issue in open-pit mining, hydropower engineering, and transportation infrastructure construction. As an intrinsic factor controlling the deformation and failure behavior of slopes, rock mass quality plays a vital role in slope stability analysis and engineering design [1]. Accurate and reliable rock mass quality classification not only provides an important basis for slope stability assessment and safety protection, but also helps prevent geological disasters such as collapses and landslides. Therefore, improving the accuracy of rock mass quality classification is crucial to ensuring the safety of infrastructure construction in slope engineering.
Traditional rock mass quality classification methods can be divided into single-factor methods and multi-factor methods based on the number of influencing factors considered. Single-factor methods classify rock mass quality using a single evaluation index, primarily rock strength [2] and rock quality designation [3]. These methods are simple and quick, but they lack accuracy because a single evaluation index cannot fully reflect rock mass quality. In contrast, multi-factor methods use multiple evaluation indices to classify rock mass quality. For instance, the rock mass rating (RMR) system [4] considers five influencing factors: uniaxial compressive strength (UCS), rock quality designation (RQD), spacing of discontinuities (Sd), groundwater, and conditions of discontinuities. The Q-system [5] considers six influencing factors: RQD, number of joint sets, joint roughness number, joint alteration number, joint water reduction factor, and stress reduction factor. The Geological Strength Index (GSI) classification method [6] introduces the GSI to characterize rock mass quality. These methods provide more accurate and comprehensive results for rock mass classification. However, they rely heavily on engineering experience and predefined scoring criteria. Their discrete evaluation structures and simplified linear weighting strategies make it difficult to accurately characterize the complex nonlinear interactions among geological parameters under heterogeneous slope conditions. In addition, different experts may produce inconsistent classification results for the same rock mass, reducing the robustness and reproducibility of the evaluation process.
To address the drawbacks of empirical approaches and improve the accuracy and objectivity of classification results, fuzzy comprehensive evaluation methods have been widely employed in rock mass classification research. For instance, Wu et al. [7] established a probability-based rock mass quality classification model by combining Monte Carlo simulation with the ideal-point method. Dai et al. [8] developed a comprehensive evaluation model for classifying the rock mass quality of the roadways at the Sanshandao gold mine by combining the entropy weighting method, the ideal-point method, and gray relational analysis. Wang and Guo [9] proposed a method for classifying rock mass quality based on an improved cloud model, effectively quantifying qualitative evaluation indicators. Wu et al. [10] established a slope rock mass quality classification model based on interval continuous mathematics. Furthermore, Fan et al. [11] developed a rock mass classification model that combines subjective and objective weighting with topological theory, and obtained classification results at the Kunyang phosphate mine superior to those of both the RMR and the Q-system. Although fuzzy comprehensive evaluation methods have enriched rock mass quality assessment research, their performance still depends strongly on prior knowledge, manually designed membership functions, and subjective weighting schemes. Moreover, these methods generally exhibit limited self-learning capability and poor adaptability when dealing with high-dimensional nonlinear geological data.
In recent years, owing to the accelerated development of computer technology, artificial intelligence (AI) has become a powerful tool for addressing nonlinear, multi-parameter, and uncertainty-related problems in mining and geotechnical engineering. Compared with empirical approaches and fuzzy comprehensive evaluation methods, the most distinctive advantage of AI is its ability to learn and build intelligent classification models from large amounts of rock mass quality data. For instance, Liu et al. [12] and Santos et al. [13] applied support vector machines and artificial neural networks, respectively, to construct rock mass quality classification models from datasets containing 25 and 30 cases, respectively. Santos et al. [14] further compared the performance of four different intelligent algorithms in rock mass classification and demonstrated that the ensemble-learning-based random forest algorithm achieves the best classification performance. Considering the advantages of ensemble learning, Sheng et al. [15] combined stacking strategies with deep neural networks to develop an ensemble model for classifying slope rock mass quality from a dataset containing 310 cases, and achieved accurate classification results at a disused quarry. To further improve classification efficiency and accuracy, Wang et al. [16] compiled a dataset of 266 cases from the Deziwa open-pit mine. Based on this dataset, they established an ensemble classification model integrating comprehensive weight analysis, the equilibrium optimizer, and adaptive gradient boosting. Although these intelligent models have shown promising performance in rock mass quality classification, their practical applicability remains limited. For instance, support vector machines are sensitive to parameter settings, neural networks generally require large-scale datasets to avoid overfitting, and boosting-based ensemble methods may exhibit reduced robustness when handling noisy or incomplete data.
In addition, the performance of intelligent models is highly sensitive to hyperparameter settings. Inappropriate configurations not only increase computational cost but also degrade classification accuracy [17]. Therefore, to achieve optimal performance, it is essential to employ suitable optimization algorithms for hyperparameter tuning. In this regard, Li and Wang [18] combined particle swarm optimization with support vector machines to achieve accurate classification of the rock mass quality of highway slopes. Hu et al. [19] applied particle swarm optimization, genetic algorithms, and gray wolf optimization to optimize support vector machine models for classifying rock mass quality at the Chambishi copper mine, demonstrating that the parameter combinations found by gray wolf optimization significantly enhance classification performance. Yang et al. [20] further improved the classification accuracy of extreme gradient boosting for underground rock mass quality assessment by adopting an advanced zebra optimization algorithm. Nevertheless, several challenges remain unresolved. First, many intelligent models are highly sensitive to incomplete, noisy, or imbalanced geological datasets, which are common in practical engineering investigations. Second, conventional machine-learning algorithms often struggle to effectively capture complex feature interactions and uncertainty characteristics among geological parameters. Third, deep neural network-based methods generally require large-scale labeled datasets and extensive hyperparameter tuning, which may limit their applicability in small-sample geotechnical engineering scenarios.
To address the above shortcomings, this study proposes a hybrid intelligent classification framework integrating the Deep Forest (DeepForest) model with three metaheuristic optimization algorithms—Brown Bear Optimizer (BBO), Tuna Swarm Optimizer (TSO), and Sparrow Search Algorithm (SSA)—for slope rock mass quality classification. The proposed framework aims to improve classification robustness under incomplete and imbalanced datasets, enhance nonlinear feature interaction learning, and optimize model hyperparameters for small-sample geotechnical engineering conditions. DeepForest is an advanced hybrid model that combines the predictive advantages of both neural networks and random forests. Compared with other neural networks and ensemble-learning algorithms, the unique cascaded forest architecture effectively reduces the dependence of DeepForest on hyperparameter tuning and large-scale datasets. This makes DeepForest particularly suitable for small-sample, high-dimensional, and uncertainty-related geotechnical engineering problems. The remaining sections of this paper are outlined below.
Section 2 describes the DeepForest model, BBO, TSO, and SSA optimization algorithms in detail.
Section 3 introduces the dataset for classifying slope rock mass quality and presents the corresponding data analysis.
Section 4 presents the metrics for evaluating performance and the development process of the model.
Section 5 evaluates the performance of the proposed model and verifies its applicability through practical engineering applications.
Section 6 summarizes the principal conclusions and discusses the limitations of this study along with potential directions for future research.
2. Methodology
2.1. Deep Forest Algorithm
The deep forest (DeepForest) is a deep-learning algorithm based on random forests. It utilizes a multi-layer cascade structure (each layer comprising multiple random forests) to progressively extract features, ultimately outputting classification or regression results [21]. The core architecture of DeepForest primarily comprises two components: multi-grained scanning and a cascade forest. As a novel ensemble-learning algorithm combining random forests and deep neural networks, DeepForest first transforms the original input features through multi-grained scanning to strengthen their representational capacity. Subsequently, it employs a cascade structure for layer-by-layer representation learning, thereby effectively improving the classification or regression performance of the model. A schematic diagram of the DeepForest algorithm is shown in Figure 1.
For the front-end multi-grained scanning structure, the DeepForest algorithm employs a sliding window to divide the feature vectors of the raw data into multiple subsequences or sub-regions, where each subsequence represents a data fragment at a given scale. The dimensions of the feature vectors obtained in this way depend on the window size and the feature extraction method: smaller windows generate lower-dimensional feature vectors, while larger windows generate higher-dimensional feature vectors that reflect the statistical characteristics of the data over a longer range. Through this multi-grained scanning structure, the DeepForest algorithm can utilize information from different scales and spans, thereby gaining a more comprehensive understanding of the complexity and dynamics of the modeling data.
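The windowing step described above can be illustrated with a minimal sketch. It covers only the sliding-window split, not the forests that would later consume each subsequence; the function name `sliding_windows`, the `stride` parameter, and the sample values are assumptions made for illustration.

```python
def sliding_windows(features, window, stride=1):
    """Return every contiguous sub-vector of length `window` from `features`."""
    if not 1 <= window <= len(features):
        raise ValueError("window must be between 1 and len(features)")
    last_start = len(features) - window
    return [features[s:s + window] for s in range(0, last_start + 1, stride)]


# Hypothetical record with five evaluation indices (values are illustrative only).
sample = [62.0, 75.0, 0.4, 10.0, 20.0]
for window in (2, 3):
    subsequences = sliding_windows(sample, window)
    print(f"window={window}: {len(subsequences)} subsequences of length {window}")
```

A smaller window produces more, shorter subsequences, while a larger window produces fewer, longer ones, which is the trade-off between scales that multi-grained scanning exploits.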
For the back-end cascade forest structure, DeepForest adopts a multi-level decision-making framework to further improve the performance and robustness of the model. In a cascade forest, each layer consists of multiple random forests, and each forest comprises multiple decision trees constructed on a random feature subspace and data subset. The output of each layer is used as the input to the next layer, which allows deeper layers to learn from higher-level feature combinations and thereby capture more complex data patterns and relationships. Through this multi-level ensemble learning, which is structurally similar to deep neural networks, the cascade forest can progressively extract and integrate abstract feature representations of the data, endowing the overall model with stronger generalization ability and prediction accuracy. This hierarchical design also improves the model's ability to handle complex data and to cope with noise and the curse of dimensionality.
2.2. Brown Bear Optimizer
The brown bear optimizer (BBO) is a nature-inspired metaheuristic optimization algorithm proposed by Prakash et al. [22]. It is inspired by observations of communication patterns among brown bears, particularly their pedal scent marking and sniffing behaviors. The main characteristics of pedal scent marking include maintaining gait, careful stepping, and twisting footsteps.
In the BBO algorithm, different groups of brown bears inhabiting the same territory are regarded as individual solution sets of the population, and the pedal scent marking generated by each group is treated as a decision variable within each solution set. The territory of the brown bears is regarded as the search space for the problem. During initialization, different groups are randomly generated within the territory of brown bears and assigned a specific amount of pedal scent marking. The markings of different groups have unique characteristics and are retained in their respective territories. The scope of territory is defined by the decision variable boundaries of the corresponding problem. The mathematical expression for the random initialization of brown bear groups is as follows:
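$$P_{i,j} = P_j^{\min} + \lambda \left(P_j^{\max} - P_j^{\min}\right) \tag{1}$$
where $P_{i,j}$ is the $j$-th pedal scent marking of the $i$-th brown bear group, $P_j^{\min}$ and $P_j^{\max}$ are the lower and upper boundaries of the $j$-th decision variable, and $\lambda$ is a uniformly distributed random number in the range $[0, 1]$. This initialization strategy ensures that candidate solutions are uniformly distributed within the search space, thereby improving population diversity and reducing the risk of premature convergence in early iterations.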
In most cases, only male individuals exhibit the behavior of pedal scent marking. To simplify the problem, the number of male individuals in each group is set to 1, which gives the male member of each group a unique gait when walking. Therefore, the pedal scent markings produced by the male brown bears in each group exhibit distinct characteristics. Pedal scent marking based on this unique gait is assumed to continue until one-third of the total number of iterations $K$. A mathematical model of this process can be described as follows:
$$P_{i,j,k}^{new} = P_{i,j,k}^{old} - \theta_k \, \alpha_{i,j,k} \, P_{i,j,k}^{old} \tag{2}$$
$$\theta_k = \frac{k}{K} \tag{3}$$
where $\alpha_{i,j,k}$ is a uniformly distributed random number in the range $[0, 1]$; $\theta_k$ is the occurrence factor for the $k$-th iteration, which increases linearly with the number of iterations; and $k$ is the current iteration number. In this stage, the update mechanism emphasizes exploration by amplifying individual differences among solutions, which helps the algorithm explore a wider search space and avoid early stagnation in local optima.
Between one-third and two-thirds of the total number of iterations, the pedal scent markings of the brown bear are updated according to the characteristics of careful stepping. This is primarily to enhance the behavior of pedal scent markings. A mathematical model of this process can be described as follows:
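$$P_{i,j,k}^{new} = P_{i,j,k}^{old} + F_k \left(P_{j,k}^{best} - L_k \, P_{j,k}^{worst}\right) \tag{4}$$
$$F_k = \beta_{1,k} \, \theta_k \tag{5}$$
$$L_k = \mathrm{round}\left(1 + \beta_{2,k}\right) \tag{6}$$
where $\beta_{1,k}$ and $\beta_{2,k}$ are uniformly distributed random numbers in the range $[0, 1]$; $\theta_k$ is the occurrence factor introduced above; $F_k$ is the step factor for the $k$-th iteration; $L_k$ is the step length for the $k$-th iteration; and $P_{j,k}^{best}$ and $P_{j,k}^{worst}$ are the $j$-th pedal scent markings of the best and worst groups at the $k$-th iteration. This stage introduces a balance between the best and worst solutions, enabling the algorithm to exploit promising regions while maintaining population diversity. As a result, the search process gradually shifts from global exploration to local refinement.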
From two-thirds of the iterations to the final stage, the pedal scent markings of the brown bear are updated according to the characteristics of twisting footsteps. This is to further establish more durable pedal scent markings. At the same time, these markings will also be utilized to create scent maps by other members of the group. A mathematical model of this process can be described as follows:
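$$P_{i,j,k}^{new} = P_{i,j,k}^{old} + \omega_{i,k} \left(P_{j,k}^{best} - \left|P_{i,j,k}^{old}\right|\right) - \omega_{i,k} \left(P_{j,k}^{worst} - \left|P_{i,j,k}^{old}\right|\right) \tag{7}$$
$$\omega_{i,k} = 2\pi \, \theta_k \, \gamma_{i,k} \tag{8}$$
where $\omega_{i,k}$ represents the twist angular velocity of the $i$-th group for the $k$-th iteration, $\theta_k$ is the occurrence factor introduced above, and $\gamma_{i,k}$ is a uniformly distributed random number in the range $[0, 1]$. This mechanism further enhances local exploitation capability by intensifying search around high-quality solutions, while simultaneously using information from inferior solutions to escape suboptimal regions.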
Sniffing behavior is common among members of every brown bear group. By sniffing pedal scent markings, they can communicate with one another and move within their territory. To move, brown bears begin sniffing randomly selected pedal marks within the territory. Then they will move towards the pedal scent markings of their own group, ignoring those of the others. The sniffing behavior selects two random candidate solutions and updates the movement process of brown bears using the following mathematical model:
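$$P_{m,j,k}^{new} = \begin{cases} P_{m,j,k}^{old} + \lambda_{j,k} \left(P_{m,j,k}^{old} - P_{n,j,k}^{old}\right), & f\left(P_{m,k}^{old}\right) < f\left(P_{n,k}^{old}\right) \\ P_{m,j,k}^{old} + \lambda_{j,k} \left(P_{n,j,k}^{old} - P_{m,j,k}^{old}\right), & \text{otherwise} \end{cases} \tag{9}$$
where $m$ and $n$ ($m \neq n$) index the two randomly selected groups, $f(\cdot)$ denotes the objective function value, and $\lambda_{j,k}$ is a uniformly distributed random number in the range $[0, 1]$. This operation enables information exchange between candidate solutions, allowing individuals to move toward better-performing solutions while preserving stochastic exploration behavior. This helps improve convergence stability and prevents premature convergence. Finally, the BBO optimizer achieves an effective balance between exploration and exploitation through staged behavioral transitions.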
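The staged behavior described in this subsection can be sketched in a few lines of pure Python. This is an illustrative reading of the three pedal-marking phases and the sniffing step, not the authors' implementation: the greedy acceptance rule, the per-group sampling of the step factor, the clipping to bounds, and all function and parameter names are assumptions, and at least two groups are required.

```python
import math
import random


def bbo_minimize(f, lower, upper, n_groups=10, n_iter=60, seed=0):
    """Minimal brown bear optimizer sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    # Random initialization inside the territory (search bounds).
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n_groups)]
    fit = [f(p) for p in pop]

    def try_replace(i, candidate):
        # Assumption: greedy acceptance, keep the candidate only if it is better.
        candidate = clip(candidate)
        value = f(candidate)
        if value < fit[i]:
            pop[i], fit[i] = candidate, value

    for k in range(1, n_iter + 1):
        theta = k / n_iter  # occurrence factor, grows linearly with k
        best = pop[fit.index(min(fit))][:]
        worst = pop[fit.index(max(fit))][:]
        for i in range(n_groups):
            old = pop[i]
            if theta <= 1 / 3:      # first third: gait while walking
                new = [p - theta * rng.random() * p for p in old]
            elif theta <= 2 / 3:    # second third: careful stepping
                step_factor = rng.random() * theta
                step_length = round(1 + rng.random())
                new = [p + step_factor * (b - step_length * w)
                       for p, b, w in zip(old, best, worst)]
            else:                   # final third: twisting footsteps
                omega = 2 * math.pi * theta * rng.random()
                new = [p + omega * (b - abs(p)) - omega * (w - abs(p))
                       for p, b, w in zip(old, best, worst)]
            try_replace(i, new)
        for m in range(n_groups):   # sniffing between two random groups
            n = rng.choice([j for j in range(n_groups) if j != m])
            sign = 1.0 if fit[m] < fit[n] else -1.0
            new = [pm + sign * rng.random() * (pm - pn)
                   for pm, pn in zip(pop[m], pop[n])]
            try_replace(m, new)

    i_best = fit.index(min(fit))
    return pop[i_best], fit[i_best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f = bbo_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
print(best_x, best_f)
```

Because candidates are accepted only when they improve a group's fitness, the best value in this sketch never gets worse from one iteration to the next.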
2.3. Tuna Swarm Optimizer
The tuna swarm optimizer (TSO) is a metaheuristic optimization algorithm based on population behavior proposed by Xie et al. [23]. It is inspired by observations of two types of foraging behavior in tuna: spiral foraging and parabolic foraging. The basic idea is to treat each individual in the population as a tuna. Each tuna searches for the optimal solution through its own foraging strategy, while also being influenced by the foraging of other tuna. In each iteration of the algorithm, each tuna adjusts its position based on its own fitness and the fitness of surrounding tuna, thereby better adapting to the environment and finding the optimal solution.
Similar to most metaheuristic algorithms based on population behavior, TSO initializes the population by randomly generating individuals within the search space, which can be mathematically expressed as follows:
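$$X_i^{int} = rand \cdot (ub - lb) + lb, \quad i = 1, 2, \ldots, N \tag{10}$$
where $rand$ is a random number in the range $[0, 1]$; $ub$ and $lb$ are the upper and lower limits of the search space, respectively; $X_i^{int}$ is the initialization value of the $i$-th individual in the tuna swarm; and $N$ is the population size.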
Spiral foraging is the first foraging strategy of the tuna swarm. When tuna swarms feed, they first swim in a spiral shape, then drive prey into shallow waters. This is because prey in shallow waters are easier to catch. The specific mathematical model is as follows:
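$$X_i^{t+1} = \begin{cases} \alpha_1 \left(X_{best}^{t} + \beta \left|X_{best}^{t} - X_i^{t}\right|\right) + \alpha_2 X_{i-1}^{t}, & \text{optimal position as reference} \\ \alpha_1 \left(X_{rand}^{t} + \beta \left|X_{rand}^{t} - X_i^{t}\right|\right) + \alpha_2 X_{i-1}^{t}, & \text{random position as reference} \end{cases} \tag{11}$$
where $X_i^{t+1}$ denotes the position of the $i$-th individual at the $(t + 1)$-th iteration; $X_{best}^{t}$ denotes the current optimal position; $X_{rand}^{t}$ denotes the position of the random individual; and $\alpha_1$ and $\alpha_2$ represent the weight coefficients controlling the movement of the individual towards the optimal individual and the previous individual $X_{i-1}^{t}$, respectively (for $i = 1$, $X_i^{t}$ itself is used). Their formulations are given in Equation (12):
$$\alpha_1 = a + (1 - a) \frac{t}{t_{max}}, \qquad \alpha_2 = (1 - a) - (1 - a) \frac{t}{t_{max}} \tag{12}$$
where $a$ is a constant representing the coefficient of the degree to which the tuna follows the optimal individual and the previous individual during the initial stage; $t$ represents the current number of iterations; $t_{max}$ represents the maximum number of iterations; and $\beta$ is the spiral factor, representing the extent to which an individual moves towards a random individual or an optimal individual. The specific formulation of $\beta$ is given as follows:
$$\beta = e^{bl} \cos(2\pi b), \qquad l = e^{3\cos\left(\left(\left(t_{max} + 1/t\right) - 1\right)\pi\right)} \tag{13}$$
where $b$ is a random number in the range $[0, 1]$. The tuna swarm improves its search capability for the space surrounding the prey through spiral foraging.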
According to the first case in Equation (11), when all tunas are spiraling around their prey, they will seek the optimal position in the hunting space. However, when the optimal position fails to capture prey, blind following will reduce the feeding efficiency of the tuna swarm. Therefore, to enhance the global search capability of the tuna swarm, a random position is introduced as a search point for spiral foraging, as described in the second case of Equation (11). With increasing numbers of iterations, the random position will progressively transform into the optimal position, and the search capability and accuracy of the TSO algorithm will improve significantly.
Parabolic foraging is the second foraging strategy of the tuna swarm. When tuna swarms feed, they swim in a parabolic shape to capture their prey. Additionally, tuna also conduct local searches within their activity area to discover potential food sources. The specific mathematical model is as follows:
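$$X_i^{t+1} = \begin{cases} X_{best}^{t} + rand \cdot \left(X_{best}^{t} - X_i^{t}\right) + TF \cdot p^2 \cdot \left(X_{best}^{t} - X_i^{t}\right), & rand < 0.5 \\ TF \cdot p^2 \cdot X_i^{t}, & rand \ge 0.5 \end{cases}, \qquad p = \left(1 - \frac{t}{t_{max}}\right)^{t / t_{max}} \tag{14}$$
where $TF$ is a random number with a value of −1 or 1, which controls the direction of individual position updates; and $p$ is an adjustment coefficient, which controls the magnitude of individual position updates.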
Through the cooperation of the two aforementioned foraging strategies, the tuna swarm constantly updates individual positions until the stopping condition is satisfied. During each iteration, each tuna updates its optimal position according to its current location and fitness value, taking into account both its historical optimum and the global historical optimum. Finally, the position and fitness value of the optimal individual are returned.
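A compact sketch of how the two foraging strategies could alternate is given below. Several details are assumptions rather than statements about the original algorithm: the two strategies are chosen with equal probability, the optimal position replaces the random reference with probability t / t_max, positions are clipped to the bounds, and the names `tso_minimize` and `a=0.7` are illustrative.

```python
import math
import random


def tso_minimize(f, lower, upper, n_tuna=12, n_iter=80, a=0.7, seed=0):
    """Minimal tuna swarm optimizer sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)

    def random_position():
        return [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    pop = [random_position() for _ in range(n_tuna)]
    fit = [f(p) for p in pop]
    best_f = min(fit)
    best = pop[fit.index(best_f)][:]

    for t in range(1, n_iter + 1):
        ratio = t / n_iter
        alpha1 = a + (1 - a) * ratio
        alpha2 = (1 - a) - (1 - a) * ratio
        p = (1 - ratio) ** ratio
        previous = [row[:] for row in pop]  # positions at iteration t
        for i in range(n_tuna):
            x = previous[i]
            if rng.random() < 0.5:  # spiral foraging
                followed = previous[i - 1] if i > 0 else x
                b = rng.random()
                l = math.exp(3 * math.cos((n_iter + 1 / t - 1) * math.pi))
                beta = math.exp(b * l) * math.cos(2 * math.pi * b)
                # Assumption: the optimal position is used with probability t / t_max.
                ref = best if rng.random() < ratio else random_position()
                new = [alpha1 * (r + beta * abs(r - xi)) + alpha2 * fi
                       for r, xi, fi in zip(ref, x, followed)]
            else:                   # parabolic foraging
                tf = rng.choice((-1, 1))
                if rng.random() < 0.5:
                    new = [bi + rng.random() * (bi - xi) + tf * p * p * (bi - xi)
                           for bi, xi in zip(best, x)]
                else:
                    new = [tf * p * p * xi for xi in x]
            pop[i] = clip(new)
            fit[i] = f(pop[i])
            if fit[i] < best_f:
                best, best_f = pop[i][:], fit[i]
    return best, best_f


def sphere(x):
    return sum(v * v for v in x)


print(tso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0]))
```

The best-so-far position is tracked separately from the swarm, so the returned value can only improve on the initial population even though individual tuna may move to worse positions.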
2.4. Sparrow Search Algorithm
The sparrow search algorithm (SSA) is a metaheuristic optimization algorithm based on sparrow predatory behavior proposed by Xue et al. [24]. It categorizes sparrow populations into discoverers, followers, and scouts. Each role exhibits different biological behaviors based on its own state and the external environment. Due to its advantages of simplicity, ease of implementation, and few control parameters, SSA has been widely applied in various optimization problems.
Assume there are $n$ sparrows foraging in the search space, with the upper and lower bounds denoted by $ub$ and $lb$, respectively, and the dimensionality of the space is defined as $d$. The position of the $i$-th sparrow can be represented as $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$, and the foraging ability (fitness) of the $i$-th sparrow is expressed as $f_i$. Based on the fitness of each sparrow, the population can be divided into discoverers and followers, with quantities represented as $PD$ and $n - PD$, respectively. Discoverers lead the foraging direction of the population, while followers follow the discoverers to forage. Their position update rules are given in Equations (15) and (16), respectively. Sparrows switch between these two behaviors based on their own fitness.
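$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(\dfrac{-i}{\alpha \cdot iter_{max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \ge ST \end{cases} \tag{15}$$
where $X_{i,j}^{t}$ represents the position of the $i$-th sparrow in the $j$-th dimension, $j = 1, 2, \ldots, d$; $t$ and $iter_{max}$ represent the number of iterations and the maximum number of iterations, respectively; $\alpha$ is a random number in the range $(0, 1]$; $R_2$ and $ST$ are the warning value and safety threshold in the ranges $[0, 1]$ and $[0.5, 1]$, respectively. When $R_2 < ST$, the current sparrow population has not perceived danger, and discoverers continue to search based on the current position. When $R_2 \ge ST$, the current sparrow flock perceives danger, and discoverers lead the population to move randomly to avoid danger.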
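$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^2}\right), & i > n/2 \\ X_{P}^{t+1} + \left|X_{i,j}^{t} - X_{P}^{t+1}\right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{16}$$
where $Q$ is a random number following a normal distribution; $X_{worst}^{t}$ denotes the worst position of the sparrow population at the $t$-th iteration; $X_{P}^{t+1}$ denotes the optimal position of the discoverer at the $(t + 1)$-th iteration; $L$ and $A$ are both $1 \times d$ dimensional matrices. All elements in matrix $L$ are 1, and elements in matrix $A$ are randomly assigned 1 or −1, with $A^{+} = A^{T}\left(AA^{T}\right)^{-1}$. When $i > n/2$, the current follower is considered to have low fitness and should fly elsewhere to forage; otherwise, the current follower follows the sparrow positioned at the optimal location to forage.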
Randomly selected scouts are responsible for detecting the surrounding environment and adjusting their positions to avoid danger. The number of scouts can be represented as $SD$, and their position update is governed by the following formula:
$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left|X_{i,j}^{t} - X_{best}^{t}\right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left(\dfrac{\left|X_{i,j}^{t} - X_{worst}^{t}\right|}{(f_i - f_w) + \varepsilon}\right), & f_i = f_g \end{cases} \tag{17}$$
where $K$ is a random number in the range $[-1, 1]$; $f_i$ represents the fitness of the sparrow $i$; $f_g$ and $f_w$ represent the fitness of the sparrow population at the optimal and worst positions, respectively; $\varepsilon$ is the smallest constant that ensures the denominator is not zero; $X_{best}^{t}$ denotes the optimal position of the sparrow population at the $t$-th iteration; and $\beta$ is a normally distributed random number with mean 0 and variance 1. When $f_i = f_g$, the sparrows located at the center move toward the peripheral sparrows. When $f_i > f_g$, the sparrows at the periphery move toward the center to reduce the likelihood of encountering danger.
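The three roles can be sketched as follows. This is a simplified reading under stated assumptions: the numbers of discoverers and scouts (`n_disc`, `n_scout`), the threshold `st=0.8`, the clipping to bounds, and the separate best-so-far bookkeeping are illustrative choices, and the follower step uses the fact that, for a 1 × d matrix of ±1 entries, the product of the pseudo-inverse with a row of ones reduces to one shared step per sparrow.

```python
import math
import random


def ssa_minimize(f, lower, upper, n=20, n_iter=80, n_disc=4, n_scout=2,
                 st=0.8, eps=1e-12, seed=0):
    """Minimal sparrow search sketch; returns (best_position, best_value)."""
    rng = random.Random(seed)
    d = len(lower)

    def clip(x):
        return [min(max(v, lo), hi) for v, lo, hi in zip(x, lower, upper)]

    pop = [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
           for _ in range(n)]
    fit = [f(p) for p in pop]
    best_f = min(fit)
    best = pop[fit.index(best_f)][:]

    for _ in range(n_iter):
        order = sorted(range(n), key=fit.__getitem__)  # best sparrow first
        pop = [pop[i] for i in order]
        worst = pop[-1][:]
        r2 = rng.random()                              # warning value
        for i in range(n_disc):                        # discoverers
            if r2 < st:
                alpha = 1.0 - rng.random()             # uniform in (0, 1]
                scale = math.exp(-(i + 1) / (alpha * n_iter))
                pop[i] = clip([x * scale for x in pop[i]])
            else:
                q = rng.gauss(0.0, 1.0)
                pop[i] = clip([x + q for x in pop[i]])
        x_p = min(pop[:n_disc], key=f)[:]              # best discoverer
        for i in range(n_disc, n):                     # followers
            if i + 1 > n / 2:
                q = rng.gauss(0.0, 1.0)
                new = [q * math.exp((w - x) / (i + 1) ** 2)
                       for w, x in zip(worst, pop[i])]
            else:
                step = sum(rng.choice((-1, 1)) * abs(x - xp)
                           for x, xp in zip(pop[i], x_p)) / d
                new = [xp + step for xp in x_p]
            pop[i] = clip(new)
        fit = [f(p) for p in pop]
        f_g, f_w = min(fit), max(fit)
        x_best, x_worst = pop[fit.index(f_g)][:], pop[fit.index(f_w)][:]
        for i in rng.sample(range(n), n_scout):        # scouts
            if fit[i] > f_g:
                beta = rng.gauss(0.0, 1.0)
                new = [b + beta * abs(x - b) for x, b in zip(pop[i], x_best)]
            else:
                k = rng.uniform(-1.0, 1.0)
                new = [x + k * abs(x - w) / ((fit[i] - f_w) + eps)
                       for x, w in zip(pop[i], x_worst)]
            pop[i] = clip(new)
            fit[i] = f(pop[i])
        if min(fit) < best_f:
            best_f = min(fit)
            best = pop[fit.index(best_f)][:]
    return best, best_f


def sphere(x):
    return sum(v * v for v in x)


print(ssa_minimize(sphere, [-5.0, -5.0], [5.0, 5.0]))
```

The exponential follower update can overflow for very wide search bounds; the sketch is intended for small, normalized ranges such as the one in the usage line.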
2.5. Hybrid Optimization Model Based on DeepForest
The objective of this study is to optimize the DeepForest model for classifying slope rock mass quality. For this purpose, three metaheuristic optimization algorithms, namely BBO, TSO, and SSA, were employed to find the optimal hyperparameter combination of the DeepForest model, resulting in the BBO-DeepForest, TSO-DeepForest, and SSA-DeepForest models, respectively. Although these algorithms differ significantly in their metaheuristic principles, their cores are all related to swarm intelligence behavior. Consequently, their optimization procedures for the DeepForest model follow a consistent framework. The specific optimization flow is illustrated in Figure 2.
- (1) Data preprocessing: Rock mass data are collected from real geotechnical engineering projects. The data are randomly split into training and test sets, ensuring that all three models are trained and evaluated on identical data partitions. The training set is then processed by imputing missing values, removing outliers, and balancing the categories.
- (2) Hyperparameter selection and initialization: The hyperparameters of the DeepForest model and their corresponding search ranges are defined. In addition, the population size and the maximum number of iterations for the BBO, TSO, and SSA algorithms are specified.
- (3) Iterative optimization: An appropriate loss function is established to assess the classification performance of the model during the optimization process. During each iteration, the optimization algorithms generate candidate hyperparameter combinations, which are subsequently used to construct and train the corresponding DeepForest model. The classification performance of the DeepForest model on the validation set is then returned to the optimization algorithms as the fitness value for updating the population. The optimal hyperparameters are determined by minimizing the loss value over successive iterations.
- (4) Optimal model output: Once the termination condition of the iterative process is satisfied, the optimal model with the best combination of hyperparameters is obtained. An independent test set is subsequently employed to assess the performance of the model in slope rock mass classification.
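The coupling between an optimizer and the classifier in steps (2) to (4) can be sketched as a fitness wrapper. Everything specific here is hypothetical: the hyperparameter names and ranges in `SEARCH_SPACE`, the `toy_validation_score` stand-in for training DeepForest and scoring it on a validation set, and the random-search loop that takes the place of BBO, TSO, or SSA.

```python
import random

# Hypothetical hyperparameters and bounds, chosen only for illustration.
SEARCH_SPACE = {
    "n_estimators": (2, 10),
    "n_trees": (50, 500),
    "max_layers": (2, 20),
}


def decode(vector):
    """Map a continuous candidate vector to integer hyperparameters."""
    params = {}
    for value, (name, (low, high)) in zip(vector, SEARCH_SPACE.items()):
        params[name] = int(round(min(max(value, low), high)))
    return params


def make_fitness(validation_score):
    """Wrap a scoring function so that optimizers can minimize 1 - score."""
    def fitness(vector):
        return 1.0 - validation_score(decode(vector))
    return fitness


def toy_validation_score(params):
    # Stand-in for "train the classifier, return validation accuracy".
    penalty = (abs(params["n_trees"] - 200) / 1000
               + abs(params["max_layers"] - 6) / 100)
    return max(0.0, 0.9 - penalty)


def random_search(fitness, n_candidates=200, seed=0):
    """Placeholder optimizer; a swarm algorithm would replace this loop."""
    rng = random.Random(seed)
    best_vector, best_loss = None, float("inf")
    for _ in range(n_candidates):
        vector = [rng.uniform(low, high) for low, high in SEARCH_SPACE.values()]
        loss = fitness(vector)
        if loss < best_loss:
            best_vector, best_loss = vector, loss
    return decode(best_vector), best_loss


best_params, best_loss = random_search(make_fitness(toy_validation_score))
print(best_params, round(best_loss, 3))
```

Keeping the decoding and the loss in one wrapper means the same fitness function can be handed unchanged to each of the three optimizers, which is what makes their results comparable.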
5. Results and Discussion
5.1. Model Evaluation
After obtaining optimal hyperparameter combinations for the DeepForest model, an independent test set was used to assess the classification performance of the following models: BBO-DeepForest, TSO-DeepForest, SSA-DeepForest, and DeepForest without hyperparameter optimization. The confusion matrices of the four models on the test set are presented in Figure 11. The confusion matrix is a reliable and intuitive visualization tool for classification results, where the values on the diagonal represent correctly classified samples, and the off-diagonal values represent incorrectly classified samples. The unoptimized DeepForest model performs significantly worse at classifying Grade II and Grade III rock mass quality than the optimized models. In contrast, the BBO-DeepForest model achieves high classification accuracy for these two grades.
The accuracy, precision, recall, and F1-score, calculated from the confusion matrices, are summarized in Table 5. The unoptimized DeepForest model exhibits the poorest performance, with accuracy, precision, recall, and F1-score values of 0.780, 0.629, 0.622, and 0.624, respectively. In contrast, the optimized models achieve improved performance across all evaluation metrics. Overall, BBO-DeepForest demonstrates the best classification performance among the three optimized models, obtaining the highest values for all four metrics (accuracy: 0.878, precision: 0.682, recall: 0.678, and F1-score: 0.678). Nevertheless, all four models showed limited classification capability for Grade I samples, which may be attributed to the extremely limited number of Grade I cases available in the dataset (only five samples). The ranking scores of all models are further presented in Figure 12. The results indicate that SSA-DeepForest outperforms TSO-DeepForest, as reflected by its higher score. Ultimately, BBO-DeepForest exhibits the best classification performance, with the highest ranking score of 16.
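As an illustration of how the four metrics follow from a confusion matrix, the sketch below computes accuracy together with macro-averaged precision, recall, and F1-score. It assumes that rows are true grades and columns are predicted grades and that macro averaging is used; the 3 × 3 matrix is invented for the example and is not taken from Figure 11.

```python
def classification_metrics(cm):
    """Accuracy and macro-averaged precision, recall, F1 from a confusion matrix.

    Assumption: rows are true grades and columns are predicted grades.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[c][c] for c in range(n)) / total
    precisions, recalls, f1_scores = [], [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Invented example: the first class has one sample and is never predicted.
toy_cm = [
    [0, 1, 0],
    [0, 8, 2],
    [0, 1, 8],
]
print(classification_metrics(toy_cm))
```

In the toy matrix a single missed minority class pulls the macro-averaged precision well below the accuracy, which is one way a rarely predicted grade could produce a gap between these metrics.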
To evaluate the classification performance of the models more comprehensively, Figure 13 presents the ROC curves of the four models along with their corresponding AUC values. For ROC curves, the farther the curve lies above the diagonal line, the higher the AUC value and the better the model performance. According to Figure 13, all models achieve average AUC values exceeding 0.9. In particular, the BBO-DeepForest model achieved the highest AUC values for rock mass classification (1.0, 0.974, 0.941, 0.900, and 0.989 for Grades I, II, III, IV, and V, respectively). The TSO-DeepForest and SSA-DeepForest models follow closely, with average AUC values of 0.959 and 0.960, respectively. The DeepForest model exhibits the lowest AUC (0.930). This further demonstrates that the classification performance of the unoptimized DeepForest is inferior to that of the optimized models, particularly BBO-DeepForest.
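For a multi-class problem, a per-grade AUC can be obtained by treating one grade as positive and all others as negative. The sketch below uses the rank-based definition of AUC (the probability that a positive sample receives a higher score than a negative one, with ties counted as one half); the function name, scores, and labels are assumptions for illustration, and the paper's own computation may differ.

```python
def auc_one_vs_rest(scores, labels, positive):
    """One-vs-rest AUC for `positive`, from per-sample scores for that class."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented predicted probabilities for Grade II on four samples.
scores = [0.9, 0.2, 0.8, 0.1]
labels = ["II", "II", "III", "III"]
print(auc_one_vs_rest(scores, labels, "II"))
```

Averaging the per-grade values gives a single summary figure per model, at the cost of hiding how individual grades behave.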
5.2. Model Comparison
In complex geological environments, collecting complete rock mass data is generally regarded as a severe challenge. Therefore, the proper handling of missing data becomes crucial for developing accurate and reliable classification models. Considering that feature Sd exhibited the highest missing value ratio (approximately 70%) in the constructed dataset, a sensitivity analysis was conducted to evaluate whether this heavily imputed variable could systematically influence the classification results. Specifically, the feature Sd was removed from the dataset, and the entire modeling procedure, including data imputation, model training, and performance evaluation, was repeated under the same experimental settings using the unoptimized DeepForest model. The average results of 10 independent experiments are presented in Table 6.
The comparison results indicate that the overall prediction performance remains relatively stable after removing the high-missing-ratio variable. Compared with the model including Sd, the classification accuracy of the model without Sd increased only slightly from 0.798 to 0.817, while precision, recall, and F1-score also exhibited only marginal changes. These findings suggest that the proposed framework does not excessively rely on heavily imputed features and that the MICE-based imputation process did not introduce severe systematic bias into the final classification results.
To further demonstrate the rationality and reliability of the MICE method adopted in this study, it was compared with seven widely used imputation methods: mean imputation, median imputation, k-nearest neighbors (KNN), MissForest, expectation maximization (EM), generative adversarial imputation networks (GAIN), and multiple imputation using denoising autoencoders (MIDA). DeepForest and BBO-DeepForest were selected as classifiers. To ensure a fair comparison, identical training and testing sets were used across all methods, and neither IsoForest nor SMOTE was applied to the training set. The comparison results are presented in Figure 14.
According to Figure 14, the dataset imputed using the MICE method helped the DeepForest and BBO-DeepForest models achieve the highest accuracy in rock mass classification. Moreover, by comparing the accuracy of BBO-DeepForest and DeepForest across different imputation methods, it can be observed that the BBO algorithm consistently enhances the classification performance of DeepForest. Notably, for the dataset imputed using the KNN method, the classification accuracy is improved by 9.76%. This further demonstrates the superiority of the BBO algorithm.
To further explain the above comparison results, an additional evaluation of imputation quality was conducted. Specifically, the root mean square error (RMSE) and Jensen–Shannon divergence (JSD) were adopted to quantitatively evaluate the imputation performance of different algorithms. RMSE was used to measure the deviation between imputed values and true values, while JSD was employed to evaluate the similarity between the probability distributions of the imputed and original datasets. Lower RMSE and JSD values indicate better imputation performance and distribution consistency, respectively. The comparison experiment was conducted using samples with complete observations. Specifically, portions of the observed values were randomly masked at missing ratios of 10%, 20%, and 30%, thereby generating pseudo-missing datasets with known ground truth. Then, different imputation algorithms were employed to estimate the missing values, and the imputed datasets were evaluated using RMSE and JSD. To ensure a fair and unbiased comparison, the average results of 10 independent runs were reported.
Table 7 summarizes the imputation performance of different algorithms.
The results indicate that MICE achieved relatively lower imputation errors and better distribution consistency in most cases compared with the other imputation algorithms. In particular, under moderate and high missing ratios (20% and 30%), MICE consistently maintained competitive RMSE and JSD values, demonstrating its capability to provide relatively reliable estimations for missing geological parameters. These findings demonstrate that the superiority of MICE in the proposed framework is not only reflected in classification accuracy but also in the quality and distribution consistency of the imputed data.
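The masking protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic uniform values standing in for one parameter, mean imputation is used as a placeholder for any of the compared imputers, and the helper names (`mask_values`, `mean_impute`, `histogram`, `evaluate`) and the 10-bin histogram used for the JSD are hypothetical choices.

```python
import math
import random

def mask_values(values, ratio, rng):
    """Hide a random `ratio` of entries (set to None); return the copy and hidden indices."""
    hidden = sorted(rng.sample(range(len(values)), round(len(values) * ratio)))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden

def mean_impute(masked):
    """Placeholder imputer (assumption): fill gaps with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]

def rmse(truth, imputed, hidden):
    """Root mean square error over the masked positions only."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in hidden) / len(hidden))

def histogram(values, lo, hi, bins):
    """Normalized histogram on a fixed [lo, hi] grid."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]

def jsd(p, q):
    """Jensen-Shannon divergence with base-2 logarithms, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def evaluate(values, ratio, runs=10, bins=10):
    """Average RMSE and JSD over `runs` independent random maskings."""
    lo, hi = min(values), max(values)
    reference = histogram(values, lo, hi, bins)
    rmse_sum = jsd_sum = 0.0
    for seed in range(runs):
        masked, hidden = mask_values(values, ratio, random.Random(seed))
        imputed = mean_impute(masked)
        rmse_sum += rmse(values, imputed, hidden)
        jsd_sum += jsd(reference, histogram(imputed, lo, hi, bins))
    return rmse_sum / runs, jsd_sum / runs

# Synthetic complete-case values (NOT data from the study).
data_rng = random.Random(42)
rqd = [data_rng.uniform(20.0, 95.0) for _ in range(200)]

results = {ratio: evaluate(rqd, ratio) for ratio in (0.1, 0.2, 0.3)}
for ratio, (avg_rmse, avg_jsd) in results.items():
    print(f"missing ratio {ratio:.0%}: RMSE={avg_rmse:.3f}, JSD={avg_jsd:.4f}")
```

RMSE is computed only on the masked positions, where the ground truth is known, whereas the JSD compares the full imputed column with the original column, since it is meant to capture distribution distortion.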
Furthermore, to better illustrate the competitiveness of the proposed model relative to other machine-learning models, several advanced classification models were introduced for comparison, including SVM, artificial neural network (ANN), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). To ensure a fair and robust comparison, repeated random sub-sampling validation was adopted. Specifically, the dataset was randomly divided into training (80%) and testing (20%) subsets ten times using different random seeds, while maintaining the same preprocessing strategy, including missing value imputation, outlier removal, data balancing, and normalization. The average and standard deviation values of accuracy, precision, recall, and F1-score were calculated to evaluate both classification performance and model stability. The hyperparameters of all models followed the default settings in the scikit-learn library. The comparative results are presented in Table 8.
As shown in Table 8, DeepForest achieved the best classification performance among all models, with an average accuracy of 80.5% and an F1-score of 76.1%. In contrast, SVM and ANN exhibited relatively lower predictive capability, indicating limited adaptability to the nonlinear and heterogeneous characteristics of the rock mass dataset. As representative boosting-based ensemble-learning models, XGBoost and GBDT demonstrated competitive performance compared with traditional machine-learning models. In particular, XGBoost achieved an average accuracy of 77.3%, which was close to that of DeepForest. However, DeepForest still outperformed XGBoost in all evaluation metrics, especially in terms of macro-average F1-score and recall, suggesting a stronger capability for handling imbalanced multi-class rock mass quality classification tasks. Furthermore, DeepForest exhibited relatively stable performance across repeated random experiments, with moderate standard deviation values in all metrics. This indicates that the cascade forest structure possesses good robustness and generalization potential under small-sample and heterogeneous engineering geological conditions. Overall, the proposed DeepForest model demonstrated stronger competitiveness compared with other classification models.
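The repeated random sub-sampling protocol described above can be sketched as follows. This is an illustrative outline only: the data are synthetic (generated with `make_classification`), scikit-learn's `GradientBoostingClassifier` with default hyperparameters stands in for the compared models, and stratified splitting and macro averaging are assumptions of this sketch rather than confirmed details of the experiment. The preprocessing steps (imputation, outlier removal, balancing, normalization) are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem with six features (NOT the rock mass dataset).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)

rows = []
for seed in range(10):  # ten random 80/20 splits with different seeds
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model = GradientBoostingClassifier(random_state=seed)  # default hyperparameters
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_te, y_pred, average="macro", zero_division=0
    )
    rows.append([accuracy_score(y_te, y_pred), precision, recall, f1])

scores = np.array(rows)          # shape: (10 runs, 4 metrics)
mean = scores.mean(axis=0)       # average performance
std = scores.std(axis=0, ddof=1) # run-to-run variability (sample standard deviation)

for name, m, s in zip(["accuracy", "precision", "recall", "F1"], mean, std):
    print(f"{name}: {m:.3f} +/- {s:.3f}")
```

Reporting the standard deviation alongside the mean is what allows stability to be compared across models: two classifiers with similar mean accuracy can differ markedly in how much their score moves from one split to the next.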
5.3. Ablation Study on External Feature St
The DeepForest model for classifying slope rock mass quality was developed using a collected dataset containing six feature parameters. In addition to five inherent rock mass parameters (UCS, RQD, Sd, Kv, and W), an external feature, St, was employed to distinguish between engineering scenarios (slopes and non-slopes). Generally, rock mass classification should be independent of the specific engineering scenario. The same rock mass should be classified into the same quality grade regardless of whether it occurs in tunnels, slopes, or other engineering environments. However, the dataset used in this study was compiled from multiple literature sources involving different engineering scenarios. In practical engineering, slope rock masses are commonly influenced by weathering, unloading effects, stress redistribution, and joint opening processes, which may lead to systematic differences in the statistical characteristics of slope and non-slope rock mass datasets. Consequently, St was incorporated as an external feature to help the model better capture the distribution heterogeneity among multi-source datasets.
To quantitatively evaluate the contribution of St, an ablation study was conducted. Specifically, the original training set (80%) obtained through stratified random sampling was prepared in two versions: one including St and the other without St. The unoptimized DeepForest model was trained separately on these two versions, while the independent testing set (20%) was used for performance evaluation. Accuracy, precision, recall, and F1-score were adopted as evaluation metrics. To ensure a fair and unbiased comparison, the average results of 10 independent runs were reported. The comparison results are presented in Table 9.
Table 9 shows that incorporating St can improve the classification performance of the model. Compared with the model without St, the model including St achieved improvements of 1.2%, 2.9%, 1.4%, and 2.4% in accuracy, precision, recall, and F1-score, respectively. These results indicate that although St is not an inherent parameter of the RMR system, it can serve as an effective auxiliary contextual feature for enhancing the generalization capability of intelligent classification models trained on multi-source datasets.
5.4. Model Explanation
Model interpretability is essential for developing rock mass quality classification models. In particular, feature importance analysis can help explain the contribution of input variables to the classification outcomes. To further investigate the significance of different features and their impact mechanisms in rock mass quality classification, the Shapley additive explanations (SHAP) method [35] was employed to capture the nonlinear relationships between features and the target variable. SHAP is a game theory-based interpretability method that effectively quantifies the marginal contribution of each input to the model output. From a statistical perspective, it reveals the mean marginal effect of appending one feature to other feature subsets. Mathematically, the SHAP value can be defined as follows:
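$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(p - |S| - 1)!}{p!}\left[ f(S \cup \{i\}) - f(S) \right]$$
where $\phi_i$ is the SHAP value of feature i; $N$ is the complete feature set; $S$ is the feature subset excluding feature i; p is the total number of features; and $f(S)$ is the model output value for the feature subset $S$.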
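The subset-weighted average of marginal contributions defined above can be evaluated exactly by brute force when the number of features is small. The sketch below is a hedged illustration, not the algorithm used by the SHAP library (which relies on faster, model-specific approximations): `shapley_values` and the toy value function `toy_output` are hypothetical names, and the coefficients in `toy_output` are arbitrary numbers chosen so that the result can be checked by hand.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every subset that excludes feature i."""
    p = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(p):  # subset sizes 0 .. p-1
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

def toy_output(subset):
    """Hypothetical model output for a feature subset (arbitrary coefficients)."""
    out = 0.0
    if "RQD" in subset:
        out += 2.0
    if "UCS" in subset:
        out += 1.0
    if "RQD" in subset and "UCS" in subset:
        out += 0.5  # interaction term, shared equally between the two features
    return out  # "Sd" never changes the output

features = ["RQD", "UCS", "Sd"]
phi = shapley_values(toy_output, features)
print(phi)
```

In this toy case the interaction term is split evenly, so RQD receives 2.0 + 0.25 and UCS receives 1.0 + 0.25, while Sd receives zero. The values sum to the difference between the output for the full feature set and the output for the empty set, which is the additivity property that gives SHAP its name.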
Considering the superior performance of the BBO-DeepForest model, SHAP values were employed during the classification process to quantify the contributions of the six feature parameters: UCS, RQD, Sd, Kv, W, and St. The overall contribution of each input feature was evaluated by computing the mean absolute SHAP values, and SHAP analysis was further used to illustrate how these factors influence the classification output. The detailed results are presented in Figure 15 and Figure 16, respectively. The sign of the SHAP value indicates whether a feature has a positive or negative impact on the model prediction. The larger the absolute SHAP value, the greater the impact of the input feature on the classification output.
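As a small worked illustration of why the mean absolute value is used for the global ranking, the sketch below aggregates a matrix of per-sample SHAP values. The numbers are arbitrary placeholders invented for this example and are not results from the study; `mean_abs_shap` is a hypothetical helper name.

```python
# Hypothetical per-sample SHAP values (rows = samples); placeholder numbers only.
feature_names = ["UCS", "RQD", "Sd", "Kv", "W", "St"]
shap_matrix = [
    [0.30, -0.45, 0.10, -0.05, 0.08, 0.02],
    [-0.25, 0.50, -0.12, 0.07, -0.06, -0.01],
    [0.20, 0.40, 0.15, -0.09, 0.04, 0.03],
]

def mean_abs_shap(matrix, names):
    """Global importance: average of |SHAP| over samples, per feature."""
    n = len(matrix)
    return {name: sum(abs(row[j]) for row in matrix) / n for j, name in enumerate(names)}

def mean_signed_shap(matrix, names):
    """Signed average, shown only to illustrate how opposite signs cancel."""
    n = len(matrix)
    return {name: sum(row[j] for row in matrix) / n for j, name in enumerate(names)}

importance = mean_abs_shap(shap_matrix, feature_names)
signed = mean_signed_shap(shap_matrix, feature_names)
ranking = sorted(importance, key=importance.get, reverse=True)

print("ranking by mean |SHAP|:", ranking)
print("signed mean for RQD:", round(signed["RQD"], 3))
```

Because positive and negative contributions cancel in a signed average, a feature that pushes some samples strongly up and others strongly down would look unimportant; averaging absolute values avoids that.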
According to Figure 15, RQD and UCS exhibit relatively high mean absolute SHAP values, indicating that these parameters contribute substantially to the prediction behavior of the BBO-DeepForest model. Sd, W, and Kv also show noticeable contributions to the classification process. Although St presents comparatively lower mean absolute SHAP values, it still provides useful contextual information for distinguishing between slope and non-slope rock masses. These results suggest that the model prediction is strongly associated with multiple geological parameters, particularly RQD and UCS. However, because several rock mass parameters are correlated, the SHAP values should be interpreted as model-specific feature attribution results rather than strict indicators of independent physical importance.
The analysis results in Figure 16 further illustrate the sensitivity of the BBO-DeepForest model to variations in geological parameters. Among all features, RQD and UCS exhibit comparatively higher SHAP contributions, indicating that the model prediction is more responsive to changes in these variables. Additionally, Sd demonstrates a noticeable influence on the classification results. In contrast, the SHAP distributions of W, Kv, and St are relatively more dispersed, suggesting comparatively weaker global attribution effects within the trained model. Overall, the SHAP analysis provides an interpretable description of the prediction behavior of the proposed model from a data-driven perspective. Nevertheless, because several geological parameters exhibit moderate to strong multicollinearity, the SHAP-based feature contributions should not be directly interpreted as independent geomechanical dominance or causal relationships.
To further evaluate the correlations among the geological parameters, variance inflation factor (VIF) analysis was conducted after missing value imputation, as shown in Table 10. The results indicated that several variables exhibited moderate to strong multicollinearity, which is expected because many rock mass parameters are intrinsically coupled in geological environments. Since the proposed BBO-DeepForest model is based on ensemble tree learning, the presence of correlated variables mainly affects the interpretation of feature attribution rather than the predictive capability of the classifier itself.
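For readers unfamiliar with the VIF, a minimal sketch of the computation is given below. It uses the identity that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix, which is equivalent to 1 / (1 − R²) from regressing that variable on the others. The data are synthetic and the column labels are placeholders; nothing here reproduces Table 10.

```python
import numpy as np

def vif(X):
    """VIF per column: diagonal of the inverse of the feature correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic data (NOT the study's dataset): two strongly coupled columns, one independent.
rng = np.random.default_rng(0)
n = 500
col_a = rng.normal(size=n)
col_b = 0.9 * col_a + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # correlation ~0.9 with col_a
col_c = rng.normal(size=n)                                       # unrelated to the others

X = np.column_stack([col_a, col_b, col_c])
vifs = vif(X)

for name, value in zip(["col_a", "col_b", "col_c"], vifs):
    print(f"{name}: VIF = {value:.2f}")
```

For two variables with correlation 0.9, the theoretical VIF is 1 / (1 − 0.81) ≈ 5.3, while an uncorrelated variable has a VIF close to 1. A VIF of 1 is the lower bound and indicates no linear dependence on the other predictors.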
5.5. Engineering Verification
To verify the feasibility of the proposed model in slope rock masses, three independent engineering cases from the Luming Molybdenum Mine in Yichun, China, were employed for external validation. The validation dataset was completely excluded from model training, hyperparameter optimization, and cross-validation, and was used solely for independent testing of the trained models.
The Luming Molybdenum Mine is a large-scale mining project undertaken by China Railway Group in the Lesser Khingan Range. The mine is situated within the Luming Forest Farm of the Tieli Forestry Bureau in Heilongjiang Province, approximately 2 km northeast of the forest farm. The mining area covers about 4.6 km², with an exploitation depth ranging from an elevation of 640 m to 0 m. Based on the engineering geological conditions and the mechanical properties of the rock mass, the slope angle was determined to be 42°. Except for the loose rock group, the engineering geological rock units in the deposit consist predominantly of hard, massive monzonitic granite. The region has an average annual precipitation of 638 mm, with a maximum daily rainfall of 60.2 mm. The slope rock mass is influenced by weathering, tectonic activity, and unloading effects, which have produced well-developed joints and fractures. Consequently, it is vital to accurately classify the slope rock mass quality to ensure slope stability and mining safety at the Luming Molybdenum Mine.
According to the investigation of slope stability conditions at the Luming Molybdenum Mine, rock mass quality at three bench levels in the eastern pit was selected for evaluation, as shown in Figure 17. To obtain accurate feature parameters, both field investigations and laboratory tests were conducted to characterize the slope rock mass in the eastern sector of the pit. Specifically, RQD, Sd, Kv, and W were obtained through core drilling, joint surveys, wave velocity testing, and seepage analysis, respectively. The UCS of the rock was determined via uniaxial compression tests on samples collected from the study area, and St was labeled as slope rock mass. The detailed feature parameters for each slope are listed in Table 11.
The classification results of the slope rock mass quality in the eastern open pit are shown in Table 12. It can be observed that the BBO-DeepForest model correctly classified the rock mass quality of all three bench levels as Grade IV, which is consistent with field observations. The DeepForest and SSA-DeepForest models exhibit similar performance, both misclassifying one Grade IV sample as Grade V. The TSO-DeepForest model also produces a misclassification, assigning one Grade IV sample to Grade III. In summary, all three optimized models proposed in this study demonstrate good feasibility and reliability in practical applications, particularly the BBO-DeepForest model.
5.6. Limitations
Despite the satisfactory performance of the proposed framework, several limitations remain to be addressed:
First, the database was compiled from heterogeneous literature sources collected under different geological conditions, engineering scenarios, and investigation standards. Although basic quality-control procedures and parameter standardization were implemented during dataset construction, it remains difficult to completely eliminate the influence of source heterogeneity.
Second, several variables in the dataset exhibited relatively high missing ratios, particularly the parameter Sd. Although sensitivity analysis demonstrated that the proposed framework does not excessively rely on heavily imputed variables, uncertainty associated with missing-data mechanisms may still affect model robustness.
Third, although independent engineering cases from the Luming Molybdenum Mine were employed for external validation, the current validation dataset remains limited in scale and mainly consists of Grade IV samples. Therefore, the generalization capability of the proposed framework still requires further validation through broader geological environments or engineering scenarios.
Accordingly, the proposed framework should be regarded as a feasibility study for intelligent rock mass quality classification under heterogeneous multi-source conditions. Future work will focus on establishing larger-scale databases with unified testing standards and conducting further engineering validation.