Article

Optimization of Rockburst Grade Prediction Model Based on Multidimensional Feature Selection: Integrated Learning and Index System Correlation Analysis

School of Resources and Safety Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(12), 6466; https://doi.org/10.3390/app15126466
Submission received: 6 May 2025 / Revised: 30 May 2025 / Accepted: 1 June 2025 / Published: 9 June 2025

Abstract

Rockburst is a major disaster in deep underground engineering, and its prediction is crucial for engineering safety. This study proposes an optimization method based on multidimensional feature selection and integrated learning that systematically evaluates the impact of different indicator dimensions by constructing an indicator–indicator system and an indicator–rockburst hierarchy using a combination of seven-, six-, five-, four-, and three-dimensional indicators in conjunction with six machine-learning models, such as XGBoost, LightGBM, and CatBoost. The results show that tree models (e.g., CatBoost, LightGBM, etc.) are naturally resistant to multicollinearity, and PCA preprocessing destroys their nonlinear feature relationships, leading to performance degradation. CatBoost has the best performance and strong overfitting resistance; LightGBM is the second most efficient and suitable for real-time applications. The indicator–indicator system has better overall performance but less stability, and the indicator–rockburst system has slightly lower performance but a more stable downward trend. The six-dimensional system in both types of systems can balance the performance and complexity and is the optimal choice for engineering applications. This study provides theoretical support and practical reference for the selection of rockburst prediction and an evaluation index system.

1. Introduction

Rockbursts are a threatening phenomenon characterized by their suddenness, destructiveness, and complexity [1]. Rockbursts can cause serious consequences, such as equipment damage and injuries [2]. With the rapid growth of global demand for mineral resources, deep underground mining has become an inevitable trend in the mining industry, and rockbursts are particularly challenging for engineers [3]. Despite the progress of machine learning and deep learning methods in rockburst prediction, existing studies have obvious shortcomings in feature selection: they generally rely on the correlation between indicators while neglecting the correlation between indicators and rockburst grade and lack a systematic assessment of the indicator system in different dimensions.
This study proposes an optimization method based on multidimensional feature selection and integrated learning by constructing two types of evaluation systems (seven-dimensional to three-dimensional), namely, the indicator–indicator system and the indicator–rockburst grade system, and combining six types of models, including XGBoost and CatBoost, in order to solve three problems: (1) the mechanism of the tree models’ resistance to multicollinearity; (2) the performance difference between the indicator–indicator system and the indicator–rockburst grade system; and (3) the basis for selecting the optimal indicator dimension. The experiments show that CatBoost performs optimally in both types of systems and that the six-dimensional system can balance performance and complexity.
The innovations of this study are (1) revealing that tree models do not require PCA dimensionality reduction; (2) verifying the validity of the indicator–rockburst grade system; and (3) establishing the engineering applicability of the six-dimensional indicator system. The research results provide new theoretical support and practical reference for rockburst prediction.

2. Literature Review

With the increasing scale of mining and underground engineering, especially the rapid development of deep underground mining, the problem of predicting rockburst hazards has become increasingly complex and challenging. To address this challenge, many scholars have begun to extensively explore and utilize advanced machine learning techniques to predict the occurrence of rockbursts, thus promoting a large number of data-driven research efforts [4]. Among the many approaches, machine learning and deep learning models have been widely used in rockburst prediction due to their superior data modeling capabilities and good predictive performance. Sun et al. [5] proposed a prediction framework that combines a random forest-based metric weight optimization method (RF-CRITIC) with an improved cloud model, which is able to effectively perform the prediction of short-term rockbursts. The method significantly improves the reliability and accuracy of prediction through multiple feature selection and fusion. Shen et al. [6], on the other hand, proposed a random forest (Op-RF) model based on Optuna optimization, which significantly improves the prediction performance of the model through efficient hyperparameter optimization. Their validation results showed that the method achieved an area-under-the-curve (AUC) score of 0.984 in rockburst prediction, demonstrating a strong classification capability. In contrast, Li et al. [7] proposed the DeepForest model, which takes into consideration the contribution of each input variable to the occurrence of rockbursts by means of a multilevel integrated learning mechanism and sensitivity analysis. The authors demonstrated the effectiveness of the model in dealing with fewer input parameters, especially for rockburst prediction in deep mines. In addition, Liu et al. [8] further extended the application of deep learning in rockburst prediction by employing a deep learning algorithm with a complex network structure and verified the great potential of deep learning models in rockburst prediction.
In rockburst grade prediction, the selection of appropriate evaluation indicators plays a crucial role in the effectiveness of the model. If there are too few discriminating indicators, the key factors affecting rockbursts may not be fully reflected; however, if too many indicators are selected, the difficulty of data collection and processing is substantially increased [9]. Moreover, too many indicators may introduce redundant information that degrades rather than improves the prediction performance of the model. Therefore, how to balance the quantity and quality of indicators has always been an important challenge in research. Currently, researchers take different approaches to choosing evaluation indicators for rockburst prediction. For example, Xue et al. [10] directly selected six indexes as input variables when building the particle swarm optimized extreme learning machine model (PSO-ELM), namely, the maximum tangential stress of the surrounding rock (σθ), the uniaxial compressive strength of the rock (σc), the tensile strength (σt), the stress ratio (σθ/σc), the brittleness ratio of the rock (σc/σt), and the elastic energy index (Wet). Shukla et al. [11] directly chose four indexes, namely, maximum tangential stress, elastic energy index, uniaxial compressive strength, and uniaxial tensile stress. Meanwhile, Li et al. [7] used seven indicators in their deep forest model, namely, maximum tangential stress, uniaxial compressive strength, tensile strength, elastic strain energy index, stress concentration factor, and rock brittleness indices B1 and B2 of the surrounding rock, which significantly expanded the evaluation dimensions. On the other hand, the method proposed by Lin et al. [12] focuses on eliminating indicators that are highly correlated with each other through correlation analysis and retains only four indicators, namely, maximum tangential stress, uniaxial compressive strength, tensile strength, and the elastic strain energy index of the surrounding rock, which optimizes the input features by reducing redundant information. Similarly, Faradonbeh et al. [13] considered the correlation between the indicators and selected only four indicators, namely, maximum tangential stress, uniaxial compressive strength, tensile strength, and elastic energy index of the surrounding rock, in order to prevent multicollinearity from affecting the modeling process and to limit the complexity of the final model. Armaghani et al. [14] likewise used the correlation between indicators as the criterion for selecting indicators. It can be seen that most studies on rockburst prediction tend to use the correlation between indicators as a criterion to obtain an evaluation index system with weak inter-indicator correlation.
However, a comparison of several studies in practice reveals that the correlation between evaluation indicators does not always directly affect the performance of prediction models. For example, in the variational autoencoder–natural gradient boosting model (VAE-NGBoost) proposed by Lin et al. [12], the highest correlation coefficient between the input indicators is 0.582, which is relatively low, and the model achieved an accuracy of 0.921; the deep forest model of Li et al. [7] achieved an accuracy of 0.924 even though the highest correlation between its input variables was 0.9. In addition, the correlation between the indicators selected by Wang et al. [15] was 0.788, and the accuracy of the final model was 0.946. These results show that the correlation between the evaluation indicators is not a decisive factor for model effectiveness. On the other hand, the correlation between the evaluation indicators and the rockburst grade also plays a non-negligible role in model performance. Generally speaking, choosing indicators that are highly correlated with the rockburst grade as inputs can improve the predictive ability of the model. However, current research on this aspect is still limited, and most existing studies do not use the correlation between the indicators and the rockburst grade as a criterion for selecting evaluation indicators.

3. Purpose of This Study

From the existing literature, it can be seen that current rockburst prediction research generally adopts correlation analysis between indicators as the main basis for feature selection, aiming to construct a weakly correlated set of evaluation indicators as model input. However, empirical studies have shown that the statistical correlation between indicators has no significant effect on the performance of prediction models. It is worth noting that comparative studies of different indicator systems in rockburst prediction are still lacking, and there is a serious shortage of empirical analyses that systematically evaluate the predictive efficacy of each indicator system. In addition, existing studies rarely consider the correlation between the indicators and the rockburst grade as a feature selection criterion, and this approach has not yet been fully explored in related research.
In order to explore in depth the influence of evaluation index selection on the performance of the prediction model, this study uses both the indicator–indicator correlation and the indicator–rockburst grade correlation as selection criteria and takes seven indices as the basis: the maximum tangential stress (σθ), the uniaxial compressive strength (σc), the tensile strength (σt), the elastic strain energy index (Wet), the stress concentration factor (SCF), the rock brittleness index B1 (B1 = σc/σt), and the rock brittleness index B2 (B2 = (σc − σt)/(σc + σt)). Different numbers of evaluation indices (seven, six, five, four, and three) are used as input variables, combined with six mainstream algorithms, namely, the integrated learning algorithms XGBoost, CatBoost, LightGBM, and random forest (RF) and the traditional algorithms support vector machine (SVM) and multilayer perceptron (MLP), each tuned with the Optuna hyperparameter optimization framework, to carry out a comparative analysis. By comparing the prediction performance of each model under the different index systems, we aim to assess the specific impact of each index combination on model performance and provide a theoretical basis for optimizing rockburst prediction models.
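The two brittleness indices are simple functions of the basic strength measurements, so they can be derived directly from σc and σt. The short sketch below illustrates this with pandas; the column names are hypothetical, and SCF is treated as the stress ratio σθ/σc, which is an assumed definition since the text does not give one.

```python
import pandas as pd

# Illustrative values for three cases; column names are hypothetical.
df = pd.DataFrame({
    "sigma_theta": [25.4, 44.7, 119.7],   # maximum tangential stress (MPa)
    "sigma_c": [102.1, 117.1, 129.1],     # uniaxial compressive strength (MPa)
    "sigma_t": [6.1, 6.8, 10.3],          # tensile strength (MPa)
})

# Brittleness indices exactly as defined in the text.
df["B1"] = df["sigma_c"] / df["sigma_t"]
df["B2"] = (df["sigma_c"] - df["sigma_t"]) / (df["sigma_c"] + df["sigma_t"])

# SCF is computed here as sigma_theta / sigma_c; this definition is an assumption,
# since the text lists SCF as an indicator without giving its formula.
df["SCF"] = df["sigma_theta"] / df["sigma_c"]
print(df.round(3))
```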

4. Materials and Methods

4.1. Data Sources

Building on previous rockburst research, this study collected 330 rockburst-related engineering cases from domestic and international projects, as shown in Table 1. According to the classification criteria shown in Table 2, the rockburst grade was divided into four categories: none, light, moderate, and strong.

4.2. Data Description and Analysis

The distribution of rockburst categories is shown in Figure 1: no rockburst (only 53 cases), strong rockburst (56 cases), light rockburst (101 cases), and moderate rockburst (120 cases). The influence factors in the dataset are the maximum tangential stress (σθ), uniaxial compressive strength (σc), tensile strength (σt), elastic strain energy index (Wet), stress concentration factor (SCF), rock brittleness index B1 (B1 = σc/σt), and rock brittleness index B2 (B2 = (σc − σt)/(σc + σt)), for a total of seven indicators.
The violin plot of the rockburst dataset is shown in Figure 2. The violin plot, as a combined density graph, effectively shows the overall distribution of the data. The width of the plot reflects the uniformity of the data distribution; a wider violin indicates that the data are more evenly distributed, while a narrower one indicates a higher degree of concentration. The box plot inside the violin represents the median and interquartile range of the data, and the density of the scatter reflects how concentrated the data are within a given value interval. Figure 2 shows that there is category imbalance or sampling bias in the dataset, as well as outliers whose presence may be related to samples collected under specific operating conditions.
Further, the scatter and distribution density plots in Figure 3 demonstrate the data distribution relationships between features and the distribution of each rockburst category on a single feature, revealing significant differences in distribution and magnitude between the categories. Therefore, to address these issues, appropriate data enhancement is particularly necessary in the early stages of data processing.
Table 3 provides the results of the statistical analysis for each indicator (standard deviation, kurtosis, maximum, minimum, mean, median, and range).
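For reproducibility, the per-grade statistics reported in Table 3 can be obtained with a grouped aggregation. The sketch below uses a synthetic stand-in for the 330-case database (the real data are given in Supplementary Table S1) and illustrative column names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
indicators = ["sigma_theta", "sigma_c", "sigma_t", "SCF", "B1", "B2", "Wet"]

# Synthetic stand-in for the case database; in practice the 330 cases from
# Supplementary Table S1 would be loaded here (column names are illustrative).
df = pd.DataFrame(rng.lognormal(mean=1.0, sigma=0.5, size=(330, 7)), columns=indicators)
df["grade"] = rng.choice(["None", "Light", "Moderate", "Strong"], size=330)

def grade_stats(group: pd.DataFrame) -> pd.DataFrame:
    """Per-grade statistics matching the columns of Table 3."""
    x = group[indicators]
    return pd.DataFrame({
        "STD": x.std(), "Kurt": x.kurt(), "Max": x.max(), "Min": x.min(),
        "Mean": x.mean(), "Median": x.median(), "Range": x.max() - x.min(),
    })

table3 = df.groupby("grade").apply(grade_stats).round(2)
print(table3)
```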

4.3. XGBoost

XGBoost (eXtreme gradient boosting) is an efficient machine learning algorithm that is deeply optimized on top of the gradient boosting decision tree (GBDT) framework. Its core mechanism is to continuously optimize the objective function by gradually adding new decision trees, and each tree is trained based on the residuals of the previous tree so as to gradually reduce the error between the model’s predicted value and the true value. As the decision trees continue to accumulate, the value of the loss function gradually decreases, pushing the model to approach the optimal solution. This incremental learning approach makes XGBoost widely used in many fields, such as wind power prediction [23], wildfire disaster risk assessment [24], financial market trend prediction [25], and so on [26].
In terms of mathematical construction, XGBoost’s objective function L(φ) consists of two parts: the loss function term and the regularization constraint term. The loss function measures how well the model fits the data, while the regularization term acts as a penalty mechanism to control the complexity of the model and prevent it from falling into the trap of overfitting. This optimization strategy improves the generalization ability of the model while ensuring that it still maintains excellent computational efficiency when dealing with large-scale datasets. The formulas for the objective function and regularization term of XGBoost are as follows:
$$L(\varphi)=\sum_{i=1}^{n} l\left(\hat{y}_i, y_i\right)+\sum_{k=1}^{K} \Omega\left(f_k\right) \tag{1}$$
$$\Omega\left(f_k\right)=\gamma T+\frac{1}{2} \lambda\,\lVert \omega \rVert^{2} \tag{2}$$
In Equations (1) and (2), $\hat{y}_i$ represents the predicted output of sample $x_i$, while its corresponding true value is $y_i$. The model consists of $K$ subtrees, and the output of the $k$th subtree is denoted as $f_k$, whose complexity is bounded by the regularization term $\Omega(f_k)$. The hyperparameters $\gamma$ and $\lambda$ together regulate the optimization process of the tree, where $T$ refers to the number of leaf nodes in the decision tree, and $\omega$ denotes the values of the leaf nodes. In addition, the training error of sample $x_i$ is measured by the loss function $l(\hat{y}_i, y_i)$, which affects the learning effect of the overall model.
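To make the objective concrete, the following minimal sketch (not the authors' implementation) fits a multiclass XGBoost classifier on a synthetic stand-in for the rockburst data; the penalty terms of Equation (2) are exposed through the gamma, reg_lambda, and reg_alpha parameters, and all hyperparameter values are illustrative.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 7-indicator, 4-grade rockburst data.
X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

model = xgb.XGBClassifier(
    objective="multi:softprob",  # multiclass with probability output
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    gamma=0.1,        # minimum loss reduction per split (the gamma*T penalty)
    reg_lambda=0.5,   # L2 penalty on leaf weights (the (1/2)*lambda*||w||^2 term)
    reg_alpha=0.1,
    subsample=0.8,
    colsample_bytree=0.8,
)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("mean CV accuracy:", scores.mean())
```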

4.4. LightGBM

Compared with XGBoost, LightGBM [27], as an emerging gradient boosting tree model, demonstrates higher computational efficiency in its algorithm design. Its core optimization lies in the introduction of the histogram algorithm, which effectively reduces the memory occupation and, at the same time, reduces the computational overhead to ensure the efficient operation of the model on large-scale datasets.
Most traditional tree structure learning frameworks, including XGBoost, adopt a layer-by-layer growth strategy, i.e., expanding all the nodes in the same layer at the same time each time to ensure a balanced expansion of the tree. However, LightGBM breaks through this inertia and innovatively introduces a leaf-by-leaf growth mechanism. Instead of following the hierarchical expansion, the method prioritizes the growth of leaf nodes with the smallest splitting loss, which enables the model to converge more quickly and significantly reduces the memory footprint.
In contrast, the layer-by-layer approach is robust and orderly, while the leaf-by-leaf approach is more dynamically adaptive and captures localized changes in data features more accurately.
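The histogram binning and leaf-wise growth described above correspond directly to the max_bin and num_leaves parameters of the library. The hedged sketch below shows how they might be set on a synthetic stand-in dataset; the values are illustrative and not the tuned configuration used in this study.

```python
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 7-indicator, 4-grade rockburst data.
X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

model = lgb.LGBMClassifier(
    boosting_type="gbdt",
    num_leaves=31,          # leaf-wise growth: complexity is bounded by leaves, not depth
    max_bin=255,            # histogram algorithm: features are bucketed into discrete bins
    learning_rate=0.05,
    n_estimators=500,
    feature_fraction=0.8,   # parameter names follow the search space in Table 5
    bagging_fraction=0.8,
    bagging_freq=5,
)
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```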

4.5. CatBoost

CatBoost, an open-source integrated model based on gradient boosting [28], has shown excellent performance in complex classification and regression tasks dealing with highly nonlinear data due to its powerful learning capabilities.
Unlike traditional gradient boosting methods that rely on a uniform sample set to estimate the gradient and construct the model, CatBoost takes a different approach and focuses on solving the deep challenge of prediction bias. The accumulation of gradient bias not only affects model stability but may also trigger the target leakage problem [29], which in turn weakens the generalization ability of the model.
To cope with such risks, CatBoost adopts an innovative ordered boosting framework. The method dynamically partitions the leaf nodes of the preorder tree by consistent criteria, thus effectively suppressing the negative effects of gradient bias and prediction bias. As a result, the algorithm’s overfitting resistance is significantly enhanced, and its accuracy and generalization ability are greatly improved, making it more adaptable and stable in complex data environments.
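Ordered boosting is exposed in the library through the boosting_type option. A minimal sketch on synthetic stand-in data might look as follows; the hyperparameter values are illustrative rather than the tuned configuration reported later.

```python
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 7-indicator, 4-grade rockburst data.
X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

model = CatBoostClassifier(
    boosting_type="Ordered",    # ordered boosting to suppress prediction shift / target leakage
    loss_function="MultiClass",
    iterations=500,
    depth=6,
    learning_rate=0.05,
    l2_leaf_reg=3.0,
    verbose=False,
)
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```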

4.6. Random Forests

Random forest (RF) is an integrated learning method that effectively overcomes the overfitting problem that may result from a single decision tree by constructing multiple decision trees and combining their results to make predictions [30]. Its basic algorithmic process is as follows.
First, the number of trees (N) and the number of randomly selected features (m) in each tree are set, and the training data are prepared. Next, in constructing each tree, the following steps are followed sequentially: (1) random sampling from the training set; (2) random selection of a subset of features; and (3) construction of a decision tree.
In the task of classification prediction of rockburst intensity, the RF model employs a voting mechanism in which the majority vote determines the final classification result, and this strategy makes the overall prediction more robust, which is formulated as follows:
$$\hat{y}_{\mathrm{final}}=\underset{y}{\arg\max} \sum_{i=1}^{N} I\left(\hat{y}_i=y\right)$$
where $N$ denotes the number of decision trees, $\hat{y}_i$ is the prediction of the $i$th tree, and $I(\cdot)$ is the indicator function.
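The majority-vote rule above can be written out explicitly against scikit-learn's RandomForestClassifier, as in the sketch below on synthetic stand-in data. Note that scikit-learn's predict actually averages class probabilities across trees (soft voting), so the hard-vote result shown here is close to but not necessarily identical to the library's output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", random_state=42).fit(X, y)

# Majority vote written out explicitly: each tree votes, argmax over vote counts.
tree_votes = np.stack([tree.predict(X) for tree in rf.estimators_])  # shape (N_trees, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, tree_votes)

# Fraction of samples where the hard vote agrees with scikit-learn's soft-voting predict.
print((majority == rf.predict(X)).mean())
```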

4.7. Support Vector Machines

Support vector machine (SVM) is a remarkable machine learning algorithm widely used in classification and regression tasks. It maps data to a higher-dimensional space by means of a kernel trick, thus constructing an optimal bounding hyperplane that can be used to efficiently differentiate between different classes of data. The core idea of the algorithm is to minimize classification errors or maximize the bounds by designing a function such that data points are correctly assigned to the appropriate labels. The wider the margin between the hyperplane and the data points, the smaller the classification error, and such a separation makes the boundaries of each type of data more clear. By optimizing this separation function, SVM can achieve more accurate classification results [31].
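A minimal sketch of this setup, assuming an RBF kernel and the [0, 1] scaling used later in Section 6.2; the parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

# The RBF kernel maps the indicators to a higher-dimensional space; C trades margin width
# against classification error. Scaling to [0, 1] mirrors the MinMaxScaler step in Section 6.2.
svm = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", C=5.0, gamma="scale"))
print("mean CV accuracy:", cross_val_score(svm, X, y, cv=5).mean())
```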

4.8. Multilayer Perceptron

Multilayer perceptron (MLP) is an artificial intelligence technique that enables computers to perform complex data analysis in a manner loosely inspired by the human brain. The brain, which served as the inspiration for neural network architectures, relies on hundreds of millions of neurons transmitting electrical signals through intricate connections that coordinate to process information [32]. Artificial neural networks work on a similar principle, consisting of artificial neurons that work together to solve problems. Each neuron consists of four basic components: inputs, weights, an activation function, and an output. The input data can come from other neurons or from the external environment, while the weights determine how much each input signal affects the current neuron and are learned by adjusting how strongly elements in the previous layer influence the current element [33].
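A hedged sketch of such a network is given below, mirroring the two-hidden-layer structure implied by the search space in Table 5; the layer sizes, dropout rate, and training settings are illustrative, and Keras is assumed as the implementation library.

```python
import tensorflow as tf
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                  stratify=y, random_state=42)
scaler = MinMaxScaler().fit(X_train)

# Two hidden layers with dropout, echoing the hyperparameter ranges in Table 5.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),   # one output per rockburst grade
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(scaler.transform(X_train), y_train, epochs=200, batch_size=16,
          validation_data=(scaler.transform(X_val), y_val), verbose=0)
```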

4.9. Optuna

As a next-generation hyperparameter search framework, Optuna [34] achieves breakthroughs in optimization efficiency by combining a dynamically constructed parameter space with adaptive pruning algorithms. The core workflow starts with the careful construction of the multidimensional optimization space: researchers explicitly define the objective function, the parameter types, and their value ranges. In the iterative optimization phase, the system adopts sampling strategies based on Bayesian optimization and evolutionary algorithms, evaluates the convergence characteristics of each parameter combination in real time, and terminates (prunes) inefficient trials early based on the expected improvement. This focused search strategy allows computational resources to flow continuously toward high-potential regions of the parameter space until the preset termination conditions (e.g., number of iterations or an accuracy threshold) are satisfied, after which the best hyperparameter configuration is output.
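A minimal Optuna study for one of the models might look as follows; the objective returns the mean five-fold cross-validation accuracy, matching the evaluation design in Section 6.2, and the search ranges echo Table 5. This is a sketch rather than the authors' tuning script, and trial pruning (which requires reporting intermediate values) is omitted for brevity.

```python
import lightgbm as lgb
import optuna
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

def objective(trial: optuna.Trial) -> float:
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 20, 100),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 200, 1000),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = lgb.LGBMClassifier(**params)
    # Average five-fold cross-validation accuracy is the optimization target.
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```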

4.10. Principal Component Analysis

Principal component analysis (PCA) is a classical unsupervised dimensionality reduction method whose core idea is to transform the original high-dimensional features into linearly independent low-dimensional variables (principal components) by orthogonal transformations while retaining the maximum variance information in the data [35]. The method is widely used in eliminating redundant information among features, improving the computational efficiency of models, and visualizing high-dimensional data.
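In scikit-learn, the cumulative-variance criterion used later in Section 6.1 can be expressed directly by passing a fractional n_components; a minimal sketch on synthetic data follows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = make_classification(n_samples=330, n_features=7, n_informative=5, random_state=42)

# Standardize, then keep enough orthogonal components to explain >= 95% of the variance.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)          # fractional n_components acts as a variance threshold
X_pca = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("cumulative explained variance:", np.cumsum(pca.explained_variance_ratio_))
```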

5. Delineation of the System of Indicators for Rockburst Prediction

In order to construct the rockburst prediction index system, this study analyzed the correlation of the rockburst prediction sample data using the Pearson correlation coefficient to assess the interrelationships between the seven key characteristic parameters of rockbursts. The analysis results are shown in Figure 4. According to the analysis, there were strong correlations between certain indicators. For example, the correlation coefficient between σθ and SCF was as high as 0.9, indicating that the trends of these two variables in the dataset were highly consistent. In addition, the correlation coefficient between σt and B1 was −0.63 and that between σt and B2 was −0.69, indicating that these two brittleness indicators showed a fairly strong negative correlation with σt, while a strong positive correlation (correlation coefficient of 0.73) existed between B1 and B2. The absolute values of the correlation coefficients between the other variables were less than or equal to 0.48, indicating moderate or weak correlations between them.
Further analysis reveals that σθ, SCF, and Wet are relatively strongly correlated with the rockburst grade, with correlation coefficients of 0.52, 0.41, and 0.49, respectively, which may be closely related to the strong intrinsic relationship between these variables and rockburst occurrence. In addition, σt and σc showed moderate correlation with the rockburst grade, while B1 and B2 showed only weak correlation with it.
Combining the above analysis, this study fully considered the correlation between the evaluation indicators (indicator–indicator system, hereafter referred to as I-I) and their correlation with the rockburst grade (indicator–rockburst grade system, hereafter referred to as I-R). Based on these correlation characteristics, the evaluation system of rockburst prediction was classified into nine types (as shown in Table 4).
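Both correlation bases can be computed with a few lines of pandas. The sketch below uses synthetic stand-in data and illustrative column names, and shows how a six-dimensional I-R system could be obtained by dropping the indicator least correlated with the grade.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
indicators = ["sigma_theta", "sigma_c", "sigma_t", "SCF", "B1", "B2", "Wet"]

# Synthetic stand-in for the 330-case database (Supplementary Table S1);
# "grade" encodes the rockburst level 0-3, column names are illustrative.
df = pd.DataFrame(rng.normal(size=(330, 7)), columns=indicators)
df["grade"] = rng.integers(0, 4, size=330)

# Indicator-indicator Pearson correlation matrix (basis of the I-I system, Figure 4).
ii_corr = df[indicators].corr(method="pearson")

# Indicator-rockburst grade correlation (basis of the I-R system), sorted by absolute value.
ir_corr = df[indicators].corrwith(df["grade"]).sort_values(key=abs)

# Example: a six-dimensional I-R system drops the indicator least correlated with the grade.
ir_6 = ir_corr.index[1:].tolist()
print(ii_corr.round(2))
print("6-indicator I-R set:", ir_6)
```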

6. Model Construction

6.1. PCA Processing

Multicollinearity is a key issue in machine learning model development. The multi-indicator systems in this study are likely to suffer from multicollinearity, which may lead to overfitting and may also reduce the robustness of the model on new data.
In order to explore whether the multi-indicator system in this study has a serious multicollinearity problem, the seven-indicator system was given PCA treatment. If there is a significant improvement in the model effect after PCA treatment, it means that the multicollinearity problem of the original dataset is very serious, which seriously affects the prediction effect of the original model. If there is no multicollinearity problem in the seven-indicator system, then there is naturally no multicollinearity problem in the other low-indicator systems. The PCA processing flow is shown in Figure 5.
PCA converted the seven indicators into a number of principal components, of which the first five were extracted using the criterion of cumulative contribution ≥ 95% (cumulative explained variance of 98.7%). The explained variance of each principal component is shown in Figure 6.
These five principal components were then used as inputs to XGBoost, CatBoost, LightGBM, RF, SVM, and MLP, and five-fold cross-validation was used to train and test the model. Optuna was used for hyperparameter optimization. Finally, the final model effect was compared with the model effect of the original seven-indicator system.

6.2. Determination of Model Architecture

In this study, six representative machine learning algorithms were systematically selected as research carriers: XGBoost, CatBoost, and LightGBM, under the gradient boosting framework; RF, as an integrated learning method; SVM, as a representative of traditional machine learning; and MLP, as a typical model of deep learning.
In the process of model construction, in order to ensure the rigor and scientific soundness of the performance comparison across algorithms, this study formulated a standardized modeling process (shown in Figure 7), and all comparison experiments followed the principle of controlling variables. First, MinMaxScaler was uniformly used for feature normalization, linearly mapping each indicator to the interval [0, 1]; this eliminated the influence of differing magnitudes while retaining the distribution characteristics of the original data. Second, hyperparameter optimization was performed with the Optuna auto-tuning framework, which explores the parameter space efficiently using sequential model-based (Bayesian) optimization. More importantly, the optimization process strictly adopted five-fold cross-validation: the dataset was randomly and uniformly divided into five similarly sized subsets (“folds”), denoted Fold 1 to Fold 5. The model then underwent five rounds of training, with a different fold serving as the validation set in each round and the remaining four folds serving as the training set. The model performance was calculated on the validation set in each round, and the average of the five rounds was taken as the final performance of the model. Through such repeated data rotation and result aggregation, the robustness and data efficiency of the model evaluation were significantly improved, fully exploiting the value of the data while effectively avoiding the risk of overfitting.
In the design of the evaluation system, this study adopted the average cross-validation accuracy as the core evaluation index. Specifically, a complete five-fold cross-validation process was executed for each Optuna trial, and the arithmetic average of the five rounds of validation accuracy was finally taken as the ultimate judgment of model performance. This design significantly improved the stability and credibility of the evaluation results. For the Optuna optimizer, this study set a search budget of 100 iterations, and the hyperparameter search space for each model is shown in Table 5.
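The standardized workflow (MinMaxScaler followed by five-fold cross-validation) can be expressed as a scikit-learn pipeline so that the scaler is refit on each training fold and no information leaks into the validation fold. Stratified splitting is an assumption in the sketch below, since the text only states that the folds are random and similarly sized; random forest stands in for any of the six models, and the data are a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=330, n_features=7, n_informative=5,
                           n_classes=4, random_state=42)

# Scale each indicator to [0, 1], then evaluate with five-fold cross-validation;
# wrapping the scaler in the pipeline refits it on every training fold.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", RandomForestClassifier(n_estimators=500, random_state=42)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", scores.round(4), "mean:", scores.mean().round(4))
```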

6.3. Indicators for Model Evaluation

To evaluate the performance of the model, commonly used metrics, such as accuracy, precision, recall, and F1 score, were used, as they apply to multiclass problems. Accuracy, as the most intuitive global metric, measures the overall correctness of the model’s predictions across all categories, but it may be biased for data with unbalanced categories. To compensate for this limitation, this study further introduced precision and recall for in-depth analysis along two key dimensions, namely, prediction reliability and category coverage. Precision focuses on the percentage of true positive samples among the samples predicted by the model to be positive, whereas recall reveals the model’s ability to recognize true positive samples. The F1 score, the harmonic mean of precision and recall, seeks an optimal balance between the two and is especially suitable for scenarios with uneven class distributions. The samples were categorized into true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) by comparing the predicted and actual results. The accuracy, precision, recall, and F1 score formulas are as follows:
$$\mathrm{accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
$$\mathrm{precision}=\frac{TP}{TP+FP}$$
$$\mathrm{recall}=\frac{TP}{TP+FN}$$
$$F1=\frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision}+\mathrm{recall}}$$
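For the four-class problem, these metrics are computed per class and then averaged. Weighted averaging is assumed in the sketch below, since the averaging scheme is not stated in the text (although the fact that the recall tables equal the accuracy tables is consistent with it); the labels are illustrative.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [0, 1, 2, 3, 2, 1, 0, 3, 2, 2]   # illustrative grade labels (0 = none ... 3 = strong)
y_pred = [0, 1, 2, 2, 2, 1, 1, 3, 2, 1]

# Weighted averaging accounts for the class imbalance described in Section 4.2.
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="weighted", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="weighted", zero_division=0))
```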

7. Results

After the PCA process for the seven-indicator system, the modeling effect is shown in Figure 8.
The rate of change of model effects after PCA treatment compared to model effects before PCA treatment is shown in Table 6:
In this study, six machine learning models, namely, XGBoost, LightGBM, CatBoost, RF, SVM, and MLP, were systematically evaluated for their rockburst prediction performance under different indicator dimensions (seven to three indicators), and the results are shown in Table 7, Table 8, Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14 and Figure 9, Figure 10, Figure 11, Figure 12, Figure 13, Figure 14, Figure 15 and Figure 16.

8. Discussion

(1) Aspects of the multicollinearity problem.
As can be seen from Figure 8 and Table 6 of the modeling results after PCA treatment, the modeling effects of XGBoost, LightGBM, CatBoost, and RF decreased after PCA treatment of the indicator system, with the most significant decrease in CatBoost modeling effects. On the contrary, there was a very small improvement in the model effect of SVM and MLP. It is especially noteworthy that XGBoost, LightGBM, CatBoost, and RF are all tree models.
In fact, tree models are good at capturing nonlinear relationships (e.g., interactions and segmentation functions) between features and targets [36], which may be destroyed by linear transformations of PCA. Meanwhile, the tree model itself can effectively deal with the multicollinearity problem [37]. Therefore, the PCA processing of the data before inputting them into XGBoost, LightGBM, CatBoost, and RF may rather reduce the model effect. The experimental results of this study verify this point of view as well. In addition, Cha et al. [38] showed similar results in their experiments: the decision tree (DT) model R2 was 0.872, while the principal component analysis–decision tree (PCA-DT) model R2 was 0.849, and the model effect was worse after PCA treatment. DT also belongs to the tree model. It can be seen that tree models (XGBoost, LightGBM, CatBoost, and RF) inherently have excellent ability to deal with nonlinear relationships and multicollinearity problems, so there is often less need to consider the multicollinearity problem when using tree models for prediction tasks.
In addition, for SVM and MLP, although the modeling effect is improved after PCA treatment, the improvement is very limited, with a maximum improvement of only 2.38%. This indicates that the multicollinearity problem of the original data is not serious.
(2) Comparison of the performance of the models.
As can be seen from Table 7, Table 8, Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14, in the I-I system, CatBoost had the highest average accuracy (0.6950), average precision (0.7106), average recall (0.6950), and average F1 score (0.6944), which was significantly better than the other models. In the I-R system, CatBoost also performed outstandingly, with the highest average accuracy (0.6913), average precision (0.7052), average recall (0.6913), and average F1 score (0.6907). In addition, CatBoost’s stability was significantly better than other models, both in the I-I system and the I-R system. CatBoost’s excellent performance may be attributed to its efficient processing of class features and its ability to resist overfitting, which can better capture nonlinear relationships in rockburst prediction.
LightGBM’s overall performance in the I-I and I-R systems was second only to CatBoost, with average accuracies of 0.6801 and 0.6863, respectively. It is also worth noting that LightGBM’s training speed was significantly faster than that of the other models, which makes it suitable for real-world application scenarios that require fast responses.
The average accuracy of XGBoost in the I-I and I-R systems was 0.6776 and 0.6751, respectively, which is a moderate but stable performance.
RF performed well in the I-I system, with an average accuracy (0.6873) second only to CatBoost, but RF’s performance in the I-R system was significantly degraded, with a more pronounced drop in performance in the low-dimensional system.
SVM and MLP performed poorly in all the index systems. As can be seen from Figure 9, Figure 10, Figure 11 and Figure 12, the performance indicators of SVM and MLP models in all aspects were significantly inferior to the other four models, especially in the I-R system, where the performance was even worse.
(3) Comparisons across indicator systems.
Under the I-I system, as can be seen in Table 7, Table 8, Table 9, Table 10, Table 11, Table 12, Table 13 and Table 14, the average accuracy of all models decreased from 0.6632 to 0.6529 when the dimensionality was reduced from seven to four, which is a relatively small decrease (about 1.5%). CatBoost and RF showed the smallest performance fluctuations as the dimensionality decreased, which indicates that they are more robust to indicator redundancy. The performances of the six- and five-dimensional systems were close to those of the seven-dimensional system. As can be seen from Figure 13, Figure 14, Figure 15 and Figure 16, the I-I system was somewhat less stable, with a more pronounced decrease in model performance in the four-dimensional system and a sudden rise in the three-dimensional system. The significant decrease in the four-dimensional system suggests that oversimplification may lead to the loss of key information. The sudden increase for the three-dimensional indicators may be because the reduction from four to three dimensions happens to retain the decisive indicators that are strongly correlated with rockbursts while eliminating redundant or noisy features.
In the I-R system, the average accuracy decreased from 0.6632 to 0.6508 when decreasing from seven dimensions to three dimensions, which is slightly larger than that of the I-I system (about 1.9%). As shown in Figure 13, Figure 14, Figure 15 and Figure 16, the I-R system was more sensitive to changes in dimensionality, especially in the four- and three-dimensional systems, where the decrease in performance is more pronounced, with a steady downward trend in general.
Comparing the four performance indicators in Figure 9, Figure 10, Figure 11 and Figure 12, the overall average performance of the I-I system was better than that of the I-R system, but the I-I system was less stable. In the four- and three-dimensional systems, although the average performance of the I-R system was not as good as that of the I-I system, the I-R system was more stable, with a gentler decline as the dimensionality decreased and no abrupt changes. This may be because the I-R system selects its indicators according to their correlation with the rockburst grade.
In general, the performance of the six-dimensional systems of both the I-I and I-R systems was very close to that of the seven-dimensional system. The performance of the five-dimensional systems decreased noticeably but was still better than that of the four- and three-dimensional systems. Because the six-dimensional systems of the I-I and I-R systems reduce indicator redundancy while keeping performance close to that of the seven-dimensional system, they are more suitable for practical application. If higher requirements are placed on model performance, the seven-dimensional system can be selected. However, the performance of the five-, four-, and three-dimensional systems of the I-R system declines significantly, so they are not recommended for actual engineering applications.

9. Summary

9.1. Conclusions

This study draws the following conclusions by comparing the performance of different models and indicator systems in rockburst prediction:
(1) Tree models (e.g., CatBoost, LightGBM, etc.) are naturally resistant to multicollinearity, and PCA preprocessing will destroy their nonlinear feature relationships, leading to performance degradation; when using tree models, the original features can be directly retained, avoiding unnecessary dimensionality reduction processing.
(2) Model performance: CatBoost has the best overall performance (highest accuracy and stability), LightGBM ranks second and trains efficiently, XGBoost and RF are stable, and SVM and MLP lag behind significantly.
(3) Indicator system: The six-dimensional system is suitable for practical applications, as it reduces redundancy while retaining performance (close to seven-dimensional); seven-dimensional is suitable for high-precision needs, while five-dimensional or less may lead to loss of information and model performance degradation; the I-I system is better in performance but fluctuates a lot, and the I-R system is more stable.

9.2. Significance and Contribution of This Study

(1) For the first time, the natural resistance mechanism of tree models to multicollinearity has been systematically verified, and it is made clear that linear dimensionality reduction methods, such as PCA, destroy the nonlinear feature relationships that tree models depend on.
(2) The empirical study shows that the traditional feature selection method based on low correlation between indicators does not significantly improve the prediction performance. This finding challenges the feature selection paradigm commonly adopted in the current rockburst prediction research and provides a new theoretical basis for subsequent research.
(3) By introducing a feature selection method based on the correlation between the indicators and the rockburst grade (the I-R system) and verifying its predictive efficacy, this study provides an alternative approach for constructing an indicator system for rockburst prediction and addresses the lack of exploration of this approach in existing studies.
(4) For the first time, the differences in the predictive performance of different dimensional index systems have been systematically evaluated, and the advantages of the six-dimensional system in balancing the accuracy of the model and the engineering practicability have been clarified, which provides a reliable basis for decision-making in practical engineering applications.

9.3. Limitations

(1) Only seven indicators commonly used in rockburst prediction (e.g., σθ, σc, etc.) were considered, and environmental factors such as geological formations, groundwater, etc., were not included, which may have omitted key predictive variables.
(2) Uneven distribution of samples (53 cases without rockburst vs. 120 cases with moderate rockburst) may affect the model’s generalization ability.
(3) The conclusions of this study were drawn based on a specific dataset (N = 330), and although internal validity was ensured through cross-validation, the findings may be subject to a certain degree of chance due to the singularity of the sample source (which were all derived from literature cases). There is still a need to verify the generalizability of the conclusions by other means in the future.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app15126466/s1, Table S1: Collected rockburst database.

Author Contributions

Conceptualization, J.C. and X.X.; methodology, J.C. and X.X.; software, J.C.; validation, J.C., X.X.; formal analysis, J.C.; investigation, J.C.; resources, J.C. and X.X.; data curation, J.C.; writing—original draft preparation, J.C.; writing—review and editing, X.X.; visualization, J.C.; supervision, X.X.; project administration, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article and Supplementary Materials.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhou, J.; Zhang, Y.; Li, C.; He, H.; Li, X. Rockburst prediction and prevention in underground space excavation. Undergr. Space 2024, 14, 70–98. [Google Scholar] [CrossRef]
  2. Keneti, A.; Sainsbury, B.-A. Review of published rockburst events and their contributing factors. Eng. Geol. 2018, 246, 361–373. [Google Scholar] [CrossRef]
  3. Roy, J.M.; Eberhardt, E.; Bewick, R.P.; Campbell, R. Application of data analysis techniques to identify rockburst mechanisms, triggers, and contributing factors in cave mining. Rock Mech. Rock Eng. 2023, 56, 2967–3002. [Google Scholar] [CrossRef]
  4. Askaripour, M.; Saeidi, A.; Rouleau, A.; Mercier-Langevin, P. Rockburst in underground excavations: A review of mechanism, classification, and prediction methods. Undergr. Space 2022, 7, 577–607. [Google Scholar] [CrossRef]
  5. Sun, J.; Wang, W.; Xie, L. Predicting short-term rockburst using RF–CRITIC and improved cloud model. Nat. Resour. Res. 2024, 33, 471–494. [Google Scholar] [CrossRef]
  6. Shen, Y.; Wu, S.; Wang, Y.; Wang, J.; Yang, Z. Interpretable model for rockburst intensity prediction based on Shapley values-based Optuna-random forest. Undergr. Space 2025, 21, 198–214. [Google Scholar] [CrossRef]
  7. Li, D.; Liu, Z.; Armaghani, D.J.; Xiao, P.; Zhou, J. Novel ensemble tree solution for rockburst prediction using deep forest. Mathematics 2022, 10, 787. [Google Scholar] [CrossRef]
  8. Liu, H.; Ma, T.; Lin, Y.; Peng, K.; Hu, X.; Xie, S.; Luo, K. Deep learning in rockburst intensity level prediction: Performance evaluation and comparison of the NGO-CNN-BiGRU-attention model. Appl. Sci. 2024, 14, 5719. [Google Scholar] [CrossRef]
  9. Jia, Z.-C.; Wang, Y.; Wang, J.-H.; Pei, Q.-Y.; Zhang, Y.-Q. Rockburst Intensity Grade Prediction Based on Data Preprocessing Techniques and Multi-model Ensemble Learning Algorithms. Rock Mech. Rock Eng. 2024, 57, 5207–5227. [Google Scholar] [CrossRef]
  10. Xue, Y.; Bai, C.; Qiu, D.; Kong, F.; Li, Z. Predicting rockburst with database using particle swarm optimization and extreme learning machine. Tunn. Undergr. Space Technol. 2020, 98, 103287. [Google Scholar] [CrossRef]
  11. Shukla, R.; Khandelwal, M.; Kankar, P.K. Prediction and Assessment of Rock Burst Using Various Meta-heuristic Approaches. Mining, Met. Explor. 2021, 38, 1375–1381. [Google Scholar] [CrossRef]
  12. Lin, S.; Liang, Z.; Dong, M.; Guo, H.; Zheng, H. Imbalanced rock burst assessment using variational autoencoder-enhanced gradient boosting algorithms and explainability. Undergr. Space 2024, 17, 226–245. [Google Scholar] [CrossRef]
  13. Faradonbeh, R.S.; Vaisey, W.; Sharifzadeh, M.; Zhou, J. Hybridized intelligent multi-class classifiers for rockburst risk assessment in deep underground mines. Neural Comput. Appl. 2023, 36, 1681–1698. [Google Scholar] [CrossRef]
  14. Armaghani, D.J.; Yang, P.; He, X.; Pradhan, B.; Zhou, J.; Sheng, D. Toward Precise Long-Term Rockburst Forecasting: A Fusion of SVM and Cutting-Edge Meta-heuristic Algorithms. Nat. Resour. Res. 2024, 33, 2037–2062. [Google Scholar] [CrossRef]
  15. Wang, J.-C.; Dong, L.-J. Risk assessment of rockburst using SMOTE oversampling and integration algorithms under GBDT framework. J. Central South Univ. 2024, 31, 2891–2915. [Google Scholar] [CrossRef]
  16. Zhou, J.; Li, X.; Mitri, H.S. Classification of rockburst in underground projects: Comparison of ten supervised learning methods. J. Comput. Civ. Eng. 2016, 30, 0401600. [Google Scholar] [CrossRef]
  17. Pu, Y.; Apel, D.B.; Xu, H. Rockburst prediction in kimberlite with unsupervised learning method and support vector classifier. Tunn. Undergr. Space Technol. 2019, 90, 12–18. [Google Scholar] [CrossRef]
  18. Liu, R.; Ye, Y.; Hu, N.; Chen, H.; Wang, X. Classified prediction model of rockburst using rough sets-normal cloud. Neural Comput. Appl. 2019, 31, 8185–8193. [Google Scholar] [CrossRef]
  19. Xue, Y.; Li, Z.; Li, S.; Qiu, D.; Tao, Y.; Wang, L.; Yang, W.; Zhang, K. Prediction of rock burst in underground caverns based on rough set and extensible comprehensive evaluation. Bull. Eng. Geol. Environ. 2019, 78, 417–429. [Google Scholar] [CrossRef]
  20. Wu, S.; Wu, Z.; Zhang, C. Rock burst prediction probability model based on case analysis. Tunn. Undergr. Space Technol. 2019, 93, 103069. [Google Scholar] [CrossRef]
  21. Du, Z.; Xu, M.; Liu, Z.; Xuan, W. Laboratory integrated evaluation method for engineering wall rock rock-burst. Gold 2006, 27, 26–30. [Google Scholar]
  22. Jia, Q.; Wu, L.; Li, B.; Chen, C.; Peng, Y. The comprehensive prediction model of rockburst tendency in tunnel based on optimized unascertained measure theory. Geotech. Geol. Eng. 2019, 37, 3399–3411. [Google Scholar] [CrossRef]
  23. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar] [CrossRef]
  24. Wu, Y.; Xie, Y.; Xu, F.; Zhu, X.; Liu, S. A runoff-based hydroelectricity prediction method based on meteorological similar days and XGBoost model. Front. Energy Res. 2024, 11, 1273805. [Google Scholar] [CrossRef]
  25. Ren, C.; Yue, W.T.; Liang, X.Y.; Liang, Y.J.; Liang, J.Y.; Lin, X.Q. Risk assessment of wildfire disaster in Guilin based on XGBoost and combined weight method. J. Saf. Environ. 2023, 18, 1–9. [Google Scholar]
  26. Xu, J.; Sun, C.; Rui, G. NSGA–III–XGBoost-Based Stochastic Reliability Analysis of Deep Soft Rock Tunnel. Appl. Sci. 2024, 14, 2127. [Google Scholar] [CrossRef]
  27. Joharestani, M.Z.; Cao, C.; Ni, X.; Bashir, B.; Talebiesfandarani, S. PM2.5 Prediction Based on Random Forest, XGBoost, and Deep Learning Using Multisource Remote Sensing Data. Atmosphere 2019, 10, 373. [Google Scholar] [CrossRef]
  28. Zhou, H.; Chen, S.; Li, H.; Liu, T.; Wang, H. Rockburst prediction for hard rock and deep-lying long tunnels based on the entropy weight ideal point method and geostress field inversion: A case study of the Sangzhuling Tunnel. Bull. Eng. Geol. Environ. 2021, 80, 3885–3902. [Google Scholar] [CrossRef]
  29. Hussain, S.; Mustafa, M.W.; Jumani, T.A.; Baloch, S.K.; Alotaibi, H.; Khan, I.; Khan, A. A novel feature engineered-CatBoost-based supervised machine learning framework for electricity theft detection. Energy Rep. 2021, 7, 4425–4436. [Google Scholar] [CrossRef]
  30. Zhang, X.; Xie, H.; Xu, Z.; Li, Z.; Chen, B. Evaluating landslide susceptibility: An AHP method-based approach enhanced with optimized random forest modeling. Nat. Hazards 2024, 120, 8153–8207. [Google Scholar] [CrossRef]
  31. Zhou, J.; Yang, P.; Peng, P.; Khandelwal, M.; Qiu, Y. Performance evaluation of rockburst prediction based on PSO-SVM, HHO-SVM, and MFO-SVM hybrid models. Mining Met. Explor. 2023, 40, 617–635. [Google Scholar] [CrossRef]
  32. Aidoni, A.; Kofidis, K.; Cocianu, C.L.; Avram, L. Deep Learning Models for Natural Gas Demand Forecasting: A Comparative Study of MLP, CNN, and LSTM. Romanian J. Pet. Gas Technol. 2023, 4, 133–148. [Google Scholar] [CrossRef]
  33. Clarke, A.; Giljarhus, K.E.T.; Oggiano, L.; Saddington, A.; Depuru-Mohan, K. MLP-mixer-based deep learning network for pedestrian-level wind assessment. Environ. Data Sci. 2024, 3, e35. [Google Scholar] [CrossRef]
  34. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar] [CrossRef]
  35. Jin, A.; Basnet, P.; Mahtab, S. Evaluation of Short-Term Rockburst Risk Severity Using Machine Learning Methods. Big Data Cogn. Comput. 2023, 7, 172. [Google Scholar] [CrossRef]
  36. Zhang, Y.; Zhang, R.; Ma, Q.; Wang, Y.; Wang, Q.; Huang, Z.; Huang, L. A feature selection and multi-model fusion-based approach of predicting air quality. ISA Trans. 2020, 100, 210–220. [Google Scholar] [CrossRef] [PubMed]
  37. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813. [Google Scholar] [CrossRef]
  38. Cha, G.-W.; Choi, S.-H.; Hong, W.-H.; Park, C.-W. Developing a prediction model of demolition-waste generation-rate via principal component analysis. Int. J. Environ. Res. Public Health 2023, 20, 3159. [Google Scholar] [CrossRef]
Figure 1. Distribution of rockburst categories.
Figure 2. Violin plot of rockburst data distribution.
Figure 3. Distribution of correlations between indicators and density of distribution of indicators by rockburst class.
Figure 4. Pearson’s correlation coefficient plot for rockburst data.
Figure 5. Flow chart of PCA processing.
Figure 6. Variance explained by each principal component.
Figure 7. Flowchart of model prediction.
Figure 8. Comparison of model effect before and after PCA treatment.
Figure 9. (a) Indicator–indicator accuracy; (b) indicator–rockburst level accuracy.
Figure 10. (a) Indicator–indicator precision rate; (b) indicator–rockburst rating precision rate.
Figure 11. (a) Indicator–indicator recall rate; (b) indicator–rockburst rating recall rate.
Figure 12. (a) Indicator–indicator F1 score; (b) indicator–rockburst rating F1 score.
Figure 13. Model accuracy by indicator system.
Figure 14. Model precision by indicator system.
Figure 15. Model recall by indicator system.
Figure 16. Model F1 scores for each indicator system.
Table 1. Data sources.

| References | Number of Cases | None | Light | Moderate | Strong |
|---|---|---|---|---|---|
| Zhou et al. [16] | 247 | 43 | 78 | 82 | 44 |
| Pu et al. [17] | 12 | 0 | 1 | 11 | 0 |
| Liu et al. [18] | 16 | 3 | 4 | 8 | 1 |
| Xue et al. [19] | 20 | 3 | 7 | 7 | 3 |
| Wu et al. [20] | 7 | 0 | 1 | 5 | 1 |
| Du et al. [21] | 7 | 1 | 2 | 0 | 4 |
| Jia et al. [22] | 6 | 0 | 3 | 3 | 0 |
| Xue et al. [10] | 15 | 3 | 5 | 4 | 3 |
| Sum | 330 | 53 | 101 | 120 | 56 |
Table 2. Criteria for categorizing rockbursts.

| Rockburst Label | Sign | Failure Characteristics |
|---|---|---|
| None | I | There is no sound of rockbursting or rock falling. |
| Light | II | The rock surrounding the area displays signs of spalling, cracking, or striping. There is no evidence of ejection, and the sound is weak. |
| Moderate | III | The rocks surrounding the area exhibit significant deformation and fracturing, resulting in loose rock chips and sudden destruction. This is often accompanied by a crunchy squeaking sound, which is common in local caverns within the surrounding rock. |
| Strong | IV | The rocks surrounding the tunnel are severely fractured and suddenly propelled into it, accompanied by strong bursts, roars, air jets, and other storm phenomena. This causes rapid expansion into the deep surrounding rocks. |
Table 3. Calculation of statistical analysis indicators for each indicator.

| Grade | Statistical Indicator | σθ | σc | σt | SCF | B1 | B2 | Wet |
|---|---|---|---|---|---|---|---|---|
| None | STD | 16.45 | 49.86 | 3.90 | 0.25 | 12.27 | 0.07 | 1.96 |
| None | Kurt | 1.59 | 1.55 | 0.55 | 1.78 | −0.71 | −0.59 | 0.54 |
| None | Max. | 77.69 | 241.00 | 17.66 | 1.05 | 47.93 | 1.00 | 7.80 |
| None | Min | 2.60 | 20.00 | 0.40 | 0.05 | 5.38 | 0.69 | 0.81 |
| None | Mean | 25.41 | 102.07 | 6.05 | 0.30 | 20.56 | 0.87 | 2.80 |
| None | Median | 22.22 | 97.49 | 5.12 | 0.22 | 17.85 | 0.90 | 2.06 |
| None | Range | 75.09 | 221.00 | 17.26 | 1.00 | 42.55 | 0.31 | 6.99 |
| Light | STD | 20.84 | 39.71 | 3.94 | 0.19 | 10.16 | 0.07 | 1.55 |
| Light | Kurt | 1.68 | 0.96 | 3.54 | −0.18 | 3.82 | 15.87 | 1.23 |
| Light | Max. | 126.72 | 263.00 | 22.60 | 0.90 | 69.69 | 0.97 | 9.00 |
| Light | Min | 13.50 | 30.00 | 1.90 | 0.10 | 2.52 | 0.43 | 0.85 |
| Light | Mean | 44.67 | 117.10 | 6.76 | 0.41 | 21.38 | 0.89 | 3.71 |
| Light | Median | 43.42 | 116.89 | 6.16 | 0.38 | 22.89 | 0.92 | 3.19 |
| Light | Range | 113.22 | 233.00 | 20.70 | 0.80 | 67.17 | 0.54 | 8.15 |
| Moderate | STD | 23.10 | 43.37 | 3.82 | 0.20 | 16.32 | 0.05 | 2.71 |
| Moderate | Kurt | −0.06 | −0.26 | 0.20 | 1.75 | 3.21 | 4.45 | 18.99 |
| Moderate | Max. | 118.77 | 237.20 | 17.66 | 1.27 | 80.00 | 0.98 | 21.00 |
| Moderate | Min | 13.02 | 30.00 | 1.30 | 0.10 | 0.15 | 0.69 | 1.20 |
| Moderate | Mean | 51.47 | 116.47 | 6.16 | 0.47 | 25.03 | 0.90 | 5.07 |
| Moderate | Median | 50.90 | 112.50 | 5.26 | 0.47 | 21.69 | 0.91 | 5.00 |
| Moderate | Range | 105.75 | 207.20 | 16.36 | 1.17 | 79.85 | 0.29 | 19.80 |
| Strong | STD | 83.11 | 52.37 | 4.67 | 1.17 | 5.94 | 0.06 | 6.02 |
| Strong | Kurt | 0.00 | 1.54 | −0.27 | 1.75 | 1.72 | 0.89 | 4.60 |
| Strong | Max. | 297.80 | 304.20 | 22.60 | 4.87 | 32.20 | 0.94 | 30.00 |
| Strong | Min | 16.43 | 30.00 | 2.50 | 0.10 | 5.53 | 0.69 | 2.03 |
| Strong | Mean | 119.65 | 129.08 | 10.34 | 1.18 | 14.12 | 0.85 | 8.91 |
| Strong | Median | 91.37 | 127.09 | 10.27 | 0.72 | 13.27 | 0.86 | 7.20 |
| Strong | Range | 281.37 | 274.20 | 20.10 | 4.77 | 26.67 | 0.25 | 27.97 |
Table 4. Division of the evaluation indicator system.

| Delineation Criteria | Indicator–Indicator Correlation | Indicator–Rockburst Grade Correlation |
|---|---|---|
| 7 indicators | σθ, σc, σt, SCF, B1, B2, Wet | σθ, σc, σt, SCF, B1, B2, Wet |
| 6 indicators | σθ, σc, σt, B1, B2, Wet | σθ, σc, σt, SCF, B2, Wet |
| 5 indicators | σθ, σc, σt, B1, Wet | σθ, σc, σt, SCF, Wet |
| 4 indicators | σθ, σc, σt, Wet | σθ, σt, SCF, Wet |
| 3 indicators | σθ, σc, Wet | σθ, SCF, Wet |
Table 5. Hyperparametric search space.

| Model | Hyperparameter | Search Space | Parameter Type |
|---|---|---|---|
| XGBoost | n_estimators | [200, 1000] | int |
| | learning_rate | [0.01, 0.3] | float |
| | max_depth | [3, 10] | int |
| | subsample | [0.6, 1.0] | float |
| | colsample_bytree | [0.6, 1.0] | float |
| | gamma | [0, 0.5] | float |
| | reg_alpha | [0, 1] | float |
| | reg_lambda | [0, 1] | float |
| LightGBM | num_boost_round | [200, 1000] | int |
| | num_leaves | [20, 100] | int |
| | learning_rate | [0.01, 0.3] | float |
| | feature_fraction | [0.6, 1.0] | float |
| | bagging_fraction | [0.6, 1.0] | float |
| | bagging_freq | [1, 10] | int |
| | min_child_samples | [5, 100] | int |
| | reg_alpha | [0, 1] | float |
| | reg_lambda | [0, 1] | float |
| CatBoost | iterations | [200, 1000] | int |
| | depth | [2, 12] | int |
| | learning_rate | [0.01, 0.3] | float |
| | l2_leaf_reg | [1, 12] | float |
| | border_count | [32, 256] | int |
| RF | n_estimators | [200, 1000] | int |
| | max_depth | [10, 50] | int |
| | min_samples_split | [2, 10] | int |
| | min_samples_leaf | [1, 10] | int |
| | max_features | [‘sqrt’, ‘log2’, None] | categorical |
| SVM | C | [0.01, 10.0] | float |
| | kernel | [‘linear’, ‘rbf’, ‘poly’] | categorical |
| | gamma | [‘scale’, ‘auto’] | categorical |
| | degree | [2, 5] | int |
| MLP | epochs | [200, 1000] | int |
| | units_layer1 | [32, 128] | int |
| | units_layer2 | [16, 64] | int |
| | activation | [‘relu’, ‘tanh’] | categorical |
| | learning_rate | [1 × 10−4, 1 × 10−3] | float |
| | batch_size | [16, 32] | int |
| | dropout_rate | [0.2, 0.5] | float |
Table 6. Rate of change of model effect before and after PCA treatment.

| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| XGBoost | −3.69% | −4.76% | −3.69% | −3.50% |
| LightGBM | −2.73% | −3.86% | −2.73% | −2.31% |
| CatBoost | −6.60% | −6.38% | −6.60% | −6.63% |
| RF | −4.04% | −3.95% | −4.04% | −4.31% |
| SVM | 1.48% | 2.34% | 1.48% | 1.52% |
| MLP | 2.13% | 1.84% | 2.13% | 2.38% |
Table 7. Indicator–indicator accuracy (I-I).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6751 | 0.6843 | 0.6783 | 0.6658 | 0.6844 | 0.6776 |
| LightGBM | 0.6844 | 0.6907 | 0.6907 | 0.6629 | 0.6719 | 0.6801 |
| CatBoost | 0.7093 | 0.6937 | 0.6906 | 0.6907 | 0.6907 | 0.6950 |
| RF | 0.6937 | 0.6843 | 0.6904 | 0.6842 | 0.6841 | 0.6873 |
| SVM | 0.6346 | 0.6316 | 0.6192 | 0.6162 | 0.6377 | 0.6279 |
| MLP | 0.5821 | 0.6099 | 0.6007 | 0.5976 | 0.6314 | 0.6043 |
| Mean | 0.6632 | 0.6658 | 0.6617 | 0.6529 | 0.6667 | -- |

Table 8. Indicator–rockburst accuracy (I-R).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6751 | 0.6751 | 0.6626 | 0.6813 | 0.6813 | 0.6751 |
| LightGBM | 0.6844 | 0.6907 | 0.6844 | 0.6846 | 0.6875 | 0.6863 |
| CatBoost | 0.7093 | 0.6938 | 0.6843 | 0.6815 | 0.6876 | 0.6913 |
| RF | 0.6937 | 0.6844 | 0.6812 | 0.6722 | 0.6597 | 0.6782 |
| SVM | 0.6346 | 0.6349 | 0.6286 | 0.6131 | 0.5976 | 0.6218 |
| MLP | 0.5821 | 0.5885 | 0.6070 | 0.5851 | 0.5913 | 0.5908 |
| Mean | 0.6632 | 0.6612 | 0.6580 | 0.6530 | 0.6508 | -- |

Table 9. Indicator–indicator precision rate (I-I).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6976 | 0.7056 | 0.6990 | 0.6836 | 0.6996 | 0.6971 |
| LightGBM | 0.7041 | 0.7098 | 0.7040 | 0.6823 | 0.6792 | 0.6959 |
| CatBoost | 0.7262 | 0.7110 | 0.7044 | 0.7045 | 0.7067 | 0.7106 |
| RF | 0.7089 | 0.7085 | 0.7127 | 0.7006 | 0.7015 | 0.7064 |
| SVM | 0.6499 | 0.6494 | 0.6324 | 0.6230 | 0.6519 | 0.6413 |
| MLP | 0.6073 | 0.6288 | 0.6282 | 0.6275 | 0.6479 | 0.6279 |
| Mean | 0.6823 | 0.6855 | 0.6801 | 0.6703 | 0.6811 | -- |
Table 10. Indicator–rockburst precision rate (I-R).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6976 | 0.7031 | 0.6826 | 0.7024 | 0.6999 | 0.6971 |
| LightGBM | 0.7041 | 0.7105 | 0.6963 | 0.7010 | 0.6966 | 0.7017 |
| CatBoost | 0.7262 | 0.7077 | 0.6961 | 0.6996 | 0.6966 | 0.7052 |
| RF | 0.7089 | 0.7026 | 0.7002 | 0.6842 | 0.6718 | 0.6935 |
| SVM | 0.6499 | 0.6496 | 0.6404 | 0.6345 | 0.6156 | 0.6380 |
| MLP | 0.6073 | 0.6236 | 0.6264 | 0.6123 | 0.6239 | 0.6187 |
| Mean | 0.6823 | 0.6829 | 0.6737 | 0.6723 | 0.6674 | -- |
Table 11. Indicator–indicator recall rate (I-I).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6751 | 0.6843 | 0.6783 | 0.6658 | 0.6844 | 0.6776 |
| LightGBM | 0.6844 | 0.6907 | 0.6907 | 0.6629 | 0.6719 | 0.6801 |
| CatBoost | 0.7093 | 0.6937 | 0.6906 | 0.6907 | 0.6907 | 0.6950 |
| RF | 0.6937 | 0.6843 | 0.6904 | 0.6842 | 0.6841 | 0.6873 |
| SVM | 0.6346 | 0.6316 | 0.6192 | 0.6162 | 0.6377 | 0.6279 |
| MLP | 0.5821 | 0.6099 | 0.6007 | 0.5976 | 0.6314 | 0.6043 |
| Mean | 0.6632 | 0.6658 | 0.6617 | 0.6529 | 0.6667 | -- |

Table 12. Indicator–rockburst recall rate (I-R).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6751 | 0.6751 | 0.6626 | 0.6813 | 0.6813 | 0.6751 |
| LightGBM | 0.6844 | 0.6907 | 0.6844 | 0.6846 | 0.6875 | 0.6863 |
| CatBoost | 0.7093 | 0.6938 | 0.6843 | 0.6815 | 0.6876 | 0.6913 |
| RF | 0.6937 | 0.6844 | 0.6812 | 0.6722 | 0.6597 | 0.6782 |
| SVM | 0.6346 | 0.6349 | 0.6286 | 0.6131 | 0.5976 | 0.6218 |
| MLP | 0.5821 | 0.5885 | 0.6070 | 0.5851 | 0.5913 | 0.5908 |
| Mean | 0.6632 | 0.6612 | 0.6580 | 0.6530 | 0.6508 | -- |

Table 13. Indicator–indicator F1 score (I-I).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6740 | 0.6828 | 0.6776 | 0.6652 | 0.6803 | 0.6760 |
| LightGBM | 0.6809 | 0.6863 | 0.6887 | 0.6584 | 0.6699 | 0.6768 |
| CatBoost | 0.7085 | 0.6925 | 0.6905 | 0.6894 | 0.6911 | 0.6944 |
| RF | 0.6907 | 0.6826 | 0.6873 | 0.6810 | 0.6816 | 0.6846 |
| SVM | 0.6335 | 0.6305 | 0.6177 | 0.6143 | 0.6328 | 0.6258 |
| MLP | 0.5791 | 0.6087 | 0.5976 | 0.5970 | 0.6314 | 0.6028 |
| Mean | 0.6611 | 0.6639 | 0.6599 | 0.6509 | 0.6645 | -- |

Table 14. Indicator–rockburst F1 score (I-R).

| Model | 7 | 6 | 5 | 4 | 3 | Mean |
|---|---|---|---|---|---|---|
| XGBoost | 0.6740 | 0.6738 | 0.6603 | 0.6813 | 0.6805 | 0.6740 |
| LightGBM | 0.6809 | 0.6887 | 0.6826 | 0.6839 | 0.6852 | 0.6843 |
| CatBoost | 0.7085 | 0.6928 | 0.6842 | 0.6814 | 0.6867 | 0.6907 |
| RF | 0.6907 | 0.6836 | 0.6807 | 0.6721 | 0.6576 | 0.6769 |
| SVM | 0.6335 | 0.6324 | 0.6255 | 0.6091 | 0.5957 | 0.6192 |
| MLP | 0.5791 | 0.5851 | 0.6059 | 0.5816 | 0.5845 | 0.5872 |
| Mean | 0.6611 | 0.6594 | 0.6565 | 0.6516 | 0.6484 | -- |
