Evaluation of Short-Term Rockburst Risk Severity Using Machine Learning Methods

Abstract: In deep engineering, rockburst hazards frequently result in injuries, fatalities, and the destruction of contiguous structures. Due to the complex nature of rockbursts, predicting the severity of rockburst damage (intensity) without the aid of computer models is challenging. Although there are various predictive models in existence, effectively identifying the risk severity in imbalanced data remains crucial. The ensemble boosting method is often better suited to dealing with unequally distributed classes than are classical models. Therefore, this paper employs the ensemble categorical gradient boosting (CGB) method to predict short-term rockburst risk severity. After data collection, principal component analysis (PCA) was employed to avoid the redundancies caused by multi-collinearity. Afterwards, the CGB was trained on PCA data, optimal hyper-parameters were retrieved using the grid-search technique to predict the test samples, and performance was evaluated using precision, recall, and F1 score metrics. The results showed that the PCA-CGB model achieved better results in prediction than did the single CGB model or conventional boosting methods. The model achieved an F1 score of 0.8952, indicating that the proposed model is robust in predicting damage severity given an imbalanced dataset. This work provides practical guidance in risk management.


Introduction
Deep underground engineering is becoming more common in mine production, tunnel construction, and the construction of various subsurface structures. This trend has led to more frequent encounters with highly stressed geological conditions [1]. As a result, the seismically active environment has led to numerous geological hazards, such as rockbursts. A rockburst is a progressive failure process wherein a rock mass ruptures due to the sudden release of a large quantity of stored elastic energy in highly stressed rocks. Casualties and the failure of engineering structures then result from the sudden ejection of surrounding rocks [2]. Rockbursts are becoming more prevalent worldwide as mines delve deeper; as a result, accidents are becoming more common [3]. In central Europe, 42 seismically active mines reported approximately 190 rockbursts that caused 122 casualties over the last two decades [4]. Deep gold fields in Western Australia and the Beaconsfield mine in Tasmania have also experienced fatalities [3]. The Taiping headrace tunnels in China have experienced over 400 rockburst incidents, resulting in several casualties and the destruction of mechanical equipment [5]. Numerous countries have faced rockburst problems in mines, tunnels, shafts, and caverns [6,7]. To ensure the safety of personnel, various approaches have been implemented for the real-time monitoring of short-term rockburst risk.
Microgravity, electromagnetic radiation, acoustic emissions, and microseismic (MS) monitoring methods are commonly employed to generate early warnings of short-term rockburst risk [8,9]. Among these techniques, the MS technique has been extensively used in deep engineering excavation to warn of short-term rockburst risks by studying the results of various multi-parameter MS methods using experimental, probabilistic, and fractal-theory approaches [10-12]. For instance, Feng et al. examined the fractal behaviour of the energy distribution of microseismic events during the development of immediate rockbursts. The results indicated that, as the rockburst approached, the daily energy fractal dimension for MS events increased [11]. Additionally, Yu et al. investigated the fractal behaviour of the time distribution of MS events for different intensities of rockbursts. The results indicated that time-fractal characteristics could be used to estimate rockburst intensity and that a smaller time-fractal dimension means a lower intensity [13].
Further, using the MS technique, Chen et al. collated 133 rockburst cases and established a relationship between radiated energy and burst intensity. Based on their criteria, rockburst grades were divided into five types: none, slight, moderate, intense, and highly intense [14]. Feng et al. utilised six MS parameters from real-time monitoring and established an early warning method. The proposed method was able to successfully identify the strain and strain-structure slip burst of the Jinping II hydropower project [10]. Additionally, Alcott et al. established performance criteria for MS source parameters and thresholds for daily decision-making on ground control. Those criteria were used to help identify seismically affected areas [15]. Lastly, Liu et al. observed that, before more significant events, MS apparent volume and spatial correlation length increased, while the energy index, fractal dimension, and b value decreased [6].
All the aforementioned approaches achieved significant results for the early recognition of rockbursts and could be used in early-warning systems. However, two tasks remain challenging without the aid of computer models: identifying a globally accepted threshold value for rockburst risk that applies to different site conditions, and choosing the MS parameters that indicate the various risk levels. As a result, some researchers have used a machine learning (ML) approach to predict rockburst risk. The value of ML methods is that they do not require prior knowledge of the relationship between input and output, so they can predict outcomes by studying underlying data patterns without human involvement.
Feng et al. proposed an optimised probabilistic neural network (PNN) method to predict rockburst intensity using real-time MS information. The model integrated two other algorithms to improve performance, which increased the model's accuracy in predicting test samples by 20% compared to the standard PNN model [16]. Additionally, Liang et al. developed boosting and stacking ensemble methods using real engineering datasets. Those researchers achieved significantly higher accuracy in predicting short-term rockbursts [17,18]. Further, Liu et al. presented an artificial neural network (ANN) for the dynamic updating of short-term rockburst predictions. The model was further optimised by embedding a genetic algorithm (GA) and was employed to predict 31 actual cases. The results showed that the model could correctly estimate 83.9% of rockburst cases [19]. Further, Zhao et al. built a decision tree (DT) model to predict the exact rank of the rockburst using MS information. The relationship between the MS features and rockbursts was investigated using the DT classifier, and the results showed that the model could accurately predict risk and provide insights regarding rockbursts using MS data [20]. Toksanbayev and Adoko collected 254 samples from seismically active mines and established a damage-scale classification model based on multinomial logistic regression (LR). The proposed work used regression equations to create probabilistic models for the assessment of seismic hazards in mines [21]. Lastly, Ullah et al. integrated K-means clustering with extreme gradient boosting (XGBoost) [22]. The original data were relabelled through a clustering method, and XGBoost was trained and tested to validate the model.
All the above-mentioned models have contributed significantly to improving the accuracy of prediction. Neural networks have an advantage in dealing with complex nonlinear problems; however, some neural-network models are susceptible to problems caused by irrelevancies in the data and prone to suboptimal local minima. Although the integration of multiple hybrid and complex ensemble models improves prediction accuracy, the resultant models are often difficult to understand and execute. LR and DT are simple and easy to use but have less accuracy in highly complex, nonlinear rockburst problems. Most applied methods have focused on achieving higher accuracy in predicting risk, and the microseismic dataset is comparatively small. The proportions of different intensity levels in datasets are often unequal. However, accurately classifying each risk level is crucial when classes are imbalanced. One previous study [23] shows that the boosting method (CGB) is more efficient for analysing multi-class imbalanced data in small and large datasets than are other boosting algorithms. However, the feasibility of employing CGB in short-term rockburst prediction has never been studied before, and it is necessary to develop a simple and easy-to-use classification model with promise for predicting each class level effectively.
Therefore, this work proposes a PCA-CGB classification model to create a simple and reliable approach to predicting the intensity of rockbursts. The advantage of this proposed work over the previous approach is that more data have been gathered for the study; additionally, variable redundancies are managed through unsupervised learning. Also, to precisely classify each majority or minority class, a simple model is built and performance is comprehensively evaluated using various metrics.

Materials and Methods
The flowchart of the proposed method is shown in Figure 1.

Data Collection
Rockburst data were extracted from [19,24] as a supportive database based on microseismic information. All data were obtained from underground tunnelling works and include the following six MS parameters as the feature variables: cumulative MS events (PN), logarithm of cumulative MS energy (PE), logarithm of cumulative apparent volume (PV), event rate (PNR), logarithm of energy rate (PER), and logarithm of apparent volume rate (PVR). Rockburst intensity was the output variable. The output variable has four intensity classes: none (N), slight (S), moderate (M), and intense (I). The classes of the output variable are described in Table 1 [14].

None
Crack appears inside rock mass; no obvious failure on the surface of rock mass; construction and supports are unaffected

Slight/Weak
Failure is accompanied by slight spalling and slabbing, with slight ejection of rock fragments of size 10-30 cm; failure depth is less than 0.5 m; no harm to the support system and construction if supports are provided at the time

Moderate
Failure of surrounding rock mass followed by severe slabbing and spalling; ejected-fragment size of 30-80 cm; failure sound resembles detonator blasting and lasts for some time; failure depth is more than 0.5 m and less than 1 m; shotcrete lining among rock bolts could be damaged

Intense
Extensive failure range with an ejected-fragment size of 80-150 cm; failure zone with fresh fracture plane; burst sound like an explosive with an impact wave; failure depth of 1-3 m; damage system fully destroyed and severe impact on construction

Figure 2 shows the six different features and the distribution of the four intensity levels. In Figure 2, PN represents the density of microfractures. Similarly, PE and PV represent the fracture strength and the degree of damage to the rock mass, respectively. These three parameters are basic parameters that reflect characteristics of microfractures during rockburst development [10]. To account for temporal characteristics in the mechanism, three parameters pertaining to time are considered: PNR, PER, and PVR. PNR reflects the frequency of microseismicity, the failure process of the rock mass, and the average evolutionary law of the response over time. PER represents the microseismic radiation energy of the rock mass per unit of time, and PVR is the volume of the rock in the inelastic zone of deformation per unit of time. The PE, PV, PER, and PVR values are expressed in common logarithmic form, which does not change the correlation of the data variables; the form also compresses the scale of the predictors and reduces the absolute values of the datapoints [17]. The data-acquisition method is reported in [10]. Figure 2 demonstrates that all features contain a degree of discreteness in their characteristic values. For example, the characteristic values for some intensity classes in PN, PV, PNR, and PVR show some discreteness and differ marginally in magnitude. This result arises because some microseismic activity was silent during rockburst development and the microseismic behaviour was stable and low. The precursors of the rockburst were thus not noticeable. When a rockburst occurs, microseismic activity increases suddenly and sharply, so this type of rockburst is not easy to accurately predict in an early-warning system because it is often dispersed [24], reflecting the complex mechanisms of rockburst formation.


Data Visualisation and Pre-Processing
Data visualisation, analysis, and pre-processing are critical in data science to understanding statistical information, the distribution of variables, the pattern between multivariate features and targets, and the correlation among predictors. This dataset contains 37 none, 26 slight, 23 moderate, and 13 intense rockburst cases. As we can see, the dataset is imbalanced, as the distribution of the four classes is not equal. The classes are in categorical form and, for convenience, are converted into ordinal form by assigning values of 0, 1, 2, and 3 for none, slight, moderate, and intense rockbursts, respectively. The statistical descriptions of each intensity level are summarised in Table 2. Table 2 contains the statistical parameters mean, standard deviation, minimum, and maximum for each of the four classes, and it is possible to determine how their value distribution varies across six different features. For instance, for classes 0 and 3, the minimum and maximum MS energy values range from 0.78 to 5.82 and from 4.11 to 7.09, respectively. Similar comparisons can also be made for other variables using data from the table below.
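The class counts and the ordinal encoding described above can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary names and the `encode` helper are assumptions introduced here, while the counts and codes are those stated in the text.

```python
# Class counts reported in the text (99 cases in total).
counts = {"none": 37, "slight": 26, "moderate": 23, "intense": 13}
# Ordinal codes assigned in the text.
codes = {"none": 0, "slight": 1, "moderate": 2, "intense": 3}

total = sum(counts.values())
# Share of each class in the dataset, which makes the imbalance explicit.
shares = {name: n / total for name, n in counts.items()}
# Ratio of the largest class to the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())


def encode(labels):
    """Map categorical intensity labels to their ordinal codes (hypothetical helper)."""
    return [codes[label] for label in labels]


for name in counts:
    print(f"{name}: {counts[name]} cases, {shares[name]:.1%} of the data")
print(f"largest/smallest class ratio: {imbalance_ratio:.2f}")
```

The largest class (none) is almost three times the size of the smallest (intense), which is why class-sensitive metrics are used later rather than accuracy alone.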

Histogram and Parallel Plot
A histogram provides insights into how variables are distributed or whether they are positively or negatively skewed or distorted. According to Figure 3, the values of PE, PER, and PVR resemble a Gaussian distribution, but all are slightly negatively skewed. PN and PNR are positively skewed, and PV is marginally negatively skewed. The scaling of such features often increases the performance of models.

After the descriptions of target and feature variables were individually examined, a parallel plot was used to visualise the underlying relationship between input and output variables, as shown in Figure 4. Parallel coordinate plots aid in comprehending the graphical representation of multivariate MS information [19]. The vertical axis represents each independent variable, and line graphs of different colours represent rockburst grades. Based on the plot, the following conclusions can be reached: overall, the relationship between MS parameters and rockburst risk is very complex, and there is some overlap in feature values between slight and moderate rockbursts.

Correlation Examination
Correlation represents the dependencies between two variables and measures the degree to which one fluctuates in relation to the other. Correlations can be categorised into three groups: positively correlated, uncorrelated, and negatively correlated. The Pearson correlation coefficient is often used to compute correlations among variables and is expressed in the following form:

$$r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2}\,\sqrt{\sum_{i}(y_i - \bar{y})^2}} \qquad (1)$$

where $r$ is the Pearson correlation coefficient and $x_i$ and $y_i$ are the X and Y variable samples. Likewise, $\bar{x}$ and $\bar{y}$ denote the mean values of the X and Y variables, respectively. The r value ranges from −1 to +1, and different coefficient values indicate the various degrees of correlation, as depicted in Table 3 [25].
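As a minimal sketch of how the Pearson coefficient is evaluated, the function below implements the standard definition directly in plain Python. The function name `pearson_r` and the sample values are illustrative assumptions and are not taken from the rockburst dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only (not taken from the rockburst dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

A value close to +1, as here, indicates a strong positive linear relationship; reversing the order of one sample would give a value close to −1.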

Dimensional Reduction
Principal component analysis (PCA) is a dimensionality-reduction technique that maps higher-dimensional data to a lower-dimensional space through mathematical transformation. The procedure used to conduct PCA follows the standard steps below [26]:

1. Construct the original data matrix $M = (x_{ij})_{m \times n}$, which contains m samples with n variables; $x_{ij}$ is the value of the j-th predictor for observation i.

2. Standardise the data to eliminate the effect of varying magnitudes of variables:

$$x^{*}_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j} \qquad (2)$$

where the mean and standard deviation of the j-th predictor are denoted by $\bar{x}_j$ and $s_j$, respectively.

3. Use the standardised data to compute the correlation coefficient matrix $CM = (r)_{n \times n}$, where r stands for the Pearson correlation coefficient.

4. Compute the eigenvalues and eigenvectors of the CM matrix.

5. Choose the appropriate principal components to reduce the original dimension into a lower dimension. Generally, the first few principal components with eigenvalues greater than 1 and a cumulative contribution rate above 80% are selected [26].

The contribution rate of the first p principal components is calculated as:

$$\text{contribution rate} = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k} \qquad (3)$$

where $\lambda_k$ stands for the k-th eigenvalue.
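The five steps above can be sketched with NumPy as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `pca_by_correlation`, the synthetic data, and the choice to select components by the cumulative contribution rate alone (without the eigenvalue-greater-than-1 criterion) are assumptions made here.

```python
import numpy as np


def pca_by_correlation(M, threshold=0.80):
    """Standardise, build the correlation matrix, decompose it, and select components."""
    M = np.asarray(M, dtype=float)
    # Step 2: standardise each predictor (column) to zero mean and unit variance.
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    # Step 3: correlation coefficient matrix of the predictors.
    CM = np.corrcoef(Z, rowvar=False)
    # Step 4: eigenvalues and eigenvectors (CM is symmetric, so eigh applies).
    eigvals, eigvecs = np.linalg.eigh(CM)
    order = np.argsort(eigvals)[::-1]  # eigh returns ascending order; reverse it
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 5: smallest p whose cumulative contribution rate reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    p = int(np.searchsorted(cumulative, threshold) + 1)
    scores = Z @ eigvecs[:, :p]  # data projected onto the first p components
    return scores, eigvals, cumulative, p


# Synthetic data: six features built as three strongly correlated pairs
# (an assumption for illustration; these are not the MS parameters).
rng = np.random.default_rng(0)
base = rng.normal(size=(99, 3))
noise = 0.1 * rng.normal(size=(99, 3))
M = np.hstack([base, base + noise])

scores, eigvals, cumulative, p = pca_by_correlation(M)
print("eigenvalues:", np.round(eigvals, 3))
print("cumulative contribution:", np.round(cumulative, 3))
print("components kept:", p, "-> reduced shape", scores.shape)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the contribution rate of each component is simply its eigenvalue divided by n.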

Categorical Gradient Boosting Classifier
CGB was initially proposed by [27] because of its usefulness in both classification and regression. CGB has demonstrated superiority over other leading boosting variants that have been applied to different problems. For instance, CGB demonstrated better performance than extreme gradient boosting (XGBoost) and light gradient boosting (LGBM) in the work of [27]. Most recently, a comparative study from Wu et al. found that CGB has remarkable predictive capabilities compared with existing boosting methods [28]. It also performed particularly well in some geotechnical areas, such as uniaxial compressive-strength prediction [29] and prediction of the elastic modulus of rocks. Some recent studies have also verified that CGB is superior to other boosting classifiers when applied to multiclass imbalanced data [23].
The working strategy of CGB is to learn many weak learners and integrate them to form a stronger learner. This approach is similar to the strategies of all other boosting methods. It implements gradient boosting with binary decision trees as weak learners [27].
For data with samples $D = \{(X_j, y_j)\}_{j=1,\dots,m}$, $X_j = (x_j^1, x_j^2, \dots, x_j^n)$ represents the vector of n features and $y_j \in \mathbb{R}$ is the target, which is either binary or a numerical response. Samples $(X_j, y_j)$ are independently distributed according to some unknown distribution $P(\cdot, \cdot)$. The objective of the learning task is to train a function $H : \mathbb{R}^n \to \mathbb{R}$ that minimises the expected loss, which is provided in Equation (4).

$$\mathcal{L}(H) := \mathbb{E}\, L(y, H(X)) \qquad (4)$$

where $L(\cdot, \cdot)$ represents a smooth loss function and $(X, y)$ represents validation data sampled from D.
The iterative gradient boosting process constructs a sequence of approximations $H^t : \mathbb{R}^n \to \mathbb{R}$, $t = 0, 1, \dots$. From the previous approximation $H^{t-1}$, $H^t$ is acquired in an additive process, as $H^t = H^{t-1} + \alpha g^t$, with a step size $\alpha$ and function $g^t : \mathbb{R}^n \to \mathbb{R}$, which is a base predictor chosen from a group of functions G to minimise the expected loss, as defined in Equation (5).

$$g^t = \arg\min_{g \in G} \mathcal{L}(H^{t-1} + g) = \arg\min_{g \in G} \mathbb{E}\, L\big(y, H^{t-1}(X) + g(X)\big) \qquad (5)$$

The minimisation problem is usually approached with the Newton method, using a second-order approximation of $\mathcal{L}(H^{t-1} + g^t)$ at $H^{t-1}$, or by taking a gradient step. Further detailed information can be obtained in [27].
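The additive update $H^t = H^{t-1} + \alpha g^t$ can be illustrated with a deliberately small sketch. It shows generic gradient boosting with a squared loss and one-feature decision stumps; it does not reproduce CGB-specific features such as ordered boosting or categorical-feature handling described in [27]. The function names, the toy data, and the values of `rounds` and `alpha` are all assumptions made for illustration.

```python
def fit_stump(xs, targets):
    """Least-squares decision stump on one feature; returns a function g(x)."""
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: fall back to predicting the mean
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=50, alpha=0.1):
    """Additive model H^t = H^(t-1) + alpha * g^t for the squared loss."""
    h0 = sum(ys) / len(ys)  # H^0: constant initial approximation
    learners = []

    def predict(x):
        return h0 + alpha * sum(g(x) for g in learners)

    for _ in range(rounds):
        # For L = 0.5 * (y - H(x))^2 the negative gradient w.r.t. H(x) is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy step-function data (illustrative only).
xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [1.0] * 5

baseline = lambda x: sum(ys) / len(ys)
model = boost(xs, ys)
mse_before = mse(baseline, xs, ys)
mse_after = mse(model, xs, ys)
print(f"MSE of constant model: {mse_before:.4f}, after boosting: {mse_after:.6f}")
```

Each round fits the weak learner to the current residuals and adds a shrunken copy of it to the ensemble, so the training error decreases gradually rather than in a single step.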

Evaluation Metrics
Generally, evaluation metrics assess the model's performance on the test sample and whether the classifier can appropriately classify new observations. Although there are various performance metrics for evaluating classifier robustness, this study adopts three metrics: precision, recall, and F1 score. The primary reason for selecting these metrics is that they are useful for evaluating performance when the dataset has a class-imbalance problem. As mentioned in Section 2.2, the numbers of datapoints in the four classes are not equal, making the classes imbalanced; in this case, the F1 score aids in addressing such problems by weighting precision and recall equally. The accuracy, precision, recall, and F1 score for any classifier are calculated using the following formulae:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

where true positive (TP) indicates the number of positively predicted observations that are actually positive; false negative (FN) represents the number of negatively predicted observations that are actually positive; false positive (FP) denotes the number of predicted positives that are actually negative; and true negative (TN) is the number of predicted negatives that are actually negative.
Precision measures the correctness of a model's positive predictions [30]. It is the fraction of predicted positive examples that were actually positive and is provided by:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

In contrast, recall measures the completeness of a model's positive predictions [30]. It is the fraction of actual positive examples that were predicted positive, which is expressed as:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (8)$$

A high-performing model should have both high precision and high recall because they measure, respectively, the correctness and completeness of positive predictions. Nevertheless, simultaneously achieving a high value for both is difficult because trade-offs exist, meaning that when one increases, the other tends to decrease. Hence, the F1 score gives equal weight to both metrics by computing the harmonic mean of precision and recall [30]. The value of the F1 score ranges between 0 and 1 for any particular classifier. An F1 score of 1 indicates a perfect model, and a score close to 1 indicates a nearly perfect one. The F1 score is computed using the expression below:

$$F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \qquad (9)$$
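For a four-class problem such as this one, precision, recall, and F1 are typically computed per class by treating that class as "positive" and all others as "negative". The sketch below does this in plain Python. The labels are hypothetical, and the macro average (an unweighted mean over classes) is an assumption, since the text does not state which averaging scheme was used.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall and F1 for each class, treating that class as 'positive'."""
    results = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results


# Hypothetical labels (0 = none, 1 = slight, 2 = moderate, 3 = intense).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]

metrics = per_class_metrics(y_true, y_pred, labels=[0, 1, 2, 3])
macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)

for label, m in metrics.items():
    print(label, {k: round(v, 3) for k, v in m.items()})
print("macro F1:", round(macro_f1, 3))
```

In this toy example, class 0 has perfect precision but a recall of 0.75, because one actual "none" case was predicted as "slight"; the F1 score penalises that gap.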

Correlation Result
The computed correlation for the given dataset is shown in a correlation-matrix plot in Figure 5. From Figure 5, it can be seen that all indicators positively correlate with intensity levels to different extents. Four indices, PN, PE, PV, and PER, strongly correlate with targets, having correlation values above 60%, whereas PNR and PVR are moderately correlated at only 55% and 46%, respectively. In addition, some predictor-variable pairs are also strongly correlated with each other. For instance, the correlation of PE and PER is 97%, and PV and PVR follow with a correlation of 88%. Similarly, a correlation of 77% can be seen for PN and PNR. As a correlation between predictor variables becomes stronger, the redundancy of information increases and may impact the training process and prediction. Therefore, a good combination of variables should have features highly correlated with the target, yet uncorrelated with each other [31].
Big Data Cogn. Comput. 2023, 7, x FOR PEER REVIEW

Depending upon the correlation analysis, a highly correlated variable can be dropped to reduce the multi-collinearity of the analysis [32]. When two variables possess a high degree of association, one can be predicted from the other. However, determining which should be removed is complicated, as the indicators selected define the rockburst based on two aspects: microfracture characteristics (PN, PE, PV) and temporal evolution characteristics (PNR, PER, PVR). Hence, if all features are dropped from either of these categories, information regarding that aspect will be lost. As a result, considering the negative consequences of one-sided feature removal, the data are further handled by implementing the dimensional-reduction technique to retain original information in low dimensions.

Dimensional Reduced Data
PCA was used to handle the correlated variables discussed above. It was implemented using the Sklearn module [33] to reduce the impact of high correlation, and the first three components, which together achieve a cumulative contribution rate above 80%, were chosen. The individual contribution rates for the first three components are 60.41%, 19.00%, and 14.82%, respectively, with a cumulative contribution rate of 94.26%. The shape of the data in the 3D space is pictured in Figure 6. It can be seen that, after scaling and PCA, the data points for each cluster are not scattered and are close to each other. The four different colours indicate the different intensity levels.

Model Training and Hyper-Parameter Optimisation
The dataset remaining after pre-processing was used to create the predictive model. The training and testing sets are formed by randomly splitting the dataset into two parts, generally in an 80:20 ratio. The larger portion (79 samples) was used as the training set and was fed to the model to train it. The remaining 20 samples were used as a testing set to evaluate the model. While training the model, hyper-parameter tuning was essential and significantly increased performance. Therefore, hyper-parameters were tuned using a grid-search method that embeds the cross-validation (CV) method. The general architecture of cross-validation with five folds is portrayed in Figure 7.
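A random 80:20 split of this kind can be sketched as follows. The total of 99 samples is inferred from the 79 training and 20 testing samples mentioned above; the function name, the fixed seed, and the choice to round the test-set size up are assumptions made for illustration only.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Assumption: the test-set size is rounded up.
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(99)
```

With 99 samples this yields 79 training indices and 20 testing indices, and no index appears in both parts.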
Five-fold CV starts with partitioning the training dataset into five portions and training the model five times. In each round, four portions of the data act as a training set, while the remaining one acts as a test set. The results obtained from all five rounds are then averaged to obtain the final result [34].
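The five-fold procedure can be sketched without any library, as below. The helper names (`k_fold_indices`, `cross_validate`) and the `fit_and_score` callback are hypothetical; any function that trains on the first index list and returns a score on the second could be plugged in.

```python
from statistics import mean


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, held_out_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, held_out
        start += size


def cross_validate(n_samples, fit_and_score, k=5):
    """Average the score returned by fit_and_score over the k rounds."""
    return mean(fit_and_score(train, held_out)
                for train, held_out in k_fold_indices(n_samples, k))


# 79 training samples, as in the split described above.
folds = list(k_fold_indices(79, k=5))
fold_sizes = [len(held_out) for _, held_out in folds]
```

For 79 training samples the held-out portions contain 16, 16, 16, 16, and 15 samples, and every sample is held out exactly once.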
To build a simple and easy-to-use model, two important hyper-parameters, depth and n_estimators, were chosen, and optimal values for each were identified. The depth is the maximum depth of each tree, and n_estimators is the total number of trees in the ensemble. Hyper-parameter tuning is computationally expensive. Therefore, considering the computational cost during hyper-parameter selection, values between 2 and 15 were chosen using the range function in Python to select the appropriate value for depth. Likewise, the same range function was applied for n_estimators, and a range between 10 and 200 was specified with an interval of 10. As for the learning rate, the default setting was used.
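To give a sense of the search cost, the grid described above can be enumerated with Python's `range` and `itertools.product`. Whether the upper bounds 15 and 200 were included in the original search is not stated, so including them here is an assumption, as is the variable naming.

```python
from itertools import product

# Assumption: both upper bounds (15 and 200) are included in the search.
depth_values = list(range(2, 16))               # 2, 3, ..., 15
n_estimators_values = list(range(10, 201, 10))  # 10, 20, ..., 200

param_grid = {"depth": depth_values, "n_estimators": n_estimators_values}
grid_points = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]
n_fits = len(grid_points) * 5  # five-fold CV fits every combination five times
```

Under these assumptions the grid holds 280 combinations, or 1400 model fits with five-fold CV, which is why the ranges were kept modest.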
To optimise the hyper-parameters, the grid-search CV (GS-CV) method using stratified k-fold CV was adopted. This method divides the dataset into k segments such that each segment contains approximately the same percentage of samples of each target class as the complete set does. This approach is beneficial when target classes are unbalanced because it helps to prevent the model from overfitting to the majority class and allows it to learn to predict the minority class accurately. GS-CV tunes the parameters by methodically building and evaluating a model for each combination of algorithm parameters, as specified in a grid [35]. Estimator and param grid are two key terms involved in using GS-CV. The estimator is the classifier that is being trained. The param grid is the list of parameter settings specified above. Every parameter combination is validated to seek the best accuracy, and, of the possible combinations of parameter values, those closest to the optimum are selected to yield a more precise model. The hyper-parameter optimisation results for the PCA-CGB model and the single CGB model that was trained on the original data are shown in Figure 8.
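The stratification idea can be illustrated with a simple round-robin assignment that spreads each class evenly over the folds. This is a minimal sketch, not the routine used in the study: the function name `stratified_folds` is hypothetical, no shuffling is performed, and the class counts below are invented for the example.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=5):
    """Assign each sample index to one of k folds, class by class, round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


# Hypothetical imbalanced labels (0 = none ... 3 = intense), NOT the paper's class counts.
labels = [0] * 40 + [1] * 20 + [2] * 14 + [3] * 5
folds = stratified_folds(labels, k=5)
per_fold_counts = [Counter(labels[i] for i in fold) for fold in folds]
```

Each fold then contains every class, including the smallest one, so no round of cross-validation is evaluated without minority-class samples.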
In Figure 8, the different colours inside the plot indicate the average accuracy for various combinations, and the taller the peak, the higher the accuracy. As illustrated, the accuracy varies significantly for different pairs of combinations. The hyper-parameter tuning range and the optimal values obtained after optimisation for PCA-CGB and CGB are given in Table 4. The optimal values acquired through the GS-CV optimisation process differ between classifiers. For PCA-CGB, the optimal depth and n_estimators are 3 and 140, respectively. Similarly, CGB has a depth value of 2 and an n_estimators value of 130.
After the best hyper-parameters were derived using GS-CV optimisation, the optimal models were used to predict the test set that was initially separated from the rest of the data and that had not been used during the training process. The confusion matrix in Table 5 shows that, among 20 observations, the PCA-CGB predicted 18 cases correctly, misidentifying only two samples. The single CGB, by contrast, has five incorrect predictions. Considering the available dataset size, the PCA-CGB has the better accuracy, at 90%. However, accuracy alone cannot reflect the overall strength of a model when the dataset has unequally distributed classes. Therefore, the strength of the models is determined by analysing precision and recall for each class and computing the F1 score.
Depending on the requirements, some sectors prefer high-recall models and some sectors demand high-precision models. However, the prediction of rockburst hazards is very sensitive and focuses on two primary aspects: minimising unnecessary controlling costs and the safety of personnel and the project. If moderate and intense rockbursts are treated as high-risk and none and slight are treated as low-risk, then a model should precisely classify high-risk and low-risk cases. This should be prioritised because classifying high-risk cases as low-risk threatens human life and project safety; similarly, classifying low-risk cases as high-risk increases economic losses to control and support measures even though the high-risk event is unlikely. From this logic, it can be concluded that rockburst-hazard risk prediction is vital in accurately identifying low-risk and high-risk cases because it is equally important to minimise costs and to ensure the safety of human life and projects. Therefore, in rockburst prediction, precision and recall have equal importance. The precision and recall for the proposed work at each intensity grade are illustrated in Figure 9.
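The two kinds of costly error described above can be made concrete with a small sketch that groups the four grades into low-risk and high-risk and counts each error type. The grouping follows the paragraph above; the function names and the example labels are hypothetical and serve only as an illustration.

```python
# Grouping taken from the paragraph above: moderate and intense are high-risk.
HIGH_RISK_GRADES = {"moderate", "intense"}


def risk_level(grade):
    """Map a four-level intensity grade to a binary risk level."""
    return "high" if grade in HIGH_RISK_GRADES else "low"


def count_risk_errors(actual, predicted):
    """Count missed high-risk cases and false alarms after binary grouping."""
    missed, false_alarms = 0, 0
    for a, p in zip(actual, predicted):
        if risk_level(a) == "high" and risk_level(p) == "low":
            missed += 1        # threatens personnel and project safety
        elif risk_level(a) == "low" and risk_level(p) == "high":
            false_alarms += 1  # spends money on unnecessary control measures
    return missed, false_alarms


# Hypothetical labels for illustration only.
actual = ["none", "slight", "moderate", "intense", "slight", "moderate"]
predicted = ["none", "moderate", "slight", "intense", "slight", "moderate"]
missed, false_alarms = count_risk_errors(actual, predicted)
```

Missed high-risk cases lower recall for the high-risk group and false alarms lower its precision, which is why the two metrics are weighted equally here.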
As shown in Figure 9a, PCA-CGB has high precision for none, slight, and intense rockbursts, but the precision is slightly lower for moderate rockbursts. Regarding the recall score, the values for none, moderate, and intense are greatest, whereas that for slight risk is comparatively low (Figure 9b). Overall, the model achieved precision and recall of 0.9286 and 0.8917, respectively. For any optimal model, higher precision and recall are desirable, but in practice it is difficult to maintain high precision and high recall simultaneously because there is a trade-off; when one increases, the other tends to decrease. As shown in Figure 9, recall decreases when precision increases and vice versa. Hence, the F1 score determines the classifier's strength using the harmonic mean of precision and recall. The F1 score for each class is shown in the chart in Figure 10, and Table 6 describes the general rule of thumb for determining classifier strength according to the F1 score (https://stephenallwright.com/good-f1-score/, accessed on 11 August 2023). The bar chart shows that, overall, PCA-CGB has the best F1 score for the none and intense levels, with a slightly lower score for the slight and moderate levels. It had an F1 score of 0.8952, which is considered to indicate a good classifier, according to Table 6.
Table 6. Column headers: F1 Score; Performance Measure.
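The per-class metrics discussed above can be computed directly from a confusion matrix, as in the following sketch. The matrix is hypothetical and is NOT Table 5 of this study; it merely has 20 samples and two errors so that the arithmetic is easy to follow. Averaging the per-class values with equal weight (macro averaging) is also an assumption, since the averaging scheme is not stated above.

```python
def per_class_metrics(confusion):
    """Precision, recall and F1 per class; rows = actual, columns = predicted."""
    n = len(confusion)
    metrics = []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))
        actual = sum(confusion[c])
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall and F1 over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Hypothetical 4-class confusion matrix (none, slight, moderate, intense).
# It is NOT Table 5 of the paper; it only illustrates the calculation.
confusion = [
    [8, 0, 0, 0],
    [1, 4, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 0, 2],
]
accuracy = sum(confusion[i][i] for i in range(4)) / sum(map(sum, confusion))
macro_precision, macro_recall, macro_f1 = macro_average(per_class_metrics(confusion))
```

Note that, under macro averaging, the overall F1 score is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the overall precision and recall.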

Performance Comparison
To check the feasibility of using the PCA-CGB, its performance was compared with those of three conventional boosting classifiers on the same dataset. These classifiers have often been utilised in rockburst prediction [17,36], and the comparison checked for improvements. The three boosting classifiers were the gradient boosting classifier (GBC) [37], adaptive boosting (AdaBoost) [38], and light gradient boosting machine (LGBM) [39]. All three models were trained on the same data after PCA, and their hyper-parameters were also optimised using the GS-CV method with the same process used for PCA-CGB. For GBC and LGBM, two crucial parameters, max_depth and n_estimators, were adopted with the same tuning range as that used for PCA-CGB. The parameters available for AdaBoost are slightly different; therefore, n_estimators and learning_rate were selected. The selected hyper-parameter ranges and the values obtained are shown in Table 7.
Once the optimal hyper-parameters were tuned, the classifiers with optimal hyper-parameters were employed to predict the previously unseen test samples. Table 8 shows the confusion matrices for GBC, AdaBoost, and LGBM. Among the three classifiers, GBC and LGBM show better results than AdaBoost. GBC misclassified one none as slight risk and two slight risks as moderate risk, whereas LGBM and AdaBoost incorrectly classified some other intensity classes as moderate risk. The F1 scores of the three classifiers are shown in Figure 11. The figure indicates that all classifiers yield better results for none/no risk and intense risk; however, all have very low scores for slight and moderate risk. GBC, AdaBoost, and LGBM generated F1 scores of 0.7952, 0.6407, and 0.7368, respectively.
Finally, the results of PCA-CGB were compared with those of these three classifiers. In various ways, the predictive performance of the proposed work is better than those of other traditional boosting classifiers when used for imbalanced rockburst data. Although GBC, AdaBoost, and LGBM seem reasonably accurate, their F1 scores are relatively low, meaning they are less robust to the above problem of class imbalance. By contrast, the overall performance of PCA-CGB is superior in terms of precision, recall, and F1 score, indicating that it is more reliable and possesses greater predictive power than the other boosting classifiers.
Further, in terms of F1 scores, we can discuss the performance in relation to previous work on the subject, including [18,22,40], which acquired F1 scores of 0.66, 0.8779, and 0.8631, respectively. However, the results are not directly comparable, because the dataset sizes, the training samples, and the variables used differ somewhat across studies. To make the class distribution more diverse in this study, more cases were gathered to expand the dataset, and a larger dataset was used compared to those in other studies. When the data are more complex, feeding a lower quantity of data for training may cause an underfitting problem, and the model loses generalisation. Therefore, more samples were used during training to ensure the model obtained enough records to learn the pattern between inputs and output. Overall, the result on the previously unseen test set shows that, for unequally distributed data, the proposed approach still yields a better F1 score across all types of risk severity than the other works, with a low error rate even though the dataset is complex and contains relatively few data points for particular classes.

Field Data Validation
After the model's reliability in prediction was verified, the model was employed to predict new engineering data extracted from [24]. The data were obtained from an underground hydropower tunnelling project after the MS activities of rockbursts were examined. After transformation using PCA, this dataset was provided as input to the model. The predicted and actual results are shown in Table 9. The cases include a slight rockburst and a moderate rockburst, and the model predicted the correct level for both, indicating that this classifier can effectively classify events from new, previously unseen samples.

Discussion and Limitations
Prediction of rockbursts in underground engineering using intelligent models should focus on correctly classifying each class equally. Generally, classical ML methods assume that all classes are equally distributed. However, when a dataset has a problem of class imbalance, relying on a single accuracy measure could be misleading because the model may correctly classify members of the majority class but fail to identify members of the minority class. For the purposes of controlling economic losses and promoting safety, the prediction of each intensity class is equally important. Section 3.3 shows that the model is highly accurate for the majority class (none/no risk) and a minority class (intense risk). However, there are some inaccurate outcomes for two other minority classes, slight risk and moderate risk. If we rely on accuracy alone, the model may seem highly accurate. However, the model may fail to classify other minority classes equally well, and the misclassification of these types of low-risk events as high-risk rockburst events could have serious implications.
Most previous approaches that used classical ML methods relied on a single accuracy measure to evaluate the classifier's performance. Rather than depending on a single metric, this study used precision, recall, and F1 scores because they indicate how robust the classifier is when applied to imbalanced classes. If the model's performance is compared using the F1 score, it is reasonable and acceptable to suggest that it is not susceptible to performance problems associated with imbalanced cases and has greater power to distinguish among classes. The model's performance can also be confirmed when it is applied to rockburst classes that constitute a less extreme minority, as it can accurately identify events in such classes. Nevertheless, the model yields a slightly lower F1 score for slight and moderate rockbursts; the primary reason might be uncertainties and overlap between the two classes, which in turn might have led to misclassifications. Despite this issue, PCA-CGB is still more powerful than traditional boosting classifiers because, while they seem accurate, they have lower scores in other metrics, indicating that their performance in predicting rockburst data is weak.
Although the proposed method yielded satisfactory results, the dataset size is still relatively small compared to those seen in common ML tasks. In common practice, ML methods rely heavily on huge datasets for better generalisation. Very small datasets can significantly lower performance by underfitting or overfitting the model. Thus, future research should focus on enhancing the model's robustness by developing a model from larger datasets.

Conclusions
Predicting short-term rockburst risk accurately has always been important, as rockbursts directly threaten the safety of personnel, equipment, and subsurface structures. Equally, classifying risk severity is essential to allowing the adoption of efficient control measures to avoid economic loss and ensure personnel safety. However, reliably distinguishing among risk levels is often challenging due to class-imbalance issues. Most existing work relies on models with high accuracy, but some of them cannot perform well with imbalanced data. Hence, this work proposes a simple, intelligent predictive method that combines unsupervised principal component analysis (PCA) with the supervised categorical gradient-boosting (CGB) approach to predict rockburst risk levels. The value of this method is that it can generate predictions on unequally distributed classes more efficiently than classical ML models can. The real engineering data based on microseismic information were assembled into a supportive database comprising six features. The variables are highly correlated; therefore, PCA is used to reduce redundancy among them. After reducing the original dimensions to three components, the CGB is adopted to create a PCA-CGB model to predict rockburst risk. To ensure that the optimal model is produced, hyper-parameters are tuned to obtain the best output. The model's predictive performance was evaluated using precision, recall, and F1 score and further compared with three traditional boosting techniques to check for feasibility. The results showed that, regarding

Figure 1. Flow chart of the proposed work.

Figure 3. Histograms for all six features.


• There is no rockburst when PN and PNR values are low and PE, PV, PER, and PVR values are low-to-medium (dark brown lines).
• Slight and moderate grades have overlapping lines, indicating that medium PN and PNR values and medium-to-high PE, PV, PER, and PVR values are often associated with slight or moderate rockbursts (red and orange lines).
• Medium-to-high PN and PNR values and high PE, PV, PER, and PVR values correspond to intense rockbursts (yellow lines).

Figure 4. Parallel plot for MS parameters and rockburst grades.

Figure 6. Projection of data into 3D space after PCA transformation.

Figure 7. The working principle of five-fold cross-validation.

Table 1. Descriptions of target-variable classes. Authors' own work based on [

Table 2. Statistical description of intensity levels across different predictors.
Note: PE, PV, PER and PVR are in common logarithmic form.

Table 3. Measure of correlation strength based on Pearson correlation coefficient. Authors' own work based on [26].

Table 4. Hyper-parameters and tuning range.

Table 5. Confusion matrix for PCA-CGB and CGB.

Table 9. Prediction of events in a new sample by PCA-CGB.