1. Introduction
Rock burst is a dynamic phenomenon that occurs when the elastic deformation energy of coal (rock) is released suddenly and severely around a coal mine roadway or working face [1]. This often results in the simultaneous displacement and throwing of coal (rock), accompanied by a loud noise and an air wave. As the depth and intensity of coal mining operations continue to increase, the number of coal mines experiencing rock bursts in China is rising, and these incidents are becoming more frequent and severe, placing a significant strain on the safety of coal mine production [2]. In response to these challenges, the state has developed a comprehensive four-in-one rock burst prevention and control technology: prediction, monitoring, prevention and control, and effect testing.
Among the various forms of monitoring, rock burst monitoring plays a pivotal role in the prevention and control of rock bursts [3,4,5]. It serves as the foundation for enhancing the efficacy of prevention and control measures. A growing number of monitoring methods are now employed in rock burst monitoring, including microseismic, ground sound, electromagnetic radiation, and other methods that have gained popularity in major mines [6,7,8,9]. Among these, microseismic monitoring stands out due to its resilience to environmental interference and its capacity to provide long-distance, dynamic, three-dimensional, and real-time monitoring. Its most significant advantage is the capacity for real-time, dynamic monitoring of the dynamic disaster process in mines [10].
A significant number of scholars from domestic and international academic institutions have conducted extensive research on the prediction of rock bursts based on microseismic energy changes. Tian et al. [11] proposed a quantitative-trend early warning method for rock burst risk based on the maximum daily microseismic energy and the total number of microseismic energy/frequency deviations, analyzing the microseismic precursor information law of rock bursts. Yuan Ruifu [12] used a microseismic monitoring system to collect microseismic signals before and after the occurrence of rock bursts, analyzed the time series characteristics of the signals, and used the FFT method and fractal geometry principles to study the spectrum characteristics and distribution variation of the microseismic signals. It can be observed from the aforementioned research that the prevailing methodologies for monitoring rock bursts through microseismic energy frequently employ empirical analogy or mathematical statistics to identify early warning indicators and discriminant criteria, thereby enabling the prediction of rock burst risk [13,14]. Nevertheless, general mathematical statistics methods are inadequate for mining extensive monitoring data sets, and the prediction accuracy of rock burst risk requires further improvement.
Deep learning represents a significant branch of artificial intelligence [15]. Its effectiveness is contingent upon the availability of a substantial quantity of data with which to train the model [16]. Deep learning exhibits a robust capacity for adaptive feature learning, rendering it an optimal choice for the analysis of rock burst monitoring data in the context of big data. Long et al. [17] developed an intelligent prediction model for coal and gas outbursts based on data mining and validated the efficacy of the system using microseismic monitoring data from the heading face. The results indicate that the early warning level is largely consistent with the actual situation.
The change in microseismic energy is affected by nonlinear factors, and artificial intelligence algorithms can be at a disadvantage in the nonlinear analysis of high-dimensional data sets [18]. The methodologies most commonly utilized for anticipating rock bursts are convolutional neural networks (CNNs), extreme gradient boosting (XGBoost), random forests (RFs), long short-term memory (LSTM), and light gradient boosting machines (LightGBM) [19]. Given that the predictive accuracy of a single algorithmic model is insufficient for the analysis of nonlinear data, the study and application of combined algorithmic models has become a widely researched field [20]. Li et al. [21] proposed a microseismic signal recognition method based on LMD energy entropy and a probabilistic neural network (PNN) by analyzing the characteristics of microseismic signals. Yuan et al. [22] used principal component analysis to extract features, used particle swarm optimization to optimize an ELM model, and proposed a rock burst warning model based on the combined PCA-PSO-ELM algorithm. Cao et al. [23] analyzed the variation characteristics of physical indicators before multiple large energy events, statistically analyzed the shortcomings of impact risk prediction indicators driven only by physical indicators, and proposed a time series prediction method for rock bursts driven by the fusion of physical indicators and data features. Ullah et al. [24] established a database of rock burst patterns with multiple impact features from microseismic monitoring events of the Jinping II hydropower project, and then used three methods, t-distributed stochastic neighbor embedding (t-SNE), K-means clustering, and extreme gradient boosting (XGBoost), to predict short-term rock burst risk.
The preceding research findings have informed the development of a novel approach for the prediction of rock burst risk. This approach employs a long short-term memory (LSTM) neural network optimized with a Bayesian algorithm to establish a Bayesian-optimized long short-term memory (BO-LSTM) neural network model, which can process time series data in accordance with the characteristics of microseismic energy change. The 13,200 working face of Gengcun Coal Mine was selected as the application object for establishing a rock burst early warning model, with field-measured microseismic monitoring data selected in order to combine big data prediction with rock burst early warning. Future rock bursts are predicted by extracting the latent characteristic information in the microseismic monitoring data, which improves the prediction accuracy of rock burst risk and provides a new method for rock burst monitoring and early warning.
2. Model Principles and Methods
2.1. Establishment of Rock Burst Prediction Model
The BO-LSTM model was selected for predicting rock burst risks due to its advantages in handling time series data:
Characteristics of Time Series Data: The LSTM neural network excels at processing time series data and can effectively capture long-term dependencies within the sequence. Given that microseismic monitoring data are typical time series data, LSTM can identify the temporal dependencies, making it advantageous for predicting the next step in rock burst risk.
Avoiding Gradient Vanishing: Traditional recurrent neural networks (RNNs) tend to suffer from the vanishing gradient problem when dealing with long-term dependencies in sequences. LSTM, with its unique gating mechanisms (forget gate, input gate, and output gate), effectively avoids this issue, making it more robust when handling complex time series data.
Prediction Performance: LSTM has demonstrated superior predictive performance, with experimental results showing that it outperforms single models like GRUs and 1D-CNNs across multiple evaluation metrics. The optimized BO-LSTM model further enhances prediction accuracy, demonstrating the practical application value of this model.
Below are the specific steps for constructing the dynamic ground pressure warning model proposed in this paper:
1. Collect Microseismic Monitoring Information: Collect on-site microseismic monitoring information. Due to differences in data dimensions, direct importation into the model for training is not feasible and requires data preprocessing.
2. Establish an Expert Judgment System: In order to address the dearth of sample labels in the model, it is necessary to establish an expert judgment system. The use of expert judgments on microseismic monitoring data as data labels is proposed as a means of generating the data set. The system inputs microseismic data for a specific time period, processes the data, and assesses the dynamic pressure hazards for the following day. The combination of expert analysis and empirical experience yields judgment values, which are then output as sample labels. The judgment scores, which range from 0 to 100, correspond to four levels of dynamic ground pressure. The scores are as follows: 0–25 for no dynamic pressure hazard, 25–50 for a weak dynamic pressure hazard, 50–75 for a moderate dynamic pressure hazard, and 75–100 for a strong dynamic pressure hazard.
3. Model Construction: In the model construction process, the time series characteristics of microseismic energy changes are taken into account. The LSTM network model, comprising an input layer, a hidden layer, and an output layer, is constructed in Python 3.11 on the Jupyter platform, and a Bayesian optimization (BO) algorithm is utilized to tune its parameters (a minimal code sketch of this construction and optimization follows this list). The sample data are divided into two distinct sets: a training set and a verification set. The training set is employed to train the model, while the verification set is utilized to test the trained model and to optimize the parameters through the Bayesian algorithm until the optimal neural network model is identified.
4. Obtain Predictions from the Optimal Model: The degree of fit between the predicted values and the actual values of the optimal model is evaluated by comparing the curve fitting and statistical indicators, including the mean absolute error (MAE), mean absolute percentage error (MAPE), and the variance accounted for (VAF). This analysis is conducted using field data to verify the accuracy and practical application value of the BO-LSTM model.
5. Further Model Validation: In order to further verify the accuracy of the model prediction, based on the aforementioned methodology for establishing the model, the daily total energy value, daily maximum energy value, and daily frequency are predicted. The actual microseismic data are employed as the sample label for model training. The model prediction accuracy is further verified by observing the fitting effect of the predicted results and the actual values, as well as the evaluation index.
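The following is a minimal sketch of steps 3 and 4 above, assuming Keras for the LSTM network and Hyperopt's TPE implementation for the Bayesian optimization; the hyperparameter search space, data shapes, and variable names are illustrative assumptions rather than the exact configuration used in this study.

```python
# Minimal BO-LSTM sketch: an LSTM regressor whose hyperparameters are tuned
# with TPE-based Bayesian optimization (hyperopt). Search space and epochs are
# illustrative assumptions.
import numpy as np
from hyperopt import fmin, hp, tpe
from tensorflow import keras


def build_lstm(units: int, learning_rate: float, window: int = 10, n_features: int = 3):
    # Input layer -> LSTM hidden layer -> dense output predicting the risk score.
    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        keras.layers.LSTM(units),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="mse")
    return model


def objective(params, X_train, y_train, X_val, y_val):
    # Train one candidate model and return its validation MSE for the optimizer.
    model = build_lstm(int(params["units"]), params["learning_rate"])
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
    return float(model.evaluate(X_val, y_val, verbose=0))


search_space = {
    "units": hp.quniform("units", 16, 128, 16),
    "learning_rate": hp.loguniform("learning_rate", np.log(1e-4), np.log(1e-2)),
}

# With prepared arrays X_train, y_train, X_val, y_val, the TPE search would run as:
# best = fmin(fn=lambda p: objective(p, X_train, y_train, X_val, y_val),
#             space=search_space, algo=tpe.suggest, max_evals=50)
```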
The structure of the rock burst early warning model is depicted in Figure 1.
2.2. The Basic Principles of the Long Short-Term Memory (LSTM) Network
LSTM networks are an improved type of recurrent neural network (RNN) that addresses the issues of gradient explosion and vanishing gradients encountered in traditional RNNs. The structure is illustrated in Figure 2.
The first step of the LSTM involves the “forget gate”, which determines the degree of influence of the previous time step’s cell state Ct−1 on the current time step’s cell state Ct. Here, ft represents the output of the forget gate, with the inputs being the hidden state ht−1 from the previous sequence and the current sequence data xt. The activation function σ (commonly the sigmoid function), the bias vector bf, and the weight matrix Wf are used in the computation of the forget gate [25]. The computational formula is as follows:
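In the standard LSTM formulation, the forget gate output is computed from the symbols defined above as:

$$ f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) $$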
The second step, “input gate”, consists of two parts, and the computational formula is as follows:
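In the standard LSTM formulation, the two parts are the input gate activation and the candidate cell state, which together update the cell state; here Wi and bi denote the input gate weight matrix and bias, following the same convention as the other gates:

$$ i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) $$
$$ \tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) $$
$$ C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t $$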
In the equations, it is the output of the input gate and determines which information is updated into the cell state; Ct represents the new cell state at time t; Wc is the weight matrix for the input gate; bc is the bias vector for the input gate; and tanh denotes the hyperbolic tangent activation function.
The third step, the “output gate”, controls the influence of Ct on ht, and its update formula is as follows:
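In the standard formulation, the output gate and the hidden state are given by:

$$ o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) $$
$$ h_t = o_t \odot \tanh(C_t) $$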
In the equation, ot determines the output portion of the cell state; bo is the bias vector for the output gate; Wo is the weight matrix for the output gate; and ht represents the hidden layer state value of the corresponding unit at time t.
2.3. Bayesian Optimization Principle
Bayesian optimization is an approximate method that employs various probabilistic surrogate models, such as Gaussian processes and random forests, to model the relationship between hyperparameters and model performance, ultimately identifying the optimal hyperparameter combination [26].
In Bayesian optimization, the probabilistic surrogate model refers to substituting the objective function with a certain probability model. The update formula for the posterior probability is as follows:
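In its standard form, this update is Bayes' rule applied to the surrogate model:

$$ p(f \mid D) = \frac{p(D \mid f)\, p(f)}{p(D)} $$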
In the equation, D = {(x1, f1), (x2, f2), …, (xn, fn)} represents the collected sample points, and p(f) is the prior distribution; the posterior distribution of f is then obtained by applying the Bayesian formula.
Bayesian optimization surrogate models can be broadly categorized into three types: Tree Parzen Estimator (TPE), Sequential Model-based Algorithm Configuration (SMAC) using random forest regression, and Gaussian Processes (GPs). This paper employs TPE, a non-standard Bayesian optimization algorithm based on a tree-structured Parzen density estimator. In comparison to other models, TPE demonstrates superior performance in high-dimensional spaces, with significantly improved speed.
The configuration space of the TPE parameters is tree-structured, and TPE primarily models p(x|y) and p(y). The preceding parameters determine which parameters are selected subsequently and the range of values for those parameters.
TPE defines the following two probability densities:
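In the standard TPE formulation, the two densities are defined piecewise around a threshold y* on the objective value:

$$ p(x \mid y) = \begin{cases} l(x), & y < y^{*} \\ g(x), & y \ge y^{*} \end{cases} $$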
In the equation, l(x) represents the probability density of {xi} corresponding to f(xi) being less than the threshold y*, and g(x) represents the probability density of {xi} corresponding to f(xi) being greater than the threshold y*.
By repeatedly measuring the objective function at different locations, more information can be obtained to estimate its distribution, enabling the search for the measurement location that yields the optimal function value. To assess whether a location is optimal, function evaluations are collected; at the optimal location, the function attains its optimal value. In TPE, the acquisition function is the Expected Improvement (EI), which represents the expected amount by which the objective falls below the threshold. It performs well in most situations, and the formula is as follows:
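Taking smaller objective values as better, the Expected Improvement below the threshold y* can be written in its standard form as:

$$ EI_{y^{*}}(x) = \int_{-\infty}^{y^{*}} \left(y^{*} - y\right)\, p(y \mid x)\, dy $$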
In the equation, the model p is the posterior Gaussian distribution over the observation domain.
In the framework of TPE, let γ = p(y < y*) and p(x) = ∫ p(x|y) p(y) dy = γ l(x) + (1 − γ) g(x); then, the following relationship holds:
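In the standard TPE derivation this reduces to:

$$ EI_{y^{*}}(x) \propto \left( \gamma + \frac{g(x)}{l(x)}\,(1 - \gamma) \right)^{-1} $$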
Equation (9) signifies that points with a high probability under l(x) and a low probability under g(x) yield a larger Expected Improvement (EI). Both l(x) and g(x) are represented in a tree structure, facilitating the collection of samples to obtain more refined information. In each iteration, the algorithm returns the value x* with the maximum EI.
2.4. 1DCNN Principle
One-dimensional convolutional neural networks (1DCNNs) are commonly employed for processing sequential data by integrating the well-known convolution operation with neural networks. The network parameters are updated through the backpropagation algorithm. The architecture primarily includes an input layer, convolutional layer, pooling layer, fully connected layer, and output layer [27,28].
The input layer is responsible for receiving raw data, and the convolutional layer performs convolution operations on the data. Non-linearity is then introduced through the ReLU activation function. The pooling layer is utilized to reduce the dimensionality of the data features, thereby decreasing computational complexity. Finally, the fully connected layer and output layer transform the extracted features into the final output. The ReLU function is defined as follows:
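In its standard form:

$$ \mathrm{ReLU}(x) = \max(0, x) $$

For illustration, a minimal Keras sketch of the 1DCNN architecture described above (the layer sizes and the 10-day, 3-feature input shape are assumptions) is:

```python
from tensorflow import keras

# Input -> Conv1D with ReLU -> pooling -> fully connected -> output, as described above.
cnn = keras.Sequential([
    keras.Input(shape=(10, 3)),                 # 10-day window, 3 microseismic features
    keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling1D(pool_size=2),     # reduces feature dimensionality
    keras.layers.Flatten(),
    keras.layers.Dense(16, activation="relu"),  # fully connected layer
    keras.layers.Dense(1),                      # output: predicted risk score
])
cnn.compile(optimizer="adam", loss="mse")
```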
2.5. GRU Principle
The GRU (gated recurrent unit) is a type of recurrent neural network that, like the LSTM (long short-term memory), was introduced to address the gradient problems of RNNs. The GRU model introduces two gates on top of the basic RNN architecture: the update gate and the reset gate. The structure is depicted in Figure 3.
In the diagram, zt and rt represent the update gate and reset gate, respectively. Their mathematical expressions are as follows:
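In the standard GRU formulation (here taking ω0 as the weight for the candidate hidden state), the gates and state update are:

$$ r_t = \sigma\left(\omega_r \cdot [h_{t-1}, x_t]\right) $$
$$ z_t = \sigma\left(\omega_z \cdot [h_{t-1}, x_t]\right) $$
$$ \tilde{h}_t = \tanh\left(\omega_0 \cdot [r_t \odot h_{t-1}, x_t]\right) $$
$$ h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t $$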
In the equations, ωr represents the weight for the reset gate, ωz represents the weight for the update gate, tanh denotes the hyperbolic tangent function, and σ represents the sigmoid function. The parameters ωr, ωz, and ω0 are trainable parameters in the model [29,30].
3. Data Analysis and Model Training
3.1. Engineering Background
This paper selects the 13,200 working face of the Gengcun Coal Mine of the Yimei Group, a representative rock burst coal mine, as an example. The mining thickness of the working face is 13 to 38 m, with an average thickness of 19.3 m. The southwestern part of the working face is situated within the nappe influence zone of the F16 fault, a regional thrust fault with a nearly east–west strike and a dip direction slightly south of east. The shallow dip angle is 70°, while the deep dip angle is generally 15° to 35°. The working face employs the longwall retreat mining method, and the natural caving method is utilized to manage the roof.
The 13,200 upper roadway is accessed from the external yard of the 13,200 upper roadway at an orientation of 254°52′ and is excavated along the 2–3 coal floor. The total length of the upper roadway is 875.9 m, with the layer section extending from a depth of −65 m to a depth of 25 m. The 13,200 lower roadway is opened from the 13,200 lower roadway yard, with an orientation of 83°, and is excavated along the 2–3 coal floor. The total length is 820.5 m, with the lower roadway passing through the layer section from −42 m to 160 m.
The microseismic data were selected from 600 days of microseismic monitoring of the 13,200 working face, from June 2021 to January 2023, as shown in Table 1.
3.2. Sensitivity Analysis
Monitoring indicators used as model inputs should exhibit a strong correlation with the risk of rock burst. Consequently, prior to establishing the model data set, data that are unrelated to the impact risk should be excluded. The Pearson correlation coefficient method and the Spearman correlation coefficient method were employed to analyze the correlation between the monitoring data and the risk of rock burst. The Pearson correlation coefficient is defined as follows [31]:
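In standard form, with the symbols explained below, the coefficient is:

$$ \rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sqrt{D(X)}\,\sqrt{D(Y)}} $$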
In Formula (14), E is the mathematical expectation; D is the variance; and Cov(X,Y) is the covariance of the random variables X and Y, which measures their joint variability. ρXY, the quotient of the covariance and the product of the standard deviations of the two variables, is the correlation coefficient between X and Y. The closer |ρXY| is to 1, the greater the correlation between the two variables; when it is equal to 0, the two variables are uncorrelated.
The Spearman correlation coefficient is defined as follows [32]:
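In its standard rank-difference form, with D and N as defined below:

$$ \rho = 1 - \frac{6 \sum_{i=1}^{N} D_i^{2}}{N\left(N^{2} - 1\right)} $$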
ρ represents the Spearman correlation coefficient and takes values between −1 and 1; the closer |ρ| is to 1, the stronger the correlation, and when |ρ| equals 0 there is no correlation. D represents the rank difference between paired values of the two data sequences, while N denotes the number of data points.
The correlation coefficients calculated by the Pearson and Spearman methods are shown in Figure 4. It can be seen that the daily total energy, daily frequency, and daily maximum energy are positively correlated with the impact risk, while the correlation between the daily average energy and the impact risk is close to 0. Therefore, the model in this paper selects the daily total energy, daily frequency, and daily maximum energy to establish the data set.
3.3. Data Processing
The rock burst early warning model employs a supervised learning method to predict by constructing a mapping between features and label information. The data set comprises both sample features and sample labels. The Pearson and Spearman correlation coefficients indicate that the three indexes of daily total energy, daily frequency, and daily maximum energy are positively correlated with the risk of impact. Accordingly, the sample characteristics are selected as the model inputs: the maximum microseismic energy of n days before time t, the total daily energy of n days before time t, and the frequency of n days before time t, as illustrated in
Figure 5a–c. The sample label is established through the expert evaluation system, based on the evaluation results of the microseismic monitoring data. The results are obtained by impact pressure researchers scoring the impact risk at time t in the historical data.
- (1)
Handling Missing Values
Due to network signal interference during the transmission of microseismic data, missing or abnormal data may occur, which may subsequently be filtered out. Therefore, it is necessary to handle missing values in the collected microseismic data. If the amount of missing data is small and the missing values are randomly distributed, the records with missing values can be deleted. However, in microseismic data, the missing parts often contain critical information, and deleting them may affect the completeness and accuracy of the model. As a result, linear interpolation and spline interpolation methods will be used to fill in the missing values.
Linear interpolation: If the missing value is assumed to lie between two data points that exhibit linear variation, it can be filled using linear interpolation.
Spline interpolation: For a smoother interpolation, particularly suited for complex, non-linear variations in data, spline interpolation will be applied.
- (2)
Normalized Processing
The selected data set contains features with different dimensions. In order to enhance the stability of the model and reduce the computational burden, it is necessary to normalize the data [33]. Max–min normalization was selected, in the standard form X* = (x − min)/(max − min), where X* is the normalized value, x is the original value of the feature, max is the maximum value of that feature in the data set, and min is the minimum value.
- (3)
Sliding Window Approach:
A sliding window approach was used to create time series sequences for LSTM. The window size was set to 10, meaning each input to the model consists of data from the past 10 days, and the step size was set to 1. This resulted in 600 data sequences for model training.
Each sequence includes the daily energy values and frequencies from the past 10 days, predicting the risk of rock burst for the next day.
- (4)
Data Splitting
The prepared data were split into a training set (85% of the data) and a validation set (15%). The training set was used to train the LSTM model, while the validation set was employed to evaluate and fine-tune the model performance.
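The following is a minimal sketch of preprocessing steps (1)–(4) above, assuming pandas (with SciPy-backed interpolation) and NumPy; the column names and the short-gap limit for linear interpolation are illustrative assumptions.

```python
# Minimal preprocessing sketch: interpolation of missing values, max-min
# normalization, sliding-window sequence construction, and a chronological split.
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, window: int = 10):
    """df is assumed to hold daily columns 'total_energy', 'max_energy',
    'frequency', and an expert-assigned 'risk_score' label (0-100)."""
    features = ["total_energy", "max_energy", "frequency"]

    # (1) Missing values: linear interpolation for short gaps (here up to 2 days),
    #     then cubic-spline interpolation for the remaining, non-linear stretches.
    df[features] = df[features].interpolate(method="linear", limit=2)
    df[features] = df[features].interpolate(method="spline", order=3)

    # (2) Max-min normalization: X* = (x - min) / (max - min), per feature.
    mins, maxs = df[features].min(), df[features].max()
    df[features] = (df[features] - mins) / (maxs - mins)

    # (3) Sliding window (size 10, step 1): the past `window` days of features
    #     predict the next day's expert risk score.
    values, labels = df[features].to_numpy(), df["risk_score"].to_numpy()
    X, y = [], []
    for t in range(window, len(df)):
        X.append(values[t - window:t])
        y.append(labels[t])
    X, y = np.asarray(X), np.asarray(y)

    # (4) Chronological split: 85% training, 15% validation.
    split = int(0.85 * len(X))
    return (X[:split], y[:split]), (X[split:], y[split:])
```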
3.4. Indicators for Model Evaluation
In order to objectively evaluate the comprehensive performance of the model, it must be borne in mind that this study is essentially a regression problem. Accordingly, the MSE (mean square error), MAE (mean absolute error), MAPE (mean absolute percentage error), and VAF (variance accounted for) are used to evaluate the comprehensive performance of the model. The MSE and MAE quantify the discrepancy between the predicted and actual values; a smaller value indicates greater accuracy. The MAPE measures the average size of the prediction error relative to the true value; the smaller the value, the more accurate the prediction. The VAF assesses the extent to which the variance in the observed data is explained by the model and ranges from 0% to 100%. When the VAF is close to 100%, the model explains the variance in the observed data to a high degree; when the VAF is close to 0%, the model's ability to fit the observed data is severely limited and the variance in the observed data cannot be explained. The calculation formulas are as follows:
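Consistent with the descriptions above and the symbol definitions below, the standard forms of these indicators are:

$$ MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i'\right)^{2}, \qquad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - y_i'\right| $$
$$ MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - y_i'}{y_i}\right|, \qquad VAF = \left(1 - \frac{SS_{res}}{SS_{tot}}\right) \times 100\% $$
$$ SS_{res} = \sum_{i=1}^{n}\left(y_i - y_i'\right)^{2}, \qquad SS_{tot} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2} $$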
In these formulas, yi represents the actual value, yi' represents the predicted value, and n represents the total number of samples.
In Equation (20), SSres represents the residual sum of squares, i.e., the sum of the squared differences between the model's predicted values and the actual observed values. SStot, on the other hand, is the total sum of squares, i.e., the sum of the squared differences between the actual observed values and their mean.
In Equations (21) and (22), yi is the ith observed value, yi' is the model's predicted value for the ith observed value, and ȳ is the mean of the observed values.
5. Discussion of Problems
- (1)
Potential Limitations of Using LSTM for Analyzing Microseismic Data
While LSTM models are powerful tools for analyzing time series data, several limitations may arise when dealing with microseismic data, particularly in the context of nonlinear relationships:
Assumption of Sequential Data: LSTM models excel at capturing temporal dependencies, but they may struggle with nonlinear relationships that are not adequately represented in the training data. If the relationship between microseismic energy and rock burst risk is highly nonlinear, the model may fail to predict outcomes accurately.
Data Quality and Quantity: LSTM requires large amounts of high-quality data to generalize effectively. In mining environments, obtaining sufficient labeled data can be challenging due to the rarity of certain events. Poor-quality or imbalanced data sets can lead to inaccurate predictions and a lack of robustness in the model.
Complexity of Interpretability: The “black box” nature of LSTM makes it difficult to interpret how specific input features influence predictions. This lack of transparency can hinder the practical application of results in critical safety decisions, as stakeholders may be hesitant to trust the model’s recommendations.
Overfitting Risks: LSTMs can overfit to the training data, particularly if the model architecture is too complex for the available data. Overfitting compromises the model’s ability to generalize to unseen data, leading to unreliable predictions in real-world scenarios.
These limitations could affect the study's results by potentially leading to incorrect assessments of rock burst risks, which may undermine the overall safety of mining operations.
- (2)
Integration into Existing Mine Safety Monitoring Systems
Integrating the results of LSTM research into existing mine safety monitoring systems presents both opportunities and challenges:
Data Integration: The successful implementation of LSTM models requires seamless integration with existing data sources, including microseismic monitoring systems, geological databases, and real-time sensor inputs. This may involve significant data preprocessing and synchronization efforts.
Real-Time Processing: LSTM models can be computationally intensive, particularly when processing large data sets in real time. Technological challenges may include ensuring that the existing infrastructure can handle the computational load, which may necessitate upgrades or optimizations.
User Training and Adaptation: Personnel may need training to understand and interpret the outputs from LSTM models effectively. This includes recognizing the limitations and appropriate contexts for applying model predictions.
Continuous Monitoring and Model Updating: Mining environments are dynamic; therefore, the LSTM model should be regularly updated with new data to maintain its predictive accuracy. This requires a robust feedback loop between model outputs and ground truth observations.
- (3)
Risks of Over-Reliance on Machine Learning Algorithms
An over-reliance on machine learning algorithms, such as BO-LSTM, can lead to several critical issues:
Neglect of Traditional Methods: Traditional analysis methods, including empirical studies and domain expertise, provide essential insights that may be overlooked if machine learning is solely relied upon. These methods often incorporate years of expert knowledge, which can be crucial for understanding complex geological conditions.
Potential for Misinterpretation: If decision-makers place undue trust in machine learning outputs without considering traditional methods, they may misinterpret the risks associated with rock bursts, leading to inadequate safety measures.
To mitigate these risks, it is crucial to adopt a hybrid approach that combines machine learning with traditional analysis methods, ensuring a comprehensive understanding of rock burst risks.
- (4)
Discussion of Risks Associated with Deep Learning in Mine Safety
When applying deep learning methods in critical areas such as mine safety, it is vital to discuss potential risks, particularly if the probability of error has not been properly analyzed:
Error Propagation: In safety-critical applications, errors in predictions can have severe consequences. If the model has not been validated thoroughly, it may produce false positives or negatives, leading to misguided safety protocols.
Lack of Error Analysis: Without proper error analysis, it is challenging to understand the model’s limitations and to identify scenarios where it may fail. This is particularly concerning in mining operations, where the stakes are high.
Regulatory Compliance: Mining operations are subject to strict safety regulations. Failure to demonstrate the reliability and accuracy of machine learning models can hinder compliance with industry standards, affecting operational licenses and safety certifications.