Article

D-P-Transformer: A Distilling and Probsparse Self-Attention Rockburst Prediction Method

1 School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
2 Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
3 State Key Laboratory for GeoMechanics and Deep Underground Engineering, China University of Mining & Technology, Beijing 100083, China
* Author to whom correspondence should be addressed.
Energies 2022, 15(11), 3959; https://doi.org/10.3390/en15113959
Submission received: 6 April 2022 / Revised: 23 May 2022 / Accepted: 23 May 2022 / Published: 27 May 2022
(This article belongs to the Topic Interdisciplinary Studies for Sustainable Mining)

Abstract

Rockburst may cause damage to engineering equipment, disrupt construction progress, and endanger human life. To this day, the occurrence of rockburst remains complex and difficult to predict. This study proposes the D-P-Transformer algorithm to address this issue by improving the embedding structure of the Transformer for specific application to rockburst data. To reduce the computational requirement, probsparse self-attention is adopted in place of canonical self-attention. A distilling operation and multiple layer replicas are simultaneously used to enhance the robustness and speed up the algorithm. Taking all relevant rockburst factors into consideration, multiple experiments are conducted on seven large-scale rockburst datasets with different training ratios to verify the reliability of the proposed D-P-Transformer rockburst prediction algorithm. As compared to the original algorithm, the proposed algorithm shows average reductions of 24.45%, 46.56%, 17.32%, and 48.11% in the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE), respectively. The results indicate that the novel D-P-Transformer rockburst prediction algorithm is superior to the Transformer prediction algorithm and could be used for coal mine rockburst prediction analysis.

1. Introduction

Research has shown that rockburst disasters are occurring more frequently due to deeper work in underground mines [1]. It is believed that reliable rockburst prediction techniques can significantly reduce reconstruction risk, project costs, and human casualties [2]. Hence, accurate rockburst prediction has become a popular research topic in recent years.
Rockburst is a nonlinear dynamic phenomenon in which a rock mass instantaneously releases energy along the excavation face [3]. In recent years, the number of rockburst incidents in China has reached 660, and more than 200 people have died as a result [4]. Rockburst has also occurred frequently in the United States, causing many injuries and even deaths [5]. Because rockburst occurs frequently and causes huge losses, rockburst research has received increasingly more attention from scholars.
After realizing the risk and harm of rockburst, numerous researchers have posited that accurate rockburst prediction is a powerful means of mitigating underground rockburst accidents and risks [6,7,8]. In the 1960s, Cook [9] first studied rockburst by conducting uniaxial compression experiments. More recently, He et al. [10] developed a true triaxial rockburst experimental system with three-way, six-sided loading and single-sided sudden unloading. They conducted multiple rockburst experiments, which successfully reproduced the rockburst phenomenon during the excavation of deep underground caverns. As the frequency, amplitude, and ringing counts of rockburst are important pieces of precursor information, they have been thoroughly explored by researchers [11,12,13,14]. Liu and Wang [15] took frequency as an early rockburst warning signal by analyzing the frequency change of the electromagnetic radiation (EMR) signal. Zhang et al. [16] discussed the magnitude of rockburst risk via microseismic (MS) monitoring techniques. Zhang et al. [17] further discussed the variation law of acoustic emission (AE) counts of different degrees of rock damage during rock destruction, concluding that a sudden increase in AE counts can be used as an indicator of rockburst. Via experimentation, Chu et al. [18] proved that the energy trend can be used as a rockburst indicator to prevent the occurrence of rockburst. Gao et al. [19] proposed an energy preservation index to evaluate rockburst potential by studying the energy evolution characteristics of rocks.
Some scholars also believe that accurate rockburst prediction requires not only various key parameters, but also rigorous and appropriate algorithmic models [20,21,22,23]. Chen et al. [24] used the dynamic failure mechanism (DFM) to characterize the dynamic damage of rockburst. Jamei et al. [25] compared neural network algorithms with classical algorithms and demonstrated that the former is more suitable for rockburst prediction. Xue et al. [26] developed the particle swarm optimization-extreme learning machine (PSO-ELM) rockburst prediction model to predict typical rockburst cases in riverside hydropower plants in China. Wojtecki et al. [27] used machine learning algorithms to identify and assess rockburst. Moreover, researchers have started to develop and apply artificial intelligence (AI) models combining multiple factors for rockburst analysis and prediction.
Although previous studies have made some achievements, existing prediction techniques, especially AI techniques, require further improvements to more accurately predict the occurrence of rockburst. Therefore, the novel D-P-Transformer rockburst prediction model is proposed in this study. The model takes peak frequency, amplitude, ringing count, and absolute energy at the moment of rockburst as model input and predicts absolute energy using a novel embedding structure, probsparse self-attention, distilling operation, and multi-layer replicas. Seven large-scale rockburst datasets and four evaluation metrics are used to examine the reliability and validity of the proposed algorithm.

2. Methodology

As shown in Figure 1, the system framework consists of the following four main components.
(1) Data sources: In this study, real-time experimental AE data were collected by using a rockburst physical simulation system;
(2) Pre-processing: The experimental data were imported into the “Big Data AI Visualization and Analysis Platform”, independently developed by the Big Data AI Lab with its own intellectual property rights, for feature extraction. The data were then fed into the embedding structure of the D-P-Transformer rockburst prediction algorithm for pre-processing;
(3) D-P-Transformer: The D-P-Transformer rockburst prediction algorithm model was constructed;
(4) Prediction: The D-P-Transformer rockburst prediction algorithm was used to predict the absolute energy, and four evaluation metrics were used to evaluate the model.

2.1. Data Sources

Rockburst occurs frequently in mining sites such as tunnels and deep wells.
The strain rockburst physical simulation test system, developed by the China University of Mining and Technology, was utilized in this study. This system consists of a main engine, a load control system, and a data collection system, and can simulate the occurrence of rockburst in the laboratory, as depicted in Figure 2. The experimental data were collected by AE sensors, pressure sensors, and high-speed cameras, which record the complete changes of the rock sample during the rockburst in real time. The experiment used two acoustic emission monitoring systems: the PCI-2 acoustic emission sensor from the PAC company in the US and the PXWAE acoustic emission sensor from the Pengxiang company in China. Comparative analysis showed that the data collected by the PCI-2 acoustic emission sensor were the most accurate, with the smallest error; as a result, the experimental data reported here are those of the PCI-2 acoustic emission sensor. The PCI-2 acoustic emission sensor is a dual-channel sensor. The experimental data used are primarily from the first channel, and the data from the second channel serve mainly to validate those from the first channel.
The rockburst datasets used in this study were those of marble-1#, marble-3#, marble-4#, granite-61#, granite-64#, sandstone-60#, and slate-30# rock samples. These rock samples were cut to the specified scale size under conditions of good homogeneity and integrity. As illustrated in Figure 3, force was applied to the rock sample in three directions and on six sides to simulate the force on the rock before mining in the field, such as in coal mines and deep shafts, after which rapid unloading was carried out along the advance direction of the excavation (direction of the hollow arrow) to simulate the force on the rock during construction up to the occurrence of rockburst.

2.2. Pre-Processing

Rockburst experiments generate a large amount of experimental data. The data generated by each rockburst experiment were stored in tens of thousands of text files, each of which contained 4096 sets of index data of multiple parameters at each moment. A large amount of rockburst data collected in the previous step was then imported into the “Big Data AI Visualization and Analysis Platform” for rockburst feature extraction; the peak frequency, amplitude, ringing count, absolute energy, and other important rockburst features were extracted. Finally, a text file of the main feature parameters of the rockburst experiment was exported by the system. A parametric sensitivity analysis was carried out in our previous work, which showed that the frequency, amplitude, and absolute energy of the acoustic emission signal can be used for rockburst prediction [28].
To individually relate the extracted features to the experimental runs, the feature data inside the main parameter text file were used as the input of the embedding structure in the D-P-Transformer rockburst prediction algorithm, as presented in Figure 4. This structure consists of three main parts, namely, the scalar, relative time stamp, and absolute time stamp. This structure is obtained by Equation (1) [29]:
$$X_{\mathrm{en}}^{t}[i] = \alpha u_{i}^{t} + \mathrm{PE}_{(L_{x}(t-1)+i)} + \sum_{p}\left[\mathrm{SE}_{(L_{x}(t-1)+i)}\right]_{p} \qquad (1)$$
where PE is the position and ordering information of the rockburst feature data in the long temporal sequence of rockburst, SE is the temporal information of the input sequence of rockburst feature data, $L_x$ is the input length of the rockburst feature data, $X_{\mathrm{en}}^{t}$ is the input of the rockburst feature at time t in the encoder, which contains the temporal and positional information of the rockburst feature, and $\alpha u_{i}^{t}$ is the input rockburst feature data converted from a 1-dimensional to a 512-dimensional feature vector by Conv1d, which corresponds to the scalar. Moreover, $\mathrm{PE}_{(L_x(t-1)+i)}$ is the integrated position-encoding information, which is the same as the position embedding in the Transformer algorithm. It has the form of Equations (2) and (3) and corresponds to the relative time stamp [29].
$$\mathrm{PE}_{(pos,\,2i)} = \sin\!\left(pos/10000^{2i/d_{\mathrm{model}}}\right) \qquad (2)$$
$$\mathrm{PE}_{(pos,\,2i+1)} = \cos\!\left(pos/10000^{2i/d_{\mathrm{model}}}\right) \qquad (3)$$
where pos is the absolute coordinate of the entire time series and $d_{\mathrm{model}}$ is the dimension of the transformed feature. Equation (2) is the PE calculation formula for even positions, and Equation (3) is the PE calculation formula for odd positions. Moreover, $\sum_{p}[\mathrm{SE}_{(L_x(t-1)+i)}]_p$ is the respective time of each rockburst feature throughout the rockburst experiment, which corresponds to the absolute time stamp.
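As a concrete illustration, the following is a minimal PyTorch sketch of the embedding in Equations (1)–(3), assuming $d_{\mathrm{model}}$ = 512, a single input channel for the scalar projection (the 1-to-512-dimensional Conv1d conversion described above), and a single time covariate for the absolute time stamp; the class and layer names are illustrative and not the authors' implementation.

```python
# Minimal sketch of the embedding in Equations (1)-(3); names are illustrative, not the authors' code.
import math
import torch
import torch.nn as nn

class RockburstEmbedding(nn.Module):
    def __init__(self, c_in=1, d_model=512, max_len=5000):
        super().__init__()
        # Scalar: Conv1d projection of the rockburst feature sequence to d_model dimensions.
        self.value_proj = nn.Conv1d(c_in, d_model, kernel_size=3, padding=1)
        # Relative time stamp: fixed sinusoidal position encoding, Equations (2) and (3).
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        # Absolute time stamp: projection of the experiment-time covariate of each sample.
        self.time_proj = nn.Linear(1, d_model)

    def forward(self, x, t):
        # x: (batch, seq_len, c_in) rockburst features; t: (batch, seq_len, 1) experiment time.
        scalar = self.value_proj(x.transpose(1, 2)).transpose(1, 2)   # alpha * u_i^t
        return scalar + self.pe[: x.size(1)] + self.time_proj(t)      # Equation (1)
```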

2.3. D-P-Transformer

The various characteristics of rockburst differ across time states; thus, rockburst occurrence is a long time-series problem. In recent years, numerous researchers have applied knowledge from mathematics, computer science, and other fields to demonstrate the great advantages of the Transformer model in long time-series prediction at both the theoretical and practical levels. The present work proposes the D-P-Transformer rockburst prediction algorithm to analyze and predict rockburst occurrence, building on the encoder and decoder structures of the Transformer model, as illustrated in Figure 5.
The structure of the encoder in the proposed prediction algorithm is designed to extract the long-term dependencies of long sequence inputs. The encoder, as a result of the probsparse self-attention mechanism, is subject to redundancy in its feature mapping. Therefore, the proposed D-P-Transformer rockburst prediction algorithm uses a distilling operation to assign higher weights to dominant rockburst features, and generates rockburst feature mappings at the next layer using Equation (4) for layers j to j + 1 [30]:
$$X_{j+1}^{t} = \mathrm{MaxPool}\!\left(\mathrm{ELU}\!\left(\mathrm{Conv1d}\!\left([X_{j}^{t}]_{AB}\right)\right)\right) \qquad (4)$$
where $[X_{j}^{t}]_{AB}$ denotes the attention block, which contains the multi-head probsparse self-attention and its essential operations; Conv1d denotes a one-dimensional convolution operation along the time dimension; ELU is the activation function; and MaxPool is maximum pooling. Equation (4) indicates that the input $X_{j}^{t}$ passes through the attention block (AB) layer, the Conv1d mapping, the ELU activation, and the MaxPool layer, in turn, to obtain the encoded input $X_{j+1}^{t}$ for the subsequent layer.
Via the distilling operation step indicated by the blue trapezoids in Figure 5, the input sequence length of each layer of the encoder is shortened, the size of the network is drastically reduced, and the stacking of layers increases the robustness of the model. In addition, multiple layer replicas are used for simultaneous computation during algorithm operation, which speeds up the operation and drastically reduces the amount of computation.
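A minimal PyTorch sketch of the distilling step in Equation (4) could look as follows; it operates on the output of an attention block and roughly halves the sequence length passed to the next layer. The module name and the kernel/stride hyperparameters are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the distilling operation in Equation (4); hyperparameters are assumed.
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)  # Conv1d along time
        self.act = nn.ELU()                                                 # ELU activation
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)        # MaxPool, stride 2

    def forward(self, x_ab):
        # x_ab: (batch, seq_len, d_model), the attention-block output [X_j^t]_AB.
        y = self.conv(x_ab.transpose(1, 2))
        y = self.pool(self.act(y))
        return y.transpose(1, 2)  # (batch, roughly seq_len / 2, d_model): the input X_{j+1}^t
```

Because each such step roughly halves the encoder sequence, stacking these layers shrinks the network sharply, which is the source of the reduced computation described above.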
The decoder of the proposed prediction algorithm uses two multi-head attention layers and provides the input vector via Equation (5) [30]:
$$X_{\mathrm{de}}^{t} = \mathrm{Concat}\!\left(X_{\mathrm{token}}^{t},\, X_{0}^{t}\right) \in \mathbb{R}^{(L_{\mathrm{token}} + L_{y}) \times d_{\mathrm{model}}} \qquad (5)$$
where $X_{\mathrm{de}}^{t}$ is the input of the decoder, $X_{\mathrm{token}}^{t}$ is the known part of the encoder input, $L_{\mathrm{token}}$ is the length of that part, $X_{0}^{t}$ is the masked part of the encoder input, and $L_y$ is the length of the masked part. The decoder takes the sequence $X_{\mathrm{de}}^{t}$ as input, with the latter part masked with 0; the purpose of the entire algorithm is to predict this masked part in the decoder. The masked multi-head attention mechanism prevents each position from attending to subsequent positions, which avoids autoregression, and a fully connected layer then produces the final output.
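For example, the decoder input of Equation (5) can be assembled as in the short sketch below (assuming PyTorch); the zero tensor stands for the masked part $X_{0}^{t}$ whose values the model must predict, and the function name is a hypothetical helper for illustration.

```python
# Minimal sketch of the decoder input in Equation (5): known start token + zero-masked target part.
import torch

def build_decoder_input(x_token: torch.Tensor, pred_len: int) -> torch.Tensor:
    # x_token: (batch, L_token, d_model), the known part taken from the encoder input.
    batch, _, d_model = x_token.shape
    x_zero = torch.zeros(batch, pred_len, d_model, dtype=x_token.dtype, device=x_token.device)
    return torch.cat([x_token, x_zero], dim=1)  # (batch, L_token + L_y, d_model)
```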
The first step of the proposed prediction algorithm is to use probsparse self-attention instead of traditional self-attention, forming the P-Transformer rockburst prediction algorithm. Probsparse self-attention first randomly samples dot-product pairs from K to calculate a sparsity score for each query, and then selects the u queries with the highest scores to calculate the attention values. The remaining queries are not involved in this calculation; for them, the mean of the values is taken directly as the output of the self-attention layer. As a result, the algorithm only computes the attention matrix for the Top-u queries $q_i$ and directly takes the average value for the remaining $q_i$; thus, the amount of computation is greatly reduced.
The traditional self-attention input is (Query, Key, Value), and the scaled dot-product attention is obtained by Equation (6) [31]:
$$A(Q, K, V) = \mathrm{Softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V \qquad (6)$$
where $Q \in \mathbb{R}^{L_Q \times d}$, $K \in \mathbb{R}^{L_K \times d}$, and $V \in \mathbb{R}^{L_V \times d}$. Moreover, $d$ is the dimension of the input, and $Q$, $K$, and $V$ are the query, key, and value matrices of the input, respectively. The weighted probability of the ith query has the form of Equation (7) [31]:
$$A(q_i, K, V) = \sum_{j}\frac{k(q_i, k_j)}{\sum_{l} k(q_i, k_l)}\, v_j = \mathbb{E}_{p(k_j \mid q_i)}\left[v_j\right] \qquad (7)$$
where $p(k_j \mid q_i) = k(q_i, k_j)\,/\sum_{l} k(q_i, k_l)$ and $k(q_i, k_j) = \exp\!\left(q_i k_j^{T}/\sqrt{d}\right)$. Moreover, $q_i$, $k_i$, and $v_i$, respectively, denote the ith rows of matrices $Q$, $K$, and $V$.
The KL divergence can effectively evaluate the sparsity of the ith query, as it measures how far the attention probability distribution of that query deviates from the uniform distribution. The KL divergence is calculated by Equation (8) [31]:
$$KL(q \,\|\, p) = \ln \sum_{l=1}^{L_K} e^{\,q_i k_l^{T}/\sqrt{d}} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d}} - \ln L_K \qquad (8)$$
Because $\ln L_K$ is a constant, it does not affect the derivation. Therefore, the measurement equation for the sparsity of the ith query is Equation (9):
$$M(q_i, K) = \ln \sum_{j=1}^{L_K} e^{\,q_i k_j^{T}/\sqrt{d}} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d}} \qquad (9)$$
For each query $q_i$, the first term is the Log-Sum-Exp (LSE) over all keys and the second term, $\frac{1}{L_K}\sum_{j=1}^{L_K} q_i k_j^{T}/\sqrt{d}$, is the arithmetic mean. To simplify the formula, $\ln\sum_{j=1}^{L_K} e^{\,q_i k_j^{T}/\sqrt{d}}$ is replaced by the maximum value $\max_j\{q_i k_j^{T}/\sqrt{d}\}$. Therefore, after simplification, Equation (10) can be used instead [31]:
$$\bar{M}(q_i, K) = \max_{j}\left\{\frac{q_i k_j^{T}}{\sqrt{d}}\right\} - \frac{1}{L_K}\sum_{j=1}^{L_K}\frac{q_i k_j^{T}}{\sqrt{d}} \qquad (10)$$
The final formula for probsparse self-attention can be obtained as Equation (11) [31]:
$$A(Q, K, V) = \mathrm{Softmax}\!\left(\frac{\bar{Q}K^{T}}{\sqrt{d}}\right)V \qquad (11)$$
where $\bar{Q}$ is a sparse matrix of the same size as $Q$ that contains only the active (Top-u) queries obtained after the linear transformation of the input $X_{\mathrm{en}}^{t}$.
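The following single-head sketch (assuming PyTorch) ties Equations (8)–(11) together: dot-product pairs are sampled, the measurement $\bar{M}$ of Equation (10) is computed per query, the Top-u queries attend normally, and the remaining positions simply output the mean of V. It is an illustration of the idea rather than the authors' implementation, and the sampling factor is an assumed hyperparameter.

```python
# Single-head sketch of probsparse self-attention (Equations (8)-(11)); not the authors' code.
import math
import torch

def probsparse_attention(q, k, v, factor=5):
    # q, k, v: (batch, L, d)
    B, L, d = q.shape
    u = min(L, int(factor * math.ceil(math.log(L + 1))))        # number of active (Top-u) queries
    n_sample = u                                                 # number of sampled dot-product pairs

    # Sample keys and compute the sparsity measurement M_bar = max(S_bar) - mean(S_bar) by row.
    idx = torch.randint(0, L, (n_sample,), device=q.device)
    s_bar = q @ k[:, idx, :].transpose(-2, -1) / math.sqrt(d)    # (B, L, n_sample)
    m_bar = s_bar.max(dim=-1).values - s_bar.mean(dim=-1)        # (B, L), Equation (10)

    # Keep only the Top-u queries as the sparse matrix Q_bar.
    top = m_bar.topk(u, dim=-1).indices                          # (B, u)
    q_bar = torch.gather(q, 1, top.unsqueeze(-1).expand(-1, -1, d))

    # Inactive queries output mean(V); active queries use scaled dot-product attention, Equation (11).
    out = v.mean(dim=1, keepdim=True).expand(B, L, d).clone()
    attn = torch.softmax(q_bar @ k.transpose(-2, -1) / math.sqrt(d), dim=-1)
    out.scatter_(1, top.unsqueeze(-1).expand(-1, -1, d), attn @ v)
    return out
```

With L = 96 and factor = 5, for instance, only u = 25 of the 96 queries are evaluated in full, which is where the computational saving described above comes from.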
The P-Transformer rockburst prediction algorithm is presented in Algorithm 1.
Algorithm 1 The P-Transformer algorithm.
1:  Input: database $y_i \in \{y_1, y_2, \ldots, y_n\}$, feature map $S$, $i \in [1, N]$,
    tensors $Q \in \mathbb{R}^{L_Q \times d}$, $K \in \mathbb{R}^{L_K \times d}$, $V \in \mathbb{R}^{L_V \times d}$
2:  Output: predicted value $\bar{y}_i$
3:  for $i = 1, 2, \ldots, N$ do
4:      according to Equations (1)–(3), get $y_i$
5:  end for
6:  randomly select dot-product pairs from $K$ as $\bar{K}$
7:  compute the KL divergence
8:  set the sparse score $\bar{S} = Q\bar{K}$
9:  compute the measurement $\bar{M} = \max(\bar{S}) - \mathrm{mean}(\bar{S})$ by row
10: set the Top-u queries under $\bar{M}$ as $\bar{Q}$
11: $S_1 \leftarrow \mathrm{softmax}(\bar{Q}K^{T}/\sqrt{d})\,V$, $S_0 \leftarrow \mathrm{mean}(V)$
12: $S \leftarrow \{S_1, S_0\}$
13: for $i = 1, 2, \ldots, N$ do
14:     according to Equation (5) of the decoder, get $\bar{y}_i$
15: end for
Based on the P-Transformer rockburst prediction algorithm, the D-P-Transformer algorithm is formed by adding a distilling operation to the encoder part and simultaneously calculating multiple layer replicas. The specific algorithm is presented in Algorithm 2. When the rockburst feature map S is obtained, the rockburst features with larger weights are extracted by the convolution operation, activation function, and maximum pooling, which significantly reduces the size of the network, and the result is used as the encoding input for the next layer.
Algorithm 2 The D-P-Transformer algorithm.
1:  Input: dataset $y_i \in \{y_1, y_2, \ldots, y_n\}$, feature map $S$, $i \in [1, N]$,
    tensors $Q \in \mathbb{R}^{L_Q \times d}$, $K \in \mathbb{R}^{L_K \times d}$, $V \in \mathbb{R}^{L_V \times d}$
2:  Output: predicted value $\bar{y}_i$
3:  for $i = 1, 2, \ldots, N$ do
4:      according to Equations (1)–(3), get $y_i$
5:  end for
6:  compute the measurement $\bar{M} = \max(\bar{S}) - \mathrm{mean}(\bar{S})$ by row
7:  set the Top-u queries under $\bar{M}$ as $\bar{Q}$
8:  $S_1 \leftarrow \mathrm{softmax}(\bar{Q}K^{T}/\sqrt{d})\,V$, $S_0 \leftarrow \mathrm{mean}(V)$
9:  $S \leftarrow \{S_1, S_0\}$
10: for $i = 1, 2, \ldots, N$ do
11:     according to Equations (4) and (5) of the encoder and multiple layer replicas, get $\bar{y}_i$
12: end for
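To show how these pieces fit together, the following shape-level sketch (assuming PyTorch) stacks attention blocks with distilling steps as in the encoder of Figure 5; `nn.MultiheadAttention` is used here only as a stand-in for the probsparse self-attention block, and the class name and layer count are illustrative assumptions.

```python
# Shape-level sketch of an encoder stack with distilling; MultiheadAttention stands in for
# probsparse self-attention, and the layer count is an assumption for illustration.
import torch
import torch.nn as nn

class EncoderStack(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=3):
        super().__init__()
        self.attn = nn.ModuleList(
            [nn.MultiheadAttention(d_model, n_heads, batch_first=True) for _ in range(n_layers)]
        )
        self.distill = nn.ModuleList(
            [nn.Sequential(
                nn.Conv1d(d_model, d_model, kernel_size=3, padding=1),
                nn.ELU(),
                nn.MaxPool1d(kernel_size=3, stride=2, padding=1),
            ) for _ in range(n_layers)]
        )

    def forward(self, x):                                   # x: (batch, L, d_model)
        for attn, distill in zip(self.attn, self.distill):
            x, _ = attn(x, x, x)                            # attention block [X_j^t]_AB
            x = distill(x.transpose(1, 2)).transpose(1, 2)  # Equation (4): L -> roughly L / 2
        return x

x = torch.randn(2, 96, 512)
print(EncoderStack()(x).shape)  # torch.Size([2, 12, 512]): 96 -> 48 -> 24 -> 12
```

Several replicas of such a stack can be run on the same input at the same time and their outputs combined, which is the "multiple layer replicas" idea used above to speed up the computation and stabilize the model.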

2.4. Prediction

In this study, seven large-scale rockburst datasets were used as the experimental data, namely, marble-1#, marble-3#, marble-4#, slate-30#, sandstone-60#, granite-61#, and granite-64#. These datasets were used to enhance the reliability of the experimental results after the prediction model was constructed. Moreover, various metrics were used to verify the performance of the algorithm model, namely, the mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The prediction experiments and results are detailed in the next section.

3. Results and Discussion

3.1. Evaluation Metrics

To assess the reliability and accuracy of the proposed model, several metrics were used to provide a comprehensive understanding of the rockburst datasets used and the proposed model. The formulas and meanings of the evaluation metrics are detailed in Table 1. In the model evaluation, the smaller the value of these metrics, the better the performance of the tested model. Moreover, n is the number of rockburst feature data points in the experiment, and $y_i$ and $\bar{y}_i$ are the ith measured and predicted rockburst feature data points, respectively, in the experiment.
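For reference, the four metrics of Table 1 can be computed directly from the measured and predicted absolute-energy series, for instance with the short NumPy sketch below; the function name is a hypothetical helper.

```python
# Minimal NumPy sketch of the evaluation metrics in Table 1.
import numpy as np

def evaluate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    return {
        "MAE": np.mean(np.abs(err)),
        "MSE": np.mean(err ** 2),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": np.mean(np.abs(err / y_true)),  # assumes no zero-valued measurements
    }
```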

3.2. Results

To solve the problem of the huge amount of rockburst data, the P-Transformer rockburst prediction algorithm, formed by using probsparse self-attention instead of traditional self-attention, was first used for experimentation, and the prediction results were compared with those of the Transformer algorithm model. Among the prediction results of different ratios of training and test sets for the same type of rock sample, three sets of optimal prediction results were selected. Table 2 reports the results of a large number of experiments using multiple types of rockburst data and different training and test set ratios; the best results are highlighted in boldface. Most of the prediction results of the P-Transformer rockburst prediction algorithm were found to be better than those of the Transformer algorithm. The P-Transformer selects the Top-u rockburst data to calculate the attention value, which not only greatly reduces the amount of calculation, but also improves the indicator results as compared to those of the original model.
In the proposed algorithm, a distilling operation and multiple layer replicas are used simultaneously, which greatly reduces the running time and enhances the robustness. Table 3 compares the prediction results of the D-P-Transformer rockburst prediction algorithm with those of the Transformer algorithm based on numerous experiments using multiple types of rockburst data and different training and test set ratios; the best results are highlighted in boldface. Most of the results of the D-P-Transformer rockburst prediction algorithm were found to be better than those of the Transformer algorithm, and the metrics were significantly better than those of the Transformer model.

3.3. Discussion

For model evaluation, three sets of optimal prediction results were selected for statistical analysis to ensure the performance of the proposed model. To prevent excessively large or excessively small errors, multiple sets of prediction results for the same type of rock were averaged. Figure 6 presents the comparison of the average values of multiple sets of prediction results of the P-Transformer rockburst prediction algorithm with those of the Transformer model. As shown in Figure 6a, the MAE values of the P-Transformer algorithm were found to be slightly lower than those of the Transformer model on the marble-3#, marble-4#, slate-30#, sandstone-60#, and granite-61# rock samples. The values for the marble-3#, marble-4#, and granite-61# rock samples were close to 0, indicating almost no error. As shown in Figure 6b,c, the MSE and RMSE values of the P-Transformer algorithm were found to be slightly lower than those of the Transformer model on the marble-1#, marble-3#, marble-4#, sandstone-60#, and granite-64# rock samples. The best results, with almost no error, were obtained on the marble-4# and granite-61# rock samples, indicating that this model is the most suitable for these types of rocks. As shown in Figure 6d, the MAPE values of the P-Transformer algorithm were found to be significantly lower than those of the original model for all rock samples, indicating that the errors of this model were generally smaller than those of the Transformer model, and that this model is therefore more suitable for rockburst prediction.
To further reduce the error and computation time, the D-P-Transformer rockburst prediction algorithm was formed by using multiple layer replicas for simultaneous computation, on the basis of the P-Transformer algorithm. Figure 7 exhibits the comparison of the average values of multiple sets of prediction results of the D-P-Transformer algorithm with those of the P-Transformer model. As shown in Figure 7a, the MAE values of the D-P-Transformer algorithm were lower than those of the P-Transformer model, except for the value for the marble-1# rock sample. The MAE values for the marble-3#, marble-4#, and granite-61# rock samples were close to zero, which is equivalent to no error and indicates high model prediction ability. As shown in Figure 7b,c, the MSE and RMSE values of the D-P-Transformer algorithm were found to be almost equal to those of the P-Transformer model for the slate-30# rock samples, and were lower than those of the original model for the rest of the rock samples. The best results, with almost no errors, for the marble-4# and granite-61# rock samples indicate that this model is most suitable for these types of rocks. As shown in Figure 7d, the MAPE values of the D-P-Transformer algorithm were found to be significantly lower than those of the original model for all the rock samples. This indicates that the errors of this model were all smaller than those of the P-Transformer model, and that this model is more suitable for rockburst prediction.
Based on the comparison with the original prediction algorithm, the average metric results of multiple groups of experiments were compared with the results of the LSTM prediction algorithm, as exhibited in Figure 8. The results indicate that the proposed D-P-Transformer rockburst prediction algorithm achieved the best overall performance. As presented in Figure 8a, it is obvious from the three rock sample experiments of marble-3#, slate-30#, and granite-64# that the prediction errors of the algorithms, from least to greatest, were those of the D-P-Transformer, LSTM, P-Transformer, and Transformer algorithms. In particular, for the three rock sample experiments of marble-3#, marble-4#, and granite-61#, the prediction errors of the D-P-Transformer algorithm were quite low, even close to 0, indicating that this algorithm is especially suitable for these three types of rocks. As presented in Figure 8b,c, it is evident from the four rock sample experiments of marble-1#, marble-3#, sandstone-4#, and granite-64# that the errors of the LSTM, Transformer, P-Transformer, and D-P-Transformer algorithms decreased sequentially, indicating the improvement of the model performance. For the marble-4# and granite-61# rock experiments in particular, the MSE and RMSE values of the D-P-Transformer rockburst prediction algorithm were almost 0, indicating that this algorithm is very suitable for these two types of rock samples. Figure 8d shows that the prediction errors of the D-P-Transformer algorithm were better than those of the other three algorithms for the marble-1#, marble-3#, marble-4#, and granite-64# rock samples, and were close to 0 for the marble-1#, marble-3#, and marble-4# samples.
On the basis that the proposed model is superior to the original model, the prediction data of each model were observed, the maximum and minimum values were removed, and the average value was then taken as the final model evaluation metric result and compared with the HGSO-ANFIS algorithm [32], as reported in Table 4. The values of all performance metrics of the proposed D-P-Transformer rockburst prediction algorithm were better than those of the other algorithms. Moreover, in terms of the MAE, MSE, RMSE, and MAPE values, the performance of the P-Transformer algorithm was improved by 15.29%, 21.82%, 9.56%, and 21.88%, respectively, as compared with the Transformer model. In terms of the MAE, MSE, RMSE, and MAPE values, the performance of the D-P-Transformer algorithm was improved by 24.45%, 46.56%, 17.32%, and 48.11%, respectively, as compared with the Transformer model. In summary, in terms of the MAE, MSE, RMSE, and MAPE metrics, the proposed D-P-Transformer algorithm is the most suitable for rockburst prediction, with a smaller error and better results than the other four compared algorithms.
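These percentages follow directly from the averaged values in Table 4 as relative error reductions; for example, for the MAE, (2.466 − 1.863) / 2.466 ≈ 24.45%, and the same calculation on the MSE, RMSE, and MAPE columns gives (183.095 − 97.852) / 183.095 ≈ 46.56%, (11.964 − 9.892) / 11.964 ≈ 17.32%, and (2.303 − 1.195) / 2.303 ≈ 48.11%.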

4. Conclusions

This study proposed a novel D-P-Transformer rockburst prediction algorithm for the prediction of the absolute energy of rockburst experiments. First, the embedding structure was improved based on the traditional Transformer model to increase the applicability of the algorithm to rockburst data. Then, probsparse self-attention was used instead of traditional self-attention to significantly reduce the computational effort. Finally, a distilling operation and multiple layer replicas were employed simultaneously to improve the computational efficiency.
The results of prediction revealed that, compared with those of the Transformer algorithm, the MAE, MSE, RMSE, and MAPE values of the proposed D-P-Transformer rockburst prediction algorithm were reduced by averages of 24.45%, 46.56%, 17.32%, and 48.11%, respectively. The results of prediction revealed that, compared with the other algorithms, the MAE, MSE, RMSE, and MAPE values of the proposed D-P-Transformer rockburst prediction algorithm were reduced by a maximum of 90.67%, 90.41%, 69.03%, and 48.11%, respectively. In all aspects, the D-P-Transformer rockburst prediction algorithm outperforms HGSO-ANFIS and other prediction algorithms. The results indicate that the proposed D-P-Transformer algorithm not only produces less prediction error but is also more suitable for the rockburst prediction analysis of multiple types of rocks, as compared to the traditional Transformer model.
The proposed D-P-Transformer rockburst prediction algorithm can potentially reduce mining-related economic losses, casualties, and environmental impacts. Nevertheless, differences exist between rockburst simulated under experimental conditions and rockburst in the field. Therefore, in future research, it is suggested that the proposed method be applied to the Tayi tunnel in Yunnan Province, China to verify its performance in a real-life situation. Future studies could also benefit from the combination of the proposed D-P-Transformer rockburst prediction algorithm with multiple decision mechanisms to increase the accuracy of coal mine rockburst prediction. Multidisciplinary approaches are required to integrate mining’s technical, economic, environmental, and social aspects into sustainable development.

Author Contributions

Conceptualization, Y.Z. and J.L.; methodology, J.L.; software, J.L.; validation, Y.Z., J.L. and D.L.; formal analysis, J.L.; investigation, G.C. and J.D.; resources, Y.Z. and D.L.; data curation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, Y.Z. and J.L.; visualization, G.C.; supervision, Y.Z.; project administration, J.D.; funding acquisition, Y.Z. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State Key Laboratory for GeoMechanics and Deep Underground Engineering & Institute for Deep Underground Science and Engineering, grant number XD2021022, and the BUCEA Post Graduate Innovation Project, grant number PG2021054.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks are due to our parents, our teachers, Su Yilin, Gao Kailong, and the editors for their support.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. He, X.-Q.; Zhou, C.; Song, D.-Z.; Li, Z.-L.; Cao, A.-Y.; He, S.-Q.; Khan, M. Mechanism and monitoring and early warning technology for rockburst in coal mines. Int. J. Miner. Met. Mater. 2021, 28, 1097–1111.
2. Wang, C.-L.; Chen, Z.; Liao, Z.-F.; Hou, X.-L.; Liu, J.-F.; Wang, A.-W.; Li, C.-F.; Qian, P.-F.; Li, G.-Y.; Lu, H. Experimental investigation on predicting precursory changes in entropy for dominant frequency of rockburst. J. Central South Univ. 2020, 27, 2834–2848.
3. Feng, G.; Lin, M.; Yu, Y.; Fu, Y. A Microseismicity-Based Method of Rockburst Intensity Warning in Deep Tunnels in the Initial Period of Microseismic Monitoring. Energies 2020, 13, 2698.
4. Cai, W.; Dou, L.; Si, G.; Cao, A.; He, J.; Liu, S. A principal component analysis/fuzzy comprehensive evaluation model for coal burst liability assessment. Int. J. Rock Mech. Min. Sci. 2016, 100, 62–69.
5. Mark, C. Coal bursts in the deep longwall mines of the United States. Int. J. Coal Sci. Technol. 2016, 3, 1–9.
6. Swolkień, J.; Szlązak, N. The Impact of the Coexistence of Methane Hazard and Rock-Bursts on the Safety of Works in Underground Hard Coal Mines. Energies 2020, 14, 128.
7. Zhang, Z.; Luo, C.; Zhang, H.; Gong, R. Rockburst Identification Method Based on Energy Storage Limit of Surrounding Rock. Energies 2020, 13, 343.
8. Zhang, H.; Ma, C.; Li, T. Quantitative Evaluation of the “Non-Enclosed” Microseismic Array: A Case Study in a Deeply Buried Twin-Tube Tunnel. Energies 2019, 12, 2006.
9. Cook, N.G.W. The basic mechanics of rockbursts. J. S. Afr. Inst. Min. Metall. 1963, 64, 71–81. Available online: https://hdl.handle.net/10520/AJA0038223X_3752 (accessed on 20 May 2022).
10. He, M.C.; Zhao, F.; Cai, M.; Du, S. A Novel Experimental Technique to Simulate Pillar Burst in Laboratory. Rock Mech. Rock Eng. 2014, 48, 1833–1848.
11. Su, G.; Shi, Y.; Feng, X.; Jiang, J.; Zhang, J.; Jiang, Q. True-Triaxial Experimental Study of the Evolutionary Features of the Acoustic Emissions and Sounds of Rockburst Processes. Rock Mech. Rock Eng. 2017, 51, 375–389.
12. Shen, H.; Li, X.; Li, Q.; Wang, H. A method to model the effect of pre-existing cracks on P-wave velocity in rocks. J. Rock Mech. Geotech. Eng. 2019, 12, 493–506.
13. Sun, H.; Ma, L.; Konietzky, H.; Yuanyuan, D.; Wang, F. Characteristics and generation mechanisms of key infrared radiation signals during damage evolution in sandstone. Acta Geotech. 2021, 17, 1753–1763.
14. Wang, C.; Cao, C.; Liu, Y.; Li, C.; Li, G.; Lu, H. Experimental investigation on synergetic prediction of rockburst using the dominant-frequency entropy of acoustic emission. Nat. Hazards 2021, 108, 3253–3270.
15. Liu, X.; Wang, E. Study on characteristics of EMR signals induced from fracture of rock samples and their application in rockburst prediction in copper mine. J. Geophys. Eng. 2017, 15, 909–920.
16. Zhang, H.; Chen, L.; Chen, S.G.; Sun, J.C.; Yang, J.S. The Spatiotemporal Distribution Law of Microseismic Events and Rockburst Characteristics of the Deeply Buried Tunnel Group. Energies 2018, 11, 3257.
17. Zhang, Y.; Wu, W.; Yao, X.; Liang, P.; Sun, L.; Liu, X. Study on Spectrum Characteristics and Clustering of Acoustic Emission Signals from Rock Fracture. Circuits Syst. Signal Process. 2019, 39, 1133–1145.
18. Chu, Y.; Sun, H.; Zhang, D. Experimental study on evolution in the characteristics of permeability, deformation, and energy of coal containing gas under triaxial cyclic loading-unloading. Energy Sci. Eng. 2019, 7, 2112–2123.
19. Gao, L.; Gao, F.; Xing, Y.; Zhang, Z. An Energy Preservation Index for Evaluating the Rockburst Potential Based on Energy Evolution. Energies 2020, 13, 3636.
20. Zeng, A.; Yan, L.; Huang, Y.; Ren, E.; Liu, T.; Zhang, H. Intelligent Detection of Small Faults Using a Support Vector Machine. Energies 2021, 14, 6242.
21. Świątek, J.; Janoszek, T.; Cichy, T.; Stoiński, K. Computational Fluid Dynamics Simulations for Investigation of the Damage Causes in Safety Elements of Powered Roof Supports—A Case Study. Energies 2021, 14, 1027.
22. He, S.; Song, D.; Li, Z.; He, X.; Chen, J.; Zhong, T.; Lou, Q. Mechanism and Prevention of Rockburst in Steeply Inclined and Extremely Thick Coal Seams for Fully Mechanized Top-Coal Caving Mining and Under Gob Filling Conditions. Energies 2020, 13, 1362.
23. Chlebowski, D.; Burtan, Z. Mining-Induced Seismicity during Development Works in Coalbeds in the Context of Forecasts of Geomechanical Conditions. Energies 2021, 14, 6675.
24. Chen, Y.; Zhang, J.; Zhang, J.; Xu, B.; Zhang, L.; Li, W. Rockburst Precursors and the Dynamic Failure Mechanism of the Deep Tunnel: A Review. Energies 2021, 14, 7548.
25. Jamei, M.; Hasanipanah, M.; Karbasi, M.; Ahmadianfar, I.; Taherifar, S. Prediction of flyrock induced by mine blasting using a novel kernel-based extreme learning machine. J. Rock Mech. Geotech. Eng. 2021, 13, 1438–1451.
26. Xue, Y.; Bai, C.; Qiu, D.; Kong, F.; Li, Z. Predicting rockburst with database using particle swarm optimization and extreme learning machine. Tunn. Undergr. Space Technol. 2020, 98, 103287.
27. Wojtecki, L.; Iwaszenko, S.; Apel, D.B.; Cichy, T. An Attempt to Use Machine Learning Algorithms to Estimate the Rockburst Hazard in Underground Excavations of Hard Coal Mine. Energies 2021, 14, 6928.
28. Zhang, Y.; Li, J.T.; Su, Y.L.; Gao, K.L.; Liu, K.F. Big Data Analysis of Acoustic Emission Characteristics of Laizhou Granite Rockburst Experiment. Railw. Eng. 2021, 61, 84–87. Available online: https://kns-cnki-net-443.door.bucea.edu.cn/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2021&filename=TDJZ202108018&uniplatform=NZKPT&v=qvw_GJqepnF4lp1Idbihyftk2RD2u6lhAYyC3Uixkv0xldVZP9MXCfieSs98XuTF (accessed on 20 May 2022).
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Processing Syst. 2017, 30, 1–15.
30. Lu, W.; Gao, L.; Li, Z.; Wang, D.; Cao, H. Prediction of Long-Term Elbow Flexion Force Intervals Based on the Informer Model and Electromyography. Electronics 2021, 10, 1946.
31. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI, virtually, 2–9 February 2021; p. 2.
32. Xie, C.; Nguyen, H.; Bui, X.-N.; Nguyen, V.-T.; Zhou, J. Predicting roof displacement of roadways in underground coal mines using adaptive neuro-fuzzy inference system optimized by various physics-based optimization algorithms. J. Rock Mech. Geotech. Eng. 2021, 13, 1452–1465.
Figure 1. The system framework of the D-P-Transformer rockburst prediction algorithm.
Figure 2. The rockburst physical simulation system.
Figure 3. The rock sample stress conditions.
Figure 4. The embedding structure.
Figure 5. The structure of the D-P-Transformer rockburst prediction algorithm.
Figure 6. The metric results of the P-Transformer and Transformer algorithms: (a) MAE; (b) MSE; (c) RMSE; (d) MAPE.
Figure 7. The metrics of the D-P-Transformer and P-Transformer algorithms: (a) MAE; (b) MSE; (c) RMSE; (d) MAPE.
Figure 8. The metrics of various models: (a) MAE; (b) MSE; (c) RMSE; (d) MAPE.
Table 1. The evaluation metrics.

Metric | Formula | Significance
Mean absolute error (MAE) | $MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \bar{y}_i\right|$ | The mean of the absolute errors over all the predictions in the rockburst experiments
Mean square error (MSE) | $MSE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \bar{y}_i\right|^2$ | The mean of the squared errors over all the predictions in the rockburst experiments
Root mean square error (RMSE) | $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left|y_i - \bar{y}_i\right|^2}$ | The root of the mean squared error over all the predictions in the rockburst experiments
Mean absolute percentage error (MAPE) | $MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \bar{y}_i}{y_i}\right|$ | The mean of the absolute percentage errors over all the predictions in the rockburst experiments
Table 2. The metrics of the P-Transformer and Transformer algorithms on different datasets.

Sample | Run | MAE (Transformer / P-Transformer) | MSE (Transformer / P-Transformer) | RMSE (Transformer / P-Transformer) | MAPE (Transformer / P-Transformer)
1# | Num.1 | 1.799 / 2.009 | 532.634 / 534.817 | 23.078 / 23.126 | 0.468 / 0.641
1# | Num.2 | 1.732 / 1.737 | 477.107 / 415.467 | 21.842 / 20.383 | 0.880 / 0.567
1# | Num.3 | 1.736 / 1.737 | 364.028 / 328.099 | 19.080 / 18.114 | 0.652 / 0.574
3# | Num.1 | 0.513 / 0.303 | 135.020 / 97.976 | 11.620 / 9.898 | 1.141 / 0.417
3# | Num.2 | 0.439 / 0.427 | 97.799 / 69.145 | 9.889 / 8.315 | 1.015 / 0.494
3# | Num.3 | 0.143 / 0.132 | 69.077 / 25.009 | 8.311 / 5.001 | 0.656 / 0.153
4# | Num.1 | 0.075 / 0.050 | 5.960 / 0.948 | 2.441 / 0.974 | 1.050 / 0.318
4# | Num.2 | 0.099 / 0.088 | 7.961 / 0.951 | 2.822 / 0.975 | 1.323 / 1.015
4# | Num.3 | 0.063 / 0.030 | 0.950 / 0.001 | 0.975 / 0.032 | 1.321 / 0.906
30# | Num.1 | 9.076 / 7.486 | 400.435 / 430.314 | 20.011 / 20.744 | 1.163 / 1.967
30# | Num.2 | 10.486 / 7.595 | 334.541 / 370.259 | 18.290 / 19.242 | 2.699 / 2.588
30# | Num.3 | 8.194 / 7.194 | 232.350 / 334.541 | 15.243 / 18.290 | 5.000 / 3.486
60# | Num.1 | 0.750 / 0.807 | 111.434 / 112.391 | 10.556 / 10.601 | 14.878 / 13.231
60# | Num.2 | 0.775 / 0.763 | 111.704 / 98.588 | 10.569 / 9.929 | 14.761 / 12.591
60# | Num.3 | 0.849 / 0.754 | 113.671 / 9.200 | 10.662 / 3.033 | 15.326 / 14.045
61# | Num.1 | 0.214 / 0.262 | 0.123 / 0.140 | 0.351 / 0.375 | 2.049 / 1.200
61# | Num.2 | 0.109 / 0.084 | 0.093 / 0.004 | 0.305 / 0.064 | 1.978 / 1.194
61# | Num.3 | 0.179 / 0.041 | 0.152 / 0.098 | 0.389 / 0.314 | 2.405 / 1.136
64# | Num.1 | 19.100 / 19.996 | 414.763 / 479.841 | 20.366 / 21.905 | 3.672 / 4.370
64# | Num.2 | 14.573 / 14.559 | 357.963 / 352.919 | 18.920 / 18.786 | 2.667 / 2.448
64# | Num.3 | 11.114 / 11.072 | 352.790 / 212.707 | 18.783 / 14.584 | 6.403 / 3.057
Table 3. The metrics of the D-P-Transformer and Transformer algorithms on different datasets.

Sample | Run | MAE (Transformer / D-P-Transformer) | MSE (Transformer / D-P-Transformer) | RMSE (Transformer / D-P-Transformer) | MAPE (Transformer / D-P-Transformer)
1# | Num.1 | 1.799 / 2.320 | 532.634 / 435.933 | 23.078 / 20.879 | 0.468 / 0.553
1# | Num.2 | 1.732 / 1.768 | 477.107 / 377.680 | 21.842 / 19.434 | 0.880 / 0.609
1# | Num.3 | 1.736 / 1.728 | 364.028 / 258.550 | 19.080 / 16.079 | 0.652 / 0.391
3# | Num.1 | 0.513 / 0.222 | 135.020 / 97.774 | 11.620 / 9.888 | 1.141 / 0.352
3# | Num.2 | 0.439 / 0.305 | 97.799 / 69.014 | 9.889 / 8.307 | 1.015 / 0.457
3# | Num.3 | 0.143 / 0.134 | 69.077 / 25.005 | 8.311 / 5.000 | 0.656 / 0.237
4# | Num.1 | 0.075 / 0.048 | 5.960 / 0.676 | 2.441 / 0.822 | 1.050 / 0.163
4# | Num.2 | 0.099 / 0.058 | 7.961 / 0.021 | 2.822 / 0.144 | 1.323 / 0.839
4# | Num.3 | 0.063 / 0.010 | 0.950 / 0.001 | 0.975 / 0.012 | 1.321 / 0.339
30# | Num.1 | 9.076 / 7.234 | 400.435 / 356.908 | 20.011 / 18.892 | 1.163 / 2.515
30# | Num.2 | 10.486 / 7.255 | 334.541 / 356.908 | 18.290 / 18.892 | 2.699 / 2.167
30# | Num.3 | 8.194 / 5.255 | 232.350 / 324.450 | 15.243 / 18.013 | 5.000 / 2.106
60# | Num.1 | 0.750 / 0.787 | 111.434 / 53.757 | 10.556 / 7.332 | 14.878 / 15.419
60# | Num.2 | 0.775 / 0.433 | 111.704 / 35.206 | 10.569 / 5.933 | 14.761 / 11.093
60# | Num.3 | 0.849 / 0.577 | 113.671 / 8.839 | 10.662 / 2.973 | 15.326 / 9.428
61# | Num.1 | 0.214 / 0.006 | 0.123 / 0.101 | 0.351 / 0.317 | 2.049 / 1.200
61# | Num.2 | 0.109 / 0.004 | 0.093 / 0.005 | 0.305 / 0.069 | 1.978 / 1.194
61# | Num.3 | 0.179 / 0.035 | 0.152 / 0.098 | 0.389 / 0.313 | 2.405 / 1.136
64# | Num.1 | 19.100 / 15.771 | 414.763 / 353.194 | 20.366 / 18.793 | 3.672 / 1.634
64# | Num.2 | 14.573 / 14.071 | 357.963 / 212.835 | 18.920 / 14.589 | 2.667 / 1.480
64# | Num.3 | 11.114 / 9.759 | 352.790 / 353.166 | 18.783 / 18.793 | 6.403 / 1.602
Table 4. The performance comparison of various models.

Method | MAE | MSE | RMSE | MAPE
HGSO-ANFIS | 19.975 | 1020.100 | 31.939 | 1.197
LSTM | 2.233 | 160.935 | 12.686 | 1.337
Transformer | 2.466 | 183.095 | 11.964 | 2.303
P-Transformer | 2.089 | 143.137 | 10.820 | 1.799
D-P-Transformer | 1.863 | 97.852 | 9.892 | 1.195
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

