Automatic Events Recognition in Low SNR Microseismic Signals of Coal Mine Based on Wavelet Scattering Transform and SVM

Abstract: Microseismic monitoring, the first step of which is event recognition, provides an effective means of giving early warning of dynamic disasters in coal mines, especially mining water hazards. However, signals with a low signal-to-noise ratio (SNR) usually cannot be recognized effectively by traditional methods. This paper proposes an algorithm that combines the wavelet scattering decomposition (WSD) transform with a support vector machine (SVM) to discriminate events in microseismic signals with a low SNR. Firstly, a signal feature extraction method based on the WSD transform is presented by studying the matrix constructed from the scattering decomposition coefficients. Secondly, an intelligent recognition model for microseismic events is outlined; it calculates the WSD coefficients of the acquired raw vibration signals and assembles them into a feature vector matrix. Finally, a comparative analysis of microseismic events and noise signals in the experiment verifies that the discriminative features of the two can be expressed accurately by wavelet scattering coefficients. The recognition model based on both SVM and WSD not only provides a fast method with a high classification accuracy rate, but also suits the online feature extraction of microseismic monitoring signals. The proposed method improves the efficiency and accuracy of microseismic signal processing for monitoring rock instability and seismicity.


Introduction
Intelligent mining is the only way to achieve the safe and efficient production of coal in mines [1]. As mining depth increases, multifactorial compound disasters, such as mining water hazards, become more frequent under high ground stress and other adverse conditions. At the same time, the rapid development of intelligent mining technology has created a new opportunity for coal geological guarantee technology to be used to avoid more hazards. Coal geological guarantee technology runs through the whole cycle of coal mine production [2,3] and plays an important role in water disaster prevention and intelligent mining, especially in the exploration and treatment of hidden disaster-causing geological factors in coal mines [4].
Microseismic monitoring is one of the important technologies adopted for the dynamic monitoring of mine geological information; it can monitor rock rupture in real time and covers a large monitoring range. By using microseismic monitoring to detect and interpret the interior of the working face, the damage to the roof and floor of the working face can be monitored online, and the whole process of a water-conducting channel, from gestation and development to final instability, can be described. The theory of the disturbed stress distribution of the surrounding rock is used to reveal how the stress field changes, which plays a key role in water prevention and control and in reducing water hazards and many other coal mining hazards [5,6].
A single microseismic event is very short, lasting tens of milliseconds. The highly accurate recognition of such an event requires careful discrimination between microseismic events and noise [7]. However, because of mining-induced disturbance and the growing volume of monitoring data, the traditional methods of collecting and processing microseismic monitoring data are slow and have a low level of accuracy. Microseismic events and noise can be discriminated easily by a human analyst, but this is extremely difficult for automatic recognition methods [8]. The monitoring station is usually strongly disturbed by the surrounding noise, and microseismic events are sometimes even submerged in it. Based on the properties of microseismic signals, researchers have proposed a number of discrimination methods in previous studies.
Methods based on a sliding window and a threshold value are considered the traditional event recognition algorithms. Commonly used examples are the STA/LTA (short-term average to long-term average ratio) algorithm [9][10][11], multi-window techniques [12], and the modified energy ratio method [13]. The STA/LTA algorithm is extremely fast and is a standard procedure for detecting the first arrival of a seismic phase [7]. However, its noise resistance breaks down when interference signals are active. The AR-AIC algorithm is another method; it fits autoregressive models to the signal in different time windows and evaluates them with the Akaike Information Criterion (AIC). When the AIC value reaches its minimum, a pick of one microseismic event can be declared [14][15][16].
A common shortcoming of these algorithms is their sensitivity to abrupt amplitude changes: noise whose energy is much larger than that of a microseismic event is readily picked as an event. This is even more likely when the noise has a frequency content similar to that of a microseismic event. Recently, some intelligent algorithms have been applied to the recognition of microseismic events within these calculation methods, but the results still show low data-processing efficiency, inconsistent recognition standards, and misjudgment.
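For reference, the STA/LTA ratio named above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited works: the function names `sta_lta` and `first_trigger`, the choice of squared amplitude as the characteristic function, the trailing (not centered) windows, and any window lengths or thresholds are assumptions made for the example.

```python
def sta_lta(signal, n_sta, n_lta):
    """Return the STA/LTA ratio for every sample index i >= n_lta - 1.

    STA is the mean squared amplitude over the last n_sta samples and LTA
    the mean over the last n_lta samples; both windows end at sample i.
    """
    if not 0 < n_sta < n_lta <= len(signal):
        raise ValueError("need 0 < n_sta < n_lta <= len(signal)")
    # Prefix sums of the energy make each window mean an O(1) lookup.
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x * x)
    ratios = []
    for i in range(n_lta - 1, len(signal)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratios.append(sta / lta if lta > 0 else 0.0)
    return ratios


def first_trigger(signal, n_sta, n_lta, threshold):
    """Index of the first sample whose ratio exceeds threshold, or None."""
    for k, ratio in enumerate(sta_lta(signal, n_sta, n_lta)):
        if ratio > threshold:
            return k + n_lta - 1  # shift back to an index into `signal`
    return None
```

Because the ratio responds to any abrupt change in energy, a strong noise burst crosses the threshold just as an event does, which is the amplitude sensitivity discussed above.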
In addition, spectral analyses of the different types of seismic waveforms, such as reflection and refraction tomography, have been adopted to provide more information concerning the source [17,18]. Almost all of these methods rely on Fourier transform theory [19], which uses orthogonal basis functions that are perfectly localized in frequency but have infinite extent in time. The antileakage least-squares spectral analysis method, which regularizes irregularly spaced data series, is an iterative method that estimates the statistically significant spectral peaks in the spectrum [20,21]. Because the frequency content of seismic signals is strongly time-dependent, this may not be an appropriate way to process them. To address this issue, time-frequency transforms, such as the wavelet transform, have been widely used in geophysical data processing and interpretation [22][23][24]. Features in the time-frequency domain have also been applied to the automatic processing of microseismic signals [25]. However, such features change easily under time shifts and can miss signal characteristics, so they are not well suited to the analysis of time-varying non-stationary signals or to the construction of a feature matrix.
The wavelet scattering decomposition (WSD) transform is mainly used to analyze the complexity of a signal sequence. As a fast, widely used, and highly robust algorithm that performs a nonlinear analysis, it makes the analysis of time sequences more practical. For example, Mallat and Bruna [26] enabled the identification of audio signals, handwritten text, and image textures by constructing a wavelet scattering decomposition transformation network. Anden and Mallat [27,28] extracted effective feature information through the wavelet scattering decomposition transformation network from the music genre data set GTZAN and the speech data set TIMIT, achieving good classification results, and applied the same method to the analysis of arrhythmia data in the same year. Based on the properties of wavelet scattering, Wiatowski et al. [29] demonstrated the superiority of this method through rigorous mathematical derivation and generalization, achieving good results in different wavelet frameworks. Wang et al. [30] used a wavelet scattering transformation network to extract the features of synthetic aperture radar images, effectively identifying mobile and fixed targets. Li et al. [31] proposed an algorithm for heart sound signal classification that uses the wavelet scattering transformation network to obtain the signal characteristics, which effectively express the feature information of the signal, and then builds the feature matrix used for support vector machine classification. Recently, artificial intelligence algorithms have been widely used in research on the recognition of microseismic events in order to improve the efficiency and accuracy of microseismic signal processing for the monitoring of rock instability and seismicity [32,33].
Energies 2022, 15, 2326
The support vector machine (SVM) is a powerful classification algorithm that constructs the hyperplane with the largest margin in a multi-dimensional space, separating the cases that belong to each category label [34,35]. The SVM algorithm is explicitly designed to perform binary (two-class) classification and is an influential supervised machine learning algorithm that is widely used in image recognition, text detection, and protein classification. Here, we adapt the SVM algorithm to discriminate microseismic signals into microseismic events and noise with a higher degree of accuracy.
In this study, an intelligent recognition method for microseismic events based on the support vector machine classification algorithm is proposed, and wavelet scattering decomposition transform theory is introduced into the field and used to study the influence of the quality factors based on the characteristics of the collected data. Features are extracted from the microseismic signals by means of the wavelet scattering decomposition transform. Combined with the SVM algorithm, a recognition model suited to low signal-to-noise ratio signals is built. Historical monitoring sample signals were used for experimental verification in order to confirm the effectiveness and real-time performance of this model. Our results suggest that WSD is able to capture the different characteristics of the two classes of signals and that the established WSD-SVM model is able to discriminate microseismic events from noise. Overall, the results indicate that the model computes quickly, which makes it useful for real-time online recognition.
The rest of this paper is organized as follows. In Section 2, the effective microseismic signals classification model we proposed is presented. Then, the results of both testing and genuine signals are presented in Section 3. In Section 4, a comparison with other existing methods is presented and analyzed. Finally, our conclusions are given in Section 5.

Wavelet Scattering Decomposition Theory
Wavelet transformation is an effective tool for time-varying non-stationary signal analysis [36]. Because of its scale variability and multi-resolution property, it can describe both the time and frequency domain characteristics of a signal, so it performs well in local signal analysis [37]. For a signal $f(t)$ in continuous finite time, the wavelet transform is defined as:
$$W_f(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t \tag{1}$$
where $a$ is the scale (frequency) factor, $b$ is the translation (time) factor, $\psi^{*}$ is the complex conjugate of the wavelet $\psi$, and the wavelet moves along $t$. As the above formula shows, the wavelet transformation is not translation invariant. The microseismic signals actually collected are usually heavily disturbed; even if there is no qualitative change overall, local changes will disturb the extracted signal features, thus affecting the analysis and recognition of the signal. Therefore, a signal analysis and feature extraction method with both translation invariance and local deformation stability is needed.
Including a modulus operation yields the operator $|W_m|$, which removes the complex phase of all wavelet coefficients. Convolution with the input signal yields a non-linear wavelet modulus:
$$|W_m| x = \left( x * \phi,\ |x * \psi_j| \right) \tag{2}$$
where $\phi$ is the low-pass filter, so $S_m(x) = x * \phi$ is a locally translation-invariant descriptor of the signal $x$ (the scattering coefficients): it extracts the low-frequency information of the input signal and removes all high-frequency information. $\psi_j$ represents a high-frequency wavelet. High-frequency information is recovered by the modulus transformation $U_j(x) = |x * \psi_j|$, which represents the high-frequency information on scale $j$ and obtains deformation stability through the modulus operation on the nonlinear wavelet transform. Therefore, the low-frequency information (scattering coefficients) and the high-frequency information of the wavelet scattering transformation of order 0 are as follows:
$$S_0 x = x * \phi, \qquad U_1 x = |x * \psi_1| \tag{3}$$
The 0-order high-frequency information $U_1 x$ is used as input for the first-order scattering transformation; this can be denoted as follows:
$$U_2 x = |U_1 x * \psi_2| \tag{4}$$
Then, the first-order scattering coefficients are indicated as follows:
$$S_1 x = U_1 x * \phi = |x * \psi_1| * \phi \tag{5}$$
and so on; repeating the iterative procedure above yields the scattering coefficients of an arbitrary order. For arbitrary $j \ge 1$, the wavelet modulus transformation of the signal can be expressed as follows:
$$U_j x = |U_{j-1} x * \psi_j| = \left| \cdots \left| \, |x * \psi_1| * \psi_2 \right| \cdots * \psi_j \right| \tag{6}$$
As the next-order input, $U_j x$ is low-pass filtered to obtain the order $m$ scattering coefficient:
$$S_m x = U_j x * \phi \tag{7}$$
Applying $|W_{m+1}|$ to $U_j x$, both $S_m x$ and $U_{j+1} x$ can be computed simultaneously. This can be expressed as:
$$|W_{m+1}|\, U_j x = \left( S_m x,\ U_{j+1} x \right) \tag{8}$$
The highest order $l$ of the scattering decomposition can be defined by initializing $U_0 x = x$, with $0 \le m \le l$ and $1 \le j \le n$, and iterating Equations (1)-(8).
Eventually, a feature vector is formed by the scattering coefficients for $0 \le m \le l$:
$$S x = \left( S_0 x,\ S_1 x,\ \ldots,\ S_l x \right) \tag{9}$$
In conclusion, the process of wavelet scattering transformation can be described as an iteration of the wavelet modulus operator $|W_m|$: convolution computes the wavelet modulus transform $U_j x$, repeated $m$ times, and the scattering coefficients $S_m x$ are output after low-pass filtering (Figure 1).
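The cascade described in this section (start from $U_0 x = x$, take the modulus of a wavelet convolution to go one order deeper, and low-pass filter each $U$ to obtain the scattering coefficients) can be sketched with NumPy as follows. This is an illustrative toy rather than the filter bank used in this paper: the Gaussian filter shapes, the centre frequencies, the parameters `rel_bandwidth` and `lowpass_width`, and the function names are all assumptions, and the convolutions are circular because they are computed with the FFT.

```python
import numpy as np


def filter_bank(n, centers, rel_bandwidth=0.25, lowpass_width=0.01):
    """Frequency responses of a low-pass phi and band-pass wavelets psi_j.

    Frequencies are in cycles per sample. Each psi_j is a Gaussian bump
    around a positive centre frequency, so its output is complex-valued.
    """
    freqs = np.fft.fftfreq(n)
    phi = np.exp(-0.5 * (freqs / lowpass_width) ** 2)
    psis = [np.exp(-0.5 * ((freqs - c) / (rel_bandwidth * c)) ** 2)
            for c in centers]
    return phi, psis


def scattering(x, centers=(0.25, 0.125, 0.0625), order=2):
    """Return one row of low-pass filtered coefficients per scattering path."""
    x = np.asarray(x, dtype=float)
    phi, psis = filter_bank(x.size, centers)
    layer = [x]                       # U_0 x = x
    rows = []
    for m in range(order + 1):
        next_layer = []
        for u in layer:
            u_hat = np.fft.fft(u)
            # Scattering coefficients of order m: U * phi
            rows.append(np.fft.ifft(u_hat * phi).real)
            if m < order:
                # Next order: modulus of the wavelet convolution |U * psi_j|
                next_layer.extend(np.abs(np.fft.ifft(u_hat * psi))
                                  for psi in psis)
        layer = next_layer
    return np.vstack(rows)
```

With three wavelets and `order=2` the sketch returns 1 + 3 + 9 = 13 rows. For simplicity it follows every path, whereas practical implementations usually keep only paths whose centre frequency decreases from one order to the next; a feature vector would then be obtained by subsampling or averaging each row over time.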

Support Vector Machine Theory
In recent years, support vector machine (SVM) theory, one of the small-sample algorithms based on supervised machine learning and mainly adopted in image identification, text detection, and other fields, has shown great advantages in solving nonlinear, high-dimensional, and small-sample pattern discrimination problems and is becoming an effective classification algorithm. Usually, SVM employs an iterative training algorithm in which an optimal hyperplane with the maximum margin in multi-dimensional space is constructed by minimizing an error function, as seen in Figure 2 [32,33]. In our case, deciding from the extracted features whether a vibration signal is a microseismic event or not is defined as a binary, nonlinear classification problem, which is an extremely significant step in the proposed algorithm. In this study, given training vectors $x_j \in \mathbb{R}^d$, $j = 1, \ldots, N$, in two classes and a label vector $y$ distinguishing microseismic events (defined as M) from noise (defined as N), the model solves the quadratic optimization problem:
$$\min_{\beta,\, b,\, \xi} \ \frac{1}{2} \beta^{T} \beta + C \sum_{j=1}^{N} \xi_j$$
which is subject to the constraints:
$$y_j \left( \beta^{T} \phi(x_j) + b \right) \ge 1 - \xi_j, \qquad \xi_j \ge 0, \qquad j = 1, \ldots, N$$
where $\beta$ is the normal vector to the hyperplane and $b$ is a constant. To avoid overfitting, the penalty parameter $C$ is imposed on the training error. Note that $\xi_j$ is the smallest non-negative number satisfying $y_j \left( \beta^{T} \phi(x_j) + b \right) \ge 1 - \xi_j$. With the mapping $\phi$ converting the input data into the feature space, the kernel function $G(x_1, x_2) = \langle \phi(x_1), \phi(x_2) \rangle$ is the dot product of the input data after they are mapped into the higher-dimensional feature space by the transformation $\phi$.
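To make the role of the slack variable and the penalty parameter C concrete, the short Python sketch below evaluates the soft-margin objective for a given hyperplane. It is an illustration under stated assumptions, not the solver used in this study: the feature map is taken to be the identity (a linear kernel), the labels are encoded as +1 for M and -1 for N, and the names `slack` and `primal_objective` are hypothetical.

```python
def slack(beta, b, x, y):
    """Smallest non-negative xi satisfying y * (beta . x + b) >= 1 - xi."""
    margin = y * (sum(w * v for w, v in zip(beta, x)) + b)
    return max(0.0, 1.0 - margin)


def primal_objective(beta, b, samples, labels, C):
    """Soft-margin objective 0.5 * ||beta||^2 + C * sum of slacks.

    Assumes a linear feature map, i.e. phi(x) = x.
    """
    regularizer = 0.5 * sum(w * w for w in beta)
    penalty = sum(slack(beta, b, x, y) for x, y in zip(samples, labels))
    return regularizer + C * penalty
```

A sample that lies on the correct side of the margin contributes zero slack, so only margin violations are penalized; a larger C penalizes them more heavily, while a smaller C tolerates more violations in exchange for a wider margin.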

Data Preparation and Model Training
Firstly, we selected equal numbers of microseismic event and noise sequences, calculated the wavelet scattering coefficients of both kinds of signals, and extracted the feature vector of each one to form the feature matrix, called the training set. Using an SVM software package, the classifier used to classify the testing signal samples was built by training on the set of selected sample signals. The workflow for establishing the intelligent recognition model, including the iterative training and optimization of the predictive model, is shown in Figure 3.

To ensure that the classification model is able to differentiate microseismic events from noise in a low SNR environment, appropriate historical samples were selected to compose a strong data set. Because of the complexity of the microseismic monitoring environment in coal mines, the selected historical samples for training needed to meet the following criteria:
1. Samples selected for training the model should be microseismic vibration signals with a clear waveform and an obvious jump.
2. An equal number of noise samples that are easily mistaken for microseismic events should be selected in order to describe the noise features precisely.
The intelligent recognition algorithm for microseismic signals based on WSD and SVM is as follows:
Input: the data set S, the time invariance scale, the transform times, and the quality factor.
Output: the classification results.
Step 1: Sample selection. In accordance with the above-mentioned criteria, the input data set S for training is made up of n (n ≥ 50) samples, with the two types of signals in equal proportion (50% each).
Step 2: Feature extraction. The feature matrix of S is obtained by calculating the scattering coefficients for a given time invariance scale, number of transform times, and quality factor.
Step 3: Cross validation. The k-fold cross validation method can be used to avoid over-fitting, evaluate classifier performance, and estimate the error rate or loss. Taking computational efficiency into consideration, k in this study is 5.
Step 4: SVM classification. In this step, we fit a one-vs-one SVM to the training data only and then use the trained model to make predictions for the 30% of the data withheld for testing.
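The data handling in Steps 3 and 4 can be sketched in plain Python as below. This is a hedged illustration, not the software package used in this study: the function names `stratified_split` and `k_fold`, the fixed random seed, and the choice to run the 5-fold cross validation on the training portion only are assumptions. With 54 samples per class, 70% is 37.8 samples, so the sketch rounds to 38; how the split was rounded in the experiments is not stated.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), drawing train_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, validation_idx) pairs; each index is validated once."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]
```

In use, a classifier would be fitted on each `fit_idx` and scored on the matching `validation_idx` to estimate the error rate, then refitted on the whole training portion and evaluated once on the withheld test indices.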
Large amounts of continuous microseismic signals are collected by stations and geophones working in environments with a high level of noise. Discriminating the microseismic events contained in those signals with precision is a formidable task for previous methods, and events submerged in the noise cannot contribute to source location and other processes. The WSD-SVM model is constructed to process the monitoring vibration signals obtained from the stations so that microseismic events can be recognized more accurately.

Testing Results
The data samples for the experiment were obtained from the KJ959 microseismic monitoring system, which has a sampling frequency of 1 kHz, a standard widely adopted in the prediction and prevention of water inrush hazards in coal mines. These samples provided an effective series of microseismic vibration signals. In addition, single-component detection sensors with a frequency response range from 10 Hz to 1 kHz were chosen as the geophones. A total of 108 raw signals, regarded as data set S, were collected using automatic pick-up technology. For ease of analysis, the 108 signals were cut into segments of equal length, each consisting of 7000 sampling points (7 s at 1 kHz). The data set S was split into S1, containing 54 microseismic events with an obvious jump, and S2, comprising 54 noise signals. After that, they were labeled M (for microseismic events) and N (for noise). For convenience, one segment from S1 and another from S2 were picked for analysis; they are shown in Figure 4.
With the time invariance scale i = 6, the transform times t = 3, and the quality factors q = 3, 2, 1, the scattering coefficients calculated for the two signals are shown in Figure 5, which shows the distinctive differences between events and noise. The features of all the signal segments in data set S can be expressed by the feature matrix consisting of the scattering coefficients obtained using the proposed method.
To achieve better performance in the defined WSD-SVM model, 70 percent of the data in each class were randomly assigned to the training set STr, which was used to train the SVM classifier. Meanwhile, the remaining 30 percent was withheld for testing and assigned to the test set STe. The performance of a supervised machine learning algorithm is known to depend largely on the training percentage of the data set. The above process was therefore repeated with different training percentages, and the corresponding classification accuracy rates were calculated, as shown in Figure 6.
Microseismic events and noise can be classified effectively by the WSD-SVM algorithm, as shown in Figure 6. As the ten experiments illustrate, the recognition accuracy rate increased as the training percentage became larger, reaching 99.6% in five experiments.

Application in Genuine Signals
To verify the validity of the above algorithm, a continuous microseismic signal with a duration of 56 s was selected for the experiment. The data were obtained using the monitoring equipment installed in a coal mine in northwestern China; the results are shown in Figure 7.
Because the monitoring station is disturbed by ambient noise, the signal segment in Figure 7 has a low SNR. There are 8 microseismic events in total in the sequence. According to the software, 3 of these (E2, E5, E7) have a clear waveform and can be verified directly, and the other 5 (E1, E3, E4, E6, E8) are covered by the noise. Both the STA/LTA method and our trained model were applied to detect all 8 events. The number of events detected and the corresponding time consumption of the two methods are recorded in Table 1.
Only four microseismic events (E1, E4, E5, E7) could be recognized by the STA/LTA method with a lower threshold, while our proposed algorithm recognized all eight events effectively. In terms of time consumption, the WSD-SVM model took 2.488 s to recognize all eight events, excluding the training time, which is somewhat slow. Nevertheless, the proposed method was able to recognize low SNR microseismic events accurately with little sacrifice in calculation time.


The Recognition Ability of WSD-SVM
To further confirm the generality of the proposed method, ten microseismic signals were acquired from four different monitoring stations as experimental sample data and were analyzed using both the STA/LTA method and the method proposed in this paper. A time-frequency analysis of the ten signals with professional software identified 28 events. The number of microseismic events successfully recognized is presented in Table 2. For the 28 microseismic events in the selected samples, the recognition accuracy rate of the proposed method was 92.86%, which is better than that of the STA/LTA method.

The Influence of the Transform Times on the Classification Results
Whenever the WSD-SVM algorithm is used to recognize microseismic events, selecting the appropriate number of transform times is a critical step that determines the level of classification accuracy. To work out the influence of the transform times on the classification results, the 54 event samples and 54 noise samples from Section 3.1 were selected, and the WSD-SVM algorithm was run with i = 6 and q = 1, 2, 3, 4, 5. As can be seen from Table 3, when the number of transform times is 1 or 2, classification takes less time but is less accurate. However, when it is greater than 4, the accuracy is higher but the time consumption is extremely high because of the complexity of the calculation. A small number of transform times is not adequate to express the complexity of the samples, though it takes less time, whereas an excessive number introduces a large time consumption. Therefore, according to these results, the most acceptable number of transform times is 3.

Conclusions
To overcome the noise problems in microseismic monitoring data, a novel intelligent recognition method for microseismic signals with a low SNR was proposed, which combines a support vector machine classifier with feature extraction by the wavelet scattering decomposition transform. Although the selected signals are expected to be processed further by the algorithm in this paper, the validity of the WSD-SVM model has already been demonstrated by its accurate discrimination of genuine microseismic events from noise. In addition, the scattering coefficients of each signal are shown to be useful as features for training the discriminative model. The recognition accuracy rate of the model on the experimental samples reached 92.86%, showing that the model could be applied to recognize the microseismic events in the monitoring area. Using a smaller feature matrix together with an effective feature extraction method is a future direction for microseismic event classification.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.