Microseismic Signal Denoising and Separation Based on Fully Convolutional Encoder–Decoder Network

Denoising is a highly desired component of signal processing: it separates the signal of interest from noise and thereby improves subsequent signal analyses. In this paper, an advanced denoising method based on a fully convolutional encoder–decoder neural network is proposed. The method simultaneously learns sparse features in the time–frequency domain and the mask-related mapping function for signal separation. The results show that the proposed method performs well in denoising microseismic signals containing various types and intensities of noise. Furthermore, the method works well even when the microseismic signals and the noise share a similar frequency band. Compared to existing methods, the proposed method significantly improves the signal-to-noise ratio while introducing only minor changes to the microseismic signal (less distortion in the waveform). Additionally, the proposed method preserves the shape and amplitude characteristics, which allows better recovery of the real waveform. This method is useful for the automatic processing of microseismic signals. Further, it has the potential to be extended to exploration seismology and earthquake studies.


Introduction
During monitoring and data acquisition, microseismic signals are often corrupted by various types of noise owing to uncontrollable sources and complicated environmental conditions. Such noise may be electrical, construction, mechanical, or traffic noise. Spectral filtering is commonly used to improve the signal-to-noise ratio (SNR) of the microseismic signal. However, it cannot effectively suppress noise that shares a frequency band with the microseismic signal. Moreover, it can distort the signal [1] and/or generate artifacts before impulsive arrivals [2].
In order to alleviate this limitation, many methods have been proposed to suppress noise in seismic/microseismic data, including the Short Time Fourier Transform (STFT) [3], the Continuous Wavelet Transform (CWT) [4,5], the S-transform [6], the Radon Transform [7–9], the Wave-Packet Transform (WPT) [10,11], Empirical Mode Decomposition (EMD) [12–14], fuzzy methods [15], singular spectrum analysis [16], sparse transform-based denoising [17], a mathematical morphology-based denoising approach [18], and the non-local means (NLM) algorithm [19]. Further, some hybrid methods have been proposed that combine the advantages of two or more denoising methods [20]. Signal denoising performance can be improved in two ways: a more effective sparse representation of the data and a more flexible and powerful mapping function. Although the methods mentioned above can suppress noise components in the data, it remains difficult to choose the optimal mapping function between the noisy signal and the estimated signal.
Recently, deep learning methods have developed rapidly and can overcome the drawbacks of existing signal denoising methods. A deep neural network can learn extremely complicated mapping functions through training and has proved to be a powerful tool for signal processing [21–25]. Inspired by the success of encoder-decoder networks in image/signal processing, a deep fully convolutional encoder-decoder network (CNN-denoiser) is adopted for microseismic signal denoising. Given a noisy input signal, the CNN-denoiser produces two masks (sparse representations): one for estimating the signal and the other for estimating the noise. Microseismic data recorded in the Micang Mountain tunnel in China are used for network training, validation, and testing. The method has also been tested with noisy signals recorded in other projects, and semi-synthetic signals (i.e., generated by superimposing microseismic signals and real noise) are used to evaluate the proposed method in comparison with existing methods.

Methods
The noisy signal, defined as $NS(t, f)$, represents the superposition of the real microseismic signal $MS(t, f)$ and the noise $N(t, f)$, which includes instrumental or unknown noise, in the time-frequency domain:

$$NS(t, f) = MS(t, f) + N(t, f)$$

where $t$ and $f$ represent the sampling point and the frequency bin, respectively. The purpose of denoising is to minimize the expected error between the actual microseismic signal $MS(t, f)$ and the estimated microseismic signal $\widehat{MS}(t, f)$:

$$\min \frac{1}{n} \sum_{i=1}^{n} \left\| MS_i(t, f) - \widehat{MS}_i(t, f) \right\|, \qquad \widehat{MS}(t, f) = f(t, f) \cdot NS(t, f)$$

where $f(t, f)$ is a function that maps $NS(t, f)$ to $MS(t, f)$, and $n$ is the number of samples. In this paper, the error minimization problem is solved in a supervised learning manner, in which a deep neural network learns to extract a sparse representation of the noisy input $NS(t, f)$ and to map it to a clean microseismic signal. The same approach is used to estimate the noise, i.e., $\widehat{N}(t, f) = f(t, f) \cdot NS(t, f)$. The mapping functions $f(t, f)$ are defined by two individual masks, modified from Zhu et al. [25]: $M_{MS}(t, f)$ for estimating the microseismic signal and $M_N(t, f)$ for estimating the noise.
Both masks have the same size as the noisy signal $NS(t, f)$, and they are merged to form the targets for optimizing the neural network during training. The two masks contain values between 0 and 1 that attenuate the noisy signal.
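The action of the two masks on the time-frequency coefficients can be illustrated with a minimal Python sketch. It assumes that the two masks are complementary, i.e., that they sum to one in every time-frequency bin, as a two-channel softmax output would; the function name `apply_masks` and the toy coefficient values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def apply_masks(
    noisy: List[complex], signal_mask: List[float]
) -> Tuple[List[complex], List[complex]]:
    """Split noisy time-frequency coefficients into signal and noise estimates.

    Assumption (not stated in the paper): the noise mask is the complement
    of the signal mask, so the two estimates add back up to the noisy input.
    """
    est_signal, est_noise = [], []
    for coeff, m in zip(noisy, signal_mask):
        if not 0.0 <= m <= 1.0:
            raise ValueError("mask values must lie in [0, 1]")
        # A real-valued mask scales the real and imaginary parts alike.
        est_signal.append(m * coeff)
        est_noise.append((1.0 - m) * coeff)
    return est_signal, est_noise


# Toy example: three complex coefficients and a hypothetical signal mask.
noisy = [1.0 + 2.0j, -0.5 + 0.25j, 3.0 - 1.0j]
signal_mask = [0.9, 0.1, 0.5]
est_signal, est_noise = apply_masks(noisy, signal_mask)
```

Under this assumption, a mask value close to 1 passes a coefficient almost unchanged into the signal estimate, whereas a value close to 0 assigns it to the noise estimate.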
The real and imaginary parts of the time-frequency coefficients of the noisy signals (see Figure 1) form the input, which is first reshaped from (129, 236, 2) to (256, 256, 2) by zero-padding for the subsequent encoding process. This is adapted from the existing method [25] to avoid losing noisy-signal information during encoding and decoding. Without reshaping, the feature space would be shrunk by the repeated downsampling operations (from 129 × 236 to 64 × 118, then to 32 × 59, 16 × 29, 8 × 14, 4 × 7, 2 × 3, and finally 1 × 1), which causes a loss of signal information so that the features cannot be reconstructed entirely in the decoding process. The reshaped input is then transformed through a series of encoding operators, each consisting of a convolution, a ReLU (rectified linear unit) activation, and a batch normalization layer [26]. A stride of 2 is applied after every two successive layers to shrink the feature space gradually and improve computational efficiency during encoding. For the convolution, a larger kernel has a wider receptive field and therefore captures more features of neighboring signals. However, a large convolution kernel dramatically increases the computational time, which limits the depth of the neural network. Therefore, the kernel size of the convolution/deconvolution layers is set to 3 throughout this work. The decoding operators generate the masks $M_{MS}(t, f)$ and $M_N(t, f)$ in the decoding process. The corresponding feature maps in the encoding and decoding processes are concatenated through skip connections, which improve the convergence of training and the reconstruction of signal information [27]. In the penultimate layer of the denoising network, a softmax activation function is used to produce the masks.
In the last layer, the masks are reshaped into (129, 236, 2), where each channel represents the time-frequency coefficients of the microseismic signal and the noise, respectively.
One of the advantages of the presented method is that, instead of manually defining features and thresholds to improve the SNR, it automatically learns richer features from the semi-synthetic noisy signals to obtain the estimated signal and noise in the time-frequency domain. Thus, deep learning has great potential to provide more efficient and accurate signal denoising, which makes it applicable to other challenging tasks such as signal detection and onset time picking.
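The effect of the zero-padding step described in this section can be checked with a short Python sketch. It assumes that every stride-2 step halves each dimension with floor rounding, which reproduces the shrinking sequence quoted in the text; the number of steps (seven) mirrors that sequence and is not a statement about the actual depth of the network. The helper names are hypothetical.

```python
from typing import List, Tuple


def downsample_sizes(height: int, width: int, n_steps: int) -> List[Tuple[int, int]]:
    """Feature-map sizes after repeated stride-2 downsampling.

    Assumption: each step floors the size to half, with a minimum of 1.
    """
    sizes = [(height, width)]
    for _ in range(n_steps):
        height, width = max(height // 2, 1), max(width // 2, 1)
        sizes.append((height, width))
    return sizes


def padding_needed(shape: Tuple[int, int], target: Tuple[int, int]) -> Tuple[int, int]:
    """Number of zero rows and columns to add to reach the target shape."""
    return tuple(t - s for s, t in zip(shape, target))


# Unpadded input: odd sizes lose a row or column at several steps.
unpadded = downsample_sizes(129, 236, 7)
# Padded input: a power of two halves exactly at every step.
padded = downsample_sizes(256, 256, 7)
pad_rows, pad_cols = padding_needed((129, 236), (256, 256))
```

With the padded 256 × 256 input, every halving is exact, so the decoder can double the sizes back without a mismatch against the encoder feature maps that are concatenated through the skip connections.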

Denoising workflow (figure caption): (1) The noisy signal is transformed from the time domain into the time-frequency domain using the Short Time Fourier Transform (STFT). (2) The real and imaginary parts of the time-frequency coefficients of the noisy signal are input into the neural network. (3) The two masks produced by the network are applied to the real and imaginary parts of the time-frequency coefficients of the input noisy signal to estimate the time-frequency coefficients of the microseismic signal and the noise, respectively. (4) The estimated signal and noise in the time domain are obtained via the inverse STFT.
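As a worked example of step 1, the following Python sketch computes the size of the time-frequency array obtained from one record of 1.5 s sampled at 20 kHz. The window length (256 samples) and hop (128 samples) are assumptions chosen because they reproduce the (129, 236) size quoted in the Methods; the paper does not state its STFT parameters. The padding convention (half a window of zeros at each end, tail padded to a whole frame) is likewise an assumption.

```python
from typing import Tuple


def stft_shape(n_samples: int, window: int, hop: int) -> Tuple[int, int]:
    """Return (frequency bins, time frames) of an STFT of a real-valued signal.

    Assumptions: half a window of zeros is added at each end, and the tail
    is zero-padded so that the last frame is complete.
    """
    freq_bins = window // 2 + 1          # one-sided spectrum of a real signal
    padded = n_samples + 2 * (window // 2)
    remainder = (padded - window) % hop
    if remainder:
        padded += hop - remainder        # complete the final frame
    frames = (padded - window) // hop + 1
    return freq_bins, frames


n_samples = int(20_000 * 1.5)            # 20 kHz sampling, 1.5 s window
shape = stft_shape(n_samples, window=256, hop=128)
```

Other combinations of window and hop could give the same array size, so the values above should be read only as one consistent choice.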


Data Preparation and Network Training
In this paper, 7500 microseismic signals with high SNRs and 15,000 noise samples were selected to form the dataset, which was randomly split into training (80%), validation (10%), and test (10%) datasets. The validation set is used to tune the hyperparameters and to prevent over-fitting so that the network achieves its best results, and the test set is primarily used to evaluate network performance. The amplitude of the recorded signals is a voltage value, and the response frequency ranges from 50 Hz to 5 kHz. The data acquisition station, located in the Micang Mountain tunnel in China, worked at a sampling frequency of 20 kHz with a sampling window of 1.5 s, so that every signal has 30,000 sampling points. The noisy input signals for training were generated from the training dataset with different SNRs by superimposing the selected microseismic signals on randomly selected noise samples at each iteration. The network was trained on an NVIDIA GTX 1060 GPU with the Adam optimizer and a learning rate of 0.001. Moreover, noisy signals recorded in the Zijing tunnel (China) by a microseismic monitoring system with the same parameters as in the Micang Mountain tunnel were applied to the trained model to validate its versatility.
Figure 3 shows the denoised results of some noisy signals selected from the test dataset, obtained by applying the two output masks to these noisy signals. The method successfully separates noisy signals with different characteristics into an estimated signal and estimated noise. The CNN-denoiser has an excellent denoising performance for microseismic signals with various types and intensities of noise. The noisy signals in Figure 3a,b are microseismic signals with cyclic noise at different frequencies, and Figure 3c shows microseismic signals with a mixture of cyclic and other noise. For the most part, the cyclic noise changes over time, and its frequency band overlaps with that of the microseismic signals, which is a challenging issue for existing denoising methods. These types of noisy signals cover all the noisy signal samples encountered in the field. The CNN-denoiser effectively suppresses non-Gaussian noise (including cyclic noise, unknown noise, and their mixture) in microseismic signals. To further demonstrate the applicability of the proposed method to other noise (such as Gaussian noise), the CNN-denoiser was tested on noisy signals formed from a clean microseismic signal and Gaussian noise (Figure 3d). The results show that the noise is significantly reduced regardless of its type. Further, the leakage of the estimated signal is minimal, and the shape and amplitude characteristics of the estimated signal are well preserved. These characteristics also apply to the estimated noise, even Gaussian noise.
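The generation of semi-synthetic training inputs described in this section can be sketched as follows. The sketch assumes that the SNR is the peak-amplitude ratio expressed in decibels, 20·log10(S_peak/N_peak), which is a common convention but is not spelled out in the text; the function names, the synthetic waveform, and the target SNR of 5 dB are hypothetical.

```python
import math
import random
from typing import List, Tuple


def peak(x: List[float]) -> float:
    """Largest absolute amplitude of a waveform."""
    return max(abs(v) for v in x)


def mix_at_snr(
    signal: List[float], noise: List[float], target_snr_db: float
) -> Tuple[List[float], List[float]]:
    """Scale the noise to a target peak-amplitude SNR and add it to the signal.

    Assumption: SNR (dB) = 20 * log10(peak(signal) / peak(noise)).
    """
    gain = peak(signal) / (peak(noise) * 10 ** (target_snr_db / 20.0))
    scaled_noise = [gain * v for v in noise]
    noisy = [s + n for s, n in zip(signal, scaled_noise)]
    return noisy, scaled_noise


random.seed(0)
# Stand-ins for a clean microseismic waveform and a recorded noise sample.
signal = [math.exp(-0.01 * i) * math.sin(0.3 * i) for i in range(500)]
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
noisy, scaled_noise = mix_at_snr(signal, noise, target_snr_db=5.0)
```

Drawing a new noise sample and a new target SNR at each training iteration, as described above, gives the network a different noisy input for the same clean signal every time.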

Test Results
Appl. Sci. 2020, 10, x FOR PEER REVIEW
The proposed model not only learns the features of microseismic signals but also estimates noise. To validate this, the CNN-denoiser was tested on 12,000 real noise samples. Figure 4 shows the robustness of the neural network to some periodic non-microseismic signals (with a frequency band varying with time). No estimated signals are predicted, and the estimated noise is almost equivalent to the input noise sample. This indicates that the CNN-denoiser can automatically produce a mask that adapts to the noise characteristics. The mask reflects the changes in frequency characteristics over time, as well as the frequency band of the noise. Additionally, the distribution of the max amplitude difference between the real and estimated noise samples is shown in Figure 5. For more than 50% of the noise samples, the max amplitude difference is less than 0.05, which means the method causes only minor waveform distortion.
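The amplitude statistic reported above can be illustrated with a small Python sketch. It interprets the "max amplitude difference" as the absolute difference between the peak amplitudes of the real and estimated noise, which is one plausible reading and an assumption here; the function names and example values are hypothetical.

```python
from typing import List


def max_amplitude_difference(reference: List[float], estimate: List[float]) -> float:
    """Absolute difference between the peak amplitudes of two waveforms.

    Assumption: "max amplitude" means the largest absolute sample value.
    """
    ref_peak = max(abs(v) for v in reference)
    est_peak = max(abs(v) for v in estimate)
    return abs(ref_peak - est_peak)


def fraction_below(differences: List[float], limit: float) -> float:
    """Share of samples whose difference is strictly below the limit."""
    return sum(d < limit for d in differences) / len(differences)


# Toy check: an estimate whose peak is slightly attenuated.
difference = max_amplitude_difference([0.0, 1.0, -0.5], [0.0, 0.75, -0.5])
share = fraction_below([0.01, 0.04, 0.2, 0.5], limit=0.05)
```

Applied to all noise samples, `fraction_below` would give the kind of summary quoted above (the share of samples with a difference under 0.05).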


Application in Real Field
To validate the versatility, the CNN-denoiser was also applied to 1510 noisy signals recorded in the Zijing tunnel. These noisy signals include various types and intensities of noise, with SNRs ranging from 0 to 15 dB (Figure 6). The SNR [28] is calculated from $S_{A\max}$ and $N_{A\max}$, the peak amplitudes of the signal and the noise, respectively. As shown in Figure 6, the proposed method provides an excellent denoising performance for microseismic signals with various noise in terms of SNR improvement and the recovery of shape and amplitude characteristics, even when one noisy signal contains more than one microseismic waveform (Figure 6d). The average SNR of the 1510 noisy signals is increased by 8.48 dB after denoising, with a maximum improvement of 36.21 dB (Figure 7). In addition, the shape and amplitude characteristics of the estimated noise are well preserved. Although the CNN-denoiser is trained on semi-synthetic data, it extends well to real noisy signals. This suggests that the proposed method can be directly applied to denoising tasks in actual engineering.
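The SNR computation from peak amplitudes can be sketched in Python as follows. The formula used here, 20·log10 of the ratio of the peak amplitudes, is the usual amplitude-ratio definition in decibels and is an assumption, since the equation itself is not reproduced in this text; the function name and the example values are hypothetical.

```python
import math
from typing import List


def snr_db(signal: List[float], noise: List[float]) -> float:
    """Peak-amplitude SNR in decibels.

    Assumption: SNR = 20 * log10(S_Amax / N_Amax), where S_Amax and N_Amax
    are the largest absolute amplitudes of the signal and the noise.
    """
    s_peak = max(abs(v) for v in signal)
    n_peak = max(abs(v) for v in noise)
    return 20.0 * math.log10(s_peak / n_peak)


# Hypothetical example: the noise peak drops from 1.0 to 0.1 after denoising.
before = snr_db([0.0, 10.0, -3.0], [1.0, -0.5])
after = snr_db([0.0, 10.0, -3.0], [0.1, -0.05])
improvement = after - before
```

Under this definition, a tenfold reduction of the noise peak at constant signal peak corresponds to an SNR improvement of 20 dB.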


Comparison with Other Existing Methods
To further demonstrate the denoising performance, the proposed method is compared with existing methods on a benchmark constructed by combining one clean microseismic waveform with one noise sample. As shown in Figure 8, the amplitude of the noise is varied to obtain different SNRs. The proposed CNN-denoiser is compared with a high-pass filter and a wavelet threshold (WT) filter in terms of denoising performance. The high-pass filter was designed based on the frequency distribution of the clean microseismic signal and the noise to achieve its best denoising performance. For the WT filter, the Sym8 wavelet basis was selected, the seven layers of decomposition and the domain value were fixed, and soft thresholding was used for microseismic signal denoising. Four measures were employed to compare the methods: the improvement in SNR between the noisy signals and the estimated microseismic signals, the correlation coefficient, the change in maximum amplitude, and the error of onset time picking. The Short Term Averaging/Long Term Averaging (STA/LTA) method was used for onset time picking, and its threshold was set to maximize the picking accuracy.
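The soft thresholding used in the WT filter can be sketched as follows. The snippet implements the standard soft-threshold rule, which shrinks every coefficient towards zero by the threshold and sets the small ones to zero; the wavelet decomposition itself (Sym8, seven levels) is not shown, and the threshold value and coefficients below are hypothetical.

```python
from typing import List


def soft_threshold(coeffs: List[float], threshold: float) -> List[float]:
    """Apply the soft-threshold rule sign(c) * max(|c| - threshold, 0)."""
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)           # small coefficients are treated as noise
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Hypothetical wavelet coefficients and threshold.
denoised_coeffs = soft_threshold([3.0, -0.5, -2.0, 0.2], threshold=1.0)
```

Because every surviving coefficient is reduced by the threshold, soft thresholding also lowers the amplitude of the retained signal, which is one possible reason for amplitude changes after wavelet denoising.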
Compared with the high-pass filter and the WT filter, the proposed CNN-denoiser achieves larger SNR improvements, with a maximum of 64.85 dB (Figure 9a). The high correlation coefficients of the estimated signals indicate that the WT filter and the CNN-denoiser cause smaller waveform distortion during denoising, with the latter performing better than the former. In contrast, the high-pass filter introduces high waveform distortion, so that its maximum correlation coefficient only reaches 0.86 (Figure 9b). The CNN-denoiser provides the highest correlation coefficient, especially for low SNRs. Figure 9c shows that the absolute values of the max amplitude changes of the signals estimated by the high-pass filter and the CNN-denoiser are similar when the SNR is less than 10 dB. However, the CNN-denoiser is superior to the high-pass filter in terms of the max amplitude changes for noisy signals with high SNRs, indicating that the max amplitude of its estimated signal is closer to that of the clean signal in Figure 8. Moreover, the max amplitude change caused by the WT filter is the largest in the test. The STA/LTA method is applied to pick the onset time of the noisy signal and of the signals denoised by the high-pass filter, the WT filter, and the CNN-denoiser. As shown in Figure 9d, the accuracy of onset time picking is significantly improved by the high-pass filter, the WT filter, and the CNN-denoiser, and the CNN-denoiser outperforms the others for low SNRs. This represents an improvement over applying the STA/LTA method directly to the noisy signal.
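The STA/LTA onset picking used in this comparison can be sketched in Python as follows. This is a minimal textbook-style variant based on the squared amplitude, with the long-term window placed immediately before the short-term window; the window lengths, the threshold, and the synthetic trace are hypothetical and do not reproduce the settings used in the paper.

```python
from typing import List, Optional


def sta_lta_pick(
    trace: List[float], n_sta: int, n_lta: int, threshold: float
) -> Optional[int]:
    """Return the first sample index where STA/LTA exceeds the threshold.

    STA is the mean energy of the n_sta samples ending at the index;
    LTA is the mean energy of the n_lta samples preceding that window.
    Returns None if the threshold is never exceeded.
    """
    prefix = [0.0]                       # prefix sums of the squared amplitude
    for v in trace:
        prefix.append(prefix[-1] + v * v)
    for end in range(n_lta + n_sta, len(trace) + 1):
        sta = (prefix[end] - prefix[end - n_sta]) / n_sta
        lta = (prefix[end - n_sta] - prefix[end - n_sta - n_lta]) / n_lta
        if lta > 0 and sta / lta > threshold:
            return end - 1
    return None


# Synthetic trace: low-amplitude background, then an arrival at sample 200.
trace = [0.1 * (-1) ** k for k in range(200)] + [1.0 * (-1) ** k for k in range(100)]
onset = sta_lta_pick(trace, n_sta=5, n_lta=50, threshold=4.0)
```

When the background noise is strong, the ratio rises less sharply at the arrival, which is why denoising before picking can reduce the onset time error.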
Although semi-synthetic data are used for training the network, the CNN-denoiser is robust and maintains high accuracy in microseismic signal denoising. However, for new complex noise or microseismic signal samples, the method may not achieve the current performance, which needs validation in further research. The current estimation of the microseismic signal is based on the output masks, and thus non-mask prediction will be a direction of future research [29]. Increasing the amount of training data is expected to further improve the separation of microseismic signal and noise, which will be a goal of future research. Further, the combination of traditional methods and the neural network will be explored.
Figure 9. Performance comparison between the high-pass filter, the wavelet threshold (WT) filter, and the CNN-denoiser. (a–c) Improvement of SNRs, correlation coefficient, and max amplitude changes for the high-pass filter, WT filter, and CNN-denoiser. (d) The error of onset time picking by Short Term Averaging/Long Term Averaging (STA/LTA), applied to the noisy signal and to the signals denoised by the high-pass filter, WT filter, and CNN-denoiser. Values in (b) are calculated as the Pearson product-moment correlation coefficient [30] between the estimated signals (the noisy signal and the signals denoised by the high-pass filter, WT filter, and CNN-denoiser) and the clean signal in Figure 8. Values in (c) and (d) are differences between the estimated signals (the noisy signal and the signals denoised by the high-pass filter, WT filter, and CNN-denoiser) and the clean signal in Figure 8.
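The correlation values in panel (b) can be reproduced in principle with the Pearson product-moment correlation coefficient, sketched below in plain Python. The function name and the example waveforms are hypothetical; the sketch assumes two waveforms of equal length with non-zero variance.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation coefficient of two waveforms."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical clean waveform and a scaled, slightly distorted estimate.
clean = [0.0, 1.0, 0.0, -1.0, 0.0, 0.5, 0.0, -0.5]
estimate = [0.05, 0.9, 0.0, -0.95, 0.02, 0.45, -0.01, -0.5]
r = pearson(clean, estimate)
```

The coefficient is insensitive to a constant scaling of the estimate, so it measures waveform shape rather than amplitude, which is why the max amplitude change is reported as a separate measure.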


Conclusions
In this paper, an advanced processing method for microseismic signals (CNN-denoiser) based on a deep neural network is proposed. The proposed method performs well in denoising microseismic signals corrupted with various types and intensities of noise, even when the frequency bands of the microseismic signal and the noise overlap. The signal and noise components in the data are appropriately recognized and separated, even if the signal is heavily polluted by noise. The results show that the proposed method significantly improves the SNRs with minor changes to the microseismic signal and preserves the shape and amplitude characteristics well. Moreover, the generalization ability was validated on a dataset outside the training dataset. Compared with existing methods, the CNN-denoiser significantly improves the SNRs and introduces less distortion in the waveform, which allows better recovery of the real waveform. Although the motivation of this study is the need for accurate and automated microseismic signal processing, the proposed method can be applied to signal analysis and disaster assessment in geophysical and geotechnical engineering fields, such as hydraulic fracturing, the mining industry, shale-gas exploitation, and earthquake management.