Structure-Preserving Random Noise Attenuation Method for Seismic Data Based on a Flexible Attention CNN

Abstract: The noise attenuation of seismic data is an indispensable part of seismic data processing, directly impacting subsequent inversion and imaging. This paper focuses on two bottlenecks in AI-based denoising of seismic data: the destruction of the structural information of seismic data and poor generalizability. We propose a flexible attention-CNN (FACNN) for seismic data denoising. The main contributions are threefold: (i) We integrate attention gates (AGs), which progressively suppress features in irrelevant background regions and improve the denoising performance. (ii) We add a noise level map M as an additional input channel, so that a single CNN model can handle noise models with different parameters, even spatially variant noise. (iii) We propose a mixed loss function based on MS_SSIM to further improve the performance of FACNN. Adding the noise level map improves the network's generalization ability, and adding the attention structure with the mixed loss function better protects the structural information of the seismic data. Numerical tests showed that our method generalizes better and better preserves the details of seismic events.


Introduction
With the deepening of oil exploration, exploration areas are becoming more and more diverse, including deserts, mountains, plains, and oceans. Seismic data from different exploration areas have different characteristics, which raises new challenges for data processing in the oil exploration industry. For example, in desert areas, owing to the surface conditions and acquisition environment, seismic records are mainly characterized by a low signal-to-noise ratio (SNR) and severe spectral aliasing between noise and effective signals [1]. How to enhance the SNR of seismic data is an urgent problem.
Traditional random noise suppression methods can be divided into four categories. The first comprises filtering-based methods, which exploit the difference between the effective signal and the noise in the frequency spectrum; these mainly include the f-x deconvolution method (FXDM) [2], time-frequency peak filtering [3][4][5], and bandpass filtering [6]. The second category is based on inter-trace correlation and mainly includes singular-value decomposition [7] and the K-L transform [8]. The third is a series of transform-based algorithms, such as the shearlet transform [9], wavelet transform [10], seislet transform [11], and curvelet transform [12,13]. The fourth is the class of rank-reduction algorithms [14], which exploit the fact that the noise-free seismic signal is of low rank, whereas noise increases the rank; these mainly include Cadzow filtering [15,16] and principal component analysis.
However, conventional denoising algorithms still have two major bottlenecks [17]: inaccurate assumptions and cumbersome parameter adjustment with manual intervention, both of which are unfavorable for massive seismic data processing. In recent years, owing to the rapid development of computer technology, deep learning has become a popular research topic. LeCun et al. [18] showed that CNNs with fewer parameters provided superior classification results on the MNIST dataset. Zhang et al. [19] proposed a 17-layer CNN named DnCNN for image noise attenuation. Zhang et al. [20] proposed a fast and flexible denoising CNN to further improve the denoising ability. Yu et al. [21] proposed a vision-based crack diagnosis method using a deep convolutional neural network (DCNN) and an enhanced chicken swarm algorithm. Yu et al. [22] proposed a novel method based on deep convolutional neural networks to identify and localize damage in building structures equipped with smart control devices.
Deep learning methods can extract the hidden relationship between noisy and clean data and perform intelligent denoising without unnecessary assumptions and excessive manual intervention. Previous researchers have made notable advances in AI-based denoising. Yu and Ma [17] introduced DnCNN from image denoising into seismic data denoising and achieved good results. Based on this, Wang et al. [23] proposed a data augmentation algorithm that improves the diversity of the training dataset. Zhang and Liu [24] proposed a novel approach to attenuate seismic random noise based on a deep convolutional neural network (CNN) in an unsupervised learning manner. Dong and Li [25] proposed an adaptive DnCNN based on the determination of high-order statistics and applied it to desert seismic data noise attenuation.
Many scholars have further applied the attention mechanism to various computer vision tasks such as noise removal [26] and image segmentation [27]. Since the attention mechanism allows the neural network to focus more on the higher-weighted parts, it can significantly preserve the structural characteristics of the seismic data [28,29].
Although many AI-based denoising methods have obtained favorable results, they still face two intractable problems: the destruction of the detailed seismic data structure and insufficient generalization. Firstly, AI-based denoising algorithms destroy the amplitude and phase information of the seismic signal to some extent, which appears as residual useful signal in the removed noise profile. Secondly, when a well-trained network model is used for testing, the network cannot reach its best performance if the noise characteristics of the testing data are inconsistent with those of the training data. To address these two problems, we propose a flexible attention-CNN (FACNN) for seismic data denoising. In contrast to existing AI-based denoisers, FACNN has several desirable properties. Firstly, we add a noise level map as an additional channel in the network input, so that a single CNN model can handle noise with different characteristics. Secondly, we integrate attention gates (AGs) into a standard U-Net model, which progressively suppress features in irrelevant background regions and improve the preservation of the seismic data structure. Thirdly, we use a mixed loss function to further improve the performance of FACNN. Numerical tests demonstrated that the noise level map improves the network's generalization and that the attention structure with the mixed loss function better protects the details of seismic events.

Theory and Method
The theory in this paper focuses on two unavoidable problems in AI-based denoising: firstly, the destruction of the seismic data structure by the trained denoising network; secondly, the diminished generalization ability of the trained network when the seismic data characteristics change. This section introduces three components: the attention layer and the mixed loss function, which address the first problem by protecting the detailed structure of seismic signals, and the noise level map, which addresses the second problem of reduced generalization ability.
This section first introduces the architecture and characteristics of FACNN from two aspects in the following subsections: the attention gates (AGs) and the noise level map. The whole network architecture is illustrated in Figure 1. U-Net [30] is a fully convolutional neural network originally designed for medical image segmentation. It contains downsampling layers, upsampling layers, and skip connections, achieves excellent image segmentation performance, and is widely used in deep learning research; it is named for its U-shaped architecture. Unlike the traditional U-Net, we integrated the noise level map M and AGs into a standard U-Net model. FACNN consists of encoder and decoder parts, including convolutional layers (3 × 3 × 3 Conv), activation functions (ReLU), downsampling layers, upsampling layers, skip connections, and AGs. The input data shape of FACNN is (Batchsize, Channels, H, W), in which Channels = 2 is the number of channels (the seismic data and the noise level map), H is the time dimension, and W is the surface coordinate. In the encoder, the input data pass through 3 × 3 × 3 Conv, ReLU, and downsampling layers. The purpose of ReLU is to increase the nonlinearity of the deep neural network; it is defined as ReLU(·) = max(·, 0). In this paper, we used max pooling as the downsampling layer, which significantly reduces the computational cost and saves GPU memory. The decoder has a structure symmetric to that of the encoder. FACNN also uses skip connections, which help accelerate gradient back-propagation and convergence.
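To make the encoder operations above concrete, the following is a minimal NumPy sketch of ReLU and max pooling applied to an input of shape (Batchsize, Channels, H, W). The 2 × 2 pooling window, the helper names `relu` and `max_pool_2x2`, and the random batch are assumptions made for illustration; the paper does not state the pooling size.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(x, 0), applied element-wise."""
    return np.maximum(x, 0.0)

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over the last two axes (H and W must be even)."""
    b, c, h, w = x.shape
    # Split H and W into (blocks, 2) pairs and take the maximum inside each block.
    return x.reshape(b, c, h // 2, 2, w // 2, 2).max(axis=(3, 5))

# Hypothetical batch: 20 patches, 2 channels (seismic data + noise level map), 64 x 64 samples.
rng = np.random.default_rng(0)
batch = rng.standard_normal((20, 2, 64, 64))
pooled = max_pool_2x2(relu(batch))
print(pooled.shape)  # (20, 2, 32, 32)
```

Each pooling step halves H and W, so every feature map at the next encoder level holds a quarter as many samples, which is where the savings in computation and memory come from.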

Attention Gates
One of the distinctive features of FACNN is the integration of AGs into a standard U-Net model. Specifically, we merged the AGs [31] into the skip connection operations. Attention gates help the model assign a weight to each part of the input, extracting the critical information and making more accurate predictions without imposing large additional computational and storage costs. This helps preserve the structural information of the seismic data during denoising.
AGs are a tool that updates the model parameters of the shallow feature layers according to the spatial regions relevant to the target task. Through weight control, AGs progressively suppress features in irrelevant background regions and make the feature layers focus on the characteristics of the target. In addition, compared with previous attention mechanisms, the AGs used in this paper are memory-efficient, meaning that they do not add many extra model parameters. The AGs can be expressed as Equations (1) and (2):

$$q_{i,c} = \Psi\left(\sigma_1\left(w_l\, l_{i,c} + w_r\, r_{i,c} + b_1\right)\right) + b_2 \qquad (1)$$

$$coe_{i,c} = \sigma_2\left(q_{i,c}\right) \qquad (2)$$

where $l_{i,c}$ and $r_{i,c}$ are the left and right feature maps in downsampling and upsampling, respectively, and i and c denote the spatial and channel indices, respectively. We chose ReLU as the activation function $\sigma_1$. Ψ and w are the convolution operations, $b_1$ and $b_2$ are the biases, and $\sigma_2$ corresponds to the sigmoid function. After $l_{i,c}$ passes through the AGs, we obtain a weight coefficient matrix $coe_{i,c}$ focusing on the target, and we multiply it with $l_{i,c}$ to obtain the output of the skip connection $\hat{l}_{i,c}$, as shown in Equation (3):

$$\hat{l}_{i,c} = coe_{i,c} \cdot l_{i,c} \qquad (3)$$

Finally, we used feature concatenation to join $\hat{l}_{i,c}$ and $r_{i,c}$, which is expressed as Equation (4):

$$out_{i,c} = \mathrm{concat}\left(\hat{l}_{i,c},\, r_{i,c}\right) \qquad (4)$$

The architecture of the AGs is illustrated in Figure 2. The AGs progressively suppress the irrelevant background features and improve the denoising performance without adding excessive computation. Adding the attention layer to the traditional neural network makes the network pay more attention to the structural information of the input data, so that it can better protect the structural information of the seismic data.
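The gating described above can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the convolutions Ψ and w are taken to be 1 × 1 convolutions (channel-mixing matrices), the right feature map is assumed to be already upsampled to the spatial size of the left one, a single coefficient per spatial position is broadcast over all channels, and all names (`conv1x1`, `attention_gate`, `w_l`, `w_r`, `psi`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, weight):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def attention_gate(l, r, w_l, w_r, b1, psi, b2):
    """Gate the skip feature l with the decoder feature r, then concatenate."""
    # ReLU of the summed projections of both feature maps plus the bias b1.
    hidden = np.maximum(conv1x1(l, w_l) + conv1x1(r, w_r) + b1[:, None, None], 0.0)
    # Sigmoid of a second projection gives coefficients strictly between 0 and 1.
    coe = sigmoid(conv1x1(hidden, psi) + b2[:, None, None])  # shape (1, H, W)
    l_hat = coe * l                                          # weighted skip feature
    out = np.concatenate([l_hat, r], axis=0)                 # channel concatenation
    return l_hat, out, coe

# Hypothetical feature maps with 8 channels on a 16 x 16 grid.
rng = np.random.default_rng(0)
channels, mid, height, width = 8, 4, 16, 16
l = rng.standard_normal((channels, height, width))
r = rng.standard_normal((channels, height, width))
w_l = 0.1 * rng.standard_normal((mid, channels))
w_r = 0.1 * rng.standard_normal((mid, channels))
psi = 0.1 * rng.standard_normal((1, mid))
b1, b2 = np.zeros(mid), np.zeros(1)

l_hat, out, coe = attention_gate(l, r, w_l, w_r, b1, psi, b2)
print(out.shape)  # (16, 16, 16)
```

Because the coefficients lie between 0 and 1, the gate can only attenuate skip features, never amplify them; the extra parameters are limited to the small projection matrices and biases.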

Noise Level Map
The second distinctive feature of FACNN is that we added a noise level map M [20] as an additional channel in the network input to improve generalization. When a well-trained network model is used for testing, the network cannot reach its best performance if the noise level and the characteristics of the testing data are inconsistent with those of the training dataset. We now introduce the theory of the noise level map M.
We can formulate the denoising problem of seismic data as the following Equation (5):

$$\hat{x} = \arg\min_{x}\; \frac{1}{2\sigma^{2}}\left\| y - x \right\|^{2} + \lambda\,\Phi(x) \qquad (5)$$

where y is the noisy data, x is the clean data, $\frac{1}{2\sigma^{2}}\| y - x \|^{2}$ is the data fidelity term with the noise level σ, Φ(x) is the regularization term, and λ controls the balance between the data fidelity term and Φ(x). In other words, λ controls the trade-off between noise reduction and the preservation of seismic data details.
With a suitable optimization algorithm, the denoising function can be expressed as an implicit function in which λ is absorbed into σ. Then, we can obtain the following Equation (6):

$$\hat{x} = F(y, \sigma; \psi) \qquad (6)$$

where σ inherits the role of λ and ψ denotes the learnable parameters; we used the seismic data and the noise level as the inputs in this paper. However, the seismic data and the noise level σ have different dimensions, so we cannot feed them directly into our network model. To eliminate this dimension mismatch, we stretched the noise level σ into a noise level map M whose elements are all σ and which has the same dimension as the input seismic data. Finally, Equation (6) can be expressed as Equation (7):

$$\hat{x} = F(y, M; \psi) \qquad (7)$$

We added the noise level map M as an additional channel, which controls the trade-off between noise attenuation and the preservation of seismic data details. We can adjust the noise level map to keep the mean noise level seen by the network constant so that the network always maintains its best performance.
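The stretching of σ into the map M and the stacking into a two-channel input can be sketched as follows. This is a minimal NumPy illustration; the helper names `make_noise_level_map` and `build_network_input`, the noise levels, and the synthetic batch are assumptions made for the example.

```python
import numpy as np

def make_noise_level_map(sigma, shape):
    """Stretch a scalar noise level (or broadcast a spatially variant one) to `shape`."""
    return np.broadcast_to(np.asarray(sigma, dtype=np.float64), shape).copy()

def build_network_input(noisy, sigma):
    """Stack noisy data (B, H, W) with its noise level map into (B, 2, H, W)."""
    m = make_noise_level_map(sigma, noisy.shape)
    return np.stack([noisy, m], axis=1)

rng = np.random.default_rng(0)
clean = np.zeros((20, 64, 64))                      # placeholder for clean patches
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

# Uniform noise level: every element of M equals sigma.
x_uniform = build_network_input(noisy, 0.1)

# Spatially variant noise: sigma grows from 0.05 to 0.2 across the W axis.
sigma_map = np.linspace(0.05, 0.2, 64)[None, None, :]
x_variant = build_network_input(noisy, sigma_map)

print(x_uniform.shape, x_variant.shape)  # (20, 2, 64, 64) (20, 2, 64, 64)
```

Since M is an ordinary input channel, changing the assumed noise level at test time only requires filling this channel with a different value; the network weights stay fixed.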

Mixed Loss Function
The loss function is used to evaluate the difference between the predicted values $\hat{x}_i = F(y_i, M_i; \psi)$ and the true values $x_i$ of the model, and the choice of the loss function can largely influence the model's performance. $M_i$ and ψ are the noise level maps and the collection of all learnable parameters, respectively. Mathematically, the loss function is defined as Equation (8), where L(ψ) is the loss function of the training process and N is the total number of pixels.
To improve the preservation of details in the denoised seismic image, we introduced multi-scale structural similarity (MS_SSIM) [32] and formed a new loss function. As an image quality assessment method, MS_SSIM is very sensitive to changes in local structure, which helps preserve the details of seismic events. Mathematically, it can be expressed as:

$$\mathrm{MS\_SSIM}(a, b) = \left[l_J(a, b)\right]^{\alpha_J} \prod_{j=1}^{J} \left[c_j(a, b)\right]^{\beta_j} \left[s_j(a, b)\right]^{\gamma_j} \qquad (9)$$

where $l(a, b) = \frac{2\mu_a \mu_b + c_1}{\mu_a^2 + \mu_b^2 + c_1}$, $c(a, b) = \frac{2\sigma_a \sigma_b + c_2}{\sigma_a^2 + \sigma_b^2 + c_2}$, and $s(a, b) = \frac{\sigma_{ab} + c_3}{\sigma_a \sigma_b + c_3}$ represent three measurements between a and b; the subscript j denotes the scale, J is the number of scales, and $\alpha_J$, $\beta_j$, and $\gamma_j$ are exponents that weight the three terms. µ and σ represent the mean and the standard deviation, and $\sigma_{ab}$ is the covariance between data a and b. $c_1$, $c_2$, and $c_3$ are three constants that keep the ratios stable when the denominators are close to zero. Based on Equation (9), we obtained an MS_SSIM loss as Equation (10):

$$L_{\mathrm{MS\_SSIM}} = 1 - \mathrm{MS\_SSIM}(\hat{x}, x) \qquad (10)$$

Furthermore, the mixed loss function is defined as Equation (11), which combines L(ψ) and the MS_SSIM loss with a weight φ. Through several experiments, we empirically set the weight φ to 0.5. Table 1 shows the SNR values on the same test dataset with different weights φ. It shows that FACNN achieves the best performance when φ = 0.5. Using the loss function with MS_SSIM better preserves the structural information of the denoised seismic data, so the details of the effective events in the denoised images are maintained better with our proposed mixed loss function.
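As a rough illustration of how an MS_SSIM-based mixed loss behaves, the sketch below combines a simplified multi-scale SSIM with a pixel-wise term. Several choices here are assumptions that the text does not specify: the pixel-wise term is taken to be the mean squared error, φ is taken to weight the MS_SSIM term and 1 − φ the pixel term, all exponents are set to 1, c_3 = c_2/2 so that the contrast and structure terms merge, and the statistics are computed globally per scale rather than with the sliding Gaussian windows of the standard method. The function names are hypothetical.

```python
import numpy as np

def downsample2(x):
    """Halve both axes by 2x2 average pooling (both axes must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ssim_terms(a, b, c1=1e-4, c2=9e-4):
    """Global luminance term l and merged contrast-structure term cs (c3 = c2 / 2)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    lum = (2 * mu_a * mu_b + c1) / (mu_a ** 2 + mu_b ** 2 + c1)
    cs = (2 * cov + c2) / (var_a + var_b + c2)
    return lum, cs

def ms_ssim(a, b, scales=3):
    """Simplified MS_SSIM: cs at every scale, luminance only at the coarsest scale."""
    value = 1.0
    for j in range(scales):
        lum, cs = ssim_terms(a, b)
        last = j == scales - 1
        value *= lum * cs if last else cs
        if not last:
            a, b = downsample2(a), downsample2(b)
    return value

def mixed_loss(pred, target, phi=0.5):
    """phi * (1 - MS_SSIM) + (1 - phi) * MSE; the placement of phi is an assumption."""
    mse = np.mean((pred - target) ** 2)
    return phi * (1.0 - ms_ssim(pred, target)) + (1.0 - phi) * mse

# Hypothetical 64 x 64 patch with dipping events, and a noisy prediction of it.
rng = np.random.default_rng(0)
t, x = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
target = np.sin(0.3 * t + 0.1 * x)
pred = target + 0.1 * rng.standard_normal(target.shape)
print(mixed_loss(target, target), mixed_loss(pred, target))
```

The structural term penalizes loss of correlation between prediction and label at several resolutions, while the pixel-wise term keeps amplitudes anchored; a production implementation would normally use a windowed MS_SSIM from an established library.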

Numerical Tests
In this section, we tested the denoising performance of the proposed FACNN method on both synthetic and real seismic data. Comparison tests were included to demonstrate that the strategies proposed in the theoretical part protect the detailed structure of the seismic data and improve the generalization ability.

Synthetic Data Testing
In this subsection, we first tested FACNN on synthetic data. We begin with the construction of the synthetic training dataset. As is common practice, we constructed the training dataset for synthetic data by adding random noise. Considering the diversity of the samples and the generalization of the trained model, we chose 2D, 3D, pre-stack, and post-stack data with different features to construct the training dataset. Then, we split the dataset into small patches through the Monte Carlo strategy [33], removing the useless patches (almost all-zero data or particularly similar data). Moreover, we performed a data augmentation strategy based on the following operations: rotation transformation, mirror transformation, space-time downsampling, and intensity transformation [34]. Finally, we obtained a total of 22,000 samples, of which 18,000 were used for training, 2000 were used for validation, and the rest were used for testing. The dimensions of the training input and output are (20, 2, 64, 64) and (20, 1, 64, 64), respectively. The additional channel of the training input is the noise level map M. The learning rate was adjusted adaptively to accelerate the network's convergence, starting from $10^{-3}$ and decreasing to $10^{-4}$. We set the batch size, patch size, and number of epochs to 20, 64 × 64, and 100, respectively. The training process involved 100,000 iterations and took 2 h and 5 min in total. Figure 3a,b show the clean sigmoid model and the contaminated image. Then, we compared the denoising capabilities of FACNN, DnCNN [17], industrial RNA [2] (f-x random noise attenuation), and the adaptive prediction filter [35] (APF). The RNA method is the most widely used denoising method for data processing in the petroleum industry. It achieves good denoising by predicting seismic events and has become the standard algorithm for denoising seismic data.
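The patch extraction and augmentation steps in the dataset construction described in this subsection can be sketched as follows. This is a simplified illustration under stated assumptions: random patch positions stand in for the Monte Carlo strategy, only nearly all-zero patches are discarded (the similarity check and the space-time downsampling are omitted), the energy threshold and the amplitude scaling range are arbitrary, and the function names are hypothetical.

```python
import numpy as np

def sample_patches(section, patch=64, n=100, min_energy=1e-6, rng=None):
    """Draw n random patches from a 2D section and drop nearly all-zero ones."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = section.shape
    kept = []
    for _ in range(n):
        i = rng.integers(0, h - patch + 1)   # top-left corner, chosen at random
        j = rng.integers(0, w - patch + 1)
        p = section[i:i + patch, j:j + patch]
        if np.mean(p ** 2) > min_energy:     # discard patches with almost no energy
            kept.append(p)
    return kept

def augment(p, rng):
    """Random 90-degree rotation, mirror, and intensity scaling of a square patch."""
    p = np.rot90(p, k=int(rng.integers(0, 4)))
    if rng.random() < 0.5:
        p = np.fliplr(p)
    return p * rng.uniform(0.5, 1.5)

# Hypothetical section: the top half is muted (all zeros), the bottom half has signal.
rng = np.random.default_rng(0)
section = np.zeros((256, 256))
section[128:] = rng.standard_normal((128, 256))

patches = sample_patches(section, rng=rng)
augmented = [augment(p, rng) for p in patches]
print(len(patches), augmented[0].shape)
```

Applying the same rotation and mirror to the noisy input and its clean label keeps each training pair consistent; in this sketch only a single array is transformed for brevity.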
APF is a state-of-the-art traditional denoising method, an extension of the RNA method, based on an adaptive prediction filter. It achieves excellent noise suppression. DnCNN is the standard comparison algorithm in neural network denoising for seismic data. We demonstrate the capability of FACNN in seismic data denoising by using two traditional algorithms and one deep learning algorithm as a comparison. Figure 4a-d are the denoised results of RNA, APF, DnCNN, and FACNN, respectively. As the red arrow points out, the RNA method leaves much residual noise. For the state-of-the-art filtering method APF, the denoising effect was satisfactory, and the residual noise is almost invisible in Figure 4b. For the widely used DnCNN method, the denoising result was better than that of the RNA method, but residual random noise still left the data partially blurred. The FACNN proposed in this paper showed a remarkable denoising performance, and its denoising result was further improved compared with the APF method.
Then, we further compared the structure-preservation ability of the above four methods. Figure 4a*-d* are the corresponding removed noise profiles of the above four methods. The noise profile of the RNA method contained many effective signals, including geologic folds, faults, and unconformities. The APF method preserved the effective signals very well, and only a tiny amount of fault information can be seen. Because the DnCNN method was not adapted to this noise level, much geological information remained in its noise profile. In contrast, the FACNN in this paper left no visible residual effective signal, and its preservation of the seismic structures was the best among the four methods.
By comparing the removed noise profiles, we can find that the detailed structural information of the seismic data is better preserved by using our proposed attention structure with the mixed loss function. The SNR and SSIM values for the four methods are shown in Table 2 below. The SNR and SSIM of the FACNN method in this paper were the highest among the four methods. To examine the influence of the denoising methods on the original data amplitude, we extracted the 50th trace from the result of each method for comparison (Figure 5). From top to bottom are the noisy data and the results of the RNA, APF, DnCNN, and FACNN methods. The black line is the original signal. More overlap with the black line means that the method was more successful. As we can observe, FACNN had the best overlap with the original signal, which indicates that FACNN is excellent at preserving the amplitude information. This also indicates that the method in this paper hardly damages the structural information of the seismic data.
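For reference, the SNR comparison and the single-trace extraction can be sketched as below. The paper does not state its SNR formula, so the common definition SNR = 10 log10(‖x‖² / ‖x − x̂‖²) in decibels is assumed here, as is an array layout of (time samples, traces); the function name `snr_db` and the synthetic section are hypothetical.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR in dB: energy of the clean signal over the energy of the error."""
    error_energy = np.sum((clean - estimate) ** 2)
    return 10.0 * np.log10(np.sum(clean ** 2) / error_energy)

# Hypothetical section: 256 time samples (4 ms sampling) by 100 traces.
rng = np.random.default_rng(0)
t = np.arange(256)[:, None] * 0.004
clean = np.sin(2 * np.pi * 25 * t) * np.ones((1, 100))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

# The 50th trace corresponds to column index 49 with zero-based indexing.
trace_clean = clean[:, 49]
trace_noisy = noisy[:, 49]

print(snr_db(clean, noisy), snr_db(trace_clean, trace_noisy))
```

Under this definition, halving the amplitude of the residual error raises the SNR by about 6 dB, which gives a sense of scale when reading the tabulated values.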
It is worth noting that AI-based methods have advantages over traditional methods. The efficiency of traditional methods is not very high, especially the APF method, which is unsuitable for large-scale data processing because it requires iterations and consumes much time. In addition, we tested different denoising parameters for the two non-AI methods and chose the best denoising results; traditional methods always require the selection of many parameters and manual intervention. Both AI-based methods need only about one second of processing time for the sigmoid model and do not require manual intervention.
Next, we compared the denoised results of DnCNN and FACNN, the two deep learning algorithms, at different noise levels to examine the benefit of the noise level map M. When a well-trained network model is used for testing, the network cannot reach its best performance if the noise level and the characteristics of the testing data are inconsistent with those of the training dataset. Figure 6 compares DnCNN and FACNN at different noise levels, where (a-d) are the DnCNN denoised results and (a*-d*) are the corresponding FACNN denoised results. The noise level in Figure 6a is the same as that used in training, and the denoising result of DnCNN was similar to that of FACNN. As the noise level increased, more and more noise remained in the results of DnCNN (b-d). For FACNN, we can adjust the noise level map so that the network always maintains its best performance. Therefore, its denoising performance decreased only slightly as the noise level increased. At the highest noise level, its denoising result was much better than that of DnCNN. The SNR values for the two methods are shown in Table 3 below.
We found that, when the characteristics of the test data changed, the performance of the DnCNN method decreased significantly. In contrast, the performance of the FACNN method proposed in this paper remained nearly stable because we can adjust the noise level map so that the mean noise level of the input data remains constant. This again indicates that adding the noise level map improves the generalization ability of the well-trained neural network. The previous tests demonstrated that the proposed FACNN effectively protects the structural information of seismic signals. To further demonstrate the contributions of the proposed attention structure and the mixed loss function, we used the control variable method for comparison. In this experiment, we compared (a) the FACNN method of this paper, (b) the method using the attention structure with the traditional loss function, and (c) the method using the traditional U-Net with the proposed mixed loss function, with the remaining variables held constant. In the removed noise profiles, we can find some effective signals in (b*) and (c*), and the residual information in (c*) is more than that in (b*). Comparing (a*) with (b*) shows that the mixed loss function helps protect the seismic signal structure. Furthermore, comparing (a*) with (c*) shows that the AGs also help protect the seismic structure. The residual effective signal in (b*) is less than that in (c*), which indicates that the AGs protect the seismic signal structure better than the mixed loss function does. This demonstrates that both the attention structure and the mixed loss function contribute to the protection of the detailed structure of the seismic data. The SNR and SSIM values for the three methods are shown in Table 4 below.
The SSIM tests also demonstrated that the attention structure and the mixed loss function proposed in this paper can protect the details of the seismic signal structure.

Real Data Testing
To denoise the seismic field data, we added real data to the synthetic dataset to build a comprehensive training dataset, where the labels of the field data were denoised by the state-of-the-art APF method. The real seismic data were obtained from the field 3D seismic data of some oil fields in China, including land seismic data, desert seismic data, and marine seismic data. Through suitable processing and cropping, a total of 5000 training samples of real data were constructed to build a comprehensive training set. In addition, we used the same data augmentation strategy for the real data to form a comprehensive dataset with diversity [4,34]. Then, we used the transfer learning method [17], which means the well-trained synthetic data model was used as the pre-trained model, and finally, we obtained the denoised model adapted to the field seismic data. It is worth noting that during the training process, we built the noise level map by using the noise removed by the APF method so that the well-trained model can adjust to the data with different characteristics by adjusting the noise level map M.
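One way to build a noise level map from the noise removed by a reference method is sketched below. The text does not say how the APF-removed noise is converted into M, so taking the standard deviation of the removed noise as the noise level is an assumption, and `noise_level_map_from_reference` and the stand-in arrays are hypothetical.

```python
import numpy as np

def noise_level_map_from_reference(noisy, reference_denoised):
    """Estimate sigma from the removed noise and stretch it to the data shape."""
    removed_noise = noisy - reference_denoised
    sigma = float(np.std(removed_noise))   # assumed estimator of the noise level
    return np.full(noisy.shape, sigma), sigma

# Stand-ins: `reference` plays the role of an APF-denoised patch used as the label.
rng = np.random.default_rng(0)
reference = np.zeros((64, 64))
noisy = reference + 0.2 * rng.standard_normal((64, 64))

m, sigma = noise_level_map_from_reference(noisy, reference)
network_input = np.stack([noisy, m], axis=0)   # (2, 64, 64): data + noise level map
print(round(sigma, 3), network_input.shape)
```

For field data whose noise strength varies laterally, the same estimate could be computed in local windows to obtain a spatially variant map.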
In this section, we tested FACNN on field desert seismic data and land post-stack seismic data. By comparing the processing results of different field seismic data, we can demonstrate its generalization ability and its ability to protect the structural features of seismic data. Figure 8a is a part of the field desert post-stack seismic data, on which some noise residues can be observed. In this experiment, we compared the performance of the FACNN method in this paper with the DnCNN method and the traditional APF method in real seismic data denoising. Figure 8b-d are the denoised results of APF, DnCNN, and FACNN, respectively. The denoising results of the APF method (b) and DnCNN (c) were very similar, which demonstrated that the transfer learning method allowed the deep learning model to reach the level of the traditional method. Moreover, the denoising result of FACNN was further improved compared with DnCNN, demonstrating the effectiveness of the AGs and the mixed loss function proposed in this paper. Figure 8b*-d* are the noise profiles removed by APF, DnCNN, and FACNN, respectively. We can observe that DnCNN left many residual effective signals because it was not adapted to the characteristics of the real seismic data in this region. However, in profile (d*), we can hardly find any residual geological structure information. This demonstrated that the FACNN in this paper also better preserves the structural information of seismic signals through the noise level map, AGs, and mixed loss function. Figure 9 compares the FK spectra of the three methods on real seismic data, where (a-d) represent the FK spectra of the field seismic data, the APF result, the DnCNN result, and the FACNN result, respectively. All three methods suppress the noise seen in Figure 9a. There is still slight residual noise at the high and low wavenumbers in (b,c). In contrast, the FACNN method in this paper suppresses the random noise better. In addition, in Figure 9d, we can observe that the effective signal in the low-wavenumber part is better retained. This showed that FACNN better protects the characteristics of the effective signal while maintaining good denoising ability.
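An FK (frequency-wavenumber) amplitude spectrum of the kind compared above can be computed with a 2D Fourier transform, as in the following NumPy sketch. The array layout (time samples, traces), the sampling intervals `dt` and `dx`, and the synthetic dipping plane wave are assumptions made for illustration; `fk_spectrum` is a hypothetical helper name.

```python
import numpy as np

def fk_spectrum(section, dt, dx):
    """Return the centered FK amplitude spectrum with its frequency and wavenumber axes."""
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(section)))
    f = np.fft.fftshift(np.fft.fftfreq(section.shape[0], d=dt))   # Hz
    k = np.fft.fftshift(np.fft.fftfreq(section.shape[1], d=dx))   # cycles per metre
    return amplitude, f, k

# Synthetic dipping plane wave: 128 samples at 4 ms, 64 traces at 10 m spacing.
nt, nx, dt, dx = 128, 64, 0.004, 10.0
f0, k0 = 31.25, 0.0125                      # chosen to fall exactly on the FFT grid
t = np.arange(nt)[:, None] * dt
x = np.arange(nx)[None, :] * dx
section = np.cos(2 * np.pi * (f0 * t - k0 * x))

amplitude, f, k = fk_spectrum(section, dt, dx)
i, j = np.unravel_index(np.argmax(amplitude), amplitude.shape)
print(abs(f[i]), abs(k[j]))                 # location of the spectral peak
```

A coherent event concentrates its energy along a line in the FK plane, whereas random noise spreads over the whole plane, which is why residual noise and damaged events are both visible in such spectra.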
Next, we further verified the generalization ability of FACNN with field land seismic data. The generalization ability of the method in this paper can be demonstrated by testing field seismic data from different blocks. In this experiment, we compared the denoising performance of a standard U-Net without the noise level map and FACNN with the attention structure and noise level map. Figure 10a shows a part of the real land seismic data with a low SNR. Figure 10b,c are the denoised results of U-Net without the noise level map and attention-CNN with the noise level map, respectively. The improvement from (b) to (c) lies in the AGs and mixed loss function proposed in this paper together with the noise level map. The rest of the network structure, training set, and parameters were the same. The denoising results of FACNN (c) were significantly better than those without the noise level map (b). Comparing the noise difference profiles (d,e), we found that using the noise level map led to better generalization and better protection of the seismic events. Furthermore, the structure-preserving method proposed in this paper protects the structural information of the seismic signal. Figure 11 compares the FK spectra of the real seismic data, where (a-c) are the FK spectra of the contaminated real seismic data, the denoised result using U-Net without the noise level map, and the denoised result using attention-CNN with the noise level map. The FK spectrum in Figure 11a contains a large amount of random noise over the whole wavenumber range. The random noise in Figure 11b,c is well suppressed, and the noise over the whole wavenumber range of (c) is suppressed significantly better. It is worth noting that the morphology and characteristics of the effective signal are destroyed in the U-Net-based result in (b). In contrast, in the attention-CNN-based result in (c), the effective signal is well preserved. This reaffirmed that the FACNN proposed in this paper has good generalization and structure-preserving ability.

Discussion
This paper proposed solutions for two problems in AI-based denoising of seismic data. Firstly, for the problem that the trained neural network lacks generalization, this paper added a noise level map to improve the network's generalization ability. Secondly, for the problem that the denoising method will damage the structural information of seismic signals, this paper added the attention layer and mixed loss function to improve the ability of the network to protect the structural information of seismic signals.
The research ideas in this paper can be extended from the following aspects. Firstly, the noise level map proposed in this paper can be combined with the self-supervised learning algorithm for the random noise removal aspect. Secondly, the structure-preserving algorithm proposed in this paper can be applied to other types of noise removal work in seismic data, such as surface waves and linear noise.

Conclusions
In this paper, we proposed a flexible attention-CNN (FACNN) and realized the intelligent denoising work of seismic data. We integrated the attention gates (AGs) with the mixed loss function in a standard U-Net model. Then, we added a noise level map as an additional channel in the network input data body, making a single CNN model that can handle the noise models with different noise characteristics, even spatially variant noises. Adding the noise level map can improve the network's generalization ability, and adding the attention structure with the mixed loss function can better protect the structural information of the seismic data. The testing results on synthetic and real data demonstrated the superiority of our proposed method. Extensive experimental results showed that our method has better generalization and can better protect the details of seismic events.

Conflicts of Interest:
The authors declare no conflict of interest.