Seismic Random Noise Attenuation Using a Tied-Weights Autoencoder Neural Network

Random noise is unavoidable in seismic data acquisition because of anthropogenic or environmental influences. Therefore, random noise suppression is a fundamental procedure in seismic signal processing. Herein, a deep denoising convolutional autoencoder network based on self-supervised learning was developed to attenuate seismic random noise. Unlike conventional methods, our approach did not use synthetic clean data or denoising results as training labels to build the training and test sets. Instead, we directly used patches of raw noisy data to establish the training set. We then designed a robust deep convolutional neural network (CNN) that depended only on the noisy input dataset to learn hidden features. The mean square error (MSE) was used as the cost function. Additionally, tied weights were used to reduce the risk of overfitting and to speed up training when tuning the network parameters. Finally, we denoised the signals of the target work area using the trained CNN. The final denoising result was obtained after patch recombination and the inverse operation. Results based on synthetic and real data indicated that the proposed method performs better than other denoising methods without loss of signal quality.


Introduction
Seismic signals are often disturbed by noise; consequently, their signal-to-noise ratio (SNR) is typically low. A low SNR indicates poor signal quality, which affects downstream processing such as deconvolution [1], seismic inversion [2], and seismic attribute analysis [3]. Therefore, seismic signals must be pre-processed to obtain clear, high-resolution seismic profiles [4,5], and effective noise removal methods are critical for this pre-processing [6,7]. According to its relationship with the signal, noise is typically classified as coherent or incoherent. Coherent noise is correlated with the effective signal and therefore has identifiable features. Incoherent noise, in contrast, is randomly mixed with the effective signal, has no fixed dominant frequency or apparent velocity, and hampers signal recognition.
Effective suppression of seismic random noise and recovery of high-quality seismic data are challenging problems that have attracted considerable attention [8,9]. Many denoising methods and theories have been proposed and widely used in seismic data processing. The first class of methods is based on stacking seismic data in the offset direction [10,11]. However, this approach is unsuitable for denoising pre-stack seismic data and is considerably limited by dip angle. The second class is based on predictive filtering, which assumes that the signal is predictable whereas random noise is not. Based on this theory, f-x deconvolution (FXDECON) [12,13], non-stationary
Our study aimed to suppress incoherent noise (i.e., random noise) in seismic data and to improve the efficiency and robustness of convolutional AE learning. We employed unsupervised learning, as mentioned above, and proposed a deep convolutional DAE network framework to attenuate seismic random noise. Unlike traditional methods, which employ synthetic clean data or denoising results as training labels, we used raw noisy data to construct the training set directly. The advantage of the proposed method is that the training dataset can be prepared quickly from any input real seismic data, without the need to obtain true training labels. The method is well suited to capturing the specific characteristics of real noisy seismic data because we designed multiple filters and dedicated procedures for extracting useful features. The MSE was selected as the error criterion for the cost function, and tied weights were used to reduce the risk of overfitting. These modifications accelerated training and the search for optimal network parameters. Finally, the optimized CNN was used to denoise the patches from a target region.
The final denoising result was obtained after patch synthesis and the inverse operation. The method was applied to synthetic and real seismic data and compared with other denoising methods, namely MSSA, FXDECON, and the wavelet transform. The comparison showed that the proposed method was more effective than these alternatives.

Methodology
Noisy seismic data were modeled as the sum of the effective signal and noise:
$$Y = X + N,$$
where $X$ is the clean signal to be recovered, $N$ is the additive noise, and $Y$ is the observed noisy seismic data. The signal $X$ was assumed to be independent of the noise $N$. We restricted our discussion to the case in which $N$ is additive isotropic Gaussian noise, so that each noise sample follows a zero-mean Gaussian distribution, $n_i \sim \mathcal{N}(0, \sigma^2)$.
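The additive model can be simulated with a short NumPy sketch. It assumes i.i.d. Gaussian noise with standard deviation `sigma`; the helper name `add_gaussian_noise` and the toy single-event section are illustrative choices, not part of the original method.

```python
import numpy as np


def add_gaussian_noise(clean: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Return Y = X + N with N ~ N(0, sigma^2), drawn independently of X."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=clean.shape)
    return clean + noise


# Toy 120 x 120 section (time samples x traces) containing one flat event
clean = np.zeros((120, 120))
clean[60, :] = 1.0
noisy = add_gaussian_noise(clean, sigma=0.25)
residual = noisy - clean  # recovers the noise term N
print(f"mean={residual.mean():.3f}, std={residual.std():.3f}")
```

Because the noise is independent of the signal and zero-mean, the residual has a sample mean near zero and a sample standard deviation near `sigma`.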
The proposed method, which was based on deep CNNs, primarily aimed to establish a relationship between X and Y. To train an effective deep CNN, a good training dataset and test dataset should be prepared first.

Training Set Preparation and Self-Supervised Learning
To ensure the quality of deep learning and accelerate network convergence, we first normalized the original seismic dataset $Y$ as follows:
$$Y^{*} = \frac{Y}{\max\lvert Y\rvert},$$
where $Y^{*}$ is the normalized dataset and $\max\lvert Y\rvert$ is the maximum absolute value of the original data. This scales the data to the range [−1, 1].
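A minimal sketch of this normalization and of the inverse normalization applied at the end of the workflow, assuming the data are divided by their maximum absolute amplitude; the function names are hypothetical.

```python
import numpy as np


def normalize(y: np.ndarray):
    """Scale data into [-1, 1] by the maximum absolute amplitude; return the scale too."""
    scale = np.max(np.abs(y))
    if scale == 0:
        raise ValueError("an all-zero section cannot be normalized")
    return y / scale, scale


def denormalize(y_star: np.ndarray, scale: float) -> np.ndarray:
    """Inverse normalization, applied to the denoised section."""
    return y_star * scale


y = np.array([[-3.0, 1.5], [0.5, 2.0]])
y_star, scale = normalize(y)
print(y_star.min(), y_star.max())  # -1.0 0.6666666666666666
```

Keeping `scale` is what makes the final inverse normalization possible.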

Self-Supervised Learning
Self-supervised learning was adopted to avoid the label preparation required by supervised learning [50]. Our training data were not extracted from clean data; instead, we directly used the noisy data of the target region. The neural network then extracted the characteristics of the seismic data by learning from a small number of training samples, and the trained model was subsequently used to denoise the entire work area.

Dataset Preparation
After normalizing the data from the region of interest, we introduced a random sampling operator $R_1$ to extract data randomly from the target area and construct the training set $X_{train}$:
$$X_{train} = C\left(R_1\left(Y^{*}\right)\right),$$
where $R_1$ randomly extracts $N \times N$ matrices from the normalized noisy data $Y^{*}$ of the target work area, and the cutting operator $C$ divides them into smaller training patches of size $n$. Random sampling can generate a very large number of training samples, which benefits network learning; however, using a smaller dataset reduces computational complexity and improves the efficiency of network training.
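The operators $R_1$ and $C$ can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: windows are square, patches inside a window do not overlap, and the names `random_windows` and `cut`, the 64-sample window, and the 100 windows are hypothetical.

```python
import numpy as np


def random_windows(y_star: np.ndarray, win: int, count: int, rng) -> np.ndarray:
    """Hypothetical R1: draw `count` random win x win windows from the section."""
    rows, cols = y_star.shape
    r = rng.integers(0, rows - win + 1, size=count)  # random top-left rows
    c = rng.integers(0, cols - win + 1, size=count)  # random top-left columns
    return np.stack([y_star[i:i + win, j:j + win] for i, j in zip(r, c)])


def cut(windows: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical C: split each square window into non-overlapping n x n patches."""
    k, win, _ = windows.shape
    m = win // n
    trimmed = windows[:, :m * n, :m * n]  # drop any remainder that cannot fill a patch
    # (k, m, n, m, n) -> (k, m, m, n, n) -> (k * m * m, n, n)
    return trimmed.reshape(k, m, n, m, n).transpose(0, 1, 3, 2, 4).reshape(-1, n, n)


rng = np.random.default_rng(0)
y_star = rng.uniform(-1.0, 1.0, size=(120, 120))  # stand-in for normalized noisy data
windows = random_windows(y_star, win=64, count=100, rng=rng)
x_train = cut(windows, n=32)
print(x_train.shape)  # (400, 32, 32)
```

Each 64 × 64 window yields four 32 × 32 patches, so the number of training patches grows with the number of random draws.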
Based on repeated experiments and the literature, a patch size of n = 32 was selected from the range 28 to 64 [51,52]. After the AE network completed training on $X_{train}$, the model was used to denoise the target work area as follows:
$$\hat{Y}^{*} = R_2^{-1}\left(C^{-1}\left(M_{DAE}\left(C\left(R_2\left(Y^{*}\right)\right)\right)\right)\right).$$
First, the regular operator $R_2$ divides the normalized target work area $Y^{*}$ into $N_1 \times N_2$ blocks, and the cutting operator $C$ divides these blocks into smaller patches $X_{test}$ of size $n$. The neural network model $M_{DAE}$ with optimized parameters then denoises $X_{test}$ to obtain the predicted patches $\hat{X}_{test}$. Next, the denoised patches are reassembled into the larger blocks using the inverse cutting operator $C^{-1}$. Finally, the inverse sampling operator $R_2^{-1}$ merges the blocks into the complete work area, and inverse normalization yields the denoised data of the entire work area.
The numbers of patches in the training and test sets were generally different, but each training patch produced by the cutting operator $C$ had the same size $n$ as the patches in the test set. Additionally, the regular operator $R_2$ ensured that all raw data were included in the denoising process. The entire process is illustrated in Figure 1.
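The test-time workflow (regular tiling, patch-wise prediction, and reassembly) can be sketched as below. The paper does not state whether test patches overlap, so this sketch assumes non-overlapping tiles with zero padding at the bottom and right edges so that every sample is covered; `tile`, `untile`, `denoise_section`, and the `model` callable (anything mapping a stack of patches to a same-shaped stack, such as a wrapper around a trained network) are hypothetical.

```python
import numpy as np


def tile(y_star: np.ndarray, n: int):
    """Hypothetical R2 + C: zero-pad to a multiple of n, then cut into n x n patches."""
    rows, cols = y_star.shape
    pad_r, pad_c = -rows % n, -cols % n  # padding needed so every sample is covered
    padded = np.pad(y_star, ((0, pad_r), (0, pad_c)))
    mr, mc = padded.shape[0] // n, padded.shape[1] // n
    patches = padded.reshape(mr, n, mc, n).transpose(0, 2, 1, 3).reshape(-1, n, n)
    return patches, (rows, cols, mr, mc)


def untile(patches: np.ndarray, meta, n: int) -> np.ndarray:
    """Hypothetical inverse of C and R2: reassemble the patches and crop the padding."""
    rows, cols, mr, mc = meta
    full = patches.reshape(mr, mc, n, n).transpose(0, 2, 1, 3).reshape(mr * n, mc * n)
    return full[:rows, :cols]


def denoise_section(y_star: np.ndarray, model, n: int = 32) -> np.ndarray:
    """Apply a patch-wise denoiser `model` to a whole normalized section."""
    patches, meta = tile(y_star, n)
    return untile(model(patches), meta, n)


y_star = np.random.default_rng(0).uniform(-1.0, 1.0, size=(120, 120))
patches, meta = tile(y_star, 32)
print(patches.shape)  # (16, 32, 32)
restored = denoise_section(y_star, model=lambda p: p)  # identity "model" as a check
print(np.array_equal(restored, y_star))  # True
```

With an identity model the round trip is exact, which confirms that tiling and reassembly are inverses of each other.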

2021, 11, x FOR PEER REVIEW 4 of 20

Principle of Convolutional AE Seismic Denoising
DAE neural networks were used for feature selection and extraction via dimensionality reduction and reconstruction. An AE uses a hidden (code) layer with fewer units than the input, which forces the network to learn the dominant structure of the data and to ignore input "noise", as shown in Figure 2.

The encoding process was as follows. For the input noisy data $X$, a mapping $F$ compresses $X$ into a representation $Y$; this mapping is generally referred to as the "encoder" and is typically nonlinear:
$$Y = F(X) = \sigma\left(WX + b\right),$$
where $\sigma$ is a nonlinear activation function and the encoding mapping parameters are $F = \{W, b\}$, with $W$ the weight matrix and $b$ the bias (offset). The decoding process was as follows: a mapping $G$ restores the compressed representation $Y$ to a reconstruction $Z$ that is as close as possible to the input:
$$Z = G(Y) = \sigma\left(W'Y + b'\right),$$
where the decoding mapping parameters are $G = \{W', b'\}$, with weight matrix $W'$ and bias $b'$. The encoding and decoding processes were trained jointly by minimizing the loss function
$$J_{DAE} = \frac{1}{N_p}\sum_{i=1}^{N_p}\left\lVert Z_i - X_i\right\rVert_2^2,$$
where $N_p$ is the number of training patches. The mappings $F$ and $G$ were chosen to minimize this average reconstruction error $J_{DAE}$ over the training set, so that $Z$ approximates the noisy input $X$. The parameters were initialized randomly and then optimized by stochastic gradient descent.
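The encoder, decoder, and loss above can be sketched for a fully connected AE acting on flattened patches. This is a minimal NumPy sketch under stated assumptions: tanh stands in for $\sigma$, the sizes (16 inputs, 4 code units, 8 patches) and the random initialization are illustrative, and the paper's actual network is convolutional rather than dense.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n_patches = 16, 4, 8                     # input size, code size (h < d), batch size
W, b = rng.normal(0.0, 0.1, (h, d)), np.zeros(h)       # encoder parameters F = {W, b}
W2, b2 = rng.normal(0.0, 0.1, (d, h)), np.zeros(d)     # decoder parameters G = {W', b'}


def encode(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W.T + b)       # Y = sigma(W X + b), one row per patch


def decode(y: np.ndarray) -> np.ndarray:
    return np.tanh(y @ W2.T + b2)     # Z = sigma(W' Y + b')


def j_dae(x: np.ndarray) -> float:
    """Average squared reconstruction error over the patches in x."""
    z = decode(encode(x))
    return float(np.mean(np.sum((z - x) ** 2, axis=1)))


x = rng.uniform(-1.0, 1.0, (n_patches, d))  # rows are flattened noisy patches
print(encode(x).shape, decode(encode(x)).shape)  # (8, 4) (8, 16)
```

Training would adjust `W`, `b`, `W2`, and `b2` to reduce `j_dae`; the narrow code layer is what prevents the network from reproducing random noise exactly.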
Based on the above-described procedure, the entire DAE process consists of data feature extraction and recovery of the complete data [53]. Encoding is a dimension-reduction process used to extract specific features, whereas decoding is a dimension-raising process used to recover the complete data. Because random noise is irregular whereas effective signals are regular, nonlinear neural network learning can generate a robust feature representation of the signal. Therefore, if an algorithm can accurately reconstruct its input from this representation, it also retains most of the input feature information [54].
A fully connected AE is commonly selected for denoising one-dimensional signals. However, fully connected AEs are not appropriate for two-dimensional seismic signals because (1) they flatten the input into a single vector, which discards the structural information of the seismic signal; and (2) in a fully connected topology every node is connected to every node of the adjacent layer, which results in considerable redundancy and prohibitively high implementation costs.
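The cost argument in point (2) can be made concrete by counting parameters. The sketch assumes a dense layer with as many hidden units as input samples (1,024 for a flattened 32 × 32 patch) and a 3 × 3 convolution with 16 filters on a single-channel input; both sizes are hypothetical choices for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv2d_params(k: int, c_in: int, c_out: int) -> int:
    """One k x k kernel per (input, output) channel pair, plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


patch = 32 * 32                      # flattened 32 x 32 patch
print(dense_params(patch, patch))    # 1049600
print(conv2d_params(3, 1, 16))       # 160
```

The convolution's parameter count does not depend on the patch size, because the same small kernel is shared across all positions; this weight sharing is what preserves local structure at low cost.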
In recent years, CNNs have been successfully used to extract local features of multidimensional data and have shown excellent denoising performance. However, seismic profile signals are considerably more complex than ordinary images because of the complexity of the stratigraphic structure and noise interference. Therefore, our study aimed to improve the efficiency and robustness of convolutional AE learning. We proposed a modified AE neural network based on tied weights, which combines an AE with CNNs to improve seismic noise attenuation and offers greater practical value.

Proposed Network Architecture and Optimization
We proposed an end-to-end deep DAE network that combines the encoder and decoder functions. The proposed DAE requires structurally symmetric encoding and decoding layers, as shown in Figure 3; that is, corresponding layers have the same size and share parameters, so only one set of weights needs to be learned. Because decoding reverses encoding, the weight of the last decoding layer is the transpose of the first encoding weight. We used an up-sampling layer to resize the signal; this layer simply doubles the dimensions of its input and does not invert any operation. In the other decoding layers, the decoding and encoding layers share the same set of weights. This was described using formulas (7) and (8):
$$F_i = \sigma\left(W_i * F_{i-1} + b_i\right), \quad i = 1, \dots, n, \tag{7}$$
$$W_{n+1-i} = W_i^{\mathsf{T}}, \quad i = 1, \dots, \lfloor n/2 \rfloor, \tag{8}$$
where $F_i$ is the mapping (feature map) of the $i$-th layer, $F_0$ is the input, $W_i$ and $b_i$ are the trained weight and bias of the $i$-th convolution layer, $*$ denotes convolution, $\sigma$ is the nonlinear activation function, and $n$ is the total number of convolution layers. In this way, the encoder and decoder weights are tied.
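Weight tying is easiest to see in a dense simplification: the decoder reuses the transpose of the encoder matrix instead of learning its own. The sketch below is illustrative only; in the convolutional network the transpose corresponds to a transposed convolution, the sizes `d = 1024` and `h = 256` are hypothetical, and keeping separate encoder and decoder biases is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 1024, 256                          # hypothetical flattened-input and code sizes
W = rng.normal(0.0, 0.01, (h, d))         # the single learned weight matrix
b_enc, b_dec = np.zeros(h), np.zeros(d)   # biases are kept separate (assumption)


def leaky_relu(a: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    return np.where(a > 0, a, alpha * a)


def forward(x: np.ndarray) -> np.ndarray:
    """Tied-weights AE: the encoder uses W and the decoder reuses its transpose."""
    y = leaky_relu(x @ W.T + b_enc)  # encoder: W x + b
    return np.tanh(y @ W + b_dec)    # decoder: W^T y + b' (row-vector convention)


tied = W.size + b_enc.size + b_dec.size
untied = 2 * W.size + b_enc.size + b_dec.size  # a separate decoder matrix doubles W
print(tied, untied)  # 263424 525568
```

Roughly halving the number of weights is the source of both the faster training and the regularizing effect described for the tied-weights design.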
Compared with learning separate weights in the encoding and decoding stages, this method has the following benefits: (1) faster training, because the decoding layers do not learn their own weights but reuse the weights of the encoding layers, which reduces the number of model parameters so that only one set of weights is learned; and (2) better learning performance, because weight tying is often preferred over learning separate weights for the two phases and can be regarded as a form of regularization, thereby reducing the risk of overfitting. Figure 4 illustrates the overall network architecture and the key steps of the proposed procedure. In the convolutional encoding stage (L1-L3), we used three groups of convolution, batch normalization, and max-pooling layers. The intermediate layer (L4) consisted of a convolution layer and a batch normalization layer; it is the latent representation layer that forces the AE to find patterns in the input data and to discard unimportant features. In the decoding stage (L5-L7), we used three groups of up-sampling, batch normalization, and convolution layers, mirror-symmetric to the encoding stage. In the encoding and decoding steps, leaky ReLU was used as the activation function to better handle complex seismic signals with negative values. In the last layer, tanh was used as the activation function because its output range, −1 to 1, matches the range of the normalized data. Max-pooling layers, which accelerate computation and help prevent overfitting, were used to obtain translation-invariant representations and to reduce dimensions.
Batch normalization layers were used to stabilize the gradient-based training of the deep network and thereby improve training speed [55]. In the decoding stage, up-sampling layers were used to expand the dimensions of the hidden features to reconstruct the original sample. The up-sampling was based on nearest-neighbour interpolation, which increases the sampling rate by repeating each sample.
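The pooling and up-sampling steps, and the resulting patch sizes, can be traced with a short NumPy sketch. It assumes 2 × 2 pooling and up-sampling factors and "same"-padded convolutions (which leave the spatial size unchanged and are therefore omitted); these factors are inferred from the statement that up-sampling doubles the dimensions. Zero-insertion up-sampling is shown only for contrast with nearest-neighbour repetition.

```python
import numpy as np


def max_pool2(x: np.ndarray) -> np.ndarray:
    """2 x 2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample_nearest2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour up-sampling: each sample is repeated in a 2 x 2 block."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)


def upsample_zeros2(x: np.ndarray) -> np.ndarray:
    """Zero-insertion up-sampling, shown only for contrast."""
    out = np.zeros((2 * x.shape[0], 2 * x.shape[1]), dtype=x.dtype)
    out[::2, ::2] = x
    return out


a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(upsample_nearest2(a))
print(upsample_zeros2(a))

# Shape trace of one 32 x 32 patch through the encoder and decoder stages
x = np.random.default_rng(0).normal(size=(32, 32))
shapes = [x.shape]
for _ in range(3):  # encoder stages L1-L3: pooling halves each dimension
    x = max_pool2(x)
    shapes.append(x.shape)
for _ in range(3):  # decoder stages L5-L7: up-sampling doubles each dimension
    x = upsample_nearest2(x)
    shapes.append(x.shape)
print(shapes)  # [(32, 32), (16, 16), (8, 8), (4, 4), (8, 8), (16, 16), (32, 32)]
```

The trace shows why the encoder and decoder must be mirror-symmetric: three halvings are undone by exactly three doublings, so the output patch matches the 32 × 32 input.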
As mentioned above, combining Equations (1)-(11), the entire denoising process was expressed as follows:
$$Z = M_{DAE}\left(X; \delta\right), \quad \delta = \{W, b\},$$
where $X$ is the input noisy data, $Z$ is the denoised (restored) data, $M_{DAE}$ is the seven-layer DAE network model, and $\delta$ denotes the network parameters, namely the weights and biases. The decoding-layer weights were not learned separately but were shared with the encoding layers. The sizes of the signal patch and the convolution filter strongly affected the denoising performance of the model. Based on several experiments and previous studies [44][45][46][47][48][49], we selected 32 × 32 as the patch size and 3 × 3 as the filter size, as these performed best. We used Adam as the optimizer and MSE as the loss function, and the network was trained for 25 epochs, which produced satisfactory results. Table 1 shows the network architecture.
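For reference, the standard Adam update (Kingma and Ba) applied to an MSE loss can be sketched on a one-parameter toy problem. This is not the authors' training code; the toy data, learning rate, and step count are assumptions chosen only to show the first- and second-moment estimates and their bias corrections.

```python
import numpy as np


def adam_fit(x, t, steps=1000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit w in t ~ w * x by minimizing MSE with the standard Adam update."""
    w, m, v = 0.0, 0.0, 0.0
    for k in range(1, steps + 1):
        grad = np.mean(2.0 * (w * x - t) * x)      # d/dw of mean((w*x - t)^2)
        m = beta1 * m + (1 - beta1) * grad         # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment estimate
        m_hat = m / (1 - beta1 ** k)               # bias corrections for zero init
        v_hat = v / (1 - beta2 ** k)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


x = np.linspace(-1.0, 1.0, 50)
t = 0.7 * x                     # noiseless toy target with true weight 0.7
w = adam_fit(x, t)
print(round(float(w), 3))
```

In the network, the same update is applied element-wise to every weight and bias, with the gradient of the MSE loss obtained by backpropagation.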

Numerical Experiments
The denoising performance of the proposed method was assessed using synthetic and real data, and the results were compared with those of three other methods (wavelet transform, FXDECON, and MSSA). Denoising performance was evaluated using the peak signal-to-noise ratio (PSNR), which is the ratio between the maximum possible power of a signal and the power of the corrupting noise. PSNR is typically used to evaluate the quality of a compressed image relative to the original image; here, it is used to compare denoised signals with the clean reference. The higher the PSNR, the closer the result is to the reference. PSNR is expressed in decibels as follows:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right) = 20 \log_{10}\left(\frac{\mathrm{MAX}}{\sqrt{\mathrm{MSE}}}\right),$$
where MAX is the peak value of the signal. In image processing, MAX is the maximum possible pixel value; in seismic data processing, MAX is the maximum value of the seismic data. The MSE is the average squared error between the reconstructed and original signals; the lower the MSE, the lower the error. MSE is expressed as follows:
$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left[I(i, j) - K(i, j)\right]^2,$$
where $I(i, j)$ is the original data, $K(i, j)$ is the approximate (processed) data, and $m$ and $n$ are the data dimensions.
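Both metrics can be computed directly from their definitions. This NumPy sketch assumes that, when MAX is not given, it is taken as the maximum absolute value of the reference data, which is a reasonable choice for seismic amplitudes that take both signs; the function names are illustrative.

```python
import numpy as np


def mse(original, processed) -> float:
    """Mean squared error over an m x n section."""
    original = np.asarray(original, dtype=float)
    processed = np.asarray(processed, dtype=float)
    return float(np.mean((original - processed) ** 2))


def psnr(original, processed, max_value=None) -> float:
    """PSNR in dB; MAX defaults to the maximum absolute value of the reference (assumption)."""
    if max_value is None:
        max_value = float(np.max(np.abs(original)))
    err = mse(original, processed)
    if err == 0:
        return float("inf")  # identical data: no error to measure
    return 10.0 * np.log10(max_value ** 2 / err)


clean = np.ones((4, 4))
print(round(psnr(clean, clean + 0.1), 2))  # 20.0
```

An error of 0.1 everywhere on data with peak value 1 gives MSE = 0.01 and therefore 10 log10(1 / 0.01) = 20 dB.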

Synthetic Signal Used in the Experiment
The synthetic seismic signal used herein was part of a record obtained by forward modeling. It consisted of 120 traces, each containing 120 samples at a 5 ms sampling interval (a total time of 0.6 s), as shown in Figure 5a. The seismic signal was complex and contained both strong- and weak-amplitude events. The noisy signal shown in Figure 5b was obtained by normalizing the seismic signal and adding random noise with a level of 0.25. The PSNR of the noisy signal was 13.98 dB, and this signal was used as the test signal in the subsequent experiments. The noise contamination was considerable: the events were blurred, and details were difficult to distinguish.

Comparison between Tied Weights and Non-Tied Weights
First, we evaluated the influence of tied and non-tied weights on the denoising performance for synthetic data. The experiments were run on a notebook computer with a 2.0 GHz 8-core Intel i7 processor and 16 GB of memory. To compare the denoising efficiency and results of the two network models, we performed tied-weights and non-tied-weights denoising experiments on seven training sets of different sizes (from 10,000 to 40,000 samples), using the same network parameters and the same number of training epochs (25). Figure 6a shows the training time of the two models for the different training-set sizes. The training time of the tied-weights model was lower than that of the non-tied-weights model for every training-set size, and the average training time was reduced by 19%, indicating that the tied-weights AE improved training speed (Table 2). Figure 6b shows the denoising performance of the two models after optimizing their parameters with the different training-set sizes; in each case, we calculated the PSNR between the denoised result and the original clean signal. The PSNR of the tied-weights model was higher than that of the non-tied-weights model for every training-set size, with an average improvement of 1.08 dB, indicating that the proposed method performed better (Table 3). Both methods achieved their best results with 25,000 training samples, after which the denoising performance stabilized without further improvement. Figure 6c shows the denoising performance of the two models after training for different numbers of epochs, again measured by the PSNR between the denoised result and the original clean signal. First, the PSNR of the tied-weights model was higher than that of the non-tied-weights model for every number of epochs, with an average improvement of 0.80 dB, indicating that the proposed method performed well (Table 4).
Second, both methods achieved their best results after 25 epochs, after which the denoising performance stabilized without further improvement. In other words, 25 epochs yielded the best denoising performance.

. Comparison between Tied Weights and Non-Tied Weights
First, we evaluated the influence of tied and non-tied weights on the denoising performance for synthetic data. All experiments were run on a notebook computer with a 2.0 GHz eight-core Intel i7 processor and 16 GB of memory. To compare the denoising efficiency and denoising quality of the two network models, we trained both the tied-weights and the non-tied-weights models on seven training sets of different sizes (from 10,000 to 40,000 patches), using the same network parameters and the same number of training epochs (25). Figure 6a shows the training time of the two models for each training-set size. The tied-weights model trained faster than the non-tied-weights model for every training-set size, reducing the average training time by 19%, which demonstrates that tying the weights improves the training speed (Table 2). Figure 6b shows the denoising performance of the two models after training on training sets of different sizes, measured as the PSNR between the denoised result and the original clean signal. The tied-weights model achieved a higher PSNR than the non-tied-weights model for every training-set size, with an average improvement of 1.08 dB (Table 3). Both models performed best with 25,000 training patches; beyond that size, the denoising performance stabilized without further improvement. Figure 6c shows the denoising performance of the two models for different numbers of training epochs, again measured as the PSNR between the denoised result and the original clean signal. First, the tied-weights model achieved a higher PSNR than the non-tied-weights model for every number of epochs, with an average improvement of 0.80 dB (Table 4).
Second, both models achieved their best results after 25 epochs, after which the denoising performance stabilized without further improvement. In other words, 25 epochs yielded the best denoising performance.
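The comparisons in this section rely on the PSNR between the denoised result and the clean synthetic signal. A minimal NumPy sketch of that metric is given below; the text does not state which peak value is used, so taking the maximum absolute amplitude of the clean data as the peak is our assumption.

```python
from typing import Optional

import numpy as np


def psnr(clean: np.ndarray, denoised: np.ndarray, peak: Optional[float] = None) -> float:
    """Peak signal-to-noise ratio (dB) of `denoised` relative to the `clean` reference.

    Assumption: if `peak` is not given, the maximum absolute amplitude of the
    clean data is used as the peak value.
    """
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    if clean.shape != denoised.shape:
        raise ValueError("clean and denoised must have the same shape")
    mse = np.mean((clean - denoised) ** 2)
    if mse == 0.0:
        return float("inf")  # identical sections
    if peak is None:
        peak = float(np.max(np.abs(clean)))
    return float(10.0 * np.log10(peak ** 2 / mse))


# Usage: a constant offset of 0.1 on data with unit peak gives MSE = 0.01, i.e. 20 dB.
clean = np.ones((4, 4))
print(round(psnr(clean, clean + 0.1), 6))
```

Because PSNR is logarithmic, an average gain of 1.08 dB corresponds to roughly a 22% reduction in mean square error at a fixed peak value.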

Experimental Comparison with Other Denoising Algorithms
We compared the proposed method with three established denoising algorithms: FXDECON, MSSA, and the wavelet transform. As shown in Figure 7a-c, some random noise remains in the results of these three methods, and the edges of the events are blurred. In Figure 7d, random noise is almost absent, and the edges of the events of the effective signal are clear. The proposed method achieved a PSNR of 21.00 dB (Figure 7d), higher than that of the other methods. Table 5 lists the main parameters and denoising performance of each method. Thus, both the qualitative and quantitative results demonstrate that the proposed method denoised these data better than the other three methods.
Figure 8 shows the noise removed by FXDECON, MSSA, the wavelet transform, and the proposed method. Figure 8a contains less random noise and fewer residual effective signals, whereas Figure 8b,c contain more of the random noise but also some residual effective signals. In Figure 8d, more random noise was removed, and fewer residual coherent signals were left. This indicates that the proposed method can remove more random noise while preserving the effective signal.
To further study the denoising performance of the four methods, we extracted the 100th trace from the clean data, the noisy data, and the denoising results of FXDECON, MSSA, the wavelet transform, and the proposed method, and calculated their amplitude spectra, as shown in Figure 9a-d. The noise strongly distorts the spectrum of the effective signal, especially above 40 Hz, where the noisy spectrum deviates greatly from the clean spectrum. Figure 9b-d shows that the methods preserve the general shape of the original amplitude spectrum within the 0-40 Hz range. Above 40 Hz, part of the random noise is suppressed in (b)-(c), but considerable random noise remains. In Figure 9d, the sharp noise peaks in the spectrum are effectively removed, indicating that the proposed method accurately extracted the underlying useful signal from the noisy input.
To conduct a more detailed numerical comparison, we selected seven synthetic datasets and added random noise at levels of 0.1, 0.125, 0.15, 0.175, 0.20, 0.225, and 0.25 relative to the normalized data.
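The two analysis steps described above, computing a single-trace amplitude spectrum and contaminating normalized data at a chosen noise level, can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: we assume that "noise level" means the standard deviation of zero-mean white Gaussian noise added after scaling the data to a peak absolute amplitude of 1, and the single-event Ricker-wavelet section is a hypothetical stand-in for the synthetic data.

```python
import numpy as np


def add_noise(section, level, rng):
    """Scale `section` to a peak absolute amplitude of 1, then add Gaussian noise.

    Assumption: `level` is the standard deviation of zero-mean white Gaussian
    noise relative to the normalized data. Returns (normalized, noisy).
    """
    normalized = section / np.max(np.abs(section))
    noisy = normalized + level * rng.standard_normal(normalized.shape)
    return normalized, noisy


def amplitude_spectrum(trace, dt):
    """One-sided amplitude spectrum of a single trace sampled every `dt` seconds."""
    freqs = np.fft.rfftfreq(trace.size, d=dt)
    return freqs, np.abs(np.fft.rfft(trace))


# Hypothetical synthetic: a 30 Hz Ricker wavelet at 0.5 s, repeated on 200 traces.
dt = 0.002                      # 2 ms sampling interval
t = np.arange(500) * dt         # 1 s of data
f0 = 30.0
arg = (np.pi * f0 * (t - 0.5)) ** 2
ricker = (1.0 - 2.0 * arg) * np.exp(-arg)
section = np.tile(ricker[:, None], (1, 200))   # shape: (time samples, traces)
clean = section / np.max(np.abs(section))

rng = np.random.default_rng(0)
levels = [0.1, 0.125, 0.15, 0.175, 0.20, 0.225, 0.25]
noisy_by_level = {lvl: add_noise(section, lvl, rng)[1] for lvl in levels}

# Spectra of the 100th trace (index 99) of the clean and noisy (level 0.225) data.
freqs, amp_clean = amplitude_spectrum(clean[:, 99], dt)
_, amp_noisy = amplitude_spectrum(noisy_by_level[0.225][:, 99], dt)
```

White noise spreads its energy evenly across all frequencies, so it dominates the spectrum wherever the signal spectrum is weak; in this toy example that is the band well above the wavelet's 30 Hz peak.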
Figure 10 shows the PSNR values of FXDECON, MSSA, the wavelet transform, and the proposed method at the different random noise levels. The proposed method achieved better denoising results than the other methods at every noise level, demonstrating that its performance is robust across noise levels.

Application to Real Seismic Signals
To study the denoising performance of the proposed method in practical applications, we analyzed real seismic profile data from the South China Sea. The profile contained 1000 traces with a record length of 2 s, a sampling interval of 2 ms, and a trace spacing of 25 m. The frequency range was 5-70 Hz, and the wavelength was 300 m.
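As a consistency check on these acquisition parameters (our own arithmetic, not stated in the original text), with record length $T$, sampling interval $\Delta t$, and $L$ the distance between the first and last traces:

```latex
n_t = \frac{T}{\Delta t} = \frac{2\ \mathrm{s}}{2\ \mathrm{ms}} = 1000, \qquad
f_{\mathrm{N}} = \frac{1}{2\,\Delta t} = \frac{1}{2 \times 0.002\ \mathrm{s}} = 250\ \mathrm{Hz}, \qquad
L = (1000 - 1) \times 25\ \mathrm{m} = 24\,975\ \mathrm{m}.
```

The 1000 samples per trace match the 1000-sample test window used for the real data, and the 250 Hz Nyquist frequency lies well above the 70 Hz upper limit of the signal band.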
There were several differences between the real and synthetic datasets in our experiments; Table 6 gives a quantitative comparison. First, the real dataset contained more complicated structures that our forward modeling could not reproduce, such as additional faults, fractures, and buried hills, as observed in Figure 11. Second, the real signals spanned a wide frequency range, so more signal detail was submerged in the noise. Third, whereas the noisy synthetic data were obtained by adding white Gaussian noise to the clean synthetic data, the incoherent background noise in the real data was unstructured, difficult to model, and non-Gaussian. All these differences made the real data more challenging to denoise.
We first randomly collected 43,000 patches of 32 × 32 samples from the target region to establish the training set. To verify the denoising effect, we extracted test data consisting of 1000 seismic traces and 1000 time samples from the target work area, as shown in Figure 11. We then applied our method to the real seismic data. The network structure, including the seven convolutional layers, and the other network parameters were the same as described above. The optimized network model was obtained after 25 training epochs and was then used to denoise the test seismic data.
Figure 11. Real seismic data.
Figure 12a-d shows the denoising results obtained using FXDECON, MSSA, the wavelet transform, and the proposed method, respectively. FXDECON left considerable residual noise. MSSA and the wavelet transform left less residual noise, but the edges of the events were blurred, indicating low fidelity. In contrast, the proposed method left less residual noise and preserved clearer signal details. As highlighted in the local correlation map, Figure 12d exhibits more detailed thin-bed reflections and less random noise, whereas in Figure 12a,b the thin-bed details are hazy and difficult to recognise. Figure 12e-h shows the noise removed by FXDECON, MSSA, the wavelet transform, and the proposed method, respectively. The noise removed by FXDECON, MSSA, and the wavelet transform contains some coherent signal, indicating that the original signal was damaged during denoising. In contrast, no evident reflection event appears in the noise removed by the proposed method, indicating that it caused no serious damage to the signal during denoising. In summary, the proposed method outperformed the other methods on the real data.
Local similarity measures how similar two signals are within a local window around each sample. We therefore calculated the local similarity between the denoised result and the removed noise to further study the signal leakage of each method. Figure 13a-d shows the local similarity between the removed noise and the denoising results obtained with FXDECON, MSSA, the wavelet transform, and the proposed method, respectively. An anomalously high local similarity indicates that the removed noise and the denoised result are similar at the corresponding position, which is a sign of signal leakage (i.e., damage to the original signal). As illustrated in the figure, the MSSA and wavelet transform results contain many anomalous high-similarity regions, indicating considerable signal damage. In contrast, the proposed method exhibited fewer high-similarity outliers than the other methods, meaning that it caused less signal leakage and preserved more of the effective signal after noise removal.
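A simplified way to compute such a map is a windowed, normalized correlation between the denoised section and the removed noise, sketched below with NumPy. This is a substitute for, not an implementation of, the shaping-regularized local similarity often used for this analysis; the window size is an arbitrary assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_mean(x, win):
    """Mean of x over a win x win window centred on each sample (edge padding keeps the shape)."""
    pad = win // 2
    padded = np.pad(x, pad, mode="edge")
    return sliding_window_view(padded, (win, win)).mean(axis=(-2, -1))


def local_similarity(a, b, win=11, eps=1e-12):
    """Windowed normalized correlation between two sections; values lie in [-1, 1].

    Simplified substitute for shaping-regularized local similarity.
    """
    if win % 2 == 0:
        raise ValueError("win must be odd")
    num = box_mean(a * b, win)
    den = np.sqrt(box_mean(a * a, win) * box_mean(b * b, win))
    return num / (den + eps)


def leakage_map(noisy, denoised, win=11):
    """Similarity between the denoised section and the noise that was removed from it."""
    return local_similarity(denoised, noisy - denoised, win)
```

By the Cauchy-Schwarz inequality, each windowed value is bounded by 1 in magnitude; values near zero indicate that the removed noise is locally unrelated to the retained signal, while clusters of large values flag likely signal leakage.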

Discussion
In this study, we proposed a tied-weights AE neural network, which offered several advantages. First, our method has a wider scope of application: because the training set is built directly from patches of raw noisy data, a training dataset can be prepared from any real seismic data. Second, the proposed method benefited from the end-to-end deep convolutional DAE framework and showed a strong ability to extract useful features from real noisy seismic data. Third, our method accelerated the training process and improved the denoising performance, because tying the weights reduces the number of trainable model parameters and the risk of over-fitting.
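The parameter saving from tied weights can be illustrated with a deliberately small model: a single-hidden-layer dense autoencoder whose decoder reuses the encoder matrix in transposed form. This NumPy sketch is not the paper's seven-layer convolutional network (in a convolutional model, tying typically means the decoder's transposed convolutions reuse the encoder kernels); the layer sizes, ReLU activation, and 0.5-scaled MSE loss are our assumptions. Note that the shared matrix receives gradient from both its encoder and decoder roles.

```python
import numpy as np


class TiedAutoencoder:
    """One-hidden-layer dense autoencoder with tied weights (decoder reuses encoder W).

    Illustrative only: the paper's network is a seven-layer convolutional model.
    """

    def __init__(self, n_in, n_hidden, rng):
        self.W = rng.normal(0.0, 0.1, size=(n_hidden, n_in))  # the only weight matrix
        self.b_enc = np.zeros(n_hidden)
        self.b_dec = np.zeros(n_in)

    def loss_and_grads(self, x):
        """0.5 * batch-mean squared reconstruction error and its gradients; x is (batch, n_in)."""
        batch = x.shape[0]
        z = x @ self.W.T + self.b_enc          # encoder pre-activation
        h = np.maximum(0.0, z)                 # ReLU hidden code
        x_hat = h @ self.W + self.b_dec        # decoder reuses the same W
        err = x_hat - x
        loss = 0.5 * np.sum(err ** 2) / batch
        d_out = err / batch
        d_z = (d_out @ self.W.T) * (z > 0)     # back through decoder, then ReLU
        # The shared matrix collects gradient from its decoder use and its encoder use.
        grad_W = h.T @ d_out + d_z.T @ x
        return loss, {"W": grad_W, "b_enc": d_z.sum(axis=0), "b_dec": d_out.sum(axis=0)}

    def sgd_step(self, x, lr=0.01):
        """One plain gradient-descent update; returns the loss before the update."""
        loss, g = self.loss_and_grads(x)
        self.W -= lr * g["W"]
        self.b_enc -= lr * g["b_enc"]
        self.b_dec -= lr * g["b_dec"]
        return loss

    def n_params(self):
        return self.W.size + self.b_enc.size + self.b_dec.size

    def n_params_untied(self):
        """Parameter count of the same architecture with a separate decoder matrix."""
        return 2 * self.W.size + self.b_enc.size + self.b_dec.size


# 32 x 32 patches flattened to 1024 inputs; the 256-unit hidden layer is hypothetical.
ae = TiedAutoencoder(1024, 256, np.random.default_rng(0))
print(ae.n_params(), ae.n_params_untied())
```

For this hypothetical layout, tying roughly halves the parameter count (263,424 versus 525,568), which is the mechanism behind the reduced training cost and over-fitting risk described above.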
However, the proposed approach has some limitations. First, the training procedure was generally time-consuming and complicated. Our experimental comparison showed that the optimal training time and results were obtained with 32 × 32 patches and more than 30,000 training patches; because of the large amount of training data and the complex network structure, training the network model took a long time. Second, the network lacked scalability to high-dimensional features. The DAE network corrupted the inputs before mapping them into the hidden representation and then reconstructed the original input from its corrupted version, which led to the loss of some high-dimensional features.

Conclusions
Our study proposed a framework for a deep denoising convolutional DAE network to suppress seismic random noise. The scheme was based on AE self-supervised learning, which addressed the limitations of supervised learning (i.e., the requirement for data labels).
Unlike conventional approaches, we directly used patches of raw noisy data to construct the training and test sets instead of using synthetic clean data or denoising results as training labels. We designed a robust deep CNN that depends only on the input noisy dataset to learn hidden features, and we used tied weights to reduce the risk of over-fitting and accelerate the training process when optimizing the network parameters. Finally, we denoised the target signals using the trained CNN. The final denoising result was obtained after patch recombination and the inverse operation. In quantitative experiments, our tied-weights approach reduced the average training time by 19% and improved the average PSNR by 1.08 dB compared with the non-tied-weights approach. The proposed method also achieved a higher denoising PSNR than the other denoising algorithms tested. In qualitative experiments, the proposed method left less residual noise, preserved clearer signal details, and caused less damage to the signal during denoising. Therefore, both the qualitative and quantitative results demonstrated that the proposed procedure provides a promising means for accurate geological exploration.

Data Availability Statement:
The data discussed in this paper will be shared on reasonable request to the corresponding author.