A Machine Learning-Based Seismic Data Compression and Interpretation Using a Novel Shifted-Matrix Decomposition Algorithm

: Seismic data provides integral information in geophysical exploration, for locating hydrocarbon rich areas as well as for fracture monitoring during well stimulation. Because of its high frequency acquisition rate and dense spatial sampling, distributed acoustic sensing (DAS) has seen increasing application in microseimic monitoring. Given large volumes of data to be analyzed in real-time and impractical memory and storage requirements, fast compression and accurate interpretation methods are necessary for real-time monitoring campaigns using DAS. In response to the developments in data acquisition, we have created shifted-matrix decomposition (SMD) to compress seismic data by storing it into pairs of singular vectors coupled with shift vectors. This is achieved by shifting the columns of a matrix of seismic data before applying singular value decomposition (SVD) to it to extract a pair of singular vectors. The purpose of SMD is data denoising as well as compression, as reconstructing seismic data from its compressed form creates a denoised version of the original data. By analyzing the data in its compressed form, we can also run signal detection and velocity estimation analysis. Therefore, the developed algorithm can simultaneously compress and denoise seismic data while also analyzing compressed data to estimate signal presence and wave velocities. To show its efﬁciency, we compare SMD to local SVD and structure-oriented SVD, which are similar SVD-based methods used only for denoising seismic data. While the development of SMD is motivated by the increasing use of DAS, SMD can be applied to any seismic data obtained from a large number of receivers. For example, here we present initial applications of SMD to readily available marine seismic data.


Introduction
The last decade has seen a great increase in hydrocarbon production from unconventional reservoirs. However, there are still many challenges in predicting their production potential. The quantity of hydrocarbons produced depends on the distribution and quantity of fractures present in the reservoir. Fracture identification and monitoring can be done by analyzing microseismic data [1]. With the goal of improving the ability to track the fracture distribution, the amount of seismic data acquired during reservoir monitoring has been increasing.
Recent developments in reservoir monitoring use fiber optic cables to record both low frequency strain and seismic waves in a technique called distributed acoustic sensing (DAS) [2]. DAS provides a new and unique view of the reservoir by recording strain rate data with a high sample rate and dense spatial sampling. Higher recording rates and more receiver positions increase the amount of microseismic activity recorded while monitoring the reservoir. Recording microseismic events with DAS significantly increases the memory requirement of the monitoring data. Thus, data processing methods that can compress the data and reduce its noise would complement the new instrumental developments in reservoir monitoring.
Signal processing algorithms are often based on the transformation of signal into a new domain. Early examples of compression involve discrete cosine transform [3] and wavelet transforms [4][5][6] which use cosine functions or wavelets to represent the data in order to reduce the memory requirements. Reduced-rank methods, which approximate the noiseless seismic data using low rank matrices and tensors, have been used in noise reduction [7] while also reconstructing missing data [8,9]. In recent years, dictionary learning has seen wide application in seismic data. Because of its ability to provide a compact and informative representation of seismic signals, it has been used for both noise reduction [10][11][12] and data compression [13,14]. Unfortunately, the drawback of the dictionary learning applications is the computation time that comes with learning and updating the dictionary. There have been successful efforts to reduce the associated computational times [15], but the computational cost of applying dictionary learning to microseismic DAS recordings is still to great. Thus far, dictionary learning methods have been applied to data collected by geophones, which can be very large but are still much smaller than data obtained by a single fiber-optic cable which can sample strain thousands of times a second on hundreds, even thousands of receiver locations. Since DAS is used for microseismic monitoring, the great amounts of data would need to be processed in real time, which puts a constraint on the computation time of processing methods.
To achieve computationally efficient compression and denoising, we created a new data decomposition method by improving and further developing ideas developed as a part of the local SVD [16]. Similarly to the previously mentioned reduced-rank methods, local SVD applies SVD to a window in seismic data and represents the data using a small number of singular vectors. What makes local SVD unique is the process of shifting the columns of the window to maximize their correlation prior to applying SVD. This allows singular vectors to capture the signal in seismic data with high accuracy, while ignoring most of the noise. Once the data in the window is processed with SVD, the window is moved to the next location. This process is repeated until column shifting and SVD have been applied to every part of the data matrix. The path of the moving window as well as the number of singular vectors used at each location are predetermined. Local SVD can enhance a seismic data set even if it contains multiple wave arrivals. However, local SVD struggles to capture seismic signals if waves with different dips are interfering or present in the same window. The "dip" of the wave refers to the slope of the wave in the matrix, or how much the row position of a wave changes as we move from one column to its adjacent column.
Some of the problems encountered with local SVD were resolved with the development of structure-oriented singular value decomposition (SOSVD) [17]. By using plane wave destruction [18], SOSVD can identify several dominant slopes at each window location. While using plane wave destruction provides a noticeable improvement, numerical artifacts can still appear at the intersection of the waves with different slopes.
Our method is inspired by the SOSVD and local SVD, and it also shifts the columns of the matrix before applying SVD. However, in order to avoid numerical artifacts, we use different processes to determine how columns should be shifted. Specifically, we do not use a moving window with a predetermined path. Instead, we use a geometric mean filter that adaptively chooses which elements to use in the geometric mean. We apply the geometric mean filter to seismic data in order to highlight areas which contain wave arrivals. The windows from the matrix to which we apply SMD depend on the results of the geometric mean filter. This allows us to use more singular vectors to describe areas with multiple wave arrivals, and fewer singular vectors to describe areas dominated by noise.
Additionally, local SVD and SOSVD use SVD results solely to denoise seismic data. They use them to reconstruct denoised version of seismic data as soon as they obtain them and do not discuss the compression achieved by storing SVD results. In our work, instead of reconstructing the data immediately, we store the SVD results. The final product of our algorithm is the collection of SVD results, that can later be reconstructed into the denoised version of original data. By doing this, our algorithm can be used for data compression as well as noise suppression. We call our new method the shifted-matrix decomposition, or SMD for short.
It should also be noted that DAS has seen increasing use outside geophysical exploration. Distributed acoustic sensing has been used to record signals from earthquakes and volcanic events [19]. Because of its unprecedented spatial and temporal resolutions, DAS is expected to see increasing use in earthquake monitoring, imaging of faults and many other geologic formations, and hazard assessment [20]. The growing potential of DAS application outside of geophysical exploration, adds importance to our method, which we believe will play an integral role in the processing of DAS data.
We organize the paper as follows: First we give an overview of singular value decomposition and present two simple examples that show advantages and drawbacks of its application to seismic data. Next, we demonstrate the improvements achieved by shifting the traces before extracting singular vectors. In the following subsection, we describe in detail each step of the SMD algorithm. The SMD algorithm depends on several parameters. The optimal values of said parameters are determined in a machine learning stage following the algorithm description subsection. In the training stage we use seismic field data obtained from marine seismic gathers, which has a large amount of interference between coherent waves, and noise which can be difficult to differentiate from signal. After training on marine seismic gathers, SMD provides accurate results on other seismic data as well as marine seismic gathers, which allows us to skip the training stage in future applications. To confirm the accuracy of SMD, we reproduce synthetic data from [17], and compare results of SMD to results of local SVD and SOSVD. The SMD is then tested on real seismic data. While SMD is primarily developed for application to microseismic data recorded by DAS, we currently do not have access to such data. Instead, we apply SMD to field data obtained from marine seismic gathers [21]. The results of applying SMD to field data are used to reconstruct a denoised version of the data as well as to estimate the elastic wave velocity. Finally, we discuss possible future applications of SMD and how its results could be used in signal detection during seismic monitoring.

Singular Value Decomposition
Consider seismic data stored in a matrix M ∈ R n t ,n r , where n r is the number of the receivers recording the data and n t is the number of time samples. An element of the matrix M i,j describes the ground motion at the j th receiver, and at the i th time step. Singular value decomposition method is a common matrix decomposition method that can be used to express the matrix M of rank r as the product of matrices: where U r = [u 1 , u 2 , . . . , u r ] contains the r left singular vectors as columns, the diagonal matrix D r = diag(λ 1 , λ 2 , .., λ r ) contains the r singular values, and V r = [v 1 , v 2 , .., v r ] contains the r right singular vectors as columns. In traditional SVD, the left singular vectors and right singular vectors are normalized. However, multiplying the right singular vector by the singular value, allows us to store the singular value in the right singular vector, which slightly reduces the memory requirements of SMD results. For that reason, from this point on, the "right singular vector" refers to the normalized right singular vector multiplied by the singular value. Equivalent to Equation (1), SVD can also be used to express the the matrix M as a sum of outer products of left singular vectors and right singular vectors weighted by singular values [7]. However, due to the way we define the right singular vector in this work, we can express is as just a sum of outer products of left singular vectors and right singular vectors: While Equation (2) may be less common, regarding SVD as sum of outer products is helpful for understanding the processes of SMD. In this work, we refer to any column from U as a column vector, and any column from V as a row vector. This is because in the outer product u k v k T all columns are multiples of the column vector u k , and all rows are multiples of the row vector v k T .
Singular value decomposition can also be used as a data compression method. We assume singular values in D r decrease from first to last row. When applying SVD to seismic data, the initial singular values are much greater than the average singular value in D r , and describe most of the signal. By neglecting the small singular values, the singular value decomposition can be used to approximate matrix M as where (r s << r). By using only the first few singular vectors to describe the the matrix M we can significantly decrease the memory requirement of the seismic data. In some cases, a few pairs of singular vectors can very accurately describe the data stored in M. For example, if a wave packet arrives at all receivers at the same time, all columns in M will contain the same pattern, and in that case, we can represent M using a single outer product (see Figure 1). However, if the wave packet arrives at the receivers at different times, as in Figure 2, we can no longer represent the matrix M using a single outer product. Figure 2 shows an example of the errors that will occur if reconstruction of the data from a single outer product is attempted. Though in practice this would never be done, the example does show that greater data compression can be achieved in this framework if signals in data are aligned prior to a decomposition.
If there is significant variation in signal time position for different receivers, a large number of singular vectors is needed to properly represent the signal. This negatively affects the compression as well as denoising, since using more singular vectors would also include representing more of the noise. Any methods utilizing SVD, such as PCA [22], will be subject to the same challenges. We note that PCA and SVD have been used interchangeably in terms of data compression. A mathematical relation between them is demonstrated in [23]. The issues encountered with regular SVD (or PCA) were first addressed with the development of a local SVD by Bekara and Van der Baan [16], and later improvements led to the development of SOSVD by Gan et al. [17]. In both of these methods SVD is applied to a window M w from matrix M with columns shifted in such a way to maximize the correlation between columns. The shape of the window doesn't change as the columns are shifted. This can be achieved in several ways, one of which is simply applying a wrap around condition so that values off the end of a matrix column, for example, are inserted at its beginning (and vice versa). An example of window shifting is described in Figure 3.
where χ is the "shift" operator, M w is the window subset from M and s is the shift vector. The "shift" operator takes a matrix and a shift vector as its arguments and shifts the columns of the matrix based on the values prescribed by the shift vector. For example, if the value of s for column j is n, then we would replace an element M i,j with the element M i+n,j . The matrix decomposition used in local SVD and SOSVD can also be described with the Equation (5): where s k is the shift vector corresponding to the k th pair of left and right singular vectors u k and v k . By using Equation (5) and plugging in M for M w , the matrix M from the previous example can be presented with a single outer product coupled with a shift vector as shown in Figure 4. A simple quantitative example that shows the effectiveness of column-shifting with a wave similar to the one from Figure 4 can be found in the Appendix A. Unfortunately, both local SVD and SOSVD seem to fail to accurately denoise seismic data which contains interfering waves with different dips. In such scenarios, numerical artifacts appear around the area of intersection. We solve this problem by developing shifted-matrix decomposition (SMD) which uses a very different process for determining the shift vector. The following subsection will give a detailed description of the processes SMD uses to obtain the shift vectors and the pairs of singular vectors.

SMD Algorithm
The SMD algorithm can be described by the following steps: 1.
Using a geometric mean filter, choose a specific point (row number and column number) in the data matrix M at which a displacement (or pressure) from a coherent wave was most likely recorded.

2.
Using cross-correlation, find this wave in as many surrounding columns as possible. For each column, the relative row positions of the said wave are recorded in the shift vector. Repeat steps 1-3 until a certain performance criterion is satisfied.
A visual representation of the algorithm can also be found in Figure 5.
To provide a more in-depth understanding of the algorithm, in the following paragraphs we will give a detailed description of each of the four steps.

Step 1
To identify a point in matrix M at which a displacement (or pressure) from a coherent wave is likely recorded, we start by applying a geometric mean filter to the data matrix M, to obtain the matrix E (Equation (6)): For a position (i, j), the parameter n E indicates that n E columns left of the position (i, j) and n E columns right of the position (i, j) will be used when calculating E i,j . If a position (i, j) is close to the first or the final column, fewer elements are used to calculate the geometric mean E i,j as to avoid stepping outside the matrix boundaries. The row positions of elements used in the geometric mean are also constrained in order to always remain within the matrix boundaries.
What makes this filter unique is the choice of elements which are used when calculating the geometric mean. From each of the 2n E surrounding columns, only one element is used in the geometric mean. The variable δ in Equation (6) indicates which element is used in the geometric mean from each column. For example, when calculating the value of the the geometric mean in position (i, j), from column j + q we take the element at row position i + δ i,j (q) to be used in the geometric mean. The process for determining the said set of elements is described with Figure 6.
Consider calculating the geometric mean in order to determine the value of E i,j . Before calculating the geometric mean, we must determine the set of elements used in the mean (determine the values of δ i,j ). We start from the j th column, from which we always use the element M i,j (δ i,j (0) = 0). Then, we proceed to find which elements to use from adjacent columns ( Figure 6a). Assuming M i,j is positive, for column j + 1, we pick the element between positions (i − m, j + 1) and (i + m, j + 1) which has the highest value. If the element M i,j were negative, we would pick the lowest value. The parameter m represents the maximum dip of the wave rounded up to the nearest integer. The same process is used to determine which element to use from column j − 1. For any column j + n s where 1 < n s ≤ n E , for the geometric mean, we pick the element between positions (i s − 1, j + n s ) and (i s + 1, j + n s ) with the highest value (Figure 6b,c). The row number i s is determined by following a linear trend based on the row position i in column j and the picked row position i + δ i,j (n s − 1) in the column j + n s − 1. The same process is used to determine which element to use from columns below j − 1.
The geometric mean filter is designed to imitate the process humans use for recognizing coherent waves in the data. When differentiating signal from noise, the distribution of displacement (or pressure) along a long range of receivers plays an important role. A weak wave might have an amplitude that is even smaller than that of noise in few noisy areas. However, if the displacement (or pressure) from the wave is present in a long range of receivers, and its arrival time follows a hyperbolic trend, a data reviewer would certainly notice the wave. The goal of the geometric mean filter is to assign higher values to such a weak wave than to a noisy area with high amplitudes. By testing the geometric mean filter on a large number of examples, we found few cases in which the weak wave was not assigned higher values than some areas of the noise. However, even in such cases the slight increase in filter output due to the weak wave, followed a hyperbolic trend in time on a large range of receivers, while the higher increases due to randomness of noise did not. By applying the geometric mean filter again, to the results of the first application of the filter, we can bring out all noticeable waves, no matter how weak, to have higher values than any area containing only noise. As we have achieved the desired results with the second application of the filter, a third application is unnecessary and would only increase computation time. Therefore, once all the elements of the matrix E have been calculated, the filter is used again, but this time on the matrix E to obtain the matrix F. When geometric mean filter is used to obtain F, the parameter which controls the number of column used in each geometric mean, n F , is different than n E which is used with the filter to obtain E. The optimal values of the parameters n E and n F are determined in a machine learning stage of the algorithm.
An example of the original data matrix M, and of the filter's result F is given in the Figure 7.
Matrix F is designed to produce large values at points at which a displacement (or pressure) from a coherent wave was recorded. We select the element in F with the highest value as the most likely element in the matrix M that describes a coherent wave. The row and column of the selected point designates the position of the coherent wave and are termed respectively the waveform row and waveform column, or w r and w c in equations and Figure 7. . The process of determining the set of elements (δ i,j ) for calculating the geometric mean at the position (i, j). The curly braces indicate the set of elements from which the next element will be added to the geometric mean. (a) The first element, M i,j is automatically added to the set δ i,j and the algorithm searches for the following elements in columns adjacent to column j. (b) The two elements from the adjacent columns are added and the search is now on columns j − 2 and j + 2. (c) Two more elements are added to δ i,j , from columns j − 2 and j + 2, and the search continues until δ i,j is complete.

Step 2
In the first step we determine the position (w r and w c ) of a specific point, or element, that describes a part of a coherent wave. Thus, we know the position of said wave in only one of the columns, namely the waveform column (w c ). In the second step, we find the row position of its waveform in the remaining columns of M, and in doing so we build the shift vector. In this context, the shift vector records the row position of the waveform in each of the columns, relative to the position of the waveform row (w r ). Therefore, the waveform position in any column j is equal to the sum of the waveform row w r and the jth element of the shift vector s. The location of the waveform in the remaining columns is found using a cross-correlation procedure.
We select a sequence (ψ) from the waveform column that contains all the elements between rows w r − w and w r + w (Figure 8a,b). The parameter w represents the window size when determining the sequence (ψ), and its value is determined in the machine learning stage. The sequence of elements ψ is supposed to represent the waveform, or the greater part of it, and we refer to it as the waveform sequence. The position of the waveform in other columns is then determined by finding in each column a sequence of elements that has the highest correlation with the waveform sequence ( Figure 8c).
We start from the columns adjacent to the waveform column w c . The waveform position in columns adjacent to w c must be between w r − m and w r + m, where m is the maximum dip rounded up to the nearest integer. To reduce computation time, and to make sure to follow the same wave throughout all the columns, in adjacent columns we search for the ψ sequence from row w r − m − w to row w r + m + w. Initially, the range of rows in which we search for the ψ sequence is entirely dependent on the waveform position in the closest column. For any column position j > w c , we search for the ψ sequence from row Once we have determined the waveform position in columns ranging from position w c − 2l to position w c + 2l, we narrow the search window for the following columns. The value of the parameter l is determined in the machine learning stage. At this point, for a column j > w c + 2l we search for the waveform sequence from row i e − 1 − w to i e + 1 + w.
The row position i e in column j is determined by following the parabolic trend of waveform positions in columns j − 1, j − 1 − l, and j − 1 − 2l. Fitting parabolas to shift vectors will also be used in the Discussion section, for estimating the velocities of the recorded waves. While arrival times are usually modeled with a hyperbola, we use a parabola because it requires fewer points on the shift vector to find its coefficients. Since most parts of the hyperbola can locally be fairly accurately described with a parabola, our parabolic estimates are sufficiently accurate. Narrowing the search window allows us to estimate the waveform position in each column more quickly and to make estimating waveform position more resilient to perturbations from noise and other waves. The optimal values for parameters w and l are determined in the machine learning stage. Moreover, if the maximum correlation with the waveform sequence in a certain column is below 0 the step terminates, as at that point we can be certain that the waveform is no longer present.

Step 3
Once the shift vector and all the waveform positions are determined, the shift vector and shift operator are used to shift the columns of the data matrix M so that the wave appears in the same set of rows in each column. At that point, the waveform is similar to the one shown in Figure 1, and it can be very accurately described as an outer product between two singular vectors. The number of rows storing the waveform pattern is much smaller than the total number of rows in M. Therefore, most of the elements in the column vector are not describing the waveform, hence are not relevant, and can be ignored. The length of the column vector need not be as long as columns in M, so it is reduced such that its first element specifies the number of zeros before the waveform and the last element specifies the number of zeros after the waveform. For example, the waveform row (w r ) would then be equal to the sum of the first element of the column vector (u 1 ) and half of the length of the recorded waveform (l w ). The optimal length of the waveform that is recorded in the column vector (l w ) (i.e., the number of elements in the column vector excluding the first and the last element), is determined in the machine learning stage. The row vectors and shift vectors are reduced in a similar manner to the column vector since the waveform is often contained in only a subset of columns of M, i.e., the wave packet may not be recorded at a subset of receiver locations. The effect of these changes is to decrease the number of elements required to represent M, thus decreasing the memory requirement of the SMD algorithm.
Once the two singular vectors (the column vector u and the row vector v) are extracted, their outer product is subtracted from the matrix M. The columns of M are also shifted back to their original locations. The new matrix M new can be described with the equation: As a result, the majority of the located wave disappears which allows the algorithm to focus on a potential second wave present in M. At the end of this step of the algorithm, a wave has been subtracted from matrix M and added to the compressed data as a pair of singular vectors and a corresponding shift vector.

2.2.4.
Step 4 The three steps described above are repeated until a certain performance criterion is met. Herein we define that criterion in terms of memory. Once the memory usage of the compressed data reaches 20% of the original data (80% compression), the SMD algorithm terminates. Specifically, once the number of elements used to describe all the shift vectors and singular vectors is equal to 20% of the number of elements in the matrix describing the raw seismic data, the performance criterion is met and the SMD algorithm terminates.

Training Data and Scoring the Model
In the algorithm description we have introduced five parameters that influence the results of SMD (shifted-matrix decomposition). The five parameters are n E and n F from Step 1, w and l from Step 2, and l w from Step 3. However, we do not have an intuitive understanding of how the results of SMD are affected by the changes to the parameters, and we have no analytical solutions for optimal values for the parameters. For that reason, we use supervised machine learning to find a set of parameters that provide consistently good SMD results. Once the parameters are optimized in the training stage, SMD can be applied to new seismic data without the need to run the training stage again. Moreover, the parameter m, which also affects the results of SMD, is predetermined and therefore cannot be optimized.
The first step is to define what is meant by a good SMD output. We desire the matrix reconstructed from SMD to closely resemble the original seismogram matrix M. Specifically, with a predefined compression rate, we want to capture a big portion of the original data, and we want as little noise as possible to be recorded in the SMD results. We refer to the matrix reconstructed from decomposed data as M r . When we measured the resemblance between M and M r as an L 1 or L 2 norm of the error M − M r , we obtained the smallest error norms when the SMD captured a lot of the noise with a few pairs of singular vectors and shift vectors. For this reason, rather than minimizing a norm of the error, we decided to maximize the dot product of matrices M and M r : The dot product was more sensitive to SMD results representing a weak coherent wave or a small remaining part of a large wave than it was to SMD results representing primarily noise. We believe this is because the dot product of the two matrices is proportional to their correlation. Recording a large area of noise with a pair of singular vectors and a shift vector could decrease the error M − M r by a fair amount. However, because of the randomness of noise, the reconstructed data will still have weak correlation with the original data in the noisy area. Recording any area with coherent waves with a pair of singular vectors and a shift vector usually provides high correlation between the original and the reconstructed data in said area.
Thus, we search for a set of parameters which maximize the dot product M · M r . For training data, we use recordings of reflected wave arrivals from 5 different shotgathers from marine seismic gathers. We use seismic field data obtained from marine seismic gathers because the data contain a large number of coherent waves with lots of interference, and noise with high coherence which can be difficult to differentiate from signal. To reduce the time spent in the training stage, from each shot-gather we only use 2 s recordings containing reflected waves and the preceding noise recorded by the first 200 receivers. More details on the field data will be provided in the following section. The five shot-gathers are labeled as M 1 − M 5 and data reconstructed from decomposition of the five seismograms as M r,1 − M r,5 . Thus, the overall quality of SMD with a given set of parameters would be quantified as the SMD score (ζ): The SMD score is only used in training stage to find an optimal set of parameters, and it will not be present in the following sections.

Derivative Free Optimization
There is no analytical formula directly relating the SMD score (Equation (9)) to the five input parameters. Therefore, we cannot use a gradient analysis to find optimal values for these parameters. Instead we use a derivative-free numerical optimization method. Specifically, we will use a pattern search. The five parameters to be optimized can only be natural numbers. Thus, the five parameters can be described by an element in N 5 . In this optimization, we take a subset Ω of N 5 , to be the set of all the reasonable values of the five parameters. Specifically, we seek an element in Ω for which SMD score has the highest value. This is done by performing a pattern search which takes the current position in Ω and checks adjacent points to see if any of them yield a higher SMD score. An adjacent position is defined as a position that can be reached by changing only one of the five parameters by the minimum amount. If the adjacent element with the highest SMD score has a higher score than that of the current position, the optimizing position is moved to the adjacent element. This is done until a local maximum for the SMD score is found. To increase the likelihood of finding the global maximum, the pattern search is repeated multiple times with a random starting location.

Application
Once the training stage is finished, SMD can be applied to a new seismic data set even if it is not from a marine gather. Without going through the training stage again, SMD was successfully applied to new marine gathers, synthetic data from [17], and to field data collected by linearly distributed geophones. However, the discussions in the following sections are focused on data from marine seismic gathers, rather than data gathered by geophones. Because marine seismic gathers were obtained from a much larger number of receivers and contain greater volumes of data, they provide a more realistic example of applications of SMD.
Even when skipping the training stage, the parameters still need to be adjusted before SMD is applied to new seismic data, based on the dominant frequency and the maximum slope of the waves in the new dataset. Specifically, parameters w from Step 2 and l w from Step 3 need to scale proportionally to the period of the dominant frequency multiplied by the sampling rate. Moreover, parameters n E and n F from Step 1 and l from Step 2 should scale proportionally to the period of the dominant frequency multiplied by the sampling rate and divided by the maximum slope of arriving waves. Therefore, when applying SMD to a seismic data set, one should also include information about the dominant frequency in the data set, the sampling rate of the receivers, and the maximum slope of the arriving waves. Knowing these values allows the user to apply SMD to new seismic data sets without going through the training stage. The flowchart in Figure 9 provides a visual description of how SMD is applied to a set of files containing seismic data. Figure 9. Flowchart describing the process of applying SMD to seismic data. The fourth step from above is described by the flowchart presented in Figure 5.

Results
Once the SMD parameters have been optimized in the machine learning stage, its performance is tested on three datasets. The first one is a synthetic dataset similar to one from [17]. Structure-oriented SVD [17] was tested on several datasets and while it was successful at reducing noise in all examples, it produced numerical artifacts in one of them. In this work we reproduce that challenging dataset in order to demonstrate the improvements achieved with SMD. The other synthetic data test cases from [17] contain fewer coherent waves and don't show interference between waves of different slopes and are not as difficult for reliable compression and analysis. For that reason, the second and third data sets on which SMD is tested are from field data obtained during marine seismic gathers. The second dataset has many, interfering, strong arrivals which can together create a complicated signal. The third dataset has fewer arrivals, but the coherent waves are much weaker and difficult to differentiate from noise. Figure 10 shows application of local SVD and SOSVD on the synthetic dataset from [17], while Figure 11 shows the application of SMD to a very similar data set. To create the synthetic data in Figure 11 we recreated the signal observed in Figure 10 and added random noise to it. In Figure 10 we see that both local SVD and SOSVD are able to remove most of the noise. However, both local SVD and SOSVD struggle to properly reconstruct the signal in the area highlighted by the blue rectangle in Figure 10c,b. The highlighted area contains two waves of different slopes intersecting with each other. Figure 11 shows results of applying SMD to the synthetic data set twice, with 95% compression (Figure 11c) and 80% compression (Figure 11d). In both SMD applications the coherent waves were reconstructed perfectly. We can see that there are no numerical artifacts in the area highlighted by the blue rectangle in Figure 11c,b, which contains intersecting waves with different slopes.

Synthetic Data
While we can see significant noise reduction, we could not find specific information in [17] describing the amount of noise present in synthetic data in Figure 10, However, we can estimate the signal-to-noise ratio in the synthetic data on which SMD is tested. By measuring the root mean square value from all receivers in time increment range from 300 to 320, where only noise is present, we can estimate the amplitude of the noise. To estimate the amplitude of the signal, we measure the root mean square value from all receivers in time increment range from 340 to 360, from the data presented in Figure 11a, which contains only pure signal. The signal-to-noise (ρ) ratio is therefore calculated with the Equation (10): where M p represents the data containing only pure signal presented in Figure 11a, and M n represents the data from Figure 11b,c, or d, which contains some amount of noise. The signal-to-noise ratios in the original data, data reconstructed from SMD results after 80% compression, and after 95% compression were 1.9, 4.7, and 12.3, respectively. Notice that the signal-to-noise ratio is much larger when SMD is applied with 95% compression than when it is applied with 80% compression. This is because there is only a few coherent waves present in the data. These coherent waves are represented with first several singular vectors and shift vectors, while the following singular vectors and shift vectors are describing noise. However, SMD results represent noise a lot less efficiently than coherent waves, and the majority of the noise is still not recorded when SMD is applied with 80% compression. In conclusion, the SMD application with 80% reduction in memory requirements picked up some of the noise, while SMD application with 95% data compression ignored the noise almost completely.

Field Data
Finally, the SMD algorithm is tested on field data, collected between January and March 2016 during the CREST expedition, MGL1601, aboard the R/V Marcus G. Langseth. Pressure waves are generated by a tuned array of 36 air guns, towed at a depth of 6 m. The resulting acoustic waves are recorded using a 12,587.5 m hydrophone streamer, towed at the depth of 8 to 12 m, and carrying 1008 receivers. The receivers are spaced by 12.5 m, each of them recording pressure once every 4 ms. The maximum offset between two adjacent columns is a little greater than two rows, so we round it up to three rows. Further information regarding data acquisition can be found in [21]. Seismic data are also available at the NSF-sponsored Academic Seismic Portal hosted by the University of Texas Institute for Geophysics and can be accessed at https://www.marine-geo.org/tools/search/Files. php?data_set_uid=23597 (retrieved on 31 March 2021).
Unfortunately, when being applied to real seismic data such as in Figures 12 and 13 SMD with 95% compression cannot reconstruct the arriving waves properly. Due to multiple reflections and dispersion, the number of coherent waves is much larger in marine seismic gathers than in synthetic data. Furthermore, the noise has high coherence such that it can closely resemble wave arrivals. Using SMD with 80% compression ensures that the arriving waves will be properly reconstructed while also ensuring the noise is drastically reduced.  The first data set ( Figure 12) contains strong reflections arriving far after the direct wave which can be seen at early times on the few receivers close to the source. In this test, SMD was able to identify prominent waveform-related features while almost completely ignoring noise-dominated sections of data. The Figure 12 highlights in red an enlarged portion of the data (3 times along x-axis and 6 times along y-axis) in which we can see the first arriving reflections. In the original data (Figure 12a), at the lower half of the enlarged window we can also see the preceding noise. However, once the data is processed with SMD ( Figure 12b) the preceding noise is no longer present. A principal drawback of SMD is that in some scenarios it will also ignore weaker incoming waves, and fail to differentiate them from noise.
The second data set ( Figure 13) is from a different shot-gather in which we can observe weak reflections arriving closely after the much stronger direct wave. In Figure 13 the reflections are circled with purple lines, direct wave with green lines, and the preceding noise with black lines. Red lines are used to highlight and enlarge (2 times along the xand y-axes) an example of noise interfering with the reflected waves. The reflections are hard to differentiate from the more noticeable direct waves and because of their small amplitude, they are noticeably distorted by noise. In this test, SMD identified and properly reconstructed both direct waves and reflected waves, while reducing noise in all parts of the data. The preceding noise (circled in black) is noticeably weaker in the data reconstructed from SMD results (Figure 13b) than in the original data (Figure 13a). Furthermore, the enlarged window (circled in red) in the original data (Figure 13a) shows reflected waves distorted by noise. However, in the data reconstructed from SMD results (Figure 13b), the enlarged window (circled in red) shows a more clear, denoised version of the reflected waves. Because SMD reduces the noise everywhere in the data, the weak reflected waves can be seen more clearly in the data reconstructed from SMD results (Figure 13b).
The examples in Figures 12 and 13 show subsets of the entire data files which are too big to be presented in a single figure. However, the large data size provides a good opportunity for testing the efficiency of SMD. A single data file, which contains 12 s of recorded data, has 1008 traces, each trace containing 3000 pressure samples. Processing this amount of data on a laptop with an i5-6200U CPU and 8 GB of RAM, without parallelization, takes 4.2 s on average. Therefore, SMD is sufficiently fast when processing marine seismic gathers in real time. However, distributed acoustic sensing produces produces greater volumes of data than a long line of receivers during marine gathers.
We propose two solutions as we prepare SMD to future application on seismic data obtained by DAS. First, the current SMD algorithm is optimized to provide a best representation of the coherent waves, for a predefined memory requirement. To make SMD more applicable to new developments in the data acquisition (DAS), we could run a new training stage that considers computation time as well as the quality of the results. This could create a new version of SMD that is more applicable to data obtained by DAS. Second is the introduction of signal detection which we will discuss in the following section. The large data file that requires 4.2 s of computation time is heavily populated with signal. This is not the case during microseismic monitoring during which most of the files contain only noise. Running an initial test to check for the presence of coherent waves, prior to fully processing the data with SMD, may significantly reduce the time spent on processing data during seismic monitoring.

Discussion
In addition to seismic data compression and noise reduction, SMD also provides a new method of seismic data analysis. Rather than a list of displacements distributed in space and time to describe incoming waves, we have pairs of singular vectors paired with shift vectors. In an ideal scenario, for each arriving wave the column vector represents the waveform, the row vector represents the amplitudes at different receiver locations and the shift vector represents relative arrival times. Even though the ideal case is rarely achieved, we believe that the results of SMD provide an excellent advance in the realm of seismic analysis especially with its application of machine learning. With SMD, identifying features that can be used for building models is facilitated when noise-reduced data is represented in the SMD-compressed format.
The application of machine learning to data compression was explored in [24], which applied SVD to synthetic data and developed a model for estimating source location and orientation. However, SMD may provide greater opportunities for machine learning application.
It is instructive to provide an example of how the SMD algorithm can help estimate physical properties such as elastic wave velocities. To this end, herein we predict the average acoustic velocity (α) in our model (seawater) by analyzing the results of the SMD algorithm. Specifically, we use curvature of the shift vectors, and the zero offset time to estimate the average velocity of the reflected waves. Since we have a controlled source, a wave's zero offset time is simply it's row position in the first column of the data matrix multiplied by the time difference between consecutive recordings (t d ). Since the row position of a wave can be determined from the shift vector and the first element of the column vector, our wave velocity estimation is obtained solely from the data stored in SMD results.
To derive the formula for wave velocity we must make several assumptions about our surroundings. First, we assume that the reflecting surface (the ocean floor) is horizontal, and that the depth of the ocean floor (z f ) is significantly larger than the horizontal distance from source to the receiver (x z f ). This gives us the expression for the total distance travel by the reflected waves (d): Since we are considering the average acoustic velocity (α) in our model, we can rewrite the expression (11) in terms of the reflected wave travel-time (T): The zero offset time T 0 is defined as travel-time T at zero horizontal distance (x = 0): Taking the second derivative of the expression (12) with respect to horizontal position x gives: Using expression (13) to substitute (2z f ) with (T 0 α) in expression (14) gives: Expression (15) can be used to calculate the acoustic velocity if we can express both ∂ 2 T ∂x 2 and T 0 in terms of SMD results. In Section 2.2.2 it is explained that the row position (p j ) of a wave in any column j is equal to the sum of the j th element of the shift vector (s) and the waveform row (w r ): In Section 2.2.3 it is explained that the waveform row (w r ) can be obtained from the first element of the column vector (u) and the predetermined length of the recorded waveform (l w ): Using expressions (16) and (17), the zero offset time (T 0 ) can be described with the row position of the wave in the first column (p 1 ) multiplied by the time difference between consecutive recordings (t d ): The expression (18) will be used to obtain the zero offset time from SMD results. If we assume that the relative arrival times were accurately recorded by the shift vector, we can make the following substitution: where x d is the distance between adjacent receivers. In the expression (19), s is the second derivative of an element in the shift vector s with respect to the position (j) of the element in the shift vector.
The value of s is determined by fitting a parabola to first n r elements of the shift vector. The parameter n r need not have an exact value. While we want to use enough elements from the shift vector to confidently fit a parabola, we also want to only use data from the receivers close to the source in order to follow the (x z f ) condition. Therefore, the parameter n r is set to 50. Once we fit the parabola, the value of s is estimated to be the quadratic coefficient c 2 , multiplied by 2: By plugging the expressions (18)- (20) into expression (15) we estimate the acoustic velocity. However, the values of a shift vector can be affected by interfering waves, or noise. To minimize the error from those sources, we estimate the velocity based on the first n s pairs of shift vectors and column vectors. Similar to n r , the n s parameter does not need to be set to any specific value. In this example, the parameter n s is set to 5. The average acoustic velocity α is estimated as the weighted average of the results from the first n s extracted pairs of a of shift vector and a column vector. Each term is weighted by the quality of the parabolic fit, which is equal to the inverse of the error norm e: The formula for α, defined in expression (21), was applied to 20 marine field data files, each recording a seismic response from a unique and controlled seismic source. For each file, we used the formula from expression (21) to estimate the velocity. The average estimated velocity was 1601 m s and the standard deviation among the results was 236 m s . Considering the ocean depth being about 3 km, the correct average acoustic velocity was likely about 1500 m s . If we assume the correct average wave velocity experienced by the reflected waves was 1500 m s , then the average and the median error from the 20 velocity estimates were 12.1% and 8.3%, respectively.
The experiment in this subsection proves that there is a strong correlation between the shift vector and relative arrival times of the wave at different receivers. We were able to use that correlation to estimate the velocity of the waves without relying on any information about the medium through which the waves were traveling. We believe that there is also a strong correlation between the column vector and the waveform, as well as between the row vector and the relative amplitudes of the waveform at different receivers. However, proving these correlations will require further testing.
In this example, SMD was applied to seismic data from a controlled source. In such cases we can always be certain that there are coherent waves present. However, we are planning a wide range of applications for SMD. During microseismic monitoring, often recorded by DAS, we might want to analyze coherent waves that are not coming from controlled sources. Therefore, when applying SMD to data from microseismic monitoring we do not know in advance whether the data contains signals of interest. For that reason, we developed a method for recognising coherent signal in noisy data, that also relies solely on the results from SMD.
We created a method that can differentiate between noisy data and data containing signal, by measuring the curvature in the first several shift vectors. Applying SMD to noisy data without coherent waves returns a set of completely unrelated shift vectors. On the other hand, applying SMD to data containing coherent waves, usually coming from the same source, returns a set of shift vectors most of which have similar curvatures. Therefore, the method differentiates between noisy data and data containing coherent waves, by calculating the standard deviation among the curvatures from the first 10 extracted shift vectors. To test this method, we applied SMD separately to two subsets of data from each of the 20 previously mentioned data files. The first subset contains only the recordings of noise, and the second subset contains recordings of both the coherent waves and the preceding noise ( Figure 14). Figure 14. Two subsets from a marine field data file. The first subset contains only noise, and the second subset contains both a part of the coherent waves and some of the preceding noise.
For each subset, we use expression (20) to estimate the curvatures of the first 10 extracted shift vectors and then we compute the standard deviation of the 10 curvature values. The results are presented in Figure 15, in which the standard deviation for each subset without coherent waves is presented in blue, and the standard deviation for each subset containing coherent waves is presented in red. Due to shift vector curvature having random values when SMD results represent noise, the standard deviation among shift vector curvatures should be greater for the data subsets which contain only noise. As can be seen in Figure 15, the standard deviation among shift vectors is consistently lower for subsets containing coherent waves, which are presented in red. This confirms that we can differentiate between data containing only noise and data containing signal, by using our described method.
It is important to emphasize that this signal detection method does not contain any information regarding the amplitude of the waves. It is entirely dependent on the curvature of the recorded shift vectors, which is a unique property. This means that this method, while accurate, can also be used to complement other signal detection methods, all of which rely on other properties in seismic data (such as STA/LTA for example, which relies on the amplitudes of the waves). Furthermore, this signal detection method only requires the first 10 shift vectors, which can be acquired very quickly compared to the to the processing time of the SMD. Therefore, when applying SMD to data obtained during seismic monitoring, we suggest first running a quick version of SMD that only extracts the first 10 shift vectors in order to run signal detection. If the presence of coherent waves is confirmed, SMD may continue to fully process the data. The analysis of SMD results that was conducted in this subsection was not based on machine learning but instead on our understanding of the correlations between compressed data and attributes of the recorded waves. In the future, those strong correlations will be used with machine learning to train algorithms to accurately infer various properties of the source and the surrounding velocity model directly from compressed data. Additionally, application of data from marine surveys is difficult because of large amounts of interference between arriving waves. Thus, the SMD method may be more effective in different circumstances, such as in unconventional reservoirs monitored by distributed acoustic sensing.
Thus far, we only considered application of SMD to two-dimensional seismic data, obtained by a single line of receivers. However, receivers may often be distributed over an area, in a large number of lines. In such case, SMD may still be applied individually, to the data collected from each line. However, there may be a lot of redundancy between SMD results from each of the lines of receivers. Similar redundancies can also occur by applying SMD to multiple files from different times, obtained by a same set of receivers. In future work, we will take advantage of the fact that SMD results may be similar for batches of data collected from different receivers or during different times. Dictionary learning could be applied to SMD results to further compress the data, although we can not yet recommend a method for learning the dictionary. Future work will therefore likely include further compression by applying dictionary learning to SMD results from multiple data files or multiple receiver lines.

Conclusions
Shifted-matrix decomposition is a powerful tool that can simultaneously compress and improve seismic data. This is done by converting the seismic data from a matrix into a set of pairs of singular vectors coupled with shift vectors which require less memory to store the seismic data. Furthermore, reconstructing the seismic data from compression results, creates a denoised version of the original data. Shifted-matrix decomposition provides an improvement to some of the existing denoising techniques such as SOSVD by avoiding numerical artifacts in areas with multiple intersecting waves of different slopes. This allows us to apply SMD to complicated field data with a large number of arriving waves and still achieve 80% compression.
When applied to synthetic data and ocean gathers it was able to boost coherent signal and erase the majority of the noise. In synthetic data with signal-to-noise ratio of 1.9, it was able to increase the ratio to 4.7 in the case of 80% compression and to 12.3 in the case of 95% compression. However, the excellent result achieved with 95% compression in the synthetic data is an artifact of the lack of coherent noise and the small number of coherent waves. Due to noise coherence and more complicated overlap of coherent waves, we recommend applying 80% compression to examples of field data in order to accurately represent the signal of interest. In field data, SMD was able to reduce the noise in areas preceding the signal as well as in areas containing coherent waves. As a results, weak waves that were difficult to notice in the original data, can be seen more clearly in the data reconstructed from SMD results. The only drawbacks are that in some scenarios SMD may fail to boost weaker signals and it is not meant to be applied to seismic data obtained from a small number of receivers.
There is a good correlation between the physical properties such as elastic wave velocity and the results of the SMD. As an example, the average wave velocity in the medium through which the waves propagate (seawater) was roughly estimated by analyzing the shift vector curvature. In the future, we will use machine learning to build models that infer with high accuracy the properties of the source and the velocity model, directly from the SMD results. Results of SMD can also be used for other analysis, such as signal detection. By analyzing only the first several shift vectors, we can check for the presence of coherent waves. Because it requires such a small amount of data, this technique can be executed very quickly by taking only the initial results of SMD. We recommend using it during microseismic monitoring before fully processing the data with SMD.  Data Availability Statement: All code used in this paper can be obtained by contacting the author at mbrankov@tamu.edu or milanovstalni@gmail.com.

Acknowledgments:
We would like to thank Robert Reece and his student Jingxuan Wei for providing us with the field data which played an integral role in the development and testing of the SMD.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

DAS
Distributed acoustic sensing SMD Shifted-matrix decomposition SVD Singular value decomposition SOSVD Structure-oriented singular value decomposition SNR Signal-to-noise ratio STD standard deviation PCA Principal component analysis STA/LTA short-term average over long-term average

Appendix A. The Quantative Example
Consider a small matrix M s ∈ R 8,8 representing a simple wave arrival recorded by 8 receivers.
In the matrix M s , each receiver is represented by one of the columns, and the time increases from top row to the bottom row. The source is closest to the third receiver, and the perceived waveform is simply: Figure A1 loosely describes a toy model that can generate the matrix M s . By applying SMD to matrix M s we extract a pair of the singular vectors (u SMD ,v SMD ), and a shift vector (s). Keep in mind that when using SMD, the singular value λ SMD is stored in right singular vector v SMD by multiplying the vector with it: Using the singular vectors and the shift vector we can reconstruct a matrix M SMD to be identical to M s : There is a significant difference between M s and M SVD . The data from matrix M s cannot be described by a single pair of singular vectors and a singular value obtained by applying regular SVD. However, it can be described by a pair of singular vectors coupled with a shift vector obtained by applying SMD.