Encoder-Decoder Architecture for 3D Seismic Inversion

Inverting seismic data to build 3D geological structures is a challenging task, due to the overwhelming amount of acquired seismic data and the very high computational load imposed by iterative numerical solutions of the wave equation, as required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5km $\times$ 4.5km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.8554 (out of 1.0) in the presence of field noise at 10dB signal-to-noise ratio.


INTRODUCTION
A key step toward understanding the subsurface by remote sensing is the acquisition of seismic data, which consists of the recorded response of the subsurface when mechanical perturbations are introduced. After the data has been collected, several disciplines of geoscience are involved towards the common objective of producing reliable subsurface models. Earth models can be used for many purposes, such as seismology studies, hydrocarbon exploration and CO$_2$ sequestration. When used for the latter purpose, models are critical inputs to drilling decisions. The problem at hand is daunting, involving many variables and huge datasets. An example of great societal importance is injecting CO$_2$ from industrial processes into specially reconditioned reservoirs; to that end, having high-quality subsurface models is crucial. The solution of 3D seismic inverse problems using deep learning (DL) [1,2] is an emerging field of research, motivated by state-of-the-art results obtained by DL for the 2D case [3][4][5]. The contributions of this paper are three-fold: (1) A convolutional encoder-decoder network is proposed with efficient input data dimensionality reduction, reconstructing complex 3D models at an average inference time of 0.165 seconds (on one NVIDIA A100 GPU), a fraction of the time required by any iterative global optimization solver. (2) The proposed approach is demonstrated to provide inherent robustness against noise in the recorded seismic data. (3) The proposed approach is evaluated with realistic 3D geological models.

PROBLEM FORMULATION
Direct reconstruction of models of the solid earth is not possible; this renders the following forward model only practical when synthetic seismic data $d$ is generated by a forward operator $F$ acting on an artificial model $m$, under ambient noise $\epsilon$:

$d = F(m) + \epsilon$.

$F$ approximates the behavior of seismic waves propagating through the mechanical medium $m$, and for the elastic case it is represented by the following expression:

$\frac{\partial^2 u}{\partial t^2} = V_P^2 \, \nabla(\nabla \cdot u) - V_S^2 \, \nabla \times (\nabla \times u) + f$,

where $u = u(x, y, z, t)$ is the seismic wave displacement, $V_P$ is the P-wave velocity (compression/rarefaction), $V_S$ is the S-wave velocity (shear stress), and $f$ is the source function. While the elastic wave equation faithfully describes seismic wave propagation, it is often preferred (as in this work) to approximate it by the acoustic wave equation [6], which assumes only P-waves and requires fewer computational resources and parameters than the elastic equation. The acoustic wave equation for a medium without density variations is given by:

$\frac{\partial^2 u}{\partial t^2} = V^2 \, \nabla^2 u + f$,

where $u$ is the wave displacement, $V$ is the P-wave velocity model and $f$ is the perturbation source (i.e. shot) function.
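For intuition on how a forward operator $F$ is realized in practice, the acoustic wave equation can be discretized with second-order finite differences in time and space. The following is a minimal 1D sketch of our own, purely for illustration; the propagator used in this work is 3D with CPML absorbing boundaries, and the grid sizes and source function below are hypothetical toy values.

```python
import numpy as np

def propagate_1d(velocity, source, dx, dt, nt):
    """Second-order finite-difference solver for u_tt = V^2 u_xx + f (1D sketch)."""
    n = velocity.shape[0]
    u_prev = np.zeros(n)
    u_curr = np.zeros(n)
    snapshots = []
    for it in range(nt):
        # Discrete Laplacian (interior points only; edges stay fixed at zero)
        lap = np.zeros(n)
        lap[1:-1] = (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]) / dx**2
        # Leapfrog time update: u_next = 2u - u_prev + V^2 dt^2 * u_xx
        u_next = 2 * u_curr - u_prev + (velocity * dt) ** 2 * lap
        u_next[n // 2] += dt**2 * source(it * dt)  # inject the shot at the center
        u_prev, u_curr = u_curr, u_next
        snapshots.append(u_curr.copy())
    return np.array(snapshots)

# Example: homogeneous 2000 m/s medium, Gaussian pulse source
v = np.full(201, 2000.0)
wavefield = propagate_1d(v, lambda t: np.exp(-((t - 0.05) / 0.01) ** 2),
                         dx=10.0, dt=0.001, nt=300)
```

Note that the time step respects the CFL stability condition ($V \, dt / dx = 0.2 < 1$); real 3D propagators additionally need absorbing boundaries (such as CPML) to suppress edge reflections.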
Since the direct formulation is not tractable, it is common to use the inverse approach. Seismic velocity inversion computes a complete 3D velocity model $m$ of a certain target area from recorded seismic data $d_r$, and it can be summarized as:

$\hat{m} = F^{-1}(d_r)$,

where $F^{-1}$ is the inversion operator. Seismic inversion problems [7] are ill-posed, e.g., the solution is non-unique and unstable in the sense that small noise variations may alter the solution significantly. The DL formulation for solving inverse problems is detailed in the next section.

DEEP LEARNING APPROACH

Encoder-Decoder Architecture
Deep Learning (DL) is a powerful class of data-driven machine learning algorithms, built using Deep Neural Networks (DNNs), which are formed by a hierarchical composition of non-linear functions (layers). The main reason for the success of DL is the ability to train very high capacity networks using very large datasets, often leading to good generalization capabilities in numerous problem domains. Generalization is defined as the ability of an algorithm to perform well on unseen examples. In statistical learning terms, an algorithm $\mathcal{A}: \mathcal{X} \rightarrow \mathcal{Y}$ is trained using a dataset $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathcal{X}$ is a data sample (in this work, a seismic shot-gather) and $y_i \in \mathcal{Y}$ is the corresponding label (in this work, a 3D velocity model). Let $P(\mathcal{X}, \mathcal{Y})$ be the true distribution of the data; the expected risk is then defined by:

$R(\mathcal{A}) = \mathbb{E}_{(x,y) \sim P}\left[ L(\mathcal{A}(x), y) \right]$,

where $L$ is a loss function that measures the misfit between the algorithm output and the data label. The goal of DL is to find an algorithm $\mathcal{A}$ within a given capacity (i.e. function space) that minimizes the expected risk; however, the expected risk cannot be computed since the true distribution is unavailable. Therefore, the empirical risk is minimized instead:

$\hat{R}(\mathcal{A}) = \frac{1}{N} \sum_{i=1}^{N} L(\mathcal{A}(x_i), y_i)$,

which approximates the statistical expectation with an empirical mean computed using the training dataset.
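As a concrete illustration of empirical risk with the MAE loss used later in this work, consider the following minimal sketch (this is not the paper's code; the toy array shapes are our own assumption):

```python
import numpy as np

def empirical_risk_mae(predictions, labels):
    """Empirical risk: average over the N samples of the per-sample MAE loss."""
    # predictions, labels: arrays of shape (N, ...) holding N training samples
    per_sample_loss = np.mean(np.abs(predictions - labels),
                              axis=tuple(range(1, predictions.ndim)))
    return per_sample_loss.mean()

# Toy example with N=2 "velocity models" of shape 2x2
labels = np.array([[[1.0, 2.0], [3.0, 4.0]],
                   [[5.0, 6.0], [7.0, 8.0]]])
predictions = labels + 0.5  # a constant prediction error of 0.5 m/s
print(empirical_risk_mae(predictions, labels))  # 0.5
```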
In this work we trained a 3D convolutional encoder-decoder, inspired by the 2D U-Net architecture [8], to learn the mapping from the seismic data space to the space of 3D models (i.e. inversion). The complete network is detailed in Table 1, with a total of 99M parameters.

Computational considerations
The main challenge in training such a deep convolutional neural network (DCNN) for real-life inversion tasks lies in the demanding GPU RAM and external storage access requirements, due to the large number of input channels and the large size of each channel: each sample in our training data was composed of $N_x \times N_y = 529$ seismic data cubes (i.e. DCNN input channels), where $N_x$, $N_y$ are the total numbers of shots in the lateral and longitudinal axes, respectively. This results in a total storage size of 42GB per sample (after decimation to dimensions 96 $\times$ 96 $\times$ 224). A modest training dataset of 300 samples occupies $\approx$ 1.6TB, which requires very high-speed storage access to facilitate DCNN training in reasonable duration. Thus, the problem belongs to the High-Performance Computing class [9]. To overcome these challenging requirements, we propose a simple yet highly-effective dimensionality reduction scheme: let $d(S_x, S_y, R_x, R_y, t)$ denote the 5D tensor that represents a single data sample, i.e. the collection of seismic data cubes (shot-gathers), where $S_x, S_y$ are the indices of the shot position, $R_x, R_y$ are the indices of the receiver position, and $t$ is time. We define the time-boosted and dimensionality-reduced data cube $\bar{d}$ by spatial averaging along the shots dimensions:

$\bar{d}(R_x, R_y, t) = \frac{b(t)}{N_x N_y} \sum_{S_x=1}^{N_x} \sum_{S_y=1}^{N_y} d(S_x, S_y, R_x, R_y, t)$,   (5)

where $b(t)$ is a monotonically-increasing time-boosting function that compensates for the attenuation of wave reflections from the deepest geological layers by amplifying late-arrival time samples. Therefore, $\bar{d}$ forms a single 3D input channel, significantly mitigating the memory and computational requirements for training and inference of the proposed DCNN. In the next section we describe the performance of the proposed architecture for noiseless seismic data, as well as data contaminated by synthetic and field noise.
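The reduction scheme above can be sketched in a few lines of numpy (a toy illustration with hypothetical dimensions; the linear boosting function `1 + t` below is our own placeholder, not necessarily the $b(t)$ used in this work):

```python
import numpy as np

def reduce_shot_gathers(d, boost):
    """Collapse the two shot axes of d(Sx, Sy, Rx, Ry, t) by averaging,
    then amplify late arrivals with a monotonically increasing boost b(t)."""
    averaged = d.mean(axis=(0, 1))       # average over (Sx, Sy) -> shape (Rx, Ry, t)
    t = np.arange(d.shape[-1])
    return averaged * boost(t)           # b(t) broadcasts along the time axis

# Toy 5D sample: 4x4 shots, 8x8 receivers, 16 time samples
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4, 8, 8, 16))
d_bar = reduce_shot_gathers(d, lambda t: 1.0 + t)
print(d_bar.shape)  # (8, 8, 16): a single 3D input channel
```

For the full-size samples in this work the same operation collapses 529 shot-gather cubes into one cube, which is what reduces the per-sample footprint from the 42GB raw collection to a single 3D channel.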

Data Preparation
We created 840 3D velocity models using the GemPy tool, which creates 3D geologically-feasible models with realistic combinations of features. The selection of GemPy as subsurface modeler is not arbitrary: it reflects the intention of solving a more realistic problem than just flat layer-cake models. The physical dimensions of each model were 4.5Km $\times$ 4.5Km $\times$ 4.0Km (lateral $\times$ longitudinal $\times$ depth), represented by a 3D tensor of dimensions 300 $\times$ 300 $\times$ 800 grid points, which was down-sampled for DCNN training to dimensions of 96 $\times$ 96 $\times$ 224. To generate the synthetic seismic data through forward modeling, we use an acoustic isotropic wave equation propagator with a 15 Hz peak frequency Ricker wavelet as a source. Shots and receivers are evenly spaced on the top surface of the 3D model (200m between shots and 25m between receivers). To avoid reflections from the boundaries and free-surface multiples, convolutional perfectly matched layer (CPML) [10] boundaries are imposed all around the model. Each generated seismic data cube was computed on a grid of dimensions 180 $\times$ 180 $\times$ 500 (lateral $\times$ longitudinal $\times$ time) points, which was down-sampled for DCNN training to dimensions of 96 $\times$ 96 $\times$ 224. The 840 3D models were split into training and testing sets by a 92%/8% ratio, respectively. The proposed DCNN was implemented in PyTorch and trained by minimizing the Mean Absolute Error (MAE) loss function, using the ADAM optimizer with early stopping regularization. Training was repeated for five different scenarios (each resulting in a different trained DCNN): noiseless seismic data, data contaminated with white Gaussian noise at signal-to-noise ratio (SNR) levels of 10dB and 0dB, and data contaminated with field noise (from real data measurements) at SNR levels of 10dB and 0dB. Examples of the clean and noisy data are provided in Fig. 1, demonstrating the highly correlated patterns in space- and time-domains of the field noise.
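The Ricker wavelet used as the source above follows a standard closed form; the following sketch (the time axis is an arbitrary illustrative choice) shows a 15 Hz peak-frequency instance:

```python
import numpy as np

def ricker(t, peak_freq):
    """Ricker wavelet: (1 - 2*pi^2*f^2*t^2) * exp(-pi^2*f^2*t^2), peak of 1 at t = 0."""
    a = (np.pi * peak_freq * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

t = np.linspace(-0.2, 0.2, 401)   # seconds, centered on the wavelet peak
w = ricker(t, peak_freq=15.0)
print(w.max())  # 1.0, attained at t = 0
```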
A separate set of 80 samples was augmented with bodies that resemble simple salt geometries. The network trained with the 840 noiseless samples was used as the starting point for a transfer learning process. The resulting transfer-learned network was tested on 20 unseen samples, also augmented with simple salt geometries. The 3D SSIM for this testing set is 0.89 and the MAE is 55.70 m/s; examples can be observed in Figure 3. The distribution of SSIM values for the testing set is presented in Figure 4, and an example of the structure of the prediction error is shown in Figure 5.

Evaluation Metrics
SSIM [11] results were computed per 3D model by first averaging SSIM values over the 2D slices along each of the three planes: 96 slices along the XZ plane, 96 along the YZ plane, and 224 along the XY plane. Finally, the three results were averaged to obtain the single SSIM(3D) result. Table 2 details SSIM and MAE results, averaged over the testing set, clearly demonstrating that the proposed DCNN is capable of reconstructing 3D velocity models from noiseless data (Fig. 2(e)-(h)), as well as from data with additive white noise (Fig. 2(i)-(p)) or field noise (Fig. 2(q)-(x)). Importantly, results for seismic data contaminated by field noise indicate close similarity to the ground truth models at an SNR of 10dB, though the SSIM metric slightly deteriorated.
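The plane-averaging scheme for SSIM(3D) can be sketched as follows. Note that, for brevity, the per-slice score below is a simplified single-window SSIM rather than the windowed SSIM of [11], and the constants and toy volume shape are our own assumptions:

```python
import numpy as np

def ssim_global(a, b, c1=1e-4, c2=9e-4):
    """Single-window SSIM over an entire 2D slice (simplified, no sliding window)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))

def ssim_3d(vol_a, vol_b):
    """Average slice-wise SSIM along the YZ, XZ and XY planes, then average the three."""
    yz = np.mean([ssim_global(vol_a[i, :, :], vol_b[i, :, :]) for i in range(vol_a.shape[0])])
    xz = np.mean([ssim_global(vol_a[:, j, :], vol_b[:, j, :]) for j in range(vol_a.shape[1])])
    xy = np.mean([ssim_global(vol_a[:, :, k], vol_b[:, :, k]) for k in range(vol_a.shape[2])])
    return (yz + xz + xy) / 3.0

vol = np.random.default_rng(1).random((8, 8, 12))
print(ssim_3d(vol, vol))  # identical volumes give an SSIM of 1.0
```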
The results in the noiseless data case are surprising, indicating that the dimensionality-reduced data cube (5) contains sufficient information for practical reconstruction given the measured metric, with the DCNN achieving an average SSIM(3D) of 0.9003. We next discuss the noisy data case.

Noisy Data Analysis
In the presence of additive white noise that is spatially (and temporally) independent and identically distributed (iid), the spatial averaging along the shots dimensions results in a reduction of the noise variance, as explained in the following. Denote by $\bar{n}(R_x, R_y, t)$ the noise random variable resulting from the spatial averaging of the noise samples corresponding to receiver coordinates $(R_x, R_y)$ and time $t$:

$\bar{n}(R_x, R_y, t) = \frac{1}{N_x N_y} \sum_{S_x=1}^{N_x} \sum_{S_y=1}^{N_y} n(S_x, S_y, R_x, R_y, t)$,   (6)

where $n(S_x, S_y, R_x, R_y, t)$ are iid random variables with zero mean and variance $\sigma_n^2$. By the iid property, the variance of $\bar{n}(R_x, R_y, t)$ is independent of $R_x, R_y, t$ and given by $\frac{\sigma_n^2}{N_x N_y}$ (in our study $N_x \times N_y = 529$). The time-boosting function $b(t)$ is omitted from (6) since, in the presence of additive noise, both the signal and noise components are multiplied by $b(t)$ according to (5); thus the contribution of $b(t)$ cancels in an SNR analysis. Therefore, the variance of the noise component in the dimensionality-reduced data cube is effectively reduced by a factor of $N_x \times N_y$ for iid white noise. However, the field noise is clearly not iid, and therefore a smaller reduction in the noise variance is achieved.
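The $\sigma_n^2 / (N_x N_y)$ variance reduction is easy to check numerically with a quick Monte Carlo sketch (the trial count below is an arbitrary choice; 529 matches $N_x \times N_y$ in our setup):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma_n, n_shots, n_trials = 1.0, 529, 200_000

# Each trial averages n_shots iid Gaussian noise samples,
# mimicking one receiver/time bin of the shot-averaged cube.
noise = rng.normal(0.0, sigma_n, size=(n_trials, n_shots))
averaged = noise.mean(axis=1)

print(averaged.var())  # close to sigma_n**2 / 529, i.e. about 1.9e-3
```

For spatially correlated field noise the effective number of independent samples is smaller than 529, which is why the observed variance reduction is weaker.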

CONCLUSIONS
Seismic inversion based on DL effectively reconstructs 3D subsurface models from synthetic seismic data, both clean and contaminated by either white or field noise. Once training is complete, the inference step is fast (a fraction of a second), to the point that many different experiments can be carried out at marginal cost. Robustness to noise was also demonstrated down to reasonable SNR levels. The next steps for this approach relate to improving the accuracy of the reconstructions for more complex and larger-scale structures, and to estimating the sensitivity with respect to acquisition parameters, towards direct use of field data.

ACKNOWLEDGMENTS
The authors acknowledge TotalEnergies EP Research & Technology USA, for supporting this work and allowing its publication.

Fig. 5. One testing sample is selected and a 2D (central inline) cut is shown; the comparison between ground truth and prediction is presented in terms of error. As expected, the largest error (rightmost bar, in percentage) occurs around the salt geometry, mostly overestimating the model velocity; nevertheless, the background velocity is overall correct.