# Constrained Full Waveform Inversion for Borehole Multicomponent Seismic Data

Skolkovo Institute of Science and Technology, Moscow 121205, Russia

Department of Geosciences and Environment, University of Cergy-Pontoise, 95000 Neuville-sur-Oise, France

Author to whom correspondence should be addressed.

Received: 2 December 2018 / Revised: 25 December 2018 / Accepted: 25 December 2018 / Published: 16 January 2019

(This article belongs to the Special Issue Numerical Methods of Geophysical Fields Inversion)

Full-waveform inversion of borehole seismic data is an ill-posed problem, and constraining the problem is crucial. Constraints can be imposed on the data and model space through covariance matrices. Usually, they are set to a diagonal matrix. For the data space, signal polarization information can be used to evaluate the data uncertainties. The inversion forces the synthetic data to fit the polarization of the observed data. A synthetic inversion of 2D-2C data estimating a 1D elastic model shows a clear improvement, especially at the level of the receivers. For the model space, horizontal and vertical spatial correlations using a Laplace distribution can be used to fill the model space covariance matrix. This approach reduces the number of degrees of freedom of the inverse problem, which can be quantitatively evaluated. Strong horizontal spatial correlation distances favor a tabular geological model whenever it does not contradict the data. The relaxation of the spatial correlation distances from large to small during the iterative inversion process allows the recovery of geological objects of the same size, which regularizes the inverse problem. Synthetic constrained and unconstrained inversions of 2D-2C crosswell data show the clear improvement of the inversion results when constraints are used.

Full-waveform inversion of seismic data allows one to obtain an image of the subsurface through the determination of a certain number of physical parameters. Borehole data (VSP and crosswell) are of special interest because of the geometry of acquisition, which provides more informative signals on the medium properties than surface seismic data (high frequency signal, energetic P-S conversions). However, the lack of seismic data redundancy (e.g., few shots) renders the inversion problem underdetermined. In general, full-waveform inversion is an ill-posed problem, in the sense that an infinite number of models match the data [1]. It is classically solved with local optimization schemes and is therefore strongly dependent on the starting model definition. This starting model should predict arrival times with errors less than half of the period to cancel the cycle-skipping ambiguity [1]. The multiscale strategy performed by moving from low to high frequencies during the inversion allows reduction of the nonlinearities and cycle skipping issues of the inversion and helps convergence toward the global minimum. Regularizations are conventionally applied to the inversion in the model space to make it better posed [2,3]. Tikhonov and Arsenin [4] have proposed a regularization strategy within the optimization step to find the smoothest model that explains the data. Preconditioning techniques acting as a smooth operator on the model update [5] may add strong prior features of the expected structure through directive Laplace preconditioning, as in Guitton et al. [6]. Regularization schemes that preserve edges and contrasts have also been developed for specific full waveform inversion applications through a ${\ell}_{1}$ model penalty [7] or through a multiplicative regularization [8]. Regularization can also be expressed in the curvelet or wavelet domains [9,10]. 
In such domains, the ${\ell}_{1}$ norm minimization is generally preferred for the model term penalty because it ensures sparsity in the model space. All the previous regularization techniques allow stabilization of the inversion scheme by assuming a particular representation or structure of the velocity model (smoothness, sparsity, and so on). However, geological prior model information is generally not used in classical full waveform inversion implementations, although some examples exist [11,12]. In addition, most of the weighting operators or covariance matrices associated with the model parameters are set to the identity or, in the best cases, to a diagonal matrix. The same applies to the weighting operators or covariance matrices associated with the data.

Weighting operators on the data and models, or inverses of the data and model covariance operators for least squares in the frame of the Bayesian formulation [3], are introduced in the misfit function. In this paper, we will show the benefit of introducing covariance matrices in the data and model space. On the one hand, we will illustrate the impact of constraining the seismic inverse problem in the data space for the case of a synthetic two-component borehole seismic dataset, by using the polarization analysis to fill a block diagonal data space covariance matrix. On the other hand, we will illustrate the benefit of constraining the seismic inverse problem in the model space by performing a crosswell synthetic experiment. The horizontal and vertical spatial correlations using a Laplace distribution are used to fill the model space covariance matrix in order to introduce an a priori solution, favoring a tabular medium whenever it does not contradict the information contained in the data.

Using the probabilistic formalism developed by Tarantola [3] and considering only the data space term in the equation, we rewrite the misfit function in the vicinity of the current model ${m}_{n}$:

$$\begin{array}{ccc}S\left({m}_{n}+\delta m\right)& =& {\left(\delta {d}_{n}-{G}_{n}\delta m\right)}^{T}{C}_{D}{}^{-1}\left(\delta {d}_{n}-{G}_{n}\delta m\right)\\ & & +{\left(\Delta {m}_{n}-\delta m\right)}^{T}{C}_{M}{}^{-1}\left(\Delta {m}_{n}-\delta m\right),\end{array}$$

where:

- $\delta m$ is the perturbation in the vicinity of the current model ${m}_{n}$,
- $\delta {d}_{n}$ are the data residuals for the model ${m}_{n}$,
- $\Delta {m}_{n}$ is the difference between the current model ${m}_{n}$ and the a priori model ${m}_{prior}$,
- ${G}_{n}$ is the linear function tangent to $g$ at the model ${m}_{n}$,
- $g$ is the function mapping the model space $m\in \mathit{M}$ into the data space $d=g\left(m\right)\in \mathit{D}$,
- ${C}_{D}$ is the covariance matrix on the data space,
- ${C}_{M}$ is the covariance matrix on the model space (defining the Gaussian a priori probability density).

Applied to the waveform inversion problem of estimating the elastic parameters and the density of the earth, the minimization of the misfit function can be solved by iterative gradient methods.

The iterative process of a nonlinear inversion can be summarized as follows [3]: for a given iteration $k$, from the current model ${m}_{k}$ (i.e., the discretized physical fields characterizing the medium), we perform simulations of the wavefield in order to obtain the synthetic data ${d}_{k}^{cal}=g\left({m}_{k}\right)$. Then, comparing them to observed data ${d}^{obs}$, we obtain the residuals $\delta {d}_{k}={d}^{obs}-{d}_{k}^{cal}$. These residuals are weighted ${\widehat{\delta d}}_{k}={C}_{D}^{-1}\delta {d}_{k}$ (the hat denotes the dual space) and back-propagated using the wave propagation equation again to obtain the gradient in the dual model space ${\widehat{\mathsf{\gamma}}}_{k}={G}_{k}^{T}{C}_{D}^{-1}\delta {d}_{k}$ (where ${G}_{k}$ is the derivative of the function $g$ over the model space at the point ${m}_{k}$ and where $T$ denotes the transpose operation) and consequently, the gradient in the model space ${\mathsf{\gamma}}_{k}={C}_{M}{\widehat{\mathsf{\gamma}}}_{k}$. The gradient indicates the steepest ascent direction of the misfit function in the model space. We modify this direction using the conjugate gradient algorithm ${\mathsf{\phi}}_{k}={\mathsf{\gamma}}_{k}+{\alpha}_{k}{\mathsf{\phi}}_{k-1}$, where ${\mathsf{\phi}}_{k}$ is the conjugate gradient and ${\alpha}_{k}$ is defined from the gradient [13]. Finally, using an additional simulation, we optimize the step length ${\mu}_{k}$ [14] in the updating equation ${m}_{k+1}={m}_{k}-{\mu}_{k}{\mathsf{\phi}}_{k}$ and we update the model.
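The update loop above can be sketched on a small linear toy problem, where a matrix $G$ stands in for the wave-equation modeling and its adjoint. This is only an illustrative sketch, not the authors' implementation: the function name is ours, the line search is performed on the data term only, and signs are chosen so that, with residuals $\delta d = d^{obs} - d^{cal}$, adding $\mu \phi$ decreases the misfit.

```python
import numpy as np

def constrained_cg_inversion(G, d_obs, m0, CD_inv, CM, n_iter=100):
    """Sketch of the iterative scheme: residuals are weighted by CD^{-1},
    mapped to the dual model space by G^T, then to the model space by CM;
    directions are conjugated (Polak-Ribiere) and the step length is an
    exact line search on the data term of this linear toy problem."""
    m = m0.astype(float).copy()
    phi_prev = gamma_prev = None
    for _ in range(n_iter):
        dd = d_obs - G @ m                      # data residuals delta d_k
        gamma_hat = G.T @ (CD_inv @ dd)         # gradient in the dual model space
        gamma = CM @ gamma_hat                  # gradient in the model space
        if np.linalg.norm(gamma) < 1e-14:       # converged
            break
        if phi_prev is None:
            phi = gamma
        else:
            alpha = gamma @ (gamma - gamma_prev) / (gamma_prev @ gamma_prev)
            phi = gamma + alpha * phi_prev      # conjugate-gradient direction
        Gphi = G @ phi
        mu = (phi @ gamma_hat) / (Gphi @ (CD_inv @ Gphi))  # exact step length
        m = m + mu * phi                        # model update
        phi_prev, gamma_prev = phi, gamma
    return m
```

On a noise-free overdetermined linear problem this recovers the true model to machine precision in a few iterations, since conjugate gradients with an exact line search solve the quadratic exactly.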

The quantification of uncertainties in seismic data is often neglected, because in many cases it is a difficult task. For instance, data errors generated by receiver response are correlated in time, which means accounting for them is not straightforward. To illustrate the possible contribution of constraints provided by the analysis of uncertainties of data, we have performed two inversions from the synthetic data experiment. The first inversion is done without any data constraints, whereas in the second one we have incorporated polarization wave analysis in the covariance matrix ${C}_{D}$. This matrix treats the uncertainties on particle velocities according to the eigenbasis of the particle motion.

For the case of multi-component data, the uncertainty of a component may be correlated to the uncertainty of the other components. For each time sample and each receiver, we can define a 2 × 2 matrix for 2-component data (as in the example below), or a 3 × 3 matrix for 3-component data. In that case, the covariance matrix on the data space ${C}_{D}$ is not a diagonal matrix. In order to take polarization into account, we consider that the uncertainties are proportional to the local cross-correlation matrix of the signal (the data) and that the signal is locally stationary (over 2 periods). The cross-correlation matrix between components $i$ and $j$ of the signal ${s}_{r,t}$ is computed with zero lag ($\tau =0$) for any receiver $r$ and any time $t$:
$${C}_{r,t}^{ij}\left(\tau \right)={{\displaystyle \int}}_{-\infty}^{+\infty}{s}_{r,t}^{i}\left(u\right){s}_{r,t}^{j}\left(u+\tau \right)du,$$

where:

$${s}_{r,t}^{i}\left(u\right)={d}_{r}^{i}\left(u\right){w}_{t}\left(u\right),$$

with ${d}_{r}^{i}\left(u\right)$ being the trace for the component $i$ of the receiver $r$ of the observed data and ${w}_{t}\left(u\right)$ a Hamming time window centered on time $t$ with a length of typically 2 periods. One can use other time tapering windows classically used in spectral analysis (Blackman, Nuttall, etc.).

The polarization of the multicomponent signal is estimated from this matrix ${C}_{r,t}^{ij}\left(\tau =0\right)$, and the matrix of the uncertainties has the same eigenvectors (polarization directions) as the cross-correlation matrix. The uncertainty has the same shape as the polarization of the signal. The ratios among the eigenvalues are the same; in other words, the rectilinearity and planarity of the polarization [15] are conserved.
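As an illustration, the local zero-lag cross-correlation matrix defined above can be computed for a two-component trace with a short Hamming taper. The function name and window handling below are our own illustrative choices, not code from the paper:

```python
import numpy as np

def local_crosscorr_matrix(d, t_idx, win_len):
    """Zero-lag cross-correlation matrix C_{r,t}^{ij}(tau=0) of a multicomponent
    trace d (n_components x n_samples): the trace is tapered by a Hamming
    window of win_len samples (~2 periods) centred on sample t_idx, and the
    products of the windowed components are summed over time."""
    nc, ns = d.shape
    half = win_len // 2
    lo, hi = max(0, t_idx - half), min(ns, t_idx + half)
    w = np.hamming(hi - lo)
    s = d[:, lo:hi] * w        # s_{r,t}^i(u) = d_r^i(u) * w_t(u)
    return s @ s.T             # discrete version of the zero-lag integral

# for a purely rectilinear 2C signal d(t) = a s(t), the dominant eigenvector
# of this matrix recovers the polarization direction a
a = np.array([np.cos(0.3), np.sin(0.3)])           # unit polarization vector
t = np.linspace(0.0, 1.0, 500)
d = np.outer(a, np.sin(2.0 * np.pi * 25.0 * t))
C = local_crosscorr_matrix(d, 250, 40)
vals, vecs = np.linalg.eigh(C)
print(abs(vecs[:, np.argmax(vals)] @ a))           # ~1: aligned with a
```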

The polarized signal for rectilinear polarization can be modeled by:

$$d\left(t\right)=a\cdot s\left(t\right)+n\left(t\right),$$

where $d\left(t\right)$ is the 3C signal, $a$ is the polarization vector, $s\left(t\right)$ the scalar signal along the polarization vector, and $n\left(t\right)$ the 3C Gaussian noise.

In the frequency domain, the equation can be expressed as:

$$\widehat{d}\left(f\right)=a\cdot \widehat{s}\left(f\right)+\widehat{n}\left(f\right),$$

where the hat denotes the frequency domain.

Let us consider the spectral matrix of the 3C signal defined by:

$$\begin{array}{lll}\widehat{S}\left(f\right)& =& \overline{\widehat{d}}\left(f\right)\cdot {\widehat{d}}^{T}\left(f\right)\\ & =& \overline{\left(a\widehat{s}+\widehat{n}\right)}\cdot {\left(a\widehat{s}+\widehat{n}\right)}^{T}\\ & =& a\cdot {a}^{T}\cdot \overline{\widehat{s}}\widehat{s}+\overline{\widehat{n}}\cdot {\widehat{n}}^{T}\\ & =& a\cdot {a}^{T}{\left|\widehat{s}\left(f\right)\right|}^{2}+\widehat{N}\left(f\right),\end{array}$$

where the upper bar denotes the complex conjugate, $T$ denotes the transpose operator, and $\widehat{N}\left(f\right)$ is the spectral matrix of the noise. Note that $\overline{\left(a\widehat{s}\right)}\cdot {\widehat{n}}^{T}=\overline{\widehat{n}}\cdot {a}^{T}\widehat{s}=0$ because noise and signal are considered uncorrelated. The polarization vector is an eigenvector of the spectral matrix (for isotropic noise) for any frequency $f$:

$$\begin{array}{lll}\widehat{S}\cdot a& =& (a\cdot {a}^{T}{\left|\widehat{s}\right|}^{2}+\widehat{N})a\\ & =& {\left|\widehat{s}\right|}^{2}a\cdot {a}^{T}a+\widehat{N}a\\ & =& {\left|\widehat{s}\right|}^{2}a+({\left|\widehat{n}\right|}^{2}I)a\\ & =& ({\left|\widehat{s}\right|}^{2}+{\left|\widehat{n}\right|}^{2})a.\end{array}$$

When integrating the spectral matrix over $\mathbb{R}$, one obtains the cross-correlation matrix $C\left(\tau =0\right)$ with zero lag with the same eigenvectors.

The structure of the covariance matrix on the data space ${C}_{\mathrm{D}}$ is derived from the previous equations. For instance, for a 2C receiver, a given source or receiver couple, and at any time $t$, the corresponding 2 × 2 submatrix is defined as:

$${C}_{\mathrm{D}}=\mathit{V}\cdot {\mathsf{\sigma}}_{\mathrm{D}}{}^{2}\left(\begin{array}{cc}1& 0\\ 0& \frac{{\mathsf{\lambda}}_{\mathrm{II}}}{{\mathsf{\lambda}}_{\mathrm{I}}}\end{array}\right)\cdot {\mathit{V}}^{-1},$$

where $\mathit{V}$ is the matrix of eigenvectors (as columns), ${\mathsf{\sigma}}_{\mathrm{D}}$ is the standard deviation of the (isotropic) noise for this couple, and ${\mathsf{\lambda}}_{\mathrm{I}}$ and ${\mathsf{\lambda}}_{\mathrm{II}}$ are, respectively, the larger and smaller of the two positive eigenvalues of the spectral matrix.
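A minimal construction of this 2 × 2 block could look as follows, assuming the local cross-correlation matrix has a nonzero dominant eigenvalue (the function name is illustrative, not from the paper):

```python
import numpy as np

def polarized_data_covariance(C_corr, sigma_d):
    """Build the 2x2 block C_D = V sigma_D^2 diag(1, lambda_II/lambda_I) V^{-1}:
    same eigenvectors as the local cross-correlation matrix C_corr, with the
    largest uncertainty sigma_D^2 along the dominant polarization direction."""
    vals, vecs = np.linalg.eigh(C_corr)    # eigenvalues in ascending order
    lam_I, lam_II = vals[1], vals[0]       # lambda_I = max, lambda_II = min
    V = vecs[:, ::-1]                      # dominant eigenvector first
    D = sigma_d**2 * np.diag([1.0, lam_II / lam_I])
    return V @ D @ np.linalg.inv(V)

# the eigenvalue ratio (rectilinearity) of the signal is transferred to C_D
C_corr = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 and 1
CD = polarized_data_covariance(C_corr, sigma_d=0.5)
print(np.linalg.eigvalsh(CD))                  # 0.25 * [1/3, 1]
```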

Let us consider the sub-matrix for the first data 2C-trace of the covariance matrix on the data space:

$${C}_{\mathrm{D}}\left(rec=1\right)\propto \left[\begin{array}{cccccccc}{\sigma}_{1,X}^{2}& {\rho}_{1}{\sigma}_{1,X}{\sigma}_{1,Z}& 0& 0& \cdots & \cdots & 0& 0\\ {\rho}_{1}{\sigma}_{1,X}{\sigma}_{1,Z}& {\sigma}_{1,Z}^{2}& 0& 0& \cdots & \cdots & 0& 0\\ 0& 0& {\sigma}_{2,X}^{2}& {\rho}_{2}{\sigma}_{2,X}{\sigma}_{2,Z}& \cdots & \cdots & 0& 0\\ 0& 0& {\rho}_{2}{\sigma}_{2,X}{\sigma}_{2,Z}& {\sigma}_{2,Z}^{2}& \cdots & \cdots & 0& 0\\ \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ \vdots & \vdots & \vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\ 0& 0& 0& 0& \cdots & \cdots & {\sigma}_{nt,X}^{2}& {\rho}_{nt}{\sigma}_{nt,X}{\sigma}_{nt,Z}\\ 0& 0& 0& 0& \cdots & \cdots & {\rho}_{nt}{\sigma}_{nt,X}{\sigma}_{nt,Z}& {\sigma}_{nt,Z}^{2}\end{array}\right],$$

where ${\rho}_{i}$, ${\sigma}_{i,X}$, and ${\sigma}_{i,Z}$ are, respectively, the correlation coefficient, the standard deviation for the $X$ component, and the standard deviation for the $Z$ component for a given time sample $i$, and $nt$ is the total number of time samples. The matrix ${C}_{\mathrm{D}}$ is a block diagonal matrix and its inverse ${C}_{\mathrm{D}}^{-1}$ is another block diagonal matrix, composed of the inverse of each block.
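Because ${C}_{\mathrm{D}}$ is block diagonal, its inverse never needs to be formed globally: each 2 × 2 block can be inverted independently. A quick numerical check of this property (the helper names are ours):

```python
import numpy as np

def assemble_block_diag(blocks):
    """Place square blocks along the diagonal of a dense zero matrix."""
    n = sum(B.shape[0] for B in blocks)
    out = np.zeros((n, n))
    i = 0
    for B in blocks:
        k = B.shape[0]
        out[i:i + k, i:i + k] = B
        i += k
    return out

# the inverse of a block-diagonal matrix is the block diagonal of the inverses
blocks = [np.array([[2.0, 0.5], [0.5, 1.0]]),
          np.array([[3.0, -1.0], [-1.0, 2.0]])]
inv_blockwise = assemble_block_diag([np.linalg.inv(B) for B in blocks])
print(np.allclose(inv_blockwise, np.linalg.inv(assemble_block_diag(blocks))))  # True
```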

In this numerical example, the acquisition geometry is an offset VSP with a unique source at 500 m offset and the antenna is between 580 and 970 m depth in a vertical well (see Figure 1 for more details). The isotropic elastic model depends only on the depth (1D model). The “true” model is defined in Figure 1. The horizontal and vertical components of the “observed” data obtained from the true model are displayed in Figure 1. The direct problem (seismic modeling) is based on discretizing the wave equation by the finite differences method (FDM) [16].

The inverted parameters are the P-wave velocity, the S-wave velocity, and the density. The starting models are simple, the water layer is considered as known, and the other layers are replaced by a constant vertical gradient for each parameter (gray solid lines in Figure 2). The test consists of comparing the results of inversion when data uncertainties are isotropic and when the polarization is included in the data uncertainties.

As shown in Figure 2, the inversion results exhibit in both cases a good fit in the receiver zone, while the estimation is not good in the upper part of the model. One can clearly note that the S-wave velocity model is recovered with more detail than the P-wave velocity model. This is due to the fact that converted S-waves are present in borehole seismic data and also because the S-waves have a shorter wavelength than P-waves and, hence, a better spatial resolution. The final residuals for both inversion experiments are small: a few percent of the misfit. The inversion results when polarization is included are significantly better than for the isotropic uncertainties. More precisely, for the case of polarization, both the P-wave and S-wave estimated velocity fields are well fitted in the antenna zone.

We can distinguish two kinds of geological a priori information that can be incorporated in an inversion process. On the one hand, information obtained by measurements (well logging), that we consider as “objective” information, allows the definition of an a priori model at the well vicinity. On the other hand, information coming from geological interpretations, such as geological layer dips, that we qualify as “subjective” information, can be simplified and introduced into the covariance matrix ${C}_{M}$.

In order to evaluate the importance of constraints on an inversion, it is necessary to quantify the number of degrees of freedom. This number can be estimated from the set of parameters as a function of existing correlations among them and independently of their standard deviations. The number $N$ of free parameters is equal to the number $n$ of parameters in the absence of correlations, and it is reduced to 1 in the case of perfect correlations.

Let us consider a 1D Gaussian random field $X$; the domain is discrete and finite, consisting of a set of $n$ points, so this field is equivalent to the set of random variables $\left({X}_{1},{X}_{2},\cdots ,{X}_{n}\right)$. It can be characterized by the following probability density function (or p.d.f.):

$$f\left(x\right)=\frac{1}{{\left(2\pi \right)}^{n/2}\mathrm{det}{\left(C\right)}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}{\left(x-m\right)}^{T}{C}^{-1}\left(x-m\right)\right),$$

where $x$ is a realization, $m$ is the expectation of the field $\left(\mathrm{E}\left(X\right)=m\right)$, and $C$ is the covariance matrix of the field.

The covariance matrix can be rewritten in the form $C=SRS$, where $R$ is the correlation matrix and $S$ the diagonal matrix of the standard deviations.

If there is no correlation among the random variables ${X}_{i}$, then $R=I$, the covariance matrix is diagonal, and the number of free parameters is $n$. If a correlation exists, we use the $LU$ decomposition method to estimate the number of free parameters (see Appendix A and Appendix B for more details). We can write $R=M{M}^{T}$, where $M$ is the lower triangular matrix of the $LU$ decomposition of the matrix $R$. The number of free parameters is then reduced to:

$$N=trace\left(M\right).$$
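This quantity can be evaluated directly with a Cholesky factorization, which yields the lower-triangular $M$ with $R=M{M}^{T}$. A small check, with and without correlation (the function name is illustrative):

```python
import numpy as np

def degrees_of_freedom(R):
    """N = trace(M), where M is the lower-triangular factor of R = M M^T."""
    return np.trace(np.linalg.cholesky(R))

# no correlation: R = I and N equals the number of parameters n
print(degrees_of_freedom(np.eye(5)))   # 5.0

# strong correlation shrinks N towards 1
R = np.full((5, 5), 0.99)
np.fill_diagonal(R, 1.0)
print(degrees_of_freedom(R))           # slightly above 1
```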

For illustration purposes, let us consider the special case where the spatial correlations are described by an exponential model. Thus, for any pair of points ${M}_{1}$ and ${M}_{2}$ (belonging to the same horizontal line), the correlation $\rho$ between the pair of random variables ${X}_{1}$ and ${X}_{2}$ corresponding to the points ${M}_{i}$ can be expressed as a function of the distance:

$$\rho \left({X}_{1},{X}_{2}\right)=\mathrm{exp}\left(-\frac{d\left({M}_{1},{M}_{2}\right)}{r}\right),$$

where $d(\cdot)$ is a distance and $r$ is the range of the correlation.

By setting the standard deviation of each variable to a constant value ${\sigma}_{i}=\sigma$, the p.d.f. of Equation (7) reduces to:

$$f\left(x\right)=\frac{1}{{\left(2\pi \right)}^{n/2}\xb7{\sigma}^{n}\sqrt{\mathrm{det}\left(R\right)}}\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{x-m}{\sigma}\right)}^{T}{R}^{-1}\left(\frac{x-m}{\sigma}\right)\right),$$

This simplifies, as we will show below, the study of the relation between the matrix $R$ and the degrees of freedom using an exponential correlation function.

Let us consider the same Gaussian field $X$, but this time defined on a 1D domain regularly sampled (as for a grid) containing $n$ points. The distances between points are multiples of the step $\Delta x$ and the correlation matrix has the following form (for ordered points):

$$R=\left(\begin{array}{ccccccc}1& a& {a}^{2}& {a}^{3}& {a}^{4}& \cdots & {a}^{n-1}\\ a& 1& a& {a}^{2}& {a}^{3}& \cdots & {a}^{n-2}\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ {a}^{n-1}& {a}^{n-2}& {a}^{n-3}& {a}^{n-4}& {a}^{n-5}& \cdots & 1\end{array}\right),$$

where

$$a=\mathrm{exp}\left(-\frac{\Delta x}{r}\right)\le 1.$$

One interesting property of the $R$ matrix is that the $M$ matrix of the $LU$ decomposition has a simple form:

$$M=\left(\begin{array}{ccccccc}1& 0& 0& 0& 0& \cdots & 0\\ a& b& 0& 0& 0& \cdots & 0\\ {a}^{2}& ab& b& 0& 0& \cdots & 0\\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ {a}^{n-1}& {a}^{n-2}b& {a}^{n-3}b& {a}^{n-4}b& {a}^{n-5}b& \cdots & b\end{array}\right),$$

with $b=\sqrt{1-{a}^{2}}$. This means that, in terms of a sequential simulation, the $n-1$ variables ${X}_{2}$ to ${X}_{n}$ have the same degree of freedom, $b$. The total number of degrees of freedom $N$ of the field $X$ is then:

$$N=1+\left(n-1\right)\cdot b,$$

where, for $n$ parameters (or $n$ grid points), $N$ is the number of free parameters, considered as the degrees of freedom of the inversion. Figure 3 shows the relation between the range of the correlation (expressed in number of steps, $\Delta x$) and the values of $b$.
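This closed form can be verified numerically against the Cholesky factor of the exponential correlation matrix (the code names are illustrative):

```python
import numpy as np

def exp_correlation_matrix(n, dx, r):
    """R_ij = a^{|i-j|} with a = exp(-dx / r), for n regularly spaced points."""
    a = np.exp(-dx / r)
    idx = np.arange(n)
    return a ** np.abs(idx[:, None] - idx[None, :])

n, dx, r = 50, 1.0, 5.0
a = np.exp(-dx / r)
b = np.sqrt(1.0 - a**2)
M = np.linalg.cholesky(exp_correlation_matrix(n, dx, r))

# the diagonal of M is (1, b, b, ..., b), hence N = trace(M) = 1 + (n - 1) b
print(np.allclose(np.diag(M), np.r_[1.0, np.full(n - 1, b)]))   # True
print(np.trace(M), 1.0 + (n - 1) * b)
```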

This numerical example illustrates how to incorporate, through the covariance matrix on the model space, the “subjective” geological information that our medium is mainly tabular but might contain complex geological structures.

The reference subsurface models are shown in Figure 4. The word “reference” indicates that this model is used to generate the synthetic seismic data used as observed data (see Figure 5) during the inversion process. It is also the model to be retrieved by applying the full-wave inversion method to the observed data. The medium for wave propagation is elastic isotropic. The parameters to be inverted are discretized physical fields: vertical P-wave and S-wave velocities. For simplicity, the density ($\rho$) is not inverted: the field is constant with $\rho =2500\ \mathrm{kg}\cdot {\mathrm{m}}^{-3}$ and is fixed at the correct value in the different inversions.

The inversion process consists of four successive inversions using the same observed data but with decreasing correlation ranges in horizontal and vertical directions; therefore, it is a multiscale inversion and each inversion is for a given “scale”. The model parameter results of each inversion are used as initial model parameters for the next inversion.

Uncertainties on the model space are required for the definition of ${C}_{M}{}^{-1}$. They can be defined by specifying the spatial correlation ranges and the uncertainties on the parameters. Here the uncertainties on the parameters are kept constant: ${\sigma}_{Vp}=120\ \mathrm{m}/\mathrm{s}$ and ${\sigma}_{Vs}=100\ \mathrm{m}/\mathrm{s}$. The ranges of the horizontal and vertical correlations do not depend on the parameters but on the spatial location. The horizontal correlation depends on depth, and both horizontal and vertical ranges depend on the inversion scale. We define three types of a priori regions with respect to depth (see Table 1): “Quasi 1D” with large correlation ranges, “2D” with short correlation ranges (where we expect to have complex geological objects, see Figure 4), and “Transition”, with correlation ranges linearly varying with depth to match the ranges of the two adjacent regions.

Table 2 lists the different correlation ranges associated with the four successive inversion scales and for the a priori region types. The four successive inversion scales are denoted by (b), (c), (d), and (e) in relation to the results displayed in Figure 6 and Figure 7. The correlation ranges are gradually decreasing with the successive inversion scales in order to solve first the large wavelengths in the model before resolving the small wavelengths (i.e., the small model structures).

The results of the multiscale inversion process for the P-wave velocity field are displayed in Figure 6. The P-wave velocity starting model is obtained by a low-frequency full waveform inversion. It is also used as the starting model for the unconstrained inversion; such starting models can also be obtained from traveltime inversion, especially for crosswell data. For the inversion scale (b), the estimated P-velocity field is obtained from the first inversion scale, i.e., using correlation ranges of 320 m and 24 m in the 2D region, respectively, for the horizontal and vertical correlations. One can notice that the 2D structure cannot be resolved, as the horizontal correlation range is larger than the size of the triangular object. The 1D structures are partially retrieved. For the inversion scale (c), the estimated P-velocity field is obtained from the second inversion scale, starting from the result of scale (b), i.e., using correlation ranges of 80 m and 10 m in the 2D region, respectively, for the horizontal and vertical correlations. One can notice that the 2D structure starts to be resolved, as both correlation ranges are smaller than the size of the triangle. The 1D structures are well estimated, with some vertical smoothing due to the vertical correlation range of 10 m. For the inversion scale (d), the estimated P-velocity field is obtained from the third inversion scale, i.e., using correlation ranges of 20 m and 6 m in the 2D region, respectively, for the horizontal and vertical correlations. One can notice that the 2D structure gains spatial resolution, for both the triangle and the lower zigzag interface. The 1D structures are well resolved. For the inversion scale (e), the estimated P-velocity field is obtained from the fourth and last inversion scale, i.e., using correlation ranges of 5 m and 2 m in the 2D region, respectively, for the horizontal and vertical correlations. One can notice that the delineation of the 2D structure is close to the reference model.
As expected, the estimated field outside the zone between the two wells is not reliable.

The results of the multiscale inversion process for the S-wave velocity field are displayed in Figure 7. The results are similar to those for the P-wave velocity field (Figure 6). The main difference is clear at the fourth scale (e): the spatial resolution is almost perfect for the S-wave due to the smaller wavelength content associated with this mode. However, the inversion overestimates the velocity contrasts near the interfaces and produces some small oscillations in the estimated field, clearly visible in the light blue domain above the central triangle. This means that the number of degrees of freedom is too high regarding the amount of information provided by the data.

The misfit function for the successive multiscale inversion continuously decreases from one inversion scale to another (see Table 3).

The numbers of degrees of freedom per point are provided in Table 4 for the different inversion scales; they are calculated from Equation (14). The ratio of the number of degrees of freedom with respect to the number of model parameters (i.e., grid points), in other words the number of degrees of freedom per point, is provided. It increases more slowly for the quasi 1D region than for the 2D region, because only the vertical correlation range decreases there, while both horizontal and vertical ranges decrease in the 2D region. These ratios point out that the degrees of freedom are significantly lower in the early scales of the multiscale inversion, allowing better constraint of the problem and improving stability by defining an overdetermined system. During the last scale stages, the constraints are relaxed, allowing one to obtain a better resolution while avoiding artifacts in the estimated fields. Even for the last scale, the ratio indicates about 5 times fewer free parameters than in the unconstrained inversion.

For comparison purposes with the results of the constrained inversion, we performed an unconstrained inversion. In Figure 8, we display the results of the multiscale constrained inversion with respect to the unconstrained inversion for the same misfit of 0.3%. We can notice that the unconstrained inversion results are noisier than the constrained inversion results. The main artifacts can be found near the sources (a classical problem in borehole seismic full waveform inversion), below the triangle structure, above the source and receiver zones (migration smile type artifacts), and in the constant gradient 1D layers. These artifacts in the estimated fields are the consequence of the instability of the least-squares inversion problem when the problem is not correctly determined for all parameters, in other words, when it is partially underdetermined. The number of degrees of freedom is too large regarding the information provided by the data. Reducing the number of degrees of freedom of the problem by taking into account prior geological information (i.e., constraining the fields using spatial statistics where we have more prior information) allows one to overcome this problem, drastically reducing the number and the magnitude of the artifacts.

In this paper, we have illustrated how to introduce, in the inverse problem, constraints consistent with the least squares formalism (through covariance matrices) on both the data and the model space.

Constraining the seismic inverse problem in the data space is not straightforward but when possible, it is a very powerful tool. In the case of a multicomponent borehole seismic data, using the polarization in the inversion process allowed better recovery of the sharpness of the interfaces at the level of the receivers.

Constraining the seismic inverse problem in the model space can be achieved by defining a priori model parameters and evaluating the model uncertainties associated with this model. Horizontal and vertical spatial correlations using a Laplace distribution can be used to fill the model space covariance matrix in order to introduce a priori information favoring the tabular region versus more complex regions, by varying the ranges of these correlations accordingly. The inversion favors solutions that are consistent with our a priori information, whenever it does not contradict the information contained in the data. Moreover, adopting a multiscale type of inversion by relaxing the correlation ranges regularizes the inverse problem.

The incorporation of all our a priori knowledge of the parameters and of all statistical studies on the data allows not only the algorithmic stabilization of the inversion process, but also the reduction of the solution set for an underdetermined problem. The purpose is not necessarily to converge quickly towards a good model (in terms of residuals), but to prospect regions of the model space populated by models that are sensible a priori while also yielding the lowest possible misfit.

Conceptualization, M.C. and C.B.; methodology, M.C. and C.B.; validation, M.C. and C.B.; formal analysis; writing—original draft preparation, M.C. and C.B.; writing—review and editing, M.C. and C.B.

This research received no external funding.

The authors declare no conflict of interest.

In this appendix, we will explicitly state the relation between the correlation matrix and the degrees of freedom.

Let us consider a 1D Gaussian random field $X$ defined in section Quantification of number of degrees of freedom and characterized by its p.d.f. given by Equation (7).

If the covariance matrix $C$ is diagonal (i.e., no correlation between the random variables ${X}_{i}$), Equation (7) can be rewritten:

$$\begin{array}{lll}f\left(x\right)& =& \frac{1}{{\left(2\pi \right)}^{n/2}{{\displaystyle \prod}}_{i}{\sigma}_{i}}\mathrm{exp}\left(-\frac{1}{2}{\displaystyle {\displaystyle \sum}_{i}}{\left(\frac{{x}_{i}-{m}_{i}}{{\sigma}_{i}}\right)}^{2}\right)\\ & =& {\displaystyle {\displaystyle \prod}_{i}}\frac{1}{\sqrt{2\pi}{\sigma}_{i}}\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{{x}_{i}-{m}_{i}}{{\sigma}_{i}}\right)}^{2}\right)\\ & =& {\displaystyle {\displaystyle \prod}_{i}}{f}_{i}\left({x}_{i}\right),\end{array}$$

where ${\sigma}_{i}$ is the standard deviation of the Gaussian random variable ${X}_{i}$ for the ${i}^{th}$ point, and ${f}_{i}\left({x}_{i}\right)$ the p.d.f. associated with this variable ${X}_{i}$. For a Gaussian field, when correlation coefficients are null, the variables ${X}_{i}$ are independent and, as a consequence, the p.d.f. of the field is the product of the marginal p.d.f. of the variables, the ${f}_{i}$.

Assuming for simplicity that the ${m}_{i}$ and the ${\sigma}_{i}$ are constant, the field is made of $n$ independent, identically distributed variables. Considering that we have $n$ free parameters in this case, we now study the effect of the non-independence of the variables on the number of free parameters, incorporating correlation between variables through a non-diagonal covariance matrix.

Whatever the discrete random field, knowledge of the joint p.d.f. (like $f$ defined in Equation (7) for the Gaussian random field $X$) allows simulation of a realization of the field sequentially, using the marginal-conditional decomposition (valid whatever the order of the indexes):

$$f\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)={g}_{1}\left({x}_{1}\right)\cdot {g}_{2}\left({x}_{2}|{x}_{1}\right)\cdot {g}_{3}\left({x}_{3}|{x}_{1},{x}_{2}\right)\cdots {g}_{n}\left({x}_{n}|{x}_{1},{x}_{2},\cdots ,{x}_{n-1}\right),$$

where ${g}_{i}$ is the marginal-conditional p.d.f. of the random variable ${X}_{i}$, independent of the random variables ${X}_{j}$ for $j>i$ and conditional on the random variables ${X}_{k}$ for $k<i$. More precisely, ${g}_{i}$ is defined by the relation:

$${g}_{i}\left({x}_{i}|{x}_{1},{x}_{2},\cdots ,{x}_{i-1}\right)=\frac{\int d{x}_{i+1}\int d{x}_{i+2}\cdots \int d{x}_{n}\,f\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)}{\int d{x}_{i}\int d{x}_{i+1}\cdots \int d{x}_{n}\,f\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right)}.$$

Using Equation (A2), it is possible to simulate a realization of the field sequentially. First, we use ${g}_{1}\left({x}_{1}\right)$ to generate a realization of the random variable ${X}_{1}$, independently of all the other variables. Once ${x}_{1}={x}_{1}^{0}$ is determined, we use ${g}_{2}\left({x}_{2}|{x}_{1}^{0}\right)$ to generate a realization ${x}_{2}^{0}$ of the random variable ${X}_{2}$, dependent on the value ${x}_{1}^{0}$ but independent of the other random variables ${X}_{3}$ to ${X}_{n}$. We repeat this procedure using ${g}_{3}$, then ${g}_{4}$, and so on, until the last value ${x}_{n}^{0}$ is determined, thereby obtaining a realization ${x}^{0}$ of the random field $X$.
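For a Gaussian field, the conditional p.d.f.s ${g}_{i}$ are themselves Gaussian, so the sequential procedure above can be sketched directly. The following is a minimal illustration (not the paper's code): the conditional mean and variance come from the standard partitioned-Gaussian formulas.

```python
import numpy as np

def sequential_gaussian(m, C, rng):
    """Draw one realization of N(m, C) by sampling each X_i from its
    Gaussian conditional g_i(x_i | x_1, ..., x_{i-1})."""
    n = len(m)
    x = np.empty(n)
    for i in range(n):
        if i == 0:
            mu, var = m[0], C[0, 0]          # marginal of the first variable
        else:
            Cii = C[:i, :i]                  # covariance of already-drawn vars
            c = C[i, :i]                     # cross-covariance with X_i
            w = np.linalg.solve(Cii, c)      # conditioning (kriging-type) weights
            mu = m[i] + w @ (x[:i] - m[:i])  # conditional mean
            var = C[i, i] - w @ c            # conditional variance (sigma_i^c)^2
        x[i] = rng.normal(mu, np.sqrt(var))
    return x
```

Drawing many realizations and computing their empirical covariance recovers $C$, which is a convenient sanity check of the decomposition.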

We can link the degree of freedom of the variable ${X}_{i}$, when the values ${x}_{1}$ to ${x}_{i-1}$ are known, to the standard deviation ${\sigma}_{i}^{c}$ of the Gaussian defined by the p.d.f. ${g}_{i}\left({x}_{i}|{x}_{1},{x}_{2},\cdots ,{x}_{i-1}\right)$. To account for the correlation between the variables ${X}_{i}$, independently of the values of the standard deviations ${\sigma}_{i}$ of the marginal p.d.f.s ${f}_{i}\left({x}_{i}\right)$, we can then define the degree of freedom of the variable ${X}_{i}$ (when ${x}_{1}$ to ${x}_{i-1}$ are known) as the following ratio:

$$d=\frac{{\sigma}_{i}^{c}}{{\sigma}_{i}},$$

which equals one when ${X}_{i}$ is not correlated with the variables $\left({X}_{1},{X}_{2},\cdots ,{X}_{i-1}\right)$ and zero when the correlation is perfect.

In order to simplify the calculation of the degrees of freedom, we introduce here the $LU$ decomposition technique for Gaussian field simulation, as this method can be interpreted in terms of sequential simulation using Equation (A2).

A way to simulate a Gaussian discrete random field $X$ defined by the p.d.f. in Equation (7) ($n$ points with expectation $m$ and covariance matrix $C$) is to use the $LU$ decomposition of the covariance matrix $C$:

$$C=LU=L{L}^{T},$$

where $L$ denotes the lower left triangular matrix and $U$ the upper right one.

Considering a random field $E$ constituted by $n$ random variables, independently and identically distributed following a Gaussian law with zero mean and unit standard deviation, the field $m+LE$ is then equivalent to $X$. Therefore, to simulate a realization $x$ of the random field $X$, we can use a realization $e$ of the field $E$ and write:

$$x=m+Le\text{}.$$

Let ${l}_{ij}$ be the elements of the matrix $L$. The relation given by Equation (A6) can be interpreted in terms of sequential simulation: the first line gives ${x}_{1}={m}_{1}+{l}_{11}{e}_{1}$, meaning that we simulate the realization of ${X}_{1}$ independently of the other variables, as if we used the 1D marginal p.d.f. ${g}_{1}\left({x}_{1}\right)$ defined in Equation (A2):

$${g}_{1}\left({x}_{1}\right)=\frac{1}{\sqrt{2\pi}{\sigma}_{1}}\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{{x}_{1}-{m}_{1}}{{\sigma}_{1}}\right)}^{2}\right)={f}_{1}\left({x}_{1}\right).$$

Once the value of ${x}_{1}$ is fixed, the second line of Equation (A6) means that the second random variable ${X}_{2}$ can be written:

$${X}_{2}={m}_{2}+{l}_{21}{e}_{1}+{l}_{22}{E}_{2}={m}_{2}+{l}_{21}\left(\frac{{x}_{1}-{m}_{1}}{{l}_{11}}\right)+{l}_{22}{E}_{2},$$

meaning that we simulate the realization of ${X}_{2}$ independently of ${X}_{3},\cdots ,{X}_{n}$ but dependent on the value ${X}_{1}={x}_{1}$, as if we used the conditional marginal p.d.f. ${g}_{2}\left({x}_{2}|{x}_{1}\right)$ (see Equation (A2)). A given line $i$ of Equation (A6) means that we can simulate the random variable ${X}_{i}$ conditionally on the determined values of the preceding variables ${x}_{1}$ to ${x}_{i-1}$, as if we used the proper marginal-conditional p.d.f. ${g}_{i}\left({x}_{i}|{x}_{1},{x}_{2},\cdots ,{x}_{i-1}\right)$ (see Equation (A2)).
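The simulation described by Equation (A6) reduces to a few lines with a Cholesky factorization (numpy's `cholesky` returns exactly the lower triangular $L$ with $C=L{L}^{T}$). A minimal sketch, with the function name being ours:

```python
import numpy as np

def lu_simulate(m, C, rng):
    """One realization x = m + L e of N(m, C), with C = L L^T (Equation (A6))."""
    L = np.linalg.cholesky(C)        # lower triangular factor of C
    e = rng.standard_normal(len(m))  # independent N(0, 1) variables
    return m + L @ e
```

The diagonal elements of `L` are the conditional standard deviations ${\sigma}_{i}^{c}$ discussed in the text, which is what makes this factorization convenient for counting degrees of freedom.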

Let us come back to the estimation of the degrees of freedom. The standard deviation ${\sigma}_{i}^{c}$ is given here by the ${i}^{th}$ diagonal element ${l}_{ii}$ of the matrix $L$. This standard deviation is less than the standard deviation of the variable ${X}_{i}$ taken independently of the other variables (i.e., the standard deviation ${\sigma}_{i}$ of the Gaussian defined by the marginal p.d.f. ${f}_{i}\left({x}_{i}\right)$). When the correlation between ${X}_{i}$ and the variables ${X}_{1}$ to ${X}_{i-1}$ is strong, the ratio $\frac{{l}_{ii}}{{\sigma}_{i}}$ is small. When ${X}_{i}$ is independent of the variables ${X}_{1}$ to ${X}_{i-1}$, the ratio equals 1, meaning that the freedom of the variable is total. We can say that when the variables are independent (no correlation), the number of free parameters is $n={\sum}_{i}1$, and when correlation exists, it is reduced to:

$$N={\displaystyle \sum}_{i}\frac{{l}_{ii}}{{\sigma}_{i}}.$$

The covariance matrix can be written in the following form:

$$C=SRS,$$

where $R$ is the correlation matrix and $S$ the diagonal matrix of the standard deviations:

$$S=\left(\begin{array}{cccc}{\sigma}_{1}& 0& \cdots & 0\\ 0& {\sigma}_{2}& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & {\sigma}_{n}\end{array}\right).$$

The $LU$ decomposition of $R$ gives $R=M{M}^{T}$, with $L=SM$. Therefore, the field $X$ can be written (see Equation (A6)): $X=m+LE=m+SME.$

The number of free parameters is then:

$$N={\displaystyle \sum}_{i}\frac{{l}_{ii}}{{\sigma}_{i}}=trace\left(M\right).$$
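The count $N=\mathrm{trace}\left(M\right)$ is straightforward to evaluate numerically. A hedged sketch (the function name is ours):

```python
import numpy as np

def degrees_of_freedom(C):
    """Number of free parameters N = sum_i l_ii / sigma_i = trace(M),
    where C = L L^T (Cholesky) and R = M M^T is the correlation matrix."""
    sigma = np.sqrt(np.diag(C))    # marginal standard deviations
    L = np.linalg.cholesky(C)      # diagonal of L holds the sigma_i^c
    return np.sum(np.diag(L) / sigma)
```

For a diagonal covariance the function returns $n$; for a correlated pair with $\rho =4/5$ it returns $1+\sqrt{1-{\rho}^{2}}=1.6$, matching the bivariate example below.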

Let us consider a pair of random variables $\left(X,Y\right)$ characterized by the following p.d.f.:

$$f\left(x,y\right)=\frac{1}{2\pi \sqrt{\mathrm{det}\left(C\right)}}\mathrm{exp}\left(-\frac{1}{2}{\left(\begin{array}{c}x-{m}_{x}\\ y-{m}_{y}\end{array}\right)}^{T}{C}^{-1}\left(\begin{array}{c}x-{m}_{x}\\ y-{m}_{y}\end{array}\right)\right),$$

where $C$ is the covariance matrix, for which we have:

$$C=\left(\begin{array}{cc}{\sigma}_{x}^{2}& \rho {\sigma}_{x}{\sigma}_{y}\\ \rho {\sigma}_{x}{\sigma}_{y}& {\sigma}_{y}^{2}\end{array}\right),\quad \mathrm{det}\left(C\right)={\sigma}_{x}^{2}{\sigma}_{y}^{2}\left(1-{\rho}^{2}\right),\quad {C}^{-1}=\frac{1}{1-{\rho}^{2}}\left(\begin{array}{cc}\frac{1}{{\sigma}_{x}^{2}}& \frac{-\rho}{{\sigma}_{x}{\sigma}_{y}}\\ \frac{-\rho}{{\sigma}_{x}{\sigma}_{y}}& \frac{1}{{\sigma}_{y}^{2}}\end{array}\right),$$

where $\rho$ is the correlation between the random variables $X$ and $Y$. An example is given in Figure A1, with the following values: ${m}_{x}={m}_{y}=0$, ${\sigma}_{x}={\sigma}_{y}=1$, $\rho =4/5$.

For this pair, the decomposition of the p.d.f. (Equation (A2)) is written:

$$f\left(x,y\right)={f}_{x}\left(x\right)\cdot {f}_{y|x}\left(y|x\right),$$

where

$${f}_{x}\left(x\right)=\frac{1}{\sqrt{2\pi}{\sigma}_{x}}\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{x-{m}_{x}}{{\sigma}_{x}}\right)}^{2}\right),$$

$${f}_{y|x}\left(y|x\right)=\frac{1}{\sqrt{2\pi}{\sigma}_{y}\sqrt{1-{\rho}^{2}}}\mathrm{exp}\left(-\frac{1}{2}{\left(\frac{y-{m}_{y}-\rho {\sigma}_{y}\left(\frac{x-{m}_{x}}{{\sigma}_{x}}\right)}{{\sigma}_{y}\sqrt{1-{\rho}^{2}}}\right)}^{2}\right).$$

We can note that the characteristics of these p.d.f.s are given by the elements of the lower left triangular matrix of the $LU$ decomposition of the covariance:

$$L=\left(\begin{array}{cc}{\sigma}_{x}& 0\\ \rho {\sigma}_{y}& {\sigma}_{y}\sqrt{1-{\rho}^{2}}\end{array}\right).$$

Indeed, the first line corresponds to the standard deviation of the p.d.f. ${f}_{x}\left(x\right)$. The conditional p.d.f. ${f}_{y|x}\left(y|x\right)$ is plotted in Figure A1. It is characterized by its mean ${m}_{c}={m}_{y}+\rho {\sigma}_{y}\left(\frac{x-{m}_{x}}{{\sigma}_{x}}\right)$ and its standard deviation ${\sigma}_{c}={\sigma}_{y}\sqrt{1-{\rho}^{2}}$. In the second row, the first element corresponds to the scale factor of the correction of the mean of the conditional p.d.f. ${f}_{y|x}$, and the second element to the standard deviation of the conditional p.d.f. In summary, the diagonal elements are the standard deviations of the random variables $X$ and $Y|X$. The effect of the correlation $\rho$ is to reduce, by the factor $\sqrt{1-{\rho}^{2}}$, the degree of freedom of the second variable $Y$ when $X$ is known, as seen in Figure A1. The total degree of freedom of this pair is then (taking as a reference the uncorrelated pair characterized by the diagonal covariance matrix, $\rho =0$):

$$d=\frac{{\sigma}_{x}}{{\sigma}_{x}}+\frac{{\sigma}_{y}\sqrt{1-{\rho}^{2}}}{{\sigma}_{y}}=1+\sqrt{1-{\rho}^{2}}.$$
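With the numerical values used in Figure A1 ($\sigma_x=\sigma_y=1$, $\rho=4/5$), this worked instance gives:

```latex
d = 1 + \sqrt{1-\left(\tfrac{4}{5}\right)^{2}} = 1 + \tfrac{3}{5} = 1.6
```

That is, out of 2 nominal degrees of freedom the pair retains 1.6: knowing $X$ removes 40% of the freedom of $Y$.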

We can note that $d=1$ when the correlation is perfect ($\rho =\pm 1$) and $d=2$ when there is no correlation ($\rho =0$).

1. Virieux, J.; Operto, S. An overview of full waveform inversion in exploration geophysics. Geophysics **2009**, 74, WCC1–WCC26.
2. Menke, W. Geophysical Data Analysis: Discrete Inverse Theory; Academic Press: Orlando, FL, USA, 1984.
3. Tarantola, A. Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation; Elsevier: New York, NY, USA, 1987.
4. Tikhonov, A.; Arsenin, V. Solution of Ill-Posed Problems; Winston: Washington, DC, USA, 1977.
5. Operto, S.; Virieux, J.; Dessa, X.; Pascal, G. Crustal imaging from multifold ocean bottom seismometer data by frequency-domain full-waveform tomography: Application to the eastern Nankai trough. J. Geophys. Res. **2006**, 111, B09306.
6. Guitton, A.; Ayeni, G.; Gonzales, G. A preconditioning scheme for full waveform inversion. In SEG Technical Program Expanded Abstracts 2010; Society of Exploration Geophysicists: Tulsa, OK, USA, 2010; pp. 1008–1012.
7. Guitton, A. A blocky regularization scheme for full waveform inversion. In SEG Technical Program Expanded Abstracts 2011; Society of Exploration Geophysicists: Tulsa, OK, USA, 2011; pp. 2418–2422.
8. Abubakar, A.; Hu, W.; Habashy, T.M.; van den Berg, P.M. Application of the finite-difference contrast-source inversion algorithm to seismic full-waveform data. Geophysics **2009**, 74, WCC47–WCC58.
9. Herrmann, F.J.; Erlangga, Y.A.; Lin, T.T.Y. Compressive simultaneous full-waveform simulation. Geophysics **2009**, 74, A35–A40.
10. Loris, I.; Douma, H.; Nolet, G.; Daubechies, I.; Regone, C. Nonlinear regularization techniques for seismic tomography. J. Comput. Phys. **2010**, 229, 890–905.
11. Asnaashari, A.; Brossier, R.; Garambois, S.; Audebert, F.; Thore, P.; Virieux, J. Regularized seismic full-waveform inversion with prior model information. Geophysics **2013**, 78, R25–R36.
12. Asnaashari, A.; Brossier, R.; Garambois, S.; Audebert, F.; Thore, P.; Virieux, J. Time-lapse seismic imaging using regularized full-waveform inversion with a prior model: Which strategy? Geophys. Prospect. **2015**, 63, 78–98.
13. Polak, E.; Ribière, G. Note sur la convergence de méthodes de directions conjuguées. Rev. Fr. Inform. Rech. Opér. **1969**, 16, 35–43.
14. Crase, E. Robust Elastic Nonlinear Inversion of Seismic Waveform Data. Ph.D. Thesis, University of Houston, Houston, TX, USA, 1989.
15. Jurkevics, A. Polarization analysis of three-component array data. Bull. Seismol. Soc. Am. **1988**, 78, 1725–1743.
16. Virieux, J. P-SV wave propagation in heterogeneous media: Velocity-stress finite-difference method. Geophysics **1986**, 51, 889–901.

| Depth (m) | Region Type | Number of Vertical Points |
|---|---|---|
| 800–1120 | Quasi 1D | 161 |
| 1120–1180 | Transition | 29 |
| 1180–1340 | 2D | 81 |
| 1340–1400 | Transition | 29 |
| 1400–1600 | Quasi 1D | 101 |

| Region Type | Correlation Ranges (m) | Inversion Scale (b) | Inversion Scale (c) | Inversion Scale (d) | Inversion Scale (e) |
|---|---|---|---|---|---|
| Quasi 1D | $({r}_{x},{r}_{z})$ | (1000, 24) | (1000, 10) | (1000, 6) | (1000, 2) |
| 2D | $({r}_{x},{r}_{z})$ | (320, 24) | (80, 10) | (20, 6) | (5, 2) |

| Inversion Parameters | Inversion Scale (b) | Inversion Scale (c) | Inversion Scale (d) | Inversion Scale (e) |
|---|---|---|---|---|
| Initial Misfit | 45.5% | 27.9% | 6.55% | 0.81% |
| Final Misfit | 27.9% | 6.55% | 0.81% | 0.14% |
| Iteration # | 30 | 30 | 50 | 50 |

| Region Type | Inversion Scale (b) | Inversion Scale (c) | Inversion Scale (d) | Inversion Scale (e) |
|---|---|---|---|---|
| Quasi 1D | 1919 | 2812 | 3416 | 4554 |
| Transition | 267 | 468 | 622 | 857 |
| 2D | 1022 | 2922 | 6792 | 15,725 |
| All | 3208 | 6202 | 10,830 | 21,136 |
| DoF/point | 2.8% | 5.5% | 9.6% | 18.8% |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).