Constrained Full Waveform Inversion for Borehole Multicomponent Seismic Data

Full-waveform inversion for borehole seismic data is an ill-posed problem and constraining the problem is crucial. Constraints can be imposed on the data and model space through covariance matrices. Usually, they are set to a diagonal matrix. For the data space, signal polarization information can be used to evaluate the data uncertainties. The inversion forces the synthetic data to fit the polarization of observed data. A synthetic inversion for a 2D-2C data estimating a 1D elastic model shows a clear improvement, especially at the level of the receivers. For the model space, horizontal and vertical spatial correlations using a Laplace distribution can be used to fill the model space covariance matrix. This approach reduces the degree of freedom of the inverse problem, which can be quantitatively evaluated. Strong horizontal spatial correlation distances favor a tabular geological model whenever it does not contradict the data. The relaxation of the spatial correlation distances from large to small during the iterative inversion process allows the recovery of geological objects of the same size, which regularizes the inverse problem. Synthetic constrained and unconstrained inversions for 2D-2C crosswell data show the clear improvement of the inversion results when constraints are used.


Introduction
Full-waveform inversion of seismic data allows one to obtain an image of the subsurface through the determination of a certain number of physical parameters.Borehole data (VSP and crosswell) are of special interest because of the geometry of acquisition, which provides more informative signals on the medium properties than surface seismic data (high frequency signal, energetic P-S conversions).However, the lack of seismic data redundancy (e.g., few shots) renders the inversion problem underdetermined.In general, full-waveform inversion is an ill-posed problem, in the sense that an infinite number of models match the data [1].It is classically solved with local optimization schemes and is therefore strongly dependent on the starting model definition.This starting model should predict arrival times with errors less than half of the period to cancel the cycle-skipping ambiguity [1].The multiscale strategy performed by moving from low to high frequencies during the inversion allows reduction of the nonlinearities and cycle skipping issues of the inversion and helps convergence toward the global minimum.Regularizations are conventionally applied to the inversion in the model space to make it better posed [2,3].Tikhonov and Arsenin [4] have proposed a regularization strategy within the optimization step to find the smoothest model that explains the data.Preconditioning techniques acting as a smooth operator on the model update [5] may add strong prior features of the expected structure through directive Laplace preconditioning, as in Guitton et al. [6].
Regularization schemes that preserve edges and contrasts have also been developed for specific full waveform inversion applications through a 1 model penalty [7] or through a multiplicative regularization [8].Regularization can also be expressed in the curvelet or wavelet domains [9,10].In such domains, the 1 norm minimization is generally preferred for the model term penalty because it ensures sparsity in the model space.All the previous regularization techniques allow stabilization of the inversion scheme by assuming a particular representation or structure of the velocity model (smoothness, sparsity, and so on).However, geological information, taking into account prior model information, is generally not used in classical full waveform inversion implementation; however, there are some examples [11,12].In addition, most of the weighting operators or covariance matrices associated with the model parameters are set to identity or in the best cases to a diagonal matrix.The same applies to the weighting operators or covariance matrices associated with the data.
Weighting operators on the data and models or inverse of the data and model covariance operators for least squares in the frame of the Bayesian formulation [3] are introduced in the misfit function.In this paper, we will show the benefit of introducing covariance matrices in the data and model space.On the one hand, we will illustrate the impact on constraining the seismic inverse problem on the data space for the case of a synthetic two-component borehole seismic data by using the polarization analysis to fill a block diagonal data space covariance matrix.On the other hand, we will illustrate the benefit of constraining the seismic inverse problem on the model space by performing a crosswell synthetic experiment.The horizontal and vertical spatial correlations using Laplace distribution are used to fill the model space covariance matrix in order to introduce an a priori solution, favoring a tabular medium whenever it doesn't contradict the information contained in the data.

Constrained Least Squares Inversion
Using the probabilistic formalism developed by Tarantola [3] and considering only the data space term in the equation, we rewrite the misfit function in the vicinity of the current model m n : where: • Applied to the waveform inversion problem of estimating the elastic parameters and the density of the earth, the minimization of the misfit function can be solved by iterative gradient methods.
The iterative process of a nonlinear inversion can be summarized as follows [3]: for a given iteration k, from the current model m k (i.e., the discretized physical fields characterizing the medium), we perform simulations of the wave in order to obtain the synthetic data d cal k = g(m k ).Thus, comparing them to observed data d obs , we obtain the residuals δd k = d obs − d cal k .These residuals are weighted δd k = C −1 D δd k (the hat denotes the dual space) and back-propagated using the wave propagation equation again to obtain the gradient in the dual model space γk = G T k C −1 D δd k (where G k is the derivative of the function g over the model space at the point m k and where T denotes the transpose operation) and consequently, the gradient in the model space γ k = C M γk .The gradient indicates the steepest ascent direction of the misfit function in the model space.We modify this direction using the conjugate gradient algorithm ϕ k = γ k + α k ϕ k−1 , where ϕ k is the conjugate gradient and α k is defined from the gradient [13].Finally, using an additional simulation, we optimize the step length µ k [14] in the upgrading equation m k+1 = m k − µ k ϕ k and we update the model.

Constraints on the Data Space
The quantification of uncertainties in seismic data is often neglected, because in many cases it is a difficult task.For instance, data errors generated by receiver response are correlated in time, which means accounting for them is not straightforward.To illustrate the possible contribution of constraints provided by the analysis of uncertainties of data, we have performed two inversions from the synthetic data experiment.The first inversion is done without any data constraints, whereas in the second one we have incorporated polarization wave analysis in the covariance matrix C D .This matrix treats the uncertainties on particle velocities according to the eigenbasis of the particle motion.
For the case of multi-component data, the uncertainty of a component may be correlated to the uncertainty of the other components.For each time for each of the receiver components, we can define a 2 × 2 matrix for 2-component data (as in the example bellow), or a 3 × 3 matrix for 3-component data.In that case, the covariance matrix on the data space C D is not a diagonal matrix.In order to take into account polarization, we consider that the uncertainties are proportional to the local cross-correlation matrix of the signal (the data) and that the signal is locally stationary (over 2 periods).The cross-correlation matrix between components i and j of the signal s r,t is computed with zero lag (τ = 0) for any receiver r and any time t: where: with d i r (u) being the trace for the component i of the receiver r of the observed data and w t (u) a Hamming time window centered on time t with a length of typically 2 periods.One can use other time tapering windows classically used in spectral analysis (Blackman, Nuttall, etc).

Cross-Correlations and Polarization
The polarization of the multicomponent signal is estimated from this matrix C ij r,t (τ = 0) and the matrix of the uncertainties has the same eigenvectors (polarization directions) than the cross-correlation matrix.The uncertainty has the same shape as the polarization of the signal.The ratios among the eigenvalues are the same, in other words, the rectilinearity and planarity of the polarization [15] is conserved.
The polarized signal for rectilinear polarization can be modeled by: where d(t) is the 3C signal, a is the polarization vector, s(t) the scalar signal along the polarization vector, and n(t) the 3C Gaussian noise.
In the frequency domain, the equation can be expressed as: where the hat denotes the frequency domain.
Let us consider the spectral matrix of the 3C signal defined by: where the upper bar denotes the complex conjugate, T denotes the transpose operator and N( f ) being the spectral matrix of the noise.Note that (aŝ)• nT = n•a T ŝ = 0 because noise and signal are considered uncorrelated.The polarization vector is an eigenvector of the spectral matrix (for isotropic noise) for any frequency f : When integrating the spectral matrix over R, one obtains the cross-correlation matrix C(τ = 0) with zero lag with the same eigenvectors.
The structure of the covariance matrix on the data space C D is derived from the previous equations.For instance, for a 2C receiver, a given source or receiver couple, and at any time t, the corresponding 2 × 2 submatrix is defined as: where V is the eigenvectors matrix (column), σ D is the standard deviation of the (isotropic) noise for this couple, and λ I and λ II are the max and min of the two positive eigenvalues of the spectral matrix, respectively.
Let us consider the sub-matrix for the first data 2C-trace of the covariance matrix on the data space: where ρ i , σ i,X , and σ i,Z are, respectively, the correlation coefficient, the standard deviation for the X component, and the standard deviation for the Z component for a given time sample i, and nt being the total number of time samples.The matrix C D is a block diagonal matrix and its inverse (i.e., C −1 D , is another block diagonal matrix), composed of the inverse of each block.

An Offset VSP Synthetic Example
In this numerical example, the acquisition geometry is an offset VSP with a unique source at 500 m offset and the antenna is between 580 and 970 m depth in a vertical well (see Figure 1 for more details).The isotropic elastic model depends only on the depth (1D model).The "true" model is defined in Figure 1.The horizontal and vertical components of the "observed" data obtained from the true model are displayed in Figure 1.The direct problem (seismic modeling) is based on discretizing the wave equation by the finite differences method (FDM) [16].
The inverted parameters are the P-wave velocity, the S-wave velocity, and the density.The starting models are simple, the water layer is considered as known, and the other layers are replaced by a constant vertical gradient for each parameter (gray solid lines in Figure 2).The test consists of comparing the results of inversion when data uncertainties are isotropic and when the polarization is included in the data uncertainties.As shown in Figure 2, the inversion results exhibits in both cases a good fit in the receiver zone, while the estimation is not good in the upper part of the model.One can clearly note that the S-wave velocity model is recovered with more detail than the P-wave velocity model.This is due to the fact that converted S-waves are present in borehole seismic data and also because the S-waves have a shorter wavelength than P-waves, hence, a better spatial resolution.The final residuals for both inversion experiments are small; a few percent of the misfit.The inversion results when polarization is included are significantly better than for the isotropic uncertainties.More precisely, for the case of polarization, both P-wave and S-wave estimated velocity fields are well fitted in the antenna zone.As shown in Figure 2, the inversion results exhibits in both cases a good fit in the receiver zone, while the estimation is not good in the upper part of the model.One can clearly note that the S-wave velocity model is recovered with more detail than the P-wave velocity model.This is due to the fact that converted S-waves are present in borehole seismic data and also because the S-waves have a shorter wavelength than P-waves, hence, a better spatial resolution.The final residuals for both inversion experiments are small; a few percent of the misfit.The inversion results when polarization is included are significantly better than for the isotropic uncertainties.More precisely, for the case of polarization, both P-wave and S-wave estimated velocity fields are well fitted in the antenna zone.

Constraints on the Model Space
We can distinguish two kinds of geological a priori information that can be incorporated in an inversion process.On the one hand, information obtained by measurements (well logging), that we consider as "objective" information, allows the definition of an a priori model at the well vicinity.On the other hand, information coming from geological interpretations, such as geological layer dips, that we qualify as "subjective" information, can be simplified and introduced into the covariance matrix C M .

Quantification of Number of Degrees of Freedom
In order to evaluate the importance of constraints on an inversion, it is necessary to quantify the number of degrees of freedom.This number can be estimated from the set of parameters as a function of existing correlations among them and independently of their standard deviations.The number N of free parameters is equal to the number n of parameters in the absence of correlations, and it is reduced to 1 in the case of perfect correlations.

General Case
Let us consider a 1D Gaussian random field X, the domain is discrete and finite consisting of a set of n points; this field is equivalent to the set of random variables (X 1 , X 2 , • • • , X n ).It can be characterized by the following probability density function (or p.d.f.): where x is a realization, m is the expectation of the field (E(X) = m), and C is the covariance matrix of the field.The covariance matrix can be rewritten with the following form = SRS, where R is the correlation matrix and S the diagonal matrix of the standard deviations.
If there is no correlation among the random variables X i , then R = I and the covariance matrix is diagonal and the number of free parameters is then n.If a correlation exists, we then use the LU decomposition method to estimate the number of free parameters (see Appendices A and B for more details).We can write R = MM T where M is the left lower triangle matrix of the LU decomposition of the matrix R. The number of free parameters is then reduced to:

Spatial Correlation on the Model Space
For illustration purposes, let us consider the special case where the spatial correlations are described by an exponential model.Thus, for any pair of points M 1 and M 2 (belonging to the same horizontal line), the correlation ρ between the pair of random variables X 1 and X 2 corresponding to the points M i can be expressed as a function of the distance: where d(•) is a distance and r is the range of the correlation.By setting the variance for each variable to be constant σ i = σ 1 , the p.d.f. of Equation ( 7) reduces to, This simplifies, as we will show below, the study of the relation between the matrix R and the degrees of freedom using an exponential correlation function.
Let us consider the same Gaussian field X, but this time defined on a defined 1D domain regularly sampled (as for a grid) containing n points.The distances between points are multiples of the step ∆x and the correlation matrix has the following form (for ordered points): where One interesting property of the R matrix is that the M matrix of the LU decomposition has a simple form This means that, in terms of a sequential simulation, the n − 1 variables, X 2 to X n , have the same degree of freedom: b.The total degrees of freedom N of the field X is then: where for n parameters (or n grid points), the number of free parameters, considered as the degrees of freedom, of the inversion is N. Let us consider the same Gaussian field , but this time defined on a defined 1D domain regularly sampled (as for a grid) containing  points.The distances between points are multiples of the step ∆ and the correlation matrix has the following form (for ordered points): where One interesting property of the  matrix is that the  matrix of the  decomposition has a simple form with  = √1 −  .This means that, in terms of a sequential simulation, the  − 1 variables,  to  , have the same degree of freedom: .The total degrees of freedom  of the field  is then: where for  parameters (or  grid points), the number of free parameters, considered as the degrees of freedom, of the inversion is .

Seismic Crosswell Numerical Experiment Example
This numerical example is an illustration how to incorporate the "subjective" geological information that our medium is mainly tabular but might contain complex geological structures using the covariance matrix on the model space.

Seismic Crosswell Numerical Experiment Example
This numerical example is an illustration how to incorporate the "subjective" geological information that our medium is mainly tabular but might contain complex geological structures using the covariance matrix on the model space.

Description of the Crosswell Experiment and Observed Data
The reference subsurface models are shown in Figure 4.The word "reference" indicates that this model is used to generate the synthetic seismic data used as observed data (see Figure 5) during the inversion process.It is also the model to be retrieved by applying the full-wave inversion method to the observed data.The medium for wave propagation is elastic isotropic.The parameters to be inverted are discretized physical fields: vertical P-wave and S-wave velocities.For simplicity, the density (ρ) is not inverted-the field is constant with ρ = 2500 kg•m −3 and it fixed at the correct value in the different inversions.The medium for wave propagation is elastic isotropic.The parameters to be inverted are discretized physical fields: vertical P-wave and S-wave velocities.For simplicity, the density () is not invertedthe field is constant with  = 2500 kg m and it fixed at the correct value in the different inversions.triangles denote the locations of the receivers.On can notice that the structures of these reference velocity fields are the same for P-and S-wave.It is composed of 1D regions (invariance along the horizontal axis) in the upper part of the model and at the bottom with homogeneous layers and other layers with a velocity increasing linearly with depth (vertical constant gradient).The middle part of the model exhibits complex 2D structures.

Multiscale Constrained Inversion
The inversion process consists of four successive inversions using the same observed data but with decreasing correlation ranges in horizontal and vertical directions; therefore, it is a multiscale inversion and each inversion is for a given "scale".The model parameter results of each inversion are used as initial model parameters for the next inversion.
Uncertainties on the model space are required for the definition of C M −1 .It can be defined by specifying the spatial correlation ranges and the uncertainties on the parameters.Here the uncertainties on the parameters are kept constant: σ V p = 120 m/s and σ Vs = 100 m/s.The ranges of the horizontal and vertical correlations do not depend on the parameters but on the spatial location.The horizontal correlation depends on depth and both horizontal and vertical ranges depend on the inversion scale.We define three types of a priori regions with respect to depth (see Table 1): "Quasi 1D" with large correlation ranges, "2D" with short correlation ranges (where we expect to have complex geological objects, see Figure 4), and "Transition", with correlation ranges linearly varying with depth to match the ranges of the two adjacent regions.Table 2 lists the different correlation ranges associated with the four successive inversion scales and for the a priori region types.The four successive inversion scales are denoted by (b), (c), (d), and (e) in relation to the results displayed in Figures 6 and 7.The correlation ranges are gradually decreasing with the successive inversion scales in order to solve first the large wavelengths in the model before resolving the small wavelengths (i.e., the small model structures).be resolved as both correlation ranges are smaller than the size of the triangle.The 1D structures are well estimated, with some vertical smoothening due to the vertical correlation range of 10 m.For the inversion scale (d), the estimated P-velocity field is obtained from the third inversion scale, i.e., using correlation ranges of 20 m and 6 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the 2D structure is getting better spatial resolution, for both the triangle and the lower zigzag interface.The 1D structures are well resolved.For the inversion scale (e), the estimated P-velocity field is obtained from the fourth and last inversion scale, i.e., using correlation ranges of 5 m and 2 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the delineation of the 2D structure is close to the reference model.As expected, the estimated field outside the zone between the two wells is not reliable.The results are similar to the P-wave velocity field (Figure 6).The main difference is clear at the fourth scale (e): the spatial resolution is almost perfect for the S-wave due to smaller wavelength content associated to this mode.However, the inversion process provides an overestimation of velocity contrast near the interfaces and some small oscillations in the estimated field clearly visible for the light blue domain above the central triangle.It means that the number of degrees of freedom is too high regarding the amount of information provided by the data.The S-wave estimated field exhibits a better spatial resolution than the P-wave (see Figure 6) due to the smaller wavelength content associated with this mode.Small artifacts can be observed around the 2D triangle structure, pointing out the fact that the problem is less constrained in this part of the model.The results of the multiscale inversion process for the P-wave velocity field are displayed in Figure 6.The P-wave velocity starting model is obtained by a low-frequency full waveform inversion.It is also used as the starting model for the unconstrained inversion; such starting models can also be obtained from traveltime inversion, especially for crosswell data.The estimated P-velocity field is obtained from the first inversion scale, i.e., using correlation ranges of 320 m and 24 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the 2D structure cannot be resolved, as the horizontal correlation range is larger than the size of the triangular object.The 1D structures are partially retrieved.For the inversion scale (c), the estimated P-velocity field is, obtained from the inversion scale (b), i.e., using correlation ranges of 80 m and 10 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the 2D structure start to be resolved as both correlation ranges are smaller than the size of the triangle.The 1D structures are well estimated, with some vertical smoothening due to the vertical correlation range of 10 m.For the inversion scale (d), the estimated P-velocity field is obtained from the third inversion scale, i.e., using correlation ranges of 20 m and 6 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the 2D structure is getting better spatial resolution, for both the triangle and the lower zigzag interface.The 1D structures are well resolved.For the inversion scale (e), the estimated P-velocity field is obtained from the fourth and last inversion scale, i.e., using correlation ranges of 5 m and 2 m in the 2D region, respectively, for the horizontal and vertical correlations.One can notice that the delineation of the 2D structure is close to the reference model.As expected, the estimated field outside the zone between the two wells is not reliable.
The results of the multiscale inversion process for the S-wave velocity field are displayed in Figure 7.The results are similar to the P-wave velocity field (Figure 6).The main difference is clear at the fourth scale (e): the spatial resolution is almost perfect for the S-wave due to smaller wavelength content associated to this mode.However, the inversion process provides an overestimation of velocity contrast near the interfaces and some small oscillations in the estimated field clearly visible for the light blue domain above the central triangle.It means that the number of degrees of freedom is too high regarding the amount of information provided by the data.
The misfit function for the successive multiscale inversion continuously decreases from one inversion scale to another (see Table 3).The number of degrees of freedom per point are provided in Table 4 for the different inversion scales.These numbers of freedoms are calculated from Equation (14).The ratio of the number of degrees with respect to the number of model parameters (i.e., grid points), in other words the number of degrees of freedom per point, is provided.It is increasing slower for the quasi 1D region than for the 2D region because only the vertical correlation range is decreasing, while both horizontal and vertical ranges are decreasing in the 2D region.These ratios are pointing out that the degrees of freedom are significantly lower in the early scales of the multiscale inversion, allowing better constraint of the problem and improving the stability by defining an overdetermined system.During the last scale stages, the constraints are relaxed, allowing one to obtain a better resolution while avoiding artifacts in the estimated fields.Even for the last scale, the ratio indicates about 5 times less free parameters than in the unconstrained inversion.For comparison purposes with the results of the constrained inversion, we performed an unconstrained inversion.In the Figure 8, we display the results of the multiscale constrained inversion with respect to the unconstrained inversion for the same misfit of 0.3%.We can notice that the unconstrained inversion results are noisier than the constrained inversion.The main artifacts can be found near the sources (classical problem in borehole seismic full waveform inversion), below the triangle structure, above the source or receiver zones (migration smile type artifacts), and in the constant gradient 1D layers.These artifacts in the estimated fields are the consequence of the instability of the least-squares inversion problem when the problem is not correctly determined for all parameters, in other words, when it is partially underdetermined.The number of degrees of freedom is too large regarding the information provided by the data.Reducing the number of degrees of freedom of the problem taking into account prior geological information (i.e., constraining the fields using spatial statistic where we have more prior information) allows one to overcome this problem, reducing drastically the number and the magnitude of the artifacts.The main artifacts are near the sources, below the triangle structure, above the source or receiver zones and below (less visible due to the color range), and in the constant gradient 1D layers.As expected, there is no additional information outside the square delimited by the wells for the case of unconstrained inversion.

Conclusions
In this paper, we have illustrated how to introduce, in the inverse problem, constraints consistent with the least squares formalism (through covariance matrices) on both the data and the model space.
Constraining the seismic inverse problem in the data space is not straightforward but when possible, it is a very powerful tool.In the case of a multicomponent borehole seismic data, using the Comparison between results from the constrained inversion using the multiscale inversion process results of the multiscale inversion process and results from the unconstrained inversion: (a) The estimated P-wave velocity field for the multiscale inversion for a misfit of 0.3%; (b) the reference P-wave velocity field; (c) the estimated P-wave velocity field for the unconstrained inversion for a misfit of 0.3%; (d) the estimated S-wave velocity field for the multiscale inversion for a misfit of 0.3%; (e) the reference S-wave velocity field; (f) the estimated S-wave velocity field for the unconstrained inversion for a misfit of 0.3%.Both P-and S-wave estimated fields show that the multiscale inversion results are smoother than the unconstrainted inversion results, with well artifact and better reliability.The main artifacts are near the sources, below the triangle structure, above the source or receiver zones and below (less visible due to the color range), and in the constant gradient 1D layers.As expected, there is no additional information outside the square delimited by the wells for the case of unconstrained inversion.

Conclusions
In this paper, we have illustrated how to introduce, in the inverse problem, constraints consistent with the least squares formalism (through covariance matrices) on both the data and the model space.
Constraining the seismic inverse problem in the data space is not straightforward but when possible, it is a very powerful tool.In the case of a multicomponent borehole seismic data, using the polarization in the inversion process allowed better recovery of the sharpness of the interfaces at the level of the receivers.
Constraining the seismic inverse problem in the model space can be achieved by defining a priori model parameters and evaluating the model uncertainties associated with this model.Horizontal and vertical spatial correlation using Laplace distribution can be used to fill the model space covariance matrix in order to introduce an a priori information favoring the tabular region versus more complex regions, by varying accordingly the ranges of these correlations.The inversion favors solutions are consistent with our a priori, whenever it does not contradict the information contained in the data.Moreover, by adopting a multiscale type of inversion by relaxing the correlation ranges, it regularizes the inverse problem.
The incorporation of all our a priori knowledge of the parameters and all statistical studies on the data, allows not only the algorithmic stabilization of the inversion process, but also the reduction of the solution set for an underdetermined problem, with the purpose being not necessarily to converge quickly towards a good model (in term of residuals), but to prospect regions of the model space populated by models that are sensible, a priori, and also yielding the lowest possible misfit.We can note that  = 1 when the correlation is perfect (  = ±1 ) and  = 2 when the correlation is inexistent ( = 0).  .The joint probability density f (x, y) of the couple of random variables (X, Y) is plotted using contour levels (at level 0.9• f max , 0.7• f max , 0.5• f max , 0.3• f max and 0.1• f max ), the marginal probability densities for the variables themselves ( f x (x) for X f y (y) for Y) are plotted on the x and the y axes.The value x 0 = 1.5 has been fixed and the conditional probability density f c (y) = f y|x (y|x 0 ) is plotted, the mean m x = m y = 0 and m c has been reported as the standard deviation σ x = σ y = 1 and σ c , the correlation ρ = 0.8 appears only in the ellipticity of the ellipses for the joint p.d.f.f .As Equation (A17) shows, the number of free parameters of the couple (X, Y) is 1 + σ c σ y = 1 + 1 − ρ 2 = 1.6.
δm is the perturbation in the vicinity of the current model m n , • δd n are the data residuals for the model m n , • ∆m n is the difference between the current model m n and the a priori model m prior • G n is the linear function tangent to g at the model m n , • g is the function mapping the model space m ∈ M into the data space d = g(m) ∈ D • C D is the covariance matrix on the data space.• C M is the covariance matrix on the model space (defining the Gaussian a priori probability density).

Geosciences 2019, 9 ,Figure 1 .
Figure 1.Offset Vertical Seismic Profile (OVSP) numerical experiment: (a) Synthetic experiment configuration where the asterisk near the surface denotes the source position for the unique shot point, whereas the triangles in the well denote the location of the 40 two-component geophones.The "true" model is composed of nine homogeneous layers, the first one being the water layer.This true model is used both to model the observed data and as a reference to check inversion results; (b) the horizontal (top) and vertical (bottom) components of the ground velocity field recorded at geophones-the fields are obtained by modeling the wave equation in the true model (finite difference approximation).The color represents the local polarization estimated using a 2-period time window.

Figure 1 .
Figure 1.Offset Vertical Seismic Profile (OVSP) numerical experiment: (a) Synthetic experiment configuration where the asterisk near the surface denotes the source position for the unique shot point, whereas the triangles in the well denote the location of the 40 two-component geophones.The "true" model is composed of nine homogeneous layers, the first one being the water layer.This true model is used both to model the observed data and as a reference to check inversion results; (b) the horizontal (top) and vertical (bottom) components of the ground velocity field recorded at geophones-the fields are obtained by modeling the wave equation in the true model (finite difference approximation).The color represents the local polarization estimated using a 2-period time window.

Figure 2 .
Figure 2. Two inversion experiments with and without polarization constraints.(a) Without polarization constraints for P-wave velocity estimated model.(b) With polarization constraint for Pwave velocity estimated model.(c) Without polarization constraints for S-wave velocity estimated model.(d) With polarization constraints for S-wave velocity estimated model.In all plots: red thick solid line denotes the true model we want to retrieve by inverting the observed data; gray solid line denotes the starting model in the inversion process; blue, green, and black solid lines denote estimated models from inversion, respectively, at iterations 10th, 30th, and 80th.The vertical thick black solid line indicates the level of the receiver antenna.The constrained inversion contributes to the recovery of the sharp interfaces, especially at the level of the receiver antenna.

Figure 2 .
Figure 2. Two inversion experiments with and without polarization constraints.(a) Without polarization constraints for P-wave velocity estimated model.(b) With polarization constraint for P-wave velocity estimated model.(c) Without polarization constraints for S-wave velocity estimated model.(d) With polarization constraints for S-wave velocity estimated model.In all plots: red thick solid line denotes the true model we want to retrieve by inverting the observed data; gray solid line denotes the starting model in the inversion process; blue, green, and black solid lines denote estimated models from inversion, respectively, at iterations 10th, 30th, and 80th.The vertical thick black solid line indicates the level of the receiver antenna.The constrained inversion contributes to the recovery of the sharp interfaces, especially at the level of the receiver antenna.

Figure 3
shows the relation between the range of the correlation (expressed in number of steps, ∆x) and the values of b.Geosciences 2019, 9, x FOR PEER REVIEW 8 of 19

Figure 3
shows the relation between the range of the correlation (expressed in number of steps, ∆) and the values of .

Figure 3 .
Figure 3. Degree of freedom per point (for a regular grid) as a function of the range (expressed in multiples of the grid step) of the correlation for a Laplace correlation function.

4. 2 . 1 .
Description of the Crosswell Experiment and Observed Data The reference subsurface models are shown in Figure 4.The word "reference" indicates that this

Figure 3 .
Figure 3. Degree of freedom per point (for a regular grid) as a function of the range (expressed in multiples of the grid step) of the correlation for a Laplace correlation function.

Figure 4 .
Figure 4.The reference velocity models: (a) Reference P-wave velocity field; (b) reference S-wave velocity field.The source borehole is located at 0 m distance and the nine asterisks (*) denote the source locations for the nine shots.The receiver borehole is located at 280 m offset and the small triangles denote the locations of the receivers.On can notice that the structures of these reference velocity fields are the same for P-and S-wave.It is composed of 1D regions (invariance along the horizontal axis) in the upper part of the model and at the bottom with homogeneous layers and other layers with a velocity increasing linearly with depth (vertical constant gradient).The middle part of the model exhibits complex 2D structures.

Figure 4 .
Figure 4.The reference velocity models: (a) Reference P-wave velocity field; (b) reference S-wave velocity field.The source borehole is located at 0 m distance and the nine asterisks (*) denote the source locations for the nine shots.The receiver borehole is located at 280 m offset and the small triangles denote the locations of the receivers.On can notice that the structures of these reference velocity fields are the same for P-and S-wave.It is composed of 1D regions (invariance along the horizontal axis) in the upper part of the model and at the bottom with homogeneous layers and other layers with a velocity increasing linearly with depth (vertical constant gradient).The middle part of the model exhibits complex 2D structures.

Figure 5 .
Figure 5.The 5th shot, at 1200 m deep, of the observed data obtained by full-wave modeling from the reference models: (a) The horizontal component; (b) the vertical component.The source function is a first derivative of a Gaussian with a central frequency of 100 Hz and it acts as a stress source with the following radiation coefficients σ xx 1 , σ zz = 0.5, σ xz = 0.These data are used as observed data in the multiscale inversion process.

Figure 6 .
Figure 6.The results of the multiscale inversion process for the P-wave velocity field.Stars denote the source location while the small triangles denote the receiver location: (a) The P-wave velocity starting model; (b) the estimated P-velocity field obtained from the first inversion scale using correlation ranges of  = 320 m and  = 24 m in the 2D region; (c) the estimated P-velocity field obtained from the second inversion scale using correlation ranges of  = 80 m and  = 10 m in the 2D region; (d) the estimated P-velocity field obtained from the third inversion scale using correlation ranges of  = 20 m and  = 6 m in the 2D region; (e) the estimated P-velocity field obtained from the fourth and last inversion scale using correlation ranges of  = 5 m and  = 2 m in the 2D region; (f) the reference P-wave velocity model.The results for the first scale of the inversion do not allow recovery of the 2D structures, as the correlation ranges are greater than the size of the objects.

Figure 6 .
Figure 6.The results of the multiscale inversion process for the P-wave velocity field.Stars denote the source location while the small triangles denote the receiver location: (a) The P-wave velocity starting model; (b) the estimated P-velocity field obtained from the first inversion scale using correlation ranges of r x = 320 m and r z = 24 m in the 2D region; (c) the estimated P-velocity field obtained from the second inversion scale using correlation ranges of r x = 80 m and r z = 10 m in the 2D region; (d) the estimated P-velocity field obtained from the third inversion scale using correlation ranges of r x = 20 m and r z = 6 m in the 2D region; (e) the estimated P-velocity field obtained from the fourth and last inversion scale using correlation ranges of r x = 5 m and r z = 2 m in the 2D region; (f) the reference P-wave velocity model.The results for the first scale of the inversion do not allow recovery of the 2D structures, as the correlation ranges are greater than the size of the objects.The 2D structures start to appear from the second scale, improving from the third scale, and are well recovered for the last scale.

Figure 7 .
Figure 7.The results are similar to the P-wave velocity field (Figure6).The main difference is clear at the fourth scale (e): the spatial resolution is almost perfect for the S-wave due to smaller wavelength content associated to this mode.However, the inversion process provides an overestimation of velocity contrast near the interfaces and some small oscillations in the estimated field clearly visible for the light blue domain above the central triangle.It means that the number of degrees of freedom is too high regarding the amount of information provided by the data.

Figure 7 .
Figure 7.The results of the multiscale inversion process for the S-wave velocity field.Stars denote the source location while the small triangles denote the receiver location: (a) The S-wave velocity starting model; (b) the estimated S-velocity field obtained from the first inversion scale using correlation ranges of  = 320 m and  = 24 m in the 2D region; (c) the estimated S-velocity field obtained from the second inversion scale using correlation ranges of  = 80 m and  = 10 m in the 2D region; (d) the estimated S-velocity field obtained from the third inversion scale using correlation ranges of  = 20 m and  = 6 m in the 2D region; (e) the estimated S-velocity field obtained from the fourthand last inversion scale using correlation ranges of  = 5 m and  = 2 m in the 2D region; (f) the reference S-wave velocity model.The S-wave estimated field exhibits a better spatial resolution than the P-wave (see Figure6) due to the smaller wavelength content associated with this mode.Small artifacts can be observed around the 2D triangle structure, pointing out the fact that the problem is less constrained in this part of the model.

Figure 7 .
Figure 7.The results of the multiscale inversion process for the S-wave velocity field.Stars denote the source location while the small triangles denote the receiver location: (a) The S-wave velocity starting model; (b) the estimated S-velocity field obtained from the first inversion scale using correlation ranges of r x = 320 m and r z = 24 m in 2D region; (c) the estimated S-velocity field obtained from the second inversion scale using correlation ranges of r x = 80 m and r z = 10 m in the 2D region;(d) the estimated S-velocity field obtained from the third inversion scale using correlation ranges of r x = 20 m and r z = 6 m in the 2D region; (e) the estimated S-velocity field obtained from the fourth and last inversion scale using correlation ranges of r x = 5 m and r z = 2 m in the 2D region; (f) the reference S-wave velocity model.The S-wave estimated field exhibits a better spatial resolution than the P-wave (see Figure6) due to the smaller wavelength content associated with this mode.Small artifacts can be observed around the 2D triangle structure, pointing out the fact that the problem is less constrained in this part of the model.

Figure 8 .
Figure 8.Comparison between results from the constrained inversion using the multiscale inversion process results of the multiscale inversion process and results from the unconstrained inversion: (a)The estimated P-wave velocity field for the multiscale inversion for a misfit of 0.3%; (b) the reference P-wave velocity field; (c) the estimated P-wave velocity field for the unconstrained inversion for a misfit of 0.3%; (d) the estimated S-wave velocity field for the multiscale inversion for a misfit of 0.3%; (e) the reference S-wave velocity field; (f) the estimated S-wave velocity field for the unconstrained inversion for a misfit of 0.3%.Both P-and S-wave estimated fields show that the multiscale inversion results are smoother than the unconstrainted inversion results, with well artifact and better reliability.The main artifacts are near the sources, below the triangle structure, above the source or receiver zones and below (less visible due to the color range), and in the constant gradient 1D layers.As expected, there is no additional information outside the square delimited by the wells for the case of unconstrained inversion.

Figure 8 .
Figure 8.Comparison between results from the constrained inversion using the multiscale inversion process results of the multiscale inversion process and results from the unconstrained inversion: (a) The estimated P-wave velocity field for the multiscale inversion for a misfit of 0.3%; (b) the reference P-wave velocity field; (c) the estimated P-wave velocity field for the unconstrained inversion for a misfit of 0.3%; (d) the estimated S-wave velocity field for the multiscale inversion for a misfit of 0.3%; (e) the reference S-wave velocity field; (f) the estimated S-wave velocity field for the unconstrained inversion for a misfit of 0.3%.Both P-and S-wave estimated fields show that the multiscale inversion results are smoother than the unconstrainted inversion results, with well artifact and better reliability.The main artifacts are near the sources, below the triangle structure, above the source or receiver zones and below (less visible due to the color range), and in the constant gradient 1D layers.As expected, there is no additional information outside the square delimited by the wells for the case of unconstrained inversion.

Figure A1 .
Figure A1.The joint probability density (, ) of the couple of random variables (, ) is plotted using contour levels (at level 0.9 •  , 0.7 •  , 0.5 •  , 0.3 •  and 0.1 •  ), the marginal probability densities for the variables themselves ( () for  and  () for ) are plotted on the  and the  axes.The value  = 1.5 has been fixed and the conditional probability density  () =  | (| ) is plotted, the mean  =  = 0 and  has been reported as the standard deviation   =   = 1 and   , the correlation  = 0.8 appears only in the ellipticity of the ellipses for the joint

Figure A1
Figure A1.The joint probability density f (x, y) of the couple of random variables (X, Y) is plotted using contour levels (at level 0.9• f max , 0.7• f max , 0.5• f max , 0.3• f max and 0.1• f max ), the marginal probability densities for the variables themselves ( f x (x) for X f y (y) for Y) are plotted on the x and the y axes.The value x 0 = 1.5 has been fixed and the conditional probability density f c (y) = f y|x (y|x 0 ) is plotted, the mean m x = m y = 0 and m c has been reported as the standard deviation σ x = σ y = 1 and σ c , the correlation ρ = 0.8 appears only in the ellipticity of the ellipses for the joint p.d.f.f .As Equation (A17) shows, the number of free parameters of the couple (X, Y) is 1 + σ c σ y = 1 + 1 − ρ 2 = 1.6.

Table 1 .
A priori geological region type with respect to depth.

Table 2 .
Horizontal and vertical correlation ranges for the different a priori region types and inversion scales.

Table 3 .
The misfit and iteration number for the different inversion scales.

Table 4 .
Number of degrees of freedom (DoF) per point for the successive multiscale inversions.