Article Blind Deconvolution of Seismic Data Using f-Divergences

This paper proposes a new approach to the seismic blind deconvolution problem in the case of band-limited seismic data characterized by low dominant frequency and short data records, based on Csiszar’s f-divergence. In order to model the probability density function of the deconvolved data, and obtain the closed form formula of Csiszar’s f-divergence, mixture Jones’ family of distributions (MJ) is introduced, by which a new criterion for blind deconvolution is constructed. By applying Neidell’s wavelet model to the inverse filter, we then make the optimization program for multivariate reduce to univariate case. Examples are provided showing the good performance of the method, even in low SNR situations.


Introduction
In seismic exploration, a seismic signal (e.g., recorded trace) is often represented as the result of a superposition of wavelets of constant shape weighted by the reflectivity series, thus it can be modeled as a convolution between the source signature (the embedded wavelet) and the reflectivity series.Deconvolution is an important process by which the reflectivity series can be estimated from the recorded trace, and vertical resolution of the seismic image will be enhanced.
For the above convolution model, when the wavelet is known, estimation of the reflectivity series is called deterministic deconvolution, which is done by least-squares, or in a Bayesian framework adding some priors on the reflectivity distribution.

OPEN ACCESS
As the wavelet is unknown, we have the classical blind deconvolution problem; statistical deconvolution appears to be a powerful tool for dealing with practical situations.Various approaches to statistical deconvolution have been proposed and applied in recent decades, including the expectation maximization (EM) algorithm, the higher-order statistics (HOS) algorithm, and the general purpose information criteria algorithm (GPIC).
The EM algorithm is an effective method for maximum-likelihood estimates, which are mainly used to estimate parameters from incomplete data.Since Mendel introduced the Bernoulli-Gaussian (BG) model as the distribution of reflectivity [1], a few approaches, based on the EM algorithm and BG model, have been published [2,3].BG is a simple form of the Gaussian Mixture Model, which makes the likelihood function be expressible in closed form.The EM algorithm has shown good performance in the blind deconvolution context; whereby the estimation problem for the wavelet and reflectivity can be solved in an incomplete data set (the lack of localization and magnitude of the reflectivity).For the real field data, the reflectivity sequence changes do not always appear as clearly as predicted by the BG model [4], therefore, the sparse nature of the input signal is not a sufficiently robust hypothesis for good system estimation.On the other hand, it may be possible to extend the BG model to a more complicated probability density, however, the resulting expressions of the likelihood function will be handled inconveniently, adding obstacles for its use.
The higher-order statistics which we use are known as cumulants.The cumulant matching (CM) method has been developed for estimating a mix-phase wavelet from a convolutional process.Matching between the trace cumulant and the wavelet moment is done in a minimum mean-squared error sense under the assumption of a non-Gaussian, stationary and statistically independent reflectivity sequence.The main difficulty of this approach is that it requires a large amount of data to achieve reliable estimations [5].But, in general, seismic data is arguably always nonstationary since anelastic attenuation processes are present everywhere, a seismic signal is shown to be approximately stationary just within a time window, in which the real signal is finite in length, leading to limited performance with this approach.
The GPIC algorithm has been intensively studied by many researchers for a large number of applications over the last couple of decades.In the seismic application, a seismic trace n x may be written in the form: where denotes the convolution, n w , n r , n n represent the wavelet, reflectivity, and superposed noise respectively.The idea of general purpose information criteria algorithm is to find an inverse filter a to create an output: which looks like the reflectivity series, the inverse filter can be obtained by optimizing some criterion related to the output characteristic.Figure 1 depicts such a convolution-deconvolution system.The earliest technique for this kind of algorithm was introduced by E.A. Robinson [6], and is based on the assumptions that the reflectivity sequence is a Gaussian white noise, and the wavelet is minimum phase.Since the pioneering work of Agard and Grau [7], it has been generally recognized in exploration geophysics that the reflectivity typically has a non-Gaussian distribution.Many techniques have been developed to exploit the nonGaussianity of seismic data for blind deconvolution based on information theory, among which excellent contributions were made by Wiggins, Claerbout, Gray, Ooe and Ulrych, Godfrey, Donoho, and Levy and Oldenburg [8][9][10][11][12][13][14], etc.For a more detailed discussion of their studies, see Walden [15].
Larue, Mars, and Jutten proposed a new blind single-input single-output (SISO) deconvolution method based on the minimization of the mutual information rate of the deconvolved output [4].The mutual information has the nice property of being positive and it vanishes if and only if the components of deconvolved output are mutually independent, by which a good criterion for blind deconvolution can be obtained.They estimate deconvolution filter in the frequency domain.To take the band limited nature of seismic wavelets and the presence of noise into account, Baan and Pham presented a modification of the mutual-information rate, whitening the deconvolution output only within the wavelet pass band [16].Their approach estimates the wavelet by maximizing the negentropy of the deconvolved output, and to prevent noise amplification, Wiener filtering is invoked.
Generally speaking, if a wavelet has no zeros on the unit circle, blind deconvolution is a well posed problem, but if it has zeros on the unit circle, then the problem is ill posed, in which the only criterion to apply to inverse filter is to maximize the nonGaussianity of the deconvolved output according to the central limit theorem, provided the distribution of reflectivity is non-Gaussian and non-infinite divisibility.In Van der Baan and Pham's work, they use negentropy as a criterion to perform deconvolution, which is defined as the Kullback-Liebler divergence between the density and the normal equivalent i.e. with the same mean and variance.
In statistics and information theory, Kullback-Liebler divergence is just one of information measures to show the difference of two distributions, Csiszár (and independently also by Ali and Silvery) introduced the f-divergence yielding a generalized family of invariant measures, which is given by [17]: , and 0 1 f , where p and q are two probability density functions (PDF).Considering a real value convex function g defined on R , from classical Jensen inequality: we have: 0 Equality holds if and only if q p . If we use p to denote the PDF of the deconvolution output, and let q be a PDF of the normal distribution with the same variance as that of p , then f-divergence is nonnegative and scale invariant, and will become larger as the deconvolution output goes off the Gaussianity, which provides an appealing method to build an information criterion.
In the above entropy-based approaches by Larue, Mars, and Jutten, Baan and Pham, one key point is to estimate the PDF of the deconvolution output, therefore, Pham designed a fast algorithm to obtain the PDF and score function for blind source separation application.However, as a nonparametric method, Pham's fast algorithm needs certain length of the data to provide a good estimation, in their deconvolution examples, the length of the input data is usually over 500.As mentioned above, a seismic signal is treated to be stationary just within a limited time window due to the intrinsic attenuation of the earth medium, these entropy-based approaches seem to result in a limited performance in the case of short data records.
Another point for the above approaches is that the number of iterations depends strongly on the choice of initialization of inverse filter, as most optimization programs do.Thirdly, to improve the stability and the performance of the algorithm, some constraints need to be placed on the inverse filter when deconvolution criterion are to be proposed, which can bring in some artifacts.
Inspired by Larue, Mars, and Jutten, Baan and Pham's works, this paper aims at constructing new criterion based on Csiszár f-divergence for seismic deconvolution, and proposes a new blind deconvolution algorithm.This new method can handle data with small samples, which means it has ability to deal with nonstationary data.We believe this approach is novel for several reasons: Finally, we give a briefly discussion on the wavelet phase estimation in Section 8. We give the conclusions in Section 9.

MJ Distribution
Jones constructed a family of distributions arising from distributions of order statistics, and it has a PDF to be of the form [18]: where B is the complete beta function, c is a PDF simply symmetric about zero, its distribution function is C , i.e., c C c .The Roles of a and b are that: , the corresponding distributions remain symmetric;    From Figure 2, we can see that a large family of distributions can be generated with this family; unfortunately a single Jones' family of distributions can only depict the tailedness or peakedness of a PDF, in practice, the distribution of reflectivity is essentially symmetric, with a sharper central peak and larger tails than a Gaussian distribution.In order to fit both tailedness and peakedness of the real data, we introduce the mixture Jones' family of distributions (MJ), which has the form:

D
, which controls the peak of f .g is the PDF of a normal distribution random variable with zero mean and the same variance as that of real data, G is the distribution function of g , g G c . Figure 3 shows an example of fitting a Laplace distribution with unit standard deviation using MJ, where g takes the form of the standard normal distribution.

Measures of Non-gaussianity Using f-Divergence
The goal of introducing an MJ distribution is to obtain the closed form formulas for the Csiszár f-divergence.In the equality (3), if we take one of the distributions as normal distribution with zero mean and the same variance as that of real data, then the f-divergence can indicate the non-Gaussianity of the data.Among the different Csiszár f-divergences, the following measure is considered in this paper: which is also called Kagan distance.
Let us compute the above divergence with respect to the MJ distribution; using (8), we have: Noting that g dx dG , and G is a distribution function, the change of variable x G y , gdx dy yields:

Estimation of the Parameters of the MJ Distribution
From the equality (7), let x G y , the resulting distribution is a mixture Beta distribution, which takes the form: > @ (11) then the parameters can be estimated by resorting to the method of moments.
We use the expectation of function Noting that  13) respectively, and simplifying, we have: x , from ( 14), ( 15), ( 16) we have: After some simple algebraic calculations, we obtain: Then, we have: So, 1 x and 2 x are the solutions of equation: Hence, we can show that the corresponding parameters is estimated as: Based on above results, we have the estimation of the JM parameters as follows: 1. Estimate the variance 2 y V of the output^ǹ y of the inverse filter;

On the Inverse Filter
Following the work by Baan and Pham, we can show that if the wavelet is known, the Wiener filter achieves an optimum solution for the inverse filter, since it makes a compromise between signal recovery and noise reduction.In the frequency domain, it can be written as: where Z W is the wavelet, the superscript of the numerator denotes the complex conjugate, 2 H is a normalization factor.
In the blind deconvolution context, the wavelet is unknown, and needs to be estimated, therefore, we adopt Neidell's wavelet model to construct an inverse filter, which is derived from practical observation, and its z transform has the form [19]: furthermore, if we denote the peak frequency of the amplitude spectrum of this wavelet as p Z , the relationship between the n and m is: So, if we know the peak frequency of the amplitude spectrum of the real wavelet, only one parameter needs to be determined, in practice, usually we do not know it; we have following value as a substitution for p Z : where FT denotes Fourier transform, ^ǹ x denotes the seismic record.

New Deconvolution Algorithm
Based on the above results, we construct a new frequency-domain criterion for blind deconvolution: After determining the p Z , A J is just the function of n or m , so we have: Thus our new deconvolution algorithm consists of the following steps: of the corresponding output of the inverse filter using the method in section four, and compute the information distance 4. Search the maximum cost function

A J
to get the optimum inverse filter.

Results
In this section, we evaluate the performance of the proposed method through simulations and real experimental data.

Simulation Results
We build a synthetic trace by convolving a super-Gaussian reflectivity with a Ricker wavelet; the wavelet has zero phase and a central frequency of 40 Hz.The reflectivity is generated as n b 3 , where n b are independent normal random variables with zero mean and variance 0.28, the sample interval is 1 ms.In order to test the performance of our methods for the data with small samples, we set the length of the synthetic trace about 256.The results are plotted in Figure 4 and Figure 5.  From the above figures, we can see that the wavelet is well estimated, and the spectrum of the seismic trace is obviously broadened, the deconvolved seismic data has an improved vertical resolution; for example, there are two big seismic reflectors near 200 ms, which cannot be separated in the original seismic trace; in the deconvolved trace, they are well distinguished, and some layer interfaces are thinner and better localized.
In the previous experiment, we used noiseless signals.Since the blind deconvolution methods are very sensitive to additive noise, it is important to test our algorithm for noisy signals.To this end, we add some noise to the synthetic trace, where the resulting signal-to-noise ratio is 15 dB.The results are plotted in Figure 6 and Figure 7, in which it can be shown that our method provides a good trade-off between deconvolution and noise amplification.

Real-Data Experiment
Here we adopt our method to a real seismic dataseat, which is from the CHANGQING oil field in China.The original data contains about 750 samples.From Figure 8 and Figure 9, we can see the spectrum of the seismic trace is broadened, and the resolution of seismic record is improved.For instance, the seismic reflectors, localized in about the 170 point, 400 point and the end of the trace, respectively, are now well imaged.

Stacked Section
In the final example, a stacked section is considered which is from CNOOC Research Center.Figures 10 display the data before and after deconvolution by our method.Comparing the seismic data before and after deconvolution, we can see that the deconvolved seismic data shows a significant improvement in vertical resolution and enhanced reflection detail corresponding to the geology.Such high-resolution reflection detail is a desirable feature for seismic interpretation.This example indicates that the method can work well in practice and can properly enhance the resolution of the seismic data.

Discussion
In our simulation experiments, the wavelet is zero phase, which means that our method performs well for zero-phase deconvolution.Zero-phase deconvolution, also denoted as pure-amplitude deonvolution, leaves the phase of original seismic section undistorted after deconvolution, i.e. with no time shift.However, in the blind deconvolution context, it is important for an algorithm to estimate well the phase of the wavelet.Theoretically, if a wavelet is reversible, i.e., no zeros on the unit circle, its phase can be uniquely determined (or a linear phase shift is left) by the non-Gaussianity maximization of the deconvolved output, As far as the irreversible wavelet, i.e., band-limited wavelet, is concerned, to our knowledge, there are no sound theoretical reasons for determining the phase of the wavelet uniquely.
To test the performance of these information-based approaches for the band-limited wavelet phase estimation, we designed the following experiment.Firstly we use a band-pass filter with constant phase to approximately model the spectrum of the deconvolved wavelet, which takes the form: where if the frequency of the band-pass filter is positive, the plus sign is taken.Secondly, we apply this band-pass filter bank to a reflectivity, and get a series of outputs; it is immediate that the information distance g f D , is the function of k .After having computed the information distances corresponding to different k , we can find the unique m k , for which the information distance achieves its peak value.If 0 m k or 1 N , it shows that the algorithm based on the non-Gaussianity maximization can make a good estimation for the phase of the band-limited wavelet.Finally, we independently perform the above experiment many times to obtain a numerical result.
Considering the real situation, in which the dominant frequency of the wavelet is low, we choose a Ricker wavelet with a central frequency 40 Hz, hence, the high frequency of the band-pass filter is 120 Hz, in the meantime, we set the low frequency as 5 Hz, and 20 N . The reflectivity contains 256 sampling points, and is generated as n b 3 , where n b are independent normal random variables with zero mean and variance 0.28.The result of 100 independent trials is plotted in Figure 11.Form the histogram of m k , it can be shown that about 60% of the m k take the value of 0 or 1 N , there is still some randomness in the phase estimation in the case of band-limited seismic data with a low dominant frequency and short data records.

Conclusions
In this paper, we have presented a new blind deconvolution method based on Csiszár f-divergence in the frequency domain.To facilitate estimating the PDF of the deconvolution output and deriving the closed form formulas of Csiszár f-divergence, we use MJ distribution to model the PDF of the output, and in this way, the PDF to be estimated can be generated with only a few parameters, and a new criterion for blind deconvolution can be constructed.Furthermore, we apply the wavelet model derived by Neidell to the inverse filter, which means the optimization program for multivariate reduces to univariate case without setting initialization and adding constraints to the inverse filter.The method has been applied to synthetic and real traces.Simulations and real-data experiments have shown the ability of our approach in the presence of the band-limited seismic data with a low dominant frequency, short data records, and low signal-to-noise ratio.
f-divergence reduces to the classical Kullback-Liebler divergence.
3. If b a z , skewness is introduced; 4. a controls left-hand tail weight, b controls right; the smaller a or , b the heavier the corresponding tail.

Figure 2
Figure2shows some examples of symmetric Jones' family of distributions generated from the standard normal distribution.

Figure 2 .
Figure 2. Examples of symmetric Jones' family of distributions.

2 . 4 M
Transform the data^ǹ y into.of^ǹ z ; 4. Compute parameters of MJ using corresponding equalities.

Figure 4 .
Figure 4.The results of deconvolution in the noise-free case.

Figure 5 .
Figure 5.The contrast between the original trace and the deconvolved trace.

Figure 6 .
Figure 6.The results of deconvolution for a noisy signal.

Figure 7 .
Figure 7.The contrast in the case of noise.

Figure 8 .Figure 9 .
Figure 8.The contrast in the spectrum of the seismic trace.

Figure 11 .
Figure 11.The histogram of m k .
of the inverse filter with the different parameter n , in which the superscript n denotes the shape parameter of the inverse filter; the subscript denotes the number of the time series y , and the inverse filter is given by (31) and (32); 1. Estimate p Z of the real data^ǹ x ; 2. Compute the output ^ǹ n y