RECOVERING SINUSOIDS FROM NOISY DATA USING BAYESIAN INFERENCE WITH SIMULATED ANNEALING

- In this paper, we studied Bayesian analysis proposed by Bretthorst[6] for a general signal model equation and combined it with a simulated annealing (SA) algorithm to obtain a global maximum of a posterior probability density function (PDF) for frequencies. Thus, this analysis offers different approach to finding parameter values through a directed, but random, search of the parameter space. For this purpose, we developed a Mathematica code of this Bayesian approach together with SA and used it for recovering sinusoids from noisy data. Simulations results support its effectiveness.


INTRODUCTION
Let the vector θ contain the parameters to be estimated from the (measurements) vector , D which is the output of the physical system that one wants to be modeled.In many experiments, the recorded data ( ; ) ( ), where ( ) i e t represents the noise, assumed to be drawn independently from a zero mean Gaussian probability distribution with a standard deviation of .Different models correspond to different choices of signal model function ( ; ) f t θ .According to Bretthorst [6], the most general signal model may be given in the following form: . In this paper, we address the problem of estimation of sinusoids in white Gaussian noise within a Bayesian framework.This is of great interest in many fields of science, including seismology, nuclear magnetic resonance and radar.Under an assumption of a known number of sinusoids, several algorithms have already been used in parameter estimation literature, such as least-square fitting [9], discrete Fourier transform [1], and periodogram [2].The discrete Fourier transform has been a very powerful tool in Bayesian spectral analysis since Cooley and Tukey introduced the fast Fourier transform (FFT) technique in 1965, followed by the rapid development of computers.After Jaynes [3] derived a periodogram directly from the principles of Bayesian inference, researchers in different branches of science have given much attention to the relationship between Bayesian inference and parameter estimation and they have done excellent works in this area for last fifteen years [4][5][6][7].
We consider here a Bayesian approach, proposed by Bretthorst, in which the posterior PDF for { }  has to be maximized.Unfortunately, conventional algorithms [9]  based on a gradient direction fail to converge for this maximization problem.Even when they converge, there is no assurance that they have found a global, rather than a local maximum.This is because a log of the posterior PDF is so sharply peaked and highly nonlinear function of{ }  .Bretthorst pointed it out and used a pattern search algorithm described by Hook-Jevees [10] to overcome this problem but, this approach does not converge unless starting point is much closer to the optimum{ }  .Our contribution is therefore to develop a computer program using Mathematica [8] that combines this Bayesian approach with a very different optimization algorithm called SA [11].It explores the entire surface of the posterior PDF for the frequencies and tries to optimize it while moving both uphill and downhill in order to escape from local maxima.Furthermore, it is largely independent of the starting values, often a critical input in conventional algorithms.Finally, we use it for estimating parameters of the sinusoids corrupted by a random noise.

BAYESIAN PARAMETER ESTIMATION
Let us now reconsider the problem given in Equation (1) within the Bayesian  is a set of parameters of interest, then their joint PDF given the observed data D is proportional to The quantity   The exponent Q is defined as follows where . In order to obtain the PDF for{ }  , we integrate out the nuisance parameters{ } B and dependence, To do it analytically, it is necessary to make the matrix jk  to be diagonal, effectively by introducing new orthogonal model functions, This diagonalization process gives a new expression for the signal model function, where the new amplitudes k A 's are related to the old amplitudes j B 's by 1 , ( 1,..., ) and where kj u represents the j th component of the k th normalized eigenvector of jk  , with j  as the corresponding eigenvalue.Substituting these expressions into Equation (5) and define to be the projection of the data onto the orthogonal model functions

 
,{ } j H t  , we can then proceed to perform m Gaussian integrations over j A .If  is known, the joint PDF of the { }  parameters, conditional on the data is given by 2 where not known, by using the Jeffreys prior [15] and integrating Equation ( 6) over parameter we obtain This has the form of the Student t-distribution.It is desirable to compute the variances associated with those parameters.To do this, we can expand the function 2 h in a Taylor series at the point { }  which maximizes the PDF in Equation ( 11) or (12), such For an arbitrary signal model the matrix jk b cannot be calculated analytically; however, it can be evaluated numerically.This can be done by first changing to the orthogonal variables and performing the  Gaussian integrals.Let j  and kj u represent the th j eigenvalue and eigenvectors of the matrix jk b , respectively.Then the new orthogonal variables are given by 1 1 , From these, we get the estimated frequency j w in the form: One can also show that j j A h    .By using Equation ( 9) we get the estimated amplitude k B in the form:

CONTINUOUS GLOBAL OPTIMIZATION ALGORITHM
Bayesian parameter estimation turns into the global optimization problem [12] which is a task to find the best possible solution for the problem in Equations ( 11) and (12): where . Although there is numerous other approaches [13,16] to solve this maximization problem, the SA algorithm, suggested by Corana et.al. [11] and modified by Goffe et.al. [12] is chosen here because it is generally applicable and easy to implement.This begins with initial guesses of 0 ω and 0  (called temperature).Each step of this algorithm replaces the current frequency by a random nearby frequency.In other words, the next candidate point k ω is generated by varying one component of the current iteration ( 1)   k  ω at a time: where ).The step size  is calculated from the square root of the Cramer-Rao lower bound (CRLB)    ω [13] that is a lower limit to the variance of the measurement of j  , so this generates a natural scale size of the search space around its estimated value.It is expected that better solutions lie close to solutions that are already good and so normally distributed step size are used.Thus, the point , , Otherwise, it is accepted or rejected according to the Metropolis-criterion [14]: where p is a uniformly distributed random number from  0,1 .This continues until all  components have been altered and thus  new points have successively accepted or rejected according to the Metropolis criterion.After this process is repeated s n times, the whole cycle is then repeated n  times, after which the temperature is decreased by a factor 0 k      (called annealing or cooling Termination of the algorithm occurs when average function value of sequences of points after each s n n  step cycle reaches a stable state: where is a small positive number defined by user,   is the value of the PDF at the best point k ω and l indicates the last four successive iteration values of the PDF that are being stored.

COMPUTER SIMULATED EXAMPLES
To verify the performance of the proposed algorithm, we generated a data vector from multiple frequency sinusoids.Here i t runs over the symmetric time interval  We carried out the Bayesian analysis of the simulated data, assuming that we know the mathematical form of the signal model but, not the value of the parameters.As an initial estimate of the frequencies 0 ω for the maximization procedure, it is possible to take random from the interval   0, 2 .However, it is better to start with the locations of the peaks chosen automatically from the Fourier power spectral density graph by using a computer code written in Mathematica.After reasonable values of parameters that control the simulated annealing routine are chosen as 20 , the algorithm starts at some high temperature 0  =100 and generates the sequence of points until a sort of equilibrium is approached; that is a sequence of points reaches a stable value as iteration proceeds.During this phase the step size is naturally adjusted.The best point ω reached so far is recorded.After thermal equilibration, 0  is reduced by a factor 0.85    and a new sequence is made starting from this best point ω , until thermal equilibrium is reached again and so on.Therefore it proceeds toward better maxima even in the presence of many local maxima.Consequently, the process is stop at a temperature low enough that no more useful improvement can be expected, according to a stopping criterion in Equation (20).Once the frequencies are estimated, we carried on calculating the estimated amplitudes associated with the errors.However, an evaluation of the posterior PDF at a given point ω cannot be made analytically and also requires a numerical calculation of projections on to orthonormal model functions, related to eigen-decomposition of    ω .It is therefore expected that its evaluation at ω requires larger consumption of time as the length of ω increases.
The computer program we developed was run on the workstation in two cases where the standard deviation of the noise is known or not.The computer simulations illustrated in Table 1 show the results when  is known but they are almost similar with those obtained in the case  is unknown.Estimated parameter values, quoted as (value) ± (standard deviation), indicate that all values of the parameters within the calculated accuracy are clearly recovered.The estimated value of signal to noise ratio (SNR) and the standard deviation of the noise are also shown in Table 1.On the other hand, Figures 1 shows the power of the method for recovering the signal from the noisy data.In general, we consider a multiple harmonic frequency signal model which is also used by Bretthorst [6]: cos(0.1 t 1) 2 cos(0.152) 5cos(0.3 3) 2 cos(0.314) 3cos( +5) ( 1 ,...,512) The best estimates of parameters are tabulated in Table 2. Once again, all the frequencies have been well resolved, even the third and fourth frequencies, which are very close not to be separated by the Fourier power spectral density shown in Figure 3.
Actually with the Fourier spectral density when the separation of two frequencies is less than the Nyquist step, defined as 2 / N  , two frequencies are indistinguishable.In this example these two frequencies are separated by 0.01, which is less than the Nyquist step size.Therefore, there is no way by using the Fourier power spectral density that one can resolve these closed frequencies less than the Nyquist step, however Bayesian power spectral density shown in Figure 3 gives us very good results with high accuracy.The results we obtained so far are also consistent with that of Bretthorst [6].
Moreover, we initially assumed that the noise in data were drawn from the Gaussian density with the mean 0   and the standard deviation .
 Figure 2 shows the exact and estimate PDF of the random noise in data.It is seen that the estimated (dotted) PDF is closer to the true (solid) PDF and the histogram of the data is also much closer to the true PDF.The histogram is known as a nonparametric estimator of the PDF because it does not depend on specified parameters.We generated 64 data samples from a single frequency signal model and added it to the variety of noise levels.After 50 independent trials, the mean square errors (MSE) were calculated and their logarithmic values were plotted with respect to SNR that varies between zero and 20 dB.It can be seen from Figure 4 that the proposed estimator has threshold about 3 dB of SNR and follows nicely the CRLB after this value.As expected, larger SNR ratio gives smaller MSE.

CONCLUSION
In this work we have partially developed a Bayesian approach with a simulated annealing and applied it to spectral analysis and parameter estimation problems.Overall results show that it provides rational approach for estimating, in an optimal way, values of parameters of sinusoids corrupted by random noise.Both frequency and amplitudes can be recovered from the experimental data and the prior information with high accuracy, especially the frequency which is the most important parameter in spectral analysis.Although it requires a large consumption of CPU time, it is competitive when compared to the multiple runs often used with conventional algorithms to test different starting values.For a sufficiently high SNR the MSE will attain Cramer-Rao lower bound so that it justifies the accuracy of the frequency estimation.Moreover, 1 In order to compare the results with Bretthorst's in this example we converted Mathematica is a powerful system for doing mathematics on the computer and it has grown to become today an unparalleled platform for all forms of computations.and "Bayesian Spectrum Estimation of Harmonic Signals with a number FEN-BGS-060907-0191, supported by the University of Marmara, Istanbul, Turkey.

1 {
assumes a signal model equation as follows:

1 2 {
label collectively{ }  and j B represents the amplitude corresponding to the j th model function   ;{ } j G t  .The goals of data analysis are usually to use the observed data D to infer the values of parameters 1 ,..., ; ,..., }

Figure 1 .
Figure 1.Recovering signals from noisy data produced from two closed harmonic Frequency signal model.

Figure 2 .
Figure 2. Comparison of exact and estimate PDF of noise in data   1   .

Table 1 .
Computer simulations for two closed harmonic frequency signal model

Table 2 .
Computer simulations for a multiple harmonic frequency model1