Cramér–Rao Bounds for DoA Estimation via Sparse Bayesian Learning with the Laplace Prior

In this paper, we derive Cramér–Rao lower bounds (CRLBs) for direction of arrival (DoA) estimation using sparse Bayesian learning (SBL) with the Laplace prior. The CRLB is a lower bound on the variance of an estimator, so changes in the CRLB indicate how a specific factor affects the DoA estimator; here the Laplace prior and a three-stage hierarchical framework are used for DoA estimation. We derive CRLBs under different scenarios: (i) when the unknown parameters consist of both deterministic and random variables, a hybrid CRLB is derived; (ii) when all the unknown parameters are random, a Bayesian CRLB is derived, and a marginalized Bayesian CRLB is obtained by marginalizing out the nuisance parameter. We also derive CRLBs for the hyperparameters of the three-stage model and explore the effect of multiple snapshots on the bounds. Comparing the derived CRLBs, we find that the marginalized Bayesian CRLB is tighter than the others at low SNR, and that the differences between the bounds shrink as the SNR increases. We also study the relationship between the mean squared error of the source magnitudes and the CRLBs through numerical simulations covering a variety of antenna configurations, including different numbers of receivers and different noise conditions.


Introduction
In signal processing, direction of arrival (DoA) estimation is a popular topic, and there exists a rich literature on estimating the DoA of incoming sources from measurements of an antenna array. These approaches include the multiple signal classification (MUSIC) algorithm [1,2], estimation of signal parameters via rotational invariance techniques (ESPRIT) [3,4], and so on. These algorithms estimate the DoA by partitioning the observation space into a signal subspace and a noise subspace, with the signal subspace containing the incoming sources whose directions we want to estimate. However, these algorithms have several shortcomings and limitations. For example, the MUSIC algorithm requires the number of signal sources as a priori knowledge, and it needs many measurements to estimate the incoming sources [1].
Sparse Bayesian learning (SBL) [5], also known as the relevance vector machine (RVM), can be used to overcome the limitations of the DoA methods mentioned above. It is a technique for solving regression problems within the Bayesian framework. Compared with ordinary least squares [6], SBL accounts for the sparsity of the solution and also estimates the standard deviation of the result. Several methods based on SBL have been proposed [7–9] and applied to the DoA problem in [10–14]. Besides overcoming the limitations of the traditional DoA methods, these approaches also improve estimation accuracy and efficiency according to simulation results; SBL-based methods have further been studied for the 2-D DoA problem in [15] and extended to off-grid DoA estimation in [16]. The CRLB [17] of the standard SBL formulation is studied and discussed in [18]. To further improve estimation performance, the works [19–22] replace the Gaussian prior of traditional SBL with a Laplace prior; simulations show that the Laplace prior yields sparser estimates. However, to the best of our knowledge, the CRLB of the Laplace prior-based DoA estimator has not been studied previously; it is the topic of this paper. Instead of proposing a new DoA estimation method, we focus on deriving lower bounds for the Laplace prior-based DoA estimator: the CRLBs are derived under different scenarios, antenna configurations, environmental noise levels, and numbers of measurements. We present a comparison between the CRLBs and the estimator mean squared error (MSE) from numerical simulations, as well as a comparison between the CRLBs for the different situations.
This paper is organized as follows. In Section 2, the Laplace prior-based DoA estimation method used in this paper is introduced. In Section 3, the CRLBs under different situations are derived. Simulation results are presented in Section 4. The conclusion is provided in Section 5.

DoA via SBL with the Laplace Prior
In this paper, a nonuniform linear array, shown in Figure 1, is used as the receiver. The linear array comprises M identical elements, with the ith element located at distance l_i from the origin. The incoming sources are denoted by s_j and their incident angles by θ = {θ_1, ..., θ_N}, where θ_j ∈ [−90°, 90°] and j = 1, 2, ..., N. Independent complex white noise, denoted n_i, i = 1, ..., M, is also present in the element observations. The received signal at the ith element, y_i, is written as

y_i = Σ_{n=1}^{N} e^{−j k l_i sin θ_n} s_n + n_i, (1)

where k is the wave number. In matrix form, (1) becomes

y = A s + n, (2)

where y = [y_1 y_2 ... y_M]^T ∈ C^{M×1}, A ∈ C^{M×N} encodes the antenna architecture through its entries A(m, n) = e^{−j k l_m sin θ_n}, s = [s_1 s_2 ... s_N]^T ∈ C^{N×1}, and n = [n_1 n_2 ... n_M]^T ∈ C^{M×1}; the noise vector n follows a circularly symmetric complex normal distribution with zero mean and covariance matrix Γ = 2σ²I ∈ R^{M×M}, where σ² is usually unknown. To estimate the DoA of the incoming sources, a sampling grid θ̃ = [θ̃_1 ... θ̃_K] ∈ R^{K×1} is used, where the source strength corresponding to θ̃ is denoted by s̃ ∈ C^{K×1} and K > N. In this paper, we set the grid size to K = 181 so as to cover all integer angles from −90° to 90°. Here Ã ∈ C^{M×K} is computed in the same way as A but using all angles in the grid θ̃. The real and imaginary parts of the sources are separated by rewriting (2) as [11]

ȳ = Ā s̃ + n̄, (3)

where ȳ ∈ R^{2M×1} and n̄ ∈ R^{2M×1} stack the real and imaginary parts of y and n, s̃ ∈ R^{2K×1} stacks the real and imaginary parts of the gridded source vector, and Ā ∈ R^{2M×2K} is built from the real and imaginary parts of Ã accordingly. By using the Laplace prior and adopting the multistage hierarchical model of [19], the DoA problem is converted into an ℓ1-norm regression problem, whose solution is sparser than the one obtained with the Gaussian prior [23]. The multistage model used in this paper contains three stages. First,

P(s̃ | α) = N(s̃ | 0, Σ), (4)

where P denotes probability, N(x|µ, Σ) denotes that x follows a multivariate Gaussian distribution with mean µ and covariance matrix Σ, Σ = diag(α) = diag(α_1, α_2, ..., α_2K), and α_i denotes the variance of s̃_i. Secondly, following [19], an exponential hyperprior is placed on each variance,

P(α_i | λ) = (λ/2) e^{−λ α_i / 2}. (5)

Thirdly, Jeffrey's hyperprior [24] is applied:

P(λ) ∝ 1/λ. (6)

To estimate the unknown incoming sources, we first obtain estimates of α and λ by maximizing the marginal likelihood. The logarithm of the marginal likelihood is written as [7]

log P(ȳ | α, λ, σ²) = −(1/2) [2M log 2π + log |C| + ȳ^T C^{−1} ȳ], (7)

where C = σ²I + Ā Σ Ā^T. We can obtain estimates of the parameters by the conventional approach of setting the derivative of the marginal likelihood with respect to each parameter to zero [25] and solving. Alternatively, we can estimate the parameters with the "constructive" fast marginal likelihood maximisation method [7], which is more efficient. This method iterates its search over the parameters α_i, optimizing one at a time. The first step of the ith iteration is to rewrite the term C in (7) as [7]

C = C_{−i} + α_i Ā_i (Ā_i)^T, (8)

where Ā_i denotes the ith column of Ā; the part C_{−i} ≡ σ²I + Σ_{j≠i} α_j Ā_j (Ā_j)^T is independent of α_i, while the term α_i Ā_i (Ā_i)^T depends on it. The marginal likelihood can then be separated similarly; the part that depends on α_i is [7]

L(α_i) = (1/2) [ −log(1 + α_i s_i) + (q_i² α_i)/(1 + α_i s_i) ], (9)

where

s_i = (Ā_i)^T C_{−i}^{−1} Ā_i, q_i = (Ā_i)^T C_{−i}^{−1} ȳ. (10)

In the ith iteration, we only maximize L(α_i), and the optimal α_i that maximizes this partial marginal likelihood is [7]

α_i = (q_i² − s_i)/s_i² if q_i² > s_i, and α_i = 0 otherwise. (11)

Once the marginal likelihood converges after iterating over all parameters, or a stopping criterion such as a maximum number of iterations is met, the search for the optimal parameters stops.
We use the estimated parameters and the posterior distribution over s̃ to obtain the estimate of the incoming sources. The posterior distribution is

P(s̃ | ȳ, α, σ²) = N(s̃ | μ̂, Σ̂),

where μ̂ and Σ̂ are given by [5]

μ̂ = σ^{−2} Σ̂ Ā^T ȳ, (12)

Σ̂ = (σ^{−2} Ā^T Ā + Σ^{−1})^{−1}. (13)

In summary, we first update the unknown parameters of the hierarchical model iteratively until the marginal likelihood converges, and then use (12) and (13) to compute the estimated mean and variance of the sources. Figure 2 presents the marginal likelihood versus iteration for one example, and the corresponding estimated incoming signals are shown in Figure 3, with the convergence threshold set to 10^−6.
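As a concrete illustration, the posterior update in (12) and (13) can be sketched in a few lines of Python (the paper's own experiments use MATLAB). The dimensions, the random matrix standing in for Ā, and the noise level below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2M real-valued measurements, 2K grid components.
M, K = 12, 30
A = rng.standard_normal((2 * M, 2 * K))   # stand-in for the real-valued matrix A-bar
alpha = np.full(2 * K, 1.0)               # per-component prior variances (stage one)
sigma2 = 0.1                              # noise variance

# Synthetic sparse source and measurement y-bar = A-bar s-tilde + n-bar.
s_true = np.zeros(2 * K)
s_true[[3, 17]] = [1.0, -0.5]
y = A @ s_true + np.sqrt(sigma2) * rng.standard_normal(2 * M)

# Posterior covariance and mean, eqs. (13) and (12):
#   Sigma-hat = (A^T A / sigma^2 + diag(alpha)^-1)^-1,  mu-hat = Sigma-hat A^T y / sigma^2
Sigma_hat = np.linalg.inv(A.T @ A / sigma2 + np.diag(1.0 / alpha))
mu_hat = Sigma_hat @ A.T @ y / sigma2
```

In the full algorithm, alpha and sigma2 would come from the marginal likelihood maximization described above; here they are fixed only to show the shape of the update.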

CRLBs for DoA Estimation with SBL with Laplace Prior
In this section, we derive the CRLBs for the DoA estimator described in the previous section. Several types of CRLB exist, and the appropriate one depends on the configuration of the estimator's unknown parameters [26–28]. The regular CRLB applies when all the unknown parameters are deterministic; the Bayesian CRLB applies when the parameters are random; and the hybrid CRLB applies when some parameters are random and some are deterministic. We use the derived CRLBs to study how various factors affect the performance of the DoA estimator, and we compare the CRLBs under different situations. In principle, the DoA estimator in this paper behaves as a maximum likelihood estimator, and its performance can be measured by the MSE. We use the vector φ to denote the true values of the unknown variables; in this DoA problem, these include the incoming source strengths, the noise, and the hyperparameters of the hierarchical model. We use φ̂(y) to denote the estimate of φ from the available measurement y. The MSE of the estimate is defined as

MSE = (1/n) ∫ ||φ̂(y) − φ||² f(y) dy, (16)

where f(y) denotes the distribution of y and n is the dimension of the vector φ. The CRLB expresses a lower bound on the MSE: the minimum estimation variance is no smaller than the corresponding element of the inverse of the Fisher information matrix I(φ) [29], which can be calculated as

[I(φ)]_{ij} = −E[ ∂² log P(y, φ) / ∂φ_i ∂φ_j ], (17)

where P(y, φ) denotes the joint likelihood. We first calculate the Fisher information matrix in the ideal case where the signal variance is deterministic and the noise variance is known. We then consider the more general cases where these variances are random or unknown, and finally we study the effect of multiple measurements.
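As a quick sanity check on the definitions of the MSE and the Fisher information above, the following Python snippet verifies the scalar textbook case: for m samples of N(φ, v) with known variance v, the Fisher information for the mean is m/v, and the sample mean attains the CRLB v/m. All numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Scalar example: estimate the mean phi of N(phi, v) from m i.i.d. samples.
phi, v, m, trials = 2.0, 0.5, 40, 20000
x = phi + np.sqrt(v) * rng.standard_normal((trials, m))

# Monte Carlo MSE of the sample mean vs. the CRLB 1/I(phi) = v/m.
mse = np.mean((x.mean(axis=1) - phi) ** 2)
crlb = v / m
assert abs(mse - crlb) / crlb < 0.05  # the efficient estimator sits on the bound
```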

Ideal Case CRLB
In the ideal case, we assume the noise variance σ² is known and the hyperparameter Σ = diag(α) = diag(α_1, α_2, ..., α_2K), which is associated with the source, is deterministic. Let the unknown parameter vector be φ = [s̃^T α^T]^T, where the source vector s̃ is the only random variable. Because φ consists of both random and deterministic parameters, the hybrid CRLB is derived in this case. With Bayes' rule and the assumption that σ² is known and Σ is deterministic, the likelihood of the measurement ȳ in (3) can be written as

P(ȳ, s̃ | Σ, σ²) = P(ȳ | s̃, σ²) P(s̃ | Σ), (18)

where P(ȳ|s̃, σ²) is the likelihood of the measurement and P(s̃|Σ) is the source prior appearing in the first stage (4) of the hierarchical model. Using the expressions for P(ȳ|s̃, σ²) and P(s̃|Σ) in [20], the log likelihood of (18) can be written as

log P(ȳ, s̃, Σ, σ²) = −M log(2πσ²) − ||ȳ − Ā s̃||²/(2σ²) − K log(2π) − (1/2) Σ_i log α_i − (1/2) s̃^T Υ s̃, (20)

where M is the number of elements in the receiving array, 2K is the number of entries in s̃, and Υ ∈ R^{2K×2K} is a diagonal matrix with Υ_(i,i) = 1/α_i. The derivative of log P(ȳ, s̃, Σ, σ²) with respect to s̃ is

∂ log P/∂s̃ = Ā^T(ȳ − Ā s̃)/σ² − Υ s̃, (21)

and the corresponding second derivative is

∂² log P/∂s̃ ∂s̃^T = −Ā^T Ā/σ² − Υ. (22)

Similarly, the derivative with respect to α_i is

∂ log P/∂α_i = −1/(2α_i) + s̃_i²/(2α_i²), (23)

and the second derivatives are

∂² log P/∂α_i² = 1/(2α_i²) − s̃_i²/α_i³, (24)

∂² log P/∂s̃_i ∂α_i = s̃_i/α_i². (25)

I(φ) is calculated using (17). Since E[s̃_i] = 0 and E[s̃_i²] = α_i, the cross block vanishes, and from (22), (24) and (25) we have

I(φ) = [ Ā^T Ā/σ² + Υ, 0 ; 0, diag(1/(2α_i²)) ]. (27)

From (27), the Fisher matrix depends on the choice of sampling grid (i.e., Ā), the measurement conditions (i.e., σ²), and the hyperparameters. The CRLB of the estimation variance for each unknown variable is given by the corresponding diagonal element of the inverse of the Fisher matrix. Although a closed-form expression for the diagonal elements of the inverse of the first block in (27) is inconvenient to write, the diagonal elements of the block itself clearly increase as the number of antennas grows and as σ² and α become smaller.
The lower bounds can also be related to the reciprocals of the diagonal elements of the Fisher matrix, since [I(φ)⁻¹]_(i,i) ≥ 1/[I(φ)]_(i,i) [29]. Therefore, the CRLB decreases as the number of antennas increases and as σ² and α become smaller in this case.
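This qualitative behavior can be checked numerically. The sketch below builds the source block of the hybrid Fisher matrix from (27) with a random real matrix standing in for Ā (an illustrative assumption, not an actual steering matrix) and confirms that the diagonal CRLB entries shrink when antenna rows are added or σ² is reduced.

```python
import numpy as np

def fisher_source_block(A, sigma2, alpha):
    """Source block of the hybrid Fisher matrix, eq. (27): A^T A / sigma^2 + diag(1/alpha)."""
    return A.T @ A / sigma2 + np.diag(1.0 / alpha)

def crlb_diag(A, sigma2, alpha):
    """Diagonal of the inverse Fisher block: per-component lower bounds."""
    return np.diag(np.linalg.inv(fisher_source_block(A, sigma2, alpha)))

rng = np.random.default_rng(1)
K2 = 20                                   # plays the role of the 2K grid components
alpha = np.full(K2, 0.5)
A_small = rng.standard_normal((10, K2))   # 10 "antenna" rows
A_big = np.vstack([A_small, rng.standard_normal((10, K2))])  # 20 rows

b_small = crlb_diag(A_small, 0.1, alpha)
b_big = crlb_diag(A_big, 0.1, alpha)
b_quiet = crlb_diag(A_small, 0.05, alpha)

# More antennas and less noise both lower the bound on every component.
assert np.all(b_big <= b_small + 1e-12) and b_big.sum() < b_small.sum()
assert np.all(b_quiet <= b_small + 1e-12) and b_quiet.sum() < b_small.sum()
```

The monotonicity follows from the Loewner order: adding rows or shrinking σ² adds a positive semidefinite term to the Fisher block, which can only decrease the diagonal of its inverse.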

Bayesian CRLB
We now consider the second case, in which Σ is also random, and derive the Bayesian CRLB. Let φ = [s̃^T α^T]^T, with Σ drawn from the prior distribution defined in (5); the prior must therefore be taken into account when calculating the Fisher information matrix. The second derivatives of the log likelihood with respect to the unknown parameters are the same as in the first case ((22), (24), (25)), but each block of the Fisher information matrix now involves an expectation over Σ as well. Reusing the notation of the first case, the source block becomes Ā^T Ā/σ² + κ and the hyperparameter block becomes ω, where κ and ω are diagonal matrices with κ_(i,i) = E[1/α_i] and ω_(i,i) = E[1/(2α_i²)]. By Jensen's inequality and Equation (5), E[1/α_i] ≥ 1/E[α_i] = λ/2 and E[1/(2α_i²)] ≥ 1/(2 E[α_i]²) = λ²/8. The Fisher information matrix is written as

I(φ) = [ Ā^T Ā/σ² + κ, 0 ; 0, ω ]. (33)

In this case, the Fisher information matrix can be computed without knowing the realization of Σ. Instead of deriving the lower bound for the incoming source directly from the Bayesian CRLB, we note that the dependence of the source on the hyperparameter λ can be characterized by marginalizing out Σ, which is then treated as a nuisance parameter:

P(s̃ | λ) = ∫ P(s̃ | α) P(α | λ) dα. (34)

Using (4) and (5) and the identity ∫₀^∞ N(x|0, α) (λ/2) e^{−λα/2} dα = (√λ/2) e^{−√λ |x|}, we see that the distribution of the source is Laplacian:

P(s̃ | λ) = ∏_i (√λ/2) e^{−√λ |s̃_i|}. (35)

The unknown parameters are now s̃ and λ, so the vector of unknown parameters is φ = [s̃^T λ]^T. The Fisher information matrix I(φ) therefore consists of four blocks,

I(φ) = [ I(s̃), I(s̃, λ) ; I(s̃, λ)^T, I(λ) ], (36)

where the joint distribution P(ȳ, s̃, σ²; λ) is obtained by combining the likelihood and the prior, P(ȳ, s̃, σ²; λ) ∝ P(ȳ|s̃, σ²) P(s̃|λ) P(λ). With (35), and ignoring constant terms, the log likelihood of the joint distribution is

log P(ȳ, s̃, σ²; λ) = −(ȳ − Ā s̃)^T(ȳ − Ā s̃)/(2σ²) + K log λ − √λ Σ_i |s̃_i| − log λ, (37)

and the first derivative with respect to s̃ is

∂ log P/∂s̃ = Ā^T(ȳ − Ā s̃)/σ² − √λ sign(s̃). (38)

The second derivative of log P(ȳ, s̃, σ²; λ) is

∂² log P/∂s̃ ∂s̃^T = −Ā^T Ā/σ². (39)

The cross derivative is

∂² log P/∂s̃ ∂λ = −sign(s̃)/(2√λ). (40)

From (39), we can calculate I(s̃):

I(s̃) = Ā^T Ā/σ². (41)

Comparing (41) with our original result in (33), a tighter bound for the variance of the estimated s̃ is derived:

CRLB(s̃) = σ² (Ā^T Ā)^{−1}. (42)

To obtain the complete Fisher information matrix, the first and second derivatives of log P(ȳ, s̃, σ²; λ) with respect to λ are taken from (37) and combined through (17); the resulting Fisher information for λ, which we denote (46), depends on the expected value E[Σ_i |s̃_i|]. Using the source distribution (35), the expected value of |s̃_i| is

E[|s̃_i|] = 1/√λ. (48)

Plugging (48) into (46), we have I(λ) = (K+2)/(2λ²). The Fisher information matrix is then

I(φ) = [ Ā^T Ā/σ², 0 ; 0, (K+2)/(2λ²) ], (49)

where I(s̃, λ) is a zero matrix because s̃_i is a zero-mean variable in the first stage, so the cross term (40) cancels in expectation. We see that the CRLB for the source vector s̃ depends only on the matrix Ā and the noise variance σ², whereas the CRLB for the hyperparameter λ is proportional to the square of its true value and inversely proportional to the size of the angle grid K.
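The claim that the marginalized bound is tighter can be verified numerically: the Fisher block in (41) omits the positive-definite prior term present in the hybrid case, so its inverse is strictly larger on the diagonal. The sketch below uses a random tall matrix in place of Ā (so that Ā^T Ā is invertible); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 20))   # tall stand-in for A-bar, so A^T A is invertible
sigma2 = 0.2
alpha = np.full(20, 0.5)

# Hybrid bound, eq. (27): inverse of A^T A / sigma^2 + diag(1/alpha).
hybrid = np.diag(np.linalg.inv(A.T @ A / sigma2 + np.diag(1.0 / alpha)))

# Marginalized bound, eq. (41)-(42): Fisher block is A^T A / sigma^2 alone.
marginalized = np.diag(sigma2 * np.linalg.inv(A.T @ A))

# The marginalized bound sits strictly above the hybrid one: a tighter lower bound.
assert np.all(marginalized > hybrid)
```

Dropping the positive-definite term diag(1/alpha) strictly decreases the Fisher matrix in the Loewner order, which strictly increases every diagonal entry of its inverse; this is exactly why the marginalized Bayesian CRLB dominates at low SNR.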

Noise Variance CRLB
We now derive the lower bound for the noise; we use φ = [s̃^T σ²]^T to denote the unknown variables. The Fisher matrix is written as

I(φ) = [ I(s̃), I(s̃, σ²) ; I(s̃, σ²)^T, I(σ²) ], (50)

where I(s̃) has been derived in (27) and (49) for the situations with known and unknown signal prior, respectively, and both depend on σ². Furthermore, I(s̃, σ²) is a zero matrix, as this pair of variables is assumed to be uncorrelated. We therefore focus on the block I(σ²). Denoting η = σ², we can write

I(η) = −E[ ∂² log P(ȳ, s̃, η)/∂η² ]. (51)

The part of the log likelihood that depends on η is −M log(2πη) − ||ȳ − Ā s̃||²/(2η), where M is the number of antenna elements, so the second derivative is

∂² log P/∂η² = M/η² − ||ȳ − Ā s̃||²/η³. (52)

Clearly ȳ − Ā s̃ = n̄, whose 2M entries are Gaussian with mean zero and variance η, so E[||n̄||²] = 2Mη and

I(η) = −M/η² + 2M/η² = M/η². (53)

Therefore the CRLB of the noise variance, η²/M, is proportional to the square of the true variance and inversely proportional to the number of antenna elements in the receiver. The complete Fisher information matrix with the known signal prior is

I(φ) = [ Ā^T Ā/σ² + Υ, 0 ; 0, M/σ⁴ ], (55)

and for the situation with the unknown signal prior, we have

I(φ) = [ Ā^T Ā/σ², 0 ; 0, M/σ⁴ ]. (56)
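The bound η²/M can be checked by Monte Carlo: since the residual ȳ − Ā s̃ equals the 2M-dimensional real noise vector, the ML variance estimate ||n̄||²/(2M) is unbiased and attains the bound exactly. The parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M, eta = 50, 0.4          # antenna elements and true noise variance sigma^2
n_trials = 20000

# Each trial draws the 2M real residual entries n-bar ~ N(0, eta) and forms
# the ML estimate eta-hat = ||n-bar||^2 / (2M).
n = np.sqrt(eta) * rng.standard_normal((n_trials, 2 * M))
eta_hat = (n ** 2).sum(axis=1) / (2 * M)

mse = np.mean((eta_hat - eta) ** 2)
crlb = eta ** 2 / M       # derived bound: proportional to eta^2, inverse in M
assert abs(mse - crlb) / crlb < 0.05  # the ML estimator sits on the bound here
```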

Multiple Measurements
In this subsection, we assume that multiple measurements are available. Multiple measurements are of interest because they improve the accuracy of the estimates in our experiments. The improvement can be explained in two ways: the effect of noise is averaged out, and multiple measurements can reduce the correlation between the columns of the measurement matrix whenever the matrix changes with time, which is common in scenarios such as distributed ground communication, where the receiver locations may change between measurements. Both effects make the source easier to recover. For example, the Gram matrices (i.e., Σ_{l=1}^{L} (Ā_l)^T Ā_l / L) for a single measurement and for 100 measurements are shown in Figure 4a,b, where a 12-element linear array is considered and the location of each receiver is drawn from a normal distribution for each measurement.
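The decorrelation effect visible in Figure 4 can be reproduced in a few lines: averaging the Gram matrices of independently jittered arrays suppresses the off-diagonal (cross-angle) energy. The grid resolution, jitter model, and sizes below are illustrative stand-ins for the paper's setup, not its exact parameters.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 12                                           # array elements
k = 2 * np.pi                                    # wave number for unit wavelength
grid = np.deg2rad(np.arange(-90.0, 91.0, 5.0))   # coarse angle grid for the demo

def gram(positions):
    """Gram matrix of the unit-norm steering columns for one array realization."""
    A = np.exp(-1j * k * np.outer(positions, np.sin(grid))) / np.sqrt(M)
    return A.conj().T @ A

def avg_gram(L):
    """Average the Grams of L arrays with normally jittered element positions."""
    G = np.zeros((grid.size, grid.size), dtype=complex)
    for _ in range(L):
        pos = 24.0 * np.arange(M) + rng.normal(0.0, 2.0, size=M)
        G += gram(pos) / L
    return G

def offdiag_power(G):
    """Mean squared magnitude of the off-diagonal (cross-angle) entries."""
    P = np.abs(G) ** 2
    return (P.sum() - np.trace(P)) / (P.size - grid.size)

# Averaging over 100 snapshots leaves much less cross-angle energy than one snapshot.
assert offdiag_power(avg_gram(100)) < offdiag_power(avg_gram(1))
```

Pairs of well-separated grid angles see nearly independent phases across realizations, so their Gram entries average toward zero, while the unit diagonal is untouched.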
With multiple measurements, the correlation between different angles is reduced. Beyond this change in correlation, the CRLB offers another perspective on the effect of multiple measurements on the DoA problem: we can relate the lower bounds to the number of measurements. Two antenna configurations are considered for the CRLB computation: static and dynamic element positions. We start with the static configuration, in which the element positions are unchanged across the multiple measurements. Assuming there are L independent measurements, each a 2M × 1 vector, the DoA problem can be rewritten as

Z = Ā V + N, (57)

where Z = [ȳ_1 ȳ_2 ... ȳ_L] ∈ R^{2M×L} collects the L measurements, Ā ∈ R^{2M×2K} is the measurement operator, unchanged in the static mode, V = [s̃_1 s̃_2 ... s̃_L] ∈ R^{2K×L} collects the source values for the different measurements, and N ∈ R^{2M×L} corresponds to the noise. Because each measurement is assumed independent of the others, the likelihood of the L measurements factorizes as P(Z|X) = ∏_{i=1}^{L} P(Z_i|X_i), where Z_i denotes the ith measurement and X_i the unknown variables at the ith snapshot. The Fisher matrix then becomes the sum of the per-snapshot Fisher matrices. When the incoming sources are unchanged, i.e., X_1 = X_2 = ... = X_L, it is easy to see that

I_L(φ) = L · I(φ). (58)

With (16) and (58), we see that the CRLB is reduced by a factor of L compared with the single-measurement case in static mode.
In contrast, if we consider the dynamic configuration, in which the element positions change with time, the DoA estimation is rewritten as

ȳ_l = Ā_l s̃ + n̄_l, l = 1, ..., L, (59)

and the Fisher information matrix can be written as

I(s̃) = Σ_{l=1}^{L} (Ā_l)^T Ā_l / σ². (60)

Similarly to (41), the bound for s̃ is governed by this sum. For the DoA estimation problem, the diagonal of (Ā_l)^T Ā_l does not depend on the element positions, since each steering column has unit-modulus entries, so diag(Σ_l (Ā_l)^T Ā_l) = L · diag(Ā^T Ā). Because the lower bounds scale with the reciprocals of the diagonal elements of the Fisher matrix, we reach an unsurprising conclusion: the minimum estimation variance is again reduced by a factor of L compared with the single measurement.
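The L-fold reduction in the static case can be confirmed directly: with independent snapshots the per-snapshot Fisher matrices add, so the diagonal of the inverse shrinks exactly by 1/L, as in (58). A minimal sketch with an arbitrary stand-in for Ā:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((24, 8))   # illustrative stand-in for the operator A-bar
sigma2 = 0.3
L = 10

I1 = A.T @ A / sigma2                            # single-snapshot Fisher block
IL = sum(A.T @ A / sigma2 for _ in range(L))     # static case: identical blocks add

assert np.allclose(IL, L * I1)
# The CRLB (diagonal of the inverse) therefore drops by exactly a factor of L.
assert np.allclose(np.diag(np.linalg.inv(IL)), np.diag(np.linalg.inv(I1)) / L)
```

In the dynamic case, the summands (Ā_l)^T Ā_l differ off the diagonal, but their diagonals still add to L times the single-snapshot diagonal, giving the same 1/L scaling of the bound.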

Numerical Results
To test the sharpness of the CRLBs obtained in this paper, we generate numerical results from a simulation involving L measurements obtained from an array of M elements and a set of N incoming sources. The results were obtained with MATLAB R2021a on a laptop with 8 GB RAM and a 1.6 GHz Intel Core i5. The positions of the M array elements are drawn i.i.d. from l_m ∼ U[(m − 1)Δs, (m − 1)Δs + r], where l_m denotes the position of the mth element, U[a, b] denotes the uniform distribution between a and b, Δs denotes the nominal distance between neighboring antenna elements, and r denotes the feasible range for each element's position. All results in this section use N = 3, Δs = 24λ and r = 6λ, because we are interested in large array spacings. Figure 5 shows the derived CRLBs for the source estimator and its numerical MSE as a function of the signal-to-noise ratio (SNR); the MSE and CRLB magnitudes are converted to dB to make the differences more visible. Under the same conditions, the marginalized Bayesian CRLB provides the tightest lower bound, especially at low SNR; as the SNR increases, the differences between the three CRLBs shrink. It is also worth noting that the gap between the MSE and each CRLB narrows as the SNR increases; for example, for the marginalized Bayesian CRLB, the gap drops from 0.003 at SNR = 2 dB to 0.001 at SNR = 14 dB. Figure 6 shows the marginalized Bayesian CRLB for the source estimator and its numerical MSE as a function of the SNR for several numbers of snapshots L. We chose L = 10, 20, and 30; the CRLB decreases as L and the SNR increase, which explains, from the perspective of lower bounds, the benefit of multiple snapshots for this DoA estimation method.
As the SNR increases, the gap between the MSE and the CRLB decreases for L = 20 and 30, and remains almost constant for L = 10. Figure 7 shows the marginalized Bayesian CRLB and the MSE as a function of the SNR for various numbers of receiving elements M. Both decrease as the number of receiving elements increases, which confirms the benefit of additional receiving elements for this DoA estimation method. We also notice that, as the SNR increases, the gap between the MSE and the CRLB grows for M = 6; to improve the estimator's performance in that case, the convergence threshold can be tuned.
Finally, Figure 8 also shows the CRLB of the noise variance estimator and its numerical MSE for several numbers of receiving elements M under various SNRs. We see that the CRLB decreases as the number of receiving elements increases, which means increasing the number of receivers can also improve the estimation accuracy for the noise, and the difference between the MSE and the CRLB also decreases as SNR increases in terms of magnitude.

Conclusions
In this paper, we derived the CRLBs of the estimated sources, hyperparameters, and noise, in terms of MSE, for sparse Bayesian learning with the Laplace prior, and we explored several factors that affect these lower bounds. Numerical simulation results were presented to illustrate the derived bounds under different conditions. Among the derived CRLBs, the marginalized Bayesian CRLB is the tightest. We also characterized how the CRLB of the sources depends on the number of receiving elements and the number of snapshots in the DoA problem: as expected, the CRLB decreases as either increases. In future work, the lower bounds could be extended to planar arrays, the bounds for off-grid DoA estimation could be explored, and the effect of the convergence threshold merits further investigation.