Direction-of-Arrival Estimation via Sparse Bayesian Learning Exploiting Hierarchical Priors with Low Complexity

For direction-of-arrival (DOA) estimation problems in a sparse domain, sparse Bayesian learning (SBL) is highly favored by researchers owing to its excellent estimation performance. However, traditional SBL-based methods always assign Gaussian priors to the parameters to be solved, leading to only moderate sparse signal recovery (SSR) performance. The reason is that Gaussian priors play a role similar to l2 regularization as a sparsity constraint. Therefore, numerous methods have been developed that adopt hierarchical priors, which perform better than Gaussian priors. However, these methods struggle when multiple measurement vector (MMV) data are adopted. On this basis, a block-sparse SBL method (named BSBL) is developed to handle DOA estimation problems in MMV models. The novelty of BSBL is the combination of hierarchical priors with the block-sparse model originating from MMV data. Therefore, on the one hand, BSBL transfers the MMV model to a block-sparse model by vectorization so that Bayesian learning is performed directly, without the prior independence assumption between different measurement vectors and the inconvenience caused by matrix-form solutions. On the other hand, BSBL inherits the advantage of hierarchical priors for better SSR ability. Despite these benefits, BSBL still has the disadvantage of relatively large computational complexity caused by high-dimensional matrix operations. In view of this, two operations are implemented for low complexity. One reduces the matrix dimensions of BSBL by approximation, generating a method named BSBL-APPR, and the other embeds the generalized approximate message passing (GAMP) technique into BSBL so as to decompose matrix operations into vector or scalar operations, named BSBL-GAMP. Moreover, BSBL is able to suppress temporal correlation and handle wideband sources easily. Extensive simulation results are presented to prove the superiority of BSBL over other state-of-the-art algorithms.


Introduction
DOA estimation has advanced markedly due to several technique leaps in the last four decades, and the attained achievements are widely applied to communications, radar, sonar, and navigation. In communications specifically [1,2], DOA estimation is essential in channel estimation, wireless communications, microphone localization, vehicular communications [3], and Reconfigurable Intelligent Surfaces (RIS), with corresponding research focuses including RIS-based vehicle DOA estimation methods [4][5][6].
Among the technique leaps in DOA estimation, compressed sensing (CS) and sparse recovery (SR) have played important roles in the last decade [7,8]. Compared with traditional algorithms based on beamforming or subspace techniques [9][10][11][12][13], sparsity-based estimators have achieved technique leaps since CS and SR can mitigate the requirements for high signal-to-noise ratios (SNRs) and abundant snapshots [14]. Moreover, it has been proven that sparsity-based estimators have remarkable advantages, such as good robustness to correlation and high estimation accuracy.
From a Bayesian perspective, sparse Bayesian learning is a probabilistic method that achieves l_p-norm minimization by assigning sparse priors to the signals of interest for sparse signal recovery (SSR). In the single measurement vector (SMV) model, SBL retains a desirable property of the l_0-norm diversity measure, i.e., the global minimum is achieved at the maximally sparse solution [48], and a more limited constellation of local minima is produced. In practice, the MMV model is most commonly used, so MSBL was developed [49]. Theoretically, in [49], the adopted empirical Bayesian prior plays an important role in estimating a convenient posterior distribution over candidate basis vectors based on the concept of automatic relevance determination. This implies that the priors assigned to the signals of interest can enforce a common sparsity profile and consistently place the prominent posterior mass on the appropriate region for sparse recovery. In other words, the adopted priors dominate the sparsity performance of SBL and determine its ability to carry out l_p-norm minimization. Despite this, it is still unclear which prior is the best for sparse recovery, but a hierarchical Bayesian framework representing Laplace priors has been proven to be prominent in [50]. Therefore, hierarchical priors are widely attractive, and many corresponding works have been developed and presented. For instance, hierarchical synthesis lasso (HSL) priors for representing the same small subset of features are created to enforce a proper sparsity profile of signal vectors [46], and hierarchical priors are adopted to handle unknown mutual coupling and off-grid errors. No matter how hierarchical priors are used, SBL has to face the difficulty caused by the MMV model. To be specific, SBL needs a prior assumption, i.e., the uncorrelation between different measurement vectors, while this assumption may not be satisfied in practice. Moreover, the solution in matrix form is not always easy to handle and is even
prohibitive due to the large complexity during the learning process. On this basis, vectorizing the MMV model seems to be an optimal selection since the solution can be transformed into vector form. A successful realization is developed in [32], and the block-sparse model used there indeed contributes much to the whole designed algorithm, although its main focus is the real-valued transformation. Regretfully, that work adopts Gaussian priors rather than hierarchical priors, so its sparsity performance is bound to be limited, no matter how impressive its running efficiency is. Recently, a few documents have shown the combination of SBL and deep learning (DL) [51,52], attracting widespread attention among researchers. With big data and artificial intelligence, DL has gradually arisen and been applied to image processing, signal processing, classification, recognition, etc. The obvious advantages of DL are its adaptability to complicated practical cases and its high running efficiency. In signal processing, many researchers use derived iterative processes to design corresponding neural networks so that the proposed algorithms operate with low complexity burdens; the designed networks are named deep unrolled neural networks. For example, an SBL-based algorithm is unfolded into a layer-wise structure with a set of introduced trainable parameters in [51], which is beneficial for channel estimation. In addition, a model-driven DL detector is developed based on variational Bayesian inference in [52]. Based on deep unrolled networks, the detector is able to capture channel features that may be important but neglected by model-based methods, including SBL-based methods. Although these algorithms based on both SBL and DL are not applied to DOA estimation, their disadvantage lies in the inexplicability caused by data-driven DL techniques. In fact, the lack of theoretical guarantees always constrains the development and application of DL. Collectively,
SBL-based methods face a conflict between the prominent sparsity performance achieved by hierarchical priors and low computational complexity. Methods based on both SBL and DL are pending further development, especially in theoretical explicability.
In this paper, we resolve the contradiction between sparsity performance and complexity burdens in SBL-based methods. On the one hand, hierarchical priors are still adopted to indirectly enhance sparsity. On the other hand, the increased complexity essentially caused by the hierarchical priors in MMV models is tactfully reduced by two operations based on the transformed block-sparse model. In fact, the balance of both sparsity and complexity is the main innovation of this paper. Not only that, but the block-sparse model provides great convenience for complexity reduction by decreasing matrix dimensions in terms of matrix operation properties, and hierarchical priors create an opportunity to embed the generalized approximate message passing (GAMP) technique for solving marginal distributions so as to reduce complexity greatly.
To be specific, as the unavoidable MMV model restricts SBL to some extent, we directly vectorize the MMV model, resulting in a block-sparse model that is convenient for Bayesian learning. Unluckily, the vectorization expands the model dimensions, leading to large complexity in later Bayesian learning. Despite this, block-sparse Bayesian learning, named BSBL, is still analytically derived and developed. Moreover, it is worth noting that the complexity of BSBL is acceptable owing to our reasonable design of the iterative process. In order to further achieve complexity reduction, two operations are introduced. One is based on matrix operation properties to decrease the matrix dimensions. Specifically, we analyze the complexity of each iterative formula and select those containing operations with large computation burdens. The selected formulas of BSBL are then simplified and approximated in terms of the operational properties of Kronecker products and some reasonable preconditions; the resulting faster version is named BSBL-APPR. The other is based on the well-known generalized approximate message passing (GAMP) technique. GAMP is designed to solve approximate marginal posteriors, which is exactly applicable in BSBL since BSBL is derived by iterating hyperparameters originating from marginal distributions. Therefore, the GAMP technique can be embedded into BSBL, and the only additional work is to derive the iterative process of GAMP so that GAMP is useful in our block-sparse model. Moreover, GAMP decomposes the high-dimensional matrix operations into vector or scalar operations, which achieves complexity reduction well. BSBL with embedded GAMP is named BSBL-GAMP. In addition, many SBL-based methods give little consideration to intractable wideband cases. Since wideband sources can be regarded as a superposition of many narrowband sources, we extend the proposed BSBL to wideband cases, and the whole algorithm for wideband cases is derived. Last but not least, the temporal correlation, which is not often considered in SBL, is modeled in the block-sparse model. Therefore, all the above methods are able to suppress temporal correlation.
In summary, the contributions of this paper are as follows:
• Hierarchical priors are adopted to enhance sparsity, and a block-sparse model is generated to carry out Bayesian learning easily. Hierarchical priors play an important role in l_p-norm optimization and outperform Gaussian priors as a sparsity constraint, indirectly resulting in better sparsity performance. In the MMV case, the equivalently transformed block-sparse model lays a foundation for complexity reduction. Combining hierarchical priors with the block-sparse model allows for the balance of sparsity and complexity;
• Two operations are created to reduce complexity based on the block-sparse model. One exploits matrix operation properties to approximate high-dimensional operations of the derived formulas in the iterative process, while the other leverages the GAMP technique to simplify the iteration for computing marginal distributions, so that the matrix operations are decomposed into vector or scalar operations;

• For wideband sources appearing in practice, the proposed BSBL is extended to be applicable in terms of the decomposition of wideband signals into narrowband ones. Moreover, the temporal correlation is considered by introducing a temporally correlated matrix into our data model. The designed iterative process of BSBL is able to be robust to temporal correlation.
The rest of this paper is organized as follows. In Section 2, the DOA estimation problem is abstracted from radar detection. Furthermore, the DOA estimation is equivalently transformed into an SSR problem based on the exploited block-sparse model. In Section 3, the traditional SBL based on our model is briefly introduced and derived, and its defects are presented by simple analysis. In Section 4, the proposed BSBL, BSBL-APPR, BSBL-GAMP, and BSBL for wideband cases are analytically derived, and the corresponding iterative processes are presented. In Section 5, the performance of the proposed methods is evaluated comprehensively. In Section 6, conclusions are drawn.
For the sake of convenience, the notations are listed in Table 1.
A_{i,·}, A_{·,j}   The i-th row of matrix A and the j-th column of matrix A
A_{i,j}   The element in the i-th row and j-th column of matrix A
‖·‖_p   Obtain the l_p norm for each row of a matrix

Problem Reformulation
In radar or sonar detection, as well as in communication, localization, and navigation, the antenna array receiver can receive different signals from various directions. Taking far-field radar detection as an example, adjacent antenna sensors have the same phase difference due to the plane electromagnetic wave if every adjacent pair has the same spacing. Thus, a steering vector can be abstracted as a(θ) = [1, exp(−j2πd sin(θ)/λ), . . ., exp(−j2π(N − 1)d sin(θ)/λ)]^T ∈ C^{N×1}, where θ is the direction of the source signal, λ is the wavelength, d is the distance between adjacent sensors, and N is the number of sensors. For all the K sources from different directions {θ_k}_{k=1}^{K}, a steering matrix is yielded as A = [a(θ_1), . . ., a(θ_K)] ∈ C^{N×K}. A is vital in DOA estimation because it implies the directions of all the sources based on the whole antenna array composed of N sensors. Radio frequency signals are received by the antenna array, whose sensors transfer individually received signals to independent channels, in which down-conversion is conducted to produce intermediate-frequency signals, i.e., baseband signals. Then, using prior signals to execute matched filtering at time l, the receiver can gain the complex reflection factors of all the K sources, i.e., s(l) = [s_1, . . ., s_k, . . ., s_K]^T ∈ C^{K×1}, where each s_k, k = 1, . . ., K, is the product of a complex reflection coefficient and a Doppler shift. Generally, radar takes pulse accumulation to enhance signal processing ability when the echo pulses are in the same coherent processing interval (CPI), in which little fluctuation happens between different originally received signals and processed signals. In other words, sources are motionless, and the processed signals are nearly consistent during a CPI. Last but not least, the noises of the N channels in a CPI are indispensable and assumed to be mutually independent. Overall, the above entire process is briefly shown in Figure 1, where the signal model is expressed as

X = AS + N, (1)

where X = [x_1, . . ., x_L] ∈ C^{N×L} is the ideal data of the N channels in a CPI containing L snapshots, S = [s(1), . . ., s(l), . . ., s(L)] ∈ C^{K×L} is the complex reflection factor matrix of the K sources in a CPI, and N = [n_1, . . ., n_L] is the assumed noise matrix with each entry of n_l obeying an i.i.d. Gaussian distribution denoted as CN(0, α^{−1}).
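As an illustration of the data model above, the following NumPy sketch builds the steering matrix A for a half-wavelength ULA and simulates X = AS + N over a CPI; the function name and the chosen dimensions are illustrative, not from the paper.

```python
import numpy as np

def steering_matrix(thetas_deg, N, d_over_lambda=0.5):
    """Far-field ULA steering matrix A = [a(theta_1), ..., a(theta_K)] (N x K),
    a(theta) = [1, exp(-j*2*pi*d*sin(theta)/lambda), ...]^T."""
    thetas = np.deg2rad(np.atleast_1d(thetas_deg))
    n = np.arange(N)[:, None]                      # sensor indices 0..N-1
    return np.exp(-2j * np.pi * d_over_lambda * n * np.sin(thetas)[None, :])

rng = np.random.default_rng(0)
N, K, L = 8, 2, 50                                 # sensors, sources, snapshots in a CPI
A = steering_matrix([-10.0, 20.0], N)              # N x K
S = (rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))) / np.sqrt(2)
alpha = 100.0                                      # noise precision; noise ~ CN(0, alpha^{-1})
Nmat = (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L))) * np.sqrt(0.5 / alpha)
X = A @ S + Nmat                                   # MMV data of N channels over L snapshots
```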

After spatially discretizing the angular domain into an M-point grid, the signal model (1) is equivalently rewritten as

X = AP + N, (2)

where A = [a(θ_1), . . ., a(θ_M)] ∈ C^{N×M} is the sparsely extended manifold matrix with the angle set {θ_m}_{m=1}^{M} generated by the M-grid sampling. As K ≪ M holds in most cases, P ∈ C^{M×L} is the zero-padded version of S, with each row representing a potential source that maps to the M-grid spatial angular sampling. Patently, the processed data in a CPI are equivalently transformed into (2), which are typically sparse data in the MMV model. Now, the objective is to solve for the sparse P with known data X and steering matrix A. It is worth emphasizing two mainstream methods to handle (2). One deals with (2) directly, which is more difficult but has less computational complexity. The other solves the vectorization of (2), which is simple and intuitive but has large complexity caused by the vectorization. In this paper, the latter is selected because we are able to reduce the complexity tactfully. Without loss of generality, (2) is vectorized as

x = Φp + n, (3)

where x = vec(X^T) ∈ C^{NL×1}, Φ = A ⊗ I_L ∈ C^{NL×ML}, p = vec(P^T) ∈ C^{ML×1}, and n = vec(N^T) ∈ C^{NL×1}. After the vectorization, a block-sparse vector p is yielded since the original P contains many zero rows. In addition, the matrix dimensions grow after the vectorization, leading to large complexity, but this problem will eventually be solved by our approximation. Based on the block-sparse model shown in (3), the DOA estimation is transformed into a sparse recovery problem, i.e., to solve p with known x and Φ.
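The vectorization step can be checked numerically. The sketch below (illustrative dimensions) verifies that vec(X^T) = (A ⊗ I_L) vec(P^T) for a noiseless block-sparse P, using the column-stacking convention for vec(·).

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, L = 4, 6, 3
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
P = np.zeros((M, L), dtype=complex)
P[[1, 4], :] = rng.standard_normal((2, L))         # only 2 of M rows active -> block sparsity
X = A @ P                                          # noiseless MMV model X = A P

vec = lambda mat: mat.flatten(order="F")           # column-stacking vec(.)
x = vec(X.T)                                       # vec(X^T) in C^{NL}
p = vec(P.T)                                       # vec(P^T) in C^{ML}, block sparse
Phi = np.kron(A, np.eye(L))                        # Phi = A kron I_L (NL x ML)
assert np.allclose(Phi @ p, x)                     # vec(X^T) = (A kron I_L) vec(P^T)
```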

Canonical SBL Method
To solve p in (3), the canonical SBL method is introduced and briefly shown as follows. According to (3), the likelihood is

p(x|p; α) = CN(x; Φp, α^{−1}I_{NL}). (4)

The prior of p is supposed to be

p(p; γ) = CN(p; 0, Σ_0), (5)

where γ_i is a hyperparameter representing a potential source, and Σ_0 is expressed as

Σ_0 = diag(γ) ⊗ I_L. (6)

Based on the Bayesian formula, the posterior of p is

p(p|x; Θ) = p(x|p; α)p(p; γ)/p(x; Θ). (7)

The posterior of (7) is rigorously solved as a Gaussian distribution with mean and covariance as follows:

µ = αΣ_pΦ^H x, (8)
Σ_p = (αΦ^HΦ + Σ_0^{−1})^{−1}. (9)

The likelihood, prior, and posterior are uniquely determined by the hyperparameter set Θ = {γ_i, α, ∀i}. According to the maximum a posteriori (MAP) criterion, the Expectation-Maximization (EM) algorithm [53] is used to maximize p(p|x; Θ). Here, p is treated as a hidden variable to obtain the relationship between the new hyperparameter set (i.e., Θ^new) and the old one (i.e., Θ^old) by maximizing the following term:

Q(Θ|Θ^old) = E_{p(p|x;Θ^old)}[ln p(x, p; Θ)]. (10)
Omitting the specific derivation, the final iteration solutions of the hyperparameters are expressed as

γ_m^new = (‖µ_{(m)}‖_2^2 + tr((Σ_p)_{(m)}))/L, (11)
α^new = NL/(‖x − Φµ‖_2^2 + tr(ΦΣ_pΦ^H)), (12)

where (·)_{(m)} denotes the m-th length-L block (for µ) or the m-th L × L diagonal block (for Σ_p), and m = 1, . . ., M. The canonical SBL algorithm uses (8), (9), (11), and (12) iteratively to estimate γ until convergence, and the final γ = [γ_1, . . ., γ_M]^T is regarded as the solved p. However, recalling the whole process, two limitations are present: (i) the single Gaussian priors cannot enhance sparsity well, and (ii) the computational complexity O(M^3L^3), dominated by (12), is usually unacceptable in practice. Consequently, we develop an SBL-based method to resolve (3) in this paper.
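For concreteness, here is a minimal EM-style sketch of a canonical SBL loop on the vectorized model (3), assuming B = I_L; the update forms and variable names are illustrative textbook SBL, not necessarily the paper's exact Equations (8)–(12).

```python
import numpy as np

def sbl_em(x, Phi, M, L, n_iter=50):
    """Minimal EM-style SBL sketch for x = Phi p + n (block-sparse, B = I_L).
    gamma[m] is the variance hyperparameter of the m-th length-L block of p."""
    NL = x.size
    gamma = np.ones(M)
    alpha = 1.0                                        # noise precision
    for _ in range(n_iter):
        Sigma0 = np.kron(np.diag(gamma), np.eye(L))    # prior covariance
        # E-step: Gaussian posterior of p (Woodbury form, stable as gamma -> 0)
        C = np.eye(NL) / alpha + Phi @ Sigma0 @ Phi.conj().T
        W = np.linalg.solve(C, Phi @ Sigma0)
        Sigma_p = Sigma0 - Sigma0 @ Phi.conj().T @ W
        mu_p = Sigma0 @ Phi.conj().T @ np.linalg.solve(C, x)
        # M-step: per-block variance and noise precision updates
        for m in range(M):
            blk = slice(m * L, (m + 1) * L)
            gamma[m] = (np.linalg.norm(mu_p[blk]) ** 2
                        + np.real(np.trace(Sigma_p[blk, blk]))) / L
        resid = np.linalg.norm(x - Phi @ mu_p) ** 2
        alpha = NL / (resid + np.real(np.trace(Phi @ Sigma_p @ Phi.conj().T)))
    return gamma, mu_p

# Tiny demo: the two active blocks should receive the largest gamma values
rng = np.random.default_rng(0)
N, M, L = 6, 8, 4
A = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(N)
P = np.zeros((M, L), dtype=complex)
P[[1, 5], :] = rng.standard_normal((2, L)) + 1j * rng.standard_normal((2, L))
Phi = np.kron(A, np.eye(L))
noise = 0.01 * (rng.standard_normal(N * L) + 1j * rng.standard_normal(N * L))
x = Phi @ P.T.flatten(order="F") + noise
gamma, _ = sbl_em(x, Phi, M, L)
```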

BSBL
In general, an SBL-based method needs to construct a Bayesian framework and then complete the corresponding Bayesian inference to develop an iterative algorithm.

Bayesian Framework
A Bayesian framework is composed of the prior distributions of the observed data and the unknown variables.

Remark 1.
According to the MAP criterion, priors are essential for SBL-based methods because the iterative process to be constructed is based on derivatives with respect to the different variables so as to maximize the posterior. Therefore, the prior distributions need to be clarified first.
In this paper, the prior distribution (i.e., likelihood) of the observed data x is similar to (4), i.e.,

p(x|p; α) = CN(x; Φp, α^{−1}I_{NL}). (13)

For better sparsity performance, we propose hierarchical priors containing Gaussian and Gamma priors. The reason for selecting Gamma priors is that the Gamma distribution is the conjugate prior to the inverse variance of the Gaussian distribution [54]. As usual, the prior of p obeys an i.i.d. complex Gaussian distribution, i.e.,

p(p|γ) ∼ CN(0, Σ), (14)

where γ = [γ_1, . . ., γ_M]^T and Σ = diag(γ^{−1}) ⊗ B. Note that B ∈ C^{L×L}, representing the temporal correlation level, is not equal to I_L as in the canonical SBL method. Therefore, the proposed method will be able to suppress temporal correlation, which will be tested and verified in Section 5. B is generally modeled as a Toeplitz matrix, i.e.,

B = Toeplitz([1, β, . . ., β^{L−1}]), (15)

where β is the complex correlation coefficient with amplitude |β| ∈ [0, 1] and phase arg(β). Then, a Gamma prior is applied to γ, i.e.,

p(γ; a, b) = ∏_{m=1}^{M} Γ(a)^{−1} b^a γ_m^{a−1} exp(−bγ_m), (16)

where Γ(a) = ∫_0^∞ x^{a−1} exp(−x)dx, and a and b are the shape parameter and scale parameter, respectively. Then, a Gamma prior is applied to α so that

p(α; c, d) = Γ(c)^{−1} d^c α^{c−1} exp(−dα), (17)

where c and d are the corresponding shape and scale parameters, respectively. It is worth emphasizing that the true prior distribution of p can be shown to be a Student's t-distribution, which promotes sparsity better than the traditional Gaussian distribution [47].
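As a quick illustration of the temporally correlated Toeplitz model for B, an AR(1)-style Hermitian Toeplitz matrix can be built as follows; the function name and parameter values are ours, for illustration only.

```python
import numpy as np

def temporal_corr_matrix(beta, L):
    """Hermitian Toeplitz B with B[i, j] = beta^(j-i) for j >= i (AR(1)-style).
    beta is a complex correlation coefficient with |beta| <= 1; B = I_L when beta = 0."""
    idx = np.arange(L)
    D = idx[None, :] - idx[:, None]                # j - i
    return np.where(D >= 0, beta ** np.abs(D), np.conj(beta) ** np.abs(D))

B = temporal_corr_matrix(0.9 * np.exp(1j * 0.3), 5)
assert np.allclose(B, B.conj().T)                  # Hermitian
assert np.all(np.linalg.eigvalsh(B) > 0)           # positive definite for |beta| < 1
```

Setting beta to 0 recovers I_L, i.e., the uncorrelated case assumed by the canonical SBL method.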

Bayesian Inference
Bayesian inference is necessary for the eventual iterative algorithm, and the crux is to deduce the posterior. Unfortunately, according to our Bayesian framework, the posterior is intractable. However, we just need to maximize the posterior by maximizing the evidence, regardless of the analytical closed form of the posterior. Coincidentally, OGSBI [54] provides an example of maximizing the evidence, i.e., the marginal probability of the observed data x,

p(x) = ∫ p(x|p; α, γ)p(p|γ)p(γ; a, b)p(α; c, d) dp dγ dα. (18)

But (18) is still intractable, so maximizing the evidence seems to help little. However, variational inference is able to achieve it. It is necessary to explain variational inference before the later derivation. Variational inference relies on functionals: a functional is a mapping that takes a function as the input and returns a value as the output [55]. In this context, the input function is a probability distribution. When variational inference is applied to (18), the parameter vector (i.e., the unknown stochastic variables) no longer appears because the parameters are absorbed into new probability distributions. Thus, (18) can be converted to an addressable form.
To be specific, we adopt variational Bayesian inference (VBI) [56] to address (18) by introducing a distribution q(Θ), where Θ = {p, γ, α} is the parameter set of unknown variables. The introduced q(Θ) can simplify (18) and allows the logarithmic form of (18) to be divided into two parts, i.e.,

ln p(x) = F(q, Θ) + KL(q||p), (19)

where

F(q, Θ) = ∫ q(Θ) ln [p(x, Θ)/q(Θ)] dΘ,
KL(q||p) = − ∫ q(Θ) ln [p(Θ|x)/q(Θ)] dΘ,

p(x, Θ) = p(x|p; α, γ)p(p|γ)p(γ; a, b)p(α; c, d) is the product of the likelihood and the priors, and p(Θ|x) is the posterior. The specific derivation from (18) to (19) is complicated [55], so it is omitted here. F(q, Θ) is the lower bound of ln p(x) because KL(q||p) ≥ 0 is the Kullback-Leibler divergence between q(Θ) and the posterior p(Θ|x).
The significance of (19) is transforming the intractable ln p(x) into an approximately tractable F(q, Θ), so that maximizing ln p(x) is approximately equal to maximizing F(q, Θ). The lower bound F(q, Θ) is a functional in terms of q(Θ). In other words, F(q, Θ) is a mapping that takes a function q(Θ) as input and returns a value as output. Similar to the function derivative, maximizing F(q, Θ) requires some optimization over specific forms of q(Θ). In Bayesian inference, the commonly used form is factorization [56].
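The decomposition in (19) can be sanity-checked numerically on a toy conjugate-Gaussian model, where the evidence, the lower bound F, and the KL term all have closed forms; everything below is an illustrative example, not the paper's model.

```python
import numpy as np

# Toy conjugate model: theta ~ N(0,1), x|theta ~ N(theta,1)
# => posterior theta|x ~ N(x/2, 1/2), evidence x ~ N(0, 2).
x = 1.3
m, s2 = 0.2, 0.4                                   # arbitrary variational q = N(m, s2)

def log_norm(v, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mu) ** 2 / var

# F(q) = E_q[ln p(x, theta)] + H(q)  (all expectations analytic for Gaussians)
Eq_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s2)
Eq_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + s2)
entropy = 0.5 * np.log(2 * np.pi * np.e * s2)
F = Eq_loglik + Eq_logprior + entropy

# KL(q || p(theta|x)) between two Gaussians, closed form
mu_post, var_post = x / 2, 0.5
KL = 0.5 * np.log(var_post / s2) + (s2 + (m - mu_post) ** 2) / (2 * var_post) - 0.5

assert KL >= 0
assert np.isclose(F + KL, log_norm(x, 0.0, 2.0))   # ln p(x) = F(q) + KL(q||p) exactly
```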
Utilizing (13), (14), and (23), q(p) can be solved as a Gaussian distribution, with mean and covariance given by

µ_p = ⟨α⟩Σ_pΦ^H x, (24)
Σ_p = (⟨α⟩Φ^HΦ + ⟨Σ^{−1}⟩)^{−1}, (25)

where ⟨·⟩ denotes the expectation with respect to the current variational distributions. Please refer to Appendix A for the proof. In general, (24) and (25) are equivalently transformed into less complex forms according to the matrix properties in [46].
Using (14), (16), and (28), q(γ) is identified as a Gamma distribution, whose shape parameter, m-th scale parameter, and m-th element of the mean are derived for m = 1, 2, . . ., M. Please refer to Appendix B for the proof. The scale-parameter expression (30) can be simplified further; please refer to Appendix C for the details of the derivation.

Remark 2.
According to [57], maximizing the lower bound F(q, Θ) guarantees convergence of the iterative optimization since each iteration leads to a nondecreasing value of F(q, Θ). Therefore, the proposed method must converge at some point.

Off-Grid Correction
Recalling (2), P with the angle set {θ_m}_{m=1}^{M} is yielded by spatial discretization, inevitably causing estimation errors if sources are off the grid. On this basis, the array steering vector of the i-th source is Taylor expanded around the nearest sampling grid point, denoted as θ_{m_i}, i.e.,

a(θ_i) ≈ a(θ_{m_i}) + ∆θ_i a′(θ_{m_i}),

where a′(·) is the derivative of the steering vector with respect to the angle and ∆θ_i = θ_i − θ_{m_i}. Patently, the final objective is to solve for ∆θ_i. Following a similar principle based on the Taylor expansion, Φ can be extended accordingly, where λ = [λ_1, . . ., λ_ML]^T collects the off-grid coefficients for i = 1, . . ., ML. To solve λ, (3) can be used. Combining (40) with (39), the following equation holds.
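The first-order Taylor linearization of the steering vector can be verified numerically. The sketch below (half-wavelength ULA, illustrative values) checks that the linearization is accurate for a small off-grid offset.

```python
import numpy as np

def a(theta, N, dl=0.5):
    """ULA steering vector, d/lambda = dl, angle theta in radians."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * dl * n * np.sin(theta))

def da(theta, N, dl=0.5):
    """Derivative of a(theta) with respect to theta."""
    n = np.arange(N)
    return (-2j * np.pi * dl * n * np.cos(theta)) * a(theta, N, dl)

N = 8
theta_m = np.deg2rad(10.0)                         # nearest grid point
dtheta = np.deg2rad(0.1)                           # small off-grid offset
exact = a(theta_m + dtheta, N)
taylor = a(theta_m, N) + dtheta * da(theta_m, N)   # first-order expansion
err = np.linalg.norm(exact - taylor) / np.linalg.norm(exact)
assert err < 1e-3                                  # small offset -> accurate linearization
```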
Overall, the whole BSBL algorithm is completed and summarized in Algorithm 1.

Initialization
(i) set the first iteration number k = 0 and initialize p^(0) = ‖A^H X‖_2 (the row-wise l_2 norms).
(ii) assign a, b, c, and d very small values (to ensure uninformative distributions).
(iii) use the final µ_p to solve λ according to (41).
Compared with the canonical SBL method, BSBL achieves better sparsity performance and lower computational complexity. Measured by the maximal number of complex multiplications, the complexity of BSBL, dominated by (26), is O(M^2NL^3). However, BSBL will still suffer heavy computational burdens when L or M is large. Therefore, we must seek some techniques to reduce the computational complexity.

BSBL-APPR
Obviously, the large complexity is mainly caused by high-dimensional matrix operations that contain massive useless zero (or near-zero) operations. Naturally, the simplest approach is to exploit some approximation to shrink the matrix dimensions, generating the first faster version, called BSBL-APPR.
Recalling the entire algorithm, the high dimensions are essentially yielded by computing (26), (27), and (37). For µ_p in (26), a corresponding approximation (43) is applied. The derivation process follows the Kronecker-product properties, i.e.,

(A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), (A ⊗ B)^H = A^H ⊗ B^H.

The approximation exactly holds if α^{−1} = 0 or B = I_L. To be specific, the approximation is reasonable if high SNRs or low correlation coefficient levels are adopted. In fact, the two conditions (or at least one) are easy to meet in practice.
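The mixed-product property underlying the approximation can be checked directly, together with the dimension-reduction trick of applying Φ = A ⊗ I_L without ever forming it (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, L = 4, 6, 3
A = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
P = rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))

# Mixed-product property: (A kron B)(C kron D) = (AC) kron (BD)
B = rng.standard_normal((L, L))
C = rng.standard_normal((M, M))
D = rng.standard_normal((L, L))
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
assert np.allclose(lhs, rhs)

# Dimension-reduction trick: apply Phi = A kron I_L without forming it
p = P.T.flatten(order="F")                         # vec(P^T)
big = np.kron(A, np.eye(L)) @ p                    # forms the NL x ML matrix
small = (A @ P).T.flatten(order="F")               # vec((A P)^T), no big matrix needed
assert np.allclose(big, small)
```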
Likewise, for Σ_p in (27), a corresponding approximation (44) is applied. Similarly, for d in (37), a corresponding approximation (45) is applied. After these approximations, BSBL-APPR is complete. Its concrete iterative steps are omitted here since the process is the same as that of BSBL, except that µ_p, Σ_p, and d are calculated by (43)–(45).
Since the approximation operations have been performed, the computational complexity of BSBL-APPR is reduced to O(M^2N), which theoretically verifies the higher efficiency of BSBL-APPR.
Even so, the complexity of BSBL-APPR still seems intolerable when dense sampling is adopted, i.e., when M is large enough. Additionally, there exist several operations with many zero (or near-zero) elements, e.g., AΓ^{−1}A^H and Γ^{−1} ⊗ B. For lower complexity, the most effective method is to decompose matrix operations into vector and even scalar operations so as to selectively avoid useless computation. Fortunately, a technique named generalized approximate message passing (GAMP) achieves exactly that.

BSBL-GAMP
GAMP is a technique developed to solve approximate marginal posteriors with low complexity based on the central limit theorem [58]. To briefly explain the principle of the GAMP technique, (3) is rewritten in scalar form, i.e.,

x_i = z_i + n_i, z_i = Φ_{i,·}p, (46)

where x_i and n_i are the i-th entries of x and n, p = [p_1, . . ., p_j, . . ., p_ML]^T, and j = 1, . . ., ML. Given the known measurement matrix Φ and the observed vector x, the objective is to obtain the estimate of p. Each x_i is connected to p_j by Φ, and vice versa. x_i and p_j are defined as the input node and the output node, respectively. The association between them is called an edge. Input nodes and output nodes pass messages to each other along the edges. The original message passing (MP) technique keeps passing messages (i.e., probability distributions) with respect to p_j until convergence. Based on MP, GAMP is an extension for low complexity since it passes only the important messages that mainly affect the approximated marginal posteriors of p.
Overall, the GAMP technique is selected to speed up our algorithm for two reasons: (i) it passes messages from one node to another, enforcing scalar operations with low complexity; (ii) it can also compute approximate marginal posteriors of p, which allows GAMP to be embedded into the proposed BSBL.
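To make the message-passing recursion concrete, the following is a minimal sketch of GAMP for the linear model x = Φp + n, assuming real-valued data, AWGN, and an i.i.d. zero-mean Gaussian prior on p. The function name and interface are ours, not the paper's; the paper's BSBL-GAMP instead embeds the hierarchical block priors into the input-side denoiser.

```python
import numpy as np

def gamp_gaussian(x, Phi, noise_var, prior_var, n_iter=50):
    """Minimal scalar GAMP sketch for x = Phi @ p + n with an i.i.d.
    Gaussian prior N(0, prior_var) on p and AWGN of variance noise_var.
    Only vectors are updated, so each iteration costs O(len(x)*len(p))."""
    I, J = Phi.shape
    Phi2 = np.abs(Phi) ** 2          # element-wise squared entries of Phi
    p_hat = np.zeros(J)              # posterior means of p_j
    tau_p = np.full(J, prior_var)    # posterior variances of p_j
    s_hat = np.zeros(I)
    for _ in range(n_iter):
        # output (measurement) side: plug-in estimate of z = Phi @ p
        tau_z = Phi2 @ tau_p
        z_hat = Phi @ p_hat - tau_z * s_hat
        s_hat = (x - z_hat) / (noise_var + tau_z)
        tau_s = 1.0 / (noise_var + tau_z)
        # input (prior) side: pseudo-observation r_j of each p_j
        tau_r = 1.0 / (Phi2.T @ tau_s)
        r_hat = p_hat + tau_r * (Phi.T @ s_hat)
        # Gaussian-prior denoiser g_in: posterior mean/variance of p_j
        gain = prior_var / (prior_var + tau_r)
        p_hat = gain * r_hat
        tau_p = gain * tau_r
    return p_hat, tau_p
```

Each iteration involves only matrix-vector products and element-wise updates, with no matrix inversion; replacing the Gaussian denoiser g_in with the posterior under the hierarchical priors gives the structure of BSBL-GAMP.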
Similar to (7), (47) and (48) are easily identified as Gaussian distributions, whose means and variances are those of p(z_i | x, υ_zi, τ_zi, η) and p(p_j | x, υ_pj, τ_pj, η), respectively; please refer to Appendix E for details of the derivation. Then, two scalar functions, denoted g_in(·) and g_out(·), need to be determined: g_in(·) equals the posterior mean µ_pj of p_j, together with the corresponding posterior variance, while g_out(·) satisfies the output-side update with its corresponding posterior variance. So far, the derivation of GAMP is completed. Note that the variance in (52) becomes a vector (equivalent to a diagonal matrix), while Σ_p of BSBL (or BSBL-APPR) is still a full matrix. From this perspective, BSBL-GAMP is generated by the most thorough approximation, resulting in the least complexity. The GAMP algorithm is summarized in Algorithm 2.
Intuitively, only µ_p and the diagonalized Σ_p are updated in the GAMP algorithm. For the hyperparameter α, the update would still incur relatively large complexity if computed by (36); hence, α is rewritten as in (55). Overall, the second faster version, called BSBL-GAMP, is eventually obtained by embedding the GAMP algorithm into BSBL. The specific steps are summarized in Algorithm 3.

Initialization.
(i) set the initial iteration number k = 0, the error tolerance ε, and the maximal iteration number k_max.
(iii) assign very small values to a, b, c, and d.
(i) compute µ p and Σ p according to the above GAMP.
Theoretically, BSBL-GAMP contains only simple vector multiplications and linear operations, so it is undoubtedly the fastest algorithm. Specifically, its complexity, dominated by (55), is O(ML), far less than the O(M²NL³) of BSBL or the O(M²N) of BSBL-APPR. For comparison, the computational complexity of all the narrowband algorithms involved in this paper is summarized in Table 2. Generally, M ≫ N, L holds; thus, BSBL-GAMP clearly has the least computational complexity. BSBL-APPR is also satisfactory, since its complexity is smaller than that of all the others except IC-SPICE and RVM-DOA, while the complexity of BSBL is moderate.
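The scalings quoted above can be compared directly under the baseline setting used later (M = 180, N = 8, L = 4); the following is a rough sketch that ignores constant factors:

```python
# Per-iteration complexity scalings quoted in the text (constants ignored).
def cost_bsbl(M, N, L):
    return M**2 * N * L**3      # O(M^2 N L^3)

def cost_bsbl_appr(M, N):
    return M**2 * N             # O(M^2 N)

def cost_bsbl_gamp(M, L):
    return M * L                # O(M L)

M, N, L = 180, 8, 4             # baseline simulation setting
print(cost_bsbl(M, N, L), cost_bsbl_appr(M, N), cost_bsbl_gamp(M, L))
```

Under this setting, BSBL-APPR removes exactly the L³ factor relative to BSBL, and BSBL-GAMP is several orders of magnitude cheaper still.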

BSBL for Wideband Sources
Although BSBL, BSBL-APPR, and BSBL-GAMP are developed, they are only applicable in the case of narrowband sources.For wideband cases, we must extend BSBL further.
A way to deal with wideband sources is to separate the wideband spectrum into independent narrowband ones. Without loss of generality, (3) can be rewritten as the special case at the j-th frequency point f_j, ∀j = 1, ..., J. In this model, it is worth emphasizing that we only care about the locations of the non-zero elements in p_j rather than their concrete values, because the different p_j theoretically indicate the same locations of the sources. Consequently, considering all the frequency points, the different p_j can be unified as q, so that y = Ψq + n, where y ∈ C^(NLJ×1) stacks the measurements of all frequency points. Similar to the aforementioned derivation of BSBL, (58)-(65) are yielded as follows.
To be distinguished from BSBL in narrowband cases, the different parameters in wideband cases carry the superscript or subscript w and q. In particular, Σ_w = diag((γ^w)⁻¹) ⊗ B is the variance of the variable q, where γ^w = [γ^w_1, ..., γ^w_M]ᵀ. In wideband cases, the off-grid correction is the same as in BSBL, except for (41), which is replaced by its wideband counterpart. So far, BSBL for wideband sources has been completed. The specific process is summarized in Algorithm 4. Initialization.
(ii) Use the final µ q to solve λ w according to (66).
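The spectrum-separation step described above can be sketched as follows: short segments of the array output are taken to the frequency domain, so that each DFT bin approximately obeys a narrowband model y_j = A(f_j)s_j + n_j. The helper name and interface below are ours, not the paper's:

```python
import numpy as np

def split_into_bins(time_snapshots, fs, n_fft):
    """Split wideband array data (n_sensors x n_samples) into narrowband
    frequency-bin snapshots via the DFT. Each bin j then approximately
    follows a narrowband model at frequency freqs[j] (hypothetical helper)."""
    n_sensors, n_samples = time_snapshots.shape
    n_seg = n_samples // n_fft
    # cut the record into n_seg non-overlapping segments of length n_fft
    segs = time_snapshots[:, :n_seg * n_fft].reshape(n_sensors, n_seg, n_fft)
    # one complex snapshot per (segment, frequency bin)
    spectra = np.fft.fft(segs, axis=2)        # n_sensors x n_seg x n_fft
    freqs = np.fft.fftfreq(n_fft, d=1.0 / fs)
    return spectra, freqs
```

Collecting, for each bin j, the n_seg segment spectra across sensors yields the per-frequency snapshot matrices that are then stacked into y in the unified model y = Ψq + n.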

Numerical Simulation
In this section, the superiority of our proposed algorithms is demonstrated comprehensively through three subsections of extensive simulations. For simplicity, the proposed BSBL, BSBL-APPR, and BSBL-GAMP are collectively referred to as BSBLs. In the first and second subsections, the narrowband and wideband estimation performance of various algorithms is evaluated comprehensively. In the third subsection, an in-depth analysis of Bayesian performance is carried out by comparison with other off-grid SBL-based methods.

Estimation Performance for Narrowband Sources
In this subsection, estimation performance is evaluated by the root-mean-square error (RMSE), expressed as

RMSE = sqrt( (1/(M_c K)) Σ_{m_c=1}^{M_c} Σ_{k=1}^{K} (θ̂_{m_c,k} − θ_k)² ),

where M_c is the number of Monte Carlo trials and K is the number of sources; θ̂_{m_c,k} is the estimate of the k-th source in the m_c-th trial, and θ_k is the true angle of the k-th source. For comparison, we introduce six representative algorithms, i.e., l1-SVD [20], l1-SRACV [21], IC-SPICE [22], SBL [31], SS-ANM [25], and StrucCovMLE [26], in the following simulations. Unless otherwise stated, the baseline simulation conditions are SNR = 20 dB, K = 3 temporally correlated sources with the random DOA set {−20.53°, 10.10°, 43.01°}, N = 8 sensors, L = 4 snapshots, a grid interval of 1°, M = 180 grid points, M_c = 200 Monte Carlo trials, and a temporal correlation coefficient β = 0.
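The RMSE defined above can be computed as in the following sketch; the sorting-based pairing of estimates to true angles is our assumption, since the paper does not specify the pairing rule:

```python
import numpy as np

def rmse(est, true):
    """RMSE over Mc Monte Carlo trials and K sources, matching the
    definition in the text. est is an (Mc, K) array of estimated angles
    in degrees; true is a length-K array of true angles. Estimates are
    paired to true angles by sorting (an assumption of this sketch)."""
    est = np.sort(np.asarray(est, dtype=float), axis=1)
    true = np.sort(np.asarray(true, dtype=float))
    return np.sqrt(np.mean((est - true) ** 2))
```

For example, if every trial over-estimates each of the three baseline DOAs by exactly 1°, the returned RMSE is 1.0.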
Remark 3. The conditions β = 0 and SNR = 20 dB (or at least one of them) are imposed to ensure that BSBL-APPR performs normally.
Simulation 1 tests the ability of various algorithms to suppress temporal correlation. Specifically, the amplitudes and phases of the temporally correlated coefficients vary uniformly over [0, 1] and [0, 2π], respectively. The results are shown in Figure 2. The RMSEs of BSBLs remain persistently low and fluctuate only slightly as the correlation coefficient varies, while the other algorithms struggle when the correlation coefficient is large. In particular, IC-SPICE is able to mitigate the influence of temporal correlation to some extent, but it is still at a loss when the correlation level is high. Overall, the results fully meet our expectation that BSBLs suppress temporal correlation effectively, which confirms that the temporal correlation modeled in the block-sparse model indeed plays an important role in improving robustness to temporal correlation.

Simulation 2 examines the dependence on the number of snapshots. In Figure 3, all the sparsity-based algorithms are collectively robust to various numbers of snapshots; in other words, all of them achieve SSR with only a few snapshots. Even so, BSBLs are commendable for realizing the lowest RMSEs. In fact, SBL itself is able to find global minima and smooth out numerous local minima in some cases with a few snapshots [48], and as SBL-based methods, BSBLs inherit this advantage. Additionally, the hierarchical priors used improve sparsity performance, so BSBLs perform best. Thus, BSBLs enable high estimation precision with only a few snapshots.

Simulation 3 focuses on the RMSE performance with respect to SNR. As shown in Figure 4, all the algorithms work well at high SNRs, while only BSBLs maintain low RMSEs at low SNRs. The results can be explained by the fact that sparsity-based algorithms rely on high SNRs to some extent, whereas SBL can reduce this dependency. Taking IC-SPICE as an example, it achieves efficient iterative optimization at high SNRs but cannot find the right global minima at low SNRs, and may even be trapped in fixed local optima; the underlying reason is that the employed covariance matrix and the updated parameters deviate considerably from their ideal values. In contrast, SBL works normally in the presence of data errors owing to its convergence guarantee and gradual approach to the optimum, and the proposed BSBLs share a similar ability. On the whole, BSBLs are preferable, especially at low SNRs.

Simulation 4 investigates the RMSE performance with respect to the number of sensors. This simulation is executed with L = 20 snapshots rather than L = 4, since l1-SVD cannot work normally when the number of sensors exceeds the number of snapshots. In Figure 5, the RMSE performance of BSBLs is still excellent, although BSBLs are inferior to IC-SPICE at N = 4. In fact, the results relate to the ability to solve underdetermined DOA estimation problems. SBL still tends to find the sparsest solutions even when the restricted isometry property (RIP) is not satisfied [48]; when the number of sensors is not large enough, i.e., the solution to be found is not sparse enough, SBL still strives to reach global minima. Thus, BSBLs handle underdetermined cases efficiently; in other words, BSBLs realize SSR effectively with only a few sensors.

Simulation 5 tests the adaptability to wide grid intervals. In Figure 6, the results illustrate that both BSBLs and IC-SPICE obtain highly accurate estimates on refined grids, but only BSBLs adapt, if reluctantly, to wide grid intervals, although all the algorithms struggle on coarse grids. Coarse grids force relatively large errors in the pre-estimated values, so the refined values have a larger bias. In the proposed BSBLs, the grid refinement reduces this bias at each iteration; therefore, BSBLs adapt to coarse grids to some extent.

Simulation 6 examines the RMSE performance with respect to the number of sources. The differing conditions are as follows: the DOA sets of the sources are selected randomly from [−90°, 90°], and L = 10 is chosen to ensure the normal operation of l1-SVD. In Figure 7, the performance of all the algorithms degrades rapidly, but BSBLs slow the pace of degradation, which implies that BSBLs have the potential to locate more sources. In fact, this is further proof of the excellent underdetermined DOA estimation ability of BSBLs seen in Simulation 4: more sources and fewer sensors play similar roles in decreasing the sparsity level in sparse recovery theory, and BSBLs handle this case efficiently.
Simulation 7 tests the spectral performance of the above algorithms. Its conditions are two uncorrelated chirps at angles of −10° and 25°, with a center frequency of 1 kHz and a bandwidth of 400 Hz (from 0.8 kHz to 1.2 kHz), SNR = 20 dB, N = 8 sensors, L = 4 snapshots, and a grid interval of 1°. Intuitively, in Figure 8, JLZA-DOA and W-SpSF exhibit explicit sidelobes around their spikes, while GS-WSpSF, ANM, W-SBL, and BSBL are excellent, since their spectra are almost free of sidelobes and their spikes are sharp. It is worth noting that the spikes of W-SBL are very low at some frequency points; specifically, W-SBL fluctuates intensely across frequencies, i.e., the signal energy leaks non-uniformly between different frequencies. GS-WSpSF and ANM are better; at least their spikes are visible and apparent over the range of all the frequency points. The proposed BSBL has the highest spikes and varies little over the range of frequencies. Thus, the following conclusions are drawn. (1) When sparse recovery is adopted, BSBL ensures that equal energy is assigned between different frequency bins. (2) BSBL realizes the highest spikes and shows the best convergence effect, i.e., ensuring global minima.

Simulation 8 examines the RMSE performance with respect to the number of snapshots. The condition set is the same as the baseline conditions, except that the sources are wideband. As expected, BSBL achieves excellent RMSE performance in Figure 9. Over the whole range of the number of snapshots, BSBL shows overwhelming advantages and patently outperforms the others. In fact, DOA estimation for wideband sources is difficult, and one of the main reasons is that many algorithms fail to achieve accurate estimation over the whole range of frequency bins. Simulation 7 showed the special ability of BSBL to overcome this problem, and Simulation 8 verifies it again. On the one hand, BSBL for wideband sources maintains the superiority of narrowband BSBL, so it obtains excellent estimation performance; on the other hand, BSBL extends the advantages of SBL to wideband sources, i.e., achieving robust sparse recovery with only a few snapshots for all the sources over the whole frequency band.

Simulation 9 tests the RMSE performance with respect to SNR. In Figure 10, BSBL outperforms the others and improves steadily as the SNR increases. It is worth noting that the results differ from the narrowband ones in Simulation 3; to be specific, BSBL fluctuates intensely with the varied SNRs, so BSBL for wideband sources does not work as well as in narrowband cases. We carefully analyzed the reasons and found that BSBL cannot guarantee global minima at each frequency bin for sparse recovery. Despite this, the defects do not obscure the virtues: BSBL still achieves impressive performance for wideband sources.

Analysis of Sparse Bayesian Performance
According to the common perception of SBL, an elaborate Bayesian framework with substantial priors is regarded as a characteristic that enhances sparsity well, because priors play the role of regularization in sparse recovery [55,59]. Here, we abstract the Bayesian frameworks from several off-grid SBL-based algorithms, namely RVM-DOA [37], RV-ON-SBL [47], ON-SBLRVM [43], SBLMC [39], and HSL [46], for comparison and analysis. As shown in Figure 11, RVM-DOA, RV-ON-SBL, and ON-SBLRVM only impose Gaussian priors, which have been proven to have poor Bayesian performance. The complicated Bayesian frameworks of HSL and BSBL are the same, except for the priors assigned to the unknown variables; thus, they may perform equally well. SBLMC has the most elaborate Bayesian framework, composed of sufficient priors, so its sparsity performance should theoretically be perfect. To confirm the above conjectures, two more simulations are performed to test the estimation performance of these algorithms.
Simulation 10 tests the RMSE performance with respect to the number of snapshots. The conditions are the same as the baseline conditions. In Figure 12, the RMSE performance of RVM-DOA, RV-ON-SBL, and ON-SBLRVM is expectedly worse than that of the others, but SBLMC does not meet our expectations either, while BSBL and HSL achieve preeminent RMSE performance. BSBL and HSL, with moderately elaborate Bayesian frameworks, outperform the others, including SBLMC with the most elaborate one. This result seems to violate the rule of Bayesian learning, which will be explained in the following text.
Simulation 11 tests the RMSE performance with respect to SNR. In Figure 13, the proposed BSBL still works best, although its advantages are not obvious, especially when SNRs are low. Among all the SBL-based methods, BSBL shows no more advantages than the others at low SNRs, because hierarchical priors improve sparsity if, and only if, SNRs are high. To be specific, Bayesian learning is able to find global minima even at low SNRs, but the parameters yielded by the hierarchical priors update well only if SNRs are high.

Based on the simulation results, it can be seen that the complicated Bayesian frameworks, i.e., II, III, and IV in Figure 11, indeed achieve better Bayesian performance than the canonical one, i.e., I in Figure 11. However, SBLMC, with the most elaborate Bayesian framework, has not met our initial expectation, which can be explained by two facts: (i) SBL belongs to machine learning, so a Bayesian framework with too many priors yields massive iterative hyperparameters, leading to overfitting during the iterative process; (ii) SBLMC was developed in the presence of mutual coupling, so the additional hyperparameter iterations involved are bound to affect the key parameters related to DOA estimation.
It is worth emphasizing that BSBL achieves slightly better estimation performance than HSL. The result indicates that the indirectly induced Student's t priors, generated by Gaussian and Gamma priors, indeed express excellent sparsity performance. In fact, Student's t priors have preferable sparsity-inducing performance, which has been mentioned in [55,56].
Overall, the simulation results of the above three subsections sufficiently demonstrate the superiority of BSBLs. Understandably, BSBL leverages hierarchical Gaussian and Gamma priors and uses VBI to complete the Bayesian inference so as to construct the corresponding iterative algorithm. Theoretically, its superiority is guaranteed by (1) the indirect Student's t-distributions, which have an excellent sparsity-inducing ability [56], and (2) the variational approximation for Bayesian inference, which shows better performance than the maximum a posteriori (MAP) estimation adopted in many SBL-based methods [60]. In addition, the two approximation operations achieve impressive running efficiency beyond many state-of-the-art methods. Moreover, BSBL still performs well in wideband cases and outperforms other algorithms in terms of spectrum peaks and super-resolution. Last but not least, BSBL suppresses temporal correlation efficiently owing to its tactful algorithm design.

Conclusions
In this paper, we develop a DOA estimator (i.e., BSBL) based on sparse Bayesian learning with hierarchical priors. Due to the unacceptable computational complexity caused by the vectorization of the MMV model, two approximation operations are introduced, thereby yielding two faster versions of BSBL, i.e., BSBL-APPR and BSBL-GAMP. As expected, all the proposed BSBLs (including BSBL, BSBL-APPR, and BSBL-GAMP) achieve excellent estimation performance. For narrowband source estimation, BSBLs show excellent sparsity performance owing to the designed hierarchical priors. Further, BSBLs inherit and even extend the advantages of SBL, such as the sparse signal recovery guarantee, less dependency on numerous snapshots or high SNRs, and the ability to handle underdetermined DOA estimation. Moreover, BSBLs are robust to temporally correlated sources and adapt to coarse grids, owing to the modeled temporal correlation and the grid refinement used. For wideband source estimation, BSBL largely maintains these advantages, realizing highly accurate estimation across the whole frequency band, while other algorithms suffer performance reductions to varying degrees. However, in wideband cases, BSBL cannot retain its good narrowband performance at low SNRs, which we aim to address in our next study. As for Bayesian performance, BSBL, with a moderately elaborate Bayesian framework, realizes the best estimation performance. Furthermore, BSBL balances sparsity and complexity: it achieves sharp spectrum spikes while avoiding the overfitting produced by too many parameters.
Overall, the proposed BSBLs tactfully combine hierarchical priors with the block-sparse model, which contributes much to complexity reduction and has not been achieved by other SBL-based methods. Moreover, BSBLs retain and extend the advantages of SBL. Most importantly, BSBL is more practical and applicable when sources are temporally correlated or wideband. Nevertheless, BSBL is not perfect, since its performance suffers at low SNRs in wideband cases to some extent. On balance, the proposed BSBLs are well worth recommending.
The terms unrelated to $\alpha$ are absorbed into the constant. Then $\alpha$ obeys another Gamma distribution with shape parameter $c + \frac{NL}{2}$ and scale parameter $d + \frac{(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})^{H}(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})}{2}$.
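This conjugate update can be written out explicitly. The following derivation is a standard Gamma–Gaussian conjugacy step, reconstructed under the assumption that $\mathbf{x}$ is the $NL$-dimensional vectorized measurement with likelihood $\mathcal{N}(\mathbf{x} \mid \boldsymbol{\Phi}\mathbf{p}, \alpha^{-1}\mathbf{I})$:

```latex
p(\alpha \mid \mathbf{x}, \mathbf{p})
  \propto \mathcal{N}(\mathbf{x} \mid \boldsymbol{\Phi}\mathbf{p}, \alpha^{-1}\mathbf{I})\,
          \mathrm{Gamma}(\alpha \mid c, d)
  \propto \alpha^{\frac{NL}{2}}
          \exp\!\Big(-\tfrac{\alpha}{2}\,(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})^{H}(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})\Big)\,
          \alpha^{c-1} e^{-d\alpha}
  = \alpha^{\,c+\frac{NL}{2}-1}
    \exp\!\Big(-\Big[d + \tfrac{(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})^{H}(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})}{2}\Big]\alpha\Big)
```

Reading off the exponents gives $\alpha \mid \mathbf{x}, \mathbf{p} \sim \mathrm{Gamma}\big(c + \frac{NL}{2},\; d + \frac{(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})^{H}(\mathbf{x}-\boldsymbol{\Phi}\mathbf{p})}{2}\big)$, matching the stated shape and scale parameters.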

Appendix E
To obtain the mean and variance of the variables (i.e., $z_i$ and $p_j$), their distributions are transformed into logarithmic form as follows: $\ln(49) = \ln\!\left[\mathcal{N}(x_i \mid z_i, \alpha^{-1})\,\mathcal{N}(z_i \mid \upsilon_{z_i}, \ldots)\right]$. The final equation of (A6) is yielded based on the expansion of the Gaussian function.

noise matrix with each entry $n_l$ obeying an i.i.d. Gaussian distribution denoted as

Figure 1. Flowchart of radar signal processing for DOA estimation.

Let $\{\theta_m\}_{m=1}^{M}$ be an $M$-point grid sampling that covers the spatial range $[-\pi/2, \pi/2]$. If the sources are located exactly on the grid, (1) can be transformed into a sparse model expressed as

X = AP + N. (2)
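The on-grid sparse model above can be sketched numerically. The following is an illustrative construction (not the paper's code) assuming a uniform linear array with half-wavelength spacing; the array geometry and the specific source DOAs are assumptions for the example:

```python
import numpy as np

# Build the on-grid sparse MMV model X = A P + N of Eq. (2):
# N_sens sensors, L snapshots, M grid points over [-90, 90] degrees.
N_sens, L, M = 8, 4, 181
grid = np.linspace(-90, 90, M)                       # grid sampling {theta_m}
m_idx = np.arange(N_sens)[:, None]
# Steering matrix for a half-wavelength-spaced ULA (N_sens x M)
A = np.exp(-1j * np.pi * m_idx * np.sin(np.deg2rad(grid)))

# Two on-grid sources -> row-sparse coefficient matrix P (M x L)
rng = np.random.default_rng(0)
P = np.zeros((M, L), dtype=complex)
doa_idx = [np.argmin(np.abs(grid - d)) for d in (-10.0, 20.0)]
P[doa_idx, :] = rng.standard_normal((2, L)) + 1j * rng.standard_normal((2, L))

# Additive Gaussian noise and the MMV observation
noise = 0.1 * (rng.standard_normal((N_sens, L)) + 1j * rng.standard_normal((N_sens, L)))
X = A @ P + noise
```

Only two of the M rows of P are nonzero, which is exactly the row-sparsity that the block-sparse vectorized model exploits after stacking the columns of X.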

are the shape parameter and scale parameter, respectively. Then, a Gamma prior is applied to $\alpha$ so that
least one) are provided to ensure that BSBL-APPR performs normally. Simulation 1 tests the ability of various algorithms to suppress temporal correlation. Specifically, the amplitudes and phases of the temporally correlated coefficients vary uniformly over $[0, 1]$ and $[0, 2\pi]$, respectively.
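One plausible way to generate such temporally correlated snapshots is sketched below. The paper only specifies the amplitude and phase ranges of the correlation coefficients; the first-order autoregressive recursion used here is an illustrative assumption, not taken from the paper:

```python
import numpy as np

# Correlation coefficients with amplitudes drawn uniformly from [0, 1]
# and phases drawn uniformly from [0, 2*pi], as in Simulation 1.
rng = np.random.default_rng(1)
K, L = 2, 20                                   # K sources, L snapshots
amp = rng.uniform(0.0, 1.0, K)                 # amplitudes in [0, 1]
phase = rng.uniform(0.0, 2 * np.pi, K)         # phases in [0, 2*pi]
rho = amp * np.exp(1j * phase)                 # complex correlation coefficients

# Assumed AR(1) recursion: each snapshot depends on the previous one,
# with the innovation scaled so the marginal power stays at one.
S = np.empty((K, L), dtype=complex)
w = lambda n: (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
S[:, 0] = w(K)
for t in range(1, L):
    S[:, t] = rho * S[:, t - 1] + np.sqrt(1 - amp**2) * w(K)
```

With |rho| close to one the snapshots become strongly correlated in time, which is the regime in which the temporal-correlation modeling of BSBL is expected to pay off.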


Figure 4. RMSE versus SNR. Simulation 4 investigates the RMSE performance with respect to the number of sensors. Here, this simulation is executed with L = 20 snapshots rather than L = 4 because $\ell_1$-SVD cannot work normally when the number of sensors exceeds the number of snapshots. Intuitively, in Figure 5, the RMSE performance of BSBLs is still excellent, although BSBLs are inferior to IC-SPICE at N = 4. In fact, the results are related to the ability to solve underdetermined DOA estimation problems. SBL still seems to find the sparsest solutions even when the restricted isometry property (RIP) is not satisfied [48]. When the number of sensors is not large enough, i.e., the solution to be found is not sparse enough relative to the measurements, SBL still strives to reach the global minima. Thus, BSBLs can handle underdetermined cases efficiently. In other words, BSBLs are able to realize SSR effectively with only a few sensors.
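The RMSE metric reported throughout these simulations can be sketched as the root of the mean squared DOA error over sources and Monte-Carlo trials. The function name and the nearest-estimate matching step below are illustrative assumptions, since the paper does not spell out the pairing rule:

```python
import numpy as np

def doa_rmse(est_trials, true_doas):
    """RMSE of DOA estimates in degrees.

    est_trials: (n_trials, K) array of estimated DOAs per trial.
    true_doas:  (K,) array of ground-truth DOAs.
    Each true DOA is paired with its nearest estimate in that trial.
    """
    errs = []
    for est in est_trials:
        for d in true_doas:
            nearest = est[np.argmin(np.abs(est - d))]
            errs.append((nearest - d) ** 2)
    return float(np.sqrt(np.mean(errs)))

# Two trials, two true sources at -10 and 20 degrees
trials = np.array([[-9.8, 20.3],
                   [-10.1, 19.7]])
rmse = doa_rmse(trials, np.array([-10.0, 20.0]))
```

Averaging the squared errors before the square root means a single badly resolved source dominates the score, which is why curves rise sharply once an algorithm starts merging or missing spikes.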

Sensors 2024, 24, 2336

Figure 6. RMSE versus grid interval. Simulation 6 examines the RMSE performance with respect to the number of sources. The different conditions are as follows: the DOA sets of the sources are selected randomly from $[-90^{\circ}, 90^{\circ}]$, and L = 10 is chosen to ensure the normal operation of $\ell_1$-SVD. In Figure 7, all the RMSEs deteriorate rapidly, but BSBLs slow the pace of performance degradation, which implies that BSBLs have the potential to locate more sources. In fact, the results are another proof of the excellent underdetermined DOA estimation ability of BSBLs shown in Simulation 4. More sources and fewer sensors play similar roles in decreasing the sparsity level in sparse recovery theory, and BSBLs handle this case efficiently.

N = 8, the number of snapshots is L = 4, and the grid interval is $1^{\circ}$. Intuitively, in Figure 8, JLZA-DOA and W-SpSF exhibit explicit sidelobes around their spikes, while GS-WSpSF, ANM, W-SBL, and BSBL are excellent since their spectra are almost free of sidelobes and their spikes are sharp. It is worth noting that the spikes of W-SBL seem very low at some frequency points. Specifically, W-SBL fluctuates intensely with varying frequency, i.e., the signal energy leaks non-uniformly between different frequencies. GS-WSpSF and ANM are better; at least their spikes are visible and apparent over the range of all the frequency points. The proposed BSBL has the highest spikes and varies little over the range of frequencies. Thus, three conclusions are drawn. (1) When sparse recovery is adopted, BSBL ensures that energy is assigned equally between different frequency bins. (2) BSBL realizes the highest spikes and shows the best convergence effect, i.e., it ensures global minima.

Table 1. List of notations.

$\mathcal{CN}(x \mid \mu, \Sigma)$: $x$ obeys a complex Gaussian distribution with mean $\mu$ and covariance $\Sigma$.

Table 2. Complexity of various algorithms.