Compressive Sensing via Variational Bayesian Inference under Two Widely Used Priors: Modeling, Comparison and Discussion

Compressive sensing is a sub-Nyquist sampling technique for efficient acquisition and reconstruction of sparse or compressible signals. To account for the sparsity of the underlying signal of interest, it is common to place sparsifying priors, such as the Bernoulli–Gaussian-inverse Gamma (BGiG) and Gaussian-inverse Gamma (GiG) priors, on the components of the signal. With the introduction of variational Bayesian (VB) inference, sparse Bayesian learning (SBL) methods for solving the inverse problem of compressive sensing have received significant interest, as VB makes SBL more efficient in terms of execution time. In this paper, we consider the sparse signal recovery problem using compressive sensing and the VB inference framework. More specifically, we consider the two widely used Bayesian models, BGiG and GiG, for modeling the underlying sparse signal in this problem. Although these two models have been widely used for sparse recovery problems under various signal structures, the question of which model outperforms the other for sparse signal recovery under no specific structure has yet to be fully addressed under the VB inference setting. Here, we study these two models under VB inference in detail, provide motivating examples of the issues in signal reconstruction that may occur under each model, perform comparisons and provide suggestions on how to improve the performance of each model.


Introduction
Compressive sensing (CS) involves efficient signal acquisition and reconstruction techniques in a sub-Nyquist sampling sense. The CS framework can capture the vital information of the underlying signal via a small number of measurements while retaining the ability to reconstruct the signal. CS operates under the assumption that the signal is compressible or sparse, and the number and location of dominating components are unknown in most cases [1][2][3]. Compressibility or sparsity means that the signal has few dominating elements under some proper basis. CS has been used in a variety of applications, such as the single-pixel camera, inpainting and missing-pixel recovery in images, biomedical applications such as heart rate estimation, the internet of things (IoT), geostatistical data analysis, seismic tomography, communications applications such as blind multi-narrowband signal sampling and recovery, direction-of-arrival (DoA) estimation, spectrum sharing of radar and communication signals, wireless networks and many more. In the linear CS framework, the problem is posed as y = A x_s + e, where y ∈ R^M contains the measurements, x_s ∈ R^N is the sparse signal of interest, e is the noise representing either the measurement noise or the insignificant coefficients of x_s and, generally, M ≪ N [1,2]. The measurement matrix can be defined as A = ΦΨ, where Φ is the sensing design matrix and Ψ is a proper sparsifying basis. There exist various approaches to solve for x_s in (1), including greedy-based, convex-based, thresholding-based and sparse Bayesian learning (SBL) algorithms. Typically, the performance of CS reconstruction is evaluated in terms of the mean-squared reconstruction error. In this paper, we are also interested in the more demanding requirements of the probability of detection and the probability of false alarm of the nonzero components. This is of particular interest in CS applications such as blind multi-narrowband signal recovery, spectrum-sharing radar, etc. [11][12][13][14][15].
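As a concrete illustration of the linear model y = A x_s + e, the following numpy sketch (our own illustrative setup, not code from the paper; the dimensions N, M and k are arbitrary choices) builds a k-sparse signal, a measurement matrix with unit-norm columns and a noisy sub-Nyquist measurement vector:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 100, 40, 10  # signal length, number of measurements (M << N), sparsity

# Sparse signal of interest: k active components at random locations
x_s = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x_s[support] = rng.standard_normal(k)

# Measurement matrix A = Phi @ Psi; here Psi = I (signal sparse in the canonical basis)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)  # normalize columns to unit Euclidean norm

# Sub-Nyquist measurements y = A x_s + e
e = 0.01 * rng.standard_normal(M)
y = A @ x_s + e
print(y.shape)  # (40,)
```

Recovering x_s from the underdetermined pair (y, A) is the inverse problem the rest of the paper addresses.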
The focus of this paper is on sparse Bayesian learning (SBL) for the CS problem. Bayesian learning models are flexible in incorporating prior knowledge of the characteristics of the underlying signal into the model. Bayesian learning also provides a distribution of the hidden variables, which is more informative than point-estimate approaches. A prior favoring sparsity or compressibility in x_s can be represented in the SBL framework via the Gaussian-inverse Gamma (GiG), Laplace-inverse Gamma (LiG) or Bernoulli-Gaussian-inverse Gamma (BGiG) prior, the last often referred to as the spike-and-slab prior, among others [27,[46][47][48][49][50][51][52][53][54][55][56][57][58][59]. The inference on parameters and hidden variables in these models is usually made using Markov chain Monte Carlo (MCMC) and variational Bayes (VB) [27,[45][46][47][48][49][50][51][52]. In this paper, we focus on the two most commonly used SBL prior models for solving the inverse problem of compressive sensing: the Bernoulli-Gaussian-inverse Gamma (BGiG) prior and the Gaussian-inverse Gamma (GiG) prior. These models have been widely used, along with some additional priors, for sparse recovery of signals or images with block-sparsity/clustering patterns, sparse signals with correlated coefficients or other structured patterns [26,27,[48][49][50][51]62,63].
We use VB inference to estimate the variables and parameters of the SBL model. VB is preferred over MCMC here: although MCMC can numerically approximate exact posterior distributions given a sufficient amount of computation, it is computationally expensive, and its convergence diagnostics require additional work, such as measuring the potential scale reduction factor (PSRF) for all the hidden variables and parameters of the model or monitoring their trace plots [45,[50][51][52]65]. In contrast, VB inference can lead to a reasonable approximation of the exact posteriors using less computation than MCMC and less effort to monitor convergence [45,51,66,67]. In this paper, we present the derivation of the update rules of the parameters and variables using VB inference for both the BGiG and GiG models. (Portions of this derivation have been previously presented in [68,69].) Although these prior models have been widely used in various applications of compressive sensing, the overall performance of these models under VB inference has yet to be thoroughly investigated. The preference for one model over the other becomes crucial when dealing with moderate or low sampling ratios, which we discuss in this paper. Here, we study the issues associated with each model via some motivational examples. Pre-/postprocessing approaches are then discussed to tackle these issues. Finally, the overall performance of BGiG and GiG is compared.
The remainder of this work is organized as follows. In Section 2, we present a brief background on VB inference. We study Bernoulli-Gaussian-inverse Gamma modeling for CS using VB in Section 3, with some motivational examples showing the issues with this approach. Section 4 presents Gaussian-inverse Gamma modeling, the associated update rules using VB inference and a motivational example of the issue that may occur using this approach. In Section 5, we study the improvement of the performance of the models after some pre-/postprocessing, along with simulation results and comparisons. Section 6 concludes this work.

Variational Bayesian Inference
Variational Bayes (VB) is an effective approach to approximate intractable integrals that arise in Bayesian inference. The main idea behind variational methods is to use a family of distributions over the latent variables with their own variational parameters. VB is a fast alternative to sampling methods such as Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) for performing approximate Bayesian inference [70,71]. For a probabilistic model with unknown parameters θ and hidden variables x, the posterior distribution of the unknowns, given a set of observations y, can be written as p(x, θ|y) = p(x, θ, y)/p(y). Finding the exact posterior in closed form to perform the inference is a challenge, as the marginal distribution p(y) = ∫ p(y, x, θ) dx dθ is often intractable. As an efficient approximation method for such inference problems, VB provides an analytical approximation to the posterior p(x, θ|y) via a variational distribution Q_{x,θ}(x, θ), i.e., p(x, θ|y) ≈ Q_{x,θ}(x, θ). VB assumes that the distribution Q can be fully factorized with respect to the unknown parameters and hidden variables, i.e., Q_{x,θ}(x, θ) = ∏_{i=1}^{I} Q_{θ_i}(θ_i) ∏_{j=1}^{J} Q_{x_j}(x_j), where I and J are the numbers of unknown parameters and hidden variables, respectively. This independence assumption in VB further simplifies the search for a closed-form approximation of the actual posterior. We desire to select the variational distribution Q*_{x,θ}(x, θ) as close as possible to p(x, θ|y), where closeness is measured by the Kullback-Leibler (KL) divergence KL(Q_{x,θ}(x, θ) ‖ p(x, θ|y)), which we seek to minimize. The quantity log p(y) can be written as log p(y) = log{∫ p(x, θ, y) dx dθ}. Then, defining the lower bound F(Q_{x,θ}(x, θ)) := ∫ Q_{x,θ}(x, θ) log{p(x, θ, y)/Q_{x,θ}(x, θ)} dx dθ, it is straightforward to show that log p(y) = F(Q_{x,θ}(x, θ)) + KL(Q_{x,θ}(x, θ) ‖ p(x, θ|y)). Since (by Jensen's inequality) KL(Q_{x,θ}(x, θ) ‖ p(x, θ|y)) ≥ 0, we have log p(y) ≥ F(Q_{x,θ}(x, θ)).
Since log p(y) is constant with respect to Q_{x,θ}, minimizing the KL-divergence between the actual posterior distribution and the variational distribution is equivalent to maximizing the lower bound F(·) [66,67]. Since the term p(y) in p(x, θ|y) = p(x, θ, y)/p(y) does not involve the variational distribution Q_{x,θ}(x, θ), it can be ignored when maximizing F(·). The lower bound F(·) on the model log-marginal likelihood can be iteratively optimized until convergence by the following update rules [66,72]. VB-E step: Q_x(x) ∝ exp{⟨log p(y, x, θ)⟩_{Q_θ}}. VB-M step: Q_θ(θ) ∝ exp{⟨log p(y, x, θ)⟩_{Q_x}}. This results in an iterative algorithm analogous to the expectation-maximization (EM) approach.
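The decomposition log p(y) = F(Q) + KL(Q ‖ p(x, θ|y)) that underlies these update rules can be checked numerically. The sketch below (a toy discrete model of our own construction, not from the paper) verifies the identity for several variational distributions Q over a single binary hidden variable:

```python
import numpy as np

# Toy discrete model: one hidden variable z in {0, 1}, one fixed observation y.
p_joint = np.array([0.3, 0.1])   # p(z=0, y), p(z=1, y)
p_y = p_joint.sum()              # evidence p(y)
p_post = p_joint / p_y           # exact posterior p(z|y)

def decompose(q):
    """Return the lower bound F(q) and KL(q || p(z|y)) for a variational q."""
    F = np.sum(q * (np.log(p_joint) - np.log(q)))   # F(q) = E_q[log p(z,y) - log q(z)]
    KL = np.sum(q * (np.log(q) - np.log(p_post)))   # KL(q || exact posterior)
    return F, KL

for q1 in (0.1, 0.25, 0.5):
    q = np.array([1 - q1, q1])
    F, KL = decompose(q)
    # log p(y) = F(q) + KL(q || p), and KL >= 0, so F(q) <= log p(y)
    assert np.isclose(F + KL, np.log(p_y)) and KL >= 0
```

KL vanishes exactly when q equals the true posterior (q1 = 0.25 here), at which point F attains log p(y); maximizing F is therefore equivalent to minimizing KL, as stated above.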

Bernoulli-Gaussian-Inverse Gamma Modeling and SBL(BGiG) Algorithm
In the inverse problem of CS defined in (1), the goal is to recover the sparse vector x_s. In the Bernoulli-Gaussian-inverse Gamma model, the sparse solution is defined as x_s = s • x, where s is a binary support vector indicating the non-zero locations in the solution, x represents the values of the solution and • is the Hadamard (element-by-element) product [47]. We refer to the algorithm associated with this Bayesian modeling based on VB inference as SBL(BGiG). SBL using VB inference for clustered patterns of sparse signals has already been investigated in the recent literature [45,50,51,58]. In this paper, however, we focus on ordinary SBL using VB inference without promoting any structure on the supports other than sparsity itself. We show that when the sampling ratio is moderate or low (with respect to the sparsity level), the reconstruction performance becomes sensitive to the selection of the support-related hyperparameters.
We define a set of priors as follows [47,68,69]. The elements of vector s are modeled as Bernoulli variables whose activation probabilities carry Beta(α_0, β_0) hyperpriors, where α_0 and β_0 are the support-related hyperparameters. Setting α_0 and β_0 to small values with α_0 ≪ β_0 encourages s to be sparse on average. The prior on the solution value vector is a zero-mean Gaussian with precision τ, x ~ N(0, τ^{-1} I_N), where τ carries a Gamma(a_0, b_0) hyperprior. Finally, the noise is modeled as zero-mean Gaussian with precision ε, where ε carries a Gamma(θ_0, θ_1) prior and θ_0 and θ_1 are set to small positive values.
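To see how the Beta hyperprior controls the sparsity of s, the following sketch (our own illustration, assuming the common hierarchy in which each support probability π is drawn from Beta(α_0, β_0) and s_n ~ Bernoulli(π)) checks that the expected fraction of active entries is α_0/(α_0 + β_0):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100
alpha0, beta0 = 0.1, 0.9   # support hyperparameters with alpha0 << beta0

# pi ~ Beta(alpha0, beta0): its mean alpha0/(alpha0 + beta0) = 0.1 is the
# expected fraction of active supports, so small alpha0/beta0 ratios favor sparsity.
pi = rng.beta(alpha0, beta0, size=10_000)
active_fraction = pi.mean()
print(active_fraction)

# One draw of the support vector and the BGiG solution x_s = s • x
s = rng.random(N) < rng.beta(alpha0, beta0)
x = rng.standard_normal(N)  # slab: x_n ~ N(0, 1/tau) with tau = 1
x_s = s * x
```

With α_0 = 0.1 and β_0 = 0.9, roughly 10% of the entries are active on average, although individual draws of π vary widely because both shape parameters are small.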

Update Rules of SBL(BGiG) Using VB Inference
According to the VB algorithm defined in (2) and (3), the update rule of the variables and parameters of the BGiG model can be simplified as follows [68]. The details of these derivations appear in Appendix A.1.
• Update rule for the solution precision τ.
• Update rule for the noise precision ε. This yields the following update rule for the precision of the noise component.
The stopping criterion of the algorithm is based on the log-marginalized likelihood. We define the stopping condition in terms of L := log{p(y|s, ε, τ)}. The marginalized likelihood can be written as p(y|s, ε, τ) = ∫ p(y|x, s, ε) p(x|τ I_N) dx.
After some simplification, the negative log-likelihood is proportional to the expression in (19), with S := diag{s}. Therefore, the stopping condition can be set as |L^[t] − L^[t−1]| ≤ T_0 for some small threshold T_0 [50]. Figure 1 illustrates the graphical Bayesian representation of the BGiG model, which is an undirected graph. The shaded node y shows the observations (measurements), and the small solid nodes represent the hyperparameters. Each unshaded node denotes a random variable (or a group of random variables). The flowchart representation of the algorithm is shown in Figure 2, motivated by the graphical approach in [47,73]. According to the pseudocode in Algorithm 1 and the flowchart in Figure 2, first, the hyperparameters of the model are set. The support-related hyperparameters α_0 and β_0 are suggested to be set to small values with α_0 ≪ β_0 to encourage s to be sparse on average. The hyperparameters a_0 and b_0 on the precision of the solution-value vector are also initialized and suggested to be small so as not to bias the estimation when the measurements are incorporated. The hyperparameters θ_0 and θ_1 on the precision of the noise are recommended to be of order 10^{-6} for high SNRs. For moderate and low SNRs, higher values are recommended. In the next step, all the main variables of the model are drawn i.i.d. from their corresponding prior distributions defined in (5)–(7). Then, the stopping condition is computed based on the log-marginalized likelihood in (19). In the main loop, all of the main variables of the model are updated via the expected values obtained from the VB inference. Specifically, we first update the support vector and the solution value components; then, the precisions of the solution vector and the noise are updated. Finally, the stopping criterion is computed through the measure of the log-marginalized likelihood of the observations. The pseudocode of the algorithm is provided below.
Compute τ̃ from (14) % (precision on the solution)
Compute ε̃ from (16) % (precision on the noise)
Compute L^[t] from (19) and then t = t + 1
End While
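The alternation in this loop can be sketched in code. The following is a deliberately simplified mean-field iteration for the Bernoulli-Gaussian part of the model, written by us for illustration: the precisions τ, ε and the prior activation probability are held fixed rather than learned through their Gamma/Beta hyperpriors, and a fixed iteration count stands in for the log-marginal-likelihood stopping test, so this only illustrates the support/solution-value alternation of the full SBL(BGiG):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 100, 60, 10

# Ground truth and measurements (unit-norm columns, low noise)
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)
y = A @ x_true + 0.01 * rng.standard_normal(M)

tau, eps, p0 = 1.0, 1e4, 0.1       # fixed precisions and prior activity (assumed known)
gamma = np.full(N, p0)             # gamma_n = q(s_n = 1)
xbar = np.zeros(N)                 # posterior means of the solution values
logit_prior = np.log(p0 / (1 - p0))

for _ in range(50):                # fixed count in place of the stopping test
    for n in range(N):
        # Residual with component n's own contribution removed
        y_tilde = y - A @ (gamma * xbar) + gamma[n] * xbar[n] * A[:, n]
        corr = A[:, n] @ y_tilde
        # Update q(x_n): Gaussian with precision tau + eps * gamma_n (||a_n|| = 1)
        prec = tau + eps * gamma[n]
        xbar[n] = eps * gamma[n] * corr / prec
        var_n = 1.0 / prec
        # Update q(s_n): Bernoulli, balancing the prior against the data fit
        logit = logit_prior - 0.5 * eps * (xbar[n] ** 2 + var_n - 2 * xbar[n] * corr)
        gamma[n] = 1.0 / (1.0 + np.exp(-np.clip(logit, -30, 30)))

x_hat = gamma * xbar               # posterior-mean estimate of x_s = s • x
```

The Bernoulli logit mirrors the structure of the q_{s_n} update in Appendix A.1, with the digamma terms of the Beta posterior replaced here by a fixed prior log-odds.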

Issues with SBL(BGiG)
In this section, we show that the solution estimated using the SBL(BGiG) algorithm is sensitive to the support-related hyperparameters, i.e., α_0 and β_0 in (5). We provide an example under three cases to demonstrate this issue. We generated a random scenario, where the true solution x_s ∈ R^100 has a sparsity level of k = 25; that is, the true x (or s) has k active elements. The active elements of s were drawn randomly. The nonzeros of x_s, corresponding to the active locations of s, were drawn from N(0, σ²_x), with σ²_x = 1. Each entry of the sensing matrix A was drawn i.i.d. from the Gaussian distribution N(0, 1) and then normalized, so that each column has a Euclidean norm of 1. The elements of the measurement noise were drawn from N(0, σ²) with SNR = 25 dB, where SNR := 20 log_10(σ_x/σ). The hyperparameters of τ and ε were set to a_0 = b_0 = 10^{-3} and θ_0 = θ_1 = 10^{-6}, respectively. In Cases 1-3, we set the pair (α_0, β_0) with low emphasis on the prior (0.01, 0.99), moderate emphasis (0.1, 0.9) and fairly high emphasis (1.4, 2), respectively.
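The random scenario described above can be reproduced with a short script; the seed and the particular draws below are of course arbitrary, and σ is derived from the paper's SNR convention SNR := 20 log_10(σ_x/σ):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k, snr_db = 100, 25, 25.0
sigma_x = 1.0
sigma = sigma_x * 10 ** (-snr_db / 20)   # from SNR := 20 log10(sigma_x / sigma)

for lam in (0.8, 0.6, 0.4):              # sampling ratios lambda = M / N
    M = int(lam * N)
    s = np.zeros(N, dtype=bool)
    s[rng.choice(N, size=k, replace=False)] = True    # k random active supports
    x_s = np.where(s, rng.normal(0.0, sigma_x, N), 0.0)

    A = rng.standard_normal((M, N))
    A /= np.linalg.norm(A, axis=0)        # unit Euclidean norm columns
    y = A @ x_s + rng.normal(0.0, sigma, M)
```

At 25 dB, σ ≈ 0.056, so the noise floor is well below the typical magnitude of the active components.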
According to Figure 3, the setting for (α_0, β_0) in Case 1 fails to provide perfect results even for high sampling ratios. Similarly, Figure 4 shows that the setting for (α_0, β_0) in Case 2 does not provide encouraging results even for high sampling ratios. Specifically, it turns out that Case 1 and Case 2 provide sparse, but poorly performing, solutions for all sampling ratios λ within the range [0, 1], where λ := M/N, so λ = 1 means M = N.
According to Figure 5, setting (α_0, β_0) to (1.4, 2) seems to be a reasonable choice for high sampling ratios (over 70%), while it is not a good choice for the lower sampling ratios. This issue can be seen in the support plots in the 2nd and 3rd rows of Figure 5. One may argue that the estimated support vector ŝ can be filtered via some threshold value (such as 0.3) for λ = 0.6. However, thresholding will adversely affect the detection rate, and setting the threshold depends on our understanding of the signal characteristics. Furthermore, we should account for the effect of the filtered supports, since their corresponding estimated components in x̂_s contribute to fitting the model to the measurements.
In Table 1, we summarize the performance of the generated example for Cases 1-3, where P_D, P_FA and NMSE denote the detection rate, the false alarm rate in support recovery and the normalized mean-squared error between the true and the estimated sparse signals, respectively. The table also shows that the algorithm fails to provide reasonable results for the sampling ratio of λ = 0.4. These experiments suggest that there is no fixed setting for (α_0, β_0) capable of performing reasonably well for all sampling ratios and, thus, the selection of the hyperparameters (α_0, β_0) should be made with care.
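The metrics used throughout these comparisons can be computed as follows. These are our own helper functions, with the (assumed) convention that P_FA is normalized by the number of truly inactive locations:

```python
import numpy as np

def support_metrics(s_true, s_hat):
    """Detection rate P_D and false-alarm rate P_FA of an estimated support."""
    s_true, s_hat = s_true.astype(bool), s_hat.astype(bool)
    P_D = (s_true & s_hat).sum() / max(s_true.sum(), 1)       # hits / actives
    P_FA = (~s_true & s_hat).sum() / max((~s_true).sum(), 1)  # false hits / inactives
    return P_D, P_FA

def nmse_db(x_true, x_hat):
    """Normalized mean-squared error between true and estimated signals, in dB."""
    return 10 * np.log10(np.sum((x_true - x_hat) ** 2) / np.sum(x_true ** 2))

# Example: 2 active entries, one detected and one false alarm
s_true = np.array([1, 1, 0, 0, 0])
s_hat = np.array([1, 0, 1, 0, 0])
print(support_metrics(s_true, s_hat))  # (0.5, 0.3333333333333333)
```

A perfect recovery gives P_D = 1, P_FA = 0 and an NMSE approaching the noise floor.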
Continuing this examination, in Figures 6-8, we illustrate the negative log-marginalized likelihood, the noise precision estimate and the estimated precision on the generated true solution in Cases 1-3, respectively. The horizontal axis shows the iterations until the stopping rule is met. As expected, as the sampling ratio increases, the algorithm requires fewer iterations to meet its stopping condition, as can be seen in the negative log-marginalized likelihood plots in Figures 6-8. In these experiments, the actual precision of the solution components was set to τ = 1, and the actual noise precision was set to ε = 316.2.
For Cases 1 and 2, according to Figures 6-8, the estimated precisions on both the noise and solution components were far off from the actual ones even for λ = 0.8. Thus, it resulted in poor performance in signal recovery for Cases 1 and 2 (see Figures 3 and 4).
For Case 3, the estimated precisions on the noise and the solution components were acceptable for λ = 0.8 but far off from the actual ones for lower sampling ratios (see Figure 8). The root of these failures can be found in the update rule of the support learning vector s defined in (8). It is important to balance the terms c_n and κ_n, where c_n imposes the effect of the hyperprior on s accompanied by the current estimate of s_n. In contrast, κ_n imposes the contribution of the current estimates of the noise precision, the solution and the other supports in fitting the model to the measurements. Therefore, if we impose a substantial weight on sparsity via c_n, the solution tends to neglect the effect of κ_n, and vice versa. This is why we obtained sparse solutions (with poor performance) in Cases 1 and 2 for all the represented sampling ratios and nonsparse solutions (with poor performance) for moderate and lower sampling ratios in Case 3. These results suggest that the algorithm and its update rules are sensitive to the selection of the hyperparameters of the Beta prior on the support vector s. The main issue can be seen in (9), where the selection of the hyperparameters α_0 and β_0 results in a large or small value of c_n due to the digamma function.

Gaussian-Inverse Gamma Modeling and SBL(GiG) Algorithm
In this section, we consider the Gaussian-inverse Gamma (GiG) model. In this model, each component x n of the solution is modeled by zero-mean Gaussian with the precision τ n . The main difference between this model and the model defined in Section 3 is that the GiG model does not have the support vector s; instead, different precisions are considered on the components of the solution vector x s in (1). A simpler version of GiG can also be used by defining the same precision τ for all the components of x s .
Here, we instead use different precisions so that the GiG model has almost the same complexity as the BGiG model in terms of the parameters to be learned. The set of priors in this model is defined as follows.
Each component is modeled as x_n ~ N(0, τ_n^{-1}), with τ_n ~ Gamma(a_0, b_0), where a_0 and b_0 denote the shape and rate of the Gamma distribution, respectively. The entries of the noise component e are defined the same as in (7), i.e., zero-mean Gaussian with precision ε ~ Gamma(θ_0, θ_1), where θ_0 and θ_1 are set to small positive values. The estimation of the parameters in this model is carried out using VB inference, as discussed below.
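A quick way to see why this prior is sparsifying: integrating τ_n out of x_n | τ_n ~ N(0, τ_n^{-1}), τ_n ~ Gamma(a_0, b_0) yields a Student-t marginal (with 2a_0 degrees of freedom), whose tails are much heavier than a Gaussian's. The Monte Carlo sketch below (our own illustration, with an arbitrary choice a_0 = b_0 = 1) confirms the heavy tails empirically:

```python
import numpy as np

rng = np.random.default_rng(3)
a0, b0 = 1.0, 1.0                  # shape/rate of the Gamma prior on each tau_n
n_samples = 200_000

# Hierarchical draw: tau_n ~ Gamma(a0, rate=b0), then x_n ~ N(0, 1/tau_n).
tau = rng.gamma(a0, 1.0 / b0, size=n_samples)
x = rng.normal(0.0, 1.0 / np.sqrt(tau))

# Compare tail mass against a standard Gaussian: the marginal of x is a
# Student-t with 2*a0 degrees of freedom, so P(|x| > 3) is far larger.
gauss = rng.standard_normal(n_samples)
print((np.abs(x) > 3).mean() > (np.abs(gauss) > 3).mean())  # True
```

Heavy tails allow a few components to take large values while the shared shrinkage pulls the rest toward zero, which is the mechanism by which GiG promotes (approximately) sparse solutions.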

Update Rules of SBL(GiG) Using VB Inference
According to the VB algorithm described in (2) and (3), the update rule of the variables and parameters of the GiG model can be simplified as follows. The details of these derivations appear in Appendix A.2.

• Update rule for the precision τ_n on x_n using VB. Thus, τ̃_n = (a_0 + 1/2) / (b_0 + (x̄_n² + σ²_{x_n})/2).
• Update rule for the noise precision ε using VB: ε̃ = (θ_0 + M/2) / (θ_1 + Ψ̃/2), where Ψ̃ := ⟨‖y − Ax‖²⟩_{q_x}.
• Update rule for the solution vector x using VB: q_x(x) = N(x̄, Σ_x), where x̄ = ε̃ Σ_x A^T y, Σ_x = (ε̃ A^T A + T̃)^{-1} and T̃ := diag{[τ̃_1, . . . , τ̃_N]}.
We set the stopping rule of the algorithm using the marginalized likelihood (evidence) defined as p(y|ε, τ) = ∫ p(y|x, ε) p(x|τ) dx.
After simplification, and for purposes of comparing L^[t] with L^[t−1] in the updating process, L can be expressed in terms of a covariance matrix Σ_0. Therefore, similar to SBL(BGiG), the stopping condition can be set as |L^[t] − L^[t−1]| ≤ T_0 for some small threshold T_0. Figure 9 illustrates the graphical Bayesian representation of the GiG model, which is an undirected graph. Similar to Figure 1, the shaded node y shows the observations, the small solid nodes represent the hyperparameters and the unshaded nodes denote the random variables.
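Putting the three update rules together, a minimal numpy sketch of the SBL(GiG) iteration might look as follows. This is our own condensed rendering of the standard updates, not the authors' MATLAB code, and a fixed iteration count stands in for the |L^[t] − L^[t−1]| ≤ T_0 stopping test:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, k = 100, 60, 10
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)
y = A @ x_true + 0.01 * rng.standard_normal(M)

a0 = b0 = 1e-3            # Gamma prior on each solution precision tau_n
theta0 = theta1 = 1e-6    # Gamma prior on the noise precision eps
tau = np.ones(N)          # current tau_tilde
eps = 1.0                 # current eps_tilde
AtA, Aty = A.T @ A, A.T @ y

for _ in range(50):       # fixed count in place of |L[t] - L[t-1]| <= T0
    # q(x) = N(xbar, Sigma): Sigma = (eps A^T A + T)^(-1), xbar = eps Sigma A^T y
    Sigma = np.linalg.inv(eps * AtA + np.diag(tau))
    xbar = eps * (Sigma @ Aty)
    # q(tau_n) = Gamma(a0 + 1/2, b0 + <x_n^2>/2); use its posterior mean
    tau = (a0 + 0.5) / (b0 + 0.5 * (xbar ** 2 + np.diag(Sigma)))
    # q(eps) = Gamma(theta0 + M/2, theta1 + Psi/2),
    # Psi = ||y - A xbar||^2 + Tr(Sigma A^T A)
    Psi = np.sum((y - A @ xbar) ** 2) + np.trace(Sigma @ AtA)
    eps = (theta0 + M / 2) / (theta1 + Psi / 2)
```

Components whose posterior second moment x̄_n² + Σ_{nn} stays small acquire large τ̃_n and are shrunk toward zero, which is how the per-component precisions drive the pruning behavior discussed next.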

Issues with SBL(GiG)
An issue with the SBL(GiG) algorithm is that the solution becomes nonsparse, since the model does not incorporate a binary vector s (hard-thresholding, or soft-thresholding if the expected value is used) as in SBL(BGiG). This may have no major effect on the signal reconstruction for high sampling ratios. However, the nonsparseness appears at low sampling ratios by misleading the algorithm into wrongly activating many components in the estimated signal while still providing a good fit of the model to the measurements. Here, we use the same example as for the SBL(BGiG) model, with the same sensing matrix A, measurement vector y and noise e. Notice that in the SBL(BGiG) model, we considered the same precision τ on all the components of the solution value vector x along with the support vector s. In contrast, the SBL(GiG) model does not have the support learning vector; instead, we assume that each component of the solution vector has a different precision τ_n. It turns out that SBL(GiG) is not as sensitive to the selection of the hyperparameters as SBL(BGiG). Thus, here, we show the results for a single hyperparameter setting. We use the same setting as before for the parameters of the hyperprior on ε, i.e., θ_0 = θ_1 = 10^{-6}, and the same parameters for all the precisions τ_n of the solution components, i.e., a_0 = b_0 = 10^{-3}. In Figures 11 and 12, we illustrate the results after applying the SBL(GiG) algorithm. In Figure 11, from left to right, we show the results for sampling ratios of λ = 0.8, 0.6 and 0.4, respectively. The first row compares y with ŷ = Ax̂_s, the second row shows the true solution x_s and the estimated solution x̂_s, and the third row demonstrates the estimated precisions on the solution components. In Figure 12, we demonstrate the negative log-marginalized likelihood and the estimated noise precision against the true noise precision for the sampling ratios of λ = 0.8, 0.6 and 0.4.
From the results shown in Figures 11 and 12, we observe that the recovered signal tends to become nonsparse. This effect is illustrated in the second row of Figure 11, and can also be observed in the precision estimates of the solution components. More specifically, the true nonzero components in our simulations were drawn from a zero-mean Gaussian with a precision of τ_n = 1. Thus, an ideal precision estimate would fall into one of two classes: values near 1 for the active components and very large values (ideally infinite) for the inactive components. However, the estimated results in our simulation do not show such a separation. As the sampling ratio decreases, the solution estimate performs poorly, due not only to the reduction in the number of measurements but also to this nonsparse behavior.

Preprocessing versus Postprocessing and Simulations
In this section, we show that in order to improve the performance of Bernoulli-Gaussian-inverse Gamma modeling using the SBL(BGiG) algorithm, we need to perform a preprocessing step. The results in Section 4 suggest one can perform some postprocessing for the SBL(GiG) algorithm to improve the reconstruction performance. Below, we provide more details for each of these algorithms.

Pre-Processing for the SBL(BGiG) Algorithm
Based on the observations made on the performance of SBL(BGiG) in Section 3.2, we showed that the pair of hyperparameters (α_0, β_0) should be selected with care. In other words, obtaining good performance with this algorithm requires some preprocessing to assess an appropriate setting for these parameters. For a more rigorous study, here, we perform a grid search on the hyperparameters (α_0, β_0) to see whether we can find some common pattern in selecting these parameters for all sampling ratios. The grid search runs the algorithm for different values of α_0 and β_0 over the search range [0.1, 2] with a resolution of 0.1. For each (α_0, β_0) within this range, we ran 200 random trials and then averaged the results. The settings of these trials are represented in Table 2.
Table 2. Settings for preprocessing analysis and simulations on SBL(BGiG).
We generated a random scenario, where the true solution x_s ∈ R^100 has a sparsity level of k = 25. The active elements of s were drawn randomly. The nonzeros of x_s were drawn from N(0, σ²_x), with σ²_x = 1. Each entry of the sensing matrix A was drawn i.i.d. from the Gaussian distribution N(0, 1) and then normalized. The elements of the measurement noise were drawn from N(0, σ²) with SNR = 25 dB. The results were examined to see which values of (α_0, β_0) provided the highest performance in terms of the detection rate versus the false alarm rate. The simulation was executed for sampling ratios in the range [0.05, 1] with a step size of 0.05. The results are demonstrated in Figure 13. In this figure, we also provide the results of performing a random Sobol search for (α_0, β_0); a Sobol sequence is a low-discrepancy quasirandom sequence. The two plots on the right of Figure 13 illustrate the performance based on the tuned hyperparameters, i.e., the values of (α_0, β_0) that provided the best performance. It should be clear from Figure 13 that there is no fixed setting of these parameters that achieves the best performance for all sampling ratios. We also examined the grid search results for the top 10 highest performances for each sampling ratio, where performance is measured in terms of P_D − P_FA and the normalized mean-squared error (NMSE). In Figure 14a, we demonstrate the top 10 highest performances based on NMSE and P_D − P_FA for different sampling ratios. In Figure 14b,c, we illustrate the values of (α_0, β_0) that led to the performances shown in Figure 14a for different sampling ratios. Figure 15 details the top 10 values of (α_0, β_0) vs. the sampling ratio. According to Figure 14b,c, there is no specific pattern for these hyperparameters. Figure 15 also shows that the hyperparameters need to be carefully selected.
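The grid search itself is straightforward to organize. The sketch below exercises the search loop over the stated range [0.1, 2] at resolution 0.1, with a stand-in scoring function in place of the (expensive) averaged SBL(BGiG) runs; the peak location of the stand-in is an arbitrary choice, not a result from the paper:

```python
import numpy as np
from itertools import product

# Grid over the support hyperparameters: range [0.1, 2], resolution 0.1
grid = np.round(np.arange(0.1, 2.01, 0.1), 1)

def score(alpha0, beta0):
    # Stand-in objective (NOT real recovery performance): a smooth surrogate
    # peaked at an arbitrary point, just to exercise the search loop.
    # In practice, replace this with the averaged P_D - P_FA (or -NMSE)
    # over 200 SBL(BGiG) trials at the given (alpha0, beta0).
    return -((alpha0 - 1.4) ** 2 + (beta0 - 2.0) ** 2)

best = max(product(grid, grid), key=lambda ab: score(*ab))
print(float(best[0]), float(best[1]))  # 1.4 2.0
```

For the Sobol variant, quasirandom points over the same box can be drawn with `scipy.stats.qmc.Sobol` and scaled to [0.1, 2] before scoring.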

Post-Processing for the SBL(GiG) Algorithm
Since the SBL(GiG) algorithm does not include the binary support vector s that SBL(BGiG) possesses, the resulting solution tends to become nonsparse. This leads to a high detection rate for the location of active supports but also a high false alarm rate. Thus, as the sampling ratio decreases, there is a high chance that this algorithm activates many more locations than the true solution contains. Therefore, SBL(GiG) requires some postprocessing to discard the components with low amplitudes. This problem becomes of great importance for applications where detecting the correct nonzero components is more crucial than the magnitudes of the nonzeros in the signal. This effect can be seen in Figure 16b. The curves with solid lines in this plot show the detection and false alarm rates in support recovery and the difference between the rates. This issue can be resolved by some postprocessing, such as data-driven threshold tuning, so that amplitudes in the reconstructed signal below the threshold are discarded. For this purpose, we set up 200 random trials, in the same way as explained for SBL(BGiG), and then evaluated the performance in terms of NMSE by varying the threshold. Figure 16a shows the averaged results of the 200 trials. The settings of these trials are represented in Table 3. In Figure 16a, we observe that the postprocessing does not help much in terms of the reconstruction error for low and moderate sampling ratios. However, there is a threshold of around 0.25 for which the postprocessing step reduced the reconstruction error by approximately 3 dB. We set the threshold to 0.25 and ran 200 random trials, applying SBL(GiG) and evaluating the performance based on the detection and false alarm rates in support recovery. According to Figure 16b, the additional postprocessing step provides reasonable performance.
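The postprocessing step itself is a simple amplitude threshold. A minimal sketch (our own helper, with the 0.25 value found in the trials above as the default) is:

```python
import numpy as np

def postprocess(x_hat, threshold=0.25):
    """Discard low-amplitude components of an SBL(GiG) estimate.

    Returns the thresholded signal and the implied binary support vector.
    The 0.25 default is the data-driven value reported in the trials above;
    it should be re-tuned for other signal scales.
    """
    x_out = np.where(np.abs(x_hat) >= threshold, x_hat, 0.0)
    s_hat = x_out != 0
    return x_out, s_hat

# Small illustrative estimate: two spurious low-amplitude components
x_hat = np.array([0.9, -0.05, 0.3, 0.01, -1.2])
x_out, s_hat = postprocess(x_hat)
print(x_out)
```

The two near-zero entries are removed, leaving the three significant components, which is exactly the false-alarm suppression that Figure 16b quantifies.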
Finally, in Figure 17, we compare the performance of the SBL(BGiG) algorithm (with the preprocessing step) against the SBL(GiG) algorithm (after the postprocessing step). We see that Bernoulli-Gaussian-inverse Gamma modeling implemented via SBL(BGiG) provides better performance for low and high sampling ratios. In contrast, Gaussian-inverse Gamma modeling implemented via SBL(GiG) performs much better for moderate sampling ratios.

Conclusions
We investigated solving the inverse problem of compressive sensing using VB inference for the two sparse Bayesian models of Bernoulli-Gaussian-inverse Gamma (BGiG) and Gaussian-inverse Gamma (GiG). The issues of each approach were discussed, and the performance of the two models was compared. Specifically, we showed the behavior of these models and algorithms when the sampling ratio is low or moderate, as well as the importance of selecting the hyperparameters of the BGiG model with care. We further provided some intuition for performing additional pre-/postprocessing steps, depending on the selected model, for better performance.
Based on our study on synthetic data, and considering the overall performance of both algorithms and the complexity of the additional pre-/postprocessing, we observed that for moderate sampling ratios, SBL(GiG) performs better than SBL(BGiG) when using VB for sparse signals with no specific pattern in the supports. In contrast, SBL(BGiG) provided better performance for low and high sampling ratios. Finally, a rigorous comparison under real-world scenarios and various applications remains for future study. The MATLAB codes for GiG and BGiG modeling are available at https:

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
In this section, we provide details on deriving the update rules of the parameters and variables for both models and the associated algorithms.
Since s_n is the outcome of a Bernoulli random variable,
q_{s_n}(s_n = 0) ∝ e^{ψ(β_{1,n}) − ψ(α_{1,n} + β_{1,n})}
and
q_{s_n}(s_n = 1) ∝ e^{ψ(α_{1,n}) − ψ(α_{1,n} + β_{1,n}) − (ε̃/2)[‖a_n‖²(x̄_n² + σ²_{x_n}) − 2 x̄_n a_n^T ỹ_{−n}]}.
Therefore,
q_{s_n}(s_n) ~ Bernoulli( q_{s_n}(s_n = 1) / (q_{s_n}(s_n = 0) + q_{s_n}(s_n = 1)) ),
which yields
q_{s_n}(s_n) ~ Bernoulli( 1 / (1 + e^{ψ(β_{1,n}) − ψ(α_{1,n}) + (ε̃/2)[‖a_n‖²(x̄_n² + σ²_{x_n}) − 2 x̄_n a_n^T ỹ_{−n}]}) ).
The update rule for the component s_n can then be written accordingly. The stopping rule of the algorithm can be set based on the marginalized likelihood (evidence). We would rather follow the effect of s on the evidence because, if s is learned, it is easy to compute x_s. Therefore, we marginalize the distribution of y and integrate x out. The details are described below.
The update rule for τ_n can then be defined as follows, where
⟨‖y − Ax‖²⟩_{q_x} = y^T y − 2⟨x⟩_{q_x}^T A^T y + ⟨x^T A^T A x⟩_{q_x} = y^T y − 2x̄^T A^T y + ⟨x^T A^T A x⟩_{q_x},
and
⟨x^T A^T A x⟩_{q_x} = Tr(⟨x x^T⟩_{q_x} A^T A) = Tr[(x̄ x̄^T + Σ_x) A^T A].
Therefore, Ψ̃ := ⟨‖y − Ax‖²⟩_{q_x} = y^T y − 2x̄^T A^T y + Tr[(x̄ x̄^T + Σ_x) A^T A].
Therefore, we can model ε̂ accordingly, and the update rule for ε can then be written as ε̃ = (θ_0 + M/2)/(θ_1 + Ψ̃/2). Thus, we can write q_x(x) as a multivariate Gaussian N(x̄, Σ_x).