Bayesian 3D X-ray Computed Tomography with a Hierarchical Prior Model for Sparsity in Haar Transform Domain

In this paper, a hierarchical prior model based on the Haar transformation and an appropriate Bayesian computational method for X-ray CT reconstruction are presented. Given the piecewise continuous property of the object, a multilevel Haar transformation is used to obtain a sparse representation of the object. The sparse structure is enforced via a generalized Student-t distribution (St_g), expressed as the marginal of a normal-inverse Gamma distribution. The proposed model and corresponding algorithm are designed to adapt to specific 3D data sizes and to be used in both medical and industrial Non-Destructive Testing (NDT) applications. In the proposed Bayesian method, a hierarchically structured prior model is used, and the parameters are iteratively estimated. The iterative algorithm is initialized using the parameters of the prior distributions. A novel initialization strategy is presented and validated experimentally. We compare the proposed method with two state-of-the-art approaches, showing that our method has better reconstruction performance when fewer projections are considered and when projections are acquired at limited angles.


Introduction
Computed Tomography (CT) has been developed and widely used in medical diagnosis [1] and industrial Non-Destructive Testing (NDT) [2] in recent decades. In CT, objects are observed using different techniques, for example X-rays [3], ultrasound [4], microwaves [5], or infra-red [6]. X-ray CT employs the absorption of X-rays by the organs in a body or by the materials in industrial components to reconstruct the internal structure of the imaged object. When performing X-ray CT, a set of X-ray images of the measured parts is acquired. The intensity measured by the X-ray images corresponds to the intensity of the radiation passing through and attenuated by the object. CT reconstruction is typically treated as an inverse problem.
The conventional analytical techniques for CT reconstruction are based on the Radon transform [7]. Filtered Back-Projection (FBP) [8] is the most frequently-used analytical method in practical applications. FBP performs well when reconstructing from sufficient data with a high signal-to-noise ratio (SNR), but it suffers from artifacts when reconstructing from insufficient data or with noise.
Owing to considerations regarding patients' health in medical CT and in order to reduce acquisition time in industrial applications, reconstruction from insufficient datasets is increasingly attracting the attention of researchers. Reconstruction from fewer projections is an ill-posed inverse problem [9,10]. In this case, conventional analytical reconstruction methods provide unsatisfactory results, and iterative methods can be used to improve the reconstruction performance. The Algebraic Reconstruction Technique (ART) [11,12], the Simultaneous Algebraic Reconstruction Technique (SART) [13], and the Simultaneous Iterative Reconstruction Technique (SIRT) [14,15] are some of the iterative methods proposed initially. These methods consider the discretized forward system model g = Hf, where f ∈ R^{N×1} represents the object, g ∈ R^{M×1} represents the observed dataset, and the matrix H ∈ R^{M×N} is the linear projection operator, mainly determined by the acquisition geometry (e.g., parallel beam, cone beam, etc.) [16-18]. Typically, the system of equations is under-determined, i.e., N > M. In this context, regularization methods are frequently used, and the forward system is modeled as:

g = Hf + ε,   (1)

where ε ∈ R^{M×1} represents the additive noise of the projection system. The regularization methods estimate the unknowns by minimizing a penalty criterion, which generally consists of two terms:

J(f) = Q(g, f) + λ R(f).   (2)

The loss function Q(g, f) describes discrepancies with the observed data, such as the quadratic (L2) loss Q(g, f) = ||g − Hf||_2^2 or the L_q loss Q(g, f) = ||g − Hf||_q^q with 1 ≤ q < 2; other expressions, such as the Huber function, are also reported. The regularization term R(f) is a penalty on f, such as a restriction for smoothness, ||Φ(f)||_2^2, or for sparsity, ||Φ(f)||_1, where Φ(f) represents a linear function of f. The parameter λ is known as the regularization parameter, which controls the trade-off between the forward-model discrepancy and the penalty term.
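As a concrete illustration, the forward model g = Hf + ε and the two-term penalized criterion can be sketched numerically; the toy sizes, the random stand-in for H, and the finite-difference choice of Φ below are our own assumptions, not the paper's actual operators:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the discretized forward model g = H f + eps.
N, M = 64, 32                      # under-determined: N > M
H = rng.standard_normal((M, N))    # illustrative stand-in for the projection operator
f_true = np.zeros(N)
f_true[::8] = 1.0                  # piecewise-constant-like object
eps = 0.01 * rng.standard_normal(M)
g = H @ f_true + eps

def criterion(f, lam):
    """Penalized criterion J(f) = ||g - H f||_2^2 + lam * ||Phi(f)||_1,
    with Phi a first-order finite-difference operator (a TV-like choice)."""
    loss = np.sum((g - H @ f) ** 2)          # quadratic data-fit term Q(g, f)
    penalty = np.sum(np.abs(np.diff(f)))     # sparsity penalty R(f)
    return loss + lam * penalty
```

The regularization parameter `lam` plays exactly the role of λ above: with `lam = 0` the criterion reduces to the pure data-fit loss.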
By choosing different regularization functions R(f), different regularization methods are obtained. R(f) = 0 corresponds to the Least-Squares (LS) method [19], with the drawback that the reconstruction is sensitive to noise due to the ill-posedness of the problem and the ill-conditioning of the operator H. Quadratic Regularization (QR), also known as the Tikhonov method [20], is given by R(f) = ||Φ(f)||_2^2, where the linear operator Φ(·) is in most cases a derivative operator. The well-known Total Variation (TV) method [21-23] is defined by R(f) = ||D_TV f||_TV, where D_TV is the gradient operator. In anisotropic form, ||D_TV f||_TV = ||D_x f||_1 + ||D_y f||_1 + ||D_z f||_1 for a 3D object, where D_x, D_y, and D_z are respectively the gradient operators in the x, y, and z directions. The L_1 norm is used in TV for sparse estimation, enforcing the sparsity of D_TV f. The non-differentiable L_1 term leads to difficulties in the implementation of optimization algorithms. Many optimization algorithms have been proposed to solve this L_1 norm optimization problem, for example the primal-dual method [24], the split Bregman method [22], etc. In regularization optimization, due to the large projection data size and the great number of voxels, the explicit expression of the solution cannot be used directly, because inverting a large matrix such as (H^T H + λ D^T D)^{-1} is impossible. Hence, optimization algorithms such as gradient descent or conjugate gradient are often used. More general regularization methods have been developed based on the constrained, dual-variable regularization method:

J(f, z) = Q_1(g, f) + λ_1 Q_2(f, z) + λ_2 R(z),   (3)

which corresponds to the maximum a posteriori optimization of a hierarchically structured model where both f and z are unknown variables. In such a model, the penalty regularization term is set on z, which is associated with f via a linear transformation.
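The anisotropic 3D TV penalty ||D_x f||_1 + ||D_y f||_1 + ||D_z f||_1 can be computed with simple finite differences; a minimal sketch (forward differences, illustrative only):

```python
import numpy as np

def tv_anisotropic_3d(f):
    """Anisotropic total variation ||D_x f||_1 + ||D_y f||_1 + ||D_z f||_1
    of a 3D volume f, using forward finite differences along each axis."""
    return (np.abs(np.diff(f, axis=0)).sum()
            + np.abs(np.diff(f, axis=1)).sum()
            + np.abs(np.diff(f, axis=2)).sum())
```

For a piecewise-constant volume, this penalty only counts the jumps across region boundaries, which is why TV favors blocky, edge-preserving solutions.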
The loss functions Q_1(g, f) and Q_2(f, z) are, for example, quadratic (L2), i.e., Q_1(g, f) = ||g − Hf||_2^2 and Q_2(f, z) = ||f − Dz||_2^2, where D is a linear transform operator such as a wavelet transformation.
Among the methods treating this type of regularization problem, we mention here the Alternating Direction Method of Multipliers (ADMM) [25]. It minimizes Φ(f) + Ψ(z) subject to Af + Bz = C, and it covers a large number of estimation forms. One example, with Φ(f) = ||g − Hf||_2^2, Ψ(z) = R(z), A = I, B = −D, and C = 0, corresponds to the above-mentioned bi-variable regularization method of Equation (3).
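A minimal sketch of ADMM for a special case of the form above (quadratic data term, L1 penalty, A = I, B = −D, C = 0, taking D = I for simplicity so the constraint is f = z); the toy sizes and the direct matrix inverse are assumptions that only make sense at this small scale:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t*||.||_1 (element-wise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_l1(H, g, lam, rho=1.0, n_iter=200):
    """ADMM sketch for min_f 0.5*||g - Hf||^2 + lam*||z||_1  s.t.  f - z = 0
    (scaled-dual form; u is the scaled dual variable)."""
    M, N = H.shape
    f = np.zeros(N); z = np.zeros(N); u = np.zeros(N)
    HtH, Htg = H.T @ H, H.T @ g
    Q = np.linalg.inv(HtH + rho * np.eye(N))   # direct inverse: toy sizes only
    for _ in range(n_iter):
        f = Q @ (Htg + rho * (z - u))          # f-update: ridge-type solve
        z = soft_threshold(f + u, lam / rho)   # z-update: prox of the L1 term
        u = u + f - z                          # dual ascent on the constraint
    return z
```

For large 3D volumes the `f`-update would itself be solved iteratively (e.g., by conjugate gradient), since the explicit inverse is unavailable, as noted above.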
In the above-mentioned regularization methods, there is always a regularization parameter λ to be fixed. Sometimes, the regularization term consists of more than one part, each weighted by its own regularization parameter, as in the elastic-net regularizer [26]. In these cases, two or even more regularization parameters need to be fixed. Cross-Validation (CV) and the L-curve method [20,27,28] are conventional methods used to determine suitable values for these parameters. However, this work must be repeated for different simulated datasets and is therefore very costly. Statistical methods have therefore been developed to address this problem.
From the probabilistic point of view, a Gaussian model for the additive noise in the forward model, Equation (1), leads to the quadratic expression ||g − Hf||_2^2 in the corresponding regularization criterion. However, in some types of tomography, for example Positron Emission Tomography (PET) or X-ray tomography with very low photon counts, the noise is modeled by a Poisson distribution. In order to account for a more precise modeling of the noise and the other variables and parameters, statistical methods are used [29]. The Maximum Likelihood (ML) methods [30] and different estimation algorithms such as the Expectation Maximization (EM) algorithm [31], the Stochastic EM (SEM) [32], or the Ordered Subsets-EM (OS-EM) [33] are commonly used in PET-CT reconstruction problems.
Another widely-used type of probabilistic method for PET or X-ray CT reconstruction is Bayesian inference [34-37]. The prior knowledge is translated into a prior probability model and is used to obtain the expression of the posterior distribution. The basic Bayes formula is:

p(f|g, θ) = p(g|f, θ_1) p(f|θ_2) / p(g|θ),   (4)

where p(g|f, θ_1) is the likelihood, p(f|θ_2) is the prior distribution, p(f|g, θ) is the posterior distribution, θ = (θ_1, θ_2) are the parameters of these distributions, and p(g|θ) is the evidence of the parameters given the data g. By using the Maximum A Posteriori (MAP) estimator, f̂ = arg max_f {p(f|g, θ)} = arg min_f {− ln p(f|g, θ)}, links between the Bayesian method and almost all the regularization methods can be established. A Gaussian prior p(f|θ_2) in Equation (4) leads to the quadratic (L2) regularization method, while a Laplacian prior leads to the L1 (LASSO or TV) regularization method. The regularization parameter can be related to θ_1 and θ_2. One advantage of the Bayesian method is that the regularization parameter obtains an interpretation via its link with θ_1 and θ_2. For example, when p(g|f, θ_1) and p(f|θ_2) are Gaussian, with θ_1 and θ_2 respectively the variance of the noise and the variance of the prior, the regularization parameter is λ = θ_1/θ_2. Another advantage of the Bayesian method is that these parameters can also be estimated, leading to unsupervised or semi-supervised methods. This is achieved by obtaining the expression of the joint posterior probability law:

p(f, θ|g) ∝ p(g|f, θ_1) p(f|θ_2) p(θ),   (5)

where p(θ) is an appropriate prior on θ. For a hierarchically structured model in which a hidden variable z appears in the prior model, we have:

p(f, z, θ|g) ∝ p(g|f, θ_1) p(f|z, θ_2) p(z|θ_3) p(θ),   (6)

where θ = [θ_1, θ_2, θ_3].
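The λ = θ_1/θ_2 link can be made explicit in the fully Gaussian case; a short sketch of the derivation, writing θ_1 = v_ε (noise variance) and θ_2 = v_f (prior variance):

```latex
\hat{f} = \arg\max_f \, p(f \mid g, \theta)
        = \arg\min_f \Big\{ \tfrac{1}{2 v_\varepsilon}\,\lVert g - Hf \rVert_2^2
                          + \tfrac{1}{2 v_f}\,\lVert f \rVert_2^2 \Big\}
        = \arg\min_f \Big\{ \lVert g - Hf \rVert_2^2
                          + \lambda \lVert f \rVert_2^2 \Big\},
\qquad \lambda = \frac{v_\varepsilon}{v_f} = \frac{\theta_1}{\theta_2}.
```

Multiplying the criterion by the constant 2v_ε does not change the minimizer, which is how the ratio of the two variances surfaces as the regularization parameter.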
With the posterior distribution obtained from an unsupervised Bayesian inference as in Equation (5), we distinguish three estimation methods. The first consists of integrating θ out of p(f, θ|g) to obtain p(f|g) and then using p(f|g) to infer f. The second is first to marginalize p(f, θ|g) with respect to f, obtaining p(θ|g) = ∫ p(f, θ|g) df, estimate θ̂ = arg max_θ {p(θ|g)}, and then use θ̂ as if it were known. Unfortunately, these approaches often do not give explicit expressions for p(f|g) or p(θ|g). The third and easiest algorithm to implement is joint optimization, which estimates the variable f and the parameter θ iteratively and alternately. Bayesian point estimators such as the Joint Maximum A Posteriori (JMAP) [38] and the Posterior Mean (PM) [39] via Variational Bayesian Approximation (VBA) methods [40-42] are often used.
In order to distinguish the details of a reconstructed object, a high-resolution image is expected. In industrial applications, especially for the NDT of large objects, the size of the projection data (1000 images of 1000^2 pixels) and the number of voxels (1000^3) become critical, and so do the projection and back-projection operators in CT. It is necessary to account for computational aspects, for example GPU processing [43,44].
In our previous work [45], we proposed to use the Bayesian method via a synthesis model, in which the multilevel Haar transformation coefficient z of the image is first estimated, and the final image is then obtained by post-processing: f̂ = D ẑ. In this case, when using a Laplacian prior model and the MAP estimator, the problem becomes equivalent to the optimization of J(z) = ||g − HDz||_2^2 + λ||z||_1, which is a typical L_1 regularization method. The particularity of our work was to use a generalized Student-t (St_g) prior model [46] in place of the Laplacian model.
In this paper, we present a Hierarchical Haar transform-based Bayesian Method (HHBM), first proposed in [47], in which the object to be reconstructed, f, is related to the Haar transformation coefficient z by f = Dz + ξ, where ξ represents the modelization uncertainties; f and z are estimated simultaneously. Wavelets provide an optimal representation for a piecewise continuous function consisting of homogeneous blocks separated by jump discontinuities (the contours), as the wavelet representation of such signals is sparse. Suitable transformations include, for example, the Haar transformation [48], the Curvelet Transformation (CVT) [49], the Contourlet Transformation (CT) [50], the Dual-Tree Complex Wavelet Transform (DT-CWT) [51], etc. As long as the object under consideration, f, is piecewise continuous or constant, the Haar transform is appropriate, with the advantages that: (1) the transform coefficients are sparse; (2) the transformation operator is orthogonal, so that the inverse operator and the transpose are identical; and (3) the computation of this transformation consists of only additions and subtractions, with a cost of only O(N) operations, where N is the size of the object f.
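Properties (2) and (3) can be made concrete: a 1D multilevel orthonormal Haar transform uses only (scaled) sums and differences, costs O(N) overall, and its inverse coincides with the transpose of the analysis operator. A minimal sketch (function names are ours, not from the paper's code):

```python
import numpy as np

def haar_forward(x, levels):
    """Multilevel orthonormal 1D Haar analysis: only sums and differences
    of adjacent samples, scaled by 1/sqrt(2); total cost O(N)."""
    a = np.asarray(x, dtype=float)
    coeffs = []
    for _ in range(levels):
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2), (a[0::2] - a[1::2]) / np.sqrt(2)
        coeffs.append(d)
    return a, coeffs[::-1]          # coarsest approximation, details coarse->fine

def haar_inverse(a, coeffs):
    """Inverse transform; since the Haar operator is orthonormal, this is
    also the transpose of the analysis operator."""
    for d in coeffs:
        up = np.empty(2 * a.size)
        up[0::2] = (a + d) / np.sqrt(2)
        up[1::2] = (a - d) / np.sqrt(2)
        a = up
    return a
```

On a piecewise-constant signal, all detail coefficients away from the jumps vanish, which is exactly the sparsity that property (1) refers to.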
The sparsity of the transformation coefficient is generally defined by three classes of distributions: the generalized Gaussian distributions [52], the mixture distributions [53], and the heavy-tailed distributions [54]. In this paper, we use a generalization of the Student-t distribution (St g ), which belongs to the heavy-tailed family and has many advantages when enforcing the sparsity of variables [46].
In this paper, we substantially extend the previous work by: (1) adapting the forward model and prior models to the 3D case, which is more appropriate for real 3D large data size applications; (2) comparing the RMSE of the phantom reconstructed by the HHBM method with those of the conventional QR and TV methods, showing the advantages of the semi-supervised property of the HHBM method and that the HHBM method outperforms the TV method when the data are insufficient; (3) proposing new ideas for fixing the hyperparameters in the proposed model; and (4) evaluating the performance of the proposed method when the number or the angular distribution of the projections is limited.
The rest of this paper is organized as follows: Section 2 presents the proposed hierarchically structured Bayesian method; Section 3 gives the details of the implementation and the choice of hyperparameters, as well as the simulation results; some points on the initialization of the hyperparameters are discussed in Section 4. Conclusions are drawn and prospective future research is presented in Section 5.

The Semi-Supervised Hierarchical Model
The Hierarchical Haar-based Bayesian Method (HHBM) is presented, in which the object f and its multilevel Haar transform coefficient z are considered jointly. A sparsity enforcing prior is defined on z. The wavelet transformation has been used for the reconstruction of tomographic images in several state-of-the-art works [55-58], using both regularization and Bayesian methods. In these methods, the phantom f is obtained by post-processing the reconstructed coefficient ẑ. In this paper, the phantom f and the coefficient z are estimated simultaneously.

Forward System Model and Likelihood
In the proposed method, the forward model introduced in Equation (1) is considered. Generally, the noise in tomography is modeled by a Poisson distribution [59], but in X-ray CT, a Gaussian approximation is often used. We adopt the Gaussian approximation and propose a zero-mean, non-stationary model in which the variances are considered unknown and are given inverse Gamma distributions, since this distribution accounts well for the positivity of the variances v_εi:

p(ε_i | v_εi) = N(ε_i | 0, v_εi),  p(v_εi | α_ε0, β_ε0) = IG(v_εi | α_ε0, β_ε0), ∀i ∈ [1, M].   (7)

The vector v_ε is considered in order to account for the different sensitivity to noise of each detector in each projection direction.
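This non-stationary model can be simulated directly from the hierarchy: each detector draws its own variance from an inverse Gamma distribution, and marginally the noise is St_g-distributed. A sketch (the inverse-Gamma draw uses the reciprocal of a Gamma variate; the parameter values in the usage below are illustrative):

```python
import numpy as np

def sample_nonstationary_noise(M, alpha, beta, rng):
    """Draw non-stationary noise: each detector i has its own variance
    v_i ~ InvGamma(alpha, beta) (E[v_i] = beta/(alpha-1) for alpha > 1),
    then eps_i ~ N(0, v_i).  Marginally, eps_i follows a St_g distribution."""
    v = beta / rng.gamma(alpha, 1.0, size=M)   # inverse-Gamma via 1/Gamma
    eps = rng.normal(0.0, np.sqrt(v))          # one variance per detector
    return eps, v
```

For example, with alpha = 3 and beta = 2, the mean variance is beta/(alpha − 1) = 1, but individual detectors can be much noisier, which is what the heavy-tailed marginal captures.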
According to the forward model of the linear system, Equation (1), and the prior model of the noise, Equation (7), the likelihood of the system is:

p(g | f, v_ε) = N(g | Hf, V_ε), with V_ε = diag(v_ε).   (8)

In Bayesian inference, the likelihood is combined with the prior distributions to determine the posterior distribution.

Hierarchical Prior Model and Prior Distributions
Typically, the objects considered in medical and industrial X-ray CT are piecewise continuous. In this paper, a hierarchical prior model is used to encode this piecewise continuity. In this hierarchical prior model, a sparsity enforcing model is defined for the wavelet transformation coefficients of the image. A large number of methods accounting for the sparse structure of the solution have been proposed in the literature. Among them, the L1 regularization method is the most frequently used, minimizing the criterion J(f) = ||g − Hf||_2^2 + λ||Φf||_1, where Φ is a linear operator, for example the gradient in the TV method. Another class of methods, known as "synthesis" [60], minimizes J(z) = ||g − HDz||_2^2 + λ||z||_1, where f = Dz and z = D^{−1}f is, for example, a wavelet transform.
In this paper, we propose to use the multilevel Haar transformation as the sparse dictionary. The transformation is modeled using a discretized forward model:

f = Dz + ξ,   (10)

where D ∈ R^{N×N} represents the inverse multilevel Haar transformation and ξ ∈ R^{N×1} represents the uncertainties of the transformation, introduced to relax the exact relation of the transform operator D; ξ is supposed to be sparse. Unlike the gradient operator used in the TV method, the multilevel Haar transformation is orthogonal, i.e., D^{−1} = D^T. This property provides certain advantages during optimization, especially for big data size problems, as the inverse and the transpose of the operator are identical and can be substituted for each other.
ξ is modeled by a zero-mean Gaussian distribution with unknown variance v_ξ, which is estimated in order to obtain a semi-supervised system. Here, v_ξ is modeled by an inverse Gamma distribution, with the same considerations as for the model of v_ε. A Gaussian model with an inverse Gamma distributed variance leads to a generalized Student-t marginal [46]. Consequently, an St_g distribution is obtained for ξ, and the sparsity of ξ can be guaranteed. From Equation (10), the conditional distribution p(f | z, v_ξ) = N(f | Dz, v_ξ I) is derived. For practical applications where these parameters are not known or are difficult to obtain, we use a semi-supervised method in which the variances v_ε and v_ξ are unknown. In HHBM, inverse Gamma distributions are used to model v_ε and v_ξ; consequently, both ξ and ε are modeled by St_g distributions.
The vector z = [z_1, ..., z_N]^T represents the multilevel Haar transformation coefficients of the piecewise continuous f. As mentioned above, z is sparse. In this paper, the generalized Student-t distribution (St_g) [46] is used to enforce the sparse structure of z. The St_g distribution can be expressed as the marginal of a bivariate normal-inverse Gamma distribution:

St_g(z | α, β) = ∫ N(z | 0, v) IG(v | α, β) dv.   (13)

Thanks to the fact that the normal and inverse Gamma are conjugate distributions, the use of the St_g via Equation (13) simplifies the computations when using Bayesian point estimators such as the posterior mean via the Variational Bayesian Approximation (VBA) method [42]. From Equation (13), the St_g prior distribution modeling z is expressed as the hierarchical model:

p(z_j | v_zj) = N(z_j | 0, v_zj),  p(v_zj | α_z0, β_z0) = IG(v_zj | α_z0, β_z0),   (14)

where the v_zj, ∀j = 1 : N, are i.i.d. The difference between the standard St distribution, St(z|ν) = ∫ N(z | 0, v) IG(v | ν/2, ν/2) dv, and the generalized St_g of Equation (13) is that St(z|ν) is governed by one parameter, ν, while St_g(z|α, β) is governed by two parameters, (α, β). With these two parameters, the St_g not only enforces the sparsity of the variable, but also controls the sparsity rate [46]. By changing the values of the two hyperparameters α_z0 and β_z0, we can obtain either a heavy-tailed distribution with a narrow peak or a distribution approaching a Gaussian.
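The marginalization in Equation (13) can be carried out in closed form; under the normal-inverse-Gamma parameterization used above, the resulting density (which reduces to the standard Student-t with ν degrees of freedom when α = β = ν/2) is:

```latex
\mathcal{S}t_g(z \mid \alpha, \beta)
  = \int_0^{\infty} \mathcal{N}(z \mid 0, v)\,
      \mathcal{IG}(v \mid \alpha, \beta)\,\mathrm{d}v
  = \frac{\Gamma\!\left(\alpha + \tfrac12\right)}
         {\Gamma(\alpha)\,\sqrt{2\pi\beta}}
    \left( 1 + \frac{z^2}{2\beta} \right)^{-\left(\alpha + \frac12\right)} .
```

The integral follows from recognizing the inner integrand as an unnormalized inverse Gamma density in v with shape α + 1/2 and scale β + z²/2.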

The HHBM Method
The prior models of the proposed Bayesian method, based on the forward model of Equation (1) and the transform model of Equation (10), are the Gaussian and inverse Gamma pairs defined above for (ε, v_ε), (ξ, v_ξ), and (z, v_z). Figure 1 shows the generative graph of the proposed model, in which the hyperparameters in the rectangles need to be initialized. Via the Bayes rule, Equation (6), the joint posterior distribution of all the unknowns given the data is derived:

p(f, z, v_ε, v_ξ, v_z | g) ∝ p(g | f, v_ε) p(f | z, v_ξ) p(z | v_z) p(v_ε) p(v_ξ) p(v_z).

Bayesian point estimators are then used for estimation via the posterior distribution. In this paper, we focus on the JMAP estimation, given that, for large 3D data sizes, the computational cost of the VBA algorithm is too high.

Joint Maximum a Posteriori Estimation
The negative logarithm of the posterior distribution is used as the optimization criterion in order to simplify the exponential terms. The maximization of the posterior distribution becomes the minimization of the criterion:

J(f, z, v_ε, v_ξ, v_z) = − ln p(f, z, v_ε, v_ξ, v_z | g),   (24)

obtained by substituting the distribution expressions. The unknown variables are then determined by alternately minimizing the criterion (24) with respect to each of them, ∀i ∈ [1, M] and ∀j ∈ [1, N].
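Written out under the Gaussian/inverse-Gamma pairs defined above, the criterion is separable in the variances; a sketch of our own expansion (which may differ from the paper's numbered equations in constants, or in whether v_ξ is scalar or per-component):

```latex
J = \sum_{i=1}^{M} \left[ \frac{(g_i - [Hf]_i)^2}{2 v_{\varepsilon i}}
      + \left(\alpha_{\varepsilon 0} + \tfrac32\right) \ln v_{\varepsilon i}
      + \frac{\beta_{\varepsilon 0}}{v_{\varepsilon i}} \right]
  + \sum_{j=1}^{N} \left[ \frac{(f_j - [Dz]_j)^2}{2 v_{\xi j}}
      + \left(\alpha_{\xi 0} + \tfrac32\right) \ln v_{\xi j}
      + \frac{\beta_{\xi 0}}{v_{\xi j}} \right]
  + \sum_{j=1}^{N} \left[ \frac{z_j^2}{2 v_{z j}}
      + \left(\alpha_{z 0} + \tfrac32\right) \ln v_{z j}
      + \frac{\beta_{z 0}}{v_{z j}} \right] + \mathrm{const}.
```

Minimizing over each variance separately then gives closed-form updates, e.g. v̂_εi = (β_ε0 + (g_i − [Hf]_i)²/2)/(α_ε0 + 3/2), while the sub-problems in f and z are quadratic.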
In 3D X-ray CT, the inversion of the matrices appearing in Equations (25) and (26) is impossible due to the large data size. First-order optimization methods are generally used in this case. In this paper, we use the gradient descent algorithm, where I_G is the number of gradient descent iterations and ∇J_f(·) and ∇J_z(·) are the derivatives of the criterion (24) with respect to f and z, respectively. γ_f and γ_z are the corresponding descent step lengths, obtained by using an optimized step-length strategy [61]. The optimization of all the unknowns is summarized in Algorithm 1.

Algorithm 1: JMAP estimation via alternate gradient descent.
repeat
    for k = 1 to I_G do
        Calculate ∇J_f(f^(k−1)) according to Equation (32)
        Update γ_f^(k) according to Equation (34)
        Update f^(k) = f^(k−1) − γ_f^(k) ∇J_f(f^(k−1))
    end for
    for k = 1 to I_G do
        Calculate ∇J_z(z^(k−1)) according to Equation (33)
        Update γ_z^(k) according to Equation (35)
        Update z^(k) = z^(k−1) − γ_z^(k) ∇J_z(z^(k−1))
    end for
    Update the variances v_ε, v_ξ, and v_z
until convergence
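For the quadratic sub-problems in f and z, the optimized step length has a closed form: for J(f) = ½ f^T A f − b^T f, the steepest-descent step minimizing J along −∇J is γ = ||r||² / (r^T A r), with r = ∇J(f). A generic sketch (A here is a toy dense stand-in for combinations such as H^T V_ε^{-1} H plus a diagonal term, which in practice are only applied operator-wise):

```python
import numpy as np

def grad_descent_optimal_step(A, b, f0, n_iter=50):
    """Gradient descent with exact (optimal) step length for a quadratic
    criterion J(f) = 0.5 * f^T A f - b^T f, A symmetric positive definite."""
    f = f0.copy()
    for _ in range(n_iter):
        r = A @ f - b                  # gradient of J at f
        Ar = A @ r
        denom = r @ Ar
        if denom <= 0:                 # r == 0: already at the minimizer
            break
        gamma = (r @ r) / denom        # optimized step length
        f = f - gamma * r
    return f
```

The same γ formula applies to the f- and z-updates in Algorithm 1, with A replaced by the corresponding Hessian-vector products, so no matrix is ever formed explicitly.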

Initialization and Experimental Results
For the simulations, the 3D simulated "Shepp-Logan" phantom, shown in Figure 2 on the left and Figure 3 on the top, and the 3D real "head" object are used. The proposed HHBM method is compared with the conventional Quadratic Regularization (QR) and Total Variation (TV) methods. For the QR method, the gradient descent algorithm is used for the 3D large data size problem. For the TV method, the split Bregman method [22] is used to solve the L_1 norm minimization problem.
To evaluate the proposed method and compare it with the state-of-the-art methods, four different metrics are used:
• the Relative Mean Squared Error (RMSE), RMSE = ||f − f̂||_2 / ||f||_2, which gives the relative error of the results;
• the Improvement of the Signal-to-Noise Ratio (ISNR), which measures the improvement during iterations;
• the Peak Signal-to-Noise Ratio (PSNR), which gives the SNR relative to the peak data value;
• the Structural SIMilarity (SSIM) [62], which evaluates the quality of the result as perceived by human vision.
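The first three metrics are straightforward to compute; a sketch of their definitions as used here (SSIM is omitted, since it requires the full windowed implementation of [62]):

```python
import numpy as np

def rmse(f_true, f_hat):
    """Relative mean squared error ||f - f_hat||_2 / ||f||_2."""
    return np.linalg.norm(f_true - f_hat) / np.linalg.norm(f_true)

def psnr(f_true, f_hat):
    """Peak signal-to-noise ratio in dB, relative to the peak value of f."""
    mse = np.mean((f_true - f_hat) ** 2)
    return 10.0 * np.log10(f_true.max() ** 2 / mse)

def isnr(f_true, f_init, f_hat):
    """Improvement in SNR (dB) of the estimate f_hat over the initial f_init."""
    return 10.0 * np.log10(np.sum((f_true - f_init) ** 2)
                           / np.sum((f_true - f_hat) ** 2))
```

For ISNR, `f_init` would be the FBP initialization used below, so a positive ISNR indicates that the iterations improved on the starting point.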
In 3D X-ray CT, the projection matrix H is very large and is not accessible. For the simulations, only the projection operator Hf and the back-projection operator H T g are used. Considering that the costly projection and back-projection operators are computed in every iteration, the GPU processor is used via the ASTRA toolbox [63] to accelerate the computation.
The phantom reconstructed using the Filtered Back-Projection (FBP) method is taken as the initial value f_ini. The initialization of the coefficient, z_ini, is the multilevel Haar transformation of f_ini: z_ini = D^{−1} f_ini. In this article, we choose the level of transformation such that z has a sparse structure. As shown in Figure 4, when the transform level is small, for example two levels, the coefficient z is not sparse; when the transform level is sufficiently large, the coefficient is sparse. In this paper, we use a five-level Haar transform coefficient for z. The initialization of α_z0 and β_z0 is based on the sparse structure of the variable z. In Figure 4, we can see that the sparsity rate depends on the rank of the transform coefficient, r, where r ∈ [1, l + 1].
For example, when l = 2, shown in Figure 4, the coefficient has three ranks, r ∈ [1, 2, 3]. The first rank r = 1 corresponds to the low frequency components in the transform coefficient.
The initialization of the hyperparameters α_ε0, β_ε0, α_ξ0, and β_ξ0 is based on the prior models chosen for the variances v_ε and v_ξ. In the proposed method, we consider the background of the generalized Student-t distribution, in which both ε and ξ are modeled by a Gaussian distribution with inverse Gamma distributed variance, i.e., the St_g distribution according to Equation (13).
The noise depends on the SNR of the dataset. In order to exploit this information in the initialization, we express the observed dataset as the sum of the uncontaminated dataset g_0 and the additive noise ε: g = g_0 + ε. As the noise ε and the uncontaminated data g_0 are supposed to be independent, E[||g||_2^2] = ||g_0||_2^2 + E[||ε||_2^2]. The SNR of the dataset is SNR = 10 log_10(||g_0||_2^2 / E[||ε||_2^2]). The mean of the noise variance is E[v_εi | α_ε0, β_ε0] = β_ε0/(α_ε0 − 1), ∀i ∈ [1, M], from which a relation between β_ε0, α_ε0, and the SNR is obtained. The two hyperparameters α_ε0 and β_ε0 are thus coupled according to Equation (40); hence, the initialization of one of them is sufficient. In real applications, the SNR of the dataset is unknown, but we can use the projection of an empty object, i.e., f = 0, to obtain a rough value of the noise variance v_ε. Figure 5 shows the influence of the value of α_ε0 on the reconstruction. According to the results, a bigger value of α_ε0 results in a smaller RMSE for different numbers of projections and dataset SNRs. This monotonic behavior facilitates the initialization of this hyperparameter, as a large value of α_ε0 suits all cases. When α_ε0 is greater than a threshold value, the RMSE no longer changes with the initialization value of α_ε0.
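The resulting coupling between the two hyperparameters can be sketched as a small helper (our own hypothetical function, following the SNR decomposition above; the paper's Equation (40) may differ in constants):

```python
import numpy as np

def init_beta_eps(g, snr_db, alpha_eps):
    """Sketch of the SNR-based initialization: with g = g0 + eps and g0, eps
    independent, ||g||^2 ~ ||g0||^2 + M*E[v], and
    SNR = 10*log10(||g0||^2 / (M*E[v])).  Solving for E[v] and using
    E[v] = beta/(alpha - 1) yields beta as a function of alpha and the SNR."""
    M = g.size
    noise_energy = np.sum(g ** 2) / (1.0 + 10.0 ** (snr_db / 10.0))  # ~ M*E[v]
    return (alpha_eps - 1.0) * noise_energy / M
```

Given a rough SNR estimate (e.g., from the projection of an empty object), only `alpha_eps` remains to be chosen, consistent with the observation that one initialization suffices.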
For ξ, both α_ξ0 and β_ξ0 are analyzed for their influence on the reconstruction results. In [46], it is pointed out that when α and β of the St_g distribution are both large, the St_g distribution approaches a Gaussian distribution, which is the case for the additive noise ε. If α and β are very small (approaching zero), the St_g distribution becomes non-informative (a Jeffreys distribution); when α and β are both small, the St_g has the sparsity enforcing property, which is the case for the sparse ξ. Consequently, the initialization of the hyperparameters is theoretically supported, and they can be initialized with respect to these properties in other simulations.

Simulation Results with a Limited Number of Projections
We apply 180, 90, 60, 45, 36, and 18 projections evenly distributed over [0°, 180°] for the reconstruction of the 3D Shepp-Logan phantom of size 256^3; each projection contains 256^2 detectors. The numbers of projections are chosen such that there is respectively one projection every 1, 2, 3, 4, 5, and 10 degrees.
In Table 1, different evaluation metrics of the reconstructed 256^3 Shepp-Logan phantom are compared. It is shown that the HHBM method does not always perform better than the TV method, especially when there is a sufficient number of projections. However, when the projection data are insufficient, the HHBM method is more robust than the TV method. On the other hand, while the choice of the regularization parameter plays an important role in regularization methods like QR or TV, and its value should be selected anew for each system setting, the HHBM method is much more robust with respect to the initialization of the hyperparameters. As we can see from Figures 5-7, once the hyperparameters are chosen within a certain interval, which is not difficult to fix according to the properties of the prior model, appropriate reconstruction results are obtained. More importantly, in the Bayesian approach, the prior model can be chosen from a variety of other suitable distributions, which gives more modeling possibilities than the conventional regularization methods. We may also choose different point estimators from the posterior distribution, for example the posterior mean. Table 1. RMSE, ISNR, PSNR, and SSIM of the reconstructed phantom with 50 global iterations (10 gradient descent iterations in each global iteration). The values of the regularization parameters are respectively λ_QR = 10 and λ_TV = 50 for SNR = 40 dB, and λ_QR = 600 and λ_TV = 100 for SNR = 20 dB. Figures 8 and 9 show the reconstructed middle slice of the "Shepp-Logan" phantom and the "head" object obtained using the TV and HHBM methods from 36 projections with SNR = 40 dB and SNR = 20 dB. The red curve illustrates the profile at the blue line position. In the reconstructed Shepp-Logan phantom obtained using the TV method, the three small circles at the top of the slice are not evident; using the HHBM method, these three small circles can be distinguished.
By comparing the profiles of the slice of the reconstructed Shepp-Logan phantom, we can see that with the HHBM method, the contour positions on the profile are closer to the original profile than those obtained with the TV method. The reconstructed head object contains more details than the simulated Shepp-Logan phantom, especially in the zoomed area in the second row of Figure 8. By comparing the results, we can see that for objects containing small details, the TV method yields smoother homogeneous areas, but fewer details in the contour areas than the HHBM method. Some of the white material, which is dispersed into discontinuous small blocks in the head object, appears connected in the results of the TV method. From these images, we conclude that with an insufficient number of projections, the proposed method gives results with clearer contours and details. Figure 10 shows the reconstructed Shepp-Logan phantom from 18 projections with SNR = 40 dB and 20 dB. In this very underdetermined case, the HHBM method can still obtain a result that is clear enough to distinguish the primary zones and contours of the object. In the simulations, we used SNR = 40 dB to represent a weak noise case and SNR = 20 dB for a strong noise case. When SNR = 40 dB, the HHBM method outperforms both the quadratic regularization and the TV method. When SNR = 20 dB, the TV method with the optimal regularization parameter outperforms the HHBM method. However, Figure 12 shows another curve in light green (TV2), corresponding to the TV reconstruction with a non-optimal regularization parameter chosen not far from the optimal one.
From these results, we can see that the HHBM method is more robust than the TV method with respect to the regularization parameter, the optimal value of which is difficult to determine in real applications, where the estimation quality cannot be evaluated.

Simulation Results with a Limited Angle of Projections
In both medical and industrial X-ray CT, another common challenge is the limited range of projection angles. In this part of the simulation, we use evenly-distributed projections in a limited range of angles for the simulated 3D "Shepp-Logan" phantom and the 3D "head" object, both of size 256^3. Figures 13 and 14 show the middle slice of the reconstructed Shepp-Logan phantom and head object from 90 projections distributed between 0° and 90°, with projection SNRs of 40 dB and 20 dB. With the TV method, the reconstructed object is blurry along the diagonal direction, for which there is no projection data, and there is a square corner where the object should have a rounded edge. With the HHBM method, we obtain results that are more consistent with the true shape and have clearer contours.

Simulation with a Different Forward Model
To study the robustness of the results with respect to modeling errors, we use a slightly different forward model for the generation of the simulated data than for the reconstruction. During the projection, the projector is applied to the Shepp-Logan phantom of size 1024³, with a detector of size 256² for each projection direction.
In Figure 17, we show the two sets of projection data obtained, as well as their difference δ_proj, which can be regarded as the forward modeling error. δ_proj is rescaled so that the details are clearly visible. We then use the data obtained from the 1024³ phantom to reconstruct an object of size 256³.
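A 2D toy version of this experiment can be sketched as follows. The rotate-and-sum projector, the disk phantom, and the grid sizes are all illustrative stand-ins for the paper's 3D setup; the point is only how δ_proj arises when the data are generated on a finer grid than the reconstruction grid:

```python
import numpy as np
from scipy.ndimage import rotate

def project(img, angles_deg):
    """Simple 2D parallel-beam projector: rotate the image, sum along rows.

    A stand-in for the paper's 3D projector, for illustration only.
    """
    return np.stack([rotate(img, a, reshape=False, order=1).sum(axis=0)
                     for a in angles_deg])

# Fine "phantom": a centered disk on a 256x256 grid (2D stand-in for 1024^3).
n_fine, factor = 256, 4
yy, xx = np.mgrid[:n_fine, :n_fine]
fine = ((xx - 128.0) ** 2 + (yy - 128.0) ** 2 < 80.0 ** 2).astype(float)

# Coarse phantom obtained by block-averaging (the "reconstruction grid").
coarse = fine.reshape(64, factor, 64, factor).mean(axis=(1, 3))

angles = np.linspace(0.0, 180.0, num=36, endpoint=False)
sino_fine = project(fine, angles)      # detector has 256 bins
sino_coarse = project(coarse, angles)  # detector has 64 bins

# Match detector sampling and pixel size before comparing: bin the fine
# sinogram down to 64 detector cells (x factor) and account for the finer
# pixel size of the line integrals (x factor again), hence / factor**2.
sino_fine_binned = sino_fine.reshape(36, 64, factor).sum(axis=2) / factor**2

delta_proj = sino_fine_binned - sino_coarse   # the forward modeling error
rmse = np.sqrt(np.mean(delta_proj**2))
```

The residual δ_proj is small but nonzero: it collects the discretization mismatch between the two grids, which is exactly the perturbation whose effect on QR, TV, and HHBM is studied here.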
In Figure 18, the middle slices of the Shepp-Logan phantom reconstructed with the QR, TV, and HHBM methods are presented, using 180 and 36 projections, respectively. From the figures, we can see that with 180 projections, all three methods perform well; the TV method detects the contours better, while the HHBM method shows more noise in the contour areas. With an insufficient number of projections (36 in this simulation), the HHBM method outperforms the QR and TV methods in reconstructing the details of the phantom, for example the three small circles at the top. In Table 2, the RMSE of the phantom reconstructed with the different methods is compared. We conclude that, when the projection model differs from the one used for reconstruction, all three methods (QR, TV, and HHBM) perform well with 180 projections. As the number of projections decreases, the TV and HHBM methods outperform the QR method. Compared with the TV method, the HHBM method is more robust to the number of projections; when the number of projections is smaller than 60, the HHBM method outperforms the TV method.
All the MATLAB codes for the simulations in this paper can be found on GitHub [64].

Discussion
One advantage of the Bayesian approach is that the parameters are estimated along with the unknown variables of the forward model at each iteration. However, as in regularization methods, the hyper-parameters need to be initialized.
While the parameters of regularization methods strongly influence the final results and are costly to tune, the hyper-parameters in HHBM can be initialized based on the prior information (the sparse structure of z) and the prior model (the Student-t distribution). In this article, we have shown that once the hyper-parameters are fixed within a certain appropriate interval, which is not difficult to obtain, the corresponding algorithm is robust. In this work, the hyper-parameters are not fixed via the classical approach using non-informative prior laws (i.e., choosing the parameters of the inverse Gamma distribution so that it approaches Jeffreys' prior) [65].
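As a reminder of the prior structure behind this initialization: the generalized Student-t is the marginal of a normal-inverse-Gamma pair, so its heavy (sparsity-enforcing) tails can be seen by sampling hierarchically. A minimal numerical sketch, with illustrative hyper-parameter values that are not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 3.0, 2.0, 200000   # illustrative inverse-Gamma hyper-parameters

# Hierarchical sampling: v ~ InvGamma(alpha, beta), then z | v ~ N(0, v).
# (If X ~ Gamma(alpha, rate=beta), then 1/X ~ InvGamma(alpha, beta).)
v = 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=n)
z = rng.normal(0.0, np.sqrt(v))

# Marginally, z follows a generalized Student-t: heavier tails than a
# Gaussian of the same variance, which is what favors sparse Haar
# coefficients (many near zero, a few large).
var_z = z.var()                     # theory: E[v] = beta / (alpha - 1) = 1.0
kurt = np.mean(z**4) / var_z**2     # > 3 signals a heavy-tailed distribution
```

Picking (alpha, beta) inside a broad interval consistent with the expected sparsity level, rather than an exact value, is what the robustness observed above permits.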

Conclusions
In this paper, we propose a Bayesian method with a hierarchical structured prior model based on a multilevel Haar transformation (HHBM) for 3D X-ray CT reconstruction. Simulation results indicate that for a limited number of projections or limited projection angles, the proposed method is more robust to noise and to regularization parameters than the classical QR and TV methods.
Indeed, we observe a relatively weak influence of the hyper-parameters on the behavior of the corresponding iterative algorithm. The interest of this weak dependency is that it offers a practical way to initialize the algorithm, which is typically non-trivial.
In this context, as future work, we are investigating the causes of the relatively weak influence of the hyper-parameters and the theoretical foundation of the corresponding robust interval, extending the discussion to the same approach using sparsity-enforcing priors expressed as normal variance mixtures, but for other mixing distributions (Gamma, generalized inverse Gaussian) [66].
Another extension of this work is to consider the posterior mean as the estimator. This can be done via the Variational Bayesian Approach (VBA), but a practical implementation requires a method for accessing the diagonal elements of the large matrix HᵀH, which is currently being studied by our group.
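One matrix-free way such a diagonal could be approximated (a standard stochastic estimator, stated here as an illustration and not necessarily the approach under study) is Hutchinson-style probing, which needs only products with H and Hᵀ and never forms HᵀH:

```python
import numpy as np

def estimate_diag_HtH(matvec_H, rmatvec_H, n, n_probes=20000, seed=0):
    """Estimate diag(H^T H) from matrix-vector products only.

    Uses Rademacher probes: diag(A) ≈ E[v ∘ (A v)] for v with i.i.d. ±1
    entries, averaged over n_probes realizations.
    """
    rng = np.random.default_rng(seed)
    V = rng.choice([-1.0, 1.0], size=(n, n_probes))
    AV = rmatvec_H(matvec_H(V))           # (H^T H) V, all probes at once
    return np.mean(V * AV, axis=1)

# Toy check against an explicit matrix (in CT, H is far too large to form).
H = np.random.default_rng(1).normal(size=(30, 8))
d_est = estimate_diag_HtH(lambda X: H @ X, lambda X: H.T @ X, n=8)
d_true = np.diag(H.T @ H)
```

Only matvec and rmatvec routines are assumed, which is the same interface an iterative CT reconstruction already uses; the accuracy improves as the number of probes grows.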