1. Introduction
Computed tomography (CT) is an ill-posed inverse problem of great relevance in several areas of science, engineering, and cultural heritage preservation. CT is a non-invasive and non-destructive technique used to reconstruct the internal structure of an object. The final result of CT is a map of the absorption coefficients of the object, which is obtained by inverting the Radon transform. Assume that an X-ray is irradiated at an angle, $\theta$, through a two-dimensional object. If we represent the absorption coefficient of the scanned object by $f(x,y)$ and assume that $f$ is regular enough, then we can express the Radon transform as follows:
$$g(\theta, L) = \int_L f(x, y)\, \mathrm{d}\ell, \qquad (1)$$
where $L$ is the straight-line path along which the X-ray travels, $\int_L \cdot\, \mathrm{d}\ell$ denotes the line integral along $L$, summing up the contributions of $f$ along that line, and $g(\theta, L)$ is the total intensity loss of the X-ray beam after passing through the scanned object at an angle $\theta$. In practice, a CT scanner captures the values of $g$ for a finite set of angles $\theta$. The fundamental problem in CT imaging is to reconstruct an approximation of $f$, i.e., to recover the internal structure of the object from these measurements.
If we approximate $f$ by a piece-wise constant function on a two-dimensional grid with $n$ elements, we can approximate the integral in (1) by the following:
$$\int_L f(x, y)\, \mathrm{d}\ell \approx \sum_{i \in S} \ell_i\, x_i,$$
where $S$ denotes the set of indices of the grid elements that the X-ray passes through, $x_i$ is the value of the absorption coefficient (which we assumed constant) in the grid element $i$, and $\ell_i$ is the length of the path that the X-ray travels in the $i$-th grid element. Considering a finite set of angles (and, therefore, of X-rays), and indexing the resulting rays by $j = 1, \ldots, m$, we obtain the following linear system of equations:
$$\sum_{i \in S_j} \ell_{j,i}\, x_i = b_j, \qquad j = 1, \ldots, m. \qquad (2)$$
We can rewrite the system (2) compactly as follows:
$$A x = b, \qquad (3)$$
where $A \in \mathbb{R}^{m \times n}$ is usually a rectangular matrix, with $m \neq n$, containing the lengths $\ell_{j,i}$ in (2), $b \in \mathbb{R}^m$ collects the measurements, and $x \in \mathbb{R}^n$ contains the unknown coefficients that we wish to recover. The measurements $b$, known as the sinogram, are usually corrupted by some errors, and the “exact” ones are usually not available. In practical applications, we only have access to $b^\delta$ such that we have the following:
$$\|b - b^\delta\| \le \delta, \qquad (4)$$
where $\|\cdot\|$ denotes the Euclidean norm. We will assume that a fairly accurate estimate of $\delta$ is available. The matrix $A$ may be ill-conditioned, i.e., its singular values may decay rapidly to zero with no significant gap between consecutive ones. Moreover, the system may have many more unknowns than equations; this may happen for several reasons. For instance, it may be impossible to scan the object from certain angles, or one may not want to irradiate it with too much radiation. The latter is a common case in medical applications. Therefore, the CT inversion is a discrete ill-posed inverse problem. We refer the interested reader to [1,2] for more details on CT and to [3,4] for more details on ill-posed inverse problems.
Since the linear system of equations (3) is a discrete ill-posed problem, one needs to regularize it. There are many possible approaches to regularization; in this work, we consider the so-called iterative regularization methods. Iterative methods, like Krylov methods, exhibit the semi-convergence phenomenon [5], i.e., in the first iterations, the iterates $x^{(k)}$ approach the desired solution $A^\dagger b$, where $A^\dagger$ denotes the Moore–Penrose pseudo-inverse of $A$. After a certain, unknown, number of iterations have been performed, the noise present in $b^\delta$ is amplified, corrupting the computed solution, and the iterates eventually converge to $A^\dagger b^\delta$. The latter is a very poor approximation of $A^\dagger b$ and, in most cases, is completely useless. Therefore, regularization is obtained by stopping the iterations early, before the noise is amplified. Determining an effective stopping criterion is still a matter of current research, and an imprudent choice may produce extremely poor reconstructions.
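The semi-convergence phenomenon is easy to reproduce numerically. The following MATLAB sketch is a toy illustration on a synthetic ill-conditioned matrix: all sizes, noise levels, and iteration counts are chosen only for demonstration and are unrelated to the CT experiments of Section 4. It runs lsqr for an increasing number of iterations and records the relative error with respect to the exact solution.

```matlab
% Toy illustration of semi-convergence on a synthetic ill-conditioned problem.
n = 200;
[U, ~] = qr(randn(n));                      % random orthogonal factors
[V, ~] = qr(randn(n));
s = 10.^linspace(0, -8, n);                 % rapidly decaying singular values
A = U * diag(s) * V';                       % ill-conditioned test matrix
x_true = V * (1 ./ (1:n)');                 % exact solution with decaying coefficients
b = A * x_true;                             % exact data
bdelta = b + 1e-3 * norm(b) * randn(n, 1) / sqrt(n);   % noisy data, about 0.1% noise

kmax = 30;
err = zeros(kmax, 1);
for k = 1:kmax
    [xk, ~] = lsqr(A, bdelta, 1e-12, k);    % k lsqr iterations from the zero vector
    err(k) = norm(xk - x_true) / norm(x_true);
end
semilogy(1:kmax, err, 'o-')
xlabel('iteration k'), ylabel('relative error')   % typically decreases, then increases
```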
In recent years, Krylov methods have been considered for solving CT imaging problems. In particular, a variation of the GMRES algorithm has recently been proposed [6]. This method considers an unmatched transpose operator, similar to what has been done in [7] for image deblurring. Algebraic iterative reconstruction methods, often applied in CT, are possible alternatives to Krylov methods; see [8] and references therein. Several sparsity-promoting algorithms have been proposed in the literature, often in combination with data-driven deep learning methods; see, e.g., [9,10,11]. Concerning multigrid methods, to the best of our knowledge, there are very few proposals, and they consider the multigrid strategy only to accelerate the convergence, solving a Tikhonov regularization model [12], or resorting to a domain decomposition strategy [13,14]. To the best of our knowledge, this is the first time an algebraic multigrid algorithm has been developed to solve the CT problem directly.
In this paper, we propose an iterative regularization multigrid method that stabilizes the convergence of the lsqr method. The latter algorithm is an iterative Krylov method that solves the least-squares problem
$$\min_{x \in \mathbb{R}^n} \|A x - b^\delta\|. \qquad (5)$$
As we mentioned above, the solution to the minimization problem (5) is of no interest, and regularization is achieved by early stopping of the iterations. Depending on the problem, the convergence of the lsqr method may be so fast that accurately selecting a stopping iteration may be challenging. Combining this Krylov method with the multigrid approach, with properly selected projection and restriction operators, allows us to construct a more stable and accurate algorithm. This is mainly achieved by exploiting the symmetry of the smoother (lsqr, which solves the symmetric normal equations associated with (5)) and the symmetric Galerkin projection of the coarse operator. Moreover, in our multigrid method, we can simply add the projection into the nonnegative cone to further improve the quality of the reconstruction.
Note that our multigrid method does not resort to Tikhonov regularization as in [12,15], and its aim is not to accelerate the convergence as in [13,14]; rather, it is inspired by iterative regularization multigrid methods for image deblurring problems as discussed in [16,17]. Therefore, we obtain a reliable iterative regularization method, robust with respect to the stopping iteration, but with a computational cost usually larger than that of the lsqr method used as the smoother.
This paper is structured as follows. In Section 2, we briefly describe the multigrid method, and in Section 3, we detail our algorithmic proposal. Section 4 presents some numerical results to show the performance of the proposed method, and we draw our conclusions in Section 5.
2. Multigrid Method
We will now briefly describe how the multigrid method (MGM) works for solving invertible, usually positive definite, linear systems [18]. The main idea of the classical MGM is to split $\mathbb{R}^n$ into two subspaces: the first one is where the operator $A$ is well-conditioned, and the second one is where the operator is ill-conditioned.
It is well-known that iterative methods first reduce the errors in the well-conditioned space and, only in later iterations, solve the problem in the ill-conditioned one. The cost per iteration is usually of the order of the cost of the matrix-vector product with A. Even if the matrix A is only moderately ill-conditioned, the decrease of the error in the ill-conditioned space may be extremely slow, and, overall, a large number of iterations is required to achieve numerical convergence, making the iterative method computationally unattractive.
On the other hand, direct methods require a fixed number of operations, regardless of the conditioning of A. However, the cost is usually considerably higher than that of a single iteration of an iterative method, i.e., if $A \in \mathbb{R}^{n \times n}$, then the cost is usually $O(n^3)$. Moreover, direct methods usually factor the matrix into the product of two or more matrices that are “easier” to handle; however, these factors may not have the same properties as the original matrix. This is particularly relevant if the matrix A is sparse: in this case, even if $n$ is large, it is still possible to store A, but, if the factors are full, they may require too much memory to be stored and handled.
MGM couples the two approaches, exploiting the strengths of both iterative and direct methods and overcoming their shortcomings. We define the restriction and prolongation operators
$$R_l \in \mathbb{R}^{m_{l+1} \times m_l}, \qquad P_l \in \mathbb{R}^{n_l \times n_{l+1}}, \qquad l = 0, 1, \ldots, L-1,$$
where $m_0 = m$, $n_0 = n$, $m_{l+1} < m_l$, and $n_{l+1} < n_l$. The sequence stops when the minimum between $m_L$ and $n_L$ is small enough. The operator $R_l$ projects a vector into a subspace of smaller size and, when $m_l = n_l$, we have that $P_l = R_l^T$, where the superscript T denotes the transposition. If $m_l \neq n_l$, $P_l$ cannot be the transpose of $R_l$, due to the difference in the dimensions, but it still represents the adjoint of the operator discretized by $R_l$. According to the Galerkin approach, we define the following:
$$A_{l+1} = R_l A_l P_l, \qquad A_0 = A.$$
The matrices $A_l$ are the projections of the original operator A into smaller subspaces. The choice of such subspaces and, hence, of the projectors $R_l$ and $P_l$, is crucial for the effectiveness of the MGM. In particular, they have to be the ill-conditioned subspaces of the matrices $A_l$ in order to obtain fast convergence. We will discuss later how to choose them to enforce regularization for ill-posed problems.
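To make the Galerkin construction concrete, the following MATLAB sketch builds the sequence of coarse operators from a given matrix and given (sparse) grid transfer operators. The cell arrays R and P and the function name build_coarse_ops are illustrative, not part of any toolbox.

```matlab
function Acoarse = build_coarse_ops(A, R, P)
% Galerkin construction A_{l+1} = R_l * A_l * P_l (illustrative sketch).
% A       : fine-level matrix (sparse, possibly rectangular)
% R, P    : cell arrays of restriction/prolongation operators, one per level
% Acoarse : cell array with Acoarse{1} = A and Acoarse{l+1} = R{l}*Acoarse{l}*P{l}
L = numel(R);
Acoarse = cell(L + 1, 1);
Acoarse{1} = A;
for l = 1:L
    Acoarse{l + 1} = R{l} * Acoarse{l} * P{l};   % coarse operator via Galerkin projection
end
end
```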
The MGM is an iterative method. Since the algorithm is quite involved, we first describe the two-grid method (TGM).
The TGM, like the MGM, exploits the so-called “error equation”. Let $\tilde{x}$ be an approximation of the exact solution $x$. We can write the error as follows:
$$e = x - \tilde{x}.$$
It is trivial to see that, for the linear system (3), we have the following:
$$A e = A x - A \tilde{x} = b - A \tilde{x} =: r,$$
where $r$ denotes the residual. Therefore, an improved approximation of $x$, denoted by $\hat{x}$, can be obtained by approximately solving the error equation $A e = r$, obtaining $\tilde{e}$, and setting the following:
$$\hat{x} = \tilde{x} + \tilde{e}.$$
To compute $\tilde{e}$, one may simply apply a few steps of an iterative method; however, as we discussed above, this would reduce the error only in the well-conditioned space. Reducing the error in the ill-conditioned space may require too much computational work using a simple iterative method. Therefore, in order to obtain a fast solver, we wish to exploit direct methods to tackle the ill-conditioned space. The main drawback of direct methods is the high computational cost. Assuming that $R_0$ projects into a subspace of small dimension such that $A_1$ can be factorized cheaply, then a system of the form $A_1 e_1 = r_1$ can be solved efficiently using a direct method. A single iteration of the TGM goes as follows:
$$
\begin{aligned}
& r = b - A x^{(k)}, \\
& r_1 = R_0\, r, \\
& \text{solve } A_1 e_1 = r_1, \\
& \tilde{x} = x^{(k)} + P_0\, e_1, \\
& x^{(k+1)} = \mathcal{S}(\tilde{x}),
\end{aligned}
$$
where by $\mathcal{S}(\tilde{x})$ we denote the application of a few steps of an iterative method, like a Krylov method, to the original system (3), with the starting guess $\tilde{x}$.
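A minimal MATLAB sketch of a single TGM iteration, under the assumptions above (grid transfer operators R0 and P0 given, coarse matrix A1 = R0*A*P0 small enough to be solved directly), is the following. Here the smoother is a few lsqr steps, anticipating the choice made in Section 3, and all variable names are illustrative.

```matlab
function x = tgm_step(A, b, x, R0, P0, A1, nu)
% One iteration of the two-grid method (illustrative sketch).
% A, b   : fine-level system;   x  : current iterate
% R0, P0 : restriction/prolongation;   A1 = R0*A*P0 : coarse operator
% nu     : number of smoothing steps
r  = b - A * x;                              % fine-level residual
r1 = R0 * r;                                 % restrict the residual
e1 = A1 \ r1;                                % solve the coarse error equation directly
x  = x + P0 * e1;                            % coarse-grid correction
[x, ~] = lsqr(A, b, 1e-10, nu, [], [], x);   % post-smoothing with warm start
end
```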
If n is large, which is usually the case, then $A_1$ may be too large to invert; therefore, to solve the coarse system, one can use the TGM again, further projecting the problem. Doing this recursively gives rise to the MGM algorithm. One projects the problem L times until the size of $A_L$ is small enough to invert it directly. We refer to each projection as a level. We summarize the computation of a single iteration of MGM in Algorithm 1.
Note that in Algorithm 1 we do not specify explicitly the stopping criterion, so that we can later tailor the algorithm to our application of interest, i.e., CT. Also, for $l > 0$, the initial guess is the zero vector. This is because, on lower levels, we are solving the error equation, and one expects the solution to have vanishing components.
Note that in CT problems, the ill-conditioned subspace resides in the high frequencies. Therefore, if we project into such a subspace to speed up convergence, we risk amplifying the noise, which can destroy the quality of the restored image. On the other hand, for this application, stabilizing instead of accelerating the convergence is more important. Therefore, in the next section, we will choose the grid transfer operators to project the problem onto the lower frequencies, as done in [16,17] for image deblurring problems. The main difference between image deblurring and CT is that, in image deblurring, matrix A is square and structured (e.g., block Toeplitz with Toeplitz blocks (BTTB), block circulant with circulant blocks (BCCB)), while here, in CT, we consider rectangular sparse matrices, so the methods used for image deblurring cannot be applied.
Algorithm 1: MGM method.
3. Our Proposal
We are now in a position to detail our algorithmic proposal.
The first element we wish to describe is the restriction operator $R$; the prolongation is chosen as its adjoint, as discussed in Section 2. We will assume that the vector $x$ is the vectorization of a two-dimensional matrix $X$, i.e.,
$$x = \mathrm{vec}(X),$$
where the operator “vec” orders the entries of $X$ in lexicographical order. We denote the inverse operation by “$\mathrm{vec}^{-1}$”. For simplicity of notation, we will assume that $n = s^2$, i.e., that $X$ is a square of size $s \times s$. The restriction operator $R$ combines two operators. The first, defined by a stencil $M$, selects the subspace onto which the problem is projected; see [19] for further details. The second one is a downsampling operator, which defines the size of the coarser problem by keeping every other entry of its argument; its precise definition depends on whether $s$ is odd or even. Let ∗ denote the convolution operator and ⊗ denote the Kronecker product; then, $R$ is defined by first convolving $X$ with the stencil $M$ and then downsampling the result along both dimensions, i.e., by applying the Kronecker product of two one-dimensional downsampling operators to the vectorization of $M \ast X$. We can iteratively define the restriction operators at the lower levels by applying the above construction. Namely, if the grid at level $l$ has side length $s_l$, then we define the downsampling at level $l$ according to the parity of $s_l$ and combine it with the stencil $M$ as above.
We consider four possible choices of $M$, which we denote by $M_1$, $M_2$, $M_3$, and $M_4$. These correspond to different B-spline approximations, where $M_i$ has order $i$ [20]. The first one, i.e., $M_1$, sums up four adjacent grid elements, while the other ones correspond to different types of averages. We wish to discuss $M_1$ in more detail. Using this restrictor corresponds to summing up the lengths of the ray at a certain angle $\theta$ on each of the four grid elements that are “fused” to pass from level $i$ to level $i+1$; see Figure 1. Therefore, using this restrictor intuitively corresponds to re-discretizing the problem on a coarser grid when we move from level $i$ to level $i+1$ and, in some sense, to constructing a so-called “geometric” multigrid.
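To fix ideas, the following MATLAB sketch builds the restriction associated with the order-one stencil (the one summing four adjacent grid elements) for an image with even side length s, combining the 2×2 sum with downsampling by a factor of two in each direction via a Kronecker product. The function name and the restriction to even s are illustrative simplifications, not the exact construction of the paper.

```matlab
function R = restriction_order1(s)
% Order-one restriction for an s-by-s image, s even (illustrative sketch).
% Each coarse pixel is the sum of a 2x2 block of fine pixels, so that
% R * vec(X) corresponds to vec(D * X * D') in MATLAB's column-major ordering.
assert(mod(s, 2) == 0, 's is assumed even in this sketch');
rows = [1:s/2, 1:s/2].';              % each coarse index i ...
cols = [1:2:s, 2:2:s].';              % ... sums fine indices 2i-1 and 2i
D = sparse(rows, cols, ones(s, 1), s/2, s);   % 1D pairwise-sum downsampling
R = kron(D, D);                       % 2D restriction acting on vec(X)
end
```

In this sketch, the corresponding prolongation is simply the transpose R'.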
Theoretically, the difference between the operators is that each one has a different order, which changes the size of the low-frequency subspace onto which we project the problem [21]. Furthermore, when the order is even, the operators are symmetric, while the others are not.
As the post-smoother, we use the lsqr algorithm; see [22]. The lsqr method is crucial to our proposal since it can handle rectangular matrices, which is usually the case in CT, and it provides a more stable implementation of the mathematically equivalent CGLS method [23]. Given an initial guess $x^{(0)}$ and the initial residual $r^{(0)} = b^\delta - A x^{(0)}$, this Krylov method, at iteration $k$, solves the following:
$$x^{(k)} = \arg\min_{x \in x^{(0)} + \mathcal{K}_k} \|A x - b^\delta\|,$$
where
$$\mathcal{K}_k = \mathrm{span}\left\{A^T r^{(0)},\, (A^T A) A^T r^{(0)},\, \ldots,\, (A^T A)^{k-1} A^T r^{(0)}\right\}.$$
The lsqr algorithm uses the Golub–Kahan bidiagonalization algorithm to construct an orthonormal basis of $\mathcal{K}_k$. We will assume that we have access to the matrix $A^T$ or to an accurate approximation of it; see [2,6] for a discussion on non-matching transposes in CT. In the latter case, one may need to use reorthogonalization in the Golub–Kahan algorithm.
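In MATLAB, a few warm-started lsqr steps can serve directly as the smoother. The sketch below performs nu iterations starting from the current iterate x, with bdelta denoting the noisy sinogram; the matrix A, bdelta, and x are assumed to be already available, and the tolerance is set so small that all nu steps are carried out.

```matlab
% nu smoothing steps of lsqr, warm-started at the current iterate x
% (illustrative fragment; A, bdelta, and x are assumed to exist in the workspace).
nu  = 1;
tol = 1e-10;                                 % effectively disables early termination
[x, ~] = lsqr(A, bdelta, tol, nu, [], [], x);
```

If A is available only through routines for the products with A and its transpose, MATLAB's lsqr also accepts a function handle of the form afun(v, 'notransp') / afun(v, 'transp') in place of the matrix.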
Absorption coefficients are always nonnegative; therefore, to ensure that the computed solution satisfies this property, at each iteration, we project the computed solution onto the nonnegative cone. This can be achieved simply by setting all the negative values to zero. Projection onto the nonnegative cone is often employed in the solution of ill-posed inverse problems, since the projection onto a convex set is a form of regularization; see, e.g., [24,25,26]. This nonnegative projection is performed after the post-smoother and only at the finest level, because the coarser levels solve the error equations, so they do not have any sign constraints.
Finally, we describe the stopping criterion employed in our algorithm. If we let $k \to \infty$, then the iterates $x^{(k)}$ would converge to a solution of the noisy problem (5). As mentioned above, this is not ideal, as this solution is usually meaningless. To achieve regularization, we stop the iterations early. We employ the discrepancy principle (DP); see [3]. The DP prescribes that the iterations are stopped as soon as the following condition is met:
$$\|A x^{(k)} - b^\delta\| \le \tau\, \delta,$$
where $\delta$ is an upper bound for the norm of the noise (see (4)) and $\tau$ is a user-defined constant.
The theory of the discrepancy principle states that the parameter $\tau$ has to be greater than one [27]; however, the larger we take it, the earlier it stops the method, so it is common practice to choose a value very close to 1 [28]. To summarize, the novelty of our method lies in the choice of the grid transfer operators, as well as in the projection onto the nonnegative cone, which is specific to the CT problem. We summarize the computations in Algorithm 2. Note that, since the matrix at the coarsest level might be rectangular and may be rank-deficient, we use its Moore–Penrose pseudo-inverse rather than its inverse.
Algorithm 2: MGM method for CT.
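A minimal MATLAB sketch of the scheme summarized in Algorithm 2 is reported below. It mirrors the recursive structure of Algorithm 1 with the CT-specific ingredients discussed above: lsqr post-smoothing, nonnegative projection at the finest level only, a pseudo-inverse solve at the coarsest level, and the discrepancy principle as the outer stopping rule. All names (mgm_ct, vcycle, Ac, nu, tau, kmax) are illustrative, the grid transfer operators and coarse matrices are assumed to have been built beforehand (e.g., as sketched in Sections 2 and 3), and the code is a sketch of the ideas rather than the implementation used in the experiments.

```matlab
function x = mgm_ct(Ac, R, P, bdelta, delta, tau, nu, kmax)
% Iterative regularization MGM for CT (illustrative sketch).
% Ac     : cell array of level matrices, Ac{1} = A (finest), Ac{l+1} = R{l}*Ac{l}*P{l}
% R, P   : cell arrays of restriction/prolongation operators
% bdelta : noisy sinogram;   delta : estimate of the noise norm (see (4))
% tau    : DP constant, slightly larger than 1;   nu : lsqr steps per level
% kmax   : maximum number of outer iterations
x = zeros(size(Ac{1}, 2), 1);
for k = 1:kmax
    x = vcycle(1, Ac, R, P, bdelta, x, nu);      % one MGM iteration (V-cycle)
    x = max(x, 0);                               % nonnegative projection, finest level only
    if norm(Ac{1} * x - bdelta) <= tau * delta   % discrepancy principle (DP)
        break
    end
end
end

function x = vcycle(l, Ac, R, P, b, x, nu)
% Recursive V-cycle on level l (level 1 is the finest).
if l == numel(Ac)
    x = pinv(full(Ac{l})) * b;                   % coarsest level: pseudo-inverse solve
    return
end
r  = b - Ac{l} * x;                              % residual on the current level
e0 = zeros(size(Ac{l + 1}, 2), 1);               % zero initial guess on coarser levels
e  = vcycle(l + 1, Ac, R, P, R{l} * r, e0, nu);  % approximately solve the error equation
x  = x + P{l} * e;                               % coarse-grid correction
[x, ~] = lsqr(Ac{l}, b, 1e-10, nu, [], [], x);   % post-smoothing: nu lsqr steps, warm start
end
```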
4. Numerical Examples
To show the potential and performance of our algorithmic proposal, we consider three examples with synthetic data.
Our aim is to show that coupling the MGM with a Krylov method, like lsqr, produces a more stable and more accurate algorithm. Therefore, we compare our approach with lsqr and with the simultaneous iterative reconstruction technique (SIRT), which is a regularized form of a weighted least-squares method often used in computed tomography problems [29,30].
We compare the methods in terms of the number of iterations and accuracy. We measure the latter using the relative restoration error (RRE), defined by the following:
$$\mathrm{RRE}\left(x^{(k)}\right) = \frac{\|x^{(k)} - x^{\mathrm{true}}\|}{\|x^{\mathrm{true}}\|},$$
where $x^{\mathrm{true}}$ denotes the exact phantom.
As mentioned above, we stop our method, as well as lsqr and SIRT, with the DP, where we set $\tau$ slightly larger than 1. Moreover, we set the number of lsqr iterations to perform at each iteration of the MGM to $\nu = 1$, and we fix a maximum number of outer iterations.
The matrix A was explicitly formed using the MATLAB IRTools toolbox [28].
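For reference, a typical setup with IR Tools is sketched below. The option names 'phantomImage' and 'angles', the phantom string, and the specific values are assumptions to be checked against the installed version of the toolbox, and the image size is purely illustrative.

```matlab
% Sketch of a CT test-problem setup with IR Tools (option names are assumptions).
n_img   = 256;                                        % illustrative image size
options = PRset('phantomImage', 'shepplogan', 'angles', 0:1:179);
[A, b, x_true, ProbInfo] = PRtomo(n_img, options);    % sparse CT matrix and exact data
[bdelta, NoiseInfo] = PRnoise(b, 1e-2);               % add 1% white Gaussian noise
delta = norm(bdelta - b);                             % noise norm used by the DP
```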
All computations were performed in MATLAB R2021b running on Windows 11 on a laptop with an 11th Gen Intel Core i7 processor and 16 GB of RAM, using standard double-precision arithmetic (about 15 significant digits).
In our first example, we consider the Shepp–Logan phantom discretized on a square grid of pixels; see Figure 2a. We irradiate the phantom with 362 parallel rays at 180 equispaced angles between 0 and $\pi$. Therefore, we obtain a matrix $A$ with $362 \cdot 180 = 65{,}160$ rows. This is an ideal case, where we assume that we can irradiate the object with as many rays and from as many angles as we wish. We obtain the noise-free sinogram in Figure 2b. We consider different levels of noise to show the behavior of our method in different scenarios. The noise level is $\sigma$ if we have the following:
$$\|b - b^\delta\| = \sigma \|b\|.$$
We consider several values of $\sigma$ and always use white Gaussian noise. We report a noise-corrupted sinogram in Figure 2c. Note that, with this definition, $\delta = \sigma \|b\|$.
As mentioned above, we compare our algorithm with lsqr and SIRT. As stated above, we expect the solution to be nonnegative; therefore, in our method, we project the computed solution onto the nonnegative cone at each iteration. To improve its accuracy further, we perform the same projection in the lsqr method, ensuring that all the computed solutions are physically meaningful. In all the tests, the SIRT method performed worse than the others, computing not very accurate reconstructions and requiring a large number of iterations to converge. In Figure 3, we report the evolution of the RRE against the iterations for all considered methods and noise levels. We can observe that semi-convergence is more evident for the lsqr method than for our algorithmic proposal. Regardless of the grid transfer operator and of the noise level considered, our method is more stable than lsqr, as the RRE increases much more slowly. We can also observe that the convergence is the most stable for the lower-order grid transfer operators; this is more evident for higher levels of noise. It is clear that, for the lsqr method, if one overestimates the stopping iteration, then the RRE may become extremely large. This is not the case for our MGM methods. In Table 1, we report the RRE obtained at the “optimal” iteration, i.e., the one that minimizes the RRE, and at the DP iteration. In all cases, we can observe that our method is more accurate than the lsqr algorithm, albeit at a generally higher computational cost. However, we would like to stress that the cost per iteration of both lsqr and the MGM is of the same order of magnitude as one matrix-vector product with $A$ and one with $A^T$, as analyzed in [16]. Since $A$ is extremely sparse, this cost is proportional to the number of nonzero entries of $A$; therefore, performing a few iterations is computationally extremely cheap. For a representative noise level, we report the solutions computed by lsqr and by the MGM in Figure 4.
We conclude this example by showing the behavior of our method when we perform more than one smoothing step. We fix the grid transfer operator and the noise level, and we compare the RRE evolution against the iterations of lsqr and of the MGM with either $\nu = 1$ or $\nu = 2$, i.e., when we perform one or two steps of the post-smoother. We show this in Figure 5. We can observe that, for $\nu = 2$, the number of iterations required to reach convergence decreases significantly, even though the cost per iteration is doubled. The resulting method is, however, less stable than the one with $\nu = 1$, and the obtained RRE is slightly higher. Therefore, in the following, we will set $\nu = 1$ for all examples.
Depending on the situation, it may occur that one cannot irradiate an object with as many rays as required. It is, therefore, of great importance to verify how an algorithm performs when fewer data are collected. To this aim, we now consider the same phantom as above, but only half of the angles, i.e., we select 90 equispaced angles between 0 and $\pi$. This produces a matrix $A$ with $362 \cdot 90 = 32{,}580$ rows. Note that the number of columns of $A$ is roughly double the number of rows.
As above, we consider four levels of noise and report the evolution of the RRE in Figure 6. We can observe that the obtained results are similar to those of the previous case, i.e., our method is more accurate and more stable, albeit requiring more iterations to converge. Moreover, as in the previous case, the highest stability is obtained for the lower-order operators. This is confirmed by the results in Table 1. For a representative noise level, we report the solutions computed by lsqr and by the MGM in Figure 7. Note that all the operators perform well, improving the stability of lsqr while keeping the same computational cost. However, the lower-order operators exhibit the weakest semi-convergence, especially with higher levels of noise and when the problem is strongly underdetermined due to the limited number of angles used.
Finally, we consider a more realistic image. This is a slice taken from an anisotropic 3D MRI volume that simulates the scan of a human brain, available in the MRI dataset shipped with MATLAB. In this case, the image is composed of $128 \times 128$ pixels; see Figure 8. We irradiate the image with 181 parallel rays at 90 equispaced angles between 0 and $\pi$. This produces a matrix $A$ with $181 \cdot 90 = 16{,}290$ rows and $128^2 = 16{,}384$ columns. We corrupt the data with white Gaussian noise, resulting in the sinogram shown in Figure 8b.
We report the evolution of the RRE against the iterations in Figure 9. We can observe that the results are very close to the ones obtained for the Shepp–Logan phantom. In particular, the lower-order projectors give more stable convergence, while the higher-order projectors provide slightly better reconstructions. This can also be seen from the results reported in Table 2 and Figure 10. The latter shows the solutions computed by the MGM with two of the considered projectors, each stopped at the discrepancy principle iteration.