1. Introduction
Reliable and trustworthy methods for obtaining diagnostic-quality X-ray computed tomography (CT) scans at the lowest possible radiation dose are essential for medical diagnosis. A common strategy, known as sparse-view CT, reduces the radiation dose by acquiring a limited number of projection data. The CT imaging process can be mathematically expressed by the following model:

$$ y = A x + e, \qquad (1) $$

where $x \in \mathbb{R}^N$ is the image we want to reconstruct, $y \in \mathbb{R}^M$ is the acquired data (also called a sinogram), $e \in \mathbb{R}^M$ is the noise perturbation, and $A \in \mathbb{R}^{M \times N}$ is the linear tomographic operator. Due to the limited number of data, the sparse-view CT problem can be cast into the compressed sensing framework [1,2], which includes a large variety of imaging problems, e.g., diffuse optical tomography [3], magnetic resonance imaging [4], image deblurring [5], optical coherence tomography [6], and synthetic-aperture radar imaging [7].
In addition to being underdetermined, the inverse problem (1) is typically also ill-conditioned, and the more traditional analytical methods, such as filtered back projection (FBP), produce images dominated by noise and artifacts. An efficient alternative to invert (1) is the model-based approach, where an iterative algorithm solves a minimization problem incorporating prior information on the reconstruction as either regularization functions or constraints. Among the several proposals in the literature, the total variation (TV) function is certainly the most widely used in medical CT (see, e.g., [8]). The advantages of model-based methods are their explainability and the data consistency imposed by the physical model; however, they require careful parameter tuning and long execution times.
In the last decade, motivated by the availability of huge datasets and new efficient computational devices, such as dedicated graphics processing units (GPUs), deep neural networks (DNNs), in particular convolutional neural networks (CNNs), have been successfully employed for X-ray CT image reconstruction, producing very accurate reconstructions at the expense of explainability and robustness to noise. Deep-learning-based methods used for CT can be grouped into four classes. The first class is called data-to-image: here, the neural network directly maps the acquired sinogram to the reconstructed image (see, for example, [9]). This approach is not widely used, since it is known to be unstable with respect to noise in the data [10]. The second class is composed of networks directly mapping coarse reconstructed images to accurate reconstructions and is therefore named image-to-image. Most of the proposed methods use CNNs taking FBP reconstructions as input (see, e.g., [11,12,13]). The third class comprises the plug-and-play approaches, where a network is trained outside of the iterative process and then included within it, thus allowing the iteration to be run indefinitely [14,15,16]. The last approach is known as iterative mapping, and it is realized through the so-called unrolled (or unfolded) neural networks [17]. The aim of these methods is to mimic a fixed number of iterations of a minimization algorithm while simultaneously learning the hyperparameters of the model, such as the regularization parameters, the parameters of the iterative solution algorithm, and/or the model operators.
The first deep unfolding networks were applied to specific inverse problems [18], and they demonstrated improved performance compared to the more traditional optimization algorithms. Since then, deep unfolding has been extended to compressive sensing [19,20], image deblurring [21], and tomographic image reconstruction. To cite some examples, the first proposals were from J. Adler and O. Öktem in [22,23], where they unrolled a proximal primal–dual method. More recent works are: (a) [24], which unrolls the fast iterative shrinkage/thresholding algorithm (FISTA) and can be applied to solve CT and electromagnetic tomography reconstruction problems; (b) [25], in which a few iterations of a gradient descent method are applied to a field-of-experts regularized least squares functional; and (c) [26], where the authors unfold the dual block forward–backward (DBFB) algorithm embedded in an iterative re-weighted scheme for region-of-interest CT image reconstruction. A different scheme is investigated in [27], where the pseudodifferential operator is learned for limited-angle CT.
The unfolding-deep-learning-based approach is of great interest to the mathematical community and for scientific research since, by exploiting the physical model of the CT inversion and the regularization properties imposed on the solution by the prior, it maintains coherence between the reconstruction and the sinogram; it is therefore more explainable and exhibits higher stability than other deep learning methods.
Contributions. The aim of this work is to propose an accurate and robust unfolded approach, named CTprintNet in the following, for the reconstruction of CT images from sparse views. The main ingredients of the CTprintNet architecture are those of the iRestNet approach, proposed in [21] for image deblurring, which unrolls the proximal interior point algorithm for the solution of a regularized minimization problem and learns the regularization parameter of the model, together with two further parameters of the algorithm. We consider as prior a smooth generalization of the TV regularizer, which is very effective in medical imaging, and in CT in particular, at both reducing the noise and preserving the edges of objects, such as masses or fibers, which is useful for diagnosis [28,29,30].
We tested CTprintNet in the presence of noise intensities different from the one used for training in order to check the stability of the framework. Our network proves to be very stable with respect to noise on the input sinogram and, when compared to an image-to-image method, it shows greater robustness to noise.
Organization of the paper. This paper is organized as follows. In Section 2, we briefly recall the CT imaging problem and reformulate it as an inverse problem. The optimization method exploited to address its solution is detailed in Section 3, while the CTprintNet architecture built by unfolding the iterations of this method is illustrated in Section 4, together with some implementation choices made for the numerical tests. The experimental results are presented in Section 5. Finally, some concluding remarks and considerations are given in Section 6.
2. The Numerical Model
In this section, we briefly describe the 2D X-ray CT acquisition process and introduce some mathematical notation.
The main focus of CT studies is to develop safer protocols that reduce the radiation dose for the patient, since limiting the negative effects produced by the radiation allows the use of the CT technique in a wider class of medical examinations. This goal can be pursued by either decreasing the X-ray tube current at each scan (low-dose CT) or reducing the number of X-ray projections (few-view CT). In the first case, the measured data present high levels of noise, whereas in the second case, the incompleteness of the projection data leads to images with pronounced artifacts. In this paper, we focus on the few-view CT case, where image reconstruction is particularly difficult.
Over the years, different tomographic devices have been designed to fit different medical needs, and this has led to the creation of various protocols and geometries. In this study, we focus on the fan-beam geometry, schematized in Figure 1, since it is among the most widespread geometries nowadays. It is characterized by a source that emits fan-beam X-rays whose intensity is recorded, after passing through the body, by a detector with $n_d$ elements.
The aim of CT imaging is to reconstruct the attenuation coefficient function $\mu$ of the body traversed by the X-rays from the $n_\theta$ projections acquired at equally spaced angles within a given angular range (180° in our case). Each individual projection is modelled by the Radon transform, which is the integral along the line L describing the path of an X-ray beam at a fixed angle. According to Beer–Lambert's law, given $I_0$ as the intensity emitted by the X-ray source and $I$ as the intensity measured by the detector, we have

$$ \int_L \mu(\ell)\, d\ell = \ln \frac{I_0}{I}. \qquad (2) $$

In the real (discrete) case, the function $\mu$, which describes the object, is discretized into $N$ pixels, whose values can be lexicographically re-ordered in a vector $x \in \mathbb{R}^N$.
Analysing a single projection acquired at a fixed angle from the $n_d$ cells of the detector, Equation (2) can be discretized into a sum over all the pixels as

$$ \sum_{j=1}^{N} a_{ij}\, x_j = y_i, \qquad i = 1, \dots, n_d, \qquad (3) $$

where $a_{ij}$ represents the contribution of the $j$-th pixel to the line integral along the $i$-th ray and $y_i = \ln(I_0 / I_i)$ is computed from the intensity $I_i$ measured by the $i$-th detector cell. Repeating for all the angles and using a compact notation, we obtain the linear system

$$ A x = y, \qquad (4) $$

where $x \in \mathbb{R}^N$ denotes the CT image to be reconstructed, $y \in \mathbb{R}^M$ is the sinogram obtained with the projection measurements (with $M = n_\theta \cdot n_d$), and $A \in \mathbb{R}^{M \times N}$ is a sparse matrix representing the discrete line integrals in (2). Hence, tomographic image reconstruction is mathematically modeled as an ill-conditioned inverse problem, whose solutions can be dominated by noise. In particular, in the few-view CT case, the linear system is also under-determined due to the lack of projections and might therefore have infinitely many solutions.
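To make the few-view setting concrete, the following sketch uses scikit-image's parallel-beam Radon transform as a simple stand-in for the fan-beam operator A: it simulates a sparse sinogram over 180° and inverts it with FBP, which exhibits the streak artifacts and noise amplification discussed above. The number of views and the noise level are illustrative choices, not the paper's settings.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

x_true = rescale(shepp_logan_phantom(), 0.64)          # synthetic test image (~256x256)
angles = np.linspace(0.0, 180.0, 60, endpoint=False)   # few equally spaced views over 180 degrees

sinogram = radon(x_true, theta=angles)                 # discrete line integrals: y = A x
sinogram += 0.01 * sinogram.max() * np.random.randn(*sinogram.shape)  # additive noise e

x_fbp = iradon(sinogram, theta=angles, filter_name="ramp")  # analytical FBP: streaks + noise
```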
The model-based approach is introduced to address these numerical difficulties, with the aim of modeling the CT imaging process as the minimization of a suitable cost function. The main idea is to combine a fidelity measure on the data with some a priori information on the solution, thus leading to the following problem:

$$ \min_{x \in \mathbb{R}^N} \; f(x) + \lambda R(x), \qquad (5) $$

where $f(x)$ is the data-fidelity term, $R(x)$ is a regularization function, and $\lambda > 0$ is the regularization parameter.
The data-fitting function is related to the statistics of the noise on the data, and we consider the least squares function

$$ f(x) = \frac{1}{2} \| A x - y \|_2^2, \qquad (6) $$

which provides a good maximum a posteriori estimate for the log-Poisson noise affecting the measured sinograms [31]. Different functions have been proposed in the CT literature as regularization terms. In particular, the TV function is suitable for tomographic images, which are rather uniform inside the organs while having fast variations on the borders. Since, in our scheme, we need the objective function in (5) to be differentiable, we consider a smoothed version of the TV function, defined as

$$ R(x) = TV_\delta(x) = \sum_{i=1}^{N} \sqrt{ (D_h x)_i^2 + (D_v x)_i^2 + \delta^2 }, \qquad (7) $$

where $D_h$ and $D_v$ are the horizontal and vertical gradient operators, respectively, and $\delta$ is a small positive parameter. Finally, hard constraints on the values of the image's pixels are typically added to preserve the physical properties of the attenuation coefficients. A standard assumption is to force the solution to belong to the N-dimensional hypercube $[0,1]^N$, thus leading to the constrained optimization problem

$$ \min_{x \in [0,1]^N} \; \frac{1}{2} \| A x - y \|_2^2 + \lambda\, TV_\delta(x). \qquad (8) $$
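A numpy sketch of the smoothed TV in (7) and of the objective in (8) follows (ignoring the box constraint, which is handled by the interior point scheme of the next section); the forward-difference scheme and the default value of δ are illustrative assumptions.

```python
import numpy as np

def gradients(x_img):
    """Forward differences approximating D_h x and D_v x (zero padding at the last row/column)."""
    dh = np.zeros_like(x_img); dh[:, :-1] = x_img[:, 1:] - x_img[:, :-1]
    dv = np.zeros_like(x_img); dv[:-1, :] = x_img[1:, :] - x_img[:-1, :]
    return dh, dv

def smoothed_tv(x_img, delta=1e-3):
    """Eq. (7): sum_i sqrt((D_h x)_i^2 + (D_v x)_i^2 + delta^2)."""
    dh, dv = gradients(x_img)
    return np.sum(np.sqrt(dh ** 2 + dv ** 2 + delta ** 2))

def objective(x_img, A, y, lam, delta=1e-3):
    """Smooth part of Eq. (8): 0.5 * ||A x - y||^2 + lam * TV_delta(x), with x vectorized."""
    r = A @ x_img.ravel() - y
    return 0.5 * np.dot(r, r) + lam * smoothed_tv(x_img, delta)
```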
3. Proximal Interior Point Method for CT Reconstruction
Problem (8) involves the constrained minimization of a smooth and convex objective function, which can be performed by means of a large variety of optimization methods. In our paper, we consider the proximal interior point approach [32], in which the original constrained minimization problem (8) is replaced by a sequence of unconstrained problems

$$ \min_{x \in \mathbb{R}^N} \; f(x) + \lambda\, TV_\delta(x) + \mu\, \mathcal{B}(x), \qquad (9) $$

where $\mathcal{B}$ is the logarithmic barrier function, defined as

$$ \mathcal{B}(x) = - \sum_{i=1}^{N} \big[ \ln(x_i) + \ln(1 - x_i) \big], \qquad (10) $$

and $\mu > 0$ is the barrier parameter, which goes to zero along the minimization.
Following [21], we address the minimization of (9) by means of the forward–backward proximal interior point (FBPIP) method, whose iterations are defined in Algorithm 1, where $\operatorname{prox}_{\gamma\mu\mathcal{B}}(x)$ is the proximity operator of $\gamma\mu\mathcal{B}$ at x [33].
The following proposition directly derives from ([21], Proposition 2); we report its statement and proof adapted to problems (9) and (10) to show the closed-form expression of the proximal point defining the update φ in Algorithm 1, together with its derivatives with respect to x, μ, and γ.
Algorithm 1: Proximal interior point algorithm.
Let $x_0 \in (0,1)^N$, let $(\gamma_k)_{k \in \mathbb{N}}$ be a sequence of positive step sizes, and let $(\mu_k)_{k \in \mathbb{N}}$ be a sequence of barrier parameters such that $\mu_k \to 0$;
for $k = 0, 1, 2, \dots$ do
  $x_{k+1} = \varphi(x_k, \mu_k, \gamma_k) = \operatorname{prox}_{\gamma_k \mu_k \mathcal{B}}\big( x_k - \gamma_k \nabla \big( f + \lambda\, TV_\delta \big)(x_k) \big)$
end for
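A minimal sketch of the FBPIP iteration in Algorithm 1, assuming a callable for the gradient of the smooth term f + λ·TV_δ and a callable for the barrier proximity operator (see the component-wise computation at the end of this section); the step size and the barrier schedule are illustrative choices.

```python
import numpy as np

def fbpip(x0, grad_smooth, prox_barrier, n_iter=200, gamma=1e-3, mu0=1.0, mu_decay=0.97):
    """Forward-backward proximal interior point sketch:
    x_{k+1} = prox_{gamma*mu_k*B}( x_k - gamma * grad_smooth(x_k) ), with mu_k -> 0."""
    x, mu = x0.copy(), mu0
    for _ in range(n_iter):
        x = prox_barrier(x - gamma * grad_smooth(x), gamma * mu)  # gradient step, then barrier prox
        mu *= mu_decay                                            # barrier parameter vanishes
    return x
```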
Proposition 1. Let $\gamma, \mu > 0$, and let $b$ be the barrier function associated with the set $[0,1]$, defined as

$$ b(u) = \begin{cases} -\ln(u) - \ln(1-u) & \text{if } u \in (0,1), \\ +\infty & \text{otherwise}. \end{cases} \qquad (12) $$

Then, for every $u \in \mathbb{R}^N$, the proximity operator associated with $\gamma\mu\mathcal{B}$ is given by

$$ \operatorname{prox}_{\gamma\mu\mathcal{B}}(u) = \sum_{i=1}^{N} p(u_i, \gamma\mu)\, e_i, \qquad (13) $$

where $p(u_i, \gamma\mu)$ is the unique solution in $(0,1)$ of the cubic equation

$$ p^3 - (1 + u_i)\, p^2 + (u_i - 2\gamma\mu)\, p + \gamma\mu = 0, \qquad (14) $$

and $e_i$ is the i-th vector of the canonical basis of $\mathbb{R}^N$. The Jacobian matrix of φ with respect to x and the gradients of φ with respect to μ and γ then admit the closed-form expressions derived in the proof below.

Proof. The formal derivation of the proximity operator (13) can be found in [34]. Let $F(u_i, \gamma\mu, p)$ denote the left-hand side of the cubic Equation (14), so that, by definition of the proximal point, $F(u_i, \gamma\mu, p(u_i, \gamma\mu)) = 0$. The derivative of F with respect to the last variable is given in (19). We observe that the quantities appearing in (19) are negative; moreover, it follows from (19) that they share the same sign, so this derivative does not vanish at the proximal point. Hence, the gradient of $p$ with respect to x and the partial derivatives of $p$ with respect to μ and γ exist by the implicit function theorem and are obtained as ratios of the partial derivatives of F. Using these equations to differentiate (13) leads to the results of the proposition. □
We end this section by remarking that, since the barrier function $\mathcal{B}$ in (10) is the sum of the functions $b$ in (12), each acting only on the i-th component of the vector u, the proximity operator associated with $\gamma\mu\mathcal{B}$ is the vector whose i-th component is the scalar proximal point $p(u_i, \gamma\mu)$; that is, it can be computed component-wise.
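Since the barrier is separable, its proximity operator can therefore be computed pixel by pixel as the root in (0,1) of the cubic in (14). Below is a minimal (unoptimized) numpy sketch, assuming the cubic written in the arrangement given above (algebraically equivalent forms are possible); a production implementation would rather use a closed-form cubic solver or a few Newton steps.

```python
import numpy as np

def prox_barrier(u, kappa):
    """Component-wise prox of kappa*B for the [0,1] log barrier: for each entry u_i, return the
    unique root in (0,1) of p^3 - (1+u_i)p^2 + (u_i - 2*kappa)p + kappa = 0."""
    flat = u.ravel()
    out = np.empty_like(flat)
    for i, ui in enumerate(flat):
        roots = np.roots([1.0, -(1.0 + ui), ui - 2.0 * kappa, kappa])
        real = roots[np.abs(roots.imag) < 1e-8].real
        out[i] = real[(real > 0.0) & (real < 1.0)][0]
    return out.reshape(u.shape)
```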
5. Results and Discussion
In this section, we show and analyze the performance of the proposed CTprintNet architecture on two different datasets, one composed of synthetic images with geometric elements and one of real medical images. Let us first describe the training procedure and the experimental setting.
In all the implemented experiments, we used the same protocol to simulate the few-view geometry, i.e., a reduced scanning trajectory limited to 180 degrees with a fixed number of equally distributed scanning views (one value for the synthetic dataset and one for the realistic dataset). We constructed the test problems as

$$ y = A x^{GT} + e, \qquad (30) $$

where $x^{GT}$ is the ground truth image and $e$ is a realization of a normally distributed random variable with zero mean and standard deviation $\sigma$. As we will see in the results, different values of $\sigma$ were considered in the training and testing phases. The sinogram was then used to compute the first iterate.
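A sketch of how a noisy test problem of the form (30) can be generated, with `A` the projection matrix and `x_true` the vectorized ground truth; the noise scale and the tested σ values are illustrative and do not reproduce the paper's exact settings.

```python
import numpy as np

def make_test_problem(A, x_true, sigma, seed=0):
    """Eq. (30) sketch: y = A @ x_true + e, with e ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    y_clean = A @ x_true
    return y_clean + sigma * rng.standard_normal(y_clean.shape)

# train at one noise level, then test over a grid of levels to probe stability
sigmas_test = np.linspace(0.0, 0.1, 6)   # illustrative standard deviations
```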
The results obtained with the proposed architecture were compared with both an iterative solver and a deep learning approach. For the first class of methods, we chose the scaled gradient projection (SGP) method [37], which has already been used several times to reconstruct CT images [38,39,40]. Regarding the choice of a competitor among the deep learning methods, it was very hard for us to exploit existing unfolding architectures specifically designed for tomographic image reconstruction problems: although algorithms of this type can be found in the literature (see, e.g., the references cited in the Introduction), the corresponding codes either are not available or require a notable amount of modification to be usable. As a result, we decided to compare CTprintNet with an image-to-image approach consisting of a residual UNet applied to a coarse FBP reconstruction; we denote this framework as LPP in the following. The different LPP trainings were carried out on the same datasets used for CTprintNet, performing 100 epochs for the synthetic dataset and 50 for the realistic one.
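For concreteness, the image-to-image post-processing idea behind LPP can be sketched as follows, with a small residual CNN standing in for the residual UNet actually used; the architecture and sizes are placeholders.

```python
import torch
import torch.nn as nn

class ResidualPostProcessor(nn.Module):
    """Post-processing baseline sketch: learn a residual correction of the coarse FBP input,
    i.e., output = fbp + cnn(fbp). A shallow CNN replaces the full residual UNet for brevity."""
    def __init__(self, channels=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, fbp):
        return fbp + self.cnn(fbp)   # residual correction of the coarse reconstruction
```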
We evaluated the performances through qualitative visual comparisons and by quantitatively computing the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [41].
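The two metrics can be computed, for instance, with scikit-image (assuming images scaled to [0, 1]):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(x_rec, x_true):
    """PSNR and SSIM of a reconstruction against the ground truth (both assumed in [0, 1])."""
    psnr = peak_signal_noise_ratio(x_true, x_rec, data_range=1.0)
    ssim = structural_similarity(x_true, x_rec, data_range=1.0)
    return psnr, ssim
```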
5.1. Results on a Synthetic Dataset
We first tested the algorithms on the synthetic dataset "Contrasted Overlapping Uniform Lines and Ellipses" (COULE) (https://www.kaggle.com/datasets/loiboresearchgroup/coule-dataset, accessed on 13 November 2021), which contains 430 sparse-gradient grayscale images. Among the 430 images, we used 400 for training and the remaining 30 for the test phase.
Experiment A. In this experiment, we trained and tested CTprintNet and LPP on noise-free data. Figure 5, showing the boxplots of PSNR and SSIM (in blue for LPP and in orange for CTprintNet), indicates that LPP produces better reconstructions on average. This is expected: post-processing architectures perform particularly well in standard conditions, where the training and testing data are obtained under similar conditions.
Experiment B. Under real conditions, however, the projection data are naturally corrupted by noise. To test the stability of the proposed framework with respect to noise in the sinogram, we first analyzed the behavior of the architectures when varying the standard deviation σ of the noise introduced in the training and test phases. In Experiment B, we trained the two architectures by adding Gaussian noise as in (30) to both the training set and the test set. The results, shown in Figure 6, demonstrate that noise influences the performance of the two architectures, which now are more competitive with each other. In particular, the SSIM index, which reflects the visual quality of the image, is far better for CTprintNet.
Experiment C. A well-known drawback of neural networks, particularly post-processing ones, is that their performance on data never seen or far from the training data is not always satisfactory. In Experiment C, we therefore trained the two architectures without noise on the data (σ = 0) and tested them with variable standard deviations. In Figure 7, we plot the average values of the two metrics as the standard deviation of the noise added to the test set changes (the blue and red lines correspond to LPP and CTprintNet, respectively). The results clearly show that the proposed architecture is stable with respect to noise: LPP initially provides better performance, but, as the noise increases, its behavior worsens considerably (see also the reconstructions of a representative test image in Figure 8). Conversely, CTprintNet guarantees good results, limiting the damage caused by increased noise despite being trained on noise-free data.
5.2. Discussion of the Learned Parameters
In this section, we study how the choice of the regularization parameter affects the performance of the network by examining a further learning technique and providing a brief excursus on the behavior of the three learned parameters. To this aim, we initially fixed the regularization parameter, as is usually done in the iterative method, while the remaining parameters were learned by the architecture. We performed Experiment C with different fixed values, set by analyzing those learned by the network in the previous experiments, and we kept the best-performing value.
We then utilized a different technique, described in [42,43], for learning the regularization parameter, leaving the techniques used to learn the step size and the barrier parameter untouched. The idea behind this technique is to provide a local TV regularization by setting a specific parameter for each pixel, thus obtaining a weighted TV (WTV), as follows:

$$ WTV_\delta(x) = \sum_{i=1}^{N} \lambda_i \sqrt{ (D_h x)_i^2 + (D_v x)_i^2 + \delta^2 }. \qquad (31) $$

This choice makes it possible to diversify the level of regularization within a single image, e.g., by regularizing more on constant patches and less on patches with complex textures. From the implementation point of view, the per-pixel parameters in the k-th layer are computed according to the rule in (32), in which a scalar parameter of the architecture, learned during the training, rescales a weight inversely related to the local gradient magnitude. In this way, the smaller the gradient magnitude is, the greater the regularization provided at pixel i is.
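The exact rule (32) is given in the original; purely as an illustration of the idea (a single learned scalar rescaling per-pixel weights that shrink where the local gradient magnitude is large), one could write:

```python
import torch
import torch.nn.functional as F

def per_pixel_lambda(x_img, a_k, eps=1e-6):
    """Illustrative per-pixel weights (NOT the paper's exact rule (32)):
    lambda_i = softplus(a_k) / (1 + |grad x|_i), so flat regions receive stronger regularization.
    a_k is assumed to be a scalar tensor (e.g., an nn.Parameter of the k-th layer)."""
    dh = torch.zeros_like(x_img); dh[..., :, :-1] = x_img[..., :, 1:] - x_img[..., :, :-1]
    dv = torch.zeros_like(x_img); dv[..., :-1, :] = x_img[..., 1:, :] - x_img[..., :-1, :]
    grad_mag = torch.sqrt(dh ** 2 + dv ** 2 + eps)
    return F.softplus(a_k) / (1.0 + grad_mag)
```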
We modified the architecture by using this procedure for the computation of the regularization parameter in the corresponding substructure of Figure 3 and carried out several trainings. What is evident is that, although the architecture with this modification is able to yield slightly better performances than those obtained with the original technique described in (29), it becomes worse in terms of stability.
We plot in Figure 9 the results obtained from Experiment C with a fixed regularization parameter (black line), with Equation (29) (red line), and with Equation (32) (green line). The plots show that the proposed new learning rule weakens the network stability.
Next, we analyzed the behavior of the parameters learned by the architecture across the different layers. In all the performed experiments, these parameters show very similar trends regardless of the image and of the presence or absence of noise. In Figure 10, we plot the values of the three parameters, obtained from Experiment B, as the iterations proceed. The step size initially increases, then decreases and stabilizes. The central panel of Figure 10 shows that the regularization parameter tends to vanish as the layers advance: since the amount of noise decreases, the contribution of the regularization term becomes weaker. Finally, the barrier parameter vanishes along the minimization process, in agreement with the theoretical requirements presented in Section 3.
5.3. Results on a Realistic Dataset
As our realistic dataset, we used a set of clinical images from the AAPM Low-Dose CT Grand Challenge dataset by the Mayo Clinic (https://www.aapm.org/GrandChallenge/LowDoseCT/, accessed on 22 May 2023). This dataset contains images of the human abdomen obtained from full-dose acquisitions. From the entire dataset, we selected about 1500 images for the training phase and 327 for the testing phase.
Experiment D. In this experiment, we trained the two architectures on data affected by noise with a fixed standard deviation and tested them on data whose noise standard deviation was chosen randomly within a given range.
Figure 11 shows the boxplots relative to the considered metrics, in blue for LPP and in orange for CTprintNet. As can be seen from the boxplots, CTprintNet obtains better performance on average and is also more stable with respect to different noise intensities. Moreover, the CTprintNet boxplots are narrower, with means and medians closer to each other than in the LPP boxplots, confirming its superior stability.
In the following, we show and comment on three reconstructions obtained from test images with standard deviation values of 0.04, 0.065, and 0.075. We compared the reconstructions obtained with the proposed method with those obtained with the SGP algorithm, where the regularization parameter was selected as the one yielding the highest PSNR among 20 different tested values. The reconstructions from CTprintNet and LPP were obtained using the same training as before.
Reconstruction D.1. This example is obtained from a test image corrupted by noise with a standard deviation of 0.04. Figure 12 depicts the reconstructed images and reports, for each one, the values of PSNR and SSIM. We observe that both CTprintNet and LPP effectively remove the typical artifacts of the few-view geometry and, in particular, suppress the noise. However, a white ghost detail appears in the LPP image. The SGP reconstruction looks blocky, and many details in the dark background are lost.
Reconstruction D.2. The second example is obtained from a test image corrupted by noise with a standard deviation of 0.065. Figure 13 depicts the reconstructed images and reports, for each one, the values of PSNR and SSIM. The image obtained with the LPP framework presents over-smoothing and some incorrect pixels, as highlighted in the zoomed-in image. Hence, the CTprintNet reconstruction is advantageous over the other methods in terms of both metrics and noise reduction.
Reconstruction D.3. The last example is obtained from a test image corrupted by the highest amount of noise, with a standard deviation of 0.075. Figure 14 shows that, despite the noise being stronger than that used in the training, the proposed network provides the best-quality reconstruction. In particular, the SGP image is extremely blocky, as in the previous case, and the LPP image again presents an excessive amount of smoothing and a number of corrupted pixels.
The results obtained from the various experiments on the realistic dataset confirm that our architecture is stable, as already observed in Section 5.1.