Derivative-Free Iterative One-Step Reconstruction for Multispectral CT

Image reconstruction in multispectral computed tomography (MSCT) requires solving a challenging nonlinear inverse problem, commonly tackled via iterative optimization algorithms. Existing methods necessitate computing the derivative of the forward map and potentially its regularized inverse. In this work, we present a simple yet highly effective algorithm for MSCT image reconstruction, utilizing iterative update mechanisms that leverage the full forward model in the forward step and a derivative-free adjoint problem. Our approach demonstrates both fast convergence and superior performance compared to existing algorithms, making it an interesting candidate for future work. We also discuss further generalizations of our method and its combination with additional regularization and other data discrepancy terms.


Introduction
Classical computed tomography (CT) is based on the inversion of the linear Radon transform, where a scalar-valued attenuation map µ : X → R of the patient is recovered from observation of its Radon transform Rµ : L → R derived from projection data.Here and below X ⊆ R d is the image domain in d = 2, 3 dimensions, and L is a set of integration lines.
While sufficient in many applications, the linear problem ignores the polychromatic nature of the X-rays and the energy-dependent absorption characteristics of real-world objects.
The sample is more accurately represented by a family of attenuation maps µ(e) : X → R dependent on the photon energy e ∈ (0, ∞).Recovering a single µ from projection data using a single energy bin results in a mixture of density maps from different energies resulting in severe non-uniqueness.Additionally, the nonlinearity results in severe beam hardening artifacts that may be partially accounted for by iterative algorithms or analytic modeling [14,18,22,26,29,34].In order to overcome such weaknesses, the idea of multispectral CT (MSCT) is to measure projection data for different energy bands, which are then used to reconstruct multiple attenuation maps.The reconstruction problem, however, becomes nonlinear and much more challenging than pure Radon inversion [1,13,15,17,21,24,30].
In this work, we develop a simple and efficient strategy for tackling the nonlinear MSCT image reconstruction problem.

Multispectral CT
Specifically, in this work we use the material decomposition paradigm in MSCT.In the material decomposition approach it is assumes that the energy dependent attenuation maps µ(e) can be written as µ(e, x) = M m=1 µ m (e) f m (x) where f 1 , f 2 , . . ., f M : X → R are the densities of M separate materials to be recovered and µ m (e) are known and tabled absorption characteristic of the m-th material.By collecting projection data for several energy bands the aim is to recover the material densities.This does not only allow to improve image quality but also offers abroad range of the applications as it reconstructs multiple images encoding different characteristics of specific regions enable a deeper understanding of the objects under examination.The recent significant advancement in the manufacturing of energy-sensitive sensors [19,36] has considerably increased the interest in MSCT.
Assuming B spectral measurements Y 1 , . . ., Y B , the material decomposition problem in MSCS can be written as the problem of recovering f 1 , . . ., f M from data They can be broadly classified into two categories: two-step methods and one-step algorithms.The idea of earlier two-step methods is to perform Radon inversion and material decomposition in two separate steps.Material decomposition can be performed either in the projection domain L (before Radon inversion) or in the image domain X (after Radon inversion).Both methods have their specific advantages and disadvantages.The image-domain decomposition approach allows incorporating prior information about the objects that is naturally contained in the image domain X .However, the nonlinear nature of the problem leads to approximate linear models that introduce severe reconstruction artifacts.The image-domain decomposition approach, on the other hand, allows working with the correct nonlinear model.However, the prior structure in the Radon domain is not directly available.See the works [13,21,25,32,33] and references therein for various proposed two-step approaches.
One-step methods reconstruct the material densities f 1 , . . ., f M through iterative minimization techniques for solving (1.1) and thus overcome the drawbacks of both two-step methods.For some one-step algorithms, we refer to [2,5,7,17,20,23,24,30,35]. Despite their superior performance, such one-step iterative algorithms are computationally expensive.Existing methods require many iterative steps due to poor conditioning of the problem or come with computationally expensive iterative steps.The algorithms proposed in this paper are specific one-step algorithms that address these two drawbacks of existing one-step methods.

Our contributions
As our main contribution, we present a novel derivative-free algorithm designed to combine the advantages of one-step and two-step approaches.To achieve this, we introduce a simple and computationally efficient iterative update that incorporates appropriate preconditioning.Image reconstruction is performed in the image domain, which naturally allows for the inclusion of an image smoothness prior.It also integrates benefits of two-step approaches by separating iterative updates into two parts.Moreover, the main ingredient that makes the algorithm efficient is the use of the full nonlinear forward model for the direct problem but linearisation around zero for the adjoint problem.While avoiding computation and evaluation of the derivative of the forward map, this also allows including a simple channel preconditioning.Our method can be combined with additional regularization.However, in order to show the method in its pure form, we will not include such a modification.

Mathematical modelling of MSCT
We assume that the object to be imaged lies in some domain X ⊆ R Assuming that the material specific attenuation functions µ m (e) are known, the goal is to recover densities f m from indirect X-ray measurements using different energy bins, which we describe next.

Continuous model
We start with continuous modeling, where the quantities involved are functions on continuous domains that will be discretized later.Suppose that X-ray energy with a known incident spectral density I 0 (e) is sent along a line ℓ ∈ L from the source position to the detector position.While propagating along ℓ, the X-rays are attenuated according to µ(x, e) defined in (2.1).This results in an outgoing spectral density I 1 (e) = I 0 (e) exp(− ℓ µ(x, e)dℓ(x)) at the detector.The energy sensitive detector with the spectral profile ŝ(e) records the integral ∞ 0 ŝ(e)I 1 (e)de.Denoting the product of the incident spectral density of the source and detector sensitivity by s(e) ≜ I 0 (e)ŝ(e) , referred to as the effective spectrum, the recorded data is given by The data in equation (2.2) represent a single measurement in MSCT.The goal of material decomposition in MSCT is to determine the density distributions f m from multiple multispectral X-ray measurements by varying the line ℓ and the effective spectra s.
For simplicity of presentation we consider the parallel beam mode where any line ℓ = ℓ(θ, r) is parametrized by its normal vector θ and its distance r from the origin.In this case r) is given by the classical Radon transform of f m .Assuming further The methods that we describe, however, would also work with a three-dimensional image domain X and a general projection domain L of lines in R 3 .

Discretisation
In order to avoid technical details and to concentrate on the main ideas we derive the algorithm for the discrete forward model throughout this paper.For that purpose we represent the material densities via discrete column vectors X 1 , . . ., X M ∈ R Nx and the Radon transform via a matrix A ∈ R Ny×Nx where N y is the number of lines used in the projection domain.Further we discretise the effective energy spectra by vectors S 1 , . . ., S B ∈ R E and the known material attenuations by vectors µ 1 , . . ., µ M ∈ R E .The discretization of (2.3) yields the following discrete image reconstruction problem.
Problem 2.1 (Discrete MSCT image reconstruction problem).Recover the unknown X ∈ Here and below we use the convention that the boldface notation exp E×Ny indicates that the scalar function exp is applied pointwise to a vector in R E×Ny .Further, in (2.4) , • the columns of X = [X 1 , . . ., X M ] are the discrete material images; • A ∈ R Ny×Nx is the discretized Radon transform;

Remark 2.2 (Recalibration).
With ⟨S b , 1⟩ = E e=1 S b,e we obtain (2.5) Thus, with ⊙ denoting pointwise multiplication (also known as Hadamard product) and ⊘ the pointwise division we get This means that the recalibrated forward model F(X) ⊺ ⊘ F(0) ⊺ is the same as the original forward model with normalized effective spectra S b /⟨S b , 1⟩.The matrix with normalized spectra can be written as S ⊘ (S • 1 E×E ).
In this work we use rescaled data to which we apply the pointwise logarithm log Ny×B and the corresponding least squares (LSQ) functional.
Definiton 2.3 (Forward model and LSQ functional).We define the logarithmic forward model and the LSQ functional by (2.8) is the derivation of an efficient reconstruction algorithm rather than statistical optimality, we work with D. However, we expect that our strategy can also be applied to L instead of D.
Remark 2.5 (Regularization).Due to the ill-conditioning of H(X) = Y H , the reconstruction problem has to be regularized [3,31].In the context of LSQ minimization, a natural approach is variational regularization, where one considers ∥H(X) − Y H ∥ 2 2 /2 + αR(X) instead of D with a regularization functional R(X).Recently, in [9], the plug-and-play method has been identified as a regularization technique where the regularization is incorporated by a denoiser.Another class is given by iterative regularization [16], where regularization comes from early stopping.All these regularization methods require the gradient of D, which we compute below.

Algorithm development
In this section, we derive the proposed algorithms for MSCT based on channel preconditioning (CP).The first one (CP-full) integrates channel preconditioning into a gradient scheme.In the second algorithm (CP-fast), the derivative of the forward map is replaced by the derivative at zero.Both methods greatly reduce the number of iterations compared to standard gradient methods and the numerical cost per iteration compared to Newton type methods.Before presenting our algorithms, we start by computing derivatives and gradients and recall existing gradient and Newton type methods.

Derivatives computation
As usual we define the vector value functions ϕ N , ψ N : R Nx → R Nx by pointwise application We have the following explicit expressions for derivatives, adjoints and gradients.
Theorem 3.3 (Derivatives computation).Let F, H : R Nx×M → R B×L and D be defined by (2.4), (2.7), (2.8).The the derivative of F, H, the adjoint and the gradient of D are given by where Q X ≜ exp E×Ny (−M(AX) ⊺ ) denote virtual spectrally resolved data.
Proof.By (2.4) and (3.2) we have This is (3.3).Now, using the (2.7), (3.1), (3.2) we have This is (3.4).Next we turn over to the computation of the adjoints.We this done only for (3.6) and (3.5) is verified in a similar manner.By elementary manipulations For CP-fast the derivative of H at zero plays a central role.
Remark 3.4 (Derivative at zero).Let us consider the derivative at the zero image X = 0.
In this case we have The derivative H ′ [0](ξ) = −AξU ⊺ may also be seen as the linearization of H around zero.
It has been used previously in MSCT and can be simply derived by first order Taylor series approximation as we show next.In fact, with The final expression in (3.11) in matrix notation is (3.8), (3.10).For dual energy CT (the case where M = B = 2), the use of the inverse of U has been proposed in [10].
We emphasize that while we utilize the linearization H ′ [0](ξ) = −AξU ⊺ as an auxiliary tool, we actually solve the full nonlinear problem.However, the linearized LSQ problem min ξ is also of interest in its own.Theoretically proven convergent algorithms for such problems can be found in [8,27].
The derivatives in Theorem (3.3) have a clear composite structure that we will discuss next and exploit for our algorithms.
operates on the multichannel sinogram AX along the (horizontal) channel dimension and S = S ⊘ (S • 1 E×Ny ) are the normalized effective spectra.Application of the chain rule and some computations results in from which we recover (3.4), (3.6).Further, for the zero material sinogram Our algorithms will target this structure for fast and effective iterative updates.

Gradient and Newton-type one-step algorithms
It is helpful to start with gradient based method for minimizing the LSQ functional (2.8).
Our first method CP-full can be seen as a modified version of a hybrid between the standard gradient iteration (or Landweber's method) and the Gauss-Newton method, so we describe these methods.Our second method CP-fast involves a simplification based on linearization around zero.
Gradient based one-step algorithms for solving the MSCT problem HX = Y H using the LSQ functional D(X) start with the optimality condition ∇D 0 and fixed point equations derived from it.Applying a non-stationary positive-definite preconditioner Q k and a step size ω k > 0 results in Explicit expressions for H and H ′ [X k ] * are given by (2.7) and (3.6).Particular choices for the preconditioner and the step size yield various iterative solution methods including Landwebers iteration, the steepest descent method, Gauss-Newton iteration, Newton-CG iterations, or Quasi-Newton methods.To motivate our algorithm it is most educational to discuss the Landweber and the Gauss-Newton iteration.
Landwebers method: In the context of inverse problems, the standard gradient method with a constant step size ω is known as the (nonlinear) Landweber iteration X k+1 = ) for the case where Q k is the identity and ω k = ω.Landweber's iteration is stable, robust, and easy to implement.It is even applicable in ill-posed cases where, with an appropriate stopping criterion, it serves as a regularization method [12].On the other hand, it is also known to be slow in the sense that many iterative steps are required.In our case, this is due to the ill-conditioning of the forward operator.
Gauss-Newton method: Several potential accelerations of Landweber's method exist, and preconditioning seems one of the most natural ones.In the context of nonlinear least squares, the Gauss-Newton method and its variants are well-established and effective.In this case one chooses the preconditioner While significantly reducing the required number of iterations, the Gauss-Newton iteration (3.17), however, is numerically costly as it requires inversion of the non- ] in each iterative update.Moreover, due to ill-conditioning, the inversion needs to be regularized [11,16,28].The algorithms proposed in this paper use simplifications that do not need to be regularized and avoid the costly inversion of the normal operator.

Proposed algorithms
Now we move on to the proposed iterative algorithms for MSCT.We start by with CP-full which is gradient-based algorithm with channel preconditioning.We then derive CP-fast

Numerical simulations
We compared our algorithms CP-full and CP-fast to existing iterative one-step algorithms in MSCT.Our evaluation builds on [24], which compares five such algorithms and provides open source code (https://github.com/SimonRit/OneStepSpectralCT)that is used for our results.We compare CP-full and CP-fast with the best performing one of [24], and further with a two-step method.

Comparison methods
The work [24] compares the following iterative one-step algorithms for MSCT in terms of memory usage and convergence speed to reach a fixed image quality threshold: • [5] derives a non-linear CG method for a weighted LSQ term.
Specifically, [24] found the algorithm of [23] (referred to as Mechlem2018) to be significantly faster than the other four methods and thus we use it for comparison.
In addition, we compare with the algorithm [25] (referred to as Niu2014) as a prime example of an image domain two-step method.They use a penalised weighted least squares estimation technique applied to an empirically linear model.Note that more recently, data-driven methods based on neural networks and deep learning have also been proposed.
Such methods are beyond the scope of this manuscript and we refer the interested reader to the review articles [1,4].

Numerical implementation
For the presented results we build on the Matlab code of [24], which we extend with

Results
Reconstruction results using the proposed algorithms CP-fast (top row) and CP-full (second row) and the two comparison methods Mechlem2018 (row three) and Niu2014 (bottom row) can be seen in Figure 4.2.The phantom shown in the top row is made out of iodine (left), gadolinium (middle), and water (right).Note that in all cases, we use noisy data and iterations with minimal ℓ 2 -reconstruction error X k m − X ⋆ m 2 / ∥X ⋆ m ∥ 2 where X ⋆ m is the ground truth.Note that the iterations are all performed on the same standard laptop, where one iteration of CP-full takes around 6 seconds, one iteration of CP-fast takes around one second one iteration of Mechlem2018 about 4 seconds.Figure 4.3 shows the evolution of the relative ℓ 2 -reconstruction error for various one-step methods.Note that for CP-fast, the minimum error in Iodine and Gadolinium is reached at approximately the same number of iterations, which shows efficient preconditioning and is important in application.Furthermore, note that we do not enforce that the sum over the three density images is one, but in the example, it indeed does not hold.The proposed algorithms turned out to be more stable than Mechlem2018 and produce better results.In particular, CP-full gives the best results while CP-fast is fastest.

Conclusion and outlook
Image reconstruction in MSCT requires the solution of a nonlinear ill-posed problem.Iterative one-step methods are known to be accurate for this purpose.In this work, we propose two generic algorithms named CP-full (channel-preconditioned full gradient iteration) and CP-fast (channel-preconditioned fast iteration).Both algorithms use preconditioning in the channel dimension only, which considerably accelerates the updates compared to full There are several future directions emerging from our work.First, proving the convergence of the two algorithms and demonstrating their regularization properties is important.Second, we will combine them with more realistic noise priors such as Poisson noise, resulting in the maximum likelihood estimation (MLE) functional.Additionally, we will integrate explicit image priors, use plug-and-play strategies, and incorporate learned components.
e) f m (x) de for b = 1, 2, . . ., B .(1.1)Here R(f m ) is the Radon transform applied to the m-th material density map f m , exp applies the exponential function pointwise and s b (e) represents the energy profile (effective spectrum) of the b-th measurement.Classical CT would correspond to the unrealistic case where s b is a Dirac delta function and where applying the pointwise logarithm to (1.1) results in a linear inverse problem.In MSCT one accounts for the fact that s(b) has finite energy covering and thus one has to work with the full nonlinear problem (1.1) for recovering the unknown density maps f m .1.2 Two-step and one-step algorithms Various algorithms have been developed for solving the nonlinear inverse problem (1.1).

2
and consists of a combination of M different materials with densities f m : X → R with m = 1, . . ., M .Each material has a separate mass attenuation coefficient µ m : [0, ∞) → [0, ∞) which is a known function of the X-ray energy e.The total energy dependent (linear) X-ray attenuation coefficient is then given by µ(x, e) = M m=1 µ m (e) f m (x) for (x, e) ∈ X × R .(2.1)

Figure 2 . 1 :
Figure 2.1: Illustration of the forward model in MSCT for m = 3 materials and b = 5energy bins, using E = 150 energy discretizations: First, the Radon transform is applied separately to each of the given material densities X 1 , X 2 , and X 3 , resulting in three material sinograms, which can be seen as a three-channel sinogram.Next, the matrix M is applied to each pixel, resulting in 150 energy sinograms.To each of these sinograms, x → exp(−x) is applied, resulting in 150 virtual energy data maps.By applying the matrix S pixel by pixel, one obtains the final data consisting of data maps.The continuous forward model can be visualized in a similar way by replacing the material images with continuous counterparts and the 150 energy channels with a function-valued channel.
are the observed spectral data.Note we repeatedly nclude the transpose operation (•) ⊺ in (2.4) such that all involved linear operations can be written as matrix multiplications from left.Alternatively we can also write F(X) = exp Ny×E (−AXM ⊺ )S ⊺ reflecting that the discrete Radon transform A operates on the columns and the matrices M and S on the rows of X ∈ R Ny×B .At some places we will denote operation of M on X from the right by M ⋄ X ≜ X • M ⊺ such that we have F(X) = S ⋄ exp Ny×E (−M ⋄ A • X) .The structure of the discrete forward model is illustrated in Figure2.1.

Remark 2 . 4 (
4), Y ∈ R Ny×B are the given data, Y H = log Ny×B (Y ⊘ F(0)) the modified data, and log Ny×B (•) the point-wise logarithm.Using the notations of Definition 2.3, material decomposition in MSCT amounts to the near solution of H(X) = Y H or the near minimization of D. Noise modelling).In the statistical context, LSQ minimization derives from maximum likelihood estimation using a Gaussian noise model on Y H . From a statistical perspective, maximum likelihood estimation for Poisson noise on Y might be more reasonable, resulting in L(X) = ⟨F(X), 1⟩ − ⟨log(F(X)), Y ⟩ → min X .As the focus in this paper Standard algorithms for minimizing(2.8)require derivative of the forward map and the gradient of the LSQ functional that we compute next.Recall the original and logarithmic MSCT forward operator F, H : R Nx×M → R Ny×B and the LSQ functional D defined by (2.4), (2.7) and (2.8).We equip R Nx×M and R Ny×B with the Hilbert space structure induced by the standard inner product⟨ξ 1 , ξ 2 ⟩ = i,j ξ 1 [i, j]ξ 2 [i, j].Further we use H ′ [X] : R Nx×M → R Ny×B to denote the derivative of H at location X ∈ R Nx×M and D ′ [X] : R Nx×M → R and ∇D[X] ∈ R Nx×M to denote the derivative and the gradient of D at X, respectively.Remark 3.1 (Gradients, inner products and preconditioning).By the definition of gradients, we have ⟨∇D[X], ξ⟩ = (D ′ [X]) * (ξ), where (•) * denotes the adjoint of a linear operator.Further, by the chain rule, ∇D[X] = (H ′ [X]) • (H(X) − Y H ). Gradients and adjoints depend on the chosen inner product.For example, the inner product ⟨ξ 1 , U −1 ξ 2 ⟩ on the image space with a positive-definite matrix U yields the modified gradient U•∇D[X].Choosing U such that U • ∇D[X] has improved condition significantly improves gradient based methods for minimizing D.

Remark 3 . 2 (
Some calculus rules).For the following computation we use some elementary calculus rules listed next.Let G 1 , G 2 : R Nx → R Ny be vector valued functions and ϕ : R → R a scalar function with derivative ψ : R → R. Then for X, ξ ∈ R Nx we have 6) we obtain (3.7).

Remark 3 . 5 (
Composite structure of derivatives).Consider X ∈ (R M ) Nx as signal of size N x with M channels (each channel is a material) and the data Y H ∈ (R B ) Ny as signal of size N y with B channels (each channel is an energy bin).Then we can write H(X) = Φ(AX) where the nonlinear function Ny×M ) as in Remark 3.4.Equations (3.13), (3.14), (3.15) factorise the derivative Φ ′ [Z] and its adjoint into two separate parts: A high dimensional ill-posed but linear part A operating in the pixel dimension and small size well-posed but nonlinear part operating in the channel dimension.
photon counts per pixel

Figure 4 . 1 :
Figure 4.1: Physical parameters determining the forward model.Top Left: Attenuation functions.Top right: Incident spectrum.Bottom left: Spectral response of the detectors.Bottom right: effective spectra.
our algorithms.In particular, we work with M = 3 base materials (water, iodine, and gadolinium) and B = 5 energy bins.The energy variable is discretized using E = 150 uniform nodes between 0 and 150 keV.The attenuation functions and energy spectra used are shown in Figure 4.1.We use N x = 256 × 256 image pixels and N y = 262450 line integrals for the Radon transform.In particular the code https://github.com/SimonRit/OneStepSpectralCT creates matrices • M ∈ R 150×3 for the base materials; • S ∈ R 5×150 for the effective energy spectra; • A ∈ R L×N 2 for the Radon transform.After row normalizing S = S ⊘ (S • 1 E×Ny ) we have H(X) = log(exp(−A • X • M ⊺ ) • S ⊺ ) for the MSCT forward model.Further, noisy data Y are created with a different realistic forward model and Poisson noise added.Besides M, S, A, H and Y we require implementations of U ‡ = (U * U) −1 U * for CP-fast and U

Figure 4 . 3 :
Figure 4.3: Relative reconstruction error using proposed CP-fast (top), proposed CP-full (middle) and Mechlem2018 (bootom) as a function of the iteration index.
for CP-full.Computing U ‡ is trivial and can be done in advance; U ‡ k are computed in each step of CP-fast using (3.14), (3.15).