Abstract
Image reconstruction is a key component in many medical imaging modalities. The problem of image reconstruction can be viewed as a special inverse problem where the unknown image pixel intensities are estimated from the observed measurements. Since the measurements are usually noise-contaminated, statistical reconstruction methods are preferred. In this paper we review some non-negatively constrained simultaneous iterative algorithms for maximum penalized likelihood reconstruction, where all measurements are used to estimate all pixel intensities in each iteration.
1. Introduction
Image reconstruction in medical imaging, in general, considers estimating pixel intensities or attenuations from measurements obtained from an imaging system. For example, for positron emission tomography (PET), the measurements are obtained according to the procedure summarized below; see [1,2] for more details. A radioactive isotope is introduced into the body of a patient and, as the radioisotope decays, positrons are emitted. Each positron travels a small distance in the body (usually less than 1 mm) and then interacts with an electron to produce a pair of gamma photons that travel in almost opposite directions. The scanning device in the imaging system detects each pair of gamma photons with a certain probability, and all such detections form the measurements, which can be stored in histogram or list form [3]. It is usually assumed that the detection probabilities are known; they can be pre-computed and stored, or computed on-the-fly.
Note that a special feature of the measurements is that they are contaminated by noise, which can be a severe problem particularly when each measurement is small in value due to dose safety limits. If the noise is not properly addressed, the reconstructed image can be distorted by excessive noise. For example, in low dose X-ray CT (a type of transmission tomography), the metal streak artifact (e.g., [4]) can be a severe problem for the traditional filtered backprojection method. Statistical iterative reconstruction methods, due to their ability to model the physics and measurements more accurately, are capable of reducing metal streak artifacts [5].
To deal with the noise contamination problem, statistical image reconstruction methods in emission, transmission, X-ray CT, etc. have been developed based on specified probability models for the measurements. For example, for single photon emission computed tomography (SPECT), possible options include: weighted least squares (equivalent to variable variance Gaussian) [6], fixed variance Gaussian [7] and Poisson [8] models. These models can also be used for transmission scans. Since accidental coincidences are the main source of background noise in PET, most PET scans are precorrected for accidental coincidences by real-time subtraction of the coincidences in the delayed window [9]. For randoms-precorrected PET scans, possible measurement models are Gaussian, ordinary Poisson and shifted Poisson [9], and all of these are just approximations as the true probability density function (pdf) for the measurements is difficult to derive. The shifted Poisson model is also used for X-ray CT measurements [10].
Different algorithms have been proposed to maximize the corresponding objective functions. For example, for emission tomography, the expectation-maximization (EM) algorithm [8] is designed to maximize the log-likelihood formulated from Poisson distributed measurements, and the image space reconstruction algorithm (ISRA) [7] maximizes the log-likelihood formulated from Gaussian (fixed variance) distributed measurements. An attractive aspect of both EM and ISRA is that they are very easy to implement and both respect the non-negativity constraint on the reconstructions. However, if the objective function contains a penalty term, which is normally used to smooth the reconstruction, then both EM and ISRA become impractical as they involve, in each iteration, a non-linear system of equations that is tedious to solve exactly due to the large number of unknowns. Moreover, the penalty function adds further inconvenience when a non-negative solution is sought.
To simplify notation, both the measurements and the unknown image are lexicographically ordered into vectors. More specifically, we use $y = (y_1, \ldots, y_n)^T$ to denote the measurement vector and $x = (x_1, \ldots, x_p)^T$ to denote the unknown image vector, where superscript T denotes matrix transpose. Note that although the notations are unified for different reconstruction problems in this paper, the meaning of these notations, such as x and y, can differ between imaging modalities. Vectors y and x are related through a system matrix A; see Equation (4) below for some examples. For tomographic reconstruction problems, matrix A is usually assumed known, so its estimation is not covered by this paper. Rather, we focus on how to estimate x from the observed y and the known system matrix A. We denote the estimate of x by $\hat{x}$.
Statistical reconstruction obtained by maximum penalized likelihood (MPL) (also known as maximum a posteriori (MAP)) is defined by
$$\hat{x} = \arg\max_{x \geq 0} \Psi(x), \quad (1)$$
where $\Psi(x)$ is an objective function derived from the probability distribution for the measurements and the penalty function. When the $y_i$'s are assumed independent (given x), the penalized likelihood objective function is
$$\Psi(x) = l(x) - h J(x), \quad (2)$$
where $l(x)$ is the log-likelihood function given by
$$l(x) = \sum_{i=1}^n f(y_i; \mu_i(x)). \quad (3)$$
Here $h \geq 0$ is the smoothing parameter and $J(x)$ is the penalty function used to smooth $\hat{x}$. In Equation (3), $f(y_i; \mu_i(x))$ denotes the log-density function for measurement $y_i$, and $\mu_i(x)$ is a function of $x \in \mathbb{R}_+^p$ (here $\mathbb{R}_+^p$ denotes the non-negative orthant of $\mathbb{R}^p$) representing the mean measurement of camera bin i. Examples of $\mu_i(x)$ include
$$\mu_i(x) = \langle a_i, x \rangle + r_i \ \ \text{(emission)} \quad \text{and} \quad \mu_i(x) = b_i e^{-\langle a_i, x \rangle} + r_i \ \ \text{(transmission)}, \quad (4)$$
where $\langle a_i, x \rangle = \sum_{j=1}^p a_{ij} x_j$ with $a_i^T$ being the ith row of matrix A, $b_i$ is the known blank scan count of the ith detector and $r_i$ the known mean background count. Another example is polyenergetic transmission scans (such as X-ray CT), where
$$\mu_i = \sum_{m=1}^M b_{im} e^{-\langle a_i, x^{(m)} \rangle} + r_i, \quad (5)$$
and here $x^{(m)}$ denotes the attenuation map corresponding to the m-th energy spectrum, x is a vector formed by the $x^{(m)}$'s and $b_{im}$ is the blank scan count from energy spectrum m.
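As a small illustration (not part of the original paper), the two mean models in Equation (4) can be computed as follows, assuming a dense NumPy array A and hypothetical vectors b and r for the blank scans and mean background counts:

```python
import numpy as np

def mean_emission(A, x, r):
    """Emission model of Equation (4): mu(x) = A x + r."""
    return A @ x + r

def mean_transmission(A, x, b, r):
    """Monoenergetic transmission model of Equation (4):
    mu(x) = b * exp(-A x) + r (elementwise over detector bins)."""
    return b * np.exp(-(A @ x)) + r
```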
In Equation (3) the notation $f(y_i; \mu_i(x))$ is used to emphasize that f is a function of $\mu_i(x)$ and that it also involves measurement $y_i$. We can also write this function as $f(y_i; \mu_i)$ or $f(y_i; x)$ in different contexts when there is no ambiguity. However, the functional properties of f may change with respect to its different arguments. For example, if assuming $y_i$ follows a Poisson distribution for either emission or transmission scans, then
$$f(y_i; \mu_i) = y_i \log \mu_i - \mu_i. \quad (6)$$
This is clearly a concave function of $\mu_i$ for both emission and transmission cases. However, $f(y_i; x)$ (treated as a function of x) may no longer be concave for transmission scans, although it is still concave for emission scans. Concavity is an important property exploited by the optimization transfer algorithms.
Let $\mu(x)$ be the n-vector of all $\mu_i(x)$. The first term of Equation (2), i.e., $l(x)$, measures the similarity between y and $\mu(x)$. Different probability distributions have been used to model $y_i$ even under the same imaging modality. For example, for emission tomography, if assuming the Poisson model for $y_i$ (i.e., $y_i \sim \text{Poisson}(\mu_i(x))$) then f is given by Equation (6), or if considering weighted least squares then
$$f(y_i; \mu_i) = -w_i (y_i - \mu_i)^2, \quad (7)$$
where $w_i$ is the weight. When $w_i = 1/y_i$ we have the weighted least squares model as suggested in [11]. Another example in emission (or transmission) tomography is the randoms-precorrected PET scan (assuming no scattering, to simplify). In this context, the observed measurements are $y_i = y_i^{\mathrm{p}} - y_i^{\mathrm{d}}$, where $y_i^{\mathrm{p}}$ and $y_i^{\mathrm{d}}$ (both unavailable directly) denote the numbers of coincidences in the prompt and delayed windows, respectively. Although we can assume $y_i^{\mathrm{p}} \sim \text{Poisson}(\mu_i(x) + r_i)$ and $y_i^{\mathrm{d}} \sim \text{Poisson}(r_i)$ and that they are independent, the exact distribution of $y_i$ cannot be derived directly (e.g., [9]). An approximate probability model suggested in [9] is the shifted Poisson distribution, namely $y_i + 2r_i \sim \text{Poisson}(\mu_i(x) + 2r_i)$, which gives
$$f(y_i; \mu_i) = (y_i + 2r_i) \log(\mu_i + 2r_i) - (\mu_i + 2r_i), \quad (8)$$
or the weighted least squares given by
$$f(y_i; \mu_i) = -\frac{(y_i - \mu_i)^2}{2\sigma_i^2}, \quad (9)$$
where $\sigma_i^2$ is an estimate of the variance of $y_i$. Note that the shifted Poisson approximation matches the first two moments of the true probability model for $y_i$ when both the prompt and delayed measurements are assumed independent and Poisson distributed.
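As a hedged illustration (not from the original paper), the shifted Poisson log-likelihood of Equation (8) can be evaluated as below; the truncation of negative shifted counts is an implementation choice here, not something prescribed by the text:

```python
import numpy as np

def shifted_poisson_loglik(y, mu, r):
    """Shifted Poisson log-likelihood (additive constants dropped):
    y_i + 2 r_i is modelled as Poisson with mean mu_i + 2 r_i."""
    s = np.maximum(y + 2.0 * r, 0.0)  # precorrected counts can be negative
    m = mu + 2.0 * r                  # shifted means, assumed positive
    return float(np.sum(s * np.log(m) - m))
```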
In this paper, we present and discuss several important non-negatively constrained penalized likelihood reconstruction algorithms. When designing a reconstruction algorithm in tomographic imaging, one considers the following important issues: (i) the algorithm is computationally efficient, and ideally it involves only forward-projection (e.g., $Ax$) and back-projection (e.g., $A^T y$) operations; (ii) the algorithm can be easily applied to different measurement probability models and imaging modalities; (iii) the algorithm can impose the non-negativity constraint; (iv) the algorithm converges fast. Our discussions of the algorithms in this paper will mainly focus on these points.
In tomographic imaging, it is important to produce smoothed reconstructions, as severe noise in a reconstruction can cause false diagnoses. Smoothing can generally be achieved by one of the following five practices: (i) early termination of the iterations (e.g., [12]); (ii) MPL reconstruction with an appropriate smoothing parameter (e.g., [13]); (iii) functional representation of the unknown image by a set of smooth basis functions (e.g., [14]); (iv) post-smoothing of the reconstruction within each iteration (e.g., [15]) or after all iterations ([16]); and (v) pre-smoothing of the camera data (i.e., the sinogram) followed by filtered backprojection (FBP) (e.g., [17,18]). We focus on the penalized likelihood approach to smoothing in this paper. In Equation (2), the smoothing parameter h balances two conflicting targets: fidelity of the fitted means $\mu_i(x)$ to the measurements $y_i$, and smoothness of x. Although an appropriate choice of h is important for achieving a reconstruction with balanced fidelity and smoothness, we will not consider how to estimate h in this paper. A penalty function is used to smooth or regularize the estimate $\hat{x}$. Usually, $J(x)$ takes the form of
$$J(x) = \sum_j \rho(\delta_j(x)), \quad (10)$$
where $\delta_j(x)$ represents a neighborhood operation (such as a first or second order difference) on pixel j, and the function ρ measures the magnitude of $\delta_j(x)$. A common choice of ρ is the quadratic function $\rho(t) = t^2/2$. Generally, a quadratic penalty tends to produce images with over-smoothed edges. Possible edge-preserving penalties include total variation (TV) (e.g., [19]), Huber [20] and hyperbolic functions (e.g., [21]). Note that ρ is convex for all these options.
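To make Equation (10) concrete, here is a hedged sketch (not from the original paper) of the quadratic and Huber choices of ρ applied to first-order horizontal and vertical differences of a 2-D image; the function names and the threshold parameter delta are illustrative only:

```python
import numpy as np

def quadratic_rho(t):
    return 0.5 * t ** 2

def huber_rho(t, delta=1.0):
    """Huber function: quadratic near zero, linear in the tails,
    so large differences (edges) are penalized less severely."""
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t ** 2, delta * (a - 0.5 * delta))

def penalty(img, rho=quadratic_rho):
    """J(x) of Equation (10) with first-order differences playing
    the role of the neighborhood operation on a 2-D image array."""
    dh = np.diff(img, axis=1)   # horizontal neighbour differences
    dv = np.diff(img, axis=0)   # vertical neighbour differences
    return float(rho(dh).sum() + rho(dv).sum())
```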
The optimal choices of the penalty function J and the smoothing parameter h are unsolved problems in image processing and will not be elaborated further in this paper. We emphasize that smoothing by MPL indeed produces visually improved reconstructions over the traditional filtered backprojection method, particularly in dose-limited tomography such as low dose X-ray CT. Edge-preserving penalties, such as the TV and Huber penalties, are extremely useful; see [22,23,24]. However, MPL reconstructions can have unnatural noise textures very different from those of the familiar filtered backprojection method. Their impact on diagnostic tasks is still unknown and this is an active research area; see [25] for examples and discussions.
We adopt the following notation throughout this paper. Let $x^{(k)}$ be the estimate of x obtained at iteration k of an algorithm. A dot over a function indicates its derivative with respect to the variable in the brackets. For example, $\dot{b}(\mu_i)$ represents the derivative of b with respect to $\mu_i$ and $\dot{b}(x)$ the derivative of b with respect to x. We use $\dot{b}(x_j)$ to denote the derivative of b with respect to $x_j$, the j-th element of vector x. We also let $\dot{b}(\mu_i^{(k)})$ and $\dot{b}(x_j^{(k)})$ represent, respectively, $\dot{b}(\mu_i)$ and $\dot{b}(x_j)$ evaluated at iteration k.
Non-negatively constrained MPL image reconstruction algorithms can be classified into simultaneous and block-iterative (a.k.a. ordered subset (OS)) algorithms. For simultaneous algorithms, all elements in y are used to update x in each iteration, and for block-iterative algorithms, distinct portions of y are used in turn to update x. We discuss in this paper some simultaneous algorithms for non-negatively constrained MPL reconstructions, and the block-iterative algorithms are not included in our discussions. The rest of this paper is arranged as follows. The expectation-maximization algorithm for emission tomography is discussed in Section 2. Section 3 explains the alternating minimization algorithm designed specifically for transmission tomography. Section 4 contains explanations on the optimization transfer algorithms and their applications to tomographic reconstructions. The multiplicative iterative (MI) algorithms for tomographic imaging are provided in Section 5 and the Fisher scoring based Jacobi or Gauss–Seidel over-relaxation algorithms are presented in Section 6. Section 7 explains another Gauss–Seidel method named the iterative coordinate ascent algorithm. Finally, Section 8 includes discussions and remarks about this paper.
In this paper we focus on explaining and summarizing different non-negatively constrained tomographic imaging algorithms. Numerical comparisons of some of these algorithms are available in [26], and therefore will not be given in this paper.
2. EM Algorithm for Maximum Likelihood Reconstruction in Emission Tomography
The expectation-maximization (EM) algorithm [27] is a statistical algorithm for iteratively computing maximum likelihood estimates when data contain random missing values. Here “random” means these missing values do not provide extra information about the parameters we wish to estimate. We first give a brief summary of the EM algorithm below.
Since the data comprise both missing and observed (or incomplete) components, we can define the complete data set as the combination of the incomplete and the missing data. Note, however, that our aim is to estimate the unknown parameters by maximizing the log-likelihood of the incomplete data. The rationale for the EM algorithm is that if maximizing the incomplete data likelihood is difficult while maximizing the complete data likelihood is easy, then EM can be used to compute iteratively the maximum of the incomplete data likelihood by maximizing the complete data likelihood in each iteration.
Let z be the complete data set given by $z = (y, v)$, where y denotes the incomplete data and v the missing data. Let $l_c(x; z)$ be the log-likelihood based on the complete data z and $l(x; y)$ the log-likelihood of the incomplete data y, where x is a p-vector of unknown parameters. Let $\hat{x}$ be the maximum likelihood (ML) estimate of x. Then iteration $k+1$ of the EM algorithm comprises two steps:
- E-Step: Compute the conditional expectation of the complete data log-likelihood given the incomplete data y and the current estimate $x^{(k)}$, and denote this function by
$$Q(x; x^{(k)}) = E\big[\, l_c(x; z) \mid y, x^{(k)} \,\big]. \quad (11)$$
- M-Step: Update the x estimate by maximizing the Q function, namely
$$x^{(k+1)} = \arg\max_x Q(x; x^{(k)}). \quad (12)$$
The EM algorithm was first applied to emission tomography by Shepp and Vardi [8] and Lange and Carson [29]. Both papers adopt the Poisson model for emission counts, namely the $y_i$ are independent Poisson random variables with mean $\mu_i(x) = \langle a_i, x \rangle$. This model assumes $r_i = 0$; otherwise, we can regard $y_i$ as the value after subtracting $r_i$ from the bin i measurement. From this Poisson model, we can formulate the complete data as $\{z_{ij}\}$, where $z_{ij}$ follows the Poisson distribution with mean $a_{ij} x_j$. Clearly, each $z_{ij}$ represents the unknown portion of the measurement on camera bin i attributed to image pixel j. The corresponding complete data log-likelihood is
$$l_c(x; z) = \sum_i \sum_j \big( z_{ij} \log(a_{ij} x_j) - a_{ij} x_j \big), \quad (13)$$
and the corresponding Q function is
$$Q(x; x^{(k)}) = \sum_i \sum_j \big( \hat{z}_{ij} \log(a_{ij} x_j) - a_{ij} x_j \big), \quad (14)$$
where $\hat{z}_{ij} = E[z_{ij} \mid y, x^{(k)}]$. Since the conditional distribution of $z_{ij}$ given $y_i$ is binomial with size $y_i$ and probability $a_{ij} x_j^{(k)} / \mu_i(x^{(k)})$, we have $\hat{z}_{ij} = a_{ij} x_j^{(k)} y_i / \mu_i(x^{(k)})$. Thus, after solving $\partial Q(x; x^{(k)})/\partial x_j = 0$, the M-step of the EM algorithm gives the following updating formula for x:
$$x_j^{(k+1)} = \frac{x_j^{(k)}}{\sum_i a_{ij}} \sum_i \frac{a_{ij}\, y_i}{\mu_i(x^{(k)})} \quad (15)$$
for $j = 1, \ldots, p$. It has been pointed out in [23,30] that formula (15) can also be explained by the Bayes conditional probability formula. This EM algorithm possesses the following properties, which make it attractive for emission tomography (a small illustrative sketch follows the list below):
- If the initial $x^{(0)} > 0$ then $x^{(k)} \geq 0$ for all $k \geq 1$; i.e., the algorithm automatically satisfies the non-negativity constraint on x.
- The algorithm is easy to implement as it only involves forward- and back-projections.
- The updating formula in Equation (15) increases the incomplete data log-likelihood: $l(x^{(k+1)}) \geq l(x^{(k)})$, where equality holds only when the iteration has converged.
- $x^{(k+1)}$ satisfies $\sum_i \mu_i^{(k+1)} = \sum_i y_i$, where $\mu_i^{(k+1)}$ is $\mu_i(x)$ with $x = x^{(k+1)}$. Thus the x estimate at any iteration satisfies that the total expected and the total observed counts are equal.
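The following is a minimal NumPy sketch of the update in Equation (15) (not from the original paper); it assumes a dense system matrix A with non-negative entries and strictly positive column sums, and guards the ratios against zero denominators:

```python
import numpy as np

def em_update(x, A, y, eps=1e-12):
    """One EM (MLEM) iteration of Equation (15) for the
    background-free Poisson emission model y ~ Poisson(A x)."""
    yhat = A @ x                        # forward projection: mu(x)
    ratio = y / np.maximum(yhat, eps)   # avoid division by zero
    sens = A.sum(axis=0)                # sensitivity image: sum_i a_ij
    return x * (A.T @ ratio) / np.maximum(sens, eps)
```

Starting from any strictly positive initial image (e.g., a constant image) and iterating `x = em_update(x, A, y)` preserves non-negativity automatically, in line with the first property above.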
The above EM is easy to implement and possesses some attractive properties for the reconstructions. This algorithm, however, is restricted to emission tomography with Poisson distributed measurements, and it cannot be easily extended to other reconstruction tasks. For example, applying the EM algorithm to transmission tomography does not lead to an exact updating formula because its M-step does not produce a closed-form solution; see [29]. Another limitation is that this EM algorithm can only be used for maximum likelihood reconstructions; its application to MPL reconstruction will not, in general, result in a closed-form updating formula. To rectify this problem, Green [31] developed a one-step-late (OSL) algorithm for MPL reconstruction by replacing x in the derivative of the penalty function with its current estimate $x^{(k)}$, so that an “exact” solution can still be obtained. However, this method suffers from the deficiencies that (i) the algorithm may be non-convergent; and (ii) some estimates may be negative.
De Pierro [32] reproduced the EM updating formula using a totally different argument. In his derivation, there is no missing data and hence no E-step. Although the algorithm is named “modified EM”, it is not a real EM. In fact, this algorithm belongs to a more general class called the optimization transfer algorithms, since the Poisson log-likelihood optimization problem is transferred to a simpler optimization in each iteration. We will summarize the optimization transfer algorithms in Section 4.
3. Alternating Minimization Algorithms for Transmission Tomography
We explained in Section 2 that the EM algorithm is not directly suitable for transmission scans as its M-step cannot be computed exactly. In this section, we summarize an alternating minimization algorithm designed to solve the transmission tomographic problem, including X-ray CT. This algorithm is a generalization of the EM algorithm [33] and its application to transmission tomography can be found in [34].
Following [34], we explain this algorithm using the polyenergetic transmission tomography example. In this context, if assuming the transmission scans follow Poisson distributions, the corresponding log-likelihood is
$$l(z) = \sum_i \big( y_i \log \mu_i(z) - \mu_i(z) \big), \quad (16)$$
where $y_i$ is the scan count of detector i and $\mu_i(z)$ (now expressed as a function of vector z, which will be defined below) is given by Equation (5). Moreover, elements of the attenuation map associated with spectrum m, namely elements of $x^{(m)}$ in Equation (5), are further modeled by
$$x_j^{(m)} = \sum_r c_r^{(m)} z_{jr}, \quad (17)$$
where j indexes pixels, r represents different types of materials, the $c_r^{(m)}$ are known linear attenuation coefficients and the $z_{jr}$ are the unknown partial densities (e.g., [34]) we wish to estimate. In Equation (16), z is the vector formed by column-wise stacking the vectors $z_r = (z_{1r}, \ldots, z_{pr})^T$.
Define the set
$$\mathcal{E} = \big\{ q : q_{im} = q_{im}(z) \ \text{for some } z \geq 0 \big\}, \quad (18)$$
where
$$q_{im}(z) = b_{im}\, e^{-\langle a_i, x^{(m)}(z) \rangle} \quad (19)$$
for $m = 1, \ldots, M$, and $q_{i,M+1}$ equals the background noise $r_i$ for $m = M + 1$. Clearly, $\mu_i$ given in Equation (5) can now be expressed as $\mu_i(z) = \sum_m q_{im}(z)$. Define another set
$$\mathcal{L} = \Big\{ p : \sum_m p_{im} = y_i \ \text{for all } i \Big\}. \quad (20)$$
In [34], $\mathcal{E}$ is called the exponential family and $\mathcal{L}$ the linear family. Let p and q be the vectors created from the $p_{im}$ and $q_{im}$ respectively. It can be shown that the problem of maximizing the log-likelihood Equation (16) can be re-written as
$$\min_{p,\, q}\; I(p \,\|\, q) \quad (21)$$
subject to $p \in \mathcal{L}$ and $q \in \mathcal{E}$, where $I(p \,\|\, q)$ is the I-divergence [35] given by
$$I(p \,\|\, q) = \sum_i \sum_m \Big( p_{im} \log \frac{p_{im}}{q_{im}} - p_{im} + q_{im} \Big). \quad (22)$$
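A minimal sketch (not from the original paper) of the I-divergence in Equation (22), using the convention $0 \log 0 = 0$:

```python
import numpy as np

def i_divergence(p, q, tiny=1e-300):
    """Csiszar I-divergence I(p||q) = sum p log(p/q) - p + q,
    computed elementwise with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    logterm = np.where(p > 0.0,
                       p * np.log(np.maximum(p, tiny) / np.maximum(q, tiny)),
                       0.0)
    return float(np.sum(logterm - p + q))
```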
Thus, maximizing the log-likelihood in Equation (16) can be achieved iteratively. Assuming the estimates $z^{(k)}$, $p^{(k)}$ and $q^{(k)}$ are obtained at iteration k, iteration $k+1$ contains two steps:
- (i)
- compute $p^{(k+1)}$ by minimizing $I(p \,\|\, q^{(k)})$ subject to $p \in \mathcal{L}$;
- (ii)
- compute $q^{(k+1)}$ by minimizing $I(p^{(k+1)} \,\|\, q)$ subject to $q \in \mathcal{E}$.
Minimizing over $p \in \mathcal{L}$ is easily achieved using a Lagrange multiplier, and the result is
$$p_{im}^{(k+1)} = \frac{y_i\, q_{im}^{(k)}}{\sum_{m'} q_{im'}^{(k)}}. \quad (23)$$
On the other hand, direct optimization of $I(p^{(k+1)} \,\|\, q)$ over $q \in \mathcal{E}$ is an unmanageable task as the $z_{jr}$'s are mixed (i.e., not decoupled or separated from each other) within the objective function. One approach to overcoming this problem is to use a decoupled objective function representing an upper bound of the original objective function. In fact, it can be shown that, for $q_{im}(z)$ given by Equation (19), the objective admits a separable upper bound of the form of Equation (24); this inequality is obtained from the convexity of the exponential function. Clearly, the $z_{jr}$ on the right hand side of Equation (24) are decoupled, and thus their non-negatively constrained optimizations result in closed-form solutions, which give the update $z^{(k+1)}$ of Equation (25). We give some remarks about this algorithm below.
Remarks
- (1)
- This algorithm is designed for maximum likelihood estimation. However, it can easily be extended to MPL provided the penalty function is convex, so that it can also be decoupled.
- (2)
- This algorithm is developed for the likelihood function derived from the simple Poisson measurement noise. Note that the alternating minimization algorithm was also developed for a compound Poisson noise model in [36] and its comparison with the simple Poisson alternating minimization was provided in [37]. For other measurement distributions, however, the corresponding algorithms have to be completely re-developed.
- (3)
- The convergence properties of the alternating minimization algorithm have been studied in [34]. In particular, it is monotonically convergent under certain conditions.
- (4)
- It will become clear in Section 5 (Example 5.3) that the multiplicative-iterative algorithm can be derived more easily for this transmission reconstruction problem.
- (5)
- The trick of decoupling the objective function using its convex (or concave) property is also the key technique of the optimization transfer algorithms discussed in Section 4.
4. Optimization Transfer Algorithms
Details of the optimization transfer (OT) algorithm (also called the minorization–maximization (MM) algorithm for maximizations) can be found in, for example, [38]. In this section we present this algorithm briefly and explain its application in emission and transmission tomography.
The fundamental idea of the OT algorithm is that it employs a surrogate function to minorize (see the definition below) the objective function in each iteration, and then update the parameter estimate by maximizing this surrogate function.
More specifically, a function $S(x; x^{(k)})$ is said to minorize $\Psi(x)$ at $x^{(k)}$ if it satisfies the following “minorization” conditions:
- (i)
- $S(x^{(k)}; x^{(k)}) = \Psi(x^{(k)})$, and
- (ii)
- $S(x; x^{(k)}) \leq \Psi(x)$ for all x.
An attractive property of using this surrogate function is that $x^{(k+1)} = \arg\max_x S(x; x^{(k)})$ satisfies the monotonic condition, namely
$$\Psi(x^{(k+1)}) \geq \Psi(x^{(k)}), \quad (26)$$
where equality holds only when the iteration has converged. If the exact maximum is not easy to obtain, we can find an $x^{(k+1)}$ by simply increasing $S(x; x^{(k)})$ over $S(x^{(k)}; x^{(k)})$, as this also guarantees that the monotonic condition holds for $x^{(k+1)}$. The monotonic property can be easily verified from the minorization conditions since
$$\Psi(x^{(k+1)}) \geq S(x^{(k+1)}; x^{(k)}) \geq S(x^{(k)}; x^{(k)}) = \Psi(x^{(k)}). \quad (27)$$
For implementation of the OT algorithm to medical imaging, a surrogate function must be determined. There exist different ways of choosing the surrogate function, such as those listed in [38]. We mainly consider two approaches in this paper: (i) the method based on the inequality on concave functions (called the concave inequality hereafter); and (ii) the method based on quadratic lower bounds (also known as paraboloidal surrogates [39]). These ideas are summarized below.
Let $\Psi(x) = \sum_i \psi_i(\langle a_i, x \rangle)$ be the objective function we wish to maximize, where $a_i^T$ is the i-th row of matrix A and x is a p-vector. For matrix A, we assume its elements are non-negative and $\sum_j a_{ij} > 0$. We also assume that all $\psi_i$ are concave functions. Let $\lambda_{ij} \geq 0$ be weights satisfying $\sum_j \lambda_{ij} = 1$. Then according to the concave inequality we have
$$\psi_i(\langle a_i, x \rangle) = \psi_i\Big( \sum_j \lambda_{ij}\, \frac{a_{ij} x_j}{\lambda_{ij}} \Big) \geq \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij} x_j}{\lambda_{ij}} \Big). \quad (28)$$
There are different ways of choosing the weights $\lambda_{ij}$. For example, we can use $\lambda_{ij} = a_{ij} x_j^{(k)} / \langle a_i, x^{(k)} \rangle$, which is also adopted in [32]. In this case, since each $\lambda_{ij}$ is a function of $x^{(k)}$, the surrogate function corresponding to Equation (28) is
$$S(x; x^{(k)}) = \sum_i \sum_j \frac{a_{ij} x_j^{(k)}}{\langle a_i, x^{(k)} \rangle}\, \psi_i\Big( \frac{\langle a_i, x^{(k)} \rangle}{x_j^{(k)}}\, x_j \Big), \quad (29)$$
and it is easy to verify that this surrogate satisfies the minorization conditions. The right hand side of Equation (29) is a weighted summation of the functions $\psi_i$, each involving a single $x_j$ only (i.e., decoupled), and therefore the maximization of $S(x; x^{(k)})$ with respect to x can be achieved by a sequence of 1-D optimizations. Another trick, due to De Pierro [32], uses the following concave inequality:
$$\psi_i(\langle a_i, x \rangle) = \psi_i\Big( \sum_j \lambda_{ij} \Big[ \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big] \Big) \geq \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big). \quad (30)$$
If the weights $\lambda_{ij}$ do not depend on $x^{(k)}$, then Equation (30) leads to the surrogate function
$$S(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big), \quad (31)$$
which clearly also meets the minorization conditions. In Equation (31), the choice of $\lambda_{ij}$ is again flexible, and one popular option is $\lambda_{ij} = a_{ij} / \sum_{j'} a_{ij'}$.
The above two surrogates are developed based on the concave inequality. Another useful approach is to employ a quadratic lower bound (e.g., [40]). Assume $\psi_i$ is twice differentiable with its second derivative denoted by $\ddot{\psi}_i$. Let $c_i$ be a number such that $\ddot{\psi}_i(t) \geq -c_i$ for all t; then
$$\psi_i(\langle a_i, x \rangle) \geq \psi_i(\langle a_i, x^{(k)} \rangle) + \dot{\psi}_i(\langle a_i, x^{(k)} \rangle)\big( \langle a_i, x \rangle - \langle a_i, x^{(k)} \rangle \big) - \frac{c_i}{2}\big( \langle a_i, x \rangle - \langle a_i, x^{(k)} \rangle \big)^2. \quad (32)$$
The right hand side of Equation (32) is a parabola surrogate of $\psi_i(\langle a_i, x \rangle)$ and the condition on $c_i$ guarantees that this function lies below $\psi_i$. Unlike the previous surrogate functions, this surrogate is not separable in x, and therefore its maximization with respect to x cannot be reduced to a series of 1-D problems. To overcome this problem we can find another function surrogating the above parabola surrogate but separable in x. Towards this, we denote the right hand side quadratic function of Equation (32) by $q_i(\langle a_i, x \rangle)$. Since $q_i$ is concave in $\langle a_i, x \rangle$, we can use either Equation (29) or (31) to find a separable surrogate to $q_i$, and the resulting algorithm is called the separable paraboloidal surrogate (SPS) algorithm [39]. For example, corresponding to Equation (31), a separable parabola surrogate of $\sum_i q_i$ is
$$S(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, q_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big). \quad (33)$$
A careful selection of the curvature $c_i$ in Equation (32) can lead to fast convergence of the SPS algorithm. Erdoğan and Fessler [39] derived the optimal curvature for the SPS algorithm in transmission tomography.
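The minorization conditions behind Equations (26), (27) and (32) are easy to check numerically. Below is a hedged, self-contained sketch (not from the original paper) using $f = \sin$, whose second derivative is bounded below by $-1$, so the curvature $c = 1$ is valid:

```python
import numpy as np

def parabola_surrogate(f, fdot, c, t0):
    """Quadratic minorizer in the spirit of Equation (32):
    q(t) = f(t0) + f'(t0)(t - t0) - (c/2)(t - t0)^2 lies below f
    whenever f'' >= -c everywhere."""
    def q(t):
        return f(t0) + fdot(t0) * (t - t0) - 0.5 * c * (t - t0) ** 2
    return q

q = parabola_surrogate(np.sin, np.cos, c=1.0, t0=0.7)
t = np.linspace(-2.0, 4.0, 1001)
assert np.all(q(t) <= np.sin(t) + 1e-12)  # condition (ii): q <= f everywhere
assert np.isclose(q(0.7), np.sin(0.7))    # condition (i): q touches f at t0
```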
Next, we present two examples explaining how to implement the OT algorithm to emission and transmission tomography.
Example 4.1 (OT for emission scans with Poisson noise).
In this example we explain the application of OT for MPL reconstruction in emission tomography, where the measurements are assumed to follow Poisson distributions. De Pierro's modified EM (MEM) [32] coincides with the method discussed below when $r_i = 0$ for all i. Firstly, under the Poisson model for emission scans, the penalized log-likelihood function is
$$\Psi(x) = \sum_i \big( y_i \log \mu_i(x) - \mu_i(x) \big) - h \sum_j \rho(\delta_j(x)), \quad (34)$$
where ρ is assumed to be a convex function. Let
$$\psi_i(t) = y_i \log(t + r_i) - (t + r_i), \quad (35)$$
where $t = \langle a_i, x \rangle$. It is easy to verify that $\psi_i$ is concave with respect to t, so we can use Equation (28) to define its surrogate function. On the other hand, for the penalty function in Equation (34), $-\rho$ is concave, so we can use Equation (31) to construct its surrogate. Combining the two, we have the following surrogate for $\Psi$:
$$S(x; x^{(k)}) = \sum_i \sum_j \frac{a_{ij} x_j^{(k)}}{\langle a_i, x^{(k)} \rangle}\, \psi_i\Big( \frac{\langle a_i, x^{(k)} \rangle}{x_j^{(k)}}\, x_j \Big) - h\, S_J(x; x^{(k)}), \quad (36)$$
where $S_J(x; x^{(k)})$ is the separable surrogate of the penalty obtained from Equation (31). Now the update for each $x_j$ solves
$$\frac{\partial S(x; x^{(k)})}{\partial x_j} = 0. \quad (37)$$
Equation (37) has a closed-form solution for $x_j$ when ρ is quadratic and $r_i = 0$ for all i: in this context, Equation (37) reduces to a quadratic equation in $x_j$ (Equation (38)) to be solved subject to $x_j \geq 0$, and its analytic solution is readily available. If $r_i \neq 0$ or ρ is not quadratic, an analytic solution to Equation (37) does not exist. In this case, one can use a 1-D optimization method to solve it, or alternatively, one may use a separable parabola surrogate rather than Equation (36). An example of the latter is explained in the next example, where the reconstruction problem is for transmission tomography.
Example 4.2 (OT for transmission scans with Poisson noise).
This example considers the application of OT to MPL reconstruction in transmission tomography. Our explanations follow [39] closely. For transmission scans with Poisson noise, the penalized log-likelihood is given by
$$\Psi(x) = \sum_i \psi_i(\langle a_i, x \rangle) - h \sum_j \rho(\delta_j(x)), \quad (39)$$
where ρ is convex. Let $t_i = \langle a_i, x \rangle$ and
$$\psi_i(t_i) = y_i \log\big( b_i e^{-t_i} + r_i \big) - \big( b_i e^{-t_i} + r_i \big). \quad (40)$$
Since $\psi_i$ is concave with respect to $t_i$, a separable parabola surrogate can be defined according to Equation (33). For the first term of Equation (39) (i.e., the log-likelihood part), a separable parabola surrogate is given by
$$S_l(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, q_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + t_i^{(k)} \Big), \quad (41)$$
where
$$q_i(t) = \psi_i(t_i^{(k)}) + \dot{\psi}_i(t_i^{(k)}) (t - t_i^{(k)}) - \frac{c_i}{2} (t - t_i^{(k)})^2, \quad (42)$$
and here $c_i$ satisfies $\ddot{\psi}_i(t) \geq -c_i$ for all t. For the second term of Equation (39) (i.e., the penalty part), write $\delta_j(x) = \sum_l e_{jl} x_l$, let $\delta_j^{(k)} = \delta_j(x^{(k)})$ and let the weights $\gamma_{jl} \geq 0$ satisfy $\sum_l \gamma_{jl} = 1$. Its separable parabola surrogate is
$$S_J(x; x^{(k)}) = \sum_j \sum_l \gamma_{jl}\, \tilde{q}_j\Big( \frac{e_{jl}}{\gamma_{jl}} (x_l - x_l^{(k)}) + \delta_j^{(k)} \Big), \quad (43)$$
where
$$\tilde{q}_j(t) = \rho(\delta_j^{(k)}) + \dot{\rho}(\delta_j^{(k)}) (t - \delta_j^{(k)}) + \frac{c_\rho}{2} (t - \delta_j^{(k)})^2. \quad (44)$$
Here $c_\rho$ is chosen such that $\ddot{\rho}(t) \leq c_\rho$ for all t in its range; this curvature ensures that $\tilde{q}_j$ lies above ρ. Aggregating Equations (41) and (43) we obtain a separable parabola surrogate for $\Psi$:
$$S(x; x^{(k)}) = S_l(x; x^{(k)}) - h\, S_J(x; x^{(k)}). \quad (45)$$
We have
$$\frac{\partial S(x; x^{(k)})}{\partial x_j} = \dot{\Psi}_j(x^{(k)}) - d_j^{(k)} (x_j - x_j^{(k)}), \quad (46)$$
and for this example
$$d_j^{(k)} = \sum_i \frac{a_{ij}^2}{\lambda_{ij}}\, c_i + h \sum_l \frac{e_{lj}^2}{\gamma_{lj}}\, c_\rho. \quad (47)$$
Let $\dot{\Psi}_j^{(k)} = \partial \Psi(x^{(k)})/\partial x_j$ and assume $d_j^{(k)} > 0$. The solution of $\partial S(x; x^{(k)})/\partial x_j = 0$, subject to $x_j \geq 0$, is given by
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\dot{\Psi}_j^{(k)}}{d_j^{(k)}} \Big]_+, \quad (48)$$
where $[t]_+ = \max(t, 0)$. This is in fact a special gradient algorithm with a diagonal preconditioning matrix.
5. Multiplicative Iterative Algorithms
The OT algorithms presented in the last section have the following important achievements: (1) they manage to transform a high dimensional optimization problem into a series of 1-D optimizations; (2) due to the 1-D optimizations, the non-negativity constraints can be easily enforced by simply resetting negative estimates to zero in each iteration; (3) the surrogate given by the separable parabola approach is general enough to be applicable to different tomographic reconstructions. A limitation of OT is that it requires all the log-densities $f(y_i; \mu_i)$ and the negated penalty $-\rho$ to be concave functions.
In this section we discuss a competitive alternative to the OT method called the multiplicative iterative (MI) algorithm; its application to tomographic imaging can be found in [26] and to box-constrained image processing in [41].
The main motivation of the MI algorithm is that it can be easily derived under different imaging modalities and different measurement noise models. Moreover, for some difficult penalties, such as TV, or even non-convex penalties [42], MI can be easily implemented to solve the corresponding optimization problems.
A general MI updating formula can be developed that is suitable for all tomographic reconstruction problems regardless of the mean function model, measurement probability distribution and penalty function. The simulation study reported in [26] reveals that MI has competitive convergence speed when compared with OT and other reconstruction algorithms. The MI algorithm does not require concavity of the functions f and $-\rho$ and is therefore more general than the OT algorithm. It only requires the existence of the first derivatives of f and ρ. It is possible that the objective function in Equation (2) has multiple local maxima. In this case, MI finds one of the local non-negative maxima, depending on the starting value of the algorithm.
Here is some notation needed to explain the MI algorithm. For a function g, let $g^+$ be the positive component of g and $g^-$ the negative component so that $g = g^+ - g^-$. For a number b, let $b^+ = \max(b, 0)$ and $b^- = \max(-b, 0)$ so that $b = b^+ - b^-$. Thus, for the numerical value of function g at point t, we can also write $g(t) = g^+(t) - g^-(t)$, where both components are non-negative.
We develop the MI algorithm from the Karush–Kuhn–Tucker (KKT) necessary conditions for the non-negatively constrained optimization of $\Psi(x)$. They are:
$$\frac{\partial \Psi(x)}{\partial x_j} = 0 \ \ \text{if } x_j > 0, \quad (49)$$
$$\frac{\partial \Psi(x)}{\partial x_j} \leq 0 \ \ \text{if } x_j = 0, \quad (50)$$
for $j = 1, \ldots, p$. Therefore, we aim to solve for x from
$$x_j \Big[ \frac{\partial l(x)}{\partial x_j} - h \frac{\partial J(x)}{\partial x_j} \Big] = 0. \quad (51)$$
Note that the expression inside the brackets of Equation (51) represents $\partial \Psi(x)/\partial x_j$, and the factor $x_j$ is included in Equation (51) to reflect the conditions in Equations (49) and (50).
The key step in developing the MI algorithm is to rearrange Equation (51) such that its positive and negative terms appear on different sides of the equation. Hence we rewrite Equation (51) as
$$x_j\, s_j^+(x) = x_j\, s_j^-(x), \quad (52)$$
where $s_j^+(x)$ and $s_j^-(x)$ collect, respectively, the positive and negative terms of $\partial \Psi(x)/\partial x_j$, namely,
$$s_j^+(x) = \Big[ \frac{\partial l(x)}{\partial x_j} \Big]^+ + h \Big[ \frac{\partial J(x)}{\partial x_j} \Big]^- \quad (54)$$
and
$$s_j^-(x) = \Big[ \frac{\partial l(x)}{\partial x_j} \Big]^- + h \Big[ \frac{\partial J(x)}{\partial x_j} \Big]^+. \quad (55)$$
Equation (52) naturally suggests the following fixed point algorithm to update x:
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{s_j^+(x^{(k)}) + \epsilon}{s_j^-(x^{(k)}) + \epsilon}, \quad (53)$$
and ϵ is a small positive constant used to avoid a zero denominator in Equation (53). Note that the ϵ value does not affect where the algorithm converges to. As both the numerator and denominator of Equation (53) are positive, $x_j^{(k+1/2)} \geq 0$ whenever $x_j^{(k)} \geq 0$.
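A hedged sketch (not from the original paper) of the generic fixed-point update in Equation (53); the caller supplies the positive and negative gradient components, which are model-specific:

```python
import numpy as np

def mi_update(x, s_pos, s_neg, eps=1e-8):
    """Generic MI fixed-point step of Equation (53):
    x_j <- x_j * (s_j^+ + eps) / (s_j^- + eps).
    s_pos - s_neg equals the gradient of Psi, and both arrays are
    non-negative by construction, so non-negativity of x is kept."""
    return x * (s_pos + eps) / (s_neg + eps)
```

For instance, for the emission Poisson model without a penalty, $s^+ = A^T (y/\mu)$ and $s^- = A^T \mathbf{1}$, which recovers the familiar EM-type ratio update.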
In Equation (53) the updated value is denoted by $x_j^{(k+1/2)}$, indicating that it is not the final estimate for iteration $k+1$. In fact, this update does not ensure a monotonic increase of Ψ, and a line search step must be included to rectify this problem. We first express Equation (53) as a gradient algorithm:
$$x^{(k+1/2)} = x^{(k)} + D^{(k)} \nabla \Psi(x^{(k)}), \quad (56)$$
where $D^{(k)}$ is diagonal with entries $\omega_j^{(k)} = x_j^{(k)} / \big( s_j^-(x^{(k)}) + \epsilon \big)$. Note that $\omega_j^{(k)} = 0$ when $x_j^{(k)} = 0$. When $x_j^{(k)} = 0$ we set $\omega_j^{(k)} = 0$ only if $\partial \Psi(x^{(k)})/\partial x_j \leq 0$ (since $x_j^{(k)}$ satisfies the KKT condition in this case); otherwise, we set $\omega_j^{(k)}$ to another small positive constant. Equation (56) explains that $x^{(k+1/2)}$ emanates from $x^{(k)}$ in the gradient direction of Ψ with non-negative step sizes $\omega_j^{(k)}$. For the line search step, the search direction is $d^{(k)} = x^{(k+1/2)} - x^{(k)}$, with α denoting the line search step size. Since $x^{(k)} + \alpha\, d^{(k)} \geq 0$ for any $\alpha \in (0, 1]$, we only search in this fixed range. After including the line search step, $x^{(k+1)}$ is obtained according to
$$x^{(k+1)} = x^{(k)} + \alpha_k \big( x^{(k+1/2)} - x^{(k)} \big). \quad (57)$$
Due to the fixed search interval, this line search is remarkably simple. One simple and efficient search strategy is provided by Armijo's rule (e.g., [43]). The Armijo line search is a finitely terminating algorithm. Briefly, it starts with $\alpha = 1$, and for each α it checks whether the following Armijo condition is satisfied:
$$\Psi(x^{(k)} + \alpha\, d^{(k)}) \geq \Psi(x^{(k)}) + \xi\, \alpha\, \nabla \Psi(x^{(k)})^T d^{(k)}, \quad (58)$$
where $\xi \in (0, 1)$ is a fixed parameter. If Equation (58) is true then stop; otherwise, reduce α (for example, halve it) and re-evaluate the Armijo condition (58). Note that the repeated evaluations of $\Psi(x^{(k)} + \alpha\, d^{(k)})$ can be made with the forward projections $A x^{(k)}$ and $A d^{(k)}$ computed only once. Therefore, the line search step does not add major extra computations to the MI algorithm.
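A minimal backtracking implementation of the Armijo search in Equation (58) (a sketch under our own parameter choices; the values of xi and the shrink factor are illustrative, not taken from the paper):

```python
import numpy as np

def armijo_step(psi, grad_k, x_k, x_half, xi=0.01, shrink=0.5, max_tries=30):
    """Backtracking line search over alpha in (0, 1] for maximization:
    accept x_k + alpha * d once the Armijo condition (58) holds."""
    d = x_half - x_k               # MI search direction of Equation (57)
    slope = float(grad_k @ d)      # directional derivative at x_k
    psi_k = psi(x_k)               # evaluated once and reused
    alpha = 1.0
    for _ in range(max_tries):
        x_new = x_k + alpha * d
        if psi(x_new) >= psi_k + xi * alpha * slope:
            return x_new
        alpha *= shrink            # e.g., halve the step and retry
    return x_k                     # fall back to the current estimate
```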
The convergence properties of the MI algorithm are given in [26,41]. Briefly, under certain regularity conditions, MI converges monotonically to a local maximum satisfying the KKT conditions.
For the mean functions given in Equation (4), we have $\partial \mu_i/\partial x_j = a_{ij}$ for emission and $\partial \mu_i/\partial x_j = -a_{ij}\, b_i e^{-\langle a_i, x \rangle}$ for transmission tomography; the corresponding updating formula (53) becomes:
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^+ + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^- + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon} \quad (59)$$
for emission tomography, and
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij}\, b_i e^{-\langle a_i, x^{(k)} \rangle} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^- + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij}\, b_i e^{-\langle a_i, x^{(k)} \rangle} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^+ + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon} \quad (60)$$
for transmission tomography. The derivative $\dot{f}(y_i; \mu_i^{(k)})$ in the above formulae depends on the log-density f. Some examples are presented below.
Example 5.1 (MI for emission scans with Poisson noise).
For emission tomography with Poisson noise, we have the log-density function for $y_i$:
$$f(y_i; \mu_i) = y_i \log \mu_i - \mu_i, \quad (61)$$
where $\mu_i = \langle a_i, x \rangle + r_i$. Thus $\dot{f}(y_i; \mu_i) = y_i/\mu_i - 1$, which gives $[\dot{f}]^+ = y_i/\mu_i$ and $[\dot{f}]^- = 1$. The updating formula (59) becomes, for $j = 1, \ldots, p$,
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij}\, y_i/\mu_i^{(k)} + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon}. \quad (62)$$
Note that when $h = 0$ (i.e., maximum likelihood reconstruction), $r_i = 0$ and $\epsilon = 0$, this algorithm coincides with the EM algorithm for emission tomography. After the line search, the estimate of x at iteration $k+1$ is given by Equation (57). In this algorithm, there is only one back-projection (for the numerator of Equation (62)) and one forward-projection in each iteration; its computational burden is the same as that of EM.
Example 5.2 (MI for randoms-precorrected PET emission scans).
Some PET scans produce measurements that have already been corrected for randoms [44], and these measurements no longer follow Poisson distributions. We consider in this example the weighted least squares model, which is also used in [11] but in a different context; i.e., we reconstruct from randoms-precorrected measurements by maximizing the objective Equation (2) where
$$f(y_i; \mu_i) = -\frac{(y_i - \mu_i)^2}{2 \sigma_i^2}. \quad (63)$$
Here $\sigma_i^2$ is used to denote the (estimated) variance of $y_i$, and for this f formula (59) still applies. Now since
$$\dot{f}(y_i; \mu_i) = \frac{y_i - \mu_i}{\sigma_i^2}, \quad (64)$$
we have $[\dot{f}]^+ = [y_i]^+/\sigma_i^2$ and $[\dot{f}]^- = ([y_i]^- + \mu_i)/\sigma_i^2$ (recall that precorrected measurements $y_i$ can be negative). The MI algorithm updates x first according to
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij} [y_i]^+/\sigma_i^2 + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} \big( [y_i]^- + \mu_i^{(k)} \big)/\sigma_i^2 + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon}, \quad (65)$$
and then, after the line search step, computes $x^{(k+1)}$ according to Equation (57).
Example 5.3 (MI for polyenergetic transmission scans with Poisson noise).
Application of the MI algorithm to polyenergetic X-ray CT is again extremely easy. Under the assumption of Poisson noise, the log-density for measurement $y_i$ is identical to Equation (61), but now with the mean $\mu_i(z)$ given by Equations (5) and (17). In Example 5.1 we have already derived $[\dot{f}]^+ = y_i/\mu_i$ and $[\dot{f}]^- = 1$ for the Poisson noise log-density. On the other hand, the derivative of $\mu_i(z)$ with respect to $z_{jr}$ is
$$\frac{\partial \mu_i(z)}{\partial z_{jr}} = -\sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m)} \rangle}. \quad (66)$$
Thus, the updating formula for polyenergetic transmission is
$$z_{jr}^{(k+1/2)} = z_{jr}^{(k)}\, \frac{\sum_i \sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m),(k)} \rangle} + h \big[ \partial J(z^{(k)})/\partial z_{jr} \big]^- + \epsilon}{\sum_i \sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m),(k)} \rangle}\, y_i/\mu_i^{(k)} + h \big[ \partial J(z^{(k)})/\partial z_{jr} \big]^+ + \epsilon} \quad (67)$$
for $j = 1, \ldots, p$ and each material index r. After the line search step specified in Equation (57), $z^{(k+1)}$ is obtained. This iterative formula involves one forward- and two back-projections in each iteration, and therefore it demands a similar amount of computation to the alternating minimization algorithm in [34]. When $h = 0$, $\epsilon = 0$ and there is a single energy spectrum, this MI algorithm is identical to the algorithm given in [45] for maximum likelihood reconstruction in transmission tomography. Note that, unlike the optimization transfer and alternating minimization algorithms, the MI algorithm can easily be derived for other objective functions, such as the weighted least-squares function.
The above examples demonstrate that the MI algorithms are easy to derive and to implement in tomographic imaging. The line search step they require does not incur a significant computational burden.
6. Modified Fisher’s Method of Scoring Using Jacobi or Gauss–Seidel Over-Relaxations
In this section we elaborate on another non-negatively constrained method for tomographic imaging, which is a modification to the standard Fisher’s method of scoring (FS) algorithm. This method is developed based on the following steps. Firstly, the objective function is approximated by a quadratic function in each iteration, where the Fisher information matrix (e.g., [46]) is used to define the quadratic term; secondly, an over-relaxation method, either the Jacobi over-relaxation (JOR) or the Gauss–Seidel over-relaxation (also called the successive over-relaxation (SOR)), is employed to solve approximately the linear system derived from zeroing the derivative of this quadratic function. The resulting algorithms are called FS-JOR and FS-SOR and their detailed descriptions can be found in [47,48]. Descriptions of the JOR and SOR methods are available, for example, in [49].
FS is a general optimization algorithm for computing maximum likelihood estimates. Its advantages over the traditional Newton's method have been documented in [50]. Briefly, FS iterations are well defined due to the non-negative definiteness of the Fisher information matrix; for Newton's method, by contrast, the negative Hessian matrix may not be non-negative definite, so the method may fail to proceed in an uphill direction in some applications. Transmission tomography is an example where this problem for Newton's method indeed occurs; see Example 6.2.
We assume the objective function in Equation (2) is twice differentiable and let $F(x)$ be the Fisher information matrix, namely $F(x) = -E\big[ \partial^2 \Psi(x)/\partial x\, \partial x^T \big]$. At iteration $k+1$ of the Fisher scoring algorithm, $\Psi(x)$ is approximated by the following quadratic function:
$$\Psi(x) \approx \Psi(x^{(k)}) + \nabla \Psi(x^{(k)})^T (x - x^{(k)}) - \frac{1}{2} (x - x^{(k)})^T F^{(k)} (x - x^{(k)}), \quad (68)$$
where $F^{(k)}$ denotes the Fisher information matrix at $x^{(k)}$. Then the x estimate is updated by constrained maximization of this quadratic function, namely
$$x^{(k+1)} = \arg\max_{x \geq 0} \Big\{ \nabla \Psi(x^{(k)})^T (x - x^{(k)}) - \frac{1}{2} (x - x^{(k)})^T F^{(k)} (x - x^{(k)}) \Big\}. \quad (69)$$
The KKT conditions for this optimization are, for $j = 1, \ldots, p$,
$$x_j\, g_j(x) = 0, \qquad g_j(x) \leq 0 \ \text{if } x_j = 0, \qquad x_j \geq 0, \quad (70)\text{–}(72)$$
where
$$g_j(x) = \nabla_j \Psi(x^{(k)}) - F_j^{(k)} (x - x^{(k)}).$$
Here $F_j^{(k)}$ denotes the j-th row of matrix $F^{(k)}$. The JOR and SOR methods solve, for $j = 1, \ldots, p$,
$$g_j(x) = 0 \quad (73)$$
in different manners: JOR solves it by fixing all the x elements, except $x_j$, at their estimates from the last iteration (i.e., iteration k), whereas SOR solves it by fixing all the x elements, except $x_j$, at their most current estimates.
The above illustrations describe how to incorporate JOR or SOR sub-iterations into the FS algorithm. In fact, in each iteration, JOR or SOR is used to solve approximately the linear system of equations determined by the FS algorithm, and then this approximate solution is used as the starting value for the next FS iteration. These new schemes modify the standard FS method, and are feasible for large estimation problems.
Usually it suffices to run one JOR or SOR sub-iteration, but running more than one sub-iteration is also attractive as it has the potential to reduce the computations of the entire optimization process. Suppose that within each Fisher scoring iteration we run m sub-iterations of JOR or SOR. The resulting algorithms are called the m-step FS-JOR and m-step FS-SOR algorithms, respectively. Let r be the sub-iteration index for the over-relaxation method and $x^{(k,r)}$ the estimate of x at the r-th over-relaxation sub-iteration of the k-th FS iteration. Let $x_j^{(k,r)}$ be the j-th element of $x^{(k,r)}$. Assume $F_{jj}^{(k)} > 0$ for all j. At iteration $k+1$, first set $x^{(k,0)} = x^{(k)}$. If using JOR to solve Equation (73) we have
$$x_j^{(k,r+1)} = x_j^{(k,r)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( x^{(k,r)} - x^{(k)} \big) \Big), \quad (74)$$
and if using SOR to solve it we then have
$$x_j^{(k,r+1)} = x_j^{(k,r)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big) \Big), \quad (75)$$
where $\tilde{x}^{(k,r,j)} = \big( x_1^{(k,r+1)}, \ldots, x_{j-1}^{(k,r+1)}, x_j^{(k,r)}, \ldots, x_p^{(k,r)} \big)^T$ and ω is the relaxation parameter. If any $x_j^{(k,r+1)} < 0$ then it is reset to zero. This resetting is correct since the only possibility for $x_j^{(k,r+1)} < 0$ is that the expression in the round brackets of Equation (74) or (75) is negative, because $\omega > 0$ and $F_{jj}^{(k)} > 0$. Hence resetting to zero assures that the FS-JOR and FS-SOR algorithms converge to, when they converge, a solution satisfying the KKT conditions. At the end of the sub-iterations set $x^{(k+1)} = x^{(k,m)}$. Note that when $r = 0$, the last term in the round brackets of either Equation (74) or (75) becomes zero. Thus 1-step FS-JOR is basically a gradient algorithm and we can therefore replace ω by a line search step size $\alpha_k$, where the search range is fixed so that the estimate remains non-negative.
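A hedged sketch (not from the paper) of one JOR sub-iteration of Equation (74) with the KKT resetting, assuming a dense Fisher information matrix; in practice the product $F^{(k)} v$ would be evaluated with forward- and back-projections as described below:

```python
import numpy as np

def fs_jor_subiteration(x_r, x_k, grad_k, F_k, omega=1.0):
    """One JOR sub-iteration for the FS quadratic model:
    x_j <- x_j + (omega / F_jj) * (grad_j - [F (x_r - x_k)]_j),
    then reset negative values to zero (KKT resetting)."""
    residual = grad_k - F_k @ (x_r - x_k)   # bracketed term of Equation (74)
    x_new = x_r + omega * residual / np.diag(F_k)
    return np.maximum(x_new, 0.0)           # enforce non-negativity
```

With `x_r = x_k` (i.e., r = 0) the correction term vanishes and the step reduces to a diagonally scaled gradient step, as noted above.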
The relaxation parameter ω is used to achieve convergence of the FS-JOR and FS-SOR algorithms. Results contained in [47] give convergence properties when the non-negativity constraint is ignored: in this context FS-SOR converges if $0 < \omega < 2$, and FS-JOR converges if $0 < \omega < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of $D^{-1} F(\hat{x})$ with D the diagonal of $F(\hat{x})$. Here $\hat{x}$ is the MPL solution.
From the updating formulae given in Equations (74) and (75) we can see that both FS-JOR and FS-SOR involve the gradient $\nabla \Psi(x^{(k)})$ and the Fisher information based operation $F^{(k)} v$ for a vector v. The gradient is standard for most reconstruction algorithms, but the computation of $F^{(k)} v$ requires more careful consideration. It will become clear in Examples 6.1 and 6.2 that, for tomographic reconstructions, $F^{(k)}$ usually exhibits as $F^{(k)} = A^T W A$ (ignoring the penalty term), where W is diagonal. It is not wise to compute $A^T W A$ first, as this involves multiplications of two huge matrices. For FS-JOR, a feasible alternative is to use the forward projection to find $A v$ first, then to multiply it with the diagonal values of W to get $W A v$, and finally to back-project to obtain $A^T W A v$. This approach involves only one forward- and one back-projection in every sub-iteration. The situation for FS-SOR is more complicated since the vector being multiplied changes with the pixel index j. The above approach for FS-JOR cannot be used here, as otherwise each FS-SOR sub-iteration would demand an infeasible p pairs of forward- and back-projections. To confront this problem, let
$$u^{(k,r,j)} = A \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big). \quad (76)$$
The $F_j^{(k)} \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big)$ part of Equation (75) involves $u^{(k,r,j)}$. Note that
$$u^{(k,r,j+1)} = u^{(k,r,j)} + a_{\cdot j} \big( x_j^{(k,r+1)} - x_j^{(k,r)} \big), \quad (77)$$
where $a_{\cdot j}$ denotes the j-th column of A, so we can start with $u^{(k,r,1)}$ and obtain the rest by applying Equation (77). Although the total number of multiplications for the $u^{(k,r,j)}$ then becomes the same as that of a single forward projection, this scheme requires column access to the system matrix A, which can be a problem if A is generated on-the-fly.
We next provide examples of applying FS-JOR and FS-SOR to emission and transmission tomography.
Example 6.1 (Emission scans with Poisson noise).
For emission reconstruction with Poisson noise, the log-density of $y_i$ is given by Equation (61). Thus, for the corresponding objective function of Equation (2), the gradient elements are
$$\nabla_j \Psi(x) = \sum_i a_{ij} \Big( \frac{y_i}{\mu_i(x)} - 1 \Big) - h\, \frac{\partial J(x)}{\partial x_j}, \quad (78)$$
and the Fisher information matrix elements are
$$F_{jl}(x) = \sum_i \frac{a_{ij}\, a_{il}}{\mu_i(x)} + h\, \frac{\partial^2 J(x)}{\partial x_j\, \partial x_l}, \quad (79)$$
where $\mu_i(x) = \langle a_i, x \rangle + r_i$; the first term of Equation (79) can be written as $A^T W A$ with $W = \mathrm{diag}\{1/\mu_i(x)\}$. Assuming we run only one sub-iteration of FS-JOR or FS-SOR (i.e., $m = 1$), the FS-JOR iterative formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}}\, \nabla_j \Psi(x^{(k)}) \Big]_+, \quad (80)$$
and the FS-SOR formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,0,j)} - x^{(k)} \big) \Big) \Big]_+. \quad (81)$$
Then $x^{(k+1)} = \big( x_1^{(k+1)}, \ldots, x_p^{(k+1)} \big)^T$. The formula given in Equation (80) is just a gradient algorithm, so ω can be replaced by a line search step size $\alpha_k$. Efficient computation of Equation (81) requires column access to matrix A, as explicated before. Hudson et al. [48] reported simulation results and a real data application for emission reconstruction. They compared FS-JOR and FS-SOR with EM. The computer time required per iteration for the EM and one-step FS-JOR algorithms was similar. By comparison with the EM algorithm, FS-JOR and FS-SOR accelerated convergence when an appropriate value of ω was used. In particular, FS-SOR had a superior speed of convergence.
Example 6.2 (Transmission scans with Poisson noise).
For transmission reconstructions with Poisson noise, we can easily work out the gradient and Fisher information matrix from the penalized likelihood function. The gradient elements are
$$\nabla_j \Psi(x) = \sum_i a_{ij}\, q_i(x) \Big( 1 - \frac{y_i}{\mu_i(x)} \Big) - h\, \frac{\partial J(x)}{\partial x_j}, \quad (82)$$
and the Fisher information matrix elements are
$$F_{jl}(x) = \sum_i a_{ij}\, a_{il}\, \frac{q_i(x)^2}{\mu_i(x)} + h\, \frac{\partial^2 J(x)}{\partial x_j\, \partial x_l}, \quad (83)$$
where $q_i(x) = b_i e^{-\langle a_i, x \rangle}$ and $\mu_i(x) = q_i(x) + r_i$. Note that for this example the Fisher information matrix is non-negative definite, but the negative Hessian matrix may not be, making the Newton method inapplicable. Corresponding to $m = 1$, the FS-JOR iterative formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}}\, \nabla_j \Psi(x^{(k)}) \Big]_+, \quad (84)$$
and the FS-SOR formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,0,j)} - x^{(k)} \big) \Big) \Big]_+. \quad (85)$$
Then $x^{(k+1)} = \big( x_1^{(k+1)}, \ldots, x_p^{(k+1)} \big)^T$. Again, Equation (84) is a gradient algorithm so a line search can be used, and efficient implementation of Equation (85) demands the unpleasant column access to A.
This section has explained the Fisher scoring based image reconstruction algorithms using JOR or SOR sub-iterations. For these algorithms, any negative estimates in each iteration can be corrected by simply resetting them to zero, as this way of resetting enforces the KKT conditions. If only one sub-iteration is used, FS-JOR is equivalent to a gradient algorithm. Efficient implementation of FS-SOR requires column retrieval of the system matrix A, which can be infeasible for some reconstruction problems.
7. Iterative Coordinate Ascent Algorithms
Another method using SOR is the method of iterative coordinate ascent (ICA) (or iterative coordinate descent (ICD) for minimization problems). ICA was first applied to tomographic imaging in [51,52]. The basic idea of ICA is to apply SOR directly to the objective function $\Psi(x)$, resulting in a sequence of 1-D functions, where each $x_j$ is associated with one of these 1-D functions. Each function is then maximized exactly or approximately to update the corresponding $x_j$. More specifically, using the SOR principle we can define a function of $x_j$ according to
$$\Psi_j(x_j) = \Psi\big( x_1^{(k+1)}, \ldots, x_{j-1}^{(k+1)}, x_j, x_{j+1}^{(k)}, \ldots, x_p^{(k)} \big). \quad (86)$$
This is a function of $x_j$ only, and we can update the estimate by
$$x_j^{(k+1)} = \arg\max_{x_j \geq 0} \Psi_j(x_j). \quad (87)$$
Since this is a 1-D function, the constraint $x_j \geq 0$ can be easily enforced using, for example, the resetting-to-zero approach.
One computational issue with ICA when applied to tomographic imaging is that it requires repeated calculations of the inner products $\langle a_i, x \rangle$ for all i when updating each $x_j$. This problem can be rectified by the following approach. Let
$$t_i^{(k,j)} = \big\langle a_i, \big( x_1^{(k+1)}, \ldots, x_{j-1}^{(k+1)}, x_j^{(k)}, \ldots, x_p^{(k)} \big)^T \big\rangle. \quad (88)$$
Consider the evaluation of $\Psi_{j+1}$. Assuming the update of $x_j$ is given by $x_j^{(k+1)}$, then only the j-th element of the current estimate has changed, and therefore
$$t_i^{(k,j+1)} = t_i^{(k,j)} + a_{ij} \big( x_j^{(k+1)} - x_j^{(k)} \big). \quad (89)$$
This relationship explains that $t_i^{(k,j+1)}$ can be cheaply computed using the value before the update plus a correction term. However, similar to FS-SOR, it necessitates column access to A. This can be a potential issue if A is generated on-the-fly.
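A tiny sketch (not from the paper) of the incremental refresh in Equation (89); `Ax` holds the current forward projection and `A[:, j]` is exactly the column access the text warns about:

```python
import numpy as np

def update_pixel(Ax, A, x, j, x_j_new):
    """Refresh the forward projection after a single-pixel update:
    A x_new = A x_old + a_j * (x_j_new - x_j_old), per Equation (89)."""
    Ax = Ax + A[:, j] * (x_j_new - x[j])  # one column, n multiplications
    x[j] = x_j_new
    return Ax, x
```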
Next we again use the emission and transmission examples to illustrate the ICA algorithm.
Example 7.1 (Emission scans with Poisson noise).
Firstly, we define
$$s_i^{(k,j)}(x_j) = t_i^{(k,j)} + a_{ij} \big( x_j - x_j^{(k)} \big), \quad (90)$$
with $t_i^{(k,j)}$ as in Equation (88). From the penalized log-likelihood function of emission measurements (see, for example, Equation (34)), the function $\Psi_j$ is given by
$$\Psi_j(x_j) = \sum_i \Big( y_i \log\big( s_i^{(k,j)}(x_j) + r_i \big) - \big( s_i^{(k,j)}(x_j) + r_i \big) \Big) - h\, J_j(x_j), \quad (91)$$
where $J_j(x_j)$ collects the penalty terms involving $x_j$. Since this is a non-quadratic function of $x_j$, exact maximization is infeasible. We can find an approximate optimum by running a single or multiple steps of, for example, the Newton or Fisher scoring algorithm. In this example we consider using the Fisher scoring algorithm to optimize $\Psi_j$, and we call the resulting algorithm ICA-FS. After a single step of Fisher scoring we have
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \alpha_k\, \frac{\dot{\Psi}_j(x_j^{(k)})}{F_{jj}^{(k,j)}} \Big]_+, \quad (92)$$
where $F_{jj}^{(k,j)}$ is the Fisher information of $\Psi_j$ at $x_j^{(k)}$ and $\alpha_k$ is a line search step size enforcing $\Psi_j(x_j^{(k+1)}) \geq \Psi_j(x_j^{(k)})$, where equality holds only when the algorithm has converged. This monotonic condition eventually leads to $\Psi(x^{(k+1)}) \geq \Psi(x^{(k)})$. The $t_i^{(k,j)}$'s are then updated via Equation (89).
Example 7.2 (Transmission scans with Poisson noise).
For this example we have
$$\Psi_j(x_j) = \sum_i \Big( y_i \log\big( b_i e^{-s_i^{(k,j)}(x_j)} + r_i \big) - \big( b_i e^{-s_i^{(k,j)}(x_j)} + r_i \big) \Big) - h\, J_j(x_j), \quad (93)$$
where $s_i^{(k,j)}(x_j)$ is defined in Equation (90). The ICA-FS algorithm gives
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \alpha_k\, \frac{\dot{\Psi}_j(x_j^{(k)})}{F_{jj}^{(k,j)}} \Big]_+, \quad (94)$$
where $F_{jj}^{(k,j)}$ is the Fisher information of $\Psi_j$ at $x_j^{(k)}$, and then the $t_i^{(k,j)}$'s are updated via Equation (89).
8. Conclusions
Image reconstruction from projections has wide applications, particularly in medical imaging. Emission and transmission tomography and X-ray CT all fall into this category. Three types of reconstruction methods are available: Fourier methods, algebraic methods and likelihood based reconstruction methods. Our attention in this paper is on the penalized likelihood approaches.
In this paper we have presented and discussed several important simultaneous MPL reconstruction algorithms in which the non-negativity constraint is enforced. The EM algorithm is limited to maximum likelihood reconstruction problems in emission tomography and is difficult to extend to other imaging modalities and probability models for the likelihood. One variation of EM, called alternating minimization, is developed for transmission tomography. Another variation of EM, called the OT algorithm, is suitable for any imaging modality and probability model, but its derivation is often cumbersome as the choice of the surrogate function is flexible. The OT algorithm based on the separable parabola surrogate is relatively easy to apply to different tomographic imaging problems. The MI algorithm, on the other hand, is easy to derive and to implement, as its line search step is cheap to compute. Its convergence speed, according to the simulation study, is similar to that of the separable parabola surrogate algorithm. The FS-JOR and FS-SOR algorithms first apply the Fisher information matrix to obtain a quadratic approximation to the objective function, and then optimize it using JOR or SOR schemes. Implementation of ICA-FS reverses the order of FS and SOR in FS-SOR. For both FS-SOR and ICA-FS, the convergence speeds are usually superior, but their potential problem is that both involve column retrieval of A, which may not be pre-generated and stored.
For some of the algorithms covered in this paper, corresponding block-iterative algorithms have been developed. Block-iterative algorithms can usually achieve faster convergence than their simultaneous counterparts. However, discussions of the block-iterative algorithms are not included in this paper.
Acknowledgements
I wish to thank the referees for their invaluable comments and suggestions which have greatly enhanced the quality of this paper.
References
- Phelps, M.E.; Hoffman, E.J.; Mullani, N.A.; Ter-Pogossian, M.M. Application of annihilation coincidence detection to transaxial reconstruction tomography. J. Nucl. Med. 1975, 16, 210–224.
- Bailey, D.L.; Townsend, D.W.; Valk, P.E.; Maisey, M.N. Positron Emission Tomography: Basic Sciences; Springer-Verlag: Secaucus, NJ, USA, 2005.
- Parra, L.; Barrett, H.H. List mode likelihood: EM algorithm and image quality estimation demonstrated on 2-D PET. IEEE Trans. Med. Imaging 1998, 17, 228–235.
- Barrett, J.F.; Keat, N. Artifacts in CT: Recognition and avoidance. RadioGraphics 2004, 24, 1679–1691.
- De Man, B.; Nuyts, J.; Dupont, P.; Marchal, G. Reduction of metal streak artifacts in X-ray computed tomography using a transmission maximum a posteriori algorithm. IEEE Trans. Nucl. Sci. 2000, 47, 977–981.
- Fessler, J.A. Penalized weighted least squares image reconstruction for PET. IEEE Trans. Med. Imaging 1994, 13, 290–300.
- Titterington, D.M. On the iterative image space reconstruction algorithm for ECT. IEEE Trans. Med. Imaging 1987, 6, 52–56.
- Shepp, L.A.; Vardi, Y. Maximum likelihood estimation for emission tomography. IEEE Trans. Med. Imaging 1982, MI-1, 113–121.
- Yavuz, M.; Fessler, J.A. Statistical image reconstruction methods for randoms-precorrected PET scans. Med. Image Anal. 1998, 2, 369–378.
- Whiting, B.R. Signal statistics in X-ray computed tomography. Proc. SPIE 2002, 4682 (Medical Imaging 2002: Physics of Medical Imaging), 53–60.
- Anderson, J.M.M.; Mair, B.A.; Rao, M.; Wu, C.H. Weighted least-squares reconstruction methods for positron emission tomography. IEEE Trans. Med. Imaging 1997, 16, 159–165.
- Veklerov, E.; Llacer, J. Stopping rule for the MLE algorithm based on statistical hypothesis testing. IEEE Trans. Med. Imaging 1987, 6, 313–319.
- Lange, K. Convergence of EM image reconstruction algorithms with Gibbs smoothing. IEEE Trans. Med. Imaging 1990, MI-9, 439–446.
- Lewitt, R.M. Multidimensional digital image representations using generalized Kaiser-Bessel window functions. J. Opt. Soc. Am. A 1990, 7, 1834–1846.
- Silverman, B.W.; Jones, M.C.; Wilson, J.D.; Nychka, D.W. A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography (with discussion). J. R. Stat. Soc. B 1990, 52, 271–324.
- Snyder, D.L.; Miller, M.I.; Thomas, L.J.; Politte, D.G. Noise and edge artifacts in maximum-likelihood reconstructions for emission tomography. IEEE Trans. Med. Imaging 1987, 6, 228–238.
- Fessler, J.A. Tomographic Reconstruction Using Information Weighted Smoothing Splines. In Information Processing in Medical Imaging; Barrett, H.H., Gmitro, A.F., Eds.; Springer-Verlag: Berlin, Germany, 1993; pp. 372–386.
- La Rivière, P.J.; Pan, X. Nonparametric regression sinogram smoothing using a roughness-penalized Poisson likelihood objective function. IEEE Trans. Med. Imaging 2000, 19, 773–786.
- Rudin, L.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D 1992, 60, 259–268.
- Huber, P.J. Robust regression: Asymptotics, conjectures, and Monte Carlo. Ann. Stat. 1973, 1, 799–821.
- Yu, D.F.; Fessler, J.A. Edge-preserving tomographic reconstruction with nonlocal regularization. IEEE Trans. Med. Imaging 2002, 21, 159–173.
- Evans, J.D.; Politte, D.A.; Whiting, B.R.; O'Sullivan, J.A.; Williamson, J.F. Noise-resolution tradeoffs in X-ray CT imaging: A comparison of penalized alternating minimization and filtered backprojection algorithms. Med. Phys. 2011, 38, 1444–1458.
- Ma, J. Total Variation Smoothed Maximum Penalized Likelihood Tomographic Reconstruction with Positivity Constraints. In Proceedings of the 8th IEEE International Symposium on Biomedical Imaging, Chicago, IL, USA, April 2011; pp. 1774–1777.
- Sidky, E.Y.; Duchin, Y.; Pan, X.; Ullberg, C. A constrained, total-variation minimization algorithm for low-intensity X-ray CT. Med. Phys. 2011, 38, S117–S125.
- Lauzier, P.T.; Tang, J.; Chen, G.H. Quantitative evaluation method of noise texture for iteratively reconstructed X-ray CT images. Proc. SPIE 2011, 7961 (Medical Imaging 2011: Physics of Medical Imaging), Article 796135.
- Ma, J. Positively constrained multiplicative iterative algorithm for maximum penalized likelihood tomographic reconstruction. IEEE Trans. Nucl. Sci. 2010, 57, 181–192.
- Dempster, A.; Laird, N.; Rubin, D. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 1977, 39, 1–38.
- Wei, G.; Tanner, M. A Monte Carlo implementation of the EM algorithm and the Poor Man's data augmentation algorithm. J. Am. Stat. Assoc. 1990, 85, 699–704.
- Lange, K.; Carson, R. EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 1984, 8, 306–316.
- Ma, J. On iterative Bayes algorithms for emission tomography. IEEE Trans. Nucl. Sci. 2008, 55, 953–966.
- Green, P. Bayesian reconstruction from emission tomography data using a modified EM algorithm. IEEE Trans. Med. Imaging 1990, 9, 84–93.
- De Pierro, A.R. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Trans. Med. Imaging 1995, 14, 132–137.
- Csiszár, I.; Tusnády, G. Information geometry and alternating minimization procedures. Stat. Decis. 1984, Supplement Issue No. 1, 205–237.
- O'Sullivan, J.; Benac, J. Alternating minimization algorithms for transmission tomography. IEEE Trans. Med. Imaging 2007, 26, 283–297.
- Csiszár, I. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 1991, 19, 2032–2066.
- O'Sullivan, J.A.; Whiting, B.R.; Snyder, D.L. Alternating Minimization Algorithms for Transmission Tomography Using Energy Detectors. In Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, 2002; Volume 1, pp. 144–147.
- Lasio, G.M.; Whiting, B.R.; Williamson, J.F. Statistical reconstruction for X-ray computed tomography using energy-integrating detectors. Phys. Med. Biol. 2007, 52, 2247–2266.
- Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 2000, 9, 1–20.
- Erdoğan, H.; Fessler, J.A. Monotonic algorithms for transmission tomography. IEEE Trans. Med. Imaging 1999, 18, 801–814.
- Böhning, D.; Lindsay, B.G. Monotonicity of quadratic approximation algorithms. Ann. Inst. Stat. Math. 1988, 40, 641–663.
- Chan, R.H.; Ma, J. A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration. IEEE Trans. Image Process. 2012, 21, 3168–3181.
- Gasso, G.; Rakotomamonjy, A.; Canu, S. Recovering sparse signals with a certain family of non-convex penalties and DC programming. IEEE Trans. Signal Process. 2009, 57, 4686–4698.
- Luenberger, D. Linear and Nonlinear Programming, 2nd ed.; Wiley: New York, NY, USA, 1984.
- Ahn, S.; Fessler, J.A. Emission image reconstruction for randoms-precorrected PET allowing negative sinogram values. IEEE Trans. Med. Imaging 2004, 23, 591–601.
- Lange, K.; Bahn, M.; Little, R. A theoretical study of some maximum likelihood algorithms for emission and transmission tomography. IEEE Trans. Med. Imaging 1987, 6, 106–114.
- Ober, R.J.; Zou, Q.; Lin, Z. Calculation of the Fisher information matrix for multidimensional data sets. IEEE Trans. Signal Process. 2003, 51, 2679–2691.
- Ma, J.; Hudson, H.M. Modified Fisher scoring algorithms using Jacobi or Gauss-Seidel subiterations. Comput. Stat. 1997, 12, 467–479.
- Hudson, H.; Ma, J.; Green, P. Fisher's method of scoring in statistical image reconstruction: Comparison of Jacobi and Gauss-Seidel iterative schemes. Stat. Methods Med. Res. 1994, 3, 41–61.
- Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
- Osborne, M.R. Fisher's method of scoring. Int. Stat. Rev. 1992, 60, 99–117.
- Sauer, K.; Bouman, C. A local update strategy for iterative reconstruction from projections. IEEE Trans. Signal Process. 1993, 41, 533–548.
- Bouman, C.A.; Sauer, K. A unified approach to statistical tomography using coordinate descent optimization. IEEE Trans. Image Process. 1996, 5, 480–492.