Abstract
Image reconstruction is a key component in many medical imaging modalities. The problem of image reconstruction can be viewed as a special inverse problem where the unknown image pixel intensities are estimated from the observed measurements. Since the measurements are usually noise-contaminated, statistical reconstruction methods are preferred. In this paper we review some non-negatively constrained simultaneous iterative algorithms for maximum penalized likelihood reconstruction, where all measurements are used to estimate all pixel intensities in each iteration.
1. Introduction
Image reconstruction in medical imaging, in general, considers estimating pixel intensities or attenuations from measurements obtained from an imaging system. For example, for positron emission tomography (PET), the measurements are obtained according to the procedure summarized below; see [1,2] for more details. A radioactive isotope is introduced into the body of a patient and, as the radioisotope decays, positrons are emitted. Each positron travels a small distance in the body (usually less than 1 mm) and then interacts with an electron to produce a pair of gamma photons that travel in almost opposite directions. The scanning device in the imaging system detects each pair of gamma photons with a certain probability, and all such detections form the measurements, which can be stored in histogram or list form [3]. It is usually assumed that the detection probabilities are known; they can be pre-computed and stored, or computed on-the-fly.
Note that a special feature of the measurements is that they are contaminated by noise, which can be a severe problem particularly when each measurement is small in value due to dose safety limits. If the noise is not properly addressed, the reconstructed image can be distorted by excessive noise. For example, in low dose X-ray CT (a type of transmission tomography), the metal streak artifact (e.g., [4]) can be a severe problem for the traditional filtered backprojection method. Statistical iterative reconstruction methods, due to their ability to model the physics and measurements more accurately, are capable of reducing metal streak artifacts [5].
To deal with the noise contamination problem, statistical image reconstruction methods in emission, transmission, X-ray CT, etc. have been developed based on specified probability models for the measurements. For example, for single photon emission computed tomography (SPECT), possible options include: weighted least squares (equivalent to variable variance Gaussian) [6], fixed variance Gaussian [7] and Poisson [8] models. These models can also be used for transmission scans. Since accidental coincidences are the main source of background noise in PET, most PET scans are precorrected for accidental coincidences by real-time subtraction of the coincidences in the delayed window [9]. For randoms-precorrected PET scans, possible measurement models are Gaussian, ordinary Poisson and shifted Poisson [9], and all of these are just approximations as the true probability density function (pdf) for the measurements is difficult to derive. The shifted Poisson model is also used for X-ray CT measurements [10].
Different algorithms have been proposed to maximize the corresponding objective functions. For example, for emission tomography, the expectation-maximization (EM) algorithm [8] is designed to maximize the log-likelihood formulated from Poisson distributed measurements, and the image space reconstruction algorithm (ISRA) [7] maximizes the log-likelihood formulated from Gaussian (fixed variance) distributed measurements. An attractive aspect of both EM and ISRA is that they are very easy to implement and both respect the non-negativity constraint on the reconstructions. However, if the objective function contains a penalty term, which is normally used to smooth the reconstruction, then both EM and ISRA become impractical as they involve, in each iteration, a non-linear system of equations that is tedious to solve exactly due to the large number of unknowns. Moreover, the penalty function adds further inconvenience when a non-negative solution is sought.
To simplify notation, both the measurements and the unknown image are lexicographically ordered into vectors. More specifically, we use $y = (y_1, \ldots, y_n)^T$ to denote the measurement vector and $x = (x_1, \ldots, x_p)^T$ to denote the unknown image vector, where superscript T denotes matrix transpose. Note that although the notations are unified for different reconstruction problems in this paper, the meaning of these notations, such as x and y, can differ between imaging modalities. Vectors y and x are related through a system matrix A; see Equation (4) below for some examples. For tomographic reconstruction problems, matrix A is usually assumed known, so its estimation is not covered by this paper. Rather, we focus on how to estimate x from the observed y and the known system matrix A. We denote the estimate of x by $\hat{x}$.
Statistical reconstruction obtained by maximum penalized likelihood (MPL) (also known as maximum a posteriori (MAP)) is defined by
$$\hat{x} = \arg\max_{x \geq 0} \Psi(x), \quad (1)$$
where $\Psi(x)$ is an objective function derived from the probability distribution for the measurements and the penalty function. When the $y_i$'s are assumed independent (given x), the penalized likelihood objective function is
$$\Psi(x) = l(x) - h J(x), \quad (2)$$
where $l(x)$ is the log-likelihood function given by
$$l(x) = \sum_{i=1}^n f(y_i; \mu_i(x)). \quad (3)$$
Here $h \geq 0$ is the smoothing parameter and $J(x)$ is the penalty function used to smooth $\hat{x}$. In Equation (3), $f(y_i; \mu_i(x))$ denotes the log-density function for measurement $y_i$, and $\mu_i(x)$ is a function of $x \in \mathbb{R}_+^p$ (here $\mathbb{R}_+^p$ denotes the non-negative orthant of $\mathbb{R}^p$) representing the mean measurement of camera bin i. Examples of $\mu_i(x)$ include
$$\mu_i(x) = \langle a_i, x \rangle + r_i \ \ \text{(emission)} \quad \text{and} \quad \mu_i(x) = b_i e^{-\langle a_i, x \rangle} + r_i \ \ \text{(transmission)}, \quad (4)$$
where $\langle a_i, x \rangle = \sum_{j=1}^p a_{ij} x_j$ with $a_i^T$ being the ith row of matrix A, $b_i$ is the known blank scan count of the ith detector and $r_i$ the known mean background count. Another example is polyenergetic transmission scans (such as X-ray CT), where
$$\mu_i = \sum_{m=1}^M b_{im} e^{-\langle a_i, x^{(m)} \rangle} + r_i, \quad (5)$$
and here $x^{(m)}$ denotes the attenuation map corresponding to the m-th energy spectrum, x is a vector formed by the $x^{(m)}$'s and $b_{im}$ is the blank scan count from energy spectrum m.
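As a small illustration (not part of the original paper), the two mean models in Equation (4) can be computed as follows, assuming a dense NumPy array A and hypothetical vectors b and r for the blank scans and mean background counts:

```python
import numpy as np

def mean_emission(A, x, r):
    """Emission model of Equation (4): mu(x) = A x + r."""
    return A @ x + r

def mean_transmission(A, x, b, r):
    """Monoenergetic transmission model of Equation (4):
    mu(x) = b * exp(-A x) + r (elementwise over detector bins)."""
    return b * np.exp(-(A @ x)) + r
```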
In Equation (3) the notation $f(y_i; \mu_i(x))$ is used to emphasize that f is a function of $\mu_i(x)$ and that it also involves measurement $y_i$. We can also write this function as $f(y_i; \mu_i)$ or $f(y_i; x)$ in different contexts when there is no ambiguity. However, the functional properties of f may change with respect to its different arguments. For example, if assuming $y_i$ follows a Poisson distribution for either emission or transmission scans, then
$$f(y_i; \mu_i) = y_i \log \mu_i - \mu_i. \quad (6)$$
This is clearly a concave function of $\mu_i$ for both emission and transmission cases. However, $f(y_i; x)$ (treated as a function of x) may no longer be concave for transmission scans, although it is still concave for emission scans. Concavity is an important property exploited by the optimization transfer algorithms.
Let $\mu(x)$ be the n-vector of all $\mu_i(x)$. The first term of Equation (2), i.e., $l(x)$, measures the similarity between y and $\mu(x)$. Different probability distributions have been used to model $y_i$ even under the same imaging modality. For example, for emission tomography, if assuming the Poisson model for $y_i$ (i.e., $y_i \sim \text{Poisson}(\mu_i(x))$) then f is given by Equation (6), or if considering weighted least squares then
$$f(y_i; \mu_i) = -w_i (y_i - \mu_i)^2, \quad (7)$$
where $w_i$ is the weight. When $w_i = 1/y_i$ we have the weighted least squares model as suggested in [11]. Another example in emission (or transmission) tomography is the randoms-precorrected PET scan (assuming no scattering, to simplify). In this context, the observed measurements are $y_i = y_i^{\mathrm{p}} - y_i^{\mathrm{d}}$, where $y_i^{\mathrm{p}}$ and $y_i^{\mathrm{d}}$ (both unavailable directly) denote the numbers of coincidences in the prompt and delayed windows, respectively. Although we can assume $y_i^{\mathrm{p}} \sim \text{Poisson}(\mu_i(x) + r_i)$ and $y_i^{\mathrm{d}} \sim \text{Poisson}(r_i)$ and that they are independent, the exact distribution of $y_i$ cannot be derived directly (e.g., [9]). An approximate probability model suggested in [9] is the shifted Poisson distribution, namely $y_i + 2r_i \sim \text{Poisson}(\mu_i(x) + 2r_i)$, which gives
$$f(y_i; \mu_i) = (y_i + 2r_i) \log(\mu_i + 2r_i) - (\mu_i + 2r_i), \quad (8)$$
or the weighted least squares given by
$$f(y_i; \mu_i) = -\frac{(y_i - \mu_i)^2}{2\sigma_i^2}, \quad (9)$$
where $\sigma_i^2$ is an estimate of the variance of $y_i$. Note that the shifted Poisson approximation matches the first two moments of the true probability model for $y_i$ when both the prompt and delayed measurements are assumed independent and Poisson distributed.
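As a hedged illustration (not from the original paper), the shifted Poisson log-likelihood of Equation (8) can be evaluated as below; the truncation of negative shifted counts is an implementation choice here, not something prescribed by the text:

```python
import numpy as np

def shifted_poisson_loglik(y, mu, r):
    """Shifted Poisson log-likelihood (additive constants dropped):
    y_i + 2 r_i is modelled as Poisson with mean mu_i + 2 r_i."""
    s = np.maximum(y + 2.0 * r, 0.0)  # precorrected counts can be negative
    m = mu + 2.0 * r                  # shifted means, assumed positive
    return float(np.sum(s * np.log(m) - m))
```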
In this paper, we present and discuss several important non-negatively constrained penalized likelihood reconstruction algorithms. When designing a reconstruction algorithm in tomographic imaging, one considers the following important issues: (i) the algorithm is computationally efficient, and ideally it involves only forward-projection (e.g., $Ax$) and back-projection (e.g., $A^T y$) operations; (ii) the algorithm can be easily applied to different measurement probability models and imaging modalities; (iii) the algorithm can impose the non-negativity constraint; (iv) the algorithm converges fast. Our discussions of the algorithms in this paper will mainly focus on these points.
In tomographic imaging, it is important to produce smoothed reconstructions, as severe noise in a reconstruction can cause false diagnoses. Smoothing can generally be achieved by one of the following five practices: (i) early termination of the iterations (e.g., [12]); (ii) MPL reconstruction with an appropriate smoothing parameter (e.g., [13]); (iii) functional representation of the unknown image by a set of smooth basis functions (e.g., [14]); (iv) post-smoothing of the reconstruction within each iteration (e.g., [15]) or after all iterations ([16]); and (v) pre-smoothing of the camera data (i.e., the sinogram) followed by filtered backprojection (FBP) (e.g., [17,18]). We focus on the penalized likelihood approach to smoothing in this paper. In Equation (2), the smoothing parameter h balances two conflicting targets: fidelity of the fitted means $\mu_i(x)$ to the measurements $y_i$, and smoothness of x. Although an appropriate choice of h is important for achieving a reconstruction with balanced fidelity and smoothness, we will not consider how to estimate h in this paper. A penalty function is used to smooth or regularize the estimate $\hat{x}$. Usually, $J(x)$ takes the form of
$$J(x) = \sum_j \rho(\delta_j(x)), \quad (10)$$
where $\delta_j(x)$ represents a neighborhood operation (such as a first or second order difference) on pixel j, and the function ρ measures the magnitude of $\delta_j(x)$. A common choice of ρ is the quadratic function $\rho(t) = t^2/2$. Generally, a quadratic penalty tends to produce images with over-smoothed edges. Possible edge-preserving penalties include total variation (TV) (e.g., [19]), Huber [20] and hyperbolic functions (e.g., [21]). Note that ρ is convex for all these options.
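To make Equation (10) concrete, here is a hedged sketch (not from the original paper) of the quadratic and Huber choices of ρ applied to first-order horizontal and vertical differences of a 2-D image; the function names and the threshold parameter delta are illustrative only:

```python
import numpy as np

def quadratic_rho(t):
    return 0.5 * t ** 2

def huber_rho(t, delta=1.0):
    """Huber function: quadratic near zero, linear in the tails,
    so large differences (edges) are penalized less severely."""
    a = np.abs(t)
    return np.where(a <= delta, 0.5 * t ** 2, delta * (a - 0.5 * delta))

def penalty(img, rho=quadratic_rho):
    """J(x) of Equation (10) with first-order differences playing
    the role of the neighborhood operation on a 2-D image array."""
    dh = np.diff(img, axis=1)   # horizontal neighbour differences
    dv = np.diff(img, axis=0)   # vertical neighbour differences
    return float(rho(dh).sum() + rho(dv).sum())
```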
The optimal choices of the penalty function J and the smoothing parameter h are unsolved problems in image processing and will not be elaborated further in this paper. We emphasize that smoothing by MPL indeed produces visually improved reconstructions over the traditional filtered backprojection method, particularly in dose-limited tomography such as low dose X-ray CT. Edge-preserving penalties, such as the TV and Huber penalties, are extremely useful; see [22,23,24]. However, MPL reconstructions can have unnatural noise textures very different from those of the familiar filtered backprojection method. Their impact on diagnostic tasks is still unknown and this is an active research area; see [25] for examples and discussions.
We adopt the following notation throughout this paper. Let $x^{(k)}$ be the estimate of x obtained at iteration k of an algorithm. A dot over a function indicates its derivative with respect to the variable in the brackets. For example, $\dot{b}(\mu_i)$ represents the derivative of b with respect to $\mu_i$ and $\dot{b}(x)$ the derivative of b with respect to x. We use $\dot{b}(x_j)$ to denote the derivative of b with respect to $x_j$, the j-th element of vector x. We also let $\dot{b}(\mu_i^{(k)})$ and $\dot{b}(x_j^{(k)})$ represent, respectively, $\dot{b}(\mu_i)$ and $\dot{b}(x_j)$ evaluated at iteration k.
Non-negatively constrained MPL image reconstruction algorithms can be classified into simultaneous and block-iterative (a.k.a. ordered subset (OS)) algorithms. For simultaneous algorithms, all elements in y are used to update x in each iteration, and for block-iterative algorithms, distinct portions of y are used in turn to update x. We discuss in this paper some simultaneous algorithms for non-negatively constrained MPL reconstructions, and the block-iterative algorithms are not included in our discussions. The rest of this paper is arranged as follows. The expectation-maximization algorithm for emission tomography is discussed in Section 2. Section 3 explains the alternating minimization algorithm designed specifically for transmission tomography. Section 4 contains explanations on the optimization transfer algorithms and their applications to tomographic reconstructions. The multiplicative iterative (MI) algorithms for tomographic imaging are provided in Section 5 and the Fisher scoring based Jacobi or Gauss–Seidel over-relaxation algorithms are presented in Section 6. Section 7 explains another Gauss–Seidel method named the iterative coordinate ascent algorithm. Finally, Section 8 includes discussions and remarks about this paper.
In this paper we focus on explaining and summarizing different non-negatively constrained tomographic imaging algorithms. Numerical comparisons of some of these algorithms are available in [26], and therefore will not be given in this paper.
2. EM Algorithm for Maximum Likelihood Reconstruction in Emission Tomography
The expectation-maximization (EM) algorithm [27] is a statistical algorithm for iteratively computing maximum likelihood estimates when data contain random missing values. Here “random” means these missing values do not provide extra information about the parameters we wish to estimate. We first give a brief summary of the EM algorithm below.
Since the data comprise both missing and observed (or incomplete) components, we can define the complete data set as the combination of the incomplete and the missing data. Note, however, that our aim is to estimate the unknown parameters by maximizing the log-likelihood of the incomplete data. The rationale for the EM algorithm is that if maximizing the incomplete data likelihood is difficult while maximizing the complete data likelihood is easy, then EM can be used to compute iteratively the maximum of the incomplete data likelihood by maximizing the complete data likelihood in each iteration.
Let z be the complete data set given by $z = (y, v)$, where y denotes the incomplete data and v the missing data. Let $l_c(x; z)$ be the log-likelihood based on the complete data z and $l(x; y)$ the log-likelihood of the incomplete data y, where x is a p-vector of unknown parameters. Let $\hat{x}$ be the maximum likelihood (ML) estimate of x. Then iteration $k+1$ of the EM algorithm comprises two steps:
- E-Step: Compute the conditional expectation of the complete data log-likelihood given the incomplete data y and the current estimate $x^{(k)}$, and denote this function by
$$Q(x; x^{(k)}) = E\big[\, l_c(x; z) \mid y, x^{(k)} \,\big]. \quad (11)$$
- M-Step: Update the x estimate by maximizing the Q function, namely
$$x^{(k+1)} = \arg\max_x Q(x; x^{(k)}). \quad (12)$$
The EM algorithm was first applied to emission tomography by Shepp and Vardi [8] and Lange and Carson [29]. Both papers adopt the Poisson model for emission counts, namely the $y_i$ are independent Poisson random variables with mean $\mu_i(x) = \langle a_i, x \rangle$. This model assumes $r_i = 0$; otherwise, we can regard $y_i$ as the value after subtracting $r_i$ from the bin i measurement. From this Poisson model, we can formulate the complete data as $\{z_{ij}\}$, where $z_{ij}$ follows the Poisson distribution with mean $a_{ij} x_j$. Clearly, each $z_{ij}$ represents the unknown portion of the measurement on camera bin i attributed to image pixel j. The corresponding complete data log-likelihood is
$$l_c(x; z) = \sum_i \sum_j \big( z_{ij} \log(a_{ij} x_j) - a_{ij} x_j \big), \quad (13)$$
and the corresponding Q function is
$$Q(x; x^{(k)}) = \sum_i \sum_j \big( \hat{z}_{ij} \log(a_{ij} x_j) - a_{ij} x_j \big), \quad (14)$$
where $\hat{z}_{ij} = E[z_{ij} \mid y, x^{(k)}]$. Since the conditional distribution of $z_{ij}$ given $y_i$ is binomial with size $y_i$ and probability $a_{ij} x_j^{(k)} / \mu_i(x^{(k)})$, we have $\hat{z}_{ij} = a_{ij} x_j^{(k)} y_i / \mu_i(x^{(k)})$. Thus, after solving $\partial Q(x; x^{(k)})/\partial x_j = 0$, the M-step of the EM algorithm gives the following updating formula for x:
$$x_j^{(k+1)} = \frac{x_j^{(k)}}{\sum_i a_{ij}} \sum_i \frac{a_{ij}\, y_i}{\mu_i(x^{(k)})} \quad (15)$$
for $j = 1, \ldots, p$. It has been pointed out in [23,30] that formula (15) can also be explained by the Bayes conditional probability formula. This EM algorithm possesses the following properties, which make it attractive for emission tomography (a small illustrative sketch follows the list below):
- If the initial $x^{(0)} > 0$ then $x^{(k)} \geq 0$ for all $k \geq 1$; i.e., the algorithm automatically satisfies the non-negativity constraint on x.
- The algorithm is easy to implement as it only involves forward- and back-projections.
- The updating formula in Equation (15) increases the incomplete data log-likelihood: $l(x^{(k+1)}) \geq l(x^{(k)})$, where equality holds only when the iteration has converged.
- $x^{(k+1)}$ satisfies $\sum_i \mu_i^{(k+1)} = \sum_i y_i$, where $\mu_i^{(k+1)}$ is $\mu_i(x)$ with $x = x^{(k+1)}$. Thus the x estimate at any iteration satisfies that the total expected and the total observed counts are equal.
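The following is a minimal NumPy sketch of the update in Equation (15) (not from the original paper); it assumes a dense system matrix A with non-negative entries and strictly positive column sums, and guards the ratios against zero denominators:

```python
import numpy as np

def em_update(x, A, y, eps=1e-12):
    """One EM (MLEM) iteration of Equation (15) for the
    background-free Poisson emission model y ~ Poisson(A x)."""
    yhat = A @ x                        # forward projection: mu(x)
    ratio = y / np.maximum(yhat, eps)   # avoid division by zero
    sens = A.sum(axis=0)                # sensitivity image: sum_i a_ij
    return x * (A.T @ ratio) / np.maximum(sens, eps)
```

Starting from any strictly positive initial image (e.g., a constant image) and iterating `x = em_update(x, A, y)` preserves non-negativity automatically, in line with the first property above.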
The above EM is easy to implement and possesses some attractive properties for the reconstructions. This algorithm, however, is restricted to emission tomography with Poisson distributed measurements, and it cannot be easily extended to other reconstruction tasks. For example, applying the EM algorithm to transmission tomography does not lead to an exact updating formula because its M-step does not produce a closed-form solution; see [29]. Another limitation is that this EM algorithm can only be used for maximum likelihood reconstructions; its application to MPL reconstruction will not, in general, result in a closed-form updating formula. To rectify this problem, Green [31] developed a one-step-late (OSL) algorithm for MPL reconstruction by replacing x in the derivative of the penalty function with its current estimate $x^{(k)}$, so that an “exact” solution can still be obtained. However, this method suffers from the deficiencies that (i) the algorithm may be non-convergent; and (ii) some estimates may be negative.
De Pierro [32] reproduced the EM updating formula using a totally different argument. In his derivation, there is no missing data and hence no E-step. Although the algorithm is named “modified EM”, it is not a real EM. In fact, this algorithm belongs to a more general class called the optimization transfer algorithms, since the Poisson log-likelihood optimization problem is transferred to a simpler optimization in each iteration. We will summarize the optimization transfer algorithms in Section 4.
3. Alternating Minimization Algorithms for Transmission Tomography
We explained in Section 2 that the EM algorithm is not directly suitable for transmission scans as its M-step cannot be computed exactly. In this section, we summarize an alternating minimization algorithm designed to solve the transmission tomographic problem, including X-ray CT. This algorithm is a generalization of the EM algorithm [33] and its application to transmission tomography can be found in [34].
Following [34], we explain this algorithm using the polyenergetic transmission tomography example. In this context, if assuming the transmission scans follow Poisson distributions, the corresponding log-likelihood is
$$l(z) = \sum_i \big( y_i \log \mu_i(z) - \mu_i(z) \big), \quad (16)$$
where $y_i$ is the scan count of detector i and $\mu_i(z)$ (now expressed as a function of vector z, which will be defined below) is given by Equation (5). Moreover, elements of the attenuation map associated with spectrum m, namely elements of $x^{(m)}$ in Equation (5), are further modeled by
$$x_j^{(m)} = \sum_r c_r^{(m)} z_{jr}, \quad (17)$$
where j indexes pixels, r represents different types of materials, the $c_r^{(m)}$ are known linear attenuation coefficients and the $z_{jr}$ are the unknown partial densities (e.g., [34]) we wish to estimate. In Equation (16), z is the vector formed by column-wise stacking the vectors $z_r = (z_{1r}, \ldots, z_{pr})^T$.
Define the set
$$\mathcal{E} = \big\{ q : q_{im} = q_{im}(z) \ \text{for some } z \geq 0 \big\}, \quad (18)$$
where
$$q_{im}(z) = b_{im}\, e^{-\langle a_i, x^{(m)}(z) \rangle} \quad (19)$$
for $m = 1, \ldots, M$, and $q_{i,M+1}$ equals the background noise $r_i$ for $m = M + 1$. Clearly, $\mu_i$ given in Equation (5) can now be expressed as $\mu_i(z) = \sum_m q_{im}(z)$. Define another set
$$\mathcal{L} = \Big\{ p : \sum_m p_{im} = y_i \ \text{for all } i \Big\}. \quad (20)$$
In [34], $\mathcal{E}$ is called the exponential family and $\mathcal{L}$ the linear family. Let p and q be the vectors created from the $p_{im}$ and $q_{im}$ respectively. It can be shown that the problem of maximizing the log-likelihood Equation (16) can be re-written as
$$\min_{p,\, q}\; I(p \,\|\, q) \quad (21)$$
subject to $p \in \mathcal{L}$ and $q \in \mathcal{E}$, where $I(p \,\|\, q)$ is the I-divergence [35] given by
$$I(p \,\|\, q) = \sum_i \sum_m \Big( p_{im} \log \frac{p_{im}}{q_{im}} - p_{im} + q_{im} \Big). \quad (22)$$
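A minimal sketch (not from the original paper) of the I-divergence in Equation (22), using the convention $0 \log 0 = 0$:

```python
import numpy as np

def i_divergence(p, q, tiny=1e-300):
    """Csiszar I-divergence I(p||q) = sum p log(p/q) - p + q,
    computed elementwise with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    logterm = np.where(p > 0.0,
                       p * np.log(np.maximum(p, tiny) / np.maximum(q, tiny)),
                       0.0)
    return float(np.sum(logterm - p + q))
```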
Thus, maximizing the log-likelihood in Equation (16) can be achieved iteratively. Assuming the estimates $z^{(k)}$, $p^{(k)}$ and $q^{(k)}$ are obtained at iteration k, iteration $k+1$ contains two steps:
- (i)
- compute $p^{(k+1)}$ by minimizing $I(p \,\|\, q^{(k)})$ subject to $p \in \mathcal{L}$;
- (ii)
- compute $q^{(k+1)}$ by minimizing $I(p^{(k+1)} \,\|\, q)$ subject to $q \in \mathcal{E}$.
Minimizing over $p \in \mathcal{L}$ is easily achieved using a Lagrange multiplier, and the result is
$$p_{im}^{(k+1)} = \frac{y_i\, q_{im}^{(k)}}{\sum_{m'} q_{im'}^{(k)}}. \quad (23)$$
On the other hand, direct optimization of $I(p^{(k+1)} \,\|\, q)$ over $q \in \mathcal{E}$ is an unmanageable task as the $z_{jr}$'s are mixed (i.e., not decoupled or separated from each other) within the objective function. One approach to overcoming this problem is to use a decoupled objective function representing an upper bound of the original objective function. In fact, it can be shown that, for $q_{im}(z)$ given by Equation (19), the objective admits a separable upper bound of the form of Equation (24); this inequality is obtained from the convexity of the exponential function. Clearly, the $z_{jr}$ on the right hand side of Equation (24) are decoupled, and thus their non-negatively constrained optimizations result in closed-form solutions, which give the update $z^{(k+1)}$ of Equation (25). We give some remarks about this algorithm below.
Remarks
- (1)
- This algorithm is designed for maximum likelihood estimation. However, it can easily be extended to MPL provided the penalty function is convex, so that it can also be decoupled.
- (2)
- This algorithm is developed for the likelihood function derived from the simple Poisson measurement noise. Note that the alternating minimization algorithm was also developed for a compound Poisson noise model in [36] and its comparison with the simple Poisson alternating minimization was provided in [37]. For other measurement distributions, however, the corresponding algorithms have to be completely re-developed.
- (3)
- The convergence properties of the alternating minimization algorithm have been studied in [34]. In particular, it is monotonically convergent under certain conditions.
- (4)
- It will become clear in Section 5 (Example 5.3) that the multiplicative-iterative algorithm can be derived more easily for this transmission reconstruction problem.
- (5)
- The trick of decoupling the objective function using its convex (or concave) property is also the key technique of the optimization transfer algorithms discussed in Section 4.
4. Optimization Transfer Algorithms
Details of the optimization transfer (OT) algorithm (also called the minorization–maximization (MM) algorithm for maximizations) can be found in, for example, [38]. In this section we present this algorithm briefly and explain its application in emission and transmission tomography.
The fundamental idea of the OT algorithm is that it employs a surrogate function to minorize (see the definition below) the objective function in each iteration, and then update the parameter estimate by maximizing this surrogate function.
More specifically, a function $S(x; x^{(k)})$ is said to minorize $\Psi(x)$ at $x^{(k)}$ if it satisfies the following “minorization” conditions:
- (i)
- $S(x^{(k)}; x^{(k)}) = \Psi(x^{(k)})$, and
- (ii)
- $S(x; x^{(k)}) \leq \Psi(x)$ for all x.
An attractive property of using this surrogate function is that $x^{(k+1)} = \arg\max_x S(x; x^{(k)})$ satisfies the monotonic condition, namely
$$\Psi(x^{(k+1)}) \geq \Psi(x^{(k)}), \quad (26)$$
where equality holds only when the iteration has converged. If the exact maximum is not easy to obtain, we can find an $x^{(k+1)}$ by simply increasing $S(x; x^{(k)})$ over $S(x^{(k)}; x^{(k)})$, as this also guarantees that the monotonic condition holds for $x^{(k+1)}$. The monotonic property can be easily verified from the minorization conditions since
$$\Psi(x^{(k+1)}) \geq S(x^{(k+1)}; x^{(k)}) \geq S(x^{(k)}; x^{(k)}) = \Psi(x^{(k)}). \quad (27)$$
For implementation of the OT algorithm to medical imaging, a surrogate function must be determined. There exist different ways of choosing the surrogate function, such as those listed in [38]. We mainly consider two approaches in this paper: (i) the method based on the inequality on concave functions (called the concave inequality hereafter); and (ii) the method based on quadratic lower bounds (also known as paraboloidal surrogates [39]). These ideas are summarized below.
Let $\Psi(x) = \sum_i \psi_i(\langle a_i, x \rangle)$ be the objective function we wish to maximize, where $a_i^T$ is the i-th row of matrix A and x is a p-vector. For matrix A, we assume its elements are non-negative and $\sum_j a_{ij} > 0$. We also assume that all $\psi_i$ are concave functions. Let $\lambda_{ij} \geq 0$ be weights satisfying $\sum_j \lambda_{ij} = 1$. Then according to the concave inequality we have
$$\psi_i(\langle a_i, x \rangle) = \psi_i\Big( \sum_j \lambda_{ij}\, \frac{a_{ij} x_j}{\lambda_{ij}} \Big) \geq \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij} x_j}{\lambda_{ij}} \Big). \quad (28)$$
There are different ways of choosing the weights $\lambda_{ij}$. For example, we can use $\lambda_{ij} = a_{ij} x_j^{(k)} / \langle a_i, x^{(k)} \rangle$, which is also adopted in [32]. In this case, since each $\lambda_{ij}$ is a function of $x^{(k)}$, the surrogate function corresponding to Equation (28) is
$$S(x; x^{(k)}) = \sum_i \sum_j \frac{a_{ij} x_j^{(k)}}{\langle a_i, x^{(k)} \rangle}\, \psi_i\Big( \frac{\langle a_i, x^{(k)} \rangle}{x_j^{(k)}}\, x_j \Big), \quad (29)$$
and it is easy to verify that this surrogate satisfies the minorization conditions. The right hand side of Equation (29) is a weighted summation of the functions $\psi_i$, each involving a single $x_j$ only (i.e., decoupled), and therefore the maximization of $S(x; x^{(k)})$ with respect to x can be achieved by a sequence of 1-D optimizations. Another trick, due to De Pierro [32], uses the following concave inequality:
$$\psi_i(\langle a_i, x \rangle) = \psi_i\Big( \sum_j \lambda_{ij} \Big[ \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big] \Big) \geq \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big). \quad (30)$$
If the weights $\lambda_{ij}$ do not depend on $x^{(k)}$, then Equation (30) leads to the surrogate function
$$S(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, \psi_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big), \quad (31)$$
which clearly also meets the minorization conditions. In Equation (31), the choice of $\lambda_{ij}$ is again flexible, and one popular option is $\lambda_{ij} = a_{ij} / \sum_{j'} a_{ij'}$.
The above two surrogates are developed based on the concave inequality. Another useful approach is to employ a quadratic lower bound (e.g., [40]). Assume $\psi_i$ is twice differentiable with its second derivative denoted by $\ddot{\psi}_i$. Let $c_i$ be a number such that $\ddot{\psi}_i(t) \geq -c_i$ for all t; then
$$\psi_i(\langle a_i, x \rangle) \geq \psi_i(\langle a_i, x^{(k)} \rangle) + \dot{\psi}_i(\langle a_i, x^{(k)} \rangle)\big( \langle a_i, x \rangle - \langle a_i, x^{(k)} \rangle \big) - \frac{c_i}{2}\big( \langle a_i, x \rangle - \langle a_i, x^{(k)} \rangle \big)^2. \quad (32)$$
The right hand side of Equation (32) is a parabola surrogate of $\psi_i(\langle a_i, x \rangle)$ and the condition on $c_i$ guarantees that this function lies below $\psi_i$. Unlike the previous surrogate functions, this surrogate is not separable in x, and therefore its maximization with respect to x cannot be reduced to a series of 1-D problems. To overcome this problem we can find another function surrogating the above parabola surrogate but separable in x. Towards this, we denote the right hand side quadratic function of Equation (32) by $q_i(\langle a_i, x \rangle)$. Since $q_i$ is concave in $\langle a_i, x \rangle$, we can use either Equation (29) or (31) to find a separable surrogate to $q_i$, and the resulting algorithm is called the separable paraboloidal surrogate (SPS) algorithm [39]. For example, corresponding to Equation (31), a separable parabola surrogate of $\sum_i q_i$ is
$$S(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, q_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + \langle a_i, x^{(k)} \rangle \Big). \quad (33)$$
A careful selection of the curvature $c_i$ in Equation (32) can lead to fast convergence of the SPS algorithm. Erdoğan and Fessler [39] derived the optimal curvature for the SPS algorithm in transmission tomography.
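The minorization conditions behind Equations (26), (27) and (32) are easy to check numerically. Below is a hedged, self-contained sketch (not from the original paper) using $f = \sin$, whose second derivative is bounded below by $-1$, so the curvature $c = 1$ is valid:

```python
import numpy as np

def parabola_surrogate(f, fdot, c, t0):
    """Quadratic minorizer in the spirit of Equation (32):
    q(t) = f(t0) + f'(t0)(t - t0) - (c/2)(t - t0)^2 lies below f
    whenever f'' >= -c everywhere."""
    def q(t):
        return f(t0) + fdot(t0) * (t - t0) - 0.5 * c * (t - t0) ** 2
    return q

q = parabola_surrogate(np.sin, np.cos, c=1.0, t0=0.7)
t = np.linspace(-2.0, 4.0, 1001)
assert np.all(q(t) <= np.sin(t) + 1e-12)  # condition (ii): q <= f everywhere
assert np.isclose(q(0.7), np.sin(0.7))    # condition (i): q touches f at t0
```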
Next, we present two examples explaining how to implement the OT algorithm to emission and transmission tomography.
Example 4.1 (OT for emission scans with Poisson noise).
In this example we explain the application of OT for MPL reconstruction in emission tomography, where the measurements are assumed to follow Poisson distributions. De Pierro's modified EM (MEM) [32] coincides with the method discussed below when $r_i = 0$ for all i. Firstly, under the Poisson model for emission scans, the penalized log-likelihood function is
$$\Psi(x) = \sum_i \big( y_i \log \mu_i(x) - \mu_i(x) \big) - h \sum_j \rho(\delta_j(x)), \quad (34)$$
where ρ is assumed to be a convex function. Let
$$\psi_i(t) = y_i \log(t + r_i) - (t + r_i), \quad (35)$$
where $t = \langle a_i, x \rangle$. It is easy to verify that $\psi_i$ is concave with respect to t, so we can use Equation (28) to define its surrogate function. On the other hand, for the penalty function in Equation (34), $-\rho$ is concave, so we can use Equation (31) to construct its surrogate. Combining the two, we have the following surrogate for $\Psi$:
$$S(x; x^{(k)}) = \sum_i \sum_j \frac{a_{ij} x_j^{(k)}}{\langle a_i, x^{(k)} \rangle}\, \psi_i\Big( \frac{\langle a_i, x^{(k)} \rangle}{x_j^{(k)}}\, x_j \Big) - h\, S_J(x; x^{(k)}), \quad (36)$$
where $S_J(x; x^{(k)})$ is the separable surrogate of the penalty obtained from Equation (31). Now the update for each $x_j$ solves
$$\frac{\partial S(x; x^{(k)})}{\partial x_j} = 0. \quad (37)$$
Equation (37) has a closed-form solution for $x_j$ when ρ is quadratic and $r_i = 0$ for all i: in this context, Equation (37) reduces to a quadratic equation in $x_j$ (Equation (38)) to be solved subject to $x_j \geq 0$, and its analytic solution is readily available. If $r_i \neq 0$ or ρ is not quadratic, an analytic solution to Equation (37) does not exist. In this case, one can use a 1-D optimization method to solve it, or alternatively, one may use a separable parabola surrogate rather than Equation (36). An example of the latter is explained in the next example, where the reconstruction problem is for transmission tomography.
Example 4.2 (OT for transmission scans with Poisson noise).
This example considers the application of OT to MPL reconstruction in transmission tomography. Our explanations follow [39] closely. For transmission scans with Poisson noise, the penalized log-likelihood is given by
$$\Psi(x) = \sum_i \psi_i(\langle a_i, x \rangle) - h \sum_j \rho(\delta_j(x)), \quad (39)$$
where ρ is convex. Let $t_i = \langle a_i, x \rangle$ and
$$\psi_i(t_i) = y_i \log\big( b_i e^{-t_i} + r_i \big) - \big( b_i e^{-t_i} + r_i \big). \quad (40)$$
Since $\psi_i$ is concave with respect to $t_i$, a separable parabola surrogate can be defined according to Equation (33). For the first term of Equation (39) (i.e., the log-likelihood part), a separable parabola surrogate is given by
$$S_l(x; x^{(k)}) = \sum_i \sum_j \lambda_{ij}\, q_i\Big( \frac{a_{ij}}{\lambda_{ij}} (x_j - x_j^{(k)}) + t_i^{(k)} \Big), \quad (41)$$
where
$$q_i(t) = \psi_i(t_i^{(k)}) + \dot{\psi}_i(t_i^{(k)}) (t - t_i^{(k)}) - \frac{c_i}{2} (t - t_i^{(k)})^2, \quad (42)$$
and here $c_i$ satisfies $\ddot{\psi}_i(t) \geq -c_i$ for all t. For the second term of Equation (39) (i.e., the penalty part), write $\delta_j(x) = \sum_l e_{jl} x_l$, let $\delta_j^{(k)} = \delta_j(x^{(k)})$ and let the weights $\gamma_{jl} \geq 0$ satisfy $\sum_l \gamma_{jl} = 1$. Its separable parabola surrogate is
$$S_J(x; x^{(k)}) = \sum_j \sum_l \gamma_{jl}\, \tilde{q}_j\Big( \frac{e_{jl}}{\gamma_{jl}} (x_l - x_l^{(k)}) + \delta_j^{(k)} \Big), \quad (43)$$
where
$$\tilde{q}_j(t) = \rho(\delta_j^{(k)}) + \dot{\rho}(\delta_j^{(k)}) (t - \delta_j^{(k)}) + \frac{c_\rho}{2} (t - \delta_j^{(k)})^2. \quad (44)$$
Here $c_\rho$ is chosen such that $\ddot{\rho}(t) \leq c_\rho$ for all t in its range; this curvature ensures that $\tilde{q}_j$ lies above ρ. Aggregating Equations (41) and (43) we obtain a separable parabola surrogate for $\Psi$:
$$S(x; x^{(k)}) = S_l(x; x^{(k)}) - h\, S_J(x; x^{(k)}). \quad (45)$$
We have
$$\frac{\partial S(x; x^{(k)})}{\partial x_j} = \dot{\Psi}_j(x^{(k)}) - d_j^{(k)} (x_j - x_j^{(k)}), \quad (46)$$
and for this example
$$d_j^{(k)} = \sum_i \frac{a_{ij}^2}{\lambda_{ij}}\, c_i + h \sum_l \frac{e_{lj}^2}{\gamma_{lj}}\, c_\rho. \quad (47)$$
Let $\dot{\Psi}_j^{(k)} = \partial \Psi(x^{(k)})/\partial x_j$ and assume $d_j^{(k)} > 0$. The solution of $\partial S(x; x^{(k)})/\partial x_j = 0$, subject to $x_j \geq 0$, is given by
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\dot{\Psi}_j^{(k)}}{d_j^{(k)}} \Big]_+, \quad (48)$$
where $[t]_+ = \max(t, 0)$. This is in fact a special gradient algorithm with a diagonal preconditioning matrix.
5. Multiplicative Iterative Algorithms
The OT algorithms presented in the last section have the following important achievements: (1) they manage to transform a high dimensional optimization problem into a series of 1-D optimizations; (2) due to the 1-D optimizations, the non-negativity constraints can be easily enforced by simply resetting negative estimates to zero in each iteration; (3) the surrogate given by the separable parabola approach is general enough to be applicable to different tomographic reconstructions. A limitation of OT is that it requires all the log-densities $f(y_i; \mu_i)$ and the negated penalty $-\rho$ to be concave functions.
In this section we discuss a competitive alternative to the OT method called the multiplicative iterative (MI) algorithm; its application to tomographic imaging can be found in [26] and to box-constrained image processing in [41].
The main motivation of the MI algorithm is that it can be easily derived under different imaging modalities and different measurement noise models. Moreover, for some difficult penalties, such as TV, or even non-convex penalties [42], MI can be easily implemented to solve the corresponding optimization problems.
A general MI updating formula can be developed that is suitable for all tomographic reconstruction problems regardless of the mean function model, measurement probability distribution and penalty function. The simulation study reported in [26] reveals that MI has competitive convergence speed when compared with OT and other reconstruction algorithms. The MI algorithm does not require concavity of the functions f and $-\rho$ and is therefore more general than the OT algorithm. It only requires the existence of the first derivatives of f and ρ. It is possible that the objective function in Equation (2) has multiple local maxima. In this case, MI finds one of the local non-negative maxima, depending on the starting value of the algorithm.
Here is some notation needed to explain the MI algorithm. For a function g, let $g^+$ be the positive component of g and $g^-$ the negative component so that $g = g^+ - g^-$. For a number b, let $b^+ = \max(b, 0)$ and $b^- = \max(-b, 0)$ so that $b = b^+ - b^-$. Thus, for the numerical value of function g at point t, we can also write $g(t) = g^+(t) - g^-(t)$, where both components are non-negative.
We develop the MI algorithm from the Karush–Kuhn–Tucker (KKT) necessary conditions for the non-negatively constrained optimization of $\Psi(x)$. They are:
$$\frac{\partial \Psi(x)}{\partial x_j} = 0 \ \ \text{if } x_j > 0, \quad (49)$$
$$\frac{\partial \Psi(x)}{\partial x_j} \leq 0 \ \ \text{if } x_j = 0, \quad (50)$$
for $j = 1, \ldots, p$. Therefore, we aim to solve for x from
$$x_j \Big[ \frac{\partial l(x)}{\partial x_j} - h \frac{\partial J(x)}{\partial x_j} \Big] = 0. \quad (51)$$
Note that the expression inside the brackets of Equation (51) represents $\partial \Psi(x)/\partial x_j$, and the factor $x_j$ is included in Equation (51) to reflect the conditions in Equations (49) and (50).
The key step in developing the MI algorithm is to rearrange Equation (51) such that its positive and negative terms appear on different sides of the equation. Hence we rewrite Equation (51) as
$$x_j\, s_j^+(x) = x_j\, s_j^-(x), \quad (52)$$
where $s_j^+(x)$ and $s_j^-(x)$ collect, respectively, the positive and negative terms of $\partial \Psi(x)/\partial x_j$, namely,
$$s_j^+(x) = \Big[ \frac{\partial l(x)}{\partial x_j} \Big]^+ + h \Big[ \frac{\partial J(x)}{\partial x_j} \Big]^- \quad (54)$$
and
$$s_j^-(x) = \Big[ \frac{\partial l(x)}{\partial x_j} \Big]^- + h \Big[ \frac{\partial J(x)}{\partial x_j} \Big]^+. \quad (55)$$
Equation (52) naturally suggests the following fixed point algorithm to update x:
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{s_j^+(x^{(k)}) + \epsilon}{s_j^-(x^{(k)}) + \epsilon}, \quad (53)$$
and ϵ is a small positive constant used to avoid a zero denominator in Equation (53). Note that the ϵ value does not affect where the algorithm converges to. As both the numerator and denominator of Equation (53) are positive, $x_j^{(k+1/2)} \geq 0$ whenever $x_j^{(k)} \geq 0$.
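A hedged sketch (not from the original paper) of the generic fixed-point update in Equation (53); the caller supplies the positive and negative gradient components, which are model-specific:

```python
import numpy as np

def mi_update(x, s_pos, s_neg, eps=1e-8):
    """Generic MI fixed-point step of Equation (53):
    x_j <- x_j * (s_j^+ + eps) / (s_j^- + eps).
    s_pos - s_neg equals the gradient of Psi, and both arrays are
    non-negative by construction, so non-negativity of x is kept."""
    return x * (s_pos + eps) / (s_neg + eps)
```

For instance, for the emission Poisson model without a penalty, $s^+ = A^T (y/\mu)$ and $s^- = A^T \mathbf{1}$, which recovers the familiar EM-type ratio update.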
In Equation (53) the updated value is denoted by $x_j^{(k+1/2)}$, indicating that it is not the final estimate for iteration $k+1$. In fact, this update does not ensure a monotonic increase of Ψ, and a line search step must be included to rectify this problem. We first express Equation (53) as a gradient algorithm:
$$x^{(k+1/2)} = x^{(k)} + D^{(k)} \nabla \Psi(x^{(k)}), \quad (56)$$
where $D^{(k)}$ is diagonal with entries $\omega_j^{(k)} = x_j^{(k)} / \big( s_j^-(x^{(k)}) + \epsilon \big)$. Note that $\omega_j^{(k)} = 0$ when $x_j^{(k)} = 0$. When $x_j^{(k)} = 0$ we set $\omega_j^{(k)} = 0$ only if $\partial \Psi(x^{(k)})/\partial x_j \leq 0$ (since $x_j^{(k)}$ satisfies the KKT condition in this case); otherwise, we set $\omega_j^{(k)}$ to another small positive constant. Equation (56) explains that $x^{(k+1/2)}$ emanates from $x^{(k)}$ in the gradient direction of Ψ with non-negative step sizes $\omega_j^{(k)}$. For the line search step, the search direction is $d^{(k)} = x^{(k+1/2)} - x^{(k)}$, with α denoting the line search step size. Since $x^{(k)} + \alpha\, d^{(k)} \geq 0$ for any $\alpha \in (0, 1]$, we only search in this fixed range. After including the line search step, $x^{(k+1)}$ is obtained according to
$$x^{(k+1)} = x^{(k)} + \alpha_k \big( x^{(k+1/2)} - x^{(k)} \big). \quad (57)$$
Due to the fixed search interval, this line search is remarkably simple. One simple and efficient search strategy is provided by Armijo's rule (e.g., [43]). The Armijo line search is a finitely terminating algorithm. Briefly, it starts with $\alpha = 1$, and for each α it checks whether the following Armijo condition is satisfied:
$$\Psi(x^{(k)} + \alpha\, d^{(k)}) \geq \Psi(x^{(k)}) + \xi\, \alpha\, \nabla \Psi(x^{(k)})^T d^{(k)}, \quad (58)$$
where $\xi \in (0, 1)$ is a fixed parameter. If Equation (58) is true then stop; otherwise, reduce α (for example, halve it) and re-evaluate the Armijo condition (58). Note that the repeated evaluations of $\Psi(x^{(k)} + \alpha\, d^{(k)})$ can be made with the forward projections $A x^{(k)}$ and $A d^{(k)}$ computed only once. Therefore, the line search step does not add major extra computations to the MI algorithm.
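A minimal backtracking implementation of the Armijo search in Equation (58) (a sketch under our own parameter choices; the values of xi and the shrink factor are illustrative, not taken from the paper):

```python
import numpy as np

def armijo_step(psi, grad_k, x_k, x_half, xi=0.01, shrink=0.5, max_tries=30):
    """Backtracking line search over alpha in (0, 1] for maximization:
    accept x_k + alpha * d once the Armijo condition (58) holds."""
    d = x_half - x_k               # MI search direction of Equation (57)
    slope = float(grad_k @ d)      # directional derivative at x_k
    psi_k = psi(x_k)               # evaluated once and reused
    alpha = 1.0
    for _ in range(max_tries):
        x_new = x_k + alpha * d
        if psi(x_new) >= psi_k + xi * alpha * slope:
            return x_new
        alpha *= shrink            # e.g., halve the step and retry
    return x_k                     # fall back to the current estimate
```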
The convergence properties of the MI algorithm are given in [26,41]. Briefly, under certain regularity conditions, MI converges monotonically to a local maximum satisfying the KKT conditions.
For the mean functions given in Equation (4), we have $\partial \mu_i/\partial x_j = a_{ij}$ for emission and $\partial \mu_i/\partial x_j = -a_{ij}\, b_i e^{-\langle a_i, x \rangle}$ for transmission tomography; the corresponding updating formula (53) becomes:
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^+ + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^- + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon} \quad (59)$$
for emission tomography, and
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij}\, b_i e^{-\langle a_i, x^{(k)} \rangle} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^- + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij}\, b_i e^{-\langle a_i, x^{(k)} \rangle} \big[ \dot{f}(y_i; \mu_i^{(k)}) \big]^+ + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon} \quad (60)$$
for transmission tomography. The derivative $\dot{f}(y_i; \mu_i^{(k)})$ in the above formulae depends on the log-density f. Some examples are presented below.
Example 5.1 (MI for emission scans with Poisson noise).
For emission tomography with Poisson noise, we have the log-density function for $y_i$:
$$f(y_i; \mu_i) = y_i \log \mu_i - \mu_i, \quad (61)$$
where $\mu_i = \langle a_i, x \rangle + r_i$. Thus $\dot{f}(y_i; \mu_i) = y_i/\mu_i - 1$, which gives $[\dot{f}]^+ = y_i/\mu_i$ and $[\dot{f}]^- = 1$. The updating formula (59) becomes, for $j = 1, \ldots, p$,
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij}\, y_i/\mu_i^{(k)} + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon}. \quad (62)$$
Note that when $h = 0$ (i.e., maximum likelihood reconstruction), $r_i = 0$ and $\epsilon = 0$, this algorithm coincides with the EM algorithm for emission tomography. After the line search, the estimate of x at iteration $k+1$ is given by Equation (57). In this algorithm, there is only one back-projection (for the numerator of Equation (62)) and one forward-projection in each iteration; its computational burden is the same as that of EM.
Example 5.2 (MI for randoms-precorrected PET emission scans).
Some PET scans produce measurements that have already been corrected for randoms [44], and these measurements no longer follow Poisson distributions. We consider in this example the weighted least squares model, which is also used in [11] but in a different context; i.e., we reconstruct from randoms-precorrected measurements by maximizing the objective Equation (2) where
$$f(y_i; \mu_i) = -\frac{(y_i - \mu_i)^2}{2 \sigma_i^2}. \quad (63)$$
Here $\sigma_i^2$ is used to denote the (estimated) variance of $y_i$, and for this f formula (59) still applies. Now since
$$\dot{f}(y_i; \mu_i) = \frac{y_i - \mu_i}{\sigma_i^2}, \quad (64)$$
we have $[\dot{f}]^+ = [y_i]^+/\sigma_i^2$ and $[\dot{f}]^- = ([y_i]^- + \mu_i)/\sigma_i^2$ (recall that precorrected measurements $y_i$ can be negative). The MI algorithm updates x first according to
$$x_j^{(k+1/2)} = x_j^{(k)}\, \frac{\sum_i a_{ij} [y_i]^+/\sigma_i^2 + h \big[ \partial J(x^{(k)})/\partial x_j \big]^- + \epsilon}{\sum_i a_{ij} \big( [y_i]^- + \mu_i^{(k)} \big)/\sigma_i^2 + h \big[ \partial J(x^{(k)})/\partial x_j \big]^+ + \epsilon}, \quad (65)$$
and then, after the line search step, computes $x^{(k+1)}$ according to Equation (57).
Example 5.3 (MI for polyenergetic transmission scans with Poisson noise).
Application of the MI algorithm to polyenergetic X-ray CT is again extremely easy. Under the assumption of Poisson noise, the log-density for measurement $y_i$ is identical to Equation (61), but now with the mean $\mu_i(z)$ given by Equations (5) and (17). In Example 5.1 we have already derived $[\dot{f}]^+ = y_i/\mu_i$ and $[\dot{f}]^- = 1$ for the Poisson noise log-density. On the other hand, the derivative of $\mu_i(z)$ with respect to $z_{jr}$ is
$$\frac{\partial \mu_i(z)}{\partial z_{jr}} = -\sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m)} \rangle}. \quad (66)$$
Thus, the updating formula for polyenergetic transmission is
$$z_{jr}^{(k+1/2)} = z_{jr}^{(k)}\, \frac{\sum_i \sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m),(k)} \rangle} + h \big[ \partial J(z^{(k)})/\partial z_{jr} \big]^- + \epsilon}{\sum_i \sum_m a_{ij}\, c_r^{(m)} b_{im}\, e^{-\langle a_i, x^{(m),(k)} \rangle}\, y_i/\mu_i^{(k)} + h \big[ \partial J(z^{(k)})/\partial z_{jr} \big]^+ + \epsilon} \quad (67)$$
for $j = 1, \ldots, p$ and each material index r. After the line search step specified in Equation (57), $z^{(k+1)}$ is obtained. This iterative formula involves one forward- and two back-projections in each iteration, and therefore it demands a similar amount of computation to the alternating minimization algorithm in [34]. When $h = 0$, $\epsilon = 0$ and there is a single energy spectrum, this MI algorithm is identical to the algorithm given in [45] for maximum likelihood reconstruction in transmission tomography. Note that, unlike the optimization transfer and alternating minimization algorithms, the MI algorithm can easily be derived for other objective functions, such as the weighted least-squares function.
The above examples demonstrate that the MI algorithms are easy to derive and to implement in tomographic imaging. The line search step they require does not incur a significant computational burden.
6. Modified Fisher’s Method of Scoring Using Jacobi or Gauss–Seidel Over-Relaxations
In this section we elaborate on another non-negatively constrained method for tomographic imaging, which is a modification to the standard Fisher’s method of scoring (FS) algorithm. This method is developed based on the following steps. Firstly, the objective function is approximated by a quadratic function in each iteration, where the Fisher information matrix (e.g., [46]) is used to define the quadratic term; secondly, an over-relaxation method, either the Jacobi over-relaxation (JOR) or the Gauss–Seidel over-relaxation (also called the successive over-relaxation (SOR)), is employed to solve approximately the linear system derived from zeroing the derivative of this quadratic function. The resulting algorithms are called FS-JOR and FS-SOR and their detailed descriptions can be found in [47,48]. Descriptions of the JOR and SOR methods are available, for example, in [49].
FS is a general optimization algorithm for computing maximum likelihood estimates. Its advantages over the traditional Newton's method have been documented in [50]. Briefly, FS iterations are well defined due to the non-negative definiteness of the Fisher information matrix; for Newton's method, by contrast, the negative Hessian matrix may not be non-negative definite, so the method may fail to proceed in an uphill direction in some applications. Transmission tomography is an example where this problem for Newton's method indeed occurs; see Example 6.2.
We assume the objective function in Equation (2) is twice differentiable and let $F(x)$ be the Fisher information matrix, namely $F(x) = -E\big[ \partial^2 \Psi(x)/\partial x\, \partial x^T \big]$. At iteration $k+1$ of the Fisher scoring algorithm, $\Psi(x)$ is approximated by the following quadratic function:
$$\Psi(x) \approx \Psi(x^{(k)}) + \nabla \Psi(x^{(k)})^T (x - x^{(k)}) - \frac{1}{2} (x - x^{(k)})^T F^{(k)} (x - x^{(k)}), \quad (68)$$
where $F^{(k)}$ denotes the Fisher information matrix at $x^{(k)}$. Then the x estimate is updated by constrained maximization of this quadratic function, namely
$$x^{(k+1)} = \arg\max_{x \geq 0} \Big\{ \nabla \Psi(x^{(k)})^T (x - x^{(k)}) - \frac{1}{2} (x - x^{(k)})^T F^{(k)} (x - x^{(k)}) \Big\}. \quad (69)$$
The KKT conditions for this optimization are, for $j = 1, \ldots, p$,
$$x_j\, g_j(x) = 0, \qquad g_j(x) \leq 0 \ \text{if } x_j = 0, \qquad x_j \geq 0, \quad (70)\text{–}(72)$$
where
$$g_j(x) = \nabla_j \Psi(x^{(k)}) - F_j^{(k)} (x - x^{(k)}).$$
Here $F_j^{(k)}$ denotes the j-th row of matrix $F^{(k)}$. The JOR and SOR methods solve, for $j = 1, \ldots, p$,
$$g_j(x) = 0 \quad (73)$$
in different manners: JOR solves it by fixing all the x elements, except $x_j$, at their estimates from the last iteration (i.e., iteration k), whereas SOR solves it by fixing all the x elements, except $x_j$, at their most current estimates.
The above illustrations describe how to incorporate JOR or SOR sub-iterations into the FS algorithm. In fact, in each iteration, JOR or SOR is used to solve approximately the linear system of equations determined by the FS algorithm, and then this approximate solution is used as the starting value for the next FS iteration. These new schemes modify the standard FS method, and are feasible for large estimation problems.
Usually it suffices to run one JOR or SOR sub-iteration, but running more than one sub-iteration is also attractive as it has the potential to reduce the computations of the entire optimization process. Suppose that within each Fisher scoring iteration we run m sub-iterations of JOR or SOR. The resulting algorithms are called the m-step FS-JOR and m-step FS-SOR algorithms, respectively. Let r be the sub-iteration index for the over-relaxation method and $x^{(k,r)}$ the estimate of x at the r-th over-relaxation sub-iteration of the k-th FS iteration. Let $x_j^{(k,r)}$ be the j-th element of $x^{(k,r)}$. Assume $F_{jj}^{(k)} > 0$ for all j. At iteration $k+1$, first set $x^{(k,0)} = x^{(k)}$. If using JOR to solve Equation (73) we have
$$x_j^{(k,r+1)} = x_j^{(k,r)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( x^{(k,r)} - x^{(k)} \big) \Big), \quad (74)$$
and if using SOR to solve it we then have
$$x_j^{(k,r+1)} = x_j^{(k,r)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big) \Big), \quad (75)$$
where $\tilde{x}^{(k,r,j)} = \big( x_1^{(k,r+1)}, \ldots, x_{j-1}^{(k,r+1)}, x_j^{(k,r)}, \ldots, x_p^{(k,r)} \big)^T$ and ω is the relaxation parameter. If any $x_j^{(k,r+1)} < 0$ then it is reset to zero. This resetting is correct since the only possibility for $x_j^{(k,r+1)} < 0$ is that the expression in the round brackets of Equation (74) or (75) is negative, because $\omega > 0$ and $F_{jj}^{(k)} > 0$. Hence resetting to zero assures that the FS-JOR and FS-SOR algorithms converge to, when they converge, a solution satisfying the KKT conditions. At the end of the sub-iterations set $x^{(k+1)} = x^{(k,m)}$. Note that when $r = 0$, the last term in the round brackets of either Equation (74) or (75) becomes zero. Thus 1-step FS-JOR is basically a gradient algorithm and we can therefore replace ω by a line search step size $\alpha_k$, where the search range is fixed so that the estimate remains non-negative.
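A hedged sketch (not from the paper) of one JOR sub-iteration of Equation (74) with the KKT resetting, assuming a dense Fisher information matrix; in practice the product $F^{(k)} v$ would be evaluated with forward- and back-projections as described below:

```python
import numpy as np

def fs_jor_subiteration(x_r, x_k, grad_k, F_k, omega=1.0):
    """One JOR sub-iteration for the FS quadratic model:
    x_j <- x_j + (omega / F_jj) * (grad_j - [F (x_r - x_k)]_j),
    then reset negative values to zero (KKT resetting)."""
    residual = grad_k - F_k @ (x_r - x_k)   # bracketed term of Equation (74)
    x_new = x_r + omega * residual / np.diag(F_k)
    return np.maximum(x_new, 0.0)           # enforce non-negativity
```

With `x_r = x_k` (i.e., r = 0) the correction term vanishes and the step reduces to a diagonally scaled gradient step, as noted above.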
The relaxation parameter ω is used to achieve convergence of the FS-JOR and FS-SOR algorithms. Results contained in [47] give convergence properties when the non-negativity constraint is ignored: in this context FS-SOR converges if $0 < \omega < 2$, and FS-JOR converges if $0 < \omega < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of $D^{-1} F(\hat{x})$ with D the diagonal of $F(\hat{x})$. Here $\hat{x}$ is the MPL solution.
From the updating formulae given in Equations (74) and (75) we can see that both FS-JOR and FS-SOR involve the gradient $\nabla \Psi(x^{(k)})$ and the Fisher information based operation $F^{(k)} v$ for a vector v. The gradient is standard for most reconstruction algorithms, but the computation of $F^{(k)} v$ requires more careful consideration. It will become clear in Examples 6.1 and 6.2 that, for tomographic reconstructions, $F^{(k)}$ usually exhibits as $F^{(k)} = A^T W A$ (ignoring the penalty term), where W is diagonal. It is not wise to compute $A^T W A$ first, as this involves multiplications of two huge matrices. For FS-JOR, a feasible alternative is to use the forward projection to find $A v$ first, then to multiply it with the diagonal values of W to get $W A v$, and finally to back-project to obtain $A^T W A v$. This approach involves only one forward- and one back-projection in every sub-iteration. The situation for FS-SOR is more complicated since the vector being multiplied changes with the pixel index j. The above approach for FS-JOR cannot be used here, as otherwise each FS-SOR sub-iteration would demand an infeasible p pairs of forward- and back-projections. To confront this problem, let
$$u^{(k,r,j)} = A \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big). \quad (76)$$
The $F_j^{(k)} \big( \tilde{x}^{(k,r,j)} - x^{(k)} \big)$ part of Equation (75) involves $u^{(k,r,j)}$. Note that
$$u^{(k,r,j+1)} = u^{(k,r,j)} + a_{\cdot j} \big( x_j^{(k,r+1)} - x_j^{(k,r)} \big), \quad (77)$$
where $a_{\cdot j}$ denotes the j-th column of A, so we can start with $u^{(k,r,1)}$ and obtain the rest by applying Equation (77). Although the total number of multiplications for the $u^{(k,r,j)}$ then becomes the same as that of a single forward projection, this scheme requires column access to the system matrix A, which can be a problem if A is generated on-the-fly.
We next provide examples of applying FS-JOR and FS-SOR to emission and transmission tomography.
Example 6.1 (Emission scans with Poisson noise).
For emission reconstruction with Poisson noise, the log-density of $y_i$ is given by Equation (61). Thus, for the corresponding objective function of Equation (2), the gradient elements are
$$\nabla_j \Psi(x) = \sum_i a_{ij} \Big( \frac{y_i}{\mu_i(x)} - 1 \Big) - h\, \frac{\partial J(x)}{\partial x_j}, \quad (78)$$
and the Fisher information matrix elements are
$$F_{jl}(x) = \sum_i \frac{a_{ij}\, a_{il}}{\mu_i(x)} + h\, \frac{\partial^2 J(x)}{\partial x_j\, \partial x_l}, \quad (79)$$
where $\mu_i(x) = \langle a_i, x \rangle + r_i$; the first term of Equation (79) can be written as $A^T W A$ with $W = \mathrm{diag}\{1/\mu_i(x)\}$. Assuming we run only one sub-iteration of FS-JOR or FS-SOR (i.e., $m = 1$), the FS-JOR iterative formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}}\, \nabla_j \Psi(x^{(k)}) \Big]_+, \quad (80)$$
and the FS-SOR formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,0,j)} - x^{(k)} \big) \Big) \Big]_+. \quad (81)$$
Then $x^{(k+1)} = \big( x_1^{(k+1)}, \ldots, x_p^{(k+1)} \big)^T$. The formula given in Equation (80) is just a gradient algorithm, so ω can be replaced by a line search step size $\alpha_k$. Efficient computation of Equation (81) requires column access to matrix A, as explicated before. Hudson et al. [48] reported simulation results and a real data application for emission reconstruction. They compared FS-JOR and FS-SOR with EM. The computer time required per iteration for the EM and one-step FS-JOR algorithms was similar. By comparison with the EM algorithm, FS-JOR and FS-SOR accelerated convergence when an appropriate value of ω was used. In particular, FS-SOR had a superior speed of convergence.
Example 6.2 (Transmission scans with Poisson noise).
For transmission reconstructions with Poisson noise, we can easily work out the gradient and Fisher information matrix from the penalized likelihood function. The gradient elements are
$$\nabla_j \Psi(x) = \sum_i a_{ij}\, q_i(x) \Big( 1 - \frac{y_i}{\mu_i(x)} \Big) - h\, \frac{\partial J(x)}{\partial x_j}, \quad (82)$$
and the Fisher information matrix elements are
$$F_{jl}(x) = \sum_i a_{ij}\, a_{il}\, \frac{q_i(x)^2}{\mu_i(x)} + h\, \frac{\partial^2 J(x)}{\partial x_j\, \partial x_l}, \quad (83)$$
where $q_i(x) = b_i e^{-\langle a_i, x \rangle}$ and $\mu_i(x) = q_i(x) + r_i$. Note that for this example the Fisher information matrix is non-negative definite, but the negative Hessian matrix may not be, making the Newton method inapplicable. Corresponding to $m = 1$, the FS-JOR iterative formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}}\, \nabla_j \Psi(x^{(k)}) \Big]_+, \quad (84)$$
and the FS-SOR formula is
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \frac{\omega}{F_{jj}^{(k)}} \Big( \nabla_j \Psi(x^{(k)}) - F_j^{(k)} \big( \tilde{x}^{(k,0,j)} - x^{(k)} \big) \Big) \Big]_+. \quad (85)$$
Then $x^{(k+1)} = \big( x_1^{(k+1)}, \ldots, x_p^{(k+1)} \big)^T$. Again, Equation (84) is a gradient algorithm so a line search can be used, and efficient implementation of Equation (85) demands the unpleasant column access to A.
This section has explained the Fisher scoring based image reconstruction algorithms using JOR or SOR sub-iterations. For these algorithms, any negative estimates in each iteration can be corrected by simply resetting them to zero, as this way of resetting enforces the KKT conditions. If only one sub-iteration is used, FS-JOR is equivalent to a gradient algorithm. Efficient implementation of FS-SOR requires column retrieval of the system matrix A, which can be infeasible for some reconstruction problems.
7. Iterative Coordinate Ascent Algorithms
Another method using SOR is the method of iterative coordinate ascent (ICA) (or iterative coordinate descent (ICD) for minimization problems). ICA was first applied to tomographic imaging in [51,52]. The basic idea of ICA is to apply SOR directly to the objective function $\Psi(x)$, resulting in a sequence of 1-D functions, where each $x_j$ is associated with one of these 1-D functions. Each function is then maximized exactly or approximately to update the corresponding $x_j$. More specifically, using the SOR principle we can define a function of $x_j$ according to
$$\Psi_j(x_j) = \Psi\big( x_1^{(k+1)}, \ldots, x_{j-1}^{(k+1)}, x_j, x_{j+1}^{(k)}, \ldots, x_p^{(k)} \big). \quad (86)$$
This is a function of $x_j$ only, and we can update the estimate by
$$x_j^{(k+1)} = \arg\max_{x_j \geq 0} \Psi_j(x_j). \quad (87)$$
Since this is a 1-D function, the constraint $x_j \geq 0$ can be easily enforced using, for example, the resetting-to-zero approach.
One computational issue with ICA when applied to tomographic imaging is that it requires repeated calculations of the inner products $\langle a_i, x \rangle$ for all i when updating each $x_j$. This problem can be rectified by the following approach. Let
$$t_i^{(k,j)} = \big\langle a_i, \big( x_1^{(k+1)}, \ldots, x_{j-1}^{(k+1)}, x_j^{(k)}, \ldots, x_p^{(k)} \big)^T \big\rangle. \quad (88)$$
Consider the evaluation of $\Psi_{j+1}$. Assuming the update of $x_j$ is given by $x_j^{(k+1)}$, then only the j-th element of the current estimate has changed, and therefore
$$t_i^{(k,j+1)} = t_i^{(k,j)} + a_{ij} \big( x_j^{(k+1)} - x_j^{(k)} \big). \quad (89)$$
This relationship explains that $t_i^{(k,j+1)}$ can be cheaply computed using the value before the update plus a correction term. However, similar to FS-SOR, it necessitates column access to A. This can be a potential issue if A is generated on-the-fly.
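A tiny sketch (not from the paper) of the incremental refresh in Equation (89); `Ax` holds the current forward projection and `A[:, j]` is exactly the column access the text warns about:

```python
import numpy as np

def update_pixel(Ax, A, x, j, x_j_new):
    """Refresh the forward projection after a single-pixel update:
    A x_new = A x_old + a_j * (x_j_new - x_j_old), per Equation (89)."""
    Ax = Ax + A[:, j] * (x_j_new - x[j])  # one column, n multiplications
    x[j] = x_j_new
    return Ax, x
```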
Next we again use the emission and transmission examples to illustrate the ICA algorithm.
Example 7.1 (Emission scans with Poisson noise).
Firstly, we define
$$s_i^{(k,j)}(x_j) = t_i^{(k,j)} + a_{ij} \big( x_j - x_j^{(k)} \big), \quad (90)$$
with $t_i^{(k,j)}$ as in Equation (88). From the penalized log-likelihood function of emission measurements (see, for example, Equation (34)), the function $\Psi_j$ is given by
$$\Psi_j(x_j) = \sum_i \Big( y_i \log\big( s_i^{(k,j)}(x_j) + r_i \big) - \big( s_i^{(k,j)}(x_j) + r_i \big) \Big) - h\, J_j(x_j), \quad (91)$$
where $J_j(x_j)$ collects the penalty terms involving $x_j$. Since this is a non-quadratic function of $x_j$, exact maximization is infeasible. We can find an approximate optimum by running a single or multiple steps of, for example, the Newton or Fisher scoring algorithm. In this example we consider using the Fisher scoring algorithm to optimize $\Psi_j$, and we call the resulting algorithm ICA-FS. After a single step of Fisher scoring we have
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \alpha_k\, \frac{\dot{\Psi}_j(x_j^{(k)})}{F_{jj}^{(k,j)}} \Big]_+, \quad (92)$$
where $F_{jj}^{(k,j)}$ is the Fisher information of $\Psi_j$ at $x_j^{(k)}$ and $\alpha_k$ is a line search step size enforcing $\Psi_j(x_j^{(k+1)}) \geq \Psi_j(x_j^{(k)})$, where equality holds only when the algorithm has converged. This monotonic condition eventually leads to $\Psi(x^{(k+1)}) \geq \Psi(x^{(k)})$. The $t_i^{(k,j)}$'s are then updated via Equation (89).
Example 7.2 (Transmission scans with Poisson noise).
For this example we have
$$\Psi_j(x_j) = \sum_i \Big( y_i \log\big( b_i e^{-s_i^{(k,j)}(x_j)} + r_i \big) - \big( b_i e^{-s_i^{(k,j)}(x_j)} + r_i \big) \Big) - h\, J_j(x_j), \quad (93)$$
where $s_i^{(k,j)}(x_j)$ is defined in Equation (90). The ICA-FS algorithm gives
$$x_j^{(k+1)} = \Big[ x_j^{(k)} + \alpha_k\, \frac{\dot{\Psi}_j(x_j^{(k)})}{F_{jj}^{(k,j)}} \Big]_+, \quad (94)$$
where $F_{jj}^{(k,j)}$ is the Fisher information of $\Psi_j$ at $x_j^{(k)}$, and then the $t_i^{(k,j)}$'s are updated via Equation (89).
8. Conclusions
Image reconstruction from projections has wide applications, particularly in medical imaging. Emission and transmission tomography and X-ray CT all fall into this category. Three types of reconstruction methods are available: Fourier methods, algebraic methods and likelihood based reconstruction methods. Our attention in this paper is on the penalized likelihood approaches.
In this paper we have presented and discussed several important simultaneous MPL reconstruction algorithms in which the non-negativity constraint is enforced. The EM algorithm is limited to maximum likelihood reconstruction problems in emission tomography and is difficult to extend to other imaging modalities and probability models for the likelihood. One variation of EM, called alternating minimization, is developed for transmission tomography. Another variation of EM, called the OT algorithm, is suitable for any imaging modality and probability model, but its derivation is often cumbersome as the choice of the surrogate function is flexible. The OT algorithm based on the separable parabola surrogate is relatively easy to apply to different tomographic imaging problems. The MI algorithm, on the other hand, is easy to derive and to implement, as its line search step is cheap to compute. Its convergence speed, according to the simulation study, is similar to that of the separable parabola surrogate algorithm. The FS-JOR and FS-SOR algorithms first apply the Fisher information matrix to obtain a quadratic approximation to the objective function, and then optimize it using JOR or SOR schemes. Implementation of ICA-FS reverses the order of FS and SOR in FS-SOR. For both FS-SOR and ICA-FS, the convergence speeds are usually superior, but their potential problem is that both involve column retrieval of A, which may not be pre-generated and stored.
For some of the algorithms covered in this paper, corresponding block-iterative algorithms have been developed. Block-iterative algorithms can usually achieve faster convergence than their simultaneous counterparts. However, discussions of the block-iterative algorithms are not included in this paper.
Acknowledgements
I wish to thank the referees for their invaluable comments and suggestions which have greatly enhanced the quality of this paper.
References
- Phelps, M.E.; Hoffman, E.J.; Mullani, N.A.; Ter-Pogossian, M.M. Application of annihilation coincidence detection to transaxial reconstruction tomography. J. Nucl. Med. 1975, 16, 210–224.
- Bailey, D.L.; Townsend, D.W.; Valk, P.E.; Maisey, M.N. Positron Emission Tomography: Basic Sciences; Springer-Verlag: Secaucus, NJ, USA, 2005.
- Parra, L.; Barrett, H.H. List mode likelihood: EM algorithm and image quality estimation demonstrated on 2-D PET. IEEE Trans. Med. Imaging 1998, 17, 228–235.
- Barrett, J.F.; Keat, N. Artifacts in CT: Recognition and avoidance. RadioGraphics 2004, 24, 1679–1691.
- De Man, B.; Nuyts, J.; Dupont, P.; Marchal, G. Reduction of metal streak artifacts in X-ray computed tomography using a transmission maximum a posteriori algorithm. IEEE Trans. Nucl. Sci. 2000, 47, 977–981.
- Fessler, J.A. Penalized weighted least squares image reconstruction for PET. IEEE Trans. Med. Imaging 1994, 13, 290–300.
- Titterington, D.M. On the iterative image space reconstruction algorithm for ECT. IEEE Trans. Med. Imaging 1987, 6, 52–56.
- Shepp, L.A.; Vardi, Y. Maximum likelihood estimation for emission tomography. IEEE Trans. Med. Imaging 1982, MI-1, 113–121.
- Yavuz, M.; Fessler, J.A. Statistical image reconstruction methods for randoms-precorrected PET scans. Med. Image Anal. 1998, 2, 369–378.
- Whiting, B.R. Signal statistics in X-ray computed tomography. Proc. SPIE 2002, 4682 (Medical Imaging 2002: Physics of Medical Imaging), 53–60.
- Anderson, J.M.M.; Mair, B.A.; Rao, M.; Wu, C.H. Weighted least-squares reconstruction methods for positron emission tomography. IEEE Trans. Med. Imaging 1997, 16, 159–165.
- Veklerov, E.; Llacer, J. Stopping rule for the MLE algorithm based on statistical hypothesis testing. IEEE Trans. Med. Imaging 1987, 6, 313–319.
- Lange, K. Convergence of EM image reconstruction algorithms with Gibbs smoothing. IEEE Trans. Med. Imaging 1990, MI-9, 439–446.
- Lewitt, R.M. Multidimensional digital image representations using generalized Kaiser-Bessel window functions. J. Opt. Soc. Am. A 1990, 7, 1834–1846.
- Silverman, B.W.; Jones, M.C.; Wilson, J.D.; Nychka, D.W. A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography (with discussion). J. R. Stat. Soc. B 1990, 52, 271–324.
- Snyder, D.L.; Miller, M.I.; Thomas, L.J.; Politte, D.G. Noise and edge artifacts in maximum-likelihood reconstructions for emission tomography. IEEE Trans. Med. Imaging 1987, 6, 228–238.
- Fessler, J.A. Tomographic Reconstruction Using Information Weighted Smoothing Splines. In Information Processing in Medical Imaging; Barrett, H.H., Gmitro, A.F., Eds.; Springer-Verlag: Berlin, Germany, 1993; pp. 372–386.
- La Rivière, P.J.; Pan, X. Nonparametric regression sinogram smoothing using a roughness-penalized Poisson likelihood objective function. IEEE Trans. Med. Imaging 2000, 19, 773–786.
- Rudin, L.; Osher, S.; Fatemi, E. Nonlinear total variation based noise removal algorithms. Physica D 1992, 60, 259–268.
- Huber, P.J. Robust regression: Asymptotics, conjectures, and Monte Carlo. Ann. Stat. 1973, 1, 799–821.
- Yu, D.F.; Fessler, J.A. Edge-preserving tomographic reconstruction with nonlocal regularization. IEEE Trans. Med. Imaging 2002, 21, 159–173.
- Evans, J.D.; Politte, D.A.; Whiting, B.R.; O'Sullivan, J.A.; Williamson, J.F. Noise-resolution tradeoffs in X-ray CT imaging: A comparison of penalized alternating minimization and filtered backprojection algorithms. Med. Phys. 2011, 38, 1444–1458.
- Ma, J. Total Variation Smoothed Maximum Penalized Likelihood Tomographic Reconstruction with Positivity Constraints. In Proceedings of the 8th IEEE International Symposium on Biomedical Imaging, Chicago, IL, USA, April 2011; pp. 1774–1777.
- Sidky, E.Y.; Duchin, Y.; Pan, X.; Ullberg, C. A constrained, total-variation minimization algorithm for low-intensity X-ray CT. Med. Phys. 2011, 38, S117–S125.
- Lauzier, P.T.; Tang, J.; Chen, G.H. Quantitative evaluation method of noise texture for iteratively reconstructed X-ray CT images. Proc. SPIE 2011, 7961 (Medical Imaging 2011: Physics of Medical Imaging), Article 796135.
- Ma, J. Positively constrained multiplicative iterative algorithm for maximum penalized likelihood tomographic reconstruction. IEEE Trans. Nucl. Sci. 2010, 57, 181–192.
- Dempster, A.; Laird, N.; Rubin, D. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. B 1977, 39, 1–38.
- Wei, G.; Tanner, M. A Monte Carlo implementation of the EM algorithm and the Poor Man's data augmentation algorithm. J. Am. Stat. Assoc. 1990, 85, 699–704.
- Lange, K.; Carson, R. EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 1984, 8, 306–316.
- Ma, J. On iterative Bayes algorithms for emission tomography. IEEE Trans. Nucl. Sci. 2008, 55, 953–966.
- Green, P. Bayesian reconstruction from emission tomography data using a modified EM algorithm. IEEE Trans. Med. Imaging 1990, 9, 84–93.
- De Pierro, A.R. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Trans. Med. Imaging 1995, 14, 132–137.
- Csiszár, I.; Tusnády, G. Information geometry and alternating minimization procedures. Stat. Decis. 1984, Supplement Issue No. 1, 205–237.
- O'Sullivan, J.; Benac, J. Alternating minimization algorithms for transmission tomography. IEEE Trans. Med. Imaging 2007, 26, 283–297.
- Csiszár, I. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Stat. 1991, 19, 2032–2066.
- O'Sullivan, J.A.; Whiting, B.R.; Snyder, D.L. Alternating Minimization Algorithms for Transmission Tomography Using Energy Detectors. In Proceedings of the 36th Asilomar Conference on Signals, Systems and Computers, 2002; Volume 1, pp. 144–147.
- Lasio, G.M.; Whiting, B.R.; Williamson, J.F. Statistical reconstruction for X-ray computed tomography using energy-integrating detectors. Phys. Med. Biol. 2007, 52, 2247–2266.
- Lange, K.; Hunter, D.R.; Yang, I. Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 2000, 9, 1–20.
- Erdoğan, H.; Fessler, J.A. Monotonic algorithms for transmission tomography. IEEE Trans. Med. Imaging 1999, 18, 801–814.
- Böhning, D.; Lindsay, B.G. Monotonicity of quadratic approximation algorithms. Ann. Inst. Stat. Math. 1988, 40, 641–663.
- Chan, R.H.; Ma, J. A multiplicative iterative algorithm for box-constrained penalized likelihood image restoration. IEEE Trans. Image Process. 2012, 21, 3168–3181.
- Gasso, G.; Rakotomamonjy, A.; Canu, S. Recovering sparse signals with a certain family of non-convex penalties and DC programming. IEEE Trans. Signal Process. 2009, 57, 4686–4698.
- Luenberger, D. Linear and Nonlinear Programming, 2nd ed.; Wiley: New York, NY, USA, 1984.
- Ahn, S.; Fessler, J.A. Emission image reconstruction for randoms-precorrected PET allowing negative sinogram values. IEEE Trans. Med. Imaging 2004, 23, 591–601.
- Lange, K.; Bahn, M.; Little, R. A theoretical study of some maximum likelihood algorithms for emission and transmission tomography. IEEE Trans. Med. Imaging 1987, 6, 106–114.
- Ober, R.J.; Zou, Q.; Lin, Z. Calculation of the Fisher information matrix for multidimensional data sets. IEEE Trans. Signal Process. 2003, 51, 2679–2691.
- Ma, J.; Hudson, H.M. Modified Fisher scoring algorithms using Jacobi or Gauss-Seidel subiterations. Comput. Stat. 1997, 12, 467–479.
- Hudson, H.; Ma, J.; Green, P. Fisher's method of scoring in statistical image reconstruction: Comparison of Jacobi and Gauss-Seidel iterative schemes. Stat. Methods Med. Res. 1994, 3, 41–61.
- Ortega, J.M.; Rheinboldt, W.C. Iterative Solution of Nonlinear Equations in Several Variables; Academic Press: New York, NY, USA, 1970.
- Osborne, M.R. Fisher's method of scoring. Int. Stat. Rev. 1992, 60, 99–117.
- Sauer, K.; Bouman, C. A local update strategy for iterative reconstruction from projections. IEEE Trans. Signal Process. 1993, 41, 533–548.
- Bouman, C.A.; Sauer, K. A unified approach to statistical tomography using coordinate descent optimization. IEEE Trans. Image Process. 1996, 5, 480–492.