1. Introduction
The Kaczmarz algorithm is an iterative method for solving large sparse linear systems of the form

Ax = b, (1)

where A ∈ R^{m×n} and b ∈ R^m are given, and x ∈ R^n denotes the vector of unknowns. Let the rows of A be denoted by the row vectors a_1^T, …, a_m^T. Then, an equivalent way to write (1) is

a_i^T x = b_i, i = 1, …, m. (2)

The idea of the Kaczmarz method is to handle one equation at a time. Let ||·|| denote the Euclidean vector norm and let w be a preassigned relaxation parameter that satisfies 0 < w < 2. Then, the kth iteration of the Kaczmarz algorithm, k = 0, 1, 2, …, is composed of m steps. The ith step of the kth iteration, i = 1, …, m, starts with the vector z_{i−1} and ends with the vector

z_i = z_{i−1} + w (b_i − a_i^T z_{i−1}) a_i / ||a_i||². (3)

That is, the ith step uses only the ith equation. Observe that for w = 1, the point z_i is the projection of z_{i−1} on the hyperplane {x | a_i^T x = b_i}. Note also that the kth iteration, k = 0, 1, 2, …, starts with the vector z_0 = x_k and ends with z_m = x_{k+1}. The starting point is denoted as x_0.
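In code, one iteration of (3) amounts to a loop of m relaxed projections. The following NumPy sketch is our own illustration (the function names are not part of any package) and implements the sweep for general rows:

```python
import numpy as np

def kaczmarz_sweep(A, b, x, w=1.0):
    """One Kaczmarz iteration: m relaxed projections, one per equation."""
    for i in range(A.shape[0]):
        a = A[i]
        # Step (3): relaxed projection of x onto the hyperplane a^T x = b_i.
        x = x + w * (b[i] - a @ x) / (a @ a) * a
    return x

def kaczmarz(A, b, x0, w=1.0, iters=500):
    """Repeated sweeps, starting from x0."""
    x = x0.copy()
    for _ in range(iters):
        x = kaczmarz_sweep(A, b, x, w)
    return x
```

For w = 1 each step is an exact projection onto the corresponding hyperplane; for other values of w the step is a relaxed projection.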
The fact that the algorithm uses one row at a time makes it a popular tool for solving large sparse linear systems that arise in important applications, such as computerized tomography or digital signal processing. The literature on the Kaczmarz method is vast and covers various issues. See [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] and the references therein. Results on the theory behind the method and its rate of convergence can be found in [1,6,14,16,19,21,23,25], while efficient implementations and applications are considered in [3,4,10,12,13,14,15,19]. In addition, there are several variants of the basic iteration, such as block versions and parallel computing techniques [4,14,19]. In particular, recently, there has been growing interest in Randomized Kaczmarz methods [5,20,22,24]. Some of the variants are easily described by restating the algorithm with the restriction that each iteration regards only one equation, which is chosen according to some rule. In the basic iteration (3), the rows are chosen in a sequential “cyclic” manner. In “Greedy” Kaczmarz, we select an equation that has a maximal residual, while in “Randomized” Kaczmarz, the equation’s index i is selected at random, with probability proportional to ||a_i||², e.g., [5,20,21,22,24]. However, in the coming discussions, the terms “Kaczmarz method” and “Kaczmarz iteration” refer to (3). The original algorithm of Kaczmarz [16] is obtained from this framework when w = 1 and the rows are used in the cyclic order i = 1, …, m.
The use of the relaxation parameter, w, is motivated by the close relation with the SOR method for solving the linear system

A A^T y = b, (6)

where here, y ∈ R^m denotes the vector of unknowns. Let y_k denote the current solution at the end of the kth iteration of this method, k = 1, 2, …. Then, the following observation is well known, e.g., [1,6]. If the starting points satisfy

x_0 = A^T y_0, (7)

then the equality

x_k = A^T y_k (8)

holds for all k. This relation implies that several convergence properties of the Kaczmarz method are inherited from those of the SOR method.
Let the linear system

Ãx = b̃ (9)

be obtained from (2) by “normalizing” the rows of A to have unit length. That is,

ã_i = a_i / ||a_i|| and b̃_i = b_i / ||a_i||, i = 1, …, m. (10)

Then, it is easy to verify that applying the Kaczmarz method to solve (9) yields the same sequence as (3). Hence, when studying the convergence properties of the Kaczmarz method, there is no loss of generality in assuming that the rows of A have unit length. (A similar remark applies to the related SOR method.)
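This invariance is easy to check numerically. The sketch below is an illustration of ours (not taken from the paper's experiments): it runs one sweep on a random system and on its row-normalized counterpart and obtains identical iterates.

```python
import numpy as np

def kaczmarz_sweep(A, b, x, w=1.0):
    # Step (3); dividing by ||a_i||^2 makes each step scale-invariant.
    for i in range(A.shape[0]):
        a = A[i]
        x = x + w * (b[i] - a @ x) / (a @ a) * a
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)

# Normalize each row of A and scale the matching entry of b by the same factor.
norms = np.linalg.norm(A, axis=1, keepdims=True)
A_tilde, b_tilde = A / norms, b / norms.ravel()

x_plain = kaczmarz_sweep(A, b, np.zeros(4), w=0.8)
x_normed = kaczmarz_sweep(A_tilde, b_tilde, np.zeros(4), w=0.8)
```

Scaling the ith row and the ith entry of b by the same factor leaves the step (3) unchanged, so the two runs produce the same vector.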
In this paper, we consider an interesting feature of the Kaczmarz method. To illustrate this property, it is assumed that m is considerably larger than n. Let A_i ∈ R^{i×n} and b_i ∈ R^i be composed from the first i rows of A and b, respectively. That is,

A_i = [a_1, …, a_i]^T (11)

and

b_i = (b_1, …, b_i)^T. (12)

Then, the new feature is revealed when using the Kaczmarz method for solving the linear systems

A_i x = b_i, i = 1, …, m, (13)

and watching how the number of rows, i, affects the rate of convergence. If A is an arbitrary matrix, we are not expecting a certain behavior. However, as we shall see, in some cases, the number of rows has a dramatic effect on the rate of convergence. Assume first that i is considerably smaller than n. In this case, the Kaczmarz method has a fast rate of convergence. Yet as i increases toward n, the rate of convergence slows down. That is, the more equations we have, the more iterations are needed to solve the system. In particular, as i approaches n, there is a dramatic increase in the number of iterations. The closer i and n are, the slower the convergence. However, as i passes n, the situation is reversed. From now on, the more equations we have, the fewer iterations are needed. Finally, when i is considerably larger than n, the method returns to enjoy rapid convergence.

We call this behavior the Kaczmarz anomaly. One aim of this paper is to examine the presence of this phenomenon when solving tomography problems. The first report on the Kaczmarz anomaly appeared in [7], but it remained almost unnoticed. Recently, we have shown in [8] that it is likely to occur whenever the rows’ directions a_i / ||a_i||, i = 1, …, m, scatter randomly over some portion of the unit sphere. This suggests that a random shuffle of the rows may improve their randomness and strengthen the anomaly. A second aim of the paper is to examine this idea.
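Under the stated assumption of randomly scattered row directions, the anomaly can be reproduced on a small synthetic system. In this illustrative sketch (the names and problem sizes are our own choices, not the paper's test problems), a fixed number of Kaczmarz sweeps is applied to partial systems with few rows, with i = n rows, and with many rows:

```python
import numpy as np

def residual_after_sweeps(A, b, sweeps=50, w=1.0):
    """Run 'sweeps' Kaczmarz iterations from x = 0 and return ||Ax - b||."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            a = A[i]
            x = x + w * (b[i] - a @ x) * a   # rows are assumed to have unit length
    return np.linalg.norm(A @ x - b)

rng = np.random.default_rng(0)
m, n = 300, 50
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit-length rows, random directions
b = A @ rng.standard_normal(n)                   # consistent right-hand side

# Partial systems with i << n, i = n, and i >> n rows.
res = {i: residual_after_sweeps(A[:i], b[:i]) for i in (10, 50, 300)}
```

With such data, the residual left after the fixed number of sweeps is largest near i = n and much smaller on both sides, which is the behavior described above.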
The plan of the paper is as follows. The first two sections provide theoretical background that reveals the reasons behind the anomaly phenomenon.
Section 2 overviews the condition number anomaly phenomenon, while
Section 3 explains how the condition number of
A affects the rate of convergence. Combining these features yields the Kaczmarz anomaly. The background is based on recent results by this author; see [
8,
9]. The use of random shuffles and related techniques is considered in
Section 4. The paper ends with numerical experiments that illustrate the anomaly phenomena.
2. The Smallest Singular Value Anomaly and the Condition Number Anomaly
Let ρ_i denote the largest singular value of the matrix A_i, i = 1, …, m. Then, ρ_i² is the largest eigenvalue of the cross-product matrix A_i^T A_i, and there exists a unit eigenvector, v_i, that satisfies

A_i^T A_i v_i = ρ_i² v_i (14)

and

ρ_i = ||A_i v_i||. (15)

The first assertion characterizes the ascending behavior of the sequence ρ_1, …, ρ_m. (For proofs of the coming theorems, see [8].)

Theorem 1. For i = 1, …, m − 1, we have the inequalities

ρ_i ≤ ρ_{i+1} (16)

and

ρ_{i+1}² ≤ ρ_i² + (a_{i+1}^T v_{i+1})². (17)

Next, we explore the behavior of the smallest singular value, which is rather surprising. Let σ_i denote the smallest singular value of the matrix A_i, i = 1, …, m. Then, as the coming theorem shows, the first part of this sequence, σ_1, …, σ_n, is descending.
Theorem 2. Let u be a unit vector in the direction of a_{i+1}, let the unit vector v be a right singular vector of A_i that corresponds to σ_i, and assume that σ_i ≤ ||a_{i+1}||. Then, for i = 1, …, n − 1, we have the inequalities

σ_{i+1} ≤ σ_i (18)

and

σ_{i+1}² ≤ σ_i² (1 − (u^T v)²). (19)

The assumption σ_i ≤ ||a_{i+1}|| is not essential for the proof of Theorem 2, but it enables us to replace (18) with the sharper inequality (19). This exposes the actual reasons that force σ_{i+1} to be smaller than σ_i. One reason is the size of σ_i. We see that the smaller σ_i is, the smaller is σ_{i+1}. Another important factor is the size of the scalar product u^T v. Since both u and v are unit vectors, the Cauchy–Schwarz inequality implies |u^T v| ≤ 1, and equality occurs if and only if u = ±v. Now, from (19), we see that the larger |u^T v| is, the smaller is σ_{i+1}.

In this paper, we concentrate on the behavior of the Kaczmarz method, and for this purpose, it is possible to assume that all the rows of A have unit length. This assumption implies that ||a_{i+1}|| = 1, so the condition σ_i ≤ ||a_{i+1}|| reduces to σ_i ≤ 1, and (19) takes the form

σ_{i+1} ≤ σ_i (1 − (u^T v)²)^{1/2}. (20)

In particular, once σ_i becomes considerably smaller than one, this condition certainly holds, and the bound (20) applies throughout the descending part of the sequence.
It is left to see how the second part of the sequence, σ_n, …, σ_m, behaves. Below, we will show that this part is ascending. The proof is based on the observation that now, σ_i² is the smallest eigenvalue of the cross-product matrix A_i^T A_i. Hence, for i = n, …, m, there exists a unit eigenvector w_i such that

A_i^T A_i w_i = σ_i² w_i (21)

and

σ_i = ||A_i w_i||. (22)
Theorem 3. For i = n, …, m − 1, we have the inequalities

σ_i ≤ σ_{i+1} (23)

and

σ_{i+1}² ≥ σ_i² + (a_{i+1}^T w_{i+1})². (24)

Assume for a moment that σ_i > 0, which enables us to rewrite (24) in the form

(σ_{i+1} / σ_i)² ≥ 1 + (a_{i+1}^T w_{i+1})² / σ_i². (25)

Assume further that the rows of the matrix have unit length and random directions. Then, a small σ_i implies a large increase ratio, while a large σ_i means a slow increase. Consequently, when i is close to n, we expect a fast increase, but as i moves away from n, the rate of increase is likely to slow down.
Combining the results of Theorems 2 and 3 shows that the sequence σ_1, …, σ_n is decreasing, the sequence σ_n, …, σ_m is increasing, and σ_n is the smallest number in the whole sequence. Moreover, in some cases, σ_n can be considerably smaller than its neighbors. This behavior is called the smallest singular value anomaly.
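The smallest singular value anomaly is easy to observe on random matrices with unit-length rows. The following sketch (our own illustration, not one of the paper's examples) computes σ_min(A_i) for a nested family of submatrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
# Rows of unit length whose directions scatter randomly over the unit sphere.
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)

def smallest_sv(M):
    # Smallest singular value of the submatrix M.
    return np.linalg.svd(M, compute_uv=False)[-1]

sig = {i: smallest_sv(A[:i]) for i in (10, 25, 50, 100, 200)}
```

For such data the computed values decrease as i approaches n = 50 and increase afterwards, with a pronounced dip at i = n.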
Let κ_i = ρ_i / σ_i denote the condition number of A_i. In the rest of this section, we assume for simplicity that σ_i > 0 for i = 1, …, m. (The discussion of the case when σ_i = 0 is deferred to the next section. In this case, σ_i is redefined as the smallest nonzero singular value of A_i.) We have seen that the sequence ρ_1, …, ρ_n is ascending, while the sequence σ_1, …, σ_n is descending. This proves the following conclusion.
Theorem 4. The sequence κ_1, …, κ_n is ascending. That is,

κ_1 ≤ κ_2 ≤ ⋯ ≤ κ_n. (26)

The behavior of the sequence κ_n, …, κ_m is not that straightforward. We know that the sequences ρ_n, …, ρ_m and σ_n, …, σ_m are ascending, but this does not provide decisive information. Indeed, for i ≥ n, one can find examples in which κ_i < κ_{i+1} as well as examples with κ_i > κ_{i+1}, e.g., [8]. The condition number anomaly occurs when the sequence κ_n, …, κ_m is descending. That is, when

κ_n ≥ κ_{n+1} ≥ ⋯ ≥ κ_m. (27)

The reasons behind this behavior lie in the following observations.
Theorem 5. Let v_{i+1} be as in Theorem 1, let w_{i+1} be as in Theorem 3, and consider the terms

τ_i = (a_{i+1}^T v_{i+1})² / ρ_i² (28)

and

η_i = (a_{i+1}^T w_{i+1})² / σ_i². (29)

Then, for i = n, …, m − 1,

κ_{i+1}² ≤ κ_i² (1 + τ_i) / (1 + η_i). (30)

Proof. From (17), we see that

ρ_{i+1}² ≤ ρ_i² (1 + τ_i),

while Theorem 3 gives

σ_{i+1}² ≥ σ_i² (1 + η_i).

Hence, combining these inequalities yields (30). □
The last theorem is a key observation that indicates in which situations the condition number anomaly is likely to occur. Assume, for example, that the direction of a_{i+1} is chosen in some random way. Then, the scalar product terms (a_{i+1}^T v_{i+1})² and (a_{i+1}^T w_{i+1})² are likely to be about the same size. However, since σ_i is (considerably) smaller than ρ_i, the term η_i is expected to be larger than τ_i, which implies κ_{i+1} < κ_i. In other words, the condition number anomaly is likely to occur whenever the rows’ directions scatter in some random way. This conclusion means that the phenomenon is shared by a wide range of matrices. See Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 and the examples in [8].
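The descending behavior of the condition numbers beyond i = n can likewise be checked on a random matrix with unit-length rows. This is a small sketch of ours (the helper name is ours), using the smallest nonzero singular value as in the discussion above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 400, 50
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit-length rows, random directions

def cond_number(M):
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 1e-12]              # use the smallest *nonzero* singular value
    return s[0] / s[-1]

# Condition numbers of the partial matrices A_i for i = n and beyond.
kappa = [cond_number(A[:i]) for i in (50, 75, 100, 200, 400)]
```

For randomly scattered rows the computed sequence is descending, as the condition number anomaly predicts.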
3. The Rate of Convergence of the Kaczmarz Method
We have seen that the Kaczmarz method for solving (1) is closely related to the SOR method for solving the linear system

G y = b, (32)

where

G = A A^T. (33)

Below, we will use this relation to obtain the iteration matrix of the Kaczmarz method (3). As before, it is allowed to assume that the rows of A have unit length. That is,

||a_i|| = 1, i = 1, …, m. (34)

This assumption implies that G has the form

G = I + L + L^T, (35)

where I denotes the identity matrix, and L is a strictly lower triangular matrix. The SOR iteration splits G in the form

G = M − N, (36)

where

M = (1/w) I + L (37)

and

N = ((1 − w)/w) I − L^T. (38)

As before, w is a preassigned relaxation parameter that satisfies 0 < w < 2. The kth SOR iteration, k = 0, 1, 2, …, starts with y_k and ends with y_{k+1}, which is computed by solving the linear system

M y_{k+1} = N y_k + b. (39)

In other words, y_{k+1} is obtained from y_k by the rule

y_{k+1} = H y_k + M^{−1} b, (40)

where

H = M^{−1} N (41)

is the related iteration matrix, and

M^{−1} = w (I + wL)^{−1}. (42)

Observe that (42) enables us to express (40) in the form

y_{k+1} = H y_k + w (I + wL)^{−1} b. (43)

Multiplying (43) by A^T gives

A^T y_{k+1} = A^T H y_k + w A^T (I + wL)^{−1} b, (44)

while substituting x_k = A^T y_k and using the identity A^T H = (I − w A^T (I + wL)^{−1} A) A^T shows that

x_{k+1} = Q x_k + w A^T (I + wL)^{−1} b. (45)

This means that the iteration matrix of the Kaczmarz method has the form

Q = I − w A^T (I + wL)^{−1} A. (46)

Note that Q is an n × n matrix, while H is an m × m matrix. However, as shown in [9], these matrices share the nonzero eigenvalues.
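The affine form of the sweep is easy to check numerically. The sketch below (our own helper names, assuming unit-length rows) compares one explicit sweep of (3) with the map x ↦ Qx + wA^T(I + wL)^{−1}b:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, w = 6, 4, 1.2
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit rows, so diag(AA^T) = I
b = rng.standard_normal(m)

# L is the strictly lower triangular part of G = AA^T.
L = np.tril(A @ A.T, -1)
Minv = np.linalg.inv(np.eye(m) + w * L)          # (I + wL)^{-1}
Q = np.eye(n) - w * A.T @ Minv @ A               # iteration matrix of the sweep
c = w * A.T @ Minv @ b                           # constant term of the affine map

def kaczmarz_sweep(A, b, x, w):
    for i in range(A.shape[0]):
        a = A[i]
        x = x + w * (b[i] - a @ x) * a           # rows already have unit length
    return x

x0 = rng.standard_normal(n)
x1_sweep = kaczmarz_sweep(A, b, x0, w)
x1_matrix = Q @ x0 + c
```

The two vectors agree, which confirms that one Kaczmarz sweep is exactly one step of the affine iteration.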
The theory of the Kaczmarz method tells us that the sequence x_k, k = 0, 1, 2, …, converges for any choice of x_0 and w, e.g., [3,6,9,10,14,19,22,25]. Moreover, let x* denote the limit point of this sequence; then, the error vectors e_k = x_k − x* satisfy

e_{k+1} = Q e_k, k = 0, 1, 2, … .

This shows that the rate of convergence depends on ρ(Q), the spectral radius of Q. The smaller ρ(Q) is, the faster the convergence. It is interesting, therefore, to see which properties of A make ρ(Q) small. One answer is given by the following bound, which has recently been derived in [9]. (A second answer is given in the next section, in which we consider the effect of rows shuffling.)
Let σ_max denote the largest singular value of A, let σ̂ denote the smallest nonzero singular value of A, and let κ = σ_max / σ̂ denote the related condition number of A. Let w* denote the optimal relaxation parameter of the Kaczmarz method. That is,

ρ(Q(w*)) = min { ρ(Q(w)) | 0 < w < 2 },

where Q(w) denotes the iteration matrix of the Kaczmarz method, viewed as a function of w. Then, it is proved in [9] that ρ(Q(w*)) is bounded by a quantity of the form 1 − c/κ², where c is a positive constant. The bound is not tight, and the actual rate of convergence (even for w = 1) is often faster than the implied rate. Recall that in many practical problems, w* is not known in advance, and its value is computed by repeated experiments; see Section 5. Yet the main consequence from this bound is that a small condition number forces fast convergence. Conversely, for a large condition number, the bound tends to 1, which allows slow convergence. Indeed, as explained in [9], the existence of small nonzero singular values invites a slow rate of convergence.
The relation between the condition number and the rate of convergence suggests that the Kaczmarz anomaly phenomenon is caused by the condition number anomaly. We have seen that the last phenomenon is expected to occur whenever the rows’ directions scatter randomly. This raises the question of whether tomography problems possess these properties. The next sections attempt to answer this question.
4. From Random Shuffles to Optimal Ordering
The Kaczmarz anomaly phenomenon is observed by watching how the number of rows affects the rate of convergence. Another property that affects the rate of convergence is row ordering. The initial ordering in tomography problems is often rather poor, in the sense that it yields a slow rate of convergence. A typical tomography matrix is composed from several blocks of rows, where each block is generated by one “view”. The views (and the blocks) are ordered according to the size of the view angle, which is the natural geometric order, e.g., [15], p. 602. Yet this natural order minimizes the angle between adjacent views, which is a property that retards convergence. A possible remedy for this difficulty is to apply a random shuffle of the rows before starting the Kaczmarz process. The shuffle is aimed at achieving a faster rate of convergence (see below). Yet, at the same time, it improves the randomness of the rows’ directions, which sharpens the anomaly phenomena.
The term “random shuffle” means that the rows of the linear system (1) are reordered by applying a random permutation. This converts (1) into the form

P A x = P b, (50)

where P is a random permutation matrix. To simplify the coming discussions and experiments, we assume that the random permutation is chosen by MATLAB’s command “randperm”, and the shuffled matrix P A is generated by the command “shuffle(A)”, which uses the “randperm” command.
The reordering of the rows is expected to change the iteration matrix of the Kaczmarz method as well as its rate of convergence. It is easy to verify this assertion by using the relation with the SOR method for solving the system

(P A)(P A)^T z = P b. (51)

Now, it is easy to see that the SOR iteration for solving (51) differs from that of (6). Indeed, the observation that the reordering of rows changes the rate of convergence of the SOR method is not new. See, for example, [27] and the references therein.
As mentioned above, it has been observed by several authors that when solving tomography problems, a random row shuffle may improve the rate of convergence of the Kaczmarz method. See, for example, [14,15,19,21,22,24] and the references therein. A possible explanation of this phenomenon comes from a geometric interpretation of the basic step (3) when w = 1 and the rows have unit length. Let θ denote the angle between two adjacent rows, a_i and a_{i+1}. Then, in two-dimensional space, the distance to the solution point is reduced by the factor |cos θ|. That is, a small angle yields a small reduction, while a large angle implies a large reduction. When moving to larger dimensions, the situation is not that simple, but it is still true that a small θ forces a small step toward the solution, while a large θ allows larger steps. (Recall that a large θ means that a_i and a_{i+1} are nearly orthogonal.) These considerations suggest that a random shuffle may improve the rate of convergence if it improves orthogonality between adjacent rows.
Similar arguments have motivated Herman and Meyer [
15] to propose an optimal ordering of rows that takes advantage of the special structure of tomography problems to maximize orthogonality between adjacent rows. Further optimal ordering schemes that follow this approach are described in [
11,
17,
18].
The observation that a random ordering of rows may improve the rate of convergence has motivated the Randomized Kaczmarz algorithm of Strohmer and Vershynin [24]. In this algorithm, the basic step treats one equation whose index is selected at random with probability proportional to ||a_i||². Thus, when all the rows have unit length, all the indices have equal probability. (In our experiments, the row index is obtained by the “randi” command.)
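A minimal sketch of this selection rule (our own illustration, using NumPy's random generator in place of MATLAB's “randi”) looks as follows:

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, steps=3000, seed=0):
    """Randomized Kaczmarz: each step treats one equation whose index is
    drawn with probability proportional to the squared row norm ||a_i||^2."""
    rng = np.random.default_rng(seed)
    p = (A * A).sum(axis=1)
    p = p / p.sum()                       # Strohmer-Vershynin sampling probabilities
    x = x0.copy()
    for _ in range(steps):
        i = rng.choice(A.shape[0], p=p)   # random row index
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a
    return x
```

When all rows have unit length, p is the uniform distribution, and each step reduces to an ordinary projection onto one randomly chosen hyperplane.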
The use of a random shuffle has recently been considered by Oswald and Zhou [
21,
22], who proposed an improved randomized method, the
Shuffled Kaczmarz algorithm. In this method, each iteration is preceded by a random shuffle of the rows. This formulation has two advantages. First, as in the Kaczmarz method, each iteration treats all the equations. Second, since all the shuffled matrices have the same singular values as
A, the bound on the rate of convergence is the same as in the Kaczmarz method.
It is interesting to compare the above randomized methods with the Initial Shuffle method, which uses one random shuffle before starting the Kaczmarz algorithm. Both approaches share the same motivation: if the given system has a bad ordering, then a random shuffle is likely to provide a better one. Furthermore, basically, we are not expecting large differences in the quality of the generated random shuffles. Hence, in practice, the initial shuffle method is likely to run at the same speed as the randomized Kaczmarz methods. Yet, since it uses only one shuffle, there remains a tiny probability of obtaining a bad ordering, a possibility that the randomized algorithms avoid.
5. Numerical Experiments
The experiments examine the behavior of the Kaczmarz method (3) when solving tomography test problems. The test problems are generated by using MATLAB’s functions from “AIR tools”, which is a MATLAB package of algebraic reconstruction iterative methods prepared by P.C. Hansen and others [12,13]. The test problems imitate the scanning of an N × N array of square cells. This generates a linear system with n = N² unknowns. (The unknowns represent the densities of the cells, while the equations describe rays.) In our experiments, N = 20, and all the test matrices have n = 400 columns. The number of rows depends on the nature of the scanning device and the specific details of the experiment.
The Parallel beam tomography problems are generated by using MATLAB’s function “paralleltomo” with N = 20, theta = 1:1:180, and p = 28. (The vector theta contains the angles of the views, while p denotes the number of parallel rays for each view.) This results in a linear system with 400 unknowns and 180 × 28 = 5040 equations. However, if a ray passes outside the array, it generates a null row. So, the zero rows are removed, which yields a linear system with 400 unknowns and 4340 rows.
In Fan beam tomography, each angle (each view) is related to a “fan” of p rays, and the problem is generated by using the function “fanbeamtomo” with N = 20 and the default values theta = 0:1:359 and p = 28. This builds a linear system with 400 unknowns and 360 × 28 = 10,080 equations. Then, after removing zero rows, we remain with 9520 equations.
In Seismic tomography problems, the linear system is generated by applying the function “seismictomo” with N = 20, a prescribed number of sources, and a prescribed number of receivers. This setting builds a linear system with 400 unknowns and 3200 equations. (In this case, there are no zero rows.)
The experiments were carried out as follows. At first, we have generated an m × n linear system, Ax = b, as described above. Together with A and b, we are given a prescribed solution, x*, which is the one that has been used to build A and b. Then, in the second stage, the rows of A are normalized to have a unit norm. Thus, for i = 1, …, m, the ith row of A is redefined as a_i / ||a_i||, while b_i is updated as b_i / ||a_i||. Finally, after the normalization, the Kaczmarz method was applied to solve partial systems of the form (13). The starting point in our runs is always x_0 = 0, and the iterative process was terminated after 666 iterations.
The shuffled test problems are obtained by reordering the rows of A, using a random permutation. The actual reordering is carried out by applying the “shuffle” function mentioned above. After the shuffling, the vector b is redefined as b = A x*, where x* denotes the known solution. (The shuffling takes place after the normalization but before starting the solution of the partial linear systems.)
The rows in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 describe the use of the Kaczmarz method to solve partial linear systems of the form

A_i x = b_i. (52)

Recall that A_i is an i × n submatrix of A which is composed from the first i rows of A, and b_i is composed from the first i entries of b. The results for the linear system (52) start with the number of rows, i, and the number of zero singular values of A_i. Then, we provide the values of ρ_i, σ_i, and κ_i, as well as the related residual values. As noted in the tables’ headlines, ρ_i is the largest singular value of A_i, σ_i is the smallest nonzero singular value of A_i, and κ_i = ρ_i / σ_i is the condition number of A_i. The residual values are defined as

||A_i x̂ − b_i||,

where ||·|| denotes the Euclidean vector norm and x̂ denotes the computed solution after 666 iterations. The Kaczmarz method uses a relaxation parameter, w, and the residual values are given for w = 1 and for the optimal value, w = w*. The value of w* was obtained by running the Kaczmarz method with a grid of values of w that covers the interval (0, 2) and taking a value of w that yields the smallest residual.
Table 1.
Parallel beam tomography.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
50 | 6 | 1.723 | 6.46 × 10 | 2.67 × 10 | 1.91 × 10 | 7.95 × 10 | 1.4 |
100 | 8 | 2.213 | 9.77 × 10 | 2.26 × 10 | 1.49 × 10 | 1.12 × 10 | 1.5 |
200 | 8 | 3.133 | 7.50 × 10 | 4.17 × 10³ | 3.83 × 10 | 2.77 × 10 | 1.2 |
300 | 20 | 3.740 | 5.18 × 10 | 7.22 × 10³ | 1.38 × 10 | 1.27 × 10 | 1.1 |
360 | 27 | 4.105 | 2.91 × 10 | 1.41 × 10 | 2.04 × 10 | 2.02 × 10 | 1.1 |
380 | 31 | 4.218 | 2.67 × 10 | 1.58 × 10 | 2.37 × 10 | 2.37 × 10 | 1.0 |
400 | 37 | 4.307 | 2.12 × 10 | 2.03 × 10 | 2.84 × 10 | 2.84 × 10 | 1.0 |
420 | 22 | 4.363 | 5.09 × 10 | 8.58 × 10 | 3.10 × 10 | 3.09 × 10 | 0.9 |
440 | 16 | 4.468 | 7.07 × 10 | 6.32 × 10 | 3.13 × 10 | 3.13 × 10 | 1.0 |
500 | 11 | 4.774 | 2.15 × 10 | 2.22 × 10 | 2.82 × 10 | 2.82 × 10 | 1.0 |
1000 | 0 | 6.504 | 2.47 × 10 | 2.64 × 10³ | 3.11 × 10 | 2.83 × 10 | 0.6 |
2000 | 0 | 9.111 | 2.29 × 10 | 3.97 × 10 | 1.48 × 10 | 1.24 × 10 | 0.6 |
3000 | 0 | 11.246 | 6.87 × 10 | 1.64 × 10 | 6.82 × 10 | 3.64 × 10 | 0.2 |
4000 | 0 | 12.898 | 1.01 × 10 | 1.28 × 10 | 4.66 × 10 | 2.14 × 10 | 0.2 |
4340 | 0 | 13.498 | 1.15 × 10 | 1.18 × 10 | 3.76 × 10 | 1.76 × 10 | 0.2 |
Table 2.
Parallel beam with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 1 | 1.786 | 8.94 × 10 | 2.00 × 10 | 5.79 × 10 | 2.08 × 10 | 1.5 |
50 | 1 | 1.966 | 6.61 × 10 | 2.97 × 10 | 4.15 × 10 | 5.31 × 10 | 1.6 |
60 | 1 | 1.999 | 6.57 × 10 | 3.04 × 10 | 6.41 × 10 | 9.79 × 10 | 1.6 |
100 | 1 | 2.600 | 6.41 × 10 | 3.52 × 10 | 1.43 × 10 | 1.48 × 10 | 1.6 |
200 | 5 | 3.064 | 1.31 × 10 | 2.34 × 10 | 3.14 × 10 | 8.42 × 10 | 1.6 |
300 | 9 | 3.694 | 7.37 × 10 | 5.01 × 10 | 8.55 × 10 | 7.55 × 10 | 1.4 |
360 | 15 | 4.026 | 1.41 × 10 | 2.85 × 10³ | 1.64 × 10 | 1.60 × 10 | 0.9 |
380 | 18 | 4.119 | 3.94 × 10 | 1.05 × 10 | 1.95 × 10 | 1.81 × 10 | 0.8 |
400 | 22 | 4.200 | 4.41 × 10⁻⁴ | 9.53 × 10³ | 1.72 × 10⁻³ | 1.59 × 10⁻³ | 0.8 |
420 | 5 | 4.304 | 1.09 × 10 | 3.96 × 10 | 1.86 × 10 | 1.80 × 10 | 0.9 |
440 | 0 | 4.407 | 6.25 × 10 | 7.05 × 10³ | 1.73 × 10 | 1.71 × 10 | 0.9 |
500 | 0 | 4.678 | 4.77 × 10 | 9.80 × 10 | 1.59 × 10 | 1.58 × 10 | 1.1 |
1000 | 0 | 6.534 | 3.86 × 10 | 1.69 × 10 | 1.18 × 10 | 1.40 × 10 | 1.95 |
2000 | 0 | 9.216 | 7.22 × 10 | 1.28 × 10 | 7.45 × 10 | 4.89 × 10 | 1.8 |
3000 | 0 | 11.243 | 9.15 × 10 | 1.23 × 10 | 1.03 × 10 | 2.00 × 10 | 1.8 |
4000 | 0 | 12.965 | 1.10 × 10 | 1.18 × 10 | 9.63 × 10 | 1.07 × 10 | 1.95 |
4340 | 0 | 13.498 | 1.15 × 10 | 1.18 × 10 | 4.36 × 10 | 9.80 × 10 | 1.8 |
Table 3.
Fan beam tomography.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 3 | 1.601 | 8.98 × 10 | 1.784 × 10 | 2.29 × 10 | 6.29 × 10 | 1.4 |
50 | 3 | 1.601 | 8.98 × 10 | 1.784 × 10 | 2.07 × 10 | 5.64 × 10 | 1.5 |
100 | 11 | 2.275 | 1.37 × 10 | 1.658 × 10³ | 1.51 × 10 | 6.36 × 10 | 1.2 |
200 | 31 | 3.165 | 2.59 × 10 | 1.220 × 10 | 3.37 × 10 | 2.94 × 10 | 1.2 |
300 | 55 | 3.812 | 7.03 × 10 | 5.418 × 10³ | 1.40 × 10 | 1.37 × 10 | 1.1 |
360 | 64 | 4.068 | 5.17 × 10 | 7.875 × 10³ | 1.93 × 10 | 1.89 × 10 | 0.9 |
380 | 68 | 4.213 | 4.70 × 10 | 8.969 × 10³ | 2.11 × 10 | 2.09 × 10 | 0.9 |
400 | 74 | 4.346 | 9.89 × 10 | 4.393 × 10 | 2.25 × 10 | 2.24 × 10 | 0.9 |
420 | 61 | 4.452 | 8.22 × 10 | 5.416 × 10 | 2.77 × 10 | 2.72 × 10 | 0.9 |
440 | 44 | 4.531 | 3.81 × 10 | 1.190 × 10 | 2.95 × 10 | 2.93 × 10 | 0.9 |
500 | 12 | 4.810 | 3.13 × 10 | 1.537 × 10 | 3.05 × 10 | 2.85 × 10 | 0.7 |
1000 | 0 | 6.640 | 3.60 × 10 | 1.845 × 10³ | 3.12 × 10 | 2.84 × 10 | 0.5 |
2000 | 0 | 9.317 | 1.37 × 10 | 6.792 × 10 | 1.79 × 10 | 1.68 × 10 | 0.6 |
3000 | 0 | 11.435 | 4.13 × 10 | 2.767 × 10 | 3.10 × 10 | 2.16 × 10 | 0.4 |
4000 | 0 | 13.141 | 5.93 × 10 | 2.217 × 10 | 3.13 × 10 | 2.28 × 10 | 0.4 |
8000 | 0 | 18.460 | 8.99 × 10 | 2.053 × 10 | 1.90 × 10 | 1.32 × 10 | 0.2 |
9520 | 0 | 20.106 | 9.76 × 10 | 2.06 × 10 | 1.05 × 10 | 7.77 × 10 | 0.4 |
Table 4.
Fan beam with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 0 | 1.589 | 2.63 × 10 | 6.04 | 1.08 × 10 | 4.90 × 10 | 0.7 |
50 | 0 | 1.764 | 2.53 × 10 | 6.96 | 1.35 × 10 | 5.91 × 10 | 0.9 |
100 | 2 | 2.254 | 4.08 × 10 | 5.52 × 10 | 1.37 × 10 | 1.23 × 10 | 0.9 |
200 | 5 | 3.056 | 2.75 × 10 | 1.11 × 10³ | 3.89 × 10 | 3.88 × 10 | 1.1 |
300 | 8 | 3.700 | 1.50 × 10 | 2.47 × 10³ | 1.23 × 10 | 1.22 × 10 | 0.9 |
360 | 10 | 4.041 | 4.04 × 10 | 1.00 × 10 | 1.25 × 10 | 1.25 × 10 | 1.0 |
380 | 13 | 4.139 | 2.32 × 10 | 1.78 × 10 | 1.28 × 10 | 1.26 × 10 | 1.2 |
400 | 17 | 4.240 | 3.17 × 10 | 1.34 × 10 | 1.28 × 10 | 1.26 × 10 | 0.9 |
420 | 1 | 4.332 | 1.13 × 10 | 3.83 × 10 | 1.71 × 10 | 1.52 × 10 | 0.7 |
440 | 0 | 4.430 | 7.70 × 10 | 5.75 × 10³ | 1.49 × 10 | 1.47 × 10 | 0.9 |
500 | 0 | 4.699 | 4.19 × 10 | 1.12 × 10³ | 7.96 × 10 | 7.92 × 10 | 1.1 |
1000 | 0 | 6.620 | 2.60 × 10 | 2.54 × 10 | 9.46 × 10 | 9.46 × 10 | 1.0 |
2000 | 0 | 9.288 | 4.15 × 10 | 2.24 × 10 | 5.13 × 10 | 2.98 × 10 | 1.7 |
3000 | 0 | 11.291 | 5.08 × 10 | 2.22 × 10 | 2.65 × 10 | 4.51 × 10 | 1.95 |
4000 | 0 | 13.005 | 5.96 × 10 | 2.18 × 10 | 1.45 × 10 | 4.27 × 10 | 1.8 |
8000 | 0 | 18.420 | 9.01 × 10 | 2.04 × 10 | 8.79 × 10 | 1.13 × 10 | 1.9 |
9520 | 0 | 20.106 | 9.76 × 10 | 2.06 × 10 | 3.23 × 10 | 9.81 × 10 | 1.95 |
Table 5.
Seismic tomography problems.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 1 | 2.929 | 4.73 × 10 | 6.19 × 10 | 9.55 × 10 | 4.61 × 10 | 1.6 |
50 | 1 | 3.103 | 2.93 × 10 | 1.06 × 10 | 7.95 × 10 | 2.75 × 10 | 1.6 |
60 | 1 | 3.199 | 2.92 × 10 | 1.09 × 10 | 9.29 × 10 | 1.95 × 10 | 1.6 |
100 | 5 | 3.803 | 2.92 × 10 | 1.30 × 10 | 6.44 × 10 | 1.30 × 10 | 1.6 |
200 | 11 | 5.159 | 1.06 × 10 | 4.89 × 10³ | 4.40 × 10 | 3.81 × 10 | 0.5 |
300 | 24 | 6.154 | 4.33 × 10 | 1.42 × 10 | 3.92 × 10 | 3.88 × 10 | 0.6 |
360 | 33 | 6.600 | 1.22 × 10 | 5.42 × 10 | 5.06 × 10 | 4.25 × 10 | 0.4 |
380 | 34 | 6.773 | 1.06 × 10 | 6.41 × 10 | 4.47 × 10 | 4.19 × 10 | 0.5 |
400 | 45 | 6.859 | 2.40 × 10 | 2.86 × 10 | 4.82 × 10 | 4.40 × 10 | 0.5 |
420 | 34 | 6.985 | 2.41 × 10 | 2.90 × 10 | 4.88 × 10 | 4.44 × 10 | 0.5 |
440 | 32 | 7.173 | 2.43 × 10 | 2.95 × 10 | 4.27 × 10 | 3.79 × 10 | 0.6 |
500 | 26 | 7.488 | 1.06 × 10 | 7.03 × 10 | 4.72 × 10 | 4.27 × 10 | 0.5 |
1000 | 6 | 9.807 | 1.89 × 10 | 5.18 × 10³ | 2.61 × 10 | 2.43 × 10 | 0.5 |
2000 | 4 | 12.513 | 9.91 × 10 | 1.26 × 10³ | 6.54 × 10 | 6.45 × 10 | 1.2 |
3000 | 4 | 14.474 | 2.79 × 10 | 5.18 × 10 | 4.56 × 10 | 4.56 × 10 | 1.0 |
3200 | 4 | 14.902 | 2.87 × 10 | 5.19 × 10 | 4.51 × 10 | 4.51 × 10 | 1.0 |
Table 6.
Seismic tomography problems with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 0 | 1.940 | 3.46 × 10 | 5.61 | 1.04 × 10 | 1.91 × 10 | 0.4 |
50 | 0 | 2.099 | 1.70 × 10 | 1.23 × 10 | 1.75 × 10 | 1.12 × 10 | 1.2 |
60 | 0 | 2.341 | 1.17 × 10 | 2.01 × 10 | 6.16 × 10 | 1.30 × 10 | 1.3 |
100 | 1 | 2.876 | 8.21 × 10 | 3.50 × 10 | 1.69 × 10 | 1.38 × 10 | 1.1 |
200 | 2 | 3.818 | 4.19 × 10 | 9.11 × 10 | 4.00 × 10 | 3.94 × 10 | 0.9 |
300 | 8 | 4.719 | 2.22 × 10 | 2.12 × 10 | 4.24 × 10 | 3.67 × 10 | 0.7 |
360 | 21 | 5.075 | 5.56 × 10 | 9.13 × 10 | 3.37 × 10 | 3.28 × 10 | 1.2 |
380 | 32 | 5.225 | 6.33 × 10 | 8.25 × 10 | 2.87 × 10 | 2.87 × 10 | 1.0 |
400 | 40 | 5.365 | 3.28 × 10 | 1.64 × 10 | 2.09 × 10 | 2.09 × 10 | 1.1 |
420 | 32 | 5.514 | 5.29 × 10 | 1.04 × 10 | 2.21 × 10 | 2.20 × 10 | 1.1 |
440 | 24 | 5.656 | 1.15 × 10 | 4.93 × 10 | 2.05 × 10 | 2.05 × 10 | 0.9 |
500 | 21 | 6.034 | 6.29 × 10 | 9.59 × 10 | 1.95 × 10 | 1.94 × 10 | 0.9 |
1000 | 10 | 8.370 | 4.37 × 10 | 1.92 × 10³ | 6.94 × 10 | 6.94 × 10 | 1.0 |
2000 | 6 | 11.857 | 2.09 × 10 | 5.67 × 10 | 3.16 × 10 | 2.52 × 10 | 1.5 |
3000 | 4 | 14.462 | 1.96 × 10 | 7.37 × 10 | 1.90 × 10 | 1.36 × 10 | 1.6 |
3200 | 4 | 14.902 | 2.87 × 10 | 5.19 × 10 | 1.56 × 10 | 1.08 × 10 | 1.6 |
The reading of the tables is simple. Consider, for example, Table 2 when the number of rows equals 400. In this case, the related matrix A_400 has 22 zero singular values, σ_400 = 4.41 × 10⁻⁴, κ_400 = 9.53 × 10³, and the residual values are 1.72 × 10⁻³ for w = 1 and 1.59 × 10⁻³ for w = w*.
The experiments reveal interesting features of the anomaly phenomena. First, note the slow increase of the sequence ρ_1, …, ρ_m. We see that ρ_i is considerably smaller than √i, the largest possible value for a matrix with i unit-length rows. Moreover, the larger i is, the smaller the ratio ρ_i / √i. This behavior is due to the fact that the rows have unit length and random directions; see (17).
The second remark is about the smallest singular value anomaly and the related condition number anomaly. The derivation of these properties relies on the assumption that the submatrices A_i, i = 1, …, m, do not have zero singular values. Yet, as our tables show, several submatrices have zero singular values. Consequently, in some cases, we can see a slight violation of the anomaly behavior.
The third point is about the use of an initial random shuffle. Note that the shuffle reduces the number of zero singular values in the submatrices. In addition, as expected, the shuffled systems enjoy sharper anomaly. In particular, we see that for highly overdetermined (underdetermined) linear systems, the use of a shuffle improves the rate of convergence!
Table 7 and Table 8 display experiments with randomized Kaczmarz methods. In Shuffled Kaczmarz, each iteration starts with a random shuffle of the linear system that is solved. In Randomized Kaczmarz, each iteration is composed from m steps, where each step treats one randomly chosen equation. Thus, in both methods, the computational effort per iteration is slightly larger than that of a Kaczmarz iteration. Consider, for example, Table 7, which describes experiments with the Shuffled Kaczmarz method, and inspect the solution of the partial Fan beam tomography systems. For a representative number of rows, the related residual values are 1.64 × 10⁻³ and 8.54 × 10⁻⁵, where the smaller value is due to the initial shuffle.
The results of
Table 7 and
Table 8 are quite interesting. First, note that the two randomized methods behave in a similar way. In particular, both methods possess the anomaly phenomenon, and the use of an initial shuffle sharpens the anomaly. However, when solving highly overdetermined systems, the use of initial shuffle has a smaller effect, since now, each iteration includes an internal shuffle. Moreover, comparing
Table 7 and
Table 8 with
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 indicates that the randomized methods are not faster than the Kaczmarz method with initial shuffle. That is, one shuffle is enough!
Summarizing our experiments, we see that the asymptotic rate of convergence of the Kaczmarz method can be rather slow. Yet, the rate of convergence is considerably affected by a number of factors, such as the number of rows (the Kaczmarz anomaly phenomenon), the value of w, and row ordering.
6. Concluding Remarks
Although the Kaczmarz method has been well known for many years, the Kaczmarz anomaly phenomenon was observed only recently. This is, perhaps, because it requires a certain randomness of the rows’ directions. A major application of the Kaczmarz method is to solve large sparse linear systems that arise in computerized tomography. Hence, it is important to expose the extent of the phenomenon when solving such problems. The theory presented in the paper explains the reasons behind the anomaly, while the experiments display its nature.
The Kaczmarz anomaly phenomenon is observed by watching how the number of rows changes the asymptotic rate of convergence. Another property that affects the rate of convergence is row ordering. The initial ordering of tomography problems is often rather poor, which yields a slow rate of convergence. A common remedy that helps to overcome this difficulty is an initial random shuffle. The shuffle is likely to improve the randomness of the rows’ directions and, therefore, to sharpen the anomaly phenomenon. The experiments that we have done illustrate this feature.
Repeating the use of a random shuffle at each iteration gives rise to a new randomized algorithm, the Shuffled Kaczmarz method of Oswald and Zhou [21,22], which is not inferior to the celebrated Randomized Kaczmarz method of Strohmer and Vershynin [24]. However, one consequence of our experiments is that randomized methods are not faster than the Kaczmarz method with one initial random shuffle.
In our experiments, the random shuffle is based on a random permutation generator. Yet, following Herman and Meyer [15], it is possible to construct an improved initial shuffle that takes advantage of the special structure of tomography problems. The idea is to seek a permutation that improves the orthogonality between adjacent rows. In general, there is no easy way to achieve this task, but the special structure of tomography problems enables effective solutions of this problem, e.g., [11,15,17,18]. As with random shuffles, the use of optimal ordering is expected to sharpen the anomaly phenomenon. However, the testing of this issue is left to future research.