1. Introduction
The Kaczmarz algorithm is an iterative method for solving large sparse linear systems of the form

Ax = b, (1)

where A ∈ R^{m×n} and b ∈ R^m are given, and x ∈ R^n denotes the vector of unknowns. Let the rows of A be denoted by the row vectors a_1^T, …, a_m^T. Then, an equivalent way to write (1) is

a_i^T x = b_i, i = 1, …, m. (2)

The idea of the Kaczmarz method is to handle one equation at a time. Let ||·|| denote the Euclidean vector norm and let w be a preassigned relaxation parameter that satisfies 0 < w < 2. Then, the kth iteration of the Kaczmarz algorithm, k = 0, 1, 2, …, is composed of m steps. The ith step of the kth iteration, i = 1, …, m, starts with the vector z_{i−1} and ends with the vector

z_i = z_{i−1} + w (b_i − a_i^T z_{i−1}) a_i / ||a_i||². (3)

That is, the ith step uses only the ith equation. Observe that for w = 1, the point z_i is the projection of z_{i−1} on the hyperplane {x | a_i^T x = b_i}. Note also that the kth iteration, k = 0, 1, 2, …, starts with the vector z_0 = x_k and ends with z_m = x_{k+1}. The starting point is denoted as x_0.
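In code, one iteration of (3) amounts to a loop of m relaxed projections. The following NumPy sketch is our own illustration (the function names are not part of any package) and implements the sweep for general rows:

```python
import numpy as np

def kaczmarz_sweep(A, b, x, w=1.0):
    """One Kaczmarz iteration: m relaxed projections, one per equation."""
    for i in range(A.shape[0]):
        a = A[i]
        # Step (3): relaxed projection of x onto the hyperplane a^T x = b_i.
        x = x + w * (b[i] - a @ x) / (a @ a) * a
    return x

def kaczmarz(A, b, x0, w=1.0, iters=500):
    """Repeated sweeps, starting from x0."""
    x = x0.copy()
    for _ in range(iters):
        x = kaczmarz_sweep(A, b, x, w)
    return x
```

For w = 1 each step is an exact projection onto the corresponding hyperplane; for other values of w the step is a relaxed projection.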
The fact that the algorithm uses one row at a time makes it a popular tool for solving large sparse linear systems that arise in important applications, such as computerized tomography or digital signal processing. The literature on the Kaczmarz method is vast and covers various issues. See [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] and the references therein. Results on the theory behind the method and its rate of convergence can be found in [1,6,14,16,19,21,23,25], while efficient implementations and applications are considered in [3,4,10,12,13,14,15,19]. In addition, there are several variants of the basic iteration, such as block versions and parallel computing techniques [4,14,19]. In particular, recently, there has been growing interest in Randomized Kaczmarz methods [5,20,22,24]. Some of the variants are easily described by restating the algorithm with the restriction that each iteration regards only one equation, which is chosen according to some rule. In the basic iteration (3), the rows are chosen in a sequential “cyclic” manner. In “Greedy” Kaczmarz, we select an equation that has a maximal residual, while in “Randomized” Kaczmarz, the equation’s index i is selected at random, with probability proportional to ||a_i||², e.g., [5,20,21,22,24]. However, in the coming discussions, the terms “Kaczmarz method” and “Kaczmarz iteration” refer to (3). The original algorithm of Kaczmarz [16] is obtained from this framework when w = 1 and the rows are used in the cyclic order i = 1, …, m.
The use of the relaxation parameter, w, is motivated by the close relation with the SOR method for solving the linear system

A A^T y = b, (6)

where here, y ∈ R^m denotes the vector of unknowns. Let y_k denote the current solution at the end of the kth iteration of this method, k = 1, 2, …. Then, the following observation is well known, e.g., [1,6]. If the starting points satisfy

x_0 = A^T y_0, (7)

then the equality

x_k = A^T y_k (8)

holds for all k. This relation implies that several convergence properties of the Kaczmarz method are inherited from those of the SOR method.
Let the linear system

Ãx = b̃ (9)

be obtained from (2) by “normalizing” the rows of A to have unit length. That is,

ã_i = a_i / ||a_i|| and b̃_i = b_i / ||a_i||, i = 1, …, m. (10)

Then, it is easy to verify that applying the Kaczmarz method to solve (9) yields the same sequence as (3). Hence, when studying the convergence properties of the Kaczmarz method, there is no loss of generality in assuming that the rows of A have unit length. (A similar remark applies to the related SOR method.)
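This invariance is easy to check numerically. The sketch below is an illustration of ours (not taken from the paper's experiments): it runs one sweep on a random system and on its row-normalized counterpart and obtains identical iterates.

```python
import numpy as np

def kaczmarz_sweep(A, b, x, w=1.0):
    # Step (3); dividing by ||a_i||^2 makes each step scale-invariant.
    for i in range(A.shape[0]):
        a = A[i]
        x = x + w * (b[i] - a @ x) / (a @ a) * a
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
b = rng.standard_normal(8)

# Normalize each row of A and scale the matching entry of b by the same factor.
norms = np.linalg.norm(A, axis=1, keepdims=True)
A_tilde, b_tilde = A / norms, b / norms.ravel()

x_plain = kaczmarz_sweep(A, b, np.zeros(4), w=0.8)
x_normed = kaczmarz_sweep(A_tilde, b_tilde, np.zeros(4), w=0.8)
```

Scaling the ith row and the ith entry of b by the same factor leaves the step (3) unchanged, so the two runs produce the same vector.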
In this paper, we consider an interesting feature of the Kaczmarz method. To illustrate this property, it is assumed that m is considerably larger than n. Let A_i ∈ R^{i×n} and b_i ∈ R^i be composed from the first i rows of A and b, respectively. That is,

A_i = [a_1, …, a_i]^T (11)

and

b_i = (b_1, …, b_i)^T. (12)

Then, the new feature is revealed when using the Kaczmarz method for solving the linear systems

A_i x = b_i, i = 1, …, m, (13)

and watching how the number of rows, i, affects the rate of convergence. If A is an arbitrary matrix, we are not expecting a certain behavior. However, as we shall see, in some cases, the number of rows has a dramatic effect on the rate of convergence. Assume first that i is considerably smaller than n. In this case, the Kaczmarz method has a fast rate of convergence. Yet as i increases toward n, the rate of convergence slows down. That is, the more equations we have, the more iterations are needed to solve the system. In particular, as i approaches n, there is a dramatic increase in the number of iterations. The closer i and n are, the slower the convergence. However, as i passes n, the situation is reversed. From now on, the more equations we have, the fewer iterations are needed. Finally, when i is considerably larger than n, the method returns to enjoy rapid convergence.

We call this behavior the Kaczmarz anomaly. One aim of this paper is to examine the presence of this phenomenon when solving tomography problems. The first report on the Kaczmarz anomaly appeared in [7], but it remained almost unnoticed. Recently, we have shown in [8] that it is likely to occur whenever the rows’ directions a_i / ||a_i||, i = 1, …, m, scatter randomly over some portion of the unit sphere. This suggests that a random shuffle of the rows may improve their randomness and strengthen the anomaly. A second aim of the paper is to examine this idea.
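Under the stated assumption of randomly scattered row directions, the anomaly can be reproduced on a small synthetic system. In this illustrative sketch (the names and problem sizes are our own choices, not the paper's test problems), a fixed number of Kaczmarz sweeps is applied to partial systems with few rows, with i = n rows, and with many rows:

```python
import numpy as np

def residual_after_sweeps(A, b, sweeps=50, w=1.0):
    """Run 'sweeps' Kaczmarz iterations from x = 0 and return ||Ax - b||."""
    x = np.zeros(A.shape[1])
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            a = A[i]
            x = x + w * (b[i] - a @ x) * a   # rows are assumed to have unit length
    return np.linalg.norm(A @ x - b)

rng = np.random.default_rng(0)
m, n = 300, 50
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit-length rows, random directions
b = A @ rng.standard_normal(n)                   # consistent right-hand side

# Partial systems with i << n, i = n, and i >> n rows.
res = {i: residual_after_sweeps(A[:i], b[:i]) for i in (10, 50, 300)}
```

With such data, the residual left after the fixed number of sweeps is largest near i = n and much smaller on both sides, which is the behavior described above.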
The plan of the paper is as follows. The first two sections provide theoretical background that reveals the reasons behind the anomaly phenomenon.
Section 2 overviews the condition number anomaly phenomenon, while
Section 3 explains how the condition number of
A affects the rate of convergence. Combining these features yields the Kaczmarz anomaly. The background is based on recent results by this author; see [
8,
9]. The use of random shuffles and related techniques is considered in
Section 4. The paper ends with numerical experiments that illustrate the anomaly phenomena.
2. The Smallest Singular Value Anomaly and the Condition Number Anomaly
Let ρ_i denote the largest singular value of the matrix A_i, i = 1, …, m. Then, ρ_i² is the largest eigenvalue of the cross-product matrix A_i^T A_i, and there exists a unit eigenvector, v_i, that satisfies

A_i^T A_i v_i = ρ_i² v_i (14)

and

ρ_i = ||A_i v_i||. (15)

The first assertion characterizes the ascending behavior of the sequence ρ_1, …, ρ_m. (For proofs of the coming theorems, see [8].)

Theorem 1. For i = 1, …, m − 1, we have the inequalities

ρ_i ≤ ρ_{i+1} (16)

and

ρ_{i+1}² ≤ ρ_i² + (a_{i+1}^T v_{i+1})². (17)

Next, we explore the behavior of the smallest singular value, which is rather surprising. Let σ_i denote the smallest singular value of the matrix A_i, i = 1, …, m. Then, as the coming theorem shows, the first part of this sequence, σ_1, …, σ_n, is descending.
Theorem 2. Let u be a unit vector in the direction of a_{i+1}, let the unit vector v be a right singular vector of A_i that corresponds to σ_i, and assume that σ_i ≤ ||a_{i+1}||. Then, for i = 1, …, n − 1, we have the inequalities

σ_{i+1} ≤ σ_i (18)

and

σ_{i+1}² ≤ σ_i² (1 − (u^T v)²). (19)

The assumption σ_i ≤ ||a_{i+1}|| is not essential for the proof of Theorem 2, but it enables us to replace (18) with the sharper inequality (19). This exposes the actual reasons that force σ_{i+1} to be smaller than σ_i. One reason is the size of σ_i. We see that the smaller σ_i is, the smaller is σ_{i+1}. Another important factor is the size of the scalar product u^T v. Since both u and v are unit vectors, the Cauchy–Schwarz inequality implies |u^T v| ≤ 1, and equality occurs if and only if u = ±v. Now, from (19), we see that the larger |u^T v| is, the smaller is σ_{i+1}.

In this paper, we concentrate on the behavior of the Kaczmarz method, and for this purpose, it is possible to assume that all the rows of A have unit length. This assumption implies that ||a_{i+1}|| = 1, so the condition σ_i ≤ ||a_{i+1}|| reduces to σ_i ≤ 1, and (19) takes the form

σ_{i+1} ≤ σ_i (1 − (u^T v)²)^{1/2}. (20)

In particular, once σ_i becomes considerably smaller than one, this condition certainly holds, and the bound (20) applies throughout the descending part of the sequence.
It is left to see how the second part of the sequence, σ_n, …, σ_m, behaves. Below, we will show that this part is ascending. The proof is based on the observation that now, σ_i² is the smallest eigenvalue of the cross-product matrix A_i^T A_i. Hence, for i = n, …, m, there exists a unit eigenvector w_i such that

A_i^T A_i w_i = σ_i² w_i (21)

and

σ_i = ||A_i w_i||. (22)
Theorem 3. For i = n, …, m − 1, we have the inequalities

σ_i ≤ σ_{i+1} (23)

and

σ_{i+1}² ≥ σ_i² + (a_{i+1}^T w_{i+1})². (24)

Assume for a moment that σ_i > 0, which enables us to rewrite (24) in the form

(σ_{i+1} / σ_i)² ≥ 1 + (a_{i+1}^T w_{i+1})² / σ_i². (25)

Assume further that the rows of the matrix have unit length and random directions. Then, a small σ_i implies a large increase ratio, while a large σ_i means a slow increase. Consequently, when i is close to n, we expect a fast increase, but as i moves away from n, the rate of increase is likely to slow down.
Combining the results of Theorems 2 and 3 shows that the sequence σ_1, …, σ_n is decreasing, the sequence σ_n, …, σ_m is increasing, and σ_n is the smallest number in the whole sequence. Moreover, in some cases, σ_n can be considerably smaller than its neighbors. This behavior is called the smallest singular value anomaly.
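The smallest singular value anomaly is easy to observe on random matrices with unit-length rows. The following sketch (our own illustration, not one of the paper's examples) computes σ_min(A_i) for a nested family of submatrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
# Rows of unit length whose directions scatter randomly over the unit sphere.
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)

def smallest_sv(M):
    # Smallest singular value of the submatrix M.
    return np.linalg.svd(M, compute_uv=False)[-1]

sig = {i: smallest_sv(A[:i]) for i in (10, 25, 50, 100, 200)}
```

For such data the computed values decrease as i approaches n = 50 and increase afterwards, with a pronounced dip at i = n.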
Let κ_i = ρ_i / σ_i denote the condition number of A_i. In the rest of this section, we assume for simplicity that σ_i > 0 for i = 1, …, m. (The discussion of the case when σ_i = 0 is deferred to the next section. In this case, σ_i is redefined as the smallest nonzero singular value of A_i.) We have seen that the sequence ρ_1, …, ρ_n is ascending, while the sequence σ_1, …, σ_n is descending. This proves the following conclusion.
Theorem 4. The sequence κ_1, …, κ_n is ascending. That is,

κ_1 ≤ κ_2 ≤ ⋯ ≤ κ_n. (26)

The behavior of the sequence κ_n, …, κ_m is not that straightforward. We know that the sequences ρ_n, …, ρ_m and σ_n, …, σ_m are ascending, but this does not provide decisive information. Indeed, for i ≥ n, one can find examples in which κ_i < κ_{i+1} as well as examples with κ_i > κ_{i+1}, e.g., [8]. The condition number anomaly occurs when the sequence κ_n, …, κ_m is descending. That is, when

κ_n ≥ κ_{n+1} ≥ ⋯ ≥ κ_m. (27)

The reasons behind this behavior lie in the following observations.
Theorem 5. Let v_{i+1} be as in Theorem 1, let w_{i+1} be as in Theorem 3, and consider the terms

τ_i = (a_{i+1}^T v_{i+1})² / ρ_i² (28)

and

η_i = (a_{i+1}^T w_{i+1})² / σ_i². (29)

Then, for i = n, …, m − 1,

κ_{i+1}² ≤ κ_i² (1 + τ_i) / (1 + η_i). (30)

Proof. From (17), we see that

ρ_{i+1}² ≤ ρ_i² (1 + τ_i),

while Theorem 3 gives

σ_{i+1}² ≥ σ_i² (1 + η_i).

Hence, combining these inequalities yields (30). □
The last theorem is a key observation that indicates in which situations the condition number anomaly is likely to occur. Assume, for example, that the direction of a_{i+1} is chosen in some random way. Then, the scalar product terms (a_{i+1}^T v_{i+1})² and (a_{i+1}^T w_{i+1})² are likely to be about the same size. However, since σ_i is (considerably) smaller than ρ_i, the term η_i is expected to be larger than τ_i, which implies κ_{i+1} < κ_i. In other words, the condition number anomaly is likely to occur whenever the rows’ directions scatter in some random way. This conclusion means that the phenomenon is shared by a wide range of matrices. See Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 and the examples in [8].
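The descending behavior of the condition numbers beyond i = n can likewise be checked on a random matrix with unit-length rows. This is a small sketch of ours (the helper name is ours), using the smallest nonzero singular value as in the discussion above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 400, 50
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit-length rows, random directions

def cond_number(M):
    s = np.linalg.svd(M, compute_uv=False)
    s = s[s > 1e-12]              # use the smallest *nonzero* singular value
    return s[0] / s[-1]

# Condition numbers of the partial matrices A_i for i = n and beyond.
kappa = [cond_number(A[:i]) for i in (50, 75, 100, 200, 400)]
```

For randomly scattered rows the computed sequence is descending, as the condition number anomaly predicts.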
3. The Rate of Convergence of the Kaczmarz Method
We have seen that the Kaczmarz method for solving (1) is closely related to the SOR method for solving the linear system

G y = b, (32)

where

G = A A^T. (33)

Below, we will use this relation to obtain the iteration matrix of the Kaczmarz method (3). As before, it is allowed to assume that the rows of A have unit length. That is,

||a_i|| = 1, i = 1, …, m. (34)

This assumption implies that G has the form

G = I + L + L^T, (35)

where I denotes the identity matrix, and L is a strictly lower triangular matrix. The SOR iteration splits G in the form

G = M − N, (36)

where

M = (1/w) I + L (37)

and

N = ((1 − w)/w) I − L^T. (38)

As before, w is a preassigned relaxation parameter that satisfies 0 < w < 2. The kth SOR iteration, k = 0, 1, 2, …, starts with y_k and ends with y_{k+1}, which is computed by solving the linear system

M y_{k+1} = N y_k + b. (39)

In other words, y_{k+1} is obtained from y_k by the rule

y_{k+1} = H y_k + M^{−1} b, (40)

where

H = M^{−1} N (41)

is the related iteration matrix, and

M^{−1} = w (I + wL)^{−1}. (42)

Observe that (42) enables us to express (40) in the form

y_{k+1} = H y_k + w (I + wL)^{−1} b. (43)

Multiplying (43) by A^T gives

A^T y_{k+1} = A^T H y_k + w A^T (I + wL)^{−1} b, (44)

while substituting x_k = A^T y_k and using the identity A^T H = (I − w A^T (I + wL)^{−1} A) A^T shows that

x_{k+1} = Q x_k + w A^T (I + wL)^{−1} b. (45)

This means that the iteration matrix of the Kaczmarz method has the form

Q = I − w A^T (I + wL)^{−1} A. (46)

Note that Q is an n × n matrix, while H is an m × m matrix. However, as shown in [9], these matrices share the nonzero eigenvalues.
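The affine form of the sweep is easy to check numerically. The sketch below (our own helper names, assuming unit-length rows) compares one explicit sweep of (3) with the map x ↦ Qx + wA^T(I + wL)^{−1}b:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, w = 6, 4, 1.2
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit rows, so diag(AA^T) = I
b = rng.standard_normal(m)

# L is the strictly lower triangular part of G = AA^T.
L = np.tril(A @ A.T, -1)
Minv = np.linalg.inv(np.eye(m) + w * L)          # (I + wL)^{-1}
Q = np.eye(n) - w * A.T @ Minv @ A               # iteration matrix of the sweep
c = w * A.T @ Minv @ b                           # constant term of the affine map

def kaczmarz_sweep(A, b, x, w):
    for i in range(A.shape[0]):
        a = A[i]
        x = x + w * (b[i] - a @ x) * a           # rows already have unit length
    return x

x0 = rng.standard_normal(n)
x1_sweep = kaczmarz_sweep(A, b, x0, w)
x1_matrix = Q @ x0 + c
```

The two vectors agree, which confirms that one Kaczmarz sweep is exactly one step of the affine iteration.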
The theory of the Kaczmarz method tells us that the sequence x_k, k = 0, 1, 2, …, converges for any choice of x_0 and w, e.g., [3,6,9,10,14,19,22,25]. Moreover, let x* denote the limit point of this sequence; then, the error vectors e_k = x_k − x* satisfy

e_{k+1} = Q e_k, k = 0, 1, 2, … .

This shows that the rate of convergence depends on ρ(Q), the spectral radius of Q. The smaller ρ(Q) is, the faster the convergence. It is interesting, therefore, to see which properties of A make ρ(Q) small. One answer is given by the following bound, which has recently been derived in [9]. (A second answer is given in the next section, in which we consider the effect of rows shuffling.)
Let σ_max denote the largest singular value of A, let σ̂ denote the smallest nonzero singular value of A, and let κ = σ_max / σ̂ denote the related condition number of A. Let w* denote the optimal relaxation parameter of the Kaczmarz method. That is,

ρ(Q(w*)) = min { ρ(Q(w)) | 0 < w < 2 },

where Q(w) denotes the iteration matrix of the Kaczmarz method, viewed as a function of w. Then, it is proved in [9] that ρ(Q(w*)) is bounded by a quantity of the form 1 − c/κ², where c is a positive constant. The bound is not tight, and the actual rate of convergence (even for w = 1) is often faster than the implied rate. Recall that in many practical problems, w* is not known in advance, and its value is computed by repeated experiments; see Section 5. Yet the main consequence from this bound is that a small condition number forces fast convergence. Conversely, for a large condition number, the bound tends to 1, which allows slow convergence. Indeed, as explained in [9], the existence of small nonzero singular values invites a slow rate of convergence.
The relation between the condition number and the rate of convergence suggests that the Kaczmarz anomaly phenomenon is caused by the condition number anomaly. We have seen that the last phenomenon is expected to occur whenever the rows’ directions scatter randomly. This raises the question of whether tomography problems possess these properties. The next sections attempt to answer this question.
4. From Random Shuffles to Optimal Ordering
The Kaczmarz anomaly phenomenon is observed by watching how the number of rows affects the rate of convergence. Another property that affects the rate of convergence is row ordering. The initial ordering in tomography problems is often rather poor, in the sense that it yields a slow rate of convergence. A typical tomography matrix is composed from several blocks of rows, where each block is generated by one “view”. The views (and the blocks) are ordered according to the size of the view angle, which is the natural geometric order, e.g., [15], p. 602. Yet this natural order minimizes the angle between adjacent views, which is a property that retards convergence. A possible remedy for this difficulty is to apply a random shuffle of the rows before starting the Kaczmarz process. The shuffle is aimed at achieving a faster rate of convergence (see below). Yet, at the same time, it improves the randomness of the rows’ directions, which sharpens the anomaly phenomena.
The term “random shuffle” means that the rows of the linear system (1) are reordered by applying a random permutation. This converts (1) into the form

P A x = P b, (50)

where P is a random permutation matrix. To simplify the coming discussions and experiments, we assume that the random permutation is chosen by MATLAB’s command “randperm”, and the shuffled matrix P A is generated by the command “shuffle(A)”, which uses the “randperm” command.
The reordering of the rows is expected to change the iteration matrix of the Kaczmarz method as well as its rate of convergence. It is easy to verify this assertion by using the relation with the SOR method for solving the system

(P A)(P A)^T z = P b. (51)

Now, it is easy to see that the SOR iteration for solving (51) differs from that of (6). Indeed, the observation that the reordering of rows changes the rate of convergence of the SOR method is not new. See, for example, [27] and the references therein.
As mentioned above, it has been observed by several authors that when solving tomography problems, a random row shuffle may improve the rate of convergence of the Kaczmarz method. See, for example, [14,15,19,21,22,24] and the references therein. A possible explanation of this phenomenon comes from a geometric interpretation of the basic step (3) when w = 1 and the rows have unit length. Let θ denote the angle between two adjacent rows, a_i and a_{i+1}. Then, in two-dimensional space, the distance to the solution point is reduced by the factor |cos θ|. That is, a small angle yields a small reduction, while a large angle implies a large reduction. When moving to larger dimensions, the situation is not that simple, but it is still true that a small θ forces a small step toward the solution, while a large θ allows larger steps. (Recall that a large θ means that a_i and a_{i+1} are nearly orthogonal.) These considerations suggest that a random shuffle may improve the rate of convergence if it improves orthogonality between adjacent rows.
Similar arguments have motivated Herman and Meyer [
15] to propose an optimal ordering of rows that takes advantage of the special structure of tomography problems to maximize orthogonality between adjacent rows. Further optimal ordering schemes that follow this approach are described in [
11,
17,
18].
The observation that a random ordering of rows may improve the rate of convergence has motivated the Randomized Kaczmarz algorithm of Strohmer and Vershynin [24]. In this algorithm, the basic step treats one equation whose index is selected at random with probability proportional to ||a_i||². Thus, when all the rows have unit length, all the indices have equal probability. (In our experiments, the row index is obtained by the “randi” command.)
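A minimal sketch of this selection rule (our own illustration, using NumPy's random generator in place of MATLAB's “randi”) looks as follows:

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, steps=3000, seed=0):
    """Randomized Kaczmarz: each step treats one equation whose index is
    drawn with probability proportional to the squared row norm ||a_i||^2."""
    rng = np.random.default_rng(seed)
    p = (A * A).sum(axis=1)
    p = p / p.sum()                       # Strohmer-Vershynin sampling probabilities
    x = x0.copy()
    for _ in range(steps):
        i = rng.choice(A.shape[0], p=p)   # random row index
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a
    return x
```

When all rows have unit length, p is the uniform distribution, and each step reduces to an ordinary projection onto one randomly chosen hyperplane.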
The use of a random shuffle has recently been considered by Oswald and Zhou [
21,
22], who proposed an improved randomized method, the
Shuffled Kaczmarz algorithm. In this method, each iteration is preceded by a random shuffle of the rows. This formulation has two advantages. First, as in the Kaczmarz method, each iteration treats all the equations. Second, since all the shuffled matrices have the same singular values as
A, the bound on the rate of convergence is the same as in the Kaczmarz method.
It is interesting to compare the above randomized methods with the Initial Shuffle method, which uses one random shuffle before starting the Kaczmarz algorithm. Both approaches share the same motivation: if the given system has a bad ordering, then a random shuffle is likely to provide a better one. Furthermore, basically, we are not expecting large differences in the quality of the generated random shuffles. Hence, in practice, the initial shuffle method is likely to run at the same speed as the randomized Kaczmarz methods. Yet, since it uses only one shuffle, there remains a tiny probability of obtaining a bad ordering, a possibility that the randomized algorithms avoid.
5. Numerical Experiments
The experiments examine the behavior of the Kaczmarz method (3) when solving tomography test problems. The test problems are generated by using MATLAB’s functions from “AIR tools”, which is a MATLAB package of algebraic reconstruction iterative methods prepared by P.C. Hansen and others [12,13]. The test problems imitate the scanning of an N × N array of square cells. This generates a linear system with n = N² unknowns. (The unknowns represent the densities of the cells, while the equations describe rays.) In our experiments, N = 20, and all the test matrices have n = 400 columns. The number of rows depends on the nature of the scanning device and the specific details of the experiment.
The Parallel beam tomography problems are generated by using MATLAB’s function “paralleltomo” with N = 20, theta = 1:1:180, and p = 28. (The vector theta contains the angles of the views, while p denotes the number of parallel rays for each view.) This results in a linear system with 400 unknowns and 180 × 28 = 5040 equations. However, if a ray passes outside the array, it generates a null row. So, the zero rows are removed, which yields a linear system with 400 unknowns and 4340 rows.
In Fan beam tomography, each angle (each view) is related to a “fan” of p rays, and the problem is generated by using the function “fanbeamtomo” with N = 20 and the default values theta = 0:1:359 and p = 28. This builds a linear system with 400 unknowns and 360 × 28 = 10,080 equations. Then, after removing zero rows, we remain with 9520 equations.
In Seismic tomography problems, the linear system is generated by applying the function “seismictomo” with N = 20, a prescribed number of sources, and a prescribed number of receivers. This setting builds a linear system with 400 unknowns and 3200 equations. (In this case, there are no zero rows.)
The experiments were carried out as follows. At first, we have generated an m × n linear system, Ax = b, as described above. Together with A and b, we are given a prescribed solution, x*, which is the one that has been used to build A and b. Then, in the second stage, the rows of A are normalized to have a unit norm. Thus, for i = 1, …, m, the ith row of A is redefined as a_i / ||a_i||, while b_i is updated as b_i / ||a_i||. Finally, after the normalization, the Kaczmarz method was applied to solve partial systems of the form (13). The starting point in our runs is always x_0 = 0, and the iterative process was terminated after 666 iterations.
The shuffled test problems are obtained by reordering the rows of A, using a random permutation. The actual reordering is carried out by applying the “shuffle” function mentioned above. After the shuffling, the vector b is redefined as b = A x*, where x* denotes the known solution. (The shuffling takes place after the normalization but before starting the solution of the partial linear systems.)
The rows in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6 describe the use of the Kaczmarz method to solve partial linear systems of the form

A_i x = b_i. (52)

Recall that A_i is an i × n submatrix of A which is composed from the first i rows of A, and b_i is composed from the first i entries of b. The results for the linear system (52) start with the number of rows, i, and the number of zero singular values of A_i. Then, we provide the values of ρ_i, σ_i, and κ_i, as well as the related residual values. As noted in the tables’ headlines, ρ_i is the largest singular value of A_i, σ_i is the smallest nonzero singular value of A_i, and κ_i = ρ_i / σ_i is the condition number of A_i. The residual values are defined as

||A_i x̂ − b_i||,

where ||·|| denotes the Euclidean vector norm and x̂ denotes the computed solution after 666 iterations. The Kaczmarz method uses a relaxation parameter, w, and the residual values are given for w = 1 and for the optimal value, w = w*. The value of w* was obtained by running the Kaczmarz method with a grid of values of w that covers the interval (0, 2) and taking a value of w that yields the smallest residual.
Table 1.
Parallel beam tomography.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
50 | 6 | 1.723 | 6.46 × 10 | 2.67 × 10 | 1.91 × 10 | 7.95 × 10 | 1.4 |
100 | 8 | 2.213 | 9.77 × 10 | 2.26 × 10 | 1.49 × 10 | 1.12 × 10 | 1.5 |
200 | 8 | 3.133 | 7.50 × 10 | 4.17 × 10³ | 3.83 × 10 | 2.77 × 10 | 1.2 |
300 | 20 | 3.740 | 5.18 × 10 | 7.22 × 10³ | 1.38 × 10 | 1.27 × 10 | 1.1 |
360 | 27 | 4.105 | 2.91 × 10 | 1.41 × 10 | 2.04 × 10 | 2.02 × 10 | 1.1 |
380 | 31 | 4.218 | 2.67 × 10 | 1.58 × 10 | 2.37 × 10 | 2.37 × 10 | 1.0 |
400 | 37 | 4.307 | 2.12 × 10 | 2.03 × 10 | 2.84 × 10 | 2.84 × 10 | 1.0 |
420 | 22 | 4.363 | 5.09 × 10 | 8.58 × 10 | 3.10 × 10 | 3.09 × 10 | 0.9 |
440 | 16 | 4.468 | 7.07 × 10 | 6.32 × 10 | 3.13 × 10 | 3.13 × 10 | 1.0 |
500 | 11 | 4.774 | 2.15 × 10 | 2.22 × 10 | 2.82 × 10 | 2.82 × 10 | 1.0 |
1000 | 0 | 6.504 | 2.47 × 10 | 2.64 × 10³ | 3.11 × 10 | 2.83 × 10 | 0.6 |
2000 | 0 | 9.111 | 2.29 × 10 | 3.97 × 10 | 1.48 × 10 | 1.24 × 10 | 0.6 |
3000 | 0 | 11.246 | 6.87 × 10 | 1.64 × 10 | 6.82 × 10 | 3.64 × 10 | 0.2 |
4000 | 0 | 12.898 | 1.01 × 10 | 1.28 × 10 | 4.66 × 10 | 2.14 × 10 | 0.2 |
4340 | 0 | 13.498 | 1.15 × 10 | 1.18 × 10 | 3.76 × 10 | 1.76 × 10 | 0.2 |
Table 2.
Parallel beam with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 1 | 1.786 | 8.94 × 10 | 2.00 × 10 | 5.79 × 10 | 2.08 × 10 | 1.5 |
50 | 1 | 1.966 | 6.61 × 10 | 2.97 × 10 | 4.15 × 10 | 5.31 × 10 | 1.6 |
60 | 1 | 1.999 | 6.57 × 10 | 3.04 × 10 | 6.41 × 10 | 9.79 × 10 | 1.6 |
100 | 1 | 2.600 | 6.41 × 10 | 3.52 × 10 | 1.43 × 10 | 1.48 × 10 | 1.6 |
200 | 5 | 3.064 | 1.31 × 10 | 2.34 × 10 | 3.14 × 10 | 8.42 × 10 | 1.6 |
300 | 9 | 3.694 | 7.37 × 10 | 5.01 × 10 | 8.55 × 10 | 7.55 × 10 | 1.4 |
360 | 15 | 4.026 | 1.41 × 10 | 2.85 × 10³ | 1.64 × 10 | 1.60 × 10 | 0.9 |
380 | 18 | 4.119 | 3.94 × 10 | 1.05 × 10 | 1.95 × 10 | 1.81 × 10 | 0.8 |
400 | 22 | 4.200 | 4.41 × 10⁻⁴ | 9.53 × 10³ | 1.72 × 10⁻³ | 1.59 × 10⁻³ | 0.8 |
420 | 5 | 4.304 | 1.09 × 10 | 3.96 × 10 | 1.86 × 10 | 1.80 × 10 | 0.9 |
440 | 0 | 4.407 | 6.25 × 10 | 7.05 × 10³ | 1.73 × 10 | 1.71 × 10 | 0.9 |
500 | 0 | 4.678 | 4.77 × 10 | 9.80 × 10 | 1.59 × 10 | 1.58 × 10 | 1.1 |
1000 | 0 | 6.534 | 3.86 × 10 | 1.69 × 10 | 1.18 × 10 | 1.40 × 10 | 1.95 |
2000 | 0 | 9.216 | 7.22 × 10 | 1.28 × 10 | 7.45 × 10 | 4.89 × 10 | 1.8 |
3000 | 0 | 11.243 | 9.15 × 10 | 1.23 × 10 | 1.03 × 10 | 2.00 × 10 | 1.8 |
4000 | 0 | 12.965 | 1.10 × 10 | 1.18 × 10 | 9.63 × 10 | 1.07 × 10 | 1.95 |
4340 | 0 | 13.498 | 1.15 × 10 | 1.18 × 10 | 4.36 × 10 | 9.80 × 10 | 1.8 |
Table 3.
Fan beam tomography.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 3 | 1.601 | 8.98 × 10 | 1.784 × 10 | 2.29 × 10 | 6.29 × 10 | 1.4 |
50 | 3 | 1.601 | 8.98 × 10 | 1.784 × 10 | 2.07 × 10 | 5.64 × 10 | 1.5 |
100 | 11 | 2.275 | 1.37 × 10 | 1.658 × 10³ | 1.51 × 10 | 6.36 × 10 | 1.2 |
200 | 31 | 3.165 | 2.59 × 10 | 1.220 × 10 | 3.37 × 10 | 2.94 × 10 | 1.2 |
300 | 55 | 3.812 | 7.03 × 10 | 5.418 × 10³ | 1.40 × 10 | 1.37 × 10 | 1.1 |
360 | 64 | 4.068 | 5.17 × 10 | 7.875 × 10³ | 1.93 × 10 | 1.89 × 10 | 0.9 |
380 | 68 | 4.213 | 4.70 × 10 | 8.969 × 10³ | 2.11 × 10 | 2.09 × 10 | 0.9 |
400 | 74 | 4.346 | 9.89 × 10 | 4.393 × 10 | 2.25 × 10 | 2.24 × 10 | 0.9 |
420 | 61 | 4.452 | 8.22 × 10 | 5.416 × 10 | 2.77 × 10 | 2.72 × 10 | 0.9 |
440 | 44 | 4.531 | 3.81 × 10 | 1.190 × 10 | 2.95 × 10 | 2.93 × 10 | 0.9 |
500 | 12 | 4.810 | 3.13 × 10 | 1.537 × 10 | 3.05 × 10 | 2.85 × 10 | 0.7 |
1000 | 0 | 6.640 | 3.60 × 10 | 1.845 × 10³ | 3.12 × 10 | 2.84 × 10 | 0.5 |
2000 | 0 | 9.317 | 1.37 × 10 | 6.792 × 10 | 1.79 × 10 | 1.68 × 10 | 0.6 |
3000 | 0 | 11.435 | 4.13 × 10 | 2.767 × 10 | 3.10 × 10 | 2.16 × 10 | 0.4 |
4000 | 0 | 13.141 | 5.93 × 10 | 2.217 × 10 | 3.13 × 10 | 2.28 × 10 | 0.4 |
8000 | 0 | 18.460 | 8.99 × 10 | 2.053 × 10 | 1.90 × 10 | 1.32 × 10 | 0.2 |
9520 | 0 | 20.106 | 9.76 × 10 | 2.06 × 10 | 1.05 × 10 | 7.77 × 10 | 0.4 |
Table 4.
Fan beam with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 0 | 1.589 | 2.63 × 10 | 6.04 | 1.08 × 10 | 4.90 × 10 | 0.7 |
50 | 0 | 1.764 | 2.53 × 10 | 6.96 | 1.35 × 10 | 5.91 × 10 | 0.9 |
100 | 2 | 2.254 | 4.08 × 10 | 5.52 × 10 | 1.37 × 10 | 1.23 × 10 | 0.9 |
200 | 5 | 3.056 | 2.75 × 10 | 1.11 × 10³ | 3.89 × 10 | 3.88 × 10 | 1.1 |
300 | 8 | 3.700 | 1.50 × 10 | 2.47 × 10³ | 1.23 × 10 | 1.22 × 10 | 0.9 |
360 | 10 | 4.041 | 4.04 × 10 | 1.00 × 10 | 1.25 × 10 | 1.25 × 10 | 1.0 |
380 | 13 | 4.139 | 2.32 × 10 | 1.78 × 10 | 1.28 × 10 | 1.26 × 10 | 1.2 |
400 | 17 | 4.240 | 3.17 × 10 | 1.34 × 10 | 1.28 × 10 | 1.26 × 10 | 0.9 |
420 | 1 | 4.332 | 1.13 × 10 | 3.83 × 10 | 1.71 × 10 | 1.52 × 10 | 0.7 |
440 | 0 | 4.430 | 7.70 × 10 | 5.75 × 10³ | 1.49 × 10 | 1.47 × 10 | 0.9 |
500 | 0 | 4.699 | 4.19 × 10 | 1.12 × 10³ | 7.96 × 10 | 7.92 × 10 | 1.1 |
1000 | 0 | 6.620 | 2.60 × 10 | 2.54 × 10 | 9.46 × 10 | 9.46 × 10 | 1.0 |
2000 | 0 | 9.288 | 4.15 × 10 | 2.24 × 10 | 5.13 × 10 | 2.98 × 10 | 1.7 |
3000 | 0 | 11.291 | 5.08 × 10 | 2.22 × 10 | 2.65 × 10 | 4.51 × 10 | 1.95 |
4000 | 0 | 13.005 | 5.96 × 10 | 2.18 × 10 | 1.45 × 10 | 4.27 × 10 | 1.8 |
8000 | 0 | 18.420 | 9.01 × 10 | 2.04 × 10 | 8.79 × 10 | 1.13 × 10 | 1.9 |
9520 | 0 | 20.106 | 9.76 × 10 | 2.06 × 10 | 3.23 × 10 | 9.81 × 10 | 1.95 |
Table 5.
Seismic tomography problems.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 1 | 2.929 | 4.73 × 10 | 6.19 × 10 | 9.55 × 10 | 4.61 × 10 | 1.6 |
50 | 1 | 3.103 | 2.93 × 10 | 1.06 × 10 | 7.95 × 10 | 2.75 × 10 | 1.6 |
60 | 1 | 3.199 | 2.92 × 10 | 1.09 × 10 | 9.29 × 10 | 1.95 × 10 | 1.6 |
100 | 5 | 3.803 | 2.92 × 10 | 1.30 × 10 | 6.44 × 10 | 1.30 × 10 | 1.6 |
200 | 11 | 5.159 | 1.06 × 10 | 4.89 × 10³ | 4.40 × 10 | 3.81 × 10 | 0.5 |
300 | 24 | 6.154 | 4.33 × 10 | 1.42 × 10 | 3.92 × 10 | 3.88 × 10 | 0.6 |
360 | 33 | 6.600 | 1.22 × 10 | 5.42 × 10 | 5.06 × 10 | 4.25 × 10 | 0.4 |
380 | 34 | 6.773 | 1.06 × 10 | 6.41 × 10 | 4.47 × 10 | 4.19 × 10 | 0.5 |
400 | 45 | 6.859 | 2.40 × 10 | 2.86 × 10 | 4.82 × 10 | 4.40 × 10 | 0.5 |
420 | 34 | 6.985 | 2.41 × 10 | 2.90 × 10 | 4.88 × 10 | 4.44 × 10 | 0.5 |
440 | 32 | 7.173 | 2.43 × 10 | 2.95 × 10 | 4.27 × 10 | 3.79 × 10 | 0.6 |
500 | 26 | 7.488 | 1.06 × 10 | 7.03 × 10 | 4.72 × 10 | 4.27 × 10 | 0.5 |
1000 | 6 | 9.807 | 1.89 × 10 | 5.18 × 10³ | 2.61 × 10 | 2.43 × 10 | 0.5 |
2000 | 4 | 12.513 | 9.91 × 10 | 1.26 × 10³ | 6.54 × 10 | 6.45 × 10 | 1.2 |
3000 | 4 | 14.474 | 2.79 × 10 | 5.18 × 10 | 4.56 × 10 | 4.56 × 10 | 1.0 |
3200 | 4 | 14.902 | 2.87 × 10 | 5.19 × 10 | 4.51 × 10 | 4.51 × 10 | 1.0 |
Table 6.
Seismic tomography problems with initial shuffle.
Number of Rows | Number of Zero Singular Values | Largest Singular Value | Smallest Nonzero Singular Value | Condition Number | Residual after 666 Iterations (w = 1) | Residual after 666 Iterations (w = w*) | w*
---|---|---|---|---|---|---|---
40 | 0 | 1.940 | 3.46 × 10 | 5.61 | 1.04 × 10 | 1.91 × 10 | 0.4 |
50 | 0 | 2.099 | 1.70 × 10 | 1.23 × 10 | 1.75 × 10 | 1.12 × 10 | 1.2 |
60 | 0 | 2.341 | 1.17 × 10 | 2.01 × 10 | 6.16 × 10 | 1.30 × 10 | 1.3 |
100 | 1 | 2.876 | 8.21 × 10 | 3.50 × 10 | 1.69 × 10 | 1.38 × 10 | 1.1 |
200 | 2 | 3.818 | 4.19 × 10 | 9.11 × 10 | 4.00 × 10 | 3.94 × 10 | 0.9 |
300 | 8 | 4.719 | 2.22 × 10 | 2.12 × 10 | 4.24 × 10 | 3.67 × 10 | 0.7 |
360 | 21 | 5.075 | 5.56 × 10 | 9.13 × 10 | 3.37 × 10 | 3.28 × 10 | 1.2 |
380 | 32 | 5.225 | 6.33 × 10 | 8.25 × 10 | 2.87 × 10 | 2.87 × 10 | 1.0 |
400 | 40 | 5.365 | 3.28 × 10 | 1.64 × 10 | 2.09 × 10 | 2.09 × 10 | 1.1 |
420 | 32 | 5.514 | 5.29 × 10 | 1.04 × 10 | 2.21 × 10 | 2.20 × 10 | 1.1 |
440 | 24 | 5.656 | 1.15 × 10 | 4.93 × 10 | 2.05 × 10 | 2.05 × 10 | 0.9 |
500 | 21 | 6.034 | 6.29 × 10 | 9.59 × 10 | 1.95 × 10 | 1.94 × 10 | 0.9 |
1000 | 10 | 8.370 | 4.37 × 10 | 1.92 × 10³ | 6.94 × 10 | 6.94 × 10 | 1.0 |
2000 | 6 | 11.857 | 2.09 × 10 | 5.67 × 10 | 3.16 × 10 | 2.52 × 10 | 1.5 |
3000 | 4 | 14.462 | 1.96 × 10 | 7.37 × 10 | 1.90 × 10 | 1.36 × 10 | 1.6 |
3200 | 4 | 14.902 | 2.87 × 10 | 5.19 × 10 | 1.56 × 10 | 1.08 × 10 | 1.6 |
The reading of the tables is simple. Consider, for example, Table 2 when the number of rows equals 400. In this case, the related matrix A_400 has 22 zero singular values, σ_400 = 4.41 × 10⁻⁴, κ_400 = 9.53 × 10³, and the residual values are 1.72 × 10⁻³ for w = 1 and 1.59 × 10⁻³ for w = w*.
The experiments reveal interesting features of the anomaly phenomena. First, note the slow increase of the sequence ρ_1, …, ρ_m. We see that ρ_i is considerably smaller than √i, the largest possible value for a matrix with i unit-length rows. Moreover, the larger i is, the smaller the ratio ρ_i / √i. This behavior is due to the fact that the rows have unit length and random directions; see (17).
The second remark is about the smallest singular value anomaly and the related condition number anomaly. The derivation of these properties relies on the assumption that the submatrices A_i, i = 1, …, m, do not have zero singular values. Yet, as our tables show, several submatrices have zero singular values. Consequently, in some cases, we can see a slight violation of the anomaly behavior.
The third point is about the use of an initial random shuffle. Note that the shuffle reduces the number of zero singular values in the submatrices. In addition, as expected, the shuffled systems enjoy sharper anomaly. In particular, we see that for highly overdetermined (underdetermined) linear systems, the use of a shuffle improves the rate of convergence!
Table 7 and Table 8 display experiments with randomized Kaczmarz methods. In Shuffled Kaczmarz, each iteration starts with a random shuffle of the linear system that is solved. In Randomized Kaczmarz, each iteration is composed from m steps, where each step treats one randomly chosen equation. Thus, in both methods, the computational effort per iteration is slightly larger than that of a Kaczmarz iteration. Consider, for example, Table 7, which describes experiments with the Shuffled Kaczmarz method, and inspect the solution of the partial Fan beam tomography systems. For a representative number of rows, the related residual values are 1.64 × 10⁻³ and 8.54 × 10⁻⁵, where the smaller value is due to the initial shuffle.
The results of
Table 7 and
Table 8 are quite interesting. First, note that the two randomized methods behave in a similar way. In particular, both methods possess the anomaly phenomenon, and the use of an initial shuffle sharpens the anomaly. However, when solving highly overdetermined systems, the use of initial shuffle has a smaller effect, since now, each iteration includes an internal shuffle. Moreover, comparing
Table 7 and
Table 8 with
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 indicates that the randomized methods are not faster than the Kaczmarz method with initial shuffle. That is, one shuffle is enough!
Summarizing our experiments, we see that the asymptotic rate of convergence of the Kaczmarz method can be rather slow. Yet, the rate of convergence is considerably affected by a number of factors, such as the number of rows (the Kaczmarz anomaly phenomenon), the value of w, and row ordering.
6. Concluding Remarks
Although the Kaczmarz method has been well known for many years, the Kaczmarz anomaly phenomenon was observed only recently. This is, perhaps, because it requires a certain randomness of the rows’ directions. A major application of the Kaczmarz method is to solve large sparse linear systems that arise in computerized tomography. Hence, it is important to expose the extent of the phenomenon when solving such problems. The theory presented in the paper explains the reasons behind the anomaly, while the experiments display its nature.
The Kaczmarz anomaly phenomenon is observed by watching how the number of rows changes the asymptotic rate of convergence. Another property that affects the rate of convergence is row ordering. The initial ordering of tomography problems is often rather poor, which yields a slow rate of convergence. A common remedy that helps to overcome this difficulty is an initial random shuffle. The shuffle is likely to improve the randomness of the rows’ directions and, therefore, to sharpen the anomaly phenomenon. The experiments that we have done illustrate this feature.
Repeating the use of a random shuffle at each iteration gives rise to a new randomized algorithm, the Shuffled Kaczmarz method of Oswald and Zhou [21,22], which is not inferior to the celebrated Randomized Kaczmarz method of Strohmer and Vershynin [24]. However, one consequence of our experiments is that randomized methods are not faster than the Kaczmarz method with one initial random shuffle.
In our experiments, the random shuffle is based on a random permutation generator. Yet, following Herman and Meyer [15], it is possible to construct an improved initial shuffle that takes advantage of the special structure of tomography problems. The idea is to seek a permutation that improves the orthogonality between adjacent rows. In general, there is no easy way to achieve this task, but the special structure of tomography problems enables effective solutions of this problem, e.g., [11,15,17,18]. As with random shuffles, the use of optimal ordering is expected to sharpen the anomaly phenomenon. However, the testing of this issue is left to future research.