Abstract
Let A be an arbitrary matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the submatrix A_i, i = 1, ..., m, be composed of the first i rows of A. Let σ_i denote the smallest singular value of A_i, and let c_i denote the condition number of A_i. In this paper, we examine the behavior of the sequences σ_1, ..., σ_m and c_1, ..., c_m. The behavior of the smallest singular values sequence is somewhat surprising. The first part of this sequence, σ_1, ..., σ_n, is descending, while the second part, σ_n, ..., σ_m, is ascending. This phenomenon is called “the smallest singular value anomaly”. The sequence of the condition numbers has a different character. The first part of this sequence, c_1, ..., c_n, always ascends toward c_n, which can be very large. The condition number anomaly occurs when the second part, c_n, ..., c_m, descends toward a value of c_m, which is considerably smaller than c_n. It is shown that this is likely to happen whenever the rows of A satisfy two conditions: all the rows are about the same size, and the directions of the rows scatter in some random way. These conditions hold in a wide range of random matrices, as well as other types of matrices. The practical importance of these phenomena lies in the use of iterative methods for solving large linear systems, since several iterative solvers have the property that a large condition number results in a slow rate of convergence, while a small condition number yields fast convergence. Consequently, a condition number anomaly leads to a similar anomaly in the number of iterations. The paper ends with numerical experiments that illustrate the above phenomena.
Keywords:
submatrices; largest singular value; smallest singular value anomaly; condition number anomaly; iterations anomaly; random matrices
MSC:
65F15; 65F25; 65F50
1. Introduction
Let A ∈ ℝ^{m×n} be a given matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the rows of A be denoted as a_i^T, i = 1, ..., m, where a_i ∈ ℝ^n. That is, A = [a_1, a_2, ..., a_m]^T. Let
A_i = [a_1, a_2, ..., a_i]^T ∈ ℝ^{i×n}   (1)
be a submatrix of A, which is composed of the first i rows of A. Let ρ_i denote the largest singular value of A_i, let σ_i denote the smallest singular value of A_i, and let c_i = ρ_i/σ_i denote the condition number of this matrix. In this paper, we investigate the behavior of the sequences ρ_i, σ_i, and c_i, i = 1, ..., m. We start by showing that adding rows causes the largest singular value to increase,
ρ_1 ≤ ρ_2 ≤ ⋯ ≤ ρ_m,   (2)
and study the reasons for the large, or small, increase. Next, we consider the behavior of the smallest singular values, which is somewhat surprising: at first, adding rows causes the smallest singular value to decrease,
σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n.   (3)
Then, as i passes n, adding rows increases the smallest singular value. That is,
σ_n ≤ σ_{n+1} ≤ ⋯ ≤ σ_m.   (4)
This behavior is called “the smallest singular value anomaly”. The study of this phenomenon explains the reasons for a large, or small, difference between σ_n and σ_m.
The last observation implies that σ_n is the smallest number in the sequence σ_1, ..., σ_m. Assume for simplicity that σ_n > 0. In this case, σ_i > 0 for i = 1, ..., m, and the ratio c_i = ρ_i/σ_i is the condition number of A_i. This number affects the results of certain computations, such as the solution of linear equations, e.g., [1,2,3]. It is interesting, therefore, to examine the behavior of the sequence c_1, ..., c_m. The inequalities (2) and (3) show that as i moves from 1 to n, the value of c_i increases. That is,
c_1 ≤ c_2 ≤ ⋯ ≤ c_n.   (5)
However, as i passes n, both ρ_i and σ_i are increasing, and the behavior of c_i is not straightforward. The fact that the sequence σ_n, ..., σ_m is increasing tempts one to expect that the sequence c_n, ..., c_m will decrease. That is,
c_n ≥ c_{n+1} ≥ ⋯ ≥ c_m.   (6)
The situation in which (6) holds is called the condition number anomaly. In this case, the sequence c_1, ..., c_n increases toward c_n, which can be quite large, while the sequence c_n, ..., c_m decreases toward c_m, which is considerably smaller than c_n. The inequalities that assess the increase in the sequences (2) and (4) enable us to derive a useful bound on the ratio c_{i+1}/c_i. The bound explains the reasons behind the condition number anomaly, and characterizes situations that invite (or exclude) such behavior.
One type of matrices that exhibits the condition number anomaly is that of dense random matrices in which each element of the matrix is independently sampled from the same probability distribution. In particular, if each element of A comes from an independent standard normal distribution, then A is a Gaussian random matrix, and A^T A is a Wishart matrix. The problem of estimating the largest and the smallest singular values of large Gaussian matrices has been studied by several authors. See [4,5,6,7,8,9,10,11,12,13,14] and the references therein. In this case, when n is very large and m > n, we have the estimates
ρ_m ≈ √m + √n   (7)
and
σ_m ≈ √m − √n,   (8)
which means that very large Gaussian matrices possess the condition number anomaly: c_m ≈ (√m + √n)/(√m − √n) is small whenever m is considerably larger than n, while c_n is very large (for very large n and m = n, the condition number grows proportionally to n; see [7,12]).
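The estimates (7) and (8) are easy to probe numerically. The following sketch is our own check, not one of the paper's experiments; the sizes m = 4000 and n = 1000 and the seed are arbitrary choices.

```python
import numpy as np

# Compare the extreme singular values of a large Gaussian random matrix with
# the classical estimates sqrt(m) + sqrt(n) and sqrt(m) - sqrt(n).
rng = np.random.default_rng(1)
m, n = 4000, 1000
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
est_max = np.sqrt(m) + np.sqrt(n)
est_min = np.sqrt(m) - np.sqrt(n)
print(s[0] / est_max, s[-1] / est_min)   # both ratios are close to 1
```

For matrices of this size the relative deviation of both ratios from 1 is small, in line with the concentration results cited above.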
Our analysis shows that the condition number anomaly is not restricted to large Gaussian matrices. It is shared by a wide range of matrices, from small random matrices to large sparse matrices. The bounds that we derive have a simple geometric interpretation that helps to see what makes c_n large and what forces the sequence c_n, ..., c_m to decrease. Roughly speaking, the condition number anomaly is expected whenever all the rows of the matrix have about the same size and the directions of the rows are randomly scattered. The paper presents several numerical examples that illustrate this feature.
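To see the two anomalies concretely, the following sketch computes the three sequences for the leading submatrices of a uniform random matrix. The matrix size and seed are arbitrary choices for this illustration, not one of the paper's test cases.

```python
import numpy as np

# sigma_i, rho_i, and c_i for the leading submatrices A_i of a random matrix
# with entries in [-1, 1].
rng = np.random.default_rng(0)
m, n = 200, 10
A = 2.0 * rng.random((m, n)) - 1.0

svals = [np.linalg.svd(A[:i], compute_uv=False) for i in range(1, m + 1)]
smin = [s.min() for s in svals]                 # sigma_i
smax = [s.max() for s in svals]                 # rho_i
cond = [a / b for a, b in zip(smax, smin)]      # c_i

# Smallest singular value anomaly: descending up to i = n, ascending afterwards.
assert all(smin[i] >= smin[i + 1] - 1e-12 for i in range(n - 1))
assert all(smin[i] <= smin[i + 1] + 1e-12 for i in range(n - 1, m - 1))
# Condition number anomaly: c_n is large, c_m is much smaller.
print(cond[n - 1], cond[m - 1])
```

The monotonicity checks reflect the theorems of Sections 2 and 3, while the final comparison shows the condition number anomaly for this particular matrix.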
The practical interest in the condition number anomaly comes from the use of iterative methods for solving large sparse linear systems, e.g., [15,16,17,18]. Some of these methods have the property that the asymptotic rate of convergence depends on the condition number of the related matrix. That is, a large condition number results in slow convergence, while a small condition number yields fast convergence. Assume now that such a method is used to solve a linear system whose matrix has the condition number anomaly. Then the last property implies a similar anomaly in the number of iterations. This phenomenon is called “iterations anomaly”. The discussion in Section 5 demonstrates this property in the methods of Richardson, Cimmino, and Jacobi. See Table 12.
2. The Ascending Behavior of the Largest Singular Values
In this section, we investigate the behavior of the sequence ρ_1, ..., ρ_m. The first assertion establishes the ascending property of this sequence.
Theorem 1.
The sequence satisfies
ρ_i ≤ ρ_{i+1} for i = 1, ..., m − 1.   (10)
Proof.
Observe that ρ_i² is the largest eigenvalue of the matrix A_i A_i^T, which is a principal submatrix of A_{i+1} A_{i+1}^T. Hence, (10) is a direct consequence of the Cauchy interlace theorem. For statements and proofs of this theorem, see, for example, Refs. [2] (p. 441), [19] (p. 185), [20] (p. 149), [21] and [22] (p. 186). A second way to prove (10) is given below. This approach enables a closer inspection of the ascending process.
Here, we use the fact that ρ_i² is the largest eigenvalue of the cross-product matrix B_i = A_i^T A_i. Let the unit vector u_i ∈ ℝ^n denote the corresponding dominant eigenvector of B_i. Then
ρ_i² = u_i^T B_i u_i
and
||u_i|| = 1,
where ||·|| denotes the Euclidean vector norm. Note also that
B_{i+1} = B_i + a_{i+1} a_{i+1}^T   (14)
and
u_i^T B_{i+1} u_i = u_i^T B_i u_i + (a_{i+1}^T u_i)².
Consequently,
ρ_{i+1}² ≥ u_i^T B_{i+1} u_i = ρ_i² + (a_{i+1}^T u_i)² ≥ ρ_i²,
which proves (10). □
Next, we provide an upper bound on the increase in the largest singular value.
Theorem 2.
The inequality
ρ_{i+1}² ≤ ρ_i² + ||a_{i+1}||²   (17)
holds for i = 1, ..., m − 1.
Proof.
Let the unit vector u_{i+1} denote a dominant eigenvector of B_{i+1}. Then (14) gives
ρ_{i+1}² = u_{i+1}^T B_{i+1} u_{i+1} = u_{i+1}^T B_i u_{i+1} + (a_{i+1}^T u_{i+1})² ≤ ρ_i² + ||a_{i+1}||²,
which proves (17). □
Combining (10) and (17) shows that
ρ_i ≤ ρ_{i+1} ≤ (ρ_i² + ||a_{i+1}||²)^{1/2}.
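The two-sided bound above is easy to verify numerically. A small illustrative check of our own (any matrix works; the sizes and seed are arbitrary):

```python
import numpy as np

# Verify rho_i <= rho_{i+1} and rho_{i+1}^2 <= rho_i^2 + ||a_{i+1}||^2
# for every leading submatrix of a random matrix.
rng = np.random.default_rng(2)
m, n = 60, 8
A = rng.standard_normal((m, n))
rho = [np.linalg.norm(A[:i], 2) for i in range(1, m + 1)]  # largest singular values
for i in range(m - 1):
    assert rho[i] <= rho[i + 1] + 1e-10                        # inequality (10)
    assert rho[i + 1]**2 <= rho[i]**2 + np.linalg.norm(A[i + 1])**2 + 1e-8  # (17)
```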
This raises the question of for which directions of a_{i+1} the value of ρ_{i+1} attains its bounds. The key to answering this question lies in the following observation.
Lemma 1.
Assume for a moment that a_{i+1} is an eigenvector of the matrix B_i. That is,
B_i a_{i+1} = λ a_{i+1},
where λ is a nonnegative scalar. In this case, the matrix B_{i+1} has the same set of eigenvectors as B_i. The eigenvector a_{i+1} satisfies
B_{i+1} a_{i+1} = (λ + ||a_{i+1}||²) a_{i+1},
while all the other eigenpairs remain unchanged.
Proof.
Since a_{i+1} is an eigenvector of B_i, substituting the spectral decomposition of B_i into (14) yields the spectral decomposition of B_{i+1}. □
The possibility that ρ_{i+1} achieves its upper bound is characterized by the following assertion.
Theorem 3.
Assume for a moment that a_{i+1} is a dominant eigenvector of B_i. In this case
ρ_{i+1}² = ρ_i² + ||a_{i+1}||².
Otherwise, when a_{i+1} is not pointing toward a dominant eigenvector,
ρ_{i+1}² < ρ_i² + ||a_{i+1}||².
Proof.
The first claim is a direct consequence of Lemma 1. To prove the second claim, let the unit vector u_{i+1} denote a dominant eigenvector of B_{i+1}, so that ρ_{i+1}² = u_{i+1}^T B_i u_{i+1} + (a_{i+1}^T u_{i+1})², and consider two cases. The first one occurs when u_{i+1}^T B_i u_{i+1} = ρ_i². Since u_{i+1} is not at the direction of a_{i+1}, in this case, there is a strict inequality (a_{i+1}^T u_{i+1})² < ||a_{i+1}||², which yields a strict inequality in (17). In the second case, u_{i+1}^T B_i u_{i+1} < ρ_i², so now, too, we obtain a strict inequality in (17). □
Finally, we consider the possibility that a_{i+1} points at an eigenvector that corresponds to the smallest eigenvalue of B_i.
Theorem 4.
Assume that i ≥ n and that a_{i+1} is an eigenvector of B_i, which corresponds to the smallest eigenvalue of this matrix. That is,
B_i a_{i+1} = σ_i² a_{i+1}.
Let λ_2 denote the second smallest eigenvalue of B_i. If
σ_i² + ||a_{i+1}||² ≤ λ_2,   (27)
then σ_{i+1}² = σ_i² + ||a_{i+1}||². Otherwise, when
σ_i² + ||a_{i+1}||² > λ_2,
the value of σ_{i+1} satisfies
σ_{i+1}² = λ_2.
Proof.
By Lemma 1, the spectrum of B_{i+1} is obtained from that of B_i by replacing the eigenvalue σ_i² with σ_i² + ||a_{i+1}||², while all the other eigenvalues remain unchanged. Hence, the smallest eigenvalue of B_{i+1} is the smaller of σ_i² + ||a_{i+1}||² and λ_2. □
The restriction i ≥ n is due to the fact that if i < n, then the smallest eigenvalue of B_i is always zero. The extension of Theorem 4 to cover this case is achieved by setting zero instead of σ_i². Similar results are obtained when a_{i+1} points to other eigenvectors of B_i.
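Lemma 1 and Theorem 4 can be illustrated numerically. The sketch below (our own construction; the matrix size and the row length t are arbitrary) appends a row that points along the eigenvector of the smallest eigenvalue of B_i and checks how the spectrum moves:

```python
import numpy as np

# Appending a row along an eigenvector of B_i = A_i^T A_i shifts only the
# corresponding eigenvalue, by the squared length of the new row.
rng = np.random.default_rng(3)
i, n = 30, 5
A = rng.standard_normal((i, n))
B = A.T @ A
w, V = np.linalg.eigh(B)             # ascending: w[0] is sigma_i^2
t = 0.5                              # length of the appended row (arbitrary)
a_new = t * V[:, 0]                  # row along the eigenvector of w[0]
B_new = B + np.outer(a_new, a_new)
w_new = np.linalg.eigh(B_new)[0]
# Theorem 4: the new smallest eigenvalue is min(w[0] + t^2, w[1]).
assert np.isclose(w_new[0], min(w[0] + t**2, w[1]))
```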
3. The Smallest Singular Value Anomaly
In this section we explore the behavior of the smallest singular values. We shall start by proving that the sequence σ_1, ..., σ_n is descending. The proof uses the fact that for i ≤ n, the smallest eigenvalue of A_i A_i^T is σ_i².
Theorem 5.
For i = 1, ..., n − 1, we have the inequality
σ_{i+1} ≤ σ_i.   (30)
Proof.
The matrix A_i A_i^T is a principal submatrix of A_{i+1} A_{i+1}^T. Hence, (30) is a direct corollary of the Cauchy interlace theorem. □
Next, we show that the sequence σ_n, σ_{n+1}, ..., σ_m is ascending.
Theorem 6.
For i = n, ..., m − 1, we have the inequality
σ_i ≤ σ_{i+1}.   (31)
Proof.
One way to conclude (31) is by using the fact that A_i A_i^T is a principal submatrix of A_{i+1} A_{i+1}^T. Let
λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_i
and
μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_{i+1}
denote the eigenvalues of these matrices. Then, since i ≥ n, we have σ_i² = λ_n and σ_{i+1}² = μ_n, and (31) is a direct consequence of the Cauchy interlace theorem.
As before, a second proof is obtained by comparing the matrices B_i and B_{i+1}, and this approach provides us with useful inequalities. Let the unit vector v_{i+1} ∈ ℝ^n denote an eigenvector of B_{i+1} that corresponds to σ_{i+1}². Then
σ_{i+1}² = v_{i+1}^T B_{i+1} v_{i+1},
and σ_i² has the minimum property
σ_i² = min { v^T B_i v : ||v|| = 1 }.
The last property implies the inequality
σ_i² ≤ v_{i+1}^T B_i v_{i+1},
while a further use of (14) gives
σ_{i+1}² = v_{i+1}^T B_i v_{i+1} + (a_{i+1}^T v_{i+1})² ≥ σ_i² + (a_{i+1}^T v_{i+1})².   (35)
□
The inequality (35) implies that the growing of σ_{i+1} depends on the size of the scalar product a_{i+1}^T v_{i+1}. Basically, it is difficult to estimate this product, but Lemma 1 and Theorem 4 give some insight. For example, if a_{i+1} is an eigenvector of B_i whose eigenvalue differs from σ_i², then a_{i+1}^T v_{i+1} = 0. If a_{i+1} is an eigenvector that corresponds to σ_i², there are two possibilities to consider. If σ_i² is a multiple eigenvalue, then, again, a_{i+1}^T v_{i+1} = 0. Otherwise, when σ_i² is a simple eigenvalue,
σ_{i+1}² − σ_i² = min { ||a_{i+1}||², δ },
where δ is the difference between the two smallest eigenvalues of B_i.
We have seen that the sequence σ_1, ..., σ_n is descending, while the sequence σ_n, ..., σ_m is ascending. This behavior is called the smallest singular value anomaly. The fact that σ_n is the smallest singular value in the whole sequence raises the question of what makes σ_n small. Clearly, σ_n is always smaller than the smallest row norm, min { ||a_i|| : i = 1, ..., n }.
Thus, to obtain a meaningful answer, we make the simplifying assumption
||a_i|| = 1 for i = 1, ..., m,   (40)
which enables the following bounds.
Lemma 2.
Assume that (40) holds. Then 1 ≤ ρ_n ≤ √n and
σ_n² ≤ (n − ρ_n²) / (n − 1).   (41)
Proof.
Under (40), the trace of B_n = A_n^T A_n equals n. Since the trace equals the sum of the eigenvalues of B_n, this gives n = trace(B_n) ≥ ρ_n² + (n − 1)σ_n², and (41) follows. Similarly, n ρ_n² ≥ trace(B_n) = n ≥ ρ_n², which bounds ρ_n. □
Usually, the bound (41) is a crude estimate of σ_n. Yet, in some cases, it is the reason for a small value of σ_n: when ρ_n comes close to √n, the bound forces σ_n to be close to zero.
4. The Condition Number Anomaly
In this section we investigate the behavior of the sequence c_i = ρ_i/σ_i. The discussion is carried out under the assumption that σ_n > 0, which ensures that σ_i > 0 for i = 1, ..., m. We have seen that the sequence ρ_1, ..., ρ_n is ascending while the sequence σ_1, ..., σ_n is descending. This proves that the sequence c_1, ..., c_n is ascending. That is,
c_1 ≤ c_2 ≤ ⋯ ≤ c_n.
It is also known that the sequences ρ_n, ..., ρ_m and σ_n, ..., σ_m are ascending, but this does not provide decisive information about the behavior of the sequence c_n, ..., c_m. We shall start with examples that illustrate this point.
Example 1.
This example shows that c_{i+1} can be larger than c_i. For this purpose, consider the case when a_{i+1} is a dominant eigenvector of B_i. Then from Lemma 1 we see that ρ_{i+1} > ρ_i but σ_{i+1} = σ_i, which means that c_{i+1} > c_i.
Example 2.
A similar situation arises when A has the following property. Assume that as i grows, the sequence of row directions a_i/||a_i||, i = 1, 2, ..., converges rapidly toward some vector. In this case, the sequence of dominant eigenvectors of B_i converges to the same vector, which brings us close to the situation of Example 1 (Tables 3 and 10 illustrate this possibility).
Example 3.
The third example shows that c_{i+1} can be smaller than c_i. Consider the case described in Theorem 4, when (27) holds. Here ρ_{i+1} = ρ_i and σ_{i+1} > σ_i, so c_{i+1} < c_i. More reasons that force c_i to decrease are given in Corollary 1 below.
Example 4.
The fourth example describes a situation in which the condition number behaves in a cyclic manner. Let B ∈ ℝ^{p×n} be a given matrix with p ≥ n and σ_min(B) > 0. Let the matrix A be obtained by duplicating B k times. That is, m = kp and
A = [B^T, B^T, ..., B^T]^T.
Then when i takes the values i = jp, j = 1, ..., k, the matrix A_i has the form
A_i = [B^T, ..., B^T]^T (j copies of B), so A_i^T A_i = j B^T B.
Hence, for these values of i we have ρ_{jp} = √j ρ_p and σ_{jp} = √j σ_p, but c_{jp} = c_p.
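This cyclic behavior is easy to reproduce. In the following sketch (with arbitrary block sizes and random data of our choosing) the singular values at the duplication points scale by √j while the condition number stays fixed:

```python
import numpy as np

# Duplicating a block B keeps the condition number fixed at the points i = j*p,
# while every singular value grows like sqrt(j).
rng = np.random.default_rng(4)
p, n, k = 12, 4, 5
B = rng.standard_normal((p, n))
A = np.vstack([B] * k)                       # m = k*p rows
sB = np.linalg.svd(B, compute_uv=False)
for j in range(1, k + 1):
    s = np.linalg.svd(A[: j * p], compute_uv=False)
    assert np.allclose(s, np.sqrt(j) * sB)               # scaling by sqrt(j)
    assert np.isclose(s[0] / s[-1], sB[0] / sB[-1])      # condition number fixed
```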
The situation in which the sequence c_n, c_{n+1}, ..., c_m is descending,
c_n ≥ c_{n+1} ≥ ⋯ ≥ c_m,
is called the condition number anomaly. The reasons behind this behavior are explained below.
Theorem 7.
Let u_{i+1} and v_{i+1} denote unit eigenvectors of B_{i+1} that correspond to its largest and smallest eigenvalues, respectively, and let the positive parameters α and β be defined by the equalities
α = (a_{i+1}^T u_{i+1})²
and
β = (a_{i+1}^T v_{i+1})².
Then, for i = n, ..., m − 1,
(c_{i+1}/c_i)² ≤ (1 + α/ρ_i²) / (1 + β/σ_i²).
Proof.
The proofs of Theorems 2 and 6 show that ρ_{i+1}² = u_{i+1}^T B_i u_{i+1} + α ≤ ρ_i² + α, while σ_{i+1}² = v_{i+1}^T B_i v_{i+1} + β ≥ σ_i² + β. Dividing the first relation by the second one gives the asserted bound. □
Corollary 1.
The inequality
α/ρ_i² ≤ β/σ_i²   (50)
implies
c_{i+1} ≤ c_i. □
The last corollary is a key observation that indicates at which situations the condition number anomaly is likely to occur. Assume for a moment that the direction of a_{i+1} is chosen in some random way. Then, the scalar product terms (a_{i+1}^T u_{i+1})² and (a_{i+1}^T v_{i+1})² are likely to be about the same size. However, since σ_i² is (considerably) smaller than ρ_i², the term β/σ_i² is expected to be larger than α/ρ_i², which implies (50).
Summarizing the above discussion, we see that the condition number anomaly is likely to occur whenever the rows of the matrix satisfy two conditions: all the rows have about the same size, and the directions of the rows are scattered in some random way. This conclusion means that the phenomenon is shared by a wide range of matrices. The examples in Section 6 illustrate this point.
5. Iterations Anomaly
Let A, A_i, and c_i, i = n, ..., m, be as in the previous sections. Let x* ∈ ℝ^n be an arbitrary given vector, which is used to define the vectors b_i = A_i x*. In this section, we examine how the condition number anomaly affects the convergence of certain iterative methods for solving a linear system of the form
A_i x = b_i.
We shall start by considering the Richardson method for solving the normal equations
A_i^T A_i x = A_i^T b_i,   (53)
e.g., [16,17,18]. Given the k-th iterate, x_k, the next iterate of the Richardson method has the form
x_{k+1} = x_k + w A_i^T (b_i − A_i x_k),   (54)
where w is a pre-assigned relaxation parameter. Recall that −A_i^T (b_i − A_i x_k) is the gradient vector of the least-squares objective function
F(x) = ½ ||A_i x − b_i||²
at the point x_k. Hence, iteration (54) can be viewed as a steepest descent method for minimizing F(x) that uses a fixed step length. An equivalent way to write (54) is
x_{k+1} = (I − w A_i^T A_i) x_k + w A_i^T b_i,   (57)
which shows that the rate of convergence of the method depends on the spectral radius of the iteration matrix
H_w = I − w A_i^T A_i.
Let θ_w denote the spectral radius of H_w. Then the theory of iterative methods tells us that the method converges whenever
θ_w < 1,   (58)
and the smaller θ_w is, the faster the convergence; see, for example, Refs. [16,17,18]. Observe that the eigenvalues of H_w lie in the interval [1 − w ρ_i², 1 − w σ_i²]. This shows that (58) holds for values of w that satisfy
0 < w < 2/ρ_i².
Furthermore, let w* denote the optimal value of w, for which θ_w attains its smallest value. Then
w* = 2/(ρ_i² + σ_i²)
and
θ_{w*} = (ρ_i² − σ_i²)/(ρ_i² + σ_i²) = (c_i² − 1)/(c_i² + 1).   (61)
See [17] (pp. 22–23) and [18] (pp. 114–115) for detailed discussion of these results. Consequently, as c_i increases, the spectral radius of the iteration matrix approaches 1, and the rate of convergence slows down. That is, the condition number anomaly results in a similar anomaly in the number of iterations. See Table 12.
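As an illustration, here is a minimal sketch of iteration (54) with the optimal relaxation parameter. The test problem is an arbitrary well-conditioned random system of our choosing, not one of the paper's experiments:

```python
import numpy as np

# Richardson iteration on the normal equations, with the optimal relaxation
# parameter w* = 2 / (rho^2 + sigma^2).
rng = np.random.default_rng(5)
m, n = 120, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true
s = np.linalg.svd(A, compute_uv=False)
w = 2.0 / (s[0]**2 + s[-1]**2)           # optimal relaxation parameter
x = np.zeros(n)
for k in range(100000):
    r = A.T @ (b - A @ x)                # negative gradient of 0.5*||Ax - b||^2
    if np.linalg.norm(r) < 1e-10:
        break
    x = x + w * r
print(k)                                 # iteration count grows with c_i
```

For this well-conditioned system, only a few dozen iterations are needed; as the condition number of the system grows, the spectral radius (61) approaches 1 and the count explodes, which is the iterations anomaly discussed below.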
Another useful iterative method for solving large sparse linear systems is the Cimmino method, e.g., [15,16,18,23]. Let the unit vectors
â_j = a_j/||a_j||, j = 1, ..., m,
be obtained by normalizing the rows of A. Let Â_i be an i×n matrix whose rows are â_1^T, ..., â_i^T, and let D_i denote the diagonal matrix
D_i = diag(||a_1||, ..., ||a_i||).   (63)
Then
Â_i = D_i^{-1} A_i
for i = n, ..., m. Similarly, we define b̂_i = D_i^{-1} b_i for i = n, ..., m, and let ĉ_i denote the condition number of Â_i. Then the Cimmino method is aimed at solving the linear system
Â_i x = b̂_i,
or the related normal equations
Â_i^T Â_i x = Â_i^T b̂_i.   (66)
The kth iteration of the Cimmino method has the form
x_{k+1} = x_k + w Σ_{j=1}^{i} μ_j (b̂_j − â_j^T x_k) â_j,
where w is a pre-assigned relaxation parameter, and μ_1, ..., μ_i are weighting parameters that satisfy
μ_j > 0, j = 1, ..., i, and Σ_{j=1}^{i} μ_j = 1.
Observe that the point
p_j = x_k + (b̂_j − â_j^T x_k) â_j
is the projection of x_k on the hyperplane â_j^T x = b̂_j, and the point
Σ_{j=1}^{i} μ_j p_j
is a weighted average of these projections. The usual way to apply the Cimmino method is with equal weights. That is, μ_j = 1/i for j = 1, ..., i. This enables us to rewrite the Cimmino iteration in the form
x_{k+1} = x_k + (w/i) Â_i^T (b̂_i − Â_i x_k),   (71)
which is the Richardson iteration for solving the normal equations (66). Therefore, from (61), we conclude that the optimal rate of convergence of the Cimmino method depends on the ratio
(ĉ_i² − 1)/(ĉ_i² + 1),
where ĉ_i is the condition number of Â_i.
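The equivalence between the equal-weights Cimmino step and the Richardson form (71) can be checked directly. A small sketch on arbitrary random data (our own check):

```python
import numpy as np

# One equal-weights Cimmino step (average of the hyperplane projections)
# coincides with the Richardson step (w/i) * Ahat^T (bhat - Ahat x).
rng = np.random.default_rng(6)
i, n = 40, 6
A = rng.standard_normal((i, n))
Ahat = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit rows
bhat = Ahat @ np.ones(n)
x = rng.standard_normal(n)
w = 1.0
# Projections of x on the i hyperplanes ahat_j^T x = bhat_j.
projections = x + (bhat - Ahat @ x)[:, None] * Ahat
x_cimmino = x + w * (projections.mean(axis=0) - x)
# Richardson form of the same step.
x_richardson = x + (w / i) * Ahat.T @ (bhat - Ahat @ x)
assert np.allclose(x_cimmino, x_richardson)
```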
Another example is the Jacobi iteration for solving the equations
Â_i Â_i^T y = b̂_i.   (73)
The basic iteration of this method has the form
y_{k+1} = y_k + w D^{-1} (b̂_i − Â_i Â_i^T y_k),   (74)
where D is the diagonal part of the coefficient matrix Â_i Â_i^T and w is a pre-assigned relaxation parameter. Since Â_i has unit rows, D = I, and the iteration matrix of the Jacobi method, I − w Â_i Â_i^T, has the same eigenvalues as the matrix I − w Â_i^T Â_i, apart from additional eigenvalues that equal 1. Hence, as before, the optimal rate of convergence depends on the ratio (ĉ_i² − 1)/(ĉ_i² + 1). Thus, again, a condition number anomaly invites a similar anomaly in the number of iterations.
We shall finish this section by mentioning two further methods that share this behavior. The first one is the conjugate gradients algorithm for solving the normal equations (53), whose rate of convergence slows down as the condition number of A_i increases. See, for example, Refs. [1] (pp. 312–314), [3] (pp. 299–300) and [18] (pp. 203–205). The second is Kaczmarz’s method, which is a popular “row-action” method; see Refs. [15,16,23,24,25]. The use of this method to solve Â_i x = b̂_i is equivalent to the SOR method for solving the system Â_i Â_i^T y = b̂_i, and both methods have the property that a small condition number results in fast convergence while a large condition number slows it down [24,25].
6. Numerical Examples
In this section, we present some examples that illustrate the actual behavior of the anomaly phenomena. The first examples consider small matrices.
Table 1 describes the anomaly in a “two-ones” matrix. All the rows of this matrix are different. Each row has only two nonzero entries, and each nonzero entry has the value 1 (a matrix with n columns has at most n(n − 1)/2 different rows of this type). This matrix exhibits a moderate anomaly, due to the fact that A_n is well conditioned.
Table 1.
The anomaly in small “two-ones” matrix.
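A possible construction of such a matrix (our sketch; the paper does not list its code) enumerates all pairs of columns:

```python
import numpy as np
from itertools import combinations

# The full "two-ones" matrix: one row per pair of columns, each row holding
# exactly two entries equal to 1.  With n columns there are n(n-1)/2 such rows.
n = 6
rows = [[1.0 if j in pair else 0.0 for j in range(n)]
        for pair in combinations(range(n), 2)]
A = np.array(rows)
assert A.shape == (n * (n - 1) // 2, n)
# A^T A = (n-2)I + J, so the squared extreme singular values are 2n-2 and n-2.
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(s[0]**2, 2 * n - 2)
assert np.isclose(s[-1]**2, n - 2)
```

The closed-form singular values confirm that the full matrix is well conditioned, which is why the anomaly in Table 1 is only moderate.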
Table 2 describes the anomaly in a small segment of the Hilbert matrix. Here, the (i, j) entry equals 1/(i + j − 1). Consequently, the sequence of row directions a_i/||a_i||, i = 1, 2, ..., converges slowly toward the vector e/√n, where e = (1, ..., 1)^T. Hence, the decrease in the sequence c_n, ..., c_m is quite moderate.
Table 2.
The anomaly in small Hilbert matrix.
In Table 3, we consider a small segment of the Pascal matrix. Recall that the entries of this matrix are built in the following way: a_{1j} = 1 for j = 1, ..., n, and a_{i1} = 1 for i = 1, ..., m. The other entries are obtained from the rule
a_{ij} = a_{i−1,j} + a_{i,j−1}.
In this matrix, the norm of the rows grows very fast while the sequence of row directions converges rapidly toward the vector (0, ..., 0, 1)^T. Thus, as i becomes considerably larger than n, both a_i/||a_i|| and the dominant eigenvector of B_i approach this vector, which causes c_m to be larger than c_n.
Table 3.
Failure of the anomaly in small Pascal matrix.
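The construction rule above can be sketched as follows; the sizes are arbitrary, and the check confirms that the normalized rows are soon dominated by their last entry:

```python
import numpy as np

# Pascal-type construction: first row and first column are ones, and
# a[i][j] = a[i-1][j] + a[i][j-1] elsewhere.
m, n = 30, 4
P = np.ones((m, n))
for i in range(1, m):
    for j in range(1, n):
        P[i, j] = P[i - 1, j] + P[i, j - 1]
directions = P / np.linalg.norm(P, axis=1, keepdims=True)
# Row norms explode while row directions converge to (0, ..., 0, 1)^T.
assert directions[-1, -1] > 0.99
```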
The random matrices that are tested in Table 4 and Table 5 provide nice examples of the anomaly phenomenon. In these matrices, each entry is a random number from the interval [−1, 1]. To generate these matrices, and the other random matrices, we used MATLAB’s command “rand”, whose random number generator is of uniform distribution. Similar results are obtained when “rand” is replaced with “randn”, which uses normal distribution.
Table 4.
The anomaly in small random matrix.
Table 5.
The anomaly in random matrix.
The nonnegative random matrix that is tested in Table 6 is obtained by MATLAB’s command A = rand(m, n). That is, here, each entry of A is a random number from the interval [0, 1]. This yields a more ill-conditioned matrix and a sharper anomaly.
Table 6.
The anomaly in nonnegative random matrix.
Table 7 and Table 8 consider a different type of random matrices. As its name says, the entries of the “−1 or 1” matrix are either −1 or 1, with equal probability. In practice, the entry a_{ij} is defined in the following way. First, sample a random number, r, from the interval [0, 1]. If r < 1/2, then a_{ij} = −1; otherwise a_{ij} = 1. The entries of the “0 or 1” matrix are defined in a similar manner: if r < 1/2 then a_{ij} = 0; otherwise a_{ij} = 1. Both matrices display a strong anomaly. The “0 or 1” matrix is slightly more ill conditioned and, therefore, has a sharper anomaly.
Table 7.
The anomaly in random “−1 or 1” matrix.
Table 8.
The anomaly in random “0 or 1” matrix.
The results of Table 9 and Table 10 are quite instructive. Both matrices are highly ill conditioned but display a different behavior. The “narrow range” matrix is a random matrix whose entries are sampled from the small interval [0.99, 1.01]. However, the directions of the rows are not converging, and the matrix displays a nice anomaly. The “converging rows” matrix is defined in a slightly different way. Here, the entries of the ith row are random numbers from an interval of the form [1 − ε_i, 1 + ε_i], where ε_i decreases as i grows. Hence, the related sequence of row directions converges toward the vector e/√n, e = (1, ..., 1)^T, which is the situation described in Example 2. Consequently, when i becomes much larger than n, we see a moderate increase in the value of c_i.
Table 9.
The anomaly in random “narrow range” matrix.
Table 10.
Failure of anomaly in matrix with “converging rows”.
Other matrices that possess the anomaly phenomena are large sparse matrices. The matrix in Table 11 is created by using MATLAB’s command sprand(m, n, density) with m = 100,000, n = 10,000 and density = 0.01. This way, each row of A has nearly 100 nonzero entries that have random values and random locations. Although not illustrated in this paper, our experience shows that the smaller the density, the sharper the anomaly.
Table 11.
The anomaly in large sparse random matrix.
Table 12 illustrates the iterations anomaly phenomenon when using the methods of Richardson, Cimmino, and Jacobi. The first two methods were used to solve linear systems of the form
Â_i x = b̂_i, i = n, ..., m.
As before, each Â_i is an i×n submatrix that is composed of the first i rows of a given matrix Â. The construction of Â is done in two steps. First, we generate a random matrix as in Table 4 and Table 5. Then the rows of the matrix are normalized to be unit vectors. The vector b̂_i is defined by the product
b̂_i = Â_i x*,
which ensures that x* solves the linear system. Since Â_i has unit rows, Cimmino iteration (71) coincides with Richardson iteration (54). The value of w that we use is the optimal one,
w* = 2/(ρ̂_i² + σ̂_i²),   (79)
where ρ̂_i and σ̂_i denote the largest and smallest singular values of Â_i, and the iterations start from the point x_0 = 0. The iterative process is terminated as soon as the norm of the residual vector, r_k = b̂_i − Â_i x_k, falls below a preset tolerance.
The number of iterations which are required to satisfy this condition is displayed in the last column of Table 12.
Table 12.
Iterations anomaly in the methods of Richardson, Cimmino, and Jacobi.
The Jacobi method was used to solve the linear systems (73), where Â_i and b̂_i are defined as above. Since Â_i has unit rows, D is a unit matrix, and Jacobi iteration (74) is reduced to
y_{k+1} = y_k + w (b̂_i − Â_i Â_i^T y_k).   (81)
The last iteration uses the optimal value of w, given in (79). It starts from the point y_0 = 0 and terminates as soon as the related residual vector satisfies the same stopping criterion.
The number of required iterations is nearly identical to that of the Richardson method. This is not surprising since multiplying (81) by Â_i^T shows that the sequence
x_k = Â_i^T y_k, k = 0, 1, 2, ...,
is generated by Richardson iteration (54). (There were only two minor exceptions: in one case, the Jacobi method required 36,959 iterations instead of 36,960, while in another, the Jacobi method required 6,772,151 iterations instead of 6,760,589. In all the other cases, the two methods required exactly the same number of iterations.)
The figures in Table 12 demonstrate the close link between the condition number and the rate of convergence. As anticipated from (61), for large values of ĉ_i the spectral radius approaches 1 and the rate of convergence slows down. Thus, a large condition number results in a large number of iterations. Conversely, a small value of ĉ_i implies a small spectral radius and a small number of iterations. In other words, a condition number anomaly invites a similar anomaly in the number of iterations.
Usually, it is reasonable to assume that the computational effort in solving a linear system is proportional to the number of rows. That is, the more rows we have, the more computation time is needed. From this point of view, the iterations anomaly phenomenon is somewhat surprising, as solving a linear system with m rows may need considerably less time than solving a smaller system with about n rows.
7. Concluding Remarks
As an old adage says, the whole is sometimes much more than the sum of its parts. The basic ascending (descending) properties of singular values are easily concluded from the Cauchy interlace theorem, while the inequalities that we derive enable us to see what causes a large, or small, increase. Combining these results gives a better overview of the whole situation. One consequence regards the anomalous behavior of the smallest singular values sequence σ_1, ..., σ_m, and the fact that σ_n is the smallest number in this sequence. The second observation is about the condition number anomaly. It is easy to conclude the increasing of the condition numbers sequence c_1, ..., c_n, but the Cauchy interlace theorem does not tell us how the rest of this sequence behaves. The answer is obtained by considering the bound on the ratio c_{i+1}/c_i. This bound explains the reasons behind the condition number anomaly and characterizes situations that invite (or exclude) such behavior. We see that the anomaly phenomenon is likely to occur in “random-like” matrices whose rows satisfy two conditions: all the rows have about the same size and the directions of the rows scatter in some random way. This suggests that the condition number anomaly phenomenon is common in several types of matrices, and the numerical examples illustrate this point.
The practical importance of the condition number anomaly lies in the use of iterative methods for solving large linear systems. As we have seen, several iterative solvers have the property that the rate of convergence depends on the condition number. Therefore, when solving “random-like” systems, a fast rate of convergence is expected in under-determined or over-determined systems, while a slower rate is expected in (nearly) square systems.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
References
- Demmel, J.W. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
- Golub, G.H.; Van Loan, C.F. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
- Trefethen, L.N.; Bau, D., III. Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
- Tikhomirov, K. The smallest singular value of random rectangular matrices with no moment assumptions on entries. Israel J. Math. 2016, 212, 289–314.
- Bai, Z.D.; Yin, Y.Q. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294.
- Chen, Z.; Dongarra, J.J. Condition numbers of Gaussian random matrices. SIAM J. Matrix Anal. Appl. 2005, 27, 603–620.
- Edelman, A. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl. 1988, 9, 543–560.
- Marchenko, V.A.; Pastur, L.A. Distribution of eigenvalues of some sets of random matrices. Math. USSR-Sb. 1967, 1, 457–486.
- Mendelson, S.; Paouris, G. On the singular values of random matrices. J. Eur. Math. Soc. 2014, 16, 823–834.
- Rudelson, M.; Vershynin, R. Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Math. 2009, 62, 1707–1739.
- Silverstein, J. On the weak limit of the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivar. Anal. 1989, 30, 307–311.
- Szarek, S. Condition numbers of random matrices. J. Complex. 1991, 7, 131–149.
- Tatarko, K. An upper bound on the smallest singular value of a square random matrix. J. Complex. 2018, 48, 119–128.
- Zimmermann, R. On the condition number anomaly of Gaussian correlation matrices. Linear Algebra Appl. 2014, 466, 512–526.
- Censor, Y. Row-action methods for huge and sparse systems and their applications. SIAM Rev. 1981, 23, 444–466.
- Dax, A. The convergence of linear stationary iterative processes for solving singular unstructured systems of linear equations. SIAM Rev. 1990, 32, 611–635.
- Hageman, L.A.; Young, D.M. Applied Iterative Methods; Academic Press: New York, NY, USA, 1981.
- Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, PA, USA, 2003.
- Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
- Horn, R.A.; Johnson, C.R. Topics in Matrix Analysis; Cambridge University Press: Cambridge, UK, 1991.
- Hwang, S.-G. Cauchy’s interlace theorem for eigenvalues of Hermitian matrices. Am. Math. Mon. 2004, 111, 157–159.
- Parlett, B.N. The Symmetric Eigenvalue Problem; Prentice-Hall: Englewood Cliffs, NJ, USA, 1980.
- Censor, Y.; Zenios, S.A. Parallel Optimization: Theory, Algorithms, and Applications; Oxford University Press: Oxford, UK, 1997.
- Dax, A. The adventures of a simple algorithm. Linear Algebra Appl. 2003, 361, 41–61.
- Dax, A. Kaczmarz Anomaly: A Surprising Feature of Kaczmarz Method; Technical Report; Hydrological Service of Israel, 2021; in preparation.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).