Abstract
Let A be an arbitrary matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the submatrix A_i, i = 1, ..., m, be composed of the first i rows of A. Let σ_i denote the smallest singular value of A_i, and let c_i denote the condition number of A_i. In this paper, we examine the behavior of the sequences σ_1, ..., σ_m and c_1, ..., c_m. The behavior of the smallest singular values sequence is somewhat surprising. The first part of this sequence, σ_1, ..., σ_n, is descending, while the second part, σ_n, ..., σ_m, is ascending. This phenomenon is called “the smallest singular value anomaly”. The sequence of the condition numbers has a different character. The first part of this sequence, c_1, ..., c_n, always ascends toward c_n, which can be very large. The condition number anomaly occurs when the second part, c_n, ..., c_m, descends toward a value of c_m, which is considerably smaller than c_n. It is shown that this is likely to happen whenever the rows of A satisfy two conditions: all the rows are about the same size, and the directions of the rows scatter in some random way. These conditions hold in a wide range of random matrices, as well as other types of matrices. The practical importance of these phenomena lies in the use of iterative methods for solving large linear systems, since several iterative solvers have the property that a large condition number results in a slow rate of convergence, while a small condition number yields fast convergence. Consequently, a condition number anomaly leads to a similar anomaly in the number of iterations. The paper ends with numerical experiments that illustrate the above phenomena.
Keywords:
submatrices; largest singular value; smallest singular value anomaly; condition number anomaly; iterations anomaly; random matrices
MSC:
65F15; 65F25; 65F50
1. Introduction
Let A ∈ ℝ^{m×n} be a given matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the rows of A be denoted as a_i^T, i = 1, ..., m, where a_i ∈ ℝ^n. That is, A = [a_1, a_2, ..., a_m]^T. Let
A_i = [a_1, a_2, ..., a_i]^T ∈ ℝ^{i×n}   (1)
be a submatrix of A, which is composed of the first i rows of A. Let ρ_i denote the largest singular value of A_i, let σ_i denote the smallest singular value of A_i, and let c_i = ρ_i/σ_i denote the condition number of this matrix. In this paper, we investigate the behavior of the sequences ρ_i, σ_i, and c_i, i = 1, ..., m. We start by showing that adding rows causes the largest singular value to increase,
ρ_1 ≤ ρ_2 ≤ ⋯ ≤ ρ_m,   (2)
and study the reasons for the large, or small, increase. Next, we consider the behavior of the smallest singular values, which is somewhat surprising: at first, adding rows causes the smallest singular value to decrease,
σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_n.   (3)
Then, as i passes n, adding rows increases the smallest singular value. That is,
σ_n ≤ σ_{n+1} ≤ ⋯ ≤ σ_m.   (4)
This behavior is called “the smallest singular value anomaly”. The study of this phenomenon explains the reasons for a large, or small, difference between σ_n and σ_m.
The last observation implies that σ_n is the smallest number in the sequence σ_1, ..., σ_m. Assume for simplicity that σ_n > 0. In this case, σ_i > 0 for i = 1, ..., m, and the ratio c_i = ρ_i/σ_i is the condition number of A_i. This number affects the results of certain computations, such as the solution of linear equations, e.g., [1,2,3]. It is interesting, therefore, to examine the behavior of the sequence c_1, ..., c_m. The inequalities (2) and (3) show that as i moves from 1 to n, the value of c_i increases. That is,
c_1 ≤ c_2 ≤ ⋯ ≤ c_n.   (5)
However, as i passes n, both ρ_i and σ_i are increasing, and the behavior of c_i is not straightforward. The fact that the sequence σ_n, ..., σ_m is increasing tempts one to expect that the sequence c_n, ..., c_m will decrease. That is,
c_n ≥ c_{n+1} ≥ ⋯ ≥ c_m.   (6)
The situation in which (6) holds is called the condition number anomaly. In this case, the sequence c_1, ..., c_n increases toward c_n, which can be quite large, while the sequence c_n, ..., c_m decreases toward c_m, which is considerably smaller than c_n. The inequalities that assess the increase in the sequences (2) and (4) enable us to derive a useful bound on the ratio c_{i+1}/c_i. The bound explains the reasons behind the condition number anomaly, and characterizes situations that invite (or exclude) such behavior.
One type of matrices that exhibits the condition number anomaly is that of dense random matrices in which each element of the matrix is independently sampled from the same probability distribution. In particular, if each element of A comes from an independent standard normal distribution, then A is a Gaussian random matrix, and A^T A is a Wishart matrix. The problem of estimating the largest and the smallest singular values of large Gaussian matrices has been studied by several authors. See [4,5,6,7,8,9,10,11,12,13,14] and the references therein. In this case, when n is very large and m > n, we have the estimates
ρ_m ≈ √m + √n   (7)
and
σ_m ≈ √m − √n,   (8)
which means that very large Gaussian matrices possess the condition number anomaly: c_m ≈ (√m + √n)/(√m − √n) is small whenever m is considerably larger than n, while c_n is very large (for very large n and m = n, the condition number grows proportionally to n; see [7,12]).
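The estimates (7) and (8) are easy to probe numerically. The following sketch is our own check, not one of the paper's experiments; the sizes m = 4000 and n = 1000 and the seed are arbitrary choices.

```python
import numpy as np

# Compare the extreme singular values of a large Gaussian random matrix with
# the classical estimates sqrt(m) + sqrt(n) and sqrt(m) - sqrt(n).
rng = np.random.default_rng(1)
m, n = 4000, 1000
A = rng.standard_normal((m, n))
s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
est_max = np.sqrt(m) + np.sqrt(n)
est_min = np.sqrt(m) - np.sqrt(n)
print(s[0] / est_max, s[-1] / est_min)   # both ratios are close to 1
```

For matrices of this size the relative deviation of both ratios from 1 is small, in line with the concentration results cited above.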
Our analysis shows that the condition number anomaly is not restricted to large Gaussian matrices. It is shared by a wide range of matrices, from small random matrices to large sparse matrices. The bounds that we derive have a simple geometric interpretation that helps to see what makes c_n large and what forces the sequence c_n, ..., c_m to decrease. Roughly speaking, the condition number anomaly is expected whenever all the rows of the matrix have about the same size and the directions of the rows are randomly scattered. The paper presents several numerical examples that illustrate this feature.
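To see the two anomalies concretely, the following sketch computes the three sequences for the leading submatrices of a uniform random matrix. The matrix size and seed are arbitrary choices for this illustration, not one of the paper's test cases.

```python
import numpy as np

# sigma_i, rho_i, and c_i for the leading submatrices A_i of a random matrix
# with entries in [-1, 1].
rng = np.random.default_rng(0)
m, n = 200, 10
A = 2.0 * rng.random((m, n)) - 1.0

svals = [np.linalg.svd(A[:i], compute_uv=False) for i in range(1, m + 1)]
smin = [s.min() for s in svals]                 # sigma_i
smax = [s.max() for s in svals]                 # rho_i
cond = [a / b for a, b in zip(smax, smin)]      # c_i

# Smallest singular value anomaly: descending up to i = n, ascending afterwards.
assert all(smin[i] >= smin[i + 1] - 1e-12 for i in range(n - 1))
assert all(smin[i] <= smin[i + 1] + 1e-12 for i in range(n - 1, m - 1))
# Condition number anomaly: c_n is large, c_m is much smaller.
print(cond[n - 1], cond[m - 1])
```

The monotonicity checks reflect the theorems of Sections 2 and 3, while the final comparison shows the condition number anomaly for this particular matrix.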
The practical interest in the condition number anomaly comes from the use of iterative methods for solving large sparse linear systems, e.g., [15,16,17,18]. Some of these methods have the property that the asymptotic rate of convergence depends on the condition number of the related matrix. That is, a large condition number results in slow convergence, while a small condition number yields fast convergence. Assume now that such a method is used to solve a linear system whose matrix has the condition number anomaly. Then the last property implies a similar anomaly in the number of iterations. This phenomenon is called “iterations anomaly”. The discussion in Section 5 demonstrates this property in the methods of Richardson, Cimmino, and Jacobi. See Table 12.
2. The Ascending Behavior of the Largest Singular Values
In this section, we investigate the behavior of the sequence ρ_1, ..., ρ_m. The first assertion establishes the ascending property of this sequence.
Theorem 1.
The sequence satisfies
ρ_i ≤ ρ_{i+1} for i = 1, ..., m − 1.   (10)
Proof.
Observe that ρ_i² is the largest eigenvalue of the matrix A_i A_i^T, which is a principal submatrix of A_{i+1} A_{i+1}^T. Hence, (10) is a direct consequence of the Cauchy interlace theorem. For statements and proofs of this theorem, see, for example, Refs. [2] (p. 441), [19] (p. 185), [20] (p. 149), [21] and [22] (p. 186). A second way to prove (10) is given below. This approach enables a closer inspection of the ascending process.
Here, we use the fact that ρ_i² is the largest eigenvalue of the cross-product matrix B_i = A_i^T A_i. Let the unit vector u_i ∈ ℝ^n denote the corresponding dominant eigenvector of B_i. Then
ρ_i² = u_i^T B_i u_i
and
||u_i|| = 1,
where ||·|| denotes the Euclidean vector norm. Note also that
B_{i+1} = B_i + a_{i+1} a_{i+1}^T   (14)
and
u_i^T B_{i+1} u_i = u_i^T B_i u_i + (a_{i+1}^T u_i)².
Consequently,
ρ_{i+1}² ≥ u_i^T B_{i+1} u_i = ρ_i² + (a_{i+1}^T u_i)² ≥ ρ_i²,
which proves (10). □
Next, we provide an upper bound on the increase in the largest singular value.
Theorem 2.
The inequality
ρ_{i+1}² ≤ ρ_i² + ||a_{i+1}||²   (17)
holds for i = 1, ..., m − 1.
Proof.
Let the unit vector u_{i+1} denote a dominant eigenvector of B_{i+1}. Then (14) gives
ρ_{i+1}² = u_{i+1}^T B_{i+1} u_{i+1} = u_{i+1}^T B_i u_{i+1} + (a_{i+1}^T u_{i+1})² ≤ ρ_i² + ||a_{i+1}||²,
which proves (17). □
Combining (10) and (17) shows that
ρ_i ≤ ρ_{i+1} ≤ (ρ_i² + ||a_{i+1}||²)^{1/2}.
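The two-sided bound above is easy to verify numerically. A small illustrative check of our own (any matrix works; the sizes and seed are arbitrary):

```python
import numpy as np

# Verify rho_i <= rho_{i+1} and rho_{i+1}^2 <= rho_i^2 + ||a_{i+1}||^2
# for every leading submatrix of a random matrix.
rng = np.random.default_rng(2)
m, n = 60, 8
A = rng.standard_normal((m, n))
rho = [np.linalg.norm(A[:i], 2) for i in range(1, m + 1)]  # largest singular values
for i in range(m - 1):
    assert rho[i] <= rho[i + 1] + 1e-10                        # inequality (10)
    assert rho[i + 1]**2 <= rho[i]**2 + np.linalg.norm(A[i + 1])**2 + 1e-8  # (17)
```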
This raises the question of for which directions of a_{i+1} the value of ρ_{i+1} attains its bounds. The key to answering this question lies in the following observation.
Lemma 1.
Assume for a moment that a_{i+1} is an eigenvector of the matrix B_i. That is,
B_i a_{i+1} = λ a_{i+1},
where λ is a nonnegative scalar. In this case, the matrix B_{i+1} has the same set of eigenvectors as B_i. The eigenvector a_{i+1} satisfies
B_{i+1} a_{i+1} = (λ + ||a_{i+1}||²) a_{i+1},
while all the other eigenpairs remain unchanged.
Proof.
Since a_{i+1} is an eigenvector of B_i, substituting the spectral decomposition of B_i into (14) yields the spectral decomposition of B_{i+1}. □
The possibility that ρ_{i+1} achieves its upper bound is characterized by the following assertion.
Theorem 3.
Assume for a moment that a_{i+1} is a dominant eigenvector of B_i. In this case
ρ_{i+1}² = ρ_i² + ||a_{i+1}||².
Otherwise, when a_{i+1} is not pointing toward a dominant eigenvector,
ρ_{i+1}² < ρ_i² + ||a_{i+1}||².
Proof.
The first claim is a direct consequence of Lemma 1. To prove the second claim, let the unit vector u_{i+1} denote a dominant eigenvector of B_{i+1}, so that ρ_{i+1}² = u_{i+1}^T B_i u_{i+1} + (a_{i+1}^T u_{i+1})², and consider two cases. The first one occurs when u_{i+1}^T B_i u_{i+1} = ρ_i². Since u_{i+1} is not at the direction of a_{i+1}, in this case, there is a strict inequality (a_{i+1}^T u_{i+1})² < ||a_{i+1}||², which yields a strict inequality in (17). In the second case, u_{i+1}^T B_i u_{i+1} < ρ_i², so now, too, we obtain a strict inequality in (17). □
Finally, we consider the possibility that a_{i+1} points at an eigenvector that corresponds to the smallest eigenvalue of B_i.
Theorem 4.
Assume that i ≥ n and that a_{i+1} is an eigenvector of B_i, which corresponds to the smallest eigenvalue of this matrix. That is,
B_i a_{i+1} = σ_i² a_{i+1}.
Let λ_2 denote the second smallest eigenvalue of B_i. If
σ_i² + ||a_{i+1}||² ≤ λ_2,   (27)
then σ_{i+1}² = σ_i² + ||a_{i+1}||². Otherwise, when
σ_i² + ||a_{i+1}||² > λ_2,
the value of σ_{i+1} satisfies
σ_{i+1}² = λ_2.
Proof.
By Lemma 1, the spectrum of B_{i+1} is obtained from that of B_i by replacing the eigenvalue σ_i² with σ_i² + ||a_{i+1}||², while all the other eigenvalues remain unchanged. Hence, the smallest eigenvalue of B_{i+1} is the smaller of σ_i² + ||a_{i+1}||² and λ_2. □
The restriction i ≥ n is due to the fact that if i < n, then the smallest eigenvalue of B_i is always zero. The extension of Theorem 4 to cover this case is achieved by setting zero instead of σ_i². Similar results are obtained when a_{i+1} points to other eigenvectors of B_i.
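Lemma 1 and Theorem 4 can be illustrated numerically. The sketch below (our own construction; the matrix size and the row length t are arbitrary) appends a row that points along the eigenvector of the smallest eigenvalue of B_i and checks how the spectrum moves:

```python
import numpy as np

# Appending a row along an eigenvector of B_i = A_i^T A_i shifts only the
# corresponding eigenvalue, by the squared length of the new row.
rng = np.random.default_rng(3)
i, n = 30, 5
A = rng.standard_normal((i, n))
B = A.T @ A
w, V = np.linalg.eigh(B)             # ascending: w[0] is sigma_i^2
t = 0.5                              # length of the appended row (arbitrary)
a_new = t * V[:, 0]                  # row along the eigenvector of w[0]
B_new = B + np.outer(a_new, a_new)
w_new = np.linalg.eigh(B_new)[0]
# Theorem 4: the new smallest eigenvalue is min(w[0] + t^2, w[1]).
assert np.isclose(w_new[0], min(w[0] + t**2, w[1]))
```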
3. The Smallest Singular Value Anomaly
In this section we explore the behavior of the smallest singular values. We shall start by proving that the sequence σ_1, ..., σ_n is descending. The proof uses the fact that for i ≤ n, the smallest eigenvalue of A_i A_i^T is σ_i².
Theorem 5.
For i = 1, ..., n − 1, we have the inequality
σ_{i+1} ≤ σ_i.   (30)
Proof.
The matrix A_i A_i^T is a principal submatrix of A_{i+1} A_{i+1}^T. Hence, (30) is a direct corollary of the Cauchy interlace theorem. □
Next, we show that the sequence σ_n, σ_{n+1}, ..., σ_m is ascending.
Theorem 6.
For i = n, ..., m − 1, we have the inequality
σ_i ≤ σ_{i+1}.   (31)
Proof.
One way to conclude (31) is by using the fact that A_i A_i^T is a principal submatrix of A_{i+1} A_{i+1}^T. Let
λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_i
and
μ_1 ≥ μ_2 ≥ ⋯ ≥ μ_{i+1}
denote the eigenvalues of these matrices. Then, since i ≥ n, we have σ_i² = λ_n and σ_{i+1}² = μ_n, and (31) is a direct consequence of the Cauchy interlace theorem.
As before, a second proof is obtained by comparing the matrices B_i and B_{i+1}, and this approach provides us with useful inequalities. Let the unit vector v_{i+1} ∈ ℝ^n denote an eigenvector of B_{i+1} that corresponds to σ_{i+1}². Then
σ_{i+1}² = v_{i+1}^T B_{i+1} v_{i+1},
and σ_i² has the minimum property
σ_i² = min { v^T B_i v : ||v|| = 1 }.
The last property implies the inequality
σ_i² ≤ v_{i+1}^T B_i v_{i+1},
while a further use of (14) gives
σ_{i+1}² = v_{i+1}^T B_i v_{i+1} + (a_{i+1}^T v_{i+1})² ≥ σ_i² + (a_{i+1}^T v_{i+1})².   (35)
□
The inequality (35) implies that the growing of σ_{i+1} depends on the size of the scalar product a_{i+1}^T v_{i+1}. Basically, it is difficult to estimate this product, but Lemma 1 and Theorem 4 give some insight. For example, if a_{i+1} is an eigenvector of B_i whose eigenvalue differs from σ_i², then a_{i+1}^T v_{i+1} = 0. If a_{i+1} is an eigenvector that corresponds to σ_i², there are two possibilities to consider. If σ_i² is a multiple eigenvalue, then, again, a_{i+1}^T v_{i+1} = 0. Otherwise, when σ_i² is a simple eigenvalue,
σ_{i+1}² − σ_i² = min { ||a_{i+1}||², δ },
where δ is the difference between the two smallest eigenvalues of B_i.
We have seen that the sequence σ_1, ..., σ_n is descending, while the sequence σ_n, ..., σ_m is ascending. This behavior is called the smallest singular value anomaly. The fact that σ_n is the smallest singular value in the whole sequence raises the question of what makes σ_n small. Clearly, σ_n is always smaller than the smallest row norm, min { ||a_i|| : i = 1, ..., n }.
Thus, to obtain a meaningful answer, we make the simplifying assumption
||a_i|| = 1 for i = 1, ..., m,   (40)
which enables the following bounds.
Lemma 2.
Assume that (40) holds. Then 1 ≤ ρ_n ≤ √n and
σ_n² ≤ (n − ρ_n²) / (n − 1).   (41)
Proof.
Under (40), the trace of B_n = A_n^T A_n equals n. Since the trace equals the sum of the eigenvalues of B_n, this gives n = trace(B_n) ≥ ρ_n² + (n − 1)σ_n², and (41) follows. Similarly, n ρ_n² ≥ trace(B_n) = n ≥ ρ_n², which bounds ρ_n. □
Usually, the bound (41) is a crude estimate of σ_n. Yet, in some cases, it is the reason for a small value of σ_n: when ρ_n comes close to √n, the bound forces σ_n to be close to zero.
4. The Condition Number Anomaly
In this section we investigate the behavior of the sequence c_i = ρ_i/σ_i. The discussion is carried out under the assumption that σ_n > 0, which ensures that σ_i > 0 for i = 1, ..., m. We have seen that the sequence ρ_1, ..., ρ_n is ascending while the sequence σ_1, ..., σ_n is descending. This proves that the sequence c_1, ..., c_n is ascending. That is,
c_1 ≤ c_2 ≤ ⋯ ≤ c_n.
It is also known that the sequences ρ_n, ..., ρ_m and σ_n, ..., σ_m are ascending, but this does not provide decisive information about the behavior of the sequence c_n, ..., c_m. We shall start with examples that illustrate this point.
Example 1.
This example shows that c_{i+1} can be larger than c_i. For this purpose, consider the case when a_{i+1} is a dominant eigenvector of B_i. Then from Lemma 1 we see that ρ_{i+1} > ρ_i but σ_{i+1} = σ_i, which means that c_{i+1} > c_i.
Example 2.
A similar situation arises when A has the following property. Assume that as i grows, the sequence of row directions a_i/||a_i||, i = 1, 2, ..., converges rapidly toward some vector. In this case, the sequence of dominant eigenvectors of B_i converges to the same vector, which brings us close to the situation of Example 1 (Tables 3 and 10 illustrate this possibility).
Example 3.
The third example shows that c_{i+1} can be smaller than c_i. Consider the case described in Theorem 4, when (27) holds. Here ρ_{i+1} = ρ_i and σ_{i+1} > σ_i, so c_{i+1} < c_i. More reasons that force c_i to decrease are given in Corollary 1 below.
Example 4.
The fourth example describes a situation in which the condition number behaves in a cyclic manner. Let B ∈ ℝ^{p×n} be a given matrix with p ≥ n and σ_min(B) > 0. Let the matrix A be obtained by duplicating B k times. That is, m = kp and
A = [B^T, B^T, ..., B^T]^T.
Then when i takes the values i = jp, j = 1, ..., k, the matrix A_i has the form
A_i = [B^T, ..., B^T]^T (j copies of B), so A_i^T A_i = j B^T B.
Hence, for these values of i we have ρ_{jp} = √j ρ_p and σ_{jp} = √j σ_p, but c_{jp} = c_p.
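This cyclic behavior is easy to reproduce. In the following sketch (with arbitrary block sizes and random data of our choosing) the singular values at the duplication points scale by √j while the condition number stays fixed:

```python
import numpy as np

# Duplicating a block B keeps the condition number fixed at the points i = j*p,
# while every singular value grows like sqrt(j).
rng = np.random.default_rng(4)
p, n, k = 12, 4, 5
B = rng.standard_normal((p, n))
A = np.vstack([B] * k)                       # m = k*p rows
sB = np.linalg.svd(B, compute_uv=False)
for j in range(1, k + 1):
    s = np.linalg.svd(A[: j * p], compute_uv=False)
    assert np.allclose(s, np.sqrt(j) * sB)               # scaling by sqrt(j)
    assert np.isclose(s[0] / s[-1], sB[0] / sB[-1])      # condition number fixed
```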
The situation in which the sequence c_n, c_{n+1}, ..., c_m is descending,
c_n ≥ c_{n+1} ≥ ⋯ ≥ c_m,
is called the condition number anomaly. The reasons behind this behavior are explained below.
Theorem 7.
Let u_{i+1} and v_{i+1} denote unit eigenvectors of B_{i+1} that correspond to its largest and smallest eigenvalues, respectively, and let the positive parameters α and β be defined by the equalities
α = (a_{i+1}^T u_{i+1})²
and
β = (a_{i+1}^T v_{i+1})².
Then, for i = n, ..., m − 1,
(c_{i+1}/c_i)² ≤ (1 + α/ρ_i²) / (1 + β/σ_i²).
Proof.
The proofs of Theorems 2 and 6 show that ρ_{i+1}² = u_{i+1}^T B_i u_{i+1} + α ≤ ρ_i² + α, while σ_{i+1}² = v_{i+1}^T B_i v_{i+1} + β ≥ σ_i² + β. Dividing the first relation by the second one gives the asserted bound. □
Corollary 1.
The inequality
α/ρ_i² ≤ β/σ_i²   (50)
implies
c_{i+1} ≤ c_i. □
The last corollary is a key observation that indicates at which situations the condition number anomaly is likely to occur. Assume for a moment that the direction of a_{i+1} is chosen in some random way. Then, the scalar product terms (a_{i+1}^T u_{i+1})² and (a_{i+1}^T v_{i+1})² are likely to be about the same size. However, since σ_i² is (considerably) smaller than ρ_i², the term β/σ_i² is expected to be larger than α/ρ_i², which implies (50).
Summarizing the above discussion, we see that the condition number anomaly is likely to occur whenever the rows of the matrix satisfy two conditions: all the rows have about the same size, and the directions of the rows are scattered in some random way. This conclusion means that the phenomenon is shared by a wide range of matrices. The examples in Section 6 illustrate this point.
5. Iterations Anomaly
Let A, A_i, and c_i, i = n, ..., m, be as in the previous sections. Let x* ∈ ℝ^n be an arbitrary given vector, which is used to define the vectors b_i = A_i x*. In this section, we examine how the condition number anomaly affects the convergence of certain iterative methods for solving a linear system of the form
A_i x = b_i.
We shall start by considering the Richardson method for solving the normal equations
A_i^T A_i x = A_i^T b_i,   (53)
e.g., [16,17,18]. Given the k-th iterate, x_k, the next iterate of the Richardson method has the form
x_{k+1} = x_k + w A_i^T (b_i − A_i x_k),   (54)
where w is a pre-assigned relaxation parameter. Recall that −A_i^T (b_i − A_i x_k) is the gradient vector of the least-squares objective function
F(x) = ½ ||A_i x − b_i||²
at the point x_k. Hence, iteration (54) can be viewed as a steepest descent method for minimizing F(x) that uses a fixed step length. An equivalent way to write (54) is
x_{k+1} = (I − w A_i^T A_i) x_k + w A_i^T b_i,   (57)
which shows that the rate of convergence of the method depends on the spectral radius of the iteration matrix
H_w = I − w A_i^T A_i.
Let θ_w denote the spectral radius of H_w. Then the theory of iterative methods tells us that the method converges whenever
θ_w < 1,   (58)
and the smaller θ_w is, the faster the convergence; see, for example, Refs. [16,17,18]. Observe that the eigenvalues of H_w lie in the interval [1 − w ρ_i², 1 − w σ_i²]. This shows that (58) holds for values of w that satisfy
0 < w < 2/ρ_i².
Furthermore, let w* denote the optimal value of w, for which θ_w attains its smallest value. Then
w* = 2/(ρ_i² + σ_i²)
and
θ_{w*} = (ρ_i² − σ_i²)/(ρ_i² + σ_i²) = (c_i² − 1)/(c_i² + 1).   (61)
See [17] (pp. 22–23) and [18] (pp. 114–115) for detailed discussion of these results. Consequently, as c_i increases, the spectral radius of the iteration matrix approaches 1, and the rate of convergence slows down. That is, the condition number anomaly results in a similar anomaly in the number of iterations. See Table 12.
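As an illustration, here is a minimal sketch of iteration (54) with the optimal relaxation parameter. The test problem is an arbitrary well-conditioned random system of our choosing, not one of the paper's experiments:

```python
import numpy as np

# Richardson iteration on the normal equations, with the optimal relaxation
# parameter w* = 2 / (rho^2 + sigma^2).
rng = np.random.default_rng(5)
m, n = 120, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true
s = np.linalg.svd(A, compute_uv=False)
w = 2.0 / (s[0]**2 + s[-1]**2)           # optimal relaxation parameter
x = np.zeros(n)
for k in range(100000):
    r = A.T @ (b - A @ x)                # negative gradient of 0.5*||Ax - b||^2
    if np.linalg.norm(r) < 1e-10:
        break
    x = x + w * r
print(k)                                 # iteration count grows with c_i
```

For this well-conditioned system, only a few dozen iterations are needed; as the condition number of the system grows, the spectral radius (61) approaches 1 and the count explodes, which is the iterations anomaly discussed below.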
Another useful iterative method for solving large sparse linear systems is the Cimmino method, e.g., [15,16,18,23]. Let the unit vectors
â_j = a_j/||a_j||, j = 1, ..., m,
be obtained by normalizing the rows of A. Let Â_i be an i×n matrix whose rows are â_1^T, ..., â_i^T, and let D_i denote the diagonal matrix
D_i = diag(||a_1||, ..., ||a_i||).   (63)
Then
Â_i = D_i^{-1} A_i
for i = n, ..., m. Similarly, we define b̂_i = D_i^{-1} b_i for i = n, ..., m, and let ĉ_i denote the condition number of Â_i. Then the Cimmino method is aimed at solving the linear system
Â_i x = b̂_i,
or the related normal equations
Â_i^T Â_i x = Â_i^T b̂_i.   (66)
The kth iteration of the Cimmino method has the form
x_{k+1} = x_k + w Σ_{j=1}^{i} μ_j (b̂_j − â_j^T x_k) â_j,
where w is a pre-assigned relaxation parameter, and μ_1, ..., μ_i are weighting parameters that satisfy
μ_j > 0, j = 1, ..., i, and Σ_{j=1}^{i} μ_j = 1.
Observe that the point
p_j = x_k + (b̂_j − â_j^T x_k) â_j
is the projection of x_k on the hyperplane â_j^T x = b̂_j, and the point
Σ_{j=1}^{i} μ_j p_j
is a weighted average of these projections. The usual way to apply the Cimmino method is with equal weights. That is, μ_j = 1/i for j = 1, ..., i. This enables us to rewrite the Cimmino iteration in the form
x_{k+1} = x_k + (w/i) Â_i^T (b̂_i − Â_i x_k),   (71)
which is the Richardson iteration for solving the normal equations (66). Therefore, from (61), we conclude that the optimal rate of convergence of the Cimmino method depends on the ratio
(ĉ_i² − 1)/(ĉ_i² + 1),
where ĉ_i is the condition number of Â_i.
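The equivalence between the equal-weights Cimmino step and the Richardson form (71) can be checked directly. A small sketch on arbitrary random data (our own check):

```python
import numpy as np

# One equal-weights Cimmino step (average of the hyperplane projections)
# coincides with the Richardson step (w/i) * Ahat^T (bhat - Ahat x).
rng = np.random.default_rng(6)
i, n = 40, 6
A = rng.standard_normal((i, n))
Ahat = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit rows
bhat = Ahat @ np.ones(n)
x = rng.standard_normal(n)
w = 1.0
# Projections of x on the i hyperplanes ahat_j^T x = bhat_j.
projections = x + (bhat - Ahat @ x)[:, None] * Ahat
x_cimmino = x + w * (projections.mean(axis=0) - x)
# Richardson form of the same step.
x_richardson = x + (w / i) * Ahat.T @ (bhat - Ahat @ x)
assert np.allclose(x_cimmino, x_richardson)
```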
Another example is the Jacobi iteration for solving the equations
Â_i Â_i^T y = b̂_i.   (73)
The basic iteration of this method has the form
y_{k+1} = y_k + w D^{-1} (b̂_i − Â_i Â_i^T y_k),   (74)
where D is the diagonal part of the coefficient matrix Â_i Â_i^T and w is a pre-assigned relaxation parameter. Since Â_i has unit rows, D = I, and the iteration matrix of the Jacobi method, I − w Â_i Â_i^T, has the same eigenvalues as the matrix I − w Â_i^T Â_i, apart from additional eigenvalues that equal 1. Hence, as before, the optimal rate of convergence depends on the ratio (ĉ_i² − 1)/(ĉ_i² + 1). Thus, again, a condition number anomaly invites a similar anomaly in the number of iterations.
We shall finish this section by mentioning two further methods that share this behavior. The first one is the conjugate gradients algorithm for solving the normal equations (53), whose rate of convergence slows down as the condition number of A_i increases. See, for example, Refs. [1] (pp. 312–314), [3] (pp. 299–300) and [18] (pp. 203–205). The second is Kaczmarz’s method, which is a popular “row-action” method; see Refs. [15,16,23,24,25]. The use of this method to solve Â_i x = b̂_i is equivalent to the SOR method for solving the system Â_i Â_i^T y = b̂_i, and both methods have the property that a small condition number results in fast convergence while a large condition number slows it down [24,25].
6. Numerical Examples
In this section, we present some examples that illustrate the actual behavior of the anomaly phenomena. The first examples consider small matrices.
Table 1 describes the anomaly in a “two-ones” matrix. All the rows of this matrix are different. Each row has only two nonzero entries, and each nonzero entry has the value 1 (a matrix with n columns has at most n(n − 1)/2 different rows of this type). This matrix exhibits a moderate anomaly, due to the fact that A_n is well conditioned.
Table 1.
The anomaly in small “two-ones” matrix.
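A possible construction of such a matrix (our sketch; the paper does not list its code) enumerates all pairs of columns:

```python
import numpy as np
from itertools import combinations

# The full "two-ones" matrix: one row per pair of columns, each row holding
# exactly two entries equal to 1.  With n columns there are n(n-1)/2 such rows.
n = 6
rows = [[1.0 if j in pair else 0.0 for j in range(n)]
        for pair in combinations(range(n), 2)]
A = np.array(rows)
assert A.shape == (n * (n - 1) // 2, n)
# A^T A = (n-2)I + J, so the squared extreme singular values are 2n-2 and n-2.
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(s[0]**2, 2 * n - 2)
assert np.isclose(s[-1]**2, n - 2)
```

The closed-form singular values confirm that the full matrix is well conditioned, which is why the anomaly in Table 1 is only moderate.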
Table 2 describes the anomaly in a small segment of the Hilbert matrix. Here, the (i, j) entry equals 1/(i + j − 1). Consequently, the sequence of row directions a_i/||a_i||, i = 1, 2, ..., converges slowly toward the vector e/√n, where e = (1, ..., 1)^T. Hence, the decrease in the sequence c_n, ..., c_m is quite moderate.
Table 2.
The anomaly in small Hilbert matrix.
In Table 3, we consider a small segment of the Pascal matrix. Recall that the entries of this matrix are built in the following way: a_{1j} = 1 for j = 1, ..., n, and a_{i1} = 1 for i = 1, ..., m. The other entries are obtained from the rule
a_{ij} = a_{i−1,j} + a_{i,j−1}.
In this matrix, the norm of the rows grows very fast while the sequence of row directions converges rapidly toward the vector (0, ..., 0, 1)^T. Thus, as i becomes considerably larger than n, both a_i/||a_i|| and the dominant eigenvector of B_i approach this vector, which causes c_m to be larger than c_n.
Table 3.
Failure of the anomaly in small Pascal matrix.
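The construction rule above can be sketched as follows; the sizes are arbitrary, and the check confirms that the normalized rows are soon dominated by their last entry:

```python
import numpy as np

# Pascal-type construction: first row and first column are ones, and
# a[i][j] = a[i-1][j] + a[i][j-1] elsewhere.
m, n = 30, 4
P = np.ones((m, n))
for i in range(1, m):
    for j in range(1, n):
        P[i, j] = P[i - 1, j] + P[i, j - 1]
directions = P / np.linalg.norm(P, axis=1, keepdims=True)
# Row norms explode while row directions converge to (0, ..., 0, 1)^T.
assert directions[-1, -1] > 0.99
```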
The random matrices that are tested in Table 4 and Table 5 provide nice examples of the anomaly phenomenon. In these matrices, each entry is a random number from the interval [−1, 1]. To generate these matrices, and the other random matrices, we used MATLAB’s command “rand”, whose random number generator is of uniform distribution. Similar results are obtained when “rand” is replaced with “randn”, which uses normal distribution.
Table 4.
The anomaly in small random matrix.
Table 5.
The anomaly in random matrix.
The nonnegative random matrix that is tested in Table 6 is obtained by MATLAB’s command A = rand(m, n). That is, here, each entry of A is a random number from the interval [0, 1]. This yields a more ill-conditioned matrix and a sharper anomaly.
Table 6.
The anomaly in nonnegative random matrix.
Table 7 and Table 8 consider a different type of random matrices. As its name says, the entries of the “−1 or 1” matrix are either −1 or 1, with equal probability. In practice, the entry a_{ij} is defined in the following way. First, sample a random number, r, from the interval [0, 1]. If r < 1/2, then a_{ij} = −1; otherwise a_{ij} = 1. The entries of the “0 or 1” matrix are defined in a similar manner: if r < 1/2 then a_{ij} = 0; otherwise a_{ij} = 1. Both matrices display a strong anomaly. The “0 or 1” matrix is slightly more ill conditioned and, therefore, has a sharper anomaly.
Table 7.
The anomaly in random “−1 or 1” matrix.
Table 8.
The anomaly in random “0 or 1” matrix.
The results of Table 9 and Table 10 are quite instructive. Both matrices are highly ill conditioned but display a different behavior. The “narrow range” matrix is a random matrix whose entries are sampled from the small interval [0.99, 1.01]. However, the directions of the rows are not converging, and the matrix displays a nice anomaly. The “converging rows” matrix is defined in a slightly different way. Here, the entries of the ith row are random numbers from an interval of the form [1 − ε_i, 1 + ε_i], where ε_i decreases as i grows. Hence, the related sequence of row directions converges toward the vector e/√n, e = (1, ..., 1)^T, which is the situation described in Example 2. Consequently, when i becomes much larger than n, we see a moderate increase in the value of c_i.
Table 9.
The anomaly in random “narrow range” matrix.
Table 10.
Failure of anomaly in matrix with “converging rows”.
Other matrices that possess the anomaly phenomena are large sparse matrices. The matrix in Table 11 is created by using MATLAB’s command sprand(m, n, density) with m = 100,000, n = 10,000 and density = 0.01. This way, each row of A has nearly 100 nonzero entries that have random values and random locations. Although not illustrated in this paper, our experience shows that the smaller the density, the sharper the anomaly.
Table 11.
The anomaly in large sparse random matrix.
Table 12 illustrates the iterations anomaly phenomenon when using the methods of Richardson, Cimmino, and Jacobi. The first two methods were used to solve linear systems of the form
Â_i x = b̂_i, i = n, ..., m.
As before, each Â_i is an i×n submatrix that is composed of the first i rows of a given matrix Â. The construction of Â is done in two steps. First, we generate a random matrix as in Table 4 and Table 5. Then the rows of the matrix are normalized to be unit vectors. The vector b̂_i is defined by the product
b̂_i = Â_i x*,
which ensures that x* solves the linear system. Since Â_i has unit rows, Cimmino iteration (71) coincides with Richardson iteration (54). The value of w that we use is the optimal one,
w* = 2/(ρ̂_i² + σ̂_i²),   (79)
where ρ̂_i and σ̂_i denote the largest and smallest singular values of Â_i, and the iterations start from the point x_0 = 0. The iterative process is terminated as soon as the norm of the residual vector, r_k = b̂_i − Â_i x_k, falls below a preset tolerance.
The number of iterations which are required to satisfy this condition is displayed in the last column of Table 12.
Table 12.
Iterations anomaly in the methods of Richardson, Cimmino, and Jacobi.
The Jacobi method was used to solve the linear systems (73), where Â_i and b̂_i are defined as above. Since Â_i has unit rows, D is a unit matrix, and Jacobi iteration (74) is reduced to
y_{k+1} = y_k + w (b̂_i − Â_i Â_i^T y_k).   (81)
The last iteration uses the optimal value of w, given in (79). It starts from the point y_0 = 0 and terminates as soon as the related residual vector satisfies the same stopping criterion.
The number of required iterations is nearly identical to that of the Richardson method. This is not surprising since multiplying (81) by Â_i^T shows that the sequence
x_k = Â_i^T y_k, k = 0, 1, 2, ...,
is generated by Richardson iteration (54). (There were only two minor exceptions: in one case, the Jacobi method required 36,959 iterations instead of 36,960, while in another, the Jacobi method required 6,772,151 iterations instead of 6,760,589. In all the other cases, the two methods required exactly the same number of iterations.)
The figures in Table 12 demonstrate the close link between the condition number and the rate of convergence. As anticipated from (61), for large values of ĉ_i the spectral radius approaches 1 and the rate of convergence slows down. Thus, a large condition number results in a large number of iterations. Conversely, a small value of ĉ_i implies a small spectral radius and a small number of iterations. In other words, a condition number anomaly invites a similar anomaly in the number of iterations.
Usually, it is reasonable to assume that the computational effort in solving a linear system is proportional to the number of rows. That is, the more rows we have, the more computation time is needed. From this point of view, the iterations anomaly phenomenon is somewhat surprising, as solving a linear system with m rows may need considerably less time than solving a smaller system with about n rows.
7. Concluding Remarks
As an old adage says, the whole is sometimes much more than the sum of its parts. The basic ascending (descending) properties of singular values are easily concluded from the Cauchy interlace theorem, while the inequalities that we derive enable us to see what causes a large, or small, increase. Combining these results gives a better overview of the whole situation. One consequence regards the anomalous behavior of the smallest singular values sequence σ_1, ..., σ_m, and the fact that σ_n is the smallest number in this sequence. The second observation is about the condition number anomaly. It is easy to conclude the increasing of the condition numbers sequence c_1, ..., c_n, but the Cauchy interlace theorem does not tell us how the rest of this sequence behaves. The answer is obtained by considering the bound on the ratio c_{i+1}/c_i. This bound explains the reasons behind the condition number anomaly and characterizes situations that invite (or exclude) such behavior. We see that the anomaly phenomenon is likely to occur in “random-like” matrices whose rows satisfy two conditions: all the rows have about the same size and the directions of the rows scatter in some random way. This suggests that the condition number anomaly phenomenon is common in several types of matrices, and the numerical examples illustrate this point.
The practical importance of the condition number anomaly lies in the use of iterative methods for solving large linear systems. As we have seen, several iterative solvers have the property that the rate of convergence depends on the condition number. Therefore, when solving “random-like” systems, a fast rate of convergence is expected in under-determined or over-determined systems, while a slower rate is expected in (nearly) square systems.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflict of interest.
References
- Demmel, J.W. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
- Golub, G.H.; Van Loan, C.F. Matrix Computations, 4th ed.; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
- Trefethen, L.N.; Bau, D., III. Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997.
- Tikhomirov, K. The smallest singular value of random rectangular matrices with no moment assumptions on entries. Israel J. Math. 2016, 212, 289–314.
- Bai, Z.D.; Yin, Y.Q. Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 1993, 21, 1275–1294.
- Chen, Z.; Dongarra, J.J. Condition numbers of Gaussian random matrices. SIAM J. Matrix Anal. Appl. 2005, 27, 603–620.
- Edelman, A. Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl. 1988, 9, 543–560.
- Marchenko, V.A.; Pastur, L.A. Distribution of eigenvalues of some sets of random matrices. Math. USSR-Sb. 1967, 1, 457–486.
- Mendelson, S.; Paouris, G. On the singular values of random matrices. J. Eur. Math. Soc. 2014, 16, 823–834.
- Rudelson, M.; Vershynin, R. Smallest singular value of a random rectangular matrix. Commun. Pure Appl. Math. 2009, 62, 1707–1739.
- Silverstein, J. On the weak limit of the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivar. Anal. 1989, 30, 307–311.
- Szarek, S. Condition numbers of random matrices. J. Complex. 1991, 7, 131–149.
- Tatarko, K. An upper bound on the smallest singular value of a square random matrix. J. Complex. 2018, 48, 119–128.
- Zimmermann, R. On the condition number anomaly of Gaussian correlation matrices. Linear Algebra Appl. 2014, 466, 512–526.
- Censor, Y. Row-action methods for huge and sparse systems and their applications. SIAM Rev. 1981, 23, 444–466.
- Dax, A. The convergence of linear stationary iterative processes for solving singular unstructured systems of linear equations. SIAM Rev. 1990, 32, 611–635.
- Hageman, L.A.; Young, D.M. Applied Iterative Methods; Academic Press: New York, NY, USA, 1981.
- Saad, Y. Iterative Methods for Sparse Linear Systems, 2nd ed.; SIAM: Philadelphia, PA, USA, 2003.
- Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
- Horn, R.A.; Johnson, C.R. Topics in Matrix Analysis; Cambridge University Press: Cambridge, UK, 1991.
- Hwang, S.-G. Cauchy’s interlace theorem for eigenvalues of Hermitian matrices. Am. Math. Mon. 2004, 111, 157–159.
- Parlett, B.N. The Symmetric Eigenvalue Problem; Prentice-Hall: Englewood Cliffs, NJ, USA, 1980.
- Censor, Y.; Zenios, S.A. Parallel Optimization: Theory, Algorithms, and Applications; Oxford University Press: Oxford, UK, 1997.
- Dax, A. The adventures of a simple algorithm. Linear Algebra Appl. 2003, 361, 41–61.
- Dax, A. Kaczmarz Anomaly: A Surprising Feature of Kaczmarz Method; Technical Report; Hydrological Service of Israel, 2021; in preparation.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).