1. Introduction
Let $A$ be a real nonsingular nonsymmetric matrix of order $n$. So far, the most popular iterative methods for solving a nonsymmetric linear system $Ax = b$, where $b$ is a given real vector, are Krylov methods. Many of them can be classified as Quasi-ORthogonal (Q-OR) methods or Quasi-Minimal Residual (Q-MR) methods (see, for instance, [1]). All these methods use the same framework but differ by the basis which is chosen. Different possibilities for computing the basis are described in [1] (Chapter 4). Well-known examples of Krylov methods are FOM [2,3] and GMRES [4], which use an orthonormal basis. In [5], a Q-OR optimal method that minimizes the residual norm using a non-orthogonal basis was proposed. In most cases, it must give the same residual norms as GMRES, which also minimizes the residual norm but uses an orthonormal basis computed with the Arnoldi process (see [4]).
In recent years, randomization techniques have been proposed to reduce the dimension of some problems in numerical linear algebra (see [6,7,8]). In this paper, we study how to introduce randomization and matrix sketching into the Q-OR optimal algorithm. Sketching is used for the least squares subproblem that must be solved at each iteration.
Section 2 recalls the Q-OR optimal method [1,5]. In Section 3, we describe some known techniques for matrix sketching. Section 4 shows how to use these techniques in the Q-OR optimal method. This is illustrated by a few numerical experiments described in Section 5, showing that, even though some monotonicity properties are lost, convergence is preserved for the randomized algorithm.
2. The Q-OR Optimal Method
Let $r_0 = b - A x_0$ be the initial residual vector, where $x_0$ is the initial guess. Let us assume that we have an ascending basis of the nested Krylov subspaces $\mathcal{K}_k(A, r_0)$, which are defined as
$$\mathcal{K}_k(A, r_0) = \operatorname{span}\{\, r_0,\; A r_0,\; \dots,\; A^{k-1} r_0 \,\}.$$
The dimension of these subspaces rises up to the grade of $A$ with respect to $r_0$. This means that, if $v_1, \dots, v_k$ are the basis vectors of $\mathcal{K}_k(A, r_0)$, then $v_1, \dots, v_k, v_{k+1}$ are the basis vectors of $\mathcal{K}_{k+1}(A, r_0)$, as long as $k+1$ does not exceed the grade.
Such basis vectors satisfy what is called an Arnoldi relation,
$$A V_k = V_k H_k + h_{k+1,k}\, v_{k+1}\, e_k^T = V_{k+1} \underline{H}_k,$$
where $H_k$ is an upper Hessenberg matrix with entries $h_{i,j}$, the columns of $V_k$ are the basis vectors $v_1, \dots, v_k$, and $e_k$ is the last column of the identity matrix of order $k$. The matrix $\underline{H}_k$ is $H_k$, appended at the bottom with a $(k+1)$st row equal to $h_{k+1,k}\, e_k^T$.
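As an illustration of this relation, the following minimal Python/NumPy sketch (our illustration only; it is not part of the method of [5]) builds an orthonormal Krylov basis with the Arnoldi process and checks the relation numerically. Any other ascending basis satisfying an Arnoldi relation could be used in what follows.

import numpy as np

def arnoldi(A, r0, k):
    # Orthonormal basis V of K_k(A, r0) and the (k+1) x k Hessenberg
    # matrix Hbar such that A @ V[:, :k] = V @ Hbar.
    n = r0.size
    V = np.zeros((n, k + 1))
    Hbar = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(k):
        w = A @ V[:, j]
        for i in range(j + 1):            # modified Gram-Schmidt orthogonalization
            Hbar[i, j] = V[:, i] @ w
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / Hbar[j + 1, j]
    return V, Hbar

rng = np.random.default_rng(0)
n, k = 50, 8
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
V, Hbar = arnoldi(A, b, k)
print(np.linalg.norm(A @ V[:, :k] - V @ Hbar))    # close to machine precision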
The iterates $x_k$ in Q-OR and Q-MR methods are sought as
$$x_k = x_0 + V_k\, y_k \qquad (2)$$
for some unique vector $y_k$. Since we choose $v_1 = r_0 / \|r_0\|$, the residual vector $r_k$, defined as $r_k = b - A x_k$, is
$$r_k = r_0 - A V_k\, y_k = V_k \bigl(\|r_0\|\, e_1 - H_k\, y_k\bigr) - h_{k+1,k}\, (e_k^T y_k)\, v_{k+1}. \qquad (3)$$
In a Q-OR method, the $k$th iterate $x_k$ is defined (provided that $H_k$ is nonsingular) by computing $y_k$ in (2) as the solution of the linear system
$$H_k\, y_k = \|r_0\|\, e_1.$$
This annihilates the term within the parentheses on the right side of (3). The iterates of the Q-OR method are $x_k = x_0 + V_k y_k$, the residual vector $r_k$ is proportional to $v_{k+1}$, and
$$\|r_k\| = |h_{k+1,k}|\; |e_k^T y_k|\; \|v_{k+1}\|.$$
In the case where $H_k$ is singular and $x_k$ is not defined, we define the residual norm as being infinite, $\|r_k\| = \infty$.
The residual vector in relation (3) can also be written as
$$r_k = V_{k+1} \bigl(\|r_0\|\, e_1 - \underline{H}_k\, y_k\bigr).$$
Instead of removing the term within the parentheses on the right side of (3), we would like to minimize the norm of the residual itself. This is what is carried out in GMRES with an orthonormal basis [4]. Minimizing the norm of the residual may seem costly when the columns of the matrix $V_{k+1}$ are not orthonormal. However, we have
$$\|r_k\|^2 = \bigl(\|r_0\|\, e_1 - \underline{H}_k\, y_k\bigr)^T\, V_{k+1}^T V_{k+1}\, \bigl(\|r_0\|\, e_1 - \underline{H}_k\, y_k\bigr).$$
In a general Q-MR method, the vector $y_k$ is computed as the solution of the least squares problem
$$\min_{y}\; \bigl\|\, \|r_0\|\, e_1 - \underline{H}_k\, y \,\bigr\|.$$
Note that $y_k$ then does not minimize the norm of the residual, but the norm of what is called the quasi-residual, $\|r_0\| e_1 - \underline{H}_k y_k$. The Q-MR iterates are always defined, as opposed to the Q-OR iterates when $H_k$ is singular. Note that the preceding definitions do not depend on the choice of the basis. It is a general framework that could use any basis. Q-OR and Q-MR methods, as well as their many interesting mathematical properties, are studied in detail in [1].
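To make the two definitions concrete, the following Python fragment (our illustration; it reuses the arnoldi function of the previous sketch) computes the Q-OR and Q-MR iterates from the same basis. With the orthonormal Arnoldi basis, the Q-OR iterate is the FOM iterate and the Q-MR iterate is the GMRES iterate.

import numpy as np

def qor_qmr_iterates(A, b, x0, k):
    # k-th Q-OR and Q-MR iterates built from the same (here orthonormal) basis.
    r0 = b - A @ x0
    V, Hbar = arnoldi(A, r0, k)                        # Hbar is (k+1) x k
    rhs = np.zeros(k + 1)
    rhs[0] = np.linalg.norm(r0)
    y_qor = np.linalg.solve(Hbar[:k, :], rhs[:k])      # annihilates the first k components
    y_qmr = np.linalg.lstsq(Hbar, rhs, rcond=None)[0]  # minimizes the quasi-residual norm
    return x0 + V[:, :k] @ y_qor, x0 + V[:, :k] @ y_qmr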
The Hessenberg matrices $H_k$ are unreduced since $h_{j+1,j} \neq 0$ for $j = 1, \dots, k-1$. Therefore, they are nonderogatory and can be factorized as $H_k = U_k C_k U_k^{-1}$, where $U_k$ is a nonsingular upper triangular matrix, and $C_k$ is a companion matrix corresponding to the characteristic polynomial of $H_k$ (see [1]). The matrix $K_k = V_k U_k$ is, in fact, a Krylov matrix:
$$K_k = V_k U_k = \bigl[\, v_1,\; A v_1,\; \dots,\; A^{k-1} v_1 \,\bigr].$$
Clearly, $U_k$ is the principal submatrix of order $k$ of $U_{k+1}$. Let $\alpha_{1,j}$ be the entries of the first row of $U_{k+1}^{-1}$. It is proved in [1,5] that, whatever the basis of the Krylov subspace is, the Q-OR residual norms satisfy
$$\|r_k\| = \frac{\|r_0\|}{|\alpha_{1,k+1}|}.$$
As shown in [1,5], there exists a non-orthogonal basis such that $|\alpha_{1,k+1}|$ is maximized, and this basis therefore minimizes the Q-OR residual norm. The new basis vector $v_{k+1}$, the $k$ first entries of the $k$th column of the upper Hessenberg matrix, and the entry $h_{k+1,k}$ can be computed with explicit formulas that are given in [1,5] and that require solving a small linear system at each iteration.
At iteration $k$, we have to solve the linear system (9), whose matrix is symmetric positive definite as long as $V_k$ is of rank $k$. In [5], this linear system was solved by incrementally computing the inverses of the triangular factors of the Cholesky factorization of $V_k^T V_k$. The details of the method, as described in [5], are shown as Algorithm 1. In this algorithm, one of the stored matrices contains the inverse of the Cholesky factor of $V_k^T V_k$. Preconditioning can be easily incorporated each time we have a product of the matrix $A$ with a vector.
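The incremental computation of such an inverse Cholesky factor can be sketched generically as follows; this is a minimal Python illustration of the standard bordering update when a column is appended to $V_k$, with names of our own choosing, and it is not the exact recurrence used in Algorithm 1.

import numpy as np

def update_inv_cholesky(Linv, V, v):
    # Given Linv, the inverse of the Cholesky factor L of V.T @ V,
    # return the inverse Cholesky factor of [V, v].T @ [V, v].
    c = V.T @ v                        # new column of the Gram matrix
    l = Linv @ c                       # solves L @ l = c
    lam = np.sqrt(v @ v - l @ l)       # new diagonal entry of the Cholesky factor
    k = Linv.shape[0]
    Lnew = np.zeros((k + 1, k + 1))
    Lnew[:k, :k] = Linv
    Lnew[k, :k] = -(l @ Linv) / lam
    Lnew[k, k] = 1.0 / lam
    return Lnew

rng = np.random.default_rng(1)
V = rng.standard_normal((100, 1))
Linv = np.array([[1.0 / np.linalg.norm(V[:, 0])]])
for _ in range(5):                     # append five columns one at a time
    v = rng.standard_normal(100)
    Linv = update_inv_cholesky(Linv, V, v)
    V = np.column_stack([V, v])
print(np.linalg.norm(Linv @ (V.T @ V) @ Linv.T - np.eye(6)))   # close to machine precision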
Note that the modulus of $\alpha_{1,k+1}$ gives the inverse of the (relative) norm of the Q-OR residual at iteration $k$. Hence, we can compute the basis vectors, stop the iterations using this quantity, and then reduce the upper Hessenberg matrix to upper triangular form to compute the final approximate solution.
This method is named Q-ORoptinv because it minimizes the residual norm and uses the inverses of Cholesky factors. When, for all $k$, the matrix $V_k$ is of full rank, it must give the same residual norms as GMRES. The reader may wonder why we have derived an algorithm which delivers the same residual norms as GMRES but with more floating point operations. The reason is that the dot products in Q-ORoptinv are all independent and can be computed in parallel, contrary to the dot products in the modified Gram–Schmidt (MGS) implementation of GMRES.
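The difference in the dependence structure of the dot products can be seen in the following toy Python fragment (our illustration only): the first variant computes all dot products with a single matrix-vector product, whereas in the MGS-style loop each dot product needs the vector updated by the previous step.

import numpy as np

rng = np.random.default_rng(2)
n, k = 10000, 30
V = rng.standard_normal((n, k))
w = rng.standard_normal(n)

# Independent dot products: one BLAS-2 call, easy to parallelize.
h_independent = V.T @ w

# MGS-style dot products: inherently sequential.
h_mgs = np.zeros(k)
u = w.copy()
for i in range(k):
    h_mgs[i] = V[:, i] @ u
    u = u - h_mgs[i] * V[:, i]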
As in GMRES, the storage increases at every iteration, so the algorithm can be restarted every $m$ iterations to limit the needed storage.
Algorithm 1 Q-ORoptinv (see [5]). Input: $A$, $b$, and the initial guess. After an initialization phase, the main loop runs for $k = 1, 2, \dots$ until convergence, updating the basis vectors, the columns of the Hessenberg matrix, and the inverse Cholesky factor; if needed, a triangular system is solved to obtain the approximate solution.
The solution $s$ of Equation (9) is also the solution of the least squares problem (10), since (9) is the normal equation corresponding to (10). Hence, we can use the economy-size QR factorization of $V_k$ to solve (10) with an upper triangular matrix $R$ of order $k$, instead of using the inverses of the Cholesky factors of $V_k^T V_k$. Since the method is often restarted with $m \ll n$, meaning that the number of columns $k$ is small compared to the number of rows, $V_k$ is what is called a tall-and-skinny matrix. There exist special algorithms for computing the QR factorization of such matrices that can be used on parallel computers (see [9,10]). Note that the columns of $Q$ give an orthogonal basis of the Krylov subspace. So, if we use the QR factorization, we are more or less back to what is done in GMRES. When the restart parameter $m$ is large, or when there is no restart, using the QR factorization may be too expensive. However, at each iteration, we only add one more column to the matrix $V_k$. There exist algorithms for updating the QR factorization when we add a new column to the matrix (see [11,12]). This can be carried out, for instance, by orthogonalizing the new column against the columns of the previous matrix $Q$ with the modified Gram–Schmidt algorithm. This is what we used in our numerical experiments.
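A minimal Python sketch of this updating strategy (our illustration; the names are not those of the actual implementation): given an economy-size factorization $V = QR$, the new column is orthogonalized against $Q$ with modified Gram-Schmidt, which appends one column to $Q$ and one column to $R$.

import numpy as np

def qr_append_column(Q, R, v):
    # Update an economy-size QR factorization when the column v is appended.
    n, k = Q.shape
    r = np.zeros(k + 1)
    w = v.copy()
    for i in range(k):                 # modified Gram-Schmidt against the old columns
        r[i] = Q[:, i] @ w
        w = w - r[i] * Q[:, i]
    r[k] = np.linalg.norm(w)
    Qnew = np.column_stack([Q, w / r[k]])
    Rnew = np.zeros((k + 1, k + 1))
    Rnew[:k, :k] = R
    Rnew[:, k] = r
    return Qnew, Rnew

rng = np.random.default_rng(3)
V = rng.standard_normal((200, 5))
Q, R = np.linalg.qr(V)
v = rng.standard_normal(200)
Q, R = qr_append_column(Q, R, v)
print(np.linalg.norm(Q @ R - np.column_stack([V, v])))   # close to machine precision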
3. Random Sketching
Since the matrix $V_k$ in the least squares problem (10) is tall and skinny, it may be useful to use random sketching, a technique that was introduced during the last twenty years and that is used to reduce the dimension of the problem (see, for instance, [6]). A sketching matrix $S$ is of order $\ell \times n$ with $\ell \ll n$. Let $\mathcal{V}$ be a subspace of $\mathbb{R}^n$. The matrix $S$ is an $\varepsilon$-embedding of $\mathcal{V}$ if
$$(1 - \varepsilon)\, \|v\|^2 \;\le\; \|S v\|^2 \;\le\; (1 + \varepsilon)\, \|v\|^2 \quad \text{for all } v \in \mathcal{V}, \qquad (11)$$
where $0 \le \varepsilon < 1$. Generally, $\varepsilon$-embeddings are constructed with probabilistic techniques to be independent of the subspace $\mathcal{V}$ with a high probability. They are called oblivious $\varepsilon$-embeddings. There are several distributions for constructing such embeddings, such as Gaussian matrices and the subsampled randomized Hadamard transform (SRHT) [13].
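Definition (11) can be checked empirically. The short Python fragment below (our illustration, with a Gaussian sketching matrix) measures how much the norms of vectors from a fixed low-dimensional subspace are distorted; the largest observed distortion plays the role of $\varepsilon$.

import numpy as np

rng = np.random.default_rng(4)
n, k, ell = 4000, 20, 200
S = rng.standard_normal((ell, n)) / np.sqrt(ell)   # oblivious Gaussian sketch
B = np.linalg.qr(rng.standard_normal((n, k)))[0]   # orthonormal basis of a subspace

ratios = []
for _ in range(1000):
    v = B @ rng.standard_normal(k)                 # random vector of the subspace
    ratios.append(np.linalg.norm(S @ v) / np.linalg.norm(v))
print(min(ratios), max(ratios))                    # both close to 1 when ell >> k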
SRHT is constructed with Hadamard matrices. These matrices are defined recursively: starting with $H_1 = \begin{pmatrix} 1 \end{pmatrix}$ and having a Hadamard matrix $H$, the next matrix is
$$\begin{pmatrix} H & H \\ H & -H \end{pmatrix}.$$
Therefore, their order is always a power of 2. Let $p$ be an integer such that $2^p$ is the smallest power of 2 larger than or equal to $n$. The $\ell \times 2^p$ SRHT matrix is
$$\frac{1}{\sqrt{\ell}}\; P\, H\, D,$$
where $D$ is a random diagonal matrix with diagonal entries $\pm 1$, $H$ is a Hadamard matrix, and $P$ is a random uniform subsampling matrix. The constant in front of $P H D$ depends on the way the Hadamard matrix is scaled. For our purposes, the sketching matrix $S$ is made of the first $n$ columns of this $\ell \times 2^p$ matrix; equivalently, we apply $P H D$ to a vector of length $2^p$ in which only the first $n$ components are nonzero. The multiplication by $H$ is carried out using the fast Walsh–Hadamard transform. It uses the recursive structure of $H$ to evaluate the product in $\mathcal{O}(N \log_2 N)$ operations with $N = 2^p$. The problem with this sketching matrix is that $2^p$ can be much larger than $n$.
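A possible Python implementation of this construction is sketched below (our illustration; the scaling assumes the unnormalized $\pm 1$ Hadamard matrix of the recursion above). The vector is zero-padded to length $2^p$, multiplied by the random signs, transformed with the fast Walsh-Hadamard transform, and $\ell$ rows are subsampled.

import numpy as np

def fwht(x):
    # Fast Walsh-Hadamard transform of a vector of length 2**p (returns a new array).
    y = x.copy()
    h = 1
    while h < y.size:
        for i in range(0, y.size, 2 * h):
            a = y[i:i + h].copy()
            b = y[i + h:i + 2 * h].copy()
            y[i:i + h] = a + b
            y[i + h:i + 2 * h] = a - b
        h *= 2
    return y

def srht_apply(x, signs, rows, ell):
    # Apply the ell x n SRHT sketch defined by the random signs (length 2**p)
    # and the subsampled row indices to the vector x (zero-padded to 2**p).
    N = signs.size
    z = np.zeros(N)
    z[:x.size] = x
    z = fwht(signs * z)
    return z[rows] / np.sqrt(ell)

rng = np.random.default_rng(5)
n, ell = 680, 200
N = 1 << int(np.ceil(np.log2(n)))            # smallest power of 2 >= n, here 1024
signs = rng.choice([-1.0, 1.0], size=N)
rows = rng.choice(N, size=ell, replace=False)
x = rng.standard_normal(n)
print(np.linalg.norm(srht_apply(x, signs, rows, ell)) / np.linalg.norm(x))  # about 1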
Another possibility is to use the Clarkson–Woodruff transform [8,14]. The matrix $S$ is an $\ell \times n$ sparse matrix with only one nonzero entry in each column, which is $\pm 1$, each sign with probability $1/2$. The row number of the entry is chosen randomly. For the first $\ell$ columns of $S$, a random permutation of $1, \dots, \ell$ is chosen.
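Such a matrix can be generated explicitly in sparse format, as in the following Python sketch (our illustration, assuming $n \ge \ell$); applying it costs only one pass over the vector.

import numpy as np
from scipy.sparse import csr_matrix

def clarkson_woodruff(n, ell, rng):
    # ell x n sparse sketching matrix: one +-1 entry per column in a random row;
    # the first ell columns hit the rows in a random permutation.
    rows = np.empty(n, dtype=int)
    rows[:ell] = rng.permutation(ell)
    rows[ell:] = rng.integers(0, ell, size=n - ell)
    signs = rng.choice([-1.0, 1.0], size=n)
    return csr_matrix((signs, (rows, np.arange(n))), shape=(ell, n))

rng = np.random.default_rng(6)
S = clarkson_woodruff(2000, 100, rng)
x = rng.standard_normal(2000)
print(np.linalg.norm(S @ x) / np.linalg.norm(x))   # close to 1 on average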
A delicate issue with matrix sketching is the choice of $\ell$. It is known that inequality (11) is satisfied for SRHT with high probability provided that $\ell$ is large enough with respect to the dimension $k$ of the subspace $\mathcal{V}$ (see [13]). However, this is of little help for us since we need the same sketching matrix $S$ for all iterations, and the subspace dimension increases by one at every iteration. If the Q-OR method is restarted, $k$ may be chosen as the restart parameter $m$. However, this may be too small to obtain fast convergence. We will show experimentally in Section 5 how the choice of $\ell$ influences the convergence of the sketched Q-OR method.
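To illustrate how the sketch enters the least squares subproblem of each iteration, the following Python fragment (a schematic with a Gaussian sketching matrix and names of our own choosing, not the actual implementation) replaces a tall-and-skinny problem $\min_s \|w - V s\|$ by its sketched counterpart $\min_s \|S w - S V s\|$, solved through a small QR factorization of the $\ell \times k$ matrix $S V$.

import numpy as np

rng = np.random.default_rng(7)
n, k, ell = 20000, 50, 500
V = rng.standard_normal((n, k))                    # stands for the tall-and-skinny basis matrix
w = rng.standard_normal(n)
S = rng.standard_normal((ell, n)) / np.sqrt(ell)   # any epsilon-embedding would do here

s_exact = np.linalg.lstsq(V, w, rcond=None)[0]     # solution of the full problem
Q, R = np.linalg.qr(S @ V)                         # economy-size QR of the sketched matrix
s_sketch = np.linalg.solve(R, Q.T @ (S @ w))       # solution of the sketched problem
print(np.linalg.norm(w - V @ s_exact), np.linalg.norm(w - V @ s_sketch))  # close residual norms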
In numerical linear algebra, matrix sketching has mainly been used, with some success, for solving large least squares problems. In recent years, randomization has also been used in different Krylov methods for solving linear systems. However, methods such as randomized GMRES [15] or sketched GMRES [7] do not minimize the residual norm as GMRES does. Hence, they are misnamed. In fact, some of them are Q-MR methods with non-orthogonal bases.
5. Numerical Experiments
For the first experiment, we consider the matrix fs_680_1 (https://sparse.tamu.edu, URL accessed on 1 January 2025). We scale this matrix to have a unit diagonal and name it fs_680_1c. This sparse matrix of order 680 has 21,184 nonzero entries and a condition number equal to .
Figure 1 shows the true residual norms $\|b - A x_k\|$ for the standard Q-ORoptinv method using the inverses of Cholesky factors and for the randomized method using SRHT sketching, without preconditioning and without restarting. The initial iterate is the zero vector. Note that, for SRHT, $2^p = 1024$ when $n = 680$. The value of $\ell$ is . Using Clarkson–Woodruff sketching provides almost the same results. The residual norms of the two algorithms are almost identical, but since the method with sketching does not minimize the residual norm, its residual norms are slightly larger and show small oscillations.
Figure 2 displays the true residual norms for the method with SRHT sketching for several values of $\ell$. Note that some of these values are small, which limits the number of iterations that we can perform. In fact, one can see that, after 43 iterations, the algorithm with the smallest value of $\ell$ does not converge. The results with the other values of $\ell$ are more or less the same, showing that the algorithm is only weakly dependent on the choice of $\ell$. However, with one of these values, we cannot perform much more than 85 iterations.
For the second example, we consider the matrix rajat27 (https://sparse.tamu.edu) of order 20,640. Since this matrix has some zero entries on the diagonal, which can be a problem for some preconditioners, we add to the matrix, and we name the result rajat27b. This matrix has 101,681 nonzero entries and an estimated condition number equal to . We use a diagonal preconditioner.
Figure 3 shows the computed residual norms (using relation (5)) for the standard Q-ORoptinv method and the randomized method using SRHT sketching. Once again, the method with sketching converges similarly to the standard method.
Figure 4 displays the computed residual norms for the method with SRHT sketching for three values of $\ell$. Note that all these values of $\ell$ are larger than the number of iterations we have to perform. The results with these values of $\ell$ are more or less the same, showing, once again, that the randomized algorithm is only weakly dependent on the choice of $\ell$.
Figure 5 compares SRHT and Clarkson–Woodruff sketching. The two algorithms converge similarly, but more oscillations occur with Clarkson–Woodruff sketching. However, it is cheaper than SRHT.
The third example corresponds to the finite difference discretization of a convection–diffusion equation with homogeneous Dirichlet boundary conditions. The diffusion coefficient is piecewise constant, being equal to 100 in a subdomain and 1 elsewhere. The mesh size is such that the matrix is of order 22,500. Its estimated condition number is . The right-hand side is a random vector. We use an incomplete LU preconditioner without fill-in (ILU(0)), and we restart the methods every 100 iterations.
Figure 6 shows that, even though there are some oscillations with the method using sketching, the convergence is very similar to that of Q-ORoptinv.