Article

Efficient Reduction Algorithms for Banded Symmetric Generalized Eigenproblems via Sequentially Semiseparable (SSS) Matrices

by Fan Yuan, Shengguo Li, Hao Jiang, Hongxia Wang, Cheng Chen, Lei Du and Bo Yang
1 College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
2 College of Liberal Arts and Sciences, National University of Defense Technology, Changsha 410073, China
3 Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
4 School of Mathematical Sciences, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2022, 10(10), 1676; https://doi.org/10.3390/math10101676
Submission received: 30 March 2022 / Revised: 2 May 2022 / Accepted: 9 May 2022 / Published: 13 May 2022
(This article belongs to the Special Issue Matrix Equations and Their Algorithms Analysis)

Abstract: In this paper, a novel algorithm is proposed for reducing a banded symmetric generalized eigenvalue problem to a banded symmetric standard eigenvalue problem, based on sequentially semiseparable (SSS) matrix techniques. This is the first time that SSS matrix techniques have been used in such eigenvalue problems. The newly proposed algorithm requires only linear storage cost and $O(n^2)$ computation cost for matrices of dimension $n$, and it is also potentially well suited to parallel execution. Experiments have been performed in Matlab, and the accuracy and stability of the algorithm are verified.
MSC:
6505; 65F30; 65R20; 68W40; 68P05

1. Introduction

In this paper, we consider how to reduce the following generalized eigenvalue problem (GEP) to a standard eigenvalue problem,
$$ A Q = B Q \Lambda, \qquad (1) $$
with banded Hermitian matrices $A, B \in \mathbb{C}^{n \times n}$ and $B$ positive definite. In the real case, $A$ and $B$ are symmetric instead of Hermitian. The classical approach is first to compute the Cholesky factorization $B = L L^H$ with a lower triangular matrix $L$, and then to multiply (1) by $L^{-1}$, which yields the standard eigenvalue problem
$$ L^{-1} A L^{-H} \cdot L^{H} Q = L^{H} Q \Lambda. \qquad (2) $$
The problem with this approach is that $C := L^{-1} A L^{-H}$ is dense even though $A$ and $L$ are banded, since $L^{-1}$ is a full triangular matrix in general. In this paper, we consider how to reduce the matrix $C$ to a symmetric banded form efficiently.
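As a point of reference, the classical reduction (1)-(2) can be carried out with dense linear algebra in a few lines of Matlab. The following is only an illustrative sketch (the sizes and the diagonal shift that makes $B$ positive definite are arbitrary choices of ours); it forms $C$ densely, which is exactly what the rest of this paper avoids.

  n = 200;  b = 4;                                   % dimension and semi-bandwidth
  A = triu(tril(randn(n), b), -b);  A = (A + A')/2;  % banded symmetric A
  B = triu(tril(randn(n), b), -b);  B = (B + B')/2 + n*eye(n);  % banded SPD B
  L = chol(B, 'lower');                              % banded Cholesky factor of B
  C = L \ A / L';                                    % C = inv(L)*A*inv(L'), dense in general
  [Q, Lambda] = eig((C + C')/2);                     % standard symmetric eigenproblem (2)
  X = L' \ Q;                                        % then A*X = B*X*Lambda, as in (1)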
The LAPACK library [1] includes routines for reducing the banded GEP to a banded standard eigenvalue problem (SEP), named xHBGST and xSBGST, where x denotes the precision: S (single), D (double), C (single complex) and Z (double complex). We use the real double-precision case to introduce the main process. First, DPBSTF computes a split Cholesky factorization [2] of the real symmetric banded positive definite matrix $B$, $B = S^T S$, where the leading $p \times p$ submatrix of $S$ is an upper banded matrix with bandwidth $b_B$ and the last $n - p$ rows form a lower banded matrix with bandwidth $b_B$. This factorization is also called the 'twisted factorization' [3]. Then, the routine DSBGST updates $A \leftarrow X^T A X$, where $X = S^{-1} Q$ and $Q$ is an orthogonal matrix chosen to preserve the bandwidth of $A$. The matrix $S$ is treated as a product of elementary matrices:
$$ S = S_m S_{m-1} \cdots S_2 S_1 S_{m+1} \cdots S_{n-1} S_n, \qquad (3) $$
where $S_i$ is determined by the $i$-th row of $S$. For each value of $i$, the current matrix $A$ is updated by forming $S_i^{-1} A S_i^{-T}$, and the introduced bulge is chased out by applying plane rotations.
Some approaches have been proposed to reduce the generalized eigenvalue problem to tridiagonal-diagonal form [4,5], which directly reduce $B$ to diagonal form and $A$ to tridiagonal form, respectively. In 1973, Crawford [6] proposed a new reduction method for the real symmetric case with $b_A = b_B$. Different from LAPACK, the method is based on a decomposition of the matrices into $b_B \times b_B$ blocks, and the bulge is removed immediately by using matrix-matrix multiplications. Lang [7] combined the good features of Crawford's scheme with the LAPACK routines; the resulting method proceeds by blocks and can also handle different bandwidths $b_B < b_A$. A distributed parallel version [8] is included in ELPA [9].
In this work we present a new reduction algorithm that is completely different from previous algorithms; the tool we use is the sequentially semiseparable (SSS) matrix techniques. The SSS matrix was introduced in [10,11] and is a kind of rank-structured matrix; see Section 2. Since a symmetric banded matrix can be seen as a special block tridiagonal matrix, its Cholesky factor $L$ is a block bidiagonal matrix, and it is well known that the inverse of $L$ is a lower triangular SSS matrix; see Section 2 and [12]. The matrix $C = L^{-1} A L^{-T}$ can be proved to be an SSS matrix; see Proposition 1 in Section 3. Different from Crawford's method [6] and LAPACK [1], we compute $C$ explicitly but express it in SSS form, so the computation and storage do not increase much. Like Crawford's method, the advantage of our approach is that the matrices are partitioned into blocks, almost all operations are (small) matrix-matrix multiplications, and some of these small matrix-matrix multiplications can be computed in parallel using dynamic task scheduling. For a task-based implementation of the matrix operations, we can leverage the CHAMELEON library [13,14] to implement a parallel version of our algorithm, which is one potential advantage of our approach; this will be our future work.
In this paper we reduce the original problem to the block tridiagonalization of an SSS matrix. Fast algorithms for tridiagonalizing a diagonal-plus-semiseparable matrix were introduced in [15,16], which cost $O(n^2)$ flops. We generalize the tridiagonalization approach in [15] to the (block) SSS matrix case and show how to further obtain its banded form; the complexity is $O(n^2 r)$ flops, where $n$ is the dimension of the matrix and $r = b_A = b_B$. The disadvantage of the algorithm proposed in this work is that it requires the semi-bandwidths of $A$ and $B$ to be equal, $b_A = b_B$. The procedure is shown in Algorithm 1; it works on the SSS generators of the matrix $C = L^{-1} A L^{-H}$, and the outputs are also small matrices. The memory and computation costs are of the same order as those of the algorithms in LAPACK, and our algorithm is easy to implement in parallel.
The remaining sections of this paper are organized as follows. Section 2 gives a brief introduction to semiseparable and SSS matrices, including the fast multiplication of two SSS matrices. Section 3 describes how to express the matrix $C$ as an SSS matrix and how to recompress its generators. The banded reduction process for a symmetric SSS matrix is shown in Section 4, together with a complexity analysis. The performance results are summarized in Section 5, and conclusions are drawn in Section 6.

2. Semiseparable and SSS Matrices

Rank-structured matrices have attracted much attention in recent years. In [17], Raf Vandebril, Marc Van Barel, and Nicola Mastronardi present a comprehensive overview of the mathematical and numerical properties of one class of these matrices, semiseparable matrices, which is the simplest case. Rank-structured matrices include $\mathcal{H}$-matrices [18,19,20,21], $\mathcal{H}^2$-matrices [22,23], quasiseparable matrices [24,25], semiseparable matrices [17,26], sequentially semiseparable matrices [11], hierarchically semiseparable matrices [27,28,29], etc. Machine learning and big data analysis are current research hotspots [30], and rank-structured matrix techniques can also be used in these areas [31,32,33,34].
The semiseparable structure is a matrix analog of the semiseparable integral kernels described by Kailath in [35]. The semiseparable matrix has been referred to as the inverse of an irreducible tridiagonal matrix, and also as a Green matrix, one-pair matrix, or single-pair matrix; see [36,37,38]. Semiseparable matrices appear in several types of applications, e.g., integral equations, boundary value problems, Gauss-Markov processes, time-varying linear systems, statistics, acoustic and electromagnetic scattering theory, rational interpolation, and so on.
Sequentially semiseparable (SSS) matrices exploit the off-diagonal low-rank property: the off-diagonal blocks are represented as products of a sequence of low-rank matrices. For an $n \times n$ matrix $A$ with block partitioning
$$
A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1N} \\ A_{21} & A_{22} & \cdots & A_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{N1} & A_{N2} & \cdots & A_{NN} \end{bmatrix},
$$
where $A_{ij} \in \mathbb{R}^{m_i \times m_j}$ and $n = m_1 + \cdots + m_N$, the blocks can be represented by
$$
A_{ij} = \begin{cases} A_{ii} \,(=: D_i), & i = j, \\ U_i W_{i+1} \cdots W_{j-1} V_j^T, & j > i, \\ P_i R_{i-1} \cdots R_{j+1} Q_j^T, & j < i. \end{cases} \qquad (4)
$$
For a symmetric matrix $A$, $P_k = V_k$, $R_k = W_k^T$ and $Q_k = U_k$ for each $k$. The dimensions of the generator matrices $\{U_i\}_{i=1}^{N-1}$, $\{V_i\}_{i=2}^{N}$, $\{W_i\}_{i=2}^{N-1}$, $\{P_i\}_{i=2}^{N}$, $\{Q_i\}_{i=1}^{N-1}$, $\{R_i\}_{i=2}^{N-1}$ and $\{D_i\}_{i=1}^{N}$ are shown in Table 1. Empty products are defined to be the identity matrix. For $N = 4$, the matrix $A$ has the following form,
$$
A = \begin{bmatrix} D_1 & U_1 V_2^T & U_1 W_2 V_3^T & U_1 W_2 W_3 V_4^T \\ P_2 Q_1^T & D_2 & U_2 V_3^T & U_2 W_3 V_4^T \\ P_3 R_2 Q_1^T & P_3 Q_2^T & D_3 & U_3 V_4^T \\ P_4 R_3 R_2 Q_1^T & P_4 R_3 Q_2^T & P_4 Q_3^T & D_4 \end{bmatrix}. \qquad (5)
$$
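To make the representation (4) concrete, the following Matlab function assembles the dense matrix from the generators of a symmetric SSS matrix. It is only a hypothetical helper for checking small examples; the function name and the cell-array layout are our own choices and are not taken from [10,11].

  function Afull = sss_sym_to_dense(D, U, V, W)
  % D{i}: m_i x m_i, U{i}: m_i x k_i, V{j}: m_j x k_{j-1}, W{i}: k_{i-1} x k_i.
  % The lower triangular part follows by symmetry (P_i = V_i, Q_i = U_i, R_i = W_i').
  N = numel(D);
  m = cellfun(@(d) size(d, 1), D);
  idx = [0, cumsum(m(:)')];                     % block row/column offsets
  Afull = zeros(idx(end));
  for i = 1:N
      Afull(idx(i)+1:idx(i+1), idx(i)+1:idx(i+1)) = D{i};
      for j = i+1:N
          blk = U{i};
          for k = i+1:j-1
              blk = blk * W{k};                 % U_i W_{i+1} ... W_{j-1}
          end
          blk = blk * V{j}';                    % ... V_j^T
          Afull(idx(i)+1:idx(i+1), idx(j)+1:idx(j+1)) = blk;
          Afull(idx(j)+1:idx(j+1), idx(i)+1:idx(i+1)) = blk';  % symmetric lower part
      end
  end
  end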

Fast Matrix-Matrix Multiplication

A fast algorithm for multiplying an SSS matrix (4) by a given vector or matrix has been presented in [10,39]. This subsection only introduces the case where both $A$ and $B$ are SSS matrices [10]. Let $A$ and $B$ be matrices in SSS form that are conformally partitioned. The forward and backward recursions are defined as
$$
\begin{aligned}
G_1 &= 0, & G_{i+1} &= Q_i^T(A)\, U_i(B) + R_i(A)\, G_i\, W_i(B), & i &= 1, \ldots, N-1, \\
H_N &= 0, & H_{i-1} &= V_i^T(A)\, P_i(B) + W_i(A)\, H_i\, R_i(B), & i &= N, \ldots, 2.
\end{aligned} \qquad (6)
$$
We have the following theorem.
Theorem 1
(see [40]). The SSS form of matrix C = A B can be computed through the following recursions:
$$
\begin{aligned}
D_i(C) &= D_i(A) D_i(B) + P_i(A) G_i V_i^T(B) + U_i(A) H_i Q_i^T(B), \\
P_i(C) &= \begin{bmatrix} D_i(A) P_i(B) + U_i(A) H_i R_i(B) & P_i(A) \end{bmatrix}, \\
R_i(C) &= \begin{bmatrix} R_i(B) & 0 \\ Q_i^T(A) P_i(B) & R_i(A) \end{bmatrix}, \\
Q_i(C) &= \begin{bmatrix} Q_i(B) \\ D_i^T(B) Q_i(A) + V_i(B) G_i^T R_i^T(A) \end{bmatrix}, \\
U_i(C) &= \begin{bmatrix} D_i(A) U_i(B) + P_i(A) G_i W_i(B) & U_i(A) \end{bmatrix}, \\
W_i(C) &= \begin{bmatrix} W_i(B) & 0 \\ V_i^T(A) U_i(B) & W_i(A) \end{bmatrix}, \\
V_i(C) &= \begin{bmatrix} V_i(B) \\ D_i^T(B) V_i(A) + Q_i(B) H_i^T W_i^T(A) \end{bmatrix}.
\end{aligned} \qquad (7)
$$
This algorithm is an order of magnitude faster than general matrix-matrix multiplication. Notice that after the multiplication the ranks of the generators increase. Dewilde and van der Veen [39] presented a technique to compress the generators. A simple, efficient and numerically stable method was further proposed in [11] to compress a given SSS representation to a predefined tolerance $\tau$; this method is introduced in Section 3.2 for completeness.
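As an illustration, the recursions (6) operate directly on the generators. The sketch below assumes that the generators of $A$ and $B$ are stored in struct arrays of length $N \ge 3$ with fields D, U, V, W, P, Q, R; this data layout is our own choice and not that of [10,11].

  function [G, H] = sss_product_recursions(A, B)
  % Forward recursion for G_{i+1} and backward recursion for H_{i-1} in (6).
  N = numel(A);
  G = cell(N, 1);  H = cell(N, 1);
  G{2} = A(1).Q' * B(1).U;                      % G_1 = 0, so the R*G*W term vanishes
  for i = 2:N-1
      G{i+1} = A(i).Q' * B(i).U + A(i).R * G{i} * B(i).W;
  end
  H{N-1} = A(N).V' * B(N).P;                    % H_N = 0, so the W*H*R term vanishes
  for i = N-1:-1:2
      H{i-1} = A(i).V' * B(i).P + A(i).W * H{i} * B(i).R;
  end
  end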

3. The Reduction Algorithm

Assume that the semi-bandwidth of the matrix $B$ is $b_B$ and $N = n / b_B$; then $L$ can be seen as a lower block bidiagonal matrix whose blocks are $b_B \times b_B$ small matrices. Without loss of generality, we assume $n = N \times b_B$. From Gaussian elimination, we know
$$
L = L_1 \cdots L_N,
$$
where $L_i$ is an identity matrix except for the $b_B \times b_B$ diagonal block $L_{ii}$ and the $b_B \times b_B$ subdiagonal block $L_{i,i-1}$ or $L_{i+1,i}$ taken from $L$. There are two forms of $L_i$: the 'row-wise' and the 'column-wise' form. If $N = 3$, the row-wise form, i.e., the one in which $L_i$ contains $L_{i,i-1}$, is written as
$$
L = \begin{bmatrix} L_{11} & & \\ L_{21} & L_{22} & \\ & L_{32} & L_{33} \end{bmatrix}
  = \underbrace{\begin{bmatrix} L_{11} & & \\ & I & \\ & & I \end{bmatrix}}_{L_1}
    \underbrace{\begin{bmatrix} I & & \\ L_{21} & L_{22} & \\ & & I \end{bmatrix}}_{L_2}
    \underbrace{\begin{bmatrix} I & & \\ & I & \\ & L_{32} & L_{33} \end{bmatrix}}_{L_3}.
$$
Then, its inverse is
$$
L^{-1} = \begin{bmatrix} I & & \\ & I & \\ & -L_{33}^{-1} L_{32} & L_{33}^{-1} \end{bmatrix}
         \begin{bmatrix} I & & \\ -L_{22}^{-1} L_{21} & L_{22}^{-1} & \\ & & I \end{bmatrix}
         \begin{bmatrix} L_{11}^{-1} & & \\ & I & \\ & & I \end{bmatrix}
       = \begin{bmatrix} \tilde L_{11} & & \\ -\tilde L_{21} \tilde L_{11} & \tilde L_{22} & \\ \tilde L_{32} \tilde L_{21} \tilde L_{11} & -\tilde L_{32} \tilde L_{22} & \tilde L_{33} \end{bmatrix},
\quad \text{where } \tilde L_{ii} = L_{ii}^{-1} \text{ and } \tilde L_{i,i-1} = L_{ii}^{-1} L_{i,i-1}.
$$
If $N = 3$, the column-wise form is written as
$$
L = \begin{bmatrix} L_{11} & & \\ L_{21} & L_{22} & \\ & L_{32} & L_{33} \end{bmatrix}
  = \underbrace{\begin{bmatrix} L_{11} & & \\ L_{21} & I & \\ & & I \end{bmatrix}}_{L_1}
    \underbrace{\begin{bmatrix} I & & \\ & L_{22} & \\ & L_{32} & I \end{bmatrix}}_{L_2}
    \underbrace{\begin{bmatrix} I & & \\ & I & \\ & & L_{33} \end{bmatrix}}_{L_3}.
$$
Then, the inverse is
$$
L^{-1} = \begin{bmatrix} I & & \\ & I & \\ & & L_{33}^{-1} \end{bmatrix}
         \begin{bmatrix} I & & \\ & L_{22}^{-1} & \\ & -L_{32} L_{22}^{-1} & I \end{bmatrix}
         \begin{bmatrix} L_{11}^{-1} & & \\ -L_{21} L_{11}^{-1} & I & \\ & & I \end{bmatrix}
       = \begin{bmatrix} \tilde L_{11} & & \\ -\tilde L_{22} \tilde L_{21} & \tilde L_{22} & \\ \tilde L_{33} \tilde L_{32} \tilde L_{21} & -\tilde L_{33} \tilde L_{32} & \tilde L_{33} \end{bmatrix},
\quad \text{where } \tilde L_{ii} = L_{ii}^{-1} \text{ and } \tilde L_{i,i-1} = L_{i,i-1} L_{i-1,i-1}^{-1}.
$$
It can be seen that $L^{-1}$ is exactly a sequentially semiseparable (SSS) matrix as defined in [11]. The exact formula for the blocks of $L^{-1}$ is, see also [12],
$$
\left( L^{-1} \right)_{ij} = (-1)^{i+j} \left( \prod_{k=i}^{j+1} L_{kk}^{-1} L_{k,k-1} \right) L_{jj}^{-1},
$$
for $i = 2, \ldots, N$ and $j = 1, \ldots, i-1$, where the product is taken with decreasing index $k = i, i-1, \ldots, j+1$. It is easy to see that the off-diagonal blocks of $L^{-1}$ are low-rank, with rank at most $b_B$; see [11,17]. The SSS generators of $L^{-1}$ are
$$
D_i(L^{-1}) = L_{ii}^{-1}, \qquad P_i(L^{-1}) = -D_i(L^{-1})\, L_{i,i-1}, \qquad R_i(L^{-1}) = P_i(L^{-1}), \qquad Q_i(L^{-1}) = D_i(L^{-1})^T.
$$
The complexity of computing these generators is $N \times (r^3 + \frac{2}{3} r^3) = O(\frac{5}{3} n r^2)$, where the $r \times r$ blocks $L_{ii}$ and $L_{i,i-1}$ are lower and upper triangular, respectively, with $r = b_B$. To represent $L^{-1}$ in SSS form, we only need $\{D_i\}$ and $\{P_i\}$, and their numbers are $N$ and $N-1$, respectively. For more complex operations, we introduce the other two generators $\{R_i\}$ and $\{Q_i\}$.
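The generator formulas above translate directly into code. The following Matlab sketch (our own layout; it assumes $n = N r$ and that the Cholesky factor L is stored as a dense lower triangular matrix) extracts $D_i(L^{-1})$ and $P_i(L^{-1})$; the remaining generators $Q_i(L^{-1}) = D_i(L^{-1})^T$ and $R_i(L^{-1}) = P_i(L^{-1})$ then require no extra storage.

  r = bB;  N = n / r;
  D = cell(N, 1);  P = cell(N, 1);
  for i = 1:N
      rows = (i-1)*r+1 : i*r;
      D{i} = inv(L(rows, rows));                % D_i(L^{-1}) = L_ii^{-1} (triangular block)
      if i > 1
          P{i} = -D{i} * L(rows, rows - r);     % P_i(L^{-1}) = -L_ii^{-1} L_{i,i-1}
      end
  end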

3.1. The SSS Representation of C

In this subsection, we show that the matrix $C = L^{-1} A L^{-H}$ is also an SSS matrix. It is well known that $L^{-1}$ is an SSS matrix. If $A$ is a symmetric block tridiagonal matrix, it is also an SSS matrix, see Equation (5), with generators
$$
D_i(A) = A_{ii}, \qquad P_i(A) = V_i(A) = A_{i,i-1}, \qquad R_i(A) = W_i(A)^T = 0, \qquad Q_i(A) = U_i(A) = I.
$$
According to Theorem 1, if we multiply $L^{-1}$ with $A$, the forward and backward recursions $\{G_i\}$ and $\{H_i\}$ are
$$
\begin{aligned}
G_1 &= 0, & G_{i+1} &= Q_i^T(L^{-1})\, U_i(A) = Q_i^T(L^{-1}), & i &= 1, \ldots, N-1, \\
H_N &= 0, & H_{i-1} &= V_i^T(L^{-1})\, P_i(A) = 0, & i &= N, \ldots, 2.
\end{aligned}
$$
This is because $R_i(A) = W_i(A) = 0$ and $V_i(L^{-1}) = 0$ ($L^{-1}$ is lower triangular). Then, the generators of $\bar C = L^{-1} A$ are
$$
\begin{aligned}
D_i(\bar C) &= D_i(L^{-1}) D_i(A) + P_i(L^{-1}) G_i V_i^T(A), \\
P_i(\bar C) &= \begin{bmatrix} D_i(L^{-1}) P_i(A) & P_i(L^{-1}) \end{bmatrix}, \\
R_i(\bar C) &= \begin{bmatrix} 0 & 0 \\ Q_i^T(L^{-1}) P_i(A) & R_i(L^{-1}) \end{bmatrix}, \\
Q_i(\bar C) &= \begin{bmatrix} Q_i(A) \\ D_i^T(A) Q_i(L^{-1}) + V_i(A) G_i^T R_i^T(L^{-1}) \end{bmatrix}, \\
U_i(\bar C) &= \begin{bmatrix} D_i(L^{-1}) & 0 \end{bmatrix} \equiv D_i(L^{-1}), \\
W_i(\bar C) &= 0, \\
V_i(\bar C) &= \begin{bmatrix} V_i(A) \\ 0 \end{bmatrix} \equiv V_i(A).
\end{aligned} \qquad (8)
$$
The ranks of $P_i(\bar C)$, $R_i(\bar C)$ and $Q_i(\bar C)$ are all $b_B$, although their sizes have doubled. We can use the recompression techniques [10] to compress them into compact form, as introduced in Section 3.2. After obtaining the compact form of $\bar C$, we can further compute $C = \bar C \times L^{-T}$ and express it in SSS form. The forward and backward recursions are computed as
$$
\begin{aligned}
\bar G_1 &= 0, & \bar G_{i+1} &= Q_i^T(\bar C)\, U_i(L^{-T}) + R_i(\bar C)\, \bar G_i\, W_i(L^{-T}), & i &= 1, \ldots, N-1, \\
\bar H_N &= 0, & \bar H_{i-1} &= V_i^T(\bar C)\, P_i(L^{-T}) + W_i(\bar C)\, \bar H_i\, R_i(L^{-T}) = 0, & i &= N, \ldots, 2.
\end{aligned}
$$
Since the generators of $L^{-T}$ are determined by those of $L^{-1}$, i.e., $D_i(L^{-T}) = D_i(L^{-1})^T$, $V_i(L^{-T}) = P_i(L^{-1})$, $W_i(L^{-T}) = R_i(L^{-1})^T$ and $U_i(L^{-T}) = Q_i(L^{-1})$, the recursion for $\bar G_i$ can be written as
$$
\bar G_1 = 0, \qquad \bar G_{i+1} = Q_i^T(\bar C)\, Q_i(L^{-1}) + R_i(\bar C)\, \bar G_i\, R_i(L^{-1})^T, \qquad i = 1, \ldots, N-1,
$$
and the generators of $C$ are computed as
$$
\begin{aligned}
D_i(C) &= D_i(\bar C)\, D_i^T(L^{-1}) + P_i(\bar C)\, \bar G_i\, P_i^T(L^{-1}), \\
P_i(C) &= P_i(\bar C), \qquad R_i(C) = R_i(\bar C), \\
Q_i(C) &= D_i(L^{-1})\, Q_i(\bar C) + P_i(L^{-1})\, \bar G_i^T\, R_i^T(\bar C).
\end{aligned}
$$
All the generators of $C$ are now available and their ranks are also $b_B$. Since $C$ is symmetric, only the generators of the diagonal and of the lower triangular part are needed. After the recompression of $\bar C$ (Section 3.2), the generators of $C$ are already in compact form. To summarize, we have the following proposition.
Proposition 1.
Assume that $A$ and $B$ are Hermitian banded matrices with semi-bandwidths $b_A = b_B$, that $B$ is positive definite, and that its Cholesky factorization is $B = L L^H$. Then, the matrix $C = L^{-1} A L^{-H}$ is an SSS matrix and the ranks of its off-diagonal generators are all $b_B$.
It is easy to see that the complexity of computing the generators of $\bar C$ is $O(\frac{31}{3} n r^2)$ floating point operations (flops), and similarly, the complexity of computing the generators of $C$ from $\bar C$ is another $O(10\, n r^2)$ flops.
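Proposition 1 is easy to check numerically. The following Matlab sketch (with arbitrary test sizes of our own choosing) forms $C$ densely and verifies that an off-diagonal block of $C$ has numerical rank at most $b_B$.

  n = 128;  r = 4;                              % n = N*r, semi-bandwidth r = b_B
  A = triu(tril(rand(n), r), -r);  A = (A + A')/2;
  B = triu(tril(rand(n), r), -r);  B = (B + B')/2 + 10*eye(n);
  L = chol(B, 'lower');
  C = L \ A / L';                               % C = inv(L)*A*inv(L')
  k = 40;                                       % an arbitrary block boundary (multiple of r)
  fprintf('rank of C(k+1:n,1:k) = %d, b_B = %d\n', rank(C(k+1:n, 1:k)), r);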

3.2. Recompression of C

From Equation (8), we know that the SSS representation of $\bar C$ is not compact. We can use the techniques proposed in Section 3.7 of [11] and Section 10.6 of [10] to compress the generators into compact form. Furthermore, since $C = L^{-1} A L^{-T}$ is symmetric, we only need to compress the generators of the lower triangular part, $\{P_i\}$, $\{R_i\}$ and $\{Q_i\}$: for a symmetric SSS matrix, the generators satisfy $V_i = P_i$, $U_i = Q_i$ and $W_i = R_i^T$. Therefore, we only need to consider the lower triangular part of $\bar C$.
For completeness, we recall the recompression process of [11]. To be consistent with the context of this paper, we introduce the process using the generators of the lower triangular part. The recompression method is split into two stages in [11]; in this paper, we reverse the stages. In the first stage, the representation is converted into the right proper form; that is, all the row bases $G_i$ of the Hankel blocks (the term Hankel block is taken from [39]; in this paper it denotes the off-diagonal blocks that extend from the diagonal to the southwest corner), where
$$
G_N = P_N, \qquad G_i = \begin{bmatrix} P_i \\ G_{i+1} R_i \end{bmatrix}, \qquad \text{for } i = N-1, \ldots, 2,
$$
will have orthonormal columns. In the second stage, the representation is converted into the left proper form. By left proper form it is meant that all the column bases $C_i$ of the Hankel blocks, where
$$
C_1 = Q_1^T, \qquad C_i = \begin{bmatrix} R_i C_{i-1} & Q_i^T \end{bmatrix},
$$
should have orthonormal rows. The second-stage recursions are essentially first-stage recursions in the opposite order. Note that the Hankel block $H_i = G_{i+1} C_i$.
We follow the notation used in [11] and use hats to denote the representation in right proper form. Consider the following recursions:
$$
\begin{bmatrix} P_i \\ \tilde R_i \end{bmatrix} \approx \begin{bmatrix} \hat P_i \\ \hat R_i \end{bmatrix} \Sigma_i F_i^H \quad (\tau\text{-accurate SVD}), \qquad
\tilde R_{i-1} = \Sigma_i F_i^H R_{i-1}, \qquad
\hat Q_{i-1} = Q_{i-1} F_i \Sigma_i^H, \qquad (10)
$$
with the understanding that $\tilde R_N$ and $\hat R_N$ are empty matrices. Then it is easy to check that the new row bases
$$
\hat G_N = \hat P_N, \qquad \hat G_i = \begin{bmatrix} \hat P_i \\ \hat G_{i+1} \hat R_i \end{bmatrix},
$$
have orthonormal columns and that the hatted sequences form a valid SSS representation of the given matrix. The generators $\{\hat P_i\}$, $\{\hat R_i\}$ and $\{\hat Q_i\}$ are all $r \times r$ matrices. For our problem, the main goal is to obtain a compact form of the generators, and orthonormality does not matter much. Therefore, we only need the first stage; we do not introduce the second stage, and interested readers can refer to Section 3.7 of [11].
Equation (10) uses a truncated SVD to compute low-rank approximations of the stacked generators. For accuracy, we can let $\tau$ be small or zero. We can also use RRQR [41] or the interpolative decomposition (ID) [42] to find a low-rank approximation. In our implementation, we used ID, and the computed generators $\{\hat P_i\}$ are then not orthonormal.
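In code, the first stage is a single sweep of truncated SVDs over the stacked generators. The following Matlab sketch (our own notation, with the lower generators P, R, Q of $\bar C$ stored as cell arrays and tau the truncation tolerance; it is not the implementation used for the experiments) mirrors Equation (10).

  Rt = zeros(0, size(P{N}, 2));                 % Rtilde_N is an empty matrix
  for i = N:-1:2
      [Wf, S, F] = svd([P{i}; Rt], 'econ');     % tau-accurate SVD of the stacked matrix
      k = sum(diag(S) > tau * S(1));            % numerical rank after truncation
      mi = size(P{i}, 1);
      P{i} = Wf(1:mi, 1:k);                     % Phat_i
      if i < N, R{i} = Wf(mi+1:end, 1:k); end   % Rhat_i
      SF = S(1:k, 1:k) * F(:, 1:k)';            % Sigma_i * F_i'
      if i > 2, Rt = SF * R{i-1}; end           % Rtilde_{i-1}
      Q{i-1} = Q{i-1} * SF';                    % Qhat_{i-1}
  end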

Complexity

The recompression consists of three steps:
  • Compute a low-rank approximation of $[P_i;\, \tilde R_i]$, which has dimensions $r \times 2r$ or $2r \times 2r$. If using ID, this costs $2 r^3 + (N-2)\, 4 r^3 = O(4 n r^2)$ flops.
  • Compute $\tilde R_{i-1} = X_i R_{i-1}$, where $X_i$ is an $r \times 2r$ matrix and $R_{i-1}$ is of dimension $2r \times 2r$. This costs $(N-2) \cdot 2r \cdot 2r^2 = O(4 n r^2)$ flops.
  • Compute $\hat Q_{i-1} = Q_{i-1} X_i^T$, where $X_i^T$ is of dimension $2r \times r$ and $Q_{i-1}$ is of dimension $r \times 2r$. This costs $(N-1) \cdot 2r \cdot 2r^2 = O(4 n r^2)$ flops.
We use the fact that ID costs $O(mnk)$ flops for computing a rank-$k$ approximation of an $m \times n$ matrix. Therefore, the recompression of the matrix $\bar C$ costs $O(12 n r^2)$ flops in total. Thus, computing the SSS representation of $C = L^{-1} A L^{-T}$ costs $(\frac{5}{3} + \frac{31}{3} + 10 + 12)\, n r^2 = O(34\, n r^2)$ flops in total.

4. Banded Reduction for Symmetric SSS Matrix

We know that the matrix $C$ is an SSS matrix, and we can use the method in Section 3.2 to obtain its compact SSS representation. In this section, we introduce how to block tridiagonalize $C$ by working on its SSS representation.
For clarity, we assume that $C$ is a $4 \times 4$ block SSS matrix, the same as Equation (5),
$$
C = \begin{bmatrix} D_1 & U_1 V_2^T & U_1 W_2 V_3^T & U_1 W_2 W_3 V_4^T \\ P_2 Q_1^T & D_2 & U_2 V_3^T & U_2 W_3 V_4^T \\ P_3 R_2 Q_1^T & P_3 Q_2^T & D_3 & U_3 V_4^T \\ P_4 R_3 R_2 Q_1^T & P_4 R_3 Q_2^T & P_4 Q_3^T & D_4 \end{bmatrix}.
$$
Since $C$ is symmetric, we have $P_i = V_i$, $Q_i = U_i$ and $R_i = W_i^T$.
We perform the following steps and see how to convert C into a block tridiagonal form by working on the generators of C.
  • Work on the last two block rows. The off-diagonal block is
$$
\begin{bmatrix} P_3 R_2 Q_1^T & P_3 Q_2^T \\ P_4 R_3 R_2 Q_1^T & P_4 R_3 Q_2^T \end{bmatrix}
= \begin{bmatrix} P_3 \\ P_4 R_3 \end{bmatrix} \odot \begin{bmatrix} R_2 Q_1^T & Q_2^T \\ R_2 Q_1^T & Q_2^T \end{bmatrix},
$$
where $\odot$ means to multiply within the same block row. Find an orthogonal matrix $Q \in \mathbb{C}^{2r \times 2r}$ such that $Q \begin{bmatrix} P_3 \\ P_4 R_3 \end{bmatrix} = \begin{bmatrix} \hat P_3 \\ 0 \end{bmatrix}$. Compute $Q \begin{bmatrix} D_3 & Q_3 P_4^T \\ P_4 Q_3^T & D_4 \end{bmatrix} Q^T = \begin{bmatrix} \hat D_3 & \hat Q_3^T \\ \hat Q_3 & \hat D_4 \end{bmatrix}$, define $\hat R_3 = 0$, $\hat P_4 = I$, and update $\hat D_3$, $\hat Q_3$. Now, we have
$$
C = \begin{bmatrix} D_1 & U_1 V_2^T & U_1 W_2 \hat V_3^T & 0 \\ P_2 Q_1^T & D_2 & U_2 \hat V_3^T & 0 \\ \hat P_3 R_2 Q_1^T & \hat P_3 Q_2^T & \hat D_3 & \hat U_3 \\ 0 & 0 & \hat Q_3^T & \hat D_4 \end{bmatrix}.
$$
  • Work on the 2nd and 3rd block rows. The off-diagonal block is
$$
\begin{bmatrix} P_2 Q_1^T \\ \hat P_3 R_2 Q_1^T \end{bmatrix}
= \begin{bmatrix} P_2 \\ \hat P_3 R_2 \end{bmatrix} \odot \begin{bmatrix} Q_1^T \\ Q_1^T \end{bmatrix}.
$$
We can compute an orthogonal matrix $Q$ of dimension $2r$ such that $Q \begin{bmatrix} P_2 \\ \hat P_3 R_2 \end{bmatrix} = \begin{bmatrix} \hat P_2 \\ 0 \end{bmatrix}$. Compute $Q \begin{bmatrix} D_2 & Q_2 \hat P_3^T \\ \hat P_3 Q_2^T & \hat D_3 \end{bmatrix} Q^T = \begin{bmatrix} \hat D_2 & \hat Q_2^T \\ \hat Q_2 & \hat D_3 \end{bmatrix}$, define $\hat R_2 = 0$, $\hat P_3 = I_r$, and update $\hat D_2$, $\hat Q_2$. This will introduce a bulge at positions $(2,4)$ and $(4,2)$, and $C$ looks like
$$
C = \begin{bmatrix} D_1 & U_1 \hat V_2^T & 0 & 0 \\ \hat P_2 Q_1^T & \hat D_2 & \hat U_2 & X \\ 0 & \hat Q_2^T & \hat D_3 & \hat U_3 \\ 0 & X^T & \hat Q_3^T & \hat D_4 \end{bmatrix}.
$$
The bulge is computed as $\begin{bmatrix} 0 & \hat Q_3^T \end{bmatrix} Q^T = \begin{bmatrix} X & \hat Q_3^T \end{bmatrix}$, and $\hat Q_3$ is updated. We can use the standard chasing algorithm [43] to eliminate the bulge. The bulge chasing process does not affect the top-left part of the matrix $C$, which is still represented in SSS form. We draw attention to the fact that $P_i = V_i \equiv I_r$ and $R_i = W_i^T \equiv 0$ for $i = 3, 4$, and the block tridiagonal part is defined by $\hat D_i$ and $\hat Q_i$ (or $\hat U_i$). Finally, the matrix $C$ has the following form after bulge chasing,
$$
C = \begin{bmatrix} D_1 & U_1 \hat V_2^T & 0 & 0 \\ \hat P_2 Q_1^T & \hat D_2 & \hat U_2 & 0 \\ 0 & \hat Q_2^T & \hat D_3 & \hat U_3 \\ 0 & 0 & \hat Q_3^T & \hat D_4 \end{bmatrix}.
$$
  • Define $\hat Q_1^T = \hat P_2 Q_1^T$, $\hat D_1 = D_1$ and $\hat P_2 = I_r$. Finally, we get the block tridiagonal matrix
$$
C = \begin{bmatrix} \hat D_1 & \hat U_1 & 0 & 0 \\ \hat Q_1^T & \hat D_2 & \hat U_2 & 0 \\ 0 & \hat Q_2^T & \hat D_3 & \hat U_3 \\ 0 & 0 & \hat Q_3^T & \hat D_4 \end{bmatrix}.
$$
The whole procedure is summarized in Algorithm 1. It starts from the last block row ($k = N$) and ends at the second block row ($k = 2$); Step 11 computes the off-diagonal block $\hat Q_1$ of the matrix $C$ above.
Algorithm 1 (Symmetric banded reduction algorithm for a symmetric SSS matrix). Assume that $C$ is an $N \times N$ block symmetric SSS matrix whose generators $\{P_i\}$, $\{R_i\}$, $\{Q_i\}$ and $\{D_i\}$, for $i = 1, \ldots, N$, are $r \times r$ matrices.
  Inputs: generators $\{P_i\}$, $\{R_i\}$, $\{Q_i\}$ and $\{D_i\}$, for $i = 1, \ldots, N$.
  Outputs: a block tridiagonal matrix defined by $\{D_i\}$ and $\{Q_i\}$.
  1. DO $k = N : -1 : 3$
  2.    Compute an orthogonal matrix $H_k$ such that $H_k \begin{bmatrix} P_{k-1} \\ P_k R_{k-1} \end{bmatrix} = \begin{bmatrix} \hat P_{k-1} \\ 0 \end{bmatrix}$, and update $P_{k-1} = \hat P_{k-1}$.
  3.    Compute $H_k \begin{bmatrix} D_{k-1} & Q_{k-1} P_k^T \\ P_k Q_{k-1}^T & D_k \end{bmatrix} H_k^T = \begin{bmatrix} \hat D_{k-1} & \hat U_{k-1} \\ \hat Q_{k-1} & \hat D_k \end{bmatrix}$, define $R_{k-1} = 0$, $P_k = I_r$, and update $D_{k-1} = \hat D_{k-1}$, $Q_{k-1} = \hat Q_{k-1}$ and $D_k = \hat D_k$.
  4.    if $k < N$    % (a bulge is introduced)
  5.        Compute the bulge $X$ from $H_k \begin{bmatrix} 0 \\ Q_k \end{bmatrix} = \begin{bmatrix} X \\ \hat Q_k \end{bmatrix}$, and update $Q_k = \hat Q_k$;
  6.        for $i = k, \ldots, N-1$
  7.            Apply the standard chasing procedure and chase the bulge down;
  8.        end for
  9.    end if
  10. END DO
  11. When $k = 2$, compute $\hat Q_1^T = \hat P_2 Q_1^T$ and update $Q_1 = \hat Q_1$.
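For concreteness, one pass of Steps 2-3 of Algorithm 1 at block index k can be written in Matlab as follows (a sketch with our own variable names; the bulge computation and chasing of Steps 5-8 are omitted). P, R, Q, D are cell arrays holding the r x r generators of the symmetric SSS matrix C.

  r = size(D{k}, 1);
  [Hk, T] = qr([P{k-1}; P{k} * R{k-1}]);        % full QR of the stacked 2r x r matrix
  Hk = Hk';                                     % H_k with H_k*[P_{k-1}; P_k*R_{k-1}] = [Phat; 0]
  P{k-1} = T(1:r, :);                           % Phat_{k-1}
  M = [D{k-1}, Q{k-1} * P{k}'; P{k} * Q{k-1}', D{k}];   % 2r x 2r symmetric block of Step 3
  M = Hk * M * Hk';                             % two-sided orthogonal update
  D{k-1} = M(1:r, 1:r);
  Q{k-1} = M(r+1:2*r, 1:r)';                    % block (k,k-1) of C becomes Qhat_{k-1}'
  D{k}   = M(r+1:2*r, r+1:2*r);
  R{k-1} = zeros(r);  P{k} = eye(r);            % R_{k-1} = 0 and P_k = I_r, as in Step 3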
Remark 1.
By further computing the QR factorization $Q_i = \hat Q_i \hat R_i$ for $i = 2, \ldots, N$, we can obtain a symmetric banded matrix whose off-diagonal blocks are the $\hat R_i$ and whose diagonal blocks are $\hat D_i = \hat Q_i^T D_i \hat Q_i$.
Remark 2.
For an $N \times N$ block SSS matrix, the number of $D_i$ is $N$, the number of $P_i$ is $N-1$, the number of $Q_i$ is $N-1$, and the number of $R_i$ is $N-2$, so the total number of generators is $4(N-1)$. The lower triangular part of $C$ has $\frac{N(N+1)}{2}$ blocks. When $N \ge 6$, the generators require less storage than storing $C$ explicitly as a dense matrix.
Remark 3.
Algorithm 1 is based on eliminating block rows and columns one by one from the bottom. We can get a similar algorithm by reducing the block rows and columns from the top.

Complexity

For a symmetric SSS matrix, we assume that all its generators are r × r matrices. We follow the steps of Algorithm 1 to estimate its computational cost.
  • Step 2: the QR factorization of a $2r \times r$ tall matrix costs $2 r^2 (2r - \frac{r}{3}) = \frac{10}{3} r^3$ flops. It is executed $N-2$ times in total, and thus costs $O(\frac{10}{3} n r^2)$ flops.
  • Step 3: computing $P_k Q_{k-1}^T$ costs $2 r^3$ flops, and the products of $H_k$ and $H_k^T$ with $\begin{bmatrix} D_{k-1} & Q_{k-1} P_k^T \\ P_k Q_{k-1}^T & D_k \end{bmatrix}$ cost $2 \times 2 (2r)^3 = 32 r^3$ flops. Since this is executed $N-2$ times, it costs $O(34\, n r^2)$ flops in total.
  • Step 4:
    • Computing the bulge costs $2 \times 2r \times 2r \times r = 8 r^3$ flops. This happens $N-2$ times, so it costs $8 n r^2$ flops in total.
    • Each bulge chasing step costs $\frac{10}{3} r^3 + 2 (2r)(2r)(3r) + 2 (2r)(2r)(2r) = (\frac{10}{3} + 24 + 16) r^3 = (\frac{10}{3} + 40) r^3$ flops.
    • The $k$-th step requires $N-k$ bulge chasing steps. Since $\sum_{k=3}^{N-1} (N-k) = \sum_{\ell=1}^{N-3} \ell = \frac{(N-3)(N-2)}{2}$, the chasing costs $O\big( (\frac{10}{3} + 40) \frac{1}{2} (N^2 - 5N) r^3 \big) = O\big( \frac{65}{3} (n^2 r - 5 n r^2) \big)$ flops in total.
  • Step 11: it costs $2 r^3$ flops.
Therefore, the block tridiagonalization of a symmetric SSS matrix costs $O\big( \frac{65}{3} n^2 r + (34 + \frac{10}{3} - \frac{325}{3} + 8)\, n r^2 \big) = O\big( \frac{65}{3} n^2 r - 63\, n r^2 \big)$ flops in total.

5. Numerical Results

In this section we test the accuracy of the banded matrix obtained by using the SSS matrix techniques. We further compare the accuracy of the proposed algorithm in computing the eigenvalues of banded symmetric-definite generalized eigenvalue problems. All the numerical results are obtained by using Matlab R2017b on a laptop with 16 GB of memory.
Example 1.
Assume that $A$ and $B$ are two randomized symmetric banded matrices and $B$ is further positive definite; they are constructed by using the following Matlab code:
  A = rand(n);  B = rand(n);
  A = (A + A')/2;  B = (B + B')/2 + alpha*eye(n);
  A = triu(tril(A, r), -r);  B = triu(tril(B, r), -r);
where n is the dimension of the matrices, r is the semi-bandwidth, and alpha is a constant that makes $B$ positive definite (alpha = 10 in our experiments). We compute the matrix $C = L^{-1} A L^{-T}$ explicitly, where $L$ is the lower Cholesky factor of $B$. Then, we compute $\| C - \hat C \|_F$, where $\hat C = Q_S \hat T Q_S^T$, $\hat T$ is the symmetric banded matrix computed by Algorithm 1, and $Q_S$ is the orthogonal matrix accumulated in Algorithm 1. The results are shown in Table 2. For simplicity we assume $n = N \times r$, i.e., $n$ is divisible by $r$.
We let $N = 16, 64, 256, 512$ and $r = 8, 16, 32$. The backward errors of Algorithm 1 are shown in Table 2, and the times taken by Algorithm 1 are shown in Table 3. When $N = 512$ and $r = 32$, the computation runs out of memory on the 16 GB test machine, and the corresponding result is not included in the tables. The results show that the proposed algorithm is numerically stable.
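For reference, the backward-error test of Example 1 can be reproduced along the following lines; sss_band_reduction is a hypothetical wrapper (its name and interface are our own) standing in for the construction of Section 3 followed by Algorithm 1, returning the banded matrix $\hat T$ and the accumulated orthogonal factor $Q_S$.

  n = N*r;  alpha = 10;
  A = rand(n);  B = rand(n);
  A = (A + A')/2;  B = (B + B')/2 + alpha*eye(n);
  A = triu(tril(A, r), -r);  B = triu(tril(B, r), -r);
  L = chol(B, 'lower');
  C = L \ A / L';                               % dense reference for the error check
  [That, Qs] = sss_band_reduction(A, L, r);     % hypothetical interface to Algorithm 1
  backward_error = norm(C - Qs * That * Qs', 'fro');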
Example 2.
In this example we compare the accuracy of the generalized eigenvalues computed by the SSS approach. After obtaining the symmetric banded matrix $\hat T$ from Algorithm 1, we call the Matlab routine eig to compute the eigenvalues of $\hat T$ and then compare them with the eigenvalues computed directly from $A$ and $B$ (by using eig(A, B, 'chol')). The relative errors are measured as $\max_{1 \le i \le n} |\hat\lambda_i - \lambda_i| / |\lambda_i|$, where $\hat\lambda_i$ is the eigenvalue computed by using Algorithm 1 and $\lambda_i$ is the eigenvalue computed directly from $A$ and $B$. The maximum errors and maximum relative errors are shown in Table 4 and Table 5, respectively. The matrices $A$ and $B$ are constructed as in Example 1, and the meanings of the parameters are the same.
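The comparison in Example 2 then amounts to the following sketch, reusing the (hypothetical) output That of the previous snippet.

  lam_hat = sort(eig(That));                    % eigenvalues via Algorithm 1 followed by eig
  lam     = sort(eig(A, B, 'chol'));            % reference generalized eigenvalues
  max_err     = max(abs(lam_hat - lam));
  max_rel_err = max(abs(lam_hat - lam) ./ abs(lam));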

6. Conclusions

In this paper, rank-structured matrix techniques are used for the first time to reduce a banded generalized eigenvalue problem to a banded standard eigenvalue problem. Note that there are some works on reducing a rank-structured matrix to tridiagonal or Hessenberg form [25], which is different from this work: here we focus on symmetric banded matrices, which are sparse, rather than on one particular class of rank-structured matrices such as quasiseparable or semiseparable matrices. In particular, we use fast algorithms based on the sequentially semiseparable (SSS) matrix representation to reduce banded symmetric generalized eigenvalue problems. The whole process of the proposed method is shown in Algorithm 1, and a complexity analysis is also included. Compared with the classical algorithms in LAPACK, the algorithm proposed in this paper requires the same order of storage and computation cost. The newly proposed algorithm consists of many small matrix-matrix multiplications, which can potentially be executed in parallel. We plan to implement our algorithm by leveraging the CHAMELEON library [13,14] and even extend it to the distributed parallel computing case by combining it with data redistribution techniques such as [44] in the near future.

Author Contributions

Conceptualization, S.L.; Data curation, F.Y. and L.D.; Formal analysis, F.Y., C.C. and B.Y.; Writing—review & editing, H.J., H.W. and L.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by NSFC (No. 2021YFB0300101, 62073333, 61902411, 62032023, 12002382, 11275269, 42104078), 173 Program of China (2020-JCJQ-ZD-029), Open Research Fund from State Key Laboratory of High Performance Computing of China (HPCL) (No. 202101-01), Guangdong Natural Science Foundation (2018B030312002), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams under Grant (No. 2016ZT06D211).

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

Not Applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Anderson, E.; Bai, Z.; Bischof, C.; Blackford, S.; Demmel, J.; Dongarra, J.; Du Croz, J.; Greenbaum, A.; Hammarling, S.; McKenney, A.; et al. LAPACK Users' Guide, 3rd ed.; SIAM: Philadelphia, PA, USA, 1999.
  2. Wilkinson, J. Some recent advances in numerical linear algebra. In The State of the Art in Numerical Analysis; Academic Press: New York, NY, USA, 1977.
  3. Dhillon, I.S.; Parlett, B.N. Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices. Linear Algebra Appl. 2004, 387, 1–28.
  4. Brebner, M.; Grad, J. Eigenvalues of Ax = λBx for real symmetric matrices A and B computed by reduction to a pseudosymmetric form and the HR process. Linear Algebra Appl. 1982, 43, 99–118.
  5. Tisseur, F. Tridiagonal-diagonal reduction of symmetric indefinite pairs. SIAM J. Matrix Anal. Appl. 2004, 26, 215–232.
  6. Crawford, C. Reduction of a band-symmetric generalized eigenvalue problem. Commun. ACM 1973, 16, 41–44.
  7. Lang, B. Efficient reduction of banded Hermitian positive definite generalized eigenvalue problems to banded standard eigenvalue problems. SIAM J. Sci. Comput. 2019, 41, C52–C72.
  8. Rippl, M.; Lang, B.; Huckle, T. Parallel eigenvalue computation for banded generalized eigenvalue problems. Parallel Comput. 2019, 88, 102542.
  9. Marek, A.; Blum, V.; Johanni, R.; Havu, V.; Lang, B.; Auckenthaler, T.; Heinecke, A.; Bungartz, H.; Lederer, H. The ELPA library: Scalable parallel eigenvalue solutions for electronic structure theory and computational science. J. Phys. Condens. Matter 2014, 26, 213201.
  10. Chandrasekaran, S.; Dewilde, P.; Gu, M.; Pals, T.; Sun, X.; van der Veen, A.J.; White, D. Fast Stable Solvers for Sequentially Semi-Separable Linear Systems of Equations; Technical Report; University of California: Berkeley, CA, USA, 2003.
  11. Chandrasekaran, S.; Dewilde, P.; Gu, M.; Pals, T.; Sun, X.; van der Veen, A.J.; White, D. Some fast algorithms for sequentially semiseparable representation. SIAM J. Matrix Anal. Appl. 2005, 27, 341–364.
  12. Singh, V. The inverse of a certain block matrix. Bull. Aust. Math. Soc. 1979, 20, 161–163.
  13. Chameleon Software Homepage. Available online: https://solverstack.gitlabpages.inria.fr/chameleon/ (accessed on 22 March 2022).
  14. Agullo, E.; Augonnet, C.; Dongarra, J.; Ltaief, H.; Namyst, R.; Thibault, S.; Tomov, S. A hybridization methodology for high-performance linear algebra software for GPUs. In GPU Computing Gems Jade Edition; Elsevier: Amsterdam, The Netherlands, 2012; pp. 473–484.
  15. Chandrasekaran, S.; Gu, M. A divide-and-conquer algorithm for the eigendecomposition of symmetric block-diagonal plus semiseparable matrices. Numer. Math. 2004, 96, 723–731.
  16. Chandrasekaran, S.; Gu, M. Fast and stable eigendecomposition of symmetric banded plus semi-separable matrices. Linear Algebra Appl. 2000, 313, 107–114.
  17. Vandebril, R.; Van Barel, M.; Mastronardi, N. Matrix Computations and Semiseparable Matrices, Volume I: Linear Systems; Johns Hopkins University Press: Baltimore, MD, USA, 2008.
  18. Hackbusch, W. A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices. Computing 1999, 62, 89–108.
  19. Hackbusch, W.; Grasedyck, L.; Börm, S. An introduction to hierarchical matrices. Math. Bohem. 2002, 127, 229–241.
  20. Hackbusch, W.; Khoromskij, B. A sparse matrix arithmetic based on H-matrices. Part II: Application to multi-dimensional problems. Computing 2000, 64, 21–47.
  21. Börm, S.; Grasedyck, L.; Hackbusch, W. Introduction to hierarchical matrices with applications. Eng. Anal. Bound. Elem. 2003, 27, 405–422.
  22. Hackbusch, W.; Khoromskij, B.; Sauter, S. On H^2-matrices. In Proceedings of the Lecture on Applied Mathematics; Bungartz, H., Hoppe, R.H.W., Zenger, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2000; pp. 9–29.
  23. Hackbusch, W.; Börm, S. Data-sparse approximation by adaptive H^2-matrices. Computing 2002, 69, 1–35.
  24. Eidelman, Y.; Gohberg, I. On a new class of structured matrices. Integral Equ. Oper. Theory 1999, 34, 293–324.
  25. Eidelman, Y.; Gohberg, I.; Gemignani, L. On the fast reduction of a quasiseparable matrix to Hessenberg and tridiagonal forms. Linear Algebra Appl. 2007, 420, 86–101.
  26. Vandebril, R.; Van Barel, M.; Mastronardi, N. Matrix Computations and Semiseparable Matrices, Volume II: Eigenvalue and Singular Value Methods; Johns Hopkins University Press: Baltimore, MD, USA, 2008.
  27. Chandrasekaran, S.; Dewilde, P.; Gu, M.; Lyons, W.; Pals, T. A fast solver for HSS representations via sparse matrices. SIAM J. Matrix Anal. Appl. 2006, 29, 67–81.
  28. Li, S.; Gu, M.; Cheng, L.; Chi, X.; Sun, M. An accelerated divide-and-conquer algorithm for the bidiagonal SVD problem. SIAM J. Matrix Anal. Appl. 2014, 35, 1038–1057.
  29. Liao, X.; Li, S.; Lu, Y.; Roman, J.E. A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems. IEEE Trans. Parallel Distrib. Syst. 2020, 32, 367–378.
  30. Zhang, J.; Su, Q.; Tang, B.; Wang, C.; Li, Y. DPSNet: Multitask Learning Using Geometry Reasoning for Scene Depth and Semantics. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–12.
  31. Rebrova, E.; Chávez, G.; Liu, Y.; Ghysels, P.; Li, X.S. A study of clustering techniques and hierarchical matrix formats for kernel ridge regression. In Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, Canada, 21–25 May 2018; pp. 883–892.
  32. Chávez, G.; Liu, Y.; Ghysels, P.; Li, X.S.; Rebrova, E. Scalable and memory-efficient kernel ridge regression. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, 18–22 May 2020; pp. 956–965.
  33. Erlandson, L.; Cai, D.; Xi, Y.; Chow, E. Accelerating parallel hierarchical matrix-vector products via data-driven sampling. In Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), New Orleans, LA, USA, 18–22 May 2020; pp. 749–758.
  34. Cai, D.; Chow, E.; Erlandson, L.; Saad, Y.; Xi, Y. SMASH: Structured matrix approximation by separation and hierarchy. Numer. Linear Algebra Appl. 2018, 25, e2204.
  35. Kailath, T. Fredholm resolvents, Wiener-Hopf equations, and Riccati differential equations. IEEE Trans. Inf. Theory 1969, 15, 665–672.
  36. Asplund, E. Inverses of matrices a_ij which satisfy a_ij = 0 for j > i + p. Math. Scand. 1959, 7, 57–60.
  37. Barrett, W.; Feinsilver, P. Inverses of banded matrices. Linear Algebra Appl. 1981, 41, 111–130.
  38. Vandebril, R.; Van Barel, M.; Mastronardi, N. A note on the representation and definition of semiseparable matrices. Numer. Linear Algebra Appl. 2005, 12, 839–858.
  39. Dewilde, P.; van der Veen, A. Time-Varying Systems and Computations; Kluwer Academic Publishers: Amsterdam, The Netherlands, 1998.
  40. Chandrasekaran, S.; Dewilde, P.; Gu, M.; Pals, T.; van der Veen, A.J. Fast stable solver for sequentially semi-separable linear systems of equations. In Proceedings of the International Conference on High-Performance Computing, Bangalore, India, 18–21 December 2002; pp. 545–554.
  41. Gu, M.; Eisenstat, S.C. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput. 1996, 17, 848–869.
  42. Cheng, H.; Gimbutas, Z.; Martinsson, P.; Rokhlin, V. On the compression of low rank matrices. SIAM J. Sci. Comput. 2005, 26, 1389–1404.
  43. Schwarz, H.R. Tridiagonalization of a symmetric band matrix. Numer. Math. 1968, 12, 231–241.
  44. Li, S.; Jiang, H.; Dong, D.; Huang, C.; Liu, J.; Liao, X.; Chen, X. Efficient data redistribution algorithms from irregular to block cyclic data distribution. IEEE Trans. Parallel Distrib. Syst. 2022.
Table 1. Dimensions of the generators of the SSS matrix shown in Equation (4); k_i and l_i are the column dimensions of U_i and P_i, respectively.

Matrix:     U_i          V_i              W_i              P_i          Q_i              R_i
Dimension:  m_i × k_i    m_i × k_{i-1}    k_{i-1} × k_i    m_i × l_i    m_i × l_{i+1}    l_{i+1} × l_i
Table 2. Backward errors of the computed banded matrix by Algorithm 1.

           r = 8              r = 16             r = 32
N = 16     2.23 × 10^{-15}    4.74 × 10^{-15}    1.39 × 10^{-14}
N = 64     8.83 × 10^{-15}    1.89 × 10^{-14}    5.82 × 10^{-14}
N = 256    3.40 × 10^{-14}    7.44 × 10^{-14}    2.27 × 10^{-13}
N = 512    6.69 × 10^{-14}    1.46 × 10^{-13}    -
Table 3. The times (in seconds) taken by Algorithm 1.

           r = 8              r = 16             r = 32
N = 16     5.33 × 10^{-2}     2.78 × 10^{-2}     4.04 × 10^{-2}
N = 64     1.34 × 10^{-1}     3.00 × 10^{-1}     2.18 × 10^{0}
N = 256    4.87 × 10^{0}      3.37 × 10^{1}      1.40 × 10^{2}
N = 512    3.91 × 10^{1}      2.77 × 10^{2}      -
Table 4. The maximum errors of the eigenvalues computed by Algorithm 1.

           r = 8              r = 16             r = 32
N = 16     2.55 × 10^{-15}    4.44 × 10^{-15}    1.73 × 10^{-14}
N = 64     3.11 × 10^{-15}    9.99 × 10^{-15}    3.71 × 10^{-14}
N = 256    1.76 × 10^{-14}    1.87 × 10^{-14}    1.51 × 10^{-13}
N = 512    2.93 × 10^{-14}    7.79 × 10^{-14}    -
Table 5. The maximum relative errors of the eigenvalues computed by Algorithm 1.

           r = 8              r = 16             r = 32
N = 16     7.29 × 10^{-14}    6.25 × 10^{-14}    5.15 × 10^{-14}
N = 64     9.15 × 10^{-14}    3.89 × 10^{-11}    3.80 × 10^{-13}
N = 256    3.92 × 10^{-13}    5.36 × 10^{-11}    2.73 × 10^{-12}
N = 512    1.12 × 10^{-12}    1.70 × 10^{-12}    -
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
