1. Introduction
The solution of the continuous Sylvester equation
$$AX + XB = F, \qquad (1)$$
with large sparse matrices $A \in \mathbb{R}^{m \times m}$, $B \in \mathbb{R}^{n \times n}$, $F \in \mathbb{R}^{m \times n}$, an unknown matrix $X \in \mathbb{R}^{m \times n}$, and with $A$, $B$ positive definite, is a common task in numerical linear algebra. It arises in many scientific computing and engineering applications, such as control theory [1,2], neural networks, model reduction [3], image processing [4], and so on. Therefore, the problem has remained an active area of research, and recent methodological advances have been discussed thoroughly in many papers [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Iterative methods for solving linear or nonlinear equations have been improved constantly in recent years to reduce the computational time; examples include two multi-step derivative-free iterative methods [5], the block Jacobi two-stage method [6], and the SYMMLQ algorithm [7,8,9]. In addition, widely used direct methods include the Bartels–Stewart method [10] and the Hessenberg–Schur method [11]. Their main idea is to transform $A$ and $B$ into triangular or Hessenberg form [21] by an orthogonal similarity transformation and then to solve the resulting system of linear equations directly by a back-substitution process. However, these methods are not applicable to large-scale problems because of their prohibitive computational cost. To overcome this limitation, fast iterative methods have been developed, such as the Smith method [12], the alternating direction implicit (ADI) method [13], gradient-based methods [14,15], and Krylov subspace-based algorithms [7,16,17]. At present, the conjugate gradient (CG) method [7] and the preconditioned conjugate gradient method [18] are widely used, owing to their small storage requirements and suitability for parallel computing. Typically, the SYMMLQ algorithm [7,8,9] is quite efficient in the case of symmetric coefficient matrices, as it combines small storage requirements with stable computations. However, it is not a good option for multi-computer systems due to the high cost of global communication. For asymmetric coefficient matrices, a modified conjugate gradient (MCG) method is useful, but its convergence is slow [22,23].
Another type of iteration, based on splitting methods, allows us to better utilize standard methodologies. For instance, Bai et al. [24] proposed the Hermitian and skew-Hermitian splitting (HSS) iteration method for solving systems of linear equations with non-Hermitian positive definite coefficient matrices; it has been studied widely and generalized in [25,26,27,28]. Recently, an HSS iteration method for solving large sparse continuous Sylvester equations with non-Hermitian and positive definite/semidefinite matrices was discussed in [29]. Wang et al. [30] presented a positive-definite and skew-Hermitian splitting (PSS) iteration method, and in [31] Zhou et al. applied the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method to solve the continuous Sylvester equation. Zheng and Ma [32] applied the idea of the normal and skew-Hermitian splitting (NSS) iteration method to continuous Sylvester equations.
However, these iteration methods share a common difficulty: there is no accurate formula for determining the optimal positive value of the parameter in the iteration scheme. A large amount of work has been devoted to this issue, but the estimation problem is still not fully resolved in practical applications. In addition, each iteration of these methods requires solving two continuous Sylvester equations, which incurs considerable extra computational cost.
All of this motivates the development and validation of an efficient parallel algorithm. In this paper, we propose a parallel two-stage iteration algorithm for solving large-scale continuous Sylvester equations that combines the HSS iteration method with the SYMMLQ algorithm. The main idea is to split each coefficient matrix into a symmetric and an anti-symmetric part, so that the original equation is transformed into a sequence of symmetric matrix equations, which are solved by the SYMMLQ algorithm. Furthermore, we focus on improving the parallel efficiency of the SYMMLQ algorithm by rearranging its calculation steps.
The remainder of this paper is organized as follows. In Section 2, the two-stage iteration method, based on a splitting method and the SYMMLQ algorithm, is presented for solving the continuous Sylvester Equation (1). The parallel implementation of the algorithm is given in Section 3. Its convergence analysis and numerical examples are presented in Section 4 and Section 5, respectively. We end with conclusions.
Notation in this paper: $A^{T}$ denotes the transpose of the matrix $A$; $\langle A, B \rangle = \operatorname{tr}(B^{T}A)$ denotes the inner product of two matrices; $\|A\| = \sqrt{\langle A, A \rangle}$ is the matrix norm of $A$ induced by this inner product; and $\rho(A)$ is the spectral radius of the matrix $A$. For the matrix $X = (x_{1}, x_{2}, \ldots, x_{n}) \in \mathbb{R}^{m \times n}$, $\operatorname{vec}(X)$ denotes the operator defined as $\operatorname{vec}(X) = (x_{1}^{T}, x_{2}^{T}, \ldots, x_{n}^{T})^{T}$.
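With this vec operator, the Sylvester Equation (1) is equivalent to the linear system $(I_n \otimes A + B^{T} \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(F)$; this standard identity is implied by the notation but not spelled out here. The following Python snippet (an illustration, not part of the paper's code) verifies it numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, m))
B = rng.standard_normal((n, n))
X = rng.standard_normal((m, n))

def vec(M):
    # Column-stacking vec operator, matching the definition above
    return M.flatten(order="F")

lhs = vec(A @ X + X @ B)
rhs = (np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```

This equivalence is what allows Krylov methods for symmetric linear systems, such as SYMMLQ, to be applied to Equation (1) without ever forming the $mn \times mn$ Kronecker matrix explicitly.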
3. Parallel Implementation of the Two-Stage Iteration Method
In this section, we discuss the parallel implementation, including the data storage and the implementation of the outer and inner iterations.
3.1. Data Storage
For convenience, let $p$ be the number of processors, let $P_i$ denote the $i$th processor, and let $l$ be the number of rows held by each processor in the block partition below. Mark
$$A = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_p \end{pmatrix}, \quad B = \begin{pmatrix} B_1 \\ B_2 \\ \vdots \\ B_p \end{pmatrix}, \quad F = \begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_p \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix},$$
where $A_i$, $B_i$, $F_i$, $X_i$, together with the corresponding blocks of the auxiliary work matrices, are $l$-row sub-block matrices. These are saved in row storage. Then, the blocks $A_i$, $B_i$, $F_i$, $X_i$ and the associated work blocks are stored on the processor $P_i$.
Note: Because of this storage scheme, we adopt block row–row matrix multiplication in the parallel computing process. Detailed descriptions of parallel matrix multiplication can be found in References [5,23,34].
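As an illustration of the row–row scheme, the following mpi4py sketch computes $C = AB$ when both $A$ and $B$ are distributed by block rows: each processor gathers the full $B$ once and multiplies it by its local block row of $A$. This is a minimal sketch under the storage assumptions above, not the paper's actual code:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def block_row_matmul(A_i, B_i):
    """Row-row parallel product C = A B.

    A_i, B_i: this processor's block rows of A and B.
    One allgather reassembles B from its block rows; the local
    block row of C is then formed without further communication.
    """
    B_full = np.vstack(comm.allgather(B_i))
    return A_i @ B_full
```

The same pattern serves the products $AX$ and $XB$ that appear in the residual of Equation (1), so each processor can keep working on its own block rows between collective operations.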
3.2. Parallel Implementation of Outer Iteration Method
(1) Splitting process: Processor $P_i$ computes the local blocks of the splittings
$$H_A = \tfrac{1}{2}(A + A^{T}), \quad S_A = \tfrac{1}{2}(A - A^{T}), \quad H_B = \tfrac{1}{2}(B + B^{T}), \quad S_B = \tfrac{1}{2}(B - B^{T}).$$
(2) Cycle process:
Step 1. Processor $P_i$ computes the local block of the residual $R^{(k)} = F - AX^{(k)} - X^{(k)}B$ and gets $\|R^{(k)}\|$ after all-reduce. If the stopping criterion is satisfied, stop; otherwise, turn to Step 2.
Step 2. Compute the symmetric right-hand side $\widetilde{F}^{(k)} = F - S_A X^{(k)} - X^{(k)} S_B$ in each processor.
Step 3. Use the improved parallel SYMMLQ algorithm to solve the new symmetric equation
$$H_A X^{(k+1)} + X^{(k+1)} H_B = \widetilde{F}^{(k)}.$$
This step, which improves the parallel efficiency and reduces the parallel time by lowering the frequency of communication, plays an important role in the whole parallel implementation of the two-stage iteration method; the detailed implementation is given in Section 3.3 and Section 3.4 (a serial sketch of the complete outer loop is given after Step 4).
Step 4. Let $k := k + 1$ and turn to Step 1.
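To make the outer iteration concrete, here is a minimal serial sketch in Python. It assumes the symmetric/anti-symmetric splitting above; scipy.linalg.solve_sylvester stands in for the parallel SYMMLQ inner solver of Section 3.3, and the function name is illustrative rather than taken from the paper:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def two_stage_outer(A, B, F, tol=1e-10, max_iter=500):
    # Split each coefficient matrix into symmetric (H) and
    # anti-symmetric (S) parts, as in the splitting process above.
    H_A, S_A = (A + A.T) / 2, (A - A.T) / 2
    H_B, S_B = (B + B.T) / 2, (B - B.T) / 2
    X = np.zeros_like(F, dtype=float)
    normF = np.linalg.norm(F)
    for k in range(max_iter):
        R = F - A @ X - X @ B                 # Step 1: outer residual
        if np.linalg.norm(R) / normF < tol:   # stopping criterion
            return X, k
        F_k = F - S_A @ X - X @ S_B           # Step 2: symmetric RHS
        X = solve_sylvester(H_A, H_B, F_k)    # Step 3: inner solve
    return X, max_iter                        # Step 4 is the loop itself
```

In the parallel algorithm, the dense inner solve on the last line is replaced by the distributed SYMMLQ iteration, and the residual norm in Step 1 is accumulated across processors by an all-reduce.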
3.3. Parallel Implementation of Inner Iteration Scheme
(1) Compute process:
① Processor $P_i$ computes the local block of the initial residual of the symmetric equation, obtains its norm after all-reduce, and then computes the first Lanczos matrix.
② Processor $P_i$ computes the local block of the operator product $H_A V + V H_B$, obtains the first Lanczos coefficient after all-reduce, computes the next local inner product, gets the second Lanczos coefficient after all-reduce, and then computes the next Lanczos matrix and the scalar quantities of the LQ factorization in each processor.
③ Processor $P_i$ computes the next local operator product and the corresponding inner product, gets its value after all-reduce, computes the following inner product, gets its value after all-reduce, and updates the Lanczos matrix.
④ Processor $P_i$ computes the inner residual norm; if the inner stopping criterion is satisfied, stop; otherwise it computes the updated LQ scalars.
⑤ Processor $P_i$ computes the initial update of the inner iterate.
(2) Cycle process:
Step 1. Processor $P_i$ computes the inner residual norm; if the inner stopping criterion is satisfied, stop; otherwise, turn to Step 2.
Step 2. Processor $P_i$ computes the local block of the operator product $H_A V + V H_B$, then computes the local contribution to the first inner product and obtains its value after all-reduce. It then computes the second local inner product, obtains its value after all-reduce, and computes the next Lanczos matrix.
Step 3. Processor $P_i$ computes the updated LQ scalars; if the stopping criterion is satisfied, stop; otherwise, it computes the remaining rotation quantities.
Step 4. Processor $P_i$ computes the update of the inner iterate.
Step 5. Let $j := j + 1$ and turn to Step 1.
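Since SYMMLQ operates on a symmetric linear operator, the inner solve can be prototyped serially by flattening the matrix equation with the vec identity from the notation section. The sketch below uses SciPy's MINRES as a stand-in because SciPy ships MINRES but not SYMMLQ; both methods target symmetric (possibly indefinite) operators, and every name here is illustrative:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

def solve_symmetric_sylvester(H_A, H_B, F_tilde):
    # Solve H_A X + X H_B = F_tilde; the flattened operator
    # I kron H_A + H_B kron I is symmetric when H_A, H_B are.
    m, n = F_tilde.shape

    def matvec(x):
        X = x.reshape((m, n), order="F")
        return (H_A @ X + X @ H_B).flatten(order="F")

    L = LinearOperator((m * n, m * n), matvec=matvec)
    x, info = minres(L, F_tilde.flatten(order="F"))
    if info != 0:
        raise RuntimeError("inner Krylov solve did not converge")
    return x.reshape((m, n), order="F")
```

The LinearOperator applies $X \mapsto H_A X + X H_B$ directly, so the $mn \times mn$ Kronecker matrix is never formed; in the distributed setting, this matvec is exactly the block row–row product of Section 3.1 and each inner product becomes an all-reduce.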
3.4. Improved Parallel Implementation of the SYMMLQ Algorithm
Clearly, when computing the two Lanczos scalars in each step of the inner iteration, all processors need to apply the all-reduce operator twice in the parallel implementation of the SYMMLQ algorithm in Section 3.3. Therefore, we rearrange Step 2 of the cycle process, while the remaining steps stay the same. The detailed parallel process of the rearranged step can be expressed as follows.
Processor $P_i$ computes the local block of the operator product, then computes the local contributions to both inner products, gets both scalars after one all-reduce, and finally computes the next Lanczos matrix.
In this way, computing the two scalars needs only one all-reduce per inner step, which reduces the frequency of communication and thus the parallel time. Eventually, we obtain an improved parallel implementation of the SYMMLQ algorithm.
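The communication saving can be illustrated with mpi4py: pack the partial sums of both inner products into one buffer and reduce them together. The function below is a hypothetical sketch of the packing idea; the actual scalars combined in the paper's rearranged Step 2 come from the SYMMLQ recurrence:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def fused_inner_products(W_i, V_i, U_i):
    # W_i, V_i, U_i: this processor's block rows of three work
    # matrices. Both trace inner products <W, V> and <W, U> are
    # reduced in a single all-reduce instead of two.
    local = np.array([np.sum(W_i * V_i), np.sum(W_i * U_i)])
    glob = np.empty(2)
    comm.Allreduce(local, glob, op=MPI.SUM)
    return glob[0], glob[1]
```

Each all-reduce costs at least one network latency on every processor, so halving the number of reductions per inner step shortens the critical path of the whole inner iteration, which is precisely the effect exploited here.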
5. Numerical Examples
In order to illustrate the performance of the two-stage iteration (TS iteration) method, several examples were run in Matlab on an Intel dual-core processor (1.00 GHz, 2 GB RAM) and on the parallel machine Lenovo Shen-teng 1800 cluster. All iterations were started from the zero matrix and terminated when the current iterate satisfied the stopping criterion $\|R^{(k)}\| / \|R^{(0)}\| < \varepsilon$, where $R^{(k)} = F - A X^{(k)} - X^{(k)} B$ is the residual of the $k$th iteration.
Here we compare the TS iteration method with the HSS iteration method proposed in [29].
Notation:
T | the computational time in seconds
ITs | the number of iteration steps
p | the total number of processors
S | speedup ratio
E | parallel efficiency
ERR | error
Example 1. Consider the continuous Sylvester Equation (1) with $m = n$ and coefficient matrices built from the identity matrix $I$ and two given tridiagonal matrices $M$ and $N$. The goal in this test is to compare the iteration steps and the computational time of the TS iteration method, the HSS iteration method, and the MCG method for three problem sizes. The numerical results are listed in Table 1, Table 2 and Table 3, respectively. The optimal parameters for the HSS iteration method, proposed in [29], are given in Table 4.
From the above tables, we see that both the iteration steps and the computational time of the TS method are much smaller than those of HSS and MCG in all cases. The comparison between MCG and HSS is less straightforward: in some cases the number of iteration steps of MCG is larger than that of HSS, whereas the computational time mainly depends on the cost of each iteration step.
Example 2. Consider the elliptic partial differential equation with its boundary condition. Two step sizes are used, leading to linear systems of two corresponding sizes. The equation is discretized using the five-point difference scheme and then transformed into a Sylvester equation. The numerical results are shown in Table 5 and Table 6. This numerical experiment was performed on the parallel machine Lenovo Shen-teng 1800 cluster. Here we focus on comparing the parallel performance of the TS iteration method and the MCG method.
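For the Laplacian part of such a problem, the reduction to a Sylvester equation is standard: on a uniform grid with step size $h$, the five-point discretization of $-u_{xx} - u_{yy} = f$ with zero Dirichlet data gives $TX + XT = h^2 F$ with $T = \operatorname{tridiag}(-1, 2, -1)$. The snippet below sketches this model problem; the PDE of this example may carry additional terms, which would modify the coefficient matrices:

```python
import numpy as np
from scipy.linalg import solve_sylvester

n = 63                  # interior grid points per direction
h = 1.0 / (n + 1)       # uniform step size
# Second-difference matrix T = tridiag(-1, 2, -1)
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
F = np.ones((n, n))     # sample right-hand side f(x, y) = 1
# Five-point scheme: (T X + X T) / h^2 = F  <=>  T X + X T = h^2 F
X = solve_sylvester(T, T, h**2 * F)
```

A direct dense solver is used here only for illustration; in the experiments, the resulting Sylvester equation is solved by the TS iteration and MCG methods on the cluster.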
From the results in Table 5 and Table 6, we observe that both the iteration steps and the computational time of TS are much smaller than those of MCG. Furthermore, the parallel efficiency of the TS method is higher than that of MCG. In addition, the advantage of the TS method over the MCG method grows as the scale of the equations increases from the smaller problem size to the larger one.
Example 3. Consider the Sylvester matrix Equation (1) with given indefinite coefficient matrices $A$ and $B$, where $F$ is any given matrix. The numerical results are listed in Table 7.
From Table 7 we observe that the two-stage iteration method is still efficient when the coefficient matrices are indefinite. This indicates that the convergence condition in Theorem 1 is only a sufficient condition.