A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations

: In this paper we propose a parallel two-stage iteration algorithm for solving large-scale continuous Sylvester equations. By splitting the coefﬁcient matrices, the original linear system is transformed into a symmetric linear system which is then solved by using the SYMMLQ algorithm. In order to improve the relative parallel efﬁciency, an adjusting strategy is explored during the iteration calculation of the SYMMLQ algorithm to decrease the degree of the reduce-operator from two to one communications at each step. Moreover, the convergence of the iteration scheme is discussed, and ﬁnally numerical results are reported showing that the proposed method is an efﬁcient and robust algorithm for this class of continuous Sylvester equations on a parallel machine.


Introduction
The solution of the continuous Sylvester equation with large sparse matrices A ∈ R n×n , B ∈ R n×n , X ∈ R n×n , F ∈ R n×n and with A, B positive definite is a common task in numerical linear algebra.It arises in many scientific computing and engineering applications, such as control theory [1,2], neural networks, model reduction [3], image processing [4], and so on.Therefore, the problem has remained an active area of research.In this context, recent methodological advances have been thoroughly discussed in many papers [5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20].Iterative methods for solving linear or nonlinear equations have seen constant improvement in recent years to reduce the computational time; for example, two multi-step derivative-free iterative methods [5]: block Jacobi two stage method [6] and SYMMLQ algorithm [7][8][9].In addition, widely-used direct methods are, for instance, the Bartels-Stewart [10] and the Hessenberg-Schur method [11].The main idea is to transform A and B into triangular or Hessenberg form [21] by an orthogonal similarity transformation and then to solve the resulting system of linear equations directly by a back-substitution process.However, this method is not applicable in large-scale problems due to the prohibitive computational issue.
In order to overcome this limitation, fast iterative methods have been developed such as the Smith method [12], the alternating direction implicit (ADI) method [13], gradient-based methods [14,15], and the Krylov subspace-based algorithm [7,16,17].At present, the conjugate gradient (CG) method [7] and the preconditioned conjugate gradient method [18] are popularly used with the advantages of small storage and suitability for parallel computing.Typically, the SYMMLQ algorithm [7][8][9] is quite efficient in the case of symmetric coefficient matrices, as it has tremendous advantages in small storage capacity and stable computations.However, it is not a good option for multi-computer systems due to the high cost of global communication.For asymmetric coefficient matrices, a modified conjugate gradient method (MCG) is useful.However, its convergence speed is slow [22,23].
Another type of iteration based on splitting methods allows us to better utilize standard methodologies.For instance, Bai et al. [24] proposed Hermitian and skew-Hermitian splitting (HSS) iteration methods for solving systems of linear equations with non-Hermitian positive definite form.This has been studied widely and generalized in [25][26][27][28].Recently, a Hermitian and skew-Hermitian splitting (HSS) iteration method for solving large sparse continuous Sylvester equations with non-Hermitian and positive definite/semidefinite matrices was discussed in [29].Wang et al. [30] presented a positive-definite and skew-Hermitian splitting (PSS) iteration method, and in [31] Zhou et al. applied the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method to solve the continuous Sylvester equation.Zheng and Ma in [32] applied the idea of the normal and skew-Hermitian splitting (NSS) iteration method to continuous Sylvester equations.
However, these iteration methods have the common difficulty that there is no accurate formula to determine the positive value of the corresponding parameter in the iteration scheme.In many articles, a large amount of work has been done to address this issue.Unfortunately this estimation methodology is still not fully resolved in practical applications.In addition, their implementations need to solve two continuous Sylvester equations, which results in great additional computational cost.
All of these brought about the need for the development and validation of an efficient parallel algorithm.In this paper we have proposed a parallel algorithm of two-stage iteration for solving large-scale continuous Sylvester equations with the combination of the HSS iteration method and the SYMMLQ algorithm.The main idea is to split the coefficient matrices into a symmetric and an anti-symmetric matrix, respectively.Then, the original equations are transferred into symmetric matrix equations which are solved by the SYMMLQ algorithm.Furthermore, we focus on the improvement of the parallel efficiency of the SYMMLQ algorithm by adjusting the calculation steps.
The remainder of this paper is organized as follows.In Section 2, a description of the two-stage iteration method is presented based on a splitting method and the SYMMLQ algorithm for solving the continuous Sylvester Equation (1).Then the parallel implementation of the algorithm is given in Section 3 .Its convergence analysis and numerical examples are mentioned in Sections 4 and 5, respectively.We end with conclusions.
Notation in this paper: A T denotes the transpose of matrix A; inner product using [A, B] = tr(A T B); matrix norm of A induced by A = [A, A] = tr(A T A) and ρ(A) is the spectral radius of the matrix A. For the matrix

Description of the Two-Stage Iteration Method
The two-stage iteration method consisting of the outer and inner iterations is discussed in this section.The outer iteration is performed by splitting the coefficient matrices, while the inner iteration is computed via the SYMMLQ algorithm.

Outer Iteration Scheme
We can split A and B into symmetric and antisymmetric parts: where M 1 and M 2 are symmetric parts and N 1 and N 2 are antisymmetric parts.They can be rewritten as More details can be found in [24,28,29,31].Then, the continuous Sylvester Equation (1) can be written as the following matrix equation: Given an initial guess X (0) ∈ R n×n , because the initial guess has an effect on the convergence speed of the algorithm, it has little influence on the calculation results.For convenience, the initial guess is taken as X (0) = O in the numerical examples.For k = 0, 1, 2, • • • , we use the following iteration scheme until {X (k) } converges: Let Then, the outer iteration can be expressed as We need to solve Equation ( 5) at each step of the iteration method so as to form the two-stage iteration method.Equation ( 5) is computed by the SYMMLQ algorithm, since the new coefficient matrices are symmetric.

Inner Iteration Scheme Based on the SYMMLQ Algorithm
Equation ( 5) is changed into the following linear system by using the vec operator where H = A ⊗ I + I ⊗ B T , the vectors x (k+1) and f contain the concatenated columns of the matrices X (k+1) and F (k) , respectively, with ⊗ being the Kronecker product symbol.Then, the SYMMLQ algorithm is proposed to solve the kth step iteration equation of the outer iteration scheme.For more details, we can refer to [8,9].The description of the corresponding SYMMLQ algorithm is given roughly in the following manner.
Let y = x (k+1) and b = f (k) .Then, Equation ( 6) can be written equivalently as Then, transform H into an i × i symmetric tridiagonal matrix T i by the Lanczos orthogonalization process.The coefficient matrix H is potentially unstable, and consequently T i is not positive definite.So, the LQ decomposition ( Triangular Orthogonal decomposition, where L is a lower triangular matrix, Q is an orthogonal matrix ) is used to transform T i into an i × i lower triangular matrix L i , seen in the following flow chart: The SYMMLQ algorithm: where T i = L i P i , and According to the above observation, we can establish the following stationary fixed-point iteration form for Equation (7): where Q = (q 1 , q 2 , • • • , q i ) is an n × i matrix and q 1 , q 2 , . . ., q i are orthonormal vectors which are computed by the Lanczos orthogonalization process.Here z i satisfies the following equation: where r (0) = b − Hy (0) , y (0) is a given initial vector, and e 1 = (1, 0, • • • , 0) T is an i-dimensional unit vector.More details can be found in the literature [33].

Parallel Implementation of the Two-Stage Iteration Method
In this section we discuss the parallel implementation including data storage, and implementation of outer iteration and inner iteration.

Data Storage
For convenience, let p be the number of processors, p i (i = 1, 2, . . ., p) represent ith processor, and l is an integer in n = pl. Mark Note: Due to the storage method, we chose the way of matrix multiplication with block row-row matrices in parallel computing process.Detailed descriptions of parallel computing Matrix multiplication are found in References [5,23,34].

Parallel Implementation of Outer Iteration Method
(1) Splitting process: Processor (2) Cycle processor: Step 3. Use the improved parallel process of SYMMLQ algorithm to solve the new equation This step, which improves the parallel efficiency and reduces the parallel time by reducing the frequency of communication, plays an important role in the whole parallel implementation of the two-stage iteration method, and the detailed implementation can be seen in Sections 3.3 and 3.4.

Improved Parallel Implementation of the SYMMLQ Algorithm
Clearly, when computing a k , b k in each step of the inner iteration, all processors need to apply the reduce operator twice in the parallel implementation of the SYMMLQ algorithm in Section 3.3.Therefore, we should rearrange Step 2 in the cycle process, while the remaining steps remain the same.The detailed parallel process of the algorithm can be expressed as follows. Processor In this way, computing a k , e k only needs to all-reduce once, so as to reduce the frequency of communication and thus reduce the parallel time.Eventually, we obtain an improved parallel implementation of the SYMMLQ algorithm.

Convergence Analysis
The convergence property above two-stage iteration method is mentioned here.
Lemma 1 (see [35]).Let H be a positive definite matrix, and H = M − N be a splitting, with M = (H + H T )/2, and N = (H T − H)/2.Then, ρ(M −1 N) < 1 if for all y ∈ C n , it holds that y H My > |y H Ny|.
Based on the above lemma, we obtain the following conclusion.Theorem 1. Assume the positive definite matrix H ∈ R n×n is split according to Lemma 1.If for all y ∈ C n , we have that |Re(y H Hy)| > |Im(y H Hy)|, then ρ(M −1 N) < 1 holds.
Proof of Theorem 1.For all y ∈ C n , we can get and y H Ny = (−y H Hy + y H H H y)/2 = Im(y H Hy).
According to the Lemma 1, we can obtain |Re(y The above theorem is generalized to the continuous Sylvester equation.The convergence theorem of the matrix equation can be obtained as follows: Theorem 2. Let A, B ∈ R n×n be positive definite matrices in the Sylvester Equation ( 1) and suppose that they are split as in the two-stage iterative format (5).If for all Y ∈ C n×n , we have then Proof of Theorem 2. By using the Kronecker product, we can transform Equation (1) into the matrix-vector form: where x = vec(X), f = vec(F).When the coefficient matrices A and B are positive definite matrices, we split them as in Section 2.1.Then, Equation (3) can be rewritten equivalently as According to Theorem 1, we yield |Re(y On the other hand, it holds that

Numerical Examples
In order to illustrate the performance of the two-stage iteration (TS iteration) method, some examples were performed in Matlab on an Intel dual core processor (1.00 GHz, 2 GB RAM) and the parallel machine Lenovo Shen-teng 1800 cluster.All iterations were started from a zero matrix and terminated when the current iterate satisfied R (k) < 10 −6 , where R (k) = F − (AX (k) + X (k) B) is the residual of the kth iteration.
Here we compare the TS iteration method with the HSS iteration method proposed in [29].

Notation:
T the computational time in seconds ITs the number of iteration steps p the total number of processors S speedup ratio Example 1.Consider the continuous Sylvester Equation ( 1) with m = n and the matrices where I is the identity matrix, and M, N ∈ R n×n are the tridiagonal matrices given by M = tridiag(−1, −2, −1) and N = tridiag(0.5,0, −0.5).
The goal in this test is to compare the iteration steps and the computational time by using TS iteration method, HSS iteration method, and MCG method with r = 0.01, r = 0.1, and r = 1.0.The numerical results are listed in Tables 1-3, respectively.The optimal parameters α exp (with β exp = α exp ) for the HSS iteration method are given in Table 4 proposed in [29].From the above tables, we see that both iteration steps and computational time by the TS method are much less than those by the HSS and MCG in all cases.The comparison between MCG and HSS is not straight-forward.Furthermore, for the above tables we observe that in some cases the number of iteration steps of MCG is larger than that of HSS, while on the contrary, for the computational time it mainly depends on the computational time of each iteration step.Let the step size be h = 1/101 and h = 1/1001.That means that the size of the linear system is 100 × 100 and 1000 × 1000, respectively.The equation is discretized by using five-point difference format, and then is transformed into a Sylvester equation.The numerical results are shown in Tables 5 and 6.This numerical experiment is performed on the parallel machine Lenovo Shen-teng 1800 cluster.Here we focus on comparing the parallel performance with the TS iteration method and the MCG method.From the results Tables 5 and 6, we observe that both iteration steps and computational time using TS are much less than those with the MCG.Furthermore, parallel efficiency with the TS method is higher than with the MCG.In addition, the advantages of the TS method increase over the MCG method with increasing scale-size of equations from 10 4 (n = 100) to 10 6 (n = 1000).sin(i + j), i − j = 1 In the Sylvester matrix equation AX + XB = F, let n = 1000 and F is an any given matrix.The numerical results are listed in Table 7. From Table 7 we observe that the two-stage iteration method is still efficient in the case that the coefficient matrices are indefinite matrices.This indicates that the condition for the convergence is only a sufficient condition in Theorem 1.

Conclusions
In this paper we have proposed a two-stage iteration parallel method for solving the continuous Sylvester equations.The outer iteration scheme is based on the coefficient matrices' splitting.Furthermore, an inner iteration scheme is obtained using the SYMMLQ algorithm.Its parallel implementation and its improved strategy have been explained in detail.Moreover, we have proved the convergence of the proposed iteration method under certain conditions.Numerical results show that the new proposed algorithm is better than both MCG and HSS methods with regard to computational storage, computational time, and iteration steps.Crucially, these advantages become more significant for large-scale systems of continuous Sylvester equations.

Table 1 .
Comparison of the the number of iteration steps (ITs) and the computational time in seconds (T) when choosing r = 0.01.HSS: Hermitian and skew-Hermitian splitting; MCG: modified conjugate gradient method; TS: two-stage iteration.

Table 2 .
Comparison of ITs and T when choosing r = 0.1.

Table 3 .
Comparison of ITs and T when choosing r = 1.0.

Table 4 .
The optimal values α exp for HSS.