A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations

Xiao, Manyu; Lv, Quanyi; Xing, Zhuo; Zhang, Yingchun

doi:10.3390/a10030095

Open AccessArticle

A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations

by

Manyu Xiao

^1,*,

Quanyi Lv

¹,

Zhuo Xing

¹ and

Yingchun Zhang

²

¹

Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an 710072, China

²

Department of Mechanical Engineering, Northwestern Polytechnical University, Xi’an 710072, China

^*

Author to whom correspondence should be addressed.

Algorithms 2017, 10(3), 95; https://doi.org/10.3390/a10030095

Submission received: 8 July 2017 / Revised: 8 August 2017 / Accepted: 10 August 2017 / Published: 21 August 2017

Download Versions Notes

Abstract

:

In this paper we propose a parallel two-stage iteration algorithm for solving large-scale continuous Sylvester equations. By splitting the coefficient matrices, the original linear system is transformed into a symmetric linear system which is then solved by using the SYMMLQ algorithm. In order to improve the relative parallel efficiency, an adjusting strategy is explored during the iteration calculation of the SYMMLQ algorithm to decrease the degree of the reduce-operator from two to one communications at each step. Moreover, the convergence of the iteration scheme is discussed, and finally numerical results are reported showing that the proposed method is an efficient and robust algorithm for this class of continuous Sylvester equations on a parallel machine.

Keywords:

continuous Sylvester equations; SYMMLQ algorithm; two-stage iteration; parallel computing

1. Introduction

The solution of the continuous Sylvester equation

A X + X B = F

(1)

with large sparse matrices

A \in R^{n \times n}

,

B \in R^{n \times n}

,

X \in R^{n \times n}

,

F \in R^{n \times n}

and with A, B positive definite is a common task in numerical linear algebra. It arises in many scientific computing and engineering applications, such as control theory [1,2], neural networks, model reduction [3], image processing [4], and so on. Therefore, the problem has remained an active area of research. In this context, recent methodological advances have been thoroughly discussed in many papers [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]. Iterative methods for solving linear or nonlinear equations have seen constant improvement in recent years to reduce the computational time; for example, two multi-step derivative-free iterative methods [5]: block Jacobi two stage method [6] and SYMMLQ algorithm [7,8,9]. In addition, widely-used direct methods are, for instance, the Bartels–Stewart [10] and the Hessenberg–Schur method [11]. The main idea is to transform A and B into triangular or Hessenberg form [21] by an orthogonal similarity transformation and then to solve the resulting system of linear equations directly by a back-substitution process. However, this method is not applicable in large-scale problems due to the prohibitive computational issue. In order to overcome this limitation, fast iterative methods have been developed such as the Smith method [12], the alternating direction implicit (ADI) method [13], gradient-based methods [14,15], and the Krylov subspace-based algorithm [7,16,17]. At present, the conjugate gradient (CG) method [7] and the preconditioned conjugate gradient method [18] are popularly used with the advantages of small storage and suitability for parallel computing. Typically, the SYMMLQ algorithm [7,8,9] is quite efficient in the case of symmetric coefficient matrices, as it has tremendous advantages in small storage capacity and stable computations. However, it is not a good option for multi-computer systems due to the high cost of global communication. For asymmetric coefficient matrices, a modified conjugate gradient method (MCG) is useful. However, its convergence speed is slow [22,23].

Another type of iteration based on splitting methods allows us to better utilize standard methodologies. For instance, Bai et al. [24] proposed Hermitian and skew-Hermitian splitting (HSS) iteration methods for solving systems of linear equations with non-Hermitian positive definite form. This has been studied widely and generalized in [25,26,27,28]. Recently, a Hermitian and skew-Hermitian splitting (HSS) iteration method for solving large sparse continuous Sylvester equations with non-Hermitian and positive definite/semidefinite matrices was discussed in [29]. Wang et al. [30] presented a positive-definite and skew-Hermitian splitting (PSS) iteration method, and in [31] Zhou et al. applied the modified Hermitian and skew-Hermitian splitting (MHSS) iteration method to solve the continuous Sylvester equation. Zheng and Ma in [32] applied the idea of the normal and skew-Hermitian splitting (NSS) iteration method to continuous Sylvester equations.

However, these iteration methods have the common difficulty that there is no accurate formula to determine the positive value of the corresponding parameter in the iteration scheme. In many articles, a large amount of work has been done to address this issue. Unfortunately this estimation methodology is still not fully resolved in practical applications. In addition, their implementations need to solve two continuous Sylvester equations, which results in great additional computational cost.

All of these brought about the need for the development and validation of an efficient parallel algorithm. In this paper we have proposed a parallel algorithm of two-stage iteration for solving large-scale continuous Sylvester equations with the combination of the HSS iteration method and the SYMMLQ algorithm. The main idea is to split the coefficient matrices into a symmetric and an anti-symmetric matrix, respectively. Then, the original equations are transferred into symmetric matrix equations which are solved by the SYMMLQ algorithm. Furthermore, we focus on the improvement of the parallel efficiency of the SYMMLQ algorithm by adjusting the calculation steps.

The remainder of this paper is organized as follows. In Section 2, a description of the two-stage iteration method is presented based on a splitting method and the SYMMLQ algorithm for solving the continuous Sylvester Equation (1). Then the parallel implementation of the algorithm is given in Section 3 . Its convergence analysis and numerical examples are mentioned in Section 4 and Section 5, respectively. We end with conclusions.

Notation in this paper:

A^{T}

denotes the transpose of matrix A; inner product using

[A, B] = t r (A^{T} B)

; matrix norm of A induced by

∥ A ∥ = \sqrt{[A, A]} = \sqrt{t r (A^{T} A)}

and

ρ (A)

is the spectral radius of the matrix A. For the matrix

X = {(x_{1}, x_{2}, \dots, x_{n})}^{T} \in R^{n \times n}

,

\bar{v e c} (X)

denotes the

\bar{v e c}

operator defined as

\bar{v e c} (X) = {(x_{1}^{T}, x_{2}^{T}, \dots, x_{n}^{T})}^{T} \in R^{n^{2}}

.

2. Description of the Two-Stage Iteration Method

The two-stage iteration method consisting of the outer and inner iterations is discussed in this section. The outer iteration is performed by splitting the coefficient matrices, while the inner iteration is computed via the SYMMLQ algorithm.

2.1. Outer Iteration Scheme

We can split A and B into symmetric and antisymmetric parts:

A = M_{1} - N_{1}, B = M_{2} - N_{2}

(2)

where

M_{1}

and

M_{2}

are symmetric parts and

N_{1}

and

N_{2}

are antisymmetric parts. They can be rewritten as

\begin{matrix} M_{1} = \frac{1}{2} (A + A^{T}), N_{1} = \frac{1}{2} (A^{T} - A), \\ M_{2} = \frac{1}{2} (B + B^{T}), N_{2} = \frac{1}{2} (B^{T} - B) . \end{matrix}

More details can be found in [24,28,29,31].

Then, the continuous Sylvester Equation (1) can be written as the following matrix equation:

M_{1} X + X M_{2} = N_{1} X + X N_{2} + F .

(3)

Given an initial guess

X^{(0)} \in R^{n \times n}

, because the initial guess has an effect on the convergence speed of the algorithm, it has little influence on the calculation results. For convenience, the initial guess is taken as

X^{(0)} = O

in the numerical examples. For

k = 0, 1, 2, \dots

, we use the following iteration scheme until

{X^{(k)}}

converges:

M_{1} X^{(k + 1)} + X^{(k + 1)} M_{2} = N_{1} X^{(k)} + X^{(k)} N_{2} + F .

(4)

Let

{\tilde{F}}^{(k)} = N_{1} X^{(k)} + X^{(k)} N_{2} + F

. Then, the outer iteration can be expressed as

M_{1} X^{(k + 1)} + X^{(k + 1)} M_{2} = {\tilde{F}}^{(k)} .

(5)

We need to solve Equation (5) at each step of the iteration method so as to form the two-stage iteration method. Equation (5) is computed by the SYMMLQ algorithm, since the new coefficient matrices are symmetric.

2.2. Inner Iteration Scheme Based on the SYMMLQ Algorithm

Equation (5) is changed into the following linear system by using the

\bar{v e c}

operator

H x^{(k + 1)} = {\tilde{f}}^{(k)},

(6)

where

H = A \otimes I + I \otimes B^{T}

, the vectors

x^{(k + 1)}

and

\tilde{f}

contain the concatenated columns of the matrices

X^{(k + 1)}

and

{\tilde{F}}^{(k)}

, respectively, with ⊗ being the Kronecker product symbol.

Then, the SYMMLQ algorithm is proposed to solve the kth step iteration equation of the outer iteration scheme. For more details, we can refer to [8,9]. The description of the corresponding SYMMLQ algorithm is given roughly in the following manner.

Let

y = x^{(k + 1)}

and

b = {\tilde{f}}^{(k)}

. Then, Equation (6) can be written equivalently as

H y = b .

(7)

Then, transform H into an

i \times i

symmetric tridiagonal matrix

T_{i}

by the Lanczos orthogonalization process. The coefficient matrix H is potentially unstable, and consequently

T_{i}

is not positive definite. So, the LQ decomposition ( Triangular Orthogonal decomposition, where L is a lower triangular matrix, Q is an orthogonal matrix ) is used to transform

T_{i}

into an

i \times i

lower triangular matrix

L_{i}

, seen in the following flow chart: The SYMMLQ algorithm:

H \to_{o r t h o g o n a l i z a t i o n}^{L a n c z o s} T_{i} \to_{d e c o m p o s i t i o n}^{L Q} L_{i} P_{i} .

(8)

where

T_{i} = L_{i} P_{i}

, and

\begin{matrix} T_{i} = (\begin{matrix} a_{1} & b_{1} \\ b_{1} & a_{2} & b_{2} \\ b_{2} & a_{3} & ⋱ \\ ⋱ & ⋱ & b_{i - 1} \\ b_{i - 1} & a_{i} \end{matrix}), & L_{i} = (\begin{matrix} r_{1} \\ δ_{2} & r_{2} \\ ε_{3} & δ_{3} & r_{3} \\ ⋱ & ⋱ & ⋱ \\ ε_{i} & δ_{i} & r_{i} \end{matrix}), \end{matrix}

P_{i} = P_{i - 1} (\begin{matrix} 1 \\ 1 \\ ⋱ \\ c_{i - 1} & s_{i - 1} \\ - s_{i - 1} & c_{i - 1} \end{matrix}) .

According to the above observation, we can establish the following stationary fixed-point iteration form for Equation (7):

y^{(i + 1)} = y^{(i)} + Q_{i} z_{i}

(9)

where

Q = (q_{1}, q_{2}, \dots, q_{i})

is an

n \times i

matrix and

q_{1}, q_{2}, \dots, q_{i}

are orthonormal vectors which are computed by the Lanczos orthogonalization process. Here

z_{i}

satisfies the following equation:

T_{i} z_{i} = {∥ r^{(0)} ∥}_{2} e_{1},

(10)

where

r^{(0)} = b - H y^{(0)}

,

y^{(0)}

is a given initial vector, and

e_{1} = {(1, 0, \dots, 0)}^{T}

is an i-dimensional unit vector. More details can be found in the literature [33].

3. Parallel Implementation of the Two-Stage Iteration Method

In this section we discuss the parallel implementation including data storage, and implementation of outer iteration and inner iteration.

3.1. Data Storage

For convenience, let p be the number of processors,

p_{i} (i = 1, 2, \dots, p)

represent ith processor, and l is an integer in

n = p l

.

Mark

\begin{matrix} A = {(A_{1}^{T}, A_{2}^{T}, \dots, A_{p}^{T})}^{T}, B = {(B_{1}^{T}, B_{2}^{T}, \dots, B_{p}^{T})}^{T}, F = {(F_{1}^{T}, F_{2}^{T}, \dots, F_{p}^{T})}^{T}, \\ X^{(0)} = {({(X_{1}^{(0)})}^{T}, {(X_{2}^{(0)})}^{T}, \dots, {(X_{p}^{(0)})}^{T})}^{T}, A^{T} = {({\tilde{A}}_{1}^{T}, {\tilde{A}}_{2}^{T}, \dots, {\tilde{A}}_{p}^{T})}^{T}, \\ B^{T} = {({\tilde{B}}_{1}^{T}, {\tilde{B}}_{2}^{T}, \dots, {\tilde{B}}_{p}^{T})}^{T}, \tilde{F} = {({\tilde{F}}_{1}^{T}, {\tilde{F}}_{2}^{T}, \dots, {\tilde{F}}_{p}^{T})}^{T}, \\ M_{1} = {({(M_{1, 1})}^{T}, {(M_{1, 2})}^{T}, \dots, {(M_{1, p})}^{T})}^{T}, M_{2} = {({(M_{2, 1})}^{T}, {(M_{2, 2})}^{T}, \dots, {(M_{2, p})}^{T})}^{T}, \\ N_{1} = {({(N_{1, 1})}^{T}, {(N_{1, 2})}^{T}, \dots, {(N_{1, p})}^{T})}^{T}, N_{2} = {({(N_{2, 1})}^{T}, {(N_{2, 2})}^{T}, \dots, {(N_{2, p})}^{T})}^{T} \end{matrix}

where

A_{i}

,

B_{i}

,

F_{i}

,

X_{i}^{(0)}

,

{\tilde{A}}_{i}

,

{\tilde{B}}_{i}

,

{\tilde{F}}_{i}

,

M_{1, i}

,

M_{2, i}

,

N_{1, i}

,

N_{2, i}

(i = 1, 2, \dots, p)

are

l \times n

sub-block matrices. These are saved in row storage. Then, store

A_{i}

,

B_{i}

,

F_{i}

,

X_{i}^{(0)}

,

{\tilde{A}}_{i}

,

{\tilde{B}}_{i}

on the processor

p_{i} (i = 1, 2, \dots, p)

.

Note:

Due to the storage method, we chose the way of matrix multiplication with block row–row matrices in parallel computing process. Detailed descriptions of parallel computing Matrix multiplication are found in References [5,23,34].

3.2. Parallel Implementation of Outer Iteration Method

(1) Splitting process: Processor

p_{i} (i = 1, 2, \dots, p)

computes

M_{1, i} = (A_{i} + {\tilde{A}}_{i}) / 2, N_{1, i} = ({\tilde{A}}_{i} - A_{i}) / 2, M_{2, i} = (B_{i} + {\tilde{B}}_{i}) / 2, N_{2, i} = ({\tilde{B}}_{i} - B_{i}) / 2 .

(2) Cycle processor:

Step 1. Processor

p_{i} (i = 1, 2, \dots, p)

computes

Δ X_{i}^{(k)} = X_{i}^{(k)} - X_{i}^{(k - 1)} and [Δ X_{i}^{(k)}, Δ X_{i}^{(k)}],

get

[Δ X_{i}^{(k)}, Δ X_{i}^{(k)}]

after all-reduce. If

∥ Δ X^{(k)} ∥ < ε

stop; otherwise, turn to step 2.

Step 2. Compute

{\tilde{F}}_{i} = N_{1, i} X^{(k)} + X_{i}^{(k)} N_{2} + F_{i}

in each processor.

Step 3. Use the improved parallel process of SYMMLQ algorithm to solve the new equation

M_{1} X^{(k + 1)} + X^{(k + 1)} M_{2} = \tilde{F} .

(11)

This step, which improves the parallel efficiency and reduces the parallel time by reducing the frequency of communication, plays an important role in the whole parallel implementation of the two-stage iteration method, and the detailed implementation can be seen in Section 3.3 and Section 3.4.

Step 4. Let

k : = k + 1

, turn to Step 1.

3.3. Parallel Implementation of Inner Iteration Scheme

(1) Compute process:

① Processor

p_{i} (i = 1, 2, \dots, p)

computes

R_{i}^{(0)} = F_{i} - (A_{i} X^{(0)} + X_{i}^{(i)} B) and [R_{i}^{(0)}, R_{i}^{(0)}],

obtain

[R_{i}^{(0)}, R_{i}^{(0)}]

after all-reduce, then compute

Q_{i}^{(1)} = R_{i}^{(0)} / ∥ R^{(0)} ∥ .

② Compute

G_{i}^{(1)} = A_{i} Q^{(1)} + Q_{i}^{(1)} B and [G_{i}^{(1)}, Q_{i}^{(1)}],

obtain

a_{1} = [G_{i}^{(1)}, Q_{i}^{(1)}]

after all-reduce. Compute

H_{i}^{(1)} = G_{i}^{(1)} - a_{1} Q_{i}^{(1)} and [H_{i}^{(1)}, H_{i}^{(1)}],

get

[H^{(1)}, H^{(1)}]

after all-reduce and

b_{1} = \sqrt{[H^{(1)}, H^{(1)}}

, then compute

{\tilde{r}}_{1}

,

{\tilde{ξ}}_{1}

,

{\tilde{W}}_{i}^{(1)}

and

X_{i}^{(1)} = X_{i}^{(0)} + {\tilde{ξ}}_{1} {\tilde{W}}_{i}^{(1)}

in each processor.

③ Processor

p_{i} (i = 1, 2, \dots, p)

computes

Q_{i}^{(2)} = H_{i}^{(1)} / b_{1} and G_{i}^{(2)} = A_{i} Q^{(2)} + Q_{i}^{(2)} B,

and computes the inner product

[G_{i}^{(2)}, Q_{i}^{(2)}]

, get

a_{2} = [G^{(2)}, Q^{(2)}]

after all-reduce. Compute

H_{i}^{(2)} = G_{i}^{(2)} - a_{2} Q_{i}^{(2)} - b_{1} Q_{i}^{(1)} and [H_{i}^{(2)}, H_{i}^{(2)}],

get

[H^{(2)}, H^{(2)}]

after all-reduce, compute

b_{2} = \sqrt{[H^{(2)}, H^{(2)}]}

.

④ Processor

p_{i} (i = 1, 2, \dots, p)

computes

\begin{matrix} c_{1} = \frac{a_{1}}{{(a_{1}^{2} + b_{1}^{2})}^{\frac{1}{2}}}, s_{1} = \frac{b_{1}}{{(a_{1}^{2} + b_{1}^{2})}^{\frac{1}{2}}}, δ_{2} = b_{1} c_{1} + a_{2} s_{1}, ε_{3} = b_{2} s_{1}, \\ r_{1} = {(a_{1}^{2} + b_{1}^{2})}^{\frac{1}{2}}, ξ_{1} = ∥ r_{0} ∥ / r_{1}, {\tilde{r}}_{2} = b_{1} s_{1} - a_{2} c_{1}, r_{2} = {({\tilde{r}}_{2}^{2} + b_{2}^{2})}^{\frac{1}{2}}, {\tilde{δ}}_{3} = - b_{2} c_{1}, \end{matrix}

if

{\tilde{r}}_{2} < 10^{- 15}

stop, otherwise compute

{\tilde{ξ}}_{2} = - δ_{2} ξ_{1} / {\tilde{r}}_{2}

.

⑤ Processor

p_{i} (i = 1, 2, \dots, p)

computes

\begin{matrix} ξ_{2} = - \frac{δ_{2} ξ_{1}}{r_{2}}, W_{i}^{(1)} = c_{1} {\tilde{W}}_{i}^{(1)} + s_{1} Q_{i}^{(2)}, {\tilde{W}}_{i}^{(2)} = s_{1} {\tilde{W}}_{i}^{(1)} - c_{1} Q_{i}^{(2)}, \\ Y_{i}^{(1)} = X_{i}^{(0)} + ξ_{1} W_{i}^{(1)}, X_{i}^{(2)} = Y_{i}^{(1)} + {\tilde{ξ}}_{2} {\tilde{W}}_{i}^{(2)} . \end{matrix}

(2) Cycle process:

Step 1. Processor

p_{i} (i = 1, 2, \dots, p)

computes

∥ R^{(k - 1)} ∥ = | b_{k - 1} (s_{k - 2} ξ_{k - 2} - c_{k - 2} {\tilde{ξ}}_{k - 1}) |,

if

∥ R^{(k - 1)} ∥ < ε

stop, otherwise, turn to Step 2.

Step 2. Processor

p_{i} (i = 1, 2, \dots, p)

computes

Q_{i}^{(k)} = H_{i}^{(k - 1)} / b_{k - 1} and G_{i}^{(k)} = A_{i} Q^{(k)} + Q_{i}^{(k)} B,

then compute

[G_{i}^{(k)}, Q_{i}^{(k)}]

, obtain

a_{k} = [G^{(k)}, Q^{(k)}]

, after all-reduce. Compute

H_{i}^{(k)} = G_{i}^{(k)} - a_{k} Q_{i}^{(k)} - b_{k - 1} Q_{i}^{(k - 1)} and [H_{i}^{(k)}, H_{i}^{(k)}],

Obtain

[H^{(k)}, H^{(k)}]

after all-reduce, compute

b_{k} = \sqrt{[H^{(k)}, H^{(k)}]}

.

Step 3. Processor

p_{i} (i = 1, 2, \dots, p)

computes

\begin{matrix} c_{k - 1} = \frac{{\tilde{r}}_{k - 1}}{{({\tilde{r}}_{k - 1}^{2} + b_{k - 1}^{2})}^{\frac{1}{2}}}, s_{k - 1} = \frac{b_{k - 1}}{{({\tilde{r}}_{k - 1}^{2} + b_{k - 1}^{2})}^{\frac{1}{2}}}, ε_{k + 1} = b_{k} s_{k - 1}, \\ δ_{k} = {\tilde{δ}}_{k} c_{k - 1} + a_{k} s_{k - 1}, {\tilde{r}}_{k} = {\tilde{δ}}_{k} s_{k - 1} - a_{k} c_{k - 1}, r_{k} = {({\tilde{r}}_{k}^{2} + b_{k}^{2})}^{\frac{1}{2}}, {\tilde{δ}}_{k + 1} = - b_{k} c_{k - 1}, \end{matrix}

if

{\tilde{r}}_{k} < 10^{- 15}

stop; otherwise, compute

{\tilde{ξ}}_{k} = (ε_{k} ξ_{k - 2} + δ_{k} ξ_{k - 1}) / {\tilde{r}}_{k}

.

Step 4. Processor

p_{i} (i = 1, 2, \dots, p)

computes

\begin{matrix} ξ_{k} = - (ε_{k} ξ_{k - 2} + δ_{k} ξ_{k - 1}) / r_{k}, W_{i}^{k - 1} = c_{k - 1} {\tilde{W}}_{i}^{(k - 1)} + s_{k - 1} Q_{i}^{(k)}, \\ {\tilde{W}}_{i}^{(k)} = s_{k - 1} {\tilde{W}}_{i}^{k - 1} - c_{k - 1} Q_{i} (k), Y_{i}^{(k - 1)} = Y_{i}^{(k - 2)} + ξ_{k - 1} W_{i}^{(k - 1)}, \\ X_{i}^{(k)} = Y_{i}^{(k - 1)} + {\tilde{ξ}}_{k} {\tilde{W}}_{i}^{(k)} . \end{matrix}

Step 5 Let

k : = k + 1

, turn to Step 1.

3.4. Improved Parallel Implementation of the SYMMLQ Algorithm

Clearly, when computing

a_{k}

,

b_{k}

in each step of the inner iteration, all processors need to apply the reduce operator twice in the parallel implementation of the SYMMLQ algorithm in Section 3.3. Therefore, we should rearrange Step 2 in the cycle process, while the remaining steps remain the same. The detailed parallel process of the algorithm can be expressed as follows.

Processor

p_{i} (i = 1, 2, \dots, p)

computes

\begin{matrix} Q_{i}^{(k)} = (H_{i}^{(k - 1)} - a_{k - 1} Q_{i}^{(k - 1)}) / b_{k - 1}, G_{i}^{(k)} = A_{i} Q^{(k)} + Q_{i}^{(k)} B, \\ H_{i}^{(k)} = G_{i}^{(k)} - b_{k - 1} Q_{i}^{(k - 1)} and L_{i}^{(k)} = H_{i}^{(k)} - a_{k - 1} Q_{i}^{(k)}, \end{matrix}

then compute

[G_{i}^{(k)}, Q_{i}^{(k)}]

and

[L_{i}^{(k)}, L_{i}^{(k)}]

, get

a_{k} = [G^{(k)}, Q^{(k)}]

and

e_{k} = [L^{(k)}, L^{(k)}]

after one all-reduce, compute

b_{k} = \sqrt{e_{k} - {(a_{k} - a_{k - 1})}^{2}}

.

In this way, computing

a_{k}

,

e_{k}

only needs to all-reduce once, so as to reduce the frequency of communication and thus reduce the parallel time. Eventually, we obtain an improved parallel implementation of the SYMMLQ algorithm.

4. Convergence Analysis

The convergence property above two-stage iteration method is mentioned here.

Lemma 1

(see [35]). Let H be a positive definite matrix, and

H = M - N

be a splitting, with

M = (H + H^{T}) / 2

, and

N = (H^{T} - H) / 2

. Then,

ρ (M^{- 1} N) < 1

if for all

y \in C^{n}

, it holds that

y^{H} M y > | y^{H} N y |

.

Based on the above lemma, we obtain the following conclusion.

Theorem 1.

Assume the positive definite matrix

H \in R^{n \times n}

is split according to Lemma 1. If for all

y \in C^{n}

, we have that

| R e (y^{H} H y) | > | I m (y^{H} H y) |

, then

ρ (M^{- 1} N) < 1

holds.

Proof of Theorem 1.

For all

y \in C^{n}

, we can get

y^{H} M y = (y^{H} H y + y^{H} H^{H} y) / 2 = R e (y^{H} H y)

(12)

and

y^{H} N y = (- y^{H} H y + y^{H} H^{H} y) / 2 = I m (y^{H} H y) .

(13)

According to the Lemma 1, we can obtain

| R e (y^{H} H y) | > | I m (y^{H} H y) |

. Therefore

ρ (M^{- 1} N) < 1

. ☐

The above theorem is generalized to the continuous Sylvester equation. The convergence theorem of the matrix equation can be obtained as follows:

Theorem 2.

Let

A, B \in R^{n \times n}

be positive definite matrices in the Sylvester Equation (1) and suppose that they are split as in the two-stage iterative format (5). If for all

Y \in C^{n \times n}

, we have

| R e [Y, A Y + Y B] | > | I m [Y, A Y + Y B] |,

(14)

then

ρ ({(M_{1} \otimes I + I \otimes M_{2}^{T})}^{- 1} (N_{1} \otimes I + I \otimes N_{2}^{T})) < 1

(15)

holds.

Proof of Theorem 2.

By using the Kronecker product, we can transform Equation (1) into the matrix-vector form:

(A \otimes I + I \otimes B^{T}) x = f,

(16)

where

x = \bar{v e c} (X)

,

f = \bar{v e c} (F)

. When the coefficient matrices A and B are positive definite matrices, we split them as in Section 2.1. Then, Equation (3) can be rewritten equivalently as

(M_{1} \otimes I + I \otimes M_{2}^{T}) x = (N_{1} \otimes I + I \otimes N_{2}^{T}) x + f .

(17)

According to Theorem 1, we yield

| R e (y^{H} (A \otimes I + I \otimes B^{T}) y) | > | I m (y^{H} (A \otimes I + I \otimes B^{T}) y) | .

(18)

On the other hand, it holds that

y^{H} (A \otimes I + I \otimes B^{T}) y = t r (Y^{H} (A Y + Y B)) = [Y, A Y + Y B] .

(19)

Therefore,

| R e [Y, A Y + Y B] | > | I m [Y, A Y + Y B] | .

(20)

☐

5. Numerical Examples

In order to illustrate the performance of the two-stage iteration (TS iteration) method, some examples were performed in Matlab on an Intel dual core processor (1.00 GHz, 2 GB RAM) and the parallel machine Lenovo Shen-teng 1800 cluster. All iterations were started from a zero matrix and terminated when the current iterate satisfied

∥ R^{(k)} ∥ < 10^{- 6}

, where

R^{(k)} = F - (A X^{(k)} + X^{(k)} B)

is the residual of the kth iteration.

Here we compare the TS iteration method with the HSS iteration method proposed in [29].

Notation:

T	the computational time in seconds
ITs	the number of iteration steps
p	the total number of processors
S	speedup ratio
$E_{1}$	parallel efficiency
$Δ$	error $∥ R^{(k)} ∥$

Example 1.

Consider the continuous Sylvester Equation (1) with

m = n

and the matrices

A = B = M + 2 r N + \frac{100}{{(n + 1)}^{2}} I,

where I is the identity matrix, and

M, N \in R^{n \times n}

are the tridiagonal matrices given by

M = t r i d i a g (- 1, - 2, - 1) a n d N = t r i d i a g (0.5, 0, - 0.5) .

The goal in this test is to compare the iteration steps and the computational time by using TS iteration method, HSS iteration method, and MCG method with

r = 0.01

,

r = 0.1

, and

r = 1.0

. The numerical results are listed in Table 1, Table 2 and Table 3, respectively. The optimal parameters

α_{e x p} (with β_{e x p} = α_{e x p})

for the HSS iteration method are given in Table 4 proposed in [29].

From the above tables, we see that both iteration steps and computational time by the TS method are much less than those by the HSS and MCG in all cases. The comparison between MCG and HSS is not straight-forward. Furthermore, for the above tables we observe that in some cases the number of iteration steps of MCG is larger than that of HSS, while on the contrary, for the computational time it mainly depends on the computational time of each iteration step.

Example 2.

Consider the elliptic partial differential equation

\frac{\partial^{2} u}{\partial x^{2}} + \frac{\partial^{2} u}{\partial y^{2}} + \sin 2 π x \frac{\partial u}{\partial x} + \sin 2 π y \frac{\partial u}{\partial y} + u = 0, 0 \leq x, y \leq 1

with boundary condition

\{\begin{matrix} {u |}_{x = 0} {= u |}_{x = 1} = 10 + \cos π y \\ {u |}_{y = 0} {= u |}_{y = 1} = 10 + \cos π x \end{matrix} .

Let the step size be

h = 1 / 101

and

h = 1 / 1001

. That means that the size of the linear system is

100 \times 100

and

1000 \times 1000

, respectively. The equation is discretized by using five-point difference format, and then is transformed into a Sylvester equation. The numerical results are shown in Table 5 and Table 6. This numerical experiment is performed on the parallel machine Lenovo Shen-teng 1800 cluster. Here we focus on comparing the parallel performance with the TS iteration method and the MCG method.

From the results Table 5 and Table 6, we observe that both iteration steps and computational time using TS are much less than those with the MCG. Furthermore, parallel efficiency with the TS method is higher than with the MCG. In addition, the advantages of the TS method increase over the MCG method with increasing scale-size of equations from

10^{4}

(

n = 100

) to

10^{6}

(

n = 1000

).

Example 3.

Let

A = {(a_{i j})}_{n \times n}

and

B = {(b_{i j})}_{n \times n}

where

a_{i j} = {\begin{matrix} - n + \sin (i + j), & i = j \\ i, & o t h e r s \end{matrix}, b_{i j} = {\begin{matrix} 3 + 2 \sin (i + j), & i = j \\ \cos (i + j), & j - i = 1 . \\ \sin (i + j), & i - j = 1 \end{matrix}

In the Sylvester matrix equation

A X + X B = F

, let

n = 1000

and F is an any given matrix. The numerical results are listed in Table 7.

From Table 7 we observe that the two-stage iteration method is still efficient in the case that the coefficient matrices are indefinite matrices. This indicates that the condition for the convergence is only a sufficient condition in Theorem 1.

6. Conclusions

In this paper we have proposed a two-stage iteration parallel method for solving the continuous Sylvester equations. The outer iteration scheme is based on the coefficient matrices’ splitting. Furthermore, an inner iteration scheme is obtained using the SYMMLQ algorithm. Its parallel implementation and its improved strategy have been explained in detail. Moreover, we have proved the convergence of the proposed iteration method under certain conditions. Numerical results show that the new proposed algorithm is better than both MCG and HSS methods with regard to computational storage, computational time, and iteration steps. Crucially, these advantages become more significant for large-scale systems of continuous Sylvester equations.

Acknowledgments

This work has been supported by the National Natural Science Foundation of China (Grants No. 11302173)and the Natural Science Foundation of Shaanxi Province (Grants No. 2017JQ1037). The computation of examples was executed in the Center for High Performance Computing of Northwestern Polytechnical University.

Author Contributions

Manyu Xiao, Quanyi Lv and Zhuo Xing conceived the key ideas, designed the algorithm, performed the examples and drafted the initial manuscript; Manyu Xiao, Quanyi Lv and Yingchun Zhang revised the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Datta, B.N. Numerical Methods for Linear Control Systems; Elsevier Academic Press: San Diego, CA, USA, 2004. [Google Scholar]
Schaft, A.J.V.D. L₂-Gain and Passivity Techniques in Nonlinear Control, 2nd ed.; Springer-Verlag: London, UK, 2000. [Google Scholar]
Baur, U.; Benner, P. Cross-gramian based model reduction for data-sparse systems. Electron. Trans. Numer. Anal. 2008, 31, 256–270. [Google Scholar]
Calvetti, D.; Reichel, L. Application of ADI iterative methods to the restoration of noisy images. SIAM J. Matrix Anal. Appl. 1996, 17, 165–186. [Google Scholar] [CrossRef]
Wang, X.; Fan, X. Two efficient derivative-free iterative methods for solving nonlinear systems. Algorithms 2016, 9, 14. [Google Scholar] [CrossRef]
Allahviranloo, T.; Ahmady, E.; Ahmady, N.; Shams, A.K. Block Jacobi two-stage method with Gauss-Sidel inner iterations for fuzzy system of linear equations. Appl. Math. Comput. 2006, 175, 1217–1228. [Google Scholar] [CrossRef]
Wu, J.P.; Wang, Z.H.; Li, X.M. The Efficient Solution and Parallel Algorithm of Spare Linear Equations; Science and Technology Press of Hunan: Changsha, China, 2004; pp. 269–273. [Google Scholar]
Paige, C.C.; Saunders, M.A. Solution of sparse indefinite system of linear equations. SIAM J. Number Anal. 1975, 12, 617–629. [Google Scholar] [CrossRef]
Jiang, E.X. Matrix Calculations; Science Press: Beijing, China, 2008. [Google Scholar]
Bartels, R.H.; Stewart, G.W. Solution of the matrix equation AX + XB = C: [F4]. Commun. ACM 1972, 15, 820–826. [Google Scholar] [CrossRef]
Golub, G.H.; Nash, S.; Loan, C.V. A Hessenberg-Schur method for the problem AX + XB = C. IEEE Trans. Automat. Control. 1979, 24, 909–913. [Google Scholar] [CrossRef]
Smith, R.A. Matrix equation XA + BX = C. SIAM J. Appl. Math. 1968, 16, 198–201. [Google Scholar] [CrossRef]
Levenberg, M.; Reichel, L. A generalized ADI iterative method. Numer. Math. 1993, 66, 215–233. [Google Scholar] [CrossRef]
Niu, Q.; Wang, X.; Lu, L.Z. A relaxed gradient based algorithm for solving Sylvester equations. Asian J. Control. 2011, 13, 461–464. [Google Scholar] [CrossRef]
Wang, X.; Dai, L.; Liao, D. A modified gradient based algorithm for solving Sylvester equations. Appl. Math. Comput. 2012, 218, 5620–5628. [Google Scholar] [CrossRef]
Bao, L.; Lin, Y.Q.; Wei, Y.M. A new projection method for solving large Sylvester equations. Appl. Numer. Math. 2007, 57, 521–532. [Google Scholar] [CrossRef]
Druskin, V.; Knizhnerman, L.; Simoncini, V. Analysis of the rational Krylov subspace and ADI methods for solving the Lyapunov equation. SIAM J. Numer. 2011, 49, 1857–1898. [Google Scholar] [CrossRef]
Evans, D.J.; Galligani, E. A parallel additive preconditioner for conjugate gradient method for AX + XB = C. Parallel Comput. 1994, 20, 1055–1064. [Google Scholar] [CrossRef]
Kelley, C.T. Iterative Methods for Linear and Nonlinear Equations; Society For Industrial And Applied Mathematics: Philadelphia, PA, USA, 1995. [Google Scholar]
Herisanu, N.; Marinca, V. An iteration procedure with application to van der pol oscillator. Int. J. Nonlinear Sci. Numer. Simul. 2009, 10, 353–362. [Google Scholar] [CrossRef]
Cheng, Y.P.; Zhang, K.Y.; Xu, Z. Matrix Theory, 3rd ed.; Northwestern Polytechnical University Press: Xi’an, China, 2006; pp. 215–216. [Google Scholar]
Hou, J.X.; Lv, Q.Y.; Man, Y.X. A Parallel Preconditioned Modified Conjugate Gradient Method for Large Sylvester Matrix Equation. Math. Prob. Eng. 2014, 2014, 1–7. [Google Scholar] [CrossRef]
Chen, G.L.; Hong, A.; Chen, J.; Zheng, Q.L.; Shan, J.L. The Parallel Algorithm; Higher Education Press: Beijing, China, 2004. [Google Scholar]
Bai, Z.Z.; Golub, G.H.; Ng, M.K. Hermitian and Skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. SIAM J. Matrix Anal. Appl. 2003, 24, 603–626. [Google Scholar] [CrossRef]
Bai, Z.Z. Optimal parameters in the HSS-like methods for saddle-point problems. Numer. Linear Algebra Appl. 2009, 16, 447–479. [Google Scholar] [CrossRef]
Bai, Z.Z.; Golub, G.H.; Ng, M.K. On inexact Hermitian and skew-Hermitian splitting methods for non-Hermitian positive definite linear systems. Numer. Linear Algebra Appl. 2008, 428, 413–440. [Google Scholar] [CrossRef]
Huang, Y.M. A practical formula for computing optimal parameters in the HSS iteration methods. J. Comput. Math. 2014, 255, 142–149. [Google Scholar] [CrossRef]
Wang, X.; Li, Y.; Dai, L. On Hermitian and skew-Hermitian splitting iteration methods for the linear matrix equation AXB = C. Comput. Math. Appl. 2013, 65, 657–664. [Google Scholar] [CrossRef]
Bai, Z.Z. On Hermitian and skew-Hermitian splitting iteration methods for continuous Sylvester equations. J. Comput. Math. 2011, 29, 185–198. [Google Scholar]
Wang, X.; Li, W.W.; Mao, L.Z. On positive-definite and skew-Hermitian splitting iteration methods for continuous Sylvester equation AX + XB = C. Comput. Math. Appl. 2013, 66, 2352–2361. [Google Scholar] [CrossRef]
Zhou, D.M.; Chen, G.L.; Cai, Q.Y. On modified HSS iteration methods for continuous Sylvester equations. Appl. Math. Comput. 2015, 263, 84–93. [Google Scholar] [CrossRef]
Zheng, Q.Q.; Ma, C.F. On normal and skew-Hermitian splitting iteration methods for large sparse continuous Sylvester equations. J. Comput. Appl. Math. 2014, 268, 145–154. [Google Scholar] [CrossRef]
Golub, G.H.; Loan, C.F.V. Matrix Computations, 3rd ed.; The Johns Hopkins University Press: Baltimore, MD, USA; London, UK, 1996. [Google Scholar]
Li, X.M.; Wu, J.P. Numerical Parallel Algorithm and Software; Science Press: Beijing, China, 2007. [Google Scholar]
Wang, C.L.; Bai, Z.Z. Sufficient conditions for the convergent splittings of non-Hermitian positive definite matrices. Linear Algebra Appl. 2001, 330, 215–218. [Google Scholar] [CrossRef]

Table 1. Comparison of the the number of iteration steps (ITs) and the computational time in seconds (T) when choosing

r = 0.01

. HSS: Hermitian and skew-Hermitian splitting; MCG: modified conjugate gradient method; TS: two-stage iteration.

Table 1. Comparison of the the number of iteration steps (ITs) and the computational time in seconds (T) when choosing

r = 0.01

. HSS: Hermitian and skew-Hermitian splitting; MCG: modified conjugate gradient method; TS: two-stage iteration.

n	TS		HSS		MCG
n	ITs	T	ITs	T	ITs	T
32	4	0.030	41	0.053	237	0.036
64	20	0.070	111	0.378	943	0.356
128	130	1.411	270	5.995	4091	9.526

Table 2. Comparison of ITs and T when choosing

r = 0.1

.

Table 2. Comparison of ITs and T when choosing

r = 0.1

.

n	TS		HSS		MCG
n	ITs	T	ITs	T	ITs	T
32	10	0.020	55	0.055	250	0.040
64	27	0.159	96	0.464	1006	0.399
128	149	2.593	205	7.996	3864	9.010

Table 3. Comparison of ITs and T when choosing

r = 1.0

.

Table 3. Comparison of ITs and T when choosing

r = 1.0

.

n	TS		HSS		MCG
n	ITs	T	ITs	T	ITs	T
32	31	0.043	48	0.061	183	0.059
64	54	0.395	125	0.443	254	0.415
128	158	5.796	247	8.872	458	8.236

Table 4. The optimal values

α_{e x p}

for HSS.

Table 4. The optimal values

α_{e x p}

for HSS.

n	HSS
n	$r = 0.01$	$r = 0.1$	$r = 1.0$
32	0.40	0.40	0.95
64	0.17	0.23	0.81
128	0.09	0.13	0.62

Table 5. Numerical results of Example 2 with

h = 1 / 101

.

Table 5. Numerical results of Example 2 with

h = 1 / 101

.

	TS				MCG
p	1	2	4	6	1	2	4	6
T	140.25	77.62	39.83	30.45	297.47	160.89	84.29	63.56
ITs	1067	1067	1067	1067	4992	4992	4992	4992
S		1.81	3.52	4.61		1.85	3.53	4.68
$E_{1}$	1.00	0.90	0.88	0.77	0.47	0.44	0.42	0.37
$Δ$	9.66 × 10 $^{- 7}$	9.66 × 10 $^{- 7}$	9.66 × 10 $^{- 7}$	9.14 × 10 $^{- 7}$	9.01 × 10 $^{- 7}$	9.13 × 10 $^{- 7}$	9.11 × 10 $^{- 7}$	9.80 × 10 $^{- 7}$

Table 6. Numerical results of Example 2 with

h = 1 / 1001

.

Table 6. Numerical results of Example 2 with

h = 1 / 1001

.

	TS				MCG
p	20	25	30	35	20	25	30	35
T	641.12	569.88	491.28	463.74	1563.70	1374.68	1184.62	1116.93
ITs	7194	7194	7194	7194	91,332	91,332	91,332	91,336
S		22.50	26.10	27.65		22.75	26.40	28.00
$E_{1}$	1.00	0.90	0.87	0.79	0.16	0.37	0.36	0.33
$Δ$	9.72 × 10 $^{- 7}$	9.72 × 10 $^{- 7}$	9.72 × 10 $^{- 7}$	9.72 × 10 $^{- 7}$	9.83 × 10 $^{- 7}$	9.83 × 10 $^{- 7}$	9.93 × 10 $^{- 7}$	9.54 × 10 $^{- 7}$

Table 7. Numerical results of Example 3.

	TS
p	16	20	24	28
T	2480.19	2351.59	2296.47	2241.46
ITs	509	509	509	509
S		16.87	17.28	17.92
$E_{1}$	1.00	0.84	0.72	0.64
$Δ$	8.83 × 10 $^{- 7}$	8.83 × 10 $^{- 7}$	8.83 × 10 $^{- 7}$	8.83 × 10 $^{- 7}$

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xiao, M.; Lv, Q.; Xing, Z.; Zhang, Y. A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations. Algorithms 2017, 10, 95. https://doi.org/10.3390/a10030095

AMA Style

Xiao M, Lv Q, Xing Z, Zhang Y. A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations. Algorithms. 2017; 10(3):95. https://doi.org/10.3390/a10030095

Chicago/Turabian Style

Xiao, Manyu, Quanyi Lv, Zhuo Xing, and Yingchun Zhang. 2017. "A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations" Algorithms 10, no. 3: 95. https://doi.org/10.3390/a10030095

APA Style

Xiao, M., Lv, Q., Xing, Z., & Zhang, Y. (2017). A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations. Algorithms, 10(3), 95. https://doi.org/10.3390/a10030095

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Parallel Two-Stage Iteration Method for Solving Continuous Sylvester Equations

Abstract

1. Introduction

2. Description of the Two-Stage Iteration Method

2.1. Outer Iteration Scheme

2.2. Inner Iteration Scheme Based on the SYMMLQ Algorithm

3. Parallel Implementation of the Two-Stage Iteration Method

3.1. Data Storage

3.2. Parallel Implementation of Outer Iteration Method

3.3. Parallel Implementation of Inner Iteration Scheme

3.4. Improved Parallel Implementation of the SYMMLQ Algorithm

4. Convergence Analysis

5. Numerical Examples

6. Conclusions

Acknowledgments

Author Contributions

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI