Modified Jacobi-Gradient Iterative Method for Generalized Sylvester Matrix Equation

We propose a new iterative method for solving a generalized Sylvester matrix equation A_1 X A_2 + A_3 X A_4 = E with given square matrices A_1, A_2, A_3, A_4 and an unknown rectangular matrix X. The method constructs a sequence of approximate solutions that converges to the exact solution, regardless of the initial value. We decompose each coefficient matrix into the sum of its diagonal part and the remainder. The recursive formula for the iteration is derived from the gradients of quadratic norm-error functions, together with the hierarchical identification principle. We find necessary and sufficient conditions on the convergence factor, expressed through the eigenvalues of the associated iteration matrix, under which the method converges as desired. The convergence rate and the error estimates of the method are governed by the spectral norm of the related iteration matrix. Furthermore, we present numerical examples that show the capability and efficacy of the proposed method, compared to recent gradient-based iterative methods.


Introduction
In control engineering, certain problems concerning the analysis and design of control systems can be formulated as the Sylvester matrix equation

    A_1 X + X A_2 = C,    (1)

where X ∈ R^{m×n} is an unknown matrix, and A_1, A_2, C are known matrices of appropriate dimensions.
Here, R^{m×n} stands for the set of m × n real matrices. Let us denote by (·)^T the transpose of a matrix. When A_2 = A_1^T, Equation (1) reduces to the Lyapunov equation, which is often found in continuous- and discrete-time stability analysis [1,2]. The Sylvester equation is a special case of the generalized Sylvester matrix equation

    A_1 X A_2 + A_3 X A_4 = E,    (2)

where X ∈ R^{m×n} is unknown, and A_1, A_2, A_3, A_4, E are known constant matrices of appropriate dimensions. This equation also includes the equation A_1 X A_2 = C and the Kalman-Yakubovich equation A_1 X A_2 − X = C as special cases. All of these equations have profound applications in linear system theory and related areas. Normally, a direct way to solve the generalized Sylvester Equation (2) is to reduce it to a linear system by taking the vector operator. Then, Equation (2) reduces to Px = b, where P = A_2^T ⊗ A_1 + A_4^T ⊗ A_3, x = vec X, and b = vec E.

Method 1 (Jacobi-gradient based iterative (JGI) algorithm [24]). For i = 1, 2, let D_i be the diagonal part of A_i. Given any initial matrices X_1(0), X_2(0), set k = 0 and compute X(0) = (1/2)(X_1(0) + X_2(0)). For k = 1, 2, . . . , until convergence, update X_1(k) and X_2(k) and set X(k) = (1/2)(X_1(k) + X_2(k)).
After that, Tian et al. [25] proposed an accelerated Jacobi-gradient iterative (AJGI) algorithm for solving the Sylvester matrix equation, which relies on two relaxation factors and a half-step update. However, the parameter values required by that algorithm are difficult to determine, since they are characterized by a nonlinear inequality. For the generalized Sylvester Equation (2), the gradient iterative (GI) algorithm [19] and the least-squares iterative (LSI) algorithm [26] have also been established.
In this paper, we propose a new iterative method for solving the generalized Sylvester matrix Equation (2), where A_1, A_3 ∈ R^{m×m}, A_2, A_4 ∈ R^{n×n} and X, E ∈ R^{m×n}. The algorithm requires only one initial value X(0) and a single parameter, called the convergence factor. We decompose each coefficient matrix into the sum of its diagonal part and the remainder. The recursive formula for the iteration is derived from the gradients of quadratic norm-error functions together with the hierarchical identification principle. Under assumptions on the signs of the real parts of the eigenvalues of an associated matrix, we find necessary and sufficient conditions on the convergence factor for which (AS) holds, i.e., the approximate solutions converge to the exact solution for any initial value. The convergence rate and the error estimates are regulated by the spectral radius of the iteration matrix. In particular, when the iteration matrix is symmetric, we obtain a convergence criterion, error estimates and the optimal convergence factor in terms of spectral norms and the condition number. Moreover, numerical simulations are provided to illustrate our results for (2) and (1). We compare the efficiency of our algorithm to the LSI, GI, RGI, AGBI and JGI algorithms.
Let us recall some terminology from matrix analysis; see, e.g., [27]. For any square matrix X, denote by σ(X) its spectrum, by ρ(X) its spectral radius, and by tr(X) its trace. Let us denote the largest and the smallest eigenvalues of a matrix by λ_max(·) and λ_min(·), respectively. Recall that the spectral norm ||·||_2 and the Frobenius norm ||·||_F of A ∈ R^{m×n} are defined, respectively, by ||A||_2 = sqrt(λ_max(A^T A)) and ||A||_F = sqrt(tr(A^T A)). The condition number of an invertible matrix A is defined by κ(A) = ||A||_2 ||A^{-1}||_2. Denote the real part of a complex number z by Re(z).

The rest of the paper is organized as follows. We propose a modified Jacobi-gradient iterative algorithm in Section 2. Convergence criteria, convergence rate, error estimates, and the optimal convergence factor are discussed in Section 3. In Section 4, we provide numerical simulations of the algorithm. Finally, we conclude the paper in Section 5.

A Modified Jacobi-Gradient Iterative Method for the Generalized Sylvester Equation
In this section, we propose an iterative algorithm for solving the generalized Sylvester equation, called a modified Jacobi-gradient iterative algorithm.
Throughout, let m, n ∈ N and A_1, A_3 ∈ R^{m×m}, A_2, A_4 ∈ R^{n×n} and E ∈ R^{m×n}. We would like to find a matrix X ∈ R^{m×n} such that

    A_1 X A_2 + A_3 X A_4 = E.    (3)

Write D_1, D_2, D_3, D_4 for the diagonal parts of A_1, A_2, A_3, A_4, respectively. A necessary and sufficient condition for (3) to have a unique solution is the invertibility of the square matrix P = A_2^T ⊗ A_1 + A_4^T ⊗ A_3. In this case, the solution is given by vec X = P^{-1} vec E.
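For reference, the direct (vectorized) solve described above can be sketched as follows in Python/NumPy; the function name is illustrative, and the sketch simply realizes the identity vec(A_1 X A_2 + A_3 X A_4) = (A_2^T ⊗ A_1 + A_4^T ⊗ A_3) vec X.

    import numpy as np

    def solve_generalized_sylvester_direct(A1, A2, A3, A4, E):
        # Form P = A2^T kron A1 + A4^T kron A3 and solve P vec(X) = vec(E).
        m, n = E.shape
        P = np.kron(A2.T, A1) + np.kron(A4.T, A3)
        b = E.reshape(-1, order='F')            # vec(E), column-major stacking
        x = np.linalg.solve(P, b)               # requires P to be invertible
        return x.reshape((m, n), order='F')     # recover X from vec(X)

This direct approach requires solving a dense mn × mn linear system, which costs O((mn)^3) work; this is the main motivation for the iterative methods considered below.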
To obtain an iterative algorithm for solving (3), we recall the hierarchical identification principle of [19]. We rewrite (3) in the form (4) and define the two matrices in (5), namely E_1 := E − A_3 X A_4 and E_2 := E − A_1 X A_2. From (4) and (5), we shall find the approximate solutions of the two subsystems (6), A_1 X A_2 = E_1 and A_3 X A_4 = E_2, so that the norm-error functions L_1 and L_2 in (7) are minimized. From the gradient formula, we deduce the gradient of the error L_1 as in (8); similarly, we obtain the gradient of L_2 as in (9). Let X_1(k) and X_2(k) be the estimates, or iterative solutions, of the system (6) at the k-th iteration. The recursive formulas for X_1(k) and X_2(k) come from the gradient formulas (8) and (9), as in (10). Based on the hierarchical identification principle, the unknown variable X is replaced by its estimate at the (k − 1)-th iteration. To avoid duplicated computation, we introduce the matrix S(k) := E − A_1 X(k − 1) A_2 − A_3 X(k − 1) A_4. Since any diagonal matrix is sparse, the operation count in the computation (10) is substantially reduced when the diagonal parts D_l, l = 1, 2, 3, 4, are used. Indeed, the product D_1 S(k) D_2 is a matrix whose (i, j)-th entry is the product of the i-th diagonal entry of D_1, the (i, j)-th entry of S(k), and the j-th diagonal entry of D_2. The above discussion leads to the following Algorithm 1.

Algorithm 1: Modified Jacobi-gradient based iterative (MJGI) algorithm
The operation count for each iteration of the algorithm is 2mn(m + n + 5). When m = n, this count becomes 4n^3 + 10n^2 ∈ O(n^3), so each iteration runs in cubic time. The convergence of the algorithm relies on the convergence factor µ; an appropriate range of values for this parameter is determined in the next section.
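To make the procedure concrete, here is a minimal Python/NumPy sketch of one plausible reading of Algorithm 1, assuming the update X(k) = X(k − 1) + µ (D_1 S(k) D_2 + D_3 S(k) D_4) with S(k) as defined above; the function name, the stopping rule, and the exact scaling of the correction (e.g., a possible factor 1/2 arising from averaging the two subsystem updates) are illustrative assumptions rather than the authors' exact formulation.

    import numpy as np

    def mjgi(A1, A2, A3, A4, E, mu, X0, tol=1e-10, max_iter=1000):
        # Sketch of a modified Jacobi-gradient iteration for A1 X A2 + A3 X A4 = E.
        d1, d2 = np.diag(A1), np.diag(A2)    # diagonals of A1, A2 as vectors
        d3, d4 = np.diag(A3), np.diag(A4)
        X = X0.copy()
        for k in range(1, max_iter + 1):
            S = E - A1 @ X @ A2 - A3 @ X @ A4            # residual S(k)
            # D1 S D2 and D3 S D4 cost only O(mn) each, since the D's are diagonal
            X = X + mu * (np.outer(d1, d2) * S + np.outer(d3, d4) * S)
            if np.linalg.norm(S, 'fro') <= tol * np.linalg.norm(E, 'fro'):
                break
        return X, k

The per-iteration cost is dominated by forming the residual S(k), since the diagonal scalings contribute only O(mn) operations.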

Convergence Analysis of the Proposed Method
In this section, we carry out the convergence analysis of Algorithm 1. First, we transform it into a first-order linear iterative process x(k) = T x(k − 1), where x(k) is a vector variable and T is a matrix. The iteration matrix T reflects the convergence criteria, the convergence rate, and the error estimates of the algorithm.
Theorem 1. Let λ_1, . . . , λ_mn be the eigenvalues of H = D(P)P, where D(P) denotes the diagonal part of P.
(1) If Re(λ_j) < 0 for all j = 1, . . . , mn, then (AS) holds if and only if −min_{j=1,...,mn} 2|Re(λ_j)|/|λ_j|^2 < µ < 0.
(2) If Re(λ_j) > 0 for all j = 1, . . . , mn, then (AS) holds if and only if 0 < µ < min_{j=1,...,mn} 2 Re(λ_j)/|λ_j|^2.

Proof. From Algorithm 1, we start by considering the error matrix X̃(k) := X(k) − X*, where X* is the exact solution of (3). We will show that X̃(k) → 0, or equivalently, vec X̃(k) → 0 as k → ∞. A direct computation reveals how X̃(k) depends on X̃(k − 1). By taking the vector operator, using properties of the Kronecker product, and denoting the diagonal part of P by D(P), we arrive at the linear iterative process

    vec X̃(k) = (I_mn − µH) vec X̃(k − 1),    (11)

where H = D(P)P. Hence, the following statements are equivalent: (i) vec X̃(k) → 0 for any initial value vec X̃(0); (ii) System (11) has an asymptotically stable zero solution; (iii) the iteration matrix I_mn − µH has spectral radius less than 1.
Indeed, since I_mn − µH is a polynomial in H, we get σ(I_mn − µH) = {1 − µλ_j : j = 1, . . . , mn}, and hence ρ(I_mn − µH) = max_j |1 − µλ_j|. Writing a_j = Re(λ_j), a direct computation shows that |1 − µλ_j| < 1 if and only if µ(µ|λ_j|^2 − 2a_j) < 0. Thus, we arrive at two alternative conditions:
Case 1: a_j = Re(λ_j) > 0 for all j. In this case, ρ(I_mn − µH) < 1 if and only if 0 < µ < min_{j=1,...,mn} 2a_j/|λ_j|^2.
Case 2: a_j = Re(λ_j) < 0 for all j. In this case, ρ(I_mn − µH) < 1 if and only if −min_{j=1,...,mn} 2|a_j|/|λ_j|^2 < µ < 0.
Now, suppose that H is a symmetric matrix. Then I_mn − µH is also symmetric, and thus all its eigenvalues are real. Hence,

    ρ(I_mn − µH) = max{ |1 − µ λ_min(H)|, |1 − µ λ_max(H)| }.

It follows that ρ(I_mn − µH) < 1 if and only if 0 < µ λ_min(H) < 2 and 0 < µ λ_max(H) < 2. In particular, λ_min(H) and λ_max(H) cannot be zero.
Therefore, the condition (16) holds if and only if λ_max(H) and λ_min(H) have the same sign and µ is chosen according to the above condition.
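As a small illustration of Theorem 1, the following sketch (Python/NumPy, with an illustrative function name) forms H = D(P)P from the coefficient matrices, inspects the real parts of its eigenvalues, and returns the admissible open interval for µ in each of the two cases.

    import numpy as np

    def admissible_mu_interval(A1, A2, A3, A4):
        # H = D(P) P, where P = A2^T kron A1 + A4^T kron A3
        P = np.kron(A2.T, A1) + np.kron(A4.T, A3)
        H = np.diag(np.diag(P)) @ P
        lam = np.linalg.eigvals(H)
        if np.all(lam.real > 0):
            return 0.0, np.min(2 * lam.real / np.abs(lam) ** 2)             # (0, upper)
        if np.all(lam.real < 0):
            return -np.min(2 * np.abs(lam.real) / np.abs(lam) ** 2), 0.0    # (lower, 0)
        raise ValueError("eigenvalues of H change sign; Theorem 1 does not apply")

Any µ strictly inside the returned interval makes ρ(I_mn − µH) < 1, so the iteration converges for every initial matrix X(0).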

Convergence Rate and Error Estimate
We now discuss the convergence rate and error estimates of Algorithm 1 from the iterative process (11).
Suppose that Algorithm 1 satisfies the (AS) property, i.e., ρ(I_mn − µH) < 1. From (11), we have

    ||X(k) − X*||_F = ||vec X̃(k)||_2 ≤ ||I_mn − µH||_2 ||X(k − 1) − X*||_F,    (17)

and it follows inductively that, for each k ∈ N,

    ||X(k) − X*||_F ≤ ||I_mn − µH||_2^k ||X(0) − X*||_F.    (18)

Hence, the spectral norm of I_mn − µH describes how fast the approximate solution X(k) converges to the exact solution X*: the smaller this norm, the faster X(k) goes to X*. In that case, since ||I_mn − µH||_2 < 1, we have

    ||X(k) − X*||_F < ||X(k − 1) − X*||_F,    (19)

i.e., the error at each iteration is smaller than the previous one. The above discussion is summarized in the following theorem.

Theorem 2.
Suppose that the parameter µ is chosen as in Theorem 1, so that Algorithm 1 satisfies (AS). Then the convergence rate of the algorithm is governed by the spectral radius in (16). Moreover, the estimates of the error ||X(k) − X*||_F relative to the previous step and to the initial step are provided by (17) and (18), respectively. In particular, the error at each iteration is smaller than the (nonzero) previous one, as in (19).
From (16), if the eigenvalues of µH are close to 1, then the spectral radius of the iteration matrix is close to 0 and, hence, the errors vec X̃(k), or equivalently X̃(k), converge to 0 faster.

Remark 1.
The convergence criteria and the convergence rate of Algorithm 1 depend on A_1, A_2, A_3 and A_4, but not on E. However, the matrix E can be used in the stopping criterion.
The next proposition determines the number of iterations after which the approximate solution X(k) is close to the exact solution X*, in the sense that ||X(k) − X*||_F < ε.

Proposition 1.
According to Algorithm 1, for each given error ε > 0, we have ||X(k) − X*||_F < ε after k* iterations, for any k* such that

    k* > log( ε / ||X(0) − X*||_F ) / log ||I_mn − µH||_2.    (20)

Proof. From the estimate (18), we have ||X(k) − X*||_F ≤ ||I_mn − µH||_2^k ||X(0) − X*||_F. This means precisely that, for each given ε > 0, there is a k* ∈ N such that, for all k ≥ k*,

    ||I_mn − µH||_2^k ||X(0) − X*||_F < ε.

Taking logarithms, and noting that log ||I_mn − µH||_2 < 0, this condition is equivalent to (20). Thus, if we run Algorithm 1 for k* iterations, then we get ||X(k) − X*||_F < ε, as desired.
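A small numerical illustration of the bound (20) (Python; the function name is illustrative, and the inputs are hypothetical values of ||I_mn − µH||_2, ||X(0) − X*||_F and ε):

    import numpy as np

    def iterations_needed(contraction, initial_error, eps):
        # Number of iterations suggested by (20),
        # assuming 0 < contraction < 1 and 0 < eps < initial_error.
        return int(np.ceil(np.log(eps / initial_error) / np.log(contraction)))

    # Example: with ||I - mu H||_2 = 0.9, ||X(0) - X*||_F = 10 and eps = 1e-6,
    # iterations_needed(0.9, 10.0, 1e-6) returns 153.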

Optimal Parameter
We discuss the fastest convergence factor for Algorithm 1.
Theorem 3. The optimal convergence factor µ for which Algorithm 1 satisfies (AS) is the one that minimizes ||I_mn − µH||_2. If, in addition, H is symmetric, then the optimal convergence factor is

    µ_opt = 2 / ( λ_min(H) + λ_max(H) ),    (21)

and the convergence rate is governed by

    ||I_mn − µ_opt H||_2 = (κ − 1) / (κ + 1),    (22)

where κ denotes the condition number of H; moreover, we have the estimates

    ||X(k) − X*||_F ≤ ((κ − 1)/(κ + 1)) ||X(k − 1) − X*||_F,    (23)
    ||X(k) − X*||_F ≤ ((κ − 1)/(κ + 1))^k ||X(0) − X*||_F.    (24)

Proof. From Theorem 2, it is clear that the fastest convergence is attained at a convergence factor that minimizes ||I_mn − µH||_2. Now, assume that H is symmetric. Then I_mn − µH is also symmetric, thus all its eigenvalues are real and

    ||I_mn − µH||_2 = ρ(I_mn − µH) = max{ |1 − µ λ_min(H)|, |1 − µ λ_max(H)| }.    (25)

For convenience, denote a = λ_min(H), b = λ_max(H), and f(µ) = max{ |1 − µa|, |1 − µb| }. First, we consider the case λ_min(H) > 0. To obtain the fastest convergence factor, according to (15), we must solve the optimization problem

    min_{0 < µ < 2/λ_max(H)} f(µ).

We obtain that the minimizer is µ_opt = 2/(a + b), so that f(µ_opt) = (b − a)/(b + a). For the case λ_max(H) < 0, we solve the optimization problem

    min_{2/λ_min(H) < µ < 0} f(µ).

A similar argument yields the same minimizer (21) and the same convergence rate (22). From (17), (18) and (25), we obtain the bounds (23) and (24).
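For the symmetric case of Theorem 3, a short sketch (Python/NumPy, illustrative function name) that computes µ_opt and the resulting contraction factor:

    import numpy as np

    def optimal_mu_symmetric(H):
        # Assumes H is symmetric with eigenvalues all of one sign (Theorem 3).
        lam = np.linalg.eigvalsh(H)
        a, b = lam.min(), lam.max()
        mu_opt = 2.0 / (a + b)
        rate = abs(b - a) / abs(a + b)   # equals (kappa - 1)/(kappa + 1)
        return mu_opt, rate

A large condition number κ pushes the contraction factor (κ − 1)/(κ + 1) toward 1, so an ill-conditioned H leads to slow convergence even with the optimal µ.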

Numerical Simulations
In this section, we report numerical results to illustrate the effectiveness of Algorithm 1. We consider various sizes of matrix systems, namely, small (2 × 2), medium (10 × 10) and large (100 × 100). For the generalized Sylvester equation, we compare the performance of Algorithm 1 to the GI and LSI algorithms.
For the Sylvester equation, we compare our algorithm with the GI, RGI, AGBI and JGI algorithms. All iterations were carried out in the same environment: MATLAB R2017b, Intel(R) Core(TM) i7-7660U CPU @ 2.5 GHz, 8.00 GB RAM, bus speed 2133 MHz. We abbreviate the number of iterations and the CPU time (in seconds) by IT and CPU, respectively. At the k-th step of the iteration, we consider the error δ(k), computed from X(k), the k-th approximate solution of the corresponding system.

Choose X(0) = zeros(2), where zeros(n) denotes the n-by-n matrix with 0 in every position. In this case, all eigenvalues of H have positive real parts. The effect of changing the convergence factor µ is illustrated in Figure 1. According to Theorem 1, the criterion for the convergence of X(k) is that µ ∈ (0, 4.1870). Since µ_1, µ_2, µ_3, µ_4 satisfy this criterion, the error becomes smaller and goes to zero as k increases, as shown in Figure 1. Among them, µ_4 = 4.0870 gives the fastest convergence. For µ_5 and µ_6, which do not satisfy the criterion, the error δ(k) does not converge to zero.

Here, E is a heptadiagonal matrix, i.e., a band matrix with bandwidth 3. Choose the initial matrix X(0) = zeros(10). We compare Algorithm 1 with the direct method and the LSI and GI algorithms. Table 1 shows the errors at the final step of the iteration as well as the computation time after 75 iterations. Figure 2 illustrates that the approximate solutions via LSI diverge, while those via GI and MJGI converge. Table 1 and Figure 2 show that our algorithm requires significantly less computational time and produces smaller errors than the others.

The initial matrix is given by X(0) = zeros(100). We run the LSI, GI and MJGI algorithms with their respective convergence factors.

Numerical Simulation for the Generalized Sylvester Matrix Equation
The reported results in Table 2 and Figure 3 illustrate that the approximate solution generated by LSI diverges, while those from GI and MJGI converge. Both the computational time and the error δ(100) for MJGI are smaller than those for GI.

Numerical Simulation for Sylvester Matrix Equation
Assume that the Sylvester equation A_1 X + X A_4 = E (26) has a unique solution. This condition is equivalent to the invertibility of the Kronecker sum A_4^T ⊕ A_1, i.e., every sum of an eigenvalue of A_1 and an eigenvalue of A_4 is nonzero. To solve (26), Algorithm 2 is proposed:

Algorithm 2:
Modified Jacobi-gradient based iterative (MJGI) algorithm for the Sylvester equation. Choose µ ∈ R and ε > 0, and set k = 1. For k = 1, . . . , n, do: compute X(k); if the stopping criterion is met, break; else set k = k + 1; end; end. (An illustrative sketch of this specialization is given after Example 4 below.)

Example 4. Consider the equation A_1 X + X A_4 = E, in which E is the same matrix as in the previous example. In this case, all eigenvalues of the iteration matrix have positive real parts, so our algorithm is applicable. We compare our algorithm with the GI, RGI, AGBI and JGI algorithms. The results after running 100 iterations are shown in Figure 4 and Table 3. According to the errors and the computational time (CT) in Table 3 and Figure 4, our algorithm uses less computational time and produces smaller errors than the others.
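For completeness, a minimal Python/NumPy sketch of the specialization of the MJGI iteration to the Sylvester equation A_1 X + X A_4 = E, under the same assumed update form as the sketch after Algorithm 1 (so S(k) = E − A_1 X(k − 1) − X(k − 1) A_4; the function name, scaling, and stopping rule are illustrative assumptions):

    import numpy as np

    def mjgi_sylvester(A1, A4, E, mu, X0, tol=1e-10, max_iter=1000):
        # Assumed specialization of the MJGI update to A1 X + X A4 = E.
        d1, d4 = np.diag(A1), np.diag(A4)    # diagonals of A1 and A4 as vectors
        X = X0.copy()
        for k in range(1, max_iter + 1):
            S = E - A1 @ X - X @ A4                             # residual S(k)
            X = X + mu * (d1[:, None] * S + S * d4[None, :])    # D1 S + S D4
            if np.linalg.norm(S, 'fro') <= tol * np.linalg.norm(E, 'fro'):
                break
        return X, k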

Conclusions and Suggestion
A modified Jacobi-gradient (MJGI) algorithm (Algorithm 1) is proposed for solving the generalized Sylvester matrix Equation (3). In order for the MJGI algorithm to be applicable to any size of matrix system and any initial matrix, the convergence factor µ must be chosen properly, according to Theorem 1. In this case, the iteration matrix I_mn − µH has spectral radius less than 1. When the iteration matrix is symmetric, we determine the optimal convergence factor µ_opt for which the algorithm attains its fastest convergence rate. The asymptotic convergence rate of the algorithm is governed by the spectral radius of I_mn − µH; thus, if the eigenvalues of µH are close to 1, then the algorithm converges faster in the long run. The numerical examples reveal that our algorithm is suitable for small (2 × 2), medium (10 × 10) and large (100 × 100) matrix systems. In addition, the MJGI algorithm performs well compared to recent gradient-based iterative algorithms. For future work, we may add another parameter for an updating step to make the algorithm converge faster; see [25]. Another possible direction is to apply the idea of this paper to derive iterative algorithms for nonlinear matrix equations.
Author Contributions: Supervision, P.C.; software, N.S.; writing-original draft preparation, N.S.; writing-review and editing, P.C. All authors contributed equally and significantly in writing this article. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.