Gradient-Based Optimization Algorithm for Solving Sylvester Matrix Equation

In this paper, we transform the problem of solving the Sylvester matrix equation into an optimization problem, primarily by means of the Kronecker product. We apply the adaptive accelerated proximal gradient (A-APG) and Newton accelerated proximal gradient (Newton-APG) methods to solve the resulting constrained non-convex minimization problem, and we analyze their convergence properties. Finally, we offer numerical examples to illustrate the effectiveness of the derived algorithms.


Introduction
Matrix equations are ubiquitous in signal processing [1], control theory [2], and linear systems [3]. Most time-dependent models used for the prediction, simulation, and control of real-world phenomena can be represented as linear or nonlinear dynamical systems. The relevance of matrix equations in engineering applications therefore largely explains the great effort the scientific community has put into their numerical solution. Linear matrix equations play an important role in the stability analysis of linear dynamical systems and in the theoretical development of nonlinear systems. The Sylvester matrix equation was first proposed by Sylvester and arose from research in applied mathematical cybernetics. It is a well-known matrix equation that occurs in linear and generalized eigenvalue problems, for the computation of invariant subspaces using Riccati equations [4][5][6]. The Sylvester matrix equation also appears in linear algebra [7][8][9], image processing [10], model reduction [11], and numerical methods for differential equations [12,13].
We consider the Sylvester matrix equation of the form

AX + XB = C,    (1)

where A ∈ R^{m×m}, B ∈ R^{n×n}, C ∈ R^{m×n} are given matrices, and X ∈ R^{m×n} is the unknown matrix to be determined. We discuss a special form of the Sylvester matrix equation, in which A and B are symmetric positive definite.
Recently, there has been much discussion on the solution and numerical computation of the Sylvester matrix equation. The standard methods for solving this equation are the Bartels-Stewart method [14] and the Hessenberg-Schur method [15], which are efficient for small and dense system matrices. When system matrices are small, the block Krylov subspace methods [16,17] and global Krylov subspace methods [18] have been proposed. These methods use the global Arnoldi process, block Arnoldi process, or nonsymmetric block Lanczos process to produce low-dimensional Sylvester matrix equations. More feasible methods for solving large and sparse problems are iterative methods. When system matrices are large, effective methods include the alternating direction implicit (ADI) method [19], the global full orthogonalization method, the global generalized minimum residual method [20], the gradient-based iterative method [21], and the global Hessenberg and changing minimal residual with Hessenberg process methods [22]. When system matrices are low-rank, the ADI method [23], the block Arnoldi method [17], the preconditioned block Arnoldi method [24], and the extended block Arnoldi method [25] and its variants [26,27], including the global Arnoldi method [28,29] and the extended global Arnoldi method [25], have been proposed to obtain the low-rank solution.
The adaptive accelerated proximal gradient (A-APG) method [30] is an efficient numerical method for computing the steady states of a minimization problem. It is motivated by the accelerated proximal gradient (APG) method [31], which has wide applications in image processing and machine learning. In each iteration, the A-APG method takes the step size from a line search initialized with the Barzilai-Borwein (BB) step [32] to accelerate the numerical speed. Moreover, as the traditional APG method was proposed for convex problems and its oscillation phenomenon slows down the convergence, a restart scheme has been used to speed up the convergence. For more details, one can refer to [30] and the references therein.
The main contribution is to study gradient-based optimization methods, such as the A-APG and Newton-APG methods, for solving the Sylvester matrix equation by transforming the equation into an optimization problem using the Kronecker product. The A-APG and Newton-APG methods are theoretically guaranteed to converge to a global solution from an arbitrary initial point and achieve high precision. These methods are especially efficient for large and sparse coefficient matrices.
The rest of this paper is organized as follows. In Section 2, we transform the equation into an optimization problem by using the Kronecker product. In Section 3, we apply the A-APG and Newton-APG algorithms to solve the optimization problem and compare them with other methods. In Section 4, we focus on the convergence analysis of the A-APG method. In Section 5, the computational complexity of these algorithms is analyzed in detail. In Section 6, we offer corresponding numerical examples to illustrate the effectiveness of the derived methods.
Throughout this paper, let R^{n×m} denote the set of all n × m real matrices and I_n the identity matrix of order n. If A ∈ R^{n×n}, the symbols A^T, A^{−1}, ‖A‖, and tr(A) denote the transpose, the inverse, the 2-norm, and the trace of A, respectively. The inner product on a matrix space E is ⟨x, y⟩ = tr(x^T y), ∀ x, y ∈ E.

The Variant of an Optimization Problem
In this section, we transform the Sylvester equation into an optimization problem. We first recall some definitions and lemmas.
Definition 1. Let Y = (y_ij) ∈ R^{m×n} and Z ∈ R^{p×q}. The Kronecker product of Y and Z is defined by Y ⊗ Z = (y_ij Z) ∈ R^{mp×nq}.

From Lemma 1, the Sylvester Equation (1) can be rewritten as

(I_n ⊗ A + B^T ⊗ I_m) vec(X) = vec(C).    (2)

Lemma 2. Let A be a symmetric positive definite matrix; solving the equation Ax = b is equivalent to obtaining the minimum of ϕ(x) = x^T Ax − 2b^T x.
According to Lemma 2 and Equation (2), define

Ā = I_n ⊗ A + B^T ⊗ I_m,  x = vec(X),  b = vec(C).

Therefore, Equation (2) can be written as Āx = b. Obviously, if A and B are symmetric positive definite, then Ā is symmetric positive definite. The variant of the Sylvester Equation (2) thus reduces to the optimization problem

min_{x ∈ R^{mn}} ϕ(x) = x^T Ā x − 2 b^T x.    (3)

Using the calculus of matrix differentials from [33], we have the following propositions immediately.
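As a minimal illustration of this transformation (a sketch assuming NumPy, small dense matrices, and names chosen only for this example), the following code forms Ā and b = vec(C) explicitly and checks that the solution of Āx = b also solves AX + XB = C:

```python
import numpy as np

m, n = 4, 3
rng = np.random.default_rng(0)

# Symmetric positive definite A (m x m) and B (n x n), arbitrary C (m x n).
A = rng.standard_normal((m, m)); A = A @ A.T + m * np.eye(m)
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)
C = rng.standard_normal((m, n))

# Kronecker form (2): (I_n (x) A + B^T (x) I_m) vec(X) = vec(C),
# with vec taken column-wise (Fortran order).
A_bar = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
b_vec = C.flatten(order="F")

x = np.linalg.solve(A_bar, b_vec)
X = x.reshape((m, n), order="F")

# The recovered X should satisfy the original Sylvester equation (1).
print(np.allclose(A @ X + X @ B, C))
```

Forming Ā explicitly is only feasible for small problems; the algorithms below avoid it by working directly with A, B, and C.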
Using Propositions 2 and 3, the gradient of the objective function (3) is

∇ϕ(x) = 2(Āx − b).    (4)

By (4), the Hessian matrix is

∇²ϕ(x) = 2Ā.    (5)
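Since Ā vec(X) − b = vec(AX + XB − C), the gradient (4) can be evaluated in matrix form without ever forming the Kronecker product. A minimal sketch, assuming NumPy arrays A, B, C, X of compatible shapes:

```python
def grad_phi(X, A, B, C):
    # Gradient of phi at vec(X), written in matrix form:
    # 2 * (A_bar vec(X) - b) = 2 * vec(A X + X B - C).
    return 2.0 * (A @ X + X @ B - C)
```

The Hessian 2Ā is constant, which is what makes the Newton direction in Section 3 particularly simple.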

Iterative Methods
In this section, we introduce the adaptive accelerated proximal gradient (A-APG) method and the Newton-APG method to solve the Sylvester equation. Moreover, we compare the A-APG and Newton-APG methods with other existing methods.

APG Method
The traditional APG method [31] is designed for solving the composite convex problem

min_{x ∈ H} f(x) + g(x),

where H is a finite-dimensional Hilbert space equipped with the inner product ⟨•, •⟩, f and g are both convex, and the gradient of f is Lipschitz continuous with constant L. Given the initializations x_1 = x_0 and t_0 = 1, the APG method is

y_k = x_k + ((t_{k−1} − 1)/t_k)(x_k − x_{k−1}),
x_{k+1} = Prox_{αg}(y_k − α∇f(y_k)),
t_{k+1} = (1 + √(1 + 4t_k²))/2,

where α ∈ (0, 1/L] and the mapping Prox_{αg}(•) : R^n → R^n is defined as

Prox_{αg}(x) = argmin_{y ∈ R^n} { g(y) + (1/(2α)) ‖y − x‖² }.

Since the nonsmooth part g vanishes in our minimization problem (3), the proximal mapping reduces to the identity and we choose the explicit scheme. The explicit scheme is a simple but effective approach for the minimization problem. Given an initial value Y_0 and the step size α_k, the explicit scheme is

Y_{k+1} = Y_k − α_k ∇ϕ(Y_k),    (6)

where Y_k is the approximate solution. The explicit scheme satisfies the sufficient decrease property of the gradient descent (GD) method. Let X_k and X_{k−1} be the current and previous states and let w_k be the extrapolation weight. Using the explicit method (6), the APG iterative scheme is

Y_k = X_k + w_k (X_k − X_{k−1}),
X_{k+1} = Y_k − α_k ∇ϕ(Y_k).    (7)

Together with standard backtracking, we accept the step size α_k when the following condition holds:

ϕ(X_{k+1}) ≤ ϕ(Y_k) − η ‖X_{k+1} − Y_k‖²    (8)

for some η > 0. Combining (7) and (8), the APG algorithm is summarized in Algorithm 1.
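The following is a minimal sketch of the extrapolated update (7) with a backtracking test in the spirit of (8), assuming the grad_phi helper above; the stopping rule, parameter values, and halving factor are illustrative assumptions and not the paper's Algorithm 1.

```python
import numpy as np

def phi(X, A, B, C):
    # Objective in matrix form: vec(X)^T A_bar vec(X) - 2 b^T vec(X)
    #                         = tr(X^T (A X + X B)) - 2 tr(C^T X).
    return np.trace(X.T @ (A @ X + X @ B)) - 2.0 * np.trace(C.T @ X)

def apg(A, B, C, alpha0=1e-2, eta=1e-4, iters=500):
    X_prev = X = np.zeros_like(C)
    t_prev, t = 1.0, 1.0
    for _ in range(iters):
        w = (t_prev - 1.0) / t                     # extrapolation weight w_k
        Y = X + w * (X - X_prev)                   # extrapolated point, cf. (7)
        G = 2.0 * (A @ Y + Y @ B - C)              # gradient at Y
        a = alpha0
        X_new = Y - a * G
        # Backtracking in the spirit of (8): shrink the step until decrease holds.
        while phi(X_new, A, B, C) > phi(Y, A, B, C) - eta * np.linalg.norm(X_new - Y) ** 2:
            a *= 0.5
            X_new = Y - a * G
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        X_prev, X = X, X_new
    return X
```

Only matrix-matrix products with A and B are needed per iteration, so the sketch also applies to large sparse coefficient matrices if the products are implemented sparsely.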

Restart APG Method
Recently, an efficient and convergent numerical algorithm has been developed for solving a discretized phase-field model by combining the APG method with the restart technique [30]. Unlike the APG method, the restart technique chooses X_{k+1} = Y_{k+1} whenever the following condition holds:

ϕ(Y_{k+1}) ≤ ϕ(X_k) − γ ‖Y_{k+1} − X_k‖²    (9)

for some γ > 0. If the condition is not met, we restart the APG iteration by setting w_k = 0. The restart APG (RAPG) method is summarized in Algorithm 2.
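A sketch of one restarted step, assuming the phi helper above; reading the restart test as a sufficient-decrease check with parameter γ is our assumption, and the fixed step size is illustrative.

```python
def restart_step(X, X_prev, w, A, B, C, alpha, gamma=1e-4):
    # Extrapolate, take one explicit step, then decide whether to keep it.
    Y = X + w * (X - X_prev)
    X_cand = Y - alpha * 2.0 * (A @ Y + Y @ B - C)
    # Accept the candidate if it decreases the objective sufficiently;
    # otherwise restart the momentum (w = 0), i.e. take a plain gradient step.
    if phi(X_cand, A, B, C) <= phi(X, A, B, C) - gamma * np.linalg.norm(X_cand - X) ** 2:
        return X_cand, w
    return X - alpha * 2.0 * (A @ X + X @ B - C), 0.0
```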

A-APG Method
In RAPG Algorithm 2, we can adaptively estimate the step size α_k by using a line search technique. Define s_{k−1} = X_k − X_{k−1} and y_{k−1} = ∇ϕ(X_k) − ∇ϕ(X_{k−1}). We initialize the search step by the Barzilai-Borwein (BB) method, i.e.,

α_k^{BB} = ⟨s_{k−1}, s_{k−1}⟩ / ⟨s_{k−1}, y_{k−1}⟩.

Therefore, we obtain the A-APG algorithm summarized in Algorithm 3.
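A sketch of the BB initialization of the line search, assuming the grad_phi helper above; the first BB step size is used here, though the second variant (⟨s, y⟩/⟨y, y⟩) would be an equally plausible choice.

```python
def bb_step(X, X_prev, A, B, C, fallback=1e-2):
    # Barzilai-Borwein step: s = X_k - X_{k-1}, y = grad(X_k) - grad(X_{k-1}).
    S = X - X_prev
    Yd = grad_phi(X, A, B, C) - grad_phi(X_prev, A, B, C)
    denom = np.sum(S * Yd)          # Frobenius inner product <s, y>
    return np.sum(S * S) / denom if denom > 0 else fallback
```

For the symmetric positive definite case, ⟨s, y⟩ = 2 s^T Ā s > 0 whenever s ≠ 0, so the fallback is only needed at the very first iteration.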

Newton-APG Method
Despite the fast initial convergence of gradient-based methods, their tail convergence becomes slow. Therefore, we use a practical Newton method to solve the minimization problem. We obtain the initial value from A-APG Algorithm 3 and then use the Newton direction in place of the gradient in the explicit scheme of RAPG Algorithm 2. This yields the Newton-APG method shown in Algorithm 4.
Algorithm 4: Newton-APG method
1: Obtain the initial value from A-APG Algorithm 3 up to the prescribed precision;
2: while the stop condition is not satisfied do
3:    Update X_{k+1} by RAPG Algorithm 2 using the Newton direction;
4: end while
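Because the objective (3) is quadratic with constant Hessian 2Ā, the Newton direction D at the current iterate X satisfies the correction equation A D + D B = A X + X B − C. The small-scale sketch below forms Ā explicitly to solve for D, which is only practical for modest orders; it is an illustration of the direction, not the paper's Algorithm 4.

```python
def newton_direction(X, A, B, C):
    # Solve (I_n (x) A + B^T (x) I_m) vec(D) = vec(A X + X B - C) for the Newton step D.
    m, n = C.shape
    A_bar = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(m))
    R = A @ X + X @ B - C                      # residual of the Sylvester equation
    d = np.linalg.solve(A_bar, R.flatten(order="F"))
    return d.reshape((m, n), order="F")

# One damped Newton update would then be X_new = X - t * newton_direction(X, A, B, C),
# with t chosen by the line search of Algorithm 2.
```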

Gradient Descent (GD) and Line Search (LGD) Methods
Moreover, we present the gradient descent (GD) and line search gradient descent (LGD) methods for comparison with the A-APG and Newton-APG methods. The GD and LGD methods are summarized in Algorithm 5.
Algorithm 5: GD and LGD methods
1: while the stop condition is not satisfied do
2:    if the step size is fixed then
3:        Calculate X_{k+1} via X_{k+1} = X_k − α∇ϕ(X_k) (GD);
4:    else
5:        Choose the step size α_k by the line search and calculate X_{k+1} = X_k − α_k∇ϕ(X_k) (LGD);
6:    end if
7: end while
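A compact sketch of the two branches of Algorithm 5, reusing the grad_phi and bb_step helpers above; using the BB step as the line search is an assumption made only for this illustration.

```python
def gd(A, B, C, alpha=1e-2, iters=1000, use_bb=False):
    X = np.zeros_like(C)
    X_prev = X.copy()
    for k in range(iters):
        # Fixed step (GD) or BB-initialized adaptive step (stand-in for LGD).
        step = bb_step(X, X_prev, A, B, C, fallback=alpha) if (use_bb and k > 0) else alpha
        X_prev, X = X, X - step * grad_phi(X, A, B, C)
    return X
```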

Computational Complexity Analysis
Further, we analyze the computational complexity of each iteration of the derived algorithms.

Convergence Analysis
In this section, we focus on the convergence analysis of A-APG Algorithm 3. The following proposition is required.

Proposition 4. Let M be a bounded region in R^{n×n} that contains the level set {X : ϕ(X) ≤ ϕ(X_0)}. Then ∇ϕ(X) satisfies the Lipschitz condition in M, i.e., there exists L_M > 0 such that

‖∇ϕ(X) − ∇ϕ(Y)‖ ≤ L_M ‖X − Y‖,  ∀ X, Y ∈ M.

Proof. Using the continuity of ∇ϕ(X), note that the Hessian ∇²ϕ defined by (5) is bounded. Then ∇ϕ(X) satisfies the Lipschitz condition in M.
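For the quadratic objective (3) a global Lipschitz constant is in fact available in closed form: the eigenvalues of Ā = I_n ⊗ A + B^T ⊗ I_m are the sums λ_i(A) + μ_j(B), so ‖∇²ϕ‖ = 2(λ_max(A) + μ_max(B)) when A and B are symmetric positive definite. A small NumPy sketch (an illustration, not part of the paper's algorithms):

```python
def lipschitz_constant(A, B):
    # Eigenvalues of A_bar are lambda_i(A) + mu_j(B); for symmetric A, B the
    # 2-norm of A_bar is lambda_max(A) + mu_max(B), and the gradient
    # grad_phi = 2*(A_bar x - b) is Lipschitz with constant 2 * ||A_bar||_2.
    return 2.0 * (np.linalg.eigvalsh(A)[-1] + np.linalg.eigvalsh(B)[-1])
```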
In recent years, proximal methods based on the Bregman distance have been applied to solve optimization problems. For a function h and step size α > 0, the proximal operator is

Prox_{αh}(x) = argmin_y { h(y) + (1/(2α)) ‖y − x‖² }.

Basically, given the current estimate X_k and step size α_k > 0, update X_{k+1} via

X_{k+1} = argmin_X { ⟨∇ϕ(X_k), X − X_k⟩ + (1/(2α_k)) ‖X − X_k‖² }.

Thus we obtain

X_{k+1} = X_k − α_k ∇ϕ(X_k).

This is exactly the explicit scheme in our algorithm.

Line Search Is Well-Defined
Using the optimality of the update in Equation (11), together with the Taylor expansion of ϕ(X_{k+1}), one obtains the estimate (12). By Equation (12), if the step size α_k is taken sufficiently small, the conditions in the line search Equation (8) and the non-restart Equation (9) are both satisfied. Therefore, the backtracking line search is well-defined.

Sufficient Decrease Property
In this section, we show the sufficient decrease property of the sequence generated by A-APG Algorithm 3. If α_k satisfies the condition in Equation (13), then

ϕ(X_{k+1}) ≤ ϕ(X_k) − ρ_1 ‖X_{k+1} − X_k‖²,

where ρ_1 = min{η, γ} > 0. Since ϕ is bounded below, there exists ϕ* such that ϕ(X_k) ≥ ϕ* and ϕ(X_k) → ϕ* as k → +∞. This implies that Σ_k ‖X_{k+1} − X_k‖² < +∞, and hence ‖X_{k+1} − X_k‖ → 0 as k → +∞.

Bounded Gradient
Define two sets of iteration indices according to whether the restart condition is triggered. Combining Equations (14) and (15), it follows that ‖∇ϕ(X_{k+1})‖ can be bounded in terms of ‖X_{k+1} − X_k‖.

Subsequence Convergence
Since M is compact and {X_k} ⊂ M, there exist a subsequence {X_{k_j}} ⊂ M and X* ∈ M such that lim_{j→+∞} X_{k_j} = X*. Moreover, ϕ is bounded below, i.e., ϕ(X) > −∞, and ϕ(X_k) keeps decreasing. Hence, there exists ϕ* such that lim_{k→+∞} ϕ(X_k) = ϕ*. Summing the sufficient decrease inequality over k yields Σ_k ‖X_{k+1} − X_k‖² < +∞, so ‖X_{k+1} − X_k‖ → 0. Due to the bounded gradient property, ∇ϕ(X_{k+1}) → 0 as k → +∞. Considering the continuity of ϕ and ∇ϕ, we have lim_{j→+∞} ∇ϕ(X_{k_j}) = ∇ϕ(X*), which implies that ∇ϕ(X*) = 0.
Theorem 1. Assume that Propositions 4 and 5 are met. Let {X_k} be the sequence generated by A-APG Algorithm 3. Then, there exists a point X* ∈ M such that lim_{k→+∞} X_k = X* and ∇ϕ(X*) = 0.

Numerical Results
In this section, we offer two corresponding numerical examples to illustrate the efficiency of the derived algorithms. All code is written in Python. We denote by iteration and error the iteration count and the error of the objective function, respectively. We take the matrix order n as 128, 1024, 2048, and 4096.

Example 1.
Let the coefficient matrices be tridiagonal matrices in the Sylvester Equation (1). Set the matrix C_1 as the identity matrix. The initial step size is 0.01, which is small enough for the iteration to proceed. The parameters η_1 = 0.25 and ω_1 = 0.2 are taken randomly from (0, 1). Table 1 and Figure 1 show the numerical results of Algorithms 1-5.
It can be seen that the LGD, A-APG, and Newton-APG algorithms are more efficient than the other methods. Moreover, the iteration count does not increase as the matrix order increases, owing to the same initial value. The A-APG method achieves higher accuracy than the other methods. The Newton-APG method takes more CPU time but fewer iteration steps than the A-APG method, since the Newton method needs to compute a matrix inverse while enjoying quadratic convergence.
From Figure 1, the error curves of the LGD, A-APG, and Newton-APG algorithms are hard to distinguish, so we offer another example below.

Example 2.
Let the coefficient matrices be positive semi-definite matrices in the Sylvester Equation (1). Set the matrix C_2 as the identity matrix. The initial step size is 0.009. The parameters η_2 = 0.28 and ω_2 = 0.25 are taken randomly from (0, 1). Table 2 and Figure 2 show the numerical results of Algorithms 1-5. It can be seen that the LGD, A-APG, and Newton-APG algorithms take less CPU time than the other methods. Additionally, the error curves of the LGD, A-APG, and Newton-APG algorithms can be distinguished in Figure 2.
Remark 1. The difference in iteration counts between Examples 1 and 2 arises from the different initial values. It can be seen that the LGD, A-APG, and Newton-APG algorithms require fewer iteration steps. Whether the A-APG or the Newton-APG method yields fewer iteration steps varies from problem to problem. From Examples 1 and 2, we observe that the A-APG method has higher accuracy, although it takes more time and more iteration steps than the LGD method.
Remark 2. Moreover, we compare the performance of our methods with other methods, such as the conjugate gradient (CG) method, in Tables 1 and 2. We take the same initial values and set the error tolerance to 1 × 10^{−14}. From Tables 1 and 2, it can be seen that the LGD and A-APG methods are more efficient for solving the Sylvester matrix equation when the order n is small. When n is large, the LGD and A-APG methods have nearly the same convergence rate as the CG method.

Conclusions
In this paper, we have introduced the A-APG and Newton-APG methods for solving the Sylvester matrix equation. The key idea is to transform the Sylvester matrix equation into an optimization problem by using the Kronecker product. Moreover, we have analyzed the computational complexity and proved the convergence of the A-APG method. The convergence results and preliminary numerical examples show that the schemes are promising for solving the Sylvester matrix equation.

Table 1. Numerical results for Example 1.

Table 2. Numerical results for Example 2.