Comparative Performance Evaluation of an Accuracy-Enhancing Lyapunov Solver †

: Lyapunov equations are key mathematical objects in systems theory, analysis and design of control systems, and in many applications, including balanced realization algorithms, procedures for reduced order models, Newton methods for algebraic Riccati equations, or stabilization algorithms. A new iterative accuracy-enhancing solver for both standard and generalized continuous- and discrete-time Lyapunov equations is proposed and investigated in this paper. The underlying algorithm and some technical details are summarized. At each iteration, the computed solution of a reduced Lyapunov equation serves as a correction term to reﬁne the current solution of the initial equation. The best available algorithms for solving Lyapunov equations with dense matrices, employing the real Schur(-triangular) form of the coefﬁcient matrices, are used. The reduction to Schur(-triangular) form has to be done only once, before starting the iterative process. The algorithm converges in very few iterations. The results obtained by solving series of numerically difﬁcult examples derived from the SLICOT benchmark collections for Lyapunov equations are compared to the solutions returned by the MATLAB and SLICOT solvers. The new solver can be more accurate than these state-of-the-art solvers and requires little additional computational effort.


Introduction
Lyapunov equations are key mathematical objects in systems theory, analysis and design of (control) systems, and in many applications. Solving these equations is an essential step in balanced realization algorithms [1,2], in procedures for reduced order models for systems or controllers [3][4][5][6][7], in Newton methods for algebraic Riccati equations (AREs) [8][9][10][11][12][13][14], or in stabilization algorithms [12,15,16]. Stability analyses for dynamical systems may also resort to Lyapunov equations. Standard continuous-time or discrete-time Lyapunov equations, respectively, with symmetric matrix Y, Y = Y T , and T denoting the matrix transposition, are associated to an autonomous linear time-invariant system, described by δ(x(t)) = Ax(t), t ≥ 0, where x(t) ∈ IR n , and δ(x(t)) is either dx(t)/dt-the differential operator, or x(t + 1)-the advance difference operator, respectively. A necessary and sufficient condition for asymptotic stability of system (3) is that for any symmetric positive definite matrix Y, denoted Y > 0, there is a unique solution X > 0 of the Lyapunov Equation (1), or (2). Several other facts deserve to be mentioned. If Y is positive-semidefinite, denoted Y ≥ 0, and X > 0, then all trajectories of x(t) in system (3) are bounded.
If, in addition, the pair (Y, A) is observable, then system (3) is globally asymptotically stable. Another sufficient condition for global asymptotic stability is that Y > 0 and X > 0. If Y ≥ 0 and X ≥ 0, then A is not stable. If V(x) = x T Xx is a generalized energy, it follows that dV(x) dt = −x T Yx, in the continuous-time case, and V(x(t + 1)) − V(x(t)) = −x T Yx, in the discrete-time case, that is, x T Yx is the associated generalized dissipation. The function V(x) is a quadratic Lyapunov function. If X > 0, then V(x) = 0 implies x = 0.
For convenience, the often used notions and notation are given here.
respectively, where A, E ∈ IR n×n . The operator op(M) is often used in basic numerical linear algebra software [17,18], for increased generality and flexibility. A necessary solvability condition is that both A and E, for Equation (4), or either A or E, for Equation (5), are nonsingular. It will be assumed, without loss of generality, that E in Equation (5) (4) or (5) are stable Lyapunov equations. If Y ≥ 0, a stable Lyapunov equation has a unique solution X ≥ 0, that can be expressed and computed in a factored form, X = U T U, where U is the Cholesky factor of X [19]. The standard Lyapunov Equations (1) or (2) are special cases of the generalized Equations (4) or (5), where E is an identity matrix, E = I n , and op(M) = M.
There are applications for which the availability of the op(·) operator is important. Such an application is the computation of the Hankel singular values of a dynamical system, Eδ(x(t)) = Ax(t) + Bu(t), y(t) = Cx(t), for which, two related Lyapunov equations are defined, in the continuous-time case, and in the discrete-time case. The solutions P and Q of these equations are the controllability and observability Gramians, respectively, of system (6). The Hankel singular values are the nonnegative square roots of the eigenvalues of the matrix product QP. If the system (6) is stable, then P ≥ 0 and Q ≥ 0, and these properties imply that QP ≥ 0. But these theoretical results may not hold in numerical computations if the symmetry and semidefiniteness are not preserved by the solver. Some computed Hankel singular values may be returned as negative or even complex numbers. Such an example is given in [20]. This proves how important is to ensure the accuracy and reliability of the results. The recommended algorithm for this application, proposed in [19], for E = I n , and extended in [21] for a general matrix E, uses B and C directly, without evaluating BB T and C T C, and computes the Choleky factors R c and R o of the Hankel singular values are then obtained as the singular values of the product R c R o , which are all real nonnegative.
Many algorithms have been proposed to solve Lyapunov and more general linear matrix equations. The first numerically stable algorithm has been developed by Bartels and Stewart in [22] for Sylvester equation, AX + XB = C, where A ∈ IR n×n , B ∈ IR m×m , and C ∈ IR n×m , and also specialized for solving Lyapunov Equation (1). A transformation approach is used: A T and B are each reduced to a quasi-triangular form, using orthogonal transformations U and V, A = U T A T U, B = V T BV, and C is updated, C = U T CV. Then, a reduced equation, A T X + X B = C is solved by a special back substitution process, and its solution is transformed back to the solution of the original equation, X = U XV T . For standard Lyapunov equations, A is reduced to a quasi-triangular form or a real Schur form, but the rest of the procedure is similar. A more efficient algorithm for Sylvester equations with n ≥ m is based on the Hessenberg-Schur method [23], which reduces B T to quasi-triangular form and A to Hessenberg form. Clearly, this algorithm has no advantage for Lyapunov equations. Hammarling's algorithm [19] also uses the transformation approach for stable Lyapunov equations with Y ≥ 0, and computes the Cholesky factor of the solution. Many algorithmic and computational details for Sylvester and standard Lyapunov equations are given, e.g., in [12]. Computational improvements for solving the reduced equations have been proposed in [24][25][26]. An extension of Bartels-Stewart algorithm for generalized Lyapunov equations has been described in [21]. In this case, using two orthogonal matrices, the pair (A, E) is reduced to the generalized real Schur form [27], also called real Schur-triangular form, ( A, E), with A in a real Schur form and E upper triangular. Then, the right hand side Y is updated accordingly, the corresponding reduced Lyapunov equation is solved, and the result is transformed back to the solution of the original equation. A comprehensive recent survey of the theory and applications of linear matrix equations is [28].
It is worth to mention that solvers implementing Bartels-Stewart-like approaches can be used for small and medium size Lyapunov equations, with n currently less than a few thousands, due to their complexity of order n 3 . Large-order equations can be approached by iterative algorithms, usually exploiting sparsity and/or the low-rank structure, see [28] and the references therein. A compact conjugate-gradient algorithm is proposed in [29] for solving large-scale Equation (4) with factored, low-rank Y and symmetric positive definite matrices A and E. Iterative methods recorded a fast development in recent years for solving various linear and nonlinear problems. For instance, Kyncheva et al. [30] analyze the local convergence of Newton, Halley and Chebyshev iterative methods for simultaneous determination of all multiple zeros of a polynomial function over an arbitrary normed field, while [31] presents a new semi-local convergence analysis for Newton's method in a Banach space for systems of nonlinear equations. This paper investigates the accuracy and efficiency of several Lyapunov solvers for equations with dense matrices. Specifically, the state-of-the-art solvers from the Control System Toolbox [32] and SLICOT Library [20,33,34] (www.slicot.org), and a new accuracy-enhancing iterative solver, referred to as ArLyap, are considered. As in [35], the ArLyap solver has been derived as a special case of an ARE solver based on Newton's method, with or without line search [13,14,[36][37][38]. Actually, Lyapunov equations are simplified AREs, without the quadratic or rational matrix term. All these solvers are based on the best algorithms for Lyapunov equations with dense matrices: the algorithm in [22] and its generalization [21], both available in SLICOT. Relatively straitforward modifications of the ArLyap solver allow to use other algorithms for solving the reduced equations, for instance, Hammarling's or Penzl's algorithms in [19] or [21], respectively, for stable equations with Y ≥ 0.
The ArLyap solver offers an option for specifying an initial approximation, X 0 . It is possible, for instance, to use some upper or lower bounds of the solution, derived as described in [39]. Using tighter estimates may reduce the number of iterations for convergence. Another option is to use the op operator, enherited from the lower-level SLICOT solvers. This allows to compute the real Schur-triangular form of the pair (A, E) (or the real Schur form of A, when E = I n ) only once for obtaining both controllability and observability Gramians. It is not necessary to do these computations for (A T , E T ) (or A T ).
This paper extends the developments in [35] by using a specialized, more efficient algorithm, which iterates directly on reduced Lyapunov equations, with A in a real Schur form, and E upper triangular. The main computational modules involved, which are not available in BLAS [17] or LAPACK [18] Libraries, are also discussed.
The paper is structured as follows. Section 2 presents the numerical results for solving series of test examples from the SLICOT benchmark collections for Lyapunov equations, CTLEX [40] and DTLEX [41]. Section 3 further discusses the relevance of these results. Section 4 describes the underlying algorithm and the new computational modules. Section 5 concludes the paper.

Results
This section presents several results illustrating the performance of the accuracy-enhancing Lyapunov solver, ArLyap, in comparison to the state-of-the art Control System Toolbox [32] and SLICOT Library solvers. ArLyap solves reduced Lyapunov equations at each iteration. The same computational environment as in [35] has been used (64-bit Intel Core i7-3820QM, 2.7 GHz, 16 GB RAM, double precision, Intel Visual Fortran Composer XE 2015 and MATLAB 8.6.0.267246 (R2015b), Natick, MA, USA). An executable MATLAB R MEX-function has been linked using ten new subroutines, several SLICOT subroutines, and optimized LAPACK and BLAS libraries included in MATLAB. The results presented in this section and the next one are new, and complement those reported in [35].

Benchmark Examples
To make possible a comparison with previous results, obtained with the ALyap solver and reported in [35], the same SLICOT benchmark collections for Lyapunov equations, CTLEX [40] and DTLEX [41], have been used. These benchmarks allow to investigate the behavior of numerical methods in difficult situations and assess their correctness, accuracy, and speed. The collections contain parameter-dependent examples of scalable size (group 4). For convenience, the short notation TLEX will be used for both collections and their examples. TLEX examples are generated using several parameters: the order n, and parameters r, s, λ, and t, which define the numerical condition of the problem, that influences the accuracy of the solution and its sensitivity to small perturbations in the data matrices. Increasing the value of any of these parameters, including n, makes the problem more ill-conditioned. Very ill-conditioned examples can be built even for small values of n. The same values of these parameters as in [35] have been used (see Table 2 in [35]). Specifically, the sets of values for n, r, s, λ, and t are defined by the following lists: where the notation in MATLAB style i = k : l : m means that i takes the values k, k + l, k + 2l, . . . , m.
A series of equations has been generated for each TLEX example, using two or three nested loops. The series for TLEX 4.1 is produced by a loop for n = list_n, incorporating a loop for r = list_r, containing, in turn, a loop for s = list_s. The order of the loops is list_n, list_l, and list_s, for TLEX 4.2, and list_n and list_t, for TLEX 4.3 and TLEX 4.4. Each abscissa value in the figures below is the index of an example in a generated series. All figures in this paper are new.

Performance Analysis Issues
The accuracy of a computed solution,X, is assessed using the relative error, when the true solution, X, is known (i.e., for TLEX 4.1 and TLEX 4.3). In this formula, M F denotes the Frobenius norm of the matrix M. If X is unknown (i.e., for TLEX 4.2 and TLEX 4.4), the normalized residual with respect to X m , defined as is used by the performance analysis program, where R(X) is the residual matrix atX (see the definition of R(·) in Section 1), and X m is the solution computed by the MATLAB functions lyap or dlyap, for CTLEX and DTLEX examples, respectively. The usual definition of the normalized residual, used internally by the ArLyap solver to decide if convergence has been achieved, hasX instead of X m in its denominator. The use of X m in Equation (9) allows to make fair comparisons of the residual norms corresponding to all these solvers. In order to avoid too ill-conditioned examples, which cannot be reliably solved by any solver, the performance analysis program also estimates the reciprocal condition number, rcond, of Lyapunov equations, and may bound its value. The SLICOT-based MATLAB functions lyapcond and steicond are used as condition estimators for standard continuous-and discrete-time Lyapunov equations, respectively. The same functions are called for generalized Lyapunov equations, by replacing the matrices A and Y by E −1 A and E −T YE −1 , respectively. These estimators are using the exact solution X, when known, or the MATLAB computed solution X m , otherwise. The chosen sequence of parameter values for each example produces a zigzaggy variation of rcond.
As in [35], the equations with an estimated rcond smaller than   The SLICOT solver is the fastest, and it is closely followed by the ArLyap solver. The accuracy results ( Figure 1) for well-conditioned equations, i.e., with rcond close to 1, are slightly worse than those reported in [35], for reasons explained in detail in Section 3. However, for several ill-conditioned examples, such as those numbered 50, 54, 55, or 63, the relative errors for ArLyap are much smaller than for the ALyap solver.   Table 1 shows the values, rounded to three significant digits, of the normalized residuals for three examples of the CTLEX 4.1 series, using ALyap and ArLyap. The ArLyap solver obtained smaller normalized residuals, possibly in fewer iterations, than the ALyap solver. But the difference is that the pairs of the sets of values in Table 1 are for the solutions of original Lyapunov equations, and for reduced Lyapunov equations, respectively. As shown in Figure 1, the MATLAB function lyap sometimes obtained smaller normalized residuals than the other solvers for the CTLEX 4.1 series of examples. But lyap and dlyap use a balancing procedure before computing the real Schur(-triangular) form. The current results for SLICOT and ArLyap solvers are computed without any balancing. Moreover, lyap is in advantage in a comparison since all computations are done on the given data, while the other solvers get the matrices from the MATLAB context. But even this transfer involves a loss of accuracy. For instance, for CTLEX 4.1 with n = 3, and r = s = 1.9, the relative error between the MATLAB and Fortran representations of the matrix A is about 1.47 × 10 −15 , that is, almost one digit of accuracy has been lost. The matrix Y lost less accuracy, since its relative error is about 8.52 × 10 −16 . Therefore, the data matrices used by the solvers are not exactly the same.

Continuous-Time Lyapunov Equations
The normalized residuals for CTLEX 4.2 series of examples are slightly worse than in [35]. The rcond values can be even smaller than 10 −10 , and the solution can have a large Frobenius norm. The ArLyap solver is faster than the MATLAB function lyap, with few exceptions, and slightly slower than the SLICOT solver.

Discrete-Time Lyapunov Equations
Figures 9 and 10 show the relative errors and the number of iterations plus reciprocal condition numbers, respectively, for the DTLEX 4.1 series. A smaller internal tolerance, ε 2 M , has been used for deciding the convergence of the iterative process. This allowed to make additional iterations than usual in several cases, and reduce the errors. As for the CTLEX 4.1 series (see Figure 1), ArLyap and SLICOT solvers have comparable relative errors for well-conditioned equations, and hence worse than those reported in [35], but for several ill-conditioned examples, such as those numbered 55, 62, 65, 70-73, the relative errors for ArLyap are (much) smaller than those for the SLICOT solver and sometimes also than for ALyap. Figure 11 plots the elapsed CPU times for the three solvers.  The ArLyap solver returned after the first iteration in most cases. Consequently, its accuracy is comparable to that of the SLICOT solver. Again, ArLyap is generally more accurate than ALyap for ill-conditioned equations, but not for well-conditioned ones. Figure 14 shows the relative errors for the DTLEX 4.3 series of examples. The SLICOT and ArLyap solvers have comparable errors, which are often better, and sometimes much better, than the errors of the MATLAB function dlyap. However, the ALyap solver was more accurate than ArLyap for many examples.   Almost always, the ArLyap solver obtained much smaller relative residuals for the DTLEX 4.4 series of examples, as shown in Figure 15. In this case, the matrix X 0 has been chosen as X m , the solution computed by dlyap, in order to test the behavior for an initialization different from a zero matrix. In addition, the tolerance τ has been set to ε 2 M . This resulted in a larger number of iterations than usual for several examples, see Figure 16. The maximum number of iterations has been set to k max = 10. It should be mentioned that X m F is very big for the examples needing ten iterations. For instance, X m F ≈ 1.58 × 10 15 for the last example in the series. If X m F is limited to about 10 −3 /ε M ≈ 4.5 × 10 12 , the maximum number of iterations is seven (for two examples only) and the maximum normalized residual is 9.7 × 10 −13 .

Discussion
The ArLyap solver differs from its previous version, ALyap, dealt with in [35], by solving reduced Lyapunov equations at each iteration, without back transforming their solutions. This implied the use of the real Schur form of the matrix A, or of the real Schur-triangular form of the matrix pair (A, E) for residual matrix computation, which provided gains in efficiency, and expected gains in accuracy, by exploiting the (almost) triangular structure of these matrices. More details will be given in Section 4. However, the numerical results have shown slightly worse accuracy for some equations in    Normalized residuals

Normalized residuals for Example 4.4 from CTLEX collection
Reduced eq. Original eq.   Normalized residuals

Normalized residuals for Example 4.3 from DTLEX collection
Reduced eq. Original eq. Currently, the ArLyap solver uses the normalized residuals for the reduced Lyapunov equations to decide convergence. While these residuals should theoretically coincide to those for the original equations, there is a big discrepancy between their numerical values. In addition, less iterations are needed for deciding that the convergence has been achieved. These issues could make the final errors or residuals (computed by the external MATLAB program, not by the solver) to be sometimes larger than those obtained using the ALyap solver. It can be seen that in most cases the trajectories of the normalized residuals for the original equations are comparable in shape and magnitude to the trajectories of the relative errors or normalized residuals computed externally, and shown in the previous section. It should be emphasized that this increase in the normalized residual values is produced just by the back transformation (with orthogonal matrices!) of the solutions of reduced Lyapunov equations obtained at the end of the iterative process, and by recomputing the residuals using A (or A and E) in Equations (1) or (2) (or in (4) or (5)).
Even in computations with orthogonal matrices, rounding errors can significantly perturb the results. For instance, using the first CTLEX 4.1 example in the generated series, with n = 5, r = s = 1.1, if Q is the orthogonal matrix reducing A to a real Schur form, A = Q T AQ, and Y = Q T YQ is the transformed matrix Y in Equation (1), then Q YQ T − Y F ≈ 4.32 × 10 −14 , and Q YQ T − Y F / Y F ≈ 9.17 × 10 −16 , while these values should theoretically be zero. If X is the solution of the corresponding reduced Lyapunov equation, A T X + X A = − Y, computed using MATLAB function lyap, its normalized residual is about 3.39 × 10 −16 , but the normalized residual of the solution of the original equation, X = Q XQ T is about 1.67 × 10 −15 , i.e., about five times bigger than for X. This increase is produced by the two multiplications, with Q and Q T . Similarly, for the last CTLEX 4.1 example in the generated series, with n = 20, r = 1.9, s = 1.1, the normalized residual for X is over 307 times bigger than for X. Such residual magnification could be attenuated only by using computations with extended precision.
To prove that the back transformation step increases the normalized residuals, the CTLEX 4.1 series of examples has been solved by ArLyap with the additional condition to exit after the first iteration. The ratios between the corresponding normalized residuals for the original and reduced equations have been computed. While the normalized residuals in these two cases should theoretically coincide, the computed values had ratios in the interval [4.62, 420.43], with a mean value of about 54.78. This proves that the back transformation step always increased the normalized residuals, possibly by more than two orders of magnitude. However, the relative errors of the two solvers for this test are comparable. Specifically, the ratios of these errors for the ArLyap  As shown before, even if the normalized residual of the last iterate computed by ArLyap, X k , is very small, the normalized residual of the computed solution of the original equation, Q X k Q T , can be much larger. The previous version of the accuracy-enhancing solver, ALyap, could sometimes achieve more accurate results, with some additional computational effort, by iterating directly on the matrices Q X j Q T , j = 0 : k. Usually, the residual matrices of its iterates, and hence the corrections applied in the process, have larger norms than for the ArLyap solver (see Table 1).
For most of the tests, the default tolerance has been used, to make comparisons with [35] possible. However, the ArLyap solver produces smaller normalized residuals during iterations than the ALyap solver. Consequently, ArLyap can often return after one or two iterations. Indeed, with the default tolerance, all 75 examples generated for the CTLEX 4.1 series needed a total of 124 iterations, hence the mean value is about 1.65 iterations. This suggested to use a smaller tolerance, hoping for more accurate final results. With a tolerance τ = 10 −6 ε, 165 iterations were required, i.e., the mean value increased to 2.17. For both tolerance values above, the maximum number of iterations was five. Some results have been slightly improved, but not the global statistics, such as the mean of normalized residuals for the series of examples. Exactly the same results have been obtained with τ = 10 −14 ε. The reason is that there is an internal test preventing further iterations if the normalized residual increased from one iteration to the next one. In such a case, the previous iterate is restored and returned as the solution. Such an increase is often a sign that the limiting accuracy has been attained, and further iterations could be purposeless. Even if the residual could be further decreased by chance, such a decrease will be rather small and will not justify spending additional computational effort. Further numerical experiments confirmed this conclusion. Indeed, the calculations have been repeated using a test which enabled the iterative process to continue if the current normalized residual is smaller than ten times the previous normalized residual value. But then, the normalized residual trajectory may either arrive to a constant value, or behave periodically, or have all further values in a small range. An exception occurred for the CTLEX 4.2 example, with n = 10, λ = −0.6, s = 1.5. The normalized residual had the following values during iterations 6.72 · 10 −1 , 1.04 · 10 −16 , 1.90 · 10 −16 , 9.72 · 10 −17 , 2.45 · 10 −17 , 1.91 · 10 −16 , 1.92 · 10 −17 , 1.91 · 10 −16 , 6.73 · 10 −18 , 9.07 · 10 −18 , 5.67 · 10 −18 , showing that it increased three times. After each increase, the values decreased in the next one or two iterations. The last value is the smallest. The typical situation is, however, that the normalized residual at the iteration before the first such increase is either the minimum, or at most four times larger than the minimum, but often it is much closer.
It is almost impossible to find the best strategy for deciding when to stop. Sometimes, after a local increase of the normalized residual, the next few iterations will continuously decrease its value, but then another increase could appear, and the previously found minimum value could not be further reduced. Since the normalized residuals trajectory is optionally returned by the ArLyap solver, one possible strategy would be to find the minimum normalized residual value, and call the solver again with the maximum number of iterations, k max , set to the corresponding value. Such a strategy could be useful when accuracy is very important.
There are several directions in which this research can continue. One direction is to combine the previous and current versions of the accuracy-enhancing solver. Specifically, after two-three iterations with ArLyap, one can switch to the computations updating the solution of the original equation at each of the next iterations. Another direction is to refine the stopping strategy, by allowing the iterative process to continue if the normalized residual at a certain iteration exceeds its value at the previous iteration by more than, e.g., two times, but stop the process by restoring the previous iterate at the second detection of a residual increase. Finally, it could be tried to perform the back transformation in quadruple precision. The IEEE standard 754-2008 specifies quadruple and even octuple precision, and some Fortran compilers allow quadruple precision. Moreover, it could be worth trying to make full computations in Fortran, including data input and evaluation of the results. It is expected that better accuracy will be obtained this way.

Materials and Methods
The fact that Lyapunov equations retain only the linear part of AREs suggested that some ARE solvers might be specialized for solving them. Previous successful experience with the algorithms for AREs based on Newton's method, with or without line search [36][37][38], recommended them as good candidates. Recently, the author adapted the Newton-based ARE solver to Lyapunov equations. The conceptual algorithm in [35] is briefly discussed in following subsection and further improved in the next subsections for achieving the highest efficiency.

Conceptual Algorithm Description
Starting from a given initial solution, X 0 , or with X 0 = 0, the algorithm computes the current residual matrix (at iteration k), R(X k ), defined as for a continuous-or discrete-time equation, respectively. Then, a generalized (or standard, if E = I n ) Lyapunov Equation (12) or (13), respectively, which has the current residual matrix in the right hand side, is solved in L k , and the current solution is updated, X k+1 = X k + L k . The main termination criterion for the iterative process is defined based on the normalized residual, r k := r(X k ), and a tolerance τ. Specifically, if the computations are terminated with the computed solution X k . A default tolerance is used if τ ≤ 0 is given on input. Its value is defined by one of the formulas below for Equations (4) and (5), respectively, Another termination criterion is the MATLAB-style relative residual, r r (X k ), defined as the ratio between R(X k ) F and the sum of the Frobenius norms of the matrix terms in Equation (4) or (5). In addition, if L k F ≤ ε M X k F the iterative process terminates with the computed solution X k .
For increased efficiency, A and E are reduced at iteration k = 0 to the real Schur-triangular form, using two orthogonal transformations, Q and Z, namely where A is block upper triangular with diagonal blocks or order 1 and 2, corresponding to real and complex conjugate eigenvalues, respectively, and E is upper triangular. Then, the right hand side of Equation (12) or (13) is transformed A so-called reduced equation, Equation (18) or (19), respectively, is solved for L k . Finally, L k is back transformed, and used to improve the current solution estimate, X k .

New Algorithm
It will now be shown that it is not necessary to transform the solution of the reduced Lyapunov equations, L k , back to L k , except for the final iteration. Indeed, using the notation introduced above, let X k := Q T X k Q, if op(M) = M, and X k := Z T X k Z, if op(M) = M T . For brevity, only the first case will be considered, since the second case is similar. From Equation (16), it follows that A = Q AZ T , and E = Q EZ T , so that replacing A and E in Equation (4), we get Z A T Q T XQ EZ T + Z E T Q T XQ AZ T = −Y, and premultiplying by Z T , postmultiplying by Z, and setting X := Q T XQ, Y := Z T YZ, this formula becomes Similarly, Equations (10) and (17) imply But from Equation (18), Adding the last two equations, it follows that X k+1 := X k + L k solves Equation (21), hence X k+1 = X k + L k theoretically solves Equation (4). Since X k and X k are related by a similarity transformation ( X k := Q T X k Q or X k := Z T X k Z), which preserves their eigenvalues, it follows that X k F = X k F . The same is true for R(X k ) and R(X k ). Therefore, the normalized residuals for X k and X k also coincide (from Equation (9) withX and X m replaced by X k ). The same argument shows that the tolerance τ in Equation (15), computed for the given matrices, A, E, and Y, coincides with its value computed for the transformed matrices, A, E, and Y. This proves that the whole iterative process can be performed solving only reduced Lyapunov equations. Just at the final iteration, after convergence, the solution of the reduced equation should be used for computing Q( X k + L k )Q T .
The same arguments as above can be used for solving Equation (4) with op(A) = A T , or for solving discrete-time Lyapunov Equation (5).
It is important to emphasize that, in theory, there is no need for an iterative process, but this can be useful in practice, due to numerical errors and possibly bad numerical conditioning of a Lyapunov equation.
The new algorithm can be stated as Algorithm 1. Compute the residual matrix R(X k ) 6: for Equation (4) or (5), respectively.

7:
If r k := R( X k ) F / max(1, X k F ) ≤ τ, exit the loop. 8: Solve in L k the reduced Lyapunov Equation (18) or (19), respectively. 9: Update X k+1 = X k + L k . 10: end for 11: Compute X k = Q X k Q T , if op(M) = M, or X k = Z X k Z T , if op(M) = M T and return X k . 12: If k = k max , "Convergence has not been achieved."

Computational Modules for Improving Efficiency
Solving only reduced Lyapunov equations decreases the computational effort with about 1.5n 3 floating point operations (flops) per iteration, by avoiding the back transformation of L k to L k in Equation (20). (This evaluation assumes that the symmetry is exploited.) Additional gains in efficiency can be obtained by simplifying the computation of residuals, since A is in a Schur form, and E is upper triangular. Before commenting on how these improvements could be obtained, few remarks come in order. It is worth mentioning that high-quality numerical software makes references only to the needed part of an array storing a matrix. For instance, only the entries of an upper (or lower) triangle of a symmetric matrix are referenced. All elements on the first subdiagonal of a real Schur matrix are also referenced, and the position of its zero values defines the 1 × 1 or 2 × 2 blocks (needed, e.g., for computing the eigenvalues). Note that a matrix in upper Schur form is a special case of an upper Hessenberg matrix which has no two consecutive nonzero subdiagonal elements.
A professional implementation of the ArLyap solver would need to consider several basic computational modules, which are not available in BLAS [17], LAPACK [18], or SLICOT libraries. Specifically, such modules are described below.

1.
Compute R := αR + β( op(H) T X + X op(H) ), with H an upper Hessenberg matrix and X a symmetric matrix. This is a special symmetric "rank 2k operation" (a specialized version of the BLAS 3 routine syr2k), needed, e.g., for solving standard continuous-time reduced Lyapunov Equation (18), with E = I n .

2.
Compute R := αR + β op(H) T X op(H) , with H an upper Hessenberg matrix and X a symmetric matrix. This operation is necessary for solving standard or generalized discrete-time reduced Lyapunov Equation (19). Let diag(X), triu(X), and tril(X) denote the diagonal, upper and lower triangles of X, respectively, and define two, upper and lower, respectively, triangular matrices

4.
Compute R := αR + β op(E) T X op(E) , with E an upper triangular matrix and X a symmetric matrix. This operation is needed for solving generalized discrete-time reduced Lyapunov Equation (19), and it can be performed using the formulas: Note that UE, EU, L T E, and EL T are all upper triangular matrices. Hence, each of these four formulas involve a special symmetric rank 2k operation on an upper triangular pair. This module needs the product of two upper triangular matrices, expressed as UE, or EU, or L T E, or EL T , with U and E upper triangular, and L lower triangular. This is easily done internally using BLAS 2 function trmv in a loop with n cycles.

5.
Compute , with E and U upper triangular matrices. This module is called by the module 4. 6. Compute with H an upper Hessenberg matrix and X a symmetric matrix, given either the upper triangle U or the lower triangle L of X. This module is needed for computing the relative residual for standard continuous-time reduced Lyapunov equations, since it allows to evaluate the Frobenius norm of this matrix product (which is a term of that equation). Using X = U + U T , or X = L T + L, where U and L = U T are strictly upper and lower triangular, respectively, the module evaluates the product using BLAS 2 trmv function and other routines. Clearly, both HU and L T H are upper Hessenberg, but the results of this module are full matrices. Using Equation (24), the function of the module 1 becomes R := αR + β(P + P T ). However, this formula should only be used when relative residual is needed, and hence P should be computed. 7. Compute with H an upper Hessenberg matrix, X a symmetric matrix, and E an upper triangular matrix. This operation is needed for solving generalized continuous-time reduced Lyapunov Equation (18). Using Equation (23) where X = U + L, and U = L T . Note that UE, EU, L T E, and EL T are all upper triangular, and UH, HU, L T H, and HL T are all upper Hessenberg. Consequently, each of these four formulas involve two special symmetric rank 2k operations for upper Hessenberg-triangular pairs. 8.
Compute R := αR + β( op(H) T op(E) + op(E) T op(H) ), with H an upper Hessenberg matrix and E an upper triangular matrix. This operation is called by the module 7. 9.
Compute either P or P T , where P := op(H) T X op(E) , with H an upper Hessenberg matrix, X a symmetric matrix, and E an upper triangular matrix. This module is needed for evaluating the Frobenius norm of P, used to obtain the relative residual for generalized continuous-time reduced Lyapunov equations. The matrix R in Equation (25) becomes R := αR + β(P + P T ). However, this formula should only be used when relative residual is needed. Note that P is a general matrix, with no structure. The computations can be performed as follows: Note that the Frobenius norms of P and P T coincide, and R can be obtained having either P or P T .
All modules operating with a symmetric matrix must use either the upper, or the lower triangle of an array storing that matrix. Similarly, for an upper Hessenberg matrix, the entries below the first subdiagonal should not be referenced. The modules discussed above represent an extension of the BLAS library, extension which is important for the ArLyap solver, but can be used for other applications as well.
For large order Lyapunov equations, it would be necessary to provide block variants for some of the modules above. As an example, consider the operation HX, with H upper Hessenberg, and X symmetric. Since in the ArLyap solver, H is actually in a real Schur form, let us partition, where H ii ∈ IR n i ×n i , i = 1, 2, n 1 + n 2 = n, and H n 1 +1,n 1 = 0. Clearly, H 11 X 11 and H 22 X 22 can be computed with the module 6, H 12 X T 12 requires a BLAS 3 operation gemm, H 11 X 12 and H 22 X T 12 can be evaluated with an easy extension of the BLAS 3 operation trmm, and H 12 X 22 is obtained by BLAS 3 operation symm. These ideas can be generalized for finer partitions and for other modules above.

Conclusions
A new accuracy-enhancing solver for standard and generalized continuous-and discrete-time Lyapunov equations, has been proposed and investigated. The underlying algorithm and some technical details have been summarized. The best available algorithms for solving Lyapunov equations with dense coefficient matrices, based on the orthogonal reduction to the real Schur(-triangular) form are used in the implementation. The Schur(-triangular) reduction is performed only once, before starting the iterative process. During the iterations, reduced Lyapunov equations are solved. The result of the last iteration is back transformed to obtain the solution of the original equation. How the computations can be organized to increase the efficiency by exploiting the structure and symmetry is also detailed. The numerical results found when solving series of numerically difficult examples generated using SLICOT benchmark collections CTLEX and DTLEX are compared to the solutions computed by the MATLAB and SLICOT solvers. The ArLyap solver can be more accurate than the other solvers, especially for ill-conditioned equations, without a significant additional computational effort. Actually, with very few exceptions, the ArLyap solver is faster than the MATLAB solvers, and close to the SLICOT solvers regarding the elapsed CPU times.

Conflicts of Interest:
The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: