Abstract
Consider the computation of the solution for a class of discrete-time algebraic Riccati equations (DAREs) with a low-ranked coefficient matrix G and a high-ranked constant matrix H. A structured doubling algorithm is proposed for large-scale problems in which A is also of low rank. Compared to the existing doubling algorithm, whose flop count at the k-th iteration grows exponentially with k, the newly developed version merely needs a fixed amount of work for preprocessing plus a negligible amount per iteration, and is better suited to large-scale computations when the rank of A is far smaller than n. The convergence and complexity of the algorithm are subsequently analyzed. Illustrative numerical experiments indicate that the presented algorithm, which consists of a dominant, time-consuming preprocessing step and a trivially cheap iterative step, is capable of computing the solution efficiently for large-scale DAREs.
1. Introduction
Consider a discrete-time control system

$$x_{k+1} = A x_k + B u_k, \quad k = 0, 1, \ldots,$$

where $A \in \mathbb{C}^{n \times n}$ and $B \in \mathbb{C}^{n \times m}$ with $m \ll n$. Here, $\mathbb{C}^{p \times q}$ stands for the set of $p \times q$ complex matrices. The linear quadratic regulator (LQR) control minimizes the energy, or the cost functional,

$$\mathcal{J}(u) = \sum_{k=0}^{\infty} \left( x_k^* H x_k + u_k^* R u_k \right),$$

with the Hermitian constant term H being positive semi-definite and R being Hermitian positive definite []. Here, the symbol "*" denotes the conjugate transpose of a vector or a matrix.
The corresponding optimal control is

$$u_k = -F x_k,$$

and the feedback gain matrix

$$F = (R + B^* X B)^{-1} B^* X A$$

can then be expressed in terms of the unique positive semi-definite stabilizing solution X of the discrete-time algebraic Riccati equation (DARE) []

$$X = A^* X (I + G X)^{-1} A + H, \tag{1}$$

where $G = B R^{-1} B^*$ with R Hermitian positive definite, and H is Hermitian and positive semi-definite. In many control problems, the matrix A is sparse in the sense that the matrix-vector product $Av$ and the inverse-vector product $A^{-1}v$ each require only $O(n)$ flops. Recent applications of the discrete-time control system can be found in [], such as the wheeled robot and the airborne pursuer. There are also some applications (e.g., the singular Kalman filter) of the fractional Riccati equation; see [,] and the references therein.
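As a small illustration of the formulas above, the sketch below forms the gain F from a solution X of (1); the dimensions and the placeholder matrices (including X itself) are illustrative data, not taken from any application in the paper.

```matlab
% Minimal sketch: form the LQR feedback gain from a solution X of (1).
% All data below are placeholders for illustration only.
n = 4; m = 2;
A = randn(n); B = randn(n, m); R = eye(m);
X = eye(n);                           % stands in for the true stabilizing solution
F = (R + B' * X * B) \ (B' * X * A);  % gain of the optimal control u_k = -F*x_k
```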
The existence of the unique positive semi-definite solution X of the DARE (1) has been well studied when $(A, B)$ is d-stabilizable and $(A, H)$ is observable; see [,] and their references for more details. The structure-preserving doubling algorithm (SDA) is one of the most efficient methods [] to compute the unique positive semi-definite solution X via the following iteration

$$
\begin{aligned}
A_{k+1} &= A_k (I + G_k H_k)^{-1} A_k, \\
G_{k+1} &= G_k + A_k (I + G_k H_k)^{-1} G_k A_k^*, \\
H_{k+1} &= H_k + A_k^* H_k (I + G_k H_k)^{-1} A_k,
\end{aligned} \tag{2}
$$

with $A_0 = A$, $G_0 = G$, $H_0 = H$. Regardless of the structure of the coefficient matrices, the computational complexity of each iteration is about $O(n^3)$ flops, obviously not fitting for large-scale problems. When the constant matrix H is low-ranked, the solution X is commonly numerically low-ranked and can be approximated in terms of a series of decomposed matrix factors, making the SDA feasible for large-scale DAREs []. If only the feedback gain matrix F is required, without outputting the solution X, an adaptive version of the SDA in [] still works for large-scale problems even if H is high-ranked. In that case, the solution X is no longer numerically low-ranked but can be stored as a sequence of matrix-vector products []. In both situations, however, the flop count of the SDA at the k-th iteration grows exponentially with k (roughly doubling at each step), resulting in intolerable iteration time when k is large.
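For reference, a minimal dense implementation of the recursion (2) might look as follows. This is the textbook $O(n^3)$-per-step scheme discussed above, not the structured Algorithm 1 developed later; the function name, tolerance, and stopping rule are our own illustrative choices.

```matlab
% A minimal dense SDA sketch for the DARE X = A'*X*(I + G*X)^(-1)*A + H.
% This exhibits the O(n^3)-per-iteration cost of (2); names are illustrative.
function [X, Y, k] = sda_dense(A, G, H, tol, kmax)
  I  = eye(size(A, 1));
  Ak = A; Gk = G; Hk = H;                 % A_0 = A, G_0 = G, H_0 = H
  for k = 1:kmax
    W    = I + Gk*Hk;                     % n-by-n kernel: the O(n^3) bottleneck
    WA   = W \ Ak;                        % (I + Gk*Hk)^{-1} * Ak
    Hnew = Hk + Ak' * (Hk * WA);          % H_{k+1}, converging to X
    Gk   = Gk + Ak * (W \ (Gk * Ak'));    % G_{k+1}, converging to the dual solution Y
    Ak   = Ak * WA;                       % A_{k+1}, converging to zero
    if norm(Hnew - Hk, 'fro') <= tol * norm(Hnew, 'fro')
      Hk = Hnew; break;                   % quadratic convergence: few iterations
    end
    Hk = Hnew;
  end
  X = Hk; Y = Gk;
end
```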
In this paper, we consider DAREs with A of the low-ranked structure (which may not be sparse)

$$A = U V^*, \tag{3}$$

with $U, V \in \mathbb{C}^{n \times r}$ and $r \ll n$. The motivation behind this is twofold: the complexity of the SDA at the k-th iteration can be reduced further in this case, and DAREs with the structure (3) have several applications in circuit-control areas, for example, circuit systems whose mesh inductance matrices are composed of products of several mesh matrices (n is the number of meshes) and whose resistance matrix is S []. To obtain the optimal feedback gain to control the circuit system, one is required to find the solution of the DARE (1).
The main contribution we make under the low-ranked structure (3) is that the computational cost of the SDA at the k-th iteration is reduced to operations on small-scale kernels only, far less than the cost of the preprocessing when $r \ll n$. As a result, the most time-consuming part of the SDA lies in the preprocessing step, whose computational cost is fixed, while the remaining iteration part is correspondingly insignificant. Numerical experiments are implemented to validate the effectiveness of the presented algorithm, constituting a useful complement to the solvers for computing the solution of DAREs.
The rest of the paper is organized as follows. In Section 2, we develop the structured SDA for DAREs with a low-ranked A and establish its convergence. A detailed complexity analysis, as well as the design of the termination criterion, is given in Section 3. Section 4 is devoted to numerical experiments indicating the efficiency of the proposed algorithm, and the conclusion is drawn in the last section.
Notation. Symbols $\mathbb{R}^{m \times n}$ and $\mathbb{C}^{m \times n}$ in this paper stand for the sets of $m \times n$ real and complex matrices, respectively. $I_n$ is the $n \times n$ identity matrix. For a matrix A, $\sigma(A)$ and $\rho(A)$ denote, respectively, the spectrum and the spectral radius of A. A Hermitian matrix satisfies $A > 0$ ($A \geq 0$) when all its eigenvalues are positive (non-negative). Additionally, $A > B$ ($A \geq B$) if and only if $A - B > 0$ ($A - B \geq 0$).
We also need the concept of the numerically low-ranked matrix.
Definition 1.
([]) A matrix A is said to be numerically low-ranked with respect to a tolerance ϵ if the number of its singular values exceeding ϵ is bounded by a constant associated with ϵ but independent of the size of A.
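A numerical-rank check in the spirit of Definition 1 simply counts the singular values above a tolerance; in the sketch below, the function name `numrank` and the relative scaling of the threshold are our own illustrative choices.

```matlab
% A possible numerical-rank check in the spirit of Definition 1: count the
% singular values above eps_tol (relative to the largest one). The name
% 'numrank' and the relative thresholding are illustrative choices.
function r = numrank(A, eps_tol)
  s = svd(A);                       % singular values, largest first
  r = sum(s > eps_tol * max(s));    % numerical rank w.r.t. eps_tol
end
```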
2. Structured Doubling Algorithm
In this section, we describe the structured iteration scheme for DAREs with a high-ranked constant term H and a low-ranked A as in (3). To avoid the inversion of large-scale matrices, the Sherman–Morrison–Woodbury formula (SMWF) [,] is first applied to the sparse-plus-low-ranked matrices to represent the corresponding structured matrices. Then, we aim at preserving the sparsity or the low-ranked structure of the iteration sequence rather than forming it explicitly. As a result, the SDA can be implemented using only some small-scale matrices, referred to as kernels, and the complexity of the iteration becomes negligible compared with that of the preprocessing step for large-scale problems. A generic sketch of the kind of SMWF-based solve meant here is given below.
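The following sketch solves a system with a matrix D plus a rank-r correction $UV^*$ via the SMWF; it assumes D admits cheap solves (e.g., sparse or diagonal), and all names are illustrative rather than the paper's kernel notation.

```matlab
% Generic SMWF sketch: solve (D + U*V') x = b without forming the n-by-n sum,
% assuming solves with D are cheap (e.g., D sparse or diagonal). All names
% are illustrative, not the paper's kernel notation.
function x = smw_solve(D, U, V, b)
  Db = D \ b;                        % cheap solve with the structured part
  DU = D \ U;                        % n-by-r block of solves
  K  = eye(size(U, 2)) + V' * DU;    % small r-by-r kernel
  x  = Db - DU * (K \ (V' * Db));    % Sherman-Morrison-Woodbury correction
end
```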
2.1. Iteration Scheme
Given the initial kernel matrices assembled from U, V, S, B, and H in the preprocessing step, the SDA is organized according to the factored format (4), in which every update acts only on small-scale kernels, for $k = 0, 1, \ldots$. One merit of the scheme (4) is that the sizes of certain kernels remain invariant during the iterations. Although the number of columns of the low-rank factors and the sizes of the associated kernels increase linearly with respect to k, the growth is generally small due to the fast convergence of the SDA. The iterate $H_k$ then still hopefully maintains a numerically low-ranked structure relative to the constant term and can be derived and stored in an economical way; a small sketch of this kind of factored storage follows.
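To make the factored storage concrete, the sketch below grows a tall factor by column concatenation and a kernel block-diagonally, consistent with the linear-in-k growth described above; the variable names and placeholder data are ours, and the exact scheme (4) differs in its details.

```matlab
% Factored-storage sketch consistent with the linear-in-k growth described
% above: the tall factor gains columns and the kernel grows block-diagonally.
% All names and data are illustrative; the actual scheme (4) differs in detail.
n = 2000; r = 5;
H    = sprandsym(n, 0.01);              % high-ranked constant term (placeholder)
Zk   = randn(n, r);  Kk   = eye(r);     % current factors of the low-rank part
Znew = randn(n, r);  Knew = eye(r);     % increment from one doubling step
Zk = [Zk, Znew];                        % columns grow linearly with k
Kk = blkdiag(Kk, Knew);                 % kernel grows accordingly
Hk_times = @(v) H*v + Zk*(Kk*(Zk'*v));  % apply the iterate without forming it
```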
By applying the Sherman–Morrison–Woodbury formula (SMWF) [] to the kernel matrix $I + G_k H_k$ in (2), the required inverse can be expressed through small-scale kernels, as displayed in (6).
The main computational tasks of (6) are the updates of the kernels appearing in it and the solutions of two linear systems associated with the small-scale kernel matrix. Regardless of the concrete structure, a direct implementation of these calculations still involves the large dimension n [,]. A deeper observation made here shows that such computations can be brought further down to a cost that no longer grows with n, far less than that of the preprocessing for large-scale problems with $r \ll n$. In fact, it follows from (6) that the quantity in (7) admits a factored representation, and analogous factored representations hold for the other iterates. Furthermore, the update of the kernel matrix in (8) involves only small-scale factors, so its size is independent of n. Now, suppose that the required large-scale products of the factors U, V, B, and H are made available in the preprocessing step; then the quantity in (7) does not require additional computations. Additionally, the remaining iterates can be obtained by updating several small-scale matrix multiplications and replicating them the appropriate number of times (the factors from the last iteration are assumed to be available). Consequently, the computation left lies in solving two small-scale linear systems. We summarize the whole process in Algorithm 1 below; the concrete complexity analysis in the next section shows that each iteration involves only small-scale kernels and its cost is negligible.
Remark 1.
The output matrices are numerically low-ranked with respect to the tolerance ϵ; the limit kernel is the matrix arising from the convergence of the kernel sequence established in the next subsection.
Remark 2.
The QR decomposition computed in the preprocessing step serves the derivation of the relative residual and, therefore, can also be implemented there. The computational cost of the preprocessing part is dominated by several large-scale matrix multiplications and this economic QR decomposition, and it takes the dominant share of the CPU time compared with the iteration part.
Remark 3.
The computations of the iteration part and of the relative residual of the DARE involve only small-scale kernels, costing much less than the preprocessing part when $r \ll n$. Hence, the main computation of Algorithm 1 is concentrated in the preprocessing part.
2.2. Convergence
To establish the convergence of Algorithm 1, we first review some results for iteration format (2).
Algorithm 1. Structured SDA for DAREs.

Input: the low-rank factors U and V of A in (3), S, B, H, and prescribed tolerances;
Output: the factored approximations of the solutions of the DARE (1) and its dual (9), and the normalized relative residual NRRes;
Preprocess: compute the required large-scale matrix products and the economic QR decomposition used for the residual evaluation;
Iteration: set the initial kernels from the input data;
For k = 0, 1, 2, …, do until convergence:
    Compute the relative residual NRRes as in (11);
    If NRRes is below the prescribed tolerance, set the output factors and exit;
    End If
    Update the small-scale kernels of the iteration scheme;
    Obtain the quantity in (8) with the preprocessed matrices;
    Set k ← k + 1;
End Do
Theorem 1.
([]) Let X and Y be the Hermitian positive semi-definite solutions of the DARE (1) and of its dual equation

$$Y = A Y (I + H Y)^{-1} A^* + G, \tag{9}$$

respectively, and set $S = (I + GX)^{-1}A$. Then the sequences generated by (2) satisfy

$$\limsup_{k \to \infty} \|A_k\|^{1/2^k} \le \rho(S), \qquad \limsup_{k \to \infty} \|H_k - X\|^{1/2^k} \le \rho(S)^2, \qquad \limsup_{k \to \infty} \|G_k - Y\|^{1/2^k} \le \rho(S)^2. \tag{10}$$

It follows from (10) that the sequences $A_k$, $H_k$, and $G_k$ converge quadratically to zero, X, and Y, respectively, provided $\rho(S) < 1$. By noting the factored decomposition of $A_k$ maintained in Algorithm 1, the associated kernel sequence must converge to zero. On the other hand, the factored decomposition of $H_k$ implies that its kernel sequence converges to some limit matrix through which the solution X of the DARE is recovered. At last, the factored decomposition of $G_k$ indicates that the solution Y of the dual DARE has a numerically low-ranked decomposition with respect to a sufficiently small tolerance ϵ. So, we have the following corollary.
Corollary 1.
Suppose that X and Y are the Hermitian and positive semi-definite solutions of the DARE (1) and its dual form (9), respectively. Then, for Algorithm 1, the sequence $A_k$ converges to the zero matrix quadratically, and $H_k$ converges quadratically to the solution X. Moreover, for sufficiently large k, the iterate $G_k$ is numerically low-ranked with respect to the tolerance ϵ. That is, the solution Y of the dual Equation (9) has a low-ranked factored approximation whose factors have sizes associated with ϵ but independent of the size of Y.
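A standard way to keep such factored approximations compact is to recompress them. The sketch below truncates a Hermitian product $Z K Z^*$ to its numerically significant part via an economic QR and a small eigendecomposition; the function and variable names are illustrative.

```matlab
% Recompress a factorized Hermitian product Z*K*Z' to its numerical rank,
% a standard truncation step consistent with Corollary 1. Names illustrative.
function [Zr, Kr] = recompress(Z, K, eps_tol)
  [Q, R] = qr(Z, 0);                       % economic QR of the tall factor
  M = R * K * R';                          % small core, Z*K*Z' = Q*M*Q'
  [V, D] = eig((M + M') / 2);              % Hermitian eigendecomposition
  [d, p] = sort(abs(diag(D)), 'descend');  % order modes by magnitude
  idx = p(d > eps_tol * max(d));           % keep numerically significant modes
  Zr = Q * V(:, idx);  Kr = D(idx, idx);   % truncated factors
end
```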
3. Computational Issues
3.1. Residual and Stop Criterion
Recalling the low-ranked structures of G and A, the residual of the DARE at an approximate solution $\widetilde{X}$ is

$$\mathcal{R}(\widetilde{X}) = \widetilde{X} - A^* \widetilde{X} (I + G \widetilde{X})^{-1} A - H,$$

which, by the factored form of $\widetilde{X}$ delivered by Algorithm 1, can be assembled from a tall factor and a small kernel.
Let the economic QR decomposition of the tall residual factor, derived from the preprocessing step, be available. The matrix norm of the residual can then be evaluated through the small triangular factor alone, and Algorithm 1 can be terminated by the normalized relative residual NRRes in (11), whose numerator and denominator are both computed from small-scale kernels.
Note that the calculation of NRRes only involves several matrix operations on small-scale kernels, requiring far fewer flops than the preprocessing step when $r \ll n$.
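For moderate n, the residual can be sanity-checked densely as below. The denominator here is a generic normalization of our own choosing; the paper's NRRes (11) is instead evaluated from the preprocessed small-scale QR kernels.

```matlab
% Dense sanity check of a normalized DARE residual at an approximate solution
% Xt. The denominator below is a generic illustrative choice; the paper's
% NRRes (11) uses its preprocessed small-scale QR kernels instead.
function nrr = nrres_dense(A, G, H, Xt)
  n = size(A, 1);
  T = (eye(n) + G*Xt) \ A;             % (I + G*Xt)^{-1} * A
  R = Xt - A' * (Xt * T) - H;          % residual of the DARE (1)
  nrr = norm(R, 'fro') / (norm(Xt, 'fro') + norm(H, 'fro'));
end
```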
3.2. Complexity Analysis
The main flops of Algorithm 1 come from the preprocessing step of forming the required large-scale matrix products and of QR decomposing the tall residual factor with Householder transformations [,]. Table 1 lists the details, where only the Q-factor is stored in orthonormal form, satisfying $Q^*Q = I$.
Table 1.
Complexity and memory of the preprocessing step in Algorithm 1.
It is seen from Table 1 that both the computation and the storage of the preprocessing step are governed by terms involving the large dimension n when $r \ll n$. We subsequently analyze the complexity of the iteration part. Assume that an LU decomposition is employed for solving the small-scale linear systems. The flops and the memory of the kth iteration are summarized in Table 2 below.
Table 2.
Complexity and memory at kth iteration in Algorithm 1.
Table 2 shows that the complexity of the kth iteration in Algorithm 1 involves only small-scale kernels and is far less than that of the preprocessing step when $r \ll n$. Thus, the dominant computational cost of Algorithm 1 is located at the preprocessing step; nevertheless, that cost is still far less than the exponentially increasing complexity of the algorithms in [,] when k grows large.
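The LU-based solves just mentioned amortize well because one factorization of the small kernel serves every right-hand side; a sketch with placeholder data (the kernel K and vectors b1, b2 are illustrative):

```matlab
% One LU factorization of the small kernel serves all subsequent solves,
% as assumed in the iteration-cost estimates. K, b1, b2 are placeholders.
s = 60;  K = randn(s) + s*eye(s);  b1 = randn(s, 1);  b2 = randn(s, 1);
[Lf, Uf, P] = lu(K);           % O(s^3) once for the small kernel
x1 = Uf \ (Lf \ (P * b1));     % each additional solve costs only O(s^2)
x2 = Uf \ (Lf \ (P * b2));
```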
4. Numerical Experiments
In this section, we show the effectiveness of Algorithm 1 in calculating the solution X of the large-scale DARE (1). The code was programmed in Matlab 2014a [], and all computations were carried out on a ThinkPad notebook with a 2.4 GHz Intel i5-6200 CPU and 8 GB of memory. The stopping criterion is the NRRes in (11) with a proper tolerance ϵ. To show where the dominant computations in Algorithm 1 are located, we record the ratio of the iteration time to the total time as the percentage

$$\delta = \frac{\text{TIME-I}}{\text{TIME-P} + \text{TIME-I}} \times 100\%,$$

where "TIME-P" represents the preprocessing time elapsed for forming the matrices associated with n, and "TIME-I" stands for the CPU time consumed by the iterations.
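A timing harness for this ratio might look as follows; here `preprocess_step` and `iterate_step` are hypothetical placeholders for the two phases of Algorithm 1, not functions defined in the paper.

```matlab
% Timing-harness sketch for the reported ratio delta. The two functions are
% hypothetical placeholders for the phases of Algorithm 1, not a real API.
tic; kernels = preprocess_step(U, V, S, B, H);   timeP = toc;   % TIME-P
tic; [Xfac, nrres] = iterate_step(kernels, tol); timeI = toc;   % TIME-I
delta = 100 * timeI / (timeP + timeI);   % percentage of time spent iterating
```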
Example 1.
The first example is devised to measure the actual error between the true solution X and the approximate solution computed by Algorithm 1. The coefficient matrices are constructed from suitably scaled vectors of ones so that the true solution of the DARE is available in closed form, expressed through the root of an associated scalar equation. The principle of selecting the above vectors and matrices is the convenient construction of the true solution of the DARE, so that the error between the computed approximate solution and the true solution can be evaluated directly.
We consider medium scales, including n = 3000 and 5000, to test the accuracy of Algorithm 1, which is terminated when the NRRes is less than the prescribed tolerance. Numerical experiments show that Algorithm 1 always takes three iterations to obtain the approximate solution for all tested dimensions n. The obtained results on NRRes and the errors are listed in Table 3.
Table 3.
Residual and actual errors in Example 1.
It is seen from the table that Algorithm 1 is efficient in calculating the solution of the DARE. In fact, for different dimensions, the actual error between the computed solution and the solution X is less than the prescribed accuracy after three iterations, and the derived relative residual reaches an even lower level. In particular, the value of δ gradually decreases as the scale n rises, indicating that the CPU time for the iterations takes only a small part of the whole for large-scale problems.
Example 2.
Randomly generate the low-rank factor matrices and define the coefficient matrices of the DARE (1) from them, so that the solution of the DARE is known explicitly. As in Example 1, the principle of selecting the above matrices is the convenience of evaluating the error.
We test the error between the true solution and the computed solution for three problem dimensions. The obtained results, together with the NRRes, are plotted in Figure 1, Figure 2 and Figure 3. Still, δ represents the ratio of the iteration time to the total time.
Figure 1.
History of NRRes and Error for the smallest tested dimension in Example 2.
Figure 2.
History of NRRes and Error for the middle tested dimension in Example 2.
Figure 3.
History of NRRes and Error for the largest tested dimension in Example 2.
Figure 1, Figure 2 and Figure 3 show that, as the number of iterations increases, the NRRes and the errors decrease exponentially, and Algorithm 1 terminates at the 6th iteration. In all experiments, the preprocessing time for the three cases varied from 0.1 to 0.2 s, while the iterative time only took from 0.0032 to 0.0035 s, a small part of the whole CPU time. More experiments also indicated that the ratio δ became smaller as the scale of the problem increased.
Example 3.
This example comes from a proper modification of the circuits arising from the magneto-quasistatic Maxwell equations ([,]). The matrix S represents the DC resistance matrix of each current filament (see Figure 4), and the low-rank factors of A are associated with the mesh matrices. We randomly generate the mesh-related factor matrices and define the coefficient matrices of the DARE accordingly.
Figure 4.
The structure of the DC resistance matrix of each current filament.
The tolerance ϵ is prescribed as in the previous examples, and a sequence of increasing dimensions n is tested.
For all cases in our experiments, Algorithm 1 was observed to attain a relative residual level below the tolerance at the 4th iteration. The elapsed CPU time and the ratio δ are plotted in Figure 5, where "TIME-P" and "TIME-I" record the CPU time for the preprocessing and for the iteration, respectively. One can see from the figure that, as the scale n rises, the preprocessing time becomes more dominant (about 112 s at the largest tested dimension), while the iteration time remains almost unchanged (about 3.5 s for all n). The gradually decreasing ratio δ also illustrates that the main computation of Algorithm 1 when solving large-scale problems lies in the preprocessing step, whose cost is much less than the exponentially increasing one of the algorithms in [,].
Figure 5.
Preprocessing time (TIME-P), iteration time (TIME-I), and the ratio δ for different dimensions in Example 3.
5. Conclusions
We have proposed an efficient algorithm for solving large-scale DAREs with low-ranked matrices A and G and a high-ranked matrix H. Compared with the SDA of exponentially increasing per-iteration complexity in [,], the newly developed algorithm only requires a preprocessing step of fixed cost and an iteration step involving small-scale kernels alone. For large-scale problems with $r \ll n$, the main computations of the whole algorithm lie in the preprocessing step, with several matrix multiplications and an economic QR decomposition, while the elapsed CPU time for the iteration part is trivial. Numerical experiments validate the effectiveness of the proposed algorithm. For future work, we may investigate the possibility of the SDA for solving large-scale DAREs with a sparse-plus-low-rank structure in A, where the possible difficulty might be understanding the concrete structure of the iterative matrices and how to compute and store them efficiently.
Author Contributions
Conceptualization, B.Y.; methodology, N.D.; software, C.J.; validation, B.Y.; formal analysis, N.D. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported in part by the NSF of China (11801163), the NSF of Hunan Province (2021JJ50032, 2023JJ50040), and the Key Foundation of the Educational Department of Hunan Province (20A150).
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Athans, M.; Falb, P.L. Optimal Control: An Introduction to the Theory and Its Applications; McGraw-Hill: New York, NY, USA, 1966. [Google Scholar]
- Lancaster, P.; Rodman, L. Algebraic Riccati Equations; Clarendon Press: Oxford, UK, 1999. [Google Scholar]
- Rabbath, C.A.; Léchevin, N. Discrete-Time Control System Design with Applications; Springer Science and Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Nosrati, K.; Shafiee, M. On the convergence and stability of fractional singular Kalman filter and Riccati equation. J. Frankl. Inst. 2020, 357, 7188–7210. [Google Scholar] [CrossRef]
- Trujillo, J.J.; Ungureanu, V.M. Optimal control of discrete-time linear fractional-order systems with multiplicative noise. Int. J. Control. 2018, 91, 57–69. [Google Scholar] [CrossRef]
- Chu, E.K.-W.; Fan, H.-Y.; Lin, W.-W. A structure-preserving doubling algorithm for continuous-time algebraic Riccati equations. Linear Algebra Appl. 2005, 396, 55–80. [Google Scholar] [CrossRef]
- Chu, E.K.-W.; Fan, H.-Y.; Lin, W.-W.; Wang, C.-S. A structure-preserving doubling algorithm for periodic discrete-time algebraic Riccati equations. Int. J. Control 2004, 77, 767–788. [Google Scholar] [CrossRef]
- Chu, E.K.-W.; Weng, P.C.-Y. Large-scale discrete-time algebraic Riccati equations—Doubling algorithm and error analysis. J. Comput. Appl. Math. 2015, 277, 115–126. [Google Scholar] [CrossRef]
- Yu, B.; Fan, H.-Y.; Chu, E.K.-W. Large-scale algebraic Riccati equations with high-rank constant terms. J. Comput. Appl. Math. 2019, 361, 130–143. [Google Scholar] [CrossRef]
- Kamon, M.; Wang, F.; White, J. Generating nearly optimally compact models from Krylov-subspace based reduced order models. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 2000, 47, 239–248. [Google Scholar] [CrossRef]
- Golub, G.H.; Van Loan, C.F. Matrix Computations, 3rd ed.; Johns Hopkins University Press: Baltimore, MD, USA, 1996. [Google Scholar]
- Yu, B.; Li, D.-H.; Dong, N. Low memory and low complexity iterative schemes for a nonsymmetric algebraic Riccati equation arising from transport theory. J. Comput. Appl. Math. 2013, 250, 175–189. [Google Scholar] [CrossRef]
- Lin, W.-W.; Xu, S.-F. Convergence analysis of structure-preserving doubling algorithms for Riccati-type matrix equations. SIAM J. Matrix Anal. Appl. 2006, 28, 26–39. [Google Scholar] [CrossRef]
- Bhatia, R. Matrix Analysis, Graduate Texts in Mathematics; Springer: Berlin/Heidelberg, Germany, 1997. [Google Scholar]
- Higham, N.J. Functions of Matrices: Theory and Computation; SIAM: Philadelphia, PA, USA, 2008. [Google Scholar]
- Higham, D.J.; Higham, N.J. MATLAB Guide; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2016. [Google Scholar]
- Silveira, L.M.; Kamon, M.; Elfadel, I.; White, J. A coordinate-transformed Arnoldi algorithm for generating guaranteed stable reduced order models of RLC circuits. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, USA, 10–14 November 1996; pp. 288–294. [Google Scholar]
- Odabasioglu, A.; Celik, M.; Pileggi, L.T. PRIMA: Passive Reduced-order Interconnect Macromodeling Algorithm. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 1998, 17, 645–654. [Google Scholar] [CrossRef]