Article

Factorized Doubling Algorithm for Large-Scale High-Ranked Riccati Equations in Fractional System

School of Science, Hunan University of Technology, Zhuzhou 412007, China
* Author to whom correspondence should be addressed.
Fractal Fract. 2023, 7(6), 468; https://doi.org/10.3390/fractalfract7060468
Submission received: 6 May 2023 / Revised: 7 June 2023 / Accepted: 8 June 2023 / Published: 10 June 2023
(This article belongs to the Special Issue Feature Papers for the 'Complexity' Section)

Abstract
In real-life control problems, such as power systems, there are large-scale high-ranked discrete-time algebraic Riccati equations (DAREs) from fractional systems that require stabilizing solutions. However, these solutions are no longer numerically low-rank, which creates difficulties in computation and storage. Fortunately, the potential structures of the state matrix in these systems (e.g., being banded-plus-low-rank) could be beneficial for large-scale computation. In this paper, a factorized structure-preserving doubling algorithm (FSDA) is developed under the assumptions that the non-linear and constant terms are positive semidefinite and banded-plus-low-rank. The detailed iteration scheme and a deflation process for FSDA are analyzed. Additionally, a technique of partial truncation and compression is introduced to reduce the dimensions of the low-rank factors. The computation of residual and the termination condition of the structured version are also redesigned. Illustrative numerical examples show that the proposed FSDA outperforms SDA with hierarchical matrices toolbox (SDA_HODLR) on CPU time for large-scale problems.

1. Introduction

Consider the fractional system [1,2]
$$\Delta^{(\alpha)}x(t+1) = \mathcal{A}x(t) + \mathcal{B}u(t), \qquad y(t) = Cx(t), \qquad (1)$$
where $\alpha \in (0,1)$ is the order of the fractional derivative, $\mathcal{A} \in \mathbb{R}^{N\times N}$, $\mathcal{B} \in \mathbb{R}^{N\times m}$ and $C \in \mathbb{R}^{l\times N}$ with $m, l \ll N$. If $\Delta^{(\alpha)}x(t+1)$ is approximated by the Grünwald–Letnikov rule [3] at $k = 1$, the system (1) is equivalent to the discrete-time linear system
$$x(t+1) = Ax(t) + Bu(t), \qquad y(t) = Cx(t), \qquad (2)$$
where $A = h^{\alpha}\mathcal{A} + \alpha I$ and $B = h^{\alpha}\mathcal{B}$. The corresponding optimal control and the feedback gain can be expressed in terms of the unique positive semidefinite stabilizing solution of the discrete-time algebraic Riccati equation (DARE)
$$\mathcal{D}(X) \equiv -X + A^{\top}X(I+GX)^{-1}A + H = 0, \qquad A, G, H \in \mathbb{R}^{N\times N}. \qquad (3)$$
There have been numerous methods, including classical and state-of-the-art techniques, developed over the past few decades to solve this equation in a numerically stable manner. See [4,5,6,7,8,9,10,11,12,13,14,15] and the references therein for more details.
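For orientation, when $G = BR^{-1}B^{\top}$ is of low rank, the DARE (3) is equivalent, via the SMW formula, to the standard form $A^{\top}XA - X - A^{\top}XB(R + B^{\top}XB)^{-1}B^{\top}XA + H = 0$, so a small instance can be cross-checked against a library solver. A minimal sketch, assuming illustrative random test data:

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
N, m = 8, 2
A = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)   # small, stabilizable test matrix
B = rng.standard_normal((N, m))
R, H = np.eye(m), np.eye(N)                          # H plays the role of Q
G = B @ np.linalg.inv(R) @ B.T

X = solve_discrete_are(A, B, H, R)                   # reference solution
res = -X + A.T @ X @ np.linalg.inv(np.eye(N) + G @ X) @ A + H   # residual of (3)
print(np.linalg.norm(res))                           # ~1e-13
```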
In many large-scale control problems, the matrix $G = BR^{-1}B^{\top}$ in the non-linear term and $H = C^{\top}T^{-1}C$ in the constant term are of low rank, with $B \in \mathbb{R}^{N\times m_g}$, $R \in \mathbb{R}^{m_g\times m_g}$, $C \in \mathbb{R}^{m_h\times N}$, $T \in \mathbb{R}^{m_h\times m_h}$, and $m_g, m_h \ll N$. Then the unique positive definite stabilizing solution of the DARE (3) or its dual equation can be approximated numerically by a low-rank matrix [16,17]. However, when the constant term $H$ in the DARE has a high-rank structure, the stabilizing solution is no longer numerically low-ranked, making it difficult to store and output. To resolve this issue, an adapted version of the doubling algorithm, named SDA_h, was proposed in [18]. The main idea behind SDA_h is to exploit the numerical low rank of the stabilizing solution of the dual equation to estimate the residual of the original DARE. In this way, SDA_h can efficiently evaluate the residual and output the feedback gain. An interesting question up to now is:
  • Can SDA solve the large-scale DAREs efficiently when both G and H are of high-rank?
The main difficulty in this case lies in the fact that the stabilizing solutions of both DARE (3) and its dual equation are not of low rank, making the direct application of SDA to large-scale problems difficult, especially the estimation of residuals and the realization of algorithmic termination. This paper attempts to overcome this obstacle. Rather than answering the above question completely, we consider DARE (3) with the banded-plus-low-rank structure
$$A = D_A + L_{10}^{A}K_A(L_{20}^{A})^{\top}, \qquad (4)$$
where $D_A \in \mathbb{R}^{N\times N}$ is a banded matrix, $L_{10}^{A}, L_{20}^{A} \in \mathbb{R}^{N\times m_a}$ are low-rank matrices and $K_A \in \mathbb{R}^{m_a\times m_a}$ is the kernel matrix with $m_a \ll N$. The assumption (4) is not necessary when $G$ and $H$ are of low rank; in that case $A$ is allowed to be any (sparse) matrix. We also assume that the high-rank non-linear term and constant term are of the form
$$G = D_G + L_G K_G (L_G)^{\top}, \qquad H = D_H + L_H K_H (L_H)^{\top}, \qquad (5)$$
where $D_G, D_H \in \mathbb{R}^{N\times N}$ are positive semidefinite banded matrices, $L_G \in \mathbb{R}^{N\times m_g}$, $L_H \in \mathbb{R}^{N\times m_h}$, $K_G \in \mathbb{R}^{m_g\times m_g}$ and $K_H \in \mathbb{R}^{m_h\times m_h}$ are symmetric, and $m_g, m_h \ll N$ (here $m_g$ and $m_h$ might be zero). In addition, we assume that $D_A$, $D_G$, and $D_H$ are all banded matrices with banded inverse (BMBI), which has applications in power systems [19,20,21]. See also [22,23,24,25,26,27,28,29] and their references for other applications.
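One simple family of BMBI matrices is the block-diagonal matrices with small well-conditioned diagonal blocks: such a matrix is banded, and its inverse is block diagonal with the same pattern, hence banded with the same bandwidth. A minimal numerical check, with illustrative block sizes and diagonal shift:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
# Block-diagonal matrix with 3x3 blocks: banded with bandwidth 2
blocks = [rng.standard_normal((3, 3)) + 3.0 * np.eye(3) for _ in range(5)]
D = block_diag(*blocks)
Dinv = np.linalg.inv(D)
# The inverse inherits the block pattern, so it is banded with the same bandwidth
print(np.abs(Dinv[D == 0]).max())   # ~1e-17: entries outside the band vanish
```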
The main contributions in this paper are:
  • Although a hierarchical (e.g., HODLR) structure [30,31] can be employed to run the SDA for large-scale DAREs with both high-rank H and G, this is the first work to develop the SDA into a factorized form—the FSDA—to deal with such DAREs.
  • The structure of the FSDA iterative sequence is explicitly revealed to consist of two parts—the banded part and the low-rank part. The banded part can iterate independently while the low-rank part relies heavily on the product of the banded part and the low-rank part.
  • A deflation process for the low-rank factors is proposed to reduce the column number of the low-rank part. The conventional truncation and compression in [17,18], applied to the whole low-rank factor, does not work here, as it destroys the implicit structure and makes the subsequent deflation infeasible. Instead, a partial truncation and compression (PTC) technique is devised that acts only on the exponentially increasing part (after deflation), effectively slimming the dimensions of the low-rank factors.
  • The termination criterion of the FSDA consists of two parts. The residual of the banded part is considered in the pre-termination, and only when it is small enough is the actual termination criterion involving the low-rank factors computed. In this way, the complexity of the time-consuming detection of the terminating condition is reduced.
The research in this field is also motivated by other applications, such as the finite element methods (FEM). In FEM, the matrices resulting from discretizing the matrix equations exhibit a sparse and structured pattern [32,33]. By capitalizing on these advantages, iterative methods designed for such matrices can significantly enhance computational efficiency, minimize memory usage, and lead to quicker solutions for large-scale problems.
The whole paper is organized as follows. Section 2 describes the FSDA for DAREs (3) with high-rank non-linear and constant terms. The deflation process for the low-rank factors and kernels is given in Section 3. Section 4 dwells on the technique of PTC to slim the dimensions of low-rank factors and kernels. The way to compute the residual, as well as the concrete implementation of the FSDA, is described in Section 5. Numerical experiments are listed in Section 6 to show the effectiveness of the FSDA.
Notation 1.
$I_N$ (or simply $I$) is the $N\times N$ identity matrix. For a matrix $A \in \mathbb{R}^{N\times N}$, $\rho(A)$ denotes the spectral radius of $A$. For symmetric matrices $A$ and $B \in \mathbb{R}^{N\times N}$, we say $A > B$ ($A \geq B$) if $A - B$ is a positive definite (semi-definite) matrix. Unless stated otherwise, the norm $\|\cdot\|$ is the F-norm of a matrix. For a sequence of matrices $\{A_i\}_{i=0}^{k}$, $\prod_{i=k}^{0}A_i = A_k A_{k-1}\cdots A_1 A_0$. For a banded matrix $B$, $b_w(B)$ represents the bandwidth. Additionally, the Sherman–Morrison–Woodbury (SMW) formula (see [34] for example), $(M + UDV)^{-1} = M^{-1} - M^{-1}U(D^{-1} + VM^{-1}U)^{-1}VM^{-1}$, is required in the analysis of the iterative scheme.
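A quick numerical sanity check of the SMW formula, a minimal sketch with illustrative sizes and random data:

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 50, 4
M = np.eye(N) + 0.1 * rng.standard_normal((N, N))   # well-conditioned base matrix
U, V = rng.standard_normal((N, r)), rng.standard_normal((r, N))
D = np.diag(rng.random(r) + 1.0)

Minv = np.linalg.inv(M)
lhs = np.linalg.inv(M + U @ D @ V)
rhs = Minv - Minv @ U @ np.linalg.inv(np.linalg.inv(D) + V @ Minv @ U) @ V @ Minv
print(np.linalg.norm(lhs - rhs))    # ~1e-14
```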

2. SDA and the Structured Iteration for DARE

For the DARE
$$\mathcal{D}(X) = -X + A^{\top}X(I+GX)^{-1}A + H = 0$$
and its dual equation
$$\mathcal{D}_a(Y) = -Y + AY(I+HY)^{-1}A^{\top} + G = 0, \qquad (6)$$
SDA [7] generates a sequence of matrices, for $k \geq 1$,
$$G_k = G_{k-1} + A_{k-1}(I + G_{k-1}H_{k-1})^{-1}G_{k-1}A_{k-1}^{\top},$$
$$H_k = H_{k-1} + A_{k-1}^{\top}H_{k-1}(I + G_{k-1}H_{k-1})^{-1}A_{k-1}, \qquad (7)$$
$$A_k = A_{k-1}(I + G_{k-1}H_{k-1})^{-1}A_{k-1},$$
with $A_0 = A$, $G_0 = G$, $H_0 = H$. Under some conditions (see also Theorem 1), $\{A_k\}$ converges to the zero matrix while $\{H_k\}$ and $\{G_k\}$ converge to the stabilizing solutions of $\mathcal{D}(X) = 0$ and $\mathcal{D}_a(Y) = 0$, respectively.
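For reference, a minimal dense implementation of the plain SDA (7); this is a sketch only, ignoring all banded/low-rank structure, and the stopping rule on $\|A_k\|$ is an illustrative choice:

```python
import numpy as np

def sda(A, G, H, tol=1e-12, maxit=50):
    """Plain dense SDA (7) for -X + A^T X (I + G X)^{-1} A + H = 0."""
    N = A.shape[0]
    for k in range(maxit):
        W = np.linalg.inv(np.eye(N) + G @ H)
        # All three updates use the step-(k-1) matrices (tuple assignment)
        A, G, H = (A @ W @ A,
                   G + A @ W @ G @ A.T,
                   H + A.T @ H @ W @ A)
        if np.linalg.norm(A, 'fro') < tol:    # A_k -> 0 quadratically
            break
    return H, G, k + 1                        # H_k ~ X_s, G_k ~ Y_s
```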

2.1. FSDA for High-Rank Terms

Given banded matrices $D_0^A = D_A$, $D_0^G = D_G$ and $D_0^H = D_H$, low-rank matrices $L_0^G$, $L_{10}^A$, $L_0^H$, and $L_{20}^A$, and kernels $K_0^A = K_A$, $K_0^G = K_G$, and $K_0^H = K_H$ in the structured initial matrices (4) and (5), the FSDA is described inductively as follows, where
$$A_k = D_k^A + L_{1,k}^A K_k^A (L_{2,k}^A)^{\top}, \quad G_k = D_k^G + L_k^G K_k^G (L_k^G)^{\top}, \quad H_k = D_k^H + L_k^H K_k^H (L_k^H)^{\top} \qquad (8)$$
with sparse banded matrices $D_k^A, D_k^G, D_k^H \in \mathbb{R}^{N\times N}$, low-rank factors $L_{1,k}^A \in \mathbb{R}^{N\times m_k^{a_1}}$, $L_{2,k}^A \in \mathbb{R}^{N\times m_k^{a_2}}$, $L_k^G \in \mathbb{R}^{N\times m_k^{g}}$, $L_k^H \in \mathbb{R}^{N\times m_k^{h}}$, kernel matrices $K_k^A \in \mathbb{R}^{m_k^{a_1}\times m_k^{a_2}}$, $K_k^G \in \mathbb{R}^{m_k^{g}\times m_k^{g}}$, $K_k^H \in \mathbb{R}^{m_k^{h}\times m_k^{h}}$ and $m_k^{a_1}, m_k^{a_2}, m_k^{g}, m_k^{h} \ll N$. Without loss of generality, we assume that $m_0^{a_1} = m_0^{a_2} \equiv m_a$ and $K_0^A = I_{m_a}$. Otherwise, $L_{20}^A := L_{20}^A (K_0^A)^{\top}$ and $K_0^A := I_{m_a}$ fulfill the assumption.
We first elaborate the concrete format of the banded parts and low-rank factors for $k = 1$ and $k \geq 2$. Note that the banded parts are capable of iterating independently, regardless of the low-rank parts and kernels.
Case for $k = 1$.
In the first step, we assume that $G_0 = D_0^G$ and $H_0 = D_0^H$, i.e., these matrices have no low-rank part. Note that this is done only to simplify the exposition. The fully general case with non-trivial low-rank parts is covered by the case $k \geq 2$.
Insert the initial matrices $D_0^A$, $D_0^G$, and $D_0^H$ and the low-rank matrices $L_{10}^A$ and $L_{20}^A$ into SDA (7). It follows from the SMW formula that
$$D_1^G = D_0^G + D_0^{AGHG}(D_0^A)^{\top}, \quad D_1^H = D_0^H + D_0^{AHGH}D_0^A, \quad D_1^A = D_0^{AGH}D_0^A = D_0^A(D_0^{AHG})^{\top} \qquad (9)$$
with
$$D_0^{AGHG} = D_0^A(I_N + D_0^G D_0^H)^{-1}D_0^G, \qquad D_0^{AHGH} = (D_0^A)^{\top}(I_N + D_0^H D_0^G)^{-1}D_0^H,$$
$$D_0^{AGH} = D_0^A(I_N + D_0^G D_0^H)^{-1}, \qquad D_0^{AHG} = (D_0^A)^{\top}(I_N + D_0^H D_0^G)^{-1}.$$
It follows from [35] (Lemma 4.5) that the iteration (9) is well defined if $D_0^G$ and $D_0^H$ are both positive semidefinite.
The low-rank factors in (8) are
$$L_1^G = [L_{10}^A,\ D_0^{AGHG}L_{20}^A], \quad L_1^H = [L_{20}^A,\ D_0^{AHGH}L_{10}^A], \quad L_{11}^A = [L_{10}^A,\ D_0^{AGH}L_{10}^A], \quad L_{21}^A = [L_{20}^A,\ D_0^{AHG}L_{20}^A] \qquad (10)$$
and the kernels in the low-rank parts are
$$K_1^G = \begin{bmatrix} (L_{20}^A)^{\top}D_0^{GHG}L_{20}^A & I_{m_0^g} \\ I_{m_0^g} & 0 \end{bmatrix}, \qquad K_1^H = \begin{bmatrix} (L_{10}^A)^{\top}D_0^{HGH}L_{10}^A & I_{m_0^h} \\ I_{m_0^h} & 0 \end{bmatrix}, \qquad (11)$$
$$K_1^A = \begin{bmatrix} (L_{20}^A)^{\top}D_0^{GH}L_{10}^A & I_{m_0^g} \\ I_{m_0^h} & 0 \end{bmatrix} \qquad (12)$$
with
$$D_0^{GHG} = (I_N + D_0^G D_0^H)^{-1}D_0^G, \qquad D_0^{HGH} = (I_N + D_0^H D_0^G)^{-1}D_0^H, \qquad D_0^{GH} = (I_N + D_0^G D_0^H)^{-1}$$
and $m_0^g = m_a$, $m_0^h = m_a$.
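The $k = 1$ formulas (9)–(12) can be checked directly against one step of the plain SDA (7) on the assembled matrices. A minimal dense sketch, assuming illustrative banded test data with $D_0^G$, $D_0^H$ symmetric positive definite as above:

```python
import numpy as np

rng = np.random.default_rng(2)
N, ma = 80, 2
D0A = np.diag(rng.standard_normal(N)) + np.diag(0.2 * rng.standard_normal(N - 1), 1)
d, e = rng.random(N) + 1.0, 0.2 * rng.random(N - 1)
D0G = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)        # SPD tridiagonal
d, e = rng.random(N) + 1.0, 0.2 * rng.random(N - 1)
D0H = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
L10, L20 = rng.standard_normal((N, ma)), rng.standard_normal((N, ma))
I, Im, Z = np.eye(N), np.eye(ma), np.zeros((ma, ma))

W, Wt = np.linalg.inv(I + D0G @ D0H), np.linalg.inv(I + D0H @ D0G)
AGHG, AHGH = D0A @ W @ D0G, D0A.T @ Wt @ D0H
AGH, AHG = D0A @ W, D0A.T @ Wt

# Banded parts (9), low-rank factors (10), kernels (11)-(12)
D1G, D1H, D1A = D0G + AGHG @ D0A.T, D0H + AHGH @ D0A, AGH @ D0A
L1G, L1H = np.hstack([L10, AGHG @ L20]), np.hstack([L20, AHGH @ L10])
L11A, L21A = np.hstack([L10, AGH @ L10]), np.hstack([L20, AHG @ L20])
K1G = np.block([[L20.T @ W @ D0G @ L20, Im], [Im, Z]])
K1H = np.block([[L10.T @ Wt @ D0H @ L10, Im], [Im, Z]])
K1A = np.block([[L20.T @ W @ L10, Im], [Im, Z]])

# Cross-check against one step of plain SDA (7) with G_0 = D_0^G, H_0 = D_0^H
A0 = D0A + L10 @ L20.T
G1 = D0G + A0 @ W @ D0G @ A0.T
H1 = D0H + A0.T @ D0H @ W @ A0
A1 = A0 @ W @ A0
print(np.linalg.norm(G1 - (D1G + L1G @ K1G @ L1G.T)))    # ~1e-14
print(np.linalg.norm(H1 - (D1H + L1H @ K1H @ L1H.T)))    # ~1e-14
print(np.linalg.norm(A1 - (D1A + L11A @ K1A @ L21A.T)))  # ~1e-14
```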
Case for general $k \geq 2$.
By inserting the banded matrices $D_{k-1}^G$, $D_{k-1}^H$ and $D_{k-1}^A$, the low-rank factors $L_{k-1}^G$, $L_{k-1}^H$, $L_{1,k-1}^A$, and $L_{2,k-1}^A$, and the kernels $K_{k-1}^G$, $K_{k-1}^H$ and $K_{k-1}^A$ into SDA (7), the banded matrices at the $k$-th iteration are
$$D_k^G = D_{k-1}^G + D_{k-1}^{AGHG}(D_{k-1}^A)^{\top}, \quad D_k^H = D_{k-1}^H + D_{k-1}^{AHGH}D_{k-1}^A, \quad D_k^A = D_{k-1}^{AGH}D_{k-1}^A = D_{k-1}^A(D_{k-1}^{AHG})^{\top} \qquad (13)$$
with
$$D_{k-1}^{AGHG} = D_{k-1}^A(I_N + D_{k-1}^G D_{k-1}^H)^{-1}D_{k-1}^G, \qquad D_{k-1}^{AHGH} = (D_{k-1}^A)^{\top}(I_N + D_{k-1}^H D_{k-1}^G)^{-1}D_{k-1}^H,$$
$$D_{k-1}^{AGH} = D_{k-1}^A(I_N + D_{k-1}^G D_{k-1}^H)^{-1}, \qquad D_{k-1}^{AHG} = (D_{k-1}^A)^{\top}(I_N + D_{k-1}^H D_{k-1}^G)^{-1}.$$
The corresponding low-rank factors are
$$L_k^G = [\underbrace{L_{k-1}^G}_{m_{k-1}^g},\ \underbrace{L_{1,k-1}^A}_{m_{k-1}^{a_1}},\ \underbrace{D_{k-1}^{AGH}L_{k-1}^G}_{m_{k-1}^g},\ \underbrace{D_{k-1}^{AGHG}L_{k-1}^H}_{m_{k-1}^h},\ \underbrace{D_{k-1}^{AGHG}L_{2,k-1}^A}_{m_{k-1}^{a_2}}], \qquad (14)$$
$$L_{1,k}^A = [\underbrace{L_{1,k-1}^A}_{m_{k-1}^{a_1}},\ \underbrace{D_{k-1}^{AGH}L_{k-1}^G}_{m_{k-1}^g},\ \underbrace{D_{k-1}^{AGHG}L_{k-1}^H}_{m_{k-1}^h},\ \underbrace{D_{k-1}^{AGH}L_{1,k-1}^A}_{m_{k-1}^{a_1}}], \qquad (15)$$
$$L_k^H = [\underbrace{L_{k-1}^H}_{m_{k-1}^h},\ \underbrace{L_{2,k-1}^A}_{m_{k-1}^{a_2}},\ \underbrace{D_{k-1}^{AHG}L_{k-1}^H}_{m_{k-1}^h},\ \underbrace{D_{k-1}^{AHGH}L_{k-1}^G}_{m_{k-1}^g},\ \underbrace{D_{k-1}^{AHGH}L_{1,k-1}^A}_{m_{k-1}^{a_1}}], \qquad (16)$$
$$L_{2,k}^A = [\underbrace{L_{2,k-1}^A}_{m_{k-1}^{a_2}},\ \underbrace{D_{k-1}^{AHG}L_{k-1}^H}_{m_{k-1}^h},\ \underbrace{D_{k-1}^{AHGH}L_{k-1}^G}_{m_{k-1}^g},\ \underbrace{D_{k-1}^{AHG}L_{2,k-1}^A}_{m_{k-1}^{a_2}}], \qquad (17)$$
where the underbraces indicate the column number of each block and every factor has $N$ rows.
To express the kernels explicitly, let
$$\Theta_{k-1}^H = (L_{k-1}^H)^{\top}D_{k-1}^{GHG}L_{k-1}^H, \quad \Theta_{k-1}^G = (L_{k-1}^G)^{\top}D_{k-1}^{HGH}L_{k-1}^G, \quad \Theta_{k-1}^{HG} = (L_{k-1}^H)^{\top}D_{k-1}^{GH}L_{k-1}^G,$$
$$\Theta_{k-1}^A = (L_{2,k-1}^A)^{\top}D_{k-1}^{GH}L_{1,k-1}^A, \quad \Theta_{1,k-1}^A = (L_{1,k-1}^A)^{\top}D_{k-1}^{HGH}L_{1,k-1}^A, \quad \Theta_{2,k-1}^A = (L_{2,k-1}^A)^{\top}D_{k-1}^{GHG}L_{2,k-1}^A \qquad (18)$$
and
$$\Theta_{1,k-1}^{AH} = (L_{1,k-1}^A)^{\top}D_{k-1}^{HG}L_{k-1}^H, \quad \Theta_{1,k-1}^{AG} = (L_{1,k-1}^A)^{\top}D_{k-1}^{HGH}L_{k-1}^G,$$
$$\Theta_{2,k-1}^{AH} = (L_{2,k-1}^A)^{\top}D_{k-1}^{GHG}L_{k-1}^H, \quad \Theta_{2,k-1}^{AG} = (L_{2,k-1}^A)^{\top}D_{k-1}^{GH}L_{k-1}^G \qquad (19)$$
with
$$D_{k-1}^{GHG} = (I_N + D_{k-1}^G D_{k-1}^H)^{-1}D_{k-1}^G, \qquad D_{k-1}^{HGH} = (I_N + D_{k-1}^H D_{k-1}^G)^{-1}D_{k-1}^H,$$
$$D_{k-1}^{GH} = (I_N + D_{k-1}^G D_{k-1}^H)^{-1}, \qquad D_{k-1}^{HG} = (I_N + D_{k-1}^H D_{k-1}^G)^{-1}.$$
Define the kernel components
$$K_{k-1}^{GH} = \begin{bmatrix} 0 & K_{k-1}^G \\ K_{k-1}^H & 0 \end{bmatrix}\left(I_{m_{k-1}^h + m_{k-1}^g} + \begin{bmatrix} \Theta_{k-1}^H & \Theta_{k-1}^{HG} \\ (\Theta_{k-1}^{HG})^{\top} & \Theta_{k-1}^G \end{bmatrix}\begin{bmatrix} K_{k-1}^H & 0 \\ 0 & K_{k-1}^G \end{bmatrix}\right)^{-1},$$
$$K_{k-1}^{GHG} = K_{k-1}^{GH}\begin{bmatrix} 0 & I_{m_{k-1}^h} \\ I_{m_{k-1}^g} & 0 \end{bmatrix}, \qquad K_{k-1}^{HGH} = \begin{bmatrix} 0 & I_{m_{k-1}^h} \\ I_{m_{k-1}^g} & 0 \end{bmatrix}K_{k-1}^{GH}$$
and
$$K_{k-1}^{AGHG} = K_{k-1}^A[\Theta_{2,k-1}^{AG},\ \Theta_{2,k-1}^{AH}]K_{k-1}^{GHG}, \qquad K_{k-1}^{AHGH} = (K_{k-1}^A)^{\top}[\Theta_{1,k-1}^{AH},\ \Theta_{1,k-1}^{AG}]K_{k-1}^{HGH},$$
$$K_{k-1}^{AGHGA} = K_{k-1}^A\Theta_{2,k-1}^A(K_{k-1}^A)^{\top} + K_{k-1}^{AGHG}[\Theta_{2,k-1}^{AG},\ \Theta_{2,k-1}^{AH}]^{\top}(K_{k-1}^A)^{\top},$$
$$K_{k-1}^{AHGHA} = (K_{k-1}^A)^{\top}\Theta_{1,k-1}^A K_{k-1}^A + K_{k-1}^{AHGH}[\Theta_{1,k-1}^{AH},\ \Theta_{1,k-1}^{AG}]^{\top}K_{k-1}^A,$$
$$K_{k-1}^{AGH} = K_{k-1}^A[\Theta_{2,k-1}^{AG},\ \Theta_{2,k-1}^{AH}]K_{k-1}^{GH}, \qquad \widetilde{K}_{k-1}^{AGH} = (K_{k-1}^A)^{\top}[\Theta_{1,k-1}^{AH},\ \Theta_{1,k-1}^{AG}](K_{k-1}^{GH})^{\top},$$
$$K_{k-1}^{AGHA} = K_{k-1}^A\Theta_{k-1}^A K_{k-1}^A + K_{k-1}^{AGH}[\Theta_{1,k-1}^{AH},\ \Theta_{1,k-1}^{AG}]^{\top}K_{k-1}^A. \qquad (20)$$
Then the kernel matrices corresponding to $L_k^G$, $L_k^H$, and $L_{1,k}^A$ ($L_{2,k}^A$) at the $k$-th step are
$$K_k^G = \begin{bmatrix} K_{k-1}^G & 0 & 0 & 0 \\ 0 & K_{k-1}^{AGHGA} & K_{k-1}^{AGHG} & K_{k-1}^A \\ 0 & (K_{k-1}^{AGHG})^{\top} & K_{k-1}^{GHG} & 0 \\ 0 & (K_{k-1}^A)^{\top} & 0 & 0 \end{bmatrix} \qquad (21)$$
with block sizes $m_{k-1}^g$, $m_{k-1}^{a_1}$, $m_{k-1}^g + m_{k-1}^h$, and $m_{k-1}^{a_2}$,
$$K_k^H = \begin{bmatrix} K_{k-1}^H & 0 & 0 & 0 \\ 0 & K_{k-1}^{AHGHA} & K_{k-1}^{AHGH} & (K_{k-1}^A)^{\top} \\ 0 & (K_{k-1}^{AHGH})^{\top} & K_{k-1}^{HGH} & 0 \\ 0 & K_{k-1}^A & 0 & 0 \end{bmatrix} \qquad (22)$$
with block sizes $m_{k-1}^h$, $m_{k-1}^{a_2}$, $m_{k-1}^h + m_{k-1}^g$, and $m_{k-1}^{a_1}$, and
$$K_k^A = \begin{bmatrix} K_{k-1}^{AGHA} & K_{k-1}^{AGH} & K_{k-1}^A \\ (\widetilde{K}_{k-1}^{AGH})^{\top} & K_{k-1}^{GH} & 0 \\ K_{k-1}^A & 0 & 0 \end{bmatrix} \qquad (23)$$
with row block sizes $m_{k-1}^{a_1}$, $m_{k-1}^g + m_{k-1}^h$, $m_{k-1}^{a_1}$ and column block sizes $m_{k-1}^{a_2}$, $m_{k-1}^g + m_{k-1}^h$, $m_{k-1}^{a_2}$.
Remark 1.
1. The banded parts in (13) of the FSDA iterate independently of the low-rank parts, which motivates the pre-termination criterion in Section 5.
2. The low-rank factors in (14)–(17) grow in dimension on a scale of $O(4^k)$, which is obviously intolerable for large-scale problems. A deflation process and a truncation and compression technique are therefore required to reduce the dimensions of the low-rank factors.
3. In real implementations, the low-rank factors and kernels for $k \geq 2$ are actually deflated, truncated, and compressed, as described in the next two sections, where a superscript "$dt$" is added to the upper right corner of each low-rank factor. Correspondingly, the column numbers $m_{k-1}^g$, $m_{k-1}^h$, $m_{k-1}^{a_1}$, and $m_{k-1}^{a_2}$ are the ones after deflation, truncation, and compression. Here, we temporarily omit the superscript "$dt$" for convenience when describing the successive iteration process.
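To see this growth concretely, the column-count recurrences read off from (14)–(17) can be iterated numerically. A small sketch in units of $m_a$, starting from the widths after the first step (the starting values are illustrative):

```python
# Column counts of L_k^G, L_k^H, L_{1,k}^A, L_{2,k}^A in units of m_a, from (14)-(17)
mg, mh, ma1, ma2 = 2, 2, 2, 2           # widths after the k = 1 step (10)
for k in range(2, 7):
    mg, mh, ma1, ma2 = (2 * mg + mh + ma1 + ma2,
                        2 * mh + mg + ma1 + ma2,
                        2 * ma1 + mg + mh,
                        2 * ma2 + mg + mh)
    print(k, mg, mh, ma1, ma2)          # geometric growth (factor ~4.6 per step),
                                        # the O(4^k) scale noted above
```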

2.2. Convergence and the Evolution of the Bandwidth

To obtain the convergence, we further assume that
$$[A, G] \text{ is d-stabilizable and } [H, A] \text{ is d-detectable} \qquad (24)$$
and
$$[D_A, D_G] \text{ is d-stabilizable and } [D_H, D_A] \text{ is d-detectable.} \qquad (25)$$
The following theorem gives the convergence of SDA (7); see [35] (Thm 4.3, Thm 4.6) or [36] (Thm 3.1).
Theorem 1.
Under the assumption (24), there are unique symmetric positive semi-definite and stabilizing solutions $X_s$ and $Y_s$ to DARE (3) and its dual Equation (6), respectively. Moreover, the sequences $\{G_k\}$, $\{H_k\}$ and $\{A_k\}$ generated by SDA (7) satisfy $0 \leq H \leq H_k \leq H_{k+1} \leq X_s$ and $0 \leq G \leq G_k \leq G_{k+1} \leq Y_s$ for all $k$ and
$$\lim_{k\to\infty} H_k = X_s, \qquad \lim_{k\to\infty} G_k = Y_s, \qquad \lim_{k\to\infty} A_k = 0, \qquad (26)$$
all quadratically.
For the banded iterations (9) and (13), we have the following corollary.
Corollary 1.
Under the assumption (25), there are unique symmetric positive semi-definite and stabilizing solutions $D_X$ and $D_Y$ to the equation
$$-X + (D_A)^{\top}X(I + D_G X)^{-1}D_A + D_H = 0 \qquad (27)$$
and its dual equation
$$-Y + D_A Y(I + D_H Y)^{-1}(D_A)^{\top} + D_G = 0, \qquad (28)$$
respectively. Moreover, the sequences $\{D_k^G\}$, $\{D_k^H\}$ and $\{D_k^A\}$ generated by the iterations (9) and (13) satisfy $0 \leq D_H \leq D_k^H \leq D_{k+1}^H \leq D_X$ and $0 \leq D_G \leq D_k^G \leq D_{k+1}^G \leq D_Y$ for all $k$ and
$$\lim_{k\to\infty} D_k^H = D_X, \qquad \lim_{k\to\infty} D_k^G = D_Y, \qquad \lim_{k\to\infty} D_k^A = 0, \qquad (29)$$
all quadratically.
Proof. 
This is a direct application of Theorem 1 to Equation (27) and its dual Equation (28) under the assumption (25).    □
Corollary 2.
Under the conditions of Theorem 1 and Corollary 1, the symmetric positive semidefinite solutions $X_s$ and $Y_s$ to DARE (3) and its dual equation admit the decompositions
$$X_s = D_X + L_{lr}^X \qquad \text{and} \qquad Y_s = D_Y + L_{lr}^Y.$$
Moreover, for the sequences generated by the FSDA, $\{D_k^A\}$ and $\{L_{1,k}^A K_k^A (L_{2,k}^A)^{\top}\}$ converge to zero, $\{D_k^H\}$ and $\{L_k^H K_k^H (L_k^H)^{\top}\}$ converge to $D_X$ and $L_{lr}^X$, and $\{D_k^G\}$ and $\{L_k^G K_k^G (L_k^G)^{\top}\}$ converge to $D_Y$ and $L_{lr}^Y$, respectively, all quadratically.
Proof. 
It follows from (26) that $\{A_k\}$ converges to zero. Then the decomposition $A_k = D_k^A + L_{1,k}^A K_k^A (L_{2,k}^A)^{\top}$ in (8), together with $\lim_{k\to\infty} D_k^A = 0$, implies that the sequence $\{L_{1,k}^A K_k^A (L_{2,k}^A)^{\top}\}$ converges to zero quadratically.
Additionally, the sequences $\{H_k\}$ and $\{G_k\}$ converge quadratically, by (26), to the unique solutions $X_s$ and $Y_s$, respectively, with
$$H_k = D_k^H + L_k^H K_k^H (L_k^H)^{\top}, \qquad G_k = D_k^G + L_k^G K_k^G (L_k^G)^{\top}$$
in (8). Given the initial banded matrices $D_0^H = D_H$ and $D_0^G = D_G$, the iterations $\{D_k^H\}$ and $\{D_k^G\}$ in (9) and (13) are independent of the low-rank part and have the unique limits $D_X$ and $D_Y$, respectively. Consequently, the sequences $\{L_k^H K_k^H (L_k^H)^{\top}\} = \{H_k - D_k^H\}$ and $\{L_k^G K_k^G (L_k^G)^{\top}\} = \{G_k - D_k^G\}$ converge quadratically to the matrices $X_s - D_X := L_{lr}^X$ and $Y_s - D_Y := L_{lr}^Y$, respectively.    □
Remark 2.
1. Although the product $L_{1,k}^A K_k^A (L_{2,k}^A)^{\top}$ converges to zero, it follows from (15), (17) and (23) that the kernel $K_k^A$ and the low-rank factors $L_{1,k}^A$ and $L_{2,k}^A$ might still not converge to zero individually.
2. As the convergence of SDA (and of the corresponding FSDA) is quadratic, the number of iterations $k$ at termination is not large, so the matrices $L_{lr}^X$ and $L_{lr}^Y$ are generally of numerical low rank.
To show the evolution of the bandwidths of $D_k^A$, $D_k^G$ and $D_k^H$, we first require the following result [37].
Theorem 2.
Let $A = (a_{ij})$ be an $n\times n$ matrix. Assume that there is a number $m$ such that $a_{ij} = 0$ if $|i-j| > m$, and that $\|A\| \leq c_1$ and $\|A^{-1}\| \leq c_2$ for some $c_1 > 0$ and $c_2 > 0$. Then for $A^{-1} = (\alpha_{ij})$, there are numbers $K > 0$ and $0 < r < 1$ depending only on $c_1$, $c_2$ and $m$, such that
$$|\alpha_{ij}| \leq K r^{|i-j|} \qquad \text{for all } i, j.$$
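The decay can be observed numerically. A minimal sketch on a well-conditioned tridiagonal matrix, with illustrative size and entries:

```python
import numpy as np

N = 200
# Tridiagonal (bandwidth 1), well-conditioned: Theorem 2 applies
T = 4.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
Tinv = np.linalg.inv(T)
for d in (0, 5, 10, 20):
    print(d, np.abs(np.diagonal(Tinv, offset=d)).max())
# The maxima fall off geometrically with the offset d, i.e. |alpha_ij| <= K r^{|i-j|}
```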
We now consider the evolution of the bandwidth for the banded parts.
Theorem 3.
Let $b_k^a := b_w(D_k^A)$, $b_k^g := b_w(D_k^G)$ and $b_k^h := b_w(D_k^H)$ for $k \geq 0$. If the assumption (25) holds, then for the iteration scheme (13) there is an integer $\bar{k}$ independent of $k$, such that
$$b_k^a \leq 2^{\bar{k}}b_0^a + (2^{\bar{k}} - 1)\log_r(\tau/K),$$
$$b_k^g \leq (2^{\bar{k}+1} - 2)b_0^a + b_0^g + (2^{\bar{k}+1} - 2 - \bar{k})\log_r(\tau/K),$$
$$b_k^h \leq (2^{\bar{k}+1} - 2)b_0^a + b_0^h + (2^{\bar{k}+1} - 2 - \bar{k})\log_r(\tau/K),$$
where $\tau$ is the truncation tolerance and $K > 0$ and $0 < r < 1$ depend only on the upper bounds of $\|I + D_i^H D_i^G\|$, $\|I + D_i^G D_i^H\|$, $\|(I + D_i^H D_i^G)^{-1}\|$ and $\|(I + D_i^G D_i^H)^{-1}\|$ for $i \leq \bar{k}$.
Proof. 
It follows from [35] (Thm 4.6) that $I - D_k^H D_k^G$ and $I - D_k^G D_k^H$ are non-singular for all $k$. This, together with (29), indicates that there is an integer $\bar{k}$ such that $|(D_{\bar{k}}^A)_{ij}| < \tau$ and the increments of $D_k^G$ and $D_k^H$ in (13) satisfy
$$|(D_{\bar{k}}^A(I + D_{\bar{k}}^G D_{\bar{k}}^H)^{-1}D_{\bar{k}}^G(D_{\bar{k}}^A)^{\top})_{ij}| < \tau \qquad \text{and} \qquad |((D_{\bar{k}}^A)^{\top}(I + D_{\bar{k}}^H D_{\bar{k}}^G)^{-1}D_{\bar{k}}^H D_{\bar{k}}^A)_{ij}| < \tau, \qquad (30)$$
where $\tau$ is the given truncation tolerance. On the other hand, for $k = 1, \ldots, \bar{k}$, it follows from Theorem 2 that there are $K > 0$ and $0 < r < 1$ independent of $k$, such that
$$|((I + D_k^G D_k^H)^{-1})_{ij}| \leq K r^{|i-j|}, \qquad |((I + D_k^H D_k^G)^{-1})_{ij}| \leq K r^{|i-j|}.$$
Then one has
$$b_w((I + D_k^G D_k^H)^{-1}) \leq \log_r(\tau/K), \qquad b_w((I + D_k^H D_k^G)^{-1}) \leq \log_r(\tau/K)$$
for $k \leq \bar{k}$. Now, recalling the iteration (9), the bandwidths of the first iteration admit the bounds
$$b_1^a \leq 2b_0^a + \log_r(\tau/K), \qquad b_1^g \leq 2b_0^a + b_0^g + \log_r(\tau/K), \qquad b_1^h \leq 2b_0^a + b_0^h + \log_r(\tau/K).$$
Iterating the above bandwidth bounds according to the scheme (13) for $k \geq 1$, we have
$$b_k^a \leq 2^k b_0^a + (2^k - 1)\log_r(\tau/K),$$
$$b_k^g \leq (2^{k+1} - 2)b_0^a + b_0^g + (2^{k+1} - 2 - k)\log_r(\tau/K), \qquad (31)$$
$$b_k^h \leq (2^{k+1} - 2)b_0^a + b_0^h + (2^{k+1} - 2 - k)\log_r(\tau/K).$$
In particular, the bounds on the RHS of (31) attain their maximal values at $k = \bar{k}$, since elements with absolute value less than $\tau$ are removed as in (30).    □
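In implementations, the bandwidth after truncation can be measured directly. A small helper sketch (the name `num_bandwidth` is hypothetical):

```python
import numpy as np

def num_bandwidth(M, tau):
    """Numerical bandwidth b_w(M) after dropping entries with |m_ij| <= tau,
    mirroring the truncation used in the proof of Theorem 3."""
    i, j = np.nonzero(np.abs(M) > tau)
    return int(np.abs(i - j).max()) if i.size else 0

# e.g. on the tridiagonal example above: num_bandwidth(Tinv, 1e-12) ~ log_r(tau/K)
```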

3. Deflation of Low-Rank Factors and Kernels

It has been shown that there is an exponential increase in the dimension of the low-rank factors and kernels. Nevertheless, it is clear that the first three items in $L_{1,k}^A$ and $L_{2,k}^A$ (see (15) and (17)) are the same as the second to fourth items in $L_k^G$ and $L_k^H$ (see (14) and (16)), respectively. Deflation of the low-rank factors and kernels is therefore needed to keep these matrices low-ranked. To see this process clearly, we start with the case $k = 2$.
Case for $k = 2$.
Consider the deflation of the low-rank factors first. It follows from (14)–(17) that
$$L_2^G = [L_1^G,\ L_{11}^A,\ D_1^{AGH}L_1^G,\ D_1^{AGHG}L_1^H,\ D_1^{AGHG}L_{21}^A], \qquad L_{12}^A = [L_{11}^A,\ D_1^{AGH}L_1^G,\ D_1^{AGHG}L_1^H,\ D_1^{AGH}L_{11}^A],$$
$$L_2^H = [L_1^H,\ L_{21}^A,\ D_1^{AHG}L_1^H,\ D_1^{AHGH}L_1^G,\ D_1^{AHGH}L_{11}^A], \qquad L_{22}^A = [L_{21}^A,\ D_1^{AHG}L_1^H,\ D_1^{AHGH}L_1^G,\ D_1^{AHG}L_{21}^A]$$
with
$$D_1^{AGHG} = D_1^A(I + D_1^G D_1^H)^{-1}D_1^G, \qquad D_1^{AHGH} = (D_1^A)^{\top}(I + D_1^H D_1^G)^{-1}D_1^H,$$
$$D_1^{AGH} = D_1^A(I + D_1^G D_1^H)^{-1}, \qquad D_1^{AHG} = (D_1^A)^{\top}(I + D_1^H D_1^G)^{-1}.$$
Expanding the above low-rank factors with the initial $L_{10}^A \in \mathbb{R}^{N\times m_a}$ and $L_{20}^A \in \mathbb{R}^{N\times m_a}$, one can see from Appendix A that $L_{10}^A$ and $D_1^{AGHG}L_{20}^A$ (or $L_{20}^A$ and $D_1^{AHGH}L_{10}^A$) occur twice in $L_2^G$ (or $L_2^H$). To reduce the dimension of $L_2^G$, we remove the duplicated $L_{10}^A$ in $L_1^G$ (or $L_{20}^A$ in $L_1^H$) and retain the one in $L_{11}^A$ (or $L_{21}^A$). Furthermore, we remove $D_1^{AGHG}L_{20}^A$ in $D_1^{AGHG}L_{21}^A$ (or $D_1^{AHGH}L_{10}^A$ in $D_1^{AHGH}L_{11}^A$) and keep the one in $D_1^{AGHG}L_1^H$ (or $D_1^{AHGH}L_1^G$). Then the original $L_2^G$ (or $L_2^H$) is deflated to $L_2^{Gd}$ (or $L_2^{Hd}$) of a smaller dimension, where the superscript "$d$" indicates the matrix after deflation. Analogously, as $D_1^{AGH}L_{10}^A$ and $D_1^{AHG}L_{20}^A$ appear twice in $L_{12}^A$ and $L_{22}^A$, we apply the same deflation process to $L_{12}^A$ and $L_{22}^A$, respectively, obtaining $L_{12}^{Ad}$ and $L_{22}^{Ad}$ in Appendix A, where the blank left in each factor corresponds to the deleted matrix and the black bold matrices inherit from the undeflated ones. Note that the deflated matrices $L_2^{Gd}$, $L_{12}^{Ad}$, $L_2^{Hd}$ and $L_{22}^{Ad}$ are still denoted by $L_2^G$, $L_{12}^A$, $L_2^H$ and $L_{22}^A$, respectively, in the next iteration to simplify notation.
For the kernels at $k = 2$, one has
$$K_2^G = \begin{bmatrix} K_1^G & 0 & 0 & 0 \\ 0 & K_1^{AGHGA} & K_1^{AGHG} & K_1^A \\ 0 & (K_1^{AGHG})^{\top} & K_1^{GHG} & 0 \\ 0 & (K_1^A)^{\top} & 0 & 0 \end{bmatrix}, \qquad K_2^H = \begin{bmatrix} K_1^H & 0 & 0 & 0 \\ 0 & K_1^{AHGHA} & K_1^{AHGH} & (K_1^A)^{\top} \\ 0 & (K_1^{AHGH})^{\top} & K_1^{HGH} & 0 \\ 0 & K_1^A & 0 & 0 \end{bmatrix},$$
each with block sizes $2m_a$, $2m_a$, $4m_a$, and $2m_a$, and
$$K_2^A = \begin{bmatrix} K_1^{AGHA} & K_1^{AGH} & K_1^A \\ (\widetilde{K}_1^{AGH})^{\top} & K_1^{GH} & 0 \\ K_1^A & 0 & 0 \end{bmatrix}$$
with block sizes $2m_a$, $4m_a$, and $2m_a$,
with the non-zero components defined in (18)–(20). Here, the deflation of $K_2^G$ is explained explicitly; that of $K_2^H$ is similar. In fact, there are 10 block rows and block columns, each of initial size $m_a\times m_a$, in $K_2^G$. Due to the deflation of the $L$-factors described above, we add the first and the ninth rows to the third and the seventh rows and then remove the first and the ninth rows, respectively. We also add the first and the ninth columns to the third and the seventh columns and then remove the first and the ninth columns, respectively, completing the deflation to $K_2^{Gd}$.
Analogously, there are eight block rows and block columns, each of initial size $m_a\times m_a$, in $K_2^A$. The deflation process simultaneously adds the seventh column and row sub-blocks to the third column and row sub-blocks, respectively. Then the first column sub-block of the upper-right $K_1^A$ and the first row sub-block of the lower-left $K_1^A$ overlap with the first column sub-block of $K_1^{AGH}$ and the first row sub-block of $(\widetilde{K}_1^{AGH})^{\top}$, respectively, completing the deflation to $K_2^{Ad}$.
The whole process is described in Figure 1 and Figure 2, where each small square is of size $m_a\times m_a$ and each block with gray background represents a non-zero component of $K_2^G$ and $K_2^A$. The little white squares in $K_2^{Gd}$ and $K_2^{Ad}$ inherit from the originally undeflated submatrices, and the little black squares in $K_2^{Gd}$ and $K_2^{Ad}$ represent the submatrices after summation.
Case for $k \geq 3$.
After the $(k-1)$-th deflation, the deflated matrices $L_{k-1}^{Gd}$, $L_{1,k-1}^{Ad}$, $L_{k-1}^{Hd}$ and $L_{2,k-1}^{Ad}$ are denoted by $L_{k-1}^G$, $L_{1,k-1}^A$, $L_{k-1}^H$ and $L_{2,k-1}^A$ for simplicity. Now there are $m_{k-1}^g - (k-1)m_a$ (or $m_{k-1}^h - (k-1)m_a$) identical columns in $L_{k-1}^G$ and $L_{1,k-1}^A$ (or $L_{k-1}^H$ and $L_{2,k-1}^A$) and $m_{k-1}^{a_2} - m_a$ (or $m_{k-1}^{a_1} - m_a$) identical columns in $D_{k-1}^{AGHG}L_{2,k-1}^A$ and $D_{k-1}^{AGHG}L_{k-1}^H$ (or $D_{k-1}^{AHGH}L_{1,k-1}^A$ and $D_{k-1}^{AHGH}L_{k-1}^G$). Then, one can remove the columns
$$L_{k-1}^G(:,\ (k-2)m_a+1 : m_{k-1}^g - m_a) \qquad \text{or} \qquad L_{k-1}^H(:,\ (k-2)m_a+1 : m_{k-1}^h - m_a)$$
and
$$D_{k-1}^{AGHG}L_{2,k-1}^A(:,\ 1 : m_{k-1}^{a_2} - m_a) \qquad \text{or} \qquad D_{k-1}^{AHGH}L_{1,k-1}^A(:,\ 1 : m_{k-1}^{a_1} - m_a),$$
and keep the columns
$$L_{1,k-1}^A(:,\ 1 : m_{k-1}^g - (k-1)m_a) \qquad \text{or} \qquad L_{2,k-1}^A(:,\ 1 : m_{k-1}^h - (k-1)m_a)$$
and
$$D_{k-1}^{AGHG}L_{k-1}^H(:,\ m_{k-1}^h - m_{k-1}^{a_2} + 1 : m_{k-1}^h - m_a) \qquad \text{or} \qquad D_{k-1}^{AHGH}L_{k-1}^G(:,\ m_{k-1}^g - m_{k-1}^{a_1} + 1 : m_{k-1}^g - m_a)$$
in $L_k^G$ (A1) (or $L_k^H$ (A3)), respectively. So there are $k-1$ matrices, each of size $N\times m_a$, left in $L_{k-1}^G$ (or $L_{k-1}^H$), i.e., $D_0^{AGHG}L_{20}^A$, $D_1^{AGHG}D_0^{AHG}L_{20}^A$, $\ldots$, $D_{k-2}^{AGHG}\prod_{i=k-3}^{0}D_i^{AHG}L_{20}^A$ in (A1) (or $D_0^{AHGH}L_{10}^A$, $D_1^{AHGH}D_0^{AGH}L_{10}^A$, $\ldots$, $D_{k-2}^{AHGH}\prod_{i=k-3}^{0}D_i^{AGH}L_{10}^A$ in (A3)) of Appendix B. Meanwhile, only one matrix of size $N\times m_a$ is left in $D_{k-1}^{AGHG}L_{2,k-1}^A$ (or $D_{k-1}^{AHGH}L_{1,k-1}^A$), i.e., the last item $D_{k-1}^{AGHG}\prod_{i=k-2}^{0}D_i^{AHG}L_{20}^A$ in (A1) (or $D_{k-1}^{AHGH}\prod_{i=k-2}^{0}D_i^{AGH}L_{10}^A$ in (A3)) of Appendix B. We also take $L_3^G$ as an example to describe the above deflation more clearly in Appendix C.
To deflate $L_{1,k}^A$ ($L_{2,k}^A$), the columns
$$D_{k-1}^{AGH}L_{1,k-1}^A(:,\ 1 : m_{k-1}^{a_1} - m_a) \qquad \text{or} \qquad D_{k-1}^{AHG}L_{2,k-1}^A(:,\ 1 : m_{k-1}^{a_2} - m_a)$$
are removed, but the columns
$$D_{k-1}^{AGH}L_{k-1}^G(:,\ m_{k-1}^g - m_{k-1}^{a_1} + 1 : m_{k-1}^g - m_a) \qquad \text{or} \qquad D_{k-1}^{AHG}L_{k-1}^H(:,\ m_{k-1}^h - m_{k-1}^{a_2} + 1 : m_{k-1}^h - m_a)$$
are retained in $L_{1,k}^A$ (or $L_{2,k}^A$). So only one matrix of size $N\times m_a$ is left in $D_{k-1}^{AGH}L_{1,k-1}^A$ (or $D_{k-1}^{AHG}L_{2,k-1}^A$), i.e., the last item $\prod_{i=k-1}^{0}D_i^{AGH}L_{10}^A$ in (A2) (or $\prod_{i=k-1}^{0}D_i^{AHG}L_{20}^A$ in (A4)) of Appendix B. Note that the low-rank factors in the $(k-1)$-th iteration are the ones after deflation, truncation and compression, with the superscript "$d$" dropped for simplicity. We take $L_{13}^A$ as an example to describe the above deflation more clearly in Appendix D.
Correspondingly, the kernel matrices $K_k^G$, $K_k^H$, and $K_k^A$ are deflated according to their low-rank factors. Here, we describe the deflation of $K_k^G$; that of $K_k^H$ is essentially the same. Recalling the positions of the non-zero sub-matrices (the blocks with gray background in Figure 3) of $K_k^G$ in (21), the deflation process essentially adds $K_{k-1}^G((k-2)m_a+1 : m_{k-1}^g - m_a,\ (k-2)m_a+1 : m_{k-1}^g - m_a)$ to $K_{k-1}^{AGHGA}(1 : m_{k-1}^g - (k-1)m_a,\ 1 : m_{k-1}^g - (k-1)m_a)$, the columns $K_{k-1}^A(:,\ 1 : m_{k-1}^{a_2} - m_a)$ to $K_{k-1}^{AGHG}(:,\ m_{k-1}^g + m_{k-1}^h - m_{k-1}^{a_2} + 1 : m_{k-1}^g + m_{k-1}^h - m_a)$ and the rows $(K_{k-1}^A)^{\top}(1 : m_{k-1}^{a_2} - m_a,\ :)$ to $(K_{k-1}^{AGHG})^{\top}(m_{k-1}^g + m_{k-1}^h - m_{k-1}^{a_2} + 1 : m_{k-1}^g + m_{k-1}^h - m_a,\ :)$, respectively. See Figure 3 for an illustration.
Similarly, recalling the positions of the non-zero matrices (the blocks with gray background in Figure 4) of $K_k^A$ in (23), the deflation process adds the columns $K_{k-1}^A(:,\ 1 : m_{k-1}^{a_2} - m_a)$ to the columns $K_{k-1}^{AGH}(:,\ m_{k-1}^h - m_{k-1}^{a_2} + 1 : m_{k-1}^h - m_a)$ and the rows $K_{k-1}^A(1 : m_{k-1}^{a_1} - m_a,\ :)$ to the rows $(\widetilde{K}_{k-1}^{AGH})^{\top}(m_{k-1}^g - m_{k-1}^{a_1} + 1 : m_{k-1}^g - m_a,\ :)$. See Figure 4 for an illustration.
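The kernel bookkeeping in this deflation admits a compact generic form: whenever a column of a (symmetric-side) low-rank factor duplicates another, the corresponding kernel row and column are folded into the kept index before the duplicate is dropped. A schematic sketch of this single operation only, not of the full index pattern above (the helper name is hypothetical):

```python
import numpy as np

def deflate_duplicate_column(L, K, dup, keep):
    """If L[:, dup] == L[:, keep], fold row/column `dup` of the kernel K into
    row/column `keep` and drop them, preserving the product L K L^T."""
    K = K.copy()
    K[keep, :] += K[dup, :]
    K[:, keep] += K[:, dup]
    sel = [j for j in range(L.shape[1]) if j != dup]
    return L[:, sel], K[np.ix_(sel, sel)]

# Quick check on random data with a duplicated column
rng = np.random.default_rng(5)
L = rng.standard_normal((50, 4)); L[:, 3] = L[:, 0]
K = rng.standard_normal((4, 4))
Ld, Kd = deflate_duplicate_column(L, K, dup=3, keep=0)
print(np.linalg.norm(L @ K @ L.T - Ld @ Kd @ Ld.T))   # ~1e-15
```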

4. Partial Truncation and Compression

Although the deflation of the low-rank factors and kernels in the last section reduces the dimensional growth, the exponential increase of the undeflated part is still rapid, making large-scale computation and storage infeasible. Conventionally, an efficient way to shrink the column number of low-rank factors is truncation and compression (TC) [17,18], which, unfortunately, is hard to apply to our case due to the following two main obstacles.
  • Direct application of TC to $L_k^{Hd}$, $L_k^{Gd}$, $L_{1,k}^{Ad}$, $L_{2,k}^{Ad}$ and their corresponding kernels $K_k^{Hd}$, $K_k^{Gd}$ and $K_k^{Ad}$ at the $k$-th step would require four QR decompositions, resulting in relatively high computational complexity and CPU consumption.
  • The TC process applied to the whole low-rank factors at the current step breaks up the implicit structure, making the deflation unrealizable in the next iteration.
In this section, we instead present a technique of partial truncation and compression (PTC) to overcome the above difficulties. Our PTC requires only two QR decompositions of the exponentially increasing (not the entire) parts of the low-rank factors, preserving the successive deflation for subsequent iterations.
PTC for low-rank factors. Recall the deflated forms (A1) and (A3) in Appendix B. $L_k^{Gd}$ and $L_k^{Hd}$ can be divided into three parts
$$L_k^{Gd} = [L_k^{Gd(1)},\ L_k^{Gd(2)},\ L_k^{Gd(3)}], \qquad L_k^{Hd} = [L_k^{Hd(1)},\ L_k^{Hd(2)},\ L_k^{Hd(3)}].$$
The number of columns in
$$L_k^{Gd(1)} := [D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AHG}L_{20}^A,\ \ldots,\ D_{k-2}^{AGHG}\prod_{i=k-3}^{0}D_i^{AHG}L_{20}^A] \in \mathbb{R}^{N\times(k-1)m_a}$$
and
$$L_k^{Hd(1)} := [D_0^{AHGH}L_{10}^A,\ D_1^{AHGH}D_0^{AGH}L_{10}^A,\ \ldots,\ D_{k-2}^{AHGH}\prod_{i=k-3}^{0}D_i^{AGH}L_{10}^A] \in \mathbb{R}^{N\times(k-1)m_a}$$
increases only linearly with $k$, and the last parts
$$L_k^{Gd(3)} := D_{k-1}^{AGHG}\prod_{i=k-2}^{0}D_i^{AHG}L_{20}^A \in \mathbb{R}^{N\times m_a} \qquad \text{and} \qquad L_k^{Hd(3)} := D_{k-1}^{AHGH}\prod_{i=k-2}^{0}D_i^{AGH}L_{10}^A \in \mathbb{R}^{N\times m_a}$$
are always of size $N\times m_a$. So we only truncate and compress the dominantly growing parts
$$L_k^{Gd(2)} := [L_{1,k-1}^A,\ D_{k-1}^{AGH}L_{k-1}^G,\ D_{k-1}^{AGHG}L_{k-1}^H] \qquad \text{and} \qquad L_k^{Hd(2)} := [L_{2,k-1}^A,\ D_{k-1}^{AHG}L_{k-1}^H,\ D_{k-1}^{AHGH}L_{k-1}^G]$$
by orthogonalization. Consider the QR decompositions with column pivoting
$$L_k^{Gd(2)}P_k^G = [Q_k^G\ \ \widetilde{Q}_k^G]\begin{bmatrix} U_{k,1}^G & U_{k,2}^G \\ 0 & \widetilde{U}_k^G \end{bmatrix}, \quad \|\widetilde{U}_k^G\| < u_0^g\tau_g, \qquad L_k^{Hd(2)}P_k^H = [Q_k^H\ \ \widetilde{Q}_k^H]\begin{bmatrix} U_{k,1}^H & U_{k,2}^H \\ 0 & \widetilde{U}_k^H \end{bmatrix}, \quad \|\widetilde{U}_k^H\| < u_0^h\tau_h, \qquad (32)$$
where $P_k^G$ and $P_k^H$ are permutation matrices such that the diagonal elements of $\begin{bmatrix} U_{k,1}^J & U_{k,2}^J \\ 0 & \widetilde{U}_k^J \end{bmatrix}$ ($J = G$ or $H$) are decreasing in absolute value, $u_0^g = \|U_{0,1}^G\|$, $u_0^h = \|U_{0,1}^H\|$, and $\tau_g$ and $\tau_h$ are small tolerances controlling the PTC of $L_k^{Gd(2)}$ and $L_k^{Hd(2)}$, respectively; $m_k^{g(2)}$ and $m_k^{h(2)}$ are the respective column numbers of $L_k^{G(2)}$ and $L_k^{H(2)}$, bounded above by some given $m_{\max}$. Then their ranks satisfy
$$r_k^g := \mathrm{rank}(L_k^{G(2)}) \leq m_k^{g(2)} \leq m_{\max}, \qquad r_k^h := \mathrm{rank}(L_k^{H(2)}) \leq m_k^{h(2)} \leq m_{\max}$$
with $m_{\max} \ll N$. Furthermore, $Q_k^G \in \mathbb{R}^{N\times r_k^g}$ and $Q_k^H \in \mathbb{R}^{N\times r_k^h}$ are orthonormal, and $U_k^G = [U_{k,1}^G\ \ U_{k,2}^G] \in \mathbb{R}^{r_k^g\times m_{k-1}^{hga}}$ and $U_k^H = [U_{k,1}^H\ \ U_{k,2}^H] \in \mathbb{R}^{r_k^h\times m_{k-1}^{hga}}$ are full-rank with $m_{k-1}^{hga} = m_{k-1}^h + m_{k-1}^g + m_{k-1}^a$. Then $L_k^{Gd}$ and $L_k^{Hd}$ can be truncated and reorganized as
$$L_k^{Gdt} = [L_k^{Gd(1)},\ Q_k^G,\ L_k^{Gd(3)}] := [L_k^{Gdt(1)},\ L_k^{Gdt(2)},\ L_k^{Gdt(3)}] \in \mathbb{R}^{N\times m_k^g}, \qquad (33)$$
$$L_k^{Hdt} = [L_k^{Hd(1)},\ Q_k^H,\ L_k^{Hd(3)}] := [L_k^{Hdt(1)},\ L_k^{Hdt(2)},\ L_k^{Hdt(3)}] \in \mathbb{R}^{N\times m_k^h}$$
with $m_k^g = r_k^g + km_a$ and $m_k^h = r_k^h + km_a$.
Similarly, recalling the deflated forms (A2) and (A4) in Appendix B, $L_{1,k}^{Ad}$ and $L_{2,k}^{Ad}$ are also divided into two parts,
$$L_{1,k}^{Ad} = [L_{1,k}^{Ad(1)},\ L_{1,k}^{Ad(2)}] \qquad \text{and} \qquad L_{2,k}^{Ad} = [L_{2,k}^{Ad(1)},\ L_{2,k}^{Ad(2)}]$$
with
$$L_{1,k}^{Ad(1)} = L_k^{Gd(2)}, \quad L_{1,k}^{Ad(2)} = \prod_{i=k-1}^{0}D_i^{AGH}L_{10}^A, \quad L_{2,k}^{Ad(1)} = L_k^{Hd(2)}, \quad L_{2,k}^{Ad(2)} = \prod_{i=k-1}^{0}D_i^{AHG}L_{20}^A.$$
Since $L_k^{Gd(2)}$ and $L_k^{Hd(2)}$ have been compressed to $Q_k^G$ and $Q_k^H$, respectively, one has the truncated and compressed factors
$$L_{1,k}^{Adt} = [Q_k^G,\ L_{1,k}^{Ad(2)}] = [L_{1,k}^{Adt(1)},\ L_{1,k}^{Adt(2)}] \in \mathbb{R}^{N\times m_k^{a_1}}, \qquad L_{2,k}^{Adt} = [Q_k^H,\ L_{2,k}^{Ad(2)}] = [L_{2,k}^{Adt(1)},\ L_{2,k}^{Adt(2)}] \in \mathbb{R}^{N\times m_k^{a_2}} \qquad (34)$$
with $m_k^{a_1} = r_k^g + m_a$ and $m_k^{a_2} = r_k^h + m_a$, finishing the PTC process for the low-rank factors in the $k$-th iteration.
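A minimal sketch of one PTC step via rank-revealing QR as in (32); here `ptc` is a hypothetical helper, and the tolerance handling is a simplified stand-in for the $u_0\tau$ rule above:

```python
import numpy as np
from scipy.linalg import qr

def ptc(L2, tau, mmax):
    """Compress the growing block L_k^{Gd(2)} (or L_k^{Hd(2)}):
    returns orthonormal Q_k and U_k with L2 ~ Q_k @ U_k, as in (32)-(33)."""
    Q, R, p = qr(L2, mode='economic', pivoting=True)
    d = np.abs(np.diag(R))
    r = min(int(np.sum(d > tau * d[0])), mmax)   # truncation rank
    U = np.zeros((r, L2.shape[1]))
    U[:, p] = R[:r, :]                           # undo the column permutation
    return Q[:, :r], U

# Check: the compressed pair reproduces the block up to the tolerance
rng = np.random.default_rng(6)
L2 = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 40))  # numerically low-rank
Q, U = ptc(L2, 1e-12, mmax=100)
print(Q.shape[1], np.linalg.norm(L2 - Q @ U))    # rank 8, residual ~1e-12
```

The kernel update (36) below then amounts to sandwiching the deflated kernels with the resulting $U$-factors, padded with identities on the untouched parts.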
It is worth noting that the above PTC process carries over to the next iteration. In fact, one has
$$L_{k+1}^G = [L_k^{Gdt},\ L_{1,k}^{Adt},\ D_k^{AGH}L_k^{Gdt},\ D_k^{AGHG}L_k^{Hdt},\ D_k^{AGHG}L_{2,k}^{Adt}], \qquad L_{k+1}^H = [L_k^{Hdt},\ L_{2,k}^{Adt},\ D_k^{AHG}L_k^{Hdt},\ D_k^{AHGH}L_k^{Gdt},\ D_k^{AHGH}L_{1,k}^{Adt}]$$
after the $k$-th PTC. As $L_{1,k}^{Adt(1)}$ equals $L_k^{Gdt(2)}$ and $L_{2,k}^{Adt(1)}$ equals $L_k^{Hdt(2)}$, one can deflate $L_{k+1}^G$ and $L_{k+1}^H$ to
$$L_{k+1}^{Gd} = [L_{k+1}^{Gd(1)},\ L_{k+1}^{Gd(2)},\ L_{k+1}^{Gd(3)}], \qquad L_{k+1}^{Hd} = [L_{k+1}^{Hd(1)},\ L_{k+1}^{Hd(2)},\ L_{k+1}^{Hd(3)}]$$
with
$$L_{k+1}^{Gd(1)} = [L_k^{Gdt(1)},\ L_k^{Gdt(3)}], \qquad L_{k+1}^{Hd(1)} = [L_k^{Hdt(1)},\ L_k^{Hdt(3)}],$$
$$L_{k+1}^{Gd(2)} = [L_{1,k}^{Adt},\ D_k^{AGH}L_k^{Gdt},\ D_k^{AGHG}L_k^{Hdt}], \qquad L_{k+1}^{Hd(2)} = [L_{2,k}^{Adt},\ D_k^{AHG}L_k^{Hdt},\ D_k^{AHGH}L_k^{Gdt}],$$
$$L_{k+1}^{Gd(3)} = D_k^{AGHG}L_{2,k}^{Adt(2)}, \qquad L_{k+1}^{Hd(3)} = D_k^{AHGH}L_{1,k}^{Adt(2)}.$$
Applying PTC to $L_{k+1}^{Gd(2)}$ and $L_{k+1}^{Hd(2)}$ again, one has
$$L_{k+1}^{Gdt} = [L_{k+1}^{Gd(1)},\ Q_{k+1}^G,\ L_{k+1}^{Gd(3)}] := [L_{k+1}^{Gdt(1)},\ L_{k+1}^{Gdt(2)},\ L_{k+1}^{Gdt(3)}],$$
$$L_{k+1}^{Hdt} = [L_{k+1}^{Hd(1)},\ Q_{k+1}^H,\ L_{k+1}^{Hd(3)}] := [L_{k+1}^{Hdt(1)},\ L_{k+1}^{Hdt(2)},\ L_{k+1}^{Hdt(3)}],$$
where $Q_{k+1}^G \in \mathbb{R}^{N\times r_{k+1}^g}$ and $Q_{k+1}^H \in \mathbb{R}^{N\times r_{k+1}^h}$ are the orthonormal matrices from the QR decomposition, and the PTC in the $(k+1)$-th iteration is complete.
PTC for kernels. Define the matrices
$$\widehat{U}_{1,k}^A = \begin{bmatrix} U_k^G & \\ & I_{m_a} \end{bmatrix}, \quad \widehat{U}_k^G = \begin{bmatrix} I_{(k-1)m_a} & & \\ & U_k^G & \\ & & I_{m_a} \end{bmatrix}, \quad \widehat{U}_{2,k}^A = \begin{bmatrix} U_k^H & \\ & I_{m_a} \end{bmatrix}, \quad \widehat{U}_k^H = \begin{bmatrix} I_{(k-1)m_a} & & \\ & U_k^H & \\ & & I_{m_a} \end{bmatrix} \qquad (35)$$
with $U_k^G$ and $U_k^H$ from (32). Then the truncated and compressed kernels are
$$K_k^{Gdt} := \widehat{U}_k^G K_k^{Gd}(\widehat{U}_k^G)^{\top} \in \mathbb{R}^{m_k^g\times m_k^g}, \qquad K_k^{Hdt} := \widehat{U}_k^H K_k^{Hd}(\widehat{U}_k^H)^{\top} \in \mathbb{R}^{m_k^h\times m_k^h}, \qquad K_k^{Adt} := \widehat{U}_{1,k}^A K_k^{Ad}(\widehat{U}_{2,k}^A)^{\top} \in \mathbb{R}^{m_k^{a_1}\times m_k^{a_2}}. \qquad (36)$$
To eliminate items smaller than $O(\tau_g)$ and $O(\tau_h)$ in the low-rank factors and kernels, an additional monitoring step is imposed after the PTC process. Specifically, the last item $D_{k-2}^{AGHG}\prod_{i=k-3}^{0}D_i^{AHG}L_{20}^A$ in $L_k^{Gdt}$ (or $D_{k-2}^{AHGH}\prod_{i=k-3}^{0}D_i^{AGH}L_{10}^A$ in $L_k^{Hdt}$) is discarded if its norm is less than $O(\tau_g)$ (or $O(\tau_h)$). Similarly, $\prod_{i=k-1}^{0}D_i^{AGH}L_{10}^A$ in $L_{1,k}^{Ad(2)}$ (or $\prod_{i=k-1}^{0}D_i^{AHG}L_{20}^A$ in $L_{2,k}^{Ad(2)}$) is abandoned if its norm is less than $O(\tau_g)$ (or $O(\tau_h)$). In this way, the growth of the column dimension of the low-rank factors $L_k^{Gdt}$, $L_k^{Hdt}$, $L_{1,k}^{Adt}$ and $L_{2,k}^{Adt}$, as well as of the kernels $K_k^{Gdt}$, $K_k^{Hdt}$, $K_k^{Adt}$, is controlled efficiently while sacrificing a hopefully negligible bit of accuracy. Additionally, their sizes after PTC are further restricted by setting a reasonable upper bound $m_{\max}$.

5. Algorithm and Implementation

5.1. Computation of Residuals

The computation of relative residuals, such as $r_{rel} = \|\mathcal{D}(H_k)\|/\|\mathcal{D}(H_0)\|$, is commonly used when solving the DARE by SDA, as mentioned in [4]. Typically, the FSDA is designed to stop when the relative residual is sufficiently small, which guarantees that the approximate solution $H_k$ is close to the exact solution of the DARE [35]. However, computing $r_{rel}$ directly can be computationally expensive due to the high rank of $H_k$ and $G_k$. To overcome this difficulty, the residual is divided into two parts, the banded part and the low-ranked part, under the assumptions (4) and (5). The residual of the banded part can be computed relatively cheaply and serves as a pre-termination condition, followed by the termination of the entire FSDA based on the residual of the low-ranked part.

5.1.1. Residual for the Banded Part

Define
$$\widetilde{D}_k^{HG} = (I + D_k^H D_0^G)^{-1}, \qquad \widetilde{D}_k^{HGH} = \widetilde{D}_k^{HG}D_k^H, \qquad \widetilde{D}_k^{GHG} = D_0^G\widetilde{D}_k^{HG}$$
and
$$\widetilde{K}_k^H = (I + K_k^H(L_k^H)^{\top}\widetilde{D}_k^{GHG}L_k^H)^{-1}K_k^H.$$
With the current approximate solution $H_k = D_k^H + L_k^H K_k^H (L_k^H)^{\top}$, the residual of DARE (3) is
$$\mathcal{D}(H_k) = -H_k + A^{\top}\left[\widetilde{D}_k^{HGH} + \widetilde{D}_k^{HG}L_k^H\widetilde{K}_k^H(\widetilde{D}_k^{HG}L_k^H)^{\top}\right]A + H := D_k^R + L_k^R K_k^R (L_k^R)^{\top},$$
where the banded part, the low-rank part and the kernel are
$$D_k^R = D_0^H - D_k^H + (D_0^A)^{\top}D_k^H(I + D_0^G D_k^H)^{-1}D_0^A,$$
$$L_k^R = [L_{20}^A,\ (D_0^A)^{\top}\widetilde{D}_k^{HGH}L_{10}^A,\ (D_0^A)^{\top}\widetilde{D}_k^{HG}L_k^H,\ L_k^H],$$
$$K_k^R = \begin{bmatrix} \widetilde{K}_k^{AHGHA} & I_{m_a} & \widetilde{K}_k^{AHG} & 0 \\ I_{m_a} & 0 & 0 & 0 \\ (\widetilde{K}_k^{AHG})^{\top} & 0 & \widetilde{K}_k^H & 0 \\ 0 & 0 & 0 & -K_k^H \end{bmatrix} \qquad (37)$$
with block sizes $m_a$, $m_a$, $m_k^h$, $m_k^h$, respectively, and
$$\widetilde{K}_k^{AHG} = (L_{10}^A)^{\top}\widetilde{D}_k^{HG}L_k^H\cdot\widetilde{K}_k^H, \qquad \widetilde{K}_k^{AHGHA} = (L_{10}^A)^{\top}\widetilde{D}_k^{HGH}L_{10}^A + \widetilde{K}_k^{AHG}\cdot\left((L_{10}^A)^{\top}\widetilde{D}_k^{HG}L_k^H\right)^{\top}.$$
It is not difficult to see that the main flop counts for the kernel $K_k^R$ lie in forming the matrices
$$(L_{10}^A)^{\top}\widetilde{D}_k^{HGH}L_{10}^A, \qquad (L_{10}^A)^{\top}\widetilde{D}_k^{HG}L_k^H, \qquad (L_k^H)^{\top}\widetilde{D}_k^{GHG}L_k^H. \qquad (38)$$
To avoid calculating them in each iteration, we first check whether
$$\mathrm{B\_RRes} = \frac{\|D_k^R\|}{|\bar{D}_0^R| + \|L_0^R\|^2\|K_0^R\|} \leq \epsilon_b \qquad (39)$$
with $|\bar{D}_0^R| = \|D_0^A\|_2^2\,\|D_0^H(I + D_0^G D_0^H)^{-1}\|_2$ and $\epsilon_b$ being the band tolerance. Here, the norm $\|\cdot\|_2$ is the matrix spectral norm, which is not easy to compute and is replaced by the $l_1$ matrix norm in practice. This is feasible as the residual $\mathcal{D}(H_k)$ comes from two relatively independent parts, i.e., the banded part and the low-rank part.
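A dense sketch of the pre-termination test (39), illustrative only; in the actual FSDA $D_k^R$ is formed with banded arithmetic, and the argument `denom` is assumed to collect the constant $|\bar{D}_0^R| + \|L_0^R\|^2\|K_0^R\|$:

```python
import numpy as np

def pre_terminate(D0A, D0G, D0H, DkH, denom, eps_b):
    """Banded-part residual test (39), with the 1-norm standing in
    for the spectral norm as suggested above."""
    N = D0A.shape[0]
    DkR = D0H - DkH + D0A.T @ DkH @ np.linalg.inv(np.eye(N) + D0G @ DkH) @ D0A
    return np.linalg.norm(DkR, 1) / denom <= eps_b
```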

5.1.2. Residual for the Low-Rank Part

When the pre-termination (39) is satisfied, the matrices in (38) are constructed, followed by the deflation, truncation, and compression of the low-rank factor $L_k^R$. Specifically, the columns $L_{20}^A(:, 1:m_a)$ are removed and the columns $L_k^H(:, 1:m_a)$ are kept, so that $L_k^R$ is deflated to $L_k^{Rd}$, i.e.,
$$L_k^{Rd} = [(D_0^A)^{\top}\widetilde{D}_k^{HGH}L_{10}^A,\ (D_0^A)^{\top}\widetilde{D}_k^{HG}L_k^H,\ L_k^H].$$
Let $\widehat{I}_{m_a} = [I_{m_a}, 0, \ldots, 0] \in \mathbb{R}^{m_a\times m_k^h}$ and $\widehat{K}_k^{AHG} = [(\widetilde{K}_k^{AHG})^{\top}, 0, \ldots, 0] \in \mathbb{R}^{m_k^h\times m_k^h}$. The kernel $K_k^R$ in (37) is correspondingly deflated to
$$K_k^{Rd} = \begin{bmatrix} 0 & 0 & \widehat{I}_{m_a} \\ 0 & \widetilde{K}_k^H & \widehat{K}_k^{AHG} \\ (\widehat{I}_{m_a})^{\top} & (\widehat{K}_k^{AHG})^{\top} & \widehat{K}_k^{AHGHA} \end{bmatrix}$$
with block sizes $m_a$, $m_k^h$, $m_k^h$, where all elements of $\widehat{K}_k^{AHGHA}$ are the same as those of $-K_k^H$ except that $\widehat{K}_k^{AHGHA}(1:m_a, 1:m_a) = \widetilde{K}_k^{AHGHA} - K_k^H(1:m_a, 1:m_a)$.
After deflation, truncation and compression are applied to $L_k^{Rd}$ via the QR decomposition
$$L_k^{Rd}P_k^R = [Q_k^R\ \ \widetilde{Q}_k^R]\begin{bmatrix} U_{k,1}^R & U_{k,2}^R \\ 0 & \widetilde{U}_k^R \end{bmatrix}, \qquad \|\widetilde{U}_k^R\| < u_0^r\tau_r,$$
where $P_k^R$ is a permutation matrix such that the diagonal elements of $\begin{bmatrix} U_{k,1}^R & U_{k,2}^R \\ 0 & \widetilde{U}_k^R \end{bmatrix}$ are decreasing in absolute value, $u_0^r = \|U_{0,1}^R\|$ and $\tau_r$ is the given tolerance; $Q_k^R \in \mathbb{R}^{N\times r_k^r}$ is orthonormal and $U_k^R = [U_{k,1}^R\ \ U_{k,2}^R] \in \mathbb{R}^{r_k^r\times n_k}$ is full-ranked. Since $\|L_k^R K_k^R (L_k^R)^{\top}\| \approx \|U_k^R K_k^{Rd}(U_k^R)^{\top}\|$, the terminating condition of the whole algorithm is chosen to be
$$\mathrm{LR\_RRes} = \frac{\|U_k^R K_k^{Rd}(U_k^R)^{\top}\|}{|\bar{D}_0^R| + \|L_0^R\|^2\|K_0^R\|} \leq \epsilon_l \qquad (40)$$
with $\epsilon_l$ being the low-rank tolerance.

5.2. Algorithm and Operation Counts

The process of deflation and PTC, together with the computation of the residuals (39) and (40), is summarized in Algorithm 1 (FSDA).
Algorithm 1 FSDA. Solve DAREs with high-ranked G and H
Inputs: 
Banded matrices $D_0^A$, $D_0^G$, $D_0^H$; low-rank factors $L_{10}^A$, $L_{20}^A$, $L_0^G$, $L_0^H$; kernels $K_0^G$, $K_0^H$; the iterative tolerance $tol$; truncation tolerances $\tau_g$, $\tau_h$, $\tau_r$ and upper bound $m_{\max}$; band tolerance $\epsilon_b$ and low-rank tolerance $\epsilon_l$.
Outputs: 
Banded matrix $D^H$, low-rank matrix $L^H$ and kernel matrix $K^H$ with the stabilizing solution $X_s \approx D^H + L^H K^H (L^H)^{\top}$.
  • Set $D_1^G = D_0^G + D_0^{AGHG}(D_0^A)^{\top}$, $D_1^H = D_0^H + D_0^{AHGH}D_0^A$, $D_1^A = D_0^A(D_0^{AHG})^{\top}$ as in (9). Set $L_1^G = [L_{10}^A,\ D_0^{AGHG}L_{20}^A]$, $L_1^H = [L_{20}^A,\ D_0^{AHGH}L_{10}^A]$, $L_{11}^A = [L_{10}^A,\ D_0^{AGH}L_{10}^A]$, $L_{21}^A = [L_{20}^A,\ D_0^{AHG}L_{20}^A]$ as in (10). Set $K_1^G$, $K_1^H$, $K_1^A$ as in (11) and (12).
  • For $k = 2, \ldots$, until convergence, do
  •  Compute the banded matrices $D_k^G$, $D_k^H$, $D_k^A$ as in (13).
  •  Form the components (18)–(20) and construct the kernels $K_k^G$, $K_k^H$ and $K_k^A$ as in (21)–(23).
  •  Deflate the kernels $K_k^G \to K_k^{Gd}$, $K_k^H \to K_k^{Hd}$ and $K_k^A \to K_k^{Ad}$ in the way of Figure 3 and Figure 4.
  •  Deflate the low-rank factors $L_k^G \to L_k^{Gd}$, $L_k^H \to L_k^{Hd}$, $L_{1,k}^A \to L_{1,k}^{Ad}$ and $L_{2,k}^A \to L_{2,k}^{Ad}$ as in (A1)–(A4).
  •  Partially truncate and compress $L_k^{Gd}$ and $L_k^{Hd}$ as in (32) with accuracies $u_0^g\tau_g$ and $u_0^h\tau_h$.
  •  Construct the compressed low-rank factors $L_k^{Gdt}$, $L_k^{Hdt}$, $L_{1,k}^{Adt}$ and $L_{2,k}^{Adt}$ as in (33)–(34).
  •  Construct the compressed kernels $K_k^{Gdt}$, $K_k^{Hdt}$ and $K_k^{Adt}$ as in (36).
  •  Evaluate the residual of the banded part B_RRes in (39).
  •  If B_RRes $< tol$, compute the residual of the low-rank part LR_RRes in (40).
  •   If LR_RRes $< tol$, break; end.
  •  End (If)
  •  $K_k^G := K_k^{Gdt}$, $K_k^H := K_k^{Hdt}$, $K_k^A := K_k^{Adt}$.
  •  $L_k^G := L_k^{Gdt}$, $L_k^H := L_k^{Hdt}$, $L_{1,k}^A := L_{1,k}^{Adt}$, $L_{2,k}^A := L_{2,k}^{Adt}$.
  •  $k := k + 1$;
  • End (For)
  • Output $D_k^H = D^H$, $L_k^H = L^H$ and $K_k^H = K^H$.
Remark 3.
1. At each iteration, elements of the banded matrices $D_k^A$, $D_k^H$, and $D_k^G$ with absolute value less than $tol = \mathrm{eps}\cdot\max\{\|D_A\|, \|D_G\|, \|D_H\|\}$ are eliminated.
2. The deflation process involves merging selected rows and columns of the kernels $K_k^G$, $K_k^H$, and $K_k^A$ based on the overlapping columns of the low-rank factors $L_k^G$, $L_k^H$, $L_{1,k}^A$, and $L_{2,k}^A$. This requires adding some columns and rows.
3. The PTC is applied to $L_k^{Gd(2)}$ and $L_k^{Hd(2)}$. The column numbers of $L_k^{Gd(1)}$ and $L_k^{Hd(1)}$ increase linearly with $k$, while those of $L_k^{Gd(3)}$ and $L_k^{Hd(3)}$ remain unchanged. Elements of $L_k^{Gd(1)}$, $L_k^{Hd(1)}$, $L_k^{Gd(3)}$, and $L_k^{Hd(3)}$ with absolute value less than $tol$ are removed to minimize the column size of the low-rank factors.
To further analyze the complexity and memory requirements of the FSDA, the bandwidths of $D_k^A$, $D_k^G$, and $D_k^H$ at each iteration are assumed to be $b_k^a$, $b_k^g$ and $b_k^h$ ($b_k^a, b_k^g, b_k^h \ll N$), respectively. We also set $b_k^{hg} = \max\{b_k^h, b_k^g\}$, $b_k^{hga} = \max\{b_k^h, b_k^g, b_k^a\}$, $m_k^a = \max\{m_k^{a_1}, m_k^{a_2}\}$, and $m_{k-1}^{hga} := m_{k-1}^h + m_{k-1}^g + m_{k-1}^a$ for the convenience of counting flops. The table in Appendix E lists the time and memory requirements of the different components in the $k$-th iteration of the FSDA, where the estimates are upper bounds due to the truncation errors $\tau_g$, $\tau_h$ and $\tau_r$.

6. Numerical Examples

In this section, we demonstrate the effectiveness of the FSDA in computing the approximate solution of the DARE (3). The FSDA was implemented in MATLAB 2014a [38] on a 64-bit PC running Windows 10 with a 3.0 GHz Intel Core i5 processor (6 cores, 6 threads), 32 GB RAM, and machine unit round-off $\mathrm{eps} = 2.22\times 10^{-16}$. The residual of the DARE was estimated by the upper bound
$$\tilde{r}_k = \mathrm{B\_RRes} + \mathrm{LR\_RRes},$$
where B_RRes in (39) and LR_RRes in (40) are the relative residuals of the banded part and the low-rank part, respectively. The tolerance values for truncation and compression were set to $\tau_g = \tau_h = \tau_r = 10^{-16}$, and the termination tolerances were set to $\epsilon_b = \epsilon_l = 10^{-11}$. We also tried $N\cdot\mathrm{eps}$ as the tolerance value for $\tau_g$, $\tau_h$ and $\tau_r$ in our experiments, but found that it had no impact on the residual accuracy. The maximum permitted column number of the low-rank factors was set to $m_{\max} = 2200$. As a comparison, we also ran the ordinary SDA with hierarchical structure (i.e., HODLR) using the hm-toolbox (http://github.com/numpi/hm-toolbox, accessed on 1 June 2023) [39,40], referred to as SDA_HODLR in this paper. The derived relative residual of SDA_HODLR is denoted by $\hat{r}_k$. In our numerical experiments, the initial bandwidths of all banded matrices in Examples 1 and 3 were relatively small, while those in Example 2 were non-trivial.
Example 1.
The first example is of medium scale, measuring the error between the true solution and the computed one. Given the constant $\theta = \eta + 1/\eta - 2\zeta$, where $\zeta$ and $\eta$ are positive numbers such that $\sqrt{\theta}$ is real, let $L_{10}^A = \sqrt{\theta}\,e$ with $e$ a random vector satisfying $e^{\top}e = 1$, $L_{20}^A = L_{10}^A$, and $D_0^A = \zeta I_N$; then $A = D_0^A + L_{10}^A(L_{20}^A)^{\top}$. Set $G = D_0^G = I_N$ and $H = D_0^H = (\eta + 1/\eta)D_0^A - (D_0^A)^2 - I_N$. The solution of the DARE is of the form $X_s = D_X + L_X(L_X)^{\top}$ with $D_X = \eta D_0^A - I_N$ and $L_X = \sqrt{\eta}\,L_{10}^A$.
It is not difficult to see that the solution $X_s$ is stabilizing, since the spectral radius of $(I_N + GX_s)^{-1}A$ is less than unity when $\eta > 1$.
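This construction can be verified directly. A small dense check of the claimed solution, under the reconstruction $\theta = \eta + 1/\eta - 2\zeta$ stated above (the size $N$ and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
N, zeta, eta = 300, 1.2, 2.0
theta = eta + 1.0 / eta - 2.0 * zeta            # theta >= 0, so sqrt(theta) is real
e = rng.standard_normal((N, 1)); e /= np.linalg.norm(e)
L10 = np.sqrt(theta) * e                        # L20 = L10
D0A = zeta * np.eye(N)
A = D0A + L10 @ L10.T
G = np.eye(N)
H = (eta + 1.0 / eta) * D0A - D0A @ D0A - np.eye(N)
Xs = eta * D0A - np.eye(N) + eta * (L10 @ L10.T)   # claimed D_X + L_X L_X^T
res = -Xs + A.T @ Xs @ np.linalg.inv(np.eye(N) + G @ Xs) @ A + H
print(np.linalg.norm(res))                      # ~1e-13: X_s solves DARE (3)
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.inv(np.eye(N) + G @ Xs) @ A)))
print(rho)                                      # = 1/eta = 0.5 < 1: stabilizing
```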
We first took $\zeta = 1.2$ and $\eta = 2$ to calculate B_RRes, followed by LR_RRes as well as the upper bound $\tilde{r}_k$ of the residual of the DARE. In our implementation, the relative error between the approximate solution (denoted by $H_j$ when terminated at the $j$-th iteration) and the true stabilizing solution $X_s$ was evaluated, and the numerical results are presented in Table 1. It is seen that for different scales ($N = 1000, 3000, 5000, 7000$) the FSDA attains the prescribed banded accuracy in five iterations. The residuals LR_RRes and $\tilde{r}_k$ were then evaluated, attaining the order $O(10^{-16})$. The relative error, whose computation is not included in the CPU time, also reflects that $H_5$ approximates the true solution very well. On the other hand, SDA_HODLR also attains the prescribed residual accuracy in five iterations, but costs more CPU time (in seconds).
We then took $\eta = 1.2$ to make the spectral radius of $(I_N + GX_s)^{-1}A$ close to 1 and recorded the numerical performance of the FSDA with $\zeta = 1.0$. It is seen from Table 1 that the FSDA takes seven iterations before termination, obtaining almost the same banded residual histories (B_RRes) for different $N$. As before, LR_RRes and $\tilde{r}_k$ were of $O(10^{-17})$ and $O(10^{-16})$, respectively, showing that $H_7$ is a good approximation to the true solution of DARE (3). The last relative error $\|H_7 - X_s\|/\|X_s\|$ also validates this fact. Analogously, SDA_HODLR requires seven iterations to arrive at the residual level $O(10^{-15})$. It is also seen that the FSDA costs less CPU time than SDA_HODLR for all $N$.
Example 2.
Consider a generalized model of a power system labelled PI Sections 20–80 (https://sites.google.com/site/rommes/software, "S10PI_n1.mat", accessed on 1 June 2023). All transmission lines in the network are modelled by RLC ladder networks of cascaded RLC PI-circuits [41]. The original banded-plus-low-rank matrix $A$ has a small scale of 528 (Figure 5) and is then extended to larger ones. Specifically, we extract the banded part $D_{ori}^A$ of bandwidth 217 from the original matrix $A_{ori}$ and tile it along the diagonal direction 20 times to obtain $D_0^A$. We then compute an SVD of the matrix $A_{ori} - D_{ori}^A$ to produce the singular value matrix $\Sigma_A$ and the unitary matrices $U_A$ and $V_A$. The low-rank parts $L_{10}^A$ and $L_{20}^A$ are then constructed by tiling $U_A(:, 1:r_a)$ and $V_A(:, 1:r_a)$ 20 times and multiplying by $\Sigma_A^{1/2}(1:r_a, 1:r_a)$ from the right, respectively, where $r_a$ is the number of singular values of $\Sigma_A$ not less than $10^{-8}$. Let $F_1$ and $F_3$ be block diagonal matrices with each diagonal block a $3\times 3$ random matrix (generated by 'rand(3)'). Let $F_2$ and $F_4$ also be block diagonal matrices with the top-left element a random number, the last diagonal block a $2\times 2$ random matrix and the other blocks $3\times 3$ random matrices. Define the matrices $G$ and $H$ as
$$G := D_0^G = (R_g + R_g^{\top})/2 + \xi I_N, \qquad H := D_0^H = (R_h + R_h^{\top})/4 + \xi I_N,$$
with $R_g = (F_1 + I_N)(F_2 + I_N)$ and $R_h = (F_3 + I_N)(F_4 + I_N)$.
We ran the FSDA with three different values $\xi = 0.11, 1.0, 3.0$, conducting five random experiments for each. In all experiments, B_RRes and LR_RRes (in $\log_{10}$) were observed to attain the pre-terminating condition (39) and the terminating condition (40), respectively.
Figure 6 plots the obtained numerical results for the five experiments, where Rk is the upper bound of the residual of the DARE, and BRes and LRes are the absolute residuals of the banded part and the low-rank part (i.e., the numerators in B_RRes and LR_RRes), respectively. It is seen that the relative residual levels of LR_RRes and B_RRes (between $10^{-14}$ and $10^{-17}$) are lower than those of LRes and BRes (between $10^{-11}$ and $10^{-13}$) in all experiments. In particular, the gap between them increases as $\xi$ becomes larger. On the other hand, the residual line of Rk lies above the residual lines of B_RRes and LR_RRes, attaining a level between $10^{-15}$ and $10^{-16}$. This demonstrates that the FSDA can attain a relatively high residual accuracy.
To clearly see the evolution of the bandwidths of the banded matrices and the dimensional increase of the low-rank factors over five iterations, we list the history of the bandwidths of $D_k^G$, $D_k^H$, and $D_k^A$ (denoted by $b_k^g$, $b_k^h$, and $b_k^a$, respectively) and the column numbers of $L_k^{Hdt}$ and $L_k^{Gdt}$ (denoted by $m_k^h$ and $m_k^g$, respectively) in Table 2, where the CPU row records the consumed CPU time in seconds. It is clearly seen that, for $\xi = 0.11, 1$, and 3, the FSDA requires 5, 4, and 3 iterations, respectively, to reach the prescribed accuracy. Further experiments show that the required number of iterations at termination decreases as $\xi$ grows. Additionally, we see that the bandwidths $b_k^g$ and $b_k^h$ rise considerably in the second iteration but remain almost unchanged in the remaining iterations. Nevertheless, $b_k^a$ decreases gradually after reaching its maximal value in the second iteration, which is consistent with the convergence of $D_k^A$ in Corollary 1. On the other hand, we see from $m_k^h$ and $m_k^g$ that the column numbers in the second iteration are about fourfold those in the first iteration, since the FSDA does not deflate the low-rank factors in the first iteration. However, the column numbers in the fifth iteration (if it exists) are less than twofold those in the fourth iteration. This reflects that deflation and PTC are efficient in reducing the dimensions of the low-rank factors. In our experiments, we also found that nearly half of the CPU time of the FSDA was consumed in forming $(I_N + D_k^H D_0^G)^{-1}D_k^H$ in the pre-termination. However, such a time expense might decrease if the initial bandwidths $b_0^g$, $b_0^h$, and $b_0^a$ are narrow.
To further compare the numerical performance of the FSDA and SDA_HODLR on larger problems, we extended the original scale to $N$ = 15,840, 21,120, 26,400 and 31,680 at $\xi = 3.0$ and ran both algorithms until convergence. The results are listed in Table 3, where one can see that both the FSDA and SDA_HODLR (i.e., SDA_HD in the table) attain the prescribed residual accuracy within three iterations, and SDA_HODLR requires less CPU time than the FSDA. However, there is a strong tendency for the FSDA to outperform SDA_HODLR in CPU time for larger problems, as the CPU time of SDA_HODLR surges at $N$ = 26,400 and SDA_HODLR ran out of memory at $N$ = 31,680 without producing any numerical results (denoted by "—"). The symbols "*" in the SDA_HODLR column indicate that there are no related records for the bandwidth and the column number of the low-rank factors.
We further modified this example to have a simpler banded part to test both algorithms. Specifically, the relatively data-concentrated banded part of bandwidth 3 is extracted and tiled along the diagonal direction 20 times to form $D_0^A$. As before, an SVD is applied to the remaining matrix to construct the low-rank parts $L_{10}^A$ and $L_{20}^A$, after tiling the derived unitary matrices 20 times and multiplying by $\Sigma_A^{1/2}(1:r_a, 1:r_a)$ from the right. We again selected $\xi = 3.0$ and ran both the FSDA and SDA_HODLR at the scales $N$ = 15,840, 21,120, 26,400 and 31,680. The obtained results are recorded in Table 4, where it is readily seen that the FSDA outperforms SDA_HODLR in CPU time. Once again, SDA_HODLR ran out of memory for the case $N$ = 31,680.
Example 3.
This example is an extension of a small-scale electric power system network to a large-scale one, used for signal stability analysis [19,20,21]. The corresponding matrix $A_{ori}$ is from the power system of New England (https://sites.google.com/site/rommes/software, "ww_36_pemc_36.mat", accessed on 1 June 2023). Figure 7 presents the original structure of the matrix $A$ of order 66. We properly modified the elements $A_{ori}(32,28) = 36.4687$, $A_{ori}(32,29) = 37.922$, $A_{ori}(46,42) = 33.0033$, $A_{ori}(46,43) = 76.8277$, $A_{ori}(60,56) = 83.0405$, $A_{ori}(60,57) = 73.9947$, $A_{ori}(60,59) = 34.0478$. Then the banded part $D_{ori}^A$ is extracted from the blocks $A_{ori}$(1:6, 1:6), $A_{ori}$(7:13, 7:13), $A_{ori}$(14:20, 14:20), $A_{ori}$(21:27, 21:27), $A_{ori}$(28:34, 28:34), $A_{ori}$(35:41, 35:41), $A_{ori}$(42:48, 42:48), $A_{ori}$(49:55, 49:55), $A_{ori}$(56:62, 56:62), and $A_{ori}$(63:66, 63:66), admitting a bandwidth of 4. After tiling $D_{ori}^A$ 200, 400, and 600 times along the diagonal direction, we obtain banded matrices $D_0^A$ of scales $N$ = 13,200, 26,400 and 39,600. For the low-rank factors, an SVD of the matrix $A_{ori} - D_{ori}^A$ is first computed to produce the diagonal singular value matrix $\Sigma_A$ and the unitary matrices $U_A$ and $V_A$. The low-rank parts $L_{10}^A$ and $L_{20}^A$ are then constructed by tiling $U_A(:, 1:r_a)$ and $V_A(:, 1:r_a)$ 200, 400, and 600 times and dividing by their F-norms, respectively, where $r_a$ is the number of singular values of $\Sigma_A$ not less than $10^{-10}$. The matrices $G$ and $H$ are
$$G := D_0^G = \xi I_N, \qquad H := D_0^H = I_N - \frac{1}{1+\xi}D_0^A(D_0^A)^{\top}$$
with $\xi > 0$.
We took different ξ and ran the FSDA to compute the stabilizing solution for different dimensions N = 13,200, 26,400, and 39,600. In our experiments, the FSDA always satisfied the pre-terminating condition (39) first and then terminated at LR_RRes < ϵ l = 10 11 . We picked ξ = 95 and listed derived results in Table 5, where BRes (or LRes) and B_RRes (or LR_RRes) record the absolute and the relative residual for the banded part (or the low-rank part), respectively, and r ˜ k , [ b k g b k h b k a m k h m k g ] record histories of the upper bound of the residual of DARE, the bandwidths of D k G , D k H and D k A and the column numbers of the low-rank factors L k H d t and L k G d t , respectively. Particularly, the t k column describes the accumulated time to compute residuals (excluding the data marked with “*”).
For each N, the FSDA achieves the prescribed accuracy within five iterations. The residuals BRes, B_RRes, LRes, and LR_RRes indicate that the FSDA tends to converge quadratically. In particular, BRes (or B_RRes) at different N are of nearly the same order and terminate at O(10^−9) (or O(10^−11)). Similarly, LRes (or LR_RRes) at different N attain the order O(10^−11) (or O(10^−16)). Further iterations were of little use in improving the accuracy of LRes and LR_RRes. Note that the data labelled with the superscript “*” in the columns LRes, LR_RRes, and r̃_k come from re-running the FSDA to complement the residual at each iteration, and the corresponding CPU time is not included in the column t_k. Lastly, [b_k^g b_k^h b_k^a m_k^h m_k^g] indicates that the bandwidths of D_k^G, D_k^H, and D_k^A are invariant and that the column numbers of the low-rank factors grow by less than a factor of two per iteration, demonstrating the effectiveness of the deflation and PTC.
We also ran the FSDA to compute the solution of the DARE for ξ = 90; the results are recorded in Table 6. In this case, the FSDA requires seven iterations to reach the prescribed accuracy. As before, the last few residuals in the column BRes (or B_RRes) at different N are almost the same, of order O(10^−9) (or O(10^−14)). The residuals LRes (or LR_RRes) at different N terminate at O(10^−10) (or O(10^−15)). In particular, BRes and B_RRes show that the FSDA attained the prescribed accuracy at the fifth iteration, but the corresponding residual of the low-rank part was still between 10^−8 and 10^−9, so two additional iterations were required to meet the termination condition (40), even though the residual level in B_RRes stagnated over the last three iterations. From a structured point of view, the low-rank part seems to approach the critical case while the banded part still lies in the non-critical case; a schematic sketch of this two-stage stopping logic is given below. Similarly, [b_k^g b_k^h b_k^a m_k^h m_k^g] indicates that D_k^G, D_k^H, and D_k^A are all block diagonal with block size 6 and that the deflation and PTC for the low-rank factors are effective. Moreover, t_k shows that the CPU time of the current iteration was less than twice that of the previous iteration for k ≥ 3.
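The following MATLAB fragment replays this two-stage stopping logic on the N = 13,200 residual history of Table 6. The pre-termination tolerance eps_b is an assumption; eps_l = 10^−11 is from the text, and a real run would perform one FSDA doubling step per loop pass.

% Schematic two-stage stopping test on the Table 6 residual history.
B_RRes  = [1.42e-2 3.25e-5 8.64e-8 8.49e-12 1.40e-14 1.40e-14 1.40e-14];
LR_RRes = [2.12e-2 4.24e-6 6.11e-7 4.81e-8  1.32e-8  1.41e-10 1.31e-14];
eps_b = 1e-11; eps_l = 1e-11; pre_done = false;   % eps_b is assumed
for k = 1:numel(B_RRes)
    % (one FSDA doubling step would be performed here)
    if ~pre_done
        pre_done = B_RRes(k) < eps_b;             % cheap banded test, cf. (39)
    elseif LR_RRes(k) < eps_l                     % expensive low-rank test, cf. (40)
        fprintf('terminated at k = %d\n', k);     % prints k = 7, as in Table 6
        break
    end
end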
We further compared the numerical performance of the FSDA and SDA_HODLR on large-scale problems. Different values of ξ were tried, and the numerical behaviors of the two algorithms were analogous. We list the results for ξ = 98 and ξ = 250 in Table 7, where one can see that the FSDA requires fewer iterations and less CPU time to satisfy the stopping criterion than SDA_HODLR. In particular, SDA_HODLR depleted all memory at N = 39,600 and did not yield any numerical results (denoted by “—”). The symbol “*” in the SDA_HODLR columns indicates that no bandwidths or column numbers of the low-rank factors are recorded.

7. Conclusions

The stabilizing solution of the discrete-time algebraic Riccati equation (DARE) from the fractional system, with high-rank non-linear term G and constant term H, is no longer numerically low-ranked, and the structure-preserving doubling algorithm SDA_h proposed in [18] is not applicable to large-scale problems. In some applications, such as power systems, the state matrix A is banded-plus-low-rank, and in those cases the SDA can be further developed into the factorized structure-preserving doubling algorithm (FSDA) to solve large-scale DAREs with high-rank non-linear and constant terms. Under the assumptions that G and H are positive semidefinite and that D^G and D^H are banded matrices with banded inverses (BMBI), we presented the iterative scheme of the FSDA, as well as the convergence of the banded and low-ranked parts. A deflation process and the technique of PTC were subsequently proposed to efficiently control the growth of the number of columns of the low-rank factors. Numerical experiments demonstrated that the FSDA always reaches the economical pre-terminating condition associated with the banded part before the actual terminating condition related to the low-rank part, yielding good approximate solutions H_k = D_k^H + L_k^H K_k^H (L_k^H)^⊤ and G_k = D_k^G + L_k^G K_k^G (L_k^G)^⊤ to the DARE and its dual, respectively. Moreover, the FSDA is superior to the existing SDA_HODLR in CPU time for large-scale DAREs. For future work, the computation of the stabilizing solution of CAREs might be investigated. This will be more complicated, as the Cayley transformation is incorporated and the selection of the corresponding parameter does not seem easy. In addition, other sparse structures of A with high-rank H and G might be investigated.

Author Contributions

Conceptualization, B.Y.; methodology, B.Y.; software, N.D.; validation, N.D.; and formal analysis, B.Y. All authors have read and agreed to the final version of this manuscript.

Funding

This work was supported in part by the NSF of China (11801163), the NSF of Hunan Province (2021JJ50032, 2023JJ50165), the Foundation of the Education Department of Hunan Province (HNJG-2021-0129), and the Degree & Postgraduate Education Reform Project of Hunan University of Technology and Hunan Province (JG2315, 2023JGYB210).

Acknowledgments

Part of this work was carried out while the first author was visiting Monash University. The authors also thank the editor and the three anonymous referees for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

[Equation display rendered as an image in the published version: explicit expressions of the factors after the first FSDA iteration.]
Matrices L_1^G, L_{11}^A, L_1^H, and L_{21}^A are actually the deflated, truncated, and compressed low-rank factors L_1^{Gdt}, L_{11}^{Adt}, L_1^{Hdt}, and L_{21}^{Adt}, respectively. We omit the superscript “dt” for simplicity of notation.

Appendix B

[Equation displays rendered as images in the published version: explicit expressions of the factors at the (k−1)-st and k-th FSDA iterations.]
Matrices L_{k−1}^G, L_{1,k−1}^A, L_{k−1}^H, and L_{2,k−1}^A are actually the deflated, truncated, and compressed low-rank factors L_{k−1}^{Gdt}, L_{1,k−1}^{Adt}, L_{k−1}^{Hdt}, and L_{2,k−1}^{Adt}, respectively. We omit the superscript “dt” for convenience.

Appendix C. Description for the Deflation of L_3^G

With column-block widths $(m_2^g,\ m_2^{a_1},\ m_2^g,\ m_2^h,\ m_2^{a_2})$ and $N$ rows,
$$ L_3^G=\bigl[\,L_2^G \;\; L_{12}^A \;\; D_2^{AGH}L_2^G \;\; D_2^{AGHG}L_2^H \;\; D_2^{AGHG}L_{22}^A\,\bigr]. $$
Expanding $L_2^G$ in (A2), $L_{12}^A$ in (A3), $L_2^H$ in (A4), and $L_{22}^A$ in (A5) gives
$$
\begin{aligned}
L_3^G=\bigl[\,&D_0^{AGHG}L_{20}^A,\ L_{10}^A,\ D_0^{AGH}L_{10}^A,\ D_1^{AGH}L_{10}^A,\ D_1^{AGH}D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AGHG}L_{10}^A,\ D_1^{AGHG}D_0^{AHG}L_{20}^A, &&\text{(A2)}\\
&L_{10}^A,\ D_0^{AGH}L_{10}^A,\ D_1^{AGH}L_{10}^A,\ D_1^{AGH}D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AGHG}L_{10}^A,\ D_1^{AGH}D_0^{AGH}L_{10}^A, &&\text{(A3)}\\
&D_2^{AGH}L_2^G,\\
&D_2^{AGHG}\bigl(D_0^{AHGH}L_{10}^A,\ L_{20}^A,\ D_0^{AHG}L_{20}^A,\ D_1^{AHG}L_{20}^A,\ D_1^{AHG}D_0^{AHGH}L_{10}^A,\ D_1^{AHGH}L_{10}^A,\ D_1^{AHGH}D_0^{AGHG}L_{20}^A,\ D_1^{AHGH}D_0^{AGH}L_{10}^A\bigr), &&\text{(A4)}\\
&D_2^{AGHG}\bigl(L_{20}^A,\ D_0^{AHG}L_{20}^A,\ D_1^{AHG}L_{20}^A,\ D_1^{AHG}D_0^{AHGH}L_{10}^A,\ D_1^{AHGH}L_{10}^A,\ D_1^{AHGH}D_0^{AGHG}L_{20}^A,\ D_1^{AHG}D_0^{AHG}L_{20}^A\bigr)\,\bigr]. &&\text{(A5)}
\end{aligned}
$$
The deflation step then removes the repeated columns:
$$
L_3^G\ \xrightarrow{\ d\ }\ \bigl[\,\underbrace{D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AHG}L_{20}^A}_{2m^a}\ \big|\ \underbrace{L_{12}^A}_{m_2^{a_1}}\ \big|\ \underbrace{D_2^{AGH}L_2^G}_{m_2^g}\ \big|\ \underbrace{D_2^{AGHG}L_2^H}_{m_2^h}\ \big|\ \underbrace{D_2^{AGHG}D_1^{AHG}D_0^{AHG}L_{20}^A}_{m^a}\,\bigr]=:L_3^{G_d}.
$$
After the previous deflation, the m_2^g − 2m^a columns of L_2^G in (A2) and of L_{12}^A in (A3), and the m_2^{a_2} − m^a columns of D_2^{AGHG} L_2^H in (A4) and of D_2^{AGHG} L_{22}^A in (A5), are identical. One can therefore remove the columns L_2^G(:, m^a+1 : m_2^g − m^a) in (A2) and D_2^{AGHG} L_{22}^A(:, 1 : m_2^{a_2} − m^a) in (A5), while keeping the columns L_{12}^A(:, 1 : m_2^g − 2m^a) in (A3) and D_2^{AGHG} L_2^H(:, m_2^h − m_2^{a_2} + 1 : m_2^h − m^a) in (A4), respectively. Two matrices, each of size N × m^a, are then left in L_2^G, and only one matrix of size N × m^a is left in D_2^{AGHG} L_{22}^A; this is illustrated below.
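The following MATLAB fragment illustrates this column-removal step with generic stand-in factors and illustrative dimensions; only the index ranges mirror the description above.

% Illustration of the deflation indexing (stand-in data, sizes assumed).
N = 200; ma = 2; m2g = 8; m2a2 = 5;                % illustrative dimensions
L2G   = randn(N, m2g);                             % stand-in for L_2^G
DL22A = randn(N, m2a2);                            % stand-in for D_2^{AGHG} L_{22}^A
L2G_d   = L2G(:, [1:ma, m2g-ma+1:m2g]);            % keep first and last ma columns
DL22A_d = DL22A(:, m2a2-ma+1:m2a2);                % keep only the last ma columns
% Two N-by-ma blocks remain from L_2^G and one N-by-ma block from
% D_2^{AGHG} L_{22}^A, as stated above.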
Note that the matrices L_2^G, L_{12}^A, L_2^H, and L_{22}^A are actually the deflated, truncated, and compressed low-rank factors L_2^{Gdt}, L_{12}^{Adt}, L_2^{Hdt}, and L_{22}^{Adt}, respectively.

Appendix D. Description for the Deflation of L_{13}^A

With column-block widths $(m_2^{a_1},\ m_2^g,\ m_2^h,\ m_2^{a_1})$ and $N$ rows,
$$ L_{13}^A=\bigl[\,L_{12}^A \;\; D_2^{AGH}L_2^G \;\; D_2^{AGHG}L_2^H \;\; D_2^{AGH}L_{12}^A\,\bigr]. $$
Expanding $D_2^{AGH}L_2^G$ in (A6) and $D_2^{AGH}L_{12}^A$ in (A7) gives
$$
\begin{aligned}
L_{13}^A=\bigl[\,&L_{12}^A,\\
&D_2^{AGH}\bigl(D_0^{AGHG}L_{20}^A,\ L_{10}^A,\ D_0^{AGH}L_{10}^A,\ D_1^{AGH}L_{10}^A,\ D_1^{AGH}D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AGHG}L_{10}^A,\ D_1^{AGHG}D_0^{AHG}L_{20}^A\bigr), &&\text{(A6)}\\
&D_2^{AGHG}L_2^H,\\
&D_2^{AGH}\bigl(L_{10}^A,\ D_0^{AGH}L_{10}^A,\ D_1^{AGH}L_{10}^A,\ D_1^{AGH}D_0^{AGHG}L_{20}^A,\ D_1^{AGHG}L_{20}^A,\ D_1^{AGHG}D_0^{AGHG}L_{10}^A,\ D_1^{AGH}D_0^{AGH}L_{10}^A\bigr)\,\bigr], &&\text{(A7)}
\end{aligned}
$$
and the deflation step yields
$$
L_{13}^A\ \xrightarrow{\ d\ }\ \bigl[\,\underbrace{L_{12}^A}_{m_2^{a_1}}\ \big|\ \underbrace{D_2^{AGH}L_2^G}_{m_2^g}\ \big|\ \underbrace{D_2^{AGHG}L_2^H}_{m_2^h}\ \big|\ \underbrace{D_2^{AGH}D_1^{AGH}D_0^{AGH}L_{10}^A}_{m^a}\,\bigr]=:L_{13}^{A_d}.\qquad\text{(A8)}
$$
To deflate L_{13}^A, the columns D_2^{AGH} L_{12}^A(:, 1 : m_2^{a_1} − m^a) in (A7) are removed, while the columns D_2^{AGH} L_2^G(:, m_2^g − m_2^{a_1} + 1 : m_2^g − m^a) in (A6) are retained. Only one matrix of size N × m^a is then left in D_2^{AGH} L_{12}^A, namely the last item ∏_{i=2}^{0} D_i^{AGH} L_{10}^A in (A8).
Note that the matrices L_2^G, L_{12}^A, and L_2^H are actually the deflated, truncated, and compressed low-rank factors L_2^{Gdt}, L_{12}^{Adt}, and L_2^{Hdt}, respectively.

Appendix E

Table A1. Complexity and memory requirements at the k-th iteration of the FSDA.

Items | Flops | Memory
Banded part
D_k^{AGH}, D_k^{AHG} * | 4N(2b_{k−1}^{hg} + 1)^2 + b_{k−1}^{hg} b_{k−1}^a | 2N(2b_{k−1}^{hga} + 1)
D_k^G, D_k^H, D_k^A | 4N(2b_{k−1}^g + 1)(2b_{k−1}^{hga} + 1) | 2N(2b_{k−1}^{hga} + 1)
Low-rank part and kernels
D_{k−1}^{AGH} L_{k−1}^G, D_{k−1}^{AGHG} L_{k−1}^H, D_{k−1}^{AGHG} L_{2,k−1}^A | 2N b_{k−1}^{hga}(m_{k−1}^g + m_{k−1}^h + m_{k−1}^a) | (m_{k−1}^g + m_{k−1}^h + m_{k−1}^a)N
D_{k−1}^{AHG} L_{k−1}^H, D_{k−1}^{AHGH} L_{k−1}^G, D_{k−1}^{AHGH} L_{1,k−1}^A | 2N b_{k−1}^{hga}(m_{k−1}^g + m_{k−1}^h + m_{k−1}^a) | (m_{k−1}^g + m_{k−1}^h + m_{k−1}^a)N
Θ_{k−1}^H, Θ_{k−1}^G, Θ_{k−1}^{HG} | 2N(b_{k−1}^{hg}(m_{k−1}^h + m_{k−1}^g) + b_{k−1}^{hg} m_{k−1}^g + (m_{k−1}^h)^2 + (m_{k−1}^g)^2 + m_{k−1}^g m_{k−1}^h) | (m_{k−1}^h)^2 + (m_{k−1}^g)^2 + m_{k−1}^h m_{k−1}^g
Θ_{k−1}^A, Θ_{1,k−1}^A, Θ_{2,k−1}^A | 2N(2b_{k−1}^{hg} m_{k−1}^a + b_{k−1}^{hg} m_{k−1}^a + 3(m_{k−1}^a)^2) | 3(m_{k−1}^a)^2
Θ_{1,k−1}^{AH}, Θ_{1,k−1}^{AG} | 2N(b_{k−1}^{hg}(m_{k−1}^h + m_{k−1}^g) + m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)) | m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)
Θ_{2,k−1}^{AH}, Θ_{2,k−1}^{AG} | 2N(b_{k−1}^{hg}(m_{k−1}^h + m_{k−1}^g) + m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)) | m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)
K_{k−1}^{AGHG} | (m_{k−1}^a)^2(m_{k−1}^h + m_{k−1}^g) + m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)^2 | m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)
K_{k−1}^{AGHGA}, K_{k−1}^{AHGHA}, K_{k−1}^{AGHA} | 6(m_{k−1}^a)^2(2m_{k−1}^a + m_{k−1}^h + m_{k−1}^g) | 3(m_{k−1}^a)^2
K_{k−1}^{AHGH} | 2m_{k−1}^a(m_{k−1}^a + m_{k−1}^h)(m_{k−1}^a + m_{k−1}^h + m_{k−1}^g) | m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)
K_{k−1}^{AGH}, K_{k−1}^{AHG} | 2m_{k−1}^a(m_{k−1}^a + m_{k−1}^h)^2 | 2m_{k−1}^a(m_{k−1}^h + m_{k−1}^g)
K_{k−1}^{GH} *, K_{k−1}^{GHG}, K_{k−1}^{HGH} | 8(m_{k−1}^h + m_{k−1}^g)^3/3 | 3(m_{k−1}^h + m_{k−1}^g)^2
Q_k^G, Q_k^H ** | 4(m_{k−1}^a + m_{k−1}^g + m_{k−1}^h)^2(N − m_{k−1}^a + m_{k−1}^g + m_{k−1}^h) | (r_k^h + r_k^g)N
U_k^G, U_k^H | 4(m_{k−1}^a + m_{k−1}^g + m_{k−1}^h) r_{k−1}^g (N − m_{k−1}^a + m_{k−1}^g + m_{k−1}^h) | (r_k^g + r_k^h) m_{k−1}^{hga}
K_k^{Gdt} | 12(m_{k−1}^a + m_{k−1}^g + m_{k−1}^h)^2 r_{k−1}^g | (m_k^g)^2
K_k^{Hdt} | 12(m_{k−1}^a + m_{k−1}^g + m_{k−1}^h)^2 r_{k−1}^h | (m_k^h)^2
K_k^{Adt} | 6(m_{k−1}^a + m_{k−1}^g + m_{k−1}^h)^2(r_{k−1}^g + r_{k−1}^h) | m_k^g m_k^h
Residual part
(D_0^A)^⊤ D̃_k^{HGH} L_{10}^A, (D_0^A)^⊤ D̃_k^{HG} L_k^H | 2b_k^{hg}(m^a + m_k^h)N | (m_k^h + m^a)N
(L_k^H)^⊤ D̃_k^{GHG} L_k^H | 2b_k^{hg}(m^a + m_k^h)N | (m_k^h)^2
K̃_k^H * | 8(m_k^h)^3/3 | (m_k^h)^2
K̃_k^{AHG} | 2b_k^{hg}(m^a + m_k^h)N | m^a m_k^h
K̃_k^{AHGHA} | 2m^a(b_k^{hg} + m^a)N + 2(m^a)^2 m_k^h | (m^a)^2
Q_k^R ** | 2(m^a + 2m_k^h)^2(N − m^a − 2m_k^h) | r_k^r N
U_k^R | 2(m^a + 2m_k^h) r_k^r (N − m^a − 2m_k^h) | r_k^r(m^a + 2m_k^h)
U_k^R K_k^{Rd} (U_k^R)^⊤ | 2(m^a + 2m_k^h) r_k^r (r_k^r + m^a + 2m_k^h) | (r_k^r)^2
* LU factorization and Gaussian elimination are used [42]. ** Householder QR decomposition is used [12].
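For reference, the compression steps behind Q_k, U_k, and the kernels K_k^{·dt} follow the usual QR-based truncation pattern. The sketch below is a generic MATLAB illustration of such a partial truncation and compression step applied to a single symmetric term L K L^⊤; the function name, interface, and relative tolerance rule are assumptions, not the paper's exact PTC.

function [Ld, Kd] = ptc_sketch(L, K, tol)
%PTC_SKETCH Generic sketch of a partial truncation and compression step:
% compress L*K*L' via a thin Householder QR of L followed by a truncated
% eigendecomposition of the projected kernel (tolerance rule assumed).
[Q, R] = qr(L, 0);               % economy-size Householder QR, cf. [12]
M = R * K * R';
M = (M + M') / 2;                % symmetrize against round-off
[V, D] = eig(M);
[d, p] = sort(abs(diag(D)), 'descend');
r = max(1, nnz(d > tol * d(1))); % retained rank
V = V(:, p(1:r));
Ld = Q * V;                      % compressed low-rank factor
Kd = V' * M * V;                 % compressed kernel
end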

References

1. Nosrati, K.; Shafiee, M. On the convergence and stability of fractional singular Kalman filter and Riccati equation. J. Frankl. Inst. 2020, 357, 7188–7210.
2. Trujillo, J.J.; Ungureanu, V.M. Optimal control of discrete-time linear fractional-order systems with multiplicative noise. Int. J. Control 2018, 91, 57–69.
3. Podlubny, I. Fractional Differential Equations; Academic Press: New York, NY, USA, 1999.
4. Benner, P.; Fassbender, H. The symplectic eigenvalue problem, the butterfly form, the SR algorithm, and the Lanczos method. Linear Algebra Appl. 1998, 275–276, 19–47.
5. Chen, C.-R. A structure-preserving doubling algorithm for solving a class of quadratic matrix equation with M-matrix. Electron. Res. Arch. 2022, 30, 574–581.
6. Chu, E.K.-W.; Fan, H.-Y.; Lin, W.-W. A structure-preserving doubling algorithm for continuous-time algebraic Riccati equations. Linear Algebra Appl. 2005, 396, 55–80.
7. Chu, E.K.-W.; Fan, H.-Y.; Lin, W.-W.; Wang, C.-S. Structure-preserving algorithms for periodic discrete-time algebraic Riccati equations. Int. J. Control 2004, 77, 767–788.
8. Kleinman, D. On an iterative technique for Riccati equation computations. IEEE Trans. Autom. Control 1968, 13, 114–115.
9. Lancaster, P.; Rodman, L. Algebraic Riccati Equations; Clarendon Press: Oxford, UK, 1995.
10. Laub, A.J. A Schur method for solving algebraic Riccati equations. IEEE Trans. Autom. Control 1979, AC-24, 913–921.
11. Li, T.-X.; Chu, D.-L. A structure-preserving algorithm for semi-stabilizing solutions of generalized algebraic Riccati equations. Electron. Trans. Numer. Anal. 2014, 41, 396–419.
12. Mehrmann, V.L. The Autonomous Linear Quadratic Control Problem; Lecture Notes in Control and Information Sciences; Springer: Berlin/Heidelberg, Germany, 1991; Volume 163.
13. Mohammad, I. Fractional polynomial approximations to the solution of fractional Riccati equation. Punjab Univ. J. Math. 2019, 51, 123–141.
14. Tvyordyj, D.A. Hereditary Riccati equation with fractional derivative of variable order. J. Math. Sci. 2021, 253, 564–572.
15. Yu, B.; Li, D.-H.; Dong, N. Low memory and low complexity iterative schemes for a nonsymmetric algebraic Riccati equation arising from transport theory. J. Comput. Appl. Math. 2013, 250, 175–189.
16. Benner, P.; Saak, J. A Galerkin-Newton-ADI method for solving large-scale algebraic Riccati equations. In DFG Priority Programme 1253 “Optimization with Partial Differential Equations”; Preprint SPP1253-090; DFG: Bonn, Germany, 2010.
17. Chu, E.K.-W.; Weng, P.C.-Y. Large-scale discrete-time algebraic Riccati equations—Doubling algorithm and error analysis. J. Comput. Appl. Math. 2015, 277, 115–126.
18. Yu, B.; Fan, H.-Y.; Chu, E.K.-W. Large-scale algebraic Riccati equations with high-rank constant terms. J. Comput. Appl. Math. 2019, 361, 130–143.
19. Martins, N.; Lima, L.; Pinto, H. Computing dominant poles of power system transfer functions. IEEE Trans. Power Syst. 1996, 11, 162–170.
20. Freitas, F.D.; Martins, N.; Varricchio, S.L.; Rommes, J.; Veliz, F.C. Reduced-order transfer matrices from RLC network descriptor models of electric power grids. IEEE Trans. Power Syst. 2011, 26, 1905–1916.
21. Rommes, J.; Martins, N. Efficient computation of multivariable transfer function dominant poles using subspace acceleration. IEEE Trans. Power Syst. 2006, 21, 1471–1483.
22. Dahmen, W.; Micchelli, C.A. Banded matrices with banded inverses, II: Locally finite decomposition of spline spaces. Constr. Approx. 1993, 9, 263–281.
23. Cantero, M.J.; Moral, L.; Velázquez, L. Five-diagonal matrices and zeros of orthogonal polynomials on the unit circle. Linear Algebra Appl. 2003, 362, 29–56.
24. Kimura, H. Generalized Schwarz form and lattice-ladder realizations of digital filters. IEEE Trans. Circuits Syst. 1985, 32, 1130–1139.
25. Kavcic, A.; Moura, J. Matrices with banded inverses: Inversion algorithms and factorization of Gauss–Markov processes. IEEE Trans. Inf. Theory 2000, 46, 1495–1509.
26. Strang, G. Fast transforms: Banded matrices with banded inverses. Proc. Natl. Acad. Sci. USA 2010, 107, 12413–12416.
27. Strang, G. Groups of banded matrices with banded inverses. Proc. Am. Math. Soc. 2011, 139, 4255–4264.
28. Strang, G.; Nguyen, T. Wavelets and Filter Banks; Wellesley-Cambridge Press: Wellesley, MA, USA, 1996.
29. Olshevsky, V.; Zhlobich, P.; Strang, G. Green’s matrices. Linear Algebra Appl. 2010, 432, 218–241.
30. Grasedyck, L.; Hackbusch, W.; Khoromskij, B.N. Solution of large scale algebraic matrix Riccati equations by use of hierarchical matrices. Computing 2003, 70, 121–165.
31. Kressner, D.; Kürschner, P.; Massei, S. Low-rank updates and divide-and-conquer methods for quadratic matrix equations. Numer. Algorithms 2020, 84, 717–741.
32. Benner, P.; Saak, J. A semi-discretized heat transfer model for optimal cooling of steel profiles. In Dimension Reduction of Large-Scale Systems; Benner, P., Sorensen, D.C., Mehrmann, V., Eds.; Lecture Notes in Computational Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2005; Volume 45.
33. Korvink, J.G.; Rudnyi, E.B. Oberwolfach benchmark collection. In Dimension Reduction of Large-Scale Systems; Benner, P., Sorensen, D.C., Mehrmann, V., Eds.; Lecture Notes in Computational Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2005; Volume 45.
34. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 1996.
35. Huang, T.-M.; Li, R.-C.; Lin, W.-W. Structure-Preserving Doubling Algorithms for Nonlinear Matrix Equations; SIAM: Philadelphia, PA, USA, 2018.
36. Lin, W.-W.; Xu, S.-F. Convergence analysis of structure-preserving doubling algorithms for Riccati-type matrix equations. SIAM J. Matrix Anal. Appl. 2006, 28, 26–39.
37. Demko, S. Inverses of band matrices and local convergence of spline projections. SIAM J. Numer. Anal. 1977, 14, 616–619.
38. Mathworks. MATLAB User’s Guide; Mathworks: Natick, MA, USA, 2010.
39. Massei, S.; Palitta, D.; Robol, L. Solving rank-structured Sylvester and Lyapunov equations. SIAM J. Matrix Anal. Appl. 2018, 39, 1564–1590.
40. Massei, S.; Robol, L.; Kressner, D. hm-toolbox: MATLAB software for HODLR and HSS matrices. SIAM J. Sci. Comput. 2020, 42, C43–C68.
41. Watson, N.; Arrillaga, J. Power Systems Electromagnetic Transients Simulation; IET Digital Library: London, UK, 2003.
42. Arbenz, P.; Gander, W. A Survey of Direct Parallel Algorithms for Banded Linear Systems; Tech. Report 221; Departement Informatik, Institut für Wissenschaftliches Rechnen, ETH Zürich: Zurich, Switzerland, 1994.
Figure 1. The deflation process of K_2^G (or K_2^H).
Figure 2. The deflation process of K_2^A.
Figure 3. The deflation process of K_k^G (or K_k^H).
Figure 4. The deflation process of K_k^A.
Figure 5. Structured matrix A_ori of size 528 × 528 in Example 2.
Figure 6. Residual of the banded part and the low-rank part for different ξ.
Figure 7. Structured matrix A_ori of order 66 × 66 (1194 non-zeros) in Example 3.
Table 1. Residual and actual errors in Example 1.
ζ = 1.2, η = 2.0
N 1000 3000 5000 7000
FSDA
4.39 × 10^−1 4.41 × 10^−1 4.42 × 10^−1 4.42 × 10^−1
3.47 × 10^−2 3.48 × 10^−2 3.49 × 10^−2 3.49 × 10^−2
B_RRes 1.38 × 10^−4 1.38 × 10^−4 1.38 × 10^−4 1.38 × 10^−4
2.10 × 10^−9 2.11 × 10^−9 2.11 × 10^−9 2.11 × 10^−9
4.25 × 10^−16 4.27 × 10^−16 4.27 × 10^−16 4.31 × 10^−16
LR_RRes 2.09 × 10^−18 2.27 × 10^−18 4.04 × 10^−18 3.28 × 10^−18
r̃_k 4.27 × 10^−16 4.29 × 10^−16 4.31 × 10^−16 4.34 × 10^−16
‖H_5 − X_s‖/‖X_s‖ 2.56 × 10^−16 2.57 × 10^−16 2.56 × 10^−16 2.48 × 10^−16
CPU 0.04 0.09 0.22 0.48
SDA_HODLR
4.44 × 10^−1 4.44 × 10^−1 4.44 × 10^−1 4.44 × 10^−1
3.50 × 10^−2 3.50 × 10^−2 3.50 × 10^−2 3.50 × 10^−2
r̂_k 1.39 × 10^−4 1.39 × 10^−4 1.39 × 10^−4 1.39 × 10^−4
2.12 × 10^−9 2.12 × 10^−9 2.12 × 10^−9 2.12 × 10^−9
1.33 × 10^−15 1.27 × 10^−15 1.34 × 10^−15 1.47 × 10^−15
CPU 1.17 19.93 76.67 186.61
ζ = 1.0, η = 1.2
N 1000 3000 5000 7000
FSDA
8.68 × 10^−1 8.84 × 10^−1 8.89 × 10^−1 8.92 × 10^−1
6.06 × 10^−1 6.18 × 10^−1 6.21 × 10^−1 6.23 × 10^−1
1.93 × 10^−1 1.97 × 10^−1 1.98 × 10^−1 1.99 × 10^−1
B_RRes 1.15 × 10^−2 1.18 × 10^−2 1.18 × 10^−2 1.19 × 10^−2
3.40 × 10^−5 3.47 × 10^−5 3.49 × 10^−5 3.50 × 10^−5
2.91 × 10^−10 2.97 × 10^−10 2.99 × 10^−10 3.00 × 10^−10
8.22 × 10^−16 8.38 × 10^−16 8.43 × 10^−16 8.46 × 10^−16
LR_RRes 3.03 × 10^−17 1.07 × 10^−17 2.77 × 10^−17 1.75 × 10^−17
r̃_k 8.52 × 10^−16 8.48 × 10^−16 8.70 × 10^−16 8.63 × 10^−16
‖H_7 − X_s‖/‖X_s‖ 4.23 × 10^−15 5.04 × 10^−15 4.94 × 10^−15 4.98 × 10^−15
CPU 0.31 0.45 0.48 0.96
SDA_HODLR
9.08 × 10^−1 9.08 × 10^−1 9.08 × 10^−1 9.08 × 10^−1
6.34 × 10^−1 6.34 × 10^−1 6.34 × 10^−1 6.34 × 10^−1
2.02 × 10^−1 2.02 × 10^−1 2.02 × 10^−1 2.02 × 10^−1
r̂_k 1.21 × 10^−2 1.21 × 10^−2 1.21 × 10^−2 1.21 × 10^−2
3.56 × 10^−5 3.56 × 10^−5 3.56 × 10^−5 3.56 × 10^−5
3.05 × 10^−10 3.05 × 10^−10 3.05 × 10^−10 3.05 × 10^−10
4.75 × 10^−15 4.62 × 10^−15 4.97 × 10^−15 5.52 × 10^−15
CPU 1.61 27.10 107.16 263.34
Table 2. CPU times and history of bandwidths of banded matrices and column numbers of low-rank factors in Example 2.
1 2 3 4 5
[ b k g b k h b k a m k h m k g ] [ b k g b k h b k a m k h m k g ] [ b k g b k h b k a m k h m k g ] [ b k g b k h b k a m k h m k g ] [ b k g b k h b k a m k h m k g ]
[445 445 445 34 34][445 445 445 34 34][445 445 445 34 34][ 445 445 445 34 34][445 445 445 34 34]
[979 980 981 126 132][982 982 1042 126 132][973 767 973 126 132][1047 1033 1051 126 132][998 998 997 126 132]
ξ = 0.11 [981 980 980 474 484][981 980 980 474 481][973 767 748 480 492][1050 1047 1049 468 495][998 999 973 474 488]
[981 980 768 1012 1020][981 980 768 1014 1018][973 767 674 1025 1032][1050 1042 1047 1096 1028][981 980 768 1011 1023]
[981 980 519 1758 1767][981 980 522 1759 1771][973 767 493 1801 1812][1050 1042 983 1946 1853][981 980 525 1762 1773]
CPU 4443.63 4451.36 4456.96 4414.65 4457.14
[445 445 445 34 34][445 445 445 34 34][445 445 445 34 34][ 445 445 445 34 34][445 445 445 34 34]
[973 767 973 126 132][768 973 769 126 132][815 973 973 126 132][1033 996 1042 126 132][745 745 748 126 132]
ξ = 1 [973 973 973 471 476][767 973 766 469 476][815 973 768 477 487][1042 1042 1042 479 490][753 980 732 474 488]
[973 973 646 911 927][767 973 555 910 916][815 973 646 1007 1027][1042 1042 840 973 980][753 980 684 923 931]
CPU 4014.65 4025.74 3993.86 4107.84 4020.12
[445 445 445 34 34][445 445 445 34 34][445 445 445 34 34][445 445 445 34 34][445 445 445 34 34]
ξ = 3.0 [652 654 674 126 132][746 746 746 126 132][695 673 675 126 132][674 686 685 126 132][701 703 686 126 132]
[652 654 519 448 453][746 746 650 466 475][695 673 614 449 454][674 686 658 447 454][701 703 651 448 455]
CPU 1797.39 1640.02 1803.23 1748.16 1695.01
Table 3. Numerical results for FSDA and SDA_HODLR in Example 2 at ξ = 3.0. The symbol * stands for no related records.
N 15,840 21,120 26,400 31,680
FSDA SDA_HD FSDA SDA_HD FSDA SDA_HD FSDA SDA_HD
b k g [445 695 695]*[445 736 736]*[445 723 723]*[445 652 652]*
b k h [445 673 673]*[445 745 745]*[445 737 737]*[445 654 654]*
b k a [445 675 614]*[445 745 674]*[445 738 653]*[445 674 619]*
m k h [34 126 448]*[34 126 469]*[34 126 460]*[34 126 444]*
m k g [34 132 453]*[34 132 476]*[34 132 469]*[34 132 454]*
IT. 3 3 3 3 3 3 3 —
RES. 7.83 × 10^−17 1.44 × 10^−15 7.27 × 10^−17 1.70 × 10^−15 8.04 × 10^−17 1.74 × 10^−15 5.96 × 10^−15 —
CPU 6740.54 1285.31 13,037.43 3701.43 18,154.14 17,653.63 21,618.03 —
Table 4. Numerical results for FSDA and SDA_HODLR for the relatively simpler banded part of Example 2 at ξ = 3.0. The symbol * stands for no related records.
N 15,840 21,120 26,400 31,680
FSDA SDA_HD FSDA SDA_HD FSDA SDA_HD FSDA SDA_HD
b k g [31 31 31]*[36 37 37]*[38 39 39]*[34 37 37]*
b k h [28 30 30]*[34 36 36]*[38 40 40]*[36 39 39]*
b k a [28 31 28]*[36 38 34]*[38 42 38]*[34 40 35]*
m k h [48 280 628]*[48 286 647]*[48 287 651]*[48 285 647]*
m k g [48 281 628]*[48 285 645]*[48 287 650]*[48 287 650]*
IT. 3 3 3 3 3 3 3 —
RES. 5.95 × 10^−17 2.09 × 10^−15 3.46 × 10^−16 1.81 × 10^−15 8.49 × 10^−16 3.05 × 10^−15 9.93 × 10^−17 —
CPU 133.71 1255.06 218.07 3744.06 288.53 15,508.81 344.18 —
Table 5. Residuals, column numbers of low-rank factors, and CPU times at ξ = 95 in Example 3.
k BRes B_RRes LRes LR_RRes r̃_k [b_k^g b_k^h b_k^a m_k^h m_k^g] t_k
N = 13,200, ξ = 95, τ_g = τ_h = 10^−16, m_max = 2000
1 1.02 × 10 3 1.42 × 10 2 1.04 × 10 3 * 1.25 × 10 2 * 2.67 × 10 2 * [ 6 6 6 29 29 ] 1.03
2 2.33 × 10 0 3.25 × 10 5 1.40 × 10 1 * 2.06 × 10 6 * 3.45 × 10 5 * [ 6 6 6 66 66 ] 5.01
3 6.19 × 10 3 8.64 × 10 8 2.94 × 10 3 * 4.33 × 10 8 * 1.30 × 10 7 * [ 6 6 6 76 76 ] 79.13
4 1.37 × 10 7 2.02 × 10 12 6.28 × 10 6 7.76 × 10 11 7.96 × 10 11 [ 6 6 6 100 101 ] 158.63
5 2.27 × 10 9 3.30 × 10 14 4.31 × 10 11 6.33 × 10 16 3.38 × 10 14 [ 6 6 6 169 170 ] 246.55
N = 26,400, ξ = 95, τ_g = τ_h = 10^−16, m_max = 2000
1 8.31 × 10 2 8.64 × 10 3 1.61 × 10 3 * 1.81 × 10 2 * 2.67 × 10 2 * [ 6 6 6 29 29 ] 3.58
2 2.95 × 10 0 3.07 × 10 5 1.40 × 10 1 * 1.46 × 10 5 * 3.21 × 10 5 * [ 6 6 6 66 66 ] 13.92
3 4.91 × 10 3 5.11 × 10 8 2.94 × 10 3 * 3.06 × 10 8 * 8.07 × 10 8 * [ 6 6 6 75 76 ] 534.56
4 1.94 × 10 7 2.02 × 10 12 5.28 × 10 6 5.49 × 10 11 5.69 × 10 11 [ 6 6 6 97 98 ] 1085.76
5 3.21 × 10 9 3.30 × 10 14 4.81 × 10 11 8.00 × 10 16 3.39 × 10 14 [ 6 6 6 160 161 ] 1675.01
N = 39,600, ξ = 95, τ_g = τ_h = 10^−16, m_max = 2000
1 1.01 × 10 3 8.64 × 10 3 1.62 × 10 3 * 1.81 × 10 2 * 2.67 × 10 2 * [ 6 6 6 29 29 ] 7.93
2 3.61 × 10 0 3.07 × 10 5 1.40 × 10 1 * 1.19 × 10 6 * 3.19 × 10 5 * [ 6 6 6 66 66 ] 33.41
3 6.02 × 10 3 5.11 × 10 8 2.94 × 10 3 * 2.50 × 10 8 * 7.62 × 10 8 * [ 6 6 6 76 77 ] 605.64
4 2.37 × 10 7 2.02 × 10 12 5.28 × 10 6 4.48 × 10 11 4.68 × 10 11 [ 6 6 6 100 102 ] 1210.54
5 3.94 × 10 9 3.30 × 10 14 5.22 × 10 11 4.43 × 10 16 3.39 × 10 14 [ 6 6 6 170 172 ] 1923.38
Table 6. Residuals, column numbers of low-rank factors, and CPU times at ξ = 90 in Example 3.
k BRes B_RRes LRes LR_RRes r̃_k [b_k^g b_k^h b_k^a m_k^h m_k^g] t_k
N = 13,200, ξ = 90, τ_g = τ_h = 10^−16, m_max = 2000
1 1.02 × 10 3 1.42 × 10 2 1.59 × 10 3 * 2.12 × 10 2 * 3.35 × 10 2 * [ 6 6 6 29 29 ] 1.05
2 2.33 × 10 0 3.25 × 10 5 3.04 × 10 1 * 4.24 × 10 6 * 3.68 × 10 5 * [ 6 6 6 66 66 ] 5.03
3 6.19 × 10 3 8.64 × 10 8 4.38 × 10 2 * 6.11 × 10 7 * 6.99 × 10 7 * [ 6 6 6 76 76 ] 82.23
4 6.09 × 10 7 8.49 × 10 12 3.45 × 10 3 4.81 × 10 8 4.81 × 10 8 [ 6 6 6 100 101 ] 162.04
5 1.00 × 10 9 1.40 × 10 14 9.49 × 10 4 1.32 × 10 8 1.32 × 10 8 [ 6 6 6 169 170 ] 248.86
6 1.00 × 10 9 1.40 × 10 14 1.01 × 10 5 1.41 × 10 10 1.41 × 10 10 [ 6 6 6 225 256 ] 355.07
7 1.00 × 10 9 1.40 × 10 14 9.45 × 10 10 1.31 × 10 14 2.72 × 10 14 [ 6 6 6 225 256 ] 449.81
N = 26,400, ξ = 90, τ_g = τ_h = 10^−16, m_max = 2000
1 1.44 × 10 3 1.42 × 10 2 1.61 × 10 3 * 2.21 × 10 2 * 3.63 × 10 2 * [ 6 6 6 29 29 ] 3.89
2 3.29 × 10 0 3.25 × 10 5 3.04 × 10 1 * 3.00 × 10 6 * 3.55 × 10 5 * [ 6 6 6 66 66 ] 14.24
3 8.76 × 10 3 8.64 × 10 8 4.38 × 10 2 * 4.32 × 10 7 * 5.19 × 10 7 * [ 6 6 6 76 76 ] 554.22
4 8.61 × 10 7 8.49 × 10 12 3.45 × 10 3 3.40 × 10 8 3.40 × 10 8 [ 6 6 6 100 101 ] 1100.79
5 1.42 × 10 9 1.40 × 10 14 9.49 × 10 4 9.35 × 10 9 9.35 × 10 9 [ 6 6 6 169 170 ] 1667.77
6 1.42 × 10 9 1.40 × 10 14 1.01 × 10 5 1.00 × 10 10 1.00 × 10 10 [ 6 6 6 210 234 ] 2286.67
7 1.42 × 10 9 1.40 × 10 14 9.46 × 10 10 9.33 × 10 15 2.33 × 10 14 [ 6 6 6 210 234 ] 2924.54
N = 39,600, ξ = 90, τ_g = τ_h = 10^−16, m_max = 2000
1 1.76 × 10 3 1.42 × 10 2 1.61 × 10 3 * 2.21 × 10 2 * 3.63 × 10 2 * [ 6 6 6 29 29 ] 7.49
2 4.03 × 10 0 3.25 × 10 5 3.04 × 10 1 * 2.45 × 10 6 * 3.49 × 10 5 * [ 6 6 6 66 66 ] 28.02
3 1.07 × 10 2 8.64 × 10 8 4.38 × 10 2 * 3.53 × 10 7 * 4.39 × 10 7 * [ 6 6 6 76 76 ] 564.66
4 1.05 × 10 6 8.49 × 10 12 3.45 × 10 3 2.78 × 10 8 2.78 × 10 8 [ 6 6 6 100 101 ] 1206.85
5 1.74 × 10 9 1.40 × 10 14 9.49 × 10 4 7.64 × 10 9 7.64 × 10 9 [ 6 6 6 169 170 ] 1929.52
6 1.74 × 10 9 1.40 × 10 14 1.01 × 10 5 8.19 × 10 11 8.19 × 10 11 [ 6 6 5 209 234 ] 3553.12
7 1.74 × 10 9 1.40 × 10 14 9.48 × 10 10 7.63 × 10 15 2.17 × 10 14 [ 6 6 0 209 234 ] 5806.44
Table 7. Numerical results for FSDA and SDA_HODLR in Example 3. The symbol * stands for no related records.
N 13,200 26,400 39,600
FSDA SDA_HD FSDA SDA_HD FSDA SDA_HD
b k g [6 6 6 6 ]*[6 6 6 6 ]*[6 6 6 6 ]*
b k h [6 6 6 6 ]*[6 6 6 6 ]*[6 6 6 6 ]*
b k a [6 6 6 6 ]*[6 6 6 6 ]*[6 6 6 6 ]*
m k h [29 66 77 101]*[29 66 77 102]*[29 66 77 102]*
ξ = 98 m k g [29 66 77 102]*[29 66 77 103]*[29 66 77 102]*
IT. 4 5 4 5 4 —
RES. 8.01 × 10^−12 1.64 × 10^−12 7.42 × 10^−12 1.50 × 10^−14 6.22 × 10^−12 —
CPU 162.18 1130.93 1148.34 18,832.71 1246.78 —
b k g [6 6 6]*[6 6 6]*[6 6 6]*
b k h [6 6 6]*[6 6 6]*[6 6 6]*
b k a [6 6 6]*[6 6 6]*[6 6 6]*
m k h [29 66 69]*[29 66 69]*[29 66 69]*
ξ = 250 m k g [29 66 73]*[29 66 71]*[29 66 73]*
IT. 3 3 3 3 3 —
RES. 1.75 × 10^−12 1.73 × 10^−12 2.54 × 10^−12 1.74 × 10^−12 3.62 × 10^−12 —
CPU 80.96 655.69 536.76 15,322.53 634.70 —