Abstract
In this paper, we investigate $\mathcal{H}_2$-optimal model reduction methods for discrete-time linear time-invariant systems. Similar to the continuous-time case, we formulate this problem as an optimization problem over a Grassmann manifold. We consider constructing reduced systems by both one-sided and two-sided projections. For one-sided projection, by utilizing the geometry of the Grassmann manifold, we propose a gradient flow method and a sequentially quadratic approximation approach to solve the optimization problem. For two-sided projection, we apply the strategies of alternating direction iteration and sequentially quadratic approximation to the minimization problem and develop a numerically efficient method. One main advantage of these methods, based on the formulation of optimization over a Grassmann manifold, is that stability can be preserved in the reduced system. Several numerical examples are provided to illustrate the effectiveness of the methods proposed in this paper.
MSC:
65F99; 93C05
1. Introduction
Model reduction, which is often employed in approximating very large-scale, complex dynamical systems, has received considerable attention in the past several decades. Such systems arise in various applications, including fluid dynamics [1], structural dynamics [2], electrical circuits [3], aerodynamics [4], micro-electro-mechanical systems [5], weather prediction [6], and so on. Roughly speaking, the goal of model reduction is to replace a given mathematical model, described via differential equations or difference equations, with a model of the same form but with a much smaller state space dimension than the original one, such that the reduced model still captures, at least approximately, certain essential aspects of the original system.
In this paper we consider the model reduction of a discrete-time linear time-invariant (LTI) system described by the following system of difference equations
where the matrices $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $C \in \mathbb{R}^{p \times n}$ represent the state matrix, the input matrix, and the output matrix, respectively. The vectors $x(k) \in \mathbb{R}^{n}$, $u(k) \in \mathbb{R}^{m}$, and $y(k) \in \mathbb{R}^{p}$ are the state vector, the input vector, and the output vector of the discrete-time LTI system at time $k$, respectively. For simplicity, we assume a zero initial condition $x(0) = 0$ in (1). Discrete-time systems arise naturally when continuous-time systems are discretized by numerical approximation of differentiation. Moreover, they appear directly in important applications such as economics [7] and transportation networks [8]. For a comprehensive treatment of discrete systems, including system characterization, structural properties, stability, optimal control, and applications, we refer to the book [9].
In this paper we aim at constructing a reduced-order model $\hat{\Sigma}$ of $\Sigma$ via projection. Specifically, we seek two matrices $W, V \in \mathbb{R}^{n \times r}$ with $r \ll n$, such that a reduced system $\hat{\Sigma}$ is given by
with $\hat{A} = W^{T} A V \in \mathbb{R}^{r \times r}$, $\hat{B} = W^{T} B \in \mathbb{R}^{r \times m}$, and $\hat{C} = C V \in \mathbb{R}^{p \times r}$.
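As a concrete illustration, the projection step above can be sketched as follows. This is a minimal numpy sketch on a random placeholder system (all dimensions and matrices are hypothetical, not from the paper's examples); the last line of the biorthogonalization rescales W so that $W^{T}V = I_r$ holds.

```python
import numpy as np

# Hypothetical small dimensions and random placeholder system matrices.
rng = np.random.default_rng(0)
n, m, p, r = 8, 2, 2, 3
A = 0.5 * rng.standard_normal((n, n))   # state matrix (placeholder)
B = rng.standard_normal((n, m))         # input matrix
C = rng.standard_normal((p, n))         # output matrix

# Projection matrices with the biorthogonality condition W^T V = I_r.
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = W @ np.linalg.inv(W.T @ V).T        # rescale so that W^T V = I_r

A_hat = W.T @ A @ V                     # r x r reduced state matrix
B_hat = W.T @ B                         # r x m
C_hat = C @ V                           # p x r
```

For one-sided projection, one simply takes $W = V = U$ with $U^{T}U = I_r$, so no rescaling step is needed.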
Much research has addressed the construction of the reduced-order model. For continuous-time LTI systems, there exist various model reduction methods. For example, classical model reduction methods, which are based on system Gramians, have been well established, including the balanced truncation method [10], the optimal Hankel norm approximation method [11], and the singular perturbation approximation method [12]. Another class of projection methods comprises the Krylov subspace [13] and rational Krylov subspace [14] methods, which have become increasingly popular for large-scale linear dynamical systems such as those arising from structural dynamics, circuit simulations, and micro-electro-mechanical systems (see [15,16,17,18,19]). Rational Krylov subspace methods have been further improved to develop an iterative rational Krylov subspace algorithm [20], in which the interpolation points are iteratively updated so that the established reduced-order model satisfies the interpolation-based first-order necessary conditions for $\mathcal{H}_2$-optimal model reduction of single-input single-output continuous-time systems. In [21,22], the iterative rational Krylov algorithm (IRKA) is extended to derive a tangential rational interpolation framework for model reduction of multi-input multi-output (MIMO) systems. $\mathcal{H}_2$-optimal model reduction can also be formulated as an optimization problem over a Grassmann manifold. Based on this fact and by making use of one-sided projection, Xu and Zeng [23] proposed a fast gradient flow algorithm and a sequentially quadratic approximation for solving the optimization problem to construct stable reduced systems. The model reduction method based on Grassmann manifold optimization is further improved in [24] by employing two-sided projection. Model order reduction on Grassmann manifolds has also been considered for other classes of systems. In [25], model order reduction on Grassmann manifolds has been extended to a special class of linear parameter-varying systems.
Xu et al. [26] proposed a preserving–periodic Riemannian descent model reduction iterative method for linear discrete-time periodic systems. In [27], a parametric interpolation parallel MOR method for discrete-time parametric systems is proposed by making use of Grassmann manifolds and discrete Laguerre polynomials. In [28], order reduction based on Grassmann manifolds is extended to bilinear systems. Otto et al. [29] studied model order reduction over the product of two Grassmann manifolds for nonlinear systems. Padovan et al. [30] considered data-driven model reduction for nonlinear systems by solving an optimization problem over the product of two manifolds. In [31], a novel differential geometric framework for model reduction on smooth manifolds is proposed. This general framework can capture and generalize several existing MOR techniques, such as preserving the structures for Lagrangian or Hamiltonian dynamics. Zimmermann [32] reviewed matrix manifolds and outlined the principal approaches to data interpolation and Taylor-like extrapolation on matrix manifolds. For the reduced-order modeling of high-dimensional dynamical systems, Sashittal and Bodony [33] proposed low-rank dynamic mode decomposition by solving a matrix manifold optimization problem with a rank constraint on the solution.
Some researchers have considered $\mathcal{H}_2$-optimal model reduction for discrete-time LTI systems. For example, Van Dooren, Gallivan, and Absil [22] derived two first-order necessary conditions, the discrete-time counterparts of Wilson’s conditions [34] and tangential rational interpolation conditions for $\mathcal{H}_2$-optimal model reduction. In [35], the authors gave the discrete-time counterparts of the Hyland–Bernstein Gramian-based first-order necessary $\mathcal{H}_2$-optimality conditions. They showed that Wilson’s conditions, the Hyland–Bernstein conditions, and the tangential interpolation-based conditions are equivalent. Moreover, based on the tangential interpolation-based conditions, a MIMO iterative rational interpolation algorithm is developed for model reduction of discrete-time MIMO LTI systems.
In this paper, we consider constructing reduced systems by both one-sided and two-sided projection. Similar to the continuous-time case, we formulate the $\mathcal{H}_2$-optimal model reduction problems for discrete-time LTI systems as optimization problems over Grassmann manifolds. Based on this formulation, in the one-sided projection case, we apply the gradient flow method and the sequentially quadratic approximation method to solve the related optimization problem. For two-sided projection, we combine the techniques of alternating direction iteration and sequentially quadratic approximation to develop an iterative method for the Grassmann manifold optimization problem. Numerical experiments demonstrate the effectiveness of model reduction methods based on Grassmann manifold optimization for discrete-time LTI systems.
Throughout this paper the following notation is used: The sets of all real and complex $m \times n$ matrices are denoted by $\mathbb{R}^{m \times n}$ and $\mathbb{C}^{m \times n}$, respectively. The identity matrix of dimension n is denoted by $I_n$ and the zero matrix by 0. If the dimension of $I_n$ is apparent from the context, we drop the index and simply use I. The actual dimension of 0 will always be apparent from the context. The superscripts T and H denote the transpose and the complex conjugate transpose of a vector or a matrix, respectively. The notation $\mathrm{span}(V)$ denotes the space spanned by the column vectors of the matrix V, and $\mathrm{span}\{V_1, V_2, \ldots, V_k\}$ denotes the space spanned by the matrix sequence $V_1, V_2, \ldots, V_k$. For two matrices $U, V \in \mathbb{R}^{m \times n}$, the inner product of U and V is defined as $\langle U, V \rangle = \mathrm{tr}(U^{T} V)$, where $\mathrm{tr}(\cdot)$ is the trace of a matrix. The notation $\|\cdot\|_F$ denotes the Frobenius matrix norm, which is defined by $\|V\|_F = \sqrt{\langle V, V \rangle}$.
The main contributions of this paper include the following:
- The $\mathcal{H}_2$-optimal model reduction methods based on Grassmann manifold optimization are extended to discrete-time LTI systems.
- For one-sided projection, a gradient flow method and a sequentially quadratic approximation approach are proposed to solve the optimization problem. For two-sided projection, the optimization problem is solved by applying the strategies of alternating direction iteration and sequentially quadratic approximation.
- We present the details of implementation, such as how to efficiently solve sparse–dense discrete-time Sylvester equations.
- The effectiveness of the proposed methods in this paper is demonstrated with two numerical examples.
The remainder of this paper is organized as follows: Section 2 is devoted to a short review of some important results on the $\mathcal{H}_2$-norm of discrete-time LTI systems and Stiefel and Grassmann manifolds. Moreover, we also outline the framework of the gradient flow method for solving optimization problems over Grassmann manifolds. In Section 3, we consider the one-sided projection case. Two iterative methods, one relying on the gradient flow and the other on sequentially quadratic approximation, are proposed. We consider two-sided projection in Section 4. An iterative method based on the combination of alternating direction iteration and sequentially quadratic approximation is given. In Section 5, some issues related to the implementation of these algorithms are investigated, including the initial projection matrix selection and the termination criterion. We also discuss how to solve Stein equations, which appear in the methods proposed in this paper. In Section 6, several numerical examples are presented to illustrate the effectiveness of these model reduction methods. Finally, some conclusions are drawn in Section 7.
2. Preliminaries
In this section we review some important results on the $\mathcal{H}_2$-norm of discrete-time LTI systems and the Grassmann manifold. Moreover, we also outline the framework of the gradient flow method for solving optimization problems over Grassmann manifolds. Much of the material in this section is standard and can be found in [35,36,37,38].
2.1. $\mathcal{H}_2$-Norm of Discrete-Time LTI Systems
For a vector sequence $\{f(k)\}_{k=0}^{\infty}$, $f(k) \in \mathbb{R}^{m}$, its z-transform is defined by

$F(z) = \mathcal{Z}\{f(k)\} = \sum_{k=0}^{\infty} f(k) z^{-k}.$
By applying the z-transform to the system $\Sigma$, we obtain, in the frequency domain, the following input–output relation:

$Y(z) = H(z) U(z),$

where $U(z)$ and $Y(z)$ are the z-transforms of the input and output, respectively, and

$H(z) = C (zI - A)^{-1} B$

is called the transfer function of the system $\Sigma$.
In this paper, we assume that the discrete-time system $\Sigma$ is asymptotically stable, i.e., all the eigenvalues of A lie inside the unit circle. In this case, the controllability and observability Gramians of the discrete-time system $\Sigma$ are well defined and formulated as

$P = \sum_{k=0}^{\infty} A^{k} B B^{T} (A^{T})^{k}, \qquad Q = \sum_{k=0}^{\infty} (A^{T})^{k} C^{T} C A^{k}.$
These Gramians also satisfy the Stein equations, also called the discrete Lyapunov equations:

$A P A^{T} - P + B B^{T} = 0, \qquad A^{T} Q A - Q + C^{T} C = 0.$
For an asymptotically stable discrete-time LTI system $\Sigma$, its squared $\mathcal{H}_2$-norm is defined by

$\|\Sigma\|_{\mathcal{H}_2}^{2} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \mathrm{tr}\!\left( H(e^{\jmath\theta})^{H} H(e^{\jmath\theta}) \right) d\theta.$
Using the system Gramians P and Q, the $\mathcal{H}_2$-norm can be calculated via the following equation:

$\|\Sigma\|_{\mathcal{H}_2}^{2} = \mathrm{tr}(C P C^{T}) = \mathrm{tr}(B^{T} Q B).$
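For small systems, the Gramian-based formulas above can be checked numerically: solving each Stein equation by Kronecker vectorization (feasible only for small n, unlike the large-scale solvers discussed in Section 5) and comparing the two trace expressions. A minimal numpy sketch on a random stable system:

```python
import numpy as np

def stein(A, M):
    """Solve X - A X A^T = M by Kronecker vectorization (small n only).
    With row-major vec, vec(A X A^T) = (A kron A) vec(X)."""
    n = A.shape[0]
    K = np.eye(n * n) - np.kron(A, A)
    return np.linalg.solve(K, M.reshape(-1)).reshape(n, n)

rng = np.random.default_rng(0)
n, m, p = 6, 2, 2
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # make A asymptotically stable
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

P = stein(A, B @ B.T)        # controllability Gramian: A P A^T - P + B B^T = 0
Q = stein(A.T, C.T @ C)      # observability Gramian:  A^T Q A - Q + C^T C = 0

h2_sq = np.trace(C @ P @ C.T)   # squared H2 norm; equals trace(B^T Q B)
```

The agreement of the two traces is exactly the identity displayed above.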
2.2. Stiefel and Grassmann Manifolds
In this subsection, we briefly review the definitions of the Stiefel and Grassmann manifolds and some basic results on these two manifolds.
Definition 1.
The real Stiefel manifold $\mathrm{St}(n, r)$ with $r \le n$ is defined as the set of all $n \times r$ real matrices with orthonormal columns, i.e.,

$\mathrm{St}(n, r) = \{ U \in \mathbb{R}^{n \times r} : U^{T} U = I_r \}.$
Definition 2.
Two matrices in the real Stiefel manifold $\mathrm{St}(n, r)$ are defined to be equivalent if their columns span the same r-dimensional subspace.
From Definition 2, $U_1, U_2 \in \mathrm{St}(n, r)$ are equivalent if and only if there exists an orthogonal matrix $Z \in \mathbb{R}^{r \times r}$ such that $U_1 = U_2 Z$. So, we can define the equivalence class of a point $U \in \mathrm{St}(n, r)$ to be

$[U] = \{ U Z : Z \in \mathcal{O}_r \},$

where $\mathcal{O}_r$ denotes the set of all orthogonal matrices in $\mathbb{R}^{r \times r}$.
Definition 3.
The real Grassmann manifold $\mathrm{Gr}(n, r)$ with $r \le n$ is defined as the set of all r-dimensional real linear subspaces of $\mathbb{R}^{n}$.
From the above definition, we know that a point in the real Grassmann manifold $\mathrm{Gr}(n, r)$ is an r-dimensional real linear subspace, which can be spanned by the columns of a matrix in $\mathrm{St}(n, r)$. So, there is a one-to-one correspondence between a point in the real Grassmann manifold and an equivalence class of $\mathrm{St}(n, r)$. Thus, the Grassmann manifold can be seen as a quotient manifold, i.e.,

$\mathrm{Gr}(n, r) = \mathrm{St}(n, r) / \mathcal{O}_r.$
Definition 4.
For a Stiefel manifold $\mathrm{St}(n, r)$, the tangent space at $U \in \mathrm{St}(n, r)$ is defined as

$T_U \mathrm{St}(n, r) = \{ \Delta \in \mathbb{R}^{n \times r} : U^{T} \Delta + \Delta^{T} U = 0 \}.$
Correspondingly, we have a definition of the tangent space of the Grassmann manifold.
Definition 5.
The tangent space at a point $[U]$ in the Grassmann manifold $\mathrm{Gr}(n, r)$ is described as

$T_{[U]} \mathrm{Gr}(n, r) = \{ \Delta \in \mathbb{R}^{n \times r} : U^{T} \Delta = 0 \}.$
Let $J(U)$ be a function of $U = [u_{ij}] \in \mathbb{R}^{n \times r}$. Then we can define the matrix of all partial derivatives of J with respect to U:

$J_U = \left[ \frac{\partial J}{\partial u_{ij}} \right] \in \mathbb{R}^{n \times r}.$
Definition 6.
Let $J(U)$ be a function defined on a Stiefel manifold $\mathrm{St}(n, r)$. Then, its gradient at U in the Stiefel manifold is defined to be the tangent vector $\nabla J \in T_U \mathrm{St}(n, r)$ such that

$\langle J_U, \Delta \rangle = \langle \nabla J, \Delta \rangle$

holds for all tangent vectors $\Delta \in T_U \mathrm{St}(n, r)$.
Definition 7.
Let $J(U)$ be a function defined on a Grassmann manifold $\mathrm{Gr}(n, r)$. Then, its gradient at a point $[U]$ is defined to be the tangent vector $\nabla J \in T_{[U]} \mathrm{Gr}(n, r)$ such that

$\langle J_U, \Delta \rangle = \langle \nabla J, \Delta \rangle$

holds for all tangent vectors $\Delta \in T_{[U]} \mathrm{Gr}(n, r)$.
It has been shown in [36,37] that the gradient of J defined on a Stiefel manifold is expressed as

$\nabla J = J_U - U J_U^{T} U,$

while the gradient of J defined on a Grassmann manifold is formulated as

$\nabla J = (I - U U^{T}) J_U.$
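The two gradient formulas are one-line computations; the following numpy sketch (with a random stand-in for the derivative matrix $J_U$) illustrates that the Grassmann gradient is simply the projection of $J_U$ onto the horizontal space $\{\Delta : U^{T}\Delta = 0\}$, and that it reproduces the defining inner-product identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]   # a point on St(n, r)
J_U = rng.standard_normal((n, r))                  # stand-in for the derivative matrix

grad_st = J_U - U @ J_U.T @ U                      # gradient on the Stiefel manifold
grad_gr = (np.eye(n) - U @ U.T) @ J_U              # gradient on the Grassmann manifold

# grad_gr is horizontal (U^T grad_gr = 0), and for any tangent Delta with
# U^T Delta = 0 it satisfies <J_U, Delta> = <grad_gr, Delta>.
Delta = (np.eye(n) - U @ U.T) @ rng.standard_normal((n, r))
```

This projection structure is what the gradient flow iteration of Section 2.3 evaluates at every step.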
2.3. The Gradient Flow Method
We briefly overview the framework of this method in the rest of this section.
Given an optimization problem on the Grassmann manifold $\mathrm{Gr}(n, r)$,

$\min_{[U] \in \mathrm{Gr}(n, r)} J(U),$

where J is the objective function.
It is well known [36] that the gradient flow method is one of the most efficient iterative methods for solving optimization problems based on Grassmann manifolds.
Let $J_U$ denote the matrix of all partial derivatives of J with respect to U. Then the gradient of J at a point $[U]$ of the Grassmann manifold is $\nabla J = (I - U U^{T}) J_U$. Clearly, the solution U of minimization problem (13) must be a point of $\mathrm{Gr}(n, r)$ such that the gradient vanishes, i.e., U satisfies

$(I - U U^{T}) J_U = 0.$
To find a zero point of the above equation, it is proposed to solve the following gradient flow problem on the Grassmann manifold $\mathrm{Gr}(n, r)$:

$\frac{dU(t)}{dt} = -\nabla J(U(t)) = -\left(I - U(t) U(t)^{T}\right) J_{U(t)}.$
Given an arbitrary square matrix A, the matrix functions $\cos(A)$ and $\sin(A)$ are defined, respectively, as [39]

$\cos(A) = \sum_{k=0}^{\infty} \frac{(-1)^{k}}{(2k)!} A^{2k}, \qquad \sin(A) = \sum_{k=0}^{\infty} \frac{(-1)^{k}}{(2k+1)!} A^{2k+1}.$
Note that for any diagonal matrix $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_r)$, it holds that

$\cos(\Sigma) = \mathrm{diag}(\cos \sigma_1, \ldots, \cos \sigma_r), \qquad \sin(\Sigma) = \mathrm{diag}(\sin \sigma_1, \ldots, \sin \sigma_r).$
Given a matrix $F \in T_{[U]} \mathrm{Gr}(n, r)$, let $F = \Phi \Sigma \Psi^{T}$ be the thin singular value decomposition of F, where $\Phi \in \mathbb{R}^{n \times r}$ and $\Sigma, \Psi \in \mathbb{R}^{r \times r}$. The geodesic on the Grassmann manifold at the point $[U]$ with direction F is defined by

$U(t) = U \Psi \cos(\Sigma t) \Psi^{T} + \Phi \sin(\Sigma t) \Psi^{T}.$
With the help of the geodesic and by choosing an appropriate time step $t_j$, the gradient flow iteration scheme, which iteratively constructs the approximate solution of (14) along the geodesic in the negative gradient direction $-\nabla J(U_j)$, is formulated as

$U_{j+1} = U_j \Psi_j \cos(\Sigma_j t_j) \Psi_j^{T} + \Phi_j \sin(\Sigma_j t_j) \Psi_j^{T},$

where $\Phi_j \Sigma_j \Psi_j^{T}$ is the singular value decomposition of the negative gradient of J at $U_j$.
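The geodesic update above can be sketched directly from the SVD of the tangent direction. A minimal numpy implementation (a sketch of the standard formula, not of the paper's full algorithm); since the direction F is horizontal ($U^{T}F = 0$), the updated matrix keeps orthonormal columns, which is the mechanism behind the stability preservation noted later:

```python
import numpy as np

def geodesic_step(U, F, t):
    """One step along the Grassmann geodesic
    U(t) = U Psi cos(Sigma t) Psi^T + Phi sin(Sigma t) Psi^T,
    where F = Phi Sigma Psi^T is the thin SVD of the tangent direction F."""
    Phi, sig, PsiT = np.linalg.svd(F, full_matrices=False)
    Psi = PsiT.T
    return (U @ Psi @ np.diag(np.cos(sig * t)) @ Psi.T
            + Phi @ np.diag(np.sin(sig * t)) @ Psi.T)

rng = np.random.default_rng(0)
n, r = 10, 3
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
G = rng.standard_normal((n, r))
F = -(np.eye(n) - U @ U.T) @ G      # e.g., a negative projected gradient
U1 = geodesic_step(U, F, 0.3)       # stays on the Stiefel manifold
```

At $t = 0$ the geodesic returns the starting point, and for any t the iterate remains orthonormal.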
3. One-Sided Projection via Optimization on Grassmann Manifold
This section is concerned with model order reduction of the discrete-time LTI system in (1). One objective of model order reduction is to reduce the error between the original system (1) and the reduced system (2). In this paper we measure the error in the $\mathcal{H}_2$-norm. For the derivation of necessary conditions for $\mathcal{H}_2$-optimal model reduction over a Grassmann manifold, let us define the transfer function of the error system $\Sigma_e$:

$H_e(z) = H(z) - \hat{H}(z).$
It is clear that the error system $\Sigma_e$ has a realization $(A_e, B_e, C_e)$ with

$A_e = \begin{bmatrix} A & 0 \\ 0 & \hat{A} \end{bmatrix}, \qquad B_e = \begin{bmatrix} B \\ \hat{B} \end{bmatrix}, \qquad C_e = \begin{bmatrix} C & -\hat{C} \end{bmatrix},$
i.e., the transfer function of the error system can also be formulated as

$H_e(z) = C_e (zI - A_e)^{-1} B_e.$
Following the results in Section 2, the squared $\mathcal{H}_2$-norm of the error system can be written in terms of its controllability Gramian $P_e$ and observability Gramian $Q_e$ as

$\|\Sigma_e\|_{\mathcal{H}_2}^{2} = \mathrm{tr}(C_e P_e C_e^{T}) = \mathrm{tr}(B_e^{T} Q_e B_e).$
Note that $P_e$ and $Q_e$ are also the solutions of the following two Stein equations associated with the error system:

$A_e P_e A_e^{T} - P_e + B_e B_e^{T} = 0, \qquad A_e^{T} Q_e A_e - Q_e + C_e^{T} C_e = 0.$
Let us partition the matrices $P_e$, $Q_e$ into

$P_e = \begin{bmatrix} P & X \\ X^{T} & \hat{P} \end{bmatrix}, \qquad Q_e = \begin{bmatrix} Q & Y \\ Y^{T} & \hat{Q} \end{bmatrix},$

with $P, Q \in \mathbb{R}^{n \times n}$, $X, Y \in \mathbb{R}^{n \times r}$, and $\hat{P}, \hat{Q} \in \mathbb{R}^{r \times r}$. Then, the Stein equation in (20) is equivalent to the following equations:

$A P A^{T} - P + B B^{T} = 0, \qquad A X \hat{A}^{T} - X + B \hat{B}^{T} = 0, \qquad \hat{A} \hat{P} \hat{A}^{T} - \hat{P} + \hat{B} \hat{B}^{T} = 0.$

On the other hand, the Stein equation in (21) is equivalent to

$A^{T} Q A - Q + C^{T} C = 0, \qquad A^{T} Y \hat{A} - Y - C^{T} \hat{C} = 0, \qquad \hat{A}^{T} \hat{Q} \hat{A} - \hat{Q} + \hat{C}^{T} \hat{C} = 0.$
Based on the expression of the squared $\mathcal{H}_2$-norm of the error system in (19) and the partition of $P_e$ and $Q_e$ in (22), the squared $\mathcal{H}_2$-norm of the error system can now be rewritten as

$\|\Sigma_e\|_{\mathcal{H}_2}^{2} = \mathrm{tr}(C P C^{T}) - 2\,\mathrm{tr}(C X \hat{C}^{T}) + \mathrm{tr}(\hat{C} \hat{P} \hat{C}^{T}).$
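The point of the partition is that the error norm can be assembled from the blocks P, X, and $\hat{P}$ without ever forming the $(n+r) \times (n+r)$ error Gramian. The numpy sketch below checks the partitioned expression $\mathrm{tr}(CPC^{T}) - 2\,\mathrm{tr}(CX\hat{C}^{T}) + \mathrm{tr}(\hat{C}\hat{P}\hat{C}^{T})$ against the full error Gramian on a small random example (Kronecker solves are for illustration only; the reduced system here uses a one-sided projection with A scaled so that the projected system is guaranteed stable):

```python
import numpy as np

def solve_stein(A1, A2, M):
    """Solve X - A1 X A2^T = M by Kronecker vectorization (row-major vec)."""
    n, r = M.shape
    K = np.eye(n * r) - np.kron(A1, A2)
    return np.linalg.solve(K, M.reshape(-1)).reshape(n, r)

rng = np.random.default_rng(1)
n, m, p, r = 7, 2, 2, 3
A = rng.standard_normal((n, n))
A *= 0.8 / np.linalg.norm(A, 2)      # ||A||_2 < 1, so U^T A U is stable too
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
Ah, Bh, Ch = U.T @ A @ U, U.T @ B, C @ U          # one-sided reduced system

P  = solve_stein(A, A, B @ B.T)      # A P A^T  - P  + B B^T   = 0
X  = solve_stein(A, Ah, B @ Bh.T)    # A X Ah^T - X  + B Bh^T  = 0
Ph = solve_stein(Ah, Ah, Bh @ Bh.T)  # Ah Ph Ah^T - Ph + Bh Bh^T = 0
err_sq = (np.trace(C @ P @ C.T) - 2 * np.trace(C @ X @ Ch.T)
          + np.trace(Ch @ Ph @ Ch.T))

# Cross-check against the full error-system Gramian.
Ae = np.block([[A, np.zeros((n, r))], [np.zeros((r, n)), Ah]])
Be = np.vstack([B, Bh])
Ce = np.hstack([C, -Ch])
Pe = solve_stein(Ae, Ae, Be @ Be.T)
err_sq_full = np.trace(Ce @ Pe @ Ce.T)
```

In the algorithms below, only the mixed block X and the small blocks $\hat{P}$, $\hat{Q}$ are recomputed per iteration, while the large constant term $\mathrm{tr}(CPC^{T})$ never needs to be formed.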
$\mathcal{H}_2$-optimal model reduction on a Grassmann manifold for continuous-time systems was considered in [23,40]. In this section, we aim to extend the one-sided projection technique based on Grassmann manifold optimization to the discrete-time case. The one-sided projection-based reduction approach involves finding one projection matrix $U \in \mathbb{R}^{n \times r}$ with $U^{T} U = I_r$. With U, the reduced system is constructed by setting

$\hat{A} = U^{T} A U, \qquad \hat{B} = U^{T} B, \qquad \hat{C} = C U.$
In the one-sided projection scheme, the objective function is defined by
The objective function is also reformulated as
where Q, , and Y are the solutions to the equations
respectively.
We now derive an expression for the matrix of all partial derivatives of J with respect to U. The following lemma is useful in the derivation of our result.
Lemma 1.
Let , , , and . If satisfy
then the following holds:
Proof.
Obviously, it holds that
This completes the proof. □
Theorem 1.
Let , X, , and Y be the solutions of (31), (32), (34), and (35), respectively. Then, the matrix of all partial derivatives of J with respect to U can be expressed as
where R is defined by
Proof.
Let $E_{ij}$ be the single-entry matrix with one in entry $(i, j)$ and zero otherwise. By differentiating J with respect to $u_{ij}$, we obtain
Given a stable discrete-time system (1), one-sided projection $\mathcal{H}_2$-optimal model order reduction aims to minimize the squared $\mathcal{H}_2$-norm of the error between the stable full system and the stable reduced-order system; i.e., it seeks to solve the minimization problem
where $\hat{P}$ and X are the solutions of (31) and (32), respectively. Since U is a point of the Stiefel manifold $\mathrm{St}(n, r)$, the minimization problem can also be expressed as an optimization problem on the Stiefel manifold $\mathrm{St}(n, r)$.
It is known [23] that in the continuous-time case, the minimization problem can be equivalently rewritten as a minimization problem on the Grassmann manifold if the cost function depends only on the space spanned by the columns of the matrix U, i.e., if $J(U) = J(UZ)$ for any orthogonal matrix $Z \in \mathbb{R}^{r \times r}$. For the discrete-time system considered in this paper, for the same reason, minimization problem (41) can be reformulated as an equivalent optimization problem on a Grassmann manifold:
Clearly, the solution U of minimization problem (42) must be a point of $\mathrm{Gr}(n, r)$ such that the gradient vanishes, i.e., U satisfies the necessary conditions

$(I - U U^{T}) J_U = 0,$

where $J_U$ is given by (36).
3.1. Solving the Optimization Problem via the Gradient Flow Approach
In this subsection, we consider the application of the gradient flow method reviewed in Section 2.3 to the optimization problem over a Grassmann manifold (42).
Suppose that for $j = 0, 1, \ldots$, $U_j \in \mathbb{R}^{n \times r}$, with $U_j^{T} U_j = I_r$, is the j-th iterate. From Theorem 1, it follows that the matrix of all partial derivatives of J with respect to $U_j$ can be expressed as
where is given by
where the quantities involved are the solutions of Equations (31), (32), (34), and (35) with $U = U_j$, respectively. With $J_{U_j}$, we see from (12) that the gradient of J at the point $[U_j]$ in the Grassmann manifold can be formulated as

$\nabla J(U_j) = (I - U_j U_j^{T}) J_{U_j}.$
Then, the gradient flow method computes the $(j+1)$-th approximate solution to minimization problem (41) as

$U_{j+1} = U_j \Psi_j \cos(\Sigma_j t_j) \Psi_j^{T} + \Phi_j \sin(\Sigma_j t_j) \Psi_j^{T},$

where $\Phi_j \Sigma_j \Psi_j^{T}$ is the singular value decomposition of the negative gradient of J at $U_j$.
In iteration (46), the step size $t_j$ can be chosen according to an inexact line search with the Armijo rule as follows: Define

$\varphi(t) = J(U_j(t)),$

where t denotes the step size parameter and $U_j(t)$ denotes the geodesic update (46) with step t. For some given constants $\beta, \delta \in (0, 1)$, the Armijo step size is $t_j = \beta^{l}$, where l is the smallest nonnegative integer such that the sufficient reduction condition holds:

$\varphi(0) - \varphi(\beta^{l}) \ge \delta \beta^{l} \|\nabla J(U_j)\|_F^{2}.$
Let and be the solutions of (31) and (32) with , and and be the solutions of (31) and (32) with . Then, by (29), we have
This shows that we do not need to solve (30) for P to apply the inexact line search with the Armijo condition.
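The geodesic update and the Armijo backtracking can be combined into one compact step. Below is a self-contained numpy sketch on a toy cost $J(U) = -\mathrm{tr}(U^{T} M U)$ (a hypothetical stand-in, not the model-reduction cost; its minimizer is the dominant invariant subspace of the symmetric matrix M), with hypothetical default parameters $\beta = 0.5$ and $\delta = 10^{-4}$:

```python
import numpy as np

def geodesic(U, F, t):
    """Grassmann geodesic from [U] with tangent direction F."""
    Phi, sig, PsiT = np.linalg.svd(F, full_matrices=False)
    Psi = PsiT.T
    return (U @ Psi @ np.diag(np.cos(sig * t)) @ Psi.T
            + Phi @ np.diag(np.sin(sig * t)) @ Psi.T)

def armijo_step(J, grad_fn, U, beta=0.5, delta=1e-4, max_ls=50):
    """One gradient step with backtracking: t_j = beta^l for the smallest l
    with J(U) - J(U(beta^l)) >= delta * beta^l * ||grad||_F^2."""
    G = grad_fn(U)
    g2 = np.sum(G * G)
    t = 1.0
    for _ in range(max_ls):
        U_new = geodesic(U, -G, t)
        if J(U) - J(U_new) >= delta * t * g2:
            return U_new
        t *= beta
    return U                      # line search failed; keep the iterate

# Toy cost: J(U) = -tr(U^T M U); its Grassmann gradient is
# -(I - U U^T)(2 M U), i.e., the projected derivative matrix.
rng = np.random.default_rng(2)
n, r = 12, 3
M = rng.standard_normal((n, n)); M = M + M.T
J = lambda U: -np.trace(U.T @ M @ U)
grad = lambda U: -(np.eye(n) - U @ U.T) @ (2 * M @ U)
U0 = np.linalg.qr(rng.standard_normal((n, r)))[0]
U1 = armijo_step(J, grad, U0)
```

In the actual Algorithms 1 and 2, evaluating J at a trial point only requires the small Stein equations, which keeps each backtracking trial cheap, as noted above.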
We now outline the gradient flow method for model reduction of discrete-time system (1) in Algorithm 1.
| Algorithm 1 Gradient flow method |
We note that a detailed complexity analysis of the gradient flow method for model reduction of continuous-time LTI systems is presented in [23]. With a similar analysis, it is easy to show that the computational complexity of Algorithm 1 is , where k is some fixed integer; denotes the number of nonzero elements; N is the number of iterations; is the maximum number of search steps; are the input, output, and reduced dimensions, respectively; and is the complexity of iterative methods like GMRES for solving linear systems with a coefficient matrix . The computational complexity of Algorithms 2 and 3 can be analyzed similarly.
3.2. Solving the Optimization Problem via Sequentially Quadratic Approximation
Sequentially quadratic approximation [41] is one of the best-known methods in the optimization field. In [23], Xu and Zeng applied this technique to minimization problem (42), which arises in the model reduction of continuous-time systems. In this subsection, we extend sequentially quadratic approximation to the model reduction of discrete-time systems.
Let $U_j$, with $U_j^{T} U_j = I_r$, be the j-th projection matrix generated in the iteration. In sequentially quadratic approximation, the key step is to approximate the cost $J(U)$, given in (42), by the following quadratic function:
where and are the solutions of Stein Equations (31) and (32) with , respectively. It is clear that the partial derivative matrix of is given by . By setting , we get if is invertible. Now, we can define a possible search direction by projecting the difference to the tangent space of at point ; i.e., is defined by
In order to generate a gradient-related sequence , it is proposed in [41] to select the new search for obtaining according to the following rule: Given constants and , if
then the projected difference is used as the search direction; otherwise, the search direction is the negative gradient direction of J at $U_j$, i.e., $-\nabla J(U_j)$. With the SVD of the new search direction, the $(j+1)$-th iterate is generated according to iteration scheme (46) with the step size satisfying the Armijo rule in (47).
We summarize the sequentially quadratic approximation method for the $\mathcal{H}_2$-optimal model reduction of discrete-time system (1) in Algorithm 2.
| Algorithm 2 Sequentially quadratic approximation method |
| Algorithm 3 Two-sided projection method |
4. Two-Sided Projection via Optimization on Grassmann Manifold
Zeng and Lu [24] considered a two-sided projection for $\mathcal{H}_2$-optimal model reduction of continuous-time linear systems. The cost is minimized over a Grassmann manifold with two projection matrices. In this section, we extend this two-sided projection approach to discrete-time linear system (1).
The two-sided projection-based approach naturally involves a pair of biorthogonal matrices $(W, V)$, i.e., $W^{T} V = I_r$. The coefficient matrices of the reduced-order system can be obtained via projections as follows:

$\hat{A} = W^{T} A V, \qquad \hat{B} = W^{T} B, \qquad \hat{C} = C V.$
In the two-sided projection framework, the cost function for the error system is given by
and the minimization problem to be solved is given by
As shown in [24], this optimal problem (51) can be equivalently formulated as the following minimization problem over a Grassmann manifold with two variable matrices:
We give the expressions for the matrices of all partial derivatives of J with respect to V and W in the following theorem; the proof is similar to that of Theorem 1 and is thus omitted.
Theorem 2.
The matrices of all partial derivatives of J with respect to V and W can be expressed as
With the help of the matrices of all partial derivatives of J, from [24] (Theorem 1), the gradients $\nabla_V J$ and $\nabla_W J$ of J at the points $[V]$ and $[W]$ of the Grassmann manifold can be written explicitly, where the partial derivative matrices $J_V$ and $J_W$ are given by (61) and (62), respectively.
So, for the two-sided projection, the pair $(W, V)$ must satisfy the necessary conditions

$\nabla_V J = 0, \qquad \nabla_W J = 0.$
Solving the Optimization Problem via an Alternating Direction Approach
Now we consider applying the alternating direction algorithm with sequentially quadratic approximation of the cost to solve minimization problem (52).
Suppose that for $j = 0, 1, \ldots$, $V_j$ and $W_j$ are the known projection matrices. Firstly, we approximate the cost J, expressed by (53), in the V-direction by the following quadratic function:
where and are the solutions of the Stein equations
respectively, with and . It is not difficult to see that the partial derivative of the quadratic function can be formulated as
By setting the partial derivative to zero, we can obtain a new matrix provided the corresponding coefficient matrix is invertible. However, we cannot use this matrix directly as a projection matrix since it does not usually satisfy the biorthogonality condition. In [24], an approach is proposed to construct the $(j+1)$-th iterate $V_{j+1}$ based on the difference between the new matrix and $V_j$. We outline the scheme as follows:
- Firstly, by projecting the difference onto the tangent space at the point $[V_j]$, a search direction is generated by
- Secondly, we define the search matrix, which should satisfy the biorthogonality condition with $W_j$, by the following formula, where t denotes the step length parameter.
- Finally, the $(j+1)$-th iterate is constructed by the inexact line search with the Armijo rule. That is, for some given constants, we find the smallest nonnegative integer l so that the sufficient reduction condition is satisfied, and then set $V_{j+1}$ accordingly.
After obtaining the projection matrix $V_{j+1}$, the iterate for W is constructed in an analogous way. From (54), the quadratic function that approximates the cost at the point $W_j$ is defined by
where and satisfy
respectively. Similarly, by setting the partial derivative to zero, we obtain a new candidate matrix. Following the steps in the construction of $V_{j+1}$, the new projection matrix $W_{j+1}$ is established as follows:
- Construct the search direction by
- Define the search matrix by
- For some given constants, find the smallest nonnegative integer l so that the sufficient reduction condition is satisfied, and then set $W_{j+1}$ accordingly.
In summary, we propose a two-sided projection method based on Grassmann manifold optimization for model reduction of discrete-time system (1), as presented in Algorithm 3.
5. Implementation Issues
In this section, we consider some issues related to the implementation of the algorithms proposed in this paper, including how to choose the initial projection matrices, how to define the termination criterion, and how to select suitable methods for solving the Stein equations.
5.1. Initial Projection Matrix Selection
For the algorithms proposed in this paper, the first possible choice is to construct the initial projection matrices randomly. For one-sided projection, after generating a random matrix, we compute its QR decomposition to obtain an orthonormal initial matrix. For the two-sided projection method, we first construct two random matrices and then rescale one of them so that the biorthogonality condition $W^{T} V = I_r$ holds.
Secondly, we can choose the initial projection matrices so that the subspaces spanned by them are the block Krylov subspace or the rational block Krylov subspace:

$\mathcal{K}_k(A, B) = \mathrm{span}\{B, A B, \ldots, A^{k-1} B\}, \qquad \mathcal{K}_k\big((A - \sigma I)^{-1}, B\big).$
Recall that the first-order necessary conditions for $\mathcal{H}_2$-optimal model reduction of a discrete-time system state that the optimal reduced system interpolates the full system at the mirror images, with respect to the unit circle, of the poles of the reduced system; see, for example, ref. [35]. So the reciprocals of the approximate eigenvalues, or Ritz values, of A are suitable choices of shift parameters for the rational block Krylov subspace. In the numerical experiments, we test the random selection as well as the block Krylov subspace and rational block Krylov subspace choices.
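The two simplest initializations, random orthonormal and block Krylov, can be sketched as follows (a hypothetical numpy sketch with placeholder dimensions; for ill-conditioned Krylov blocks, a rank-revealing QR or repeated orthogonalization would be advisable in practice):

```python
import numpy as np

def krylov_basis(A, B, k):
    """Orthonormal basis of the block Krylov subspace
    span{B, A B, ..., A^{k-1} B}, computed by QR of the stacked blocks."""
    blocks = [B]
    for _ in range(k - 1):
        blocks.append(A @ blocks[-1])
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q

rng = np.random.default_rng(3)
n, m, r = 20, 2, 6
A = rng.standard_normal((n, n))
A *= 0.9 / np.linalg.norm(A, 2)            # stable placeholder state matrix
B = rng.standard_normal((n, m))

U0_krylov = krylov_basis(A, B, 3)[:, :r]   # block Krylov initial guess (k*m >= r)
U0_random = np.linalg.qr(rng.standard_normal((n, r)))[0]   # random initial guess
```

A rational block Krylov initialization would follow the same pattern with $A$ replaced by $(A - \sigma I)^{-1}$ for chosen shifts $\sigma$.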
5.2. Termination Criterion
The algorithms proposed in this paper are developed for solving optimization problems over Grassmann manifolds. So we should stop these algorithms when the norm of the gradient of the cost function J at some approximate solution is small.
For one-sided projection methods such as Algorithms 1 and 2, the cost J is defined in terms of one matrix variable U and is given in (42). Therefore, the iteration is naturally terminated if the gradient at the point $[U_j]$ satisfies

$\|\nabla J(U_j)\|_F \le \epsilon,$

where $\epsilon$ is a prescribed tolerance and $\nabla J(U_j) = (I - U_j U_j^{T}) J_{U_j}$, with $J_{U_j}$ given by (44).
For two-sided projection, the cost J has two matrix variables $(W, V)$. It is known that if $(W, V)$ is the solution of minimization problem (52), the gradients at W and V should satisfy $\nabla_W J = 0$ and $\nabla_V J = 0$. So the stopping rule for Algorithm 3 can be defined by requiring that the Frobenius norms of both gradients fall below a prescribed tolerance, where $\nabla_W J$ and $\nabla_V J$ are computed by using (63) and (64), respectively.
5.3. Solving Stein Equations
In this subsection, we consider the numerical solution of two classes of Stein equations appearing in our algorithms.
The Stein equation of the type shown in (66), called the discrete-time Lyapunov equation, has a small dimension r. Equations of this kind can be solved by the standard direct method provided in [42], which is a direct extension of the well-known Bartels–Stewart algorithm [43] for continuous Lyapunov equations. Note that in order to obtain the real solution, the real Schur decomposition [13] should be used. In [44], low-rank methods were proposed for solving discrete-time projected Lyapunov equations. For a comprehensive review of numerical methods for solving matrix equations, we refer to [45].
We now consider how to solve Stein equations like (67), which are also known as discrete-time Sylvester equations. We point out that this kind of matrix equation cannot be solved by a Bartels–Stewart-like algorithm. The reason is that the matrix A has a large dimension n, so computing the real Schur decomposition of A requires $O(n^{3})$ time complexity.
For simplicity of notation, we drop the index j in (67). This kind of equation now has the form

$A X \hat{A}^{T} - X + G = 0,$

with $A \in \mathbb{R}^{n \times n}$, $\hat{A} \in \mathbb{R}^{r \times r}$, $X, G \in \mathbb{R}^{n \times r}$, and $r \ll n$. Usually, A is sparse, and $\hat{A}$ is dense. So, this class of matrix equations can be called the sparse–dense discrete-time Sylvester equations. We note that in [46], a method based on Schur decomposition of the dense matrix is proposed for solving sparse–dense continuous-time Sylvester equations, which arise in model order reduction of continuous-time LTI systems.
Here, we will use a variant of the real Bartels–Stewart algorithm to obtain the real solution of (70). Let $\hat{A} = Z T Z^{T}$ be the real Schur decomposition of $\hat{A}$, with T upper quasi-triangular,
where each diagonal block of T is either a real number or a $2 \times 2$ real matrix having a pair of complex conjugate eigenvalues. With this decomposition, Equation (70) is transformed into an equivalent equation
where $\tilde{X} = X Z$ and $\tilde{G} = G Z$. We partition $\tilde{X}$ and $\tilde{G}$ as
where the block columns of $\tilde{X}$ and $\tilde{G}$ have the same number of columns as the corresponding diagonal blocks of T. From (71) and the above partition, we have
with .
If the diagonal block is a $1 \times 1$ real number, it follows from (72) that
Now we consider the case where the diagonal block is a $2 \times 2$ real matrix. Let $\lambda$ be one of its eigenvalues with corresponding eigenvector $v$. Then, $(\bar{\lambda}, \bar{v})$ is also an eigenpair. Thus, we have
Define
Then, we have
i.e.,
where $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ denote the real and imaginary parts of a complex number or a complex matrix, respectively, and $\jmath$ is the imaginary unit.
So, we obtain a real formula for the corresponding block of $\tilde{X}$:
We summarize the framework of this method for solving the discrete-time Sylvester Equation (70) as follows:
- Compute the real Schur decomposition $\hat{A} = Z T Z^{T}$, and set $\tilde{G} = G Z$.
- Solve for the blocks of $\tilde{X}$ successively using the formulas derived above.
- Finally, compute $X = \tilde{X} Z^{T}$.
We remark that this approach involves complex arithmetic and the storage of complex matrices only in computing the blocks of $\tilde{X}$ associated with $2 \times 2$ diagonal blocks, and thus avoids complex arithmetic and the storage of complex matrices as much as possible. Moreover, for a $2 \times 2$ block, only one system of linear equations needs to be solved. The main computation in this method, and thus in the algorithms proposed in this paper for model reduction, is the solution of shifted linear systems involving the sparse matrix A. Thus, sparse direct solvers or Krylov subspace methods like GMRES or QMR [13] can be applied, which makes these model reduction methods suitable for large-scale problems.
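A compact way to prototype the same idea, assuming the sparse–dense equation has the form $A X \hat{A}^{T} - X + G = 0$, is to replace the real Schur decomposition by an eigendecomposition of the small matrix $\hat{A}$ (assumed diagonalizable here). This numpy sketch gives up the paper's careful avoidance of complex arithmetic in exchange for brevity; like the real-Schur variant, each transformed column costs one linear solve with a shifted copy of A, which is where a sparse solver or GMRES would be used:

```python
import numpy as np

def sparse_dense_stein(A, Ahat, G):
    """Solve A X Ahat^T - X + G = 0 for X in R^{n x r}.

    Assumes the small dense matrix Ahat is diagonalizable,
    Ahat = S diag(lam) S^{-1}. Transforming X~ = X S^{-T}, G~ = G S^{-T}
    decouples the columns: (I - lam_j A) x~_j = g~_j.
    """
    n, r = G.shape
    lam, S = np.linalg.eig(Ahat)
    Gt = G @ np.linalg.inv(S).T            # right-transform: G~ = G S^{-T}
    Xt = np.empty((n, r), dtype=complex)
    I = np.eye(n)
    for j in range(r):
        # One (sparse, in practice) linear solve per eigenvalue of Ahat.
        Xt[:, j] = np.linalg.solve(I - lam[j] * A, Gt[:, j])
    return (Xt @ S.T).real                 # back-transform; imag parts cancel

rng = np.random.default_rng(4)
n, r = 30, 4
A = rng.standard_normal((n, n)); A *= 0.8 / np.linalg.norm(A, 2)
Ahat = rng.standard_normal((r, r)); Ahat *= 0.8 / np.linalg.norm(Ahat, 2)
G = rng.standard_normal((n, r))
X = sparse_dense_stein(A, Ahat, G)
```

Solvability requires $1 - \lambda_i \mu_j \ne 0$ for all eigenvalue pairs of $\hat{A}$ and A, which holds automatically when both matrices are asymptotically stable.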
6. Numerical Experiments
In this section, we present two examples to illustrate the performance of $\mathcal{H}_2$-optimization over Grassmann manifolds for the model order reduction of linear discrete-time systems. We compare Algorithms 1–3 with balanced truncation (BT) and discrete-time IRKA. We note that the step size in Algorithms 1–3 is chosen to satisfy the Armijo rule. So, stability is preserved if the full-order system is stable (see [23]). However, IRKA may not necessarily generate stable reduced systems even if the original system is stable. The numerical experiments are carried out in Matlab 2022a on an Intel i5 processor with 2.3 GHz and 16 GB memory. The relative error is defined by

$\mathrm{Err} = \frac{\|H - \hat{H}\|_{\mathcal{H}_2}}{\|H\|_{\mathcal{H}_2}},$
where H and $\hat{H}$ are the transfer functions of the full system and the reduced system, respectively. In the numerical examples, we make the following parameter choices: , , , , and .
6.1. Example 1
In the first numerical example, the linear discrete-time system is obtained from the single-input single-output ISS-1R module (cf. [47]), a continuous-time model of the International Space Station. We discretize this model using a semi-explicit Euler and a semi-implicit Euler method. The resulting discrete-time model has dimension . The numerical results are presented in Table 1, which shows that, of the three algorithms proposed in this paper, the two-sided projection method (Algorithm 3) yields the smallest relative errors for the reduced orders . Moreover, Algorithm 3 and IRKA have almost the same relative errors and are slightly better than BT. In this example, all of the tested algorithms preserved the stability of the original system.
Table 1.
relative errors of Example 1.
6.2. Example 2
In the second numerical example, we discretize the continuous-time CD player model (cf. [47]) using the same scheme as in the first example to generate a discrete-time system. The resulting discrete-time model has dimension and has 2 inputs and 2 outputs. The relative errors are given in Table 2, which shows that all three algorithms proposed in this paper are effective. For this example, Algorithm 3 and IRKA again have almost the same relative errors and are slightly better than BT. We point out that Algorithm 3 preserves stability in the reduced system, whereas IRKA fails to do so in this example.
Table 2.
relative errors of Example 2.
7. Conclusions
In this paper, we have investigated -optimal model reduction over Grassmann manifolds for discrete-time linear time-invariant systems. We have derived necessary optimality conditions for the resulting optimization problems over Grassmann manifolds and considered two kinds of projections: one-sided and two-sided. For one-sided projection, we applied a gradient flow method and a sequentially quadratic approximation method to solve the related optimization problem. For two-sided projection, we combined the strategies of alternating direction iteration and sequentially quadratic approximation for the minimization problem. Numerical examples demonstrate the effectiveness of the proposed algorithms.
Author Contributions
Conceptualization, Y.L.; Methodology, L.Z.; Software, L.Z.; Writing—original draft, Y.L.; Formal analysis, L.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Science Foundation of Hunan Province under grant 2024JJ7203, the Key Project of the Hunan Provincial Education Department under grant 23A0577, and the Applied Characteristic Discipline at Hunan University of Science and Engineering.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Anderson, J. Computational Fluid Dynamics; McGraw-Hill Education: New York, NY, USA, 1995.
- Tedesco, J.W.; McDougal, W.G.; Ross, C.A. Structural Dynamics: Theory and Applications; Prentice Hall: Upper Saddle River, NJ, USA, 1999.
- Reis, T. Systems Theoretic Aspects of PDAEs and Applications to Electrical Circuits. Ph.D. Thesis, Technische Universität Kaiserslautern, Kaiserslautern, Germany, 2006.
- Moran, J. An Introduction to Theoretical and Computational Aerodynamics; John Wiley & Sons: New York, NY, USA, 1984.
- Allen, J.J. Micro Electro Mechanical System Design; CRC Press: Boca Raton, FL, USA, 2005.
- Coiffier, J. Fundamentals of Numerical Weather Prediction; Cambridge University Press: Cambridge, UK, 2012.
- Miao, J. Economic Dynamics in Discrete Time; MIT Press: Cambridge, MA, USA, 2014.
- Möller, D.P.F. Introduction to Transportation Analysis, Modeling and Simulation; Springer: London, UK, 2014.
- Gu, G. Discrete-Time Linear Systems: Theory and Design with Applications; Springer: New York, NY, USA, 2012.
- Moore, B.C. Principal component analysis in linear systems: Controllability, observability, and model reduction. IEEE Trans. Autom. Control 1981, 26, 17–32.
- Glover, K. All optimal Hankel-norm approximations of linear multivariable systems and their L∞-error bounds. Int. J. Control 1984, 39, 1115–1193.
- Liu, Y.; Anderson, B.D.O. Singular perturbation approximation of balanced systems. Int. J. Control 1989, 50, 1379–1405.
- Golub, G.H.; Van Loan, C.F. Matrix Computations, 3rd ed.; Johns Hopkins University Press: Baltimore, MD, USA, 1996.
- Ruhe, A. Rational Krylov algorithms for nonsymmetric eigenvalue problems. II: Matrix pairs. Linear Algebra Appl. 1994, 197, 282–295.
- Clark, J.V.; Zhou, N.; Bindel, D.; Schenato, L.; Wu, W.; Demmel, J.; Pister, K.S.J. 3D MEMS simulation using modified nodal analysis. Proc. Microscale Syst. Mech. Meas. Symp. 2000, 68–75.
- Craig, R.R. Structural Dynamics: An Introduction to Computer Methods; John Wiley & Sons: New York, NY, USA, 1981.
- Freund, R.W. Krylov-subspace methods for reduced-order modeling in circuit simulation. J. Comput. Appl. Math. 2000, 123, 395–421.
- Freund, R.W. Model reduction methods based on Krylov subspaces. Acta Numer. 2003, 12, 267–319.
- Grimme, E. Krylov Projection Methods for Model Reduction. Ph.D. Thesis, University of Illinois at Urbana–Champaign, Champaign, IL, USA, 1997.
- Gugercin, S.; Antoulas, A.C.; Beattie, C. H2 model reduction for large-scale linear dynamical systems. SIAM J. Matrix Anal. Appl. 2008, 30, 609–638.
- Van Dooren, P.; Gallivan, K.A.; Absil, P.-A. H2-optimal model reduction of MIMO systems. Appl. Math. Lett. 2008, 21, 1267–1273.
- Van Dooren, P.; Gallivan, K.A.; Absil, P.-A. H2-optimal model reduction with higher-order poles. SIAM J. Matrix Anal. Appl. 2010, 31, 2738–2753.
- Xu, Y.; Zeng, T. Fast optimal H2 model reduction algorithms based on Grassmann manifold optimization. Int. J. Numer. Anal. Model. 2013, 10, 972–991.
- Zeng, T.; Lu, C. Two-sided Grassmann manifold algorithm for optimal H2 model reduction. Int. J. Numer. Methods Eng. 2015, 104, 928–943.
- Benner, P.; Cao, X.; Schilders, W. A bilinear H2 model order reduction approach to linear parameter-varying systems. Adv. Comput. Math. 2019, 45, 2241–2271.
- Xu, K.; Dong, L.; Wang, B.; Li, Z. Preserving-periodic Riemannian descent model reduction of linear discrete-time periodic systems with isometric vector transport on product manifolds. Appl. Math. Lett. 2025, 171, 231–254.
- Xu, K.; Li, Z.; Benner, P. Parametric interpolation model order reduction on Grassmann manifolds by parallelization. IEEE Trans. Circuits Syst. 2025, 72, 198–202.
- Xu, K.; Jiang, Y.; Yang, Z. H2 order-reduction for bilinear systems based on Grassmann manifold. J. Frankl. Inst. 2015, 352, 4467–4479.
- Otto, S.E.; Padovan, A.; Rowley, C.W. Optimizing oblique projections for nonlinear systems using trajectories. SIAM J. Sci. Comput. 2022, 44, A1681–A1702.
- Padovan, A.; Vollmer, B.; Bodony, D.J. Data-driven model reduction via non-intrusive optimization of projection operators and reduced-order dynamics. SIAM J. Appl. Dyn. Syst. 2024, 23, 3052–3076.
- Buchfink, P.; Glas, S.; Haasdonk, B.; Unger, B. Model reduction on manifolds: A differential geometric framework. Phys. D Nonlinear Phenom. 2024, 468, 134299.
- Zimmermann, R. Manifold interpolation. In System- and Data-Driven Methods and Algorithms; De Gruyter: Berlin, Germany, 2021; pp. 229–274.
- Sashittal, P.; Bodony, D. Low-rank dynamic mode decomposition using Riemannian manifold optimization. In Proceedings of the IEEE Conference on Decision and Control (CDC), Miami Beach, FL, USA, 17–19 December 2018; pp. 2265–2270.
- Wilson, D.A. Optimum solution of model reduction problem. Proc. Inst. Electr. Eng. 1970, 117, 1161–1165.
- Bunse-Gerstner, A.; Kubalińska, D.; Vossen, G.; Wilczek, D. H2-norm optimal model reduction for large scale discrete dynamical MIMO systems. J. Comput. Appl. Math. 2010, 233, 1202–1216.
- Absil, P.-A.; Mahony, R.; Sepulchre, R. Optimization Algorithms on Matrix Manifolds; Princeton University Press: Princeton, NJ, USA, 2008.
- Edelman, A.; Arias, T.A.; Smith, S.T. The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 1998, 20, 303–353.
- Antoulas, A.C. Approximation of Large-Scale Dynamical Systems; SIAM: Philadelphia, PA, USA, 2005.
- Higham, N.J. Functions of Matrices: Theory and Computation; SIAM: Philadelphia, PA, USA, 2008.
- Yan, W.-Y.; Lam, J. An approximate approach to H2-optimal model reduction. IEEE Trans. Autom. Control 1999, 44, 1341–1357.
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 1999.
- Barraud, A.Y. A numerical algorithm to solve AᵀXA − X = Q. IEEE Trans. Autom. Control 1977, 22, 883–885.
- Bartels, R.; Stewart, G. Solution of the equation AX + XB = C. Commun. ACM 1972, 15, 820–826.
- Lin, Y. Low-rank methods for solving discrete-time projected Lyapunov equations. Mathematics 2024, 12, 1166.
- Simoncini, V. Computational methods for linear matrix equations. SIAM Rev. 2016, 58, 377–441.
- Benner, P.; Köhler, M.; Saak, J. Sparse-Dense Sylvester Equations in H2-Model Order Reduction; Technical Report MPIMD/11-11; Max Planck Institute Magdeburg: Magdeburg, Germany, 2011.
- Chahlaoui, Y.; Van Dooren, P. Benchmark examples for model reduction of linear time-invariant dynamical systems. In Dimension Reduction of Large-Scale Systems; Lecture Notes in Computational Science and Engineering 45; Springer: New York, NY, USA, 2005; pp. 379–392.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).