On Schur Forms for Matrices with Simple Eigenvalues

Mihail Mihaylov Konstantinov; Petko Hristov Petkov

doi:10.3390/axioms13120839

¹ Department of Mathematics, University of Architecture, Civil Engineering and Geodesy, 1064 Sofia, Bulgaria

² Bulgarian Academy of Sciences, 1040 Sofia, Bulgaria

* Author to whom correspondence should be addressed.

Axioms 2024, 13(12), 839; https://doi.org/10.3390/axioms13120839

Abstract

In this paper, we consider various aspects of the Schur problem for a square complex matrix A, namely the unitary similarity transformation of A into an upper triangular form with the eigenvalues of A on its diagonal. Since the profound work of I. Schur published in 1909, this has become a fundamental issue in the theory and applications of matrices. Nevertheless, certain details concerning the Schur problem need further clarification, especially in connection with the perturbation analysis of the Schur decomposition relative to perturbations in the matrix A. We consider both canonical and condensed Schur forms. Special attention is paid to matrices with simple eigenvalues. Some new concepts, such as quasi-Schur forms and diagonally spectral matrices, are also introduced and studied.

Keywords:

Schur canonical form; Schur condensed form; diagonally spectral matrix; quasi-Schur form; perturbations of Schur form

MSC:

15A21; 65G30

1. Introduction and Notation

The Schur decomposition of a general square matrix [1] and its generalizations are major tools both in the theory and applications of matrix analysis; see, e.g., refs. [2,3,4,5]. In this paper, we consider the main definitions and properties of the Schur decomposition of a square matrix (see the definitions given later), which are important from the point of view of the perturbation analysis of the Schur problem; see [6,7,8,9,10]. Various aspects of Schur and Schur-like decompositions are considered in [11,12,13,14]. The monographs [15,16] also deal with these problems. Schur-like forms are also proposed for pairs of matrices (or matrix pencils) [3], for control systems [17,18,19], etc. A perturbation analysis of these constructions is presented in [18,20,21] and the references therein.

In this paper, we introduce new concepts in this field such as quasi-canonical Schur forms and diagonally spectral matrices. A number of examples illustrate the results presented. The subject is technical and requires a substantial amount of notation. For the convenience of the reader, the general notation is gathered below in this section, while some specific notation appears further in the text. Some of the matrix notation is inspired by the language of the program system MATLAB® Version 9.9 (R2020b) [22].

If

K

is a finite set, then

card (K)

is the number of its elements. Let

Z = {0, \pm 1, \pm 2, \dots}

be the set of integers and

m, n \in Z

, where

m \leq n

. We denote by

Z [m, n] = {k \in Z : m \leq k \leq n}

the set of

n - m + 1

integers

m, m + 1, \dots, n

. We write

Z [m, m] = {m}

and

Z [n, m] = \emptyset

when

n > m

.

The set of real (resp. complex) numbers is denoted by

R

(resp.

C ≃ R \times R

), and

i = \sqrt{- 1}

is the imaginary unit. A quantity

0 \neq z \in C ∖ R

is said to be genuinely complex. An array (vector or matrix) is genuinely complex if at least one of its elements is genuinely complex.

A complex number z is written as

z = x + i y

with

x, y \in R

, or

z = | z | exp (i φ)

, where

| z | = \sqrt{x^{2} + y^{2}}

is the absolute value and

φ \in (- π, π]

is the angle of z. The complex conjugate of z is denoted as

\bar{z} = x - i y = | z | exp (- i φ)

.

The sign function

sign : R \to Z [- 1, 1]

of the scalar argument is defined as

sign (x) = - 1

,

sign (x) = 0

and

sign (x) = 1

for

x < 0

,

x = 0

and

x > 0

, resp. The sign function for real n-tuples

x = (x_{1}, x_{2}, \dots, x_{n})

is defined by the expression

sign (x) = \sum_{k = 1}^{n} 2^{1 - k} sign (x_{k}) .

The lexicographical order ≺ for n-tuples is defined as

x ≺ \tilde{x}

,

x = \tilde{x}

and

x ≻ \tilde{x}

if

sign (x - \tilde{x}) < 0

,

sign (x - \tilde{x}) = 0

and

sign (x - \tilde{x}) > 0

, resp. Equivalently,

x ≺ \tilde{x}

when either

x_{1} < {\tilde{x}}_{1}

, or there exists

m \in Z [2, n]

such that

x_{k} = {\tilde{x}}_{k}

for

k \in Z [1, m - 1]

and

x_{m} < {\tilde{x}}_{m}

.

We use this lexicographical order for complex numbers

z = x + i y

written as real pairs

(x, y)

and for pairs

(m, n)

of integers

m, n \in Z

. For example, for the fourth roots

{\pm 1, \pm i}

of 1, we have

- 1 ≺ - i ≺ i ≺ 1

.
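The weighted sign sum above can be checked numerically. The following is a minimal sketch in Python (chosen here only for illustration; the helper names `sign_tuple`, `precedes` and `as_pair` are hypothetical). It uses exact rational arithmetic so that the weights $2^{1 - k}$ introduce no rounding, and it confirms the chain for the fourth roots of 1.

```python
from fractions import Fraction

def sign(t):
    """Scalar sign function with values in {-1, 0, 1}."""
    return (t > 0) - (t < 0)

def sign_tuple(x):
    """Weighted sign sum: sum over k = 1..n of 2^(1-k) * sign(x_k)."""
    # enumerate starts at 0, so the weight of the first entry is 2^0 = 1
    return sum(Fraction(sign(xk), 2 ** k) for k, xk in enumerate(x))

def precedes(x, y):
    """True when x strictly precedes y in the lexicographical order."""
    diff = [a - b for a, b in zip(x, y)]
    return sign_tuple(diff) < 0

def as_pair(z):
    """Write the complex number z = x + i y as the real pair (x, y)."""
    z = complex(z)
    return (z.real, z.imag)

# The fourth roots of 1 in the order claimed in the text: -1, -i, i, 1
pairs = [as_pair(z) for z in (-1, -1j, 1j, 1)]
chain_ok = all(precedes(pairs[k], pairs[k + 1]) for k in range(3))
print(chain_ok)
```

The sign of the sum is decided by the first nonzero difference, because the remaining weights add up to less than the weight of that term.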

We denote by

C (m, n)

(resp.

R (m, n)

) the space of

m \times n

complex (resp. real) matrices

A = [A (i, j)]

with elements

A (i, j)

and we set

C (n) = C (n, n)

,

R (n) = R (n, n)

. The i-th row and the j-th column of A are denoted as

A (i, :) \in C (1, n)

and

A (:, j) \in C (m, 1)

, respectively.

The column m-vector

b \in C (m, 1)

with elements

b_{i} = b (i)

is written as

b = [b_{1}; b_{2}; \dots; b_{m}]

, while the row n-vector

c \in C (1, n)

with elements

c_{j} = c (j)

is denoted as

c = [c_{1}, c_{2}, \dots, c_{n}]

.

The Kronecker delta symbol is

d (i, j) = 1 - {sign}^{2} (i - j)

. The identity

n \times n

matrix is denoted as

I_{n} \in R (n)

and has elements

I_{n} (i, j) = d (i, j)

. The anti-identity matrix

P_{n} \in R (n)

has elements

P_{n} (i, j) = d (i, n + 1 - j)

.

The zero

m \times n

matrix is denoted as

O_{m, n} \in R (m, n)

with

O_{n} = O_{n, n}

, or simply as O. We denote by

L_{n} \in R (n)

the strictly lower triangular matrix with ones below its main diagonal and zeros otherwise, i.e.,

L_{n} (i, j) = 1

if

i > j

and

L_{n} (i, j) = 0

if

i \leq j

. The elementary matrix with element 1 in position

(p, q)

and zero otherwise is denoted as

E_{p, q} \in R (m, n)

, i.e.,

E_{p, q} (i, j) = d (i, p) d (j, q)

.

The absolute value of the matrix

A \in C (m, n)

is the matrix

| A | \in R (m, n)

with elements

| A | (i, j) = | A (i, j) |

. The transpose of the matrix

A \in C (m, n)

is denoted

A^{⊤} \in C (n, m)

and has elements

A^{⊤} (i, j) = A (j, i)

. The complex conjugate transpose of

A \in C (m, n)

is denoted

A^{H} \in C (n, m)

and has elements

A^{H} (i, j) = \bar{A (j, i)}

.

The inverse of the non-singular matrix A is denoted as

A^{- 1}

. The flipped matrix

A^{F} \in C (m, n)

of the matrix

A \in C (m, n)

is

A^{F} = P_{m} A P_{n}

and has elements

A^{F} (i, j) = A (m + 1 - i, n + 1 - j)

.
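As a quick illustration of the flipped matrix, the sketch below (Python with NumPy, used here as a stand-in for the MATLAB-style notation; the helper `anti_identity` is a hypothetical name) multiplies a rectangular matrix by anti-identity matrices and compares the result with the element-wise formula $A^{F} (i, j) = A (m + 1 - i, n + 1 - j)$. For a rectangular $m \times n$ matrix the left factor must have order m for the product to be defined, so the sketch uses anti-identity matrices of orders m and n.

```python
import numpy as np

def anti_identity(n):
    """Anti-identity matrix with ones on the anti-diagonal."""
    return np.fliplr(np.eye(n, dtype=int))

m, n = 2, 3
A = np.arange(1, m * n + 1).reshape(m, n)

# Flip by matrix multiplication: rows reversed on the left, columns on the right
A_flip = anti_identity(m) @ A @ anti_identity(n)

# The same flip written element-wise, A_flip(i, j) = A(m+1-i, n+1-j)
same_as_reversal = np.array_equal(A_flip, A[::-1, ::-1])
print(A_flip)
```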

For

A, B \in C (m, n)

, we denote by

A \circ B \in C (m, n)

the element-wise product of A and B, i.e.,

(A \circ B) (i, j) = A (i, j) B (i, j)

. The spectral and the Frobenius norms of the matrix

A \in C (m, n)

are denoted as

∥ A ∥

and

{∥ A ∥}_{F}

, respectively.

The spectrum

spect (A)

of the matrix

A \in C (n)

is the collection, or the multiset, of the eigenvalues

λ_{k} (A) \in C

of A,

k \in Z [1, n]

, counted according to their algebraic multiplicities. With certain abuse of notation, we write

spect (A) \subset C

in the general case, and

spect (A) \subset R

in the case when all eigenvalues of A are real.

The multiplicative group of non-singular matrices

U \in C (n)

is denoted as

G L (n)

, while the group of unitary matrices

U \in C (n)

such that

U^{H} U = I_{n}

is denoted by

U (n)

. The group of orthogonal matrices

U \in R (n)

such that

U^{⊤} U = I_{n}

is denoted as

O (n)

.

For

A \in C (n)

, we denote by

Low (A) = A \circ L_{n}

and

Diag (A) = A \circ I_{n}

the strictly lower triangular and the diagonal parts of A, respectively. If x is an n-vector with elements

x (i) \in C

then

diag (x) \in C (n)

is the matrix with elements

diag (x) (i, j) = x (i) d (i, j)

.

The set of upper triangular matrices

A = A - Low (A)

is denoted as

T (n) \subset C (n)

, while the set of diagonal matrices

A = Diag (A)

is denoted as

D (n) \subset T (n)

. For

n \geq 2

, the group of diagonal matrices of the form

diag (1, exp (i φ_{2}), \dots, exp (i φ_{n})),

where

φ_{k} \in R

, is denoted as

D^{*} (n) \subset U (n)

.

The set of

(n - 1)

-tuples of pairs

{(i_{1}, j_{1}), (i_{2}, j_{2}), \dots, (i_{n - 1}, j_{n - 1})}

of integers

i_{k} \in Z [1, n - 1]

,

j_{k} \in Z [2, n]

, where

i_{k} < j_{k}

, is denoted as

K (n)

.

We set

ν_{n} = n (n - 1) / 2

and

μ_{n} = card (K (n)) = \frac{ν_{n}!}{(n - 1)! (ν_{n} - n + 1)!}

In particular, we have

ν_{4} = 6

and

μ_{4} = 6! / 3! / 3! = 20

.

Unspecified matrix blocks are denoted by an asterisk (∗).

We use the following abbreviations: CSF—canonical Schur form; ConSF—condensed Schur form; DPS—diagonal preserving solution; GJF—generalized Jordan form; PSP—perturbed Schur problem; SP—Schur problem.

2. Condensed Schur Forms

Let a nonzero matrix

A \in C (n)

,

n \geq 2

, be given. A condensed form

U A V

of A relative to a set of invertible matrices

U, V \in C (n)

is a matrix with prescribed elements (usually zeros and/or ones) at certain positions. Most often, the condensed form is a triangular matrix with at least

ν_{n}

zero elements. According to the famous Schur result [1], there exists a factorization

A = U T U^{H}

of the matrix A, where

U \in U (n)

and

T \in T (n)

. Two things here deserve mentioning: the proof of the Schur result is elementary, yet the result appeared surprisingly late in the history of mathematics.

Definition 1.

The pair

(U, T) \in U (n) \times T (n), T = U^{H} A U,

(1)

is said to be a Schur decomposition, or an upper triangular unitary decomposition of the matrix A. The matrix T is referred to as a condensed Schur form, or ConSF of A. The columns of the unitary transformation matrix U form a Schur basis for the space

C (n, 1)

relative to the matrix A.

Thus, the ConSF T of A is not uniquely defined, and hence, it is not canonical in the sense of Definitions 7 and 8 below. If

A \in R (n)

and

spect (A) \subset R

, then the matrix U may be chosen as

U \in O (n)

, and we have

T = U^{⊤} A U \in R (n)

. If

A \in R (n)

has at least one pair of complex eigenvalues

α \pm i β

,

β \neq 0

, then a real quasi-triangular (block upper triangular) Schur form with diagonal blocks

λ \in R

and

[\begin{matrix} α & β \\ - β & α \end{matrix}] \in R (2)

may be constructed by orthogonal similarity transformations; see e.g., refs. [23,24].

Next, we define two sets of matrices depending on the matrix A that play an important role in our analysis. Denote

U (A) = {U \in U (n) : U^{H} A U \in T (n)} \subset U (n)

and

T (A) = {U^{H} A U : U \in U (A)} \subset T (n) .

Thus,

U (A) \subset U (n)

is the set of unitary matrices transforming the matrix A into ConSF, and

T (A)

is the set of ConSF of A. For matrices

A \in R (n)

with real spectra, we denote

O (A) = {U \in O (n) : U^{⊤} A U \in T (n)} \subset O (n) .

In general, the set

U (A)

is not a group and not even a groupoid, i.e.,

U_{1}, U_{2} \in U (A)

does not imply

U_{1} U_{2} \in U (A)

.

The most important property of the ConSF T of A is that its diagonal elements are the eigenvalues of A. Another application of ConSF is the evaluation of functions of matrices defined by a matrix power series or by Padé approximations [25].

Since $Low (T) = 0$ is the only condition imposed on T, the matrix T is a condensed form (rather than a canonical form) of A relative to the similarity action $U (n) \times C (n) \to C (n)$, defined by $(U, A) \mapsto U^{H} A U$, of the group $U (n)$ on the set $C (n)$.

Definition 2.

The problem of finding the ConSF (1) is referred to as the Schur problem (SP) for the matrix

A \in C (n)

. The general solution of the SP is the set

S (A) = \{(U, U^{H} A U) : U \in U (n), U^{H} A U \in T (n)\} \subset U (A) \times T (A)

of all Schur decompositions of A. A pair

(U, T) \in S (A)

is a particular solution of the SP for the matrix A.

Sometimes the matrices U and T in a particular solution

(U, T)

of SP for A are written as

U (A)

and

T (A)

to emphasize their dependence on A. This dependence, however, is not a functional one. Indeed, the matrix U is never unique. For example,

(U, T) \in S (A)

and

| V | = I_{n}

imply

(U V, V^{H} T V) \in S (A)

and, in particular,

(- U, T) \in S (A)

. With the exception of the case

A = λ I_{n}

,

λ \in C

, for which

T (A) = A

, the matrix

T (A)

is also not unique. For

A = λ I_{n}

, we have

S (A) = U (n) \times {A}

.

All upper triangular forms that are unitarily equivalent to a given matrix are unitarily similar to one another. In particular, the next proposition is a direct corollary of the definitions; see, e.g., ref. [24].

Proposition 1.

Let

Π_{1} = (U_{1}, T_{1})

and

Π_{2} = (U_{2}, T_{2})

be two solutions of the SP for A. Then,

T_{2} = U_{2}^{H} U_{1} T_{1} U_{1}^{H} U_{2}

.

Proof.

It suffices to observe that

A = U_{1} T_{1} U_{1}^{H} = U_{2} T_{2} U_{2}^{H}

. □

Definition 3.

The solutions

Π_{1}

and

Π_{2}

are said to be diagonally equal if

Diag (T_{1}) = Diag (T_{2})

, and diagonally different if

Diag (T_{1}) \neq Diag (T_{2})

.

The next proposition has generally been known since 1933 and is attributed to H. Röseler; see Theorem 2.3 of [24]. It gives sufficient and almost necessary conditions for diagonal equality of the solutions of SP. The formulation and proof of the results below are slightly different from the known ones.

Proposition 2.

The following assertions hold true.

1. If $V = U_{1}^{H} U_{2} \in D^{*} (n)$, then the solutions $Π_{1}$ and $Π_{2}$ are diagonally equal.
2. If the matrix A has pair-wise distinct eigenvalues and the solutions $Π_{1}$ and $Π_{2}$ are diagonally equal, then $V \in D^{*} (n)$.

Proof.

To prove Assertion 1 note that the condition

V \in D^{*} (n)

is equivalent to the existence of a matrix

D \in D^{*} (n)

such that

U_{2} = U_{1} D

. In this case

T_{1} (i, j) = D (i, i) \bar{D (j, j)} T_{2} (i, j), T_{1} (i, i) = T_{2} (i, i)

and

Diag (T_{1}) = Diag (T_{2})

.

To prove Assertion 2, we use the fact that

T_{1} V = V T_{2}

. Partition the matrices in this equality as

T_{1} = [\begin{matrix} λ & * \\ 0 & Λ \end{matrix}], T_{2} = [\begin{matrix} λ & * \\ 0 & * \end{matrix}], V = [\begin{matrix} μ & u \\ v & W \end{matrix}],

where

λ \in spect (A)

,

Λ \in C (n - 1)

,

μ \in C

,

W \in C (n - 1)

and ∗ is a matrix block of corresponding size. We have

T_{1} V = [\begin{matrix} * & * \\ Λ v & Λ W \end{matrix}], V T_{2} = [\begin{matrix} λ μ & * \\ λ v & * \end{matrix}]

and comparing the (2,1)-blocks of these matrices, we obtain

Λ v = λ v

. Since

λ \notin spect (Λ)

, we obtain

v = 0

. Hence

| μ | = 1

,

u = 0

and

V = diag (μ, W)

. Now, the proof is completed by induction. □
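Assertion 1 and Proposition 1 can be illustrated numerically. The sketch below (Python with NumPy, an assumption of this illustration; all variable names are hypothetical) builds a matrix A from a random unitary $U_{1}$ and a random upper triangular $T_{1}$, forms a second solution with $U_{2} = U_{1} D$, $D \in D^{*} (n)$, and checks that the two solutions are diagonally equal and related as in Proposition 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random unitary U1 (from a QR factorization) and a random upper triangular T1
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U1, _ = np.linalg.qr(Z)
T1 = np.triu(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
A = U1 @ T1 @ U1.conj().T

# A diagonal unitary D = diag(1, exp(i phi_2), ..., exp(i phi_n))
phases = np.concatenate(([0.0], rng.uniform(-np.pi, np.pi, n - 1)))
D = np.diag(np.exp(1j * phases))

# Second solution of the Schur problem for the same A
U2 = U1 @ D
T2 = U2.conj().T @ A @ U2
V = U1.conj().T @ U2

still_triangular = np.allclose(np.tril(T2, -1), 0)
diagonally_equal = np.allclose(np.diag(T1), np.diag(T2))
proposition_1 = np.allclose(T2, V.conj().T @ T1 @ V)
same_moduli = np.allclose(np.abs(T1), np.abs(T2))
print(still_triangular, diagonally_equal, proposition_1, same_moduli)
```

Only the phases of the off-diagonal elements change, in agreement with the relation between $T_{1} (i, j)$ and $T_{2} (i, j)$ used in the proof.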

The MATLAB® command [U,T] = schur(A) computes a particular solution

(U, T)

of SP for a matrix

A \in C (n)

. One of the aims of computing a ConSF T of a general matrix

A \in C (n)

is to determine the eigenvalues

λ_{k} (A)

of A as the diagonal elements

T (k, k)

of

T \in T (n)

. Another aim is to evaluate matrix functions

f (A) = f (U T U^{H}) = U f (T) U^{H},

where

f (A)

is a matrix power series [25] in A.
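The identity for matrix functions can be checked for a polynomial f, which is the simplest truncated power series. The following sketch (Python with NumPy, assumed here for illustration; `poly_of_matrix` and the chosen coefficients are hypothetical) constructs A from a known Schur pair and compares $f (A)$ with $U f (T) U^{H}$.

```python
import numpy as np

def poly_of_matrix(coeffs, M):
    """Horner evaluation of c0*I + c1*M + ... + cd*M^d for a square matrix M."""
    n = M.shape[0]
    result = np.zeros((n, n), dtype=complex)
    for c in reversed(coeffs):
        result = result @ M + c * np.eye(n)
    return result

rng = np.random.default_rng(1)
n = 4
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)                      # unitary factor
T = np.triu(rng.standard_normal((n, n)))    # upper triangular factor
A = U @ T @ U.conj().T

coeffs = [1.0, 2.0, 0.0, 0.5]               # f(x) = 1 + 2x + 0.5x^3
f_A = poly_of_matrix(coeffs, A)
f_T = poly_of_matrix(coeffs, T)

identity_holds = np.allclose(f_A, U @ f_T @ U.conj().T)
f_T_triangular = np.allclose(np.tril(f_T, -1), 0)
# The diagonal of f(T) consists of the scalar values f(T(k, k))
diag_ok = np.allclose(np.diag(f_T), np.polyval(coeffs[::-1], np.diag(T)))
print(identity_holds, f_T_triangular, diag_ok)
```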

A particular challenge is to define SP for matrices

A \in T (n)

, which are already in ConSF. Here, the problem is not to transform A into ConSF (it already is), but rather to find the matrices

U \in U (n)

that keep

A \in T (n)

in ConSF, i.e.,

U^{H} A U \in T (n)

.

For matrices

A \in T (n)

, the MATLAB® command [U,T] = schur(A) returns the solution

U = I_{n}

,

T = A

of the SP. Next, for

1 \leq m < n

and a matrix

A = [\begin{matrix} A_{1} & * \\ 0 & A_{2} \end{matrix}], A_{1} \notin T (n - m), A_{2} \in T (m)

the computed solution [U,T] is of the form

U = [\begin{matrix} U_{1} & 0 \\ 0 & I_{m} \end{matrix}], T = [\begin{matrix} T_{1} & * \\ 0 & A_{2} \end{matrix}], T_{1} = U_{1}^{H} A_{1} U_{1}

where

U_{1} \in U (n - m)

. In particular, for

A = I_{n}

, the command [U,T] = schur(eye(n)) returns

U = I_{n}

,

T = I_{n}

.

The MATLAB® (Maple) command [V,J] = jordan(A) for computing an invertible matrix V and a Jordan canonical form

J = V^{- 1} A V

of A, in case

A = I_{n}

, also gives

V = I_{n}

,

J = I_{n}

.

It is instructive to examine how existing computer codes act on matrices A close to

I_{n}

in double-precision floating-point arithmetic.

Example 1.

Let

n = 2

and

A (ε) = I_{2} + ε E_{2, 1}

, where ε is small. We have

T = I_{2} \pm ε E_{1, 2}

. For

ε = 2^{- 51}

, the solution of SP for

A (ε)

is computed incorrectly by the MATLAB® code schur as

U = I_{2}

,

T = I_{2}

. Choosing ε as the next machine number, which is

2^{- 51} + 2^{- 103}

, the code schur computes exactly one of the solutions of SP for

A (ε)

, namely

U = E_{2, 1} - E_{1, 2}

,

T = I_{2} - ε E_{1, 2}

.

The code jordan exactly computes the Jordan canonical form

(V, J)

of A as

V = E_{1, 2} + ε E_{2, 1}

,

J = I_{2} + E_{1, 2}

for values of

ε > 0

as small as

2^{- 1074}

. For

ε = 2^{- 1075}

, the matrix

A (ε)

is rounded to

I_{2}

, and the computed Jordan form is

V = I_{2}

,

J = I_{2}

.

Note that the code jordan works with variable precision arithmetic.
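The exact quantities in Example 1 can be reproduced in double precision. The sketch below (Python with NumPy, which is an assumption of this illustration and does not call the MATLAB codes schur or jordan) verifies that $U = E_{2, 1} - E_{1, 2}$ transforms $A (ε)$ into $I_{2} - ε E_{1, 2}$, that $2^{- 51} + 2^{- 103}$ is the machine number following $2^{- 51}$, and that half of the smallest positive double is rounded to zero.

```python
import numpy as np

eps = 2.0 ** -51 + 2.0 ** -103
A = np.array([[1.0, 0.0], [eps, 1.0]])       # A(eps) = I_2 + eps * E_{2,1}
U = np.array([[0.0, -1.0], [1.0, 0.0]])      # U = E_{2,1} - E_{1,2}

# All products involve only 0, 1, -1 and eps, so T is obtained without rounding
T = U.T @ A @ U
T_expected = np.array([[1.0, -eps], [0.0, 1.0]])
exact_schur_pair = np.array_equal(T, T_expected)

# eps is the floating-point neighbor of 2^-51 in the direction of 1
is_next_number = np.nextafter(2.0 ** -51, 1.0) == eps

# The smallest positive double is 2^-1074; half of it rounds to zero
tiny = np.nextafter(0.0, 1.0)
half_tiny_is_zero = (tiny / 2) == 0.0
print(exact_schur_pair, is_next_number, tiny, half_tiny_is_zero)
```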

The SP for

A \in T (n)

has infinitely many solutions

(U, T)

for both U and T with one exception for T. Namely, the matrix T is uniquely determined only when

A = λ I_{n}

, where

λ \in C

. In this case,

T = A

and

U \in U (n)

is arbitrary, i.e., the general solution of SP is

U (n) \times {A}

. It is also interesting to see whether the order of the diagonal elements of

A \in T (n)

is the same as the order of the diagonal elements of

T \in T (n)

. These considerations lead to the following definition.

Definition 4.

Let

A \in T (n)

. Then, the pair

(U, T)

, where

U \in U (n)

and

T = U^{H} A U \in T (n)

, is said to be a diagonal preserving solution (DPS) of the SP for A if

Diag (T) = Diag (A)

.

In other words, the pair

(U, T)

is a DPS of the SP when the upper triangular matrices A and T are diagonally equal; see Definition 3. A DPS thus defined is a particular solution of the SP. The general DPS of the SP for

A \in T (n)

is the set of all particular DPS solutions. In practice, we are interested in finding a particular DPS rather than determining the general DPS as a subset of

S (A)

.

If the matrix

A \in T (n)

has simple eigenvalues, the matrix U in any particular DPS

(U, T)

is diagonal, i.e.,

| U | = I_{n}

. To illustrate this, let

n = 2

and

A = diag (λ, 0) + a E_{1, 2}, T = diag (λ, 0) + t E_{1, 2}, | t | = | a |,

where

λ \neq 0

. We have

(A U) (2, 1) = 0

and

(U T) (2, 1) = λ U (2, 1)

, which yields

U (2, 1) = 0

and

| U | = I_{2}

.
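The $2 \times 2$ illustration can be tried numerically. In the sketch below (Python with NumPy; the values of λ, a and the angles are arbitrary choices made for this illustration), a diagonal unitary matrix preserves the diagonal and the modulus of the off-diagonal element, while a plane rotation, taken as an example of a non-diagonal unitary matrix, destroys the triangular form.

```python
import numpy as np

lam, a = 2.0, 1.0 + 1.0j
A = np.array([[lam, a], [0.0, 0.0]], dtype=complex)   # diag(lam, 0) + a*E_{1,2}

# A diagonal unitary matrix, |D| = I_2
D = np.diag(np.exp(1j * np.array([0.4, -1.1])))
T = D.conj().T @ A @ D

# A non-diagonal unitary matrix (plane rotation by an arbitrary angle)
theta = 0.3
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s], [s, c]], dtype=complex)
B = R.conj().T @ A @ R

dps_triangular = np.isclose(T[1, 0], 0)
dps_diagonal_kept = np.allclose(np.diag(T), np.diag(A))
dps_modulus_kept = np.isclose(abs(T[0, 1]), abs(a))
rotation_breaks_form = abs(B[1, 0]) > 1e-3
print(dps_triangular, dps_diagonal_kept, dps_modulus_kept, rotation_breaks_form)
```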

If the matrix

A \in T (n)

has multiple eigenvalues but its upper triangular part is generically nonzero, i.e.,

A (p, q) \neq 0

for

p < q

, then the matrix U in any particular DPS

(U, T)

is again diagonal. To illustrate this, let

n = 2

and

A = a E_{1, 2}

,

T = t E_{1, 2}

, where

| t | = | a | > 0

. We have

(A U) (1, 1) = a U (2, 1)

and

(U T) (1, 1) = 0

, which yields

U (2, 1) = 0

and

| U | = I_{2}

.

For all

A \in T (n)

, a simple choice for U in any particular DPS is

U = I_{n}

, which allows treating all cases in a unified manner. Choosing any other matrix U with

| U | = I_{n}

is of course also possible. In the real case, orthogonal diagonal matrices U have diagonal elements

\pm 1

.

The case when the matrix A has multiple eigenvalues and its upper triangular part is not generically nonzero, e.g.,

A = λ I_{n}

or

A = [λ I_{m}, *; 0, A_{2}]

, where

1 \leq m < n

and

A_{2} \in T (n - m)

, needs special consideration. Here, the matrix U in a particular DPS may not be diagonal. For example, in the most non-generic case

A = λ I_{n}

, any pair

(U, A)

with

U \in U (n)

is a particular solution of SP, and hence, the general solution of SP is

U (n) \times {A}

. Anyway, for definiteness, we choose the particular solution with

U = I_{n}

and

T = A

of SP each time when

A \in T (n)

.

Any particular choice of U in a DPS for matrices A with multiple eigenvalues and a non-generic upper triangular part may lead to jump effects, as described below.

Suppose that

A = A_{1} (ε)

,

ε \in (- ε_{0}, ε_{0})

is a matrix function of a scalar argument, where

ε_{0} > 0

is a small parameter. If the matrix

A_{1} (0) \in T (n)

has simple eigenvalues (or multiple eigenvalues but generic upper triangular part), then the matrix

U = U_{1} (ε)

in a DPS for

A_{1} (ε)

may be chosen so that

U_{1} (0) = I_{n}

. Here, the function

U = U_{1} (ε)

will be continuous in the interval

(- ε_{0}, ε_{0})

. If, however,

A_{1} (0)

has multiple eigenvalues and a non-generic upper triangular part, then the choice

U_{1} (0) = I_{n}

may lead to a jump of the function

U = U_{1} (ε)

at the point

ε = 0

.

We may use another function

U = U_{2} (ε)

with

U_{2} (0) \neq I_{n}

in the DPS of the SP for

A_{1} (0) \in T (n)

, such that

U_{2}

is continuous at

ε = 0

for the above choice of

A = A_{1} (ε)

. However, for another choice of

A = A_{2} (ε)

, the matrix function

U_{2}

will have a jump at

ε = 0

. Hence, this type of discontinuity is inherent in the statement of the SP when T is required to be upper triangular. The same is true if T is required to be lower triangular.

The next example illustrates these slightly complicated considerations.

Example 2.

A possible DPS

(U, I_{2})

of the SP for

A_{0} = I_{2}

is

U_{1} (0) = I_{2}

. Let

A_{1} (ε) = A_{0} + ε E_{2, 1}

, where

ε \in (- ε_{0}, ε_{0})

and

ε_{0} > 0

is fixed. Then, for

ε \neq 0

, we have

| U_{1} (ε) | = P_{2}

,

P_{2} = E_{1, 2} + E_{2, 1}

, and the function

U = U_{1} (ε)

has a jump at

ε = 0

with

∥ U_{1} (ε) - U_{1} (0) ∥ = \sqrt{2}

for

ε \neq 0

. If another DPS is chosen with

U_{2} (0) = P_{2}

, then the matrix function

U_{2}

satisfies

U_{2} (ε) = P_{2}

for

ε \neq 0

and is constant (and hence continuous) at

ε = 0

. If, however, we take

A_{2} (ε) = A_{0} + ε E_{1, 2}

, then

U_{2} (ε) = I_{2}

for

ε \neq 0

, and the function

U = U_{2} (ε)

is discontinuous with the same jump at the point

ε = 0

.
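The jump in Example 2 can be observed directly. The sketch below (Python with NumPy; the value of ε and the particular sign choice $U = E_{2, 1} - E_{1, 2}$ with $| U | = P_{2}$ are assumptions of this illustration) checks which transformations keep the perturbed matrices upper triangular and evaluates the spectral norm of the jump.

```python
import numpy as np

I2 = np.eye(2)
E12 = np.array([[0.0, 1.0], [0.0, 0.0]])
E21 = E12.T
P2 = E12 + E21                   # anti-identity matrix
U_rot = E21 - E12                # a unitary matrix with |U_rot| = P2

eps = 1e-8
A1 = I2 + eps * E21              # perturbation below the diagonal
A2 = I2 + eps * E12              # perturbation above the diagonal

def is_upper(T):
    """True when the (2,1) element of a 2 x 2 matrix vanishes."""
    return T[1, 0] == 0.0

identity_fails_for_A1 = not is_upper(I2.T @ A1 @ I2)
swap_works_for_A1 = is_upper(P2.T @ A1 @ P2)
rot_works_for_A1 = is_upper(U_rot.T @ A1 @ U_rot)
identity_works_for_A2 = is_upper(I2.T @ A2 @ I2)
swap_fails_for_A2 = not is_upper(P2.T @ A2 @ P2)

# Spectral norm of the jump between U_rot (eps != 0) and I_2 (eps = 0)
jump = np.linalg.norm(U_rot - I2, 2)
print(jump)
```

No single fixed choice of U at $ε = 0$ is compatible with both perturbation directions, which is the discontinuity described in the example.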

Example 2 shows that the discontinuity of

U = U (A)

is inherent to this statement of the SP in cases like

A = λ I_{2}

. This is due to the definition of T as an upper triangular matrix (or as a lower triangular matrix). If we allow the condensed form T to be any triangular unitary equivalent form of A, then this type of artificial discontinuity may disappear due to the definitions. This is the idea of the quasi-Schur condensed forms introduced later on. The latter forms generalize triangular Schur forms.

Without additional assumptions, the matrix

T \in T (n)

is only a condensed form rather than a canonical form of A relative to the similarity action of the group

U (n)

. The only (albeit most important) invariants for this action, which are revealed by the matrix T, are the eigenvalues

λ_{k} (A) = T (k, k)

of the matrix A.

The definition of complete invariants and canonical forms for the similarity action of

U (n)

on

C (n)

is mathematically interesting (see [24]), but it is much more complicated and is not considered in full detail here. In what follows, we consider only a partial formulation of Schur canonical forms for generic matrices A; see also [26,27,28]. From the point of view of applications, the condensed forms provide the same advantages as the canonical forms. Moreover, strict unitary canonical forms of non-generic matrices A are rarely, if ever, used in practice, since they involve complicated conditions and procedures (which are hard to check) and are more sensitive to perturbations in A than the condensed forms.

Let

U \in U (A)

and

| D | = I_{n}

. Then,

U D \in U (A)

as well. In particular,

c U \in U (A)

for

| c | = 1

and

- U \in U (A)

. This fact has an important implication. The diameter of the set

U (A)

, i.e., the maximum of

∥ U - V ∥

for

U, V \in U (A)

, is equal to 2 and is achieved for

U = - V \in U (A)

.

Given a matrix

A \in C (n)

, neither the ConSF

T \in T (n)

of A nor the transformation matrix

U \in U (n)

are unique in general. In fact, the matrix T is unique if and only if

A = λ I_{n}

,

λ \in C

, while U is never unique. For such A, we have

T = A

and

U \in U (n)

is an arbitrary unitary matrix, or, equivalently,

T (A) = {A}

and

U (A) = U (n)

.

If A has at least two different eigenvalues, then we have a set

T (A)

of ConSF T with different ordering of the eigenvalues of A on the diagonal of T. The ConSF T also differ in their strictly upper triangular parts.

Suppose that

spect (A)

consists of

m \leq n

pair-wise distinct elements

λ_{1}, λ_{2}, \dots, λ_{m}

with multiplicities

n_{1}, n_{2}, \dots, n_{m}

, where

n_{1} + n_{2} + \dots + n_{m} = n

. Then, there are

N = N (n_{1}, n_{2}, \dots, n_{m}) = \frac{n!}{n_{1}! n_{2}! \dots n_{m}!}

different orderings of the elements

T (k, k)

on the diagonal

Diag (T)

of the ConSF T, or N diagonally different solutions of the SP for A.

Here, one of the ConSF of A is the block matrix

T = [T_{k, l}]

with

T_{k, l} \in C (n_{k}, n_{l})

and

T_{k, k} \in T (n_{k})

, where

Diag (T_{k, k}) = λ_{k} I_{n_{k}}

. In the generic case,

m = n

, we have

N (1, 1, \dots, 1) = n!

diagonally different ConSF, while in the most non-generic case

m = 1

, we have

N (n) = 1

and all ConSF are diagonally equal.
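The count N is a multinomial coefficient and is easy to tabulate. A minimal sketch in Python follows (the function name `orderings` is hypothetical); it also cross-checks the formula against a brute-force enumeration of the distinct arrangements of a small multiset.

```python
from itertools import permutations
from math import factorial

def orderings(*multiplicities):
    """N(n_1, ..., n_m) = n! / (n_1! n_2! ... n_m!), where n = n_1 + ... + n_m."""
    count = factorial(sum(multiplicities))
    for nk in multiplicities:
        count //= factorial(nk)
    return count

def brute_force(*multiplicities):
    """Count distinct arrangements of a multiset with the given multiplicities."""
    labels = [k for k, nk in enumerate(multiplicities) for _ in range(nk)]
    return len(set(permutations(labels)))

print(orderings(1, 1, 1))   # generic case, m = n = 3
print(orderings(3))         # most non-generic case, m = 1
print(orderings(2, 1))      # one double and one simple eigenvalue
```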

3. Canonical Schur Forms for Generic Matrices

In this section, we summarize and reformulate some of the results concerning Schur canonical forms for the unitary similarity action of

U (n)

on the set

C (n)

. The canonical Schur form

T \in T (n)

of the matrix

A \in C (n)

is a ConSF with additional conditions imposed on its elements; see [24] and the references therein. We consider only generic matrices A with pair-wise distinct eigenvalues, for which the solution

(U, T)

of the Schur problem is continuous as a function of the matrix A. At the same time, the Schur basis U for condensed forms (and hence for canonical forms as well) of a matrix A with multiple eigenvalues may be discontinuous as a function of A.

Definition 5.

For a given matrix

A \in C (n)

, the set

Orb (A) = {U^{H} A U : U \in U (n)} \subset C (n)

is called equivalence class, or orbit, of A relative to the similarity action of the unitary group

U (n)

.

Obviously,

B \in Orb (A)

implies

A \in Orb (B)

and vice versa. Let $\mathcal{A} \subset C (n)$ and $\mathcal{C} \subset T (n)$ be certain sets.

Definition 6.

The matrices $A, B \in C (n)$ are said to be unitarily equivalent (denoted as $A \sim B$) if $B \in Orb (A)$.

Definition 7.

The function $γ : \mathcal{A} \to \mathcal{C}$ is said to be a canonical mapping for the similarity action of the group $U (n)$ on the set $\mathcal{A}$ when the equality $γ (A) = γ (B)$ holds if and only if $A \sim B$.

Hence, the canonical mapping $γ : \mathcal{A} \to \mathcal{C}$ is a complete invariant [29] for the similarity action of the group $U (n)$ on the set $\mathcal{A}$, but the converse is, of course, not true. Next, the canonical form of A is defined as the image $γ (A)$ of A under $γ$.

Definition 8.

The image

γ (A) \in T (n)

of the matrix A under the canonical mapping γ is said to be unitary canonical form, or Schur canonical form, of A.

Definition 9.

The subset $\mathcal{A}$ of $C (n)$ is said to be closed in the Zariski topology if it is the set of common zeros of a system of polynomials in the elements of $z \in C (n)$. The subset $\mathcal{A} \subset C (n)$ is said to be open in the Zariski topology if its complement $C (n) ∖ \mathcal{A} \subset C (n)$ is closed in this topology.

Definition 10.

A property $P$ of a matrix $A \in C (n)$ is said to be generic if it is fulfilled on a nonempty subset $\mathcal{A} \subset C (n)$ that is open in the Zariski topology.

Informally, the matrix A is said to be generic relative to a given property if this property is generic.

Proposition 3.

The following properties of a matrix

A \in C (n)

are generic.

1. The matrix A is totally different from any fixed matrix $A_{0} \in C (n)$, i.e., $A (k, l) \neq A_{0} (k, l)$ for $k, l \in Z [1, n]$; in particular, $A (k, l) \neq 0$ for any given pair $(k, l)$.
2. The matrix A is not normal, i.e., $A^{H} A \neq A A^{H}$; in particular, the matrix A is not unitary.
3. The singular values of the matrix A are positive and pair-wise different; in particular, $rank (A) = n$.
4. The eigenvalues $λ_{k}$ of the matrix A satisfy the inequalities $Re (λ_{k}) \neq Re (λ_{l})$ and $Im (λ_{k}) \neq Im (λ_{l})$ for $k \neq l$; in particular, $λ_{k} \neq λ_{l}$ for $k \neq l$ and the Jordan canonical form of A is diagonal.
5. Any ConSF T of the matrix A has nonzero and pair-wise different elements on and above its diagonal, i.e., $T (k, l) \neq 0$ and $T (i, j) \neq T (k, l)$ for $i \leq j$, $k \leq l$ and $(i, j) \neq (k, l)$.

4. Geometry of Schur Canonical Sets

Let

ω = (ω_{1}, ω_{2}, \dots, ω_{n}) : Z [1, n] \to Z [1, n]

be a permutation of the integers

1, 2, \dots, n

and recall that

Z [1, n - 1] = {1, 2, \dots, n - 1}

and

Z [2, n] = {2, 3, \dots, n}

. Set

Z_{n} = Z [1, n - 1] \times Z [2, n]

.

Below, we describe a possible set of canonical forms for the similarity action of the group

U (n)

on the subset of

C (n)

consisting of matrices with simple eigenvalues. Let

K (n) \subset Z_{n}^{n - 1}

be the set of

(n - 1)

-tuples

{(i_{1}, j_{1}), (i_{2}, j_{2}), \dots, (i_{n - 1}, j_{n - 1})}

of integer pairs

p_{k} = (i_{k}, j_{k})

,

k \in Z [1, n - 1]

, where

i_{k} < j_{k}

. There are

μ_{n}

such

(n - 1)

-tuples; see Table 1. Later on, we shall define three important types of such sets.

Table 1. Number of generic canonical Schur forms.

Definition 11.

The conjugate pair of the pair

p = (i, j) \in Z_{n}

is

p^{τ} = {(i, j)}^{τ} = (n + 1 - j, n + 1 - i) \in Z_{n}

The pair p is self-conjugate if

p^{τ} = p

.

Obviously, the pair

(i, j)

is self-conjugate if and only if

i + j = n + 1

.

Definition 12.

The conjugate

(n - 1)

-tuple of the

(n - 1)

-tuple

θ = (p_{1}, p_{2}, \dots, p_{n - 1})

, where

p_{k} = (i_{k}, j_{k})

,

k \in Z [1, n - 1]

, is

θ^{τ} = (p_{1}^{τ}, p_{2}^{τ}, \dots, p_{n - 1}^{τ}) .

The

(n - 1)

-tuple θ is self-conjugate if

θ^{τ} = θ

.

The conjugation for pairs p and

(n - 1)

-tuples

θ

is an involution, i.e.,

{({(i, j)}^{τ})}^{τ} = (i, j)

and

{(θ^{τ})}^{τ} = θ

. It corresponds to reflection relative to the anti-diagonal

(1, n), (2, n - 1), \dots, (n, 1)

of

n \times n

arrays.
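The conjugation of Definitions 11 and 12 is simple to experiment with. In the sketch below (plain Python; the function names are hypothetical, and an $(n - 1)$-tuple is treated as a collection of pairs listed in sorted order, which is an assumption of this illustration), we check the involution property and the characterization $i + j = n + 1$ of self-conjugate pairs for $n = 4$.

```python
def conj_pair(p, n):
    """Conjugate pair: (i, j) -> (n + 1 - j, n + 1 - i)."""
    i, j = p
    return (n + 1 - j, n + 1 - i)

def conj_tuple(theta, n):
    """Conjugate every pair of theta and list the result in sorted order."""
    return tuple(sorted(conj_pair(p, n) for p in theta))

n = 4
pairs = [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]

involution_ok = all(conj_pair(conj_pair(p, n), n) == p for p in pairs)
stays_above_diagonal = all(conj_pair(p, n) in pairs for p in pairs)
self_conjugate = [p for p in pairs if conj_pair(p, n) == p]

superdiagonal = tuple((k, k + 1) for k in range(1, n))
first_row = tuple((1, j) for j in range(2, n + 1))
print(self_conjugate, conj_tuple(superdiagonal, n), conj_tuple(first_row, n))
```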

Definition 13.

The set

K (n)

has the following important subsets.

1. The set $K_{1} (n) \subset K (n)$ is of type 1 if its elements are of the form

${(i_{1}, 2), (i_{2}, 3), \dots, (i_{n - 1}, n)}$
2. The set $K_{2} (n) \subset K (n)$ is of type 2 if its elements are of the form

${(1, j_{1}), (2, j_{2}), \dots, (n - 1, j_{n - 1})}$
3. The set $K_{3} (n) = K (n) ∖ (K_{1} (n) \cup K_{2} (n)) \subset K (n)$ is of type 3; its elements are neither of type 1 nor of type 2.

Note that the elements of the set

K_{1} (n)

are conjugate to the elements of the set

K_{2} (n)

.

Proposition 4.

The intersection

K_{1} (n) \cap K_{2} (n)

has a single element

θ^{@} = {(1, 2), (2, 3), \dots, (n - 1, n)}

which is a self-conjugate

(n - 1)

-tuple.

Definition 14.

The elements of the set

K_{1} (n) \cup K_{2} (n)

are said to be proper. The elements of the set

K_{3} (n)

are said to be improper.

There are

(n - 1)!

elements in each of the sets

K_{1} (n)

and

K_{2} (n)

and one joint element of

K_{1} (n)

and

K_{2} (n)

. Thus, we have

\begin{matrix} card (K_{1} (n) \cup K_{2} (n)) = (n - 1)! + (n - 1)! - 1 = 2 (n - 1)! - 1 \\ card (K_{3} (n)) = μ_{n} - 2 (n - 1)! + 1 \end{matrix}
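These cardinalities can be confirmed by direct enumeration for small n. The sketch below (plain Python; the function name and the reading of the two proper families as "one pair in each of the rows $1, \dots, n - 1$" and "one pair in each of the columns $2, \dots, n$" are assumptions of this illustration) lists all $(n - 1)$-element collections of pairs $(i, j)$ with $i < j$ and counts the proper and improper ones.

```python
from itertools import combinations
from math import comb, factorial

def index_families(n):
    """Return K(n) and the two proper families as sets of frozensets of pairs."""
    pairs = [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]
    K = {frozenset(c) for c in combinations(pairs, n - 1)}
    rows, cols = list(range(1, n)), list(range(2, n + 1))
    per_row = {t for t in K if sorted(i for i, _ in t) == rows}
    per_col = {t for t in K if sorted(j for _, j in t) == cols}
    return K, per_row, per_col

counts = {}
for n in (2, 3, 4, 5):
    K, per_row, per_col = index_families(n)
    proper = per_row | per_col
    superdiagonal = frozenset((k, k + 1) for k in range(1, n))
    # The two families share exactly one element, the superdiagonal collection
    assert per_row & per_col == {superdiagonal}
    assert len(K) == comb(n * (n - 1) // 2, n - 1)
    assert len(per_row) == len(per_col) == factorial(n - 1)
    counts[n] = (len(K), len(proper), len(K) - len(proper))
    print(n, counts[n])
```

Each entry of `counts` holds the total number of collections, the number of proper ones and the number of improper ones, in that order.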

Example 3.

For

n = 2

, there is

μ_{2} = 1

pair of indexes

(1, 2)

and it is proper. For

n = 3

, there are

μ_{3} = 3

sets of pairs of indexes

{(1, 2), (1, 3)}, {(1, 2), (2, 3)}, {(1, 3), (2, 3)}

and they are all proper. For

n = 4

, there are

μ_{4} = 20

triples of pairs of indexes of which 11 are proper, namely

\begin{matrix} {(1, 2), (2, 3), (3, 4)}, {(1, 2), (1, 3), (1, 4)}, {(1, 4), (2, 4), (3, 4)} \\ {(1, 2), (1, 3), (2, 4)}, {(1, 3), (2, 4), (3, 4)}, {(1, 2), (1, 3), (3, 4)} \\ {(1, 3), (2, 3), (3, 4)}, {(1, 2), (2, 3), (2, 4)}, {(1, 4), (2, 3), (3, 4)} \\ {(1, 2), (2, 4), (3, 4)}, {(1, 2), (2, 3), (1, 4)} \end{matrix}

and 9 are improper, namely

\begin{matrix} {(1, 2), (1, 3), (2, 3)}, {(1, 2), (1, 4), (2, 4)}, {(1, 2), (1, 4), (3, 4)} \\ {(1, 3), (1, 4), (2, 3)}, {(1, 3), (1, 4), (2, 4)}, {(1, 3), (1, 4), (3, 4)} \\ {(1, 3), (2, 3), (2, 4)}, {(1, 4), (2, 3), (2, 4)}, {(2, 3), (2, 4), (3, 4)} \end{matrix}
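The counts in Example 3 can be reproduced by brute force. The following Python sketch is only an illustration, under the assumption that a proper $(n - 1)$-tuple contains either exactly one pair in each of the rows $1, \dots, n - 1$ or exactly one pair in each of the columns $2, \dots, n$; the function name `classify_tuples` is hypothetical.

```python
from itertools import combinations


def classify_tuples(n):
    """Split all (n-1)-subsets of pairs (i, j), 1 <= i < j <= n, into proper and improper ones."""
    pairs = [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]
    rows = list(range(1, n))      # rows 1, ..., n-1
    cols = list(range(2, n + 1))  # columns 2, ..., n
    proper, improper = [], []
    for theta in combinations(pairs, n - 1):
        # Assumption: "one pair per row" or "one pair per column" means proper.
        one_per_row = sorted(i for i, _ in theta) == rows
        one_per_col = sorted(j for _, j in theta) == cols
        (proper if one_per_row or one_per_col else improper).append(theta)
    return proper, improper


proper, improper = classify_tuples(4)
print(len(proper), len(improper))  # 11 9
```

For $n = 4$, the sketch returns 11 proper and 9 improper triples, in agreement with $2 (n - 1)! - 1 = 11$ and $μ_{4} - 11 = 9$.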

Proposition 5.

The minimal and maximal elements relative to the order relation ≺ on the set

K_{2} (n)

are

θ_{1} = {(1, 2), (1, 3), \dots, (1, n)}

and

θ_{(n - 1)!} = θ^{@} = {(1, 2), (2, 3), \dots, (n - 1, n)},

respectively. The minimal and maximal elements of the set

K_{1} (n)

are

θ^{@}

and

θ_{2 (n - 1)! - 1} = {(1, n), (2, n), \dots, (n - 1, n)}

respectively.

Now, we are in a position to define possible sets of canonical Schur forms (CSF)

S \in T (n)

for generic matrices

A \in C (n)

. There are

n! (2 (n - 1)! - 1)

such sets. The multiplier

n!

comes from the different orders of the (simple) eigenvalues of A on the diagonal of S. The multiplier

2 (n - 1)! - 1

corresponds to different choices of proper

(n - 1)

-tuples

{(i_{1}, j_{1}), (i_{2}, j_{2}), \dots, (i_{n - 1}, j_{n - 1})}

such that the elements

S (i_{k}, j_{k})

,

k \in Z [1, n - 1]

, of S are positive.

If the eigenvalues of S are ordered as

λ_{1} ≺ λ_{2} ≺ \dots ≺ λ_{n}

, then there remain

2 (n - 1)! - 1

sets of CSF. Note that any fixed order of the (simple) eigenvalues of A on the diagonal of S is preserved only by unitary similarity transformations with diagonal matrices U, i.e.,

U (j, k) = d (j, k) exp (i φ_{k})

.

If, in particular, we choose a given

(n - 1)

-tuple, say

{(1, 2), (2, 3), \dots, (n - 1, n)} \in K_{1} (n) \cap K_{2} (n)

then the set of CSF is uniquely fixed. In this case, the CSF have the form

[\begin{matrix} λ_{1} & \oplus & * & \dots & * \\ 0 & λ_{2} & \oplus & \dots & * \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & λ_{n - 1} & \oplus \\ 0 & 0 & \dots & 0 & λ_{n} \end{matrix}]

where ⊕ denotes a positive element. Two other CSF are

[\begin{matrix} λ_{1} & \oplus & \oplus & \dots & \oplus \\ 0 & λ_{2} & * & \dots & * \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & λ_{n - 1} & * \\ 0 & 0 & \dots & 0 & λ_{n} \end{matrix}], [\begin{matrix} λ_{1} & * & * & \dots & \oplus \\ 0 & λ_{2} & * & \dots & \oplus \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & λ_{n - 1} & \oplus \\ 0 & 0 & \dots & 0 & λ_{n} \end{matrix}]

Note that there is a similar problem with canonical Jordan forms of matrices

A \in C (n)

relative to general similarity transformations. Usually, it is assumed that different orders of the Jordan blocks do not produce different Jordan forms. Formally, this means that the canonical Jordan form of A is not a single block-diagonal matrix

J \in C (n)

but a class of block-diagonal matrices, which are permutationally equivalent to J.

Definition 15.

A set

C \subset T (n)

of CSF

T \in T (n)

for generic matrices

A \in C (n)

and a fixed

(n - 1)

-tuple

{(i_{1}, j_{1}), (i_{2}, j_{2}), \dots, (i_{n - 1}, j_{n - 1})} \in K_{1} (n) \cup K_{2} (n)

is characterized as follows.

1.: The n diagonal elements of the matrix T are ordered as

$T (1, 1) ≺ T (2, 2) ≺ \dots ≺ T (n, n) .$
2.: The $n - 1$ elements $T (i_{1}, j_{1}), T (i_{2}, j_{2}), \dots, T (i_{n - 1}, j_{n - 1})$ of T over the diagonal are real and positive.

Of course, we may choose the elements

T (i_{k}, j_{k})

to be real and negative as well, or to have angles equal to a fixed value

φ_{0} \in (- π, π]

, etc.

A matrix

A \in C (n)

with eigenvalues

λ_{1} ≺ λ_{2} ≺ \dots ≺ λ_{n}

may be transformed into CSF

C \in T (n)

by the next three steps.

1.: The matrix A is transformed into any ConSF $T_{1} = U_{1}^{H} A U_{1} \in T (n)$ by a matrix $U_{1} \in U (n)$. Numerically, this is performed by the QR algorithm [3]. For this purpose, the code schur from MATLAB^® may be used [22].
2.: A ConSF $T_{2} = U_{2}^{H} T_{1} U_{2} \in T (n)$ is constructed so that $T_{2} (k, k) = λ_{k}$, $k \in Z [1, n]$. This may be performed by complex plane rotations, which interchange the positions of two diagonal elements $T_{1} (i, i)$ and $T_{1} (j, j)$ of $T_{1}$ such that $i < j$ but $T_{1} (j, j) ≺ T_{1} (i, i)$; see, e.g., ref. [3].
3.: A diagonal matrix $U_{3} \in U (n)$ with elements $U_{3} (1, 1) = 1$, $U_{3} (k, k) = exp (i φ_{k})$, $φ_{k} \in R$, $k \in Z [2, n]$, is chosen so that the matrix $T = U_{3}^{H} T_{2} U_{3}$ has positive elements in positions $(i_{k}, j_{k})$.

In connection with step 3, we note that unitary similarity transformations that introduce positive elements in certain positions of the transformed matrices are considered in [30].

We recall that to introduce CSF in the set

C (n)

relative to the similarity action of

U (n)

, we use the lexicographical order ≺ on

C ≃ R \times R

. For

z_{k} = x_{k} + i y_{k} \in C

, where

x_{k}, y_{k} \in R

,

k = 1, 2

, we write

z_{1} ≺ z_{2}

if either

x_{1} < x_{2}

, or

x_{1} = x_{2}

and

y_{1} < y_{2}

.
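The lexicographical order ≺ recalled above can be sketched in a few lines of Python. This is a minimal illustration only; the names `lex_key` and `precedes` and the sample eigenvalues are hypothetical.

```python
def lex_key(z):
    """Sort key for the lexicographical order on C ~ R x R: real part first, then imaginary part."""
    z = complex(z)
    return (z.real, z.imag)


def precedes(z1, z2):
    """True if z1 strictly precedes z2 in the lexicographical order."""
    return lex_key(z1) < lex_key(z2)


eigenvalues = [2 + 1j, 1 - 3j, 2 - 1j, 1 + 0j]  # arbitrary sample data
ordered = sorted(eigenvalues, key=lex_key)
print(ordered)
```

Sorting with this key places the eigenvalues in the order required on the diagonal of a CSF.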

There are

μ_{n} = \binom{ν_{n}}{n - 1}, ν_{n} = n (n - 1) / 2,

sets of generic canonical forms

C_{k} \subset T (n)

,

k \in Z [1, μ_{n}]

, for

A \in C (n)

. The values of

μ_{n}

for small values of n are given in Table 1.

There are

ν_{n}

different pairs

p_{k} = (i_{k}, j_{k})

,

1 \leq i_{k} < j_{k} \leq n

. They are ordered lexicographically according to the rule

(i_{1}, j_{1}) ≺ (i_{2}, j_{2})

if either

i_{1} < i_{2}

, or

i_{1} = i_{2}

and

j_{1} < j_{2}

. We may order the pairs

p_{k}

as

p_{1} ≺ p_{2} ≺ \dots ≺ p_{ν_{n}}

, where

k = k_{n} (i, j) = j + n (i - 1) - \frac{i (i + 1)}{2} .

(2)

Thus, we have the chain of inequalities

\begin{matrix} (1, 2) ≺ (1, 3) ≺ \dots ≺ (1, n) ≺ (2, 3) ≺ \dots ≺ (2, n) ≺ \dots \\ ≺ (n - 2, n - 1) ≺ (n - 2, n) ≺ (n - 1, n) \end{matrix}

For any

p = (i, j)

, denote the conjugate pair

p^{τ} = (n + 1 - j, n + 1 - i)

, symmetric to p relative to the anti-diagonal of elements in positions

(i, n + 1 - i)

,

i \in Z [1, n]

.

It follows from (2) that for n fixed and any

k \in Z [1, ν_{n}]

there exists a unique integer

i = i_{n} (k)

such that

a_{n} (i) \leq k \leq b_{n} (i)

, where

a_{n} (i) = 1 + (i - 1) (2 n - i) / 2

and

b_{n} (i) = i (2 n - i - 1) / 2

.

The integer

i_{n} (k)

may be defined from

a_{n} (i_{n} (k)) = max {a_{n} (i) : a_{n} (i) \leq k}

or

b_{n} (i_{n} (k)) = min {b_{n} (i) : b_{n} (i) \geq k} .

Finally, set

j_{n} (k) = k - n (i_{n} (k) - 1) + \frac{i_{n} (k) (i_{n} (k) + 1)}{2} .

Thus, we have defined a bijection

(i, j) \mapsto k = k_{n} (i, j), k \mapsto (i, j) = (i_{n} (k), j_{n} (k)),

between the ordered sets of integers

Z [1, ν_{n}]

and integer pairs

(i, j)

, where

1 \leq i < j \leq n

.
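The bijection between pairs $(i, j)$ and indexes $k$ can be checked with a short Python sketch. It implements formula (2) and the bounds $a_{n} (i)$, $b_{n} (i)$ literally; the function names `pair_to_index` and `index_to_pair` are hypothetical.

```python
def pair_to_index(n, i, j):
    """k_n(i, j) = j + n (i - 1) - i (i + 1) / 2, as in formula (2)."""
    return j + n * (i - 1) - i * (i + 1) // 2


def index_to_pair(n, k):
    """Inverse map k -> (i_n(k), j_n(k)), found from a_n(i) <= k <= b_n(i)."""
    for i in range(1, n):
        a = 1 + (i - 1) * (2 * n - i) // 2
        b = i * (2 * n - i - 1) // 2
        if a <= k <= b:
            j = k - n * (i - 1) + i * (i + 1) // 2
            return (i, j)
    raise ValueError("k must lie between 1 and n (n - 1) / 2")


n = 4
pairs = [(i, j) for i in range(1, n) for j in range(i + 1, n + 1)]
print([pair_to_index(n, i, j) for i, j in pairs])  # [1, 2, 3, 4, 5, 6]
```

The integer divisions are exact because each of the products $i (i + 1)$, $(i - 1) (2 n - i)$ and $i (2 n - i - 1)$ is even.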

Proposition 6.

The triple of pairs of indexes

((i, k), (i, j), (l, j))

, where

i < k

,

i < j

,

l < j

and

k < j

,

i < l

, is said to be improper. The triple of pairs of indexes

((k, j), (i, j), (i, l))

, where

k < j

,

i < j

,

i < l

and

k < i

,

j < l

, is said to be improper.

Theorem 1.

Each proper integer

(n - 1)

-tuple

θ \in K_{1} (n) \cup K_{2} (n)

defines a class

C (θ) \subset T (n)

of canonical forms for the unitary similarity action of the group

U (n)

on generic matrices

A \in C (n)

. These forms are upper triangular matrices S with

S (k, k) ≺ S (k + 1, k + 1)

for

k \in Z [1, n - 1]

and

S (i, j) \in R

,

S (i, j) > 0

for

(i, j) \in θ

.

If the matrix

A \in C (n)

with eigenvalues

λ_{1} ≺ λ_{2} ≺ \dots ≺ λ_{n}

is already transformed into ConSF, i.e.,

A \in T (n)

, it is then easily put into CSF as follows. First, a matrix

U \in U (n)

is chosen so that

Diag (S) = diag (λ_{1}, λ_{2}, \dots, λ_{n}), S = U^{H} A U .

Then, a diagonal unitary matrix D with

D (1, 1) = 1

is found so that

T = D^{H} S D \in C (θ)

. Denoting

S (i, j) = | S (i, j) | exp (i α (i, j))

and

D (i, i) = exp (i φ (i))

,

φ (1) = 0

, where

α (i, j), φ (i) \in (- π, π]

, the conditions

T (i, j) > 0

give the system of

n - 1

linear equations

φ (i) - φ (j) = α (i, j), (i, j) \in θ,

(3)

for

φ (2), φ (3), \dots, φ (n)

. If it happens that

φ (k) \notin (- π, π]

for some k then

φ (k)

is replaced by

\tilde{φ} (k) = φ (k) mod (2 π) \in (- π, π] .

(4)

Three special sets of Schur canonical forms for generic matrices

A \in C (n)

deserve attention. For these sets, the system (3) for

φ (i)

,

i \in Z [2, n]

, is solved explicitly as follows.

The first set corresponds to pairs of indexes $(1, j)$, $j \in Z [2, n]$, and here $φ (i) = - α (1, i)$.
The second set corresponds to pairs of indexes $(i, n)$, $i \in Z [1, n - 1]$, and here $φ (n) = - α (1, n)$ and $φ (i) = α (i, n) - α (1, n)$ for $i \in Z [2, n - 1]$.
The third set corresponds to pairs of indexes $(i, i + 1)$, $i \in Z [1, n - 1]$, and here $φ (i) = - α (1, 2) - α (2, 3) - \dots - α (i - 1, i)$.

In all these cases, the fulfilment of the convention (4) is presupposed.
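For the third set (pairs $(i, i + 1)$), the solution of system (3) can be sketched in Python as follows. This is a minimal illustration, assuming that S is upper triangular with nonzero superdiagonal elements; the function name `normalize_superdiagonal` and the sample matrix are hypothetical, and the angles are not reduced according to (4), which does not change the matrix D.

```python
import cmath


def normalize_superdiagonal(S):
    """Return (T, phi) with T = D^H S D, D = diag(exp(i*phi)), phi[0] = 0, and T[i][i+1] > 0."""
    n = len(S)
    phi = [0.0] * n
    for i in range(1, n):
        # System (3) for the pair (i, i+1): phi(i) - phi(i+1) = alpha(i, i+1).
        phi[i] = phi[i - 1] - cmath.phase(S[i - 1][i])
    d = [cmath.exp(1j * p) for p in phi]
    T = [[d[r].conjugate() * S[r][c] * d[c] for c in range(n)] for r in range(n)]
    return T, phi


S = [[1, 1j, 2 - 1j],
     [0, 2, -3],
     [0, 0, 3]]  # arbitrary sample data
T, phi = normalize_superdiagonal(S)
print(T[0][1], T[1][2])
```

The diagonal of S and the moduli of all its elements are preserved; only the arguments of the off-diagonal elements change.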

The restrictions assumed in this section, and in particular, the condition that the eigenvalues of A are simple, seem serious, but in fact, their violation can make the perturbation analysis of this statement of SP meaningless. If, for example, A has two or more equal eigenvalues, then the Schur basis

U (A + δ A)

of the perturbed SP may be discontinuous as a function of the perturbation

δ A

in A; see, e.g., ref. [6].

5. Real Schur Canonical Forms

The considerations above are valid for real or genuinely complex matrices with spectra that may in turn be real or genuinely complex. In particular, we have the following four possibilities.

1.: The matrix A is real and has a real spectrum.
2.: The matrix A is real and has a genuinely complex spectrum (i.e., there is at least one complex conjugate pair $α \pm i β$ of eigenvalues, where $0 < β \in R$).
3.: The matrix A is genuinely complex and has a real spectrum.
4.: The matrix A is genuinely complex and has a genuinely complex spectrum.

When

A \in R (n)

(cases 1 and 2), we may use orthogonal transformation matrices

U \in O (n)

instead of unitary ones to obtain the real Schur canonical form and the real Schur condensed forms of A. In case 1, the transformation matrix is taken as

U \in O (n)

and both the CSF and the ConSF of A are real upper triangular matrices T with the eigenvalues of A on their main diagonals.

Case 2 is slightly more subtle. Here, the transformation matrix U may be chosen as orthogonal [23,24], while the canonical form and the condensed forms of A are upper block-triangular matrices with

1 \times 1

or

2 \times 2

blocks (

λ_{k} \in R

or

Λ_{k} \in R (2)

) on the main diagonal. In this case, there is at least one

2 \times 2

block

Λ_{k} = [\begin{matrix} α_{k} & β_{k} \\ - β_{k} & α_{k} \end{matrix}] \in R (2)

corresponding to the eigenvalues

α_{k} \pm i β_{k}

of A, where

α_{k}, β_{k} \in R

and

β_{k} > 0

.
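That a block of this form carries the pair $α_{k} \pm i β_{k}$ can be confirmed with a small Python sketch, which computes the eigenvalues of a $2 \times 2$ matrix from its trace and determinant. The function name `eig_2x2` and the sample values of α and β are hypothetical.

```python
import cmath


def eig_2x2(M):
    """Eigenvalues of a 2x2 matrix as the roots of x^2 - tr(M) x + det(M)."""
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2


alpha, beta = 1.5, 2.0  # arbitrary sample data with beta > 0
low, high = eig_2x2([[alpha, beta], [-beta, alpha]])
print(low, high)
```

Here the discriminant equals $- 4 β_{k}^{2} < 0$, so the two roots form a genuinely complex conjugate pair.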

Let

n > 2

and suppose that the spectrum

spect (A)

contains m real elements

λ_{1}, λ_{2}, \dots, λ_{m}

and

n - m

genuinely complex elements

α_{k} + i β_{k}, α_{k} - i β_{k}

,

k = 1, 2, \dots, (n - m) / 2

, where the number

n - m

is even. Set

q = (n - m) / 2

. Then, the orthogonal canonical form of A has the structure

S = U^{⊤} A U = [\begin{matrix} S_{1, 1} & S_{1, 2} \\ O_{n - m, m} & S_{2, 2} \end{matrix}]

Here

S_{1, 1} \in R (m)

,

S_{1, 2} \in R (m, n - m)

,

S_{2, 2} \in R (n - m)

,

S_{1, 1} = [\begin{matrix} λ_{1} & s_{1, 2} & \dots & s_{1, m} \\ 0 & λ_{2} & \dots & s_{2, m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0 & 0 & \dots & λ_{m} \end{matrix}], S_{2, 2} = [\begin{matrix} Λ_{1} & S_{1, 2} & \dots & S_{1, q} \\ O_{2} & Λ_{2} & \dots & S_{2, q} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ O_{2} & O_{2} & \dots & Λ_{q} \end{matrix}]

and

s_{i, j} \in R

,

S_{i, j} \in R (2)

. The diagonal blocks are ordered as

λ_{1} < λ_{2} < \dots < λ_{m}

and

α_{t} + i β_{t} ≺ α_{t + 1} + i β_{t + 1}

,

t = 1, 2, \dots, q - 1

.

6. Perturbations of the Schur Problem

Let

(U, T) \in U (n) \times T (n)

be a particular DPS of SP for a matrix

A \in C (n)

, i.e.,

T = U^{H} A U

and

Diag (T) = Diag (A)

. If A is already in ConSF, i.e.,

A \in T (n)

, there is nothing to transform. Here, we may choose

U = I_{n}

for definiteness among all unitary matrices U that keep the diagonal of A under similarity transformations

(U, A) \to U^{H} A U

. In addition, if the matrix

A \in T (n)

has a simple spectrum, and/or has multiple eigenvalues but its upper triangular form is generically nonzero, then any unitary matrix U in the DPS is necessarily diagonal, i.e.,

| U | = I_{n}

.

The choice of U as the simplest diagonal matrix

I_{n}

for

A \in T (n)

is justified by a number of additional arguments as follows.

It works for arbitrary matrices A with simple eigenvalues as well as with multiple eigenvalues.
When using ConSF for the evaluation of matrix functions, the computational algorithm has to work with a value for U, which is applicable to all input matrices A. For this reason, the QR algorithm [3] for computing T first checks whether $A \in T (n)$ . If yes, it assumes $U = I_{n}$ and $T = A$ .
Algorithms for computing a Jordan form $J = V^{- 1} A V$ of A act similarly. Indeed, for A already in Jordan form, computational algorithms for finding $(V, J)$ assume $V = I_{n}$ and $J = A$ .

As shown later on, the choice

U (0) = I_{n}

in the solution of SP for

A_{0} = λ I_{n}

may lead to discontinuity of the function

U = U (ε)

at the point

ε = 0

, in particular solutions

(U (ε), T (ε))

of SP for a perturbed matrix

A_{0} + ε E

with a given E. If we choose

U (0) \neq I_{n}

, this discontinuity disappears. However, for another perturbed matrix

A_{0} + ε F

, the discontinuity occurs as well. This inevitable discontinuity is due to the fact that ConSF T is upper triangular and for a lower triangular perturbation

ε E

, the perturbed transformation matrix

U (ε)

is a flip matrix such as

P_{n}

.

Let

δ A \in C (n)

be a perturbation in A. Usually (but not always), we suppose that the matrix

δ A

is small relative to A, e.g.,

∥ δ A ∥ \leq ρ ∥ A ∥

, where

ρ = 2^{- 53} ≃ 1.1102 \times 10^{- 16}

is the rounding unit of FPA used in the computations [31].

We often assume that the perturbation

δ A

is a 1-parameter family

δ A (ε) = ε E

, where

ε > 0

is a small parameter, and

E \in C (n)

is a fixed matrix with

∥ E ∥ = 1

, i.e.,

∥ δ A ∥ = ε

. The technique of the so-called fictitious small parameter can also be used in the perturbation analysis of matrix problems. Assuming that

∥ δ A ∥

is small relative to

∥ A ∥

, we use the identity

δ A = ε E

, where

E = δ A / ∥ δ A ∥

and

ε

is finally set to

∥ δ A ∥

.

The formulation of the perturbed Schur problem (PSP), i.e., the SP for a perturbed matrix

A + δ A

, is not trivial. First, we mention two facts.

If PSP for a perturbed matrix $A + δ A$ has a perturbed solution $(U + δ U, T + δ T)$ with $∥ δ U ∥, ∥ δ T ∥ \to 0$ as $∥ δ A ∥ \to 0$, then it also has a perturbed solution $(U + Δ U, T + δ T)$ with $Δ U = - 2 U - δ U$ and $∥ Δ U ∥ \to 2$ as $∥ δ A ∥ \to 0$. The reason is that $U + Δ U = - (U + δ U)$.
The solution of PSP for a matrix $A + δ A$ may have the form $(U, T + δ T)$ with $δ U = 0$ . This will happen when $U^{H} δ A U \in T (n)$ , and in this case, $∥ δ T ∥ = ∥ δ A ∥$ .

Let

(U_{0}, T_{0}) \in U (n) \times T (n)

be a particular solution of SP for a matrix

A_{0} \in C (n)

. For definiteness and based on the above considerations, we assume that, if

A_{0} \in T (n)

, then

U_{0} = I_{n}

and

T_{0} = A_{0}

.

Consider the perturbation

δ A = ε E

, where

ε \in (- ε_{0}, ε_{0})

,

ε_{0} > 0

is a small parameter and

E \in C (n)

is a fixed matrix with

∥ E ∥ = 1

. Let

(U (ε), T (ε))

be a particular solution to PSP for the matrix

A (ε) = A_{0} + ε E

, i.e.,

T (ε) = U^{H} (ε) A (ε) U (ε) \in T (n), U (ε) \in U (n) .

(5)

Since the solution of PSP always exists, we have defined functions

U : (- ε_{0}, ε_{0}) \to U (n)

and

T : (- ε_{0}, ε_{0}) \to T (n)

through relations (5). The problem is that there are many such functions and not all of them are suitable for perturbation analysis. The aim of the next definition is to clarify the concepts in this area.

Definition 16.

The pair

(U (ε), T (ε))

is said to be a regular solution of the PSP for the matrix

A (ε) = A_{0} + ε E

if the functions U and T are continuous on the interval

(- ε_{0}, ε_{0})

and, if

A_{0} \in T (n)

, then

(U (0), T (0))

is the DPS of SP for

A_{0}

.

Example 4.

Let

A = λ I_{n}

,

λ \in C

. Then, the general solution of SP for A is

U (n) \times {A}

. The opposite statement is also true in the form of the next two assertions.

1.: If $U (A) = U (n)$ , then $A = λ I_{n}$ and $T (A) = {A}$ .
2.: If $T (A) = {A}$ , then $A = λ I_{n}$ and $U (A) = U (n)$ .

Example 5.

Let

A = λ I_{n} + J_{n}

, where

λ \in C

and

J_{n} = [0, I_{n - 1}; 0, 0] \in R (n)

is a Jordan block with a zero eigenvalue. Then,

U (A) = {U \in U (n) : | U | = I_{n}}, T (A) = {λ I_{n} + X : | X | = J_{n}} .

A number of examples of ConSF for

A \in R (2)

presented in the next section illustrate the structure of these forms and the behavior of their perturbations; see also [7].

7. Examples of Real $2 \times 2$ Matrices

In this section, we consider several examples illustrating the concepts introduced so far. The examples are for SP and PSP for matrices

A \in R (2)

with

spect (A) \subset R

for which the transformation group is

O (2)

. This is the simplest non-trivial case. However, the effects observed are valid for matrices

A \in C (n)

of the form

A = [\begin{matrix} A_{1, 1} & A_{1, 2} \\ O_{n - 2, 2} & A_{2, 2} \end{matrix}]

where

n \geq 3

,

A_{1, 1} \in R (2)

,

A_{1, 2} \in C (2, n - 2)

and

A_{2, 2} \in C (n - 2)

.

Matrices

A \in R (2)

correspond to linear operators

R (2, 1) \to R (2, 1)

and have the simplest nontrivial albeit rich structure. A surprisingly large number of facts about general linear operators is revealed by such matrices; see, e.g., ref. [32] and the examples below.

Example 6.

Let the matrix

A \in R (2)

have eigenvalues

λ_{1}, λ_{2}

and set

r = \sqrt{{∥ A ∥}_{F}^{2} - | λ_{1} |^{2} - {| λ_{2} |}^{2}} .

Then, the following four cases are possible in which the statements are reversible.

If $λ_{1} = λ_{2} = λ$ and $r = 0$ , then there exists a unique ConSF $λ I_{2}$ of the matrix A.
If $λ_{1} = λ_{2} = λ$ and $r > 0$ , then there exist two ConSF $λ I_{2} \pm r E_{1, 2}$ of the matrix A.
If $λ_{1} \neq λ_{2}$ and $r = 0$ , then there exist two ConSF $diag (λ_{1}, λ_{2})$ and $diag (λ_{2}, λ_{1})$ of the matrix A.
If $λ_{1} \neq λ_{2}$ and $r > 0$ , then there exist four ConSF $diag (λ_{1}, λ_{2}) \pm r E_{1, 2}$ , $diag (λ_{2}, λ_{1}) \pm r E_{1, 2}$ of the matrix A.
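The four cases of Example 6 can be explored with the following Python sketch. It is a minimal illustration for real $2 \times 2$ matrices with real spectrum and exactly representable data (the comparison of equal eigenvalues relies on exact floating-point results); the function name `consf_2x2` is hypothetical.

```python
import math


def consf_2x2(A):
    """Return r and the list of ConSF diag(.,.) +/- r*E12 of a real 2x2 matrix with real spectrum."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("the spectrum is not real")
    lam1 = (tr - math.sqrt(disc)) / 2
    lam2 = (tr + math.sqrt(disc)) / 2
    fro2 = a * a + b * b + c * c + d * d  # squared Frobenius norm of A
    r = math.sqrt(max(fro2 - lam1 ** 2 - lam2 ** 2, 0.0))
    forms = set()
    for d1, d2 in ((lam1, lam2), (lam2, lam1)):
        for sign in (1.0, -1.0):
            forms.add(((d1, sign * r), (0.0, d2)))
    return r, sorted(forms)


for A in ([[3, 0], [0, 3]], [[3, 1], [0, 3]], [[1, 0], [0, 2]], [[1, 5], [0, 2]]):
    r, forms = consf_2x2(A)
    print(r, len(forms))
```

The four sample matrices produce 1, 2, 2 and 4 distinct forms, respectively, matching the four cases.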

Example 7.

Let

A_{0} = λ I_{2}

,

0 \neq λ \in R

. We have

U (A_{0}) = O (2)

and

T (A_{0}) = {A_{0}}

. Since

A_{0}

is in ConSF, we consider the simplest particular DPS, which is

(I_{2}, A_{0})

. Let the matrix

A_{0}

be perturbed to

A (ε) = A_{0} + ε E_{2, 1}

, where ε is a small parameter. Then, a particular solution

(U (ε), T (ε))

of PSP may be written as

U (ε) = I_{2} + δ U (ε)

,

T (ε) = A_{0} + δ T (ε)

, where

δ T (ε) = ε U^{H} (ε) E_{2, 1} U (ε)

. Here, the perturbation

δ T (ε)

is small, of order ε, but the perturbation

δ U (ε)

is not small even though ε is small.

For

ε \neq 0

, the set of transformation matrices

U (ε)

consists of the matrices

U_{1} = E_{1, 2} + E_{2, 1}

,

U_{2} = E_{1, 2} - E_{2, 1}

and their negations. In view of the equalities

U_{k} = I_{2} + δ U_{k}

, we have

∥ δ U_{1} ∥ = ∥ I_{2} \pm U_{1} ∥ = 2

and

∥ δ U_{2} ∥ = ∥ I_{2} \pm U_{2} ∥ = \sqrt{2}

. At the same time, for

ε \neq 0

, the set of Schur forms

T (A (ε))

consists of two matrices

A_{0} \pm ε E_{1, 2}

. Thus, the transformation matrix

U (ε)

is discontinuous at the point

ε = 0

.

Consider the multivalued function

Ψ : R \to 2^{O (2)}

, where

2^{O (2)}

is the set of subsets of

O (2)

, defined by

ε \mapsto Ψ (ε) : = U (ε)

. We have

Ψ (0) = O (2)

and

Ψ (ε) = {\pm U_{1}, \pm U_{2}}

for

ε \neq 0

. Hence, the function Ψ, i.e., the Schur basis for

R (2, 1)

relative to the matrix

A (ε)

, is discontinuous at the point

ε = 0

, while the Schur forms

T = T (ε)

of

A (ε)

are continuous in ε.
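The effect described in Example 7 can be reproduced numerically. The Python sketch below is an illustration only, with hypothetical helper names and arbitrary sample values of λ and ε; it forms $U_{k}^{⊤} A (ε) U_{k}$ for $k = 1, 2$ and evaluates the spectral norms of $I_{2} - U_{k}$.

```python
import math


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def transpose(X):
    return [[X[c][r] for c in range(2)] for r in range(2)]


def spectral_norm(M):
    """Largest singular value of a real 2x2 matrix."""
    (a, b), (c, d) = M
    s, det = a * a + b * b + c * c + d * d, a * d - b * c
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)


lam, eps = 2.0, 1e-8                      # arbitrary sample data
A_eps = [[lam, 0.0], [eps, lam]]          # A_0 + eps * E21
U1 = [[0.0, 1.0], [1.0, 0.0]]             # E12 + E21
U2 = [[0.0, 1.0], [-1.0, 0.0]]            # E12 - E21
T1 = matmul(transpose(U1), matmul(A_eps, U1))
T2 = matmul(transpose(U2), matmul(A_eps, U2))
norm1 = spectral_norm([[1.0, -1.0], [-1.0, 1.0]])  # I_2 - U1
norm2 = spectral_norm([[1.0, -1.0], [1.0, 1.0]])   # I_2 - U2
print(T1, T2, norm1, norm2)
```

Both transformed matrices are upper triangular and differ from $A_{0}$ by terms of order ε, while the distances from $I_{2}$ to $U_{1}$ and $U_{2}$ do not depend on ε.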

Example 8.

Let

A_{0} = λ I_{2} + E_{1, 2} \in R (2)

be a Jordan block with eigenvalue

λ \in R

. The set

T (A_{0})

contains two matrices

T_{0, 1} = λ I_{2} + E_{1, 2}

and

T_{0, 2} = λ I_{2} - E_{1, 2}

, while the set

U (A_{0})

contains the matrices

I_{2}

,

E_{1, 1} - E_{2, 2}

and their negations.

Let the matrix

A_{0}

be perturbed to

A (ε) = A_{0} + ε E_{2, 1}

, where

ε > 0

. The eigenvalues of

A (ε)

are

λ_{1} (ε) = λ - \sqrt{ε}

,

λ_{2} (ε) = λ + \sqrt{ε}

. Setting

c (ε) = \frac{1}{\sqrt{1 + ε}}, s (ε) = - \frac{\sqrt{ε}}{\sqrt{1 + ε}}

we see that there are four ConSF

T_{1} (ε) = [\begin{matrix} λ_{1} (ε) & 1 - ε \\ 0 & λ_{2} (ε) \end{matrix}], T_{2} (ε) = [\begin{matrix} λ_{1} (ε) & ε - 1 \\ 0 & λ_{2} (ε) \end{matrix}],

and

T_{3} (ε) = [\begin{matrix} λ_{2} (ε) & 1 - ε \\ 0 & λ_{1} (ε) \end{matrix}], T_{4} (ε) = [\begin{matrix} λ_{2} (ε) & ε - 1 \\ 0 & λ_{1} (ε) \end{matrix}],

The orthogonal matrices

U_{k} (ε)

that transform

A (ε)

into

T_{k} (ε)

are

U_{1} (ε) = [\begin{matrix} c (ε) & - s (ε) \\ s (ε) & c (ε) \end{matrix}], U_{2} (ε) = [\begin{matrix} c (ε) & s (ε) \\ s (ε) & - c (ε) \end{matrix}],

and

U_{3} (ε) = [\begin{matrix} c (ε) & s (ε) \\ - s (ε) & c (ε) \end{matrix}], U_{4} (ε) = [\begin{matrix} c (ε) & - s (ε) \\ - s (ε) & - c (ε) \end{matrix}]

Hence, there are two regular solutions of this PSP, namely

(U_{1} (ε), T_{1} (ε))

and

(U_{3} (ε), T_{3} (ε))

corresponding to the unperturbed ConSF

T_{0, 1}

and

T_{0, 2}

, respectively.
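The pair $(U_{1} (ε), T_{1} (ε))$ of Example 8 can be verified numerically. The following Python sketch is a minimal illustration with hypothetical helper names and arbitrary sample values of λ and ε.

```python
import math


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def transpose(X):
    return [[X[c][r] for c in range(2)] for r in range(2)]


def perturbed_jordan_form(lam, eps):
    """Return U1(eps) and U1(eps)^T A(eps) U1(eps) for A(eps) = lam*I + E12 + eps*E21, eps > 0."""
    c = 1.0 / math.sqrt(1.0 + eps)
    s = -math.sqrt(eps) / math.sqrt(1.0 + eps)
    A = [[lam, 1.0], [eps, lam]]
    U1 = [[c, -s], [s, c]]
    return U1, matmul(transpose(U1), matmul(A, U1))


U1, T1 = perturbed_jordan_form(1.0, 0.04)  # sqrt(eps) = 0.2
print(T1)
```

For these sample values, the computed matrix agrees with $T_{1} (ε)$, i.e., its diagonal is $(λ - \sqrt{ε}, λ + \sqrt{ε})$ and its element in position $(1, 2)$ is $1 - ε$, up to rounding errors.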

Example 9.

Let

A_{0} = diag (λ_{1}, λ_{2})

, where

λ_{1} \neq λ_{2}

. Here, the set

T (A_{0})

contains two diagonally different ConSF

T_{0, 1} = diag (λ_{1}, λ_{2})

and

T_{0, 2} = diag (λ_{2}, λ_{1})

of

A_{0}

, while the set

U (A_{0})

has eight elements, namely

\pm E_{1, 1} \pm E_{2, 2}

and

\pm E_{1, 2} \pm E_{2, 1}

. Let us choose

δ A (ε) = ε E_{2, 1}

. For

ε \neq 0

, the set

T (A_{0} + ε E_{2, 1})

has four elements

{\tilde{T}}_{1} (ε) = T_{0, 1} + ε E_{1, 2}

,

{\tilde{T}}_{2} (ε) = T_{0, 1} - ε E_{1, 2}

,

{\tilde{T}}_{3} (ε) = T_{0, 2} + ε E_{1, 2}

and

{\tilde{T}}_{4} (ε) = T_{0, 2} - ε E_{1, 2}

.

The matrices

U_{1}, - U_{1}

from Example 7 transform the perturbed matrix

A_{0} + δ A (ε)

into the ConSF

{\tilde{T}}_{3} (ε)

and the matrices

U_{2}, - U_{2}

transform

A_{0} + δ A (ε)

into the ConSF

{\tilde{T}}_{4} (ε)

since

U_{1}

and

U_{2}

transform

δ A (ε)

into

\pm ε E_{1, 2} \in T (2)

, respectively.

Consider the transformation of

A_{0} + δ A (ε)

into one of the Schur forms

{\tilde{T}}_{1} (ε)

or

{\tilde{T}}_{2} (ε)

. Define the orthogonal matrices

U_{1} (ε) = [\begin{matrix} c (ε) & s (ε) \\ s (ε) & - c (ε) \end{matrix}], U_{2} (ε) = [\begin{matrix} c (ε) & - s (ε) \\ s (ε) & c (ε) \end{matrix}],

where

c (ε) = \frac{λ_{1} - λ_{2}}{\sqrt{ε^{2} + {(λ_{1} - λ_{2})}^{2}}}, s (ε) = \frac{ε}{\sqrt{ε^{2} + {(λ_{1} - λ_{2})}^{2}}} .

We have

U_{k}^{⊤} (ε) (A_{0} + δ A (ε)) U_{k} (ε) = {\tilde{T}}_{k} (ε), k = 1, 2 .

Furthermore,

U_{2} (0) = I_{2}

and

U_{1} (0) = diag (1, - 1)

hold when

λ_{1} > λ_{2}

, so that

c (0) = 1

. Hence, the regular solution of the PSP is

(U_{2} (ε), {\tilde{T}}_{2} (ε))

.
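The regular solution of Example 9 can be checked numerically with the Python sketch below. It is an illustration only, assuming $λ_{1} > λ_{2}$; the helper names and the sample values are hypothetical.

```python
import math


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def transpose(X):
    return [[X[c][r] for c in range(2)] for r in range(2)]


def regular_solution(lam1, lam2, eps):
    """Return U2(eps) and U2(eps)^T A(eps) U2(eps) for A(eps) = diag(lam1, lam2) + eps*E21."""
    h = math.hypot(eps, lam1 - lam2)
    c, s = (lam1 - lam2) / h, eps / h
    A = [[lam1, 0.0], [eps, lam2]]
    U2 = [[c, -s], [s, c]]
    return U2, matmul(transpose(U2), matmul(A, U2))


U2, T2 = regular_solution(3.0, 1.0, 1e-3)  # arbitrary sample data with lam1 > lam2
print(U2, T2)
```

For small ε, the matrix $U_{2} (ε)$ stays close to $I_{2}$ and the transformed matrix is close to $T_{0, 1} - ε E_{1, 2}$, as expected for a regular solution.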

Example 10.

Let

A_{0} = diag (λ_{1}, λ_{2}) + a E_{1, 2} \in R (2)

, where

δ = λ_{2} - λ_{1} > 0

and

a \neq 0

. The set

T (A_{0})

of ConSF of

A_{0}

contains four matrices:

T_{1, 2} = diag (λ_{1}, λ_{2}) \pm a E_{1, 2}, T_{3, 4} = diag (λ_{2}, λ_{1}) \pm a E_{1, 2} .

Let the matrix

A_{0}

be perturbed to

A (ε) = A_{0} + ε E_{2, 1}

, where ε is a small parameter such that

δ^{2} + 4 a ε > 0

, i.e.,

ε \in (- ε_{0}, ε_{0})

, where

ε_{0} = δ^{2} / (4 | a |)

. The ConSF of the matrix

A (ε)

are

{\tilde{T}}_{1, 2} = diag ({\tilde{λ}}_{1}, {\tilde{λ}}_{2}) \pm \tilde{a} E_{1, 2}, {\tilde{T}}_{3, 4} = diag ({\tilde{λ}}_{2}, {\tilde{λ}}_{1}) \pm \tilde{a} E_{1, 2},

where the quantities

{\tilde{λ}}_{1} = {\tilde{λ}}_{1} (ε)

,

{\tilde{λ}}_{2} = {\tilde{λ}}_{2} (ε)

and

\tilde{a} = \tilde{a} (ε)

are analytical functions of ε. In particular,

\begin{matrix} {\tilde{λ}}_{1} = λ_{1} + \frac{a ε}{λ_{1} - λ_{2}} + O (ε^{2}), \\ {\tilde{λ}}_{2} = λ_{2} + \frac{a ε}{λ_{2} - λ_{1}} + O (ε^{2}), \\ \tilde{a} = a - ε + O (ε^{2}), ε \to 0 . \end{matrix}

Among the four ConSF, only the matrix

{\tilde{T}}_{1} = diag ({\tilde{λ}}_{1}, {\tilde{λ}}_{2}) + \tilde{a} E_{1, 2}

corresponds to a regular solution.
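The expansions in Example 10 can be checked numerically. The Python sketch below is a minimal illustration with hypothetical names and sample data. It uses the fact that the squared modulus of the element in position $(1, 2)$ of a ConSF equals ${∥ A (ε) ∥}_{F}^{2} - {\tilde{λ}}_{1}^{2} - {\tilde{λ}}_{2}^{2}$; since ${\tilde{λ}}_{1}^{2} + {\tilde{λ}}_{2}^{2} = {(λ_{1} + λ_{2})}^{2} - 2 (λ_{1} λ_{2} - a ε)$, a short calculation suggests that this modulus is $| a - ε |$.

```python
import math


def perturbed_consf_data(lam1, lam2, a, eps):
    """Eigenvalues of A(eps) = diag(lam1, lam2) + a*E12 + eps*E21 and the modulus of the (1, 2) element of its ConSF."""
    tr = lam1 + lam2
    det = lam1 * lam2 - a * eps
    disc = tr * tr - 4 * det  # equals (lam2 - lam1)^2 + 4 a eps
    if disc <= 0:
        raise ValueError("eps lies outside the admissible interval")
    root = math.sqrt(disc)
    t1, t2 = (tr - root) / 2, (tr + root) / 2
    fro2 = lam1 ** 2 + lam2 ** 2 + a ** 2 + eps ** 2
    abs_a = math.sqrt(max(fro2 - t1 ** 2 - t2 ** 2, 0.0))
    return t1, t2, abs_a


t1, t2, abs_a = perturbed_consf_data(1.0, 3.0, 2.0, 0.01)  # arbitrary sample data
print(t1, t2, abs_a)
```

For the sample data, the computed eigenvalues differ from the first-order approximations $λ_{1} + a ε / (λ_{1} - λ_{2})$ and $λ_{2} + a ε / (λ_{2} - λ_{1})$ by terms of order $ε^{2}$.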

8. Diagonally Spectral Matrices

Denote by

Δ (n) \subset C (n)

the set of matrices

A \in C (n)

such that the multiset of its diagonal elements is equal to the multiset of its eigenvalues, i.e.,

spect (A) = {A (1, 1), A (2, 2), \dots, A (n, n)}

(6)

Otherwise speaking,

Δ (n)

is the set of matrices A such that

det (A - A (k, k) I_{n}) = 0, k = 1, 2, \dots, n

(7)

Definition 17.

The matrix

A \in Δ (n)

, which satisfies (6) or (7), is said to be diagonally spectral.

The set

Δ (n) \subset C (n)

is defined by n algebraic equations (7) (some of them may not be independent) in the

n^{2}

elements of the matrix A and is hence a closed algebraic variety [29] of complex dimension of at least

n^{2} - n

.

Upper triangular matrices and lower triangular matrices are diagonally spectral. Schur condensed forms in particular are diagonally spectral. More generally, for

P \in O (n)

being a permutation matrix, and

A \in C (n)

being a diagonally spectral matrix, the matrix

P A P

is also diagonally spectral.

Example 11.

The elements of the matrix

A \in Δ (2)

satisfy one independent algebraic equation

A (1, 2) A (2, 1) = 0

. Hence, the matrix

A \in Δ (2)

has the form

[\begin{matrix} * & * \\ 0 & * \end{matrix}], [\begin{matrix} * & 0 \\ * & * \end{matrix}]

where ∗ denotes unspecified matrix elements.

Example 12.

The matrices

A_{1}, A_{1}^{⊤}, A_{2}, A_{2}^{⊤}, A_{3}, A_{3}^{⊤} \in C (3)

, where

A_{1} = [\begin{matrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \end{matrix}], A_{2} = [\begin{matrix} * & 0 & * \\ * & * & * \\ 0 & 0 & * \end{matrix}], A_{3} = [\begin{matrix} * & * & * \\ 0 & * & 0 \\ 0 & * & * \end{matrix}],

are diagonally spectral.

Matrices from

Δ (n)

may not be condensed in the sense that they may have fewer than

n (n - 1) / 2

zero elements. In particular, matrices from

Δ (n)

may have all their elements different from zero.

Example 13.

Let

z \in C

be a parameter. Then, the matrices

A (z) = [\begin{matrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ z & - z & 3 \end{matrix}], B (z) = [\begin{matrix} 1 & 1 & 1 \\ z & 2 & 1 \\ - z - 2 & 2 & 3 \end{matrix}]

are diagonally spectral, i.e.,

spect (A (z)) = spect (B (z)) = {1, 2, 3}

. We stress that

A (0) \in T (3)

but

B (z) \notin T (3)

for all

z \in C

.
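The claim of Example 13 can be verified through condition (7). The Python sketch below is an illustration with hypothetical function names; since the diagonal elements 1, 2, 3 are distinct here, the vanishing of the three determinants in (7) already gives $spect = {1, 2, 3}$.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def satisfies_condition_7(M, tol=1e-12):
    """Check det(M - M(k, k) I) = 0 for k = 1, 2, 3."""
    for k in range(3):
        shifted = [[M[r][c] - (M[k][k] if r == c else 0) for c in range(3)] for r in range(3)]
        if abs(det3(shifted)) > tol:
            return False
    return True


def A(z):
    return [[1, 1, 1], [0, 2, 1], [z, -z, 3]]


def B(z):
    return [[1, 1, 1], [z, 2, 1], [-z - 2, 2, 3]]


print(all(satisfies_condition_7(A(z)) and satisfies_condition_7(B(z)) for z in (0, 5, -3 + 2j)))
```

The check succeeds for every tested value of the parameter z, although neither $A (z)$ with $z \neq 0$ nor $B (z)$ is triangular.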

The main advantage of a Schur canonical or condensed form

T = U^{H} A U \in T (n), U \in U (n)

of a matrix

A \in C (n)

is that it reveals the spectrum

{λ_{1}, λ_{2}, \dots, λ_{n}}

of the matrix A as the collection of the diagonal elements

{T (1, 1), T (2, 2), \dots, T (n, n)}

of the form T. Thus, the sets of Schur canonical and condensed forms are subsets of the larger set (closed in the Zariski topology)

Δ (n) = Δ_{1} (n) \cap Δ_{2} (n) \cap \dots \cap Δ_{n} (n) \subset C (n),

where

Δ_{k} (n) = {A \in C (n) : det (A - A (k, k) I_{n}) = 0}, k \in Z [1, n],

of matrices having spectra equal to the collection of their diagonal elements.

Obviously, the set

T (n)

as well as the set of lower triangular

n \times n

matrices are subsets of

Δ (n)

. More generally, if

P \in O (n)

is a permutation matrix (i.e., the columns of P are a permutation of the columns of the identity matrix

I_{n}

) and

T \in T (n)

then

P T P \in Δ (n)

. In particular, if

T \in T (n)

, then the matrix

R = P_{n} T P_{n}

is lower triangular with

R (k, k) = T (n + 1 - k, n + 1 - k)

,

k = 1, 2, \dots, n

.
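The action of the flip matrix $P_{n}$ can be illustrated with a short Python sketch. It uses the identity $(P_{n} T P_{n}) (r, c) = T (n + 1 - r, n + 1 - c)$ instead of forming $P_{n}$ explicitly; the function name `flip` and the sample matrix are hypothetical.

```python
def flip(T):
    """Return P_n T P_n, where P_n reverses the order of the rows and columns."""
    n = len(T)
    return [[T[n - 1 - r][n - 1 - c] for c in range(n)] for r in range(n)]


T = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]  # arbitrary upper triangular sample
R = flip(T)
print(R)  # [[6, 0, 0], [5, 4, 0], [3, 2, 1]]
```

The result is lower triangular, and its diagonal is the diagonal of T in reverse order.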

Example 14.

Let

A = λ I_{2}

,

0 \neq λ \in R

. One of the DPS

(U, T)

of SP for A is the pair

(I_{2}, A)

. If we perturb A to

\tilde{A} = λ I_{2} + ε E_{2, 1}

, where

ε > 0

is small, the pair

(U, T)

is transformed to

(\tilde{U}, \tilde{T})

, where

\tilde{T} = A \pm ε E_{1, 2}

and

\tilde{U}

is any of the four matrices

\pm E_{1, 2} \pm E_{2, 1}

. Thus,

∥ U - \tilde{U} ∥ = 2

and the transformation matrix

U = U (ε)

is discontinuous at the point

ε = 0

.

To avoid such artificial high sensitivity, in the next section, we introduce the concept of condensed quasi-Schur forms.

9. Condensed Quasi-Schur Forms

Let

m \in Z [1, n]

be a fixed integer.

Definition 18.

The m-tuple

ν = (n_{1}, n_{2}, \dots, n_{m})

, where

n_{k}

are positive integers, is said to be m-partition of n if

n = n_{1} + n_{2} + \dots + n_{m}

.

Next, we define condensed quasi-Schur forms of the matrix

A \in C (n)

as block-triangular matrices (upper or lower) whose diagonal blocks are, in turn, triangular matrices. In particular, condensed quasi-Schur forms of A are diagonally spectral and thus reveal the spectrum of A. The idea is that these forms are less sensitive to perturbations in A than other unitarily equivalent forms of A, such as ConSF.

Definition 19.

A matrix

S = U^{H} A U \in C (n)

, where

U \in U (n)

, is said to be a condensed quasi-Schur form of

A \in C (n)

if there exists an m-partition ν of n such that S or

S^{⊤}

is block-upper triangular with diagonal blocks

S_{k, k} \in C (n_{k})

, i.e.,

S = [\begin{matrix} S_{1, 1} & S_{1, 2} & \dots & S_{1, m} \\ O & S_{2, 2} & \dots & S_{2, m} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ O & O & \dots & S_{m, m} \end{matrix}]

and either

S_{k, k} \in T (n_{k})

or

S_{k, k}^{⊤} \in T (n_{k})

,

k = 1, 2, \dots, m

.

Note that the upper triangular matrix

S_{k, k}

is permutationally equivalent to the lower triangular matrix

S_{k, k}^{⊤} = P_{n_{k}} S_{k, k} P_{n_{k}}

. At the same time, a matrix A and its transpose

A^{⊤} = V^{- 1} A V

are similar [33], but we cannot use this result here since the matrix V is not unitary in general. Instead of the transposed matrix

A^{⊤}

, we may use its flipped variant

A^{F} = P_{n} A P_{n}

, which is lower triangular whenever A is upper triangular and

P_{n}

is a permutation matrix.

Example 15.

For

n = 2

, the condensed quasi-Schur forms are

A_{1}, A_{1}^{⊤}

, where

A_{1} \in T (2)

. For

n = 3

, the condensed quasi-Schur forms are

A_{1}, A_{1}^{⊤}

,

A_{2}, A_{2}^{⊤}

and

A_{3}, A_{3}^{⊤}

, where

A_{1} \in T (3)

,

A_{2} = [\begin{matrix} * & * & * \\ 0 & * & 0 \\ 0 & * & * \end{matrix}], A_{3} = [\begin{matrix} * & 0 & * \\ * & * & * \\ 0 & 0 & * \end{matrix}],

and the star denotes unspecified elements.

Condensed quasi-Schur forms are diagonally spectral, but the converse is not true for

n \geq 3

; see Example 13. Obviously, a ConSF is also a condensed quasi-Schur form, but the converse may not be true (we recall that

n \geq 2

). We stress that the high sensitivity of Schur forms as in Example 14 may not be observed for condensed quasi-Schur forms.

We stress, finally, that the concept of a diagonally spectral matrix as an algebraic object may be of independent interest in matrix analysis and should be independently studied in more detail.

10. Properties of Canonical/Condensed Forms

Canonical and condensed forms of matrices and matrix pencils under the action of matrix transformation groups are widely used in matrix analysis and control theory. These forms have zeros at given positions and, optionally, ones at other positions. Introducing zeros at given positions may lead to discontinuity of the transformation matrix, while introducing ones may cause bad conditioning of this matrix. In particular, the requirement that the condensed form T of A is upper triangular, i.e.,

T \in T (n)

, may lead to extreme sensitivity of the transformation pair

(T, U)

relative to perturbations in the matrix A. At the same time, this high sensitivity may not be relevant to the problem of computing the spectrum of A. Moreover, whether a condensed form is upper or lower triangular matters neither in theory nor in practice. The concept of a condensed quasi-Schur form of a matrix exploits this idea: the condensed form is defined as a set containing all triangular forms.
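A toy computation illustrates how the transformation matrix can be sensitive while the spectrum is not. The sketch below is not taken from the paper and is not its Example 14; it assumes real symmetric 2 × 2 matrices, for which the Schur form is diagonal and the orthogonal factor is a plane rotation, and the helper name sym2_schur is hypothetical.

```python
import math


def sym2_schur(a, b, d):
    """Spectral (Schur) data of the real symmetric matrix [[a, b], [b, d]].

    Returns ((lam_max, lam_min), theta), where (cos theta, sin theta) is a
    unit eigenvector for lam_max, i.e. U is the rotation by theta.
    """
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    theta = 0.5 * math.atan2(2 * b, a - d)
    return (mean + radius, mean - radius), theta


eps = 1e-8
# Two matrices at Frobenius distance 2*eps from each other,
# both within O(eps) of the identity matrix.
eigs1, theta1 = sym2_schur(1 + eps, 0.0, 1 - eps)   # diag(1 + eps, 1 - eps)
eigs2, theta2 = sym2_schur(1.0, eps, 1.0)           # [[1, eps], [eps, 1]]

print(eigs1, theta1)   # rotation angle 0
print(eigs2, theta2)   # rotation angle pi/4
```

In this sketch both matrices have the eigenvalues 1 ± eps (up to rounding), yet the rotation angle jumps from 0 to π/4 however small eps is. The spectrum is thus insensitive, while the orthogonal factor is discontinuous near the scalar matrix.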

This approach may be extended to generalized Jordan forms (GJF) of matrices. We recall that a GJF

G = X^{- 1} A X

of the matrix A is a bi-diagonal matrix with the eigenvalues of A on its diagonal and the property that

G (i, i + 1) \neq 0

implies

G (i, i) = G (i + 1, i + 1)

. The difference with the standard Jordan form of A is that the nonzero super-diagonal elements

G (i, i + 1)

of G are not necessarily equal to 1.
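The two defining conditions of a GJF recalled above can be written as a simple predicate. The following Python sketch is only an illustration; it assumes that "bi-diagonal" means upper bi-diagonal (nonzero entries only on the diagonal and the first super-diagonal), and the function name is_generalized_jordan_form and the tolerance parameter tol are hypothetical.

```python
def is_generalized_jordan_form(G, tol=0.0):
    """Check the GJF conditions for a square list-of-lists G.

    1. G is upper bi-diagonal: only G[i][i] and G[i][i+1] may be nonzero.
    2. G[i][i+1] != 0 implies G[i][i] == G[i+1][i+1].
    Entries with absolute value <= tol are treated as zero.
    """
    n = len(G)
    for i in range(n):
        for j in range(n):
            if j not in (i, i + 1) and abs(G[i][j]) > tol:
                return False
    for i in range(n - 1):
        coupled = abs(G[i][i + 1]) > tol
        if coupled and abs(G[i][i] - G[i + 1][i + 1]) > tol:
            return False
    return True


# Super-diagonal entry 5 (not 1) couples the equal eigenvalues 2, 2.
print(is_generalized_jordan_form([[2, 5, 0], [0, 2, 0], [0, 0, 3]]))  # True
# Nonzero super-diagonal entry between distinct eigenvalues 2 and 3.
print(is_generalized_jordan_form([[2, 5], [0, 3]]))                   # False
```

A standard Jordan form passes the same test, since it is the special case in which every nonzero super-diagonal element equals 1.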

11. Conclusions

Novelty

In this paper, we consider condensed Schur forms for a general square matrix A as well as various sets of canonical Schur forms for a square matrix A with distinct eigenvalues. This case is generic, and hence, it is a primary problem [34] in the analysis of Schur decompositions. We also study the most non-generic case of scalar matrices, which belong to a one-dimensional variety in the space of square matrices.

The sensitivity of the Schur forms relative to perturbations in A is also studied. The concepts of a diagonal preserving solution to the Schur problem and of a regular solution to the perturbed Schur problem are introduced and illustrated by many examples.

We also introduce the concepts of diagonally spectral matrices and of quasi-Schur condensed forms of a matrix A. The latter forms are much less sensitive (if at all) to perturbations in the matrix A than the upper triangular condensed forms of A. The concept of a diagonally spectral matrix introduces a new algebraic object that may be studied independently in matrix theory as a closed variety in the Zariski topology.

Author Contributions

Methodology, M.M.K.; Validation, P.H.P.; Formal analysis, M.M.K.; Writing—review & editing, P.H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors of this paper are grateful to the anonymous reviewers for their very useful and detailed comments and suggestions, which helped to improve the text.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Schur, I. Beiträge zur Theorie der Gruppen linearer homogener Substitutionen. Trans. Am. Math. Soc. 1909, 10, 159–175.
2. Bhatia, R. Matrix Analysis; Springer: Berlin/Heidelberg, Germany, 1996; ISBN 978-0387948461.
3. Golub, G.; Van Loan, C. Matrix Computations, 4th ed.; The Johns Hopkins University Press: Baltimore, MD, USA, 2013; ISBN 978-1421407944.
4. Horn, R.; Johnson, C. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2012; ISBN 978-0521839402.
5. Horn, R.; Johnson, C. Topics in Matrix Analysis; Cambridge University Press: Cambridge, UK, 1991; ISBN 978-0511840371.
6. Konstantinov, M.; Petkov, P.; Christov, N. Nonlocal perturbation analysis of the Schur system of a matrix. SIAM J. Matrix Anal. Appl. 1994, 15, 383–392.
7. Konstantinov, M.; Petkov, P. Perturbation Methods in Matrix Analysis and Control; Nova Science Publishers: New York, NY, USA, 2020; ISBN 978-1536174700.
8. Minenkova, A.; Nitch-Griffin, E.; Olshevsky, V. Gohberg-Kaashoek numbers and forward stability of Schur canonical forms. arXiv 2021, arXiv:2110.15334.
9. Minenkova, A.; Nitch-Griffin, E.; Olshevsky, V. Backward stability of the Schur decomposition under small perturbations. Linear Algebra Appl. 2024, in press.
10. Zhang, G.; Li, H.; Wei, Y. Componentwise perturbation analysis for the generalized Schur decomposition. Calcolo 2022, 59, 19.
11. Konstantinov, M.; Mehrmann, V.; Petkov, P. Perturbation analysis of Hamiltonian Schur and block-Schur forms. SIAM J. Matrix Anal. Appl. 2001, 23, 387–424.
12. Chu, E. Pole assignment via the Schur form. Syst. Control. Lett. 2007, 56, 303–314.
13. Chu, D.; Liu, X.; Mehrmann, V. A numerical method for computing the Hamiltonian Schur form. Numer. Math. 2007, 105, 375–412.
14. Chen, J.; Ma, W.; Miao, Y.; Wei, Y. Perturbations of Tensor-Schur decomposition and its applications to multivariable control systems and facial recognitions. Neurocomputing 2023, 547, 126446.
15. Stewart, G.; Sun, J. Matrix Perturbation Theory; Academic Press: Cambridge, MA, USA, 1990; ISBN 978-0126702309.
16. Konstantinov, M.; Gu, D.; Mehrmann, V.; Petkov, P. Perturbation Theory for Matrix Equations; Science Direct: Amsterdam, The Netherlands, 2003; ISBN 0-444513159.
17. Konstantinov, M.; Petkov, P.; Christov, N. Invariants and canonical forms for linear multivariable systems under the action of orthogonal transformation groups. Kybernetika 1981, 17, 413–424.
18. Konstantinov, M.; Postlethwaite, I.; Gu, D.; Petkov, P. Perturbation analysis of orthogonal canonical forms. Linear Algebra Appl. 1997, 251, 267–291.
19. Boley, D.; Datta, B. Numerical Methods for Linear Control Systems. In Systems and Control in the Twenty-First Century. Systems & Control: Foundations & Applications; Byrnes, C., Ed.; Birkhäuser: Boston, MA, USA, 1997; Volume 22.
20. Sun, J. Perturbation bounds for the generalized Schur decomposition. SIAM J. Matrix Anal. Appl. 1995, 16, 1328–1340.
21. Sun, J. Perturbation analysis of system Hessenberg and Hessenberg/triangular forms. Linear Algebra Appl. 1996, 241/243, 811–849.
22. The MathWorks, Inc. MATLAB Version 9.9.0.1538559 (R2020b); The MathWorks, Inc.: Natick, MA, USA, 2020.
23. Murnaghan, F.; Wintner, A. A canonical form for real matrices under orthogonal transformations. Proc. Natl. Acad. Sci. USA 1931, 17, 417–420.
24. Shapiro, H. A survey of canonical forms and invariants for unitary similarity. Linear Algebra Appl. 1991, 147, 101–167.
25. Higham, N. Functions of Matrices: Theory and Computation; SIAM: Philadelphia, PA, USA, 2008.
26. Brenner, J. The problem of unitary equivalence. Acta Math. 1951, 86, 297–308.
27. Littlewood, D. On unitary equivalence. J. Lond. Math. Soc. 1953, 28, 314–322.
28. Ikramov, K. The canonical Schur form of a matrix with simple eigenvalues. Dokl. Math. 2008, 77, 359–360.
29. Hartshorne, R. Algebraic Geometry; Springer: Berlin/Heidelberg, Germany, 1977; ISBN 978-0387902449.
30. Ikramov, K. On a constructive procedure for verifying whether a matrix can be made real by a unitary similarity transformation. Comput. Math. Math. Phys. 2010, 50, 383–386.
31. 754-2019; IEEE Standard for Floating-Point Arithmetic. IEEE Computer Society: Washington, DC, USA, 2019.
32. Glazman, M.; Ljubich, J. Finite-Dimensional Linear Analysis: A Systematic Presentation in Problem Form (Dover Books in Mathematics); Dover Publications: New York, NY, USA, 2006; ISBN 978-0486453323.
33. Taussky, O.; Zassenhaus, H. On the similarity transformation between a matrix and its transpose. Pac. J. Math. 1959, 9, 893–896.
34. Arnold, V. Geometric Methods in the Theory of Ordinary Differential Equations; Springer Science + Business Media: New York, NY, USA, 1988; ISBN 978-1461269946.

Table 1. Number of generic canonical Schur forms.

n	2	3	4	5	6	7	8	9	10
μ_{n}	1	3	20	210	3003	54,264	1,184,040	30,260,340	886,163,135

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
