Article

On the Lanczos Method for Computing Some Matrix Functions

Ying Gu, Hari Mohan Srivastava and Xiaolan Liu

1 School of Sciences and Arts, Suqian University, Suqian 223800, China
2 Department of Mathematics and Statistics, University of Victoria, Victoria, BC V8W 3R4, Canada
3 Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 40402, Taiwan
4 Department of Mathematics and Informatics, Azerbaijan University, AZ197 Baku, Azerbaijan
5 Section of Mathematics, International Telematic University Uninettuno, I-00186 Rome, Italy
6 Center for Converging Humanities, Kyung Hee University, 26 Kyungheedae-ro, Dongdaemun-gu, Seoul 02447, Republic of Korea
* Author to whom correspondence should be addressed.
Axioms 2024, 13(11), 764; https://doi.org/10.3390/axioms13110764
Submission received: 7 October 2024 / Revised: 30 October 2024 / Accepted: 1 November 2024 / Published: 4 November 2024
(This article belongs to the Section Mathematical Analysis)

Abstract

The study of matrix functions is highly significant and has important applications in control theory, quantum mechanics, signal processing, and machine learning. Previous work has mainly focused on using Krylov-type methods to efficiently compute $f(A)\beta$ and $\beta^T f(A)\beta$ when $A$ is symmetric. In this paper, we analyze the convergence of the Lanczos method for these computations, using polynomial approximation theory, in the case where $A$ is symmetric positive definite. Numerical results illustrate the effectiveness of our theoretical results.

1. Introduction

The study of matrix functions is highly significant and has important applications in control theory, quantum mechanics, signal processing, and machine learning. Discrete matrix functions have been investigated in [1,2]. Let $A \in \mathbb{R}^{n \times n}$ be a large sparse matrix and $\beta \in \mathbb{R}^n$ a nonzero real vector. Much research has focused on computing $f(A)\beta$ and $\beta^T f(A)\beta$ using Krylov-type methods [3,4,5], where $f$ is a function for which $f(A) \in \mathbb{R}^{n \times n}$ is well defined; $\beta^T f(A)\beta$ is called a bilinear form when $A$ is symmetric [6]. Theoretical results and numerical experiments have shown that $f(A)\beta$ and $\beta^T f(A)\beta$ can be computed efficiently by Krylov-type methods [7,8,9,10].
In this paper, we consider the special case where the matrix $A$ is symmetric positive definite and $f(\cdot) = \log(\cdot)$, $f(\cdot) = (\cdot)^{-1/2}$, or $f(\cdot) = (\cdot)^{-1}$. A logarithm of $A$ is any matrix $X$ satisfying $\exp(X) = A$ (refer to ([11], Chapter 11), [12,13], or [14]). Let the eigen decomposition of $A$ be
$$A = U \Lambda U^T,$$
with
$$\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_n),$$
where $U \in \mathbb{R}^{n \times n}$, $U^T U = I$, and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$. If $A$ has a nonpositive eigenvalue, then $\log(A)$ is not well defined as a real matrix ([11], Chapter 11); thus, a symmetric $A$ must be positive definite here. The matrix logarithm function is defined for such $A$ as follows ([11], Definition 1.2):
$$\log(A) := U \cdot \mathrm{diag}\big(\log(\lambda_1), \log(\lambda_2), \dots, \log(\lambda_n)\big) \cdot U^T \in \mathbb{R}^{n \times n}.$$
Similarly, the matrix square root is defined for $A$ as follows ([11], Definition 1.2):
$$A^{1/2} := U \cdot \mathrm{diag}\big(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\big) \cdot U^T \in \mathbb{R}^{n \times n},$$
so that
$$A^{-1/2} = \big(A^{1/2}\big)^{-1} = U \cdot \mathrm{diag}\big(\lambda_1^{-1/2}, \lambda_2^{-1/2}, \dots, \lambda_n^{-1/2}\big) \cdot U^T.$$
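Both spectral definitions are easy to check numerically on a small example. The following sketch is our own illustration (not part of the original paper), assuming NumPy and SciPy are available; it compares the formulas above with SciPy's dense `logm` and `sqrtm`.

```python
import numpy as np
from scipy.linalg import logm, sqrtm

rng = np.random.default_rng(0)

# Build a small symmetric positive definite matrix.
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # SPD by construction

# Eigen decomposition A = U diag(lam) U^T with lam > 0.
lam, U = np.linalg.eigh(A)

# Spectral definitions used above.
logA  = U @ np.diag(np.log(lam))  @ U.T  # log(A)
sqrtA = U @ np.diag(np.sqrt(lam)) @ U.T  # A^{1/2}

print(np.allclose(logA,  logm(A)))       # True
print(np.allclose(sqrtA, sqrtm(A)))      # True
```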
The estimation of the log-determinant of a matrix $A$ is widely used in, for example, Markov random field models [15], lattice quantum chromodynamics [16], statistical learning [17], and so on. Since $\log\det(A) = \operatorname{trace}(\log(A))$, the calculation of the bilinear form $\beta^T \log(A)\beta$ is crucial for estimating the log-determinant of $A$. The computation of $A^{-1/2}\beta$ arises in the context of Markov function problems and domain decomposition methods [18]. The computation of $\operatorname{trace}(A^{-1})$ has also received much attention [6,19]: the use of modified moments [6,20], Monte Carlo methods [21], and Gaussian quadrature [6,22] to estimate $\operatorname{trace}(A^{-1})$ for symmetric positive definite matrices has been proposed, with a wide range of applications in mathematics [6], physics [23], and statistics [24]. Moreover, the computation of $\beta^T A^{-1}\beta$ is important for estimating the trace of the matrix inverse [25,26]. The Lanczos method is commonly used to compute $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$.
Using polynomial approximation theory, we prove in this paper that the Lanczos method converges at the rates
$$\mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right), \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \text{and} \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right)$$
for computing $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$, respectively. Here, $k$ denotes the number of iterations and $\varkappa = \|A\|\,\|A^{-1}\|$ is the condition number of $A$, with $\|\cdot\|$ denoting the Euclidean norm of a matrix or a vector throughout this paper.
Throughout the paper, $A$ is assumed to be symmetric positive definite. The paper is organized as follows. We present the Lanczos approximations to $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$ in Section 2. In Section 3, a convergence analysis of the Lanczos method for computing these four quantities is provided. Numerical results are presented in Section 4 to illustrate the effectiveness of our bounds, and Section 5 concludes.

2. Lanczos Approximation of $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$

For the Krylov subspace
$$\mathcal{K}_k(A, \beta) = \mathrm{span}\{\beta, A\beta, \dots, A^{k-1}\beta\},$$
an orthonormal basis $Q_k$ can be obtained using the Lanczos process [6]. Moreover, it holds that
$$A Q_k = Q_k S_k + \theta_k q_{k+1} e_k^T, \qquad (1)$$
where $e_j$ denotes the $j$-th column of the identity matrix (in particular, $e_1 = (1, 0, \dots, 0)^T$),
$$Q_k = [q_1, q_2, \dots, q_k] \in \mathbb{R}^{n \times k}, \quad q_1 = \frac{\beta}{\|\beta\|}, \quad [Q_k\ q_{k+1}]^T [Q_k\ q_{k+1}] = I_{k+1},$$
and
$$S_k = Q_k^T A Q_k = \begin{pmatrix} \varpi_1 & \theta_1 & & & \\ \theta_1 & \varpi_2 & \theta_2 & & \\ & \ddots & \ddots & \ddots & \\ & & \theta_{k-2} & \varpi_{k-1} & \theta_{k-1} \\ & & & \theta_{k-1} & \varpi_k \end{pmatrix} \in \mathbb{R}^{k \times k}.$$
Thus, $\|\beta\|\, Q_k \log(S_k) e_1$, $\|\beta\|^2\, e_1^T \log(S_k) e_1$, $\|\beta\|\, Q_k S_k^{-1/2} e_1$, and $\|\beta\|^2\, e_1^T S_k^{-1} e_1$ are taken as approximations of $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$, respectively ([11], Section 13.2).
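The following sketch shows how these approximations can be formed in practice; it is our illustration rather than the authors' code, assuming NumPy and SciPy. It runs $k$ Lanczos steps with full reorthogonalization (a standard practical safeguard not discussed here) and evaluates $f(S_k) e_1$ via the eigen decomposition of the tridiagonal matrix $S_k$.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos(A, beta, k):
    """k steps of the Lanczos process (1): returns Q_k together with
    the diagonal w and off-diagonal t of the tridiagonal S_k."""
    n = beta.size
    Q = np.zeros((n, k))
    w = np.zeros(k)                       # varpi_1, ..., varpi_k
    t = np.zeros(max(k - 1, 0))           # theta_1, ..., theta_{k-1}
    Q[:, 0] = beta / np.linalg.norm(beta)
    for j in range(k):
        v = A @ Q[:, j]
        w[j] = Q[:, j] @ v
        # Full reorthogonalization against all previous Lanczos vectors.
        v -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)
        if j < k - 1:
            t[j] = np.linalg.norm(v)      # theta_j = 0 signals termination
            Q[:, j + 1] = v / t[j]
    return Q, w, t

def lanczos_f(A, beta, k, f):
    """Approximations ||beta|| Q_k f(S_k) e_1 and ||beta||^2 e_1^T f(S_k) e_1."""
    Q, w, t = lanczos(A, beta, k)
    d, W = eigh_tridiagonal(w, t)         # S_k = W diag(d) W^T
    fSk_e1 = W @ (f(d) * W[0, :])         # f(S_k) e_1, since W^T e_1 = W[0, :]
    nb = np.linalg.norm(beta)
    return nb * (Q @ fSk_e1), nb**2 * fSk_e1[0]
```

Calling `lanczos_f(A, beta, k, np.log)` gives the approximations to $\log(A)\beta$ and $\beta^T \log(A)\beta$; passing `lambda x: 1/np.sqrt(x)` or `lambda x: 1/x` instead gives those to $A^{-1/2}\beta$ and $\beta^T A^{-1}\beta$.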
If the Lanczos process (1) terminates for the first time at iteration $k_{\max}$, then $\theta_{k_{\max}} = 0$ and
$$A Q_{k_{\max}} = Q_{k_{\max}} S_{k_{\max}}. \qquad (2)$$
Let the eigen decomposition of $S_{k_{\max}}$ be
$$S_{k_{\max}} = W \cdot \mathrm{diag}(d_1, d_2, \dots, d_{k_{\max}}) \cdot W^T.$$
Hence, $\{d_1, d_2, \dots, d_{k_{\max}}\} \subseteq \{\lambda_1, \lambda_2, \dots, \lambda_n\}$. By ([11], Definition 1.4), there exist three polynomials $q$, $p$, and $h$ such that $A^{-1/2} = q(A)$, $\log(A) = p(A)$, and $A^{-1} = h(A)$. Hence, $q(\lambda_i) = \lambda_i^{-1/2}$, $p(\lambda_i) = \log(\lambda_i)$, and $h(\lambda_i) = \lambda_i^{-1}$ for $i = 1, 2, \dots, n$. In particular, $q(d_i) = d_i^{-1/2}$, $p(d_i) = \log(d_i)$, and $h(d_i) = d_i^{-1}$ for $i = 1, 2, \dots, k_{\max}$. Consequently,
$$\log(A)\beta = p(A)\beta = \|\beta\| \cdot p(A) Q_{k_{\max}} e_1 \overset{(2)}{=} \|\beta\| \cdot Q_{k_{\max}} p(S_{k_{\max}}) e_1 = \|\beta\| \cdot Q_{k_{\max}} W \cdot \mathrm{diag}\big(\log(d_1), \dots, \log(d_{k_{\max}})\big) \cdot W^T e_1 = \|\beta\| \cdot Q_{k_{\max}} \log(S_{k_{\max}}) e_1,$$
and
$$\beta^T \log(A)\beta = \|\beta\| \cdot \beta^T Q_{k_{\max}} \log(S_{k_{\max}}) e_1 = \|\beta\|^2 \cdot e_1^T \log(S_{k_{\max}}) e_1. \qquad (3)$$
Similarly, it holds that
$$A^{-1/2}\beta = \|\beta\| \cdot Q_{k_{\max}} S_{k_{\max}}^{-1/2} e_1 \quad \text{and} \quad \beta^T A^{-1}\beta = \|\beta\|^2 \cdot e_1^T S_{k_{\max}}^{-1} e_1.$$
Thus, when the Lanczos process terminates, $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$ are obtained exactly.
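This termination property can be checked directly; a small sketch of ours, reusing `lanczos_f` from the sketch above: if $A$ has only $r$ distinct eigenvalues, the Krylov subspace generated by a vector with components in all eigenspaces has dimension $r$, and $k = r$ steps already reproduce $\log(A)\beta$ exactly.

```python
import numpy as np

# Diagonal A with r = 4 distinct eigenvalues, each repeated 25 times.
d = np.repeat([1.0, 2.0, 5.0, 10.0], 25)          # n = 100
A = np.diag(d)
beta = np.ones(100)

vec, bil = lanczos_f(A, beta, 4, np.log)          # k = r = 4 steps
exact = np.log(d) * beta                          # log(A) beta for diagonal A
print(np.allclose(vec, exact))                    # True (up to rounding)
print(np.isclose(bil, beta @ exact))              # True
```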

3. Main Results

This section studies the convergence of the Lanczos approximations $\|\beta\| Q_k \log(S_k) e_1$, $\|\beta\|^2 e_1^T \log(S_k) e_1$, $\|\beta\| Q_k S_k^{-1/2} e_1$, and $\|\beta\|^2 e_1^T S_k^{-1} e_1$ to the corresponding exact quantities. We first give the following three lemmas.
Lemma 1 
(([27], eq. (5.6.2)) and ([28], p. 449)). For any $|z| < 1$ and $x \in [-1, 1]$, we have
$$\log(1 - 2 x z + z^2) = -2 \sum_{n=1}^{\infty} \frac{T_n(x)}{n}\, z^n,$$
and
$$\frac{1}{\sqrt{1 - 2 x z + z^2}} = \sum_{n=0}^{\infty} P_n(x)\, z^n, \qquad (4)$$
where $T_n(x)$ denotes the Chebyshev polynomial of the first kind of degree $n$,
$$T_n(x) := \cos(n \arccos x), \quad x \in [-1, 1],$$
and $P_n(x)$ denotes the Legendre polynomial of degree $n$,
$$P_n(x) = \frac{(-1)^n}{2^n n!} \cdot \frac{d^n}{dx^n} (1 - x^2)^n, \quad x \in [-1, 1].$$
Moreover,
$$\max_{|x| \le 1} |P_n(x)| = 1. \qquad (5)$$
Lemma 2 
([8], Lemma 2.2). If $f$ is analytic on $[\lambda_n, \lambda_1]$, then
$$\big\| f(A)\beta - \|\beta\|\, Q_j f(S_j) e_1 \big\| \le 2 \|\beta\| \cdot \min_{p \in \mathcal{P}_{j-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} |f(\lambda) - p(\lambda)|, \qquad (6)$$
$$\big| \beta^T f(A)\beta - \|\beta\|^2\, e_1^T f(S_j) e_1 \big| \le 2 \|\beta\|^2 \cdot \min_{p \in \mathcal{P}_{2j-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} |f(\lambda) - p(\lambda)|, \qquad (7)$$
where $j \in \mathbb{N}^+$ and $\mathcal{P}_s$ denotes the set of polynomials of degree at most $s \in \mathbb{N}^+$.
Lemma 3 
([29]). For any $\omega > 1$, we have
$$\min_{\phi \in \mathcal{P}_j} \max_{x \in [-1, 1]} \left| \frac{1}{\omega - x} - \phi(x) \right| \le \frac{\big(\omega + \sqrt{\omega^2 - 1}\big)^{-j}}{\sqrt{\omega^2 - 1}}.$$
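The two expansions in Lemma 1, including the sign conventions written above, admit a quick numerical sanity check; the following small sketch is ours, assuming NumPy and SciPy's `eval_legendre`.

```python
import numpy as np
from scipy.special import eval_legendre

x, z, N = 0.3, 0.6, 200                          # x in [-1, 1], |z| < 1

n = np.arange(1, N + 1)
Tn = np.cos(n * np.arccos(x))                    # Chebyshev T_n(x)
lhs = np.log(1 - 2 * x * z + z**2)
rhs = -2 * np.sum(Tn * z**n / n)                 # truncated Chebyshev series
print(abs(lhs - rhs))                            # ~ machine precision

m = np.arange(0, N + 1)
lhs = 1 / np.sqrt(1 - 2 * x * z + z**2)
rhs = np.sum(eval_legendre(m, x) * z**m)         # truncated Legendre series
print(abs(lhs - rhs))
```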
The main results of this article are presented below.
Theorem 1. 
Let $\varkappa = \|A\|\,\|A^{-1}\| = \lambda_1/\lambda_n$. For $k \ge 1$,
$$\epsilon_1(k) := \frac{\big\| \log(A)\beta - \|\beta\|\, Q_k \log(S_k) e_1 \big\|}{\|\log(A)\beta\|} \le \frac{2 \|\beta\|}{\|\log(A)\beta\|} \cdot \frac{\sqrt{\varkappa} + 1}{k} \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{k}, \qquad (8)$$
and
$$0 \le \epsilon_2(k) := \frac{\|\beta\|^2\, e_1^T \log(S_k) e_1 - \beta^T \log(A)\beta}{|\beta^T \log(A)\beta|} \le \frac{\|\beta\|^2}{|\beta^T \log(A)\beta|} \cdot \frac{\sqrt{\varkappa} + 1}{k} \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{2k}. \qquad (9)$$
Proof. 
Let
$$\eta := \frac{\lambda_1 + \lambda_n}{\lambda_1 - \lambda_n} = \frac{\varkappa + 1}{\varkappa - 1}. \qquad (10)$$
Then,
$$z_0 := \eta - \sqrt{\eta^2 - 1} = \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \in (0, 1), \quad \text{and} \quad 1 + z_0^2 = 2 z_0 \eta. \qquad (11)$$
In particular, $1 - 2 x z_0 + z_0^2 = 2 z_0 (\eta - x)$, so, according to Lemma 1, we have
$$\log\big( 2 z_0 (\eta - x) \big) = \log(1 - 2 x z_0 + z_0^2) = -2 \sum_{n=1}^{\infty} \frac{T_n(x)}{n}\, z_0^n. \qquad (12)$$
Consider the linear transformation
$$\lambda = \frac{\lambda_n - \lambda_1}{2}\, x + \frac{\lambda_1 + \lambda_n}{2} = \frac{\lambda_1 - \lambda_n}{2} (\eta - x), \quad x \in [-1, 1],$$
which maps $[-1, 1]$ bijectively onto $[\lambda_n, \lambda_1]$. Then,
$$\frac{\|\log(A)\beta\|\, \epsilon_1(k)}{\|\beta\|} \overset{(6)}{\le} 2 \min_{p \in \mathcal{P}_{k-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} |\log \lambda - p(\lambda)| = 2 \min_{q \in \mathcal{P}_{k-1}} \max_{-1 \le x \le 1} \left| \log\Big( \frac{\lambda_1 - \lambda_n}{2} (\eta - x) \Big) - q(x) \right| = 2 \min_{q \in \mathcal{P}_{k-1}} \max_{-1 \le x \le 1} \left| \log\big( 2 z_0 (\eta - x) \big) + \log\frac{\lambda_1 - \lambda_n}{4 z_0} - q(x) \right|.$$
Let
$$q(x) - \log\frac{\lambda_1 - \lambda_n}{4 z_0} = -2 \sum_{n=1}^{k-1} \frac{T_n(x)}{n}\, z_0^n \in \mathcal{P}_{k-1}.$$
By (12) and $|T_n(x)| \le 1$ for $x \in [-1, 1]$, we obtain
$$\frac{\|\log(A)\beta\|\, \epsilon_1(k)}{\|\beta\|} \le 2 \max_{x \in [-1, 1]} \left| \log\big( 2 z_0 (\eta - x) \big) + 2 \sum_{n=1}^{k-1} \frac{T_n(x)}{n}\, z_0^n \right| \overset{(12)}{\le} 4 \sum_{n=k}^{\infty} \frac{z_0^n}{n} = 4 z_0^k \sum_{n=0}^{\infty} \frac{z_0^n}{n + k} \le \frac{4 z_0^k}{k} \sum_{n=0}^{\infty} z_0^n = \frac{4 z_0^k}{k (1 - z_0)},$$
which, together with $1 - z_0 = \frac{2}{\sqrt{\varkappa} + 1}$, yields (8).
Furthermore, a linear map $\Psi: \mathbb{R}^{n_1 \times n_1} \to \mathbb{R}^{n_2 \times n_2}$ is called a unital positive linear map ([30], p. 5) if the following hold:
(i) $A$ is symmetric positive definite $\Rightarrow$ $\Psi(A)$ is also symmetric positive definite;
(ii) $\Psi(I_{n_1}) = I_{n_2}$.
Define
$$\Phi(\Delta) = \Delta(1\!:\!k, 1\!:\!k) \in \mathbb{R}^{k \times k}, \quad \Delta \in \mathbb{R}^{k_{\max} \times k_{\max}},$$
that is, $\Phi$ extracts the leading principal $k \times k$ submatrix; in particular, $\Phi(S_{k_{\max}}) = S_k$. Then $\Phi$ is a unital positive linear map, and since $\log$ is operator concave, ([30], Corollary 1.8) gives that
$$\log(S_k) - \Phi(\log(S_{k_{\max}})) = \log(\Phi(S_{k_{\max}})) - \Phi(\log(S_{k_{\max}})) \quad \text{is positive semidefinite}.$$
It follows that
$$0 \le e_1^T \log(S_k) e_1 - e_1^T \Phi(\log(S_{k_{\max}})) e_1 = e_1^T \log(S_k) e_1 - e_1^T \log(S_{k_{\max}}) e_1 \overset{(3)}{=} e_1^T \log(S_k) e_1 - \|\beta\|^{-2}\, \beta^T \log(A)\beta,$$
which yields the first inequality in (9). In light of (7), we see that
$$\frac{|\beta^T \log(A)\beta|\, \epsilon_2(k)}{\|\beta\|^2} \le 2 \min_{h \in \mathcal{P}_{2k-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} |\log(\lambda) - h(\lambda)| = 2 \min_{h \in \mathcal{P}_{2k-1}} \max_{-1 \le x \le 1} \left| \log\big( 2 z_0 (\eta - x) \big) + \log\frac{\lambda_1 - \lambda_n}{4 z_0} - h\Big( \frac{\lambda_n - \lambda_1}{2} x + \frac{\lambda_1 + \lambda_n}{2} \Big) \right|.$$
Analogously, consider
$$h\Big( \frac{\lambda_n - \lambda_1}{2} x + \frac{\lambda_1 + \lambda_n}{2} \Big) - \log\frac{\lambda_1 - \lambda_n}{4 z_0} = -2 \sum_{n=1}^{2k-1} \frac{T_n(x)}{n}\, z_0^n \in \mathcal{P}_{2k-1}.$$
Recall that $|T_n(x)| \le 1$ for $x \in [-1, 1]$. We then have
$$0 \le \frac{|\beta^T \log(A)\beta|\, \epsilon_2(k)}{\|\beta\|^2} \le 2 \max_{-1 \le x \le 1} \left| \log\big( 2 z_0 (\eta - x) \big) + 2 \sum_{n=1}^{2k-1} \frac{T_n(x)}{n}\, z_0^n \right| \overset{(12)}{\le} 4 \sum_{n=2k}^{\infty} \frac{z_0^n}{n} = 4 z_0^{2k} \sum_{n=0}^{\infty} \frac{z_0^n}{n + 2k} \le \frac{2 z_0^{2k}}{k} \sum_{n=0}^{\infty} z_0^n = \frac{2 z_0^{2k}}{k (1 - z_0)},$$
which yields (9). □
Theorem 2. 
For $k \ge 1$,
$$\epsilon_3(k) := \frac{\big\| A^{-1/2}\beta - \|\beta\|\, Q_k S_k^{-1/2} e_1 \big\|}{\|A^{-1/2}\beta\|} \le \frac{2 \|\beta\| (\sqrt{\varkappa} + 1)}{\|A^{-1/2}\beta\| \sqrt{\lambda_1 - \lambda_n}} \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{k + \frac{1}{2}}. \qquad (13)$$
Proof. 
By (11) and Lemma 1,
$$\big( 2 z_0 (\eta - x) \big)^{-\frac{1}{2}} = (1 - 2 x z_0 + z_0^2)^{-\frac{1}{2}} = \sum_{n=0}^{\infty} P_n(x)\, z_0^n.$$
It holds that
$$\big\| A^{-1/2}\beta - \|\beta\|\, Q_k S_k^{-1/2} e_1 \big\| \overset{(6)}{\le} 2 \|\beta\| \min_{p \in \mathcal{P}_{k-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} \left| \frac{1}{\sqrt{\lambda}} - p(\lambda) \right| \overset{(10)}{=} 2 \|\beta\| \sqrt{\frac{4 z_0}{\lambda_1 - \lambda_n}} \min_{\tilde q \in \mathcal{P}_{k-1}} \max_{x \in [-1, 1]} \left| \big( 2 z_0 (\eta - x) \big)^{-\frac{1}{2}} - \tilde q(x) \right| \le \frac{4 \|\beta\| \sqrt{z_0}}{\sqrt{\lambda_1 - \lambda_n}} \max_{x \in [-1, 1]} \left| \big( 2 z_0 (\eta - x) \big)^{-\frac{1}{2}} - \sum_{n=0}^{k-1} P_n(x)\, z_0^n \right| \overset{(4)}{\le} \frac{4 \|\beta\| \sqrt{z_0}}{\sqrt{\lambda_1 - \lambda_n}} \sum_{n=k}^{\infty} \max_{|x| \le 1} |P_n(x)|\, z_0^n \overset{(5)}{=} \frac{4 \|\beta\| \sqrt{z_0}}{\sqrt{\lambda_1 - \lambda_n}} \cdot \frac{z_0^k}{1 - z_0} = \frac{2 \|\beta\| (\sqrt{\varkappa} + 1)}{\sqrt{\lambda_1 - \lambda_n}}\, z_0^{k + \frac{1}{2}},$$
which yields (13). □
Finally, we analyze the convergence of $\|\beta\|^2\, e_1^T S_k^{-1} e_1$ to $\beta^T A^{-1}\beta$.
Theorem 3. 
For $k \ge 1$,
$$0 \le \epsilon_4(k) := \frac{\beta^T A^{-1}\beta - \|\beta\|^2\, e_1^T S_k^{-1} e_1}{\beta^T A^{-1}\beta} \le \frac{\|\beta\|^2}{\lambda_n\, \beta^T A^{-1}\beta} \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{2k - 1}. \qquad (14)$$
Proof. 
On the one hand, using $\beta = \|\beta\|\, Q_{k_{\max}} e_1$ and (2), we deduce that
$$\beta^T A^{-1}\beta - \|\beta\|^2\, e_1^T S_k^{-1} e_1 = \|\beta\|^2 \big( e_1^T Q_{k_{\max}}^T A^{-1} Q_{k_{\max}} e_1 - e_1^T S_k^{-1} e_1 \big) = \|\beta\|^2 \big( e_1^T S_{k_{\max}}^{-1} e_1 - e_1^T S_k^{-1} e_1 \big) \ge 0.$$
On the other hand,
$$\frac{\beta^T A^{-1}\beta - \|\beta\|^2\, e_1^T S_k^{-1} e_1}{\|\beta\|^2} \overset{(7)}{\le} 2 \min_{p \in \mathcal{P}_{2k-1}} \max_{\lambda \in [\lambda_n, \lambda_1]} \left| \frac{1}{\lambda} - p(\lambda) \right| \overset{(10)}{=} \frac{4}{\lambda_1 - \lambda_n} \min_{\tilde p \in \mathcal{P}_{2k-1}} \max_{x \in [-1, 1]} \left| \frac{1}{\eta - x} - \tilde p(x) \right| \overset{\text{Lemma 3}}{\le} \frac{4}{\lambda_1 - \lambda_n} \cdot \frac{\big( \eta + \sqrt{\eta^2 - 1} \big)^{-(2k-1)}}{\sqrt{\eta^2 - 1}} = \frac{4}{\lambda_n (\varkappa - 1)} \cdot \frac{\big( \eta + \sqrt{\eta^2 - 1} \big)^{-(2k-1)}}{\sqrt{\eta^2 - 1}}.$$
Recall that $\eta = \frac{\varkappa + 1}{\varkappa - 1}$; it follows that
$$\eta + \sqrt{\eta^2 - 1} = \frac{\sqrt{\varkappa} + 1}{\sqrt{\varkappa} - 1} \quad \text{and} \quad \sqrt{\eta^2 - 1} = \frac{2 \sqrt{\varkappa}}{\varkappa - 1} \ge \frac{4}{\varkappa - 1} \ \text{(the latter for } \varkappa \ge 4\text{)},$$
which yields (14). □
Remark 1. 
It is well known that the conjugate gradient method for solving the linear system $A y = \beta$ converges at the rate
$$\varsigma_k := \frac{\|\beta - A y_k\|}{\|\beta\|} = \mathcal{O}\!\left( \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{k} \right),$$
where $y_k$ is the approximation to $A^{-1}\beta$ obtained at the $k$-th step of the conjugate gradient method. It follows that $\epsilon_4(k) = \mathcal{O}(\varsigma_k^2)$. Thus, in practice, we can estimate the size of $\epsilon_4(k)$ from the size of $\varsigma_k$.
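A minimal sketch of this estimation strategy (our illustration, not the authors' code): run plain conjugate gradient on $A y = \beta$, record the relative residuals $\varsigma_k$, and use $\varsigma_k^2$ as a practical proxy for the decay of $\epsilon_4(k)$. We implement CG directly to keep the residual history explicit.

```python
import numpy as np

def cg_residuals(A, b, kmax):
    """Plain conjugate gradient for A y = b; returns the relative
    residuals ||b - A y_k|| / ||b|| for k = 1, ..., kmax."""
    y = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    nb = np.linalg.norm(b)
    hist = []
    for _ in range(kmax):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        y = y + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        hist.append(np.linalg.norm(r) / nb)
    return np.array(hist)      # varsigma_k; its square tracks eps_4(k)
```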
Remark 2. 
According to Theorems 1–3, the convergence rates of $\epsilon_1(k)$, $\epsilon_2(k)$, $\epsilon_3(k)$, and $\epsilon_4(k)$ are
$$\mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right), \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \text{and} \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right),$$
respectively. The smaller the condition number $\varkappa$ of $A$, the faster these four quantities converge to zero. It should be noted that the convergence rate of $\epsilon_2(k)$ is significantly higher than those of $\epsilon_1(k)$ and $\epsilon_3(k)$, and slightly higher than that of $\epsilon_4(k)$. The numerical experiments in Section 4 illustrate these two facts.
If $f$ is analytic on $[\lambda_n, \lambda_1]$, then
$$\big| \|\beta\|^2 \cdot e_1^T f(S_k) e_1 - \beta^T f(A)\beta \big| = \mathcal{O}\!\left( \left( \frac{\sqrt{\varkappa} - 1}{\sqrt{\varkappa} + 1} \right)^{2k} \right).$$
In contrast, for $f(\cdot) = \log(\cdot)$, the convergence rate provided by (9) is superior, by a factor of $k^{-1}$.

4. Numerical Experiments

We illustrate the effectiveness of the bounds provided by (8), (9), (13), and (14) with two examples.
Example 1. 
Let the matrix $B = \mathrm{diag}(t_i^n[a, b])$ [31], where
$$t_i^n[a, b] = \frac{b - a}{2} \left( t_i^n + \frac{a + b}{b - a} \right) \quad \text{with} \quad t_i^n = \cos\frac{(2i - 1)\pi}{2n}, \quad i = 1, 2, \dots, n.$$
Here, $n = 5000$, $a = -10$, $b = 10$, and $t_i^n[a, b]$ is the $i$-th zero of the degree-$n$ Chebyshev polynomial translated to $[a, b]$. Let
$$A = B + \zeta I, \qquad \beta = \frac{v}{\|v\|},$$
with
$$v = (1, 1, \dots, 1)^T \in \mathbb{R}^{5000}, \quad \text{and} \quad \zeta > 10.$$
The selected values of $\zeta$ are listed in Table 1.
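For concreteness, the diagonal of the Example 1 matrix can be assembled in a few lines (a sketch we add, assuming NumPy; $A$ is diagonal, so its condition number is read off directly):

```python
import numpy as np

n, a, b = 5000, -10.0, 10.0
i = np.arange(1, n + 1)
t = np.cos((2 * i - 1) * np.pi / (2 * n))        # Chebyshev zeros on [-1, 1]
nodes = (b - a) / 2 * (t + (a + b) / (b - a))    # translated to (a, b)

zeta = 10 + 1e-2
diagA = nodes + zeta                             # A = B + zeta I (diagonal)
beta = np.ones(n) / np.sqrt(n)                   # beta = v / ||v||
print(f"{diagA.max() / diagA.min():.2e}")        # about 2.00e+03, cf. Table 1
```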
Example 2. 
Consider the Strakoš matrix
$$A = \mathrm{diag}\big( \{\alpha_i\}_{i=1}^{n} \big) \quad \text{with} \quad \alpha_i = \alpha_1 + \frac{i - 1}{n - 1} (\alpha_n - \alpha_1)\, \rho^{\,n - i}, \quad i = 1, 2, \dots, n = 10000.$$
The eigenvalue distribution is controlled by the parameter $\rho$. We take
$$\alpha_1 = 1, \quad \rho = 0.99, \quad \text{and} \quad \beta = \frac{v}{\|v\|}, \quad \text{with} \quad v = (1, 1, \dots, 1)^T \in \mathbb{R}^{10000}$$
in this example. The selected values of $\alpha_n$ are listed in Table 2.
Different choices of $\zeta$ and $\alpha_n$ thus yield different values of $\varkappa$; see Table 1 and Table 2. In Figure 1, Figure 2, Figure 3 and Figure 4, we plot the curves of $\epsilon_1(k)$, $\epsilon_2(k)$, $\epsilon_3(k)$, and $\epsilon_4(k)$ for the different values of $\varkappa$, together with the bounds in (8), (9), (13), and (14). Figures 1–4 show that: (i) the number of iterations required grows as $\varkappa$ increases, which is consistent with our results (see (8), (9), (13), and (14)); (ii) the convergence rate of $\epsilon_2(k)$ is significantly higher than those of $\epsilon_1(k)$ and $\epsilon_3(k)$ and slightly higher than that of $\epsilon_4(k)$, which is also consistent with our results; (iii) the curves of the bounds in (8), (9), (13), and (14) are almost parallel to the curves of the true errors, except for $\epsilon_3(k)$ in Example 2 (see Figure 4c,d). That is, in most instances, our bounds capture the convergence rates of $\epsilon_1(k)$, $\epsilon_2(k)$, $\epsilon_3(k)$, and $\epsilon_4(k)$ effectively. All of this demonstrates the effectiveness of the theoretical results proposed in our paper.

5. Concluding Remarks

We established the convergence rates of the Lanczos method for computing $\log(A)\beta$, $\beta^T \log(A)\beta$, $A^{-1/2}\beta$, and $\beta^T A^{-1}\beta$ as
$$\mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \mathcal{O}\!\left(\frac{1}{k}\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right), \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{k}\right), \quad \text{and} \quad \mathcal{O}\!\left(\Big(\frac{\sqrt{\varkappa}-1}{\sqrt{\varkappa}+1}\Big)^{2k-1}\right),$$
respectively, where $A$ is assumed to be symmetric positive definite. In particular, the convergence rate of $\epsilon_2(k)$ is significantly higher than those of $\epsilon_1(k)$ and $\epsilon_3(k)$, and slightly higher than that of $\epsilon_4(k)$. Numerical experiments illustrate the effectiveness of these theoretical results.
The Lanczos method can also be used to compute other matrix functions, e.g., $\exp(A)\beta$, $A^{-p}\beta$ ($0 < p < 1$), $\sin(A)\beta$, $\cos(A)\beta$, and so on. The key to the corresponding convergence analysis is to find suitable Chebyshev-type expansions of these functions. Furthermore, our analysis is confined to the scenario where $A$ is symmetric positive definite. The situation is more complicated for a nonsymmetric matrix, whose eigenvalues are in general complex, so one must resort to the theory of polynomial approximation over complex domains; extending the analysis to nonsymmetric $A$ warrants further investigation. Finally, the numerical experiments make clear that the bounds we provide are not sharp when the condition number is large, and whether these upper bounds can be improved further is also an interesting question.

Author Contributions

Y.G. prepared the Mathematica programs, tables, and figures; Y.G., H.M.S., and X.L. wrote the manuscript text. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Suqian University Youth Foundation (Grant No. 2023XQNA15) and Suqian Sci & Tech Program (Grant No. K202419).

Data Availability Statement

This manuscript has no associated data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cuchta, T.; Grow, D.; Wintz, N. Discrete matrix hypergeometric functions. J. Math. Anal. Appl. 2023, 518, 126716.
2. Cuchta, T.; Luketic, R. Discrete hypergeometric Legendre polynomials. Mathematics 2021, 9, 2546.
3. Güttel, S. Rational Krylov approximation of matrix functions: Numerical methods and optimal pole selection. GAMM-Mitteilungen 2013, 36, 8–31.
4. Ilic, M.D.; Turner, I.W.; Simpson, D.P. A restarted Lanczos approximation to functions of a symmetric matrix. IMA J. Numer. Anal. 2009, 30, 1044–1061.
5. Ubaru, S.; Chen, J.; Saad, Y. Fast estimation of tr(f(A)) via stochastic Lanczos quadrature. SIAM J. Matrix Anal. Appl. 2017, 38, 1075–1099.
6. Golub, G.H.; Meurant, G. Matrices, Moments and Quadrature with Applications; Princeton University Press: Princeton, NJ, USA, 2009; Volume 30.
7. Frommer, A.; Schweitzer, M. Error bounds and estimates for Krylov subspace approximations of Stieltjes matrix functions. BIT Numer. Math. 2015, 56, 865–892.
8. Chen, T.; Hallman, E. Krylov-aware stochastic trace estimation. SIAM J. Matrix Anal. Appl. 2023, 44, 1218–1244.
9. Druskin, V.L.; Knizhnerman, L.A. Error bounds in the simple Lanczos procedure for computing functions of symmetric matrices and eigenvalues. Comput. Math. Math. Phys. 1991, 31, 20–30.
10. Frommer, A.; Kahl, K.; Lippert, T.; Rittich, H. 2-norm error bounds and estimates for Lanczos approximations to linear systems and rational matrix functions. SIAM J. Matrix Anal. Appl. 2013, 34, 1046–1065.
11. Higham, N.J. Functions of Matrices: Theory and Computation; SIAM: Philadelphia, PA, USA, 2008.
12. Cuchta, T.; Grow, D.; Wintz, N. A dynamic matrix exponential via a matrix cylinder transformation. J. Math. Anal. Appl. 2019, 479, 733–751.
13. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2012.
14. Saad, Y. Analysis of some Krylov subspace approximations to the matrix exponential operator. SIAM J. Numer. Anal. 1992, 29, 209–228.
15. Wainwright, M.J.; Jordan, M.I. Log-determinant relaxation for approximate inference in discrete Markov random fields. IEEE Trans. Signal Process. 2006, 54, 2099–2109.
16. Thron, C.; Dong, S.J.; Liu, K.F.; Ying, H.P. Padé–Z2 estimator of determinants. Phys. Rev. D 1998, 57, 1642–1653.
17. Affandi, R.H.; Fox, E.; Adams, R.; Taskar, B. Learning the parameters of determinantal point process kernels. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; pp. 1224–1232.
18. Arioli, M.; Loghin, D. Matrix Square-Root Preconditioners for the Steklov–Poincaré Operator; Technical Report RAL-TR-2008-003; Rutherford Appleton Laboratory: Didcot, UK, 2008.
19. Tang, J.; Saad, Y. A Probing Method for Computing the Diagonal of the Inverse of a Matrix; Report UMSI-2010-42; Minnesota Supercomputer Institute, University of Minnesota: Minneapolis, MN, USA, 2010.
20. Meurant, G. Estimates of the trace of the inverse of a symmetric matrix using the modified Chebyshev algorithms. Numer. Algorithms 2009, 51, 309–318.
21. Bai, Z.; Fahey, M.; Golub, G.H. Some large-scale matrix computation problems. J. Comput. Appl. Math. 1996, 74, 71–89.
22. Bai, Z.; Golub, G.H. Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Ann. Numer. Math. 1997, 4, 29–38.
23. Dong, S.J.; Liu, K.F. Stochastic estimation with Z2 noise. Phys. Lett. B 1994, 328, 130–136.
24. Ortner, B.; Krauter, A.R. Lower bounds for the determinant and the trace of a class of Hermitian matrices. Linear Algebra Appl. 1996, 236, 147–180.
25. Brezinski, C.; Fika, P.; Mitrouli, M. Moments of a linear operator, with applications to the trace of the inverse of matrices and the solution of equations. Numer. Linear Algebra Appl. 2012, 19, 937–953.
26. Wu, L.; Laeuchli, J.; Kalantzis, V.; Stathopoulos, A.; Gallopoulos, E. Estimating the trace of the matrix inverse by interpolating from the diagonal of an approximate inverse. J. Comput. Phys. 2016, 326, 828–844.
27. Beals, R.; Wong, R. Special Functions and Orthogonal Polynomials; Cambridge University Press: Cambridge, UK, 2016.
28. Olver, F.; Lozier, D.; Boisvert, R.; Clark, C. The NIST Handbook of Mathematical Functions; Cambridge University Press: New York, NY, USA, 2010.
29. Bernstein, S.N. Sur l'ordre de la meilleure approximation des fonctions continues par les polynômes de degré donné. Mém. Acad. Roy. Belg. 1912, 4, 1–104.
30. Zhan, X. Matrix Inequalities; Springer: Berlin/Heidelberg, Germany, 2002.
31. Zhang, L.; Shen, C.; Li, R. On the generalized Lanczos trust-region method. SIAM J. Optim. 2017, 27, 2110–2142.
Figure 1. Example 1: Lines correspond to $\epsilon_1(k)$ and $\epsilon_2(k)$; refer to Theorem 1.
Figure 2. Example 1: Lines correspond to $\epsilon_3(k)$ and $\epsilon_4(k)$; refer to Theorems 2 and 3.
Figure 3. Example 2: Lines correspond to $\epsilon_1(k)$ and $\epsilon_2(k)$; refer to Theorem 1.
Figure 4. Example 2: Lines correspond to $\epsilon_3(k)$ and $\epsilon_4(k)$; refer to Theorems 2 and 3.
Table 1. Example 1: Different values of $\varkappa$ for different choices of $\zeta$.

$\zeta$:      $10 + 10^{-1}$    $10 + 10^{-2}$    $10 + 10^{-3}$    $10 + 10^{-4}$
$\varkappa$:  $2.01 \times 10^2$    $2.00 \times 10^3$    $2.00 \times 10^4$    $1.99 \times 10^5$
Table 2. Example 2: Different values of $\varkappa$ for different choices of $\alpha_n$.

$\alpha_n$:   $10^2$    $10^3$    $10^4$    $10^5$
$\varkappa$:  $1 \times 10^2$    $1 \times 10^3$    $1 \times 10^4$    $1 \times 10^5$