Shared-Pole Carathéodory–Fejér Approximations for Linear Combinations of φ-Functions

Awad H. Al-Mohy

doi:10.3390/math13243985

Department of Mathematics, King Khalid University, Abha 61421, Saudi Arabia

Mathematics 2025, 13(24), 3985; https://doi.org/10.3390/math13243985

This article belongs to the Special Issue Numerical Methods for Scientific Computing


Abstract

We develop a shared denominator Carathéodory–Fejér (CF) method for efficiently evaluating linear combinations of

φ

-functions for matrices whose spectrum lies on the negative real axis, as required in exponential integrators for large stiff ODE systems. This entire family is approximated with a single set of poles (a common denominator). The shared pole set is obtained by assembling a stacked Hankel matrix from Chebyshev boundary data for all target functions and computing a single SVD; the zeros of the associated singular-vector polynomial, mapped via the standard CF slit transform, yield the poles. With the poles fixed, per-function residues and constants are recovered by a robust least squares fit on a suitable grid on the negative real axis. For any linear combination of resolvent operators applied to right-hand sides, the evaluation reduces to one shifted linear solve per pole with a single combined right-hand side, so the dominant cost matches that of computing a single

φ

-function action. Numerical experiments indicate geometric convergence at a rate consistent with Halphen’s constant, and for highly stiff problems our algorithm outperforms existing Taylor and Krylov polynomial-based algorithms.

Keywords:

matrix functions; Carathéodory–Fejér approximation; shared poles; exponential integrators; φ-functions

MSC:

15A16; 65F60; 65L05

1. Introduction

Over the last two decades, the

φ

-functions have become central objects in the design of exponential integrators for stiff systems; see, e.g., the survey by Hochbruck and Ostermann [1], the review by Minchev and Wright [2], and the references therein. Typical target problems include semilinear diffusion–reaction equations, Schrödinger-type equations, advection–diffusion–reaction systems, and more general evolution equations obtained by spatial discretization of parabolic or highly oscillatory PDEs in physics and engineering leading to a system of ODEs:

u^{'} (t) = A u (t) + g (u (t), t), u (t_{0}) = u_{0}, u (t) \in C^{d},

(1)

where

A \in C^{d \times d}

is obtained from spatial discretization and g is a nonlinear vector function. In such applications, the matrix A is usually large and sparse and in many cases it is negative semidefinite.

Exponential integrators have emerged as a successful class of numerical methods for systems of ODEs. A broad class of exponential integrators reduces each stage to a linear combination of

φ

-function actions of the form (see, e.g., [1,3,4,5,6,7,8,9,10]):

w = \sum_{j = 0}^{p} φ_{j} (A) v_{j} .

(2)

For a scalar argument

z \in C

, the functions

φ_{k}

admit a convenient integral representation

φ_{0} (z) = e^{z}, φ_{k} (z) = \frac{1}{(k - 1)!} \int_{0}^{1} e^{(1 - τ) z} τ^{k - 1} d τ, k \geq 1,

which, by the holomorphic functional calculus, extends directly to matrices:

φ_{k} (A) = \frac{1}{(k - 1)!} \int_{0}^{1} e^{(1 - τ) A} τ^{k - 1} d τ, k \geq 1 .

Equivalently, one can use the power series representation

φ_{k} (A) = \sum_{m = 0}^{\infty} \frac{A^{m}}{(m + k)!}

.
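The power series above can be checked against the closed forms in a few lines. The following is a minimal scalar sketch, not part of the paper's software: the function names `phi_series` and `phi_recurrence` are hypothetical, and the recurrence φ_{k+1}(z) = (φ_k(z) − 1/k!)/z is the standard identity relating consecutive φ-functions rather than a formula stated in the text.

```python
import math

def phi_series(k, z, terms=40):
    """Truncated power series phi_k(z) = sum_{m>=0} z^m / (m + k)!."""
    return sum(z**m / math.factorial(m + k) for m in range(terms))

def phi_recurrence(k, z):
    """phi_k(z) from phi_0(z) = e^z via phi_{j+1}(z) = (phi_j(z) - 1/j!) / z, z != 0."""
    value = math.exp(z)
    for j in range(k):
        value = (value - 1.0 / math.factorial(j)) / z
    return value

z = -0.5
for k in range(4):
    # The two columns should agree to roughly machine precision for moderate |z|.
    print(k, phi_series(k, z), phi_recurrence(k, z))
```

For very small |z| the recurrence loses accuracy through cancellation, whereas the series does not; for large negative z the situation is reversed, which is one reason rational approximations on the negative real axis are attractive.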

As an illustration of the pivotal role of the

φ

-functions in exponential integrators, consider the problem (1) with step size

h > 0

. The classical fourth-order exponential time differencing Runge–Kutta method (ETDRK4) of Cox and Matthews [9], modified later by Kassam and Trefethen [11] for stability, reads as follows. Given

u_{n} \approx u (t_{n})

, define the stage values

\begin{matrix} a_{n} & = e^{\frac{h}{2} A} u_{n} + A^{- 1} (e^{\frac{h}{2} A} - I) g (u_{n}, t_{n}), \\ b_{n} & = e^{\frac{h}{2} A} u_{n} + A^{- 1} (e^{\frac{h}{2} A} - I) g (a_{n}, t_{n} + \frac{h}{2}), \\ c_{n} & = e^{\frac{h}{2} A} a_{n} + A^{- 1} (e^{\frac{h}{2} A} - I) (2 g (b_{n}, t_{n} + \frac{h}{2}) - g (u_{n}, t_{n})) . \end{matrix}

Then, the step from

t_{n}

to

t_{n + 1} = t_{n} + h

is given by

\begin{matrix} u_{n + 1} & = e^{h A} u_{n} + h^{- 2} A^{- 3} \{ [- 4 I - h A + e^{h A} (4 I - 3 h A + {(h A)}^{2})] g (u_{n}, t_{n}) \\ & \quad + 2 [2 I + h A + e^{h A} (- 2 I + h A)] (g (a_{n}, t_{n} + \frac{h}{2}) + g (b_{n}, t_{n} + \frac{h}{2})) \\ & \quad + [- 4 I - 3 h A - {(h A)}^{2} + e^{h A} (4 I - h A)] g (c_{n}, t_{n} + h) \} . \end{matrix}

Using the fact that ([12], Section 10.7.4)

φ_{0} (z) = e^{z}, φ_{1} (z) = \frac{e^{z} - 1}{z}, φ_{2} (z) = \frac{e^{z} - 1 - z}{z^{2}}, φ_{3} (z) = \frac{e^{z} - 1 - z - \frac{1}{2} z^{2}}{z^{3}},

the ETDRK4 scheme can be written in the form

\begin{matrix} a_{n} & = φ_{0} (\frac{h}{2} A) u_{n} + \frac{h}{2} φ_{1} (\frac{h}{2} A) g (u_{n}, t_{n}), \\ b_{n} & = φ_{0} (\frac{h}{2} A) u_{n} + \frac{h}{2} φ_{1} (\frac{h}{2} A) g (a_{n}, t_{n} + \frac{h}{2}), \\ c_{n} & = φ_{0} (\frac{h}{2} A) a_{n} + \frac{h}{2} φ_{1} (\frac{h}{2} A) (2 g (b_{n}, t_{n} + \frac{h}{2}) - g (u_{n}, t_{n})) . \end{matrix}

The step from

t_{n}

to

t_{n + 1} = t_{n} + h

is then given by

\begin{matrix} u_{n + 1} & = φ_{0} (h A) u_{n} \\ & \quad + h φ_{1} (h A) g (u_{n}, t_{n}) \\ & \quad + h φ_{2} (h A) [- 3 g (u_{n}, t_{n}) + 2 g (a_{n}, t_{n} + \frac{h}{2}) + 2 g (b_{n}, t_{n} + \frac{h}{2}) - g (c_{n}, t_{n} + h)] \\ & \quad + 4 h φ_{3} (h A) [g (u_{n}, t_{n}) - g (a_{n}, t_{n} + \frac{h}{2}) - g (b_{n}, t_{n} + \frac{h}{2}) + g (c_{n}, t_{n} + h)] . \end{matrix}

This formulation is algebraically equivalent to the original ETDRK4 scheme of Cox and Matthews, but all matrix coefficients are expressed as linear combinations of the

φ

-functions

φ_{j}

,

j = 0, 1, 2, 3

, making the scheme stable and less prone to subtractive cancellation [11].
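The algebraic equivalence of the two update formulas can be verified in the scalar case, where A is replaced by a number a and z = ha. The sketch below is illustrative only; the helper names are hypothetical. It compares the weights multiplying g(u_n), g(a_n) + g(b_n), and g(c_n) in the original Cox and Matthews update with the corresponding combinations h(φ_1 − 3φ_2 + 4φ_3), h(2φ_2 − 4φ_3), and h(−φ_2 + 4φ_3) read off from the φ-form.

```python
import math

def phi(k, z, terms=60):
    """Scalar phi_k(z) from its power series (adequate for moderate |z|)."""
    return sum(z**m / math.factorial(m + k) for m in range(terms))

def cox_matthews_weights(h, a):
    """Weights of g(u_n), g(a_n)+g(b_n), g(c_n) in the original ETDRK4 update."""
    z = h * a
    e = math.exp(z)
    pref = h / z**3  # h^{-2} a^{-3} = h / (h a)^3
    w_u = pref * (-4 - z + e * (4 - 3 * z + z**2))
    w_ab = pref * 2 * (2 + z + e * (-2 + z))
    w_c = pref * (-4 - 3 * z - z**2 + e * (4 - z))
    return w_u, w_ab, w_c

def phi_form_weights(h, a):
    """The same three weights expressed through phi_1, phi_2, phi_3."""
    z = h * a
    p1, p2, p3 = phi(1, z), phi(2, z), phi(3, z)
    return h * (p1 - 3 * p2 + 4 * p3), h * (2 * p2 - 4 * p3), h * (-p2 + 4 * p3)

print(cox_matthews_weights(0.1, -20.0))
print(phi_form_weights(0.1, -20.0))
# For tiny |z| the first formula suffers from cancellation; the phi-form stays near h/6.
print(cox_matthews_weights(1.0, 1e-6)[0], phi_form_weights(1.0, 1e-6)[0])
```

At z = −2 the two sets of weights agree to rounding error, while at z = 10⁻⁶ only the φ-form is expected to return a value close to the limit 1/6.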

The need to compute combinations of the form (2) has led to a substantial body of work devoted specifically to the numerical evaluation of matrix

φ

-functions. Niesen and Wright [8] proposed a Krylov subspace algorithm for computing

φ_{k} (A) v

that is now widely used in exponential integrator codes. Their algorithm has been improved and extended to several Krylov-based algorithms that simultaneously evaluate several linear combinations of the form (2); see, e.g., Luan et al. [13], Gaudreault et al. [5], and Caliari et al. [14]. Recently, Al-Mohy [3] proposed an algorithm based on the Taylor series that simultaneously calculates several linear combinations of the form (2). For the implementation of rational Krylov subspaces, see, e.g., Moret [15], Bergermann and Stoll [16], and the references therein.

For

φ

-functions of medium-sized matrices, Berland, Skaflestad, and Wright [17] developed the expint MATLAB package, emphasizing that the stability and efficiency of exponential integrators hinge on the accurate evaluation of the underlying

φ

-functions. A more recent contribution is the scaling and recovering algorithm by Al-Mohy and Liu [18], which extends the work of Al-Mohy and Higham [19] for the matrix exponential.

In this context, our goal in the present work is to develop a CF-based rational approximation framework that exploits a shared set of poles for the family

{φ_{j}}_{j = 0}^{p}

. By constructing near-best scalar rational approximants with a common denominator on the negative real axis, and then lifting them to the matrix level, we obtain an efficient mechanism for evaluating general linear combinations of the form (2) using only one set of shifted factorizations

(A - θ_{ℓ} I)

across all indices j and all vectors

v_{j}

.

In this manuscript, we focus on approximating the linear combination (2) simultaneously by constructing a shared-pole CF rational approximation for each

φ_{j}

of the form [20]

r_{j} (x) = r_{\infty}^{(j)} + \sum_{ℓ = 1}^{n} \frac{η_{ℓ}^{(j)}}{x - θ_{ℓ}}, x \in R^{-} : = (- \infty, 0], j = 0, \dots, p,

(3)

where the pole set

{θ_{ℓ}}_{ℓ = 1}^{n}

is common to all j, while the residues

η_{ℓ}^{(j)}

and constants

r_{\infty}^{(j)}

depend on j. Equivalently,

r_{j} (x) = \frac{p_{j, n} (x)}{q_{n} (x)}, q_{n} (x) = \prod_{ℓ = 1}^{n} (x - θ_{ℓ}),

so that all

r_{j}

share the same denominator

q_{n}

and differ only in their numerators

p_{j, n}

.

The CF approach proposed by Trefethen [21,22] and Trefethen and Gutknecht [23] constructs near-best real rational approximants to scalar functions from boundary data using singular structure of a Hankel matrix. Its roots trace back to the early 20th-century work of Carathéodory and Fejér on the relation between the extrema of harmonic functions and their coefficients [24]; an extensive historical review is given in [23]. The use of CF approximants for matrix functions goes back to Trefethen, Weideman, and Schmelzer ([20], Section 4) for the matrix exponential. Schmelzer and Trefethen [25] subsequently used CF approximants to evaluate actions

φ_{j} (A) v

, typically with distinct pole sets for each j. They also advocate a common-pole strategy exploiting block-matrix identities among the

φ

-functions ([25], Section 4), yielding rational approximants that share a single denominator and thereby allowing reuse of the same shifted factorizations

A - θ_{ℓ} I

across multiple right-hand sides. They did not develop a general framework for arbitrary linear combinations of the form (2) with heterogeneous right-hand sides, which we provide below. Moreover, they explicitly remark that these approximants are far from optimal.

A key advantage of our approach is that the degree of the CF rational approximants required for a given accuracy (up to the conditioning of the problem) tends to be independent of the spectral radius of A: the same shared pole set yields geometric decay uniformly on

R^{-}

, so a large spectral radius does not force a higher degree n. By contrast, algorithms based on Taylor series, such as that of Al-Mohy [3], and those based on standard Krylov subspaces, such as [5,8,14], typically require degrees that grow with

∥ A ∥

[16].

This paper is organized as follows: In Section 2, we show how the shared poles, residues, and constants can be computed and propose an algorithm for their computation. In Section 3, we present several results showing that the shared-pole rational approximants retain the exponential accuracy. Section 4 presents the main algorithm for computing the linear combination in (2). Next we present our numerical experiments in Section 5. Finally, we draw some concluding remarks in Section 6.

In the next section, we describe how to construct the shared pole set

{θ_{ℓ}}_{ℓ = 1}^{n}

together with the per–function residues

η_{ℓ}^{(j)}

and constants

r_{\infty}^{(j)}

that define the CF approximants

r_{j}

in (3).

2. Constructing Shared Poles and Per–Function Residues and Constants

This section builds on the CF analysis and practice of [20,23,25]. Let

f_{0}, \dots, f_{p}

be analytic in

C \ R^{-}

and continuous up to

R^{-}

. Assume further that

| f_{j} (z) | \to 0

as

| z | \to \infty

in a left sector containing

R^{-}

, for

j = 0, \dots, p

.

We consider the conformal map

z (w) = σ {(\frac{w - 1}{w + 1})}^{2},

(4)

where

σ > 0

is a suitably chosen scaling parameter. It maps the interior of the unit disk onto

C \ R^{-}

and the unit circle

w = e^{i θ}

,

0 \leq θ \leq 2 π

, onto

R^{-}

, on which

z (e^{i θ}) = σ {(\frac{e^{i θ} - 1}{e^{i θ} + 1})}^{2} = σ \frac{t - 1}{t + 1}, t = cos θ .
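The boundary correspondence of the map (4) is easy to confirm numerically. The following sketch uses a hypothetical helper `slit_map` and the scale σ = 9 (the default adopted later in the text); it checks that points on the unit circle land on the negative real axis at σ(t − 1)/(t + 1) with t = cos θ.

```python
import cmath
import math

def slit_map(w, sigma=9.0):
    """Conformal map z(w) = sigma * ((w - 1) / (w + 1))**2 from (4)."""
    return sigma * ((w - 1) / (w + 1)) ** 2

sigma = 9.0
for theta in (0.3, 1.0, 2.5):
    w = cmath.exp(1j * theta)          # point on the unit circle
    t = math.cos(theta)
    # Left: image under the map; right: the closed form in t = cos(theta).
    print(theta, slit_map(w, sigma), sigma * (t - 1.0) / (t + 1.0))

# A point off the circle is mapped off the slit: w = 1/2 goes to z = 1 for sigma = 9.
print(slit_map(0.5, sigma))
```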

Since

(t - 1) / (t + 1) \to - \infty

as

t \to - 1^{+}

and

f_{j} (z) \to 0

as

z \to - \infty

along

R^{-}

by assumption, we set

F_{j} (- 1) : = lim_{t \to - 1^{+}} f_{j} (σ \frac{t - 1}{t + 1}) = 0,

so that

F_{j}

is continuous on

[- 1, 1]

. For each

j = 0, \dots, p

we define the boundary function

F_{j} (t) = f_{j} (z (w)) = f_{j} (σ \frac{t - 1}{t + 1}), t \in [- 1, 1] .

Since

F_{j}

is continuous on

[- 1, 1]

, it has the Chebyshev expansion

F_{j} (t) = \sum_{k = 0}^{\infty} c_{k}^{(j)} T_{k} (t),

where

c_{0}^{(j)} = a_{0}^{(j)}

and

c_{k}^{(j)} = 2 a_{k}^{(j)}

for

k \geq 1

, with

a_{k}^{(j)} = \frac{1}{π} \int_{- 1}^{1} \frac{F_{j} (t) T_{k} (t)}{\sqrt{1 - t^{2}}} d t = \frac{1}{2 π} \int_{0}^{2 π} F_{j} (cos θ) cos (k θ) d θ,

since the Chebyshev polynomial

T_{k} (t) = cos (k arccos t)

. Numerically these integrals are approximated using the composite trapezoidal rule in

θ

, which converges geometrically for

2 π

-periodic analytic functions ([26], Theorem 3.1). The fast Fourier transform (FFT) can then be used to compute the coefficients efficiently because of periodicity ([27], Section 5.5).
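As a rough illustration of this step, the sketch below computes the Chebyshev coefficients of the boundary function for f(x) = eˣ with the periodic trapezoidal rule. It is written with the standard library only, so it uses a direct O(NK) sum where the text uses the FFT; the function names and the parameter values K = 60, N = 512, σ = 9 are assumptions made for the example.

```python
import math

def boundary_function(f, sigma, t):
    """F(t) = f(sigma (t - 1) / (t + 1)), with the limiting value F(-1) = 0."""
    if t + 1.0 < 1e-15:
        return 0.0
    return f(sigma * (t - 1.0) / (t + 1.0))

def chebyshev_coefficients(f, sigma=9.0, K=60, N=512):
    """c_0, ..., c_K from the N-point periodic trapezoidal rule in theta."""
    thetas = [2.0 * math.pi * m / N for m in range(N)]
    samples = [boundary_function(f, sigma, math.cos(th)) for th in thetas]
    coeffs = []
    for k in range(K + 1):
        a_k = sum(s * math.cos(k * th) for s, th in zip(samples, thetas)) / N
        coeffs.append(a_k if k == 0 else 2.0 * a_k)  # c_0 = a_0, c_k = 2 a_k
    return coeffs

def chebyshev_eval(coeffs, t):
    """Evaluate sum_k c_k T_k(t) using T_k(t) = cos(k arccos t)."""
    theta = math.acos(t)
    return sum(c * math.cos(k * theta) for k, c in enumerate(coeffs))

coeffs = chebyshev_coefficients(math.exp)
for t in (-0.9, 0.0, 0.7):
    # Truncated expansion versus the boundary function itself.
    print(t, chebyshev_eval(coeffs, t), boundary_function(math.exp, 9.0, t))
```

In a production setting the inner sum would be replaced by a single FFT of the samples, as the text describes.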

Trefethen and Gutknecht ([23], Section 1) truncate

F_{j}

as

F_{j}^{(K)} (t) = \sum_{k = 0}^{K} c_{k}^{(j)} T_{k} (t),

and write

t = (w + \bar{w}) / 2

for

| w | = 1

, so that

T_{k} (t) = (w^{k} + {\bar{w}}^{k}) / 2

for

k \geq 1

, yielding

F_{j}^{(K)} (t) = c_{0}^{(j)} + \frac{1}{2} (g_{j} (w) + g_{j} (\bar{w})), g_{j} (w) = \sum_{k = 1}^{K} c_{k}^{(j)} w^{k} .

(5)

The key consequence of the CF theorem ([21], Theorem 3) is as follows: The polynomial

g_{j}

in (5) has a unique best rational approximation of type

(n, n)

, denoted

r_{n n}^{* (j)}

, with all poles lying outside the unit disk, and

g_{j} (w) - r_{n n}^{* (j)} (w) = s_{n + 1} w^{K} B_{j} (w), B_{j} (w) = \frac{u_{1} + u_{2} w + \dots + u_{K} w^{K - 1}}{v_{K} + v_{K - 1} w + \dots + v_{1} w^{K - 1}},

(6)

where

B_{j} (w)

is a finite Blaschke product of degree n and

v = {[v_{1}, v_{2}, \dots, v_{K}]}^{T}

and

u = {[u_{1}, u_{2}, \dots, u_{K}]}^{T}

are the right and left singular vectors, respectively, associated with the singular value

s_{n + 1}

of the Hankel matrix

H^{(j)} = [\begin{matrix} c_{1}^{(j)} & c_{2}^{(j)} & \dots & c_{K - 1}^{(j)} & c_{K}^{(j)} \\ c_{2}^{(j)} & c_{3}^{(j)} & \dots & c_{K}^{(j)} & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ c_{K - 1}^{(j)} & c_{K}^{(j)} & \dots & 0 & 0 \\ c_{K}^{(j)} & 0 & \dots & 0 & 0 \end{matrix}] .

(7)

That is,

H^{(j)} v = s_{n + 1} u

; see ([22], Proposition 3.1) and ([23], Theorem 1).
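The structure of the Hankel matrix (7) can be made concrete with a small helper. This is a sketch with a hypothetical function name; it only assembles the matrix from given coefficients and leaves the SVD to a numerical library.

```python
def hankel_from_coefficients(c):
    """K-by-K Hankel matrix (7) from c = [c_0, c_1, ..., c_K]; c_0 is not used."""
    K = len(c) - 1
    return [[c[i + j + 1] if i + j + 1 <= K else 0.0 for j in range(K)]
            for i in range(K)]

# Toy coefficients (not Chebyshev data): c_0 = 9, c_1 = 1, c_2 = 2, c_3 = 3.
H = hankel_from_coefficients([9.0, 1.0, 2.0, 3.0])
for row in H:
    print(row)
```

Entry (i, j) depends only on i + j, so the matrix is symmetric and constant along anti-diagonals, with zeros below the main anti-diagonal.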

In our applications the functions

f_{j}

are real-valued on

R^{-}

(for example, the scalar

φ

-functions), and, hence, the boundary functions

F_{j}

are real-valued on

[- 1, 1]

. Consequently, all Chebyshev coefficients

c_{k}^{(j)}

are real and the Hankel matrices

H^{(j)}

are real. It follows that we may take the singular vectors u and v in the relation

H^{(j)} v = s_{n + 1} u

to be real, so that the associated Blaschke product

B_{j}

and the type

(n, n)

rational approximant

r_{n n}^{* (j)}

have real coefficients and poles occurring in complex conjugate pairs. For simplicity, we adopt this real-valued setting throughout the paper. The construction extends in a straightforward way to complex-valued functions

f_{j}

, at the expense of working with complex Hankel matrices and singular vectors; we do not pursue this generality here.

Trefethen ([22], p. 303) shows that the best rational approximant

r_{n n}^{* (j)}

has n poles counted with multiplicity and they are the zeros of the denominator polynomial of

B_{j} (w)

in (6) that lie outside the unit circle. Thus, the residues and constants associated with partial fraction form of

r_{n n}^{* (j)}

are also encoded in the vector u representing the numerator of

B_{j} (w)

; see Trefethen and Gutknecht ([23], Section 1) for further details. After obtaining the poles and residues in the w-plane, the conformal map

z (w)

defined in (4) is applied to transplant them in the z-plane and construct a near-best rational approximation for

f_{j}

.

Our novel approach in this paper is to construct a family of rational approximants

r_{n n}^{(j)}

in a near-best sense for the functions

f_{j}

such that

r_{n n}^{(j)}

,

j = 0, \dots, p

, have the same denominator. That is, they share the same set of poles. We begin by forming the Hankel matrices

H^{(j)}

,

j = 0, \dots, p

, as prescribed above and then build the block-row matrix

H = [\begin{matrix} \sqrt{w_{0}} H^{(0)} \\ \sqrt{w_{1}} H^{(1)} \\ ⋮ \\ \sqrt{w_{p}} H^{(p)} \end{matrix}],

which we refer to as the stacked weighted Hankel matrix with positive weights

w_{j}

. We then compute the thin singular value decomposition (SVD)

H = U S V^{T}

and let

v_{n + 1} \in R^{K}

denote the right singular vector associated with the

(n + 1)

st singular value

s_{n + 1}

(singular values ordered nonincreasingly). Equivalently,

v_{n + 1}

maximizes the quadratic form

\sum_{j = 0}^{p} w_{j} {∥ H^{(j)} v ∥}_{2}^{2} over unit vectors v orthogonal to span {v_{1}, \dots, v_{n}} .

(8)
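The link between the stacked SVD and the weighted quadratic form follows directly from the block structure of H. The short derivation below is a sketch using only the definitions above; it also records the per-function bound that each block inherits from the single singular value s_{n+1}.

```latex
\|Hv\|_2^2
  = \sum_{j=0}^{p} \bigl\| \sqrt{w_j}\, H^{(j)} v \bigr\|_2^2
  = \sum_{j=0}^{p} w_j \, \| H^{(j)} v \|_2^2
  = v^{T} \Bigl( \sum_{j=0}^{p} w_j \, (H^{(j)})^{T} H^{(j)} \Bigr) v ,
\qquad
\sum_{j=0}^{p} w_j \, \| H^{(j)} v_{n+1} \|_2^2 = s_{n+1}^{2} ,
\qquad
\| H^{(j)} v_{n+1} \|_2 \le \frac{s_{n+1}}{\sqrt{w_j}} .
```

Hence the right singular vectors of H are the eigenvectors of the weighted sum of Gram matrices, and increasing w_j tightens the bound on the residual of the j-th Hankel block.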

The vector

v_{n + 1}

is regarded as the coefficients of the polynomial

Q (q) = \sum_{i = 1}^{K} {(v_{n + 1})}_{i} q^{K - i} .

We evaluate the zeros

{q_{ℓ}}

of Q lying outside the unit disk and retain the n zeros nearest to the unit circle. Finally, we map them to the slit plane via the conformal map

z (w)

in (4) as

θ_{ℓ} = z (q_{ℓ}), ℓ = 1, \dots, n .

The resulting

{θ_{ℓ}}_{ℓ = 1}^{n}

constitute a shared pole set (a common denominator) for all functions

{f_{j}}

. With these poles fixed, we approximate each

f_{j}

by a shared denominator partial fraction

r_{j} (x) = r_{\infty}^{(j)} + \sum_{ℓ = 1}^{n} \frac{η_{ℓ}^{(j)}}{x - θ_{ℓ}}, x \in R^{-} .

To compute the residues

η_{ℓ}^{(j)}

and constants

r_{\infty}^{(j)}

, we choose suitable grid points

{x_{k}}_{k = 1}^{τ} \subset [- M, 0] \subset R^{-}

(for some truncation parameter

M > 0

proportional to

σ

in practice) and solve a robust least squares problem

min_{r_{\infty}^{(j)}, η^{(j)}} ∥ L β^{(j)} - f_{j} (x) ∥_{2}, L = [\begin{matrix} 1 & {({(x_{k} - θ_{ℓ})}^{- 1})}_{k, ℓ} \end{matrix}], β^{(j)} = [\begin{matrix} r_{\infty}^{(j)} \\ η_{1}^{(j)} \\ ⋮ \\ η_{n}^{(j)} \end{matrix}],

(9)

where

f_{j} (x) = {[f_{j} (x_{1}), f_{j} (x_{2}), \dots, f_{j} (x_{τ})]}^{T}

and

1

denotes the column vector whose entries are all ones. Another way to compute the residues

η_{ℓ}^{(j)}

and constants

r_{\infty}^{(j)}

is to use the

(n + 1)

st column of U, denoted by u. Partitioning u conformally with the block rows of H,

u = {[u^{(0)}, u^{(1)}, \dots, u^{(p)}]}^{T},

we have

H^{(j)} v_{n + 1} = \frac{s_{n + 1}}{\sqrt{w_{j}}} u^{(j)}, j = 0, \dots, p .

This shows that the blocks

u^{(j)}

play, for each

f_{j}

, a role analogous to the left singular vector in the one-function CF construction. In principle, one could recover residue information from these vectors along the lines of [22,23]. In Algorithm 1, however, we opt for the simpler and more robust least squares procedure (9); we do not exploit this alternative in our implementation.
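The least squares step (9) can be illustrated on synthetic data. In the sketch below the poles, the "true" coefficients, and the grid are all hypothetical values chosen for the example, not CF data, and the fit solves the normal equations by Gaussian elimination as a simplification; the text uses a QR factorization of L, which is the more robust choice.

```python
def fit_residues(xs, ys, poles):
    """Fit r(x) = r_inf + sum_l eta_l / (x - theta_l) for fixed poles.

    Solves the normal equations L^H L beta = L^H y and returns
    beta = [r_inf, eta_1, ..., eta_n].
    """
    rows = [[1.0 + 0j] + [1.0 / (x - th) for th in poles] for x in xs]
    m = len(poles) + 1
    # Augmented matrix [L^H L | L^H y].
    G = [[sum(row[i].conjugate() * row[j] for row in rows) for j in range(m)]
         + [sum(row[i].conjugate() * y for row, y in zip(rows, ys))]
         for i in range(m)]
    for col in range(m):  # elimination with partial pivoting
        piv = max(range(col, m), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        for r in range(col + 1, m):
            factor = G[r][col] / G[col][col]
            for c in range(col, m + 1):
                G[r][c] -= factor * G[col][c]
    beta = [0j] * m
    for i in reversed(range(m)):  # back substitution
        s = G[i][m] - sum(G[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = s / G[i][i]
    return beta

poles = [-1 + 2j, -1 - 2j]                 # hypothetical conjugate pair
true_beta = [0.5 + 0j, 1 - 1j, 1 + 1j]     # r_inf, eta_1, eta_2
xs = [-(10.0 ** (-3 + 5 * k / 29)) for k in range(30)]  # log-dense grid on [-100, 0)
ys = [true_beta[0] + sum(e / (x - th) for e, th in zip(true_beta[1:], poles))
      for x in xs]
beta = fit_residues(xs, ys, poles)
print(beta)
```

Because the matrix L depends only on the poles and the grid, the same factorization can be reused for every function in the family; only the right-hand side changes.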

Algorithm 1 Shared Denominator CF on

R^{-}

: poles

{θ_{ℓ}}

, residues

{η_{ℓ}^{(j)}}

, and constants

r_{\infty}^{(j)}

.

Input: Functions $f_{0}, \dots, f_{p}$ satisfying the assumptions in Section 2; target degree n; CF parameters: scale $σ > 0$, truncation $K > n$, number of Chebyshev points $N > K$; positive weights $w_{0}, \dots, w_{p}$ (default $w_{j} \equiv 1$); residue fit parameters: LS grid size $τ$; far-left bound M (e.g., $M ≃ 100 σ$).
Output: Shared poles ${θ_{ℓ}}_{ℓ = 1}^{n}$; per-function constants $r_{\infty}^{(j)}$ and residues ${η_{ℓ}^{(j)}}_{ℓ = 1}^{n}$.

1: Map and sample on the unit circle
2: Define $x (t) = σ \frac{t - 1}{t + 1}$ and sample $t_{k} = cos (2 π k / N)$, $k = 0, \dots, N - 1$.
3: for $j = 0, \dots, p$ do
4: $F_{j} (t_{k}) \leftarrow f_{j} (x (t_{k}))$;
5: Compute the Chebyshev coefficients $c_{0}^{(j)}, \dots, c_{K}^{(j)}$ using the FFT.
6: Form the $K \times K$ Hankel matrix $H^{(j)}$. ▹ see (7)
7: end for
8: Stacked-weighted Hankel SVD and pole extraction
9: Build $H = [\begin{matrix} \sqrt{w_{0}} H^{(0)} \\ \sqrt{w_{1}} H^{(1)} \\ ⋮ \\ \sqrt{w_{p}} H^{(p)} \end{matrix}]$.
10: Compute the thin SVD $H = U S V^{T}$. ▹ singular values nonincreasing.
11: Extract $v_{n + 1}$, the $(n + 1)$st column of V.
12: Compute the roots of the polynomial $\sum_{i = 1}^{K} {(v_{n + 1})}_{i} q^{K - i}$.
13: Select the n roots nearest to the unit circle with $| q | > 1$, denoted ${q_{ℓ}}_{ℓ = 1}^{n}$.
14: $θ_{ℓ} = σ {(\frac{q_{ℓ} - 1}{q_{ℓ} + 1})}^{2}, ℓ = 1, \dots, n$. ▹ shared poles
15: Per-function residues and constants via x-plane least squares on $[- M, 0]$
16: Build a log-dense grid ${x_{k}}_{k = 1}^{τ}$ on $[- M, 0]$.
17: Construct the matrix $L = [\begin{matrix} 1 & {({(x_{k} - θ_{ℓ})}^{- 1})}_{k, ℓ} \end{matrix}] \in C^{τ \times (n + 1)}$.
18: Compute the thin $Q R$ factorization $L = Q R$.
19: for $j = 0, \dots, p$ do
20: $y \leftarrow {[f_{j} (x_{k})]}_{k = 1}^{τ}$
21: Solve $R β^{(j)} = Q^{*} y$ ▹ least squares for $β^{(j)} \in C^{n + 1}$
22: Set $r_{\infty}^{(j)} = β_{1}^{(j)}$ and $η_{ℓ}^{(j)} = β_{ℓ + 1}^{(j)}$ for $ℓ = 1, \dots, n$.
23: end for
24: return shared poles ${θ_{ℓ}}$, constants ${r_{\infty}^{(j)}}$, residues ${η_{ℓ}^{(j)}}$.

The quadratic form in (8) makes clear how the single denominator is optimized across the family

{f_{j}}

. Larger

w_{j}

bias the pole set toward features of

f_{j}

(improving its fit, potentially at the expense of others). A simple and effective choice is

w_{0} = \dots = w_{p}

, so each

f_{j}

contributes equally to the shared poles. In the terminology of [25], the “common poles” used in practice correspond to taking the denominator from a single base function (often

e^{x}

); in our framework, this is the extreme weight choice

w_{0} = 1

and

w_{j} = 0

for

1 \leq j \leq p

. This perspective aligns with their comment that such approximations are “far from optimal.”

3. Exponential Accuracy of Shared-Denominator Rational Approximants

Let

D : = C \ R^{-}

. Write

A (D)

for the class of functions analytic in D and continuous up to

R^{-}

in the nontangential sense. The results below show that shared denominator (shared-pole) CF rational approximants retain exponential accuracy. Trefethen, Weideman, and Schmelzer [20] observed that quadrature formulas can be interpreted as rational approximations (and conversely). Consequently, it suffices to establish exponential convergence of the periodic trapezoidal rule for the boundary integral representation (after mapping D to the exterior of the unit disk); this holds whenever the integrand extends analytically to a fixed annulus about the unit circle, and thus yields exponential accuracy for the CF approximants; see, e.g., ([26], Theorem 2.1).

Theorem 1.

Fix a finite family

F = {f_{0}, \dots, f_{p}} \subset A (D)

. There exist constants

C > 0

and

ρ > 1

, depending only on F and on

R^{-}

, such that for each

n \in N

there are polynomials

q_{n} (deg q_{n} \leq n), p_{j, n} (deg p_{j, n} \leq n, 0 \leq j \leq p),

with a single denominator

q_{n}

(independent of j) satisfying

max_{0 \leq j \leq p} sup_{x \leq 0} |f_{j} (x) - p_{j, n} (x) / q_{n} (x)| \leq C ρ^{- n} .

Proof.

Let

Φ : D \to {w : | w | > 1}

be a conformal bijection with

Φ (\infty) = \infty

and

Φ^{'} (\infty) > 0

, and let

g_{D} (z, \infty) = log | Φ (z) |

. For any

σ > 1

define the level curve

Γ_{σ} : = {z \in D : | Φ (z) | = σ}

and parameterize it by

ζ (θ) = Φ^{- 1} (σ e^{i θ})

,

0 \leq θ < 2 π

. For

x \in R^{-}

,

f_{j} (x) = \frac{1}{2 π} \int_{0}^{2 π} \frac{f_{j} (ζ (θ)) ζ^{'} (θ)}{ζ (θ) - x} d θ = \frac{1}{n} \sum_{m = 0}^{n - 1} \frac{ω_{m} f_{j} (ζ_{m})}{ζ_{m} - x} + E_{j, n} (x),

with

ζ_{m} = ζ (2 π m / n)

and

ω_{m} = ζ^{'} (2 π m / n)

. The integrand is analytic and bounded in a strip

| Im θ | < a

, so by the exponentially convergent trapezoidal rule,

{sup}_{x \in K} | E_{j, n} (x) | \leq C_{1} η^{- n}

for some

η > 1

independent of j and x; see ([26], Theorem 2.1). Clearing denominators gives a rational

r_{j, n} (x) = p_{j, n} (x) / q_{n} (x)

with

q_{n} (x) = \prod_{m = 0}^{n - 1} (ζ_{m} - x)

, common to all j. Taking

ρ = η

proves the claim. □

This leads to two important corollaries. The first shows that any linear combination admits shared denominator approximants with exponential accuracy; the second treats matrix arguments.

Corollary 1.

For any

α \in C^{p + 1}

, there exist

C_{α} > 0

and

ρ > 1

such that the combination

F_{α} : = \sum_{j = 0}^{p} α_{j} f_{j}

admits type

(n, n)

shared denominator approximants with

sup_{x \leq 0} |F_{α} (x) - P_{n} (x) / q_{n} (x)| \leq C_{α} ρ^{- n} .

Proof.

Let

r_{j, n} (x) = p_{j, n} (x) / q_{n} (x)

be the shared denominator approximants from Theorem 1, so that

sup_{x \leq 0} |f_{j} (x) - r_{j, n} (x)| \leq C_{j} ρ^{- n}, 0 \leq j \leq p .

For

α = (α_{0}, \dots, α_{p})

define

P_{n} (x) : = \sum_{j = 0}^{p} α_{j} p_{j, n} (x)

and

R_{n} (x) : = P_{n} (x) / q_{n} (x)

. Then,

|F_{α} (x) - R_{n} (x)| = |\sum_{j = 0}^{p} α_{j} (f_{j} (x) - r_{j, n} (x))| \leq \sum_{j = 0}^{p} |α_{j}| C_{j} ρ^{- n};

hence,

{sup}_{x \leq 0} | F_{α} (x) - P_{n} (x) / q_{n} (x) | \leq C_{α} ρ^{- n}

with

C_{α} : = \sum_{j = 0}^{p} | α_{j} | C_{j}

. □

Corollary 2.

Let A be normal with spectrum

σ (A) \subset R^{-}

. Then there exist

C > 0

and

ρ_{1} > 1

such that

{∥f_{j} (A) - p_{j, n} (A) / q_{n} (A)∥}_{2} \leq sup_{x \leq 0} |f_{j} (x) - p_{j, n} (x) / q_{n} (x)| \leq C ρ_{1}^{- n} .

If A is nonnormal and the field of values

W (A) \subset C \ R^{-}

, Crouzeix’s theorem [28] yields

∥f_{j} (A) - p_{j, n} (A) / q_{n} (A)∥ \leq C^{'} ρ_{2}^{- n},

for some

C^{'} > 0

and

ρ_{2} > 1

.

Proof.

First, suppose A is normal and

σ (A) \subset R^{-}

. By the spectral theorem, there is a unitary U and a real diagonal

Λ

with entries in

R^{-}

such that

A = U Λ U^{*}

. For any rational r without poles on

R^{-}

and since the 2-norm is unitarily invariant, we have

\begin{matrix} {∥f_{j} (A) - r (A)∥}_{2} & = {∥U (f_{j} (Λ) - r (Λ)) U^{*}∥}_{2} \\ & = {∥f_{j} (Λ) - r (Λ)∥}_{2} = max_{λ \in σ (A)} |f_{j} (λ) - r (λ)| . \end{matrix}

Apply Theorem 1 with a contour

Γ_{σ}

enclosing

R^{-}

; the same construction provides a uniform bound on any compact subset inside

Γ_{σ}

, hence on

σ (A) \subset R^{-}

:

{∥f_{j} (A) - p_{j, n} (A) / q_{n} (A)∥}_{2} \leq sup_{x \leq 0} |f_{j} (x) - p_{j, n} (x) / q_{n} (x)| \leq C ρ_{1}^{- n},

for some

C > 0

and

ρ_{1} > 1

.

If A is nonnormal and

W (A) \subset C \ R^{-}

, pick

σ > 1

so that the level curve

Γ_{σ}

encloses the compact set

W (A)

at positive distance. By the same Cauchy-trapezoidal construction, the rational

r_{n} = p_{j, n} / q_{n}

satisfies

sup_{z \in W (A)} |f_{j} (z) - r_{n} (z)| \leq \hat{C} ρ_{2}^{- n},

for some

\hat{C} > 0

and

ρ_{2} > 1

. Therefore, we have

∥f_{j} (A) - r_{n} (A)∥ \leq C_{Crx} sup_{z \in W (A)} |f_{j} (z) - r_{n} (z)| \leq C^{'} ρ_{2}^{- n}

by Crouzeix’s theorem with the universal constant

C_{Crx} \leq 1 + \sqrt{2}

. □

In view of classical potential theoretic results on best type-

(n, n)

rational approximation of

e^{x}

on

R^{-}

, the geometric rate

ρ

in Theorem 1 can be identified with the reciprocal of Halphen’s constant

H \approx 1 / 9.28903

, i.e.,

ρ = 1 / H \approx 9.28903

; see ([20], Section 4). Motivated by this prediction, in the numerical sweep of Section 5.1 we vary the CF scale parameter

σ \in {5, 7, 9, 11, 13}

and measure the familywise worst error

E_{n}

and fitted rates

ρ^{*}

. As reported in Table 1, for all

p = 0, \dots, 4

the fitted rates lie between 9 and 10 and

σ = 9

consistently yields both the largest

ρ^{*}

and the smallest worst error

E_{14}

. Thus,

σ = 9

is a robust, near-optimal choice for the CF scale parameter, and we adopt this fixed default in Algorithm 1.
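The predicted rate gives a quick rule of thumb for choosing the degree. The sketch below assumes an error model of the form C ρ^{−n} with C = 1, which is an idealization made for the example (the true constant depends on the family), and uses hypothetical helper names.

```python
import math

HALPHEN_RATE = 9.28903  # rho = 1/H, as quoted in the text

def predicted_error(n, rate=HALPHEN_RATE, constant=1.0):
    """Model error C * rho**(-n) for a degree-n approximant."""
    return constant * rate ** (-n)

def degree_for_tolerance(tol, rate=HALPHEN_RATE, constant=1.0):
    """Smallest n with C * rho**(-n) <= tol under the same model."""
    return math.ceil(math.log(constant / tol) / math.log(rate))

print(predicted_error(14))          # model error at the default degree n = 14
print(degree_for_tolerance(1e-13))  # degree suggested by the model for 1e-13
```

Under this model each additional pole gains almost one decimal digit, and a target of 10⁻¹³ corresponds to the default degree n = 14 used later in Algorithm 2.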

Table 1. Fitted rate

ρ^{*}

and familywise worst error

E_{14}

on

[- M, 0]

with

M = 100 σ

,

5 \leq σ \leq 13

.

4. Evaluating the Linear Combination of the φ-Functions with Shared Poles

The functions of interest

{φ_{j}}_{j = 0}^{p}

are evaluated on the slit plane

C \ R^{-}

, with

R^{-}

playing the role of a branch cut, and they satisfy the decay property

φ_{j} (z) \to 0

as

| z | \to \infty

in a sector containing the negative real axis. A shared denominator captures the common structure across the family. If

r_{j}

denotes a CF rational approximant to

φ_{j}

as in (3), then for matrix arguments with spectra lying on the negative real axis we have

r_{j} (A) = r_{\infty}^{(j)} I + \sum_{ℓ = 1}^{n} η_{ℓ}^{(j)} {(A - θ_{ℓ} I)}^{- 1} .

Consequently, the linear combination (2) is approximated by

\begin{matrix} \sum_{j = 0}^{p} φ_{j} (A) v_{j} & \approx \sum_{j = 0}^{p} (r_{\infty}^{(j)} v_{j} + \sum_{ℓ = 1}^{n} η_{ℓ}^{(j)} {(A - θ_{ℓ} I)}^{- 1} v_{j}) \\ & = \sum_{j = 0}^{p} r_{\infty}^{(j)} v_{j} + \sum_{ℓ = 1}^{n} {(A - θ_{ℓ} I)}^{- 1} v_{rhs}^{(ℓ)}, \end{matrix}

where

v_{rhs}^{(ℓ)} : = \sum_{j = 0}^{p} η_{ℓ}^{(j)} v_{j}, ℓ = 1, \dots, n .

Thus, we solve only n shifted linear systems—one per pole—so the cost scales with the pole count n, rather than with

(p + 1) n

as it would if each

r_{j}

had a different pole set.
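The saving from combining right-hand sides can be seen on a toy problem. In the sketch below A is diagonal, so each shifted solve is an elementwise division, and the poles and residues are arbitrary hypothetical numbers rather than CF data; only the counting of solves and the equality of the two results are the point.

```python
def shifted_solve(lams, theta, rhs):
    """(A - theta I)^{-1} rhs for the diagonal matrix A = diag(lams)."""
    return [r / (lam - theta) for lam, r in zip(lams, rhs)]

lams = [-0.1, -1.0, -10.0, -100.0]               # spectrum on the negative real axis
poles = [-1 + 2j, -1 - 2j, -3 + 0.5j]            # hypothetical shared poles
residues = [[0.3 + 0.1j, 0.3 - 0.1j, -0.2j],     # eta^{(0)}
            [1.0 - 0.5j, 1.0 + 0.5j, 0.4 + 0j]]  # eta^{(1)}
vectors = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]

# Naive evaluation: one solve per (function, pole) pair.
naive, naive_solves = [0j] * 4, 0
for eta, v in zip(residues, vectors):
    for l, theta in enumerate(poles):
        x = shifted_solve(lams, theta, [eta[l] * vi for vi in v])
        naive = [a + b for a, b in zip(naive, x)]
        naive_solves += 1

# Shared poles: one solve per pole with the combined right-hand side v_rhs^{(l)}.
combined, combined_solves = [0j] * 4, 0
for l, theta in enumerate(poles):
    v_rhs = [sum(eta[l] * v[i] for eta, v in zip(residues, vectors)) for i in range(4)]
    x = shifted_solve(lams, theta, v_rhs)
    combined = [a + b for a, b in zip(combined, x)]
    combined_solves += 1

print(naive_solves, combined_solves)
print(max(abs(a - b) for a, b in zip(naive, combined)))
```

With two functions and three poles the naive loop performs six solves and the combined loop three, and the two results coincide up to rounding because the solves are linear in the right-hand side.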

A substantial computational saving is possible when the matrix A and the vectors

{v_{j}}_{j = 0}^{p}

are real, which is the most common situation in practice. Because each

φ_{j}

is real-valued on

R^{-}

, the CF approximants

r_{j}

can be chosen with real coefficients, so all non-real poles and residues occur in complex conjugate pairs (apart from any real poles). In this case, we order the poles so that for some even degree n we have

θ_{n + 1 - ℓ} = \bar{θ_{ℓ}}, ℓ = 1, \dots, \frac{n}{2},

and we reorder the residues accordingly:

η_{n + 1 - ℓ}^{(j)} = \bar{η_{ℓ}^{(j)}}, j = 0, \dots, p .

Since A and the

v_{j}

are real, this implies

v_{rhs}^{(n + 1 - ℓ)} = \sum_{j = 0}^{p} η_{n + 1 - ℓ}^{(j)} v_{j} = \bar{\sum_{j = 0}^{p} η_{ℓ}^{(j)} v_{j}} = \bar{v_{rhs}^{(ℓ)}}, ℓ = 1, \dots, \frac{n}{2} .

Hence, the corresponding solutions satisfy

x^{(n + 1 - ℓ)} : = {(A - θ_{n + 1 - ℓ} I)}^{- 1} v_{rhs}^{(n + 1 - ℓ)} = \bar{{(A - θ_{ℓ} I)}^{- 1} v_{rhs}^{(ℓ)}} = \bar{x^{(ℓ)}},

and, therefore,

\sum_{ℓ = 1}^{n} {(A - θ_{ℓ} I)}^{- 1} v_{rhs}^{(ℓ)} = 2 Re (\sum_{ℓ = 1}^{n / 2} {(A - θ_{ℓ} I)}^{- 1} v_{rhs}^{(ℓ)}) .

In the real case, it thus suffices to solve only

n / 2

shifted linear systems.

We are now in a position to present our algorithm for computing linear combinations of several

φ

-function actions; it is given in Algorithm 2. Note that the m shifted linear systems in line 11 of Algorithm 2 need not be solved sequentially: each system corresponds to a distinct pole and has an independent right-hand side, so they can be solved in parallel.

Algorithm 2 Linear combination of

φ

-functions with shared poles.

Input:

⋄ $A \in C^{d \times d}$ and vectors $v_{1}, \dots, v_{τ} \in C^{d}$
⋄ Degree n (default $n = 14$ ) of the rational approximants $r_{j}$ (for real data, n is even so that poles occur in conjugate pairs)
⋄ A family ${φ_{j_{k}}}_{k = 1}^{τ}$ , where $j_{k} \in I \subset N \cup {0}$ and $| I | = τ$
⋄ Shared CF data: poles ${θ_{ℓ}}_{ℓ = 1}^{n}$ , residues $η_{ℓ}^{(j_{k})}$ , and constants $r_{\infty}^{(j_{k})}$ constructed via Algorithm 1 for the scalar functions $φ_{j_{1}}, \dots, φ_{j_{τ}}$

Output:

w \approx \sum_{k = 1}^{τ} φ_{j_{k}} (A) v_{k}

.

1: Order ${θ_{ℓ}}_{ℓ = 1}^{n}$ so that $θ_{n + 1 - ℓ} = \bar{θ_{ℓ}}$ for $ℓ = 1, \dots, n / 2$, and reorder the corresponding $η_{ℓ}^{(j_{k})}$ accordingly.
2: $m = n$
3: if realData then
4: $m \leftarrow n / 2$
5: end if
6: $V = [v_{1}, v_{2}, \dots, v_{τ}]$ ▹ $d \times τ$ matrix
7: $w = 0$ ▹ $d \times 1$ zero vector
8: for $ℓ = 1$ to m do
9: $γ^{(ℓ)} \leftarrow {[η_{ℓ}^{(j_{1})}, η_{ℓ}^{(j_{2})}, \dots, η_{ℓ}^{(j_{τ})}]}^{T}$
10: $v_{rhs}^{(ℓ)} \leftarrow V γ^{(ℓ)}$ ▹ one combined right-hand side
11: Solve $(A - θ_{ℓ} I) x^{(ℓ)} = v_{rhs}^{(ℓ)}$
12: if realData then
13: $x^{(ℓ)} \leftarrow 2 Re (x^{(ℓ)})$
14: end if
15: $w \leftarrow w + x^{(ℓ)}$
16: end for
17: $w \leftarrow w + V {[r_{\infty}^{(j_{1})}, r_{\infty}^{(j_{2})}, \dots, r_{\infty}^{(j_{τ})}]}^{T}$
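The control flow of Algorithm 2 can be sketched for a small real symmetric tridiagonal matrix. This is an illustrative standard-library rendering, not the cfphimv routine: the function names are hypothetical, the shifted systems are solved by the Thomas algorithm without pivoting, and the poles, residues, and constants are made-up conjugate-symmetric numbers rather than output of Algorithm 1.

```python
def solve_shifted(main, off, theta, rhs):
    """Solve (A - theta I) x = rhs for symmetric tridiagonal A (Thomas algorithm)."""
    n = len(main)
    d = [complex(a) - theta for a in main]
    b = [complex(v) for v in rhs]
    for i in range(1, n):                      # forward elimination
        m = off[i - 1] / d[i - 1]
        d[i] -= m * off[i - 1]
        b[i] -= m * b[i - 1]
    x = [0j] * n
    x[-1] = b[-1] / d[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = (b[i] - off[i] * x[i + 1]) / d[i]
    return x

def shared_pole_combination(main, off, vectors, poles, residues, constants,
                            real_data=False):
    """residues[k][l] is eta_l^{(j_k)}; poles must satisfy poles[n-1-l] == conj(poles[l])."""
    d, n = len(main), len(poles)
    m = n // 2 if real_data else n
    w = [0j] * d
    for l in range(m):
        # One combined right-hand side per pole.
        rhs = [sum(eta[l] * v[i] for eta, v in zip(residues, vectors)) for i in range(d)]
        x = solve_shifted(main, off, poles[l], rhs)
        if real_data:
            x = [2.0 * xi.real for xi in x]    # conjugate partner is not solved for
        w = [wi + xi for wi, xi in zip(w, x)]
    for r_inf, v in zip(constants, vectors):
        w = [wi + r_inf * vi for wi, vi in zip(w, v)]
    return w

# A = tridiag(1, -2, 1): real, symmetric, spectrum in (-4, 0).
d = 6
main, off = [-2.0] * d, [1.0] * (d - 1)
vectors = [[1.0, 0.0, 2.0, -1.0, 0.5, 3.0], [0.0, 1.0, 1.0, 0.0, -2.0, 1.0]]
poles = [-1 + 2j, -3 + 0.5j, -3 - 0.5j, -1 - 2j]
residues = [[0.3 + 0.1j, -0.2j, 0.2j, 0.3 - 0.1j],
            [1.0 - 0.5j, 0.4 + 0.2j, 0.4 - 0.2j, 1.0 + 0.5j]]
constants = [0.0, 0.1]
w_full = shared_pole_combination(main, off, vectors, poles, residues, constants)
w_half = shared_pole_combination(main, off, vectors, poles, residues, constants,
                                 real_data=True)
print(max(abs(a - b) for a, b in zip(w_full, w_half)))
```

The run with `real_data=True` solves two systems instead of four and reproduces the full sum up to rounding; for a large sparse A the Thomas solver would be replaced by a sparse direct or iterative solver for each shift.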

5. Numerical Experiments

This section presents three numerical experiments. The first investigates the algorithmic parameters and the geometric convergence rate of the CF approximation while the second experiment implements Algorithm 2 for a highly nonnormal matrix. The third experiment involves the 2D Poisson matrix.

All runs were performed in MATLAB R2022b on a single desktop (Intel® Core™ i7-7700T @ 2.90 GHz, 16 GB RAM, Intel Corporation, Santa Clara, CA, USA). To contextualize performance and accuracy, we compare the following five routines:

cfphimv: our MATLAB routine for Algorithm 2 (https://github.com/aalmohy/cfphimv (accessed on 9 December 2025)).
phimv: Al-Mohy’s algorithm ([3], Algorithm 2) (https://github.com/aalmohy/phimv (accessed on 20 October 2025)).
phi_funm: Al-Mohy and Liu’s algorithm ([18], Algorithm 5.1) (https://github.com/xiaobo-liu/phi_funm (accessed on 20 October 2025)). This algorithm evaluates several $φ$-functions of a moderate-size matrix jointly via a scaling and recovering strategy. We use this routine to compute a reference solution for medium-sized problems.
bamphi: Caliari, Cassini, and Zivcovich’s routine [14], combining Newton form polynomial interpolation at special nodes with Krylov techniques (https://github.com/francozivcovich/bamphi (accessed on 20 October 2025)).
kiops: The adaptive Krylov solver of Gaudreault, Rainwater, and Tokman [5] based on the incomplete orthogonalization procedure (https://gitlab.com/stephane.gaudreault/kiops (accessed on 20 October 2025)).

The routines phimv, bamphi, and kiops all natively accept block right-hand sides and, in one call, evaluate several linear combinations of the form (2). We use each routine with its default settings; for kiops, we set the tolerance to the double-precision machine epsilon for consistency with the other routines.

5.1. Numerical Sweep for Shared-Denominator CF Tables

The purpose of this experiment is to (i) empirically verify geometric convergence of the shared denominator CF approximants for the family ${φ_{j}}_{j = 0}^{p}$ on $R^{-}$, and (ii) identify practical defaults for the CF scale $σ$ and degree n that deliver double precision accuracy uniformly across j.

For $p \in {0, 1, 2, 3, 4}$ we build type-$(n, n)$ shared-pole tables at $n \in {6, 8, 10, 12, 14}$ and $σ \in {5, 7, 9, 11, 13}$. Pole extraction via Algorithm 1 uses $K = 100$ Chebyshev coefficients and $N = 1024$ Chebyshev points. With the shared poles ${θ_{ℓ}}$ fixed, per-function residues and constants are fitted by least squares (9) on a log-dense training grid of size $τ = 8000$ over $[- M, 0]$ with $M = 100 σ$. Accuracy is measured on an independent testing grid of size $τ = 20{,}000$ over $[- M, 0]$. For each configuration, we compute the familywise worst error

E_{n} = max_{0 \leq j \leq p} max_{x \in [- M, 0]} |φ_{j} (x) - r_{j} (x)| .

We report the geometric rate as $ρ_{*}$, defined by a least squares fit of $log E_{n} \approx a + b n$ over the interior degrees ${8, 10, 12}$ to reduce endpoint bias, yielding $ρ_{*} = e^{- b}$ and $C = e^{a}$. Across all p (Table 1), the fitted rates lie near Halphen’s constant ($\approx 9.289$) and the worst errors at $n = 14$ are between $10^{- 14}$ and $10^{- 13}$. Similar behavior holds for $σ \in {5, 7, 11, 13}$, with $σ = 9$ consistently optimal by rate and end error.
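The rate fit described above amounts to a straight-line fit in a semilog plot. A minimal sketch follows; the errors `E` are synthetic values generated from an exact geometric law (they are not the measured errors of the experiment), so the snippet only illustrates how $ρ_{*}$ and C would be extracted from three interior degrees.

```matlab
% Synthetic errors E_n = C * rho^(-n); these are NOT the paper's measured data.
rhoTrue = 9.28903;                 % value close to Halphen's constant
Ctrue   = 2.5;                     % arbitrary prefactor (assumption)
nInt    = [8 10 12];               % interior degrees used for the fit
E       = Ctrue * rhoTrue.^(-nInt);

c = polyfit(nInt, log(E), 1);      % log E_n ~ a + b*n, with c(1) = b and c(2) = a
rhoStar = exp(-c(1));              % fitted geometric rate
Cfit    = exp(c(2));               % fitted prefactor
```

With real data the three points do not lie exactly on a line, and the fitted $ρ_{*}$ then summarizes the average decay per degree over the interior range.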

We conclude that choosing $σ \approx 9$ is sufficient to achieve a nearly optimal geometric convergence rate across $p = 0, \dots, 4$. This default aligns with the scales reported in [20,25]. Accordingly, we recommend $σ = 9$ as the CF scale parameter in Algorithm 1.
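Returning to the residue fit used in this sweep: once the poles are fixed, the residues and the constant enter linearly, so they can be obtained from a linear least squares problem on the training grid. The sketch below is only an illustration under stated assumptions. The poles and the "true" residues are hypothetical, the target is an exactly rational function built from them (so that recovery can be checked), the grid construction with `logspace` is one possible reading of "log-dense", and plain backslash is used rather than whatever robust variant (9) prescribes.

```matlab
% Least squares recovery of residues and constant for fixed (hypothetical) poles.
M = 900;  T = 2000;                          % M = 100*sigma with sigma = 9; reduced grid size
x = -[0, logspace(-6, log10(M), T-1)].';     % grid on [-M, 0], clustered toward 0

theta    = [1+2i; 1-2i; 3+1i; 3-1i];         % hypothetical conjugate-pair poles
etaTrue  = [0.5-0.2i; 0.5+0.2i; -1+0.3i; -1-0.3i];
rinfTrue = 0.1;

% Target samples: an exactly rational function, real on the real axis.
f = real(sum(etaTrue.' ./ (x - theta.'), 2)) + rinfTrue;

Phi  = [1 ./ (x - theta.'), ones(T, 1)];     % Cauchy basis plus a constant column
coef = Phi \ f;                              % least squares solution
eta  = coef(1:end-1);                        % fitted residues
rinf = coef(end);                            % fitted constant
maxErr = max(abs(Phi*coef - f));             % residual on the training grid
```

For the actual $φ_{j}$ targets, `f` would hold samples of $φ_{j}$ on the grid, and the fit would be repeated once per function with the same matrix `Phi`.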

5.2. Chebyshev Spectral Laplacian (Dirichlet)

We use a Chebyshev collocation discretization on the interval $[0, L]$ with homogeneous Dirichlet boundary conditions ([3], Section 5). Starting from the Chebyshev first derivative matrix D on $ξ \in [- 1, 1]$, the change of variables $x = \frac{L}{2} (ξ + 1)$ implies $\frac{d}{d x} = \frac{2}{L} \frac{d}{d ξ}$, so the discrete second derivative on $[0, L]$ is ${(2 / L)}^{2} D^{2}$. Enforcing Dirichlet conditions by deleting the first and last rows/columns yields a dense, strongly nonnormal matrix $A \in R^{(N - 1) \times (N - 1)}$ whose spectrum lies on the negative real axis and whose spectral radius grows with N, making it a stiff, representative test for exponential integrators. The code in Table 2 (adapted from [29]) constructs A for general N and L.

Table 2. MATLAB code for the discrete second derivative on $[0, L]$ with Dirichlet conditions.
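The spectral claims made for this test matrix can be checked numerically for a small case. The sketch below rebuilds A for $N = 50$ and $L = 2$ following the construction described above (differences are taken in the reference variable ξ, an assumption consistent with the scaling ${(2 / L)}^{2} D^{2}$), then inspects the spectrum. The comparison value $-{(π / L)}^{2}$ is the rightmost eigenvalue of the continuous Dirichlet second-derivative operator on $[0, L]$.

```matlab
% Build the Dirichlet Chebyshev second-derivative matrix and inspect its spectrum.
N = 50;  L = 2;
k  = (0:N)';
xi = cos(pi*k/N);                          % Chebyshev points on [-1,1]
c  = [2; ones(N-1,1); 2] .* (-1).^k;       % barycentric-type weights
dX = xi - xi.';                            % pairwise differences in xi
D  = (c*(1./c)') ./ (dX + eye(N+1));       % off-diagonal entries of D on [-1,1]
D  = D - diag(sum(D, 2));                  % diagonal chosen so that row sums vanish
Lh = (2/L)^2 * (D*D);                      % second derivative on [0,L]
A  = Lh(2:N, 2:N);                         % drop boundary rows/columns (Dirichlet)

ev        = eig(A);
rightmost = max(real(ev));                 % expected close to -(pi/L)^2
normA     = norm(A);                       % Table 3 lists 3.0e5 for N = 50
ratio     = normA / abs(rightmost);        % crude stiffness indicator
```

The large ratio between the norm and the rightmost eigenvalue is what makes this a stiff problem for polynomial-based methods.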

We consider $N \in {50, 100, 150, 200, 250}$ and $L = 2$. The aim is to demonstrate the speed of our shared denominator CF rational approximant (with $n = 14$) relative to Taylor- and Krylov polynomial-based routines. As $∥ A ∥$ and nonnormality grow, polynomial approaches typically require higher degrees (or smaller steps) to control the error, so their cost escalates. By contrast, the CF rational approximants employ a fixed set of poles that capture the branch cut of the $φ$-family, making accuracy essentially insensitive to the spectral radius of A; the dominant work reduces to a small number of shifted linear solves. This is therefore a highly stiff test where rational approximants should retain both accuracy and speed as N increases. For $p = 3$ and each N, we generate a random set of vectors ${v_{0}, v_{1}, \dots, v_{p}} \subset R^{N - 1}$ and evaluate

w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}

using cfphimv, phimv, bamphi, and kiops. The reference solution is computed using phi_funm.

The data in Table 3 highlight three key points. (i) The cost of cfphimv is essentially flat in N: even as ${∥ A ∥}_{2}$ increases from $3.0 \times 10^{5}$ to $1.9 \times 10^{8}$ and the matrix becomes more nonnormal, the runtime stays below $0.5$ s and the relative error stays between $3.7 \times 10^{- 12}$ and $2.7 \times 10^{- 10}$. (ii) Krylov/Taylor-based methods (phimv, bamphi, kiops) degrade rapidly with N: from $N = 50$ to $N = 100$, phimv and bamphi become slower by one to two orders of magnitude, and for $N \geq 150$ they time out or fail; kiops either times out or returns unusably large errors (up to $10^{8}$). (iii) Beyond $N ≃ 100$, only cfphimv continues to deliver both accuracy and subsecond turnaround. This supports the main claim of the paper: once the shared pole set is precomputed, evaluating the full linear combination of $φ$-functions reduces to solving a small number of shifted systems, and this remains stable and fast even for highly stiff, strongly nonnormal matrices.

Table 3. Runtimes and relative errors for the linear combination $w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}$ with a hard per-call timeout of 300 s. “TO” = timed out; “ER” = routine error.

5.3. Two-Dimensional Poisson Matrix

In this experiment we use the two-dimensional Poisson matrix $P \in R^{N^{2} \times N^{2}}$ obtained from the standard five-point finite difference discretization of $- Δ u = f$ on ${(0, 1)}^{2}$ with homogeneous Dirichlet boundary conditions. With lexicographic ordering of the $N^{2}$ interior grid points, P can be written as a sum of Kronecker products

P = I_{N} \otimes T + T \otimes I_{N},

where $T \in R^{N \times N}$ is the tridiagonal Toeplitz matrix $tridiag (- 1, 2, - 1)$ corresponding to the one-dimensional second-difference operator. The matrix P can be generated by the MATLAB command P = gallery('poisson', N). Poisson matrices of this type arise ubiquitously in the finite difference and finite element discretization of diffusion and heat equations, electrostatics and potential problems, pressure Poisson equations in incompressible flow, and in image processing and graph-based models where discrete Laplacians are used for smoothing and regularization.

The spectral structure of P can be characterized explicitly. The one-dimensional matrix T is diagonalized by the discrete sine transform matrix S with entries

S_{j k} = \sqrt{\frac{2}{N + 1}} sin (\frac{j k π}{N + 1}), j, k = 1, \dots, N,

yielding $T = S Λ S^{T}$ with eigenvalues

λ_{k} = 2 - 2 cos (\frac{k π}{N + 1}) = 4 {sin}^{2} (\frac{k π}{2 (N + 1)}), k = 1, \dots, N .

By standard properties of Kronecker products,

P = (S \otimes S) Λ_{2 D} {(S \otimes S)}^{T}, Λ_{2 D} = diag (λ_{k, ℓ}), λ_{k, ℓ} = λ_{k} + λ_{ℓ}, k, ℓ = 1, \dots, N,

so the eigenvectors of P are tensor products $s_{k} \otimes s_{ℓ}$ of the one-dimensional sine modes. Detailed expositions of this construction and its use in fast Poisson solvers and spectral analysis can be found, for example, in Strang ([27], Section 5.5) and Golub and Van Loan ([30], Section 4.8).
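Both the Kronecker structure and the eigenvalue formula are easy to confirm numerically for a small grid. The following sketch, with an arbitrary small size $N = 8$ chosen for illustration, compares the Kronecker sum with MATLAB's `gallery('poisson', N)` and the computed spectrum with the sums $λ_{k} + λ_{ℓ}$.

```matlab
% Check P = kron(I,T) + kron(T,I) and its eigenvalues lambda_k + lambda_l.
N = 8;                                          % small illustrative size
T = gallery('tridiag', N, -1, 2, -1);           % 1D second-difference matrix
I = speye(N);
P  = kron(I, T) + kron(T, I);                   % Kronecker-sum form
Pg = gallery('poisson', N);                     % built-in 2D Poisson matrix
structErr = norm(P - Pg, 1);                    % zero if the two constructions agree

lam   = 2 - 2*cos((1:N)'*pi/(N+1));             % eigenvalues of T
lam2D = sort(reshape(lam + lam', [], 1));       % all sums lambda_k + lambda_l
evP   = sort(eig(full(P)));                     % numerically computed spectrum
eigErr = max(abs(evP - lam2D));
```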

To evaluate $f (P) b$ efficiently for a scalar matrix function f analytic on the spectrum of P and a vector $b \in C^{N^{2}}$, we exploit the explicit diagonalization

P = (S \otimes S) Λ_{2 D} {(S \otimes S)}^{T}, Λ_{2 D} = diag (λ_{k, ℓ}), λ_{k, ℓ} = λ_{k} + λ_{ℓ} .

We first reshape b into an $N \times N$ matrix B such that $vec (B) = b$, where the vec operator stacks the columns of a matrix on top of each other. Using the identity

{(S \otimes S)}^{T} vec (B) = vec (S^{T} B S),

we compute the spectral coefficients $\hat{B} = S^{T} B S$. Defining

F_{k, ℓ} = f (λ_{k, ℓ}), k, ℓ = 1, \dots, N,

we apply $f (Λ_{2 D})$ by pointwise multiplication,

Z_{k, ℓ} = F_{k, ℓ} {\hat{B}}_{k, ℓ}, k, ℓ = 1, \dots, N,

and transform back via $B_{f} = S Z S^{T}$, so that

f (P) b = vec (B_{f}) .

In this procedure, we never form $S \otimes S$ or $f (P)$ explicitly; the dominant cost is a small number of $N \times N$ matrix–matrix products (or fast sine transforms), i.e., $O (N^{3})$ flops with an explicit S or $O (N^{2} log N)$ flops with FFT-based discrete sine transforms ([30], Section 4.8). We use this spectral machinery, combined with multiprecision arithmetic using the Multiprecision Computing Toolbox (ver. 5.1.0) [31], to generate highly accurate reference solutions.
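The reshape, transform, scale, and transform-back steps above can be sketched as follows. This is a double-precision illustration with an explicit matrix S (the $O (N^{3})$ variant), using the sample function $f (z) = e^{- z}$ and a small N as assumptions so that the result can be compared with a dense `expm` evaluation; it does not reproduce the multiprecision setup of the experiments.

```matlab
% Spectral evaluation of f(P)*b via the sine basis, checked against dense expm.
N = 8;                                         % small illustrative size
P = gallery('poisson', N);
b = (1:N^2)' / N^2;                            % arbitrary test vector
f = @(z) exp(-z);                              % sample scalar function (assumption)

jj  = (1:N)';
S   = sqrt(2/(N+1)) * sin(jj*jj'*pi/(N+1));    % sine transform matrix, S = S' = inv(S)
lam = 2 - 2*cos(jj*pi/(N+1));                  % eigenvalues of T
F   = f(lam + lam');                           % F(k,l) = f(lambda_k + lambda_l)

B    = reshape(b, N, N);                       % vec(B) = b
Bhat = S' * B * S;                             % spectral coefficients
Z    = F .* Bhat;                              % apply f(Lambda_2D) pointwise
Bf   = S * Z * S';                             % back to the nodal basis
y    = Bf(:);                                  % f(P)*b

yref   = expm(-full(P)) * b;                   % dense reference for this f
relerr = norm(y - yref) / norm(yref);
```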

We now use this eigenvalue decomposition as a high-accuracy reference to evaluate the linear combination (2). Let $h = 1 / (N + 1)$ be the mesh size and $N = 2^{k}$ for $k = 6, \dots, 10$. We define the scaled operator

A = - \frac{1}{h^{2}} P .

Thus, A is negative definite and increasingly stiff as N grows (indeed, ${∥ A ∥}_{1} = O (N^{2})$). For each N and four ($p = 3$) randomly generated vectors ${v_{j}}_{j = 0}^{p} \subset R^{N^{2}}$, we evaluate the linear combination

w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}

using the routines cfphimv, phimv, kiops, and bamphi. The reference solution is computed via the spectral diagonalization of A described above, with $φ_{j} (- λ_{k, ℓ} / h^{2})$ applied to the eigenvalues in multiprecision arithmetic, and the relative error is measured in the 1-norm against this reference.
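A small double-precision version of this reference computation can be sketched as below. Several choices here are assumptions made for illustration only: a tiny grid ($N = 4$), the scalar recurrence $φ_{j} (z) = (φ_{j - 1} (z) - 1 / (j - 1)!) / z$ with $φ_{0} (z) = e^{z}$ (adequate here because all eigenvalues of A are well away from zero), and a cross-check through the exponential of an augmented matrix in the style of [4] instead of multiprecision arithmetic.

```matlab
% Double-precision sketch of the spectral reference for w = sum_j phi_j(A) v_j.
N = 4;  p = 3;  h = 1/(N+1);
rng(1);
Vv = randn(N^2, p+1);                          % columns v_0, ..., v_p

jj   = (1:N)';
S    = sqrt(2/(N+1)) * sin(jj*jj'*pi/(N+1));   % sine transform matrix
lam  = 2 - 2*cos(jj*pi/(N+1));
Zeig = -(lam + lam') / h^2;                    % eigenvalues of A = -P/h^2

w   = zeros(N^2, 1);
phi = exp(Zeig);                               % phi_0 on the eigenvalues
for j = 0:p
    if j > 0
        phi = (phi - 1/factorial(j-1)) ./ Zeig;    % phi_j from phi_{j-1}
    end
    Bhat = S' * reshape(Vv(:, j+1), N, N) * S;     % coefficients of v_j
    Wj   = S * (phi .* Bhat) * S';                 % phi_j(A) v_j, reshaped
    w    = w + Wj(:);
end

% Cross-check: w = [I 0] * expm([A W; 0 J]) * [v_0; e_p], W = [v_p, ..., v_1].
A    = -full(gallery('poisson', N)) / h^2;
W    = Vv(:, p+1:-1:2);
J    = diag(ones(p-1, 1), 1);                  % p-by-p shift matrix
Atil = [A, W; zeros(p, N^2), J];
X    = expm(Atil);
wref = X(1:N^2, :) * [Vv(:, 1); zeros(p-1, 1); 1];
relerr = norm(w - wref, 1) / norm(wref, 1);    % 1-norm, as in the text
```

For the sizes used in the experiments the dense cross-check is not feasible, which is presumably why the spectral route with multiprecision arithmetic serves as the reference there.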

Table 4 presents the results. For the smallest sizes ($N = 64, 128$), all four algorithms reach very small relative errors, between roughly $10^{- 15}$ and $10^{- 10}$, confirming that they all resolve the highly stiff operator A on this model problem. However, the runtimes differ markedly: cfphimv is already more than an order of magnitude faster than phimv and substantially faster than kiops and bamphi. As N increases, the stiffness and dimension grow rapidly (${∥ A ∥}_{1}$ increases by roughly two orders of magnitude over the tested range). For $N = 256$, phimv already hits the time limit of 600 s, while kiops and bamphi remain accurate but require tens of seconds. For $N \geq 512$, all three polynomial- or Krylov-based routines time out, whereas cfphimv continues to deliver accurate results with runtimes ranging from a fraction of a second (for $N = 64$) to about 80 s (for $N = 1024$). The modest growth of the cfphimv error with N (remaining around $10^{- 9}$ at $N = 1024$) indicates that the rational approximation error and the effect of the reference solver are both well under control. Overall, this experiment shows that the shared denominator CF approach can handle very stiff, large-scale diffusion operators for linear combinations of $φ$-functions at a cost comparable to a small number of shifted linear solves, while competing Taylor and polynomial Krylov methods become prohibitively slow or fail to complete within the time limit.

Table 4. Runtimes and relative errors for the linear combination $w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}$ with A the scaled 2D Poisson matrix and a hard per-call timeout of 600 s. “TO” = timed out.

6. Conclusions

We presented a shared denominator CF framework for evaluating linear combinations of $φ$-functions with stable matrices, a core task in exponential integrators. The key idea is to approximate ${φ_{j}}_{j = 0}^{p}$ on $R^{-}$ with rational functions that share a single pole set ${θ_{ℓ}}_{ℓ = 1}^{n}$, while allowing per-function residues ${η_{ℓ}^{(j)}}$ and constants ${r_{\infty}^{(j)}}$. We obtain the poles by a single SVD of a stacked weighted Hankel matrix built from Chebyshev boundary data of all functions, and then recover the residues and constants via a robust least squares fit on a log-dense grid of $R^{-}$. With these ingredients fixed, any linear combination

w = \sum_{j = 0}^{p} φ_{j} (A) v_{j}

is reduced to solving only $n / 2$ (for real data and even n) shifted linear systems, independent of p, with one system per conjugate pair of shared poles and a single combined right-hand side per shift, as described in Algorithm 2. Thus, the dominant cost matches that of evaluating a single $φ$-function action.

On the theoretical side, we proved a “no assumptions” shared denominator approximation theorem on $R^{-}$ that yields a geometric rate $C ρ^{- n}$ for a finite family of analytic functions and their linear combinations, and we lifted these bounds to matrix arguments for normal matrices (via the spectral theorem) and for nonnormal matrices whose field of values avoids the cut (via Crouzeix’s theorem). This places our construction on the same exponential convergence footing as CF/contour-trapezoid approaches, with the observed rates closely tied to the classical constants for slit domains.

Our numerical sweeps corroborate the theory: for $p = 0, \dots, 4$ and moderate degrees $n \leq 14$, the worst case scalar errors across ${φ_{j}}$ decay geometrically to $\sim 10^{- 14}$, with a near-uniformly effective CF scale around $σ \approx 9$. The experiments also clarify the distinct roles of the parameters used in practice. Importantly, because the rational tables are constructed on the continuous negative real axis and then applied to A via shifted solves, their effectiveness is largely insensitive to the spectral radius of A, in contrast to Taylor and polynomial Krylov approaches whose difficulty grows directly with $∥ A ∥$.

Because our evaluation reduces to solves with the shifted operators $(A - θ_{ℓ} I)$ against a single combined right-hand side per pole, the method is immediately amenable to matrix-free implementations.

Funding

This research was funded by the Deanship of Scientific Research at King Khalid University through the Research Groups Program under Grant No. RGP.1/318/45.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the reviewers for their insightful comments and suggestions that helped to improve the presentation of this paper.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Hochbruck, M.; Ostermann, A. Exponential integrators. Acta Numer. 2010, 19, 209–286.
2. Minchev, B.V.; Wright, W.M. A Review of Exponential Integrators for First Order Semi-Linear Problems; Technical Report 2/05; Norwegian University of Science and Technology: Trondheim, Norway, 2005.
3. Al-Mohy, A.H. Computing Linear Combinations of φ-Function Actions for Exponential Integrators. arXiv 2025, arXiv:2509.26475.
4. Al-Mohy, A.H.; Higham, N.J. Computing the Action of the Matrix Exponential, with an Application to Exponential Integrators. SIAM J. Sci. Comput. 2011, 33, 488–511.
5. Gaudreault, S.; Rainwater, G.; Tokman, M. KIOPS: A Fast Adaptive Krylov Subspace Solver for Exponential Integrators. J. Comput. Phys. 2018, 372, 236–255.
6. Hochbruck, M.; Lubich, C.; Selhofer, H. Exponential Integrators for Large Systems of Differential Equations. SIAM J. Sci. Comput. 1998, 19, 1552–1574.
7. Koskela, A.; Ostermann, A. Exponential Taylor Methods: Analysis and Implementation. Comput. Math. Appl. 2013, 65, 487–499.
8. Niesen, J.; Wright, W.M. Algorithm 919: A Krylov Subspace Algorithm for Evaluating the φ-Functions Appearing in Exponential Integrators. ACM Trans. Math. Softw. 2012, 38, 22.
9. Cox, S.; Matthews, P. Exponential Time Differencing for Stiff Systems. J. Comput. Phys. 2002, 176, 430–455.
10. Luan, V.T. Efficient Exponential Runge–Kutta Methods of High Order: Construction and Implementation. BIT Numer. Math. 2021, 61, 535–560.
11. Kassam, A.K.; Trefethen, L.N. Fourth-Order Time-Stepping for Stiff PDEs. SIAM J. Sci. Comput. 2005, 26, 1214–1233.
12. Higham, N.J. Functions of Matrices: Theory and Computation; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2008; p. xx+425.
13. Luan, V.T.; Pudykiewicz, J.A.; Reynolds, D.R. Further Development of Efficient and Accurate Time Integration Schemes for Meteorological Models. J. Comput. Phys. 2019, 376, 817–837.
14. Caliari, M.; Cassini, F.; Zivcovich, F. BAMPHI: Matrix-Free and Transpose-Free Action of Linear Combinations of φ-Functions from Exponential Integrators. J. Comput. Appl. Math. 2023, 423, 114973.
15. Moret, I. On RD-rational Krylov approximations to the core-functions of exponential integrators. Numer. Linear Algebra Appl. 2007, 14, 445–457.
16. Bergermann, K.; Stoll, M. Adaptive Rational Krylov Methods for Exponential Runge–Kutta Integrators. SIAM J. Matrix Anal. Appl. 2024, 45, 744–770.
17. Berland, H.; Skaflestad, B.; Wright, W.M. EXPINT—A MATLAB Package for Exponential Integrators. ACM Trans. Math. Softw. 2007, 33, 4-es.
18. Al-Mohy, A.H.; Liu, X. A Scaling and Recovering Algorithm for the Matrix φ-Functions. arXiv 2025, arXiv:2506.01193.
19. Al-Mohy, A.H.; Higham, N.J. A New Scaling and Squaring Algorithm for the Matrix Exponential. SIAM J. Matrix Anal. Appl. 2009, 31, 970–989.
20. Trefethen, L.N.; Weideman, J.A.C.; Schmelzer, T. Talbot quadratures and rational approximations. BIT Numer. Math. 2006, 46, 653–670.
21. Trefethen, L.N. Near-circularity of the error curve in complex Chebyshev approximation. J. Approx. Theory 1981, 31, 344–367.
22. Trefethen, L.N. Rational Chebyshev approximation on the unit disk. Numer. Math. 1981, 37, 297–320.
23. Trefethen, L.N.; Gutknecht, M.H. The Carathéodory–Fejér Method for Real Rational Approximation. SIAM J. Numer. Anal. 1983, 20, 420–436.
24. Carathéodory, C.; Fejér, L. Über den Zusammenhang der Extremen von harmonischen Funktionen mit ihren Koeffizienten und über den Picard–Landau’schen Satz. Rend. Circ. Mat. Palermo 1911, 32, 218–239.
25. Schmelzer, T.; Trefethen, L.N. Evaluating Matrix Functions for Exponential Integrators via Carathéodory–Fejér Approximation and Contour Integrals. Electron. Trans. Numer. Anal. 2007, 29, 1–18.
26. Trefethen, L.N.; Weideman, J.A.C. The Exponentially Convergent Trapezoidal Rule. SIAM Rev. 2014, 56, 385–458.
27. Strang, G. Introduction to Applied Mathematics; Wellesley–Cambridge Press: Wellesley, MA, USA, 1986.
28. Crouzeix, M.; Palencia, C. The Numerical Range is a (1 + $\sqrt{2}$)-Spectral Set. SIAM J. Matrix Anal. Appl. 2017, 38, 649–655.
29. Trefethen, L.N. Spectral Methods in MATLAB; SIAM: Philadelphia, PA, USA, 2000.
30. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2013.
31. Advanpix. Multiprecision Computing Toolbox; Advanpix: Tokyo, Japan, 2025; Available online: http://www.advanpix.com (accessed on 15 August 2023).

Table 1. Fitted rate $ρ_{*}$ and familywise worst error $E_{14}$ on $[- M, 0]$ with $M = 100 σ$, $5 \leq σ \leq 13$.

p	$ρ_{*}$	$E_{14}$
0	$9.663$	$4.11 \times 10^{- 14}$
1	$9.581$	$6.60 \times 10^{- 14}$
2	$9.691$	$7.15 \times 10^{- 14}$
3	$9.959$	$4.54 \times 10^{- 14}$
4	$9.788$	$5.51 \times 10^{- 14}$

Table 2. MATLAB code for the discrete second derivative on $[0, L]$ with Dirichlet conditions.

k = (0:N)'; % node indices
xi = cos(pi*k/N); % Chebyshev points on [-1,1]
x = (xi + 1)*(L/2); % mapped nodes on [0,L] (not needed to build D)
c = [2; ones(N-1,1); 2].*(-1).^k; % barycentric weights
X = repmat(xi,1,N+1); % use xi, not x, so that D is the derivative on [-1,1]
dX = X - X.'; % pairwise differences
D = (c*(1./c)')./(dX + eye(N+1)); % first-derivative matrix on [-1,1]
D = D - diag(sum(D,2)); % set the diagonal so that every row sum is 0
Lh = (2/L)^2 * (D*D); % scaled second derivative on [0,L]
A = Lh(2:N, 2:N); % remove boundary rows/cols (Dirichlet)

Table 3. Runtimes and relative errors for the linear combination $w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}$ with a hard per-call timeout of 300 s. “TO” = timed out; “ER” = routine error.

N	${∥ A ∥}_{2}$	`phimv` time (s)	`phimv` rel. err	`bamphi` time (s)	`bamphi` rel. err	`kiops` time (s)	`kiops` rel. err	`cfphimv` time (s)	`cfphimv` rel. err
50	$3.0 \times 10^{5}$	3.300	$2.4 \times 10^{- 13}$	3.108	$1.6 \times 10^{- 11}$	2.066	$9.0 \times 10^{5}$	0.076	$3.7 \times 10^{- 12}$
100	$4.8 \times 10^{6}$	35.202	$1.3 \times 10^{- 11}$	126.170	$1.3 \times 10^{- 10}$	32.038	$2.7 \times 10^{8}$	0.073	$1.4 \times 10^{- 11}$
150	$2.4 \times 10^{7}$	TO	—	TO	—	TO	—	0.462	$1.9 \times 10^{- 11}$
200	$7.7 \times 10^{7}$	TO	—	TO	—	TO	—	0.452	$1.7 \times 10^{- 11}$
250	$1.9 \times 10^{8}$	TO	—	ER	—	TO	—	0.450	$2.7 \times 10^{- 10}$

Table 4. Runtimes and relative errors for the linear combination $w_{N} = \sum_{j = 0}^{p} φ_{j} (A) v_{j}$ with A the scaled 2D Poisson matrix and a hard per-call timeout of 600 s. “TO” = timed out.

N	${∥ A ∥}_{1}$	`phimv` time (s)	`phimv` rel. err	`bamphi` time (s)	`bamphi` rel. err	`kiops` time (s)	`kiops` rel. err	`cfphimv` time (s)	`cfphimv` rel. err
64	$3.4 \times 10^{4}$	8.340	$1.0 \times 10^{- 14}$	0.770	$3.0 \times 10^{- 13}$	0.450	$5.2 \times 10^{- 15}$	0.180	$3.9 \times 10^{- 11}$
128	$1.3 \times 10^{5}$	104.470	$2.3 \times 10^{- 14}$	5.490	$5.2 \times 10^{- 12}$	3.280	$1.7 \times 10^{- 14}$	0.540	$5.2 \times 10^{- 11}$
256	$5.3 \times 10^{5}$	TO	—	68.650	$1.7 \times 10^{- 11}$	47.950	$9.6 \times 10^{- 14}$	2.320	$1.2 \times 10^{- 10}$
512	$2.1 \times 10^{6}$	TO	—	TO	—	TO	—	13.410	$8.1 \times 10^{- 10}$
1024	$8.4 \times 10^{6}$	TO	—	TO	—	TO	—	81.180	$1.2 \times 10^{- 9}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
