Kernel-Based Approximation of the Koopman Generator and Schrödinger Operator

Stefan Klus; Feliks Nüske; Boumediene Hamzi

doi:10.3390/e22070722

,

and

¹

Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany

²

Department of Mathematics, Paderborn University, 33098 Paderborn, Germany

³

Department of Mathematics, Imperial College London, London SW7 2AZ, UK

^*

Author to whom correspondence should be addressed.

Entropy2020, 22(7), 722;https://doi.org/10.3390/e22070722

This article belongs to the Special Issue Machine Learning for Prediction, Data Assimilation, and Uncertainty Quantification of Dynamical Systems

Version Notes

Order Reprints

Abstract

Many dimensionality and model reduction techniques rely on estimating dominant eigenfunctions of associated dynamical operators from data. Important examples include the Koopman operator and its generator, but also the Schrödinger operator. We propose a kernel-based method for the approximation of differential operators in reproducing kernel Hilbert spaces and show how eigenfunctions can be estimated by solving auxiliary matrix eigenvalue problems. The resulting algorithms are applied to molecular dynamics and quantum chemistry examples. Furthermore, we exploit that, under certain conditions, the Schrödinger operator can be transformed into a Kolmogorov backward operator corresponding to a drift-diffusion process and vice versa. This allows us to apply methods developed for the analysis of high-dimensional stochastic differential equations to quantum mechanical systems.

Keywords:

Koopman generator; Schrödinger operator; reproducing kernel Hilbert space

1. Introduction

The Koopman operator [1,2,3,4] plays a central role in the global analysis of complex dynamical systems. It is, for instance, used to find conformations of molecules and coherent patterns in fluid flows, but also for prediction, stability analysis, and control [5,6,7,8,9,10]. Instead of analyzing a given finite-dimensional, but highly nonlinear system directly, the underlying idea is to compute an associated infinite-dimensional, but linear operator [4]. By computing an approximation of this operator from measurement or simulation data, it is possible to extract Koopman eigenvalues, eigenfunctions, and modes. The most frequently used techniques are based on variants or generalizations of extended dynamic mode decomposition (EDMD) [11,12]. A reformulation of EDMD for the generator of the Koopman operator, called gEDMD, was recently proposed in [13]. It was shown that in addition to the previously mentioned applications, the generator contains valuable information about the governing equations of a system; see also [7,14]. System identification aims at learning a preferably parsimonious model from data. That is, the learned model should comprise as few terms as possible and still have predictive power, which is typically accomplished by utilizing sparse regression techniques. One drawback of gEDMD is that it requires a set of explicitly chosen basis functions and their first- and—if the system is non-deterministic and non-reversible—second-order derivatives. Moreover, the size of the resulting matrix eigenvalue problem that needs to be solved to compute eigenvalues, eigenfunctions, and modes of the generator depends on the size of the dictionary. The goal of this paper is to derive a kernel-based method to approximate the Koopman generator from data. A kernel-based variant of EDMD was proposed in [12] and generalized in [15]. We derive a kernel-based variant of gEDMD. Employing the well-known kernel trick, a dual eigenvalue problem whose size depends on the number of snapshots can be constructed. The resulting methods allow for implicitly infinite-dimensional feature spaces and only require partial derivatives of the kernel function. This enables us to apply the methods to high-dimensional systems for which conventional techniques would be prohibitively expensive due to the curse of dimensionality, provided the number of snapshots is such that the eigenvalue problem can still be solved numerically or can be downsampled without losing essential information. Since we aim at approximating differential operators, we need to be able to represent derivatives in reproducing kernel Hilbert spaces. This requires the notion of derivative reproducing properties. Derivative reproducing kernels [16] were used to approximate Lyapunov functions for ordinary differential equations in [17] and to approximate center manifolds for ordinary differential equations in [18]. Reproducing kernel Hilbert spaces with derivative reproducing properties are related to the native spaces introduced in a different context in [19].

Similar operators are also used for manifold learning and understanding the geometry of high-dimensional data [20,21,22,23]. Methods like diffusion maps construct graph Laplacians with the aid of diffusion kernels, effectively approximating transition probabilities between data points. In the infinite data limit and letting the kernel bandwidth go to zero, it has been shown that these methods, depending on the normalization, essentially compute eigenfunctions of certain differential operators, e.g., the Laplace–Beltrami operator, the Kolmogorov backward operator, or the Fokker–Planck operator.

Another related differential operator that is of utmost importance in quantum mechanics is the Schrödinger operator. Solutions of the time-independent Schrödinger equation describe stationary states and associated energy levels. We will illustrate how kernel-based methods developed for the Koopman generator can be applied to these related problems. The main contributions of this paper are:

We show how the derivative reproducing properties of kernels can be used to approximate differential operators such as the Koopman generator and the Schrödinger operator, as well as their eigenvalues and eigenfunctions from data. Additionally, we derive a kernel-based method tailored to reversible dynamics, which does not require estimating drift and diffusion terms, but only an equilibrated trajectory.
Furthermore, we exploit the fact that, under certain conditions, the Schrödinger operator can be turned into a Kolmogorov backward operator (see, e.g., [24]), which allows for the interpretation of a quantum-mechanical system as a drift-diffusion process and, as a consequence, the application of methods developed for the analysis of stochastic differential equations or their generators.
We demonstrate potential applications in molecular dynamics, using the example of a quadruple-well problem, and quantum mechanics, describing how to apply the proposed methods directly to the Schrödinger equation or the associated stochastic process. This will be illustrated with two well-known examples, the quantum harmonic oscillator and the hydrogen atom.

The remainder of the manuscript is structured as follows: We first introduce the necessary tools, namely the Koopman operator, its generator, and (derivative) reproducing kernel Hilbert spaces in Section 2. Additionally, relationships with the Schrödinger equation will be explored. The derivation of the kernel-based formulation of gEDMD will be detailed in Section 3. In Section 4, we will show how the derived methods can be applied to molecular dynamics and quantum mechanics problems. Concluding remarks and future work will be discussed in Section 5.

2. Koopman Theory and Reproducing Kernel Hilbert Spaces

We start directly with the non-deterministic setting; the Koopman operator and its generator for ordinary differential equations can then be regarded as a special case; see also [13] for a detailed comparison. The notation used below is summarized in Table 1.

Table 1. Overview of notation.

2.1. The Koopman Operator and Its Generator

In what follows, let

X \subset R^{d}

be the state space and

f : X \to R

a real-valued observable of the system. Furthermore, let

E [\cdot]

denote the expected value and

Θ^{t}

the flow map associated with a dynamical system, i.e.,

Θ^{t} (X_{0}) = X_{t}

. Given a stochastic differential equation of the form:

d X_{t} = b (X_{t}) d t + σ (X_{t}) d B_{t},

(1)

where

b : R^{d} \to R^{d}

is called the drift term,

σ : R^{d} \to R^{d \times d}

the diffusion term, and

B_{t}

is d-dimensional Brownian motion, the stochastic Koopman operator is defined by:

(K^{t} f) (x) = E [f (Θ^{t} (x))] .

The infinitesimal generator

L

of the semigroup of Koopman operators is given by:

L f = \sum_{i = 1}^{d} b_{i} \frac{\partial f}{\partial x_{i}} + \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} \frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}

(2)

and its adjoint, the generator of the Perron–Frobenius operator, by:

L^{*} f = - \sum_{i = 1}^{d} \frac{\partial (b_{i} f)}{\partial x_{i}} + \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} \frac{\partial^{2} (a_{i j} f)}{\partial x_{i} \partial x_{j}},

with

a = σ σ^{⊤}

. We assume from now on that a is uniformly positive definite on

X

. The second-order partial differential equation

\frac{\partial u}{\partial t} = L u

is also called the Kolmogorov backward equation and

\frac{\partial u}{\partial t} = L^{*} u

the Fokker–Planck equation [2].

Remark 1.

As in [13], we will often consider systems of the form:

d X_{t} = - \nabla V (X_{t}) d t + \sqrt{2 β^{- 1}} d B_{t},

where V is a given potential and β the inverse temperature. In this case, the operators can be written as:

L f = - \nabla V \cdot \nabla f + β^{- 1} Δ f a n d L^{*} f = \nabla V \cdot \nabla f + Δ V f + β^{- 1} Δ f .

2.2. Generator EDMD

A data-driven method for the approximation of the generator of the Koopman operator and Perron–Frobenius operator called generator extended dynamic mode decomposition (gEDMD) was derived in [13]. While standard EDMD requires a training dataset

{x_{m}}_{m = 1}^{M}

and the corresponding data points

{y_{m}}_{m = 1}^{M}

, where

y_{m} = Θ^{τ} (x_{m})

for a fixed lag time

τ

, gEDMD assumes that we can evaluate or estimate (using, for instance, Kramers–Moyal formulae)

{b (x_{m})}_{m = 1}^{M}

and

{σ (x_{m})}_{m = 1}^{M}

. Choosing a dictionary of basis functions

{ϕ_{n}}_{n = 1}^{N}

, where

ϕ_{n} : R^{d} \to R

, and defining

ϕ (x) = {[ϕ_{1} (x),, ϕ_{N} (x)]}^{⊤}

, we compute the matrices

Φ_{X}, d Φ_{X} \in R^{N \times M}

, with:

Φ_{X} = [\begin{matrix} ϕ_{1} (x_{1}) & \dots & ϕ_{1} (x_{M}) \\ ⋮ & ⋱ & ⋮ \\ ϕ_{N} (x_{1}) & \dots & ϕ_{N} (x_{M}) \end{matrix}] and d Φ_{X} = [\begin{matrix} d ϕ_{1} (x_{1}) & \dots & d ϕ_{1} (x_{M}) \\ ⋮ & ⋱ & ⋮ \\ d ϕ_{N} (x_{1}) & \dots & d ϕ_{N} (x_{M}) \end{matrix}],

where:

d ϕ_{n} (x) = \sum_{i = 1}^{d} b_{i} (x) \frac{\partial ϕ_{n}}{\partial x_{i}} (x) + \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) \frac{\partial^{2} ϕ_{n}}{\partial x_{i} \partial x_{j}} (x) .

The matrix representation of the least-squares approximation of the Koopman generator

L

is then given by:

{\hat{L}}^{⊤} = d Φ_{X} Φ_{X}^{+} = \hat{A} {\hat{G}}^{+},

with:

\hat{A} = \frac{1}{M} \sum_{m = 1}^{M} d ϕ (x_{m}) ϕ {(x_{m})}^{⊤} and \hat{G} = \frac{1}{M} \sum_{m = 1}^{M} ϕ (x_{m}) ϕ {(x_{m})}^{⊤} .

It was shown that gEDMD, in the infinite data limit, converges to a Galerkin projection of the generator onto the space spanned by the basis functions

{ϕ_{n}}_{n = 1}^{N}

and that

\hat{L}

is an empirical estimate of the projected generator [13]. Approximations of eigenfunctions of

L

are then given by:

φ_{ℓ} (x) = ⟨ξ_{ℓ}, ϕ (x)⟩,

where

ξ_{ℓ}

is an eigenvector of

\hat{L}

corresponding to the eigenvalue

λ_{ℓ}

and

⟨\cdot, \cdot⟩

denotes the standard Euclidean inner product. Analogously, the generator of the Perron–Frobenius operator is given by

{({\hat{L}}^{*})}^{⊤} = \hat{A}^{⊤} {\hat{G}}^{+}

. Further details, examples, and different applications including system identification, coarse graining, and control can also be found in [13].

2.3. Second-Order Differential Operators

Consider the generator

L

in (2), and assume there is a unique strictly positive invariant density

ρ_{0}

, which we can write as

ρ_{0} (x) \propto exp (- F (x))

. The function F is called a generalized potential (with

F = β V

for the stochastic differential equation in Remark 1). The measure corresponding to

ρ_{0}

is denoted by

d μ = ρ_{0} d x

. The negative generator can be decomposed into a symmetric and an anti-symmetric part as:

\begin{matrix} - L & = - \frac{1}{2} e^{F} \nabla \cdot (e^{- F} a \nabla \cdot) + J \cdot \nabla = S + A, \end{matrix}

(3)

\begin{matrix} J & = \frac{1}{2} e^{F} \nabla \cdot (e^{- F} a) - b; \end{matrix}

(4)

see [24]. The vector field J is called stationary probability flow. In the form of (3),

- L

is a special case of an elliptic second-order differential operator on

L_{μ}^{2}

, given by:

T = - \frac{1}{2} e^{F} \nabla \cdot (e^{- F} a \nabla \cdot) + J \cdot \nabla + W,

(5)

for scalar functions

F, W

, a uniformly positive definite matrix field a, and a vector field J.

Remark 2.

Because of the general form of (5), we avoid making too many assumptions about the coefficients of

T

or its domain of definition. The goal is to derive numerical algorithms using a minimal set of assumptions. A detailed analysis of the interplay between the domains and properties of the reproducing kernel Hilbert space (RKHS) will be carried out in future publications.

If

F \equiv 0

, we obtain generalized Schrödinger operators as another special case, i.e.,

H = - \frac{1}{2} \nabla \cdot (a \nabla \cdot) + J \cdot \nabla + W,

(6)

with W called the potential energy in quantum mechanics. In particular, with the reduced Planck constant ℏ and the mass

m

, setting

a \equiv \frac{ℏ^{2}}{m} I

and

J \equiv 0

leads to the Hamiltonian

H = - \frac{ℏ^{2}}{2 m} Δ + W

of the time-independent Schrödinger equation in quantum mechanics:

H ψ = E ψ .

(7)

We note for later use that, under certain conditions, Schrödinger operators and Koopman generators are equivalent; see, e.g., ([24] Chapter 4.9). For the sake of completeness, the proof is shown in Appendix A.

Lemma 1.

The ergodic generator

- L

with unique positive invariant density

ρ_{0} \propto exp (- F)

is unitarily equivalent to the Schrödinger operator

H

in (6) on

L^{2}

, with J remaining unchanged and W given by:

\begin{matrix} W & = - \frac{1}{4} \nabla \cdot (a \nabla F) + \frac{1}{8} \nabla F^{⊤} a \nabla F + \frac{1}{2} J \cdot \nabla F . \end{matrix}

The function

e^{- \frac{1}{2} F}

is an eigenfunction of

H

with eigenvalue zero. Conversely, let

H

be as in (6), and assume there is a non-degenerate smallest eigenvalue

E_{0}

with strictly positive real eigenfunction

ψ_{0} = exp (- η)

. Then,

H

is unitarily equivalent to a negative ergodic generator

- L

on

L_{μ}^{2}

, where

ρ_{0} \propto exp (- 2 η)

is the density associated with μ and

ρ_{0}

is invariant for the corresponding SDE. The explicit form of

- L

is given by:

- L = e^{η} [H - E_{0}] (e^{- η} \cdot) = - \frac{1}{2} e^{2 η} \nabla \cdot (e^{- 2 η} a \nabla \cdot) + J \cdot \nabla .

Corollary 1.

Applying Lemma 1 to (7), we have:

\begin{matrix} \frac{1}{ψ_{0}} (H - E_{0}) (ψ_{0} f) = - (- \frac{ℏ^{2}}{m} \nabla η \cdot \nabla f + \frac{ℏ^{2}}{2 m} Δ f) = - L f, \end{matrix}

where

L

is the Koopman generator of a drift-diffusion process (see Remark 1) with potential (up to an additive constant):

V (x) = \frac{ℏ^{2}}{m} η (x),

and temperature

β^{- 1} = \frac{ℏ^{2}}{2 m}

.

We will exploit this duality below to apply methods developed for the Koopman operator or generator to the Schrödinger operator. More details on quantum chemistry in general and also the quantum harmonic oscillator and the hydrogen atom studied in Section 4 can be found, e.g., in [25].

2.4. Reproducing Kernel Hilbert Spaces and Derivative Reproducing Properties

We aim at representing the differential operators introduced above in reproducing kernel Hilbert spaces.

Definition 1.

Let

X

be a set and

H

a space of functions

f : X \to R

. Then,

H

is called an RKHS with inner product

{⟨\cdot, \cdot⟩}_{H}

if a function

k : X \times X \to R

exists such that:

(i): ${⟨f, k (x, \cdot)⟩}_{H} = f (x)$ for all $f \in H$ and
(ii): $H = \bar{span {k (x, \cdot) ∣ x \in X}}$ .

The function k is called a kernel. It was shown that every RKHS has a unique symmetric positive definite reproducing kernel and that, conversely, every symmetric positive definite kernel spans a unique RKHS; see [26,27,28]. Here, we use the terms positive definite and strictly positive definite, i.e., positive definite means that

\sum_{r = 1}^{M} \sum_{s = 1}^{M} γ_{r} γ_{s} k (x_{r}, x_{s}) \geq 0

for all

M \in N

,

γ_{1}, \dots, γ_{M} \in R

, and

x_{1}, \dots, x_{M} \in X

. Frequently used kernels include the polynomial kernel and the Gaussian kernel, given by:

k (x, x^{'}) = {(c + x^{⊤} x^{'})}^{q} and k (x, x^{'}) = exp (- \frac{{∥x - x^{'}∥}^{2}}{2 ς^{2}}),

respectively. Here,

q \in N

is the degree of the polynomial kernel,

c \geq 0

a parameter, and

ς

the bandwidth of the Gaussian kernel. We now introduce the partial derivative reproducing properties of RKHSs [16]. Let

α = (α_{1}, \dots, α_{d}) \in N_{0}^{d}

be a multi-index and

| α | = \sum_{i = 1}^{d} α_{i}

. Furthermore, for a fixed

p \in N_{0}

, we define the index set

I_{p} = {α \in N_{0}^{d} : | α | \leq p}

. Given

f : X \to R

, let

D^{α}

denote the partial derivative (assuming it exists):

D^{α} f = \frac{\partial^{| α |}}{\partial x_{1}^{α_{1}} \dots \partial x_{d}^{α_{d}}} f .

Thus, the

i^{th}

entry of the gradient is given by

D^{e_{i}} f

and the

{(i, j)}^{th}

entry of the Hessian by

D^{e_{i} + e_{j}}

, where

e_{i}

and

e_{j}

are the

i^{th}

and

j^{th}

unit vectors, respectively. When we apply the differential operator

D^{α}

to the kernel k, the multi-index

α

is assumed to be embedded into

N_{0}^{2 d}

by adding zeros, i.e., the derivatives are computed with respect to the first argument of the kernel. Furthermore, when we write

\nabla k (x, x^{'})

, the gradient is computed with respect to x. In what follows, let

k (x, \cdot) = ϕ (x)

, where

ϕ

is the canonical feature space mapping.

Theorem 1

([16]). Given

p \in N_{0}

and a positive definite kernel

k : X \times X \to R

with

k \in C^{2 p} (X \times X)

, the following holds:

(i): $D^{α} k (x, \cdot) \in H$ for any $x \in X$ and $α \in I_{p}$ .
(ii): $(D^{α} f) (x) = {⟨D^{α} k (x, \cdot), f⟩}_{H}$ for any $x \in X$ , $f \in H$ , and $α \in I_{p}$ .

The second property is called the derivative reproducing property. For

p = 0

, this reduces to the standard reproducing property of RKHSs.

Example 1.

Let us consider the two aforementioned kernels:

For the polynomial kernel, we obtain:

$D^{e_{i}} k (x, x^{'}) = q x_{i}^{'} {(c + x^{⊤} x^{'})}^{q - 1} a n d D^{e_{i} + e_{j}} k (x, x^{'}) = q (q - 1) x_{i}^{'} x_{j}^{'} {(c + x^{⊤} x^{'})}^{q - 2} .$

Thus, $\nabla k (x, x^{'}) = q x^{'} {(c + x^{⊤} x^{'})}^{q - 1}$ and $\nabla^{2} k (x, x^{'}) = q (q - 1) x^{'} x^{' ⊤} {(c + x^{⊤} x^{'})}^{q - 2}$ .
Similarly, for the Gaussian kernel, this results in:

$\begin{matrix} D^{e_{i}} k (x, x^{'}) & = - \frac{1}{ς^{2}} (x_{i} - x_{i}^{'}) k (x, x^{'}), \\ D^{e_{i} + e_{j}} k (x, x^{'}) & = \{\begin{matrix} [\frac{1}{ς^{4}} {(x_{i} - x_{i}^{'})}^{2} - \frac{1}{ς^{2}}] k (x, x^{'}), & i = j, \\ \frac{1}{ς^{4}} (x_{i} - x_{i}^{'}) (x_{j} - x_{j}^{'}) k (x, x^{'}), & i \neq j, \end{matrix} \end{matrix}$

$\nabla k (x, x^{'}) = - \frac{1}{ς^{2}} (x - x^{'}) k (x, x^{'})$ , and $\nabla^{2} k (x, x^{'}) = [\frac{1}{ς^{4}} (x - x^{'}) {(x - x^{'})}^{⊤} - \frac{1}{ς^{2}} I] k (x, x^{'})$ .

For the numerical experiments below, we will mainly use the Gaussian kernel. (To get error estimates, it might be more convenient to use Wendland functions [19]. We leave the formal analysis of the methods developed in this paper for future work.)

3. Kernel-Based Representation of Differential Operators

In this section, we introduce the Galerkin projection of the differential operators discussed above onto the RKHS, including the Koopman generator and Schrödinger operator. We then move on to show how these projected operators can be estimated from data.

3.1. Galerkin Projection of Operators

Let

μ

denote a probability measure on the state space

X

, with density

ρ_{0} \propto e^{- F}

for a generalized potential F.

Definition 2.

We define the covariance operator

C_{00} : H \to H

by:

C_{00} = \int ϕ (x) \otimes ϕ (x) d μ (x),

(8)

and an operator

T_{H} : H \to H

by:

\begin{matrix} T_{H} & = \int ϕ (x) \otimes [- \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) D^{e_{i} + e_{j}} ϕ (x)] d μ (x) \\ + \int ϕ (x) \otimes [\sum_{i = 1}^{d} (J_{i} (x) - \frac{1}{2} e^{F (x)} \nabla \cdot (e^{- F (x)} a_{:, i} (x))) D^{e_{i}} ϕ (x)] d μ (x) \\ + \int W (x) ϕ (x) \otimes ϕ (x) d μ (x) . \end{matrix}

(9)

If

J \equiv 0

, we define

T_{H}

by:

T_{H} = \int [\frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) (D^{e_{i}} ϕ (x) \otimes D^{e_{j}} ϕ (x))] + W (x) ϕ (x) \otimes ϕ (x) d μ (x) .

(10)

The operator

C_{00}

is the standard covariance operator

C_{X X}

; see [29,30]. The operator

T_{H}

mimics the action of the bilinear form

{⟨T f, g⟩}_{μ}

on the RKHS. It plays the same role as the cross-covariance operator

C_{X Y}

for the Koopman operator in [15]. The form of the symmetric operator for

J \equiv 0

is motivated by the symmetry of

T

, and that, at least formally:

{⟨T f, g⟩}_{μ} = \int [\frac{1}{2} \nabla f {(x)}^{⊤} a (x) \nabla g (x)] + W (x) f (x) g (x) d μ (x);

3.2. Empirical Estimates

The next step is to derive empirical estimates of the operators defined above. Given training data

{x_{m}}_{m = 1}^{M}

, sampling the probability distribution

μ

, we define

Φ = [ϕ (x_{1}), \dots, ϕ (x_{M})]

and

d Φ = [d ϕ (x_{1}), \dots, d ϕ (x_{M})]

, where:

\begin{matrix} d ϕ (x_{m}) & = - \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x_{m}) D^{e_{i} + e_{j}} ϕ (x_{m}) \\ + \sum_{i = 1}^{d} [J_{i} (x_{m}) - \frac{1}{2} \sum_{j = 1}^{d} e^{F (x_{m})} \frac{\partial}{\partial x_{j}} (e^{- F (x_{m})} a_{j i} (x_{m}))] D^{e_{i}} ϕ (x_{m}) \\ + W (x_{m}) ϕ (x_{m}) . \end{matrix}

If

T

is the generator of an SDE with invariant measure

μ

, the data can also be obtained by integrating the stochastic dynamics with the initial condition drawn from

μ

. We see that

Φ

is the standard feature map and

d Φ

contains the action of the differential operator. The empirical estimates of the operators

C_{00}

and

T_{H}

are then given by the following expressions:

\begin{matrix} {\hat{C}}_{00} & = \frac{1}{M} \sum_{m = 1}^{M} ϕ (x_{m}) \otimes ϕ (x_{m}) & = \frac{1}{M} Φ Φ^{⊤}, \\ {\hat{T}}_{H} & = \frac{1}{M} \sum_{m = 1}^{M} ϕ (x_{m}) \otimes d ϕ (x_{m}) & = \frac{1}{M} Φ d Φ^{⊤} . \end{matrix}

Note that these are still finite-rank operators on the full RKHS

H

. For the symmetric RKHS operator

T_{H}

, we need to define the empirical estimate in a slightly different way. Decompose the positive definite matrix

a (x_{m}) = σ (x_{m}) σ {(x_{m})}^{⊤}

. With:

d ϕ_{l} (x_{m}) = \sum_{i = 1}^{d} σ_{i l} (x_{m}) D^{e_{i}} ϕ (x_{m}) = [\nabla ϕ {(x_{m})}^{⊤} σ_{l} (x_{m})],

where

σ_{l}

is the

l^{th}

column of

σ

, the empirical RKHS operator becomes:

{\hat{T}}_{H} = \frac{1}{2 M} \sum_{m = 1}^{M} \sum_{l = 1}^{d} d ϕ_{l} (x_{m}) \otimes d ϕ_{l} (x_{m}) + \frac{1}{M} \sum_{m = 1}^{M} W (x_{m}) ϕ (x_{m}) \otimes ϕ (x_{m}) .

Remark 3.

If the feature space associated with the kernel k is finite-dimensional and known explicitly, i.e.,

ϕ (x) = {[ϕ_{1} (x), \dots, ϕ_{N} (x)]}^{⊤}

and

k (x, x^{'}) = ⟨ϕ (x), ϕ (x^{'})⟩

, then for the Koopman generator, we obtain gEDMD as a special case, with

{\hat{C}}_{00} = \hat{G}

and

{\hat{T}}_{H} = - {\hat{A}}^{⊤}

. However, the goal is to rewrite gEDMD in such a way that only kernel evaluations are required since ϕ can potentially be infinite-dimensional and might only be defined implicitly.

3.3. Weak Formulation and Numerical Algorithm

With Lemma 2 in mind, we now proceed to the weak formulation of the eigenvalue problem for the operator

T

. We then define the quadratic forms:

\begin{matrix} Q (f, g) & = {⟨T f, g⟩}_{μ}, & f, g \in D_{Q}, & S (f, g) & = {⟨f, g⟩}_{μ}, & f, g \in L_{μ}^{2}, \\ Q_{H} (f, g) & = {⟨T_{H} f, g⟩}_{H}, & f, g \in H, & S_{H} (f, g) & = {⟨C_{00} f, g⟩}_{H}, & f, g \in H, \\ {\hat{Q}}_{H} (f, g) & = {⟨{\hat{T}}_{H} f, g⟩}_{H}, & f, g \in H, & {\hat{S}}_{H} (f, g) & = {⟨{\hat{C}}_{00} f, g⟩}_{H}, & f, g \in H, \end{matrix}

where

D_{Q}

is the domain of the quadratic form

Q

. We consider the weak eigenvalue problems:

\begin{matrix} Q (f_{n}, g) & = λ_{n} S (f_{n}, g) & \forall g \in D_{Q}, \end{matrix}

(15)

\begin{matrix} Q_{H} ({\tilde{f}}_{n}, g) & = {\tilde{λ}}_{n} S_{H} ({\tilde{f}}_{n}, g) & \forall g \in H, \end{matrix}

(16)

\begin{matrix} {\hat{Q}}_{H} ({\hat{f}}_{n}, g) & = {\hat{λ}}_{n} {\hat{S}}_{H} ({\hat{f}}_{n}, g) & \forall g \in H . \end{matrix}

(17)

We will now rewrite (17) in such a way that only kernel evaluations—in the form of Gram matrices—are required. The derivation is similar to the kernel transfer operator counterpart in [15], but we now need to consider derivatives at the training data points instead of the time-lagged variables. We start by restricting (17) to the finite-dimensional space

H^{M} = span {ϕ (x_{m})}_{m = 1}^{M}

, which we assume to be M-dimensional. Elements of this space are of the form

f = Φ u

for some vector

u \in R^{m}

. We examine the quadratic forms

{\hat{Q}}_{H}

and

{\hat{S}}_{H}

on this space.

Lemma 4.

A solution of the problem

{\hat{Q}}_{H} (f, g) = \hat{λ} {\hat{S}}_{H} (f, g)

is given by

f = Φ u

, where u is a solution of one of the following generalized eigenvalue problems:

(i): In the general case, u solves $G_{2} u = \hat{λ} G_{0} u$ , where the entries of the matrices $G_{2}$ and $G_{0}$ are given by:

$\begin{matrix} {[G_{2}]}_{m r} & = [d ϕ (x_{m})] (x_{r}), & {[G_{0}]}_{m r} & = [ϕ (x_{m})] (x_{r}) . \end{matrix}$
(ii): Analogously, for the symmetric case, we obtain $\frac{1}{2} \sum_{l = 1}^{d} G_{1}^{(l)} G_{1}^{(l)} u = \hat{λ} G_{0} G_{0} u$ , where we define:

${[G_{1}^{(l)}]}_{m r} = σ_{l} {(x_{m})}^{⊤} \nabla k (x_{m}, x_{r})$

and $σ_{l} (x_{m})$ is the $l^{th}$ column of the matrix $σ (x_{m})$ .

The proofs are shown in Appendix A. Since

[ϕ (x_{m})] (x_{r}) = k (x_{m}, x_{r})

,

G_{0}

is the standard Gram matrix. The reversible case requires only first-order derivatives of the kernel. Furthermore, only trajectory data sampled from the invariant distribution

μ

and estimates of the diffusion term

σ

are needed. For typical problems,

σ

is constant and not position-dependent. As a result, the diffusion term needs to be estimated only once or might even be known. For molecular dynamics problems, for instance, it is proportional to the square root of the temperature. The overall approach is summarized in the following algorithm. Note that it is not a direct kernelization of gEDMD, but an extension that approximates the Koopman generator as a special case.

Algorithm 1.

The final numerical algorithm can be summarized as follows:

(1): Choose a kernel k and compute all its required derivatives, either analytically or with the aid of automatic differentiation.
(2): Assemble the Gram matrices $G_{2}$ and $G_{0}$ or, if the system is symmetric, $G_{1}^{(l)}$ , for $l = 1, \dots, d$ , and $G_{0}$ .
(3): Solve the corresponding eigenvalue problem described in Lemma 4 to obtain an eigenvector u.
(4): An eigenfunction is then given by $φ = Φ u$ .

The two main steps of the algorithm are assembling the Gram matrices and solving the generalized eigenvalue problem. Since the size of the eigenvalue problem depends on the number of data points, the cost is cubic in M. This is a drawback of many kernel-based methods. The efficient approximation of solutions to this eigenvalue problem for large datasets will be considered in future work.

3.4. Analysis

In this section, we provide some preliminary analysis of the methods introduced above. The first result concerns the convergence of the empirical estimates.

Lemma 5.

As

M \to \infty

, the empirical estimates defined in Section 3.2 converge to the corresponding RKHS operators in Definition 2 with respect to the operator norm for almost all data sequences

{x_{m}}_{m = 1}^{M}

, if the data were generated either as i.i.d. samples from μ or by integrating a stochastic dynamics, which is ergodic with respect to μ.

Proof.

The statement follows from ergodicity of the underlying dynamics, the integrability conditions in Lemma 2, and the Birkhoff individual ergodic theorem for Banach space valued functions [32]. □

Next, we generalize ([33] Theorem 7) to obtain convergence rates on the empirical estimates for i.i.d. data:

Lemma 6.

Assume that (11–14) hold. Then:

(i): The operators $C_{00}$ , ${\hat{C}}_{00}$ , $T_{H}$ , and ${\hat{T}}_{H}$ are Hilbert–Schmidt.
(ii): Let $δ \in (0, 1]$ . Assume the coefficients of the operator $T$ are all globally bounded, and let ${sup}_{x \in X} D^{α} k (x, x) < \infty$ for all $| α | \leq 4$ ( $| α | \leq 2$ in the symmetric case). If the data are drawn i.i.d. from the distribution μ, then there are constants $κ_{0}, κ_{1}$ such that with probability at least $1 - δ$ ,

$\begin{matrix} ∥ C_{00} - {\hat{C}}_{00} ∥_{H S} & \leq \frac{2 κ_{0} \sqrt{2}}{\sqrt{M}} {log}^{1 / 2} \frac{2}{δ}, & ∥ T_{H} - {\hat{T}}_{H} ∥_{H S} & \leq \frac{2 κ_{1} \sqrt{2}}{\sqrt{M}} {log}^{1 / 2} \frac{2}{δ}, \end{matrix}$

where the ${∥\cdot∥}_{H S}$ is the Hilbert–Schmidt norm.

Proof.

(i) The empirical estimates are all finite rank and, therefore, Hilbert-Schmidt. For

C_{00}

and

T_{H}

, this follows from the integrability conditions and the first part of the proof of Lemma 2; see Appendix A.

(ii) For

C_{00}

, the bound was already proven in [33] with

κ_{0} = {sup}_{x \in X} k {(x, x)}^{2}

. We can employ the same strategy to obtain the bound for

T_{H}

. Consider the operator

{\hat{T}}_{H}^{m} = ϕ (x_{m}) \otimes d ϕ (x_{m}) - T_{H}

, which satisfies

E^{μ} [{\hat{T}}_{H}^{m}] = 0

. By global boundedness of the coefficients of

T

and by:

\begin{matrix} ∥ ϕ (x) \otimes D^{α} {ϕ (x) ∥}_{H S} & = {∥ ϕ (x) ∥}_{H} {∥ D^{α} ϕ (x) ∥}_{H} \\ = {⟨k (x, \cdot), k (x, \cdot)⟩}_{H}^{1 / 2} {⟨D^{α} k (x, \cdot), D^{α} k (x, \cdot)⟩}_{H}^{1 / 2} \\ = k {(x, x)}^{1 / 2} D^{2 α} k {(x, x)}^{1 / 2}, \end{matrix}

we can find a

κ_{1}

such that

{∥ ϕ (x) \otimes d ϕ (x) ∥}_{H S} \leq κ_{1}

for all

x \in X

. We then have

∥ {\hat{T}}_{H}^{m} ∥_{H S} \leq 2 κ_{1}

, and the result follows from the concentration bound ([33] Equation (3)). □

Finally, we show that solutions of (16) are also eigenvalues of the full operator

T

if the RKHS is dense in

D_{Q}

:

Proposition 1.

Let

H

be dense in

D_{Q}

with respect to the norm in

L_{μ}^{2}

. If

{\tilde{ψ}}_{ℓ} \in H

is an eigenfunction of (16), it is also an eigenfunction of

T

with the same eigenvalue.

Proof.

Let

{\tilde{ψ}}_{ℓ}

solve the variational problem (16). The definition of the operators

C_{00}, T_{H}

implies that for all

ϕ \in H

:

\begin{matrix} {⟨T {\tilde{ψ}}_{ℓ}, ϕ⟩}_{μ} & = {⟨T_{H} {\tilde{ψ}}_{ℓ}, ϕ⟩}_{H} = {\tilde{λ}}_{ℓ} {⟨C_{00} {\tilde{ψ}}_{ℓ}, ϕ⟩}_{H} = {\tilde{λ}}_{ℓ} {⟨{\tilde{ψ}}_{ℓ}, ϕ⟩}_{μ} . \end{matrix}

By the density of the RKHS, this also holds for all

ϕ \in D_{Q}

, and consequently,

{\tilde{ψ}}_{ℓ}

is an eigenfunction of

T

. □

Note that even if the RKHS is dense, there might be additional eigenfunctions that are not contained in

H

and that will not appear as solutions of (16).

4. Applications

The methods described above have important applications in molecular dynamics and quantum physics, which we will show in an exemplary way, but can in principle be applied to data generated by arbitrary dynamical systems and also other differential operators. The code and select examples are available online [34]. Note that this is just a proof-of-concept implementation and that the methods could be sped up significantly by vectorizing and parallelizing the code and by tailoring the implementation to specific kernels.

4.1. Molecular Dynamics

Eigenvalues and eigenfunctions of transfer operators associated with molecular dynamics problems are often used to understand protein folding or binding/unbinding processes and their implied time scales. Conformations correspond to metastable sets and transitions between different conformations to crossing energy barriers. The slowest dynamical processes are encoded in eigenfunctions whose eigenvalues are close to zero. Large-scale molecular dynamics examples, analyzed using kernel EDMD, can also be found in [35]. In this paper, we want to focus more on new applications.

Example 2.

Let us consider the simple quadruple-well problem whose potential V is visualized in Figure 1a; see also [13]. We first generate an equilibrated trajectory so that the training dataset of size

M = 5000

is sampled from the invariant distribution and then apply kernel gEDMD for reversible processes, choosing a Gaussian kernel with bandwidth

ς = 0.5

. The operator

- L

has four dominant eigenvalues

λ_{0} = 0.009

,

λ_{1} = 0.400

,

λ_{2} = 1.011

, and

λ_{3} = 1.55

, followed by a spectral gap. We then apply SEBA (sparse eigenbasis approximation; see [36]) to cluster the dominant eigenfunctions into four metastable sets. The results are shown in Figure 1b. As expected, the sets correspond to the wells of the potential. The computation and clustering of the eigenfunctions took approximately four minutes on a standard laptop (8 cores, 1.80 GHz, 16 GB of RAM). For comparison, we estimated the generator eigenvalues using a Markov state model. Applying both methods to 20 different trajectories, we computed the average of the eigenvalues and the standard deviation, see Figure 1c. The results were in excellent agreement. Clearly, the standard deviation increased for higher eigenvalues.

Figure 1. (a) Quadruple-well potential. The color blue corresponds to small values and yellow to large values. (b) Clustering into four metastable sets based on sparse eigenbasis approximation (SEBA). (c) Eigenvalues computed using kernel generator extended dynamic mode decomposition (gEDMD) and a Markov state model. The bars indicate the estimated standard deviation.

4.2. Quantum Mechanics

The goal now is to apply data-driven methods to simple quantum mechanics problems of the form (7) with

H = - \frac{ℏ^{2}}{2 m} Δ + W

.

4.2.1. Generator EDMD for the Schrödinger Equation

Let us consider two systems for which the eigenfunctions are well known.

Example 3.

For the quantum harmonic oscillator with angular frequency ω, the potential can be written as

W (x) = \frac{1}{2} m ω^{2} x^{2}

. The eigenfunctions

ψ_{ℓ}

and corresponding energy levels

E_{ℓ}

of this system can be computed analytically, and we obtain:

ψ_{ℓ} (x) = \frac{1}{\sqrt{2^{ℓ} ℓ!}} {(\frac{m ω}{π ℏ})}^{1 / 4} exp (- \frac{m ω}{2 ℏ} x^{2}) H_{ℓ} (\sqrt{\frac{m ω}{ℏ}} x)

and

E_{ℓ} = ℏ ω (ℓ + \frac{1}{2})

, for

ℓ = 0, 1, 2, \dots

. Here,

H_{ℓ}

denotes the

ℓ^{th}

physicists’ Hermite polynomial. For the numerical experiments, we set

ℏ = m = ω = 1

. Furthermore, the bandwidth of the kernel is set to

ς = 1

. Computing the Gram matrices

G_{2}

and

G_{0}

for 100 uniformly distributed points in

[- 5, 5]

and solving the corresponding eigenvalue problem, this results in the eigenfunctions shown in Figure 2. The probability densities

p_{ℓ}

are defined by

p_{ℓ} (x) = {| ψ_{ℓ} (x) |}^{2}

.

Figure 2. (a) Numerically computed eigenfunctions

ψ_{ℓ}

and associated energy levels

E_{ℓ}

of the quantum harmonic oscillator. The results are virtually indistinguishable from the analytical results. (b) Corresponding probability densities

p_{ℓ}

.

Example 4.

As a second example, let us analyze the Schrödinger equation for the hydrogen atom, where

W (x) = - \frac{e^{2}}{4 π ε_{0} ∥x∥}

, with

x \in R^{3}

. Here, e is the electron charge and

ε_{0}

the vacuum permittivity. Note that the parameter

m

in front of the Laplacian is the reduced mass of the system. As before, we define the physical constants to be one and use the Gaussian kernel, now with bandwidth

ς = 2

. We then generate 5000 uniformly distributed test points in the ball with radius 20 and compute the Gram matrices

G_{2}

and

G_{0}

. Solving the resulting eigenvalue problem, we obtain the eigenfunctions shown in Figure 3. As expected, there are several repeated eigenvalues (up to small perturbations due to the randomly sampled test points and numerical errors) for the higher energy states.

Figure 3. Numerically computed eigenfunctions of the Schrödinger equation associated with the hydrogen atom. Only points where the absolute value of the eigenfunction is larger than a given threshold are plotted. The shapes clearly resemble the well-known hydrogen atom orbitals shown next to the scatter plots. The eigenfunctions (or rotations thereof) correspond to the following quantum numbers

(\underset{̲}{n}, \underset{̲}{ℓ}, \underset{̲}{m})

: (a)

(1, 0, 0)

, (b)

(2, 1, 1)

, (c)

(3, 2, 1)

, and (d)

(4, 3, 1)

.

4.2.2. SDE Formulation of the Schrödinger Equation

In order to derive gEDMD, we went from the stochastic differential equation to the Kolmogorov backward equation, which is the generator of the Koopman operator, or the adjoint Fokker–Planck equation, which is the generator of the Perron–Frobenius operator. Exploiting the resemblance between these two equations and the Schrödinger equation, we illustrated how data-driven methods can, in the same way, be used to compute wavefunctions. We now want to go in the opposite direction and find a stochastic differential equation whose eigenfunctions correspond to the wavefunctions. Formal similarities between quantum mechanics and the theory of stochastic processes have been investigated since the beginning of quantum mechanics by Schrödinger and others (see, for example, [37] and the references therein). The necessary transformations were already introduced in Section 2.3; we now want to exploit these relationships. Let us consider the two aforementioned examples again.

Example 5.

Using Corollary 1, the quantum harmonic oscillator can be transformed into an Ornstein–Uhlenbeck process:

d X_{t} = - α X_{t} d t + \sqrt{2 β^{- 1}} d B_{t},

with friction coefficient

α = ℏ ω

and temperature

β^{- 1} = \frac{ℏ^{2}}{2 m}

. Since the eigenvalues of the Ornstein–Uhlenbeck process are

λ_{ℓ} = - α ℓ = - ℏ ω ℓ

, the resulting eigenvalues of the quantum harmonic oscillator are

E_{ℓ} = E_{0} - λ_{ℓ} = ℏ ω (ℓ + \frac{1}{2})

. Correspondingly, the (unnormalized) eigenfunctions of the Ornstein–Uhlenbeck process are

φ_{ℓ} (x) = {\tilde{H}}_{ℓ} (\sqrt{α β} x)

, where

{\tilde{H}}_{ℓ}

is the

ℓ^{th}

probabilists’ Hermite polynomial. Thus,

ψ_{ℓ} (x) = ψ_{0} (x) {\tilde{H}}_{ℓ} (\sqrt{\frac{2 m ω}{ℏ}} x) = exp (- \frac{m ω}{2 ℏ} x^{2}) H_{ℓ} (\sqrt{\frac{m ω}{ℏ}} x),

which is consistent with the results obtained above. In the last step, we transformed the probabilists’ Hermite polynomials into the physicists’ Hermite polynomials.

Example 6.

Similarly, for the hydrogen atom, whose ground state is given by:

ψ_{0} (x) = \frac{1}{\sqrt{π a_{0}^{3}}} exp (- \frac{1}{a_{0}} ∥x∥),

where

a_{0} = \frac{4 π ε_{0} ℏ^{2}}{m e^{2}}

, we obtain

V (x) = \frac{ℏ^{2}}{m a_{0}} ∥x∥

, and thus:

\nabla V (x) = \frac{ℏ^{2}}{m a_{0}} \frac{x}{∥x∥} .

There are now two options to compute the eigenfunctions numerically: we can either directly apply kernel gEDMD to the Koopman generator or generate time-series data by integrating the stochastic differential equation and then applying kernel EDMD or simply Ulam’s method. We proceed with the former, but the latter leads to comparable results (although typically, more data points are required to achieve the same accuracy due to the stochasticity). We again generate uniformly distributed test points

x_{m}

in the ball with radius 20, this time

m = 10, 000

, and use the Gaussian kernel with bandwidth

ς = 2

. This results in the same eigenfunctions as the ones shown in Figure 3. Due to the larger number of test points, even higher energy states can be well approximated. Two additional eigenfunctions are shown in Figure 4.

Figure 4. Eigenfunctions of the Schrödinger equation associated with the hydrogen atom computed by applying kernel gEDMD to the corresponding Koopman generator. The quantum numbers

(\underset{̲}{n}, \underset{̲}{ℓ}, \underset{̲}{m})

are: (a)

(3, 2, 0)

and (b)

(4, 3, 2)

.

The examples illustrate that instead of solving partial differential equations, we can also compute eigenfunctions by approximating the Koopman operator from time-series data. The question under which conditions a non-degenerate strictly positive ground state exists needs to be addressed separately. One important theorem can be found in [38]:

Theorem 2.

Let

L_{loc}^{2} (X)

be the space of locally square-integrable functions and

W \in L_{loc}^{2} (X)

positive. Suppose

{lim}_{| x | \to \infty} W (X) = \infty

, then

- Δ + W

has a non-degenerate strictly positive ground state.

There are other results concerning the existence of such states; see [38] for details. Furthermore, diffusion Monte Carlo methods, which simultaneously compute the ground state energy and wavefunction, rely on similar assumptions [39]. However, in many cases of interest, the ground state of fermionic systems will have nodes so that these methods are not applicable [39]. The work presented here aims mainly at linking different operators describing the evolution of dynamical systems; more detailed relationships—in particular with the aforementioned diffusion Monte Carlo methods—and practical implications will be studied in future work.

4.3. Manifold Learning

So far, we assumed that the data were generated by a dynamical system. There is, however, a second scenario without any notion of time, where the Kolmogorov backward equation and Fokker–Planck equation are used for dimensionality reduction and manifold learning [21]; see also [20,22,23] and the references therein.

Let the data points

{x_{m}}_{m = 1}^{M}

be sampled from an arbitrary probability density

ρ

, then we can define the associated potential by:

U (x) = - log ρ (x) .

It was shown in [21] that, depending on some normalization parameter

α

, anisotropic diffusion maps approximate operators of the form:

L_{α} f = - 2 (1 - α) \nabla U \cdot \nabla f + Δ f .

That is, for

α = \frac{1}{2}

, we obtain the standard Kolmogorov backward equation with

β = 1

. Thus, the algorithms described above could also potentially be used for manifold learning purposes. We will illustrate this with a simple example.

Example 7.

We consider the well-known Swiss roll; see, for instance, [23]. The goal is to parametrize the two-dimensional manifold. We use kernel density estimation, cf. [40], and a Gaussian kernel with bandwidth

ς = 0.22

to learn

U (x)

, i.e.,

U (x) = \frac{1}{M {(\sqrt{2 π} ς)}^{d}} \sum_{m = 1}^{M} k (x, x_{m})

and approximate the backward Kolmogorov operator by applying kernel gEDMD. Here,

M = 2000

and

d = 2

. The results are shown in Figure 5. The first eigenfunction parametrizes the angular direction, followed by higher order modes, and only the sixth eigenfunction corresponds to the

x_{3}

direction. Considering these eigenfunctions as new coordinates, we obtain an unfolding of the roll. Note that also diffusion maps do not yield perfect rectangles in the embedded space due to the non-uniform density of points on the manifold [23].

Figure 5. Swiss roll colored with respect to the eigenfunctions (a)

φ_{0}

and (b)

φ_{5}

, which parametrize the angular and vertical direction, respectively. (c) Resulting two-dimensional embedding.

These results demonstrate that the eigenfunctions of certain differential operators capture geometrical properties of the data. However, the assumption that a strictly positive density in the ambient space exists will in general not be satisfied if the data are supported only on a lower dimensional manifold. This problem was circumvented by using kernel density estimation and a kernel with global support. Carrying over the definition of the differential operators involved and of their kernel-based analogues to the manifold case are beyond the scope of this paper and will be studied in future work. The same applies to the investigation of detailed relationships with diffusion maps or other manifold learning techniques. Concepts like neighborhood and sparsity will then need to be carried over to gEDMD to make this method amenable to large datasets. Furthermore, heuristics to find the optimal bandwidth

ς

are required since the results often strongly depend on the kernel hyperparameters.

5. Conclusions

Using the theory of derivative reproducing kernel Hilbert spaces, we derived a kernel-based formulation of gEDMD for approximating the Koopman generator, which allowed for the computation of eigenfunctions of potentially high-dimensional stochastic dynamical systems. If the system is reversible, the generator can be approximated from equilibrated time-series data, without having to estimate the drift and diffusion terms at the training data points. Furthermore, we showed that data-driven methods developed for the analysis of stochastic dynamical systems (kernel EDMD) can be carried over to their generators (kernel gEDMD) and, in turn, to the Schrödinger operator. Conversely, under certain assumptions on the ground state, the Schrödinger equation can be turned into a Kolmogorov backward equation corresponding to a drift-diffusion process. These results are summarized in Figure 6. Similar transformations also exist for the Fokker–Planck operator; see [24]. All derived approaches were illustrated with numerical results ranging from molecular dynamics to quantum mechanics.

Figure 6. Relationships between the Koopman, Kolmogorov, and Schrödinger operators for a drift-diffusion process of the form

d X_{t} = - \nabla V (X_{t}) d t + \sqrt{2 β^{- 1}} d B_{t}

. Here,

ρ_{0}

denotes the invariant density, i.e.,

L^{*} ρ_{0} = 0

. In our setting, the transformation of the Schrödinger operator requires a strictly positive real-valued ground state

ψ_{0}

.

Although we focused mainly on the Kolmogorov backward equation, the Fokker–Planck equation, and the Schrödinger equation, these methods can be applied to approximate other differential operators as well. An interesting open question is whether such algorithms can also be used for manifold learning. Some preliminary results were presented in Section 4, but a rigorous mathematical justification would require significant additional research. Analyzing connections with diffusion maps [20] or generalizations thereof in detail could be a potential direction for future work.

Another interesting avenue for future research could be to improve the efficiency and stability of the presented algorithms. Exploiting the properties of the given kernels, it might be possible to speed up computations significantly. The definition of a cutoff radius for the kernel or considering only a certain number of neighbors of data points, for instance, would—for suitable problems—result in sparse matrices. Moreover, the results sensitively depend on the hyperparameters such as the bandwidth of the Gaussian kernel. If the bandwidth is too small, this leads to overfitting and noisy eigenfunctions. If it is, on the other hand, too large, then the kernel is not able to capture the properties of the dynamical system accurately anymore. As a result, the Gram matrix

G_{0}

has (numerically) essentially a low rank structure, and we obtain many zero eigenvalues. The question is then how to compute the smallest nonzero eigenvalues and corresponding eigenvectors efficiently.

Potential solutions for the hyperparameter tuning problem are techniques based on cross-validation [41] or so-called kernel flows [42]. By defining an optimization problem for the parameters of the kernel, e.g., based on a variational principle [43], gradient descent methods can help find suitable parameter values.

Author Contributions

Conceptualization, S.K., F.N., and B.H.; methodology, S.K. and F.N.; software, S.K. and F.N.; formal analysis, S.K. and F.N.; writing, original draft preparation, S.K., F.N., and B.H. All authors read and agreed to the published version of the manuscript.

Funding

S.K. was funded by Deutsche Forschungsgemeinschaft (DFG) through the grant CRC 1114 (Scaling Cascades in Complex Systems, project ID: 235221301) and through Germany’s Excellence Strategy (MATH+: The Berlin Mathematics Research Center, EXC-2046/1, project ID: 390685689). B.H. thanks the European Commission for funding through the Marie Curie fellowships scheme.

Acknowledgments

The publication of this article was funded by Freie Universität Berlin. We would like to thank Luigi Delle Site for many helpful discussions regarding quantum chemistry and Amel Durakovic for useful discussions on the correspondence between the Schrödinger equation and complex Langevin dynamics.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proof of Lemma 1.

The unitary transformation is

H = - e^{- \frac{1}{2} F} L (e^{\frac{1}{2} F} \cdot)

. We obtain:

\begin{matrix} H f & = e^{- \frac{1}{2} F} S (e^{\frac{1}{2} F} f) + e^{- \frac{1}{2} F} J \cdot (e^{\frac{1}{2} F} \nabla f + \frac{1}{2} e^{\frac{1}{2} F} f \nabla F) \\ = e^{- \frac{1}{2} F} S (e^{\frac{1}{2} F} f) + J \cdot \nabla f + \frac{1}{2} J \cdot \nabla F f, \end{matrix}

which establishes the first-order term in

H

and the third term in the definition of W. For the symmetric part, we find that:

\begin{matrix} e^{- \frac{1}{2} F} S (e^{\frac{1}{2} F} f) & = - \frac{1}{2} e^{\frac{1}{2} F} \nabla \cdot (e^{- F} a \nabla (e^{\frac{1}{2} F} f)) \\ = - \frac{1}{2} e^{\frac{1}{2} F} \nabla \cdot (e^{- \frac{1}{2} F} a [\nabla f + \frac{1}{2} \nabla F f]) \\ = - \frac{1}{2} \nabla \cdot (a \nabla f) + \frac{1}{4} \nabla F^{⊤} a \nabla f - \frac{1}{4} e^{\frac{1}{2} F} \nabla \cdot (e^{- \frac{1}{2} F} f a \nabla F), \end{matrix}

which establishes the second-order term in the definition of

H

. Expanding the third term above, we get:

\begin{matrix} - \frac{1}{4} e^{\frac{1}{2} F} \nabla \cdot (e^{- \frac{1}{2} F} f a \nabla F) & = - \frac{1}{4} \nabla \cdot (a \nabla F) f - \frac{1}{4} \nabla f^{⊤} a \nabla F + \frac{1}{8} \nabla F^{⊤} a \nabla F f, \end{matrix}

which cancels out the second term of the previous equation and establishes the remaining terms for W.

For the converse direction, we first translate the eigenvalue equation for

ψ_{0}

into an equation for

η

:

\begin{matrix} 0 & = (H - E_{0}) ψ_{0} = - \frac{1}{2} \nabla \cdot (a \nabla e^{- η}) + J \cdot \nabla e^{- η} + (W - E_{0}) e^{- η} \\ = - \frac{1}{2} \nabla \cdot (- e^{- η} a \nabla η) - J \cdot \nabla η e^{- η} + (W - E_{0}) e^{- η} \\ = e^{- η} [\frac{1}{2} \nabla \cdot (a \nabla η) - \frac{1}{2} \nabla η^{⊤} a \nabla η - J \cdot \nabla η + W - E_{0}], \end{matrix}

implying that the term in brackets is also vanishing. Now, we define the negative generator by the transformation

- L = e^{η} [H - E_{0}] (e^{- η} \cdot)

. Expanding the action of

- L

, we find:

\begin{matrix} - L f & = e^{η} [- \frac{1}{2} \nabla \cdot (a \nabla (e^{- η} f)) + J \cdot \nabla (e^{- η} f) + (W - E_{0}) (e^{- η} f)] \\ = e^{η} [- \frac{1}{2} \nabla \cdot (e^{- η} a [\nabla f - \nabla η f]) + e^{- η} J \cdot \nabla f + (W - E_{0} - J \cdot \nabla η) (e^{- η} f)] \\ = - \frac{1}{2} \nabla \cdot (a \nabla f) + \frac{1}{2} \nabla η^{⊤} a \nabla f + \frac{1}{2} \nabla \cdot (a \nabla η) f + \frac{1}{2} \nabla η^{⊤} a \nabla f - \frac{1}{2} \nabla η^{⊤} a \nabla η f + \\ J \cdot \nabla f + (W - E_{0} - J \cdot \nabla η) f \\ = - \frac{1}{2} \nabla \cdot (a \nabla f) + \nabla η^{⊤} a \nabla f + J \cdot \nabla f \\ = - \frac{1}{2} e^{2 η} \nabla \cdot (e^{- 2 η} a \nabla f) + J \cdot \nabla f . \end{matrix}

□

Proof of Lemma 2.

We only show the proof for

T_{H}

. Similar to the argument in [44],

T_{H}

is a bounded linear operator on

H

because of:

\begin{matrix} ∥ \int ϕ (x) \otimes [- \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) D^{e_{i} + e_{j}} ϕ (x)] d μ (x) \\ + \int ϕ (x) \otimes [\sum_{i = 1}^{d} (J_{i} (x) - \frac{1}{2} e^{F (x)} \nabla \cdot (e^{- F (x)} a_{:, i} (x))) D^{e_{i}} ϕ (x)] d μ (x) \\ + \int W (x) ϕ (x) \otimes ϕ (x) d μ (x) ∥_{H S} \\ \leq \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} \int | a_{i j} (x) | ∥ D^{e_{i} + e_{j}} {ϕ (x) ∥}_{H} {∥ ϕ (x) ∥}_{H} d μ (x) \\ + \sum_{i = 1}^{d} \int (| J_{i} (x) | + \frac{1}{2} e^{F (x)} | \nabla \cdot (e^{- F (x)} a_{:, i} (x)) |) ∥ D^{e_{i}} {ϕ (x) ∥}_{H} {∥ ϕ (x) ∥}_{H} d μ (x) \\ + {\int | W (x) | ∥ ϕ (x) ∥}_{H} {∥ ϕ (x) ∥}_{H} d μ (x) \\ < \infty . \end{matrix}

Using the derivative reproducing property, we obtain:

\begin{matrix} {⟨T f, g⟩}_{μ} & = \int (T f) (x) g (x) d μ (x) \\ = \int [- \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) \frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} (x)] g (x) d μ (x) \\ + \int [\sum_{i = 1}^{d} (J_{i} (x) - \frac{1}{2} e^{F (x)} \nabla \cdot (e^{- F (x)} a_{:, i} (x))) \frac{\partial f}{\partial x_{i}}] g (x) d μ (x) \\ + \int W (x) f (x) g (x) d μ (x) \\ = \int [- \frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) {⟨D^{e_{i} + e_{j}} ϕ (x), f⟩}_{H}] {⟨ϕ (x), g⟩}_{H} d μ (x) \\ + \int [\sum_{i = 1}^{d} (J_{i} (x) - \frac{1}{2} e^{F (x)} \nabla \cdot (e^{- F (x)} a_{:, i} (x))) {⟨D^{e_{i}} ϕ (x), f⟩}_{H}] {⟨ϕ (x), g⟩}_{H} d μ (x) \\ + \int W (x) {⟨ϕ (x), f⟩}_{H} {⟨ϕ (x), g⟩}_{H} d μ (x) \\ = - \int [\frac{1}{2} \sum_{i = 1}^{d} \sum_{j = 1}^{d} a_{i j} (x) {⟨D^{e_{i} + e_{j}} ϕ (x) \otimes ϕ (x), f \otimes g⟩}_{H \otimes H}] d μ (x) \\ + \int [\sum_{i = 1}^{d} (J_{i} (x) - \frac{1}{2} e^{F (x)} \nabla \cdot (e^{- F (x)} a_{:, i} (x))) {⟨D^{e_{i}} ϕ (x) \otimes ϕ (x), f \otimes g⟩}_{H \otimes H}] d μ (x) \\ + \int W (x) {⟨ϕ (x) \otimes ϕ (x), f \otimes g⟩}_{H \otimes H} d μ (x) \\ = {⟨T_{H} f, g⟩}_{H} . \end{matrix}

The same argument can be used to prove the statement about the symmetric case. □

Proof of Lemma 4.

Let

f = Φ u

and

g = Φ v

. Then:

\begin{matrix} {\hat{S}}_{H} (f, g) & = {⟨[\frac{1}{M} \sum_{m = 1}^{M} ϕ (x_{m}) \otimes ϕ (x_{m})] \sum_{r = 1}^{M} u_{r} ϕ (x_{r}), \sum_{s = 1}^{M} v_{s} ϕ (x_{s})⟩}_{H} \\ = \frac{1}{M} \sum_{m = 1}^{M} \sum_{r = 1}^{M} \sum_{s = 1}^{M} u_{r} v_{s} {⟨ϕ (x_{m}), ϕ (x_{r})⟩}_{H} {⟨ϕ (x_{m}), ϕ (x_{s})⟩}_{H} \\ = \frac{1}{M} \sum_{m = 1}^{M} \sum_{r = 1}^{M} \sum_{s = 1}^{M} u_{r} v_{s} k (x_{m}, x_{r}) k (x_{m}, x_{s}) \\ = \frac{1}{M} ⟨G_{0} u, G_{0} v⟩ . \end{matrix}

Similarly,

\begin{matrix} {\hat{Q}}_{H} (f, g) & = {⟨[\frac{1}{M} \sum_{m = 1}^{M} ϕ (x_{m}) \otimes d ϕ (x_{m})] \sum_{r = 1}^{M} u_{r} ϕ (x_{r}), \sum_{s = 1}^{M} v_{s} ϕ (x_{s})⟩}_{H} \\ = \frac{1}{M} \sum_{m = 1}^{M} \sum_{r = 1}^{M} \sum_{s = 1}^{M} u_{r} v_{s} {⟨d ϕ (x_{m}), ϕ (x_{r})⟩}_{H} {⟨ϕ (x_{m}), ϕ (x_{s})⟩}_{H} \\ = \frac{1}{M} ⟨G_{2} u, G_{0} v⟩ . \end{matrix}

If the kernel functions at the training points are linearly independent, then

G_{0}

is invertible, and it suffices to compute eigenvectors u of the generalized matrix eigenvalue problem

G_{2} u = λ G_{0} u

. In the symmetric case, the expression for the quadratic form

{\hat{Q}}_{H} (f, g)

changes to:

\begin{matrix} {\hat{Q}}_{H} (f, g) & = \frac{1}{2 M} \sum_{l = 1}^{d} \sum_{m = 1}^{M} \sum_{r = 1}^{M} \sum_{s = 1}^{M} u_{r} v_{s} (σ_{l} {(x_{m})}^{⊤} \nabla k (x_{m}, x_{r})) (σ_{l} {(x_{m})}^{⊤} \nabla k (x_{m}, x_{s})) \\ = \frac{1}{2 M} \sum_{l = 1}^{d} ⟨G_{1}^{(l)} u, G_{1}^{(l)} v⟩ . \end{matrix}

□

References

Koopman, B. Hamiltonian systems and transformations in Hilbert space. Proc. Natl. Acad. Sci. USA 1931, 17, 315. [Google Scholar] [CrossRef]
Lasota, A.; Mackey, M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed. Probab. Eng. Inf. Sci. 1994, 10, 311. [Google Scholar]
Mezić, I. Spectral Properties of Dynamical Systems, Model Reduction and Decompositions. Nonlinear Dyn. 2005, 41, 309–325. [Google Scholar] [CrossRef]
Budišić, M.; Mohr, R.; Mezić, I. Applied Koopmanism. Chaos Interdiscip. J. Nonlinear Sci. 2012, 22. [Google Scholar] [CrossRef]
Mauroy, A.; Mezić, I. Global stability analysis using the eigenfunctions of the Koopman operator. IEEE Trans. Autom. Control 2016, 61, 3356–3369. [Google Scholar] [CrossRef]
Klus, S.; Koltai, P.; Schütte, C. On the numerical approximation of the Perron–Frobenius and Koopman operator. J. Comput. Dyn. 2016, 3, 51–79. [Google Scholar] [CrossRef]
Kaiser, E.; Kutz, J.N.; Brunton, S.L. Data-driven discovery of Koopman eigenfunctions for control. arXiv 2017, arXiv:1707.01146. [Google Scholar]
Korda, M.; Mezić, I. Linear predictors for nonlinear dynamical systems: Koopman operator meets model predictive control. Automatica 2018, 93, 149–160. [Google Scholar] [CrossRef]
Peitz, S.; Klus, S. Koopman operator-based model reduction for switched-system control of PDEs. Automatica 2019, 106, 184–191. [Google Scholar] [CrossRef]
Klus, S.; Husic, B.E.; Mollenhauer, M.; Noé, F. Kernel methods for detecting coherent structures in dynamical data. Chaos 2019. [Google Scholar] [CrossRef]
Williams, M.O.; Kevrekidis, I.G.; Rowley, C.W. A Data-Driven Approximation of the Koopman Operator: Extending Dynamic Mode Decomposition. J. Nonlinear Sci. 2015, 25, 1307–1346. [Google Scholar] [CrossRef]
Williams, M.O.; Rowley, C.W.; Kevrekidis, I.G. A Kernel-Based Method for Data-Driven Koopman Spectral Analysis. J. Comput. Dyn. 2015, 2, 247–265. [Google Scholar] [CrossRef]
Klus, S.; Nüske, F.; Peitz, S.; Niemann, J.H.; Clementi, C.; Schütte, C. Data-driven approximation of the Koopman generator: Model reduction, system identification, and control. Physica D 2020, 406, 132416. [Google Scholar] [CrossRef]
Mauroy, A.; Goncalves, J. Linear identification of nonlinear systems: A lifting technique based on the Koopman operator. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 6500–6505. [Google Scholar]
Klus, S.; Schuster, I.; Muandet, K. Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces. J. Nonlinear Sci. 2019. [Google Scholar] [CrossRef]
Zhou, D.X. Derivative reproducing properties for kernel methods in learning theory. J. Comput. Appl. Math. 2008, 220, 456–463. [Google Scholar] [CrossRef]
Giesl, P.; Hamzi, B.; Rasmussen, M.; Webster, K. Approximation of Lyapunov functions from noisy data. J. Comput. Dyn. 2019. [Google Scholar] [CrossRef]
Haasdonk, B.; Hamzi, B.; Santin, G.; Witwar, D. Greedy Kernel Methods for Center Manifold Approximation. arXiv 2018, arXiv:1810.11329. [Google Scholar]
Wendland, H. Scattered Data Approximation; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar] [CrossRef]
Coifman, R.R.; Lafon, S. Diffusion maps. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. [Google Scholar] [CrossRef]
Nadler, B.; Lafon, S.; Coifman, R.R.; Kevrekidis, I.G. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal. 2006, 21, 113–127. [Google Scholar] [CrossRef]
Coifman, R.R.; Kevrekidis, I.G.; Lafon, S.; Maggioni, M.; Nadler, B. Diffusion Maps, Reduction Coordinates, and Low Dimensional Representation of Stochastic Systems. Multiscale Model. Simul. 2008, 7, 842–864. [Google Scholar] [CrossRef]
Nadler, B.; Lafon, S.; Coifman, R.R.; Kevrekidis, I.G. Diffusion Maps—A Probabilistic Interpretation for Spectral Embedding and Clustering Algorithms. In Principal Manifolds for Data Visualization and Dimension Reduction; Gorban, A., Kégl, B., Wunsch, D., Zinovyev, A., Eds.; Springer: Heidelberg, Germany, 2008; pp. 238–260. [Google Scholar]
Pavliotis, G.A. Stochastic Processes and Applications: Diffusion Processes, the Fokker–Planck and Langevin Equations; Springer: New York, NY, USA, 2014. [Google Scholar]
Levine, I.N. Quantum Chemistry; Prentice Hall: Upper Saddle River, NJ, USA, 2000. [Google Scholar]
Aronszajn, N. Theory of Reproducing Kernels. Trans. Am. Math. Soc. 1950, 68, 337–404. [Google Scholar] [CrossRef]
Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond; MIT press: Cambridge, MA, USA, 2001. [Google Scholar]
Steinwart, I.; Christmann, A. Support Vector Machines, 1st ed.; Springer: New York, NY, USA, 2008. [Google Scholar]
Baker, C. Mutual Information for Gaussian Processes. SIAM J. Appl. Math. 1970, 19, 451–458. [Google Scholar] [CrossRef]
Baker, C. Joint Measures and Cross-Covariance Operators. Trans. Am. Math. Soc. 1973, 186, 273–289. [Google Scholar] [CrossRef]
Davies, E.B. Spectral Theory and Differential Operators; Cambridge University Press: Cambridge, UK, 1996; Volume 42. [Google Scholar]
Chacon, R.V. An ergodic theorem for operators satisfying norm conditions. J. Math. Mech. 1962, 11, 165–172. [Google Scholar]
Rosasco, L.; Belkin, M.; Vito, E.D. On Learning with Integral Operators. J. Mach. Learn. Res. 2010, 11, 905–934. [Google Scholar]
Klus, S. Data-Driven Dynamical Systems Toolbox. Available online: https://github.com/sklus/d3s/ (accessed on 1 May 2020).
Klus, S.; Bittracher, A.; Schuster, I.; Schütte, C. A kernel-based approach to molecular conformation analysis. J. Chem. Phys. 2018, 149, 244109. [Google Scholar] [CrossRef]
Froyland, G.; Rock, C.P.; Sakellariou, K. Sparse eigenbasis approximation: Multiple feature extraction across spatiotemporal scales with application to coherent set identification. Commun. Nonlinear Sci. Numer. Simul. 2019, 77, 81–107. [Google Scholar] [CrossRef]
Okamoto, H. Stochastic formulation of quantum mechanics based on a complex Langevin equation. J. Phys. A Math. Gen. 1990, 23, 5535–5545. [Google Scholar] [CrossRef]
Reed, M.; Simon, B. Methods of Modern Mathematical Physics. IV Analysis of Operators; Academic Press: San Diego, CA, USA, 1978. [Google Scholar]
Kosztin, I.; Faber, B.; Schulten, K. Introduction to the diffusion Monte Carlo method. Am. J. Phys. 1996, 64, 633–644. [Google Scholar] [CrossRef]
Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962, 33, 1065–1076. [Google Scholar] [CrossRef]
McGibbon, R.T.; Pande, V.S. Variational cross-validation of slow dynamical modes in molecular kinetics. J. Chem. Phys. 2015, 142, 03B621_1. [Google Scholar] [CrossRef]
Owhadi, H.; Yoo, G.R. Kernel Flows: From learning kernels from data into the abyss. J. Comput. Phys. 2019, 389, 22–47. [Google Scholar] [CrossRef]
Wu, H.; Noé, F. Variational approach for learning Markov processes from time series data. arXiv 2017, arXiv:1707.04659. [Google Scholar] [CrossRef]
Muandet, K.; Fukumizu, K.; Sriperumbudur, B.; Schölkopf, B. Kernel mean embedding of distributions: A review and beyond. Found. Trends Mach. Learn. 2017, 10, 1–141. [Google Scholar] [CrossRef]

Figure 1. (a) Quadruple-well potential. The color blue corresponds to small values and yellow to large values. (b) Clustering into four metastable sets based on sparse eigenbasis approximation (SEBA). (c) Eigenvalues computed using kernel generator extended dynamic mode decomposition (gEDMD) and a Markov state model. The bars indicate the estimated standard deviation.

Figure 2. (a) Numerically computed eigenfunctions

ψ_{ℓ}

and associated energy levels

E_{ℓ}

of the quantum harmonic oscillator. The results are virtually indistinguishable from the analytical results. (b) Corresponding probability densities

p_{ℓ}

.

Figure 3. Numerically computed eigenfunctions of the Schrödinger equation associated with the hydrogen atom. Only points where the absolute value of the eigenfunction is larger than a given threshold are plotted. The shapes clearly resemble the well-known hydrogen atom orbitals shown next to the scatter plots. The eigenfunctions (or rotations thereof) correspond to the following quantum numbers

(\underset{̲}{n}, \underset{̲}{ℓ}, \underset{̲}{m})

: (a)

(1, 0, 0)

, (b)

(2, 1, 1)

, (c)

(3, 2, 1)

, and (d)

(4, 3, 1)

.

Figure 4. Eigenfunctions of the Schrödinger equation associated with the hydrogen atom computed by applying kernel gEDMD to the corresponding Koopman generator. The quantum numbers

(\underset{̲}{n}, \underset{̲}{ℓ}, \underset{̲}{m})

are: (a)

(3, 2, 0)

and (b)

(4, 3, 2)

.

Figure 5. Swiss roll colored with respect to the eigenfunctions (a)

φ_{0}

and (b)

φ_{5}

, which parametrize the angular and vertical direction, respectively. (c) Resulting two-dimensional embedding.

Figure 6. Relationships between the Koopman, Kolmogorov, and Schrödinger operators for a drift-diffusion process of the form

d X_{t} = - \nabla V (X_{t}) d t + \sqrt{2 β^{- 1}} d B_{t}

. Here,

ρ_{0}

denotes the invariant density, i.e.,

L^{*} ρ_{0} = 0

. In our setting, the transformation of the Schrödinger operator requires a strictly positive real-valued ground state

ψ_{0}

.

Table 1. Overview of notation.

$X_{t}$	stochastic process
$X$	state space
$k, ϕ$	kernel and associated feature map
$H$	reproducing kernel Hilbert space induced by k
$K^{t}$	Koopman operator with lag time t
$L$	generator of the Koopman operator
$H$	Schrödinger operator
$T$	general differential operator
$T_{H}$	kernel-based differential operator
$C_{00}$	covariance operator
$\hat{A}$	empirical estimate of operator $A$
$G_{0}, G_{1}, G_{2}$	(generalizations of) Gram matrices

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Kernel-Based Approximation of the Koopman Generator and Schrödinger Operator

Abstract

1. Introduction

2. Koopman Theory and Reproducing Kernel Hilbert Spaces

2.1. The Koopman Operator and Its Generator

2.2. Generator EDMD

2.3. Second-Order Differential Operators

2.4. Reproducing Kernel Hilbert Spaces and Derivative Reproducing Properties

3. Kernel-Based Representation of Differential Operators

3.1. Galerkin Projection of Operators

3.2. Empirical Estimates

3.3. Weak Formulation and Numerical Algorithm

3.4. Analysis

4. Applications

4.1. Molecular Dynamics

4.2. Quantum Mechanics

4.2.1. Generator EDMD for the Schrödinger Equation

4.2.2. SDE Formulation of the Schrödinger Equation

4.3. Manifold Learning

5. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A. Proofs

References

Article Metrics

Citations

Article Access Statistics