Kernel-Based Approximation of the Koopman Generator and Schrödinger Operator

Many dimensionality and model reduction techniques rely on estimating dominant eigenfunctions of associated dynamical operators from data. Important examples include the Koopman operator and its generator, but also the Schrödinger operator. We propose a kernel-based method for the approximation of differential operators in reproducing kernel Hilbert spaces and show how eigenfunctions can be estimated by solving auxiliary matrix eigenvalue problems. The resulting algorithms are applied to molecular dynamics and quantum chemistry examples. Furthermore, we exploit that, under certain conditions, the Schrödinger operator can be transformed into a Kolmogorov backward operator corresponding to a drift-diffusion process and vice versa. This allows us to apply methods developed for the analysis of high-dimensional stochastic differential equations to quantum mechanical systems.


Introduction
The Koopman operator [1][2][3][4] plays a central role in the global analysis of complex dynamical systems. It is, for instance, used to find conformations of molecules and coherent patterns in fluid flows, but also for prediction, stability analysis, and control [5][6][7][8][9][10]. Instead of analyzing a given finite-dimensional, but highly nonlinear system directly, the underlying idea is to compute an associated infinite-dimensional, but linear operator [4]. By computing an approximation of this operator from measurement or simulation data, it is possible to extract Koopman eigenvalues, eigenfunctions, and modes. The most frequently used techniques are based on variants or generalizations of extended dynamic mode decomposition (EDMD) [11,12]. A reformulation of EDMD for the generator of the Koopman operator, called gEDMD, was recently proposed in [13]. It was shown that, in addition to the previously mentioned applications, the generator contains valuable information about the governing equations of a system; see also [7,14]. System identification aims at learning a preferably parsimonious model from data. That is, the learned model should comprise as few terms as possible and still have predictive power, which is typically accomplished by utilizing sparse regression techniques.

One drawback of gEDMD is that it requires a set of explicitly chosen basis functions and their first- and, if the system is non-deterministic and non-reversible, second-order derivatives. Moreover, the size of the resulting matrix eigenvalue problem that needs to be solved to compute eigenvalues, eigenfunctions, and modes of the generator depends on the size of the dictionary. The goal of this paper is to derive a kernel-based method to approximate the Koopman generator from data. A kernel-based variant of EDMD was proposed in [12] and generalized in [15]. We derive a kernel-based variant of gEDMD. Employing the well-known kernel trick, a dual eigenvalue problem whose size depends on the number of snapshots can be constructed. The resulting methods allow for implicitly infinite-dimensional feature spaces and only require partial derivatives of the kernel function. This enables us to apply the methods to high-dimensional systems for which conventional techniques would be prohibitively expensive due to the curse of dimensionality, provided the number of snapshots is such that the eigenvalue problem can still be solved numerically or the data can be downsampled without losing essential information.

Since we aim at approximating differential operators, we need to be able to represent derivatives in reproducing kernel Hilbert spaces. This requires the notion of derivative reproducing properties. Derivative reproducing kernels [16] were used to approximate Lyapunov functions for ordinary differential equations in [17] and to approximate center manifolds for ordinary differential equations in [18]. Reproducing kernel Hilbert spaces with derivative reproducing properties are related to the native spaces introduced in a different context in [19].
Similar operators are also used for manifold learning and understanding the geometry of high-dimensional data [20][21][22][23]. Methods like diffusion maps construct graph Laplacians with the aid of diffusion kernels, effectively approximating transition probabilities between data points. In the infinite data limit and letting the kernel bandwidth go to zero, it has been shown that these methods, depending on the normalization, essentially compute eigenfunctions of certain differential operators, e.g., the Laplace-Beltrami operator, the Kolmogorov backward operator, or the Fokker-Planck operator.
Another related differential operator that is of utmost importance in quantum mechanics is the Schrödinger operator. Solutions of the time-independent Schrödinger equation describe stationary states and associated energy levels. We will illustrate how kernel-based methods developed for the Koopman generator can be applied to these related problems. The main contributions of this paper are:

• We show how the derivative reproducing properties of kernels can be used to approximate differential operators such as the Koopman generator and the Schrödinger operator, as well as their eigenvalues and eigenfunctions, from data. Additionally, we derive a kernel-based method tailored to reversible dynamics, which does not require estimating drift and diffusion terms, but only an equilibrated trajectory.

• Furthermore, we exploit the fact that, under certain conditions, the Schrödinger operator can be turned into a Kolmogorov backward operator (see, e.g., [24]), which allows for the interpretation of a quantum-mechanical system as a drift-diffusion process and, as a consequence, the application of methods developed for the analysis of stochastic differential equations or their generators.

• We demonstrate potential applications in molecular dynamics, using the example of a quadruple-well problem, and quantum mechanics, describing how to apply the proposed methods directly to the Schrödinger equation or the associated stochastic process. This will be illustrated with two well-known examples, the quantum harmonic oscillator and the hydrogen atom.
The remainder of the manuscript is structured as follows: We first introduce the necessary tools, namely the Koopman operator, its generator, and (derivative) reproducing kernel Hilbert spaces in Section 2. Additionally, relationships with the Schrödinger equation will be explored. The derivation of the kernel-based formulation of gEDMD will be detailed in Section 3. In Section 4, we will show how the derived methods can be applied to molecular dynamics and quantum mechanics problems. Concluding remarks and future work will be discussed in Section 5.

Koopman Theory and Reproducing Kernel Hilbert Spaces
We start directly with the non-deterministic setting; the Koopman operator and its generator for ordinary differential equations can then be regarded as a special case; see also [13] for a detailed comparison. The notation used below is summarized in Table 1.

The Koopman Operator and Its Generator
In what follows, let X ⊂ R^d be the state space and f : X → R a real-valued observable of the system. Furthermore, let E[·] denote the expected value and Θ^t the flow map associated with a dynamical system, i.e., Θ^t(X_0) = X_t. Given a stochastic differential equation of the form

dX_t = b(X_t) dt + σ(X_t) dB_t,    (1)

where b : X → R^d is the drift term, σ : X → R^{d×d} the diffusion term, and B_t d-dimensional Brownian motion, the stochastic Koopman operator is defined by

(K^t f)(x) = E[f(Θ^t(x))] = E[f(X_t) | X_0 = x].

The infinitesimal generator L of the semigroup of Koopman operators is given by

L f = b · ∇f + (1/2) a : ∇²f = ∑_{i=1}^d b_i ∂f/∂x_i + (1/2) ∑_{i=1}^d ∑_{j=1}^d a_ij ∂²f/(∂x_i ∂x_j),    (2)

and its adjoint, the generator of the Perron-Frobenius operator, by

L* f = −∇·(b f) + (1/2) ∑_{i=1}^d ∑_{j=1}^d ∂²(a_ij f)/(∂x_i ∂x_j),

with a = σσᵀ. We assume from now on that a is uniformly positive definite on X. The second-order partial differential equation ∂u/∂t = L u is also called the Kolmogorov backward equation and ∂u/∂t = L* u the Fokker-Planck equation [2].

Remark 1.
As in [13], we will often consider systems of the form

dX_t = −∇V(X_t) dt + √(2β⁻¹) dB_t,

where V is a given potential and β the inverse temperature. In this case, the operators can be written as

L f = −∇V · ∇f + β⁻¹ Δf and L* f = ∇·(∇V f) + β⁻¹ Δf.
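To make the setting concrete, the following minimal sketch samples a trajectory of such a drift-diffusion process with the Euler-Maruyama scheme. The function names and the double-well potential are our own choices for illustration and not part of the method itself.

```python
import numpy as np

def euler_maruyama(grad_V, beta, x0, dt=1e-3, n_steps=100_000, rng=None):
    """Sample a trajectory of dX_t = -grad V(X_t) dt + sqrt(2/beta) dB_t."""
    rng = np.random.default_rng() if rng is None else rng
    d = len(x0)
    X = np.empty((n_steps + 1, d))
    X[0] = x0
    noise_scale = np.sqrt(2.0 * dt / beta)  # Euler-Maruyama noise increment scale
    for i in range(n_steps):
        X[i + 1] = X[i] - grad_V(X[i]) * dt + noise_scale * rng.standard_normal(d)
    return X

# Illustrative example: one-dimensional double-well potential V(x) = (x^2 - 1)^2.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
trajectory = euler_maruyama(grad_V, beta=4.0, x0=np.array([1.0]))
```

Discarding an initial burn-in portion of such a trajectory yields approximately equilibrated samples from the invariant density ρ_0 ∝ e^{−βV}, which is the data regime assumed by the reversible method derived below.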

Generator EDMD
A data-driven method for the approximation of the generator of the Koopman operator and Perron-Frobenius operator, called generator extended dynamic mode decomposition (gEDMD), was derived in [13]. While standard EDMD requires a training dataset {x_m}_{m=1}^M and the corresponding data points {y_m}_{m=1}^M, where y_m = Θ^τ(x_m) for a fixed lag time τ, gEDMD assumes that we can evaluate or estimate (using, for instance, Kramers-Moyal formulae) the drift b(x_m) and, if the system is non-deterministic, the diffusion σ(x_m) at the training data points. Choosing a dictionary of basis functions {φ_n}_{n=1}^N, where φ_n : R^d → R, and defining φ(x) = [φ_1(x), ..., φ_N(x)]ᵀ, we compute the matrices Φ_X, dΦ_X ∈ R^{N×M}, with

Φ_X = [φ(x_1), ..., φ(x_M)] and dΦ_X = [dφ(x_1), ..., dφ(x_M)],

where

dφ_n(x_m) = (L φ_n)(x_m) = b(x_m) · ∇φ_n(x_m) + (1/2) a(x_m) : ∇²φ_n(x_m).

The matrix representation of the least-squares approximation of the Koopman generator L is then given by

L̂ᵀ = dΦ_X Φ_X⁺ = ((1/M) dΦ_X Φ_Xᵀ)((1/M) Φ_X Φ_Xᵀ)⁺ =: A G⁺.

It was shown that gEDMD, in the infinite data limit, converges to a Galerkin projection of the generator onto the space spanned by the basis functions {φ_n}_{n=1}^N and that L̂ is an empirical estimate of the projected generator [13]. Approximations of eigenfunctions of L are then given by

φ̂(x) = ⟨ξ, φ(x)⟩,

where ξ is an eigenvector of L̂ corresponding to the eigenvalue λ and ⟨·, ·⟩ denotes the standard Euclidean inner product. Analogously, the matrix representation of the generator of the Perron-Frobenius operator is given by (L̂*)ᵀ = Aᵀ G⁺. Further details, examples, and different applications, including system identification, coarse graining, and control, can be found in [13].
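The gEDMD construction can be summarized in code as follows; this is a minimal sketch, and the helper signatures psi, grad_psi, and hess_psi (dictionary values, gradients, and Hessians) are our own conventions for this illustration, not prescribed by [13].

```python
import numpy as np

def gedmd(X, b, sigma, psi, grad_psi, hess_psi):
    """Minimal gEDMD sketch.

    X: (d, M) snapshots; b(x): drift in R^d; sigma(x): (d, d) diffusion;
    psi(x): dictionary values in R^N; grad_psi(x): (N, d); hess_psi(x): (N, d, d).
    """
    d, M = X.shape
    PsiX = np.column_stack([psi(X[:, m]) for m in range(M)])   # Phi_X, (N, M)
    dPsiX = np.empty_like(PsiX)                                # dPhi_X, (N, M)
    for m in range(M):
        x = X[:, m]
        a = sigma(x) @ sigma(x).T                              # a = sigma sigma^T
        # dphi_n(x) = b(x) . grad(phi_n)(x) + 0.5 a(x) : Hess(phi_n)(x)
        dPsiX[:, m] = grad_psi(x) @ b(x) \
            + 0.5 * np.einsum('nij,ij->n', hess_psi(x), a)
    G = PsiX @ PsiX.T / M
    A = dPsiX @ PsiX.T / M
    LT = A @ np.linalg.pinv(G)                                 # L-hat transposed, A G^+
    evals, xi = np.linalg.eig(LT.T)                            # eigenpairs of L-hat
    return evals, xi
```

Approximate eigenfunctions are then evaluated as x ↦ ⟨ξ, ψ(x)⟩ for an eigenvector ξ returned by the sketch above.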

Second-Order Differential Operators
Consider the generator L in (2), and assume there is a unique strictly positive invariant density ρ_0, which we can write as ρ_0(x) ∝ exp(−F(x)). The function F is called a generalized potential (with F = βV for the stochastic differential equation in Remark 1). The measure corresponding to ρ_0 is denoted by dµ = ρ_0 dx. The negative generator can be decomposed into a symmetric and an anti-symmetric part as

−L f = −(1/2) e^F ∇·(e^{−F} a ∇f) − J · ∇f,    (3)

with

J = b − (1/2) ∇·a + (1/2) a ∇F;    (4)

see [24]. The vector field J is called the stationary probability flow; note that ∇·(e^{−F} J) = 0 due to the invariance of ρ_0, which renders the second term in (3) anti-symmetric on L²_µ. In the form of (3), −L is a special case of an elliptic second-order differential operator on L²_µ, given by

T f = −(1/2) e^F ∇·(e^{−F} a ∇f) − J · ∇f + W f,    (5)

for scalar functions F, W, a uniformly positive definite matrix field a, and a vector field J.

Remark 2.
Because of the general form of (5), we avoid making too many assumptions about the coefficients of T or its domain of definition. The goal is to derive numerical algorithms using a minimal set of assumptions.
A detailed analysis of the interplay between the domains and properties of the reproducing kernel Hilbert space (RKHS) will be carried out in future publications.
If F ≡ 0, we obtain generalized Schrödinger operators as another special case, i.e.,

H f = −(1/2) ∇·(a ∇f) − J · ∇f + W f,    (6)

with W called the potential energy in quantum mechanics. In particular, with the reduced Planck constant ħ, mass m, a = (ħ²/m) Id, and J ≡ 0, we recover the standard Schrödinger operator

H = −(ħ²/2m) Δ + W.    (7)

We note for later use that, under certain conditions, Schrödinger operators and Koopman generators are equivalent; see, e.g., ([24], Chapter 4.9). For the sake of completeness, the proof is shown in Appendix A.

Lemma 1. The ergodic generator −L with unique positive invariant density ρ_0 ∝ exp(−F) is unitarily equivalent to the Schrödinger operator H in (6) on L², with J remaining unchanged and W given by

W = (1/8) ∇Fᵀ a ∇F − (1/4) ∇·(a ∇F) − (1/2) J · ∇F.

The function e^{−F/2} is an eigenfunction of H with eigenvalue zero. Conversely, let H be as in (6), and assume there is a non-degenerate smallest eigenvalue E_0 with strictly positive real eigenfunction ψ_0 = exp(−η). Then, H is unitarily equivalent to a negative ergodic generator −L on L²_µ, where ρ_0 ∝ exp(−2η) is the density associated with µ and ρ_0 is invariant for the corresponding SDE. The explicit form of −L is given by

−L f = −(1/2) e^{2η} ∇·(e^{−2η} a ∇f) − J · ∇f = −(1/2) ∇·(a ∇f) + (a ∇η) · ∇f − J · ∇f.

Corollary 1. Applying Lemma 1 to (7), we have

e^{η} (H − E_0)(e^{−η} ·) = −L,

where L is the Koopman generator of a drift-diffusion process (see Remark 1) with inverse temperature β⁻¹ = ħ²/(2m) and potential (up to an additive constant)

V = (ħ²/m) η = 2β⁻¹ η.

We will exploit this duality below to apply methods developed for the Koopman operator or generator to the Schrödinger operator. More details on quantum chemistry in general and also on the quantum harmonic oscillator and the hydrogen atom studied in Section 4 can be found, e.g., in [25].

Reproducing Kernel Hilbert Spaces and Derivative Reproducing Properties
We aim at representing the differential operators introduced above in reproducing kernel Hilbert spaces.

Definition 1. Let X be a set and H a space of functions f : X → R. Then, H is called an RKHS with inner product ⟨·, ·⟩_H if a function k : X × X → R exists such that:
(i) k(x, ·) ∈ H for all x ∈ X, and
(ii) f(x) = ⟨f, k(x, ·)⟩_H for all f ∈ H and x ∈ X (the reproducing property).

The function k is called a kernel. It was shown that every RKHS has a unique symmetric positive definite reproducing kernel and that, conversely, every symmetric positive definite kernel spans a unique RKHS; see [26][27][28]. Here, we use the terms positive definite and strictly positive definite, where positive definite means that ∑_{r=1}^M ∑_{s=1}^M γ_r γ_s k(x_r, x_s) ≥ 0 for all M ∈ N, γ_1, ..., γ_M ∈ R, and x_1, ..., x_M ∈ X. Frequently used kernels include the polynomial kernel and the Gaussian kernel, given by

k(x, x') = (c + xᵀx')^q and k(x, x') = exp(−‖x − x'‖²/(2ς²)),

respectively. Here, q ∈ N is the degree of the polynomial kernel, c ≥ 0 a parameter, and ς the bandwidth of the Gaussian kernel. We now introduce the partial derivative reproducing properties of RKHSs [16].
Given f : X → R, let D^α denote the partial derivative (assuming it exists)

D^α f = ∂^{|α|} f / (∂x_1^{α_1} ⋯ ∂x_d^{α_d}),

where α = (α_1, ..., α_d) ∈ N_0^d is a multi-index and |α| = ∑_i α_i. Thus, the i-th entry of the gradient is given by D^{e_i} f and the (i, j)-th entry of the Hessian by D^{e_i+e_j} f, where e_i and e_j are the i-th and j-th unit vectors, respectively. When we apply the differential operator D^α to the kernel k, the multi-index α is assumed to be embedded into N_0^{2d} by adding zeros, i.e., the derivatives are computed with respect to the first argument of the kernel. Furthermore, when we write ∇k(x, x'), the gradient is computed with respect to x. In what follows, let k(x, ·) = φ(x), where φ is the canonical feature space mapping.

Theorem 1 ([16]). Given p ∈ N_0 and a positive definite kernel k : X × X → R with k ∈ C^{2p}(X × X), the following holds for all |α| ≤ p:
(i) D^α φ(x) = D^α k(x, ·) ∈ H for all x ∈ X, and
(ii) (D^α f)(x) = ⟨f, D^α φ(x)⟩_H for all f ∈ H.

The second property is called the derivative reproducing property. For p = 0, this reduces to the standard reproducing property of RKHSs.

Example 1. Let us consider the two aforementioned kernels:

1. For the polynomial kernel, we obtain

D^{e_i} k(x, x') = q x'_i (c + xᵀx')^{q−1} and D^{e_i+e_j} k(x, x') = q (q−1) x'_i x'_j (c + xᵀx')^{q−2}.

2. Similarly, for the Gaussian kernel, this results in

D^{e_i} k(x, x') = −(x_i − x'_i)/ς² k(x, x') and D^{e_i+e_j} k(x, x') = ((x_i − x'_i)(x_j − x'_j)/ς⁴ − δ_ij/ς²) k(x, x').

For the numerical experiments below, we will mainly use the Gaussian kernel. (To obtain error estimates, it might be more convenient to use Wendland functions [19]. We leave the formal analysis of the methods developed in this paper for future work.)
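For concreteness, here is a small sketch of the Gaussian kernel and its first- and second-order partial derivatives with respect to the first argument, following the formulas above (using our bandwidth convention k(x, x') = exp(−‖x − x'‖²/(2ς²))).

```python
import numpy as np

def gauss_kernel(x, y, s):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 s^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * s**2))

def gauss_kernel_grad(x, y, s):
    """Gradient with respect to the first argument: entries D^{e_i} k(x, y)."""
    return -(x - y) / s**2 * gauss_kernel(x, y, s)

def gauss_kernel_hess(x, y, s):
    """Hessian with respect to the first argument: entries D^{e_i + e_j} k(x, y)."""
    diff = (x - y) / s**2
    return (np.outer(diff, diff) - np.eye(x.shape[0]) / s**2) * gauss_kernel(x, y, s)
```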

Kernel-Based Representation of Differential Operators
In this section, we introduce the Galerkin projection of the differential operators discussed above onto the RKHS, including the Koopman generator and Schrödinger operator. We then move on to show how these projected operators can be estimated from data.

Galerkin Projection of Operators
Let µ denote a probability measure on the state space X, with density ρ 0 ∝ e −F for a generalized potential F.

Definition 2.
We define the covariance operator C_00 : H → H by

C_00 = ∫ φ(x) ⊗ φ(x) dµ(x),    (8)

and an operator T_H : H → H by

T_H = ∫ φ(x) ⊗ (T φ)(x) dµ(x),    (9)

where T is applied to the first argument of the kernel. If J ≡ 0, we define T_H by

T_H = (1/2) ∑_{l=1}^d ∫ (σ_l(x) · ∇φ(x)) ⊗ (σ_l(x) · ∇φ(x)) dµ(x) + ∫ W(x) φ(x) ⊗ φ(x) dµ(x),    (10)

where σ_l denotes the l-th column of σ. The operator C_00 is the standard covariance operator C_XX; see [29,30]. The operator T_H mimics the action of the bilinear form ⟨T f, g⟩_µ on the RKHS. It plays the same role as the cross-covariance operator C_XY for the Koopman operator in [15]. The form of the symmetric operator for J ≡ 0 is motivated by the symmetry of T and the fact that, at least formally,

⟨T f, g⟩_µ = (1/2) ∫ ∇fᵀ a ∇g dµ + ∫ W f g dµ;

see also [31].

Lemma 2.
Assume that H ⊂ D(T) and that all terms appearing under the integral signs in (8) and (9) (or (10)) are in L¹_µ as bounded operators on H, that is,

∫ ‖φ(x) ⊗ φ(x)‖ dµ(x) < ∞ and ∫ ‖φ(x) ⊗ (T φ)(x)‖ dµ(x) < ∞.

Then, for all f, g ∈ H,

⟨f, C_00 g⟩_H = ⟨f, g⟩_µ and ⟨f, T_H g⟩_H = ⟨f, T g⟩_µ.

The proof can be found in Appendix A. It uses the derivative reproducing properties and the definition of rank-one operators. Note that

(u ⊗ v) w = ⟨v, w⟩_H u for u, v, w ∈ H.

Lemma 3. Assume that, in addition to the conditions of Lemma 2, T g ∈ H for all g ∈ H. Then, T_H = C_00 T.

Proof. The proof is similar to the one for the corresponding result for kernel transfer operators; see [15]. With the previous lemma, we obtain

⟨f, T_H g⟩_H = ⟨f, T g⟩_µ = ⟨f, C_00 T g⟩_H for all f, g ∈ H.

If the assumptions of Lemma 3 are satisfied and the operator C_00 is invertible, the RKHS operators defined above can be used to compute exact eigenfunctions of T. Indeed, if ϕ is a solution of

T_H ϕ = λ C_00 ϕ,

then multiplying this equation by C_00⁻¹ shows that ϕ is also an eigenfunction of T. A typical approach to circumvent the potential nonexistence of the inverse of the covariance operator is to consider a regularized version T_ε = (C_00 + εI)⁻¹ T_H for a regularization parameter ε. The assumptions of Lemma 3 are strong, however, and may be hard to verify in practice. In any case, Lemma 2 shows that the operators defined in Definition 2 provide a Galerkin approximation of the full operator in the RKHS H.

Empirical Estimates
The next step is to derive empirical estimates of the operators defined above. Given training data {x_m}_{m=1}^M sampled from the probability distribution µ, we define

Φ = [φ(x_1), ..., φ(x_M)] and dΦ = [dφ(x_1), ..., dφ(x_M)], with dφ(x) := (T φ)(x).

If T is the generator of an SDE with invariant measure µ, the data can also be obtained by integrating the stochastic dynamics with the initial condition drawn from µ. We see that Φ is the standard feature map and dΦ contains the action of the differential operator. The empirical estimates of the operators C_00 and T_H are then given by

Ĉ_00 = (1/M) ∑_{m=1}^M φ(x_m) ⊗ φ(x_m) and T̂_H = (1/M) ∑_{m=1}^M φ(x_m) ⊗ dφ(x_m).

Note that these are still finite-rank operators on the full RKHS H. For the symmetric RKHS operator T_H, we need to define the empirical estimate in a slightly different way. Decompose the positive definite matrix a(x_m) = σ(x_m)σ(x_m)ᵀ. With

d^l φ(x_m) = σ_l(x_m) · ∇φ(x_m),

where σ_l is the l-th column of σ, the empirical RKHS operator becomes

T̂_H = (1/(2M)) ∑_{m=1}^M ∑_{l=1}^d d^l φ(x_m) ⊗ d^l φ(x_m) + (1/M) ∑_{m=1}^M W(x_m) φ(x_m) ⊗ φ(x_m).

Remark 3.
If the feature space associated with the kernel k is finite-dimensional and known explicitly, i.e., φ(x) = [φ_1(x), ..., φ_N(x)]ᵀ, then for the Koopman generator, we obtain gEDMD as a special case, with Ĉ_00 = G and T̂_H = −A. However, the goal is to rewrite gEDMD in such a way that only kernel evaluations are required since φ can potentially be infinite-dimensional and might only be defined implicitly.

Weak Formulation and Numerical Algorithm
With Lemma 2 in mind, we now proceed to the weak formulation of the eigenvalue problem for the operator T. We define the quadratic forms

Q(f, g) = ⟨T f, g⟩_µ and S(f, g) = ⟨f, g⟩_µ, f, g ∈ D_Q,

where D_Q is the domain of the quadratic form Q. We consider the weak eigenvalue problems: find λ ∈ C and ψ ∈ H such that

Q(ψ, g) = λ S(ψ, g) for all g ∈ H,    (16)

as well as its empirical counterpart

Q_H(ψ, g) = λ S_H(ψ, g) for all g ∈ H,    (17)

where Q_H(f, g) = ⟨T̂_H f, g⟩_H = (1/M) ∑_{m=1}^M (T f)(x_m) g(x_m) and S_H(f, g) = ⟨Ĉ_00 f, g⟩_H = (1/M) ∑_{m=1}^M f(x_m) g(x_m). We will now rewrite (17) in such a way that only kernel evaluations, in the form of Gram matrices, are required. The derivation is similar to the kernel transfer operator counterpart in [15], but we now need to consider derivatives at the training data points instead of the time-lagged variables. We start by restricting (17) to the finite-dimensional space H_M = span{φ(x_m)}_{m=1}^M, which we assume to be M-dimensional. Elements of this space are of the form f = Φ u for some vector u ∈ R^M. We examine the quadratic forms Q_H and S_H on this space.

Lemma 4.
A solution of the problem Q_H(f, g) = λ S_H(f, g) for all g ∈ H_M is given by f = Φ u, where u is a solution of one of the following generalized eigenvalue problems:

(i) In the general case, u solves G_2 u = λ G_0 u, where the entries of the matrices G_2 and G_0 are given by

[G_2]_{mr} = (T k)(x_m, x_r) and [G_0]_{mr} = k(x_m, x_r),

with the operator T applied to the first argument of the kernel.

(ii) Analogously, for the symmetric case, we obtain

(1/2) ∑_{l=1}^d (G_1^l)ᵀ G_1^l u + G_0 D_W G_0 u = λ G_0 G_0 u,

where [G_1^l]_{mr} = σ_l(x_m) · ∇k(x_m, x_r), D_W = diag(W(x_1), ..., W(x_M)), and σ_l(x_m) is the l-th column of the matrix σ(x_m).
The proofs are shown in Appendix A. Since [φ(x_m)](x_r) = k(x_m, x_r), G_0 is the standard Gram matrix. The reversible case requires only first-order derivatives of the kernel. Furthermore, only trajectory data sampled from the invariant distribution µ and estimates of the diffusion term σ are needed. For typical problems, σ is constant and not position-dependent. As a result, the diffusion term needs to be estimated only once or might even be known. For molecular dynamics problems, for instance, it is proportional to the square root of the temperature. The overall approach is summarized in the following algorithm. Note that it is not a direct kernelization of gEDMD, but an extension that approximates the Koopman generator as a special case.

Algorithm 1. The final numerical algorithm can be summarized as follows:

1. Choose a kernel k and compute all its required derivatives, either analytically or with the aid of automatic differentiation.

2. Assemble the Gram matrices G_2 and G_0 or, if the system is symmetric, G_1^l, for l = 1, ..., d, and G_0.

3. Solve the corresponding eigenvalue problem described in Lemma 4 to obtain an eigenvector u.
The two main steps of the algorithm are assembling the Gram matrices and solving the generalized eigenvalue problem. Since the size of the eigenvalue problem depends on the number of data points, the cost is cubic in M. This is a drawback of many kernel-based methods. The efficient approximation of solutions to this eigenvalue problem for large datasets will be considered in future work.
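A compact sketch of Algorithm 1 for the reversible case (Lemma 4 (ii) with W ≡ 0) and a Gaussian kernel might look as follows. The function name and the regularization parameter reg are our additions; the latter merely stabilizes the typically ill-conditioned Gram matrices and is not part of the algorithm as stated.

```python
import numpy as np
from scipy.linalg import eigh

def kernel_gedmd_reversible(X, sigma_fn, s, reg=1e-8):
    """Kernel gEDMD for reversible dynamics; X is (d, M) with columns sampled
    from the invariant distribution, sigma_fn(x) returns the diffusion matrix
    sigma(x), and s is the Gaussian bandwidth. Suitable for moderate M."""
    d, M = X.shape
    diff = X[:, :, None] - X[:, None, :]                # diff[:, m, r] = x_m - x_r
    G0 = np.exp(-np.sum(diff**2, axis=0) / (2 * s**2))  # Gram matrix [G0]_{mr}
    grad1 = -diff / s**2 * G0                           # D^{e_i} k(x_m, x_r)
    A = np.zeros((M, M))                                # (1/2) sum_l (G1^l)^T G1^l
    for m in range(M):
        rows = sigma_fn(X[:, m]).T @ grad1[:, m, :]     # row l equals [G1^l]_{m, :}
        A += 0.5 * rows.T @ rows                        # accumulates over m and l
    B = G0 @ G0 + reg * np.eye(M)                       # right-hand side G0 G0
    evals, U = eigh(A, B)                               # symmetric generalized problem
    return evals, U
```

The eigenvalues closest to zero correspond to the dominant (slowest) processes, and the associated eigenfunctions are evaluated as f(x) = ∑_r u_r k(x_r, x).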

Analysis
In this section, we provide some preliminary analysis of the methods introduced above. The first result concerns the convergence of the empirical estimates.

Lemma 5.
As M → ∞, the empirical estimates defined in Section 3.2 converge to the corresponding RKHS operators in Definition 2 with respect to the operator norm for almost all data sequences {x_m}_{m=1}^M, if the data were generated either as i.i.d. samples from µ or by integrating a stochastic dynamics that is ergodic with respect to µ.
Proof. The statement follows from ergodicity of the underlying dynamics, the integrability conditions in Lemma 2, and the Birkhoff individual ergodic theorem for Banach space valued functions [32].
Lemma 6. (i) The operators C_00 and T_H, as well as their empirical estimates, are Hilbert-Schmidt operators.

(ii) Let δ ∈ (0, 1]. Assume the coefficients of the operator T are all globally bounded, and let sup_{x∈X} D^α k(x, x) < ∞ for all |α| ≤ 4 (|α| ≤ 2 in the symmetric case). If the data are drawn i.i.d. from the distribution µ, then there are constants κ_0, κ_1 such that, with probability at least 1 − δ,

‖Ĉ_00 − C_00‖_HS ≤ κ_0 √(log(2/δ)/M) and ‖T̂_H − T_H‖_HS ≤ κ_1 √(log(2/δ)/M),

where ‖·‖_HS is the Hilbert-Schmidt norm.
Proof. (i) The empirical estimates are all finite rank and, therefore, Hilbert-Schmidt. For C 00 and T H , this follows from the integrability conditions and the first part of the proof of Lemma 2; see Appendix A.
(ii) For C_00, the bound was already proven in [33], with κ_0 determined by sup_{x∈X} k(x, x). We can employ the same strategy to obtain the bound for T_H. Consider the operator T̂_m = φ(x_m) ⊗ dφ(x_m), so that T̂_H = (1/M) ∑_{m=1}^M T̂_m. By global boundedness of the coefficients of T and by

‖φ(x) ⊗ dφ(x)‖_HS = ‖φ(x)‖_H ‖dφ(x)‖_H,

we can find a κ_1 such that ‖φ(x) ⊗ dφ(x)‖_HS ≤ κ_1 for all x ∈ X. We then have ‖T̂_m − T_H‖_HS ≤ 2κ_1, and the result follows from the concentration bound ([33], Equation (3)).
Finally, we show that solutions of (16) are also eigenfunctions of the full operator T if the RKHS is dense in D_Q:

Proposition 1. Let H be dense in D_Q with respect to the norm in L²_µ. If ψ ∈ H is an eigenfunction of (16), it is also an eigenfunction of T with the same eigenvalue.
Proof. Let ψ solve the variational problem (16). The definition of the operators C_00 and T_H implies that for all g ∈ H,

Q(ψ, g) = λ S(ψ, g).

By the density of the RKHS, this also holds for all g ∈ D_Q, and consequently, ψ is an eigenfunction of T.
Note that even if the RKHS is dense, there might be additional eigenfunctions that are not contained in H and that will not appear as solutions of (16).

Applications
The methods described above have important applications in molecular dynamics and quantum physics, which we will demonstrate with a set of representative examples; in principle, however, they can be applied to data generated by arbitrary dynamical systems and also to other differential operators. The code and selected examples are available online [34]. Note that this is just a proof-of-concept implementation and that the methods could be sped up significantly by vectorizing and parallelizing the code and by tailoring the implementation to specific kernels.

Molecular Dynamics
Eigenvalues and eigenfunctions of transfer operators associated with molecular dynamics problems are often used to understand protein folding or binding/unbinding processes and their implied time scales. Conformations correspond to metastable sets and transitions between different conformations to crossing energy barriers. The slowest dynamical processes are encoded in eigenfunctions whose eigenvalues are close to zero. Large-scale molecular dynamics examples, analyzed using kernel EDMD, can also be found in [35]. In this paper, we want to focus more on new applications.

Example 2.
Let us consider the simple quadruple-well problem whose potential V is visualized in Figure 1a; see also [13]. We first generate an equilibrated trajectory so that the training dataset of size M = 5000 is sampled from the invariant distribution and then apply kernel gEDMD for reversible processes, choosing a Gaussian kernel with bandwidth ς = 0.5. The operator −L has four dominant eigenvalues λ_0 = 0.009, λ_1 = 0.400, λ_2 = 1.011, and λ_3 = 1.55, followed by a spectral gap. We then apply SEBA (sparse eigenbasis approximation; see [36]) to cluster the dominant eigenfunctions into four metastable sets. The results are shown in Figure 1b. As expected, the sets correspond to the wells of the potential. The computation and clustering of the eigenfunctions took approximately four minutes on a standard laptop (8 cores, 1.80 GHz, 16 GB of RAM). For comparison, we estimated the generator eigenvalues using a Markov state model. Applying both methods to 20 different trajectories, we computed the average of the eigenvalues and the standard deviation; see Figure 1c. The results are in excellent agreement, and, as expected, the standard deviation increases for higher eigenvalues.

Quantum Mechanics
The goal now is to apply data-driven methods to simple quantum mechanics problems of the form (7), with H = −(ħ²/2m) Δ + W.

Generator EDMD for the Schrödinger Equation
Let us consider two systems for which the eigenfunctions are well known.

Example 3.
For the quantum harmonic oscillator with angular frequency ω, the potential can be written as W(x) = (1/2) m ω² x². The eigenfunctions ψ_ℓ and corresponding energy levels E_ℓ of this system can be computed analytically, and we obtain

ψ_ℓ(x) = (1/√(2^ℓ ℓ!)) (mω/(πħ))^{1/4} exp(−mωx²/(2ħ)) H_ℓ(√(mω/ħ) x) and E_ℓ = ħω(ℓ + 1/2),

for ℓ = 0, 1, 2, .... Here, H_ℓ denotes the ℓ-th physicists' Hermite polynomial. For the numerical experiments, we set ħ = m = ω = 1. Furthermore, the bandwidth of the kernel is set to ς = 1. Computing the Gram matrices G_2 and G_0 for 100 uniformly distributed points in [−5, 5] and solving the corresponding eigenvalue problem results in the eigenfunctions shown in Figure 2. The probability densities p_ℓ are defined by p_ℓ(x) = |ψ_ℓ(x)|².
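The following self-contained sketch reproduces this type of computation under the stated conventions (ħ = m = ω = 1, Gaussian kernel with ς = 1); the small regularization of G_0 is our addition to cope with the ill-conditioned Gram matrix, and the numerical accuracy depends on it.

```python
import numpy as np
from scipy.linalg import eig

# Kernel-based eigensolver for H = -1/2 d^2/dx^2 + x^2/2.
s = 1.0                                      # Gaussian bandwidth
x = np.linspace(-5, 5, 100)                  # uniformly distributed points
diff = x[:, None] - x[None, :]
G0 = np.exp(-diff**2 / (2 * s**2))           # [G0]_{mr} = k(x_m, x_r)
d2k = (diff**2 / s**4 - 1 / s**2) * G0       # second derivative w.r.t. the first argument
G2 = -0.5 * d2k + 0.5 * x[:, None]**2 * G0   # [G2]_{mr} = (H k)(x_m, x_r), W(x) = x^2/2
evals = eig(G2, G0 + 1e-10 * np.eye(len(x)), right=False)
print(np.sort(evals.real)[:4])               # should approximate 0.5, 1.5, 2.5, 3.5
```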
Example 4. For the hydrogen atom, the potential is the Coulomb potential

W(x) = −e²/(4πε₀ ‖x‖).

Here, e is the electron charge and ε₀ the vacuum permittivity. Note that the parameter m in front of the Laplacian is the reduced mass of the system. As before, we define the physical constants to be one and use the Gaussian kernel, now with bandwidth ς = 2. We then generate 5000 uniformly distributed test points in the ball with radius 20 and compute the Gram matrices G_2 and G_0. Solving the resulting eigenvalue problem, we obtain the eigenfunctions shown in Figure 3. As expected, there are several repeated eigenvalues (up to small perturbations due to the randomly sampled test points and numerical errors) for the higher energy states.

SDE Formulation of the Schrödinger Equation
In order to derive gEDMD, we went from the stochastic differential equation to the Kolmogorov backward equation, which is governed by the generator of the Koopman operator, or the Fokker-Planck equation, which is governed by its adjoint, the generator of the Perron-Frobenius operator. Exploiting the resemblance between these two equations and the Schrödinger equation, we illustrated how data-driven methods can, in the same way, be used to compute wavefunctions. We now want to go in the opposite direction and find a stochastic differential equation whose generator's eigenfunctions correspond to the wavefunctions. Formal similarities between quantum mechanics and the theory of stochastic processes have been investigated since the beginning of quantum mechanics by Schrödinger and others (see, for example, [37] and the references therein). The necessary transformations were already introduced in Section 2.3; we now want to exploit these relationships. Let us consider the two aforementioned examples again.
Example 5. Using Corollary 1, the quantum harmonic oscillator can be transformed into an Ornstein-Uhlenbeck process

dX_t = −α X_t dt + √(2β⁻¹) dB_t,

with friction coefficient α = ħω and temperature β⁻¹ = ħ²/(2m). Since the eigenvalues of the Ornstein-Uhlenbeck process are λ_ℓ = −αℓ = −ħωℓ, the resulting eigenvalues of the quantum harmonic oscillator are E_ℓ = E_0 − λ_ℓ = ħω(ℓ + 1/2). Correspondingly, the (unnormalized) eigenfunctions of the Ornstein-Uhlenbeck process are ϕ_ℓ(x) = H_ℓ(√(αβ) x), where H_ℓ is here the ℓ-th probabilists' Hermite polynomial. Thus,

ψ_ℓ(x) ∝ e^{−η(x)} ϕ_ℓ(x) ∝ exp(−mωx²/(2ħ)) H_ℓ(√(mω/ħ) x),

which is consistent with the results obtained above. In the last step, we transformed the probabilists' Hermite polynomials into the physicists' Hermite polynomials.
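The transformation between the two Hermite families can be checked numerically; the following snippet (with ħ = m = ω = 1, so α = 1 and β = 2) verifies that e^{−η(x)} ϕ_ℓ(x) and the physicists' Hermite eigenfunction agree up to a constant factor.

```python
import numpy as np
from numpy.polynomial import hermite, hermite_e

l = 3
alpha, beta = 1.0, 2.0                   # alpha = h*omega = 1, beta^{-1} = h^2/(2m) = 1/2
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])  # sample points away from the polynomial roots
coef = [0] * l + [1]                     # coefficients selecting the degree-l polynomial
ou = np.exp(-x**2 / 2) * hermite_e.hermeval(np.sqrt(alpha * beta) * x, coef)
qho = np.exp(-x**2 / 2) * hermite.hermval(x, coef)
print(ou / qho)                          # constant ratio (here 2^{-3/2})
```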

Example 6.
Similarly, for the hydrogen atom, whose ground state is given by

ψ_0(x) ∝ exp(−‖x‖/a_0),

where a_0 = 4πε₀ħ²/(me²) is the Bohr radius, we obtain (up to an additive constant)

V(x) = (ħ²/(m a_0)) ‖x‖,

and thus the stochastic differential equation

dX_t = −(ħ²/(m a_0)) (X_t/‖X_t‖) dt + √(2β⁻¹) dB_t.

There are now two options to compute the eigenfunctions numerically: we can either directly apply kernel gEDMD to the Koopman generator or generate time-series data by integrating the stochastic differential equation and then apply kernel EDMD or simply Ulam's method. We proceed with the former, but the latter leads to comparable results (although typically, more data points are required to achieve the same accuracy due to the stochasticity). We again generate uniformly distributed test points x_m in the ball with radius 20, this time M = 10,000, and use the Gaussian kernel with bandwidth ς = 2. This results in the same eigenfunctions as the ones shown in Figure 3. Due to the larger number of test points, even higher energy states can be well approximated. Two additional eigenfunctions are shown in Figure 4. The examples illustrate that instead of solving partial differential equations, we can also compute eigenfunctions by approximating the Koopman operator from time-series data. The question under which conditions a non-degenerate strictly positive ground state exists needs to be addressed separately. One important theorem can be found in [38]:

Theorem 2 ([38]). Let L²_loc(X) be the space of locally square-integrable functions and let W ∈ L²_loc(X) be positive. Suppose lim_{‖x‖→∞} W(x) = ∞; then, −Δ + W has a non-degenerate strictly positive ground state.
There are other results concerning the existence of such states; see [38] for details. Furthermore, diffusion Monte Carlo methods, which simultaneously compute the ground state energy and wavefunction, rely on similar assumptions [39]. However, in many cases of interest, the ground state of fermionic systems will have nodes so that these methods are not applicable [39]. The work presented here aims mainly at linking different operators describing the evolution of dynamical systems; more detailed relationships-in particular with the aforementioned diffusion Monte Carlo methods-and practical implications will be studied in future work.

Manifold Learning
So far, we assumed that the data were generated by a dynamical system. There is, however, a second scenario without any notion of time, where the Kolmogorov backward equation and Fokker-Planck equation are used for dimensionality reduction and manifold learning [21]; see also [20,22,23] and the references therein.
Let the data points {x_m}_{m=1}^M be sampled from an arbitrary probability density ρ; then, we can define the associated potential by

U(x) = −log ρ(x).

It was shown in [21] that, depending on a normalization parameter α, anisotropic diffusion maps approximate operators of the form

L_α f = Δf − 2(1 − α) ∇U · ∇f.

That is, for α = 1/2, we obtain the standard Kolmogorov backward equation with β = 1. Thus, the algorithms described above could also potentially be used for manifold learning purposes. We will illustrate this with a simple example.

Example 7.
We consider the well-known Swiss roll; see, for instance, [23]. The goal is to parametrize the two-dimensional manifold. We use kernel density estimation, cf. [40], with a Gaussian kernel of bandwidth ς = 0.22 to learn the potential U(x) = −log ρ̂(x), where ρ̂ denotes the estimated density, and approximate the Kolmogorov backward operator by applying kernel gEDMD. Here, M = 2000 and d = 2.
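A minimal sketch of the density-based potential used here: U is obtained from a Gaussian kernel density estimate, and its gradient provides the drift of the associated drift-diffusion process (the helper name and closures are our own choices).

```python
import numpy as np

def kde_potential(X, s):
    """Return U(x) = -log(rho_hat(x)) and its gradient, where
    rho_hat(x) = (1/M) sum_m k(x, x_m) is a Gaussian kernel density
    estimate from the columns of X (normalization constants omitted,
    since only the gradient of U enters the drift)."""
    M = X.shape[1]
    def U(x):
        w = np.exp(-np.sum((X - x[:, None])**2, axis=0) / (2 * s**2))
        return -np.log(w.sum() / M)
    def grad_U(x):
        w = np.exp(-np.sum((X - x[:, None])**2, axis=0) / (2 * s**2))
        return (x[:, None] - X) @ w / (s**2 * w.sum())
    return U, grad_U
```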
The results are shown in Figure 5. The first eigenfunction parametrizes the angular direction, followed by higher-order modes, and only the sixth eigenfunction corresponds to the x_3 direction. Considering these eigenfunctions as new coordinates, we obtain an unfolding of the roll. Note that diffusion maps also do not yield perfect rectangles in the embedded space due to the non-uniform density of points on the manifold [23]. These results demonstrate that the eigenfunctions of certain differential operators capture geometrical properties of the data. However, the assumption that a strictly positive density in the ambient space exists will in general not be satisfied if the data are supported only on a lower-dimensional manifold. This problem was circumvented here by using kernel density estimation and a kernel with global support. Carrying over the definition of the differential operators involved, and of their kernel-based analogues, to the manifold case is beyond the scope of this paper and will be studied in future work. The same applies to the investigation of detailed relationships with diffusion maps or other manifold learning techniques. Concepts like neighborhood and sparsity will then need to be carried over to gEDMD to make this method amenable to large datasets. Furthermore, heuristics to find the optimal bandwidth ς are required since the results often strongly depend on the kernel hyperparameters.

Conclusions
Using the theory of derivative reproducing kernel Hilbert spaces, we derived a kernel-based formulation of gEDMD for approximating the Koopman generator, which allowed for the computation of eigenfunctions of potentially high-dimensional stochastic dynamical systems. If the system is reversible, the generator can be approximated from equilibrated time-series data, without having to estimate the drift and diffusion terms at the training data points. Furthermore, we showed that data-driven methods developed for the analysis of stochastic dynamical systems (kernel EDMD) can be carried over to their generators (kernel gEDMD) and, in turn, to the Schrödinger operator. Conversely, under certain assumptions on the ground state, the Schrödinger equation can be turned into a Kolmogorov backward equation corresponding to a drift-diffusion process. These results are summarized in Figure 6. Similar transformations also exist for the Fokker-Planck operator; see [24]. All derived approaches were illustrated with numerical results ranging from molecular dynamics to quantum mechanics.
Figure 6. Relationships between the Koopman, Kolmogorov, and Schrödinger operators for a drift-diffusion process of the form dX_t = −∇V(X_t) dt + √(2β⁻¹) dB_t. Here, ρ_0 denotes the invariant density, i.e., L* ρ_0 = 0. In our setting, the transformation of the Schrödinger operator requires a strictly positive real-valued ground state ψ_0.
Although we focused mainly on the Kolmogorov backward equation, the Fokker-Planck equation, and the Schrödinger equation, these methods can be applied to approximate other differential operators as well. An interesting open question is whether such algorithms can also be used for manifold learning. Some preliminary results were presented in Section 4, but a rigorous mathematical justification would require significant additional research. Analyzing connections with diffusion maps [20] or generalizations thereof in detail could be a potential direction for future work.
Another interesting avenue for future research could be to improve the efficiency and stability of the presented algorithms. Exploiting the properties of the given kernels, it might be possible to speed up computations significantly. Defining a cutoff radius for the kernel or considering only a certain number of neighbors of each data point, for instance, would, for suitable problems, result in sparse matrices. Moreover, the results sensitively depend on hyperparameters such as the bandwidth of the Gaussian kernel. If the bandwidth is too small, this leads to overfitting and noisy eigenfunctions. If it is, on the other hand, too large, then the kernel is not able to capture the properties of the dynamical system accurately anymore. As a result, the Gram matrix G_0 essentially has (numerically) a low-rank structure, and we obtain many zero eigenvalues. The question is then how to compute the smallest nonzero eigenvalues and corresponding eigenvectors efficiently.
Potential solutions for the hyperparameter tuning problem are techniques based on cross-validation [41] or so-called kernel flows [42]. By defining an optimization problem for the parameters of the kernel, e.g., based on a variational principle [43], gradient descent methods can help find suitable parameter values.

Appendix A. Proofs
Proof of Lemma 1. The unitary transformation is H = −e^{−F/2} L (e^{F/2} ·). For the anti-symmetric part, we obtain

−e^{−F/2} J · ∇(e^{F/2} f) = −J · ∇f − (1/2) (J · ∇F) f,

which establishes the first-order term in H and the third term in the definition of W. For the symmetric part, we find that

−(1/2) e^{−F/2} e^{F} ∇·(e^{−F} a ∇(e^{F/2} f)) = −(1/2) ∇·(a ∇f) + ((1/8) ∇Fᵀ a ∇F − (1/4) ∇·(a ∇F)) f,

which yields the second-order term of H and the first two terms of W.

Proof of Lemma 2.
We only show the proof for T_H. Similar to the argument in [44], T_H is a bounded linear operator on H because of

‖T_H‖ ≤ ∫ ‖φ(x) ⊗ (T φ)(x)‖ dµ(x) < ∞,

where (T φ)(x) consists of terms of the form b_i(x) D^{e_i} φ(x) and (1/2) a_ij(x) D^{e_i+e_j} φ(x), which are integrable by assumption. Using the derivative reproducing property, we obtain

⟨f, T_H g⟩_H = ∫ ⟨f, φ(x)⟩_H ⟨(T φ)(x), g⟩_H dµ(x) = ∫ f(x) (T g)(x) dµ(x) = ⟨f, T g⟩_µ.