Spectral Properties of Effective Dynamics from Conditional Expectations

The reduction of high-dimensional systems to effective models on a smaller set of variables is an essential task in many areas of science. For stochastic dynamics governed by diffusion processes, a general procedure to find effective equations is the conditioning approach. In this paper, we are interested in the spectrum of the generator of the resulting effective dynamics, and how it compares to the spectrum of the full generator. We prove a new relative error bound in terms of the eigenfunction approximation error for reversible systems. We also present numerical examples indicating that, if Kramers–Moyal (KM) type approximations are used to compute the spectrum of the reduced generator, it seems largely insensitive to the time window used for the KM estimators. We analyze the implications of these observations for systems driven by underdamped Langevin dynamics, and show how meaningful effective dynamics can be defined in this setting.


Introduction
The description of high-dimensional systems by a reduced set of variables, usually referred to as coarse graining, is of tremendous importance across many different fields of research.Examples range from finance to climate modeling to molecular biology.Two of the central reasons for the importance of coarse graining are that, firstly, analysis or numerical simulation of high-dimensional systems is often challenging or simply infeasible, and secondly, not all detailed features of the full system are needed in order to answer questions of scientific interest.Typical challenges arising along the search for a coarse grained model are 1) which properties of the high-dimensional system are important, and which ones are not, 2) what constitutes a good set of reduced variables (also called collective variables or reaction coordinates in some contexts), and 3) what is a suitable equation to define the low-dimensional description of the system, often called an effective dynamics.
In this paper, we will study the coarse graining problem for systems driven by a diffusion process, and adopt a perspective motivated by molecular dynamics simulations of biological systems.In molecular simulation, equilibrium properties and slow dynamical properties are typically identified as relevant, while all fast oscillatory or vibrational motions are less important to characterize the long-time behavior of the system.We will approach the second and third question raised above from this perspective.
For molecular systems, two types of coarse graining strategies can be broadly distinguished.The first is coarse graining in structural space, where the physical representation of a system is simplified, e.g. by unifying several atoms into so-called beads or pseudo-atoms, see e.g.[9,31,40].The second approach is a coarse graining in configurational space.Here, the full system is projected to a smaller set of variables by a transformation of its state space.Important contributions along these lines are the Mori-Zwanzig formalism [26,48,7,8,18] as well as the framework of averaging and homogenization for systems with explicit multiscale structure [33].For further important contributions to this line of research in the field of molecular simulation please see [37,39,29] and the references therein.We proceed along the latter direction and concentrate on the effective dynamics using conditional expectations introduced in [22] and discussed in detail in Refs.[47,23].In particular, we focus on the preservation of slow dynamical components by the effective dynamics.
The contribution of this paper is as follows: first, we discuss the error between dominant eigenvalues of the full and the effective dynamics for reversible diffusions.We provide a new bound for the relative approximation error of the dominant eigenvalues in terms of the H 1 µ -approximation error for the corresponding eigenfunctions.The relative error is a more practical error measure for small eigenvalues than the absolute error considered in [47].Also, the new bound is less restrictive than the one obtained therein.Second, we discuss the applicability of these results to Langevin dynamics if their overdamped limit is exploited, which requires the use of a large offset when estimating the parameters of the effective dynamics by Kramers-Moyal type estimators.Third, we present numerical examples showing that essential long-term dynamical properties appear to be preserved well by effective dynamics estimated using a large offset.This observation makes the second point particularly useful.Although a theoretical proof of these findings is still lacking, as a potential first step, we provide expressions for the parameters of the effective dynamics at large offset within metastable sets.
The rest of this paper is organized as follows: in section 2, we recap what is needed of the theory of diffusion processes, their generators and transfer operators.Section 3 contains a description of the effective dynamics, our approximation result, the discussion of Langevin dynamics and the expressions for large offsets.Numerical results for two toy problems and two benchmark datasets of molecular dynamics simulation are provided in section 4. Conclusions and outlook follow in section 5.

Setting
In this paper, we consider a continuous-time and -space Markov process X t attaining values in R d .The process is governed by the stochastic differential equation Here, B t denotes d-dimensional Brownian motion, the function b : R d → R d is called the drift, β is the inverse temperature, and σ : R d → R d×d is called the diffusion.We use the notation for the covariance matrix of the diffusion.A standard example for dynamics of type Eq. ( 1) are Langevin dynamics of a system with N position variables q t ∈ R N and momentum variables p t ∈ R N , such that d = 2N .The constant γ in Eq. ( 4) represents friction, while the function V : R N → R is the potential energy on position space.A second and related example is the overdamped Langevin process which is defined on position space only, i.e., d = N .
We assume that drift and diffusion are smooth functions and that a is bounded and uniformly elliptic, i.e., for all vectors v ∈ R d , and η 1 , η 2 > 0 are independent of x.It follows that X t is ergodic with respect to a unique invariant measure µ.Note that the Langevin process Eqs.(3)(4) does not satisfy Eq. ( 6), but ergodicity can still be shown using different techniques [34,24]. 1 We assume that the invariant measure possesses a density function with respect to Lebesgue measure, likewise denoted by µ.We will frequently use the Hilbert space L2 µ of square integrable functions with respect to the measure µ, with scalar product Also, we use the symbols H k µ for the Sobolev spaces [16] of functions possessing weak derivatives up to order k in L 2 µ .
The system dynamics are encoded in the semigroup of transfer operators Tτ , τ ≥ 0, defined for f ∈ L 2 µ by where E x [•] denotes expectation given that the dynamics starts deterministically at x. Transfer operators describe the evolution of expectation values as the dynamics progress over a time window τ , typically called the lag time.We denote the infinitesimal generator of this semigroup by L, and its domain by D(L) ⊃ H 2 µ .For twice continuously differentiable functions f ∈ L 2 µ , the generator acts as a differential operator: where ∇ 2 f is the matrix of second derivatives of the function f and : denotes the Frobenius inner product between matrices.A diffusion process is called reversible if the generator is self-adjoint on L 2 µ .In this case, L satisfies the equality for all f, g ∈ D(L).We will assume reversibility from now on.

Spectral Properties
With the assumption (6) the generator L has compact resolvents 2 and thus possesses a complete set of eigenfunctions corresponding to isolated eigenvalues.That is, there are functions ψ 1 , ψ 2 , . . .and non-negative numbers 0 = κ 1 < κ 2 ≤ . . .such that and every function f ∈ L 2 µ can be represented by the expansion By the spectral mapping theorem [35, Ch 2, Thm 2.4.], the eigenfunctions ψ i are also eigenfunctions of the transfer operators Tτ for all τ , corresponding to eigenvalues Due to the exponential decay of all λ i (τ ), it is common to refer to the κ i as rates, and to their reciprocals as implied timescales The action of Tτ on a function f ∈ L 2 µ can be decomposed as In many applications, including molecular dynamics, we expect to find a spectral gap, that is a number M of dominant rates 0 = κ 1 < κ 2 < κ M κ M +1 separated from all others.In this case, the expansion (15) consists of only M slowly decaying terms if t t M +1 .The existence of dominant spectral components is directly related to metastability, that is, the existence of long-lived macrostates such that transitions between those states are rare events [10,12,13].

Markov State Models and Variational Approach
Many numerical methods for the analysis of metastable behavior have focussed on modeling the dominant eigenfunctions ψ 1 , . . ., ψ M by linear combinations of a finite set of basis functions ϕ i , i = 1, . . ., N .Applying either a Galerkin projection to the transfer operator Tτ [12,42] or invoking the Rayleigh-Ritz variational principle [19,30] leads to the generalized eigenvalue problem The top M generalized eigenvectors provide expansion coefficients for approximate eigenfunctions Notably, the matrix elements in Eqs.(17)(18) can be approximated by ergodic averages from long simulations.We refer to this approximation technique as the variational approach to conformational dynamics (VAC).If the basis functions are chosen as indicator functions corresponding to a decomposition of state space into non-overlapping sets S i , i = 1, . . ., N , Eq. ( 16) is also referred to as Ulam's method or a Markov state model (MSM).In this case, the matrix T = (C 0 ) −1 C τ , which possesses the same eigenvalues as the solutions of Eq. ( 16), can be interpreted as a stochastic transition matrix between the sets S i .This allows to make use of numerous tools available to analyze discrete time Markov chains.Please see Refs [42,37,43,4] on the use of MSMs in biomolecular simulation, and the recent reviews [29,20] on the VAC and its relation to other methods and fields.In the numerical examples (sec.4), the VAC and MSMs are the primary tools we use to extract dominant spectral components from simulation data.

Projected Dynamics
If the state space dimension d is large, simulation of the dynamics Eq. (1) up to statistically relevant time horizons can be prohibitively costly.It is therefore desirable to perform a dimensionality reduction by projecting the system onto a much smaller number m of variables, and replace the full dynamics by a suitable effective dynamics that only depends on the selected degrees of freedom.Many different techniques to accomplish this have been developed.Here, we focus on the projection formalism introduced in Ref. [22] and discussed in detail in Ref. [46].

Projection Operator
Dimensionality reduction is realized by a function ξ which maps the state space R d onto R m , where m ≤ d.We denote the components of the map ξ by ξ l , l = 1, . . ., m, and assume each component to be smooth, such that ξ l ∈ D(L) for all l.Also, we assume that the image of ξ can be mapped to all of R m by a diffeomorphism.Without loss of generality, we can then take the image of ξ to be all of R m .The key ingredient to the definition of an effective dynamics that only depends on the lower-dimensional space R m , is the projection operator [46] In this definition, z ∈ R m , Σz = x ∈ R d : ξ(x) = z is the pre-image of z under the map ξ, and σz denotes the surface measure on the set Σz.Moreover, J(x) is the Jacobian determinant of the map ξ, and is the ξ-marginal of µ at ξ(x) = z.Thus, the projection operator acts on a function f ∈ L 2 µ by averaging its values over the level sets of the map ξ with respect to the invariant measure.The additional factor J −1/2 and the appearance of the surface measure are due to the co-area formula [17] and account for the non-linear change of variables.The normalization by ν(z) ensures that the projection of a constant function is the constant itself.We introduce the notation for the probability measure on Σz obtained by conditioning the invariant measure µ to ξ(x) = z.We also note that ν is itself a probability density on R m , and hence gives rise to a probability measure which we will also denote by ν.
For each f ∈ L 2 µ , the function Pf depends on the variables z only.In fact, P is an orthogonal projection onto the subspace spanned by functions of this type [2].The subspace H 0 can be identified with the space L 2 ν of square-integrable functions on R m with respect to ν, and the scalar products are identical for all f, g ∈ H 0 [46]: For f ∈ H 0 we identify f and f from now on, and we also write ∇zf (x) for ∇z f ξ(x) .As usual, we denote the projection onto the orthogonal complement of H 0 by P ⊥ = I − P.

Effective Dynamics
In Refs.[22,46], it was suggested to define an effective dynamics Z t on R m , by the following definitions of effective drift and diffusion: As discussed in detail in Ref. [46], these coefficients result from a Galerkin projection of the generator L onto the infinite-dimensional space H 0 = L 2 ν .Consider the operator which is a self-adjoint operator on also acts as a differential operator + 1 where Lξ is understood componentwise and ∇ξ is a d × m-matrix, [∇ξ] il = ∂ξ l ∂xi .Thus, L ξ is the infinitesimal generator of a diffusion process on R m with drift and diffusion given by Eqs.(27)(28).The effective drift and diffusion can also be calculated by the Kramers-Moyal formulae In practice, the expectations in Eqs.(32)(33) must be estimated at a finite value of the parameter s, which we will call the offset from now on.

Preservation of Spectral Properties
We investigate conditions to ensure that the operator L ξ approximately preserves the dominant eigenvalues κ 1 , . . ., κ M , which encode the metastability of the process.We assume that L ξ also possesses a complete set of eigenfunctions ψ ξ i and corresponding eigenvalues −ω i .We follow the approach from previous works [46, 47, 2] and consider the case that the dominant eigenfunctions can be approximately parametrized by the projected variables, that is, we assume that It was already shown in [46] that if Eq. ( 34) holds exactly, then the corresponding eigenvalues are left unchanged by the projection: . Then ψ i is also an eigenfunction of L ξ corresponding to the same eigenvalue −κ i .
In the next result, we show that if not only the eigenfunctions themselves, but also their derivatives can be well approximated by functions in H 0 , a bound on the eigenvalue error for L ξ can be obtained: Proposition 1 If, for each eigenfunction ψ i , i = 1, . . ., M , the error upon projection onto H 0 is small in then the maximal relative error between the dominant eigenvalues of L ξ and L is bounded by max i=2,...,M Proof Note that by definition, the first eigenfunction ψ 1 ≡ 1 is not affected by the projection, therefore we only consider the error for i = 2, . . ., M .Following the approach in Ref. [15], we choose some positive constant α > 0 and introduce the operator A = αId − L. Since L is negative semi-definite by Eqs. ( 6) and ( 10), A is positive definite, and therefore defines a norm on D(L) We consider the finite-dimensional space G spanned by the functions g i := Pψ i , which we assume to be linearly independent without loss of generality.Let Q A denote the A-orthogonal projection onto G, and let ωi denote the approximate eigenvalues generated by the Ritz-Galerkin projection onto G. Theorem 4.3 and Remark 4.2 in Ref. [15] provide us with the following estimate: Examining the projection errors on the right-hand-side, we find by Eq. (10).As ψ j − g j = ψ j − Pψ j = P ⊥ ψ j , we conclude and since α > 0 was arbitrary, we end up with the result As discussed in Ref. [46,Sec. 3.2.3],Galerkin projection of the full generator onto a subspace of H 0 is equivalent to a Galerkin projection of the projected generator L ξ onto the same subspace.From the Rayleigh-Ritz variational principle we obtain ωi ≥ ω i , thus we conclude and the claim follows.
In Ref. [47,Theorem 2], it was shown that the absolute eigenvalue error of the projected generator is small if LP ⊥ ψ i L 2 µ and P ⊥ ψ i L 2 µ are small.Proposition 1 complements these results in the sense that it bounds the relative error of eigenvalues (timescales), which is a more practical error measure for eigenvalues close to zero, i.e., large timescales.Furthermore, the bound (35) is less restrictive than the conditions in [47], as it uses a H 1 µ norm instead of the H 2 µ (or equivalent) norm, as done therein.In fact, the bound assumed in [47, Theorem 2] implies our bound (35) up to to a multiplicative constant.
Proposition 2 Let us consider the diffusion (1), satisfying (6), on a bounded domain with smooth boundary and reflecting boundary conditions.If LP ⊥ ψ i L 2 µ ≤ δ 1 , then there is a C > 0 such that the assumptions of Prop. 1 are satisfied with for some C > 0 independent of u.We have concluding the proof.

Langevin Dynamics
So far, we have discussed spectral properties of the projected dynamics for reversible diffusion processes.A widespread dynamical model, especially in molecular simulations, are the Langevin dynamics, Eqs.(3)(4).These dynamics are, however, not reversible in full state space.Nevertheless, in many molecular dynamics applications, the slow transitions of interest are typically independent of the momentum coordinates, which can be assumed to equilibrate rapidly.It is also well-known that under a re-scaling of time and for large friction γ, the spatial dynamics are close to a reversible overdamped process.More precisely, upon re-scaling time via (q t , p t ) = (q t/ , p t/ ), and choosing = 1 γ , Eqs. (3-4) transform to As it is well-known (see, e.g., [33], [25]), the theory of multiscale asymptotics shows that, as → 0, the re-scaled positional dynamics Eq. ( 52) converge weakly to the overdamped Langevin process Thus, conditional expectations like Eq. ( 8) or Eqs.(32)(33) converge to those estimated from overdamped Langevin dynamics Eq. (54) as → 0.Moreover, we note that Eq. ( 54) is obtained from Eq. ( 5) by the same re-scaling of time, t = t/ = γt.
As re-scaling of time is the same as observing a process at larger time window, we can exploit this finding by selecting a sufficiently large offset s in the Kramers-Moyal formulae Eqs.(32-33) for a reaction coordinate ξ which is independent of the momenta.If both the friction γ and the offset s are sufficiently large, the effective drift and diffusion obtained by the Kramers-Moyal formulae can be expected to approximately agree with those of the overdamped process Eq. ( 5) estimated at the same offset.This enables us to estimate an effective dynamics from Langevin dynamics data for which the approximation results from section 3.3 still apply.
Using a large offset s in the Kramers-Moyal formulae seems to be at odds with the need to use a small value of s in order to remain close to the actual limit s → 0. However, we have found compelling numerical evidence that even at surprisingly large values of s, good approximations to the dominant eigenvalues of the transfer operator Tτ can be obtained from the effective simulations (i.e., from the transfer operator T ξ τ of the effective process).We will discuss these findings in the numerical examples below.A rigorous result on this phenomenon is still missing.In the next section, we take a first step and characterize the expected drift and diffusion for large values of s if z is contained in an idealized metastable set.

Asymptotics for Large Offsets
We use the notation b ξ,s l , a ξ,s lr for the components of effective drift and diffusion estimated at finite s.Assuming a timescale separation as described in section 2.2, there exist M subsets S i ⊂ R d such that the M first eigenfunctions of the generator are nearly constant within the sets S i : [11].Let us assume that there are corresponding subsets Si ⊂ R m such that Σz ⊂ S i for all z ∈ Si , i.e., the level sets of all points in Si are completely contained within the metastable set S i .This is in line with the idea that our choice of reaction coordinates should be able to parametrize the dominant eigenfunctions well.Using these assumptions, we can show the following: Proposition 3 Given the assumptions outlined above, for z ∈ Si and for s t M +1 , the effective drift and diffusion satisfy Proof We express the effective drift at finite s using Ito's formula: where we have used the fact that the stochastic integral is of zero expectation.Next, we apply the identities Inserting the spectral expansion Eq. ( 15) and truncating it after M terms, we find: In the second equation, we used the assumption about constancy of the eigenfunctions along the levelset Σz.
The result for the diffusion can be obtained in a similar way, by re-writing the Kramers-Moyal formula Eq. ( 33) as Now, we follow the previous derivation separately for each of the three terms in the expectation, and use that the pre-factors z l and zr for the last two terms are not affected by the expectation.This results in

Numerical Examples
In all of the following examples, we use the Kramers-Moyal formulae Eqs.(32)(33) to obtain numerical estimates of effective drift and diffusion from simulation data of the original process Eq. ( 1).These expressions suggest to use the following procedure: for each point z ∈ R m , we need to draw a large number of initial conditions from the distribution µz along the level set Σz. Independent simulations need to be started from each of these points and run for a short time s.Averages of the linear and quadratic differences 1 s (ξ(Xs) − z) and 1 s (ξ(Xs) − z) ⊗ (ξ(Xs) − z) over all simulations provide estimates of the effective drift and diffusion at point z.However, as all of the examples below are rather simple systems, we can proceed differently in this work.We start out with a long equilibrium simulation of the full process X t .We discretize the reaction coordinate space into discrete bins B i .For each of these bins, all simulation steps X t such that ξ(X t ) falls into B i are determined.As the data is in global equilibrium, we can then average the linear and quadratic differences ) over all X t ∈ S i to obtain discretized estimates of effective drift and diffusion.These estimates are extended to continuous space by interpolation.This procedure is feasible as long as the reaction coordinate space is low-dimensional and simulation of an equilibrium trajectory is possible, as it is in all of the following examples.If the reaction coordinate space is not low-dimensional, we recommend to obtain functional representations of effective drift and diffusion by means of the data-based regression explained in Ref. [3].Moreover, if equilibrium data cannot be produced, re-weighting methods such as those explained in Refs.[44,45,21] need to be employed on top of that.

Lemon Slice Potential
Our first numerical example is a two-dimensional toy system.We consider overdamped Langevin dynamics Eq. ( 5) in the "lemon slice" potential [2] V (r, ϕ) = cos(7ϕ) + 10(r − 1) where r, ϕ are two-dimensional polar coordinates, at inverse temperature β = 1 and friction γ = 1.The structure of the potential function suggests that its slow dynamics correspond to transitions between the seven minima shown in Fig. 1.It follows that the polar angle ϕ should parametrize the first seven eigenfunctions of the associated generator well, making it an appropriate choice of reaction coordinate: ξ(x, y) = ϕ(x, y). (77) In this case, the effective drift b ξ and diffusion a ξ can be calculated analytically, see appendix A: We generate a long equilibrium simulation of the dynamics Eq. ( 5) using the Euler-Maruyama method at integration time step ∆ t = 10 −3 for a total of 10 7 steps.A Markov state model analysis-as described in section 2.3-of the full data set reveals seven dominant eigenvalues and six corresponding slow timescales t 2 , . . ., t 7 , as expected.As indicated in Fig. 2 C, these slow timescales come in three pairs due to the symmetry of the system.Their numerical values are t 2 ≈ t 3 ≈ 1.70, t 4 ≈ t 5 ≈ 0.49, t 6 ≈ t 7 ≈ 0.29.
By the procedure outlined at the beginning of this section, we estimate the parameters of the effective dynamics using 63 discrete bins along the angular coordinate, each of bin width 0.1 radians.We do so for a range of different offsets s between the integration time step s = 10 −3 up to s = 1.We can observe in Figs. 2 A and B that, as expected, the exact drift and diffusion Eqs.(78-79) are recovered well for small s, while very different results are obtained as s increases.Nevertheless, we run simulations of the effective dynamics using all of these estimates of drift and diffusion, at the same integration time step and total simulation length as in the original dynamics.All of these simulations are analyzed by a Markov state model, where the same bins as above are used as discrete states.As shown in Fig. 2 C, the resulting estimates of the six leading implied timescales barely change as s increases.Only the last two timescales t 6 , t 7 start deviating from the reference as s approaches the magnitude of t 6 , t 7 themselves, which is not surprising.Similarly, we monitor the stationary probabilities of the seven potential minima if estimated from the different effective simulations.The minima are defined as intervals z ∈ [z k − 0.25, z k + 0.25] for z k = −π + 2πk 7 , k = 0, . . ., 6.We compare them to the reference obtained by numerically integrating the marginal ν(z), Eq. ( 21), over those intervals.In Fig. 2 D, we find that these probabilities are preserved by the effective dynamics until s is comparable to t 6 , t 7 .Thus, the essential stationary and dynamical properties of the original dynamics seem to be preserved even if the effective dynamics are estimated at fairly large offset s, which is a surprising result.

Langevin Toy Model
Next, we study Langevin dynamics in another two-dimensional toy potential.The potential is composed of a double-well in x-direction and a harmonic potential in both x and y, the functional form is A contour plot of V is shown in Fig. 3.We set γ = 10, β = 0.4.The slowest transition in this energy landscape is the crossing of the barrier around x = 2, the corresponding implied timescale is t 2 ≈ 75.We project the system onto the x-coordinate only, integrating out all momentum coordinates.We note that the effective dynamics provided by Eqs.(27)(28), in the limit of s → 0, are meaningless in this case.It can be verified that b ξ (x) = a ξ (x) = 0 for all x.In brief terms, the reason is that the dynamics are anti-symmetric in the momentum coordinate p, while µ is symmetric in p, and thus the projection P integrates over an odd function in p; see [41,1].However, upon increasing the offset it is possible to find a sweet spot where s is larger than the momentum relaxation time 1 γ , but smaller than the slowest timescale t 2 .In this regime, the overdamped limit can be exploited to find a meaningful effective dynamics along x.We also note that the effective dynamics would not vanish if the first momentum coordinate was included in the projection.
We simulate Langevin dynamics using the Euler-Maruyama scheme at integration time step ∆ t = 10 −2 for 10 7 steps, and also generate data of the overdamped dynamics in the same potential using the same simulation parameters.Figures 4 A and B show numerical estimates of the effective drift and diffusion obtained by the Kramers-Moyal formulae along 24 discrete bins of bin width 0.2, using both the Langevin and overdamped Langevin simulations.We note that these estimates are very different for s = 0.01 (and indeed, estimates from Langevin data are close to zero), but agree for s = 1.0.Again, we produce long simulations of all the resulting effective dynamics, and analyze these data by a Markov state model defined on the same discrete bins as above, extracting the slowest implied timescale t 2 and the probabilities of the two metastable sets using the PCCA method [14,38].We find that for sufficiently large s, all effective simulations reproduce both t 2 (Fig. 4 C) and the probabilities of the PCCA sets (Fig. 4 D) from the original simulations of the overdamped dynamics.
This example shows that the overdamped limit can be exploited by using a sufficiently large offset s, and again, we find that essential stationary and dynamical properties of the original dynamics are retained by effective simulations estimated at large s.

Alanine Dipeptide
Our next example is a dataset of molecular dynamics simulations of alanine dipeptide.This small molecule consists of only a single amino acid with two capping groups.It has been used as a test system for numerous methods and algorithms in the past years.Its slow dynamics can be described by the backbone dihedral angles φ, ψ.The projected free energy in φ-ψ-space decomposes into three minima, and the highest barrier to be crossed is aligned with the φ-direction, as shown in Fig. 5 A. The corresponding implied timescale is t 2 ≈ 1.5 ns [32].Therefore, we choose ξ(x) = φ(x) as one-dimensional reaction coordinate in this example, lumping the two states on the left into a single metastable state.To avoid numerical difficulties caused by unsampled regions of reaction coordinate space, we make use of the periodicity of the dihedral angles and shift the interval 3 4 π, π to − 5 4 π, −π in what follows.The range of ξ thus becomes ξ(x) ∈ − 5 4 π, 3 4 π .The system was simulated in explicit water using the Amber99 force field and velocity re-scaling thermostat, see Ref. [32] for a detailed simulation setup.The data consists of 20 long simulations of 200 ns simulation time each, resulting in 4 µs total simulation time.Each simulation is long enough to sample the global equilibrium distribution.The velocity re-scaling thermostat has been shown to be closely related to Langevin dynamics [5].It also involves a smallness parameter (called τ in [5]), which is related to the friction via τ = (2γ) −1 .Its numerical value is τ = 0.01 ps in this data set.We expect to find similar results as in the case of Langevin dynamics.A Markov state model based on 500 discrete bins in φ-ψ-space serves as reference model.
The effective drift and diffusion are estimated for 62 discrete bins along the φ-coordinate.The bin width is 0.1, and offsets ranging from s = 1 ps up to s = 500 ps are used.Instead of showing all of these parameters here, we provide an illustration of the local approximations Eqs.(55-57) within the two metastable sets along φ at a large offset (s = 100 ps).We find in Fig. 5 B that they are in good agreement with the Kramers-Moyal estimates.Just as in the previous examples, we use all of the estimated parameters and produce simulations of the effective dynamics in φ by means of the Euler scheme.The same integration time step is employed as in the original MD simulations, and each of those trajectories covers 200 ns of total simulation time.Figure 5 C shows the slowest implied timescale estimated by Markov state models of the effective simulations, based on the same discrete bins used above, as a function of s.It can be observed that the slowest timescale is massively underestimated at small s, but estimates improve with increasing s and the correct regime is reached for s ≥ 100 ps.Furthermore, we use PCCA to extract two dominant metastable states from the effective simulations.We calculate their stationary probabilities and compare them to those of the dominant metastable states in the original simulation.The results in Fig. 5 D confirm that at sufficiently large offset, these stationary probabilities agree.τ , which is where the overdamped regime is expected to be attained.Errorbars in panels C and D were obtained by bootstrapping on the effective simulations.

Deca Alanine
We conclude by studying a set of more complex molecular simulation data.The deca alanine peptide has also been used as a test system in multiple previous studies.Here, we use a data set of six independent long simulations of 500 ns each, using the Amber03 force field and velocity re-scaling thermostat with time constant τ = 0.01 ps, see Ref. [32] for the complete simulation setup.It was also shown in Ref. [32] that the slowest dynamical process is formation and deformation of a helix, it occurs at an implied timescale t 2 ≈ 7.6 ns, followed by the next slowest process at t 3 ≈ 3.9 ns.
This time, we select a two-dimensional projection into the space of two slowest approximate eigenfunctions, that is ξ(x) = ψ2 (x), ψ3 (x) ∈ R 2 , where ψ2 ≈ ψ 2 , ψ3 ≈ ψ 3 .These approximations are obtained by the following procedure: we apply TICA [36] to the the time series of 16 backbone dihedral angles in the peptide.According to the criterion of kinetic variance [28,27], the first seven TICs are retained, and the k-means algorithm is used to discretize this space into 500 states.The k-means clustercenters are used as centers for a basis of 500 isotropic multivariate Gaussian functions with covariance matrix equal to 0.5 times the identity.As described in section 2.3, the optimal linear approximations ψ2 , ψ3 to the first two eigenfunctions of Tτ projected on this basis are computed by the VAC, completing the definition of ξ.The upper part of Fig. 6 A shows the free energy − 1 β log(ν(z)) in this space, where three major minima can be identified.Based on 100 k-means clusters in the seven-dimensional TICA space mentioned above, we build a Markov state model which serves as reference.
We use 784 bins of size 0.1 × 0.1 in the projected space to estimate effective drift and diffusion for a series of offsets between s = 1 ps and s = 1 ns, and employ the Euler scheme run the resulting stochastic dynamics at integration time step 2 fs for 500 ns total simulation time.We extract the two leading implied timescales from all of these simulations using a Markov state model on the same discrete states as above, and compare them to those of the original simulation in Fig. 6 B. Both timescales are massively underestimated at small s, but agree well as s approaches 1 ns.In Fig. 6 A, lower part, we show the effective free energy sampled by the simulation at s = 1 ns.It can be seen that detailed features of the original energy landscape are washed out, only the overall structure of valleys and barriers is roughly preserved.Nevertheless, comparing the stationary probabilities of the three PCCA sets obtained from the original dynamics in both the original and effective simulations, we find that these probabilities are approximately preserved for s = 1 ns.In fact, they are preserved for most values of s in this example, see Fig. 6 C. Again, we find that, while detailed features of the original simulations are lost if a large offset s is used, the essential (long-timescale) dynamical and stationary properties remain unchanged.The vertical dashed line indicates where the offset equals 1 τ , which is where the overdamped regime is expected to be attained.C: Stationary probabilities of three metastable sets (obtained from the original data using the PCCA method) as a function of s (dashed lines), compared to the reference values (solid lines).The vertical dashed line indicates where the offset equals 1 τ , which is where the overdamped regime is expected to be attained.Errorbars in panels B and C were obtained by bootstrapping on the effective simulations.

Summary
We have discussed the approximation of high-dimensional diffusion processes by effective dynamics defined on the lower-dimensional space of reduced variables.We have analyzed the approximation quality of low-lying eigenvalues of the corresponding generator for reversible diffusions.A new relative error bound for dominant eigenvalues in terms of the H 1 µ -approximation error of the corresponding eigenfunctions was proved.Furthermore, we have discussed how the overdamped limit can be exploited to extend the validity of these approximation results to Langevin dynamics.Numerical examples have shown that using a large offset in the Kramers-Moyal estimators for effective drift and diffusion coefficients does not seem to impair the approximation quality of dominant eigenvalues.Future work will focus on providing a theoretical foundation for the observations stated in this paper.

Figure 2
Figure2Analysis of effective dynamics on the polar angle for the lemon slice potential.A: Numerical estimates of effective drift for different values of the offset s, compared to the reference in black.B: The same for the effective diffusion.C: Implied timescales t 2 , . . ., t 7 obtained by analyzing simulation data of the effective dynamics, as a function of s, compared to the reference values in black.Note that all of these timescales come in three pairs due to the symmetry of the system.The vertical dashed line indicates the last two dominant timescales t 6 ≈ t 7 .D: Stationary probabilities of the potential minima estimated from simulations of the effective dynamics, as a function of s (we only show four of the seven stationary probabilities for clarity of the figure).The reference value is shown in black, it is the same for all seven minima.The vertical dashed line indicates the last two dominant timescales t 6 ≈ t 7 .Errorbars were produced by bootstrapping on the original data (A and B) and on the effective simulations (C and D).We note that errorbars in C and D are barely visible.

Figure 3
Figure 3 Contour plot of the two-dimensional toy potential Eq. (82).

Figure 4
Figure 4  Analysis of effective dynamics on the x-coordinate of the two-dimensional toy potential Eq. (82).A: Numerical estimates of effective drift for different values of the offset s (indicated by different colors) using both the Langevin (L) data (circles) and the overdamped (O) data (triangles).B: The same for the effective diffusion.C: Leading implied timescale t 2 obtained by analyzing simulation data of the effective dynamics from both Langevin data (circles) and overdamped data (triangles), as a function of s, compared to the reference value in black.D: Stationary probabilities of the two metastable sets estimated from simulations of the effective dynamics from both Langevin data (circles) and overdamped data (triangles), as a function of s, compared to the references in black.Errorbars were produced by bootstrapping on the original data (A and B) and on the effective simulations (C and D).We note that errorbars in C and D are barely visible.

Figure 5
Figure 5 Analysis of effective dynamics for alanine dipeptide along φ-direction.A: Effective free energy of the original simulation data in φ-ψ-space.B: Comparison of numerical estimates for drift and diffusion at s = 100 ps to the local approximations Eqs.(55-57) within the two dominant metastable sets of the original dynamics.We use blue and green color to indicate the local approximations within the two states, while the numerical estimates are shown in black.The drift is represented by a solid line and the diffusion by a dashed line.C: Slowest implied timescale t 2 obtained from a Markov state model of the effective dynamics, as a function of s.The reference value is indicated by the black line.The vertical dashed line indicates where the offset equals 1τ , which is where the overdamped regime is expected to be attained.D: Stationary probabilities of two PCCA sets extracted from the effective dynamics, as a of s.Black lines indicate stationary probabilities of metastable sets extracted from the original data.The vertical dashed line indicates where the offset equals 1 τ , which is where the overdamped regime is expected to be attained.Errorbars in panels C and D were obtained by bootstrapping on the effective simulations.

Figure 6
Figure 6  Analysis of effective dynamics for deca alanine in the space of its two slowest eigenfunctions.A: Effective free energy of the original dynamics (top) and free energy sampled by effective dynamics at s = 1 ns (bottom).B: First two implied timescales of the effective simulations as a function of the offset s, compared to the references in black.The vertical dashed line indicates where the offset equals 1 τ , which is where the overdamped regime is expected to be attained.C: Stationary probabilities of three metastable sets (obtained from the original data using the PCCA method) as a function of s (dashed lines), compared to the reference values (solid lines).The vertical dashed line indicates where the offset equals 1 τ , which is where the overdamped regime is expected to be attained.Errorbars in panels B and C were obtained by bootstrapping on the effective simulations.