Open Access
Computation 2018, 6(1), 22; doi:10.3390/computation6010022
Article
Optimal Data-Driven Estimation of Generalized Markov State Models for Non-Equilibrium Dynamics
^{1} Department of Mathematics and Computer Science, Freie Universität Berlin, 14195 Berlin, Germany
^{2} Zuse Institute Berlin, 14195 Berlin, Germany
^{*} Author to whom correspondence should be addressed.
Received: 12 January 2018 / Accepted: 23 February 2018 / Published: 26 February 2018
Abstract: There are multiple ways in which a stochastic system can be out of statistical equilibrium. It might be subject to time-varying forcing; it might be in a transient phase on its way towards equilibrium; it might be in equilibrium without us noticing it, due to insufficient observations; and it might even be a system failing to admit an equilibrium distribution at all. We review some of the approaches that model the effective statistical behavior of equilibrium and non-equilibrium dynamical systems, and show that both cases can be considered under the unified framework of optimal low-rank approximation of so-called transfer operators. Particular attention is given to the connection between these methods, Markov state models, and the concept of metastability, and further to the estimation of such reduced-order models from finite simulation data. All these topics play an important role in, e.g., molecular dynamics, where Markov state models are often and successfully utilized, and which is the main motivating application in this paper. We illustrate our considerations by numerical examples.
Keywords:
Markov state model; non-equilibrium; metastability; coherent set; molecular dynamics; transfer operator; Koopman operator; extended dynamic mode decomposition

1. Introduction
Metastable molecular systems under non-equilibrium conditions have recently attracted increasing interest. Examples include systems that evolve under an external force, such as a pulling force generated by an optical tweezer or an atomic force microscope, an electrostatic force across a biomembrane that drives ions through a channel protein, or the periodically changing force generated by a spectroscopic electromagnetic field. Such non-equilibrium conditions can be built into molecular dynamics (MD) simulations in order to probe their effects on the molecule. Despite the relevance of non-equilibrium effects, reliable tools for the quantitative description of non-equilibrium phenomena, such as the conformational dynamics of a molecular system under external forcing, are still lacking.
In this paper, we say that a process is in “equilibrium”, if it is statistically reversible with respect to its equilibrium distribution (see Table 1). For MD simulations under equilibrium conditions, reliable analysis tools have been developed. For example, Markov state models (MSMs) allow for an accurate description of the transitions between the main conformations of the molecular system under investigation. MSMs for equilibrium MD have been well developed and established over the past decade in theory [1,2], applications (see the recent book [3] for an overview), and software implementations [4,5]. The principal idea of equilibrium MSMs is to approximate the long-timescale and stationary properties of the MD system by a reduced Markovian dynamics over a finite number of (macro)states, i.e., in discrete state space. These states represent or at least separate the dominant metastable sets of the system, i.e., sets in which typical MD trajectories stay substantially longer than the system needs for a transition to another such set [1,6].
Equilibrium Markovian processes have real-valued eigenvalues and eigenfunctions in their propagators, a property that methods for the approximation of equilibrium dynamics are built upon. For example, the approximation error of MSMs and their slowest relaxation timescales can be expressed in terms of how well the state space discretization approximates the dominant eigenfunctions [7,8]. This has been formulated in the variational approach for conformation dynamics (VAC), or Rayleigh–Ritz method, which provides an optimization method to systematically search for the best MSMs or other models of the equilibrium dynamics [9,10]. Perron-Cluster Cluster Analysis (PCCA) [11,12] identifies the metastable states of a molecule by conducting a spectral clustering in the space spanned by the dominant eigenfunctions. Additionally, equilibrium MSMs are the foundation for analyzing multi-ensemble simulations that help to sample rare events [13,14].
In the non-equilibrium setting, the above tools break down or are not defined. The purpose of the present paper is to summarize and reconcile recently developed methods for the description of non-equilibrium processes, and to merge them with their equilibrium counterparts into a unified framework. Note that there are different ways to deviate from the “equilibrium” situation, which makes the term “non-equilibrium” ambiguous. To avoid confusion, we consider one of the following cases when referring to the non-equilibrium setting (again, see Table 1 on terminology).
 (i)
 Time-inhomogeneous dynamics, e.g., the system feels a time-dependent external force, for instance due to an electromagnetic field or force probing.
 (ii)
 Time-homogeneous non-reversible dynamics, i.e., the governing laws of the system do not change in time, but the system does not obey detailed balance; additionally, we might want to consider the system in a non-stationary regime.
 (iii)
 Reversible dynamics but non-stationary data, i.e., the system possesses a stationary distribution with respect to which it is in detailed balance, but the empirical distribution of the available data has not converged to this stationary distribution.
Even though we consider genuinely stochastic systems here, the algorithm of Section 5 can be used for deterministic systems as well—and indeed it is, see Remark 2 and references therein.
Note that with regard to the considered dynamics, (i)–(iii) represent cases of decreasing generality. In (i), time-dependent external fields act on the system, such that the energy landscape depends on time, i.e., the main wells of the energy landscape can move in time. Hence, there may no longer be time-independent metastable sets in which the dynamics stays for long periods of time before exiting. Instead, the potentially metastable sets will move in state space. Generally, moving “metastable” sets cannot be considered metastable anymore. However, the so-called coherent sets, which have been studied for non-autonomous flow fields in fluid dynamics [15,16], make it possible to give meaning to the concept of metastability [17]. In (iii), the full theory of equilibrium Markov state modeling is at one’s disposal, but one needs to estimate certain required quantities from non-equilibrium data [18]. Case (ii) seems the most elusive: on the one hand, it could be handled by the time-inhomogeneous approach; on the other hand, it is a time-homogeneous system, and some structural properties could be carried over from the reversible equilibrium case that are out of reach for a time-inhomogeneous analysis. For instance, if the dynamics shows cyclic behavior, it admits structures that are well captured by tools from the analysis of time-homogeneous dynamical systems (e.g., Floquet theory and Poincaré sections [19,20]), and a more general view as in (i) might miss them; however, cyclic behavior is not present in reversible systems, so the tools from (iii) are doomed to failure in this respect. To avoid confusion, it should be emphasized that the three cases distinguished above do not suffice to settle the discussion about the definition of equilibrium or non-equilibrium; see, e.g., the literature on non-equilibrium steady state systems [21,22].
Apart from MSMs, the literature on kinetic lumping schemes offers several other techniques for finding coarse-grained descriptions of systems [23,24,25]. These techniques are, however, not built on the intuition of metastable behavior in state space. What we consider here can also be seen in connection with optimal prediction in the sense of the Mori–Zwanzig formalism [26,27,28,29], but we will try to choose the observables of the system such that projecting the dynamics onto them keeps certain properties intact without including memory terms.
In what follows, we will review and unify some of the theoretical and data-driven algorithmic approaches that attempt to model the effective statistical behavior of non-equilibrium systems. To this end, an MSM, or, more precisely, a generalized MSM, is sought, i.e., a possibly small matrix T that carries the properties of the actual system that are of physical relevance. In the equilibrium case, for example, this includes the slowest timescales on which the system relaxes towards equilibrium (Section 2). The difference between generalized and standard MSMs is that we do not strictly require the former to be interpretable in terms of transition probabilities between regions of the state space (Section 3); however, usually there is a strong connection between the matrix entries and metastable sets. While heavily building upon recent results for MSMs of non-stationary MD [17] and a non-equilibrium generalization of the variational principle [30], we will focus on a slightly different characteristic of the approximate model, namely its “propagation error”. It turns out that this notion permits a straightforward generalization from equilibrium (reversible) dynamics to all our non-equilibrium cases (Section 4), and even retains the physical intuition behind true MSMs through the concept of coherent sets. We will show in Section 5 how these considerations can be carried over to the case where only a finite amount of simulation data is available. The above non-equilibrium cases (ii)–(iii) can then be given as specific instances of the construction (Section 6). The theory is illustrated with examples throughout the text. Bringing the formerly known equilibrium and time-inhomogeneous concepts into a unified framework (Section 3 and Section 4) can be seen as the main contribution of this paper, together with the novel application of this framework to time-homogeneous non-equilibrium systems relying on non-stationary data (Section 6).
We note in advance that in the course of the (generalized) Markov state modeling we will consider different instances of approximations to a certain linear operator $\mathcal{T}:\mathbb{S}\to \mathbb{S}$ mapping some space to itself (and sometimes to a different one). On the one hand, there will be a projected operator ${\mathcal{T}}_{k}:\mathbb{S}\to \mathbb{S}$, where ${\mathcal{T}}_{k}=\mathcal{Q}\mathcal{T}\mathcal{Q}$ with a projection $\mathcal{Q}:\mathbb{S}\to \mathbb{V}$ having a k-dimensional range $\mathbb{V}\subset \mathbb{S}$. On the other hand, we will consider the restriction of the projected operator ${\mathcal{T}}_{k}$ to this k-dimensional subspace, i.e., ${\mathcal{T}}_{k}:\mathbb{V}\to \mathbb{V}$, also called the $\mathbb{V}$-restriction of ${\mathcal{T}}_{k}$, which has a $k\times k$ matrix representation (with respect to some chosen basis of $\mathbb{V}$) that we will denote by ${T}_{k}$.
2. Studying Dynamics with Functions
2.1. Transfer Operators
In what follows, $\mathsf{P}[\,\cdot \mid \mathfrak{E}]$ and $\mathsf{E}[\,\cdot \mid \mathfrak{E}]$ denote probability and expectation conditioned on the event $\mathfrak{E}$. Furthermore, ${\{{x}_{t}\}}_{t\ge 0}$ is a stochastic process defined on a state space $\mathbb{X}\subset {\mathbb{R}}^{d}$. For instance, we can think of ${x}_{t}$ being the solution of the stochastic differential equation
$$\mathrm{d}{x}_{t}=-\nabla W({x}_{t})\,\mathrm{d}t+\sqrt{2{\beta}^{-1}}\,\mathrm{d}{w}_{t}\,,$$
describing diffusion in the potential energy landscape given by W. Here, $\beta $ is the non-dimensionalized inverse temperature, and ${w}_{t}$ is a standard Wiener process (Brownian motion). The transition density function ${p}^{t}:\mathbb{X}\times \mathbb{X}\to {\mathbb{R}}_{\ge 0}$ of a time-homogeneous stochastic process ${\{{x}_{t}\}}_{t\ge 0}$ is defined by
$$\mathsf{P}[{x}_{t}\in \mathbb{A}\mid {x}_{0}=x]={\int}_{\mathbb{A}}{p}^{t}(x,y)\,\mathrm{d}y\,,\qquad \mathbb{A}\subseteq \mathbb{X}\,.$$
That is, ${p}^{t}(x,y)$ is the conditional probability density of ${x}_{t}=y$ given that ${x}_{0}=x$. We also write ${x}_{t}\sim {p}^{t}({x}_{0},\cdot)$ to indicate that ${x}_{t}$ has density ${p}^{t}({x}_{0},\cdot)$.
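Although the transition density is rarely available in closed form, trajectories of (1) are easy to generate numerically. The following is a minimal sketch (pure Python, Euler–Maruyama discretization; the double-well potential and all parameters are illustrative and anticipate the example of Section 3.3):

```python
import math
import random

def simulate_overdamped(grad_W, x0, beta, dt, n_steps, rng):
    # Euler-Maruyama discretization of dx = -grad W(x) dt + sqrt(2/beta) dw
    x = x0
    traj = [x]
    noise = math.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        x = x - grad_W(x) * dt + noise * rng.gauss(0.0, 1.0)
        traj.append(x)
    return traj

# double-well potential W(x) = (x^2 - 1)^2, gradient W'(x) = 4 x (x^2 - 1)
grad_W = lambda x: 4.0 * x * (x * x - 1.0)
traj = simulate_overdamped(grad_W, x0=1.0, beta=5.0, dt=1e-3,
                           n_steps=100_000, rng=random.Random(0))
# the process fluctuates around the wells at +/-1 most of the time
frac_near_wells = sum(abs(abs(x) - 1.0) < 0.5 for x in traj) / len(traj)
```

Binning many such trajectory snippets $(x_0, x_\tau)$ into boxes is the standard route to the data-based estimates of $p^\tau$ discussed below.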
With the aid of the transition density function, we will now define transfer operators, i.e., the action of the process on functions of the state. Note, however, that the transition density is in general not known explicitly, and thus we will need data-based approximations to estimate it. We assume that there is a unique stationary density $\mu $, such that ${\{{x}_{t}\}}_{t\ge 0}$ is stationary with respect to $\mu $; that is, it satisfies ${x}_{0}\sim \mu $ and
$$\mu (x)={\int}_{\mathbb{X}}\mu (y)\phantom{\rule{0.166667em}{0ex}}{p}^{t}(y,x)\mathrm{d}y\phantom{\rule{1.em}{0ex}}\mathrm{for}\phantom{\rule{4.pt}{0ex}}\mathrm{all}\phantom{\rule{4.pt}{0ex}}t\ge 0.$$
Let now f be a probability density over $\mathbb{X}$, $u=f/\mu $ a probability density with respect to $\mu $ (meaning that $\mu $ is to be interpreted as a physical density), and g a scalar function of the state (an “observable”). We define the following transfer operators, for a given lag time $\tau $:
 (a)
 The Perron–Frobenius operator (also called propagator),$${\mathcal{P}}^{\tau}f(x)={\int}_{\mathbb{X}}f(y)\phantom{\rule{0.166667em}{0ex}}{p}^{\tau}(y,x)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}y$$
 (b)
 The Perron–Frobenius operator with respect to the equilibrium density (also called transfer operator, simply),$${\mathcal{T}}^{\tau}u(x)=\frac{1}{\mu (x)}{\int}_{\mathbb{X}}u(y)\mu (y)\phantom{\rule{0.166667em}{0ex}}{p}^{\tau}(y,x)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}y\phantom{\rule{0.166667em}{0ex}}.$$
 (c)
 The Koopman operator,$${\mathcal{K}}^{\tau}g(x)={\int}_{\mathbb{X}}{p}^{\tau}(x,y)\,g(y)\,\mathrm{d}y=\mathsf{E}[g({x}_{\tau})\mid {x}_{0}=x]\,.$$
We denote by ${L}^{q}={L}^{q}(\mathbb{X})$ the space (of equivalence classes) of q-integrable functions with respect to the Lebesgue measure. ${L}_{\nu}^{q}$ denotes the same space of functions, now integrable with respect to the weight function $\nu $. All our transfer operators are well-defined non-expanding operators on the following Hilbert spaces: ${\mathcal{P}}^{\tau}:{L}_{1/\mu}^{2}\to {L}_{1/\mu}^{2}$, ${\mathcal{T}}^{\tau}:{L}_{\mu}^{2}\to {L}_{\mu}^{2}$, and ${\mathcal{K}}^{\tau}:{L}_{\mu}^{2}\to {L}_{\mu}^{2}$ [31,32,33]. The equilibrium density $\mu $ satisfies ${\mathcal{P}}^{\tau}\mu =\mu $; that is, $\mu $ is an eigenfunction of ${\mathcal{P}}^{\tau}$ with associated eigenvalue ${\lambda}_{1}=1$. The definition of ${\mathcal{T}}^{\tau}$ relies on $\mu $; we have
$$\mu \,{\mathcal{T}}^{\tau}u={\mathcal{P}}^{\tau}(u\mu )\,,$$
thus ${\mathcal{P}}^{\tau}\mu =\mu $ translates into ${\mathcal{T}}^{\tau}\mathbb{1}=\mathbb{1}$, where $\mathbb{1}={\mathbb{1}}_{\mathbb{X}}$ is the constant one function on $\mathbb{X}$.
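On a finite state space these relations become matrix identities. The following minimal sketch (a hypothetical 3-state chain, column-stochastic convention for the propagator) verifies that the fixed point $\mathcal{P}^{\tau}\mu = \mu$ translates into $\mathcal{T}^{\tau}\mathbb{1} = \mathbb{1}$:

```python
# The propagator P acts on density vectors and is column-stochastic (each
# column sums to 1). Its stationary density mu gives the transfer operator
# with respect to mu via T_ij = P_ij * mu_j / mu_i, the matrix analogue of
# mu T u = P(u mu); T then maps the constant-one function to itself.

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

P = [[0.90, 0.08, 0.02],
     [0.08, 0.84, 0.10],
     [0.02, 0.08, 0.88]]   # hypothetical chain; column sums: 1, 1, 1

# stationary density via power iteration of mu -> P mu
mu = [1.0 / 3.0] * 3
for _ in range(5000):
    mu = matvec(P, mu)

T = [[P[i][j] * mu[j] / mu[i] for j in range(3)] for i in range(3)]
one_out = matvec(T, [1.0, 1.0, 1.0])   # equals [1, 1, 1] up to round-off
```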
2.2. Reversible Equilibrium Dynamics and Spectral Decomposition
An important structural property of many systems used to model molecular dynamics is reversibility. Reversibility means that the process is statistically indistinguishable from its time-reversed counterpart, and it can be described by the detailed balance condition
$$\mu (x)\phantom{\rule{0.166667em}{0ex}}{p}^{t}(x,y)=\mu (y)\phantom{\rule{0.166667em}{0ex}}{p}^{t}(y,x)\phantom{\rule{2.em}{0ex}}\forall x,y\in \mathbb{X},\phantom{\rule{4pt}{0ex}}t\ge 0\phantom{\rule{0.166667em}{0ex}}.$$
The process generated by (1) is reversible and ergodic, i.e., it has a unique positive equilibrium density, given by $\mu (x)\propto \exp(-\beta W(x))$, under mild conditions on the potential W [34,35]. The subsequent considerations hold for all reversible and ergodic (with respect to a unique positive invariant density) stochastic processes, and are not limited to the class of systems given by (1). Ref. [1] discusses a variety of stochastic dynamical systems that have been considered in this context. Furthermore, if ${p}^{t}(\cdot,\cdot)$ is a continuous function in both its arguments for $t>0$, then all the transfer operators above are compact, which we assume from now on. This implies that they have a discrete eigen- and singular spectrum (the latter meaning a discrete set of singular values). For instance, the process generated by (1) has a continuous transition density function under mild growth and regularity assumptions on the potential W.
As a result of the detailed balance condition, the Koopman operator ${\mathcal{K}}^{\tau}$ and the Perron–Frobenius operator with respect to the equilibrium density, ${\mathcal{T}}^{\tau}$, become identical, and we obtain
$${\langle {\mathcal{P}}^{\tau}f,g\rangle}_{1/\mu}={\langle f,{\mathcal{P}}^{\tau}g\rangle}_{1/\mu}\quad\mathrm{and}\quad{\langle {\mathcal{T}}^{\tau}f,g\rangle}_{\mu}={\langle f,{\mathcal{T}}^{\tau}g\rangle}_{\mu}\,,$$
i.e., all the transfer operators become self-adjoint on the respective Hilbert spaces from above. Here, ${\langle \cdot,\cdot\rangle}_{\nu}$ denotes the natural scalar product on the weighted space ${L}_{\nu}^{2}$, i.e., ${\langle f,g\rangle}_{\nu}={\int}_{\mathbb{X}}f(x)g(x)\nu (x)\,\mathrm{d}x$; the associated norm is denoted by ${\parallel \cdot\parallel}_{\nu}$. Due to the self-adjointness, the eigenvalues ${\lambda}_{i}^{\tau}$ of ${\mathcal{P}}^{\tau}$ and ${\mathcal{T}}^{\tau}$ are real-valued and the eigenfunctions form an orthogonal basis with respect to ${\langle \cdot,\cdot\rangle}_{1/\mu}$ and ${\langle \cdot,\cdot\rangle}_{\mu}$, respectively.
Ergodicity implies that the dominant eigenvalue ${\lambda}_{1}^{\tau}$ is the only eigenvalue with absolute value 1, and we can thus order the eigenvalues so that
$$1={\lambda}_{1}^{\tau}>{\lambda}_{2}^{\tau}\ge {\lambda}_{3}^{\tau}\ge \cdots .$$
The eigenfunction of ${\mathcal{T}}^{\tau}$ corresponding to ${\lambda}_{1}=1$ is the constant function ${\varphi}_{1}={\mathbb{1}}_{\mathbb{X}}$. Let ${\varphi}_{i}$ be the normalized eigenfunctions of ${\mathcal{T}}^{\tau}$, i.e., ${\langle {\varphi}_{i},{\varphi}_{j}\rangle}_{\mu}={\delta}_{ij}$, where ${\delta}_{ij}$ denotes the Kronecker delta. Then any function $f\in {L}_{\mu}^{2}$ can be written in terms of the eigenfunctions as $f={\sum}_{i=1}^{\infty}{\langle f,{\varphi}_{i}\rangle}_{\mu}\,{\varphi}_{i}$. Applying ${\mathcal{T}}^{\tau}$ thus results in
$${\mathcal{T}}^{\tau}f=\sum _{i=1}^{\infty}{\lambda}_{i}^{\tau}\phantom{\rule{0.166667em}{0ex}}{\langle f,{\varphi}_{i}\rangle}_{\mu}\phantom{\rule{0.166667em}{0ex}}{\varphi}_{i}.$$
For more details, we refer to [33] and references therein.
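The spectral expansion can be checked explicitly in the smallest non-trivial case. The sketch below uses a hypothetical reversible 2-state chain, for which the eigenpairs of the transfer operator are available in closed form, and confirms that applying $\mathcal{T}^{\tau}$ directly agrees with propagating the spectral expansion term by term:

```python
import math

# Hypothetical 2-state chain: column-stochastic propagator P acting on
# density vectors; detailed balance is automatic for two states, so the
# transfer operator T_ij = P_ij * mu_j / mu_i is self-adjoint in <.,.>_mu.
a, b = 0.10, 0.05
P = [[1 - a, b], [a, 1 - b]]          # columns sum to 1
mu = [b / (a + b), a / (a + b)]       # stationary density, here (1/3, 2/3)
T = [[P[i][j] * mu[j] / mu[i] for j in range(2)] for i in range(2)]

def inner_mu(f, g):
    return sum(f[i] * g[i] * mu[i] for i in range(2))

# closed-form eigenpairs: lambda_1 = 1 with phi_1 = (1, 1), and
# lambda_2 = 1 - a - b with phi_2 ~ (mu_2, -mu_1), normalized in <.,.>_mu
lam = [1.0, 1.0 - a - b]
nrm = math.sqrt(mu[0] * mu[1])
phi = [[1.0, 1.0], [mu[1] / nrm, -mu[0] / nrm]]

f = [1.0, 3.0]
direct = [sum(T[i][j] * f[j] for j in range(2)) for i in range(2)]
spectral = [sum(lam[n] * inner_mu(f, phi[n]) * phi[n][i] for n in range(2))
            for i in range(2)]
```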
For some $k\in \mathbb{N}$, we call the k dominant eigenvalues ${\lambda}_{1}^{\tau},\cdots ,{\lambda}_{k}^{\tau}$ of ${\mathcal{T}}^{\tau}$ the dominant spectrum of ${\mathcal{T}}^{\tau}$, i.e.,
$${\lambda}_{\mathrm{dom}}({\mathcal{T}}^{\tau})=\{{\lambda}_{1}^{\tau},\cdots ,{\lambda}_{k}^{\tau}\}.$$
Usually, k is chosen in such a way that there is a spectral gap after ${\lambda}_{k}^{\tau}$, i.e., $1-{\lambda}_{k}^{\tau}\ll {\lambda}_{k}^{\tau}-{\lambda}_{k+1}^{\tau}$. The (implied) time scales on which the associated dominant eigenfunctions decay are given by
$${t}_{i}=-\tau /\log({\lambda}_{i}^{\tau}).$$
If ${\{{\mathcal{T}}^{t}\}}_{t\ge 0}$ is a semigroup of operators (which is the case for every time-homogeneous process, as, e.g., the transfer operator associated with (1)), then there are ${\kappa}_{i}\le 0$ with ${\lambda}_{i}^{\tau}=\exp({\kappa}_{i}\tau )$ such that ${t}_{i}=-{\kappa}_{i}^{-1}$ holds. Assuming there is a spectral gap, the dominant time scales satisfy $\infty ={t}_{1}>\dots \ge {t}_{k}\gg {t}_{k+1}$. These are the time scales of the slow dynamical processes, also called rare events, which are of primary interest in applications. The other, fast processes are regarded as fluctuations around the relative equilibria (or metastable states) between which the relevant slow processes travel.
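The implied-time-scale formula (7) translates directly into code. The sample eigenvalues below are those reported for the double-well example of Section 3.3 (lag time $\tau = 10$); the formula reproduces the time scales quoted there up to rounding of the eigenvalues:

```python
import math

def implied_timescales(eigvals, tau):
    # t_i = -tau / log(lambda_i); the eigenvalue 1 gives an infinite timescale
    return [math.inf if lam >= 1.0 else -tau / math.log(lam) for lam in eigvals]

# eigenvalues 1, 0.888, and (an upper bound for) lambda_3 < 1e-12, tau = 10
ts = implied_timescales([1.0, 0.888, 1e-12], tau=10.0)
# ts[1] is about 84.2 and ts[2] about 0.36, matching the 84.1 and 0.35
# quoted in Section 3.3 up to rounding of lambda_2
```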
3. Markov State Models for Reversible Systems in Equilibrium
In the following, we will fix a lag time $\tau >0$, and drop the superscript $\tau $ from the transfer operators for clarity of notation.
3.1. Preliminaries on Equilibrium Markov State Models
Generally, in the equilibrium case, a generalized MSM (GMSM) is any matrix ${T}_{k}\in {\mathbb{R}}^{{n}_{k}\times {n}_{k}}$, ${n}_{k}\ge k$, that approximates the k dominant time scales of $\mathcal{T}$, i.e., its dominant eigenvalues;
$${\lambda}_{\mathrm{dom}}({T}_{k})\approx {\lambda}_{\mathrm{dom}}(\mathcal{T})\phantom{\rule{0.166667em}{0ex}}.$$
It is natural to ask for some structural properties of $\mathcal{T}$ to be reproduced by ${T}_{k}$, such as:
 $\mathcal{T}$ is a positive operator ⟷ all entries of ${T}_{k}$ are nonnegative;
 $\mathcal{T}$ is probabilitypreserving ⟷ each column sum of ${T}_{k}$ is 1.
These two properties together make ${T}_{k}$ a stochastic matrix, and in this case ${T}_{k}$ is usually called an MSM. We shall use the term generalized MSM for a matrix ${T}_{k}$ that violates these requirements but still approximates the dominant spectral components of the underlying operator. Another structural property that one would usually ask for is to have, apart from the time scales/eigenvalues, also some approximation of the associated eigenvectors of $\mathcal{T}$, as these are the dynamic observables related to the slow dynamics. This is incorporated in the general approach, which we discuss next.
The question is now how to obtain a GMSM ${T}_{k}$ for a given $\mathcal{T}$. To connect these objects, a natural and popular approach is to obtain the reduced model ${T}_{k}$ via projection. To this end, let $\mathcal{Q}:{L}_{\mu}^{2}\to \mathbb{V}\subset {L}_{\mu}^{2}$ be a projection onto an ${n}_{k}$-dimensional subspace $\mathbb{V}$. The GMSM is then defined by the projected transfer operator
$${\mathcal{T}}_{k}=\mathcal{Q}\mathcal{T}\mathcal{Q}\,;$$
${T}_{k}$ can now be taken as the matrix representation of the $\mathbb{V}$-restriction of the projected operator ${\mathcal{T}}_{k}:\mathbb{V}\to \mathbb{V}$ with respect to a chosen basis of $\mathbb{V}$.
Is there a “best” choice for the projection? If we also ask for perfect approximation of the time scales, i.e., ${\lambda}_{\mathrm{dom}}({T}_{k})={\lambda}_{\mathrm{dom}}(\mathcal{T})$, the requirement of parsimony—such that the model size is minimal, i.e., ${n}_{k}=k$—leaves us with a unique choice for $\mathbb{V}$, namely the space spanned by the dominant (normalized) eigenfunctions ${\varphi}_{i}$ of $\mathcal{T}$, $i=1,\dots ,k$. This follows from the so-called variational principle (or Rayleigh–Ritz method) [9,10]. In fact, it makes a stronger claim: every projection onto a k-dimensional space ${\mathbb{V}}^{\prime}$ yields a GMSM ${\mathcal{T}}_{k}^{\prime}:{\mathbb{V}}^{\prime}\to {\mathbb{V}}^{\prime}$ which underestimates the dominant time scales, i.e., ${\lambda}_{i}({\mathcal{T}}_{k}^{\prime})\le {\lambda}_{i}(\mathcal{T})$, $i=1,\dots ,k$; equality holds only for the projection onto the eigenspaces.
Note that the discussion about the time scales (equivalently, the eigenvalues) involves only the range of the projection, the space $\mathbb{V}$. However, there are multiple ways to project onto the space $\mathbb{V}$. It turns out that the $\mu$-orthogonal projection given by
$$\mathcal{Q}f=\sum _{i=1}^{k}{\langle {\varphi}_{i},f\rangle}_{\mu}\,{\varphi}_{i}$$
is superior to all of them if we consider a stronger condition than simply reproducing the dominant time scales. This condition is the requirement of a minimal propagation error, and it will be central to our generalization of GMSMs for non-equilibrium, or even time-inhomogeneous, systems. Let us define the best k-dimensional approximation ${\mathcal{T}}_{k}$ to $\mathcal{T}$, i.e., the best projection $\mathcal{Q}$, as the rank-k operator satisfying
$$\parallel \mathcal{T}-{\mathcal{T}}_{k}\parallel \le \parallel \mathcal{T}-{\mathcal{T}}_{k}^{\prime}\parallel \,,$$
where $\parallel \cdot\parallel $ denotes the induced operator norm for operators mapping ${L}_{\mu}^{2}$ to itself. The induced norm of an operator $\mathcal{A}:\mathbb{X}\to \mathbb{Y}$ is defined by $\parallel \mathcal{A}\parallel ={\max}_{{\parallel f\parallel}_{\mathbb{X}}=1}{\parallel \mathcal{A}f\parallel}_{\mathbb{Y}}$, where ${\parallel \cdot\parallel}_{\mathbb{X}}$ and ${\parallel \cdot\parallel}_{\mathbb{Y}}$ are the norms on the spaces $\mathbb{X}$ and $\mathbb{Y}$, respectively.
Equivalently, (11) can be viewed as a result stating that ${\mathcal{T}}_{k}$ is the k-dimensional approximation of $\mathcal{T}$ yielding the smallest (worst-case) error in density propagation:
$${\mathcal{T}}_{k}=\underset{\begin{array}{c}{\mathcal{T}}_{k}^{\prime}={\mathcal{Q}}^{\prime}\mathcal{T}{\mathcal{Q}}^{\prime}\\ \mathrm{rank}\,{\mathcal{Q}}^{\prime}=k\end{array}}{\mathrm{arg}\,\mathrm{min}}\ \underset{{\parallel f\parallel}_{\mu}=1}{\mathrm{max}}{\parallel \mathcal{T}f-{\mathcal{T}}_{k}^{\prime}f\parallel}_{\mu}\,,$$
where ${x}_{*}={\mathrm{arg}\,\mathrm{min}}_{x}\,h(x)$ means that ${x}_{*}$ minimizes the function h, possibly subject to constraints that are listed under arg min.
To summarize, the best GMSM (9) in terms of (11) (or, equivalently, (12)) is given by the projection (10). This follows from the self-adjointness of $\mathcal{T}$ and the Eckart–Young theorem; details can be found in [30] and in Appendix A. Caution is needed, however, when interpreting ${\mathcal{T}}_{k}f$ as the propagation of a given probability density f. The projection onto the dominant eigenspace in general does not respect positivity (i.e., $f\ge 0\nRightarrow {\mathcal{T}}_{k}f\ge 0$), thus ${\mathcal{T}}_{k}f$ loses its probabilistic meaning. This is the price to pay for the perfectly reproduced dominant time scales. We can retain a physical interpretation of an MSM if we accept that the dominant time scales will be slightly off, as we discuss in the next section.
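For a finite-dimensional reversible chain, the optimal rank-k model can be computed explicitly: conjugating the transfer operator by $\mathrm{diag}(\sqrt{\mu})$ yields a symmetric matrix, to which the Eckart–Young theorem applies directly. A sketch with a hypothetical 3-state chain (numpy), confirming that the worst-case propagation error of the rank-2 model equals the first discarded eigenvalue:

```python
import numpy as np

# Detailed balance is encoded by a symmetric "flux" matrix F whose rows and
# columns sum to mu; then P_ij = F_ij / mu_j is the column-stochastic
# propagator and T_ij = F_ij / mu_i the transfer operator w.r.t. mu.
mu = np.array([0.2, 0.5, 0.3])
F = np.array([[0.17, 0.02, 0.01],
              [0.02, 0.45, 0.03],
              [0.01, 0.03, 0.26]])   # hypothetical, rows/columns sum to mu

# S = D^{1/2} T D^{-1/2} with D = diag(mu) is symmetric, so its best rank-2
# approximation in the spectral norm keeps the two largest eigenpairs
d = np.sqrt(mu)
S = F / np.outer(d, d)
lams, V = np.linalg.eigh(S)            # ascending order; lams[-1] = 1
S2 = sum(lams[i] * np.outer(V[:, i], V[:, i]) for i in (-1, -2))

# the worst-case propagation error of the rank-2 model is the magnitude of
# the first discarded eigenvalue, here lams[0]
err = np.linalg.norm(S - S2, 2)
```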
3.2. Metastable Sets
There is theoretical evidence [1] that the more pronounced the metastable behavior of the system is (in the sense that the time scale gap ${t}_{1}\ge \dots \ge {t}_{k}\gg {t}_{k+1}$ is large), the more constant the dominant eigenfunctions ${\varphi}_{i}$ are on the metastable sets ${\mathbb{M}}_{1},\dots ,{\mathbb{M}}_{k}$, given that the lag time with respect to which the transfer operator $\mathcal{T}={\mathcal{T}}^{\tau}$ is taken satisfies $\tau \gg {t}_{k+1}$. Assuming such a situation, the eigenfunctions of $\mathcal{T}$ can approximately be combined from the characteristic functions over the metastable sets, i.e., with the abbreviation ${\mathbb{1}}_{i}:={\mathbb{1}}_{{\mathbb{M}}_{i}}$ it holds that
$${\varphi}_{i}\approx \sum _{j=1}^{k}{c}_{ij}\,{\mathbb{1}}_{j}=:{\widehat{\varphi}}_{i}\,,$$
where the ${c}_{ij}$ are the coefficients of the linear combination, chosen such that the ${\widehat{\varphi}}_{i}$ are orthonormal. Using the “approximate eigenfunctions” ${\widehat{\varphi}}_{i}$ defined in (13), the modified projection
$$\widehat{\mathcal{Q}}f=\sum _{i=1}^{k}{\langle {\widehat{\varphi}}_{i},f\rangle}_{\mu}{\widehat{\varphi}}_{i}$$
defines a new MSM ${\widehat{\mathcal{T}}}_{k}:=\widehat{\mathcal{Q}}\mathcal{T}\widehat{\mathcal{Q}}$. Since $\mathbb{V}=\mathrm{span}\{{\varphi}_{i}\}\approx \mathrm{span}\{{\widehat{\varphi}}_{i}\}=\widehat{\mathbb{V}}$, also $\widehat{\mathcal{Q}}\approx \mathcal{Q}$, and thus we have ${\widehat{\mathcal{T}}}_{k}\approx {\mathcal{T}}_{k}$. This implies [36], Lemma 3.5, that their dominant eigenvalues, hence time scales, are close as well. Further, in the basis ${\{{\mathbb{1}}_{i}/{\langle {\mathbb{1}}_{i},{\mathbb{1}}_{i}\rangle}_{\mu}\}}_{i=1}^{k}$ the matrix representation ${\widehat{T}}_{k}$ of the $\widehat{\mathbb{V}}$-restriction of the operator ${\widehat{\mathcal{T}}}_{k}$ has the entries
$$\begin{array}{cc}\hfill {\widehat{T}}_{k,ij}& =\frac{{\langle {\mathbb{1}}_{i},\mathcal{T}{\mathbb{1}}_{j}\rangle}_{\mu}}{{\langle {\mathbb{1}}_{j},{\mathbb{1}}_{j}\rangle}_{\mu}}\hfill \\ & ={\int}_{{\mathbb{M}}_{i}}\mathcal{T}\left(\frac{{\mathbb{1}}_{j}}{{\langle {\mathbb{1}}_{j},{\mathbb{1}}_{j}\rangle}_{\mu}}\right)(x)\,\mu (x)\,\mathrm{d}x\hfill \\ & =\frac{1}{{\mathsf{P}}_{\mu}[{x}_{0}\in {\mathbb{M}}_{j}]}{\int}_{{\mathbb{M}}_{i}}{\int}_{{\mathbb{M}}_{j}}\mu (x)\,{p}^{\tau}(x,y)\,\mathrm{d}x\,\mathrm{d}y\hfill \\ & ={\mathsf{P}}_{\mu}[{x}_{\tau}\in {\mathbb{M}}_{i}\mid {x}_{0}\in {\mathbb{M}}_{j}]\,,\hfill \end{array}$$
where ${\mathsf{P}}_{\mu}[\,\cdot \mid x\in \mathbb{M}]$ denotes the probability measure that arises if $x\in \mathbb{M}$ is distributed according to $\mu $ (restricted to $\mathbb{M}$). That is, ${\widehat{T}}_{k}$ has the transition probabilities between the metastable sets as entries, giving a direct physical interpretation of the MSM. Note, however, that for this approximation to reproduce the dominant time scales well, i.e., to have ${t}_{i}\approx {\widehat{t}}_{i}$, $i=1,\dots ,k$, we need a strong separation of time scales in the sense that ${t}_{k}\gg {t}_{k+1}$ has to hold, and the lag time $\tau $ needs to be chosen sufficiently large [7].
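In practice, the entries of ${\widehat{T}}_{k}$ in (15) are estimated from trajectory data by counting transitions between the sets over the lag time. The estimator below is a minimal sketch (column-stochastic convention, toy label sequence; it is consistent only if the data are sampled from the stationary distribution $\mu$):

```python
def msm_from_labels(labels, n_sets, lag):
    # C[i][j] counts transitions from set j at time t to set i at time t + lag
    C = [[0] * n_sets for _ in range(n_sets)]
    for t in range(len(labels) - lag):
        C[labels[t + lag]][labels[t]] += 1
    # normalize columns: That[i][j] estimates P_mu[x_lag in M_i | x_0 in M_j]
    col = [sum(C[i][j] for i in range(n_sets)) for j in range(n_sets)]
    return [[C[i][j] / col[j] if col[j] else 0.0 for j in range(n_sets)]
            for i in range(n_sets)]

# toy label sequence (set index of each trajectory frame)
labels = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0]
That = msm_from_labels(labels, n_sets=2, lag=1)   # columns sum to 1
```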
3.3. Example: Stationary Diffusion in DoubleWell Potential
Let us consider the diffusion (1) in the potential landscape $W(x)={({x}^{2}1)}^{2}$ with $\beta =5$; cf. Figure 1 (left). With the lag time $\tau =10$ we approximate the Perron–Frobenius operator $\mathcal{P}={\mathcal{P}}^{t}$ and compute its eigenvector $\mu $ at the eigenvalue ${\lambda}_{1}=1$. Then, we compute the transfer operator $\mathcal{T}={\mathcal{T}}^{\tau}$ with respect to the stationary distribution $\mu $, and its dominant eigenvalues ${\lambda}_{2},{\lambda}_{3},\dots $ and corresponding eigenvectors ${\varphi}_{2},{\varphi}_{3},\dots $ (Figure 1, right). While ${\lambda}_{2}=0.888$, we have ${\lambda}_{3}<{10}^{12}$, hence we have a clear time scale separation, ${t}_{2}=84.1\gg 0.35={t}_{3}$, cf. (7).
Thus, we expect a rank-2 MSM to recover the dominant time scales very well. Indeed, choosing ${\mathbb{M}}_{1}=(-\infty ,0]$ and ${\mathbb{M}}_{2}=[0,\infty )$ gives ${\varphi}_{2}\approx -\mathbb{1}_{{\mathbb{M}}_{1}}+\mathbb{1}_{{\mathbb{M}}_{2}}$, and we obtain by (15) that
$${\widehat{T}}_{2}=\left(\begin{array}{cc}0.943& 0.057\\ 0.057& 0.943\end{array}\right).$$
This is a stochastic matrix with eigenvalues ${\widehat{\lambda}}_{1}=1$ and ${\widehat{\lambda}}_{2}=0.886$, yielding an approximate time scale ${\widehat{t}}_{2}=82.4$.
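These quantities can be reproduced numerically. The following sketch is our own code, not from the paper; the integrator step size, sample counts, and the choice of starting the trajectories at the well bottoms are our assumptions, so the estimated matrix and time scale agree with the values above only roughly.

```python
import numpy as np

rng = np.random.default_rng(0)

def propagate(x, n_steps=2000, dt=5e-3, beta=5.0):
    """Euler-Maruyama for dx = -W'(x) dt + sqrt(2/beta) dw with W(x) = (x^2 - 1)^2.
    (Step size and sample counts are our own choices.)"""
    for _ in range(n_steps):
        x = x - 4.0 * x * (x**2 - 1.0) * dt \
              + np.sqrt(2.0 * dt / beta) * rng.normal(size=x.shape)
    return x

n = 1000                                     # trajectories started in each well
right_from_left  = propagate(np.full(n, -1.0)) > 0
right_from_right = propagate(np.full(n, +1.0)) > 0

# T_ij = P[x_tau in M_i | x_0 in M_j], with M_1 = (-inf, 0], M_2 = [0, inf)
T = np.array([[np.mean(~right_from_left), np.mean(~right_from_right)],
              [np.mean( right_from_left), np.mean( right_from_right)]])
lam2 = np.trace(T) - 1.0                     # second eigenvalue of a 2x2 stochastic matrix
t2 = -10.0 / np.log(lam2)                    # implied time scale at lag tau = 10
```

The estimated `lam2` and `t2` fluctuate around the exact values due to the finite sample size and the simplified initial condition.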
4. Markov State Models for Time-Inhomogeneous Systems
As all our non-equilibrium cases will be special instances of the most general, time-inhomogeneous case, we consider this next.
4.1. Minimal Propagation Error by Projections
4.1.1. Conceptual Changes
The above approach to Markov state modeling relies on the existence of a stationary distribution and on reversibility. For a time-inhomogeneous system there will not, in general, be any stationary distribution $\mu $. Additionally, we lack physical meaning, since it is unclear with respect to which ensemble the dynamical fluctuations should be described. From a mathematical perspective there is a problem as well, since the construction relies on the reversibility of the underlying system, which gives the self-adjointness of the operator $\mathcal{T}$ with respect to the weighted scalar product ${\langle \cdot,\cdot\rangle}_{\mu}$. Time-inhomogeneous systems are in general not reversible.
Beyond these structural properties, we may also need to abandon some conceptual ones. As time-inhomogeneity usually stems from an external forcing that might not be present or known for all times, we need a description of the system on a finite time interval. This disrupts the concept of dominant time scales as considered for equilibrium systems, because there it relies on the self-similarity of observing an eigenmode over and over for arbitrarily large times. It also forces us to revisit the concept of metastability, for two reasons. First, many definitions of metastability rely on statistics under the assumption that we observe the system for infinitely long times. Second, as an external forcing may in theory distort the energy landscape arbitrarily, it is a priori unclear what a metastable set could be.
As a remedy, we aim at another property when trying to reproduce the effective behavior of the full system by a reduced model: minimizing the propagation error, as in (12). Remarkably, this will also allow for a physical interpretation through so-called coherent sets, analogous to metastable sets in the equilibrium case.
A prototypical time-inhomogeneous system is given by
where the potential W now depends explicitly on the time t. In this case, a lag time $\tau $ is not sufficient to parametrize the statistical evolution of the system, because we need to know when we start the evolution. Thus, transition density functions need two time parameters; e.g., ${p}^{s,t}(x,\cdot)$ denotes the distribution of ${x}_{t}$ conditional on ${x}_{s}=x$. Similarly, the transfer operators $\mathcal{P},\mathcal{T},\mathcal{K}$ are parametrized by two times as well; e.g., ${\mathcal{P}}^{s,t}$ propagates probability densities from initial time s to final time t (alternatively, from initial time s over lag time $\tau =t-s$). To simplify notation, we fix some initial and final times and drop these two time parameters.
$$\mathrm{d}{x}_{t}=-\nabla W(t,{x}_{t})\,\mathrm{d}t+\sqrt{2{\beta}^{-1}}\,\mathrm{d}{w}_{t}\,,$$
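For intuition, such a time-inhomogeneous SDE can be integrated with a straightforward Euler–Maruyama scheme in which the drift is evaluated at the current time. The potential below, a double well with a time-periodic tilt, is purely our own illustrative choice (it is not the example used later in the text), as are all numerical parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_W(t, x):
    """Gradient of our illustrative time-dependent potential
    W(t, x) = (x^2 - 1)^2 + 3 x sin(t). (Our own choice, not from the paper.)"""
    return 4.0 * x * (x**2 - 1.0) + 3.0 * np.sin(t)

def propagate(x, s, t, dt=2e-3, beta=5.0):
    """Draw samples of x_t given x_s = x, i.e. from the transition density p^{s,t}(x, .)."""
    x = np.array(x, dtype=float)
    n = int(round((t - s) / dt))
    for i in range(n):
        x = x - grad_W(s + i * dt, x) * dt \
              + np.sqrt(2.0 * dt / beta) * rng.normal(size=x.shape)
    return x

# the statistics depend on both times, not only on the lag t - s = 1:
y_a = propagate(np.zeros(5000), s=0.0,   t=1.0)          # tilt pushes left
y_b = propagate(np.zeros(5000), s=np.pi, t=np.pi + 1.0)  # tilt pushes right
```

Although both ensembles evolve over the same lag time, the two sample means differ markedly, illustrating why two time parameters are needed.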
4.1.2. Adapted Transfer Operators
Let us observe the system from initial time ${t}_{0}$ to final time ${t}_{1}$, with its distribution at initial time given by ${\mu}_{0}$. If $\mathcal{P}$ denotes the propagator of the system from ${t}_{0}$ to ${t}_{1}$, the final distribution at time ${t}_{1}$ is ${\mu}_{1}=\mathcal{P}{\mu}_{0}$. As the transfer operator in the equilibrium case naturally mapped ${L}_{\mu}^{2}$ to itself (because $\mu $ was invariant), here it is natural to consider a transfer operator mapping densities (functions) with respect to ${\mu}_{0}$ to densities with respect to ${\mu}_{1}$. Thus, we define the transfer operator $\mathcal{T}:{L}_{{\mu}_{0}}^{2}\to {L}_{{\mu}_{1}}^{2}$ by
which is the non-equilibrium analogue of (4). This operator naturally retains some properties of the equilibrium transfer operator [37]:
$$\mathcal{T}u:=\frac{1}{{\mu}_{1}}\mathcal{P}\left(u{\mu}_{0}\right),$$
 $\mathcal{T}\mathbb{1}=\mathbb{1}$, encoding the property that ${\mu}_{0}$ is mapped to ${\mu}_{1}$ by the propagator $\mathcal{P}$.
 $\mathcal{T}$ is positive and integral-preserving, thus ${\sigma}_{max}(\mathcal{T})=1$.
 Its adjoint is the Koopman operator $\mathcal{K}:{L}_{{\mu}_{1}}^{2}\to {L}_{{\mu}_{0}}^{2}$, $\mathcal{K}g(x)=\mathsf{E}\left[g({x}_{t})\mid {x}_{0}=x\right]$.
4.1.3. An Optimal Non-Stationary GMSM
As already mentioned above, it is not straightforward to address the problem of Markov state modeling in this time-inhomogeneous case via descriptions involving time scales or metastability. Instead, our strategy will be to search for a rank-k projection ${\mathcal{T}}_{k}$ of the transfer operator $\mathcal{T}$ with minimal propagation error, as described below.
The main point is that, due to the non-stationarity, the domain ${L}_{{\mu}_{0}}^{2}$ (where $\mathcal{T}$ maps from) and the range ${L}_{{\mu}_{1}}^{2}$ (where $\mathcal{T}$ maps to) of the transfer operator $\mathcal{T}$ are different spaces; hence it is natural to choose different rank-k subspaces as domain and range of ${\mathcal{T}}_{k}$ as well. In fact, it is necessary to choose domain and range differently, since $f\in {L}_{{\mu}_{0}}^{2}$ has a different meaning than $f\in {L}_{{\mu}_{1}}^{2}$. Thus, we search for projectors ${\mathcal{Q}}_{0}:{L}_{{\mu}_{0}}^{2}\to {\mathbb{V}}_{0}\subset {L}_{{\mu}_{0}}^{2}$ and ${\mathcal{Q}}_{1}:{L}_{{\mu}_{1}}^{2}\to {\mathbb{V}}_{1}\subset {L}_{{\mu}_{1}}^{2}$ onto k-dimensional subspaces ${\mathbb{V}}_{0}$ and ${\mathbb{V}}_{1}$, respectively, such that the reduced operator
has essentially optimal propagation error. In quantitative terms, we seek to solve the optimization problem
where $\|\cdot\|$ denotes the induced operator norm of operators mapping ${L}_{{\mu}_{0}}^{2}$ to ${L}_{{\mu}_{1}}^{2}$.
$${\mathcal{T}}_{k}:={\mathcal{Q}}_{1}\mathcal{T}{\mathcal{Q}}_{0}$$
$${\mathcal{T}}_{k}=\underset{\substack{{\mathcal{T}}_{k}^{\prime}={\mathcal{Q}}_{1}^{\prime}\mathcal{T}{\mathcal{Q}}_{0}^{\prime}\\ \mathrm{rank}\,{\mathcal{Q}}_{0}^{\prime}=k\\ \mathrm{rank}\,{\mathcal{Q}}_{1}^{\prime}=k}}{\mathrm{arg\,min}}\;\underset{{\|f\|}_{{\mu}_{0}}=1}{\max}{\|\mathcal{T}f-{\mathcal{T}}_{k}^{\prime}f\|}_{{\mu}_{1}}\quad \mathrm{or,\ equivalently,}\quad {\mathcal{T}}_{k}=\underset{\substack{{\mathcal{T}}_{k}^{\prime}={\mathcal{Q}}_{1}^{\prime}\mathcal{T}{\mathcal{Q}}_{0}^{\prime}\\ \mathrm{rank}\,{\mathcal{Q}}_{0}^{\prime}=k\\ \mathrm{rank}\,{\mathcal{Q}}_{1}^{\prime}=k}}{\mathrm{arg\,min}}\;\|\mathcal{T}-{\mathcal{T}}_{k}^{\prime}\|\,,$$
As an implication of the Eckart–Young theorem ([38], Theorem 4.4.7), the solution of (19) can be given explicitly through the singular value decomposition of $\mathcal{T}$, yielding the variational approach for Markov processes (VAMP) [30]. More precisely, the k largest singular values ${\sigma}_{1}\ge \dots \ge {\sigma}_{k}$ of $\mathcal{T}$ have right and left singular vectors ${\varphi}_{i},{\psi}_{i}$ satisfying ${\langle {\varphi}_{i},{\varphi}_{j}\rangle}_{{\mu}_{0}}={\delta}_{ij}$ and ${\langle {\psi}_{i},{\psi}_{j}\rangle}_{{\mu}_{1}}={\delta}_{ij}$, respectively, with $\mathcal{T}{\varphi}_{i}={\sigma}_{i}{\psi}_{i}$. Choosing
solves (19), see Appendix A.
$${\mathcal{Q}}_{0}f=\sum _{i=1}^{k}{\langle {\varphi}_{i},f\rangle}_{{\mu}_{0}}{\varphi}_{i}\phantom{\rule{1.em}{0ex}}\mathrm{and}\phantom{\rule{1.em}{0ex}}{\mathcal{Q}}_{1}g=\sum _{i=1}^{k}{\langle {\psi}_{i},g\rangle}_{{\mu}_{1}}{\psi}_{i}$$
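In matrix terms, the construction via the Eckart–Young theorem can be checked in a few lines (toy data and variable names are ours; a random matrix stands in for the matrix of $\mathcal{T}$ with respect to orthonormal bases):

```python
import numpy as np

rng = np.random.default_rng(2)

# stand-in for the matrix of T with respect to orthonormal bases (toy data)
T = rng.normal(size=(8, 8))
U, s, Vt = np.linalg.svd(T)

k = 3
Q0 = Vt[:k].T @ Vt[:k]         # projection onto the top k right singular vectors
Q1 = U[:, :k] @ U[:, :k].T     # projection onto the top k left singular vectors
Tk = Q1 @ T @ Q0               # equals U_k Sigma_k V_k^T

# Eckart-Young: the optimal error in the induced (spectral) norm
# equals the (k+1)-st singular value
err = np.linalg.norm(T - Tk, ord=2)
```

No other choice of rank-k projectors can achieve a smaller error than `err`, which is exactly the next-largest singular value.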
4.2. Coherent Sets
Similarly to the reversible equilibrium case with pronounced metastability in Section 3.2, it is possible, under some circumstances, to give our GMSM (18) from Section 4.1 a physical interpretation also in the time-inhomogeneous case.
In the reversible equilibrium situation, recall from (13) that in the case of sufficient time scale separation the eigenfunctions are almost constant on metastable sets. In the time-inhomogeneous situation considered now, we have just shown that the role played before by the eigenfunctions is taken over by the left and right singular functions. Thus, let us assume for now that there are two collections of sets, ${\mathbb{M}}_{0,1},\dots ,{\mathbb{M}}_{0,k}$ at initial time and ${\mathbb{M}}_{1,1},\dots ,{\mathbb{M}}_{1,k}$ at final time, such that
holds with appropriate scalars ${c}_{ij}$ and ${d}_{ij}$, where we used the abbreviations $\mathbb{1}_{0,i}=\mathbb{1}_{{\mathbb{M}}_{0,i}}$ and $\mathbb{1}_{1,i}=\mathbb{1}_{{\mathbb{M}}_{1,i}}$. That means that the dominant right singular functions ${\varphi}_{i}$ are almost constant on the sets ${\mathbb{M}}_{0,j}$, and the dominant left singular functions ${\psi}_{i}$ are almost constant on the sets ${\mathbb{M}}_{1,j}$. In analogy to (14), we modify the projections ${\mathcal{Q}}_{0},{\mathcal{Q}}_{1}$ from (20) to ${\widehat{\mathcal{Q}}}_{0}:{L}_{{\mu}_{0}}^{2}\to {\widehat{\mathbb{V}}}_{0}$, ${\widehat{\mathcal{Q}}}_{1}:{L}_{{\mu}_{1}}^{2}\to {\widehat{\mathbb{V}}}_{1}$ by using ${\widehat{\varphi}}_{i}$ and ${\widehat{\psi}}_{i}$ instead of ${\varphi}_{i}$ and ${\psi}_{i}$, respectively, and define the modified GMSM by ${\widehat{\mathcal{T}}}_{k}={\widehat{\mathcal{Q}}}_{1}\mathcal{T}{\widehat{\mathcal{Q}}}_{0}$. A computation analogous to (15) yields for the matrix representation ${\widehat{T}}_{k}$ of the restriction ${\widehat{\mathcal{T}}}_{k}:{\widehat{\mathbb{V}}}_{0}\to {\widehat{\mathbb{V}}}_{1}$ with respect to the bases ${\{\mathbb{1}_{0,i}/{\langle \mathbb{1}_{0,i},\mathbb{1}_{0,i}\rangle}_{{\mu}_{0}}\}}_{i=1}^{k}$ and ${\{\mathbb{1}_{1,i}/{\langle \mathbb{1}_{1,i},\mathbb{1}_{1,i}\rangle}_{{\mu}_{1}}\}}_{i=1}^{k}$ that
$${\varphi}_{i}\approx \sum _{j=1}^{k}{c}_{ij}\,\mathbb{1}_{0,j}=:{\widehat{\varphi}}_{i}\qquad \mathrm{and}\qquad {\psi}_{i}\approx \sum _{j=1}^{k}{d}_{ij}\,\mathbb{1}_{1,j}=:{\widehat{\psi}}_{i}$$
$${\widehat{T}}_{k,ij}={\mathsf{P}}_{{\mu}_{0}}\left[{x}_{{t}_{1}}\in {\mathbb{M}}_{1,i}\mid {x}_{{t}_{0}}\in {\mathbb{M}}_{0,j}\right].$$
In other words, the entries of ${\widehat{T}}_{k}$ contain the transition probabilities from the sets ${\mathbb{M}}_{0,j}$ (at initial time) into the sets ${\mathbb{M}}_{1,i}$ (at final time). Thus, ${\widehat{T}}_{k}$ has the physical interpretation of an MSM, with the only difference from the reversible stationary situation being that the “metastable” sets at initial and final time are different. This can be seen as a natural reaction to the fact that in the time-inhomogeneous case the dynamical environment (e.g., the potential energy landscape governing the dynamics of a molecule) can change in time.
It remains to discuss when (21) actually holds true. It is comprehensively discussed in [17] that a sufficient condition for (21) is that
holds for $i=1,\dots ,k$. Equation (23) says that if the process starts in ${\mathbb{M}}_{0,i}$, it ends up at final time with high probability in ${\mathbb{M}}_{1,i}$, and that if the process ended up in ${\mathbb{M}}_{1,i}$ at final time, it started with high probability in ${\mathbb{M}}_{0,i}$; see Figure 2. This can be seen as a generalization of the metastability condition from Section 3.2 that allows for efficient low-rank Markov modeling in the time-homogeneous case. The pairs of sets ${\mathbb{M}}_{0,i},{\mathbb{M}}_{1,i}$ are called coherent (set) pairs, and they have proven to be very effective tools for identifying time-dependent regions in non-autonomous flow fields that do not mix with their surroundings (this is, effectively, what (23) says), e.g., moving vortices in atmospheric and oceanographic applications [15,16,39,40]. More details on the generalization of the concept of metastability by coherent sets, and on the subsequent Markov state modeling, can be found in [17].
$${\mathsf{P}}_{{\mu}_{0}}\left[{x}_{{t}_{1}}\in {\mathbb{M}}_{1,i}\mid {x}_{{t}_{0}}\in {\mathbb{M}}_{0,i}\right]\approx 1\qquad \mathrm{and}\qquad {\mathsf{P}}_{{\mu}_{1}}\left[{x}_{{t}_{0}}\in {\mathbb{M}}_{0,i}\mid {x}_{{t}_{1}}\in {\mathbb{M}}_{1,i}\right]\approx 1$$
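Given paired samples of the process at initial and final time, the two conditional probabilities in (23) can be estimated empirically. The interface and the synthetic data below are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def coherence_pair(in_M0, in_M1):
    """Empirical estimates of the two conditional probabilities in (23), from
    boolean membership arrays over paired samples (x_{t0}[i], x_{t1}[i])."""
    fwd = np.mean(in_M1[in_M0])      # P[x_t1 in M_1i | x_t0 in M_0i]
    bwd = np.mean(in_M0[in_M1])      # P[x_t0 in M_0i | x_t1 in M_1i]
    return fwd, bwd

# synthetic pairs: two blobs that each shift by +1 with small noise
x0 = np.concatenate([rng.normal(0.0, 0.1, 1000), rng.normal(3.0, 0.1, 1000)])
x1 = x0 + 1.0 + rng.normal(0.0, 0.05, 2000)

# the first blob forms a coherent pair: M_0 = {x < 1.5}, M_1 = {x < 2.5}
fwd, bwd = coherence_pair(in_M0=x0 < 1.5, in_M1=x1 < 2.5)
```

For this data both estimates are close to one, so the pair of sets is coherent in the sense of (23).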
4.3. Example: Diffusion in a Shifting Triple-Well Potential
Let us consider the diffusion (1) in the time-dependent potential landscape
with $\beta =5$ and on the time interval $[{t}_{0},{t}_{1}]=[0,10]$; cf. Figure 3 (left). Taking the initial distribution ${\mu}_{0}\propto \exp(-\beta W(0,\cdot))$, we build the transfer operator (17), and consider its dominant singular values:
$$W(t,x)=7{\left((x-t/10)(x-1-t/10)(x+1-t/10)\right)}^{2}$$
$${\sigma}_{1}=1.000,\phantom{\rule{1.em}{0ex}}{\sigma}_{2}=0.734,\phantom{\rule{1.em}{0ex}}{\sigma}_{3}=0.536,\phantom{\rule{1.em}{0ex}}{\sigma}_{4}\approx 0\phantom{\rule{0.166667em}{0ex}}.$$
This indicates that a rank-3 GMSM is sufficient to approximate the system, and that we have three coherent sets. We observe the characteristic almost-constant behavior (21) of the left and right singular vectors over the respective coherent sets; Figure 3 (middle and right). Recall that right singular vectors show coherent sets at initial time, and left singular vectors the associated coherent sets at final time.
We can identify the three wells as three coherent sets. Figure 4 shows that they are indeed coherent: the respective parts of the initial ensemble ${\mu}_{0}$ are to a large extent mapped onto the corresponding parts of the final ensemble ${\mu}_{1}$; cf. Figure 2 and (23).
Computing the MSM from the transition probabilities between the coherent sets as in (22) gives the stochastic matrix
$${\widehat{T}}_{3}=\left(\begin{array}{ccc}0.794& 0.150& 0.026\\ 0.196& 0.767& 0.274\\ 0.010& 0.083& 0.701\end{array}\right).$$
The initial distribution ${\widehat{\mu}}_{0}$ of this MSM is given by the probability that ${\mu}_{0}$ assigns to the respective coherent sets at initial time. Analogously, collecting the probabilities from ${\mu}_{1}$ in the coherent sets at final time gives the final distribution ${\widehat{\mu}}_{1}$ of the MSM. We have
$${\widehat{\mu}}_{0}=\left(\begin{array}{c}0.250\\ 0.500\\ 0.250\end{array}\right)\phantom{\rule{1.em}{0ex}}\mathrm{and}\phantom{\rule{1.em}{0ex}}{\widehat{\mu}}_{1}=\left(\begin{array}{c}0.280\\ 0.500\\ 0.219\end{array}\right).$$
The singular values of ${\widehat{T}}_{k}$ as a mapping from the ${\widehat{\mu}}_{0}$-weighted ${\mathbb{R}}^{3}$ to the ${\widehat{\mu}}_{1}$-weighted ${\mathbb{R}}^{3}$ are
they are in good agreement with the true singular values of $\mathcal{T}$.
$${\widehat{\sigma}}_{1}=1.000,\phantom{\rule{1.em}{0ex}}{\widehat{\sigma}}_{2}=0.733,\phantom{\rule{1.em}{0ex}}{\widehat{\sigma}}_{3}=0.534\phantom{\rule{0.166667em}{0ex}};$$
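These numbers can be checked directly from the matrices given above. The weighting convention used to form the weighted singular values (for the operator acting on densities $u = p/\widehat{\mu}$) is our reading of the text:

```python
import numpy as np

# the MSM matrix and the coarse-grained initial/final distributions from above
T3  = np.array([[0.794, 0.150, 0.026],
                [0.196, 0.767, 0.274],
                [0.010, 0.083, 0.701]])
mu0 = np.array([0.250, 0.500, 0.250])
mu1 = np.array([0.280, 0.500, 0.219])

# T3 propagates the initial coarse distribution (approximately) to the final one;
# the weighted singular values are those of D1^{-1/2} T3 D0^{1/2}
# (weighting convention is our own assumption)
M = np.diag(mu1 ** -0.5) @ T3 @ np.diag(mu0 ** 0.5)
sigma = np.linalg.svd(M, compute_uv=False)
```

Up to the rounding of the printed entries, `sigma` reproduces the values reported above, with the leading singular value equal to one.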
We repeat the computation with a different initial distribution ${\mu}_{0}$, where only the left and right well are initially populated, as shown in Figure 5.
The largest singular values of $\mathcal{T}$,
already show that there are only two coherent sets, as the third singular value is significantly smaller than the second one. The left well forms one coherent set, and the union of the middle and right wells forms the second.
$${\sigma}_{1}=1.000,\phantom{\rule{1.em}{0ex}}{\sigma}_{2}=0.643,\phantom{\rule{1.em}{0ex}}{\sigma}_{3}=0.030,\phantom{\rule{1.em}{0ex}}{\sigma}_{4}\approx 0\phantom{\rule{0.166667em}{0ex}},$$
5. Data-Based Approximation
5.1. Setting and Auxiliary Objects
We would like to estimate the GMSM (18) from trajectory data. In the time-inhomogeneous setting, let us assume that we have m data points ${x}_{1},\dots ,{x}_{m}$ at time ${t}_{0}$, and their (random) images ${y}_{1},\dots ,{y}_{m}$ at time ${t}_{1}$; that is, ${y}_{i}$ is a random sample of the underlying process at time ${t}_{1}$, given that it started in ${x}_{i}$ at time ${t}_{0}$. The empirical distributions of the ${x}_{i}$ and ${y}_{i}$ can be regarded as estimates of ${\mu}_{0}$ and ${\mu}_{1}$, respectively.
Let us further define two sets of basis functions ${\chi}_{0,1},\dots ,{\chi}_{0,n}$ and ${\chi}_{1,1},\dots ,{\chi}_{1,n}$, with which we would like to approximate the GMSM. To estimate the first k dominant modes, the minimal requirement is $n\ge k$; in general, $n\gg k$. The vector-valued functions
are the basis functions at initial and final times, respectively. One could also let ${\chi}_{0}$ and ${\chi}_{1}$ have different lengths; we chose equal lengths for convenience. Now we can define the data matrices
$${\chi}_{0}=\left(\begin{array}{c}{\chi}_{0,1}\\ \vdots \\ {\chi}_{0,n}\end{array}\right),\phantom{\rule{1.em}{0ex}}{\chi}_{1}=\left(\begin{array}{c}{\chi}_{1,1}\\ \vdots \\ {\chi}_{1,n}\end{array}\right)$$
$${\mathbf{\chi}}_{0}=\left(\begin{array}{ccc}{\chi}_{0}({x}_{1})& \cdots & {\chi}_{0}({x}_{m})\end{array}\right),\qquad {\mathbf{\chi}}_{1}=\left(\begin{array}{ccc}{\chi}_{1}({y}_{1})& \cdots & {\chi}_{1}({y}_{m})\end{array}\right)\,.$$
The following $n\times n$ correlation matrices ${C}_{00},{C}_{01},{C}_{11}$ will be needed later:
$${C}_{00,ij}={\langle {\chi}_{0,i},{\chi}_{0,j}\rangle}_{{\mu}_{0}},\phantom{\rule{1.em}{0ex}}{C}_{01,ij}={\langle {\chi}_{1,i},\mathcal{T}{\chi}_{0,j}\rangle}_{{\mu}_{1}},\phantom{\rule{1.em}{0ex}}{C}_{11,ij}={\langle {\chi}_{1,i},{\chi}_{1,j}\rangle}_{{\mu}_{1}}\phantom{\rule{0.166667em}{0ex}}.$$
Their Monte Carlo estimates from the trajectory data are given by products of the datamatrices, as
$${C}_{00}\approx \frac{1}{m}{\mathbf{\chi}}_{0}{\mathbf{\chi}}_{0}^{T},\phantom{\rule{1.em}{0ex}}{C}_{01}\approx \frac{1}{m}{\mathbf{\chi}}_{1}{\mathbf{\chi}}_{0}^{T},\phantom{\rule{1.em}{0ex}}{C}_{11}\approx \frac{1}{m}{\mathbf{\chi}}_{1}{\mathbf{\chi}}_{1}^{T}\phantom{\rule{0.166667em}{0ex}}.$$
Note that the approximations in (24) become exact if we take ${\mu}_{0},{\mu}_{1}$ to be the empirical distributions ${\mu}_{0}={\textstyle \frac{1}{m}}{\sum}_{i}\delta (\cdot -{x}_{i})$ and ${\mu}_{1}={\textstyle \frac{1}{m}}{\sum}_{i}\delta (\cdot -{y}_{i})$, where $\delta (\cdot)$ denotes the Dirac delta. We assume that ${C}_{00}$ and ${C}_{11}$, as well as their data-based approximations in (24), are invertible. If they are not, all occurrences of their inverses below need to be replaced by Moore–Penrose pseudoinverses. Alternatively, one can discard basis functions that yield redundant information until ${C}_{00}$ and ${C}_{11}$ are invertible. Further strategies for dealing with singular or ill-conditioned correlation matrices can be found in [18].
5.2. Projection on the Basis Functions
To find the best GMSM representable with the bases ${\chi}_{0}$ and ${\chi}_{1}$, we would like to solve (19) under the constraint that the ranges of ${\mathcal{Q}}_{0}$ and ${\mathcal{Q}}_{1}$ lie in ${\mathbb{W}}_{0}:=\mathrm{span}({\chi}_{0})$ and ${\mathbb{W}}_{1}:=\mathrm{span}({\chi}_{1})$, respectively. To the authors' knowledge, it is unknown whether this problem has an explicitly computable solution, because it involves a nontrivial interaction of ${\mathbb{W}}_{0}$, ${\mathbb{W}}_{1}$, and $\mathcal{T}$.
Instead, we will proceed in two steps. First, we compute the projected transfer operator ${\mathcal{T}}_{n}={\mathsf{\Pi}}_{1}\mathcal{T}{\mathsf{\Pi}}_{0}$, where ${\mathsf{\Pi}}_{0}$ and ${\mathsf{\Pi}}_{1}$ are the ${\mu}_{0}$- and ${\mu}_{1}$-orthogonal projections onto ${\mathbb{W}}_{0}$ and ${\mathbb{W}}_{1}$, respectively. Second, we reduce ${\mathcal{T}}_{n}$ to its best rank-k approximation ${\mathcal{T}}_{k}$ (best in the sense of density propagation).
Thus, the restriction ${\mathcal{T}}_{n}$ to ${\mathbb{W}}_{0}\to {\mathbb{W}}_{1}$ is simply the ${\mu}_{1}$orthogonal projection of $\mathcal{T}$ on ${\mathbb{W}}_{1}$, giving the characterization
$${\langle {\chi}_{1,j},\mathcal{T}{\chi}_{0,i}{\mathcal{T}}_{n}{\chi}_{0,i}\rangle}_{{\mu}_{1}}=0,\phantom{\rule{1.em}{0ex}}\forall i,j\phantom{\rule{0.166667em}{0ex}}.$$
It is straightforward to compute that with respect to the bases ${\chi}_{0}$ and ${\chi}_{1}$ the matrix representation ${T}_{n}$ of ${\mathcal{T}}_{n}:{\mathbb{W}}_{0}\to {\mathbb{W}}_{1}$ is given by
see [30].
$${T}_{n}={C}_{11}^{-1}{C}_{01}\,,$$
5.3. Best LowRank Approximation
To find the best rank-k projection of ${\mathcal{T}}_{n}$, let us now switch to the bases ${\tilde{\chi}}_{0}={C}_{00}^{-1/2}{\chi}_{0}$ and ${\tilde{\chi}}_{1}={C}_{11}^{-1/2}{\chi}_{1}$. We can switch between representations with respect to these bases by
and similarly for ${\chi}_{1}$ and ${\tilde{\chi}}_{1}$. Again, a direct calculation shows that ${\tilde{\chi}}_{0}$ and ${\tilde{\chi}}_{1}$ form orthonormal bases, i.e., ${\langle {\tilde{\chi}}_{0,i},{\tilde{\chi}}_{0,j}\rangle}_{{\mu}_{0}}={\delta}_{ij}$ and ${\langle {\tilde{\chi}}_{1,i},{\tilde{\chi}}_{1,j}\rangle}_{{\mu}_{1}}={\delta}_{ij}$. This has the advantage that for any operator ${\mathcal{S}}_{n}:{\mathbb{W}}_{0}\to {\mathbb{W}}_{1}$ with matrix representation ${S}_{n}$ with respect to the bases ${\tilde{\chi}}_{0}$ and ${\tilde{\chi}}_{1}$ we have
where ${\|\cdot\|}_{2}$ denotes the spectral norm of a matrix (i.e., the matrix norm induced by the Euclidean vector norm). The matrix representation of ${\mathcal{T}}_{n}$ in the new bases is
$$f=\sum _{k=1}^{n}{c}_{k}{\chi}_{0,k}\phantom{\rule{1.em}{0ex}}\u27fa\phantom{\rule{1.em}{0ex}}f=\sum _{k=1}^{n}{\tilde{c}}_{k}{\tilde{\chi}}_{0,k},\phantom{\rule{4.pt}{0ex}}\mathrm{where}\phantom{\rule{4.pt}{0ex}}\tilde{c}={C}_{00}^{1/2}c\phantom{\rule{0.166667em}{0ex}},$$
$$\parallel {\mathcal{S}}_{n}\parallel =\parallel {S}_{n}{\parallel}_{2},$$
$${\tilde{T}}_{n}={C}_{11}^{-1/2}{C}_{01}{C}_{00}^{-1/2}\,.$$
However, finding now the best rankk approximation ${\mathcal{T}}_{k}$ of ${\mathcal{T}}_{n}$ amounts, written in these new bases, to
$$\|{\mathcal{T}}_{n}-{\mathcal{T}}_{k}\| = {\|{\tilde{T}}_{n}-{\tilde{T}}_{k}\|}_{2}\;\to\; \underset{\mathrm{rank}({\tilde{T}}_{k})=k}{\min}\,.$$
Again, by the Eckart–Young theorem ([38], Theorem 4.4.7), the solution to this problem is given by
where $\tilde{U},\tilde{V}\in {\mathbb{R}}^{n\times k}$ are the matrices whose columns are the right and left singular vectors of ${\tilde{T}}_{n}$ corresponding to the k largest singular values ${\sigma}_{1}\ge \dots \ge {\sigma}_{k}$, and $\tilde{\mathsf{\Sigma}}$ is the diagonal matrix with these singular values on its diagonal. Thus, with respect to the bases ${\chi}_{0}$ and ${\chi}_{1}$, the best GMSM in terms of propagation error is given by
$${\tilde{T}}_{k}=\tilde{V}\tilde{\mathsf{\Sigma}}{\tilde{U}}^{T},$$
$${T}_{k}={C}_{11}^{-1/2}\tilde{V}\tilde{\mathsf{\Sigma}}{\tilde{U}}^{T}{C}_{00}^{1/2}\,.$$
The resulting algorithm to estimate the optimal GMSM is identical to the time-lagged canonical correlation algorithm (TCCA) that results from VAMP and is described in [30].
Algorithm 1: TCCA algorithm to estimate a rank-k GMSM. (1) Evaluate the basis functions on the data to obtain the data matrices ${\mathbf{\chi}}_{0},{\mathbf{\chi}}_{1}$; (2) estimate the correlation matrices ${C}_{00},{C}_{01},{C}_{11}$ via (24); (3) form ${\tilde{T}}_{n}={C}_{11}^{-1/2}{C}_{01}{C}_{00}^{-1/2}$ and compute its truncated singular value decomposition $\tilde{V}\tilde{\mathsf{\Sigma}}{\tilde{U}}^{T}$; (4) assemble ${T}_{k}={C}_{11}^{-1/2}\tilde{V}\tilde{\mathsf{\Sigma}}{\tilde{U}}^{T}{C}_{00}^{1/2}$.
Remark 1 (Reversible system with equilibrium data).
If the system under consideration is reversible, the data sample its equilibrium distribution (i.e., ${\mu}_{0}={\mu}_{1}=\mu $), and also ${\chi}_{0}={\chi}_{1}$, then ${C}_{00}={C}_{11}$, and by the self-adjointness of $\mathcal{T}$ from (6) we have ${C}_{01}={C}_{01}^{T}$. Thus, ${\tilde{T}}_{n}$ in (28) is a symmetric matrix, and as such, its singular value and eigenvalue decompositions coincide. Hence, the construction of the best GMSM in this section (disregarding the projection onto the basis functions) coincides with the one from Section 3. This is not surprising, as both give the best model in terms of propagation error.
Remark 2 (Other databased methods).
The approximation (26) of the transfer operator has natural connections to other data-based approximation methods. It can be seen as a problem-adapted generalization of the so-called Extended Dynamic Mode Decomposition (EDMD) [41,42]. Strictly speaking, however, EDMD uses an orthogonal projection with respect to the distribution ${\mu}_{0}$ of the initial data $\{{x}_{i}\}$, so it is the approximation (32) below that is equivalent to it [43]. EDMD has been shown in [33] to be strongly connected to other established analysis tools for (molecular) dynamical data, such as time-lagged independent component analysis (TICA) [44,45], blind source separation [46], and the variational approach to conformation analysis [9].
Remark 3 (Sampling the correlation matrices).
Our point of view in (24) and in the discussion following it is that, without further knowledge of the system, the empirical distributions are the best estimates of the actual system distributions ${\mu}_{0},{\mu}_{1}$, and for a given finite set of data we identify the two. The question of how to approximate the correlation matrices efficiently if we are allowed to generate an arbitrary amount of simulation data is an important and hard one. The reason is that in high dimensions these quantities can only be accessed via (Markov chain) Monte Carlo methods, whose convergence speed suffers immensely from the rareness of transitions between metastable regions. There are different approaches to circumvent this problem, such as importance sampling, where driving the system out of equilibrium accelerates sampling, and fluctuation theorems are used to recover the original quantity of interest [47,48,49,50,51,52,53].
6. Time-Homogeneous Systems and Non-Stationary Data
In this final section, we illustrate how the above methods can be used to construct a GMSM for, and assess properties of, a stationary system, even if the simulation data at our disposal does not sample the stationary distribution of the system. In the first example we reconstruct the equilibrium distribution of a reversible system, hence we are able to build an equilibrium GMSM. In the second example we approximate a non-reversible stationary system (i.e., one for which detailed balance does not hold) by a (G)MSM, again from non-stationary data.
Of course, all the examples presented so far can also be computed by the databased algorithm of Section 5.
6.1. Equilibrium MSM from Non-Equilibrium Data
When working with simulation data, we need to take into account that this data might not be in equilibrium. Then, obviously, the empirical distribution does not reflect the stationary distribution of the system. In general, any empirical statistical analysis (e.g., counting transitions between a priori known metastable states) will be biased in such a case.
Let us consider a reversible system with equilibrium distribution $\mu $, and let the available trajectory data be ${\mu}_{\mathrm{ref}}$-distributed. Then, it is natural to describe the system by its transfer operator ${\mathcal{T}}_{\mathrm{ref}}:{L}_{{\mu}_{\mathrm{ref}}}^{2}\to {L}_{{\mu}_{\mathrm{ref}}}^{2}$ with respect to the reference distribution [18,42], given explicitly by
$${\mathcal{T}}_{\mathrm{ref}}\phantom{\rule{0.166667em}{0ex}}u(x)=\frac{1}{{\mu}_{\mathrm{ref}}(x)}{\int}_{\mathbb{X}}u(y){\mu}_{\mathrm{ref}}(y)\phantom{\rule{0.166667em}{0ex}}{p}^{t}(y,x)\phantom{\rule{0.166667em}{0ex}}\mathrm{d}y\phantom{\rule{0.166667em}{0ex}}.$$
Note that ${\mu}_{\mathrm{corr}}:=\mu /{\mu}_{\mathrm{ref}}$ is the stationary distribution of this transfer operator, hence we can retrieve the equilibrium distribution of the system by correcting the reference distribution, $\mu ={\mu}_{\mathrm{corr}}{\mu}_{\mathrm{ref}}$.
In the data-based context, we choose the same basis ${\chi}_{0}={\chi}_{1}$ for initial and final times, since the system is time-homogeneous. In complete analogy to (26) above, the ${\mu}_{\mathrm{ref}}$-orthogonal projection of ${\mathcal{T}}_{\mathrm{ref}}:{L}_{{\mu}_{\mathrm{ref}}}^{2}\to {L}_{{\mu}_{\mathrm{ref}}}^{2}$ onto $\mathrm{span}({\chi}_{0})$ is given by the matrix
$${T}_{\mathrm{ref},n}={C}_{00}^{-1}{C}_{01}\,.$$
We will now apply this procedure to the double-well system from Section 3.3 with initial points ${x}_{1},\dots ,{x}_{m}$ distributed as shown in Figure 6 (gray histogram). We chose the number of points to be $m={10}^{5}$, the basis functions ${\chi}_{0,i}$ to be indicator functions of the subintervals of an equipartition of the interval $[-2,2]$ into $n=100$ subintervals, and the lag time $\tau =10$. In a preprocessing step we discard all basis functions that do not have any of the points ${x}_{i}$ in their support, thus obtaining a non-singular ${C}_{00}$, and use the remaining 77 to compute ${T}_{\mathrm{ref},n}\in {\mathbb{R}}^{77\times 77}$.
We obtain ${\lambda}_{2}=0.894$, giving a time scale ${t}_{2}=89.6$; the corrected equilibrium distribution $\mu ={\mu}_{\mathrm{corr}}{\mu}_{\mathrm{ref}}$, where ${\mu}_{\mathrm{corr}}$ is the right eigenvector of ${T}_{\mathrm{ref},n}$ at eigenvalue 1, is shown in Figure 6 (left) by the black curve. The right-hand side of this figure shows the results of the same computations for a sample size $m={10}^{4}$, for which we obtain the eigenvalue $0.890$ and the corresponding time scale $85.9$.
It is now simple to reconstruct the approximation ${\mathcal{T}}_{n}$ of $\mathcal{T}$, the transfer operator with respect to the equilibrium density. Let ${D}_{\mathrm{corr}}$ denote the diagonal matrix with the elements of ${\mu}_{\mathrm{corr}}$ as diagonal entries. Then, ${T}_{n}={D}_{\mathrm{corr}}^{-1}{T}_{\mathrm{ref},n}{D}_{\mathrm{corr}}$ approximates the matrix representation of ${\mathcal{T}}_{n}$ with respect to our basis of step functions.
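The correction procedure can be illustrated on a small synthetic example in which the stationary distribution is known exactly. The three-state chain, the biased sampling distribution, and all variable names below are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# three-state chain with known stationary distribution (column-stochastic P)
P = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.10],
              [0.05, 0.05, 0.85]])
w, V = np.linalg.eig(P)
mu = np.real(V[:, np.argmax(np.real(w))])
mu = mu / mu.sum()                                # true equilibrium distribution

# pairs (x_i, y_i) with x_i drawn from a *biased* reference distribution mu_ref
mu_ref = np.array([0.6, 0.3, 0.1])
m = 200_000
x = rng.choice(3, size=m, p=mu_ref)
y = (np.cumsum(P, axis=0)[:, x] < rng.random(m)).sum(axis=0)

# indicator basis, correlation matrices, and T_ref,n = C00^{-1} C01
X0, X1 = np.eye(3)[:, x], np.eye(3)[:, y]
C00, C01 = X0 @ X0.T / m, X1 @ X0.T / m
T_ref = np.linalg.solve(C00, C01)

# the right eigenvector at eigenvalue 1 is mu_corr = mu / mu_ref
w2, V2 = np.linalg.eig(T_ref)
mu_corr = np.real(V2[:, np.argmin(np.abs(w2 - 1.0))])
mu_est = mu_corr * mu_ref
mu_est = mu_est / mu_est.sum()                    # recovered equilibrium distribution
```

Despite the strongly biased sampling, the corrected distribution `mu_est` matches the true stationary distribution up to sampling noise.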
Remark 4 (Koopman reweighting).
One can make use of the knowledge that the system being estimated is reversible, even though, due to the finite sample size m, this does not necessarily hold for ${T}_{\mathrm{ref},n}$. In [18], the authors add for each sample pair $({x}_{i},{y}_{i})$ also the pair $({x}_{i+m}={y}_{i},{y}_{i+m}={x}_{i})$ to the sample set, thus numerically forcing the estimate to be reversible. In practice, one defines the diagonal matrix $W$ with diagonal ${\mathbf{\chi}}_{0}^{T}{\mu}_{\mathrm{corr}}$, builds the reweighted correlation matrices ${\overline{C}}_{00}=\frac{1}{2}({\mathbf{\chi}}_{0}W{\mathbf{\chi}}_{0}^{T}+{\mathbf{\chi}}_{1}W{\mathbf{\chi}}_{1}^{T})$ and ${\overline{C}}_{01}=\frac{1}{2}({\mathbf{\chi}}_{1}W{\mathbf{\chi}}_{0}^{T}+{\mathbf{\chi}}_{0}W{\mathbf{\chi}}_{1}^{T})$, and uses them instead of ${C}_{00},{C}_{01}$.
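In matrix form, this reweighting is a one-liner; the data and weights below are stand-ins for the quantities of Remark 4, and the point of the check is that both reweighted matrices are symmetric by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 200
X0 = rng.normal(size=(n, m))      # stand-in data matrix chi_0(x_i)
X1 = rng.normal(size=(n, m))      # stand-in data matrix chi_1(y_i)
w = rng.random(m)                 # stand-in weights chi_0(x_i)^T mu_corr
W = np.diag(w)

# symmetrized, reweighted correlation matrices as in Remark 4
C00_bar = 0.5 * (X0 @ W @ X0.T + X1 @ W @ X1.T)
C01_bar = 0.5 * (X1 @ W @ X0.T + X0 @ W @ X1.T)
```

The symmetry of both matrices is exactly what enforces a reversible (self-adjoint) estimate downstream.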
6.2. A NonReversible System with NonStationary Data
Reversible dynamics gives rise to self-adjoint transfer operators, and the theory of Markov state modeling for them is well developed. The transfer operators of non-reversible systems, however, are not self-adjoint, hence their spectra are in general not purely real-valued. Thus, the definition of time scales, and more generally the approximation by GMSMs, is not yet fully developed; complex eigenvalues indicate cyclic behavior of the process. As this topic is beyond the scope of this paper, we refer the reader to [20,54,55], and to [25,56] for Markov state modeling with cycles.
We will consider a non-reversible system here, and show that restricting its behavior to the dominant singular modes of its transfer operator reproduces its dominant long-time behavior, and even allows for a good few-state MSM. Note that the best rank-$k$ GMSM (18) maps onto the $k$-dimensional subspace $\mathbb{V}_1$ of left singular vectors; thus, its eigenvectors also lie in this subspace.
The system under consideration is driven by two "forces": a reversible part $F_r(x)=-\nabla W(x)$ coming from the potential
$$W(x)=\cos(7\phi)+10{(r-1)}^{2},\phantom{\rule{1.em}{0ex}}\mathrm{where}\phantom{\rule{4.pt}{0ex}}x=\left(\begin{array}{c}r\cos(\phi)\\ r\sin(\phi)\end{array}\right),$$
and a circular driving given by
$${F}_{c}(x)={\mathrm{e}}^{-\beta W(x)}\phantom{\rule{0.166667em}{0ex}}\left(\begin{array}{cc}0& 1\\ -1& 0\end{array}\right)x\phantom{\rule{0.166667em}{0ex}},$$
where $\beta=2$ is the inverse temperature, as in (1). The dynamics is now governed by the SDE $\mathrm{d}x_t=(F_r+F_c)(x_t)\,\mathrm{d}t+\sqrt{2\beta^{-1}}\,\mathrm{d}w_t$. It is a diffusion in a 7-well potential (the wells are positioned uniformly on the unit circle) with an additional clockwise driving that is strongest along the unit circle and decreases exponentially with the radial distance from this circle.
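This SDE can be simulated with a simple Euler-Maruyama scheme. The following is a minimal sketch: the step size is our choice, the gradient of $W$ is worked out by the chain rule through polar coordinates, and the sign conventions follow the clockwise driving and exponential damping described in the text.

```python
import numpy as np

beta = 2.0  # inverse temperature

def W(x):
    """7-well potential W(x) = cos(7*phi) + 10*(r-1)^2."""
    r, phi = np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])
    return np.cos(7.0 * phi) + 10.0 * (r - 1.0) ** 2

def grad_W(x):
    """Gradient of W in Cartesian coordinates (chain rule through r, phi)."""
    r, phi = np.hypot(x[0], x[1]), np.arctan2(x[1], x[0])
    dW_dr = 20.0 * (r - 1.0)
    dW_dphi = -7.0 * np.sin(7.0 * phi)
    return dW_dr * x / r + dW_dphi * np.array([-x[1], x[0]]) / r ** 2

def F_c(x):
    """Circular driving: clockwise rotation field, damped away from W's minima."""
    return np.exp(-beta * W(x)) * np.array([x[1], -x[0]])

def euler_maruyama(x0, dt, n_steps, rng):
    """Discretization of dx = (F_r + F_c)(x) dt + sqrt(2/beta) dw, F_r = -grad W."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = (x + (-grad_W(x) + F_c(x)) * dt
               + np.sqrt(2.0 * dt / beta) * rng.standard_normal(2))
    return x
```

Sampling one long trajectory of this kind, and relaunching many short simulations from each sampled point, produces point pairs $(x_i,y_i)$ as used below.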
For our data-based analysis, we simulate a trajectory of this system of length 500 and sample it every 0.01 time units to obtain an initial set of $5\times 10^4$ points. Every such point is taken as the initial condition of 100 independent simulations of the SDE over the lag time $\tau=1$, thus obtaining $5\times 10^6$ point pairs $(x_i,y_i)$.
We observe in Figure 7 (left) that the empirical distribution of the $x_i$ has not yet converged to the invariant distribution of the system, which would populate every well evenly.
To approximate the transfer operator, we use $\chi=\chi_0=\chi_1$ consisting of the characteristic functions of a uniform $40\times 40$ partition of $[-2,2]\times[-2,2]$, and restrict this basis set to those 683 partition elements that contain at least one $x_i$ and one $y_j$. The associated projected transfer operator $T_n$ from (26) is then used to compute $\tilde{T}_n$ from (28), and its singular values
$${\sigma}_{1}=1.000,\phantom{\rule{4pt}{0ex}}{\sigma}_{2}=0.872,\phantom{\rule{4pt}{0ex}}{\sigma}_{3}=0.588,\phantom{\rule{4pt}{0ex}}\dots ,\phantom{\rule{4pt}{0ex}}{\sigma}_{7}=0.383,\phantom{\rule{4pt}{0ex}}{\sigma}_{8}=0.052,$$
indicate a gap after seven singular values. Thus, we assemble a rank-7 GMSM $T_k$ via (30). This GMSM maps $L^2_{\mu_0}$ to $L^2_{\mu_1}$; hence, to make sense of its eigenmodes, we need to transform its range to densities with respect to $\mu_0$ instead of $\mu_1$. Since a density $u$ with respect to $\mu_1$ is turned into the density $\frac{\mu_1 u}{\mu_0}$ with respect to $\mu_0$,
$${T}_{k}^{\prime}={C}_{00}^{-1}{C}_{11}{T}_{k}$$
rescales the GMSM to map $L^2_{\mu_0}$ to itself. Note here that since the basis functions are characteristic functions with disjoint supports, the correlation matrices $C_{00},C_{11}$ are diagonal, having exactly the empirical distributions as diagonal entries, i.e., the number of data points falling into the associated partition element.
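Equations (26)-(30) are not reproduced here; as an illustrative stand-in, the following computes dominant singular modes via a whitened ("half-weighted") matrix, as is common in VAMP/TCCA-type estimators. The normalization may differ from the paper's exact construction, so treat this as a sketch.

```python
import numpy as np

def dominant_singular_modes(C00, C01, C11, k):
    """Truncated SVD of the whitened matrix C00^{-1/2} C01 C11^{-1/2};
    a sketch of how the dominant modes of a rank-k GMSM can be computed."""
    def inv_sqrt(C):
        s, U = np.linalg.eigh(C)          # C is symmetric positive semidefinite
        m = s > 1e-12                     # drop near-singular directions
        return U[:, m] @ np.diag(s[m] ** -0.5) @ U[:, m].T
    K = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    U, s, Vt = np.linalg.svd(K)
    return s[:k], U[:, :k], Vt[:k, :].T
```

A gap in the returned singular values (as after $\sigma_7$ above) then suggests the rank $k$ at which to truncate.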
We are also interested in the system’s invariant distribution. As in Section 6.1, we can correct the reference distribution ${\mu}_{\mathrm{ref}}={\mu}_{0}$ by the first eigenfunction ${\mu}_{\mathrm{corr}}$ of ${T}_{k}^{\prime}$ to yield the invariant distribution $\mu ={\mu}_{\mathrm{corr}}{\mu}_{\mathrm{ref}}$, cf. Figure 7 (middle). The dominant eigenvalues of ${T}_{k}^{\prime}$ are
$$\begin{array}{cc}\hfill {\lambda}_{k,1}^{\prime}=0.998+0.000i,& \phantom{\rule{1.em}{0ex}}{\lambda}_{k,2/3}^{\prime}=0.803\pm 0.261i,\hfill \\ \hfill {\lambda}_{k,4/5}^{\prime}=0.511\pm 0.230i,& \phantom{\rule{1.em}{0ex}}{\lambda}_{k,6/7}^{\prime}=0.378\pm 0.077i\phantom{\rule{0.166667em}{0ex}}.\hfill \end{array}$$
Note that $\lambda_{k,1}^{\prime}<1$. This is due to restricting the computation to certain partition elements, as specified above: this set of partition elements is not closed under the process dynamics, and the resulting "leakage of probability mass" (about $0.2\%$) is reflected by the dominant eigenvalue. All eigenvalues of $T_k^{\prime}$ are within $0.5\%$ of the dominant eigenvalues of the transfer operator $T_n^{\prime}$ with respect to the stationary distribution (projected onto the same basis set, and computed with higher accuracy), which is surprisingly good agreement.
The 8th eigenvalue of $T_n^{\prime}$ is smaller in magnitude than $0.03$. As indicated by this spectral gap, we may obtain a few-state MSM $\widehat{T}_k$ here as well. To this end, we need to find "metastable sets" (although for this cyclically driven system the term metastability is ambiguous) onto which we can project the system's behavior. Let $v_i={(v_{i,1},\dots,v_{i,n})}^T$ denote the $i$th eigenvector of $T_k^{\prime}$. As in the reversible case, where eigenvectors are close to constant on metastable sets, we also seek here regions characterized by almost constant behavior of the eigenvectors. More precisely, if the $p$th and $q$th partition elements belong to the same metastable set, then we expect $v_{i,p}\approx v_{i,q}$ for $i=1,\dots,7$. Thus, we embed the $p$th partition element into $\mathbb{C}^7\equiv\mathbb{R}^{14}$ (i.e., a complex number is represented by two coordinates: its real and imaginary parts) by $p\mapsto {(v_{1,p},\dots,v_{7,p})}^T$, and cluster the resulting point cloud into 7 clusters with the $k$-means algorithm. The result is shown in Figure 7 (right). Taking these sets, we can assemble the MSM $\widehat{T}_k\in\mathbb{R}^{7\times 7}$ via (15). We obtain an MSM that maps a Markov state (i.e., a cluster) with probability $0.62$ to itself, with probability $0.29$ to the next cluster in clockwise direction, and with probability $0.06$ to the second-next cluster in clockwise direction. The probability to jump one cluster in the counterclockwise direction is below $0.001$. The eigenvalues of $\widehat{T}_k$,
$$\begin{array}{cc}\hfill {\widehat{\lambda}}_{1}=0.998+0.000i,& \phantom{\rule{1.em}{0ex}}{\widehat{\lambda}}_{2/3}=0.800\pm 0.260i,\hfill \\ \hfill {\widehat{\lambda}}_{4/5}=0.507\pm 0.227i,& \phantom{\rule{1.em}{0ex}}{\widehat{\lambda}}_{6/7}=0.374\pm 0.076i\phantom{\rule{0.166667em}{0ex}},\hfill \end{array}$$
are also close to those of $T_n^{\prime}$ (below $1\%$ error), justifying this MSM.
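The embedding-and-clustering step described above can be sketched with a plain Lloyd-type $k$-means on the stacked real and imaginary parts of the eigenvectors. We write a small self-contained $k$-means here rather than assume a particular library; the function name is ours.

```python
import numpy as np

def cluster_eigenvector_embedding(eigvecs, n_clusters, n_iter=100, seed=0):
    """eigvecs: complex array of shape (n, k), row p holding (v_{1,p},...,v_{k,p}).
    Embeds each partition element in R^{2k} (real and imaginary parts) and
    clusters the resulting point cloud with Lloyd's k-means algorithm."""
    V = np.hstack([np.real(eigvecs), np.imag(eigvecs)])
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):      # keep the old center if a cluster empties
                centers[j] = V[labels == j].mean(axis=0)
    return labels
```

The returned labels partition the populated grid cells into the candidate metastable sets, from which the few-state MSM can then be assembled.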
Remark 5.
The $k$-means algorithm provides a hard clustering; i.e., every point belongs entirely to exactly one of the clusters. An automated way to find fuzzy metastable sets from a set of eigenvectors is given by the PCCA+ algorithm [12]. A fuzzy clustering assigns to each point a set of nonnegative numbers adding up to 1, indicating the affiliation of that point with each cluster.
Acknowledgments
This work is supported by the Deutsche Forschungsgemeinschaft (DFG) through the CRC 1114 “Scaling Cascades in Complex Systems”, projects A04 and B03, and the Einstein Foundation Berlin (Einstein Center ECMath).
Author Contributions
P.K., H.W. and F.N. conceived and designed the numerical experiments; P.K. performed the simulations; P.K. analyzed the data; P.K., H.W., F.N. and C.S. wrote the paper.
Conflicts of Interest
The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
EDMD  extended dynamic mode decomposition 
(G)MSM  (generalized) Markov state model 
SDE  stochastic differential equation 
TCCA  time-lagged canonical correlation algorithm 
VAMP  variational approach for Markov processes 
Appendix A. Optimal Low-Rank Approximation of Compact Operators
For completeness, we include a proof of the Eckart–Young–Mirsky theorem for compact operators between separable Hilbert spaces. A Hilbert space is separable if it has a countable orthonormal basis. The Lebesgue space $L^2_{\mu}(\mathbb{R}^d)$ of $\mu$-weighted square-integrable functions is separable for bounded and integrable $\mu$; this is the case we consider here. In particular, the theorem shows that the optimal low-rank approximation of such an operator is obtained by an orthogonal projection onto its subspace of dominant singular vectors; cf. (A1).
Theorem A1.
Let $\mathcal{A}:\mathbb{H}_0\to\mathbb{H}_1$ be a compact linear operator between the separable Hilbert spaces $\mathbb{H}_0$ and $\mathbb{H}_1$, with inner products $\langle\cdot,\cdot\rangle_0$ and $\langle\cdot,\cdot\rangle_1$, respectively. Then, the optimal rank-$k$ approximation $\mathcal{A}_k$ of $\mathcal{A}$, in the sense that
$$\parallel\mathcal{A}-\mathcal{A}_k\parallel\;\to\;\underset{\mathit{rank}\phantom{\rule{0.166667em}{0ex}}{\mathcal{A}}_{k}=k}{min}\phantom{\rule{0.166667em}{0ex}},$$
where $\parallel\cdot\parallel$ denotes the induced operator norm, is given by
$${\mathcal{A}}_{k}=\sum _{i=1}^{k}{\sigma}_{i}{\psi}_{i}{\langle {\varphi}_{i},\cdot\rangle}_{0}\phantom{\rule{0.166667em}{0ex}},\phantom{\rule{2.em}{0ex}}\text{(A1)}$$
where $\sigma_i,\psi_i,\varphi_i$ are the singular values (in non-increasing order) and the left and right normalized singular vectors of $\mathcal{A}$, respectively. The optimum is unique iff $\sigma_k>\sigma_{k+1}$.
Proof.
Let $\mathcal{A}_k$ be defined as in (A1). Since $\mathcal{A}={\sum}_{i=1}^{\infty}\sigma_i\psi_i\langle\varphi_i,\cdot\rangle_0$, we have
$$\parallel\mathcal{A}-\mathcal{A}_k\parallel=\parallel\sum _{i=k+1}^{\infty}{\sigma}_{i}{\psi}_{i}{\langle {\varphi}_{i},\cdot\rangle}_{0}\parallel={\sigma}_{k+1}\phantom{\rule{0.166667em}{0ex}}.\phantom{\rule{2.em}{0ex}}\text{(A2)}$$
Let now $\mathcal{B}_k$ be any rank-$k$ operator from $\mathbb{H}_0$ to $\mathbb{H}_1$. Then, there exist linear functionals $c_i:\mathbb{H}_0\to\mathbb{R}$ and vectors $v_i\in\mathbb{H}_1$, $i=1,\dots,k$, such that
$${\mathcal{B}}_{k}=\sum _{i=1}^{k}{c}_{i}(\cdot){v}_{i}\phantom{\rule{0.166667em}{0ex}}.$$
For every $i$, since $c_i$ has one-dimensional range, its kernel has codimension 1; thus, the intersection of the kernels of all the $c_i$ has codimension at most $k$. Hence, any $(k+1)$-dimensional subspace contains a nonzero element $w$ with $c_i(w)=0$ for $i=1,\dots,k$.
In particular, we can find scalars $\gamma_1,\dots,\gamma_{k+1}$ with ${\sum}_{i=1}^{k+1}\gamma_i^2=1$ such that $w=\gamma_1\varphi_1+\dots+\gamma_{k+1}\varphi_{k+1}$ satisfies $c_i(w)=0$ for $i=1,\dots,k$. By construction, ${\parallel w\parallel}_0=1$ holds. It follows that
$$\parallel\mathcal{A}-\mathcal{B}_k{\parallel}^{2}\ge {\parallel(\mathcal{A}-\mathcal{B}_k)w\parallel}_{1}^{2}={\parallel \mathcal{A}w\parallel}_{1}^{2}=\sum _{i=1}^{k+1}{\sigma}_{i}^{2}{\gamma}_{i}^{2}\ge {\sigma}_{k+1}^{2}\underset{=1}{\underbrace{\sum _{i=1}^{k+1}{\gamma}_{i}^{2}}}\phantom{\rule{0.166667em}{0ex}}.$$
Together with (A2), this proves the claim. □
As a corollary, if $\mathcal{A}:\mathbb{H}\to\mathbb{H}$ is a self-adjoint operator, then its eigenvalue and singular value decompositions coincide, giving $\psi_i=\varphi_i$, and thus $\mathcal{A}_k$ in (A1) is the projection onto the dominant eigenmodes.
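A quick numerical check of Theorem A1 in the finite-dimensional case, where the operator is a matrix and the induced operator norm is the spectral norm, can be done with a truncated SVD; the matrix here is an arbitrary random example of ours.

```python
import numpy as np

def best_rank_k(A, k):
    """Optimal rank-k approximation via truncated SVD (Eckart-Young-Mirsky)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :], s

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
Ak, s = best_rank_k(A, 2)
# the approximation error in the spectral norm equals sigma_{k+1}, cf. (A2)
err = np.linalg.norm(A - Ak, 2)
```

Up to floating-point accuracy, `err` coincides with the third singular value `s[2]`, in line with (A2).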
References
1. Schütte, C.; Sarich, M. Metastability and Markov State Models in Molecular Dynamics: Modeling, Analysis, Algorithmic Approaches; American Mathematical Society: Providence, RI, USA, 2013; Volume 24.
2. Prinz, J.; Wu, H.; Sarich, M.; Keller, B.; Senne, M.; Held, M.; Chodera, J.; Schütte, C.; Noé, F. Markov models of molecular kinetics: Generation and validation. J. Chem. Phys. 2011, 134, 174105.
3. Bowman, G.R.; Pande, V.S.; Noé, F. (Eds.) An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation. In Advances in Experimental Medicine and Biology; Springer: New York, NY, USA, 2014; Volume 797.
4. Scherer, M.K.; Trendelkamp-Schroer, B.; Paul, F.; Pérez-Hernández, G.; Hoffmann, M.; Plattner, N.; Wehmeyer, C.; Prinz, J.H.; Noé, F. PyEMMA 2: A software package for estimation, validation, and analysis of Markov models. J. Chem. Theory Comput. 2015, 11, 5525–5542.
5. Harrigan, M.P.; Sultan, M.M.; Hernández, C.X.; Husic, B.E.; Eastman, P.; Schwantes, C.R.; Beauchamp, K.A.; McGibbon, R.T.; Pande, V.S. MSMBuilder: Statistical Models for Biomolecular Dynamics. Biophys. J. 2017, 112, 10–15.
6. Schütte, C.; Noé, F.; Lu, J.; Sarich, M.; Vanden-Eijnden, E. Markov State Models based on Milestoning. J. Chem. Phys. 2011, 134, 204105.
7. Sarich, M.; Noé, F.; Schütte, C. On the approximation quality of Markov state models. Multiscale Model. Simul. 2010, 8, 1154–1177.
8. Djurdjevac, N.; Sarich, M.; Schütte, C. Estimating the eigenvalue error of Markov State Models. Multiscale Model. Simul. 2012, 10, 61–81.
9. Noé, F.; Nüske, F. A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model. Simul. 2013, 11, 635–655.
10. Nüske, F.; Keller, B.G.; Pérez-Hernández, G.; Mey, A.S.; Noé, F. Variational approach to molecular kinetics. J. Chem. Theory Comput. 2014, 10, 1739–1752.
11. Schütte, C.; Fischer, A.; Huisinga, W.; Deuflhard, P. A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys. 1999, 151, 146–168.
12. Deuflhard, P.; Weber, M. Robust Perron cluster analysis in conformation dynamics. Linear Algebra Its Appl. 2005, 398, 161–184.
13. Wu, H.; Paul, F.; Wehmeyer, C.; Noé, F. Multiensemble Markov models of molecular thermodynamics and kinetics. Proc. Natl. Acad. Sci. USA 2016, 113, E3221–E3230.
14. Chodera, J.D.; Swope, W.C.; Noé, F.; Prinz, J.; Shirts, M.R.; Pande, V.S. Dynamical reweighting: Improved estimates of dynamical properties from simulations at multiple temperatures. J. Chem. Phys. 2011, 134, 06B612.
15. Froyland, G.; Santitissadeekorn, N.; Monahan, A. Transport in time-dependent dynamical systems: Finite-time coherent sets. Chaos Interdiscip. J. Nonlinear Sci. 2010, 20, 043116.
16. Froyland, G. An analytic framework for identifying finite-time coherent sets in time-dependent dynamical systems. Phys. D Nonlinear Phenom. 2013, 250, 1–19.
17. Koltai, P.; Ciccotti, G.; Schütte, C. On metastability and Markov state models for non-stationary molecular dynamics. J. Chem. Phys. 2016, 145, 174103.
18. Wu, H.; Nüske, F.; Paul, F.; Klus, S.; Koltai, P.; Noé, F. Variational Koopman models: Slow collective variables and molecular kinetics from short off-equilibrium simulations. J. Chem. Phys. 2017, 146, 154104.
19. Schütte, C.; Wang, H. Building Markov State Models for Periodically Driven Non-Equilibrium Systems. J. Chem. Theory Comput. 2015, 11, 1819–1831.
20. Froyland, G.; Koltai, P. Estimating long-term behavior of periodically driven flows without trajectory integration. Nonlinearity 2017, 30, 1948–1986.
21. Seifert, U.; Speck, T. Fluctuation-dissipation theorem in nonequilibrium steady states. EPL Europhys. Lett. 2010, 89, 10007.
22. Lee, H.K.; Lahiri, S.; Park, H. Nonequilibrium steady states in Langevin thermal systems. Phys. Rev. E 2017, 96, 022134.
23. Yao, Y.; Cui, R.; Bowman, G.; Silva, D.; Sun, J.; Huang, X. Hierarchical Nystroem methods for constructing Markov state models for conformational dynamics. J. Chem. Phys. 2013, 138, 174106.
24. Bowman, G.; Meng, L.; Huang, X. Quantitative comparison of alternative methods for coarse-graining biological networks. J. Chem. Phys. 2013, 139, 121905.
25. Knoch, F.; Speck, T. Cycle representatives for the coarse-graining of systems driven into a nonequilibrium steady state. New J. Phys. 2015, 17, 115004.
26. Mori, H. Transport, collective motion, and Brownian motion. Prog. Theor. Phys. 1965, 33, 423–455.
27. Zwanzig, R. Nonlinear generalized Langevin equations. J. Stat. Phys. 1973, 9, 215–220.
28. Chorin, A.J.; Hald, O.H.; Kupferman, R. Optimal prediction and the Mori–Zwanzig representation of irreversible processes. Proc. Natl. Acad. Sci. USA 2000, 97, 2968–2973.
29. Chorin, A.J.; Hald, O.H.; Kupferman, R. Optimal prediction with memory. Phys. D Nonlinear Phenom. 2002, 166, 239–257.
30. Wu, H.; Noé, F. Variational approach for learning Markov processes from time series data. arXiv, 2017.
31. Baxter, J.R.; Rosenthal, J.S. Rates of convergence for everywhere-positive Markov chains. Stat. Probab. Lett. 1995, 22, 333–338.
32. Schervish, M.J.; Carlin, B.P. On the convergence of successive substitution sampling. J. Comput. Graph. Stat. 1992, 1, 111–127.
33. Klus, S.; Nüske, F.; Koltai, P.; Wu, H.; Kevrekidis, I.; Schütte, C.; Noé, F. Data-Driven Model Reduction and Transfer Operator Approximation. J. Nonlinear Sci. 2018, 1–26.
34. Mattingly, J.C.; Stuart, A.M. Geometric ergodicity of some hypoelliptic diffusions for particle motions. Markov Process. Relat. Fields 2002, 8, 199–214.
35. Mattingly, J.C.; Stuart, A.M.; Higham, D.J. Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerate noise. Stoch. Process. Their Appl. 2002, 101, 185–232.
36. Bittracher, A.; Koltai, P.; Klus, S.; Banisch, R.; Dellnitz, M.; Schütte, C. Transition Manifolds of Complex Metastable Systems. J. Nonlinear Sci. 2017, 1–42.
37. Denner, A. Coherent Structures and Transfer Operators. Ph.D. Thesis, Technische Universität München, München, Germany, 2017.
38. Hsing, T.; Eubank, R. Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators; John Wiley & Sons: Hoboken, NJ, USA, 2015.
39. Froyland, G.; Horenkamp, C.; Rossi, V.; Santitissadeekorn, N.; Gupta, A.S. Three-dimensional characterization and tracking of an Agulhas Ring. Ocean Model. 2012, 52–53, 69–75.
40. Froyland, G.; Horenkamp, C.; Rossi, V.; van Sebille, E. Studying an Agulhas ring’s long-term pathway and decay with finite-time coherent sets. Chaos 2015, 25, 083119.
41. Williams, M.O.; Kevrekidis, I.G.; Rowley, C.W. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci. 2015, 25, 1307–1346.
42. Klus, S.; Koltai, P.; Schütte, C. On the numerical approximation of the Perron–Frobenius and Koopman operator. J. Comput. Dyn. 2016, 3, 51–79.
43. Korda, M.; Mezić, I. On convergence of extended dynamic mode decomposition to the Koopman operator. J. Nonlinear Sci. 2017, 1–24.
44. Pérez-Hernández, G.; Paul, F.; Giorgino, T.; Fabritiis, G.D.; Noé, F. Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 2013, 139, 015102.
45. Schwantes, C.R.; Pande, V.S. Improvements in Markov state model construction reveal many non-native interactions in the folding of NTL9. J. Chem. Theory Comput. 2013, 9, 2000–2009.
46. Molgedey, L.; Schuster, H.G. Separation of a mixture of independent signals using time delayed correlations. Phys. Rev. Lett. 1994, 72, 3634–3637.
47. Hammersley, J.M.; Morton, K.W. Poor man’s Monte Carlo. J. R. Stat. Soc. Ser. B Methodol. 1954, 16, 23–38.
48. Rosenbluth, M.N.; Rosenbluth, A.W. Monte Carlo calculation of the average extension of molecular chains. J. Chem. Phys. 1955, 23, 356–359.
49. Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett. 1997, 78, 2690.
50. Crooks, G.E. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 1999, 60, 2721.
51. Bucklew, J. Introduction to Rare Event Simulation; Springer Science & Business Media: New York, NY, USA, 2013.
52. Hartmann, C.; Schütte, C. Efficient rare event simulation by optimal nonequilibrium forcing. J. Stat. Mech. Theory Exp. 2012, 2012, P11004.
53. Hartmann, C.; Richter, L.; Schütte, C.; Zhang, W. Variational Characterization of Free Energy: Theory and Algorithms. Entropy 2017, 19, 626.
54. Dellnitz, M.; Junge, O. On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal. 1999, 36, 491–515.
55. Djurdjevac Conrad, N.; Weber, M.; Schütte, C. Finding dominant structures of nonreversible Markov processes. Multiscale Model. Simul. 2016, 14, 1319–1340.
56. Conrad, N.D.; Banisch, R.; Schütte, C. Modularity of directed networks: Cycle decomposition approach. J. Comput. Dyn. 2015, 2, 1–24.
Figure 1.
Left: double-well potential. Right: invariant distribution $\mu$ (gray dashed) and second eigenfunction $\varphi_2$ (solid black) of the associated transfer operator $\mathcal{T}$.
Figure 2.
Cartoon representation of (23) with a coherent pair $\mathbb{M}_{0,i},\mathbb{M}_{1,i}$, which are represented by the thick horizontal lines left and right, respectively. Condition (23) can be translated into $\mathcal{T}\mathbb{1}_{0,i}\approx\mathbb{1}_{1,i}$, or equivalently $\mathcal{P}(\mathbb{1}_{0,i}\mu_0)\approx\mathbb{1}_{1,i}\mu_1$. In other words, the part of the ensemble $\mu_0$ supported on the set $\mathbb{M}_{0,i}$ (dark gray region on the left) is mapped by the propagator to an ensemble (dark gray region on the right) that is almost equal to the part of the ensemble $\mu_1$ supported on $\mathbb{M}_{1,i}$. Note that little of $\mathcal{P}(\mathbb{1}_{0,i}\mu_0)$ is supported outside of $\mathbb{M}_{1,i}$, and little of $\mathbb{1}_{1,i}\mu_1$ came from outside $\mathbb{M}_{0,i}$.
Figure 3.
Left: shifting triple-well potential. All three wells are coherent sets, as the plateaus of the singular vectors indicate. Middle: second right (initial) and left (final) singular vectors of the transfer operator (solid black and gray dashed lines, respectively). Right: third right (initial) and left (final) singular vectors of the transfer operator (solid black and gray dashed lines, respectively). For reasons of numerical stability, the singular vectors are only computed in regions where $\mu_0$ and $\mu_1$, respectively, are larger than machine precision.
Figure 4.
Top: initial ensemble ${\mu}_{0}$ (black solid) and its respective parts in the three coherent sets (gray shading). Bottom: final ensemble ${\mu}_{1}$ (black solid) and the image of the corresponding gray ensembles from the top row (gray shading).
Figure 5.
The same as Figure 4, for a different initial distribution ${\mu}_{0}$.
Figure 6.
The empirical initial distribution of the simulation data, i.e., the reference distribution ${\mu}_{\mathrm{ref}}$ (gray histogram), and the corrected equilibrium distribution computed from this data (solid black line). Left: sample size $m={10}^{5}$, right: sample size $m={10}^{4}$.
Figure 7.
Left: empirical distribution of the ${x}_{i}$ (histogram with $40\times 40$ bins). Middle: corrected invariant distribution. Right: clustering of the populated partition elements based on the 7 dominant eigenfunctions of the lowrank GMSM.
A Stochastic (Markov) Process Is Called 
time-homogeneous  if the transition probabilities from time $s$ to time $t$ depend only on $t-s$ (in analogy to the evolution of an autonomous ordinary differential equation). 
stationary  if the distribution of the process does not change in time (such a distribution is also called invariant, cf. (2)). 
reversible  if it is stationary and the detailed balance condition (5) holds (reversibility means that time series are statistically indistinguishable in forward and backward time). 
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).