This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Rare, but important, transition events between long-lived states are a key feature of many molecular systems. In many cases, the computation of rare event statistics by direct molecular dynamics (MD) simulations is infeasible, even on the most powerful computers, because of the immensely long simulation timescales needed. Recently, a technique for spatial discretization of the molecular state space designed to help overcome such problems, so-called Markov State Models (MSMs), has attracted a lot of attention. We review the theoretical background and algorithmic realization of MSMs and illustrate their use by some numerical examples. Furthermore, we introduce a novel approach to using MSMs for the efficient solution of optimal control problems that appear in applications where one desires to optimize molecular properties by means of external controls.

Stochastic processes are widely used to model physical, chemical or biological systems. The goal is to approximately compute interesting properties of the system by analyzing the stochastic model. As soon as randomness is involved, there are mainly two options for performing this analysis: (1) direct sampling and (2) the construction of a discrete coarse-grained model of the system. In a direct sampling approach, one tries to generate a statistically significant number of events that characterize the property of the system in which one is interested. For this purpose, computer simulations of the model are a powerful tool. For example, an event could refer to the transition between two well-defined macroscopic states of the system. In chemical applications, such transitions can often be interpreted as reactions or, in the context of a molecular system, as conformational changes. Interesting properties are, e.g., average waiting times for such reactions or conformational changes and along which pathways the transitions typically occur. The problem with a direct sampling approach is that many interesting events are so-called rare events. Therefore, the computational effort for generating sufficient statistics for reliable estimates is very high; particularly if the state space is continuous and high dimensional, estimation by direct numerical simulation is infeasible.

Available techniques for rare event simulations in continuous state space are discussed in [

The actual construction of an MSM requires one to sample certain transition probabilities of the underlying dynamics between sets. The idea is: (1) to choose the sets such that the sampling effort is much lower than the direct estimation of the rare events under consideration; and (2) to compute all interesting quantities for the MSM from its transition matrix,
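As an illustration of step (2), a maximum-likelihood estimate of an MSM transition matrix can be obtained by counting transitions between the partition sets along a discretized trajectory and row-normalizing the count matrix. The following sketch uses a hypothetical toy trajectory (numpy assumed); it is meant to show the bookkeeping, not the production workflow:

```python
import numpy as np

def estimate_transition_matrix(labels, lag):
    """Estimate an MSM transition matrix from a discretized trajectory.

    labels : sequence of set indices (0..n-1) visited by the trajectory
    lag    : lag time in trajectory steps
    """
    n = max(labels) + 1
    counts = np.zeros((n, n))
    # count observed transitions i -> j at the given lag time
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a, b] += 1
    # row-normalize the count matrix to obtain transition probabilities
    return counts / counts.sum(axis=1, keepdims=True)

# hypothetical discretized trajectory hopping between two sets
traj = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
P = estimate_transition_matrix(traj, lag=1)
```

Each row of `P` is then a probability distribution over the next set, which is the basic data every MSM analysis starts from.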

In the first part of this article, we will discuss the approximation quality of two different types of Markov State Models that are defined with respect to a full partition of state space or with respect to so-called core sets. We will also discuss the algorithmic realization of MSMs and provide references to the manifold of realistic applications to molecular systems in equilibrium that are available in the literature today.

The second part will show how to use MSMs for optimizing particular molecular properties. In this type of application, one wants to steer the molecular system at hand by external controls in a way such that a pre-selected molecular property is optimized (minimized or maximized). That is, one wants to compute a specific external control from a family of admissible controls that optimizes the property of interest under certain side conditions. The property to be optimized can be quite diverse: For example, it can be (1) the population of a certain conformation that one wants to maximize under a side condition that limits the total work done by the external control or (2) the mean first passage time to a certain conformation that one wants to minimize (in order to speed up a rare event), but under the condition that one can still safely estimate the mean first passage time of the uncontrolled system. The theoretical background of case (1) has been considered in [

Let (X_{t})_{t≥0} be a time-continuous Markov process on a continuous state space E ⊆ ℝ^{d}. The idea of Markov State Modeling is to approximate (X_{t}) by a Markov chain, (X̂_{k})_{k∈ℕ}, on a finite and preferably small state space. Since the MSM, (X̂_{k})_{k∈ℕ}, lives on a finite state space, the construction of an MSM boils down to the computation of its transition matrix,

The main benefit is that for a finite Markov chain, one can compute many interesting dynamical properties directly from its transition matrix, e.g., timescales and the metastability in the system [
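For instance, for a small (hypothetical) three-state transition matrix, the stationary distribution and the implied timescales t_{i} = −τ/ln λ_{i} follow from a single eigenvalue computation; a sketch in numpy, assuming a lag time of τ = 1:

```python
import numpy as np

# toy transition matrix of a metastable three-state chain (lag time tau = 1 assumed)
P = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.02, 0.02, 0.96]])

# stationary distribution: left eigenvector of P for eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# implied timescales t_i = -tau / ln(lambda_i) from the non-trivial eigenvalues
lams = np.sort(np.real(np.linalg.eigvals(P)))[::-1]
timescales = -1.0 / np.log(lams[1:])
```

The closer the non-trivial eigenvalues are to one, the longer the corresponding relaxation timescales, which is exactly the spectral signature of metastability.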

Having this in mind, the first natural idea is to let the states of an MSM correspond to sets A_{1}, ..., A_{n} that form a full partition of state space and to consider the index process that records which of the sets the original process currently occupies.

It is well known that this index process is not Markovian, mainly due to the so-called recrossing problem. This refers to the fact that the original process typically crosses the boundary between two sets, A_{i} and A_{j}, back and forth several times in quick succession before settling again, which introduces memory into the index process.

The non-Markovianity of the index process is often seen as a problem in Markov State Modeling, because many arguments assume Markovianity. In practice, one estimates the transition matrix with entries P_{ij}(τ) = ℙ[X_{τ} ∈ A_{j} | X_{0} ∈ A_{i}] and considers the Markov chain, (X̂_{k})_{k∈ℕ}, associated with this transition matrix. From above, it is clear that, in general, this chain does not have the same law as the index process, so the MSM can only be an approximation of the original dynamics at lag time τ.

In order to compare the MSM to the continuous process, we introduce one of the key objects for our analysis, the transfer operator, T_{t}, of the process (X_{t}). If μ denotes the stationary distribution of the process, T_{t}: L²(μ) → L²(μ) propagates densities with respect to μ: if X_{0} ∼ v_{0}μ with v_{0} ∈ L²(μ), then X_{t} ∼ (T_{t}v_{0})μ.

Note that T_{t} is a linear operator and that, for a reversible process, it is self-adjoint on L²(μ), so its dominant eigenvalues are real.

The benefit of working with T_{t} is its linearity: it allows us to interpret an MSM as a Galerkin projection. Let D ⊂ L²(μ) be the subspace of step functions that are constant on the sets A_{1}, ..., A_{n}, and let Q denote the orthogonal projection onto D. Then the projected operator, QT_{τ}Q: D → D, has an n × n matrix representation.

To see this, define χ_{i} = 1_{A_{i}} to be the characteristic function of set A_{i}. With respect to this basis, and for X_{0} ∼ μ, the matrix entries are exactly the transition probabilities between the sets.

The previously constructed transition matrix of the MSM based on a full partition can be interpreted as a projection onto a space of densities that are constant on the partitioning sets. This interpretation of an MSM is useful, since it allows one to analyze its approximation quality. For example, in [ ], it is shown that the difference between the dominant eigenvalues of T_{t} and those of the projected operator, QT_{t}Q, can be bounded in terms of the projection error of the corresponding eigenvectors of T_{t}, where λ_{1} < 1 is the largest non-trivial eigenvalue of T_{t}.
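This eigenvalue behavior can be checked on a small example: lumping a fine-grained reversible chain into two sets via stationary-weighted averaging (the discrete analogue of the projection Q) can only decrease the largest non-trivial eigenvalue. A sketch with hypothetical rates:

```python
import numpy as np

# fine-grained reversible chain: nearest-neighbor walk on 6 states with a
# bottleneck between states 2 and 3 (two metastable halves), hypothetical rates
n = 6
P = np.zeros((n, n))
for i in range(n - 1):
    rate = 0.005 if i == 2 else 0.2   # small rate across the bottleneck
    P[i, i + 1] = rate
    P[i + 1, i] = rate
P += np.diag(1.0 - P.sum(axis=1))

pi = np.full(n, 1.0 / n)              # symmetric P => uniform stationary law

# lump states {0,1,2} and {3,4,5}: coarse transition probabilities are
# stationary-weighted averages, Pc[I,J] = P_pi[X_tau in A_J | X_0 in A_I]
sets = [[0, 1, 2], [3, 4, 5]]
Pc = np.zeros((2, 2))
for I, A in enumerate(sets):
    for J, B in enumerate(sets):
        Pc[I, J] = sum(pi[i] * P[i, j] for i in A for j in B) / pi[A].sum()

lam_fine = np.sort(np.linalg.eigvalsh(P))[-2]    # largest non-trivial eigenvalue
lam_coarse = np.sort(np.linalg.eigvalsh(Pc))[-2]
```

By Cauchy interlacing for compressions of self-adjoint operators, `lam_coarse` can never exceed `lam_fine`, which is why lumped models underestimate the slow timescales.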

In particular, for

An eigenvalue, λ_{i}, of the transfer operator corresponds to an implied timescale, t_{i} = −τ/ln λ_{i}, of the dynamics, and the question is how well an MSM built on the sets A_{1}, ..., A_{n} reproduces these timescales.

The projection error depends on how well the corresponding eigenvectors of T_{t} can be approximated within the subspace of step functions.

The figure also shows a choice of three sets that form a full partition of state space. The computation of the transition matrix for this partition yields the eigenvalues λ_{0} = 1, λ_{1} = 0.9877, λ_{2} = 0.9037.
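Taking the lag time as τ = 1 (in the time unit of the example; an assumption, since the unit is not restated here), the implied timescales t_{i} = −τ/ln λ_{i} of the MSM follow directly from these eigenvalues:

```python
import math

tau = 1.0                      # assumed lag time unit
lam_1, lam_2 = 0.9877, 0.9037  # non-trivial eigenvalues of the full partition MSM

# implied timescales t_i = -tau / ln(lambda_i)
t_1 = -tau / math.log(lam_1)   # close to the tabulated value 80.6548
t_2 = -tau / math.log(lam_2)   # close to the tabulated value 9.8784
```

The small residual mismatch with the tabulated values is consistent with the eigenvalues being reported to four digits only.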

As one can see, the timescales are strongly underestimated. This is a typical phenomenon. From a statistical point of view, the recrossing problem leads to additional transition counts when one computes the transition probabilities, ℙ_{μ}[X_{τ} ∈ A_{j} | X_{0} ∈ A_{i}], which lowers the apparent metastability. From an approximation point of view, the corresponding figure shows the first non-trivial eigenvector, u_{1}, belonging to the timescale t_{1} = 103.7608, and its best-approximation by a step function.

The eigenvector is indeed almost constant in the vicinity of the wells, but within the transition region between the wells, the eigenvector is varying and the approximation by a step function is not accurate. Therefore, we have two explanations of why the main error is introduced in the region close to shared boundaries of neighboring sets: (1) because of recrossing issues; and (2) because of the main projection error of the associated eigenvector. Of course, one solution would be an adaptive refinement of the discretization, that is, one could choose a larger number of smaller sets, such that the eigenvector is better approximated by a step function on these sets. In the following section, we will present an alternative solution for overcoming the recrossing problem and reducing the projection error without refining the discretization.

From a functional-analytic point of view, the construction above is a Galerkin projection of the transfer operator onto a subspace of L²(μ) spanned by basis functions associated with the sets A_{1}, ..., A_{n}. For the basis of characteristic functions χ_{i} = 1_{A_{i}} on a full partition, we do not have to compute the scalar products numerically, since the matrix entries have a stochastic interpretation in terms of transition probabilities between sets.

Now, the question is if there is a basis other than characteristic functions that: (a) is more adapted to the eigenvectors of the transfer operator; and (b) still leads to a probabilistic interpretation of the matrix entries. To this end, we choose so-called core sets C_{1}, ..., C_{n}, i.e., pairwise disjoint sets, C_{i} ∩ C_{j} = ∅, that do not have to cover the whole state space.

Now, we use the core sets to define our basis functions, q_{1}, ..., q_{n}: the committor function q_{i}(x) is the probability that the process started in x enters the core set C_{i} before it enters any other core set.

The matrices that represent the projection onto the committor basis have entries given by scalar products involving the functions q_{i} and q_{j}, which at first sight seem expensive to compute.

Let σ(t) denote the index of the core set that the process has visited last before time t; this so-called milestoning process is well defined as soon as some core set C_{i} has been entered. One can then show two properties: (P1) the entries of the projected matrices can be expressed in terms of transition statistics of the milestoning process between the core sets; and (P2) the projection error now depends on how well the dominant eigenvectors can be approximated by linear combinations of the committor functions.

The message behind (P1) is that it is possible to relax the full partition constraint and use a core set discretization that does not cover the whole state space. We can still define a basis for a projection of the transfer operator that leads to a matrix representation that can be interpreted in terms of transition probabilities.
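For a discrete toy model, committors are cheap to obtain by a linear solve: q satisfies q = Pq outside the cores, with boundary values zero and one on the two cores. A sketch for a symmetric random walk between two single-state core sets (hypothetical setup):

```python
import numpy as np

# committor between two core sets for a symmetric random walk on 0..N-1:
# q(x) = probability to reach core B = {N-1} before core A = {0}
N = 7
P = np.zeros((N, N))
for i in range(1, N - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5

# q solves q = Pq outside the cores, with q = 0 on A and q = 1 on B
interior = list(range(1, N - 1))
A_mat = np.eye(len(interior)) - P[np.ix_(interior, interior)]
b = P[np.ix_(interior, [N - 1])].ravel()   # flux directly into core B
q_int = np.linalg.solve(A_mat, b)
q = np.concatenate(([0.0], q_int, [1.0]))
```

For the unbiased walk, the committor is linear in the state index, which is the expected gambler's-ruin solution.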

Property (P2) yields that the relaxation of the full partition constraint should also lead to an improvement of the MSM if the region that is left undiscretized is chosen appropriately, i.e., if it consists of the transition regions that the process leaves quickly.

The matrix representation of the projected operator, which can be written as the product of an inverse mass matrix and a transition matrix, again yields eigenvalues and implied timescales, and these now reproduce the reference values very accurately.

From the discussion above, this has to be expected, because the eigenvectors are almost constant in the vicinity of the wells, and we removed a part of state space from the discretization that is typically left quickly compared to the timescales, _{1} and _{2}. Therefore, the committor functions should deliver a good approximation of the first two eigenvectors.

In the previous sections, we have interpreted the construction of an MSM as a projection of the dynamics onto some finite dimensional ansatz space. We have discussed two types of spaces that both have been defined on the basis of a set discretization. First, we chose a full partition of state space and the associated space of step functions, and second, we analyzed a discretization by core sets and the associated space spanned by committor functions. These two methods have the advantage that the resulting projections lead to transition matrices for the MSM with entries that are given in terms of transition probabilities between the sets. That is, one can compute estimates for the transition matrices from simulation data. This is an important property for practical applications, because it means that we never need to compute committor functions or scalar products between committors or step functions. We rather generate trajectories, x_{0}, x_{1}, ..., x_{N}, of the process and count transitions between the sets, or we follow the milestoning process along the trajectories in order to estimate the entries of the core set MSM matrix.
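A minimal sketch of the milestoning bookkeeping (with hypothetical toy data): each frame is labeled by the core set visited last, and transition counts of this label process furnish the core MSM estimates:

```python
import numpy as np

def milestoning_counts(traj, cores, lag=1):
    """Count transitions of the milestoning process (index of the core set
    visited last) along a trajectory; frames before the first core visit
    are discarded.

    traj  : sequence of states
    cores : list of sets of states, the core sets
    """
    labels, last = [], None
    for x in traj:
        for i, C in enumerate(cores):
            if x in C:
                last = i
        labels.append(last)
    labels = [l for l in labels if l is not None]
    n = len(cores)
    counts = np.zeros((n, n))
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a, b] += 1
    return counts

# toy trajectory on states 0..4 with cores {0} and {4}; state 2 is "no man's land"
traj = [0, 1, 2, 1, 0, 1, 2, 3, 4, 3, 4, 3, 2, 1, 0]
C = milestoning_counts(traj, [{0}, {4}])
```

Note that excursions into the undiscretized region do not change the label until another core is hit, which is exactly how recrossings are filtered out.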

Since, in practice, we will only have a finite amount of data available, we will have statistical errors when constructing an MSM. This is an additional error to the projection error related to the discretization that we have discussed above. On the other hand, one should note that these errors are of a different nature: they can be systematically reduced by generating more simulation data.

Besides the choice of discretization and the available statistics, the estimates above also depend on a lag time, τ. For core set MSMs, one can alternatively estimate a generator, L^{*}, of the milestoning process directly from the transition statistics between the core sets C_{i} and C_{j} and combine it with the mass matrix to obtain implied timescales.

Let us now analyze how the choice of core sets, particularly the size of the core sets, influences the resulting approximation. Therefore, we consider an MD example that was discussed in [ ]: alanine dipeptide, whose two metastable conformations, α and β, suggest two core sets chosen as neighborhoods of the centers x_{α}, x_{β}.

For computing a reference timescale, several MSMs based on three different full partitions using 10, 15 and 250 sets have been constructed for increasing lag times. In [ ], the resulting implied timescale estimates were shown to converge, which provides the reference value used below.

One can see that the estimate by the milestoning generator, L^{*}, deteriorates when the core sets become too large; in the estimate that includes the inverse mass matrix, M^{−1}, this effect is theoretically corrected by the mass matrix, M.

Markov State Modeling has been shown to apply successfully to many different molecular systems, like peptides, including time-resolved spectroscopic experiments [

P̂_{ij}(τ) = N_{ij}(τ) / Σ_{k} N_{ik}(τ), where N_{ij}(τ) denotes the number of observed transitions from set i to set j at lag time τ.

In practice, the statistics of the transition events between core sets will preferably be estimated from many short trajectories using milestoning techniques [

This article cannot give a detailed review on the algorithmic realization of MSMs for realistic molecular systems and on the findings resulting from such applications, since this is discussed to some extent elsewhere; see [

In this section, we will borrow ideas from the previous section and explain how MSMs can be used to discretize optimal control problems that are linear-quadratic in the control variables and which appear in, e.g., the sampling of rare events. Specifically, we consider the case that (X_{t})_{t≥0} is the solution of:
dX_{t} = (−∇V(X_{t}) + u_{t}) dt + √(2ε) dB_{t}, X_{0} = x ∈ ℝ^{d},
where V denotes the potential, u_{t} the control and B_{t} standard Brownian motion.

Other choices of the stopping time are possible; here, τ_{E\C} denotes the first exit time of X_{t} from E\C, i.e., the first hitting time of the target set, C.
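For concreteness, controlled dynamics of this type can be simulated with a simple Euler–Maruyama scheme; the one-dimensional double-well potential and the zero control below are illustrative assumptions, not the system studied here:

```python
import numpy as np

def simulate(grad_V, control, x0, eps, dt, n_steps, rng):
    """Euler-Maruyama discretization of the controlled overdamped dynamics
    dX_t = (-grad V(X_t) + u(X_t)) dt + sqrt(2*eps) dB_t  (assumed form).
    """
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        noise = np.sqrt(2 * eps * dt) * rng.standard_normal(x.shape)
        x = x + (-grad_V(x) + control(x)) * dt + noise
        path.append(x.copy())
    return np.array(path)

# hypothetical 1D double-well V(x) = (x^2 - 1)^2, uncontrolled (u = 0)
grad_V = lambda x: 4 * x * (x**2 - 1)
rng = np.random.default_rng(0)
path = simulate(grad_V, lambda x: 0.0, [1.0], eps=0.1, dt=1e-3,
                n_steps=5000, rng=rng)
```

With a nonzero control, the same loop generates the tilted dynamics whose cost is the object of the optimization below.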

There are other types of cost functions that can be treated within the same framework.

J(u) = 𝔼[∫_{0}^{τ_{E\C}} (f(X_{t}) + ½|u_{t}|²) dt], i.e., running costs, f, plus a quadratic penalty on the control, accumulated up to the stopping time; the task is to find the minimizing control, u^{*}.

The difficulty is that computing the optimal control requires solving a dynamic programming (Hamilton–Jacobi–Bellman) equation on the continuous, high-dimensional state space of X_{t}.

The basic idea is now to choose a finite dimensional subspace of L²(μ), spanned by basis functions χ_{1},...,χ_{n}, and to discretize the control problem by projection onto this subspace, just as in the construction of an MSM.

We will do this construction for the full partition case, χ_{i} = 1_{A_{i}}, first; the core set case, with committor basis functions, is treated afterwards.

More precisely, we choose a subspace of L²(μ) associated with sets A_{1},...,A_{n+1} with the following properties:

The sets A_{i} are pairwise disjoint and form a full partition of state space.

The target set is resolved by the discretization, i.e., C = A_{n+1}, so that functions that are constant on each A_{i} can represent the boundary condition on C.

Now, let us expand the value function in the basis χ_{1},…,χ_{n+1} and denote the coefficients of its finite-dimensional representation by ϕ̂_{j}, i.e., we replace the value function by the step function Σ_{j} ϕ̂_{j}χ_{j} ∈ L²(μ) and project the dynamic programming equation onto the subspace.

The best approximation error, ‖W − QW‖_{μ} = inf_{ψ∈D}‖W − ψ‖_{μ}, together with quantities such as sup_{x∈C_{i}} 𝔼_{x}[τ_{E\C}], measures how constant the solution, W, is on the discretization sets and controls the overall discretization error.

ψ̂_{i} = e^{−f̂_{i}} Σ_{j} F_{ij} ψ̂_{j}, for i = 1, ..., n, with ψ̂_{n+1} = 1, where ψ̂_{i} = exp(−Ŵ_{i}) denotes the exponentially transformed discrete value function.

This equation can be given a stochastic interpretation. To this end, let us introduce the vector, π ∈ ℝ^{n+1}, with nonnegative entries π_{i} = ⟨χ_{i}, χ_{i}⟩_{μ} = μ(A_{i}), the stationary weights of the sets.

F is the transition matrix of a Markov chain that approximates the original process, (X_{t})_{t≥0} (in the sense of the MSM constructed above);

F is reversible with respect to π, i.e., π_{i}F_{ij} = π_{j}F_{ji};

π is stationary for F, i.e., π^{T}F = π^{T}; the analogous matrix, G, for the core set discretization satisfies π_{i}G_{ij} = π_{j}G_{ji}, and the discretization error of the entries F_{ij} can be bounded in the norm ‖·‖_{∞}.
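These properties can be illustrated with a small reversible chain: prescribing π and building F by a Metropolis-type rule (an illustrative construction, not the one induced by the projection) gives detailed balance, π_{i}F_{ij} = π_{j}F_{ji}, and stationarity of π:

```python
import numpy as np

# a reversible chain with prescribed stationary vector pi, built by
# Metropolis acceptance on a uniform proposal (illustrative choice)
pi = np.array([0.4, 0.4, 0.2])
n = len(pi)
F = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            F[i, j] = min(1.0, pi[j] / pi[i]) / (n - 1)
F += np.diag(1.0 - F.sum(axis=1))

# detailed balance pi_i F_ij = pi_j F_ji makes the flux matrix symmetric
flux = pi[:, None] * F
```

Symmetry of the flux matrix immediately implies π F = π, which is the stationarity property used above.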

It follows that if the running costs, f, are resolved by the discretization, the projected problem is again an optimal control problem, now for the Markov chain, (X̂_{t})_{t≥0}, and the controlled chain, X̂^{v}_{t}, plays the role of the controlled diffusion.

The discrete problem can be stated as follows: minimize the discrete cost functional over all Markov control strategies, v; the minimizer, v^{*}, defines the optimally controlled chain, X̂^{v*}. Here, X̂^{v} denotes the chain whose transition probabilities are reweighted by the control, v, and the expectation is taken over realizations of X̂^{v} with initial condition X̂_{0} = x.

A few remarks seem in order: the optimally controlled transition matrix can be computed explicitly from v^{*} and the solution of the linear equation above, and the discrete problem has exactly the same structure as the continuous one, so standard solvers apply.

This completes our derivation of the discretized optimal control problem, and we now compare it with the continuous problem we started with for the case of a full partition of

We choose sets A_{1},...,A_{n+1}, with centers x_{1},...,x_{n+1} and such that A_{n+1} ≔ C is the target set. With χ_{i} = 1_{A_{i}}, the discretized coefficients f̂_{i}, π_{i} and F_{ij} can then be estimated as described above.

One can show that the approximation error vanishes in the limit of vanishing lag time, τ, and vanishing diameters of the discretization sets.

For the core set variant, we choose core sets C_{1},...,C_{n+1} with C_{n+1} = C, the target set, and use the committor functions, q_{i}, as basis functions; the milestoning process of X_{t} between the core sets then provides the required statistics.

The matrix entries at lag time τ can again be expressed in terms of transition probabilities of the milestoning process, and hence be estimated from trajectory data.

Therefore, as in the construction of core MSMs, we do not need to compute committor functions explicitly. Note, however, that the approximation quality again depends on how constant the solution is on the core sets, measured in the norm ‖·‖_{∞}.

Firstly, we study diffusion in the triple well potential, which is presented in the figure above. We choose three core sets, C_{i}, around the wells, one of which is the target set, C_{0}. We are interested in the moment generating function 𝔼[e^{−∊^{−1}στ}] of passages into C_{0} and the corresponding cumulant generating function.

In the optimally controlled dynamics, the process reaches the target set, C_{0}, very quickly. The reference computations here have been carried out using a full partition FEM (finite element method) discretization for computing the optimal control, u^{*}, and the associated value function.
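Once an MSM is available, first-passage statistics of this kind reduce to small linear systems; the following sketch (toy three-state chain with hypothetical numbers) computes the mean first passage time into a target state and the moment generating function of the passage time:

```python
import numpy as np

# For a finite MSM with transition matrix P and target state set T, both the
# MFPT m and the moment generating function g(alpha) = E[exp(-alpha * tau)]
# of the first passage time tau into T solve linear systems
P = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.02, 0.02, 0.96]])
target = [2]
rest = [0, 1]

Prr = P[np.ix_(rest, rest)]
# MFPT (in steps): (I - Prr) m = 1 on the non-target states
m = np.linalg.solve(np.eye(len(rest)) - Prr, np.ones(len(rest)))

# moment generating function: g = exp(-alpha) * (Prr g + Prt) outside T
alpha = 0.01
Prt = P[np.ix_(rest, target)].sum(axis=1)
g = np.linalg.solve(np.eye(len(rest)) - np.exp(-alpha) * Prr,
                    np.exp(-alpha) * Prt)
```

This is the discrete counterpart of the FEM reference computation: the rare event statistics follow from a linear solve instead of direct sampling.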

Next, we construct a core MSM to sample the matrices required for the discretized control problem.

Lastly, we study alanine dipeptide again. We control the transition into the α conformation, characterized by the dihedral angles (φ_{α}, ψ_{α}), and estimate the required matrices from trajectory data at lag time τ.

In [

In this article, we have discussed an approach to overcome direct sampling issues of rare events in molecular dynamics based on spatial discretization of the molecular state space. The strategy is to define a discretization by subsets of state space, such that the sampling effort with respect to transitions between the sets is much lower than the direct estimation of the rare events under consideration. That is, without having to simulate rare events, we construct a so-called Markov State Model, a Markov chain approximation to the original dynamics. Since the state space of the MSM is finite, we can then calculate the properties of interest by simply solving linear systems of equations. Of course, it is crucial that these properties of the MSM can be related to the rare event properties of the original process that we have not been able to sample directly.

This is why we have analyzed the approximation quality of MSMs in the first part of the article. We have used the interpretation of MSMs as projections of the transfer operator to: (1) derive conditions that guarantee an accurate reproduction of the dynamics; and (2) show how to construct models based on a core set discretization by leaving the state space partly undiscretized.

In the second part of the article, we have used the concept of MSM discretization to solve MD optimal control problems in which one computes the optimal external force that drives the molecular system to show an optimized behavior (maximal possible population in a conformation; minimal mean first passage time to a certain conformation) under certain constraints. We have demonstrated that the spatial discretization underlying an MSM turns the high-dimensional continuous optimal control problem into a rather low-dimensional discrete optimal control problem of the same form that can be solved efficiently. This result allows two different types of applications: (1) if one can construct an MSM for a molecular system in equilibrium, then one can use it to compute optimal controls that extremize a given cost criterion; (2) if an MSM can be computed based on transition probabilities between neighboring core sets alone, then the rare event statistics for transitions between strongly separated metastable states of the system can be computed from an associated optimal control problem that can be solved after discretization using the pre-computed MSM.

The authors have been supported by the DFG Research Center MATHEON.

The authors declare no conflict of interest.


Cumulative transitions between two sets along boundaries are typical.

A potential with three wells and a choice of three sets, A_{1}, A_{2}, A_{3}.

The first non-trivial eigenvector, u_{1} (solid blue), and its projection, Qu_{1} (dashed red), onto step functions that are constant on A_{1}, A_{2}, A_{3}.

Core sets do not have to share boundaries anymore. This can reduce the recrossing effect.

Excluding a small region of state space from the sets, A_{1}, A_{2}, A_{3}, as in the previous figure, results in core sets, C_{1}, C_{2}, C_{3}, that do not share boundaries anymore.

(a) The first non-trivial eigenvector, u_{1} (solid blue), its projection, Q_{f}u_{1} (finely dashed red), onto step functions (full partition) and its projection, Q_{c}u_{1} (dashed green), onto committors (core sets). (b) The same for the second non-trivial eigenvector, u_{2}.

The stationary distribution of alanine dipeptide and the two centers of the core sets, x_{α}, x_{β}.

Estimate of the implied timescales from the milestoning generator, L^{*}, for different core set sizes.

The potential from

The mesh for the full partition.

Three well potential example for _{1}(_{2}(_{3}(^{*}

Dipeptide example. (a) MFPT from

Comparison of implied timescales

| | t_{1} | t_{2} |
|---|---|---|
| Original | 103.7608 | 11.9566 |
| Full partition 3 sets | 80.6548 | 9.8784 |

More accurate approximation of the implied timescales by the core set MSM.

| | t_{1} | t_{2} |
|---|---|---|
| Original | 103.7608 | 11.9566 |
| 3 core sets | 100.8066 | 11.9145 |
| Full partition 3 sets | 80.6548 | 9.8784 |