This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Many molecular dynamics simulations aim at predicting and quantifying rare events, such as the folding of a protein or a phase transition. Simulating rare events is often prohibitively expensive, especially if the equations of motion are high-dimensional, as is the case in molecular dynamics. Various algorithms have been proposed for efficiently computing mean first passage times, transition rates or reaction pathways. This article surveys and discusses recent developments in the field of rare event simulation and outlines a new approach that combines ideas from optimal control and statistical mechanics. The optimal control approach described in detail resembles the use of Jarzynski’s equality for free energy calculations, but with an optimized protocol that speeds up the sampling, while (theoretically) giving variance-free estimators of the rare event statistics. We illustrate the new approach with two numerical examples and discuss its relation to existing methods.

Rare but important transition events between long-lived states are a key feature of many systems arising in physics, chemistry, biology,

In this article, we consider typical rare events in molecular dynamics for which conformation changes or protein folding may serve as examples. They can be described in the following abstract way: The molecular system under consideration has the ability to go from a reactant state given by a set

The system is (meta)stable, with the sets

The sets

know which parts of state space such reactive trajectories visit most likely,

characterize the rare event statistically,

The molecular dynamics literature on rare event simulations is rich. Since the 1930s, transition state theory (TST) [

These different strategies approach the problem by sampling the ensemble of reactive trajectories or by directly searching for the transition channels of the system. Most notable among these techniques are (1) Transition Path Sampling (TPS) [

Our aim is (A) to review some of these methods based on a joint theoretical basis and (B) to outline a new approach to the estimation of rare event statistics based on a combination of ideas from optimal control and statistical mechanics. In principle, this approach allows for a

Alternative, inherently discrete methods, like Markov State Modeling, that discretize the state space appropriately and try to compute transition channels and rates

Since our results are rather general, it is useful to set the stage somewhat abstractly. To this end, we borrow some notation from [^{n}_{t}_{t}_{t}_{t∈ℝ} where, for technical reasons, we let the trajectory start at time _{t∈ℝ}, we say that its reactive pieces are the segments during which _{t}_{t≥0} is reactive for all _{1}, _{2}] ⊂

Given the ensemble of reactive trajectories, we want to characterize it statistically by answering the following questions:

What is the probability of observing a trajectory at

What is the probability current of reactive trajectories? This probability current is the vector field _{AB}_{AB}

What is the transition rate of the reaction, _{AB}

Where are the main transition channels used by most of the reactive trajectories?

Question (Q1) can be answered easily, at least theoretically: The probability density to observe any trajectory (reactive or not) at point

In order to give answers to the other questions, we will exploit the framework of ^{n}_{t}^{−1} exp(_{max}. As a consequence, the relaxation of the dynamics towards equilibrium is dominated by the rare transitions over the largest energy barriers.

For this kind of dynamics, Questions (Q2) and (Q3) have surprisingly simple answers: The reactive probability current is given by
_{S}_{S}^{c}^{c}

These considerations can be generalized to a wide range of different kinds of dynamics in continuous state spaces, including, e.g., full Langevin dynamics, see [

This example illustrates that TPT in principle allows us to quantify all aspects of the transition behavior underlying a rare event. We can compute transition rates exactly and even characterize the transition mechanisms if we can compute the committor function. Deeper insight using the Feynman–Kac formula yields that the committor function can be computed as the solution of a linear boundary value problem, which for diffusive molecular dynamics reads
_{AB}

TPS has been developed in order to sample from the probability distribution of reactive trajectories in so-called “path space”, which means nothing else than the space of all discrete or continuous paths starting in _{T}_{t}_{0≤t≤T} of length _{A}_{A}

TPS is a Metropolis Monte-Carlo (MC) method for sampling
_{t}_{0≤t≤T})) that uses explicit information regarding the path measure _{T}

Whenever a transition channel exists, one can try to approximate the center curve of the transition channel instead of sampling the ensemble of reactive trajectories. If the center curve (also:

Rather than sampling the probability distribution of reactive pathways, such as _{T}_{T}^{n}_{∊}

The fact that the Euler discretization of the path density ℓ, with _{∊}^{n}^{*}_{φ} I^{∊}

In [_{∊}

Another action-based method that has been introduced in [

There are several other methods that entirely avoid the computation of reactive trajectories, but try to reconstruct the less complex transition channels or pathways instead, analyzing the energy landscape of the system. One group of such techniques, like the Zero Temperature String method [_{α}_{α}_{α}_{α}_{Γα} where the average is taken according to _{α}_{α}_{α}_{Γα}, which defines the width of the transition channel, is small, which implies that the isocommittor surfaces can be locally approximated by hyperplanes _{α}_{Pα}, where the average is computed by running constrained dynamics on _{α}_{α}

The FTS method allows one to compute single transition channels in rugged energy landscapes as long as these are not too extended and rugged. Compared to methods that sample the ensemble of reactive trajectories, it has the significant advantage that the string, that is, the principal curve inside the transition channel, is rather smooth and short, as compared to the typical reactive trajectories. The FTS further allows one to compute the free energy profile
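The idea of evolving a discretized curve towards a transition pathway can be sketched with the simpler zero-temperature variant of the string method: each image moves by steepest descent and the string is then redistributed to equal arc length. The two-dimensional toy potential, the step size and all function names below are illustrative assumptions, not the systems studied in the article. The potential has minima at $(\pm 1, 0)$ and a saddle at $(0, -1)$ with energy 1.

```python
import numpy as np

def V(p):
    """Toy potential V(x, y) = (x^2 - 1)^2 + 2 (y - x^2 + 1)^2."""
    x, y = p[:, 0], p[:, 1]
    return (x**2 - 1) ** 2 + 2 * (y - x**2 + 1) ** 2

def grad_V(p):
    x, y = p[:, 0], p[:, 1]
    r = y - x**2 + 1.0
    return np.stack([4 * x * (x**2 - 1) - 8 * x * r, 4 * r], axis=1)

def reparametrize(p):
    """Redistribute the images to equal arc length by linear interpolation."""
    seg = np.linalg.norm(np.diff(p, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    s /= s[-1]
    t = np.linspace(0.0, 1.0, len(p))
    return np.stack([np.interp(t, s, p[:, k]) for k in range(p.shape[1])], axis=1)

def string_method(p0, step=0.01, n_iter=2000):
    """Alternate a steepest-descent step with a reparametrization step."""
    p = p0.copy()
    for _ in range(n_iter):
        p = p - step * grad_V(p)
        p = reparametrize(p)
    return p

# Initial string: straight line between the two minima, 31 images.
start, end = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
t = np.linspace(0.0, 1.0, 31)[:, None]
path = string_method((1 - t) * start + t * end)
barrier = V(path).max()   # should be close to the saddle energy
```

The finite-temperature method replaces the steepest-descent step by averages over constrained dynamics, which this sketch does not attempt to reproduce.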

The computation of transition rates can be performed without computing the dominant transition channels or similar objects. There is a list of rather general techniques, with Forward Flux Sampling (FFS) [

The first step of FFS is the choice of a finite sequence of interfaces _{k}_{N}_{AB}_{A}_{1}; and (2) the probability
_{1} makes it to _{k}_{+1}_{k}_{k}_{k}_{+1} before it returns to _{1}, yielding an estimate for the flux _{A}_{1} per unit of time). Second, a point from this ensemble on _{1} is selected at random and used to start a trajectory, which is followed until it either hits the next interface _{2} or returns to _{2}|_{1}). This procedure then is iterated from interface to interface. Finally, the rate _{AB}_{A}_{1}) is computed. Variants of this algorithm are described in [
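The two stages described above can be sketched as follows for a one-dimensional toy problem. The double-well potential, the interface positions, the temperature `eps` and all function names are illustrative assumptions; the sketch follows the basic direct variant of FFS and is not tuned for efficiency or accuracy.

```python
import numpy as np

def force(x):
    """-V'(x) for the toy double well V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x**2 - 1.0)

def step(x, eps, dt, rng):
    """One Euler-Maruyama step of dX = -V'(X) dt + sqrt(2 eps) dB."""
    return x + force(x) * dt + np.sqrt(2.0 * eps * dt) * rng.standard_normal()

def ffs_rate(interfaces, lam_A=-1.0, eps=0.25, dt=2e-3,
             n_flux_steps=50_000, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: long run near A; count first crossings of the first
    # interface after a visit to A and store the crossing points.
    x, in_A, starts = lam_A, True, []
    for _ in range(n_flux_steps):
        x = step(x, eps, dt, rng)
        if x < lam_A:
            in_A = True
        elif in_A and x > interfaces[0]:
            starts.append(x)
            in_A = False
        if x > interfaces[-1]:          # reached B: restart in A
            x, in_A = lam_A, True
    flux = len(starts) / (n_flux_steps * dt)
    # Stage 2: estimate the conditional probabilities of reaching the
    # next interface before returning to A.
    probs = []
    for k in range(len(interfaces) - 1):
        hits = []
        if starts:
            for x0 in rng.choice(starts, size=n_trials):
                x = x0
                while lam_A <= x < interfaces[k + 1]:
                    x = step(x, eps, dt, rng)
                if x >= interfaces[k + 1]:
                    hits.append(x)
        probs.append(len(hits) / n_trials)
        starts = hits                   # successful endpoints seed the next stage
    return flux * np.prod(probs), flux, probs

rate, flux, probs = ffs_rate([-0.8, -0.5, -0.2, 0.1, 0.5])
```

The rate estimate is the product of the flux out of the reactant state and the conditional probabilities; its statistical error depends strongly on the number of trials per interface and on the placement of the interfaces.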

FFS has been demonstrated to be quite general in approximating the flux of reactive trajectories through a given set of interfaces; it can be applied to equilibrium, as well as nonequilibrium systems, and its implementation is easy (see [_{k+1}|_{k}

Milestoning [_{k}_{N}_{i}_{i±1} before time _{i}_{i±1} and _{i}_{i}_{i±1}. The hitting times are recorded and collected into two distributions

These local kinetics are then compiled into the global kinetics of the process: For each _{i}_{i−1} and _{i+1} at time _{i}_{i}

The computation of reliable rare event statistics suffers from the enormous lengths of reactive trajectories. One obvious way to overcome this obstacle is to force the system to exhibit the transition of interest on shorter timescales. Therefore, can we

As was shown by Jarzynski and others, nonequilibrium forcing can in fact be used to obtain equilibrium rare event statistics. The advantage seems to be that the external force can speed up the sampling of the rare events by biasing the equilibrium distribution towards a distribution under which the rare event is no longer rare. We will briefly review Jarzynski’s identity before discussing the matter in more detail.
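As a numerical illustration of the identity $\mathbf{E}[e^{-\beta W}] = e^{-\beta \Delta F}$, the following sketch drags a harmonic trap from position 0 to position 1 in finite time. The trap model, the protocol and all parameter names are assumptions chosen for illustration. Since shifting the trap does not change the partition function, the exact free energy difference is $\Delta F = 0$, whereas the average work is strictly positive.

```python
import numpy as np

def jarzynski_demo(n_traj=20_000, n_steps=200, T=2.0, beta=1.0, k=1.0, seed=0):
    """Overdamped particle in the trap H(x, c) = k (x - c)^2 / 2,
    with the trap centre c moved linearly from 0 to 1 over time T."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    xi = np.linspace(0.0, 1.0, n_steps + 1)            # protocol values
    H = lambda x, c: 0.5 * k * (x - c) ** 2
    # Initial conditions drawn from equilibrium for the initial protocol value.
    x = rng.normal(0.0, 1.0 / np.sqrt(beta * k), n_traj)
    W = np.zeros(n_traj)
    for j in range(n_steps):
        # Work increment: energy change from switching the protocol at fixed x.
        W += H(x, xi[j + 1]) - H(x, xi[j])
        # Euler-Maruyama step of the dynamics under the new protocol value.
        noise = np.sqrt(2 * dt / beta) * rng.standard_normal(n_traj)
        x += -k * (x - xi[j + 1]) * dt + noise
    dF = -np.log(np.mean(np.exp(-beta * W))) / beta   # Jarzynski estimator
    return W.mean(), dF

mean_W, dF = jarzynski_demo()
```

The estimator is dominated by trajectories with unusually low work, so for faster protocols or larger systems its variance grows quickly; this is the difficulty that the optimized protocols discussed later are meant to address.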

Jarzynski’s and Crooks’ formulae [^{−1} log(_{1}/_{0}) between two equilibrium states of a system given by an unperturbed energy _{0} and its perturbation _{1} with the work _{ξ}_{0} + _{1} with _{0}), then, by the second law of thermodynamics, it follows that

In order to demonstrate how to improve approaches based on the idea of driving molecular systems to make rare events frequent, we first have to introduce some concepts and notation from statistical mechanics: Let _{t}_{t≥0}, ^{−1} in front.) Taylor expanding the CGF about ^{2}]; hence, for sufficiently small

The CGF admits a variational characterization in terms of relative entropies. To this end, let

Let ^{*}
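The variational characterization can be checked numerically on a finite state space. The sketch below assumes the standard form of the Gibbs variational principle, $-\log \mathbf{E}_P[e^{-W}] = \min_Q \{\mathbf{E}_Q[W] + \mathrm{KL}(Q\,\|\,P)\}$, with minimizer $dQ^* \propto e^{-W} dP$; the article's normalization of the CGF may differ by a constant factor. The distribution `p`, the observable `W` and the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))        # reference distribution P on 6 states
W = rng.uniform(0.0, 3.0, size=6)    # observable

free_energy = -np.log(np.sum(p * np.exp(-W)))

def objective(q):
    """E_Q[W] + KL(Q || P) for a probability vector q."""
    return np.sum(q * W) + np.sum(q * np.log(q / p))

# Candidate minimizer: exponential tilting of P by W.
q_star = p * np.exp(-W)
q_star /= q_star.sum()

# Random trial distributions should never beat the minimizer.
trials = [objective(rng.dirichlet(np.ones(6))) for _ in range(1000)]

# Under Q*, the reweighted observable W + log(dQ*/dP) is constant,
# which is the static analogue of a zero-variance estimator.
reweighted = W + np.log(q_star / p)
```

The last line shows why the minimizer matters for sampling: once the optimal change of measure is known, a single sample suffices to recover the free energy.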

When _{t}

We define ^{n}_{t}

The random variable ^{n}

The potential ^{n}

_{σ}

^{+}is the terminal set of the augmented process (

_{t}

^{+}= ([0,

_{t}

_{σ}

_{σ}_{σ}_{σ}

^{n}

_{σ}^{*}

The function _{σ}_{σ}

We want to consider the limit _{O}_{t}^{n}_{O}_{O}_{σ}^{*}_{∞}(_{σ}_{σ}

If we keep _{r}_{r}_{σ}_{σ}^{n}^{*}_{T}

The optimal control ^{*}_{σ}

Monte-Carlo estimators of the conditional CGF
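The zero-variance property can be illustrated with an exactly solvable toy problem. It is a finite-horizon stand-in, not the exit-time setting of the article: for Brownian motion $X_t$ and the quantity $\mathbf{E}[e^{-a X_T}]$, the optimal control is the constant drift $u^* = -a$, and the Girsanov-reweighted estimator is the same for every trajectory. The names `estimate`, `a` and `u` are assumptions made for this sketch.

```python
import numpy as np

def estimate(u, a=1.0, x0=0.0, T=1.0, n_steps=100, n_traj=20_000, seed=0):
    """Per-trajectory samples of exp(-a X_T) * dP/dQ, where trajectories
    follow the controlled dynamics dX = u dt + dB (plain Brownian motion
    for u = 0) and dP/dQ is the Girsanov likelihood ratio."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dB = np.sqrt(dt) * rng.standard_normal((n_traj, n_steps))
    X_T = x0 + np.sum(u * dt + dB, axis=1)
    log_ratio = np.sum(-u * dB - 0.5 * u**2 * dt, axis=1)
    return np.exp(-a * X_T + log_ratio)

a, T = 1.0, 1.0
exact = np.exp(0.5 * a**2 * T)       # E[exp(-a B_T)] for x0 = 0

plain = estimate(u=0.0)              # standard Monte-Carlo
controlled = estimate(u=-a)          # optimally controlled sampling
```

Comparing `plain.std()` with `controlled.std()` shows the effect: the uncontrolled samples fluctuate strongly, while the controlled ones agree with `exact` up to rounding. In practice the optimal control is only known approximately, so the variance is reduced rather than eliminated.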

The reader may now wonder as to whether it is possible to extract single moments from the CGF (e.g., mean first passage times). In general, this question is not straightforward to answer. One of the difficulties is that extracting moments from the CGF requires one to take derivatives at
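A small experiment illustrates the relation between the CGF and the first moment. The sketch assumes the normalization $\gamma(\sigma) = -\sigma^{-1} \log \mathbf{E}[e^{-\sigma \tau}]$, which may differ from the article's convention, and uses exponentially distributed samples as a hypothetical stand-in for first passage times.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for sampled first passage times with mean 5 (assumption).
tau = rng.exponential(scale=5.0, size=200_000)

def cgf(sigma):
    """Sample estimate of -(1/sigma) * log E[exp(-sigma * tau)]."""
    return -np.log(np.mean(np.exp(-sigma * tau))) / sigma

# As sigma decreases, the estimate approaches the sample mean of tau.
values = {sigma: cgf(sigma) for sigma in (1.0, 0.1, 0.01, 0.001)}
sample_mean = tau.mean()
```

For small `sigma` the estimate behaves like the mean minus `sigma` times half the variance, so the mean is recovered only in the limit, and dividing a noisy logarithm by a small number amplifies the sampling error.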

Jarzynski’s identity relates equilibrium free energies to averages that are taken over an ensemble of trajectories generated by controlled dynamics, and the reader may wonder whether the above zero-variance property can be used in connection with free energy computations à la Jarzynski (_{t}_{t}_{0} the equilibrium distribution corresponding to the initial value _{0} of the protocol, but optimal controls are defined point-wise for each state (_{t}

A similar argument as the one underlying the derivation of the HJB equation from the linear boundary value problem yields that Jarzynski’s formula can be interpreted as a two-player zero-sum differential game (

Now, we illustrate how to use the results of the last section in practice. We will mainly consider the case discussed in Section 6.1 regarding the statistical characterization of hitting a certain set.

Roughly speaking, the CGF encodes information about the moments of any random variable _{t}_{t≥}_{0}. For example, for _{x}_{0} = _{0} = ^{0} denotes expectation with respect to the unperturbed dynamics.

It is not only possible to use the moment generating function to collect statistics about rare events in terms of the cumulant generating function, but also to express the committor function directly in terms of an optimal control problem (see Section 2.1 for the definition of the committor _{AB}_{O}_{1} has a singular boundary value at

Setting _{AB}

The logarithmic singularity of the value function at “reactant state” _{O}

The optimally controlled dynamics has a stationary distribution with a density proportional to

For the exit problem (“Case I”, above), one can find an efficient algorithm for computing the conditional CGF _{σ}

The algorithm that finds the optimal _{j}_{1}_{M}

The minimization algorithm for the value function belongs to the class of expectation-maximization algorithms (although, here, we carry out a minimization rather than a maximization), in that each minimization step is followed by a function evaluation that involves computing an expectation. In connection with rare event sampling and molecular dynamics problems, a close relative is the

The number of basis functions needed depends mainly on the roughness of the value function, but is independent of the system dimension. For systems with clear time scale separation, it has moreover been shown [

In our first example, we consider diffusive molecular dynamics as in ^{c}_{O}^{c}^{c}^{c}

This case is instructive: For the unperturbed original dynamics, the mean first passage time _{x}_{O}^{4} for ^{c}_{x}_{O}^{3} shorter than the ones we would have to use by direct numerical simulation of the unperturbed dynamics.

_{AB}

In our second example, we consider two-dimensional diffusive molecular dynamics as in _{AB}

As in our former experiment, we observe that the optimal control potential prevents the dynamics from returning to

We have surveyed various techniques for the characterization and computation of rare events occurring in molecular dynamics. Roughly, the approaches fall into two categories: (a) methods that approach the problem by characterizing the ensemble of reactive trajectories between metastable states and (b) path-based methods that target dominant transition channels or pathways by minimization of suitable action functionals. Methods of the first type, e.g., Transition Path Theory, Transition Path Sampling, Milestoning or variants thereof, are predominantly Monte-Carlo-type methods for generating one very long or many short trajectories, from which the rare event statistics can then be estimated. Methods that belong to the second category, e.g., MaxFlux, Nudged-Elastic Band or the String Method, are basically optimization methods (sometimes combined with a Monte-Carlo scheme); here, the objects of interest are a few (single or multiple) smooth pathways that describe, e.g., a transition event. This classification is not completely unambiguous, in that action-based methods for computing most probable pathways can also be used to sample an ensemble of reactive trajectories. Another possible classification (with its own drawbacks) is along the lines of the biased-unbiased dichotomy that distinguishes between methods that characterize rare events based on the original dynamics and methods that bias the underlying equilibrium distribution towards a new probability distribution under which the rare events are no longer rare. Typical representatives of the second class range from biasing force methods, such as ABF or metadynamics, up to genuine nonequilibrium approaches based on Jarzynski’s identity for computing free energy profiles. The problem often is that rare event estimators based on an ensemble of nonequilibrium trajectories suffer from large variances, unless the bias is cleverly chosen.

We have described a strategy to find such a cleverly chosen perturbation, based on ideas from optimal control. The idea rests on the fact that the cumulant generating function of a certain observable, e.g., the first exit time from a metastable set, can be expressed as the solution to an optimal control problem, which yields a zero-variance estimator for the cumulant generating function. The control acting on the system has essentially two effects: (1) Under the controlled dynamics, the rare events are no longer rare, as a consequence of which the simulations become much shorter; (2) The variance of the statistical estimators is small (or even zero if the optimal control is known exactly). We should stress that, depending on the type of observable, the approach only appears to be a nonequilibrium method, for the optimal control is an exact gradient of a biasing potential; hence, the optimally perturbed system satisfies detailed balance, which is one criterion for thermodynamic equilibrium. Future research should address the question as to whether the approach is competitive for realistic molecular systems, how to efficiently and robustly extract information about specific moments rather than cumulant generating functions and how to extend it to more general observables or the calculation of free energy profiles.

The authors are grateful to Eric Vanden-Eijnden, Giovanni Ciccotti, Frank Pinski and Christoph Dellago for valuable discussions and comments. Ralf Banisch and Tomasz Badowski hold scholarships from the Berlin Mathematical School (BMS). This work was supported by the DFG Research Center “Mathematics for key technologies” (MATHEON) in Berlin.

The authors declare no conflict of interest.


Figure: Five-well potential.

Figure: Optimally-corrected potential for different choices of the target set.

Figure: Optimally-corrected potential for the three-well potential.