Entropy 2014, 16(1), 350-376; doi:10.3390/e16010350

Characterization of Rare Events in Molecular Dynamics
Carsten Hartmann 1,*, Ralf Banisch 1, Marco Sarich 1, Tomasz Badowski 1 and Christof Schütte 1,2
Institut für Mathematik, Freie Universit¨at Berlin, Arnimallee 6, 14195 Berlin, Germany; E-Mails: ralf.banisch@fu-berlin.de (R.B.); sarich@mi.fu-berlin.de (M.S.); tomasz.badowski@gmail.com (T.B.); schuette@mi.fu-berlin.de (C.S.)
Konrad-Zuse Zentrum, Takustraße 7, 14195 Berlin, Germany
Author to whom correspondence should be addressed; E-Mail: chartman@mi.fu-berlin.de; Tel.: +49(0)30-838-75286.
Received: 13 September 2013; in revised form: 8 October 2013 / Accepted: 22 November 2013 /
Published: 30 December 2013


: A good deal of molecular dynamics simulations aims at predicting and quantifying rare events, such as the folding of a protein or a phase transition. Simulating rare events is often prohibitive, especially if the equations of motion are high-dimensional, as is the case in molecular dynamics. Various algorithms have been proposed for efficiently computing mean first passage times, transition rates or reaction pathways. This article surveys and discusses recent developments in the field of rare event simulation and outlines a new approach that combines ideas from optimal control and statistical mechanics. The optimal control approach described in detail resembles the use of Jarzynski’s equality for free energy calculations, but with an optimized protocol that speeds up the sampling, while (theoretically) giving variance-free estimators of the rare events statistics. We illustrate the new approach with two numerical examples and discuss its relation to existing methods.
rare events; molecular dynamics; optimal pathways; stochastic control; dynamic programming; change of measure; cumulant generating function

1. Introduction

Rare but important transition events between long-lived states are a key feature of many systems arising in physics, chemistry, biology, etc. Molecular dynamics (MD) simulations allow for analysis and understanding of the dynamical behavior of molecular systems. However, realistic simulations for interesting (large) molecular systems in solution on timescales beyond microseconds are still infeasible even on the most powerful general purpose computers. This significantly limits the MD-based analysis of many biological equilibrium processes, because they often are associated with rare events. These rare events require prohibitively long simulations because the average waiting time between the events is orders of magnitude longer than the timescale of the transition characterizing the event itself. Therefore, the straightforward approach to such a problem via direct numerical simulation of the system until a reasonable number of events has been observed is impractically excessive for most interesting systems. As a consequence, rare event simulation and estimation are among the most challenging topics in molecular dynamics.

In this article, we consider typical rare events in molecular dynamics for which conformation changes or protein folding may serve as examples. They can be described in the following abstract way: The molecular system under consideration has the ability to go from a reactant state given by a set A in its state space (e.g., an initial conformation) to a product state described by another set B (e.g., the target conformation). Dynamical transitions from A to B are rare. The general situation we will address is as follows:

  • The system is (meta)stable, with the sets A and B being two of its metastable sets in the sense that if the system is put there, it will remain there for a long time; transitions between A and B are rare events.

  • The sets A and B are separated by an unknown and, in general, rough or diffusive energy landscape (that will be denoted by V).

In addition, we will assume that the system under consideration is in equilibrium with respect to the stationary Gibbs-Boltzmann density
μ ( x ) = 1 Z exp ( β V ( x ) ) .
We are interested in characterizing the transitions leading from A into B, that is, we are interested in the statistical properties of the ensemble of reactive trajectories that go directly from A to B (i.e., start in A without returning to A before going to B). In other words, we are interested in all trajectories comprising the actual transition. We would like to:
  • know which parts of state space such reactive trajectories visit most likely, i.e., where in state space do we find transition pathways or transition channels through which most of the probability current generated by reactive trajectories flows and

  • characterize the rare event statistically, i.e., compute the transition rate, the free energy barrier, the mean first passage time or even more elaborated statistical quantities.

The molecular dynamics literature on rare event simulations is rich. Since the 1930s, transition state theory (TST) [1,2] and extensions thereof based on the reactive flux formalism have provided the main theoretical framework for the description of transition events. TST can, however, at best deliver rates and does not allow one to characterize transition channels. It is based on partitioning the state space into two sets with a dividing surface in between, leaving set A on one side and the target set B on the other, and the theory only tells how this surface is crossed during the reaction. Often, it is difficult to choose a suitable dividing surface, and a bad choice will lead to a very poor estimate of the rate. The TST estimate is then extremely difficult to correct, especially if the rare event is of the diffusive type, where many different reaction channels co-exist. Therefore, many techniques have been proposed that try to go beyond TST.

These different strategies approach the problem by sampling the ensemble of reactive trajectories or by directly searching for the transition channels of the system. Most notable among these techniques are (1) Transition Path Sampling (TPS) [3]; (2) the so-called String Methods [4], or optimal path approaches [57] and variants thereof; (3) techniques that follow the progress of the transition through interfaces, like Forward-Flux Simulation (FFS) [8], Transition Interface Sampling (TIS) [9] or the Milestoning techniques [10,11]; and (4) methods that drive the molecular system by external forces with the aim of making the required transition more frequent while still allowing one to compute the exact rare event statistics for the unforced system, e.g., based on Jarzynski’s and Crook’s identity [12,13]. All of these methods consider the problem in continuous state space, i.e., through reactive trajectories or transition channels in the original state space of the molecular system. They all face substantial problems, e.g., if the ensemble of reactive trajectories and/or transition channels of the system under consideration are too complicated (multi-modal, irregular, essentially high dimensional) or they suffer from too large variance of the underlying statistical estimators. We should moreover stress that each of these methods has its specific scope of application; some methods are mainly useful for computing transition rates, whereas others can be used to compute transition pathways or free energy differences.

Our aim is (A) to review some of these methods based on a joint theoretical basis and (B) to outline a new approach to the estimation of rare event statistics based on a combination of ideas from optimal control and statistical mechanics. In principle, this approach allows for a variance-free estimation of rare event statistics in combination with much reduced simulation time. The rest of the article is organized as follows: We start with a precise characterization of reactive trajectories, transition channels and related quantities in the framework of Transition Path Theory (TPT) in Section 2. Then, in Sections 3 and 4, we discuss the methods from classes (1)–(3) and characterize their potential problems in more detail. In Section 5, we consider methods of type (4) as an introduction to the presentation of the new optimal control approach that is outlined in detail in Sections 6 and 7, including some numerical experiments.

Alternative, inherently discrete methods, like Markov State Modeling, that discretize the state space appropriately and try to compute transition channels and rates a posteriori based on the resulting discrete model of the dynamics will not be discussed herein and are considered in the article [14] in a way related to the presentation at hand. We should further mention that not all rare event problems in molecular dynamics are related to sampling the underlying Gibbs–Boltzmann statistics, e.g., nucleation events under shear [15] or genuine nonequilibrium systems without a stationary probability distribution [16].

2. Reactive Trajectories, Transition Rates and Transition Channels

Since our results are rather general, it is useful to set the stage somewhat abstractly. To this end, we borrow some notation from [17] and consider a system whose state space is ℝn and denote by Xt the current state of the system at time t. For example, Xt may be the set of instantaneous positions and momenta of the atoms of a molecular system. We assume that the system is ergodic with respect to a probability (equilibrium) distribution μ and that we can generate an infinitely long equilibrium trajectory {Xt}t∈ℝ where, for technical reasons, we let the trajectory start at time t = −∞. The trajectory will go infinitely many times from A to B and each time the reaction happens. This reaction involves reactive trajectories that can be defined as follows: Given the trajectory {X(t)}t∈ℝ, we say that its reactive pieces are the segments during which Xt is neither in A or B, came out of A last and will go to B next. To formalize things, let

t AB + ( t ) = smallest  s t such that X ( s ) A B , t AB ( t ) = largest s t such that X ( s ) A B .
Then, the trajectory {X(t)}t≥0 is reactive for all t ∈ R where R ⊂ [0, ∞) is defined by the requirements
X t A B , X t AB + ( t ) B and X t AB ( t ) A
and the ensemble of reactive trajectories is given by the set
= { X t : t R }
where each specific continuous piece of trajectory going directly from A to B in the ensemble belongs to a specific interval [t1, t2] ⊂ R.

Given the ensemble of reactive trajectories, we want to characterize it statistically by answering the following questions:


What is the probability of observing a trajectory at x ∉ (AB) at time t, conditional on tR?


What is the probability current of reactive trajectories? This probability current is the vector field jAB(x) with the property that given any separating surface S between A and B (i.e., the boundary of a region that contains A but not B), the surface integral of jAB over S gives the probability flux of reactive trajectories between A and B across S.


What is the transition rate of the reaction, i.e., what is the mean frequency kAB of transitions from A to B?


Where are the main transition channels used by most of the reactive trajectories?

Question (Q1) can be answered easily, at least theoretically: The probability density to observe any trajectory (reactive or not) at point x is μ(x). Let q(x) be the so-called committor function, that is the probability that the trajectory starting from x reaches first B rather than A. If the dynamics are reversible, then the probability that a trajectory we observe at state x is reactive is q(x)(1 − q(x)), where the first factor appears since the trajectory must go to B rather than A next, and the second factor appears since it needs to come from A rather than B last. Now, the Markov property of the dynamics implies that the probability density to observe a reactive trajectory at point x is

μ AB ( x ) q ( x ) ( 1 q ( x ) ) μ ( x ) ,
which is the probability of observing any trajectory in x times the probability that it will be reactive (the proportionality symbol is used to indicate identity up to normalization).

2.1. Transition Path Theory (TPT)

In order to give answers to the other questions, we will exploit the framework of transition path theory (TPT), which has been developed in [1720] in the context of diffusions and has been generalized to discrete state spaces in [21,22]. In order to review the key results of TPT, let us consider diffusive molecular dynamics in an energy landscape V : ℝn → ℝ:

dX t = V ( X t ) dt + 2 d B t , X 0 = x .
Here, Bt denotes standard n-dimensional Brownian motion, and ∊ > 0 is the temperature of the system. Under mild conditions on the energy landscape function V, we have ergodicity with respect to the stationary distribution μ(x) = Z−1 exp(−βV (x)) with β = 1/∊. The dynamics are reversible with respect to this distribution, i.e., the detailed balance condition holds. We assume throughout that the temperature is small relative to the largest energy barriers, i.e., ∊ ≪ ΔVmax. As a consequence, the relaxation of the dynamics towards equilibrium is dominated by the rare transitions over the largest energy barriers.

For these kind of dynamics, Questions (Q2) and (Q3) have surprisingly simple answers: The reactive probability current is given by

j AB ( x ) = μ ( x ) q ( x )
where ▿q denotes the gradient of the committor function q. Based on this, the transition rate can be computed by the total reactive current across an arbitrary separating surface S:
k AB = S n S ( x ) j AB ( x ) d σ S ( x )
where nS denotes the unit normal vector on S pointing towards B and σS the associated surface element. The rate can also be expressed by
k AB = ( A B ) c ( q ( x ) ) 2 μ ( x ) dx
where (AB)c denotes the entire state space excluding A and B. Given the reactive current, we can even answer Question (Q4): The transition channels of the reaction AB are the regions of (AB)c in which the streamlines of the reactive current, i.e., the solutions of
d dt x AB ( t ) = j AB ( x AB ( t ) ) , x AB ( 0 ) A
are exceptionally dense.

Figure 1 illustrates these quantities for the case of a 2D three well potential with two main wells (the bottoms of which we take as A and B in the following) and a less significant third well. The three main saddle points separating the wells are such that the two saddle points between the main wells and the third well are lower in energy than the saddle point between the main wells, such that in the zero temperature limit, we expect that almost all reactive trajectories take the route through the third well across the two lower saddle points. We observe that the committor functions for low and higher temperatures exhibit smooth isocommittor lines separating the sets A and B, as expected. The transition channels computed from the associated reactive current also show what one should expect: For a lower temperature, the channel through the third well and across the two lower saddle points is dominant, while for a higher temperature, the direct transition from A to B across the higher saddle point is preferred.

These considerations can be generalized to a wide range of different kinds of dynamics in continuous state spaces, including, e.g., full Langevin dynamics, see [1720].

This example illustrates that TPT in principle allows us to quantify all aspects of the transition behavior underlying a rare event. We can compute transition rates exactly and even characterize the transition mechanisms if we can compute the committor function. Deeper insight using the Feynman–Kac formula yields that the committor function can be computed as the solution of a linear boundary value problem, which for diffusive molecular dynamics reads

Lq AB = 0 in ( A B ) c , q AB = 0 in A , q AB = 1 in B
where the generator L has the following form
L = Δ V ( x )
where Δ = i 2 / x i 2 denotes the Laplace operator. This equation allows the computation of qAB in relatively low-dimensional spaces, where the discretization of L is possible based on finite element methods or comparable techniques. In realistic biomolecular state spaces, this is infeasible because of the curse of dimensionality. Therefore, TPT gives a complete theoretical background for rare event simulation, but its application in high dimensional situations is still problematic. As a remedy, a discrete version of TPT has been developed [21,22], which can be used in combination with Markov State Modeling; see [23].

2.2. Transition Path Sampling (TPS)

TPS has been developed in order to sample from the probability distribution of reactive trajectories in so-called “path space”, which means nothing else than the space of all discrete or continuous paths starting in A and ending up in B equipped with the probability distribution generated by the dynamics through the ensemble of associated reactive trajectories. Let PT denote the path measure on the space of discrete or continuous trajectories {Xt}0≤tT of length T. The path measure of reactive trajectories then is

P T AB ( { X t } 0 t T ) = 1 Z AB 1 A ( X 0 ) P T ( { X t } 0 t T ) 1 B ( X T )
where 1A denotes the indicator function of set A (that is, 1A(x) = 0 if xA and = 1 otherwise).

TPS is a Metropolis Monte-Carlo (MC) method for sampling P T AB ({Xt}0≤tT)) that uses explicit information regarding the path measure PT, such as Equation (5), with MC moves that are based on a perturbation of a precomputed reactive trajectory [3,24]. It delivers an ensemble of reactive trajectories of length T that (under the assumption of convergence of the MC scheme) is representative for P T AB and thus allows one to compute respective expectation values, like the probability to observe a reactive trajectory or the reactive current. However, its potential drawbacks are obvious: (1) A typical reactive trajectory is very long and rather uninformative (cf. Figure 1), i.e., the computational effort of generating an entire ensemble of long reactive trajectories can be prohibitive; (2) convergence of the MC scheme in the infinite dimensional path space can be very poor; and (3) the limitation to a pre-defined trajectory length T can lead to biased statistics of the TPS ensemble. Advanced TPS schemes try to remedy these drawbacks by combining the original TPS idea with interface methods [9]. Even though TPS can be used no matter whether the underlying dynamics is deterministic or stochastic, the algorithm is usually used in connection with deterministic Hamiltonian dynamics [3].

3. Finding Transition Channels

Whenever a transition channel exists, one can try to approximate the center curve of the transition channel instead of sampling the ensemble of reactive trajectories. If the center curve (also: principal curve) is a rather smooth object, then such a method would not suffer from the extensive length of reactive trajectories. Several such methods have been introduced; they differ with respect to the definition of the transition channel and the corresponding center or principal curve.

3.1. Action-Based Methods

Rather than sampling the probability distribution of reactive pathways, such as Equation (4), one can try to obtain a representative or dominant pathway, e.g., by computing the pathway that has maximum probability under PT. For the case of diffusive molecular dynamics, the path measure PT has a probability density relative to a (fictitious) uniform measure on the space of all continuous paths in ℝn of length T that are generated by Brownian motion; the relative density reads

( φ ) = exp ( 1 2 I ( φ ) )
where I is the Onsager–Machlup action
I ( φ ) = 0 T { 1 2 | φ ˙ ( s ) | 2 + 1 2 | V ( φ ( s ) ) | 2 Δ V ( φ ( s ) ) } dt .
More precisely, ℓ(φ) is the limiting ratio between the probability that the solution of Equation (2) remains in a small tubular neighborhood of a smooth path φ(·) and the probability that 2 B t remains in a small neighborhood of the initial value x = φ(0), as the size of the neighborhoods go to zero [25].

The fact that the Euler discretization of the path density ℓ, with I interpreted in the sense of Itô integrals, corresponds to the probability density of the Euler-discretized reaction path with respect to Lebesgue measure has led to the idea that by minimizing the Onsager–Machlup action over all continuous paths φ:[0, T ] → ℝn going from A to B, one can find the dominant reactive path φ* = argminφ I (φ) in the sense of a maximum likelihood estimator. The hope is that this path, often also called the optimal path or most probable path, on the one hand, contains information on the transition mechanism and, on the other hand, is much smoother and easier to interpret than a typical reactive trajectory. Note, however, that the actual probability that the solution of Equation (2) remains in a small neighborhood of a given path φ(·) is exponentially small in the size of the neighborhood.

In [7], a comparison between the Onsager–Machlup action and its zero temperature limit has been given using gradient descent methods, raising issues regarding the correct interpretation of the minimizers of I (that need not exist) as most probable paths. In [5], the dominant reaction pathway method has been outlined, which uses a simplified version of the Onsager–Machlup functional that leads to a computationally simpler optimization problem and is applicable to large-scale problems, e.g., protein folding [6]. However, even if the globally dominant pathways can be computed, such that the optimization does not get stuck in local minima, and even if we ignore the issues regarding the correct interpretation of minimizers, the resulting pathways in general do not allow one to gain statistical information on the transition (like rates, currents, mean first passage times).

Another action-based method that has been introduced in [26] is the MaxFlux method, which seeks the path that carries the highest reactive flux among all reactive trajectories of a certain length. The idea is to compute the path of least resistance by minimizing the functional

L ( φ ) = 0 T exp ( 1 V ( φ ( s ) ) ) ds .
Several algorithmic approaches for the minimization of the resistance functional L have been proposed, e.g., a path-based method [27], discretization of the corresponding Euler–Lagrange equation based on a mean-field approximation of it [28] or a Hamilton–Jacobi-based approach using the method of characteristics [29]. Minimizing L for different values of T then yields a collection of paths, each of which carries a certain percentage of the total reactive flux. The method is useful if the temperature is small, so that the reactive flux concentrates around a sufficiently small number of reactive pathways.

3.2. String Method and Variants

There are several other methods that entirely avoid the computation of reactive trajectories, but try to reconstruct the less complex transition channels or pathways instead, analyzing the energy landscape of the system. One group of such techniques, like the Zero Temperature String method [4], the Geometric Minimum Action method [30] or the Nudged Elastic Band method [31], concentrate on the computation of the minimal energy path (MEP), i.e., the path of lowest potential energy between (a point in) A and (a point in) B. Under diffusive molecular dynamics and for vanishing temperature, the MEP is the path that transitions take with probability one [32]. It turns out that the MEP in this case is the minimizer of the Onsager–Machlup action (5) in the limit → 0. For non-zero temperature and a rugged energy landscape, the MEP will in general be not very informative and must be replaced by a finite-temperature transition channel. This is done by the finite-temperature string (FTS) method [33] based on the following considerations: Firstly, the isocommittor surfaces Γα, α [0, 1], of the committor q are taken as natural interfaces that separate A from B. Secondly, each Γα is weighted with the stationary distribution μ to find reactive trajectories crossing it at a certain point x ∈ Γα,

ρ α ( x ) = 1 Z α q ( x ) ( 1 q ( x ) ) μ ( x ) , Z α = Γ α q ( x ) ( 1 q ( x ) ) μ ( x ) d σ α ( x )
The idea of the FTS method is that the ensemble of reactive trajectories can be characterized by this distribution on the isocommittor surfaces. Third, one assumes that for each α, the probability density ρα is peaked in just one point φ(α) and that the curve φ = φ(α), α [0, 1] defined by the sequence of these points forms the center of the (single) transition channel. More precisely, one defines φ(α) = 〈xΓα where the average is taken according to ρα along the respective isocommittor surface Γα. Fourth, it is assumed that the covariance Cα = 〈(xφ(α)) ⊗ (xφ(α))〉Γα, which defines the width of the transition channel, is small, which implies that the isocommittor surfaces can be locally approximated by hyperplanes Pα. The computation of the FTS string φ then is done by approximating it via φ(α) = 〈xPα, where the average is computed by running constrained dynamics on Pα while iteratively refining the hyperplanes Pα; see [34] for details. Later extensions [35] remove the restrictions resulting from the hyperplanes by using Voronoi tessellations instead.

The FTS method allows one to compute single transition channels in rugged energy landscapes as long as these are not too extended and rugged. Compared to methods that sample the ensemble of reactive trajectories, it has the significant advantage that the string, that is, the principal curve inside the transition channel, is rather smooth and short, as compared to the typical reactive trajectories. The FTS further allows one to compute the free energy profile F = F (α) along the string,

F ( α ) = β 1 log P α μ ( x ) d σ α ( x )
that characterizes the transition rates associated with the transition channel (at least in the limits of the approximations invoked by the FTS).

4. Computing Transition Rates

The computation of transition rates can be performed without computing the dominant transition channels or similar objects. There is a list of rather general techniques, with Forward Flux Sampling (FFS) [8], Transition Interface Sampling (TIS) [9] and Milestoning [10] as examples, that approximate transition rates by exploring how the transition progresses from one to the next interface that separates A from B.

4.1. Forward Flux Sampling (FFS)

The first step of FFS is the choice of a finite sequence of interfaces Ik, k = 1, . . .,N, in state space between A and B = IN. The transition rate kAB comes as the product of two factors: (1) the probability current JA of all trajectories leaving A and hitting I1; and (2) the probability

( B | I 1 ) = j = 1 N 1 ( I k + 1 | I k )
that a trajectory that leaves I1 makes it to B before it returns to A; here, ℙ(Ik+1|Ik) denotes the probability that a trajectory starting in Ik makes it to Ik+1 before it returns to A. FFS first performs a brute-force simulation starting in A, which yields an ensemble of points at the first interface I1, yielding an estimate for the flux JA (the number of trajectories hitting I1 per unit of time). Second, a point from this ensemble on I1 is selected at random and used to start a trajectory, which is followed until it either hits the next interface I2 or returns to A; this gives ℙ(I2|I1). This procedure then is iterated from interface to interface. Finally, the rate kAB = JA · ℙ(B|I1) is computed. Variants of this algorithm are described in [36,37], for example.

FFS has been demonstrated to be quite general in approximating the flux of reactive trajectories through a given set of interfaces; it can be applied to equilibrium, as well as nonequilibrium systems, and its implementation is easy (see [16,38]). The interfaces used in FFS are, in principle, arbitrary. However, the efficiency of the sampling of the reactive hitting probabilities ℙ(Ik+1|Ik) crucially depends on the choice of the interfaces. In practice, the efficiency of FFS will drop dramatically if one does not use appropriate surfaces, and totally misleading rates may result from this. Ideally, one would like to choose these surfaces, so that the computational gain offered by FFS in optimized, but in practice, this is not a trivial task; see [39]. The same is true for TIS that couples TPS with progressing from interface to interface.

4.2. Milestoning

Milestoning [10] is similar to FFS in so far as it also uses a set of interfaces Ik, k = 1, . . .,N that separate A and B = IN. In contrast to FFS and TIS, the fundamental quantities in Milestoning are the hitting time distributions K i ±(τ), i = 1, . . .,N − 1, where K i ±(τ) is the probability that a trajectory starting at t = 0 at interface Ii hits Ii±1 before time τ. Trajectories that make it to milestone Ii must come from milestones Ii±1 and vice versa. In the original algorithm, these distributions are approximated as follows [10]: For each milestone Ii, one first samples the distribution μ constrained to Ii. Based on the resulting sample, we start a trajectory from each point, which is terminated when it reaches one of its two neighboring milestones Ii±1. The hitting times are recorded and collected into two distributions K i ±(τ).

These local kinetics are then compiled into the global kinetics of the process: For each i, one defines Pi(t) as the probability that the process is found between Ii−1 and Ii+1 at time t and that the last milestone hit was Ii. Milestoning is based on a (non-Markovian) construction of Pi(t) from the K i ±(τ). Its efficiency comes from two sources: (1) It does not require the computation of long reactive trajectories but only short ones between milestones (which therefore should be ‘close enough’); (2) It is easily parallelizable. Its disadvantage is the dependence on the milestones that have to be chosen in advance: It can be shown that Milestoning with perfect sampling allows one to compute exact transition rates or mean first passage times if the interfaces are given by the isocommittor surfaces (which in general are not known in advance) [40]; if the interfaces are chosen inappropriately, the results can be rather misleading.

5. Nonequilibrium Forcing and Jarzynski’s Identity

The computation of reliable rare event statistics suffers from the enormous lengths of reactive trajectories. One obvious way to overcome this obstacle is to force the system to exhibit the transition of interest on shorter timescales. Therefore, can we drive the molecular system to make the required transition more frequently but still compute the exact rare event statistics for the unforced system?

As was shown by Jarzynski and others, nonequilibrium forcing can in fact be used to obtain equilibrium rare event statistics. The advantage seems to be that the external force can speed up the sampling of the rare events by biasing the equilibrium distribution towards a distribution under which the rare event is no longer rare. We will shortly review Jarzynski’s identity before discussing the matter in more detail.

5.1. Jarzynski’s Identity

Jarzynski’s and Crook’s formulae [12,13] relate the equilibrium Helmholtz free energy to the nonequilibrium work exerted under external forcing: Given a system with energy landscape V (x), the total Helmholtz free energy can be defined as

F = β 1 log Z with Z = exp ( β V ( x ) ) dx .
Jarzynski’s equality [12] then relates the free energy difference ΔF = −β−1 log(Z1/Z0) between two equilibrium states of a system given by an unperturbed energy V0 and its perturbation V1 with the work W applied to the system under the perturbation: Suppose we set Vξ = (1 − ξ)V0 + ξV1 with ξ ∈ [0, 1], and assume we set a protocol that describes how the system evolves from ξ = 0 to ξ = 1. If, initially, the system is distributed according to exp(−βV0), then, by the second law of thermodynamics, it follows that E(W) ≥ ΔF where W is the total work applied to the system and E denotes the average overall possible realizations of the transition from ξ = 0 to ξ = 1; equality is attained if the transition is infinitely slow (i.e., adiabatic). Jarzynski’s identity now asserts that the free energy is always equal to the exponential average of the nonequilibrium work,
Δ F = β 1 log E [ exp ( β W ) ]
arbitrarily far away from the adiabatic regime. Many generalizations exist: In [13], a generalized version of this fluctuation theorem, the so-called Crook’s formula, for stochastic, microscopically reversible dynamics, is derived. In [41,42], it is shown how one can compute conditional free energy profiles along a reaction coordinate for the unperturbed system, rather than total free energy differences between a perturbed and unperturbed system.

Algorithmic application prohibitive. Despite the fact that Jarzynski’s and Crook’s formulae are used in molecular dynamics applications [43], their algorithmic usability is limited by the fact that the likelihood ratio between equilibrium and nonequilibrium trajectories is highly degenerate, and the overwhelming majority of nonequilibrium forcings generate trajectories that have almost zero weight with respect to the equilibrium distribution that is relevant for the rare event. This leads to the fact that most rare event sampling algorithms based on Jarzynski’s identity have prohibitively large variance. Recent developments have reduced this problem by sampling just the reversible work processes based on Crook’s formula, but could not fully remove the problem of large variance [44]; see also [45]. Because of this, we will approach the problem of variance reduction subsequently.

5.2. Cumulant Generating Functions

In order to demonstrate how to improve approaches based on the idea of driving molecular systems to make rare events frequent, we first have to introduce some concepts and notation from statistical mechanics: Let W be a random variable that depends on the sample paths of (Xt)t≥0, i.e., on molecular dynamics trajectories of the system under investigation. Further, let P be the underlying probability measure on the space of continuous trajectories as introduced in Section 2.2 (but without the restriction to a given length T). We define the cumulant generating function (CGF) of W by

γ ( σ ) = σ 1 log E [ exp ( σ W ) ]
where σ is a non-zero scalar parameter and E[f]= ∫f dP denotes the expectation value with respect to P. Note that the CGF is basically the free energy at inverse temperature β as in Jarzynski’s formula, but here, it is considered as a function of the independent parameter σ. (Definition (6) differs from the standard CGF only by the prefactor σ−1 in front.) Taylor expanding the CGF about σ = 0, we observe that γ(σ) ≈ E[W ] σ 2E[(W − E[W ])2]; hence, for sufficiently small σ, the variance is decoupled from the mean. Moreover, it follows by Jensen’s inequality that
γ ( σ ) E [ W ]
where equality is achieved if and only if W is almost surely constant, in accordance with the second law of thermodynamics. (This is the case, e.g., when W is the work associated with an adiabatic transition between thermodynamic equilibrium states.)

Optimal reweighting.

The CGF admits a variational characterization in terms of relative entropies. To this end, let Q be another probability measure, so that P is absolutely continuous with respect to Q, i.e., the likelihood ratio dP/dQ exists and is Q-integrable. Then, using Jensen’s inequality again,

σ 1 log e σ W dP = σ 1 log e σ W + log ( dP dQ ) dQ { W + σ 1 log ( dQ dP ) } dQ ,
which, noting that the logarithmic term is the relative entropy (or Kullback–Leibler divergence) between Q and P, can be recast as
γ ( σ ) W dQ + H ( Q P )
H ( Q P ) = σ 1 log ( dQ dP ) dQ ,
and we declare that H(QP) = ∞ if Q does not have a density with respect to P. Again, it follows from the strict convexity of the exponential function that equality is achieved if and only if the new random variable
Z = W + σ 1 log ( dQ dP )
is Q-almost surely constant. This gives us the following variational characterization of the cumulant generating function that is due to [46]: Variational formula for the cumulant generating function.

Let W be bounded from above, with E[exp(−σW)] < ∞. Then

γ ( σ ) = inf Q P { W dQ + H ( Q P ) }
where the infimum runs over all probability measures Q that have a density with respect to P. Moreover, the minimizer Q* exists and is given by
dQ * = e γ ( σ ) σ W dP .

6. Optimal Driving from Control Theory

When Xt denotes stochastic dynamics, such as Equation (2), the above variational formula admits a nice interpretation in terms of an optimal control problem with a quadratic cost. To reveal it, we first need some technical assumptions.


We define Q = [0, T) × O where T ∈ [0, ∞] and O ⊂ ℝn is a bounded open set with smooth boundary ∂O. Further, let τ < be the stopping time

τ = inf { t > t 0 : ( t , X t ) Q } ,
i.e., τ is the stopping time that either t = T or Xt leaves the set O, whichever comes first.


The random variable W is of the form

W = 1 0 τ f ( X t ) dt + 1 g ( X τ )
for some continuous and nonnegative functions f, g : ℝn → ℝ, which are bounded from above and at most polynomially growing in x (compare Jarzynski’s formula).


The potential V : ℝn → ℝ in Equation (2) is smooth, bounded below and satisfies the usual local Lipschitz and growth conditions.

We consider the conditioned version of the moment generating function (which is just the exponential of the cumulant generating function):
ψ σ ( x , t ) = E [ exp ( σ W ) | X t = x ] .
By the Feynman–Kac theorem, ψσ solves the linear boundary value problem
( 𝒜 σ f ) ψ σ = 0 ψ σ | E + = exp ( σ g )
where E+ is the terminal set of the augmented process (t, Xt), precisely E+ = ([0, T) × ∂O) ∪ ({T} × O), and
𝒜 = t + L
is the backward evolution operator associated with Xt, with the shorthand
L = Δ V
introduced in Equation (3). Assumptions (A1)–(A3) guarantee that Equation (11) has a unique smooth solution ψσ for all σ > 0. Moreover, the stopping time τ is almost surely finite, which implies that
0 < c ψ σ 1
for some constant c ∈ (0, 1).

Log transformation of the cumulant generating function. In order to arrive at the optimal control version of the variational formula (9), we introduce the logarithmic transformation of ψσ as

v σ ( x , t ) = σ log ψ σ ( x , t ) ,
which is analogous to the CGF γ, except for the leading factor ∊ and the dependence on the initial condition x. As we will show below, vσ is related to an optimal control problem. To see this, remember that ψσ is bounded away from zero and note that
σ ψ σ 1 𝒜 ψ σ = 𝒜 v σ σ | v σ | 2 ,
which implies that Equation (11) is equivalent to
𝒜 v σ σ | v σ | 2 + f = 0 v σ | E + = g .
min α n { 𝒜 v σ + α v σ + 1 4 σ | α | 2 + f } = 0 v σ | E + = g
where we have used that
σ | y | 2 = min α n { α y + 1 4 σ | α | 2 } .
(For the general framework of change-of-measure techniques and Girsanov transformations and their relation to logarithmic transformations, we refer to ([47] (Section VI.3)).)

Optimal control problem. Equation (12) is a Hamilton–Jacobi–Bellman (HJB) equation and is recognized as the dynamic programming equation of the following optimal control problem: minimize

J ( u ) = E [ 0 τ { f ( X t ) + 1 4 σ | u t | 2 } dt + g ( X τ ) ]
over a suitable space of admissible control functions u: [0, ∞) → ℝn and subject to the dynamics
d X t = ( u t V ( X t ) ) dt + 2 dW t .

Form of optimal control. In more detail, one can show (e.g., see ([47] (Section IV.2))) that assumptions (A1)–(A3) above imply that Equation (12) has a classical solution (i.e., twice differentiable in x, differentiable in t and continuous at the boundaries). Moreover, vσ satisfies

v σ ( x , t ) = E [ t τ { f ( X s ) + 1 4 σ | u s * | 2 } ds + g ( X τ ) | X t = x ]
where u* is the unique minimizer of J(u) that is given by the Markovian feedback law
u t * = α * ( X t , t ) ,
α * = argmin α n { α v σ + 1 4 σ | α | 2 } .

The function vσ is called the value function or optimal-cost-to-go for the optimal control problems (13) and (14). Specifically, vσ(x, t) measures the minimum cost needed to drive the system to the terminal state when started at x at time t. We briefly mention the two most relevant special cases of (13) and (14).

6.1. Case I: The Exit Problem

We want to consider the limit T → ∞. To this end, call τO = inf{t > 0: XtO} the first exit time of the set O ⊂ ℝn. The stopping time τ = min{T, τO} then converges to τO, i.e.,

min { T , τ O } τ O .
As a consequence (using monotone convergence), vσ converges to the value function of an optimal control problem with cost functional
J ( u ) = E [ 0 τ O { f ( X t ) + 1 4 σ | u t | 2 } dt + g ( X τ O ) ] .
It can be shown that the value function
v σ ( x , t ) = E [ t τ O { f ( X s ) + 1 4 σ | u s * | 2 } ds + g ( X τ ) | X t = x ]
with u* = argmin J(u) is independent of the initial time t; hence, we can drop the dependence on t and redefine vσ(x) ≔ vσ(x, t). The value function now solves the boundary value HJB equation
min α n { L v σ + α v σ + 1 4 σ | α | 2 + f } = 0 v σ | O = g .

6.2. Case II: Finite Time Horizon Optimal Control

If we keep T < ∞ fixed while letting O grow, such that diam(O) → , where diam(O) = sup{r > 0: ℬr(x) ⊂ O, xO} is understood as the maximum radius r > 0 that an open ball ℬr(·) contained in O can have, it follows that

min { T , τ O } τ .
In this case, vσ converges to the value function with a finite time horizon and cost functional
J T ( u ) = E [ 0 τ { f ( X t ) + 1 4 σ | u t | 2 } dt + g ( X T ) ] .
Now, vσ is again a function on ℝn × [0, T] and given by
v σ ( x , t ) = E [ t τ { f ( X s ) + 1 4 σ | u s * | 2 } ds + g ( X τ ) | X t = x ] ,
with u* being the minimizer of JT (u). The value function solves the backward evolution HJB equation
min α n { 𝒜 v σ + α v σ + 1 4 σ | α | 2 + f } = 0 v σ ( x , T ) = g ( x ) ,
with a terminal condition at time t = T.

6.3. Optimal Control Potential and Optimally Controlled Dynamics

The optimal control u* that minimizes the functional in Equation (13) is again of gradient form and given by

u t * = 2 σ v σ ( X t , t )
as can be readily checked by minimizing the corresponding expression in Equation (12) over α. Given vσ, the optimally controlled dynamics reads
d X t = U ( X t , t ) dt + 2 d W t ,
with the optimal control potential
U ( x , t ) = V ( x ) + 2 σ v σ ( x , t ) .
In the case when T (Case I, above), the biasing potential is independent of t.

Remarks. Some remarks are in order.


Monte-Carlo estimators of the conditional CGF

γ ( σ ; x ) = σ 1 log E [ exp ( σ W ) | X 0 = x ]
that are based on the optimally controlled dynamics have zero variance. This is so because the optimal control minimizes the variational expression in Equation (9), but at the minimum, the random variable inside the expectation must be almost surely constant (as a consequence of Jensen’s inequality and the strict convexity of the exponential function). Hence, we have a zero-variance estimator of the conditional CGF.


The reader may now wonder as to whether it is possible to extract single moments from the CGF (e.g., mean first passage times). In general, this question is not straightforward to answer. One of the difficulties is that extracting moments from the CGF requires one to take derivatives at σ = 0, but small values of σ imply strong penalization, which renders the control inactive and, thus, makes the approach inefficient. Another difficulty is that reweighting the controlled trajectories back to the original (equilibrium) path measure can increase the variance of a rare event estimator, as compared to the corresponding estimator based on the uncontrolled dynamics. As yet, the efficient calculation of moments from the CGF by either extrapolation methods or reweighing is an open question and currently a field of active research (see, e.g., [48,49]).


Jarzynski’s identity relates equilibrium free energies to averages that are taken over an ensemble of trajectories generated by controlled dynamics, and the reader may wonder whether the above zero-variance property can be used in connection with free energy computations à la Jarzynski (cf.[45]). Indeed, we can interpret the CGF as the free energy of the nonequilibrium work

W ξ = 0 T f ( X t , ξ t ) dt
where f is the nonequilibrium force exerted on the system under driving it with some prescribed protocol ξ : [0, T ] → ℝ; in this case, the dynamics Xt depend on ξt, as well, and writing down the HJB equation according to Equation (19) is straightforward. However, even if we can solve Equation (19), we do not get zero-variance estimators for the free energy
F ( ξ T ) = F ( ξ 0 ) = β 1 log E [ exp ( β E ξ ) ] .
The reason for this is simple: Jarzynski’s formula requires that the initial conditions are chosen from an equilibrium distribution, say, π0 the equilibrium distribution corresponding to the initial value ξ0 of the protocol, but optimal controls are defined point-wise for each state (t, Xt) and
β 1 log n E [ exp ( β W ξ ) | X 0 = x ] d π 0 ( x ) β 1 n log E [ exp ( β W ξ ) | X 0 = x ] d π 0 ( x ) .
In other words:
F ( ξ T ) = F ( ξ 0 ) n v β ( x , 0 ) d π 0 ( x ) .


A similar argument as the one underlying the derivation of the HJB equation from the linear boundary value problem yields that Jarzynski’s formula can be interpreted as a two-player zero-sum differential game (cf.[50]).

7. Characterize Rare Events by Optimally Controlled MD

Now, we illustrate how to use the results of the last section in practice. We will mainly consider the case discussed in Section 6.1 regarding the statistical characterization of hitting a certain set.

7.1. First Passage Times

Roughly speaking, the CGF encodes information about the moments of any random variable W that is a functional of the trajectories (Xt)t≥0. For example, for f = ∊ and T, we obtain the CGF of the first exit time from O, i.e.,

σ 1 log E x [ exp ( σ τ O ) ] = min u E x u [ τ O + 1 4 σ 0 τ O | u t | 2 dt ]
where we have introduced the shorthand Ex[·] = E[·|X0 = x] to denote the conditional expectation when starting at X0 = x and the superscript “u” to indicate that the expectation is understood with respect to the controlled dynamics
d X t = ( u t V ( X t ) ) dt + 2 d W t
where E = E0 denotes expectation with respect to the unperturbed dynamics.

7.2. Committor Probabilities Revisited

It is not only possible to use the moment generating function to collect statistics about rare events in terms of the cumulant generating function, but also to express the committor function directly in terms of an optimal control problem (see Section 2.1 for the definition of the committor qAB between to sets A and B). To this end, let σ = 1, and suppose we divide ∂O into two sets B ⊂ ∂O and A = ∂O \ B (i.e., τO is the stopping time that is defined by hitting either A or B). Setting

f = 0 and   g ( x ) = log 1 B ( x )
reduces the moment generating function (10) to
ψ 1 ( x ) = E x [ 1 B ( X τ O ) ]
or, in more familiar terms,
ψ 1 ( x ) = P [ X τ O B X τ O A | X 0 = x ] = q AB ( x ) .
According to Equation (16) the corresponding optimal control problem has the cost functional
J ( u ) = E [ 1 4 0 τ O | u s | 2 ds log 1 B ( X τ O ) ] ,
which amounts to a control problem with zero terminal cost when ending up in B and an infinite terminal cost for hitting A. Therefore, the HJB equation for v = v1 has a singular boundary value at A; it reads
min α n { L v + α v + 1 4 | α | 2 } = 0 v | A = , v | B = 0 .

Setting v(x) = − ∊log qAB(x) yields the equality

log q AB ( x ) = min u E x u [ 1 4 0 τ O | u s | 2 ds log 1 B ( X τ O ) ] .
In this case, the optimally controlled dynamics (20) is of the form
d X t = U AB ( X t ) dt + 2 d W t ,
with optimal control potential
U AB ( x ) = V ( x ) = 2 log q AB ( x ) .

Remarks. Some remarks on the committor equation follow:


The logarithmic singularity of the value function at “reactant state” A has the effect that the control will try to avoid running back into A, for there is an infinite penalty on hitting A. In other words, by controlling the system, we condition it on hitting the “product state” B at time t = τO. Conditioning a diffusion (or general Markov) process on an exit state has a strong connection with Doob’s h-transform, which can be considered a change-of-measure transformation of the underlying path measure that forces the diffusion to hit the exit state with probability one [51].


The optimally controlled dynamics has a stationary distribution with a density proportional to

exp ( β U AB ( x ) ) = q AB 2 ( x ) exp ( β V ( x ) )
where we used β = 1/∊.

7.3. Algorithmic Realization

For the exit problem (“Case I”, above), one can find an efficient algorithm for computing the conditional CGF γ(σ;x) or, equivalently, the value function vσ(x) in [52]. The idea of the algorithm is to exploit that, according to Equations (20) and (21), the optimal control is of gradient form. The latter implies that the value function can be represented as a minimization of the cost functional over time-homogeneous candidate functions C for the optimal bias potential, in other words,

v σ ( x ) = min C E x [ 0 τ O { f ( X t ) + 1 4 σ | C t | 2 } dt + g ( X τ O ) ]
where the expectation E is understood with respect to the path measure generated by
d X t = ( C ( X t ) + V ( X t ) ) dt + 2 d W t .
Once the optimal C has been computed, both value function and CGF can be recovered by setting
v σ ( x ) = C ( x ) 2 σ and γ ( σ ; x ) = C ( x ) 2 σ .

The algorithm that finds the optimal C works by iteratively minimizing the cost functional for potentials C from a finite-dimensional ansatz space, i.e.,

C ( x ) = j = 1 M a j φ j ( x ) ,
with appropriately chosen ansatz functions φj. The iterative minimization is then carried out on the M-dimensional coefficient space of the a1, . . . , aM. With this algorithm, we are able to compute the optimal control potential for the exit problem in the two interesting cases: first passage times and committor probabilities (as outlined in Sections 7.1 and 7.2).

Remarks. Let us briefly comment on some aspects of the gradient descent algorithm.


The minimization algorithm for the value function belongs to the class of expectation-maximization algorithms (although, here, we carry out a minimization rather than a maximization), in that each minimization step is followed by a function evaluation that involves computing an expectation. In connection with rare events sampling and molecular dynamics problems, a close relative is the adaptive biasing force (ABF) method for computing free energy profiles, the latter being intimately linked with cumulant generating functions or value functions (cf. Section 5). In ABF methods (or its variants, such as metadynamics or Wang–Landau dynamics), the gradient of the free energy is estimated on the fly, running a molecular dynamics simulation, and then added as a biasing force to accelerate the sampling in the direction of the relevant coordinates [53,54]. The biasing force eventually converges to the derivative of the free energy, which is the optimal bias for passing over the relevant energy barriers that are responsible for the rare events [55].


The number of basis functions needed depends mainly on the roughness of the value function, but is independent of the system dimension. For systems with clear time scale separation, it has been moreover shown [56] that the optimal control is independent of the fast variables; hence, we expect that the algorithm can be efficient, even for large-scale systems, provided that some information about the relevant collective variables and a reasonable initial guess are available. Yet, the question remains: How many basis functions are needed to approximate the optimal control up to a given accuracy? Controlling the error in the value function and the resulting optimal control is particularly important, as a wrong (e.g., suboptimal) bias potential may lead to Monte-Carlo estimators that may have a larger variance than the vanilla rare event estimator, as has been pointed out in [57,58]. The first results in this direction have been obtained in [59], in which error bounds for the CGF for suboptimal controls have been derived, and [60], which discusses the approximation error of the Galerkin approximation of the corresponding HJB equations; see also [61]for a related discussion regarding so-called log-efficient estimators for rare events.

7.4. Numerical Examples

In our first example, we consider diffusive molecular dynamics as in Equation (2) with ∊ = 0.1 and V being the five-well potential shown in Figure 2. We first compute the CGF of the first passage time as discussed in Section 7.1, using the gradient descent algorithm described in Section 7.3 with 10 Gaussian ansatz functions that are centered around the critical points of the potential energy function. The resulting optimal control potential (21) after roughly 20 iterations of the gradient descent is displayed in Figure 2 for different values of σ. As the set O, we take the whole state space, except a small neighborhood of its global minimum of V, so that its complement Oc is identical to the vicinity of the global minimum and the exit time τO is the first passage time to Oc. Figure 2 shows that the optimal control potential alters the original potential V significantly in the sense that for σ > 0, the set Oc is the bottom of the only well of the potential, so that all trajectories starting somewhere else will quickly enter Oc.

This case is instructive: For the unperturbed original dynamics, the mean first passage time Ex(τO) takes values of around 104 for x > −2. For the optimally controlled dynamics, the mean first passage times into Oc are less than five for σ = 0.1, 0.5, 1.0, so that the estimation of Ex(τO) resulting from the optimal control approach requires trajectories that are a factor of at least 103 shorter than the ones we would have to use by direct numerical simulation of the unperturbed dynamics.

Figure 3 shows the optimal control potentials for computation of the committor qAB, as described in Section 7.2. We observe that the optimal control potential exhibits a singularity at the boundary of the basin of attraction of the set A. That is, it prevents the optimally controlled dynamics from entering the basin of attraction of A and, thus, avoids the waste of computational effort by unproductive returns to A.

In our second example, we consider two-dimensional diffusive molecular dynamics as in Equation (2) with the energy landscape V being the three-well potential shown in Figure 1. In Figure 4, the optimal control potential for computing the committors qAB between the two main wells for two different temperatures ∊ = 0.15 and ∊ = 0.6 are displayed. The numerical solution is based on a Galerkin approximation of the log-transformed HJB equation, using precomputed committor functions as the basis set; see [60] for details.

As in our former experiment, we observe that the optimal control potential prevents the dynamics from returning to A; in addition, it flattens the third well significantly, such that the optimally controlled dynamics in any case quickly goes into B. For ∊ = 0.15, a TPS sampling of reactive trajectories between the two main wells, precisely from A to B with A and B, as indicated in Figure 4, results in an average length of 367 for reactive trajectories based on the original dynamics. For the optimally controlled dynamics, we found an average length of 1.3.

8. Conclusions

We have surveyed various techniques for the characterization and computation of rare events occurring in molecular dynamics. Roughly, the approaches fall into two categories: (a) methods that approach the problem by characterizing the ensemble of reactive trajectories between metastable states or (b) path-based methods that target dominant transition channels or pathways by minimization of suitable action functionals. Methods of the first type, e.g., Transition Path Theory, Transition Path Sampling, Milestoning or variants thereof, are predominantly Monte-Carlo-type methods for generating one very long or many short trajectories, from which the rare event statistics can then be estimated. Methods that belong to the second category, e.g., MaxFlux, Nudged-Elastic Band or the String Method, are basically optimization methods (sometimes combined with a Monte-Carlo scheme); here, the objectives are few (single or multiple) smooth pathways that describe, e.g., a transition event. It is clear that this classification is not completely unambiguous, in that action-based methods for computing most probable pathways can be also used to sample an ensemble of reactive trajectories. Another possible classification (with its own drawbacks) is along the lines of the biased-unbiased dichotomy that distinguishes between methods that characterize rare events based on the original dynamics and methods that bias the underlying equilibrium distribution towards a new probability distribution under which the rare events are no longer rare. Typical representatives of the second class range from biasing force methods, such as ABF or metadynamics, up to genuine nonequilibrium approaches based on Jarzynski’s identity for computing free energy profiles. The problem often is that rare event estimators based on an ensemble of nonequilibrium trajectories suffer from large variances, unless the bias is cleverly chosen.

We have described a strategy to find such a cleverly chosen perturbation, based on ideas from optimal control. The idea rests on the fact that the cumulant generating function of a certain observable, e.g., the first exit time from a metastable set, can be expressed as the solution to an optimal control problem, which yields a zero variance estimator for the cumulant generating function. The control acting on the system has essentially two effects: (1) Under the controlled dynamics, the rare events are no longer rare, as a consequence of which the simulations become much shorter; (2) The variance of the statistical estimators is small (or even zero if the optimal control is known exactly). We should stress that, depending on the type of observable, the approach only appears to be a nonequilibrium method, for the optimal control is an exact gradient of a biasing potential; hence, the optimally perturbed system satisfies a detailed balance, which is one criterion for thermodynamic equilibrium. Future research should address the question as to whether the approach is competitive for realistic molecular systems, how to efficiently and robustly extract information about specific moments rather than cumulant generating functions and how to extend it to the more general observables or the calculation of free energy profiles.

The authors are grateful to Eric Vanden-Eijnden, Giovanni Ciccotti, Frank Pinski and Christoph Dellago for valuable discussions and comments. Ralf Banisch and Tomasz Badowski hold scholarships from the Berlin Mathematical School (BMS). This work was supported by the DFG Research Center “Mathematics for key technologies” (MATHEON) in Berlin.

Conflicts of Interest

The authors declare no conflict of interest.


  1. Eyring, H. The activated complex in chemical reactions. J. Chem. Phys 1935, 3, 107–115, doi:10.1063/1.1749604.
  2. Wigner, E. Calculation of the rate of elementary association reactions. J. Chem. Phys 1937, 5, 720–725, doi:10.1063/1.1750107.
  3. Bolhuis, P.G.; Chandler, D.; Dellago, C.; Geissler, P. Transition path sampling: Throwing ropes over rough mountain passes, in the dark. Annu. Rev. Phys. Chem 2002, 59, 291–318.
  4. E, W.; Ren, W.; Vanden-Eijnden, E. String method for the study of rare events. Phys. Rev. B 2002, 66, 052301.
  5. Beccara, S.; Skrbic, T.; Covino, R.; Faccioli, P. Dominant folding pathways of a WW domain. Proc. Natl. Acad. Sci. USA 2012, 109, 2330–2335, doi:10.1073/pnas.1111796109.
  6. Faccioli, P.; Lonardi, A.; Orland, H. Dominant reaction pathways in protein folding: A direct validation against molecular dynamics simulations. J. Chem. Phys 2010, 133, 045104, doi:10.1063/1.3459097.
  7. Pinski, F.; Stuart, A. Transition paths in molecules: Gradient descent in path space. J. Chem. Phys 2010, 132, 184104, doi:10.1063/1.3391160.
  8. Allen, R.; Warren, P.; ten Wolde, P.R. Sampling rare switching events in biochemical networks. Phys. Rev. Lett 2005, 94, 018104, doi:10.1103/PhysRevLett.94.018104.
  9. Moroni, D.; van Erp, T.; Bolhuis, P. Investigating rare events by transition interface sampling. Physica A 2004, 340, 395–401, doi:10.1016/j.physa.2004.04.033.
  10. Faradjian, A.K.; Elber, R. Computing time scales from reaction coordinates by milestoning. J. Chem. Phys 2004, 120, 10880–10889, doi:10.1063/1.1738640.
  11. Olender, R.; Elber, R. Calculation of classical trajectories with a very large time step: Formalism and numerical examples. J. Chem. Phys 1996, 105, 9299–9315, doi:10.1063/1.472727.
  12. Jarzynski, C. Nonequilibrium equality for free energy differences. Phys. Rev. Lett 1997, 78, 2690–2693, doi:10.1103/PhysRevLett.78.2690.
  13. Crooks, G. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys. Rev. E 1999, 60, 2721–2726, doi:10.1103/PhysRevE.60.2721.
  14. Sarich, M.; Bansich, R.; Hartmann, C.; Schütte, C. Markov state models for rare events in molecular dynamics. Entropy 2014, 16, 258–286.
  15. Allen, R.J.; Valeriani, C.; Tanase-Nicola, S.; ten Wolde, P.R.; Frenkel, D. Homogeneous nucleation under shear in a two-dimensional Ising model: Cluster growth, coalescence, and breakup. J. Chem. Phys 2008, 129, 134704, doi:10.1063/1.2981052.
  16. Berryman, J.T.; Schilling, T. Sampling rare events in nonequilibrium and nonstationary systems. J. Chem. Phys 2010, 133, 244101, doi:10.1063/1.3525099.
  17. E, W.; Vanden-Eijnden, E. Towards a theory of transition paths. J. Stat. Phys 2006, 123, 503–523, doi:10.1007/s10955-005-9003-9.
  18. E, W.; Vanden-Eijnden, E. Metastability, Conformation Dynamics, and Transition Pathways in Complex Systems. In Multiscale Modeling and Simulation; Springer: Berlin, Germany, 2004; pp. 35–68.
  19. Metzner, P.; Schütte, C.; Vanden-Eijnden, E. Illustration of transition path theory on a collection of simple examples. J. Chem. Phys 2006, 125, 084110, doi:10.1063/1.2335447.
  20. E, W.; Vanden-Eijnden, E. Transition-path theory and path-finding algorithms for the study of rare events. Annu. Rev. Phys. Chem 2010, 61, 391–420, doi:10.1146/annurev.physchem.040808.090412.
  21. Metzner, P.; Schütte, C.; Vanden-Eijnden, E. Transition path theory for Markov jump processes. Multiscale Model. Sim 2009, 7, 1192–1219, doi:10.1137/070699500.
  22. Metzner, P. Transition Path Theory for Markov Processes: Application to Molecular DynamicsPh.D. Thesis, Freie Universit¨at Berlin, Berlin, Germany. 2007.
  23. Noé, F.; Schütte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. Constructing the full ensemble of folding pathways from short off-equilibrium trajectories. Proc. Natl. Acad. Sci. USA 2009, 106, 19011–19016, doi:10.1073/pnas.0905466106.
  24. Chandler, D. Finding Transition Pathways: Throwing Ropes over Rough Montain Passes, in the Dark; World Scientific: Singapore, Singapore, 1998.
  25. Dürr, D.; Bach, A. The Onsager-Machlup function as Lagrangian for the most probable path of a diffusion process. Commun. Math. Phys 1978, 60, 153–170, doi:10.1007/BF01609446.
  26. Berkowitz, M.; Morgan, J.D.; McCammon, J.A.; Northrup, S.H. Diffusion-controlled reactions: A variational formula for the optimum reaction coordinate. J. Chem. Phys 1983, 79, 5563–5565, doi:10.1063/1.445675.
  27. Huo, S.; Straub, J.E. The MaxFlux algorithm for calculating variationally optimized reaction paths for conformational transitions in many body systems at finite temperature. J. Chem. Phys 1997, 107, 5000–5006, doi:10.1063/1.474863.
  28. Zhao, R.; Shen, J.; Skeel, R.D. Maximum flux transition paths of conformational change. J. Chem. Theory Comput 2010, 6, 2411–2423, doi:10.1021/ct900689m.
  29. Cameron, M. Estimation of reactive fluxes in gradient stochastic systems using an analogy with electric circuits. J. Comput. Phys 2013, 247, 137–152, doi:10.1016/j.jcp.2013.03.054.
  30. Vanden-Eijnden, E.; Heymann, M. The geometric minimum action method for computing minimum energy paths. J. Chem. Phys 2008, 128, 061103, doi:10.1063/1.2833040.
  31. Jonsson, H.; Mills, G.; Jacobsen, K. Nudged Elastic Band Method for Finding Minimum Energy Paths of Transitions. In Classical and Quantum Dynamics in Condensed Phase Simulations; Berne, B.J., Ciccotti, G., Coker, D.F., Eds.; World Scientific: Singapore, Singapore, 1998.
  32. Freidlin, M.; Wentzell, A.D. Random Perturbations of Dynamical Systems; Springer: New York, NY, USA, 1998.
  33. E, W.; Ren, W.; Vanden-Eijnden, E. Finite temperature string method for the study of rare events. J. Phys. Chem. B 2005, 109, 6688–6693, doi:10.1021/jp0455430.
  34. Ren, W.; Vanden-Eijnden, E.; Maragakis, P; E, W. Transition pathways in complex systems: Application of the finite-temperature string method to the alanine dipeptide. J. Phys. Chem 2005, 123, 134109, doi:10.1063/1.2013256.
  35. Vanden-Eijnden, E.; Venturoli, M. Revisiting the finite temperature string method for the calculation of reaction tubes and free energies. J. Phys. Chem 2009, 130, 194103, doi:10.1063/1.3130083.
  36. Allen, R.; Frenkel, D.; ten Wolde, P.R. Simulating rare events in equilibrium or nonequilibrium stochastic systems. J. Chem. Phys 2006, 124, 024102, doi:10.1063/1.2140273.
  37. Allen, R.; Frenkel, D.; ten Wolde, P.R. Forward flux sampling-type schemes for simulating rare events: Efficiency analysis. J. Chem. Phys 2006, 124, 194111, doi:10.1063/1.2198827.
  38. Becker, N.B.; Allen, R.J.; ten Wolde, P.R. Non-stationary forward flux sampling. J. Chem. Phys 2012, 136, 174118, doi:10.1063/1.4704810.
  39. Kratzer, K.; Arnold, A.; Allen, R.J. Automatic, optimized interface placement in forward flux sampling simulations. J. Chem. Phys 2013, 138, 164112, doi:10.1063/1.4801866.
  40. Vanden-Eijnden, E.; Venturoli, M.; Ciccotti, G.; Elber, R. On the assumptions underlying milestoning. J. Chem. Phys 2008, 129, 174102, doi:10.1063/1.2996509.
  41. Latorre, J.; Hartmann, C.; Schütte, C. Free energy computation by controlled Langevin processes. Procedia Comput. Sci 2010, 1, 1591–1600.
  42. Lelièvre, T.; Stoltz, G.; Rousset, M. Free Energy Computations: A Mathematical Perspective; Imperial College Press: London, UK, 2010.
  43. Isralewitz, B.; Gao, M.; Schulten, K. Steered molecular dynamics and mechanical functions of proteins. Curr. Opin. Struct. Biol 2001, 11, 224–230, doi:10.1016/S0959-440X(00)00194-9.
  44. Vaikuntanathan, S.; Jarzynski, C. Escorted free energy simulations: Improving convergence by reducing dissipation. Phys. Rev. Lett 2008, 100, 190601, doi:10.1103/PhysRevLett.100.190601.
  45. Oberhofer, H.; Dellago, C. Optimum bias for fast-switching free energy calculations. Comput. Phys. Commun 2008, 179, 41–45, doi:10.1016/j.cpc.2008.01.017.
  46. Dai Pra, P.; Meneghini, L.; Runggaldier, W. Connections between stochastic control and dynamic games. Math. Control Signals Syst 1996, 9, 303–326, doi:10.1007/BF01211853.
  47. Fleming, W.; Soner, H. Controlled Markov Processes and Viscosity Solutions; Springer: New York, NY, USA, 2006.
  48. Awad, H.; Glynn, P.; Rubinstein, R. Zero-variance importance sampling estimators for markov process expectations. Math. Oper. Res 2013, 38, 358–388, doi:10.1287/moor.1120.0569.
  49. Badowski, T. Importance Sampling Using Discrete Girsanov EstimatorsPh.D. Thesis, Freie Universit¨at Berlin, Berlin, Germany. 2013. in preparation.
  50. Fleming, W. Risk sensitive stochastic control and differential games. Commun. Inf. Syst 2006, 6, 161–178.
  51. Day, M. Conditional exits for small noise diffusions with characteristic boundary. Ann. Probab 1992, 20, 1385–1419, doi:10.1214/aop/1176989696.
  52. Hartmann, C.; Schütte, C. Efficient rare event simulation by optimal nonequilibrium forcing. J. Stat. Mech. Theor. Exp 2012, 2012, P11004, doi:10.1088/1742-5468/2012/11/P11004.
  53. Darve, E.; Rodriguez-Gomez, D.; Pohorille, A. Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys 2008, 128, 144120, doi:10.1063/1.2829861.
  54. Lelièvre, T.; Rousset, M.; Stoltz, G. Computation of free energy profiles with parallel adaptive dynamicss. J. Chem. Phys 2007, 126, 134111, doi:10.1063/1.2711185.
  55. Lelièvre, T.; Rousset, M.; Stoltz, G. Long-time convergence of an adaptive biasing force methods. Nonlinearity 2008, 21, 1155–1181, doi:10.1088/0951-7715/21/6/001.
  56. Zhang, W.; Latorre, J.; Pavliotis, G.; Hartmann, C. Optimal control of multiscale diffusions using reduced-order models. J. Comput. Dynam 2013. submitted.
  57. Dupuis, P.; Wang, H. Importance sampling, large deviations, and differential games. Stochast. Stochast. Rep 2004, 76, 481–508, doi:10.1080/10451120410001733845.
  58. Dupuis, P.; Wang, H. Subsolutions of an isaacs equation and efficient schemes for importance sampling. Math. Oper. Res 2007, 32, 723–757, doi:10.1287/moor.1070.0266.
  59. Zhang, W.; Hartmann, C.; Weber, M.; Schütte, C. Importance sampling in path space for diffusion processes. Multiscale Model. Sim 2013. submitted.
  60. Banisch, R.; Hartmann, C. Meshless Discretizations of LQ-type stochastic control problems. SIAM J. Control Optim 2013. submitted.
  61. Vanden-Eijnden, E.; Weare, J. Rare event simulation of small noise diffusions. Commun. Pure Appl. Math 2012, 65, 1770–1803, doi:10.1002/cpa.21428.
Entropy 16 00350f1 200
Figure 1. (Top left panel) Three-well energy landscape V as described in the text. (Top right panel) Typical reactive trajectory in the three-well landscape. (Middle left panel) Committor functions qAB for diffusion molecular dynamics with relatively high temperature ∊ = 0.6 for the sets A (main well, right-hand side) and B (main well, left-hand side). (Middle right panel) Committor qAB for the low temperature case ∊ = 0.15. (Bottom left panel) Transition channels for ∊ = 0.6. (Bottom right panel) Transition channels for ∊ = 0.15. For details of the computations underlying the pictures, see [22].

Click here to enlarge figure

Figure 1. (Top left panel) Three-well energy landscape V as described in the text. (Top right panel) Typical reactive trajectory in the three-well landscape. (Middle left panel) Committor functions qAB for diffusion molecular dynamics with relatively high temperature ∊ = 0.6 for the sets A (main well, right-hand side) and B (main well, left-hand side). (Middle right panel) Committor qAB for the low temperature case ∊ = 0.15. (Bottom left panel) Transition channels for ∊ = 0.6. (Bottom right panel) Transition channels for ∊ = 0.15. For details of the computations underlying the pictures, see [22].
Entropy 16 00350f1 1024
Entropy 16 00350f2 200
Figure 2. Five-well potential (left) and associated optimal control potential for the first passage time to the target set Oc given by a small interval around the main minimum x1 (right) for different values of σ (right). ∊ = 0.1; the gradient descent solution fully agrees with the reference finite element solution (that is not shown) in the “eye-norm”.

Click here to enlarge figure

Figure 2. Five-well potential (left) and associated optimal control potential for the first passage time to the target set Oc given by a small interval around the main minimum x1 (right) for different values of σ (right). ∊ = 0.1; the gradient descent solution fully agrees with the reference finite element solution (that is not shown) in the “eye-norm”.
Entropy 16 00350f2 1024
Entropy 16 00350f3 200
Figure 3. Optimally-corrected potential for the case of J being the committor qAB for B being the ±0.1-interval around the main minimum x1 of the potential. (Left panel) A =]x3 0.1,x3 +0.1[ the ±0.1 interval around the highest minimum x3. (Right panel) A =]x2 0.1,x2 +0.1[ the ±0.1 interval around the second lowest minimum x2.

Click here to enlarge figure

Figure 3. Optimally-corrected potential for the case of J being the committor qAB for B being the ±0.1-interval around the main minimum x1 of the potential. (Left panel) A =]x3 0.1,x3 +0.1[ the ±0.1 interval around the highest minimum x3. (Right panel) A =]x2 0.1,x2 +0.1[ the ±0.1 interval around the second lowest minimum x2.
Entropy 16 00350f3 1024
Entropy 16 00350f4 200
Figure 4. Optimally-corrected potential for the three-well potential shown in Figure 1 for the committor qAB for the medium temperature ∊ = 0.6 case (left), the low temperature ∊ = 0.15 case (right) and for the sets A (ellipse in main well, right-hand side) and B (ellipse in main well, left-hand side). Note that the committor basis is not smooth at the boundaries of the initial and target sets (see Figure 1 for comparison), which explains the roughness of the control potential in the neighborhood of the sets A and B.

Click here to enlarge figure

Figure 4. Optimally-corrected potential for the three-well potential shown in Figure 1 for the committor qAB for the medium temperature ∊ = 0.6 case (left), the low temperature ∊ = 0.15 case (right) and for the sets A (ellipse in main well, right-hand side) and B (ellipse in main well, left-hand side). Note that the committor basis is not smooth at the boundaries of the initial and target sets (see Figure 1 for comparison), which explains the roughness of the control potential in the neighborhood of the sets A and B.
Entropy 16 00350f4 1024
Entropy EISSN 1099-4300 Published by MDPI AG, Basel, Switzerland RSS E-Mail Table of Contents Alert