Enhanced Sampling in Molecular Dynamics Using Metadynamics, Replica-Exchange, and Temperature-Acceleration

We review a selection of methods for performing enhanced sampling in molecular dynamics simulations. We consider methods based on collective variable biasing and on tempering, and offer both historical and contemporary perspectives. In collective-variable biasing, we first discuss methods stemming from thermodynamic integration that use mean force biasing, including the adaptive biasing force algorithm and temperature acceleration. We then turn to methods that use bias potentials, including umbrella sampling and metadynamics. We next consider parallel tempering and replica-exchange methods. We conclude with a brief presentation of some combination methods.

The discussion so far leaves open the correct way to compute the local free-energy gradients. A gradient is a local quantity, so a natural choice is to compute it from an MD simulation localized at a point in CV space by a constraint. Consider a long MD simulation with a holonomic constraint fixing the system at the point z. Uniform samples from this constrained trajectory x(t) then represent an ensemble at fixed z over which the averaging needed to convert gradients in potential energy to gradients in free energy could be done. However, this constrained ensemble has the undesired property that the velocities θ̇ are zero. This is problematic because virtually none of the samples plucked from a long unconstrained MD simulation (as is implied by Eq. 1) would have θ̇ = 0, and θ̇ = 0 acts as a set of M unphysical constraints on the system velocities ẋ, since θ̇_j = Σ_i (∂θ_j/∂x_i) ẋ_i. Probably the best-known example of a method to correct for this bias is the so-called "blue-moon" sampling method [12-15], or the constrained-ensemble method [16,17]. The essence of the method is a decomposition of free-energy gradients into components along the CV gradients and thermal components orthogonal to them:

∂F/∂z_j = ⟨ b_j · ∇V − k_B T ∇ · b_j ⟩_{θ(x)=z},    (4)

where ⟨·⟩_{θ(x)=z} denotes averaging across samples drawn uniformly from the MD simulation constrained at θ(x) = z, and b_j(x) is the vector field orthogonal to the gradients of every component k of θ other than θ_j, in the sense that

b_j · ∇θ_k = δ_jk,

where δ_jk is the Kronecker delta. (For brevity, we have omitted the consideration of holonomic constraints other than that on the CV; the reader is referred to the paper by Ciccotti et al. for details [15].) The vector fields b_j for each θ_j can be constructed by orthogonalization.
The adaptive biasing force (ABF) method [35] rests on two ideas. The first is that running estimates of the mean forces can be subtracted from the forces on the atomic coordinates, and balancing those forces against their exact opposite should allow thermal motion to take the system out of free-energy minima. The second idea is a bit more subtle; after all, in a running MD simulation with no CV constraints, the constrained-ensemble expression for the mean force (Eq. 4) does not directly apply, because a constrained ensemble is not what is being sampled. However, Darve et al. showed how to relate these ensembles so that the samples generated in the MD simulation could be used to build mean forces [35]. Further, they showed, using a clever choice of the fields of Eq. 4, an equivalence between (i) the spatial gradients needed to compute forces and (ii) time-derivatives of the CVs [6]:

∂F/∂z_j = −⟨ d/dt (M_θ θ̇)_j ⟩_{θ(x)=z},

where M_θ is the transformed mass matrix given by

M_θ = (J_θ M⁻¹ J_θᵀ)⁻¹,

where J_θ is the M×3N matrix with elements ∂θ_i/∂x_j (i = 1, …, M; j = 1, …, 3N) and M is the diagonal matrix of atomic masses.

Both blue-moon sampling and ABF are based on statistics in the constrained ensemble. However, estimation of mean forces need not use only this ensemble. One can instead relax the constraint and work with a "mollified" version of the free energy:

F_κ(z) = −β⁻¹ ln ⟨ δ_κ [θ(x) − z] ⟩,    (8)

where δ_κ refers to the Gaussian (or "mollified delta function")

δ_κ(y) = (βκ/2π)^{M/2} exp( −βκ|y|²/2 ),

where β is just shorthand for 1/k_B T. Since lim_{βκ→∞} δ_κ = δ, we know that lim_{βκ→∞} F_κ = F. One way to view this Gaussian is that it "smoothes out" the true free energy to a tunable degree; the factor 1/√(βκ) is a length-scale in CV space below which details are smeared.

Because the Gaussian has continuous gradients, it can be used directly in an MD simulation. Suppose we have a CV space θ(x), and we extend our MD system to include variables z such that the combined set (x, z) obeys the following extended potential:

U(x, z) = V(x) + (κ/2)|θ(x) − z|²,

where V(x) is the interatomic potential and κ is a constant. Clearly, if we fix z, then the resulting free energy is, to within an additive constant, the mollified free energy of Eq. 8.
(The additive constant is related to the prefactor of the mollified delta function and is independent of z.) Further, we can directly express the gradient of this mollified free energy with respect to z [54]:

∂F_κ/∂z_j = κ ⟨ z_j − θ_j(x) ⟩_z,    (11)

where ⟨·⟩_z denotes averaging in the extended ensemble at fixed z. This suggests that, instead of using constrained-ensemble MD to accumulate mean forces, we could work in the restrained ensemble and get very good approximations to the mean force. By "restrained", we refer to the fact that the term giving rise to the mollified delta function in the configurational integral is essentially a harmonic restraining potential with a "spring constant" κ.

Temperature-accelerated MD (TAMD) [7] takes advantage of the restrained-ensemble approach to directly evolve the variables z in such a way as to accelerate the sampling of CV space. First, consider how the atomic variables x evolve under the extended potential (assuming Langevin dynamics):

m_i ẍ_i = −∂/∂x_i [ V(x) + (κ/2)|θ(x) − z|² ] − γ m_i ẋ_i + η_i(t; β).    (12)

Here, m_i is the mass of x_i, γ is the friction coefficient for the Langevin thermostat, and η is the thermostat white noise satisfying the fluctuation-dissipation theorem at the physical temperature β⁻¹:

⟨η_i(t) η_j(t′)⟩ = 2 γ m_i β⁻¹ δ_ij δ(t − t′).

Key to TAMD is that the z are treated as slow variables that evolve according to their own equations of motion, which here we take as diffusive (though other choices are possible [7]):

γ̄ ż_j = κ [θ_j(x) − z_j] + ξ_j(t; β̄),

⟨ξ_j(t) ξ_k(t′)⟩ = 2 γ̄ β̄⁻¹ δ_jk δ(t − t′).

Here, γ̄ is a fictitious friction, the first term on the right-hand side represents the instantaneous force on variable z_j, and the second term represents thermal noise at the fictitious temperature β̄⁻¹. The advantage of TAMD is that if (1) γ̄ is chosen sufficiently large so as to guarantee that the slow variables indeed evolve slowly relative to the fundamental variables, and (2) κ is sufficiently large such that θ(x(t)) ≈ z(t) at any given time, then the force acting on z is approximately equal to minus the gradient of the free energy (Eq. 11) [7].
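To make the scheme concrete, the sketch below integrates overdamped TAMD-like dynamics for a hypothetical one-dimensional toy model (double-well potential, CV θ(x) = x). All parameter values are illustrative choices, not recommendations; the point is only the structure of the coupled x and z updates at two different temperatures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D model: V(x) = (x^2 - 1)^2, CV theta(x) = x
def dVdx(x):
    return 4.0 * x * (x * x - 1.0)

kappa   = 200.0   # restraint "spring constant" (keeps theta(x) close to z)
beta    = 4.0     # physical inverse temperature (barrier = 4 kT)
beta_z  = 1.0     # fictitious inverse temperature for z (hotter than physical)
gamma_x = 1.0     # friction on x (overdamped dynamics for simplicity)
gamma_z = 10.0    # larger fictitious friction, so z evolves slowly
dt      = 1e-4

x, z = -1.0, -1.0
traj_z = []
for step in range(200_000):
    # x feels V(x) + (kappa/2)(x - z)^2, thermostatted at beta
    fx = -dVdx(x) - kappa * (x - z)
    x += fx * dt / gamma_x + np.sqrt(2.0 * dt / (beta * gamma_x)) * rng.normal()
    # z feels only the instantaneous restraint force kappa*(theta(x) - z),
    # thermostatted at the higher fictitious temperature 1/beta_z
    fz = kappa * (x - z)
    z += fz * dt / gamma_z + np.sqrt(2.0 * dt / (beta_z * gamma_z)) * rng.normal()
    traj_z.append(z)

traj_z = np.array(traj_z)
print(traj_z.min(), traj_z.max())  # z explores the landscape at temperature 1/beta_z
```

The key design point is the asymmetry: x equilibrates quickly around the slowly moving z, so the restraint force on z time-averages to (approximately) minus the free-energy gradient.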
This is because the MD integration repeatedly samples κ [θ(x) − z] for an essentially fixed (but actually very slowly moving) z, so the z evolution effectively feels these samples as a mean force. In other words, the dynamics of z(t) is effectively

γ̄ ż_j = −∂F(z)/∂z_j + ξ_j(t; β̄).

This shows that the z-dynamics describes an equilibrium constant-temperature ensemble at the fictitious temperature β̄⁻¹ acted on by the "potential" F(z), which is the free energy evaluated at the physical temperature β⁻¹.

In the previous section, we considered methods that achieve enhanced sampling by using mean forces:

in TI, these are integrated to reconstruct a free energy; in ABF, these are built on-the-fly to drive uniform CV sampling; and in TAMD, these are used on-the-fly to guide accelerated evolution of CVs. In this section, we consider methods that achieve enhanced sampling by means of controlled bias potentials. As a class, we refer to these as non-Boltzmann sampling methods.

Non-Boltzmann sampling is generally a way to derive statistics on a system whose energetics differ from the energetics used to perform the sampling. Imagine we have an MD system with bare interatomic potential V(x), and we add a bias ∆V(x) to arrive at a biased total potential:

V_b(x) = V(x) + ∆V(x).

The statistics of the CVs on this biased potential are then given as

P_b(z) = ⟨ δ[θ(x) − z] e^{−β∆V} ⟩ / ⟨ e^{−β∆V} ⟩,

where ⟨·⟩ denotes ensemble averaging on the unbiased potential V(x). Further, if we take the bias potential ∆V to be explicitly a function only of the CVs θ, then it becomes invariant in the averaging of the numerator thanks to the delta function, and we have

P_b(z) = e^{−β∆V(z)} ⟨ δ[θ(x) − z] ⟩ / ⟨ e^{−β∆V(θ(x))} ⟩.

Finally, since the unbiased statistics are P(z) = ⟨ δ[θ(x) − z] ⟩, we arrive at

P(z) = P_b(z) e^{β∆V(z)} ⟨ e^{−β∆V(θ(x))} ⟩.    (19)

Taking samples from an ergodic MD simulation on the biased potential V_b, Eq. 19 provides the recipe for reconstructing the statistics the CVs would present were they generated using the unbiased potential V. However, the probability P(z) is implicit in this equation, because

⟨ e^{−β∆V(θ(x))} ⟩ = ∫ dz P(z) e^{−β∆V(z)}.

This is not really a problem, since we can treat ⟨e^{−β∆V}⟩ as a constant we can get from normalizing P_b(z) e^{β∆V(z)}.
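The unbiasing recipe of Eq. 19 can be illustrated with a toy sketch, assuming a hypothetical 1D double well with θ(x) = x and a bias chosen to flatten the barrier; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.0

# Hypothetical 1D example: V(x) = x^4 - 2x^2 (double well), CV theta(x) = x.
# Bias dV(x) = 2x^2 removes the barrier entirely (biased potential = x^4).
V  = lambda x: x**4 - 2.0 * x**2
dV = lambda x: 2.0 * x**2

# Metropolis Monte Carlo on the *biased* potential V + dV
x, samples = 0.0, []
for step in range(200_000):
    xp = x + 0.5 * rng.normal()
    if rng.random() < np.exp(-beta * (V(xp) + dV(xp) - V(x) - dV(x))):
        x = xp
    samples.append(x)
samples = np.array(samples[2000:])   # discard burn-in

# Histogram the biased statistics P_b(z), then unbias:
# P(z) ~ P_b(z) e^{+beta dV(z)}; the unknown constant <e^{-beta dV}>
# is fixed simply by normalizing the result.
hist, edges = np.histogram(samples, bins=60, range=(-2, 2), density=True)
z = 0.5 * (edges[:-1] + edges[1:])
P = hist * np.exp(beta * dV(z))
P /= P.sum() * (z[1] - z[0])

# Exact unbiased density for comparison
P_exact = np.exp(-beta * V(z))
P_exact /= P_exact.sum() * (z[1] - z[0])
print(np.max(np.abs(P - P_exact)))   # small when sampling is adequate
```

Note how the barrier region, rarely visited by unbiased sampling, is well populated in the biased run and still recovered correctly after reweighting.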

How does one choose ∆V so as to enhance the sampling of CV space? Evidently, from the standpoint of non-Boltzmann sampling, the closer the bias potential is to the negative free energy −F(z), the more uniform the sampling of CV space will be. To wit, if ∆V(z) = −F(z), then e^{β∆V(z)} = e^{−βF(z)} ∝ P(z), and Eq. 19 can be inverted for P_b to yield

P_b(z) = const.

So we see that taking the bias potential to be the negative free energy makes all states z in CV space equiprobable. This is indeed the limit toward which ABF strives by applying negative mean forces.

Umbrella sampling is the standard way of using non-Boltzmann sampling to overcome free-energy barriers. In its debut [69], umbrella sampling used a function w(x) that weights hard-to-sample configurations, equivalent to adding a bias potential of the form

∆V(x) = −β⁻¹ ln w(x).

The function w is found by trial-and-error such that configurations that are easy to sample on the unbiased potential are still easy to sample; that is, w acts like an "umbrella" covering both the easy- and hard-to-sample regions of configuration space. Nearly always, w is an explicit function of the CVs.

Coming up with an umbrella potential that would enable exploration of CV space with a single umbrella-sampling simulation that takes the system far from its initial point is not straightforward. Akin to TI, it is therefore advantageous to combine results from several independent trajectories, each with its own umbrella potential that localizes it to a small volume of CV space that overlaps with nearby volumes. The most popular way to combine the statistics of such a set of independent umbrella-sampling runs is the weighted-histogram analysis method (WHAM) [70].

To compute statistics of CV space using WHAM, one first chooses the points in CV space that define the little local neighborhoods, or "windows", to be sampled, and then chooses the bias potential used to localize the sampling. Not knowing how the free energy changes in CV space makes the first task somewhat challenging, since more densely packed windows are preferred in regions where the free energy changes rapidly; however, since the calculations are independent, more can be added later if needed. A convenient choice for the bias potential is a simple harmonic spring that tethers the trajectory to a reference point z_i in CV space:

∆V_i(x) = (κ/2)|θ(x) − z_i|²,

which means the dynamics of the atomic variables x is identical to Eq. 12 at fixed z = z_i. The points {z_i} and the value of κ (which may be point-dependent) must be chosen such that θ[x(t)] from any one window's trajectory makes excursions into the window of each of its nearest neighbors in CV space.
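A minimal sketch of the WHAM machinery, using the standard self-consistent equations on synthetic window histograms drawn from a known model free energy (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
beta = 1.0

# Hypothetical model free energy on a 1D grid (double well)
z = np.linspace(-2.0, 2.0, 81)
dz = z[1] - z[0]
F = (z**2 - 1.0)**2

# Harmonic window biases w_i(z) = (kappa/2)(z - z_i)^2
centers = np.linspace(-1.8, 1.8, 13)
kappa = 10.0
W = 0.5 * kappa * (z[None, :] - centers[:, None])**2   # shape (windows, bins)

# Synthetic window histograms: counts drawn from the exact biased densities
N = 5000
hists = np.empty_like(W)
for i in range(len(centers)):
    p = np.exp(-beta * (F + W[i]))
    hists[i] = rng.multinomial(N, p / p.sum())

# WHAM self-consistent iteration:
#   P(z)        = sum_i H_i(z) / sum_i N e^{beta (f_i - w_i(z))}
#   e^{-beta f_i} = sum_z P(z) e^{-beta w_i(z)} dz
f = np.zeros(len(centers))
for _ in range(2000):
    P = hists.sum(axis=0) / (N * np.exp(beta * (f[:, None] - W))).sum(axis=0)
    P /= P.sum() * dz
    f_new = -np.log((P[None, :] * np.exp(-beta * W)).sum(axis=1) * dz) / beta
    if np.max(np.abs(f_new - f)) < 1e-10:
        break
    f = f_new

with np.errstate(divide="ignore"):
    F_est = -np.log(P) / beta      # bins never visited give +inf, ignored below
F_est -= F_est.min()
inner = np.abs(z) <= 1.5           # region well covered by the windows
print(np.max(np.abs(F_est[inner] - (F - F.min())[inner])))
```

The per-window free energies f_i are exactly the quantities that cannot be obtained by direct sampling within a single window; WHAM determines them self-consistently from the histogram overlaps.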

Each window-restrained trajectory is directly histogrammed to yield apparent (i.e., biased) statistics on θ; let us call the biased probability in the ith window P_{b,i}(z). Eq. 19 again gives the recipe to reconstruct the unbiased statistics P_i(z) for z in the window of z_i:

P_i(z) = P_{b,i}(z) e^{β∆V_i(z)} ⟨ e^{−β∆V_i(θ(x))} ⟩.    (24)

We could use Eq. 24 directly assuming the biased MD trajectory is ergodic, but we know that regions far from the reference point will be explored very rarely and thus their free energy would be estimated with large uncertainty. This means that, although we can use sampling to compute P_{b,i} knowing it effectively vanishes outside the neighborhood of z_i, we cannot use sampling to reliably compute ⟨e^{−β∆V_i}⟩.

As already mentioned, one of the difficulties of the umbrella-sampling method is the choice and construction of the bias potential. As we already saw with the relationship among TI, ABF, and TAMD, an adaptive method for building a bias potential in a running MD simulation may be advantageous. In metadynamics, configurational variables x evolve in response to a biased total potential:

V_b(x, t) = V_0(x) + ∆V(x, t),

where V_0 is the bare interatomic potential and ∆V(x, t) is a time-dependent bias potential. The key element of metadynamics is that the bias is built as a sum of Gaussian functions centered on the points in CV space already visited:

∆V(x, t) = w Σ_{t′ = τ_G, 2τ_G, …; t′ < t} exp( −|θ(x) − θ(x(t′))|² / (2σ²) ).

Here, w is the height of each Gaussian, τ_G is the size of the time interval between successive Gaussian depositions, and σ is the Gaussian width. In the long-time limit, the accumulated bias converges to the negative of the free energy, thus providing an optimal bias to enhance transition events.
The difference between the metadynamics estimate of the free energy and the true free energy can be shown to be related to the diffusion coefficient of the collective variables and to the rate at which the bias is grown. A possible way to decrease this error as a simulation progresses is to decrease the growth rate of the bias. Well-tempered metadynamics [148] uses an optimized schedule to decrease the deposition rate of bias by modulating the Gaussian height:

w(t) = ω₀ τ_G exp( −∆V(θ(x(t)), t) / (k_B ∆T) ).

Here, ω₀ is the initial deposition rate, measured in Gaussian height per unit time, and ∆T is a parameter that controls the degree to which the biased trajectory makes excursions away from free-energy minima.
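The deposition rule can be sketched on the same hypothetical 1D double well used above; the bias is tabulated on a grid for speed, and every parameter value here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, dt, gamma = 4.0, 1e-3, 1.0
dVdx = lambda x: 4.0 * x * (x**2 - 1.0)   # V(x) = (x^2 - 1)^2, wells at x = +/-1

# Well-tempered parameters (hypothetical values)
w0, tau_G, sigma = 0.1, 100, 0.2          # initial height, deposition stride, width
kB_dT = 2.25                               # k_B * DeltaT with DeltaT = 9 T (k_B T = 1/beta)

# Keep the bias (and its force) tabulated on a grid; each deposit adds one Gaussian
grid = np.linspace(-2.5, 2.5, 501)
Vb = np.zeros_like(grid)   # bias potential
Fb = np.zeros_like(grid)   # bias force, -dVb/dz

x, heights, xmax = -1.0, [], -1.0
for step in range(200_000):
    f = -dVdx(x) + np.interp(x, grid, Fb)
    x += f * dt / gamma + np.sqrt(2.0 * dt / (beta * gamma)) * rng.normal()
    if step % tau_G == 0:
        # tempered height: w = w0 * exp(-Vb(current point) / (k_B DeltaT))
        h = w0 * np.exp(-np.interp(x, grid, Vb) / kB_dT)
        g = h * np.exp(-((grid - x)**2) / (2.0 * sigma**2))
        Vb += g
        Fb += g * (grid - x) / sigma**2
        heights.append(h)
    xmax = max(xmax, x)

print(heights[0], heights[-1], xmax)  # heights decay as the bias fills the well
```

As the well fills, the local bias grows, the deposited heights shrink, and the trajectory (started in the left well) is pushed over the barrier it would rarely cross unbiased at 4 kT.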

It is possible to show that with well-tempered metadynamics the bias does not converge to the negative of the free energy but to a fraction of it, thus resulting in sampling the CVs at an effectively higher temperature T + ∆T; normal metadynamics is recovered for ∆T → ∞. We notice that other deposition schedules can be used, aimed, e.g., at maximizing the number of round-trips in CV space.

It is instructive to contrast well-tempered metadynamics with TAMD. First, the mean forces that drive TAMD are subject to errors from imperfect adiabatic separation: perfect separation, realizable only when the auxiliary variables z never move, is the only way to guarantee they are perfectly accurate. In practical application, TAMD never achieves perfect adiabatic separation. In contrast, because the deposition rate decreases as a well-tempered trajectory progresses, errors related to poor adiabatic separation are progressively damped. Second, as already mentioned, TAMD alone cannot report the free energy, but it is also therefore not practically limited by the dimensionality of CV space; multicomponent gradients are just as accurately calculated in TAMD as are single-component gradients.

Metadynamics, as a histogram-filling method, must exhaustively sample a finite region around any point before the free energy and its gradients there can be trusted, which can sometimes limit its utility.

Metadynamics is a powerful method whose popularity continues to grow. In either its original formulation or in more recent variants, metadynamics has been employed successfully in several fields.

Central to all CV-biasing methods is the choice of the CVs themselves. Broadly speaking, good CVs should discriminate among the metastable states of interest, should capture the slow modes relevant to the process under study, and should be limited in number.

The first of these may seem obvious: CVs are chosen to provide a low-dimensional description of some important process, say a conformational change or a chemical reaction or a binding event, and one cannot describe a process without being able to discriminate states. However, it is not always easy to find CVs that do this. Even given representative configurations of two distinct metastable states, standard MD from these two different initial configurations may sample partially overlapping regions of CV space, making ambiguous the assignation of an arbitrary configuration to a state. It may be in this case that the two representative configurations actually belong to the same state, or that, if there are two states, no matter what CV space is overlaid, the barrier separating them is so small that, on MD timescales, they can be considered rapidly exchanging substates of some larger state.

An obvious way of reducing the likelihood of hidden barriers is to increase the dimensionality of CV space. TAMD is well-suited to this because it is a gradient method, but standard metadynamics, because it is a histogram-filling method, is not. A recent variant of metadynamics termed "reconnaissance metadynamics" [172] does have the capability of handling high-dimensional CV spaces.

In reconnaissance metadynamics, bias-potential kernels are deposited at the CV-space points identified as centers of clusters detected and measured by an on-the-fly clusterization scheme. These kernels are hyperspherically symmetric but grow as cluster sizes grow, and are able to push a system out of a CV-space basin to discover other basins. As such, reconnaissance metadynamics is an automated way of exploring high-dimensional CV spaces.

There are very few "best practices" codified for choosing CVs for any given system. Most CVs are developed ad hoc based on the processes that investigators would like to study; for instance, center-of-mass distance between two molecules for studying binding/unbinding, or torsion angles for studying conformational changes, or number of contacts for studying order-disorder transitions.

We finally mention the possibility of building collective variables based on a set of frames which might be available from experimental data or generated by means of previous MD simulations. Some of these variables are based on the idea of computing the distances between the present configuration and a set of precomputed snapshots. These distances, here indicated with d_i, where i is the index of the snapshot, are then combined to obtain a coarse representation of the present configuration, which is then used as a CV. As an example, one might combine the distances as

s(x) = Σ_{i=1}^{N} i e^{−λ d_i} / Σ_{i=1}^{N} e^{−λ d_i}.

If the parameter λ is properly chosen, this function returns a continuous interpolation between the indexes of the snapshots which are closest to the present conformation. If the snapshots are arranged along a putative path connecting two experimental structures, this CV can be used as a path CV to monitor and bias the progression along the path [182]. A nice feature of path CVs is that it is straightforward to also monitor the distance from the putative path.
The standard way to do this is by looking at the distance from the closest reference snapshot, which can be approximately computed with the following continuous function:

z(x) = −λ⁻¹ ln Σ_{i=1}^{N} e^{−λ d_i}.
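A toy sketch of the two path CVs, using hypothetical 2D "configurations" as snapshots (published variants differ in details, e.g., some use squared distances d_i² in the exponent):

```python
import numpy as np

# Hypothetical reference snapshots arranged along a putative path
path = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
lam = 2.0   # lambda; tuned to the spacing between snapshots

def path_cvs(x):
    d = np.linalg.norm(path - x, axis=1)   # distances d_i to each snapshot
    w = np.exp(-lam * d)
    i = np.arange(1, len(path) + 1)
    s = (i * w).sum() / w.sum()            # progression along the path
    zdist = -np.log(w.sum()) / lam         # distance from the path
    return s, zdist

s, zdist = path_cvs(np.array([1.5, 0.3]))
print(s, zdist)   # s is halfway between snapshot indexes 2 and 3 by symmetry
```

For a configuration equidistant from two consecutive snapshots, s interpolates exactly to the midpoint of their indexes, which is what makes the CV smooth enough to bias.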

"Tempering" refers to a class of methods based on increasing the temperature of an MD system to overcome barriers. Tempering relies on the fact that, according to the Arrhenius law, the rate at which activated (barrier-crossing) events happen is strongly dependent on the temperature. Thus, an annealing procedure where the system is first heated and then cooled allows one to quickly produce samples which are largely uncorrelated. The root of all these ideas lies in the simulated-annealing procedure [185], a well-known method successfully used in many optimization problems. In simulated annealing, the temperature of the system is artificially modified during the simulation. In particular, sampling is initially done at a temperature high enough that the simulation can easily overcome high free-energy barriers. Then, the temperature is decreased as the simulation proceeds, thus smoothly bringing the simulation to a local energy minimum.

In simulated annealing, a critical parameter is the cooling speed. Indeed, the probability of reaching the global minimum grows as this speed is decreased.

The search for the global minimum can be interpreted in the same way as sampling an energy landscape at zero temperature. One could thus imagine using simulated annealing to generate conformations at, e.g., room temperature by slowly cooling conformations starting at high temperature. However, the resulting ensemble will strongly depend on the cooling speed, thus possibly providing a biased result. A better approach consists of the so-called simulated-tempering methods [186]. Here, a discrete list of temperatures T_i, with i ∈ {1, …, N}, is chosen a priori, typically spanning a range going from the physical temperature of interest to a temperature which is high enough to overcome all relevant free-energy barriers. (Note that we do not have to stipulate a CV space in which those barriers live.) Then, the index i, which indicates the temperature at which the system should be simulated, is evolved in time. Two kinds of moves are possible: (a) normal evolution of the system at fixed temperature, which can be done with usual Markov-chain Monte Carlo or molecular dynamics, and (b) change of the index i at fixed atomic coordinates. It is easy to show that the latter can be performed as a Monte Carlo step with acceptance equal to

α(i → j) = min[ 1, (Z_i/Z_j) e^{−(β_j − β_i) U(x)} ],

where i and j are the indexes corresponding to the present temperature and the new one, and U(x) is the potential energy. The weights Z_i should be chosen so as to sample all values of i equally. It must be noticed that, also within molecular dynamics simulations, only the potential energy usually appears in the acceptance. This is due to the fact that the velocities are typically rescaled by a factor √(T_j/T_i) when the temperature is changed, so that the kinetic-energy contribution to the acceptance cancels exactly.

3.2. Parallel tempering

A smart way to alleviate the issue of finding the correct weights is to simulate several replicas at the same time [188,189].
Rather than changing the temperature of a single system, the defining move proposal in parallel tempering consists of a coordinate swap between two T-replicas, with acceptance probability

α = min[ 1, e^{(β_i − β_j)(U(x_i) − U(x_j))} ],

where x_i and x_j are the configurations currently held at inverse temperatures β_i and β_j. This method is the root of a class of techniques collectively known as "replica exchange" methods, and the latter name is often used as a synonym of parallel tempering. Notably, within this framework it is not necessary to precompute a set of weights. Indeed, the equal time spent by each replica at each temperature is enforced by the constraint that only pairwise swaps are allowed. Moreover, parallel tempering has an additional advantage: since the replicas are weakly coupled and only interact when exchanges are attempted, they can be simulated on different computers without the need of a very fast interconnection (provided, of course, that a single replica is small enough to run on a single node).
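The swap criterion itself is a one-line Metropolis test; a minimal sketch (the energies and inverse temperatures below are illustrative numbers):

```python
import numpy as np

rng = np.random.default_rng(5)

def swap_accepted(beta_i, beta_j, U_i, U_j, rng):
    """Metropolis test for swapping configurations between two replicas.

    U_i and U_j are the potential energies of the configurations currently
    held at inverse temperatures beta_i and beta_j; the swap is accepted
    with probability min(1, exp[(beta_i - beta_j)(U_i - U_j)])."""
    delta = (beta_i - beta_j) * (U_i - U_j)
    return delta >= 0.0 or rng.random() < np.exp(delta)

# If the colder replica currently holds the higher-energy configuration,
# swapping can only improve things and is always accepted:
print(swap_accepted(2.0, 1.0, 5.0, 1.0, rng))

# Otherwise acceptance decays exponentially with the mismatch; here
# delta = (1 - 2)(5 - 4) = -1, so the rate should be near exp(-1):
rate = np.mean([swap_accepted(1.0, 2.0, 5.0, 4.0, rng) for _ in range(100_000)])
print(rate)
```

Only the two scalar potential energies cross the replica boundary, which is exactly why the method parallelizes so cheaply.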

The calculation of the acceptance is very cheap, as it is based on the potential energy, which is often computed alongside force evaluation. Thus, one could in theory also exploit a large number of virtual, rejected exchanges so as to enhance statistical sampling [190,191]. Since the efficiency of a parallel-tempering simulation can deteriorate if the stride between subsequent exchanges is too large [192,193], a typical recipe is to choose this stride as small as possible, with the only limitation of avoiding extra costs due to replica synchronization. One can push this idea further and implement asynchronous versions of parallel tempering, where the overhead related to exchanges is minimized [193,194]. One should however be aware that, especially at high exchange rates, artifacts coming from, e.g., the use of wrong thermostatting schemes could spoil the results [195,196].

Parallel tempering is popular in simulations of protein conformational sampling [197,198], protein folding [189,199-203] and aggregation [204,205], due at least in part to the fact that one need not choose CVs to use it, and CVs for describing these processes are not always straightforward to determine.

The difference between the replicas is not restricted to be a change in temperature. Any control parameter can be changed, and even the expression of the Hamiltonian can be modified [206]. In the most general case, every replica is simulated at a different temperature (and/or pressure) and a different Hamiltonian, and the acceptance reads

α = min[ 1, e^{β_i [U_i(x_i) − U_i(x_j)] + β_j [U_j(x_j) − U_j(x_i)]} ],

where U_i and U_j are the Hamiltonians of replicas i and j. One could notice that a scaling of the Hamiltonian by a factor λ is completely equivalent to a scaling of the temperature by a factor λ⁻¹. Hamiltonian scaling, however, can take advantage of the fact that the total energy of the system is an extensive property. Thus, one can limit the scaling to the portion of the system which is considered to be interesting and which has the relevant bottlenecks.
With solute tempering, the solute energy is scaled whereas the solvent energy is left unchanged. This is equivalent to keeping the solute at an effectively higher temperature while the solvent remains at the physical one.

In general, the advantage of these tempering methods over straightforward sampling can be rationalized as follows: a simulation is evolved so as to sample a modified ensemble, e.g., by raising the temperature, in which free-energy barriers are more easily crossed, while the mismatch between the modified ensemble and the physical one is to be minimized.

Parallel tempering has the advantage that all the replicas can be analyzed to obtain meaningful results, e.g., to predict the melting curve of a molecule. This procedure should be used with caution, especially with empirically parametrized potentials, which are often tuned to be realistic only at room temperature.

On the other hand, Hamiltonian replica exchange often relies on unphysically modified ensembles which are of no interest except for the fact that they increase ergodicity.

As a final note, we observe that data obtained at different temperatures (or with modified Hamiltonians) could be combined to enhance statistics at the physical temperature [223]. However, the effectiveness of this data recycling is limited by the fact that high-temperature replicas only rarely visit low-energy conformations, thus decreasing the amount of additional information that can be extracted. It is also possible to use a restraint to localize one replica in, e.g., the folded state so as to better estimate the free-energy difference between the folded and unfolded states.

It is also possible to combine metadynamics with the solute-tempering method so as to decrease the number of replicas required.

The string method is generally an approach to find pathways of minimal energy connecting two points in phase space [228]. When working in CVs, the string method is used to find minimal free-energy paths (MFEPs) [229]. String-method calculations involve multiple replicas, each representing a point z^s in CV space at position s along a discretized string connecting two points of interest (reactant and product states, say). The forces on each replica's z^s are computed and the z^s updated, as in TAMD, with the addition of forces that act to keep the z's equidistant along the string (so-called reparameterization forces):

γ̄ ż_j^s = −Σ_k M̄_jk(z^s) ∂F/∂z_k (z^s) + (reparameterization forces).

Here, M̄_jk is the metric tensor mapping distances on the manifold of atomic coordinates to the manifold of CV coordinates.

Because TAMD provides mean-force estimates as it is exploring CV space, it stands to reason that those mean forces could be used to compute a free energy. In contrast, in the single-sweep method [68], the TAMD forces are only used in the CV-space exploration phase, not in the free-energy calculation itself. Recently, Abrams and Vanden-Eijnden proposed a method for using TAMD directly to parameterize a free energy; that is, to determine the best set of some parameters λ on which a free energy of known functional form F(z; λ) depends [235]. The approach, termed "on-the-fly free energy parameterization", uses forces from a running TAMD simulation to progressively optimize λ using a time-averaged gradient error:

E(λ) = (1/t) ∫₀ᵗ | ∇_z F(z(s); λ) + κ[θ(x(s)) − z(s)] |² ds.

If F is constructed so as to be linear in λ = (λ₁, λ₂, …, λ_M), minimization of E can be expressed as a simple linear-algebra problem, A λ = b, and the running TAMD simulation provides progressively better estimates of A and b until the λ converge. In the cited work, it was shown that this method is an efficient way to derive potentials of mean force between particles in coarse-grained molecular simulations as basis-function expansions.
It is currently being investigated as a means to parameterize free energies associated with conformational changes of proteins.

Chen, Cuendet, and Tuckerman developed a very similar approach that, in addition to parameterizing a free energy using d-AFED-computed gradients, uses a metadynamics-like bias on the potential [236].
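The linear-algebra structure of such parameterizations can be sketched with synthetic data standing in for TAMD mean-force estimates (the basis, the "true" parameters, and the noise level are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical linear parameterization: F(z; lam) = lam1 z^2 + lam2 z^4,
# so grad_z F = lam1 * 2z + lam2 * 4z^3 is linear in lam.
def grad_basis(z):
    return np.array([2.0 * z, 4.0 * z**3])

lam_true = np.array([-2.0, 1.0])   # "unknown" parameters to recover

# Surrogate for mean-force estimates from a running TAMD trajectory:
# noisy samples of grad F at visited CV points (the noise stands in for
# imperfect adiabatic separation and finite-time averaging)
z_s = rng.uniform(-1.5, 1.5, 400)
f_s = np.array([grad_basis(zi) @ lam_true for zi in z_s]) \
      + 0.1 * rng.normal(size=z_s.size)

# Minimizing the squared gradient error reduces to A lam = b, with A and b
# accumulated as running sums while the trajectory progresses
A, b = np.zeros((2, 2)), np.zeros(2)
for zi, fi in zip(z_s, f_s):
    g = grad_basis(zi)
    A += np.outer(g, g)
    b += g * fi
lam = np.linalg.solve(A, b)
print(lam)   # close to lam_true
```

Because A and b are simple sums over visited points, they can be updated on-the-fly, and the solve can be repeated periodically until the parameters stop changing.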

These authors demonstrated efficient reconstruction of the four-dimensional free energy of vacuum alanine dipeptide with this approach.

In this review, we have summarized some of the current and emerging enhanced-sampling methods that sit atop MD simulation. These have been broadly classified as methods that use collective-variable biasing and methods that use tempering. CV biasing is a much more prevalent approach than tempering, due partially to the fact that it is perceived to be cheaper, since tempering simulations are really only