Accelerating Kinetics with Time-Reversal Path Sampling

In comparison to numerous enhanced sampling methods for equilibrium thermodynamics, accelerating simulations for kinetics and nonequilibrium statistics are relatively rare and less effective. Here, we derive a time-reversal path sampling (tRPS) method based on time reversibility to accelerate simulations for determining the transition rates between free-energy basins. It converts the difficult uphill path sampling into an easy downhill problem. This method is easy to implement, i.e., forward and backward shooting simulations with opposite initial velocities are conducted from random initial conformations within a transition-state region until they reach the basin minima, which are then assembled to give the distribution of transition paths efficiently. The effects of tRPS are demonstrated using a comparison with direct simulations of protein folding and unfolding, where tRPS is shown to give results consistent with direct simulations and increase the efficiency by up to five orders of magnitude. This approach is generally applicable to stochastic processes with microscopic reversibility, regardless of whether the variables are continuous or discrete.

Thermodynamics and kinetics are two fundamental aspects of physical systems.In general, thermodynamics and equilibrium statistics are relatively easy to handle compared to kinetics and nonequilibrium statistics.Based on the ubiquitous Boltzmann distribution, various efficient enhanced sampling methods have been developed to greatly accelerate simulations of equilibrium properties [1], e.g., umbrella sampling [2,3], histogram method [4,5], temperature replica-exchange [6][7][8], integrated tempering sampling [9], and metadynamics [10,11].In recent years, machine learning techniques were also combined with conventional enhanced sampling to explore the vast conformational space of the studied systems [12][13][14].
In contrast, there is less known profound principles for kinetics and nonequilibrium statistics.The Onsager reciprocal relations caught keenly the time reversal symmetry of the underlying microscopic dynamics to express the equality of certain ratios between different pairs of forces and flows [15][16][17].The Jarzynski equality [18], which revealed an unexpected connection between irreversible work and free energy difference, actually utilized the invariance of equilibrium distributions.Kinetics was widely described by a transition state theory [19], which is based only on the information of potential energy surfaces and thus cannot provide an accurate transition rate.For simulations, a direct simulation on transition processes to determine the transition rate is usually inefficient since the simulation trajectory spends most of the time wobbling and swaying in the vicinity of the reactant free energy minimum and the transition events to the product basin are extremely rare [20].A major category of accelerating kinetics simulations were based on a description of path ensemble where the transition paths can be sampled purposefully [21].Many simulation methods have been developed accordingly, e.g., transition path sampling (TPS) [22][23][24], transition interface sampling (TIS) [25,26], forward flux sampling (FFS) [27,28] and a Bayesian relation method [29][30][31].The distribution of path ensemble is inherently related to the maximum entropy principle [32], and thus machine learning has potential applications in path statistics and kinetics compu-tation.Other approaches such as Hyperdynamics using a bias potential to upshift the free energy basins were also explored [33][34][35].Overall, accelerating kinetics simulations often relied on more complex assumptions than enhanced samplings of thermodynamics, and are usually less effective.
In this letter, we present a related approach to accelerate kinetics simulation for systems with free energy barrier.The approach is established based on the time reversal symmetry of the microscopic dynamics and the existence of equilibrium distribution, and is thus expected to be generally applicable.The obtained formula to calculate transition rate is bias-free and is easy to implement.
The microscopic state of a system is described by a point in the phase space.Imagine that a phase space has many states (similar to the ensemble picture) that evolve and bifurcate (due to stochastics) over time, forming some infinitely long trajectories.To make practical statistics on trajectories, one should use some ways to cut the infinitely long trajectories into paths (segments of trajectories) whose length (duration) is finite.In the literature, there are two main schemes in cutting trajectories.One is to cut trajectories at fixed time so that the resulting paths have the same duration.This scheme was adopted by TPS [22][23][24] and TIS [25,26].Another is to cut trajectories with some fixed planes in the phase space with specified conformational features.For example, for a system with a double-well free energy profile, the basin minima can be chosen as the cutting points, and the trajectory was cut whenever it crosses the cutting points [Fig.1(a)].This scheme was adopted in methods such as FFS [27,28] and the Bayesian relation method [29][30][31], and is also adopted in this study.
The thermodynamic properties are determined by the state statistics, while the kinetic properties are embedded in the path statistics, i.e., the distribution of paths.To make the analysis more clear, the trajectories and paths are assumed to be composed of a sequence of conformations (plus velocities if necessary) sampled with a very small time step dt [see Fig. 1(a)].Then a path-i is described in where L i denotes the duration (in a unit of dt) of path-i.Actually, the distribution of paths contains the information for state statistics.According to the basic postulates of statistical thermodynamics, the ensemble average (distribution) of a thermodynamic variable is equal to its time average.For a certain property Q (which may present any conformation property, e.g., the fractional number of native contact for protein folding considered below, or the indicator to indicate whether the system falls in a certain free-energy basin A or B), its equilibrium distribution is given by where if (Conf i,j ∈ Q) = 1 or 0 depending on whether Conf i,j possesses a property Q.In other words, the number of conformations in a basin A in the ensemble (at any specified time) is equal to that of the conformations in A for all paths (at various time).When focusing on kinetics, one needs to calculate how many molecules transit from basin A to basin B in a certain time (Fig. 1).
We use the basin minima A and B to cut the trajectory [Fig.1(a)], and thus the paths can be classified into four types: A-A, A-B, B-A and B-B depending on the beginning and end points are cut by A or B. A-B and B-A are transition paths.Notably, each A-B path will contribute a transition event from A to B within dt, because each conformation (filled circles in Fig. 1) on a trajectory/path will run ahead (evolve) to occupy the position of the preceding one, regardless of the path duration and speeds, which is convenient for calculation.For the transition/reaction the kinetic equation is where N A is the number of molecules in basin A. Therefore, the transition rate coefficient k is calculated as where if (Path i ∈ A-B) indicates whether the path is classified into the type A-B, and if (Conf i,j ∈ A) indicates whether the conformation falls in the basin A. The denominator in the above equation includes the contributions of paths A-A, A-B and B-A, but generally A-A is dominant and the other can be ignored to yield Therefore, the kinetics can be readily obtained from path statistics.
In thermodynamics, sampling can be accelerated by applying bias or other means to enhance the sampling probability at certain regions of phase space.Similarly, methods can be developed to enhance the sampling of certain paths in kinetics simulations.Obviously, a direct simulation to generate trajectories/paths to calculate Eq. ( 6) is very inefficient since most produced paths belong to the type A-A and B-B but not transition paths A-B and B-A, and the resulting error of the numerator in Eq. ( 6) is large.Then it comes the main idea of this study [Fig.1(b)], i.e., to enhance the sampling of A-B and B-A paths using the time reversal symmetry of the microscopic dynamics.Specifically, any A-B path will pass the transition state TS (strictly speaking, here we use it not for the genuine transition state, but just refer to a high free-energy range separating two basins, or, through which all transition paths have to pass), and is divided into an A-TS half-path and a TS-B half-path.The distribution of TS-B half-paths can be easily obtained by shooting simulation from initial states at TS (whose equilibrium distribution can be obtained by conventional enhanced sampling for equilibrium conformation statistics) and terminated at A or B (to give TS-A and TS-B half-paths).Although the A-TS half-paths is difficult to obtain in direct simulations from initial states at A, their population is exactly equal to that of the reverse TS-A ones due to the time reversal symmetry and can be obtained from the shooting simulations from TS.
In actual simulations, the initial conformations are randomly chosen from an equilibrium distribution within a TS region [T S − , T S + ] which are determined by umbrella sampling in the examples below, and a forward and a backward shooting simulations with opposite initial velocities are conducted until they reach the basin minima A or B. Then the backward half-path is reversed and assembled with the forward one to give a path passing the TS range, which may be A-A (i.e., A-TS-A), A-B, B-A and B-B (i.e., B-TS-B) types.It is noted that such a path may contain more than one conformations in the TS region (the number of which is denoted as n TS ), which have the same probability to be chosen as initial state to generate the path.To avoid any double counting of the same path, the sampled paths by shooting simulations should be corrected by a weight 1/n TS so as to be consistent with those in Eq. ( 6).In addition, it is noted that the denominator of Eqns.(5,6) is actually the equilibrium conformation number in the basin A. Taking all these together, it yields where i in shooting indicates a summation over paths assembled from shooting simulations, and t TS = n TS dt is the duration of a path spent within the TS region.N TS /N A is the population ratio of equilibrium conformations in the TS region and the basin A. Eq. ( 7) is the central result of this study, which indicates that the transition rate of kinetics can be obtained from an equi- librium result N TS /N A and a shooting simulation, both of which are easy to implement.It is applicable to both continuous and discrete variables.We term the method time-reversal path sampling (tRPS).
We test the method on the protein folding problem [36].Although protein modelling has advanced rapidly over the past 50 years, a direct approach to protein folding kinetics is still challenging [37].We consider a widely studied model protein, chymotrypsin inhibitor 2 (CI2) with 64 residues (PDB ID: 2CI2), which folds and unfolds in a simple two-state manner [38].A coarse-grained Gō-like model was adopted to describe the conformational energetics of protein in the folding and unfolding processes [39,40], where the protein conformation is represented by the Cα coordinates of the amino acid residues (see Supporting Information).Molecular dynamics (MD) simulations were conducted to obtain the equilibrium conformation distribution and folding/unfolding rates.
The obtained equilibrium free energy profiles of CI2 using the umbrella sampling technique are plotted in Fig. 2 as a function of the number of native contact (Q).The free energy profiles exhibit a typical double well form and are quite smooth, with one basin minimum at Q ≈ 20 for unfolded states and another at Q ≈ 110 for folded states.The folded states get more stable with decreasing the temperature.The midpoint temperature at which folded and unfolded states exhibit an equal stability was determined to be T = 0.855, where the free-energy barrier is about 6.4k B T .We chose two schemes of TS regions in separating folded/unfolded basins, i.e., Q TS ∈ [50, 51] and Q TS ∈ [80, 81] (black thin lines in Fig. 2), to calculate the equilibrium population ratio N TS /N A (here A can represent unfolded or folded states) as well as preparing initial conformations within the TS region for shooting simulations as required by tRPS in Eq. ( 7).
The accelerating effect on the kinetics calculation using the tRPS method is demonstrated in Fig. 3 with a comparison to direct simulations.The logarithmic folding/unfolding rates form a typical V-shape (chevron plot in protein folding).The rates obtained from tRPS agree excellently with the direct simulations in a wide range over two orders of magnitude.It is noted that the choice of TS region, whether Q TS ∈ [50, 51] or Q TS ∈ [80, 81], does not affect the agreement since the only requirement for the TS region is that it separates the folded and unfolded basins.It is not required to be the true transition state.As a proof of the acceleration, the simulation time consumed in tRPS, e.g., the average time in obtaining a transition path in shooting simulation starting from TS region (green diamonds in Fig. 3), is much shorter than that in direct simulations (blue squares).Although the equilibrium property N TS /N A is also required by Eq. ( 7), various efficient enhanced sampling methods have been developed previously in obtaining the equilibrium properties (among which we adopted the umbrella sampling here).In addition, the temperaturedependence of free-energy difference [−k B T ln(N TS /N A ) is approximately linear (Fig. S1) and is thus relatively simple to determine.Usual models of kinetics suggested that the system has to oscillate around the bottom of a basin for many times before it finally crosses the barrier to successfully transit into another basin.This is in line with the proteinfolding example here: the averaged duration of A-A and B-B paths keeps roughly unchanged with increasing the temperature (Fig. S2), similar to the characteristics of a simple pendulum.To have one successful A-B or B-A transition, it has to oscillate 10 3 ∼ 10 6 times.The acceleration of tRPS originates from the fact that it does not need to spend a lot of time on the massive oscillations, but can directly sample the transition paths.
With tRPS, transition paths can be readily obtained for analyses (Fig. 4).The averaged duration of transition paths t TPath is in an order of magnitude of 10 4 steps and increases exponentially with decreasing temperature [Fig.4(a)], much larger than that for A-A and B-B oscillation paths (in the order of magnitude of 10 2 steps, see Fig. S2).The logarithmic t TPath roughly obeys a Gaussian distribution [Fig.4(b)], similar to previous studies [20,31].Remarkably, a transition path usually has multiple times to cross the TS region, the number of which well obeys an exponential distribution, i.e., a memoryless distribution [Fig.4(d)].Consequently, the duration of a path spent within the TS region, t TS , also obeys the exponential distribution [Fig.4(c)].At the midpoint temperature T = 0.855, the average crossing times is about 25, and an averaged t TS of about 540 steps.The large crossing times indicates that the conventional transition state theory would inevitable overestimate the transition rates since it assumes that a transition path crosses the TS region only once.Another discrepancy with the transition state theory is that the activation enthalpy of folding/unfolding (the minus slope of unfolding curve in Fig. 3 is about 62ε at midpoint temperature) is not equal to the enthalpy difference between the TS region and the unfolded/folded state (contributed by the N TS /N A term in Eq. (7), which is about 72ε for unfolding) due to the contribution of the last term in Eq. ( 7).
The tRPS method can be combined with direct simulations to provide much more comprehensive results.For example, the minimal/maximal Q value of a path can be adopted to measure how far it can go, and the resulting path population decreases exponentially with the distance between Q min/max and the cutting point (Fig. 5).This makes the paths with distant Q min/max hard to sample in direct simulations.The tRPS method, on the other hand, samples only the paths that cross the TS region and thus possess the capability to probe distant Q min/max .The patches they provided can be combined to give a smooth and complete distribution (Fig. 5).
Transition rates decay exponentially with the barrier height, but the duration of transition paths usually depends on the barrier in a much weaker logarithmic law [20,41].This makes tRPS even more powerful when the barrier is high.As a proof we apply tRPS on another protein, acylphosphatase (PDB ID: 1APS) with 98 residues, which was listed as a slow folding protein in a previous study [42].The obtained free-energy profile is smooth, possessing a high barrier of about 15k B T at a midpoint temperature of T = 0.913 [Fig.6(a)].The folding/unfolding are slow, being extremely difficult to determine with direct simulations.Therefore, we con- ducted direct simulations only in some feasible temperature range [filled circles/squares in Fig. 6(b)], and applied tRPS to complete the gaps (opened circles/squares).Results of direct simulation and tRPS in the overlapping area are well consistent with each other.The data combine to give a nice chevron plot.As a main expense of tRPS, the averaged duration [opened diamonds in Fig. 6(b)] of transition paths for acylphosphatase is similar in the order of magnitude with that for CI2.The increase in the efficiency of kinetics calculation by tRPS is up to five orders of magnitude around the midpoint temperature if not taking into account the expense of equilibrium calculations [for N TS /N A in Eq. ( 7)].
The validity of tRPS relies on the time reversibility.Although it cannot be applied to irreversible systems as methods like FFS [27], it possesses the benefits of "shooting from the top" [31] to avoid possible inadequate choice of initial states in basin that are capable/incapable of crossing barrier.Although the TS region was not particularly optimized in our examples (Fig. S3), the resulting rates from tRPS are satisfactory (Figs. 3 and 6).Another underlying assumption of tRPS is the preequilibrium after a transition path, i.e., after a transition hit the cutting planes at the basin bottom, the system will preequilibrate within the basin but not cross the barrier back to the reactant basin soon.This can be tested by extending the shooting simulation after the path hits the cutting planes.Analyses on the examples of CI2 and acylphosphatase show that the error caused by the preequilibrium assumption is negligible (data not shown).In addition, Eq. ( 7) is beneficial in terms that it is less affected by possible complicated energy landscape around the bottom of basins.For the case of acylphosphatase, some abnor-mal high durations of transition paths were observed in shooting simulations at low temperatures (Fig. S4), likely caused by hidden traps within basins, but the resulting kinetic rates seem unaffected (Fig. 6).
In Summary, in this letter we have proposed a method to accelerate the simulations for determining the kinetics of systems.The approach was constructed based on the time reversibility of microscopic dynamics and is thus generally applicable.It is easy to implement, and can operate on both continuous and discrete variables.The method was tested on the folding/unfolding of two proteins with fast and slow kinetics.In areas where direct kinetics simulations can be readily performed, the accelerating method produced results fully consistent with direct simulations.In areas where direct simulations are inaccessible, the accelerating method provided reasonable results at little cost, with an increase of efficiency up to five orders of magnitude.The technique is easily applied to other kinds of calculations such as quantum dynamics and chemical reaction.
This work was supported by the National Natural Science Foundation of China (grant 22273003).Part of the simulations was performed on the High-Performance Computing Platform of the Center for Life Science (Peking University).

FIG. 1 .
FIG. 1. Schematics on accelerating kinetics with time reversibility.(a) The construction of path ensemble.A very long trajectory (blue line) is cut into some short paths (segments) by two cutting planes A and B located at the basin minima of the free energy profile (brown line).The cutting points are marked with red crossings.Green filled circles represent the conformations sampled on the trajectory with a fixed time step (dt) and violet arrows indicate their velocities.(b) Collection of paths crossing the transition state (TS).Starting from an initial conformation (bigger circles with red edge) (obeying equilibrium distribution) within the TS range [T S−, T S+] (indicated by dashed-dotted lines), a forward and a backward simulations with opposite initial velocities are conducted until they reach any cutting planes (A or B), and, with the time reversal symmetry, they can be assembled to give a path passing the TS range.

9 TSFIG. 2 .
FIG.2.The free energy profiles of the protein CI2 as a function of the number of native contact (Q) at different reduced temperature (from top to bottom): T = 0.9, 0.855, 0.8ε/kB (where ε is the native contact energy strength).Two choices of transition state (TS) regions at around Q = 50 and Q = 80 are indicated by black thin lines.

FIG. 4 .
FIG. 4. Properties of transition paths.(a) The averaged duration of transition paths (t TPath ) as a function of 1/T , which obeys an exponential law (solid line).(b, c, d) The distributions of t TPath (b), the duration of a path spent within the TS region (tTS) (c), and the number of times a path crosses the TS region (d), at the midpoint temperature T = 0.855.Solid lines are quadratic fit in (b) and linear fit in (c,d).

FIG. 6 .
FIG.6.Equilibrium thermodynamics and kinetics of protein folding/unfolding for acylphosphatase.(a) The free energy profiles at different reduced temperature (from top to bottom): T = 0.95, 0.925, 0.913, 0.9, 0.875ε/kB.(b) Accelerating kinetics with tRPS.The temperature T was measured in a unit of ε/kB.The folding/unfolding rates of direct simulations were plotted in filled squares/circles, while those obtained by tRPS were shown in opened squares/circles (with QTS ∈ [100, 102]).The sampling rate of transition paths were plotted in scattering diamonds to demonstrate the acceleration effect.