A Novel Hybrid Monte Carlo Algorithm for Sampling Path Space

To sample from complex, high-dimensional distributions, one may choose algorithms based on the Hybrid Monte Carlo (HMC) method. HMC-based algorithms generate nonlocal moves alleviating diffusive behavior. Here, I build on an already defined HMC framework, hybrid Monte Carlo on Hilbert spaces (Beskos, et al. Stoch. Proc. Applic. 2011), that provides finite-dimensional approximations of measures π, which have density with respect to a Gaussian measure on an infinite-dimensional Hilbert (path) space. In all HMC algorithms, one has some freedom to choose the mass operator. The novel feature of the algorithm described in this article lies in the choice of this operator. This new choice defines a Markov Chain Monte Carlo (MCMC) method that is well defined on the Hilbert space itself. As before, the algorithm described herein uses an enlarged phase space Π having the target π as a marginal, together with a Hamiltonian flow that preserves Π. In the previous work, the authors explored a method where the phase space π was augmented with Brownian bridges. With this new choice, π is augmented by Ornstein–Uhlenbeck (OU) bridges. The covariance of Brownian bridges grows with its length, which has negative effects on the acceptance rate in the MCMC method. This contrasts with the covariance of OU bridges, which is independent of the path length. The ingredients of the new algorithm include the definition of the mass operator, the equations for the Hamiltonian flow, the (approximate) numerical integration of the evolution equations, and finally, the Metropolis–Hastings acceptance rule. Taken together, these constitute a robust method for sampling the target distribution in an almost dimension-free manner. The behavior of this novel algorithm is demonstrated by computer experiments for a particle moving in two dimensions, between two free-energy basins separated by an entropic barrier.


Introduction
Often, it is important to understand how molecules change conformations. For example, the folding of proteins is of high interest [1]. One approach is to simulate such transitions and generate a thermodynamic-relevant ensemble of paths that start in one freeenergy basin and end in another. The molecular motion here is assumed to be driven by the random thermal motions of the surroundings. This is modeled by Brownian dynamics with the thermal noise supplied by a heat reservoir operating at the fixed temperature . The evolution of the particle position x is described by the Stochastic Differential Equation (SDE): where dW t is the standard Wiener process. Here, the force (drift) is assumed to be the (negative) gradient of a potential energy function U(x), namely F(x) = −∂U/∂ x. The physical potential U(x) must be bounded from below to be physical. Integrating the SDE forward over time, one assembles a trajectory: the starting position may be known, but not the ending point. As shown by Onsager and Machlup [2], the path probability can be expressed in terms of the path positions. The (negative) logarithm of this probability has become known as the Onsager-Machlup (OM) functional. As shown in Appendix A, the form of the functional depends on how the SDE (Equation (1)) is discretized. To understand molecular transitions, it would be useful to extract a thermodynamic distribution of paths that start in one free energy basin and end in another. For such constrained paths, an effective Hamiltonian can be formed and expressed as: with the details given in Appendix A. Using this effective Hamiltonian, standard techniques can be used to sample the probability distribution at an effective temperature e f f = 2 /∆t, with being the physical temperature and ∆t being the time discretization step used in approximating the SDE. Performing such path sampling is challenging for several reasons. Firstly, as the time step used to solve the SDE becomes small, that is as ∆t → 0, contributions from the highfrequency modes lead to a divergence when evaluated using the effective Hamiltonian (Equation (2)). Secondly, at each point along the path, one must have a full description of the molecule. Thirdly, barriers separating different pathways may be insurmountable. Thus, it is extremely important to have efficient algorithms to probe the thermodynamicrelevant distribution of paths. The work by Beskos et al. [3] addressed the first of these issues using a Hybrid Monte Carlo (HMC) [4] method that augments the path space with Brownian bridges. One of the advantages is that HMC generates nonlocal moves, alleviating the diffusive behavior that plagues other methods [5]. However, in that method, the velocities form a Brownian bridge, and thus, they scale with the (time) path length. This growth negatively affects the accuracy of the algorithm. As shown in Section 2, by creating velocities that form an Ornstein-Uhlenbeck (OU) [6] bridge, one can eliminate this dependence on the path length, while continuing to treat the high-frequency modes accurately. In Section 2, the algorithm for the deterministic integration in the Molecular Dynamics (MD) step is described, as well as the associated error in the effective energy. In Section 3, an example in two dimensions is described, as well as how the algorithmic parameters are chosen. In the novel scheme described here, the time step for the deterministic, molecular dynamics integration can be increased by at least an order of magnitude over the previous method [3], leading to a concomitant decrease in computational effort. In other words, the novel method presented here is at least 10 times faster. This is followed in Section 4 by the description of the results for sampling paths that contain a transition from one basin to another. In Section 5, the continuous-time limit of the OM functional is sampled. As found before [7,8], this form produces unphysical results. The paper ends with a short discussion of how this new algorithm can be used in calculations employing the path integral molecular dynamic method and, finally, with some concluding remarks.

Time Evolution of the Hamiltonian
In sampling the probability distribution, the effective Hamiltonian, Equation (2), is augmented by including Gaussian distributed variables that can be identified as the momentum, with the aim of using a Hybrid (or Hamiltonian) Monte Carlo (HMC) method. The method is summarized as follows: first, pick the augmented variables from their known distribution; use them to (approximately) evolve the Hamiltonian flow; and then, accept or reject the evolved path using the Metropolis-Hastings criterion. The augmented Hamiltonian is given by: The braket notation, introduced in Appendix A, is used to simplify the look of the equations. Define x = q + l t as the physical path with l t being a term linear in time that allows x to have the starting point x(0) = x − and the ending point x(T) = x + . Note that Φ(x) = Φ(q + l t ) and is denoted as Φ(q). The path l t is given by the linear (in t) relationship: The path p is the momentum conjugate to the path q. The distribution of the momentum p is known, but not transparent. However, the distribution of the velocities is easier to understand, as it corresponds to an OU process, with A ou being the OU parameter.
To understand how the momenta p are distributed, one begins with the OU process: The OM functional for this process is: The mass operator can be defined as M = (L + A 2 ou 1) and then the momentum as |p >= M |v >, giving the effective Hamiltonian for the momentum variables as: Thus, the velocities will correspond to an OU process with both endpoints being zero, which is an OU bridge. See Appendix B for the numerical details for constructing realizations of OU bridges. Now return to the original Hamiltonian. Equation (3) with some modifications is: Note that with α = 1, this Hamiltonian (Equation (7)) has the term 1 2 q A 2 ou 1 q added and then subtracted. This algebraic manipulation importantly simplifies the numerical procedure, as will be shown below. The role (and value) of the parameter α will be addressed later.
It is convenient to work with the velocities v given by |v >= M −1 |p >, which comprise a bridge being zero at both endpoints. When A ou = 0, the velocities form a Brownian bridge, with covariance (with t ≥ s) being E (v s × v t ) = s(1 − t/T) where T is the length of the path. This gives E v 2 T/2 = T/4, which grows with the (time) length of the path. In contrast, the covariance of an OU bridge is given by: with being the temperature. At the midpoint of the OU bridge, E v 2 T/2 ≈ /A ou when A ou T 2, which is independent of the path length T. In Appendix B, the numerical procedure for constructing an OU bridge is described.
Defining φ in terms of Φ as φ(q) = ∂Φ/∂q, the equations of motion are given by: with |φ(q) >= {φ(q 0 ), φ(q 1 ), φ(q 3 ), ... , φ(q N t )} and with the A and B groupings being used in the splitting implemented in the numerical algorithms. This method used is symplectic as it corresponds to a symmetric splitting in a Trotter-Strang [9,10] procedure.

Splitting: ABA
The numerical implementation of the ABA splitting is given in this section. The BAB splitting is relegated to Appendices C and D. Schematically, the integration can be viewed as pictured in Equation (10). The ABA procedure begins with the initial position and velocity (q 0 , v 0 ), then during the time interval h/2, these evolve into (q H , v H ) using Equations (11) and (12). Next is the so-called full step, where the velocity evolves over a time h changing to w H using Equation (13). The final transformation takes (q H , w H ) and turns them into (q 1 , v 1 ) during a time interval h/2 using Equations (14) and (15).
Note that the integrations in the half steps are exact, while the full step integration is approximate. As the numerical algorithm is based on a symmetric splitting of the time evolution operator, it is symplectic and represents the flow of a shadow Hamiltonian [11,12]. Thus, for a small integration time step h, the energy error will be bounded. In the following section, the formula for this error is examined.

"Energy" Error
The deterministic MD integration is preformed over a total time N MD h. For each time increment, the integration corresponds to the ABA method described above. The total energy error for the MD integration can be determined from the sum of the errors of each ABA step. Thus, it is sufficient to find the expression for the energy error in a single ABA step.
The error of the "energy" in the ABA step is due to the middle (full) Step B, since the first and third steps reflect exact integrations. The error is given by: which simplifies to: Substituting for v H and w H , one transforms the last term using: The value of α can be chosen to be α = sinc (h/2). With this value of α, the energy error is given as: with q 01 = (q 1 + q 0 )/2. Now, it is clear why α was chosen in this way. The approximate equations of motion reproduce the flow of a shadow Hamiltonian [11,12]. Evidently, the flow produced by numerically solving the equations of motion with α = sinc (h/2) is closer to the actual flow than to the one with unit α. The full step in the ABA splitting is: The operator M −1 is defined with the boundary conditions such that ∆v is zero at both the beginning and end of the path. The above is then a matrix equation: with M being a symmetric, real tridiagonal matrix, with the only nonzero elements given by: With the vanishing of ∆v at both ends of the path, there are no endpoint corrections to M. To solve the matrix equation, the standard Gaussian elimination procedure is used (without pivoting).

Numerical Experiments
To explore the efficacy of the HMC method described above, I chose a two-dimensional example. A particle moving in a potential is given by: where x and y are the usual Cartesian coordinates. The first term above is a unit Gaussian centered at x g = −0.5 and y g = −1. The second term is a trough, with a minimum of zero and somewhat squarish in shape. A contour plot of this potential is given in Figure 1. In this figure, note that the white contour corresponds to U(x, y) = 2 and the dashed line designates the contour U[x, y) = 0.05, which encloses the solid black contour U(x, y) = 0.001. A saddle point exists near x s ≈ −0.5 and y s ≈ −1 with U[x s , y s ) = 1. A local maximum exists near x m ≈ −0.225 and y m ≈ −0.794 with U(x m , y m ) ≈ 1.6. Connecting the free-energy basins is an energy-barrier-free path that becomes severely narrowed as it crosses the x-axis. Thus, one can view this potential as consisting of two basins separated by an entropic barrier if the temperature is not too high.

Equilibrium Distribution
In this work, the computer experiments were performed with a temperature = 0.05. The temperature is much smaller than the lowest energy barrier, which is unity at the saddle point. The equilibrium (Boltzmann) distribution at a temperature = 0.05 is displayed in the next two figures. In Figure 2, this distribution is shown in terms of a density plot. The narrow channel at the top of the figure connects the smaller basin on the left with the larger basin (on the right). The lack of an energy barrier is reflected in the (equal) values of the distribution along the channel. The equilibrium (Boltzmann) distribution for a temperature = 0.05 plotted as a "density" plot. The black areas denote the higher probability areas. The narrow channel connects the left and right probability basins. The lack of an energy barrier is seen by the lack of variation in the shading along the channel. The angle Θ is pictured here.
To quantify the equilibrium distribution, I looked at its values as a function of angle Θ, which is defined in a clockwise sense, measured with respect to the negative y axis, as defined in Figure 2. At a temperature = 0.05, the left basin has 36% of the weight, and the remaining 64% is in the right basin. A function P is defined as and is plotted in Figure 3 to emphasize the entropic barrier between the two basins. The value of δ = π/40 was used. Notice that the single-peak structure on the left differs from the twin peaks on the right. This structure is caused by the geometric factor, which arises when the radius line slices the distribution function.

Parameter Tuning
Unless stated otherwise, the form of the OM functional used was that corresponding to the midpoint integration of the SDE as written in Equation (A9). To investigate the two-dimensional paths that contain a transition between the two basins, one must pick several parameters; some are physical: the temperature and the path length. For this problem, the temperature was chosen to be = 0.05. This temperature is small compared to the unit energy barrier. Thus, at this temperature, the narrowness of the channel will be the limiting factor that impedes basin hopping. The path length T must not be too long nor too short. For a very short value of T, the motion will become ballistic and uninteresting. For a very large T, the path will be dominated by intra-basin motion, which is not the focus of the study. The remaining parameters are those of the algorithm: the time length of the deterministic integration, the OU bridge parameters, A x ou and A y ou , and finally, the value of the algorithmic time step h. The last of these will be determined by requiring the Metropolis-Hastings acceptance rate to be sufficiently large.

Path Length
The next point to address is how quickly the particle moves from the left to right basins. To make an estimate of this transition time, I collected information from 10,000 realizations of Brownian dynamics by performing forward integrations of the two-dimensional SDE. Each integration started at the same place in the left basin. The results for the number of attempts residing in the right basin are tabulated as a function of time and plotted in Figure 4. As one sees, the curve approaches its equilibrium value as the total time reaches 1000. Furthermore, when the elapsed time is 125, over a third of the realizations are in the right basin. Thus, this latter value is a reasonable time (path length) to use for exploring doubly constrained paths that contain a transition. In the following sections, paths are examined that start in the left basin and end in the right basin, while time T = 125 has elapsed. The path is then described by 125,001 ordered pairs with ∆t = 1.0 × 10 −3 .

Deterministic Integration Time
One of the advantages of the HMC method is its ability to generate nonlocal moves. To explore the nonlocality of a proposed move, the correlation function d (x) (n h) is defined by: with a similar definition for d (y) (n h), where n is the number of the molecular dynamics steps and N t is the number of points along the path. In Figure 5, d (x) (τ) and d (y) (τ) are plotted as functions of τ for two cases, A The time step parameter h was adjusted so that the (energy) errors were comparable, as seen in Figure 6. Such energy errors will be examined more closely later in the paper. As seen in Figure 5, both d (x) (τ) and d (y) (τ) level out for τ > 2. Thus, the MD integration time should be on the order of π/2 to maximize the nonlocality of the move. To avoid any resonances that might occur if a fixed integration time should be chosen, the length of the integration is chosen to be uniformly distributed in the interval (π/4, 3π/4) designated by the gray shaded area in Figure 5.
The time step parameter for the former case is h = 6.667 × 10 −4 and for the latter h = 6.667 × 10 −5 . In the latter case, the smaller h meant that an order of magnitude more computing resources were required to generated the gray curve, as compared to the black one.

OU Bridge Parameter
To understand how the choice of A x ou and A y ou affect the sampling behavior, consider the plots of the correlation functions in Figure 7. As shown in the this figure and in Figure 5, the y-degree of freedom shows the floppiness that corresponds to the low energy cost of movement in that direction in either basin. The relative stiffness of motion in the x direction is a direct result of the restriction due to the narrow channel connecting the two basins. The floppiness (in the y direction) can be mathematically reduced by increasing the value of A y ou . For the chosen integration times (as designated as a gray rectangle in Figure 7), values A y ou between four and eight put the correlation function d (y) (τ) in the same range as d (x) (τ). However, decreasing A x ou below unity (not shown in the figures) does not alter the stiffness as its origin is due to the geometry of the connecting channel.

MD Time Step Size (h)
For the deterministic integration in the Molecular Dynamics (MD) step, a choice must be made for the size of the time increment h. The size of h must not be too large nor too small. In the first case, the errors become large and the acceptance rate becomes unacceptably small. In the second case, computational resources are wasted. Since the numerical algorithm is based on a Trotter-Strang [9,10] splitting, for small values of h, the energy error is bounded, as the energy of a "nearby" Hamiltonian (the shadow Hamiltonian) is conserved. As shown in Figure 8, for small h, the energy error oscillates as a function of integration time τ. In addition, because of the stochastic nature of the integrals, the curves are noisy, with the noise increasing as h increases. The value for h can be chosen by requiring the Metropolis-Hastings acceptance rate to be on the order of 80%. In Figure 9, this acceptance rate is plotted as a function of the time step h. As can be seen in the plot, the rate is a very steep function of h. Furthermore, when A x OU = A y OU = 0, h must be less than 0.0002 to have a substantial acceptance rate, while when A x OU = 1 and A y OU = 8, h can be chosen almost two orders of magnitude larger. Evidently the acceptance rate is sensitive to h as the values of A x OU and A y OU change.  The above highlights the advantage of the novel algorithm presented here. By choosing the OU parameters, A x OU and A y OU , to be unity, reasonably high acceptance rates can be achieved by using a value of h 10 times larger than when both OU parameters are zero. As shown in Figure 5, the nonlocality generated in the MD integration is only weakly dependent on the OU parameters, but does depend on the total MD integration time. Taken together, this then leads to a corresponding factor of 10 decrease in the computational effort by choosing the OU parameters to be unity. As shown in Figure 9, larger values of h can be used when larger OU parameters are used, leading to even fewer MD steps being necessary to attain the same MD integration time. This translates into a speedup of a factor of 10 or more, showing the power of picking nonzero values for the OU parameters and using the corresponding "optimal" value for h.

Path Sampling
As seen in Figures 2 and 3, in equilibrium, a particle will spend a small fraction of time in the narrow channel. Thus, the expectation is that a transition path would consist of motion in one basin or the other with a short transition while in the narrow channel. In Figure 10, a typical path is displayed: it starts in the left basin, remaining in it for under 25% of the path length, making a transition, remaining in the right basin for approximately the last 75% of the path. The boundary conditions influence the shape of the path, although the time spent in the narrow channel can change during the sampling procedure. The shown path is the end result of a simulation using A x OU = 1 and A y OU = 8 after 100,000 Metropolis-Hastings steps.   Figure 2 for the definition of Θ. At t = 0, the particle starts out in the left basin, makes its way through the narrow channel at t ≈ 50, and ends in the right basin.
In Figure 11, a summary of the evolution of this simulation is plotted. For these values of A OU , on average, the particle spends about 70% of the time in the right basin. The variation of this percentage changes very little (only a few percent) over the course of the simulation run. In Figure 12, a similar plot is shown for the parameters A x OU = 1 and A y OU = 4 and 200,000 Metropolis-Hastings steps, twice as long as used for the results shown in Figure 11. The pictures are similar, except that the mean percentages are slightly shifted, and the variation about the mean is larger in the second case. The smaller variation in the first case is due to the larger value of A y OU , which dampens the oscillations, the floppiness, in the y component. With either parameter set, the path sampling calculations correctly infer that the free energy of the right basin is larger that of the left basin.

Number of Metropolis Steps
Fraction Figure 11. Results for a calculation with A x OU = 1 and A y OU = 8. The bottom (black) curve is the fraction of the path with |x| < 1/2; the middle (gray) curve is the fraction of the path with x < −1/2; and the top (black) curve is the fraction of the path with x > 1/2.

Number of Metropolis Steps
Fraction Figure 12. Results for a calculation with A x OU = 1 and A y OU = 4. The bottom (black) curve is the fraction of the path with |x| < 1/2; the middle (gray) curve is the fraction of the path with x < −1/2; and the top (black) curve is the fraction of the path with x > 1/2. Now, consider Figure 13, where the information for the second simulation run is summarized. Here, the normalized histogram is plotted for the run with A x OU = 1 and A y OU = 4. Note that the comparison with the equilibrium histogram is quite close. This closeness indicates that the calculated ensemble of paths accurately probes the potential shape of each basin. The results shown in Figures 11 and 12 used an initial path where the fraction spent in the left basin was less than the fraction spent in the right. This was chosen because the relative fractions were approximately what was expected from their equilibrium values. For the next calculation, the starting path had a value for the fraction spent in the left well of ≈ 80%, much larger than its equilibrium value. In Figure 14, the evolution of the right and left fractions is shown as a function of the algorithmic steps. In this simulation, A x OU = A y OU = 1 and the MD time step was h = 0.003. As indicated, it took about 10 5 steps before the left and right fractions became roughly equal and twice that many steps before the right fraction became significantly larger that the left. The change in the fraction was due to the movement of the time at which the transition took place and not the creation of a new pair of transitions. One does see an incipient attempt at the creation of a new pair of transitions at the ≈325,000th step of the simulation. This attempt is signaled by the spike in the fraction of the path that is spent in the narrow channel, shown as the dark grey curve.

Number of Metropolis Steps
Fraction Figure 14. Results for a calculation with A x OU = A y OU = 1. The black curve is the fraction of the path with x > 1/2; the light gray curve is the fraction of the path with x < −1/2; and the bottom (dark gray) curve is the fraction of the path with |x| < 1/2.

Continuous-Time Limit
Now, I turn to the calculations using the continuous-time limit of the OM functional [13,14]. The Radon-Nikodym derivative is manipulated using the Girsanov theorem and Ito's lemma. Displayed in Equation (A11), this is denoted as the Ito-Girsanov measure. As shown in Figure 15, the results of a simulation using the Ito-Girsanov measure gave unphysical results. As shown in Figure 16, the paths converge very quickly to objects that look similar to noisy versions of the Most Probable Path (MPP). Notice that this behavior happens after only 5000 hybrid steps. The MPP reflects the maximum of the measure and is dominated by the maxima of the Laplacian (of the physical potential). In the narrow channel, the Laplacian reflects the large curvature due to the confining sides. For calculations using the discrete form of the OM functional, Equation (A4), a very different behavior is found, as shown in Figure 11. This unphysical behavior is similar to what was found before [7,8], where the cause was traced back to a buildup of correlations that were inconsistent with the assumption in the original SDE that the noise is white and uncorrelated with the particle position.

Number of Metropolis Steps
Fraction Figure 15. Results for a calculation with A x OU = 1 and A y OU = 4 using Equation (A11) as the effective Hamiltonian. The black curve labeled "Center" is the fraction of the path with |x| < 1/2; the solid gray curve is the fraction of the path with x > 1/2; and the dashed gray curve is the fraction of the path with x < −1/2.

Discussion
This work is related to the recent work of Korol et al. [15], who explored using a similar approach and applied it to path-integral molecular dynamics. In that work, the authors suggested that one should use mollified forces. This force mollification is a frequencydependent factor that limits the effects of high-frequency internal modes and yet keeps intact the low-frequency motion. In this current work, this task is performed by the mass operator, M = L + A 2 ou 1. Here, A ou can be adjusted to change the crossover between lowand high-frequency behavior. At high frequencies, the modes revert to the free-particle motion. This reversion is due to the effects of the L in the mass operator, which dominates at high frequencies. In the novel method described here, this "force mollification" is folded into a unified framework, incorporating its effects in a self-consistent manner.
If, instead of the dynamics, one is interested in highlighting particular modes, one can choose A ou differently for each mode. In particular, when working in normal mode coordinates, one can choose A ou individually to minimize the size of ∆v in Equation (13), thereby reducing the numerical errors in the integration.

Conclusions
A novel HMC procedure for sampling paths was presented here. This new procedure is much more efficient than previous methods, resulting in an order of magnitude speedup for the two-dimensional problem explored. The decrease in computational effort comes from increasing the size of the MD time increment h by at least an order of magnitude. The novelty in the procedure lies in the choice of the mass matrix and concomitantly the velocities that form an OU bridge. Using this new procedure, I successfully examined transitions across an entropic barrier and found that the method can determine the relative free energy of the two basins that are not separated by an energy barrier.
Again, using the Ito-Girsanov measure was found to give unphysical results. The sampling procedure, for the two-dimensional problem examined, converged to paths that look like noisy versions of the MPP [16] (most probable path), where the vast majority of the path was trapped in the narrow channel that connected the probability basins. Clearly, the source for this unphysical feature is the Laplacian (of the physical potential) in the formula for the measure.
This new algorithm is also useful for calculations using path-integral molecular dynamics [15] where the effective Hamiltonian has a similar form to that studied here. The novel procedure presented here naturally folds the "mollification" of forces into the HMC formalism by a modification of the mass matrix. Thus, this novel procedure includes this mollifying effect in a self-consistent manner. The parameters {A ou } in this novel method are able to be tuned individually for each degree of freedom to emphasize different physical effects.
In addition, this method is able to be adapted for use with the OBABO integration scheme [17] or with the second-order Langevin method [18]. Acknowledgments: I wish to acknowledge many conversations with Patrick Malsom, Andrew Stuart, Kody Law, Gideon Simpson, and Robin Ball. In addition, I wish to thank the Department of Physics, University of Cincinnati, for the funds to cover the publication costs associated with this article.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Onsager-Machlup Functionals
In this Appendix, the algebraic forms of the OM functionals are delineated. For simplicity, the one-dimensional forms of the equations are written below; the multi-dimensional generalization is straight forward. First, recognize that different discretizations of the SDE (Equation (1)) result in different forms of the OM function [19]. Here, two are considered: the Euler-Maruyama and midpoint methods. The former is given by: where ξ n is a random Gaussian variate with mean zero and unit variance and ∆t is the time increment. The original Onsager-Machlup functional is based on this expression. If one iterates Equation (A1) N t times, one obtains a sequence of points |x >= {x 0 , x 1 , x 2 , x 3 , x 4 , . . . x n t }, with a probability: Onsager and Machlup recognized that one can rewrite this probability distribution in terms of the path positions as P EM ∝ exp(−I OM ) with: Defining e f f = 2 /∆t as an effective temperature: with: which can be rewritten as: Here, it is convenient to use the bra-ket mathematical notation. Specifically: By rearranging the order in the sum, the first term in Equation (A4) can be transformed to: where L is a symmetric, tridiagonal, positive definite matrix. The last term in Equation (A4) is denoted as Φ EM ({x}). The effective Hamiltonian can be written as: Another discretization of the SDE (Equation (1)) uses the Stratonovich midpoint [20] integration and is written as: Hamiltonian is a bit more complicated because of the Jacobian and again can be put in the form: The second term in the equation above is: where J n is the Jacobian matrix, whose elements are: and where the Jacobian J n is evaluated at the midpoint. Lastly, the continuous-time limit of Equations (A6) and (A8) can be written as: with: in one dimension. In higher dimensions, the Laplacian of U replaces U (x n ).

Appendix B. Constructing Ornstein-Uhlenbeck Bridges
The following describes the method used to construct Ornstein-Uhlenbeck (OU) bridges. An OU bridge is an OU process that begins and ends at the origin. Consider the standard OU process for a variable Z i given by the following SDE: with A being the OU parameter and being the temperature.
To form a realization of an OU process, we use the close relationship between HMC and the Metropolis Adjusted Langevin Algorithm (MALA). To sample the Boltzmann distribution for a harmonic potential, the Hamiltonian is augmented by auxiliary variables, which can be viewed as momenta, conjugate to the position variables. In one dimension, this is given by: The equations of motion are then: dx dt = p and dp dt = −A x (A15) The HMC procedure is first to form the momentum p 0 = √ ξ where is the temperature and ξ is a Gaussian random variate with mean zero and unit variance.
If one integrates the equations of motion over a time h and uses the velocity-Verlet approximation, one obtains: which becomes: with ∆t = h 2 /2. This is then the Euler-Maruyama [21] formula for the OU process. Within the one-step HMC, one uses the Metropolis-Hastings step to accept or reject the proposed move: Such a move may be rejected if the energy error is too large. However, the equations of motion above can be integrated exactly, which gives: In such a method, no proposed moves will be rejected in the Metropolis-Hastings step. Thus, these steps can be combined to obtain To first order in ∆t, the above agrees with other expressions for forming a realization of an OU process. The advantage of the above form is that for any size of ∆t, the path asymptotically approaches the target Boltzmann (Gaussian) distribution. Now, the ending point of the OU bridge needs to be addressed. The ending point of an OU bridge is also zero, but this is not the case for a typical realization of an OU process. To form a bridge, two OU processes can be combined by the following procedure. Assume that Z (1) and Z (2) are two different finite realizations of Equation (A13); both starting at the origin with endpoints z ± are OU bridges. Another way exists in the literature [22] for forming OU bridges. However, I found it to be numerically unstable due to the need to calculate the ratios of hyperbolic sines with large arguments.

Appendix C. BAB Splitting: Numerical Integration
In this section, the numerical algorithm is written down for the case when the BAB splitting is used. The method is schematically described in Equation (A22).