Preface

Condensed matter systems, ranging from simple fluids and solids to complex multicomponent materials and even biological matter, are governed by well understood laws of physics, within the formal theoretical framework of quantum theory and statistical mechanics. On the relevant scales of length and time, the appropriate 'first-principles' description needs only the Schroedinger equation together with Gibbs averaging over the relevant statistical ensemble. However, this program cannot be carried out straightforwardly—dealing with electron correlations is still a challenge for the methods of quantum chemistry. Similarly, standard statistical mechanics makes precise explicit statements only on the properties of systems for which the many-body problem can be effectively reduced to one of independent particles or quasi-particles.

As the interactions among so many degrees of freedom introduce nontrivial correlations between them, only computer simulation provides us with a methodic route to make accurate explicit predictions for the static and dynamic properties of many-body physical systems starting from first principles. The molecular dynamics simulation method (MD) was introduced in the 1950s, shortly after the 'companion' Monte Carlo method. Since then, the scope of both has been rapidly expanding. Despite the fact that suitable computing facilities were scarce, very slow, and with very small storage capacities compared to present-day facilities, immediately important and, at the time, rather surprising discoveries were made—notably that hard spheres crystallize at a density long before close packing has been achieved and that dynamic correlations in fluids exhibit long time tails. These have been the starting point of a great variety of methodological developments, with many exciting technical extensions still under development, providing broad applications and opportunities for important discoveries.

Nowadays, with pervasive high-speed networking and powerful massively parallel computers in the hands of every scientist, simulation methods are advancing at breathtaking speed. Molecular dynamics computer simulation offers the advantage that connections can be established between models of condensed matter on the different scales of the hierarchy, from the sub-Angstrom scale, where one deals with effects due to the electrons, up to the mesoscopic and macroscopic scales relevant for living matter. Applications cut across extremely diverse fields, from fundamental problems in solid state physics to the rich world of phenomena exhibited by complex fluids and biological systems, elucidating the electronic properties of materials as well as the major nonequilibrium processes that take place in the living cell. The goal is to develop a simulation approach for complex materials and biological matter that successfully bridges the gap from the small scales of electronic structure calculations to the mesoscopic scales of pattern formation in soft matter (where one uses coarse-grained techniques such as dissipative particle dynamics and multiscale collision dynamics). This is a goal that will remain an exciting challenge for many years to come.

The contributions collected in this book move from the quantum-statistical description to the validity of classical modeling; they present some perspectives on algorithmic and enhanced sampling approaches, tackling some longstanding challenges to simulation in the areas of non-equilibrium, rare events, mesoscale and quantum-classical simulation. Initially, the book deals with the validity of molecular dynamics modeling, starting from the adiabatic hypothesis for the electronic ground state: the first contribution explores different descriptions of the potential energy surfaces one can use in a molecular dynamics simulation; the second analyzes in detail the Born-Oppenheimer schemes for *ab initio* MD within Kohn–Sham density functional theory, while the third tackles the problem from the alternative perspective of a quantum Monte Carlo approach. The next contribution dwells on how to improve the statistical ensemble properties of time integrators for Langevin dynamics by including an acceptance–rejection scheme. The subject of free energy calculations by molecular dynamics is illustrated in the next two contributions, first with a presentation of alternative dynamical approaches for performing enhanced sampling by force biasing and temperature acceleration, then using non-equilibrium path sampling within the framework of the Jarzynski identity and the Crooks fluctuation theorem. The general ideas behind non-equilibrium molecular dynamics are the focus of the next two contributions, regarding the calculation of dynamical responses and the application of Malliavin weight sampling to dynamical trajectories. Many of the same ideas are at the core of the study of rare, reactive events by molecular dynamics, as discussed in the next two contributions, first in general terms and then with specific reference to the Markov state models approach. The last four invited contributions are dedicated to the problem of dealing with well separated space and time scales. First, the general philosophy of multiscale approaches and the related computational strategies within molecular dynamics are discussed in a concept paper, while the other three deal with specific non-adiabatic dynamical approaches for systems with a mixed quantum-classical description, based upon alternative approaches borrowing either from the Wigner transform representation or from the Bohmian formulation of quantum dynamics. The book is completed by the contributed papers to the molecular dynamics special issue.

The reader will find answers to a number of questions, a few of which we can briefly recall here:


This is what you will find in the present book, but many more questions, some yet to be posed, will certainly find their answers in the forthcoming developments of molecular dynamics simulation.

We wish to acknowledge the collaboration of the many people who have made this special issue possible. First of all, the authors, whose rigor, good work and speed have, of course, been instrumental. Also, we are very grateful to the many anonymous referees for the invaluable work of guaranteeing the quality and soundness of the contributions. Thanks, finally, to Jely He: she and the entire MDPI staff of the Editorial Office of Entropy have generously given invaluable help and professional skill to bring this adventure to a successful conclusion.

Giovanni Ciccotti, Mauro Ferrario and Christof Schuette *Guest Editors* 

Reprinted from *Entropy*. Cite as: Ballone, P. Modeling Potential Energy Surfaces: From First-Principle Approaches to Empirical Force Fields. *Entropy* 2014, *16*, 322–349.

#### *Article*

## Modeling Potential Energy Surfaces: From First-Principle Approaches to Empirical Force Fields

#### Pietro Ballone

Department of Physics, Università di Roma "La Sapienza", Roma 00185, Italy; E-Mail: pballone58@gmail.com; Tel.: +39-06-4991-4248; Fax: +39-06-4991-7697

*Received: 17 September 2013; in revised form: 15 October 2013 / Accepted: 18 October 2013 / Published: 30 December 2013*

Abstract: Explicit or implicit expressions of potential energy surfaces (PES) represent the basis of our ability to simulate condensed matter systems, possibly understanding and sometimes predicting their properties by purely computational methods. The paper provides an outline of the major approaches currently used to approximate and represent PESs and contains a brief discussion of what still needs to be achieved. The paper also analyses the relative role of empirical and *ab initio* methods, which represents a crucial issue affecting the future of modeling in chemical physics and materials science.

Keywords: atomistic modeling; bond-order potentials; *ab initio* methods

#### 1. Introduction

Most, if not all, computer simulations using particles require the specification of the system potential energy as a function of the particles' coordinates [1]. The most fundamental *ab initio* methods, such as those discussed in [2], represent systems as made of electrons and atomic nuclei, and Coulomb's law is sufficient to account for every interaction. In all other cases, *particles* represent composite objects, such as atoms or atomic nuclei, dressed by core electrons, possibly embedded into a sea of valence electrons described at some approximate level of a many-body theory. Then, all the relevant interactions need to be worked out on a case-by-case basis, and the effort required to determine inter-particle forces may represent a sizeable fraction of the work to be done to investigate condensed matter systems [3].

The sections that follow contain an overview of modeling approaches and a discussion of their relative merits and limitations. Needless to say, the variety of systems and methods, together with the sheer size of the knowledge accumulated over decades, imposes strict limits on the scope of this presentation. First of all, the focus is on atomistic models, *i.e.*, models in which the number and geometry of interaction centers follow the distribution of atoms closely. A second major branch of modeling, concerning coarse-graining approaches, is the subject of a separate contribution (see [4]).

Moreover, again because of space limitations, the discussion that follows mainly concerns the most restrictive picture of interatomic interactions, based on the assumption that the potential energy of a system of $N$ atoms can be expressed as a single-valued function of their $3N$ coordinates, $\{\mathbf{R}\_i, i = 1, ..., N\}$, which represents the so-called potential energy surface (PES) of the system. This assumption relies, first of all, on the so-called Born-Oppenheimer approximation [5], whose validity is loosely attributed to the ∼3–4 orders of magnitude difference in the mass of electrons and atomic nuclei, giving rise to a clear separation of the characteristic energy and time scales for the motion of electrons and atomic nuclei. Then, for any given instantaneous configuration of the atomic cores, electrons will be able to reach their electronic ground state, justifying the single-valued assumption for the system potential energy. Experience shows that this "adiabatic assumption" is fairly well justified for a wide variety of systems and thermodynamic conditions. Some cases, however, are left out of this picture, and these often represent systems and phenomena of great interest. Methods suitable to deal with these cases are discussed in [6].

Computational science and simulation, in particular, always have a practical and an algorithmic aspect to them, and a central theme of research is the development of efficient ways to approximate and represent PESs. The availability of simple and computationally-convenient models of inter-particle interactions, for instance, has been instrumental in the dawning of computer simulation. Since then, the two complementary stages of determining the relevant interactions and of working out their structural, thermodynamic and dynamical consequences have cross fertilized each other, so much that the terms, *modeling* and *simulation*, often appear together in the title of books, papers, conferences, workshops and funding proposals.

Nowadays, the general perception of atomistic modeling is that of an overwhelmingly important and successful field, steadily expanding its reach towards more complex systems, which in this context means systems combining a wider variety of chemical bonds. In this respect, it is clear that much remains to be done, for instance, to bring under the cover of simulation heterogeneous systems and interfaces at which organic, semiconducting and metal phases meet each other or to model systems in which chemical transformations take place.

During the last few decades, *ab initio* simulation methods have progressively come to play the role of the elephant in the (modeling) room. Methods, such as density functional theory [7,8] and *ab initio* molecular dynamics [9], could, in principle, replace all other approaches, reducing the variety of modeling problems to just one, concerning the effective and accurate representation of the energy of valence electrons in the field of atomic nuclei or ionic cores.

Up to now, this replacement has not been pervasive, mainly because of the size and time limitations of *ab initio* methods running on present-day computers and partly because the approximations that make *ab initio* computations feasible still somewhat limit their accuracy on the energy scale of thermal motion, especially for molecular systems whose properties are determined by weak interactions among closed-shell molecules. *Ab initio* modeling, however, is progressing and extending its reach. As far as atomistic simulation is concerned, therefore, empirical and semi-empirical models might eventually be squeezed out by the combination of *ab initio* methods and coarse-grained approaches. Simple models of atom-atom interactions, however, are likely to retain their appeal, because of their unique ability to represent and rationalize the microscopic forces underlying the properties and behaviors of condensed matter systems.

#### 2. The Potential Energy Surface (PES) of a Many-Atom System

From a physicist's point of view, ordinary matter consists of an assembly of electrons and atomic nuclei, evolving according to the laws of quantum mechanics. The non-relativistic limit is adequate for many of the systems and properties of interest for the present discussion, and unless otherwise specified, we shall restrict ourselves to this case.

Let us therefore consider a system made of $N$ electrons and $K$ nuclei, and let $\{\mathbf{r}\_i, i = 1, ..., N\}$ and $\{\mathbf{R}\_\alpha, \alpha = 1, ..., K\}$ be the coordinates of electrons and nuclei, respectively. The corresponding linear momenta are denoted by $\{\mathbf{p}\_i\}$ and $\{\mathbf{P}\_\alpha\}$. In the absence of external fields, the system Hamiltonian is:

$$\hat{H}\_0 = \sum\_{\alpha=1}^{K} \frac{\mathbf{P}\_{\alpha}^2}{2M\_{\alpha}} + \sum\_{i=1}^{N} \frac{\mathbf{p}\_i^2}{2m} + \frac{1}{2} \sum\_{\alpha \neq \beta} \frac{Z\_{\alpha} Z\_{\beta} e^2}{|\mathbf{R}\_{\alpha} - \mathbf{R}\_{\beta}|} - \sum\_{i,\alpha} \frac{Z\_{\alpha} e^2}{|\mathbf{r}\_i - \mathbf{R}\_{\alpha}|} + \frac{1}{2} \sum\_{i \neq j} \frac{e^2}{|\mathbf{r}\_i - \mathbf{r}\_j|} \tag{1}$$

that, for the sake of simplicity, we re-write as:

$$
\hat{H}\_0 = T\_{ion} + T\_{ele} + V\_{ion-ion} + V\_{ion-ele} + V\_{ele-ele} \tag{2}
$$

with an obvious correspondence between Equations (1) and (2). The Hamiltonian does not depend on the spin of electrons and nuclei, since we restrict ourselves to the non-relativistic limit and do not include any spin-orbit interaction. Unless otherwise specified, Hartree atomic units ($\hbar = e^2 = m = 1$) are used in this section.

Let us assume that the system is described by a many-body wave function, $\Psi(\mathbf{r}\_1, ..., \mathbf{r}\_N; \mathbf{R}\_1, ..., \mathbf{R}\_K; t)$, whose time evolution is determined by the time-dependent Schrödinger equation:

$$i\hbar\frac{\partial\Psi(\{\mathbf{r}\_{i}\};\{\mathbf{R}\_{\alpha}\};t)}{\partial t} = \hat{H}\_{0}\Psi(\{\mathbf{r}\_{i}\};\{\mathbf{R}\_{\alpha}\};t) \tag{3}$$

with appropriate boundary conditions in space and in time. Since the Hamiltonian is time independent, let us turn to the equivalent version of this same problem, concerned with the stationary states, $\Psi\_k(\{\mathbf{r}\_i\}; \{\mathbf{R}\_\alpha\})$, of $\hat{H}\_0$.

The first important step towards the definition of a potential energy surface for the atomic nuclei is provided by the Born-Oppenheimer approximation (BO), which, under suitable and often verified conditions, opens the way to a separate description of the time evolution of electrons and nuclei [5]. The intuitive justification of BO is the observation that the motion of electrons and nuclei takes place over different time scales, since $M\_\alpha/m$ is at least $M\_n/m \sim 1800$, and usually approaches $2Z\_\alpha M\_n/m$, where $M\_n$ is the mass of a nucleon (proton or neutron). Moreover, the ratio of vibrational and rotational excitations is again $\sim M\_\alpha/m$. Experimental data confirm that, indeed, typical electronic excitations are of the order of a few eV; vibrational energies reach up to a few hundred meV, and even for small molecules, the separation of rotational levels is of the order of 1 meV. The conclusion is that the excitation of electrons, because of vibrational or rotational motion, is very unlikely. We can therefore represent the motion of electrons as taking place in the slowly varying field of the nuclei. Consistent with these qualitative arguments, the BO approximation breaks down whenever the energy of relevant electronic excitations becomes comparable to typical vibrational energies (or, much less likely, comparable to rotational energies). In those cases, vibrational and electronic excitations need to be considered on the same footing.
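As a rough numerical illustration of the mass ratios quoted above, the following sketch estimates the nucleus-to-electron mass ratio for a few elements. The rounded proton-to-electron ratio, the function name and the choice of elements are assumptions made here for illustration; nuclear binding energies and the proton-neutron mass difference are ignored.

```python
import math

# Rounded proton-to-electron mass ratio (assumed value, adequate for an estimate).
NUCLEON_TO_ELECTRON_MASS = 1836.15

def nuclear_to_electron_mass_ratio(Z, A=None):
    """Estimate M_alpha / m for a nucleus with Z protons and A nucleons.

    If A is omitted, A ~ 2 Z is assumed, which reproduces the
    2 Z M_n / m estimate quoted in the text.
    """
    if A is None:
        A = 2 * Z
    return A * NUCLEON_TO_ELECTRON_MASS

# Illustrative elements (hypothetical selection): symbol, Z, mass number A.
for name, Z, A in [("H", 1, 1), ("C", 6, 12), ("Fe", 26, 56)]:
    ratio = nuclear_to_electron_mass_ratio(Z, A)
    print(f"{name}: M/m ~ {ratio:.0f} (about 10^{math.log10(ratio):.1f})")
```

Hydrogen sets the lower limit of the ratio, and heavier nuclei only widen the separation between electronic and nuclear time scales.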

The core of the so-called adiabatic approximation can be given a semi-rigorous mathematical formulation in the following way [5]. Let us re-write $\hat{H}\_0$ as:

$$
\hat{H}\_0 = \hat{T}\_{ion} + \hat{H}\_{ele} \tag{4}
$$

where $\hat{H}\_{ele} = \hat{T}\_{ele} + V\_{ion-ion} + V\_{ion-ele} + V\_{ele-ele}$. The energy term, $V\_{ion-ion}$, commutes with all other terms in $\hat{H}\_{ele}$, and its inclusion in the electronic part is just a matter of convenience.

For every choice of the nuclear coordinates, $\{\mathbf{R}\_\alpha, \alpha = 1, ..., K\}$, the eigenvalue problem:

$$
\hat{H}\_{ele}\psi\_j(\{\mathbf{r}\_i\}\mid\{\mathbf{R}\_{\alpha}\}) = E\_j(\{\mathbf{R}\_{\alpha}\})\psi\_j(\{\mathbf{r}\_i\}\mid\{\mathbf{R}\_{\alpha}\})\tag{5}
$$

is well defined and provides a sequence of eigenvalues, $E\_j(\{\mathbf{R}\_\alpha\})$, and eigenfunctions, $\psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\})$. At this stage, nuclei are "clamped", *i.e.*, they are no longer treated as particles endowed with a mass and a momentum, but only as sources of the potential acting on the electrons. The notation, $(\mathbf{r}\_i \mid \mathbf{R}\_\alpha)$, means that $\psi\_j$ is an explicit function of $\mathbf{r}\_i$ and depends parametrically on the nuclear coordinates, $\{\mathbf{R}\_\alpha\}$.
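The clamped-nuclei procedure can be sketched numerically with a toy model: a two-state "electronic" Hamiltonian that depends parametrically on a single nuclear coordinate is diagonalized at each value of that coordinate, and the eigenvalues trace out two adiabatic surfaces. The 2×2 matrix, its parameters `a` and `coupling`, and the function names are assumptions introduced here for illustration; they are not taken from the text.

```python
import numpy as np

def h_ele(R, a=1.0, coupling=0.1):
    """Toy 2x2 electronic Hamiltonian for one clamped nuclear coordinate R.

    The diagonal entries are two straight-line (diabatic) curves crossing
    at R = 0; the constant off-diagonal term mixes them. Assumed model.
    """
    return np.array([[a * R, coupling],
                     [coupling, -a * R]])

def adiabatic_surfaces(R_grid, **params):
    """Solve H_ele(R) psi_j = E_j(R) psi_j at every grid point.

    Returns an array of shape (len(R_grid), 2) with E_0(R) <= E_1(R).
    """
    return np.array([np.linalg.eigvalsh(h_ele(R, **params)) for R in R_grid])

R_grid = np.linspace(-1.0, 1.0, 201)
E = adiabatic_surfaces(R_grid)
gap = E[:, 1] - E[:, 0]
print("smallest gap between the two surfaces:", gap.min())
print("located at R =", R_grid[gap.argmin()])
```

In this model the two surfaces never touch: the gap is smallest where the diabatic curves cross, which is the region where the decoupling of electronic and nuclear motion is expected to be least reliable.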

The functions, $\psi\_j$, are a basis for the Hilbert space spanned by the electron coordinates, and we can represent $\Psi\_k$ as follows:

$$\Psi\_k(\{\mathbf{r}\_i\}, \{\mathbf{R}\_\alpha\}) = \sum\_j \psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\}) \chi\_j^{(k)}(\{\mathbf{R}\_\alpha\}) \tag{6}$$

where, at this stage, $\chi\_j^{(k)}(\{\mathbf{R}\_\alpha\})$ is simply the coefficient expressing the projection of $\Psi\_k$ on $\psi\_j$:

$$\chi\_j^{(k)}(\{\mathbf{R}\_\alpha\}) = \int \psi\_j^\*(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\}) \Psi\_k(\{\mathbf{r}\_i\}, \{\mathbf{R}\_\alpha\}) \prod\_{i=1}^N d\mathbf{r}\_i \tag{7}$$

The equation for $\Psi\_k$ becomes:

$$
\hat{H}\_0 \Psi\_k(\{\mathbf{r}\_i\}, \{\mathbf{R}\_\alpha\}) = (\hat{T}\_{ion} + \hat{H}\_{ele}) \Psi\_k(\{\mathbf{r}\_i\}, \{\mathbf{R}\_\alpha\}) \tag{8}
$$

$$
= \sum\_j \Big[ \chi\_j^{(k)}(\{\mathbf{R}\_\alpha\}) E\_j(\{\mathbf{R}\_\alpha\}) \psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\}) + \psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\}) \hat{T}\_{ion} \chi\_j^{(k)}(\{\mathbf{R}\_\alpha\})
$$

$$
\qquad + \chi\_j^{(k)}(\{\mathbf{R}\_\alpha\}) \hat{T}\_{ion} \psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\}) \Big] = \mathcal{E}\_k \Psi\_k(\{\mathbf{r}\_i\}, \{\mathbf{R}\_\alpha\})
$$

Let us now multiply on the left by $\psi\_m^\*(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\})$ and integrate over the electron coordinates. One obtains in this way a set of coupled partial differential equations for the $\chi\_m^{(k)}(\{\mathbf{R}\_\alpha\})$ functions:

$$E\_m(\{\mathbf{R}\_\alpha\})\chi\_m^{(k)}(\{\mathbf{R}\_\alpha\}) + \hat{T}\_{ion}\chi\_m^{(k)}(\{\mathbf{R}\_\alpha\}) + \sum\_j \chi\_j^{(k)}(\{\mathbf{R}\_\alpha\}) \langle \psi\_m \mid \hat{T}\_{ion} \mid \psi\_j \rangle = \mathcal{E}\_k \chi\_m^{(k)}(\{\mathbf{R}\_\alpha\}) \tag{9}$$

where $\mathcal{E}\_k$ is the eigenvalue of the full (*i.e.*, electrons and ions) Hamiltonian, $\hat{H}\_0$, and the relation, $\langle \psi\_m \mid \psi\_j \rangle = \delta\_{mj}$, has been used. The coupling among the equations is due to the non-diagonal part of $\langle \psi\_m \mid \hat{T}\_{ion} \mid \psi\_j \rangle$:

$$\langle \psi\_m \mid \hat{T}\_{ion} \mid \psi\_j \rangle = \sum\_{\alpha} \frac{1}{M\_{\alpha}} \int \left[ -i \frac{\partial \psi\_m(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_{\alpha}\})}{\partial \mathbf{R}\_{\alpha}} \right]^\* \left[ -i \frac{\partial \psi\_j(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_{\alpha}\})}{\partial \mathbf{R}\_{\alpha}} \right] \prod\_{i=1}^N d\mathbf{r}\_i \tag{10}$$

whose computation requires the parametric dependence of $\psi\_m(\{\mathbf{r}\_i\} \mid \{\mathbf{R}\_\alpha\})$ on the $\{\mathbf{R}\_\alpha\}$ coordinates to be continuous and differentiable.

Neglecting these non-diagonal terms, the equations for the electronic and ionic coordinates are decoupled, and the picture emerging from this manipulation of Equation (6) is that of nuclei evolving on the potential energy surfaces $U\_j[\{\mathbf{R}\_\alpha\}] = E\_j(\{\mathbf{R}\_\alpha\}) + \langle \psi\_j \mid \hat{T}\_{ion} \mid \psi\_j \rangle$. This last expression, corresponding to the so-called Born-Huang approximation [10], represents, in fact, an upper bound for the system's potential energy. A lower bound, instead, is given by the original BO approximation, *i.e.*, $U\_j[\{\mathbf{R}\_\alpha\}] = E\_j(\{\mathbf{R}\_\alpha\})$.

The nuclear motion is, in general, quantum mechanical, and, depending on initial conditions, it might occur on any of the $U\_j$ potential energy surfaces (PESs). More precisely, since the equations for different $j$'s are separated, it will take place on a single surface of index $j$, provided the starting point is consistent with this choice. This condition, which we identify with *adiabatic motion*, underlies most of the simulations that are routinely carried out in computational condensed matter physics. Moreover, in most cases, but with notable exceptions, the relevant PES corresponds to the electronic ground state, and the scale of times and energies of interest allows the use of classical dynamics instead of quantum mechanics [6].

The following sections are devoted to the discussion of the general properties of PESs, and of computationally tractable approaches to approximate them. Before doing that, it might be interesting to consider briefly when the BO approximation and the conditions for adiabatic motion are no longer valid.

An estimate of the $\langle \psi\_m \mid \hat{T}\_{ion} \mid \psi\_j \rangle$ terms can be obtained by perturbation theory, showing that the strength of the non-diagonal coupling is proportional to:

$$
\langle \psi\_m \mid \hat{T}\_{ion} \mid \psi\_j \rangle \propto \frac{1}{E\_m - E\_j} \langle \psi\_m \mid [\mathbf{P}\_\alpha, \hat{H}\_{ele}] \mid \psi\_j \rangle \tag{11}
$$

Moreover, the matrix element of the commutator can be shown to depend primarily on the properties of individual atoms and to be only moderately dependent on the $\{\mathbf{R}\_\alpha\}$ coordinates. Then, the major factor determining the coupling strength among different adiabatic surfaces is the energy gap separating different PESs. Whenever $(E\_m - E\_j)$ becomes comparable to the typical energies of the atomic motion, the BO decoupling is no longer valid; the electronic and ionic motions are intimately intertwined, and both need to be treated quantum mechanically. The range of quantum mechanical features that become relevant in the non-BO case goes beyond delocalization and diffraction, and includes the appearance of geometric (Berry-Pancharatnam) phases [11].
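The role of the energy gap can be made concrete with a small numerical sketch. For a toy two-state electronic Hamiltonian depending on one nuclear coordinate (an assumed model, with hypothetical parameters `a` and `coupling`), the first-derivative coupling between the two adiabatic states is evaluated from the off-diagonal Hellmann-Feynman expression, which is the matrix element of the derivative of the electronic Hamiltonian with respect to the nuclear coordinate divided by the gap. This is the same structure as the gap-dependent factor discussed above, not a full evaluation of the kinetic-energy matrix element.

```python
import numpy as np

def h_ele(R, a=1.0, coupling=0.1):
    """Toy 2x2 electronic Hamiltonian (assumed model) for one coordinate R."""
    return np.array([[a * R, coupling],
                     [coupling, -a * R]])

def dh_ele_dR(a=1.0):
    """Derivative of the toy Hamiltonian with respect to R."""
    return np.array([[a, 0.0],
                     [0.0, -a]])

def derivative_coupling(R, a=1.0, coupling=0.1):
    """Return (|d_01|, gap) at nuclear coordinate R.

    d_01 = <psi_0| dH/dR |psi_1> / (E_1 - E_0); the absolute value is
    taken because the sign depends on the arbitrary eigenvector phases.
    """
    energies, states = np.linalg.eigh(h_ele(R, a, coupling))
    gap = energies[1] - energies[0]
    numerator = states[:, 0] @ dh_ele_dR(a) @ states[:, 1]
    return abs(numerator) / gap, gap

for R in (0.0, 0.2, 1.0):
    d01, gap = derivative_coupling(R)
    print(f"R = {R:4.1f}  gap = {gap:.4f}  |d_01| = {d01:.4f}")
```

The coupling is largest at the point of closest approach of the two surfaces and decays rapidly as the gap opens, consistent with the qualitative argument in the text.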

Far from being the exception, violations of the BO approximation are pervasive. They occur often, but not exclusively, at the so-called conical intersections [11], playing a major role in chemical reactions and, for instance, challenging our ability to model catalysis [12]. Apparent non-BO effects are routinely highlighted by clever experiments [13,14].

Metals, whose occupied states are immediately contiguous in energy to the empty states, may appear as the most obvious candidates for large deviations from the BO picture. In the vicinity of the Fermi surface, however, single particle excitations are the only relevant excitations, but the coupling of each of these excitations to the nuclear motion (through Equation (11)) is vanishingly small. Collective electron excitations, such as plasmons, couple to the atomic motion, but their energies are of the order of several eV and, thus, are comparable to, if not higher than, those of closed shell atoms and molecules. As a result, vibrational properties of metals are generally well described by adiabatic dynamics. Exceptions are represented by Kohn anomalies, resulting from the nesting of reciprocal lattice vectors with the Fermi surface. Metals also provide the setting for a type of BO violation qualitatively different from those considered until now, represented by superconductors, in which the coupling of the electron and nuclear motion changes the symmetry of the ground state.

The isolated system picture underlying the BO decoupling has been generalized in [15–17] to the case of electrons and nuclei evolving in an external time-dependent potential. It was shown, in particular, that the full wave function can be factorized exactly into an electronic and a nuclear wave function, again opening the way to the definition of a time-dependent PES. The picture is less simple than in the static case, since it involves the introduction of a Berry vector potential and of Berry-Pancharatnam geometric phases [18,19] into the problem. This approach has already provided the basis for the real-time simulation of molecular systems in strong (laser) external fields. For completeness, I mention that some details of the formal framework might still need to be worked out for a fully rigorous treatment [20].

#### 3. Properties of Potential Energy Surfaces

Basic features of the PES can be anticipated even without an explicit solution of the standard electronic problem in Equation (5). A surprisingly realistic intuition of what a PES looks like was outlined in elegant Latin prose long before quantum mechanics [21], based on an atomistic hypothesis and on the assumption that the still undiscovered atoms felt each other mainly at short distances.

The modern interpretation confirms this picture and adds a wealth of microscopic detail. The direct Coulomb repulsion among nuclei, unscreened by electrons at short distances, prevents the close contact of atoms and their eventual collapse. The kinetic energy of the electrons tightly bound to the nuclei will provide an additional repulsive contribution, resulting from the need to preserve the Pauli principle. On the other hand, the formation of chemical bonds gives rise to attractive potentials, binding atoms together. Even in the case of inert species, subtle quantum mechanical effects give rise to dispersion forces, which provide a weak, but pervasive, attraction.

Arguably, the simplest and most intuitive picture of atomic interactions is provided by pair potential models, in which the system energy is written as:

$$U[\{\mathbf{R}\_{\alpha}\}] = \frac{1}{2} \sum\_{\alpha \neq \beta} \phi\_{\alpha\beta}(|\mathbf{R}\_{\alpha} - \mathbf{R}\_{\beta}|) \tag{12}$$

where the $\alpha\beta$ label on $\phi\_{\alpha\beta}$ indicates that the interaction depends on the chemical identity of particles $\alpha$ and $\beta$. A spherically symmetric potential has been assumed for the sake of simplicity.
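A minimal sketch of how Equation (12) could be evaluated for a small cluster is given below. The Lennard-Jones form of the pair function, the reduced units and the single chemical species are assumptions chosen here for illustration; the text does not prescribe a specific functional form.

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential (assumed example form, reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def pair_energy(positions, phi=lennard_jones):
    """U = (1/2) sum_{a != b} phi(|R_a - R_b|) for positions of shape (N, 3).

    Each unordered pair is visited once, which is equivalent to the
    factor 1/2 in front of the double sum.
    """
    pos = np.asarray(positions, dtype=float)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    upper = np.triu_indices(len(pos), k=1)
    return phi(dist[upper]).sum()

# Dimer and equilateral trimer at the distance of the pair-potential minimum.
r_min = 2.0 ** (1.0 / 6.0)
dimer = [[0.0, 0.0, 0.0], [r_min, 0.0, 0.0]]
trimer = dimer + [[0.5 * r_min, 0.5 * np.sqrt(3.0) * r_min, 0.0]]
print("dimer energy: ", pair_energy(dimer))
print("trimer energy:", pair_energy(trimer))
```

In a pair model the trimer energy is exactly three times the dimer energy at the same bond length: every bond contributes the same amount regardless of its environment, which is the feature that many-body models relax.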

Computations and comparison with experiments have shown that an expression of this kind is suitable for rare gases [22] and for simple ionic compounds [23]. Systems and models of this kind have been instrumental in establishing computer simulation as a quantitative research tool in condensed matter and in chemical physics.

Needless to say, the scope of pair potentials is very narrow, and limitations of this model were already apparent well before the dawn of computer simulation, based on the results of lattice dynamics models in metals and semiconductors.

One could think of the pair potential expression as being only the lowest order approximation of the PES into an n-body expansion of the form:

$$U[\{\mathbf{R}\_{\alpha}\}] = \frac{1}{2!} \sum\_{\alpha,\beta} V\_2(\mathbf{R}\_{\alpha}, \mathbf{R}\_{\beta}) + \frac{1}{3!} \sum\_{\alpha,\beta,\gamma} V\_3(\mathbf{R}\_{\alpha}, \mathbf{R}\_{\beta}, \mathbf{R}\_{\gamma}) + \dots \tag{13}$$

For a system made of a finite and constant number of particles, such an expression can always be written down. For instance, one could define $V\_2$ as the interaction energy of two isolated atoms, $V\_3$ as the corresponding energy of trimers, minus the symmetrized combination of $V\_2$ contributions, *etc*. Such an expansion, however, is useful only if it converges within a few terms, if only because the cost of evaluating successive $n$-body terms grows rapidly with increasing $n$. Moreover, it contributes to the physical understanding of the system behavior only when its convergence is absolute, *i.e.*, it does not require the cancellation of contributions of alternating sign, whose amplitude is constant or even increasing with increasing order. Model computations based on a tight binding Hamiltonian [24], however, show that even for simple systems, the expansion in Equation (13) is not well behaved and, thus, is seldom useful for practical computations.
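The recursive construction just described can be written out explicitly. In the sketch below, $\Delta E$ denotes the interaction energy of an isolated cluster, *i.e.*, its total energy minus the energies of its separated atoms; this symbol is introduced here for illustration and does not appear in the original text. The sums in Equation (13) are understood to run over distinct indices.

$$
V\_2(\mathbf{R}\_\alpha, \mathbf{R}\_\beta) = \Delta E(\mathbf{R}\_\alpha, \mathbf{R}\_\beta)
$$

$$
V\_3(\mathbf{R}\_\alpha, \mathbf{R}\_\beta, \mathbf{R}\_\gamma) = \Delta E(\mathbf{R}\_\alpha, \mathbf{R}\_\beta, \mathbf{R}\_\gamma) - \left[ V\_2(\mathbf{R}\_\alpha, \mathbf{R}\_\beta) + V\_2(\mathbf{R}\_\alpha, \mathbf{R}\_\gamma) + V\_2(\mathbf{R}\_\beta, \mathbf{R}\_\gamma) \right]
$$

With these definitions, the expansion truncated at $V\_3$ reproduces the energy of every isolated dimer and trimer exactly, and each higher-order term is defined as whatever remains once all lower-order contributions have been subtracted.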

More fruitful than the systematic expansion of Equation (13) has been the introduction of the *cluster potential* idea [25,26], loosely and sometimes more closely based on the bond-order concept introduced by Pauling [27]. In this approach, a fixed and low number of terms is retained; the expression loses its character of a systematic series to become an asymptotic expansion. Each of the few terms that are retained describes a low-order potential whose strength depends on the local environment. Approaches of this kind have given origin to the most popular family of potentials used to simulate metals and metallic alloys and also to some important approaches to approximate the PES of semiconductors, which are discussed in the following sections.

#### 4. Many-Body Interactions: Metals and Metal Alloys

Metals and their alloys posed an early challenge to the pair or few-body potential picture, since their basic properties manifest essential many-body interactions [28].

The successful and physically-motivated incorporation of these effects into tractable models in the early eighties of the last century has spawned a vast simulation activity, aiming, at first, at reproducing phase diagrams, then at analyzing in detail surfaces and interfaces and further progressing towards the prediction of mechanical properties through multi-scale approaches. Physical metallurgy is currently one of the most active and productive subfields of atomistic simulation [29,30].

Many-body interactions in metals were first identified by the analysis of their elastic properties. In particular, the elastic constants of cubic materials consisting of atoms interacting via spherically symmetric pair potentials have to satisfy the so-called Cauchy relations, stating, for instance, that $C\_{12} = C\_{44}$. The violation of this relation, known in the solid state literature as a Cauchy anomaly, is the rule more than the exception in metals, unambiguously pointing to a deviation from the pair potential picture.
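A sketch of why central pair potentials enforce this relation is given below, under assumptions not spelled out in the text: the crystal is at zero stress, and every atom sits at a center of inversion, so that a homogeneous strain is not accompanied by internal relaxations. Here $\Omega\_0$ is the volume per atom, $\phi$ is the pair potential of Equation (12) for a single species, and the sum runs over the lattice vectors $\mathbf{R}$ connecting one atom to all others; these symbols are introduced here for illustration. Expanding the energy per atom to second order in the strain gives:

$$
C\_{ijkl} = \frac{1}{2\Omega\_0} \sum\_{\mathbf{R} \neq 0} \left[ \phi''(R) - \frac{\phi'(R)}{R} \right] \frac{R\_i R\_j R\_k R\_l}{R^2}
$$

The right-hand side is invariant under any permutation of the four Cartesian indices, so that $C\_{1122} = C\_{1212}$, which in Voigt notation reads $C\_{12} = C\_{44}$. An energy contribution that depends on the volume or on the local density, rather than on pair distances alone, is not bound by this symmetry.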

These features were first rationalized by considering the basic representation of a metal, as made of ions embedded into a sea of valence electrons. Since the major ingredient, *i.e.*, the homogeneous electron gas, could be solved analytically, and, at least for sp metals, the electron-ion interaction is weak, the full problem could be attacked by perturbation theory [28,31]. Carried up to the second order, this approach provides an expression for the system total energy that consists of a large volume (or, equivalently, density) term and a pair potential contribution. The volume term is able to account for the Cauchy anomaly. In simple metals, such as the alkalis, the pair potential is relatively soft at short distances and oscillates at large distances, reflecting Friedel oscillations. These features explain the bcc structure of these systems at normal conditions and provide a clue to understand more complex structures adopted by the lighter alkali metals at very low temperature or found in slightly more complex systems, such as alloys, or heavier sp metals, such as gallium, indium or tin.

Approaches of this kind are now mainly of historical interest, since most of the cases relevant for applications involve transition metals, and in those systems, the valence electron-ion interaction is by no means weak; the perturbation expansion cannot be limited to the second order and rapidly becomes intractable beyond that point [32]. Besides these fundamental problems, other practical difficulties concern the definition and the zero-order solution of an electron gas problem suitable for inhomogeneous systems and for alloys. Electron gas perturbative approaches, therefore, could not solve problems such as the inward relaxation of crystal surfaces, the quantitative description of stacking faults or the overestimation by pair potentials of the vacancy formation energy in metals.

To overcome these problems, new models have been proposed in [33–35], conforming to the cluster-potential idea [26], and representing low-order approximations to a bond-order potential. The embedded atom model (EAM) of [33,34], loosely based on density functional theory, has the broadest appeal, and for this reason, it is used here as a representative of a wider class of models.

According to EAM, each metal ion, i, at position **R<sub>i</sub>** gains an energy, E[ρ<sub>e</sub>(**R<sub>i</sub>**)], upon being immersed into the valence electron distribution at density ρ<sub>e</sub>(**R<sub>i</sub>**) and interacts with neighboring ions by a short range repulsive pair potential, V<sub>2</sub>(R). The energy of N metal atoms, therefore, is:

$$U[\{\mathbf{R\_i}\}] = \frac{1}{2} \sum\_{i \neq j}^{N} V\_2(|\ \mathbf{R\_i} - \mathbf{R\_j}|) + \sum\_{i=1}^{N} E[\rho\_e(\mathbf{R\_i})] \tag{14}$$

The picture is completed by a prescription to compute the electron density, ρ<sub>e</sub>, at the position, **R<sub>i</sub>**, of each atomic core. EAM represents such a density as the sum of contributions from every other atom:

$$\rho\_e(\mathbf{R\_i}) = \sum\_{j \neq i} t\_j (|\mathbf{R\_i} - \mathbf{R\_j}|) \tag{15}$$

where the t<sub>j</sub>(R) are again relatively short-range functions, mimicking the tail of the electron distribution around an isolated atom. Since it introduces a *local* embedding density, this prescription overcomes most of the limitations of the free electron models, which instead rely on a global definition of the valence electron density.
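As an illustration of Equations (14) and (15), the following minimal sketch evaluates the EAM energy of a small isolated cluster. The functional forms `v2`, `t` and `embed` and all their numerical constants are hypothetical placeholders chosen for this example; they are not taken from [33,34] or from any published parameterization.

```python
import numpy as np

# Hypothetical functional forms, chosen only for illustration
def v2(r):
    return 50.0 * np.exp(-2.0 * r)      # short-range repulsion V_2(R)

def t(r):
    return np.exp(-r)                   # atomic density tail t(R)

def embed(rho):
    return -np.sqrt(rho)                # embedding energy E[rho]

def eam_energy(pos, cutoff=6.0):
    """Total energy of Eq. (14) for an isolated cluster (no periodic images)."""
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    near = (dist < cutoff) & ~np.eye(n, dtype=bool)   # pairs i != j within range
    r = np.where(near, dist, 1.0)                     # dummy distance where masked
    pair = 0.5 * np.sum(np.where(near, v2(r), 0.0))
    rho = np.sum(np.where(near, t(r), 0.0), axis=1)   # Eq. (15)
    return pair + np.sum(embed(rho))

d = 2.5
dimer = np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0]])
trimer = np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0],
                   [0.5 * d, 0.5 * np.sqrt(3.0) * d, 0.0]])
e_dimer = eam_energy(dimer)
e_trimer = eam_energy(trimer)
print(e_dimer, e_trimer, 3.0 * e_dimer)
```

With a concave embedding function such as the square root assumed here, the equilateral trimer is less strongly bound than three independent dimers with the same bond length. The energy per bond depends on the local coordination, which a pair potential cannot reproduce.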

Parameters and auxiliary functions, such as t(R), E[ρ<sub>e</sub>] and V<sub>2</sub>(R), could be computed from first principles [36], but this approach has been only moderately successful. Far more effective has been the strategy of adopting the EAM potential energy expression as a general framework, relying on fitting experimental quantities to tune a few parameters distributed into the functional form.

The success of EAM has been due to its ability to overcome the limitations of simpler models, easily accounting for the Cauchy anomaly, the reduced value of the vacancy formation energy, the inward relaxation of compact metal surfaces and the reconstruction of more open ones. Its broad acceptance relies also on the many and physically appealing properties of the model, discussed in a number of publications, such as the ease of extending EAM to alloys or the close relation with pair potentials in the case of homogeneous systems at constant volume.

From the computational point of view, the efficiency of EAM is due to the pair potential form of both the repulsive contribution, V<sub>2</sub>, and the embedding density expression in Equation (15). The time required to carry out a simulation based on EAM is expected to be twice that of a pair potential model, since a pass on all atom pairs is required to compute the repulsive potentials and the embedding density, while a second pass is needed to compute forces on atoms arising from the embedding energy. With suitable lists of neighbors, and depending on the range of V<sub>2</sub>(R) and of t(R), EAM can be used to carry out MD simulations for systems of 10<sup>4</sup> atoms over several nanoseconds using laptops or inexpensive PCs. Supercomputers extend these ranges to several million atoms and μs time scales.
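The two-pass structure can be sketched as follows. This is a minimal single-species illustration that uses a plain double loop instead of neighbor lists; the functions `v2`, `t`, `emb` and their derivatives are hypothetical placeholders, not a published potential. The second pass is needed because the embedding force on a pair involves dE/dρ of both partners, and these are known only once all densities have been accumulated.

```python
import numpy as np

# Hypothetical single-species functions with analytic derivatives (illustration only)
def v2(r):
    return 50.0 * np.exp(-2.0 * r)
def dv2(r):
    return -100.0 * np.exp(-2.0 * r)
def t(r):
    return np.exp(-r)
def dt(r):
    return -np.exp(-r)
def emb(rho):
    return -np.sqrt(rho)
def demb(rho):
    return -0.5 / np.sqrt(rho)

def eam_energy_forces(pos, cutoff):
    n = len(pos)
    rho = np.zeros(n)
    forces = np.zeros_like(pos)
    energy = 0.0
    pairs = []
    # Pass 1: pair repulsion (energy and forces) and embedding densities
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            r = np.linalg.norm(d)
            if r >= cutoff:
                continue
            u = d / r
            pairs.append((i, j, r, u))
            energy += v2(r)
            forces[i] -= dv2(r) * u
            forces[j] += dv2(r) * u
            rho[i] += t(r)
            rho[j] += t(r)
    energy += np.sum(emb(rho))
    # dE/drho for every atom that has neighbors (isolated atoms stay at zero)
    de = np.zeros(n)
    has_neighbors = rho > 0.0
    de[has_neighbors] = demb(rho[has_neighbors])
    # Pass 2: embedding forces, using dE/drho of both partners
    for i, j, r, u in pairs:
        f = -(de[i] + de[j]) * dt(r) * u
        forces[i] += f
        forces[j] -= f
    return energy, forces

cluster = np.array([[0.0, 0.0, 0.0], [2.4, 0.1, 0.0],
                    [1.1, 2.2, 0.2], [1.0, 0.8, 2.1]])
energy, forces = eam_energy_forces(cluster, cutoff=10.0)
print(energy)
print(forces.sum(axis=0))   # net force should vanish to rounding error
```

Comparing the analytic forces with a finite-difference derivative of the energy is a simple way to check an implementation of this kind.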

Needless to say, an empirical and approximate approach, such as EAM, cannot provide the final answer to the problem of modeling metals, and transition metals, in particular. A comprehensive discussion of inaccuracies and limitations identified during thirty years of applications is beyond the scope of this short review, and only two examples are briefly mentioned here. Phonons in transition metal crystals, a property routinely measured by inelastic neutron scattering, are not well reproduced by EAM. The elastic constants usually enter the fitting of the potential, and thus, the low-frequency acoustic phonons close to the Γ-point of the first Brillouin zone are usually well reproduced. Higher frequency modes at the zone boundary, however, turn out to be too soft with respect to the experimental data (see Figure 1). Transition metal clusters from a few to several thousand atoms are important for catalysis and represent a basic ingredient of nanotechnology. EAM neglects the details of the electronic structure of the atoms, leaving out quantum mechanical effects, such as the Jahn-Teller effect. Thus, EAM is unable to quantitatively reproduce the structure and cohesive properties of the very small aggregates as provided by density functional computations. Beyond ∼100 atoms, cluster properties are expected to evolve more continuously with size, approaching those of bulk phases beyond 10<sup>4</sup> atoms. EAM has been used extensively to investigate clusters across this range, but a quantitative validation of the model is still lacking and difficult to achieve, since *ab initio* computations become too expensive to carry out, and experiments find it difficult to probe this range of cluster sizes.

A step beyond EAM, needed to quantitatively model the fine details of the structure, thermodynamics and dynamics of transition metal systems, requires the introduction of explicit angular terms into the potential energy expression. This can be achieved through a conceptually simple extension of EAM, known as modified EAM (MEAM) [34], or resorting to a chemically accurate bond-order potential model, including the directionality of d and f electron orbitals, as well as the distinction of σ, π, δ, ..., bonding, anti-bonding and non-bonding orbitals [37].

The MEAM is somewhat more complex to use than EAM, and probably for this reason, it has been less extensively applied. Moreover, its ability to quantitatively overcome the limitations of the simpler model is not always so apparent. The other approaches, more closely based on the bond order approach, appear to be cumbersome to use in simulations, and the number of applications based on these models has been limited.

Figure 1. Phonon frequencies of fcc palladium from experiments (symbols, see [38]) and from the embedded atom model (EAM) of [33].

Because of the inclusion of angularly dependent forces, the scope of MEAM could, in principle, cover semiconductors. Successful applications have been published [34], but more specific models, described in the following section, have received broader attention in this subfield.

#### 5. Semiconductors and Insulators

Semiconductor materials, exemplified by silicon, germanium, gallium arsenide, *etc*., are characterized by fairly open and complex structures of relatively low coordination, stabilized by sizeable angular forces, arising from the directionality of covalent bonds. Apart from elemental systems, most inorganic semiconductors are characterized, in fact, by a combination of covalent and ionic bonding. Several of these systems, most notably silicon and germanium, turn into metals upon melting.

Despite the difficulty of reproducing these properties by few-body potentials, the urgency of investigating the elements and compounds that fueled the electronic revolution stimulated the first bold attempts. The two- and three-body potential for silicon proposed by Stillinger and Weber [39] arguably has been the most representative example of this first generation of models.

Despite their interest, approaches of this kind have been only moderately successful, and once again, the bond-order concept [27] proved more fruitful. Its application to semiconductors was first discussed by Abell [25] before being used in a more empirical setting by Tersoff [40,41] and extended by Brenner [42] to a wider class of systems and problems.

According to these models, the potential energy of an assembly of N atoms of coordinates {**R<sub>i</sub>**} is written as:

$$E\_N = \sum\_{i \neq j} \left[ A \exp\left(-\lambda\_1 R\_{ij}\right) - B\_{ij} \exp\left(-\lambda\_2 R\_{ij}\right) \right] \tag{16}$$

where R<sub>ij</sub> = |**R<sub>i</sub>** − **R<sub>j</sub>**|. The first term, representing the short-range repulsion, is a genuine pair potential. The second term contains many-body contributions via the dependence of B<sub>ij</sub> on the local environment around the interacting pair, ij.

This form has obvious analogies with the EAM case. The difference is that B<sub>ij</sub> not only counts neighbors, as the embedding density does, but also takes into account the angular correlation among their mutual positions. This addition is required to enforce the dominance of tetrahedral sp<sup>3</sup> coordination, but also to carve a secondary role for other structures, from the sp<sup>2</sup> bonding of graphite, to the octahedral coordination of liquid silicon and germanium [40,41].
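The role of the environment-dependent coefficient in Equation (16) can be made concrete with a deliberately simplified sketch. The bond-order expression, the angular penalty and the cutoff function below are assumptions in the spirit of Tersoff-type models, and all parameter values are placeholders rather than a published parameterization. The sum runs over ordered pairs i ≠ j, as Equation (16) is written.

```python
import numpy as np

# Placeholder parameters, NOT a published parameterization
A, B, LAM1, LAM2 = 1800.0, 470.0, 2.5, 1.7
BETA, H = 0.3, -1.0 / 3.0        # H: cosine of the tetrahedral angle
R_IN, R_OUT = 2.7, 3.0           # smooth cutoff window (assumed)

def f_cut(r):
    """Smooth cutoff: 1 below R_IN, 0 above R_OUT, cosine taper in between."""
    x = np.clip((r - R_IN) / (R_OUT - R_IN), 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * x))

def bond_order(pos, i, j):
    """B_ij = B / sqrt(1 + BETA * zeta_ij); zeta_ij sums over the other neighbors k of i."""
    rij = pos[j] - pos[i]
    dij = np.linalg.norm(rij)
    zeta = 0.0
    for k in range(len(pos)):
        if k == i or k == j:
            continue
        rik = pos[k] - pos[i]
        dik = np.linalg.norm(rik)
        cos_theta = np.dot(rij, rik) / (dij * dik)
        # "1" counts the neighbor; the second term penalizes non-tetrahedral angles
        zeta += f_cut(dik) * (1.0 + (cos_theta - H) ** 2)
    return B / np.sqrt(1.0 + BETA * zeta)

def bond_order_energy(pos):
    """Energy in the form of Eq. (16), with a cutoff factor added as an assumption."""
    total = 0.0
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i == j:
                continue
            r = np.linalg.norm(pos[i] - pos[j])
            total += f_cut(r) * (A * np.exp(-LAM1 * r)
                                 - bond_order(pos, i, j) * np.exp(-LAM2 * r))
    return total

def trimer(d, theta):
    """Atom 0 at the origin, bonded to atoms 1 and 2 at distance d, with angle theta."""
    return np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0],
                     [d * np.cos(theta), d * np.sin(theta), 0.0]])

d = 2.35
dimer = np.array([[0.0, 0.0, 0.0], [d, 0.0, 0.0]])
b_dimer = bond_order(dimer, 0, 1)
b_tet = bond_order(trimer(d, np.arccos(H)), 0, 1)
b_60 = bond_order(trimer(d, np.radians(60.0)), 0, 1)
print(b_dimer, b_tet, b_60)
```

In this toy model the attractive coefficient of the 0-1 bond is largest for the isolated dimer. It is reduced by a third neighbor, and reduced further when that neighbor sits at 60° rather than at the tetrahedral angle.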

Parallel to the EAM case for metals, potentials of this type replaced previous models and established a new standard in modeling semiconducting systems. Success, however, has been somewhat less pervasive than in the case of EAM, for reasons that are relatively easy to identify. First of all, interactions in semiconductors are more complex and propagate at a longer range, since screening is not as effective as in metals. Moreover, semiconducting alloys and compounds give rise to partially Coulombic interactions, whose combination with covalent bonding has seldom been modeled, even by bond-order potentials.

Furthermore, in this case, the systematic improvement beyond the semi-empirical Tersoff and Brenner potentials has to rely on the analytical development of chemically accurate bond-order models [43]. Work along these lines is underway and has shown promising developments, but current models still appear fairly difficult to implement in molecular dynamics or Monte Carlo packages.

An important development of Brenner's scheme has been the introduction of *reactive* force fields, able to describe chemical transformations in the system under consideration. The majority of the parameterizations and applications published until now concern organic systems, but potentials of this kind are mentioned here for their similarity with models first introduced for semiconductor systems. Prototypical examples of a reactive force field are the so-called ReaxFF [44] and the REBO potential [45]. Both models require a massive parametrization effort, and for this reason, they appear to be fairly *ad hoc* and system specific.

A different line of attack to modeling semiconducting systems is suggested by the observation that in many cases, force fields of the form currently used to model organic systems and consisting of stretching, bending and torsion terms might indeed provide a good representation of structural and dynamical properties of semiconductors and of network insulators, such as silica. Models of this kind, in fact, were developed well before the age of computer simulation, and extensively used in lattice dynamics studies of semiconductors and insulators [46]. The problem with these models is that, mainly because of the established tradition, the topology of bonds is kept fixed, bonds are harmonic and can neither form nor break. These models, therefore, describe only low amplitude oscillations around a pre-assigned minimum of the potential energy surface. Removing these inessential constraints by introducing rules to break, form and interchange bonds results in a far more realistic picture. It was shown, for instance, that such a reactive force field model of silica undergoes melting at approximately the right conditions [47] (see Figure 2), and the same model has been used to provide an intriguing view of the amorphous silica surface at length and time scales unachievable by other methods [48].

Figure 2. Average potential energy per atom U(T)/k<sub>B</sub> of SiO<sub>2</sub> computed by the force field of [47]. k<sub>B</sub> is the Boltzmann constant, introduced to express energies in temperature units (K). Solid dots: heating a β-cristobalite sample. Solid line: cooling the same sample from high temperature. The potential energy contribution, C<sub>p</sub>, to the constant pressure specific heat computed on heating the full model is shown in the inset. The peak in C<sub>p</sub> and the anomaly in U(T) at around the same temperature point to a melting transition at T<sub>M</sub> ∼ 2150 K.

Progressively increasing the electronegativity difference in compound semiconductors enhances the charge transfer among atoms, widening the band gap and turning the system into an ionic insulator. In the limit of strongly ionic materials, of course, pair potentials are adequate, but only a few compounds belong to this class, such as, for instance, alkali-halides or the oxides and chlorides of Group IIA and Group IIB metals. In between ionic insulators and polar semiconductors, there is a vast number of systems, including technologically relevant compounds, such as ceramics, transition metal oxides, ferroelectric and ferroid materials, minerals and bio-minerals, in particular, for which no current model is fully satisfactory. One of the major issues for these systems is the inclusion of polarizability into ionic and polar models [49]. Unfortunately, simulation approaches using polarizable models require either the minimization at every step of a polarization energy functional or the inclusion into the model of charged shells [50]. These last represent electronic degrees of freedom and react to electric fields on a time scale much faster than that of ionic vibrations [51]. Both methods are significantly at a disadvantage with respect to cases in which the potential energy is an explicit function of the atomic coordinates, and the simulation of systems bound by a combination of covalent and ionic forces appears to be split between oversimplified pair potential models and *ab initio* approaches.

#### 6. Force Fields for Molecular Systems

Although every material ultimately consists of atoms, many systems are more easily understood as being made of molecules.

Modeling the PES of small and relatively unreactive species, such as N<sub>2</sub>, O<sub>2</sub>, CO, CO<sub>2</sub>, but, also, PF<sub>6</sub>, BF<sub>4</sub>, BH<sub>4</sub>, *etc*., requires only a slight extension of the pair-potential picture. Each molecule is represented by a small number of interaction centers, which may or may not coincide with atoms in number and position. The intra-molecular configuration is enforced by constraints representing rigid bonds or, less often, by harmonic springs, while centers on different molecules interact pair-wise. Because of their simplicity, models for small inorganic molecules have been used since the early days of computer simulation. Perhaps the most remarkable observation concerning these systems is that the quantitative details of their PES are still under investigation and require surprisingly sophisticated models to be reproduced [52,53].

Conspicuously absent in the list of small unreactive and supposedly simple molecules is water, whose peculiar properties and special role have motivated an extraordinary modeling effort, which is discussed separately in Section 7.

A specialized subfield of modeling simple species concerns systems in which a weakly bound molecular fluid is physisorbed on an inert solid surface, such as MgO, mica, graphite and flat or stepped transition metal surfaces. In this case, the effect of the solid substrate on the molecular fluid often is represented as an external field. In the case of crystal surfaces, the in-plane dependence of the field strength can be expanded in plane waves, whose wave vectors reflect the periodicity and symmetry of the surface lattice [54].
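Written out, such an expansion takes the following generic form. The notation is an assumption introduced here: **r**<sub>∥</sub> is the in-plane position, z the height above the surface, and **G** runs over the two-dimensional reciprocal lattice vectors of the surface.

$$V(\mathbf{r}\_\parallel, z) = \sum\_{\mathbf{G}} V\_{\mathbf{G}}(z) \, e^{i \mathbf{G} \cdot \mathbf{r}\_\parallel}$$

The **G** = 0 coefficient is the laterally averaged potential, while the remaining coefficients describe the corrugation of the surface. In practice, the sum is typically truncated after the first few shells of **G**, since the higher coefficients usually decay quickly with increasing z.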

#### *6.1. Organic Molecular Systems*

In many respects, organic molecular systems are not so different from any other molecular systems, but the range and impact of their applications together with the explosive expansion of simulation in bio-physics and bio-chemistry amply justify a separate discussion. Systems of interest in this context include polymers, hydrocarbons, sugars, cellulose, *etc*., but also the endless variety of biological molecules, from phospholipids to proteins and nucleic acids. Other molecular organic systems of biological interest include drugs, simple nutrients, signal molecules, such as hormones, metabolic species, such as ATP, GTP, NADP, coenzymes, including vitamins, and prosthetic groups.

The modeling and simulation of systems of this kind arguably is the computational condensed matter activity with the largest economic relevance, both directly via the commercialization of packages and force fields and indirectly through the impact it has on applied research.

Despite the complexity of the structures they form, the PESs of organic systems turn out to be approximated fairly well by simple analytic expressions. First of all, the organic and biological species of interest are made primarily of light elements, forming strong covalent bonds through their s and p orbitals, giving origin to closed shell molecules. Systems of this kind, therefore, can be thought of as consisting of atoms connected by a fixed topology of bonds, with inter-molecular, *i.e.*, non-bonded, interactions consisting of pair-wise Coulomb and dispersion forces. Because of their sp character, intra-molecular angular forces are relatively simple. Whenever d electron metals are involved, as in metal centers and in prosthetic groups, modeling becomes far more challenging.

In the standard cases, the PES of organic and biological systems is written as the sum of contributions from bonded (U<sub>b</sub>) and non-bonded (U<sub>nb</sub>) interactions:

$$U = U\_b + U\_{nb} \tag{17}$$

The bonded energy, in turn, is given by the sum of two-, three- and four-body terms from atoms joined by one ({ij}), two ({ijk}) and three ({ijkl}) consecutive covalent bonds:

$$U\_b = \frac{1}{2} \sum\_{\{ij\}} K\_{ij}^s [R\_{ij} - \bar{R}\_{ij}]^2 + \frac{1}{2} \sum\_{\{ijk\}} K\_{ijk}^b [\theta\_{ijk} - \bar{\theta}\_{ijk}]^2 + \frac{1}{2} \sum\_{\{ijkl\}} K\_{ijkl}^\tau \left[1 + \cos \left(n \phi\_{ijkl} - \bar{\phi}\_{ijkl}\right)\right] \tag{18}$$

K<sup>s</sup><sub>ij</sub>, K<sup>b</sup><sub>ijk</sub> and K<sup>τ</sup><sub>ijkl</sub> are suitable force constants; R̄<sub>ij</sub>, θ̄<sub>ijk</sub>, φ̄<sub>ijkl</sub> and n reflect the length, bending and dihedral angles of unstrained bonds. The sub-indices, ij, *etc*., indicate that each of these parameters depends on the chemical identity of the atoms involved. The form for the dihedral contribution in Equation (18) is just one of a few different expressions used in popular force fields, while the choice for stretching and bending terms is more uniform.
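The three contributions to Equation (18) can be sketched as follows. The data layout (tuples of atom indices followed by parameters) and every numerical value are assumptions made for this example and do not correspond to any specific force field; angles are in radians, and the sign convention of the dihedral angle is a choice of this sketch.

```python
import numpy as np

def stretch_energy(pos, bonds):
    """bonds: iterable of (i, j, K_s, R_eq)."""
    return sum(0.5 * ks * (np.linalg.norm(pos[i] - pos[j]) - r_eq) ** 2
               for i, j, ks, r_eq in bonds)

def bend_energy(pos, angles):
    """angles: iterable of (i, j, k, K_b, theta_eq), with j the central atom."""
    total = 0.0
    for i, j, k, kb, theta_eq in angles:
        a, b = pos[i] - pos[j], pos[k] - pos[j]
        cos_t = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
        total += 0.5 * kb * (theta - theta_eq) ** 2
    return total

def dihedral(p0, p1, p2, p3):
    """Dihedral angle about the p1-p2 bond: 0 for cis, +/- pi for trans."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1     # component of b0 perpendicular to the bond
    w = b2 - np.dot(b2, b1) * b1     # component of b2 perpendicular to the bond
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

def torsion_energy(pos, torsions):
    """torsions: iterable of (i, j, k, l, K_tau, n, phi_eq)."""
    return sum(0.5 * kt * (1.0 + np.cos(n * dihedral(pos[i], pos[j], pos[k], pos[l]) - phi_eq))
               for i, j, k, l, kt, n, phi_eq in torsions)

# Four-atom chain with right-angle bends, in trans and cis conformations
trans = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, -1.0, 0.0]])
cis = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
torsion = [(0, 1, 2, 3, 4.0, 3, 0.0)]
print(torsion_energy(trans, torsion), torsion_energy(cis, torsion))
```

With n = 3 and a zero reference angle, the torsional term vanishes in the trans (staggered) geometry and reaches its maximum, equal to the force constant, in the cis (eclipsed) one.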

Non-bonded interactions are written as:

$$U\_{nb} = \frac{1}{4\pi\epsilon\_0} \sum\_{i \neq j}' \frac{q\_i q\_j}{R\_{ij}} + \sum\_{i \neq j}' 4\epsilon\_{ij} \left[ \left(\frac{\sigma\_{ij}}{R\_{ij}}\right)^{12} - \left(\frac{\sigma\_{ij}}{R\_{ij}}\right)^6 \right] \tag{19}$$

where the {q<sub>i</sub>} are atomic charges, Coulomb forces are assumed to be acting in vacuum and σ<sub>ij</sub> and ε<sub>ij</sub> are suitable coefficients for the dispersion interaction. The prime on each sum indicates that pairs of atoms separated by one and two consecutive bonds are excluded, and the contribution from pairs separated by three consecutive bonds might be reduced.
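A minimal sketch of the primed sums in Equation (19) is given below. Several choices are assumptions of this example rather than statements from the text: each pair is counted once (i < j), the 1-4 scaling factor `scale14` has an arbitrary default, the pair coefficients follow Lorentz-Berthelot combination rules, and units are nm, kJ/mol and elementary charges.

```python
import numpy as np
from itertools import combinations

KE = 138.935458  # 1/(4 pi eps_0) in kJ mol^-1 nm e^-2

def bond_separation(n_atoms, bonds, max_sep=3):
    """Number of bonds (up to max_sep) separating each pair, by breadth-first search."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    sep = {}
    for start in range(n_atoms):
        seen, frontier = {start}, [start]
        for depth in range(1, max_sep + 1):
            nxt = []
            for a in frontier:
                for b in adj[a]:
                    if b not in seen:
                        seen.add(b)
                        nxt.append(b)
                        sep[(start, b)] = depth
            frontier = nxt
    return sep

def nonbonded_energy(pos, q, sigma, eps, bonds, scale14=0.5):
    """Coulomb plus Lennard-Jones energy with 1-2 and 1-3 exclusions and scaled 1-4 pairs."""
    sep = bond_separation(len(pos), bonds)
    total = 0.0
    for i, j in combinations(range(len(pos)), 2):
        s = sep.get((i, j))
        if s in (1, 2):
            continue                               # excluded pairs
        weight = scale14 if s == 3 else 1.0        # reduced 1-4 contribution
        r = np.linalg.norm(pos[i] - pos[j])
        sig = 0.5 * (sigma[i] + sigma[j])          # Lorentz rule (assumed)
        e_ij = np.sqrt(eps[i] * eps[j])            # Berthelot rule (assumed)
        lj = 4.0 * e_ij * ((sig / r) ** 12 - (sig / r) ** 6)
        total += weight * (KE * q[i] * q[j] / r + lj)
    return total

# Linear four-atom chain: only the 1-4 pair (0, 3) contributes
chain = np.array([[0.0, 0.0, 0.0], [0.15, 0.0, 0.0], [0.30, 0.0, 0.0], [0.45, 0.0, 0.0]])
chain_bonds = [(0, 1), (1, 2), (2, 3)]
e_chain = nonbonded_energy(chain, q=[1.0, 0.0, 0.0, -1.0], sigma=[0.3] * 4,
                           eps=[0.0] * 4, bonds=chain_bonds)
print(e_chain)
```

Production codes replace the loop over all pairs by neighbor lists and treat the long-range Coulomb term with lattice-sum methods. The sketch is only meant to show how the exclusion rules enter.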

The remarkable and, to some extent, unique property of the PES of organic and biological systems is that the bonds, whose properties are described in Equation (18), are fairly transferable, meaning that the equilibrium length, stiffness, *etc*., of a given organic bond is nearly the same in a large number of homologous compounds. Highlighting these similarities and exploiting them to endow the model with broad transferability is the most challenging and most rewarding part of modeling organic molecular systems.

The parametrization and, especially, validation of these potentials may require sizeable computations and are the playground of large collaborations, since they require the convergence of several types of complementary expertise. Any single system might be analyzed by *ab initio* computations to derive intra-molecular force constants and atomic charges. These need to be complemented by suitable coefficients for the dispersion part, which are usually obtained by fitting measured properties, such as the equilibrium density and enthalpy per molecule or the molecular diffusion constant.

Generic potentials covering large classes of compounds and widely used by the community include Amber [55], CHARMM [56], OPLS [57] and Gromos [58]. More specialized parameterizations, tuned on the properties of specific families of compounds, are too many to be listed.

In many respects, the most uncertain part of the parametrization is the choice of coefficients for the non-bonded interactions. The definition of atomic charges is not unique, and different methods provide fairly different results. The most popular approach [59] attributes charges by fitting the electrostatic potential outside gas-phase molecules, as provided by *ab initio* computations. The method is physically sound, but the fit becomes ill conditioned whenever the molecular size exceeds ∼15–20 atoms or when the geometry is compact, thus reducing the number of multipole moments whose magnitude is significantly different from zero. Constraints and minimum conditions on the size of individual charges do improve the fit [60], but the choice of these parameters remains fairly uncertain. For each individual system, the error introduced by the choice of the charges may be compensated for by the selection of the dispersion coefficients. In fact, it has been observed many times that it was possible to accurately reproduce the target properties of condensed phases, such as the density or the molecular diffusion, even starting from the fairly different charges provided by different methods. Unfortunately, this cancellation of errors limits the transferability of the potential, since an equivalent compensation might not occur when a given organic molecule is transferred into a different environment.
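The electrostatic-potential fit can be illustrated by a small constrained least-squares problem. The sketch below is not the procedure of [59] or [60]: it imposes only the total molecular charge, through a Lagrange multiplier, and it uses synthetic data generated from known point charges in place of an *ab initio* potential. The geometry, grid and charges are made up for the example, and atomic units are assumed.

```python
import numpy as np

def fit_esp_charges(atom_pos, grid_pos, esp, total_charge=0.0):
    """Least-squares point charges reproducing `esp` on `grid_pos`, with fixed total charge."""
    # Design matrix: potential at grid point p due to a unit charge on atom i
    dist = np.linalg.norm(grid_pos[:, None, :] - atom_pos[None, :, :], axis=-1)
    a = 1.0 / dist
    n = atom_pos.shape[0]
    # Normal equations bordered by the charge constraint (Lagrange multiplier)
    kkt = np.zeros((n + 1, n + 1))
    kkt[:n, :n] = a.T @ a
    kkt[:n, n] = 1.0
    kkt[n, :n] = 1.0
    rhs = np.concatenate([a.T @ esp, [total_charge]])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n], np.linalg.cond(a)

# Synthetic test case: three sites in a water-like arrangement (bohr)
rng = np.random.default_rng(0)
atoms = np.array([[0.0, 0.0, 0.0], [1.43, 1.11, 0.0], [-1.43, 1.11, 0.0]])
directions = rng.normal(size=(200, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
grid = directions * rng.uniform(4.0, 7.0, size=(200, 1))   # points outside the molecule
q_true = np.array([-0.8, 0.4, 0.4])
esp = (1.0 / np.linalg.norm(grid[:, None, :] - atoms[None, :, :], axis=-1)) @ q_true

q_fit, condition_number = fit_esp_charges(atoms, grid, esp)
print(q_fit, condition_number)
```

The condition number of the design matrix, returned alongside the charges, gives a quick diagnostic of the ill-conditioning discussed above. It grows when the columns become nearly linearly dependent, as happens for buried atoms in large or compact molecules.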

Especially for large biological systems, computational cost considerations have motivated approximations and shortcuts that might reduce the size of the simulated system. One obvious saving is obtained by representing CH<sub>2</sub> and CH<sub>3</sub> groups in aliphatic chains by a single particle. This *united-atoms* approximation is fairly well justified, since these groups are small and the non-bonded potential arising from them is fairly spherical. Moreover, the motion of hydrogen in each of these groups is frozen by quantum effects up to fairly high temperature.

A second more drastic approximation concerns systems in solution. Since, especially in biochemistry, one is interested in the properties of the solute, implicit solvent models [61] have been developed to replace the effect of the solvent by suitable modifications of the solute force field. In many respects, implicit solvent models are a special case of coarse graining and, as such, are left out of our discussion.

In summary, the force field modeling of organic and biological systems is a largely successful enterprise, validated by a vast number of applications and supporting the research of a large portion of the simulation community. Furthermore, in this case, and almost needless to say, the vast simulation activity has highlighted many cases of inaccuracies or outright failures. The general feeling, however, is that the scale of most of these simulations is too large to allow, at present, the usage of significantly more sophisticated and more expensive approaches. Polarizability is likely to be the single most relevant missing ingredient, but the available methods to include it into simulations are still fairly expensive, and for this reason, explicitly polarizable models have been used only for a limited number of large-scale studies.

At present, a very active research field is the development of force fields for organo-metallic complexes, which represent prosthetic groups in proteins or active groups in a variety of organic opto-electronic devices and are important also for homogeneous catalysis. Peculiar difficulties are represented by the variety of coordination numbers, sometimes corresponding to different spin states, thus pointing to multiple PESs fairly close in energy. Moreover, the structure of organo-metallic complexes is characterized by the importance of quantum mechanical effects, such as the Jahn-Teller effect, or by the so-called *trans influence*, defined as the "tendency of a ligand to selectively weaken the bond *trans* to itself" [62]. Models to include these effects in empirical PES models might turn out to be too complex to be used in practice. A more promising alternative is provided by QM/MM approaches, using classical force fields for most of the system and resorting to *ab initio* methods for the challenging portion around the metal center.

An intriguing subset of mainly, but not exclusively, organic compounds is represented by the so-called room temperature ionic liquids [63], defined as molecular ionic systems whose melting temperature is below 100 °C. Prototypical systems are made by an alkane substituted imidazolium cation, joined to an organic or inorganic anion. Systems of this kind are relevant here, not only because of the intense simulation activity that concerns them, but mainly because they provide a bridge between different classes of bonding and, thus, pose special modeling problems.

The bulk of the extensive simulation work carried out at present relies on Amber-like force fields, with specialized parameterizations (see, for instance, [64,65]). Models of this kind are fairly successful, but issues concerning polarizability and the attribution of partial charges to atoms become particularly important for these systems. Despite these difficulties, a number of simulations have successfully addressed the properties of very complex systems, consisting of room temperature ionic liquids in combination with a variety of solvents and neutral organic compounds, including bio-molecular species (see Figure 3).

A few carbon systems, such as fullerenes, carbon nanotubes and graphene, lie at the boundary between inorganic and organic species and even blur the distinction between covalent and metal character. Not surprisingly, systems of this kind have been represented by a variety of models, from Tersoff-Brenner to a molecular force field, such as those described in this section.

Figure 3. Snapshot from a molecular dynamics simulation of a room temperature ionic liquid/water solution at 0.5 M concentration in contact with a POPC phospholipid bilayer [66]. Green balls: [Cl]<sup>−</sup>; gray-silver molecules: [bmim]<sup>+</sup>; wireframe molecules: POPC. Water has been removed to highlight the incorporation of [bmim]<sup>+</sup> cations into the phospholipid bilayer.

#### 7. Water

Because of its fundamental role in life and of its widespread and generally benign presence in nature, water has always been the object of interest and fascination. In this respect, computational physicists and chemists are no exception, although the reasons for their interest are somewhat different from those of the rest of humankind. A number of measurements have highlighted a wide variety of peculiarities, if not anomalies, in the properties of water [67]. These include the surprising expansion of water upon freezing, the density anomaly observed at 4 °C at ambient pressure, and, more in general, the non-monotonic variation of several physico-chemical properties in the vicinity of this remarkable density maximum. Other peculiar features consist in the wide temperature range of super-cooling, the high liquid-vapor critical temperature and the large value of the latent heat of the liquid water-ice transition.

To a large extent, these anomalous behaviors are embodied into the PES of water systems and arise from the strength and directionality of the hydrogen bond network that provides the bulk of water cohesion. In part, however, they are due to the light mass of the water molecule, causing non-negligible quantum effects that influence the properties of hydrogen bonds. Heavy water, for instance, is already somewhat different from ordinary water, so much so that D<sub>2</sub>O is known to have peculiar and generally adverse biological effects. This duality of potential energy *versus* quantum mechanical effects poses apparent and significant problems to modeling [68]. Potentials tuned on the *exact* PES of water do not reproduce its properties when used in a classical simulation. On the other hand, potentials tuned on experimental properties of water do not necessarily reflect the details of the exact PES.

Work to provide a quantitative and comprehensive description of water properties is still in progress [69,70]. In the meantime, a vast number of simulations in which water is the unique or an essential component are being carried out with a variety of simple potentials, reflecting the basic atomistic and electronic structure of the water molecule. Two major families are in use: TIPnP [71–73], with n = 3, 4 and 5, and SPC [74–77], both based on fixed charges (rigid ions) and centers of short range interactions, joined by rigid or harmonic bonds.

Models of this kind allow the routine simulation by MD of systems of 50 × 10<sup>3</sup> water molecules solvating whole proteins, covering times well in excess of 100 ns. Results are generally good, and a large number of successful applications clearly validate these models, at least up to the accuracy needed for these large-scale applications. However, it is fair to say that no single model of the rigid ion type is able to provide a uniformly satisfactory account of water properties over a wide range of regimes and thermodynamic conditions. Several of these models, in particular, do not display the experimental density maximum of water or place it at (P, T) conditions far from the experimental ones [69,70]. The liquid-vapor coexistence curve is also poorly predicted by rigid ion models, unless the potential parameters are explicitly adjusted for this purpose. In such a case, however, the accurate description of some other quantity might need to be sacrificed. Critical properties, which are accurately known from measurements, are only moderately well reproduced [78].

Water clusters and droplets are another, distinct subfield of water research. Thermodynamic and spectroscopic data are available from experiments, but are not sufficiently detailed to provide a full description of structural and dynamical properties. In this case, state-of-the-art quantum chemistry computations supplement the experimental information [79]. Once again, it turns out that rigid ion models are only moderately successful in predicting their properties and usually fail to reproduce the reduced binding of very small clusters. The oxygen-oxygen equilibrium distance in the water dimer, for instance, is greatly underestimated by popular models, and its cohesive energy is correspondingly overestimated. These discrepancies decrease in importance with increasing cluster size, but the convergence to the bulk cohesive properties, reliably described by current DFT models of water, is fairly slow (see Table 1). In these small systems, the rigid-ion assumption, or, in other terms, the lack of polarizability, again seems to be the major problem. The molecular dipole moment of water, for instance, changes from μ = 1.855 D in the gas phase molecule, to nearly μ = 3 D in ice and in liquid water, but rigid ion models cannot reproduce this change. Moreover, within rigid-ion models, hydrogen bonds have only a Coulombic origin, contradicting the results of experiments and quantum chemistry computations showing that both Coulomb and covalent contributions are important [80] and change in slightly different ways upon changing the aggregation state of water.
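The fixed dipole of a rigid three-site model follows directly from its geometry and charges, as the short sketch below shows. The numbers used (O-H distance 1.0 Å, H-O-H angle 109.47°, hydrogen charge 0.41 e) are the commonly quoted parameters of the original SPC model; they are not given in this review and are taken here as an assumption.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320   # 1 e*Angstrom expressed in debye

def three_site_water(r_oh, angle_deg, q_h):
    """Charges (e) and positions (Angstrom) of a rigid three-site water model."""
    half = np.radians(angle_deg) / 2.0
    pos = np.array([[0.0, 0.0, 0.0],
                    [r_oh * np.sin(half), r_oh * np.cos(half), 0.0],
                    [-r_oh * np.sin(half), r_oh * np.cos(half), 0.0]])
    charges = np.array([-2.0 * q_h, q_h, q_h])   # oxygen charge fixed by neutrality
    return charges, pos

def dipole_magnitude(charges, pos):
    """Dipole magnitude in debye for a neutral set of point charges."""
    return np.linalg.norm(np.sum(charges[:, None] * pos, axis=0)) * E_ANGSTROM_TO_DEBYE

charges, pos = three_site_water(r_oh=1.0, angle_deg=109.47, q_h=0.41)
mu = dipole_magnitude(charges, pos)
print(f"{mu:.2f} D")   # about 2.27 D
```

The resulting value lies between the gas-phase and condensed-phase dipole moments quoted above, and it remains the same in every environment. This is the limitation of fixed-charge models that the text points out.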

Table 1. Cohesive energy (kJ/mol per water molecule) of (H<sub>2</sub>O)<sub>2</sub>, of cyclic water clusters (H<sub>2</sub>O)<sub>n</sub>, n = 3, 4, 5, 6, and of the cubic D<sub>2d</sub> form of (H<sub>2</sub>O)<sub>8</sub> computed by an SPC, rigid ion model (SPC/Fw, [77]). Deviations from dispersion-corrected [81] DFT [82] results are given in parentheses. Data are from [83].


Somewhat surprisingly, the inclusion of polarizability into simple models has not yet resulted in a systematic improvement of the description of the properties of extended water systems [84], while it has been more successful for clusters.

All these difficulties have stimulated a large number of new attempts. It might be worth mentioning the representation of electron polarizability via classical [85] and quantum [86] Drude oscillators, the application to water [87] of the empirical valence bond (EVB) theory [88] and the use of polarizable Thole models [89].

*Ab initio* modeling, discussed in more detail below, will eventually provide the method of choice to study water [90]. Until now, however, approaches of this kind using standard approximations for the exchange-correlation energy (see next section) have given rather mixed results [91].

### 8. The *Ab initio* Route

Over the last twenty years, the art of representing PES as a function of atomic coordinates has seen its role increasingly challenged by the explosive growth of *ab initio* simulation methods.

As discussed in Section 2, the exact PES of a system made of N electrons evolving in the field of K nuclei can be determined point by point by computing the energy eigenvalues of the Ĥ<sub>ele</sub> Hamiltonian:

$$E\_k(\{\mathbf{R}\_\alpha\}) = \frac{\hat{H}\_{ele}\psi\_k(\{\mathbf{r\_i}\} \mid \{\mathbf{R}\_\alpha\})}{\psi\_k(\{\mathbf{r\_i}\} \mid \{\mathbf{R}\_\alpha\})} \tag{20}$$

For any single choice of the {**R**<sub>α</sub>} coordinates, a fairly extended array of quantum chemistry *ab initio* methods, such as configuration interaction, Møller-Plesset perturbation theory or coupled cluster theory, is available to find all or a few of the lowest energy eigenvalues and eigenvectors of this so-called *standard problem* in electronic structure computations.

As far as the direct application of *ab initio* methods to simulation is concerned, however, progress came primarily through the advent of density functional theory, whose recognized theoretical and practical foundation is provided by the Hohenberg-Kohn (HK) theorem [92] and by the seminal paper by Kohn and Sham (KS) [93]. In a very schematic way, density functional theory in the popular Kohn-Sham formulation represents the ground state electron density, ρ(**r**), in terms of an auxiliary set of non-interacting electron orbitals {φ<sub>i</sub>(**r**), i = 1, ..., K}, generally known as the Kohn-Sham orbitals:

$$\rho(\mathbf{r}) = \sum\_{i=1}^{K} |\,\phi\_i(\mathbf{r})\,|^2 \tag{21}$$

To reproduce the exact density, the (unspecified) potential acting on the non-interacting electrons has to be different from the one acting on their interacting counterpart. The properties of such a potential and, in particular, its local, *i.e.*, multiplicative nature are a corollary of the HK theorem.

Then, according to KS, the system ground state energy is the minimum of the unique and universal functional:

$$E\_{KS}[\rho \mid \{\mathbf{R}\_{\alpha}\}] = -\frac{1}{2} \sum\_{i=1}^{K} \langle \phi\_i \mid \nabla^2 \mid \phi\_i \rangle + \frac{1}{2} \int \frac{\rho(\mathbf{r})\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} d\mathbf{r} d\mathbf{r}' - \sum\_{\alpha=1}^{K} Z\_{\alpha} \int \frac{\rho(\mathbf{r})d\mathbf{r}}{|\mathbf{r} - \mathbf{R}\_{\alpha}|} + U\_{XC}[\rho] \tag{22}$$

where U<sub>XC</sub>[ρ] is the so-called exchange-correlation energy, a functional of the electron density, ρ(**r**), which also contains a small fraction of the kinetic energy of the interacting electrons. Minimization of Equation (22) under the constraint of orthonormality for the Kohn-Sham orbitals results in a set of coupled partial differential equations for {φ<sub>i</sub>}.

Methods to solve this problem have been developed and discussed in a vast number of papers and textbooks [7,8]. The accuracy of the solution depends on the functional used to approximate U<sub>XC</sub>[ρ], and on the choice of the basis used to represent the orbitals. Popular choices for the exchange-correlation energy are generalized gradient corrections, such as PBE [82], or hybrid functionals, such as B3LYP [94]. Basis sets range from atomic orbitals to wavelets, but plane waves [95,96] and Gaussian functions [97] are probably the most widely used choices for implementations tuned for molecular dynamics applications.

The solution of the standard problem in Equation (5) obtained through Equation (22) is restricted to the ground state PES. Even within this limited scope, the PES itself can only be determined point by point. Nevertheless, the KS energy expression can be used to evolve the atomic positions in time, thus opening the way to MD, provided one can: (i) minimize Equation (22) fast enough; and (ii) evaluate forces on the atoms through:

$$\mathbf{F}\_{\beta} = -\nabla\_{\mathbf{R}\_{\beta}} E\_{KS}[\rho \mid \{\mathbf{R}\_{\alpha}\}] \tag{23}$$

Towards this goal, the work of Car and Parrinello [9] has truly represented the single most important breakthrough, whose major innovation consisted of the introduction of direct minimization approaches for Equation (22), exploiting the close similarity of the electronic configuration at two successive steps of MD. Evaluation of forces, moreover, was greatly eased by the choice of plane waves as the basis set to represent KS orbitals, whose unbiased coverage of the entire space allows the application of the Hellmann-Feynman theorem in its simplest form to compute gradients of the ground state energy [95,98].
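The benefit of exploiting the similarity of the electronic configuration at two successive MD steps can be illustrated with a deliberately simple sketch. Everything in it is an assumption made for illustration: a single nuclear coordinate `r`, a single "electronic" variable `c`, and a quadratic stand-in for the KS energy (the names `energy`, `minimize`, `force`, `run_md` and all constants are hypothetical). It is neither a DFT calculation nor the Car-Parrinello algorithm itself. It only compares a minimization restarted from the previous solution with one restarted from scratch, with the force taken from the minimized energy in the spirit of Equation (23).

```python
"""Toy Born-Oppenheimer MD: warm-started vs. cold-started minimization.

Hypothetical one-dimensional model, not a DFT or Car-Parrinello code.
"""

K_E = 4.0     # stiffness binding the "electronic" variable c to the nucleus
K_N = 1.0     # stiffness of the external well acting on the nucleus
MASS = 1.0    # nuclear mass
TOL = 1e-10   # convergence threshold on the electronic gradient


def energy(c, r):
    """Stand-in for E_KS: minimized over c at fixed nuclear position r."""
    return 0.5 * K_E * (c - r) ** 2 + 0.5 * K_N * r ** 2


def minimize(c, r, step=0.125):
    """Steepest descent on c at fixed r; returns (c_min, iterations used)."""
    n_iter = 0
    grad = K_E * (c - r)
    while abs(grad) > TOL:
        c -= step * grad
        grad = K_E * (c - r)
        n_iter += 1
    return c, n_iter


def force(c, r):
    """Force on the nucleus: -dE/dr at fixed c, evaluated at the minimizer."""
    return K_E * (c - r) - K_N * r


def run_md(n_steps, dt, warm_start):
    """Velocity Verlet for the nucleus; returns (trajectory, total iterations)."""
    r, v = 1.0, 0.0
    c, total = minimize(0.0, r)
    f = force(c, r)
    traj = [r]
    for _ in range(n_steps):
        v += 0.5 * dt * f / MASS
        r += dt * v
        guess = c if warm_start else 0.0   # reuse previous solution or restart
        c, n_iter = minimize(guess, r)
        total += n_iter
        f = force(c, r)
        v += 0.5 * dt * f / MASS
        traj.append(r)
    return traj, total


if __name__ == "__main__":
    _, iters_warm = run_md(200, 0.05, warm_start=True)
    _, iters_cold = run_md(200, 0.05, warm_start=False)
    print("minimization iterations, warm start:", iters_warm)
    print("minimization iterations, cold start:", iters_cold)
```

Both runs follow the same nuclear trajectory to within the convergence threshold; only the cost of reaching the minimum at each step differs.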

Atoms evolve on the adiabatic PES implicitly defined by Equation (22) classically or quantum mechanically. The validity of a classical time evolution for the atoms according to Newton's equations relies on conditions discussed in detail in Chapter [6]. Outside these conditions, one could resort to a path integral approach, as done, for instance, in [99].

The method can be extended to simulate the atomic dynamics on the single PES of an electronically excited state [100], provided the different symmetry of the ground and excited state allows a meaningful definition of both PESs by density functional methods. As apparent from the discussion of the Born-Oppenheimer approximation, multiple PESs close in energy make it impossible to disentangle the ionic and electron dynamics, and in these cases, resorting to semiclassical or to more accurate quantum mechanical approaches [6] is mandatory.

Somewhat simplified versions of the density-functional-based MD, resorting to localized bases and relying on a self-consistent tight-binding approach have been developed [101,102] and provide a cheaper and popular alternative to unrestricted DFT methods. The price to be paid is a slight limitation in the quality of the solution, as well as occasional failures of the method.

The amazing success of density-functional-based simulation methods is due to the fact that they represent the only method endowed with truly predictive power, which can be used for systems of several hundred atoms, with up to a few thousand valence electrons. *Ab initio* simulation, therefore, is the method of choice whenever we cannot guess a suitable representation of the PES or when we need an accuracy that cannot be provided by the empirical models that are available. *Ab initio* simulation is also strictly required for systems whose structure is affected by electronic effects, such as Jahn-Teller, and also enjoys a clear advantage in describing spin-polarization effects or systems undergoing chemical transformations and non-stoichiometric compounds exhibiting different valence states.

Well-known drawbacks are represented by the computational cost that limits the size and especially the time scale of *ab initio* simulations, even though the reach of the method is constantly expanding. At present, large computations running on state-of-the-art facilities may involve ∼1,000 atoms and ∼4,000–5,000 valence electrons. Early problems with metals have been progressively eased by approaches relying on the accurate step-by-step minimization of the KS energy functional. Problems, however, remain with transition and, especially, rare-earth metals, for which standard exchange-correlation approximations give unsatisfactory results, and quantum chemistry hybrid methods fail fairly spectacularly [103]. Progress is being achieved with methods incorporating strong correlation at some approximate level, such as LSD+U [104].

Difficulties also remain in the limit of weakly interacting molecular systems. In this case, moreover, early methods lacked essential components, such as the dispersion interaction, which in molecular systems provides a good portion of cohesion. Dispersion interactions are now increasingly included in *ab initio* simulations [81], especially for molecular systems and for water, in particular. Results are encouraging, although not yet in full quantitative agreement with experiments. However, the accuracy, reliability and computational efficiency of these methods are improving rapidly.

The major problem in current MD applications of *ab initio* methods arguably is that achieving accurate results for *difficult* systems, such as transition metals and oxides or molecular systems, still requires an extensive preliminary calibration stage and system-specific exchange-correlation approximations [105], effectively spoiling the *ab initio* character of these methods. Perhaps more importantly, these adjustments of the model decrease its reliability for systems exhibiting different bonding types, since the improvement on one type might worsen the description of the other type.

Most of the cost of KS-DFT computations is due to the representation of the density in terms of KS orbitals. Approaches relying on genuine density functional formalism, such as a refined Thomas-Fermi method, could enjoy a huge computational advantage, but no successful scheme has emerged during the years, and only very idealized Gordon-Kim approaches [106] have been used with some success.

#### 9. Conclusions

Explicit or implicit expressions of the PES of condensed matter systems represent the basis of our ability to simulate them, possibly understanding and sometimes predicting their properties by purely computational methods. For this reason, the development of approximations and efficient representations of PES is the focus of an intense research effort, involving a sizable portion of the computational community.

Such a modeling activity is an art as much as a science. It is a science in the systematic derivation of interatomic forces from more fundamental interactions. It is an art in the invention of effective ways to incorporate new ideas in physically transparent and computationally efficient mathematical expressions. Like many other forms of art, it relies on a great deal of craftsmanship, required in the stage of parameterizing force fields, validating them and incorporating them into widely used computer packages, using sophisticated programming techniques, tuned for state-of-the-art computational hardware.

It should be apparent from the discussion of the previous sections that the last thirty years have seen an amazing enhancement of our ability to model a wide variety of systems at the atomistic level, fueling the explosive growth of simulation studies, while, at the same time, being driven by it. Equally amazing, however, is the extent of what we are still unable to model satisfactorily. Interfaces between different materials, for instance, are intrinsically difficult to describe by simple approaches. Excluding *ab initio*, no reliable, general and widely accepted model is available to simulate water and electrolyte solutions in contact with neutral or charged electrodes, organic and biological molecules on solid surfaces or the junction of metal and semiconducting phases. Even homogeneous phases, such as non-stoichiometric oxides, still represent a formidable challenge for models suitable for simulating 10<sup>4</sup> atoms over 100 ns or more. Systems undergoing chemical transformations are another sore point, even though methods, such as ReaxFF and REBO, are achieving progress in this direction.

At this stage, strategic decisions on the directions and aims of the modeling effort have to take into account the rapid growth of *ab initio* methods, which easily account for the intermixing of different bonding categories, cover electrostatic polarizability, provide information on excited state PES and may include magnetic interactions and spin effects through their approximate description of exchange.

The rapid progress of methods and computational equipment implies that the foreseeable future spans at most ten to fifteen years from now. Over this time, empirical models of PES will continue to play an important and useful role in the atomistic simulation of large systems (N - 10<sup>4</sup> atoms) over times in excess of 100 ns. Most biochemistry and biophysics simulations fall into this class.

In the longer run, however, the general picture of modeling might indeed change. First of all, the domain proper to atomistic modeling concerns the investigation of the microscopic details underlying larger-scale phenomena. In this context, the scales of interest rarely exceed ∼10<sup>4</sup> atoms and correspondingly short times of less than ∼10 ns. Beyond this range, simulation may become the exclusive domain of coarse graining and multi-scale approaches, provided refined versions of these methods are developed over the next few years.

*Ab initio* methods already represent the method of choice for systems for which we do not have reliable approximations of their PES, for phenomena that can be represented by 100 to 1,000 atoms and that take place within a 50–100 ps time span. Mixed QM/MM approaches extend this reach and represent the most appealing method to treat systems, such as protein reaction centers, organometallic catalysts, *etc*., in which a small portion of a large system needs to be represented in full chemical detail.

The parallel development of *ab initio* and of refined coarse graining and multi-scale methods, therefore, could greatly shrink the role of empirical PES approximations in atomistic simulation. Even these likely developments, however, might not mark the end of atomistic potential models, since simple and transparent representations of PES will continue to provide the conceptual basis to rationalize the properties of condensed matter systems in terms of atoms, of molecules and of their microscopic interactions.

#### Acknowledgments

I thank Carlo Pierleoni for useful discussions and for a careful reading of the manuscript.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Lin, L.; Lu, J.; Shao, S. Analysis of Time Reversible Born-Oppenheimer Molecular Dynamics. *Entropy* 2014, *16*, 110–137.

*Article*

## Analysis of Time Reversible Born-Oppenheimer Molecular Dynamics †

Lin Lin <sup>1</sup>, Jianfeng Lu <sup>2</sup> and Sihong Shao <sup>3,</sup>\*


*Received: 13 June 2013; in revised form: 10 July 2013 / Accepted: 9 September 2013 / Published: 27 December 2013*

Abstract: We analyze the time reversible Born-Oppenheimer molecular dynamics (TRBOMD) scheme, which preserves the time reversibility of the Born-Oppenheimer molecular dynamics even with non-convergent self-consistent field iteration. In the linear response regime, we derive the stability condition, as well as the accuracy of TRBOMD for computing physical properties, such as the phonon frequency obtained from the molecular dynamics simulation. We connect and compare TRBOMD with Car-Parrinello molecular dynamics in terms of accuracy and stability. We further discuss the accuracy of TRBOMD beyond the linear response regime for non-equilibrium dynamics of nuclei. Our results are demonstrated through numerical experiments using a simplified one-dimensional model for Kohn-Sham density functional theory.

Keywords: ab initio molecular dynamics; self-consistent field iteration; time reversibility; stability

Classification: PACS 31.15.xv; 71.15.Pd

#### 1. Introduction

*Ab initio* molecular dynamics (AIMD) [1–6] has been greatly developed in the past few decades, so that nowadays, it is able to quantitatively predict the equilibrium and non-equilibrium properties for a vast range of systems. AIMD has become widely used in chemistry, biology, materials science, *etc*. A coherent and comprehensive presentation of AIMD with both the basic theory and advanced methods can be found in [7]. Most AIMD methods treat the nuclei as classical particles following Newtonian dynamics (known as the time-dependent Born-Oppenheimer approximation), and the interactive force among nuclei is provided directly from electronic structure theory, such as the Kohn-Sham density functional theory [8,9] (KSDFT), without the need of using empirical atomic potentials. KSDFT consists of a set of nonlinear equations that are solved at each molecular dynamics time step *self-consistently* via the self-consistent field (SCF) iteration. In Born-Oppenheimer molecular dynamics (BOMD), KSDFT is solved to full self-consistency for each atomic configuration at every time step. Since many iterations are usually needed to reach full self-consistency and each iteration takes a considerable amount of time, until recently, this procedure was still found to be prohibitively expensive for producing meaningful dynamical information. On the other hand, if the self-consistent iterations are truncated before convergence is reached, it is often the case that the energy of the system is no longer conserved, even for an NVE system. The error in SCF iteration acts as a sink or source, gradually draining or adding energy to the atomic system within a short period of molecular dynamics simulation [10]. This is one of the main challenges for accelerating Born-Oppenheimer molecular dynamics.

AIMD was made practical by the ground-breaking work of Car-Parrinello molecular dynamics (CPMD) [11]. CPMD introduces an extended Lagrangian, including the degrees of freedom of both nuclei and electrons without the necessity of a convergent SCF iteration. The dynamics of electronic orbitals can be loosely viewed as a special way of performing the SCF iteration at each molecular dynamics (MD) step. Thanks to the Hamiltonian structure, numerical simulation for CPMD is stable, and the energy is conserved over a much longer time period compared to that for BOMD with non-convergent SCF iteration. When the system has a spectral gap, the accuracy of CPMD is controlled by a single parameter, the fictitious electron mass, μ. The result of CPMD approaches that of BOMD as μ goes to zero [12,13]. However, it has also been shown that CPMD does not work as well for systems with a vanishing gap, for example, for metallic systems [12].

To reduce the cost of BOMD, in particular, the number of SCF iterations needed per MD time step, a new type of AIMD method, the time reversible Born-Oppenheimer molecular dynamics (TRBOMD) method has been recently proposed by Niklasson, Tymczak and Challacombe in [14]. The method has been further developed in [15–18]. The idea of TRBOMD can be summarized as follows: TRBOMD assumes that the SCF iteration is a *deterministic* procedure, with the outcome determined only by the initial guess of the variable to be determined self-consistently. For instance, this variable can be the electron density, and the SCF iteration procedure can be simple mixing with a fixed number of iteration steps without reaching full self-consistency. Then, a fictitious dynamics governed by a second order ordinary differential equation (ODE) is introduced on this initial guess variable. The resulting coupled dynamics is then time-reversible and supposed to be more stable, since it has been found that time-reversible numerical schemes are more stable for long time simulation [19,20]. Besides TRBOMD, alternative ideas based on time-reversible predictor-corrector methods [21] and Langevin dynamics [22,23] can also relax the requirement on the accuracy of the force for AIMD simulation. For these methods, we refer the readers to a recent review paper [24] for more information.

Although TRBOMD has been found to be effective and significantly reduces the number of SCF iterations needed in practice, to the best of our knowledge, there has been so far no detailed analysis of TRBOMD, other than the numerical stability condition of the Verlet or generalized Verlet scheme for time discretization [17]. Accuracy, stability, as well as the applicability range of TRBOMD remain unclear. In particular, it is not known how the choice of SCF iteration scheme affects TRBOMD. These are crucial issues for guiding the practical use of TRBOMD. The full TRBOMD method for general systems is highly nonlinear and is difficult to analyze. In this work, we first focus on the linear response regime, *i.e.*, we assume that each atom oscillates around its equilibrium position and the electron density stays around the "true" electron density. Under such assumptions, we analyze the accuracy and stability of TRBOMD. We then extend the results to the regime where the atom position is not near equilibrium using the averaging principle.

The rest of the paper is organized as follows. We illustrate the idea of TRBOMD and its analysis in the linear response regime using a simple model in Section 2 and introduce TRBOMD for AIMD in Section 3. We analyze TRBOMD in the linear response regime and compare TRBOMD with CPMD in Section 4. The numerical results for TRBOMD in the linear response regime are given in Section 5. We present the analysis of TRBOMD beyond the linear response regime, such as the non-equilibrium dynamics in Section 6, and conclude with a few remarks in Section 7.

#### 2. An Illustrative Model

To start, let us illustrate the main idea for a simple model problem, which provides the essence of TRBOMD in a much simplified setting. Consider the following nonlinear ODE:

$$
\ddot{x}(t) = f(x(t)) \tag{1}
$$

where we assume that the right-hand side f(x) is difficult to compute, and it can be approximated by an iterative procedure. Starting from an initial guess, s ≈ f(x), the final approximation via the iterative procedure is denoted by g(x, s). We assume the approximation, g(x, s), is consistent, *i.e.*,:

$$g(x, f(x)) = f(x) \tag{2}$$

To numerically solve the ODE Equation (1), we discretize it by some numerical scheme; then, it remains to decide the initial guess, s, at each time step. A natural choice of s would be g(x, s) from the previous step, as x does not change much in successive steps. For instance, if the Verlet algorithm is used and t<sub>k</sub> = kΔt with Δt being the time step, the discretized ODE becomes:

$$\begin{aligned} x\_{k+1} &= 2x\_k - x\_{k-1} + (\Delta t)^2 g(x\_k, s\_k) \\ s\_{k+1} &= g(x\_k, s\_k) \end{aligned} \tag{3}$$
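A minimal sketch of the scheme in Equation (3) is given below. The harmonic force `f`, the relaxation-type iteration `g` (with the hypothetical parameters `n_iter` and `alpha`) and the helper `energy` are assumptions introduced for illustration; any consistent iterative procedure could take the place of `g`. For this particular model, truncating the iteration is expected to produce a systematic drift of the energy, whereas a well converged iteration recovers the usual Verlet behavior.

```python
"""Verlet integration with a truncated iterative force, Equation (3).

The linear force f and the iteration g are illustrative assumptions.
"""
import math

OMEGA_PHYS = 1.0  # oscillation frequency of the exact dynamics (assumed)


def f(x):
    """Exact force, assumed harmonic for this sketch."""
    return -OMEGA_PHYS ** 2 * x


def g(x, s, n_iter=3, alpha=0.3):
    """Approximate force: n_iter relaxation steps starting from the guess s.

    Satisfies the consistency condition g(x, f(x)) = f(x), Equation (2).
    """
    for _ in range(n_iter):
        s += alpha * (f(x) - s)
    return s


def verlet_reuse_guess(dt, n_steps, n_iter=3):
    """Equation (3): the previous approximate force is the next initial guess."""
    xs = [1.0, math.cos(OMEGA_PHYS * dt)]  # exact harmonic start, zero velocity
    s = f(xs[-1])
    for _ in range(n_steps):
        approx_force = g(xs[-1], s, n_iter)
        xs.append(2.0 * xs[-1] - xs[-2] + dt ** 2 * approx_force)
        s = approx_force
    return xs


def energy(xs, k, dt):
    """Harmonic energy at step k, using a central-difference velocity."""
    v = (xs[k + 1] - xs[k - 1]) / (2.0 * dt)
    return 0.5 * v ** 2 + 0.5 * (OMEGA_PHYS * xs[k]) ** 2


if __name__ == "__main__":
    dt, n_steps = 0.05, 2000
    for n_iter in (3, 200):
        xs = verlet_reuse_guess(dt, n_steps, n_iter)
        ratio = energy(xs, len(xs) - 2, dt) / energy(xs, 1, dt)
        print(f"n_iter = {n_iter:3d}: E_final / E_initial = {ratio:.3f}")
```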

We immediately observe that the discretization scheme Equation (3) breaks the time reversibility of the original ODE Equation (1). Time reversibility means the following: for the original ODE Equation (1), suppose we propagate the system forward in time from (x(t<sub>0</sub>), ẋ(t<sub>0</sub>)) to (x(t<sub>1</sub>), ẋ(t<sub>1</sub>)). Then, if we use (x(t<sub>1</sub>), ẋ(t<sub>1</sub>)) as the initial data at t = t<sub>1</sub> and propagate the system backward in time to t = t<sub>0</sub>, we return to the state (x(t<sub>0</sub>), ẋ(t<sub>0</sub>)). The scheme Equation (3) does not have this property, because the update of the initial guess, s, uses only information from the previous step. The loss of the time reversible structure can introduce a large error in long time numerical simulation [20]. This is the main reason why BOMD with non-convergent SCF iteration fails for long time simulations [14]. To overcome this obstacle, the idea of TRBOMD is to introduce a fictitious dynamics for the initial guess, s. Namely, we consider the time reversible coupled system:

$$\begin{aligned} \ddot{x}(t) &= g(x(t), s(t)) \\ \ddot{s}(t) &= \omega^2(g(x(t), s(t)) - s(t)) \end{aligned} \tag{4}$$

where ω is an artificial frequency. We now analyze the accuracy and stability of Equation (4) in the linear response regime by assuming that the trajectory, x(t), oscillates around an equilibrium position, x<sup>∗</sup>. We denote by x̃(t) = x(t) − x<sup>∗</sup> the deviation from the equilibrium position and by s̃(t) = s(t) − f(x(t)) the deviation of the initial guess from the exact force term. Consequently, the equation of motion (4) can be rewritten as (for simplicity, we suppress the t-dependence in the notation for the rest of the section):

$$\begin{aligned} \ddot{\tilde{x}} &= g(x, s) \\ \ddot{\tilde{s}} &= \omega^2 (g(x, s) - s) - f''(x)(\dot{x})^2 - f'(x)\ddot{x} \end{aligned} \tag{5}$$

where the term −f″(x)(ẋ)<sup>2</sup> − f′(x)ẍ comes from the term f(x) in s̃ by the chain rule.

In the linear response regime, we assume the linear approximation of force for x around x∗:

$$f(x) \approx -\Omega^2(x - x^\*) = -\Omega^2 \tilde{x} \tag{6}$$

where Ω is the oscillation frequency of x in the linear response regime. We also linearize g with respect to s̃ and x̃, dropping all higher order terms:

$$\begin{aligned} g(x,s) &= g(x, f(x) + \widetilde{s}) \\ &\approx g(x, f(x)) + g\_s(x, f(x))\widetilde{s} \\ &\approx -\Omega^2 \widetilde{x} + g\_s(x^\*, f(x^\*))\widetilde{s} \end{aligned} \tag{7}$$

where g<sub>s</sub> denotes the partial derivative of g with respect to s, and the consistency condition (2) is applied. We then have:

$$\begin{aligned} g(x,s) - s &= (g(x, f(x) + \widetilde{s}) - f(x)) - (s - f(x)) \\ &\approx (g\_s(x, f(x)) - 1)\widetilde{s} \\ &\approx (g\_s(x^\*, f(x^\*)) - 1)\widetilde{s} \end{aligned} \tag{8}$$

In accord with notations used in later discussions, let us denote:

$$\mathcal{L} = g\_s(x^\*, f(x^\*)), \quad \mathcal{K} = 1 - g\_s(x^\*, f(x^\*)) \tag{9}$$

with which the linearized system of Equation (5) becomes:

$$
\frac{\mathrm{d}^2}{\mathrm{d}t^2} \begin{pmatrix} \widetilde{x} \\ \widetilde{s} \end{pmatrix} = \begin{pmatrix} -\Omega^2 & \mathcal{L} \\ f'(x^\*)\Omega^2 & -f'(x^\*)\mathcal{L} - \omega^2 \mathcal{K} \end{pmatrix} \begin{pmatrix} \widetilde{x} \\ \widetilde{s} \end{pmatrix} := A \begin{pmatrix} \widetilde{x} \\ \widetilde{s} \end{pmatrix} \tag{10}
$$

Note that when the force is computed accurately, *i.e.*,

$$g(x,s) = f(x), \quad \forall s \tag{11}$$

we have:

$$\mathcal{L} = 0, \quad \mathcal{K} = 1 \tag{12}$$

meaning that the motion of x̃ is decoupled from that of s̃, and x̃ follows the exact harmonic motion in the linear response regime with the accurate frequency, Ω. When the force is computed inaccurately, x̃ is coupled with s̃ in Equation (10). In fact, we can solve Equation (10) analytically, and the eigenvalues of A are:

$$
\begin{pmatrix}
\lambda\_{\widetilde{\Omega}} \\
\lambda\_{\widetilde{\omega}}
\end{pmatrix} = \begin{pmatrix}
\frac{1}{2} \left( \sqrt{(\mathcal{L}f'(x^\*) + \mathcal{K}\omega^2 + \Omega^2)^2 - 4\mathcal{K}\omega^2\Omega^2} - \mathcal{L}f'(x^\*) - \mathcal{K}\omega^2 - \Omega^2 \right) \\
\frac{1}{2} \left( -\sqrt{(\mathcal{L}f'(x^\*) + \mathcal{K}\omega^2 + \Omega^2)^2 - 4\mathcal{K}\omega^2\Omega^2} - \mathcal{L}f'(x^\*) - \mathcal{K}\omega^2 - \Omega^2 \right)
\end{pmatrix} \tag{13}
$$

Then, the frequencies of the normal modes of the ODE are Ω̃ = √(−λ<sub>Ω̃</sub>) and ω̃ = √(−λ<sub>ω̃</sub>), respectively. Assume ω<sup>2</sup> ≫ Ω<sup>2</sup> and expand the solution to the order of 𝒪(1/ω<sup>2</sup>); we have:

$$
\widetilde{\Omega} = \Omega \left( 1 - \frac{f'(x^\*)}{2\omega^2} \mathcal{L} \mathcal{K}^{-1} \right) + \mathcal{O}(1/\omega^4) \tag{14}
$$

Similarly, the frequency for the other normal mode, which is dominated by the motion of s̃, is:

$$
\widetilde{\omega} = \sqrt{\mathcal{K}} \omega \left( 1 + \frac{f'(x^\*)}{2\omega^2} \mathcal{L} \mathcal{K}^{-1} \right) + \mathcal{O}(1/\omega^3) \tag{15}
$$

It is found that one of the normal modes of Equation (10) has frequency Ω̃ ≈ Ω. We can therefore measure the accuracy of Equation (4) using the relative error between Ω̃ and Ω. Furthermore, for the dynamics (4) to be stable in the linear response regime, it is necessary to have 𝒦 > 0.
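The expansions in Equations (14) and (15) can be checked numerically against the closed-form eigenvalues of Equation (13). The sketch below assumes f′(x<sup>∗</sup>) = −Ω<sup>2</sup>, as in Equation (6), and uses arbitrary illustrative parameter values; the function names are hypothetical. The deviation of Equation (14) from the exact slow frequency should shrink roughly by a factor of 16 each time ω is doubled, consistent with a remainder of order 1/ω<sup>4</sup>.

```python
"""Numerical check of Equations (13)-(15) for the illustrative model.

Assumes f'(x*) = -Omega^2 (Equation (6)); parameter values are arbitrary.
"""
import math


def eigenvalues(Omega, omega, L):
    """Closed-form eigenvalues of A in Equation (10), as in Equation (13)."""
    K = 1.0 - L
    fprime = -Omega ** 2
    t = L * fprime + K * omega ** 2 + Omega ** 2
    root = math.sqrt(t ** 2 - 4.0 * K * omega ** 2 * Omega ** 2)
    return 0.5 * (root - t), 0.5 * (-root - t)


def char_poly(lam, Omega, omega, L):
    """det(A - lam*I), built directly from the entries of A in Equation (10)."""
    K = 1.0 - L
    fprime = -Omega ** 2
    a11, a12 = -Omega ** 2, L
    a21, a22 = fprime * Omega ** 2, -fprime * L - omega ** 2 * K
    return (a11 - lam) * (a22 - lam) - a12 * a21


def exact_frequencies(Omega, omega, L):
    """Normal-mode frequencies (slow, fast) from the exact eigenvalues."""
    lam_slow, lam_fast = eigenvalues(Omega, omega, L)
    return math.sqrt(-lam_slow), math.sqrt(-lam_fast)


def expanded_frequencies(Omega, omega, L):
    """Leading-order expansions, Equations (14) and (15)."""
    K = 1.0 - L
    corr = -Omega ** 2 * L / (2.0 * omega ** 2 * K)  # f'(x*) L / (2 omega^2 K)
    return Omega * (1.0 - corr), math.sqrt(K) * omega * (1.0 + corr)


if __name__ == "__main__":
    Omega, L = 1.0, 0.3
    for omega in (10.0, 20.0, 40.0):
        exact = exact_frequencies(Omega, omega, L)
        approx = expanded_frequencies(Omega, omega, L)
        print(f"omega = {omega:5.1f}: "
              f"slow-mode error = {abs(exact[0] - approx[0]):.3e}, "
              f"fast-mode error = {abs(exact[1] - approx[1]):.3e}")
```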

From Equation (14), we conclude that if the time reversible numerical scheme (4) is used for simulating the ODE Equation (1) and if we neglect the error due to the Verlet scheme, the error introduced in computing the frequency, Ω, is proportional to ω<sup>−2</sup>. This seems to indicate that a very large ω (*i.e.*, a very small time step Δt) might be needed to obtain accurate results. Fortunately, the ω<sup>−2</sup> term in Equation (14) has the prefactor f′(x<sup>∗</sup>)ℒ𝒦<sup>−1</sup>. Equation (6) shows that f′(x<sup>∗</sup>) ≈ −Ω<sup>2</sup>, which is small compared to ω<sup>2</sup>. If g<sub>s</sub>(x<sup>∗</sup>, f(x<sup>∗</sup>)) is small, then 𝒦 ≈ 1, and the accuracy of Ω̃ is determined by ℒ = g<sub>s</sub>(x<sup>∗</sup>, f(x<sup>∗</sup>)), which indicates the sensitivity of the computed force with respect to the initial guess, or the accuracy of the iterative procedure for computing the force. If a "good" iterative procedure is used, g<sub>s</sub>(x<sup>∗</sup>, f(x<sup>∗</sup>)) will be small. Therefore, the presence of the term ℒ allows one to obtain a relatively accurate approximation to the frequency, Ω, without using a large ω. The same behavior can be observed when using TRBOMD to approximate BOMD (*vide post*).

Finally, we remark that even though Equation (1) is a much simplified system, it will be seen below that for BOMD with M atoms and N interacting electrons, the analysis in the linear response regime follows the same line, and the result for the frequency is similar to Equation (14).
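To make the time-reversible construction concrete, the following sketch discretizes both equations of the coupled system (4) with the Verlet scheme and checks reversibility by running the dynamics forward, exchanging the two most recent time levels, and running the same number of steps again. The force model `f`, the iteration `g` and the values of `dt` and `omega` are assumptions chosen for illustration (with Δt·ω = 1, inside the Verlet stability range for this model); the helper names are hypothetical.

```python
"""Verlet discretization of the time-reversible coupled system, Equation (4).

The force model f, the iteration g and all parameters are assumptions.
"""
import math

OMEGA_PHYS = 1.0  # physical oscillation frequency (assumed)


def f(x):
    """Exact force, assumed harmonic for this sketch."""
    return -OMEGA_PHYS ** 2 * x


def g(x, s, n_iter=3, alpha=0.3):
    """Truncated iteration for the force, consistent in the sense of Eq. (2)."""
    for _ in range(n_iter):
        s += alpha * (f(x) - s)
    return s


def step(x_prev, x, s_prev, s, dt, omega):
    """One Verlet step for x and for the auxiliary initial-guess variable s."""
    approx_force = g(x, s)
    x_next = 2.0 * x - x_prev + dt ** 2 * approx_force
    s_next = 2.0 * s - s_prev + (dt * omega) ** 2 * (approx_force - s)
    return x, x_next, s, s_next


def propagate(state, dt, omega, n_steps):
    """Apply n_steps Verlet steps to state = (x_prev, x, s_prev, s)."""
    for _ in range(n_steps):
        state = step(*state, dt, omega)
    return state


def reverse(state):
    """Exchange the two most recent time levels, i.e. reverse the time arrow."""
    x_prev, x, s_prev, s = state
    return x, x_prev, s, s_prev


if __name__ == "__main__":
    dt, omega, n_steps = 0.05, 20.0, 1000
    x0, x1 = 1.0, math.cos(OMEGA_PHYS * dt)
    start = (x0, x1, f(x0), f(x1))
    end = propagate(start, dt, omega, n_steps)
    back = reverse(propagate(reverse(end), dt, omega, n_steps))
    deviation = max(abs(a - b) for a, b in zip(start, back))
    print("max deviation after forward + backward run:", deviation)
```

In exact arithmetic the backward run retraces the forward one step by step, so any residual deviation reflects floating-point round-off only.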

#### 3. Time Reversible Born-Oppenheimer Molecular Dynamics

Consider a system with M atoms and N electrons. The position of the atoms at time t is denoted by **R**(t) = (R<sub>1</sub>(t), ..., R<sub>M</sub>(t))<sup>T</sup>. In BOMD, the motion of atoms follows Newton's law:

$$m\ddot{R}\_I(t) = f\_I(\mathbf{R}(t)) = -\frac{\partial E(\mathbf{R}(t))}{\partial R\_I} \tag{16}$$

where E(**R**(t)) is the total energy of the system at the atomic configuration, **R**(t). In KSDFT, the total energy is expressed as a functional of a set of Kohn-Sham orbitals, {ψ<sub>i</sub>(x)}<sub>i=1</sub><sup>N</sup>. To illustrate the idea with minimal technicality, let us consider for the moment a system of N electrons at zero temperature. The energy functional in KSDFT takes the form:

$$\begin{aligned} E(\{\psi\_i(x)\}\_{i=1}^N; \mathbf{R}) &= \frac{1}{2} \sum\_{i=1}^N \int |\nabla \psi\_i(x)|^2 \, \mathrm{d}x + \int \rho(x) V\_{\mathrm{ion}}(x; \mathbf{R}) \, \mathrm{d}x + E\_{\mathrm{hxc}}[\rho] \\ \rho(x) &= \sum\_{i=1}^N |\psi\_i(x)|^2 \end{aligned} \tag{17}$$

The first term in the energy functional is the kinetic energy of the electrons. The second term contains the electron-ion interaction energy. The ion-ion interaction energy usually takes the form Σ<sub>I < J</sub> Z<sub>I</sub>Z<sub>J</sub>/|R<sub>I</sub> − R<sub>J</sub>|, where Z<sub>I</sub> is the charge of the nucleus, I. The ion-ion interaction energy does not depend on the electron density, ρ. To simplify the notation, we include the ion-ion interaction energy in the V<sub>ion</sub> term as a constant shift that is independent of the x variable. The third term does not explicitly depend on the atomic configuration, **R**, and is a nonlinear functional of the electron density, ρ. It represents the Hartree part of the electron-electron interaction energy (h) and the exchange-correlation energy (xc) characterizing many-body effects. The energy, E(**R**), as a function of atomic positions is given by the following minimization problem:

$$\begin{aligned} E(\mathbf{R}) &= \min\_{\{\psi\_i(x)\}\_{i=1}^N} E(\{\psi\_i(x)\}\_{i=1}^N; \mathbf{R})\\ \text{s.t.} &\quad \int \psi\_i^\dagger(x) \psi\_j(x) \, \mathrm{d}x = \delta\_{ij}, \quad i, j = 1, \ldots, N \end{aligned} \tag{18}$$

We denote by $\{\psi\_i(x; \mathbf{R})\}\_{i=1}^N$ the (local) minimizer and by $\rho^\*(x; \mathbf{R}) = \sum\_{i=1}^N |\psi\_i(x; \mathbf{R})|^2$ the converged electron density corresponding to the minimizer (here, we assume that the minimizing electron density is unique). Then, the force acting on atom I is:

$$f\_I(\mathbf{R}; \rho^\*(x; \mathbf{R})) = -\frac{\partial E(\mathbf{R})}{\partial R\_I} = -\int \rho^\*(x; \mathbf{R}) \frac{\partial V\_{\text{ion}}(x; \mathbf{R})}{\partial R\_I} \, \text{d}x \tag{19}$$

In the physics literature, the force formula in Equation (19) is referred to as the Hellmann-Feynman force. The validity of the Hellmann-Feynman formula relies on the electron density, ρ∗(x; **R**), corresponding to the minimizers of the Kohn-Sham energy functional. Since Ehxc[ρ] is a nonlinear functional of ρ, the electron density, ρ, is usually determined through the self-consistent field (SCF) iteration as follows.

Starting from an inaccurate input electron density, ρin, one first computes the output electron density by solving for the lowest N eigenfunctions of the problem:

$$\left(-\frac{1}{2}\Delta\_x + \mathcal{V}(x; \mathbf{R}, \rho^{\rm in})\right)\psi\_i = \varepsilon\_i \psi\_i \tag{20}$$

with:

$$\mathcal{V}(x; \mathbf{R}, \rho) = V\_{\rm ion}(x; \mathbf{R}) + \frac{\delta E\_{\rm hxc}[\rho]}{\delta \rho}(x) \tag{21}$$

and the output electron density, ρout, is defined by:

$$\rho^{\rm out}(x) := F[\rho^{\rm in}](x) = \sum\_{i=1}^{N} |\psi\_i(x)|^2 \tag{22}$$

Here, the operator, F, is called the Kohn-Sham map. ρout can be used directly as the input electron density, ρin, in the next iteration. This is called the *fixed point iteration*. Unfortunately, in most electronic structure calculations, the fixed point iteration does not converge, even when ρin is very close to the true electron density, ρ∗. The fixed point iteration can be improved by the simple mixing method, which takes the linear combination of the electron density:

$$
\alpha \rho^{\rm out} + (1 - \alpha) \rho^{\rm in} \tag{23}
$$

as the input density for the next iteration with 0 < α ≤ 1. Simple mixing can greatly improve the convergence properties of the SCF iteration over the fixed point iteration, but the convergence rate can still be slow in practice. There are more complicated SCF iteration schemes, such as the Anderson mixing scheme [25], the Pulay mixing scheme [26] and the Broyden mixing scheme [27]. Furthermore, preconditioners can be applied to the SCF iteration to enhance convergence properties, such as the Kerker preconditioner [28]. More detailed discussion on the convergence properties of these SCF schemes can be found in [29]. In the following discussions, we denote by ρSCF(x; **R**, ρ) the final electron density after the SCF iteration starting from an initial guess, ρ. We assume that ρSCF satisfies the consistency condition:

$$
\rho\_{\rm SCF}(x; \mathbf{R}, \rho^\*(\cdot; \mathbf{R})) = \rho^\*(x; \mathbf{R}) \tag{24}
$$

If a non-convergent SCF iteration procedure is used, ρSCF(x; **R**, ρ) might deviate from ρ∗(x; **R**). Such deviation introduces error in the force, and the error can accumulate in the long time molecular dynamics simulation and lead to inaccurate results in computing the statistical and dynamical properties of the systems.
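
The difference between the fixed point iteration and simple mixing can be illustrated with a minimal sketch. The map `kohn_sham_map` below is not a Kohn-Sham map; it is a hypothetical linear toy map with a known fixed point `rho_star`, and the matrix `J` (our assumption) stands in for the derivative of F at the fixed point. The function `rho_scf` plays the role of ρSCF after a fixed number of mixing steps.

```python
import numpy as np

# Hypothetical toy stand-in for the Kohn-Sham map F: linear around a known
# fixed point rho_star. J plays the role of dF/drho at rho_star.
rho_star = np.array([1.0, 2.0])
J = np.diag([-2.0, -0.5])  # spectral radius > 1: plain fixed point iteration diverges


def kohn_sham_map(rho):
    """Toy F[rho] with F[rho_star] = rho_star."""
    return rho_star + J @ (rho - rho_star)


def rho_scf(rho_init, n_steps, alpha):
    """n_steps of simple mixing, Equation (23); alpha = 1 is the fixed point iteration."""
    rho = np.array(rho_init, dtype=float)
    for _ in range(n_steps):
        rho = alpha * kohn_sham_map(rho) + (1.0 - alpha) * rho
    return rho


guess = rho_star + np.array([1e-3, 1e-3])
err_fixed_point = np.linalg.norm(rho_scf(guess, 20, alpha=1.0) - rho_star)
err_mixing = np.linalg.norm(rho_scf(guess, 20, alpha=0.3) - rho_star)
```

In this toy setting, the error of the fixed point iteration grows even though the initial guess is close to `rho_star`, while simple mixing with α = 0.3 contracts it. Starting from `rho_star` itself returns `rho_star`, which is the consistency condition (24).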

The map, ρSCF, is usually highly nonlinear, which makes it difficult to correct the error in the force. The TRBOMD scheme avoids the direct correction for the inaccurate ρSCF, but allows the initial guess to dynamically evolve together with the motion of the atoms. We denote by ρ(x, t) the initial guess for the SCF iteration at time t. When ρ(·, t) is used as an argument, we also write ρSCF(x; **R**(t), ρ(t)) := ρSCF(x; **R**(t), ρ(·, t)). The Hellmann-Feynman formula (19) is used to compute the force at the electron density, ρSCF(x; **R**(t), ρ(t)), even though ρ∗(x; **R**(t)) is not available. Thus, the equation of motion in TRBOMD reads:

$$\begin{split} m\ddot{R}\_I(t) &= f\_I(\mathbf{R}(t); \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t))) = -\int \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) \frac{\partial V\_{\text{ion}}(x; \mathbf{R}(t))}{\partial R\_I} \, \text{d}x \\ \ddot{\rho}(x, t) &= \omega^2(\rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) - \rho(x, t)) \end{split} \tag{25}$$

It is clear that TRBOMD is time reversible. The discretized TRBOMD is still time reversible if the numerical scheme is time reversible. For instance, if the Verlet scheme is used, the discretized equation of motion becomes:

$$\begin{split} R\_I(t\_{k+1}) &= 2R\_I(t\_k) - R\_I(t\_{k-1}) + \frac{\Delta t^2}{m} f\_I(\mathbf{R}(t\_k); \rho\_{\text{SCF}}(x; \mathbf{R}(t\_k), \rho(t\_k))) \\ \rho(x, t\_{k+1}) &= 2\rho(x, t\_k) - \rho(x, t\_{k-1}) + \Delta t^2 \omega^2(\rho\_{\text{SCF}}(x; \mathbf{R}(t\_k), \rho(t\_k)) - \rho(x, t\_k)) \end{split} \tag{26}$$

which is evidently time reversible. The artificial frequency, ω, controls the frequency of the fictitious dynamics of ρ(x, t) and is generally chosen to be larger than the frequency of the motion of the atoms. The numerical stability of the Verlet algorithm requires that the dimensionless quantity, κ := (ωΔt)<sup>2</sup>, be small [30]. When κ is fixed, ω controls the stiffness or, equivalently, the time step $\Delta t = \sqrt{\kappa}/\omega$ for the equation of motion (26).
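
The time reversibility of the Verlet discretization can be checked numerically with a small sketch. The model below is a hypothetical toy with one atom and a scalar "density" (it is not the 1D DFT model used later); the functions `rho_star`, `rho_scf` and `force` and all parameter values are our assumptions. The force enters the update with the sign implied by m R̈ = f in Equation (25).

```python
import numpy as np

# Hypothetical one-atom toy model with a scalar "density".
m, k, a, g, c = 1.0, 1.0, 1.0, 0.5, 0.3
omega, dt = 10.0, 0.05  # kappa = (omega * dt)**2 = 0.25


def rho_star(R):
    """Toy converged density."""
    return a * R


def rho_scf(R, rho):
    """Toy non-converged SCF: the error of the initial guess is damped by c.

    rho_scf(R, rho_star(R)) == rho_star(R), cf. Equation (24).
    """
    return rho_star(R) + c * (rho - rho_star(R))


def force(R, rho_out):
    """Toy force: harmonic force plus an error linear in the SCF residual."""
    return -k * R + g * (rho_out - rho_star(R))


def trbomd_verlet(R_prev, R_cur, rho_prev, rho_cur, n_steps):
    """Advance Equation (25) with the Verlet scheme; m * R'' = f gives + dt**2 / m * f."""
    for _ in range(n_steps):
        rho_out = rho_scf(R_cur, rho_cur)
        R_next = 2.0 * R_cur - R_prev + dt**2 / m * force(R_cur, rho_out)
        rho_next = 2.0 * rho_cur - rho_prev + dt**2 * omega**2 * (rho_out - rho_cur)
        R_prev, R_cur = R_cur, R_next
        rho_prev, rho_cur = rho_cur, rho_next
    return R_prev, R_cur, rho_prev, rho_cur


start = (0.10, 0.1005, 0.10, 0.1005)
end = trbomd_verlet(*start, n_steps=1000)
# Swapping the last two time levels and integrating again runs the trajectory backwards.
back = trbomd_verlet(end[1], end[0], end[3], end[2], n_steps=1000)
recovered = (back[1], back[0], back[3], back[2])
```

Up to floating point round-off, `recovered` coincides with `start`, even though the toy SCF map never returns the converged density.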

Let us mention that TRBOMD is closely related to CPMD. In CPMD, the equation of motion is given by:

$$\begin{aligned} m\ddot{R}\_I(t) &= f\_I(\mathbf{R}(t), \rho(t)) = -\int \rho(t) \frac{\partial V\_{\text{ion}}(x; \mathbf{R}(t))}{\partial R\_I} \, \mathrm{d}x \\ \mu \ddot{\psi}\_i(t) &= -\frac{\delta E(\mathbf{R}(t), \{\psi\_i(t)\})}{\delta \psi\_i^\dagger} + \sum\_j \psi\_j(t) \Lambda\_{ji}(t) \end{aligned} \tag{27}$$

where μ is the fictitious electron mass for the fake electron dynamics in CPMD and Λ's are the Lagrange multipliers determined so that {ψi(t)} is an orthonormal set of functions for any time. The CPMD scheme (27) can be viewed as the equation of motion with an extended Lagrangian:

$$\mathcal{L}\_{\rm CP}(\mathbf{R}, \dot{\mathbf{R}}, \{\psi\_i\}, \{\dot{\psi}\_i\}) = \sum\_{I} \frac{m}{2} |\dot{R}\_I|^2 + \sum\_i \frac{\mu}{2} \int |\dot{\psi}\_i|^2 - E(\mathbf{R}, \{\psi\_i\}) \tag{28}$$

which contains both ionic and electronic degrees of freedom. Therefore, CPMD is a Hamiltonian dynamics and, thus, time reversible.

Note that the frequency of the evolution equation for {ψi} in CPMD is adjusted by the fictitious mass parameter, μ. Compared with TRBOMD, the parameter, μ, plays a role similar to that of ω<sup>−2</sup>, which controls the frequency of the fictitious dynamics of the initial density guess in the SCF iteration. This connection will be made more explicit in the sequel.

We remark that the papers [16,17] went a step further and viewed TRBOMD through an extended Lagrangian approach in a vanishing mass limit. This was also interpreted differently in [24] by starting from a Lagrangian and, then, using inaccurate forces in the equations of motion. However, unless a very specific and restrictive form of the error due to non-convergent SCF iterations is assumed, the equation of motion in TRBOMD does not have an associated Lagrangian in general. The connection to Lagrangian dynamics remains formal, and hence, we will not further explore it here.

#### 4. Analysis of TRBOMD in the Linear Response Regime

In this section, we consider Equation (25) in the linear response regime, in which each atom, I, oscillates around its equilibrium position, $R\_I^\*$. The displacement of the atomic configuration, **R**, from the equilibrium position is denoted by $\tilde{\mathbf{R}}(t) := \mathbf{R}(t) - \mathbf{R}^\*$, and the deviation of the electron density from the converged density is denoted by $\tilde{\rho}(x, t) := \rho(x, t) - \rho^\*(x; \mathbf{R}(t))$. Both $\tilde{\mathbf{R}}(t)$ and $\tilde{\rho}(x, t)$ are small quantities in the linear response regime and contain the same information as **R**(t) and ρ(x, t). Using $\tilde{\mathbf{R}}(t)$ and $\tilde{\rho}(x, t)$ as the new variables and noting the chain rule due to the **R**-dependence in ρ∗(x; **R**(t)), the equation of motion in TRBOMD becomes:

$$\begin{split} m\ddot{\tilde{R}}\_{I}(t) &= -\int \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) \frac{\partial V\_{\text{ion}}(x; \mathbf{R}(t))}{\partial R\_{I}} \, \mathrm{d}x \\ \ddot{\tilde{\rho}}(x, t) &= \omega^{2}(\rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) - \rho(x, t)) - \sum\_{I=1}^{M} \frac{\partial \rho^{\*}(x; \mathbf{R}(t))}{\partial R\_{I}} \ddot{\tilde{R}}\_{I}(t) \\ &- \sum\_{I, J=1}^{M} \dot{\tilde{R}}\_{I}(t) \dot{\tilde{R}}\_{J}(t) \frac{\partial^{2} \rho^{\*}(x; \mathbf{R}(t))}{\partial R\_{I} \partial R\_{J}} \end{split} \tag{29}$$

To simplify notation, from now on, we suppress the t-dependence in all variables, and Equation (29) becomes:

$$m\ddot{\tilde{R}}\_I = -\int \rho\_{\rm SCF}(x; \mathbf{R}, \rho) \frac{\partial V\_{\rm ion}(x; \mathbf{R})}{\partial R\_I} \, \mathrm{d}x \tag{30a}$$

$$\ddot{\tilde{\rho}}(x) = \omega^2(\rho\_{\rm SCF}(x; \mathbf{R}, \rho) - \rho(x)) - \sum\_{I=1}^{M} \frac{\partial \rho^\*}{\partial R\_I}(x; \mathbf{R}) \ddot{\tilde{R}}\_I - \sum\_{I,J=1}^{M} \dot{\tilde{R}}\_I \dot{\tilde{R}}\_J \frac{\partial^2 \rho^\*}{\partial R\_I \partial R\_J}(x; \mathbf{R}) \tag{30b}$$

In the linear response regime, we expand Equation (30) and only keep terms that are linear with respect to $\tilde{\mathbf{R}}$ and $\tilde{\rho}$. All the higher order terms, including all the cross products of $\tilde{R}\_I$, $\dot{\tilde{R}}\_I$ and $\tilde{\rho}$, will be dropped. First, we linearize the force on atom I with respect to $\tilde{\rho}$ as:

$$\begin{split} &f\_{I}(\mathbf{R};\rho\_{\mathrm{SCF}}(x;\mathbf{R},\rho)) \\ &= -\int \rho\_{\mathrm{SCF}}(x;\mathbf{R},\rho) \frac{\partial V\_{\mathrm{ion}}(x;\mathbf{R})}{\partial R\_{I}} \mathrm{d}x \\ &= -\int \rho^{\*}(x;\mathbf{R}) \frac{\partial V\_{\mathrm{ion}}(x;\mathbf{R})}{\partial R\_{I}} \mathrm{d}x - \int \left(\rho\_{\mathrm{SCF}}(x;\mathbf{R},\rho^{\*}(\mathbf{R}) + \tilde{\rho}) - \rho^{\*}(x;\mathbf{R})\right) \frac{\partial V\_{\mathrm{ion}}(x;\mathbf{R})}{\partial R\_{I}} \mathrm{d}x \\ &\approx -\int \rho^{\*}(x;\mathbf{R}) \frac{\partial V\_{\mathrm{ion}}(x;\mathbf{R})}{\partial R\_{I}} \mathrm{d}x - \int \frac{\delta \rho\_{\mathrm{SCF}}}{\delta \rho}(x,y;\mathbf{R}) \tilde{\rho}(y) \frac{\partial V\_{\mathrm{ion}}(x;\mathbf{R})}{\partial R\_{I}} \mathrm{d}x \,\mathrm{d}y \end{split} \tag{31}$$

Next, we linearize with respect to $\tilde{\mathbf{R}}$; we have:

$$-\int \rho^\*(x; \mathbf{R}) \frac{\partial V\_{\rm ion}(x; \mathbf{R})}{\partial R\_I} \, \mathrm{d}x \approx -m \sum\_{J=1}^M \mathcal{D}\_{IJ} \tilde{R}\_J \tag{32}$$

Here, the matrix, $\{\mathcal{D}\_{IJ}\}$, is the dynamical matrix for the atoms. For the last term in Equation (31), we have:

$$\begin{split} & \int \frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}) \widetilde{\rho}(y) \frac{\partial V\_{\rm ion}(x; \mathbf{R})}{\partial R\_I} \, \mathrm{d}x \, \mathrm{d}y \\ & \approx \int \frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*) \widetilde{\rho}(y) \frac{\partial V\_{\rm ion}(x; \mathbf{R}^\*)}{\partial R\_I} \, \mathrm{d}x \, \mathrm{d}y \\ & \coloneqq - \, m \mathcal{L}\_I[\widetilde{\rho}] \end{split} \tag{33}$$

The last line of Equation (33) defines a linear functional, $\mathcal{L}\_I$, with $\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*)$ and $\frac{\partial V\_{\rm ion}(x; \mathbf{R}^\*)}{\partial R\_I}$ evaluated at the fixed equilibrium point, **R**∗.

In the linear response regime, the operator, $\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*)$, carries all the information of the SCF iteration scheme. Let us now derive the explicit form of $\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*)$ for the k-step simple mixing scheme with mixing parameter (step length) α (0 < α ≤ 1). If k = 1, the simple mixing scheme reads:

$$\rho\_{\rm SCF}(x; \mathbf{R}, \rho^\*(\mathbf{R}) + \tilde{\rho}) = \alpha F[\rho^\*(\mathbf{R}) + \tilde{\rho}] + (1 - \alpha)(\rho^\*(\mathbf{R}) + \tilde{\rho})\tag{34}$$

so:

$$\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*) = \delta(x - y) - \alpha \left(\delta(x - y) - \frac{\delta F}{\delta \rho}(x, y)\right) \tag{35}$$

Here, δ(x) is the Dirac δ-function, and the operator, $\varepsilon(x, y) := \delta(x - y) - \frac{\delta F}{\delta \rho}(x, y)$, is usually referred to as the *dielectric operator* [31,32]. To simplify the notation, we do not distinguish the kernel of an integral operator from the integral operator itself. For example, ε(x, y) is denoted by ε. Nor do we distinguish integral operators defined on continuous space from the corresponding finite dimensional matrices obtained from a numerical discretization. This slight abuse of notation allows us to denote $f(x) = \int A(x, y) g(y) \, \mathrm{d}y$ by f = Ag, as a matrix-vector multiplication, and to denote the composition of kernels of integral operators, $C(x, y) = \int A(x, z) B(z, y) \, \mathrm{d}z$, by C = AB, as a matrix-matrix multiplication. Using this notation, Equation (35) can be written in a more compact form:

$$\frac{\delta\rho\_{\rm SCF}}{\delta\rho} = I - \alpha\varepsilon\tag{36}$$

Similarly, for the k-step simple mixing method, we have:

$$\frac{\delta \rho\_{\rm SCF}}{\delta \rho} = (I - \alpha \varepsilon)^k \tag{37}$$

In general, the dielectric operator is diagonalizable, and all eigenvalues of ε are real. Therefore, the linear response operator, $\frac{\delta \rho\_{\rm SCF}}{\delta \rho}$, for the k-step simple mixing method is also diagonalizable with real eigenvalues.
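
The k-step formula can be checked numerically on a toy problem. In the sketch below, `kohn_sham_map` is a hypothetical nonlinear map with fixed point `rho_star`, and the symmetric matrix `J` is an assumed stand-in for δF/δρ at the fixed point; none of these come from an actual KSDFT calculation. The Jacobian of k mixing steps, obtained by central finite differences, is compared with the k-th power of I − αε.

```python
import numpy as np

rho_star = np.array([1.0, 2.0, 0.5])
# Hypothetical dF/drho at rho_star (symmetric, so the toy dielectric operator
# has real eigenvalues).
J = np.array([[-0.8, 0.1, 0.0],
              [0.1, -0.5, 0.2],
              [0.0, 0.2, -1.5]])


def kohn_sham_map(rho):
    """Nonlinear toy map with fixed point rho_star and Jacobian J there."""
    d = rho - rho_star
    return rho_star + J @ d + 0.1 * d**2


def rho_scf(rho, k, alpha):
    """k steps of simple mixing, Equation (23)."""
    for _ in range(k):
        rho = alpha * kohn_sham_map(rho) + (1.0 - alpha) * rho
    return rho


def numerical_jacobian(func, x0, h=1e-6):
    """Central finite-difference Jacobian of func at x0."""
    n = x0.size
    jac = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        jac[:, j] = (func(x0 + e) - func(x0 - e)) / (2.0 * h)
    return jac


k, alpha = 3, 0.3
eps = np.eye(3) - J  # toy dielectric operator, eps = I - dF/drho
predicted = np.linalg.matrix_power(np.eye(3) - alpha * eps, k)
measured = numerical_jacobian(lambda r: rho_scf(r, k, alpha), rho_star)
```

Because every mixing step maps `rho_star` to itself, the chain rule multiplies the same one-step Jacobian k times, which is why `measured` and `predicted` agree up to finite-difference error.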

From Equation (30b), we have:

$$\begin{split} &\rho\_{\text{SCF}}(x;\mathbf{R},\rho)-\rho(x) \\ &= (\rho\_{\text{SCF}}(x;\mathbf{R},\tilde{\rho}+\rho^\*(\mathbf{R}))-\rho^\*(x;\mathbf{R}))-(\rho(x)-\rho^\*(x;\mathbf{R})) \\ &\approx \int \frac{\delta\rho\_{\text{SCF}}}{\delta\rho}(x,y;\mathbf{R})\tilde{\rho}(y)\,\mathrm{d}y-\tilde{\rho}(x) \\ &\approx \int \frac{\delta\rho\_{\text{SCF}}}{\delta\rho}(x,y;\mathbf{R}^\*)\tilde{\rho}(y)\,\mathrm{d}y-\tilde{\rho}(x) \\ &:= -\int \mathcal{K}(x,y)\tilde{\rho}(y)\,\mathrm{d}y \end{split} \tag{38}$$

Here, we have used consistency condition (24). The last line of Equation (38) defines a kernel:

$$\mathcal{K}(x, y) = \delta(x - y) - \frac{\delta \rho\_{\text{SCF}}}{\delta \rho}(x, y; \mathbf{R}^\*) \tag{39}$$

which is an important quantity for the stability of TRBOMD, as will be seen later. Using Equations (33) and (38), the equation of motion, (30), can be written in the linear response regime as:

$$\begin{aligned} \ddot{\tilde{R}}\_I &= -\sum\_{J=1}^M \mathcal{D}\_{IJ} \tilde{R}\_J + \mathcal{L}\_I[\tilde{\rho}] \\ \ddot{\tilde{\rho}}(x) &= -\omega^2 \int \mathcal{K}(x, y) \tilde{\rho}(y) \, \mathrm{d}y - \sum\_{I=1}^M \frac{\partial \rho^\*}{\partial R\_I}(x; \mathbf{R}^\*) \left( -\sum\_{J=1}^M \mathcal{D}\_{IJ} \tilde{R}\_J + \mathcal{L}\_I[\tilde{\rho}] \right) \end{aligned} \tag{40}$$

Define:

$$\mathcal{L} = (\mathcal{L}\_1, \dots, \mathcal{L}\_M)^T \tag{41}$$

then Equation (40) can be rewritten in a more compact form as:

$$
\ddot{\tilde{\mathbf{R}}} = -\mathcal{D}\tilde{\mathbf{R}} + \mathcal{L}[\tilde{\rho}],
\tag{42a}
$$

$$\ddot{\tilde{\rho}}(x) = -\omega^2 \int \mathcal{K}(x, y)\tilde{\rho}(y) \, \mathrm{d}y - \left(\frac{\partial \rho^\*}{\partial \mathbf{R}}(x; \mathbf{R}^\*)\right)^T \left(-\mathcal{D}\tilde{\mathbf{R}} + \mathcal{L}[\tilde{\rho}]\right) \tag{42b}$$

Now, if the self-consistent iteration is performed accurately regardless of the initial guess, *i.e.*,

$$
\rho\_{\rm SCF}(x; \mathbf{R}, \rho) = \rho^\*(x; \mathbf{R}), \quad \forall \rho \tag{43}
$$

then we have:

$$\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*) = 0, \quad \mathcal{L} = \mathbf{0}, \quad \mathcal{K}(x, y) = \delta(x - y) \tag{44}$$

The linearized equation of motion (42) becomes:

$$
\ddot{\tilde{\mathbf{R}}} = -\mathcal{D}\tilde{\mathbf{R}},\tag{45a}
$$

$$\ddot{\tilde{\rho}}(x) = -\omega^2 \tilde{\rho}(x) + \left(\frac{\partial \rho^\*}{\partial \mathbf{R}}(x; \mathbf{R}^\*)\right)^T \mathcal{D}\tilde{\mathbf{R}}\tag{45b}$$

Therefore, in the case of accurate SCF iteration, according to Equation (45a), the motion of the atoms follows the accurate linearized equation and is decoupled from the fictitious dynamics of $\tilde{\rho}$. The normal modes of the equation of motion of the atoms can be obtained by diagonalizing the dynamical matrix, $\mathcal{D}$, as:

$$\mathcal{D}\mathbf{v}\_l = \Omega\_l^2 \mathbf{v}\_l, \quad l = 1, \ldots, M \tag{46}$$

The frequencies, $\{\Omega\_l\}$ ($\Omega\_l > 0$), are known as *phonon frequencies*. When the SCF iterations are performed inaccurately, it is meaningless to assess the accuracy of the approximate dynamics (42) by direct investigation of the trajectories, $\tilde{\mathbf{R}}(t)$, since a small difference in the phonon frequency can cause a large error in the phase of the periodic motion, $\tilde{\mathbf{R}}(t)$, over a long time. However, it is possible to compute the approximate phonon frequencies, $\{\widetilde{\Omega}\_l\}$, from Equation (42) and measure the accuracy of TRBOMD in the linearized regime from the relative error:

$$\text{err}\_l = \frac{\widetilde{\Omega}\_l - \Omega\_l}{\Omega\_l} \tag{47}$$

The operator, K(x, y), in Equation (39) is directly related to the stability of the dynamics. Equation (42b) also suggests that in the linear response regime, the spectrum of K(x, y) must be on the real line, which requires that the matrix, $\frac{\delta \rho\_{\rm SCF}}{\delta \rho}(x, y; \mathbf{R}^\*)$, be diagonalizable with real eigenvalues. This has been shown for the simple mixing scheme. However, we remark that the condition that all eigenvalues of K(x, y) are real may not hold for general preconditioners or for more complicated SCF iterations (for instance, Anderson mixing). This is one important restriction of the linear response analysis. Of course, this may not be a restriction for practical TRBOMD simulation for real systems. We will leave further understanding of this to future work.

Let us now assume that all eigenvalues of K are real. The lower bound of the spectrum of K, denoted by λmin(K), should satisfy:

$$
\lambda\_{\min}(\mathcal{K}) > 0 \tag{48}
$$

Equation (48) is a necessary condition for TRBOMD to be stable, which will be referred to as the *stability condition* in the following. Furthermore, ω should be chosen large enough in order to avoid resonance between the motion of $\tilde{\mathbf{R}}$ and $\tilde{\rho}$. Therefore, the *adiabatic condition*:

$$
\omega^2 \gg \frac{\lambda\_{\text{max}}(\mathcal{D})}{\lambda\_{\text{min}}(\mathcal{K})} = \frac{\max\_l \Omega\_l^2}{\lambda\_{\text{min}}(\mathcal{K})} \tag{49}
$$

should also be satisfied. Due to Equation (49), we may treat 1/ω<sup>2</sup> as a small parameter and expand $\widetilde{\Omega}\_l$ in a perturbation series in 1/ω<sup>2</sup> to quantify the error in the linear response regime. Following the derivation in the appendix, we have:

$$\widetilde{\boldsymbol{\Omega}}\_{l} = \boldsymbol{\Omega}\_{l} \left( 1 - \frac{1}{2\omega^{2}} \mathbf{v}\_{l}^{T} \mathcal{L} \left[ \boldsymbol{\mathcal{K}}^{-1} \left[ \left( \frac{\partial \boldsymbol{\rho}^{\*}}{\partial \mathbf{R}} \right)^{T} \mathbf{v}\_{l} \right] \right] \right) + \mathcal{O}(1/\omega^{4}) \tag{50}$$

where $\mathcal{K}^{-1}$ is the inverse operator of $\mathcal{K}$ ($\mathcal{K}$ is invertible, due to the stability condition). Since ω = √κ/Δt, Equation (50) suggests that the error of TRBOMD in the linear response regime is of order (Δt)<sup>2</sup>, with the prefactor mainly determined by $\mathcal{L}$, *i.e.*, by the accuracy of the SCF iteration.
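
The first-order formula (50) can be compared with the exact frequencies of the linearized system (42) in a small matrix example. All matrices below are hypothetical stand-ins for the discretized operators (two atoms and two grid points), chosen only so that the stability condition (48) and the adiabatic condition (49) hold; `P` denotes the matrix whose I-th column is ∂ρ∗/∂R<sub>I</sub>.

```python
import numpy as np

# Hypothetical small matrices standing in for the discretized operators.
D = np.array([[2.0, -1.0], [-1.0, 2.0]])   # dynamical matrix (M x M)
K = np.array([[0.5, 0.1], [0.1, 0.8]])     # kernel of Equation (39) (n x n)
L = np.array([[0.1, 0.2], [0.0, -0.1]])    # linear functional L (M x n)
P = np.array([[0.3, -0.2], [0.1, 0.4]])    # d rho*/dR, one column per atom (n x M)
omega = 100.0

# Linearized system (42) written as x'' = -A x with x = (R~, rho~).
A = np.block([[D, -L],
              [-P @ D, omega**2 * K + P @ L]])
num_atoms = D.shape[0]
# The num_atoms smallest eigenvalues of A are the squared approximate phonon frequencies.
omega_exact = np.sqrt(np.sort(np.linalg.eigvals(A).real)[:num_atoms])

# Reference phonons, Equation (46), and the first-order prediction, Equation (50).
lam, V = np.linalg.eigh(D)
omega_ref = np.sqrt(lam)
shift = np.array([
    V[:, l] @ L @ np.linalg.solve(K, P @ V[:, l]) for l in range(num_atoms)
])
omega_pert = omega_ref * (1.0 - shift / (2.0 * omega**2))
```

In this example, the remaining difference between `omega_pert` and `omega_exact` is much smaller than the difference between `omega_ref` and `omega_exact`, in line with the stated remainder of order 1/ω<sup>4</sup>.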

Let us compare TRBOMD with CPMD. It is well known that CPMD accurately approximates the results of BOMD, provided that the electronic and ionic degrees of freedom remain adiabatically separated and that the electrons stay close to the Born-Oppenheimer surface [12,13]. More specifically, the fictitious electron mass should be chosen so that the lowest electronic frequency is well above the ionic frequencies:

$$
\mu \ll \frac{E\_{\text{gap}}}{\max\_{l} \Omega\_{l}^{2}} \tag{51}
$$

where Egap is the spectral gap (between the highest occupied and the lowest unoccupied states) of the system, and recall that $\Omega\_l$ is the vibration frequency of the lattice phonon. For CPMD, a similar analysis in the linear response regime as above (we omit the derivation here) shows that:

$$
\widetilde{\Omega}\_l = \Omega\_l (1 + \mathcal{O}(\mu)) \tag{52}
$$

under assumption (51). The adiabaticity (51), as well as the role of the fictitious electron mass on physical quantities have been investigated extensively in [33–35]. The linear relationship (52) between the fictitious electron mass and the dynamical frequencies of CPMD was also presented in [34].

Note that condition (51) implies that CPMD no longer works if the system has a small gap or is even metallic. The usual work-around for this is to add a heat bath for the electronic degrees of freedom in CPMD [33], so that it maintains a fictitious temperature for the electronic degree of freedom. Nonetheless, the adiabaticity is lost for metallic systems, and CPMD is no longer accurate over long time simulation. In contrast, as we have discussed previously, TRBOMD may work for both insulating and metallic systems without any modification, provided that the SCF iteration is accurate and no resonance occurs. This is an important advantage of TRBOMD, which we will illustrate using numerical examples in the next section.

When the system has a gap, we can take μ sufficiently small to satisfy the adiabatic separation condition (51). Comparing Equation (52) with Equation (50), we see that μ in CPMD plays a role similar to that of ω<sup>−2</sup> in TRBOMD. The accuracy (in the linear regime) of CPMD and TRBOMD is first order in μ and ω<sup>−2</sup>, respectively. At the same time, since taking a small μ or a large ω increases the stiffness of the equation, the computational cost is proportional to μ<sup>−1</sup> and ω<sup>2</sup>, respectively.

Let us remark that the above analysis is done in the linear response regime. As shown in [12,13], the accuracy of CPMD, in general, is only $\mathcal{O}(\mu^{1/2})$ instead of $\mathcal{O}(\mu)$ for the linear regime. Due to the close connection between these two parameters, we do not expect $\mathcal{O}(\omega^{-2})$ accuracy for TRBOMD in general, either. Actually, as will be discussed in Section 6, if the deviation of the atom positions from equilibrium is large enough that the nuclear motion cannot be linearized, the error of TRBOMD in general will be $\mathcal{O}(\omega^{-1})$.

#### 5. Numerical Results in the Linear Response Regime

In this section, we present numerical results for TRBOMD in the linear response regime using a one-dimensional (1D) model for KSDFT without the exchange-correlation functional. The model problem can be tuned to exhibit both metallic and insulating features. Such a model was used before in the mathematical analysis of the ionization conjecture [36].

The total energy functional in our 1D density functional theory (DFT) model is given by:

$$E(\{\psi\_i(x)\}\_{i=1}^N; \mathbf{R}) = \frac{1}{2} \sum\_{i=1}^N \int \left| \frac{d}{dx} \psi\_i(x) \right|^2 \, \mathrm{d}x + \frac{1}{2} \int K(x, y) (\rho(x) + m(x; \mathbf{R})) (\rho(y) + m(y; \mathbf{R})) \, \mathrm{d}x \, \mathrm{d}y \tag{53}$$

with $\rho(x) = \sum\_{i=1}^N |\psi\_i(x)|^2$. The associated Hamiltonian is given by:

$$H(\mathbf{R}) = -\frac{1}{2}\frac{d^2}{dx^2} + \int K(x, y)(\rho(y) + m(y; \mathbf{R})) \, \mathrm{d}y \tag{54}$$

Here, $m(x; \mathbf{R}) = \sum\_{I=1}^M m\_I(x - R\_I)$, with the position of the I-th nucleus denoted by $R\_I$. Each function, $m\_I(x)$, takes the form:

$$m\_I(x) = -\frac{Z\_I}{\sqrt{2\pi\sigma\_I^2}} e^{-\frac{x^2}{2\sigma\_I^2}}\tag{55}$$

where $Z\_I$ is an integer representing the charge of the I-th nucleus. This can be understood as a local pseudopotential approximation to represent the electron-ion interaction. The second term on the right-hand side of Equation (53) represents the electron-ion, electron-electron and ion-ion interaction energy. The parameter, $\sigma\_I$, represents the width of the nuclei in the pseudopotential theory. Clearly, as $\sigma\_I \to 0$, $m\_I(x) \to -Z\_I \delta(x)$, which is the charge density for an ideal nucleus. In our numerical simulation, we set $\sigma\_I$ to a finite value. The corresponding $m\_I(x)$ is called a *pseudo charge density* for the I-th nucleus. We refer to the function, m(x), as the total pseudo-charge density of the nuclei. The system satisfies the charge neutrality condition, *i.e.*,

$$\int \rho(x) + m(x; \mathbf{R}) \, \mathrm{d}x = 0 \tag{56}$$

Since $\int m\_I(x) \, \mathrm{d}x = -Z\_I$, the charge neutrality condition (56) implies:

$$\int \rho(x) \, \mathrm{d}x = \sum\_{I=1}^{M} Z\_I = N \tag{57}$$

where N is the total number of electrons in the system. To simplify discussion, we omit the spin degeneracy here. The Hellmann-Feynman force is given by:

$$f\_I = -\int K(x, y)(\rho(y) + m(y; \mathbf{R})) \frac{\partial m(x; \mathbf{R})}{\partial R\_I} \, \mathrm{d}x \, \mathrm{d}y \tag{58}$$

Instead of using a bare Coulomb interaction, which diverges in 1D, we adopt a Yukawa kernel:

$$K(x,y) = \frac{2\pi e^{-\kappa|x-y|}}{\kappa \epsilon\_0} \tag{59}$$

which satisfies the equation:

$$-\frac{d^2}{dx^2}K(x,y) + \kappa^2 K(x,y) = \frac{4\pi}{\epsilon\_0}\delta(x-y) \tag{60}$$

As κ → 0, the Yukawa kernel approaches the bare Coulomb interaction given by the Poisson equation. The parameter, $\epsilon\_0$, is used to make the magnitude of the electrostatic contribution comparable to that of the kinetic energy.
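
The two building blocks of the model, the Gaussian pseudo-charge (55) and the Yukawa kernel (59), can be sketched as follows. The function names, the grid and the value κ = 0.5 are our assumptions for illustration only; κ is taken much larger than in the simulations so that the kernel decays within a small grid. Integrating Equation (60) over y shows that the kernel should integrate to 4π/(ε<sub>0</sub>κ<sup>2</sup>), which gives a simple consistency check.

```python
import numpy as np


def pseudo_charge(x, R, Z=1.0, sigma=2.0):
    """Gaussian pseudo-charge m_I(x - R_I) of Equation (55)."""
    return -Z / np.sqrt(2.0 * np.pi * sigma**2) * np.exp(-(x - R)**2 / (2.0 * sigma**2))


def yukawa_kernel(x, y, kappa, eps0):
    """Yukawa kernel K(x, y) of Equation (59)."""
    return 2.0 * np.pi * np.exp(-kappa * np.abs(x - y)) / (kappa * eps0)


# Illustrative values only (not the simulation parameters).
kappa, eps0 = 0.5, 10.0
x = np.linspace(-60.0, 60.0, 12001)
dx = x[1] - x[0]

# Simple Riemann sums; both integrands are negligible at the grid boundaries.
total_charge = np.sum(pseudo_charge(x, R=3.0)) * dx               # should be close to -Z
kernel_integral = np.sum(yukawa_kernel(x, 0.0, kappa, eps0)) * dx
expected_integral = 4.0 * np.pi / (eps0 * kappa**2)               # from Equation (60)
```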

The parameters used in the 1D DFT model are chosen as follows. Atomic units are used throughout the discussion unless otherwise mentioned. The Yukawa parameter, κ = 0.01, is small enough so that the range of the electrostatic interaction is sufficiently long, and $\epsilon\_0$ is set to 10.00. The nuclear charge, $Z\_I$, is set to one for all atoms. Since spin is neglected, $Z\_I = 1$ implies that each atom contributes one occupied state. The Hamiltonian operator is represented in a planewave basis set. All the examples presented in this section consist of 32 atoms. Initially, the atoms are at their equilibrium positions, and the distance between each atom and its nearest neighbor is set to 10 au. Starting from the equilibrium position, each ion is given a finite velocity, chosen so that the velocity of the center of mass is zero. In the numerical experiments below, the system contains only one single phonon, which is obtained by assigning an initial velocity, $v\_0 \propto (1, -1, 1, -1, \cdots)$, to the atoms. We denote by $\Omega^{\rm Ref}$ the corresponding phonon frequency. We choose $v\_0$ so that $\frac{1}{2} m v\_0^2 = k\_B T\_{\rm ion}$, where $k\_B$ is the Boltzmann constant and $T\_{\rm ion}$ is 10 K, to make sure that the system is in the linear response regime. In atomic units, the mass of the electron is one, and the mass of each nucleus is set to 42,000. By adjusting the parameters, $\{\sigma\_I\}$, the 1D DFT model can be tuned to resemble an insulating (with $\sigma\_I = 2.0$) or a metallic system (with $\sigma\_I = 6.0$) throughout the MD simulation. Figure 1 shows the spectrum of the insulating and the metallic system after running 1,000 BOMD steps with converged SCF iteration.

Figure 1. Spectrum for the insulator and metal with 32 atoms after 1,000 Born-Oppenheimer molecular dynamics (BOMD) steps with converged self-consistent field (SCF) iteration. (a) Insulator; (b) metal.

In the linear response regime, we measure the error of the phonon frequency calculated from TRBOMD. This can be done in two ways. The first is given by Equation (50); namely, all quantities in the big parentheses in Equation (50) can be directly obtained by using the finite difference method at the equilibrium position, **R**∗. The second is to exploit the fact that in the linear response regime, there is a linear relation between the force and the atomic position, as in Equation (32), *i.e.*, Hooke's law:

$$f\_I(t\_l) = -m \sum\_{J=1}^{M} \mathcal{D}\_{IJ} \widetilde{R}\_J(t\_l) \tag{61}$$

holds approximately at each time step. Here, $\{f\_I(t\_l)\}$ and $\{\widetilde{R}\_I(t\_l)\}$ are obtained from the trajectory of the TRBOMD simulation directly. To numerically compute $\mathcal{D}\_{IJ}$, we solve the least squares problem:

$$\min\_{\mathcal{D}} \sum\_{l,I} \left\| f\_I(t\_l) + m \sum\_J \mathcal{D}\_{IJ} \widetilde{R}\_J(t\_l) \right\|^2 \tag{62}$$

which yields:

$$\mathcal{D} = -\frac{1}{m} S^{fR} \left( S^{RR} \right)^{-1} \tag{63}$$

where:

$$S\_{IJ}^{fR} = \sum\_{l} f\_I(t\_l) \widetilde{R}\_J(t\_l), \quad S\_{IJ}^{RR} = \sum\_{l} \widetilde{R}\_I(t\_l) \widetilde{R}\_J(t\_l) \tag{64}$$

The frequencies, $\{\widetilde{\Omega}\_l\}$, can be obtained by diagonalizing the matrix, $\mathcal{D}$. Similarly, one can perform the calculation for the accurate BOMD simulation and obtain the exact value of the frequencies, $\{\Omega\_l\}$.
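
The least squares fit (62)-(64) can be sketched on synthetic data. The dynamical matrix `D_true`, the random displacements and the variable names below are hypothetical; the sketch also assumes that the sampled displacements span all atomic directions, so that the matrix S<sup>RR</sup> is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
mass = 42000.0
# Hypothetical dynamical matrix of a three-atom chain (illustrative values).
D_true = np.array([[2.0, -1.0, 0.0],
                   [-1.0, 2.0, -1.0],
                   [0.0, -1.0, 2.0]])

# Synthetic "trajectory": rows are time steps t_l, columns are atoms.
R_tilde = 1e-2 * rng.standard_normal((50, 3))
forces = -mass * R_tilde @ D_true.T          # exact Hooke's law, f = -m D R~

S_fR = forces.T @ R_tilde                     # S^{fR}_{IJ} = sum_l f_I(t_l) R~_J(t_l)
S_RR = R_tilde.T @ R_tilde                    # S^{RR}_{IJ} = sum_l R~_I(t_l) R~_J(t_l)
# D = -(1/m) S^{fR} (S^{RR})^{-1}, computed with a linear solve instead of an explicit inverse.
D_fit = -np.linalg.solve(S_RR.T, S_fR.T).T / mass

# Squared frequencies are the eigenvalues of the fitted matrix.
omega_fit = np.sqrt(np.sort(np.linalg.eigvals(D_fit).real))
```

With forces that obey Hooke's law exactly, the fit reproduces `D_true` up to round-off; with forces from a TRBOMD trajectory, the same fit would yield the approximate frequencies.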

In order to compare the performance among BOMD, TRBOMD and CPMD, we define the following relative errors:

$$\text{err}\_{\Omega}^{\text{Hooke}} = \frac{\widetilde{\Omega}^{\text{Hooke}} - \Omega^{\text{Ref}}}{\Omega^{\text{Ref}}} \tag{65}$$

$$\text{err}\_{\Omega}^{\text{LR}} = \frac{\widetilde{\Omega}^{\text{LR}} - \Omega^{\text{Ref}}}{\Omega^{\text{Ref}}} \tag{66}$$

$$\text{err}\_{\overline{E}} = \frac{\overline{E} - \overline{E}^{\text{Ref}}}{\overline{E}^{\text{Ref}}} \tag{67}$$

$$\text{err}\_{R}^{L^2} = \frac{\|R\_1(t) - R\_1^{\text{Ref}}(t)\|\_{L^2}}{\|R\_1^{\text{Ref}}(t)\|\_{L^2}}\tag{68}$$

$$\text{err}\_{R}^{L^{\infty}} = \frac{\|R\_1(t) - R\_1^{\text{Ref}}(t)\|\_{L^{\infty}}}{\|R\_1^{\text{Ref}}(t)\|\_{L^{\infty}}} \tag{69}$$

where the results from BOMD with convergent SCF iteration are taken to be the corresponding reference values, $\overline{E}$ is the average total energy over time, the frequencies, $\widetilde{\Omega}^{\rm Hooke}$ and $\Omega^{\rm Ref}$, are obtained by solving the least squares problem (62), the frequency, $\widetilde{\Omega}^{\rm LR}$, is computed from Equation (50) with finite difference methods and $R\_1(t)$ is the trajectory of the left-most atom.

#### *5.1. Numerical Comparison between BOMD and TRBOMD*

The first run is to validate the performance of TRBOMD. We set the time step Δt = 250, the artificial frequency ω = 1/Δt = 4.00E-03 and the final time T = 2.50E+06, and employ simple mixing with step length α = 0.3 and the Kerker preconditioner in the SCF cycles. Figure 2 plots the energy drift for BOMD with converged SCF iteration (denoted by BOMD(c)), where the tolerance is 1.00E-08; BOMD with five SCF iterations per time step (denoted by BOMD(5)); and TRBOMD with five SCF iterations per time step (denoted by TRBOMD(5)). We see clearly that BOMD(5) produces a large drift for both the insulator and the metal, but TRBOMD(5) does not. In fact, from Table 1, the relative error in the average total energy over time between TRBOMD(5) and BOMD(c) is under 1.30E-05, whereas BOMD(c) needs on average about 45 SCF iterations per time step to reach the tolerance 1.00E-08. Figure 3 plots the corresponding trajectory of the left-most atom during about the first 25 periods and shows that the trajectory from TRBOMD(5) almost coincides with that from BOMD(c), which is also confirmed by the values of $\text{err}\_R^{L^2}$ and $\text{err}\_R^{L^\infty}$ in Table 1. However, for BOMD(5), the atom will cease oscillation after a while. A similar phenomenon occurs for the other atoms. In Table 1, we present more results for TRBOMD(n) with n = 3, 5, 7. We observe that TRBOMD(n) gives more accurate results with larger n, and that $\text{err}\_\Omega^{\rm Hooke}$ behaves similarly to $\text{err}\_\Omega^{\rm LR}$ as n increases, which is in accord with our previous linear response analysis in Section 4.

Figure 2. The energy fluctuations around the starting energy, E(t = 0), as a function of time. The time step is Δt = 250. The final time is 2.50E+06 and ω = 1/Δt = 4.00E-03. Simple mixing with the Kerker preconditioner is applied in the SCF cycles. BOMD(c) denotes the BOMD simulation with converged SCF iteration, and BOMD(n) (resp. TRBOMD(n)) represents the BOMD (resp. TRBOMD) simulation with n SCF iterations per time step. BOMD(5) clearly produces a large drift for both the insulator (a) and the metal (b), but TRBOMD(5) does not.

Table 1. The errors for time reversible Born-Oppenheimer molecular dynamics (TRBOMD) (n). The settings are the same as those in Figure 2, except for the number of SCF iterations.


Figure 3. The position of the left-most atom as a function of time. The settings are the same as those in Figure 2. The trajectory from TRBOMD(5) almost coincides with that from BOMD(c). For BOMD(5), however, the atom ceases to oscillate after a while. (a) Insulator; (b) metal.

Figure 4. The absolute value of the error for TRBOMD(3) as a function of 1/ω<sup>2</sup> in logarithmic scales. The time step is Δt = 20, and the final time is 6.00E+05. For the reader's reference, within each plot, the red straight line denotes the corresponding linear dependence, while the red solid point on the x axis represents the critical value of λmin(K)/λmax(D). (a) Insulator; (b) metal.

According to Equation (50), err<sup>LR</sup><sub>Ω</sub> is proportional to 1/ω<sup>2</sup> for large ω. We verify this behavior using TRBOMD(3) as an example. Here, a smaller time step, Δt = 20, is set to allow a larger artificial frequency ω. The final time is T = 6.00E+05, and simple mixing with α = 0.3 and the Kerker preconditioner is applied in the SCF iterations. For TRBOMD(3) under these settings, we have λmin(K) ≈ 8.81E-03 for the insulator and λmin(K) ≈ 5.92E-01 for the metal, and thus, the critical values of (Ω<sup>Ref</sup>)<sup>2</sup>/λmin(K) in Equation (49) are about 7.12E-06 and 1.90E-08, respectively. We choose ω<sup>2</sup> = 2.50E-03, 2.50E-04, 2.50E-05, 2.50E-06, 2.50E-07, 2.50E-08, 2.50E-09, and plot in Figure 4 the absolute values of err<sup>Hooke</sup><sub>Ω</sub>, err<sub>E</sub> and err<sup>L<sup>2</sup></sup><sub>R</sub> for TRBOMD(3) as a function of 1/ω<sup>2</sup> in logarithmic scales. When 1/ω<sup>2</sup> is below the critical value λmin(K)/(Ω<sup>Ref</sup>)<sup>2</sup>, Figure 4 shows clearly that |err<sup>Hooke</sup><sub>Ω</sub>|, |err<sub>E</sub>| and |err<sup>L<sup>2</sup></sup><sub>R</sub>| all depend linearly on 1/ω<sup>2</sup>. The error err<sup>L<sup>∞</sup></sup><sub>R</sub> behaves similarly to err<sup>L<sup>2</sup></sup><sub>R</sub> and is omitted to save space.
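A linear dependence on 1/ω<sup>2</sup> appears as a straight line of slope one in a log-log plot. The sketch below shows one way such a check could be automated with a least-squares fit of the slope. The function name `loglog_slope` and the prefactor `c` are hypothetical, and the error values are synthetic (generated from the assumed model err = c/ω<sup>2</sup>), not data from Figure 4; only the three ω<sup>2</sup> values are taken from the text.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Synthetic illustration only: errors follow the assumed model err = c / omega^2.
c = 3.0e-9                                  # hypothetical prefactor
omega_sq = [2.50e-03, 2.50e-04, 2.50e-05]   # values above the insulator's critical value
inv_omega_sq = [1.0 / w for w in omega_sq]
synthetic_err = [c * x for x in inv_omega_sq]
slope = loglog_slope(inv_omega_sq, synthetic_err)  # close to 1 for a linear law
```

With measured errors in place of `synthetic_err`, a fitted slope near one over the range of small 1/ω<sup>2</sup> would be consistent with Equation (50).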

The last example illustrates the possible unstable behavior of TRBOMD when the stability condition λmin(K) > 0 in Equation (48) is violated. Here, we take the insulator as an example and set the time step Δt = 250, the final time to 2.50E+05 and the artificial frequency ω = 1/Δt = 4.00E-03. The simple mixing with α = 0.3 is now applied in the SCF iterations. Under these settings, we have λmin(K) < 0, e.g., λmin(K) = −2.42E+03 for TRBOMD(3). Figure 5a plots the energy drift for TRBOMD(n) with n = 3, 5, 7, 45. TRBOMD is clearly unstable even with 45 SCF iterations per time step (recall that BOMD(c) in the first run needs about 45 SCF iterations per time step on average). Figure 5b plots the corresponding trajectory of the left-most atom and shows that the atom is driven wildly by the non-convergent SCF iteration.
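The role of the sign of λmin(K) can be seen in a scalar caricature. With the nuclei frozen, a single electronic mode δ obeys δ̈ = −ω<sup>2</sup>κδ, where κ stands in for an eigenvalue of K. The sketch below integrates this equation with the Verlet recursion; the function name, the values of κ and the initial perturbation are hypothetical choices for illustration and are not taken from the simulations above.

```python
def max_abs_deviation(kappa, omega=4.0e-3, dt=250.0, n_steps=200, delta0=1.0e-6):
    """Verlet integration of delta'' = -omega^2 * kappa * delta.

    kappa is a scalar stand-in for an eigenvalue of K (hypothetical value).
    Returns the largest |delta| seen along the trajectory.
    """
    prev = cur = delta0              # start with (discrete) zero velocity
    coupling = (omega * dt) ** 2 * kappa
    worst = abs(cur)
    for _ in range(n_steps):
        nxt = 2.0 * cur - prev - coupling * cur
        prev, cur = cur, nxt
        worst = max(worst, abs(cur))
    return worst

stable = max_abs_deviation(kappa=0.5)     # kappa > 0: bounded oscillation
unstable = max_abs_deviation(kappa=-0.5)  # kappa < 0: exponential growth
```

For κ > 0 the perturbation stays of the order of its initial size, whereas for κ < 0 it is amplified at every step, which mirrors the qualitative behavior reported in Figure 5.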

Figure 5. The unstable behavior of TRBOMD with the simple mixing for the insulator. The time step is Δt = 250. The final time is 2.50E+05 and ω = 1/Δt = 4.00E-03. (a) The energy drift; (b) the trajectory of the left-most atom.

We now present some numerical examples for CPMD illustrating the difference between CPMD and TRBOMD. As we have discussed, TRBOMD is applicable to both metallic and insulating systems, while CPMD becomes inaccurate when the gap vanishes. To make this statement more concrete, we apply CPMD to the same atom chain system. We implement CPMD using a standard velocity Verlet scheme combined with RATTLE for the orthonormality constraints [37–39].

We present in Figure 6 the error of the CPMD simulation for different choices of the fictitious electron mass μ. We study the relative error of the phonon frequency, err<sup>Hooke</sup><sub>Ω</sub>, and the relative error of the position of the left-most atom measured in the L<sup>2</sup> norm, *i.e.*, err<sup>L<sup>2</sup></sup><sub>R</sub>. We observe in Figure 6a linear convergence of CPMD to the BOMD result as the parameter μ decreases. This is consistent with our analysis. Recall that in CPMD, μ plays a role similar to that of ω<sup>−2</sup> in TRBOMD. For the metallic example, the behavior is quite different: Figure 6b shows a systematic error as μ decreases. For a metallic system, as the spectral gap vanishes, the adiabatic separation between ionic and electronic degrees of freedom cannot be achieved no matter how small μ is. The adiabatic separation for TRBOMD, on the other hand, relies on the choice of an effective ρSCF, and hence, TRBOMD also works for a metallic system, as Figure 4 indicates.

Figure 6. The absolute value of the error for Car-Parrinello molecular dynamics (CPMD) as a function of μ in logarithmic scales. The time step is Δt = 20, and the final time is 6.00E+05. (a) Insulator; (b) metal.

Figure 7. The trajectory of the position of the left-most atom. The dashed line is the result from BOMD with converged SCF iteration. Colored solid lines are the results from CPMD with fictitious electron mass μ = 2,500, 5,000, 10,000 and 20,000. The time step is Δt = 20; the trajectory plotted is within the time interval [2.00E+05, 4.00E+05]. (a) Insulator; (b) metal.

The different behavior of CPMD for insulating and metallic systems is further illustrated by Figure 7, which shows the trajectory of the position of the left-most atom during the simulation. The phase error is apparent from the two subfigures. While the phase error decreases so that the trajectory approaches that of BOMD for the insulator in Figure 7a, the result in Figure 7b shows a systematic error for a metallic system.

#### 6. Beyond the Linear Response Regime: Non-Equilibrium Dynamics

The discussion so far has been limited to the linear response regime, so that we can make linear approximations for the degrees of freedom of both nuclei and electrons. In this case, as the system becomes linear, an explicit error analysis has been given. For practical applications, we are also interested in non-equilibrium nuclear dynamics, in which the deviation of the atom positions is no longer small. In this section, we investigate the non-equilibrium case using the averaging principle (see, e.g., [40,41] for a general introduction to the averaging principle).

Figure 8. Comparison of the trajectories of the first three atoms from the left for a non-equilibrium system. Different atoms are distinguished by color (blue for the initially left-most atom; green for the initially second left-most atom; red for the initially third left-most atom). Solid lines are the results from BOMD(c); circled lines are the results from TRBOMD(7); dashed lines are the results from BOMD(7). It is evident that while the results from BOMD with a non-convergent SCF iteration have a huge deviation, the results from TRBOMD are hardly distinguishable from the "true" results from BOMD.

Let us first show numerically a non-equilibrium situation for the atom chain example discussed before. Initially, the 32 atoms stay at their equilibrium positions. We set the initial velocities so that the left-most atom has a large velocity towards the right and the other atoms have equal velocities towards the left. The mean velocity is equal to zero, so the center of mass does not move. Figure 8 shows the trajectory of the positions of the first three atoms from the left. We observe that the results from TRBOMD agree very well with the BOMD results with convergent SCF iterations. Let us note that in the simulation, the left-most atom crosses over the second left-most atom. This happens since, in our model, we have taken a 1D analog of the Coulomb interaction, the nuclei background charges are smeared out and, hence, the interaction is "soft" without hard-core repulsion. In Figure 9, we plot the difference between ρSCF and the converged electron density of the SCF iteration (denoted by ρKS) along the TRBOMD simulation. We see that the electron density used in TRBOMD stays close to the ground state electron density corresponding to the atom configuration.

To understand the performance of TRBOMD, recall that the equations of motion are given by:

$$\begin{aligned} m\ddot{R}\_I(t) &= -\int \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) \frac{\partial V\_{\text{ion}}(x; \mathbf{R}(t))}{\partial R\_I} \, \text{d}x \\ \ddot{\rho}(x, t) &= \omega^2(\rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t)) - \rho(x, t)) \end{aligned}$$

To satisfy the adiabatic condition (49) from the linear analysis, ω here is a large parameter. As a result, the time scales of the motions of the nuclei and of the electrons are quite different: The electronic degrees of freedom move much faster than the nuclear degrees of freedom.
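To make the two coupled equations concrete, the following toy sketch integrates a scalar caricature of them and compares the result with the corresponding BOMD trajectory. Everything in it is an assumption made for illustration: a single nuclear coordinate R with m = 1, a scalar "density" with ground state ρ∗(R) = kR, a force equal to −ρSCF (that is, ∂V<sub>ion</sub>/∂R set to one), a linear SCF map that contracts toward ρ∗(R) by a factor σ, and velocity Verlet as the time-reversible integrator. It is not the one-dimensional DFT model used in the numerical experiments.

```python
import math

def rho_star(R, k):
    """Toy 'ground state density': a scalar that depends linearly on R."""
    return k * R

def rho_scf(R, rho, k, sigma):
    """Toy SCF map: contracts rho toward rho_star(R) by a factor sigma."""
    target = rho_star(R, k)
    return target + sigma * (rho - target)

def accelerations(R, rho, k, sigma, omega):
    """Right-hand sides of the toy nuclear and electronic equations (m = 1)."""
    scf = rho_scf(R, rho, k, sigma)
    return -scf, omega ** 2 * (scf - rho)

def max_deviation_from_bomd(omega, k=1.0, sigma=0.5, dt=0.01, n_steps=2000):
    """Largest |R_TRBOMD(t) - R_BOMD(t)| along a velocity Verlet trajectory.

    The toy BOMD reference solves R'' = -k R with R(0) = 1 and R'(0) = 0,
    whose exact solution is cos(sqrt(k) t).
    """
    R, v_R = 1.0, 0.0
    rho, v_rho = rho_star(R, k), 0.0
    a_R, a_rho = accelerations(R, rho, k, sigma, omega)
    worst = 0.0
    for step in range(1, n_steps + 1):
        v_R += 0.5 * dt * a_R
        v_rho += 0.5 * dt * a_rho
        R += dt * v_R
        rho += dt * v_rho
        a_R, a_rho = accelerations(R, rho, k, sigma, omega)
        v_R += 0.5 * dt * a_R
        v_rho += 0.5 * dt * a_rho
        exact_bomd = math.cos(math.sqrt(k) * step * dt)
        worst = max(worst, abs(R - exact_bomd))
    return worst
```

In this toy setting, calling `max_deviation_from_bomd(100.0)` (that is, ω = 1/Δt) should give a much smaller deviation than `max_deviation_from_bomd(5.0)`, in line with the statement that a large ω separates the electronic and nuclear time scales.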

Let us consider the limit, ω → ∞. In this case, we may freeze the **R** degree of freedom in the equation of motion for ρ, as ρ changes on a much faster time scale. To capture the two time scale behavior, we introduce a heuristic two-scale asymptotic expansion with faster time variable given by τ = ωt (with some abuse of notation):

$$R(t) = R(t) \quad \text{and} \quad \rho(x, t) = \rho(x, t, \tau) \tag{70}$$

and hence:

$$\ddot{\rho}(x,t) = \omega^2 \partial\_\tau^2 \rho(x,t,\tau) + 2\omega \partial\_\tau \partial\_t \rho(x,t,\tau) + \partial\_t^2 \rho(x,t,\tau) \tag{71}$$

Therefore, to the leading order, after neglecting the terms of O(ω<sup>−1</sup>), we obtain:

$$m\ddot{R}\_I(t) = -\int \rho\_{\rm SCF}(x; \mathbf{R}(t), \rho(t, \tau)) \frac{\partial V\_{\rm ion}(x; \mathbf{R}(t))}{\partial R\_I} \, \mathrm{d}x \tag{72}$$

$$
\partial\_\tau^2 \rho(x, t, \tau) = \rho\_{\rm SCF}(x; \mathbf{R}(t), \rho(t, \tau)) - \rho(x, t, \tau) \tag{73}
$$
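As a worked intermediate step (a sketch in the notation of Equations (71)–(73), with the arguments of ρ and ρSCF suppressed), substituting the expansion (71) into the electronic equation of motion and dividing by ω<sup>2</sup> makes the neglected terms explicit:

```latex
\begin{aligned}
\omega^2 \partial_\tau^2 \rho + 2\omega\, \partial_\tau \partial_t \rho + \partial_t^2 \rho
  &= \omega^2 \bigl( \rho_{\mathrm{SCF}} - \rho \bigr) \\
\Longrightarrow \quad
\partial_\tau^2 \rho
  &= \rho_{\mathrm{SCF}} - \rho
     - \frac{2}{\omega}\, \partial_\tau \partial_t \rho
     - \frac{1}{\omega^2}\, \partial_t^2 \rho
   = \rho_{\mathrm{SCF}} - \rho + \mathcal{O}(\omega^{-1})
\end{aligned}
```

Dropping the last two terms on the right-hand side gives Equation (73).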

For the equation of motion for ρ, note that as **R** only depends on t, the nuclear positions are fixed parameters in Equation (73).

To proceed, we consider the scenario in which ρ(t, τ) is close to the ground state electron density corresponding to the current atom configuration, ρ∗(**R**(t)). We have seen from numerical examples (Figure 9) that this is indeed the case for a good choice of SCF iteration, although we do not have a proof of this in the general case. Hence, we linearize the map ρSCF:

$$\rho\_{\rm SCF}(x;\mathbf{R},\rho) = \rho^\*(x;\mathbf{R}) + \int \frac{\delta\rho\_{\rm SCF}}{\delta\rho}(x, y; \mathbf{R}, \rho^\*(\mathbf{R})) (\rho(y) - \rho^\*(y; \mathbf{R})) \, dy \tag{74}$$

and Equation (73) becomes:

$$
\partial^2\_\tau \rho(x, t, \tau) = -\mathcal{K}(\mathbf{R}) (\rho(x, t, \tau) - \rho^\*(x; \mathbf{R}(t)))\tag{75}
$$

where K(**R**) is the same as in Equation (39), except it is now defined for each atom configuration, **R**. Let us emphasize that here we have only taken the linear approximation for the electronic degrees of freedom, while keeping the possibly nonlinear dynamics of **R**. This is different from the linear response regime considered before, where the nuclei motion is also linearized.

Figure 9. The difference of ρSCF with the converged electron density of SCF iteration (denoted by ρKS) measured in L<sup>1</sup> norm along the TRBOMD simulation for a non-equilibrium system.

Under the stability condition (48), it is easy to see that for ρ(t, τ ) satisfying Equation (75), the limit of the time average:

$$\begin{split} \overline{\rho}(x; \mathbf{R}(t)) &= \lim\_{T \to \infty} \frac{1}{T} \int\_{0}^{T} \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t, \tau)) \, \mathrm{d}\tau \\ &\approx \rho^\*(x; \mathbf{R}(t)) + \int \frac{\delta \rho\_{\text{SCF}}}{\delta \rho}(x, y; \mathbf{R}, \rho^\*(\mathbf{R})) \left( \lim\_{T \to \infty} \frac{1}{T} \int\_{0}^{T} \rho(y; t, \tau) - \rho^\*(y; \mathbf{R}(t)) \, \mathrm{d}\tau \right) \mathrm{d}y \\ &= \rho^\*(x; \mathbf{R}(t)) \end{split} \tag{76}$$
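For intuition, consider a scalar analog of Equation (75) with a constant κ > 0 in place of K and with **R** frozen (an illustrative simplification, not the general operator case). The deviation δ(τ) = ρ − ρ∗ then oscillates, and its running average decays like 1/T:

```latex
\begin{aligned}
\partial_\tau^2 \delta &= -\kappa\, \delta, \qquad
\delta(\tau) = \delta_0 \cos(\sqrt{\kappa}\,\tau)
             + \frac{\dot{\delta}_0}{\sqrt{\kappa}} \sin(\sqrt{\kappa}\,\tau), \\
\frac{1}{T} \int_0^T \delta(\tau)\, \mathrm{d}\tau
  &= \frac{\delta_0 \sin(\sqrt{\kappa}\,T)
          + \dfrac{\dot{\delta}_0}{\sqrt{\kappa}} \bigl( 1 - \cos(\sqrt{\kappa}\,T) \bigr)}
          {\sqrt{\kappa}\, T}
  \;\xrightarrow[T \to \infty]{}\; 0 .
\end{aligned}
```

For κ < 0 the deviation grows exponentially and no such limit exists, which is where the stability condition (48) enters.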

Taking the average of Equation (72) in τ, we have:

$$m\ddot{R}\_I(t) = -\int \overline{\rho}(x; \mathbf{R}(t)) \frac{\partial V\_{\text{ion}}(x; \mathbf{R}(t))}{\partial R\_I} \, \mathrm{d}x \tag{77}$$

Because of Equation (76), the above dynamics is given by:

$$m\ddot{R}\_I(t) = -\int \rho^\*(x; \mathbf{R}(t)) \frac{\partial V\_{\rm ion}(x; \mathbf{R}(t))}{\partial R\_I} \, \mathrm{d}x \tag{78}$$

which agrees with the equation of motion of the atoms in BOMD. As we have neglected O(ω<sup>−1</sup>) terms in the averaging, the difference between the trajectories of BOMD and TRBOMD is on the order of O(ω<sup>−1</sup>) for finite ω.

In general, however, it is not clear whether the limit:

$$\overline{\rho}(x; \mathbf{R}(t)) = \lim\_{T \to \infty} \frac{1}{T} \int\_0^T \rho\_{\text{SCF}}(x; \mathbf{R}(t), \rho(t, \tau)) \,\mathrm{d}\tau \tag{79}$$

exists or how close the limit is to ρ∗(x; **R**(t)) in a fully nonlinear regime. One particular difficulty lies in the fact that unlike BOMD or CPMD, we do not have a conserved Lagrangian for the TRBOMD. Actually, it is easy to construct a much simplified analog of Equation (73), the average of which is different from ρ∗. For example, if we consider the following analog, which only has one degree of freedom, ξ:

$$
\ddot{\xi} = \left(\xi/2 + a\xi^2\right) - \xi \tag{80}
$$

where (ξ/2 + aξ<sup>2</sup>) is the analog of ρSCF and a > 0 is a small parameter that characterizes the nonlinearity of the map. Note that:

$$\ddot{\xi} = -\xi/2 + a\xi^2 = -\partial\_{\xi}(\xi^2/4 - a\xi^3/3) \tag{81}$$

The motion of ξ is equivalent to the motion of a particle in an anharmonic potential. It is clear that if, initially, ξ(0) ≠ 0, the long time average of ξ will not be zero. Furthermore, if, initially, ξ(0) is too large, the orbit is not closed (ξ escapes the well around ξ = 0). If phenomena similar to this occur for a general ρSCF, then even in the limit ω → ∞, there will be a systematic uncontrolled bias between BOMD and TRBOMD. This is in contrast with Car-Parrinello molecular dynamics, which agrees with BOMD in the limit of the fictitious mass going to zero (μ → 0) if the adiabatic condition holds.
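The behavior of this one-degree-of-freedom analog can be checked numerically. The sketch below integrates Equation (81) with velocity Verlet, starting from rest, and returns the time average of ξ, or `None` if the orbit leaves the well. The value a = 0.1, the step size, the run length and the escape threshold are hypothetical choices made for illustration.

```python
def force(xi, a):
    """Right-hand side of Equation (81): -xi/2 + a*xi^2."""
    return -0.5 * xi + a * xi * xi

def time_average(xi0, a=0.1, dt=0.01, n_steps=200_000, escape=1.0e3):
    """Time average of xi along a velocity Verlet trajectory started at rest.

    Returns None if |xi| exceeds the escape threshold, i.e. the orbit is
    not closed. All numerical parameters are illustrative assumptions.
    """
    xi, v = xi0, 0.0
    f = force(xi, a)
    total = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * f
        xi += dt * v
        if abs(xi) > escape:
            return None
        f = force(xi, a)
        v += 0.5 * dt * f
        total += xi
    return total / n_steps

bounded_mean = time_average(1.0)   # closed orbit: the mean is shifted away from zero
at_rest_mean = time_average(0.0)   # the fixed point xi = 0 stays put
escaped = time_average(6.0)        # beyond the saddle at 1/(2a) = 5: escapes
```

For a closed orbit, averaging Equation (81) over one period gives ⟨ξ⟩ = 2a⟨ξ<sup>2</sup>⟩, so the mean is strictly positive for any non-trivial bounded orbit, which is the bias discussed above.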

As a result of this discussion, in practice, when we apply TRBOMD to a particular system, we need to be cautious whether the electronic degree of freedom remains around the converged Kohn-Sham electron density, which is not necessarily guaranteed (in contrast to CPMD for systems with gaps).

#### 7. Conclusions

The recently developed time reversible Born-Oppenheimer molecular dynamics (TRBOMD) scheme provides a promising way of reducing the number of self-consistent field (SCF) iterations in molecular dynamics simulation. By introducing auxiliary dynamics for the initial guess of the SCF iteration, TRBOMD preserves the time-reversibility of the NVE dynamics, both at the continuous and at the discrete level, and exhibits improved long time stability over Born-Oppenheimer molecular dynamics with the same accuracy. In this paper we analyze, for the first time, the accuracy and the stability of the TRBOMD scheme, and our analysis is verified through numerical experiments using a one-dimensional density functional theory (DFT) model without exchange correlation potential. The validity of the stability condition in TRBOMD is directly associated with the quality of the SCF iteration procedure. In particular, we demonstrate that when the SCF iteration procedure is not very accurate, the stability condition can be violated, and TRBOMD becomes unstable. We also compare TRBOMD with the Car-Parrinello molecular dynamics (CPMD) scheme. CPMD relies on the adiabatic evolution of the occupied electron states, and therefore, CPMD works better for insulators than for metals. TRBOMD, however, may be effective for both insulating and metallic systems. The present study is restricted to the NVE system and to simplified DFT models. Moreover, the analysis in the present work is mainly focused on the accuracy of trajectories and harmonic frequencies in the perturbation regime. In practice, however, the more important question is how the introduced artificial dynamics influence static properties, like distribution functions, and the most critical capability is to reproduce the correct distribution functions. The performance of TRBOMD for the NVT system and for realistic DFT systems, with emphasis on the accuracy of static properties, will be our future work.

#### Acknowledgments

This work was partially supported by the Laboratory Directed Research and Development Program of Lawrence Berkeley National Laboratory under the US Department of Energy contract number DE-AC02-05CH11231 and the Scientific Discovery through Advanced Computing (SciDAC) program funded by the US Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences (L.L.), the Alfred P. Sloan Foundation and the National Science Foundation (J.L.), the National Natural Science Foundation of China under the Grant Nos. 11101011 and 91330110 and the Specialized Research Fund for the Doctoral Program of Higher Education under the Grant No. 20110001120112 (S.S.). The authors would also like to thank the referees for many useful suggestions.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### Appendix

Here, we derive the perturbation analysis result in Equation (50). In the derivation below, we use linear algebra notation and do not distinguish matrices from operators: we replace all integrals by matrix-vector multiplication and drop all dependencies on the electronic degrees of freedom, x and y. For instance, Kρ should be understood as ∫ K(x, y)ρ(y) dy. We also denote ∂ρ∗/∂**R** (x; **R**∗) simply by ∂ρ∗/∂**R**; then, Equation (42) can be rewritten as:

$$
\begin{pmatrix}
\ddot{\tilde{\mathbf{R}}} \\
\ddot{\tilde{\rho}}
\end{pmatrix} = A \begin{pmatrix}
\tilde{\mathbf{R}} \\
\tilde{\rho}
\end{pmatrix} = \left(A\_0 + \frac{1}{\epsilon} A\_1\right) \begin{pmatrix}
\tilde{\mathbf{R}} \\
\tilde{\rho}
\end{pmatrix} \tag{82}
$$

Here:

$$A\_1 = \begin{pmatrix} 0 & 0 \\ 0 & -\mathcal{K} \end{pmatrix} \tag{83}$$

is a block diagonal matrix, and:

$$A\_0 = \begin{pmatrix} -\mathcal{D} & \mathcal{L} \\ \left(\frac{\partial \rho^\*}{\partial \mathbf{R}}\right)^T \mathcal{D} & -\left(\frac{\partial \rho^\*}{\partial \mathbf{R}}\right)^T \mathcal{L} \end{pmatrix} = \begin{pmatrix} I \\ -\left(\frac{\partial \rho^\*}{\partial \mathbf{R}}\right)^T \end{pmatrix} \begin{pmatrix} -\mathcal{D} & \mathcal{L} \end{pmatrix} \tag{84}$$

is a rank-M matrix, where I is the M × M identity matrix. Now, assume the eigenvalues and eigenvectors of A follow the expansion:

$$
\lambda = \lambda\_0 + \epsilon \lambda\_1 + \dotsb, \quad v = v\_0 + \epsilon v\_1 + \dotsb \tag{85}
$$

Matching the equation order by order up to O(ε), we obtain:

$$A\_1 v\_0 = 0\tag{86a}$$

$$A\_0 v\_0 + A\_1 v\_1 = \lambda\_0 v\_0 \tag{86b}$$

$$A\_0 v\_1 + A\_1 v\_2 = \lambda\_0 v\_1 + \lambda\_1 v\_0 \tag{86c}$$

Equation (86a) implies that v<sub>0</sub> ∈ Ker A<sub>1</sub>. Applying the projection operator P<sub>Ker A<sub>1</sub></sub> to both sides of Equation (86b) and using v<sub>0</sub> = P<sub>Ker A<sub>1</sub></sub> v<sub>0</sub>, we have:

$$P\_{\text{Ker}A\_1} A\_0 P\_{\text{Ker}A\_1} v\_0 = \lambda\_0 P\_{\text{Ker}A\_1} v\_0 \tag{87}$$

or:

$$
\begin{pmatrix} -\mathcal{D} & 0 \\ 0 & 0 \end{pmatrix} v\_0 = \lambda\_0 v\_0 \tag{88}
$$

From the eigen-decomposition of D in Equation (46), we have λ<sub>0</sub> = −Ω<sub>l</sub><sup>2</sup> for some l = 1, ..., M. For a fixed l, the corresponding eigenvector to the 0-th order is:

$$v\_0 = \begin{pmatrix} \mathbf{v}\_l, \mathbf{0} \end{pmatrix}^T \tag{89}$$

From Equation (86b), we also have:

$$A\_1 v\_1 = \lambda\_0 v\_0 - A\_0 v\_0 = \begin{pmatrix} \mathbf{0} \\ -\Omega\_l^2 \left(\frac{\partial \rho^\*}{\partial \mathbf{R}}\right)^T \mathbf{v}\_l \end{pmatrix} \tag{90}$$

and therefore:

$$v\_1 = \Omega\_l^2 \left( \mathbf{0}, \mathcal{K}^{-1} \left[ \left( \frac{\partial \rho^\*}{\partial \mathbf{R}} \right)^T \mathbf{v}\_l \right] \right)^T \tag{91}$$

Finally, taking the inner product of both sides of Equation (86c) with v<sub>0</sub>, we have:

$$\lambda\_1 = (v\_0, A\_0 v\_1) - (v\_0, \lambda\_0 v\_1) = \Omega\_l^2 \mathbf{v}\_l^T \mathcal{L} \left[ \mathcal{K}^{-1} \left[ \left( \frac{\partial \rho^\*}{\partial \mathbf{R}} \right)^T \mathbf{v}\_l \right] \right] \tag{92}$$

Therefore:

$$\lambda = -\Omega\_l^2 + \epsilon \Omega\_l^2 \mathbf{v}\_l^T \mathcal{L} \left[ \mathcal{K}^{-1} \left[ \left( \frac{\partial \rho^\*}{\partial \mathbf{R}} \right)^T \mathbf{v}\_l \right] \right] + \mathcal{O}(\epsilon^2) \tag{93}$$

In other words, the phonon frequency, Ω̃<sub>l</sub> = √(−λ), up to the leading order is:

$$\widetilde{\Omega}\_l = \Omega\_l \left( 1 - \frac{1}{2\omega^2} \mathbf{v}\_l^T \mathcal{L} \left[ \mathcal{K}^{-1} \left[ \left( \frac{\partial \rho^\*}{\partial \mathbf{R}} \right)^T \mathbf{v}\_l \right] \right] \right) + \mathcal{O}(1/\omega^4) \tag{94}$$

which is Equation (50).
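The expansion can be sanity-checked in the simplest possible setting, where every block of A is a scalar, so that A is a 2 × 2 matrix whose eigenvalues are available in closed form. The sketch below compares the exact slow eigenvalue with the first-order result of Equation (93). The scalar stand-ins `D`, `L`, `X` (for ∂ρ∗/∂**R**) and `K`, as well as their numerical values, are hypothetical and chosen only for illustration.

```python
import math

def slow_eigenvalue(D, L, X, K, eps):
    """Exact slow eigenvalue of A0 + A1/eps for scalar blocks.

    A = [[-D, L], [X*D, -X*L - K/eps]], so trace = -D - X*L - K/eps and
    det = D*K/eps. The slow root is obtained as det / (fast root) to
    avoid cancellation in the quadratic formula.
    """
    trace = -D - X * L - K / eps
    det = D * K / eps
    fast = 0.5 * (trace - math.sqrt(trace * trace - 4.0 * det))
    return det / fast

def first_order_eigenvalue(D, L, X, K, eps):
    """Scalar version of Equation (93): lambda = -D + eps * D * L * X / K."""
    return -D + eps * D * L * X / K

# Hypothetical scalar values, with D playing the role of Omega_l^2.
D, L, X, K = 2.0, 0.7, 0.3, 1.5
eps = 1.0e-3
exact = slow_eigenvalue(D, L, X, K, eps)
approx = first_order_eigenvalue(D, L, X, K, eps)
residual = abs(exact - approx)   # expected to scale like eps^2
```

In this scalar case the residual should shrink by roughly a factor of 100 when ε is reduced by a factor of 10, consistent with the O(ε<sup>2</sup>) remainder in Equation (93).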

#### References


Reprinted from *Entropy*. Cite as: Morales, M.A.; Clay, R.; Pierleoni, C.; Ceperley, D.M. First Principles Methods: A Perspective from Quantum Monte Carlo. *Entropy* 2014, *16*, 287–321.

## *Article*

## First Principles Methods: A Perspective from Quantum Monte Carlo

Miguel A. Morales <sup>1</sup>, Raymond Clay <sup>2</sup>, Carlo Pierleoni <sup>3,4,</sup>\* and David M. Ceperley <sup>2</sup>


*Received: 22 September 2013; in revised form: 27 November 2013 / Accepted: 28 November 2013 / Published: 30 December 2013*

Abstract: Quantum Monte Carlo methods are among the most accurate algorithms for predicting properties of general quantum systems. We briefly introduce ground state, path integral at finite temperature and coupled electron-ion Monte Carlo methods, their merits and limitations. We then discuss recent calculations using these methods for dense liquid hydrogen as it undergoes a molecular/atomic (metal/insulator) transition. We then discuss a procedure that can be used to assess electronic density functionals, which in turn can be used on a larger scale for first principles calculations and apply this technique to dense hydrogen and liquid water.

Keywords: quantum Monte Carlo; first-principles simulations; hydrogen; Coupled Electron-Ion Monte Carlo; high pressure

#### 1. Introduction

With the increasing computational power and the greater access to large clusters seen during the last decade, simulation methods have become an increasingly useful tool for many fields of science, including chemistry, materials science, condensed matter physics, and biophysics. In this article we explore some of the future impact of Quantum Monte Carlo in the field of first principles simulation (FPS). By this we mean reliable simulation methods that can be performed on condensed matter systems in the absence of detailed experimental information on those systems. Starting with the general Hamiltonian in Equation (1), and taking as input only the chemical compositions, masses, density, temperature, *etc.*, there is currently a hierarchy of methods that are used to perform such a simulation. In this introduction we focus on three classes of methods: the use of semi-empirical interatomic potentials together with Monte Carlo (MC) or molecular dynamics (MD) simulations, Density Functional Theory-based simulation methods, and Quantum Monte Carlo simulations.

The first member of the hierarchy uses semi-empirical interatomic potentials among effective atoms considered as point particles, the best known of which is the Lennard-Jones potential. Such potentials are routinely used in the vast majority of simulations (soft condensed matter, biophysics, materials science) and are reviewed in a different contribution to this issue [1]. The first question is how to construct such a potential. The typical approach is to use available experimental data. However, it is well known that those potentials are not very accurate in the vast majority of systems, even if they match experimental data. Hence, though they can be used to say something about generic properties of systems, quantitative predictions for defect energies, energy barriers and melting temperatures cannot be trusted. (If the potential has been adjusted to reproduce experimental measurements, then the method is no longer first principles, and the question becomes whether the potential is transferable, *i.e.*, reliable for properties that were not fitted.) Another fundamental limitation of this approach is that it becomes difficult to construct reliable interatomic potentials for complex systems containing several types of atoms, for example a solvent with various solutes, or systems under extreme conditions, since it becomes difficult to get enough reliable experimental data to constrain all of the parameters. For these reasons, it is highly desirable to have methods that can provide reliable predictions without input from experimental measurements.

Density Functional Theory (DFT) in the Kohn-Sham formulation maps the problem of many interacting electrons in the external field of the nuclei onto a system of non-interacting electrons in an external field, a one-body problem, and adds electronic correlation through an exchange-correlation functional. A breakthrough in the usefulness and popularity of simulations occurred with the development of the first-principles molecular dynamics (FPMD) approach by Car and Parrinello [2], who combined molecular dynamics and DFT to perform simulations of complex chemical systems. Due to its favorable ratio between accuracy and computational cost, DFT has become the workhorse electronic solver in the field of first-principles simulations. In fact, the recent explosion in the popularity of first-principles methods is, to a large part, due to the success of DFT in providing a fairly accurate description of the electronic structure of materials at a reasonable computational cost. DFT also gives access to a large range of observables. While DFT has been very successful in the description of many types of materials, e.g., metals and weakly correlated systems, many of the currently available exchange-correlation functionals in DFT possess well-known limitations [3], including the failure to properly describe strongly correlated materials, self-interaction errors, *etc*. It is recognized that even for such a fundamental system as water, the FPMD procedure is not accurate enough, giving large errors in many basic properties, including the melting temperature, the diffusion constant and the compressibility, among others [4].

In the past decade there has been an explosion of new DFT exchange-correlation functionals with various characteristics. The reason is the difficulty of making systematic improvements to the functional or judging the accuracy of a functional. If the DFT functional is considered as "variable", then how does the user, in the absence of experimental data, decide on the functional? In the case of finite molecular systems, the availability of high-level quantum chemistry methods, like Coupled-Cluster theory, offers a possible path towards the improvement of approximate functionals in DFT, for example by minimizing errors in a training set between DFT and Coupled-Cluster theory results at various levels of accuracy (with Single, Double or Triple excitations). In fact, many exchange-correlation functionals contain optimizable parameters that are obtained from calculations on finite molecular systems (exceptions to this include LDA and PBE, among others), where results of quantum chemistry methods are routinely used as references. In solids, accurate calculations using many-body methods are computationally expensive, which has limited their use in the development of density functionals. While there have also been considerable developments in other correlated approaches for bulk systems, such as the many-body Green's function methods (GW approximation and Bethe-Salpeter equation) and Dynamical Mean Field Theory (DMFT), they are more expensive and still leave questions of accuracy. For reasons of space, we do not discuss these approaches further.

The third approach in our hierarchy is the use of Quantum Monte Carlo (QMC) methods, which are generalizations of the classical Monte Carlo techniques to quantum statistical physics and are fundamentally based on imaginary-time path integrals. For a class of systems (bosons and systems in one dimension) such techniques provide an exact computational method. For general problems, though not exact, they are highly accurate *and* systematically improvable. Although there are a variety of QMC methods (ground state, variational, path integral, auxiliary field, ...), they are fundamentally closely related. QMC methods are the most accurate general methods, but they are less developed and require far more computational resources than DFT methods (although the scaling of computer time versus system size is similar), which limits the systems on which such simulations can be and have been performed. The largest impact to date of QMC has been in the development and improvement of DFT methods; specifically we mention the correlation energy of the electron gas [5], a fundamental component in almost all exchange-correlation functionals used in DFT. Recent calculations [6] give the corresponding correlation energies at finite electronic temperature.

Later in this paper we give an example of work in progress in this direction where QMC is used to directly rank various DFT functionals. We suggest that this benchmark quality data could be used to improve directly the best functionals. One can then envision using the highest ranked functional to develop intermolecular potentials that would then be of higher quality. Ercolessi *et al*. [7] have developed the force-matching procedure to find the optimal effective potential reproducing the forces appearing in an FPMD simulation. Such an approach is now feasible using QMC calculated forces and energies.

First principles simulation methods entirely based on QMC have also been developed in the last decade. These are the Coupled Electron-Ion Monte Carlo method [8] and QMC-Molecular Dynamics [9], which have recently been reviewed in [10]. However, their application to condensed phases has so far been limited to high pressure hydrogen and hydrogen-helium mixtures because of their considerable computational cost. In this paper we will illustrate their use to investigate the dissociation of liquid molecular hydrogen under pressure, a problem which is still unsolved by DFT methods.

The article is organized as follows. We first describe in Section 2 the various QMC methods. Section 3 is devoted to a few applications of QMC. In Section 3.1 we present a QMC study of high pressure phases of hydrogen. This is followed in Section 3.2 by a description of the use of these methods to provide quantitative information on the accuracy of various DFT functionals. Finally, we close with a discussion in Section 4.

#### 2. Computational Methods

In this section, we review some of the Quantum Monte Carlo methods used in the first principles modeling of condensed matter systems. Under normal conditions of temperature and pressure, such systems are described to a high degree of accuracy by the non-relativistic Hamiltonian for a collection of electrons and ions. We will use atomic units throughout the paper, where ħ = m<sub>e</sub> = k<sub>B</sub> = e = 4πε<sub>0</sub> = 1, with ħ being the reduced Planck constant and k<sub>B</sub> being Boltzmann's constant, and the energy is measured in Hartrees, E<sub>h</sub> = 315,775 K = 27.2114 eV. Note that, in these units, the ground state energy of a hydrogen atom is −0.5E<sub>h</sub>, the binding energy of a hydrogen molecule is 0.17E<sub>h</sub>, the unit of length is the Bohr radius a<sub>0</sub> = 0.0529 nm, and the molecular equilibrium bond length is 1.4a<sub>0</sub>. The Hamiltonian of the system reads

$$
\hat{H}\_{\text{tot}} = \hat{T}\_n + \hat{H}\_{el} = \hat{T}\_n + \hat{T}\_e + \hat{V} \tag{1}
$$

$$\hat{T}\_n = -\sum\_{I=1}^{N\_n} \lambda\_I \nabla\_I^2, \quad \hat{T}\_e = -\lambda\_e \sum\_{i=1}^{N\_e} \nabla\_i^2,\tag{2}$$

$$\hat{V} = \sum\_{I < J} \frac{z\_I z\_J}{|R\_I - R\_J|} + \sum\_{i < j} \frac{1}{|r\_i - r\_j|} - \sum\_{i,I} \frac{z\_I}{|r\_i - R\_I|} \tag{3}$$

where N<sub>n</sub> and N<sub>e</sub> are the number of ions and electrons, respectively; in atomic units λ<sub>e</sub> = 1/2 and λ<sub>I</sub> = 1/(2M<sub>I</sub>), and M<sub>I</sub> and z<sub>I</sub> are the mass and charge (in units of the electron mass m<sub>e</sub> and charge e) of the nucleus I. The system occupies a volume Ω. Note that r with lower case indices (i, j, ...) denotes the positions of electrons and R with upper case indices (I, J, ...) those of the nuclei. When no indices are used, r and R represent the full 3N<sub>e</sub>- and 3N<sub>n</sub>-dimensional vectors, respectively. The electronic Hamiltonian Ĥ<sub>el</sub> corresponds to the problem in the clamped-nuclei approximation, where the ions produce a fixed external potential for the electrons. Another quantity of interest is the electron number density ρ = N<sub>e</sub>/Ω, parameterized by r<sub>s</sub> = a/a<sub>0</sub>, where 4πa<sup>3</sup>/3 = ρ<sup>−1</sup>. Given Equation (1), we only need to add the temperature, particle statistics and boundary conditions to completely specify the physical and numerical problem to be solved.
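As a quick numerical illustration of the density parameter r<sub>s</sub>, the following minimal Python sketch converts between the electron density and r<sub>s</sub>. The function names are illustrative choices, not part of any QMC package, and the volume is assumed to be given in units of a<sub>0</sub><sup>3</sup>.

```python
import math


def rs_from_density(n_electrons, volume):
    """Return r_s = a / a_0, with 4*pi*a^3/3 = 1/rho and rho = N_e / Omega.

    Assumption: the volume is given in units of a_0^3, so the sphere
    radius a comes out directly in units of a_0.
    """
    rho = n_electrons / volume
    return (3.0 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)


def density_from_rs(rs):
    """Inverse relation: rho = 3 / (4*pi*r_s^3), in electrons per a_0^3."""
    return 3.0 / (4.0 * math.pi * rs ** 3)
```

For example, `rs_from_density(54, 54 / density_from_rs(1.4))` recovers r<sub>s</sub> = 1.4 for a hypothetical 54-electron cell.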

Finding the eigenvalues and eigenfunctions of the Hamiltonian in Equation (1) is a formidable task, impossible to do analytically except for a few simple systems such as the single hydrogen atom. In practice, numerical or approximate theoretical methods must be used. Two of the most widely applicable methods are based either on imaginary-time path integrals or density functional theory (DFT), as discussed in the following subsections.

#### *2.1. Ground State Methods*

The following ground state methods seek to evaluate expectation values of physical observables taken over the ground state wavefunction φ<sub>0</sub>(R):

$$
\langle \hat{\mathcal{O}} \rangle = \frac{\int dR \, \phi\_0^\*(R) \hat{\mathcal{O}} \phi\_0(R)}{\int dR \, |\phi\_0(R)|^2} \tag{4}
$$

Two problems are evident from this formula. The first is that we almost never know φ<sub>0</sub>(R) exactly. The second is that, even if we did, Equation (4) is a high dimensional integral. The following methods address both of these problems. For notational simplicity, throughout Sections 2.1–2.3 we will indicate by R the set of all coordinates of the quantum degrees of freedom, without distinction between electrons and nuclei.

#### 2.1.1. Variational Monte Carlo

Variational Monte Carlo (VMC) is conceptually the simplest of the ground-state QMC methods. It works by approximating the true ground-state wavefunction φ<sub>0</sub>(R) with some trial wavefunction Ψ<sub>T</sub>(R). Integrals like Equation (4) are then performed using Metropolis Monte Carlo sampling, with Ψ<sub>T</sub>(R) in place of φ<sub>0</sub>(R) [11]. The accuracy of this method depends strongly on how closely Ψ<sub>T</sub>(R) approximates φ<sub>0</sub>(R). Fortunately, the variational principle of quantum mechanics gives us a metric by which to improve the quality of trial wavefunctions. Consider the expectation value of the Hamiltonian and its variance:

$$E[\Psi\_T] \quad = \frac{\int dR \, \Psi\_T^\*(R)\hat{H}\Psi\_T(R)}{\int |\Psi\_T(R)|^2 dR} = \frac{\int dR \, |\Psi\_T(R)|^2 E\_L(R)}{\int dR \, |\Psi\_T(R)|^2} \tag{5}$$

$$
\sigma\_E^2[\Psi\_T] = \frac{\int dR \, \Psi\_T^\*(R)(\hat{H} - E[\Psi\_T])^2 \Psi\_T(R)}{\int dR \, |\Psi\_T(R)|^2} \tag{6}
$$

$$= \frac{\int dR \, |\Psi\_T(R)|^2 \left(E\_L(R) - E[\Psi\_T]\right)^2}{\int dR \, |\Psi\_T(R)|^2} \tag{7}$$

where E<sub>L</sub>(R) = [ĤΨ<sub>T</sub>(R)]/Ψ<sub>T</sub>(R) in Equation (5) is called the *local energy*. The variational theorem states that:

$$E[\Psi\_T] \quad \ge \quad E[\phi\_0] \tag{8}$$

$$
\sigma\_E^2[\Psi\_T] \quad \ge \quad \sigma\_E^2[\phi\_0] = 0 \tag{9}
$$

Based on this, improvements to the wavefunction can be quickly gauged by whether they lower the energy and variance.

A popular approach for fermionic problems is to assume a Slater-Jastrow wavefunction. This type of wavefunction possesses the correct fermionic antisymmetry, and symbolically is given by Ψ<sub>T</sub>(R) = det(M(R)) e<sup>J(R)</sup>. Here, M(R)<sub>ij</sub> = φ<sub>j</sub>(r<sub>i</sub>) is the Slater matrix of single-particle orbitals, whose determinant is the Slater determinant. The single-particle orbitals φ<sub>j</sub>(r) are typically taken from other quantum-chemistry methods (Hartree-Fock, DFT, *etc*.). J(R) is called a "Jastrow" factor, and is constructed to be symmetric under particle exchange [12,13]. The Jastrow factor is typically chosen to be a sum of species-dependent one-body, two-body, and sometimes three-body functions, which are designed to capture bosonic correlations. The form of these functions can vary from analytically derived forms with few to no free parameters, like the RPA Jastrow [14,15], to functions with a large number of variational parameters, like B-splines. The interested reader is encouraged to look at the references for more information on Slater-Jastrow wavefunctions [12,16]. One can also go beyond the Slater-Jastrow form; other possible choices include multi-Slater determinant expansions [17], geminals [18], *etc*.

VMC can be improved if we consider classes of trial wavefunctions Ψ<sub>T</sub>(R, α) parameterized by m free parameters α = (α<sub>1</sub>, ..., α<sub>m</sub>). We then minimize the energy and/or variance with respect to these parameters. Recent improvements to optimization algorithms allow the optimization of thousands of variational parameters [19,20]. Traditionally, only the Jastrow functions have been parameterized, although work has been done using parameterized single-particle orbitals and multi-Slater determinantal expansions.

VMC has some advantages that keep it in use. First, it is usually computationally cheaper than more accurate QMC methods (to be discussed later). VMC can also include several different types of electron correlations (various forms of electronic wave functions). Lastly, it doesn't suffer from a sign problem. However, it is at heart an approximate method, and does depend on the choice of trial wavefunction.
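The VMC procedure can be illustrated with a minimal sketch for the hydrogen atom in atomic units. The trial function Ψ<sub>T</sub> = exp(−αr), the step size, the number of steps and the function names are all illustrative assumptions rather than choices taken from the text. For this trial function the local energy is E<sub>L</sub> = −α<sup>2</sup>/2 + (α − 1)/r, so that α = 1 reproduces the exact ground state with zero variance, in line with Equations (8) and (9).

```python
import math
import random


def local_energy(r, alpha):
    """E_L = (H Psi_T)/Psi_T for Psi_T = exp(-alpha*r), H = -1/2 nabla^2 - 1/r."""
    return -0.5 * alpha ** 2 + (alpha - 1.0) / r


def vmc_hydrogen(alpha, n_steps=40000, step=2.0, seed=1):
    """Metropolis sampling of |Psi_T|^2; returns (mean, variance) of E_L."""
    rng = random.Random(seed)
    pos = [0.5, 0.5, 0.5]
    r_old = math.sqrt(sum(x * x for x in pos))
    n_equil = n_steps // 10          # discard the first 10% as equilibration
    samples = []
    for i in range(n_steps):
        # Uniform trial displacement in a cube of side `step`.
        trial = [x + step * (rng.random() - 0.5) for x in pos]
        r_new = math.sqrt(sum(x * x for x in trial))
        # Ratio |Psi_T(new)|^2 / |Psi_T(old)|^2.
        ratio = math.exp(-2.0 * alpha * (r_new - r_old))
        if rng.random() < ratio:
            pos, r_old = trial, r_new
        if i >= n_equil:
            samples.append(local_energy(r_old, alpha))
    mean = sum(samples) / len(samples)
    var = sum((e - mean) ** 2 for e in samples) / len(samples)
    return mean, var
```

Calling `vmc_hydrogen(0.8)` should give an energy above the exact value −0.5 E<sub>h</sub> with a finite variance, while `vmc_hydrogen(1.0)` returns −0.5 with vanishing variance, which is the behaviour one exploits when optimizing α.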

#### 2.1.2. Projector Methods

#### 2.1.2.1. Formalism

Projector methods attempt to stochastically project out the exact many-body ground state, allowing us to sample this distribution for Monte Carlo integration. The "projector", or imaginary-time Green's function G(R′, R, β′ − β), is the operator solution to the imaginary-time Schrödinger equation:

$$\frac{\partial \Psi}{\partial \beta} = -\hat{H}\Psi(R,\beta) \tag{10}$$

subject to the boundary condition that lim<sub>β′→β</sub> G(R′, R, β′ − β) = δ(R′ − R). One can verify that the formal solution is Ĝ = exp(−βĤ). Now consider an arbitrary wavefunction Ψ(R, β = 0) that is not orthogonal to the ground state φ<sub>0</sub>(R) (in general this is an optimized trial function Ψ<sub>T</sub>). Expanding this function in terms of the eigenfunctions of the Hamiltonian, and applying the projector to it, we find:

$$\begin{aligned} \Psi(R,\beta) &= \sum\_{i} a\_i \phi\_i(R) e^{-\beta \epsilon\_i} \\ &\propto a\_0 \phi\_0(R) + \sum\_{i>0} a\_i \phi\_i(R) e^{-\beta(\epsilon\_i - \epsilon\_0)} \end{aligned} \tag{11}$$

This implies that as β → ∞, we are left with just the ground state wavefunction.
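The projection in Equation (11) can be checked on a toy problem. The sketch below is an illustration under stated assumptions, not a QMC algorithm: it applies the first-order short-time operator (1 − τĤ) repeatedly to a starting vector for a hypothetical two-level Hamiltonian `H_MODEL`, renormalizing at each step, which plays a role similar to the energy shift used in practice.

```python
import math


def matvec(h, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(h[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]


def project_ground_state(h, psi, tau=0.1, n_steps=400):
    """Apply (1 - tau*H), a first-order approximation of exp(-tau*H), n_steps times."""
    for _ in range(n_steps):
        h_psi = matvec(h, psi)
        psi = [p - tau * hp for p, hp in zip(psi, h_psi)]
        norm = math.sqrt(sum(p * p for p in psi))
        psi = [p / norm for p in psi]  # keep the vector normalized
    energy = sum(p * hp for p, hp in zip(psi, matvec(h, psi)))
    return energy, psi


# Hypothetical two-level Hamiltonian; its exact ground-state energy is (1 - sqrt(5))/2.
H_MODEL = [[0.0, -1.0], [-1.0, 1.0]]
E0, PSI0 = project_ground_state(H_MODEL, [1.0, 0.0])
```

The starting vector only needs a non-zero overlap with the ground state; the excited-state component decays geometrically with the number of projection steps.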

For efficiency reasons, it is better to use the "importance-sampled" Schrödinger equation [12,21,22]. We obtain this by writing the original equation in terms of f(R, β) = Ψ<sub>T</sub>(R)Ψ(R, β). After some algebra [12], we find that

$$\begin{split} \frac{\partial f(R,\beta)}{\partial \beta} &= \hat{L}f(R,\beta) \\ &= \left[\lambda \nabla \cdot [\nabla - F(R)]\right] f(R,\beta) + \left[E\_T - E\_L(R)\right] f(R,\beta) \end{split} \tag{12}$$

F(R) is the quantum force defined by F(R) = ∇ ln |Ψ<sub>T</sub>(R)|<sup>2</sup> and E<sub>L</sub>(R) is the local energy defined above. E<sub>T</sub>, the trial energy, is an arbitrary energy shift, inessential for the physics but important for the numerical algorithm. If f(R, β) ≥ 0 everywhere, then we can interpret f as a probability distribution. This amounts to demanding a bosonic many-body ground state (fermions will be covered in a later section). Equation (12) can then be interpreted as a generalized Smoluchowski equation for a drift-diffusion process with sources and sinks. The first term represents a drift-diffusion process, whereas the second term represents an exponential growth/decay process. When we simulate this equation, we will use the mapping between a Smoluchowski equation, governing probability distributions, and Langevin-like equations, governing the diffusion and growth of *particles*.

The solution of Equation (12) satisfies the following integral equation

$$f(R,\beta) = \int dR'\tilde{G}(R,R',\beta)f(R',0)\tag{13}$$

where the Green's function for this equation is formally G̃(R′, R, β) = ⟨R′| exp(βL̂) |R⟩, and it is easy to show that this is related to the original projector by the transformation G̃(R′, R, β) = Ψ<sub>T</sub>(R′) G(R′, R, β) Ψ<sub>T</sub>(R)<sup>−1</sup>. In the short-time approximation (τλ ≪ 1), we can decouple the drift-diffusion and growth operators by the Trotter formula. The result (for the symmetric decomposition) is:

$$
\tilde{G}(R',R,\tau) \simeq G\_{DD}(R',R,\tau)G\_B(R',R,\tau) \tag{14}
$$

$$G\_{DD}(R',R,\tau) = (4\pi\lambda\tau)^{-3N/2} \exp\left(-\frac{(R'-R-2\lambda\tau F(R))^2}{4\lambda\tau}\right) \tag{15}$$

$$G\_B(R', R, \tau) \quad = \exp(-\frac{\tau}{2} [E\_L(R') + E\_L(R) - 2E\_T])\tag{16}$$

where λ indicates either λ<sub>e</sub> or λ<sub>I</sub> as defined after Equation (1). The short-time approximation allows us to deal with the full propagator as a product of short-time propagators, Ĝ(β) = [Ĝ(τ = β/M)]<sup>M</sup>. The cost is that we now incur a time-step error that we must take into account.

#### 2.1.2.2. Diffusion Monte Carlo

In diffusion Monte Carlo (DMC) [22–24], we represent the distribution function f(R, β) as an ensemble of 3N-dimensional samples {R<sub>1</sub>, ..., R<sub>M</sub>}, which are known as "walkers". The average density of walkers at position R in configuration space is proportional to the distribution function f(R).

As in classical diffusion, we then simulate Equations (13) and (14) by a Langevin-like process acting on the walkers. Assuming that the time step τ = β/M is sufficiently small, we advance from f(R, β) → f(R, β + τ) by first proposing to move each walker R<sub>i</sub> to R′<sub>i</sub> by a drift-diffusion step, prescribed by Equation (15). Then we accumulate a weight associated with walker i, given by w<sub>i</sub>(β + τ) = w<sub>i</sub>(β) G<sub>B</sub>(R′<sub>i</sub>, R<sub>i</sub>, τ). To calculate the expectation value of an operator Ô over f(R, β) = Ψ<sub>T</sub>(R)Ψ(R, β), we average over the ensemble of walkers, including the appropriate weights:

$$
\langle \hat{\mathcal{O}} \rangle = \frac{\sum\_{i=1}^{M} w\_i(\boldsymbol{\beta}) \mathcal{O}(R\_i)}{\sum\_{i=1}^{M} w\_i(\boldsymbol{\beta})} \tag{17}
$$

If we stopped here, this would be the basis of pure-diffusion Monte Carlo [25]. Because these weights are exponential factors, the variance associated with Equation (17) will increase exponentially as the simulation progresses: the weights of a few walkers will exponentially grow, whereas the rest will exponentially tend to zero.

Branching diffusion Monte Carlo [23], by far the most used form of DMC, fixes this problem by using the weights to either replicate or kill off walkers. After each drift-diffusion step, the number of copies of the single walker R<sub>i</sub> that advance to the next time step, M<sup>i</sup><sub>next</sub>, is chosen to be M<sup>i</sup><sub>next</sub> = INT(w<sub>i</sub>(β + τ) + ξ), where ξ is a uniform random number in [0, 1]. The weights of the replicated walkers are all adjusted to conserve the total weight of walker i as much as possible. Modern methods are typically hybrids, where the weights of walkers are carried until they exceed certain established bounds, at which point they are branched [26].

The simulation is run by initializing the starting ensemble according to f(R, 0) = |Ψ<sub>T</sub>(R)|<sup>2</sup>. Assuming β is the projection time required to reach the ground state, the simulation is advanced M = β/τ steps, at which point our ensemble is distributed according to f<sub>0</sub>(R) = Ψ<sub>T</sub>(R)φ<sub>0</sub>(R). Samples can then be accumulated, and the simulation is run for a long enough time to achieve the desired statistical error bars.
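The branching algorithm just described can be sketched for a one-dimensional harmonic oscillator, Ĥ = −(1/2)d<sup>2</sup>/dx<sup>2</sup> + x<sup>2</sup>/2, whose exact ground-state energy is 0.5. Everything specific in this example is an assumption made for illustration: the Gaussian trial function exp(−ax<sup>2</sup>/2) with a = 0.8, the time step, the population size and the simple logarithmic population control. The drift displacement is written as λτF with F = ∇ ln |Ψ<sub>T</sub>|<sup>2</sup>, the convention consistent with Equation (12); prefactor conventions for F differ between references. No accept/reject step is applied to the drift-diffusion move.

```python
import math
import random

LAMBDA = 0.5  # hbar^2 / (2m) in atomic units


def local_energy(x, a):
    """E_L for Psi_T = exp(-a*x^2/2) and H = -1/2 d^2/dx^2 + x^2/2."""
    return 0.5 * a + 0.5 * (1.0 - a * a) * x * x


def quantum_force(x, a):
    """F = d/dx ln|Psi_T|^2 = -2*a*x."""
    return -2.0 * a * x


def dmc_harmonic(a=0.8, tau=0.01, n_target=300, n_equil=500, n_prod=1000, seed=7):
    """Branching DMC; returns the mixed-estimator energy averaged over n_prod steps."""
    rng = random.Random(seed)
    # Initial ensemble drawn from |Psi_T|^2, a Gaussian of variance 1/(2a).
    walkers = [rng.gauss(0.0, math.sqrt(0.5 / a)) for _ in range(n_target)]
    e_trial = sum(local_energy(x, a) for x in walkers) / len(walkers)
    sigma = math.sqrt(2.0 * LAMBDA * tau)
    e_sum = 0.0
    for step in range(n_equil + n_prod):
        new_walkers = []
        for x in walkers:
            # Drift-diffusion move.
            x_new = x + LAMBDA * tau * quantum_force(x, a) + rng.gauss(0.0, sigma)
            # Branching weight G_B of Equation (16).
            weight = math.exp(-0.5 * tau * (local_energy(x_new, a)
                                            + local_energy(x, a) - 2.0 * e_trial))
            # Number of copies: INT(w + xi) with xi uniform.
            new_walkers.extend([x_new] * int(weight + rng.random()))
        walkers = new_walkers
        e_mixed = sum(local_energy(x, a) for x in walkers) / len(walkers)
        # Assumed population control: shift E_T to steer toward n_target walkers.
        e_trial = e_mixed - math.log(len(walkers) / n_target)
        if step >= n_equil:
            e_sum += e_mixed
    return e_sum / n_prod
```

With these settings the variational energy of the trial function is a/4 + 1/(4a) ≈ 0.5125, while the value returned by `dmc_harmonic()` should lie close to the exact 0.5 within statistical and time-step errors.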

It is important to note that since we are sampling f<sub>0</sub>(R), this corresponds to the following type of expectation value, known as a "mixed estimate":

$$
\langle \hat{\mathcal{O}} \rangle\_{DMC} = \frac{\langle \Psi\_T | \hat{\mathcal{O}} | \phi\_0 \rangle}{\langle \Psi\_T | \phi\_0 \rangle} \tag{18}
$$

For observables that commute with the Hamiltonian, this gives us exact, unbiased estimates over the true many-body ground state wavefunction. For those that don't, the estimators will be biased by the quality of the trial wavefunction. This bias is less than that encountered by VMC, but still present. This can be alleviated somewhat by the use of "extrapolated estimators", and by the "forward-walking" method [27].
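For reference, the extrapolated estimator mentioned above is commonly written as the following combination of the mixed (DMC) and variational (VMC) estimates. This standard form is not spelled out in the text; it is quoted here under the assumption that Ψ<sub>T</sub> is close to φ<sub>0</sub>, so that the residual error is second order in their difference:

$$\langle \hat{\mathcal{O}} \rangle\_{ext} \simeq 2 \langle \hat{\mathcal{O}} \rangle\_{DMC} - \langle \hat{\mathcal{O}} \rangle\_{VMC}$$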

#### 2.1.2.3. Reptation Monte Carlo

Reptation Monte Carlo (RMC) is based on the path-integral representation of the projector. Assuming that β is large enough to guarantee sufficient convergence to the ground state, we begin by partitioning the full projector into M segments of time-interval τ = β/M, called "time slices". Inserting a resolution of the identity between each short-time projector, we find the following path-integral expression for the mixed distribution ⟨Ψ<sub>T</sub>|φ<sub>0</sub>⟩:

$$\langle \Psi\_T | \phi\_0 \rangle = \int dR\_0 \dots dR\_M \Psi\_T(R\_0) G(R\_0, R\_1, \tau) \dots G(R\_{M-1}, R\_M, \tau) \Psi\_T(R\_M) \tag{19}$$

Using the short-time approximate Green's function at the beginning of this section, we can recast this expectation value in a more traditional path-integral form:

$$
\langle \Psi\_T | \phi\_0 \rangle \quad = \ \mathcal{Z} = \int \mathcal{D}X e^{S[X]} \tag{20}
$$

$$S[X] \ = \ \ln \Psi\_T(R\_0) + \ln \Psi\_T(R\_M) - \sum\_{i=0}^{M-1} L\_s(R\_i, R\_{i+1}) \tag{21}$$

$$L\_s(R',R) \ = \ \frac{(R'-R)^2}{4\lambda\tau} + \frac{1}{2}(R'-R)\cdot(F'-F) \tag{22}$$

$$+\frac{\tau}{2}\left[E\_L(R') + E\_L(R) + \lambda(F^2(R') + F^2(R))\right] \tag{23}$$

Here, X is shorthand for the directed path X = (R<sub>0</sub>, ..., R<sub>M</sub>). Equation (20) plays the role of a partition function in statistical mechanics, where Π[X] = e<sup>S[X]</sup>/Z is the probability of a given path X and −S[X] is the path action, which includes the trial wavefunctions at the ends of the path as well as a sum over "link-actions" L<sub>s</sub>(R′, R) (see Equations (22) and (23)). The form we used for the link-action comes from imposing symmetry of the normal Green's function under the exchange of two end-points, and writing it in terms of the importance-sampled Green's functions [28].
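To make the structure of Equations (21)–(23) concrete, the sketch below evaluates the link action and the path weight exponent for a single particle in one dimension. The Gaussian trial function, its parameter `A` and the function names are hypothetical choices for illustration. Equations (22) and (23) are implemented literally, with F as defined in the text; since prefactor conventions for F vary between references, the numerical values should be treated as illustrative. The property of interest is the symmetry of the link action under exchange of its end-points.

```python
LAMBDA = 0.5   # hbar^2 / (2m) in atomic units
A = 0.8        # hypothetical trial-function parameter, Psi_T = exp(-A*x^2/2)


def ln_psi_t(x):
    return -0.5 * A * x * x


def quantum_force(x):
    """F = d/dx ln|Psi_T|^2, as defined in the text."""
    return -2.0 * A * x


def local_energy(x):
    """E_L for H = -1/2 d^2/dx^2 + x^2/2 and the Gaussian trial function."""
    return 0.5 * A + 0.5 * (1.0 - A * A) * x * x


def link_action(x_new, x_old, tau):
    """L_s(R', R) of Equations (22) and (23)."""
    f_new, f_old = quantum_force(x_new), quantum_force(x_old)
    dx = x_new - x_old
    kinetic = dx * dx / (4.0 * LAMBDA * tau)
    cross = 0.5 * dx * (f_new - f_old)
    local = 0.5 * tau * (local_energy(x_new) + local_energy(x_old)
                         + LAMBDA * (f_new ** 2 + f_old ** 2))
    return kinetic + cross + local


def path_log_weight(path, tau):
    """S[X] of Equation (21); the path probability is proportional to exp(S[X])."""
    links = sum(link_action(path[i + 1], path[i], tau) for i in range(len(path) - 1))
    return ln_psi_t(path[0]) + ln_psi_t(path[-1]) - links
```

Because each link is symmetric, a path and its reverse carry the same weight, which is what allows a reptile to grow in either direction.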

The versatility of reptation Monte Carlo comes from how Π[X] is sampled. In the original method [29], one takes a given path X and chooses a growth direction at random. One then proposes a new path X<sup>∗</sup> by adding δ time slices to the "head" and removing δ slices from the "tail". Acceptance or rejection of this move is based on the usual Metropolis acceptance step. This type of move is called "reptation", reminiscent of a "reptile", from which the method derives its name. The proposed head move is done by a sequence of drift-diffusion moves, as in DMC, and rigorously preserves detailed balance.

Most practical implementations use what's known as the "bounce algorithm" [28]. Rather than choosing the growth direction randomly, it is set at the beginning of the simulation and is changed only after a rejection step, hence the name "bounce". This method does not satisfy detailed balance, but does satisfy the more general stationarity condition required for Markov chain Monte Carlo. This dramatically decreases the autocorrelation time of the method, and also tames ergodicity problems that have been observed to crop up in the method.

RMC is appealing for two reasons. First, it gives the same level of accuracy for the energy as DMC, but correlated sampling between different configurations can be done without approximation. This is particularly useful in methods like the Coupled Electron-Ion Monte Carlo. Second, RMC gives us the ability to sample expectation values over the pure distribution, as seen below:

$$
\langle \hat{\mathcal{O}} \rangle\_{pure} = \frac{\langle \Psi\_T | e^{-\frac{\beta}{2}\hat{H}} \hat{\mathcal{O}} e^{-\frac{\beta}{2}\hat{H}} | \Psi\_T \rangle}{\langle \Psi\_T | e^{-\beta \hat{H}} | \Psi\_T \rangle} \tag{24}
$$

$$=\frac{1}{\mathcal{Z}}\int \mathcal{D}X e^{S[X]} \mathcal{O}(R\_{\beta/2}) \tag{25}$$

This shows that the center time slice of the reptile is distributed according to |φ<sub>0</sub>(R)|<sup>2</sup>, whereas the ends are distributed according to the mixed distribution f(R). This easy access to the pure distribution makes RMC ideal for calculations of unbiased observables and correlation functions, doing so in a more efficient manner than "forward-walking" in DMC. Estimation of observables over the pure distribution works whenever we can write a meaningful estimator in terms of position space coordinates. Diagonal position space observables, like the average potential energy and pair-correlation function, can be measured directly from the sampled pure distribution. Observables that aren't diagonal in position space, like off-diagonal density matrix elements and the momentum distribution, can be measured from the pure distribution with suitable additions to the basic algorithm. This procedure does not work for all estimators, however; one can show that evaluating the local kinetic energy over the pure distribution does not yield a correct estimate of the average ground-state kinetic energy.

#### 2.1.2.4. The Fixed-Node Approximation

The previous projector methods we mentioned are in principle *exact* for bosonic systems, since the mapping to a diffusion process is valid when φ<sub>0</sub>(R) ≥ 0 everywhere. However, since the wavefunction for a fermion system must be antisymmetric under exchange, the ground state wavefunction will have as many negative configurations as positive ones (in many cases the wavefunction can be made real). We can restore the probabilistic interpretation of the wavefunction Ψ(R, β) if we factor its sign into the weight of the walker, or into the observable itself. It turns out that in doing so, we will have large and almost equal contributions of opposite signs to the expectation value. This leads to an exponentially decaying signal-to-noise ratio, implying that the computational effort required to treat the fermion problem directly scales exponentially. This is the well known "fermion sign problem".

By far, the most common means of alleviating the sign problem in both DMC and RMC is applying the "fixed-node" approximation [23,24]. We assume that the nodes of φ<sub>0</sub>(R) are the same as the nodes of Ψ<sub>T</sub>(R). We then propagate our ensemble of walkers or our reptile strictly within the restricted region of configuration space where Ψ<sub>T</sub>(R) doesn't change sign. This can be implemented by rejecting moves that carry walkers across a node, or bouncing a reptile whenever a head move is proposed across a nodal surface. Though this is an uncontrolled approximation, it turns out to be an extremely good one in most cases. Fixed-node energies are proven to be upper bounds of the exact energy [16], which allows us to optimize the nodal surfaces and to compare fixed-node DMC and fixed-node RMC energies with other methods. It turns out that both of these methods are among the most accurate computational methods known for electronic systems.
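The rejection rule used to enforce the fixed-node constraint can be sketched in a few lines. The example assumes a hypothetical system of two same-spin fermions in one dimension, with a trial function given by the Slater determinant of the orbitals exp(−x<sup>2</sup>/2) and x exp(−x<sup>2</sup>/2); its node is the surface x<sub>1</sub> = x<sub>2</sub>. The function names are illustrative.

```python
import math


def psi_trial(x1, x2):
    """2x2 Slater determinant of exp(-x^2/2) and x*exp(-x^2/2).

    Expanding the determinant gives (x2 - x1) * exp(-(x1^2 + x2^2)/2).
    """
    gauss = math.exp(-0.5 * (x1 * x1 + x2 * x2))
    return (x2 - x1) * gauss


def move_allowed(old, new):
    """Fixed-node rule: reject any move that changes the sign of Psi_T."""
    return psi_trial(*old) * psi_trial(*new) > 0.0
```

In a DMC or RMC loop, such a check would be applied to every proposed move before the weight or the Metropolis acceptance is evaluated.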

#### *2.2. Scaling of QMC Methods*

Like DFT, fermionic QMC typically has scaling between O(N<sup>3</sup>) and O(N<sup>4</sup>), depending on the property computed and the trial function. Here N is the number of particles. In contrast, popular quantum chemistry methods like Møller-Plesset perturbation theory, coupled-cluster, or configuration interaction scale at least like O(N<sup>7</sup>). This makes QMC one of the few accurate many-body theories that is able to treat bulk systems.

Unlike DFT, whose scaling prefactor is governed by the solution of a generalized eigenvalue problem, Monte Carlo methods in general have statistical error bars that decrease as the inverse square root of the number of sampled configurations, as a consequence of the central limit theorem. This makes quantum Monte Carlo significantly more expensive than DFT to reach chemical accuracy, though it has a smaller uncontrolled bias. The necessity for a much smaller time step in projector Monte Carlo than in VMC can make projector Monte Carlo about an order of magnitude more expensive for the same statistical uncertainty.
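The inverse square root behaviour of the error bar translates directly into a cost estimate. The sketch below assumes statistically independent samples with standard deviation `sigma`; correlated samples would increase the required number by the autocorrelation time. The function names are illustrative.

```python
import math


def standard_error(sigma, n_samples):
    """Error bar of a mean over n_samples independent samples."""
    return sigma / math.sqrt(n_samples)


def samples_needed(sigma, target_error):
    """Number of independent samples needed to reach a target error bar."""
    return math.ceil((sigma / target_error) ** 2)
```

Halving the target error therefore quadruples the number of samples, which is why reducing the variance of the local energy through a better trial function pays off so strongly.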

The cost of a single N-particle Monte Carlo step in VMC and projector Monte Carlo methods is determined by the evaluation of the trial wavefunction. For bosonic trial wavefunctions with pair-wise correlations, these calculations scale like O(N<sup>2</sup>) per N-particle step. If these correlations are short-ranged, linear scaling can be achieved.

For fermionic trial wavefunctions, the computational cost is determined by the evaluation of single-particle orbitals and by the evaluation of a Slater determinant. The scaling of orbital evaluations depends on whether the electrons are localized, since evaluating localized orbitals can be done in constant time. For plane-wave basis sets, the cost scales like O(N). If we seek to include the effects of backflow, this can increase the computational cost by an additional factor of N. The remaining bottleneck is then the evaluation of the Slater determinant, which scales like O(N<sup>3</sup>) per N-particle step. In theory, the cost of the determinant evaluation could be brought down by almost a factor of N if the Slater determinant is sparse; however, the crossover point is prohibitive (greater than 3000 particles for a model system) [30]. This causes VMC and projector Monte Carlo to realistically scale like O(N<sup>3</sup>) to O(N<sup>4</sup>), depending on whether one uses backflow or not.

#### *2.3. Finite-Temperature Methods*

Next, we summarize path integral methods. These methods are similar to DMC but can treat systems at non-zero temperature: a many-body density matrix replaces the trial wave function. In first principles simulations, the path integral method can be used to simulate the properties of thermal electrons, the zero point effects of light nuclei, or both. For electronic simulations there are two major problems. First, the energy scale of electrons is 1 Hartree or above, so reaching ambient temperature requires very long paths. Second, since electrons are fermions, antisymmetrization, and hence the sign problem, is inevitable. For a more complete overview of the method and of its application to fermion systems, see [31] and [32], respectively.
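The statement about path length can be made quantitative with a small sketch. It uses the conversion E<sub>h</sub> = 315,775 K quoted in Section 2; the imaginary time step passed to `n_time_slices` is a free parameter, and any particular value used with it is a hypothetical choice rather than one taken from the text.

```python
import math

HARTREE_IN_KELVIN = 315775.0  # E_h / k_B, value quoted in Section 2


def inverse_temperature(t_kelvin):
    """beta = 1/(k_B T) in inverse Hartree."""
    return HARTREE_IN_KELVIN / t_kelvin


def n_time_slices(t_kelvin, tau):
    """Number of slices M such that M * tau covers beta."""
    return math.ceil(inverse_temperature(t_kelvin) / tau)
```

For instance, at 300 K one has β ≈ 1053 inverse Hartree, so an assumed time step of 0.01 inverse Hartree would require more than 10<sup>5</sup> time slices per electron.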

#### 2.3.1. Path Integrals

To begin, we define the many-particle density matrix for a system in equilibrium with an external reservoir at inverse temperature β = 1/(k<sub>B</sub>T) (canonical ensemble)

$$
\rho(R, R'; \beta) = \langle R \mid e^{-\beta \hat{H}} \mid R' \rangle \tag{26}
$$

where R ≡ (r(1), ..., r(N)), with r(i) specifying the spatial coordinates of the i-th of N particles. The partition function is defined as the trace of the density matrix,

$$Z(\beta) = Tr(\rho) = \int dR \langle R \mid e^{-\beta \hat{H}} \mid R \rangle = \int dR \rho(R, R; \beta) \tag{27}$$

The expectation value of any observable may be computed from this definition as

$$
\langle \hat{\mathcal{O}} \rangle = Tr(\hat{\mathcal{O}}\rho)/Z = Tr(\hat{\mathcal{O}}\rho)/Tr(\rho) \tag{28}
$$

Using the product property of the density matrix M times, such that β = Mτ, we write the partition function (or the diagonal density matrix) as an integral over a discrete path:

$$Z(\beta) = \int \left[ \prod\_{i=0}^{M-1} dR\_i \right] \rho(R\_0, R\_1; \tau) \rho(R\_1, R\_2; \tau) \dots \rho(R\_{M-1}, R\_0; \tau) \tag{29}$$

We have reduced the problem of sampling a low temperature density matrix to one of finding a high temperature density matrix and integrating over the path. The action, defined as

$$S(R\_i, R\_j; \tau) \equiv -\ln[\rho(R\_i, R\_j; \tau)]\tag{30}$$

can be broken into kinetic and potential parts, using Trotter's formula. The integration over all of the path variables is done using a specialized form of either Metropolis Monte Carlo or Molecular Dynamics, generating the Path Integral Monte Carlo (PIMC) or Path Integral Molecular Dynamics (PIMD) methods.
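One common realization of this kinetic/potential splitting is the so-called primitive approximation, sketched below for a single particle in one dimension. The text does not specify which high-temperature action is used, so the primitive form, the function names and the default λ = 1/2 are assumptions made for illustration.

```python
import math


def primitive_link_action(x, x_prime, tau, potential, lam=0.5):
    """S(x, x'; tau) = -ln rho(x, x'; tau) in the primitive approximation."""
    kinetic = (0.5 * math.log(4.0 * math.pi * lam * tau)
               + (x - x_prime) ** 2 / (4.0 * lam * tau))
    return kinetic + 0.5 * tau * (potential(x) + potential(x_prime))


def free_density_matrix(x, x_prime, tau, lam=0.5):
    """Exact free-particle density matrix in one dimension."""
    norm = math.sqrt(4.0 * math.pi * lam * tau)
    return math.exp(-(x - x_prime) ** 2 / (4.0 * lam * tau)) / norm


def closed_path_action(path, tau, potential, lam=0.5):
    """Total action of a closed path, as in the integrand of Equation (29)."""
    m = len(path)
    return sum(primitive_link_action(path[i], path[(i + 1) % m], tau, potential, lam)
               for i in range(m))
```

With a vanishing potential the exponential of minus the link action reduces to the free-particle density matrix, and the action of a closed path is invariant under a cyclic relabeling of the time slices.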

Finally, in order to account for the particle statistics of the simulated system, we must sum over permutations P, giving

$$Z(\beta) = \frac{1}{N!} \sum\_{\mathcal{P}} (\pm 1)^{\mathcal{P}} \int\_{R \to \mathcal{P}R} dR\_t e^{-S[R\_t]} \tag{31}$$

where R<sub>t</sub> represents the generic path starting at R and ending at PR while t varies from 0 to β.

#### 2.3.2. Restricted Paths

For fermions, negative terms enter in this sum, leading to a sign problem. As was done in the previous discussion of DMC, one way to circumvent this issue is to impose a nodal constraint [33]. We define the *nodal surface* Υ<sub>R⋆,β</sub> for a given point R<sub>⋆</sub> and inverse temperature β to be

$$\Upsilon\_{R\_\star,\beta} = \{ R \mid \rho(R, R\_{\star}; \beta) = 0 \} \tag{32}$$

which is a (dN − 1)-dimensional manifold in dN-dimensional configuration space (d is the space dimensionality). Here, R<sub>⋆</sub> is dubbed the *reference point*, as it is needed to define the nodal surfaces. Inside a nodal cell, by definition the sign of the density matrix is uniform. Using Dirichlet boundary conditions, we may solve the Bloch equation within each nodal cell. We define the *reach* Γ<sub>β</sub>(R<sub>⋆</sub>) as the set of all continuous paths R<sub>t</sub> for which ρ(R<sub>t</sub>, R<sub>⋆</sub>; β) ≠ 0 for all intermediate t (0 < t ≤ β), *i.e.*, node-avoiding paths

$$\Gamma\_{\beta}(R\_{\star}) = \{ \gamma : R\_{\star} \to R\_t \mid \rho(R\_{\star}, R\_t; \beta) \neq 0 \}\tag{33}$$

Since paths are continuous Brownian objects, all paths contributing to the Bloch equation solution must belong to this reach. For all diagonal contributions, odd permutations must cross a node an odd number of times and thus are not allowed by this constraint and are exactly cancelled by all paths of node-crossing even permutations. This leaves us with the following expression for the density matrix,

$$\rho(R, R; \beta) = \frac{1}{N!} \sum\_{\mathcal{P}, even} \int\_{\gamma: R \to \mathcal{P}R}^{\gamma \in \Gamma\_{\beta}(R)} \mathcal{D}R\_t e^{-S[R\_t]/\hbar} \tag{34}$$

We have thus turned the sign-full expression for the density matrix into one which includes only terms of a single sign, allowing efficient computation. However, because ρ appears on both sides of Equation (34) (on the right-hand side it enters through the definition of the reach), this requires a priori knowledge of the nodal structure of the density matrix, which is generally unknown. To escape this self-consistency issue, an ansatz density matrix that approximates the actual nodal structure is introduced. This will give an exact sampling of the Fermi density matrix if its nodes are correct. This method is called *restricted* PIMC (RPIMC). The density matrix for non-interacting fermions is a Slater determinant of single-particle distinguishable density matrices, ρ(R, R<sub>⋆</sub>; β) = (1/N!) det ρ<sub>ij⋆</sub>, where

$$\rho\_{ij\star} = \left(4\pi\lambda\beta\right)^{-d/2} \exp(-\frac{(r\_i - r\_{j\star})^2}{4\lambda\beta})\tag{35}$$

It is a good approximation to use the free particle density matrix at high temperatures (say for temperatures greater than the Fermi energy) and when correlation effects are weak. Furthermore, due to the constraint of translational invariance, free particle nodes are quite reasonable for homogeneous systems.
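The free-particle nodal ansatz can be illustrated by evaluating the determinant of Equation (35) explicitly as a signed sum over permutations, which also shows the role of the factor (±1)<sup>P</sup> in Equation (31). This brute-force expansion is only practical for a handful of particles and is meant as a sketch; the function names and the default λ = 1/2 are assumptions.

```python
import math
from itertools import permutations


def permutation_sign(perm):
    """+1 for even and -1 for odd permutations, by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def free_particle_rho(r_i, r_j, beta, lam=0.5):
    """Single-particle free density matrix of Equation (35) in d = len(r_i) dimensions."""
    d = len(r_i)
    dist2 = sum((a - b) ** 2 for a, b in zip(r_i, r_j))
    return (4.0 * math.pi * lam * beta) ** (-0.5 * d) * math.exp(-dist2 / (4.0 * lam * beta))


def fermion_density_matrix(R, R_star, beta, lam=0.5):
    """(1/N!) det[rho_ij*], expanded as a signed sum over permutations."""
    n = len(R)
    total = 0.0
    for perm in permutations(range(n)):
        term = float(permutation_sign(perm))
        for i in range(n):
            term *= free_particle_rho(R[i], R_star[perm[i]], beta, lam)
        total += term
    return total / math.factorial(n)
```

The sign of the returned value tells on which side of the approximate nodal surface a configuration R lies relative to the reference point, which is the test a restricted path must pass at every time slice.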

The nodal error arising from the use of an approximate restriction is problematic since it is uncontrolled. The finite temperature variational principle is expressed through the free energy, as opposed to the internal energy in the ground state. Thus one possible solution is to parameterize the nodal ansatz, and then minimize the free energy by varying the parameters. This will require a thermodynamic integration, in general. Systems analyzed to date suggest that the nodal error arising from the free-particle ansatz is small since the correlation from the interacting potential is fully taken into account.

#### 2.3.3. Path Integrals for Nuclei

Even when quantum particles can be considered distinguishable, as for instance light nuclei in condensed phases, there can be substantial physical effects arising from their quantum behavior, *i.e.*, resulting from the T̂<sub>n</sub> term in Equation (1). For example, in bulk hydrogen and in water, the zero point motion of the protons must be taken into account for an accurate description. Furthermore, in the crystalline phase the frequently used harmonic approximation is often inadequate since non-harmonic effects can be as significant as harmonic effects. In contrast to the situation with electrons, our ability to simulate the nuclei with current algorithms and hardware is well controlled; because the nuclei are thousands of times heavier, they are much closer to the classical limit, so that fewer path steps are needed. For hydrogen-containing compounds at room temperature, one can often get away with a few tens of imaginary time slices. A second consequence is that particle statistics (either Fermi or Bose) can typically be ignored; a notable exception is the difference between para- and ortho-hydrogen, important for modeling the low-temperature low-pressure crystals of molecular hydrogen and deuterium.

A frequent use of path integrals for nuclei occurs when DFT is used to integrate out the electronic degrees of freedom, and one wants to use the resulting DFT energy surface to compute the equilibrium properties of the quantum nuclei with the path integral method. To perform the path integration, it is advantageous to use molecular dynamics instead of Monte Carlo, since that allows the electronic wave functions to evolve smoothly in time and thus reduces the time to convergence in solving the DFT self-consistency conditions. M. Ceriotti *et al*. [34] have devised an ingenious noise filtering scheme to reduce the number of needed path integral steps. Assuming the density functional description of the electrons is accurate, thermodynamic (static) properties of the simulated system will be accurate. Conversely, the dynamical properties are not to be trusted. In general, a reliable method for quantum time correlation functions or, even worse, quantum dynamics is still missing.

#### *2.4. Coupled Electron-Ion Monte Carlo*

The QMC methods described so far, when applied to an ion-electron system, treat all particles on the same footing, either both in the ground state [35–37] or both at the same finite temperature [38–40]. However, the large nucleon-electron mass ratio implies a wide separation of time and energy scales, and it is common practice to adopt the adiabatic, or Born-Oppenheimer (BO), approximation. Not making this approximation in QMC causes difficulties. The imaginary time step of the path integral representation (both in DMC/RMC and PIMC) is imposed by the light electron mass. In DMC this means that nuclear "dynamics" (the speed of sampling configuration space) is much slower than electron "dynamics", requiring very long (and time consuming) trajectories. In PIMC the separation of time scales presents itself as a separation in the regions where thermal effects are relevant: in high pressure hydrogen, for instance, nuclear quantum effects become relevant below ∼2000 K, where electrons are, to a very good approximation, in their ground state. Performing PIMC in this region of temperatures requires very long electronic paths, which slows down the exploration of configuration space and effectively limits the ability of PIMC to perform accurate calculations at low temperatures.

The Coupled Electron-Ion Monte Carlo method (CEIMC) is a QMC method based on the BO approximation [8]. In CEIMC a Monte Carlo calculation for finite temperature nuclei (either classical or quantum represented by path integrals) is performed using the Metropolis method with the BO energy obtained by a separate QMC calculation for ground state electrons. CEIMC has been extensively reviewed in [8,10]. Here, we only briefly report the main features of the method.

#### 2.4.1. Penalty Method

In CEIMC the difference of BO energies of two nearby nuclear configurations in an attempted MC step, as obtained by an electronic QMC run, is affected by statistical noise which, if ignored, results in a biased nuclear sampling. To cope with this situation, either the statistical noise must be reduced to a negligible value by long electronic calculations (very inefficient), or the Metropolis acceptance/rejection scheme must be modified to handle noisy energy differences. The latter strategy is implemented in the Penalty Method [41], which enforces detailed balance on average over the noise distribution. The presence of statistical noise causes an extra rejection for a single nuclear move with respect to the noiseless situation: an extra "penalty", proportional to the variance of the energy difference divided by the square of the physical temperature, is added to the energy difference in the acceptance probability. Therefore, running at lower temperatures requires a reduced variance to keep an acceptable efficiency of the nuclear sampling. Small variances can be obtained if correlated sampling is used to compute the energy of the two competing nuclear configurations. In an attempted nuclear MC step, a single ground state electronic run is performed with a trial wave function which is a linear combination of the wave functions of the two nuclear configurations considered. The BO energies of the two nuclear configurations are obtained by a reweighting procedure, which provides energy differences with a much smaller variance than two independent electronic runs, provided the "distance" between the two nuclear configurations is limited (*i.e.*, the overlap between the trial wave functions of the two configurations is large) [42]. This strategy allows an efficient sampling of nuclear configuration space for high pressure hydrogen and helium down to temperatures as low as ∼200 K.
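
The modified acceptance step can be illustrated with a minimal sketch. It assumes Gaussian noise with a known variance, for which the standard form of the penalty is the variance divided by twice the squared temperature; the function names and the choice of energy units for `kT` are illustrative and are not taken from the CEIMC code.

```python
import math
import random


def penalty_acceptance(delta_e, variance, kT):
    """Acceptance probability for a noisy energy difference.

    delta_e  : noisy estimate of E(new) - E(old)
    variance : variance of that estimate (assumed known, Gaussian noise)
    kT       : physical temperature in the same energy units
    """
    delta = delta_e / kT                   # reduced energy difference
    penalty = variance / (2.0 * kT * kT)   # extra rejection caused by the noise
    exponent = -(delta + penalty)
    if exponent >= 0.0:                    # downhill enough: always accept
        return 1.0
    return math.exp(exponent)


def accept_move(delta_e, variance, kT, rng=random):
    """Metropolis-style decision using the penalised probability."""
    return rng.random() < penalty_acceptance(delta_e, variance, kT)
```

At fixed variance, halving the temperature multiplies the penalty by four, which is why lower temperatures call for smaller noise on the energy difference.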

#### 2.4.2. Nuclear PIMC

When nuclear quantum effects are included using a path integral representation (see §2.3), the relevant inverse temperature in the penalty method is the imaginary time discretization step τ, so that no loss of efficiency is experienced when lowering the temperature (*i.e.*, taking longer paths). For quantum protons in high pressure hydrogen, CEIMC can be used to efficiently study systems at temperatures as low as ∼200 K. In the present implementation of nuclear quantum effects in CEIMC, we introduce an effective pair potential between nuclei and use the pair density matrix corresponding to the effective potential to factorize the imaginary time propagator. The residual difference between the energy of the effective system and the BO energy of the original system is treated at the primitive approximation level of the Trotter break-up of the proton propagator [8]. In high pressure hydrogen (r<sub>s</sub> = 1.40) it is found that, with this strategy, an inverse time step τ<sup>−1</sup> of 4800 K is enough to reach convergence of the thermodynamic properties, which allows systems to be studied at low temperature with a limited number of time slices (≤50).
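
As a quick arithmetic check of the slice count, the sketch below uses the standard path integral relation that the number of slices times τ equals the inverse temperature (with temperatures in kelvin and k<sub>B</sub> = 1). The function name is illustrative.

```python
import math


def n_time_slices(temperature_K, inv_tau_K=4800.0):
    """Number of imaginary-time slices P satisfying P * tau = 1 / T."""
    return math.ceil(inv_tau_K / temperature_K)


for T in (1200.0, 600.0, 300.0, 200.0):
    print(f"T = {T:6.0f} K -> {n_time_slices(T)} slices")
```

With τ<sup>−1</sup> = 4800 K, a path at 200 K needs 24 slices, and the quoted limit of 50 slices corresponds to temperatures of about 96 K and above.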

In CEIMC many-body nuclear moves are preferred to single-body moves. The reason is that, even if only a few nuclei are moved, the entire electronic calculation, by far the most expensive part of the method, must be repeated. For this reason we sample nuclear configurations by a smart Monte Carlo method [43] in the normal mode space of the path [44], with forces from the effective two-body potential. This strategy allows us to simulate systems of ∼100 protons (for hydrogen) at temperatures as low as 200 K with an acceptable efficiency.
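
The idea of a force-biased (smart Monte Carlo) move can be sketched on a toy problem. The example below is an assumption-laden illustration: it samples a single classical coordinate in a harmonic well rather than path normal modes, and the function names and step size `h` are hypothetical. The proposal drifts along the force, and the acceptance test corrects for the asymmetry between forward and reverse proposals.

```python
import math
import random


def smart_mc(log_density, grad_log_density, x0, h, n_steps, seed=0):
    """Force-biased Monte Carlo sampling of a 1D unnormalised density.

    Proposals are x' = x + h * grad_log_density(x) + sqrt(2 h) * xi,
    with xi a standard normal deviate.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        drift = h * grad_log_density(x)
        x_new = x + drift + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        drift_new = h * grad_log_density(x_new)
        # Logs of forward and reverse proposal densities (constants cancel).
        log_q_fwd = -((x_new - x - drift) ** 2) / (4.0 * h)
        log_q_rev = -((x - x_new - drift_new) ** 2) / (4.0 * h)
        log_ratio = log_density(x_new) - log_density(x) + log_q_rev - log_q_fwd
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = x_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def log_boltzmann(x):
    """Toy target: harmonic well V = x^2 / 2 at unit temperature."""
    return -0.5 * x * x


def grad_log_boltzmann(x):
    """Force divided by temperature for the toy target."""
    return -x


samples, acc = smart_mc(log_boltzmann, grad_log_boltzmann, 0.0, 0.5, 20000)
mean_x2 = sum(s * s for s in samples) / len(samples)
print(f"acceptance = {acc:.2f}, <x^2> = {mean_x2:.3f}")
```

Because every coordinate is displaced in a single proposal guided by inexpensive forces, each costly energy evaluation is spent on a collective move, which is the motivation given above.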

#### 2.4.3. VMC *vs.* RMC

The main ingredient of CEIMC is the electronic QMC engine used to compute the BO energy. As mentioned, a very important aspect for the efficiency of CEIMC is the noise level, which is related to the variance of the local energy. In ground state QMC (see §2.1) the "zero variance principle" applies: if the trial wave function is an eigenfunction of the Hamiltonian, the local energy is no longer a function of the electronic coordinates and a single calculation provides the exact corresponding eigenvalue. Therefore, by improving the trial wave function and approaching the exact ground state, the variance of the local energy decreases to zero. In connection with CEIMC, this is important not only for the accuracy of the BO energy but also for the efficiency of the nuclear sampling, since the extra rejection due to the noise is reduced for a more accurate trial wave function.
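
The zero variance principle can be checked on a toy system. The sketch below is not the electronic problem treated here: it assumes a 1D harmonic oscillator, H = −½ d²/dx² + ½ x², with a Gaussian trial function ψ = exp(−α x²), for which the local energy is α + x²(½ − 2α²). All names are illustrative.

```python
import random
import statistics


def local_energy(x, alpha):
    """Local energy (H psi) / psi for psi = exp(-alpha x^2)."""
    return alpha + x * x * (0.5 - 2.0 * alpha * alpha)


def vmc_energy(alpha, n_samples=20000, seed=1):
    """Sample |psi|^2 directly (a Gaussian); return mean and variance of E_L."""
    rng = random.Random(seed)
    sigma = (4.0 * alpha) ** -0.5   # |psi|^2 = exp(-2 alpha x^2)
    e_loc = [local_energy(rng.gauss(0.0, sigma), alpha) for _ in range(n_samples)]
    return statistics.mean(e_loc), statistics.pvariance(e_loc)


for alpha in (0.4, 0.5):
    mean, var = vmc_energy(alpha)
    print(f"alpha = {alpha}: E = {mean:.4f}, variance = {var:.5f}")
```

At α = 0.5 the trial function is the exact ground state, so every sample returns the same local energy and the variance vanishes; at α = 0.4 the energy lies above 0.5 and the variance is finite.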

To go beyond VMC accuracy in CEIMC we have implemented the Reptation QMC method (RMC) [8,29]. RMC is superior to DMC in the CEIMC context since it uses an explicit representation of the statistical weight of each path, and therefore the reweighting procedure needed for estimating energy differences is easily applied. Going from VMC to RMC accuracy in CEIMC requires at least one order of magnitude more computer time. This is because it is in general more difficult to properly sample the configuration space of a 3N-dimensional path than that of a 3N-dimensional point. It is analogous to the difficulty of sampling the configuration space of long polymers compared with that of point particles. For any proposed nuclear move one has to relax the electronic path to the new equilibrium state and sample the electronic configuration space long enough to compute the energy difference with the required noise level.

In order to improve the efficiency of CEIMC while keeping the RMC accuracy, we have recently developed a method, based on a peculiar thermodynamic integration, to estimate the free energy of the system with RMC-based BO energy from the knowledge of the free energy of the system with VMC-based BO energy [45]. This allows extensive use of VMC rather than RMC, with RMC performed on selected thermodynamic states only.
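
For orientation, the generic coupling-constant form of thermodynamic integration between two energy surfaces is written below. This is the textbook identity, shown under the assumption of a linear interpolation in a parameter λ; it is not necessarily the specific scheme of [45], and the symbol E<sub>λ</sub> is introduced here for illustration only.

```latex
E_\lambda(R) = (1-\lambda)\,E_{\mathrm{VMC}}(R) + \lambda\,E_{\mathrm{RMC}}(R),
\qquad
F_{\mathrm{RMC}} - F_{\mathrm{VMC}}
  = \int_0^1 d\lambda\,
    \big\langle E_{\mathrm{RMC}}(R) - E_{\mathrm{VMC}}(R) \big\rangle_\lambda
```

Here the average at each λ is taken over nuclear configurations R sampled with the interpolated energy E<sub>λ</sub>(R).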

#### 2.4.4. Hydrogen Trial Wave Function

For high pressure hydrogen we have developed a quite accurate trial function of the Slater-Jastrow, single determinant, form. The Jastrow part has an electron-proton and electron-electron Random Phase Approximation (RPA) term plus two-body and three-body empirical terms depending on a few variational parameters. The Slater determinants (one for each spin state) are built with single electron orbitals obtained by a self-consistent DFT solution. We have recently integrated the PWSCF-DFT solver [46] into our CEIMC code to ensure a faster and uniform convergence of the single electron orbitals in different physical conditions. Further, the arguments of the orbitals are not the bare electron positions but rather the quasiparticle positions defined by the backflow transformation [47,48]. We combined both the RPA analytical form and the Gaussian-like empirical terms depending on variational parameters. Our trial wave function has a total of 13 variational parameters to be optimized [42,48].

Figure 1. Variational energy of four different crystalline molecular structures versus r<sub>s</sub>: C2/c (upper-left panel), Cmca-12 (upper-right panel), P63m (lower-left panel) and Pbcn (lower-right panel). Energies from wave functions with different orbitals are shown relative to the energy with LDA orbitals: PBE orbitals (red triangles), HSE orbitals (green closed circles) and vdW-DF2 orbitals (blue closed squares).

In view of the large variability of DFT results from different exchange-correlation approximations in the dissociation region of high pressure hydrogen (see next section), one interesting question concerns the sensitivity of the trial wave function to the particular form of the Kohn-Sham orbitals adopted in the Slater determinant. This is particularly relevant since the form of the orbitals determines the nodal surface of the trial wave function, the ultimate limit on the accuracy of fermionic QMC. On the one hand, one could hope to further improve the quality of the trial wave function by varying the type of orbitals; on the other hand, a large sensitivity to the form of the Kohn-Sham orbitals would signal a too constrained form of the wave function, probably with considerable room for improvement. The recent technical advance of the CEIMC code, namely the integration of PWSCF, allowed us to test several different types of orbitals: the standard local (LDA) and semilocal (GGA-PBE) approximations, a non-local functional devised to reduce the self-interaction error and improve the description of the electronic correlation in DFT (HSE [49]), and a functional devised to improve the description of the dispersion interactions which are absent in a self-consistent mean-field theory (vdW-DF2 [50–52]). In the range of coupling parameter 1.22 ≤ r<sub>s</sub> ≤ 1.44, which corresponds approximately to the range of pressure between 200 GPa and 550 GPa according to DFT, we have considered four recently proposed candidate structures for the molecular crystal [53], namely C2/c, Cmca-12, Pbcn and P63m. For each structure we have performed parameter optimizations for the four mentioned forms of the orbitals and at eight different densities. Supercells of 96 atoms were considered for the C2/c, Cmca-12 and Pbcn structures, while a supercell of 128 atoms was studied for the P63m structure. Moreover, for a single structure, Pbcn, at a single value of r<sub>s</sub> = 1.35, we have performed a complete RMC study. In Figure 1 we report, for all densities investigated, the variational energies from the different orbitals relative to the energy of the trial function with LDA orbitals.

Figure 2. Pbcn structure of molecular hydrogen at r<sub>s</sub> = 1.35. Left panel: energy per atom versus projection time in RMC from different kinds of orbitals: LDA (closed red squares), PBE (green closed circles), HSE (upward blue triangles), vdW-DF2 (downward purple triangles). Results from the old LDA implementation (cyan open circles) are also reported. Right panel: energy per atom versus variance in RMC from different kinds of orbitals.

We note that for all structures and at all densities LDA, PBE and HSE orbitals provide trial functions of the same quality (differences are of the order of 0.2 mH/atom = 90 K). Instead, the trial function with orbitals from the vdW-DF2 functional provides higher energies, by roughly 0.4 mH/atom, with values up to 1.4 mH/atom (630 K). This first result indicates that our trial function is flexible and general enough to be only weakly sensitive to the form of the orbitals. In order to check whether the observed differences from vdW-DF2 orbitals could be due to optimization problems only, we performed a complete RMC study for a single case, namely the Pbcn structure at r<sub>s</sub> = 1.35. A time step of τ = 0.005 h<sup>−1</sup> was used, which is fairly typical in this sort of calculation. No further time step error extrapolation study has been performed. In Figure 2 the energy versus projection time is reported for all kinds of orbitals. We also added results from our old DFT solver with LDA orbitals, which were plagued by the truncation error. For all kinds of trial function we observe a very similar relaxation with projection time, meaning that the quality of the trial function is similar in all cases. The differences observed at the variational level among different trial functions essentially persist along the projection and therefore in the extrapolated value of the total energy. A quantitative way to estimate the extrapolated (β → ∞) value of the total energy is to plot energy versus its variance (pure estimate) and use a linear extrapolation at small values of σ<sup>2</sup>. This plot for all studied cases is shown in the right panel of Figure 2. We see that the three kinds of orbitals, LDA, PBE and HSE, all provide extrapolated energies that agree within error bars (E<sub>0</sub> = −0.5350(2)), while the vdW-DF2 orbitals provide a higher value (E<sub>0</sub> = −0.5342(2)). The fact that the RMC projection is not able to remove the difference observed at the VMC level means that the nodes from the vdW-DF2 orbitals are less accurate than those from the other kinds of orbitals, which instead, despite their differences, provide essentially the same nodal structure. Finally, we note that our old implementation of LDA orbitals provides a less accurate determination of the energy with a correspondingly larger variance.
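
The linear energy-versus-variance extrapolation described above can be sketched as a least-squares fit. The numbers below are synthetic and purely illustrative (they are not data from Figure 2), and the function name is hypothetical.

```python
import numpy as np


def extrapolate_to_zero_variance(variances, energies):
    """Fit E = E0 + slope * sigma^2 and return (E0, slope)."""
    slope, intercept = np.polyfit(variances, energies, 1)
    return intercept, slope


# Synthetic points lying exactly on a line, for illustration only.
sigma2 = np.array([0.010, 0.020, 0.030, 0.040])
energy = -1.000 + 0.5 * sigma2

e0, slope = extrapolate_to_zero_variance(sigma2, energy)
print(f"E0 = {e0:.4f}, slope = {slope:.3f}")
```

In practice only the points at small σ<sup>2</sup>, where the linear regime holds, would be included in the fit, and the statistical error of each point would enter as a weight.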

#### 3. Applications

#### *3.1. High-Pressure Hydrogen*

Hydrogen is the simplest element of the periodic table and also the most abundant element in the Universe. Because of its simple electronic structure, it has been instrumental in the development of quantum mechanics and remains important for developing ideas and theoretical methods. In the next section we explore its use in developing DFT functionals. Its phase diagram at high pressure has received considerable attention from the first-principles simulation community due to its critical importance in many fields like planetary science, high pressure physics, astrophysics, inertial confinement fusion, among many others [10,54,55]. The phase diagram of hydrogen at high pressure contains many interesting features including: a maximum in the melting line with a subsequent negative slope [56,57], a predicted liquid-liquid transition between an insulating molecular and a conducting atomic phase [58,59], exotic molecular phases at low temperature, and a predicted metal-insulator transition in the solid phase [10,55].

The ground state structure of crystalline hydrogen across the pressure-induced molecular dissociation has been studied by DMC [35–37], which predicted molecular dissociation at a density corresponding to r<sub>s</sub> ≈ 1.3. RPIMC has been applied to investigate the Warm Dense Matter regime, namely the regime of high pressure and density where thermal and pressure-induced molecular dissociation and ionization occur simultaneously [38,39,60]. Particularly relevant for our current understanding of the phase diagram and the Equation of State (EOS) of compressed hydrogen has been the determination of the principal and secondary Hugoniot lines of deuterium, which could be directly compared with experimental data [40,61]. RPIMC predictions for the principal Hugoniot of deuterium were at first in disagreement with pulsed laser-produced shock compression experiments [62–64], but were later confirmed by magnetically generated shock compression experiments at the Z-pinch machine [65–70] and by converging explosive-driven shock wave techniques [71,72]. Also relevant for the development and fine tuning of simulation methods for Warm Dense Matter has been the comparison with the less demanding, but also less fundamental, methods based on Density Functional Theory (either Kohn-Sham or Orbital-Free flavours). A general agreement between RPIMC and FPMD predictions for the Hugoniot lines was observed [10], except at the lowest temperatures that could be reached by RPIMC (∼10,000 K). More recently, the synergetic use of Born-Oppenheimer molecular dynamics (BOMD) and RPIMC has made it possible to produce first-principles-based EOSs over a wide range of physical conditions for hydrogen, helium and hydrogen-helium mixtures [73,74]. These EOSs are instrumental in planetary modeling and are crucial ingredients for the hydrodynamic codes used in the large facilities for extreme conditions experiments.

Temperatures lower than ∼10,000 K cannot be easily reached by RPIMC without reducing the level of accuracy. However, most of the interesting phenomena in high pressure hydrogen, like molecular dissociation under pressure, metallization, solid-fluid transition, a possible liquid-liquid phase transition and its interplay with melting, the various crystalline phases and the transition to the atomic phases [10], occur at lower temperature out of the reach of RPIMC. Investigating this regime by QMC methods has been the main motivation in developing CEIMC. The other motivation, as mentioned above, is the benchmark of the much more developed (and less demanding) alternative theoretical method, namely FPMD based on DFT. Indeed the numerical implementation of DFT is based on approximations (the exchange-correlation functional) the accuracy of which can only be established against experiments or, better, against more accurate theories. As mentioned earlier, QMC energy is an upper bound and therefore has an internal measure of accuracy.

CEIMC has been applied to investigate the WDM regime of hydrogen and helium and to benchmark FPMD [48,75,76]. In [76] an investigation of the fully ionized state of hydrogen in a region of pressure and temperature relevant for Jovian planets found that FPMD based on the GGA-PBE exchange-correlation functional and CEIMC are in very good agreement, but both deviate from a widely accepted phenomenological EOS. The agreement between the simulation methods becomes less good when approaching the molecular dissociation regime at slightly lower temperature and pressure. Both CEIMC and FPMD with different approximate functionals have been applied to investigate the liquid-liquid phase transition (LLPT) region in hydrogen [45,59,77]. The emerging picture is that a weak first-order phase transition occurs in hydrogen between a molecular, insulating fluid and a metallic, mostly monoatomic fluid. At higher temperature, molecular dissociation and metallization occur continuously. However, the precise location of the transition line and of the critical point is still a matter of debate, since different levels of theory provide different locations. Within FPMD-DFT the location of the transition line depends strongly on the exchange-correlation functional employed and on whether classical or quantum protons are considered [77]. Transition lines from the PBE and vdW-DF2 approximations differ by roughly 200–250 GPa, the PBE one being located at lower pressure. The PBE melting line with quantum protons is not in agreement with experiments, which highlights the failure of the PBE approximation when employed together with the quantum description of the nuclei. On the other hand, optical properties for the vdW-DF2 approximation are in agreement with experiments, supporting the use of this functional for hydrogen in the WDM regime. The LLPT line from CEIMC lies in between the lines from the PBE and vdW-DF2 functionals [45,59]. However, those results were plagued by a truncation error in the calculation of the single electron orbitals, which showed up only around the metallization and which resulted in biased estimates. We have now changed the DFT solver in our CEIMC code and checked the convergence. We find a roughly uniform shift of the transition line of ∼50 GPa to higher pressure, and we are performing new calculations with quantum nuclei. Preliminary results, based on VMC electronic energies, suggest that, similarly to the DFT scenario, nuclear quantum effects favor molecular dissociation and become increasingly important at lower temperatures. We estimate that the transition pressure is decreased, because of nuclear quantum effects, by ∼60 GPa at 600 K and by ∼150 GPa at 300 K (from ∼430 GPa for classical nuclei to ∼290 GPa for quantum nuclei). RQMC corrections to the transition lines were previously found to be small, and we expect an even smaller effect with the new CEIMC implementation since the VMC variance is roughly half of what it was in the previous code [45].

The last estimate however is for a metastable liquid state obtained by an instantaneous quenching of the fluid at higher temperature, while it is expected that the equilibrium state at 300 K and ∼290 GPa be crystalline (of unknown structure) [10]. Those results are preliminary since the calculation is performed for a small system of 54 protons (we employ Twist Averaged boundary conditions to reduce size effects on the single-electron properties with a 4 × 4 × 4 twist angle grid) and we are presently estimating size effects, both by direct size extrapolation and by the analytic treatment of size effects [78,79]. In Figure 3 we report CEIMC proton-proton g(r) at various densities along the T = 600 K isotherm to illustrate the relevance of nuclear quantum effects on the pressure dissociation. The preliminary CEIMC results suggest that, despite the good performance observed on band gap calculations in the crystalline phases [80], the vdW-DF2 exchange-correlation functional has a tendency to over-stabilize molecules.

Although our results demonstrate the power of CEIMC in predicting the physical properties of hydrogen, its use is still quite demanding in terms of computer time, a fact that limits its applicability. This is particularly true when a much larger exploration of external conditions is needed to clarify the physics. For example, to study the crystalline state of the molecular system and clarify the molecular-atomic transition mechanism in the solid state, it is necessary to consider a large number of candidate structures, some of which have very large unit cells (the recently proposed Pc structure for phase IV of molecular hydrogen [81] contains 192 protons, more than three times the size of the system considered in the LLPT study). Moreover, in studying those structures at finite temperature it is important to apply a constant stress algorithm, allowing the simulation box to deform and release the excess internal stress that would otherwise produce metastable states. While larger systems (>250 particles) and constant pressure algorithms are routinely applied in FP methods based on DFT, their use in conjunction with CEIMC is still problematic. Therefore, it is important to apply CEIMC and other QMC methods to validate DFT predictions and determine the most accurate functional for a given system. The same considerations apply to systems more complex than hydrogen. In the next section we describe our effort to benchmark functionals for high pressure hydrogen and for water in the condensed phase.

Figure 3. Proton-proton radial distribution function at various densities along the isotherm T = 600 K. Comparison between classical nuclei (red continuous line) and quantum nuclei (blue dashed line) for the hydrogen nuclear mass. Molecular dissociation with increasing density is evident.

#### *3.2. QMC Benchmarks of DFT*

Within the Born-Oppenheimer approximation at low temperatures, the only interaction between ions and electrons comes through the potential energy surface E<sub>0</sub>(R), defined as the ground state energy of the electronic Hamiltonian for a fixed set of ionic coordinates. In first-principles calculations E<sub>0</sub>(R) is typically approximated by E<sub>DFT</sub>(R), obtained from a density functional theory (DFT) calculation. Over the last several years, many-body methods for solids have been developed to the point that developing density functionals from accurate reference calculations is now a realistic prospect. In this section, we show how quantum Monte Carlo calculations can be used to benchmark the accuracy of DFT in the description of the potential energy surface. The quality of E<sub>DFT</sub>(R) defines the predictive capabilities of the resulting first-principles simulation. We use large sets of representative configurations from PIMD simulations, and compare the mean absolute error between accurate QMC calculations and various DFT functionals. We present preliminary calculations on high pressure hydrogen and liquid water at ambient conditions, two materials that are particularly challenging for DFT due to the subtle competition between dispersion interactions, nuclear quantum effects, hydrogen bonding, and anisotropic interactions.

#### 3.2.1. Hydrogen

The phase diagram of hydrogen at high pressure has been extensively explored using first-principles simulations with DFT [58,59,82–85]. In spite of the large number of studies, most of the work so far has employed either the local density approximation (LDA) [86] to the exchange-correlation potential or the Perdew-Burke-Ernzerhof (PBE) [87] generalized gradient approximation. These are two of the simplest functionals currently available in DFT. In fact, both of them suffer from self-interaction errors and lack a proper treatment of dispersion interactions, making their application in the regime of molecular dissociation questionable. Recently, DFT functionals with an improved description of dispersion interactions have been employed in the study of the liquid and solid molecular phases in the neighborhood of molecular dissociation. It was found that the dissociation density changed when compared to calculations using PBE [77,80]. Since these functionals were not designed for materials at high density, and because dispersion interactions are clearly important in dense molecular hydrogen, there is a crucial need for accurate calculations that can be used to benchmark the different exchange-correlation functionals employed in first-principles simulations.

Since sufficient experimental data is not available to validate the quality of functionals in the high-pressure high-temperature regime of the phase diagram, we used fixed-node diffusion Monte Carlo (DMC) to benchmark the accuracy of several DFT functionals over a range of densities near the liquid-liquid phase transition at a temperature of T = 1000 K. Henceforth, we will refer to densities using the parameter r<sub>s</sub>. First, we ran PIMD simulations with the PBE functional for N = 54 hydrogen atoms at three densities: r<sub>s</sub> = 1.30, 1.45, 1.60. In this range of densities, the liquid goes from an insulating molecular state at r<sub>s</sub> = 1.60 to a conducting atomic liquid at r<sub>s</sub> = 1.30. The density r<sub>s</sub> = 1.45 is intermediate and close to the LLPT for this functional. After equilibration, we sampled 100 ionic configurations from uncorrelated PIMD time slices for each density. For each configuration at each density, we calculated the DMC energy, and then computed E<sub>DFT</sub>(R) for the following functionals: LDA, PBE, vdW-DF [50], vdW-DF2 [51,52,88], and HSE [49].
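
To relate r<sub>s</sub> to a mass density, the sketch below assumes the standard definition, in which each electron occupies a sphere of radius r<sub>s</sub> a<sub>0</sub> (a<sub>0</sub> being the Bohr radius), and one electron per hydrogen atom. The constants and function name are illustrative choices, not values taken from the text.

```python
import math

BOHR_CM = 0.529177e-8           # Bohr radius in cm
M_H_G = 1.00794 * 1.66054e-24   # mass of one hydrogen atom in g


def mass_density_from_rs(rs, mass_g=M_H_G):
    """Mass density in g/cm^3 for one atom per sphere of radius rs * a0."""
    volume = 4.0 / 3.0 * math.pi * (rs * BOHR_CM) ** 3
    return mass_g / volume


for rs in (1.30, 1.45, 1.60):
    print(f"rs = {rs:.2f} -> {mass_density_from_rs(rs):.3f} g/cm^3")
```

Under these assumptions the three densities studied span roughly 0.66 to 1.23 g/cm<sup>3</sup>, with smaller r<sub>s</sub> corresponding to higher density.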

All QMC calculations were performed with the QMCPACK [89–91] software package. We used a Slater-Jastrow trial wavefunction with twist-averaged boundary conditions [92], employing a 3 × 3 × 3 grid of boundary conditions. For the Jastrow functions, we used real-space B-splines with optimizable knots. We included two spin-independent one-body proton-electron terms: a short-ranged term with the appropriate cusp condition, and a long-ranged term. We also included two long-ranged spin-dependent electron-electron functions with appropriate cusp conditions. For each configuration, linear optimization with VMC was performed for all Jastrow parameters at a single twist angle; these parameters were subsequently used for all twists in the DMC calculations. For the DMC run, a time step of τ = 0.05 Ha<sup>−1</sup> and 6000 walkers were used. The orbitals were obtained from DFT using the Quantum Espresso software package [46] with the PBE functional and a plane wave cutoff of 210 Ry. DFT calculations were performed with a Troullier-Martins norm-conserving pseudo-potential [93] with a cutoff radius of r<sub>c</sub> = 0.5 a<sub>0</sub>, whereas DMC calculations were performed with the Coulomb potential. Based on the scale of the energy differences, we found a statistical error of 0.02 mHa/particle to be sufficient for present purposes. Since we were interested in measuring the spread of energy errors in this presentation, constant energy offsets were removed from our error assessments. This means that we did not have to include energetic finite size effects, although more detailed assessments will certainly call for this.
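
A uniform grid of twists and the corresponding equal-weight average can be sketched as follows. The placement of the grid points (components in the interval [−1/2, 1/2), in units of 2π/L) is an assumption made for illustration and may differ from the grid used in the calculations; the function names are hypothetical and are not part of QMCPACK.

```python
import itertools


def twist_grid(n):
    """Uniform n x n x n grid of twists with components in [-1/2, 1/2)."""
    shifts = [(i + 0.5) / n - 0.5 for i in range(n)]
    return list(itertools.product(shifts, repeat=3))


def twist_average(value_at_twist, n):
    """Equal-weight average of a per-twist quantity over the grid."""
    twists = twist_grid(n)
    return sum(value_at_twist(t) for t in twists) / len(twists)


print(len(twist_grid(3)), "twists on a 3 x 3 x 3 grid")
```

A 3 × 3 × 3 grid requires 27 independent electronic calculations per nuclear configuration, which is why the Jastrow parameters are optimized at one twist and then reused for the others.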

An example of the comparison between QMC and DFT is given in Figure 4. Shown is a histogram of the energy difference between the results of DMC and the PBE functional at the three densities: ΔE<sub>DFT</sub> = E<sub>DFT</sub> − E<sub>DMC</sub>. Given that r<sub>s</sub> = 1.30 corresponds to the atomic liquid, and r<sub>s</sub> = 1.60 to the molecular liquid, we immediately see that the errors incurred by using the PBE functional are not consistent across the LLPT. As expected, PBE offers a much better description of the atomic liquid compared to the molecular phase, where self-interaction errors are larger and dispersion interactions are important. This is a well-known failure of most semi-local density functionals, which tend to favor delocalized states.

Figure 4. Histograms of ΔE<sub>DFT</sub> for the PBE functional for dense hydrogen at densities r<sub>s</sub> = 1.30, 1.45, 1.60 at T = 1000 K. ΔE<sub>DFT</sub> refers to the energy difference per hydrogen atom between the DFT and QMC energies for a given configuration. There were 54 atoms per configuration.

To better quantify and compare the quality of functionals, we have computed the mean absolute error (MAE) from data similar to that shown in Figure 4. This quantity is defined as MAE<sub>func</sub> = ⟨|ΔE<sub>DFT</sub> − ⟨ΔE<sub>DFT</sub>⟩|⟩, where the average ⟨…⟩ is taken over all configurations at a particular density. Notice that we subtract the average energy difference in the definition of the MAE, since the zero of energy of each functional is modified by the use of pseudopotentials. Fluctuations of the energy differences are more significant since the structure of the liquid is only sensitive to differences. The MAE gives us one measure of the quality, or predictive capability, of a given functional as defined by the reference method, in this case DMC. We have tabulated our results in Figure 5.
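
The offset-corrected MAE defined above can be computed in a few lines. This is a minimal sketch with a hypothetical function name, assuming the energies per atom of each configuration are available as arrays.

```python
import numpy as np


def mean_absolute_error(e_dft, e_ref):
    """MAE of E_DFT - E_ref over configurations after removing the mean offset."""
    delta = np.asarray(e_dft, dtype=float) - np.asarray(e_ref, dtype=float)
    return float(np.mean(np.abs(delta - delta.mean())))


# Illustrative values only: a constant offset gives zero error.
e_ref = np.array([0.0, 0.0, 0.0, 0.0])
print(mean_absolute_error(e_ref + 5.0, e_ref))
print(mean_absolute_error([1.0, 2.0, 3.0, 4.0], e_ref))
```

Because the mean is subtracted first, a functional whose energies are shifted by a constant with respect to the reference scores zero, so that only configuration-dependent errors contribute.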

There are several interesting features in Figure 5 directly related to the expected performance of these functionals in the description of hydrogen near molecular dissociation in the liquid. First, the two semi-local functionals in the comparison, LDA and PBE, have considerably different errors in the molecular and atomic regimes. As described above, the atomic regime is more accurately described in comparison to the molecular phase, leading to a potentially strong underestimation of dissociation transition pressures in both solid and liquid phases. This is consistent with recently reported simulations [77]. On the other hand, both the hybrid HSE and the functionals with improved dispersion vdW-DF and vdW-DF2 offer a more consistent level of description between the two regimes. The mean absolute errors of the HSE and vdW-DF functionals are approximately half that of the PBE functional for all densities, which indicates that these functionals more accurately capture energy *differences* between various liquid configurations.

Figure 5. Mean absolute error of energy/atom *vs.* functional for dense liquid hydrogen at 1000 K. For each functional, we computed the mean absolute error for three different densities, denoted by the different colored bars.

#### 3.2.2. Liquid Water

Water plays a central role in many scientific fields [94]. It is a critical component of almost all chemical, biological, and geophysical processes. As a result, it is one of the most studied substances in science, both from an experimental and a theoretical point of view. Despite such broad importance, water's most basic property, its local structure at ambient conditions, characterized by the geometry of its underlying hydrogen-bond (H-bond) network, has remained a matter of debate for over a century [95–97]. Challenges arise because water at room temperature is only ≈25 K from the melting temperature of ice, where a variety of subtle and complex effects become important. While the structure is dominated by H bonds between neighboring molecules, both van der Waals (vdW) interactions (which, in this context, refers to dispersion forces resulting from dynamical nonlocal electron correlations) and nuclear quantum effects (NQEs) influence the topology of the H-bond network. In fact, it is precisely these seemingly subtle effects (compared to H bonding) that are key to accurately describing ambient water, but they have been (until recently) difficult or impossible to model.

Atomistic simulations have the potential to resolve these issues, particularly using first-principles methods. Providing an accurate theoretical description has been a central topic and open challenge in physical chemistry for many decades. Despite considerable focus over the last decade, to date DFT has proven insufficient for the accurate description of liquid water [4,98]. Nonetheless, much progress has occurred during the last several years. The main advances include the use of functionals that properly describe dispersion interactions in the liquid [50,52,99,100], the use of hybrid functionals [101], and the direct treatment of nuclear quantum effects [102]. The combination of all of these advances in first-principles simulations of liquid water could lead to an accurate description of its interesting properties, including its local structure. At the same time, the choice of exchange-correlation functional in DFT is still a source of complication, mainly due to the large number of possibilities and the inability to test their predictive capabilities without resorting to full first-principles calculations of a large set of observables. As in the case of hydrogen, an accurate first-principles description almost certainly requires the use of path integral methods in order to directly treat nuclear quantum effects, which makes the calculations quite computationally intensive. What is needed is a way to assess the quality of a given functional without having to resort to first-principles calculations of the liquid at the PIMD level, and if possible, a way to systematically improve them using high quality reference calculations from accurate many-body methods.

In this section, we present QMC calculations of configurations of molecules extracted from PIMD simulations of liquid water. QMC has been shown to be a reliable benchmark in the study of small water clusters [103–105], and should provide an accurate reference method to measure the quality of typical density functionals used in simulations of water. All DMC calculations were performed with the QMCPack software package [89–91]. A Troullier-Martins norm-conserving pseudo-potential [93] was used to represent both hydrogen and oxygen. In particular, we used the pseudo-potentials from the CASINO database [106,107], which were recently shown to produce accurate results in the study of small water clusters. A Slater-Jastrow trial wave-function was used. The orbitals in the Slater determinant were obtained from DFT calculations employing the PBE exchange-correlation functional. We do not expect a strong dependence of the resulting comparison on the functional used to generate the orbitals. The Jastrow term contains electron-ion, electron-electron and electron-electron-ion terms; the variational parameters were optimized at the VMC level using a variant of the linear method of Umrigar, *et al.* [108]. A time-step of 0.01 Ha<sup>−1</sup> was found to be sufficiently small to produce accurate total energies and approximately 4800 walkers were used in the DMC calculations. Casula's T-moves [109] were used to reduce locality errors, while the Model Coulomb Potential [110] and Chiesa's [78] correction scheme were used to estimate finite-size corrections to the potential and kinetic energies, respectively.

DFT calculations were performed with both Quantum Espresso (QE) [46] and VASP [111–113] simulation packages. In the case of QE calculations we employed norm-conserving Troullier-Martins pseudo-potentials, while in the case of VASP calculations we employed the Projector Augmented Wave method (PAW) [114,115]. A single pseudo-potential (constructed with PBE) was chosen in order to make a homogeneous comparison of all DFT functionals, since some of the functionals employed in this work do not yet allow for the production of pseudo-potentials. All simulations were performed at the Γ point of the supercell in order to be consistent with the corresponding DMC calculations; errors due to the lack of k-point integration were small enough to be safely discarded. We carefully tested the convergence with the plane-wave cutoff in all DFT calculations.

We present calculations for three different configuration sets. The first two sets, which we called *TIP5P-PI-0C-ICE* and *TIP5P-PI-0C-LIQ*, were generated with PIMD calculations on simulation cells using the semi-empirical TIP5P water model and 32 molecules [116]. As the names suggest, the PIMD calculations used to generate these configuration sets were performed at T = 0 °C, from stable solid and liquid phases. The third configuration set was obtained from PIMD calculations of 64 water molecules, at room temperature and a density of 1 g/cm<sup>3</sup>, with the vdW-DF2 functional, which has been recently shown to provide an accurate description of the structure of water when combined with a path integral representation [117]. The number of configurations in each set is 20, 47, and 50, respectively. The three configuration sets sample different aspects of the potential energy surface of liquid water. While TIP5P is a rigid molecule model, the first-principles simulations with vdW-DF2 are fully flexible, which allows us to emphasize different ranges of the molecular interactions in the liquid. On the other hand, the simulations with TIP5P in both liquid and solid phases at T = 0 °C contrast configurations that strongly favor hydrogen bonding, in the solid, with those where the hydrogen-bond network has been destabilized, in the liquid.

Figure 6. Mean absolute error in the total energy between DMC and DFT with various exchange correlation functionals for a supercell containing water molecules. Results presented correspond to calculations using the PAW formulation with VASP. X-D, where X represents a given density functional, designates results using the empirical dispersion corrections of Grimme *et al.*, [118], in particular the DFT-D2 correction scheme as implemented in VASP. Statistical errors on the presented results are on the order of 0.003 mHa and 0.005 mHa for rigid and flexible molecule configurations respectively. They are not shown on the figure for clarity.

Figure 6 shows the mean absolute difference in the total energy between DMC and DFT calculations; results are separated by configuration set to allow for a clearer comparison between them. Several functionals are considered, including the semi-local functional PBE [87]; the hybrid functionals PBE0 [49] and B3LYP [119,120]; the non-local van der Waals functionals optB88 [121], optPBE [121], optB86b [122], vdW-DF [50] and vdW-DF2 [52]; and finally functionals with the empirical van der Waals correction of Grimme, *et al.* (DFT-D2) [118]. While there are many interesting results in this comparison, the most noticeable feature is the large difference in the scale of the MAE between rigid and flexible molecule configurations. This is not unexpected, since the larger energy fluctuations in the system are coupled to the intramolecular degrees of freedom of the molecule. In the case of flexible molecule configurations, hybrid functionals offer much better agreement with DMC results, producing errors typically a factor of 2 smaller than non-hybrid functionals. This result shows that hybrid functionals do a much better job of describing the intramolecular potential energy surface. This is consistent with the recent calculations of Alfe, *et al.* [104] and with the recent calculations of the absorption spectra of bulk water at ambient conditions of Zhang, *et al.* [101]. On the other hand, the functionals that include an appropriate description of dispersion interactions offer a clearly better comparison with QMC in the rigid-molecule configuration sets. In this case, the intermolecular interactions are the dominant energy contribution and the lack of appropriate dispersion leads to a larger error. Here, we can also see a small improvement with the empirically corrected functionals (PBE-D, B3LYP-D), but the gain is small and cannot compete with non-local vdW functionals. Notice also that the performance of hybrids in the rigid-molecule sets is comparable to the performance of semi-local functionals, because neither of these types of functionals can properly describe dispersion interactions. Finally, the configuration set with the smallest overall MAE is the one obtained from the calculations in the solid phase close to melting, which shows that most of these functionals can describe hydrogen-bonded configurations fairly well.

#### 4. Discussion

Direct first-principles simulations of condensed-phase systems with QMC accuracy are nowadays possible, but so far restricted to the first few elements of the periodic table, namely hydrogen, helium and their mixtures. Even for those simple systems, challenges are present and the computational demand is large. Nonetheless, CEIMC predictions for the liquid-liquid phase transition in hydrogen remain today the target for less accurate but faster DFT-based FP methods. While much work remains to be done in developing QMC-based FP methods, the calculations presented here show one possible use of accurate many-body calculations: using QMC to benchmark the accuracy of DFT functionals. Not only does this allow us to make a judgment of the quality of a functional before its use in first-principles simulations, but it also shows us a path for the systematic improvement of the functionals by adjusting free parameters to minimize the errors. DFT users will often point to experimental data to validate the quality of a chosen functional. What we have shown is that we can use highly accurate QMC methods to benchmark functionals around the liquid-liquid transition of hydrogen from first principles. In addition, this set of reference energies for the bulk system can be used to optimize the free parameters in the DFT functional to minimize the errors, and in the limit of a large data set, reproduce the quality of the more accurate many-body method in first-principles calculations using DFT. This approach will be increasingly necessary as we continue to explore matter under extreme pressures, since experimental data is often insufficient or nonexistent at geophysical/planetary scales. It will also be necessary for other situations where DFT functionals have difficulties, such as near metal-insulator transitions.

Let us consider a more general point. We suggest that, in general, it is superior to use total energies to find an interatomic potential (force field). The traditional approach is to fit experimental data, for example, the melting temperature of ice, the density of water versus temperature, *etc*. Clearly this procedure was necessary in the past since experimental data was all that was available. However, using this approach requires very extensive calculations including free energy or equivalent computations and ultimately only gives a few constraints. We can invoke "The Allegory of the Cave" from Plato's *The Republic*. We should not look to fit the atomic potentials using the projections of the energy surface onto thermodynamic properties, but instead fit the energy surface directly. Thus we will obtain an interatomic potential suitable for all properties. The situation has changed since QMC methods have matured and much more computational power is available. We note that scanning the potential energy surface is a task very well suited to massively parallel computers. Including total energy QMC benchmarks in the fitting procedure, in addition to experimental data, can allow for much more systematic improvements. QMC can thus play a unique role in giving total energies and is applicable to large enough systems to approximate condensed matter.

Water and hydrogen show an additional complication of using experimental data: because of the importance of quantum zero-point effects of the protons, fitting the experimental data becomes particularly problematic. A common approach is to do a simulation of the classical system and assume that the effective classical system includes the effects of zero-point energy; clearly this then becomes quite approximate since the zero-point effects are not small. A complication is that the interatomic potential that results can become temperature and density dependent, with all the known pathologies related to the use of state-dependent potentials [123]. One may need to do full PIMD simulations of the system in order to determine the best empirical potential, thus considerably increasing the already large computational requirements.

One aspect in determining good force fields is to find an appropriate basis set to parameterize the force field. Traditionally, these have contained few functions with very few parameters, e.g., the Lennard-Jones potential with only two parameters: ε and σ. It is feasible today to calculate the energy and forces for millions of independent arrangements of ions. Using QMC techniques, each would come with an error estimate. Hence, we can envision fitting this data set to a force field with potentially tens of thousands of independent parameters. This will allow us to determine a completely general pair potential (say with a spline basis), a three-body potential, four-body potential, *etc*. However, the investigation into effective basis sets to describe these potentials becomes very important. We can imagine an integrated set of tools: QMC simulations of systems with thousands of electrons to produce data sets of energies and forces. These can be used either to tailor a DFT functional to a particular system, or to determine a force field. The DFT simulations and the effective force field simulations can then be used to model much larger systems. Simulations can thereby become much more predictive, and produce not just universal properties, but details important to applications and experiment.
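As a toy illustration of fitting an energy surface directly, the sketch below fits the two-parameter Lennard-Jones form to pair energies that carry error bars, using weighted linear least squares. It relies on rewriting the potential as U(r) = A r<sup>−12</sup> − B r<sup>−6</sup>, which is linear in A and B. The function names, the data, and the uniform error bar are all hypothetical; the reference energies are synthetic and do not come from any calculation in this paper.

```python
import math

def fit_lennard_jones(distances, energies, errors):
    """Weighted least-squares fit of U(r) = A r^-12 - B r^-6 to pair energies.

    Returns (epsilon, sigma) of the equivalent form
    4 eps [(sigma/r)^12 - (sigma/r)^6].
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for r, e, err in zip(distances, energies, errors):
        w = 1.0 / (err * err)            # weight each point by its error bar
        x1, x2 = r ** -12, -(r ** -6)    # basis functions multiplying A and B
        s11 += w * x1 * x1
        s12 += w * x1 * x2
        s22 += w * x2 * x2
        b1 += w * x1 * e
        b2 += w * x2 * e
    det = s11 * s22 - s12 * s12          # 2x2 normal equations, solved directly
    A = (b1 * s22 - b2 * s12) / det
    B = (s11 * b2 - s12 * b1) / det
    return B * B / (4.0 * A), (A / B) ** (1.0 / 6.0)


def lennard_jones(r, epsilon, sigma):
    x = (sigma / r) ** 6
    return 4.0 * epsilon * (x * x - x)


# Synthetic reference data (hypothetical, generated from a known potential).
rs = [0.95 + 0.05 * i for i in range(30)]
ref = [lennard_jones(r, 1.5, 0.9) for r in rs]
eps_fit, sigma_fit = fit_lennard_jones(rs, ref, [0.01] * len(rs))
```

A force field with thousands of parameters would need a richer basis and a proper linear-algebra library, but the structure of the fit (energies, error bars, basis functions) stays the same.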

#### Acknowledgments

Miguel Angel Morales was supported by the U.S. Department of Energy at the Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, by LDRD Grant No. 13-LW-004 and by the Basic Energy Science (BES), DOE through the Predictive Theory and Modeling for Materials and Chemical Science program; D. M. C. and R. C. were supported by the DOE grant DE-NA0001789; and C. P. was supported by the Italian Institute of Technology (IIT) under the SEED project grant number 259 SIMBEDD Advanced Computational Methods for Biophysics, Drug Design and Energy Research. Computer resources have been provided by the US DOE INCITE program, Lawrence Livermore National Laboratory through the 7th Institutional Unclassified Computing Grand Challenge program and PRACE Project No. 2011050781.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Bou-Rabee, N. Time Integrators for Molecular Dynamics. *Entropy* 2014, *16*, 138–162.

*Article*

## Time Integrators for Molecular Dynamics

#### Nawaf Bou-Rabee

Department of Mathematical Sciences, Rutgers University—Camden, 311 N 5th Street, Camden, NJ 08102, USA; E-Mail: nawaf.bourabee@rutgers.edu; Tel.: +1-856-225-6093; Fax: +1-856-225-6602

*Received: 19 September 2013; in revised form: 20 November 2013 / Accepted: 4 December 2013 / Published: 27 December 2013*

Abstract: This paper invites the reader to learn more about time integrators for Molecular Dynamics simulation through a simple MATLAB implementation. An overview of methods is provided from an algorithmic viewpoint that emphasizes long-time stability and finite-time dynamic accuracy. The given software simulates Langevin dynamics using an explicit, second-order (weakly) accurate integrator that exactly reproduces the Boltzmann-Gibbs density. This latter feature comes from adding a Metropolis acceptance-rejection step to the integrator. The paper discusses in detail the properties of the integrator. Since these properties do not rely on a specific form of a heat or pressure bath model, the given algorithm can be used to simulate other bath models including, e.g., the widely used v-rescale thermostat.

Keywords: explicit integrators; Metropolis algorithm; ergodicity; weak accuracy

Classification: MSC 82C80 (Primary); 82C31, 65C30, 65C05 (Secondary)

#### 1. Introduction

Molecular Dynamics (MD) simulation refers to the time integration of Hamilton's equations often coupled to a heat or pressure bath [1–5]. From its early use in computing equilibrium dynamics of homogeneous molecular systems [6–13] and pico- to nano-scale protein dynamics [14–23], the method has evolved into a general purpose tool for simulating statistical properties of heterogeneous molecular systems [24]. Accessible time horizons have increased remarkably: the time line in Figure 1 attempts to capture this nearly billion-fold improvement in capability over the last forty or so years. To put this speedup in perspective, though, computing power has increased by about eight powers of ten over this time period as predicted by Moore's law.

To be clear, the selection of applications and methods shown in Figure 1 is not comprehensive and heavily biased towards the specific ideas and methods that inform this paper. The applications highlighted are simulations of liquid argon [6], water [11], protein dynamics without solvent [14,15] and biopolymer dynamics with solvent [25–31]. The methods include the following "upgrades" to MD simulation: Verlet integrator and neighbor lists [7], cell linked list [32], the SHAKE integrator for constraints [33], stochastic heat baths via Langevin dynamics [34,35], a library of empirical potentials [36], a deterministic heat bath via Nosé-Hoover dynamics [37,38], the fast multipole method [39], multiple time steps [40], splitting methods for Langevin dynamics [41–43], quasi-symplectic integrators [44,45], (fast) combined neighbor and cell lists [46], the v-rescale thermostat [47] and the stochastic Nosé-Hoover Langevin thermostat [48–50].

Near future applications of MD simulation include micro- to milli-scale simulations of biomolecular processes, like protein folding, ligand binding, membrane transport and biopolymer conformational changes [51–53]. In addition, atomistic MD simulations are used more sparingly in multiscale models [54–58] and rare event simulation, such as the finite temperature string method and milestoning [59–62]. Given this continuous development and generalization of MD, it is not a stretch to suppose that MD will play a transformative role in medicine, technology and education in the twenty-first century.

In its standard form, the method inputs a random initial condition, physical and numerical parameters and outputs a long discrete path of the molecular system. Statistical quantities, like velocity correlation or mean radius of gyration, are usually computed online, *i.e.*, as points along this trajectory are produced. MD simulation is built atop a cheap forward Euler-like integrator that requires only a single interatomic force field evaluation per step. Even though MD seems straightforward, software implementations of MD are typically optimized for performance [36,63,64], which, as a side effect, makes them cumbersome for non-experts to learn and modify.

Beyond this issue, because of the interplay between stochastic Brownian and molecular forces, infinitely long trajectories of existing MD integrators do not have the right distribution. The Brownian force can cause the integrator to enter regions where its approximation to the molecular force is inaccurate and possibly destabilizing. In the latter case, the approximation spends a disproportionate amount of time at higher energies, and thus, the invariant measure of the approximation, if it even exists, is not correct. This phenomenon is a well-known shortcoming of explicit integrators for nonlinear diffusions [65–69].

Recently, a probabilistic approach was proposed to solve this problem, which questions the notion that Monte Carlo methods and MD have different aims: the former strictly samples probability distributions, and the latter estimates dynamics. The basic idea is to combine a standard MD integrator with a Metropolis-Hastings algorithm targeted to the Boltzmann-Gibbs distribution [70–72]. Because the scheme is a Monte Carlo method, it exactly preserves the desired distribution [71,72]. This property implies numerical stability over long-time simulations. However, the price to be paid for this stability is a loss of accuracy whenever a move is rejected and some overhead in evaluating the Metropolis acceptance-rejection step. Still, a Metropolized integrator is dynamically accurate on finite-time intervals [72,73], and so, even though a Metropolized integrator involves a Monte Carlo step, its aim and philosophy are very different from Monte Carlo methods, whose only goal is to sample a target distribution with no concern for the dynamics [71,74–82]. In principle, this approach offers a simple alternative to costly implicit integrators, but are Metropolized integrators ready for daily use in MD simulation? The answer to this question is unclear, since this approach is new and has not been tested on enough examples.

Motivated by these issues, this paper builds a software system for MD simulation with a Metropolis step built in and applies it to a homogeneous molecular system. The algorithm and its properties are introduced in a step-by-step fashion. In particular, we show that the integrator is second-order weakly accurate on finite-time intervals and converges to the Boltzmann-Gibbs distribution in the long-time limit. The software version of the algorithm is written in the latest version of MATLAB with plenty of comments, variables that are descriptively named and operations that can be easily translated into mathematical expressions [83]. Since MATLAB is widely available, this design ensures that the software will be easy-to-use and cross-platform. The following MATLAB-specific file formats will be used.


The paper is organized as follows. We begin with an overview of integrators that have been proposed in MD simulation in Section 2. We explain how to Metropolize each of these schemes to make them long-time stable in Section 3, and as an application, we use a Metropolized scheme to generate a long trajectory of a Lennard-Jones fluid in Section 4. Generalizations of corrected MD integrators to other molecular models are discussed in Section 5. The paper closes by discussing some potential pitfalls in high dimension and tricks to get the integrator to scale well in Section 6.

#### 2. Algorithmic Introduction to Time Integrators for MD Simulation

For pedagogical reasons, we will start with Langevin dynamics of a system of N molecules. Then, we show in Section 5 how to simulate more general models of molecular systems. Denote by $m\_j > 0$ and $\mathbf{q}\_j$ the mass and position of the $j$-th molecule, respectively. The governing Langevin equation is given by:

$$\begin{cases} \frac{d\mathbf{q}\_j}{dt}(t) = m\_j^{-1} \mathbf{p}\_j(t) \; ,\\ d\mathbf{p}\_j(t) = -\frac{\partial U}{\partial \mathbf{q}\_j}(\mathbf{q}(t))dt - \gamma \mathbf{p}\_j(t)dt + \sqrt{2kT\gamma m\_j}d\mathbf{w}\_j \; , \end{cases} \quad j = 1, \dots, N \tag{1}$$

where $\mathbf{q} = (\mathbf{q}\_1, \cdots, \mathbf{q}\_N)$ and $\mathbf{p} = (\mathbf{p}\_1, \cdots, \mathbf{p}\_N)$ denote the positions and momenta of the particles, kT is the temperature factor, and the $\mathbf{w}\_j$, $j = 1, \dots, N$, are N independent Brownian motions. The last two terms in the second equation in (1) represent the effect of a heat bath with parameter γ. In Langevin dynamics, positions are differentiable, and due to the irregularity of the Brownian force, momenta are just continuous, but not differentiable. This difference in regularity explains why the first equation in (1) is written as an ordinary differential equation (ODE) and the second equation is written as a stochastic differential equation (SDE).

The bath-free dynamics is a Hamiltonian system with the following Hamiltonian energy function:

$$H(\mathbf{q}, \mathbf{p}) = \sum\_{j=1}^{N} \frac{1}{2m\_j} |\mathbf{p}\_j|^2 + U(\mathbf{q}) \tag{2}$$

Since the masses are constant, this Hamiltonian nicely separates into a kinetic and potential energy that are purely functions of *p* and *q*, respectively. The stationary probability density of the solution to Equation (1) is the Boltzmann-Gibbs density given by:

$$\nu(\mathbf{q}, \mathbf{p}) = Z^{-1} \exp\left(-\frac{1}{kT} H(\mathbf{q}, \mathbf{p})\right), \quad Z = \int \exp\left(-\frac{1}{kT} H(\mathbf{q}, \mathbf{p})\right) d\mathbf{q} d\mathbf{p} \tag{3}$$

Let h be a given time step size and $m = \mathrm{diag}(m\_1, \cdots, m\_N)$. Let $(\mathbf{Q}\_0, \mathbf{P}\_0)$ denote the position and momentum of the molecular system at time t > 0. The simplest approximation to Equation (1) is a forward Euler discretization or Euler-Maruyama scheme [85] that computes an updated position and momentum $(\mathbf{Q}\_1, \mathbf{P}\_1)$ at t + h using:

$$\begin{aligned} \mathbf{Q}\_1 &= \mathbf{Q}\_0 + h m^{-1} \mathbf{P}\_0\\ \mathbf{P}\_1 &= \mathbf{P}\_0 - h \nabla U(\mathbf{Q}\_0) - h \gamma \mathbf{P}\_0 + \sqrt{h} \sqrt{2kT \gamma} m^{1/2} \mathbf{\xi} \end{aligned} \tag{\text{forward Euler}}$$

Here, $\boldsymbol{\xi} \in \mathbb{R}^n$ denotes a Gaussian random vector with mean zero and covariance $\mathbb{E}(\xi\_i \xi\_j) = \delta\_{ij}$. The problem with this approximation is that the forward Euler method is known to diverge in finite time when the derivatives of the potential are unbounded, which is the norm in MD simulation. The precise statement and proof of divergence in a general setting can be found in [86]. By far the most computationally intensive part of the time-stepping algorithm is the evaluation of the potential force. Thus, we will restrict our discussion to schemes that, like Euler, only require a single force field evaluation per step.
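The forward Euler update can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's MATLAB software: the function name `euler_maruyama_step`, the list-based state with one entry per degree of freedom, and the harmonic test potential are all assumptions made for the example.

```python
import math
import random

def euler_maruyama_step(q, p, grad_U, mass, gamma, kT, h, rng):
    """One forward Euler (Euler-Maruyama) step for the Langevin equation (1).

    q, p and mass are lists with one entry per degree of freedom;
    grad_U(q) returns the gradient of the potential as a list.
    """
    grad = grad_U(q)  # single force evaluation, at the current position Q_0
    q_new = [qi + h * pi / mi for qi, pi, mi in zip(q, p, mass)]
    p_new = [
        pi - h * gi - h * gamma * pi
        + math.sqrt(h) * math.sqrt(2.0 * kT * gamma * mi) * rng.gauss(0.0, 1.0)
        for pi, gi, mi in zip(p, grad, mass)
    ]
    return q_new, p_new


# Illustrative use on a harmonic well U(q) = q^2 / 2 (hypothetical test case).
rng = random.Random(0)
q, p = [1.0], [0.0]
for _ in range(1000):
    q, p = euler_maruyama_step(q, p, lambda x: list(x), [1.0], 1.0, 1.0, 0.01, rng)
```

For a potential with bounded derivatives such as this harmonic well the iteration stays well behaved; the divergence discussed above concerns potentials with unbounded derivatives.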

An improvement to the forward Euler method is the following two-step scheme:

$$\mathbf{Q}\_2 = (1 + e^{-\gamma h})\mathbf{Q}\_1 - e^{-\gamma h}\mathbf{Q}\_0 + \frac{1 - e^{-\gamma h}}{\gamma}m^{-1}\left(-h\nabla U(\mathbf{Q}\_1) + \sqrt{h}\sqrt{2kT\gamma}m^{1/2}\xi\right) \tag{\text{BBK}}$$

In the limit γ → 0, this scheme reduces to the well-known Verlet integrator for MD simulation [7]. Just like Verlet, this integrator defines a map on pairs of molecular system configurations. Substituting the approximation $e^{-\gamma h} \approx (1 - \gamma h/2)/(1 + \gamma h/2)$ into the above yields the Brünger-Brooks-Karplus (BBK) scheme, as appearing in [35]. Like the forward Euler method, this method is explicit and only requires one new force evaluation per step.
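A sketch of the two-step update, under the same assumptions as before (list-based state, hypothetical function and argument names), may help to make the Verlet limit and the BBK substitution concrete. The `bbk` flag is an illustrative device that swaps the exponential for the rational approximation quoted in the text.

```python
import math
import random

def two_step_langevin(q_prev, q_curr, grad_U, mass, gamma, kT, h, rng, bbk=False):
    """Map the pair (Q_0, Q_1) to Q_2 using the two-step scheme above."""
    if bbk:
        # Rational approximation of exp(-gamma h) quoted in the text.
        decay = (1.0 - 0.5 * gamma * h) / (1.0 + 0.5 * gamma * h)
    else:
        decay = math.exp(-gamma * h)
    # (1 - decay) / gamma tends to h as gamma -> 0.
    weight = h if gamma == 0.0 else (1.0 - decay) / gamma
    grad = grad_U(q_curr)  # the only new force evaluation in this step
    q_next = []
    for q0, q1, g, m in zip(q_prev, q_curr, grad, mass):
        noise = math.sqrt(h) * math.sqrt(2.0 * kT * gamma * m) * rng.gauss(0.0, 1.0)
        q_next.append((1.0 + decay) * q1 - decay * q0 + weight * (-h * g + noise) / m)
    return q_next


# With gamma = 0 the update is the classical Verlet recursion
# Q_2 = 2 Q_1 - Q_0 - h^2 m^{-1} grad U(Q_1).
verlet_like = two_step_langevin([1.0], [0.9], lambda x: list(x), [1.0],
                                0.0, 1.0, 0.1, random.Random(1))
```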

Second-order accurate schemes that generalize the Velocity Verlet integrator to Langevin dynamics were proposed in a sequence of papers [42–44,87,88]. Here, we mention two of these schemes that are both Strang splittings of Equation (1). The first was proposed by Ricci and Ciccotti [42] and consists of the following sub-steps:

$$\underbrace{\begin{pmatrix} \dot{\mathbf{q}}(t) = m^{-1}\mathbf{p}(t) \\\\ d\mathbf{p}(t) = \mathbf{0} \end{pmatrix}}\_{\text{exactly evolve by 1/2 a step}} \circ \underbrace{\begin{pmatrix} \dot{\mathbf{q}}(t) = \mathbf{0} \\\\ d\mathbf{p}(t) = -\nabla U(\mathbf{q}(t))dt - \gamma \mathbf{p}(t)dt + \sqrt{2kT\gamma}m^{1/2}dW \end{pmatrix}}\_{\text{exactly evolve by a step}} \circ \underbrace{\begin{pmatrix} \dot{\mathbf{q}}(t) = m^{-1}\mathbf{p}(t) \\\\ d\mathbf{p}(t) = \mathbf{0} \end{pmatrix}}\_{\text{exactly evolve by 1/2 a step}}$$

Each step in this decomposition can be exactly solved. Clearly, the half-steps are easy to solve, since momentum is constant over each of these half-steps. The SDE appearing in the inner step can also be exactly solved, since it is linear in momentum (see Chapter 5 in [89]). This splitting is quite natural, since it treats the heat bath forces in the same way as the potential forces.
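The three sub-steps can be written out explicitly. In the sketch below, the inner step uses the closed-form solution of a linear SDE with constant forcing: with positions frozen, the momentum relaxes as an Ornstein-Uhlenbeck process around the value set by the constant force. Function and variable names are hypothetical, and the state is again a list with one entry per degree of freedom.

```python
import math
import random

def ricci_ciccotti_step(q, p, grad_U, mass, gamma, kT, h, rng):
    """One step of the drift / kick-and-bath / drift splitting described above."""
    # Half step: positions drift with constant momentum.
    q_half = [qi + 0.5 * h * pi / mi for qi, pi, mi in zip(q, p, mass)]
    # Full step: positions are frozen, so the force is constant and the
    # momentum SDE is linear; its exact solution is Gaussian.
    decay = math.exp(-gamma * h)
    weight = h if gamma == 0.0 else (1.0 - decay) / gamma
    noise_scale = math.sqrt(kT * (1.0 - decay * decay))
    grad = grad_U(q_half)  # single force evaluation per step
    p_new = [
        decay * pi - weight * gi + noise_scale * math.sqrt(mi) * rng.gauss(0.0, 1.0)
        for pi, gi, mi in zip(p, grad, mass)
    ]
    # Half step: drift again with the updated momentum.
    q_new = [qi + 0.5 * h * pi / mi for qi, pi, mi in zip(q_half, p_new, mass)]
    return q_new, p_new


# Without a bath (gamma = 0) the step reduces to a deterministic leapfrog update.
q1, p1 = ricci_ciccotti_step([1.0], [0.0], lambda x: list(x), [1.0],
                             0.0, 1.0, 0.1, random.Random(2))
```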

A related, but different, splitting method was proposed by Bussi and Parrinello in [43] and is given by:

$$\underbrace{\begin{pmatrix}\dot{\mathbf{q}}(t) = \mathbf{0} \\\\ d\mathbf{p}(t) = -\gamma \mathbf{p}(t)dt + \sqrt{2kT\gamma}m^{1/2}dW \end{pmatrix}}\_{\text{exactly evolve by 1/2 a step}} \circ \underbrace{\begin{pmatrix}\dot{\mathbf{q}}(t) = m^{-1}\mathbf{p}(t) \\\\ \dot{\mathbf{p}}(t) = -\nabla U(\mathbf{q}(t)) \end{pmatrix}}\_{\text{approximately evolve}} \circ \underbrace{\begin{pmatrix}\dot{\mathbf{q}}(t) = \mathbf{0} \\\\ d\mathbf{p}(t) = -\gamma \mathbf{p}(t)dt + \sqrt{2kT\gamma}m^{1/2}dW \end{pmatrix}}\_{\text{exactly evolve by 1/2 a step}}$$

Notice that this decomposition splits the Langevin dynamics into its Hamiltonian and heat bath parts, which makes it easy to analyze the structural properties of the scheme. A Velocity Verlet integrator is used to approximate the Hamiltonian dynamics. This approximation exactly preserves phase space volume and preserves energy to third-order accuracy per step. Moreover, the solution to the SDE appearing in the half-steps exactly preserves the Boltzmann-Gibbs density.
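A compact sketch of this splitting, with hypothetical function names, separates the two ingredients: an exact Ornstein-Uhlenbeck half-step for the bath and a Velocity Verlet step for the Hamiltonian part. For clarity the Velocity Verlet routine below evaluates the force at both ends of the step; an implementation that stores the force from the previous step needs only one new evaluation.

```python
import math
import random

def ou_half_step(p, mass, gamma, kT, h, rng):
    """Exact solution of the heat bath SDE over half a time step."""
    decay = math.exp(-0.5 * gamma * h)
    scale = math.sqrt(kT * (1.0 - decay * decay))
    return [decay * pi + scale * math.sqrt(mi) * rng.gauss(0.0, 1.0)
            for pi, mi in zip(p, mass)]

def velocity_verlet(q, p, grad_U, mass, h):
    """Velocity Verlet approximation of the Hamiltonian flow over one step."""
    g0 = grad_U(q)
    p_half = [pi - 0.5 * h * gi for pi, gi in zip(p, g0)]
    q_new = [qi + h * ph / mi for qi, ph, mi in zip(q, p_half, mass)]
    g1 = grad_U(q_new)
    p_new = [ph - 0.5 * h * gi for ph, gi in zip(p_half, g1)]
    return q_new, p_new

def bussi_parrinello_step(q, p, grad_U, mass, gamma, kT, h, rng):
    """Bath half-step, Hamiltonian step, bath half-step."""
    p = ou_half_step(p, mass, gamma, kT, h, rng)
    q, p = velocity_verlet(q, p, grad_U, mass, h)
    p = ou_half_step(p, mass, gamma, kT, h, rng)
    return q, p


# Energy drift of the Hamiltonian part after one step on U(q) = q^2 / 2.
h = 0.1
q1, p1 = bussi_parrinello_step([1.0], [0.0], lambda x: list(x), [1.0],
                               0.0, 1.0, h, random.Random(4))
energy_change = 0.5 * p1[0] ** 2 + 0.5 * q1[0] ** 2 - 0.5
```

Because the bath half-steps leave the Boltzmann-Gibbs density invariant, any error in the stationary distribution comes from the energy error of the Velocity Verlet step.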

Since the Velocity Verlet integrator does not exactly preserve energy, the composition above does not exactly preserve the stationary distribution with density in Equation (3). In [90], it was shown that if the derivatives of the potential are all bounded, the Bussi and Parrinello integrator possesses an invariant measure that is $O(h^2)$ close to the Boltzmann-Gibbs distribution. In this same context, the leading order error term in the integrator's approximation to the invariant measure was explicitly determined [91]. Technically speaking, however, these results do not directly apply to MD simulation, since real MD simulation involves potentials whose derivatives are unbounded, e.g., Lennard-Jones forces. As a consequence of this irregularity in the force fields and discretization error, explicit schemes, like this one, may either not detect features of the potential energy properly, which leads to unnoticed, but large errors in dynamic quantities such as the mean first passage time, or may mishandle soft- or hard-core potentials, which leads to numerical instabilities; see the numerical examples in [92]. These numerical artifacts motivate adding a Metropolis acceptance-rejection sub-step to the integrator. In the next section, we show how to Metropolize all of the MD integrators presented in this section. In Section 5, we explain how to generalize the Metropolis-corrected Bussi and Parrinello algorithm to a larger class of diffusion processes.

#### 3. Metropolis-Corrected MD Integrators

Here, we show how to add a Metropolis acceptance-rejection step to a BBK-type scheme and the Bussi and Parrinello splitting scheme and then precisely state the properties of these integrators. We start with a detailed description of each algorithm. Both algorithms require evaluating the acceptance probability given by the usual Metropolis ratio:

$$\alpha(\mathbf{q}, \mathbf{p}, \mathbf{Q}, \mathbf{P}) = \min\left(1, \exp\left(-\frac{1}{kT}(H(\mathbf{Q}, \mathbf{P}) - H(\mathbf{q}, \mathbf{p}))\right)\right) \tag{4}$$
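Equation (4) translates directly into code. The sketch below uses hypothetical function names and assumes the Hamiltonian of Equation (2) with a user-supplied potential `U`; it branches on the sign of the energy change so that the exponential is never evaluated for a positive argument, which avoids overflow when the proposed state has much lower energy.

```python
import math

def hamiltonian(q, p, U, mass):
    """Kinetic plus potential energy, as in Equation (2)."""
    kinetic = sum(pi * pi / (2.0 * mi) for pi, mi in zip(p, mass))
    return kinetic + U(q)

def acceptance_probability(H_old, H_new, kT):
    """Metropolis ratio min(1, exp(-(H_new - H_old) / kT))."""
    dH = H_new - H_old
    if dH <= 0.0:
        return 1.0  # energy did not increase: always accept
    return math.exp(-dH / kT)


# Illustrative values for a one-dimensional harmonic well (hypothetical).
U = lambda q: 0.5 * q[0] ** 2
H0 = hamiltonian([1.0], [2.0], U, [2.0])
alpha = acceptance_probability(H0, H0 + 1.0, 1.0)
```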

The procedure to Metropolize the Ricci and Ciccotti scheme can be found in Section 2 of [70].

Algorithm 3.1 (First-order BBK-type integrator). Given the current state $(Q\_0, P\_0)$ at time t, the algorithm proposes a new state $(Q\_1^\star, P\_1^\star)$ at time t + h for some time step h > 0 via:

$$
\begin{pmatrix} Q_1^\star \\ P_1^\star \end{pmatrix} = \begin{pmatrix} Q_0 + m^{-1} \left( hP_0 - \frac{h^2}{2} \nabla U(Q_0) \right) \\ P_0 - \frac{h}{2} \left( \nabla U(Q_0) + \nabla U(Q_1^\star) \right) \end{pmatrix} \tag{\text{Step 1}}
$$

This "proposal move" $(\mathbf{Q}_1^\star, \mathbf{P}_1^\star)$ is then accepted or rejected:

$$
\begin{pmatrix} \tilde{Q}\_1\\ \tilde{P}\_1 \end{pmatrix} = x \begin{pmatrix} Q\_1^\star\\ P\_1^\star \end{pmatrix} + (1-x) \begin{pmatrix} Q\_0\\ -P\_0 \end{pmatrix} \tag{\text{Step 2}}
$$

where $x$ is a Bernoulli random variable with parameter $\alpha(\mathbf{Q}_0, \mathbf{P}_0, \mathbf{Q}_1^\star, \mathbf{P}_1^\star)$ given by Equation (4). The actual update of the system is taken to be:

$$
\begin{pmatrix} Q\_1 \\ P\_1 \end{pmatrix} = \begin{pmatrix} \tilde{Q}\_1 \\ \exp(-\gamma h)\tilde{P}\_1 + \sqrt{kT}\sqrt{1 - \exp(-2\gamma h)}m^{1/2}\xi \end{pmatrix} \tag{\text{Step 3}}
$$

Here, $\boldsymbol{\xi} \in \mathbb{R}^n$ denotes a Gaussian random vector with mean zero and covariance $\mathbb{E}(\xi_i \xi_j) = \delta_{ij}$; the thermal factor $\sqrt{kT}$ already appears explicitly in (Step 3), as it does in Algorithm 3.2.

The momenta of the molecules get reversed if a move is rejected in Step 2 of Algorithm 3.1. This momentum flip is necessary for the algorithm to preserve the correct stationary distribution [70,71], but results in an $O(1)$ error in dynamics. High acceptance rates are therefore needed to ensure that the time lag between successive rejections is long enough for the approximation to capture the desired dynamics. Since the acceptance probability in Equation (4) depends on how well the Verlet integrator in (Step 1) preserves energy after a single step, the rejection rate is $O(h^3)$. Thus, in practice, we find that a time step that sufficiently resolves the desired dynamics often automatically yields a sufficiently high acceptance rate. Each step of this algorithm requires: evaluating the atomic force field once, at the proposed position in the momentum update of (Step 1), generating a Bernoulli random variable with parameter $\alpha$ in (Step 2) and generating an $n$-dimensional Gaussian vector in (Step 3). We stress that (Step 2) in Algorithm 3.1 is all that is needed to get MD integrators to exactly preserve the Boltzmann-Gibbs density in Equation (3).
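The three steps of Algorithm 3.1 can be sketched for a single degree of freedom as follows. This is a minimal Python illustration, not the paper's MATLAB code: the function names and the callables `U` and `grad_U` are assumptions introduced here, and for simplicity the sketch recomputes the force at the current position instead of carrying it over from the previous step.

```python
import math
import random


def metropolis_alpha(H_old, H_new, kT):
    """Acceptance probability of Equation (4), written in terms of energies."""
    dH = H_new - H_old
    return 1.0 if dH <= 0.0 else math.exp(-dH / kT)


def bbk_metropolis_step(q0, p0, h, gamma, kT, m, U, grad_U, rng):
    """One step of Algorithm 3.1 for one degree of freedom.

    U and grad_U are user-supplied callables (hypothetical names).
    Returns (q1, p1, accepted).
    """
    # Step 1: velocity Verlet proposal.
    g0 = grad_U(q0)  # a production code would reuse the stored force here
    q_star = q0 + (h * p0 - 0.5 * h * h * g0) / m
    p_star = p0 - 0.5 * h * (g0 + grad_U(q_star))

    # Step 2: accept the proposal, or keep the position and flip the momentum.
    H0 = 0.5 * p0 * p0 / m + U(q0)
    H1 = 0.5 * p_star * p_star / m + U(q_star)
    accepted = rng.random() < metropolis_alpha(H0, H1, kT)
    q_t, p_t = (q_star, p_star) if accepted else (q0, -p0)

    # Step 3: exact Ornstein-Uhlenbeck update of the momentum over time h.
    c = math.exp(-gamma * h)
    p1 = c * p_t + math.sqrt(kT * (1.0 - c * c) * m) * rng.gauss(0.0, 1.0)
    return q_t, p1, accepted
```

Setting `gamma = 0` switches the heat bath off, which makes the step deterministic apart from the accept/reject draw and is a convenient way to check the Verlet proposal and the momentum flip in isolation.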

Next, we show how to Metropolize the Bussi and Parrinello splitting integrator.

Algorithm 3.2 (Second-order Bussi and Parrinello integrator). Let $\boldsymbol{\xi}, \boldsymbol{\eta} \in \mathbb{R}^n$ be two independent Gaussian random vectors with mean zero and covariance $\mathbb{E}(\xi_i \xi_j) = \mathbb{E}(\eta_i \eta_j) = \delta_{ij}$. Given a time step size $h$ and the current state $(\mathbf{Q}_0, \mathbf{P}_0)$ at time $t$, the algorithm takes a half-step of the heat bath dynamics:

$$
\begin{pmatrix}
\tilde{\mathbf{Q}}_0 \\
\tilde{\mathbf{P}}_0
\end{pmatrix} = \begin{pmatrix}
\mathbf{Q}_0 \\
\exp(-\gamma h/2)\mathbf{P}_0 + \sqrt{kT}\sqrt{1-\exp(-\gamma h)}m^{1/2}\xi
\end{pmatrix} \tag{\text{Step 1}}
$$

This is followed by a full step of Verlet to compute a proposal move $(\tilde{\mathbf{Q}}_1^\star, \tilde{\mathbf{P}}_1^\star)$:

$$
\begin{pmatrix} \tilde{\mathbf{Q}}_1^\star \\ \tilde{\mathbf{P}}_1^\star \end{pmatrix} = \begin{pmatrix} \tilde{\mathbf{Q}}_0 + m^{-1} \left( h \tilde{\mathbf{P}}_0 - \frac{h^2}{2} \nabla U(\tilde{\mathbf{Q}}_0) \right) \\ \tilde{\mathbf{P}}_0 - \frac{h}{2} \left( \nabla U(\tilde{\mathbf{Q}}_0) + \nabla U(\tilde{\mathbf{Q}}_1^\star) \right) \end{pmatrix} \tag{\text{Step 2}}
$$

This proposal move $(\tilde{\mathbf{Q}}_1^\star, \tilde{\mathbf{P}}_1^\star)$ is then accepted or rejected:

$$
\begin{pmatrix} \tilde{Q}\_1 \\ \tilde{P}\_1 \end{pmatrix} = x \begin{pmatrix} \tilde{Q}\_1^\star \\ \tilde{P}\_1^\star \end{pmatrix} + (1 - x) \begin{pmatrix} \tilde{Q}\_0 \\ -\tilde{P}\_0 \end{pmatrix} \tag{\text{Step 3}}
$$

where $x$ is a Bernoulli random variable with parameter $\alpha(\tilde{\mathbf{Q}}_0, \tilde{\mathbf{P}}_0, \tilde{\mathbf{Q}}_1^\star, \tilde{\mathbf{P}}_1^\star)$ given by Equation (4). The actual update of the system at time $t + h$ is taken to be:

$$
\begin{pmatrix} Q\_1 \\ P\_1 \end{pmatrix} = \begin{pmatrix} \tilde{Q}\_1 \\ \exp(-\gamma h/2)\tilde{P}\_1 + \sqrt{kT}\sqrt{1 - \exp(-\gamma h)}m^{1/2}\eta \end{pmatrix} \tag{\text{Step 4}}
$$

This algorithm requires generating two independent $n$-dimensional Gaussian vectors per step. Thus, it is more costly than Algorithm 3.1. However, the advantage of doing this is that the resulting Metropolis-corrected algorithm is second-order weakly accurate, as the following Proposition states.

Proposition 3.3. *Let* $(\mathbf{Q}_n, \mathbf{P}_n)$ *represent the numerical approximation produced by Algorithm 3.2 at time* $nh$ *with the same initial condition as the true solution:* $(\mathbf{Q}_0, \mathbf{P}_0) = (\mathbf{q}(0), \mathbf{p}(0))$*. For every time interval* $T > 0$ *and for suitable observables* $f(\mathbf{q}, \mathbf{p})$*, there exists a* $C(T) > 0$*, such that:*

$$|\mathbb{E}f(\mathbf{q}(\lfloor t/h \rfloor h), \mathbf{p}(\lfloor t/h \rfloor h)) - \mathbb{E}f(\mathbf{Q}\_{\lfloor t/h \rfloor}, \mathbf{P}\_{\lfloor t/h \rfloor})| \le C(T)h^2 \tag{5}$$

*for all* t<T*.*

This notion of accuracy is sufficient for computing means and correlation functions at finite times, as well as equilibrium correlations. Figure 2 numerically illustrates this Proposition by checking the weak accuracy of Algorithms 3.1 and 3.2 on a harmonic oscillator test problem.

To be specific, Figure 2 plots the weak accuracy of the Metropolis-corrected MD integrators with respect to the true solution of the Langevin dynamics of a harmonic oscillator: $\dot{q}(t) = p(t)$, $dp(t) = -q(t)\,dt - p(t)\,dt + \sqrt{2}\,dw(t)$, with initial condition $q(0) = 1.0$, $p(0) = 0$. The time steps tested are $h = 2^{-n}$, where $n$ is given on the x-axis. The quantity monitored for the error is the estimate of $\mathbb{E}(q(1)^2 + p(1)^2) = 1.699445410$ computed analytically. The dashed and solid curves are the graphs of $2^{-n}$ $(= h)$ and $2^{-2n}$ $(= h^2)$ *versus* $n$, respectively.
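The reference value quoted above can be reproduced independently. The following Python sketch (not part of the paper's listings; all function names are illustrative) uses the fact that, for this linear SDE, Itô's formula gives a closed system for the second moments $a = \mathbb{E}q^2$, $b = \mathbb{E}qp$, $c = \mathbb{E}p^2$, namely $\dot{a} = 2b$, $\dot{b} = c - a - b$, $\dot{c} = -2b - 2c + 2$, with $a(0) = 1$ and $b(0) = c(0) = 0$. It integrates this system with a classical Runge-Kutta scheme and compares against a closed-form expression derived here under the same assumptions.

```python
import math


def moment_rhs(y):
    """Right-hand side of the second-moment equations; y = (E q^2, E qp, E p^2)."""
    a, b, c = y
    return (2.0 * b, c - a - b, -2.0 * b - 2.0 * c + 2.0)


def second_moment_rk4(t_end=1.0, n_steps=1000):
    """Integrate the moment equations with classical RK4; return E(q^2 + p^2)."""
    y = (1.0, 0.0, 0.0)  # q(0) = 1, p(0) = 0, both deterministic
    dt = t_end / n_steps
    for _ in range(n_steps):
        k1 = moment_rhs(y)
        k2 = moment_rhs(tuple(u + 0.5 * dt * k for u, k in zip(y, k1)))
        k3 = moment_rhs(tuple(u + 0.5 * dt * k for u, k in zip(y, k2)))
        k4 = moment_rhs(tuple(u + dt * k for u, k in zip(y, k3)))
        y = tuple(u + dt * (r1 + 2.0 * r2 + 2.0 * r3 + r4) / 6.0
                  for u, r1, r2, r3, r4 in zip(y, k1, k2, k3, k4))
    return y[0] + y[2]


def second_moment_closed_form(t=1.0):
    """Closed form 2 - |exp(Mt) e_2|^2, valid for this initial condition only."""
    w = math.sqrt(3.0) / 2.0  # frequency of the damped oscillator
    s, c = math.sin(w * t), math.cos(w * t)
    return 2.0 - math.exp(-t) * ((s / w) ** 2 + (c - s / (2.0 * w)) ** 2)


if __name__ == "__main__":
    print(second_moment_rk4(), second_moment_closed_form())
```

Both routes agree with the value 1.699445410 quoted in the text to the digits shown, which gives some confidence in the reference used for the error curves.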

Proof. The desired single-step error estimate can be obtained from an application of the triangle inequality:

$$|\mathbb{E}f(\mathbf{q}(h),\mathbf{p}(h)) - \mathbb{E}f(\mathbf{Q}\_1,\mathbf{P}\_1)| \le |\mathbb{E}f(\mathbf{q}(h),\mathbf{p}(h)) - \mathbb{E}f(\hat{\mathbf{Q}}\_1,\hat{\mathbf{P}}\_1)| + |\mathbb{E}f(\hat{\mathbf{Q}}\_1,\hat{\mathbf{P}}\_1) - \mathbb{E}f(\mathbf{Q}\_1,\mathbf{P}\_1)|\tag{6}$$

where $(\hat{\mathbf{Q}}_1, \hat{\mathbf{P}}_1)$ denotes one step of the uncorrected Bussi and Parrinello scheme with $(\hat{\mathbf{Q}}_0, \hat{\mathbf{P}}_0) = (\mathbf{q}(0), \mathbf{p}(0))$. The first term in the upper bound in Equation (6) is $O(h^3)$, since the unadjusted scheme is a Strang splitting of Equation (1). To bound the second term in Equation (6), note that:

$$\mathbb{E}f(\boldsymbol{Q}\_1, \boldsymbol{P}\_1) - \mathbb{E}f(\boldsymbol{\hat{Q}}\_1, \boldsymbol{\hat{P}}\_1) = \mathbb{E}\left\{ \left( \bar{f}(\boldsymbol{\tilde{Q}}\_1^\star, \boldsymbol{\tilde{P}}\_1^\star) - \bar{f}(\boldsymbol{\tilde{Q}}\_0, -\boldsymbol{\tilde{P}}\_0) \right) \left( \alpha(\boldsymbol{\tilde{Q}}\_0, \boldsymbol{\tilde{P}}\_0, \boldsymbol{\tilde{Q}}\_1^\star, \boldsymbol{\tilde{P}}\_1^\star) - 1 \right) \right\}$$

where we have introduced the auxiliary function:

$$\bar{f}(\mathbf{q}, \mathbf{p}) = \mathbb{E}f(\mathbf{q}, \exp(-\gamma h/2)\mathbf{p} + \sqrt{kT}\sqrt{1 - \exp(-\gamma h)}\mathbf{m}^{1/2}\eta)$$

Since the rejection rate is $O(h^3)$, it follows from the above expression that the second term in the upper bound of Equation (6) is also $O(h^3)$. Standard results in numerical analysis for SDEs then imply that the algorithm converges weakly on finite-time intervals with global order two; see, for instance, [93] (Chapter 2.2).

For completeness, we also provide a statement that both algorithms are ergodic.

Proposition 3.4. *Let* $(\mathbf{Q}_n, \mathbf{P}_n)$ *be the numerical approximation produced by Algorithms 3.1 or 3.2 at time* $nh$*. Then, for suitable observables* $f(\mathbf{q}, \mathbf{p})$*:*

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(\mathbf{Q}_{\lfloor t/h \rfloor}, \mathbf{P}_{\lfloor t/h \rfloor})\, dt = \int_{\mathbb{R}^{2n}} f(\mathbf{q}, \mathbf{p}) \nu(\mathbf{q}, \mathbf{p})\, d\mathbf{q}\, d\mathbf{p} \tag{7}$$

*Here,* $\nu(\mathbf{q}, \mathbf{p})$ *denotes the Boltzmann-Gibbs density defined in Equation* (3)*.*

A proof of this Proposition can be found in [72].
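Algorithm 3.2 can be sketched in the same spirit for a single degree of freedom. The Python function below is an illustration under stated assumptions rather than the paper's implementation: `bp_metropolis_step`, `U` and `grad_U` are names introduced here, and the force is recomputed at the start of each step instead of being stored. Because the Metropolis step preserves the Boltzmann-Gibbs density, a long run on the harmonic oscillator of Figure 2 should give a time average of $q^2 + p^2$ close to $2kT$, in line with Proposition 3.4.

```python
import math
import random


def bp_metropolis_step(q0, p0, h, gamma, kT, m, U, grad_U, rng):
    """One step of Algorithm 3.2 for one degree of freedom.

    Returns (q1, p1, accepted). U and grad_U are hypothetical callables.
    """
    c = math.exp(-0.5 * gamma * h)           # plays the role of f1 in Listing 2
    s = math.sqrt(kT * (1.0 - c * c) * m)    # f2 in Listing 2 when m = 1

    # Step 1: half-step of the heat bath acting on the momentum.
    qt0 = q0
    pt0 = c * p0 + s * rng.gauss(0.0, 1.0)

    # Step 2: velocity Verlet proposal started from (qt0, pt0).
    g0 = grad_U(qt0)
    q_star = qt0 + (h * pt0 - 0.5 * h * h * g0) / m
    p_star = pt0 - 0.5 * h * (g0 + grad_U(q_star))

    # Step 3: Metropolis test on the energy change; flip momentum on rejection.
    dH = (0.5 * p_star * p_star / m + U(q_star)) - (0.5 * pt0 * pt0 / m + U(qt0))
    alpha = 1.0 if dH <= 0.0 else math.exp(-dH / kT)
    accepted = rng.random() < alpha
    qt1, pt1 = (q_star, p_star) if accepted else (qt0, -pt0)

    # Step 4: second half-step of the heat bath with an independent Gaussian.
    p1 = c * pt1 + s * rng.gauss(0.0, 1.0)
    return qt1, p1, accepted
```

The two Gaussian draws per step are what distinguish this scheme from Algorithm 3.1; the symmetric placement of the two half-steps around the Verlet proposal is the Strang structure used in the proof of Proposition 3.3.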

#### 4. Application to Lennard-Jones Fluid

Listing 1 translates Algorithm 3.2 into the MATLAB language. Intrinsically defined MATLAB functions appear in boldface. The algorithm uses MATLAB's built-in random number generators to carry out Step 1, Step 3 and Step 4. In particular, the Bernoulli random variable, x, in Step 3 is generated in Line 20, and the Gaussian vectors in Step 1 and Step 4 are generated in Line 9 and Line 29, respectively. In addition to updating the positions and momenta of the system, the program also stores the previous value of the potential energy and force, so that the force and potential energy are evaluated just once per simulation step, in Line 15. This evaluation calls a MEX function, which inputs the current position of the molecular system and outputs the force field and potential energy at that position. We use a MEX function, because the atomistic force field evaluation cannot be easily vectorized and is, by far, the most computationally demanding step in MD. The PreProcessing script file called in Line 2 defines the physical and numerical parameters, sets the initial condition and allocates space for storing simulation data. Sample averages are updated as new points on the trajectory are produced in the UpdateSampleAverages script file invoked in Line 35. Finally, the outputs produced by the algorithm are handled by the PostProcessing script file in Line 39.

Let us consider a concrete example: a Lennard-Jones fluid that consists of $N$ identical atoms [1–3]. The configuration space of this system is a fixed cubic box with periodic boundary conditions. The distance between the $i$-th and $j$-th particle is defined according to the minimum image convention, which states that the distance between $\mathbf{q}_i$ and $\mathbf{q}_j$ in a cubic box of length $\ell$ is:

$$d_{MD}(\mathbf{q}_i, \mathbf{q}_j) \stackrel{\text{def}}{=} \left| (\mathbf{q}_i - \mathbf{q}_j) - \ell \left\lfloor (\mathbf{q}_i - \mathbf{q}_j)/\ell \right\rceil \right| \tag{8}$$

where $\lfloor \cdot \rceil$ is the nearest integer function, applied componentwise. In terms of this distance, the total potential energy is a sum over all pairs:

$$U(\mathbf{q}) = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} U_{LJ}(d_{MD}(\mathbf{q}_i, \mathbf{q}_j)) \tag{9}$$

where $U_{LJ}(r)$ is the following truncated Lennard-Jones potential function:

$$U_{LJ}(r) = \begin{cases} f(r) - f(r_c), & r < r_c \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

#### Listing 1. Metropolized MD Integrator: MDintegrator.m

```
1
2 PreProcessing;
3
4 for i = 1:Ns
5
6 %--- Step 1 --- Heat Bath Step
7
8 tQ0=Q0;
9 tP0=f1*P0+f2*randn(3*Nm,1);
10
11 %--- Step 2 --- Velocity Verlet Proposal
12
13 Ppt5=tP0+0.5*h*F0;
14 tQ1star=tQ0+h*Ppt5;
15 [tF1star,tU1star]=ForceFieldmex(tQ1star,Nm,rcut2,ell);
16 tP1star=Ppt5+0.5*h*tF1star;
17
18 %--- Step 3 --- Accept or Refuse Step
19
20 x=(rand<exp(-(0.5*tP1star'*tP1star-0.5*tP0'*tP0+tU1star-U0)/kT));
21
22 tP1=x*tP1star-(1-x)*tP0; % on rejection: keep old state, flip momentum
23 tQ1=x*tQ1star+(1-x)*tQ0;
24 F1=x*tF1star+(1-x)*F0; U1=x*tU1star+(1-x)*U0;
25
26 %--- Step 4 --- Heat Bath Step
27
28 Q1=tQ1;
29 P1=f1*tP1+f2*randn(3*Nm,1);
30
31 %--- iterate
32
33 Q0=Q1; P0=P1; F0=F1; U0=U1;
34
35 UpdateSampleAverages;
36
37 end
38
39 PostProcessing;
```

#### Listing 2. Metropolized MD Integrator: PreProcessing.m

```
1 %--- seed random # generator
2
3 rng(123);
4
5 %--- physical parameters
6
7 rho=0.6; % density
8 kT=0.5; % temperature factor
9 gama=0.1; % heat bath parameter
10 Nm=500; % # of molecules
11 T=2.0; % time span for velocity correlation
12 ell=(Nm/rho)^(1/3); % length of cubic box
13
14 %--- simulation parameters
15
16 h=0.005; % time-step size
17 Ns=1e3; % # of steps
18 rcut = 2.0^(1/6); % cutoff radius
19 rcut2 = rcut*rcut;
20
21 f1=exp(-0.5*gama*h); f2=sqrt((1.0-exp(-gama*h))*kT);
22
23 %--- initial condition
24
25 A=fcclattice(Nm,ell);
26 Q0=reshape(A, [3*Nm 1]); % atoms on an fcc lattice
27 P0=zeros(3*Nm,1); % atoms at rest
28
29 %--- initialize statistics
30
31 NA=ceil(T/h)+1; % preallocate space for
32 acf=zeros(NA,1); % online correlation computation
33 varacf=zeros(NA,1);
34 pivot=zeros(NA,3*Nm);
35 nacf=zeros(NA,1);
36
37 AP=zeros(Ns,1); % vector of acceptance probabilities
38
39 [F0,U0]=ForceFieldmex(Q0,Nm,rcut2,ell); % initial force & energy
```
Here, $f(r) = 4(1/r^{12} - 1/r^{6})$ and $r_c$ is the cutoff radius, which is bounded above by the size of the simulation box; and we have used dimensionless units to describe this system, where energy is rescaled by the depth of the Lennard-Jones potential energy and length by the point where the potential energy is zero. The error introduced by the truncation in Equation (10) is proportional to the density of the molecular system and can be made arbitrarily small by selecting the cutoff distance to be sufficiently large. A direct evaluation of the potential force, $\nabla U(\mathbf{q})$, scales like $O(N^2)$, and typically dominates the total computational cost. In practice, neighbor lists (also called Verlet lists) and cell lists are used in order to obtain a force evaluation that scales linearly with system size. Since the system we consider will have just a few hundred atoms, there is, however, little advantage to using these data structures, or using a fast force field evaluation, and thus, ForceFieldmex evaluates the force and energy using a sum over all particle pairs.
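Equations (8) to (10) translate almost line by line into code. The Python sketch below is only an illustration of the direct $O(N^2)$ pair sum; it is not the ForceFieldmex routine, whose source is not shown in the paper, and the function names are assumptions made here. It returns the energy only, whereas the MEX function also returns the force.

```python
import math


def minimum_image_distance(qi, qj, ell):
    """Distance of Equation (8) in a periodic cubic box of side ell."""
    d2 = 0.0
    for a, b in zip(qi, qj):
        d = a - b
        d -= ell * round(d / ell)  # nearest-integer function, per component
        d2 += d * d
    return math.sqrt(d2)


def lj(r):
    """f(r) = 4 (1/r^12 - 1/r^6) in the dimensionless units of the text."""
    inv6 = 1.0 / r ** 6
    return 4.0 * (inv6 * inv6 - inv6)


def truncated_lj(r, rc):
    """Truncated and shifted pair potential of Equation (10)."""
    return lj(r) - lj(rc) if r < rc else 0.0


def total_energy(positions, ell, rc):
    """Direct sum over all pairs, Equation (9); cost grows like N^2."""
    n = len(positions)
    u = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            r = minimum_image_distance(positions[i], positions[j], ell)
            u += truncated_lj(r, rc)
    return u
```

With the cutoff $r_c = 2^{1/6}$ used in Listing 2, the shift $f(r_c) = -1$ makes the pair potential purely repulsive and continuous at the cutoff, which is the soft-sphere setting referred to in the caption of Figure 4.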


Table 1. Simulation parameters.

Listing 2 shows the PreProcessing script, which sets the parameters provided in Table 1 and constructs the initial condition, where the $N$ atoms are assumed to be at rest and on the sites of a face-centered cubic lattice. The command, rng(123), on *Line 3* sets the seed of the random number generator functions, RAND and RANDN. The acceptance rates at every step and the velocity autocorrelation are updated in the UpdateSampleAverages script shown in Listing 3. The mean acceptance rate, which is output in the PostProcessing script shown in Listing 4, must be high enough to ensure that the dynamics is accurately represented.

To compute the autocorrelation of an observable over a time interval of length $T$, the value of that observable along the entire trajectory is not needed. In fact, it suffices to use the values of this observable along a piece of trajectory over a moving time-window $[t_i, t_i + T]$, where $t_i = i \times h$. This storage space is allocated in PreProcessing and is updated in UpdateSampleAverages. More precisely, the molecular velocities are stored in the pivot array from $i - N_a$ to $i$, where $i$ is the index of the current position and $N_a = \lceil T/h \rceil + 1$. Notice that velocity autocorrelations are not computed until after the index, $i$, exceeds $10^4$. This *equilibration time* removes some of the statistical bias that may arise from using a non-random initial condition.

Short-time trajectories of this molecular system are plotted in Figure 3 from an initial condition where atoms are placed on the sites of a face-centered cubic lattice and at rest. The trajectory is computed using the numerical and physical parameters indicated in Table 1, with the exception of the number of steps, which is set equal to $N_s = 1000$. Notice that at lower densities particle trajectories are more diffusive and less localized.

Using the parameters provided in Table 1, we compute velocity autocorrelations for a range of density values in Figure 4. Since the heat bath parameter is set to a small value, these figures are in qualitative agreement with those obtained by simulating the molecular system with no heat bath as shown in Figure 5.2 of [3].
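The moving-window bookkeeping described above can be sketched as follows. This Python class is a simplified illustration, not a translation of Listing 3: the class and attribute names are assumptions, the running variance kept by the MATLAB script is omitted, and the equilibration cutoff is left to the caller. It keeps only the last `n_lags` snapshots in a circular buffer and updates a running mean of the lagged products as each new snapshot arrives.

```python
class WindowedAutocorrelation:
    """Online autocorrelation over a moving window of n_lags snapshots."""

    def __init__(self, n_lags):
        self.n_lags = n_lags
        self.buffer = [None] * n_lags   # circular buffer (the `pivot` array)
        self.mean = [0.0] * n_lags      # running mean per lag (the `acf` array)
        self.count = [0] * n_lags       # samples per lag (the `nacf` array)
        self.n_seen = 0

    def update(self, v):
        """Add a snapshot v (a sequence of floats, e.g. all momenta)."""
        slot = self.n_seen % self.n_lags
        self.buffer[slot] = list(v)
        self.n_seen += 1
        # Only lags for which a past snapshot is already stored are updated.
        for lag in range(min(self.n_seen, self.n_lags)):
            past = self.buffer[(slot - lag) % self.n_lags]
            sample = sum(a * b for a, b in zip(past, v)) / len(v)
            self.count[lag] += 1
            self.mean[lag] += (sample - self.mean[lag]) / self.count[lag]
```

Memory use is proportional to the window length times the snapshot size, independent of the total number of simulation steps, which is the point of the moving-window construction.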

#### Listing 3. Metropolized MD Integrator: UpdateSampleAverages.m

```
1 %--- store acceptance probability
2
3 AP(i)=x;
4
5 %--- update correlation function
6
7 if (i>1e4)
8
9 pp=mod(i-1,NA)+1;
10 pivot(pp,:)=P0;
11
12 for j=1:min(i,NA)
13 nacf(j)=nacf(j)+1;
14 mui=acf(j);
15 vari=varacf(j);
16 n_samples=nacf(j);
17 xip1=pivot(mod(pp-j,NA)+1,:)*pivot(pp,:)'/(3.0*Nm);
18 acf(j)=mui+(xip1-mui)/n_samples;
19 varacf(j)=((n_samples-1)*vari+...
```

#### Listing 4. Metropolized MD Integrator: PostProcessing.m

```
1 %--- output results
2
3 disp(['h=' num2str(h) ',<AP>=' num2str(mean(AP))]);
4
5 figure(2); clf; hold on; tt=0:h:T;
6 errorbar(tt,acf,1.96*sqrt(varacf)./sqrt(nacf));
7
8 save('VelocityAutocorrelation.mat', 'tt', 'acf', 'varacf');
```
Figure 3. Atomic trajectories in a simulation box.

Figure 4. Soft-sphere velocity autocorrelation functions. A reproduction of Figure 5.2 of [3] using Langevin dynamics with heat bath parameter γ = 0.01. The remaining parameters are set equal to those provided in Table 1. The negative correlations at higher densities are consistent with what has been found in the literature [6,8].

#### 5. General Case

Here, we show how the preceding ideas extend to other molecular systems that obey stochastic differential equations. In the process, we generalize the Metropolized Bussi and Parrinello integrator (Algorithm 3.2) to a large class of diffusion processes, including the v-rescale thermostat. We begin with the underlying Hamiltonian dynamics of a molecular system.

#### *5.1. Bath-Free Dynamics*

MD is based on Hamilton's equations for a Hamiltonian $H : \mathbb{R}^{2d} \to \mathbb{R}$:

$$\dot{\mathbf{z}}(t) = \mathbf{J} \nabla H(\mathbf{z}(t)) \; , \; \mathbf{z}(0) \in \mathbb{R}^{2d} \tag{11}$$

where $\mathbf{z}(t) = (\mathbf{q}(t), \mathbf{p}(t))$ is a vector of molecular positions $\mathbf{q}(t) \in \mathbb{R}^d$ and momenta $\mathbf{p}(t) \in \mathbb{R}^d$ and $\mathbf{J}$ is the $2d \times 2d$ skew-symmetric matrix defined as:

$$\mathbf{J} = \begin{pmatrix} \mathbf{0}\_{d \times d} & \mathbf{I}\_{d \times d} \\ -\mathbf{I}\_{d \times d} & \mathbf{0}\_{d \times d} \end{pmatrix} \tag{12}$$

The Hamiltonian, $H(\mathbf{z})$, represents the total energy of the molecular system and is typically "separable", meaning that it can be written as:

$$H(\mathbf{z}) = K(\mathbf{p}) + U(\mathbf{q}) \,, \; z = (\mathbf{q}, \mathbf{p}) \tag{13}$$

where $K(\mathbf{p})$ and $U(\mathbf{q})$ are the kinetic and potential energy functions, respectively [94]. In MD, the kinetic energy function is a positive definite quadratic form, and the potential energy function involves "fudge factors" determined from experimental or quantum mechanical studies of pieces of the molecular system of interest [36]. The accuracy of the resulting energy function must be systematically verified by comparing MD simulation data to experimental data [95]. The flow that Equation (11) determines has the following structure:

(S1) volume-preserving (since the vector-field in Equation (11) is divergenceless); and

(S2) energy-preserving (since *J* is skew-symmetric and constant).

Explicit *symplectic integrators*, like the Verlet scheme, exploit these properties to obtain long-time stable schemes for Hamilton's equations [96,97].
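Properties (S1) and (S2) can be probed numerically for the Verlet scheme. The Python sketch below is an illustrative check under assumptions made here (a one-dimensional pendulum with $U(q) = -\cos q$, unit mass, and hypothetical function names): it estimates the Jacobian determinant of one Verlet step by central finite differences, which should equal one up to differencing error, and records the largest energy deviation along a trajectory, which stays small and bounded but is not zero, since Verlet preserves energy only approximately.

```python
import math


def verlet_step(q, p, h, grad_U, m=1.0):
    """One velocity Verlet step for a separable Hamiltonian."""
    p_half = p - 0.5 * h * grad_U(q)
    q_new = q + h * p_half / m
    p_new = p_half - 0.5 * h * grad_U(q_new)
    return q_new, p_new


def jacobian_determinant(q, p, h, grad_U, eps=1e-6):
    """Finite-difference estimate of det d(q1, p1)/d(q0, p0) for one step."""
    qa, pa = verlet_step(q + eps, p, h, grad_U)
    qb, pb = verlet_step(q - eps, p, h, grad_U)
    qc, pc = verlet_step(q, p + eps, h, grad_U)
    qd, pd = verlet_step(q, p - eps, h, grad_U)
    dq_dq, dp_dq = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    dq_dp, dp_dp = (qc - qd) / (2 * eps), (pc - pd) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq


def max_energy_deviation(q, p, h, n_steps, U, grad_U):
    """Largest |H - H(0)| seen along n_steps of Verlet (unit mass)."""
    H0 = 0.5 * p * p + U(q)
    worst = 0.0
    for _ in range(n_steps):
        q, p = verlet_step(q, p, h, grad_U)
        worst = max(worst, abs(0.5 * p * p + U(q) - H0))
    return worst
```

For example, `jacobian_determinant(1.0, 0.3, 0.1, math.sin)` probes area preservation for the pendulum, and `max_energy_deviation(1.0, 0.0, 0.1, 10000, lambda q: -math.cos(q), math.sin)` probes the long-time energy behavior.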

#### *5.2. Governing Stochastic Dynamics*

In order to mimic experimental conditions, Equation (11) is often coupled to a bath that puts the system at constant temperature and/or pressure. The standard way to do this is to assume that the system with a bath is governed by a stochastic ordinary differential equation (SDE) of the type:

$$\boxed{d\mathbf{Y}(t) = \underbrace{\mathbf{A}(\mathbf{Y}(t))\,dt}_{\text{deterministic drift}} + \underbrace{(\operatorname{div}\mathbf{D})(\mathbf{Y}(t))\,dt + \sqrt{2kT}\,\mathbf{B}(\mathbf{Y}(t))\,d\mathbf{W}(t)}_{\text{heat bath}}} \tag{14}$$

Here, we have introduced the following notation.


The $n \times n$ *diffusion matrix*, $\mathbf{D}(\mathbf{x})$, is defined in terms of the noise coefficient matrix, $\mathbf{B}(\mathbf{x})$, as:

$$D(x) \stackrel{\text{def}}{=} kTB(x)B(x)^T, \text{ for all } x \in \mathbb{R}^n \tag{15}$$

where $\mathbf{B}(\mathbf{x})^T$ denotes the transpose of the real matrix, $\mathbf{B}(\mathbf{x})$. The diffusion matrix is symmetric and nonnegative definite. Depending on the particular bath that is used, the dimension, $n$, of $\mathbf{Y}(t)$ in Equation (14) is related to the dimension, $2d$, of $\mathbf{z}(t)$ in Equation (11) by the inequality: $n \ge 2d$. For example, in Nosé-Hoover Langevin dynamics, a single bath degree of freedom is added to Equation (11), so that $n = 2d + 1$, while in Langevin dynamics, the effect of the bath is modeled by added friction and Brownian forces that keep $n = 2d$. The Langevin Equation (1) can be put in the form of Equation (14) by letting $\mathbf{x} = (\mathbf{q}, \mathbf{p})$,

$$A(x) = \begin{pmatrix} m^{-1}p \\ -\nabla U(\mathbf{q}) - \gamma \mathbf{p} \end{pmatrix}, \quad \mathbf{B} = \sqrt{\gamma} \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & m^{1/2} \end{pmatrix}, \text{ and } \mathbf{W} = (w\_1, \dots, w\_N) \tag{16}$$

where $\mathbf{m} = \operatorname{diag}(m_1, \cdots, m_N)$.

Equation (14) generates a stochastic process, $\mathbf{Y}(t)$, that is a Markov diffusion process. We assume that this diffusion process admits a *stationary* distribution $\mu(d\mathbf{x})$, *i.e.*, a probability distribution preserved by the dynamics [98,99]. We denote by $\nu(\mathbf{x})$ the density of this distribution. Even though the diffusion matrix in Equation (15) is not necessarily positive definite, one can use Hörmander's condition to prove that the process, $\mathbf{Y}(t)$, is an ergodic process with a unique stationary distribution [100,101]. By the ergodic theorem, it then follows that:

$$\frac{1}{T} \int\_{0}^{T} f(\mathbf{Y}(t)) dt \to \int\_{\mathbb{R}^{n}} f(x) \nu(x) dx \,, \quad \text{as } T \to \infty, \quad \text{a.s.} \tag{17}$$

where $f(\mathbf{x})$ is a suitable test function.

The evolution of the probability density of the law of $\mathbf{Y}(t)$ at time $t$, $\rho(t, \mathbf{x})$, satisfies the Fokker-Planck equation:

$$-\frac{\partial \rho}{\partial t} + L\rho = 0\tag{18}$$

where $\rho(0, \cdot)$ is the density of the initial distribution, $\mathbf{Y}(0) \sim \rho(0, \cdot)$, and $L$ is defined as the following second-order partial differential operator:

$$(Lf)(x) \stackrel{\text{def}}{=} \text{div}\left(\text{div}(\mathbf{D}(x)f(x)) - A(x)f(x)\right) \tag{19}$$

Since $\mu(d\mathbf{x}) = \nu(\mathbf{x})\,d\mathbf{x}$ is a stationary distribution of $\mathbf{Y}(t)$, the probability density, $\nu(\mathbf{x})$, is a steady-state solution of Equation (18), *i.e.*, it satisfies:

$$(L\nu)(x) = 0\tag{20}$$

Define the *probability current* as the vector field:

$$j(x) \stackrel{\text{def}}{=} \text{div}(\mathbf{D}(x)\boldsymbol{\nu}(x)) - A(x)\boldsymbol{\nu}(x) \tag{21}$$

The stationarity condition in Equation (20) implies that $\mathbf{j}(\mathbf{x})$ is divergenceless. In the zero-current case, the diffusion process, $\mathbf{Y}(t)$, is *reversible*, and the stationary density $\nu(\mathbf{x})$ is called the equilibrium probability density of the diffusion [102].

In this case, the operator, L, is self-adjoint, in the sense that:

$$\langle Lf, g \rangle\_{\nu} = \langle f, Lg \rangle\_{\nu} \qquad \text{for all suitable test functions } f, g \tag{22}$$

where $\langle \cdot, \cdot \rangle_\nu$ denotes an $L^2$ inner product weighted by the density, $\nu(\mathbf{x})$. This property implies that the diffusion is $\nu$-symmetric [103]:

$$
\nu(x)p\_t(x,y) = \nu(y)p\_t(y,x) \qquad \text{for all } t > 0 \tag{23}
$$

where $p_t(\mathbf{x}, \mathbf{y})$ denotes the transition probability density of $\mathbf{Y}(t)$. Indeed, Equation (22) is simply an infinitesimal version of Equation (23), which is referred to as the detailed balance condition. In the self-adjoint case, the drift is uniquely determined by the diffusion matrix and the stationary density $\nu(\mathbf{x})$:

$$j(x) = 0 \implies A(x) = \frac{1}{\nu(x)} \operatorname{div}(D(x)\nu(x)) \tag{24}$$

Long-time stable explicit schemes adapted to this structure have been recently developed [92].

#### *5.3. Splitting Approach to MD Simulation*

We are now in a position to explain our general approach for deriving a long-time stable scheme for Equation (14). Crucial to our approach is that in MD simulation, we usually have a formula for a function proportional to the stationary density $\nu(\mathbf{x})$. Following [90], we can split Equation (14) into:

$$d\mathbf{Y} = -\mathbf{D}(\mathbf{Y})\nabla H\_{\nu}(\mathbf{Y})dt + \mathrm{div}\,\mathbf{D}(\mathbf{Y})dt + \sqrt{2kT}B(\mathbf{Y})dW\tag{25}$$

$$\dot{Y} = A(Y) + D(Y)\nabla H\_{\nu}(Y) \tag{26}$$

where we have introduced $H_\nu(\mathbf{x}) = -(\log \nu)(\mathbf{x})$. An *exact splitting method* preserves $\mu(d\mathbf{x})$. It is formed by taking the exact solution (in law) of Equation (25) in composition with the exact flow of Equation (26). The process produced by Equation (25) is self-adjoint with respect to $\nu(\mathbf{x})$. Moreover, the stationarity of $\nu(\mathbf{x})$ implies that the flow of the ODE (26) preserves it. Since each step preserves $\mu(d\mathbf{x})$, so does their composition.
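As a worked check, the splitting can be specialized to the Langevin case of Equation (16), assuming that Equation (3) is the Boltzmann-Gibbs density $\nu \propto \exp(-H/kT)$ with $H(\mathbf{q}, \mathbf{p}) = \frac{1}{2}\mathbf{p}^T \mathbf{m}^{-1} \mathbf{p} + U(\mathbf{q})$. In that case the diffusion matrix of Equation (15) is constant, so its divergence vanishes, and:

$$\mathbf{D} \nabla H_\nu = kT\gamma \begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{m} \end{pmatrix} \frac{1}{kT} \begin{pmatrix} \nabla U(\mathbf{q}) \\ \mathbf{m}^{-1}\mathbf{p} \end{pmatrix} = \begin{pmatrix} \mathbf{0} \\ \gamma \mathbf{p} \end{pmatrix}$$

Substituting into Equations (25) and (26) gives, respectively:

$$d\mathbf{q} = 0, \quad d\mathbf{p} = -\gamma \mathbf{p}\,dt + \sqrt{2\gamma kT}\,\mathbf{m}^{1/2}\,d\mathbf{W} \qquad \text{and} \qquad \dot{\mathbf{q}} = \mathbf{m}^{-1}\mathbf{p}, \quad \dot{\mathbf{p}} = -\nabla U(\mathbf{q})$$

The first system is an Ornstein-Uhlenbeck process in the momenta, whose exact solution in law over a time $h/2$ is the update used in Step 1 and Step 4 of Algorithm 3.2. The second system is Hamilton's equations, which Step 2 of Algorithm 3.2 approximates with a Verlet step. Under these assumptions, the general splitting therefore reduces to the Bussi and Parrinello scheme.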

In place of the exact splitting, a Metropolized explicit integrator can be used for Equation (25) [92], and a measure-preserving scheme can be designed to solve the ODE [72,104]. In [92], explicit schemes are introduced for Equation (25) that: (i) sample the exact equilibrium probability density of the SDE when this density exists (*i.e.*, whenever $\nu(\mathbf{x})$ is normalizable); (ii) generate a weakly accurate approximation to the solution of Equation (14) at constant $kT$; (iii) acquire higher order accuracy in the small noise limit, $kT \to 0$; and (iv) avoid computing the divergence of the diffusion matrix $\mathbf{D}(\mathbf{x})$. Compared to the methods in [72], the main novelty of these schemes stems from (iii) and (iv). The resulting explicit splitting method is accurate, since it is an additive splitting of Equation (14), and is typically ergodic when the continuous process is ergodic [72].

This type of splitting of Equation (14) is quite natural and has been used before in MD [43,87], dissipative particle dynamics [105,106] and the simulation of inertial particles [107]. Other closely related schemes for Equation (14) include Brünger-Brooks-Karplus (BBK) [35], van Gunsteren and Berendsen (vGB) [108] and the Langevin-Impulse (LI) methods [41] and quasi-symplectic integrators [44]. However, for general MD force fields, none of these explicit integrators are long-time stable. Our framework to stabilize explicit MD integrators is the Metropolis-Hastings algorithm.

#### *5.4. Metropolis-Hastings Algorithm*

A Metropolis-Hastings method is a Monte Carlo method for producing samples from a probability distribution, given a formula for a function proportional to its density [74,75]. The algorithm consists of two sub-steps: firstly, a proposal move is generated according to a transition density, $g(\mathbf{x}, \mathbf{y})$; and secondly, this proposal move is accepted or rejected with a probability:

$$\alpha(x, y) = 1 \land \frac{g(y, x)\nu(y)}{g(x, y)\nu(x)} \tag{27}$$

Standard results on Metropolis-Hastings methods can be used to classify this algorithm as ergodic [100,109,110].
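The two sub-steps can be written generically in a few lines. The Python sketch below is an illustration with hypothetical names (`mh_accept_prob`, `mh_step`, and the callables `log_nu`, `propose`, `log_g`); it works with logarithms of the unnormalized density and of the proposal density to avoid overflow, and is not tied to any particular MD integrator.

```python
import math
import random


def mh_accept_prob(x, y, log_nu, log_g):
    """Acceptance probability of Equation (27).

    log_nu(x): log of a function proportional to the target density.
    log_g(a, b): log of the proposal density of moving from a to b,
                 up to a constant that does not depend on a or b.
    """
    log_ratio = log_nu(y) + log_g(y, x) - log_nu(x) - log_g(x, y)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def mh_step(x, log_nu, propose, log_g, rng):
    """One Metropolis-Hastings step; returns (new_state, accepted)."""
    y = propose(x, rng)                                   # proposal sub-step
    if rng.random() < mh_accept_prob(x, y, log_nu, log_g):
        return y, True                                    # accept the move
    return x, False                                       # reject: stay put
```

For a symmetric proposal, `log_g` can return a constant and the ratio reduces to $\nu(\mathbf{y})/\nu(\mathbf{x})$. The Metropolized integrators of Section 3 fit this template with a deterministic, reversible proposal, which is why only the energy difference appears in Equation (4) and why the momentum is flipped on rejection.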

#### 6. Conclusions

This paper provided an algorithmic introduction to time integrators for MD simulation. A quick overview of existing algorithms was given. When the derivatives of the potential are bounded, it is well known that these integrators work well: they are convergent on finite-time intervals and possess an invariant measure that is close to the Boltzmann-Gibbs density. However, in realistic MD simulation, the derivatives of the potential are unbounded. This lack of regularity can cause numerical instabilities or artifacts in explicit integrators. The paper demonstrated how a Metropolis acceptance-rejection step can be added to explicit MD integrators to mitigate some of these problems and, in principle, obtain long-time stable and finite-time accurate schemes. A MATLAB implementation of Metropolis-corrected MD integrators was provided and used to compute the velocity autocorrelation of a sea of Lennard-Jones particles at various densities between the solid and liquid phases. The paper did not provide an in-depth review of the theory of Metropolis integrators, which can be found elsewhere [72,73].

Calculating the force field at every step dominates the overall computational cost of MD simulation. These force fields involve bonded interactions and non-bonded Lennard-Jones and electrostatic interactions. The calculation of bonded interactions is straightforward to vectorize and scales like O(N). In addition, Lennard-Jones forces rapidly decay with interatomic distance. To a good approximation, every atom interacts only with neighbors within a sufficiently large ball. By using data structures like neighbor lists and cell linked lists, the Lennard-Jones interactions can therefore be calculated in O(N) steps [46]. On the other hand, the electrostatic energy between particles decays like 1/r, where r denotes an interatomic distance, which leads to long-range interactions between atoms. Unlike the Lennard-Jones interaction, this interaction cannot be cut off without introducing large errors. In this case, one can use sophisticated techniques, like the fast multipole method, to rigorously handle such interactions in O(N) steps [39,58].

However, the effect of these 'mathematical tricks' for fast calculation of the force field can become muted if the time step requirement for stability or accuracy becomes more severe in high dimension. This can happen in the Metropolis integrator, if the acceptance probability in Step 2 of Algorithm 3.1 or Step 3 of Algorithm 3.2 deteriorates in high dimension. The scaling of Metropolis algorithms has been quantified for the random walk Metropolis, hybrid Monte Carlo and Metropolis-adjusted Langevin algorithm (MALA) [111–115]. Since the acceptance probability is a function of an extensive quantity, the acceptance rate can artificially deteriorate with increasing system size, unless the time step is reduced. Because high acceptance rates are required to maintain dynamic accuracy, the dependence of the time step on system size limits the application of Metropolized schemes to large-scale systems. Fortunately, this scalability issue can often be resolved by using local, rather than global proposal moves, because the change in energy induced by a local move is typically an intensive quantity. For molecular dynamics calculations, this approach was pursued in [73]. Using dynamically consistent local moves (a so-called J-splitting [116]), it was shown that in certain situations, a scalable Metropolis integrator can be designed; however, the extent to which this strategy remedies the issue of high rejection rate in high dimension is not clear at this point and should be tested in applications.

#### Acknowledgments

The author wishes to acknowledge Eric Vanden-Eijnden for useful comments on an earlier version of this paper. The research that led to this paper was funded by the US National Science Foundation through Division of Mathematical Sciences (DMS) grant # DMS-1212058.

#### Conflicts of Interest

The author declares no conflict of interest.

#### References


116. Kang, F.; Dao-Liu, W. Dynamical Systems and Geometric Construction of Algorithms. In *Computational Mathematics in China*; Contemporary Mathematics, Volume 163; Shi, Z.-C., Yang, C.C., Eds.; American Mathematical Society: New York, NY, USA, 1994; pp. 1–32.

Reprinted from *Entropy*. Cite as: Abrams, C.; Bussi, G. Enhanced Sampling in Molecular Dynamics Using Metadynamics, Replica-Exchange, and Temperature-Acceleration. *Entropy* 2014, *16*, 163–199.

## *Article*

## Enhanced Sampling in Molecular Dynamics Using Metadynamics, Replica-Exchange, and Temperature-Acceleration

Cameron Abrams <sup>1,</sup>\* and Giovanni Bussi <sup>2</sup>


*Received: 13 September 2013; in revised form: 7 November 2013 / Accepted: 11 November 2013 / Published: 27 December 2013*

Abstract: We review a selection of methods for performing enhanced sampling in molecular dynamics simulations. We consider methods based on collective variable biasing and on tempering, and offer both historical and contemporary perspectives. In collective-variable biasing, we first discuss methods stemming from thermodynamic integration that use mean force biasing, including the adaptive biasing force algorithm and temperature acceleration. We then turn to methods that use bias potentials, including umbrella sampling and metadynamics. We next consider parallel tempering and replica-exchange methods. We conclude with a brief presentation of some combination methods.

Keywords: collective variables; free energy; blue-moon sampling; adaptive-biasing force algorithm; temperature-acceleration; umbrella sampling; metadynamics

#### 1. Introduction

The purpose of molecular dynamics (MD) is to compute the positions and velocities of a set of interacting atoms at the present time instant given these quantities one time increment in the past. Uniform sampling from the discrete trajectories one can generate using MD has long been seen as synonymous with sampling from a statistical-mechanical ensemble; this just expresses our collective wish that the ergodic hypothesis holds at finite times. Unfortunately, most MD trajectories are not ergodic and leave many relevant regions of configuration space unexplored. This stems from the separation of high-probability "metastable" regions by low-probability "transition" regions and the inherent difficulty of sampling a 3N-dimensional space by embedding into it a one-dimensional dynamical trajectory.

This review concerns a selection of methods to use MD simulation to enhance the sampling of configuration space. A central concern with any enhanced sampling method is guaranteeing that the statistical weights of the samples generated are known and correct (or at least correctable) while simultaneously ensuring that as much as possible of the relevant regions of configuration space is sampled. Because of the tight relationship between probability and free energy, many of these methods are known as "free-energy" methods. To be sure, there are a large number of excellent reviews of free-energy methods in the literature (e.g., [1–5]). The present review is in no way intended to be as comprehensive. As the title indicates, we will mostly focus on enhanced sampling methods of three flavors: tempering, metadynamics, and temperature-acceleration. Along the way, we will point out important related methods, but in the interest of brevity we will not spend much time explaining these. The methods we have chosen to focus on reflect our own preferences to some extent, but they also represent popular and growing classes of methods that find ever more use in biomolecular simulations and beyond.

We divide our review into three main sections. In the first, we discuss enhanced sampling approaches that rely on *collective variable biasing*. These include the historically important methods of thermodynamic integration and umbrella sampling, and we pay particular attention to the more recent approaches of the adaptive-biasing force algorithm, temperature-acceleration, and metadynamics. In the second section, we discuss approaches based on *tempering*, which is dominated by a discussion of the parallel tempering/replica exchange approaches. In the third section, we briefly present some relatively new methods derived from either collective-variable-based or tempering-based approaches, or their combinations.

#### 2. Approaches Based on Collective-Variable Biasing

#### *2.1. Background: Collective Variables and Free Energy*

For our purposes, the term "collective variable" or CV refers to any multidimensional function *θ* of the 3N-dimensional atomic configuration *x* ≡ (x<sub>i</sub> | i = 1 ... 3N). The functions θ<sub>1</sub>(*x*), θ<sub>2</sub>(*x*), ..., θ<sub>M</sub>(*x*) map configuration *x* onto an M-dimensional CV space *z* ≡ (z<sub>j</sub> | j = 1 ... M), where usually M ≪ 3N. At equilibrium, the probability of observing the system at CV-point *z* is the weight of all configurations *x* which map to *z*:

$$P(\mathbf{z}) = \langle \delta[\boldsymbol{\theta}(\mathbf{x}) - \mathbf{z}] \rangle \tag{1}$$

The Dirac delta function picks out only those configurations for which the CV *θ*(*x*) is *z*, and ⟨·⟩ denotes averaging its argument over the equilibrium probability distribution of *x*. The probability can be expressed as a *free energy*:

$$F(\mathbf{z}) = -k\_B T \ln \left< \delta[\boldsymbol{\theta}(\mathbf{x}) - \mathbf{z}] \right> \tag{2}$$

Here, k<sub>B</sub> is Boltzmann's constant and T is temperature.

Local minima in F are metastable equilibrium states. F also measures the energetic cost of a maximally efficient (*i.e.*, reversible) transition from one region of CV space to another. If, for example, we choose a CV space such that two well-separated regions define two important allosteric states of a given protein, we could perform a free-energy calculation to estimate the change in free energy required to realize the conformational transition. Indeed, the promise of being able to observe with atomic detail the transition states along some pathway connecting two distinct states of a biomacromolecule is strong motivation for exploring these transitions with CVs.

Given the limitations of standard MD, how does one "discover" such states in a proposed CV space? A perfectly ergodic (infinitely long) MD trajectory would visit these minima much more frequently than it would the intervening spaces, allowing one to tally how often each point in CV space is visited; normalizing this histogram into a probability P(*z*) would be the most straightforward way to compute F via Equation (2). In all too many actual cases, MD trajectories remain close to only one minimum (the one closest to the initial state of the simulation) and only very rarely, if ever, visit others. In the CV sense, we therefore speak of standard MD simulations failing to overcome *barriers* in free energy. "Enhanced sampling" in this context refers then to methods by which free-energy barriers in a chosen CV space are surmounted to allow as broad as possible an extent of CV space to be explored and statistically characterized with limited computational resources.
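The histogram route to F via Equation (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `free_energy_from_samples`, the bin layout, and the synthetic Gaussian "trajectory" are hypothetical and stand in for CV values harvested from a well-converged MD run.

```python
import math
import random

def free_energy_from_samples(samples, z_min, z_max, n_bins, kT):
    """Histogram CV samples into P(z) and return (bin centers, F = -kT ln P)."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in samples:
        k = int(math.floor((z - z_min) / width))
        if 0 <= k < n_bins:
            counts[k] += 1
    total = sum(counts)
    centers = [z_min + (k + 0.5) * width for k in range(n_bins)]
    # Unvisited bins carry no information; report them as +inf.
    F = [-kT * math.log(c / (total * width)) if c > 0 else math.inf
         for c in counts]
    return centers, F

# Hypothetical test data: CV values drawn from a unit Gaussian, for which
# F(z) = z**2 / 2 + const when kT = 1.
random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(100_000)]
centers, F = free_energy_from_samples(samples, -4.0, 4.0, 40, kT=1.0)
```

Only differences in F are meaningful here; bins that were never visited come back as infinite, which is exactly the failure mode of standard MD described above when barriers are high.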

In this section, we focus on methods of enhanced sampling of CVs based on MD simulations that are directly biased on those CVs; that is, we focus on methods in which an investigator must identify the CVs of interest as an input to the calculation. We have chosen to limit discussion to two broad classes of biasing: those whose objective is direct computation of the gradient of the free energy (∂F/∂*z*) at local points throughout CV space, and those in which non-Boltzmann sampling with bias potentials is used to force exploration of otherwise hard-to-visit regions of CV space. The canonical methods in these two classes are *thermodynamic integration* and *umbrella sampling*, respectively, and a discussion of these two methods sets the stage for discussion of three relatively modern variants: the Adaptive-Biasing Force Algorithm [6], Temperature-Accelerated MD [7] and Metadynamics [8].

*2.2. Gradient Methods: Blue-Moon Sampling, Adaptive-Biasing Force Algorithm, and Temperature-Accelerated Molecular Dynamics*

#### 2.2.1. Overview: Thermodynamic Integration

Naively, one way to have an MD system visit a hard-to-reach point *z* in CV space is simply to create a realization of the configuration *x* at that point (*i.e.*, such that *θ*(*x*) = *z*). This is an inverse problem, since the number of degrees of freedom in *x* is usually much larger than in *z*. One way to perform this inversion is by introducing external forces that guide the configuration to the desired point from some easy-to-create initial state; both targeted MD [9] and steered MD [10] are ways to do this. Of course, one would like MD to explore CV space in the vicinity of *z*, so after creating the configuration *x*, one would just let it run. Unfortunately, this would likely result in the system drifting away from *z* rather quickly, and there would be no way from such calculations to estimate the likelihood of observing an unbiased long MD simulation visit *z*. However, there is information in the fact that the system drifts away; if one knows *on average* which direction and how strongly the system would like to move if initialized at *z*, this would be a measure of the negative gradient of the free energy, −(∂F/∂*z*), or the "mean force". We have then a glimpse of a three-step method to compute F (*i.e.*, the statistics of CVs) over a meaningfully broad extent of CV space:


$$F(\mathbf{z}) - F(\mathbf{z}\_0) = \int\_{\mathbf{z}\_0}^{\mathbf{z}} \left(\frac{\partial F}{\partial \mathbf{z}}\right) \cdot d\mathbf{z} \tag{3}$$

Inspired by Kirkwood's original suggestion involving switching parameters [11], such an approach is generally referred to as "thermodynamic integration" or TI. TI allows us to reconstruct the statistical weights of any point in CV space by accumulating information on the gradients of free energy at selected points.
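For a single CV, the integration in Equation (3) reduces to a one-dimensional quadrature over the points at which mean forces were measured. The sketch below uses the trapezoid rule; the function name `integrate_mean_force` and the analytic double-well gradients used as input are assumptions made for illustration, not part of any particular TI implementation.

```python
def integrate_mean_force(z_points, dF_dz):
    """Cumulative trapezoid rule: returns F(z_k) - F(z_0) at every grid point."""
    F = [0.0]
    for k in range(1, len(z_points)):
        dz = z_points[k] - z_points[k - 1]
        F.append(F[-1] + 0.5 * (dF_dz[k] + dF_dz[k - 1]) * dz)
    return F

# Hypothetical input: exact gradients of the double well F(z) = (z**2 - 1)**2
# stand in for free-energy gradients measured at each quadrature point.
z_points = [-1.0 + 0.01 * k for k in range(201)]
gradients = [4.0 * z * (z * z - 1.0) for z in z_points]
F = integrate_mean_force(z_points, gradients)
barrier = F[100] - F[0]      # free energy at z = 0 relative to z = -1
```

In a real calculation each entry of `gradients` carries statistical error, so the spacing of the quadrature points and the sampling time per point jointly control the accuracy of the reconstructed F.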

#### 2.2.2. Blue-Moon Sampling

The discussion so far leaves open the correct way to compute the local free-energy gradients. A gradient is a local quantity, so a natural choice is to compute it from an MD simulation localized at a point in CV space by a constraint. Consider a long MD simulation with a holonomic constraint fixing the system at the point *z*. Uniform samples from this constrained trajectory *x*(t) then represent *an* ensemble at fixed *z* over which the averaging needed to convert gradients in potential energy to gradients in free energy could be done. However, this constrained ensemble has the undesired property that the velocities *θ̇*(*x*) are zero. This is a bit problematic because virtually none of the samples plucked from a long unconstrained MD simulation (as is implied by Equation (1)) would have *θ̇* = 0, and *θ̇* = 0 acts as a set of M unphysical constraints on the system velocities *ẋ*, since θ̇<sub>j</sub> = Σ<sub>i</sub> (∂θ<sub>j</sub>/∂x<sub>i</sub>) ẋ<sub>i</sub>. Probably the best-known example of a method to correct for this bias is the so-called "blue-moon" sampling method [12–15] or the constrained ensemble method [16,17]. The essence of the method is a decomposition of free energy gradients into components along the CV gradients and thermal components orthogonal to them:

$$\frac{\partial F}{\partial z\_j} = \left< \mathbf{b}\_j(\mathbf{x}) \cdot \nabla V(\mathbf{x}) - k\_B T \nabla \cdot \mathbf{b}\_j(\mathbf{x}) \right>\_{\theta(\mathbf{x}) = \mathbf{z}} \tag{4}$$

where ⟨·⟩<sub>*θ*(*x*)=*z*</sub> denotes averaging across samples drawn uniformly from the MD simulation constrained at *θ*(*x*) = *z*, and *b*<sub>j</sub>(*x*) is the vector field orthogonal to the gradients of every component k of *θ* for k ≠ j:

$$
\mathbf{b}\_j(\mathbf{x}) \cdot \nabla \theta\_k(\mathbf{x}) = \delta\_{jk} \tag{5}
$$

where δ<sub>jk</sub> is the Kronecker delta. (For brevity, we have omitted the consideration of holonomic constraints other than that on the CV; the reader is referred to the paper by Ciccotti *et al*. for details [15].) The vector fields *b*<sub>j</sub> for each θ<sub>j</sub> can be constructed by orthogonalization. The first term in the angle brackets in Equation (4) implements the chain rule one needs to account for how energy V changes with *z* through all the ways *z* can change with *x*. The second term corrects for the thermal bias imposed by the constraint.
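One way to satisfy Equation (5) at a given configuration is to invert the Gram matrix of the CV gradients, G<sub>jk</sub> = ∇θ<sub>j</sub> · ∇θ<sub>k</sub>, and set *b*<sub>j</sub> = Σ<sub>k</sub> (G<sup>−1</sup>)<sub>jk</sub> ∇θ<sub>k</sub>. The sketch below does this for two CVs. It is an assumed, illustrative construction (the helper names `dot` and `blue_moon_fields` and the numerical gradients are hypothetical), and it only checks the orthogonality condition at a single point; evaluating Equation (4) additionally requires the divergence of each field.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def blue_moon_fields(grad1, grad2):
    """Return fields b1, b2 with b_j . grad_k = delta_jk (Equation (5))."""
    # Gram matrix of the two CV gradients
    g11, g12, g22 = dot(grad1, grad1), dot(grad1, grad2), dot(grad2, grad2)
    det = g11 * g22 - g12 * g12
    if det == 0.0:
        raise ValueError("CV gradients are linearly dependent")
    # b_j = sum_k (G^-1)_jk grad_k, using the explicit 2x2 inverse
    b1 = [(g22 * a - g12 * b) / det for a, b in zip(grad1, grad2)]
    b2 = [(-g12 * a + g11 * b) / det for a, b in zip(grad1, grad2)]
    return b1, b2

# Hypothetical, non-orthogonal CV gradients at one configuration
grad1 = [1.0, 2.0, 0.0, -1.0]
grad2 = [0.5, -1.0, 3.0, 2.0]
b1, b2 = blue_moon_fields(grad1, grad2)
```

For a single CV this choice reduces to *b* = ∇θ/|∇θ|<sup>2</sup>.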

Although nowhere near exhaustive, below is a listing of common types of problems to which blue-moon sampling has been applied with some representative examples:


#### 2.2.3. The Adaptive Biasing Force Algorithm

The blue-moon approach requires multiple independent constrained MD simulations to cover the region of CV space in which one wants internal statistics. The care taken in choosing these quadrature points can often dictate the accuracy of the resulting free energy reconstruction. It is therefore sometimes advantageous to consider ways to avoid having to choose such points ahead of time, and adaptive methods attempt to address this problem. One example is the adaptive-biasing force (ABF) algorithm of Darve *et al*. [6,35]. The essence of ABF is two-fold: (1) recognition that external bias forces of the form ∇<sub>*x*</sub>θ<sub>j</sub> (∂F/∂z<sub>j</sub>) for j = 1,...,M exactly oppose mean forces and should lead to more uniform sampling of CV space; and (2) that these bias forces can be converged upon adaptively during a single unconstrained MD simulation.

The first of those two ideas is motivated by the fact that "forces" that keep normal MD simulations effectively confined to free energy minima are mean forces on the collective variables projected onto the atomic coordinates, and balancing those forces against their exact opposite should allow for thermal motion to take the system out of those minima. The second idea is a bit more subtle; after all, in a running MD simulation with no CV constraints, the constrained ensemble expression for the mean force (Equation (4)) does not directly apply, because a constrained ensemble is not what is being sampled. However, Darve *et al*. showed how to relate these ensembles so that the samples generated in the MD simulation could be used to build mean forces [35]. Further, they showed, using a clever choice of the fields of Equation (4), an equivalence between (i) the spatial gradients needed to compute forces, and (ii) time-derivatives of the CVs [6]:

$$\frac{\partial F}{\partial z\_i} = -k\_B T \left\langle \frac{d}{dt} \left( M\_\theta \frac{d\theta\_i}{dt} \right) \right\rangle\_{\theta = \mathbf{z}} \tag{6}$$

where M<sub>θ</sub> is the transformed mass matrix given by

$$M\_{\theta}^{-1} = J\_{\theta}M^{-1}J\_{\theta}^{T} \tag{7}$$

where J<sub>θ</sub> is the M × 3N matrix with elements ∂θ<sub>i</sub>/∂x<sub>j</sub> (i = 1 ... M, j = 1 ... 3N), and M is the diagonal matrix of atomic masses. Equation (7) is the result of a particular choice for the fields *b*<sub>j</sub>(*x*). This reformulation of the instantaneous mean forces computed on-the-fly makes ABF exceptionally easy to implement in most modern MD packages. Darve *et al*. present a clear demonstration of the ABF algorithm in a pseudocode [6] that attests to this fact.
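The adaptive bookkeeping behind ABF can be illustrated on a toy model. The sketch below is not the pseudocode of Darve *et al*.; it assumes a single particle in the double well V(x) = h(x<sup>2</sup> − 1)<sup>2</sup>, overdamped Langevin dynamics, and the CV taken as x itself, so the instantaneous force on the CV is simply −dV/dx and Equation (6) is not needed. The function name `run_abf`, the linear ramp on the bias, and all parameter values are hypothetical.

```python
import math
import random

def run_abf(n_steps, dt=1e-3, kT=1.0, gamma=1.0, h=5.0,
            z_min=-2.0, z_max=2.0, n_bins=40, ramp=100, seed=1):
    """Toy ABF on V(x) = h (x^2 - 1)^2 with the CV taken as x itself."""
    rng = random.Random(seed)
    width = (z_max - z_min) / n_bins
    force_sum = [0.0] * n_bins   # accumulated instantaneous force per bin
    counts = [0] * n_bins
    x = -1.0
    noise = math.sqrt(2.0 * kT * dt / gamma)
    for _ in range(n_steps):
        force = -4.0 * h * x * (x * x - 1.0)      # -dV/dx, unbiased force
        k = int(math.floor((x - z_min) / width))
        bias = 0.0
        if 0 <= k < n_bins:
            force_sum[k] += force
            counts[k] += 1
            mean_force = force_sum[k] / counts[k]
            # Oppose the running mean force; ramp up while statistics are poor.
            bias = -mean_force * min(1.0, counts[k] / ramp)
        x += (force + bias) * dt / gamma + noise * rng.gauss(0.0, 1.0)
    centers = [z_min + (k + 0.5) * width for k in range(n_bins)]
    # The free-energy gradient estimate is minus the accumulated mean force.
    dF_dz = [-(force_sum[k] / counts[k]) if counts[k] else None
             for k in range(n_bins)]
    return centers, dF_dz, counts

centers, dF_dz, counts = run_abf(200_000)
```

Only the unbiased force enters the running averages; the bias is applied to the dynamics but never accumulated, which is what allows the estimate to converge while the landscape is progressively flattened.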

ABF has found rather wide application in CV-based free energy calculations in recent years. Below is a representative sample of some types of problems subjected to ABF calculations in the recent literature:


#### 2.2.4. Temperature-Accelerated Molecular Dynamics

Both blue-moon sampling and ABF are based on statistics in the constrained ensemble. However, estimation of mean forces need not only use this ensemble. One can instead relax the constraint and work with a "mollified" version of the free energy:

$$F\_{\kappa}(\mathbf{z}) = -k\_B T \ln \left< \delta\_{\kappa} \left[ \boldsymbol{\theta}(\boldsymbol{x}) - \mathbf{z} \right] \right> \tag{8}$$

where δ<sub>κ</sub> refers to the Gaussian (or "mollified delta function"):

$$\delta\_{\kappa} = \sqrt{\frac{\beta \kappa}{2\pi}} \exp\left[ -\frac{1}{2} \beta \kappa \left| \boldsymbol{\theta}(\mathbf{x}) - \mathbf{z} \right|^{2} \right] \tag{9}$$

where β is just shorthand for 1/k<sub>B</sub>T. Since lim<sub>βκ→∞</sub> δ<sub>κ</sub> = δ, we know that lim<sub>βκ→∞</sub> F<sub>κ</sub> = F. One way to view this Gaussian is that it "smoothes out" the true free energy to a tunable degree; the factor 1/√(βκ) is a length-scale in CV space below which details are smeared.

Because the Gaussian has continuous gradients, it can be used directly in an MD simulation. Suppose we have a CV space *θ*(*x*), and we extend our MD system to include variables *z* such that the combined set (*x*, *z*) obeys the following extended potential:

$$U(x, z) = V(x) + \sum\_{j=1}^{M} \frac{1}{2} \kappa \left| \theta\_j(x) - z\_j \right|^2 \tag{10}$$

where V(*x*) is the interatomic potential, and κ is a constant. Clearly, if we fix *z*, then the resulting free energy is, to within an additive constant, the mollified free energy of Equation (8). (The additive constant is related to the prefactor of the mollified delta function and has nothing to do with the number of CVs.) Further, we can directly express the gradient of this mollified free energy with respect to *z* [54]:

$$\nabla\_{\mathbf{z}} F\_{\kappa} = - \left< \kappa \left[ \boldsymbol{\theta}(\mathbf{x}) - \mathbf{z} \right] \right> \tag{11}$$

This suggests that, instead of using constrained ensemble MD to accumulate mean forces, we could work in the *restrained* ensemble and get very good approximations to the mean force. By "restrained", we refer to the fact that the term giving rise to the mollified delta function in the configurational integral is essentially a harmonic restraining potential with a "spring constant" κ. In this restrained-ensemble approach, no velocities are held fixed, and the larger we choose κ the more closely we can approximate the true free energy. Notice however that large values of κ could lead to numerical instabilities in integrating equations of motion, and a balance should be found. (In practice, we have found that for CVs with dimensions of length, values of κ less than about 1,000 kcal/mol/Å<sup>2</sup> can be stably handled, and values of around 100 kcal/mol/Å<sup>2</sup> are typically adequate.)
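Equation (11) can be checked on a case where everything is known in closed form. The sketch below assumes a one-dimensional harmonic potential V(x) = kx<sup>2</sup>/2 with CV θ(x) = x, for which the mollified gradient is kκz/(k + κ) and the true gradient is kz. Because the restrained distribution of x at fixed z is then Gaussian, exact sampling stands in for a restrained MD trajectory; the function name `restrained_mean_force` and all parameter values are hypothetical.

```python
import math
import random

def restrained_mean_force(z, kappa, k_spring=1.0, kT=1.0,
                          n_samples=100_000, seed=2):
    """Estimate dF_kappa/dz = -<kappa (theta(x) - z)> for V = k x^2 / 2, theta = x."""
    rng = random.Random(seed)
    # With harmonic V and a harmonic restraint, x at fixed z is Gaussian;
    # sampling it exactly replaces a restrained MD trajectory in this toy case.
    mean_x = kappa * z / (k_spring + kappa)
    sigma_x = math.sqrt(kT / (k_spring + kappa))
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mean_x, sigma_x)
        total += kappa * (x - z)
    return -total / n_samples

estimate = restrained_mean_force(z=1.0, kappa=100.0)
exact_mollified = 1.0 * 100.0 / (1.0 + 100.0) * 1.0   # k * kappa / (k + kappa) * z
exact_true = 1.0 * 1.0                                 # k * z, the kappa -> infinity limit
```

With κ a hundred times stiffer than the underlying potential, the mollified gradient differs from the true one by about one percent, which illustrates why moderate values of κ are often adequate.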

Temperature-accelerated MD (TAMD) [7] takes advantage of the restrained-ensemble approach to directly evolve the variables *z* in such a way to accelerate the sampling of CV space. First, consider how the atomic variables *x* evolve under the extended potential (assuming Langevin dynamics):

$$m\_i \ddot{x}\_i = -\frac{\partial V(\mathbf{x})}{\partial x\_i} - \kappa \sum\_{j=1}^{M} \left[\theta\_j(\mathbf{x}) - z\_j\right] \frac{\partial \theta\_j(\mathbf{x})}{\partial x\_i} - \gamma m\_i \dot{x}\_i + \eta\_i(t; \beta) \tag{12}$$

Here, m<sub>i</sub> is the mass of x<sub>i</sub>, γ is the friction coefficient for the Langevin thermostat, and *η* is the thermostat white noise satisfying the fluctuation-dissipation theorem at physical temperature β<sup>−1</sup>:

$$
\langle \eta\_i(t;\beta)\eta\_j(t';\beta)\rangle = \beta^{-1}\gamma m\_i \delta\_{ij}\delta(t-t') \tag{13}
$$

Key to TAMD is that the *z* are treated as slow variables that evolve according to their own equations of motion, which here we take as diffusive (though other choices are possible [7]):

$$
\bar{\gamma}\bar{m}\_j\dot{z}\_j = \kappa \left[\theta\_j(\mathbf{x}) - z\_j\right] + \xi\_j(t; \bar{\beta}) \tag{14}
$$

Here, γ̄ is a fictitious friction, m̄<sub>j</sub> is a mass, the first term on the right-hand side represents the instantaneous force on variable z<sub>j</sub>, and the second term represents thermal noise at the fictitious thermal energy β̄<sup>−1</sup> ≠ β<sup>−1</sup>.

The advantage of TAMD is that if (1) γ̄ is chosen sufficiently large so as to guarantee that the slow variables indeed evolve slowly relative to the fundamental variables; *and* (2) κ is sufficiently large such that *θ*(*x*(t)) ≈ *z*(t) at any given time, then the force acting on *z* is approximately equal to minus the gradient of the free energy (Equation (11)) [7]. This is because the MD integration repeatedly samples κ[*θ*(*x*) − *z*] for an essentially fixed (but actually very slowly moving) *z*, so *z* evolution effectively feels these samples as a mean force. In other words, the dynamics of *z*(t) is effectively

$$
\bar{\gamma}\bar{m}\_j\dot{z}\_j = -\frac{\partial F(\mathbf{z})}{\partial z\_j} + \xi\_j(t; \bar{\beta}) \tag{15}
$$

This shows that the *z*-dynamics describes an equilibrium constant-temperature ensemble at *fictitious* temperature β̄<sup>−1</sup> acted on by the "potential" F(*z*), which is the free energy evaluated at the *physical* temperature β<sup>−1</sup>. That is, under TAMD, *z* conforms to a probability distribution of the form exp[−β̄F(*z*; β)], whereas under normal MD it would conform to exp[−βF(*z*; β)]. The all-atom MD simulation (at β) simply serves to approximate the *local gradients* of F(*z*). Sampling is enhanced by taking β̄<sup>−1</sup> > β<sup>−1</sup>, which has the effect of attenuating the ruggedness of F. TAMD therefore can accelerate a trajectory *z*(t) through CV space by increasing the likelihood of visiting points with relatively low physical Boltzmann factors. This borrows directly from the main idea of adiabatic free-energy dynamics [55] (AFED), in that one deliberately makes some variables hot (to overcome barriers) but slow (to keep them adiabatically separated from all other variables). In TAMD, however, the use of the mollified free energy means no cumbersome variable transformations are required. (The authors of AFED refer to TAMD as "driven"-AFED, or d-AFED [56].) It is also worth mentioning in this review that TAMD borrows heavily from an early version of metadynamics [57], which was formulated as a way to evolve the auxiliary variables *z* on a mollified free energy. However, unlike metadynamics (which we discuss below in Section 2.3.3), there is no history-dependent bias in TAMD.
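The coupled evolution of *x* and *z* can be illustrated with a toy integrator. The sketch below makes several simplifying assumptions that are not part of the TAMD formulation above: a single coordinate x in the double well V(x) = h(x<sup>2</sup> − 1)<sup>2</sup> with a barrier of 10 k<sub>B</sub>T, the CV θ(x) = x, overdamped (rather than inertial) Langevin dynamics for x, masses absorbed into the friction coefficients, and a simple Euler-Maruyama update. The function name `run_tamd` and all parameter values are hypothetical.

```python
import math
import random

def run_tamd(n_steps, dt=1e-3, kT=1.0, kT_bar=5.0, gamma=1.0, gamma_bar=20.0,
             kappa=200.0, h=10.0, seed=3):
    """Toy TAMD: x in V(x) = h (x^2 - 1)^2, CV theta(x) = x, auxiliary variable z."""
    rng = random.Random(seed)
    x = z = -1.0
    noise_x = math.sqrt(2.0 * kT * dt / gamma)            # physical temperature
    noise_z = math.sqrt(2.0 * kT_bar * dt / gamma_bar)    # fictitious temperature
    z_traj = []
    for step in range(n_steps):
        spring = kappa * (x - z)                 # kappa [theta(x) - z]
        force_x = -4.0 * h * x * (x * x - 1.0) - spring
        x_new = x + force_x * dt / gamma + noise_x * rng.gauss(0.0, 1.0)
        # z is hot (kT_bar > kT) but slow (gamma_bar >> gamma)
        z += spring * dt / gamma_bar + noise_z * rng.gauss(0.0, 1.0)
        x = x_new
        if step % 100 == 0:
            z_traj.append(z)
    return z_traj

z_traj = run_tamd(400_000)
visited_left = min(z_traj) < -0.5
visited_right = max(z_traj) > 0.5
```

At the fictitious temperature the barrier is only about two thermal energies high, so *z* (and with it the tethered x) is expected to move between wells many times over a run in which unaccelerated dynamics at the physical temperature would rarely cross.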

Unlike TI, ABF, and the methods of umbrella sampling and metadynamics discussed in the next section, TAMD is not a method for direct calculation of the free energy. Rather, it is a way to overcome free energy barriers in a chosen CV space quickly without visiting irrelevant regions of CV space. (However, we discuss briefly a method in Section 4.2.2 in which TAMD gradients are used in a spirit similar to ABF to reconstruct a free energy.) That is, we consider TAMD a way to efficiently explore relevant regions of CV space that are practically inaccessible to standard MD simulation. It is also worth pointing out that, unlike ABF, TAMD does not operate by opposing the natural gradients in free energy, but rather by using them to guide accelerated sampling. ABF can only use forces in locations in CV space the trajectory has visited, which means nothing opposes the trajectory going to regions of very high free energy. However, under TAMD, an acceleration of β̄<sup>−1</sup> = 6 kcal/mol on the CVs will greatly accelerate transitions over barriers of 6–12 kcal/mol, but will still not (in theory) accelerate excursions to regions requiring climbs of hundreds of kcal/mol. TAMD and ABF have in common the ability to handle rather high-dimensional CVs.

Although it was presented theoretically in 2006 [7], TAMD was not applied directly to large-scale MD until much later [58]. Since then, there has been growing interest in using TAMD in a variety of applications requiring enhanced sampling:


Finally, we mention briefly that TAMD can be used as a quick way to generate trajectories from which samples can be drawn for subsequent mean-force estimation for later reconstruction of a multidimensional free energy; this is the essence of the single-sweep method [68], which is an efficient means of computing multidimensional free energies. Rather than using straight numerical TI, single sweep posits the free energy as a basis function expansion and uses standard optimization methods to find the expansion coefficients that best reproduce the measured mean forces. Single-sweep has been used to map diffusion pathways of CO and H<sub>2</sub>O in myoglobin [64,65].

#### *2.3. Bias Potential Methods: Umbrella Sampling and Metadynamics*

#### 2.3.1. Overview: Non-Boltzmann Sampling

In the previous section, we considered methods that achieve enhanced sampling by using mean forces: in TI, these are integrated to reconstruct a free energy; in ABF, these are built on-the-fly to drive uniform CV sampling; and in TAMD, these are used on-the-fly to guide accelerated evolution of CVs. In this section, we consider methods that achieve enhanced sampling by means of controlled bias potentials. As a class, we refer to these as *non-Boltzmann sampling* methods.

Non-Boltzmann sampling is generally a way to derive statistics on a system whose energetics differ from the energetics used to perform the sampling. Imagine we have an MD system with bare interatomic potential V(*x*), and we add a bias ΔV(*x*) to arrive at a biased total potential:

$$V\_b(x) = V(x) + \Delta V(x) \tag{16}$$

The statistics of the CVs on this biased potential are then given as

$$\begin{split} P\_b(\mathbf{z}) &= \frac{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})} e^{-\beta \Delta V(\mathbf{x})} \delta \left[\boldsymbol{\theta}(\mathbf{x}) - \mathbf{z}\right]}{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})} e^{-\beta \Delta V(\mathbf{x})}} \\ &= \frac{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})} e^{-\beta \Delta V(\mathbf{x})} \delta \left[\boldsymbol{\theta}(\mathbf{x}) - \mathbf{z}\right]}{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})}} \frac{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})}}{\int d\mathbf{x} \, e^{-\beta V(\mathbf{x})} e^{-\beta \Delta V(\mathbf{x})}} \\ &= \frac{\left\langle e^{-\beta \Delta V(\mathbf{x})} \delta \left[\boldsymbol{\theta}(\mathbf{x}) - \mathbf{z}\right] \right\rangle}{\left\langle e^{-\beta \Delta V(\mathbf{x})} \right\rangle} \end{split} \tag{17}$$

where ⟨·⟩ denotes ensemble averaging on the unbiased potential V(*x*). Further, if we take the bias potential ΔV to be explicitly a function only of the CVs *θ*, then it becomes invariant in the averaging of the numerator thanks to the delta function, and we have

$$P\_b(\mathbf{z}) = \frac{e^{-\beta \Delta V(\mathbf{z})} \left< \delta \left[ \boldsymbol{\theta}(\mathbf{x}) - \mathbf{z} \right] \right>}{\left< e^{-\beta \Delta V[\boldsymbol{\theta}(\mathbf{x})]} \right>} \tag{18}$$

Finally, since the unbiased statistics are P(*z*) = ⟨δ[*θ*(*x*) − *z*]⟩, we arrive at

$$P(\mathbf{z}) = P\_b(\mathbf{z})e^{\beta \Delta V(\mathbf{z})} \left\langle e^{-\beta \Delta V[\boldsymbol{\theta}(\mathbf{x})]} \right\rangle \tag{19}$$

Taking samples from an ergodic MD simulation on the biased potential V<sub>b</sub>, Equation (19) provides the recipe for reconstructing the statistics the CVs *would* present were they generated using the *unbiased* potential V. However, the probability P(*z*) is implicit in this equation, because

$$
\left< e^{-\beta \Delta V} \right> = \int d\mathbf{z} P(\mathbf{z}) e^{-\beta \Delta V(\mathbf{z})} \tag{20}
$$

This is not really a problem, since we can treat ⟨e<sup>−βΔV</sup>⟩ as a constant we can get from normalizing P<sub>b</sub>(*z*)e<sup>βΔV(*z*)</sup>.
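The reweighting recipe of Equation (19), with the constant fixed by normalization, can be sketched as follows. The example assumes a one-dimensional CV with unbiased free energy F(z) = z<sup>2</sup>/2 (in units of k<sub>B</sub>T) and a harmonic bias, so that the biased distribution is Gaussian and can be sampled exactly in place of a biased MD run. The function names `reweight_histogram` and `bias` and all parameter values are hypothetical.

```python
import math
import random

def reweight_histogram(biased_samples, bias, z_min, z_max, n_bins, kT):
    """Unbias a CV histogram: P(z) proportional to P_b(z) exp(+bias(z)/kT)."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in biased_samples:
        k = int(math.floor((z - z_min) / width))
        if 0 <= k < n_bins:
            counts[k] += 1
    centers = [z_min + (k + 0.5) * width for k in range(n_bins)]
    weights = [c * math.exp(bias(zc) / kT) for c, zc in zip(counts, centers)]
    # Normalizing fixes the constant <exp(-bias/kT)> without computing it directly.
    norm = sum(weights) * width
    return centers, [w / norm for w in weights]

kT, kappa, z0 = 1.0, 4.0, 1.5

def bias(z):
    """Hypothetical harmonic bias centered at z0."""
    return 0.5 * kappa * (z - z0) ** 2

# Unbiased F = z^2 / 2 plus the harmonic bias gives a Gaussian biased distribution.
rng = random.Random(5)
mean_b, sigma_b = kappa * z0 / (1.0 + kappa), math.sqrt(kT / (1.0 + kappa))
samples = [rng.gauss(mean_b, sigma_b) for _ in range(100_000)]
centers, P = reweight_histogram(samples, bias, 0.0, 2.4, 24, kT)
delta_F = -kT * math.log(P[7] / P[16])   # F(0.75) - F(1.65), exact value -1.08
```

The reconstructed P(*z*) is only normalized over the histogrammed range, so, as with any single biased run, it is free-energy differences within the sampled region that are reliable.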

How does one choose ΔV so as to enhance the sampling of CV space? Evidently, from the standpoint of non-Boltzmann sampling, the closer the bias potential is to the negative free energy −F(*z*), the more uniform the sampling of CV space will be. To wit: if ΔV[*θ*(*x*)] = −F[*θ*(*x*)], then e<sup>βΔV(*z*)</sup> = e<sup>−βF(*z*)</sup> = P(*z*), and Equation (19) can be inverted for P<sub>b</sub> to yield

$$P\_b(\mathbf{z}) = \frac{1}{\langle e^{\beta F(\mathbf{z})} \rangle} = \frac{1}{\int d\mathbf{z} P(\mathbf{z}) e^{\beta F(\mathbf{z})}} = \frac{1}{\int d\mathbf{z} e^{-\beta F} e^{\beta F}} = \frac{1}{\int d\mathbf{z}} \tag{21}$$

So we see that taking the bias potential to be the negative free energy makes all states *z* in CV space equiprobable. This is indeed the limit to which ABF strives by applying negative mean forces, for example [6].

We usually do not know the free energy ahead of time; if we did, we would already know the statistics of CV space and no enhanced sampling would be necessary. Moreover, perfectly uniform sampling of the entire CV space is usually far from necessary, since most CV spaces have many irrelevant regions that should be ignored. And in reference to the mean-force methods of the last section, uniform sampling is likely not necessary to achieve accurate mean force values; how good an estimate of ∇F is at some point *z*<sub>0</sub> should not depend on how well we sampled at some other point *z*<sub>1</sub>. Yet achieving uniform sampling is an idealization since, if we do, this means we know the free energy. We now consider two other biasing methods that aim for this ideal, either in relatively small regions of CV space using fixed biases, or over broader extents using adaptive biases.

#### 2.3.2. Umbrella Sampling

Umbrella sampling is the standard way of using non-Boltzmann sampling to overcome free energy barriers. In its debut [69], umbrella sampling used a function w(*x*) that weights hard-to-sample configurations, equivalent to adding a bias potential of the form

$$
\Delta V(x) = -k\_B T \ln w(x) \tag{22}
$$

w is found by trial-and-error such that configurations that are easy to sample on the unbiased potential are still easy to sample; that is, w acts like an "umbrella" covering both the easy- and hard-to-sample regions of configuration space. Nearly always, w is an explicit function of the CVs, w(*x*) = W[*θ*(*x*)].

Coming up with the umbrella potential that would enable exploration of CV space with a single umbrella sampling simulation that takes the system far from its initial point is not straightforward. Akin to TI, it is therefore advantageous to combine results from several independent trajectories, each with its own umbrella potential that localizes it to a small volume of CV space that overlaps with nearby volumes. The most popular way to combine the statistics of such a set of independent umbrella sampling runs is the weighted-histogram analysis method (WHAM) [70].

To compute statistics of CV space using WHAM, one first chooses the points in CV space that define the little local neighborhoods, or "windows", to be sampled and chooses the bias potential used to localize the sampling. Not knowing how the free energy changes in CV space makes the first task somewhat challenging, since more densely packed windows are preferred in regions where the free energy changes rapidly; however, since the calculations are independent, more can be added later if needed. A convenient choice for the bias potential is a simple harmonic spring that tethers the trajectory to a reference point *z*<sub>i</sub> in CV space:

$$
\Delta V\_i(x) = \frac{1}{2}\kappa \left| \theta(x) - z\_i \right|^2 \tag{23}
$$

which means the dynamics of the atomic variables *<sup>x</sup>* are identical to Equation (12) at fixed *<sup>z</sup>* <sup>=</sup> *<sup>z</sup>*<sup>i</sup>. The points {*z*<sup>i</sup>} and the value of <sup>κ</sup> (which may be point-dependent) must be chosen such that *<sup>θ</sup>* [*x*(t)] from any one window's trajectory makes excursions into the window of each of its nearest neighbors in CV space.

Each window-restrained trajectory is directly histogrammed to yield apparent (*i.e*., biased) statistics on *θ*; let us call the biased probability in the <sup>i</sup>th window <sup>P</sup>b,i(*z*). Equation (19) again gives the recipe to reconstruct the unbiased statistics <sup>P</sup>i(*z*) for *<sup>z</sup>* in the window of *<sup>z</sup>*<sup>i</sup>:

$$P\_i(\mathbf{z}) = P\_{b,i}(\mathbf{z})e^{\frac{1}{2}\beta\kappa \left|\mathbf{z} - \mathbf{z}\_i\right|^2} \left\langle e^{-\beta\frac{1}{2}\kappa \left|\theta(\mathbf{x}) - \mathbf{z}\_i\right|^2} \right\rangle \tag{24}$$

We could use Equation (24) directly assuming the biased MD trajectory is ergodic, but we know that regions far from the reference point will be explored very rarely and thus their free energy would be estimated with large uncertainty. This means that, although we can use sampling to compute $P\_{b,i}$ *knowing* it effectively vanishes outside the neighborhood of *z*<sup>i</sup>, we cannot use sampling to compute $\left\langle e^{-\beta\frac{1}{2}\kappa\left|\theta(x)-z\_i\right|^2} \right\rangle$.

WHAM solves this problem by renormalizing the probabilities in each window into a single composite probability. Where there is overlap among windows, WHAM renormalizes such that the statistical variance of the probability is minimal. That is, it treats the factor $\left\langle e^{-\beta\frac{1}{2}\kappa\left|\theta(x)-z\_i\right|^2} \right\rangle$ as an undetermined constant $C\_i$ for each window, and solves for specific values such that the composite unbiased probability <sup>P</sup>(*z*) is continuous across all overlap regions with minimal statistical error. An alternative to WHAM, termed "umbrella integration", solves the problem of renormalization across windows by constructing the composite mean force [71,72].
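The renormalization described above is usually carried out by iterating two coupled equations to self-consistency. The following is a minimal sketch for binned data, not the implementation of [70]; the function name `wham`, the array layout, and the convergence settings are assumptions made for illustration. The window constants are stored as $f\_i = -\beta^{-1}\ln C\_i$.

```python
import numpy as np

def wham(counts, bias, beta=1.0, n_iter=20000, tol=1e-12):
    """Iterate the WHAM equations on binned umbrella-sampling data.

    counts[i, b] : histogram of window i in CV bin b
    bias[i, b]   : bias energy of window i at the center of bin b
    Returns the normalized unbiased probability per bin and the window
    free energies f_i = -ln(C_i) / beta (defined up to a common constant).
    """
    counts = np.asarray(counts, dtype=float)
    boltz = np.exp(-beta * np.asarray(bias, dtype=float))  # e^{-beta dV_i(z)}
    n_samples = counts.sum(axis=1)                         # N_i
    total = counts.sum(axis=0)                             # pooled histogram
    f = np.zeros(counts.shape[0])
    for _ in range(n_iter):
        # P(z) = sum_i n_i(z) / sum_i N_i exp[beta (f_i - dV_i(z))]
        denom = (n_samples[:, None] * np.exp(beta * f)[:, None] * boltz).sum(axis=0)
        p = total / denom
        p /= p.sum()
        # exp(-beta f_i) = sum_z P(z) exp[-beta dV_i(z)]
        f_new = -np.log((p * boltz).sum(axis=1)) / beta
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return p, f
```

With harmonic windows as in Equation (23), `bias[i, b]` would be `0.5 * kappa * (z[b] - z_i[i])**2`, and the free energy follows as `-np.log(p) / beta` on the bins that were actually visited.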

The literature on umbrella sampling is vast (by simulation standards), so we present here a very condensed listing of some of its more recent application areas with representative citations:


#### 2.3.3. Metadynamics

As already mentioned, one of the difficulties of the umbrella sampling method is the choice and construction of the bias potential. As we already saw with the relationship among TI, ABF, and TAMD, an adaptive method for building a bias potential in a running MD simulation may be advantageous. Metadynamics [8,143] represents just such a method.

Metadynamics is rooted in the original idea of "local elevation" [144], in which a supplemental bias potential is progressively grown in the dihedral space of a molecule to prevent it from remaining in one region of configuration space. However, at variance with metadynamics, local elevation does not provide any means to reconstruct the unbiased free-energy landscape and as such it is mostly aimed at fast generation of plausible conformers.

In metadynamics, configurational variables *x* evolve in response to a biased total potential:

$$V(x) = V\_0(x) + \Delta V(x, t) \tag{25}$$

where <sup>V</sup><sup>0</sup> is the bare interatomic potential and <sup>Δ</sup><sup>V</sup> (*x*, t) is a time-dependent bias potential. The key element of metadynamics is that the bias is built as a sum of Gaussian functions centered on the points in CV space already visited:

$$\Delta V\left[\theta(x), t\right] = w \sum\_{\substack{t'=\tau\_G, 2\tau\_G, \dots \\ t' < t}} \exp\left(-\frac{\left|\theta(x) - \theta(x(t'))\right|^2}{2\delta\theta^2}\right) \tag{26}$$

Here, w is the height of each Gaussian, τ<sup>G</sup> is the size of the time interval between successive Gaussian depositions, and <sup>δ</sup>*θ* is the Gaussian width. It has been first empirically [145] then analytically [146] demonstrated that in the limit in which the CVs evolve according to a Langevin dynamics, the bias indeed converges to the negative of the free energy, thus providing an optimal bias to enhance transition events. Multiple simulations can also be used to allow for a quicker filling of the free-energy landscape [147].
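To make the deposition rule concrete, the sketch below runs metadynamics on a one-dimensional toy model in which the CV is the coordinate itself. The double-well potential, the overdamped Langevin integrator with unit friction, and all parameter values are assumptions chosen for illustration; they are not taken from [8,143].

```python
import numpy as np

def bias(s, centers, w, width):
    """Metadynamics bias at CV value s: a sum of Gaussians of height w."""
    c = np.asarray(centers, dtype=float)
    return float(w * np.exp(-(s - c) ** 2 / (2.0 * width ** 2)).sum())

def bias_force(s, centers, w, width):
    """Force from the bias, -d(bias)/ds."""
    c = np.asarray(centers, dtype=float)
    g = np.exp(-(s - c) ** 2 / (2.0 * width ** 2))
    return float(w * ((s - c) / width ** 2 * g).sum())

def run_metad(n_steps=20000, tau_g=100, w=0.2, width=0.2,
              kT=1.0, dt=1e-3, seed=0):
    """Overdamped Langevin dynamics (unit friction) on the toy potential
    V0(s) = 5 (s^2 - 1)^2, depositing one Gaussian every tau_g steps."""
    rng = np.random.default_rng(seed)
    s, centers = -1.0, []
    for step in range(1, n_steps + 1):
        f0 = -20.0 * s * (s ** 2 - 1.0)            # -dV0/ds
        f = f0 + bias_force(s, centers, w, width)
        s += f * dt + np.sqrt(2.0 * kT * dt) * rng.standard_normal()
        if step % tau_g == 0:
            centers.append(s)                      # deposit at the visited point
    return np.array(centers)
```

Once the wells are filled, `-bias(s, centers, w, width)` evaluated on a grid gives the free-energy estimate up to an additive constant; production codes usually store the bias on a grid rather than summing over all deposited Gaussians at every step.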

The difference between the metadynamics estimate of the free energy and the true free energy can be shown to be related to the diffusion coefficient of the collective variables and to the rate at which the bias is grown. A possible way to decrease this error as a simulation progresses is to decrease the growth rate of the bias. Well-tempered metadynamics [148] uses an optimized schedule to decrease the deposition rate of bias by modulating the Gaussian height:

$$w = \omega\_0 \tau\_G e^{-\frac{\Delta V(\theta, t)}{k\_B \Delta T}} \tag{27}$$

Here, ω<sup>0</sup> is the initial "deposition rate", measured in Gaussian height per unit time, and ΔT is a parameter that controls the degree to which the biased trajectory makes excursions away from free-energy minima. It is possible to show that with well-tempered metadynamics the bias does not converge to the negative of the free energy but to a fraction of it, which results in sampling the CVs at an effectively higher temperature T + ΔT; normal metadynamics is recovered for ΔT → ∞. We note that other deposition schedules can be used, aimed, e.g., at maximizing the number of round-trips in the CV space [149]. Importantly, it is possible to recover equilibrium Boltzmann statistics of *unbiased* collective variables from samples drawn throughout a well-tempered metadynamics trajectory [150]; it does not seem clear that one can do this from an ABF trajectory. Finally, it is possible to tune the shape of the Gaussians on the fly using schemes based on the geometric compression of the phase space or on the variance of the CVs [151].
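As a small illustration of Equation (27), the sketch below evaluates the tempered Gaussian height and converts a converged bias into a free-energy estimate. The function names are hypothetical, and the conversion assumes the standard long-time result for well-tempered metadynamics, in which the bias tends to −ΔT/(T + ΔT) times the free energy up to a constant.

```python
import numpy as np

def wt_height(omega0, tau_g, bias_here, kB_dT):
    """Gaussian height of Equation (27), given the bias already accumulated
    at the current CV value and the product k_B * Delta T."""
    return omega0 * tau_g * np.exp(-bias_here / kB_dT)

def free_energy_from_bias(bias_values, T, dT):
    """Long-time estimate F = -(T + dT)/dT * bias, shifted so min(F) = 0."""
    F = -(T + dT) / dT * np.asarray(bias_values, dtype=float)
    return F - F.min()
```

For very large `kB_dT` the exponential tends to one and the height stays at `omega0 * tau_g`, which is the constant-height limit of ordinary metadynamics mentioned above.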

In the well-tempered ensemble, the parameter ΔT can be used to tune the size of the explored region, in a fashion similar to the fictitious temperature in TAMD. So both TAMD and well-tempered metadynamics can be used to explore *relevant* regions of CV space while surmounting *relevant* free energy barriers. However, there are important distinctions between the two methods. First, the main source of error in TAMD rests with how well mean forces are approximated, and adiabatic separation, realizable only when the auxiliary variables *z* never move, is the only way to guarantee they are perfectly accurate. In practical applications, TAMD never achieves perfect adiabatic separation. In contrast, because the deposition rate of bias decreases as a well-tempered trajectory progresses, errors related to poor adiabatic separation are progressively damped. Second, as already mentioned, TAMD alone cannot report the free energy, but for that reason it is also not practically limited by the dimensionality of CV space; multicomponent gradients are just as accurately calculated in TAMD as are single-component gradients. Metadynamics, as a histogram-filling method, must exhaustively sample a finite region around any point to know that the free energy and its gradients are correct, which can sometimes limit its utility.

Metadynamics is a powerful method whose popularity continues to grow. In either its original formulation or in more recent variants, metadynamics has been employed successfully in several fields, some of which we point out below with some representative examples:


(9) and proton diffusion [168].

#### *2.4. Some Comments on Collective Variables*

#### 2.4.1. The Physical Fidelity of CV-Spaces

Given a potential <sup>V</sup> (*x*), any multidimensional CV *θ*(*x*) has a mathematically determined free energy <sup>F</sup>(*z*), and in principle the free-energy methods we describe here (and others) can use and/or compute it. However, this does not guarantee that <sup>F</sup> is meaningful, and a poor choice for *θ*(*x*) can render the results of even the most sophisticated free-energy methods useless for understanding the nature of actual metastable states and the transitions among them. This puts two major requirements on any CV space:


The first of these may seem obvious: CVs are chosen to provide a low-dimensional description of some important process, say a conformational change or a chemical reaction or a binding event, and one cannot describe a process without being able to discriminate states. However, it is not always easy to find CVs that do this. Even given representative configurations of two distinct metastable states, standard MD from these two different initial configurations may sample partially overlapping regions of CV space, making the assignment of an arbitrary configuration to a state ambiguous. It may be in this case that the two representative configurations actually belong to the same state, or that, if there are two states, no matter what CV space is overlaid, the barrier separating them is so small that, on MD timescales, they can be considered rapidly exchanging substates of some larger state.

However, a third possibility exists: the two MD simulations mentioned above may in fact represent very different states. The overlap might just be an artifact of neglecting to include one or more CVs that are truly necessary to distinguish those states. If there is a significant free energy barrier along this neglected variable, an MD simulation will not cross it, yet may still sample regions in CV space also sampled by an MD simulation launched from the other side of this hidden barrier. And it is even worse: if TI or umbrella sampling is used along a pathway in CV space that neglects an important variable, the free-energy barriers along that pathway might be totally meaningless.

Hidden barriers can be a significant problem in CV-based free-energy calculations. Generally speaking, one only learns of a hidden barrier after postulating its existence and testing it with a new calculation. Detecting them is not straightforward and often involves a good deal of CV space exploration. Methods such as TAMD and well-tempered metadynamics offer this capability, but much more work could be done in the automated detection of hidden barriers and the "right" CVs (e.g., [169–171]).

An obvious way of reducing the likelihood of hidden barriers is to increase the dimensionality of CV space. TAMD is well-suited to this because it is a gradient method, but standard metadynamics, because it is a histogram-filling method, is not. A recent variant of metadynamics termed "reconnaissance metadynamics" [172] does have the capability of handling high-dimensional CV spaces. In reconnaissance metadynamics, bias potential kernels are deposited at the CV space points identified as centers of clusters detected and measured by an on-the-fly clustering scheme. These kernels are hyperspherically symmetric but grow as cluster sizes grow and are able to push a system out of a CV space basin to discover other basins. As such, reconnaissance metadynamics is an automated way of identifying free-energy minima in high-dimensional CV spaces. It has been applied to the identification of configurations of small clusters of molecules [173] and the identification of protein-ligand binding poses [162].

#### 2.4.2. Some Common and Emerging Types of CVs

There are very few "best practices" codified for choosing CVs for any given system. Most CVs are developed ad hoc based on the processes that investigators would like to study, for instance, center-of-mass distance between two molecules for studying binding/unbinding, or torsion angles for studying conformational changes, or number of contacts for studying order-disorder transitions. Cartesian coordinates of centers of mass of groups of atoms are also often used as CVs, as they are functions of these coordinates.

The potential energy <sup>V</sup> (*x*) is also an example of a 1-D CV, and there have been several examples of using it in CV-based enhanced sampling methods, such as umbrella sampling [174], metadynamics [175], and well-tempered metadynamics [176]. In a recent work based on steered MD, it has been shown that relevant reductions of the potential energy (e.g., the electrostatic interaction free-energy) can also be used as effective CVs [177]. The basic rationale for enhanced sampling of V is that states with higher potential energy often correspond to transition states, and one need make no assumptions about precise physical mechanisms. Key to its successful use as a CV, as it is for any CV, is a proper accounting for its entropy; *i.e.*, the classical density-of-states.

Coarse-graining of particle positions onto Eulerian fields was used early on in enhanced sampling [178]; here, the value of the field at any Cartesian point is a CV, and the entire field represents a very high-dimensional CV. This idea has been put to use recently in the "indirect umbrella sampling" method of Patel *et al*. [179] for computing free energies of solvation, and in string method (Section 4.2.1) calculations of lipid bilayer fusion [180]. In a similar vein, there have been recent attempts at variables designed to count the recurrence of groups of atoms positioned according to given templates, such as α-helices or paired β-strands in proteins [181].

We finally mention the possibility of building collective variables based on a set of frames that might be available from experimental data or generated by means of previous MD simulations. Some of these variables are based on the idea of computing the distances between the present configuration and a set of precomputed snapshots. These distances, here indicated with $d\_i$, where i is the index of the snapshot, are then combined to obtain a coarse representation of the present configuration, which is then used as a CV. As an example, one might combine the distances as

$$s = \frac{\sum\_{i} e^{-\lambda d\_i} i}{\sum\_{i} e^{-\lambda d\_i}} \tag{28}$$

If the parameter λ is properly chosen, this function returns a continuous interpolation between the indexes of the snapshots which are closer to the present conformation. If the snapshots are disposed along a putative path connecting two experimental structures, this CV can be used as a path CV to monitor and bias the progression along the path [182]. A nice feature of path CVs is that it is straightforward to also monitor the distance from the putative path. The standard way to do it is by looking at the distance from the closest reference snapshot, which can be approximately computed with the following continuous function:

$$z = -\lambda^{-1} \log \sum\_{i} e^{-\lambda d\_i} \tag{29}$$

This approach, modified to use internal coordinates, was used recently by Zinovjev *et al*. to study the aqueous phase reaction of pyruvate to salicylate, and in the CO bond-breaking/proton transfer in PchB [183].
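Equations (28) and (29) can be evaluated together from the same set of distances. The sketch below is a minimal illustration; the function name `path_cvs` and the convention that snapshots are indexed 1 to N are assumptions. Subtracting the smallest distance before exponentiating is a standard numerical safeguard and does not change either result.

```python
import numpy as np

def path_cvs(d, lam):
    """Return (s, z) of Equations (28) and (29) for distances d[0..N-1]
    to snapshots indexed 1..N."""
    d = np.asarray(d, dtype=float)
    idx = np.arange(1, d.size + 1)
    shift = d.min()                       # factor out the largest weight
    wgt = np.exp(-lam * (d - shift))
    s = (wgt * idx).sum() / wgt.sum()
    z = shift - np.log(wgt.sum()) / lam   # = -(1/lam) log sum_i exp(-lam d_i)
    return s, z
```

For a configuration equidistant from snapshots 1 and 2, `path_cvs([1.0, 1.0], 2.0)` returns s = 1.5, halfway between the two indexes, and a z slightly below the common distance, which illustrates why Equation (29) is only an approximation to the distance from the closest snapshot.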

A generalization to multidimensional paths (*i.e*., sheets) can be obtained by assigning a generic vector v<sup>i</sup> to each of the precomputed snapshots and computing its average [184]:

$$s = \frac{\sum\_{i} e^{-\lambda d\_i} v\_i}{\sum\_{i} e^{-\lambda d\_i}} \tag{30}$$

#### 3. Tempering Approaches

"Tempering" refers to a class of methods based on increasing the temperature of an MD system to overcome barriers. Tempering relies on the fact that according to the Arrhenius law the rate at which activated (barrier-crossing) events happen is strongly dependent on the temperature. Thus, an annealing procedure where the system is first heated and then cooled allows one to produce quickly samples which are largely uncorrelated. The root of all these ideas indeed lies in the simulated annealing procedure [185], a well-known method successfully used in many optimization problems.

#### *3.1. Simulated Tempering*

Simulated annealing is a form of Markov-chain Monte Carlo sampling where the temperature is artificially modified during the simulation. In particular, sampling is initially done at a temperature high enough that the simulation can easily overcome high free-energy barriers. Then, the temperature is decreased as the simulation proceeds, thus smoothly bringing the simulation to a local energy minimum. In simulated annealing, a critical parameter is the cooling speed. Indeed, the probability of reaching the global minimum grows as this speed is decreased.

The search for the global minimum can be interpreted in the same way as sampling an energy landscape at zero temperature. One could thus imagine using simulated annealing to generate conformations at, e.g., room temperature by slowly cooling conformations starting at high temperature. However, the resulting ensemble will strongly depend on the cooling speed, thus possibly providing a biased result. A better approach consists of the so-called simulated tempering methods [186]. Here, a discrete list of temperatures $T\_i$, with i ∈ 1 ... N, is chosen *a priori*, typically spanning a range going from the physical temperature of interest to a temperature which is high enough to overcome all relevant free energy barriers. (Note that we do not have to stipulate a CV-space in which those barriers live.) Then, the index i, which indicates at which temperature the system should be simulated, is evolved with time. Two kinds of moves are possible: (a) normal evolution of the system at fixed temperature, which can be done with a usual Markov Chain Monte Carlo or molecular dynamics and (b) change of the index i at fixed atomic coordinates. It is easy to show that the latter can be performed as a Monte Carlo step with acceptance equal to

$$\alpha = \min\left(1, \frac{Z\_j}{Z\_i} e^{-\frac{U(x)}{k\_B T\_j} + \frac{U(x)}{k\_B T\_i}}\right) \tag{31}$$

where i and j are the indexes corresponding to the present temperature and the new one. The weights $Z\_i$ should be chosen so as to sample all values of i equally. Note that even within molecular dynamics simulations only the potential energy usually appears in the acceptance. This is because the velocities are typically scaled by a factor $\sqrt{T\_j/T\_i}$ upon acceptance. This scaling leads to a cancellation of the contribution to the acceptance coming from the kinetic energy. Ultimately, this is related to the fact that the ensemble of velocities is analytically known *a priori*, such that it is possible to adapt the velocities to the new temperature instantaneously.

Estimating these weights $Z\_i$ is nontrivial and typically requires a preliminary step. Moreover, if this estimate is poor the system could spend no time at the physical temperature, thus spoiling the result. Iterative algorithms for adjusting these weights have been proposed (see e.g., [187]). We also observe that since the temperature sets the typical value of the potential energy, an effect very similar to that of simulated tempering with adaptive weights can be obtained by performing a metadynamics simulation using the potential energy as a CV (Section 2.4.2).
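The temperature move of Equation (31) and the accompanying velocity rescaling can be sketched as follows. The function names are hypothetical, and passing the weights as logarithms is an assumption made here only to avoid overflow when the weights span many orders of magnitude.

```python
import math

def st_acceptance(U, T_i, T_j, lnZ_i, lnZ_j, kB=1.0):
    """Equation (31) with the weights supplied as logarithms, ln Z."""
    log_ratio = (lnZ_j - lnZ_i) - U / (kB * T_j) + U / (kB * T_i)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)

def velocity_scale(T_i, T_j):
    """Factor applied to all velocities after an accepted i -> j move."""
    return math.sqrt(T_j / T_i)
```

With equal weights and a negative potential energy, `st_acceptance(-10.0, 1.0, 2.0, 0.0, 0.0)` gives exp(−5): moves to the higher temperature are strongly suppressed unless the weights compensate, which is the reason a poor weight estimate can trap the system at one end of the temperature list.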

#### *3.2. Parallel Tempering*

A smart way to alleviate the issue of finding the correct weights is that of simulating several replicas at the same time [188,189]. Rather than changing the temperature of a single system, the defining move proposal in parallel tempering consists of a coordinate swap between two T-replicas with acceptance probability

$$\alpha = \min\left(1, e^{\left(\frac{1}{k\_B T\_j} - \frac{1}{k\_B T\_i}\right) [U(\mathbf{x}\_j) - U(\mathbf{x}\_i)]}\right) \tag{32}$$

This method is the root of a class of techniques collectively known as "replica exchange" methods, and the latter name is often used as a synonym for parallel tempering. Notably, within this framework it is not necessary to precompute a set of weights. Indeed, the equal time spent by each replica at each temperature is enforced by the constraint that only pairwise swaps are allowed. Moreover, parallel tempering has an additional advantage: since the replicas are weakly coupled and only interact when exchanges are attempted, they can be simulated on different computers without the need of a very fast interconnection (provided, of course, that a single replica is small enough to run on a single node).
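A minimal sketch of the swap move of Equation (32) is given below. The function names and the even/odd neighbor pairing selected by `offset` are assumptions for illustration; attempting swaps only between adjacent temperatures is a common choice, not a requirement of the method.

```python
import numpy as np

def pt_acceptance(U_i, U_j, T_i, T_j, kB=1.0):
    """Equation (32): probability of swapping coordinates x_i (at T_i)
    and x_j (at T_j), with U_i = U(x_i) and U_j = U(x_j)."""
    log_ratio = (1.0 / (kB * T_j) - 1.0 / (kB * T_i)) * (U_j - U_i)
    return min(1.0, float(np.exp(min(log_ratio, 0.0))))

def attempt_swaps(energies, temps, rng, offset=0):
    """Attempt swaps between neighbors (k, k+1) for k = offset, offset+2, ...
    Returns perm, where perm[k] is the index of the configuration that
    runs at temps[k] after the attempts."""
    perm = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        a = pt_acceptance(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < a:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm
```

A swap is always accepted when the higher-temperature replica happens to hold the lower-energy configuration, and is otherwise accepted with a probability that decays with both the energy gap and the gap in inverse temperature.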

The calculation of the acceptance is very cheap as it is based on the potential energy, which is often computed alongside force evaluation. Thus, one could in theory also exploit a large number of virtual, rejected exchanges so as to enhance statistical sampling [190,191]. Since the efficiency of a parallel tempering simulation can deteriorate if the stride between subsequent exchanges is too large [192,193], a typical recipe is to choose this stride as small as possible, with the only limitation of avoiding extra costs due to replica synchronization. One can push this idea further and implement asynchronous versions of parallel tempering, where overhead related to exchanges is minimized [193,194]. One should be aware, however, that especially at high exchange rates, artifacts coming from, e.g., the use of wrong thermostatting schemes could spoil the results [195,196].

Parallel tempering is popular in simulations of protein conformational sampling [197,198], protein folding [189,199–203] and aggregation [204,205], due at least in part to the fact that one need not choose CVs to use it, and CVs for describing these processes are not always straightforward to determine.

#### *3.3. Generalized Replica Exchange*

The difference between the replicas is not restricted to be a change in temperature. Any control parameter can be changed, and even the expression of the Hamiltonian can be modified [206]. In the most general case every replica is simulated at a different temperature (and/or pressure) and a different Hamiltonian, and the acceptance reads

$$\alpha = \min\left(1, \frac{e^{-\left(\frac{U\_i(x\_j)}{k\_B T\_i} + \frac{U\_j(x\_i)}{k\_B T\_j}\right)}}{e^{-\left(\frac{U\_i(x\_i)}{k\_B T\_i} + \frac{U\_j(x\_j)}{k\_B T\_j}\right)}}\right) \tag{33}$$
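
Equation (33) requires four energy evaluations per attempted swap, since each configuration must be evaluated with both Hamiltonians. The sketch below is a hypothetical helper written for illustration; the argument naming convention is an assumption.

```python
import math

def gre_acceptance(Ui_xi, Ui_xj, Uj_xi, Uj_xj, T_i, T_j, kB=1.0):
    """Equation (33). Ua_xb is the energy of configuration x_b evaluated
    with the Hamiltonian of replica a."""
    b_i, b_j = 1.0 / (kB * T_i), 1.0 / (kB * T_j)
    log_ratio = -(b_i * Ui_xj + b_j * Uj_xi) + (b_i * Ui_xi + b_j * Uj_xj)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

When both replicas share the same Hamiltonian, so that `Ui_xi == Uj_xi` and `Ui_xj == Uj_xj`, the expression reduces to the parallel tempering acceptance of Equation (32).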

Several recipes for choosing the modified Hamiltonian have been proposed in the literature [207–219]. Among these, a notable idea is that of solute tempering [208,217], which is used for the simulation of solvated biomolecules. Here, only the Hamiltonian of the solute is modified. More precisely, one could notice that a scaling of the Hamiltonian by a factor λ is completely equivalent to a scaling of the temperature by a factor λ<sup>−1</sup>. Hamiltonian scaling however can take advantage of the fact that the total energy of the system is an extensive property. Thus, one can limit the scaling to the portion of the system which is considered to be interesting and which has the relevant bottlenecks. With solute tempering, the solute energy is scaled whereas the solvent energy is left unchanged. This is equivalent to keeping the solute at a high effective temperature and the solvent at the physical temperature. Since in the simulation of solvated molecules most of the atoms belong to the solvent, this results in a much smaller modification to the explored ensemble when compared with parallel tempering. In spite of this, the effect on the solute closely resembles that of increasing the physical temperature.

A sometimes-overlooked subtlety in solute tempering is the choice for the treatment of solvent-solute interactions. Indeed, whereas solute-solute interactions are scaled with a factor λ < 1 and solvent-solvent interactions are not scaled, any intermediate choice (scaling factor between λ and 1) could intuitively make sense for solvent-solute coupling. In the original formulation, the authors used a factor (1 + λ)/2 for the solute-solvent interaction. This choice however was later shown to be suboptimal [217,220], and refined to be √λ. This latter choice appears to be more physically sound, since it allows one to just simulate the biased replicas with a modified force-field. Indeed, if one scales the charges of the solute by a factor √λ, electrostatic interactions are changed by a factor λ for solute-solute coupling and √λ for solute-solvent coupling. The same is true for Lennard-Jones terms, albeit in this case it depends on the specific combination rules used. Notably, the same rules for scaling were used in a previous work [209]. As a final remark, we point out that solute tempering can be also used in a serial manner *à la* simulated tempering, in a simulated solute tempering scheme [221].
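The charge-scaling argument above can be checked numerically. The sketch below uses a Coulomb energy in reduced units and hypothetical helper names; it illustrates only the electrostatic part, not a complete solute tempering setup.

```python
import math

def coulomb(q1, q2, r):
    """Coulomb energy in reduced units (prefactor set to 1)."""
    return q1 * q2 / r

def scale_solute_charges(charges, is_solute, lam):
    """Multiply solute charges by sqrt(lam); leave solvent charges alone."""
    s = math.sqrt(lam)
    return [q * s if solute else q for q, solute in zip(charges, is_solute)]
```

For λ = 0.25, two solute charges each shrink by a factor 0.5, so their mutual energy is multiplied by 0.25 = λ, while a solute-solvent pair is multiplied by 0.5 = √λ and a solvent-solvent pair is unchanged.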

#### *3.4. General Comments*

In general, the advantage of these tempering methods over straightforward sampling can be rationalized as follows. A simulation is evolved so as to sample a modified ensemble by e.g., raising temperature or artificially modifying the Hamiltonian. The change in the ensemble could be drastic, so that trying to extract canonical averages by reweighting from such a simulation would be pointless. For this reason, a ladder of intermediate ensembles is built, interpolating between the physical one (*i.e*., room temperature, physical Hamiltonian) and the modified one. Then, transitions between consecutive steps in this ladder (or, in parallel schemes, coordinate swaps) are performed using a Monte Carlo scheme. Assuming that the dynamics of the most modified ensemble is ergodic, independent samples will be generated every time a new simulation reaches the highest step of the ladder. Thus, efficiency of these methods is often based on the evaluation of the round trip time required for a replica to traverse the entire ladder.

Tempering methods thus rely on the ergodicity of the most modified ensemble. This assumption is not always correct. A very simple example is parallel tempering used to accelerate the sampling over an entropic barrier. Since the height of an entropic barrier grows with the temperature, under these conditions the barriers in the most modified ensembles are unaffected [222]. Moreover, since a lot of time is spent in sampling states in non-physical situations (e.g., high temperature), the overall computational efficiency could even be lower than that of straightforward sampling. Real applications are often in an intermediate situation, and the usefulness of parallel tempering should be evaluated case by case.

The number of intermediate steps in the ladder can be shown to grow with the square root of the specific heat of the system in the case of parallel tempering simulations. No general relationship can be drawn in the case of Hamiltonian replica exchange, but one can expect approximately that the number of replicas should be proportional to the square root of the number of degrees of freedom affected by the modification of the Hamiltonian. Thus, Hamiltonian replica exchange methods could be much more effective than simple parallel tempering as they allow the effort to be focused and the number of replicas to be minimized.
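The text above does not prescribe how the intermediate temperatures should be placed. A widely used heuristic, shown here only as a sketch, is a geometric ladder with a constant ratio between neighboring temperatures; it rests on the assumption of a roughly temperature-independent heat capacity, and the function name is hypothetical.

```python
import numpy as np

def geometric_ladder(T_min, T_max, n_replicas):
    """Temperatures with a constant ratio T[k+1] / T[k]."""
    return T_min * (T_max / T_min) ** (np.arange(n_replicas) / (n_replicas - 1))
```

In practice the ladder is often adjusted after a short trial run so that the observed acceptance rates between neighbors are roughly uniform.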

Parallel tempering has the advantage that all the replicas can be analyzed to obtain meaningful results, e.g., to predict the melting curve of a molecule. This procedure should be used with caution, especially with empirically parametrized potentials, which are often tuned to be realistic only at room temperature. On the other hand, Hamiltonian replica exchange often relies on unphysically modified ensembles which have no interest but for the fact that they increase ergodicity.

As a final note, we observe that data obtained at different temperatures (or with modified Hamiltonians) could be combined to enhance statistics at the physical temperature [223]. However, the effectiveness of this data recycling is limited by the fact that high-temperature replicas very rarely visit low-energy conformations, thus decreasing the amount of additional information that can be extracted.

#### 4. Combinations and Advanced Approaches

#### *4.1. Combination of Tempering Methods and Biased Sampling*

The algorithms presented in Section 3 and based on tempering are typically considered to be simpler to apply when compared with those discussed in Section 2 and based on biasing the sampling of selected collective variables. Indeed, by avoiding the problem of choosing collective variables which properly describe the reaction path, most of the burden of setting up a simulation is removed. However, this comes at a price: considering the computational cost, tempering methods are extremely expensive. This cost is related to the fact that they are able to accelerate all degrees of freedom to the same extent, without an *a priori* knowledge of the sampling bottlenecks. In this sense, Hamiltonian replica exchange methods are in an intermediate situation, since they are typically less expensive than parallel tempering but allow one to embed part of the knowledge of the system in the simulation setup.

Because of the conceptual difference between tempering methods and CV-based methods, these approaches can be easily and efficiently combined. As an example, the combination of metadynamics and parallel tempering can be used to take advantage of the known bottlenecks with biased collective variables while at the same time accelerating the overall sampling with parallel tempering [156]. In that work, the free energy landscape for the folding of a small hairpin was computed by biasing a small number of selected CVs (gyration radius and the number of hydrogen bonds). These CVs alone are not enough to describe folding, as can be easily shown by performing a metadynamics simulation using these CVs. However, the combination with parallel tempering allowed acceleration of all the degrees of freedom blindly and reversible folding of the hairpin. This combined approach also improves the results when compared with parallel tempering alone, since it accelerates exploration of phase space. Moreover, since parallel tempering samples the unbiased canonical distribution, it is very difficult to use it to compute free-energy differences which are larger than a few kBT. The metadynamics bias can be used to disfavor, e.g., the folded state so as to better estimate the free-energy difference between the folded and unfolded states.

It is also possible to combine metadynamics with the solute tempering method so as to decrease the number of required replicas and the computational cost [224]. As an alternative to solute tempering, metadynamics in the well-tempered ensemble can be effectively used to enhance the acceptance in parallel tempering simulations and to decrease the number of necessary replicas [176]. This combination of parallel tempering with the well-tempered ensemble can be pushed further and combined with metadynamics on a few selected degrees of freedom [225]. As a final note, bias exchange metadynamics [226] combines metadynamics and replica exchange in a completely different spirit: every replica is run using a different CV, thus allowing many CVs to be tried at the same time. This technique has been successfully applied to several problems. For a recent review, we refer the reader to [227].

#### *4.2. Some Methods Based on TAMD*

#### 4.2.1. String Method in Collective Variables

The string method is generally an approach to find pathways of minimal energy connecting two points in phase space [228]. When working in CVs, the string method is used to find minimal free-energy paths (MFEP's) [229]. String method calculations involve multiple replicas, each representing a point *<sup>z</sup>*<sup>s</sup> in CV space at position <sup>s</sup> along a discretized string connecting two points of interest (reactant and product states, say). The forces on each replica's *<sup>z</sup>*<sup>s</sup> are computed and their *z*<sup>s</sup>'s updated, as in TAMD, with the addition of forces that act to keep the *z*'s equidistant along the string (so-called reparameterization forces):

$$\bar{\gamma}\dot{z}\_{j}(s,t) = \sum\_{k} \left[ \tilde{M}\_{jk}(\mathbf{x}(s,t))\kappa[\theta\_{k}(\mathbf{x}(s,t)) - z\_{k}(s,t)] \right] + \eta\_{z}(t) + \lambda(s,t)\frac{\partial z\_{j}}{\partial s} \tag{34}$$

Here, $\tilde{M}\_{jk}$ is the metric tensor mapping distances on the manifold of atomic coordinates to the manifold of CV space, η is thermal noise and $\lambda(s,t)\,\partial z\_j/\partial s$ represents the reparameterization force tangent to the string that is sufficient to maintain equidistant images along the string. The string method has been used to study activation of the insulin-receptor kinase [63], docking of insulin to its receptor [230], and myosin [231]. In these examples, the update of the string coordinates is done at a lower frequency than the atomic variables in each image.
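In discretized implementations, the requirement of equidistant images is frequently enforced by an explicit interpolation step after each update rather than by a force term. The sketch below shows that alternative; it swaps piecewise-linear interpolation in for the λ(s, t) term of Equation (34), and the function name and array layout are assumptions.

```python
import numpy as np

def reparameterize(images):
    """Redistribute images (n_images, n_cv) at equal arc length along the
    piecewise-linear curve through them; end points stay fixed."""
    images = np.asarray(images, dtype=float)
    seg = np.linalg.norm(np.diff(images, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seg)))        # arc length at each image
    target = np.linspace(0.0, arc[-1], len(images))
    return np.column_stack([np.interp(target, arc, images[:, k])
                            for k in range(images.shape[1])])
```

Arc length here is the plain Euclidean distance in CV space; if the CVs carry different units, rescaling them before the interpolation would be a sensible additional assumption.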

In contrast, in the on-the-fly variant of string method in CVs, the friction on the *z*<sup>s</sup>'s is set high enough to make the effective averaging of the forces approach the true mean forces, and the *z* updates occur in lockstep with the *x* updates of the MD system [232]. Just as in TAMD, the atomic variables obey an equation of motion like Equation (12) tethering them to the *<sup>z</sup>*<sup>s</sup>. Stober and Abrams recently demonstrated an implementation of on-the-fly string method to study the thermodynamics of the normal-to-amyloidogenic transition of β2-microglobulin [233]. Unique in this approach was the construction of a single composite MD system containing 27 individual β2 molecules restrained to points on 3 × 3 × 3 grid inside a single large solvent box. Zinovjev *et al*. used a combination of the on-the-fly string method and of path-collective variables (see Equations (28) and (29)) in a quantum-mechanics/molecular-mechanics approach to study a methyltransferase reaction [234].

#### 4.2.2. On-the-Fly Free Energy Parameterization

Because TAMD provides mean-force estimates as it is exploring CV space, it stands to reason that those mean forces could be used to compute a free energy. In contrast, in the single-sweep method [68], the TAMD forces are only used in the CV space exploration phase, not the free-energy calculation itself. Recently, Abrams and Vanden-Eijnden proposed a method for using TAMD directly to *parameterize* a free energy; that is, to determine the best set of some parameters *λ* on which a free energy of known functional form depends [235]:

$$F(\mathbf{z}) = F(\mathbf{z}; \lambda^\*) \tag{35}$$

The approach, termed "on-the-fly free energy parameterization", uses forces from a running TAMD simulation to progressively optimize *λ* using a time-averaged gradient error:

$$E(\lambda) = \frac{1}{2t} \int\_0^t \left| \nabla\_z F \left[ \mathbf{z}(s), \lambda(t) \right] + \kappa \left[ \theta(\mathbf{x}(s)) - \mathbf{z}(s) \right] \right|^2 ds \tag{36}$$

If *F* is constructed to be linear in *λ* = (λ<sub>1</sub>, λ<sub>2</sub>, ..., λ<sub>M</sub>), minimization of *E* can be expressed as a simple linear algebra problem

$$\sum\_{j} A\_{ij}\lambda\_j = b\_i, \quad i = 1, \ldots, M \tag{37}$$

and the running TAMD simulation provides progressively better estimates of *A* and *b* until the *λ* converge. In the cited work, it was shown that this method is an efficient way to derive potentials of mean force between particles in coarse-grained molecular simulations as basis-function expansions. It is currently being investigated as a means to parameterize free energies associated with conformational changes of proteins.
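A minimal sketch of how the linear system of Equation (37) arises, under stated assumptions: a one-dimensional CV, a hypothetical basis F(z; λ) = λ<sub>1</sub>z² + λ<sub>2</sub>z⁴, and synthetic noisy forces standing in for the instantaneous TAMD forces κ[θ(x) − z], whose mean is −∂F/∂z. Setting the derivative of Equation (36) with respect to each λ<sub>i</sub> to zero gives A<sub>ij</sub> as the time average of ∇φ<sub>i</sub>·∇φ<sub>j</sub> and b<sub>i</sub> as minus the time average of ∇φ<sub>i</sub> dotted into the force. This is an illustration, not the implementation of the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def basis_gradients(z):
    """Gradients of the hypothetical basis functions phi_1 = z^2, phi_2 = z^4."""
    return np.stack((2.0 * z, 4.0 * z**3), axis=1)   # shape (n_samples, M)

true_lam = np.array([-2.0, 1.0])     # assumed target: F(z) = -2 z^2 + z^4

# Synthetic stand-in for a TAMD run: visited z values and noisy force
# estimates f = kappa * (theta(x) - z), whose mean is -dF/dz.
z = rng.uniform(-1.5, 1.5, size=20000)
grad_phi = basis_gradients(z)
f = -(grad_phi @ true_lam) + rng.normal(0.0, 2.0, size=z.size)

# Sample averages defining the linear system A lam = b.
A = grad_phi.T @ grad_phi / z.size
b = -(grad_phi.T @ f) / z.size
lam = np.linalg.solve(A, b)
```

In an on-the-fly setting, `A` and `b` would be accumulated as running averages and the solve repeated as the simulation proceeds.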

Chen, Cuendet, and Tuckerman developed a very similar approach that, in addition to parameterizing a free energy using d-AFED-computed gradients, uses a metadynamics-like bias on the potential [236]. These authors demonstrated efficient reconstruction of the four-dimensional free-energy surface of alanine dipeptide in vacuum with this approach.

#### 5. Conclusions

In this review, we have summarized some of the current and emerging enhanced sampling methods that sit atop MD simulation. These have been broadly classified as methods that use collective variable biasing and methods that use tempering. CV biasing is a much more prevalent approach than tempering, due partially to the fact that it is perceived to be cheaper, since tempering simulations are really only useful for enhanced sampling of configuration space when run in parallel. CV-biasing also reflects the desire to rein in the complexity of all-atom simulations by projecting configurations into a much lower dimensional space. (Parallel tempering can be thought of as increasing the dimensionality of the system by a factor equal to the number of simulated replicas.) But the drawback of all CV-biasing approaches is the risk that the chosen CV space does not provide the most faithful representation of the true spectrum of metastable subensembles and the barriers that separate them. Guaranteeing that sampling of CV space is not stymied by hidden barriers must be of paramount concern in the continued evolution of such methods. For this reason, methods that specifically allow broad exploration of CV space, like TAMD (which can handle large numbers of CVs) and well-tempered metadynamics will continue to be valuable. So too will parallel tempering because its broad sampling of configuration space can be used to inform the choice of better CVs. Accelerating development of combined CV-tempering methods bodes well for enhanced sampling generally.

Although some of these methods involve time-varying forces (ABF, TAMD, and metadynamics), all methods we've discussed have the underlying rationale of the equilibrium ensemble. TI uses the constrained ensemble, ABF and metadynamics ideally converge to an ensemble in which a bias erases free-energy variations, and TAMD samples an attenuated/mollified equilibrium ensemble. There is an entirely separate class of methods that inherently rely on *non-equilibrium* thermodynamics. We have not discussed at all the several free-energy methods based on non-equilibrium MD simulations; we refer interested readers to the article by Christoph Dellago and Gerhard Hummer in this issue.

Finally, we have also not really touched on any of the practical issues of implementing and using these methods in conjunction with modern MD packages (e.g., NAMD [237], LAMMPS [238], Gromacs [239], Amber [240], and CHARMM [241], to name a few). At least two packages (NAMD and CHARMM) have native support for collective variable biasing, and NAMD in particular offers both native ABF and a Tcl-based interface which has been used to implement TAMD [58]. The native collective variable module for NAMD has been recently ported to LAMMPS [242]. Gromacs offers native support for parallel tempering. Generally speaking, however, modifying MD codes to handle CV-biasing and multiple replicas is not straightforward, since one would like access to the data structures that store coordinates and forces. A major help in this regard is the PLUMED package [243,244], which patches a variety of MD codes to enable users to use many of the techniques discussed here.

#### Acknowledgments

CFA would like to acknowledge support of NSF (DMR-1207389) and NIH (1R01GM100472). GB would like to acknowledge the European Research Council (Starting Grant S-RNA-S, no. 306662) for financial support. Both authors would like to acknowledge NSF support of a recent Pan-American Advanced Studies Institute Workshop "Molecular-based Multiscale Modeling and Simulation" (OISE-1124480; PI: W. J. Pfaendtner, U. Washington) held in Montevideo, Uruguay, 12–15 September 2012, where the authors met and began discussions that influenced the content of this review.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Dellago, C.; Hummer, G. Computing Equilibrium Free Energies Using Non-Equilibrium Molecular Dynamics. *Entropy* 2014, *16*, 41–61.

## *Article*

## Computing Equilibrium Free Energies Using Non-Equilibrium Molecular Dynamics

Christoph Dellago **<sup>1</sup>***,* \* and Gerhard Hummer **<sup>2</sup>**


*Received: 10 October 2013; in revised form: 12 November 2013 / Accepted: 19 November 2013 / Published: 27 December 2013*

Abstract: As shown by Jarzynski, free energy differences between equilibrium states can be expressed in terms of the statistics of work carried out on a system during non-equilibrium transformations. This exact result, as well as the related Crooks fluctuation theorem, provides the basis for the computation of free energy differences from fast switching molecular dynamics simulations, in which an external parameter is changed at a finite rate, driving the system away from equilibrium. In this article, we first briefly review the Jarzynski identity and the Crooks fluctuation theorem and then survey various algorithms building on these relations. We pay particular attention to the statistical efficiency of these methods and discuss practical issues arising in their implementation and the analysis of the results.

Keywords: fast switching simulations; non-equilibrium work theorem; fluctuation theorem; non-equilibrium molecular dynamics

### 1. Introduction

The calculation of free energies from atomistic simulations is of great importance in many applications, ranging from the prediction of the phase behavior of a certain substance to the calculation of ligand affinities in drug design. Since the computation of free energies (or, more precisely, of free energy differences) involves the determination of entropic contributions and, hence, the estimation of phase space volumes [1], free energy calculations are computationally very demanding in most cases. Therefore, a significant effort has been devoted to the development of more efficient free energy calculation algorithms. This endeavor has received new momentum with Jarzynski's discovery of a very general relation between equilibrium free energies and non-equilibrium work [2,3], which has inspired several molecular dynamics-based algorithms for free energy computations. In this article, we will give an overview of these methods.

According to the maximum work theorem, a consequence of the second law of thermodynamics, the average amount of work W performed on a system during a non-equilibrium transformation is at least as large as the free energy difference ΔF between the equilibrium states corresponding to the transition end points:

$$
\langle W \rangle \ge \Delta F \tag{1}
$$

Equivalently, the amount of work that can be extracted from a system is bounded from above by the free energy difference. In the above equation, the equal sign holds only if the transformation is carried out reversibly, maintaining equilibrium at all times. The angular brackets on the left-hand side of the maximum work theorem indicate an average over many realizations of the non-equilibrium process. If one considers a macroscopic system, for instance, a piston compressing a gas enclosed in a cylinder, the average is not necessary, because every realization of the process yields, for all practical purposes, the same amount of work W, if the transformation is carried out following the same protocol. This is essentially a consequence of the central limit theorem for thermal fluctuations. In the case of a microscopic system, however, fluctuations become important, and different realizations of the transformation typically produce different work values, leading to a statistical distribution of W. For instance, stretching a biomolecule with atomic force microscopes or optical tweezers will cost a different amount of work for each repetition of the experiment. In some cases, the work expended on the system might even be smaller than the free energy difference, seemingly violating the maximum work theorem and, hence, the second law of thermodynamics.

As shown by Jarzynski in 1997 [2,3], the work fluctuations resulting for microscopic systems can be accounted for in an exact way, transforming the maximum work theorem into an equality:

$$
\left\langle e^{-\beta W} \right\rangle = e^{-\beta \Delta F} \tag{2}
$$

Here, β = 1/(k<sub>B</sub>T) is the reciprocal temperature of the equilibrium state from which the transformation is started, and k<sub>B</sub> is the Boltzmann constant. Remarkably, this result, now commonly referred to as the Jarzynski equation or Jarzynski non-equilibrium work theorem, relates the statistics of irreversible work carried out on the system, while it is driven away from equilibrium, to an equilibrium free energy difference. A closely connected result is the Crooks fluctuation theorem [4–6], which relates the equilibrium free energy difference to the work distributions of the forward and reversed process.

In general, processes during which work is performed on or by the system drive the system away from equilibrium, such that the phase space distribution obtained at the end of the process may differ strongly from the equilibrium distribution to which the system relaxes after the external perturbation has been stopped. For instance, a piston pushed quickly into a gas-filled cylinder generates non-equilibrium states with strong flows markedly different from the static equilibrium state to which the gas eventually relaxes after the piston has reached its final state. At first sight, it is therefore surprising that equilibrium properties, such as free energy differences, can be extracted from non-equilibrium trajectories. As discussed in the following sections of this paper, a closer analysis reveals that averaging over the work exponential is equivalent to removing the bias introduced during the driving process. It is this unbiasing that ultimately permits the extraction of equilibrium properties (as we will discuss in Section 5, in principle, one can determine the entire equilibrium distribution and not only the free energy) from non-equilibrium trajectories. Thus, the non-equilibrium work theorem can be viewed as a prescription of how to compensate for the effects of manipulations that drive the system into non-equilibrium rather than a tool that illuminates the nature of non-equilibrium processes. Nevertheless, it is remarkable that the bias has a very simple exponential form and can be expressed in terms of the work only.

The Jarzynski non-equilibrium work theorem and the Crooks fluctuation theorem provide the framework for the interpretation of single-molecule pulling experiments [7–9], in which non-equilibrium effects can never be fully avoided. These exact results can also be exploited to devise computer simulation algorithms for the calculation of free energies. In this article, we review several computational approaches based on the collection of work statistics in a fast-switching non-equilibrium setting, paying particular attention to the accuracy and efficient implementation of these methods compared to conventional free energy computation methods (see [10–12]). In the remainder of this article, we will first state the Jarzynski and Crooks theorems more explicitly and discuss the conditions under which they apply. After that, we will survey several fast switching algorithms in which free energies are determined from sets of molecular dynamics trajectories obtained while changing a control parameter, thereby exerting work on the system. We conclude with a brief summary and an outlook on future possibilities and applications.

#### 2. Jarzynski Identity and Crooks Fluctuation Theorem

To set the notation, consider a classical system with energy H(x, λ) depending on the microscopic state x of the system, as well as on a parameter λ. The microscopic state x is specified by the positions of all particles in the system and, if necessary, also by all momenta. The parameter λ is a control parameter that can be changed externally, for instance, the volume of the cylinder containing the particles or an external field. According to the basic laws of statistical mechanics, the free energy difference between the two equilibrium states A and B corresponding to the values λ<sub>A</sub> and λ<sub>B</sub>, respectively, of the control parameter is given by:

$$
\Delta F = F\_B - F\_A = -k\_B T \ln \frac{Z\_B}{Z\_A} \tag{3}
$$

where Z<sub>A</sub> = ∫ dx exp{−βH(x, λ<sub>A</sub>)} and Z<sub>B</sub> = ∫ dx exp{−βH(x, λ<sub>B</sub>)} are the canonical partition functions of the two equilibrium states (up to a combinatorial prefactor irrelevant for our considerations). The free energy difference ΔF is the work required to change the external parameter from λ<sub>A</sub> to λ<sub>B</sub> in a *reversible* process. Such a reversible transformation could be realized, for instance, by changing the parameter λ infinitely slowly, while keeping the system in contact with a heat bath. In this case, the free energy difference is equal to the work performed on the system.

Instead of changing the control parameter λ very slowly, one could change it at a finite rate over a time interval τ, following a certain protocol λ(t), where λ(0) = λ<sub>A</sub> and λ(τ) = λ<sub>B</sub>. In general, such a fast switching of the control parameter drives the system away from equilibrium in an *irreversible* way, such that the average work required to carry out the change exceeds the free energy difference, as stated by the maximum work theorem of Equation (1). To be more specific, the work performed on the system along a particular trajectory x(t) is the energy change caused by changes of the control parameter accumulated along the trajectory:

$$W[x(t),\lambda(t)] = \int\_0^\tau \left. \frac{\partial H(x(t),\lambda)}{\partial \lambda} \right|\_{\lambda=\lambda(t)} \dot{\lambda}(t) \, dt \tag{4}$$

where λ̇(t) is the time derivative of λ(t). Note that this work depends both on the protocol λ(t) and on the particular trajectory x(t) followed by the system. The average appearing on the left-hand side of Equation (1) is over many repetitions of the switching process starting from initial conditions distributed according to the equilibrium distribution ρ(x) ∝ exp(−βH(x, λ<sub>A</sub>)) for control parameter λ<sub>A</sub>. In a computer simulation, one could realize such a process by sampling initial conditions from a canonical distribution and then integrating the underlying equations of motion, while at the same time changing the control parameter λ according to the protocol λ(t).

Jarzynski has shown [2,3] that averaging over the exponential of the work, exp(−βW(τ)), rather than the work itself turns the maximum work theorem into an equality, ⟨exp{−βW[x(t), λ(t)]}⟩ = exp{−βΔF}. It is important to realize that the average over the work exponential involves two averages, one over the distribution of initial conditions and another one over the set of trajectories that originate from a particular initial condition. For deterministic dynamics, the initial condition determines the entire trajectory, x(t), but for stochastic dynamics, the system evolves in different ways, even if one repeatedly starts from the same initial condition. Hence, for stochastic dynamics, the average appearing in the Jarzynski equation also requires an average over noise histories.

The Jarzynski equation is an exact result that holds under very general conditions. The requirements are that initially, the system must be in equilibrium and that for a fixed control parameter, the dynamics conserves the equilibrium distribution corresponding to that value of the control parameter. The latter condition is satisfied by most types of dynamics usually used in computer simulations, including Newtonian, thermostated, Langevin and Monte Carlo dynamics. It is worth pointing out that it is not necessary that the system be in an equilibrium state at the end of the transformation process or relax towards equilibrium after the control parameter switching is completed. Furthermore, it is interesting that the Jarzynski equation holds, even if the switching is carried out according to different (though prescribed) protocols provided that λ(0) = λ<sub>A</sub> and λ(τ) = λ<sub>B</sub>, *i.e.*, all protocols start at λ<sub>A</sub> at time 0 and finish at λ<sub>B</sub> at time τ. After Jarzynski's seminal work [2], in which the Jarzynski equality was derived for systems evolving deterministically with and without coupling to a heat bath, several other proofs were provided, for instance, based on a master equation [3], for Markovian dynamics satisfying detailed balance [5,13], for dynamical systems conserving the canonical distribution [14] or from the Feynman–Kac theorem [7].

In the limiting cases of infinitely fast switching and infinitely slow switching, the Jarzynski equality reduces to two well-known results. For instantaneous switching, τ → 0, the initial and final point of a trajectory are identical, as the system has no time to evolve. In this case, the work carried out on the system at a particular microscopic state x equals the difference in energy evaluated for the two values of the control parameter:

$$W(x) = H(x, \lambda\_B) - H(x, \lambda\_A) \tag{5}$$

The Jarzynski equation then becomes:

$$e^{-\beta \Delta F} = \left\langle e^{-\beta [H(x,\lambda\_B) - H(x,\lambda\_A)]} \right\rangle\_{\lambda\_A} \tag{6}$$

where the subscript next to the angular bracket indicates that the average has to be carried out with respect to the equilibrium distribution at λ<sub>A</sub>. The above equation is the central result of free energy perturbation theory [15] and is often used to compute free energy differences. In the opposite limit of infinitely slow switching, τ → ∞, the system has time to equilibrate for every intermediate value of the control parameter, such that the Jarzynski equation together with Equation (4) implies:

$$
\Delta F = \int\_{\lambda\_A}^{\lambda\_B} \left\langle \frac{\partial H(x,\lambda)}{\partial \lambda} \right\rangle\_{\lambda} d\lambda \tag{7}
$$

This expression provides the basis for the thermodynamic integration method [16], in which equilibrium simulations are carried out for different, but fixed values of the control parameter λ to compute the average energy derivatives ⟨∂H/∂λ⟩<sub>λ</sub>. The free energy difference is then obtained by numerical integration, for instance, by using the Simpson rule or more sophisticated integration schemes. The maximum work theorem of Equation (1) also immediately follows from the Jarzynski equation by virtue of Jensen's inequality, ⟨exp(−x)⟩ ≥ exp(−⟨x⟩).
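Both limits can be checked on a toy model. The following sketch assumes a one-dimensional harmonic oscillator, H(x, λ) = λx²/2, for which ΔF = ln(λ<sub>B</sub>/λ<sub>A</sub>)/(2β) is known exactly; direct Gaussian sampling stands in for the equilibrium simulations, and the function name and parameter values are hypothetical. It evaluates the free energy perturbation formula of Equation (6) and the thermodynamic integration formula of Equation (7) with a composite Simpson rule.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 1.0
lam_a, lam_b = 1.0, 2.0          # toy model: H(x, lam) = 0.5 * lam * x**2
delta_f_exact = 0.5 * np.log(lam_b / lam_a) / beta

def sample_equilibrium(lam, n):
    """Draw n positions from the Boltzmann distribution at fixed lam."""
    return rng.normal(0.0, 1.0 / np.sqrt(beta * lam), size=n)

# Instantaneous switching: free energy perturbation, Equations (5) and (6).
x = sample_equilibrium(lam_a, 200000)
work = 0.5 * (lam_b - lam_a) * x**2
delta_f_fep = -np.log(np.mean(np.exp(-beta * work))) / beta

# Infinitely slow switching: thermodynamic integration, Equation (7).
lams = np.linspace(lam_a, lam_b, 11)      # odd number of points for Simpson
g = np.array([np.mean(0.5 * sample_equilibrium(lam, 50000) ** 2) for lam in lams])
h = lams[1] - lams[0]
delta_f_ti = h / 3.0 * (g[0] + g[-1] + 4.0 * g[1:-1:2].sum() + 2.0 * g[2:-1:2].sum())
```

For this well-overlapping pair of states, both estimates should agree with `delta_f_exact` to within their statistical error; for poorly overlapping states the perturbation estimate degrades much faster.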

As mentioned in the introduction, the Jarzynski equation can be viewed as a way to remove the bias introduced by the switching process into the phase space distribution obtained at the end of the process. Following similar considerations as those used to derive the Jarzynski equality, one can prove that for any phase space function A(x) the following equation holds [4,7,17]:

$$\langle A(x)\rangle\_{\text{eq},\lambda\_B} = \langle A(x(\tau))e^{-\beta[W(\tau)-\Delta F]}\rangle\_{\text{non-eq}}\tag{8}$$

Here, the angular brackets on the left-hand side indicate an equilibrium average for the control parameter fixed at λ<sub>B</sub>, and the average on the right-hand side is an average over non-equilibrium pathways generated with protocol λ(t) just as in the Jarzynski equation. To make this difference even more explicit, we have added the subscripts eq and non-eq to the equilibrium and non-equilibrium average, respectively. In the above equation, x(τ) refers to the endpoints of the non-equilibrium trajectories. The Jarzynski equation is simply obtained by setting A(x) = 1. Equation (8) implies that equilibrium averages can be computed by reweighting the non-equilibrium distribution obtained as a result of the switching procedure by exp(−βW + βΔF). In particular, the equilibrium distribution for λ<sub>B</sub> is obtained by setting A(x) = δ(x − x(τ)), where δ(x) is the Dirac delta function:

$$
\rho\_{\rm eq}(x,\lambda\_B) = \langle \delta(x - x(\tau))e^{-\beta[W(\tau) - \Delta F]} \rangle\_{\rm non-eq} \tag{9}
$$
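The reweighting of Equation (8) can be sketched as follows. For simplicity, the example assumes the instantaneous switching limit of a harmonic oscillator, H(x, λ) = λx²/2, so that the trajectory endpoints coincide with the initial conditions and the work is given by Equation (5); all variable names and parameter values are hypothetical. Normalizing the weights by their sum replaces the factor exp(βΔF), which is legitimate because the average of exp(−βW) equals exp(−βΔF).

```python
import numpy as np

rng = np.random.default_rng(3)
beta = 1.0
lam_a, lam_b = 1.0, 2.0          # toy model: H(x, lam) = 0.5 * lam * x**2

# Endpoints and work values of instantaneous-switching "trajectories".
x_end = rng.normal(0.0, 1.0 / np.sqrt(beta * lam_a), size=200000)
work = 0.5 * (lam_b - lam_a) * x_end**2

# Self-normalized weights; the shift by the maximum avoids overflow.
log_w = -beta * work
weights = np.exp(log_w - log_w.max())
weights /= weights.sum()

# Reweighted estimate of <x^2> in the equilibrium ensemble at lam_b.
x2_reweighted = np.sum(weights * x_end**2)
x2_exact = 1.0 / (beta * lam_b)
```

The same weights applied to a histogram of `x_end` would give an estimate of the equilibrium distribution at λ<sub>B</sub>, in the spirit of Equation (9).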

If the dynamics of the system not only conserves the equilibrium distribution for a fixed control parameter, but is also microscopically reversible, *i.e.*, if it satisfies detailed balance, the work distribution for the forward process is simply related to that of the process carried out with the time reversed protocol. More specifically, the distribution P(W) of work W observed in repeated realizations of the switching process is given by:

$$P(W) = \langle \delta(W - W[x(t), \lambda(t)]) \rangle\_A \tag{10}$$

where the average is over initial conditions of the equilibrium ensemble A and over pathways starting from these initial conditions under the action of the protocol λ(t). Now, consider the time inverted protocol λ<sub>R</sub>(t) = λ(τ − t). The distribution P<sub>R</sub>(W), observed for the reverse process, in which the control parameter is changed from λ<sub>B</sub> back to λ<sub>A</sub>, can be written as:

$$P\_R(W) = \langle \delta(W - W[x(t), \lambda\_R(t)]) \rangle\_B \tag{11}$$

where, now, the average is over initial conditions from the equilibrium ensemble B with trajectories evolving, while the control parameter follows the inverted protocol λ<sub>R</sub>(t). Crooks has shown that for dynamics that is microscopically reversible, the work distributions P(W) and P<sub>R</sub>(W) for the forward and reverse process, respectively, are related by [5,6]:

$$P(W) = P\_R(-W)e^{\beta(W - \Delta F)}\tag{12}$$

This exact result, known as the Crooks fluctuation theorem, also serves as a basis for various free energy calculation methods, as explained in detail in subsequent sections.
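As a worked illustration (an idealized assumption, not a result quoted from the cited references), suppose the forward work distribution is Gaussian with mean μ and variance σ². Inserting it into Equation (12) and completing the square in the exponent gives:

$$P\_R(-W) = P(W)e^{-\beta(W-\Delta F)} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(W-\mu+\beta\sigma^2)^2}{2\sigma^2}\right] \exp\left[\beta\left(\Delta F - \mu + \frac{\beta\sigma^2}{2}\right)\right]$$

Normalization of P<sub>R</sub> requires the last factor to equal one, so that ΔF = μ − βσ²/2. The work in the reverse process is then also Gaussian, with the same variance and mean βσ² − μ = −ΔF + βσ²/2, so in this special case both directions dissipate βσ²/2 on average.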

#### 3. Implementing Fast Switching Simulations

Jarzynski's non-equilibrium work theorem and the Crooks fluctuation theorem suggest interesting algorithms for the calculation of free energy differences. The power of these algorithms derives from the fact that all quantities appearing in these relations can be easily determined. The simplest of these algorithms consists in the following steps. First, one needs to prepare initial conditions distributed according to the Boltzmann–Gibbs distribution. This can be achieved using a variety of methods, for instance, canonical Monte Carlo simulation, possibly combined with enhanced sampling methods, such as parallel replica sampling, or with thermostated molecular dynamics. To improve the efficiency of the free energy calculation, it is important to make sure that these initial conditions are sufficiently decorrelated.

From these initial conditions, one then starts trajectories of the desired length that are integrated, while, at the same time, changing the control parameter according to the protocol λ(t). Both the choice of the parameter λ used to drive the transformation, as well as the shape of the protocol influence the efficiency of the calculation, as described in detail below. One can compute the dynamics of the system based on stochastic equations of motion, such as the Langevin equation, or deterministic equations of motion, such as Newton's equations with or without thermostat. Along the computed trajectories, one then has to compute the work W carried out on the system by changing the control parameter. This is most easily done by dividing the basic molecular dynamics steps into two sub-steps. In the first sub-step, the state x(t + Δt) of the system at time t + Δt is computed by carrying out an integration step with the control parameter fixed at value λ(t). In the second sub-step, one then changes the control parameter from λ(t) to λ(t + Δt), while keeping the state x(t + Δt) of the system unchanged. Only in this second sub-step is work carried out on the system. In this two-step procedure, the work carried out on the system along a particular trajectory up to time t + Δt is given by:

$$W(t + \Delta t) = W(t) + H(x\_{t + \Delta t}, \lambda\_{t + \Delta t}) - H(x\_{t + \Delta t}, \lambda\_t) \tag{13}$$

where x<sub>t</sub> ≡ x(t) and λ<sub>t</sub> ≡ λ(t) are the state of the system and the value of the control parameter at time t, respectively. From the work values collected in this way for the forward process, and possibly also for the backward process, one can then determine the free energy difference by applying the types of analyses discussed in the next section.
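The two-sub-step procedure of Equation (13) can be sketched for a toy system. The example below assumes an overdamped particle in a harmonic trap, H(x, λ) = k(x − λ)²/2, whose center λ is dragged linearly in time, so that ΔF = 0 exactly. As a simplifying choice made here, the propagation sub-step uses the exact Ornstein–Uhlenbeck update for fixed λ, which conserves the equilibrium distribution at that λ; a molecular dynamics integrator would take its place in a real application. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
beta, k, mobility = 1.0, 1.0, 1.0
tau, n_steps, n_traj = 2.0, 200, 20000
dt = tau / n_steps
lam = np.linspace(0.0, 2.0, n_steps + 1)    # linear protocol lambda(t)

def energy(x, lam_value):
    """Toy Hamiltonian: harmonic trap centered at lam_value."""
    return 0.5 * k * (x - lam_value) ** 2

# Initial conditions from the equilibrium distribution at lambda_A.
x = rng.normal(lam[0], 1.0 / np.sqrt(beta * k), size=n_traj)
work = np.zeros(n_traj)

# Coefficients of the exact Ornstein-Uhlenbeck update over one time step.
decay = np.exp(-mobility * k * dt)
noise_amp = np.sqrt((1.0 - decay**2) / (beta * k))

for step in range(n_steps):
    # Sub-step 1: propagate x with the control parameter fixed at lambda(t).
    x = lam[step] + (x - lam[step]) * decay + noise_amp * rng.normal(size=n_traj)
    # Sub-step 2: switch lambda at fixed x and accumulate work, Equation (13).
    work += energy(x, lam[step + 1]) - energy(x, lam[step])

mean_work = work.mean()
delta_f_jarzynski = -np.log(np.mean(np.exp(-beta * work))) / beta
```

In this setup, `mean_work` is clearly positive because of dissipation, whereas `delta_f_jarzynski` should be close to the exact value of zero within statistical error.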

An important choice one has to make in the context of fast switching free energy computations is how to allocate computing time. In particular, one has to decide whether to generate many short trajectories with a large switching rate or fewer and longer trajectories along which the system is driven more gently. Without enhanced sampling schemes, such as those discussed in subsequent sections, one generally expects the slow switching regime to give more accurate free energy estimates for a given amount of computing time [18]. As a rule of thumb, one should carry out the switching slowly enough, such that the standard deviation of the work values does not exceed k<sub>B</sub>T. In this slow switching regime, the statistical error obtained with a given amount of computing time grows slowly with the switching rate. It is nevertheless more advantageous to compute several trajectories at a moderate switching rate than one single long trajectory, because then, an error estimate for the free energy can be obtained in a straightforward manner. Furthermore, multiple trajectories can be run in parallel to exploit the capabilities of parallel processing machines. Another important choice to make in fast switching simulations concerns the direction in which the transformation is carried out. Interestingly, it can be shown that the direction in which more work is dissipated is computationally beneficial [19]. This formal result is consistent with experience in free energy calculations using perturbation theory. In the calculation of chemical potentials, for instance, test particle insertion typically produces a larger variation in the energy change compared to particle removal and leads to more accurate estimates of the chemical potential [1].

As discussed above, the statistical error of a free energy computed via fast switching strongly depends on the rate at which the system is driven out of equilibrium. However, while the switching rate is certainly the most important parameter, the particular shape of the protocol λ(t) for a given total switching time τ also plays an important role in determining the accuracy of the free energy estimate. Since the Jarzynski equality and the Crooks fluctuation theorem hold for arbitrary protocols, one can exploit this freedom to design protocols that optimize the free energy computation. Recently, Schmiedl and Seifert have addressed a related question, asking how the protocol should be designed to minimize the average work expended during the non-equilibrium transformation for a given total switching time τ [20]. Their analysis, carried out for a particle dragged through a fluid and for a particle in a harmonic trap with changing strength, indicates that, surprisingly, the optimum protocol has discontinuous jumps, both at the beginning and at the end of the process. This result is in contrast to an earlier linear-response analysis [21], which implied that the optimum protocol is smooth and free of jumps. In the cases studied by Schmiedl and Seifert, the optimum protocol with jumps led to a reduction of the dissipated work by up to 12% compared to the case with a continuous protocol changing linearly in time. A subsequent numerical study of a non-linear system carried out by Then and Engel [22] showed that the optimum protocol can have one, two or even more jumps. Steps also occur in the optimum protocol for underdamped Langevin dynamics, for which delta-like singularities also appear at the start and the end of the switching process, effectively kicking the system discontinuously [23].

While, in general, protocols in which the dissipated work is small are expected to yield a more accurate free energy estimate, there is no simple relation between the average work and the statistical error in the free energy. Hence, a protocol optimized with respect to the work does not necessarily minimize the statistical error. However, numerical protocol optimizations conducted for various models indicate that control parameter steps at the start and the end of the protocol (but never in between) are beneficial also for free energy computations [24]. These steps are most pronounced in the fast switching regime and disappear for slow switching. For small switching rates, the minimum work protocol and the minimum error protocol are identical, but for large switching rates, they may differ. In some cases the minimum error protocol even yields an average work that is larger than that of a linear protocol without steps. While appropriate steps in the protocol can lead to a considerable reduction of the computational cost of fast switching free energy calculations, such large savings typically occur only in switching regimes where the straightforward application of the Jarzynski equality is impractical. Whether work biased sampling schemes (discussed in Section 6) may serve to leverage the potential power of discontinuous protocols is currently an open question.

#### 4. Analysis of Non-Equilibrium Free Energy Calculations

The simplest, but also most error-prone, method to obtain free energies from one-sided non-equilibrium simulations is a direct evaluation of the exponential estimator:

$$
\Delta F = -k\_\text{B} T \ln \left\langle e^{-\beta W} \right\rangle \approx -k\_\text{B} T \ln \left[ \frac{1}{n} \sum\_{i=1}^n e^{-\beta W\_i} \right] \tag{14}
$$

where W<sub>i</sub> are the work values obtained in n independent non-equilibrium runs. If the work distribution is broad, with a variance var(W) ≫ (k<sub>B</sub>T)<sup>2</sup>, then the estimate will tend to be dominated by only a few trajectories [19]. All others have negligible weight, resulting not only in sampling inefficiency, but also a systematic bias of the free energy estimate (*i.e.*, the average of ΔF, obtained in repeated sampling with a fixed number n of trajectories, deviates from the exact value [25]). The resulting systematic errors can be estimated and at least partly corrected [17,26–28]. Alternatively, the width of the work distribution can be reduced by breaking the transformation up into segments [18,29,30]. However, the computational cost of re-equilibration at intermediate stages can be significant. The bias can also be eliminated by using cumulant estimators [2,18], in particular, the second-order approximation:

$$
\Delta F \approx \langle W \rangle - \beta \operatorname{var}(W) / 2 \tag{15}
$$

However, while eliminating the bias of the exponential estimator, the second-order cumulant estimator is exact only for Gaussian work distributions and, thus, has a systematic error if the work distribution deviates from a Gaussian. Other approaches using the tail statistics of work values have also been proposed [31,32]. In closing the discussion of the direct estimator, we note that the width of the work distribution is closely related to the amount of energy dissipated in the non-equilibrium transformation:

$$
\langle W \rangle - \Delta F \quad \approx \quad \beta \operatorname{var}(W)/2 \tag{16}
$$

Large variance, and, thus, large dissipation, arises from hysteresis effects and can be minimized by optimizing the transformation protocol with respect to its time dependence [20] and the choice of control parameter.
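As an illustration of Equations (14) and (15), the following minimal sketch evaluates both one-sided estimators for a list of work values. The function names and the synthetic Gaussian work generator are assumptions made for this example only. For a Gaussian work distribution, Equation (16) holds exactly, so both estimators can be compared against a known ΔF.

```python
import math
import random


def jarzynski_estimate(work, beta):
    """Exponential estimator, Eq. (14): -kT ln[(1/n) sum_i exp(-beta W_i)]."""
    # Subtract the largest exponent before exponentiating (log-sum-exp)
    # so that broad work distributions do not cause underflow.
    a = [-beta * w for w in work]
    a_max = max(a)
    log_mean = a_max + math.log(sum(math.exp(x - a_max) for x in a) / len(a))
    return -log_mean / beta


def cumulant_estimate(work, beta):
    """Second-order cumulant estimator, Eq. (15): <W> - beta var(W) / 2."""
    n = len(work)
    mean = sum(work) / n
    var = sum((w - mean) ** 2 for w in work) / (n - 1)
    return mean - beta * var / 2.0


def gaussian_work_sample(n, delta_f, sigma, beta, rng):
    """Synthetic Gaussian work values (illustrative stand-in for simulations).

    The mean is delta_f + beta sigma^2 / 2, consistent with Eq. (16).
    """
    mean = delta_f + beta * sigma ** 2 / 2.0
    return [rng.gauss(mean, sigma) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Narrow and broad work distributions with the same exact free energy (0).
    for sigma in (0.5, 3.0):
        w = gaussian_work_sample(1000, 0.0, sigma, 1.0, rng)
        print(sigma, jarzynski_estimate(w, 1.0), cumulant_estimate(w, 1.0))
```

Running the demonstration with a larger `sigma` should make the dominance of a few trajectories in the exponential average visible, whereas the cumulant estimator remains well behaved because the synthetic data are Gaussian by construction.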

More accurate and asymptotically unbiased free energy estimates can be obtained from two-sided simulations by using the Crooks relation. By exploiting the analogy between equilibrium perturbation theory and non-equilibrium simulations, one can adapt Bennett's acceptance ratio as the estimator [33,34]. It requires solving an implicit relation:

$$\sum\_{i=1}^{n\_f} \frac{1}{1 + \frac{n\_f}{n\_b} e^{\beta(W\_i - \Delta F)}} = \sum\_{i=1}^{n\_b} \frac{1}{1 + \frac{n\_b}{n\_f} e^{\beta(\overline{W}\_i + \Delta F)}} \tag{17}$$

where W<sub>i</sub> and W̄<sub>i</sub> are the work values obtained in the n<sub>f</sub> forward and n<sub>b</sub> reverse transformations, respectively. This equation can be solved numerically, e.g., by using the Newton–Raphson method. Note that the work values, W̄<sub>i</sub>, on the reversed path have the opposite sign.
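A minimal sketch of how Equation (17) might be solved in practice is given below. Because the difference between the two sides of Equation (17) increases monotonically with ΔF, plain bisection is used here instead of the Newton–Raphson method; this is a choice made for robustness in the example. The function names and the bracketing heuristic are assumptions of this sketch.

```python
import math
import random


def _fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_residual(delta_f, w_fwd, w_rev, beta):
    """Left-hand side minus right-hand side of Eq. (17)."""
    n_f, n_b = len(w_fwd), len(w_rev)
    shift = math.log(n_f / n_b)  # folds the ratio n_f/n_b into the exponent
    lhs = sum(_fermi(beta * (w - delta_f) + shift) for w in w_fwd)
    rhs = sum(_fermi(beta * (w + delta_f) - shift) for w in w_rev)
    return lhs - rhs


def bar_estimate(w_fwd, w_rev, beta, tol=1e-10):
    """Solve Eq. (17) for delta_f by bisection."""
    # Heuristic bracket: far below (above) all work values the residual
    # is negative (positive).
    pad = (50.0 + abs(math.log(len(w_fwd) / len(w_rev)))) / beta
    lo = min(min(w_fwd), -max(w_rev)) - pad
    hi = max(max(w_fwd), -min(w_rev)) + pad
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bar_residual(mid, w_fwd, w_rev, beta) < 0.0:
            lo = mid  # residual grows with delta_f, so the root lies above
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Synthetic Gaussian work values consistent with the Crooks relation
    # for delta_f = 2 (illustrative data, not simulation results).
    rng = random.Random(2)
    beta, delta_f, sigma = 1.0, 2.0, 1.0
    fwd = [rng.gauss(delta_f + beta * sigma ** 2 / 2, sigma) for _ in range(2000)]
    rev = [rng.gauss(-delta_f + beta * sigma ** 2 / 2, sigma) for _ in range(2000)]
    print(bar_estimate(fwd, rev, beta))
```

Here `w_rev` holds the reverse work values W̄<sub>i</sub> with the sign convention of Equation (17), i.e., the work performed along the reversed protocol.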

The analogy to the equilibrium method also allows us to adapt two-sided cumulant estimators [35] to non-equilibrium work distributions [18] or to use Bennett's overlapping histogram method [33]. While less efficient as a free energy estimator than the acceptance ratio method, the histogram method provides us with a test of consistency between forward and reverse transformations. According to Equation (12), a plot of the logarithm of P(W)/P<sub>R</sub>(−W) should be a straight line as a function of W with slope β. Deviations point to sampling issues or other problems. Another approach [36] for the calculation of free energies from non-equilibrium switching simulations relies on the ideas of waste-recycling Monte Carlo [37].
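The consistency test based on overlapping histograms can be sketched as follows. The example bins forward work values W and sign-reversed reverse work values −W̄ on a common grid, forms the logarithm of the ratio of the two normalized histograms and fits a straight line. The function names, the `min_count` threshold and the unweighted least-squares fit are assumptions of this illustration; a careful analysis would weight bins by their statistical uncertainty.

```python
import math
import random


def crooks_log_ratio(w_fwd, w_rev, lo, hi, n_bins, min_count=50):
    """Return bin centers and ln[P(W)/P_R(-W)] estimated from histograms."""
    width = (hi - lo) / n_bins
    h_f = [0] * n_bins
    h_r = [0] * n_bins
    for w in w_fwd:
        k = int((w - lo) // width)
        if 0 <= k < n_bins:
            h_f[k] += 1
    for w in w_rev:
        k = int((-w - lo) // width)  # reverse work enters with opposite sign
        if 0 <= k < n_bins:
            h_r[k] += 1
    centers, log_ratio = [], []
    for k in range(n_bins):
        # Skip poorly populated bins, where the logarithm is too noisy.
        if h_f[k] >= min_count and h_r[k] >= min_count:
            centers.append(lo + (k + 0.5) * width)
            log_ratio.append(math.log(h_f[k] / len(w_fwd))
                             - math.log(h_r[k] / len(w_rev)))
    return centers, log_ratio


def fit_line(x, y):
    """Unweighted least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    rng = random.Random(3)
    # Synthetic Gaussian data obeying the Crooks relation with beta = 1.
    fwd = [rng.gauss(2.5, 1.0) for _ in range(20000)]
    rev = [rng.gauss(-1.5, 1.0) for _ in range(20000)]
    x, y = crooks_log_ratio(fwd, rev, 0.5, 3.5, 12)
    slope, intercept = fit_line(x, y)
    print("slope:", slope, "zero crossing:", -intercept / slope)
```

If the data are consistent, the fitted slope should be close to β, and the point where the fitted line crosses zero provides a rough estimate of ΔF.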

#### 5. Calculating Potentials of Mean Force

Potentials of mean force (PMF) G(q) along a chosen coordinate q = q(x) are defined as:

$$G(q) = -k\_\text{B}T \ln \int dx \, e^{-\beta H(x)} \delta[q - q(x)] \tag{18}$$

up to an arbitrary constant. The coordinate q depends on the phase space coordinate x and, thus, fluctuates along a trajectory. To apply the Jarzynski equality, one would need to make q a control parameter equivalent to λ. However, in molecular simulations, one may not be able to (or want to) control q explicitly, e.g., by applying a holonomic constraint. Instead, it may be easier to restrain q, for instance, by imposing harmonic biasing potentials, as in umbrella sampling. Even if such bias potentials are explicit functions of time, e.g., by moving the center of the harmonic bias, one can obtain equilibrium PMFs from an extension of the Jarzynski equality [7]. The central relation is Equation (9), which allows us to obtain an estimate of the equilibrium phase space density by reweighting trajectory data. If the time-dependent biasing potential is of the form V = V[q(x), t], then the equilibrium PMF in the absence of the bias V, up to a time-dependent constant, can be recovered by weighting trajectory points q[x(t)] with the Boltzmann factor of the work minus the energy stored in the pulling spring:

$$G(q) = -k\_\text{B}T \ln \left\langle \delta[q - q[x(t)]]e^{-\beta[W(t) - V[q[x(t)], t]]} \right\rangle \tag{19}$$

In principle, this relation applies at every time, t. In practice, q values at time t will be concentrated in a narrow region, whose location depends on the bias, V, and its history. Therefore, to obtain a complete PMF over a range of q values, one should combine results at different times t. In the original derivation, the histogram-reweighting procedure of Ferrenberg and Swendsen [38] was adapted for non-equilibrium PMF calculations [7,17]:

$$G(q) = -k\_\text{B}T \ln \frac{\sum\_t \frac{\langle \delta[q - q(t)] \exp[-\beta W(t)] \rangle}{\langle \exp[-\beta W(t)] \rangle}}{\sum\_t \frac{\exp[-\beta V(q, t)]}{\langle \exp[-\beta W(t)] \rangle}} \tag{20}$$

where the sums extend over different time points t. This is not the only possible way to combine histograms obtained at different times, and other procedures have been suggested [39–41].

In many practical applications, the biasing potentials V are harmonic. In such "steered molecular dynamics" simulations and similar approaches [42–45], one can obtain estimates of the PMF using approximate formalisms that involve the system's free energy difference ΔF(t) and its time dependence. In the limit of very stiff pulling springs, V(q, t) = k[q − z(t)]<sup>2</sup>/2, constraining q to a prescribed path z(t) with large k, one can use the "stiff-spring approximation" of Park *et al*. [46]. In this limit, q is almost a control parameter, which results in an approximate relation between the system free energy difference ΔF(t) and the PMF G(q):

$$G[q = z(t)] \approx \Delta F(t) - \frac{1}{2kv^2} \left(\frac{\Delta \ddot{F}(t)}{\beta} - \Delta \dot{F}(t)^2\right) \tag{21}$$

where we assumed, for simplicity, that the spring moves at a constant velocity v, *i.e.*, z(t) = vt, and ΔḞ(t) = dΔF(t)/dt. More accurate approaches using the same information, ΔF(t) and its first two time derivatives, have been derived on the basis of the Weierstrass transform [17,47]:

$$G\left(q = vt - \frac{\Delta \dot{F}(t)}{kv}\right) \approx \Delta F(t) - \frac{\Delta \dot{F}(t)^2}{2kv^2} + \frac{1}{2\beta} \ln\left(1 - \frac{\Delta \ddot{F}(t)}{kv^2}\right) \tag{22}$$

Note that the PMF is calculated at a shifted position and that the argument of the logarithm is positive by definition, being proportional to a variance [47]. In practical applications of Equation (21) or (22), ΔF(t) can be obtained from either unidirectional simulations using the Jarzynski equality or from bidirectional sampling using, e.g., the method of Minh and Adib [48], building on the Crooks fluctuation theorem. Minh and Adib [48] have also developed histogram-based PMF reconstructions that combine information from simulations starting at different transition endpoints, *i.e.*, with initial biases V(q, 0) and V(q, τ) and evolving as V(q, t) and V(q, τ − t).
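To make the use of Equations (21) and (22) concrete, the following sketch evaluates both expressions from a tabulated ΔF(t) on a uniform time grid, using central finite differences for the time derivatives. The function names and the finite-difference scheme are assumptions of this example; with noisy simulation data, ΔF(t) would usually need smoothing before differentiation.

```python
import math


def derivatives(f, dt):
    """Central finite differences for first and second time derivatives.

    Values are returned for the interior grid points 1 .. len(f) - 2.
    """
    d1 = [(f[i + 1] - f[i - 1]) / (2.0 * dt) for i in range(1, len(f) - 1)]
    d2 = [(f[i + 1] - 2.0 * f[i] + f[i - 1]) / dt ** 2
          for i in range(1, len(f) - 1)]
    return d1, d2


def pmf_stiff_spring(times, delta_f, k, v, beta):
    """Eq. (21): list of (q, G) pairs at q = v t for interior time points."""
    dt = times[1] - times[0]
    d1, d2 = derivatives(delta_f, dt)
    out = []
    for i, (f1, f2) in enumerate(zip(d1, d2), start=1):
        g = delta_f[i] - (f2 / beta - f1 ** 2) / (2.0 * k * v ** 2)
        out.append((v * times[i], g))
    return out


def pmf_weierstrass(times, delta_f, k, v, beta):
    """Eq. (22): list of (q, G) pairs at the shifted positions."""
    dt = times[1] - times[0]
    d1, d2 = derivatives(delta_f, dt)
    out = []
    for i, (f1, f2) in enumerate(zip(d1, d2), start=1):
        arg = 1.0 - f2 / (k * v ** 2)  # argument of the logarithm in Eq. (22)
        g = (delta_f[i] - f1 ** 2 / (2.0 * k * v ** 2)
             + math.log(arg) / (2.0 * beta))
        out.append((v * times[i] - f1 / (k * v), g))
    return out


if __name__ == "__main__":
    # Harmonic test case G(q) = a q^2 / 2, for which the free energy of the
    # spring-coupled system is a k / (a + k) * z^2 / 2 up to a constant.
    a, k, v, beta = 2.0, 50.0, 0.1, 1.0
    kappa = a * k / (a + k)
    times = [0.1 * i for i in range(101)]
    delta_f = [0.5 * kappa * (v * t) ** 2 for t in times]
    for q, g in pmf_weierstrass(times, delta_f, k, v, beta)[::20]:
        print(q, g, 0.5 * a * q * q)
```

For the harmonic test case, the values from `pmf_weierstrass` should differ from a q²/2 only by a constant offset, whereas `pmf_stiff_spring` leaves a small residual error that decreases with increasing k.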

#### 6. Importance Sampling of Fast-Switching Trajectories

Fast switching simulations carried out at large switching rates typically generate work distributions that lead to large statistical uncertainties in the free energy estimate. As discussed earlier, the reason is that trajectories with typical work values contribute little to the exponential average of the Jarzynski equation, while trajectories with work values dominating the average are very rare. As a consequence, the convergence of the computed free energy is impractically slow for overly fast switching. A solution to this problem consists in favoring the generation of trajectories with important work values. In this section, we discuss how path sampling techniques can be used for this purpose.

To introduce computational methods for realizing this idea, we rewrite the exponential work average as an explicit sum over pathways:

$$e^{-\beta \Delta F} = \int \mathcal{D}x(t)P[x(t), \lambda(t)]e^{-\beta W[x(t), \lambda(t)]}\tag{23}$$

where the notation 𝒟x(t) implies an integral over all pathways x(t) and P[x(t), λ(t)] is the probability to observe the trajectory x(t) for given protocol λ(t). Note that the path probability P[x(t), λ(t)] also includes the probability of the initial condition x<sub>0</sub>. As suggested by Ytreberg and Zuckerman [49] and by Athènes [50], one way to enhance the sampling of important trajectories consists in introducing an explicit bias function π[x(t)] (assumed to be integrable and positive everywhere) in the average:

$$e^{-\beta \Delta F} = \frac{\int \mathcal{D}x(t)P[x(t)]\pi[x(t)]e^{-\beta W[x(t)]}/\pi[x(t)]}{\int \mathcal{D}x(t)P[x(t)]\pi[x(t)]/\pi[x(t)]} \tag{24}$$

where we have dropped the explicit dependence on the protocol λ(t) in the arguments of P[x(t)] and W[x(t)] to simplify the notation. The right-hand side of this equation, obtained by simply dividing and multiplying by the (so far unspecified) bias function π[x(t)] can be viewed as the ratio of two averages taken in a biased ensemble, leading to:

$$e^{-\beta \Delta F} = \frac{\langle e^{-\beta W[x(t)]}/\pi[x(t)]\rangle\_{\pi}}{\langle 1/\pi[x(t)]\rangle\_{\pi}} \tag{25}$$

Here, the angular brackets ⟨· · ·⟩<sub>π</sub> denote an average over pathways distributed according to the biased ensemble P<sub>π</sub>[x(t)] ∝ P[x(t)]π[x(t)]. Since, in general, the bias function π[x(t)] depends on the entire pathway x(t), the biased ensemble cannot be sampled by preparing initial conditions according to a certain distribution and running fast switching trajectories from them. Instead, one can use trajectory sampling algorithms (such as the shooting algorithm) adapted from transition path sampling, a methodology originally developed for the simulation of rare events occurring in complex systems [51–53]. In this approach, the bias function appears in the acceptance probability of the path sampling scheme, steering the simulation towards the desired regions of trajectory space.

Since the bias function should enhance the sampling of important, but rare, work values, a bias function depending on the path x(t) only through the work W[x(t)] suffices, π[x(t)] = π[W[x(t)]]. The accuracy of a free energy calculation carried out with biased path sampling now crucially depends on the particular choice of this bias function. It is evident that to obtain an accurate estimate of ΔF, the bias function should be selected such that the statistical error is small both in the numerator and in the denominator of the fraction on the right-hand side of Equation (25). This implies that the work distribution in the biased ensemble should have a large overlap with the work distribution P(W) in the unbiased ensemble, as well as with the integrand P(W) exp(−βW) appearing in the Jarzynski equality. It has been shown [49,50] that large efficiency increases can be obtained using the bias function π(W) = exp(−βW/2), which produces a work distribution in between the two distributions P(W) and P(W) exp(−βW) [54]. A more systematic investigation [55] of the statistical error in the free energy estimate obtained by biased path sampling yields the optimum bias π(W) = |exp(−β(W − ΔF)) − 1|. This result implies that the expected statistical error in the free energy is smallest if typical and dominant work values are sampled with high frequency. Interestingly, sampling work values W ≈ ΔF near the free energy difference is not important. Unfortunately, the practical usefulness of this optimum bias function is limited, because its application requires prior knowledge of the free energy difference, *i.e.*, the very quantity one wants to compute. However, iterative schemes, in which the bias function is adapted as the simulation goes on, might make productive use of the functional form of the optimum bias. A recently suggested approach [36] based on the waste-recycling estimator [37] effectively introduces a bias that covers both peaks of the optimum bias, π(W).
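The reweighting step of Equation (25) can be sketched for a bias that depends on the path only through the work. In the example below, the path sampling simulation itself is not implemented. Instead, work values are drawn directly from the biased work distribution of a Gaussian model, for which P(W)π(W) with π(W) = exp(−βW/2) is again Gaussian. The function names and this synthetic stand-in are assumptions of the illustration.

```python
import math
import random


def log_mean_exp(values):
    """Logarithm of the arithmetic mean of exp(values), evaluated stably."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values) / len(values))


def biased_free_energy(work_biased, log_bias, beta):
    """Eq. (25) for work values sampled with weight P(W) * pi(W).

    log_bias(W) must return ln pi(W).
    """
    log_num = log_mean_exp([-beta * w - log_bias(w) for w in work_biased])
    log_den = log_mean_exp([-log_bias(w) for w in work_biased])
    return -(log_num - log_den) / beta


if __name__ == "__main__":
    rng = random.Random(4)
    beta, mu, sigma = 1.0, 5.0, 3.0   # broad unbiased work distribution
    exact = mu - 0.5 * beta * sigma ** 2

    def half_bias(w):
        return -0.5 * beta * w        # ln pi(W) for pi(W) = exp(-beta W / 2)

    # Biased ensemble: Gaussian shifted by -beta sigma^2 / 2 (model assumption).
    biased = [rng.gauss(mu - 0.5 * beta * sigma ** 2, sigma)
              for _ in range(5000)]
    print("exact:", exact, "biased estimate:", biased_free_energy(biased, half_bias, beta))
```

With this choice of bias, both averages in Equation (25) involve the same moderate exponent βW/2, which is why the estimate converges far more quickly than the direct exponential average for the same Gaussian model.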

Another way of realizing work biased path sampling of fast-switching trajectories for the computation of free energies was suggested by Sun [56,57]. In this approach, which can be viewed as a thermodynamic integration procedure in path space, a parameter α is introduced into the exponential average:

$$e^{-\beta \Delta \tilde{F}(\alpha)} = \int \mathcal{D}x(t)P[x(t)]e^{-\beta \alpha W[x(t)]}\tag{26}$$

The right-hand side defines, in effect, the generating function of the work distribution at the end of the transformation. The free energy difference ΔF̃(α) defined by the above equation depends on this parameter α. While for α = 0 one obtains ΔF̃(0) = 0 due to the normalization of the path distribution, for α = 1 one recovers the original free energy difference, ΔF̃(1) = ΔF. One can thus compute ΔF by taking the derivative of ΔF̃(α) with respect to α and then integrating over α from zero to one [18,56,57]:

$$
\Delta F = \int\_0^1 d\alpha \frac{d\Delta \tilde{F}(\alpha)}{d\alpha} \tag{27}
$$

The advantage of writing the free energy difference in this way is that the derivative of ΔF̃(α) with respect to α yields a simple average over the work:

$$\frac{d\Delta\tilde{F}(\alpha)}{d\alpha} = \langle W \rangle\_{\alpha} \tag{28}$$

where the notation ⟨· · ·⟩<sub>α</sub> indicates a path average over the work weighted path ensemble:

$$P\_{\alpha}[x(t)] \propto P[x(t)]e^{-\beta \alpha W[x(t)]} \tag{29}$$

The work average ⟨W⟩<sub>α</sub> is not affected by the type of statistical errors that make the computation of the exponential work average difficult, and it can be evaluated efficiently in a path sampling simulation. By repeating such a calculation for different values of α and integrating the work average numerically, one finally obtains the desired free energy difference. Furthermore, in this method, the statistical errors are kept low by making sure that pathways with both dominant and typical work values are sampled with sufficient frequency. This can be seen explicitly by noting that in the work biased ensemble corresponding to a particular value of the bias parameter, α, the work, W, is distributed according to P<sub>α</sub>(W) ∝ P(W) exp(−βαW). Thus, by gradually changing α from zero to one, one switches the work distribution from P(W) to P(W) exp(−βW), sweeping over all important work values in the course of the thermodynamic integration procedure.
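The integration over α in Equations (27) and (28) can be sketched as follows. The path sampling simulation that would provide ⟨W⟩<sub>α</sub> is replaced here by a synthetic Gaussian model, for which P<sub>α</sub>(W) is a Gaussian with mean shifted by −αβσ². The function names, the trapezoidal rule and the number of α values are assumptions of this example.

```python
import random


def integrate_work_average(mean_work, n_alpha=11):
    """Eqs. (27)-(28): trapezoidal integral of <W>_alpha over alpha in [0, 1].

    mean_work(alpha) must return an estimate of <W>_alpha.
    """
    h = 1.0 / (n_alpha - 1)
    values = [mean_work(i * h) for i in range(n_alpha)]
    return h * (0.5 * values[0] + sum(values[1:-1]) + 0.5 * values[-1])


def gaussian_alpha_sampler(mu, sigma, beta, n, rng):
    """Stand-in for path sampling in the work weighted ensemble, Eq. (29).

    For a Gaussian P(W) with mean mu and width sigma, P_alpha(W) is Gaussian
    with mean mu - alpha * beta * sigma^2 and the same width.
    """
    def mean_work(alpha):
        center = mu - alpha * beta * sigma ** 2
        return sum(rng.gauss(center, sigma) for _ in range(n)) / n
    return mean_work


if __name__ == "__main__":
    rng = random.Random(5)
    mu, sigma, beta = 5.0, 3.0, 1.0
    sampler = gaussian_alpha_sampler(mu, sigma, beta, 1000, rng)
    print("exact:", mu - 0.5 * beta * sigma ** 2,
          "estimate:", integrate_work_average(sampler))
```

Each ⟨W⟩<sub>α</sub> is an ordinary mean, so its statistical error is controlled by the width of the work distribution rather than by rare exponentially weighted contributions.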

One can show that in the limit of infinitely short trajectories, Sun's method reduces to conventional thermodynamic integration. This result raises the question of which trajectory length leads to the most efficient free energy calculations and, in particular, whether work biased path sampling algorithms perform better than conventional methods, such as thermodynamic integration or umbrella sampling. Extensive calculations carried out for various models indicate [58,59] that work biased fast switching path algorithms are generally less efficient than standard methods, such as thermodynamic integration, thermodynamic perturbation or umbrella sampling. There are, however, cases, such as an ideal gas compressed by a piston moving in a cylinder, where fast switching is advantageous [59]. In this particular case, the work distribution does not converge to a limiting form for increasing switching speed, and the typical work values keep growing. As a consequence, the optimum switching rate is finite in this case, even if an optimum work bias is applied [59].

#### 7. Fast Switching with Large Time Steps

Molecular dynamics simulations are usually carried out with time steps that are a compromise between accuracy (often assessed in terms of energy conservation) and computing speed. Small time steps yield accurate trajectories with good energy conservation, but require a larger computational effort, because the cost of a trajectory of a given length is proportional to the number of steps and, hence, inversely proportional to the size of the time step. Larger time steps reduce the computing time, but corrupt the accuracy, resulting in poor energy conservation. In general, using such low-accuracy trajectories for free energy computations introduces a systematic error into the free energy estimate. It is, however, possible to devise exact expressions akin to the Jarzynski equation to compute free energy differences from crude trajectories calculated with large time steps [13,60]. Using this approach, which is based on a generalization of the Jarzynski equation for phase space mappings [61], can help to considerably increase the efficiency of fast switching simulations, due to the reduced computational cost of the large time step trajectories.

As mentioned earlier, in the limit of instantaneous switching, the Jarzynski equation reduces to the perturbation identity of Equation (6). Free energy computation methods relying on this equation perform well if there is a large overlap between the ensembles A and B, corresponding to the control parameters λ<sub>A</sub> and λ<sub>B</sub>, respectively. If, however, these ensembles strongly differ, the free energy calculation converges poorly, because important contributions to the average are rarely sampled. To remedy this situation, Jarzynski has devised the targeted free energy perturbation method [61] based on a generalization of the Jarzynski equality. The basic idea underlying this approach is to improve the efficiency of the perturbative calculation by applying a mapping that transforms the equilibrium ensemble A into an ensemble A′ that overlaps more strongly with ensemble B. The mapping φ(x) considered in this approach is required to be invertible and differentiable, but is arbitrary otherwise. By starting from the definition of the free energy difference (Equation (3)) and carrying out a variable transformation from x to x′ = φ(x), one can then show that:

$$e^{-\beta \Delta F} = \left\langle e^{-\beta W\_{\phi}(x)} \right\rangle \tag{30}$$

where the "work" function is defined as:

$$W\_{\phi}(x) = H(\phi(x), \lambda\_B) - H(x, \lambda\_A) - k\_B T \ln \left| \frac{\partial \phi}{\partial x} \right| \tag{31}$$

The last term in the work function results from the Jacobian of the transformation and vanishes for phase space volume preserving maps. If the mapping is chosen to be the propagator of Newtonian dynamics, Equation (30) reduces to the Jarzynski equation for isolated systems evolving at constant energy. By using the inverse map, φ<sup>−1</sup>, with the corresponding work definition, one can also use this mapping approach together with the Crooks fluctuation theorem.

Equation (30) suggests the following algorithm for free energy computation. One first samples phase space points x from the equilibrium ensemble A. Then, to each of these points, one applies the mapping and computes W<sub>φ</sub>. Finally, the average of exp(−βW<sub>φ</sub>(x)) carried out over all points x yields the free energy difference. The efficiency of this method crucially depends on the ability to devise an appropriate mapping φ(x). The more closely the ensemble resulting from the transformation resembles B, the higher the efficiency. No general method exists to derive φ(x), but a well-chosen mapping can substantially reduce the cost of a free energy computation.
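The three steps of this algorithm can be sketched for a one-dimensional toy problem. The model (two harmonic wells with different stiffness and position), the function names and the linear mapping are assumptions chosen for illustration. For this model, a linear map transforms ensemble A exactly into ensemble B, so that W<sub>φ</sub> is the same for every sample and the estimate has no statistical error.

```python
import math
import random


def targeted_fep(samples, h_a, h_b, phi, log_jacobian, beta):
    """Eq. (30): -kT ln <exp(-beta W_phi)> over samples from ensemble A.

    log_jacobian(x) must return ln |d phi / d x|.
    """
    args = []
    for x in samples:
        # Work function of Eq. (31), including the Jacobian term.
        w_phi = h_b(phi(x)) - h_a(x) - log_jacobian(x) / beta
        args.append(-beta * w_phi)
    m = max(args)
    log_mean = m + math.log(sum(math.exp(a - m) for a in args) / len(args))
    return -log_mean / beta


if __name__ == "__main__":
    rng = random.Random(6)
    beta, k_a, k_b, x0 = 1.0, 1.0, 4.0, 3.0

    def h_a(x):
        return 0.5 * k_a * x * x            # ensemble A: well at the origin

    def h_b(x):
        return 0.5 * k_b * (x - x0) ** 2    # ensemble B: stiffer, shifted well

    # Step 1: sample ensemble A (Gaussian for a harmonic well).
    xs = [rng.gauss(0.0, 1.0 / math.sqrt(beta * k_a)) for _ in range(2000)]

    # Steps 2 and 3 with the identity map (plain perturbation) and with a
    # linear map that rescales and shifts A onto B.
    scale = math.sqrt(k_a / k_b)
    plain = targeted_fep(xs, h_a, h_b, lambda x: x, lambda x: 0.0, beta)
    mapped = targeted_fep(xs, h_a, h_b, lambda x: x0 + scale * x,
                          lambda x: math.log(scale), beta)
    print("exact:", 0.5 * math.log(k_b / k_a) / beta)
    print("identity map:", plain, "linear map:", mapped)
```

The identity map corresponds to the ordinary perturbation identity and is expected to converge poorly here because the two wells barely overlap, which illustrates why the choice of φ(x) matters.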

One possible strategy to exploit Equation (30) consists in choosing a sequence of molecular dynamics steps as phase space mapping. Each of these steps, designed to approximate the time evolution of the system over a small interval Δt, maps a phase point x<sub>i</sub> into the next phase point x<sub>i+1</sub> along the molecular dynamics trajectory. Hence, a sequence of n molecular dynamics steps may also be considered as a phase space mapping that takes the initial point x<sub>0</sub> into the final point x<sub>n</sub>. The expression for the work W<sub>φ</sub> is particularly simple for integrators, such as the Verlet algorithm, that conserve phase space volume. Then, the Jacobian of the mapping is unity, and Equation (30) turns into:

$$e^{-\beta \Delta F} = \left\langle e^{-\beta [H(x\_n, \lambda\_B) - H(x\_0, \lambda\_A)]} \right\rangle \tag{32}$$

Interestingly, this relation holds exactly independently of the size of the time step Δt used in the integration algorithm. Hence, fast switching simulations can be carried out with large time steps, producing only approximate trajectories. Nevertheless, the free energies obtained in this way are in principle exact. Since trajectories computed with a large time step require a smaller number of integration steps, such fast switching simulations hold the promise to improve the efficiency of the free energy calculation. Whether this is indeed the case depends on how the work distribution changes due to the large time step. Calculations carried out for several model systems indicate that while the molecular dynamics trajectories generated with large time steps are approximate, they still reproduce the essential physics of the process, such that the work distributions are not affected adversely. As a consequence, for optimum efficiency, time steps of fast switching free energy computations can be increased up to the stability limit of the simulation. Note that this large time step approach can also be used with integrators that do not conserve phase space volume [60], but this unnecessarily complicates the simulations, because one has to keep track of the Jacobian while computing the molecular dynamics trajectories.
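The independence of Equation (32) from the time step can be checked numerically on a toy model. The sketch below drives the center of a one-dimensional harmonic well from 0 to 1 with the velocity Verlet integrator and averages the exponential of the total energy change. The model, the linear switching protocol and the function names are assumptions of this example. Since shifting the well does not change the partition function, the exact result is ΔF = 0 for any time step below the stability limit.

```python
import math
import random


def hamiltonian(x, p, lam):
    """Unit-mass particle in a harmonic well centered at lam (toy model)."""
    return 0.5 * p * p + 0.5 * (x - lam) ** 2


def switch_work(x, p, n_steps, dt):
    """Drive lam from 0 to 1 with velocity Verlet.

    Returns H(x_n, lam = 1) - H(x_0, lam = 0), the exponent of Eq. (32).
    """
    h0 = hamiltonian(x, p, 0.0)
    for i in range(n_steps):
        lam = i / n_steps
        p += 0.5 * dt * (lam - x)   # half kick with the current lam
        x += dt * p                 # drift
        lam = (i + 1) / n_steps
        p += 0.5 * dt * (lam - x)   # half kick with the updated lam
    return hamiltonian(x, p, 1.0) - h0


def free_energy_large_step(n_traj, n_steps, dt, beta, rng):
    """Eq. (32): exponential average over initial conditions from ensemble A."""
    s = 1.0 / math.sqrt(beta)
    args = []
    for _ in range(n_traj):
        x0, p0 = rng.gauss(0.0, s), rng.gauss(0.0, s)  # canonical sample at lam = 0
        args.append(-beta * switch_work(x0, p0, n_steps, dt))
    m = max(args)
    return -(m + math.log(sum(math.exp(a - m) for a in args) / n_traj)) / beta


if __name__ == "__main__":
    rng = random.Random(7)
    # Same total switching time, small versus large time step.
    for n_steps, dt in ((100, 0.1), (10, 1.0)):
        print(dt, free_energy_large_step(5000, n_steps, dt, 1.0, rng))
```

Each kick and drift is a shear in phase space with unit Jacobian, which is the property that Equation (32) relies on; the energy along an individual trajectory is not conserved for the large time step, yet the exponential average should still scatter around zero.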

The large time step formalism can also be used for the calculation of potentials of mean force [62]. In such a simulation, the work based reweighting of Equation (30) is applied at each stage of the time evolution with a work function that accumulates along the trajectory. Fast switching simulations were carried out for the force induced unfolding of a deca-alanine molecule [62]. The free energy profile obtained for a time step of 3.2 fs, *i.e.*, close to the stability limit, agrees well with that calculated using a conservative time step of 0.5 fs. An efficiency analysis reveals that the optimum time step for the unfolding simulations lies in the range 1–3 fs. It is interesting to note that the fast-switching trajectories may show unphysical features, such as a redistribution from potential to kinetic energy, due to the conserved shadow Hamiltonian belonging to the integrator used in the simulation [62]. Nevertheless, the obtained free energy profile is exact up to statistical errors.

#### 8. Applications

Arguably the most important practical application of non-equilibrium work theorems has been to experiments. Almost immediately after the connection between non-equilibrium single-molecule pulling experiments and Jarzynski's identity was rigorously established [7], experimental studies of the folding and unfolding of nucleic acids using optical tweezers followed [8,63]. It is often difficult, if not impossible, to conduct pulling experiments sufficiently slowly to maintain near-equilibrium conditions. Nonetheless, the use of non-equilibrium free energy reconstruction has made it possible to extract thermodynamic information.

Applications to pulling have been mirrored on the simulation side. Simulated pulling methods mimicking experiments have been developed, initially to probe mechanical perturbations on biomolecules [42–44]. Non-equilibrium pulling methods have been applied not only to protein unfolding, but also to many other complex molecular processes, including ligand dissociation [64–66] and channel translocation [67,68]. To analyze such "steered molecular dynamics" simulations and extract PMFs, the stiff-spring approximation is widely used [46], though Equation (22) offers a more accurate method using the same information [47] that produces results comparable to full histogram reweighting. In molecular simulations, non-equilibrium methods tend to be less efficient than optimized equilibrium methods as a tool to calculate free energies [18,58]. However, as discussed above, the optimization of non-equilibrium sampling methods is an area of active research, in particular, using importance sampling methods involving path reweighting [49,50,55–59] and nonlinear maps [69,70]. Moreover, non-equilibrium methods can provide valuable insight into the mechanism underlying a process. By forcing the system through a transition and monitoring the resulting bottlenecks [71], one may be able to devise improved control variables that result in a smoother transition and improved sampling efficiency, both in non-equilibrium and equilibrium simulations.

#### 9. Conclusions and Outlook

The Jarzynski non-equilibrium work theorem and the Crooks fluctuation theorem are fundamental exact relations that link the irreversible work carried out on a system during a non-equilibrium transformation to the system's equilibrium statistics. To date, the most significant application of these relations lies in the interpretation of single-molecule pulling experiments, in which forces exerted by atomic force microscopes or optical tweezers are used to probe the properties of individual molecules. Due to technological limitations, such experiments are necessarily carried out at a finite pulling rate, leading to non-equilibrium effects that cannot be neglected. The theorems of Jarzynski and Crooks provide a practical tool for the interpretation of such single-molecule pulling experiments and permit one to extract equilibrium information, such as potentials of mean force, from data obtained under inherently non-equilibrium conditions [7–9,72].

From a computational point of view, the Jarzynski and Crooks theorems have provided a new and powerful framework for the calculation of free energies using computer simulations. Apart from putting earlier slow-growth free energy simulations on a firm theoretical footing, these results have spawned the development of several new free energy algorithms based on non-equilibrium, fast-switching trajectories.

Depending on the rate at which the system is driven away from equilibrium, fast switching free energy computations can be plagued by large statistical errors. For strong driving, *i.e.*, for large switching rates, work distributions are broad, with typical work values by far exceeding the free energy difference. As a consequence, the exponential work average of the Jarzynski equation is dominated by a few rare contributions, leading to large statistical uncertainties and a bias in the free energy estimate. Such errors can outweigh the computational advantage of running inexpensive short trajectories rather than one single long trajectory [18,29,58]. In fact, it has been shown that in the slow switching regime, one obtains more accurate results from few slow simulations than from many faster ones [18]. Numerical simulations carried out for various model systems [58,59] indicate that conventional free energy computation methods, such as thermodynamic integration or free energy perturbation theory, are more efficient than fast switching simulations, even if work biasing techniques are employed. Fast switching methods may, however, be advantageous for systems in which the states of interest are connected by several distinct pathways. In such a case, conventional methods may fail to sample all important transition routes while multiple fast switching trajectories have the chance to probe all important pathways. Such a situation was indeed observed for transitions between low-energy configurations of Lennard-Jones clusters [41], which could be sampled successfully only with non-equilibrium path sampling, but not with other approaches. Compared to standard methods, fast switching algorithms appeared on the scene only recently, such that substantial improvements and new developments are to be expected [13,21,57,73–78]. 

It is worth noting that fast switching ideas have not only been applied to the calculation of free energies, but have also been combined with existing sampling methods to enhance the efficiency of the simulation. For instance, non-equilibrium switches have been used to improve the acceptance probability of replica exchange simulations [79,80] and to generate trial configurations for Monte Carlo simulations [81,82]. Conversely, waste-recycling Monte Carlo [37] can be adapted for the calculation of free energies from non-equilibrium switching simulations [36].

One aspect of fast switching simulations that has not been fully exploited is the freedom in choosing the transformation protocol. While the optimization of the time dependence of the driving parameter has been the subject of previous numerical and analytical studies [23,24], the extension of such optimizations to multiple control parameters is unexplored to date. The values of the control parameter at the start and the end of the transformation are given, but in between, additional parameters can be subjected to a change as well, without affecting the validity of the relations that provide the basis for fast switching simulation. As an early example, an external pressure has been heuristically adjusted to maintain reasonable box sizes and prevent phase separation in a transformation between liquid and ideal gas states [54]. Defining parameter spaces of higher dimension and determining optimum parameter pathways in these spaces may offer efficient ways to control the work distribution and, hence, reduce the computational cost of fast switching simulations.

#### Acknowledgments

We acknowledge the financial support of the Austrian Science Fund (FWF, Fonds zur Förderung der Wissenschaftlichen Forschung) within the SFB ViCoM (Spezialforschungsbereich Vienna Computational Materials Laboratory), Grant F41, as well as Project P24681-N20 (C.D.) and the Max Planck Society (G.H.).

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Ciccotti, G.; Ferrario, M. Dynamical Non-Equilibrium Molecular Dynamics. *Entropy* 2014, *16*, 233–257.

## *Review*

## Dynamical Non-Equilibrium Molecular Dynamics

#### Giovanni Ciccotti <sup>1</sup> and Mauro Ferrario <sup>2,</sup>\*


*Received: 10 November 2013; in revised form: 26 November 2013 / Accepted: 16 December 2013 / Published: 27 December 2013*

Abstract: In this review, we discuss the Dynamical approach to Non-Equilibrium Molecular Dynamics (D-NEMD), which extends stationary NEMD to time-dependent situations, be they responses or relaxations. Based on the original Onsager regression hypothesis, implemented in the nineteen-seventies by Ciccotti, Jacucci and MacDonald, the approach permits one to separate the problem of dynamical evolution from the problem of sampling the initial condition. D-NEMD provides the theoretical framework to compute time-dependent macroscopic dynamical behaviors by averaging on a large sample of non-equilibrium trajectories starting from an ensemble of initial conditions generated from a suitable (equilibrium or non-equilibrium) distribution at time zero. We also discuss how to generate a large class of initial distributions. The same approach applies also to the calculation of the rate constants of activated processes. The range of problems treatable by this method is illustrated by discussing applications to a few key hydrodynamic processes (the "classical" flow under shear, the formation of convective cells and the relaxation of an interface between two immiscible liquids).

Keywords: non-equilibrium; molecular dynamics; dynamical relaxation; hydrodynamics

#### 1. Introduction

The most widespread use of Molecular Dynamics (MD) [1,2], in the same spirit as Monte Carlo (MC) [3,4], is to compute the thermodynamic or statistical behavior of molecular systems at equilibrium. This means that, starting from the assumption of the validity of the ergodic hypothesis, dynamical (MD) or fictitious-time (MC) trajectories are used to sample the equilibrium distribution in phase space (MD) or in configurational space (MC). "Time" averages over the generated trajectories will thereafter provide the statistical properties of the system.

At variance with Monte Carlo, the dynamical approach of Molecular Dynamics can be directly extended to sample distributions corresponding to stationary non-equilibrium conditions, where a stationary distribution exists but, unlike at equilibrium, its expression is not explicitly known. However, the statistical problem of sampling a time-dependent ensemble cannot be solved by generating states along a single dynamical non-equilibrium trajectory, because time can then no longer be taken as homogeneous and averages over time make no sense.

Generally, to compute macroscopic dynamical behaviors, as, e.g., in hydrodynamics, the assumption of time-scale separation is made and rigorous ensemble averages are replaced with short-time averages equivalent to local smoothing. This assumption does not always hold. Moreover, the statistical error implied by this procedure cannot be made as small as one would wish. These difficulties can be faced and solved.

In the nineteen-thirties, Lars Onsager [5] observed that an induced (non-equilibrium) relaxation towards equilibrium could be obtained by studying the regression of the corresponding spontaneous fluctuations at equilibrium. Later, in the nineteen-fifties, Kubo [6] provided a mathematical formulation of Onsager's ideas by showing how the (linear) response of a system, initially at equilibrium, to a time-dependent (external) physical perturbation could be obtained by convoluting it with an appropriate equilibrium time-correlation function [7–9]. Kubo also derived the formal expression for the complete (linear and nonlinear) response.

In Kubo's procedure one does not need to make reference to an initial *equilibrium* state, but can, rather, refer to an arbitrary initial distribution of the system at time t<sub>0</sub> = 0. This result has an important consequence for Molecular Dynamics simulations, since it allows one to separate the problem of dynamical evolution from the problem of sampling the initial condition.

Starting from the mid-nineteen-seventies, the direct numerical simulation of the response was used in conjunction with a sample of initial conditions extracted from an equilibrium trajectory [10,11]. In this context, the problem of achieving a reasonable signal-to-noise ratio, even for weak perturbations, was solved for short times by introducing the so-called *subtraction technique* [12], which permitted one to verify, with surprising results [13], the range of the validity of linearity.

Some time later on, it was realized that the same approach could be used to calculate dynamical properties for rare events (e.g., transmission coefficients) by averaging the dynamical response over time-dependent trajectories started from initial conditions sampled from a constrained/conditional equilibrium ensemble [14–18].

Finally, quite recently, the idea of creating a large sample of non-equilibrium trajectories starting from a given initial distribution has been extended to cover any distribution that can be sampled starting from an equilibrium or a non-equilibrium, but stationary, dynamics. In particular, stationary non-equilibrium ensembles can be generated by suitably restraining standard MD simulations.

In particular, we will illustrate the approach by reporting the results of a study of the time evolution of classical fields, including the onset of convective cells and the relaxation of hydrodynamic interfaces in simple liquids. In this context, we will also briefly address a conceptual difficulty of the approach, due to the possible existence of more than one macroscopic state associated with specific perturbations. In particular cases the problem can be circumvented.

The structure of the paper is as follows. In Section 2 we derive the general framework and specify the possible forms for the initial ensemble. In Section 3 we present a few successful applications of the method. Finally, in Section 4 we try to assess the situation and sketch an outlook.

#### 2. Dynamical Approach to Non-Equilibrium: Theoretical Background

#### *2.1. General Formulation*

We start by considering, in a very general way, a (classical) dynamical system with n degrees of freedom, whose time evolution is described by a set of first-order differential equations in a phase space of dimension 2n. We will refer to the phase space variables collectively with the vector notation Γ = {q<sub>1</sub>, p<sub>1</sub>, q<sub>2</sub>, p<sub>2</sub>, ..., q<sub>n</sub>, p<sub>n</sub>}, where the q's and the p's reduce to the usual coordinate-momentum pairs for Hamiltonian dynamics. The equations of motion can be written in the compact form

$$\dot{\Gamma}\_j = \dot{\Gamma}\_j(\vec{\Gamma}; t) = \dot{\Gamma}\_j\left(q\_1, p\_1, q\_2, p\_2, \dots, q\_n, p\_n; t\right), \quad j = 1, 2, \dots, 2n \tag{1}$$

The above equations could be the usual Hamiltonian equations of motion for an isolated system of N particles [19], contain a number of holonomic constraints [20] or represent the more general case of an "extended" system, possibly non-Hamiltonian [21,22], including couplings of the system to a thermal and/or pressure bath by means of a few extra degrees of freedom, so that, in general, n > 3N (see, also, [23]). We will only assume that the dynamics described by Equation (1) is ergodic, *i.e.*, if we wait long enough, all regions of the phase space available to the system, in accord with the imposed conditions, will be explored by the dynamical evolution. With this in mind, the statistical mechanics description of the system requires the introduction of the invariant measure dμ(Γ, d<sup>2n</sup>Γ) in phase space [23]. We start by introducing the generator of time translations in terms of the Liouville operator, ıL̂:

$$\imath\hat{\mathcal{L}}(\vec{\Gamma};t) = \sum\_{j=1}^{2n} \dot{\Gamma}\_j(\vec{\Gamma};t) \cdot \frac{\partial}{\partial \Gamma\_j} = \sum\_{j=1}^{n} \dot{q}\_j(\vec{\Gamma};t) \cdot \frac{\partial}{\partial q\_j} + \sum\_{j=1}^{n} \dot{p}\_j(\vec{\Gamma};t) \cdot \frac{\partial}{\partial p\_j} \tag{2}$$

so that the equations of motion can be rephrased in operator form and formally solved. As the Liouville operator depends explicitly on time, integrating Equation (2) from some initial time t<sub>0</sub> to time t, one obtains an implicit integral equation that can be solved by iteration for each j = 1, 2, ..., 2n

$$\Gamma\_j(t) = \Gamma\_j(t\_0) + \int\_{t\_0}^t ds \left(\imath \hat{\mathcal{L}}(s)\right) \Gamma\_j(t\_0) + \int\_{t\_0}^t ds\_1 \int\_{t\_0}^{s\_1} ds\_2 \left(\imath \hat{\mathcal{L}}(s\_1)\right) \left(\imath \hat{\mathcal{L}}(s\_2)\right) \Gamma\_j(t\_0) + \cdots \quad . \tag{3}$$

The results can be expressed in closed "operatorial" form

$$\dot{\Gamma}\_j(t) = \imath \hat{\mathcal{L}}(t) \Gamma\_j(t) \quad \longrightarrow \quad \Gamma\_j(t) = \hat{S}(t, t\_0) \Gamma\_j(t\_0) \,, \quad j = 1, 2, \ldots, 2n \tag{4}$$

with the introduction of the evolution operator

$$
\hat{S}(t, t\_0) = \hat{T} \exp\left[\int\_{t\_0}^t \imath \hat{\mathcal{L}}(s) \, ds\right] \tag{5}
$$

where T̂ is the time-ordering operator.

Time evolution in phase space can alternatively be expressed in terms of the Jacobian matrix **J**(Γ(t), Γ(t<sub>0</sub>)) of the time transformation from Γ(t<sub>0</sub>) to Γ(t). The phase space element d<sup>2n</sup>Γ(t<sub>0</sub>) at time t<sub>0</sub> transforms into the volume element d<sup>2n</sup>Γ(t) = J(Γ(t), Γ(t<sub>0</sub>)) d<sup>2n</sup>Γ(t<sub>0</sub>) at time t, where J = det **J** obeys the differential equation [23]

$$\frac{dJ\left(\vec{\Gamma}(t), \vec{\Gamma}(t\_0)\right)}{dt} = \hat{\kappa}(\vec{\Gamma}(t); t) \ J\left(\vec{\Gamma}(t), \vec{\Gamma}(t\_0)\right) \tag{6}$$

and the phase space compressibility κ̂ is defined by

$$\hat{\kappa}(\vec{\Gamma};t) = \sum\_{j=1}^{2n} \frac{\partial}{\partial \Gamma\_j} \cdot \dot{\Gamma}\_j(\vec{\Gamma};t) \tag{7}$$

For a Hamiltonian system the compressibility κ̂ vanishes, J(Γ(t), Γ(t<sub>0</sub>)) = 1 and the dynamics preserves volume in phase space (Liouville theorem). More generally, when κ̂ does not vanish, d<sup>2n</sup>Γ is no longer a dynamical invariant and one needs to introduce a metric factor to define the invariant measure of the phase space under the dynamical evolution. Starting from the general expression for the Jacobian determinant, one gets

$$J\left(\vec{\Gamma}(t), \vec{\Gamma}(t\_0)\right) = \exp\left[\int\_{t\_0}^t \hat{\kappa}(\vec{\Gamma}(s); s)ds\right] = e^{w(\vec{\Gamma}(t); t) - w(\vec{\Gamma}(t\_0); t\_0)} = \frac{Z(\vec{\Gamma}(t\_0); t\_0)}{Z(\vec{\Gamma}(t); t)}\tag{8}$$

where w is the indefinite time integral of κ̂ and Z(Γ(t); t) = exp[−w(Γ(t); t)]. The dynamically invariant volume element in phase space can be defined as

$$\begin{aligned} d\mu\left(\vec{\Gamma}(t), d^{2n}\Gamma(t)\right) &= Z(\vec{\Gamma}(t);t)\,d^{2n}\Gamma(t) = Z(\vec{\Gamma}(t);t)\,J\left(\vec{\Gamma}(t),\vec{\Gamma}(t\_0)\right)d^{2n}\Gamma(t\_0) \\ &= Z(\vec{\Gamma}(t\_0);t\_0)\,d^{2n}\Gamma(t\_0) = d\mu\left(\vec{\Gamma}(t\_0), d^{2n}\Gamma(t\_0)\right) \end{aligned} \tag{9}$$
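Equations (6)–(8) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it uses a damped harmonic oscillator, dq/dt = p, dp/dt = −q − γp, which is a toy model chosen here and not a system discussed in the text. For this model the compressibility is constant, κ̂ = −γ, so Equation (8) predicts J = exp[−γ(t − t<sub>0</sub>)]. The function names and the value of `GAMMA` are hypothetical.

```python
import math

GAMMA = 0.3  # friction coefficient of the toy model (assumed value, for illustration only)

def flow(q, p):
    """Right-hand side of the toy equations of motion: dq/dt = p, dp/dt = -q - GAMMA*p."""
    return p, -q - GAMMA * p

def rk4_step(q, p, dt):
    """One fourth-order Runge-Kutta step of the toy dynamics."""
    k1q, k1p = flow(q, p)
    k2q, k2p = flow(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = flow(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = flow(q + dt * k3q, p + dt * k3p)
    q_new = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_new = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_new, p_new

def propagate(q, p, t, dt=1.0e-3):
    """Evolve the phase space point (q, p) from time 0 to time t."""
    for _ in range(int(round(t / dt))):
        q, p = rk4_step(q, p, dt)
    return q, p

def jacobian_determinant(q0, p0, t, eps=1.0e-6):
    """Central finite-difference estimate of det d(q(t), p(t)) / d(q0, p0)."""
    qa, pa = propagate(q0 + eps, p0, t)
    qb, pb = propagate(q0 - eps, p0, t)
    qc, pc = propagate(q0, p0 + eps, t)
    qd, pd = propagate(q0, p0 - eps, t)
    dq_dq0, dp_dq0 = (qa - qb) / (2.0 * eps), (pa - pb) / (2.0 * eps)
    dq_dp0, dp_dp0 = (qc - qd) / (2.0 * eps), (pc - pd) / (2.0 * eps)
    return dq_dq0 * dp_dp0 - dq_dp0 * dp_dq0

def jacobian_from_compressibility(t):
    """Equation (8) for a constant compressibility kappa = -GAMMA."""
    return math.exp(-GAMMA * t)
```

Comparing `jacobian_determinant(1.0, 0.5, 2.0)` with `jacobian_from_compressibility(2.0)` shows the phase space volume contracting at the rate fixed by the compressibility; setting `GAMMA = 0.0` recovers the Hamiltonian case J = 1.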

Consider, now, an ensemble of systems whose dynamical evolution is defined by Equation (1). The statistical mechanics is described by the time-dependent probability distribution function in phase space f( Γ;t) which must obey the global conservation law for probabilities

$$\int d\mu(\vec{\Gamma}, d^{2n}\Gamma) f(\vec{\Gamma}; t) = 1$$

The corresponding local, differential, conservation law can be derived by transforming the integral back to the phase space element d<sup>2n</sup>Γ, by using

$$f(\vec{\Gamma};t)d\mu\left(\vec{\Gamma},d^{2n}\Gamma\right) = f(\vec{\Gamma};t)Z(\vec{\Gamma};t)d^{2n}\Gamma = \rho(\vec{\Gamma};t)d^{2n}\Gamma\tag{10}$$

and introducing the phase space density ρ( Γ;t) = Z( Γ;t)f( Γ;t). The continuity equation to be satisfied is

$$\frac{\partial \rho(\vec{\Gamma};t)}{\partial t} + \sum\_{j=1}^{2n} \frac{\partial}{\partial \Gamma\_j} \cdot \left(\dot{\Gamma}\_j(\vec{\Gamma})\rho(\vec{\Gamma};t)\right) = 0 \tag{11}$$

which, when expressed in terms of the Liouville operator ıL̂ and the phase space compressibility κ̂, becomes the "generalized" Liouville equation

$$\frac{\partial \rho(\vec{\Gamma};t)}{\partial t} + \left[i\hat{\mathcal{L}}(\vec{\Gamma};t) + \hat{\kappa}(\vec{\Gamma};t)\right] \rho(\vec{\Gamma};t) = 0 \tag{12}$$

and reduces to the more "familiar" equation for the probability density f( Γ;t)

$$\frac{\partial f(\vec{\Gamma};t)}{\partial t} + \imath \hat{\mathcal{L}} f(\vec{\Gamma};t) = 0 \tag{13}$$

However, we must point out that this last equation may lead to confusion if one does not keep in mind that, while the Liouville operator ıL̂ defines the dynamical evolution of the time-dependent probability density f in phase space, the non-vanishing compressibility κ̂, hidden in the invariant phase space volume, defines the time evolution of the phase space volume d<sup>2n</sup>Γ.
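The step from Equation (12) to Equation (13) can be spelled out as follows. This is a short worked derivation that uses only the definitions given above, with d/dt = ∂/∂t + ıL̂ denoting the derivative along the flow. Since w is the time integral of κ̂ and Z = exp(−w), the metric factor obeys

$$\left(\frac{\partial}{\partial t} + \imath\hat{\mathcal{L}}\right) Z = -\frac{dw}{dt}\, Z = -\hat{\kappa}\, Z$$

Inserting ρ = Z f into Equation (12), and using the fact that ıL̂ is a first-order differential operator, so that the product rule applies, gives

$$\frac{\partial \rho}{\partial t} + \left(\imath\hat{\mathcal{L}} + \hat{\kappa}\right)\rho = Z\left(\frac{\partial f}{\partial t} + \imath\hat{\mathcal{L}} f\right) + f\left(\frac{\partial Z}{\partial t} + \imath\hat{\mathcal{L}} Z + \hat{\kappa} Z\right) = Z\left(\frac{\partial f}{\partial t} + \imath\hat{\mathcal{L}} f\right)$$

Because Z is strictly positive, the vanishing of the left-hand side is equivalent to Equation (13).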

The solution of Equation (12) can be retrieved along the same lines followed for Equation (2) and the results can be formally written in closed "operatorial" form

$$\rho(\vec{\Gamma};t) = \hat{S}^{\dagger}(t,t\_0)\rho(\vec{\Gamma};t\_0) \,, \quad \hat{S}^{\dagger}(t,t\_0) = \hat{T}\exp\left[\int\_{t\_0}^{t} -\left(\imath\hat{\mathcal{L}}(s) + \hat{\kappa}(s)\right)ds\right] \tag{14}$$

where we have introduced the adjoint Ŝ<sup>†</sup>(t, t<sub>0</sub>) of the previously defined time evolution operator Ŝ(t, t<sub>0</sub>) acting on the phase space variables Γ, and the phase space density ρ<sub>0</sub> = ρ(Γ; t<sub>0</sub>) at the initial time t<sub>0</sub>.

The average over the (non-)equilibrium ensemble of a physical observable O(t) = ⟨Ô(Γ)⟩<sub>t</sub> or, more generally, of a macroscopic field O(x, t) = ⟨Ô(x, Γ)⟩<sub>t</sub> = ⟨Σ<sub>j</sub> Ô<sub>j</sub>(Γ) δ(x − R<sub>j</sub>)⟩<sub>t</sub> (the sum is over the particles) can be defined as

$$O(t) = \int \hat{O}(\vec{\Gamma}) \, f(\vec{\Gamma}; t) \, d\mu(\vec{\Gamma}, d^{2n}\Gamma) = \int \hat{O}(\vec{\Gamma}) \, \rho(\vec{\Gamma}; t) \, d^{2n}\Gamma \tag{15}$$

$$O(\vec{x},t) \;=\int \hat{O}(\vec{x},\vec{\Gamma})\, f(\vec{\Gamma};t)\,d\mu(\vec{\Gamma},d^{2n}\Gamma) = \int \hat{O}(\vec{x},\vec{\Gamma})\,\rho(\vec{\Gamma};t)\,d^{2n}\Gamma\tag{16}$$

We can make the time evolution explicit by means of the adjoint time evolution operator, ρ(Γ; t) = Ŝ<sup>†</sup>(t, t<sub>0</sub>)ρ(Γ; t<sub>0</sub>), and then, by taking advantage of the fact that Ŝ<sup>†</sup> is the adjoint of the dynamics, we can transfer the effect of time evolution to the physical observables

$$\begin{split} O(t) &= \int \hat{O}(\vec{\Gamma}) \, \hat{S}^{\dagger}(t, t\_{0}) \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma = \int \left( \hat{S}(t, t\_{0}) \hat{O}(\vec{\Gamma}) \right) \, \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma \\ &= \int \hat{O}(\vec{\Gamma}; t) \, \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma \end{split} \quad \Rightarrow \quad O(t) = \langle \hat{O}(\vec{\Gamma}; t) \rangle\_{\rho\_{0}} \tag{17}$$
 
$$\begin{split} O(\vec{x}, t) &= \int \hat{O}(\vec{x}, \vec{\Gamma}) \, \hat{S}^{\dagger}(t, t\_{0}) \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma = \int \left( \hat{S}(t, t\_{0}) \hat{O}(\vec{x}, \vec{\Gamma}) \right) \, \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma \\ &= \int \hat{O}(\vec{x}, \vec{\Gamma}; t) \, \rho(\vec{\Gamma}; t\_{0}) d^{2n} \Gamma \end{split} \quad \Rightarrow \quad O(\vec{x}, t) = \langle \hat{O}(\vec{x}, \vec{\Gamma}; t) \rangle\_{\rho\_{0}} \tag{18}$$

where Ô(Γ; t) = Ŝ(t, t<sub>0</sub>) Ô(Γ), *i.e.*, the observable evolved in time along the dynamical trajectory of the system starting from the initial condition Γ(t<sub>0</sub>) at time t<sub>0</sub>. We have introduced the shorthand notation ⟨···⟩<sub>ρ<sub>0</sub></sub> for the averages over the ensemble described by the phase space density ρ<sub>0</sub> at the initial time t<sub>0</sub>.

Despite the apparent complexity of the time evolution operator Ŝ(t, t<sub>0</sub>) in Equation (5), its action can be computed simply by MD, *i.e.*, by the numerical integration of the evolution defined by Equation (1). Note that all this is possible thanks to the fact that the Liouville equation can be integrated by the method of characteristics.

In the following, we will deal with fluid systems where the relevant macroscopic fields are [24] the mass density field ϱ(x, t), the velocity field v(x, t) and the temperature field T(x, t):

$$\begin{aligned} \varrho(\vec{x},t) &= \int \sum\_{j=1}^{N} m\_j \delta\left(\vec{x} - \vec{R}\_j\right) \hat{S}^\dagger(t, t\_0) \rho(\vec{\Gamma}; t\_0) \, d^{2n} \Gamma \\ &\Rightarrow \quad \left\langle \sum\_{j=1}^{N} m\_j \delta\left(\vec{x} - \vec{R}\_j(t)\right) \right\rangle\_{\rho\_0} \end{aligned} \tag{19}$$

$$\begin{aligned} \vec{v}(\vec{x}, t) &= \frac{1}{\varrho(\vec{x}, t)} \int \sum\_{j=1}^{N} \vec{P}\_j \delta\left( \vec{x} - \vec{R}\_j \right) \hat{S}^\dagger(t, t\_0) \rho(\vec{\Gamma}; t\_0) \, d^{2n} \Gamma \\ &\Rightarrow \quad \frac{1}{\varrho(\vec{x}, t)} \left\langle \sum\_{j=1}^N \vec{P}\_j(t) \delta\left( \vec{x} - \vec{R}\_j(t) \right) \right\rangle\_{\rho\_0} \end{aligned} \tag{20}$$

$$\begin{aligned} \left(\frac{f}{N}\right) k\_{B} T(\vec{x},t) &= \frac{1}{\varrho(\vec{x},t)}\int\sum\_{j=1}^{N}\left[\vec{P}\_{j}-m\_{j}\vec{v}(\vec{x},t)\right]^{2}\delta\left(\vec{x}-\vec{R}\_{j}\right)\hat{S}^{\dagger}(t,t\_{0})\rho(\vec{\Gamma};t\_{0})\,d^{2n}\Gamma \\ &\Rightarrow \quad \frac{1}{\varrho(\vec{x},t)}\left\langle\sum\_{j=1}^{N}\left[\vec{P}\_{j}(t)-m\_{j}\vec{v}(\vec{x},t)\right]^{2}\delta\left(\vec{x}-\vec{R}\_{j}(t)\right)\right\rangle\_{\rho\_{0}}\end{aligned} \tag{21}$$

where N is the number of particles and the factor f, usually equal to 3N, counts the number of degrees of freedom in the presence of constraints.

#### *2.2. Ensembles at* t<sub>0</sub>

Equations (17) and (18) express what we like to call the Onsager–Kubo relations and state that we can obtain the time evolution of a macroscopic observable or of a macroscopic field as the average of the corresponding time-evolved microscopic expression over the initial-time ensemble described by the phase space density ρ<sub>0</sub> = ρ(Γ; t<sub>0</sub>).

If the ensemble at the initial time t<sub>0</sub> can be simulated by a dynamical system in stationary conditions, then the corresponding probability density can be sampled by MD, generating a set of (possibly independent) phase space points distributed according to ρ<sub>0</sub>. From each of these points, one can then start an independent dynamical trajectory along which the observables Ô(Γ; t) and Ô(x, Γ; t) can be computed. Finally, by averaging the values of these observables at time t over all the trajectories, one obtains the macroscopic time-dependent behavior of the system, as visualized in Figure 1.
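The procedure just described can be sketched on a toy problem. The example below is a minimal illustration, not the implementation used in the works reviewed here: it assumes a single harmonic oscillator with H<sub>0</sub> = p²/2 + q²/2, samples the initial ensemble by drawing Gaussian numbers (standing in for a long equilibrium MD run), and switches on a constant field at t<sub>0</sub> = 0. All names (`sample_equilibrium`, `dnemd_average`, `FIELD`) and parameter values are hypothetical.

```python
import math
import random

def sample_equilibrium(n_samples, kT=1.0, seed=0):
    """Draw (q, p) from the canonical distribution of H0 = p^2/2 + q^2/2.
    Direct Gaussian sampling stands in here for a long equilibrium MD trajectory."""
    rng = random.Random(seed)
    width = math.sqrt(kT)
    return [(rng.gauss(0.0, width), rng.gauss(0.0, width)) for _ in range(n_samples)]

def velocity_verlet(q, p, force, dt, n_steps):
    """Integrate dq/dt = p, dp/dt = force(q); return the positions at every step."""
    path = [q]
    f = force(q)
    for _ in range(n_steps):
        p += 0.5 * dt * f
        q += dt * p
        f = force(q)
        p += 0.5 * dt * f
        path.append(q)
    return path

def dnemd_average(initial_conditions, force, dt, n_steps):
    """Average the observable q(t) over trajectories started from the initial ensemble."""
    mean = [0.0] * (n_steps + 1)
    for q0, p0 in initial_conditions:
        for k, q in enumerate(velocity_verlet(q0, p0, force, dt, n_steps)):
            mean[k] += q
    return [m / len(initial_conditions) for m in mean]

FIELD = 0.5  # strength of the step perturbation switched on at t0 = 0 (assumed value)

def perturbed_force(q):
    """Force for H = H0 - FIELD * q, i.e. with the perturbation switched on."""
    return -q + FIELD

initial = sample_equilibrium(2000)
dt, n_steps = 0.01, 314  # follow the response up to t close to pi
response = dnemd_average(initial, perturbed_force, dt, n_steps)
```

For this linear toy model the exact ensemble average is FIELD·(1 − cos t), so `response` can be compared against it; the residual scatter decreases as the inverse square root of the number of initial conditions.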

In order to use MD to sample the appropriate initial ensemble at time t<sub>0</sub>, one needs to define, for any specific problem, the dynamical evolution, Equation (1), and the auxiliary conditions to which the system is subjected. Sometimes, but not always, this will be possible within the Hamiltonian formulation of the dynamics.

Figure 1. Phase space representation of the ensemble of dynamical side-trajectories providing the non-equilibrium statistical averages: in blue, the Molecular Dynamics (MD) trajectory sampling the ensemble at time t<sub>0</sub>; in black, the individual non-equilibrium trajectories sampling the Dynamical Non-Equilibrium Molecular Dynamics (D-NEMD) ensemble, over which one can average the time behavior of the observable Ô as a function of the time t.

#### 3. D-NEMD Selected Applications

We will now list a number of cases, which will later be illustrated with the corresponding application. Transport properties, like viscosity, thermal conductivity, *etc*., have been computed and their linearity range investigated by non-equilibrium MD since the 1970s [10–12,25–34]. These results were obtained by measuring on a computer the mechanical response when switching on the external (at the beginning Hamiltonian and later on, more generally, also non-Hamiltonian) perturbation applied to a model system initially at equilibrium. In other words, we identify in the present case the ensemble at time t<sub>0</sub> with the statistical mechanics equilibrium ensemble, while the dynamical trajectories are carried out under the influence of an external (time-dependent) force field.

More generally, we can generate (and sample) initial ensembles by less trivial procedures. For example, in the case of the formation of convective cells, gravity is considered as the external perturbation applied to a system initially in a steady state under the effect of a thermal gradient. The ensemble at time t<sub>0</sub> no longer corresponds to the equilibrium one; it is set up by introducing a stationary boundary perturbation which, in this specific case, is an *ad hoc* boundary condition that models a thermal wall stochastically. Moreover, a confining wall, in the form of an external field acting on each particle at the boundary, keeps the system inside the simulation box. This boundary condition is perfectly compatible with the presence of a gravity field.

Another possible case we will consider is the relaxation to equilibrium of an interface between two immiscible liquids, starting from an imposed, non-equilibrium, condition in which the curvature of the interface is maintained by a macroscopic restraint fixing the shape of the initial interface. The ensemble at time t<sub>0</sub> is described by a conditional probability density in which an *ad hoc* restraint is imposed on a field-like observable. The sample is generated by using an advanced MD sampling technique, where the dynamical trajectory evolves under the effect of a suitable restraining potential, from which we can extract an unbiased sample of the conditional probability density function. Time-dependent averages are then taken over dynamical trajectories generated according to the unrestrained dynamics of the system. The different situations described are summarized in Figure 2.

Figure 2. We distinguish three different classes for the sampling of the initial distribution: equilibrium, direct stationary non-equilibrium simulations and advanced conditional sampling. They are shown to be associated with the corresponding sampling techniques and test-case applications.

#### *3.1. Transport and Linear Response*

Linear Response Theory is a nice result of the nineteen-fifties in the theory of irreversible processes [6], where well-defined microscopic expressions for all transport coefficients have been derived in terms of a properly chosen perturbation [7,8,35,36]. Within the Dynamical approach to Non-Equilibrium Molecular Dynamics (D-NEMD) framework it has been possible to investigate the linear and, more generally, the non-linear response by making reference to the canonical ensemble for sampling the initial conditions at time t<sub>0</sub>.

#### 3.1.1. Hamiltonian Perturbations

For a system of particles in three dimensions described by the usual set of Cartesian coordinates and momenta, {R<sub>j</sub>, P<sub>j</sub>, j = 1, 2, ...}, the perturbation can be put in Hamiltonian form by choosing a physical property Â(x|Γ) = Σ<sub>j</sub> A<sub>j</sub>(Γ) δ(x − R<sub>j</sub>) that describes the coupling of the system to the applied external local field ψ(x, t) = ϕ(x)χ(t), whose time-dependent intensity χ(t) can be constant or periodic or even arbitrary, generating corresponding flux conditions. Especially important are the cases in which the perturbation is either a step function θ(t − t<sub>0</sub>) (θ = 1 for t > t<sub>0</sub>, θ = 0 for t < t<sub>0</sub>) or a Dirac delta impulse δ(t − t<sub>0</sub>) at t = t<sub>0</sub>, after which the system is left free to relax. In the linear regime, the general response can be computed as the superposition of these impulsive responses. One then derives the equations of motion using the standard Hamiltonian route, where we start by separating in the Hamiltonian H(Γ, t) = H<sub>0</sub>(Γ) + H<sub>p</sub>(Γ, t) the time-dependent perturbation term

$$\mathcal{H}\_p(t) = -\int d\vec{x} \hat{A}(\vec{x}|\vec{\Gamma})\psi(\vec{x},t) = -\left(\sum\_j A\_j \varphi(\vec{R}\_j)\right)\chi(t) = -h\_p \chi(t) \tag{22}$$

where the Hamiltonian H<sub>0</sub> is the equilibrium Hamiltonian to which one can possibly add the coupling to a thermostat or a barostat, something that can be done in a variety of ways that we do not need to specify here. Indicating generically the possible presence of such couplings to different baths with ellipses, the equations of motion for particle j can be written

$$\begin{aligned} \dot{\vec{R}}\_{j} &= \frac{\partial \mathcal{H}\_{0}}{\partial \vec{P}\_{j}} + \frac{\partial \mathcal{H}\_{p}}{\partial \vec{P}\_{j}} = \left(\frac{\vec{P}\_{j}}{m\_{j}} + \dotsb\right) - \frac{\partial h\_{p}}{\partial \vec{P}\_{j}} \chi(t) \\ \dot{\vec{P}}\_{j} &= -\frac{\partial \mathcal{H}\_{0}}{\partial \vec{R}\_{j}} - \frac{\partial \mathcal{H}\_{p}}{\partial \vec{R}\_{j}} = \left(\vec{F}\_{j} + \dotsb\right) + \frac{\partial h\_{p}}{\partial \vec{R}\_{j}} \chi(t) \end{aligned} \tag{23}$$

The structure of the equations of motion can be broken into the two terms of the Liouville operator defined in Equation (2), ıL̂(Γ; t) = ıL̂<sub>0</sub>(Γ) + ıL̂<sub>p</sub>(Γ; t), with the partial Liouville operator ıL̂<sub>0</sub> defining the dynamical evolution in phase space for the sampling of the ensemble at time t<sub>0</sub>. Accordingly, the corresponding evolution operator for the stationary dynamics will be called Ŝ<sub>0</sub>(t). The dynamics of the time-dependent trajectories will be generated by the (time-dependent) evolution operator Ŝ(t, t<sub>0</sub>), obeying the (usual) Dyson equation

$$
\hat{S}(t, t\_0) = \hat{S}\_0(t) + \int\_{t\_0}^t \hat{S}\_0(t - s) \imath \hat{\mathcal{L}}\_p(s) \hat{S}(s, t\_0) \, ds \tag{24}
$$

which (if of interest) can be taken as the basis to develop the perturbative approach, whose first term leads to the Linear Response Theory approach. However, in many cases of interest, for example for constrained systems with a Hamiltonian or non-Hamiltonian structure, it becomes very difficult, if not impossible, to carry out the standard manipulations leading to the correlation function expressions for the linear response [17,18]. Nevertheless, a linear (or non-linear) response can always be computationally investigated using the procedure defined by Equations (17) and (18), as outlined in Figure 1.
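The subtraction technique [12] mentioned in the Introduction fits naturally into this procedure: each initial condition is propagated twice, with and without the perturbation, and the difference of the observable is averaged. The sketch below illustrates the idea on a toy harmonic oscillator with a weak constant field; the model, the function names and the parameter values are assumptions made for this example only.

```python
import math
import random

def verlet_path(q, p, field, dt, n_steps):
    """Velocity Verlet for the toy Hamiltonian H = p^2/2 + q^2/2 - field*q.
    Returns the list of positions q at every time step."""
    path = [q]
    f = -q + field
    for _ in range(n_steps):
        p += 0.5 * dt * f
        q += dt * p
        f = -q + field
        p += 0.5 * dt * f
        path.append(q)
    return path

def direct_response(initial_conditions, field, dt, n_steps):
    """Plain average of q(t) over the perturbed trajectories."""
    total = [0.0] * (n_steps + 1)
    for q0, p0 in initial_conditions:
        for k, q in enumerate(verlet_path(q0, p0, field, dt, n_steps)):
            total[k] += q
    return [x / len(initial_conditions) for x in total]

def subtracted_response(initial_conditions, field, dt, n_steps):
    """Average of q_perturbed(t) - q_unperturbed(t), both started from the same point."""
    total = [0.0] * (n_steps + 1)
    for q0, p0 in initial_conditions:
        perturbed = verlet_path(q0, p0, field, dt, n_steps)
        reference = verlet_path(q0, p0, 0.0, dt, n_steps)
        for k in range(n_steps + 1):
            total[k] += perturbed[k] - reference[k]
    return [x / len(initial_conditions) for x in total]

rng = random.Random(1)
ensemble = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
weak_field, dt, n_steps = 1.0e-3, 0.01, 200
plain = direct_response(ensemble, weak_field, dt, n_steps)
clean = subtracted_response(ensemble, weak_field, dt, n_steps)
exact = weak_field * (1.0 - math.cos(n_steps * dt))  # analytic response of the toy model
```

With only 50 initial conditions the plain average is dominated by thermal noise, while the subtracted one reproduces the weak-field response. In this linear toy model the cancellation is exact at all times; in a chaotic many-body system the perturbed and unperturbed trajectories diverge, which is why the technique improves the signal-to-noise ratio only at short times.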

#### 3.1.2. Non-Hamiltonian Perturbations

A more general scheme has also been used for bulk perturbations, where the new equations of motion, which cannot be derived from a time-dependent Hamiltonian in a way that remains consistent with applied (periodic) boundary conditions, are obtained from Equation (23) by substituting the terms derived from the Hamiltonian perturbation h<sub>p</sub> with two sets of "*ad hoc*" phase space functions {C<sub>j</sub>(Γ), D<sub>j</sub>(Γ), j = 1, 2, ...}:

$$\begin{aligned} \dot{\vec{R}}\_j &= \left(\frac{\vec{P}\_j}{m\_j} + \cdots\right) + \vec{C}\_j(\vec{\Gamma})\, \chi(t) \\ \dot{\vec{P}}\_j &= \left(\vec{F}\_j + \cdots\right) + \vec{D}\_j(\vec{\Gamma})\, \chi(t) \end{aligned} \tag{25}$$

A specific, notable example is the one known under the name of "SLLOD tensor" dynamics [37], where C<sub>j</sub> χ(t) = −(R<sub>j</sub> · κ) χ(t) and D<sub>j</sub> χ(t) = (P<sub>j</sub> · κ) χ(t) are coupled with specific, synchronized, Lees–Edwards periodic boundary conditions [38] (see Figure 3), which are needed to establish the tensor κ expressing the desired velocity gradient in the non-equilibrium simulation of viscous flows by molecular dynamics [39–42].

Figure 3. The Lees–Edwards periodic boundary conditions (Panel A) used to establish a stationary Couette flow (Panel B). In the case of a step function perturbation, periodic images above and below the reference MD cell are translated by an amount ±vδt at each time step, starting from time t<sub>0</sub>. Periodic boundary conditions can be effectively imposed using the equivalent non-orthogonal reference cell, highlighted in red (the actual inclination increases uniformly with time).
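The sliding-image construction of Figure 3 can be sketched in a few lines. The function below is a minimal two-dimensional illustration, assuming a square cell of side `box`, a velocity gradient of the x-component along y, and image cells moving with velocity ±`shear_rate`·`box` along x; it only remaps positions and leaves out the corresponding treatment of particle velocities and the coupling to the SLLOD equations. The function name and signature are hypothetical.

```python
import math

def lees_edwards_wrap(x, y, box, shear_rate, time):
    """Map a 2D position back into the reference cell [0, box) x [0, box).

    Image cells in the layer above (below) the reference cell are displaced
    along x by +offset (-offset), where offset grows linearly with time.
    """
    offset = (shear_rate * box * time) % box  # accumulated displacement of the first image layer
    layer = math.floor(y / box)               # how many cells the particle has crossed along y
    y -= layer * box
    x -= layer * offset                       # undo the displacement of the image layer
    x %= box                                  # ordinary periodic wrap along x
    return x, y
```

For example, with `box = 10`, `shear_rate = 0.1` and `time = 2` the image layers have slid by 2 length units, so a particle leaving through the top at (1.0, 10.5) re-enters at the bottom at (9.0, 0.5).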

In the typical setup for a planar Couette flow, one establishes a gradient of the x-component of the velocity along the y-axis of the simulation and measures the response using as an observable the xy component of the pressure tensor, σ<sub>xy</sub>, which can be written, for a system where the potential U is given by a sum of pair interactions, as

$$\sigma\_{xy} = \frac{1}{V} \left[ \sum\_{j} \frac{P\_j^{(x)} P\_j^{(y)}}{m\_j} + \sum\_{i < j} R\_{ij}^{(x)} F\_{ij}^{(y)} \right] \tag{26}$$

where R<sub>ij</sub> = (R<sub>i</sub> − R<sub>j</sub>) and F<sub>ij</sub> is the force exerted on particle i by particle j. In the D-NEMD approach, if the external field term is switched on with a step function perturbation in time at t = t<sub>0</sub> = 0, one can measure the viscous time-dependent response η(t) = −⟨σ<sub>xy</sub>(t)⟩<sub>ρ<sub>0</sub></sub>/γ, where γ is the applied shear rate and the asymptotic value η of η(t) at long times gives the viscosity of the fluid.
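As an illustration of how the observable σ<sub>xy</sub> is evaluated on a single configuration, the sketch below computes the kinetic and pair-virial contributions for a Lennard–Jones pair potential. It is a minimal example under simplifying assumptions: no periodic images or minimum-image convention, no cutoff, and a plain double loop over pairs. The function names are hypothetical.

```python
import numpy as np

def lj_pair_force(rij_vec, epsilon=1.0, sigma=1.0):
    """Force on particle i due to particle j for a Lennard-Jones pair,
    with rij_vec = R_i - R_j."""
    r2 = np.dot(rij_vec, rij_vec)
    sr6 = (sigma * sigma / r2) ** 3
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2 * rij_vec

def sigma_xy(positions, momenta, masses, volume, pair_force=lj_pair_force):
    """xy component of the pressure tensor: kinetic part plus pair virial.
    positions and momenta have shape (n_particles, 3); no periodic images are used."""
    kinetic = np.sum(momenta[:, 0] * momenta[:, 1] / masses)
    virial = 0.0
    n = len(positions)
    for i in range(n - 1):
        for j in range(i + 1, n):
            rij = positions[i] - positions[j]
            fij = pair_force(rij)
            virial += rij[0] * fij[1]
    return (kinetic + virial) / volume
```

In a D-NEMD calculation this function would be evaluated along every non-equilibrium trajectory, and the time-dependent response would follow by averaging −σ<sub>xy</sub>(t)/γ over the set of initial conditions.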

For the purpose of illustrating the method in the original applications, when the ensemble at the initial time t<sub>0</sub> is an equilibrium ensemble, we will restrict ourselves to the simple case of shear (Couette) flow. We would like to mention, however, that elongational flows [41,43–46] and, later on, mixed shear-elongational flows [47–49] have also been simulated both in atomic and molecular fluids. In these cases, it becomes technically much more difficult to maintain the periodic boundary conditions for an indefinite length of time and, for that, we refer the interested reader to [50,51].

Figure 4. Panel (A). Comparison of shear viscosity values as a function of the shear rate for the planar Couette flow: (a) D-NEMD asymptotic values from Reference [52]; (b) and (c) average values from stationary non-equilibrium calculations from Reference [54] and Reference [55], respectively. The solid line is the Lorentzian best fit to the data and the dashed line is the Ree–Eyring–Eu prediction [56]; Panel (B). The running-time integral (solid line) of the D-NEMD viscous dynamical response to a δ(t − t<sub>0</sub>) perturbation with γ = 10<sup>−4</sup>, averaged over 4000 trajectories, versus the running-time integral (dashed line) of the stress autocorrelation function shows the agreement of the D-NEMD results with the Green–Kubo linear response theory [52]. The error bars, extrapolated using the mean square fluctuations over the 4000 trajectories, increase with time, restricting the time range over which the response can be computed. (Note: the same kind of time-dependent behavior for η(t) is observed directly when using a step function perturbation.)

In Panel (A) of Figure 4, we show the results of a calculation [52] with a step function perturbation on a Lennard–Jones (LJ) fluid at the triple point, ϱ = 0.8442 and k<sub>B</sub>T<sub>p</sub> = 0.725 in reduced LJ units, *i.e.*, ε for energy, σ for distances and the particle mass m for masses. The temperature of a system of 2,048 particles was controlled using a Nosé–Hoover thermostat [22], both on the long equilibrium trajectory, which samples the (independent) initial conditions from a canonical ensemble at temperature T<sub>p</sub>, and on the non-equilibrium trajectories, to handle the heat produced, especially at high shear rates. The behavior of the time-dependent viscous response to an impulsive perturbation δ(t − t<sub>0</sub>) at t<sub>0</sub> = 0 was used to investigate the range of validity of Linear Response Theory for very small shear rates by comparison with the running time integral of the corresponding stress autocorrelation function at equilibrium [53].
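The equilibrium side of this comparison, the running time integral of the stress autocorrelation function, can be sketched as follows. The example assumes the standard Green–Kubo form η(t) = (V/k<sub>B</sub>T) ∫<sub>0</sub><sup>t</sup> ⟨σ<sub>xy</sub>(0)σ<sub>xy</sub>(s)⟩ ds and a stress time series sampled at a fixed interval `dt` along an equilibrium trajectory; the function names and arguments are hypothetical, and the estimator is the simplest direct one rather than the procedure of the cited works.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Time-origin-averaged autocorrelation <s(0) s(k)> for lags k = 0..max_lag.
    The mean is not subtracted: the shear stress averages to zero at equilibrium."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    return np.array([np.mean(series[: n - k] * series[k:]) for k in range(max_lag + 1)])

def green_kubo_running_viscosity(stress_xy, dt, volume, kT, max_lag):
    """Running integral (trapezoidal rule) of the stress autocorrelation function,
    scaled by V / kT; element k corresponds to time k * dt."""
    acf = autocorrelation(stress_xy, max_lag)
    increments = 0.5 * dt * (acf[1:] + acf[:-1])
    running = np.concatenate(([0.0], np.cumsum(increments)))
    return volume / kT * running
```

The plateau of the returned array, when one is reached before the statistical noise takes over, estimates the shear viscosity in the linear regime.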

#### *3.2. Non-Equilibrium (Steady State) Initial Conditions at Time* t<sub>0</sub>

The D-NEMD approach can be used also to follow the transient evolution of a system, which, starting from an out-of-equilibrium state under the effect of a stationary thermodynamic field, reaches a final (different) non-equilibrium state in response to an additional external perturbation. Below, we illustrate the approach with a case worked out in [57]. This is the case of the build up of a convective roll in a two-dimensional (2D) model fluid kept in an out-of-equilibrium condition by the presence of a thermal gradient when an external gravity field is (instantaneously) switched on.

The 2D system is composed of N = 5,401 identical particles in a square box of size L in the xz plane, with periodic boundary conditions along the x direction and a pair of confining walls along z. The walls are obtained by means of an external field ψ(z), acting at the top and the bottom of the simulation box to prevent the particles from drifting away. The particles interact with each other via a purely repulsive (2D) Weeks–Chandler–Andersen (WCA) [58] pair potential, obtained by truncating the Lennard–Jones potential at its minimum r<sub>m</sub> = 2<sup>1/6</sup>σ and shifting its value by ε in such a way that both the force and the potential are continuous and equal to zero for r ≥ r<sub>m</sub>:

$$V\_{WCA} = 4\varepsilon \left[ \frac{1}{4} + \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right], \quad r \leqslant r\_m; \quad V\_{WCA} = 0, \quad r \geqslant r\_m \tag{27}$$
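Equation (27) translates directly into code. The sketch below is a minimal scalar implementation in reduced units, with hypothetical function names; it also returns the magnitude of the radial force, which vanishes continuously at r<sub>m</sub>.

```python
R_MIN_FACTOR = 2.0 ** (1.0 / 6.0)  # r_m / sigma, position of the Lennard-Jones minimum

def wca_potential(r, epsilon=1.0, sigma=1.0):
    """WCA pair potential of Equation (27): shifted LJ for r < r_m, zero beyond."""
    if r >= R_MIN_FACTOR * sigma:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (0.25 + sr6 * sr6 - sr6)

def wca_force(r, epsilon=1.0, sigma=1.0):
    """Magnitude of the radial force -dV/dr (positive means repulsive)."""
    if r >= R_MIN_FACTOR * sigma:
        return 0.0
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
```

At r = σ the potential equals ε, and both functions approach zero as r approaches r<sub>m</sub> from below, so no discontinuity is introduced by the truncation.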

The size of the MD box is L = 84.9 in reduced LJ units, which leads to a density ϱ<sub>f</sub> = 0.75, on average. The confining potential V<sub>wall</sub> is constructed as the result of a (2D) LJ fluid with continuous constant density ρ<sub>w</sub> filling the two half planes above and below the periodically replicated MD boxes and has a 10-4 power dependence with parameters defined in [57]:

$$\begin{aligned} V\_{wall}(z) &= U(z) - U(z\_m), \quad z \leqslant z\_m; \qquad V\_{wall}(z) = 0, \quad z \geqslant z\_m \\ U(z) &= \left(\rho\_w \sigma^2\right) 4\varepsilon \left[ V\_{10}\left(\frac{\sigma}{z}\right)^{10} - V\_4\left(\frac{\sigma}{z}\right)^4 \right] \end{aligned}$$

where z is the distance from the box edge, z<sub>m</sub> is the value at which U(z) has its minimum and V<sub>n</sub> = 2πn!/(2<sup>n+1</sup> [(n/2)!]<sup>2</sup> n), obtaining, in analogy with WCA, a purely repulsive wall.

The thermal gradient along the x-direction is obtained by means of two stochastic reservoirs, which are implemented in the two stripe regions at the x-extremities of the MD box (see Panel (A) of Figure 5). The velocities of each particle located in these two stripes are sampled from a 2D Maxwellian distribution f(v) = exp[−mv<sup>2</sup>/(2k<sub>B</sub>T<sub>i</sub>)]/(2πmk<sub>B</sub>T<sub>i</sub>) at the temperature T<sub>i</sub> of the stripe, with i = 1, 2 labeling the two reservoirs. Periodic boundary conditions along the x-direction mean that a particle can actually travel from the first (cold) to the second (hot) reservoir. To prevent a non-thermalized particle in the inner region from interacting with both reservoirs, a stripe thickness Δx<sub>T</sub> = 1.68 > r<sub>m</sub> was chosen. For the system in stationary conditions, each reservoir contains around 100 particles on average. The reservoir temperatures were chosen to be T<sub>1</sub> = 1.5 and T<sub>2</sub> = 9.9, corresponding to a thermal gradient ∇T = 0.1.

Using these conditions, in the absence of gravity, a long stationary trajectory was generated and, from it, a set of 1,000 initial conditions at time t<sub>0</sub> was sampled. Time-dependent trajectories were then generated by switching on a gravity field with acceleration g = 0.1 in LJ reduced units (while huge compared with Earth gravity, this is a very small value when compared to the accelerations coming from the interatomic interactions), and suitable properties were averaged at times 0 ≤ s ≤ t.

The behavior of the system was analyzed by coarse graining the MD box into a 15 × 15 mesh of square cells of side ℓ = 5.66, which was used to compute local macroscopic fields. Coarse graining is applied by approximating δ(x − R<sub>i</sub>) = δ(x − X<sub>i</sub>)δ(z − Z<sub>i</sub>) with the value 1/ℓ<sup>2</sup> for particles inside the cell labeled by (j, k) and centered on the mesh point (x<sub>j</sub>, z<sub>k</sub>), and zero otherwise. The velocity field is calculated as an average over the D-NEMD trajectories

$$\vec{v}(x\_j, z\_k; t) = \frac{\langle \ \vec{p}(x\_j, z\_k; t) \ \rangle\_{\rho\_0}}{\varrho\_m(x\_j, z\_k; t)} \tag{28}$$

where ϱ<sub>m</sub>, the mass density, is given in terms of the average of the number of particles n<sub>jk</sub>(t) inside the cell (j, k) at time t

$$\varrho\_m(x\_j, z\_k; t) = m \langle \ n\_{jk}(t) \ \rangle\_{\rho\_0} \tag{29}$$
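As an illustration of Equations (28) and (29), the following is a minimal sketch of the cell-wise ratio of the ensemble-summed momentum to the ensemble-summed mass. The function name `coarse_grained_velocity` and the snapshot layout (a list of `(x, z, vx, vz)` tuples per trajectory) are assumptions made for this example, not part of the original work. The common factors 1/ℓ<sup>2</sup> and 1/(number of trajectories) cancel in the ratio and are therefore omitted.

```python
def coarse_grained_velocity(trajectories, mass, box, n_cells):
    """Return field[j][k] = <p_jk> / (m <n_jk>) at a single time t.

    trajectories: one snapshot per D-NEMD trajectory, all taken at the same
                  time t; each snapshot is a list of (x, z, vx, vz) tuples.
    box:          side of the square MD box; n_cells: cells per side.
    """
    cell = box / n_cells
    # Accumulators for the summed momentum and particle count in each cell.
    px = [[0.0] * n_cells for _ in range(n_cells)]
    pz = [[0.0] * n_cells for _ in range(n_cells)]
    count = [[0] * n_cells for _ in range(n_cells)]
    for snapshot in trajectories:
        for x, z, vx, vz in snapshot:
            # Clamp so that a particle exactly on the upper edge stays in range.
            j = min(int(x / cell), n_cells - 1)
            k = min(int(z / cell), n_cells - 1)
            px[j][k] += mass * vx
            pz[j][k] += mass * vz
            count[j][k] += 1
    field = [[(0.0, 0.0)] * n_cells for _ in range(n_cells)]
    for j in range(n_cells):
        for k in range(n_cells):
            if count[j][k] > 0:  # empty cells keep a zero velocity
                rho = mass * count[j][k]
                field[j][k] = (px[j][k] / rho, pz[j][k] / rho)
    return field
```

With the mesh quoted in the text, one would call it with `n_cells = 15` and `box = 15 * 5.66`.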

Figure 5. The simulation setup (Panel A) for sampling the initial distribution showing the regions where the confining field ψ(z) and the two temperature reservoirs act on the particles. In Panel (B) the average evolution of the circulation of the velocity field is shown after averaging over 200 independent initial conditions (in the inset, we show the path along which the circulation was calculated).

The collective behavior of the velocity field can be monitored by calculating its circulation C(t) = ∮<sub>P</sub> v(x(s), z(s); t) · ds on a closed path P, as a function of time. Its evolution after the gravity field is switched on is shown in Panel (B) of Figure 5, for a path located in the bulk region, but far enough from the center of the box. The circulation starts from zero at t<sub>0</sub> = 0; its value grows with time and, after a small overshoot, reaches its stationary plateau value at t ≈ 200. This is also the time at which both the temperature and density fields become stationary. The transient is characterized by correlated oscillations of temperature and density with a very similar period τ = 18 in all cells, but with opposite phases for the cells at the bottom of the box with respect to the ones at the top. The velocity field, shown in Figure 6, is initially null (Panel (A)). At t ≈ 4.5 (Panel (B)), it first acquires an almost uniform downward component in the direction of the force, as a consequence of the switching on of gravity; it is then almost null again at t ≈ 9.0, shows a maximum reaction to compression at t ≈ 13.5 (Panel (C)) and almost vanishes again at t ≈ 18. The cycle then restarts: at t ≈ 22.5 (Panel (D)), the field is again predominantly in the downward direction, although one can start to see the building up of a convective flow, which is shown in its stationary condition at t ≈ 205 (Panel (E)).
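The circulation of a velocity field known only on the mesh can be approximated by a discrete line integral. The sketch below is one possible discretization, not the one used in the original work: the path is assumed to be a closed polygon through cell centres, and the trapezoidal rule is applied on each segment. The name `circulation` and the `field[j][k] = (vx, vz)` layout are hypothetical.

```python
def circulation(field, path, cell):
    """Approximate the line integral of v . ds along a closed path.

    field[j][k] is the (vx, vz) pair of cell (j, k); path is a list of (j, k)
    cell indices, the last point being implicitly joined to the first;
    cell is the side length of a mesh cell.
    """
    total = 0.0
    for (j0, k0), (j1, k1) in zip(path, path[1:] + path[:1]):
        dx, dz = (j1 - j0) * cell, (k1 - k0) * cell
        # Trapezoidal rule: average the velocity at the two segment ends.
        vx = 0.5 * (field[j0][k0][0] + field[j1][k1][0])
        vz = 0.5 * (field[j0][k0][1] + field[j1][k1][1])
        total += vx * dx + vz * dz
    return total
```

A uniform flow gives zero circulation on any closed path, while a rigid rotation with angular velocity ω gives 2ω times the enclosed area, which makes these two cases convenient sanity checks.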

We have seen how D-NEMD can be used to illustrate the build up of a convective roll when a gravity field is instantaneously switched on in a system where a stationary (non-equilibrium) thermal gradient was already present. This is not the only case in which a convective roll can be observed. Indeed, keeping the same geometry for the system, *i.e.*, with the gravity field orthogonal to the thermal gradient (Panel A of Figure 5), one could alternatively start from initial conditions in which the system is at equilibrium in the presence of the gravity field and, then, follow the dynamics when the thermal gradient is instantaneously switched on or even start from a homogeneous fluid at equilibrium and instantaneously switch on both the gravity field and the thermal gradient [57]. In all these cases, although following different paths, the system eventually reaches the same (macroscopic) final steady state with the formation of a clockwise rotating convective roll, centered at the center of the box.

Figure 6. The build up of the convective flow is shown by visualizing the local velocity field averaged over 1,000 independent initial configurations as a function of time: (A) t = 0; (B) t ≈ 4.5; (C) t ≈ 13.5; (D) t ≈ 22.5; (E) t ≈ 205.

Complications arise if the system is set up with a different geometry, e.g., in the case of Rayleigh–Bénard convection, when the direction of the thermal gradient is parallel to the direction of the gravity field. The system has a higher symmetry and rolls rotating both in the counterclockwise and clockwise directions are possible. This implies that the D-NEMD averages cannot be carried out directly, as in the case we have described so far. In fact, now, in the ensemble of individual trajectories, one samples with equal probability the initial conditions leading to clockwise or to counterclockwise rolls. Performing ensemble averages without paying attention to the direction of rotation would give a wrong result. One needs either to enforce a mechanism that breaks this symmetry or to weight trajectories differently, according to the direction of rotation of the convective roll. The latter was the choice applied in [57], where it was simply impossible to fix *a priori*, by tweaking the initial conditions, the final rotation direction of each trajectory. Instead, the direction of the convective roll rotation was analyzed in the steady-state part of each trajectory by time averaging the velocity field over the last 10,000 steps. The ensemble averages were consistently computed, afterwards, by taking the specular image of the fields when the roll rotation was opposite to the one (arbitrarily) chosen as the reference.
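The symmetry-aware averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the fields live on a square n × n mesh, the mirror plane is taken to be the vertical mid-plane (x → −x), and the rotation direction is read from the sign of a discrete angular momentum of the steady-state field. The helper names `rotation_sign`, `mirror_x` and `symmetrised_average` are hypothetical and do not come from [57].

```python
def rotation_sign(field):
    """Sign of the discrete angular momentum of the field about the mesh centre."""
    n = len(field)
    c = 0.5 * (n - 1)
    lz = sum((j - c) * field[j][k][1] - (k - c) * field[j][k][0]
             for j in range(n) for k in range(n))
    return 1 if lz >= 0 else -1


def mirror_x(field):
    """Specular image under x -> -x: reverse the j index and flip the sign of vx."""
    n = len(field)
    return [[(-field[n - 1 - j][k][0], field[n - 1 - j][k][1])
             for k in range(n)] for j in range(n)]


def symmetrised_average(fields, steady_fields, reference=1):
    """Average fields over trajectories after aligning their roll direction.

    fields[i]:        velocity field of trajectory i at the time of interest.
    steady_fields[i]: time-averaged steady-state field of the same trajectory,
                      used only to decide its direction of rotation.
    """
    aligned = [f if rotation_sign(s) == reference else mirror_x(f)
               for f, s in zip(fields, steady_fields)]
    n, m = len(aligned[0]), len(aligned)
    return [[(sum(f[j][k][0] for f in aligned) / m,
              sum(f[j][k][1] for f in aligned) / m)
             for k in range(n)] for j in range(n)]
```

For two trajectories with opposite rolls, a naive average cancels the flow, whereas the aligned average recovers the roll with the reference orientation.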

### *3.3. Sampling from Conditional Distributions at Time* t<sub>0</sub>

Probably the most interesting application of the D-NEMD procedure is when the non-equilibrium dynamical trajectories start from states corresponding to a very unlikely fluctuation and we want to follow dynamically the way in which the system relaxes back to equilibrium. An efficient sampling of the points in phase space representing the initial condition cannot be achieved just by waiting long enough for the desired event to occur during a standard MD trajectory; more advanced methods are required to enhance the sampling. Whenever the conditions can be described using an "order parameter", *i.e.*, an appropriate phase space function or field, one can define a macroscopic constraint that, applied to the system, will allow one to explore the interesting, but unlikely, region of phase space. A viable method in many cases is the well-known Blue Moon approach [15,59], where the conditional probability density is constructed by augmenting the dynamical system with a set of holonomic constraints that force the system to explore states on the specific hypersurface of interest in phase space. The points sampled on a dynamical trajectory subject to constraints, however, cannot be directly used to start the time-dependent non-equilibrium trajectories, because of the unphysical additional conditions enforced on velocities to keep the dynamics on the constrained hypersurface. In the Blue Moon approach, such caveats are overcome by the procedure outlined in [15], which requires, first, an apt resampling of the velocities and, then, an appropriate reweighting when computing the time-dependent averages. The Blue Moon approach was initially devised and successfully applied to the calculation of rate constants for activated processes. 
In particular, the rate constants are defined in terms of the product of two terms: a transition state theory term and a "transmission coefficient" (*i.e.*, the plateau value, in the intermediate time scale, reached following the transient behavior of the time-dependent values of the reaction coordinate(s) [14,60]). The calculation of the transmission coefficient is a task that can be accomplished exactly along the same lines of the proposed D-NEMD approach.

As a more advanced illustration of D-NEMD when sampling from a conditional probability density, we describe the case of the hydrodynamic relaxation to equilibrium of the interface between two immiscible liquids [61]. The relaxation can be described by following the time evolution of the difference of the density fields of the two species A and B, Δϱ(x;t) = ϱ<sub>A</sub>(x;t) − ϱ<sub>B</sub>(x;t), and the associated velocity field v(x;t). The distribution at time t<sub>0</sub> corresponds to the stationary conditions of the system subject to a macroscopic restraint that forces a non-equilibrium geometry for the interface. This requires the implementation of a method, like the Blue Moon one, that allows one to sample the conditional probability density associated with the constraint. However, using the Blue Moon approach for vector or field-like constraints can become considerably cumbersome and rather inconvenient in practice, especially for molecular systems where constraints are already used in the force field to impose molecular geometries. A much more practical alternative is to use restrained MD, where one substitutes the constraint with an equivalent restraining potential in terms of an additional coupling parameter, asymptotically reproducing unbiased constrained conditions. Let us summarize the restrained MD approach for the case in which the constraint is imposed on a field-like observable, as for the density difference Δϱ(x, t). Constraining the shape of the interface S corresponds to specifying the set of spatial points {x<sub>S</sub>} where Δϱ(x<sub>S</sub>, t) = 0.

According to Irving and Kirkwood [24], we associate with this macroscopic field a microscopic observable

$$\Delta\hat{\varrho}(\vec{x},\vec{\Gamma}) = \sum\_{j=1}^{N\_A} m\_j^A \delta(\vec{x} - \vec{R}\_j^A) - \sum\_{j=1}^{N\_B} m\_j^B \delta(\vec{x} - \vec{R}\_j^B) \tag{30}$$

on which we need to impose the condition Δϱ(x, t<sub>0</sub>) = Δϱ̃(x) = 0 on all points {x} in the domain corresponding to the desired geometrical surface at the time t<sub>0</sub>. However, in the numerical approach, one cannot deal directly with a continuous (vector) variable x; therefore, the volume available to the system needs to be discretized over a mesh. With the choice of subdividing the volume in elementary cubic cells, one introduces the space discretization {x<sub>α</sub>, α = 1, 2, ...}, where the reference point x<sub>α</sub> coincides with the center of the α-th cell and the microscopic observable field at this point x<sub>α</sub> is defined as the average F̂ of Δϱ̂ over the volume Ω<sub>α</sub> of the α-th cell:

$$\hat{F}(\vec{x}\_{\alpha},\vec{\Gamma}) = \frac{1}{\Omega\_{\alpha}} \int\_{\Omega\_{\alpha}} d^3x \left[ \sum\_{j=1}^{N\_A} m\_j^A \delta(\vec{x} - \vec{R}\_j^A) - \sum\_{j=1}^{N\_B} m\_j^B \delta(\vec{x} - \vec{R}\_j^B) \right], \quad \alpha = 1, 2, \dots \tag{31}$$

on which we now need to impose the condition F̂(x<sub>α</sub>, Γ) = F̃(x<sub>α</sub>) = 0 at each of the m points {x<sub>α</sub>, α = 1, 2, ..., m}, which correspond to the subset of cells that make up the discretized representation of the chosen interface between the two immiscible liquids.

Consider, now, a system described by the Hamiltonian

$$\mathcal{H}\_k(\vec{\Gamma}) = \mathcal{H}(\vec{\Gamma}) + \frac{k}{2} \sum\_{\alpha=1}^m \left[ \hat{F}(\vec{x}\_{\alpha}, \vec{\Gamma}) - \tilde{F}(\vec{x}\_{\alpha}) \right]^2 \tag{32}$$

where ℋ(Γ) is the Hamiltonian of the unconstrained system and k is a tunable parameter that defines the strength of the (harmonic) restraining potential, *i.e.*, the last term on the right-hand side of Equation (32). This Hamiltonian can be used to drive either an MC simulation or an MD simulation at a fixed temperature T generating trajectories, stochastic or dynamic, which sample the phase space of the system according to the canonical probability density ρ<sub>0</sub><sup>(k)</sup>(Γ) at time t<sub>0</sub>

$$\rho\_0^{(k)}(\vec{\Gamma}) = \frac{e^{-\beta \mathcal{H}\_k(\vec{\Gamma})}}{\mathcal{Z}\_0^{(k)}} = \frac{e^{-\beta \mathcal{H}(\vec{\Gamma})} \prod\_{\alpha=1}^m e^{-\frac{\beta k}{2} \left[\hat{F}(\vec{x}\_\alpha, \vec{\Gamma}) - \tilde{F}(\vec{x}\_\alpha)\right]^2}}{\mathcal{Z}} \frac{1}{P^{(k)}(\tilde{F})} \equiv \frac{P^{(k)}(\vec{\Gamma}, \tilde{F})}{P^{(k)}(\tilde{F})} = P^{(k)}(\vec{\Gamma}|\tilde{F})\tag{33}$$

where 𝒵<sub>0</sub><sup>(k)</sup> = ∫ d<sup>2n</sup>Γ e<sup>−βℋ<sub>k</sub>(Γ)</sup> and 𝒵 = ∫ d<sup>2n</sup>Γ e<sup>−βℋ(Γ)</sup> are the canonical partition functions and

$$P^{(k)}(\tilde{F}(\vec{x}\_1), \tilde{F}(\vec{x}\_2), \dots \tilde{F}(\vec{x}\_m)) = \frac{1}{\mathcal{Z}} \int d^{2n} \Gamma \left\{ e^{-\beta \mathcal{H}(\vec{\Gamma})} \prod\_{\alpha=1}^m e^{-\frac{\beta k}{2} \left[ \hat{F}(\vec{x}\_\alpha, \vec{\Gamma}) - \tilde{F}(\vec{x}\_\alpha) \right]^2} \right\} \tag{34}$$

We see, then, more explicitly, that, thanks to the restraint potential, at a given value k of the tunable coupling parameter, we are sampling the conditional probability density of Γ given F̃, whose limit for k → ∞ is just the ensemble associated with that given fluctuation.
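To make the role of the coupling parameter concrete, the following minimal sketch evaluates the harmonic restraint term of Equation (32) and the Boltzmann factor by which it multiplies the unrestrained weight e<sup>−βℋ</sup> in Equation (33). The function names are hypothetical, and the cell values of F̂ are assumed to be supplied as a plain list.

```python
import math


def restraint_energy(f_hat, f_target, k):
    """Harmonic restraint (k/2) * sum_alpha [F_hat(x_alpha) - F_target(x_alpha)]^2."""
    return 0.5 * k * sum((f - t) ** 2 for f, t in zip(f_hat, f_target))


def restraint_weight(f_hat, f_target, k, beta):
    """Factor exp(-beta * restraint) that multiplies exp(-beta * H) in Eq. (33)."""
    return math.exp(-beta * restraint_energy(f_hat, f_target, k))
```

A configuration that satisfies the condition exactly keeps a weight of one for any k, while any deviation is suppressed more strongly as βk grows, which is the mechanism behind the k → ∞ limit discussed in the text.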

The idea of using a biasing potential to sample unlikely points in configuration space was pioneered by Torrie and Valleau for MC simulation [62], then presented in this form in [63]. However, while in their case of *umbrella/window sampling*, the bias is tuned in such a way to sample, in a statistically significant manner, a wider portion of the configuration space, in the restrained MD approach, one considers high enough values of the tunable parameter k with the aim of sampling the conditional probability associated with the portion of phase space representing a rare region of interest. Indeed, using that, for a → ∞, exp[−(a/2)(y − ỹ)<sup>2</sup>] → √(2π/a) δ(y − ỹ), one recovers, in the limit βk → ∞, the joint probability density

$$\lim\_{\beta k \to \infty} \rho\_0^{(k)}(\vec{\Gamma}) \quad = \ \rho\_0(\vec{\Gamma}|\{\tilde{F}\}) = \frac{e^{-\beta \mathcal{H}(\vec{\Gamma})}}{\mathcal{Z}} \prod\_{\alpha=1}^m \delta\left(\hat{F}(\vec{x}\_{\alpha}, \vec{\Gamma}) - \tilde{F}(\vec{x}\_{\alpha})\right) / \mathcal{P}(\{\tilde{F}\}) \tag{35}$$

where, in the normalizing factor, the probability density of the "condition" {F˜} is given by

$$\mathcal{P}(\{\tilde{F}\}) \equiv \mathcal{P}\left(F(\vec{x}\_1) = \tilde{F}(\vec{x}\_1), F(\vec{x}\_2) = \tilde{F}(\vec{x}\_2), \dots, F(\vec{x}\_m) = \tilde{F}(\vec{x}\_m)\right) \tag{36}$$

$$=\frac{1}{\mathcal{Z}}\int d^{2n}\Gamma\left\{e^{-\beta\mathcal{H}(\vec{\Gamma})}\prod\_{\alpha=1}^{m}\delta\left(\hat{F}(\vec{x}\_{\alpha},\vec{\Gamma})-\tilde{F}(\vec{x}\_{\alpha})\right)\right\}\tag{37}$$

The choice of a restraining potential that depends only on the coordinates of the particles, as in this case, does not influence the probability density in the momentum space, which remains the Maxwellian (equilibrium) distribution. Therefore, at variance with the Blue Moon approach, independent points along the stationary restrained MD trajectory can be directly taken as initial configurations representative of the probability density at time t<sub>0</sub>. Moreover, if needed, the restrained MD approach can be further generalized to enforce a more general macroscopic constraint affecting also the momenta of the particles, for example, coupling it with the *ad hoc* boundary conditions and the localized velocity sampling described in Section 3.2 to impose a non-uniform macroscopic velocity field or a temperature gradient in the system.

The definition of the microscopic field in Equation (31) has still one important drawback, which can prompt major issues in particular conditions. In fact, because of the presence of δ-functions in the definition, as a particle crosses the border between one cell and the neighboring one, the integrals that define the macroscopic field at the two corresponding points in space change by, plus or minus, respectively, a finite value, introducing discontinuities in the restraining potential in Equation (32). This results in the (highly undesirable) appearance of impulsive terms in the forces on the atoms. Consider the δ(x − R<sub>j</sub>) function contribution to the integral in Equation (31) for a specific particle j. This is given by the product of three terms, corresponding to the orthogonal directions in space, where each of them is the difference of the values of the cumulative distribution at the edges of the α-th cell. The cumulative distribution for the delta function δ(ξ) is the step function θ(ξ), {θ(ξ < 0) = 0, θ(ξ > 0) = 1}, so that each of the above three terms can only be either zero or one. In order to smooth the restraining potential, we need to replace the step function with a smoother function, like a sigmoid, resulting in a continuous variation, between zero and one, of each term in the product. This is equivalent to giving a finite extension to the particle size, resulting in the possibility that a particle contributes, fractionally, to the density field of more than one cell at the same time. In this way, the restraining potential changes smoothly with the motion of the particles in time, without discontinuities when particles cross the cell borders.
One possible choice for such a function is given by the error function, which corresponds to replacing the delta function by the equivalent Gaussian, e<sup>−ξ<sup>2</sup>/(2a)</sup>/√(2πa) → δ(ξ) in the limit a → 0, where the parameter a gives the order of magnitude for the (1D) size of the particle. Within such an approximation, Equation (31) becomes

$$\hat{F}(\vec{x}\_{\alpha}, \vec{\Gamma}) = \frac{1}{\Omega\_{\alpha}} \left[ \sum\_{j=1}^{N\_A} m\_j^A \Theta(a, \vec{x}\_{\alpha}, \vec{R}\_j^A) - \sum\_{j=1}^{N\_B} m\_j^B \Theta(a, \vec{x}\_{\alpha}, \vec{R}\_j^B) \right], \quad \alpha = 1, 2, \dots \tag{38}$$

where the function Θ(a, x<sub>α</sub>, R<sub>j</sub>) is the product of three terms corresponding to the integrals along the three spatial components (x<sup>1</sup> ≡ x, x<sup>2</sup> ≡ y and x<sup>3</sup> ≡ z), each involving the evaluation of two values of the error function relative to the border of the cubic cell of side length ℓ:

$$\Theta(a,\vec{x}\_{\alpha},\vec{R}\_{j}) = \prod\_{\nu=1}^{3} \left[ \text{erf}\left(\frac{x\_{\alpha}^{\nu} + \ell/2 - R\_{j}^{\nu}}{\sqrt{a}}\right) - \text{erf}\left(\frac{x\_{\alpha}^{\nu} - \ell/2 - R\_{j}^{\nu}}{\sqrt{a}}\right) \right] \tag{39}$$
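The smoothed cell weight can be sketched numerically as follows. This is a minimal illustration, assuming the standard Gaussian cumulative distribution: the prefactor 1/2 per direction and the scaling by √(2a) are choices made here so that the weights of one particle sum to one over all cells, and they differ from Equation (39) as printed by a constant factor and a rescaling of a. The names `theta_1d` and `theta` are hypothetical.

```python
import math


def theta_1d(center, ell, r, a):
    """Fraction of a Gaussian of variance a centred at r that falls inside
    the interval [center - ell/2, center + ell/2]."""
    s = math.sqrt(2.0 * a)
    return 0.5 * (math.erf((center + 0.5 * ell - r) / s)
                  - math.erf((center - 0.5 * ell - r) / s))


def theta(center, ell, position, a):
    """Product of the 1D fractions over the three Cartesian components."""
    w = 1.0
    for c, r in zip(center, position):
        w *= theta_1d(c, ell, r, a)
    return w
```

A particle sitting exactly on the face shared by two cells contributes about one half to each of them, and the contribution varies continuously as the particle crosses the face, so the restraining forces contain no impulsive terms.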

The two fluids, A and B, are modeled using identical Lennard–Jones particles with mass m and (unique) parameters σ = σ<sub>AA</sub> = σ<sub>BB</sub> = σ<sub>AB</sub> and ε = ε<sub>AA</sub> = ε<sub>BB</sub> = ε<sub>AB</sub> for the LJ potential. The immiscibility is obtained by removing the attractive term for the pair interactions between a particle of type A and a particle of type B, keeping only the purely repulsive part, *i.e.*, taking u(r = |R<sup>A</sup> − R<sup>B</sup>|) = 4ε[σ/r]<sup>12</sup>. The simulation was performed at a fixed temperature k<sub>B</sub>T = 1.5ε on a system totaling 171,500 particles, of which 88,889 belong to Fluid A and 82,611 to Fluid B, in a rectangular parallelepiped box with the same width and height w = h ≈ 44 and double length d ≈ 88, corresponding to an average particle density n = 1.024, where all figures are in reduced LJ units. The density and temperature are in the fluid region of the phase space of a pure LJ fluid. In order to follow the behavior of the density and velocity fields, the space was discretized using 5,488 cubic cells arranged on a 14 × 14 × 28 grid. In this way, local field values are obtained by averaging over roughly 30 particles in each cell. For the ensemble at time t<sub>0</sub>, the initial configuration for the interface between Fluid A and Fluid B is defined by selecting the m cells (centered on the mesh points, x<sub>α</sub>, α = 1, 2, ..., m), which are cut across by the ideal cylindrical surface, S̃,

$$\tilde{S} = \left\{ \vec{x} : z = \mathcal{A} \sin \left( \frac{\pi x}{w} \right) + \frac{h - \mathcal{A}}{2}, \quad 0 \le x \le w \; , \quad 0 \le y \le h \right\} \tag{40}$$

where 𝒜 = 50 is the amplitude that determines the curvature of the surface, which is approximately placed halfway along the z-direction in the simulation box. The restraint potential is completely defined by the choice of the coupling parameter k = 0.004 in LJ units and the imposed values F̃(x<sub>α</sub>) = 0, α = 1, 2, ..., m. The initial configuration was first prepared by equilibrating a pure Type A fluid at the target temperature and density and then labeling, within the simulation box, as particles of Type A all the particles that are on the red side of the surface S̃ and as particles of Type B all the particles that are on the blue side (see the left panel in Figure 7), taking care to have exactly half of the particles of Type A and half of Type B in the m cells that make up the discretized interface at time t<sub>0</sub>. Periodic boundary conditions are applied in all directions, so that a second flat interface (the condition that minimizes the surface tension) is created at the same time at the sides of the box along the z-direction. Then, the system was equilibrated running restrained MD with a time step δt = 4.56 · 10<sup>−4</sup> in LJ units. Such a rather small value for δt was used to ensure a proper numerical integration of the "stiff" restraining forces. A typical snapshot of the isosurface Δϱ(x) = 0 is shown in the left panel of Figure 7.

A long restrained MD trajectory of 10<sup>6</sup> steps was then carried out, with that same time step, saving the configuration of the system in phase space at regular intervals of 25,000 steps. A set of 40 independent initial conditions was collected in this way, and from each of them was started, now with a regular time step δt = 4.56 · 10<sup>−3</sup>, a 25,000-step unrestrained MD trajectory at constant energy, *i.e.*, using the equations of motion derived only from the unrestrained Hamiltonian ℋ(Γ) appearing in Equation (32). The D-NEMD averaging procedure was used with those 40 trajectories to compute the time-dependent behavior of the macroscopic density and velocity fields, discretized on the previously mentioned cubic mesh.

In the right panel of Figure 7, the dynamical behavior of the interface is shown by plotting the isosurfaces Δϱ(x) = 0 for four successive times.

One can see that the interface curvature diminishes progressively towards the flat, equilibrium condition, while maintaining (approximately) both the initial uniformity along the direction of the y-axis and the initial mirror symmetry with respect to the middle yz-plane. Small deviations are present, as expected, considering that this macroscopic field results from averaging over a relatively small sample of 40 independent trajectories. They are compatible with the expected amplitudes of the equilibrium fluctuations of the interface. Relaxation of the initially curved surface reaches the flat equilibrium condition fully in a time lapse of approximately 20,000 steps, *i.e.*, something just short of 100 LJ time units. One can use this information to estimate the order of the relaxation time and, from it, a maximum value v<sub>max</sub> ≈ 0.5 in LJ units for the average velocity field at the mid-point along x of the interface, *i.e.*, in the region corresponding to the maximum displacement at time t<sub>0</sub> of the interface. If one takes the LJ parameters of argon (for which σ = 3.405 · 10<sup>−10</sup> m and the unit of time corresponds to τ = 2.156 · 10<sup>−12</sup> s), this value translates to an experimentally convincing velocity of ≈ 80 m · s<sup>−1</sup>.
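The arithmetic behind these estimates can be checked with a few lines. The values are those quoted in the text; the variable names are chosen here for illustration only.

```python
sigma = 3.405e-10   # m, LJ length unit for argon (value quoted in the text)
tau = 2.156e-12     # s, LJ time unit for argon (value quoted in the text)
dt = 4.56e-3        # LJ time step of the unrestrained trajectories
n_steps = 20_000    # approximate number of steps needed to flatten the interface
v_max_lj = 0.5      # estimated maximum velocity in LJ units

# Relaxation time in LJ units: number of steps times the time step.
relaxation_time_lj = n_steps * dt

# Velocity in SI units: one LJ velocity unit is sigma / tau.
v_max_si = v_max_lj * sigma / tau

print(f"relaxation time = {relaxation_time_lj:.1f} LJ units")
print(f"v_max = {v_max_si:.0f} m/s")
```

The relaxation time comes out at about 91 LJ time units, consistent with "just short of 100", and the velocity at about 79 m · s<sup>−1</sup>, consistent with the rounded value of 80 m · s<sup>−1</sup>.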

Figure 7. (Left panel) A sampled initial condition for the S isosurface Δϱ(x) = 0 separating the two liquids, A and B (the second planar interface at the long edge of the simulation box is not shown). (Right panel) The evolution in time of the initially curved interface (purple) towards the relaxed planar condition (green). The snapshots are the D-NEMD results averaged over a sample of 40 initial conditions.

The second interface on the sides of the MD box remains flat, with even smaller deviations, all along the 25,000 time steps. There seem to be no significant effects on it as a result of the relaxation process, which takes place in the middle of the box. The reason for this will become evident after looking at the time-dependent behavior of the velocity field. We have shown, in fact, how the D-NEMD approach provides very detailed information on the hydrodynamic behavior of the system and unravels the underlying physical mechanisms.

Starting from Equation (20), the discretized velocity field is calculated according to:

$$\vec{v}(\vec{x}\_\alpha,t) = \frac{\left<\sum\_{j=1}^{N\_A} \vec{P}\_j^A(t) \Theta(a, \vec{x}\_\alpha, \vec{R}\_j^A(t)) + \sum\_{j=1}^{N\_B} \vec{P}\_j^B(t) \Theta(a, \vec{x}\_\alpha, \vec{R}\_j^B(t))\right>\_{\rho\_0}}{\left<\sum\_{j=1}^{N\_A} m\Theta(a, \vec{x}\_\alpha, \vec{R}\_j^A(t)) + \sum\_{j=1}^{N\_B} m\Theta(a, \vec{x}\_\alpha, \vec{R}\_j^B(t))\right>\_{\rho\_0}} \tag{41}$$

where we made explicit use of the fact that the particles have identical masses m, regardless of whether they are of Type A or B, and the D-NEMD average is taken over the 40 trajectories with initial conditions sampled from the ρ<sub>0</sub> probability density along the restrained MD trajectory. The results of the calculation are shown in the left panels of Figure 8. The direction of the projections of the velocity field in each cell on the xz-plane (after averaging along the translational symmetry axis y) is represented by a small arrow whose length is proportional to its modulus.

Figure 8. The behavior of the velocity field in the two liquid regions as a function of time: comparison of the results obtained using the D-NEMD approach averaging over 40 initial conditions (Panels A–C) and the local time averaging procedure (Panels D–F). The results emphasize how the latter approach returns a much flatter picture of the velocity field, exposing features of the relaxation mechanism that are in marked contrast with the underlying symmetries of the process. (reproduced from [61] with permission from the Physical Chemistry Chemical Physics Owner Societies.)

Snapshots at three successive times are shown. In the snapshot taken 500 time steps after t<sub>0</sub>, one can notice the build up of some coherence in the velocity field, which becomes structured in a region that is, along the z-direction, about twice the size of the width of the curved interface, the projection of which is represented by the contour line at the value Δϱ = 0. For the sake of clarity, the yz symmetry plane is also highlighted by a dashed straight line. One can distinguish a quasi-symmetric two-tail profile, where the push towards the edges of the interface appears to be more pronounced than the one, in the opposite direction, in the region near the center of the interface (Panel A). This behavior marks the initial build up of a more stable two-roll velocity profile, of roughly the same width across the interface region in the A and B fluids, which becomes very evident after 3,750 steps (Panel B) and is still neatly visible and qualitatively unchanged, even after 7,500 steps from t<sub>0</sub>, when the curvature of the interface is significantly reduced (Panel C).

One can notice that the size and the intensity of the field have decreased, but the profile remains highly symmetrical along the mirror symmetry plane. All along, one can notice also that the field in the extreme sides of the simulation box remains essentially unperturbed, which explains why the second flat interface at the boundary is not affected in any significant way and remains stable during the whole relaxation process of the curved interface. It is very interesting to compare these insights on the hydrodynamic processes underlying the interface relaxation, as given by the time-dependent behavior of the dynamical response calculated using the D-NEMD procedure, with the standard approach used on single trajectory simulations of hydrodynamic processes. Starting from the local equilibrium hypothesis of hydrodynamic theory, based on the assumption of a time scale separation between the fast microscopic motion of the particles and the slower hydrodynamic processes, the macroscopic fields are computed as local time averages on the short time scale τ of the atomistic processes [64].

By applying this approach to the time evolution from a single initial condition, one obtains the results shown in the right panels of Figure 8. At first glance, the velocity field appears to be much smoother than the D-NEMD result, which is relatively noisy due to the limited size (40) of the sample of initial conditions used. However, the picture that is returned is quite different: the physical mechanism exposed by the D-NEMD procedure is effectively washed out by the local time averaging. The latter also yields a velocity field that seems to violate the mirror plane symmetry initially imposed on the system at time t<sub>0</sub>, contrary to the more convincing evidence, given by the D-NEMD results, of a relaxation mechanism satisfying, on average, such symmetry at all times along the dynamical trajectory.

#### 4. Conclusions and Perspectives

In this paper, we have presented a dynamical approach to non-equilibrium MD, which makes it possible to compute, numerically, but, otherwise, rigorously, time-dependent non-equilibrium responses, *i.e.*, to observe directly transient responses in non-stationary regimes. We have shown that using a proper simulation setup, it is possible to go beyond the usual situation of initial equilibrium conditions to treat interesting cases in which the initial condition is either a stationary non-equilibrium or a constrained equilibrium condition dictated by means of a macroscopic constraint, which can be expressed, in a general way, either as a scalar or field-like observable, outlining also the connections of the D-NEMD approach to the Blue Moon method [15] to compute the transmission coefficient contribution to the rate constants of activated processes.

We illustrated a few applications of the method starting from the early, historical, approach to the calculation of transport properties along the lines of Linear Response Theory and beyond, to a couple of recent atomistic simulations of hydrodynamic processes: the establishing of a convective cell, when gravity is switched on in the presence of a stationary thermal gradient, and the relaxation of an initially curved interface between two immiscible liquids. We have shown that the method generates rigorous time-dependent non-equilibrium averages, providing valuable insights on the mechanisms of hydrodynamic processes that can be missed using a method like the local time average, which cannot have a rigorous justification, presents a statistical error that cannot be reduced at will and, finally, as we have seen above, can bias the statistical response. A word of caution is needed, though. The time-dependent ensemble averages are meaningful only if the thermodynamical response is unique [65]. Whenever this is not the case, the meaning of the statistical averages becomes questionable. To our knowledge, in these cases a systematic answer does not exist for the non-equilibrium thermodynamic response and problems have to be treated on a one-by-one basis.

In summary, with the outlined exceptions, D-NEMD is a method ready for challenging applications, by which it is possible to study complex time-dependent phenomena using only the fundamental laws of Statistical Mechanics, *i.e.*, without using empirical approaches as, for example, in the case of continuum hydrodynamic theories. Work is in progress in this direction.

#### Acknowledgments

We acknowledge the Science Foundation Ireland SFI Grant No. 08-IN.1-I1869 and the Istituto Italiano di Tecnologia under the SEED Project grant No. 259 SIMBEDD-Advanced Computational Methods for Biophysics, Drug Design and Energy Research for financial support.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Warren, P.B.; Allen, R.J. Malliavin Weight Sampling: A Practical Guide. *Entropy* 2014, *16*, 221–232.

*Article*

## Malliavin Weight Sampling: A Practical Guide

Patrick B. Warren **<sup>1</sup>***,* \* and Rosalind J. Allen **<sup>2</sup>***,* \*


*Received: 25 September 2013; in revised form: 9 October 2013 / Accepted: 18 October 2013 / Published: 27 December 2013*

Abstract: Malliavin weight sampling (MWS) is a stochastic calculus technique for computing the derivatives of averaged system properties with respect to parameters in stochastic simulations, without perturbing the system's dynamics. It applies to systems in or out of equilibrium, in steady state or time-dependent situations, and has applications in the calculation of response coefficients, parameter sensitivities and Jacobian matrices for gradient-based parameter optimisation algorithms. The implementation of MWS has been described in the specific contexts of kinetic Monte Carlo and Brownian dynamics simulation algorithms. Here, we present a general theoretical framework for deriving the appropriate MWS update rule for any stochastic simulation algorithm. We also provide pedagogical information on its practical implementation.

Keywords: stochastic calculus; Brownian dynamics

#### 1. Introduction

Malliavin weight sampling (MWS) is a method for computing derivatives of averaged system properties with respect to parameters in stochastic simulations [1,2]. The method has been used in quantitative financial modelling to obtain the "Greeks" (price sensitivities) [3], and, as the Girsanov transform, in kinetic Monte Carlo simulations for systems biology [4]. Similar ideas have been used to study fluctuation-dissipation relations in supercooled liquids [5]. However, MWS appears to be relatively unknown in the fields of soft matter, chemical and biological physics, perhaps because the theory is relatively impenetrable for non-specialists, being couched in the language of abstract mathematics (e.g., martingales, Girsanov transform, Malliavin calculus, *etc.*); an exception in financial modelling is [6].

MWS works by introducing an auxiliary stochastic quantity, the Malliavin weight, for each parameter of interest. The Malliavin weights are updated alongside the system's usual (unperturbed) dynamics, according to a set of rules. The derivative of any system function, A, with respect to a parameter of interest is then given by the average of the product of A with the relevant Malliavin weight; or in other words, by a weighted average of A, in which the weight function is given by the Malliavin weight. Importantly, MWS works for non-equilibrium situations, such as time-dependent processes or driven steady states. It thus complements existing methods based on equilibrium statistical mechanics, which are widely used in soft matter and chemical physics.

MWS has so far been discussed only in the context of specific simulation algorithms. In this paper, we present a pedagogical and generic approach to the construction of Malliavin weights, which can be applied to any stochastic simulation scheme. We further describe its practical implementation in some detail, using one-dimensional Brownian motion in a force field as our example.

#### 2. The Construction of Malliavin Weights

The rules for the propagation of Malliavin weights have been derived for the kinetic Monte-Carlo algorithm [4,7], for the Metropolis Monte-Carlo scheme [5] and for both underdamped and overdamped Brownian dynamics [8]. Here we present a generic theoretical framework, which encompasses these algorithms and also allows extension to other stochastic simulation schemes.

We suppose that our system evolves in some state space, and a point in this state space is denoted as S. Here, we assume that the state space is continuous, but our approach can easily be translated to discrete or mixed discrete-continuous state spaces. Since the system is stochastic, its state at time t is described by a probability distribution, P(S). In each simulation step, the state of the system changes according to a propagator, W(S → S′), which gives the probability that the system moves from point S to point S′ during an application of the update algorithm. The propagator has the property that

$$P'(S') = \int\_{S} dS \ W(S \to S') \ P(S) \tag{1}$$

where P′(S′) is the probability distribution after the update step has been applied and the integral is over the whole state space. We shall write this in a shorthand notation as

$$P' = \int WP \,. \tag{2}$$

Integrating Equation (1) over S′, we see that the propagator must obey ∫<sub>S′</sub> dS′ W(S → S′) = 1. It is important to note, however, that we do *not* assume the detailed balance condition P<sub>eq</sub>(S) W(S → S′) = P<sub>eq</sub>(S′) W(S′ → S) (for some equilibrium P<sub>eq</sub>(S)). Thus, our results apply to systems whose dynamical rules do not obey detailed balance (such as chemical models of gene regulatory networks [9]), as well as to systems out of steady state. We observe that the (finite) product

$$\mathbb{W}(S\_1, \ldots, S\_n) = W(S\_1 \to S\_2) \times \cdots \times W(S\_{n-1} \to S\_n) \tag{3}$$

is proportional to the probability of occurrence of a trajectory of states, {S<sub>1</sub>, …, S<sub>n</sub>}, and can be interpreted as a *trajectory weight*.

Let us now consider the average of some quantity A(S) over the state space, in shorthand

$$
\langle A \rangle = \int A \, P \, . \tag{4}
$$

The quantity A might well be a complicated function of the state of the system: for example the extent of crystalline order in a particle-based simulation, or a combination of the concentrations of various chemical species in a simulation of a biochemical network. We suppose that we are interested in the sensitivity of ⟨A⟩ to variations in some parameter of the simulation, which we denote as λ. This might be one of the force field parameters (or the temperature) in a particle-based simulation or a rate constant in a kinetic Monte Carlo simulation. We are interested in computing ∂⟨A⟩/∂λ. This quantity can be written as

$$\frac{\partial \langle A \rangle}{\partial \lambda} = \int AP \, Q\_{\lambda} \,, \tag{5}$$

where

$$Q\_{\lambda} = \frac{\partial \ln P}{\partial \lambda} \tag{6}$$

(using the fact that ∂ ln P/∂λ = (1/P)∂P/∂λ).

Let us now suppose that we track in our simulation not only the physical state of the system, but also an auxiliary stochastic variable, which we term q<sub>λ</sub>. At each simulation step q<sub>λ</sub> is updated according to a rule that depends on the system state; this does not perturb the system's dynamics, but merely acts as a "readout". By tracking q<sub>λ</sub>, we *extend* the state space, so that S becomes {S, q<sub>λ</sub>}. We can then define the average ⟨q<sub>λ</sub>⟩<sub>S</sub>, which is an average of the value of q<sub>λ</sub> in the extended state space, with the constraint that the original (physical) state space point is fixed at S (see further below).

Our aim is to define a set of rules for updating q<sub>λ</sub>, such that ⟨q<sub>λ</sub>⟩<sub>S</sub> = Q<sub>λ</sub>, *i.e.*, such that the average of the auxiliary variable, for a particular state space point, measures the *derivative* of the probability distribution with respect to the parameter of interest, λ. If this is the case then, from Equation (5)

$$\frac{\partial \langle A \rangle}{\partial \lambda} = \langle A \, q\_{\lambda} \rangle \,. \tag{7}$$

The auxiliary variable q<sub>λ</sub> is the Malliavin weight corresponding to the parameter λ.

How do we go about finding the correct updating rule? If the Malliavin weight exists, we should be able to derive its updating rule from the system's underlying stochastic equations of motion. We obtain an important clue from differentiating Equation (1) with respect to λ. Extending the shorthand notation, one finds

$$P'Q'\_{\lambda} = \int WP \left(Q\_{\lambda} + \frac{\partial \ln W}{\partial \lambda}\right). \tag{8}$$

This strongly suggests that the rule for updating the Malliavin weight should be

$$q'\_{\lambda} = q\_{\lambda} + \frac{\partial \ln W}{\partial \lambda} \,. \tag{9}$$

In fact, this is correct. The proof is not difficult and, for the case of Brownian dynamics, can be found in the supplementary material for [8]. It involves averaging Equation (9) in the extended state space {S, qλ}.

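From a practical point of view, for each time step, we implement the following procedure:

1. propagate the system from its current state S to a new state S′ using the algorithm that implements the stochastic equations of motion;
2. with knowledge of S and S′, evaluate the change in the Malliavin weight, ∂ ln W(S → S′)/∂λ;
3. update the Malliavin weight according to Equation (9).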

At the start of the simulation, the Malliavin weight is usually initialised to q<sub>λ</sub> = 0.
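
The per-step procedure (propagate the state, evaluate ∂ ln W/∂λ for the move just made, add it to the weight) can be sketched for a deliberately simple case. The model below is an illustration only and is not taken from the references: it assumes a hypothetical two-state chain in which the state flips with probability λ at each step, so that W = λ for a flip and W = 1 − λ otherwise. All function names are our own.

```python
import random


def dlnW_dlam(s, s_new, lam):
    """d ln W(S -> S')/d lambda for the assumed two-state flip chain."""
    # W = lam for a flip, W = 1 - lam for staying put.
    return 1.0 / lam if s_new != s else -1.0 / (1.0 - lam)


def run_once(lam, n_steps, rng):
    """Generate one trajectory; return the final state and its Malliavin weight."""
    s, q = 0, 0.0                                     # weight initialised to zero
    for _ in range(n_steps):
        s_new = 1 - s if rng.random() < lam else s    # unperturbed dynamics
        q += dlnW_dlam(s, s_new, lam)                 # update rule, Equation (9)
        s = s_new
    return s, q


def estimate_derivative(lam, n_steps, n_runs, seed=1):
    """Estimate d<A>/d lambda for A(S) = S as the average of A * q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        s, q = run_once(lam, n_steps, rng)
        total += s * q
    return total / n_runs


if __name__ == "__main__":
    lam, n_steps = 0.3, 3
    # For this chain P(S = 1) after n steps is (1 - (1 - 2 lam)^n) / 2.
    exact = n_steps * (1.0 - 2.0 * lam) ** (n_steps - 1)
    print(estimate_derivative(lam, n_steps, 40000), exact)
```

For this toy chain the exact answer is known, which makes it a convenient check of an implementation before moving to a system where no closed form is available.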

Let us first suppose that our system is not in steady state, but rather that the quantity ⟨A⟩ in which we are interested is changing in time, so that ∂⟨A(t)⟩/∂λ is likewise a time-dependent quantity. To compute ∂⟨A(t)⟩/∂λ we run N independent simulations, in each one tracking as a function of time A(t), q<sub>λ</sub>(t) and the product A(t) q<sub>λ</sub>(t). The quantities ⟨A(t)⟩ and ∂⟨A(t)⟩/∂λ are then given by

$$
\langle A(t)\rangle \approx \frac{1}{N} \sum\_{i=1}^{N} A\_i(t), \quad \frac{\partial \langle A(t)\rangle}{\partial \lambda} \approx \frac{1}{N} \sum\_{i=1}^{N} A\_i(t) \, q\_{\lambda,i}(t) \,, \tag{10}
$$

where A<sub>i</sub>(t) is the value of A(t) recorded in the i-th simulation run (and likewise for q<sub>λ,i</sub>(t)). Error estimates can be obtained from the variance among the replicate simulations.
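
As a minimal sketch of Equation (10) at a single time point, the helper below takes the values of A and of the Malliavin weight recorded in each replicate run and returns the two averages together with standard errors of the mean. The function name and the choice of the standard error as the error estimate are assumptions made for illustration.

```python
import math
import statistics


def replicate_estimates(a_values, q_values):
    """Return (<A>, d<A>/d lambda, error of <A>, error of the derivative)."""
    n = len(a_values)
    products = [a * q for a, q in zip(a_values, q_values)]
    mean_a = statistics.mean(a_values)
    deriv = statistics.mean(products)
    # Standard error of the mean from the spread among replicates.
    err_a = statistics.stdev(a_values) / math.sqrt(n)
    err_deriv = statistics.stdev(products) / math.sqrt(n)
    return mean_a, deriv, err_a, err_deriv


if __name__ == "__main__":
    print(replicate_estimates([1.0, 2.0, 3.0], [0.5, -0.5, 1.0]))
```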

If, instead, our system is in steady state, the procedure needs to be modified slightly. This is because the variance in the values of q<sub>λ</sub>(t) across replicate simulations increases linearly in time (this point is discussed further below). For long times, computation of ∂⟨A⟩/∂λ using Equation (10) therefore incurs a large statistical error. Fortunately, this problem can easily be solved by computing the correlation function

$$C(t, t') = \left< A(t) \left[ q\_\lambda(t) - q\_\lambda(t') \right] \right>. \tag{11}$$

In steady state, C(t, t′) = C(t − t′), with the property that C(Δt) → ∂⟨A⟩/∂λ as Δt → ∞. In a single simulation run, we simply measure q<sub>λ</sub>(t) and A(t) at time intervals separated by Δt (which is typically multiple simulation steps). At each measurement, we compute A(t) [q<sub>λ</sub>(t) − q<sub>λ</sub>(t − Δt)]. We then average this latter quantity over the whole simulation run to obtain an estimate of ∂⟨A⟩/∂λ. For this estimate to be accurate, we require that Δt is long enough that C(Δt) has reached its plateau value; this typically means that Δt should be longer than the typical relaxation time of the system's dynamics. The correlation function approach is discussed in more detail in [7,8].
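
The steady-state recipe can be sketched with the harmonic trap treated later in Section 4 (f = −κx + h, T = 1, baseline h = 0), for which ∂⟨x⟩/∂h = 1/κ in steady state. The code is an illustrative sketch, not the authors' implementation: the ring buffer, the sampling at every step rather than every Δt, and the default parameter values are all assumptions.

```python
import math
import random


def steady_state_response(kappa=2.0, dt=0.01, lag_steps=300,
                          n_steps=200000, seed=7):
    """Estimate d<x>/dh at h = 0 from a single run via Equation (11)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * dt)          # noise standard deviation for T = 1
    x, qh = 0.0, 0.0
    past_qh = [0.0] * lag_steps          # ring buffer holding q_h one lag ago
    total, count = 0.0, 0
    for step in range(n_steps):
        xi = rng.gauss(0.0, sigma)
        x += -kappa * x * dt + xi        # unperturbed dynamics
        qh += 0.5 * xi                   # weight for the parameter h
        slot = step % lag_steps
        if step >= lag_steps:            # slot now holds q_h(t - lag)
            total += x * (qh - past_qh[slot])
            count += 1
        past_qh[slot] = qh
    return total / count


if __name__ == "__main__":
    print(steady_state_response(), 1.0 / 2.0)
```

With the defaults the lag is Δt = 3, six times the relaxation time 1/κ, so C(Δt) should be close to its plateau; the first lag window also serves as equilibration.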

Returning to a more theoretical perspective, it is interesting to note that the rule for updating the Malliavin weight, Equation (9), depends deterministically on S and S′. This implies that the value of the Malliavin weight at time t is completely determined by the trajectory of system states during the time interval 0 → t. In fact, it is easy to show that

$$q\_{\lambda} = \frac{\partial \ln \mathbb{W}}{\partial \lambda} \tag{12}$$

where 𝕎 is the trajectory weight defined in Equation (3). Similar expressions are given in [5,7]. Thus, the Malliavin weight q<sub>λ</sub> is not fixed by the state point S but by the entire trajectory of states that have led to state point S. Since many different trajectories can lead to S, many values of q<sub>λ</sub> are possible for the same state point S. The average ⟨q<sub>λ</sub>(t)⟩<sub>S</sub> is actually the expectation value of the Malliavin weight, averaged over all trajectories that reach state point S at time t. This can be used to obtain an alternative proof that ⟨q<sub>λ</sub>⟩<sub>S</sub> = ∂ ln P/∂λ. Suppose we sample N trajectories, of which N<sub>S</sub> end up at state point S (or a suitably defined vicinity thereof, in a continuous state space). We have P(S) = ⟨N<sub>S</sub>⟩/N. Then, the Malliavin property implies ∂P/∂λ = ⟨N<sub>S</sub> q<sub>λ</sub>⟩/N, and hence ∂ ln P/∂λ = ⟨N<sub>S</sub> q<sub>λ</sub>⟩/⟨N<sub>S</sub>⟩ = ⟨q<sub>λ</sub>⟩<sub>S</sub>.
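
Equation (12) can be checked numerically for a fixed trajectory: the sum of the per-step increments from Equation (9) should equal a finite-difference derivative of the logarithm of the product in Equation (3). The sketch below assumes a hypothetical two-state chain (flip with probability λ, stay with probability 1 − λ) chosen only because its propagator is trivial to write down.

```python
import math


def propagator(s, s_new, lam):
    """Assumed two-state chain: flip with probability lam, otherwise stay."""
    return lam if s_new != s else 1.0 - lam


def log_trajectory_weight(states, lam):
    """Logarithm of the product of propagators along the trajectory."""
    return sum(math.log(propagator(a, b, lam))
               for a, b in zip(states, states[1:]))


def accumulated_weight(states, lam):
    """Malliavin weight built up step by step, starting from zero."""
    q = 0.0
    for a, b in zip(states, states[1:]):
        q += 1.0 / lam if b != a else -1.0 / (1.0 - lam)
    return q


def finite_difference(states, lam, eps=1e-6):
    """Central difference approximation to d ln(trajectory weight)/d lambda."""
    return (log_trajectory_weight(states, lam + eps)
            - log_trajectory_weight(states, lam - eps)) / (2.0 * eps)


if __name__ == "__main__":
    path = [0, 1, 1, 0]
    print(accumulated_weight(path, 0.3), finite_difference(path, 0.3))
```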

#### 3. Multiple Variables, Second Derivatives and the Algebra of Malliavin Weights

Up to now, we have assumed that the quantity A does not depend explicitly on the parameter λ. There may be cases, however, when A does have an explicit λ-dependence. In these cases, Equation (7) should be replaced by

$$\frac{\partial \langle A \rangle}{\partial \lambda} = \left\langle \frac{\partial A}{\partial \lambda} \right\rangle + \left\langle A \, q\_{\lambda} \right\rangle. \tag{13}$$

If we set A to be a constant in this, we immediately obtain the general result that ⟨q<sub>λ</sub>⟩ = 0. Equation (13) reveals a kind of 'algebra' for Malliavin weights: we see that the operations of taking an expectation value and taking a derivative can be commuted, provided the Malliavin weight is introduced as the commutator.

We can also extend our analysis further to allow us to compute higher derivatives with respect to the parameters. These may be useful, for example, for increasing the efficiency of gradient-based parameter optimisation algorithms. Taking the derivative of Equation (13) with respect to a second parameter μ gives

$$
\begin{split}
\frac{\partial^2 \langle A \rangle}{\partial \lambda \partial \mu} &= \frac{\partial}{\partial \mu} \left\langle \frac{\partial A}{\partial \lambda} \right\rangle + \frac{\partial \langle A \, q\_{\lambda} \rangle}{\partial \mu} \\
&= \left\langle \frac{\partial^2 A}{\partial \lambda \partial \mu} \right\rangle + \left\langle \frac{\partial A}{\partial \lambda} q\_{\mu} \right\rangle + \left\langle A \frac{\partial q\_{\lambda}}{\partial \mu} \right\rangle + \left\langle \frac{\partial A}{\partial \mu} q\_{\lambda} \right\rangle + \left\langle A \, q\_{\lambda} q\_{\mu} \right\rangle \\
&= \left\langle A \left( q\_{\lambda \mu} + q\_{\lambda} q\_{\mu} \right) \right\rangle + \left\langle \frac{\partial A}{\partial \lambda} q\_{\mu} \right\rangle + \left\langle \frac{\partial A}{\partial \mu} q\_{\lambda} \right\rangle + \left\langle \frac{\partial^2 A}{\partial \lambda \partial \mu} \right\rangle,
\end{split}
\tag{14}
$$

where in the second line we iterate the commutation relation, and in the third line we collect like terms and introduce

$$q\_{\lambda\mu} = \frac{\partial q\_{\lambda}}{\partial \mu}.\tag{15}$$

In the case where A is independent of the parameters, this result simplifies to

$$\frac{\partial^2 \langle A \rangle}{\partial \lambda \partial \mu} = \left\langle A \left( q\_{\lambda \mu} + q\_{\lambda} q\_{\mu} \right) \right\rangle. \tag{16}$$

The quantity qλμ here is a new, second order, Malliavin weight which from Equations (12) and (15) satisfies

$$q\_{\lambda\mu} = \frac{\partial^2 \ln \mathbb{W}}{\partial \lambda \partial \mu}.\tag{17}$$

To compute second derivatives with respect to the parameters, we should therefore track these second order Malliavin weights in our simulation, updating them alongside the existing Malliavin weights by the rule

$$q'\_{\lambda\mu} = q\_{\lambda\mu} + \frac{\partial^2 \ln W(S \to S')}{\partial \lambda \partial \mu} \,. \tag{18}$$

Setting A as a constant in Equation (16), we also obtain the interesting result that ⟨q<sub>λμ</sub>⟩ = −⟨q<sub>λ</sub>q<sub>μ</sub>⟩.

Steady state problems can be approached by extending the correlation function method to second order weights. Define, *cf.* Equation (11),

$$C(t, t') = \left\langle A(t) \left\{ \left[ q\_{\lambda \mu}(t) + q\_{\lambda}(t) q\_{\mu}(t) \right] - \left[ q\_{\lambda \mu}(t') + q\_{\lambda}(t') q\_{\mu}(t') \right] \right\} \right\rangle. \tag{19}$$

As in the first order case, in steady state we expect C(t, t′) = C(t − t′) with the property that C(Δt) → ∂<sup>2</sup>⟨A⟩/∂λ∂μ as Δt → ∞.

#### 4. One-Dimensional Brownian Motion in a Force Field

We now demonstrate this machinery by way of a practical but very simple example, namely one-dimensional (overdamped) Brownian motion in a force field. In this case, the state space is specified simply by the particle position x which evolves according to the Langevin equation

$$\frac{dx}{dt} = f(x) + \eta \tag{20}$$

where f(x) is the force field and η is Gaussian white noise of amplitude 2T, where T is temperature. Without loss of generality, we have chosen units so that there is no prefactor multiplying the force field. We discretise the Langevin equation to the following updating rule:

$$x' = x + f(x)\,\delta t + \xi \,, \tag{21}$$

where δt is the time step and ξ is a Gaussian random variate with zero mean and variance 2T δt. Corresponding to this updating rule is an explicit expression for the propagator,

$$W(x \to x') = \frac{1}{\sqrt{4\pi T \,\delta t}} \exp\left(-\frac{(x'-x-f(x)\,\delta t)^2}{4T\,\delta t}\right). \tag{22}$$

This follows from the statistical distribution of ξ. Let us suppose that the parameter of interest λ enters into the force field (the temperature T could also be chosen as a parameter). Under this assumption, we have

$$\frac{\partial \ln W(x \to x')}{\partial \lambda} = \frac{(x' - x - f\,\delta t)}{2T} \frac{\partial f}{\partial \lambda} \,. \tag{23}$$

We can simplify this result by noting that from Equation (21), x′ − x − f δt = ξ. Making use of this, the final updating rule for the Malliavin weight is

$$q'\_{\lambda} = q\_{\lambda} + \frac{\xi}{2T} \frac{\partial f}{\partial \lambda} \tag{24}$$

where ξ is the *exact same* value that was used for updating the position in Equation (21). Because the value of ξ is the same for the updates of position and of q<sub>λ</sub>, the change in q<sub>λ</sub> is completely determined by the end points, x and x′. The derivative ∂f/∂λ should be evaluated at x since that is the position at which the force is computed in Equation (21). Since ξ in Equation (21) is a random variate uncorrelated with x, averaging Equation (24) shows that ⟨q′<sub>λ</sub>⟩ = ⟨q<sub>λ</sub>⟩. As the initial condition is q<sub>λ</sub> = 0, this means that ⟨q<sub>λ</sub>⟩ = 0, as predicted in the previous section. Equation (24) is essentially the same as that derived in [8].
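
A single step of Equations (21) and (24) can be written as one small function. The sketch below is illustrative: the function names and the convention of passing the force and its parameter derivative as callables are assumptions, and the noise value is passed in explicitly to emphasise that the same ξ drives both updates.

```python
import math
import random


def bd_mws_step(x, q, force, dforce_dlam, temperature, dt, xi):
    """Advance position and first order Malliavin weight by one time step."""
    x_new = x + force(x) * dt + xi
    # The derivative of the force is evaluated at the old position x.
    q_new = q + xi / (2.0 * temperature) * dforce_dlam(x)
    return x_new, q_new


def draw_noise(temperature, dt, rng):
    """Gaussian variate with zero mean and variance 2 T dt."""
    return rng.gauss(0.0, math.sqrt(2.0 * temperature * dt))


if __name__ == "__main__":
    rng = random.Random(0)
    kappa = 2.0
    x, q = 1.0, 0.0
    for _ in range(5):
        xi = draw_noise(1.0, 0.01, rng)
        # Harmonic trap f = -kappa x with lambda = kappa, so df/dlambda = -x.
        x, q = bd_mws_step(x, q, lambda y: -kappa * y, lambda y: -y, 1.0, 0.01, xi)
    print(x, q)
```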

If we differentiate Equation (23) with respect to a second parameter μ we get

$$\frac{\partial^2 \ln W(x \to x')}{\partial \lambda \partial \mu} = \frac{(x' - x - f\,\delta t)}{2T} \frac{\partial^2 f}{\partial \lambda \partial \mu} - \frac{\delta t}{2T} \frac{\partial f}{\partial \lambda} \frac{\partial f}{\partial \mu}.\tag{25}$$

Hence, the updating rule for the second order Malliavin weight can be written as

$$q'\_{\lambda\mu} = q\_{\lambda\mu} + \frac{\xi}{2T} \frac{\partial^2 f}{\partial \lambda \partial \mu} - \frac{\delta t}{2T} \frac{\partial f}{\partial \lambda} \frac{\partial f}{\partial \mu},\tag{26}$$

where again ξ is the exact same value as that used for updating the position in Equation (21). If we average Equation (26) over replicate simulation runs, we find ⟨q′<sub>λμ</sub>⟩ = ⟨q<sub>λμ</sub>⟩ − (δt/2T)⟨(∂f/∂λ)(∂f/∂μ)⟩. Hence the mean value ⟨q<sub>λμ</sub>⟩ drifts in time, unlike ⟨q<sub>λ</sub>⟩ or ⟨q<sub>μ</sub>⟩. However, one can show that the mean value of the sum (q<sub>λμ</sub> + q<sub>λ</sub>q<sub>μ</sub>) is constant in time and equal to zero as long as initially q<sub>λ</sub> = q<sub>μ</sub> = 0.
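
The second order increment in Equation (26) is a one-liner once the derivatives of the force are available. The sketch below uses hypothetical function names; the special case shown assumes the linear force field f = −κx + h with λ = h, μ = κ and T = 1, for which the mixed second derivative of f vanishes.

```python
def second_order_increment(xi, temperature, dt, d2f, df_dlam, df_dmu):
    """Change in the second order weight over one step, from Equation (26)."""
    return (xi / (2.0 * temperature) * d2f
            - dt / (2.0 * temperature) * df_dlam * df_dmu)


def harmonic_trap_increment(x, xi, dt):
    """Assumed special case: f = -kappa*x + h, lambda = h, mu = kappa, T = 1."""
    # df/dh = 1, df/dkappa = -x, and the mixed second derivative is zero.
    return second_order_increment(xi, 1.0, dt, d2f=0.0, df_dlam=1.0, df_dmu=-x)


if __name__ == "__main__":
    print(harmonic_trap_increment(1.0, 0.1, 0.01))
```

In this special case the increment reduces to x δt/2 and carries no noise term, so the drift of the mean second order weight follows directly from the mean position.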

Now let us consider the simplest case of a particle in a linear force field f = −κx + h (also discussed in [8]). This corresponds to a harmonic trap with the potential U = ½κx<sup>2</sup> − hx. We let the particle start from x<sub>0</sub> at t = 0 and track its time-dependent relaxation to the steady state. We shall set T = 1 for simplicity. The Langevin equation can be solved exactly for this case, and the mean position evolves according to

$$
\langle x(t) \rangle = x\_0 e^{-\kappa t} + \frac{h}{\kappa} (1 - e^{-\kappa t}) \,. \tag{27}
$$

We suppose that we are interested in derivatives with respect to both h and κ, for a "baseline" parameter set in which κ is finite but h = 0. Taking derivatives of Equation (27) and setting h = 0, we find

$$\frac{\partial \langle x(t) \rangle}{\partial h} = \frac{1 - e^{-\kappa t}}{\kappa}, \quad \frac{\partial \langle x(t) \rangle}{\partial \kappa} = -x\_0 t e^{-\kappa t}, \quad \frac{\partial^2 \langle x(t) \rangle}{\partial h \partial \kappa} = \frac{t e^{-\kappa t}}{\kappa} - \frac{1 - e^{-\kappa t}}{\kappa^2}.\tag{28}$$
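
The three expressions in Equation (28) can be checked against central finite differences of Equation (27). This is a sketch for verification purposes; the function names and the step size `eps` are assumptions.

```python
import math


def mean_x(t, kappa, h, x0):
    """Mean position from Equation (27)."""
    decay = math.exp(-kappa * t)
    return x0 * decay + h / kappa * (1.0 - decay)


def analytic_derivatives(t, kappa, x0):
    """Derivatives from Equation (28), evaluated at h = 0."""
    decay = math.exp(-kappa * t)
    d_h = (1.0 - decay) / kappa
    d_kappa = -x0 * t * decay
    d_h_kappa = t * decay / kappa - (1.0 - decay) / kappa ** 2
    return d_h, d_kappa, d_h_kappa


def finite_difference_derivatives(t, kappa, x0, eps=1e-4):
    """Central differences of Equation (27) about h = 0."""
    d_h = (mean_x(t, kappa, eps, x0) - mean_x(t, kappa, -eps, x0)) / (2.0 * eps)
    d_kappa = (mean_x(t, kappa + eps, 0.0, x0)
               - mean_x(t, kappa - eps, 0.0, x0)) / (2.0 * eps)
    d_h_kappa = (mean_x(t, kappa + eps, eps, x0) - mean_x(t, kappa + eps, -eps, x0)
                 - mean_x(t, kappa - eps, eps, x0) + mean_x(t, kappa - eps, -eps, x0)
                 ) / (4.0 * eps * eps)
    return d_h, d_kappa, d_h_kappa


if __name__ == "__main__":
    print(analytic_derivatives(1.0, 2.0, 1.0))
    print(finite_difference_derivatives(1.0, 2.0, 1.0))
```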

We now show how to compute these derivatives using Malliavin weight sampling. Applying the definitions in Equations (24) and (26), the Malliavin weight increments are

$$q'\_h = q\_h + \frac{\xi}{2}, \quad q'\_\kappa = q\_\kappa - \frac{x\,\xi}{2}, \quad q'\_{h\kappa} = q\_{h\kappa} + \frac{x\,\delta t}{2}, \tag{29}$$

and the position update itself is

$$x' = x - \kappa x \,\delta t + \xi \,. \tag{30}$$

We track these Malliavin weights in our simulation and use them to calculate derivatives according to

$$\frac{\partial \langle x(t) \rangle}{\partial h} = \left\langle x(t) q\_h(t) \right\rangle, \quad \frac{\partial \langle x(t) \rangle}{\partial \kappa} = \left\langle x(t) q\_\kappa(t) \right\rangle, \quad \frac{\partial^2 \langle x(t) \rangle}{\partial h \partial \kappa} = \left\langle x(t) (q\_{h\kappa}(t) + q\_h(t) q\_\kappa(t)) \right\rangle. \tag{31}$$

Equations (29)–(31) have been coded up as a MATLAB script, described in the Appendix. A typical result generated by running this script is shown in Figure 1. Equations (29) and (30) are iterated with δt = 0.01 up to t = 5, for a trap strength κ = 2 and initial position x<sub>0</sub> = 1. The weighted averages in Equation (31) are evaluated as a function of time for N = 10<sup>5</sup> samples as in Equation (10). These results are shown as the solid lines in Figure 1. The dashed lines are theoretical predictions for the time-dependent derivatives from Equation (28). As can be seen, the agreement between the time-dependent derivatives and the Malliavin weight averages is very good.

Figure 1. Time-dependent derivatives, ∂⟨x⟩/∂h (top curve, blue), ∂⟨x⟩/∂κ (middle curve, green), and ∂<sup>2</sup>⟨x⟩/∂h∂κ (bottom curve, red). Solid lines (slightly noisy) are the Malliavin weight averages as indicated in the Figure, generated by running the MATLAB script described in the Appendix. Dashed lines are theoretical predictions from Equation (28).

As discussed briefly above, in this procedure the sampling error in the computation of ∂⟨A(t)⟩/∂λ is expected to grow with time. Figure 2 shows the mean square Malliavin weight as a function of time for the same problem. For the first order weights q<sub>h</sub> and q<sub>κ</sub> the growth is typically linear in time. Indeed, from Equation (29), one can prove that in the limit δt → 0 (see the Appendix)

$$\frac{d\langle q\_h^2 \rangle}{dt} = \frac{1}{2} \,, \quad \frac{d\langle q\_\kappa^2 \rangle}{dt} = \frac{\langle x^2 \rangle}{2} \,. \tag{32}$$

Thus q<sub>h</sub> behaves exactly as a random walk, as should be obvious from the updating rule. The other weight q<sub>κ</sub> also ultimately behaves as a random walk since ⟨x<sup>2</sup>⟩ = 1/κ in steady state (from equipartition). Figure 2 also shows that the second order weight q<sub>hκ</sub> grows superdiffusively; one can show that eventually ⟨(q<sub>hκ</sub> + q<sub>h</sub>q<sub>κ</sub>)<sup>2</sup>⟩ ∼ t<sup>2</sup>, although the transient behaviour is complicated. Full expressions are given in the Appendix. This suggests that computation of second order derivatives is likely to suffer more severely from statistical sampling problems than the computation of first order derivatives.

Figure 2. Growth of mean square Malliavin weights with time. The solid lines are from simulations and the dashed lines are from Equation (35) in the Appendix. Parameters are as for Figure 1.

#### 5. Conclusions

In this paper, we have provided an outline of the generic use of Malliavin weights for sampling derivatives in stochastic simulations, with an emphasis on practical aspects. The usefulness of MWS for a particular simulation scheme hinges on the simplicity or otherwise of constructing the propagator W(S → S′), which fixes the updating rule for the Malliavin weights according to Equation (9). The propagator is determined by the algorithm used to implement the stochastic equations of motion; MWS may be easier to implement for some algorithms than for others. We note however that there is often some freedom of choice about the algorithm, such as the choice of a stochastic thermostat in molecular dynamics, or the order in which update steps are implemented. In these cases, a suitable choice may simplify the construction of the propagator and facilitate the use of Malliavin weights.

#### Acknowledgments

Rosalind J. Allen is supported by a Royal Society University Research Fellowship.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


#### Appendix

#### MATLAB Script

The MATLAB script in Listing 1 was used to generate the results shown in Figure 1. It implements Equations (29)–(31) above, making extensive use of the compact MATLAB syntax for array operations, for instance invoking '.\*' for element-by-element multiplication of arrays.

Listing 1. MATLAB script used to generate Figure 1.

```
1 clear all
2 randn('seed', 12345);                     % legacy seeding syntax
3 kappa = 2; x0 = 1; tend = 5; dt = 0.01; nsamp = 10^5;
4 npt = round(tend / dt) + 1;
5 t = (0:npt-1)' * dt;
6 x = zeros(npt, 1); xi = zeros(npt, 1);
7 qh = zeros(npt, 1); qk = zeros(npt, 1); qhk = zeros(npt, 1);
8 x_av = zeros(npt, 1); xqh_av = zeros(npt, 1);
9 xqk_av = zeros(npt, 1); xqhk_av = zeros(npt, 1);
10 for samp = 1:nsamp
11   x(1) = x0; qh(1) = 0; qk(1) = 0; qhk(1) = 0;
12   xi = randn(npt, 1) * sqrt(2*dt);        % variance 2*T*dt with T = 1
13   for i = 1:npt-1
14     x(i+1) = x(i) - kappa*x(i)*dt + xi(i);
15     qh(i+1) = qh(i) + 0.5 * xi(i);
16     qk(i+1) = qk(i) - 0.5 * x(i) * xi(i);
17     qhk(i+1) = qhk(i) + 0.5 * x(i) * dt;
18   end
19   x_av = x_av + x;
20   xqh_av = xqh_av + x .* qh;
21   xqk_av = xqk_av + x .* qk;
22   xqhk_av = xqhk_av + x .* (qhk + qh .* qk);
23 end
24 x_av = x_av / nsamp; xqh_av = xqh_av / nsamp;
25 xqk_av = xqk_av / nsamp; xqhk_av = xqhk_av / nsamp;
26 hold on
27 plot(t, x_av, 'k'); plot(t, xqh_av, 'b');
28 plot(t, xqk_av, 'g'); plot(t, xqhk_av, 'r');
29 plot(t, x0*exp(-kappa*t), 'k--')
30 plot(t, (1-exp(-kappa*t))/kappa, 'b--')
31 plot(t, -x0*t.*exp(-kappa*t), 'g--')
32 plot(t, t.*exp(-kappa*t)/kappa - (1-exp(-kappa*t))/(kappa^2), 'r--')
33 result = [t x_av xqh_av xqk_av xqhk_av];
34 save('result.dat', '-ascii', 'result');
```
Here is a brief explanation of the script. *Lines 1–3* initialise the problem and the parameter values. *Lines 4* and *5* calculate the number of points in a trajectory and initialise a vector containing the time coordinate of each point. *Lines 6–9* set aside storage for the actual trajectory, Malliavin weights and cumulative statistics. *Lines 10–23* implement a pair of nested loops, which are the kernel of the simulation. Within the outer (trajectory sampling) loop, *Line 11* initialises the particle position and Malliavin weights, *Line 12* precomputes a vector of random displacements (Gaussian random variates), and *Lines 13–18* generate the actual trajectory. Within the inner (trajectory generating) loop, *Lines 14–17* are a direct implementation of Equations (29) and (30). After each individual trajectory has been generated, the cumulative sampling step implied by Equation (31) is done in *Lines 19–22*; after all the trajectories have been generated, these quantities are normalised in *Lines 24* and *25*. Finally, *Lines 26–32* generate a plot similar to Figure 1 (albeit with the addition of ⟨x⟩), and *Lines 33* and *34* show how the data can be exported in tabular format for replotting using an external package.

Listing 1 is complete and self-contained. It will run in either MATLAB or Octave. One minor comment is perhaps in order. The choice was made to precompute a vector of Gaussian random variates, which are used as random displacements to generate the trajectory and update the Malliavin weights. One could equally well generate random displacements on-the-fly, in the inner loop. For this one-dimensional problem storage is not an issue and it seems more elegant and efficient to exploit the vectorisation capabilities of MATLAB. For a more realistic three-dimensional problem, with many particles (and a different programming language), it is obviously preferable to use an on-the-fly approach.
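
For comparison, the on-the-fly alternative can be sketched as follows. This is not a translation of Listing 1: it is written in Python rather than MATLAB, draws each random displacement inside the inner loop, and accumulates statistics only at the final time. The function name, the default sample size and the shorter end time are assumptions chosen to keep the run fast.

```python
import math
import random


def mws_on_the_fly(kappa=2.0, x0=1.0, t_end=1.0, dt=0.01,
                   n_samples=4000, seed=12345):
    """Return averages of x, x*q_h, x*q_k and x*(q_hk + q_h*q_k) at t_end."""
    rng = random.Random(seed)
    n_steps = round(t_end / dt)
    sigma = math.sqrt(2.0 * dt)              # T = 1
    sums = [0.0, 0.0, 0.0, 0.0]
    for _ in range(n_samples):
        x, qh, qk, qhk = x0, 0.0, 0.0, 0.0
        for _ in range(n_steps):
            xi = rng.gauss(0.0, sigma)       # drawn when needed, not stored
            # Weight updates use the position before the move, Equation (29).
            qh += 0.5 * xi
            qk -= 0.5 * x * xi
            qhk += 0.5 * x * dt
            x += -kappa * x * dt + xi        # Equation (30)
        sums[0] += x
        sums[1] += x * qh
        sums[2] += x * qk
        sums[3] += x * (qhk + qh * qk)
    return [s / n_samples for s in sums]


if __name__ == "__main__":
    print(mws_on_the_fly())
```

Only four scalars per trajectory are kept in memory, which is the property that matters for large many-particle systems.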

#### Selected Analytic Results

Here, we present analytic results for the growth in time of the mean square Malliavin weights. We can express the rate of growth of the mean of a generic function f(x, qh, qκ, qhκ) as

$$\frac{d\langle f\rangle}{dt} = \lim\_{\delta t \to 0} \frac{\langle f(x', q\_h', q\_\kappa', q\_{h\kappa}') - f(x, q\_h, q\_\kappa, q\_{h\kappa})\rangle}{\delta t},\tag{33}$$

where on the right-hand side (RHS) the values of x′, q′<sub>h</sub>, q′<sub>κ</sub> and q′<sub>hκ</sub> are substituted from the updating rules in Equations (29) and (30). In calculating the RHS average, we note that the distribution of ξ is a Gaussian independent of the position and Malliavin weights and thus one can substitute ⟨ξ⟩ = 0, ⟨ξ<sup>2</sup>⟩ = 2 δt, ⟨ξ<sup>3</sup>⟩ = 0, ⟨ξ<sup>4</sup>⟩ = 12 δt<sup>2</sup>, *etc.* Proceeding in this way, with judicious choices for f, one can obtain the following set of coupled ordinary differential equations (ODEs):

$$\frac{d\langle q\_h^2 \rangle}{dt} = \frac{1}{2}, \quad \frac{d\langle q\_\kappa^2 \rangle}{dt} = \frac{\langle x^2 \rangle}{2}, \quad \frac{d\langle x^2 \rangle}{dt} + 2\kappa \langle x^2 \rangle = 2, \quad \frac{d\langle xq\_h \rangle}{dt} + \kappa \langle xq\_h \rangle = 1,$$

$$\frac{d\langle x^2 q\_h^2 \rangle}{dt} + 2\kappa \langle x^2 q\_h^2 \rangle = 2\langle q\_h^2 \rangle + 4\langle xq\_h \rangle + \frac{\langle x^2 \rangle}{2}, \quad \frac{d\langle xq\_h q\_\kappa \rangle}{dt} + \kappa \langle xq\_h q\_\kappa \rangle = -\langle xq\_h \rangle - \frac{\langle x^2 \rangle}{2},$$

$$\frac{d\langle (q\_{h\kappa} + q\_h q\_\kappa)^2 \rangle}{dt} = \frac{\langle q\_\kappa^2 \rangle}{2} - \langle xq\_h q\_\kappa \rangle + \frac{\langle x^2 q\_h^2 \rangle}{2} \quad \left(= \frac{\langle (q\_\kappa - xq\_h)^2 \rangle}{2} \right). \tag{34}$$

Some of these have already been encountered in the main text. The last one is for the desired mean square second order weight. The ODEs can be solved with the initial conditions that at t = 0 all averages involving Malliavin weights vanish but ⟨x<sup>2</sup>⟩ = x<sub>0</sub><sup>2</sup>. The results include *inter alia*

$$
\langle q\_h^2 \rangle = \frac{t}{2}, \quad \langle q\_\kappa^2 \rangle = \frac{t}{2\kappa} + \frac{(\kappa x\_0^2 - 1)(1 - e^{-2\kappa t})}{4\kappa^2},
$$

$$
\langle (q\_{h\kappa} + q\_h q\_\kappa)^2 \rangle = \frac{2\kappa^2 t^2 + (19 + \kappa x\_0^2)\kappa t + 2\kappa x\_0^2 - 34}{8\kappa^3} + \frac{2\kappa t + 10 - \kappa x\_0^2}{2\kappa^3} e^{-\kappa t}
$$

$$
+ \frac{(1 - \kappa x\_0^2)\kappa t + 2\kappa x\_0^2 - 6}{8\kappa^3} e^{-2\kappa t}.
\tag{35}
$$

These are shown as the dashed lines in Figure 2. The leading behaviour of the last as t → ∞ is

$$
\langle (q\_{h\kappa} + q\_h q\_\kappa)^2 \rangle = \frac{t^2}{4\kappa} + \text{subdominant terms} \,, \tag{36}
$$

however the approach to the pure asymptotic limit is slow.
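
As a consistency check, the coupled ODEs of Equation (34) can be integrated numerically and compared with the closed forms in Equation (35). The sketch below uses a classical fourth-order Runge-Kutta scheme; the ordering of the moments in the state vector, the function names and the step count are assumptions made for illustration.

```python
import math


def moment_rhs(y, kappa):
    """Right-hand sides of the moment equations; y holds seven moments."""
    qh2, qk2, x2, xqh, x2qh2, xqhqk, w2 = y
    return [
        0.5,                                                     # <q_h^2>
        0.5 * x2,                                                # <q_kappa^2>
        2.0 - 2.0 * kappa * x2,                                  # <x^2>
        1.0 - kappa * xqh,                                       # <x q_h>
        2.0 * qh2 + 4.0 * xqh + 0.5 * x2 - 2.0 * kappa * x2qh2,  # <x^2 q_h^2>
        -xqh - 0.5 * x2 - kappa * xqhqk,                         # <x q_h q_kappa>
        0.5 * qk2 - xqhqk + 0.5 * x2qh2,                         # second order
    ]


def integrate_moments(kappa, x0, t_end, n_steps=5000):
    """Integrate from t = 0, where only <x^2> = x0^2 is non-zero."""
    y = [0.0, 0.0, x0 * x0, 0.0, 0.0, 0.0, 0.0]
    h = t_end / n_steps
    for _ in range(n_steps):
        k1 = moment_rhs(y, kappa)
        k2 = moment_rhs([a + 0.5 * h * b for a, b in zip(y, k1)], kappa)
        k3 = moment_rhs([a + 0.5 * h * b for a, b in zip(y, k2)], kappa)
        k4 = moment_rhs([a + h * b for a, b in zip(y, k3)], kappa)
        y = [a + h * (b + 2.0 * c + 2.0 * d + e) / 6.0
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
    return y


def closed_form_second_order(kappa, x0, t):
    """Mean square second order weight from the last expression in Equation (35)."""
    k, a = kappa, kappa * x0 * x0
    e1, e2 = math.exp(-k * t), math.exp(-2.0 * k * t)
    return ((2.0 * k * k * t * t + (19.0 + a) * k * t + 2.0 * a - 34.0) / (8.0 * k ** 3)
            + (2.0 * k * t + 10.0 - a) / (2.0 * k ** 3) * e1
            + ((1.0 - a) * k * t + 2.0 * a - 6.0) / (8.0 * k ** 3) * e2)


if __name__ == "__main__":
    moments = integrate_moments(2.0, 1.0, 5.0)
    print(moments[6], closed_form_second_order(2.0, 1.0, 5.0))
```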

Reprinted from *Entropy*. Cite as: Hartmann, C.; Banisch, R.; Sarich, M.; Badowski, T.; Schütte, C. Characterization of Rare Events in Molecular Dynamics. *Entropy* 2014, *16*, 350–376.

*Article*

## Characterization of Rare Events in Molecular Dynamics

Carsten Hartmann **<sup>1</sup>***,* \*, Ralf Banisch **<sup>1</sup>**, Marco Sarich **<sup>1</sup>**, Tomasz Badowski **<sup>1</sup>** and Christof Schütte **<sup>1</sup>***,***<sup>2</sup>**

<sup>1</sup> Institut für Mathematik, Freie Universität Berlin, Arnimallee 6, 14195 Berlin, Germany; E-Mails: ralf.banisch@fu-berlin.de (R.B.); sarich@mi.fu-berlin.de (M.S.); tomasz.badowski@gmail.com (T.B.); schuette@mi.fu-berlin.de (C.S.)

<sup>2</sup> Konrad-Zuse Zentrum, Takustraße 7, 14195 Berlin, Germany

\* Author to whom correspondence should be addressed; E-Mail: chartman@fu-berlin.de; Tel.: +49-(0)30-838-75286.

*Received: 13 September 2013; in revised form: 8 October 2013 / Accepted: 22 November 2013 / Published: 30 December 2013*

Abstract: A good deal of molecular dynamics simulations aims at predicting and quantifying rare events, such as the folding of a protein or a phase transition. Simulating rare events is often prohibitive, especially if the equations of motion are high-dimensional, as is the case in molecular dynamics. Various algorithms have been proposed for efficiently computing mean first passage times, transition rates or reaction pathways. This article surveys and discusses recent developments in the field of rare event simulation and outlines a new approach that combines ideas from optimal control and statistical mechanics. The optimal control approach described in detail resembles the use of Jarzynski's equality for free energy calculations, but with an optimized protocol that speeds up the sampling, while (theoretically) giving variance-free estimators of the rare events statistics. We illustrate the new approach with two numerical examples and discuss its relation to existing methods.

Keywords: rare events; molecular dynamics; optimal pathways; stochastic control; dynamic programming; change of measure; cumulant generating function

#### 1. Introduction

Rare but important transition events between long-lived states are a key feature of many systems arising in physics, chemistry, biology, *etc*. Molecular dynamics (MD) simulations allow for analysis and understanding of the dynamical behavior of molecular systems. However, realistic simulations for interesting (large) molecular systems in solution on timescales beyond microseconds are still infeasible even on the most powerful general purpose computers. This significantly limits the MD-based analysis of many biological equilibrium processes, because they often are associated with rare events. These rare events require prohibitively long simulations because the average waiting time between the events is orders of magnitude longer than the timescale of the transition characterizing the event itself. Therefore, the straightforward approach to such a problem via direct numerical simulation of the system until a reasonable number of events has been observed is impractically excessive for most interesting systems. As a consequence, rare event simulation and estimation are among the most challenging topics in molecular dynamics.

In this article, we consider typical rare events in molecular dynamics for which conformation changes or protein folding may serve as examples. They can be described in the following abstract way: The molecular system under consideration has the ability to go from a reactant state given by a set A in its state space (e.g., an initial conformation) to a product state described by another set B (e.g., the target conformation). Dynamical transitions from A to B are rare. The general situation we will address is as follows:


In addition, we will assume that the system under consideration is in equilibrium with respect to the stationary Gibbs-Boltzmann density

$$
\mu(x) = \frac{1}{Z} \exp(-\beta V(x))\,. \tag{1}
$$

We are interested in characterizing the transitions leading from A into B, that is, we are interested in the statistical properties of the ensemble of *reactive trajectories* that go *directly* from A to B (*i.e.*, start in A without returning to A before going to B). In other words, we are interested in all trajectories comprising the actual transition. We would like to:


The molecular dynamics literature on rare event simulations is rich. Since the 1930s, transition state theory (TST) [1,2] and extensions thereof based on the reactive flux formalism have provided the main theoretical framework for the description of transition events. TST can, however, at best deliver rates and does not allow one to characterize transition channels. It is based on partitioning the state space into two sets with a dividing surface in between, leaving set A on one side and the target set B on the other, and the theory only tells how this surface is crossed during the reaction. Often, it is difficult to choose a suitable dividing surface, and a bad choice will lead to a very poor estimate of the rate. The TST estimate is then extremely difficult to correct, especially if the rare event is of the diffusive type, where many different reaction channels co-exist. Therefore, many techniques have been proposed that try to go beyond TST.

These different strategies approach the problem by sampling the ensemble of reactive trajectories or by directly searching for the transition channels of the system. Most notable among these techniques are (1) Transition Path Sampling (TPS) [3]; (2) the so-called String Methods [4], or optimal path approaches [5–7] and variants thereof; (3) techniques that follow the progress of the transition through interfaces, like Forward Flux Sampling (FFS) [8], Transition Interface Sampling (TIS) [9] or the Milestoning techniques [10,11]; and (4) methods that drive the molecular system by external forces with the aim of making the required transition more frequent while still allowing one to compute the exact rare event statistics for the unforced system, e.g., based on Jarzynski's and Crooks' identities [12,13]. All of these methods consider the problem in continuous state space, *i.e.*, through reactive trajectories or transition channels in the original state space of the molecular system. They all face substantial problems, e.g., if the ensemble of reactive trajectories and/or transition channels of the system under consideration is too complicated (multi-modal, irregular, essentially high dimensional), or they suffer from too large a variance of the underlying statistical estimators. We should moreover stress that each of these methods has its specific scope of application; some methods are mainly useful for computing transition rates, whereas others can be used to compute transition pathways or free energy differences.

Our aim is (A) to review some of these methods based on a joint theoretical basis and (B) to outline a new approach to the estimation of rare event statistics based on a combination of ideas from optimal control and statistical mechanics. In principle, this approach allows for a *variance-free* estimation of rare event statistics in combination with *much reduced simulation time*. The rest of the article is organized as follows: We start with a precise characterization of reactive trajectories, transition channels and related quantities in the framework of Transition Path Theory (TPT) in Section 2. Then, in Sections 3 and 4, we discuss the methods from classes (1)–(3) and characterize their potential problems in more detail. In Section 5, we consider methods of type (4) as an introduction to the presentation of the new optimal control approach that is outlined in detail in Sections 6 and 7, including some numerical experiments.

Alternative, inherently discrete methods, like Markov State Modeling, that discretize the state space appropriately and try to compute transition channels and rates *a posteriori* based on the resulting discrete model of the dynamics will not be discussed herein and are considered in the article [14] in a way related to the presentation at hand. We should further mention that not all rare event problems in molecular dynamics are related to sampling the underlying Gibbs–Boltzmann statistics, e.g., nucleation events under shear [15] or genuine nonequilibrium systems without a stationary probability distribution [16].

#### 2. Reactive Trajectories, Transition Rates and Transition Channels

Since our results are rather general, it is useful to set the stage somewhat abstractly. To this end, we borrow some notation from [17] and consider a system whose state space is R<sup>n</sup> and denote by X<sub>t</sub> the current state of the system at time t. For example, X<sub>t</sub> may be the set of instantaneous positions and momenta of the atoms of a molecular system. We assume that the system is ergodic with respect to a probability (equilibrium) distribution μ and that we can generate an infinitely long equilibrium trajectory {X<sub>t</sub>}<sub>t∈R</sub> where, for technical reasons, we let the trajectory start at time t = −∞. The trajectory will go infinitely many times from A to B, and each time it does so, the reaction happens. This reaction involves reactive trajectories that can be defined as follows: Given the trajectory {X<sub>t</sub>}<sub>t∈R</sub>, we say that its reactive pieces are the segments during which X<sub>t</sub> is in neither A nor B, came out of A last and will go to B next. To formalize things, let

$$\begin{aligned} t\_{AB}^+(t) &= \text{smallest}\, s \ge t \text{ such that } X\_s \in A \cup B, \\ t\_{AB}^-(t) &= \text{largest}\, s \le t \text{ such that } X\_s \in A \cup B. \end{aligned}$$

Then, the trajectory {X<sub>t</sub>}<sub>t≥0</sub> is reactive for all t ∈ R where R ⊂ [0, ∞) is defined by the requirements

$$X\_t \notin A \cup B, \quad X\_{t^+\_{AB}(t)} \in B \quad \text{and} \quad X\_{t^-\_{AB}(t)} \in A$$

and the ensemble of reactive trajectories is given by the set

$$\mathcal{R} = \{X\_t \colon t \in R\}$$

where each specific continuous piece of trajectory going directly from A to B in the ensemble belongs to a specific interval [t1, t2] ⊂ R.
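For a trajectory sampled at discrete times, the definition above translates into a single pass over the data. The sketch below is only illustrative: the function name `reactive_segments` and the predicates `in_A` and `in_B` are assumptions, and a time-discrete trajectory can of course only approximate the continuous-time hitting times.

```python
import numpy as np

def reactive_segments(x, in_A, in_B):
    """Return (start, stop) index pairs such that x[start:stop] is a reactive piece.

    A piece is reactive if all of its points lie outside A and B, the last
    set visited before it is A, and the first set visited after it is B.
    """
    segments = []
    last = None    # set visited last: "A", "B" or None (unknown)
    start = None   # index of the first point after leaving A
    for i, xi in enumerate(x):
        if in_A(xi):
            last, start = "A", None          # back in A: discard the candidate piece
        elif in_B(xi):
            if last == "A" and start is not None:
                segments.append((start, i))  # came from A, now hits B
            last, start = "B", None
        elif last == "A" and start is None:
            start = i                        # first point outside A after a visit to A
    return segments

# Toy one-dimensional example with A = {x <= -1} and B = {x >= 1}.
traj = np.array([-1.2, -0.5, -1.1, -0.4, 0.3, 1.2, 0.5, -1.5, 0.0, 1.3])
print(reactive_segments(traj, lambda s: s <= -1.0, lambda s: s >= 1.0))
```

In the toy example, the excursion at index 1 is not reactive because it returns to A, and the point at index 6 is not reactive because it came out of B last.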

Given the ensemble of reactive trajectories, we want to characterize it statistically by answering the following questions:


Question (Q1) can be answered easily, at least theoretically: The probability density to observe any trajectory (reactive or not) at point x is μ(x). Let q(x) be the so-called committor function, that is, the probability that the trajectory starting from x reaches B first rather than A. If the dynamics are reversible, then the probability that a trajectory we observe at state x is reactive is q(x)(1 − q(x)), where the first factor appears since the trajectory must go to B rather than A next, and the second factor appears since it needs to come from A rather than B last. Now, the Markov property of the dynamics implies that the probability density to observe a *reactive* trajectory at point x is

$$
\mu\_{AB}(x) \propto q(x)(1 - q(x))\,\mu(x)\,,
$$

which is the probability of observing any trajectory in x times the probability that it will be reactive (the proportionality symbol ∝ is used to indicate identity up to normalization).

#### *2.1. Transition Path Theory (TPT)*

In order to give answers to the other questions, we will exploit the framework of *transition path theory* (TPT), which has been developed in [17–20] in the context of diffusions and has been generalized to discrete state spaces in [21,22]. In order to review the key results of TPT, let us consider diffusive molecular dynamics in an energy landscape V : R<sup>n</sup> → R:

$$dX\_t = -\nabla V(X\_t)dt + \sqrt{2\epsilon} \, dB\_t \,, \quad X\_0 = x \,. \tag{2}$$

Here, B<sub>t</sub> denotes standard n-dimensional Brownian motion, and ε > 0 is the temperature of the system. Under mild conditions on the energy landscape function V, we have ergodicity with respect to the stationary distribution μ(x) = Z<sup>−1</sup> exp(−βV(x)) with β = 1/ε. The dynamics are reversible with respect to this distribution, *i.e.*, the detailed balance condition holds. We assume throughout that the temperature is small relative to the largest energy barriers, *i.e.*, ε ≪ ΔV<sub>max</sub>. As a consequence, the relaxation of the dynamics towards equilibrium is dominated by the rare transitions over the largest energy barriers.
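Equation (2) can be integrated numerically with the Euler–Maruyama scheme. The following sketch is not taken from the article; the function name `simulate_diffusion`, the double-well potential and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_diffusion(grad_V, x0, eps, dt, n_steps, rng):
    """Euler-Maruyama discretization of dX = -grad V(X) dt + sqrt(2 eps) dB."""
    x = np.empty((n_steps + 1, np.size(x0)))
    x[0] = x0
    for k in range(n_steps):
        noise = rng.standard_normal(x.shape[1])
        x[k + 1] = x[k] - grad_V(x[k]) * dt + np.sqrt(2.0 * eps * dt) * noise
    return x

# Assumed example: one-dimensional double well V(x) = (x^2 - 1)^2.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
traj = simulate_diffusion(grad_V, x0=[-1.0], eps=0.25, dt=1e-3,
                          n_steps=5000, rng=np.random.default_rng(0))
print(traj.shape, traj.min(), traj.max())
```

For ε much smaller than the barrier height, such a trajectory stays in the initial well for a very long time, which is exactly the rare event problem discussed in the text.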

For this kind of dynamics, Questions (Q2) and (Q3) have surprisingly simple answers: The reactive probability current is given by

$$j\_{AB}(x) = \epsilon \mu(x) \,\nabla q(x).$$

where ∇q denotes the gradient of the committor function q. Based on this, the transition rate can be computed by the total reactive current across an arbitrary separating surface S:

$$k\_{AB} = \int\_S n\_S(x) j\_{AB}(x) d\sigma\_S(x).$$

where n<sub>S</sub> denotes the unit normal vector on S pointing towards B and σ<sub>S</sub> the associated surface element. The rate can also be expressed by

$$k\_{AB} = \epsilon \int\_{(A \cup B)^c} (\nabla q(x))^2 \mu(x) dx$$

where (A ∪ B)<sup>c</sup> denotes the entire state space excluding A and B. Given the reactive current, we can even answer Question (Q4): The transition channels of the reaction A → B are the regions of (A ∪ B)<sup>c</sup> in which the streamlines of the reactive current, *i.e.*, the solutions of

$$\frac{d}{dt}x\_{AB}(t) = j\_{AB}\left(x\_{AB}(t)\right), \quad x\_{AB}(0) \in A$$

are exceptionally dense.

Figure 1. (Top left panel) Three-well energy landscape V as described in the text. (Top right panel) Typical reactive trajectory in the three-well landscape. (Middle left panel) Committor function q<sub>AB</sub> for diffusive molecular dynamics with relatively high temperature ε = 0.6 for the sets A (main well, right-hand side) and B (main well, left-hand side). (Middle right panel) Committor q<sub>AB</sub> for the low temperature case ε = 0.15. (Bottom left panel) Transition channels for ε = 0.6. (Bottom right panel) Transition channels for ε = 0.15. For details of the computations underlying the pictures, see [22].

Figure 1 illustrates these quantities for the case of a 2D three well potential with two main wells (the bottoms of which we take as A and B in the following) and a less significant third well. The three main saddle points separating the wells are such that the two saddle points between the main wells and the third well are lower in energy than the saddle point between the main wells, so that in the zero temperature limit, we expect that almost all reactive trajectories take the route through the third well across the two lower saddle points. We observe that the committor functions for low and higher temperatures exhibit smooth isocommittor lines separating the sets A and B, as expected. The transition channels computed from the associated reactive current also show what one should expect: For a lower temperature, the channel through the third well and across the two lower saddle points is dominant, while for a higher temperature, the direct transition from A to B across the higher saddle point is preferred.

These considerations can be generalized to a wide range of different kinds of dynamics in continuous state spaces, including, e.g., full Langevin dynamics, see [17–20].

This example illustrates that TPT in principle allows us to quantify all aspects of the transition behavior underlying a rare event. We can compute transition rates exactly and even characterize the transition mechanisms if we can compute the committor function. Deeper insight using the Feynman–Kac formula yields that the committor function can be computed as the solution of a linear boundary value problem, which for diffusive molecular dynamics reads

$$Lq\_{AB} = 0 \quad \text{in } (A \cup B)^c, \quad q\_{AB} = 0 \text{ in } A, \quad q\_{AB} = 1 \text{ in } B.$$

where the generator L has the following form

$$L = \epsilon \Delta - \nabla V(x) \cdot \nabla \tag{3}$$

where Δ = Σ<sub>i</sub> ∂<sup>2</sup>/∂x<sub>i</sub><sup>2</sup> denotes the Laplace operator. This equation allows the computation of q<sub>AB</sub> in relatively low-dimensional spaces, where the discretization of L is possible based on finite element methods or comparable techniques. In realistic biomolecular state spaces, this is infeasible because of the curse of dimensionality. Therefore, TPT gives a complete theoretical background for rare event simulation, but its application in high dimensional situations is still problematic. As a remedy, a discrete version of TPT has been developed [21,22], which can be used in combination with Markov State Modeling; see [23].
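In one dimension the committor boundary value problem can be solved on a grid in a few lines. The sketch below is an illustration only and uses plain finite differences rather than the finite element methods mentioned above; the function name `committor_1d`, the choice A = {x ≤ a}, B = {x ≥ b} and the double-well potential are assumptions. It discretizes the equivalent divergence form (e<sup>−V/ε</sup> q′)′ = 0 and then evaluates the unnormalized reactive density q(1 − q)μ.

```python
import numpy as np

def committor_1d(V, a, b, eps, n=201):
    """Solve L q = 0 on (a, b) with q(a) = 0 and q(b) = 1 by finite differences.

    Uses the divergence form (exp(-V/eps) q')' = 0, which is equivalent to
    eps * q'' - V' * q' = 0.
    """
    x = np.linspace(a, b, n)
    xm = 0.5 * (x[:-1] + x[1:])       # cell midpoints
    w = np.exp(-V(xm) / eps)          # Gibbs weights at the midpoints
    M = np.zeros((n, n))
    rhs = np.zeros(n)
    M[0, 0] = 1.0                     # boundary condition q = 0 on A
    M[-1, -1] = 1.0                   # boundary condition q = 1 on B
    rhs[-1] = 1.0
    for i in range(1, n - 1):
        M[i, i - 1] = w[i - 1]
        M[i, i] = -(w[i - 1] + w[i])
        M[i, i + 1] = w[i]
    return x, np.linalg.solve(M, rhs)

# Assumed example: double well V(x) = (x^2 - 1)^2 at temperature eps = 0.25.
V = lambda x: (x**2 - 1.0) ** 2
x, q = committor_1d(V, -1.0, 1.0, eps=0.25)
rho_AB = q * (1.0 - q) * np.exp(-V(x) / 0.25)   # reactive density, unnormalized
rho_AB /= rho_AB.sum() * (x[1] - x[0])          # normalize on the grid
```

The dense linear solve scales poorly with the number of grid points, which is a small-scale reminder of why this route is restricted to low-dimensional problems.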

#### *2.2. Transition Path Sampling (TPS)*

TPS has been developed in order to sample from the probability distribution of reactive trajectories in so-called "path space", which means nothing else than the space of all discrete or continuous paths starting in A and ending up in B equipped with the probability distribution generated by the dynamics through the ensemble of associated reactive trajectories. Let P<sub>T</sub> denote the path measure on the space of discrete or continuous trajectories {X<sub>t</sub>}<sub>0≤t≤T</sub> of length T. The *path measure of reactive trajectories* then is

$$P\_T^{AB}(\{X\_t\}\_{0 \le t \le T}) = \frac{1}{Z\_{AB}} \mathbf{1}\_A(X\_0) \, P\_T(\{X\_t\}\_{0 \le t \le T}) \, \mathbf{1}\_B(X\_T) \tag{4}$$

where **1**<sub>A</sub> denotes the indicator function of set A (that is, **1**<sub>A</sub>(x) = 1 if x ∈ A and **1**<sub>A</sub>(x) = 0 otherwise).

TPS is a Metropolis Monte-Carlo (MC) method for sampling P<sub>T</sub><sup>AB</sup>({X<sub>t</sub>}<sub>0≤t≤T</sub>) that uses explicit information regarding the path measure P<sub>T</sub>, such as Equation (5), with MC moves that are based on a perturbation of a precomputed reactive trajectory [3,24]. It delivers an ensemble of reactive trajectories of length T that (under the assumption of convergence of the MC scheme) is representative for P<sub>T</sub><sup>AB</sup> and thus allows one to compute respective expectation values, like the probability to observe a reactive trajectory or the reactive current. However, its potential drawbacks are obvious: (1) A typical reactive trajectory is very long and rather uninformative (*cf*. Figure 1), *i.e.*, the computational effort of generating an entire ensemble of long reactive trajectories can be prohibitive; (2) convergence of the MC scheme in the infinite dimensional path space can be very poor; and (3) the limitation to a pre-defined trajectory length T can lead to biased statistics of the TPS ensemble. Advanced TPS schemes try to remedy these drawbacks by combining the original TPS idea with interface methods [9]. Even though TPS can be used no matter whether the underlying dynamics is deterministic or stochastic, the algorithm is usually used in connection with deterministic Hamiltonian dynamics [3].
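To illustrate the idea of MC moves that perturb an existing reactive trajectory, the sketch below implements one particular move type, forward shooting, for the stochastic dynamics of Equation (2) in one dimension. This is a simplified assumption and not the full algorithm of [3,24]: the names `propagate` and `forward_shooting` are hypothetical, the initial path is an arbitrary straight line, and because forward shooting never changes the initial point, a practical scheme would combine it with backward shooting or shifting moves. For this move the Metropolis acceptance probability reduces to the indicator **1**<sub>B</sub>(X<sub>T</sub>), since the proposal density of the regenerated segment cancels against the path measure.

```python
import numpy as np

def propagate(x, n_steps, grad_V, eps, dt, rng):
    """Euler-Maruyama segment of Equation (2) in one dimension, starting at x."""
    out = np.empty(n_steps + 1)
    out[0] = x
    for k in range(n_steps):
        out[k + 1] = (out[k] - grad_V(out[k]) * dt
                      + np.sqrt(2.0 * eps * dt) * rng.standard_normal())
    return out

def forward_shooting(path, in_B, grad_V, eps, dt, n_moves, rng):
    """Metropolis chain on fixed-length paths using forward shooting moves only."""
    n = len(path) - 1
    samples = []
    for _ in range(n_moves):
        k = rng.integers(0, n)              # shooting index in 0, ..., n-1
        trial = path.copy()
        trial[k:] = propagate(path[k], n - k, grad_V, eps, dt, rng)
        if in_B(trial[-1]):                 # accept iff the new path ends in B
            path = trial
        samples.append(path)
    return samples

# Assumed example: double well, initial path a straight line from -1 to 1.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
init = np.linspace(-1.0, 1.0, 201)
paths = forward_shooting(init, lambda x: x >= 0.8, grad_V, eps=0.5, dt=0.01,
                         n_moves=50, rng=np.random.default_rng(0))
```

Every path in the returned list starts at the initial point in A and ends in B; rejected moves simply repeat the current path, as usual for Metropolis sampling.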

#### 3. Finding Transition Channels

Whenever a transition channel exists, one can try to approximate the center curve of the transition channel instead of sampling the ensemble of reactive trajectories. If the center curve (also: *principal curve*) is a rather smooth object, then such a method would not suffer from the extensive length of reactive trajectories. Several such methods have been introduced; they differ with respect to the definition of the transition channel and the corresponding center or principal curve.

#### *3.1. Action-Based Methods*

Rather than sampling the probability distribution of reactive pathways, such as Equation (4), one can try to obtain a representative or *dominant* pathway, e.g., by computing the pathway that has maximum probability under P<sub>T</sub>. For the case of diffusive molecular dynamics, the path measure P<sub>T</sub> has a probability density relative to a (fictitious) uniform measure on the space of all continuous paths in R<sup>n</sup> of length T that are generated by Brownian motion; the relative density reads

$$\ell(\varphi) = \exp\left(-\frac{1}{2\epsilon}I\_{\epsilon}(\varphi)\right).$$

where I<sub>ε</sub> is the Onsager–Machlup action

$$I\_{\epsilon}(\varphi) = \int\_{0}^{T} \left\{ \frac{1}{2} |\dot{\varphi}(s)|^{2} + \frac{1}{2} |\nabla V(\varphi(s))|^{2} - \epsilon \Delta V(\varphi(s)) \right\} ds \,. \tag{5}$$

More precisely, ℓ(ϕ) is the limiting ratio between the probability that the solution of Equation (2) remains in a small tubular neighborhood of a smooth path ϕ(·) and the probability that √(2ε) B<sub>t</sub> remains in a small neighborhood of the initial value x = ϕ(0), as the size of the neighborhoods goes to zero [25].

The fact that the Euler discretization of the path density ℓ, with I<sub>ε</sub> interpreted in the sense of Itô integrals, corresponds to the probability density of the Euler-discretized reaction path with respect to Lebesgue measure has led to the idea that by minimizing the Onsager–Machlup action over all continuous paths ϕ: [0, T] → R<sup>n</sup> going from A to B, one can find the dominant reactive path ϕ<sup>∗</sup> = argmin<sub>ϕ</sub> I<sub>ε</sub>(ϕ) in the sense of a maximum likelihood estimator. The hope is that this path, often also called the *optimal path* or *most probable path*, on the one hand, contains information on the transition mechanism and, on the other hand, is much smoother and easier to interpret than a typical reactive trajectory. Note, however, that the actual *probability* that the solution of Equation (2) remains in a small neighborhood of a given path ϕ(·) is exponentially small in the size of the neighborhood.
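Any numerical minimization of the action starts from a discretized version of Equation (5). The following sketch evaluates the action of a path given at equidistant times; the function name `onsager_machlup_action` and the midpoint quadrature are assumptions made for illustration (other discretizations, such as the Euler/Itô one mentioned above, differ in how the cross term is treated).

```python
import numpy as np

def onsager_machlup_action(phi, dt, grad_V, lap_V, eps):
    """Midpoint-rule discretization of Equation (5).

    phi    : path of shape (n + 1, d), sampled at time step dt
    grad_V : maps points of shape (n, d) to gradients of shape (n, d)
    lap_V  : maps points of shape (n, d) to Laplacians of shape (n,)
    """
    phi = np.asarray(phi, dtype=float)
    vel = np.diff(phi, axis=0) / dt            # finite-difference velocity
    mid = 0.5 * (phi[:-1] + phi[1:])           # midpoints of the segments
    integrand = (0.5 * np.sum(vel**2, axis=1)
                 + 0.5 * np.sum(grad_V(mid)**2, axis=1)
                 - eps * lap_V(mid))
    return dt * np.sum(integrand)

# Assumed example: harmonic potential V(x) = |x|^2 / 2 in one dimension.
grad_V = lambda x: x
lap_V = lambda x: np.full(len(x), float(x.shape[1]))
rest = np.zeros((101, 1))                      # path resting at the minimum, T = 1
print(onsager_machlup_action(rest, 0.01, grad_V, lap_V, eps=0.3))
```

For the path resting at the minimum only the Laplacian term contributes, so the action equals −εT; this negative value is one way to see that the finite-temperature action behaves differently from its zero temperature limit.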

In [7], a comparison between the Onsager–Machlup action and its zero temperature limit has been given using gradient descent methods, raising issues regarding the correct interpretation of the minimizers of I<sub>ε</sub> (that need not exist) as *most probable paths*. In [5], the *dominant reaction pathway method* has been outlined, which uses a simplified version of the Onsager–Machlup functional that leads to a computationally simpler optimization problem and is applicable to large-scale problems, e.g., protein folding [6]. However, even if the globally dominant pathways can be computed, such that the optimization does not get stuck in local minima, and even if we ignore the issues regarding the correct interpretation of minimizers, the resulting pathways in general do *not* allow one to gain statistical information on the transition (like rates, currents, mean first passage times).

Another action-based method that has been introduced in [26] is the *MaxFlux* method, which seeks the path that carries the highest reactive flux among all reactive trajectories of a certain length. The idea is to compute the path of least resistance by minimizing the functional

$$L(\varphi) = \int\_0^T \exp\left(\epsilon^{-1} V(\varphi(s))\right) ds \,.$$

Several algorithmic approaches for the minimization of the resistance functional L have been proposed, e.g., a path-based method [27], discretization of the corresponding Euler–Lagrange equation based on a mean-field approximation of it [28] or a Hamilton–Jacobi-based approach using the method of characteristics [29]. Minimizing L for different values of T then yields a collection of paths, each of which carries a certain percentage of the total reactive flux. The method is useful if the temperature is small, so that the reactive flux concentrates around a sufficiently small number of reactive pathways.

#### *3.2. String Method and Variants*

There are several other methods that entirely avoid the computation of reactive trajectories, but try to reconstruct the less complex transition channels or pathways instead, analyzing the energy landscape of the system. One group of such techniques, like the Zero Temperature String method [4], the Geometric Minimum Action method [30] or the Nudged Elastic Band method [31], concentrate on the computation of the *minimal energy path* (MEP), *i.e.*, the path of lowest potential energy between (a point in) A and (a point in) B. Under diffusive molecular dynamics and for vanishing temperature, the MEP is the path that transitions take with probability one [32]. It turns out that the MEP in this case is the minimizer of the Onsager–Machlup action (5) in the limit ε → 0. For non-zero temperature and a rugged energy landscape, the MEP will in general be not very informative and must be replaced by a finite-temperature transition channel. This is done by the finite-temperature string (FTS) method [33] based on the following considerations: Firstly, the isocommittor surfaces Γ<sub>α</sub>,

$$\rho\_{\alpha}(x) = \frac{1}{Z\_{\alpha}} q(x)(1 - q(x))\,\mu(x), \qquad Z\_{\alpha} = \int\_{\Gamma\_{\alpha}} q(x)(1 - q(x))\,\mu(x)d\sigma\_{\alpha}(x)$$

The idea of the FTS method is that the ensemble of reactive trajectories can be characterized by this distribution on the isocommittor surfaces. Third, one assumes that for each α, the probability density ρ<sub>α</sub> is peaked in just one point ϕ(α) and that the curve ϕ = ϕ(α), α ∈ [0, 1] defined by the sequence of these points forms the center of the (single) transition channel. More precisely, one defines ϕ(α) = ⟨x⟩<sub>Γ<sub>α</sub></sub> where the average is taken according to ρ<sub>α</sub> along the respective isocommittor surface Γ<sub>α</sub>. Fourth, it is assumed that the covariance C<sub>α</sub> = ⟨(x − ϕ(α)) ⊗ (x − ϕ(α))⟩<sub>Γ<sub>α</sub></sub>, which defines the width of the transition channel, is small, which implies that the isocommittor surfaces can be locally approximated by hyperplanes P<sub>α</sub>. The computation of the FTS string ϕ then is done by approximating it via ϕ(α) = ⟨x⟩<sub>P<sub>α</sub></sub>, where the average is computed by running constrained dynamics on P<sub>α</sub> while iteratively refining the hyperplanes P<sub>α</sub>; see [34] for details. Later extensions [35] remove the restrictions resulting from the hyperplanes by using Voronoi tessellations instead.

The FTS method allows one to compute single transition channels in rugged energy landscapes as long as these are not too extended and rugged. Compared to methods that sample the ensemble of reactive trajectories, it has the significant advantage that the string, that is, the principal curve inside the transition channel, is rather smooth and short, as compared to the typical reactive trajectories. The FTS further allows one to compute the free energy profile F = F(α) along the string,

$$F(\alpha) = -\beta^{-1} \log \int\_{P\_{\alpha}} \mu(x) d\sigma\_{\alpha}(x)$$

that characterizes the transition rates associated with the transition channel (at least in the limits of the approximations invoked by the FTS).

#### 4. Computing Transition Rates

The computation of transition rates can be performed without computing the dominant transition channels or similar objects. There is a list of rather general techniques, with Forward Flux Sampling (FFS) [8], Transition Interface Sampling (TIS) [9] and Milestoning [10] as examples, that approximate transition rates by exploring how the transition progresses from one to the next interface that separates A from B.

#### *4.1. Forward Flux Sampling (FFS)*

The first step of FFS is the choice of a finite sequence of interfaces I<sub>k</sub>, k = 1,...,N, in state space between A and B = I<sub>N</sub>. The transition rate k<sub>AB</sub> comes as the product of two factors: (1) the probability current J<sub>A</sub> of *all* trajectories leaving A and hitting I<sub>1</sub>; and (2) the probability

$$\mathbb{P}(B|I\_1) = \prod\_{k=1}^{N-1} \mathbb{P}(I\_{k+1}|I\_k)$$

that a trajectory that leaves I<sub>1</sub> makes it to B before it returns to A; here, ℙ(I<sub>k+1</sub>|I<sub>k</sub>) denotes the probability that a trajectory starting in I<sub>k</sub> makes it to I<sub>k+1</sub> before it returns to A. FFS first performs a brute-force simulation starting in A, which yields an ensemble of points at the first interface I<sub>1</sub>, yielding an estimate for the flux J<sub>A</sub> (the number of trajectories hitting I<sub>1</sub> per unit of time). Second, a point from this ensemble on I<sub>1</sub> is selected at random and used to start a trajectory, which is followed until it either hits the next interface I<sub>2</sub> or returns to A; this gives ℙ(I<sub>2</sub>|I<sub>1</sub>). This procedure then is iterated from interface to interface. Finally, the rate k<sub>AB</sub> = J<sub>A</sub> · ℙ(B|I<sub>1</sub>) is computed. Variants of this algorithm are described in [36,37], for example.
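The two stages described above can be sketched for the one-dimensional diffusion of Equation (2). This is a bare-bones illustration under several assumptions, not the implementation of [8] or its variants: the function name `ffs_rate`, the interface positions, the choice A = {x ≤ `lam_A`} and the double-well potential are all hypothetical, and the flux is counted per unit of total simulated time.

```python
import numpy as np

def ffs_rate(grad_V, lam_A, lams, eps, dt, n_cross, n_trials, rng):
    """Forward flux sampling in 1D with A = {x <= lam_A}, interfaces
    lams[0] < ... < lams[-1] and B = {x >= lams[-1]}."""
    sig = np.sqrt(2.0 * eps * dt)
    step = lambda x: x - grad_V(x) * dt + sig * rng.standard_normal()

    # Stage 1: brute-force run from A; store first crossings of I_1 after leaving A.
    x, from_A, n_steps, points = lam_A, True, 0, []
    while len(points) < n_cross:
        x = step(x)
        n_steps += 1
        if x <= lam_A:
            from_A = True
        elif from_A and x >= lams[0]:
            points.append(x)
            from_A = False
        if x >= lams[-1]:                    # reached B by chance: restart in A
            x, from_A = lam_A, True
    flux = n_cross / (n_steps * dt)

    # Stage 2: estimate P(I_{k+1} | I_k) from trial runs started on I_k.
    probs = []
    for lam_next in lams[1:]:
        successes = []
        for _ in range(n_trials):
            x = points[rng.integers(len(points))]
            while lam_A < x < lam_next:      # run until next interface or back in A
                x = step(x)
            if x >= lam_next:
                successes.append(x)
        probs.append(len(successes) / n_trials)
        if not successes:                    # no trajectory made it: rate estimate is 0
            return flux, probs, 0.0
        points = successes
    return flux, probs, flux * float(np.prod(probs))

# Assumed example: double well V(x) = (x^2 - 1)^2 with four interfaces.
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
flux, probs, rate = ffs_rate(grad_V, -1.0, [-0.7, -0.3, 0.1, 0.6], eps=0.3,
                             dt=2e-3, n_cross=50, n_trials=100,
                             rng=np.random.default_rng(0))
print(flux, probs, rate)
```

The small sample sizes keep the example fast; the resulting estimates are correspondingly noisy and only meant to show the bookkeeping.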

FFS has been demonstrated to be quite general in approximating the flux of reactive trajectories through a given set of interfaces; it can be applied to equilibrium, as well as nonequilibrium systems, and its implementation is easy (see [16,38]). The interfaces used in FFS are, in principle, arbitrary. However, the efficiency of the sampling of the reactive hitting probabilities ℙ(I<sub>k+1</sub>|I<sub>k</sub>) crucially depends on the choice of the interfaces. In practice, the efficiency of FFS will drop dramatically if one does not use appropriate surfaces, and totally misleading rates may result from this. Ideally, one would like to choose these surfaces so that the computational gain offered by FFS is optimized, but in practice, this is not a trivial task; see [39]. The same is true for TIS, which couples TPS with progressing from interface to interface.

#### *4.2. Milestoning*

Milestoning [10] is similar to FFS in so far as it also uses a set of interfaces I<sub>k</sub>, k = 1,...,N that separate A and B = I<sub>N</sub>. In contrast to FFS and TIS, the fundamental quantities in Milestoning are the hitting time distributions K<sub>i</sub><sup>±</sup>(τ), i = 1,...,N − 1, where K<sub>i</sub><sup>±</sup>(τ) is the probability that a trajectory starting at t = 0 at interface I<sub>i</sub> hits I<sub>i±1</sub> before time τ. Trajectories that make it to milestone I<sub>i</sub> must come from milestones I<sub>i±1</sub> and *vice versa*. In the original algorithm, these distributions are approximated as follows [10]: For each milestone I<sub>i</sub>, one first samples the distribution μ constrained to I<sub>i</sub>. Based on the resulting sample, we start a trajectory from each point, which is terminated when it reaches one of its two neighboring milestones I<sub>i±1</sub>. The hitting times are recorded and collected into two distributions K<sub>i</sub><sup>±</sup>(τ).

These local kinetics are then compiled into the global kinetics of the process: For each i, one defines P<sub>i</sub>(t) as the probability that the process is found between I<sub>i−1</sub> and I<sub>i+1</sub> at time t and that the last milestone hit was I<sub>i</sub>. Milestoning is based on a (non-Markovian) construction of P<sub>i</sub>(t) from the K<sub>i</sub><sup>±</sup>(τ). Its efficiency comes from two sources: (1) It does not require the computation of long reactive trajectories but only short ones between milestones (which therefore should be 'close enough'); (2) It is easily parallelizable. Its disadvantage is the dependence on the milestones that have to be chosen in advance: It can be shown that Milestoning with perfect sampling allows one to compute exact transition rates or mean first passage times if the interfaces are given by the isocommittor surfaces (which in general are not known in advance) [40]; if the interfaces are chosen inappropriately, the results can be rather misleading.

#### 5. Nonequilibrium Forcing and Jarzynski's Identity

The computation of reliable rare event statistics suffers from the enormous lengths of reactive trajectories. One obvious way to overcome this obstacle is to force the system to exhibit the transition of interest on shorter timescales. Therefore, can we *drive* the molecular system to make the required transition more frequently but still compute the exact rare event statistics for the unforced system?

As was shown by Jarzynski and others, nonequilibrium forcing can in fact be used to obtain equilibrium rare event statistics. The advantage seems to be that the external force can speed up the sampling of the rare events by biasing the equilibrium distribution towards a distribution under which the rare event is no longer rare. We will briefly review Jarzynski's identity before discussing the matter in more detail.

#### *5.1. Jarzynski's Identity*

Jarzynski's and Crooks' formulae [12,13] relate the equilibrium Helmholtz free energy to the nonequilibrium work exerted under external forcing: Given a system with energy landscape V(x), the total Helmholtz free energy can be defined as

$$F = -\beta^{-1} \log Z \quad \text{with} \quad Z = \int \exp(-\beta V(x)) dx \,.$$

Jarzynski's equality [12] then relates the free energy difference ΔF = −β<sup>−1</sup> log(Z<sub>1</sub>/Z<sub>0</sub>) between two equilibrium states of a system given by an unperturbed energy V<sub>0</sub> and its perturbation V<sub>1</sub> with the work W applied to the system under the perturbation: Suppose we set V<sub>ξ</sub> = (1 − ξ)V<sub>0</sub> + ξV<sub>1</sub> with ξ ∈ [0, 1], and assume we set a protocol that describes how the system evolves from ξ = 0 to ξ = 1. If, initially, the system is distributed according to exp(−βV<sub>0</sub>), then, by the second law of thermodynamics, it follows that **E**(W) ≥ ΔF where W is the total work applied to the system and **E** denotes the average over all possible realizations of the transition from ξ = 0 to ξ = 1; equality is attained if the transition is infinitely slow (*i.e.*, adiabatic). Jarzynski's identity now asserts that the free energy difference is always equal to the exponential average of the nonequilibrium work,

$$
\Delta F = -\beta^{-1} \log \mathbf{E} \left[ \exp(-\beta W) \right],
$$

arbitrarily far away from the adiabatic regime. Many generalizations exist: In [13], a generalized version of this fluctuation theorem, the so-called Crooks formula, for stochastic, microscopically reversible dynamics, is derived. In [41,42], it is shown how one can compute conditional free energy profiles along a reaction coordinate for the *unperturbed* system, rather than total free energy differences between a perturbed and unperturbed system.
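As a minimal numerical illustration of Jarzynski's identity, the following Python sketch considers the simplest conceivable protocol, an instantaneous switch from V<sub>0</sub> to V<sub>1</sub>, for which the work is W = V<sub>1</sub>(x) − V<sub>0</sub>(x) with x drawn from exp(−βV<sub>0</sub>). The harmonic potentials V<sub>0</sub>(x) = x<sup>2</sup>/2 and V<sub>1</sub>(x) = kx<sup>2</sup>/2, the parameter values and the function name are assumptions made for illustration only (they are not taken from [12,13]); for this choice, ΔF = (2β)<sup>−1</sup> log k is available in closed form.

```python
import math
import random

def jarzynski_instant_switch(k=4.0, beta=1.0, n_samples=200_000, seed=0):
    """Estimate Delta F for V0 = x^2/2 -> V1 = k*x^2/2 by instantaneous switching."""
    rng = random.Random(seed)
    std0 = 1.0 / math.sqrt(beta)          # exp(-beta*V0) is the Gaussian N(0, 1/beta)
    sum_exp = 0.0
    sum_work = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, std0)          # equilibrium sample of the unperturbed system
        work = 0.5 * (k - 1.0) * x * x    # W = V1(x) - V0(x)
        sum_exp += math.exp(-beta * work)
        sum_work += work
    delta_f = -math.log(sum_exp / n_samples) / beta   # exponential work average
    mean_work = sum_work / n_samples                  # plain work average
    return delta_f, mean_work

delta_f_est, mean_work = jarzynski_instant_switch()
delta_f_exact = 0.5 * math.log(4.0)       # (2*beta)^-1 * log(k) for beta = 1, k = 4
```

In this toy setting, the exponential average recovers ΔF, whereas the plain average of the work only provides the upper bound **E**(W) ≥ ΔF. The example is benign because the work distribution is narrow; it does not exhibit the variance problems discussed next.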

*Algorithmic application prohibitive.* Despite the fact that Jarzynski's and Crooks' formulae are used in molecular dynamics applications [43], their algorithmic usability is limited by the fact that the likelihood ratio between equilibrium and nonequilibrium trajectories is highly degenerate, and the overwhelming majority of nonequilibrium forcings generate trajectories that have almost zero weight with respect to the equilibrium distribution that is relevant for the rare event. As a consequence, most rare event sampling algorithms based on Jarzynski's identity have *prohibitively large variance*. Recent developments have reduced this problem by sampling just the reversible work processes based on Crooks' formula, but could not fully remove the problem of large variance [44]; see also [45]. We therefore address the problem of variance reduction in what follows.

#### *5.2. Cumulant Generating Functions*

In order to demonstrate how to improve approaches based on the idea of driving molecular systems to make rare events frequent, we first have to introduce some concepts and notation from statistical mechanics: Let W be a random variable that depends on the sample paths of (X<sub>t</sub>)<sub>t≥0</sub>, *i.e.*, on molecular dynamics trajectories of the system under investigation. Further, let P be the underlying probability measure on the space of continuous trajectories as introduced in Section 2.2 (but without the restriction to a given length T). We define the *cumulant generating function* (CGF) of W by

$$\gamma(\sigma) = -\sigma^{-1} \log \mathbf{E}[\exp(-\sigma W)] \tag{6}$$

where σ is a non-zero scalar parameter and **E**[f] = ∫ f dP denotes the expectation value with respect to P. Note that the CGF is basically the free energy at inverse temperature β as in Jarzynski's formula, but here, it is considered as a function of the independent parameter σ. (Definition (6) differs from the standard CGF only by the prefactor σ<sup>−1</sup> in front.) Taylor expanding the CGF about σ = 0, we observe that γ(σ) ≈ **E**[W] − (σ/2) **E**[(W − **E**[W])<sup>2</sup>]; hence, for sufficiently small σ, the variance is decoupled from the mean. Moreover, it follows by Jensen's inequality that

$$\gamma(\sigma) \le \mathbf{E}[W]$$

where equality is achieved if and only if W is almost surely constant, in accordance with the second law of thermodynamics. (This is the case, e.g., when W is the work associated with an adiabatic transition between thermodynamic equilibrium states.)
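The definition (6), the second-order expansion and the Jensen bound can be checked numerically on samples of W. The sketch below is an illustration under the assumption that W is Gaussian with mean μ and standard deviation s (a hypothetical choice, not one made in the text), in which case γ(σ) = μ − σs<sup>2</sup>/2 holds exactly; the function name `cgf` is ours.

```python
import math
import random

def cgf(samples, sigma):
    """Sample estimate of gamma(sigma) = -1/sigma * log E[exp(-sigma*W)].

    A log-sum-exp shift keeps the exponentials in a safe numerical range.
    """
    exponents = [-sigma * w for w in samples]
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(samples)
    return -(shift + math.log(mean_exp)) / sigma

rng = random.Random(1)
mu, s = 2.0, 1.0                                 # assumed Gaussian law of W
w_samples = [rng.gauss(mu, s) for _ in range(100_000)]
mean_w = sum(w_samples) / len(w_samples)

sigma = 0.5
gamma_est = cgf(w_samples, sigma)
gamma_exact = mu - 0.5 * sigma * s * s           # exact CGF of a Gaussian W
```

For σ > 0, the estimate never exceeds the sample mean of W, which is the empirical counterpart of the inequality γ(σ) ≤ **E**[W].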

#### *Optimal reweighting*.

The CGF admits a variational characterization in terms of relative entropies. To this end, let Q be another probability measure, so that P is absolutely continuous with respect to Q, *i.e.*, the likelihood ratio dP/dQ exists and is Q-integrable. Then, using Jensen's inequality again,

$$\begin{aligned} -\sigma^{-1}\log\int e^{-\sigma W} \,dP &= -\sigma^{-1}\log\int e^{-\sigma W + \log(\frac{dP}{dQ})} \,dQ\\ &\leq \int \left\{W + \sigma^{-1}\log\left(\frac{dQ}{dP}\right)\right\} \,dQ,\end{aligned}$$

which, noting that the logarithmic term is the relative entropy (or Kullback–Leibler divergence) between Q and P, can be recast as

$$\gamma(\sigma) \le \int W \, dQ + H(Q \| P) \tag{7}$$

where

$$H(Q \| P) = \sigma^{-1} \int \log\left(\frac{dQ}{dP}\right) dQ\,,\tag{8}$$

and we declare that H(Q‖P) = ∞ if Q does not have a density with respect to P. Again, it follows from the strict convexity of the exponential function that equality is achieved if and only if the new random variable

$$Z = W + \sigma^{-1} \log \left(\frac{dQ}{dP}\right)$$

is Q-almost surely constant. This gives us the following variational characterization of the cumulant generating function that is due to [46]:

*Variational formula for the cumulant generating function.*

Let W be bounded from above, with **E**[exp(−σW)] < ∞. Then

$$\gamma(\sigma) = \inf\_{Q \ll P} \left\{ \int W \, dQ + H(Q \| P) \right\} \tag{9}$$

where the infimum runs over all probability measures Q that have a density with respect to P. Moreover, the minimizer Q<sup>∗</sup> exists and is given by

$$dQ^\* = e^{\sigma\gamma(\sigma) - \sigma W} \, dP\,.$$
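The variational formula (9) can be verified directly when W takes only finitely many values, since all integrals then reduce to finite sums. The following sketch is such a check; the three-point distribution P, the values of W and the choice σ = 2 are arbitrary assumptions made for illustration. The minimizer is taken to be the exponentially tilted measure with density proportional to exp(−σW) with respect to P, normalized to total mass one.

```python
import math

def cgf_discrete(p, w, sigma):
    """gamma(sigma) = -1/sigma * log sum_i p_i exp(-sigma*w_i), cf. Equation (6)."""
    return -math.log(sum(pi * math.exp(-sigma * wi) for pi, wi in zip(p, w))) / sigma

def objective(q, p, w, sigma):
    """int W dQ + H(Q||P), where H carries the 1/sigma prefactor of Equation (8)."""
    mean_w = sum(qi * wi for qi, wi in zip(q, w))
    rel_ent = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)
    return mean_w + rel_ent / sigma

p = [0.7, 0.2, 0.1]            # reference measure P (assumed)
w = [3.0, 1.0, 0.2]            # values of the random variable W (assumed)
sigma = 2.0

gamma = cgf_discrete(p, w, sigma)

# exponentially tilted measure: q*_i proportional to p_i * exp(-sigma*w_i)
tilted = [pi * math.exp(-sigma * wi) for pi, wi in zip(p, w)]
norm = sum(tilted)
q_star = [t / norm for t in tilted]

value_at_q_star = objective(q_star, p, w, sigma)
value_at_p = objective(p, p, w, sigma)                    # Q = P yields E[W]
value_at_uniform = objective([1 / 3, 1 / 3, 1 / 3], p, w, sigma)
```

The objective evaluated at the tilted measure coincides with γ(σ) up to rounding, while the two other trial measures give strictly larger values, in line with (7) and (9).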

#### 6. Optimal Driving from Control Theory

When X<sub>t</sub> denotes stochastic dynamics, such as Equation (2), the above variational formula admits a nice interpretation in terms of an optimal control problem with a quadratic cost. To reveal it, we first need some technical assumptions.

(A1) We define Q = [0, T) × O where T ∈ [0, ∞] and O ⊂ **R**<sup>n</sup> is a bounded open set with smooth boundary ∂O. Further, let τ < ∞ be the stopping time

$$\tau = \inf\{t > t\_0 \colon (t, X\_t) \notin Q\},$$

*i.e.*, τ is the stopping time at which either t = T or X<sub>t</sub> leaves the set O, whichever comes first.

(A2) The random variable W is of the form

$$W = \frac{1}{\epsilon} \int\_0^\tau f(X\_t) \, dt + \frac{1}{\epsilon} g(X\_\tau)$$

for some continuous and nonnegative functions f, g : **R**<sup>n</sup> → **R**, which are bounded from above and at most polynomially growing in x (compare Jarzynski's formula).

(A3) The potential V : **R**<sup>n</sup> → **R** in Equation (2) is smooth, bounded below and satisfies the usual local Lipschitz and growth conditions.

We consider the conditioned version of the moment generating function (which is just the exponential of the cumulant generating function):

$$\psi\_{\sigma}(x,t) = \mathbf{E}[\exp(-\sigma W)|X\_t = x]. \tag{10}$$

By the Feynman–Kac theorem, ψ<sub>σ</sub> solves the linear boundary value problem

$$\begin{aligned} \left(\mathcal{A} - \frac{\sigma}{\epsilon} f\right) \psi\_{\sigma} &= 0\\ \psi\_{\sigma}|\_{E^{+}} &= \exp\left(-\frac{\sigma}{\epsilon} g\right) \end{aligned} \tag{11}$$

where E<sup>+</sup> is the terminal set of the augmented process (t, X<sub>t</sub>), precisely E<sup>+</sup> = ([0, T) × ∂O) ∪ ({T} × O), and

$$\mathcal{A} = \frac{\partial}{\partial t} + L$$

is the backward evolution operator associated with X<sub>t</sub>, with the shorthand

$$L = \epsilon \Delta - \nabla V \cdot \nabla$$

introduced in Equation (3). Assumptions (A1)–(A3) guarantee that Equation (11) has a unique smooth solution ψ<sub>σ</sub> for all σ > 0. Moreover, the stopping time τ is almost surely finite, which implies that

$$0 < c \le \psi\_{\sigma} \le 1$$

for some constant c ∈ (0, 1).

*Log transformation of the cumulant generating function.*

In order to arrive at the optimal control version of the variational formula (9), we introduce the logarithmic transformation of ψ<sub>σ</sub> as

$$v\_{\sigma}(x,t) = -\frac{\epsilon}{\sigma} \log \psi\_{\sigma}(x,t),$$

which is analogous to the CGF γ, except for the leading factor and the dependence on the initial condition x. As we will show below, v<sub>σ</sub> is related to an optimal control problem. To see this, remember that ψ<sub>σ</sub> is bounded away from zero and note that

$$-\frac{\epsilon}{\sigma} \psi\_{\sigma}^{-1} \mathcal{A} \psi\_{\sigma} = \mathcal{A} v\_{\sigma} - \sigma |\nabla v\_{\sigma}|^2 \,,$$

which implies that Equation (11) is equivalent to

$$\begin{aligned} \mathcal{A}v\_{\sigma} - \sigma |\nabla v\_{\sigma}|^2 + f &= 0 \\ v\_{\sigma}|\_{E^+} &= g \end{aligned}$$

Equivalently,

$$\begin{aligned} \min\_{\alpha \in \mathbb{R}^n} \{ \mathcal{A}v\_\sigma + \alpha \cdot \nabla v\_\sigma + \frac{1}{4\sigma} |\alpha|^2 + f \} &= 0 \\ v\_\sigma |\_{E^+} &= g \end{aligned} \tag{12}$$

where we have used that

$$-\sigma|y|^2 = \min\_{\alpha \in \mathbb{R}^n} \left\{ \alpha \cdot y + \frac{1}{4\sigma} |\alpha|^2 \right\} \ .$$

(For the general framework of change-of-measure techniques and Girsanov transformations and their relation to logarithmic transformations, we refer to [47] (Section VI.3).)
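For completeness, the pointwise minimization used above can be carried out by completing the square. The following short derivation is our own elementary verification; it assumes σ > 0 and an arbitrary vector y ∈ **R**<sup>n</sup>:

$$\alpha \cdot y + \frac{1}{4\sigma}|\alpha|^2 = \frac{1}{4\sigma}\left|\alpha + 2\sigma y\right|^2 - \sigma |y|^2 \;\ge\; -\sigma |y|^2\,,$$

with equality if and only if α = −2σy. Inserting y = ∇v<sub>σ</sub> shows that the quadratic term −σ|∇v<sub>σ</sub>|<sup>2</sup> is recovered as the minimum over α and that the minimum is attained at α = −2σ∇v<sub>σ</sub>.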

*Optimal control problem.* Equation (12) is a Hamilton–Jacobi–Bellman (HJB) equation and is recognized as the dynamic programming equation of the following optimal control problem: minimize

$$J(u) = \mathbf{E}\left[\int\_0^\tau \left\{ f(X\_t) + \frac{1}{4\sigma}|u\_t|^2 \right\} \, dt + g(X\_\tau) \right] \tag{13}$$

over a suitable space of admissible control functions u : [0, ∞) → **R**<sup>n</sup> and subject to the dynamics

$$dX\_t = \left( u\_t - \nabla V(X\_t) \right) dt + \sqrt{2\epsilon} dW\_t \,. \tag{14}$$

*Form of optimal control.* In more detail, one can show (see, e.g., [47] (Section IV.2)) that assumptions (A1)–(A3) above imply that Equation (12) has a classical solution (*i.e.*, twice differentiable in x, differentiable in t and continuous at the boundaries). Moreover, v<sub>σ</sub> satisfies

$$v\_{\sigma}(x,t) = \mathbf{E}\left[\int\_{t}^{\tau} \left\{ f(X\_s) + \frac{1}{4\sigma}|u\_s^\*|^2 \right\} \, ds + g(X\_\tau) \middle| X\_t = x \right] \tag{15}$$

where u<sup>∗</sup> is the unique minimizer of J(u) that is given by the Markovian feedback law

$$u\_t^\* = \alpha^\*(X\_t, t)\,,$$

with

$$\alpha^\* = \underset{\alpha \in \mathbb{R}^n}{\text{argmin}} \left\{ \alpha \cdot \nabla v\_{\sigma} + \frac{1}{4\sigma} |\alpha|^2 \right\}.$$

The function v<sub>σ</sub> is called the *value function* or *optimal-cost-to-go* for the optimal control problem (13)–(14). Specifically, v<sub>σ</sub>(x, t) measures the minimum cost needed to drive the system to the terminal state when started at x at time t. We briefly mention the two most relevant special cases of (13)–(14).

#### *6.1. Case I: The Exit Problem*

We want to consider the limit T → ∞. To this end, call τ<sub>O</sub> = inf{t > 0: X<sub>t</sub> ∉ O} the first exit time of the set O ⊂ **R**<sup>n</sup>. The stopping time τ = min{T, τ<sub>O</sub>} then converges to τ<sub>O</sub>, *i.e.*,

$$
\min\{T, \tau\_O\} \to \tau\_O.
$$

As a consequence (using monotone convergence), v<sub>σ</sub> converges to the value function of an optimal control problem with cost functional

$$J\_{\infty}(u) = \mathbf{E}\left[\int\_{0}^{\tau\_{O}} \left\{ f(X\_{t}) + \frac{1}{4\sigma}|u\_{t}|^{2} \right\} \, dt + g(X\_{\tau\_{O}}) \right]. \tag{16}$$

It can be shown that the value function

$$v\_{\sigma}(x,t) = \mathbf{E}\left[\int\_{t}^{\tau\_{O}} \left\{ f(X\_{s}) + \frac{1}{4\sigma}|u\_{s}^{\*}|^{2} \right\} \, ds + g(X\_{\tau\_{O}}) \, \middle| \, X\_{t} = x \right]$$

with u<sup>∗</sup> = argmin J<sub>∞</sub>(u) is *independent* of the initial time t; hence, we can drop the dependence on t and redefine v<sub>σ</sub>(x) := v<sub>σ</sub>(x, t). The value function now solves the boundary value HJB equation

$$\begin{aligned} \min\_{\alpha \in \mathbb{R}^n} \{ Lv\_{\sigma} + \alpha \cdot \nabla v\_{\sigma} + \frac{1}{4\sigma} |\alpha|^2 + f \} &= 0 \\ v\_{\sigma}|\_{\partial O} &= g \end{aligned} \tag{17}$$
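To make the relation between the linear problem (11) and the HJB equation (17) concrete, the sketch below treats a deliberately simple exit problem: O = (−1, 1), V = 0, f = ε and g = 0, all of which are assumptions chosen so that a closed-form solution is available. In this case, the stationary version of (11) reads εψ″ − σψ = 0 with ψ(±1) = 1, whose solution is ψ<sub>σ</sub>(x) = cosh(κx)/cosh(κ) with κ = (σ/ε)<sup>1/2</sup>. We solve the linear problem by finite differences, apply the log transformation and evaluate the residual of εv″ − σ|v′|<sup>2</sup> + f, *i.e.*, of Equation (17) after the minimization over α has been carried out.

```python
import math

def solve_exit_bvp(eps=0.5, sigma=1.0, n=200):
    """Finite differences for eps*psi'' - sigma*psi = 0 on (-1, 1), psi(-1) = psi(1) = 1."""
    h = 2.0 / n
    a = eps / h**2                      # off-diagonal entries of the tridiagonal matrix
    b = -2.0 * eps / h**2 - sigma       # diagonal entries
    m = n - 1                           # number of interior grid points
    rhs = [0.0] * m
    rhs[0] -= a * 1.0                   # contribution of the boundary value psi(-1) = 1
    rhs[-1] -= a * 1.0                  # contribution of the boundary value psi(+1) = 1
    # Thomas algorithm (forward elimination, back substitution)
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = a / b
    d_prime[0] = rhs[0] / b
    for i in range(1, m):
        denom = b - a * c_prime[i - 1]
        c_prime[i] = a / denom
        d_prime[i] = (rhs[i] - a * d_prime[i - 1]) / denom
    interior = [0.0] * m
    interior[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        interior[i] = d_prime[i] - c_prime[i] * interior[i + 1]
    x = [-1.0 + i * h for i in range(n + 1)]
    return x, [1.0] + interior + [1.0]

eps, sigma = 0.5, 1.0
x, psi = solve_exit_bvp(eps, sigma)
v = [-(eps / sigma) * math.log(p) for p in psi]          # log transformation

kappa = math.sqrt(sigma / eps)
psi_exact = [math.cosh(kappa * xi) / math.cosh(kappa) for xi in x]
max_err = max(abs(p - q) for p, q in zip(psi, psi_exact))

# residual of eps*v'' - sigma*|v'|^2 + f at one interior grid point (f = eps)
h = x[1] - x[0]
i = len(x) // 4
v2 = (v[i - 1] - 2.0 * v[i] + v[i + 1]) / h**2
v1 = (v[i + 1] - v[i - 1]) / (2.0 * h)
hjb_residual = eps * v2 - sigma * v1**2 + eps
```

The design choice here is to solve the *linear* equation and transform afterwards, which avoids any nonlinear iteration; this is feasible only in low dimensions, where grid-based discretizations are affordable.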

#### *6.2. Case II: Finite Time Horizon Optimal Control*

If we keep T < ∞ fixed while letting O grow, such that diam(O) → ∞, where diam(O) = sup{r > 0: B<sub>r</sub>(x) ⊂ O, x ∈ O} is understood as the maximum radius r > 0 that an open ball B<sub>r</sub>(·) contained in O can have, it follows that

$$\min\{T, \tau\_O\} \to T.$$

In this case, v<sub>σ</sub> converges to the value function with a finite time horizon and cost functional

$$J\_T(u) = \mathbf{E}\left[\int\_0^T \left\{ f(X\_t) + \frac{1}{4\sigma}|u\_t|^2 \right\} \, dt + g(X\_T) \right]. \tag{18}$$

Now, v<sub>σ</sub> is again a function on **R**<sup>n</sup> × [0, T] and given by

$$v\_{\sigma}(x,t) = \mathbf{E}\left[\int\_{t}^{T} \left\{ f(X\_s) + \frac{1}{4\sigma}|u\_s^\*|^2 \right\} \, ds + g(X\_T) \, \middle| \, X\_t = x \right],$$

with u<sup>∗</sup> being the minimizer of J<sub>T</sub>(u). The value function solves the backward evolution HJB equation

$$\begin{aligned} \min\_{\alpha \in \mathbb{R}^n} \{ \mathcal{A}v\_\sigma + \alpha \cdot \nabla v\_\sigma + \frac{1}{4\sigma} |\alpha|^2 + f \} &= 0 \\ v\_\sigma(x, T) &= g(x) \,, \end{aligned} \tag{19}$$

with a terminal condition at time t = T.

#### *6.3. Optimal Control Potential and Optimally Controlled Dynamics*

The optimal control u<sup>∗</sup> that minimizes the functional in Equation (13) is again of gradient form and given by

$$u\_t^\* = -2\sigma \nabla v\_\sigma(X\_t, t)\,,$$

as can be readily checked by minimizing the corresponding expression in Equation (12) over α. Given v<sub>σ</sub>, the *optimally controlled dynamics* reads

$$dX\_t = -\nabla U(X\_t, t)dt + \sqrt{2\epsilon}dW\_t\,,\tag{20}$$

with the *optimal control potential*

$$U(x,t) = V(x) + 2\sigma v\_{\sigma}(x,t) \,. \tag{21}$$

In the case when T → ∞ (Case I, above), the biasing potential is independent of t.

Remarks. Some remarks are in order.

(a) Monte-Carlo estimators of the conditional CGF

$$\gamma(\sigma; x) = -\sigma^{-1} \log \mathbf{E} [\exp(-\sigma W) | X\_0 = x]$$

that are based on the optimally controlled dynamics have zero variance. This is so because the optimal control minimizes the variational expression in Equation (9), but at the minimum, the random variable inside the expectation must be almost surely constant (as a consequence of Jensen's inequality and the strict convexity of the exponential function). Hence, we have a *zero-variance estimator* of the conditional CGF.


$$W\_{\xi} = \int\_{0}^{T} f(X\_t, \xi\_t) \, dt.$$

where f is the nonequilibrium force exerted on the system when driving it with some prescribed protocol ξ : [0, T] → **R**; in this case, the dynamics X<sub>t</sub> depend on ξ<sub>t</sub>, as well, and writing down the HJB equation according to Equation (19) is straightforward. However, even if we can solve Equation (19), we do not get zero-variance estimators for the free energy

$$F(\xi\_T) - F(\xi\_0) = -\beta^{-1} \log \mathbf{E}[\exp(-\beta W\_\xi)]\,.$$

The reason for this is simple: Jarzynski's formula requires that the initial conditions are chosen from an equilibrium distribution, say, π<sub>0</sub>, the equilibrium distribution corresponding to the initial value ξ<sub>0</sub> of the protocol, but optimal controls are defined point-wise for each state (t, X<sub>t</sub>) and

$$\begin{aligned} & -\beta^{-1} \log \int\_{\mathbb{R}^n} \mathbf{E} [\exp(-\beta W\_{\xi}) | X\_0 = x] \, d\pi\_0(x) \\ & \neq -\beta^{-1} \int\_{\mathbb{R}^n} \log \mathbf{E} [\exp(-\beta W\_{\xi}) | X\_0 = x] \, d\pi\_0(x) \, . \end{aligned}$$

In other words:

$$F(\xi\_T) - F(\xi\_0) \neq \int\_{\mathbb{R}^n} v\_\beta(x, 0) \, d\pi\_0(x) \, .$$

(d) A similar argument as the one underlying the derivation of the HJB equation from the linear boundary value problem yields that Jarzynski's formula can be interpreted as a two-player zero-sum differential game (*cf*. [50]).
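The zero-variance property stated in Remark (a) can be observed in a small simulation. The sketch below is an illustration under strong simplifying assumptions that are ours: one dimension, V = 0, O = (−1, 1), f = ε and g = 0, so that W = τ<sub>O</sub> and the moment generating function is known in closed form, ψ<sub>σ</sub>(x) = cosh(κx)/cosh(κ) with κ = (σ/ε)<sup>1/2</sup>. The optimal feedback then reads u<sup>∗</sup>(x) = −2σ∇v<sub>σ</sub>(x) = 2εκ tanh(κx). Trajectories are generated with the Euler–Maruyama scheme, and each controlled trajectory is reweighted with the likelihood ratio dP/dQ of the discrete-time transition densities; the function name and all parameter values are hypothetical.

```python
import math
import random

def exit_time_estimates(controlled, eps=0.5, sigma=1.0, x0=0.0,
                        dt=1e-3, n_traj=300, seed=2):
    """Mean and standard deviation of reweighted samples of exp(-sigma*tau_O)."""
    rng = random.Random(seed)
    kappa = math.sqrt(sigma / eps)
    noise = math.sqrt(2.0 * eps * dt)
    samples = []
    for _ in range(n_traj):
        x, t, log_weight = x0, 0.0, 0.0
        while abs(x) < 1.0:
            # optimal feedback u* = 2*eps*kappa*tanh(kappa*x); u = 0 is the original dynamics
            u = 2.0 * eps * kappa * math.tanh(kappa * x) if controlled else 0.0
            xi = rng.gauss(0.0, 1.0)
            # log of the one-step likelihood ratio dP/dQ for the Euler-Maruyama step
            log_weight -= u * xi * math.sqrt(dt / (2.0 * eps)) + u * u * dt / (4.0 * eps)
            x += u * dt + noise * xi
            t += dt
        samples.append(math.exp(-sigma * t + log_weight))
    mean = sum(samples) / n_traj
    var = sum((s - mean) ** 2 for s in samples) / (n_traj - 1)
    return mean, math.sqrt(var)

mean_plain, std_plain = exit_time_estimates(controlled=False)
mean_ctrl, std_ctrl = exit_time_estimates(controlled=True)
psi_exact = 1.0 / math.cosh(math.sqrt(1.0 / 0.5))    # cosh(kappa*x0)/cosh(kappa) at x0 = 0
```

Both estimators target the same quantity, but the samples obtained under the optimal control scatter far less than those of the uncontrolled dynamics. The variance does not vanish exactly because of the time discretization; the zero-variance statement refers to the continuous-time dynamics with the exact optimal control.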

#### 7. Characterize Rare Events by Optimally Controlled MD

Now, we illustrate how to use the results of the last section in practice. We will mainly consider the case discussed in Section 6.1 regarding the statistical characterization of hitting a certain set.

#### *7.1. First Passage Times*

Roughly speaking, the CGF encodes information about the moments of any random variable W that is a functional of the trajectories (X<sub>t</sub>)<sub>t≥0</sub>. For example, for f = ε and T → ∞, we obtain the CGF of the first exit time from O, *i.e.*,

$$-\sigma^{-1}\log\mathbf{E}\_x[\exp(-\sigma\tau\_O)] = \min\_u \mathbf{E}\_x^u \left[\tau\_O + \frac{1}{4\sigma} \int\_0^{\tau\_O} |u\_t|^2 \,dt\right]$$

where we have introduced the shorthand **E**<sub>x</sub>[·] = **E**[·|X<sub>0</sub> = x] to denote the conditional expectation when starting at X<sub>0</sub> = x and the superscript "u" to indicate that the expectation is understood with respect to the controlled dynamics

$$dX\_t = \left(u\_t - \nabla V(X\_t)\right)dt + \sqrt{2\epsilon}dW\_t$$

where **E** = **E**<sup>0</sup> denotes expectation with respect to the unperturbed dynamics.

#### *7.2. Committor Probabilities Revisited*

It is not only possible to use the moment generating function to collect statistics about rare events in terms of the cumulant generating function, but also to express the committor function directly in terms of an optimal control problem (see Section 2.1 for the definition of the committor q<sub>AB</sub> between two sets A and B). To this end, let σ = 1, and suppose we divide ∂O into two sets B ⊂ ∂O and A = ∂O \ B (*i.e.*, τ<sub>O</sub> is the stopping time that is defined by hitting either A or B). Setting

$$f = 0 \quad \text{and} \quad g(x) = -\epsilon \log \mathbf{1}\_B(x)$$

reduces the moment generating function (10) to

$$\psi\_1(x) = \mathbf{E}\_x[\mathbf{1}\_B(X\_{\tau\_O})]$$

or, in more familiar terms,

$$\psi\_1(x) = \mathbf{P}[X\_{\tau\_O} \in B \land X\_{\tau\_O} \notin A | X\_0 = x] = q\_{AB}(x)\,.$$

According to Equation (16) the corresponding optimal control problem has the cost functional

$$J(u) = \mathbf{E}\left[\frac{1}{4}\int\_0^{\tau\_O} |u\_s|^2 \, ds - \epsilon \log \mathbf{1}\_B(X\_{\tau\_O})\right],$$

which amounts to a control problem with zero terminal cost when ending up in B and an infinite terminal cost for hitting A. Therefore, the HJB equation for v = v<sub>1</sub> has a singular boundary value at A; it reads

$$\begin{aligned} \min\_{\alpha \in \mathbb{R}^n} \{ Lv + \alpha \cdot \nabla v + \frac{1}{4} |\alpha|^2 \} &= 0 \\ v|\_A &= \infty, \quad v|\_B = 0\,. \end{aligned}$$

Setting v(x) = −ε log q<sub>AB</sub>(x) yields the equality

$$-\log q\_{AB}(x) = \min\_{u} \mathbf{E}\_x^u \left[ \frac{1}{4\epsilon} \int\_0^{\tau\_O} |u\_s|^2 ds - \log \mathbf{1}\_B(X\_{\tau\_O}) \right].$$

In this case, the optimally controlled dynamics (20) is of the form

$$dX\_t = -\nabla U\_{AB}(X\_t)dt + \sqrt{2\epsilon}dW\_t\,,$$

with optimal control potential

$$U\_{AB}(x) = V(x) - 2\epsilon \log q\_{AB}(x) \, .$$

Remarks. Some remarks on the committor equation follow:


$$\exp(-\beta U\_{AB}(x)) = q\_{AB}^2(x) \exp(-\beta V(x))$$

where we used β = 1/ε.
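In one dimension, the committor and the associated optimal control potential can be written down by quadrature, which allows a quick check of the relations above. The sketch assumes a double-well potential V(x) = (x<sup>2</sup> − 1)<sup>2</sup> with A = {−1} and B = {1}, a hypothetical example that is not one of the test systems of this article. For this setting, Lq = 0 with q(−1) = 0 and q(1) = 1 is solved by q<sub>AB</sub>(x) = ∫<sub>−1</sub><sup>x</sup> exp(V/ε) ds / ∫<sub>−1</sub><sup>1</sup> exp(V/ε) ds.

```python
import math

def committor_1d(potential, a, b, eps, n=2000):
    """Committor on [a, b] with A = {a}, B = {b}, by trapezoidal quadrature of exp(V/eps)."""
    h = (b - a) / n
    x = [a + i * h for i in range(n + 1)]
    w = [math.exp(potential(xi) / eps) for xi in x]
    cumulative = [0.0]
    for i in range(n):
        cumulative.append(cumulative[-1] + 0.5 * h * (w[i] + w[i + 1]))
    q = [c / cumulative[-1] for c in cumulative]
    return x, q

def double_well(x):
    return (x * x - 1.0) ** 2

eps = 0.25
x, q = committor_1d(double_well, -1.0, 1.0, eps)

# optimal control potential U_AB = V - 2*eps*log(q_AB), evaluated away from A,
# where q_AB = 0 and U_AB is singular
u_ab = [double_well(xi) - 2.0 * eps * math.log(qi) for xi, qi in zip(x[1:], q[1:])]

# check exp(-U_AB/eps) = q_AB^2 * exp(-V/eps) at an interior grid point
k = 500
lhs = math.exp(-u_ab[k] / eps)
rhs = q[k + 1] ** 2 * math.exp(-double_well(x[k + 1]) / eps)
```

The logarithmic singularity of U<sub>AB</sub> at A acts as a repulsive wall, so that trajectories of the optimally controlled dynamics are pushed away from A and towards B.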

#### *7.3. Algorithmic Realization*

For the exit problem ("Case I", above), an efficient algorithm for computing the conditional CGF γ(σ; x) or, equivalently, the value function v<sub>σ</sub>(x) can be found in [52]. The idea of the algorithm is to exploit that, according to Equations (20) and (21), the optimal control is of gradient form. The latter implies that the value function can be represented as a minimization of the cost functional over time-homogeneous candidate functions C for the optimal bias potential, in other words,

$$v\_{\sigma}(x) = \min\_{C} \mathbf{E}\_{x} \left[ \int\_{0}^{\tau\_{O}} \left\{ f(X\_{t}) + \frac{1}{4\sigma} |\nabla C(X\_{t})|^{2} \right\} \, dt + g(X\_{\tau\_{O}}) \right] \tag{22}$$

where the expectation **E** is understood with respect to the path measure generated by

$$dX\_t = -\left(\nabla C(X\_t) + \nabla V(X\_t)\right)dt + \sqrt{2\epsilon}dW\_t\,.$$

Once the optimal C has been computed, both value function and CGF can be recovered by setting

$$v\_{\sigma}(x) = -\frac{C(x)}{2\sigma} \quad \text{and} \quad \gamma(\sigma; x) = -\frac{C(x)}{2\epsilon\sigma}.$$

The algorithm that finds the optimal C works by iteratively minimizing the cost functional for potentials C from a finite-dimensional ansatz space, *i.e.*,

$$C(x) = \sum\_{j=1}^{M} a\_j \varphi\_j(x)\,,$$

with appropriately chosen ansatz functions ϕ<sub>j</sub>. The iterative minimization is then carried out on the M-dimensional coefficient space of the a<sub>1</sub>, ..., a<sub>M</sub>. With this algorithm, we are able to compute the optimal control potential for the exit problem in the two interesting cases: first passage times and committor probabilities (as outlined in Sections 7.1 and 7.2).
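The following sketch illustrates only the finite-dimensional ansatz, not the algorithm of [52] itself: it builds a one-dimensional bias potential C from Gaussian ansatz functions, provides its gradient (the control force entering the dynamics is −∇C) and evaluates the running cost that appears under the integral in Equation (22). The Gaussian form with a common width, the coefficient values and all function names are assumptions made for illustration.

```python
import math

def make_bias(coeffs, centers, width):
    """Return C(x) = sum_j a_j*phi_j(x) and its derivative for Gaussian phi_j (1D)."""
    def phi(x, c):
        return math.exp(-((x - c) ** 2) / (2.0 * width ** 2))

    def bias(x):
        return sum(a * phi(x, c) for a, c in zip(coeffs, centers))

    def bias_grad(x):
        return sum(-a * (x - c) / width ** 2 * phi(x, c) for a, c in zip(coeffs, centers))

    return bias, bias_grad

def running_cost(x, bias_grad, f_value, sigma):
    """Integrand of the cost functional (22): f + |grad C|^2 / (4*sigma)."""
    return f_value + bias_grad(x) ** 2 / (4.0 * sigma)

# hypothetical coefficients a_j and centers of the ansatz functions
bias, bias_grad = make_bias(coeffs=[0.5, -1.0, 0.3], centers=[-1.0, 0.0, 1.5], width=0.4)

x0, h = 0.2, 1e-6
fd_grad = (bias(x0 + h) - bias(x0 - h)) / (2.0 * h)    # finite-difference check
cost_at_x0 = running_cost(x0, bias_grad, f_value=0.1, sigma=1.0)
```

Since C depends linearly on the coefficients, ∇C is linear and the control penalty is quadratic in (a<sub>1</sub>, ..., a<sub>M</sub>); the dependence of the full cost functional on the coefficients is nevertheless nonlinear, because the path measure in (22) changes with C.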

Remarks. Let us briefly comment on some aspects of the gradient descent algorithm.


#### *7.4. Numerical Examples*

In our first example, we consider diffusive molecular dynamics as in Equation (2) with ε = 0.1 and V being the five-well potential shown in Figure 2. We first compute the CGF of the first passage time as discussed in Section 7.1, using the gradient descent algorithm described in Section 7.3 with 10 Gaussian ansatz functions that are centered around the critical points of the potential energy function. The resulting optimal control potential (21) after roughly 20 iterations of the gradient descent is displayed in Figure 2 for different values of σ. As the set O, we take the whole state space, except a small neighborhood of the global minimum of V, so that its complement O<sup>c</sup> is identical to the vicinity of the global minimum and the exit time τ<sub>O</sub> is the first passage time to O<sup>c</sup>. Figure 2 shows that the optimal control potential alters the original potential V significantly in the sense that for σ > 0, the set O<sup>c</sup> is the bottom of the only well of the potential, so that all trajectories starting somewhere else will quickly enter O<sup>c</sup>.

Figure 2. Five-well potential (left) and associated optimal control potential for the first passage time to the target set O<sup>c</sup> given by a small interval around the main minimum x<sub>1</sub> for different values of σ (right), with ε = 0.1; the gradient descent solution fully agrees with the reference finite element solution (that is not shown) in the "eye-norm".

This case is instructive: For the unperturbed original dynamics, the mean first passage time **E**<sub>x</sub>(τ<sub>O</sub>) takes values of around 10<sup>4</sup> for x > −2. For the optimally controlled dynamics, the mean first passage times into O<sup>c</sup> are less than five for σ = 0.1, 0.5, 1.0, so that the estimation of **E**<sub>x</sub>(τ<sub>O</sub>) resulting from the optimal control approach requires trajectories that are a factor of at least 10<sup>3</sup> shorter than the ones we would have to use by direct numerical simulation of the unperturbed dynamics.

Figure 3 shows the optimal control potentials for computation of the committor q<sub>AB</sub>, as described in Section 7.2. We observe that the optimal control potential exhibits a singularity at the boundary of the basin of attraction of the set A. That is, it prevents the optimally controlled dynamics from entering the basin of attraction of A and, thus, avoids the waste of computational effort by unproductive returns to A.

In our second example, we consider two-dimensional diffusive molecular dynamics as in Equation (2) with the energy landscape V being the three-well potential shown in Figure 1. In Figure 4, the optimal control potentials for computing the committors q<sub>AB</sub> between the two main wells for two different temperatures, ε = 0.15 and ε = 0.6, are displayed. The numerical solution is based on a Galerkin approximation of the log-transformed HJB equation, using precomputed committor functions as the basis set; see [60] for details.

Figure 3. Optimally-corrected potential for the case of J being the committor q<sub>AB</sub> for B being the ±0.1-interval around the main minimum x<sub>1</sub> of the potential. (Left panel) A = ]x<sub>3</sub> − 0.1, x<sub>3</sub> + 0.1[, the ±0.1 interval around the highest minimum x<sub>3</sub>. (Right panel) A = ]x<sub>2</sub> − 0.1, x<sub>2</sub> + 0.1[, the ±0.1 interval around the second lowest minimum x<sub>2</sub>.

As in our former experiment, we observe that the optimal control potential prevents the dynamics from returning to A; in addition, it flattens the third well significantly, such that the optimally controlled dynamics in any case quickly goes into B. For ε = 0.15, a TPS sampling of reactive trajectories between the two main wells, precisely from A to B with A and B, as indicated in Figure 4, results in an average length of 367 for reactive trajectories based on the original dynamics. For the optimally controlled dynamics, we found an average length of 1.3.

Figure 4. Optimally-corrected potential for the three-well potential shown in Figure 1 for the committor q<sub>AB</sub> for the medium temperature ε = 0.6 case (left), the low temperature ε = 0.15 case (right) and for the sets A (ellipse in main well, right-hand side) and B (ellipse in main well, left-hand side). Note that the committor basis is not smooth at the boundaries of the initial and target sets (see Figure 1 for comparison), which explains the roughness of the control potential in the neighborhood of the sets A and B.

#### 8. Conclusions

We have surveyed various techniques for the characterization and computation of rare events occurring in molecular dynamics. Roughly, the approaches fall into two categories: (a) methods that approach the problem by characterizing the ensemble of reactive trajectories between metastable states or (b) path-based methods that target dominant transition channels or pathways by minimization of suitable action functionals. Methods of the first type, e.g., Transition Path Theory, Transition Path Sampling, Milestoning or variants thereof, are predominantly Monte-Carlo-type methods for generating one very long or many short trajectories, from which the rare event statistics can then be estimated. Methods that belong to the second category, e.g., MaxFlux, Nudged-Elastic Band or the String Method, are basically optimization methods (sometimes combined with a Monte-Carlo scheme); here, the objectives are few (single or multiple) smooth pathways that describe, e.g., a transition event. It is clear that this classification is not completely unambiguous, in that action-based methods for computing most probable pathways can be also used to sample an ensemble of reactive trajectories. Another possible classification (with its own drawbacks) is along the lines of the biased-unbiased dichotomy that distinguishes between methods that characterize rare events based on the original dynamics and methods that bias the underlying equilibrium distribution towards a new probability distribution under which the rare events are no longer rare. Typical representatives of the second class range from biasing force methods, such as ABF or metadynamics, up to genuine nonequilibrium approaches based on Jarzynski's identity for computing free energy profiles. The problem often is that rare event estimators based on an ensemble of nonequilibrium trajectories suffer from large variances, unless the bias is cleverly chosen.

We have described a strategy to find such a cleverly chosen perturbation, based on ideas from optimal control. The idea rests on the fact that the cumulant generating function of a certain observable, e.g., the first exit time from a metastable set, can be expressed as the solution to an optimal control problem, which yields a zero variance estimator for the cumulant generating function. The control acting on the system has essentially two effects: (1) Under the controlled dynamics, the rare events are no longer rare, as a consequence of which the simulations become much shorter; (2) The variance of the statistical estimators is small (or even zero if the optimal control is known exactly). We should stress that, depending on the type of observable, the approach only appears to be a nonequilibrium method, for the optimal control is an exact gradient of a biasing potential; hence, the optimally perturbed system satisfies detailed balance, which is one criterion for thermodynamic equilibrium. Future research should address the question as to whether the approach is competitive for realistic molecular systems, how to efficiently and robustly extract information about specific moments rather than cumulant generating functions and how to extend it to more general observables or the calculation of free energy profiles.

#### Acknowledgments

The authors are grateful to Eric Vanden-Eijnden, Giovanni Ciccotti, Frank Pinski and Christoph Dellago for valuable discussions and comments. Ralf Banisch and Tomasz Badowski hold scholarships from the Berlin Mathematical School (BMS). This work was supported by the DFG Research Center "Mathematics for key technologies" (MATHEON) in Berlin.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References




Reprinted from *Entropy*. Cite as: Sarich, M.; Banisch, R.; Hartmann, C.; Schütte, C. Markov State Models for Rare Events in Molecular Dynamics. *Entropy* 2014, *16*, 258–286.

*Article*

## Markov State Models for Rare Events in Molecular Dynamics

Marco Sarich <sup>1,</sup>\*, Ralf Banisch <sup>1</sup>, Carsten Hartmann <sup>1</sup> and Christof Schütte <sup>1,2</sup>

<sup>1</sup> Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 6, Berlin 14195, Germany; E-Mails: ralf.banisch@fu-berlin.de (R.B.); chartman@mi.fu-berlin.de (C.H.); christof.schuette@fu-berlin.de (C.S.)

<sup>2</sup> Zuse Institute Berlin, Takustr. 7, Berlin 14195, Germany

\* Author to whom correspondence should be addressed; E-Mail: sarich@math.fu-berlin.de; Tel.: +49-30-838-75-322.

*Received: 18 September 2013; in revised form: 3 December 2013 / Accepted: 9 December 2013 / Published: 30 December 2013*

Abstract: Rare, but important, transition events between long-lived states are a key feature of many molecular systems. In many cases, the computation of rare event statistics by direct molecular dynamics (MD) simulations is infeasible, even on the most powerful computers, because of the immensely long simulation timescales needed. Recently, a technique for spatial discretization of the molecular state space designed to help overcome such problems, so-called Markov State Models (MSMs), has attracted a lot of attention. We review the theoretical background and algorithmic realization of MSMs and illustrate their use by some numerical examples. Furthermore, we introduce a novel approach to using MSMs for the efficient solution of optimal control problems that appear in applications where one desires to optimize molecular properties by means of external controls.

Keywords: rare events; Markov State Models; long timescales; optimal control

#### 1. Introduction

Stochastic processes are widely used to model physical, chemical or biological systems. The goal is to approximately compute interesting properties of the system by analyzing the stochastic model. As soon as randomness is involved, there are mainly two options for performing this analysis: (1) Direct sampling and (2) the construction of a discrete coarse-grained model of the system. In a direct sampling approach, one tries to generate a statistically significant number of events that characterize the property of the system in which one is interested. For this purpose, computer simulations of the model are a powerful tool. For example, an event could refer to the transition between two well-defined macroscopic states of the system. In chemical applications, such transitions can often be interpreted as reactions or, in the context of a molecular system, as conformational changes. Interesting properties are, e.g., average waiting times for such reactions or conformational changes and the pathways along which the transitions typically occur. The problem with a direct sampling approach is that many interesting events are so-called rare events. Therefore, the computational effort for generating sufficient statistics for reliable estimates is very high, and particularly if the state space is continuous and high dimensional, estimation by direct numerical simulation is infeasible.

Available techniques for rare event simulations in continuous state space are discussed in [1]. In this article, we will discuss approach (2) to the estimation of rare event statistics via *discretization* of the state space of the system under consideration. That is, instead of dealing with the computation of rare events for the original, continuous process, we will approximate them by a so-called Markov State Model (MSM) with *discrete* finite state space. The reason is that for such a discrete model, one can numerically compute many interesting properties without simulation, mostly by solving linear systems of equations as in discrete transition path theory (TPT) [2]. We will see that this approach, called Markov State Modeling, *avoids* the combinatorial explosion of the number of discretization elements with the increasing size of the molecular system in contrast to other methods for spatial discretization.
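To indicate what "solving linear systems of equations" means for a discrete model, the sketch below computes the forward committor of a small Markov chain: q<sub>i</sub> = 0 for states in A, q<sub>i</sub> = 1 for states in B and q<sub>i</sub> = Σ<sub>j</sub> P<sub>ij</sub> q<sub>j</sub> otherwise. The five-state transition matrix is a hypothetical toy model, and the Gauss–Seidel iteration is merely one convenient, dependency-free way of solving the linear system; any linear solver could be used instead.

```python
def discrete_committor(P, set_a, set_b, n_sweeps=200):
    """Forward committor of a Markov chain with row-stochastic transition matrix P."""
    n = len(P)
    q = [1.0 if i in set_b else 0.0 for i in range(n)]
    interior = [i for i in range(n) if i not in set_a and i not in set_b]
    for _ in range(n_sweeps):
        for i in interior:
            # Gauss-Seidel update: solve q_i = sum_j P_ij*q_j for q_i
            off_diag = sum(P[i][j] * q[j] for j in range(n) if j != i)
            q[i] = off_diag / (1.0 - P[i][i])
    return q

# assumed toy model: lazy nearest-neighbour random walk on five states
P = [
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.50, 0.50],
]
q = discrete_committor(P, set_a={0}, set_b={4})
```

For this symmetric walk, the committor increases linearly from A to B, which provides a simple sanity check of the solver.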

The actual construction of an MSM requires one to sample certain transition probabilities of the underlying dynamics between sets. The idea is: (1) to choose the sets such that the sampling effort is much lower than the direct estimation of the rare events under consideration; and (2) to compute all interesting quantities for the MSM from its transition matrix, *cf*. [2,3]. There are many examples of the successful application of this strategy. In [4], for example, it was used to compute dominant folding pathways for the PinWW domain in explicit solvent. However, we have to make sure that the Markov State Model approximates the original dynamics well enough. For example, the MSM should correctly reproduce the timescales of the processes of interest. These approximation issues have been discussed for more than a decade now [5,6]; in this article, we will review the present state of research on this topic. In the algorithmic realization of Markov State Modeling for realistic molecular systems, the transition probabilities and the respective statistical uncertainties are estimated from short molecular dynamics (MD) trajectories only, *cf*. [7]. This makes Markov State Modeling applicable to many different molecular systems and processes, *cf*. [8–13].

In the first part of this article, we will discuss the approximation quality of two different types of Markov State Models that are defined with respect to a full partition of state space or with respect to so-called core sets. We will also discuss the algorithmic realization of MSMs and provide references to the many realistic applications to molecular systems in equilibrium that are available in the literature today.

The second part will show how to use MSMs for optimizing particular molecular properties. In this type of application, one wants to steer the molecular system at hand by external controls in a way such that a pre-selected molecular property is optimized (minimized or maximized). That is, one wants to compute a specific external control from a family of admissible controls that optimizes the property of interest under certain side conditions. The property to be optimized can be quite diverse: For example, it can be (1) the population of a certain conformation that one wants to maximize under a side condition that limits the total work done by the external control or (2) the mean first passage time to a certain conformation that one wants to minimize (in order to speed up a rare event), but under the condition that one can still safely estimate the mean first passage time of the uncontrolled system. The theoretical background of case (1) has been considered in [14], for example, and of case (2) in [1,15]. There, one finds the mathematical problem that has to be solved in order to compute the optimal control. Here, we will demonstrate that one can use MSMs for the efficient solution of such a mathematical problem (for both cases). We will see that the spatial discretization underlying an MSM turns the high-dimensional continuous optimal control problem into a rather low-dimensional discrete optimal control problem of the same form that can be solved efficiently. Based on these insights, MSM discretization yields an efficient algorithm for solving the optimal control problem, whose performance we will outline in some numerical examples, including an application to alanine dipeptide.

#### 2. MSM Construction

Let $(X_t)_{t \ge 0}$ be a time-continuous Markov process on a continuous state space, $E$, e.g., $E \subset \mathbb{R}^d$. That is, $X_t$ is the state of the molecular system at time $t$ resulting from any commonly used form of molecular dynamics simulation, be it based on Newtonian dynamics with thermostats or resulting from Langevin dynamics or other diffusion molecular dynamics models. The idea of Markov State Modeling is to derive a Markov chain, $(\hat{X}_k)_{k \in \mathbb{N}}$, on a finite and preferably small state space $\hat{E} = \{1, ..., n\}$ that models characteristic dynamics of the continuous process, $(X_t)$. For example, in molecular dynamics applications, such characteristic dynamics could refer to protein folding processes [16,17], conformational rearrangements between native protein substates [18,19], or ligand binding processes [20]. Since the approximating Markov chain, $(\hat{X}_k)_{k \in \mathbb{N}}$, lives on a finite state space, the construction of an MSM boils down to the computation of its transition matrix, $P$:

$$P\_{ij} = \mathbb{P}[\hat{X}\_{k+1} = j | \hat{X}\_k = i] \tag{1}$$

The main benefit is that for a finite Markov chain, one can compute many interesting dynamical properties directly from its transition matrix, e.g., timescales and the metastability in the system [5,21,22], a hierarchy of important transition pathways [2] or mean first passage times between selected states. For an MSM, these computations should then be used to answer related questions for the original continuous process. To do this, we must be able to link the states of the Markov chain back to the spatial information of the original process, and the approximation of the process $(X_t)$ by the MSM must be valid in some sense.
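As a minimal sketch of what computing a property "directly from the transition matrix" can look like, the following pure-Python example obtains mean first passage times to a target state by solving the linear system $m_i = 1 + \sum_k P_{ik} m_k$ for all non-target states $i$, with $m = 0$ on the target. The 3-state matrix `P_example` and the helper names `solve_linear` and `mean_first_passage_steps` are illustrative assumptions and are not taken from the cited references.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x


def mean_first_passage_steps(P, target):
    """Expected number of chain steps needed to reach `target` from each state."""
    n = len(P)
    others = [i for i in range(n) if i != target]
    # Linear system (I - P restricted to the non-target states) m = 1.
    A = [[(1.0 if i == k else 0.0) - P[i][k] for k in others] for i in others]
    solution = solve_linear(A, [1.0] * len(others))
    m = [0.0] * n
    for idx, i in enumerate(others):
        m[i] = solution[idx]
    return m


# Hypothetical 3-state transition matrix (each row sums to one).
P_example = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]
mfpt = mean_first_passage_steps(P_example, target=2)
print(mfpt)
```

The result is measured in chain steps; multiplying by the lag time of the MSM would convert it into a physical time.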

Having this in mind, the first natural idea is to let the states of an MSM correspond to sets $A_1, ..., A_n \subset E$ in continuous state space that form a full partition, *i.e.*:

$$A\_i \cap A\_j = \emptyset \text{ for } i \neq j, \qquad \bigcup\_{i=1}^n A\_i = E \tag{2}$$

Typical choices for such sets are box discretizations or Voronoi tessellations [23]. For such a full partition, it is trivial to also define a corresponding discretized process by the original switching dynamics between the sets. For a given lag time, τ > 0, we can define the index process:

$$
\tilde{X}\_k = i \Leftrightarrow X\_{k\tau} \in A\_i \tag{3}
$$

It is well known that this process is not Markovian, mainly due to the so-called recrossing problem. This refers to the fact that the original process typically crosses the boundary between two sets, $A_i$ and $A_j$, several times when transitions take place, as illustrated in Figure 1. This results in cumulative transitions between indices $i$ and $j$ for the index process, that is, in a transition behavior that is not memoryless.

Figure 1. Cumulative transitions between two sets along boundaries are typical.

The non-Markovianity of the index process is often seen as a problem in Markov State Modeling, because many arguments assume that $(\tilde{X}_k)$ is a Markov process. In this article, we will *not* make this assumption. We interpret the process $(\tilde{X}_k)$ as a tool to construct the following transition matrix, $P^\tau$:

$$P\_{ij}^{\tau} = \mathbb{P}[\tilde{X}\_{k+1} = j | \tilde{X}\_k = i] = \mathbb{P}[X\_{(k+1)\tau} \in A\_j | X\_{k\tau} \in A\_i] \tag{4}$$

and, hence, the MSM as the Markov chain, $(\hat{X}_k)_{k \in \mathbb{N}}$, associated with this transition matrix. From above, it is clear that, in general, we have $\hat{X}_k \neq \tilde{X}_k$, and in [24] it was analyzed how these two processes relate in terms of density propagation. In the following, we will show under which assumptions and in which sense the MSM $(\hat{X}_k)$ will be a good approximation of the original dynamics given by $(X_t)$. For convenience, we will usually write $P^\tau \equiv P$ and leave the $\tau$-dependence implicit.

#### 3. Analytical Results

In order to compare the MSM to the continuous process, we introduce one of the key objects for our analysis, the *transfer operator* of a Markov process. We assume that the Markov process $(X_t)$ has a unique, positive invariant probability measure, $\mu$, and that it is time-reversible. Then, for any time-step, $t \ge 0$, we define the transfer operator, $T_t$, via the property:

$$\int\_{A} T\_t v(y) \mu(dy) = \int\_{E} v(x) p(t, x, A) \mu(dx) \qquad \text{for all measurable } A \tag{5}$$

as an operator $T_t : L^2(\mu) \to L^2(\mu)$. Here, $p(t, x, A) = \mathbb{P}[X_t \in A | X_0 = x]$ defines the transition probability measure and $L^2(\mu)$ denotes the Hilbert space of functions $v$ with:

$$\int\_{E} v(y)^2 \mu(dy) < \infty \tag{6}$$

and the scalar product:

$$
\langle v, w \rangle = \int\_{E} v(y)w(y)\mu(dy) \tag{7}
$$

Note that $T_t$ is nothing other than the propagator of densities under the dynamics, but the densities are understood as densities with respect to the measure, $\mu$. That is, if the Markov process is initially distributed according to:

$$\mathbb{P}[X\_0 \in A] = \int\_A v\_0(x)\mu(dx) \tag{8}$$

its probability distribution at time t is given by:

$$\mathbb{P}[X\_t \in B] = \int\_B v\_t(x)\mu(dx), \qquad v\_t = T\_t v\_0 \tag{9}$$

The benefit of working with $\mu$-weighted densities is that the transfer operator, $T_t$, becomes essentially self-adjoint on $L^2(\mu)$ for all cases of molecular dynamics satisfying some form of detailed balance condition. Hence, it has real eigenvalues and orthogonal eigenvectors with respect to Equation (7) (or, at least, the dominant spectral elements are real-valued). Moreover, the construction of an MSM can be seen as a projection of the transfer operator [25]. Assume $Q$ is an orthogonal projection in $L^2(\mu)$ onto an $n$-dimensional subspace, $D \subset L^2(\mu)$, with $\mathbb{1} \in D$, and $\chi_1, ..., \chi_n$ is a basis of $D$. Then, the so-called projected transfer operator, $QT_\tau Q : D \to D$, has the matrix representation:

$$P\_Q = PM^{-1} \tag{10}$$

with the non-negative, invertible mass matrix, $M \in \mathbb{R}^{n,n}$, with entries:

$$M\_{ij} = \frac{\langle \chi\_i, \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle} \tag{11}$$

The matrix, $P \in \mathbb{R}^{n,n}$, is also non-negative and has entries:

$$P\_{ij} = \frac{\langle \chi\_i, T\_\tau \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle} \tag{12}$$

Full Partition MSM. If we choose $\chi_i = \mathbb{1}_{A_i}$ to be the characteristic function of set $A_i$ for $i = 1, ..., n$, one can easily check that $M = I$ is the identity matrix and:

$$P\_{ij} = \mathbb{P}\_{\mu}[X\_{\tau} \in A\_j | X\_0 \in A\_i] \tag{13}$$

as in Equation (4). The subscript, $\mu$, indicates that $X_0 \sim \mu$. Therefore, the transition probabilities are evaluated along equilibrium paths.

The previously constructed transition matrix of the MSM based on a full partition can be interpreted as a projection onto a space of densities that are constant on the partitioning sets. This interpretation of an MSM is useful, since it allows one to analyze its approximation quality. For example, in [25,26], it is proven that we can reproduce an eigenvalue, $\lambda$, of a self-adjoint transfer operator, $T_t$, by the MSM by choosing the subspace appropriately. That is, if $u$ is a corresponding normalized eigenvector and $Q$ the orthogonal projection onto a subspace, $D$, with $\mathbb{1} \in D$, then there exists an eigenvalue, $\hat{\lambda}$, of the projected transfer operator, $QT_tQ$, with:

$$|\lambda - \hat{\lambda}| \le \lambda\_1 \delta (1 - \delta^2)^{-\frac{1}{2}}$$

where $\lambda_1 < 1$ is the largest non-trivial eigenvalue of $T_t$ and $\delta = \|u - Qu\|$.

In particular, for $\delta \le \frac{3}{4}$, one can simplify the equation to:

$$|\lambda - \hat{\lambda}| \le 2\lambda\_1 \delta \tag{14}$$
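The step from the sharper bound to Equation (14) can be checked numerically. The following sketch evaluates both right-hand sides for a few projection errors up to $3/4$; the value used for $\lambda_1$ and the function names are illustrative assumptions only.

```python
import math


def sharp_bound(lambda_1, delta):
    """Right-hand side lambda_1 * delta * (1 - delta^2)^(-1/2) of the general bound."""
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    return lambda_1 * delta / math.sqrt(1.0 - delta ** 2)


def simplified_bound(lambda_1, delta):
    """Right-hand side 2 * lambda_1 * delta of Equation (14)."""
    return 2.0 * lambda_1 * delta


lambda_1 = 0.99  # illustrative value, not taken from the text
rows = []
for delta in (0.05, 0.25, 0.5, 0.75):
    rows.append((delta, sharp_bound(lambda_1, delta), simplified_bound(lambda_1, delta)))

for delta, sharp, simple in rows:
    print(f"delta={delta:.2f}  sharp={sharp:.4f}  simplified={simple:.4f}")
```

The simplification works because $(1 - \delta^2)^{-1/2} \le 2$ whenever $\delta \le 3/4$, so the simplified bound is never smaller than the sharp one in that range.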

An eigenvalue, $\lambda_i$, of the transfer operator directly relates to an implied timescale, $\mathcal{T}_i$, of the system via:

$$\mathcal{T}\_i = -\frac{\tau}{\log(\lambda\_i)}\tag{15}$$

Therefore, the transition matrix in Equation (4) that we construct from transitions between the sets, $A_1, ..., A_n$, will generate a Markov chain that reproduces the original timescales well if the partitioning sets are chosen such that the corresponding eigenvectors are almost constant on these sets. In this case, $\delta = \|u - Qu\|$, that is, the approximation error of the eigenvector by a piecewise constant function on the sets, will be small.

The projection error, δ, depends on our choice of the discretizing sets. As an example, let us consider a diffusion in the potential that is illustrated in Figure 2, that is, the reversible Markov process given by the stochastic differential equation:

$$dX\_t = -\nabla V(X\_t)dt + \sqrt{2\varepsilon}dB\_t\tag{16}$$

where $V$ is the potential, $B_t$ denotes a Brownian motion and $\varepsilon > 0$.
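Trajectories of Equation (16) can be generated with the Euler–Maruyama scheme, $x_{k+1} = x_k - \nabla V(x_k)\,h + \sqrt{2\varepsilon h}\,\xi_k$ with standard normal $\xi_k$. The sketch below is a one-dimensional illustration under stated assumptions: the triple-well potential $V(x) = x^2(x^2-1)^2$, the noise level, the step size and all function names are hypothetical and are not the potential or parameters behind Figure 2.

```python
import math
import random


def grad_V(x):
    """Gradient of the hypothetical potential V(x) = x^2 (x^2 - 1)^2 (wells at -1, 0, 1)."""
    return 2.0 * x * (x * x - 1.0) * (3.0 * x * x - 1.0)


def euler_maruyama(x0, eps, dt, n_steps, rng):
    """Integrate dX = -grad V(X) dt + sqrt(2 eps) dB with the Euler-Maruyama scheme."""
    noise_scale = math.sqrt(2.0 * eps * dt)
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        trajectory.append(x)
    return trajectory


rng = random.Random(0)  # fixed seed so that the run is reproducible
trajectory = euler_maruyama(x0=-1.0, eps=0.1, dt=1e-3, n_steps=20000, rng=rng)
print(len(trajectory), min(trajectory), max(trajectory))
```

Such a trajectory is the raw input for the counting estimators of the transition matrix; the step size has to be small compared to the stiffness of the potential for the scheme to remain stable.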

The figure also shows a choice of three sets that form a full partition of state space. The computation of the transition matrix Equation (4) for σ = 0.7 and a lag time τ = 1 yields:

$$P\_Q = P = \begin{pmatrix} 0.9877 & 0.0123 & 0.0000 \\ 0.0420 & 0.9160 & 0.0419 \\ 0.0000 & 0.0123 & 0.9877 \end{pmatrix}$$

that has three eigenvalues $\lambda_0 = 1$, $\lambda_1 = 0.9877$, $\lambda_2 = 0.9037$. Table 1 shows the two resulting implied timescales (Equation (15)) in comparison to the timescales of the original system.
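For orientation, Equation (15) can be evaluated directly for the two non-trivial eigenvalues quoted above with lag time $\tau = 1$. This is only a small sketch; the function name `implied_timescale` is an assumption made for illustration.

```python
import math


def implied_timescale(eigenvalue, lag_time):
    """Implied timescale -tau / log(lambda) of Equation (15)."""
    if not 0.0 < eigenvalue < 1.0:
        raise ValueError("expected an eigenvalue strictly between 0 and 1")
    return -lag_time / math.log(eigenvalue)


tau = 1.0
eigenvalues = [0.9877, 0.9037]  # non-trivial eigenvalues of the full partition MSM
timescales = [implied_timescale(lam, tau) for lam in eigenvalues]
for lam, ts in zip(eigenvalues, timescales):
    print(f"lambda = {lam:.4f}  ->  implied timescale = {ts:.2f}")
```

Because the eigenvalues are rounded to four decimals, the resulting timescales are approximate. Note how sensitive the formula is for eigenvalues close to one: a small error in $\lambda$ translates into a large error in the timescale.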

Figure 2. A potential with three wells and a choice of three sets, $A_1, A_2, A_3$.

Table 1. Comparison of implied timescales


As one can see, the timescales are strongly underestimated. This is a typical phenomenon. From a statistical point of view, the recrossing problem will lead to cumulatively appearing transition counts when one computes the transition probabilities, $\mathbb{P}_\mu[X_\tau \in A_j | X_0 \in A_i]$, from a trajectory $(X_t)$, as discussed above. Therefore, on average, transitions between sets seem to become too likely, and hence, the processes in the coarse-grained system get accelerated. We have seen in Equation (14) that this cannot happen if the associated eigenvectors can be approximated well by the subspace that corresponds to the MSM. Figure 3 shows the first non-trivial eigenvector, $u_1$, belonging to the timescale $\mathcal{T}_1 = 103.7608$ and its best approximation by a step function.

Figure 3. The first non-trivial eigenvector, $u_1$ (solid blue), and its projection, $Qu_1$ (dashed red), onto step functions that are constant on $A_1, A_2, A_3$.

The eigenvector is indeed almost constant in the vicinity of the wells, but within the transition region between the wells, the eigenvector is varying and the approximation by a step function is not accurate. Therefore, we have two explanations of why the main error is introduced in the region close to shared boundaries of neighboring sets: (1) because of recrossing issues; and (2) because of the main projection error of the associated eigenvector. Of course, one solution would be an adaptive refinement of the discretization, that is, one could choose a larger number of smaller sets, such that the eigenvector is better approximated by a step function on these sets. In the following section, we will present an alternative solution for overcoming the recrossing problem and reducing the projection error without refining the discretization.

#### 4. The Core Set Approach

From Equation (10), we know how to compute a matrix representation for a projected transfer operator for an arbitrary subspace, $D \subset L^2(\mu)$. For a given basis, $\chi_1, ..., \chi_n$, we have to compute Equations (11) and (12), so:

$$M\_{ij} = \frac{\langle \chi\_i, \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle}, \qquad P\_{ij} = \frac{\langle \chi\_i, T\_\tau \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle} \tag{17}$$

In general, the evaluation of these scalar products for arbitrary basis functions is a non-trivial task. On the other hand, we have seen that for characteristic functions $\chi_i = \mathbb{1}_{A_i}$ on a full partition, we do not have to compute the scalar products numerically, since the matrix entries have a stochastic interpretation in terms of transition probabilities between sets (Equation (13)). This means they can be directly estimated from a trajectory of the process, which is a strong computational advantage, particularly in high-dimensional state spaces.

Now, the question is whether there is a basis other than characteristic functions that: (a) is more adapted to the eigenvectors of the transfer operator; and (b) still leads to a probabilistic interpretation of the matrix entries in Equation (17), such that scalar products never have to be computed. The basic idea is to stick to a set-oriented definition of the basis, but to relax the full partition constraint. We will define our basis with respect to so-called *core sets*, $C_1, ..., C_n \subset E$, that are still disjoint, so $C_i \cap C_j = \emptyset$ for $i \neq j$, but they do not have to form a full partition. Figure 4 suggests that this could lead to a reduction of the recrossing phenomenon, since the sets do not share boundaries anymore.

Figure 4. Core sets do not have to share boundaries anymore. This can reduce the recrossing effect.

Now, we use the core sets to define our basis functions, $\chi_1, ..., \chi_n$. Assume $T_\tau$ is, again, a self-adjoint transfer operator and consider $n$ core sets $C_1, ..., C_n$. For every $i$, take the committor function, $\chi_i$, of the process with respect to core set $C_i$; that is, $\chi_i(x)$ denotes the probability to hit the core set, $C_i$, next, rather than the other core sets, when starting the process in $x$. If we now study the projection, $Q$, onto the space spanned by these committor functions, the two following properties hold [25,27].

(P1) The matrices, M and P, in Equation (10) can be written as:

$$M\_{ij} = \mathbb{P}\_{\mu}[\tilde{X}\_k^+ = j | \tilde{X}\_k^- = i], \qquad P\_{ij} = \mathbb{P}\_{\mu}[\tilde{X}\_{k+1}^+ = j | \tilde{X}\_k^- = i] \tag{18}$$

where $(\tilde{X}_k^+)$ and $(\tilde{X}_k^-)$ are forward and backward milestoning processes [25,28]; that is, $\tilde{X}_k^- = i$ if the process, at time $t = k\tau$, came last from core set $C_i$, and $\tilde{X}_k^+ = j$ if the process went next to core set $C_j$ after time $t = k\tau$.

(P2) Let $u_i$ be an eigenvector of $T_\tau$ that is almost constant on the core sets. Let the region $C = E \setminus \bigcup_i C_i$ that is not assigned to a core set be left quickly enough, so $\mathbb{E}_x[\tau(C^c)] \ll \mathcal{T}_i$ for all $x \in C$, where $\mathcal{T}_i$ is the timescale associated with $u_i$ and $\mathbb{E}_x[\tau(C^c)]$ is the expected hitting time of $C^c = \bigcup_i C_i$ when starting in $x \in C$. Then, $\|u_i - Qu_i\|$ is small; so, the committor approximation to the eigenvector is accurate.

The message behind (P1) is that it is possible to relax the full partition constraint and use a core set discretization that does not cover the whole state space. We can still define a basis for a projection of the transfer operator that leads to a matrix representation that can be interpreted in terms of transition probabilities.

Important Remark: The construction of the projection onto the committors is only necessary for theoretical purposes. In practice, neither the committor functions nor scalar products between the committors have to be computed numerically, since the matrix entries of M and P can be estimated from trajectories again.

Property (P2) yields that the relaxation of the full partition constraint should also lead to an improvement of the MSM if the region, C, between the core sets is typically left on a faster timescale than the processes of interest taking place. Let us get back to the example from above. We will see that we can achieve a strong improvement of the approximation by simply excluding a small part of state space from our discretization. In Figure 5, we have turned our initial full partition into a core set discretization by removing parts of the transition region between the wells.

The matrix $P_Q = PM^{-1}$ that represents the projection, $QT_\tau Q$, of the transfer operator onto the committor space associated with the core sets is given by:

$$P\_Q = \begin{pmatrix} 0.9897 & 0.0103 & 0.0000 \\ 0.0352 & 0.9298 & 0.0351 \\ 0.0000 & 0.0103 & 0.9897 \end{pmatrix}$$

Comparing to the MSM for the full partition, one can see that transitions between indices $i$ and $j$, $i \neq j$, are less likely. Table 2 shows that this leads to a far more accurate reproduction of the timescales in the system.
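The effect on the timescales can be reproduced approximately from the two printed matrices. The sketch below assumes that each $3 \times 3$ matrix has the eigenvalue $\lambda_0 = 1$ and a real spectrum, so that the two remaining eigenvalues follow from the trace and the determinant ($\lambda_1 + \lambda_2 = \operatorname{tr} P - 1$, $\lambda_1 \lambda_2 = \det P$). Since the matrix entries are rounded to four decimals, the numbers are only indicative, and the helper names are illustrative assumptions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def nontrivial_eigenvalues(P):
    """Two non-trivial eigenvalues of a 3x3 stochastic matrix, assuming lambda_0 = 1."""
    s = P[0][0] + P[1][1] + P[2][2] - 1.0  # lambda_1 + lambda_2
    p = det3(P)                            # lambda_1 * lambda_2
    disc = s * s - 4.0 * p
    if disc < 0.0:
        raise ValueError("eigenvalues are not real")
    root = math.sqrt(disc)
    return (s + root) / 2.0, (s - root) / 2.0


def implied_timescales(P, tau):
    """Implied timescales -tau / log(lambda) for both non-trivial eigenvalues."""
    return [-tau / math.log(lam) for lam in nontrivial_eigenvalues(P)]


P_full = [[0.9877, 0.0123, 0.0000],
          [0.0420, 0.9160, 0.0419],
          [0.0000, 0.0123, 0.9877]]
P_core = [[0.9897, 0.0103, 0.0000],
          [0.0352, 0.9298, 0.0351],
          [0.0000, 0.0103, 0.9897]]

ts_full = implied_timescales(P_full, tau=1.0)
ts_core = implied_timescales(P_core, tau=1.0)
print("full partition:", ts_full)
print("core sets:     ", ts_core)
```

Under these assumptions, both implied timescales of the core set matrix come out longer than those of the full partition matrix, in line with the smaller off-diagonal transition probabilities.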

From the discussion above, this has to be expected, because the eigenvectors are almost constant in the vicinity of the wells, and we removed a part of state space from the discretization that is typically left quickly compared to the timescales, $\mathcal{T}_1$ and $\mathcal{T}_2$. Therefore, the committor functions should deliver a good approximation of the first two eigenvectors. Figure 6 underlines this theoretical result.

Figure 5. Excluding a small region of state space from the sets, $A_1, A_2, A_3$, as in Figure 2, to form core sets $C_1, C_2, C_3$ that do not share boundaries anymore.

Figure 6. (Upper panel) The first non-trivial eigenvector, $u_1$ (solid blue), and its projection, $Q_f u_1$ (finely dashed red), onto step functions (full partition) and its projection, $Q_c u_1$ (dashed green), onto committors (core sets). (Lower panel) The same plot for the second non-trivial eigenvector, $u_2$.


Table 2. More accurate approximation of implied timescales.

#### 5. Practical Considerations and MD Applications

In the previous sections, we have interpreted the construction of an MSM as a projection of the dynamics onto some finite dimensional ansatz space. We have discussed two types of spaces that both have been defined on the basis of a set discretization. First, we chose a full partition of state space and the associated space of step functions, and second, we analyzed a discretization by core sets and the associated space spanned by committor functions. These two methods have the advantage that the resulting projections lead to transition matrices for the MSM with entries that are given in terms of transition probabilities between the sets. That is, one can compute estimates for the transition matrices from simulation data. This is an important property for practical applications, because it means that we never need to compute committor functions or scalar products between committors or step functions. We rather generate trajectories $x_0, x_1, ..., x_N$ of the process $(X_t)$, let us say, for a time step $h > 0$, so $x_i = X_{hi}$. For example, we can then define for a full partition, $A_1, ..., A_m$, and a lag time $\tau = nh$ the discrete trajectory $s_k = i \Leftrightarrow x_k \in A_i$ and compute the matrix, $\hat{P}$:

$$\hat{P}\_{ij} = \frac{C\_{ij}}{\sum\_{j} C\_{ij}}, \qquad C\_{ij} = \sum\_{k=0}^{N-n} \mathbb{1}\_{\{s\_k = i\}} \mathbb{1}\_{\{s\_{k+n} = j\}} \tag{19}$$
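The counting estimator of Equation (19) can be sketched in a few lines of Python. The discrete trajectory below is made up for illustration, states are numbered from 0 instead of 1, and the function names are assumptions; rows of states that were never visited are left as zeros rather than being normalized.

```python
def count_matrix(dtraj, n_states, lag):
    """Count matrix C_ij: number of k with s_k = i and s_{k+lag} = j (sliding window)."""
    C = [[0] * n_states for _ in range(n_states)]
    for k in range(len(dtraj) - lag):
        C[dtraj[k]][dtraj[k + lag]] += 1
    return C


def row_normalize(C):
    """Divide every row by its sum; rows without counts stay zero."""
    P = []
    for row in C:
        total = sum(row)
        P.append([c / total for c in row] if total > 0 else [0.0] * len(row))
    return P


# Hypothetical discrete trajectory s_0, ..., s_N over three sets (0-based labels).
dtraj = [0, 0, 1, 0, 1, 1, 2, 2, 1, 0]
C = count_matrix(dtraj, n_states=3, lag=1)
P_hat = row_normalize(C)
print(C)
print(P_hat)
```

In realistic applications the counts come from many trajectories, in which case the count matrices of the individual trajectories can simply be added before the row normalization.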

It is well known [7] that $\hat{P}$ is a maximum likelihood estimator for the full partition MSM transition matrix in Equation (4). Similarly, one can also compute estimates for a core set MSM by using the definition of milestoning processes [27,28]. That is, if we have core sets $C_1, ..., C_m$, a lag time $\tau = nh$ as before, and we define discrete milestoning trajectories by:

$$\begin{aligned} s\_k^- = i &\Leftrightarrow x\_k \in C\_i \text{ or came last from } C\_i \text{ before time } k \\ s\_k^+ = i &\Leftrightarrow x\_k \in C\_i \text{ or went next to } C\_i \text{ after time } k \end{aligned}$$

we can compute an estimator $\hat{P}_Q = \hat{P}\hat{M}^{-1}$ of the core set MSM matrix in Equation (10) by counting transitions:

$$\hat{P}\_{ij} = \frac{C\_{ij}}{\sum\_{j} C\_{ij}}, \qquad C\_{ij} = \sum\_{k=0}^{N-n} \mathbb{1}\_{\{s\_k^- = i\}} \mathbb{1}\_{\{s\_{k+n}^+ = j\}} \tag{20}$$

$$
\hat{M}\_{ij} = \frac{N\_{ij}}{\sum\_{j} N\_{ij}}, \qquad N\_{ij} = \sum\_{k=0}^{N} \mathbb{1}\_{\{s\_k^- = i\}} \mathbb{1}\_{\{s\_k^+ = j\}} \tag{21}
$$
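A minimal sketch of the counting in Equations (20) and (21) follows. It assumes that each frame has already been assigned either the 0-based index of the core set it lies in or `None` if it lies in the region between the core sets; the example labels and all function names are hypothetical. Frames for which the backward or forward milestoning process is undefined (before the first or after the last core set visit) are skipped, which is a design choice of this sketch.

```python
def backward_milestones(core_labels):
    """s^-_k: index of the core set visited last at or before frame k (None if none yet)."""
    result, last = [], None
    for label in core_labels:
        if label is not None:
            last = label
        result.append(last)
    return result


def forward_milestones(core_labels):
    """s^+_k: index of the core set visited next at or after frame k (None if none follows)."""
    return backward_milestones(core_labels[::-1])[::-1]


def milestoning_counts(core_labels, n_cores, lag):
    """Count matrices C (Equation (20)) and N (Equation (21))."""
    s_minus = backward_milestones(core_labels)
    s_plus = forward_milestones(core_labels)
    C = [[0] * n_cores for _ in range(n_cores)]
    N = [[0] * n_cores for _ in range(n_cores)]
    for k, i in enumerate(s_minus):
        if i is None:
            continue
        if s_plus[k] is not None:
            N[i][s_plus[k]] += 1
        if k + lag < len(core_labels) and s_plus[k + lag] is not None:
            C[i][s_plus[k + lag]] += 1
    return C, N


def row_normalize(counts):
    """Divide every row by its sum; rows without counts stay zero."""
    return [[c / sum(row) for c in row] if sum(row) > 0 else [0.0] * len(row)
            for row in counts]


# Hypothetical frame labels: core set index, or None between the core sets.
core_labels = [0, None, 0, None, None, 1, 1, None, 0]
C, N = milestoning_counts(core_labels, n_cores=2, lag=1)
P_hat = row_normalize(C)
M_hat = row_normalize(N)
print(P_hat)
print(M_hat)
```

The core set MSM matrix is then obtained by multiplying `P_hat` with the inverse of `M_hat`, which is best left to a linear algebra library for more than a few core sets.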

Since, in practice, we will only have a finite amount of data available, we will have statistical errors when constructing an MSM. This is an additional error to the projection error related to the discretization that we have discussed above. On the other hand, one should note that these errors are *not independent* of each other. For example, it is clear that if we take a full partition of state space, and we let the partition become arbitrarily fine by letting the number of sets go to infinity, the discretization error will vanish. At the same time, for a fixed amount of statistics, the statistical error will become arbitrarily large, because we will need to compute more and more estimators for transition events between the increasing number of sets. For more information on statistical errors, we refer to the literature [7,29].

Besides the choice of discretization and the available statistics, the estimates above also depend on the lag time, $\tau$. This dependence can be used to validate an MSM by a Chapman–Kolmogorov test [7]. This is based on the fact that the MSM matrices approximately form a semi-group for all large enough lag times $\tau > \tau^*$, although for small lag times this is typically not true, due to memory effects. These facts also motivate one to look for something like an infinitesimal generator that approximately generates these MSM transition matrices for large enough lag times. In [27], two types of generator constructions have been compared for a core set setting. The first generator, $K$, is simply constructed from the transition rates between the core sets in the milestoning sense, that is:

$$K\_{ij} = \lim\_{T \to \infty} \frac{N\_{ij}^T}{R\_i^T}, i \neq j \qquad K\_{ii} = -\sum\_{j \neq i} K\_{ij} \tag{22}$$

where $N_{ij}^T$ is the amount of time in $[0, T]$ the process has spent on its way from core set $C_i$ to $C_j$ and $R_i^T$ is the total time in $[0, T]$ during which the process came last from $C_i$. On the other hand, one can see [27,30] that $K^* = KM^{-1}$ with the mass matrix, $M$, from Equation (18) above, can be interpreted as a projection of the original generator of the process and, also, as a derivative of the core set MSM from above, *i.e.*:

$$K^\* = \lim\_{\tau \to 0} \frac{PM^{-1} - I}{\tau} \tag{23}$$

where $P$ depends on $\tau$ (Equation (17)).
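To make Equation (23) concrete, the following sketch replaces the limit by a finite-difference quotient at a small but finite lag time for a two core set example. The matrices `P`, `M` and the lag time are hypothetical numbers chosen for illustration, and the helper names are assumptions; for a $2 \times 2$ generator with zero row sums, the single non-zero eigenvalue equals the trace, which gives the timescale directly.

```python
def inverse_2x2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-14:
        raise ValueError("matrix is singular to working precision")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def projected_generator(P, M, tau):
    """Finite-lag approximation (P M^{-1} - I) / tau of the generator in Equation (23)."""
    PQ = matmul(P, inverse_2x2(M))
    n = len(PQ)
    return [[(PQ[i][j] - (1.0 if i == j else 0.0)) / tau for j in range(n)]
            for i in range(n)]


# Hypothetical estimates for two core sets at lag time tau = 0.1.
M = [[0.95, 0.05], [0.04, 0.96]]
P = [[0.93, 0.07], [0.06, 0.94]]
tau = 0.1

K_star = projected_generator(P, M, tau)
timescale = -1.0 / (K_star[0][0] + K_star[1][1])
print(K_star)
print(timescale)
```

Since both `P` and `M` are row-stochastic, the rows of the resulting matrix sum to zero, as expected for a generator.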

Let us now analyze how the choice of core sets, particularly the size of the core sets, influences the resulting approximation. To this end, we consider an MD example that was discussed in [27], namely one molecule of alanine dipeptide monitored via its $\phi$ and $\psi$ backbone dihedral angles. Two core sets are defined as balls with radius $r$ around the two points with angular coordinates $x_\alpha = (-80, -60)$ and $x_\beta = (-80, 170)$. The stationary distribution of the process and the two centers of the core sets, $x_\alpha, x_\beta$, in the angular space are shown in Figure 7.

For computing a reference timescale, several MSMs based on three different full partitions using 10, 15 and 250 sets have been constructed for increasing lag times. In [27], it is shown that in each setting, the estimate for the longest implied timescale of the process converged to ≈19 ps for large enough $\tau$. Now, the implied timescales for the two different generators, $K$ (Equation (22)) and $K^*$ (Equation (23)), are computed. In Figure 8, the resulting timescales are plotted against the reference timescale ≈19 ps for varying size of the core sets.
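The lag time validation mentioned above rests on the Chapman–Kolmogorov property: for a Markovian model, the matrix estimated at lag time $k\tau$ should agree with the $k$-th power of the matrix estimated at lag time $\tau$. The sketch below compares the two for hypothetical $2 \times 2$ matrices; the numbers and function names are assumptions for illustration, and in practice the deviation has to be judged against the statistical uncertainty of the estimates.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matrix_power(P, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, P)
    return result


def chapman_kolmogorov_error(P_tau, P_ktau, k):
    """Largest entrywise deviation between P(tau)^k and P(k * tau)."""
    Pk = matrix_power(P_tau, k)
    n = len(Pk)
    return max(abs(Pk[i][j] - P_ktau[i][j]) for i in range(n) for j in range(n))


# Hypothetical estimate at lag time tau.
P_tau = [[0.9, 0.1], [0.2, 0.8]]
# Hypothetical estimates at lag time 2 * tau: one consistent with a Markov model,
# one showing memory effects.
P_2tau_markov = [[0.83, 0.17], [0.34, 0.66]]
P_2tau_memory = [[0.88, 0.12], [0.24, 0.76]]

error_markov = chapman_kolmogorov_error(P_tau, P_2tau_markov, k=2)
error_memory = chapman_kolmogorov_error(P_tau, P_2tau_memory, k=2)
print(error_markov, error_memory)
```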

Figure 7. The stationary distribution of alanine dipeptide and the two centers of the core sets, $x_\alpha, x_\beta$, in the angular space as white dots.

Figure 8. Estimate of the implied timescales from $K$ (Equation (22)), the projected generator $K^*$ (Equation (23)) and the reference computed from several full partition Markov State Models (MSMs).

One can see that the estimate by the milestoning generator, $K$, is rather sensitive to the size of the core sets. It overestimates the timescale for small core sizes and underestimates it for larger core sizes. On the other hand, the projected generator, $K^*$, can never overestimate the timescale, due to its interpretation as a projection. It is also rather robust against the choice of size of the core sets until the core sets become too large, e.g., $r > 15$. Then, the discretization becomes close to a full partition discretization using only two sets. In this case, the timescales have to be underestimated heavily, because of recrossing phenomena. On the other hand, the underestimation for very small core sets has to be explained by a lack of statistics. When the core sets are chosen arbitrarily small, it is clearly more difficult for the process to hit the sets, and therefore, transition events become rare. Note that for the straightforward milestoning generator, $K$, the processes seem to become very slow, but for the projected generator $K^* = KM^{-1}$, this effect is theoretically corrected by the mass matrix, $M$. Nevertheless, in both cases, the generation of enough statistics will be problematic for too small core sets.

#### 6. Further Applications in MD

Markov State Modeling has been shown to apply successfully to many different molecular systems, like peptides, including time-resolved spectroscopic experiments [10–12], proteins and protein folding [4,9,13], DNA [31] and ligand-receptor interaction [32]. In most of the respective publications, full partition MSMs are used, and the underlying discretization is based on cluster finding methods (see [7] for a review), while the sampling issues are tackled by means of ideas from enhanced sampling [33] and based on ensembles of rather short trajectories instead of one long one, *cf*. [4]. Core set-based approaches have been used only recently [10,27]; related algorithms are less well developed. However, recent work has shown how every full partition MSM can be easily transformed into a core set-based MSM with significantly *improved* approximation quality [34], making core set MSMs the most promising next generation MSM tools.

Very Rare Transitions between Discretization Sets. When constructing a full partition or a core set MSM, we have to estimate transition probabilities between sets in state space, and it can happen that some of these transitions are unavoidably *very* rare. That is, the transition probabilities for a lag time, $\tau$, between some sets can be non-zero, but small, even if compared to the remaining transition probabilities that are small already. This is why it is important to note that neglecting these very rare transitions during the construction of an MSM does *not* harm its approximation quality. For example, we can define for a transition matrix, $P$, another transition matrix, $\tilde{P}$, by:

$$\tilde{P}\_{ij} = \begin{cases} P\_{ij}, & i \neq j, (i, j) \notin R \\ 0, & i \neq j, (i, j) \in R \end{cases} \tag{24}$$

where $R$ denotes the set of pairs of indices for which the transitions are very rare and for which we set the transition probability to zero. If the Markov chain is reversible and $(i, j) \in R \Leftrightarrow (j, i) \in R$, one can show that for all ordered eigenvalues, $\lambda_k(P)$ and $\lambda_k(\tilde{P})$, it holds that:

$$|\lambda\_k(P) - \lambda\_k(\tilde{P})| \le \max\_i \sum\_{j \ne i, (i, j) \in R} P\_{ij} \tag{25}$$

That is, if we cannot estimate a very small transition probability, $P_{ij}$, for a very rare transition event between two sets, $A_i$ and $A_j$, and even totally neglect this probability by setting it to zero, the timescales of the MSM remain almost unaffected. Thus, if we compute the set of the "first order" transition probabilities of a system accurately enough and ignore all "higher order" ones, then our accuracy will not be spoiled. This nicely illustrates the main advantage of MSM modeling compared to classical long-term simulation: since only neighboring core sets have to be connected by accurately estimated rates, the long residence time of long-term trajectories between and in core sets can be avoided, thus cutting down total simulation time.

Computation from Trajectories. Clearly, constructing and analyzing a core set MSM will only have a computational advantage compared to the direct sampling of a rare event if the transition events between neighboring core sets occur on a much shorter timescale than the rare event itself. One should note that from the purely theoretical point of view, it would be optimal to choose only very few core sets in the most metastable regions of state space, because this would minimize the projection error δ = u − Qu for each dominant eigenvector u, as discussed in Section 3. On the other hand, when estimating the MSM from trajectories, only a finite amount of statistics will be available, so there will also be a statistical error. In order to keep the total error small, additional core sets in less metastable parts of state space typically have to be introduced. In the end, this makes the estimation of a core set MSM possible without having to sample rare events. Note that the projection error is still under control, as long as there is a transition region between the core sets that is typically left very quickly (see Property (P2) in Section 4).

In practice, the statistics of the transition events between core sets will preferably be estimated from many short trajectories using milestoning techniques [27,28] and parallel computing. However, any algorithm for the construction of a core set MSM has to find a balance between sampling issues (neither too many nor too long trajectories) and discretization issues (not too many core sets). The construction of such an algorithm is still the subject of ongoing research.

This article cannot give a detailed review on the algorithmic realization of MSMs for realistic molecular systems and on the findings resulting from such applications, since this is discussed to some extent elsewhere; see [7] for a recent review of the algorithmic aspects and [32,35] for ligand-receptor interaction.

#### 7. MSM for Optimal Control Problems

In this section, we will borrow ideas from the previous section and explain how MSMs can be used to discretize optimal control problems that are linear-quadratic in the control variables and which appear in, e.g., the sampling of rare events. Specifically, we consider the case that (X<sub>t</sub>)<sub>t≥0</sub> is the solution of:

$$dX\_t = (\sqrt{2}u\_t - \nabla V(X\_t))dt + \sqrt{2\varepsilon}dB\_t \tag{26}$$

with potential V, Brownian motion B<sub>t</sub> and temperature ε > 0, as in Equation (16), and an unknown control variable, u: [0, ∞) → R<sup>d</sup>, that is chosen so as to minimize the cost function:

$$J(u;x) = \mathbb{E}\left[\int\_0^\tau \left(f(X\_s) + \frac{1}{2}\left|u\_s\right|^2\right)ds\middle|X\_0 = x\right] \tag{27}$$

(The factors of 1/2 and √2 in front of the control terms are for notational convenience.) Here, f ≥ 0 is a bounded continuous function called the *running cost*, and τ < ∞ (almost surely) is a random stopping time that is determined by X<sub>t</sub> hitting a given target set, A ⊂ E, *i.e.*, τ = inf{t > 0: X<sub>t</sub> ∈ A}. In other words, we are interested in controlling X<sub>t</sub> = X<sub>t</sub><sup>u</sup> until it reaches A. As an example, consider the case f = 1 with the potential considered in Figure 5 and the target region, A, around the left well. This situation is illustrated in Figure 9: one seeks to minimize the time to reach A by tilting the potential towards A, while tilting the potential too much is prevented by the quadratic penalization term in the cost functional, which grows when too much force is applied.

Figure 9. The potential from Figure 5 (blue) and a tilted potential to minimize the time required to hit the target set, A (green).

Other choices of f in Equation (27) result in alternative applications. One obvious application would be to set τ = T to a fixed time and f to the characteristic function of the complement of a conformation set C, f = 1<sub>E\C</sub>. In this case, minimization of J with respect to the control u<sub>t</sub> would mean maximization of the probability to find the system in the conformation, C, until time T under a penalty on the external work done to the system. See [14] for more details on such applications.

There are other types of cost functions, J, one might consider, e.g., control until a deterministic finite time τ = T is reached or, even, τ → ∞, and the construction would follow analogously. For compactness, we consider here only cost functions as in Equation (27).

Optimal Control and Equilibrium Expectation Values. It turns out that when minimizing J, it is sufficient to consider control strategies that are Markovian and depend only on X<sub>t</sub>, *i.e.*, we consider feedback laws of the form u<sub>t</sub> = α(X<sub>t</sub>) for some smooth function, α: E → R<sup>d</sup>. Moreover, only controls with finite energy are considered, for otherwise, J(u; x) = ∞. For control problems of the form Equations (26) and (27), the optimal feedback function can be shown to be α<sup>∗</sup>(x) = −√2 ∇W(x), where W is the value function or optimal-cost-to-go [1,15]:

$$W(x) = \min\_{u} J(u; x) \tag{28}$$

with the minimum running over all admissible Markovian feedback strategies. It can be shown that W satisfies the following dynamic programming equation of the Hamilton–Jacobi–Bellman type (see [36]):

$$\begin{aligned}LW(x) - |\nabla W(x)|^2 + f &= 0 \\ W|\_A &= 0 \end{aligned} \tag{29}$$

with the second-order differential operator:

$$L = \varepsilon \Delta - \nabla V \cdot \nabla$$

that is the infinitesimal generator of the process, Xt, for u = 0. If the value function, W, is known, it can be plugged into the equation of motion, which then turns out to be of the form:

$$dX\_t^\* = -\nabla U(X\_t^\*)dt + \sqrt{2\varepsilon}dB\_t\tag{30}$$

with the new potential:

$$U(x) = V(x) + 2W(x)$$

The difficulty is that Equation (29) is a nonlinear partial differential equation and for realistic high-dimensional systems, it is not at all obvious how to discretize it, employing any kind of state space partitioning. It has been demonstrated in [14,15] that Equation (29) can be transformed into a linear equation by a logarithmic transformation. Setting W(x) = −ε log φ(x), it readily follows from the chain rule and Equation (29) that φ solves the linear equation:

$$\begin{aligned}(L - \varepsilon^{-1}f)\phi &= 0\\\phi|\_A &= 1\end{aligned} \tag{31}$$
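For completeness, the substitution can be written out in two lines; this is only a direct check using the definitions of L and W given above, assuming φ > 0. With W = −ε log φ, we have:

$$\nabla W = -\varepsilon \frac{\nabla \phi}{\phi}, \qquad \Delta W = -\varepsilon \frac{\Delta \phi}{\phi} + \varepsilon \frac{|\nabla \phi|^2}{\phi^2}$$

Inserting this into the left-hand side of Equation (29), the quadratic gradient terms cancel:

$$LW - |\nabla W|^2 + f = -\frac{\varepsilon}{\phi}\left(\varepsilon \Delta \phi - \nabla V \cdot \nabla \phi\right) + f = -\frac{\varepsilon}{\phi}\left(L - \varepsilon^{-1} f\right)\phi$$

Hence, W solves Equation (29) exactly when φ solves Equation (31), and the boundary condition W = 0 on A becomes φ = 1 on A.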

Equation (31) is linear and can be solved by using MSMs, as we will show below. Moreover, by the Feynman–Kac theorem [37], the solution to Equation (31) can be expressed as:

$$\phi(x) = \mathbb{E}\left[\exp\left(-\frac{1}{\varepsilon}\int\_0^\tau f(X\_t)dt\right)\middle|X\_0 = x\right] \tag{32}$$

where X<sup>t</sup> solves the control-free equation:

$$dX\_t = -\nabla V(X\_t)dt + \sqrt{2\varepsilon}dB\_t$$

That is, the optimal control for Equation (26) can be computed by solving Equation (31), which can be done *in principle* via Monte Carlo approximation of the expected value in Equation (32) if critical slowing down by rare events can be avoided.
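To make the Monte Carlo route concrete, the expectation in Equation (32) can be estimated by simulating the control-free dynamics with an Euler–Maruyama scheme. The sketch below is illustrative only: the double-well potential, the target set, the time step and the function name `feynman_kac_phi` are assumptions, and trajectories that have not reached the target by `t_max` are simply stopped, which biases the estimate.

```python
import numpy as np

def feynman_kac_phi(x0, grad_V, f, in_target, eps, dt=1e-2, t_max=100.0,
                    n_traj=500, seed=0):
    """Monte Carlo estimate of phi(x0) = E[exp(-(1/eps) * int_0^tau f(X_t) dt)].

    The uncontrolled dynamics dX = -grad V dt + sqrt(2 eps) dB is integrated
    with Euler-Maruyama. Paths still outside the target at t_max are stopped
    there (an assumption of this sketch, introducing a bias).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_traj, float(x0))
    cost = np.zeros(n_traj)              # running integral of f along each path
    alive = ~in_target(x)                # paths that have not yet hit the target
    for _ in range(int(t_max / dt)):
        if not alive.any():
            break
        xa = x[alive]
        cost[alive] += f(xa) * dt
        noise = rng.standard_normal(xa.shape)
        x[alive] = xa - grad_V(xa) * dt + np.sqrt(2.0 * eps * dt) * noise
        alive &= ~in_target(x)
    return np.exp(-cost / eps).mean()

# Illustrative setup: double well V(x) = (x^2 - 1)^2, target set A = (-inf, -1],
# constant running cost f = sigma.
sigma, eps = 0.08, 0.5
grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
f = lambda x: sigma * np.ones_like(x)
in_target = lambda x: x <= -1.0

phi_right_well = feynman_kac_phi(1.0, grad_V, f, in_target, eps)
phi_in_target = feynman_kac_phi(-1.5, grad_V, f, in_target, eps)
```

For a higher barrier or a lower temperature, most trajectories would fail to reach A within any affordable `t_max`, which is exactly the critical slowing down mentioned above.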

Remark. The optimization problem Equation (28) admits an interpretation in terms of entropy minimization: let Q = Q<sub>x</sub><sup>u</sup> and P = Q<sub>x</sub><sup>0</sup> denote the path probability measures of controlled and uncontrolled trajectories starting at x at time t = 0, and set:

$$Z = \int\_0^\tau f(X\_s) \, ds$$

Then, it follows that we can write:

$$W(x) = \min\_{Q \ll P} J(u; x), \qquad J(u; x) = \int \left\{ Z + \varepsilon \log \left( \frac{dQ}{dP} \right) \right\} dQ \tag{33}$$

where the notation "Q ≪ P" means that Q has a density with respect to P (that is, the density function, dQ/dP, exists and is almost everywhere positive and normalized). It turns out that for every such Q, there is exactly one control strategy, u, such that Q = Q<sub>x</sub><sup>u</sup> is generated by Equation (26); in this sense, the notation in Equation (33) is meaningful. The second term:

$$H(Q\|\|P) = \varepsilon \int \log\left(\frac{dQ}{dP}\right) dQ$$

is the relative entropy or Kullback–Leibler divergence between Q and P. For details on this matter that are based on Girsanov transformations for stochastic differential equations, we refer to [38] or the article [1] in this special issue.

#### 8. MSM Discretization of Optimal Control Problems

The basic idea is now to choose a subspace, D ⊂ L<sup>2</sup>(μ), with basis χ<sub>1</sub>,...,χ<sub>n</sub> as in Markov State Modeling and then discretize the dynamic programming Equation (29) of our optimal control problem by projecting the equivalent log transformed Equation (31) onto that subspace. As we will see, the resulting discrete matrix equation can be transformed back into an optimal control problem for a discrete Markov jump process (MJP).

We will do this construction for the full partition case χ<sub>i</sub> = 1<sub>A<sub>i</sub></sub> and the core set case χ<sub>i</sub> = q<sub>i</sub> discussed earlier. We will see that in both cases, we arrive at a structure-preserving discretization of the original optimal control problem, where the states of the corresponding MJP will be related to the partition subsets, A<sub>i</sub>. The first case will give us back a well-known lattice discretization for continuous control problems, the Markov chain approximation [39]. This is illustrated in the following diagram:

$$\begin{array}{ccc} \text{Linear equation (SDE)} & \longrightarrow & \text{Linear equation (MJP)} \\ \updownarrow \; W = -\varepsilon \log \phi & & \updownarrow \; \hat{W} = -\varepsilon \log \hat{\phi} \\ \text{Control problem (SDE)} & \overset{?}{\longrightarrow} & \text{Control problem (MJP)} \end{array}$$

where Ŵ = min<sub>v</sub> Ĵ(v).

Subspace Projection. The key step for the discretization is that we pick a suitable subspace, D ⊂ L<sup>2</sup>(μ), that is adapted to the boundary value problem Equation (31). Specifically, we require that the subspace contains the constant function, 1 ∈ D, and that it gives a good representation of the most dominant metastable sets. To this end, we choose basis functions χ<sub>1</sub>,...,χ<sub>n+1</sub> with the following properties:


Now, let Q be the orthogonal projection onto D, and define the matrices:

$$F\_{ij} = \frac{\langle \chi\_i, f \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle}, \qquad K\_{ij} = \frac{\langle \chi\_i, L \chi\_j \rangle}{\langle \chi\_i, \mathbb{1} \rangle}$$

Now, if φ solves the linear boundary value problem Equation (31), then the coefficients, φ̂<sub>1</sub>,...,φ̂<sub>n+1</sub>, of its finite-dimensional representation Qφ = Σ<sub>j</sub> φ̂<sub>j</sub>χ<sub>j</sub> on the subspace, D, satisfy the constrained linear system:

$$\sum\_{j=1}^{n+1} \left( K\_{ij} - \varepsilon^{-1} F\_{ij} \right) \hat{\phi}\_j = 0, \quad i \in \{1, \ldots, n\} \tag{34}$$
 
$$\hat{\phi}\_{n+1} = 1$$

that is the discrete analogue of Equation (31). The discrete solution φ̂ = Qφ is optimal in the sense of being the best approximation of φ in the energy norm, *i.e.*:

$$\|\|\phi - \hat{\phi}\|\|\_{A} = \inf\_{\psi \in D} \|\|\phi - \psi\|\|\_{A} \tag{35}$$

where:

$$\|\phi\|\_A^2 = \left\langle \phi, (\varepsilon^{-1}f - L)\phi \right\rangle.$$

is the energy norm on L<sup>2</sup>(μ), and the infimum runs over all functions, ψ ∈ L<sup>2</sup>(μ), that are of the form ψ(x) = Σ<sub>j</sub> ψ<sub>j</sub>χ<sub>j</sub>(x) with coefficients ψ<sub>j</sub> ∈ R. This is a standard result about projections of PDEs; see [40] for details. (By the same argument as in the previous sections, A = ε<sup>−1</sup>f − L is symmetric and positive definite as an operator on the weighted Hilbert space, L<sup>2</sup>(μ). Moreover, ‖φ‖<sub>A</sub><sup>2</sup> = ε<sup>−1</sup>⟨φ, fφ⟩ + ε⟨∇φ, ∇φ⟩.) In analogy with Equation (14), we can use the above result to get the error estimate:

$$\|\|\phi - \hat{\phi}\|\|\_{\mu}^{2} \le \left(1 + \frac{1}{\delta^{2}} \|QAQ^{\perp}\|^{2}\right) \inf\_{\psi \in D} \|\|\phi - \psi\|\|\_{\mu}^{2} \tag{36}$$

where A = ε<sup>−1</sup>f − L is a shorthand for the operator appearing in Equation (31) and the constant δ > 0 is defined such that ‖v‖<sub>A</sub><sup>2</sup> ≥ δ‖v‖<sub>μ</sub><sup>2</sup> holds for all v ∈ L<sup>2</sup>(μ); see [41]. In short, Equation (35) shows that discretizing Equation (31) via Equation (34) minimizes the projection error measured in the energy norm. Since all functions are μ-weighted, the approximation will be good in regions visited with high probability and less good in regions with lower probability. The error estimate Equation (36) is along the lines of the MSM approximation result: if we switch to the norm on L<sup>2</sup>(μ), the function φ̂ = Qφ is still almost the best approximation of φ, provided that A leaves the subspace, D, almost invariant. As was pointed out earlier, this is exactly the case when the χ<sub>i</sub> are close to the eigenfunctions of A (e.g., when the system is metastable).

The best approximation error ‖Q<sup>⊥</sup>φ‖<sub>μ</sub> = inf<sub>ψ∈D</sub> ‖φ − ψ‖<sub>μ</sub>, which appears in Equation (36), will vanish if the χ<sub>i</sub> form an arbitrarily fine full partition of E. If we follow the core set idea from Section 4 and choose the χ<sub>i</sub> to be committor functions, we have good control over ‖Q<sup>⊥</sup>φ‖<sub>μ</sub>. According to [41]:

$$\|Q^{\perp}\phi\|\_{\mu} \le \|P^{\perp}\phi\|\_{\mu} + \mu (C)^{1/2} \left[\kappa \|f\|\_{\infty} + 2\|P^{\perp}\phi\|\_{\infty}\right] \tag{37}$$

where C = E \ ∪<sub>i</sub>C<sub>i</sub> is the transition region, κ = sup<sub>x∈C</sub> E<sub>x</sub>[τ<sub>E\C</sub>] is the maximum expected time of hitting the metastable set from outside (which is short) and P is the orthogonal projection onto the subspace V = {v ∈ L<sup>2</sup>(μ): v = const on every C<sub>i</sub>} ⊂ L<sup>2</sup>(μ). Note that P<sup>⊥</sup>φ = 0 on C. The errors, ‖P<sup>⊥</sup>φ‖<sub>μ</sub> and ‖P<sup>⊥</sup>φ‖<sub>∞</sub>, measure how constant the solution, φ, is on the core sets. Hence, Equation (37) together with Equation (36) gives us complete control over the approximation error of our projection method, even if we consider just a few core sets. In Section 9, we will investigate the full and core set partition cases further.

Properties of the Projected Problem. We now introduce the diagonal matrix, Λ, with entries Λ<sub>ii</sub> = Σ<sub>j</sub> F<sub>ij</sub> (zero otherwise) and the full matrix G = K − ε<sup>−1</sup>(F − Λ), and rearrange Equation (34) as follows:

$$\sum\_{j=1}^{n+1} \left( G\_{ij} - \varepsilon^{-1} \Lambda\_{ij} \right) \hat{\phi}\_j = 0, \quad i \in \{1, \ldots, n\} \tag{38}$$
 
$$\hat{\phi}\_{n+1} = 1$$

This equation can be given a stochastic interpretation. To this end, let us introduce the vector, π ∈ R<sup>n+1</sup>, with nonnegative entries π<sub>i</sub> = ⟨χ<sub>i</sub>, 1⟩ and notice that Σ<sub>i</sub> π<sub>i</sub> = 1 follows immediately from the fact that the basis functions, χ<sub>i</sub>, form a partition of unity, *i.e.*, Σ<sub>i</sub> χ<sub>i</sub> = 1. This implies that π is a probability distribution on the discrete state space Ê = {1,...,n + 1}. We summarize properties of the matrices, K, F and G; see also [41]:

(M1) K is a generator matrix of an MJP (X̂<sub>t</sub>)<sub>t≥0</sub> (*i.e.*, K is a real-valued square matrix with row sum zero and nonnegative off-diagonal entries) with stationary distribution, π, that satisfies detailed balance:

$$
\pi\_i K\_{ij} = \pi\_j K\_{ji} \,, \quad i, j \in \hat{E}
$$


It follows that if the running costs, f, are such that (M3) holds, then G is a generator matrix of an MJP that we shall denote by (X̂<sub>t</sub>)<sub>t≥0</sub>, and Equation (38) has a unique and positive solution. In this case, the logarithmic transformation Ŵ = −ε log φ̂ is well defined. It was shown in [42] that Ŵ can be interpreted as the value function of a Markov decision problem with cost functional (*cf*. also [36]):

$$\hat{J}(v;i) = \mathbb{E}\left[\int\_0^\tau \left(\hat{f}(\hat{X}\_s) + k(\hat{X}\_s, v\_s)\right) ds \middle| \hat{X}\_0 = i\right] \tag{39}$$

that is minimized over the set of Markovian control strategies, v: Ê → (0, ∞), subject to the constraint that the controlled process X̂<sub>t</sub> = X̂<sub>t</sub><sup>v</sup> is generated by G<sup>v</sup>, where:

$$G\_{ij}^v = \begin{cases} \ v(i)^{-1} G\_{ij} v(j) \,, \ i \neq j \\\ -\sum\_{j \neq i} G\_{ij}^v \,, \ i = j \end{cases} \tag{40}$$

with stopping time τ = inf{t > 0: X̂<sub>t</sub> = n + 1} and running costs:

$$\hat{f}(i) = \Lambda\_{ii}, \quad k(i, v) = \varepsilon \sum\_{j \neq i} G\_{ij} \left\{ \frac{v(j)}{v(i)} \left[ \log \frac{v(j)}{v(i)} - 1 \right] + 1 \right\} \tag{41}$$
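Equations (40) and (41) translate almost literally into matrix operations. The following minimal sketch assumes that G is given as a dense NumPy array with nonnegative off-diagonal entries and that the control v is a strictly positive vector; the function names and the three-state example are illustrative, not taken from the cited references.

```python
import numpy as np

def controlled_generator(G, v):
    """Generator G^v of Equation (40) for a strictly positive control vector v."""
    Gv = G * v[None, :] / v[:, None]         # off-diagonal: v(i)^{-1} G_ij v(j)
    np.fill_diagonal(Gv, 0.0)
    np.fill_diagonal(Gv, -Gv.sum(axis=1))    # diagonal: minus the off-diagonal row sum
    return Gv

def control_cost(G, v, eps):
    """Running cost k(i, v) of Equation (41), returned as a vector over states i."""
    r = v[None, :] / v[:, None]              # r_ij = v(j) / v(i)
    terms = G * (r * (np.log(r) - 1.0) + 1.0)
    np.fill_diagonal(terms, 0.0)             # the sum runs over j != i only
    return eps * terms.sum(axis=1)

# Illustrative three-state generator (rows sum to zero).
G = np.array([[-1.0,  1.0,  0.0],
              [ 0.5, -1.5,  1.0],
              [ 0.0,  2.0, -2.0]])
eps = 0.5

v_trivial = np.ones(3)                       # no control: rates and cost unchanged
v_biased = np.array([0.2, 0.5, 1.0])         # favours jumps towards state 3

G_trivial = controlled_generator(G, v_trivial)
k_trivial = control_cost(G, v_trivial, eps)
G_biased = controlled_generator(G, v_biased)
k_biased = control_cost(G, v_biased, eps)
```

Since r(log r − 1) + 1 ≥ 0 with equality only at r = 1, the penalty k vanishes for a constant v and is positive as soon as the control changes any jump rate.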

Properties of the Projected Problem, Continued. From [42], we know that the optimal cost:

$$
\hat{W}(i) = \min\_{v} \hat{J}(v; i)
$$

is given by Ŵ = −ε log φ̂, where φ̂ solves Equation (38), with the optimal feedback strategy given by v<sup>∗</sup>(i) = φ̂<sub>i</sub> (see [36]). We list additional properties:

(i) The v-controlled system has the unique invariant distribution:

$$
\pi^v = \left(\pi\_1^v, \dots, \pi\_{n+1}^v\right), \quad \pi\_i^v = \frac{v(i)^2 \pi\_i}{Z\_v}
$$

with Z<sub>v</sub> an appropriate normalization constant; in terms of the value function, π<sup>∗</sup> = π<sup>v∗</sup> reads:

$$
\pi^\* = \left(\pi\_1^\*, \dots, \pi\_{n+1}^\*\right), \quad \pi\_i^\* = \frac{1}{Z\_\*} e^{-2\varepsilon^{-1}\hat{W}(i)} \pi\_i
$$

(ii) G<sup>v</sup> is reversible and stationary with respect to π<sup>v</sup>, *i.e.*, π<sub>i</sub><sup>v</sup> G<sub>ij</sub><sup>v</sup> = π<sub>j</sub><sup>v</sup> G<sub>ji</sub><sup>v</sup> for all i, j ∈ Ê.

(iii) Ĵ admits the same interpretation as Equation (33) in terms of the relative entropy:

$$
\hat{W}(i) = \min\_{Q \ll P} \hat{J}(v; i), \qquad \hat{J}(v; i) = \int \left\{ \hat{Z} + \varepsilon \log \left( \frac{dQ}{dP} \right) \right\} dQ
$$

where P denotes the path measure of the uncontrolled MJP, X̂<sub>t</sub>, starting at X̂<sub>0</sub> = i, Q denotes the path measure of the corresponding controlled process with generator G<sup>v</sup> and:

$$
\hat{Z} = \int\_0^\tau \hat{f}(\hat{X}\_s) \, ds
$$

A few remarks seem in order: Item (i) of the above list is in accordance with the continuous setting, in which the optimally controlled dynamics is governed by the new potential U = V + 2W and has the stationary distribution, μ<sup>∗</sup> ∝ exp(−2ε<sup>−1</sup>W)μ, with μ being the stationary distribution of the uncontrolled process. Hence, the effect of the control on the invariant distribution is the same in both cases. Further, note that optimal strategies change the jump rates according to:

$$G\_{ij}^{v^\*} = G\_{ij} e^{-\varepsilon^{-1} \left(\hat{W}(j) - \hat{W}(i)\right)} \tag{42}$$

that is, Ŵ acts as an effective potential as in the continuous case, and the change in the jump rates can be interpreted in terms of Kramers' law for this effective potential.
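The complete chain from Equation (38) to the optimally controlled jump rates in Equation (42) amounts to one linear solve. The sketch below assumes that the target state is ordered last, that G is a reversible generator and that f̂ is given as a vector; the four-state chain and the helper name `solve_discrete_control` are illustrative assumptions.

```python
import numpy as np

def solve_discrete_control(G, f_hat, eps):
    """Solve Equation (38) with phi_hat = 1 on the target (last) state.

    G     : (n+1, n+1) generator matrix, target state last
    f_hat : (n+1,) discrete running cost, f_hat(i) = Lambda_ii
    Returns phi_hat, W_hat = -eps * log(phi_hat) and the optimally controlled
    generator with the rates of Equation (42).
    """
    n = G.shape[0] - 1
    A = G - np.diag(f_hat) / eps
    # Rows 1..n of Equation (38); phi_{n+1} = 1 is moved to the right-hand side.
    phi = np.ones(n + 1)
    phi[:n] = np.linalg.solve(A[:n, :n], -A[:n, n])
    W = -eps * np.log(phi)
    G_opt = G * np.exp(-(W[None, :] - W[:, None]) / eps)
    np.fill_diagonal(G_opt, 0.0)
    np.fill_diagonal(G_opt, -G_opt.sum(axis=1))
    return phi, W, G_opt

# Illustrative reversible nearest-neighbour chain on four states.
pi = np.array([0.4, 0.1, 0.3, 0.2])                  # stationary distribution
flux = 0.05 * (np.eye(4, k=1) + np.eye(4, k=-1))     # symmetric flux pi_i G_ij
G = flux / pi[:, None]
np.fill_diagonal(G, -G.sum(axis=1))

eps, sigma = 0.5, 0.08
phi, W, G_opt = solve_discrete_control(G, np.full(4, sigma), eps)

# Invariant distribution of the controlled chain, item (i) with v* = phi_hat.
pi_opt = phi**2 * pi
pi_opt /= pi_opt.sum()
balance = pi_opt[:, None] * G_opt                    # symmetric iff detailed balance holds
```

In this example, the controlled chain shifts probability mass towards the target state, in line with the interpretation of Ŵ as an effective potential.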

This completes our derivation of the discretized optimal control problem, and we now compare it with the continuous problem we started with for the case of a full partition of E and a core set partition of E.

#### 9. Markov Chain Approximations and Beyond

Full Partitions. Let E be fully partitioned into disjoint sets, A<sub>1</sub>,...,A<sub>n+1</sub>, with centers x<sub>1</sub>,...,x<sub>n+1</sub> and such that A<sub>n+1</sub> := A, and define χ<sub>i</sub> := χ<sub>A<sub>i</sub></sub>. These χ<sub>i</sub> satisfy Assumptions (S1) and (S2) discussed in Section 8. Since they are not overlapping, F is diagonal, and:

$$\hat{f}(i) = \frac{1}{\pi\_i} \int\_{A\_i} f(x)\mu(x)dx = \mathbb{E}\_{\mu}[f(X\_t)|X\_t \in A\_i] \tag{43}$$

is just obtained by averaging f(x) over the cell, A<sub>i</sub>. Equation (43) is also a sampling formula for f̂(i). It follows directly that G = K, and in particular, (M3) holds for any f. One can show that K has components:

$$K\_{ij} \approx \frac{1}{\Delta\_{ij}} e^{-\beta \left( V(\bar{x}\_{ij}) - V(x\_i) \right)}, \quad \Delta\_{ij}^{-1} = \beta^{-1} \frac{m(S\_{ij})}{m(h\_{ij})m(A\_i)} \tag{44}$$

if i and j are neighbors (K<sub>ij</sub> = 0 otherwise). Here, m is the Lebesgue measure, and h<sub>ij</sub>, S<sub>ij</sub> and x̄<sub>ij</sub> are defined as in Figure 10. K is the generator of an MJP on the cells, A<sub>i</sub>, and coincides with the so-called *finite volume approximation* of L discussed in [43]. It is reversible with stationary distribution:

$$\pi\_i = \int\_{A\_i} d\mu \approx m(A\_i)e^{-\beta V(x\_i)}$$

Figure 10. The mesh for the full partition.
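In one dimension, Equation (44) can be evaluated directly from the potential and the mesh. The sketch below rests on a reading of the geometry that is an assumption here: cells are intervals of equal width h, the interface S<sub>ij</sub> is a single point of measure one, h<sub>ij</sub> is the segment joining neighboring cell centers (length h) and x̄<sub>ij</sub> is the midpoint between the centers. The double-well potential and the function name are likewise illustrative.

```python
import numpy as np

def finite_volume_generator_1d(V, centers, beta):
    """Nearest-neighbour generator K on a uniform 1D grid, following Equation (44).

    Assumed geometry: m(A_i) = h, m(h_ij) = h, m(S_ij) = 1 and x_bar_ij the
    midpoint between neighbouring centres, so that 1/Delta_ij = 1/(beta h^2).
    """
    h = centers[1] - centers[0]
    n = len(centers)
    K = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                x_mid = 0.5 * (centers[i] + centers[j])
                K[i, j] = np.exp(-beta * (V(x_mid) - V(centers[i]))) / (beta * h**2)
        K[i, i] = -K[i].sum()                # rows of a generator sum to zero
    pi = h * np.exp(-beta * V(centers))      # pi_i ~ m(A_i) exp(-beta V(x_i))
    return K, pi / pi.sum()

V = lambda x: (x**2 - 1.0)**2                # illustrative double-well potential
centers = np.linspace(-2.0, 2.0, 41)
K, pi = finite_volume_generator_1d(V, centers, beta=2.0)
stationary_flux = pi[:, None] * K            # symmetric iff detailed balance holds
```

With this construction, π<sub>i</sub>K<sub>ij</sub> depends only on the midpoint value V(x̄<sub>ij</sub>), which is why detailed balance holds exactly on the mesh and not only in the limit of fine partitions.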

One can show that the approximation error vanishes for n → ∞. K and π can be computed from the potential, V , and the geometry of the mesh. By inspecting Equations (12) and (13), we see that K is connected to the transition matrix, P<sup>τ</sup> , of a full partition MSM with lag time τ by

$$\lim\_{\tau \to 0} \frac{1}{\tau} \left( P\_{ij}^{\tau} - M\_{ij} \right) = \lim\_{\tau \to 0} \frac{1}{\pi\_i} \langle \chi\_i, \frac{1}{\tau} (T\_\tau - \mathbb{1}) \chi\_j \rangle = \frac{1}{\pi\_i} \langle \chi\_i, L\chi\_j \rangle = K\_{ij},$$

thus K is the generator of the semigroup of transition matrices, P<sup>τ</sup>. Therefore we could obtain K by sampling in the same way we obtained P<sup>τ</sup> through Equation (19) in Section 5. This is difficult, however, due to recrossing problems for small τ; see e.g., [44]. Finally, let us note in passing that we can drastically simplify k<sup>v</sup> if the cells, A<sub>i</sub>, are boxes of length h. Denote the elementary lattice vectors by e<sub>n</sub>. Then,

$$k^v(i) = \frac{1}{2}|u^v(i)|^2 + \mathcal{O}(h), \quad u\_n^v(i) := \frac{1}{\sqrt{2}}\frac{\epsilon}{2h}\left(\log v(i + e\_n) - \log v(i - e\_n)\right),$$

which establishes the connection to the continuous case. However, more is true: The whole discrete control problem reduces to first order in h to the well-known Markov chain approximation (MCA) [39], which allows us to use convergence theory for MCAs to conclude that, for n → ∞, the optimal control and value function of the discrete control problem converge to their continuous counterparts. More details can be found in [41].

Core Set Partition. Now, we choose core sets C<sub>1</sub>,...,C<sub>n+1</sub> with C<sub>n+1</sub> = A, and we let χ<sub>i</sub> = q<sub>i</sub> be the committor function of the process with respect to C<sub>i</sub>, as in Section 4. These χ<sub>i</sub> satisfy Assumptions (S1) and (S2) discussed in Section 8. The projection onto the committor basis also allows for a stochastic interpretation. Recall the definition of the forward and backward milestoning process, X̃<sub>t</sub><sup>±</sup>, from Equation (18). The discrete costs can be written as:

$$\hat{f}(i) = \frac{1}{\pi\_i} \langle q\_i, f \sum\_j q\_j \rangle = \int \nu\_i(x) f(x) dx = \mathbb{E}\_{\mu} \left[ f(X\_t) \middle| \tilde{X}\_t^- = i \right] \tag{45}$$

where ν<sub>i</sub>(x) = q<sub>i</sub>(x)μ(x)/π<sub>i</sub> = P(X<sub>t</sub> = x | X̃<sub>t</sub><sup>−</sup> = i) is the probability density of finding the system in state x given that it came last from i. Hence, f̂(i) is the average cost conditioned on the information X̃<sub>t</sub><sup>−</sup> = i, *i.e.*, X<sub>t</sub> came last from C<sub>i</sub>, which is the natural extension of the full partition case, where f̂(i) was the average cost conditioned on the information that X<sub>t</sub> ∈ A<sub>i</sub>.
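The conditional expectation in Equation (45) suggests a simple sampling estimator: label every frame of a trajectory by the core set visited last and average f over the frames carrying each label. The following sketch assumes a trajectory that has already been assigned to core sets (index ≥ 0) or to the transition region (−1) and that every core set is visited at least once; the function names and the toy data are illustrative.

```python
import numpy as np

def backward_milestoning(core_labels):
    """Index of the core set visited last; -1 until the first core set is hit.

    core_labels[t] is the core set index at time t, or -1 in the transition region.
    """
    labels = np.asarray(core_labels)
    out = np.empty_like(labels)
    last = -1
    for t, c in enumerate(labels):
        if c >= 0:
            last = c                     # entering a core set updates the milestone
        out[t] = last
    return out

def discrete_cost(f_values, milestones, n_cores):
    """Sample average of f(X_t) conditioned on the milestoning state, cf. Equation (45)."""
    f_values = np.asarray(f_values, dtype=float)
    return np.array([f_values[milestones == i].mean() for i in range(n_cores)])

# Toy trajectory: core set index per frame (-1 = transition region) and f(X_t).
core_labels = [0, -1, -1, 1, -1, 0]
f_along_path = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]

milestones = backward_milestoning(core_labels)
f_hat = discrete_cost(f_along_path, milestones, n_cores=2)
```

Frames in the transition region contribute to the average of the core set they came from, which is the reweighting by the committor functions expressed in terms of trajectory data.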

The matrix K with entries K<sub>ij</sub> = π<sub>i</sub><sup>−1</sup>⟨q<sub>i</sub>, Lq<sub>j</sub>⟩ is reversible with stationary distribution

$$\pi\_i = \langle q\_i, \mathbb{1} \rangle = \mathbb{P}\_{\mu}(\tilde{X}\_t^- = i)$$

and is related to core MSMs again:

$$K = \lim\_{\tau \to 0} \frac{1}{\tau} \left( P^{\tau} - M \right)$$

where P<sup>τ</sup> and M are now the matrices for core MSMs, as in Equation (18). Formally, K is the generator of the P<sup>τ</sup>, but these do not form a semigroup, since M ≠ 1. Therefore, we cannot interpret K directly as, e.g., the generator of X̃<sub>t</sub><sup>−</sup>. Nevertheless, the entries of K are the transition rates between the core sets, as defined in transition path theory [45]. We can sample P<sup>τ</sup> and M using Equations (20) and (21); because we used an incomplete partition, the recrossing problem is removed, and there is no difficulty in sampling P<sup>τ</sup> for all lag times, τ, and therefore, K directly. It is worth noting that F can also be sampled:

$$F\_{ij} = \mathbb{E}\_{\mu} \left[ f(X\_t) \chi\_{\{\tilde{X}^+\_t = j\}} \Big| \tilde{X}^-\_t = i \right],$$

Therefore, as in the construction of core MSMs, we do not need to compute committor functions explicitly. Note, however, that G ≠ K: there is a reweighting, due to the overlap of the q<sub>i</sub>'s, which causes F to be non-diagonal. This reweighting is the surprising bit of this discretization. From properties (M1)–(M3) from Section 8, we see, however, that G and K are both reversible with stationary distribution, π. Finally, note that if the cost function, f(x), does not satisfy ‖f‖<sub>∞</sub> ≤ C from (M3), G will not even be a generator matrix. In this case, Equation (34) still has a solution, φ̂, which is the best approximation to φ, but this solution may not be unique; it may not satisfy φ̂ > 0, and we have no interpretation as a discrete control problem.

#### 10. Numerical Results

#### *10.1. 1D Potential Revisited*

Firstly, we study diffusion in the triple well potential, which is presented in Figure 2. This potential has three minima at approximately x<sub>0/1</sub> = ±3.4 and x<sub>2</sub> = 0. We choose the three core sets C<sub>i</sub> = [x<sub>i</sub> − δ, x<sub>i</sub> + δ] around the minima with δ = 0.2. Take τ to be the first hitting time of C<sub>0</sub>. We are interested in the moment generating function φ(x) = E<sub>x</sub>[e<sup>−ε<sup>−1</sup>στ</sup>] of passages into C<sub>0</sub> and the cumulant generating function W = −ε log φ. This is of the form Equation (32) for A = C<sub>0</sub> and f = σ, a constant function.

In Figure 11a, the potential, V, and effective potential, U, are shown for β = 2 and σ = 0.08 (solid lines), *cf*. Equation (30). One can observe that the optimal control effectively lifts the second and third well up, which means that the optimal control will drive the system into C<sub>0</sub> very quickly. The reference computations here have been carried out using a full partition FEM (finite element method) discretization of Equation (31) with a lattice spacing of h = 0.01. Now, we study the MJP approximation constructed via the committor functions shown in Figure 11b. These span a three-dimensional subspace, but due to the boundary conditions, the subspace, D, of the method is actually two-dimensional. The dashed line in Figure 11a gives the approximation to U calculated by solving Equation (38). We can observe extremely good approximation quality, even in the transition region. In Figure 11c, the optimal control, α<sup>∗</sup>(x) (solid line), and its approximation α̂<sup>∗</sup> = −√2 ∇Ŵ (dashed line) are shown. The core sets are shown in blue. We can observe jumps in α̂<sup>∗</sup> at the left boundaries of the core sets. This is to be expected and comes from the fact that the committor functions are not smooth at the boundaries of the core sets, but only continuous. Therefore, the approximation to U is continuous, but the approximation to α<sup>∗</sup> is not.

Next, we construct a core MSM to sample the matrices, K and F. One hundred trajectories of length T = 20,000 were used to build the MSM. In Figure 11d, W and its estimate using the core MSM are shown for ε = 0.5 and different values of σ. Each of the 100 trajectories has seen about four transitions. For comparison, a direct sampling estimate of W using the same data is shown (green). The direct sampling estimate suffers from a large bias and variance and is practically useless. In contrast, the MSM estimator for W performs well for all considered values of σ, and its variance is consistently small. The constant, C, which ensures φ̂ > 0 when σ ≤ C, is approximately 0.2 in this case. This seems restrictive, but still allows one to capture all interesting information about φ and W.

Figure 11. Three well potential example for ε = 0.5 and σ = 0.08. (a) Potential V(x) (blue), effective potential U = V + 2W (green) and approximation of U with committors (dashed red). (b) The three committors, q<sub>1</sub>(x), q<sub>2</sub>(x) and q<sub>3</sub>(x). (c) The optimal control α<sup>∗</sup>(x) (solid line) and its approximation (dashed line). Core sets are shown in blue. (d) Optimal cost W for β = 2 as a function of σ. Blue: Exact solution. Red: Core MSM estimate. Green: Direct sampling estimate.

Lastly, we study α-β-transitions in alanine dipeptide, a well-studied test system for molecular dynamics applications. We use a 1 μs long trajectory simulated with the CHARMM (Chemistry at HARvard Molecular Mechanics) 27 force field. The conformational dynamics is monitored as usual via the backbone dihedral angles, φ and ψ. The data was first presented in [27]. We construct a full partition MSM with 250 clusters using k-means clustering. We are interested in the MFPT (mean first passage time) t̂(i) = E<sub>i</sub>[τ<sub>α</sub>], where τ<sub>α</sub> is the first hitting time of the α conformation, which we define as a circle with radius r = 45 around (φ<sub>α</sub>, ψ<sub>α</sub>) = (−80, −60). The MFPT vector, t̂, solves the boundary value problem

$$K \hat{t} = -1 \text{ outside of } \alpha, \quad \hat{t} = 0 \text{ in } \alpha$$

but since K is not available directly via sampling, we have to consider the equation

$$\frac{1}{\tau} \left( P^{\tau} - 1 \right) \hat{t} = -1 \text{ outside of } \alpha, \quad \hat{t} = 0 \text{ in } \alpha$$

instead. The result will depend on the choice of lag time τ. In Figure 12a, the results are shown for τ = 5; we can identify the β-structure as the red cloud of clusters where t̂(i) is approximately constant. In Figure 12b, t̂<sub>βα</sub> = E(t̂(i) | i ∈ β) is shown as a function of τ. We observe a linear behavior for large τ, which is due to the linear error introduced in the replacement of K with τ<sup>−1</sup>(P<sup>τ</sup> − 1), and a nonlinear drop for small τ, which is due to non-Markovianity. Our best guess is, therefore, a linear extrapolation to τ = 0, which is indicated by the solid line. The result is t̂<sub>βα</sub><sup>(0)</sup> = 35.5 ps. As a comparison, the reference value t̂<sub>βα</sub><sup>ref</sup> = 36.1 ps from [27] is shown as a dashed line. It was computed in [27] as an inverse rate, using the slowest ITS (implied time scale) and information about the equilibrium weights of the α and β structure. We see very good agreement. The result does depend, of course, on the assignment of clusters to the α and β structure. Some tests show that t̂<sub>βα</sub><sup>(0)</sup> as computed with the extrapolation method is fairly insensitive to this choice.
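The two computational steps of this procedure, solving the lag-time-dependent boundary value problem and fitting a straight line in τ, can be sketched as follows. The function names are hypothetical, the two-state matrix is a toy check rather than the dipeptide MSM, and the numbers passed to the fit are made up for illustration; in practice, only lag times from the linear regime should enter the fit.

```python
import numpy as np

def mfpt_to_target(P, tau, target):
    """Solve (1/tau)(P - I) t = -1 outside the target set, with t = 0 inside."""
    n = P.shape[0]
    outside = np.setdiff1d(np.arange(n), target)
    A = (P[np.ix_(outside, outside)] - np.eye(len(outside))) / tau
    t = np.zeros(n)
    t[outside] = np.linalg.solve(A, -np.ones(len(outside)))
    return t

def extrapolate_to_zero_lag(taus, values):
    """Least-squares straight line through (tau, value); returns its value at tau = 0."""
    slope, intercept = np.polyfit(taus, values, 1)
    return intercept

# Toy check: from state 0 the target state 1 is reached with probability 0.1
# per step of length tau = 2, so the mean first passage time is 2 / 0.1 = 20.
P = np.array([[0.9, 0.1],
              [0.0, 1.0]])
t_hat = mfpt_to_target(P, tau=2.0, target=[1])

# Made-up MFPT estimates at three lag times, lying exactly on a line.
t_zero = extrapolate_to_zero_lag([1.0, 2.0, 3.0], [37.0, 38.0, 39.0])
```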

Figure 12. Dipeptide example. (a) MFPT from β to α in φ-ψ space for τ = 5. The red cloud to the right is the β-structure. (b) MFPT as a function of τ (dashed line) and linear interpolation to τ = 0 (solid line). Green dashed line: reference computed via the slowest ITS.

In [14], it is demonstrated how to use the method presented herein for maximizing the population of the α-conformation of alanine dipeptide based on the MSM used here.

#### 11. Conclusions

In this article, we have discussed an approach to overcome direct sampling issues of rare events in molecular dynamics based on spatial discretization of the molecular state space. The strategy is to define a discretization by subsets of state space, such that the sampling effort with respect to transitions between the sets is much lower than the direct estimation of the rare events under consideration. That is, without having to simulate rare events, we construct a so-called Markov State Model, a Markov chain approximation to the original dynamics. Since the state space of the MSM is finite, we can then calculate the properties of interest by simply solving linear systems of equations. Of course, it is crucial that these properties of the MSM can be related to the rare event properties of the original process that we have not been able to sample directly.

This is why we have analyzed the approximation quality of MSMs in the first part of the article. We have used the interpretation of MSMs as projections of the transfer operator to: (1) derive conditions that guarantee an accurate reproduction of the dynamics; and (2) show how to construct models based on a core set discretization by leaving the state space partly undiscretized.

In the second part of the article, we have used the concept of MSM discretization to solve MD optimal control problems in which one computes the optimal external force that drives the molecular system to show an optimized behavior (maximal possible population in a conformation; minimal mean first passage time to a certain conformation) under certain constraints. We have demonstrated that the spatial discretization underlying an MSM turns the high-dimensional continuous optimal control problem into a rather low-dimensional discrete optimal control problem of the same form that can be solved efficiently. This result allows two different types of applications: (1) if one can construct an MSM for a molecular system in equilibrium, then one can use it to compute optimal controls that extremize a given cost criterion; (2) if an MSM can be computed based on transition probabilities between neighboring core sets alone, then the rare event statistics for transitions between strongly separated metastable states of the system can be computed from an associated optimal control problem that can be solved after discretization using the pre-computed MSM.

#### Acknowledgments

The authors have been supported by the DFG Research Center MATHEON.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Delle Site, L. What is a Multiscale Problem in Molecular Dynamics?. *Entropy* 2014, *16*, 23–40.

*Concept Paper*

## What is a Multiscale Problem in Molecular Dynamics?

#### Luigi Delle Site

Institute for Mathematics, Freie Universität Berlin, Arnimallee 6, Berlin D-14195, Germany; E-Mail: luigi.dellesite@fu-berlin.de; Tel.: +49-(0)-30-838-75775; Fax: +49-(0)-30-838-75412

*Received: 25 June 2013; in revised form: 7 August 2013 / Accepted: 11 September 2013 / Published: 27 December 2013*

Abstract: In this work, we make an attempt to answer the question of what a multiscale problem is in Molecular Dynamics (MD), or, more generally, in Molecular Simulation (MS). By introducing the criterion of separability of scales, we identify three major (reference) categories of multiscale problems and discuss their corresponding computational strategies by giving explicit examples of applications.

Keywords: multiscale modeling; quantum; classical atomistic; coarse graining; adaptive resolution

#### 1. Introduction

One of the major challenges of Molecular Dynamics (MD) over the last decade has been the development and application of techniques that allow the bridging of length and time scales in a physically consistent way. The relevance of such an effort is obvious: the understanding of the microscopic origin of large-scale properties leads to a deeper knowledge of physical phenomena and, when required, to the design of physical systems with specific properties on demand. The computational and conceptual progress in bridging scales, in condensed matter, material science, chemical physics and related fields, has been rather massive, and nowadays, the expression "*multiscale modeling*" has become almost routine. However, what is exactly meant by multiscale modeling is not clear yet. Obviously, one must go beyond the approach of combining, in a brute force fashion, different simulation techniques designed for different scales; computers will always give an answer; however, physical consistency must not be violated beyond the level of a controlled error/approximation. In this work, we discuss a possible classification of multiscale problems and relate them to the corresponding computational techniques and to the idea of physical consistency. The paper is organized as follows. Based on the concept of scale separability, that is, how much scales can be separated in a system, we will identify three major categories: problems with "separated scales", those with "separable scales" and those with "highly-interconnected scales". Next, we will treat for each case some specific examples taken from applications and discuss the corresponding computational strategy. The final part will be dedicated to one (specific) emerging scale-coupling technique, that is, the adaptive resolution MD approach. 
The latter allows one, in principle, to describe within a unified simulation framework the molecular (chemical) origin of large-scale properties and, thus, to interpret the multiscale idea in its full meaning.

#### 2. Separability of Scales

Rather than providing a universal definition of multiscale, we instead introduce an objective criterion and define the idea of multiscale accordingly. The criterion in question is the "separability of scales", that is, how far scales can be separated in space and time in a system, given some properties or phenomena of interest. Obviously, this separation is never sharp, and scales are never exactly disjoint; for this reason, this classification should be understood only as a general reference scheme. According to the concept above, we have identified three major categories: "separated scales", "separable scales" and "highly-interconnected scales"; below, we comment on each specific category.


Above, we have reported, for simplicity, only the case of a two-scale problem; however, the extension to multiple scales is obvious. Moreover, in real applications, multiscale problems occupy the spectrum of categories given above in a sort of continuous way and, thus, require various combinations of the computational strategies reported above. It must be clarified that the separability of scales is not the only possible criterion for such a classification. An example of a complementary criterion is the one used by Berendsen: the hierarchy of scales. This stems from deciding *a priori* which scale is more relevant in a problem rather than looking primarily at how scales are connected [1,2]. An overview of how the idea of multiscale is interpreted in the field of condensed matter, material science, chemical physics and related disciplines, together with recent method developments and applications, can be found in [3,4]. The fact that two relevant journals in the field dedicate an entire issue to the subject is an implicit confirmation of the relevance of the idea within the community of chemists and physicists (and even beyond, e.g., to mathematicians, engineers, biologists). In the next section, we will provide a few examples of problems where the abstract classification defined above finds practical application in the field of MD.

#### 3. Separated Scales and Sequential Strategy

Let us start by discussing an example where scales can be separated in a quite good approximation, and thus, the sequential coupling strategy is appropriate. The system we will discuss is that of macromolecular samples on solid inorganic surfaces. There are two main aspects in this problem:


The first aspect requires the detailed description of specific chemistry and its corresponding electronic properties; thus, a quantum mechanical description is mandatory. The second aspect is dominated by the entropic character of the chain entanglements in bulk and, thus, requires classical statistical methods, which properly sample the vast configurational space of a liquid. When we put the solid surface in contact with the macromolecular liquid, the properties of interest are those at the interface. These would emerge as a result of a non-trivial combination of adhesion and molecular packing. From a methodological point of view, this implies that one must combine, in a proper consistent way, quantum mechanics and classical statistical mechanics. As an example, this idea has been put into practice for a polycarbonate melt on a surface of nickel (111) [5,6]. Figure 1 gives a pictorial explanation of the idea. First, quantum mechanical calculations are performed for each isolated polymer subgroup. However, calculations are done by taking into account all possible allowed geometries at the surface consistent with the topological constraints of the large polymer; then, an effective moiety-surface potential is derived. In parallel, a coarse-grained (bead and spring) model for the polymer, which reproduces the bulk properties of the liquid, is derived from short full atomistic simulations. Finally, a coarse-grained simulation of a large system with the quantum-based surface-polymer interaction is performed [7]. In this specific case, we found that only phenolic chain ends experience a strong attraction; internal beads or other suitable chemical modifications of the chain ends experience the surface as a hard wall. 
The results of this study allow one, then, to establish whether the interface properties are energy dominated (polymers are grafted onto the surface, and this leads to a sort of brush-like interface structure) or entropy dominated (polymers are topologically confined by a purely repulsive surface, and this leads to a parallel layering of the liquid) [8–10]. The same general idea has been later on extended to the case of the adsorption of large biomolecules out of solution on metal surfaces [11–15].

Figure 1. Pictorial representation of the computational strategy adopted for studying a melt of polycarbonate on a Ni(111) surface. On the left side, part (c), the explicit chemistry of the submolecular unit studied at the quantum level on the nickel surface. Part (a) illustrates the corresponding bead and spring coarse-grained model with the underlying atomistic structure. Part (b) represents one polymer out of the melt at the surface, interacting with the effective, quantum-derived potential (only the phenolic chain ends stick to the surface). On the right side, the cartoon summarizes the result of the simulation at a large scale; as a function of the chemical specificity of the chain ends, one can go from energy-dominated interface properties to entropy-dominated interface properties.

The case illustrated above is a sequential strategy in the "bottom-up" fashion, that is, from a finer to a coarser scale; however, the sequential strategy can be applied in the other direction, in a "top-down" fashion, that is, from a coarse scale down to a refined finer scale. For example, this was done by Zhu and Hummer (see, also, Figure 2 for a pictorial representation) when studying the gating transition in biological ion channels: the transition from opened to closed configuration (and *vice versa*) is done at a coarse-grained level and later refined at the atomistic level to study the specific role of hydration, which requires the atomistic resolution of water molecules [16,17].

In the next section, we will make a step forward and consider those cases where hopping between scales is required.

Figure 2. Pictorial representation of the top-down strategy adopted by Zhu and Hummer for studying the gating transition in biological ion channels. The first step consists of describing the large-scale conformational changes using a computationally affordable and physically consistent coarse-grained model. Next, atomistic details are reinserted, and finally, water is treated with atomistic resolution; this allows for investigating the role of channel hydration in the gating transition. This figure is adapted from Figure 2 of [17] with the permission of the corresponding author.

#### 4. Separable Scales and Backmapping Strategy

The sequential strategy of the section above cannot be applied when, despite a clear separation of scales in space and time, the evolution of the system requires a refinement of the coarser model as the simulation proceeds. This case of "separable", but not fully separated, scales is illustrated in this section via the example of photoswitchable liquid crystals. The physical ingredients of the problem are the following: liquid crystals containing azobenzene groups (see Figure 3) can isomerize upon illumination, changing conformation from the *trans* to the *cis* state.

This is the basic mechanism for light-induced mesoscopic transition: upon illumination, one can have a transition from the nematic to the isotropic phase, as shown by Ikeda and Tsutsumi [18]. From the computational point of view, the scales involved are the electronic/quantum, the classical atomistic and the coarse-grained. From the quantum mechanical point of view, one must describe the photoinduced electronic excitation and the possible consequent isomerization. The classical atomistic level is then required, because as the molecule is excited, a certain intermediate conformation is taken; however, the conformation resulting from the de-excitation (*i.e.*, staying in the *trans* or isomerizing in the *cis* state) strongly depends on the immediate atomistic environment; this, in turn, leads to a local rearrangement of the surrounding molecules. Finally, the coarse-grained scale is required to describe the large conformational response in the bulk involving the slow relaxation process of the photo-mechanical response. From the point of view of the computational strategy, one needs to link quantum, classical atomistic and coarse-grained in a consistent way [19–22]. Figure 4 illustrates the strategy. First, a coarse-grained model is derived from an all atom simulation of a relatively small system. Next, a large coarse-grained sample is simulated for long enough to have bulk equilibration. From the equilibrated system, a subsystem is cut out, and the atomistic degrees of freedom are mapped back. Next, one runs a simulation that ensures atomistic equilibration, and then, from the atomistic sample, a subsystem is cut out. Next, in this subsystem, the excitation of one molecule is allowed by treating the problem at quantum mechanical level. 
After the system decays from the excited state and the electronic degrees of freedom are equilibrated, the resulting configuration of the subsystem is reinserted into the classical atomistic sample, equilibrated at the atomistic level, then reinserted into the coarse-grained sample and equilibrated at the coarse-grained level; at this point, the loop is repeated. The meaning of separable becomes clear: time scales are separated by at least one order of magnitude (order of femtoseconds for the quantum, at least picoseconds for the atomistic, and at least nanoseconds for the coarse-grained). Length scales are obviously separated as well; thus, space and time scales are separable. However, the process at each scale is intimately linked to the response at the other scales.

Figure 3. The azobenzene group in the *trans* (top) and *cis* (bottom) configuration. Upon illumination, the isomerization can take place; the molecule goes through an intermediate configuration corresponding to the excited state and, then, decays, either back into the *trans* or isomerizes into the *cis* state; this depends on the immediate atomistic neighborhood.

At this point, it must be clarified that the strategy of the example discussed above can capture, at this stage, only the response of the system to the excitation/de-excitation of a single molecule at a time and cannot directly address the question of how many molecules can concurrently undergo the *cis-trans* transition. In order to model this more realistic scenario, one would need to treat larger quantum systems in order to understand how excited molecules influence each other. At the current state of the art, this would imply a prohibitive computational effort, even for the simple case of three molecules treated at the quantum level. Tests (quantum calculations) are being performed for the case of two molecules in order to understand, at a basic level, the mutual influence of excited molecules. The idea, in perspective, is that of including the information obtained by the quantum studies of one molecule and two molecules into a classical model of an azobenzene molecule that can switch mechanically from *trans* to *cis* [23], under the hypothesis that further many-molecule effects, at the quantum level, may be negligible. Next, one would use the multiscale simulation, including the quantum subsystem, as a reference for a test of basic consistency of the classical model. If the test is satisfactory, then the question of how many molecules can concurrently undergo the *cis-trans* transition could be treated at the (classical) atomistic-coarse-grained level, keeping in mind that beyond two-molecule correlations, the quantum effects in the switching process are neglected. Obviously, when larger computational resources become available, one could systematically improve the classical model, adding information from larger quantum calculations. In the current paper, the specific strategy reported above should be understood as a typical example of a problem where going back and forth from one scale to another is the main characteristic of the modeling idea; however, from the practical point of view, it also shows that, because of the current computational limitations, the whole complexity of the problem can only be addressed in an approximate way. Nevertheless, the relevant message here is that, because of the clear separation of time and length scales, the basic strategy of going back and forth would remain the optimal one, even if computational resources were available for studying macroscopic systems. Finally, in the next section, we will describe the case where scales cannot be separated and, thus, a simultaneous coupling of the corresponding computational techniques and models is required.

Figure 4. The backmapping loop. Following the black-arrowed line (I), derive a suitable coarse-grained model; (II) then, equilibrate a large coarse-grained sample; (III) next, cut out a subsystem and map back the atomistic degrees of freedom; (IV) the atomistic sample is then equilibrated; and finally, (V) a subsystem is cut out from it, and for this latter, the quantum process is allowed. Once the quantum process has occurred, the procedure continues by reinserting the subsystem into the atomistic sample, and then, the loop is repeated (red-arrowed line).
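The control flow of the loop in Figure 4 can be made explicit with a short sketch. The code below is only a schematic driver under stated assumptions: the function `run_backmapping` and the operation names (`derive_cg_model`, `equilibrate_cg`, `backmap`, `equilibrate_atomistic`, `quantum_event`, `reinsert`) are hypothetical placeholders for the actual simulation stages, which the user would have to supply; nothing here refers to an existing package.

```python
def run_backmapping(n_cycles, ops):
    """Schematic driver for the backmapping loop (steps I-V of Figure 4).

    ops is a dict of user-supplied callables (all names are hypothetical):
      derive_cg_model()                -> coarse-grained model           (I)
      equilibrate_cg(model, sample)    -> equilibrated CG sample         (II)
      backmap(cg_sample)               -> atomistic subsystem            (III)
      equilibrate_atomistic(at_sample) -> equilibrated atomistic sample  (IV)
      quantum_event(at_sample)         -> subsystem after excitation     (V)
      reinsert(subsystem, host)        -> host with subsystem put back
    Returns the final CG sample and the ordered list of stages executed.
    """
    log = []

    def call(name, *args):
        log.append(name)
        return ops[name](*args)

    cg_model = call("derive_cg_model")                    # (I)
    cg_sample = call("equilibrate_cg", cg_model, None)    # (II)
    for _ in range(n_cycles):
        at_sample = call("backmap", cg_sample)            # (III)
        at_sample = call("equilibrate_atomistic", at_sample)  # (IV)
        qm_sub = call("quantum_event", at_sample)         # (V)
        # Way back up: quantum -> atomistic -> coarse-grained, then repeat.
        at_sample = call("reinsert", qm_sub, at_sample)
        at_sample = call("equilibrate_atomistic", at_sample)
        cg_sample = call("reinsert", at_sample, cg_sample)
        cg_sample = call("equilibrate_cg", cg_model, cg_sample)
    return cg_sample, log

# Trivial stand-ins so that the driver can be executed as written.
stub_ops = {
    "derive_cg_model": lambda: "cg-model",
    "equilibrate_cg": lambda model, sample: "cg-sample",
    "backmap": lambda cg: "atomistic-sample",
    "equilibrate_atomistic": lambda at: at,
    "quantum_event": lambda at: "qm-subsystem",
    "reinsert": lambda sub, host: host,
}
final_sample, stages = run_backmapping(1, stub_ops)
```

The coarse-grained model is derived once, while the descent to the quantum level and the subsequent re-equilibration at each coarser level are repeated in every cycle, which is what distinguishes this strategy from the purely sequential one.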

#### 5. Highly-Interconnected Scales and Concurrent Coupling

As reported above, the backmapping strategy is rather efficient if there is a clear separation in length and time scales; however, when scales are intimately interconnected, in most cases, the relevant properties of the system are the expression of this interconnection. From the computational point of view, it is then mandatory to adopt strategies where, within a unified approach, all the scales are treated at the same time, *i.e.*, a simultaneous or concurrent coupling of scales. A typical example, reported in Figure 5, is the solvation of molecules in water. High resolution (at least classical atomistic) is needed in the first hydration shells of the molecule, where the explicit formation of the hydrogen bonding network is required, as this uniquely characterizes the solvation of the specific molecule. Far away in the bulk, water plays the role of a thermodynamic bath, and thus, it may be described by coarser models, which reproduce the essential thermodynamic features (e.g., temperature and particle density fluctuations). However, the scale at which the first hydration shells evolve cannot be separated from the thermodynamic scale of the bulk, since between the two regions, there is a simultaneous exchange of information in terms of energy and (possibly) particles.

For this reason, both scales must be treated in a simultaneous fashion, taking care that the overall thermodynamics is well preserved. Popular computational approaches of this kind are, for example, the Quantum Mechanical/Molecular Mechanical (QM/MM) set-ups, where a small quantum region is coupled to a large atomistic or coarse-grained classical region, allowing free exchange of energy (though not particles, see the discussion later on) (see, e.g., [24]). The set-up mentioned above allows then for studying systems where the local process can be linked to the global behavior; for example, *in situ* simulation of chemical reactions and molecular excitations in solution, where the chemical reaction or the molecular excitation occurs at the quantum level in a small region of the simulation space, while the bulk solvent can be treated at the classical atomistic level [25]. Another example, relevant for mechanical engineering, is that of the crack propagation in solids. The breaking of atomic bonds in the region of the crack must be treated at the quantum level, because this is an electronic property; then, the surrounding region can be described at the classical atomistic level, so that one can see the induced crystal distortion. Finally, the bulk material is described at the continuum (finite elements) level in order to detect its macroscopic mechanical changes. However, all the scales exchange information simultaneously as the crack propagates, and thus, they are coupled in a simultaneous fashion [26]. Although the dividing line between separable scales and interconnected scales is not sharp, one can see a clear difference between the examples reported in this section and that of the azobenzene systems of the previous section. In fact, let us suppose, for example, that we were interested only in the influence of the immediate molecular neighborhood onto the excitation and de-excitation of an azobenzene molecule. 
In this case, a QM/MM approach would be highly appropriate, because the local liquid structure and its local fluctuations would slowly follow (and at the same time, influence) the evolution of the electronic and conformational properties of the excited molecule. In fact, in the multiscale study of the azobenzene system, the QM/MM approach was used for quantum calculations; however, the macroscopic response of the bulk cannot be predicted only by QM/MM calculations, because it would require a size for the MM system and a time scale that are, at this stage, computationally prohibitive. For this reason, the coarse-grained approach for obtaining relaxed macroscopic configurations is in this case mandatory. The three categories of problems discussed so far provide an overview of what in the literature can be classified as multiscale. However, there is an underlying general message in all the examples given: multiscale essentially means the interplay between local and global aspects (in space and time). Thus, the detailed understanding of the molecular origin of macroscopic properties requires a step forward, beyond the strategies shown so far (or their possible combinations). This will be discussed in the next section, where we introduce the idea of adaptive resolution simulation.

Figure 5. Pictorial representation of a molecule solvated in water. The first hydration shells must be treated at least at the atomistic level, so that the formation of the hydrogen bonding network can be explicitly described. Far away in the bulk, a coarse model (e.g., spherical) can be employed to assure the proper thermodynamic bath. However, the two scales are coupled simultaneously. Note that in standard computational set-ups, such as the one of Quantum Mechanical/Molecular Mechanical (QM/MM) discussed in the text, the high resolution region is fixed and particles cannot be exchanged; thus, there is only an exchange of energy.


#### 6. Molecular Origin of Macroscopic Properties: Zooming in at the Molecular Scale

The molecular origin of macroscopic properties can be understood by zooming in (and out) in the region where the relevant microscopic physics and chemistry is taking place.

Figure 6 explains this idea for two examples previously discussed, namely, the adsorption of a large molecule on a solid surface and the solvation of a molecule in water. In the first case, while the molecule is far away from the surface, the only relevant physics is related to the proper sampling of the conformational space of the backbone; thus, a simplified bead and spring model would be sufficient for this purpose. Instead, as the molecule approaches the surface, one needs to zoom in (put the system under a magnifying glass) at the contact region and have an explicit atomistic description, so that the chemical recognition between the molecule and the surface can be properly described and understood together with the consequent conformational rearrangement of the rest of the molecule (at a coarser scale). The same kind of idea applies to the solvation of molecules in water; the specific solvation structures of the liquid can be understood by zooming in at the hydration region around the solute: when a water molecule enters the viewing region of the magnifying glass, it must be described at the atomistic level; when it leaves, it takes on a coarse-grained description. This process requires that the high resolution region be open and allow for a free exchange of molecules. This, from the methodological point of view, implies that one must go beyond the idea of concurrent coupling to that of "adaptive resolution simulation".

Figure 6. On the left side, (a), zooming in on the contact region between the molecule and the surface. The magnifying glass is intended as a computational tool to introduce explicit chemistry and atomistic structure, so that the process of chemical recognition between the molecule and the solid surface can be properly described together with the simultaneous evolution of the large-scale conformational changes of the polymer. On the right side, (b), the same idea, but for the solvation process. All the molecules under the magnifying glass must have atomistic resolution, so that the hydrogen bonding network of the hydration shell can be properly described. Molecules that leave the solvation region lose atomistic degrees of freedom.

*6.1. Beyond Concurrent Coupling: Adaptive Resolution Molecular Dynamics*

From the methodological point of view, the essential requirements of an adaptive resolution molecular dynamics scheme are the following:


• (iii) Finally, the process of (i) and (ii) should occur under conditions of thermodynamic equilibrium: *i.e.*, the same particle density, same temperature and same pressure all over the simulation box ($\rho_{atom} = \rho_{cg}$, $p_{atom} = p_{cg}$, $T_{atom} = T_{cg}$).

Of course, the thermodynamic state point must be the same as if the whole system were described at high resolution. Several adaptive resolution methods, which sometimes satisfy and sometimes do not satisfy the requirements above, have been proposed in the last few years [27–31]. While most of the methods can switch only between two resolutions, the AdResS method has extended the idea of adaptivity from the quantum description of atoms [32–34] to the continuum description of a liquid [35,36]. The original idea was to have an on-the-fly interchange between atomistic and coarse-grained description of a liquid through a two-stage procedure. First, an effective, coarse-grained pair potential, $U^{cm}$, is developed from a reference all-atom simulation. Next, the atomistic and coarse-grained resolutions are coupled through an interpolation formula on the forces:

$$\mathbf{F}_{\alpha\beta} = w(X_{\alpha})w(X_{\beta})\mathbf{F}_{\alpha\beta}^{atom} + [1 - w(X_{\alpha})w(X_{\beta})]\mathbf{F}_{\alpha\beta}^{cm} \tag{1}$$

Here, α and β are the labels of two molecules, $\mathbf{F}^{atom}_{\alpha\beta}$ is the force derived from the atomistic potential and $\mathbf{F}^{cm}_{\alpha\beta}$ is the force derived from the coarse-grained potential. $X_\alpha$ and $X_\beta$ are the coordinates of the center of mass of molecule α and molecule β, respectively. The multiplicative function, w(x), is zero in the coarse-grained region, one in the atomistic region and smooth and monotonic in an intermediate region, Δ. Figure 7 shows the idea for a test (tetrahedral) molecule: on the left, coarse-grained; in Δ, a hybrid resolution according to w(x); and, on the right, atomistic.
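To make the interpolation of Equation (1) concrete, a minimal sketch is given below. It makes several assumptions that are not part of the text: the switching function is taken to be a sin²-shaped ramp along a single coordinate x (the text only requires w to be smooth and monotonic in Δ), the boundaries `x_cg` and `x_at` of the hybrid region are hypothetical parameters, and the atomistic force is passed in as a single total vector on the molecule, whereas in the scheme described here it is distributed over the individual atoms.

```python
import numpy as np

def weight(x, x_cg, x_at):
    """Switching function w(x): 0 for x <= x_cg (coarse-grained region),
    1 for x >= x_at (atomistic region), smooth and monotonic in between.
    The sin^2 ramp is an assumed functional form, not taken from the text."""
    s = np.clip((x - x_cg) / (x_at - x_cg), 0.0, 1.0)
    return np.sin(0.5 * np.pi * s) ** 2

def adress_pair_force(x_alpha, x_beta, f_atom, f_cm, x_cg, x_at):
    """Force interpolation of Equation (1) for one pair of molecules.

    x_alpha, x_beta : center-of-mass positions along the resolution axis
    f_atom          : pair force from the atomistic potential (total vector)
    f_cm            : pair force from the coarse-grained potential
    """
    lam = weight(x_alpha, x_cg, x_at) * weight(x_beta, x_cg, x_at)
    return lam * np.asarray(f_atom, float) + (1.0 - lam) * np.asarray(f_cm, float)

# Hypothetical numbers: hybrid region Delta spans 1.0 < x < 3.0.
f_at = np.array([1.0, 0.0, 0.0])
f_cg = np.array([0.0, 2.0, 0.0])
f_hybrid = adress_pair_force(2.0, 2.0, f_at, f_cg, x_cg=1.0, x_at=3.0)
```

With both molecules at the midpoint of Δ, each weight is 0.5, so the pair interacts with one quarter of the atomistic force and three quarters of the coarse-grained force. As soon as either molecule is in the coarse-grained region, the product of weights vanishes and only the coarse-grained force remains.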

Figure 7. Schematic representation of the adaptive idea for tetrahedral molecules.

With this set-up, two atomistic molecules interact as atomistic, coarse-grained molecules interact with all the others as coarse-grained pairs (coarse-grained molecules do not possess any atomistic degrees of freedom), while for the other cases, molecules interact according to their coupled value of $w(X_\alpha)w(X_\beta)$ with hybrid resolutions. This means that a molecule that goes from the atomistic to the coarse-grained region slowly loses its atomistic degrees of freedom (rotations and vibrations) and becomes an effective sphere, going through a continuous stage of hybrid resolutions in Δ. The same process, but in the opposite direction (acquiring degrees of freedom), occurs for a coarse-grained molecule moving towards the atomistic region. At this point, the "technical" meaning of "losing" or "acquiring" degrees of freedom in the boundary (hybrid) region that divides the atomistic and the coarse-grained region must be clarified. All molecules of the system, independently of their position in space, retain the full atomistic structure. For a molecule, α, in the hybrid region, the force acting on each single atom (derived from the atomistic potential) is weighted by an amount, $w(X_\alpha)$, multiplied by the weight, $w(X_\beta)$, of the paired molecule, β, and, by construction, the remaining total force acting on the molecule (derived from the coarse-grained potential) is assigned to the center of mass by the weighting term, $[1 - w(X_\alpha)w(X_\beta)]$. Since w(x) goes to zero in the coarse-grained region, the closer the molecule is to the coarse-grained region, the weaker is the force (derived from the atomistic potential) acting on each atom (and the larger the force on the center of mass). When the molecule enters into the coarse-grained region, the only force acting on the molecule is that on the center of mass, independently of the resolution of the paired molecule. 
The forces acting on the atomistic degrees of freedom coming from the atomistic potential are no longer considered, although the underlying atomistic structure is artificially kept for technical reasons. In practice, the interactions of the atomistic degrees of freedom for this molecule are no longer explicitly considered. This implies that their contributions to the dynamic evolution and to the energy of the system disappear. Moreover, the internal kinetic energy of the molecule (*i.e.*, kinetic energy associated with rotations and internal vibrations) is also not considered in the calculation of the properties of the system (thus, effectively, the molecule behaves as a sphere). The computational gain consists of a drastically reduced number of interactions that need to be calculated. *Vice versa*, when a molecule enters into the transition region, the force acting on each single atom is slowly reactivated according to the weight, $w(X_\alpha)$ (and to the weight of the paired molecule, $w(X_\beta)$). When the molecule reaches the atomistic region, we have $w(X_\alpha) = 1$, and thus, it interacts in the standard full atomistic manner with molecules of the atomistic region, while for the interaction with molecules in other regions, the weight on the atomistic force depends only on the position of the paired molecule (the extreme case of the paired molecule being in the coarse-grained region is discussed above). A detailed explanation of all the technical details of the implementation of this idea can be found in [37]. Obviously, in the dividing boundary region between the atomistic and coarse-grained region, one must properly deactivate/reactivate the atomistic degrees of freedom. 
This is done by introducing an external source of heat (thermostat), which acts locally on each degree of freedom in order to ensure that, despite the deactivated/reactivated degrees of freedom, the temperature in this region is the same as the target one (that is, the temperature of the atomistic and of the coarse-grained systems) [38–40]. Additionally, an external force derived on the basis of the first principles of thermodynamics (*i.e.*, a force that balances the difference of the chemical potential between the atomistic region and the rest of the system) [41] is added to ensure thermodynamic equilibrium. The natural question for such a molecular dynamics scheme is whether the force of Equation (1) is conservative, and the answer is negative. In fact, despite a different claim [29,42,43], it has been shown both analytically [44] and numerically [45] that within this scheme, there is no possibility of deriving Equation (1) from a potential. According to the statement above, a natural conclusion is that without a conservative force one cannot be in the microcanonical (or canonical) ensemble, and thus, even time-independent properties cannot be accurately calculated. However, it has been shown that the atomistic region (the most delicate case) is characterized by a Grand Canonical distribution [46], and thus, the method reproduces the same statistics as the equivalent region in a full atomistic system (the key property of any valid adaptive resolution method!). Moreover, an exact Hamiltonian can be written for the atomistic (and coarse-grained) region, where exact means that each term of the Hamiltonian is physically well defined without introducing any artificial (unphysical) quantity; this, then, allows for a full Grand Canonical-like formalization of the adaptive set-up [47,48]. 
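
As a rough illustration of the force interpolation described above, the following Python sketch combines an atomistic and a coarse-grained pair force with the weight w(Xα)w(Xβ). The cos² form of the switching function, the region boundaries `x_at` and `x_hy`, the one-dimensional geometry and all function names are assumptions made for this example only; the text specifies only that w equals one in the atomistic region, zero in the coarse-grained region and varies smoothly in between. The thermostat and the thermodynamic force are not included.

```python
import math

def weight(x, x_at=1.0, x_hy=2.0):
    """Hypothetical switching function w(x) along one axis.

    w = 1 for |x| <= x_at (atomistic region), w = 0 for |x| >= x_hy
    (coarse-grained region), and a smooth cos^2 interpolation in the
    hybrid region in between. The cos^2 form is an assumption.
    """
    d = abs(x)
    if d <= x_at:
        return 1.0
    if d >= x_hy:
        return 0.0
    return math.cos(0.5 * math.pi * (d - x_at) / (x_hy - x_at)) ** 2

def pair_force(x_alpha, x_beta, f_atomistic, f_coarse):
    """Total interpolated force of molecule beta on molecule alpha.

    f_atomistic: sum of the atom-atom forces (from the atomistic potential)
    f_coarse:    center-of-mass force (from the coarse-grained potential)
    In a real code the first part acts on the individual atoms and the
    second part on the center of mass; here only the total is returned.
    """
    ww = weight(x_alpha) * weight(x_beta)
    return ww * f_atomistic + (1.0 - ww) * f_coarse
```

With these definitions, two molecules inside the atomistic region feel only `f_atomistic`, and any pair that includes a coarse-grained molecule feels only `f_coarse`, as stated in the text.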
In general, all the relevant properties (radial distribution functions, density distribution across the simulation box, molecular diffusion, particle density fluctuations and solvation structures) determined via full atomistic simulation for several liquids and solvated systems were properly reproduced (see, e.g., [27,38,41,49–53]). However, a key issue remains unresolved, that is, whether or not there exists a well-founded Hamiltonian route to adaptive resolution simulation. If a Hamiltonian approach exists, it must fulfill the following necessary requirements:


Recently, Potestio *et al.* proposed a global Hamiltonian approach to AdResS [54]. This is based on the interpolation in space of the atomistic and coarse-grained potentials; that is, in this case, the potential is interpolated instead of the forces, as in the standard AdResS. The idea provides an elegant thermodynamic procedure for equilibrating the system, but, by construction, cannot satisfy the requirement of the correct limit according to [44]. In fact, an additional force is generated by the gradient of the weighting function, w(x). This force induces an unphysical flux of particles from one region to another. In order to balance this flux, an additional field must be added to the original Hamiltonian. In [54], an elegant thermodynamic procedure is used to determine this field. However, in [44], it has been shown that such a field is the solution of a first-order partial differential equation, and in order to describe a proper adaptive system, the field must satisfy two boundary conditions; that is, it must be zero in the atomistic and in the coarse-grained region. Since the equation is of the first order, only one boundary condition can be used to fix the solution. Let us suppose we fix the boundary condition in the atomistic region; then, inevitably, in the coarse-grained region, the original potential is changed by an artificial, unphysical additional term. This implies that if we, ideally, shrink the atomistic region to zero, we do not recover the original coarse-grained potential. The additional term is constant in space, but strongly depends on the size of the system; although this may turn out to be irrelevant from the technical point of view, from the conceptual point of view, it makes the Hamiltonian, and, thus, the corresponding adaptive process, artificial. 
Moreover, at a more practical level, according to [48], as it stands now, the method can ensure that in the atomistic region (when compared to a reference full atomistic simulation), only the first order of the probability distribution of the system, that is, the molecular number density, is obtained with high accuracy. Higher orders, such as atom-atom (or even molecule-molecule) radial distribution functions or three-body correlations, do not come automatically. Nevertheless, the approach of [54] may represent the closest (in some sense "first order") approximation to a truly adaptive Hamiltonian procedure, with the further, major, advantage of allowing adaptive Monte Carlo simulations [55]. In conclusion, a truly satisfactory adaptive Hamiltonian has not been found yet, and my personal opinion is that this is not really needed; however, the cultural barrier in the MD community is such that, unfortunately, a method without a Hamiltonian is seen as a problem, rather than an alternative to the standard routes. The consequence is that often, the various additions to the original idea of AdResS, despite being based on clear thermodynamic arguments, are considered more as practical patches that cover, at best, unavoidable conceptual holes due to the lack of a Hamiltonian, rather than as a natural methodological evolution *per se*.

On the other hand, one should keep in mind the following: the constraint that the atomistic region of the adaptive simulation should reproduce the probability distribution of the full atomistic system (that is, the key criterion to evaluate the quality of any adaptive method) can be easily implemented within the standard force-based AdResS, but (so far) not in a Hamiltonian one [48]. In [47,48], first-principles analytic/numerical conditions on the probability distribution of the system have been defined in such a way that the accuracy in the high resolution region can be controlled and systematically improved. Moreover, in [48], it has also been shown that the method is a very powerful tool for the calculation of the chemical potential of complex liquids, at a computational cost that is orders of magnitude below that of the Insertion Particle Method, routinely used for calculating such a quantity.

#### 7. Conclusions

We have addressed the question of how to provide a concrete meaning to the expression *multiscale modeling* in Molecular Dynamics (simulation). The use of the criterion of separability of scale allows for classifying multiscale problems in three main categories. Next, the most popular computational strategies adopted for each category have been discussed. In this perspective, we have underlined the role of the concurrent coupling of scales via the idea of adaptive molecular resolution. In principle, this approach can be considered a truly multiscale technique, which acts as a sort of computational microscope to determine the molecular origin of the large-scale behavior of matter. In fact, it can zoom in to the very microscopic detail and zoom out to the large-scale behavior while the simulation is running. Future developments should be focused on including electronic resolution and, thus, address the question of interfacing a quantum and a classical system in a physically-consistent way.

#### Acknowledgments

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) within the Heisenberg program (grant code DE 1140/5-1).

#### Conflicts of Interest

The authors declare no conflict of interest.

### References


Reprinted from *Entropy*. Cite as: Hsieh, C.-Y.; Kapral, R. Correlation Functions in Open Quantum-Classical Systems. *Entropy* 2014, *16*, 200–220.

#### *Article*

## Correlation Functions in Open Quantum-Classical Systems

#### Chang-Yu Hsieh and Raymond Kapral \*

Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada; E-Mail: kim.hsieh@utoronto.ca

\* Author to whom correspondence should be addressed; E-Mail: rkapral@chem.utoronto.ca; Tel.: +1-416-978-6106; Fax: +1-416-978-5325.

*Received: 25 September 2013; in revised form: 21 October 2013 / Accepted: 22 October 2013 / Published: 27 December 2013*

Abstract: Quantum time correlation functions are often the principal objects of interest in experimental investigations of the dynamics of quantum systems. For instance, transport properties, such as diffusion and reaction rate coefficients, can be obtained by integrating these functions. The evaluation of such correlation functions entails sampling from quantum equilibrium density operators and quantum time evolution of operators. For condensed phase and complex systems, where quantum dynamics is difficult to carry out, approximations must often be made to compute these functions. We present a general scheme for the computation of correlation functions, which preserves the full quantum equilibrium structure of the system and approximates the time evolution with quantum-classical Liouville dynamics. Several aspects of the scheme are discussed, including a practical and general approach to sample the quantum equilibrium density, the properties of the quantum-classical Liouville equation in the context of correlation function computations, simulation schemes for the approximate dynamics and their interpretation and connections to other approximate quantum dynamical methods.

Keywords: quantum correlation functions; quantum-classical systems; nonadiabatic dynamics

#### 1. Introduction

The dynamical properties of condensed-phase or complex systems are often investigated experimentally by applying external fields to weakly perturb a system and observe its relaxation back to the thermal equilibrium state. In such experiments, measurable quantities can be related to equilibrium time correlation functions via linear response theory [1,2]:

$$C\_{AB}(t) = \frac{1}{Z\_Q} \text{Tr} \left[ e^{-\beta \hat{H}} \hat{A}(0) \hat{B}(t) \right] = \frac{1}{Z\_Q} \text{Tr} \left[ e^{-\beta \hat{H}} \hat{A} e^{\frac{i}{\hbar} \hat{H}t} \hat{B} e^{-\frac{i}{\hbar} \hat{H}t} \right] \tag{1}$$

where $\hat{A}$ and $\hat{B}$ are operators corresponding to some specific dynamical variables under investigation, $\hat{H}$ is the unperturbed Hamiltonian and $Z\_Q$ is the quantum canonical partition function associated with $\hat{H}$. Many experiments employing spectroscopic methods directly probe such time correlation functions.
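
For a system small enough to be diagonalized, the canonical correlation function of Equation (1) can be evaluated directly in the eigenbasis of the Hamiltonian, where the trace becomes a double sum over eigenstates with Boltzmann weights and phase factors $e^{i(E\_m - E\_n)t/\hbar}$. The Python sketch below shows this for a two-level example; the function name, the choice $\hbar = 1$ and the numerical values are illustrative assumptions, and this brute-force route is not the approach developed in this paper.

```python
import cmath
import math

def correlation_function(energies, A, B, beta, t, hbar=1.0):
    """C_AB(t) = Tr[exp(-beta H) A B(t)] / Z_Q in the eigenbasis of H.

    energies: eigenvalues E_n of H
    A, B:     operator matrices (lists of lists) in that eigenbasis
    """
    n_states = len(energies)
    Z = sum(math.exp(-beta * E) for E in energies)
    total = 0.0 + 0.0j
    for n in range(n_states):
        boltzmann = math.exp(-beta * energies[n])
        for m in range(n_states):
            # Phase picked up by B(t) = exp(iHt/hbar) B exp(-iHt/hbar)
            phase = cmath.exp(1j * (energies[m] - energies[n]) * t / hbar)
            total += boltzmann * A[n][m] * B[m][n] * phase
    return total / Z

# Two-level example (illustrative values): H = diag(-1, 1), A = B = sigma_x.
sigma_x = [[0.0, 1.0], [1.0, 0.0]]
c_t = correlation_function([-1.0, 1.0], sigma_x, sigma_x, beta=0.5, t=0.3)
```

The double loop scales with the square of the Hilbert space dimension, which itself grows exponentially with the number of degrees of freedom; this is why such a direct evaluation is not an option for condensed phase systems.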

Exact numerical evaluation of Equation (1) for real condensed phase quantum systems is prohibitive, since the computational cost scales exponentially with respect to the number of degrees of freedom (DOF). Various approaches have been developed to address this challenging problem. A common approach shared by many methods is to partition the entire system into a subsystem (whose dynamical properties are of interest) and an environment (or bath) in which the subsystem resides. Other recently developed schemes for computing quantum correlation functions do not rely on such a partition and instead utilize approximations to treat the quantum evolution of the entire system in conjunction with quantum equilibrium sampling [3–5]. In this paper, we focus on schemes based on the system-bath partition, and using this partition, the Hamiltonian reads: $\hat{H} = \hat{H}\_b + \hat{h}\_s + \hat{V}\_c(\hat{R})$, where $\hat{H}\_b = \frac{\hat{P}^2}{2M} + \hat{V}\_b(\hat{R})$ and $\hat{h}\_s$ represent the pure bath and subsystem Hamiltonians, respectively. The last term in $\hat{H}$ is a coupling potential that depends on the spatial coordinates of the bath wave functions. We shall always take the bath part of the Hamiltonian in the coordinate representation; however, we can represent $\hat{h}\_s = \frac{\hat{p}^2}{2m} + \hat{V}\_s(\hat{r})$ in some quantum basis: $\hat{h}\_s = \sum\_{ij} |i\rangle \langle i| \hat{h}\_s |j\rangle \langle j|$.

Several methods based on various master equations [6–10] and path integral influence functional methods [11,12] provide approximate schemes, often in the weak coupling limit, to systematically project out the environmental DOF and yield a subsystem dynamics that incorporates dissipation and decoherence, due to coupling to the environment. However, for many applications, such as proton and electron transfer in condensed phases, it is desirable to explicitly simulate, even approximately, the bath dynamics, since specific local bath DOF may be crucial for a description of the dynamics of the quantum subsystem. For this purpose, several semiclassical [13–15] and mixed quantum-classical [16,17] (MQC) methods, which either treat the entire dynamics semiclassically or simulate the dynamics of the bath and subsystem with different levels of rigor (e.g., classical *versus* quantum mechanical), have been formulated. Many semiclassical and mixed quantum-classical approaches, adopting powerful classical simulation techniques, evaluate Equation (1) by combined Monte Carlo-molecular dynamics (MC-MD) techniques.

In this paper, we formulate MC-MD schemes to evaluate Equation (1) within the framework of the quantum-classical Liouville equation (QCLE) [18]. The QCLE employs a partial Wigner representation of the environmental (bath) DOF and may be derived from full quantum dynamics by truncating the quantum evolution operator to the first order in a small parameter related to the ratio of the characteristic masses of quantum and bath DOF [18]. In particular, we suppose that the quantum subsystem has a finite-dimensional Hilbert space. Under this assumption, Equation (1) is cast in the following form [19,20]:

$$C\_{AB}(t) \;= \frac{1}{Z\_Q} \sum\_{n\_1, n\_2} \int dX \left[ \left( e^{-\beta \hat{H}} A \right)\_W^{n\_1 n\_2} (X) B\_W^{n\_2 n\_1} (X, t) \right] \tag{2}$$

where the $n\_j$ indices label the basis states (in some chosen quantum basis), $X = (R, P)$ represents the Wigner-transformed phase space point for the bath, $N\_B$ is the number of bath DOF and the subscript, $W$, on an operator indicates a partial Wigner transform on the bath DOF; e.g., an operator is partially Wigner transformed as $\hat{B}\_W(X) = \int dZ \left\langle R - \frac{Z}{2} \right| \hat{B} \left| R + \frac{Z}{2} \right\rangle e^{\frac{i}{\hbar} P \cdot Z}$.
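
Two elementary cases, stated here only as a quick check of the transform convention and not as part of the original derivation, illustrate the partial Wigner transform. For an operator that depends only on the bath position, the position matrix element is diagonal and the transform returns the corresponding classical function; the same holds for the bath momentum operator:

$$\left(f(\hat{R})\right)\_W(X) = \int dZ \left\langle R - \frac{Z}{2} \right| f(\hat{R}) \left| R + \frac{Z}{2} \right\rangle e^{\frac{i}{\hbar} P \cdot Z} = \int dZ\, f\left(R + \frac{Z}{2}\right) \delta(Z)\, e^{\frac{i}{\hbar} P \cdot Z} = f(R), \qquad \left(\hat{P}\right)\_W(X) = P$$

Operators that act only on the subsystem are unaffected by the transform and remain matrices in the subsystem basis.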

Two main tasks are involved in evaluating Equation (2) with an MC-MD algorithm. First, one needs to sample initial conditions (for an ensemble of trajectories) from the partially Wigner-transformed quantum density, $(\hat{\rho}\_{eq}\hat{A})\_W(X)$, with $\hat{\rho}\_{eq} = e^{-\beta\hat{H}}/Z\_Q$. There exist numerical algorithms to accomplish such a task [21,22]. Second, one needs to propagate the initial points in the phase space. These time-evolved trajectories may then be used to construct the matrix elements, $B\_W^{nm}(X, t)$, needed to compute the correlation function. Various simulation methods, whose structure depends on the basis chosen to represent the quantum degrees of freedom in the QCLE, have been devised to simulate the mixed quantum-classical dynamics [23–31]. Simulation methods that utilize an adiabatic basis can be cast into the form of surface-hopping dynamics, but in a way that includes coherent evolution segments that account for creation and destruction of coherence in a proper manner. More recently, as in some semiclassical approaches [32], the mapping basis [33] was used to describe the quantum degrees of freedom in the QCLE in a continuous classical-like manner, leading to a trajectory description in the full system phase space [30,31,34–36].

The goals and outline of the paper are as follows: We first consider how the two ingredients, quantum equilibrium sampling and evolution of quantum operators, which are needed to compute quantum correlation functions, may be carried out. In Section 2, we describe a path-integral scheme to perform MC sampling from the partially Wigner transformed quantum density. In the Appendix, we also discuss a simplified, but approximate sampling scheme that is useful in the high-temperature limit. Another aim of this paper is to demonstrate how a recently-developed simulation method for the QCLE, the forward-backward trajectory solution (FBTS), can be used to efficiently obtain quantum correlation functions. To place these results in proper context, in Section 3, we sketch the important features and properties of the QCLE and discuss both the adiabatic Trotter-based surface-hopping (TBSH) algorithm and the FBTS, which is formulated in the mapping basis. In this section, we also present the explicit form of the N-level generalization of the TBSH algorithm. Comparisons of the trajectories that underlie these algorithms allow us to investigate how completely different ensembles of trajectories can be used to simulate the same observable correlation function. The implementation and utility of the simulation algorithms are illustrated on the dynamics in a two-level system coupled to a quartic oscillator embedded in a bath of independent harmonic oscillators, described in Section 4. Finally, in Section 5, we comment on the advantages, challenges and potential problems in adopting an approximate mixed quantum-classical dynamics for the computation of quantum time-correlation functions.

#### 2. Sampling from the Partially Wigner-Transformed Density

In general, analytical expressions for the Wigner transform of the density matrix cannot be determined easily. In this section, we present a path-integral-based scheme to perform MC sampling from the Wigner-transformed density, $(\hat{\rho}\_{eq}\hat{A})\_W^{n\_1 n\_2}(X)$, in Equation (2).

First, we recall the definition of partial Wigner transform:

$$\left(\hat{\rho}\_{eq}\hat{A}\right)\_W^{n\_1 n\_2}(X) = \frac{1}{Z\_Q} \int dZ \left\langle n\_1, R - \frac{Z}{2} \right| e^{-\beta \hat{H}} \hat{A} \left| n\_2, R + \frac{Z}{2} \right\rangle e^{\frac{i}{\hbar} P \cdot Z} \tag{3}$$

where $R$ represents the vector of bath coordinates, $n$ denotes a basis state for the subsystem and $\hat{H} = \frac{\hat{P}^2}{2M} + \hat{V}\_b(\hat{R}) + \hat{h}(\hat{R})$ with $\hat{h}(\hat{R}) \equiv \hat{h}\_s + \hat{V}\_c(\hat{R})$. One way to compute the integral on the right side of Equation (3) is to first factorize $e^{-\beta\hat{H}} = \prod e^{-\beta\_L\hat{H}}$ into $L - 1$ pieces with $\beta\_L = \beta/(L - 1)$. Following the standard procedures for path integral calculations, we then insert resolutions of the identity, $I = \int dR\_i \sum\_{m\_i} |m\_i, R\_i\rangle \langle m\_i, R\_i|$, between every pair of factorized operators and apply the approximation, $e^{-\beta\_L\hat{H}} \approx e^{-\beta\_L \frac{\hat{P}^2}{2M}} e^{-\beta\_L(\hat{V}\_b(\hat{R}) + \hat{h}(\hat{R}))}$. The integrand on the right side of Equation (3) can then be written as follows:

$$\begin{split} & \left\langle n\_{1}, R - \frac{Z}{2} \right| e^{-\beta \hat{H}} \hat{A} \left| n\_{2}, R + \frac{Z}{2} \right\rangle \\ &= \int \prod\_{i=1}^{L-1} dR\_{i} \sum\_{\{m\_{i}\}} \left\langle n\_{1}, R - \frac{Z}{2} \right| e^{-\beta\_{L}\hat{H}} \left| m\_{1}, R\_{1} \right\rangle \left\langle m\_{1}, R\_{1} \right| e^{-\beta\_{L}\hat{H}} \left| m\_{2}, R\_{2} \right\rangle \dots \\ & \quad \times \left\langle m\_{L-1}, R\_{L-1} \right| \hat{A} \left| n\_{2}, R + \frac{Z}{2} \right\rangle \\ &= \left( \frac{M}{2 \pi \beta\_{L} \hbar^{2}} \right)^{\frac{N\_{B}(L-1)}{2}} \int \prod\_{i=1}^{L-1} dR\_{i} \sum\_{\{m\_{i}\}} \left\{ \prod\_{i=1}^{L-2} \mathcal{M}\_{i,i+1} (R\_{i}) e^{-\beta\_{L} V\_{b} (R\_{i})} e^{-\frac{M}{2\beta\_{L} \hbar^{2}} (R\_{i} - R\_{i+1})^{2}} \right\} \\ & \quad \times \left\langle n\_{1}, R - \frac{Z}{2} \right| \left( e^{-\beta\_{L} \hat{H}} \left| m\_{1}, R\_{1} \right\rangle \left\langle m\_{L-1}, R\_{L-1} \right| \hat{A} \right) \left| n\_{2}, R + \frac{Z}{2} \right\rangle \end{split} \tag{4}$$

where:

$$\mathcal{M}\_{i,j} = \left\langle m\_i \right| e^{-\beta\_L \hat{h}(R\_i)} \left| m\_j \right\rangle = \begin{cases} e^{-\beta\_L h^{ij}(R\_i)}, & i = j \\ -\beta\_L h^{ij}(R\_i) e^{-\beta\_L h^{ij}(R\_i)}, & i \neq j \end{cases} \tag{5}$$

which is correct to order $O(\beta\_L^2)$. Substituting Equation (4) into Equation (3), the new integrand of the Wigner transform becomes $\hat{\mathcal{A}} = e^{-\beta\_L\hat{H}} |m\_1, R\_1\rangle \langle m\_{L-1}, R\_{L-1}| \hat{A}$, as shown in the last line of Equation (4). An analytical approximation for the Wigner transform of $\hat{\mathcal{A}}$ can be obtained easily in most cases when $\hat{A}$ is a pure subsystem observable or if it depends on just one of the conjugate variables: $R$ or $P$. Since $\beta\_L \ll 1$, it is possible to replace the term, $e^{-\beta\_L\hat{H}}$, inside $\hat{\mathcal{A}}$ with its high-temperature approximation (discussed in the Appendix). Letting $\hat{\mathcal{A}}\_W(X)$ be the partial Wigner transform of $\hat{\mathcal{A}}$, Equation (3) reads:

$$\left(\widehat{\rho}\_{eq}\hat{A}\right)\_{W}^{n\_{1}n\_{2}}(X) = \frac{\mathcal{G}^{N\_{B}(L-1)/2}}{Z\_{Q}} \int \prod\_{i=1}^{L-1} dR\_{i} \sum\_{\{m\_{i}\}} \left\{ \prod\_{i=1}^{L-2} \mathcal{M}\_{i,i+1}(R\_{i}) e^{-\beta\_{L}V\_{b}(R\_{i})} e^{-\pi \mathcal{G}(R\_{i}-R\_{i+1})^{2}} \right\} \mathcal{A}\_{W}^{n\_{1}n\_{2}}(X) \tag{6}$$

where $\mathcal{G} = \frac{M}{2\pi\beta\_L\hbar^2}$. Substituting Equation (6) into Equation (2), the time correlation function becomes:

$$C\_{AB}(t) = \frac{\mathcal{G}^{N\_B(L-1)/2}}{Z\_Q} \sum\_{n\_1, n\_2} \sum\_{\{m\_i\}} \int \prod\_{i=1}^{L-1} dR\_i \left\{ \prod\_{i=1}^{L-2} \mathcal{M}\_{i, i+1}(R\_i) e^{-\beta\_L V\_b(R\_i)} e^{-\pi \mathcal{G}(R\_i - R\_{i+1})^2} \right\} $$

$$\times \int dX \left(\hat{\mathcal{A}}\right)\_W^{n\_1 n\_2}(X) B\_W^{n\_2 n\_1}(X, t) \tag{7}$$

Following [37], we remark that the initial phase space coordinate, $X = (R, P)$, and auxiliary variables, $\{R\_i\}$, can be sampled from probability densities constructed from $\hat{\mathcal{A}}\_W(X)$ and $|\mathcal{M}\_{i,i+1}(R\_i)| e^{-\beta\_L V\_b(R\_i)} e^{-\pi\mathcal{G}(R\_i - R\_{i+1})^2}$, respectively.
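
To make the sampling of the auxiliary variables more concrete, the sketch below applies a plain Metropolis walk to the weight built from the product of $|\mathcal{M}\_{i,i+1}(R\_i)|$, the bath Boltzmann factor and the Gaussian link between neighboring $R\_i$. It is a minimal illustration under several assumptions that are not taken from the paper: a single one-dimensional bath coordinate, a fixed sequence of subsystem state labels, a two-level model `h_of_R`, a harmonic `V_b`, and the Metropolis move itself. The dependence on $\hat{\mathcal{A}}\_W(X)$ is left out, and the off-diagonal element follows Equation (5) as printed.

```python
import math
import random

def m_element(h, i, j, beta_L):
    """Short-time matrix element of Equation (5) for a subsystem matrix h."""
    if i == j:
        return math.exp(-beta_L * h[i][i])
    return -beta_L * h[i][j] * math.exp(-beta_L * h[i][j])

def log_weight(R, states, beta_L, G, h_of_R, V_b):
    """Log of prod_i |M_{i,i+1}(R_i)| exp(-beta_L V_b(R_i)) exp(-pi G (R_i - R_{i+1})^2)."""
    total = 0.0
    for i in range(len(R) - 1):
        m = abs(m_element(h_of_R(R[i]), states[i], states[i + 1], beta_L))
        if m == 0.0:
            return -math.inf
        total += math.log(m) - beta_L * V_b(R[i])
        total -= math.pi * G * (R[i] - R[i + 1]) ** 2
    return total

def metropolis_sweep(R, states, beta_L, G, h_of_R, V_b, step, rng):
    """One Metropolis sweep over all R_i (in place); returns the number of accepted moves."""
    accepted = 0
    current = log_weight(R, states, beta_L, G, h_of_R, V_b)
    for i in range(len(R)):
        old = R[i]
        R[i] = old + rng.uniform(-step, step)
        trial = log_weight(R, states, beta_L, G, h_of_R, V_b)
        if rng.random() < math.exp(min(0.0, trial - current)):
            current = trial
            accepted += 1
        else:
            R[i] = old  # reject: restore the previous coordinate
    return accepted

# Illustrative parameters (hbar = M = 1, beta = 1, L - 1 = 8 auxiliary points).
beta_L = 1.0 / 8.0
G = 1.0 / (2.0 * math.pi * beta_L)
h_of_R = lambda R: [[0.3 * R, 0.1], [0.1, -0.3 * R]]  # hypothetical two-level model
V_b = lambda R: 0.5 * R * R                            # hypothetical harmonic bath
R = [0.0] * 8
states = [0, 0, 1, 1, 0, 0, 1, 1]
rng = random.Random(0)
for _ in range(100):
    metropolis_sweep(R, states, beta_L, G, h_of_R, V_b, 0.5, rng)
```

A complete sampler would also move the state labels and would carry the sign of each $\mathcal{M}\_{i,i+1}$ as a weight in the estimator, since only its absolute value enters the sampling density.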

#### 3. Quantum-Classical Liouville Equation

In this section, we discuss how one can simulate the time-evolved matrix elements, $B\_W^{n\_2 n\_1}(X, t)$, in Equation (2) using the QCLE:

$$\begin{split} \frac{\partial \hat{B}\_W(X,t)}{\partial t} &= \frac{i}{\hbar}[\hat{H}\_W, \hat{B}\_W] - \frac{1}{2}(\{\hat{H}\_W, \hat{B}\_W\} - \{\hat{B}\_W, \hat{H}\_W\}) \\ &= i\hat{\mathcal{L}}\hat{B}\_W(X,t) = \frac{i}{\hbar}\left(\stackrel{\rightarrow}{\mathcal{H}}\_\Lambda \hat{B}\_W - \hat{B}\_W \stackrel{\leftarrow}{\mathcal{H}}\_\Lambda\right) \end{split} \tag{8}$$

where $\Lambda = \overleftarrow{\nabla}\_P \cdot \overrightarrow{\nabla}\_R - \overleftarrow{\nabla}\_R \cdot \overrightarrow{\nabla}\_P$. The arrow on top of a differential operator indicates the direction in which it acts. In the first line, the square bracket and the curly brackets denote the quantum commutator and classical Poisson brackets, respectively. The two kinds of Lie bracket act together as the generator of the mixed quantum-classical dynamics. Due to the fact that $\hat{H}\_W(X)$ and $\hat{B}\_W(X, t)$ are quantum operators with respect to the subsystem DOF, two differently ordered Poisson brackets are needed to properly account for the mixed dynamics. However, in general, the dynamics described by the QCLE does not have a Lie algebraic structure, a feature that is common to mixed quantum-classical approaches [38]. In the second line, we introduce the abstract quantum-classical Liouville (QCL) superoperator, $\hat{\mathcal{L}}$. Finally, the third equality is another equivalent representation of the QCLE in terms of the forward and backward mixed quantum-classical Hamiltonians:

$$\stackrel{\rightarrow}{\mathcal{H}}\_{\Lambda} = \hat{H}\_W \left( 1 + \frac{\hbar \Lambda}{2i} \right), \quad \stackrel{\leftarrow}{\mathcal{H}}\_{\Lambda} = \left( 1 + \frac{\hbar \Lambda}{2i} \right) \hat{H}\_W \tag{9}$$

The QCLE has many desirable features, such as the conservation of energy, momentum and phase space volumes. Furthermore, the QCLE is equivalent to full quantum dynamics for arbitrary quantum subsystems that are bilinearly coupled to a harmonic bath. For instance, commonly used spin boson models are of this type. In this circumstance, the combination of quantum and classical brackets in the QCLE does have a Lie algebraic structure. For more general bath and coupling potentials, the QCLE provides an approximate description of the quantum dynamics. In this case, comparisons of simulations of QCL dynamics with exact quantum results have indicated that it is quantitatively accurate for a wide range of systems [36,39–48].

The QCLE can be simulated using ensembles of trajectories, which, in combination with the quantum initial condition sampling discussed above, provides a way to compute quantum correlation functions. As we shall see, the nature of the trajectories that enter in the simulations depends on the algorithm and should not be ascribed physical significance. It is only the observable, in this case, the correlation function, that has physical meaning and is independent of the manner in which it is simulated, provided the simulation algorithm is capable of exactly solving the QCLE, which is not always the case. One of the goals of this paper is to illustrate how a recently-developed FBTS [31] can be used to easily compute quantum correlation functions. For this purpose, it is interesting to contrast the solution using this scheme, and the trajectory description that underlies it, with the previously-developed and frequently-used TBSH algorithm [26]. Taking the adiabatic representation of the QCL superoperator is the key step in implementing the TBSH algorithm. The last representation of the QCLE in Equation (8) resembles the quantum Liouville equation and forms the starting point of the FBTS.

#### *3.1. Adiabatic Trotter-Based Surface Hopping*

In order to discuss the nature of the trajectory description involved in the TBSH algorithm, we briefly describe how it is implemented and, in particular, present the explicit generalization to an N-level quantum subsystem, which was only outlined in [26]. We first consider the adiabatic representation of the QCLE, since the TBSH algorithm is cast in this basis. The adiabatic basis is defined by $\hat{h}\_W(R)|\alpha; R\rangle = E\_\alpha(R)|\alpha; R\rangle$, where $\hat{h}\_W(R) = \hat{H}\_W(R) - P^2/2M$ is taken to be the adiabatic Hamiltonian for a static configuration of $R$ in this section. In the adiabatic basis, the QCLE reads:

$$\frac{\partial B\_W^{\alpha\alpha'}}{\partial t} = i\mathcal{L}\_{\alpha\alpha',\beta\beta'} B\_W^{\beta\beta'}(X,t) \tag{10}$$

where the matrix elements of the QCL superoperator are given by:

$$i\mathcal{L}\_{\alpha\alpha',\beta\beta'} = \left(i\omega\_{\alpha\alpha'} + iL\_{\alpha\alpha'}\right)\delta\_{\alpha\beta}\delta\_{\alpha'\beta'} - \mathcal{J}\_{\alpha\alpha',\beta\beta'} = i\mathcal{L}^{0}\_{\alpha\alpha'}\delta\_{\alpha\beta}\delta\_{\alpha'\beta'} - \mathcal{J}\_{\alpha\alpha',\beta\beta'} \tag{11}$$

with $\omega\_{\alpha\alpha'} = (E\_\alpha - E\_{\alpha'})/\hbar$. (The Einstein summation convention will be used throughout the following sections, although sometimes, sums will be explicitly written if there is the possibility of confusion.) The Liouville operator, $i\mathcal{L}$, may be separated into two contributions. The classical propagator is defined as:

$$iL\_{\alpha\alpha'} = \frac{P}{M} \cdot \frac{\partial}{\partial R} + \frac{1}{2} \left(F\_{\alpha} + F\_{\alpha'}\right) \cdot \frac{\partial}{\partial P} \tag{12}$$

where $F\_\alpha = -\left\langle \alpha; R \right| \frac{\partial \hat{h}\_W(R)}{\partial R} \left| \alpha; R \right\rangle$ is the Hellmann-Feynman force. The superoperator, $\mathcal{J}\_{\alpha\alpha',\beta\beta'}$, is responsible for nonadiabatic transitions and associated momentum changes in the bath. For an N-level system, there exist $N(N - 1)/2$ unique transitions. In the following, we define $\mathcal{J}$ as a sum of $\mathcal{J}\_{\lambda\lambda'}$, each of which introduces transitions only between the specific pair of adiabatic states, $\lambda$ and $\lambda'$:

$$\begin{split} \mathcal{J}\_{\alpha\alpha',\beta\beta'} &= \sum\_{\lambda>\lambda'} (\mathcal{J}\_{\lambda\lambda'})\_{\alpha\alpha',\beta\beta'} \\ &= \sum\_{\lambda>\lambda'} \Big\{ -d\_{\lambda\lambda'} \cdot \frac{P}{M} \left[ (\delta\_{\lambda\alpha}\delta\_{\lambda'\beta} - \delta\_{\lambda'\alpha}\delta\_{\lambda\beta})\delta\_{\alpha'\beta'} + (\delta\_{\lambda\alpha'}\delta\_{\lambda'\beta'} - \delta\_{\lambda'\alpha'}\delta\_{\lambda\beta'})\delta\_{\alpha\beta} \right] \\ &\qquad - \frac{1}{2}\hbar\omega\_{\lambda\lambda'}d\_{\lambda\lambda'} \cdot \frac{\partial}{\partial P} \left[ (\delta\_{\lambda\alpha}\delta\_{\lambda'\beta} + \delta\_{\lambda'\alpha}\delta\_{\lambda\beta})\delta\_{\alpha'\beta'} + (\delta\_{\lambda\alpha'}\delta\_{\lambda'\beta'} + \delta\_{\lambda'\alpha'}\delta\_{\lambda\beta'})\delta\_{\alpha\beta} \right] \Big\} \\ &= -\frac{P}{M} \cdot d\_{\alpha\beta} \left( 1 + \frac{1}{2}S\_{\alpha\beta} \cdot \frac{\partial}{\partial P} \right) \delta\_{\alpha'\beta'} + \frac{P}{M} \cdot d\_{\beta'\alpha'} \left( 1 - \frac{1}{2}S\_{\beta'\alpha'} \cdot \frac{\partial}{\partial P} \right) \delta\_{\alpha\beta} \end{split} \tag{13}$$

where $d\_{\alpha\beta} = \langle\alpha; R| \partial/\partial R |\beta; R\rangle$ and $S\_{\alpha\beta} = \hbar\omega\_{\alpha\beta} d\_{\alpha\beta} \left(\frac{P}{M} \cdot d\_{\alpha\beta}\right)^{-1}$. The second equality gives the adiabatic representation of $\mathcal{J}\_{\lambda\lambda'}$. We remark that it is difficult to exactly simulate the term, $\mathcal{J}$, involving bath momentum derivatives within the context of a trajectory-based algorithm. Using the identity $\frac{1}{2}S\_{\alpha\beta} \cdot \frac{\partial}{\partial P} = \hbar\omega\_{\alpha\beta}M \cdot \partial/\partial(\hat{d}\_{\alpha\beta} \cdot P)^2$, where $M$ is a diagonal matrix of the masses of the bath particles and $\hat{d}\_{\alpha\beta}$ is the unit vector along $d\_{\alpha\beta}$, allows us to employ the momentum-jump approximation:

$$\left(1 + \frac{c}{2} S\_{\alpha\beta} \cdot \frac{\partial}{\partial P}\right) f(P) \approx e^{\frac{c}{2} S\_{\alpha\beta} \cdot \frac{\partial}{\partial P}} f(P) = e^{c\hbar \omega\_{\alpha\beta} M \cdot \partial / \partial (\hat{d}\_{\alpha\beta} \cdot P)^2} f(P) = f(P + \Delta P\_c) \tag{14}$$

where $c = 1, 2$ corresponds to single and double hops, respectively, and $\Delta P\_c = \hat{d}\_{\alpha\beta}\,\text{sgn}(\hat{d}\_{\alpha\beta} \cdot P) \sqrt{(\hat{d}\_{\alpha\beta} \cdot P)^2 + c\hbar\omega\_{\alpha\beta}M} - \hat{d}\_{\alpha\beta}(\hat{d}\_{\alpha\beta} \cdot P)$. We have a translation operator with respect to the variable, $(\hat{d}\_{\alpha\beta} \cdot P)^2$, in the above equation. Decomposing $P = P\_\perp + P\_\parallel = P\_\perp + \hat{d}\_{\alpha\beta}\,\text{sgn}(\hat{d}\_{\alpha\beta} \cdot P) \sqrt{(\hat{d}\_{\alpha\beta} \cdot P)^2}$, it becomes obvious that the translation operator updates the $P\_\parallel$ components by $\Delta P\_c$, as presented in Equation (14). This momentum update conserves the energy of surface-hopping trajectories. Apart from technical issues associated with sampling when the algorithm is implemented, this is the only approximation made to QCL evolution. In fact, it is this approximation that gives this algorithm a surface-hopping structure that has some features in common with Tully's surface-hopping method; however, coherence and decoherence are automatically incorporated in the evolution. The QCLE does not have such sudden momentum changes, and its evolution is described by continuous momentum changes in the course of the evolution. Comparisons of results using this algorithm with exact quantum solutions indicate that the momentum-jump approximation is rarely the source of problems.
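
The momentum update can be written in a few lines. The following Python sketch computes $P + \Delta P\_c$ by replacing the component of $P$ along $\hat{d}\_{\alpha\beta}$ while leaving the perpendicular part untouched. The function name, the use of a single scalar mass instead of a diagonal mass matrix, and the choice to return `None` when the argument of the square root is negative are assumptions of this example; the treatment of such cases is not discussed in the text above.

```python
import math

def momentum_jump(P, d, hbar_omega, mass, c=1):
    """Return P + Delta P_c for the momentum-jump approximation.

    P:          bath momentum vector (list of floats)
    d:          nonadiabatic coupling vector d_{alpha beta} (any nonzero length)
    hbar_omega: hbar * omega_{alpha beta} = E_alpha - E_beta
    mass:       bath mass (one scalar mass for all bath DOF is assumed)
    c:          1 for a single hop, 2 for a double hop
    """
    norm = math.sqrt(sum(x * x for x in d))
    d_hat = [x / norm for x in d]
    p_par = sum(u * p for u, p in zip(d_hat, P))  # component of P along d_hat
    arg = p_par * p_par + c * hbar_omega * mass
    if arg < 0.0:
        return None  # assumption: flag updates with a negative square-root argument
    sign = 1.0 if p_par >= 0.0 else -1.0
    new_par = sign * math.sqrt(arg)
    # Only the parallel component changes: Delta P_c = d_hat * (new_par - p_par)
    return [p + u * (new_par - p_par) for p, u in zip(P, d_hat)]

# Illustrative values: the kinetic energy changes by c * hbar_omega / 2.
P_new = momentum_jump([1.0, 2.0], [0.0, 3.0], hbar_omega=0.5, mass=2.0, c=1)
```

For the values shown, the squared parallel momentum grows from 4 to 5, so the kinetic energy $P^2/2M$ increases by 0.25, which equals $c\hbar\omega\_{\alpha\beta}/2$.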

Equation (10) admits a formal solution:

$$
\hat{B}\_W^{\alpha\alpha'}(X,t) = \left(e^{i\mathcal{L}t}\right)\_{\alpha\alpha',\beta\beta'} B\_W^{\beta\beta'}(X) \tag{15}
$$

Thus, our following discussion focuses on evaluating:

$$\left(e^{i\mathcal{L}t}\right)_{\alpha\alpha',\alpha_K\alpha'_K} = \sum_{\{\alpha_1\alpha'_1\},\ldots,\{\alpha_K\alpha'_K\}} \prod_{j=1}^K \left(e^{i\mathcal{L}\Delta t_j}\right)_{\alpha_{j-1}\alpha'_{j-1},\alpha_j\alpha'_j} \tag{16}$$

In the above equation, we simply factorize the propagator into $K$ pieces with $\Delta t_j = t_j - t_{j-1} = \Delta t$. In each small time slice, we perform the symmetric Trotter decomposition:

$$\left(e^{i\mathcal{L}\Delta t_{j}}\right)_{\alpha\alpha',\beta\beta'} \approx \mathcal{W}_{\beta\beta'}\left(t_{j-1}, t_{j-1} + \frac{\Delta t}{2}\right)e^{iL_{\beta\beta'}\Delta t/2}\mathcal{Q}_{\beta\beta',\alpha\alpha'}\mathcal{W}_{\alpha\alpha'}\left(t_{j-1} + \frac{\Delta t}{2}, t_{j}\right)e^{iL_{\alpha\alpha'}\Delta t/2} \tag{17}$$

where $\mathcal{W}_{\alpha\alpha'}(t_1, t_2) = e^{i\omega_{\alpha\alpha'}(t_2 - t_1)}$, and:

$$\mathcal{Q}_{\alpha\alpha',\beta\beta'} = \left(e^{\mathcal{J}\Delta t}\right)_{\alpha\alpha',\beta\beta'} = \left(e^{\sum_{\lambda>\lambda'} \mathcal{J}_{\lambda\lambda'}\Delta t}\right)_{\alpha\alpha',\beta\beta'} \approx \left(\prod_{\lambda>\lambda'} e^{\mathcal{J}_{\lambda\lambda'}\Delta t}\right)_{\alpha\alpha',\beta\beta'} \tag{18}$$

We observe that it is possible to express $e^{\mathcal{J}_{\lambda\lambda'}\Delta t}$ in the following block-diagonal matrix form:

$$e^{\mathcal{J}_{\lambda\lambda'}\Delta t} = \mathcal{M}^{\lambda\lambda'} \oplus \mathcal{K}^{\lambda\lambda'}_{\xi_1} \oplus \cdots \oplus \mathcal{K}^{\lambda\lambda'}_{\xi_{N-2}} \oplus \mathcal{N}^{\lambda\lambda'} \tag{19}$$

where $\xi_i$ is one of the $N - 2$ adiabatic states other than $\lambda$ and $\lambda'$. In the above equation, $\mathcal{M}$ is a four by four matrix, defined with respect to the basis $\{(\lambda, \lambda), (\lambda, \lambda'), (\lambda', \lambda), (\lambda', \lambda')\}$:

$$\mathcal{M}^{\lambda\lambda'} = \begin{pmatrix} \cos^2(a) & -\cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & -\cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & \sin^2(a)\hat{j}_{\lambda\lambda'} \\ \cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & \cos^2(a) & -\sin^2(a) & -\sin(a)\cos(a)\hat{j}_{\lambda\lambda'} \\ \cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & -\sin^2(a) & \cos^2(a) & -\sin(a)\cos(a)\hat{j}_{\lambda\lambda'} \\ \sin^2(a)\hat{j}_{\lambda\to\lambda'} & \cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & \cos(a)\sin(a)\hat{j}_{\lambda\lambda'} & \cos^2(a) \end{pmatrix} \tag{20}$$

with $a = (P/M)\cdot d_{\lambda\lambda'}\Delta t$, and $\hat{j}_{\lambda\lambda'}$ and $\hat{j}_{\lambda\to\lambda'}$ are the momentum-jump operators, $e^{\frac{1}{2}S_{\lambda\lambda'}\cdot\frac{\partial}{\partial P}}$ and $e^{S_{\lambda\lambda'}\cdot\frac{\partial}{\partial P}}$, defined in Equation (14) with $c = 1, 2$, respectively. In Equation (19), there exists another set of four by four matrices, $\mathcal{K}^{\lambda\lambda'}_{\xi_i}$, with $i = 1, \ldots, N - 2$. Each of these matrices is defined with respect to a basis of the form $\{(\lambda, \xi_i), (\lambda', \xi_i)\} \oplus \{(\xi_i, \lambda), (\xi_i, \lambda')\}$:

$$\mathcal{K}_{\xi}^{\lambda\lambda'} = \begin{pmatrix} \cos(a) & -\sin(a)\hat{j}_{\lambda\lambda'} \\ \sin(a)\hat{j}_{\lambda\lambda'} & \cos(a) \end{pmatrix} \oplus \begin{pmatrix} \cos(a) & -\sin(a)\hat{j}_{\lambda\lambda'} \\ \sin(a)\hat{j}_{\lambda\lambda'} & \cos(a) \end{pmatrix} \tag{21}$$

Finally, there is a null matrix, $\mathcal{N}^{\lambda\lambda'}$, of size $(N - 2)^2$, and the associated null space is spanned by basis vectors $(\xi_1, \xi_2)$, where $\xi_i \neq \lambda, \lambda'$. We remark that one has to permute the basis vectors in order to construct these block-diagonal matrices [26].
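A quick way to see the structure of Equation (20) is to drop the momentum-jump operators, i.e., to replace every $\hat{j}$ by 1. This replacement is an assumption made purely for illustration; the operators are essential in the actual algorithm. Under that simplification, the c-number part of $\mathcal{M}^{\lambda\lambda'}$ is the Kronecker product of two identical 2×2 rotations by the angle $a$, one for each index of the pair. The helper names below are hypothetical.

```python
import math

def rotation(a):
    """2x2 rotation matrix, the c-number part of one block of Eq. (21)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s], [s, c]]

def kron(A, B):
    """Kronecker product of two square matrices stored as nested lists."""
    n, m = len(A), len(B)
    return [[A[i // m][j // m] * B[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def m_matrix(a):
    """Eq. (20) with every momentum-jump operator replaced by 1."""
    c, s = math.cos(a), math.sin(a)
    return [[c * c, -c * s, -c * s,  s * s],
            [c * s,  c * c, -s * s, -s * c],
            [c * s, -s * s,  c * c, -s * c],
            [s * s,  c * s,  c * s,  c * c]]
```

Comparing `m_matrix(a)` with `kron(rotation(a), rotation(a))` entry by entry reproduces the same matrix in the basis ordering $\{(\lambda,\lambda), (\lambda,\lambda'), (\lambda',\lambda), (\lambda',\lambda')\}$, which also shows that this c-number part is orthogonal.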

At this point, we have specified all the necessary details in order to simulate the QCL dynamics in the adiabatic basis:

$$B\_W^{\alpha\alpha'}(X,t) = \sum\_{\substack{(\alpha\_1\alpha'\_1),\ldots,\\(\alpha\_K\alpha'\_K)}} \left[ \prod\_{j=1}^K \mathcal{W}\_{\alpha\_{j-1}\alpha'\_{j-1}} e^{iL\_{\alpha\_{j-1}\alpha'\_{j-1}}\Delta t} \mathcal{Q}\_{\alpha\_{j-1}\alpha'\_{j-1}\alpha\_j\alpha'\_j} \mathcal{W}\_{\alpha\_j\alpha'\_j} e^{iL\_{\alpha\_j\alpha'\_j}\Delta t} \right] B\_W^{\alpha\_K\alpha'\_K}(X) \tag{22}$$

where $\alpha^{(\prime)}_0 = \alpha^{(\prime)}$. The explicit summation over all quantum indices, $(\alpha_1\alpha'_1) \ldots (\alpha_K\alpha'_K)$, can also be evaluated stochastically. For instance, given a pair of indices, $(\alpha_j\alpha'_j)$, one can determine the next pair at the time slice, $j + 1$, by drawing an MC sample from the transition probability:

$$P(\alpha\_{j+1}, \alpha\_{j+1}' | \alpha\_j, \alpha\_j') = \frac{|\mathcal{Q}\_{\alpha\_j \alpha\_j', \alpha\_{j+1}\alpha\_{j+1}'}|}{\sum\_{\beta\_{j+1}, \beta\_{j+1}'} |\mathcal{Q}\_{\alpha\_j \alpha\_j', \beta\_{j+1}\beta\_{j+1}'}|} \tag{23}$$
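The sampling step of Equation (23) might look as follows. This is a minimal sketch that treats the entries of $\mathcal{Q}$ as plain numbers (the momentum-jump operators are ignored here, which is an assumption), and the returned factor $\mathcal{Q}/P$ is the usual importance-sampling reweighting, added as an assumption about how the sampled term enters the estimator. Function and argument names are hypothetical.

```python
import random

def transition_probabilities(q_row):
    """Normalize |Q| over all candidate final index pairs, as in Eq. (23)."""
    weights = [abs(q) for q in q_row]
    total = sum(weights)
    return [w / total for w in weights]

def sample_next_pair(pairs, q_row, rng):
    """Draw the next index pair and the reweighting factor Q / P."""
    probs = transition_probabilities(q_row)
    k = rng.choices(range(len(pairs)), weights=probs)[0]
    return pairs[k], q_row[k] / probs[k]
```

Whichever pair is drawn, the magnitude of the factor returned by `sample_next_pair` equals the sum of $|\mathcal{Q}|$ over the row. For a row such as $(\cos^2 a, -\cos a\sin a, -\cos a\sin a, \sin^2 a)$ this sum is $(|\cos a| + |\sin a|)^2 \geq 1$, so in this sketch the accumulated weight cannot shrink from one time slice to the next.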

If the sampled new pair of indices differs from the starting pair, then the sampled $\mathcal{Q}$ matrix element must contain the proper momentum-jump operators to update the energy of the trajectory after the jump. In any actual implementation of this algorithm, it is desirable to restrict the nonadiabatic transitions in each time slice to a single pair of states. Under this assumption, one can then approximate:

$$\mathcal{Q}_{\alpha\alpha',\beta\beta'} \approx \begin{cases} \delta_{\alpha\beta}\delta_{\alpha'\beta'} & \text{if no hop happens,} \\ \left(e^{\mathcal{J}_{\mu\gamma}\Delta t}\right)_{\alpha\alpha',\beta\beta'} & \text{if } (\alpha,\alpha') \rightarrow (\beta,\beta') \text{ involves a transition between the } (\mu,\gamma) \text{ states,} \\ 0 & \text{if } (\alpha,\alpha') \rightarrow (\beta,\beta') \text{ involves transitions between two or more pairs of states.} \end{cases} \tag{24}$$

In this algorithm, we see that the trajectories in the ensemble that are used to simulate the time evolution are non-Newtonian in character, consisting of Newtonian segments where the system evolves on adiabatic surfaces, or the mean of two adiabatic surfaces, interspersed with quantum transitions and momentum changes.

#### *3.2. Forward-Backward Trajectory Solution*

This scheme is motivated by another way of writing the formally exact solution [38] of the QCLE using the last line of Equation (8):

$$
\hat{B}\_W(X,t) \quad = \mathcal{S}\left(e^{i\stackrel{\rightarrow}{\hat{\mathcal{H}}}t/\hbar}\hat{B}\_W(X)e^{-i\stackrel{\leftarrow}{\hat{\mathcal{H}}}t/\hbar}\right) \tag{25}
$$

The $\mathcal{S}$ operator [31,38] specifies the order in which the forward and backward evolution operators act on $\hat{B}_W(X)$. The ordering of evolution operators is critical because of the lack of an underlying Lie algebraic structure [38] of the QCLE.

One approach to solving Equation (25) is to apply the mapping transformation, in which the $N$ discrete quantum states of the subsystem are represented by the continuous positions and momenta of $N$ fictitious harmonic oscillators. The properties of the original subsystem are then obtained via an ensemble average involving trajectories in the phase space of the fictitious oscillators. More precisely, in the mapping representation, a subsystem state, $|\lambda\rangle$, is replaced by $|m_\lambda\rangle = |0_1, \cdots, 1_\lambda, \cdots, 0_N\rangle$, a product state specifying the occupation numbers (limited to zero or one) of $N$ fictitious harmonic oscillators [33,49]. Creation and annihilation operators, $\hat{a}^\dagger_\lambda$ and $\hat{a}_\lambda$, satisfy the commutation relation $[\hat{a}_\lambda, \hat{a}^\dagger_{\lambda'}] = \delta_{\lambda,\lambda'}$ for harmonic oscillators. The actions of these operators on the single-excitation mapping states are $\hat{a}^\dagger_\lambda|0\rangle = |m_\lambda\rangle$ and $\hat{a}_\lambda|m_\lambda\rangle = |0\rangle$, where $|0\rangle = |0_1 \ldots 0_N\rangle$ is the ground state of the mapping basis.

Next, we define the mapping version of operators, $\hat{B}_m(X) = B_W^{\lambda\lambda'}(X)\hat{a}^\dagger_\lambda\hat{a}_{\lambda'}$, such that matrix elements of $\hat{B}_W$ in the subsystem basis are equal to the matrix elements of the corresponding mapping operator: $B_W^{\lambda\lambda'}(X) = \langle\lambda|\hat{B}_W(X)|\lambda'\rangle = \langle m_\lambda|\hat{B}_m(X)|m_{\lambda'}\rangle$. In particular, the mapping Hamiltonian is:

$$
\hat{H}\_m = H\_b(X) + h^{\lambda \lambda'}(R) \hat{a}\_{\lambda}^{\dagger} \hat{a}\_{\lambda'} \equiv H\_b(X) + \hat{h}\_m \tag{26}
$$

where we applied the mapping transformation only on the part of the Hamiltonian that involves the subsystem DOF in Equation (26). The mapping Hamiltonian, $\hat{h}_m$, is always a quadratic Hamiltonian with respect to the quantum DOF. The pure bath term, $H_b(X)$, acts as an identity operator in the subsystem basis and is mapped onto the identity operator of the mapping space directly. The mapped formal solution of the QCLE now reads:

$$
\hat{B}_m(X,t) = \mathcal{S}\left(e^{i\overrightarrow{\mathcal{H}}^m_\Lambda t/\hbar}\hat{B}_m(X)e^{-i\overleftarrow{\mathcal{H}}^m_\Lambda t/\hbar}\right) \tag{27}
$$

where $\overrightarrow{\mathcal{H}}^m_\Lambda$ is given by $\overrightarrow{\mathcal{H}}^m_\Lambda = \hat{H}_m(1 + \hbar\Lambda/2i)$, with an analogous definition for $\overleftarrow{\mathcal{H}}^m_\Lambda$.
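As a concrete illustration of the mapping in Equation (26), consider a two-state subsystem ($N = 2$; this special case is worked out here only as an example). Writing out the implied sum over $\lambda, \lambda'$ gives:

$$\hat{h}_m = h^{11}\hat{a}^\dagger_1\hat{a}_1 + h^{12}\hat{a}^\dagger_1\hat{a}_2 + h^{21}\hat{a}^\dagger_2\hat{a}_1 + h^{22}\hat{a}^\dagger_2\hat{a}_2$$

With $|m_1\rangle = |1_1, 0_2\rangle$ and $|m_2\rangle = |0_1, 1_2\rangle$, only the $h^{12}$ term connects $|m_2\rangle$ to $|m_1\rangle$:

$$\hat{a}^\dagger_1\hat{a}_2|m_2\rangle = \hat{a}^\dagger_1|0\rangle = |m_1\rangle, \qquad \langle m_1|\hat{h}_m|m_2\rangle = h^{12}(R)$$

The terms containing $\hat{a}_1$ annihilate $|m_2\rangle$, and $\hat{a}^\dagger_2\hat{a}_2|m_2\rangle = |m_2\rangle$ is orthogonal to $|m_1\rangle$, so the mapping operator reproduces the matrix elements of the original subsystem Hamiltonian within the single-excitation subspace.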

We now introduce the coherent states, $|z\rangle$, in the mapping space, $\hat{a}_\lambda|z\rangle = z_\lambda|z\rangle$ and $\langle z|\hat{a}^\dagger_\lambda = z^*_\lambda\langle z|$, where $|z\rangle = |z_1, \ldots, z_N\rangle$, and the eigenvalue is $z_\lambda = (q_\lambda + ip_\lambda)/\sqrt{\hbar}$. The variables $q = (q_1, \ldots, q_N)$ and $p = (p_1, \ldots, p_N)$ are, respectively, the mean coordinates and momenta of the harmonic oscillators encoded in the coherent state, $|z\rangle$. The coherent states form an overcomplete basis with the inner product between any two such states, $\langle z|z'\rangle = e^{-(|z - z'|^2) - i(z\cdot z'^{*} - z^{*}\cdot z')}$. Finally, we remark that the coherent states provide the resolution of identity:

$$\mathbb{1} = \int \frac{d^2 z}{\pi^N} \left| z \right\rangle \left\langle z \right| \tag{28}$$

where $d^2z = d(\Re(z))\,d(\Im(z)) = dq\,dp/(2\hbar)^N$.

Similar to the path integral approach for solving the quantum dynamics, we decompose the forward and backward evolution operators in Equation (27) into a concatenation of $M$ short-time evolutions with $\Delta t_i = \tau$ and $M\tau = t$. In each short-time interval, $\Delta t_i$, we introduce two sets of coherent states, $|z_i\rangle$ and $|z'_i\rangle$, via Equation (28) to expand the forward and backward time evolution operators, respectively. The time evolution (generated by a quadratic Hamiltonian) of coherent states can be represented by trajectory evolution in the phase space of $(q, p)$. After some algebra, the matrix elements of Equation (27) can be approximated by:

$$\begin{aligned} B_W^{\lambda\lambda'}(X,t) = {} & \sum_{\mu\mu'} \int dx\,dx'\,\phi(x)\phi(x')\frac{1}{\hbar}(q_\lambda + ip_\lambda)(q'_{\lambda'} - ip'_{\lambda'})B_W^{\mu\mu'}(X_t) \\ & \times \frac{1}{\hbar}(q_\mu(t) - ip_\mu(t))(q'_{\mu'}(t) + ip'_{\mu'}(t)) \end{aligned} \tag{29}$$

where $x = (q, p)$ gives the real and imaginary parts of $z$, $dx = dq\,dp$ and $\phi(x) = \hbar^{-N}e^{-\sum_\nu(q_\nu^2 + p_\nu^2)/\hbar}$ is the normalized Gaussian distribution function. In deriving Equation (29), we have invoked an orthogonality approximation on the inner product between subsequent coherent state variables, $\langle z_i|e^{\frac{i}{\hbar}\hat{h}t_i}|z_{i+1}\rangle = \langle z_i(t_i)|z_{i+1}\rangle \approx \pi^N\delta(z_{i+1} - z_i(t_i))$, with $i$ being the time step index. This approximation is necessary to construct a continuous trajectory of $z(t)$. In the extended phase space of $(X(t), z(t), z'(t))$, the trajectories follow Hamiltonian dynamics:

$$\frac{d\chi\_{\mu}}{dt} = \frac{\partial H\_{e}(\chi, \pi)}{\partial \pi\_{\mu}}, \qquad \frac{d\pi\_{\mu}}{dt} = -\frac{\partial H\_{e}(\chi, \pi)}{\partial \chi\_{\mu}}\tag{30}$$

where $H_e(\chi, \pi) = P^2/2M + V_0(R) + \frac{1}{2\hbar}h^{\lambda\lambda'}(R)(q_\lambda q_{\lambda'} + p_\lambda p_{\lambda'} + q'_\lambda q'_{\lambda'} + p'_\lambda p'_{\lambda'})$ with $V_0(R) = V_b(R) - \mathrm{Tr}\,\hat{h}(R)$, $\chi = (R, q, q')$ and $\pi = (P, p, p')$. We remark that the FBTS trajectories manifestly conserve energy. Furthermore, simulating the dynamics with a standard velocity Verlet type of symplectic integrator has a stationary solution proportional to $H_{pseudo} = H_e(\chi, \pi) + \Delta t^2\delta H$, as discussed in [35].
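The mapping-variable part of these dynamics can be sketched for a frozen bath configuration. The sketch below assumes a real, symmetric matrix $h^{\lambda\lambda'}$ held constant in time (no feedback on $R$, $P$), for which Equation (30) reduces to $\dot{q} = hp/\hbar$ and $\dot{p} = -hq/\hbar$ for each set of mapping variables. It also draws initial $(q, p)$ from a Gaussian with the exponent of $\phi(x)$, i.e., variance $\hbar/2$ per component. All function names are hypothetical, and the splitting used is one possible velocity-Verlet-like scheme, not necessarily the integrator of [35].

```python
import math
import random

def sample_mapping_variables(n, hbar, rng):
    """Draw (q, p) from the Gaussian exp[-(q^2 + p^2)/hbar] (variance hbar/2)."""
    sigma = math.sqrt(hbar / 2.0)
    q = [rng.gauss(0.0, sigma) for _ in range(n)]
    p = [rng.gauss(0.0, sigma) for _ in range(n)]
    return q, p

def matvec(h, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

def mapping_energy(h, q, p, hbar=1.0):
    """(1 / 2 hbar) h^{ll'} (q_l q_l' + p_l p_l') for one set of mapping variables."""
    hq, hp = matvec(h, q), matvec(h, p)
    quad = sum(a * b for a, b in zip(q, hq)) + sum(a * b for a, b in zip(p, hp))
    return quad / (2.0 * hbar)

def verlet_step(h, q, p, dt, hbar=1.0):
    """One velocity-Verlet-like step of dq/dt = h p / hbar, dp/dt = -h q / hbar."""
    p_half = [pi - 0.5 * dt * f / hbar for pi, f in zip(p, matvec(h, q))]
    q_new = [qi + dt * g / hbar for qi, g in zip(q, matvec(h, p_half))]
    p_new = [pi - 0.5 * dt * f / hbar for pi, f in zip(p_half, matvec(h, q_new))]
    return q_new, p_new
```

With Gaussian sampling of this width, the average of $(q_\lambda^2 + p_\lambda^2)/\hbar$ is one, which is what makes the prefactors in Equation (29) reduce to Kronecker deltas at $t = 0$. Along a `verlet_step` trajectory, `mapping_energy` is not conserved exactly but fluctuates within a bound of order $\Delta t^2$.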

The main approximation introduced in the derivation of the FBTS, Equation (29), is the orthogonality approximation. The simplest improvement to the algorithm is to refrain from applying this approximation at every time step. In [36], we outlined a practical approach to evaluate the set of selected integrals of $z_i$ and $z'_i$ (which could be evaluated analytically if the orthogonality approximation were applied). We termed this extension of FBTS the jump FBTS (JFBTS). Since the computational cost grows quickly with the number of jumps inserted, one needs to make a trade-off between numerical efficiency and accuracy.

In the simplest approach, one selects every $(M/K)$-th time step from a total of $M$ steps to fully evaluate the coherent state integrals:

$$\begin{split} B_W^{\lambda \lambda'} (X, t) &= \sum_{\mu \mu'} \sum_{s_0 s'_0, \ldots, s_{K-1} s'_{K-1}} \int \prod_{v=0}^K dx_v\,dx'_v\,\phi(x_v) \phi(x'_v) \\ &\times \frac{1}{\hbar} (q_{0\lambda} + ip_{0\lambda}) (q'_{0\lambda'} - ip'_{0\lambda'}) B_W^{\mu \mu'} (X_t) \\ &\times \frac{1}{\hbar} \left\{ \prod_{v=1}^K \left( q_{(v-1)s_{v-1}}(\tau_v) - ip_{(v-1)s_{v-1}}(\tau_v) \right) (q_{vs_v} + ip_{vs_v}) \right\} \\ &\times \frac{1}{\hbar} \left\{ \prod_{v=1}^K \left( q'_{(v-1)s'_{v-1}}(\tau_v) + ip'_{(v-1)s'_{v-1}}(\tau_v) \right) \left( q'_{vs'_v} - ip'_{vs'_v} \right) \right\} \\ &\times \frac{1}{\hbar} (q_{K\mu}(\tau_{K+1}) - ip_{K\mu}(\tau_{K+1})) (q'_{K\mu'}(\tau_{K+1}) + ip'_{K\mu'}(\tau_{K+1})) \end{split} \tag{31}$$

where the subscripts, $v$ and $s$, refer to the $v$-th time step and the $s$-th component of the $q$ and $p$ vectors, respectively, and $\tau_v = t_{i_v} - t_{i_{v-1}}$ with $t_{i_0} = 0$ and $t_{i_{K+1}} = t$. According to this prescription, the continuous FB trajectories experience $K$ discontinuous jumps in the $(x, x')$ phase space. Between subsequent jumps, the evolution of the FB trajectory is governed by Equation (30). Simulations show that with a sufficient number of jumps, numerically exact solutions of the QCLE can be obtained [36].

#### *3.3. Comparisons between Algorithms*

The differences between the two QCLE simulation algorithms can be traced to the quantum basis that is used and the way that feedback between quantum and classical systems is treated. In the case of the TBSH algorithm, the trajectories are propagated through a Hellmann-Feynman force, or the mean of two Hellmann-Feynman forces [Equation (12)], with intermittent surface hops that switch the adiabatic surfaces on which the trajectories propagate. In the case of FBTS, one propagates not only the bath dynamical variables as trajectories, but also the quantum dynamical variables, which are associated with fictitious harmonic oscillators. In this extended phase space, we have exact Hamiltonian dynamics. In particular, the force acting on the bath particles simultaneously involves all $N$ adiabatic surfaces, which is similar to, but different from, the Ehrenfest mean-field approach. The very different characteristics of the trajectories in the two algorithms manifest the artificial character of the trajectory dynamics. Thus, one should not attach physical significance to single trajectories in the computation. All physical properties of the system can only be extracted from a proper ensemble average of a large set of trajectories, as implied in Equation (2). Nevertheless, insight into the trajectory dynamics of each algorithm will help to judge the simulation efficiency for various classes of models.

For certain problems, such as proton transfer reactions, where the time scales of the bath and subsystem are well-separated, even during nonadiabatic transitions, the TBSH algorithm can yield quantitatively accurate results with a few hops. There are also dynamical problems in which distinct bath motions can be explicitly correlated with the subsystem's quantum states. For instance, in the simple Tully I model [35,50], trajectories populated on the excited state will cross the avoided crossing point, while the ground state trajectories will eventually be reflected and retrace their paths in the opposite direction. This kind of behavior is, however, completely missed when one propagates trajectories in a single effective mean field. Again, the inherent multi-configuration nature of surface-hopping-like algorithms is a more appropriate choice for this case. However, a recent study [51] has indicated that the "jump" version of mean-field-like algorithms can improve the simulation results in cases of this type.

Alternatively, there are also many examples where one would expect FBTS to be the preferred simulation method. In general, the TBSH algorithm has convergence issues, as the MC weights associated with nonadiabatic hops grow rapidly. Even for the simple spin boson model, one can identify parameter regimes where this numerical instability is clearly observed. In these cases, the FBTS and JFBTS are certainly the alternatives that one should adopt for efficient simulations.

#### 4. An Example: Quartic Oscillator in a Harmonic Bath

As a specific example to illustrate the formalism outlined above, we consider a two-level system coupled to a quartic bistable oscillator with a single pair of phase space coordinates $X_0 = (R_0, P_0)$. The quartic oscillator is, in turn, coupled to an Ohmic heat bath of $N_b$ independent harmonic oscillators with phase space coordinates $X_i = (R_i, P_i)$ and $i = 1 \ldots N_b$. The partially Wigner transformed Hamiltonian, expressed in the diabatic basis, $\{|R\rangle, |L\rangle\}$, reads:

$$\hat{H}_W = \begin{pmatrix} \hbar\gamma_0 R_0 & -\hbar\Omega\\ -\hbar\Omega & -\hbar\gamma_0 R_0 \end{pmatrix} + \left(\frac{P_0^2}{2M_0} + V_n(R_0) + \sum_{j=1}^{N_b} \left[\frac{P_j^2}{2M_j} + \frac{M_j\omega_j^2}{2} \left(R_j - \frac{\gamma_b c_j}{M_j\omega_j^2} R_0\right)^2\right]\right) \mathbf{I} \tag{32}$$

where $V_n(R_0) = -M_0\omega_0^2R_0^2/2 + AR_0^4/4$ and $\mathbf{I}$ is an identity matrix. We take $N_b = 40$ harmonic oscillators for the discretization of the Ohmic heat bath. Following the discretization scheme introduced in [52], we set $\omega_j = \omega_c\ln(1 - j\omega_c/\delta\omega)$ and $c_j = (\xi\hbar\delta\omega M_j)^{1/2}\omega_j$ with $\delta\omega = (1 - \exp(\omega_{max}/\omega_c))/N_b$. The parameters, $\omega_c$ and $\omega_{max}$, are the characteristic and cut-off frequencies for the Ohmic bath, respectively. The Kondo parameter is $\xi$.

The adiabatic states for the subsystem are:

$$\begin{aligned} \left| + ; R\_0 \right\rangle &= \frac{1}{\mathcal{N}(R\_0)} \left[ (1 - G) \left| R \right\rangle - (1 + G) \left| L \right\rangle \right] \\ \left| - ; R\_0 \right\rangle &= \frac{1}{\mathcal{N}(R\_0)} \left[ (1 + G) \left| R \right\rangle + (1 - G) \left| L \right\rangle \right] \end{aligned} \tag{33}$$

where $\mathcal{N}(R_0) = \sqrt{2(1 + G^2(R_0))}$ and $G(R_0) = (\gamma_0R_0)^{-1}\left[-\Omega + \sqrt{\Omega^2 + \gamma_0^2R_0^2}\right]$. The adiabatic energies are given by $E_\pm = V_n(R_0) \pm \hbar\sqrt{\Omega^2 + \gamma_0^2R_0^2} = V_n(R_0) \pm \epsilon_\pm(R_0)$.
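The adiabatic energies quoted above are simple to evaluate numerically. The following sketch assumes dimensionless units with $\hbar = 1$ and, since no value of $M_0$ is given, a default $M_0 = 1$; both defaults and the function names are assumptions introduced for illustration.

```python
import math

def quartic_potential(R0, A, omega0, M0=1.0):
    """V_n(R0) = -M0 omega0^2 R0^2 / 2 + A R0^4 / 4."""
    return -0.5 * M0 * omega0 ** 2 * R0 ** 2 + 0.25 * A * R0 ** 4

def half_gap(R0, gamma0, Omega, hbar=1.0):
    """hbar * sqrt(Omega^2 + gamma0^2 R0^2), half of the adiabatic energy gap."""
    return hbar * math.sqrt(Omega ** 2 + (gamma0 * R0) ** 2)

def adiabatic_energies(R0, gamma0, Omega, A, omega0, M0=1.0, hbar=1.0):
    """Return (E_minus, E_plus) for the two-level block of Eq. (32)."""
    v = quartic_potential(R0, A, omega0, M0)
    eps = half_gap(R0, gamma0, Omega, hbar)
    return v - eps, v + eps
```

At $R_0 = 0$ the two surfaces are separated by $2\hbar\Omega$, the minimum gap, which is where nonadiabatic effects are expected to be strongest.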

We shall study the autocorrelation function, $C_{LL}$, with $\hat{A} = \hat{B} = |L\rangle\langle L|$. The entire system is assumed to be in thermal equilibrium initially. Using the high-temperature approximation presented in the Appendix, the correlation function of interest can be given in a compact form:

$$\begin{split} C\_{LL}(t) &= \int dX\_0 dX\_b \mathcal{W}(R\_0) \mathcal{G}\left(P\_0; \frac{M\_0}{\beta}\right) \prod\_{j=1}^{N\_b} \mathcal{G}\left(P\_j; \frac{M\_j}{\beta}\right) \mathcal{G}\left(R\_j - \frac{\gamma\_b c\_j}{M\_j \omega\_j^2} R\_0; \frac{1}{\beta M\_j \omega\_j^2}\right) \\ &\times \sum\_{n=L,R} \sum\_{\alpha,\alpha'} F\_{\alpha\alpha'}(X\_0) \langle n|\alpha; R\_0\rangle \langle \alpha'; R\_0|L\rangle B\_W^{Ln}(X\_t) \end{split} \tag{34}$$

where $\mathcal{G}(x; \sigma^2) = (2\pi\sigma^2)^{-1/2}e^{-x^2/2\sigma^2}$, and:

$$\mathcal{W}(R\_0) = \frac{e^{-\beta\left(\frac{A}{4}R\_0^4 - \frac{1}{2}M\_0\omega\_0^2 R\_0^2\right)} \left(e^{-\beta\epsilon\_+\left(R\_0\right)} + e^{\beta\epsilon\_-\left(R\_0\right)}\right)}{\int dR\_0 e^{-\beta\left(\frac{A}{4}R\_0^4 - \frac{1}{2}M\_0\omega\_0^2 R\_0^2\right)} \left(e^{-\beta\epsilon\_+\left(R\_0\right)} + e^{\beta\epsilon\_-\left(R\_0\right)}\right)}\tag{35}$$

An MC evaluation of the integrals can be done by sampling $P_0$, $R_b$ and $P_b$ from the Gaussian distributions and sampling $R_0$ from $\mathcal{W}(R_0)$. The time-evolved matrix element, $B_W^{nm}(X_t)$, will be computed using both the TBSH and the FBTS algorithms. Finally, we note that the path-integral-based sampling scheme introduced in Section 2 should be adopted to sample phase-space points from $(\hat{\rho}_{eq})_W(X)$ for more general situations, including cases of low temperature, arbitrary subsystem-bath divisions of a composite system, strong subsystem-bath couplings and an arbitrary potential energy profile.
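One possible way to draw the initial values of $R_0$ and $P_0$ is sketched below. The random-walk Metropolis sampler for $\mathcal{W}(R_0)$ is a choice made here for illustration (the text does not specify how $\mathcal{W}(R_0)$ is sampled), and the sketch assumes $\epsilon_+ = \epsilon_- = \hbar\sqrt{\Omega^2 + \gamma_0^2R_0^2}$, $\hbar = 1$ and $M_0 = 1$ by default. All names are hypothetical.

```python
import math
import random

def unnormalized_w(R0, beta, A, omega0, gamma0, Omega, M0=1.0, hbar=1.0):
    """Numerator of Eq. (35): exp(-beta V_n) * (exp(-beta eps) + exp(beta eps))."""
    v = 0.25 * A * R0 ** 4 - 0.5 * M0 * omega0 ** 2 * R0 ** 2
    eps = hbar * math.sqrt(Omega ** 2 + (gamma0 * R0) ** 2)
    return math.exp(-beta * v) * 2.0 * math.cosh(beta * eps)

def sample_r0(n, step, rng, **params):
    """Random-walk Metropolis samples of R0 (the sampler choice is an assumption)."""
    r, w = 0.0, unnormalized_w(0.0, **params)
    samples = []
    for _ in range(n):
        trial = r + rng.uniform(-step, step)
        w_trial = unnormalized_w(trial, **params)
        if rng.random() * w < w_trial:      # accept with probability min(1, w_trial / w)
            r, w = trial, w_trial
        samples.append(r)
    return samples

def sample_p0(beta, rng, M0=1.0):
    """P0 from the Gaussian G(P0; M0 / beta), i.e., variance M0 / beta."""
    return rng.gauss(0.0, math.sqrt(M0 / beta))
```

In practice one would discard an initial burn-in segment and thin the chain before using the `sample_r0` output as independent initial conditions; the bath variables $R_j$, $P_j$ can be drawn directly from their Gaussians in the same way as `sample_p0`.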

In this study, we report numerical results in the energy unit, $\hbar\omega_c$, and distance unit, $\sqrt{\hbar/M_j\omega_c}$, for each environmental DOF. We consider two sets of parameters. In the first case, we use the following parameter values, $A = 1.0$, $\omega_0 = 1.2$, $\gamma_0 = 0.05$, $\gamma_b = 1.0$, $\Omega = 0.3$, $\xi = 0.1$, $\omega_{max} = 3$ and $\beta = 0.2$, in the dimensionless units. Figure 1a presents the potential surface profiles [53], $W_\alpha(R_0)$. The two diabatic surfaces, $W_{L,R}(R_0)$, remain close to each other, and the two adiabatic surfaces, $W_\pm(R_0)$, share essentially the same characteristics. In this case, a mean-field-based algorithm, like FBTS, should be accurate and efficient. This problem can also be handled easily in the adiabatic basis, since the surface-hopping trajectories will be initialized in both the adiabatic ground and excited states, because the system is in a thermal equilibrium state at $t = 0$. Furthermore, the coupling parameter, $\gamma_0$, was purposely chosen to be small in order to minimize the number of nonadiabatic transitions (or hops) encountered in the TBSH algorithm. In Figure 1b, $C_{LL}(t)$ is computed using both algorithms. The agreement between these results is good.

Figure 1. (a) Potential surface profiles, Wα(R0), for the ground adiabatic state (black, dotted), excited adiabatic state (black, dotted) and for the diabatic states, L (green) and R (red). (b) CLL(t) correlation function. These results are associated with the first set of parameters.

Figure 2. (a) Potential surface profiles, Wα(R0), for the ground adiabatic state (black, dotted), excited adiabatic state (black, dotted) and for the diabatic states, L (green) and R (red). The blue curve is a plot of the un-normalized distribution function, W(R0), Equation (35). (b) CLL(t) correlation functions. (Inset) Short-time CLL(t) computed by the FBTS (blue) and TBSH (red) algorithms. These results are associated with the second set of parameters.

Next, we consider the following parameter set, $A = 0.8$, $\omega_0 = 0.6$, $\gamma_0 = 0.3$, $\gamma_b = 1.0$, $\Omega = 0.1$, $\xi = 0.1$, $\omega_{max} = 3$ and $\beta = 0.2$, in the dimensionless units. Figure 2a shows the potential surface profiles, $W_\alpha(R_0)$, obtained from this set of parameters. In this case, the adiabatic, $W_\pm(R_0)$, and diabatic surfaces, $W_{L,R}(R_0)$, only differ markedly near the region of the barrier top, where an avoided crossing point indicates significant mixing of the two diabatic states. Nonadiabatic effects should be most prominent near this barrier top. A stronger coupling, $\gamma_0$, is also chosen in this case. Figure 2b presents the autocorrelation functions. In the main figure of panel (b), the blue curves ($C_{LL}(t)$ computed by the FBTS) start with the full correlation at one, then gradually reduce to 1/2, which implies that the subsystem is in an equal admixture of the two diabatic states in the asymptotic limit. The TBSH simulation results are only valid for very short times (as shown in the inset of Figure 2b), due to instability arising from the accumulation of weights, even with filtering [54]. The thermal equilibrium distribution, $\mathcal{W}(R_0)$, has a bimodal profile, as illustrated in Figure 2a; however, for the (inverse) temperature, $\beta = 0.2$, the double-peaked structure is very broad. The $\mathcal{W}(R_0)$ distribution profile (blue curve in Figure 2a) suggests that the thermal equilibrium state has a non-trivial contribution from the excited surface. Sampling from $\mathcal{W}(R_0)$ yields many $R_0$ values near the barrier top, where several hops immediately take place for this strong-coupling case, and the instability sets in early in the simulation. Lowering $\beta$ will produce a more pronounced double-peak structure for $\mathcal{W}(R_0)$, but the quartic oscillator's momentum, $P_0$, will fluctuate with a larger variance in the presence of the heat bath in this case. Since nonadiabatic transitions depend non-trivially on $a = P_0\cdot d_{12}(R_0)\Delta t$ in the TBSH algorithm, large momentum fluctuations will eventually affect the long-time result. This case shows some of the practical limitations of the TBSH algorithm for the computation of this correlation function.

#### 5. Conclusions

The scheme for computing the quantum correlation function in Equation (2) combines a numerically exact quantum initial sampling method with dynamics described by the QCLE; thus, the approximations in the simulation method reside in the dynamics. It is easier to compute the equilibrium properties of a quantum system, for instance, by using the imaginary-time Feynman path integral method, than to obtain dynamical properties by using similar real-time Feynman path integrals without adopting further approximations. Since we approximate the quantum dynamics of the entire system, quantum subsystem plus bath, by QCL dynamics, it is appropriate to comment on some of its features.

It is known that the quantum-classical bracket, defined in terms of the commutator and Poisson brackets in Equation (8), does not possess a Lie algebraic structure, since it fails to satisfy the Jacobi identity [2,38]. This lack of a proper algebraic structure is shared by all known MQC methods and simply reflects the inconsistency in mixing classical and quantum mechanical dynamics. One consequence of this inconsistency is that the partial Wigner transform, $\hat{\rho}_{We}(R, P)$, of the full canonical equilibrium density function, $\hat{\rho}_{eq} = e^{-\beta\hat{H}}/Z_Q$, is not stationary under the QCLE; however, $\hat{\rho}_{We}(R, P)$ can be written as an expansion in the mass ratio (or $\hbar$), and it has been shown that the full quantum equilibrium density is conserved under the QCL dynamics up to $O(\hbar)$. Therefore, the detailed balance relation is also satisfied to this order. The violation of detailed balance is a common problem that affects all major MQC methods, including the two most popular approaches, Ehrenfest mean-field [55] and Tully's fewest switches surface hopping [56] (FSSH), to various degrees. Of course, as noted earlier, for the class of models where an arbitrary quantum system is bilinearly coupled to a harmonic bath, the dynamics is exact, and detailed balance is exactly satisfied.

The dynamics described by the QCLE can be related to that prescribed by other methods. In [57], it was shown that one could derive both Ehrenfest mean-field dynamics and a version of surface-hopping dynamics starting from the QCLE. In the former case, one simply drops all the "correlations" (including entanglement) between the subsystem and bath densities in the QCLE [58]. In the latter case, one projects out all the off-diagonal matrix elements of the density in the QCLE to obtain a generalized master equation for the subsystem alone. Then, one considers decoherence to suppress the coherences in order to recover a simple "surface hopping" dynamics [59] similar to that prescribed in the FSSH algorithm. Furthermore, it has been proven [60] that the QCLE and the partially linearized path integral (PLPI) method [61–64] share the same starting mathematical foundation. In particular, the most recent PLPI algorithm, called the partially linearized density matrix (PLDM) method [64], is very similar to the FBTS presented in this paper [31]. One can also draw comparisons between methods based on the QCLE and semiclassical initial value representations. For instance, numerical schemes based on the Poisson bracket mapping equation (PBME) [30], an approximate equation derived from the mapping-transformed QCLE, and the linearized semiclassical initial value representations [65] share the same set of equations of motion for the trajectories.

Mixed quantum-classical methods are often the only feasible approach to explore the dynamics of large complex systems, such as condensed phase or biochemical systems, where only a few light-mass DOF need be treated quantum mechanically. In many rate processes of interest, such as electron transfer or proton transfer, the local polar solvent motions are responsible for important features of the reaction mechanism. As a result, it is essential that the dynamics of these environmental degrees of freedom be treated in detail. Open quantum system methods that trace out all bath details cannot capture important aspects of such dynamics.

Some recent work [48,66] has suggested interesting ways to combine the QCLE and the generalized master equation [67–69] approach. Simulation tests on spin boson models [48] and a two-level system coupled to an anharmonic bath [68] indicate that accurate, long-time dynamical properties of such systems can be efficiently calculated with an improved memory kernel (which takes the short-time QCLE computation of some bath correlation functions as the input) for the generalized master equation. This type of hybrid approach may eventually prove to be useful for studies of more complex systems.

Finally, we provide comments that may help in choosing between the two algorithms for simulations. The TBSH algorithm, without filtering, provides very accurate QCL dynamics before the onset of the sign problem associated with its heavy reliance on Monte Carlo sampling. While filtering can be used to extend simulations to much longer times, the problems related to Monte Carlo sampling limit its usefulness in performing long-time simulations, as vividly illustrated in Section 4. However, the TBSH is found to be the preferred simulation method (in comparison to the FBTS) when one investigates bath dynamical properties of systems in the vicinity of conical intersections and avoided crossings. For instance, the TBSH results accurately capture the intricate geometric phases [46] and the bimodal structure in the momentum distribution [35] in the Tully I model (a single avoided crossing model), while the FBTS fails to reproduce these delicate features, even though it provides fairly accurate population dynamics, as reported in [36]. Since the FBTS trajectory dynamics is based on a mean-field description, one finds that the results are usually very accurate (even in the long-time limit) when the energy gap between diabatic energy surfaces is small in comparison to the typical subsystem-bath coupling strength. Another advantage of the FBTS is the availability of the JFBTS [36] algorithm, which implements systematic correction of FBTS results towards the exact QCL dynamics and provides a simple method to gauge the sufficiency of the FBTS results.

### Acknowledgments

Research was supported in part by a grant from the Natural Sciences and Engineering Research Council of Canada.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix

## High Temperature Limit

Many realistic chemical and biological processes take place at room temperature, in which case it is often justified to apply a classical approximation to the bath. In this Appendix, we make two assumptions. First, as in most condensed phase models, we consider a pure subsystem observable, $\hat{A}$, such that $(\hat{\rho}\_{eq}\hat{A})\_W = (\hat{\rho}\_{eq})\_W \hat{A}$. Second, we assume that the environment is further partitioned into an immediate part that can couple nonlinearly to the quantum subsystem and shield it from the larger set of environmental DOF, which is often modeled as a heat bath of independent harmonic oscillators. Furthermore, we write $X = \{X\_b, X\_n\}$, where $n$ refers to the few DOF that couple directly to the quantum subsystem and $b$ refers to the remainder of the large number of coordinates that only couple to the $n$-labeled coordinates. Similarly, we re-label different parts of the Hamiltonian as follows: $\hat{H} = \hat{H}\_s + \hat{H}\_n + \hat{V}\_{sn} + \hat{H}\_b + \hat{V}\_{bn}$ with $\hat{H}\_i = \hat{K}\_i + \hat{V}\_i$ and $i = s, b, n$. The quantities $\hat{K}\_i$ and $\hat{V}\_i$ are the total kinetic energy and isolated potential, respectively, of the $i$-th system. Potential energy terms with a subscript of two letters imply a coupling potential between two components of the composite system. In addition, we introduce $\hat{h}\_W(R\_n) = \hat{H}\_s + \hat{V}\_{sn}(R\_n) + V\_n(R\_n)$, $\hat{H}\_{bn}(R\_n) = \hat{H}\_b + \hat{V}\_{bn}(R\_n)$ and $\hat{H}\_{sn} = \hat{h}\_W(R\_n) + \hat{K}\_n$. In the following, we express the distance in units of $\lambda\_j = \sqrt{\hbar/(M\_j\omega\_c)}$ and energy in units of $\hbar\omega\_c$, where $\omega\_c$ is the cut-off frequency of the heat bath.

Under these assumptions, one needs to evaluate the partial Wigner transform of $e^{-\beta\hat{H}}$ alone. In the high temperature limit, we factorize the un-normalized equilibrium density matrix operator, $\hat{\rho} = e^{-\beta\hat{H}\_{sn}} e^{-\beta\hat{H}\_{bn}}$. The partial Wigner transform of this approximate density operator reads:

$$\hat{\rho}\_W(X) = \int dZ e^{-iP\_n \cdot Z} \left\langle R\_n + \frac{Z}{2} \right| e^{-\beta \hat{H}\_{sn}} \left| R\_n - \frac{Z}{2} \right\rangle \rho\_b(X\_b; R\_n) \tag{36}$$

where $\rho\_b(X\_b; R\_n) = \int dZ\_b\, e^{-iP\_b \cdot Z\_b} \left\langle R\_b + \frac{Z\_b}{2} \right| e^{-\beta \hat{H}\_{bn}(R\_n)} \left| R\_b - \frac{Z\_b}{2} \right\rangle$ is the Wigner transform of the un-normalized equilibrium density matrix for the heat bath.

We next apply a symmetric Trotter decomposition to the matrix element of Equation (36):

$$\begin{split} \left\langle R\_n + \frac{Z}{2} \right| e^{-\beta \hat{H}\_{sn}} \left| R\_n - \frac{Z}{2} \right\rangle &\approx \left\langle R\_n + \frac{Z}{2} \right| e^{-\frac{\beta}{2} \Delta \hat{H}\_W (R\_n + \frac{Z}{2})} e^{-\beta \hat{H}\_{ho}} e^{-\frac{\beta}{2} \Delta \hat{H}\_W (R\_n - \frac{Z}{2})} \left| R\_n - \frac{Z}{2} \right\rangle \\ &= \left( \frac{\omega}{2 \pi \sinh(\omega \beta)} \right)^{N\_n/2} \exp \left( -\coth\left(\frac{\omega \beta}{2}\right) \frac{\omega Z^2}{4} \right) \exp \left( -\tanh\left(\frac{\omega \beta}{2} \right) \omega R\_n^2 \right) \\ &\times e^{-\frac{\beta}{2} \Delta \hat{H}\_W (R\_n + \frac{Z}{2})} e^{-\frac{\beta}{2} \Delta \hat{H}\_W (R\_n - \frac{Z}{2})} \end{split} \tag{37}$$

In this equation, the symmetric Trotter decomposition separates the subsystem potential in $\hat{h}\_W(R\_n)$ into harmonic, $\hat{V}\_{ho} = \frac{1}{2}\omega^2 R\_n^2$, and anharmonic, $\Delta\hat{H}\_W(R\_n) = \hat{h}\_W(R\_n) - V\_{ho}(R\_n)$, contributions; furthermore, we define $\hat{H}\_{ho} = \hat{K}\_n + \hat{V}\_{ho}$.
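The harmonic factor on the second line of Equation (37) can be checked numerically. The following is a minimal sketch, assuming a single coordinate ($N\_n = 1$) and units with $\hbar = m = 1$; the function names are hypothetical and chosen only for this illustration. It compares the standard coordinate-space matrix element of $e^{-\beta \hat{H}\_{ho}}$ (the Mehler kernel) with the sum and difference form used in Equation (37).

```python
import math


def mehler_kernel(x, xp, beta, omega):
    """<x| exp(-beta * H_ho) |xp> for H_ho = p^2/2 + omega^2 x^2 / 2 (hbar = m = 1)."""
    s = math.sinh(beta * omega)
    c = math.cosh(beta * omega)
    prefactor = math.sqrt(omega / (2.0 * math.pi * s))
    return prefactor * math.exp(-omega * ((x * x + xp * xp) * c - 2.0 * x * xp) / (2.0 * s))


def eq37_harmonic_factor(R, Z, beta, omega):
    """Harmonic factor of Equation (37) for one coordinate, in terms of R and Z."""
    half = 0.5 * beta * omega
    prefactor = math.sqrt(omega / (2.0 * math.pi * math.sinh(beta * omega)))
    # coth(half) is written as 1 / tanh(half)
    gauss_z = math.exp(-omega * Z * Z / (4.0 * math.tanh(half)))
    gauss_r = math.exp(-math.tanh(half) * omega * R * R)
    return prefactor * gauss_z * gauss_r


if __name__ == "__main__":
    R, Z, beta, omega = 0.3, 0.4, 0.5, 1.0
    # The two printed values should agree to rounding error.
    print(mehler_kernel(R + 0.5 * Z, R - 0.5 * Z, beta, omega))
    print(eq37_harmonic_factor(R, Z, beta, omega))
```

The agreement follows from the identities $(\cosh x - 1)/\sinh x = \tanh(x/2)$ and $(\cosh x + 1)/\sinh x = \coth(x/2)$.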

The anharmonic term in Equation (37) can be approximated as follows:

$$\begin{split} &e^{-\frac{\beta}{2} \Delta \hat{H}\_W(R\_n + \frac{Z}{2})} e^{-\frac{\beta}{2} \Delta \hat{H}\_W(R\_n - \frac{Z}{2})} \\ &\approx \sum\_{\alpha, \alpha'} e^{-\beta \tilde{E}\_\alpha(R\_n)} \left[ \delta\_{\alpha \alpha'} + \frac{Z}{2} O\_{\alpha \alpha'}(R\_n) d\_{\alpha \alpha'}(R\_n) \right] |\alpha; R\_n\rangle \left\langle \alpha'; R\_n \right| \end{split} \tag{38}$$

where $|n\rangle$ is the subsystem basis and $|\alpha; R\_n\rangle$ is the real-valued adiabatic state with adiabatic energy $E\_\alpha(R\_n)$ with respect to the Hamiltonian, $\hat{h}\_W(R\_n)$. The adjusted energy is $\tilde{E}\_\alpha(R\_n) = E\_\alpha(R\_n) - V\_{ho}(R\_n)$. The $O$ function in Equation (38) reads:

$$O\_{\alpha\alpha'}(R\_n) = \left[1 - e^{-\frac{\beta}{2}(\tilde{E}\_{\alpha'}(R\_n) - \tilde{E}\_{\alpha}(R\_n))}\right]^2\tag{39}$$

and $d\_{\alpha\alpha'} = \langle \alpha; R\_n | \nabla\_{R\_n} | \alpha'; R\_n \rangle$. Details of a similar derivation for Equations (37) and (38) may be found in [70].
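To make the structure of Equation (39) concrete, the sketch below evaluates the $O$ matrix for a given list of adjusted energies at fixed $R\_n$. The function name and the example energies are assumptions made only for this illustration, and energies are taken in the same reduced units as $\beta$.

```python
import math


def o_matrix(adjusted_energies, beta):
    """O[a][b] = (1 - exp(-beta/2 * (E_b - E_a)))**2, following Equation (39)."""
    n = len(adjusted_energies)
    return [
        [
            (1.0 - math.exp(-0.5 * beta * (adjusted_energies[b] - adjusted_energies[a]))) ** 2
            for b in range(n)
        ]
        for a in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical two-level example with unit energy gap.
    for row in o_matrix([0.0, 1.0], beta=2.0):
        print(row)
```

The diagonal elements vanish, so only off-diagonal couplings $d\_{\alpha\alpha'}$ contribute to the correction term, and for small $\beta$ times the energy gap the off-diagonal elements scale as the square of that product, which suggests the correction becomes small at high temperature.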

Substituting Equation (37) into Equation (36) and integrating out the Z variable, Equation (36) simplifies to:

$$\begin{split} \hat{\rho}\_{W}(X) &= \left( \frac{1}{2\pi \hbar \cosh(\frac{\omega \beta}{2})} \right)^{N\_{n}} \rho\_{b}(X\_{b}; R\_{n}) e^{-\frac{P\_{n}^{2}}{\omega} \tanh(\frac{\omega \beta}{2})} \sum\_{\lambda} e^{-\beta E\_{\lambda}(R\_{n})} \\ &\times \sum\_{\alpha, \alpha'} \left| \alpha; R\_{n} \right\rangle \langle \alpha'; R\_{n} | F\_{\alpha \alpha'}(X\_{n}) \end{split} \tag{40}$$

where:

$$F\_{\alpha\alpha'}(X\_n) = \frac{e^{-\beta E\_\alpha(R\_n)}}{\sum\_\lambda e^{-\beta E\_\lambda(R\_n)}} \left[ \delta\_{\alpha\alpha'} - i \frac{P\_n}{\omega} \tanh(\omega\beta/2) O\_{\alpha\alpha'}(R\_n) d\_{\alpha\alpha'}(R\_n) \right] \tag{41}$$
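The $Z$ integration that leads to Equations (40) and (41) is a Gaussian Fourier transform. As a sketch for a single coordinate ($N\_n = 1$), with the shorthand $a = \frac{\omega}{4}\coth(\omega\beta/2)$ introduced here only for illustration:

$$\begin{split} \int dZ\, e^{-iP\_n Z} e^{-aZ^2} &= \sqrt{\frac{\pi}{a}}\, e^{-P\_n^2/4a} = \sqrt{\frac{4\pi \tanh(\omega\beta/2)}{\omega}}\, e^{-\frac{P\_n^2}{\omega}\tanh(\frac{\omega\beta}{2})} \\ \int dZ\, \frac{Z}{2}\, e^{-iP\_n Z} e^{-aZ^2} &= -i\frac{P\_n}{4a} \sqrt{\frac{\pi}{a}}\, e^{-P\_n^2/4a} = -i\frac{P\_n}{\omega}\tanh\left(\frac{\omega\beta}{2}\right) \sqrt{\frac{\pi}{a}}\, e^{-P\_n^2/4a} \end{split}$$

The first line reproduces the Gaussian momentum factor in Equation (40); the second shows how the term linear in $Z$ in Equation (38) gives rise to the coefficient $-i(P\_n/\omega)\tanh(\omega\beta/2)$ in Equation (41).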

Now, the canonical partition function is determined by:

$$\begin{split} Z\_{Q} &= \sum\_{\alpha} \int dX\_{n} dX\_{b} \rho\_{W}^{\alpha \alpha}(X) \\ &= \left( \frac{1}{\cosh(\frac{\omega \beta}{2})} \right)^{N\_{n}} \left( \prod\_{j=1}^{N\_{b}} \frac{\pi}{\sinh(\omega\_{j}\beta/2)} \right) \sqrt{\frac{\pi \omega}{\tanh(\omega \beta/2)}} \int dR\_{n} \sum\_{\lambda} e^{-\beta E\_{\lambda}(R\_{n})} \\ &= \left( \frac{1}{\cosh(\frac{\omega \beta}{2})} \right)^{N\_{n}} Z\_{b} Z\_{sn} \end{split} \tag{42}$$

where $Z\_b$ is defined by the expression in the second bracket on the second line and $Z\_{sn}$ is defined by the expression following the second bracket on the second line. $Z\_b$ and $Z\_{sn}$ are the bath and subsystem (with its immediate environment) canonical partition functions, respectively. In summary, the time correlation function takes the following simple form:

$$C\_{AB}(t) \;= \frac{1}{Z\_Q} \sum\_{n\_1, n\_2, n\_3} \int dX \left< n\_1 \right| \hat{\rho}\_W(X) \left| n\_3 \right> \left< n\_3 \right| \hat{A} \left| n\_2 \right> \left< n\_2 \right| \hat{B}\_W(X, t) \left| n\_1 \right> \tag{43}$$

where $\hat{\rho}\_W(X)$ and $Z\_Q$ are given by Equations (40) and (42), respectively.

#### References


Reprinted from *Entropy*. Cite as: Bonella, S.; Ciccotti, G. Approximating Time-Dependent Quantum Statistical Properties. *Entropy* 2014, *16*, 86–109.

## *Article*

## Approximating Time-Dependent Quantum Statistical Properties

## Sara Bonella \* and Giovanni Ciccotti

Department of Physics and CNISM Unit 1, University of Rome "La Sapienza", Ple A. Moro 5, 00185 Rome, Italy; E-Mail: giovanni.ciccotti@roma1.infn.it

\* Author to whom correspondence should be addressed; E-Mail: sara.bonella@roma1.infn.it; Tel.: +39-06-49914208; Fax: +39-06-4957697.

*Received: 11 November 2013; in revised form: 10 December 2013 / Accepted: 19 December 2013 / Published: 27 December 2013*

Abstract: Computing quantum dynamics in condensed matter systems is an open challenge due to the exponential scaling of exact algorithms with the number of degrees of freedom. Current methods try to reduce the cost of the calculation using classical dynamics as the key ingredient of approximations of the quantum time evolution. Two main approaches exist, quantum classical and semi-classical, but they suffer from various difficulties, in particular when trying to go beyond the classical approximation. It may then be useful to reconsider the problem focusing on statistical time-dependent averages rather than directly on the dynamics. In this paper, we discuss a recently developed scheme for calculating symmetrized correlation functions. In this scheme, the full (complex time) evolution is broken into segments alternating thermal and real-time propagation, and the latter is reduced to classical dynamics via a linearization approximation. Increasing the number of segments systematically improves the result with respect to full classical dynamics, but at a cost which is still prohibitive. If only one segment is considered, a cumulant expansion can be used to obtain a computationally efficient algorithm, which has proven accurate for condensed phase systems in moderately quantum regimes. This scheme is summarized in the second part of the paper. We conclude by outlining how the cumulant expansion formally provides a way to improve convergence also for more than one segment. Future work will focus on testing the numerical performance of this extension and, more importantly, on investigating the limit for the number of segments that goes to infinity of the approximate expression for the symmetrized correlation function to assess formally its convergence to the exact result.

Keywords: semiclassical statistical properties; time correlation functions; mixed quantum classical dynamics

#### 1. Introduction

Exact simulation methods to compute either the evolution of the wave function or dynamical statistical averages for quantum systems in the condensed phase are currently restricted to small sizes and short times. The exponential scaling of available algorithms with the number of degrees of freedom, in fact, limits calculations to ten–twenty particles (and this for Hamiltonians of relatively simple form) and to time scales of at most a few picoseconds. This situation is in striking contrast with analogous classical calculations, which, when empirical potentials are adopted, are nowadays routinely used to study high dimensional, complex systems for times reaching, on dedicated machines, microseconds. (*Ab initio* classical molecular dynamics is considerably more expensive, but, depending on the number of electrons that have to be included, even in this case moderately sized systems of up to a hundred particles can be integrated for hundreds of picoseconds.) Several approximate schemes have thus been proposed attempting to import, with appropriate modifications, methods from classical molecular dynamics to quantum dynamics. Two approaches, in particular, can be identified in which classical trajectories play a crucial role: semi-classical and mixed quantum classical.

In semi-classical schemes, originally developed for approximating wave function propagation, all degrees of freedom are treated on an equal footing. To begin with, the quantum time propagator is expressed, in the path integral formalism [1], as a sum over all possible paths connecting the initial and final states, each path being weighted by a complex exponential, whose argument is the classical action along it. The approximate propagator is then obtained by expanding the action to second order around its stationary points, which are classical trajectories, and performing the remaining quadratic path integral analytically [2,3]. Different forms exist for the semi-classical propagator depending on the specific representation adopted for the path integral (most notably standard coordinates, usually followed by the so-called initial value transformation [4,5], hybrid coordinate momenta [6] or coherent states [7–9]), but the evolved wave function always has the same structure, which we illustrate with the most commonly used Herman–Kluk expression [10–12]:

$$|\Psi(t)\rangle\_{sc} = \int dp dq |q(t), p(t)\rangle W(t)e^{\frac{i}{\hbar}S\_{cl}(t)}\langle p, q|\Psi(0)\rangle\tag{1}$$

In the expression above, $|q, p\rangle$ indicates a coherent state (in the coordinate representation, $\langle r|q, p\rangle \propto e^{-\gamma(q-r)^2 + ip(q-r)/\hbar}$), $S\_{cl}(t)$ is the action computed along a classical trajectory propagated to time $t$ from initial conditions $(q, p)$, $(q(t), p(t))$ is the endpoint of the trajectory and $W(t)$, a known function, is the result of the integration over the quadratic fluctuations around the stationary paths. All the functions of time in the integrand are calculable using classical evolution algorithms and, once the ket has been saturated (for example, via a scalar product), the integral over the initial conditions can be estimated via Monte Carlo sampling of a probability density based on the absolute value of the wave function at $t = 0$. Although this scheme has been used in calculations, evaluating the semi-classical wave function is still remarkably expensive, due mainly to the characteristics of $W(t)$. This function is in fact related to a linear combination of the monodromy matrices of the system (*i.e.*, the matrices of the derivatives of the endpoints of the trajectory with respect to the initial conditions), and, for chaotic dynamics, it can assume values varying over several orders of magnitude, hindering the convergence of the Monte Carlo sampling. Furthermore, the actual evaluation of the wave function requires one to project $|\Psi(t)\rangle\_{sc}$ on a basis. While this is, in principle, straightforward (for example, one could choose the continuous coordinate representation and then discretize it on a grid), in practice it reinstates the exponential scaling of the numerical effort with the number of degrees of freedom. This last problem can be avoided by focusing on expectation values (observables):

$${}\_{sc}\langle \Psi(t) | \hat{A} | \Psi(t) \rangle\_{sc} = \int d\tilde{p} d\tilde{q} dp dq \langle \Psi(0) | \tilde{q}, \tilde{p} \rangle \tilde{W}^\*(t) e^{-\frac{i}{\hbar} \tilde{S}\_{cl}(t)} \langle \tilde{q}(t), \tilde{p}(t) | \hat{A} | q(t), p(t) \rangle W(t) e^{\frac{i}{\hbar} S\_{cl}(t)} \langle p, q | \Psi(0) \rangle \tag{2}$$

($\hat{A}$ is a Hermitian operator) at the price of doubling the dimension of the Monte Carlo sampling. The expression above, however, requires averaging the product of the two unstable $W$ functions (for common operators, the matrix element in the integrand is known analytically, thus posing no problem), and although some schemes exist to mitigate the problem [13,14], this approach is of limited practical use. Shifting the focus from the wave function to the observables proves considerably more effective if one moves to the Heisenberg representation and takes advantage of the presence of two propagators in the exact quantum expression of the observables to develop alternative approximations. This strategy is most commonly adopted when calculating time-dependent statistical properties, more specifically, time correlation functions of operators $\hat{A}$ and $\hat{B}$, usually defined as:

$$\begin{split} C\_{A,B}(t;\beta) &= \frac{1}{Z} \text{Tr} \left\{ e^{-\beta \hat{H}} \hat{A} e^{\frac{i}{\hbar} \hat{H}t} \hat{B} e^{-\frac{i}{\hbar} \hat{H}t} \right\} \\ &= \frac{1}{Z} \int dr dr\_N d\tilde{r} d\tilde{r}\_N \langle r | e^{-\beta \hat{H}} \hat{A} |\tilde{r}\rangle \langle \tilde{r} | e^{\frac{i}{\hbar} \hat{H}t} | \tilde{r}\_N \rangle \langle \tilde{r}\_N | \hat{B} | r\_N \rangle \langle r\_N | e^{-\frac{i}{\hbar} \hat{H}t} | r \rangle \end{split} \tag{3}$$

where $\hat{H}$ is the Hamiltonian, $Z$ the partition function, $\beta = 1/k\_B T$ with $k\_B$ Boltzmann's constant and $T$ the temperature. (Throughout the paper, we consider distinguishable particles.) In the second line, the trace was expressed, in the coordinate representation, as the product of four matrix elements: from left to right, that of the product of the quantum Boltzmann density times operator $\hat{A}$, the propagator backward in time, the matrix element of operator $\hat{B}$ and the propagator forward in time. To pave the way for the so-called linearized approximation of the correlation function [13,15–18], the two propagators are expressed as path integrals in the hybrid coordinate-momenta representation, in which the integrals over the momenta arising from the resolutions of the identity in the momentum basis (inserted, as usual, to evaluate the exponential of the kinetic energy contribution in the Trotter break up of the propagators) are not performed analytically. The advantage of this representation is a closer analogy to the phase space representation of classical mechanics. The quantum expression of the correlation function thus obtained is then manipulated via a change of variables: the forward and backward paths are changed to semi-sum and difference paths (see the next section for a more precise definition of these paths), and the key approximation of the approach is introduced. A Taylor series expansion of the action to quadratic order in the difference path is performed, allowing all integrals over the difference variables, except the ones over the initial and endpoint in coordinate space, to be performed analytically. The integration results in a product of delta functions constraining the semi-sum path to be a classical (Hamiltonian) trajectory. The remaining integrals over the difference coordinates define Wigner transforms [19] of operators to give:

$$C\_{A,B}^{l}(t; \beta) = \frac{1}{(2\pi\hbar)Z} \int dr dp \left[e^{-\beta \hat{H}} \hat{A}\right]\_w(r, p) B\_w(r(t), p(t))\tag{4}$$

where, for example, $B\_w(r(t), p(t)) = \int d\xi\, e^{\frac{i}{\hbar} p(t)\xi} \langle r(t) + \xi/2|\hat{B}|r(t) - \xi/2\rangle$. The superscript, $l$, indicates the linearization approximation. Compared to the semi-classical expression for the wave function, Equation (4) has the remarkable advantage of not containing unstable factors in the integrand, even though the approximation of the overall dynamics is accurate to the same order in $\hbar$. Indeed, the second order terms in the action expansion, which give rise to $W(t)$ in Equation (1), cancel exactly when the difference of the action along the forward and backward paths is considered. The absence of the $W$ factors suggests computing the approximate time correlation by combining Monte Carlo sampling of initial conditions and molecular dynamics. The serious difficulty with this idea lies in the sampling of the initial conditions. The probability density is, in fact, usually defined from the absolute value of $\left[e^{-\beta \hat{H}} \hat{A}\right]\_w(r, p)/(2\pi\hbar)Z$, but computing this Wigner transform is far from trivial, and the available methods introduce further, uncontrolled, approximations. In addition to this practical difficulty, there is also a conceptual problem with linearized calculations: the classical evolution conserves the quantum probability density only for short times. The rapid decay of correlations in standard condensed phase systems is usually invoked to mitigate the consequences of this pathology, but it is known that in some cases, e.g., low dimensional systems with long time coherence, it may lead to unphysical results (this is the so-called zero-point energy leakage problem).
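The combination of Monte Carlo sampling and molecular dynamics suggested by Equation (4) can be sketched on a toy problem. The example below is only an illustration under stated assumptions: a one-dimensional harmonic oscillator with $\hbar = m = 1$, for which the Wigner transform of the Boltzmann operator is a known Gaussian and the linearized dynamics is exact. It samples the Wigner density of $e^{-\beta \hat{H}}$ alone and weights by the classical position, which estimates only the real part of the position autocorrelation function rather than the full $[e^{-\beta \hat{H}} \hat{A}]\_w$ of Equation (4). The function name and default parameters are hypothetical.

```python
import math
import random


def lsc_position_autocorrelation(beta, omega, t, n_traj=4000, dt=0.01, seed=0):
    """Estimate Re <x(0) x(t)> for a harmonic oscillator (hbar = m = 1).

    Initial conditions are drawn from the Gaussian Wigner density of the
    thermal state and propagated with velocity Verlet classical dynamics.
    The final time is rounded to the nearest multiple of dt.
    """
    rng = random.Random(seed)
    coth = 1.0 / math.tanh(0.5 * beta * omega)
    sigma_q = math.sqrt(coth / (2.0 * omega))   # width of the Wigner density in q
    sigma_p = math.sqrt(0.5 * omega * coth)     # width of the Wigner density in p
    n_steps = int(round(t / dt))
    total = 0.0
    for _ in range(n_traj):
        q0 = rng.gauss(0.0, sigma_q)
        p = rng.gauss(0.0, sigma_p)
        q = q0
        force = -omega * omega * q
        for _ in range(n_steps):
            p += 0.5 * dt * force
            q += dt * p
            force = -omega * omega * q
            p += 0.5 * dt * force
        total += q0 * q
    return total / n_traj


if __name__ == "__main__":
    beta, omega, t = 1.0, 1.0, 1.0
    exact = math.cos(omega * t) / (2.0 * omega * math.tanh(0.5 * beta * omega))
    print("estimate:", lsc_position_autocorrelation(beta, omega, t))
    print("exact real part:", exact)
```

For anharmonic potentials the Wigner density is no longer available in closed form, which is precisely the sampling difficulty discussed above.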

The problems and numerical cost of semi-classical calculations justify the development of the second, alternative approximation scheme mentioned at the beginning of this section: mixed quantum classical dynamics. In this approach, the degrees of freedom of the system are partitioned into two sets, usually based on their mass ratio. The first set (called the subsystem) is composed of a few degrees of freedom and is treated quantum mechanically; the second (called the environment or the bath) is often high dimensional and is treated classically. Existing quantum classical methods differ in the way in which the coupling between the classical evolution of the bath and the quantum propagation of the subsystem is taken into account. The first approach of this kind, still very popular due to its efficiency and ease of implementation, is Tully's surface hopping [20]. In this scheme, electrons and nuclei constitute the subsystem and the bath, respectively, and the coupling, designed to mimic dynamics beyond the Born–Oppenheimer approximation, is defined *ad hoc* based on heuristic arguments. In more recent, and more rigorous, developments, the coupling is derived starting from a fully quantum representation of the evolution equations for the system and then taking a partial classical limit on the bath's degrees of freedom. Examples of this type are schemes to propagate the full density matrix of the quantum subsystem, such as the Wigner Liouville mixed quantum dynamics [21,22], or the iterative linearized density propagation methods [23,24]. Both surface hopping and Wigner Liouville dynamics (with particular reference to its most recent developments) are discussed elsewhere in this issue. Focusing on the latter, we quickly recall that it adopts a mixed representation in which the operators related to the bath's degrees of freedom are described using the Wigner representation, while for the subsystem an abstract operator representation is retained.
The quantum evolution operator in this mixed representation is then expanded to first order in the ratio of the thermal de Broglie wavelengths of the subsystem and of the bath to obtain the generator of the mixed quantum classical dynamics. This generator has the form of a generalized Lie bracket in which both a commutator (linked to the operators for the subsystem) and a Poisson bracket (acting on the bath's phase space) appear. Once a specific basis set is chosen for the subsystem (e.g., adiabatic electronic states [21,25] or, more recently, the so-called mapping representation [26,27]) the evolution equation for the density matrix, or any observable, becomes explicit, and several different algorithms, sharing the characteristic that the bath motion is obtained via classical evolution (possibly with generalized forces describing the influence of more than one electronic state), have been proposed to solve it. In spite of its merits, it has been shown that this mixed quantum classical dynamics lacks several properties that characterize fully quantum and classical dynamics [28]. In particular, the mixed Lie bracket does not satisfy the Jacobi identity exactly, and, similar to linearized calculations, the quantum thermal density is not stationary under the mixed dynamics. The loss of formal properties with respect to classical and quantum mechanics arises, in different forms, in all current mixed quantum classical schemes (see also [29]).

While application driven calculations might not be paralyzed by the state of affairs described above, in particular, if and when it is possible to verify that these well-known pathologies have no uncontrolled effects on the results, it is important to pursue alternative approaches in an effort to derive more general schemes allowing for systematic improvement and/or assessment of the approximations employed. Indeed, a critical stumbling block common to semi-classical and mixed quantum classical methods is that it is essentially impossible to go beyond classical trajectories to approximate the quantum evolution of the full system (semi-classical) or of the bath (mixed). In the semi-classical case, including terms of higher order in the expansion of the action along the paths makes it impossible to obtain calculable expressions for the pre-factor in the expression of the wave function (already at third order, the integral corresponds to intractable Airy functions [2]), while in the linearized correlation function, it prevents the emergence of the delta functions that uniquely determine the semi-sum path. In mixed quantum classical calculations (we refer to the Wigner Liouville formalism, but analogous problems appear, for example, in the iterative linearized propagation methods), including higher order terms in the mass ratio expansion of the propagator introduces terms in the phase space evolution of the bath that cannot be integrated numerically. In this paper, we summarize (in the spirit of an extended review) a recently developed method [30] attempting to overcome this problem. In this approach, the focus is not directly on the dynamics, but, rather, on statistical time-dependent averages, which are linked (via linear response theory) to experimental observables. In particular, we focus on time correlation functions expressed in the symmetrized form first introduced by Schofield [31]:

$$G\_{A,B}(t; \beta) = \frac{1}{Z} \text{Tr} \left\{ \hat{A} e^{\frac{i}{\hbar} \hat{H} t\_c^\*} \hat{B} e^{-\frac{i}{\hbar} \hat{H} t\_c} \right\} \tag{5}$$

where $t\_c = t - \frac{i\beta\hbar}{2}$. The time Fourier transform of this complex time correlation function is related to the time Fourier transform of Equation (3) by a known multiplicative factor, so both carry equivalent information. Furthermore, the symmetrized function shares some properties with its classical counterpart (e.g., it is a real function of time), which makes it a convenient starting point for developing approximations [32–37]. In the following, we summarize how the path integral formalism can be used to express the full complex time evolution in Equation (5) as a concatenation of segments alternating imaginary (*i.e.*, thermal sampling) and real-time propagation. The real-time propagation is then reduced to classical evolution via a linearization approximation. In our approach, the number of segments, $L$, plays a role analogous to that of the number of beads in standard thermal path integral calculations. Although the precise nature of the limit for $L \to \infty$ is still under investigation, this analogy and numerical calculations on relatively simple model systems indicate that increasing the number of segments systematically improves the results with respect to classical dynamics or to the previously mentioned linearization schemes. It may be worth stressing that, in this approach, the focus is on computing the correlation by defining an appropriate stochastic process inspired by the full quantum expression. Adopting this perspective, the dynamics does not have any meaning *per se* and is viewed simply as part of a sampling mechanism, which is implemented via a generalized Monte Carlo scheme. While this circumvents some of the inconsistencies of standard semi-classical and mixed quantum classical schemes and justifies further investigation of the method, it remains to be seen whether the approach outlined in the following has practical value.
In fact, due to the presence of an increasing number of phase factors in the Monte Carlo estimator of the correlation function, the numerical cost of the calculation scales very badly with the number of segments (and of degrees of freedom). In the second part of the paper, we then summarize (again reviewing published material) how, when only one segment is considered, it is possible to improve the situation via a cumulant expansion that tames the phase factor present already in this lowest order approximation of the result [38]. We then present a new formal development of our approach that generalizes the use of cumulants to the case of more than one propagation segment, and we give the explicit formal expression for the case L = 2. Future work will focus on testing the accuracy of this new result. We conclude by stating some of the open questions related to the approach and indicating possible further developments.
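The relation between the standard correlation function, Equation (3), and the symmetrized one, Equation (5), can be illustrated on a simple model. The sketch below is an assumption-laden example rather than part of the method: it takes a one-dimensional harmonic oscillator with $\hbar = m = 1$ and $\hat{A} = \hat{B} = \hat{x}$, and evaluates both functions as truncated sums over energy eigenstates. In this basis the two differ only in the thermal weight attached to each transition, $e^{-\beta E\_i}$ versus $e^{-\beta (E\_i + E\_f)/2}$. The function name and the truncation `n_max` are hypothetical.

```python
import cmath
import math


def ho_position_correlations(beta, omega, t, n_max=200):
    """Return (C, G): standard and symmetrized x-x correlation functions
    of a harmonic oscillator (hbar = m = 1), from a truncated eigenstate sum."""
    z = sum(math.exp(-beta * omega * (n + 0.5)) for n in range(n_max))
    c_std = 0j
    g_sym = 0j
    for n in range(n_max - 1):
        e_low = omega * (n + 0.5)
        e_high = omega * (n + 1.5)
        x2 = (n + 1) / (2.0 * omega)  # |<n| x |n+1>|^2
        # Each pair of neighbouring levels contributes an upward and a downward transition.
        for e_i, e_f in ((e_low, e_high), (e_high, e_low)):
            phase = cmath.exp(1j * (e_f - e_i) * t)
            c_std += math.exp(-beta * e_i) * x2 * phase
            g_sym += math.exp(-0.5 * beta * (e_i + e_f)) * x2 * phase
    return c_std / z, g_sym / z


if __name__ == "__main__":
    c_std, g_sym = ho_position_correlations(beta=1.3, omega=0.9, t=0.7)
    print("standard:   ", c_std)   # complex in general
    print("symmetrized:", g_sym)   # imaginary part vanishes to rounding error
```

For this model the symmetrized function reduces to $\cos(\omega t)/[2\omega \sinh(\beta\omega/2)]$, which is real, while the standard function retains an imaginary part $\sin(\omega t)/2\omega$.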

#### 2. Theory

Let us begin by expressing the symmetrized correlation function, Equation (5), in the coordinate representation. Inserting resolutions of the identity, we have:

$$G\_{A,B}(t; \beta) = \frac{1}{Z} \int dr\_0 d\tilde{r}\_0 dr\_{t\_c} d\tilde{r}\_{t\_c} \langle r\_0 | \hat{A} | \tilde{r}\_0 \rangle \langle \tilde{r}\_0 | e^{\frac{i}{\hbar} \hat{H}t\_c^\*} | \tilde{r}\_{t\_c} \rangle \langle \tilde{r}\_{t\_c} | \hat{B} | r\_{t\_c} \rangle \langle r\_{t\_c} | e^{-\frac{i}{\hbar} \hat{H}t\_c} | r\_0 \rangle \tag{6}$$

The structure of the integrand is represented in Figure 1, in which we show the sequence of matrix elements to be evaluated. Reading the figure from the bottom left corner up, we see the matrix element of operator $\hat{A}$, the backward complex time propagator (from $\tilde{r}\_0$ to $\tilde{r}\_{t\_c}$), the matrix element of operator $\hat{B}$ and, finally, the forward propagator that closes the circuit representing the trace operation. The difficult task is the evaluation of the propagators in complex time. To set the stage for the approximation we intend to perform, we use the time composition property to divide the two propagations into $L$ segments of duration $\tau\_c = t\_c/L$ ($\tau\_c$ need not be infinitesimal) and rewrite, for example:

$$\langle r\_{t\_c} | e^{-\frac{i}{\hbar} \hat{H} t\_c} | r\_0 \rangle = \int dr\_1...dr\_{L-1} \prod\_{J=0}^{L-1} \langle r\_{J+1} | e^{-\frac{i}{\hbar} \hat{H} \tau\_c} | r\_J \rangle \tag{7}$$

with $r\_L = r\_{t\_c}$. Introducing an analogous expression for the backward propagator changes the scheme of the integrand as sketched in Figure 2, where each propagation lag (from $J$ to $J + 1$) is indicated by the segment with arrows. We can now pair corresponding segments of propagation along the forward and backward paths, as indicated by the red frame in the figure, and define the product $K(r\_{J+1}, r\_J; \tilde{r}\_{J+1}, \tilde{r}\_J) = \langle \tilde{r}\_{J+1} | e^{\frac{i}{\hbar} \hat{H} \tau\_c^\*} | \tilde{r}\_J \rangle \langle r\_{J+1} | e^{-\frac{i}{\hbar} \hat{H} \tau\_c} | r\_J \rangle$ to rewrite the symmetrized correlation function as:

$$\begin{split} G\_{A,B}(t; \beta) &= \frac{1}{Z} \int d\tilde{r}\_L dr\_L \langle \tilde{r}\_L | \hat{B} | r\_L \rangle \left\{ \prod\_{J=1}^{L-1} \int d\tilde{r}\_J dr\_J K(r\_{J+1}, r\_J; \tilde{r}\_{J+1}, \tilde{r}\_J) \right\} \\ &\times \int dr\_0 d\tilde{r}\_0 K(r\_1, r\_0; \tilde{r}\_1, \tilde{r}\_0) \langle r\_0 | \hat{A} | \tilde{r}\_0 \rangle \end{split} \tag{8}$$

Figure 1. Schematic representation of the integrand in the coordinate representation of the symmetrized time correlation function; see the text.

Figure 2. Schematic representation of the break up of the propagators in complex time: the short complex time propagators are represented as the segments with arrows along the forward and backward path, and the pairing mentioned in the text to obtain the K propagators is indicated by the red frame.

The expression above is an exact, incalculable, expression of the time correlation function. In the following, we will work on the generic K to obtain an approximate expression for it that has the key advantage of being analytically known as a product of functions calculable via an appropriate combination of Monte Carlo and molecular dynamics. To obtain this result, we first separate the real from the imaginary time part of the propagation by inserting one more resolution of the coordinate identity, thus:

$$\begin{split} K(r\_{J+1},r\_J;\tilde{r}\_{J+1},\tilde{r}\_J) &= \, \langle \tilde{r}\_{J+1} | e^{\frac{i}{\hbar}\hat{H}\tau\_c^\*} | \tilde{r}\_J \rangle \langle r\_{J+1} | e^{-\frac{i}{\hbar}\hat{H}\tau\_c} | r\_J \rangle \\ &= \int d\tilde{r}\_J^\nu dr\_J^\nu \langle \tilde{r}\_{J+1} | e^{\frac{i}{\hbar}\hat{H}\tau\_t} | \tilde{r}\_J^\nu \rangle \langle \tilde{r}\_J^\nu | e^{-\tau\_\beta \hat{H}} | \tilde{r}\_J \rangle \langle r\_{J+1} | e^{-\frac{i}{\hbar}\hat{H}\tau\_t} | r\_J^\nu \rangle \langle r\_J^\nu | e^{-\tau\_\beta \hat{H}} | r\_J \rangle \end{split} \tag{9}$$

The two (thermal) propagators in the integrand above, associated with the inverse temperature, $\tau\_\beta$, can be expressed relatively easily in the path integral formalism as positive definite functions and interpreted, via the so-called classical isomorphism, as probability densities associated with systems of polymers. Well established techniques allow one to sample these densities. The real-time propagators, on the other hand, are prohibitive, even in path integral form. Feynman's prescription requires, in fact, generating "all possible paths" connecting the initial and final point of the propagation, but, in contrast to the thermal case in which the probability density provides us with a sampling mechanism for the paths, no rule is given to determine them. Furthermore, even if we had a recipe for generating the paths, we would have to sum a (potentially infinite) set of phase factors, the exponentials weighting each path. Capturing accurately the interference among these factors is essentially impossible (this is the well-known dynamical sign problem). As shown in detail in [30], progress can be made by deriving an approximate form for the product of the two real-time propagators in Equation (9). In analogy with the standard linearization methods mentioned in the Introduction, this is done most conveniently using a hybrid coordinate momenta representation of the propagators. The propagation time is divided into $n$ intervals of length $\delta\_t = \tau\_t/n$, and appropriate resolutions of the identity are introduced to isolate matrix elements of the propagator for each short time interval. As usual, after a Trotter break up of the exponential of the Hamiltonian is performed, the (diagonal) exponential of the potential can be trivially evaluated. The matrix element of the kinetic energy part of the propagator, on the other hand, is easily evaluated by inserting a resolution of the identity in the momenta.
Contrary to what is done in standard path integrals, however, the resulting generalized Gaussian integral in the momenta is not performed analytically, but left in the expression. This sequence of operations results in:

$$\langle \tilde{r}\_{J+1} | e^{\frac{i}{\hbar} \hat{H} \tau\_t} | \tilde{r}\_J^{\nu} \rangle \langle r\_{J+1} | e^{-\frac{i}{\hbar} \hat{H} \tau\_t} | r\_J^{\nu} \rangle \approx \int \prod\_{l=1}^n \frac{d \tilde{p}\_J^l}{2\pi \hbar} \prod\_{l=1}^{n-1} d \tilde{r}\_J^{\nu+l} \, e^{-\frac{i}{\hbar} S(\{\tilde{r}, \tilde{p}\})} \int \prod\_{l=1}^n \frac{d p\_J^l}{2\pi \hbar} \prod\_{l=1}^{n-1} d r\_J^{\nu+l} \, e^{\frac{i}{\hbar} S(\{r, p\})} \tag{10}$$

where ({r, p}) indicates the full set of path variables and

$$S(\{r, p\}) = \sum\_{l=1}^{n} \left\{ p\_J^l \left( r\_J^{(\nu+l)} - r\_J^{(\nu+l-1)} \right) - \delta\_t \left[ \frac{(p\_J^l)^2}{2m} + V(r\_J^{(\nu+l-1)}) \right] \right\}$$

with analogous definitions for the tilde variables. The expression above becomes exact for n → ∞. At this stage, the forward and backward path integrals above are independent. Proceeding in analogy with standard linearization methods, we combine them by introducing the semi-sum and difference variables:

$$
\bar{r}\_J^{\nu+l} = \frac{r\_J^{\nu+l} + \tilde{r}\_J^{\nu+l}}{2} \qquad \qquad \bar{p}\_J^l = \frac{p\_J^l + \tilde{p}\_J^l}{2}
$$

$$
\Delta r\_J^{\nu+l} = r\_J^{\nu+l} - \tilde{r}\_J^{\nu+l} \qquad \qquad \Delta p\_J^l = p\_J^l - \tilde{p}\_J^l \tag{11}
$$

with l = 0, ..., n. In these variables, the difference of actions in Equation (10) is a linear function of Δp<sub>J</sub><sup>l</sup>. Integrals over the difference momenta can then be performed analytically and result in a set of delta functions (this is the origin of the last set of deltas in Equation (12) below). The dependence of the difference of the actions on Δr<sub>J</sub><sup>(ν+l)</sup> is more complicated: these variables appear in an explicit, linear term, but also in the argument of the potentials. This dependence can also be linearized via the expansion V(r̄<sub>J</sub><sup>(ν+l)</sup> + Δr<sub>J</sub><sup>(ν+l)</sup>/2) − V(r̄<sub>J</sub><sup>(ν+l)</sup> − Δr<sub>J</sub><sup>(ν+l)</sup>/2) = ∇V(r̄<sub>J</sub><sup>(ν+l)</sup>)Δr<sub>J</sub><sup>(ν+l)</sup> + O[(Δr<sub>J</sub><sup>(ν+l)</sup>)<sup>3</sup>]. This is the key approximation that we perform. An appropriate rescaling of the variables shows that the approximation is equivalent to a second order expansion in ħ of the phase, but a more precise analysis of its validity is required and under consideration (see, also, the discussion at the end of this section). Bearing this in mind, we observe that, once the expansion is performed, the integrals over the Δr<sub>J</sub><sup>(ν+l)</sup> variables can also be solved analytically, producing a second set of delta functions. Thus:

$$\begin{split} & \langle \bar{r}\_J^{\nu} - \frac{\Delta r\_J^{\nu}}{2} | e^{\frac{i}{\hbar} \hat{H} \tau\_t} | \bar{r}\_{J+1} - \frac{\Delta r\_{J+1}}{2} \rangle \langle \bar{r}\_{J+1} + \frac{\Delta r\_{J+1}}{2} | e^{-\frac{i}{\hbar} \hat{H} \tau\_t} | \bar{r}\_J^{\nu} + \frac{\Delta r\_J^{\nu}}{2} \rangle \approx \\ & \int d\bar{r}\_J^{\nu+1} \dots d\bar{r}\_J^{\nu+n-1} \int d\bar{p}\_J^1 \dots d\bar{p}\_J^n \, e^{\frac{i}{\hbar} \bar{p}\_J^n \Delta r\_J^{(\nu+n)}} e^{-\frac{i}{\hbar} \bar{p}\_J^1 \Delta r\_J^{\nu}} \\ & \times \quad \prod\_{l=1}^{n-1} \delta \left[ (\bar{p}\_J^{(l+1)} - \bar{p}\_J^l) + \delta\_t \nabla V(\bar{r}\_J^{(\nu+l)}) \right] \prod\_{l=1}^n \delta \left[ \frac{\bar{p}\_J^l}{m} \delta\_t - (\bar{r}\_J^{(\nu+l)} - \bar{r}\_J^{(\nu+l-1)}) \right] \end{split} \tag{12}$$
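As a worked illustration of the size of the term neglected in the linearization, consider a quartic potential, V(r) = r<sup>4</sup>/4. This choice, in units with unit coupling, is an assumption made here for concreteness and is not taken from the derivation. Dropping the slice indices, the difference of the potentials can be evaluated in closed form:

$$V\left(\bar{r} + \frac{\Delta r}{2}\right) - V\left(\bar{r} - \frac{\Delta r}{2}\right) = \bar{r}^3 \Delta r + \frac{1}{4} \bar{r} \, (\Delta r)^3$$

The first term is ∇V(r̄)Δr, which is retained, while the second is the cubic remainder, V'''(r̄)(Δr)<sup>3</sup>/24, which is dropped. For a harmonic potential the third derivative vanishes, so in that case the linearization introduces no error.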

The linearization approximation then has two crucial consequences: (1) by allowing the integration over the difference paths, it transforms the quantum expression of the correlation function, which, in the beginning, includes two propagators and, therefore, two paths, into a formula where only the semi-sum path appears, thus leading to a structure more similar to classical time correlations in which only one propagation is present; (2) (perhaps more importantly) it forces the semi-sum path to follow a classical Hamiltonian trajectory, as identified by the arguments of the delta functions.
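The recursion encoded in the arguments of the delta functions of Equation (12) can be sketched in a few lines of Python. This is only a minimal illustration for a single degree of freedom, not the implementation used in [30]: the function name `propagate`, its arguments, and the harmonic force used in the usage note are assumptions introduced here.

```python
from typing import Callable, List, Tuple


def propagate(r0: float, p1: float, grad_v: Callable[[float], float],
              mass: float, dt: float, n_steps: int) -> Tuple[List[float], List[float]]:
    """Generate the path selected by the delta functions of Eq. (12).

    Position update (second product of deltas):
        r_l = r_{l-1} + dt * p_l / mass,        l = 1, ..., n
    Momentum update (first product of deltas):
        p_{l+1} = p_l - dt * grad_v(r_l),       l = 1, ..., n - 1
    Returns the n + 1 positions and the n momenta along the path.
    """
    positions = [r0]
    momenta = [p1]
    r, p = r0, p1
    for step in range(1, n_steps + 1):
        r = r + dt * p / mass            # coordinate moves with the current momentum
        positions.append(r)
        if step < n_steps:               # only n - 1 momentum updates appear in Eq. (12)
            p = p - dt * grad_v(r)       # force evaluated at the new coordinate
            momenta.append(p)
    return positions, momenta
```

For example, `propagate(1.0, 0.0, lambda r: r, 1.0, 1e-3, 1000)` follows a unit-mass, unit-frequency harmonic oscillator for one time unit; the final coordinate should approach cos(1) as the time step is reduced.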

The final step to obtain a suitable expression for Equation (9) does not introduce any further approximation. Let us consider again the product of the thermal propagators in the equation. As mentioned above, these can be written via standard coordinate path integrals. Once this is done, it is convenient to introduce semi-sum and difference paths also for these propagators (this is important, in particular, to ensure that the common boundaries of the thermal and real-time propagations, r<sub>J</sub><sup>ν</sup> and r̃<sub>J</sub><sup>ν</sup>, are represented coherently). In the semi-sum and difference variables, the product of the thermal propagators takes the form:

$$\begin{split} & \langle \bar{r}\_J - \frac{\Delta r\_J}{2} \vert e^{-\hat{H}\tau\_\beta} \vert \bar{r}\_J^\nu - \frac{\Delta r\_J^\nu}{2} \rangle \langle \bar{r}\_J^\nu + \frac{\Delta r\_J^\nu}{2} \vert e^{-\hat{H}\tau\_\beta} \vert \bar{r}\_J + \frac{\Delta r\_J}{2} \rangle \approx \left[ \frac{m}{2\pi\hbar \delta\_\beta} \right]^{(\nu - 1)} \\ & \int d\bar{r}\_J^1 \dots d\bar{r}\_J^{\nu - 1} \int d\Delta r\_J^1 \dots d\Delta r\_J^{\nu - 1} \, e^{-\delta\_\beta \sum\_{\lambda = 1}^\nu [V(\bar{r}\_J^{(\lambda - 1)} + \Delta r\_J^{(\lambda - 1)}/2) + V(\bar{r}\_J^{(\lambda - 1)} - \Delta r\_J^{(\lambda - 1)}/2)]} \\ & \times \quad e^{-\frac{\sigma\_p^2}{2} \sum\_{\lambda = 1}^\nu (\Delta r\_J^\lambda - \Delta r\_J^{(\lambda - 1)})^2} \, e^{-\frac{1}{2\sigma\_r^2} \sum\_{\lambda = 1}^\nu (\bar{r}\_J^\lambda - \bar{r}\_J^{(\lambda - 1)})^2} \end{split} \tag{13}$$

with σ<sub>p</sub><sup>2</sup> = m/(2δ<sub>β</sub>ħ) and σ<sub>r</sub><sup>2</sup> = ħδ<sub>β</sub>/(2m). Substituting Equations (12) and (13) in Equation (9), it can be noted that the integral over Δr<sup>ν</sup> is of Gaussian form and can be performed analytically. Introducing the notation 𝒳<sub>J</sub> = ({r̄<sub>J</sub><sup>λ</sup>}<sub>λ=0,...,ν</sub>, {Δr<sub>J</sub><sup>λ</sup>}<sub>λ=0,...,ν−1</sub>, {r̄<sub>J</sub><sup>ν+l</sup>}<sub>l=1,...,n−1</sub>, {p̄<sub>J</sub><sup>l</sup>}<sub>l=1,...,n</sub>), after integration over Δr<sup>ν</sup>, the linearized short time propagator can be written as:

$$K^l(\bar{r}\_{J+1}, \Delta r\_{J+1}; \bar{r}\_J, \Delta r\_J) = \int d\mathcal{X}\_J e^{\frac{i}{\hbar}\bar{p}\_J^n \Delta r\_{J+1}} \rho(\mathcal{X}\_J, \bar{r}\_{J+1}) e^{-\frac{i}{\hbar}\bar{p}\_J^1 \Delta r\_J^{(\nu-1)}} \tag{14}$$

with:

$$\begin{split} \rho(\mathcal{X}\_{J}, \bar{r}\_{J+1}) &= \prod\_{l=1}^{n-1} \delta \left[ (\bar{p}\_{J}^{(l+1)} - \bar{p}\_{J}^{l}) + \delta\_{t} \nabla V(\bar{r}\_{J}^{(\nu+l)}) \right] \prod\_{l=1}^{n} \delta \left[ \frac{\bar{p}\_{J}^{l}}{m} \delta\_{t} - (\bar{r}\_{J}^{(\nu+l)} - \bar{r}\_{J}^{(\nu+l-1)}) \right] \\ & \times e^{-\delta\_{\beta} \sum\_{\lambda=1}^{\nu} [V(\bar{r}\_{J}^{(\lambda-1)} + \Delta r\_{J}^{(\lambda-1)}/2) + V(\bar{r}\_{J}^{(\lambda-1)} - \Delta r\_{J}^{(\lambda-1)}/2)]} \\ & \times e^{-\frac{(\bar{p}\_{J}^{1})^{2}}{2\sigma\_{p}^{2}}} e^{-\frac{\sigma\_{p}^{2}}{2} \sum\_{\lambda=1}^{\nu-1} (\Delta r\_{J}^{\lambda} - \Delta r\_{J}^{(\lambda-1)})^{2}} e^{-\frac{1}{2\sigma\_{r}^{2}} \sum\_{\lambda=1}^{\nu} (\bar{r}\_{J}^{\lambda} - \bar{r}\_{J}^{(\lambda-1)})^{2}} \end{split} \tag{15}$$

(The Gaussian in p̄<sub>J</sub><sup>1</sup>, the first factor in the last line of the expression above, is the result of the integration over Δr<sup>ν</sup>.) Substituting the approximate form of the propagator between complex time slices J and J + 1, we obtain for the symmetrized correlation function:

$$\begin{split} G\_{A,B}^{(L)}(t,\beta) &= \frac{1}{Z} \int d\Delta r\_{t\_c} d\bar{r}\_{t\_c} \langle \bar{r}\_{t\_c} + \frac{\Delta r\_{t\_c}}{2} |\hat{B}| \bar{r}\_{t\_c} - \frac{\Delta r\_{t\_c}}{2} \rangle \\ &\times \prod\_{J=1}^{L-1} \int d\mathcal{X}\_J e^{\frac{i}{\hbar} \bar{p}\_J^n \Delta r\_{J+1}} \rho(\mathcal{X}\_J, \bar{r}\_{J+1}) e^{-\frac{i}{\hbar} \bar{p}\_J^1 \Delta r\_J^{(\nu-1)}} \\ &\times \int d\mathcal{X}\_0 e^{\frac{i}{\hbar} \bar{p}\_0^n \Delta r\_1} \rho(\mathcal{X}\_0, \bar{r}\_1) e^{-\frac{i}{\hbar} \bar{p}\_0^1 \Delta r\_0^{(\nu-1)}} \langle \bar{r}\_0 + \frac{\Delta r\_0}{2} |\hat{A}| \bar{r}\_0 - \frac{\Delta r\_0}{2} \rangle \end{split} \tag{16}$$

The expression above is interesting. First of all, assuming that the linearization approximation of each short time propagator improves when the propagation time goes to zero, there is potential for systematic improvement with increasing L, and indeed, numerical tests [30] indicate that this is the case. However, the limit for large L, and, in particular, the validity of the expansion in the difference path at the intermediate times of the propagation, is delicate. In fact, it can be argued that the matrix elements of operators Â and B̂ (usually diagonal) force the forward and backward paths (the free and tilde variables in the upper panel of Figure 1) to start and end close to one another, and, therefore, that only small values of the difference among the paths will be relevant close to the initial and final time. Truncating the expansion of the difference of the potentials along the whole pair of paths is, however, considerably harder to justify. This issue, and the nature of the dynamics when L → ∞, are currently under investigation. In the meantime, note that the ρ functions are positive definite, so that they can be used to define a probability density for sampling the overall path variables (*i.e.*, the full set of {𝒳<sub>J</sub>}<sub>J=0,...,L−1</sub> variables) as Π = (1/Ω) ∏<sub>J=0</sub><sup>L−1</sup> ρ(𝒳<sub>J</sub>, r̄<sub>J+1</sub>), where Ω is the (unknown) normalization factor. The method to deal with this factor is illustrated in the next subsection for the case L = 1 and can be straightforwardly generalized to L > 1. The probability density, Π, corresponds to a stochastic process, which concatenates the thermal and time propagations within each short time propagator, K<sup>l</sup>. The structure of the propagations in real and imaginary time is determined by the definition of ρ in Equation (15) and can be described as follows. 
For L = 1, there is only one real-time leg of duration τ<sub>t</sub> = t, while the imaginary time propagation corresponds to an inverse temperature β/2 for both the semi-sum and difference variables. The upper panel of Figure 3 illustrates these propagations with a sketch. In the figure, the horizontal axis is time and the vertical axis is inverse temperature. The vertical plane represents the space of configurations associated with the thermal path integral for both the semi-sum and difference variables; the thermal beads are represented with the red circles. The harmonic interactions in the thermal paths are indicated with zigzagged lines connecting adjacent beads, while the interactions among the two paths due to the potential are drawn as dashed lines. Note that the difference variables path, on the left in the vertical plane in the figure, has one less bead than the semi-sum variables path, due to the integration carried out to isolate a Gaussian probability density for the initial momenta. The propagation in real time is drawn as the curve on the horizontal plane, which represents the phase space of the system. The red and golden circle at t = 0 represents the initial conditions for the time evolution: the initial coordinate coincides with the last bead of the thermal path in the semi-sum variables, while the initial momentum is sampled from the Gaussian mentioned before. A phase factor is associated with the initial point of the classical propagation. The exponent of this phase couples the initial momentum of the trajectory with the last bead of the thermal path in the difference variables. A phase factor is also associated with the final point of the classical propagation, where, for L = 1, the exponent couples the momentum at time t with the variable Δr<sub>1</sub>. 
The integrals over Δr<sub>0</sub><sup>(ν−1)</sup> and Δr<sub>1</sub> in the expression for G<sup>(1)</sup><sub>A,B</sub>(t; β) involve products of these phase factors with the matrix elements of operators Â and B̂. The end-point integral reconstructs the Wigner transform of the operator B̂. To see this, consider Equation (16). For L = 1, the second line of the equation is absent, and boundary conditions impose Δr<sub>1</sub> = Δr<sub>t<sub>c</sub></sub> (with similar relationships for the sum variables). With this notation, the integral over Δr<sub>t<sub>c</sub></sub> is recognizable as the Wigner transform of operator B̂. The structure of the sequence of imaginary and real-time propagation for generic values of L can be inferred from the lower part of Figure 3, where we show what happens for L = 2. In this case, there are two segments of classical dynamics, each of duration t/2, and two propagations of semi-sum and difference variables in imaginary time, taking the system from zero inverse temperature to β/4 and from β/4 to β/2, respectively. As before, the first segment of dynamics starts, with a Gaussian initial momentum, from the last bead of the semi-sum variable thermal path at t = 0. The end-point of this leg of propagation is the initial configuration for the semi-sum variable thermal path at t/2, and the second segment of dynamics has as initial conditions the final coordinate of the semi-sum variable thermal path and a new momentum sampled from a Gaussian. The variances of the Gaussians associated with the momentum sampling are doubled with respect to the case L = 1. The integrand now contains four phase factors coupling the momenta at the beginning and end of each classical dynamics segment with the values of the difference path variables at the end and at the beginning of each thermal slice, respectively. 
The phase factor that depends on Δr<sub>2</sub>p̄<sub>1</sub><sup>n</sup> (*i.e.*, the phase factor computed at time t) can again be combined with the matrix element of operator B̂ to obtain the Wigner transform of this operator at the final time of the propagation, so that only three phases remain to contribute to the result. In general, G<sup>(L)</sup><sub>A,B</sub>(t; β) involves L segments of classical propagation, each of duration t/L, interspersed with L pairs of thermal paths in the semi-sum and difference variables, each at an inverse temperature β/2L. The rules for connecting the coordinates and momenta at the initial and final time of the dynamics with the final and initial points of the thermal paths and for constructing the 2L − 1 phase factors contributing to the integrand (the phase factor at time t can always be absorbed in the Wigner transform of operator B̂) are completely analogous to the L = 2 case.
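The bookkeeping described above (L classical segments of duration t/L, thermal paths at inverse temperature β/2L, and 2L − 1 surviving phase factors) can be summarized in a short helper. The function below is purely illustrative; its name and the dictionary keys are assumptions introduced here, not notation from [30].

```python
from typing import Dict


def schofield_schedule(n_slices: int, t: float, beta: float) -> Dict[str, float]:
    """Summarize the structure of the order-L approximation, L = n_slices."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    return {
        # each segment of classical dynamics lasts t / L
        "real_time_per_segment": t / n_slices,
        # each pair of thermal paths spans an inverse temperature beta / (2 L)
        "inverse_temperature_per_thermal_path": beta / (2 * n_slices),
        "n_classical_segments": n_slices,
        # 2 L phases in total, one of which is absorbed in the Wigner transform of B
        "n_phase_factors": 2 * n_slices - 1,
    }
```

For L = 2 this gives segments of duration t/2, thermal paths at β/4, and three phase factors, in agreement with the lower panel of Figure 3.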

Figure 3. Graphic representation of the propagators in real and imaginary times contributing to the approximate Schofield function for the case L = 1 (upper panel) and L = 2 (lower panel). The horizontal axis is real time, while the vertical axis is inverse temperature. The mean and difference coordinates in the thermal paths are represented as red dots on the vertical planes (in the upper panel, for example, ν = 6, *i.e.*, we use six beads to represent the thermal path integrals at inverse temperature β/2). Segments of classical propagation in phase space are represented as continuous red curves in the horizontal planes. The golden circles indicate the connection between the coordinate-momentum representation of the dynamics in real time (horizontal planes) and the representation of the dynamics in imaginary time that takes place in coordinate space (vertical planes).

A Monte Carlo algorithm to sample Π for different values of L was illustrated in [30]. This Monte Carlo scheme has several non-standard features, most notably the fact that the normalization of the probability density is unknown and that Π contains products of delta functions, *i.e.*, singular distributions. These difficulties can be circumvented as detailed in [30]. The first one is tackled by recasting (without further approximations) Equation (16) in the form of a ratio of expected values. The second is addressed via an appropriate choice of the trial moves and acceptance probabilities. The most serious numerical difficulty, unfortunately, comes from the estimator of the observable.

In fact, from Equation (16), it can be seen that, in addition to the matrix elements of the operators, the integrand contains a product of phase factors to be evaluated at the beginning and end of each real-time propagation segment. As mentioned above, the number of these phase factors for the "order L" approximation of the symmetrized correlation function is 2L − 1, and their presence rapidly hinders the convergence of the calculation. Furthermore, for a system of n particles in three dimensions, the phases take the (generic) form exp[±(i/ħ) **p**·**Δr**], where **p** and **Δr** are 3n-dimensional vectors. The number of phase factors thus scales linearly with the number of degrees of freedom, so that, even for small values of L, convergence is problematic. The numerical tests performed so far on simple model systems confirm both the interest and the difficulties of Equation (16). In [30], we computed position autocorrelation functions for a set of one-dimensional systems (e.g., quartic potential) at temperatures low enough to ensure that the system was in the quantum regime. We observed that increasing the number of complex time slices did systematically extend the length of time for which we were able to get accurate results. However, the numerical effort to go beyond L = 3, though not exponential in time, became essentially prohibitive, even for these simple systems. To indicate possible means to reduce the numerical effort involved in these calculations, we now discuss a recent development of the method, introduced to address the problem of the phase in the simplest case in which it presents itself, L = 1. We then illustrate how to formally extend this development, which, in its simplest form, has been successfully applied to realistic models of condensed phase systems, to the case L > 1.
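The loss of signal caused by a product of phases can be illustrated with a toy model. The sketch below assumes, purely for illustration, that each degree of freedom contributes an independent Gaussian phase angle of standard deviation `s`; this is not the actual distribution of **p**·**Δr**/ħ in Equation (16), and the function names are hypothetical.

```python
import cmath
import math
import random


def exact_mean_phase(n_dof: int, s: float) -> float:
    """Exact <exp(i * sum_k x_k)> for independent x_k ~ N(0, s^2)."""
    return math.exp(-0.5 * n_dof * s * s)


def sampled_mean_phase(n_dof: int, s: float, n_samples: int, seed: int = 0) -> complex:
    """Plain Monte Carlo estimate of the same average."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_samples):
        # total phase angle accumulated over all degrees of freedom
        angle = sum(rng.gauss(0.0, s) for _ in range(n_dof))
        total += cmath.exp(1j * angle)
    return total / n_samples
```

In this toy model the exact average decays exponentially with the number of degrees of freedom (for `n_dof = 30` and `s = 1` it is exp(−15), of the order of 10<sup>−7</sup>), while the statistical error of the sampled average only decreases as the inverse square root of the number of samples.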

#### 3. L = 1: Fully Linearized Approximation

The expression for the L = 1 approximation of the symmetrized correlation function is given by (see Equation (16)):

$$\begin{split} G\_{\rm A,B}^{(1)}(t;\beta) &= \frac{1}{Z} \int d\Delta r\_{t\_c} d\bar{r}\_{t\_c} \int d\mathcal{X}\_0 \langle \bar{r}\_{t\_c} + \frac{\Delta r\_{t\_c}}{2} |\hat{B}| \bar{r}\_{t\_c} - \frac{\Delta r\_{t\_c}}{2} \rangle e^{\frac{i}{\hbar}\bar{p}\_0^n \Delta r\_{t\_c}} \\ &\times \quad \rho(\mathcal{X}\_0, \bar{r}\_{t\_c}) e^{-\frac{i}{\hbar}\bar{p}\_0^1 \Delta r\_0^{(\nu-1)}} \langle \bar{r}\_0 + \frac{\Delta r\_0}{2} |\hat{A}| \bar{r}\_0 - \frac{\Delta r\_0}{2} \rangle \end{split} \tag{17}$$

We are now going to simplify the expression above in four steps: (1) observe that the integral over Δr<sub>t<sub>c</sub></sub> in the first line of the equation above defines the Wigner transform of operator B̂ (see, also, the definition below Equation (4)); (2) note that the product of δ functions in the definition of ρ (Equation (15)) forces (r̄<sub>t<sub>c</sub></sub>, p̄<sup>n</sup><sub>t<sub>c</sub></sub>) to be the endpoint of a classical trajectory of length t starting at (r̄<sub>0</sub><sup>ν</sup>, p̄<sub>0</sub><sup>1</sup>), so that, after integration over r̄<sup>(ν+l)</sup> (l = 1, ..., n−1) and p̄<sup>l</sup> (l = 1, ..., n), (r̄<sub>t<sub>c</sub></sub>, p̄<sup>n</sup><sub>t<sub>c</sub></sub>) = (r<sub>t</sub>, p<sub>t</sub>) (where (r<sub>t</sub>, p<sub>t</sub>) denote the classically evolved variables); (3) choose, for the sake of simplicity, to specialize the discussion to an operator, Â, which is diagonal in the coordinate representation (the case of generic operators is considered in the Appendix of [39]). This choice, producing a δ(Δr<sub>0</sub>) in the evaluation of the matrix elements, allows one to integrate also over Δr<sub>0</sub>. The surviving variables (*i.e.*, the semi-sum and difference variables of the thermal path integral and the initial momentum of the classical trajectory) will be indicated collectively as Γ = {p<sup>1</sup>, r<sup>0</sup>, ..., r<sup>ν</sup>, Δr<sup>1</sup>, ..., Δr<sup>ν−1</sup>}; (4) simplify the notation by dropping the bar from the semi-sum variables and the subscript, which identifies the J = 0 propagation segment in Equation (16), since only one segment is now present. Once these operations are performed, the correlation function can be written as:

$$G\_{A,B}^{(1)}(t;\beta) = \frac{\hat{Q}}{Z} \int d\Gamma B\_w(r\_t, p\_t) P(\Gamma) e^{-\frac{i}{\hbar}p^1 \Delta r^{(\nu - 1)}} A(r^0) \tag{18}$$

with:

$$\begin{split} P(\Gamma) &= \frac{1}{\hat{Q}} e^{-\frac{(p^1)^2}{2\sigma\_p^2}} e^{-2\delta\_\beta V(r^0)} e^{-\frac{1}{2\sigma\_r^2} \sum\_{\lambda=1}^\nu (r^\lambda - r^{\lambda-1})^2} e^{-\frac{\sigma\_p^2}{2} \sum\_{\lambda=2}^{\nu-1} (\Delta r^\lambda - \Delta r^{\lambda-1})^2} e^{-\frac{\sigma\_p^2}{2} (\Delta r^1)^2} \\ &\times \quad e^{-\delta\_\beta \sum\_{\lambda=2}^\nu \left[ V\left(r^{(\lambda-1)} + \frac{\Delta r^{\lambda-1}}{2}\right) + V\left(r^{(\lambda-1)} - \frac{\Delta r^{\lambda-1}}{2}\right) \right]} \end{split} \tag{19}$$

and Q̂ the normalization of P(Γ). Note that the expression above for the probability is quite standard, being an explicit function of the semi-sum and difference variables, which can be sampled via Monte Carlo, multiplied by a Gaussian term for the momenta. As mentioned in the previous section, the ratio of the normalization over the partition function, which appears in Equation (18), is, in general, not known, and we estimate it via the autocorrelation of the identity, G<sup>(1)</sup><sub>I,I</sub> = 1 = (Q̂/Z) ∫ dΓ P(Γ) exp[−(i/ħ) p<sup>1</sup>Δr<sup>(ν−1)</sup>]. Using this approach, the L = 1 estimator of the correlation function is given by the following ratio of expectation values over P(Γ):

$$G\_{A,B}^{(1)}(t;\beta) = \frac{\langle B\_w(r\_t, p\_t)e^{-\frac{i}{\hbar}p^1\Delta r^{(\nu-1)}}A(r^0)\rangle\_P}{\langle e^{-\frac{i}{\hbar}p^1\Delta r^{(\nu-1)}}\rangle\_P} \tag{20}$$
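The ratio in Equation (20) translates directly into an estimator over sampled configurations. The sketch below assumes that, for each configuration Γ drawn from P(Γ), the values of B<sub>w</sub>(r<sub>t</sub>, p<sub>t</sub>), A(r<sup>0</sup>), p<sup>1</sup> and Δr<sup>(ν−1)</sup> have already been computed for a single degree of freedom; the function name and argument names are hypothetical.

```python
import cmath
from typing import Sequence


def ratio_estimator(b_w: Sequence[float], a: Sequence[float],
                    p1: Sequence[float], delta_r: Sequence[float],
                    hbar: float = 1.0) -> complex:
    """Estimate Eq. (20) as a ratio of two sample averages over P(Gamma)."""
    if not (len(b_w) == len(a) == len(p1) == len(delta_r)) or len(b_w) == 0:
        raise ValueError("all inputs must be non-empty and of equal length")
    numerator = 0j
    denominator = 0j
    for b, a0, p, dr in zip(b_w, a, p1, delta_r):
        phase = cmath.exp(-1j * p * dr / hbar)   # phase factor of Eq. (20)
        numerator += b * phase * a0
        denominator += phase
    # the common factor 1 / N cancels in the ratio
    return numerator / denominator
```

Both sums contain the oscillating phase, so in practice the denominator can become small and noisy for many degrees of freedom.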

As anticipated, a phase factor appears both in the numerator and in the denominator of this expression, and, for high dimensional systems, it hinders an efficient convergence of the calculation. To alleviate this problem, we proposed a method, described in detail in [39], which starts by obtaining an alternative expression for G<sup>(1)</sup><sub>A,B</sub>. As will be shown in the following, the new expression does not introduce further (analytical) approximations, but it has the advantage of eliminating the phase factor from the observable. Let us consider in more detail the structure of the probability, P(Γ). This probability is given by the product of a Gaussian for the momenta, ρ<sub>G</sub>(p) ∝ exp(−p<sup>2</sup>/2σ<sub>p</sub><sup>2</sup>) (note that, with respect to Equation (19), we dropped the superscript, 1, on the momenta to simplify the notation), times a joint probability function for the semi-sum and difference thermal variables, to be indicated in the following as ρ̃(**r**, Δ**r**), where we have introduced the notation **r** = {r<sup>0</sup>, ..., r<sup>ν</sup>} and Δ**r** = {Δr<sup>1</sup>, ..., Δr<sup>(ν−1)</sup>}. This joint probability (whose form can be inferred from Equation (19) by taking out the momentum Gaussian) is most conveniently expressed as:

$$
\tilde{\rho}(\mathbf{r}, \Delta \mathbf{r}) = \rho\_c(\Delta \mathbf{r}|\mathbf{r})\rho\_m(\mathbf{r}) \tag{21}
$$

where:

$$\begin{split} \rho\_m(\mathbf{r}) &= \frac{1}{\hat{Q}} e^{-2\delta\_\beta V(r^0)} e^{-\frac{1}{2\sigma\_r^2} \sum\_{\lambda=1}^\nu (r^\lambda - r^{\lambda-1})^2} \int d\Delta \mathbf{r} \, e^{-\frac{\sigma\_p^2}{2} \sum\_{\lambda=2}^{\nu-1} (\Delta r^\lambda - \Delta r^{\lambda-1})^2} e^{-\frac{\sigma\_p^2}{2} (\Delta r^1)^2} \\ &\times \quad e^{-\delta\_\beta \sum\_{\lambda=2}^\nu \left[ V\left(r^{(\lambda-1)} + \frac{\Delta r^{\lambda-1}}{2}\right) + V\left(r^{(\lambda-1)} - \frac{\Delta r^{\lambda-1}}{2}\right) \right]} \end{split} \tag{22}$$

is the marginal probability for the semi-sum variables and ρ<sub>c</sub>(Δ**r**|**r**) ≡ ρ̃(**r**, Δ**r**)/ρ<sub>m</sub>(**r**) is the conditional probability for the difference variables given the semi-sum variables. This rewriting of the probability density is convenient because the phase factors in Equation (20) depend only on the momenta and difference variables. We can use this observation to define:

$$F(p, \mathbf{r}) = \int d\Delta \mathbf{r} e^{-\frac{i}{\hbar}p\Delta r^{(\nu - 1)}} \rho\_c(\Delta \mathbf{r}|\mathbf{r}) \tag{23}$$

which is the average of the phase with respect to the conditional probability density, and investigate the properties of this function to see if we can use them to improve the convergence of our calculations. To that end, note that ln F is, by definition, the cumulant generating function of the variable Δr<sup>(ν−1)</sup> with respect to the conditional probability, ρ<sub>c</sub> (see, for example, [40,41] for previous use of cumulants in this field). This means that the coefficients of the Taylor series expansion (with respect to −ip/ħ):

$$\ln F(p, \mathbf{r}) = \sum\_{n=1}^{\infty} \frac{(-ip/\hbar)^n}{n!} \langle \left(\Delta r^{(\nu - 1)}\right)^n \rangle\_{\rho\_c(\Delta \mathbf{r}|\mathbf{r})}^c \tag{24}$$

(these coefficients are indicated above as ⟨(Δr<sup>(ν−1)</sup>)<sup>n</sup>⟩<sup>c</sup><sub>ρ<sub>c</sub>(Δ**r**|**r**)</sub>) are the cumulants of Δr<sup>(ν−1)</sup>. Importantly, the conditional probability density is an even function of the difference variables, implying that only even order terms in the series above are non-zero and that the series corresponds to a real function, which we will denote in the following as −E(p, **r**). We can then express the average of the phase as:

$$F(p, \mathbf{r}) = e^{-E(p, \mathbf{r})} \tag{25}$$

*i.e.*, a positive definite function of the momenta and the semi-sum variables. We now use the function above to define a new probability density:

$$\mathcal{P}(p,\mathbf{r}) = \frac{\rho\_G(p)e^{-E(p,\mathbf{r})}\rho\_m(\mathbf{r})}{\int dp d\mathbf{r} \rho\_G(p)e^{-E(p,\mathbf{r})}\rho\_m(\mathbf{r})} \tag{26}$$

and note that, by direct substitution of this definition in the (explicit) expression of Equation (20), we obtain:

$$G\_{A,B}^{(1)}(t; \beta) = \langle B\_w(r\_t, p\_t) A(r^0) \rangle\_{\mathcal{P}} \tag{27}$$

The key advantage of the expression above is that the observable no longer contains phase factors and is, therefore, well suited for a Monte Carlo estimate. Sampling the distribution, 𝒫, however, is non-trivial, since this probability density contains two factors, e<sup>−E(p,**r**)</sup> and ρ<sub>m</sub>(**r**), that do not have an explicit analytic form, but, for each value of **r** and p, can only be estimated numerically. The numerical estimate of E(**r**, p), in particular, requires one to truncate the cumulant series at a given order. The convergence of the calculation with respect to truncation of the series can always be checked numerically, and, although the cost increases when higher order terms are included, it does not present any particular difficulty. (In all calculations performed so far, a second order cumulant expansion proved sufficient.) In the following subsection, we briefly describe how to combine two schemes, known as the Kennedy and Penalty methods, for Monte Carlo sampling of noisy probability densities and obtain G<sup>(1)</sup><sub>A,B</sub>(t; β). Our goal is to highlight the main differences among these schemes and standard Monte Carlo and to indicate where the algorithm is most affected by them. A detailed description of the algorithm can be found in [38,39].
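A second order truncation of the cumulant series in Equation (24) can be sketched as follows. The code assumes that a set of samples of Δr<sup>(ν−1)</sup> drawn from the conditional density ρ<sub>c</sub>(Δ**r**|**r**) is already available for a single degree of freedom; how these samples are generated is not addressed here, and the function names are hypothetical rather than taken from [38,39].

```python
import cmath
from typing import Sequence


def second_order_exponent(p: float, delta_r: Sequence[float], hbar: float = 1.0) -> float:
    """Second order estimate of E(p, r) = -ln F(p, r).

    The n = 2 term of Eq. (24) is -(p / hbar)^2 * kappa_2 / 2, where kappa_2 is
    the second cumulant (the variance) of delta_r, so E ~ (p / hbar)^2 * kappa_2 / 2.
    """
    n = len(delta_r)
    if n < 2:
        raise ValueError("at least two samples are needed to estimate a variance")
    mean = sum(delta_r) / n                  # vanishes by symmetry in the exact limit
    second_cumulant = sum((x - mean) ** 2 for x in delta_r) / (n - 1)
    return 0.5 * (p / hbar) ** 2 * second_cumulant


def direct_phase_average(p: float, delta_r: Sequence[float], hbar: float = 1.0) -> complex:
    """Direct sample average of the phase in Eq. (23), for comparison."""
    return sum(cmath.exp(-1j * p * x / hbar) for x in delta_r) / len(delta_r)
```

For samples that are exactly Gaussian the two routes agree within statistical error, since all cumulants beyond the second vanish; for other distributions, comparing them gives a rough indication of the size of the neglected higher order terms.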

#### *3.1. Noisy Monte Carlo Algorithm*

To simplify the discussion, we introduce some notation. Let us indicate the coordinate-dependent Gaussian terms in Equation (19) as:

$$e^{-\frac{1}{2\sigma\_r^2} \sum\_{\lambda=1}^{\nu} (r^{\lambda} - r^{(\lambda - 1)})^2} = e^{-\mathbf{V}\_r(\mathbf{r})}$$

$$e^{-\frac{\sigma\_p^2}{2} \sum\_{\lambda=1}^{\nu - 1} (\Delta r^{\lambda} - \Delta r^{(\lambda - 1)})^2} = e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} \tag{28}$$

(with Δr<sup>0</sup> = 0 in the sum above) and write the potential term as:

$$e^{-\delta\_{\beta}\sum\_{\lambda=2}^{\nu}\left[\mathbf{V}\left(r^{(\lambda-1)}+\frac{\Delta r^{\lambda-1}}{2}\right)+\mathbf{V}\left(r^{(\lambda-1)}-\frac{\Delta r^{\lambda-1}}{2}\right)\right]}e^{-2\delta\_{\beta}\mathbf{V}(r^{0})} = e^{-\delta\_{\beta}\bar{\mathbf{V}}(\mathbf{r},\Delta\mathbf{r})}\tag{29}$$

We also rewrite the marginal probability, ρm(**r**), defined in Equation (22), isolating the terms that do not depend on Δ**r** and have an explicit analytic expression, thus:

$$\begin{split} \rho\_m(\mathbf{r}) &= \int d\Delta \mathbf{r} \tilde{\rho}(\mathbf{r}, \Delta \mathbf{r}) = \frac{1}{\hat{Q}} e^{-\mathbf{V}\_r(\mathbf{r})} \int d\Delta \mathbf{r} \, e^{-\delta\_\beta \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})} e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} \\ &= \frac{1}{\hat{Q}} e^{-\mathbf{V}\_r(\mathbf{r})} \rho'\_m(\mathbf{r}) \end{split} \tag{30}$$

With the definitions above, 𝒫(p, **r**) takes the form:

$$\mathcal{P}(p,\mathbf{r}) = \frac{1}{\mathcal{Q}} \rho\_G(p) e^{-E(\mathbf{r},p)} e^{-\mathbf{V}\_r(\mathbf{r})} \rho\_m'(\mathbf{r}) \tag{31}$$

where 𝒬 is the normalization. The scheme that we use to perform the sampling is based on earlier work by Ceperley [42] and Kennedy [43]. Unlike what happens for classical canonical densities, in our probability the variables p and **r** are not independent, so the momenta cannot simply be integrated out as Gaussian variables and must be treated together with the coordinates. Adapting the ideas of Ceperley and Kennedy to our case, we introduce a Monte Carlo algorithm in which the probability to generate a new state of the system, by changing either the coordinates or the momenta of the particle, and/or the probability to accept this new state are modified to guarantee that detailed balance is satisfied also when ρ′<sub>m</sub>(**r**) and E(**r**, p) are estimated with significant noise. Both the Ceperley and Kennedy schemes require the introduction of appropriate numerical estimators of the unknown functions. These estimators will be indicated with calligraphic fonts.

The Monte Carlo scheme to sample Equation (31) is constructed as follows. Choose, with probability 1/2, whether the move will involve **r** or p.

(1) A move on p has been selected:

choose a new momentum according to p′ = p + δp, where δp is a uniform random number centered on zero (the magnitude of the displacement is chosen so as to optimize the acceptance). Taking into account that the **r** variables are not being updated, detailed balance for this trial move takes the form:

$$
\rho\_G(p)e^{-E(\mathbf{r},p)}A^p(p\to p') = \rho\_G(p')e^{-E(\mathbf{r},p')}A^p(p'\to p) \tag{32}
$$

where A<sup>p</sup>(p → p′) is the acceptance probability. The detailed balance relationship above has the same form as the one discussed by Ceperley *et al*. within the penalty method [42], a generalized Monte Carlo scheme for sampling a density given by the exponential of a function, in our case E(., .), known with statistical errors. According to the penalty method, if a numerical estimate, Δℰ<sub>p</sub>(p′, p; **r**), of the difference E(**r**, p′) − E(**r**, p) has been obtained (for example, by averaging N<sub>s</sub> values of a specific estimator) and an estimate of its variance, χ<sub>p</sub><sup>2</sup>, is also known, detailed balance can be satisfied by defining the acceptance as:

$$A^p(p \to p') = \min\left[1, \frac{\rho\_G(p')}{\rho\_G(p)} e^{-\Delta \mathcal{E}\_p(p', p; \mathbf{r}) - u\_{\chi\_p^2}}\right] \tag{33}$$

where:

$$u\_{\chi^2\_p} = \frac{\chi^2\_p}{2} + \frac{\chi^4\_p}{4(N\_s + 1)} + \dots \tag{34}$$

The expression for the acceptance probability differs from the standard Metropolis prescription by the presence of $u\_{\chi^2\_p}$ and is valid when $\chi^2\_p/n < 1/4$ (here, $n$ is the size of the sample used to estimate the energy difference) [42]. In the limit of an infinitely precise estimate of the difference, $u\_{\chi^2\_p} \to 0$ and the standard criterion is recovered; when non-zero, this function corrects, on average, for the effect of the noise.
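As an illustration of Equations (33) and (34), the following minimal Python sketch evaluates the penalty-corrected acceptance probability from a set of noisy samples of the energy difference. The function names, the one-dimensional setting and the choice of estimating $\chi^2\_p$ as the variance of the sample mean are assumptions made here for illustration, and only the two terms shown explicitly in Equation (34) are retained.

```python
import math


def mean_and_variance_of_mean(samples):
    """Return the sample mean and an estimate of the variance of that mean.

    Assumption: chi^2 in Eq. (33) is identified with the variance of the mean.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance divided by n gives the variance of the mean.
    var_of_mean = sum((x - mean) ** 2 for x in samples) / (n * (n - 1))
    return mean, var_of_mean


def penalty_correction(chi2, n_samples):
    """First two terms of Eq. (34): chi^2 / 2 + chi^4 / (4 (N_s + 1))."""
    return chi2 / 2.0 + chi2 ** 2 / (4.0 * (n_samples + 1))


def penalty_acceptance(log_gauss_ratio, delta_e, chi2, n_samples):
    """Acceptance probability of Eq. (33).

    log_gauss_ratio is ln[rho_G(p') / rho_G(p)], delta_e is the noisy
    estimate of E(r, p') - E(r, p), and chi2 is its estimated variance.
    """
    exponent = log_gauss_ratio - delta_e - penalty_correction(chi2, n_samples)
    # Guard against overflow: any non-negative exponent gives acceptance 1.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)
```

With `chi2 = 0` the function reduces to the standard Metropolis criterion, while any positive variance lowers the acceptance, which is how the penalty compensates, on average, for the noise.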

(2) A move on **r** has been selected:

in this case, indicating with $T^r(\mathbf{r} \to \mathbf{r}')$ and $A^r(\mathbf{r} \to \mathbf{r}')$ the probability to generate and accept a new configuration, respectively, detailed balance is expressed, after simplifying $\rho\_G(p)$, as:

$$\begin{split} &e^{-E(\mathbf{r},p)}e^{-\mathbf{V}\_r(\mathbf{r})}\rho\_m'(\mathbf{r})T^r(\mathbf{r}\to\mathbf{r}')A^r(\mathbf{r}\to\mathbf{r}') = \\ &\qquad e^{-E(\mathbf{r}',p)}e^{-\mathbf{V}\_r(\mathbf{r}')}\rho\_m'(\mathbf{r}')T^r(\mathbf{r}'\to\mathbf{r})A^r(\mathbf{r}'\to\mathbf{r}) \end{split} \tag{35}$$

The structure of this relationship is analogous to the one considered by Kennedy *et al*. [43], who adapted Monte Carlo sampling to probability densities given by an exponential term times a "noisy" (positive definite) function. They showed that detailed balance is satisfied if states are generated according to the probability:

$$T^r(\mathbf{r}\rightarrow\mathbf{r'})\propto e^{-E(\mathbf{r'},p)}e^{-\mathbf{V}\_r(\mathbf{r'})}\tag{36}$$

(a method to sample $T^r(\mathbf{r} \to \mathbf{r}')$ is described after the next equation) and accepted with probability:

$$A^r(\mathbf{r}\rightarrow\mathbf{r}') = \begin{cases} c\mathcal{U}(\mathbf{r}\rightarrow\mathbf{r}') \text{ if } e^{-\delta\_\beta \mathbf{V}(\mathbf{r},\mathbf{0})} > e^{-\delta\_\beta \mathbf{V}(\mathbf{r}',\mathbf{0})} \\ c & \text{if } e^{-\delta\_\beta \mathbf{V}(\mathbf{r},\mathbf{0})} \le e^{-\delta\_\beta \mathbf{V}(\mathbf{r}',\mathbf{0})} \end{cases} \tag{37}$$

Above, $\mathcal{U}(\mathbf{r} \to \mathbf{r}')$ is an unbiased estimator of the ratio, $\rho'\_m(\mathbf{r}')/\rho'\_m(\mathbf{r})$, and $c < 1$ is a constant that ensures $A^r(\mathbf{r} \to \mathbf{r}') \in [0, 1]$ (for details on the meaning and choice of $c$, see [43] and the discussion on page 8 of [39]). The conditions on the exponential of the potential enforce an ordering criterion on the states, whose optimal choice depends on the problem (here, we used the one adopted in our previous work [39]). In the usual implementation of the Kennedy method, the exponential part of the probability density is assumed to be known analytically, and the states are generated via a standard Monte Carlo method. In our case, the situation is more complicated, since $e^{-E(\mathbf{r}', p)}$ is only known with noise. To solve this problem, we employ the penalty method to obtain configurations distributed according to Equation (36). These configurations are generated using a Monte Carlo scheme with transition probability $t(\mathbf{r} \to \mathbf{r}') \propto e^{-\mathbf{V}\_r(\mathbf{r}')}$ and acceptance probability $a(\mathbf{r} \to \mathbf{r}') = \min[1, \exp(-\Delta \mathcal{E}\_r(\mathbf{r}', \mathbf{r}; p) - u\_{\chi^2\_r})]$, where $\Delta \mathcal{E}\_r(\mathbf{r}', \mathbf{r}; p)$ is an unbiased estimator of $E(\mathbf{r}', p) - E(\mathbf{r}, p)$, and $u\_{\chi^2\_r}$ is defined in analogy with Equation (34).
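The acceptance rule of Equation (37) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the two ordering values stand for the exponentials compared in Equation (37), and the noisy ratio estimate is passed in as a plain number rather than computed.

```python
def kennedy_acceptance(ratio_estimate, order_old, order_new, c):
    """Acceptance probability in the spirit of Eq. (37).

    ratio_estimate: noisy, unbiased estimate of the ratio of marginal densities
    order_old, order_new: values used to order the old and new states
    c: constant in (0, 1) chosen so that the result stays inside [0, 1]
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    # The estimator multiplies c only in one direction of the ordering.
    prob = c * ratio_estimate if order_old > order_new else c
    if not 0.0 <= prob <= 1.0:
        # A noisy estimate can push c * U outside [0, 1]; c was too large.
        raise ValueError("c is too large for this noisy ratio estimate")
    return prob
```

Raising an error when the probability leaves the unit interval is a design choice of this sketch; it makes visible the role of $c$, which must be small enough to absorb the fluctuations of the estimator.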

This concludes the description of our Monte Carlo moves. The practical implementation of this algorithm requires the definition of the numerical estimators, $\mathcal{U}(\mathbf{r} \to \mathbf{r}')$, $\Delta \mathcal{E}\_p(p', p; \mathbf{r})$ and $\Delta \mathcal{E}\_r(\mathbf{r}', \mathbf{r}; p)$. While this is an important technical point, it only involves a set of calculations, each performed via an auxiliary Monte Carlo move, that are quite standard. To provide a typical example, we consider one of the estimators, referring the reader to [39] for a detailed description of the others. Let us then consider $\mathcal{U}(\mathbf{r} \to \mathbf{r}')$. This quantity, necessary in the Kennedy acceptance test (see Equation (37)), is obtained by writing the ratio of the marginal probabilities as:

$$\begin{split} \frac{\rho\_m'(\mathbf{r}')}{\rho\_m'(\mathbf{r})} &= \frac{\int d\Delta \mathbf{r} \, e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} e^{-\delta\_\beta \bar{\mathbf{V}}(\mathbf{r}', \Delta \mathbf{r})}}{\int d\Delta \mathbf{r} \, e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} e^{-\delta\_\beta \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})}} \\ &= \frac{\int d\Delta \mathbf{r} \, e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} e^{-\delta\_\beta \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})} e^{-\delta\_\beta \left[\bar{\mathbf{V}}(\mathbf{r}', \Delta \mathbf{r}) - \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})\right]}}{\int d\Delta \mathbf{r} \, e^{-\mathbf{V}\_\Delta(\Delta \mathbf{r})} e^{-\delta\_\beta \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})}} \\ &= \left\langle e^{-\delta\_\beta \left[\bar{\mathbf{V}}(\mathbf{r}', \Delta \mathbf{r}) - \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r})\right]}\right\rangle\_{\rho\_c(\Delta \mathbf{r}|\mathbf{r})} \end{split} \tag{38}$$

whose unbiased estimator is:

$$\mathcal{U}(\mathbf{r}\rightarrow\mathbf{r}') = \frac{1}{N\_a} \sum\_{i=1}^{N\_a} e^{-\delta\_\beta \left[\bar{\mathbf{V}}(\mathbf{r}', \Delta \mathbf{r}\_i) - \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r}\_i)\right]} \tag{39}$$

where $\{\Delta \mathbf{r}\_i\}$ is a sample distributed according to $\rho\_c(\Delta \mathbf{r}|\mathbf{r})$. This sample is obtained via an auxiliary (standard) Monte Carlo calculation over the conditional probability, $\rho\_c(\Delta \mathbf{r}|\mathbf{r})$, in which new configurations are generated according to $T(\Delta \mathbf{r} \to \Delta \mathbf{r}') \propto \exp[-\mathbf{V}\_\Delta(\Delta \mathbf{r}')]$ (to do this, we use the staging method [44], which allows one to sample exactly a probability density containing Gaussian-like distributions, see Equation (28)), and moves are accepted or rejected based on $A(\Delta \mathbf{r} \to \Delta \mathbf{r}') = \min\{1, \exp[-\delta\_\beta(\bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r}') - \bar{\mathbf{V}}(\mathbf{r}, \Delta \mathbf{r}))]\}$. As can be seen from the expression above, calculating the estimator requires $N\_a$ steps in the auxiliary Monte Carlo calculation. A similar situation arises when the other estimators introduced above are considered, so that the total number of Monte Carlo moves in our scheme is given by $N\_t = N\_m \times N\_a$, where we indicated with $N\_m$ the number of moves in the "main" Monte Carlo cycle (*i.e.*, each choice of a move on $\mathbf{r}$ or $p$) and with $N\_a$ the number of auxiliary Monte Carlo steps per "main" move.
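The auxiliary calculation behind Equation (39) can be illustrated with a one-dimensional toy sketch in Python. Several ingredients are assumptions made only for this example: the function name, a single scalar difference variable, a Gaussian draw of width `sigma` standing in for the staging proposal, and a user-supplied function `vbar(r, dr)` playing the role of the averaged potential.

```python
import math
import random


def ratio_estimator(r_old, r_new, vbar, delta_beta, n_aux, sigma, rng):
    """Estimate the ratio of marginal densities as in Eq. (39).

    Difference variables are proposed independently from a Gaussian (a
    stand-in for exp(-V_Delta)) and accepted with the Metropolis test on
    exp(-delta_beta * vbar(r_old, .)); the exponential of the potential
    difference is then averaged over the n_aux auxiliary steps.
    """
    dr = 0.0  # starting point of the auxiliary chain (no equilibration here)
    total = 0.0
    for _ in range(n_aux):
        trial = rng.gauss(0.0, sigma)
        log_acc = -delta_beta * (vbar(r_old, trial) - vbar(r_old, dr))
        if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
            dr = trial
        total += math.exp(-delta_beta * (vbar(r_new, dr) - vbar(r_old, dr)))
    return total / n_aux
```

Each call costs `n_aux` auxiliary steps, so a run with a given number of main moves requires the product of the two counts in total, which is the cost estimate quoted above. A production calculation would also discard an initial equilibration segment of the auxiliary chain, which this sketch omits.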

The computational overhead introduced by the auxiliary Monte Carlo calculation increases the cost of our calculation, but it is very small compared to the number of moves necessary to converge the estimate of Equation (20). The algorithm just described is, in fact, efficient enough to make possible calculations on realistic condensed phase systems with relatively little numerical effort. In particular, the algorithm was used to compute the dynamic structure factor of a model of liquid neon composed of 64 atoms [38]. Details of the calculation can be found in [38]. Here, we show, in Figure 4, our results (red triangles with error bars in the figure) and compare them with experiments (green curve) and the results of a calculation (with the same empirical potential and simulation parameters) performed by Poulsen *et al*. [45] using the linearized approximation for quantum time correlation functions described in the Introduction (see Equation (4) and the discussion above it).

Figure 4. Dynamic structure factor for liquid neon (see the text). The solid green line shows the experimental curve; our results (with error bars) are the red triangles. For comparison, we also report the results obtained with the linearized IVR method by Poulsen *et al*. (see the Introduction) as blue circles.

The results show a rather pronounced asymmetry around zero, due to detailed balance, that indicates the presence of relevant quantum effects in the system. The agreement between our calculations and experiments is very good, as is the agreement with the standard linearized calculation by Poulsen (a state-of-the-art reference in the field). The numerical cost of the two calculations is very similar (about a million Monte Carlo steps in total for initial condition sampling), showing that the auxiliary steps, due to the noisy distributions in our approach, are essentially irrelevant. Indeed, other tests indicate that, depending on the system, the overall cost of our method can be less than that of alternative schemes with comparable or better accuracy. The approach described in this section, for example, has also been used to obtain the infrared spectra of simple models of molecules in the gas phase [46]. Although these systems are quite small, the calculations that we performed are known to pose a considerable challenge to alternative, less rigorous methods, such as Centroid Molecular Dynamics [47] and Ring Polymer Molecular Dynamics [48], which fail to capture the spectra and/or introduce spurious features. In contrast, even though obtaining the exact intensities is quite expensive, our method proved remarkably effective in identifying the positions of the peaks, which could be obtained with only about one hundred Monte Carlo moves.

#### *3.2. L* > *1*

In this subsection, we present a new development of the approach summarized above that extends the use of cumulants to pre-average the phase factors in the expression of the symmetrized correlation function to the case $L > 1$ (this possibility emerged in discussions with M. Monteferrante). For simplicity of notation, in this subsection, we describe how this can be done for $L = 2$, but the steps that we shall use can be generalized to a larger number of segments. In the following, we report the formal result, while the construction and test of an algorithm that generalizes the noisy Monte Carlo scheme described in the previous section will be the object of future work. Let us begin by rewriting the $L = 2$ approximation of the symmetrized correlation function, for diagonal $\hat{A}$, as follows (see Equations (16) and (20) for structure and notation):

$$G\_{A,B}^{(2)}(t;\beta) = \frac{\int d\Gamma\_1 d\Gamma\_0 B\_w(r\_t^{(2)}, p\_t^{(2)}) \left[e^{-\frac{i}{\hbar}p\_1^1 \Delta r\_1^{(\nu-1)}} P(\Gamma\_1; r\_0^n) e^{\frac{i}{\hbar}p\_0^n \Delta r\_1^0}\right] \left[e^{-\frac{i}{\hbar}p\_0^1 \Delta r\_0^{(\nu-1)}} P(\Gamma\_0)\right] A(r\_0)}{\int d\Gamma\_1 d\Gamma\_0 \left[e^{-\frac{i}{\hbar}p\_1^1 \Delta r\_1^{(\nu-1)}} P(\Gamma\_1; r\_0^n) e^{\frac{i}{\hbar}p\_0^n \Delta r\_1^0}\right] \left[e^{-\frac{i}{\hbar}p\_0^1 \Delta r\_0^{(\nu-1)}} P(\Gamma\_0)\right]} \tag{40}$$

In the equation above, $(r\_t^{(2)}, p\_t^{(2)})$ is the endpoint of the propagation obtained by combining the two segments of classical dynamics described in the lower panel of Figure 3; $\Gamma\_0 = \{p\_0^1, r\_0^0, \ldots, r\_0^\nu, \Delta r\_0^1, \ldots, \Delta r\_0^{(\nu-1)}\}$ and $\Gamma\_1 = \{p\_1^1, r\_1^1, \ldots, r\_1^\nu, \Delta r\_1^0, \ldots, \Delta r\_1^{(\nu-1)}\}$ indicate the variables associated with the first and second set of thermal path integrals, respectively (the first set does not include $\Delta r\_0^0$, since this variable can be integrated over for diagonal $\hat{A}$, and the second does not include $r\_1^0 \equiv r\_0^n$, since this is the endpoint of the deterministic classical propagation from zero to $t/2$ in Figure 3). $P(\Gamma\_0)$ was defined in the previous section (see Equation (19)), and:

$$P(\Gamma\_1; r\_0^n) = \rho\_G(p\_1^1)\rho\_m(\mathbf{r}\_1; r\_0^n)\rho\_c(\Delta \mathbf{r}\_1|\mathbf{r}\_1) \tag{41}$$

where $\mathbf{r}\_1 = \{r\_1^1, \ldots, r\_1^\nu\}$ and $\Delta \mathbf{r}\_1 = \{\Delta r\_1^0, \ldots, \Delta r\_1^{(\nu-1)}\}$. The Gaussian probability for the momenta, $\rho\_G(p\_1^1)$, and the marginal, $\rho\_m(\mathbf{r}\_1; r\_0^n)$, and conditional, $\rho\_c(\Delta \mathbf{r}\_1|\mathbf{r}\_1)$, probabilities are defined in analogy with the expressions introduced in Section 3, with the caveat that for $J = 1$ (and, in general, for $J > 0$), the sum involving the potentials in the second line of Equation (22) runs from one to $\nu - 1$. In the marginal probability, we have also indicated the (parametric) dependence of the density on $r\_0^n$, *i.e.*, the endpoint of the classical propagation of the first segment, which corresponds, due to the boundary condition mentioned above, to the first bead of the concatenated semi-sum thermal path. The square brackets in Equation (40) isolate the terms that play the most significant role in the following. The first bracket from the right (corresponding to the $J = 0$ term in the approximation of the correlation function) is the same as the one we encountered in the previous subsection, while the second shows the general structure of the terms involving the phase factors for $J > 0$. As summarized when discussing Figure 3, in this bracket the first phase factor is given by the product of the momentum at the endpoint of the first segment of classical dynamics (*i.e.*, a variable fixed by the classical evolution) and the first difference variable of the second thermal segment. The second phase factor is given by the product of the initial momentum of the second leg of classical dynamics (a variable to be sampled in analogy with $p\_0^1$) and the final difference variable of the thermal path.
As in the L = 1 case, these phase factors do not depend on the semi-sum variables and can be pre-averaged with respect to the conditional probability density. Let us indicate this average as:

$$F(\pi\_J, \mathbf{r}\_J) = \int d\Delta \mathbf{r}\_J e^{-\frac{i}{\hbar} \pi\_J \cdot \delta \mathbf{r}\_J} \rho\_c(\Delta \mathbf{r}\_J | \mathbf{r}\_J) \tag{42}$$

where, for $J = 0$, $\pi\_0 = p\_0^1$ and $\delta \mathbf{r}\_0 = \Delta r\_0^{(\nu-1)}$, while for $J = 1$, and, more generally, for $J > 0$, $\pi\_J = \{-p\_{J-1}^n, p\_J^1\}$ and $\delta \mathbf{r}\_J = \{\Delta r\_J^0, \Delta r\_J^{(\nu-1)}\}$. The equation above is formally identical to Equation (23), with the important difference that, when $J > 0$, the phase is now given by the scalar product of two vectors and can be recognized as the definition of the joint cumulant generating function for the components of $\delta \mathbf{r}\_J$ [40]. Although such joint cumulants are formally more complex, the cumulant moments of $\delta \mathbf{r}\_J$ are still given by the coefficients of the expansion:

$$\ln F(\pi\_J, \mathbf{r}\_J) = \sum\_{|\lambda| \ge 1}^{\infty} \frac{(-i)^{|\lambda|}}{\lambda! \hbar^{|\lambda|}} \pi\_J^{\lambda} C\_{\lambda}(\mathbf{r}\_J) \tag{43}$$

For $J = 0$, the definition above is to be read as identical to Equation (24). For $J = 1$ (and, in general, $J > 0$), $\lambda = \{\lambda\_1, \lambda\_2\}$ is a vector of non-negative integers, $|\lambda|$ is their sum, $\lambda! = \lambda\_1!\lambda\_2!$ and:

$$C\_{\lambda}(\mathbf{r}\_J) = C\_{\lambda\_1, \lambda\_2}(\mathbf{r}\_J) = \left( \frac{\partial^{|\lambda|} \ln F(\pi\_J, \mathbf{r}\_J)}{\partial(-p\_{J-1}^n)^{\lambda\_1} \partial(p\_J^1)^{\lambda\_2}} \right)\_{\pi\_J = 0} \tag{44}$$
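To make the role of the joint cumulants concrete, the following Python sketch compares a second-order truncation of Equation (43) with a direct average of the phase factor in Equation (42) for a two-component difference variable. It is a toy check under explicit assumptions: the samples are correlated Gaussian pairs generated by a hypothetical helper, the density is taken to be even so that second cumulants coincide with second moments about zero, and the second-order coefficients are identified with these moments.

```python
import math
import random


def correlated_gaussian_pairs(n, rng):
    """Hypothetical stand-in for samples of the two difference variables."""
    pairs = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((0.5 * g1, 0.3 * g1 + 0.4 * g2))
    return pairs


def second_moments(pairs):
    """Second moments about zero (equal to cumulants for an even density)."""
    n = len(pairs)
    c20 = sum(x * x for x, _ in pairs) / n
    c11 = sum(x * y for x, y in pairs) / n
    c02 = sum(y * y for _, y in pairs) / n
    return c20, c11, c02


def exponent_second_order(pi, moments, hbar=1.0):
    """-ln F truncated at |lambda| = 2: a quadratic form in the momenta."""
    c20, c11, c02 = moments
    p1, p2 = pi
    return (p1 * p1 * c20 + 2.0 * p1 * p2 * c11 + p2 * p2 * c02) / (2.0 * hbar ** 2)


def exponent_direct(pi, pairs, hbar=1.0):
    """-ln of the sampled phase-factor average (real part, even density)."""
    p1, p2 = pi
    avg = sum(math.cos((p1 * x + p2 * y) / hbar) for x, y in pairs) / len(pairs)
    return -math.log(avg)
```

For Gaussian samples the two exponents agree up to small sampling fluctuations, since all cumulants beyond the second vanish; for a non-Gaussian conditional density the difference would measure the weight of the higher-order terms.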

As in the previous subsection, the conditional distribution density is even with respect to the difference variables, implying that only even terms are non-zero in Equation (43). The function $F(\pi\_J, \mathbf{r}\_J)$ is then real and positive, so we can set $F(\pi\_J, \mathbf{r}\_J) = e^{-E(\pi\_J, \mathbf{r}\_J)}$ and define, in analogy with Equation (26), the probability density:

$$\mathcal{P}(\pi\_J, \mathbf{r}\_J; r\_0^n) = \frac{\rho\_G(p\_J^1)\rho\_m(\mathbf{r}\_J; r\_0^n)e^{-E(\pi\_J, \mathbf{r}\_J)}}{\int dp\_J^1 d\mathbf{r}\_J \rho\_G(p\_J^1)\rho\_m(\mathbf{r}\_J; r\_0^n)e^{-E(\pi\_J, \mathbf{r}\_J)}}\tag{45}$$

Substitution of the definition above in the expression for the symmetrized correlation function shows that we can write the two-segment approximation as the following expectation value:

$$G\_{A,B}^{(2)}(t;\beta) = \langle B\_w(r\_t^{(2)}, p\_t^{(2)})A(r\_0) \rangle\_{\mathcal{P}^{(2)}} \tag{46}$$

where $\mathcal{P}^{(2)} = \mathcal{P}(\pi\_1, \mathbf{r}\_1; r\_0^n)\mathcal{P}(p\_0^1, \mathbf{r}\_0)$ (with a straightforward generalization of the notation adopted here, the $L$-segment approximation of the correlation function can be written as $G\_{A,B}^{(L)}(t; \beta) = \langle B\_w(r\_t^{(L)}, p\_t^{(L)})A(r\_0)\rangle\_{\mathcal{P}^{(L)}}$, where $\mathcal{P}^{(L)} = \prod\_{J=1}^{L-1} \mathcal{P}(\pi\_J, \mathbf{r}\_J; r\_{J-1}^n)\mathcal{P}(p\_0^1, \mathbf{r}\_0)$). The average above presents the same immediate advantage of the $L = 1$ case in that the "observable" does not contain any explicit phase factors. It also presents the same numerical difficulties, given that the probability density contains analytically unknown quantities (the marginal probabilities, $\rho\_m(\mathbf{r}\_J)$, and the cumulants). Although it is possible to construct a generalization of the noisy Monte Carlo scheme described in the previous section, it is not certain that the favorable convergence properties of the auxiliary sampling necessary, in particular, for computing the cumulants (the decisive ingredient in the $L = 1$ case) will be preserved in this more general situation. Developing and testing the most appropriate algorithm for this generalization will be the focus of future work.

#### 4. Conclusions

In this paper, we summarized a recently developed method to approximate symmetrized quantum time correlation functions. The method recasts the problem as the calculation of averages over a stochastic process based on a linearized approximation of the complex time propagators in the correlation function. This approximation can be enforced either on the full length of the evolution (fully linearized approach) or in an iterative form obtained via the (complex) time composition property of the evolution operators. Thanks to the use of a cumulant expansion, which tames the phase factors present in the observable, the fully linearized approach has proven efficient and accurate in calculations on moderately quantum systems in the condensed phase. The iterative form offers, in principle, a way to improve the accuracy of the results with respect to the fully linearized case and may be useful when higher order quantum effects must be taken into account. While the potential for systematic improvement with respect to the fully classical limit for the dynamics is indeed the most interesting feature of the approach (and the one that distinguishes it from other available methods for which there is no way to improve upon the classical or semi-classical approximation), the practical use of the approach for L > 1 is currently hindered by numerical instabilities. In the final section of the paper, we have shown how to extend the use of the cumulant expansion to obtain a formal expression for this case that does not require one to average phase factors in the observable. This expression may be a promising starting point for considerable improvement of the algorithm for more than one segment, and future work will focus on developing and testing an appropriate algorithm.
However, importantly, while numerical evidence on model systems supports the claim that systematic improvements can be obtained by higher order iterations, an exact statement on the convergence properties of the method is lacking, and further investigation is needed to formally assess the features of the scheme.

#### Acknowledgments

The authors are grateful to C. Pierleoni and M. Monteferrante for their substantial contributions to the earlier methods for symmetrized correlation functions summarized in this work. Funding from IIT-SEED grant No 259 "SIMBEDD" is also acknowledged.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: de Carvalho, F.F.; Bouduban, M.E.F.; Curchod, B.F.E.; Tavernelli, I. Nonadiabatic Molecular Dynamics Based on Trajectories. *Entropy* 2014, *16*, 62–85.

*Review*

## Nonadiabatic Molecular Dynamics Based on Trajectories

## Felipe Franco de Carvalho, Marine E. F. Bouduban, Basile F. E. Curchod and Ivano Tavernelli \*

Laboratory of Computational Chemistry and Biochemistry, Ecole Polytechnique Fédérale de Lausanne, CH-1015 Lausanne, Switzerland; E-Mails: felipe.francodecarvalho@epfl.ch (F.F.C.); marine.bouduban@epfl.ch (M.E.F.B.); basile.curchod@epfl.ch (B.F.E.C.)

\* Author to whom correspondence should be addressed; E-Mail: ivano.tavernelli@epfl.ch; Tel.: +41-21-693-03-28.

*Received: 18 September 2013; in revised form: 12 December 2013 / Accepted: 16 December 2013 / Published: 27 December 2013*

Abstract: Performing molecular dynamics in electronically excited states requires the inclusion of nonadiabatic effects to properly describe phenomena beyond the Born-Oppenheimer approximation. This article provides a survey of selected nonadiabatic methods based on quantum or classical trajectories. Among these techniques, trajectory surface hopping constitutes an interesting compromise between accuracy and efficiency for the simulation of medium- to large-scale molecular systems. This approach is, however, based on non-rigorous approximations that could compromise, in some cases, the correct description of the nonadiabatic effects under consideration and hamper a systematic improvement of the theory. With the help of an *in principle* exact description of nonadiabatic dynamics based on Bohmian quantum trajectories, we will investigate the origin of the main approximations in trajectory surface hopping and illustrate some of the limits of this approach by means of a few simple examples.

Keywords: nonadiabatic dynamics; trajectory surface hopping; Ehrenfest dynamics; Bohmian dynamics; Born-Oppenheimer approximation

### 1. Introduction

Traditionally, *ab-initio* molecular dynamics (AIMD) is described within the so-called Born-Oppenheimer approximation, which assumes that the electronic and nuclear dynamics can be adiabatically separated [1], due to a large difference in mass between nuclei and electrons. Within this approximation, one usually solves the time-independent electronic Schrödinger equation for a given nuclear configuration [2] and then computes the quantum mechanical forces acting on the nuclei from the gradient of the corresponding eigenvalues, which depend parametrically on the nuclear coordinates and form the so-called potential energy surfaces (PES). However, in the description of most photophysical and photochemical processes, the electronic and nuclear dynamics become entangled, and therefore, more accurate nonadiabatic molecular dynamics schemes that go beyond the Born-Oppenheimer (BO) approximation are required. The most commonly used *ab initio* nonadiabatic molecular dynamics schemes are those based on the mixed quantum/classical propagation of an ensemble of (quasi-) classical trajectories [3–6], which, to some extent, reproduce the quantum dynamics of the nuclei. These mixed quantum/classical methods are especially popular, because they only require that the necessary electronic structure properties be computed *on-the-fly*, *i.e.*, only at the points in the configuration space visited during the dynamics, therefore making the calculation of the full potential energy surfaces unnecessary. These approaches can be implemented numerically using electronic structure methods, such as Kohn-Sham Density Functional Theory (DFT) [7,8] and its time-dependent version (TDDFT) [9–12] or wavefunction-based approaches, such as Complete Active Space Self-Consistent Field (CASSCF), Multireference Configuration Interaction (MRCI) and Second-Order Approximate Coupled-Cluster (CC2) [13].

Among all nonadiabatic AIMD schemes, Tully's fewest switches trajectory surface hopping [14,15] (TSH) and its extensions to mixed quantum/classical dynamics [16] are probably the most widely used. In the framework of TSH, the nuclear wave packets on the different PESs are described as a swarm of *independent* classical trajectories, while the nonadiabatic couplings induce hops of the trajectories from one electronic state to another; the occurrence of a trajectory hop is governed by the evaluation of a hopping probability, which depends on the temporal evolution of state amplitudes (Tully's coefficients) and on the value of the nonadiabatic couplings.

Alternative schemes have been proposed for the description of the nonadiabatic dynamics of the nuclear degrees of freedom, among which we quote semiclassical approaches [17,18], extended surface hopping [19,20], quantum/classical Liouville approaches [21,22], hydrodynamic nonadiabatic dynamics [23], linearized nonadiabatic dynamics (LAND-map) [24] or correlated electron-ion dynamics methods [25].

Despite the success of the nonadiabatic trajectory-based approaches, there are many quantum mechanical phenomena that cannot be entirely captured within this framework, namely nuclear quantum effects, like wavepacket interference [22], decoherence [26–28] and tunneling. Quantum dynamics methods based on a quantum mechanical representation of both electronic and nuclear degrees of freedom have also become available (see, for example, [29]). However, their high computational cost and the need for a numerical fit of the relevant PESs prior to propagation have limited their application to just a few nuclear degrees of freedom, and they are therefore not yet suited for the simulation of complex molecular systems.

One possible way to account for quantum nuclear effects within a trajectory-based framework consists in the use of quantum (or Bohmian) trajectories [30–32]. This approach emerges from a transformation of the time-dependent Schrödinger equation using a polar representation of the complex nuclear wavefunction (see [33] and Section 2 below). Robert E. Wyatt and coworkers have recently introduced a numerical formulation of Bohmian dynamics using a trajectory-based solution of the so-called quantum hydrodynamics equations [34], named the quantum trajectory method (QTM). In their approach, the spatial support of the nuclear wave packet is split into fluid elements (FEs) that represent volume elements in configuration space carrying quantum information (amplitude and phase). Each of them is propagated according to a Newton-like equation of motion augmented by a nonlocal quantum potential. The latter supplies correlation between the FEs and is, therefore, responsible for most quantum nuclear effects. The QTM approach has been employed to address challenging quantum dynamics problems in low dimensional model systems (see [35–37] for an extended presentation of quantum trajectory methods). Generalizations of QTM for multiple electronic states have also been proposed [38–42]. These are, however, based on a diabatic representation of the PESs. In an attempt to extend this type of dynamics to the investigation of molecular systems, we have recently developed an *in principle* exact QTM approach, named NABDY (nonadiabatic Bohmian dynamics), which solves the non-relativistic quantum dynamics of nuclei and electrons within the framework of quantum hydrodynamics, using the adiabatic representation of the electronic states [43,44].

In this article, we review a number of trajectory-based nonadiabatic molecular dynamics schemes together with our recent work on nonadiabatic Bohmian dynamics. Our aim is to provide a unified picture of the field by trying to "derive" the different approaches starting from a common framework, namely the quantum hydrodynamics reformulation of the molecular time-dependent Schrödinger equation. In particular, we propose a classification of the different trajectory-based approaches based on the choice of the initial expansion of the molecular wavefunction (that depends on both the nuclear and the electronic degrees of freedom) into a sum or a single product of electronic and nuclear wavefunctions. Finally, we propose a rationalization of the TSH equation of motion based on our exact nonadiabatic Bohmian dynamics scheme, showing by means of tests on two simple model systems the origin of some typical failures of TSH.

#### 2. Nonadiabatic Dynamics with Classical and Quantum Trajectories

In this Section, we briefly review the theoretical background of the different nonadiabatic molecular dynamics schemes that we have selected for this study. The selection is based on the fact that all these trajectory-based approaches can be classified according to the way the molecular wavefunction is represented in terms of the electronic and nuclear components.

Starting from the Born-Huang representation of the total molecular wavefunction, we first introduce the Born-Oppenheimer molecular dynamics (BO-MD), which is based on the adiabatic separation of the electronic and nuclear dynamics, the latter being described by a single classical trajectory. Nonadiabaticity is then reintroduced following different strategies. In trajectory surface hopping (TSH), when the classical trajectories enter a region of strong coupling between different PESs, they are allowed to *hop* from one surface to another according to a hopping algorithm designed by Tully [15]. An interesting improvement of this scheme consists in adding Gaussian-expanded nuclear wavefunctions to the propagating trajectories; this approach is named Full Multiple Spawning [45–48] and is characterized by a balance between accuracy and numerical efficiency. Finally, we will describe a trajectory-based approach in which classical trajectories are replaced by Bohmian quantum trajectories that evolve under the influence of quantum adiabatic and nonadiabatic potentials. All these methods make use of the computationally advantageous adiabatic representation of all involved electronic states.

In the second part of this review, we discuss nonadiabatic AIMD approaches that can be derived from a single product ansatz for the total molecular wavefunction. Two of these methods will be investigated, namely the approximated Ehrenfest dynamics and the exact solution, named "Exact Factorization", which has recently been proposed by Gross and coworkers.

We begin by introducing the time-dependent Schrödinger equation (TDSE) for a molecular system, which, neglecting the nuclear and electronic spins, is given by

$$i\hbar\frac{\partial}{\partial t}\Psi(\mathbf{r},\mathbf{R},t) = \hat{H}(\mathbf{r},\mathbf{R})\Psi(\mathbf{r},\mathbf{R},t),\tag{1}$$

where $\Psi(\mathbf{r}, \mathbf{R}, t)$ is the total wavefunction of the system, $\mathbf{r} = (\mathbf{r}\_1, \ldots, \mathbf{r}\_k, \ldots, \mathbf{r}\_{N\_{el}})$ is the collective position vector for the $N\_{el}$ electrons and $\mathbf{R} = (\mathbf{R}\_1, \ldots, \mathbf{R}\_\gamma, \ldots, \mathbf{R}\_{N\_n})$ the corresponding one for the $N\_n$ nuclei of mass $M\_\gamma$. The molecular Hamiltonian can be expressed in the following form

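$$\begin{split} \hat{H}(\mathbf{r}, \mathbf{R}) &= -\sum\_{\gamma} \frac{\hbar^2}{2M\_{\gamma}} \nabla\_{\gamma}^2 - \frac{\hbar^2}{2} \sum\_{k} \nabla\_k^2 + \sum\_{k < l} \frac{1}{|\mathbf{r}\_k - \mathbf{r}\_l|} - \sum\_{\gamma, k} \frac{Z\_{\gamma}}{|\mathbf{R}\_{\gamma} - \mathbf{r}\_k|} + \sum\_{\gamma < \gamma'} \frac{Z\_{\gamma} Z\_{\gamma'}}{|\mathbf{R}\_{\gamma} - \mathbf{R}\_{\gamma'}|} \\ &= -\sum\_{\gamma} \frac{\hbar^2}{2M\_{\gamma}} \nabla\_{\gamma}^2 + \hat{H}\_{el}(\mathbf{r}; \mathbf{R}), \end{split} \tag{2}$$

where $Z\_{\gamma}$ is the charge of nucleus $\gamma$ and $\hat{H}\_{el}(\mathbf{r}; \mathbf{R})$ is the electronic Hamiltonian, which depends parametrically on the nuclear coordinates. In Equation (2) and in the ones that follow, atomic units are used, except for the reduced Planck constant, $\hbar$, which is kept for clarity.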

#### *2.1. Methods Based on the Born-Huang Expansion*

The Born-Huang expansion gives an exact expression for the total wavefunction [49,50]

$$
\Psi(r, \mathbf{R}, t) = \sum\_{i}^{\infty} \Omega\_i(\mathbf{R}, t) \Phi\_i(r; \mathbf{R}). \tag{3}
$$

The total wavefunction, $\Psi(\mathbf{r}, \mathbf{R}, t)$, is expanded in the complete basis set of electronic eigenfunctions of $\hat{H}\_{el}(\mathbf{r}; \mathbf{R})$, which depend parametrically on the nuclear positions, $\mathbf{R}$. The expansion "coefficients", $\Omega\_i(\mathbf{R}, t)$, are functions of the nuclear coordinates, $\mathbf{R}$, and are explicitly dependent on time. Inserting Equation (3) into the TDSE, multiplying by $\Phi\_j^{\*}(\mathbf{r}; \mathbf{R})$ and then integrating over $\mathbf{r}$ gives the equation of motion for the amplitudes, $\Omega\_j(\mathbf{R}, t)$

$$i\hbar\frac{\partial}{\partial t}\Omega\_j(\mathbf{R},t) = \left[-\sum\_{\gamma}\frac{\hbar^2}{2M\_{\gamma}}\nabla\_{\gamma}^2 + E\_j^{el}(\mathbf{R})\right]\Omega\_j(\mathbf{R},t) + \sum\_{i}^{\infty}\mathcal{F}\_{ji}(\mathbf{R})\Omega\_i(\mathbf{R},t),\tag{4}$$

where the $\mathcal{F}\_{ji}(\mathbf{R})$ are the elements of the nonadiabatic coupling matrix

$$\begin{split} \mathcal{F}\_{ji}(\mathbf{R}) &= \int d\mathbf{r} \, \Phi\_{j}^{\*}(\mathbf{r}; \mathbf{R}) \left[ -\sum\_{\gamma} \frac{\hbar^{2}}{2M\_{\gamma}} \nabla\_{\gamma}^{2} \right] \Phi\_{i}(\mathbf{r}; \mathbf{R}) \\ &+ \sum\_{\gamma} \frac{1}{M\_{\gamma}} \left\{ \int d\mathbf{r} \, \Phi\_{j}^{\*}(\mathbf{r}; \mathbf{R}) \left[ -i\hbar \nabla\_{\gamma} \right] \Phi\_{i}(\mathbf{r}; \mathbf{R}) \right\} \left[ -i\hbar \nabla\_{\gamma} \right]. \end{split} \tag{5}$$

These terms induce nonadiabatic coupling between different electronic states (for $i \neq j$) due to nuclear motion. Equation (4) can be interpreted as a Schrödinger-like equation for "nuclear" wavefunctions $\Omega\_j(\mathbf{R}, t)$ augmented by nonadiabatic coupling terms. In fact, the amplitudes $\Omega\_j(\mathbf{R}, t)$ can be interpreted as nuclear wavefunctions in state $j$ only when the coupling terms vanish.

#### 2.1.1. The Born-Oppenheimer Approximation and Adiabatic Dynamics

The BO approximation consists in neglecting all off-diagonal terms $\mathcal{F}\_{ji}(\mathbf{R})$ in Equation (4) (*i.e.*, neglecting inter-state couplings, but keeping intra-state electronic-nuclear couplings). The molecular wavefunction on each PES is therefore represented by the simple product $\Psi(\mathbf{r}, \mathbf{R}, t) = \Omega\_j(\mathbf{R}, t)\Phi\_j(\mathbf{r}; \mathbf{R})$. If the diagonal terms $\mathcal{F}\_{jj}(\mathbf{R})$ are also neglected, then we obtain what is usually called the adiabatic BO approximation [51]. Introducing the polar representation of $\Omega\_j(\mathbf{R}, t)$, we obtain

$$
\Omega\_j(\mathbf{R}, t) = A\_j(\mathbf{R}, t) \exp\left[\frac{i}{\hbar} S\_j(\mathbf{R}, t)\right],\tag{6}
$$

where both the amplitude, $A\_j(\mathbf{R}, t)$, and the phase, $S\_j(\mathbf{R}, t)$, are real. By inserting Equation (6) into Equation (4) and separating the real and imaginary parts, we obtain, within the adiabatic BO approximation, two separate, but coupled, equations for the phase and the amplitude

$$\frac{\partial S\_j(\mathbf{R}, t)}{\partial t} = \frac{\hbar^2}{2} \sum\_{\gamma} M\_{\gamma}^{-1} \frac{\nabla\_{\gamma}^2 A\_j(\mathbf{R}, t)}{A\_j(\mathbf{R}, t)} - \frac{1}{2} \sum\_{\gamma} M\_{\gamma}^{-1} \left(\nabla\_{\gamma} S\_j(\mathbf{R}, t)\right)^2 - E\_j^{el}(\mathbf{R}), \tag{7}$$

$$\frac{\partial A\_j(\mathbf{R}, t)}{\partial t} = -\sum\_{\gamma} M\_{\gamma}^{-1} \nabla\_{\gamma} A\_j(\mathbf{R}, t) \cdot \nabla\_{\gamma} S\_j(\mathbf{R}, t) - \frac{1}{2} \sum\_{\gamma} M\_{\gamma}^{-1} A\_j(\mathbf{R}, t) \nabla\_{\gamma}^2 S\_j(\mathbf{R}, t). \tag{8}$$

Taking the so-called classical limit $\hbar \rightarrow 0$ in Equation (7) leads to something akin to the classical Hamilton-Jacobi equation of motion

$$\frac{\partial S\_j(\mathbf{R}, t)}{\partial t} = -\frac{1}{2} \sum\_{\gamma} M\_{\gamma}^{-1} \left( \nabla\_{\gamma} S\_j(\mathbf{R}, t) \right)^2 - E\_j^{el}(\mathbf{R}), \tag{9}$$

where $S\_j(\mathbf{R}, t)$ can now be interpreted as the classical Hamilton's principal function. In this case, $\nabla\_{\gamma} S\_j(\mathbf{R}, t)$ corresponds to the nuclear momentum $\mathbf{p}\_j^{\gamma}(t)$. By rearranging Equation (9), we finally obtain the Newtonian equation of motion for the nuclei

$$M\_{\gamma} \ddot{\mathbf{R}}\_{j}^{\gamma}(t) = -\nabla\_{\gamma} E\_{j}^{el}(\mathbf{R}(t)). \tag{10}$$

The nuclei therefore evolve on a given potential energy surface, $E\_j^{el}(\mathbf{R}(t))$ (selected by the initial conditions), while the electrons adiabatically follow the nuclei along their classical trajectories $\mathbf{R}(t)$. Equation (8) represents a continuity equation for the nuclear amplitudes, $A\_j(\mathbf{R}, t)$, in an arbitrary state $j$, which, in the classical limit, is trivially fulfilled because of the conservation of the number of trajectories. The BO-MD method therefore consists in solving the time-independent electronic Schrödinger equation to get the potential and the forces acting on the nuclei; these are then used to propagate the nuclei for a time step $dt$ using Equation (10), and the process is iterated until the desired propagation time is reached.
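The BO-MD loop just described can be sketched in a few lines. The example below is only an illustration under stated assumptions: the electronic structure calculation is replaced by a hypothetical one-dimensional Morse potential (`morse_energy_and_force`, with arbitrary parameters), and the velocity-Verlet integrator is one common choice for solving Equation (10), not a scheme prescribed by the text.

```python
import math

def morse_energy_and_force(r, d_e=0.17, a=1.0, r_e=1.4):
    """Hypothetical stand-in for the electronic structure step: returns
    E_el(R) and the force -dE_el/dR for a 1D Morse potential (atomic units)."""
    x = math.exp(-a * (r - r_e))
    energy = d_e * (1.0 - x) ** 2
    force = -2.0 * d_e * a * x * (1.0 - x)
    return energy, force

def bo_md(r0, v0, mass, dt, n_steps, surface=morse_energy_and_force):
    """Velocity-Verlet integration of M * R'' = -dE_el/dR (Equation (10))."""
    r, v = r0, v0
    energy, force = surface(r)
    trajectory = [(0.0, r, v, energy + 0.5 * mass * v * v)]
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * force / mass
        r = r + dt * v_half
        energy, force = surface(r)  # new "electronic structure" call at the updated geometry
        v = v_half + 0.5 * dt * force / mass
        trajectory.append((step * dt, r, v, energy + 0.5 * mass * v * v))
    return trajectory

# A proton-like mass starting at rest on the outer wall of the well.
traj = bo_md(r0=1.6, v0=0.0, mass=1836.0, dt=1.0, n_steps=2000)
energies = [point[3] for point in traj]
drift = max(energies) - min(energies)
```

In an actual on-the-fly simulation the call to `surface` would be replaced by an electronic structure calculation at the current geometry; the spread of the total energy (`drift`) is a convenient check on the chosen time step.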

#### 2.1.2. Tully's Trajectory Surface Hopping

One of the most successful methods for nonadiabatic dynamics is Tully's trajectory surface hopping [14,15]. In this method, the nuclei are treated classically, and the only nuclear quantum effect that is accounted for is the nonadiabatic transfer of "amplitude" between electronic states. This is achieved classically through *hops* of trajectories from one electronic state to another according to a hopping probability determined by the strength of the nonadiabatic couplings and the values of the state amplitudes $C\_i^{[\alpha]}(t)$ defined below. A swarm of trajectories needs to be propagated in order to reproduce the probability distribution associated with the corresponding nuclear quantum wave packet.

In this section, we only give a brief introduction to TSH, while a more detailed description of the method is given in Section 3, where we attempt a "derivation" of TSH, starting from the nonadiabatic Bohmian dynamics equations of motion.

The main ansatz in TSH is given by the following description of the molecular wavefunction [5,15]

$$\Psi^{[\alpha]}(r,\mathbf{R},t) = \sum\_{i}^{\infty} C\_i^{[\alpha]}(t) \Phi\_i^{[\alpha]}(r;\mathbf{R}),\tag{11}$$

which, in a way, constitutes a simplified version of the original Born-Huang expansion. When we introduce Equation (11) into the electronic time-dependent Schrödinger equation, we get a set of coupled equations of motion for the complex nuclear state amplitudes, $C\_j^{[\alpha]}(t)$ (for trajectory $\alpha$)

$$i\hbar \dot{C}\_j^{[\alpha]}(t) = \sum\_i^{\infty} C\_i^{[\alpha]}(t) \left( E\_i^{el}(\mathbf{R}^{[\alpha]}) \delta\_{ij} - i\hbar \sum\_{\gamma}^{N\_n} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}^{[\alpha]}) \cdot \dot{\mathbf{R}}\_{\gamma}^{[\alpha]} \right), \tag{12}$$

where $\mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) = \int d\mathbf{r}\, \Phi\_j^{\*}(\mathbf{r}; \mathbf{R}) \nabla\_{\gamma} \Phi\_i(\mathbf{r}; \mathbf{R})$ are the first-order nonadiabatic couplings (see Equation (5)). These coupled equations will be solved along a classical trajectory $\alpha$, evolving *adiabatically* in a given electronic state $j$. The probability, $g\_{ji}^{[\alpha]}(t, t + dt)$, for the trajectory $\alpha$ to jump from state $j$ to state $i$ during the time interval $[t, t + dt]$ is given by

$$g\_{ji}^{[\alpha]}(t, t+dt) = 2\int\_{t}^{t+dt} d\tau \frac{-\Re[C\_i^{[\alpha]}(\tau)C\_j^{[\alpha]\*}(\tau)\Xi\_{ij}^{[\alpha]}(\tau)]}{C\_j^{[\alpha]}(\tau)C\_j^{[\alpha]\*}(\tau)},\tag{13}$$

where $\Xi\_{ij}^{[\alpha]}(\tau) = \sum\_{\gamma}^{N\_n} \mathbf{d}\_{ij}^{\gamma}(\mathbf{R}^{[\alpha]}) \cdot \dot{\mathbf{R}}\_{\gamma}^{[\alpha]}$.
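For a short time step, the integral in Equation (13) is often approximated by its integrand times $dt$. The following minimal sketch does exactly that; the function name, the scalar treatment of the coupling term and the clamping of negative values to zero are our assumptions (the clamping is customary in fewest-switches implementations but is not stated above).

```python
def hop_probability(c_current, c_target, xi, dt):
    """One-step (rectangle-rule) estimate of Equation (13):
    g_ji ~ 2 * dt * (-Re[C_i * conj(C_j) * Xi_ij]) / |C_j|^2,
    with j the current state and i the target state."""
    population = (c_current * c_current.conjugate()).real
    flux = -(c_target * c_current.conjugate() * xi).real
    # Assumption: negative values are clamped to zero, so that a hop can
    # only occur when population flows out of the current state.
    return max(0.0, 2.0 * dt * flux / population)

# Hypothetical amplitudes and coupling term (atomic units).
g_out = hop_probability(complex(0.8, 0.0), complex(0.6, 0.0), xi=-0.05, dt=0.5)
g_in = hop_probability(complex(0.8, 0.0), complex(0.6, 0.0), xi=0.05, dt=0.5)
```

With these illustrative numbers the flux term is $0.024$, so `g_out` evaluates to $2 \times 0.5 \times 0.024 / 0.64 = 0.0375$, while reversing the sign of the coupling term gives a vanishing probability.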

A surface hop between two PESs, $j$ and $i$ ($j \rightarrow i$), occurs "stochastically" when, for a randomly generated number, $\zeta \in [0, 1]$, we get:

$$\sum\_{k \le i-1} g\_{jk}^{[\alpha]} < \zeta < \sum\_{k \le i} g\_{jk}^{[\alpha]}.\tag{14}$$

This algorithm guarantees that a minimum number of hops is performed along each trajectory; for this reason, the method is also referred to as the "fewest switches algorithm".
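The selection rule of Equation (14) amounts to locating the random number $\zeta$ within the cumulative sum of the hopping probabilities. A minimal sketch is given below; the function name and the list-based interface are hypothetical, and the probabilities in the usage example are arbitrary numbers chosen for illustration.

```python
import random

def select_hop(current, probabilities, zeta):
    """Return the target state i satisfying Equation (14), or `current`
    if no hop occurs. `probabilities[k]` holds g_jk for the current state j;
    the entry for k == j is ignored."""
    cumulative = 0.0
    for k, g in enumerate(probabilities):
        if k == current:
            continue
        lower = cumulative
        cumulative += g
        if lower < zeta < cumulative:
            return k
    return current

# Seeded demonstration with hypothetical probabilities for current state 0.
rng = random.Random(0)
g_row = [0.0, 0.1, 0.3]
targets = [select_hop(0, g_row, rng.random()) for _ in range(20000)]
fractions = [targets.count(k) / len(targets) for k in range(3)]
```

Over many draws, the fraction of hops to each state approaches the corresponding $g\_{jk}$, and the trajectory stays on its current surface otherwise.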

#### 2.1.3. Full Multiple Spawning

Full Multiple Spawning (FMS) [45–48] proposes an interesting compromise between accuracy and efficiency by representing nuclear wavefunctions as sums of time-dependent Gaussian basis functions, whose width is frozen and whose center evolves adiabatically according to classical mechanics. This ansatz on the classical evolution of the Gaussian centers is consistently applied throughout the full derivation of the FMS equations of motion.

In the FMS method, the nuclear wavefunction $\Omega\_i(\mathbf{R}, t)$ in electronic state $i$ is represented by a linear combination of multidimensional Gaussian wave packets $\Omega\_J^i(\mathbf{R}; \overline{\mathbf{R}}\_J^i(t), \overline{\mathbf{p}}\_J^i(t), \overline{\gamma}\_J^i(t), \boldsymbol{\alpha}\_J^i)$, which are products of one-dimensional Gaussian functions [52]

$$\begin{split} \Omega\_{i}(\mathbf{R},t) &= \sum\_{J=1}^{N\_{i}(t)} C\_{J}^{i}(t) \Omega\_{J}^{i}(\mathbf{R}; \overline{\mathbf{R}}\_{J}^{i}(t), \overline{\mathbf{p}}\_{J}^{i}(t), \overline{\gamma}\_{J}^{i}(t), \boldsymbol{\alpha}\_{J}^{i}) \\ &= \sum\_{J=1}^{N\_{i}(t)} C\_{J}^{i}(t) \left[ e^{i\overline{\gamma}\_{J}^{i}(t)} \prod\_{\rho} \left( \frac{2\alpha\_{\rho\_{J}}^{i}}{\pi} \right)^{1/4} e^{-\alpha\_{\rho\_{J}}^{i} \left( R\_{\rho} - \overline{R}\_{\rho\_{J}}^{i}(t) \right)^{2} + i\overline{p}\_{\rho\_{J}}^{i}(t) \left( R\_{\rho} - \overline{R}\_{\rho\_{J}}^{i}(t) \right)} \right]. \end{split} \tag{15}$$

In Equation (15), the multidimensional Gaussian basis functions are labeled with index $J$, their time-independent width by $\boldsymbol{\alpha}\_J^i$ and their time-dependent position, momentum and nuclear phase by $\overline{\mathbf{R}}\_J^i(t)$, $\overline{\mathbf{p}}\_J^i(t)$ and $\overline{\gamma}\_J^i(t)$, respectively. $N\_i(t)$ gives the number of Gaussian basis functions used in order to describe the nuclear wavefunction in electronic state $i$, and its time dependence comes from the possible "spawning" of new basis functions (as further discussed below). The nuclear phases are propagated semi-classically, whereas the positions and momenta of the centers of the Gaussians obey classical equations of motion in a given electronic state [52].
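To make the role of the Gaussian parameters more concrete, the sketch below evaluates a single one-dimensional basis function and estimates overlap integrals numerically. It assumes the customary frozen-Gaussian form $e^{i\overline{\gamma}} (2\alpha/\pi)^{1/4} \exp[-\alpha (R - \overline{R})^2 + i \overline{p} (R - \overline{R})]$; the function names, the integration grid and all parameter values are hypothetical.

```python
import cmath
import math

def frozen_gaussian(r, center, momentum, phase, alpha):
    """One-dimensional frozen Gaussian basis function (assumed form)."""
    dx = r - center
    prefactor = (2.0 * alpha / math.pi) ** 0.25
    return prefactor * cmath.exp(1j * phase - alpha * dx * dx + 1j * momentum * dx)

def overlap(g1, g2, lo=-20.0, hi=20.0, n=20001):
    """Trapezoidal estimate of <g1|g2>, the integral of conj(g1(R)) * g2(R) dR,
    i.e. one element of an overlap matrix between basis functions."""
    h = (hi - lo) / (n - 1)
    total = 0.0 + 0.0j
    for k in range(n):
        r = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * g1(r).conjugate() * g2(r)
    return total * h

# A "parent" function and a displaced "child" with the same width and momentum.
parent = lambda r: frozen_gaussian(r, center=0.0, momentum=2.0, phase=0.3, alpha=1.5)
child = lambda r: frozen_gaussian(r, center=0.4, momentum=2.0, phase=0.0, alpha=1.5)
norm = overlap(parent, parent)
s_pc = overlap(parent, child)
```

Each basis function is normalized, while two functions of equal width and momentum displaced by $\Delta R$ have an overlap of modulus $\exp(-\alpha \Delta R^2 / 2)$. The basis is therefore non-orthogonal, which is why overlap matrices enter the equations of motion for the expansion coefficients.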

The time-evolution of the expansion coefficients $\mathbf{C}^i(t)$ is obtained through the solution of the following differential equation:

$$\frac{d\mathbf{C}^i(t)}{dt} = -i(\mathbf{S}\_{ii}^{-1})\left\{ \left[ \mathbf{H}\_{ii} - i\dot{\mathbf{S}}\_{ii} \right] \mathbf{C}^i + \sum\_{j \neq i} \mathbf{H}\_{ij} \mathbf{C}^j \right\} \tag{16}$$

This equation is derived by plugging the Born-Huang expansion (Equation (3)) and the ansatz for the nuclear wavefunctions (Equation (15)) into the time-dependent Schrödinger equation. In Equation (16) the bold symbols emphasize that, for each electronic state $i$, there is one time-dependent coefficient for each Gaussian basis function, and $\mathbf{S}\_{ii}$ and $\dot{\mathbf{S}}\_{ii}$ represent different overlap matrices of the Gaussian functions (see [52] for more details). The matrix elements of $\mathbf{H}\_{ij}$ are given by:

$$\begin{split} (\mathbf{H}\_{ij})\_{KK'} = H\_{iKjK'} &= \langle \Omega\_K^i \Phi\_i | \hat{\mathcal{H}}\_{el} + \hat{T}\_{\mathbf{R}} | \Omega\_{K'}^j \Phi\_j \rangle \\ &= \langle \Omega\_K^i | \mathcal{H}\_{el}^{ij} | \Omega\_{K'}^j \rangle\_{\mathbf{R}} + 2D\_{iKjK'} + G\_{iKjK'} \end{split} \tag{17}$$

where $\hat{\mathcal{H}}\_{el}$ is the electronic Hamiltonian and $\hat{T}\_{\mathbf{R}}$ the kinetic energy operator for the nuclei. In Equation (17), $D\_{iKjK'} = \langle \Omega\_K^i | \sum\_{\gamma}^{N\_n} \frac{1}{2M\_{\gamma}} \langle \Phi\_i | \nabla\_{\gamma} | \Phi\_j \rangle\_{\mathbf{r}} \cdot \nabla\_{\gamma} | \Omega\_{K'}^j \rangle\_{\mathbf{R}}$ and $G\_{iKjK'} = \langle \Omega\_K^i | \sum\_{\gamma}^{N\_n} \frac{1}{2M\_{\gamma}} \langle \Phi\_i | \nabla\_{\gamma}^2 | \Phi\_j \rangle\_{\mathbf{r}} | \Omega\_{K'}^j \rangle\_{\mathbf{R}}$ couple the electronic and nuclear motions ($\langle \cdots \rangle\_{\mathbf{R}}$ means integration over $\mathbf{R}$ and $\langle \cdots \rangle\_{\mathbf{r}}$ integration over $\mathbf{r}$).

The spawning procedure takes place when a region of nonadiabaticity is detected along a trajectory (by monitoring the strength of the nonadiabatic couplings in the adiabatic representation). It generates new Gaussian basis functions (children), which are placed in the newly populated electronic state according to physical rules (like position or momentum conservation [52]) that maximize the coupling between the parent and children Gaussian basis functions [52]; spawning remains possible until the system leaves the region of strong nonadiabatic coupling [53].

The spawning procedure, therefore, limits the number of Gaussian basis functions used in the calculation by defining precisely where and when they are needed. Moreover, the FMS method offers a numerically exact [48] solution when all matrix elements are computed exactly and a complete Gaussian basis set is used.

While keeping a trajectory-based formalism, FMS fully incorporates nuclear quantum effects that are missing in methods like TSH. Furthermore, the nuclear propagation can be performed on-the-fly, by computing any electronic structure property needed, like electronic energies ($\mathcal{H}\_{el}^{ii}$ in Equation (17)) or nonadiabatic couplings ($\langle \Phi\_i | \nabla\_{\mathbf{R}} | \Phi\_j \rangle\_{\mathbf{r}}$ and $\langle \Phi\_i | \nabla\_{\mathbf{R}}^2 | \Phi\_j \rangle\_{\mathbf{r}}$; note that the $G\_{iKjK'}$ terms are normally small and usually neglected), with either *ab initio* or semiempirical electronic structure methods (*Ab Initio* Multiple Spawning, AIMS [54]). AIMS, therefore, overcomes the limitations in accuracy of TSH, preserving efficiency all the while. For additional information about the derivation and numerical procedure of this method, the interested reader is referred to [52].

#### 2.1.4. Nonadiabatic Bohmian Dynamics

Just as for the previous three methods, nonadiabatic Bohmian dynamics (NABDY) is also based on the propagation of trajectories. However, this time, the trajectories evolve under the action of additional quantum potentials (adiabatic and nonadiabatic), which make the dynamics exact in principle. In other words, this approach is able to capture all adiabatic and nonadiabatic nuclear quantum effects through the propagation of a sufficiently large (*i.e.*, converged) number of trajectories.

The derivation of the NABDY equations of motion starts from the insertion of the polar representation of the nuclear wavefunction in Equation (6) into Equation (4). After separation of the real and imaginary parts, we obtain

$$\begin{split}-\frac{\partial S\_{j}(\mathbf{R},t)}{\partial t} &= \sum\_{\gamma} \frac{1}{2M\_{\gamma}} \left(\nabla\_{\gamma} S\_{j}(\mathbf{R},t)\right)^{2} + E\_{j}^{el}(\mathbf{R}) - \sum\_{\gamma} \frac{\hbar^{2}}{2M\_{\gamma}} \frac{\nabla\_{\gamma}^{2} A\_{j}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \\ &+ \sum\_{\gamma i} \frac{\hbar^{2}}{2M\_{\gamma}} D\_{ji}^{\gamma}(\mathbf{R}) \frac{A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \Re\left[e^{i\phi\_{ij}(\mathbf{R},t)}\right] - \sum\_{\gamma,i\neq j} \frac{\hbar^{2}}{M\_{\gamma}} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \frac{\nabla\_{\gamma} A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \Re\left[e^{i\phi\_{ij}(\mathbf{R},t)}\right] \\ &+ \sum\_{\gamma,i\neq j} \frac{\hbar}{M\_{\gamma}} \frac{A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} S\_{i}(\mathbf{R},t) \Im\left[e^{i\phi\_{ij}(\mathbf{R},t)}\right], \end{split} \tag{18}$$

and

$$\begin{split} \hbar \frac{\partial A\_{j}(\mathbf{R},t)}{\partial t} &= -\sum\_{\gamma} \frac{\hbar}{M\_{\gamma}} \nabla\_{\gamma} A\_{j}(\mathbf{R},t) \cdot \nabla\_{\gamma} S\_{j}(\mathbf{R},t) - \sum\_{\gamma} \frac{\hbar}{2M\_{\gamma}} A\_{j}(\mathbf{R},t) \nabla\_{\gamma}^{2} S\_{j}(\mathbf{R},t) \\ &+ \sum\_{\gamma i} \frac{\hbar^{2}}{2M\_{\gamma}} D\_{ji}^{\gamma}(\mathbf{R}) A\_{i}(\mathbf{R},t) \Im \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right] - \sum\_{\gamma,i \neq j} \frac{\hbar^{2}}{M\_{\gamma}} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} A\_{i}(\mathbf{R},t) \Im \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right] \\ &- \sum\_{\gamma,i \neq j} \frac{\hbar}{M\_{\gamma}} A\_{i}(\mathbf{R},t) \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} S\_{i}(\mathbf{R},t) \Re \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right], \end{split} \tag{19}$$

where $\phi\_{ij}(\mathbf{R}, t) = \frac{1}{\hbar}(S\_i(\mathbf{R}, t) - S\_j(\mathbf{R}, t))$ and $D\_{ji}^{\gamma}(\mathbf{R})$ are the second-order nonadiabatic couplings. Equation (18) is equivalent to the classical Hamilton-Jacobi equation augmented by terms that are $\mathcal{O}(\hbar)$ and $\mathcal{O}(\hbar^2)$. The third term, $Q(\mathbf{R}, t)$, on the right-hand side of Equation (18) is called the quantum potential, and it includes all adiabatic quantum effects (adiabatic in the sense that the potential $Q(\mathbf{R}, t)$ acts on a single PES and does not include contributions from other surfaces). Unlike "classical" potentials, it is non-local in space, in the sense that it depends on the positions of all particles in configuration space [55]. The last three terms on the right-hand side of Equation (18) describe inter-state nonadiabatic quantum effects and, like the quantum potential, $Q(\mathbf{R}, t)$, do not have a classical equivalent.
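As a simple illustration of the quantum potential, $Q = -\frac{\hbar^2}{2M} \nabla^2 A / A$ for a single nuclear degree of freedom, the sketch below evaluates it by finite differences for a Gaussian amplitude. The function name, the step size and the choice of amplitude are assumptions made for this example only.

```python
import math

def quantum_potential(amplitude, x, mass, hbar=1.0, h=1e-3):
    """Q(x) = -(hbar^2 / 2M) * A''(x) / A(x), with A'' obtained from a
    central finite difference (one nuclear degree of freedom)."""
    second = (amplitude(x + h) - 2.0 * amplitude(x) + amplitude(x - h)) / (h * h)
    return -hbar * hbar / (2.0 * mass) * second / amplitude(x)

# Hypothetical Gaussian amplitude A(x) = exp(-a x^2) with a = 0.5 and M = 1.
a, mass = 0.5, 1.0
amp = lambda x: math.exp(-a * x * x)
values = [quantum_potential(amp, x, mass) for x in (0.0, 1.0, 2.0)]

# Rescaling the amplitude leaves Q unchanged: only the shape of A matters.
scaled = quantum_potential(lambda x: 10.0 * amp(x), 1.5, mass)
reference = quantum_potential(amp, 1.5, mass)
```

For this amplitude the analytic result is $Q(x) = -\frac{\hbar^2}{2M}(4a^2 x^2 - 2a)$, i.e., an inverted parabola that pushes the fluid elements apart and thereby reproduces wave packet spreading. The invariance under a rescaling of $A$ shows that $Q$ depends on the shape of the amplitude rather than on its magnitude.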

Equation (19) represents a continuity equation for the probability density, $|A\_j(\mathbf{R}, t)|^2$, with corresponding probability density flux $\mathbf{J}(\mathbf{R}, t)$ [33,35,44]. The first two terms on the right-hand side describe the "adiabatic" probability density flow within a given state, $j$, while the remaining terms that depend on the first-order and second-order nonadiabatic couplings induce probability density exchanges across different electronic states. Of course, the overall nuclear amplitude (summed up over all states) is conserved.

The two equations for the phases and the amplitudes are coupled, and they therefore need to be solved simultaneously. Instead of solving complex differential equations for the two fields, $A\_j(\mathbf{R}, t)$ and $S\_j(\mathbf{R}, t)$, we reintroduce trajectories in configuration space that drive the dynamics of "infinitesimal" volume elements called "fluid elements" [35]. The derivation of the equations of motion is similar to that described in Section 2.1.1 for the BO approach, with the important difference that in NABDY new fluid elements can be created at any time on any other PES according to the size of the nonadiabatic terms in Equation (19). The details of the numerical implementation of NABDY are given in [44], while a possible extension of NABDY to large dimensions (in the adiabatic case) is proposed in [56].

#### *2.2. Methods Based on a Single Product Ansatz*

#### 2.2.1. Ehrenfest Dynamics

The equation of motion that drives Ehrenfest dynamics (EHD) is derived from a simpler ansatz for the total wavefunction than the Born-Huang expansion (Equation (3)) used for the methods presented in Section 2.1.

In EHD, the molecular wavefunction is described by the simple product

$$\Psi(r,\mathbf{R},t) = \Phi(r,t)\Omega(\mathbf{R},t)\exp\left[\frac{i}{\hbar}\int\_{t\_0}^{t}dt'\,E\_{el}(t')\right] \tag{20}$$

where Φ(*r*, t) is the electronic wavefunction and Ω(*R*, t) is the nuclear wavefunction. Note that in this case, both amplitudes Φ(*r*, t) and Ω(*R*, t) depend explicitly on time. In addition, they also have a parametric dependence on the other set of coordinates (Φ(*r*, t) on *R* and Ω(*R*, t) on *r*), which is not explicitly shown, so as to simplify the notation.

The exponential in Equation (20) is named the phase term, in which $E\_{el}(t)$ is defined as

$$E\_{el}(t) = \int\int d\mathbf{r} \, d\mathbf{R} \, \Phi^\*(\mathbf{r}, t) \Omega^\*(\mathbf{R}, t) \hat{\mathcal{H}}\_{el}(\mathbf{r}, \mathbf{R}) \Phi(\mathbf{r}, t) \Omega(\mathbf{R}, t) \tag{21}$$

and it guarantees that the product wavefunction, Ψ(*r*, *R*, t), fulfills the corresponding time-dependent Schrödinger equation.

Following the derivation proposed by Tully [57], we can substitute Equation (20) into the TDSE, multiply by $\Omega^{\*}(\mathbf{R}, t)$ and integrate over $\mathbf{R}$; assuming that $\Omega(\mathbf{R}, t)$ is normalized, we finally obtain

$$\begin{split} i\hbar\frac{\partial\Phi(\mathbf{r},t)}{\partial t} &= -\frac{\hbar^2}{2}\sum\_{k}\nabla\_k^2\Phi(\mathbf{r},t) \\ &+ \left\{\int d\mathbf{R}\,\,\Omega^\*(\mathbf{R},t)\left[-\frac{\hbar^2}{2}\sum\_{\gamma}M\_{\gamma}^{-1}\nabla\_\gamma^2+\hat{V}(\mathbf{r},\mathbf{R})\right]\Omega(\mathbf{R},t)\right\}\Phi(\mathbf{r},t) \\ &+ E\_{el}(t)\Phi(\mathbf{r},t)-i\hbar\left[\int d\mathbf{R}\,\,\Omega^\*(\mathbf{R},t)\frac{\partial\Omega(\mathbf{R},t)}{\partial t}\right]\Phi(\mathbf{r},t), \end{split} \tag{22}$$

where $\hat{V}(\mathbf{r}, \mathbf{R})$ is the sum of all the potential energy terms in the molecular Hamiltonian.

Applying an analogous procedure, we can also derive the equation of motion for Ω(*R*, t)

$$\begin{split} i\hbar \frac{\partial \Omega(\mathbf{R}, t)}{\partial t} &= -\frac{\hbar^2}{2} \sum\_{\gamma} M\_{\gamma}^{-1} \nabla\_{\gamma}^2 \Omega(\mathbf{R}, t) + \left\{ \int d\mathbf{r} \, \Phi^\*(\mathbf{r}, t) \left[ \hat{\mathcal{H}}\_{el}(\mathbf{r}, \mathbf{R}) \right] \Phi(\mathbf{r}, t) \right\} \Omega(\mathbf{R}, t) \\ &+ E\_{el}(t) \Omega(\mathbf{R}, t) - i\hbar \left[ \int d\mathbf{r} \, \Phi^\*(\mathbf{r}, t) \frac{\partial \Phi(\mathbf{r}, t)}{\partial t} \right] \Omega(\mathbf{R}, t). \end{split} \tag{23}$$

Using the relations [57]

$$\int d\mathbf{r} \, \Phi^\*(\mathbf{r}, t) \frac{\partial \Phi(\mathbf{r}, t)}{\partial t} = E\_{el}(t) \tag{24}$$

and

$$\int d\mathbf{R} \,\Omega^\*(\mathbf{R}, t) \frac{\partial \Omega(\mathbf{R}, t)}{\partial t} = E \tag{25}$$

where $E$ is the expectation value of the molecular Hamiltonian for the wavefunction appearing in Equation (20), and the fact that $E = E\_{el}(t) + T\_{\mathbf{R}}$ ($T\_{\mathbf{R}}$ is the expectation value of the nuclear kinetic energy), we can further simplify Equations (22) and (23) and obtain the following differential equations for the two amplitudes

$$i\hbar\frac{\partial\Phi(r,t)}{\partial t} = -\frac{\hbar^2}{2}\sum\_{k}\nabla\_k^2\Phi(r,t) + \left[\int d\mathbf{R}\,\Omega^\*(\mathbf{R},t)\hat{V}(r,\mathbf{R})\Omega(\mathbf{R},t)\right]\Phi(r,t)\tag{26}$$

and

$$i\hbar\frac{\partial\Omega(\mathbf{R},t)}{\partial t} = -\frac{\hbar^2}{2}\sum\_{\gamma}M\_{\gamma}^{-1}\nabla\_{\gamma}^2\Omega(\mathbf{R},t) + \left[\int d\mathbf{r}\,\,\Phi^\*(\mathbf{r},t)\hat{\mathcal{H}}\_{el}(\mathbf{r},\mathbf{R})\Phi(\mathbf{r},t)\right]\Omega(\mathbf{R},t)\tag{27}$$

These are mean field coupled equations, in which the electrons move in a field generated by the nuclei (second term on the right hand side (r.h.s.) of Equation (26)) and the nuclei move in a field generated by the electrons (second term on the r.h.s. of Equation (27)). Strictly speaking, these are not yet the EHD equations of motion, but, rather, a version of the time-dependent self-consistent field equations. EHD implies the passage to the classical limit for the nuclear amplitudes, which is again accomplished through the use of the polar representation of the nuclear wavefunction (Equation (6)) in Equation (27).

Once again, we obtain a classical Hamilton-Jacobi equation, which can be transformed into a Newton equation of motion given by

$$M\_{\gamma}\ddot{\mathbf{R}}\_{\gamma}(t) = -\nabla\_{\gamma}\langle\hat{\mathcal{H}}\_{el}(\boldsymbol{r},\boldsymbol{R})\rangle\_{t} = -\nabla\_{\gamma}\left[\int d\boldsymbol{r}\,\,\Phi^{\*}(\boldsymbol{r},t)\hat{\mathcal{H}}\_{el}(\boldsymbol{r},\boldsymbol{R})\Phi(\boldsymbol{r},t)\right].\tag{28}$$

Notice that the potential acting on the nuclei is now given by the expectation value of the electronic Hamiltonian computed using the time-dependent electronic "wavefunction" $\Phi(\mathbf{r}, t)$, which is not necessarily an eigenstate of $\hat{\mathcal{H}}\_{el}(\mathbf{r}, \mathbf{R}(t))$, but which can be expressed as a linear combination of the static solutions of the corresponding time-independent electronic Schrödinger equation for the same nuclear position, $\mathbf{R}$, at time $t$. For this reason, EHD is called a mean-field solution of the time-dependent molecular Schrödinger equation.

The equation of motion for the electronic amplitudes, Equation (26), also depends on the nuclear amplitudes, Ω(*R*, t). However, since the nuclei are treated as classical particles, we can set

$$|\Omega(\mathbf{R}(t))|^2 = \prod\_{\gamma} \delta(\mathbf{R}\_{\gamma} - \mathbf{R}\_{\gamma}(t))\tag{29}$$

that is to say, we induce localization of the nuclear densities at a fixed position, $\mathbf{R}\_{\gamma}(t)$. By plugging Equation (29) into Equation (26) we obtain a TDSE for the electronic amplitude

$$i\hbar\frac{\partial\Phi(r;R(t),t)}{\partial t} = \hat{\mathcal{H}}\_{el}(r;R(t))\Phi(r;R(t),t) \tag{30}$$

where the Hamiltonian and the wavefunction both depend parametrically on the nuclear positions, which induces the coupling with the nuclear equation of motion (Equation (28)). As we mentioned before, in EHD the nuclei will evolve on a single time-dependent PES, which can be expressed at any instant of time as a linear combination of all adiabatic PESs. This implies that in EHD nonadiabatic effects are taken into account through the propagation of the electronic wavefunction [58]; a perspective that is indeed very different from what is observed in the approaches derived from the Born-Huang expansion (Section 2.1).
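The coupled propagation of Equations (28) and (30) can be illustrated with a deliberately small model. The sketch below assumes a hypothetical two-state electronic problem expressed in a fixed (diabatic) basis with one nuclear coordinate; the potentials, the coupling, the mass and the symmetric splitting integrator are all our choices for the purpose of illustration and are not taken from the text.

```python
import cmath
import math

K, R2_SHIFT, COUPLING, MASS = 0.01, 1.0, 0.005, 2000.0  # hypothetical model parameters

def h_el(r):
    """Elements (V11, V22, V12) of a real symmetric 2x2 electronic Hamiltonian."""
    return 0.5 * K * r * r, 0.5 * K * (r - R2_SHIFT) ** 2, COUPLING

def mean_field_force(r, c):
    """-<Phi| dH_el/dR |Phi>, the right-hand side of Equation (28);
    the constant coupling V12 does not contribute to the gradient."""
    p1, p2 = abs(c[0]) ** 2, abs(c[1]) ** 2
    return -(p1 * K * r + p2 * K * (r - R2_SHIFT))

def propagate_electrons(r, c, dt, hbar=1.0):
    """Apply exp(-i H_el dt / hbar) at fixed R (closed form for a 2x2 matrix)."""
    v11, v22, v12 = h_el(r)
    mean, half_gap = 0.5 * (v11 + v22), 0.5 * (v11 - v22)
    omega = math.hypot(half_gap, v12)
    theta = omega * dt / hbar
    sinc = math.sin(theta) / omega if omega > 0.0 else dt / hbar
    phase = cmath.exp(-1j * mean * dt / hbar)
    c1 = phase * (math.cos(theta) * c[0] - 1j * sinc * (half_gap * c[0] + v12 * c[1]))
    c2 = phase * (math.cos(theta) * c[1] - 1j * sinc * (v12 * c[0] - half_gap * c[1]))
    return (c1, c2)

def total_energy(r, v, c):
    """Nuclear kinetic energy plus the expectation value of H_el."""
    v11, v22, v12 = h_el(r)
    electronic = (abs(c[0]) ** 2 * v11 + abs(c[1]) ** 2 * v22
                  + 2.0 * v12 * (c[0].conjugate() * c[1]).real)
    return 0.5 * MASS * v * v + electronic

def ehrenfest(r, v, c, dt, n_steps):
    """Symmetric splitting: half kick, half electronic step, drift, and back."""
    history = []
    for _ in range(n_steps):
        v += 0.5 * dt * mean_field_force(r, c) / MASS
        c = propagate_electrons(r, c, 0.5 * dt)
        r += dt * v
        c = propagate_electrons(r, c, 0.5 * dt)
        v += 0.5 * dt * mean_field_force(r, c) / MASS
        history.append((r, v, c, total_energy(r, v, c)))
    return history

run = ehrenfest(r=-0.5, v=0.0, c=(1.0 + 0.0j, 0.0 + 0.0j), dt=0.5, n_steps=6000)
```

The nucleus feels a single, time-dependent force that is the population-weighted average of the forces of the two states, while the electronic coefficients evolve coherently along the trajectory. The norm of the coefficient vector and the total energy stored in `run` can be monitored to check the integration.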

#### 2.2.2. The Exact Factorization-Based Dynamics

Recently, Gross *et al.* [59,60] have shown that the (exact) solution of the molecular TDSE can be factorized into the product of an electronic and a nuclear wavefunction [61] (even when the Hamiltonian includes coupling to external electromagnetic fields)

$$
\Psi(r,R,t) = \Phi(r;R,t)\Omega(R,t)\,. \tag{31}
$$

Equation (31) might seem counter-intuitive at first, because the molecular Hamiltonian is not separable. In fact, while the factorization in Equation (31) can be made at any time, t, and at any position, *r* or *R*, the persistence of this kind of solution along the time propagation of the two wavefunctions is less obvious (as can be seen from the resulting evolution equations [59]). As discussed in [60], the factorization in Equation (31) can be justified using multivariate statistics, according to which any probability distribution (here, the square of the molecular wavefunction) can be factored into a marginal probability and a conditional probability. In this respect, it is also important to notice that Φ(*r*; *R*, t) depends (parametrically) on the nuclear coordinates, *R*, and there is, therefore, no loss of generality in applying Equation (31).

The factorization of Ψ(*r*, *R*, t) does not simplify, *per se*, the task of solving the molecular TDSE. Nonetheless, this approach has a great interpretive power, since Φ(*r*; *R*, t) and Ω(*R*, t) have both a clear physical meaning: they are *the* exact electronic and nuclear time-dependent wavefunctions, respectively. One crucial requirement for this to be true is the so-called partial normalization condition

$$\int dr \, |\Phi(r; R, t)|^2 = 1. \tag{32}$$

This condition allows for the interpretation of $d\mathbf{r}\,|\Phi(\mathbf{r}; \mathbf{R}, t)|^2$ as the conditional probability of finding an electron in volume element, $d\mathbf{r}$, at position $\mathbf{r}$ given a nuclear configuration, $\mathbf{R}$; that is to say, $|\Phi(\mathbf{r}; \mathbf{R}, t)|^2$ is an electronic probability density function. According to the standard interpretation of quantum mechanics, $\Phi(\mathbf{r}; \mathbf{R}, t)$ is then the corresponding electronic wavefunction. Similarly, $|\Omega(\mathbf{R}, t)|^2$ is the marginal probability density for the nuclear position, $\mathbf{R}$ (marginal, and not conditional, because $\mathbf{r}$ is unknown), and $\Omega(\mathbf{R}, t)$ is the corresponding nuclear wavefunction. Interestingly, just as in EHD, this factorization leads to the definition of a single time-dependent potential energy surface (because of the time-dependence of the electronic wavefunction), which this time, however, is exact and unique. What is lost is the picture of a time-dependent nuclear wave packet (or corresponding trajectories) evolving on an ensemble of static PESs; a picture that has provided important insights for the understanding of many photophysical and photochemical processes.
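The statistical argument behind the factorization can be checked numerically on a grid. In the sketch below, a hypothetical correlated joint density (one electronic and one nuclear coordinate, unnormalized) is split into a nuclear marginal and an electronic conditional density; the model density, the grids and the function name are assumptions made for illustration only.

```python
import math

def factorize(psi_sq, dr):
    """Split |Psi(r, R)|^2, stored as psi_sq[iR][ir], into the nuclear
    marginal |Omega(R)|^2 and the electronic conditional |Phi(r; R)|^2."""
    marginal = [sum(row) * dr for row in psi_sq]
    conditional = [[value / m for value in row] for row, m in zip(psi_sq, marginal)]
    return marginal, conditional

# Hypothetical correlated density: the electron follows the nucleus (r ~ R / 2).
dr, dR = 0.05, 0.1
r_grid = [-6.0 + dr * k for k in range(241)]
R_grid = [-4.0 + dR * k for k in range(81)]
psi_sq = [[math.exp(-R * R) * math.exp(-2.0 * (r - 0.5 * R) ** 2) for r in r_grid]
          for R in R_grid]

marginal, conditional = factorize(psi_sq, dr)

# Partial normalization (Equation (32)) holds separately for every R.
partial_norms = [sum(row) * dr for row in conditional]

# The conditional mean of r depends on R, so the density is not separable.
mean_r = [sum(r * p for r, p in zip(r_grid, row)) * dr for row in conditional]
```

By construction, the product of the marginal and the conditional density returns the joint density at every grid point, and each conditional density integrates to one, which is the discrete analogue of the partial normalization condition. The dependence of `mean_r` on the nuclear coordinate shows that this holds even though the joint density is not a simple product of a function of $\mathbf{r}$ and a function of $\mathbf{R}$.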

The time evolution of the wavefunctions, Φ(*r*; *R*, t) and Ω(*R*, t), is described by two connected differential equations, which contain, besides the usual interaction terms, additional scalar and vector potential terms [59,60,62,63].

#### 3. Trajectory Surface Hopping from the Nonadiabatic Bohmian Dynamics Equations

When it comes to nonadiabatic molecular dynamics, TSH is probably the most popular simulation scheme. As stated in Section 2, it relies on the description of nuclear wave packets by means of a swarm of classical trajectories. A complex coefficient, $C\_j^{[\alpha]}$, for each electronic state, $j$, is propagated along a given classical trajectory, $\alpha$, according to Equation (12). The classical trajectory may "hop" from its current electronic state, $i$, to another at any point in time, and the probability that a hop to state $j$ occurs is given by Equation (13) [15,64–66].

In this Section, starting from Equations (18) and (19), we will present a "rationalization" of the TSH equations of motion based on the nonadiabatic Bohmian dynamics equations.

The following steps were reported in [44] and can be summarized as follows:


$$\begin{split}-\frac{\partial S\_{j}(\mathbf{R},t)}{\partial t} &= E\_{j}^{el}(\mathbf{R}) + \sum\_{\gamma}^{N\_{n}} \sum\_{i}^{\infty} \frac{\hbar^{2}}{2M\_{\gamma}} D\_{ji}^{\gamma}(\mathbf{R}) \frac{A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \Re \left[e^{i\phi\_{ij}(\mathbf{R},t)}\right] \\ &- \sum\_{\gamma}^{N\_{n}} \sum\_{i \neq j}^{\infty} \frac{\hbar^{2}}{M\_{\gamma}} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \frac{\nabla\_{\gamma} A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \Re \left[e^{i\phi\_{ij}(\mathbf{R},t)}\right] \\ &+ \sum\_{\gamma}^{N\_{n}} \sum\_{i \neq j}^{\infty} \frac{\hbar}{M\_{\gamma}} \frac{A\_{i}(\mathbf{R},t)}{A\_{j}(\mathbf{R},t)} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} S\_{i}(\mathbf{R},t) \Im \left[e^{i\phi\_{ij}(\mathbf{R},t)}\right] \end{split} \tag{33}$$

and

$$\begin{split} \frac{\partial A\_j(\mathbf{R},t)}{\partial t} &= \sum\_{\gamma}^{N\_n} \sum\_{i}^{\infty} \frac{\hbar}{2M\_{\gamma}} D\_{ji}^{\gamma}(\mathbf{R}) A\_i(\mathbf{R},t) \Im \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right] \\ &- \sum\_{\gamma}^{N\_n} \sum\_{i \neq j}^{\infty} \frac{\hbar}{M\_{\gamma}} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} A\_i(\mathbf{R},t) \Im \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right] \\ &- \sum\_{\gamma}^{N\_n} \sum\_{i \neq j}^{\infty} \frac{1}{M\_{\gamma}} A\_i(\mathbf{R},t) \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}) \cdot \nabla\_{\gamma} S\_i(\mathbf{R},t) \Re \left[ e^{i\phi\_{ij}(\mathbf{R},t)} \right] . \end{split} \tag{34}$$

Note that due to the *independent trajectory approximation*, we assume that there is no amplitude exchange among the FEs propagated along the different trajectories.

Neglecting the second-order nonadiabatic couplings, $D\_{ji}(\mathbf{R})$, due to their usually small size [67], we are left with an equation of motion for the phases and the amplitudes, which is equivalent to the following nuclear wavefunction time-evolution equation

$$i\hbar \frac{\partial \Omega\_j(\mathbf{R}, t)}{\partial t} = E\_j^{el}(\mathbf{R})\Omega\_j(\mathbf{R}, t) - i\hbar \sum\_{i \neq j}^{\infty} \sum\_{\gamma}^{N\_n} \frac{1}{M\_\gamma} d\_{ji}^\gamma(\mathbf{R}) \cdot \hat{\mathbf{p}}^\gamma \Omega\_i(\mathbf{R}, t) \tag{35}$$

where we have used the definition of the momentum operator $\hat{\mathbf{p}}^{\gamma} = -i\hbar\nabla\_{\gamma}$.

(*c*) In the derivation of the equation of motion for the nuclear amplitude coefficients, we start by assigning delta-like wave packets (denoted as the TSH wave packet in the following) to each trajectory, α, defined as

$$
\tilde{\Omega}\_j^{\lambda,[\alpha]}(\mathbf{R},t) = \tilde{A}\_j^{[\alpha]}(t) \exp\left[\frac{i}{\hbar}\tilde{S}\_j^{[\alpha]}(t)\right] g^\lambda(\mathbf{R} - \mathbf{R}^{[\alpha]}(t)) \tag{36}
$$

where $\tilde{A}\_j^{[\alpha]}(t)$ and $\tilde{S}\_j^{[\alpha]}(t)/\hbar$ are real functions representing the amplitude and the phase of the TSH nuclear wave packet at $\mathbf{R}^{[\alpha]}(t)$ in electronic state j. The function $g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]}(t)) = \frac{1}{\lambda\sqrt{\pi}} \exp\left[-(\mathbf{R} - \mathbf{R}^{[\alpha]}(t))^2/\lambda^2\right]$, localized at the position of the classical trajectory, α, is normalized to $\int d\mathbf{R}\, g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]}(t)) = 1$ and becomes a δ-function in the limit of vanishing width, $\lim\_{\lambda \to 0} g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]}(t)) = \delta(\mathbf{R} - \mathbf{R}^{[\alpha]}(t))$. The total probability density of the nuclear wave packet in state j becomes $|\Omega\_j(\mathbf{R}, t)|^2 \sim \frac{1}{N\_{traj}} \sum\_{[\alpha]} \int\_{t'=0}^{\infty} dt'\, |\tilde{A}\_j^{[\alpha]}(t')|^2\, g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]}(t'))\, \delta(t - t')$, where $N\_{traj}$ is the total number of trajectories. The *independent trajectory approximation* invoked in point (*a*) also has an important impact on the nonadiabatic component of the nuclear dynamics (due to its nonlocality; see Equations (18) and (19)). Indeed, it has the consequence that, for a given trajectory, α, the complex amplitudes, $\tilde{\Omega}\_j^{\lambda,[\alpha]}(\mathbf{R}, t)$, for each and every electronic state, j, share the same support (localized around the instantaneous nuclear position, $\mathbf{R}$, in configuration space). Said otherwise, the TSH nuclear wave packet component, $g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]}(t))$, will be the same for all electronic states of a trajectory, α, at any time, t (this is why $g^{\lambda}$ does not have an electronic state index). This is indubitably the strongest approximation made in the "derivation", since it induces "overcoherence" in the dynamics of the amplitudes, $C\_j^{[\alpha]}(t)$, and suppresses all (nonadiabatic) decoherence effects that could occur, for example, at and after the branching of nuclear wave packets.
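The normalization and localization properties of the envelope can be checked numerically. The following is a minimal one-dimensional sketch, assuming the Gaussian form quoted above; the function names, the trajectory position `x0` and the grid parameters are illustrative choices and not part of the original derivation.

```python
import math

def g_lambda(x, x0, lam):
    """1D envelope g^lambda(x - x0) = exp(-(x - x0)^2 / lam^2) / (lam * sqrt(pi))."""
    return math.exp(-((x - x0) ** 2) / lam ** 2) / (lam * math.sqrt(math.pi))

def integrate(f, a, b, n=20001):
    """Composite trapezoidal rule on [a, b] with n grid points."""
    h = (b - a) / (n - 1)
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n - 1):
        total += f(a + k * h)
    return total * h

x0 = 1.5  # hypothetical trajectory position R^[alpha](t)
for lam in (1.0, 0.1, 0.01):
    # The norm should stay close to 1 for every width.
    norm = integrate(lambda x: g_lambda(x, x0, lam), x0 - 10.0, x0 + 10.0)
    # Smearing a smooth test function with g^lambda approaches its value at x0
    # as lam shrinks, which is the defining property of the delta function.
    smeared = integrate(lambda x: math.cos(x) * g_lambda(x, x0, lam),
                        x0 - 10.0, x0 + 10.0)
    print(lam, norm, smeared, math.cos(x0))
```

In this sketch the smeared value approaches `cos(x0)` only as the width decreases, which illustrates that the δ-function is recovered for narrow, not broad, envelopes.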

(*d*) Since we are working in the Lagrangian frame, we need only consider the explicit time-dependence of the amplitudes and phases. As a consequence, the TSH nuclear wave packet evolving in electronic state j follows the classical trajectory, α, on the support of the function, $g^{\lambda}(\mathbf{R} - \mathbf{R}^{[\alpha]})$ (where $\mathbf{R}^{[\alpha]}$ is the position vector in the Lagrangian frame), and is described by $\tilde{A}\_j^{[\alpha]}(t) \exp\left[\frac{i}{\hbar}\tilde{S}\_j^{[\alpha]}(t)\right]$.

If we substitute $\Omega\_j(\mathbf{R}, t)$ in Equation (35) by the form given in Equation (36) and then apply points (*a*), (*b*) and (*d*), we obtain [44]

$$-\dot{\tilde{S}}\_{j}^{[\alpha]}(t) = E\_{j}^{el}(\mathbf{R}^{[\alpha]}) + \hbar \sum\_{\gamma}^{N\_{n}} \sum\_{i \neq j}^{\infty} \frac{\tilde{A}\_{i}^{[\alpha]}(t)}{\tilde{A}\_{j}^{[\alpha]}(t)} \left( \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}^{[\alpha]}) \cdot \dot{\mathbf{R}}\_{\gamma}^{[\alpha]} \right) \mathfrak{I} \left[ e^{i\tilde{\phi}\_{ij}^{[\alpha]}(t)} \right] \tag{37}$$

$$\dot{\tilde{A}}\_{j}^{[\alpha]}(t) = -\sum\_{\gamma}^{N\_{n}} \sum\_{i \neq j}^{\infty} \tilde{A}\_{i}^{[\alpha]}(t) \left( \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}^{[\alpha]}) \cdot \dot{\mathbf{R}}\_{\gamma}^{[\alpha]} \right) \mathfrak{R} \left[ e^{i\tilde{\phi}\_{ij}^{[\alpha]}(t)} \right] . \tag{38}$$

Here, $\tilde{\phi}\_{ij}^{[\alpha]}(t) = \frac{1}{\hbar}\left(\tilde{S}\_i^{[\alpha]}(t) - \tilde{S}\_j^{[\alpha]}(t)\right)$, and $\dot{\mathbf{R}}^{[\alpha]}$ are the nuclear velocities at time t along trajectory α.

Notice that Equations (37) and (38) are equivalent to the TSH equations for the complex coefficients, $C\_j^{[\alpha]}(t)$ (Equation (12)), as can be seen by using a polar representation of the complex coefficients, $C\_j^{[\alpha]}(t) = \tilde{A}\_j^{[\alpha]}(t) \exp\left[\frac{i}{\hbar}\tilde{S}\_j^{[\alpha]}(t)\right]$.
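This equivalence can be verified numerically for a single time step. The sketch below assumes that Equation (12) has the memory-free form $i\hbar \dot{C}\_j = E\_j^{el} C\_j - i\hbar \sum\_{i \neq j} (\mathbf{d}\_{ji} \cdot \dot{\mathbf{R}}) C\_i$, and that the phase equation carries the imaginary part and the amplitude equation the real part of the phase factor. All numerical values and function names are hypothetical, and the amplitudes must be nonzero for the polar form to be defined.

```python
import cmath

HBAR = 1.0  # atomic units

def cdot_from_coefficients(C, E, dR):
    """dC_j/dt from i*hbar*dC_j/dt = E_j*C_j - i*hbar*sum_{i != j} (d_ji . Rdot)*C_i."""
    n = len(C)
    rates = []
    for j in range(n):
        coupling = sum(dR[j][i] * C[i] for i in range(n) if i != j)
        rates.append((E[j] * C[j] - 1j * HBAR * coupling) / (1j * HBAR))
    return rates

def polar_rates(A, S, E, dR):
    """Amplitude and phase rates in the polar (amplitude/phase) form."""
    n = len(A)
    A_dot, S_dot = [], []
    for j in range(n):
        re_sum, im_sum = 0.0, 0.0
        for i in range(n):
            if i == j:
                continue
            phase = cmath.exp(1j * (S[i] - S[j]) / HBAR)
            re_sum += A[i] * dR[j][i] * phase.real
            im_sum += (A[i] / A[j]) * dR[j][i] * phase.imag
        A_dot.append(-re_sum)                      # amplitude equation
        S_dot.append(-(E[j] + HBAR * im_sum))      # phase equation
    return A_dot, S_dot

def cdot_from_polar(A, S, E, dR):
    """Rebuild dC_j/dt from the polar rates via C_j = A_j * exp(i*S_j/hbar)."""
    A_dot, S_dot = polar_rates(A, S, E, dR)
    return [(A_dot[j] + 1j * A[j] * S_dot[j] / HBAR) * cmath.exp(1j * S[j] / HBAR)
            for j in range(len(A))]

# Hypothetical two-state snapshot along one trajectory (illustrative values).
A = [0.8, 0.6]                      # amplitudes
S = [0.3, -1.1]                     # phases (action units)
E = [-0.5, 0.2]                     # adiabatic energies at R^[alpha]
dR = [[0.0, 0.05], [-0.05, 0.0]]    # d_ji . Rdot, antisymmetric in (j, i)
C = [A[j] * cmath.exp(1j * S[j] / HBAR) for j in range(2)]

for lhs, rhs in zip(cdot_from_coefficients(C, E, dR), cdot_from_polar(A, S, E, dR)):
    print(abs(lhs - rhs))
```

Both routes give the same time derivative up to rounding, so either representation can be propagated along a trajectory.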

We have described until now the dynamics of TSH nuclear wave packets following a single classical trajectory, α. At this point, we have to account for the fact that the nuclear dynamics in TSH is described by a "swarm" of classical trajectories that evolve according to the adiabatic and nonadiabatic components of the equation of motion (points (*a*) to (*d*)). In order to do this, we have to require that the following be maintained

$$(A\_j^{TSH}(\mathbf{R}, t))^2 d\mathbf{R} \approx (A\_j(\mathbf{R}, t))^2 d\mathbf{R} \tag{39}$$

at all times, for a sufficiently large number of trajectories. In Equation (39), $(A\_j^{TSH}(\mathbf{R}, t))^2$ is computed as the density (histogram) of configuration space points in the volume element, $d\mathbf{R}$, at $\mathbf{R}(t)$ in state j that are sampled by the ensemble of $N\_{traj}$ trajectories

$$(A\_j^{TSH}(\mathbf{R}, t))^2 d\mathbf{R} = \frac{N\_j(\mathbf{R}, d\mathbf{R}, t)}{N\_{traj}} \tag{40}$$

while the right-hand side is the corresponding nuclear density obtained from the corresponding quantum mechanically propagated nuclear wave packets. Note that Equation (39) is only valid when correlated quantum (Bohmian) trajectories are used [36].
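The histogram estimate of Equation (40) can be sketched as follows for a one-dimensional configuration space. The function name, the binning range and the sample data are hypothetical; the only ingredient taken from the text is that the density in a volume element is the fraction of trajectories found there in state j, divided by the size of the element.

```python
def tsh_density(positions, states, j, x_min, x_max, n_bins):
    """Estimate (A_j^TSH(x, t))^2 on a 1D grid from a swarm of trajectories.

    positions[k] and states[k] are the position and the active electronic
    state of trajectory k at the time of interest.
    Returns one density value per bin.
    """
    n_traj = len(positions)
    dx = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    for x, s in zip(positions, states):
        if s != j or not (x_min <= x < x_max):
            continue
        # Guard against rounding that could push the index to n_bins.
        counts[min(int((x - x_min) / dx), n_bins - 1)] += 1
    # (A_j^TSH)^2 dR = N_j / N_traj  =>  density = N_j / (N_traj * dR)
    return [c / (n_traj * dx) for c in counts]

# Hypothetical swarm of six trajectories, four of them in state 1.
positions = [-0.4, 0.1, 0.2, 0.7, 1.3, 0.15]
states = [0, 1, 1, 0, 1, 1]
density = tsh_density(positions, states, j=1, x_min=0.0, x_max=2.0, n_bins=4)
print(density)
print(sum(density) * 0.5)  # integrated density = fraction of trajectories in state 1
```

Integrating the estimated density over the grid returns the fraction of trajectories in the chosen state, provided the grid covers all of them.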

In TSH, the balance described in Equation (39) is maintained (in an approximative way) through the use of the switching algorithm given in Equations (13) and (14), which can be motivated by the following considerations:


$$\int d\mathbf{R} \, (A\_j^{TSH}(\mathbf{R}, t))^2 = |C\_j(t)|\_{av}^2 \tag{41}$$

where the $|C\_j(t)|\_{av}$ are the norms of the coefficients defined in Equation (12) averaged over the ensemble of trajectories. Equation (41) is the internal consistency criterion described in [68]. However, in practice, one replaces the $|C\_j(t)|^2\_{av}$ with the corresponding amplitudes computed along a single trajectory, $|C\_j^{[\alpha]}(t)|^2$, which are the coefficients that appear in Equation (13). The reason for this modification is that in the *independent trajectory approximation*, one computes single trajectories, and therefore, the average over the ensemble is not available during propagation. This replacement of $|C\_j(t)|^2\_{av}$ by $|C\_j^{[\alpha]}(t)|^2$ is, in our opinion, more of an assumption than an approximation and remains without formal justification.

In summary, starting from the *exact* formulation of the nonadiabatic dynamics within the nonadiabatic Bohmian dynamics framework, we proposed a series of approximations/assumptions (points (*a*) to (*f*)) that help rationalize Tully's TSH equations of motion for the nuclear trajectories and amplitudes. In particular, the *independent trajectory approximation* (point (*a*)) implies that the amplitudes and phases associated with the classical trajectories are uncorrelated (which is also evident from the fact that the trajectories are propagated separately) and that quantum nonlocality is, therefore, lost. The assumption made in point (*f*) is particularly strong, as it states that the averaged TSH population amplitude (on a given electronic state, j) taken over the ensemble of trajectories can be replaced by the corresponding amplitude, $C\_j^{[\alpha]}$, computed along a single trajectory, α. Furthermore, the nuclear amplitudes associated with each electronic state are evaluated strictly at the same position in space, at any time t, even though the different curvature of the PESs involved in the dynamics may drive the nuclear wave packets towards different regions in configuration space. This implies that TSH is strictly local in space and time or, equivalently, that equal-time corresponds to equal-space events, which leads to the loss of quantum mechanical nonlocality [37]. This is the case, even if we allow for retardation (causality), since the TSH equations have no memory. In other words, Equation (12) is obtained from

$$i\hbar \dot{C}\_{j}^{[\alpha]}(t) = \sum\_{i}^{\infty} \int\_{t\_{0}}^{t} dt' F(t - t') C\_{i}^{[\alpha]}(t') \left( E\_{i}^{el}(\mathbf{R}^{[\alpha]}) \delta\_{ij} - i\hbar \sum\_{\gamma}^{N\_{n}} \mathbf{d}\_{ji}^{\gamma}(\mathbf{R}^{[\alpha]}) \cdot \dot{\mathbf{R}}\_{\gamma}^{[\alpha]} \right) \tag{42}$$

with the kernel, $F(t - t')$, replaced by a delta function, $\delta(t - t')$. Some implications of these approximations will be described in Section 4 for simple one-dimensional model systems.

#### 4. Trajectory Surface Hopping at Work

While TSH is an elegant compromise between accuracy and efficiency for the simulation of nonadiabatic phenomena, its accuracy (either in its fewest-switches version or with additional corrections) has been challenged several times in the literature (see [22,68–72] for a non-exhaustive list). Recently, a series of simple one-dimensional model systems were used to highlight potential failures of the standard TSH, even with high initial momenta [28,73–76]. The "double arch" model is composed of a couple of potential energy curves, whose shapes strongly differ in the region where they are not degenerate ($-10 \leq x \leq 10$ a.u. in Figure 1).

In this model, a Gaussian wave packet launched from $x = -20$ a.u. with a positive initial momentum will first reach a region of strong nonadiabaticity (Figure 2, upper panel), leading to a population of both the ground state (GS) and the first excited state (S1). Right after this nonadiabatic event occurs, the two potential energy curves will diverge, one exhibiting a strong positive slope (S1 state), the other a negative one (GS). The wave packet contribution in each electronic state will therefore be spatially split and eventually recombined in a second nonadiabatic region at x = 10 a.u. (Figure 2, upper panel). The final population on S1 after the second nonadiabatic region strongly depends on the spatial decoherence between the nuclear wave packets. However, such peculiar decoherence is hardly captured by TSH, due to the *independent trajectory approximation* (and other approximations discussed in Section 3). This is observed from its deviation with respect to an exact nuclear wave packet propagation in the lower panel of Figure 2. TSH in general fails qualitatively for all different initial momenta tested here, which correspond in all cases to a propagation with no back reflections. Changing the initial conditions of TSH strongly alters the final population of S1, but does not improve it substantially [75]. On the other hand, the correlated quantum trajectories (NABDY) provide an accurate description of the nuclear wave packet propagation with only minor deviations from the exact propagation (full numerical details can be found in [44]).

Figure 1. The double arch model in the adiabatic representation. The ground state (GS) (S1) potential energy curve is represented with a red (dashed) line and nonadiabatic coupling with a blue dotted line. The initial nuclear wave packet is displayed in grey.

We further investigate the effects of overcoherence on the TSH dynamics by means of a second model system consisting of two coupled harmonic potentials, as depicted in the upper inset of Figure 3. A swarm of trajectories (and a corresponding Gaussian wave packet for the exact propagation) is initialized in the excited state (S1) at x = 0 a.u., with a positive initial momentum $p\_0 = 40$ a.u. In this model system, a single nonadiabatic region is located at x = 10 a.u.; the initial conditions are chosen in such a way that the wave packets (and the classical trajectories) will reflect back shortly after the first transition through the nonadiabatic coupling region, therefore recrossing the same coupling region a second time, with opposite velocity (for a total of two nonadiabatic events, see the lower inset of Figure 3).

The S1 wave packet enters the strong coupling region at x = 10 a.u. (for t < 1000 a.u.) and populates the GS (87%, Figure 3). This first nonadiabatic event is perfectly described by TSH (Figure 3, $1000 \leq t \leq 2000$ a.u.). Due to the difference between the potential energy curves (slope of $E^{el}\_{S\_1}$ larger than the one of $E^{el}\_{GS}$), the wave packet component in the GS travels further towards positive x values, while the weak contribution in S1 inverts the direction of its propagation and rapidly returns towards the nonadiabatic region at x = 10 a.u. In the exact propagation, there is no interference with the wave packet evolving in the GS, since the two wave packets (GS and S1) are spatially separated. As for the first transition through the nonadiabatic region at $x \sim 10$ a.u., the S1 wave packet is transferred almost entirely to the other electronic state (now, the GS, Figure 3, $t \geq 3000$ a.u.). On the other hand, in TSH, each independent trajectory carries a set of coherently coupled complex amplitudes (see point (*c*) of Section 3). When reaching the nonadiabatic coupling at x = 10 a.u. for the second time, the complex amplitudes, $C\_{GS}^{[\alpha]}(t)$ and $C\_{S\_1}^{[\alpha]}(t)$, evolved along a given trajectory, α, in S1 couple coherently because they share the same support (same position in space for any time t). This induces "overcoherence" in the dynamics of the amplitudes, which leads to deviations from the exact propagation (Figure 3, $3000 \leq t \leq 5000$ a.u.). The total population in S1 increases back to ∼78% of the t = 0 value when the wave packet in the GS recrosses the nonadiabatic region at t = 6000 a.u. Some additional deviations of TSH with respect to the exact propagation are observed, and they are linked to subsequent recrossings of the nuclear wave packets.

Figure 2. Nonadiabatic dynamics for the double arch system. (Upper panel) Time series (gray scale) of the nuclear wave packet probability density, $|A\_j(x, t)|^2$, and trajectory surface hopping (TSH) histograms for $p\_0 = 45$ a.u. (lower panel = GS; upper panel = S1). The adiabatic potential energy curves are given in red, while the nonadiabatic coupling vectors are shown in blue. (Lower panel) Deviation of the final population in S1 from an exact nuclear wave packet propagation obtained with TSH and nonadiabatic Bohmian dynamics (NABDY), for different initial momenta ("TSH": initial conditions sampled from a Gaussian distribution for positions and momenta, 1,500 trajectories; "TSH∗": same initial conditions, momentum and position, for all 1,500 trajectories; "NABDY" is based on a maximum total number of 162 quantum trajectories).

Figure 3. Nonadiabatic dynamics on two coupled harmonic potential energy curves. Population in the first excited state (S1) along the dynamics for 3,444 TSH trajectories (green) and an exact propagation (red). (Upper inset) Schematic representation of the model. The GS (S1) potential energy curve is represented with a continuous (dashed) black line and the nonadiabatic coupling with a blue dotted line. The initial nuclear wave packet is displayed in grey. (Lower inset) Time series of potential energies along a TSH trajectory. The trajectory is initially in S1, then jumps to the GS after the first coupling and, finally, hops back to S1 after it returns to the coupling region. This representation highlights that the model describes two nonadiabatic events with a single nonadiabatic region.

#### 5. Conclusions

The description of the nonadiabatic dynamics of molecular systems is a challenging task for theory, due to the difficulty of providing both the electronic structure of a system *beyond* its electronic ground state and the corresponding nuclear dynamics *beyond* the Born-Oppenheimer approximation. Moreover, nonadiabatic phenomena require a description of the nuclear degrees of freedom that goes *beyond* the classical approximation and, finally, a good compromise between accuracy and efficiency is also required when realistic molecular systems are investigated. In this article, we have summarized some of the main techniques for describing the nonadiabatic dynamics of molecular systems, namely Ehrenfest dynamics, nonadiabatic Bohmian dynamics, Multiple Spawning, the recently proposed Exact Factorization, and trajectory surface hopping. We have also shown how the latter method can be rationalized starting from the "exact" nonadiabatic Bohmian dynamics equations. Trajectory surface hopping is indeed one of the most commonly applied on-the-fly trajectory-based methods to describe the dynamics of molecular systems beyond the Born-Oppenheimer approximation in the (unconstrained) configuration space. This is possible at the cost of describing the nuclear wave packet dynamics with a swarm of uncorrelated classical trajectories, with the consequent loss of all quantum (de)coherence effects. Understanding the underlying limitations of trajectory surface hopping is of foremost importance for the future improvement of the theory and, in our opinion, quantum Bohmian dynamics can give valuable contributions in this direction.

#### Acknowledgments

We are grateful to Jiří Vaníček and Tomáš Zimmermann for providing us a version of their exact propagation code. COST actions CM0702 and CM1204 and Swiss National Science Foundation grants 200021-137717 and 200021-146396 are acknowledged for funding and support.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### References


Reprinted from *Entropy*. Cite as: Arnold, A.; Breitsprecher, K.; Fahrenberger, F.; Kesselheim, S.; Lenz, O.; Holm, C. Efficient Algorithms for Electrostatic Interactions Including Dielectric Contrasts. *Entropy* 2013, *15*, 4569–4588.

*Article*

## Efficient Algorithms for Electrostatic Interactions Including Dielectric Contrasts

Axel Arnold \*, Konrad Breitsprecher, Florian Fahrenberger, Stefan Kesselheim, Olaf Lenz and Christian Holm \*

Institute for Computational Physics, University of Stuttgart, Allmandring 3, Stuttgart 70569, Germany; E-Mails: konrad.breitsprecher@icp.uni-stuttgart.de (K.B.); florian.fahrenberger@icp.uni-stuttgart.de (F.F.); kessel@icp.uni-stuttgart.de (S.K.); olenz@icp.uni-stuttgart.de (O.L.)

\* Authors to whom correspondence should be addressed; E-Mails: arnolda@icp.uni-stuttgart.de (A.A.); holm@icp.uni-stuttgart.de (C.H.); Tel.: +49-711-685-63593.

*Received: 6 August 2013; in revised form: 15 October 2013 / Accepted: 18 October 2013 / Published: 24 October 2013*

Abstract: Coarse-grained models of soft matter are usually combined with implicit solvent models that take the electrostatic polarizability into account via a dielectric background. In biophysical or nanoscale simulations that include water, this constant can vary greatly within the system. Performing molecular dynamics or other simulations that need to compute exact electrostatic interactions between charges in those systems is computationally demanding. We review here several algorithms developed by us that perform exactly this task. For planar dielectric surfaces in partial periodic boundary conditions, the arising image charges can be either treated with the MMM2D algorithm in a very efficient and accurate way or with the electrostatic layer correction term, which enables the user to use his favorite 3D periodic Coulomb solver. Arbitrarily-shaped interfaces can be dealt with using induced surface charges with the induced charge calculation (ICC\*) algorithm. Finally, the local electrostatics algorithm, MEMD (Maxwell Equations Molecular Dynamics), even allows one to employ a smoothly varying dielectric constant in the systems. We introduce the concepts of these three algorithms and an extension for the inclusion of boundaries that are to be held fixed at a constant potential (metal conditions). For each method, we present a showcase application to highlight the importance of dielectric interfaces.

#### 1. Introduction

Electrostatic interactions play an important role in many nano- or meso-scale systems. Almost every surface immersed in water develops a significant surface charge, due to the acid-base reactions of surface groups, and most biomolecules also carry charges. Therefore, it is often indispensable to include these long-ranged interactions in computer simulations. However, the system sizes that can be handled in simulations are orders of magnitude smaller than in real experiments, which drastically enhances the influence of boundary effects. To avoid artifacts due to an artificially small simulation volume, one typically uses periodic boundary conditions (PBC), which in simulations have to be taken into account by special electrostatics algorithms. These are often based on the idea of the Ewald summation [1–4], namely splitting the potential into a smooth, long-ranged and a singular, short-ranged contribution. In modern computer simulations, one usually uses mesh-based variants of the Ewald summation [5–8], which have a favorable O(N log N) computing time scaling with respect to the number of charged particles, N.

When studying capacitors, membranes or thin films, one does not want PBC perpendicular to the surfaces of interest. For these systems, partially periodic boundary conditions with only two periodically replicated dimensions are desirable, while the third one has a finite extent h (2D + h geometry). For these geometries, the Ewald summation becomes ineffective [9]. However, it has been shown that replicating the system artificially in the direction perpendicular to the surface results in reasonable accuracy when compared to the non-periodic system, provided a sufficient gap is left between the replicas, and a correction term for the summation order is included [10]. The ELC (electrostatic layer correction) approach [11,12] can, in principle, compute electrostatic interactions with the charges in the unwanted periodic dimension exactly. Practically, it allows one to tune the gap size to a desired accuracy, so that the 2D + h geometry is computationally tractable with any Coulomb solver for 3D PBC. It has to be stated that an algorithmic approach for partially periodic systems has been presented in [13,14] using a Monte Carlo extension to the local update scheme sketched in Section 5, but it is not implemented in the molecular dynamics implementation used for this article.

When studying large-scale problems, for example, the buckling of membranes, solutions containing charged polymers or colloids, DNA translocation or crystallization of charged colloids, an atomistic representation of the system under study is often unfeasible, even with periodic boundary conditions. This is related to the vast numbers of charges that would need to be treated, but also to the unfavorable scaling of the relaxation times of the system. One popular route to tackle such problems is to use implicit solvent models, which electrostatically represent the solvent as an effective dielectric medium of the representative dielectric constant at the investigated temperature. This works well if particles, for example, colloids or polymers, are in bulk solution, since, then, the dielectric medium is isotropic and, to a good approximation, homogeneous. However, if surfaces are involved, the dielectric constant of the solvent is usually drastically different from that of the material forming the surface. For example, water at room temperature has a dielectric constant of $\varepsilon\_{rel} \approx 80$, while a typical membrane has a dielectric constant of $\varepsilon\_{rel} \approx 2$–$5$.

In systems with spatial variations of the dielectric constant, the Poisson equation for electrostatics reads as:

$$\nabla \cdot \left( \varepsilon(r) \nabla \Phi \right) = -\rho \tag{1}$$

with the permittivity $\varepsilon(\mathbf{r}) = \varepsilon\_0 \varepsilon\_{rel}(\mathbf{r})$, the electrostatic potential, Φ, and the charge distribution, ρ. Equation (1) leads to complex boundary conditions at dielectric interfaces, which need to be taken into account by the underlying electrostatics method. Conducting media can, in principle, be treated as the ε → ∞ limit of the above equation and, thus, allow one to treat this case with very similar methods to those for dielectric contrast. Since these materials appear in important fields, such as energy storage and electrolyte capacitors, our described methods of treating dielectric contrast will also be useful there.
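To illustrate the boundary condition implied by Equation (1), the following minimal sketch solves the one-dimensional analogue, $\frac{d}{dx}\left(\varepsilon \frac{d\Phi}{dx}\right) = -\rho$, with a finite-difference scheme. It is not one of the algorithms reviewed in this article; the function name, the grid, the fixed end potentials and the use of arbitrary units are assumptions made for illustration only.

```python
def solve_poisson_1d(eps_mid, rho, phi_left, phi_right, h):
    """Solve d/dx( eps(x) dPhi/dx ) = -rho(x) on a uniform grid with fixed end potentials.

    eps_mid[i] is the permittivity between nodes i and i+1 (n cells),
    rho[k] is the charge density at node k (n+1 nodes).
    Returns the potential at all n+1 nodes.
    """
    n = len(eps_mid)
    m = n - 1  # number of interior unknowns (nodes 1 .. n-1)
    lower = [eps_mid[u] for u in range(m)]
    upper = [eps_mid[u + 1] for u in range(m)]
    diag = [-(eps_mid[u] + eps_mid[u + 1]) for u in range(m)]
    rhs = [-rho[u + 1] * h * h for u in range(m)]
    rhs[0] -= lower[0] * phi_left
    rhs[-1] -= upper[-1] * phi_right

    # Thomas algorithm for the tridiagonal system
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for u in range(1, m):
        denom = diag[u] - lower[u] * c_prime[u - 1]
        c_prime[u] = upper[u] / denom
        d_prime[u] = (rhs[u] - lower[u] * d_prime[u - 1]) / denom
    phi = [0.0] * m
    phi[-1] = d_prime[-1]
    for u in range(m - 2, -1, -1):
        phi[u] = d_prime[u] - c_prime[u] * phi[u + 1]
    return [phi_left] + phi + [phi_right]

# Two charge-free slabs of equal thickness: water-like (80) next to membrane-like (2).
n_cells = 100
h = 1.0 / n_cells
eps_mid = [80.0] * 50 + [2.0] * 50
rho = [0.0] * (n_cells + 1)
phi = solve_poisson_1d(eps_mid, rho, 0.0, 1.0, h)

# The displacement field eps * dPhi/dx is continuous across the interface,
# so almost all of the potential drop occurs in the low-permittivity slab.
flux_left = eps_mid[0] * (phi[1] - phi[0]) / h
flux_right = eps_mid[-1] * (phi[-1] - phi[-2]) / h
print(phi[50], flux_left, flux_right)
```

In this charge-free example the potential at the interface equals 2/82 of the applied difference, as expected for two dielectric slabs in series.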

In the following, we will describe how to fulfill the dielectric boundary conditions for planar surfaces using the concept of image charges [15–17]. These approaches can even be extended to the special case of two *connected* conducting surfaces. As an alternative route, we present the ICC\* (induced charge calculation) algorithm, which can handle arbitrarily-shaped surfaces [18,19]. Instead of computing image charge interactions, which is only feasible in some simple geometries, the method determines the induced charges at the surfaces self-consistently.

Both image charge and induced charge approaches can only handle boundaries between media of otherwise constant dielectric properties. In this article, we will also describe an extension of the Maxwell Equations Molecular Dynamics (MEMD) algorithm [20,21], which can also handle continuously varying dielectric constants. It solves a simplified version of the Maxwell electrodynamics equations on a discrete lattice.

The methods and results presented in this review were mostly published before in [16–19,22]. Implementations of all methods, including the features discussed here, can be found in ESPResSo, the Extensible Simulation Package for Research on Soft matter [22,23]. A review article on the general topic of long-range interactions in soft matter can be found in [24].

#### 2. Planar Interfaces: Image Charges

We start by discussing the simplest case of dielectric interfaces, namely, two planar, parallel interfaces that enclose a set of charges. We assume a vertical orientation of the interfaces and refer to a left and a right (l/r) interface. The electric field between the two interfaces can be computed from these charges, plus additional image charges outside of the dielectric boundaries [25]. The positions and magnitudes of these image charges are chosen to satisfy the boundary conditions for the electric field. If only one interface, the left or the right interface, were present, image charges would appear reflected at the respective interface, with the charge scaled down by a factor of:

$$
\Delta\_l = \frac{\varepsilon\_m - \varepsilon\_l}{\varepsilon\_m + \varepsilon\_l} \qquad \text{and} \qquad \Delta\_r = \frac{\varepsilon\_m - \varepsilon\_r}{\varepsilon\_m + \varepsilon\_r} \tag{2}
$$
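As a quick numerical illustration of Equation (2), the sketch below evaluates the contrast factor for the water and membrane values quoted in the introduction. The function name is a hypothetical helper, not part of any package.

```python
def dielectric_contrast(eps_m, eps_wall):
    """Image charge scaling factor for a medium eps_m next to a wall with eps_wall."""
    return (eps_m - eps_wall) / (eps_m + eps_wall)

# Water-like medium next to a membrane-like wall: 78/82, close to +1,
# so the image has nearly the full charge and the same sign (repulsion).
print(dielectric_contrast(80.0, 2.0))

# No dielectric jump: no image charge.
print(dielectric_contrast(80.0, 80.0))

# Very large wall permittivity: the factor approaches -1, the metallic limit.
print(dielectric_contrast(80.0, 1.0e12))
```

The sign of the factor decides whether a charge is repelled from the interface (positive) or attracted to it (negative).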

To construct the image charges in a situation with two interfaces, every image charge created by reflection on one interface also needs to be reflected again onto the other interface. This leads to an infinite set of images: A charge, q, is reflected at the right (left) interface and yields an image of the magnitude of $q\Delta\_r$ (resp. $q\Delta\_l$). The next reflection gives rise to another image charge, $q\Delta\_l\Delta\_r$ (resp. $q\Delta\_r\Delta\_l$), in the opposite dielectric domain, and so on. The infinite array of image charges is depicted in Figure 1: a charge, q, at position z will produce a series of mirror charges in the right dielectric domain (with $\varepsilon\_r$) with charges:

$$q\Delta^{n+1} \text{ at positions } -2(n+1)L\_z + z \quad \text{and} \quad q\Delta\_r\Delta^n \text{ at positions } -2nL\_z - z, n \ge 0 \tag{3}$$

where $L\_z$ denotes the distance between the two interfaces and $\Delta := \Delta\_r\Delta\_l$. In the left dielectric domain (with $\varepsilon\_l$), the charges are:

$$q\Delta^{n+1} \text{ at positions } 2(n+1)L\_z + z \quad \text{and} \quad q\Delta\_l\Delta^n \text{ at positions } 2(n+1)L\_z - z, \; n \ge 0 \tag{4}$$
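The repeated-reflection construction can be sketched in a few lines and compared with the closed-form series. The sketch assumes interfaces at $z = 0$ and $z = L\_z$; the parameter names `delta_0` and `delta_L` (contrast at the plane $z = 0$ and at $z = L\_z$) are hypothetical, and the closed form reproduces the prefactors and positions of Equations (3) and (4) when `delta_0` plays the role of $\Delta\_r$ and `delta_L` that of $\Delta\_l$.

```python
def images_by_reflection(q, z, L_z, delta_0, delta_L, n_reflections):
    """Build image charges by alternately mirroring at the planes z = 0 and z = L_z.

    Two chains are followed: one whose first reflection is at z = 0 and one
    whose first reflection is at z = L_z. Returns (charge, position) pairs.
    """
    images = []
    for first_plane in (0, 1):
        charge, pos, plane = q, z, first_plane
        for _ in range(n_reflections):
            if plane == 0:
                charge, pos = charge * delta_0, -pos
            else:
                charge, pos = charge * delta_L, 2.0 * L_z - pos
            images.append((charge, pos))
            plane = 1 - plane  # the next mirror is the opposite plane
    return images

def images_closed_form(q, z, L_z, delta_0, delta_L, n_max):
    """Same images from the closed-form series, for n = 0 .. n_max - 1."""
    delta = delta_0 * delta_L
    images = []
    for n in range(n_max):
        images.append((q * delta ** (n + 1), -2 * (n + 1) * L_z + z))
        images.append((q * delta_0 * delta ** n, -2 * n * L_z - z))
        images.append((q * delta ** (n + 1), 2 * (n + 1) * L_z + z))
        images.append((q * delta_L * delta ** n, 2 * (n + 1) * L_z - z))
    return images

# Illustrative values: unit charge at z = 0.3 in a slab of width 1.
by_reflection = sorted(images_by_reflection(1.0, 0.3, 1.0, 0.5, -0.25, 6),
                       key=lambda image: image[1])
closed_form = sorted(images_closed_form(1.0, 0.3, 1.0, 0.5, -0.25, 3),
                     key=lambda image: image[1])
for a, b in zip(by_reflection, closed_form):
    print(a, b)
```

Each chain of `2 * n_max` reflections yields the terms `n = 0 .. n_max - 1` of two of the four series, so both functions return the same `4 * n_max` images.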

Figure 1. Schematic summation scheme for Image Charge MMM2D (ICMMM2D): in order to take into account dielectric boundaries, image charges are introduced outside the dielectric boundaries, to the left and right of the original box. The dielectric contrasts, $\Delta\_l$ and $\Delta\_r$, are computed from the dielectric jump at the left and right boundaries, respectively. Depending on the dielectric contrasts, charges will either be repelled by the surface (as in this sketch) or attracted to it. Note that in usual computer simulations, the system is periodically replicated in the dimensions parallel to the interfaces, as indicated in the figure.

When computing the electrostatic interaction in a computer simulation with such parallel, infinite planar walls, it is often desired to employ periodic boundary conditions in the two directions parallel to the walls to minimize surface effects. The direct summation of periodic replicas is very costly, as the sum is only slowly convergent. Thus, special techniques to compute the electrostatic interactions are required. MMM2D is such an algorithm that computes the electrostatic interaction with two periodic dimensions [26,27] and is well suited for computing the interactions with the image charges. The key idea of the MMM2D algorithm is to use two different formulas for the interaction of two charges. One of the formulas, the near formula, is only used if the two particles are sufficiently close. In combination with image charges, it is only used for the closest of all images, and its discussion is beyond the scope of this article.

The second, far formula is accurate if a certain distance between the two charges is exceeded. We assume a simulation box of $L \times L \times L\_z$ that is periodically replicated in the x and y dimensions. Then, the Coulomb potential of a unit charge placed at the origin evaluated at position (x, y, z) with |z| > 0, including periodic replicas in the x and y direction, can be written as:

$$\Phi(x,y,z) = \frac{1}{4\pi\varepsilon L^2} \sum\_{p,q \neq 0} \frac{e^{-2\pi f\_{pq}|z|}}{f\_{pq}} e^{i(2\pi px + 2\pi qy)/L} + \frac{1}{2\varepsilon L^2} |z| \tag{5}$$

where $f\_{pq} := \sqrt{p^2 + q^2}/L$. This formula follows from a Fourier transformation of the Poisson equation in the x and y direction and can be factorized into contributions. This makes it possible to compute the interactions between N separated charges in O(N) operations. Due to the unfavorable scaling of the near formula, MMM2D scales only like $O(N^{5/3})$ overall. It is, however, still superior to Ewald-based methods [9] in partially periodic boundary conditions.

Coming back to our original problem of taking into account the infinite array of image charges, we note that all these charges except those directly adjacent, i.e., all image charges with index n ≥ 1, are far from the slab containing the real charges. Therefore, we can apply the fast far formula to compute the interaction with these charges. The infinite sums over the image charges lead to expressions of the form:

$$\sum\_{n=0}^{\infty} \Delta^n \frac{1}{4\pi\varepsilon\_m L^2} \sum\_{p, q \neq 0} \frac{e^{-2\pi f\_{pq} (2nL\_z \pm z)}}{f\_{pq}} e^{i(2\pi px + 2\pi qy)/L} \tag{6}$$

which is a geometric series that can be easily simplified to:

$$\frac{1}{4\pi\varepsilon\_m L^2} \sum\_{p,q\neq 0} \frac{e^{\mp 2\pi f\_{pq}z}}{f\_{pq}(1-\Delta e^{-4\pi f\_{pq}L\_z})} e^{i(2\pi px + 2\pi qy)/L} \tag{7}$$

In other words, an existing implementation of MMM2D can be easily enhanced in order to include dielectric interfaces, simply by modifying the prefactors of the p, q-Fourier sum. For the detailed expressions, see [16]. In the following, we denote such an MMM2D implementation for dielectric interfaces as ICMMM2D (Image Charge MMM2D).
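The closed form of the geometric series in Equation (6) can be checked numerically for a single Fourier mode. The following sketch is only an illustration: the function names and the numerical values are hypothetical, and Δ is simply treated as a given number with |Δ| < 1. It compares the truncated sum over n with the expression e<sup>∓2πfz</sup>/(1 − Δe<sup>−4πfL<sub>z</sub></sup>) for both signs.

```python
import math


def image_sum_direct(delta, f_pq, l_z, z, sign, n_terms=200):
    """Truncated sum over n of delta**n * exp(-2*pi*f_pq*(2*n*l_z + sign*z))."""
    return sum(delta**n * math.exp(-2.0 * math.pi * f_pq * (2 * n * l_z + sign * z))
               for n in range(n_terms))


def image_sum_closed(delta, f_pq, l_z, z, sign):
    """Closed form: exp(-sign*2*pi*f_pq*z) / (1 - delta*exp(-4*pi*f_pq*l_z))."""
    ratio = delta * math.exp(-4.0 * math.pi * f_pq * l_z)
    return math.exp(-sign * 2.0 * math.pi * f_pq * z) / (1.0 - ratio)


# Hypothetical values: one low mode, strong dielectric contrast.
for sign in (+1, -1):
    direct = image_sum_direct(0.9, 0.1, 1.0, 0.3, sign)
    closed = image_sum_closed(0.9, 0.1, 1.0, 0.3, sign)
    print(sign, direct, closed)
```

Since the ratio of successive terms, Δe<sup>−4πfL<sub>z</sub></sup>, has a magnitude below one for every mode with f > 0, the series converges for any dielectric contrast. The lowest modes converge slowest.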

Note that Equation (5) contains an additional term, |z|/(2ε<sub>m</sub>L<sup>2</sup>), which represents the constant Fourier mode. The summation over the image charges of this term leads to expressions of the form:

$$\frac{1}{2\varepsilon\_m L^2} \sum\_{n\geq 0} \Delta^n \left( 2nL\_z \pm z \right) \tag{8}$$

since 2nL<sub>z</sub> is larger than any possible particle distance z. Unlike Equation (6), this is not a simple geometric series. However, when computing the total potential in a charge neutral system, terms that do not depend on the positions cancel out, in particular, the 2nL<sub>z</sub> terms. What remains from the four image charge sums are again geometric series that lead to a contribution of:

$$\frac{1}{2\varepsilon\_m L^2} \frac{\Delta\_r - \Delta\_l}{1 - \Delta} z \tag{9}$$

If the two planar walls have the same dielectric contrast, this contribution to the potential vanishes. Furthermore, if one is only interested in energy or forces, this contribution vanishes, due to charge neutrality.

An alternative method for planar dielectric interfaces is based on an extension of our ELC method. It also sums up the image charges with the ICMMM2D far formula, and we termed it ELCIC (ELC with Image Charges); see [17] for details of the method.

Note that it is also possible to consider systems that are not charge neutral. Formally, one assumes two equally charged plates at the positions of the dielectric interfaces that cancel the total charge [28]. The field of a charged plate is, however, exactly what the |z|/(2ε<sub>m</sub>L<sup>2</sup>) term represents, so that the above considerations for this term still hold. Therefore, one can safely ignore this contribution.

Figure 2. (a) Sketch of our simulation setup, a 3:1 electrolyte, e.g., AlCl<sub>3</sub>, between two walls with a dielectric constant different from that of water. The size difference between the ion types is neglected in this simulation. The slab is periodically replicated parallel to the walls, vertical in the sketch. (b) Density distribution of anions and cations of the trivalent electrolyte, near the dielectric interface. The dielectric interface is placed at x = 0, and a repulsive potential maintains a minimum distance of 0.5 nanometers for all ions. A good dielectric, ε<sub>C</sub> = 800, representing conducting material, strongly attracts cations, while anions are less attracted. A bad dielectric, ε<sub>C</sub> = 2, representing a typical biological membrane, repels cations. The univalent anions are much less repelled.

#### *2.1. Example: Electrolyte between Dielectric Walls*

As an example application of the ICMMM2D algorithm, we simulated a 3:1 electrolyte (e.g., AlCl<sub>3</sub>) with a concentration of 0.01 mol/L confined by planar walls to a slab, as depicted in Figure 2a. The size difference between positive and negative ions has been neglected here. This is a good model for the narrow slit between the electrodes of a capacitor, as well as between two biological membranes or glass plates. However, the dielectric properties of a metal electrode (here approximated with ε<sub>C</sub> ≈ 800) and a biological membrane (ε<sub>C</sub> ≈ 2) are very different and in both cases also differ strongly from the permittivity of the solvent; here, water (ε<sub>W</sub> = 80). The resulting dielectric jumps at the surfaces have a pronounced effect on the distribution of ions near the walls.

Figure 2b shows this strong influence of the dielectric interfaces. Both cations and anions are attracted to the walls of high permittivity (ε<sub>C</sub> = 800), but repelled by the low permittivity (ε<sub>C</sub> = 2) walls. On a microscopic level, the electric field of a charge gives rise to a dielectric displacement in the wall. This displacement weakens or strengthens the field within the dielectric medium, compared to the other side of the interface. It can be accounted for by imagining virtual mirror charges or surface charges directly on the interface. To correctly reproduce the field discontinuities at the boundary, these charges attract a charge located in the region of lower permittivity and repel a charge located in the region of higher permittivity. Note also that the effect is more pronounced for multivalent ions. Ignoring the dielectric jumps would lead to a much more homogeneous charge distribution, so that one would strongly underestimate the effect of including multivalent ions.
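The sign and the valency dependence of this effect can be estimated with a deliberately simplified picture: one ion in front of a single planar wall, without salt screening and without the second wall. The sketch below rests on several assumptions that are not part of the simulations described above. The dielectric contrast is taken as (ε<sub>W</sub> − ε<sub>C</sub>)/(ε<sub>W</sub> + ε<sub>C</sub>), the image charge is the ion charge times this contrast, and the Bjerrum length of water is set to roughly 0.71 nm. All function names are hypothetical.

```python
def dielectric_contrast(eps_medium, eps_wall):
    """Contrast seen by a charge in `eps_medium` in front of a wall `eps_wall`."""
    return (eps_medium - eps_wall) / (eps_medium + eps_wall)


def image_self_energy(valency, distance_nm, contrast, bjerrum_nm=0.71):
    """Energy (in k_B T) of an ion interacting with its own image charge.

    The image sits at distance 2*d with charge contrast*q; the factor 1/2 for
    a self-energy gives l_B * z^2 * contrast / (4 * d).
    """
    return bjerrum_nm * valency**2 * contrast / (4.0 * distance_nm)


EPS_WATER = 80.0
for eps_wall in (800.0, 2.0):
    delta = dielectric_contrast(EPS_WATER, eps_wall)
    for valency in (1, 3):
        # Negative energy: attraction to the wall; positive: repulsion.
        print(eps_wall, valency, image_self_energy(valency, 0.5, delta))
```

In this picture, the energy scales with the square of the valency, so a trivalent ion feels the wall nine times more strongly than a monovalent one at the same distance.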

#### 3. Arbitrarily-Shaped Interfaces: Induced Charges

The concept of induced charges rather than image charges is a direct route to take into account dielectric interfaces of arbitrary shape. Conventionally, charge induction is considered to be the origin of Faraday's cage effect: applying an electric field to a conductor will trigger the mobile charges inside to move until the electric field vanishes inside and field lines end orthogonally to the surface of induced charges. The same concept, however, can also be applied to dielectrics, i.e., materials with immobile charges. This can be seen from the following mathematical consideration. The Poisson equation in an inhomogeneous dielectric medium (1) can be rewritten as:

$$
\Delta\Phi = -\frac{1}{\varepsilon}\rho - \frac{1}{\varepsilon}\nabla\varepsilon \cdot \nabla\Phi \tag{10}
$$

The term, ∇ε · ∇Φ, is identified as the *induced surface charge density*, σ. It is nonzero only on the dielectric interfaces, since ∇ε vanishes everywhere else.

Let us introduce the Green's function, G, for the Laplace operator. It can, e.g., be just 1/(4π|r − r′|), but may also include the desired periodicity. Then, it is possible to eliminate the Laplace operator from the equation above and express the potential by means of two integrals:

$$\Phi = \int\_{V} G\left(r, r'\right) \rho\left(r'\right) / \varepsilon \,\mathrm{d}V' + \int\_{A} G\left(r, r'\right) \sigma\left(r'\right) / \varepsilon \,\mathrm{d}A' \tag{11}$$

The volume integral extends over medium 1, and the surface integral extends over all dielectric interfaces. The potential is now expressed in terms of the Green's function of a homogeneous dielectric, yet the induced charge density, σ, is still unknown. We now assume that the charges are embedded in a medium with permittivity, ε1, and for simplicity, only a second permittivity, ε2. By taking the gradient and inserting this expression in the definition of the induced charge density, we obtain the following integral equation:

$$\sigma = 2\frac{\varepsilon\_1 - \varepsilon\_2}{\varepsilon\_1 + \varepsilon\_2} \left( \int\_V \nabla\_r G\left(r, r'\right) \rho\left(r'\right) / \varepsilon\_1 \mathrm{d}V' + \int\_A \nabla\_r G\left(r, r'\right) \sigma\left(r'\right) / \varepsilon\_1 \mathrm{d}A' \right) \tag{12}$$

This result is easily generalized to multiple regions with different permittivities. The idea of the ICC\* algorithm [18] is to determine this charge density self-consistently, after discretizing the surface.

In principle, this approach is a boundary element method, an approach that is very widely used, e.g., for low Reynolds number flows [29]. What distinguishes it from other approaches, however, is that the efficient evaluation of the Green's function can be borrowed from standard Coulomb solvers. This can be seen from the following: assuming a discretized surface of m point charges on the dielectric interface, the equation above for discretization point k can be written as:

$$q\_k = A\_k \frac{\varepsilon\_1 - \varepsilon\_2}{\varepsilon\_1 + \varepsilon\_2} n\_k \cdot \left[ \sum\_{i=1}^n q\_i \nabla\_{r\_k} G\left(\mathbf{r}\_k, \mathbf{r}\_i\right) + \sum\_{j=1, j \neq k}^m q\_j \nabla\_{r\_k} G\left(\mathbf{r}\_k, \mathbf{r}\_j\right) \right]$$

where A<sub>k</sub> is the surface area of the surface element, k. The term in square brackets is just the electric field acting at the position of point k, assuming a homogeneous dielectric constant, ε1, in the system, created by conventional (not induced) charges. Any standard Coulomb solver can thus be used to perform the calculation. The desired solution of all q<sub>k</sub> is the fixed point of the following iteration:

$$q\_k^{l+1} = \left(1 - \omega\right) q\_k^l + \omega A\_k \frac{\varepsilon\_1 - \varepsilon\_2}{\varepsilon\_1 + \varepsilon\_2} n\_k \cdot \mathbf{E}\left(\left[q\_i\right], \left[q\_j^l\right]\right),$$

where ω is a relaxation parameter. This iteration has proven very stable: with a choice of ω ≈ 0.7, no stability issues occur. In every MD step, only 1–3 iterations are necessary, as the particle positions change only slightly.
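The relaxation scheme can be sketched in a few lines. The code below is not the ICC\* implementation of any simulation package. It assumes a plain direct-sum Coulomb field in units with 4πε<sub>1</sub> = 1, surface normals pointing from medium 1 into medium 2, and, as a consequence of this unit choice, an explicit prefactor 1/(2π) that the equations above do not display. All function names and the test geometry (one ion in front of a flat, finite wall patch) are hypothetical.

```python
import math


def coulomb_field(point, charges):
    """Field at `point` from point charges [(q, (x, y, z)), ...], with 4*pi*eps_1 = 1."""
    ex = ey = ez = 0.0
    for q, (x, y, z) in charges:
        dx, dy, dz = point[0] - x, point[1] - y, point[2] - z
        r3 = (dx * dx + dy * dy + dz * dz) ** 1.5
        ex += q * dx / r3
        ey += q * dy / r3
        ez += q * dz / r3
    return ex, ey, ez


def icc_iterate(ions, surface, eps_1, eps_2, omega=0.7, tol=1e-10,
                max_iter=100, initial=None):
    """Relax the induced charges on `surface` = [(position, normal, area), ...].

    Normals are assumed to point from medium 1 (containing the ions) into
    medium 2. Returns the induced charges and the number of iterations used.
    """
    prefactor = (eps_1 - eps_2) / (eps_1 + eps_2) / (2.0 * math.pi)
    induced = list(initial) if initial is not None else [0.0] * len(surface)
    for iteration in range(1, max_iter + 1):
        updated = []
        for k, (pos, normal, area) in enumerate(surface):
            # All ions plus all other surface charges from the previous sweep.
            sources = ions + [(induced[j], surface[j][0])
                              for j in range(len(surface)) if j != k]
            field = coulomb_field(pos, sources)
            n_dot_e = sum(n * e for n, e in zip(normal, field))
            updated.append((1.0 - omega) * induced[k]
                           + omega * area * prefactor * n_dot_e)
        change = max(abs(new - old) for new, old in zip(updated, induced))
        induced = updated
        if change < tol:
            break
    return induced, iteration


def flat_wall(n_side, spacing):
    """Square patch of the plane z = 0, medium 1 above it, normals along -z."""
    offset = 0.5 * (n_side - 1) * spacing
    return [((i * spacing - offset, j * spacing - offset, 0.0),
             (0.0, 0.0, -1.0), spacing**2)
            for i in range(n_side) for j in range(n_side)]


ions = [(1.0, (0.0, 0.0, 1.0))]   # one unit charge at height 1 above the wall
wall = flat_wall(n_side=9, spacing=0.5)
induced, n_iter = icc_iterate(ions, wall, eps_1=80.0, eps_2=2.0)
print(n_iter, sum(induced))
```

For a single flat wall, the induced charges exert no normal field on each other, so the fixed point coincides with the surface charge of the image charge picture. This makes the geometry a convenient consistency check. Curved surfaces or several interfaces couple the q<sub>k</sub>, and the self-consistent iteration then becomes essential. Passing the charges of the previous MD step as `initial` mimics the warm start that keeps the number of iterations per step small.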

An important advantage of this algorithm is that the computationally most costly part, the evaluation of the electric field, can be done with *any* usual electrostatics solver without modifications. Thus, not only the computational efficiency, but also the periodicity is inherited from the Coulomb solver. The complexity of the algorithm remains unchanged by the presence of induced surface charges. However, the number of particles can increase considerably. We found it sufficient to discretize the surface with mutual particle distances equal to the distance of closest approach. For the system shown in Section 2, this would mean, in total, 1,600 surface charges per wall, compared to fewer than 100 ions in the system.

In our research, this algorithm was applied to investigate if dielectric effects can change the electrolytic conductance of very narrow pores, nanopores, through membranes or have an influence on the translocation of charged macromolecules through nanopores [19,30]. Here, dielectric boundary forces lead to a repulsion of (unpaired) ions. Taking into account dielectric effects in small pores will decrease the number of available ions and, thus, decrease the conductance.

The error in the obtained electric field depends on the resolution with which the surface is discretized. The method has been tested for planar and curved surfaces [18], and it was found that at distances larger than one lattice spacing, the relative error remains smaller than 1%. Since the permissible error often depends on the desired application, we advise determining the necessary accuracy specifically for each case. If interfaces with media with a high dielectric constant or even metallic boundaries are considered, charges are attracted to the surface and can come quite close to the interface, depending on the ion size. In this case, the necessary accuracy is clearly higher than for interfaces with lower dielectric media, from which particles are repelled.

#### *3.1. Example: Ion Distribution in a Pore*

As an example, we investigate the ion distribution in a cylindrical pore of radius 5 nm in a 40 nm-thick membrane in an aqueous electrolyte. This geometry resembles the so-called solid state nanopores [31–33] in silicon wafers. In several experiments (e.g., [34,35]), it was shown that single DNA molecules present in such a pore can be detected by the change of the electric conductance of such a system. To make the dielectric effect more pronounced, we again used a 3:1 electrolyte with a concentration of 10 mmol/L. Our setup is sketched in Figure 3a: the surface charges of the ICC\* algorithm are displayed along with the ions, each as spheres. We assume the dielectric constant of the membrane to be ε<sub>P</sub> = 2. In Figure 3b, the equilibrium distribution of ions near the center of the pore is shown. Ions, especially the trivalent ones, are repelled from the dielectric interface. This leads to an overall decrease of the number of ions in the pore by around 20%. Thus, the conductance of the system can be expected to be reduced similarly, compared to a model that does not consider the dielectric contrast.

Figure 3. (a) The induced charge calculation (ICC\*) example system: positive and negative ions are displayed as red and blue spheres, the ICC\* discretization points by grey spheres; (b) ion density of both species in the pore measured close to the center of the pore.

#### 4. Metallic Interfaces: Corrections

Metallic boundary conditions are the ε → ∞ limit of Equations (2) or (11), where ε denotes the dielectric constant of the surrounding medium. The corresponding dielectric contrasts become −1, so that the field automatically vanishes in the conductor. In the case of two dielectric interfaces, this simply leads to constant potentials on each of the two interfaces, but the potentials are not necessarily the same. It is, however, also possible to fix the electrostatic potential on surfaces in periodic systems. This has been done, for example, for the Monte Carlo implementation of the local algorithm presented in Section 5 [13,14]. The fixing of the surface potential in periodic systems can be achieved also for other algorithms, but the details are different, and we therefore briefly describe the necessary ingredients for the algorithms previously described.

Figure 4. (a) Illustration of the dielectric boundary problem of a single charge q outside of a grounded conducting sphere. The problem can be solved by assuming an image charge, q′, inside the sphere, leading to zero potential Φ on its surface. If the sphere is assumed to be conducting, but isolated, the excess charge, q′, has to be canceled by adding a second charge, −q′, in the center of the sphere, which leads to a constant surface potential, Φ (adapted from [25]). (b) A more complex geometry with an upper and lower electrode (yellow). The electrodes are treated with the ICC\* algorithm. If a Coulomb solver with periodic boundary conditions (BCs) in the vertical direction is applied, the potential difference between both electrodes is automatically zero. This is because the periodicity yields zero potential difference between an electrode and its periodic image, and the ICC\* algorithm ensures that the two electrodes connected over periodic BCs are on equal potential.

The starting point is the textbook example of a single point charge outside of a conducting sphere, as depicted in Figure 4a. A metallic sphere brought into an electric field can either be isolated, *i.e.*, the charge on the sphere is constant, or on a constant electrostatic potential, typically grounded. Again, the boundary problem can be solved by adding an image charge opposite to the source charge in the sphere. This ensures that the surface potential of the sphere does not vary. A second image charge can be placed at the center, which dictates the electrostatic potential at the surface of the sphere: for an isolated sphere with zero net charge, the image charge at the center must be of the same magnitude as the other image charge, but with the opposite sign. For a grounded sphere, it is simply zero. In the following, we will show how conducting or grounded boundary conditions can be added to both the image charge and induced charge methods, in order to perform computer simulations with constant electrostatic metallic surface potentials.

Using the MMM2D far formula, the potential difference between the two plates appears as the p = q = 0 mode of the electrostatic potential, *i.e.*, the 2πl<sub>b</sub>|z|/L<sup>2</sup> term in Equation (5). The higher modes, in the limit of ε → ∞, serve to hold the potential constant on each conducting plate individually. Due to the absolute value, this term does not cancel when considering the interactions in the primary box. The potential difference, V, of the two plates is:

$$V = \frac{1}{2\varepsilon L^2} \int\_0^L \int\_0^L \int\_0^{L\_z} \rho\left(x, y, z\right) \underbrace{\left(|L\_z - z| - |z|\right)}\_{L\_z - 2z} \mathrm{d}z \, \mathrm{d}y \, \mathrm{d}x$$

where ρ denotes the charge density. In terms of the z-component of the dipole moment of the system, P<sub>z</sub> = Σ<sub>i</sub> q<sub>i</sub>z<sub>i</sub>, this equals:

$$V = -\frac{1}{\varepsilon L^2} P\_z$$

since the L<sub>z</sub> contribution vanishes once more due to charge neutrality. In order to cancel this potential difference, one has to apply a constant external field *E* = (0, 0, P<sub>z</sub>/(εL<sup>2</sup>L<sub>z</sub>)) in every MD step. This additional field corresponds to that created by the central charge in the spherical image charge picture, which puts the surfaces to equal potential. To obtain a different fixed potential ΔΦ<sub>0</sub>, an additional field, ΔΦ<sub>0</sub>/L<sub>z</sub>, needs to be applied in the z direction.
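The bookkeeping of this correction is simple enough to write down directly. The following sketch uses hypothetical function names and reduced units and treats the charges as a list of (q, z) pairs. It evaluates the potential difference once from the sum over q<sub>i</sub>(L<sub>z</sub> − 2z<sub>i</sub>) and once from the dipole moment, and it returns the z-component of the constant field described above.

```python
def dipole_moment_z(charges):
    """P_z = sum_i q_i z_i for charges given as (q, z) pairs."""
    return sum(q * z for q, z in charges)


def plate_potential_difference(charges, box_l, box_lz, eps):
    """V from the p = q = 0 mode: (1 / (2 eps L^2)) * sum_i q_i (L_z - 2 z_i)."""
    return sum(q * (box_lz - 2.0 * z) for q, z in charges) / (2.0 * eps * box_l**2)


def correction_field_z(charges, box_l, box_lz, eps, delta_phi0=0.0):
    """z-component of the constant field, P_z / (eps L^2 L_z) + delta_phi0 / L_z."""
    return (dipole_moment_z(charges) / (eps * box_l**2 * box_lz)
            + delta_phi0 / box_lz)


# Hypothetical neutral system: +1 at z = 2.5 and -1 at z = 7.5 in a 10 x 10 x 10 box.
charges = [(1.0, 2.5), (-1.0, 7.5)]
v = plate_potential_difference(charges, box_l=10.0, box_lz=10.0, eps=1.0)
e_z = correction_field_z(charges, box_l=10.0, box_lz=10.0, eps=1.0)
print(v, e_z, e_z * 10.0 + v)
```

For a charge neutral system, the field times L<sub>z</sub> is exactly the negative of V, so the applied field compensates the potential difference between the plates. Since P<sub>z</sub> changes as the ions move, the field has to be recomputed in every MD step.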

Systems treated with ICC\* require no special measures if the surface on the constant potential is connected within the simulation box. However, some attention is required when considering electrodes at the boundary of the simulation box, as depicted in Figure 4b. If an electrostatics method is applied that is not periodic in the respective direction, this will result in two electrically unconnected surfaces. To obtain electrically connected surfaces, such as two grounded plates, it is sufficient to use a solver periodic in the required direction and to leave a gap in the periodic images. This can be seen from the simple case depicted in Figure 4b. If a periodic solver with metallic boundary conditions is used, for example, the Ewald summation, the difference between the electrostatic potential at a given position and its nearest periodic image is necessarily zero, due to periodicity. However, since the induced charges create a constant potential in both sections of the conductor, these must be the same throughout the whole, periodically connected conductor.

To obtain a non-zero electrostatic surface potential, the solution of the Poisson equation with zero potential can be superimposed with a solution of the empty simulation box with nonzero potential on the surfaces. This requires a solution of the Laplace equation that is then applied as an external field, just as in the simple case of parallel plates. To do that, our simulation package, ESPResSo, supports reading in tabulated external potentials, which are applied to charged particles weighted by their respective charge. It also takes care that the external potential is not applied to image charges. The solution of the Laplace equation has to be obtained externally. We use a finite element solution based on the DUNE framework [36,37]. Packages like Matlab or Comsol can, of course, also be used.

#### *4.1. Verification: Field and Potential in Metallic BCs*

In order to illustrate the methods described above, we demonstrate their importance using a simple model system. We chose a planar geometry, because in this geometry, it is possible to use both the image charge method and the induced charge method. Thus, our system consists of a set of charges confined by two parallel conducting planes. In the following, we will show that isolated plates can be simulated either by using the ICMMM2D algorithm without correction or by using the ICC\* method with a Coulomb solver that is not periodic in the direction of the planes' normal vectors. Connected plates at zero potential difference can either be simulated using the ICMMM2D method with the correction derived above or using the ICC\* method with a Coulomb solver that is periodic in the normal direction. As a solver for the fully periodic case, we use the P<sup>3</sup>M algorithm [8,38]; as a solver for the partially periodic case, MMM2D [26,27].

Figure 5. (a) Sketch of the model system that was used to probe the influence of grounded and isolated metallic boundary conditions. The two possible setups are depicted by adding a switch to electrically connect the two plates. (b) The resulting electrostatic field, E, perpendicular to the plates and the electrostatic potential in the slab.

For simplicity, we construct a constant charge distribution with a net dipole moment and probe the electrostatic field with a small test charge q = 10<sup>−9</sup>e that is moved through the system. Metallic boundary conditions are created at z = 0 and z = L<sub>z</sub> = 10 nm by using the four algorithms described above. The dielectric permittivity between the surrounding metal plates is assumed to be ε<sub>W</sub> = 80 for bulk water. The charge distribution we chose is depicted in Figure 5a: fixed, charged particles form two oppositely charged plates at z = 0.25L<sub>z</sub> and 0.75L<sub>z</sub>. In one of the two directions parallel to the surrounding metal plates, a void of a length of 0.5L<sub>x</sub> is left, so that the test charge can be moved through the gaps without getting close to the charged plates. We measure the electric field by performing a force calculation with the respective algorithm and dividing the obtained force by the small value of the test charge. In Figure 5b, we report the measured electric field and the potential obtained from the integration of the electric field. We observe the expected behavior: the shape of the electric field is identical in all cases up to a constant. In the cases where the algorithms are supposed to simulate two connected metal electrodes, the electric field is shifted downwards, so that the integral is zero, and both electrodes have the same potential.
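The post-processing step described here, integrating a sampled field to a potential and comparing the isolated and connected cases, can be sketched as follows. The field profile in the example is purely hypothetical (a constant field between the two charged sheets and zero elsewhere) and is not data from Figure 5b. The function names are assumptions as well.

```python
def potential_from_field(z, e_z):
    """Phi(z) = -integral of E_z dz (trapezoidal rule), with Phi(z[0]) = 0."""
    phi = [0.0]
    for i in range(1, len(z)):
        phi.append(phi[-1] - 0.5 * (e_z[i] + e_z[i - 1]) * (z[i] - z[i - 1]))
    return phi


def shift_to_equal_plate_potential(z, e_z):
    """Subtract the constant field that makes Phi equal at both ends."""
    mean_field = -potential_from_field(z, e_z)[-1] / (z[-1] - z[0])
    return [e - mean_field for e in e_z]


# Hypothetical profile on a grid between the plates at z = 0 and z = L_z.
L_Z = 10.0
z_grid = [L_Z * i / 200 for i in range(201)]
field = [1.0 if 0.25 * L_Z < z < 0.75 * L_Z else 0.0 for z in z_grid]
shifted = shift_to_equal_plate_potential(z_grid, field)
print(potential_from_field(z_grid, field)[-1],
      potential_from_field(z_grid, shifted)[-1])
```

The shifted profile differs from the original one only by a constant, which is the signature of connected electrodes described above.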

These two methods are, in our opinion, very interesting for investigating supercapacitors based on electrolytes or ionic liquids (e.g., [39–41]). They are complementary in the sense that the image charge method is computationally very cheap: the extra cost of image charges is typically negligible, and the only model parameter is the distance of closest approach between ions and the metallic interface. The induced charge methods, however, allow for arbitrarily-shaped surfaces, and one can investigate very complex geometries. The extra computational cost is feasible if the resolution of the surface can be relatively coarse or, in other words, a certain electrostatic "roughness" or inaccuracy is acceptable and can be considered as an adjustable model parameter.

#### 5. Smooth Variations: Local Method

We have presented methods to deal with sharp dielectric interfaces of several types and shapes. Yet, none of these methods offers the possibility of a spatially smoothly varying permittivity. More precisely, they are all restricted to single step-like changes and do not allow charges to pass through those regions of variation.

In the following, we sketch an algorithm that allows for an arbitrary distribution, ε(*r*), of the permittivity that we call Maxwell Equations Molecular Dynamics (MEMD). The concept of diffusive field propagation was first presented by Maggs in 2002 [20,42] and adapted for molecular dynamics simulations simultaneously by Rottler and Maggs [43] and Dünweg and Pasichnyk [21]. The algorithm is not based on the static Poisson Equation (1), which is of a global nature. Instead, the time derivative of Gauss' law, ∇ · *D* = ρ, of electrodynamics, with *D* = ε(*r*)*E*, is extended to the following constraint of the most general form:

$$
\dot{\mathbf{D}} + \mathbf{j} - \nabla \times \dot{\boldsymbol{\Theta}} = 0 \tag{13}
$$

where *j* denotes the local electric current and **Θ** is an arbitrary vector field representing an additional degree of freedom. If we apply this constraint to the system propagation via a Lagrange multiplier, *A*, fix the gauge degree of freedom from Equation (13) to:

$$
\dot{\mathbf{A}} = -\mathbf{D},\qquad \text{define}\tag{14}
$$

$$B \; := \; \nabla \times A \tag{15}$$

and introduce the magnetic field, *B*, then this so-called temporal (or Weyl) gauge will, via variational calculus, lead to the equations of motion for the charges and fields that are known as the Maxwell equations. The actual electrostatic potential, Φ, is never calculated in this algorithm, only the electric field, *E*, for Lorentz force calculation:

$$\mathbf{F}\_L = q \left( \mathbf{E} + \frac{1}{c^2} \mathbf{v} \times \mathbf{B} \right) \tag{16}$$

It is remarkable that simply by applying constraint (13) and the Weyl gauge, (14), the complete equations of the electromagnetic formalism can be reproduced. It can even be shown [20] that the propagation speed of the magnetic field, an equivalent of the speed of light, c, can be reduced in a Car-Parrinello manner, and correct retarded solutions for statistical observables can still be maintained.

This reduces the elliptic partial differential Equation (1) to a set of hyperbolic differential equations for the propagation of magnetic fields and charges, requiring only local operations for the solution. It therefore opens up the possibility of arbitrary local dielectric permittivities within the system. If discretized on a lattice and coupled with a linear next-neighbor interpolation scheme for the charges and electric currents, the permittivity can be set individually on every lattice link [44]. The discretization is carried out as seen in Figure 6a. This is in agreement with ε(*r*) being a differential one-form, if we assume the tensor to only have identical diagonal entries (optically isotropic dielectric medium).

In this algorithm, the charges can move freely through a smoothly varying dielectric medium. At the current implementation state, the variations are only spatial, but temporal changes of the dielectric during the simulation are theoretically possible. It has also been shown that the field propagation within the system reproduces the classical Keesom potential interactions between two dielectric interfaces [45].

Figure 6. Maxwell Equations Molecular Dynamics (MEMD) interpolation of the charges onto the lattice. (a) The electric current, *j*, permittivity ε and electric field *D* are interpolated to the adjacent lattice links. The magnetic field component, *B*, is placed on the lattice plaquettes via a finite-differences curl (∇×) operator. (b) The numerical error of the algorithm is dependent on the lattice spacing. The error can be reduced by applying a coarser mesh, coming from the right in this graph, and, thereby, increasing the field propagation speed in the system. However, at large lattice spacings a, the interpolation error at small distances dominates and diverges. In a densely-populated system, like the examples seen here, a minimal relative error of 10<sup>−3</sup> is achieved at mesh sizes comparable to the minimum distance of two charges, denoted here by σ. For reference, the errors compared to a high precision P<sup>3</sup>M force calculation are included for three example systems.

One source of numerical errors in this algorithm stems from the retarded solution of the Maxwell equations with the Lorentz force calculation (16) at a finite speed of light, c. The second error is introduced via the linear lattice interpolation of the charges. The self-interaction of a charge with the lattice can be corrected, for example, using a lattice Green's function for constant dielectric backgrounds, but for locally varying dielectric properties, the implementation relies on a straightforward direct subtraction scheme. Here, the local electric field created by the interpolated charges of one particle is calculated and subtracted from the resulting force. Still, an interpolation error remains for the coupling between two particles at close distances, since the charge is spread out on the lattice.

Since the field propagation speed within the system is proportional to the lattice spacing, a, the first of these errors scales like 1/a<sup>2</sup> (see Equation (16)), whereas the second error for geometrical reasons scales like a<sup>3</sup>. It can be seen in Figure 6b that the error can be reduced by making the mesh coarser, until the interpolation error at close distances dominates for a very coarse mesh. The plot shows a theoretical estimate of the error and simulation data for three example systems: an artificial system with a charged infinite plate (cloud-wall), a polyelectrolyte in aqueous solution and a silica salt melt. The second parameter besides the lattice spacing, the artificial speed of light, c, was chosen to obey the Courant-Friedrichs-Lewy stability condition, which bounds c in terms of a/dt, where dt is the time step of the MD simulation. For every lattice spacing, there exists a maximum speed of light parameter that keeps the algorithm stable. Here, we picked the speed of light c = 0.1 · a/dt as the parameter for all three setups. It turns out that a relative force error of 10<sup>−3</sup> is achievable in sufficiently homogeneous systems, and the optimal lattice spacing is comparable to the minimal distance between charges. The interpolation error, and, therefore, the total error, can be reduced by splitting off a near field that spans across multiple mesh cells and applying a short-range calculation in this region. However, this is not possible for spatially varying permittivity and will not be discussed here.
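The competition between the two error contributions can be made explicit with a toy model. The sketch below assumes, purely for illustration, that the two errors simply add up as A/a<sup>2</sup> + B·a<sup>3</sup> with hypothetical coefficients A and B. These coefficients are not fitted to the data in Figure 6b. The sketch also encodes the choice c = 0.1 · a/dt quoted above.

```python
def total_error(a, retard_coeff, interp_coeff):
    """Toy model: retardation error A / a^2 plus interpolation error B * a^3."""
    return retard_coeff / a**2 + interp_coeff * a**3


def optimal_spacing(retard_coeff, interp_coeff):
    """Minimum of the toy model: -2A / a^3 + 3B a^2 = 0, i.e., a^5 = 2A / (3B)."""
    return (2.0 * retard_coeff / (3.0 * interp_coeff)) ** 0.2


def speed_of_light(a, dt, fraction=0.1):
    """Artificial speed of light as a fixed fraction of a / dt."""
    return fraction * a / dt


# Hypothetical coefficients in arbitrary units.
a_opt = optimal_spacing(2.0, 3.0)
print(a_opt, total_error(a_opt, 2.0, 3.0), speed_of_light(a_opt, dt=0.01))
```

In this model, the optimum is rather flat, since a<sub>opt</sub> depends only on the fifth root of the ratio of the two coefficients.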

#### *5.1. Example: Colloid with Dielectric Jump and Continuous Dielectric Constant*

An example where such smoothly varying changes in ε can play a substantial role is the simulation of a charged colloid in a salt solution (Figure 7a). A first approach to dielectric coarse graining is a sharp dielectric contrast between the colloid and the solvent. Such a system can be simulated by the ICC\* method presented in Section 3 with a dielectric permittivity ε<sub>C</sub> = 2 within the colloid and ε<sub>W</sub> = 80 for the surrounding bulk water. However, in practice, the polarizability, and, therefore, bulk permittivity, of water close to charged surfaces and in regions of high ion concentrations is significantly reduced [46]. Many workarounds were proposed to address this behavior, including the introduction of an artificial Stern layer [47] to reproduce the desired Gouy-Chapman predictions. A more direct and more physical approach is to interpolate the bulk permittivity from the colloid to free water between the colloid surface and the solution (see the bottom of Figure 7b). A linear interpolation is sufficient to describe the local permittivity obtained from atomistic simulations [48].

In order to illustrate the difference between dielectric jump and continuous dielectric constant, we simulated a spherical setup with two different models of radial permittivity dependence ε(|r|). Model 1 includes a discontinuity of ε at the colloid surface, whereas model 2 uses a linear interpolation ε(|r|) = ε<sub>C</sub> + (ε<sub>W</sub> − ε<sub>C</sub>)/(4σ) · (r − R) over four ion radii σ = 0.425 nm. Model 1 additionally has been simulated using the ICC\* algorithm combined with the P<sup>3</sup>M Coulomb solver, for comparison. Figure 7b shows the radial charge density of the counterions around a colloid of charge Z = 60 with radius R<sub>c</sub> = 30σ = 12.75 nm in a monovalent electrolyte solvent of c = 50 mmol/L concentration. The difference between the two models is drastic and can be explained by the stronger Coulomb attraction of counterions towards the colloid, due to the smaller dielectric permittivity close to the colloid surface. Another effect is the earlier occurrence of overcharging effects, because of increased ion correlations, due to their entering the region of lower permittivity. The comparison with the ICC\* algorithm, on the other hand, shows that MEMD is also very well capable of simulating dielectric jumps, provided a sufficiently small mesh, comparable to the particle size, for the discretization of the electrodynamic equations can be realized. The computational overhead, compared to a simulation with MEMD at constant background permittivity, is negligible at less than 0.1% in this setup. MEMD ran 41% longer than the identical setup for the ICC\* algorithm. This overhead could be reduced by further optimizing the mesh size, which was chosen to be well resolved here at a colloid diameter of 16 mesh spacings.
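The two radial permittivity models can be written down compactly. The sketch below uses hypothetical function names, lengths in nanometers and the parameter values quoted above. The clamping of model 2 to ε<sub>C</sub> inside the colloid and to ε<sub>W</sub> beyond R + 4σ is an assumption implied by, but not spelled out in, the interpolation formula.

```python
def eps_model_1(r, radius, eps_c=2.0, eps_w=80.0):
    """Model 1: sharp dielectric jump at the colloid surface."""
    return eps_c if r < radius else eps_w


def eps_model_2(r, radius, sigma, eps_c=2.0, eps_w=80.0):
    """Model 2: linear interpolation from eps_c to eps_w over four ion radii."""
    if r <= radius:
        return eps_c
    if r >= radius + 4.0 * sigma:
        return eps_w
    return eps_c + (eps_w - eps_c) / (4.0 * sigma) * (r - radius)


SIGMA = 0.425          # ion radius in nm
RADIUS = 30.0 * SIGMA  # colloid radius, 12.75 nm
for r in (RADIUS, RADIUS + 2.0 * SIGMA, RADIUS + 4.0 * SIGMA):
    print(r, eps_model_1(r, RADIUS), eps_model_2(r, RADIUS, SIGMA))
```

In an MEMD-type setup, such a function would be evaluated once per lattice link at the start of the simulation. How exactly the values are assigned to the links is left open here.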

Figure 7. (a) A charged colloid (charge Z = 60e, radius R = 12.75 nm) is suspended in a salt solution (concentration c = 50 mmol/L). The dielectric constant is modeled as an abrupt jump to the bulk water permittivity with the MEMD and ICCP<sup>3</sup>M algorithms (gray points, green curve) or a linear radial increase within two ion diameters from the colloid surface between the two regimes (red). (b) The resulting radial charge density profiles, which exhibit a drastic difference between the dielectric jump in model 1 and a smooth interpolation in model 2.

#### 6. Conclusions

In this review, we have presented several methods to compute electrostatic interactions in the presence of dielectric interfaces in computer simulations and have given examples that demonstrate the importance of taking dielectric contrasts into account, as they appear in implicit solvent models.

In a planar slit pore, such as a plate capacitor, ions are attracted to the walls or are repelled by them, depending on whether the walls consist of a dielectric medium with high or low polarizability. The presented ICMMM2D [16] and ELCIC [17] algorithms allow one to treat a slit pore with dielectric jumps at both confining interfaces, as found in capacitors or thin films.

In curved or more complex geometries, such as a nanopore, dielectric properties can also drastically alter the ion distribution or the translocation properties of charged macromolecules. The presented ICC\* algorithm allows one to include arbitrarily-shaped dielectric interfaces in computer simulations, by computing the induced surface charges necessary to fulfill the dielectric boundary conditions on the fly.

As another interesting application, we presented the example of plates that are held electrically at a constant potential, such as one would use to study capacitors. This will lead to a different polarization than isolated grounded plates, and this effect has to be taken into account in simulations, with algorithms that are adjusted accordingly. Both the ICMMM2D and the ICC\* algorithm can handle this special case, as we have demonstrated.

Finally, we have shown that smoothly varying dielectric properties again are very different from dielectric jumps as treated by ICMMM2D or ICC\*. At present, only the sketched MEMD electrostatics algorithm [20,21,44] is able to treat such systems.

All algorithms presented here, including the extensions for varying dielectric constants, are implemented in the open source simulation package, ESPResSo [22,23]. MEMD is also part of the open source ScaFaCoS (Scalable Fast Coulomb Solvers) library for fast Coulomb solvers [49]. In conclusion, computational tools for the most common cases of varying dielectric properties are readily available for computer simulations. This will become more and more important with the increased interest of applying coarse-grained or implicit solvent models in nanopores, biological membranes, thin films and supercapacitors.

#### Acknowledgments

We would like to thank Anthony Maggs and Zhenli Xu for fruitful discussions.

This work was financially supported by the German Science Foundation (DFG) through the Collaborative Research Center (SFB) 716 and the funding programme Open Access Publishing. Furthermore, funds were provided by the German Ministry of Science and Education (BMBF) under grant 01IH08001 and the cluster of excellence SimTech at the University of Stuttgart.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References

1. Ewald, P.P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. *Ann. Phys.* 1921, *369*, 253–287.


Reprinted from *Entropy*. Cite as: Santiso, E.E.; Herdes, C.; Müller, E.A. On the Calculation of Solid-Fluid Contact Angles from Molecular Dynamics. *Entropy* **2013**, *15*, 3734–3745.

## *Article*


## **On the Calculation of Solid-Fluid Contact Angles from Molecular Dynamics**

#### **Erik E. Santiso 1,2, Carmelo Herdes 1 and Erich A. Müller 1,\***


*Received: 21 July 2013; in revised form: 2 September 2013 / Accepted: 3 September 2013 / Published: 6 September 2013* 

**Abstract:** A methodology for the determination of the solid-fluid contact angle, to be employed within molecular dynamics (MD) simulations, is developed and systematically applied. The calculation of the contact angle of a fluid drop on a given surface, averaged over an equilibrated MD trajectory, is divided into three main steps: (i) the determination of the fluid molecules that constitute the interface, (ii) the treatment of the interfacial molecules as a point cloud data set to define a geometric surface, using surface meshing techniques to compute the surface normals from the mesh, (iii) the collection and averaging of the interface normals obtained from the post-processing of the MD trajectory. The average vector thus found is used to calculate the Cassie contact angle (*i.e.*, the arccosine of the averaged normal *z*-component). As an example we explore the effect of the size of a drop of water on the observed solid-fluid contact angle. A single coarse-grained bead representing two water molecules and parameterized using the SAFT-γ Mie equation of state (EoS) is employed, while the solid surfaces are mimicked using integrated potentials. The contact angle is seen to be a strong function of the system size for small nano-droplets. The thermodynamic limit, corresponding to the infinite size (macroscopic) drop, is only truly recovered when using in excess of half a million coarse-grained water beads and/or a drop radius of over 26 nm.

**Keywords:** cloud data set; interfacial tension; coarse-graining; water; line tension; graphene

#### **1. Introduction**

Wetting is the ability of a liquid to maintain contact with a solid surface, resulting from intermolecular interactions when the two are brought together. The shape of a liquid drop on a rigid surface reflects the balance between adhesive and cohesive forces and is commonly used to determine the wettability (the degree of wetting) of the solid-fluid system in terms of the solid-fluid contact angle (see Figure 1). In this context, hydrophobicity commonly refers to the ability of a solid surface to repel water: if the water contact angle is smaller than 90°, the solid surface is considered hydrophilic and if the water contact angle is larger than 90°, the solid surface is considered hydrophobic.

**Figure 1.** Schematic of a liquid drop on a solid surface showing the contact angle.

Although the problem is well defined, the number of conflicting values (both theoretical and experimental) reported for a given system is intriguing (see Figure 2, data from reference [1]).

**Figure 2.** Frequency of contact angle values of water on graphite reported in literature; both from experimental results and numerical simulations [1].

For instance, in the case of the graphite-water system, contact angles have been addressed extensively by experimental and theoretical approaches; however, a single general value has not been accepted [2–9].

A variety of causes for the discrepancies can be enumerated: heterogeneity and/or impurities at the surfaces or in the fluids, differing and unstandardized calibration of equipment in experiments, possible system size effects and the distinct interaction potentials used in simulations.

At the molecular scale, the main hurdle is that estimating contact angles for nanodroplets on surfaces is complicated by significant fluctuations in the shape of the droplet, whose geometry at a given step is often not axially symmetric. Furthermore, for very small nanoclusters, the fluid interfacial tension is a function of the curvature, and the planar limit is, in some cases, not recovered even for drop radii of 14 times the molecular diameter [10,11]. The change in the line tension with curvature (discussed in the latter part of this manuscript) is also an important factor affecting the result. It is seen that the contact angle will be, in the nanoscopic limit, a strong function of the system size [2]. In the analysis of simulations, contact angles [12,13] are commonly determined by using two-dimensional slices of the droplet and fitting its density profile to an empirical function, usually a circular section [14]. Such an approach, although appealing for its simplicity, provides inconsistent results, particularly for small droplets [11]. Understandably, different methods of increasing complexity have been devised for this purpose [15–17].

Figure 3 illustrates the difficulties associated with defining the contact angle using two-dimensional slices of molecular snapshots, especially for small droplets. Such droplets are often asymmetrical, and their shape changes substantially with time, due to capillary fluctuations. Using one or a few two-dimensional projections of a few molecular snapshots can thus potentially lead to large errors in the measured contact angle: for example, the values shown in Figure 3 vary over a range of about 10°.

**Figure 3.** Two-dimensional projections of a given configuration of a water droplet on a surface. The contact angles, measured using the auxiliary lines depicted in black, are: (**a**) 63.77° (**b**) 60.52° (**c**) 64.56° (**d**) 54.93°.


In addition to the inherent fluctuations of the drop's shape, there is a degree of arbitrariness in defining a single line separating the two phases in the two-dimensional projection. The interface layer is actually diffuse, as seen in Figure 3, spanning a width of several molecular diameters. This introduces another potential source of error, as the contact angle measured can change over a range of 5° or more depending on the choice of the contact line.

In this paper, we propose a method that does not make any *a priori* assumption about the shape of the drop, and uses the complete three-dimensional structure of the droplet near the surface to estimate the contact angle. For such analysis, we propose a geometrical estimation of the contact angle based on a point cloud analysis of an equilibrated molecular dynamics (MD) trajectory of a drop on a given surface. This reduces the effect of both spatial and temporal fluctuations on the estimated value of the contact angle, yielding a value that better represents the average shape of the drop, as explained in detail in the next section.

#### **2. Methodology**

In order to estimate the average contact angle during an MD simulation, we first define a contact layer at each time step. This layer is defined as the set of molecules within the liquid-vapor interface that are close to (*i.e.*, within a maximum distance *zmax* from) the solid surface. We then estimate the normal to the interface for each molecule in the contact layer by finding the plane that best fits the local shape of the interface at that point. The average of all the local normal vectors for all the time steps, ⟨**n**⟩, can be used to calculate the average contact angle as θ = cos<sup>−1</sup>(⟨**n**⟩ · **n**<sub>surf</sub>), where **n**<sub>surf</sub> is the normal to the solid surface.

In practice, at each time step in the simulation, we carry out a two-step calculation: In the first step we use a discretized density profile to identify the molecules belonging to the liquid-vapor interface. In the second step, we estimate the local surface normal vectors for the molecules in the contact layer. The procedure is explained in detail below.

#### *2.1. Identification of Interfacial Molecules*

We identify the molecules belonging to the liquid-vapor interface using the following procedure:


The results obtained using the interface-sensing procedure are only weakly dependent on the choice of cutoff density, as long as this density is between the liquid and vapor densities. The number of subcells, on the other hand, should be chosen carefully: having more subcells identifies the interface with a higher resolution. However, if the number of subcells is too large, the density fluctuations within the droplet may cause the algorithm to incorrectly label too many cells as part of the interface. This can be easily detected by visualizing the interface molecules at one or a few simulation steps. Alternatively, one can use the average coordination number or the Minkowski-Bouligand dimension (obtained from a box-counting algorithm) [18,19] as a quantitative measure to select the appropriate subcell size. As the cells become smaller and the interface-sensing algorithm starts fitting the molecular-level fluctuations in the density rather than the actual interface, both the fractal dimension and the average coordination number will start deviating from the value expected for the larger subcells, indicating that the subcell size has become too small.
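To make the roles of the cutoff density and the subcell size concrete, the following sketch shows one possible way to flag interface subcells from a discretized density profile. It is an illustration under stated assumptions rather than the procedure used in this work: the function name is hypothetical, a subcell is called "liquid" if its number density is at or above the cutoff, and a liquid subcell is called "interface" if at least one of its six face neighbours is below the cutoff. Cells outside the box are treated as vapor, so a droplet resting on a wall at the box boundary would also have its bottom layer flagged, which would need separate handling.

```python
import numpy as np

def interface_cells(positions, box, n_cells, cutoff_density):
    """Flag subcells lying on the liquid-vapor interface.

    positions: (N, 3) array of molecular coordinates
    box:       (3,) box lengths, with the origin at (0, 0, 0)
    n_cells:   (3,) number of subcells along each axis
    Returns (interface_mask, density), both arrays of shape n_cells.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    counts, _ = np.histogramdd(
        positions, bins=n_cells, range=[(0.0, length) for length in box]
    )
    cell_volume = np.prod(box / np.asarray(n_cells, dtype=float))
    density = counts / cell_volume

    liquid = density >= cutoff_density
    # Pad with vapor so that cells on the box faces see vapor outside
    # (assumption: no periodic wrapping is applied here).
    padded = np.pad(liquid, 1, constant_values=False)
    has_vapor_neighbor = np.zeros_like(liquid)
    for axis in range(3):
        for shift in (-1, 1):
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            has_vapor_neighbor |= ~neighbor
    return liquid & has_vapor_neighbor, density
```

With this kind of labeling, subcells that are too small show up as isolated "interface" cells inside the droplet, which is the failure mode described above.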

**Figure 4. (a)** A snapshot from an MD simulation showing a droplet on top of a given surface (not shown). **(b)** The discretized density profile from the same system obtained using cubic subcells of width 3 nm. Density values are in molecules/Å<sup>3</sup>.

#### *2.2. Estimation of Local Contact Angles*

Having identified the molecules belonging to the interface, we choose a subset of the interface molecules as the interface contact layer. This layer contains the interface molecules that are within a given distance *zmax* from the solid surface. We then estimate the local contact angle at the position of each molecule *i* using the following procedure:


$$\boldsymbol{\Omega} = \sum\_{j} \mathbf{r}\_{j}^{\text{cent}} \otimes \mathbf{r}\_{j}^{\text{cent}} \tag{1}$$

where ⊗ denotes the outer (Kronecker) product.

(e). Find the eigenvector of **Ω** corresponding to its smallest eigenvalue. This is the normal to the plane that best fits the set of molecules in step (c). The sign of this normal is chosen to point away from the center of the droplet [20].

We average all the local normal vectors obtained at all the time steps to obtain the average normal. The component of this average normal perpendicular to the solid surface is the cosine of the average contact angle. The eigenvalues of the covariance matrix measure the variability of the positions along the three orthogonal directions defined by the eigenvectors. These can be used as a measure of how 1-, 2- or 3-dimensional the interface is. If one of the eigenvalues is close to zero, the interface is close to planar in that region.
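The plane fit of Equation (1) and the final averaging can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the centered positions are assumed to be taken relative to the centroid of the neighbouring interface molecules, and the solid surface normal defaults to the *z*-axis.

```python
import numpy as np

def local_normal(position, neighbors, droplet_center):
    """Normal of the best-fit plane through the interface molecules near `position`."""
    neighbors = np.asarray(neighbors, dtype=float)
    # Assumed centering: positions relative to the centroid of the neighbour set.
    centered = neighbors - neighbors.mean(axis=0)
    omega = centered.T @ centered              # sum_j r_j (outer) r_j, Equation (1)
    eigvals, eigvecs = np.linalg.eigh(omega)   # eigenvalues in ascending order
    normal = eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue
    # Orient the normal so that it points away from the droplet center.
    outward = np.asarray(position, dtype=float) - np.asarray(droplet_center, dtype=float)
    if np.dot(normal, outward) < 0.0:
        normal = -normal
    return normal

def contact_angle_deg(normals, n_surf=(0.0, 0.0, 1.0)):
    """Average contact angle from all local normals (all molecules, all time steps)."""
    mean_normal = np.mean(np.asarray(normals, dtype=float), axis=0)
    # Following the text, the mean normal is not renormalized before projection.
    cos_theta = np.dot(mean_normal, np.asarray(n_surf, dtype=float))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

For a perfectly planar patch the smallest eigenvalue is zero and its eigenvector is the exact plane normal; for a real, diffuse interface the three eigenvalues indicate how well the local neighbourhood is described by a plane.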

The procedure described above to estimate the local contact angle requires arbitrarily setting a distance from the solid surface, *zmax*, to define the contact layer, and a cutoff radius, *rc*, to estimate the tangent plane. In theory the contact angle would correspond to the limit *zmax* → 0. However, if *zmax* is chosen to be too small, the local density fluctuations within the contact layer cause the distribution of local contact angle values to be too noisy. Thus, one should choose the smallest value that gives a reasonable estimation error for the average contact angle. For the examples shown in this work, we have found *zmax* = 0.5 nm to give reasonable results.

For the cutoff radius, *rc*, a compromise is also needed: if the value is too small, local density fluctuations cause the error in the estimated normal to be too large, whereas if the value is too large the orientation of the tangent plane will be affected by molecules far from the position of interest. The choice of *rc* should be made by considerations similar to those used in choosing *zmax*. For the examples in this work, we have found *rc* = 1 nm to be a reasonable value. A more detailed discussion of how to choose this cutoff can be found in reference [21]. Finally, to estimate the statistical error associated with the average contact angle computed by our method, we use the bootstrapping method [22].
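A bootstrap estimate of the error in the average contact angle can be sketched as below. The details are assumptions made for illustration: the function name is hypothetical, the solid surface normal is taken along *z*, and the local normals are resampled individually as if they were independent. Normals from nearby molecules or consecutive frames are correlated, so resampling whole frames or blocks of frames may be more appropriate in practice.

```python
import numpy as np

def bootstrap_contact_angle(normals, n_resamples=1000, seed=0):
    """Bootstrap mean and standard error (in degrees) of the average contact angle.

    normals: (M, 3) array of local interface normals collected over the trajectory.
    """
    rng = np.random.default_rng(seed)
    normals = np.asarray(normals, dtype=float)
    n = len(normals)
    angles = np.empty(n_resamples)
    for b in range(n_resamples):
        # Draw M normals with replacement and recompute the average angle.
        sample = normals[rng.integers(0, n, size=n)]
        cos_theta = np.clip(sample[:, 2].mean(), -1.0, 1.0)
        angles[b] = np.degrees(np.arccos(cos_theta))
    return angles.mean(), angles.std(ddof=1)
```

The spread of the resampled angles can also be used to compare candidate values of *zmax* and *rc*, since choices that are too small inflate the estimated error.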

#### *2.3. Molecular Dynamics Details*

Atomistic MD simulations are generally limited to nanometer-sized water droplets [2]. Consequently, the apparent contact angle is usually drop-size dependent. To explore bigger systems in a reasonable time, and to find the optimal size for the MD contact angle calculation, we adopted a coarse-grained (CG) approach to describe the solid-water system, which allows the system size dependence to be tested beyond the atomistic limits [2].

We used the open-source GROMACS simulation suite [23] to run the MD simulations, as it is well suited to implementing Mie potentials [24]. Here, a single CG isotropic bead represents two water molecules [25]. Although several options are available for choosing the number of water molecules in a CG bead [26], this choice and the current parameterization produce sensible results, including a melting point, surface tension, liquid densities and vapour pressures close to the experimental values. The parameterization was carried out using the SAFT-γ Mie approach [27–29], where the water parameters were obtained by fitting to macroscopic properties, namely, the planar-limit interfacial tension and liquid-state density of water in a range from 0 °C to 40 °C. The SAFT EoS is a perturbation approach based on a well-defined Hamiltonian; here the CG beads are represented in the theory by a Mie potential, *u*:

$$u(r) = \left(\frac{\lambda\_r}{\lambda\_r - \lambda\_a}\right) \left(\frac{\lambda\_r}{\lambda\_a}\right)^{\frac{\lambda\_a}{\lambda\_r - \lambda\_a}} \varepsilon\left[\left(\frac{\sigma}{r}\right)^{\lambda\_r} - \left(\frac{\sigma}{r}\right)^{\lambda\_a}\right] = Ar^{-\lambda\_r} - Cr^{-\lambda\_a} \tag{2}$$

where *r* is the intermolecular distance, ε and σ are the adjustable parameters relating to the energy and distance scales, respectively, λ<sub>a</sub> is the dispersion (attractive) exponent and λ<sub>r</sub> is the short-range repulsion exponent. The potential is expressed in terms of two constants *A* and *C* for ease of tabulating in MD codes. Solid walls are modeled implicitly and described by an integrated potential of the same form in the *z*-dimension. Eight implicit solid surfaces of increasing ε<sub>Wall#-W</sub>/k<sub>B</sub>, in increments of 10 K from 60 K to 130 K (labeled Wall01 to Wall08, respectively), are employed [30]. Table 1 summarizes the selected coarse-grained parameters.
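The mapping from (ε, σ, λ<sub>r</sub>, λ<sub>a</sub>) to the tabulation constants *A* and *C* follows directly from Equation (2). The sketch below uses hypothetical function names and works in whatever consistent units are supplied; the numerical values in the usage note are illustrative and are not the parameters of Table 1.

```python
def mie_constants(epsilon, sigma, lam_r, lam_a):
    """Return (A, C) such that u(r) = A * r**(-lam_r) - C * r**(-lam_a)."""
    prefactor = (lam_r / (lam_r - lam_a)) * (lam_r / lam_a) ** (lam_a / (lam_r - lam_a))
    A = prefactor * epsilon * sigma ** lam_r   # repulsive coefficient
    C = prefactor * epsilon * sigma ** lam_a   # attractive (dispersion) coefficient
    return A, C

def mie_potential(r, epsilon, sigma, lam_r, lam_a):
    """Mie potential of Equation (2) evaluated at separation r."""
    A, C = mie_constants(epsilon, sigma, lam_r, lam_a)
    return A * r ** (-lam_r) - C * r ** (-lam_a)
```

As a consistency check, λ<sub>r</sub> = 12 and λ<sub>a</sub> = 6 give a prefactor of 4, so that *A* = 4εσ<sup>12</sup> and *C* = 4εσ<sup>6</sup>, which is the familiar Lennard-Jones form with a well depth of ε.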


**Table 1.** Coarse-grained parameters. W refers to a CG representation of water that accounts for two water molecules.

The systems are run in the canonical (NVT) ensemble, where the total volume, concentration and temperature are kept constant. Periodic boundary conditions were applied in the *x* and *y* dimensions, while an attractive wall was placed at *z* = 0 and a repulsive one (*C* = 5.73512 × 10<sup>−4</sup> kJ mol<sup>−1</sup> nm<sup>4</sup>, *A* = 1.93147 × 10<sup>−6</sup> kJ mol<sup>−1</sup> nm<sup>10</sup>) at the maximum height of the box. The number density of the atoms for each wall was set to 5 nm<sup>−2</sup> (cf. a Si crystal, in which the number of atoms per nm<sup>2</sup> on the (100), (111) and (110) planes is 6.78, 7.83 and 9.59, respectively [31]). The simulations are thermostated to 298.15 K every 1 ps by a Nosé-Hoover algorithm, and all non-bonded interactions were truncated at 2.0 nm. The trajectories were recorded every 1000 time steps (Δ*t* = 0.01 ps) for at least 2000 ps after equilibration.

#### **3. Results and Discussions**

We simulate 16,000, 32,000, 64,000, 128,000, 256,000, 512,000 and 1,024,000 water molecules (8,000, 16,000, 32,000, 64,000, 128,000, 256,000 and 512,000 beads) on the eight solid surfaces. The simulation box dimensions are given in Table 2.

Although the choice of the simulation box size is arbitrary, it is guided by the need to prevent interactions between the sample and its periodic images. The meshing was done by dividing the simulation domain into 1.2 × 1.2 × 1.2 nm<sup>3</sup> subcells. The water droplet density contour is obtained from the point cloud data set analysis explained in the methodology section.

To test the size dependence of the water contact angle on a given surface we chose the intermediate Wall05 (see Table 1). The contact angles obtained on this moderately hydrophilic substrate as a function of the drop size are shown in Figure 5. The water contact angle over Wall05 exhibits a marked system-size dependence even up to 256,000 water molecules (128,000 beads). Larger drops exhibit a less pronounced but nevertheless noticeable size dependence. It is interesting to note that the effect of this scale-up is an increase in the hydrophobicity of this surface, in correspondence with previous simulations [32] and experimental studies at the macroscale. For this substrate, a system with 1,024,000 water molecules shows an apparent limiting contact angle of 74.30° [8]. The Wall05 parameters are chosen to loosely relate to graphene, a hydrophilic substrate.


**Table 2.** Simulation box size for each system.

**Figure 5. (a)** Water contact angle as a function of the number of water molecules on Wall05; the inset shows the corresponding drop diameter, and dashed lines are guides to the eye. **(b)** Snapshots of the drop interfaces for the smallest and the biggest system studied.

**Figure 6. (a)** Water contact angle as a function of the fluid-substrate interactions, solid circles are simulation results, dashed red line marks the hydrophobic-hydrophilic threshold. **(b)** Corresponding equilibrium interface snapshots depicting interfacial beads.

Following from the above, we use the system of 256,000 water molecules to study its contact angle as a function of the fluid-substrate interactions; the results can be seen in Figure 6. As expected, increasing the interaction energy between water molecules and the attractive wall diminishes the solid-fluid contact angle. For this particular coarse-grained surface model the dependence is linear, and a value of ε<sub>Wall-W</sub>/k<sub>B</sub> ~ 85 K can be taken as the boundary between hydrophilic and hydrophobic surface behavior.

An ancillary quantity of interest in the context of drops on surfaces is the line tension. The line tension is the ratio of the energy associated with the three-phase contact line to the length of this line. This quantity is inherently dependent on the size of the drop [33] and is discussed as one of the reasons why the calculated tensions appear to be size-dependent. For large drops, one can obtain an estimate of the line tension using the approximation described by Weijs *et al*. [34]. For a typical case as shown above, a drop of roughly 26 nm in diameter, considering the planar limit of the fluid-vapor tension (72 mN·m<sup>−1</sup>) results in a contact line tension of *O*(10<sup>−8</sup>) J·m<sup>−1</sup>. While this number seems to be different from those estimated from micrometer drop experiments [35] or from molecular simulation of atomistic water models [36,37], it is appropriate to point out that values as low as 10<sup>−11</sup> J·m<sup>−1</sup> and as high as 10<sup>−5</sup> J·m<sup>−1</sup> have been reported [33], which places our results in the correct context. Obviously, there is scope for much more detailed research into this topic.

#### **4. Conclusions**

We have proposed and validated a methodology for the unambiguous calculation of the solid-fluid contact angle from molecular dynamics simulations. We have tested model coarse-grained water-solid systems far beyond the limits commonly taken in atomistic simulation and showed that, for this particular model, more than 500,000 effective beads and/or drop diameters in excess of 50 nm would be required in order to obtain a result that is independent of system size. So far the methodology has been applied to homogeneous surfaces. However, it is a well-known fact that surface roughness and energetic heterogeneities will have a profound effect on the contact angle calculations. The methodology presented is well suited to capture those effects.

#### **Acknowledgments**

This work was supported by the EPSRC through research grants (EP/I018212, EP/J014958 and EP/J010502).

#### **Conflicts of Interest**

The authors declare no conflict of interest.

#### **References**


Reprinted from *Entropy*. Cite as: Uline, M.J.; Corti, D.S. Molecular Dynamics at Constant Pressure: Allowing the System to Control Volume Fluctuations via a "Shell" Particle. *Entropy* 2013, *15*, 3941–3969.

*Review*

## Molecular Dynamics at Constant Pressure: Allowing the System to Control Volume Fluctuations via a "Shell" Particle

Mark J. Uline **<sup>1</sup>***,* \* and David S. Corti **<sup>2</sup>**

<sup>1</sup> Department of Chemical Engineering, University of South Carolina, Columbia, SC 29208, USA

<sup>2</sup> School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA;

E-Mail: dscorti@purdue.edu

\* Author to whom correspondence should be addressed; E-Mail: uline@cec.sc.edu; Tel.: +1-803-777-2030; Fax: +1-803-777-8265.

*Received: 29 July 2013; in revised form: 6 September 2013 / Accepted: 16 September 2013 / Published: 23 September 2013*

Abstract: Since most experimental observations are performed at constant temperature and pressure, the isothermal-isobaric (NPT) ensemble has been widely used in molecular simulations. Nevertheless, the NPT ensemble has only recently been placed on a rigorous foundation. The proper formulation of the NPT ensemble requires a "shell" particle to uniquely identify the volume of the system, thereby avoiding the redundant counting of configurations. Here, we review our recent work in incorporating a shell particle into molecular dynamics simulation algorithms to generate the correct NPT ensemble averages. Unlike previous methods, a piston of unknown mass is no longer needed to control the response time of the volume fluctuations. As the volume of the system is attached to the shell particle, the system itself now sets the time scales for volume and pressure fluctuations. Finally, we discuss a number of tests that ensure the equations of motion sample phase space correctly and consider the response time of the system to pressure changes with and without the shell particle. Overall, the shell particle algorithm is an effective simulation method for studying systems exposed to a constant external pressure and may provide an advantage over other existing constant pressure approaches when developing nonequilibrium molecular dynamics methods.

Keywords: isothermal-isobaric ensemble; molecular dynamics

#### 1. Introduction

The molecular dynamics (MD) simulation method can be straightforwardly applied to the analysis of an isolated system or a system described by the microcanonical ensemble in which the energy, volume V and particle number N are held fixed. The equations of motion that describe the time evolution of the positions and momenta of the particles, *i.e*., the resulting microcanonical ensemble phase space trajectory, follow directly from Newtonian mechanics. Energy, however, is not a variable of choice for experiments. Many experimental observations are carried out under conditions of constant pressure and temperature, such that the system is no longer isolated from its environment. Therefore, while the generation of dynamic information about these systems is of interest, how to modify the equations of motion to describe a system at constant temperature and/or constant pressure is arguably not an obvious task.

An extension of the MD method to systems not described by the microcanonical ensemble was presented by Andersen in 1980 [1]. Andersen showed, for example, that by modifying the Lagrangian of the system, a constant external pressure could be imposed within MD. Specifically, additional control variables were introduced into the Lagrangian, beyond the standard coordinate and momentum vectors needed to describe the classical N-particle system. The new variables served to drive the fluctuations of those variables no longer held fixed within the ensemble of interest. For a system in which a constant external pressure is imposed, the system volume is now introduced as a dynamic variable that serves to maintain, on average, mechanical equilibrium between the external and system pressure. Consequently, the system is exposed to a barostat, whereby a "piston" of arbitrary "mass" controls the dynamics of the volume. While ensemble averages are independent of the piston mass, the fictitious mass does affect the response time for volume fluctuations.

Andersen's extended Lagrangian approach was later adapted by Nosé [2,3] to simulate systems in contact with a thermostat using MD. Hoover [4,5] proposed another isothermal-isobaric (NPT) MD algorithm using a modification of Andersen's piston method for maintaining constant pressure and the thermostating method of Nosé. As Hoover was aware, and as discussed in detail by Tuckerman *et al.* [6], this algorithm does not yield ensemble averages consistent with the then accepted form of the NPT ensemble partition function. Consequently, several new NPT MD algorithms have been introduced in the literature (a non-exhaustive list is given here [7–11]).

Yet, starting nearly 20 years ago, the foundation of the NPT ensemble (when the volume is considered to be a continuous variable) has been reconsidered [12–14]. What was noted was that the NPT partition function redundantly counts the configurations of the system. This problem of over-counting was removed by requiring that the volume, V, of the system be defined by a "shell" particle, where at least one particle resides in the volume, dV, encapsulating V. None of the NPT MD algorithms mentioned above is, however, consistent with the proper shell-particle formulation of the NPT ensemble (we will show later that Hoover's algorithm does give the correct distribution of volumes if periodic boundary conditions are employed). As such, new NPT MD algorithms should be introduced in order to generate the correct NPT ensemble averages.

Corti [15] previously modified the Monte Carlo NPT algorithm to be consistent with the correct NPT partition function. The current authors performed a similar reformulation for the constant pressure MD algorithm for systems whose particles interact via continuous [16,17] and discontinuous [18] potentials. In these new MD algorithms, a shell particle is used to uniquely define the volume of a system exposed to a constant external pressure. Consequently, since the shell particle sets the volume of the system, no piston mass needs to be specified. In other words, because the mass of the shell particle is known, the system itself controls the response time of volume fluctuations, rather than the user through the introduction of an arbitrary piston mass. Various benefits arise from the removal of this ambiguity in the NPT MD algorithm.

As a side note, Evans and Morriss [19,20] utilized constrained dynamics to develop an NPT MD algorithm. In this method, both the instantaneous pressure and kinetic energy are made strict constants of motion, and so, the Andersen piston is not employed. Nevertheless, this algorithm does not yield ensemble averages consistent with the NPT partition function (either with or without the shell particle), as the instantaneous pressure fluctuates within the NPT ensemble [15]. Even though the constraint dynamics also does not utilize a piston, the resulting equations of motion do not generate the proper NPT ensemble averages.

In this paper, we review our previous work on employing the shell particle to generate equations of motion that are consistent with the proper shell-particle formulation of the NPT ensemble. To begin, we provide in Section 2 an overview of the reformulation of the NPT ensemble partition function and the need to employ the shell particle to eliminate the redundant counting of configurations. In Section 3, the equations of motion required to properly generate a system within the NPT ensemble are presented, in which the piston of arbitrary mass is replaced with a shell particle of known mass. We include the previously derived equations in which an external temperature is imposed via the use of the Nosé-Hoover thermostat chains, as well as recently developed equations making use of a thermostat based on the configurational temperature. The Trotter expansion to the Liouville operator formalism [21–26] is used to factorize the classical propagator into analytically solvable operators. We also provide simulation results for the Lennard-Jones fluid, particularly for small system sizes, where interesting differences between the old and new NPT partition function appear for various ensemble averages. 'Nonequilibrium' simulations are presented in Section 4, in which the external pressure is changed after the system has equilibrated. As the system evolves to a new equilibrium state, we compare the dynamics of the volume as defined via the shell particle to that when the Hoover algorithm is used with different piston masses. Conclusions are provided in Section 5, as well as a discussion of some particular dynamic systems of interest that may benefit from the use of the shell-particle formalism.

#### 2. The Volume Scale in Constant Pressure Ensembles

The original formulation of the isothermal-isobaric ensemble can be traced back to 1939, where Guggenheim [27] wrote the partition function, Δ(N, P, T), as:

$$\Delta(N, P, T) \;=\sum\_{V} Q(N, V, T) e^{-PV/k\_B T} \tag{1}$$

where k<sub>B</sub> is the Boltzmann constant, Q(N, V, T) is the canonical ensemble partition function of a system composed of N particles held in a volume, V, and at a temperature, T, and P is the external pressure to which the system is exposed as the volume is allowed to fluctuate. Although Equation (1) is formally correct, an ambiguity arises when dealing with systems in which the volume is a continuous variable. In the late 1950s, several authors [28–31] attempted to remove the conceptual difficulty associated with the sum over an unspecified set of discrete volumes by expressing Δ(N, P, T) as:

$$
\Delta\_0(N, P, T) \quad = \frac{1}{V\_0} \int\_0^\infty Q(N, V, T) e^{-PV/k\_B T} dV \tag{2}
$$

The replacement of the sum in Equation (1) by an integral enables the inclusion of all volumes, but at the expense of generating a partition function that has the dimensions of volume. Consequently, this partition function must be rendered dimensionless through division by some constant with units of volume, denoted by V<sub>0</sub> in Equation (2). Note that we wrote the partition function with a subscript, Δ<sub>0</sub>(N, P, T), in Equation (2) to signify that this partition function uses V<sub>0</sub> as its volume scale. The constant, V<sub>0</sub>, cancels out when determining the ensemble average of a given variable and, so, need not be specified. Even so, Sack [30] showed in the thermodynamic limit that:

$$V\_0 = \frac{k\_B T}{P} \tag{3}$$

Hill [31] noted that in the thermodynamic limit, the choice of V<sub>0</sub> is arbitrary, due in part to the equivalency of the ensembles in the thermodynamic limit. Evaluation of ensemble averages of macroscopic systems using Equation (2) yields only a completely negligible error. Yet, the precise value of the volume scale is important when dealing with systems of sufficiently small size [12–14]. The volume scale must be chosen carefully, since it depends upon the properties of the boundary separating the system of interest from the surroundings [14]. The boundary serves to define the volume of the system and allows the system volume to fluctuate against the external pressure imposed by the surroundings. Hence, the boundary cannot be chosen arbitrarily, particularly when the system is not in the thermodynamic limit. In other words, the properties assigned to the boundary must conform to the actual physical situation in which the system is found.

As shown by Koper and Reiss [12] using the microcanonical ensemble, and verified later by Corti and Soto-Campos [13] and Corti [14] using the canonical ensemble, when the boundary is not a physical object to which a mass or momentum can be assigned (*i.e*., it is a mathematical construct to aid in the specification of the system volume), the partition function in Equation (2) counts configurations of the system redundantly (whether or not V<sub>0</sub> is specified). The problem of over-counting is removed by requiring that the volume, V, of the system be specified by a "shell" molecule, where at least one molecule resides in the volume, dV, surrounding V. To illustrate this problem, turn to Figure 1, which demonstrates how several volumes may enclose the same configuration of n particles surrounded by N − n particles. In the rigorous formulation of the NPT ensemble, each configuration of the system must correspond to only one specific volume state of the n particles. Otherwise, the same configuration will be counted more than once in Equation (2) [14]. The problem of over-counting, or redundancy, is resolved by defining a "shell" particle [12–14], in which at least one of the system particles resides in the shell that encapsulates the system volume. Defining the n-particle system with the shell particle means that a new and distinct state of the total N-particle system is necessarily created when the volume of the n-particle system is varied (whether or not the configuration of the surrounding N − n particles changes), since the position of the shell particle changes, as well [14]. Consequently, the inclusion of configurations of the n particles common to larger values of the volume is explicitly avoided.

Figure 1. One particular configuration of N particles enclosed within a total volume, V, demonstrating how to uniquely define one specific volume state of n particles (shaded circles). The unshaded circles represent the surrounding N − n particles that comprise the bath. Each particle center is marked by a dot and is surrounded by an effective diameter. The first step in determining the volume occupied by the n particles is to choose a particular reference point in V as the origin, r<sub>c</sub>. Yet, several volumes (dashed circles) centered at r<sub>c</sub> still enclose the n particles and, therefore, include common configurations. The exact volume, v (bold circle), of the n particles is defined by the presence of a shell particle that is farthest from r<sub>c</sub> and resides in the shell, dv, encapsulating v. (Adapted from Figure 2 in reference [14].)

Therefore, the proper form of Δ(N, P, T), in which interactions between the system and surroundings are neglected, as in Equation (2), should be [12–14]:

$$
\Delta(N, P, T) \;= \int\_0^\infty Q^\*(N, V, T) e^{-PV/k\_B T} dV \tag{4}
$$

where Q∗(N, V, T)dV represents the number of configurations in which at least one of the N particles resides in the shell, dV, surrounding V. Note that the above partition function is dimensionless, since Q∗(N, V, T)dV is a pure number (or Q∗(N, V, T) is a density of states). The shell particle provides the correct volume scale when there is no physical boundary to which the volume of the system can be attached. Koper and Reiss [12] demonstrated that the states summed in Equation (4) do not contain common configurations, because the shell particle sets the volume. As shown above, all redundancies are eliminated by defining the system volume via the shell molecule.

#### *2.1. Cubic System Volume*

The application of the constant pressure ensemble to small systems also reveals the effects of additional variables, such as surface area and curvature, on the system's properties. In the thermodynamic limit, the shape of the container enclosing the system has no influence on its properties. As the size of the system is decreased, additional independent thermodynamic variables (e.g., surface area, curvature) must be introduced to ensure that the system's properties are described properly. These additional parameters are a function of the "shape" of the system volume. Therefore, the constant pressure ensemble partition function must be formulated differently in order to describe a system in which its volume is always either spherical (e.g., physical cluster) or cubic (the standard shape used to apply periodic boundary conditions). As a result, ensemble averages within the constant pressure ensemble will depend upon the shape of the system volume. This dependence upon shape, of course, becomes negligible in the thermodynamic limit. A reader interested in spherical systems is referred to [15]. However, since we are focusing on MD simulations with periodic boundary conditions, we are going to present results for a cubic volume [15].

The mathematical representation of Q∗(N, V, T)dV for a cubic volume, V = L<sup>3</sup>, whose length, L, lies between L and L + dL with at least one particle in the shell, dL, is given by [15–18]:

$$Q\_{cub}^\*(N, L, T)dL \quad = \frac{3dL}{(N-1)!\Lambda^{3N}} \int\_A dy\_1 dz\_1 \int\_{V^{N-1}} d\tau\_{12} \dots d\tau\_{1N} e^{-\beta U\_N} \tag{5}$$

where β = 1/k<sub>B</sub>T, Λ is the de Broglie wavelength, A represents the area of a face of the cube, dy<sub>1</sub> and dz<sub>1</sub> represent the differential changes in the y and z coordinates of particle 1 (the shell particle), respectively, τ<sub>12</sub>, ..., τ<sub>1N</sub> are the coordinates of the remaining N − 1 particles relative to the position of particle 1, and U<sub>N</sub> is the interaction potential of all N particles. The factor of three is required by the switch from dV to 3L<sup>2</sup>dL, since volume changes occur with constant shape, and indicates the three sets of equivalent configurations generated if a particle is held fixed in the shell in either the x, y or z direction. Particle 1 cannot be integrated throughout the entire shell, but due to symmetry, can be integrated separately in the x direction (or the y or z direction). The above integral is therefore evaluated with the x coordinate of particle 1 held fixed in the plane that corresponds to one of the two faces of the cube perpendicular to the x-axis of the coordinate system.

With Equation (5), the isothermal-isobaric ensemble partition function for a cubic volume is now represented by [15–18]:

$$
\Delta\_{cub}(N, P, T) = \int Q\_{cub}^\*(N, L, T) e^{-PL^3/k\_B T} dL \tag{6}
$$

Within the NPT ensemble, the instantaneous pressure, P'', of the system fluctuates. The instantaneous pressure is defined as [15–18]:

$$P'' = \frac{k\_B T}{3L^2} \left(\frac{\partial \ln Q\_{cub}^\*}{\partial L}\right)\_{T,N} \tag{7}$$

and is calculated during a simulation via the following equation [15]:

$$P'' = \frac{(N - 1/3)k\_B T}{V} + \frac{\langle \sum\_{i} \sum\_{j>i} \vec{r\_{ij}} \cdot \vec{f\_{ij}} \rangle}{3V} \tag{8}$$

where the second term on the right side is the standard virial of the system, r<sub>ij</sub> is the vector between the centers of particles i and j, and f<sub>ij</sub> is the corresponding force. The ideal, or kinetic, term now reflects the loss of one, out of 3N, translational degree of freedom. By defining the system volume and, therefore, being directly coupled to the barostat via the changes in the volume, the shell particle does not translate freely, that is, independently of the volume, in the x direction and, so, does not impart any momentum in the x direction to the surface of the cube. The shell particle translates freely in only two directions (y and z). The virial term in Equation (8) remains unchanged, since the shell particle still interacts with the other particles in the system. The ensemble average of P'' is related to the externally imposed pressure, P, as follows [15]:

$$
\langle P'' \rangle = P + \frac{2k\_B T}{3N} \langle \rho \rangle \tag{9}
$$

where ⟨ρ⟩ is the average density of the system.
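
As a quick consistency check of Equations (7)–(9), consider the ideal gas (U<sub>N</sub> = 0) with N > 1. The sketch below assumes that ρ denotes the number density N/V and uses the fact that the measure L<sup>3N−1</sup>e<sup>−βPL³</sup>dL implied by Equations (5) and (6) is proportional to V<sup>N−1</sup>e<sup>−βPV</sup>dV. Neither statement is spelled out in the text, so this should be read as an illustration rather than as part of the original derivation:

$$
\begin{aligned}
Q\_{cub}^\* \propto L^{3N-1} \quad &\Rightarrow \quad P'' = \frac{k\_B T}{3L^2}\,\frac{3N-1}{L} = \frac{(N - 1/3)k\_B T}{V} \\
\left\langle \frac{1}{V} \right\rangle &= \frac{\int\_0^\infty V^{N-2} e^{-\beta PV} dV}{\int\_0^\infty V^{N-1} e^{-\beta PV} dV} = \frac{\beta P}{N-1} \\
\langle P'' \rangle &= \frac{3N-1}{3(N-1)}\,P = P + \frac{2P}{3(N-1)} = P + \frac{2k\_B T}{3N} \langle \rho \rangle
\end{aligned}
$$

The last step uses ⟨ρ⟩ = N⟨1/V⟩ = NβP/(N − 1), so under these assumptions the ideal gas reproduces Equation (9) exactly.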

While revising the Monte Carlo NPT algorithm to incorporate the shell particle, Corti [15] derived several relations that describe how ensemble averages obtained within the new NPT partition function, Equation (4) or (6), relate to ensemble averages obtained with the old no-shell NPT partition function, Δ<sub>0</sub> (Equation (2)). If ⟨V⟩ represents the ensemble-averaged volume defined via the shell particle and ⟨V⟩<sub>0</sub> represents the ensemble-averaged volume defined via the old definition, then [15]:

$$
\langle V \rangle = \langle V \rangle\_0 - \frac{k\_B T}{P} \tag{10}
$$

Consequently, ⟨V⟩ < ⟨V⟩<sub>0</sub>; the difference between these two average volumes is only apparent at small system sizes, since in the thermodynamic limit, ⟨V⟩ → ⟨V⟩<sub>0</sub> (k<sub>B</sub>T/P is intensive).

#### *2.2. Ideal Gas Results*

The ideal gas offers a unique opportunity to obtain a closed form solution for the partition function. Using the shell molecule definition (Equation (4)), we obtain [14]:

$$\Delta(N, P, T) = \int\_0^\infty Q\_{id}^\*(N, V, T) e^{-\beta PV} dV = \int\_0^\infty \frac{V^{N-1}}{(N-1)!\Lambda^{3N}} e^{-\beta PV} dV = \left(\frac{1}{\beta P \Lambda^3}\right)^N \tag{11}$$

Using the following definition [31,32]:

$$
\langle V \rangle = -k\_B T \left( \frac{\partial \ln \Delta}{\partial P} \right)\_{\beta, N} \tag{12}
$$

we get the following expression for the equation of state [14]:

$$P\langle V\rangle = Nk\_BT\tag{13}$$

Now, we can use the older partition function (Equation (2)) and perform the same analysis. The partition function is [14]:

$$\Delta\_0(N, P, T) = \frac{1}{V\_0} \int\_0^\infty Q\_{\rm id}(N, V, T) e^{-\beta PV} dV = \frac{1}{V\_0} \int\_0^\infty \frac{V^N}{N! \Lambda^{3N}} e^{-\beta PV} dV = \frac{1}{V\_0} \frac{1}{\Lambda^{3N} (\beta P)^{N+1}} \tag{14}$$

We then get the following equation of state, noting that V<sub>0</sub> is a constant [14]:

$$P\langle V\rangle\_0 = (N+1)k\_BT\tag{15}$$

The use of (N + 1) or N is clearly inconsequential in the thermodynamic limit. Yet, the difference between Equations (13) and (15) is significant when the system is sufficiently small. In general, the ensemble averages calculated within different ensembles will not be the same for small systems. In contrast, ensemble averages are independent of the particular ensemble chosen to evaluate them when the system is in the thermodynamic limit. One exception, however, is the ideal gas. Due to the absence of inter-particle interactions, identical results should be obtained within all ensembles and for all system sizes. Hence, the small system NPT partition function of the ideal gas should yield Equation (13) and not Equation (15).
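
The two ideal-gas equations of state can also be checked numerically. The following is a minimal sketch, assuming reduced units in which βP is a single number `beta_p`; the function name `mean_volume`, the midpoint-rule quadrature, and the integration cutoff are illustrative choices and not part of the original work.

```python
import math


def mean_volume(n_particles, beta_p, shell=True, n_grid=200_000):
    """Midpoint-rule estimate of <V> for an ideal gas at beta_p = P / (k_B T).

    shell=True  weights volumes by V**(N-1) * exp(-beta_p * V), as in Equation (11);
    shell=False weights volumes by V**N     * exp(-beta_p * V), as in Equation (14).
    Constant prefactors (factorials, Lambda, V_0) cancel in the ratio.
    """
    power = n_particles - 1 if shell else n_particles
    shape = power + 1
    # Integrate far past the bulk of the weight (mean shape/beta_p, width sqrt(shape)/beta_p).
    v_max = (shape + 40.0 * math.sqrt(shape)) / beta_p
    dv = v_max / n_grid
    # Subtract the log of the peak weight so the exponential cannot overflow for large N.
    log_peak = power * (math.log(power / beta_p) - 1.0) if power > 0 else 0.0
    num = den = 0.0
    for k in range(n_grid):
        v = (k + 0.5) * dv
        w = math.exp(power * math.log(v) - beta_p * v - log_peak)
        num += v * w
        den += w
    return num / den
```

For example, with N = 3 and βP = 2, Equations (13) and (15) predict ⟨V⟩ = 1.5 and ⟨V⟩<sub>0</sub> = 2.0, and their difference equals k<sub>B</sub>T/P = 0.5; calling `mean_volume(3, 2.0, shell=True)` and `mean_volume(3, 2.0, shell=False)` should return values close to these.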

#### 3. Shell Molecule Equations of Motion

In order to simulate the NPT ensemble, a technique for maintaining a constant temperature needs to be introduced into the equations of motion. As mentioned earlier, Nosé [2,3] and Hoover [4] proposed a completely dynamic method for maintaining constant temperature in an MD simulation. An additional variable, which serves to couple the system to a thermostat of fixed temperature T, is added to the Lagrangian of the N-body system. An effective mass is then associated with this new variable and controls the time scale for temperature fluctuations. While this scheme is usually effective, it does not always perform well [23]. In some cases, the resulting equations of motion do not generate phase-space trajectories that are ergodic [4]. To overcome this potential problem, the Nosé-Hoover chain method [23] was later developed. In this method, multiple thermostats are themselves successively coupled to adjacent thermostats, thereby forming a chain of thermostats.

The equations of motion for the NPT ensemble with the shell particle are straightforwardly obtained by employing an extended Lagrangian approach. The full derivation is presented in Appendix A of reference [16]. For a cubic volume in which V = L<sup>3</sup>, we let the +x coordinate of particle 1, or the shell particle, define half the box length, L/2, of the simulation cell. We also choose q<sub>i</sub> and p<sub>i</sub> to represent the 3N generalized coordinates and conjugate momenta, respectively. Note that we always have q<sub>1</sub> = L/2. We therefore get the following equations of motion for an isothermal-isobaric ensemble consistent with Equation (6) using a single thermostat chain for all of the particles [16]:

$$
\begin{aligned}
\dot{q}\_i &= \frac{p\_i}{m\_i} + \left(\frac{\dot{q}\_1}{q\_1}\right) q\_i \\
\dot{p}\_i &= F\_i - \left(\frac{\dot{q}\_1}{q\_1}\right) p\_i - \dot{\xi}\_1 p\_i \\
\dot{q}\_{1} &= \frac{p\_{1}}{m\_{1}} \\
\dot{p}\_{1} &= 24q\_{1}^{2}(P\_{int} - P) - \dot{\xi}\_{1}p\_{1} \\
\dot{\xi}\_{k} &= \frac{p\_{\xi\_{k}}}{Q\_{k}} \\
\dot{p}\_{\xi\_{1}} &= \sum\_{j=1}^{3N} \frac{p\_{j}^{2}}{m\_{j}} - \dot{\xi}\_{2}p\_{\xi\_{1}} - gk\_{B}T \\
\dot{p}\_{\xi\_{k}} &= \frac{p\_{\xi\_{k-1}}^{2}}{Q\_{k-1}} - (\dot{\xi}\_{k+1})p\_{\xi\_{k}} - k\_{B}T \\
\dot{p}\_{\xi\_{C}} &= \frac{p\_{\xi\_{C-1}}^{2}}{Q\_{C-1}} - k\_{B}T
\end{aligned}
\tag{16}
$$

where i = 2, ..., 3N, and the overdots signify time derivatives. F<sub>i</sub> is the x, y or z component of the force acting on the particle represented by the ith generalized coordinate, and m<sub>i</sub> is the corresponding mass of the particle. Each ξ<sub>k</sub> is a thermodynamic friction coefficient introduced to simplify the equations, and p<sub>ξ<sub>k</sub></sub> is the corresponding momentum of ξ<sub>k</sub>, whose effective mass is Q<sub>k</sub>. C is the total number of coupled thermostats in the chain, so that k = 1, ..., C, and g denotes the total number of degrees of freedom of the momenta of the particles. The expression for the internal pressure, P<sub>int</sub>, which follows from Equation (6), is:

$$P\_{int} = \frac{1}{24q\_1^3} \left[ \sum\_{i=2}^{3N} \frac{p\_i^2}{m\_i} + \sum\_{i=1}^{3N} q\_i F\_i \right] \tag{17}$$

where the first summation runs from two to 3N, indicating that the x momentum of the shell particle does not contribute to the internal pressure.

The extended Hamiltonian, H<sub>ext</sub>, for this system is [16–18]:

$$H\_{ext} = \sum\_{i=1}^{3N} \frac{p\_i^2}{2m\_i} + \sum\_i \sum\_{j>i} U(r\_{ij}) + 8q\_1^3 P + \sum\_{k=1}^C \frac{p\_{\xi\_k}^2}{2Q\_k} + gk\_B T \xi\_1 + \sum\_{k=2}^C k\_B T \xi\_k \tag{18}$$

where V = 8q<sub>1</sub><sup>3</sup>, U(r<sub>ij</sub>) is the interaction potential between particles i and j, and Σ<sub>i</sub>Σ<sub>j > i</sub> is a sum over all distinct pairs of particles. Although the equations of motion cannot be obtained directly from Equation (18), H<sub>ext</sub> is a conserved quantity.
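
The conservation of the extended Hamiltonian can be illustrated with a short numerical sketch. The code below evaluates the right-hand side of Equation (16) and then the chain-rule time derivative of Equation (18), which should vanish for any state. Several things here are assumptions made for illustration: the function names `shell_rhs` and `dh_ext_dt`, the flat-list layout with index 0 holding the shell coordinate q<sub>1</sub>, a chain length C ≥ 2, and the convention that F<sub>i</sub> is the force itself (−∂U/∂q<sub>i</sub>), so that it enters the momentum equation with a plus sign as in Equation (21).

```python
def shell_rhs(q, p, m, force, p_xi, q_mass, g, kT, p_ext):
    """Right-hand side of Equation (16) for a thermostat chain of length C >= 2.

    q, p, m, force: lists of length 3N; index 0 is the shell coordinate q_1 = L/2.
    p_xi, q_mass:   thermostat momenta and masses (length C).
    Returns (dq, dp, dxi, dp_xi).
    """
    n, c = len(q), len(p_xi)
    dxi = [p_xi[k] / q_mass[k] for k in range(c)]

    # Internal pressure, Equation (17): the shell x momentum is left out of the
    # kinetic sum, but the force on the shell coordinate still enters the virial.
    kinetic = sum(p[i] ** 2 / m[i] for i in range(1, n))
    virial = sum(q[i] * force[i] for i in range(n))
    p_int = (kinetic + virial) / (24.0 * q[0] ** 3)

    dq1 = p[0] / m[0]
    strain = dq1 / q[0]  # (dq_1/dt) / q_1
    dq = [dq1] + [p[i] / m[i] + strain * q[i] for i in range(1, n)]
    dp = [24.0 * q[0] ** 2 * (p_int - p_ext) - dxi[0] * p[0]]
    dp += [force[i] - strain * p[i] - dxi[0] * p[i] for i in range(1, n)]

    dp_xi = [0.0] * c
    dp_xi[0] = sum(p[j] ** 2 / m[j] for j in range(n)) - dxi[1] * p_xi[0] - g * kT
    for k in range(1, c - 1):
        dp_xi[k] = p_xi[k - 1] ** 2 / q_mass[k - 1] - dxi[k + 1] * p_xi[k] - kT
    dp_xi[c - 1] = p_xi[c - 2] ** 2 / q_mass[c - 2] - kT
    return dq, dp, dxi, dp_xi


def dh_ext_dt(q, p, m, force, p_xi, q_mass, g, kT, p_ext):
    """Chain-rule time derivative of Equation (18), assuming force = -dU/dq."""
    dq, dp, dxi, dp_xi = shell_rhs(q, p, m, force, p_xi, q_mass, g, kT, p_ext)
    n, c = len(q), len(p_xi)
    rate = sum(p[i] * dp[i] / m[i] for i in range(n))        # particle kinetic energy
    rate -= sum(force[i] * dq[i] for i in range(n))           # potential energy
    rate += 24.0 * q[0] ** 2 * p_ext * dq[0]                  # d(8 q_1^3 P)/dt
    rate += sum(p_xi[k] * dp_xi[k] / q_mass[k] for k in range(c))  # thermostat kinetic
    rate += g * kT * dxi[0] + sum(kT * dxi[k] for k in range(1, c))
    return rate
```

Because the cancellation is algebraic, `dh_ext_dt` should return zero (to rounding error) for arbitrary inputs, which makes it a convenient sanity check on any hand-written integrator for these equations.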

With the exception of those equations of motion that describe the velocity and acceleration of q<sub>1</sub>, the proposed equations are the same as those of Andersen's method [1]. The expression for the acceleration of q<sub>1</sub> provides an interesting physical interpretation. Given that the area of a single face of the simulation cell is 4q<sub>1</sub><sup>2</sup> (since L = 2q<sub>1</sub>), the total surface area of the cube is 24q<sub>1</sub><sup>2</sup>. Multiplying this area by the difference between the internal and external pressures gives the net force that drives the acceleration of q<sub>1</sub>. This connection between the acceleration of q<sub>1</sub> and the pressure difference is more physically appealing than what appears in other methods [1,5,8]. Another benefit of the shell formulation is that the system itself sets the time scale for volume and pressure fluctuations, since the mass of the shell particle is known. In Andersen's method, there is an unknown piston mass that sets the response time of volume and pressure fluctuations.

In the simulations performed in this work, the forces acting in the y and z directions on each particle sum to zero when there are no external forces in the y and z directions. The sum of forces in the x direction will not be zero, since the x directional momentum of the shell particle is directly coupled to the barostat. Therefore, only the linear momenta in the y and z directions are conserved. Furthermore, to avoid particle drift during simulations when periodic boundary conditions are applied, the center-of-mass momentum in the y and z directions is set equal to zero. Again, the center-of-mass momentum in the x direction is driven by the external pressure and cannot be held fixed at a zero value, though volume fluctuations ensure that the total momentum in the x direction averages to zero. Consequently, it can be shown that g = 3N − 2 [16,33], indicating that the above equations of motion yield trajectories in phase space that are consistent with a (3N − 2)PT partition function (there are 3N − 2 momentum degrees of freedom) [16].

The new equations of motion that employ a shell particle to define the system volume provide another example of a non-Hamiltonian system, in that Equation (16) cannot be derived from the extended Hamiltonian in Equation (18). A systematic procedure for extending classical statistical mechanics to non-Hamiltonian systems was proposed by Tuckerman *et al.* [6,34]. The crux of their analysis relies on the notion that non-Hamiltonian phase space is compressible, as opposed to its Hamiltonian counterpart. For a non-Hamiltonian system, the Jacobian describing the transformation from an initial phase-space vector to a phase-space vector at time t is not equal to unity. The invariant phase-space metric for a non-Hamiltonian system is therefore not the same as that of a Hamiltonian system. Nevertheless, using the procedure of Tuckerman *et al.* [6,34], where the compressibility of the phase space is taken into account, the extended system partition function can still be derived from the equations of motion and the various constraints, or conservation relations, on the system. The detailed phase-space analysis of Equation (16) presented in reference [16] shows that the proposed shell particle equations of motion are completely consistent with the shell particle partition function (Equation (6)) with and without periodic boundary conditions.

#### *3.1. The Hoover Algorithm and Periodic Boundary Conditions*

The NPT partition function in Equation (6) can be rewritten if the system is homogeneous and periodic boundary conditions are applied. Han and Son [35] showed that since periodic boundary conditions yield a translationally symmetric system, particle 1 does not need to be held fixed inside the shell, dL. If all of the relative distances between the particles remain fixed, identical configurations will be generated if particle 1 is allowed to sample the entire instantaneous volume. Thus [15]:

$$\begin{split} Q\_{cub}^{\*}(N,L,T)dL &= \frac{3dL}{(N-1)!\Lambda^{3N}} \int\_{A} dy\_{1} dz\_{1} \int\_{V^{N-1}} d\tau\_{12} ... d\tau\_{1N} e^{-\beta U\_{N}} \\ &= \frac{3dL}{(N-1)!\Lambda^{3N}} \frac{1}{L} \int\_{V} d\tau\_{1} ... d\tau\_{N} e^{-\beta U\_{N}} = \frac{N}{V} Q(N,V,T)dV \end{split} \tag{19}$$

where Q(N, V, T) is the canonical partition function without the shell molecule. Using Equation (19), we can write the isothermal-isobaric partition function as:

$$
\Delta\_{PB}(N, P, T) \quad = \int \frac{N}{V} Q(N, V, T) e^{-\beta PV} dV \tag{20}
$$

where the PB subscript signifies that the partition function is only valid under the symmetry imposed by periodic boundary conditions [15,16]. This volume scale was derived earlier by Attard [36] using information theory.
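
The final equality in Equation (19) combines two small steps. Assuming the standard canonical form Q(N, V, T) = (1/N!Λ<sup>3N</sup>)∫<sub>V</sub> dτ<sub>1</sub>...dτ<sub>N</sub> e<sup>−βU<sub>N</sub></sup> (which the text does not write out explicitly) and using V = L<sup>3</sup>:

$$
\frac{3\,dL}{L} = \frac{3L^2\,dL}{L^3} = \frac{dV}{V}, \qquad \frac{1}{(N-1)!} = \frac{N}{N!} \quad \Rightarrow \quad \frac{3dL}{(N-1)!\Lambda^{3N}} \frac{1}{L} \int\_{V} d\tau\_{1} ... d\tau\_{N} e^{-\beta U\_{N}} = \frac{N}{V} Q(N,V,T)\,dV
$$

Under that assumption, this is the origin of the N/V weight that appears in Equation (20).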

Let us now consider the following equations of motion proposed for the NPT ensemble by Hoover [4,5] with a single chained thermostat:

$$\begin{array}{rcl} \dot{q}\_{i} &=& \frac{p\_{i}}{m\_{i}} + \left(\frac{\dot{V}}{3V}\right)q\_{i} \\ \dot{p}\_{i} &=& F\_{i} - \left(\frac{\dot{V}}{3V}\right)p\_{i} - \dot{\xi}\_{1}p\_{i} \\ \dot{V} &=& \frac{3p\_{\epsilon}}{M\_{p}}V \\ \dot{p}\_{\epsilon} &=& 3V(P\_{int} - P) - \dot{\xi}\_{1}p\_{\epsilon} \\ \dot{\xi}\_{k} &=& \frac{p\_{\xi\_{k}}}{Q\_{k}} \\ \dot{p}\_{\xi\_{1}} &=& \sum\_{j=1}^{3N} \frac{p\_{j}^{2}}{m\_{j}} + \frac{p\_{\epsilon}^{2}}{M\_{p}} - \dot{\xi}\_{2}p\_{\xi\_{1}} - gk\_{B}T \\ \dot{p}\_{\xi\_{k}} &=& \frac{p\_{\xi\_{k-1}}^{2}}{Q\_{k-1}} - (\dot{\xi}\_{k+1})p\_{\xi\_{k}} - k\_{B}T \\ \dot{p}\_{\xi\_{C}} &=& \frac{p\_{\xi\_{C-1}}^{2}}{Q\_{C-1}} - k\_{B}T \\ \end{array} \tag{21}$$

where i = 1, ..., 3N and:

$$P\_{int} = \frac{1}{3V} \left[ \sum\_{i=1}^{3N} \frac{p\_i^2}{m\_i} + \sum\_{i=1}^{3N} q\_i F\_i \right] \tag{22}$$

The extended Hamiltonian for the Hoover NPT algorithm is:

$$H\_{ext} = \sum\_{i=1}^{3N} \frac{p\_i^2}{2m\_i} + \sum\_i \sum\_{j>i} U(r\_{ij}) + \frac{p\_\epsilon^2}{2M\_P} + PV + \sum\_{k=1}^C \frac{p\_{\xi\_k}^2}{2Q\_k} + gk\_BT\xi\_1 + \sum\_{k=2}^C k\_B T\xi\_k \tag{23}$$

Tuckerman *et al.* [6] already performed the phase-space analysis on Hoover's equations of motion, in which they obtained a 1/V weighting in the volume distribution function when all three directional linear momenta are conserved and the three center-of-mass momenta are set to zero. The appearance of the 1/V weighting of the volume distribution makes it completely consistent with the partition function introduced by Attard (Equation (20)). The Hoover algorithm does in fact lead to the correct sampling of volume states, but only for homogeneous systems with periodic boundary conditions. In the absence of external forces, Hoover's algorithm yields a (3N − 2)PT ensemble: there are a total of (3N + 1) momentum degrees of freedom (3N particles and one volume), but now, the total linear momentum in each of the three directions is conserved. Therefore, g = 3N − 2 in Equation (21).

Although the Hoover algorithm does sample phase space correctly (but only for periodic boundary conditions), an unknown piston mass, which sets the response time of volume and pressure fluctuations, must still be specified. On the other hand, when the shell particle formulation is used, the system itself sets the time scale for volume and pressure fluctuations, since the mass of the shell particle is known. Furthermore, since the piston in the Hoover algorithm can have very different dynamics from the particles in the system, it is recommended to use two separate thermostat chains, one coupled to the particles and the other to the volume [8]. Introducing this second thermostat chain to drive the volume fluctuations in the Hoover algorithm requires that another set of unknown parameters, the additional thermostat masses, be specified [8]. This separate thermostat chain is not necessary with the shell particle algorithm, as the momentum of the shell particle is on the same scale as that of the other particles in the system [16,17].

As a final point of interest and, again, to focus on the effects of the different barostats, we briefly consider the results of the phase-space analysis of the NPT equations of motion for the shell (Equation (16)) and the Hoover algorithm (Equation (21)). The explicit partition functions are derived in reference [16], where the influence of each barostat is clearly seen. By definition, the enthalpy, H, of the system is equal to H = ℋ(q, p) + PV, where ℋ(q, p) = Σ<sub>i=1</sub><sup>3N</sup> p<sub>i</sub><sup>2</sup>/2m<sub>i</sub> + U(q) is the non-extended Hamiltonian. Hoover's algorithm generates an extended Hamiltonian (Equation (23)) that contains an additional term associated with the kinetic energy of the volume (p<sub>ε</sub><sup>2</sup>/2M<sub>P</sub>), a quantity that should not appear in the enthalpy if the boundary used to describe the system volume is a mathematical construct to which a mass or momentum cannot be assigned [14]. In contrast, each configuration in the shell particle partition function corresponds to one and only one volume state, since the non-extended Hamiltonian is directly coupled to the volume (*i.e*., there is a one-to-one correspondence between the non-extended Hamiltonian and the volume states) [16]. The redundant counting of volume states is not eliminated in the other algorithms, because those non-extended Hamiltonians are decoupled from the volume. Note that whenever we use the extended-variables approach to thermostat systems in this manuscript, there is always a kinetic term associated with the thermostat variables in the extended Hamiltonian. We are focusing here on how the positions and momenta sample the correct distribution, and the preceding argument on the enthalpy is independent of this kinetic term, which is due to the thermostat variables.

We conclude this section by noting that the ensemble average pressure for a system whose partition function is described by Equation (20) obeys the following relation [15]:

$$
\langle P' \rangle = P + \frac{k\_B T}{N} \langle \rho \rangle \tag{24}
$$

The correction to the ensemble average volume is the same as is given in Equation (10).

#### *3.2. Multicomponent Systems*

In this section, we discuss the extension of the shell particle MD algorithm to multicomponent systems. In particular, we consider a binary mixture composed of species A and B. In this case, the isothermal-isobaric partition function must include configurations in which the shell particle is of type A and configurations in which the shell particle is of type B. Therefore, the isothermal-isobaric partition function, Δ<sub>AB</sub>, is given by (only the case of a cubic volume is considered) [15,17]:

$$
\Delta\_{AB}(N, P, T) \quad = \int Q\_{cub}^{\*A}(N, V, T) e^{-\beta P L^3} dL + \int Q\_{cub}^{\*B}(N, V, T) e^{-\beta P L^3} dL \tag{25}
$$

where Q<sub>cub</sub><sup>∗A</sup>(N, V, T), for example, is the total number of configurations of N<sub>A</sub> particles of type A and N<sub>B</sub> particles of type B contained in a volume V = L<sup>3</sup> in which at least one of the N<sub>A</sub> particles resides in the shell, dL, encapsulating V.

When periodic boundary conditions are applied, one can show that the probability of a given configuration having a shell particle of type A is simply equal to the mole fraction of A. Begin with the partition function, Δ<sub>A</sub>, that includes only those configurations in which the shell particle is of type A:

$$
\Delta\_A(N, P, T) \quad = \int Q\_{cub}^{\*A}(N, V, T) e^{-\beta P L^3} dL \tag{26}
$$

For a homogeneous fluid in which periodic boundary conditions are employed, one can rewrite Δ<sub>A</sub>, following the argument presented by Han and Son [35], as:

$$
\Delta\_A(N, P, T) \quad = \quad N\_A \int \frac{Q(N, V, T)}{V} e^{-\beta PV} dV \tag{27}
$$

where Q(N, V, T) is the canonical ensemble partition function for N<sub>A</sub> and N<sub>B</sub> particles without a shell particle used to define the volume. The fraction of configurations containing a shell particle of type A is therefore given by N<sub>A</sub>/(N<sub>A</sub> + N<sub>B</sub>) = x<sub>A</sub> [17]. It was shown in reference [17] that the ensemble average of F (F being any given quantity) is given by:

$$
\langle F \rangle = x\_A \langle F \rangle\_A + x\_B \langle F \rangle\_B \tag{28}
$$

where ⟨F⟩<sub>A</sub> is the ensemble average obtained with only A as the shell particle and ⟨F⟩<sub>B</sub> is the ensemble average obtained with only B as the shell particle. Hence, two separate simulations can be run, each with a different identity of the shell particle, with the resulting ensemble averages simply weighted by the mole fractions of each component. Yet, one can proceed even further and demonstrate that only one simulation per state point is ultimately required, with the identity of the shell particle being completely arbitrary. With periodic boundary conditions, we showed in reference [17] that ⟨F⟩ = ⟨F⟩<sub>A</sub> = ⟨F⟩<sub>B</sub>. Therefore, only a single simulation is required; the choice of which species serves as the shell particle is solely a matter of convenience [17]. This conclusion also holds for mixtures with more than two components, again, only when periodic boundary conditions are employed.

#### *3.3. Collision Dynamics for Discontinuous Potentials*

In this section, we discuss the implementation of the shell particle formalism to simulate systems that have discontinuous intermolecular potentials [18]. Discontinuous molecular dynamics (DMD) has been widely used for quite some time, beginning with the initial work of Alder and Wainwright [37,38] in the microcanonical ensemble. Gruhn and Monson [39], following an analysis by de Smedt *et al.* [40], extended DMD for the hard-sphere potential to the NPT ensemble. Their method, however, was based on Andersen's constant pressure algorithm [1], which does not yield averages consistent with Equation (6). Gruhn and Monson [39] derived expressions for the discontinuous change of the momenta of two hard spheres upon collision, as well as the change of the velocity of the piston (or system volume) upon that same collision.

We use the shell particle equations of motion provided in Equation (16) to develop a constant pressure DMD algorithm for both the hard-sphere and square-well fluids that is consistent with the proper NPT ensemble partition function, Equation (6). Momentum changes upon the collision of any two particles, including those changes for the shell particle, whether or not it participates in the collision, were derived in reference [18] and are presented below. Our method is based on that of Gruhn and Monson [39], though we utilize the conservation of the extended Hamiltonian to obtain the collision dynamics. We simply present the results below; the reader interested in the detailed derivations is referred to [18].

In an additive hard-sphere system, the potential of interaction between two particles, i and j, with diameters σ<sub>i</sub> and σ<sub>j</sub>, respectively, is represented by:

$$u(r) = \begin{cases} \infty, & r < \sigma \\ 0, & r \ge \sigma \end{cases} \tag{29}$$

where r is the distance between the particle centers and σ = (σ<sub>i</sub> + σ<sub>j</sub>)/2. In between collisions, the hard-sphere fluid evolves dynamically without any force interactions. When applying the shell particle equations of motion to a hard-sphere collision, one must consider two separate cases: (1) neither particle i nor particle j is the shell particle and (2) either i or j is the shell particle. Furthermore, even if the shell particle does not participate in a collision, its x momentum will still change, since the acceleration of the shell particle is proportional to the internal pressure, which varies upon any collision.

There are several variables that are present in all of the expressions for the collision dynamics. These are the reduced mass, μ, the center-to-center vector, q, and r˙. The reduced mass is

$$
\mu = \frac{m\_i m\_j}{m\_i + m\_j} \tag{30}
$$

where m<sub>i</sub> and m<sub>j</sub> are the masses of particles i and j, respectively. The vector q is defined as q = q<sub>i</sub> − q<sub>j</sub>, and ṙ = (q · q̇)/σ is the time rate of change of |q| evaluated at |q| = σ. When neither of the colliding particles is the shell particle, the collision dynamics are given by:

$$
\begin{aligned}
t\_d &= \frac{-2\mu\dot{r}}{1 + \mu \sigma^2 / m\_1 q\_1^2} \\
\Delta \vec{p}\_i &= t\_d \frac{\vec{q}}{\sigma} \\
\Delta \vec{p}\_j &= -t\_d \frac{\vec{q}}{\sigma} \\
\Delta p\_1 &= t\_d \frac{\sigma}{q\_1}
\end{aligned}
\tag{31}
$$

When one of the colliding particles is the shell particle, the collision dynamics are now given by:

$$
\begin{aligned}
t\_d &= \frac{-2\mu\dot{r}}{1 + \mu\sigma^2/m\_1q\_1^2 - \mu q\_x^2/m\_1\sigma^2} \\
\Delta\vec{p}\_i &= t\_d\frac{\vec{q}}{\sigma} \\
\Delta p\_{1,y} &= -t\_d\frac{q\_y}{\sigma} \\
\Delta p\_{1,z} &= -t\_d\frac{q\_z}{\sigma} \\
\Delta p\_1 &= t\_d\frac{\sigma}{q\_1}
\end{aligned}
\tag{32}
$$

where, for example, q<sub>x</sub> is the x component of q.
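The two hard-sphere cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of [18]: the function names are hypothetical, the numerator of t<sub>d</sub> is taken to be −2μṙ in both cases (as in Equation (32)), and in the second case particle i is assumed to be the non-shell particle.

```python
def _impulse_scale(q, q_dot, m_i, m_j, sigma, m_1, q_1, shell_involved):
    """Return t_d for a hard-sphere contact (hypothetical helper)."""
    mu = m_i * m_j / (m_i + m_j)                      # reduced mass, Eq. (30)
    r_dot = sum(a * b for a, b in zip(q, q_dot)) / sigma
    denom = 1.0 + mu * sigma**2 / (m_1 * q_1**2)
    if shell_involved:                                # extra term of Eq. (32)
        denom -= mu * q[0]**2 / (m_1 * sigma**2)
    return -2.0 * mu * r_dot / denom


def hard_sphere_no_shell(q, q_dot, m_i, m_j, sigma, m_1, q_1):
    """Eq. (31): neither colliding particle is the shell particle."""
    t_d = _impulse_scale(q, q_dot, m_i, m_j, sigma, m_1, q_1, False)
    dp_i = tuple(t_d * c / sigma for c in q)
    dp_j = tuple(-c for c in dp_i)                    # equal and opposite
    dp_1 = t_d * sigma / q_1                          # shell x momentum
    return t_d, dp_i, dp_j, dp_1


def hard_sphere_with_shell(q, q_dot, m_i, m_j, sigma, m_1, q_1):
    """Eq. (32): one colliding particle is the shell particle."""
    t_d = _impulse_scale(q, q_dot, m_i, m_j, sigma, m_1, q_1, True)
    dp_i = tuple(t_d * c / sigma for c in q)
    dp_shell_yz = (-t_d * q[1] / sigma, -t_d * q[2] / sigma)
    dp_1 = t_d * sigma / q_1                          # shell x momentum
    return t_d, dp_i, dp_shell_yz, dp_1
```

In the limit of an infinitely heavy shell particle, the first function reduces to the familiar elastic hard-sphere impulse, t<sub>d</sub> = −2μṙ, which gives a quick sanity check.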

The square-well interaction potential is represented by:

$$u(r) = \begin{cases} \infty, & r < \sigma \\ -\epsilon, & \sigma \le r < \lambda \sigma \\ 0, & r \ge \lambda \sigma \end{cases} \tag{33}$$

where λ is the width and ε is the depth of the square well (both may vary depending upon the interaction between any two particles). The interaction at r → σ is identical to the hard-sphere collision obtained above. Three other types of collisions occur in the square-well system at r → λσ. The capture collision is the case where i and j start beyond λσ. Two types of collision occur at λσ when the starting distance between i and j is within the attractive well. The dissociation collision occurs when the molecules have enough kinetic energy to overcome the attractive potential energy, so that the molecules no longer interact, and the bounce collision occurs when there is not enough kinetic energy to overcome the attractive energy, so that the particle centers stay within the attractive well. The bounce dynamics are analogous to the hard-sphere collision presented above, the only difference being that σ is replaced by λσ in Equations (31) and (32). The collision is a dissociation collision if μṙ<sup>2</sup>/2 ≥ ε(1 + μλ<sup>2</sup>σ<sup>2</sup>/m<sub>1</sub>q<sub>1</sub><sup>2</sup>) when the shell particle does not take part in the collision, or if μṙ<sup>2</sup>/2 ≥ ε(1 + μλ<sup>2</sup>σ<sup>2</sup>/m<sub>1</sub>q<sub>1</sub><sup>2</sup> − μq<sub>x</sub><sup>2</sup>/m<sub>1</sub>λ<sup>2</sup>σ<sup>2</sup>) when it does; otherwise, the collision is a bounce.
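The bounce/dissociation test for a pair that reaches r = λσ from inside the well can be written as a short Python check. This is a sketch under assumptions: the function name is hypothetical, and the threshold is read as ε multiplied by the same factor that appears in the denominators of Equations (34) and (35), which is the condition for the square root in those equations to be real.

```python
def well_edge_event(mu, r_dot, eps, lam, sigma, m_1, q_1, q_x=None):
    """Classify an event at r = lam*sigma for a pair starting inside the well.

    q_x is the x component of q when the shell particle takes part in the
    collision, and None otherwise (assumed convention for this sketch).
    """
    factor = 1.0 + mu * lam**2 * sigma**2 / (m_1 * q_1**2)
    if q_x is not None:
        factor -= mu * q_x**2 / (m_1 * lam**2 * sigma**2)
    # Enough relative kinetic energy to climb out of the well?
    if 0.5 * mu * r_dot**2 >= eps * factor:
        return "dissociation"
    return "bounce"
```

For an infinitely heavy shell particle the factor equals one, and the test reduces to comparing the relative kinetic energy μṙ<sup>2</sup>/2 with the well depth ε.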

The collision dynamics for capture and dissociation differ only by a plus/minus sign, so we present them together. When neither of the colliding particles is the shell particle, the collision dynamics are given by:

$$
\begin{aligned}
t\_d &= \frac{\mu \dot{r} \pm \mu [\dot{r}^2 \pm (2\epsilon/\mu)(1 + \mu \lambda^2 \sigma^2/m\_1 q\_1^2)]^{1/2}}{1 + \mu \lambda^2 \sigma^2/m\_1 q\_1^2} \\
\Delta \vec{p}\_i &= -t\_d \frac{\vec{q}}{\lambda \sigma} \\
\Delta \vec{p}\_j &= t\_d \frac{\vec{q}}{\lambda \sigma} \\
\Delta p\_1 &= -t\_d \frac{\lambda \sigma}{q\_1}
\end{aligned}
\tag{34}
$$

with the "+" being for capture and the "−" being for dissociation. When one of the colliding particles is the shell particle, then the collision dynamics are given by:

$$\begin{aligned} t\_d &= \frac{\mu \dot{r} \pm \mu [\dot{r}^2 \pm (2\epsilon/\mu)(1 + \mu\lambda^2 \sigma^2/m\_1 q\_1^2 - \mu q\_x^2/m\_1 \lambda^2 \sigma^2)]^{1/2}}{1 + \mu\lambda^2 \sigma^2/m\_1 q\_1^2 - \mu q\_x^2/m\_1 \lambda^2 \sigma^2} \\ \Delta \vec{p}\_i &= -t\_d \frac{\vec{q}}{\lambda \sigma} \\ \Delta p\_{1,y} &= t\_d \frac{q\_y}{\lambda \sigma} \\ \Delta p\_{1,z} &= t\_d \frac{q\_z}{\lambda \sigma} \\ \Delta p\_1 &= -t\_d \frac{\lambda \sigma}{q\_1} \end{aligned} \tag{35}$$
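As an illustration of Equations (34) and (35), the following Python sketch evaluates t<sub>d</sub> for capture and dissociation. The function name and the `kind` argument are hypothetical, and the sketch assumes that both ± signs take the same value ("+" for capture, "−" for dissociation), which is the reading that changes the relative kinetic energy by exactly ±ε when the shell particle is infinitely heavy.

```python
import math


def square_well_t_d(mu, r_dot, eps, lam, sigma, m_1, q_1, kind, q_x=None):
    """t_d of Eq. (34), or of Eq. (35) when q_x is given (shell involved)."""
    sign = {"capture": 1.0, "dissociation": -1.0}[kind]
    d = 1.0 + mu * lam**2 * sigma**2 / (m_1 * q_1**2)
    if q_x is not None:
        d -= mu * q_x**2 / (m_1 * lam**2 * sigma**2)
    disc = r_dot**2 + sign * (2.0 * eps / mu) * d
    if disc < 0.0:
        # Not enough kinetic energy to leave the well: treat as a bounce.
        raise ValueError("negative discriminant: event is a bounce")
    return (mu * r_dot + sign * mu * math.sqrt(disc)) / d
```

With Δp<sub>i</sub> = −t<sub>d</sub>q/λσ and Δp<sub>j</sub> = +t<sub>d</sub>q/λσ, the radial relative velocity changes by −t<sub>d</sub>/μ in the heavy-shell limit, so the energy bookkeeping can be verified directly.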

We integrate the NPT equations of motion in between collisions via the application of the generalized Trotter expansion formula to the extended phase space classical Liouville operator discussed in the appendix and [17,18,22,24,41]. Since the thermostat variables have no influence on a hard-sphere or square-well collision, the updates of the thermostats can be completely decoupled from the updates of the particle positions and the momentum changes upon a collision. The full integration scheme is presented in detail in [18].

#### *3.4. Shell Particle Simulations Using the Configurational Temperature*

The concept of a configurational temperature was introduced in 1997 in the seminal paper by Rugh [42], which provided a tractable statistical mechanical expression for the reciprocal of this temperature. The expression for the configurational temperature was later generalized by Jepps *et al.* [43]. Since then, several MD algorithms have been developed that make use of the configurational temperature, but the few that are most useful for the current discussion are by Braga and Travis [44,45]. They introduced NPT equations of motion that use the configurational temperature and showed the benefits of using this temperature, instead of the standard kinetic temperature, within nonequilibrium simulations [44,45].

The equations of motion that they derived are not consistent with the shell molecule partition function, however, and so, we reformulated their equations to account for this. The new equations of motion are:

$$\begin{aligned} \dot{q}\_i &= \quad \frac{p\_i}{m\_i} + \left(\frac{\dot{q}\_1}{q\_1}\right)q\_i + \left(\frac{\dot{q}\_1}{q\_1}\right)(3N - 1)\frac{F\_i}{\Delta'} + \dot{\xi}\frac{F\_i}{\Delta'}\\ \dot{p}\_i &= \quad F\_i\\ \dot{q}\_1 &= \quad \frac{p\_1}{m\_1}\\ \dot{p}\_1 &= \quad 24q\_1^2(P\_{int} - P)\\ \dot{\xi} &= \quad \frac{p\_\xi}{Q}\\ \dot{p}\_\xi &= \quad \sum\_{i=2}^{3N} \frac{F\_i^2}{\Delta'} - k\_B T \end{aligned} \tag{36}$$

where i = 2, ..., 3N,

$$
\Delta' = \sum\_{i=2}^{3N} (\frac{\partial^2 U}{\partial q\_i^2}) \tag{37}
$$

where U is the total potential energy, and:

$$P\_{int} = \frac{1}{24q\_1^3} \left[ (3N - 1) \sum\_{i=2}^{3N} \frac{F\_i^2}{\Delta'} + \sum\_{j=1}^{3N} q\_j F\_j \right] \tag{38}$$

The extended Hamiltonian for the configurational temperature shell molecule system is:

$$H\_{ext} = \sum\_{i=1}^{3N} \frac{p\_i^2}{2m\_i} + \sum\_{i} \sum\_{j>i} U(r\_{ij}) + 8q\_1^3 P + \frac{p\_\xi^2}{2Q} + k\_B T \xi \tag{39}$$

The instantaneous configurational temperature, k<sub>B</sub>T<sub>conf</sub>, is:

$$k\_B T\_{conf} = \sum\_{i=2}^{3N} \frac{F\_i^2}{\Delta'} \tag{40}$$

The instantaneous configurational temperature appearing in the above shell particle equations of motion differs from that of Braga and Travis [44,45] in that the sums appearing in Equations (37) and (40) run from two to 3N, rather than from one to 3N. The x-component of the shell particle is not included in these summations, although the shell particle does still contribute to the forces (and their derivatives) of the remaining particles. Since configurational temperature thermostatting appears to be preferred in nonequilibrium simulations, as known artifacts seen in some simulations with the Nosé-Hoover thermostat were not exhibited with the configurational temperature thermostat [44,45], we also include below new results for NPT MD simulations with the shell molecule using the configurational temperature.
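A minimal Python sketch of Equations (37) and (40) may help make the index convention concrete. The function name and the flat-list layout are assumptions of this sketch: index 0 is taken to hold the x coordinate of the shell particle, which is skipped in both sums.

```python
def configurational_kT(forces, second_derivs):
    """k_B * T_conf from Eq. (40), with Delta' from Eq. (37).

    forces[i]        : force component F_i
    second_derivs[i] : d^2 U / d q_i^2
    Entry 0 (shell particle x coordinate) is excluded from both sums.
    """
    numerator = sum(f * f for f in forces[1:])
    delta_prime = sum(second_derivs[1:])
    if delta_prime == 0.0:
        # Happens, for example, if no pair lies within the potential cutoff.
        raise ZeroDivisionError("Delta' vanished; T_conf is undefined")
    return numerator / delta_prime
```

For independent harmonic coordinates, U = Σ kq<sub>i</sub><sup>2</sup>/2, the expression reduces to k times the mean of q<sub>i</sub><sup>2</sup> over the included coordinates, which is a convenient check.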

#### 4. Results and Discussion

Several tests of the new shell particle equations of motion (Equation (16)) have been published previously. The agreement between isobars predicted by the new shell molecule molecular dynamics, the shell molecule Monte Carlo algorithm [15] and the equation of state for the Lennard-Jones fluid introduced by Johnson *et al.* [46] is shown in Figure 1 of [17]. Similar agreement between the MD and MC results is presented in Figure 1 of [18] for the square-well potential, which also includes, for comparison, the predictions of an equation of state introduced by Patel *et al.* [47]. For both systems, the MD and MC simulation results agree with each other and with the appropriate equation of state to high accuracy over a very broad range of pressures. On the scale of the plots, the MD and MC results are nearly indistinguishable [17,18]. We also looked at the self-diffusion coefficients for various binary Lennard-Jones mixtures and compared them with results from MD simulations in the microcanonical ensemble [17]. The self-diffusion coefficients of each species were essentially identical within both ensembles.

Table 1. Comparison of various ensemble averages for the truncated and shifted Lennard-Jones fluid with a cutoff of 1.5σ for T<sup>∗</sup> = 1.5 and P<sup>∗</sup> = 0.5. The top dataset was obtained with the shell particle Monte Carlo method [15]. The second dataset was generated from the shell particle molecular dynamics (MD) algorithm using the Nosé-Hoover chained thermostat. The third dataset was obtained from the shell particle MD algorithm using the configurational temperature thermostat. The fourth dataset was obtained from the Hoover algorithm. The bottom set are the results of constant pressure MC simulations without a shell particle. The numbers in parentheses indicate the error in the final significant digits.


The differences between systems that sample the rigorously correct volume distributions and those that do not can most readily be seen in small systems. Equations (9) and (10), for example, provide strict tests of the validity of the shell particle equations of motion when compared against simulation methodologies that do not employ the correct volume scale. Several state points for Lennard-Jones, hard-sphere and square-well fluids are compared in [17,18]. In each of the conditions studied, the relations derived earlier are found to be satisfied to a high accuracy. We also included results from the Hoover algorithm (Equation (21), where it is important to note that the internal pressure equation for the Hoover algorithm is given by Equation (24)).

As an additional test, we present here results for a pure component Lennard-Jones fluid with system sizes ranging from N = 16 to 256. To avoid the use of long-range corrections, as well as force profiles that would not sum to zero in the y and z directions, we utilized the truncated and shifted force Lennard-Jones potential [38]:

$$u(r) \;=\; 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] - u\_{LJ}(r\_c) - (r - r\_c)u'\_{LJ}(r\_c) \tag{41}$$

where r<sub>c</sub> is the cutoff distance and u′<sub>LJ</sub>(r<sub>c</sub>) is the derivative of the Lennard-Jones potential at the cutoff distance. At the chosen truncation distance, both the potential and force smoothly vanish. To prevent the truncation distance from exceeding half the box length at small system sizes, since periodic boundary conditions were employed, we chose r<sub>c</sub> = 1.5σ. Up to 10<sup>6</sup> time steps of equilibration were performed, followed by up to 10<sup>8</sup> time steps at the smallest system sizes for the determination of ensemble averages. All simulations were run at T<sup>∗</sup> = 1.5. The results for a reduced external pressure of P<sup>∗</sup> = 0.5 are included in Table 1. Additionally, provided in Table 1 are the averages obtained from MC simulations both with and without the shell particle.
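Equation (41) is straightforward to code. The sketch below uses hypothetical function names and reduced units (ε = σ = 1 by default) and returns both the potential and its radial derivative, so that the vanishing of the potential and the force at r<sub>c</sub> can be checked numerically.

```python
def lj(r, eps=1.0, sigma=1.0):
    """Plain Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_deriv(r, eps=1.0, sigma=1.0):
    """du_LJ/dr of the plain Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def lj_shifted_force(r, r_c=1.5, eps=1.0, sigma=1.0):
    """Truncated and shifted-force potential of Eq. (41)."""
    if r >= r_c:
        return 0.0
    return (lj(r, eps, sigma) - lj(r_c, eps, sigma)
            - (r - r_c) * lj_deriv(r_c, eps, sigma))


def lj_shifted_force_deriv(r, r_c=1.5, eps=1.0, sigma=1.0):
    """du/dr of Eq. (41); the pair force is minus this value."""
    if r >= r_c:
        return 0.0
    return lj_deriv(r, eps, sigma) - lj_deriv(r_c, eps, sigma)
```

Both functions go continuously to zero as r approaches r<sub>c</sub> from below, which is the property that removes the need for long-range corrections.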

According to Equation (10), the average volume of the shell particle simulations should be three units lower than the no-shell simulations in Table 1. The simulation results agree quite well with these predictions, considering the large absolute volume fluctuations that are obtained, and satisfy Equation (10) with an accuracy similar to that noted in [15]. As expected, the average volume, density and internal energy per particle obtained with the shell particle MD with the traditional Nosé-Hoover thermostat, the shell particle MD with the configurational thermostat and the Hoover algorithm are in agreement, at least within the error bars. Both sets of averages are nearly the same for P<sup>∗</sup> = 0.5. The average internal pressures differ, but each is seen to satisfy Equations (9) and (24) to a high degree of accuracy. Both of the shell particle results and the Hoover results also agree, within the error bars, with the MC shell particle simulations. There is, however, a slight discrepancy between the MD and MC shell particle results at very small system sizes (N ≤ 64), particularly for the values of P<sub>int</sub> and the average density or volume. This difference can be attributed to the relatively large temperature fluctuations that develop within the MD simulations, as opposed to the strictly fixed temperature during the MC simulations. Statistical mechanics requires that the kinetic temperature of the system have a standard deviation of:

$$
\sigma\_T = \sqrt{\langle T^2 \rangle - \langle T \rangle^2} = T\_{bath} \sqrt{\frac{2}{3N}} \tag{42}
$$

where T<sub>bath</sub> is the temperature of the surrounding temperature reservoir and N is the number of particles in the system. Equation (42) holds regardless of the usage of periodic boundary conditions [25,33]. In Table 1, we report the standard deviation by the number in parentheses indicating the error in the final significant digits. As an example, the number 1.50(31) means that the average is calculated to be 1.50 and the standard deviation is 0.31. The results for the temperature fluctuations in Table 1 agree very closely with Equation (42).
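Equation (42) can be evaluated directly for the system sizes of interest. The short Python sketch below uses a hypothetical function name and prints the expected fluctuation for the two end points of the size range quoted above, at T<sup>∗</sup> = 1.5.

```python
import math


def kinetic_temperature_std(t_bath, n_particles):
    """Standard deviation of the kinetic temperature, Eq. (42)."""
    return t_bath * math.sqrt(2.0 / (3.0 * n_particles))


if __name__ == "__main__":
    for n in (16, 256):
        print(n, round(kinetic_temperature_std(1.5, n), 4))
```

For N = 16 the expression gives roughly 0.31, which is consistent with the illustrative entry 1.50(31) mentioned in the text.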

Furthermore, presented in Table 1 are the results for the shell molecule configurational temperature NPT simulations. Note that the results for the configurational temperature are not provided for N = 16. At this small density and small number of particles, there is a chance that no pairs of particles reside within the cutoff distance. As a result, Δ′ is equal to zero and the integration scheme breaks down for that time step. This problem only arose for the smallest system size (N = 16). Additionally, note in Table 1 that the configurational temperature yields the largest temperature fluctuations as compared to the other simulation methods. Again, at this relatively low density, the effects of adding or deleting one or two particle pairs within the potential cutoff for each time step are greatly enhanced for the configurational temperature (as compared to the kinetic temperature, which is based solely on the particle momenta). The average volumes are larger than they should be and the average potential energy is lower than it should be for the configurational temperature simulations. Interestingly, the results do seem to improve consistently as the number of particles increases. This may be due, in some small part, to the given expression for the configurational temperature, which as a measure of the system temperature is only accurate on the order of (1/N) [44,45].

#### *4.1. Discontinuous Pressure Jumps*

Ultimately, the true benefits of the shell particle algorithm may become apparent for nonequilibrium simulations where the system itself sets the time scale for pressure/volume fluctuations. It is important to note that in a multicomponent system, there is freedom to choose the identity of the shell particle, although the masses of the various components comprising the mixture are still known. To gain some initial idea of how the shell particle equations of motion might behave in a nonequilibrium application, we ran a pressure-jump simulation in which the external pressure is abruptly changed after the system has equilibrated. For example, we first consider the response of the internal pressure to a sudden change in the external pressure from P<sup>∗</sup> = 1.0 to P<sup>∗</sup> = 2.0 and then back to P<sup>∗</sup> = 1.0 at T<sup>∗</sup> = 2.0 for the pure component Lennard-Jones fluid with N = 500 and long-range corrections applied after a potential cutoff of 3.0σ. The resulting time evolution of the internal pressure is shown in Figure 2.

The figure includes results for the shell molecule with the Nosé-Hoover thermostat, the shell molecule with the configurational temperature thermostat and the Hoover algorithm with the reduced piston mass, M<sup>∗</sup><sub>p</sub>, equal to 10.0 and 5.0, respectively. Both of the shell particle simulations and the Hoover simulation with M<sup>∗</sup><sub>p</sub> = 10.0 quickly adjust to the new external pressure, while the Hoover simulation with M<sup>∗</sup><sub>p</sub> = 5.0 requires a much longer time to re-equilibrate (again, the time scale obtained from the Hoover code is directly dependent upon the mass of the piston, whereas the time scale for the shell particle algorithm is automatically set by the system). This result is somewhat surprising, since the two piston masses are so close in their numerical values. The fluctuations of the internal pressure exhibited by both of the shell particle codes at the new external pressure of P<sup>∗</sup> = 2.0 and, then, again, at P<sup>∗</sup> = 1.0 are immediately identical to the fluctuations seen from regular equilibrium simulations at P<sup>∗</sup> = 2.0 and P<sup>∗</sup> = 1.0. The internal pressure for the Hoover simulations also adjusts to the new external pressure, but there does appear to be a considerable "decay" to the new set point after the pressure jump. This decay is dependent on the value of the piston mass. A smaller value of the piston mass yields a longer decay in the instantaneous pressure to the new equilibrium point.

Figure 2. Time response of the internal pressure of the pure component Lennard-Jones fluid to a sudden change of the external pressure from P<sup>∗</sup> = 1.0 to P<sup>∗</sup> = 2.0 and, then, back down to P<sup>∗</sup> = 1.0. For the given choice of the time origin, the pressure is increased after 2000 time steps and, then, reduced after another 4000 time steps. The solid line is the set external pressure, P. The dashed lines are the simulation results. In all cases, T<sup>∗</sup> = 2.0 and N = 500. The plot in the upper-left corner shows the results for the shell molecule with the Nosé-Hoover thermostat. The plot in the upper-right corner shows the results for the shell molecule with the configurational temperature thermostat. The plots in the lower-left and lower-right corners show the results for the Hoover algorithm with M<sup>∗</sup><sub>p</sub> = 10.0 and M<sup>∗</sup><sub>p</sub> = 5.0, respectively.

We also performed an isothermal compression as a series of steps from P<sup>∗</sup> = 1.0 to P<sup>∗</sup> = 4.0 in increments of 1.0 unit of reduced pressure every 2000 time steps at T<sup>∗</sup> = 2.0 for the pure component Lennard-Jones fluid with N = 500 and long-range corrections applied after a potential cutoff of 3.0σ.

The results are presented in Figure 3. The value of the time step, along with every other aspect of the simulations, is the same as for Figure 2. As before, both of the shell particle simulations exhibit fluctuations of the internal pressure after the jumps that are almost immediately identical to the fluctuations seen from regular equilibrium simulations at the respective set external pressures. The internal pressure for the Hoover simulations also adjusts to the new external pressure, but again, there appears to be a considerable decay to the new set point after the pressure jump. The simulation with M<sup>∗</sup><sub>p</sub> = 5.0 does not allow the internal pressure to equilibrate after an external pressure jump before the system takes the next jump. This shows that the value of the piston mass is critical to capturing the dynamics and fluctuations in nonequilibrium systems and that the results can be considerably different for values of the piston mass that are relatively close to one another.

Figure 3. Time response of the internal pressure of the pure component Lennard-Jones fluid to an isothermal compression from P<sup>∗</sup> = 1.0 to P<sup>∗</sup> = 4.0 in increments of 1.0 unit of reduced pressure every 2000 time steps. For the given choice of the time origin, the pressure is increased after 2000, 4000 and, again, after 6000 time steps. The solid line is the set external pressure, P. The dashed lines are the simulation results. In all cases, T<sup>∗</sup> = 2.0 and N = 500. The plot in the upper-left corner shows the results for the shell molecule with the Nosé-Hoover thermostat. The plot in the upper-right corner shows the results for the shell molecule with the configurational temperature thermostat. The plots in the lower-left and lower-right corners show the results for the Hoover algorithm with M<sup>∗</sup><sub>p</sub> = 10.0 and M<sup>∗</sup><sub>p</sub> = 5.0, respectively.
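The two external-pressure protocols described in the captions of Figures 2 and 3 can be expressed as simple step functions of the time-step index. The Python sketch below is illustrative only: the function names are hypothetical, and whether the new pressure applies at the switching step itself or one step later is an assumption.

```python
def pressure_jump(step):
    """Figure 2 protocol: P* = 1.0, then 2.0 after 2000 steps,
    then back to 1.0 after another 4000 steps."""
    return 2.0 if 2000 <= step < 6000 else 1.0


def isothermal_compression(step):
    """Figure 3 protocol: P* raised from 1.0 to 4.0 in unit
    increments every 2000 steps (after 2000, 4000 and 6000)."""
    return 1.0 + min(step // 2000, 3)
```

In a driver loop, the returned value would simply replace the set external pressure P in the equations of motion before each integration step.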

#### 5. Conclusions

The MD NPT simulation method that employs the shell particle is based on equations of motion consistent with the proper statistical mechanical formulation of the NPT ensemble. Within other MD methods, a piston of arbitrary mass is introduced to control the response time of volume fluctuations. Now, the shell particle of known mass determines the time scale for volume and pressure fluctuations, in addition to performing the important function of eliminating the redundant counting of configurations through its unique definition of the volume of the system.

There are several benefits to using the shell particle algorithm for MD equilibrium simulations. For example, as the shell particle directly interacts with all other particles in the system, only a single Nosé-Hoover chained thermostat need be employed. Additionally, as noted above, the mass of the "volume", that is, the shell particle, is a known quantity.

Allowing the system itself to control the relaxation time of property fluctuations should ultimately provide a significant edge over piston-based methods, specifically for nonequilibrium systems (though that has yet to be shown). Adapting the shell particle approach to a simulation of isothermal-isobaric shear flow [44,45] or homogeneous nucleation in simple fluids [48–50] may provide another worthwhile test of the shell particle formalism. The piston mass is not known *a priori* and greatly affects the response time of the system. As such, for nonequilibrium simulations, the appropriate choice of a piston mass is unclear. On the other hand, at least for a pure component system, there is no ambiguity as to what should be the response time of the volume fluctuations; the mass of the shell particle is again a known input to the simulation. For mixtures, however, the identity of the shell particle is important (though not for equilibrium averages). Yet, the masses of the components comprising the mixture are still known. How the identity of the shell particle controls the response time of pressure and volume fluctuations is therefore, in the end, still a property of the system itself.

Future directions include incorporating the shell particle formalism into constant pressure molecular dynamics algorithms for both molecular systems and systems with long-range intermolecular interactions (such as electrostatic interactions). To date, we have not performed any simulations of these systems with the shell particle, so definitive statements would be inappropriate at this time. However, at first sight, it would appear that selecting the center of mass of one of the molecules as the volume scale is a logical choice for molecular systems. It also appears that the techniques already employed to deal with long-range interactions in previous algorithms (such as the Ewald summation in electrostatic systems) would apply equally to shell particle algorithms. There is nothing in the derivations given in this manuscript that suggests to us that intra- or long-range intermolecular interactions would have any effect on the validity of the shell molecule formulation.

Finally, various shape changing, or isotension, simulations [25] may also benefit from the use of the shell particle. The introduction of a shell particle into these codes would add some complexities, such as possibly allowing the shell particle to move along different faces of the system volume. Nevertheless, eliminating the need to specify a piston mass might be very helpful for these simulations.

#### Acknowledgments

This work was supported by the National Science Foundation/EPSCoR Grant (EPS-0903795) to Mark J. Uline.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### Appendix

#### Integration Scheme Using the Liouville Operator

We integrate the NPT equations of motion via the application of the generalized Trotter expansion formula to the extended phase space classical Liouville operator [25,41]. What follows from this approach is an integration scheme that is time-reversible and volume-preserving in the appropriate extended phase space, yielding stable trajectories with no significant drift in the extended Hamiltonian. One begins with a unitary operator, called the classical propagator, that propagates the appropriate phase space vector, Γ, from an initial state at time t = 0 to a final state at time t, *i.e*., Γ(0) → Γ(t). The evolution as a function of time t can be formally written as the solution of the following equation:

$$\vec{\Gamma}(t) \quad = \exp(iLt)\vec{\Gamma}(0) \tag{43}$$

where iL is the Liouville operator, defined as iL = Γ̇ · ∇<sub>Γ</sub>, and exp(iLt) is the classical propagator. The Liouville operator can be expressed as the sum of other operators, for example, iL = iL<sub>1</sub> + iL<sub>2</sub>, such that the action of each separate classical propagator can be evaluated analytically. Since iL<sub>1</sub> and iL<sub>2</sub> do not, in general, commute, exp(iLt) cannot be replaced by exp(iL<sub>1</sub>t)exp(iL<sub>2</sub>t). The classical propagator can be rewritten, however, using the Trotter expansion theorem [21–26],

$$e^{iLt} = e^{(iL\_1 + iL\_2)t} = e^{iL\_1t/2} e^{iL\_2t} e^{iL\_1t/2} + O(t^3) \tag{44}$$

Application of the above operator, along with the given equations of motion, provides a sequence of operations on the components of Γ, thereby generating an integration scheme that updates Γ from time t to time t + Δt. Since we start from the Liouville formulation of classical mechanics and the classical propagator is a unitary operator, the resulting MD algorithm is guaranteed to be time-reversible and (phase space) volume-preserving (within finite machine precision and to the second order in the chosen time step, Δt).
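To illustrate how the factorization in Equation (44) turns into an update sequence, consider a toy example that is not part of the shell particle scheme: a one-dimensional harmonic oscillator with iL<sub>1</sub> = F ∂/∂p and iL<sub>2</sub> = (p/m) ∂/∂q. The Python sketch below (hypothetical function names, unit mass and force constant by default) applies the three exponentials in order, which reproduces the velocity Verlet scheme.

```python
def trotter_step(q, p, dt, m=1.0, k=1.0):
    """One step of exp(iL1 dt/2) exp(iL2 dt) exp(iL1 dt/2)."""
    p += 0.5 * dt * (-k * q)   # half kick:  iL1 = F d/dp
    q += dt * p / m            # full drift: iL2 = (p/m) d/dq
    p += 0.5 * dt * (-k * q)   # half kick with the updated force
    return q, p


def energy(q, p, m=1.0, k=1.0):
    """Total energy of the oscillator."""
    return 0.5 * p * p / m + 0.5 * k * q * q


def run(q, p, dt, n_steps):
    """Apply trotter_step n_steps times."""
    for _ in range(n_steps):
        q, p = trotter_step(q, p, dt)
    return q, p
```

Running forward, reversing the momentum, and running forward again returns the oscillator to its starting point up to round-off, which is the time-reversibility property referred to above; the energy error stays bounded rather than drifting.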

The phase space vector appropriate for the shell particle equations of motion with a single chain of C thermostats is Γ = Γ(q<sub>1</sub>, p<sub>1</sub>, q<sub>i</sub>, p<sub>i</sub>, ξ<sub>k</sub>, p<sub>ξ<sub>k</sub></sub>), where i = 2 to 3N and k = 1 to C. In this case, the Liouville operator is given by [17]:

$$iL\_{NPT} = \dot{q}\_1 \frac{\partial}{\partial q\_1} + \dot{p}\_1 \frac{\partial}{\partial p\_1} + \sum\_{i=2}^{3N} \left[ \dot{q}\_i \frac{\partial}{\partial q\_i} + \dot{p}\_i \frac{\partial}{\partial p\_i} \right] + \sum\_{k=1}^{C} \left[ \dot{\xi}\_k \frac{\partial}{\partial \xi\_k} + \dot{p}\_{\xi\_k} \frac{\partial}{\partial p\_{\xi\_k}} \right] \tag{45}$$

By changing the order of some of the terms and replacing the various time derivatives with their expressions given in Equation (16), we rewrite the full Liouville operator as a sum of the following five separate operators:

$$iL\_{NPT} = iL\_q + iL\_{p\_1} + iL\_{p\_i} + iL\_{p\_{1i}} + iL\_{NH} \tag{46}$$

in which:

$$\begin{aligned} iL\_{q} &= \frac{p\_{1}}{m\_{1}}\frac{\partial}{\partial q\_{1}} + \sum\_{i=2}^{3N} \left[ \left(\frac{p\_{i}}{m\_{i}} + \left(\frac{\dot{q}\_{1}}{q\_{1}}\right)q\_{i} \right) \frac{\partial}{\partial q\_{i}} \right] \\ iL\_{p\_{1}} &= G\_{\epsilon} \frac{\partial}{\partial p\_{1}} \\ iL\_{p\_{i}} &= \sum\_{i=2}^{3N} \left[ F\_{i} \frac{\partial}{\partial p\_{i}} \right] \\ iL\_{p\_{1i}} &= \sum\_{i=2}^{3N} \left[ \left(-\frac{\dot{q}\_{1}}{q\_{1}}\right)p\_{i} \frac{\partial}{\partial p\_{i}} \right] \\ iL\_{NH} &= \sum\_{i=2}^{3N} \left[ \left(-\frac{p\_{\xi\_{1}}}{Q\_{1}}\right)p\_{i} \frac{\partial}{\partial p\_{i}} \right] - \frac{p\_{\xi\_{1}}}{Q\_{1}}p\_{1} \frac{\partial}{\partial p\_{1}} + \sum\_{k=1}^{C} \frac{p\_{\xi\_{k}}}{Q\_{k}} \frac{\partial}{\partial \xi\_{k}} + \sum\_{k=1}^{C} G\_{\xi\_{k}} \frac{\partial}{\partial p\_{\xi\_{k}}} - \sum\_{k=1}^{C-1} \frac{p\_{\xi\_{k+1}}}{Q\_{k+1}} p\_{\xi\_{k}} \frac{\partial}{\partial p\_{\xi\_{k}}} \end{aligned} \tag{47}$$

where:

$$\begin{aligned} G\_{\epsilon} &= \left[ \frac{1}{q\_1} \left( \sum\_{i=2}^{3N} \frac{p\_i^2}{m\_i} + \sum\_{i=1}^{3N} q\_i F\_i \right) - 24q\_1^2 P\_{ext} \right] \\ G\_{\xi\_1} &= \left[ \sum\_{i=1}^{3N} \frac{p\_i^2}{m\_i} - gk\_B T \right] \\ G\_{\xi\_k} &= \frac{p\_{\xi\_{k-1}}^2}{Q\_{k-1}} - k\_B T \end{aligned}$$

It is assumed in the above expressions that periodic boundary conditions are imposed, and we lose three of the momentum degrees of freedom in determining the instantaneous kinetic temperature (one from the x coordinate of the shell particle being coupled to the barostat and the center-of-mass momentum in the y and z directions being conserved and set equal to zero [33]). Under these circumstances, the instantaneous pressure is given in Equation (17), and the total number of the momentum degrees of freedom is g = 3N − 2.

The above decomposition of iL<sub>NPT</sub> is slightly different from what was done previously for continuous potentials [17,24]. In particular, all particle positions have been removed from iL<sub>NH</sub> and, so, are not altered by the action of this operator. What remains in iL<sub>NH</sub> are the thermostat variables, as well as the influence of the thermostats on the particle momenta. Changes in the particle positions, along with further updates of the particle momenta due to the dilatation of the system volume, are now only generated by the action of the remaining four operators. We found slightly better conservation of the extended Hamiltonian by splitting up the operator in this way relative to our previous factorization [17]. Completely separating the influence of the thermostat variables on the particle momenta was also our approach to factorizing the propagator when using discontinuous potentials [18]. The reason is that, since the collision dynamics do not influence the thermostat variables, it is much more convenient to update the positions and momenta of the physical particles and their collision properties all at once, separately from the thermostat variables [18].

To determine the effect of the full Liouville operator, or exp(iLNPT t), on Γ, the Trotter expansion formula must be applied. Although there are several ways to do so, we follow a similar factorization proposed by Martyna *et al.* [24], whereby:

$$e^{iL\_{NPT}t} = e^{iL\_{NH}t/2}e^{iL\_{p1}t/2}e^{iL\_{p\_1t}t/2}e^{iL\_{p\_1}t/2}e^{iL\_{p1}t/2}e^{iL\_{p\_1t}t/2}e^{iL\_{p\_1}t/2}e^{iL\_{NH}t/2} + O(t^3)\tag{48}$$

The operator iLNH has to be further divided, which we split in the following manner:

$$iL\_{NH} = iL\_{cv} + iL\_{v\epsilon} + \sum\_{k=1}^{C} iL\_{G\_{\xi\_k}} + \sum\_{k=1}^{C-1} iL\_{v\xi\_k} + iL\_{\xi} \tag{49}$$


where:

$$\begin{aligned} iL\_{cv} &= \sum\_{i=2}^{3N} \left[ \left( -\frac{p\_{\xi\_1}}{Q\_1} \right) p\_i \frac{\partial}{\partial p\_i} \right] \\ iL\_{v\epsilon} &= -\frac{p\_{\xi\_1}}{Q\_1} p\_1 \frac{\partial}{\partial p\_1} \\ iL\_{G\_{\xi\_k}} &= -G\_{\xi\_k} \frac{\partial}{\partial p\_{\xi\_k}} \\ iL\_{v\xi\_k} &= -\frac{p\_{\xi\_{k+1}}}{Q\_{k+1}} p\_{\xi\_k} \frac{\partial}{\partial p\_{\xi\_k}} \\ iL\_{\xi} &= \sum\_{k=1}^{C} \frac{p\_{\xi\_k}}{Q\_k} \frac{\partial}{\partial \xi\_k} \end{aligned}$$

We again apply the Trotter expansion to factorize exp(iLNHt/2), the final form of which depends upon the number of thermostats in the chain. We present the results for C = 3 with exp(iLNHt/2) expanded as:

$$\begin{array}{rcl} e^{iL\_{NH}t/2} &=& e^{iL\_{G\_{\xi\_3}}t/4} \Big[e^{iL\_{v\xi\_2}t/8}e^{iL\_{G\_{\xi\_2}}t/4}e^{iL\_{v\xi\_2}t/8}\Big] \Big[e^{iL\_{v\xi\_1}t/8}e^{iL\_{G\_{\xi\_1}}t/4}e^{iL\_{v\xi\_1}t/8}\Big] \\ && e^{iL\_{v\epsilon}t/4} \Big[e^{iL\_{cv}t/2}e^{iL\_{\xi}t/2}\Big] e^{iL\_{v\epsilon}t/4} \Big[e^{iL\_{v\xi\_1}t/8}e^{iL\_{G\_{\xi\_1}}t/4}e^{iL\_{v\xi\_1}t/8}\Big] \\ && \Big[e^{iL\_{v\xi\_2}t/8}e^{iL\_{G\_{\xi\_2}}t/4}e^{iL\_{v\xi\_2}t/8}\Big] e^{iL\_{G\_{\xi\_3}}t/4} \end{array} \tag{50}$$

Each of the above operators individually performs the following operations on the phase space vector [25]:

$$\begin{aligned} e^{iL\_{G\_{\xi\_k}}t/4} &: \quad p\_{\xi\_k} \to p\_{\xi\_k} + G\_{\xi\_k} t/4 \\ e^{iL\_{v\xi\_k}t/8} &: \quad p\_{\xi\_k} \to p\_{\xi\_k} \exp \left( -\frac{p\_{\xi\_{k+1}}}{Q\_{k+1}} t/8 \right) \\ e^{iL\_{v\epsilon}t/4} &: \quad p\_1 \to p\_1 \exp\left(-\frac{p\_{\xi\_1}}{Q\_1}t/4\right) \\ e^{iL\_{\xi}t/2} &: \quad \xi\_k \to \xi\_k + \frac{p\_{\xi\_k}}{Q\_k}t/2; \; k = 1, \ldots, C \end{aligned} \tag{51}$$

$$e^{iL\_{cv}t/2} \quad : \quad p\_i \to p\_i \exp\left(-\frac{p\_{\xi\_1}}{Q\_1}t/2\right); i = 2, \ldots, 3N \tag{52}$$

$$e^{iL\_{p\_1}t/2} \quad : \quad p\_1 \to p\_1 + G\_{\epsilon} t/2$$

$$e^{iL\_{p\_1}t/2} \quad : \quad p\_i \to p\_i \exp\left(-\frac{\dot{q}\_1}{q\_1}t/2\right); i = 2, \ldots, 3N \tag{53}$$

$$e^{iL\_0t/2} \quad : \quad p\_i \to p\_i + F\_i t/2; i = 2, \ldots, 3N$$

$$e^{iL\_0t} \quad : \quad q\_1 \to q\_1 + \frac{p\_1}{m\_1}t$$

$$q\_i \to q\_i \exp\left(\frac{\dot{q}\_1}{q\_1}t\right) + \frac{p\_i}{m\_i}t \frac{\sinh\left(\frac{\dot{q}\_1}{q\_1}\frac{t}{2}\right)}{\left(\frac{\dot{q}\_1}{q\_1}\frac{t}{2}\right)} \exp\left(\frac{\dot{q}\_1}{q\_1}t/2\right); i = 2, \ldots, 3N$$

where sinh(x)/x can be expanded in a Maclaurin series to an arbitrarily high order [24] (we choose to truncate at the eighth order). In deriving the expression for q<sub>i</sub>, we follow the literature and use a slightly different approach relative to propagating all of the other phase space variables forward in time [26]. Instead of using a Taylor series expansion (truncated at the second order) to express the action of the propagator on q<sub>i</sub> [25], we rigorously solve the equation of motion (an ordinary first-order differential equation with constant coefficients) for q<sub>i</sub>, noting that all of the variables in the equation of motion are constant, except for q<sub>i</sub> and t. Expanding sinh(x)/x to an arbitrarily high order is equivalent to truncating the Taylor series expansion of the operator to an arbitrarily high order [21,24].
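As an illustration of the position update above, the following Python sketch evaluates sinh(x)/x by its Maclaurin series truncated at the eighth order and applies the update to a single coordinate q<sub>i</sub>. The function names, the scalar interface and the symbol `alpha` (standing for the ratio q̇<sub>1</sub>/q<sub>1</sub>, treated as constant over the step) are assumptions made for this example; they are not taken from the original implementation.

```python
import math


def sinhc_series(x, order=8):
    """Maclaurin series of sinh(x)/x = 1 + x^2/3! + x^4/5! + ..., up to x^order."""
    total, term = 1.0, 1.0
    n = 2
    while n <= order:
        # Each new term is the previous one times x^2 / (n (n + 1)).
        term *= x * x / (n * (n + 1))
        total += term
        n += 2
    return total


def propagate_position(q, p, m, alpha, t, order=8):
    """Update one coordinate q_i over a step t.

    alpha is a hypothetical name for the ratio (dq_1/dt) / q_1,
    assumed constant during the step.
    """
    x = 0.5 * alpha * t
    return q * math.exp(alpha * t) + (p / m) * t * sinhc_series(x, order) * math.exp(x)
```

For alpha = 0, the update reduces to free streaming, q + (p/m) t, which is why the series form is preferable to dividing by alpha directly.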

Now that the full operator exp(iLNPT t) has been factorized, we operate on the phase space vector by following the order of the expansion of exp(iLNPT t) from right to left, thereby sequentially propagating various components of Γ from time t to t + Δt.



Reprinted from *Entropy*. Cite as: Hülsmann, M.; Reith, D. SpaGrOW—A Derivative-Free Optimization Scheme for Intermolecular Force Field Parameters Based on Sparse Grid Methods. *Entropy* 2013, *15*, 3640–3687.

### *Article*

## SpaGrOW—A Derivative-Free Optimization Scheme for Intermolecular Force Field Parameters Based on Sparse Grid Methods

#### Marco Hülsmann **<sup>1</sup>***,* \* and Dirk Reith **<sup>2</sup>**


*Received: 16 February 2013; in revised form: 15 July 2013 / Accepted: 28 August 2013 / Published: 6 September 2013*

Abstract: Molecular modeling is an important subdomain in the field of computational modeling, regarding both scientific and industrial applications. This is because computer simulations on a molecular level are a valuable instrument to study the impact of microscopic on macroscopic phenomena. Accurate molecular models are indispensable for such simulations in order to predict physical target observables, like density, pressure, diffusion coefficients or energetic properties, quantitatively over a wide range of temperatures. In such models, molecular interactions are described mathematically by force fields. The mathematical description includes parameters for both intramolecular and intermolecular interactions. While intramolecular force field parameters can be determined by quantum mechanics, the parameterization of the intermolecular part is often tedious. Recently, an empirical procedure, based on the minimization of a loss function between simulated and experimental physical properties, was published by the authors. There, efficient gradient-based numerical optimization algorithms were used. However, empirical force field optimization is hampered by the following two central issues appearing in molecular simulations: firstly, they are extremely time-consuming, even on modern and high-performance computer clusters, and secondly, simulation data is affected by statistical noise. As a consequence of the latter, an accurate computation of gradients or Hessians is nearly impossible close to a local or global minimum, mainly because the loss function is flat. Therefore, the question arises of whether to apply a derivative-free method approximating the loss function by an appropriate model function. In this paper, a new Sparse Grid-based Optimization Workflow (SpaGrOW) is presented, which accomplishes this task robustly and, at the same time, keeps the number of time-consuming simulations relatively small.
This is achieved by an efficient sampling procedure for the approximation based on sparse grids, which is described in full detail: in order to counteract the fact that sparse grids are fully occupied on their boundaries, a mathematical transformation is applied to generate homogeneous Dirichlet boundary conditions. As the main drawback of sparse grid methods is the assumption that the function to be modeled exhibits certain smoothness properties, it has to be approximated by smooth functions first. Radial basis functions turned out to be very suitable to solve this task. The smoothing procedure and the subsequent interpolation on sparse grids are performed within sufficiently large compact trust regions of the parameter space. It is shown and explained how the combination of the three ingredients leads to a new efficient derivative-free algorithm, which has the additional advantage that it is capable of reducing the overall number of simulations by a factor of about two in comparison to gradient-based optimization methods. At the same time, the robustness with respect to statistical noise is maintained. This assertion is supported by both theoretical considerations and practical evaluations for molecular simulations on chemical example substances.

Keywords: force field parameterization; molecular simulations; atomistic models; derivative-free optimization; sparse grids; smoothing procedures

Classification: PACS 34.20.Gj; 64.75.Gh

#### 1. Introduction

In the last few decades, computer simulations have gained in importance for both science and industry, particularly due to the fast development of parallel high-performance clusters. An important subarea of computer simulations is molecular simulation, which allows one to study the effects of modifications in microscopic states on macroscopic system properties. In contrast to simulations in process engineering, it is not the continuum but the molecular level of a system that is modeled. The goal is to describe interatomic and intermolecular interactions such that, on the one hand, certain accuracy demands are fulfilled and, on the other hand, the required computation time is as low as possible. The latter aspect is very important, because molecular simulations are numerically costly, even on modern computer clusters. The industrial relevance of molecular simulations originates from the fact that labor- and cost-intensive chemical experiments can be avoided. Chemical systems can be simulated at temperatures and pressures that are very difficult to realize in a laboratory, the properties of toxic substances can be calculated without any risk or precautionary measures, and processes on surfaces or within membranes are observable on a microscopic level. Another advantage of molecular simulations is that both the location and velocity of each particle are saved after certain time intervals, which results in a detailed observation of the behavior of the system. Altogether, molecular simulations have emerged as their own scientific discipline, and because of the continuous growth of computer resources, they will become even more important in the coming years [1,2].

In principle, it is possible to describe interactions within a chemical system by quantum mechanics. Thereby, a partial differential equation, the so-called *Schrödinger equation* [3], has to be solved. As this turns out to be extremely difficult, especially for multi-particle systems, the problem is simplified by classical mechanical methods. Then, the system is considered on an atomistic level, *i.e*., the smallest unit is an atom. Molecular simulation techniques are based on statistical mechanics, a subarea of classical mechanics. The most important simulation methods are molecular dynamics (MD) and Monte Carlo (MC). Thereby, both intra- and inter-molecular interactions are described by the foundation of a simulation, the force field. A force field consists of an analytic term and parameters to be adjusted. It is given by the potential energy, which has the following typical form [1,4]:

$$\begin{aligned} U\_{\text{pot}}(r^M) &:= \sum\_{\text{Bonds}} \frac{k\_r}{2} \left(r - r\_0\right)^2 + \sum\_{\text{Angles}} \frac{k\_\phi}{2} \left(\phi - \phi\_0\right)^2 + \sum\_{\text{Dihedrals}} \sum\_{n=1}^m V\_n \cos(n\omega) \\ &+ \sum\_{i < j} 4\varepsilon\_{ij} \left[ \left(\frac{\sigma\_{ij}}{r\_{ij}}\right)^{12} - \left(\frac{\sigma\_{ij}}{r\_{ij}}\right)^{6} \right] + \sum\_{i < j} \frac{q\_i q\_j}{4\pi\varepsilon\_0 r\_{ij}} \end{aligned} \tag{1}$$

Thereby, r<sup>M</sup> ∈ ℝ<sup>3M</sup> is a vector containing all three-dimensional coordinates of the interaction sites, where M is the number of particles in the system, and r<sub>ij</sub> denotes the distance between the interaction sites i and j. The first row of Equation (1) describes the intramolecular and the second row the intermolecular part. The parameters of the intramolecular part, which models the interactions caused by changes in the bond lengths r, bond angles φ and dihedral angles ω, can be computed by quantum mechanics. Please note that the index 0 denotes the respective parameter in equilibrium and that k<sub>r</sub> and k<sub>φ</sub> are force constants. The factors V<sub>n</sub>, n = 1, ..., m, m ∈ ℕ, describe the rotation barriers around the molecular axes. The parameterization of the intermolecular part is often tedious. It models the interactions caused by dispersion, described by the Lennard-Jones (LJ) parameters σ<sub>ij</sub> and ε<sub>ij</sub>, i, j = 1, ..., M, i < j, and by electrostatic effects, described by the partial atomic charges q<sub>i</sub> and q<sub>j</sub>, i, j = 1, ..., M, i < j. The constant ε<sub>0</sub> = 8.854 × 10<sup>−12</sup> F m<sup>−1</sup> is the vacuum permittivity (the dielectric constant of the vacuum).

#### *1.1. Force Field Parameterization*

It is known from the literature that many force fields describe molecular interactions accurately, both qualitatively and quantitatively [5]. Intramolecular parameters can be determined by quantum mechanics, *i.e*., by the minimization of a potential energy hypersurface. Partial atomic charges can also be computed from the positions of nuclei and electrons. However, quantum mechanical methods require a high computational effort, especially for large molecules. This is why some simplifications were carried out, and the adjustment of the parameters was performed by fitting them to spectroscopic data. Some of the most famous force fields based on such semiempirical methods were developed, e.g., by [6–8]. Quantum mechanical calculations of partial atomic charges were realized, e.g., by [9]. In related work [10,11], the automated optimization workflow, *WOLF*2*PACK*, was created, which combines quantum mechanical algorithms with atomistic models and is capable of calculating both optimal intramolecular force field parameters and partial atomic charges. For the intermolecular part of Equation (1), the respective force field parameters were mostly adjusted to experimental target data, *i.e*., physical properties resulting from a molecular simulation were compared with experimental reference data. In particular, this empirical approach was realized by [12–14]. However, the parameters obtained cannot be considered optimal, because they were not fitted to a large number of experimental data. Furthermore, in most cases, they were adjusted manually, which is always time-consuming. They are transferable to other substances, but a subsequent readjustment is indispensable [15]. Many users take standard force fields from the literature for their simulations, which may lead to satisfactory, but not to optimal, target properties.

In the last decade, a few approaches were published realizing an automated force field parameterization procedure [16–20]. Thereby, physical target properties, like density, enthalpy of vaporization and vapor pressure, were fitted to their respective experimental reference data at different temperatures and pressures simultaneously. This was done via the minimization of a quadratic loss function between simulated and experimental data, *i.e*., by solving a mathematical optimization problem with numerical optimization algorithms. This approach is pursued in this paper, as well. The loss function to be minimized is given by:

$$F: \mathbb{R}^N \to \mathbb{R}\_0^+ \tag{2}$$

$$x \quad \mapsto \sum\_{i=1}^{n} w\_i^2 \left( \frac{f\_i^{\text{exp}} - f\_i^{\text{sim}}(x)}{f\_i^{\text{exp}}} \right)^2 \tag{3}$$

where x = (x<sub>1</sub>, ..., x<sub>N</sub>)<sup>T</sup> is a force field parameter vector, N the dimension of the parameter space, n the number of considered physical properties, possibly at different temperatures, f<sub>i</sub><sup>sim</sup>(x), i = 1, ..., n, the simulated physical target properties as functions of the parameter vector x, and f<sub>i</sub><sup>exp</sup>, i = 1, ..., n, the respective experimental reference data. The weights w<sub>i</sub><sup>2</sup>, i = 1, ..., n, account for the fact that some properties are easier to reproduce or are measured more accurately than others. The loss function F is minimized within a compact domain Ω ⊂ ℝ<sup>N</sup>.
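A minimal Python sketch of the loss function in Equation (3) may help to fix the notation. It takes already simulated property values as input, so the expensive simulation step is left out; the function name `loss` and the numerical values in the usage note are purely illustrative assumptions.

```python
def loss(f_sim, f_exp, weights):
    """Weighted sum of squared relative deviations, as in Equation (3).

    f_sim   : simulated target properties f_i^sim(x) for one parameter vector x
    f_exp   : experimental reference values f_i^exp
    weights : the weights w_i (they enter the sum squared)
    """
    return sum(
        (w * (fe - fs) / fe) ** 2
        for fs, fe, w in zip(f_sim, f_exp, weights)
    )
```

With made-up values, say a density of 808 against a reference of 800 with weight 1 and an enthalpy of vaporization of 38 against 40 with weight 0.5, the two terms are (0.01)<sup>2</sup> and (0.5 × 0.05)<sup>2</sup>, which sum to 7.25 × 10<sup>−4</sup>.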

The optimization workflow is shown in Figure 1: the initial guess has to be reasonably close to the minimum. The target properties computed by a simulation tool are inserted into the loss function in Equation (3) and compared with the experimental target properties. If a specified stopping criterion is fulfilled, the parameters are final, and the workflow terminates. Otherwise, the current parameter vector is passed on to the optimization procedure, which searches for new parameters with a lower loss function value.

There are two main requirements for the numerical optimization algorithms solving the minimization problem: Firstly, they have to be efficient, *i.e*., their convergence must be fast and the number of function evaluations has to be low, because each function evaluation requires time-consuming molecular simulations, which can be parallelized over the different temperatures. Secondly, they have to be robust with respect to statistical noise, because simulation data is always affected by uncertainties. The reason for this is that physical properties are computed by averaging over a certain time period in the case of MD simulations and over a certain number of system states in the case of MC simulations. In previous work [21–23], the software package *Gradient-based Optimization Workflow (GROW)* was developed. There, gradient-based numerical optimization algorithms were successfully applied to minimize the loss function in Equation (3) in an efficient and robust way. Simple methods, like steepest descent and conjugate gradient algorithms, turned out to be most suitable. Algorithms requiring a Hessian were too time-consuming or not reliable whenever the Hessian was assumed to be positive definite, which it was not in most cases. Gradients and Hessians were approximated using first-order finite differences. The lengths of the steps along the descent directions were computed by an Armijo step length control mechanism.
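The ingredients named above (a first-order finite-difference gradient, steepest descent and an Armijo step length control) can be sketched generically as follows. This is a textbook-style illustration on a noise-free function, not the GROW implementation; all function names and default constants (`h`, `beta`, `c`, `t0`) are assumptions chosen for the example.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward (first-order) finite-difference approximation of the gradient."""
    fx = f(x)
    grad = []
    for k in range(len(x)):
        x_shift = list(x)
        x_shift[k] += h
        grad.append((f(x_shift) - fx) / h)
    return grad


def armijo_step(f, x, grad, t0=1.0, beta=0.5, c=1e-4, max_halvings=50):
    """Backtrack along -grad until the Armijo sufficient-decrease condition holds."""
    fx = f(x)
    grad_sq = sum(g * g for g in grad)
    t = t0
    for _ in range(max_halvings):
        x_new = [xk - t * gk for xk, gk in zip(x, grad)]
        if f(x_new) <= fx - c * t * grad_sq:
            return x_new, t
        t *= beta
    return list(x), 0.0  # no acceptable step length found


def steepest_descent(f, x0, iterations=200):
    """Steepest descent with finite-difference gradients and Armijo steps."""
    x = list(x0)
    for _ in range(iterations):
        grad = fd_gradient(f, x)
        x, t = armijo_step(f, x, grad)
        if t == 0.0:
            break
    return x
```

In a force field optimization, every call of `f` would stand for a set of molecular simulations, so the N + 1 evaluations per gradient and the additional evaluations during backtracking dominate the cost.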

Figure 1. Optimization workflow: The target properties are computed for an initial guess for the force field parameters. If they do not agree sufficiently well with the experimental target properties, the optimization procedure is performed searching for new parameters with a lower loss function value.

However, the accurate computation of a gradient or a Hessian is problematic close to a global or local minimum of F, due to the presence of statistical noise, *cf*. Figure 2. For the finite difference approximation of the gradient, two adjacent points are necessary. If these two points are situated too close to each other, their loss function values cannot be distinguished anymore, due to the error bars surrounding them. Therefore, the direction of the approximated gradient can be completely wrong. This motivates the development of a new efficient derivative-free method counteracting this problem. The right side of Figure 2 shows that a possible solution is to approximate the loss function by an adequate model function and to determine the minimum of the model function. In this work, a new Sparse Grid-based Optimization Workflow (SpaGrOW) is presented in detail, implementing the aforesaid modeling approach. The main difficulty is to filter out the statistical noise during the approximation process, which is realized in SpaGrOW by regularization methods. As approximation is always based on sampling the parameter space and as molecular simulations are required for all selected points within the sampling process, sparse grids are used in order to avoid high computational effort. The combination technique developed by [24] is applied in order to perform a piecewise multilinear interpolation from a sparse grid to a full grid. As sparse grids are fully occupied at their boundaries, the loss function is artificially set to zero at the boundary by a mathematical transformation. Then, the minimum on the full grid is determined, and a back-transformation is performed. As the combination technique requires certain smoothness properties, and as the loss function cannot be assumed to be smooth, it has to be preprocessed: it is first approximated by a smooth model function.
The selection of a suitable model function is done by both theoretical considerations and practical evaluations.

Figure 2. Motivation of a derivative-free method in the case of noisy loss function values close to the minimum; the direction of a gradient can be completely wrong. Hence, an approximation of the loss function is necessary. This regression procedure has to filter out the statistical noise.

Another very important goal of SpaGrOW is to outperform gradient-based optimization algorithms with respect to computational effort, *i.e*., the number of molecular simulations to be performed should be reduced by about a factor of two. At the same time, it should be at least as robust as gradient-based methods. Hence, SpaGrOW should also be applicable in domains that are further away from the minimum. This can only be realized efficiently by a local consideration: the minimization in SpaGrOW is performed in a small compact trust region. As the minimum can be situated at the boundary of the trust region, the method becomes an iterative procedure. The current minimum is accepted as a new iterate if the piecewise multilinear model predicts the decreasing trend of the loss function in a reliable way. Hence, SpaGrOW is combined with the approach of another category of optimization algorithms, the Trust Region methods, *cf*. [25,26]. The speed of convergence can be controlled by the size of the trust region. Altogether, SpaGrOW is shown to be an efficient combination of numerical methods with the following three important properties: it is efficient, it is robust, and it gets very close to the minimum if certain assumptions are fulfilled. However, these assumptions cannot be verified *a priori*, because the shape of the loss function and the amount of statistical noise are unknown in most cases. Therefore, a detailed practical evaluation is indispensable.

The methodology of SpaGrOW is presented in detail in Section 2, including complexity and convergence considerations. Section 4 shows the results of a detailed practical evaluation of the methodology, and finally, Section 5 concludes the paper.

#### *1.2. Drawbacks of Gradient-Based Methods*

Although gradient-based optimization methods turned out to be suitable and also efficient instruments for the parameterization of force fields in previous work [21–23], they unfortunately have the following disadvantages:


These drawbacks of gradient-based methods motivate the development of an algorithm that is capable of both getting robustly closer to the minimum and solving the optimization problem with significantly fewer simulations. It should not assume the loss function to be smooth. Instead, it should use smoothing and regularization procedures in order to handle the statistical noise and jagged functions.

#### 2. SpaGrOW Methodology: The Main Elements

In this work, a new derivative-free algorithm based on sparse grids and smoothing methods is presented. Figure 3 visualizes the gain in both efficiency and robustness achieved by combining the Trust Region approach with the interpolation on sparse grids and smoothing procedures: the interpolation from a sparse grid to a full grid realizes a significant reduction of function evaluations, *i.e*., simulations, without increasing the interpolation error significantly. At each iteration, the loss function is smoothed before the interpolation is carried out. The quality of the interpolation model is measured following the Trust Region idea. The current minimum, which is determined on the full grid, is either accepted as a new iterate or the trust region is decreased. A continuous minimization within the trust region is deliberately not performed, in order to avoid additional internal optimization iterations. There also exist approaches to determine the global minimum of a piecewise-linearly interpolated function via so-called subgradients [27]. However, at least 3<sup>N</sup> function evaluations are required for the interpolation alone. In the present case, there are constraints, due to the minimization on a compact trust region. Hence, the sparse grid approach seems to be more reasonable. Another advantage of the Trust Region approach is that it can be fast at the beginning of the optimization procedure and can jump over undesired intermediate local minima. This is due to the size of the step length, which is selected before a descent direction is calculated.

Figure 3. Overview of the Sparse Grid-based Optimization Workflow (SpaGrOW): the combination of the Trust Region approach with the interpolation on sparse grids, preceded by a smoothing procedure, leads to an increase in both efficiency and robustness.

First, the two main elements of SpaGrOW are presented: In Section 2.1, the idea of sparse grids is introduced and the advantages with respect to the reduction of computation time are presented. Suitable smoothing and regularization procedures are described in Section 2.2. The algorithm of SpaGrOW is introduced afterwards in Section 3.

#### *2.1. Interpolation on Sparse Grids*

The interpolation on sparse grids is a highly efficient discretization method in the field of Finite Element Methodology (FEM). In contrast to full grids, sparse grids possess significantly fewer grid points, especially in high-dimensional spaces. The computational effort decreases from O(h<sup>−N</sup>) to O(h<sup>−1</sup> × (log h<sup>−1</sup>)<sup>N−1</sup>), where h is the mesh size of the full grid. At the same time, the interpolation error increases only from O(h<sup>2</sup>) to O(h<sup>2</sup> × (log h<sup>−1</sup>)<sup>N−1</sup>) with respect to the L<sup>∞</sup>-norm [28]. The interpolation is founded on a tensor product approach for high dimensions and a linear combination of basis functions, e.g., of hierarchical hat functions [29]. Thereby, the basis functions with small coefficients within the linear combination are left out, and the remaining basis functions correspond to points of a sparse grid. Here, the combination technique by [24] is used, an efficient methodology interpolating a function on regular subgrids. The combination of these subgrids results in a sparse grid. The most important application of interpolation on sparse grids is the solution of partial differential equations, *cf*. [30].

#### 2.1.1. Idea of Sparse Grids

The main idea of sparse grids is to reduce the computational effort without obtaining an intolerable increase of the interpolation error. Especially in high dimensions, the reduction of computational effort becomes notable; Table 1 shows the number of grid points of full and sparse grids of different resolutions and dimensions. In the following, full and sparse grids are introduced with the aid of basis functions for piecewise-bilinear functions, *i.e*., for the two-dimensional case:




Let Ω<sub>i,j</sub> be the equidistant rectangular grid on the unit square, Ω := [0, 1]<sup>2</sup>, with mesh sizes h<sub>i</sub> := 2<sup>−i</sup> in the x- and h<sub>j</sub> := 2<sup>−j</sup> in the y-direction. The vector ℓ := (i, j) ∈ ℕ<sup>2</sup> is called the *level* of the grid Ω<sub>i,j</sub>, with one-norm |ℓ|<sub>1</sub> := i + j. Moreover, let S<sub>i,j</sub> be the space of piecewise-bilinear functions on the grid Ω<sub>i,j</sub>. For reasons of simplicity, only those piecewise-bilinear functions are considered here that fulfill homogeneous Dirichlet boundary conditions in Ω<sub>i,j</sub>. The respective space is denoted by S<sup>0</sup><sub>i,j</sub>. It can be expressed as a tensor product of subspaces T<sub>s,t</sub>, s = 1, ..., i, t = 1, ..., j, whose functions vanish on all grid points corresponding to S<sup>0</sup><sub>s−1,t</sub> and S<sup>0</sup><sub>s,t−1</sub>:

$$S\_{i,j}^0 = \bigotimes\_{s=1}^i \bigotimes\_{t=1}^j T\_{s,t} \tag{4}$$

Uniquely determined piecewise-bilinear basis functions in T<sub>s,t</sub> with non-overlapping rectangular supports of size 1/2<sup>s−1</sup> × 1/2<sup>t−1</sup> are introduced, the so-called *hierarchical basis* [29]. Thereby, each grid point corresponds to a specific hierarchical basis function. It is situated in the center of the support. Moreover, each grid point belongs to a grid of a certain level ℓ. The full grid results from the combination of these grids, and the direct sums of the hierarchical basis functions are the standard basis functions of the full grid.

Each function u ∈ S<sup>0</sup><sub>i,j</sub> can be expressed by a linear combination of hierarchical basis functions:

$$u = \sum\_{s=1}^{i} \sum\_{t=1}^{j} u\_{s,t}, \ u\_{s,t} \in T\_{s,t}, \ s = 1, \ldots, i, \ t = 1, \ldots, j. \tag{5}$$

In [29], the inequality:

$$\|u\_{s,t}\|\_{\infty} \le 4^{-s-t-1} |u| \tag{6}$$

was proven, where the seminorm, |u|, is given by:

$$|u| := \left\| \frac{\partial^4 u}{\partial x^2 \partial y^2} \right\|\_{\infty} \tag{7}$$

In order to obtain a sparse grid, only the functions whose coefficients are greater than or equal to a certain tolerance value are chosen. In [29], this tolerance value is set to 4<sup>−ℓ̂−1</sup>|u|, where ℓ̂ is the level of the sparse grid, which is defined in the following. All remaining basis functions are neglected, and a sparse grid space can be defined as follows:

Definition 1 (Sparse grid space). *The space* Ŝ<sup>0</sup><sub>ℓ̂</sub>*, spanned by the subspaces* T<sub>i,j</sub> *with* i + j = |ℓ|<sub>1</sub> ≤ ℓ̂ + 1*, i.e.,*

$$\hat{S}\_{\hat{\ell}}^{0} := \bigotimes\_{s=1}^{\hat{\ell}} \bigotimes\_{t=1}^{\hat{\ell}-s+1} T\_{s,t} = \bigotimes\_{s+t \le \hat{\ell}+1} T\_{s,t} \tag{8}$$

*is called the* sparse grid space*. The grid resulting from the combination of the subgrids corresponding to the subspaces* T<sub>s,t</sub>, s + t ≤ ℓ̂ + 1*, is called the* sparse grid *and has the level* ℓ̂*.*

The condition |ℓ|<sub>1</sub> ≤ ℓ̂ + 1 leads to a triangular scheme of subspaces T<sub>i,j</sub>, which is depicted in Figure 4. Please note that ℓ<sub>k</sub> = 0, k ∈ {1, 2}, is feasible, but ℓ<sub>k</sub> < ℓ̂ must hold for all k ∈ {1, 2}. The transition to N-dimensional sparse grids of the level ℓ̂ is trivial: all subgrids of the level ℓ with |ℓ|<sub>1</sub> ≤ ℓ̂ + N − 1 and ℓ<sub>k</sub> < ℓ̂ for all k = 1, ..., N have to be combined, where |ℓ|<sub>1</sub> = Σ<sub>i=1</sub><sup>N</sup> ℓ<sub>i</sub>.

Figure 5 shows some examples of sparse grids of the levels 3, 4 and 5 in 2D and 3D, which were produced with an algorithm combining sparse grids of a given level from the respective subgrids.

Figure 4. Triangular scheme for the combination of a sparse grid of level 3 from two-dimensional subgrids meeting the condition |ℓ|<sub>1</sub> ≤ 3 + 2 − 1 = 4 and ℓ<sub>k</sub> < 3 for all k ∈ {1, 2}. If all sixteen subgrids are combined, a full grid of level 3 is obtained. If only the subgrids of levels (0,0), (1,0), (2,0), (3,0), (0,1), (0,2), (0,3), (1,1), (2,1), (1,2), (2,2), (3,1) and (1,3) are taken, *i.e*., if the small triangle consisting of three grids at the bottom right is left out, the corresponding sparse grid is obtained, *cf*. Figure 5 top left.

Figure 5. Sparse grids of the levels 3, 4, and 5 in 2D and 3D.
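The construction of a sparse grid as a union of regular subgrids can be sketched in a few lines of Python. The sketch follows the subgrid list given in the caption of Figure 4, i.e., it assumes that every level component may run from 0 up to and including ℓ̂ and that |ℓ|<sub>1</sub> ≤ ℓ̂ + N − 1; boundary points are included. Points are stored as integer coordinates on the finest mesh, and the function names are hypothetical.

```python
from itertools import product


def sparse_grid_points(level, dim=2):
    """Union of all regular subgrids whose level vector has one-norm <= level + dim - 1.

    Points are returned as integer tuples on the finest mesh, i.e., the
    coordinate value is (integer / 2**level) in each direction.
    """
    n = 2 ** level
    points = set()
    for lv in product(range(level + 1), repeat=dim):
        if sum(lv) > level + dim - 1:
            continue  # subgrid is too fine in total and is left out
        # Subgrid with mesh size 2**-l in a direction has stride 2**(level - l).
        axes = [range(0, n + 1, 2 ** (level - l)) for l in lv]
        points.update(product(*axes))
    return points


def full_grid_size(level, dim=2):
    """Number of points of the full grid with mesh size 2**-level, boundary included."""
    return (2 ** level + 1) ** dim
```

For level 3 in two dimensions, this construction keeps 49 of the 81 full-grid points; the saving grows with the dimension and the level.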

The coefficients corresponding to grid points of a grid with a high level do not contribute considerably to the interpolant. Hence, they can be neglected. However, a drawback of the sparse grid interpolation is the fact that this is only possible for functions that are sufficiently smooth. Because of inequality in Equation (6), the function that is interpolated must be at least four times continuously differentiable. Otherwise, the estimate cannot be made. As differentiability cannot be assumed in the present application, the combination technique developed by [24] is used here. A second problem concerning the computational effort is that sparse grids are still fully occupied at the boundary, *cf*. Figure 5. This problem can be handled using a transformation that sets the loss function to zero at the boundary. The methodology applied for SpaGrOW is described in the following.

#### 2.1.2. Combination Technique

The combination technique developed by [24] combines problems on regular grids of the level $\ell = (\ell\_1, \ldots, \ell\_N)$, *i.e*., with different mesh sizes in the individual dimensions, to a sparse grid problem in a very efficient way. As this method has proven its value in practical applications and as it is also applicable to functions that are not necessarily smooth, it is taken as the sparse grid interpolation method for SpaGrOW.

As before, the two-dimensional case is considered first: Let $u : \mathbb{R}^2 \to \mathbb{R}$ be an arbitrary function. Every function defined on a sparse grid can be linearly combined from its interpolants on the regular subgrids, $\Omega\_{i,j}$, $i + j \in \{\hat{\ell}, \hat{\ell} + 1\}$. If $u\_{i,j} \in T\_{i,j}$ is considered, a combination function, $u^c\_{\hat{\ell}}$, can be defined as follows:

Definition 2 (Combination function (2D)). *The* combination function *of solutions* $u\_{i,j} \in T\_{i,j}$ *of FEM problems on regular two-dimensional grids of the level* $(i, j)$*,* $i + j \in \{\hat{\ell}, \hat{\ell} + 1\}$*, is given by:*

$$u\_{\hat{\ell}}^c := \sum\_{i+j=\hat{\ell}+1} u\_{i,j} - \sum\_{i+j=\hat{\ell}} u\_{i,j} \tag{9}$$

The error, $u - u^c\_{\hat{\ell}}$, is in $\mathcal{O}\left(h\_{\hat{\ell}}^2 \log(h\_{\hat{\ell}}^{-1})\right)$, *cf*. Section 3.3.1. Analogous to Definition 2, the following definition is given for the three-dimensional case:

Definition 3 (Combination function (3D)). *The* combination function *of solutions* $u\_{i,j,k} \in T\_{i,j,k}$ *of FEM problems on regular three-dimensional grids of the level* $(i, j, k)$*,* $i + j + k \in \{\hat{\ell}, \hat{\ell} + 1, \hat{\ell} + 2\}$*, is given by:*

$$u\_{\hat{\ell}}^c := \sum\_{i+j+k=\hat{\ell}+2} u\_{i,j,k} - 2 \sum\_{i+j+k=\hat{\ell}+1} u\_{i,j,k} + \sum\_{i+j+k=\hat{\ell}} u\_{i,j,k} \tag{10}$$

Via complete induction, this can directly be transferred to the N-dimensional case:

Definition 4 (Combination function (in general)). *The* combination function *of solutions* $u\_{\ell} \in T\_{\ell}$ *of FEM problems on regular* $N$*-dimensional grids of the level* $\ell$ *with* $|\ell|\_1 = \hat{\ell} + N - 1 - i$, $i = 0, \ldots, N - 1$*, is given by:*

$$u\_{\hat{\ell}}^c := \sum\_{i=0}^{N-1} (-1)^i \binom{N-1}{i} \sum\_{|\ell|\_1 = \hat{\ell} + N - 1 - i} u\_{\ell} \tag{11}$$

Remark 5. *Because of the condition* $|\ell|\_1 = \hat{\ell} + N - 1 - i$, $i = 0, \ldots, N - 1$*, a multilinear interpolation on a full regular grid with mesh size* $1/2^{\hat{\ell}}$ *and level* $\bar{\ell} = (\hat{\ell})$ *can be executed as follows: All function values are computed on a sparse grid of the level* $(\hat{\ell}, \ldots, \hat{\ell}) \in \mathbb{N}^N$*. Because of the hierarchical structure of the sparse grid, all function values of the regular subgrids of the level* $\ell$ *with* $|\ell|\_1 \le \hat{\ell} + N - 1$ *are known, i.e., all function values required for the interpolation, cf. Equation* (11)*.*
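The weights in Equation (11) can be tabulated with a few lines of code. This is a sketch only: the function name and the parameter `min_level` (the smallest admissible level index per dimension, set to 1 by default) are assumptions made for illustration. A useful sanity check is that the weights sum to one, so that a function reproduced exactly on every subgrid is also reproduced exactly by the combination.

```python
from itertools import product
from math import comb


def combination_coefficients(level_hat, dim, min_level=1):
    """Map each level multi-index to its weight in Equation (11)."""
    coeffs = {}
    for i in range(dim):
        weight = (-1) ** i * comb(dim - 1, i)
        target = level_hat + dim - 1 - i  # required 1-norm of the level
        for level in product(range(min_level, target + 1), repeat=dim):
            if sum(level) == target:
                coeffs[level] = weight
    return coeffs


coeffs_2d = combination_coefficients(3, 2)
print(coeffs_2d)                # weights +1 on i + j = 4, -1 on i + j = 3
print(sum(coeffs_2d.values()))  # the weights sum to one
```

For $N = 2$ and $N = 3$, the weights reduce to those of Equations (9) and (10), respectively.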

#### *2.2. Smoothing Procedures*

As already mentioned, the combination technique is applicable to functions that are not necessarily differentiable. However, an assumption of this method is the existence of an asymptotic error expansion for the discrete solution, *cf*. Section 3.3.1. Hence, the function to be interpolated should possess certain smoothness properties. At least, the statistical noise has to be filtered out as effectively as possible, and the quality of the continuity of the loss function should be high enough. As noise can be unfavorable for the combination technique, smoothing procedures have to be applied before the piecewise-linear interpolation is performed. In SpaGrOW, radial basis functions (RBFs) have turned out to be very suitable for approximating the loss function. In some cases, especially close to the minimum, a simple quadratic model is sufficient, as well. Additionally, the noise is filtered out via regularization procedures. In the approximation process, the respective nonlinear regression problem is not solved via least squares, but via estimators with a lower variance. The regularization methods used in SpaGrOW are elastic nets and multivariate adaptive regression splines. The applied smoothing and regularization algorithms are described in brief, and a theoretical selection is presented and discussed in the following. The final decision can only be made after a detailed practical evaluation, which is performed in Section 4.1.

#### 2.2.1. Effects of Statistical Noise on Piecewise-linear Interpolation

Suppose u is linear and noisy. Then, the piecewise-linear interpolation should reproduce the function exactly. However, statistical noise can lead to a staggered function, which is shown in Figure 6. As this is not desired, a preprocessing, which approximates the function to be interpolated and eliminates the noise, is indispensable.

However, another problem may appear whenever the sampled points used for the approximation are situated too close to each other: Figure 7 indicates that, in this case, the trend of the function can be reproduced completely incorrectly. Therefore, SpaGrOW has to deal with this phenomenon, and it has to be avoided that the trust region becomes too small.

Figure 6. Problem in the case of piecewise-linear interpolation of a noisy function: the interpolation leads to a staggered function reproducing the noise.

Figure 7. Problem in the case of the approximation of a noisy function: when the points are situated too close to each other, the function values can no longer be distinguished, and the trend of the function can be reproduced completely incorrectly by the smoothing procedure.

#### 2.2.2. Smoothing Functions

In the following, the approximation of high-dimensional functions with radial basis functions (RBFs) is presented briefly.

RBFs are an efficient and common method for interpolation and approximation tasks [31]. The word *radial* indicates that they are functions of the distance of two adjacent points. Hence, points which are situated on a hyperball with radius r around a reference point have the same function value. Here are some examples:

$$\phi(r) \;=\; r \; \text{(linear)}\tag{12}$$

$$\phi(r) \ = \ \exp\left(-cr^2\right), \ c \in \mathbb{R}^+ \ \text{(Gaussian)}\tag{13}$$

$$\phi(r) \;=\;\sqrt{r^2 + c^2}, \; c \in \mathbb{R}^{\neq 0} \text{ (multiquadric)}\tag{14}$$

$$\phi(r) \;=\; \frac{1}{\sqrt{r^2 + c^2}}, \; c \in \mathbb{R}^{\neq 0} \text{ (inverse multiquadric)} \tag{15}$$

$$\phi(r) \;=\; r^3 \text{ (cubic)} \tag{16}$$

$$\phi(r) \quad = \quad r^2 \log r \text{ (thin plate splines)} \tag{17}$$
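The examples in Equations (12) to (17) translate directly into code. The following sketch collects them in a dictionary; the dictionary keys, the default value $c = 1$ and the helper `radial` are assumptions made for illustration. For the thin plate spline, the value at $r = 0$ is set to its limit, zero.

```python
import math

# Radial basis functions of Equations (12)-(17); c = 1.0 is an arbitrary default.
RBFS = {
    "linear": lambda r: r,
    "gaussian": lambda r, c=1.0: math.exp(-c * r * r),
    "multiquadric": lambda r, c=1.0: math.sqrt(r * r + c * c),
    "inverse_multiquadric": lambda r, c=1.0: 1.0 / math.sqrt(r * r + c * c),
    "cubic": lambda r: r ** 3,
    "thin_plate_spline": lambda r: 0.0 if r == 0.0 else r * r * math.log(r),
}


def radial(phi, x, center):
    """Evaluate phi at the Euclidean distance between x and center."""
    return phi(math.dist(x, center))


# Points at the same distance from the center share the same function value.
print(radial(RBFS["gaussian"], (1.0, 0.0), (0.0, 0.0)))
print(radial(RBFS["gaussian"], (0.0, 1.0), (0.0, 0.0)))
```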

A function, $f : \mathbb{R}^N \to \mathbb{R}$, is approximated by:

$$f(X) \approx \sum\_{i=1}^{K} \beta\_i h\_i \left( \|X - X\_i\| \right) \tag{18}$$

where $X \in \mathbb{R}^N$ is an arbitrary test point and $K \le m$, where $m \in \mathbb{N}$ is the total number of points. The points $X\_i \in \mathbb{R}^N$, $i = 1, \ldots, K$, are the $K$ representative centers of the training set, on the basis of which the coefficients $\beta\_i \in \mathbb{R}$, $i = 1, \ldots, K$, are determined. The functions $h\_i \in \mathcal{H}$, $i = 1, \ldots, K$, are from the set of RBFs, $\mathcal{H}$. The computation of the coefficients is realized by so-called *RBF networks*, *cf*. Figure 8, which are directed neural networks [32] and proceed from the input to the output neurons. They possess only one layer of hidden neurons, which corresponds to the determination of the coefficients. In the case of an approximation, it holds $m \ge K$, and the following overdetermined linear equation system (LES) has to be solved:

$$H\beta = Y\tag{19}$$

where:

$$H = \begin{pmatrix} h\_1 \left( \|X\_1 - X\_1\| \right) & h\_2 \left( \|X\_1 - X\_2\| \right) & \cdots & h\_K \left( \|X\_1 - X\_K\| \right) \\ h\_1 \left( \|X\_2 - X\_1\| \right) & h\_2 \left( \|X\_2 - X\_2\| \right) & \cdots & h\_K \left( \|X\_2 - X\_K\| \right) \\ \vdots & \vdots & \vdots & \vdots \\ h\_1 \left( \|X\_m - X\_1\| \right) & h\_2 \left( \|X\_m - X\_2\| \right) & \cdots & h\_K \left( \|X\_m - X\_K\| \right) \end{pmatrix} \in \mathbb{R}^{m \times K}$$

and:

$$Y = \begin{pmatrix} y\_1 \\ y\_2 \\ \vdots \\ y\_m \end{pmatrix} = \begin{pmatrix} f(X\_1) \\ f(X\_2) \\ \vdots \\ f(X\_m) \end{pmatrix} \in \mathbb{R}^m$$

The solution vector, $\beta := (\beta\_1, \ldots, \beta\_K)^T \in \mathbb{R}^K$, can be computed, e.g., by using least squares.
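As an illustration of Equations (18) and (19), the following sketch fits a Gaussian RBF model by least squares. It is not the implementation used in SpaGrOW: the normal equations and the small Gaussian elimination routine are chosen here only to keep the example free of external libraries, and all function names, centers and coefficients are hypothetical.

```python
import math


def gaussian(r, c=1.0):
    return math.exp(-c * r * r)


def design_matrix(points, centers, phi=gaussian):
    """Matrix H of Equation (19): one row per point, one column per center."""
    return [[phi(math.dist(x, xc)) for xc in centers] for x in points]


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_rbf(points, values, centers, phi=gaussian):
    """Least squares solution of H beta = Y via the normal equations."""
    H = design_matrix(points, centers, phi)
    K = len(centers)
    HtH = [[sum(row[a] * row[b] for row in H) for b in range(K)] for a in range(K)]
    HtY = [sum(row[a] * y for row, y in zip(H, values)) for a in range(K)]
    return solve_linear(HtH, HtY)


def predict(x, centers, beta, phi=gaussian):
    """Right-hand side of Equation (18)."""
    return sum(b * phi(math.dist(x, xc)) for b, xc in zip(beta, centers))


centers = [(0.0,), (1.0,), (2.0,)]            # K = 3 hypothetical centers
true_beta = [1.0, -2.0, 0.5]
points = [(0.25 * i,) for i in range(9)]      # m = 9 training points
values = [predict(x, centers, true_beta) for x in points]
beta = fit_rbf(points, values, centers)
print(beta)
```

Since the noise-free training values lie in the span of the three basis functions, the fit recovers the coefficients up to rounding errors. For badly conditioned matrices, a QR- or SVD-based solver would be preferable to the normal equations.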

In order to select K representative centers out of the m training data in an efficient way, an unsupervised learning method is used, namely the automated classification (*clustering*) procedure, k*-means* [33]. The K centers are also the centroids of the data classes obtained by the clustering algorithm.

A drawback of RBFs is the high computational effort required for the collocation of the matrix, H, in Equation (19). For N = 5, more than one thousand and for N = 6, over 72 million RBF evaluations are necessary. Hence, RBFs should not be used for N > 4.

For higher-dimensional optimization problems, a simpler smoothing procedure is applied, whose complexity is only quadratic in the dimension. Instead of approximating the loss function itself, each of the physical target properties is approximated linearly, which results in a quadratic model of the loss function. This method is called *linear property approximation (Lipra)* and only requires the solution of an LES of the size $N$ on a sparse grid. The loss function values can easily be obtained by inserting the approximated physical properties into the loss function. The complexity of the Lipra algorithm depends on the number of properties, $n$, and the dimension, $N$. It is $\mathcal{O}(nN^2)$.

Figure 8. Radial basis function (RBF) network: The network proceeds for a test point, $X$, from the input neurons, $X\_j$, $j = 1, \ldots, N$ (components of $X$), over a layer of hidden neurons containing the RBF evaluations, $h\_i(\|X - X\_i\|)$, $i = 1, \ldots, K$, with the coefficients, $\beta\_i$, $i = 1, \ldots, K$, to the output neuron, where the summation takes place. The result is the function value, $f(X)$.

#### 2.2.3. Regularization Algorithms

In order to avoid overfitting, the statistical noise affecting the simulation data has to be eliminated. Therefore, the regression coefficients computed within the smoothing procedure must not be overestimated, so that not too much weight is placed on noisy data points. This can be achieved by a weighted regression, where each data point is assigned a weighting factor that is small whenever the noise on its function value is high, and *vice versa*. Another way is to constrain the coefficients in the optimization problem contained in the regression procedure. These so-called *regularization algorithms* do not solve the regression problem with the least squares estimator, $\beta\_{\rm LS}$, but with low-variance estimators. The former is given by:

$$\beta\_{\rm LS} = \arg\min\_{\beta} ||Y - H\beta||\_2^2 \tag{20}$$

and has the following drawbacks:


In order to counteract these drawbacks, the regression coefficients have to be *shrunk* or set to zero in some cases (variable selection). This has the effect that especially correlated variables and outliers only have low influence within the model. Moreover, the variance of the estimator is reduced, so that the probability of overfitting decreases. In SpaGrOW, this is realized by a *Naive Elastic Net* [34]:

$$\beta\_{\text{NEN}} = \arg\min\_{\beta} \|Y - H\beta\|\_{2}^{2}, \text{ where } \alpha \|\beta\|\_{2}^{2} + (1 - \alpha) |\beta|\_{1} \le t, \ \alpha \in [0, 1], \ t > 0 \tag{21}$$

It contains two additional model parameters, $\alpha \in [0, 1]$ and $t \in \mathbb{R}\_{>0}$, which can be determined, e.g., by a ten-fold cross-validation. A naive elastic net possesses a so-called *grouping effect*, *i.e*., correlated model variables have similar regression coefficients. A penalized formulation of Equation (21) is achieved by introducing Lagrange multipliers, $\lambda\_1, \lambda\_2 \ge 0$:

$$\beta\_{\text{NEN}} = \arg\min\_{\beta} \mathcal{L}(\beta, \lambda\_1, \lambda\_2), \ \mathcal{L}(\beta, \lambda\_1, \lambda\_2) := ||Y - H\beta||\_2^2 + \lambda\_2 ||\beta||\_2^2 + \lambda\_1 |\beta|\_1 \tag{22}$$

It holds $\alpha = \frac{\lambda\_2}{\lambda\_1 + \lambda\_2}$. For $\alpha \in \{0, 1\}$, the two most famous methods in the field of biased regression are obtained: for $\alpha = 1$, it is the *Ridge Regression* [35], and for $\alpha = 0$, it is the *Least Absolute Shrinkage and Selection Operator (LASSO)* [36]. Both methods contain shrinkage, but only in the case of the LASSO, a variable selection is possible. For more details about Elastic Nets, Ridge Regression and the LASSO, *cf*. [34].
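The penalized formulation in Equation (22) and the relation between $\alpha$ and the multipliers can be written down in a few lines. The sketch below only evaluates the objective for a given coefficient vector; it does not perform the minimization, and the function names are hypothetical.

```python
def elastic_net_objective(beta, H, Y, lam1, lam2):
    """Value of L(beta, lambda_1, lambda_2) from Equation (22)."""
    residual = [y - sum(h * b for h, b in zip(row, beta)) for row, y in zip(H, Y)]
    rss = sum(r * r for r in residual)          # ||Y - H beta||_2^2
    l2_penalty = sum(b * b for b in beta)       # ||beta||_2^2
    l1_penalty = sum(abs(b) for b in beta)      # |beta|_1
    return rss + lam2 * l2_penalty + lam1 * l1_penalty


def mixing_parameter(lam1, lam2):
    """alpha = lambda_2 / (lambda_1 + lambda_2)."""
    return lam2 / (lam1 + lam2)


H = [[1.0, 0.0], [0.0, 1.0]]
Y = [1.0, 2.0]
print(elastic_net_objective([1.0, -1.0], H, Y, lam1=0.5, lam2=0.25))
print(mixing_parameter(0.0, 1.0))  # alpha = 1: Ridge Regression
print(mixing_parameter(1.0, 0.0))  # alpha = 0: LASSO
```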

Another regularization method is based on the application of *Multivariate Adaptive Regression Splines (MARS)* [37]. The basis functions within this algorithm are products of so-called *hinge functions* of the order $q \in \mathbb{N}$:

$$S\_q(x) = S\_q(x - \nu) := [\pm(x - \nu)]\_+^q, \ q \in \mathbb{N} \tag{23}$$

where $\nu$ is a *node*. "+" means that for negative arguments, the function becomes zero. Two hinge functions with opposite sign belong together and are mirrored at their common node. Hence, they divide the input space into two disjoint subsets. By the recursive selection of $p$ nodes, the input space is divided into $p + 1$ disjoint subsets. A single hinge function or a product of hinge functions, which leads to even more divisions, is assigned to each subspace. This models the interaction of the input variables. The approximation of a function, $f : \mathbb{R}^N \to \mathbb{R}$, within the MARS algorithm is realized as follows:

$$f(x) \approx \sum\_{i=1}^{p} (\beta\_M)\_i \prod\_{j=1}^{s\_i} S\_q(x\_{N(i,j)} - \nu\_{i,j}) \tag{24}$$

Thereby, the $s\_i$ are the numbers of hinge functions considered in the partitioning, $i \in \{1, \ldots, p\}$, $\nu\_{i,j}$, $i = 1, \ldots, p$, $j = 1, \ldots, s\_i$, are the nodes of the recursive division and $N(i, j)$ is the dimension of the input vector, $x$, to which the division defined by the node $\nu\_{i,j}$ refers. In total, $f$ is approximated by a regression spline and, in the case of $q = 1$, by a piecewise-linear function. The lower probability of overfitting is reached by pruning the basis functions involved in the model. For more details about MARS, *cf*. [37]. A disadvantage of this regularization algorithm is the fact that it requires high computational effort for higher dimensions, due to the pruning procedure.
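The hinge functions of Equation (23) and the evaluation of a model of the form of Equation (24) can be sketched as follows. Only the evaluation of a given model is shown; the recursive node selection and the pruning of MARS are not implemented, and the representation of a term as a list of `(dimension index, node, sign)` triples is an assumption made for illustration.

```python
def hinge(x, node, sign=+1, q=1):
    """[sign * (x - node)]_+^q: zero for non-positive arguments."""
    arg = sign * (x - node)
    return arg ** q if arg > 0 else 0.0


def mars_predict(x, beta, terms, q=1):
    """Evaluate a sum of products of hinge functions, cf. Equation (24).

    terms[i] is a list of (dimension index, node, sign) triples
    describing the hinge factors of the i-th basis function.
    """
    total = 0.0
    for coeff, factors in zip(beta, terms):
        prod = 1.0
        for dim, node, sign in factors:
            prod *= hinge(x[dim], node, sign, q)
        total += coeff * prod
    return total


# A mirrored pair of hinge functions with the common node 1.0
print(hinge(3.0, 1.0, +1), hinge(3.0, 1.0, -1))

# Two basis functions: one single hinge and one product modeling an interaction
terms = [[(0, 1.0, +1)], [(0, 1.0, +1), (1, 1.0, -1)]]
print(mars_predict([3.0, 0.5], [2.0, -1.0], terms))
```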

Please note that in the case of the Lipra algorithm, which is used for high-dimensional problems, no regularization procedure is applied, due to the high amount of computational effort.

#### 2.2.4. Selection of Smoothing Procedures: Theoretical Considerations

The selection of the best smoothing procedure is made first from a theoretical perspective. A detailed practical evaluation is presented in Section 4.1. Theoretically, an approximation method can be selected on the basis of a reliable estimate of the approximation error within a domain, $\Omega \subset \mathbb{R}^N$. Such estimates exist for positive definite RBFs, *cf*. [38]. The Gaussian, the inverse multiquadric and so-called *Wendland functions* [39] are positive definite. The latter are RBFs with compact supports and are piecewise polynomial with a minimal degree dependent on differentiability and dimension.

An estimate of the approximation error can be realized via the introduction of *Native Spaces* [38]:

Definition 6 (Native Spaces). *The* Native Space*,* $\mathcal{N}\_{\phi(\Omega)}$*, for a given positive definite RBF,* $\phi$*, within the domain,* $\Omega$*, is given by:*

$$\mathcal{N}\_{\phi(\Omega)} := \overline{\{\phi(||\cdot - x||), x \in \Omega\}}\tag{25}$$

*where for* $f\_1 = \sum\_{k=1}^{n\_1} \alpha\_k \phi(\| \cdot - x\_k \|) \in \mathcal{N}\_{\phi(\Omega)}$ *and* $f\_2 = \sum\_{j=1}^{n\_2} \beta\_j \phi(\| \cdot - x\_j \|) \in \mathcal{N}\_{\phi(\Omega)}$*:*

$$\left< f\_1, f\_2 \right>\_{\phi(\Omega)} := \sum\_{k=1}^{n\_1} \sum\_{j=1}^{n\_2} \alpha\_k \beta\_j \phi(||x\_k - x\_j||) \tag{26}$$

*Thereby,* $n\_1, n\_2 \in \mathbb{N}$*,* $\alpha\_k, \beta\_j \in \mathbb{R}$, $k = 1, \ldots, n\_1$, $j = 1, \ldots, n\_2$*, and* $x\_k, x\_j \in \Omega$, $k = 1, \ldots, n\_1$, $j = 1, \ldots, n\_2$*.*

The Hilbert space $\mathcal{N}\_{\phi(\Omega)}$ is the completion of the pre-Hilbert space $\{\phi(\| \cdot - x \|), x \in \Omega\}$. As the approximation error depends on the size of the domain, $\Omega$, the so-called *fill distance* is defined next:

Definition 7 (Fill distance). *For a given discrete training set,* $\mathcal{X} \subset \Omega$*, with training data* $x\_j$, $j = 1, \ldots, m$, $m \in \mathbb{N}$*, the* fill distance*,* $\Delta\_{\Omega, \mathcal{X}}$*, is defined by:*

$$\Delta\_{\Omega, \mathcal{X}} \coloneqq \sup\_{x \in \Omega} \min\_{j=1,\ldots,m} ||x - x\_j|| \tag{27}$$
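For a box-shaped domain, the fill distance can be estimated numerically. In the following sketch, the supremum over $\Omega$ is replaced by a maximum over a regular sample grid, so the result is a lower bound that approaches the fill distance as the sample grid is refined; the function name and the parameter `samples_per_dim` are hypothetical.

```python
import math
from itertools import product


def fill_distance(training, lower, upper, samples_per_dim=101):
    """Approximate the fill distance of a training set in a box domain.

    The box is given by the corner vectors `lower` and `upper`; the
    supremum is approximated on a regular grid of sample points.
    """
    axes = [
        [lo + (hi - lo) * i / (samples_per_dim - 1) for i in range(samples_per_dim)]
        for lo, hi in zip(lower, upper)
    ]
    return max(
        min(math.dist(x, x_j) for x_j in training)
        for x in product(*axes)
    )


# Two training points at the ends of the unit interval
print(fill_distance([(0.0,), (1.0,)], [0.0], [1.0]))

# The four corners of the unit square
corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
print(fill_distance(corners, [0.0, 0.0], [1.0, 1.0], samples_per_dim=11))
```

In the first case, the point farthest from the training data is the midpoint of the interval; in the second case, it is the center of the square.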

From Definitions 6 and 7, the following theorem for the estimation of the approximation error can be formulated:

Theorem 8 (Approximation error for positive definite RBFs). *Let* Ω, X , φ *and* f *be defined as above. Let, furthermore,* g *be the approximating function for* f ∈ Nφ(Ω)*, obtained by the positive definite RBF,* φ*. Then, the following estimate for the approximation error holds:*

$$||f - g||\_{L\_{\infty}(\Omega)} \le h(\Delta\_{\Omega, \mathcal{X}})||f||\_{N\_{\phi(\Omega)}}\tag{28}$$

*where* $\lim\_{\Delta\_{\Omega, \mathcal{X}} \to 0} h(\Delta\_{\Omega, \mathcal{X}}) = 0$*.*

Proof. [38]

This theorem is very important for the convergence of SpaGrOW, *cf*. Section 3.3.2. However, for SpaGrOW, another condition, $\lim\_{\Delta\_{\Omega, \mathcal{X}} \to 0} h(\Delta\_{\Omega, \mathcal{X}}) = 0$, has to be fulfilled. The following definition introduces the term *stability* of an approximation in the sense of SpaGrOW:

Definition 9 (Stable approximation). *Let* $\Omega$ *and* $\mathcal{X}$ *be defined as above. Let* $f \in \mathcal{H}$*, where* $\mathcal{H}$ *is a Hilbert space of functions on* $\Omega$ *with the scalar product* $\langle \cdot, \cdot \rangle\_{\mathcal{H}}$ *and norm* $\|f\|\_{\mathcal{H}} = \sqrt{\langle f, f \rangle\_{\mathcal{H}}}$ *for* $f \in \mathcal{H}$*. An approximation within the domain,* $\Omega$*, by a function,* $g \in \mathcal{P}$*, where* $\mathcal{P}$ *is a pre-Hilbert space with the same scalar product and* $\bar{\mathcal{P}} = \mathcal{H}$*, is called stable, if:*

$$||f - g||\_{\mathcal{H}} \le \kappa h(\Delta\_{\Omega, \mathcal{X}}) \tag{29}$$

*where* $\kappa > 0$*,* $\limsup\_{\Delta\_{\Omega, \mathcal{X}} \to 0} h(\Delta\_{\Omega, \mathcal{X}}) = 0$ *and* $\limsup\_{\Delta\_{\Omega, \mathcal{X}} \to 0} \frac{h(\Delta\_{\Omega, \mathcal{X}})}{\Delta\_{\Omega, \mathcal{X}}} = 0$*.*

The second important condition for the convergence of SpaGrOW is $\lim\_{\Delta\_{\Omega, \mathcal{X}} \to 0} \frac{h(\Delta\_{\Omega, \mathcal{X}})}{\Delta\_{\Omega, \mathcal{X}}} = 0$. The following corollary ascertains the stability of an approximation based on a Gaussian RBF:

Corollary 10 (Stability of an approximation based on a Gaussian RBF). *Let* $\Omega$, $\mathcal{X}$ *and* $f$ *be defined as in Theorem 8. Let, furthermore,* $\phi(\| \cdot - x \|) = \exp(-c \| \cdot - x \|^2)$ *for* $x \in \Omega$ *and* $c \in \mathbb{R}^+$ *be a Gaussian RBF. Then, the approximation of* $f$ *by* $g(x) := \sum\_{k=1}^{\nu} \alpha\_k \phi(\|x\_k - x\|)$, $\nu \in \mathbb{N}$, $\alpha\_k \in \mathbb{R}$, $x\_k \in \Omega$, $k = 1, \ldots, \nu$*, is stable.*

Proof. For a Gaussian RBF, the following estimation holds (*cf*. [38]):

$$||f - g||\_{L\_{\infty}(\Omega)} \le \exp\left(-d\left(\frac{\log(\Delta\_{\Omega, \mathcal{X}})}{\Delta\_{\Omega, \mathcal{X}}}\right)\right)||f||\_{\mathcal{N}\_{\phi(\Omega)}}\tag{30}$$

where $d > 0$ is a constant. With $h(\Delta\_{\Omega, \mathcal{X}}) := \exp\left(-d\, \frac{\log(\Delta\_{\Omega, \mathcal{X}})}{\Delta\_{\Omega, \mathcal{X}}}\right)$, it holds $\lim\_{\Delta\_{\Omega, \mathcal{X}} \to 0} h(\Delta\_{\Omega, \mathcal{X}}) = 0$ and $\lim\_{\Delta\_{\Omega, \mathcal{X}} \to 0} \frac{h(\Delta\_{\Omega, \mathcal{X}})}{\Delta\_{\Omega, \mathcal{X}}} = 0$. With $\kappa := \|f\|\_{\mathcal{N}\_{\phi(\Omega)}}$ and the fact that $\mathcal{N}\_{\phi(\Omega)}$ is a Hilbert space and $g$ an element of the pre-Hilbert space $\{\phi(\| \cdot - x \|), x \in \Omega\}$, with $\overline{\{\phi(\| \cdot - x \|), x \in \Omega\}} = \mathcal{N}\_{\phi(\Omega)}$, it follows that the approximation based on a Gaussian RBF is stable with respect to Definition 9.

Because of Corollary 10, the Gaussian RBF is selected for SpaGrOW for theoretical reasons. However, such estimates exist for other positive definite RBFs, as well. The final selection of the Gaussian RBF is made after a detailed practical evaluation, as mentioned above.

#### 3. The SpaGrOW Algorithm

#### *3.1. Ingredients of the Algorithm*

Within SpaGrOW, the minimization of the interpolation model on a full grid is performed discretely. The corresponding complexity is $\mathcal{O}\left((n + 1)^{2N}\right)$, where $n + 1$ ($n \in \mathbb{N}$) is the number of points of the full grid in one dimension. The first question arising is whether the algorithm should be applied locally or globally. Globally, this means that the smoothing procedure and the sparse grid interpolation are performed on the complete feasible domain for the force field parameters. However, it turns out that a local consideration is the better way, *i.e*., the combination with the Trust Region approach is highly important. This makes SpaGrOW an iterative local optimization method. The algorithm is described in detail in the following.

#### 3.1.1. Local and Global Consideration

The combination technique delivers a piecewise-multilinear function, $q : \mathbb{R}^N \to \mathbb{R}$, which either interpolates the loss function, $F$, itself or an approximating function, $G$, from a sparse grid of the level $\hat{\ell} \in \mathbb{N}$ on a full grid, $G^N\_{\bar{\ell}}$, with the level $\bar{\ell} = (\hat{\ell}, \ldots, \hat{\ell}) \in \mathbb{N}^N$. The total error resulting from the approximation and interpolation error can be measured via the $L\_2$- or $L\_\infty$-distance between $F$ and the interpolation model, $q$, on the unit hyperdice, $\Omega := [0, 1]^N$:

$$||e||\_{L\_2[0,1]^N} := \quad ||F-q||\_{L\_2[0,1]^N} = \left(\int\_{[0,1]^N} (F(x) - q(x))^2 \, dx\right)^{\frac{1}{2}} \tag{31}$$

$$||e||\_{L\_{\infty}[0,1]^N} := ||F - q||\_{L\_{\infty}[0,1]^N} = \max\_{x \in [0,1]^N} |F(x) - q(x)|\tag{32}$$

If the function values, $F(x)$, are known for $x \in G^N\_{\bar{\ell}}$, the error terms in Equations (31) and (32) can be approximated numerically.

The total error is important for the assessment of whether SpaGrOW can be applied globally or not. Hence, it has to be determined on a full grid for the local and the global variant. Moreover, the convergence behavior of SpaGrOW has to be analyzed. There are some arguments for a local consideration: Following the heuristics indicated in Section 2.2, it is indispensable to perform a smoothing procedure before interpolating. In the global case, this smoothing procedure would be executed on a sparse grid discretizing the complete feasible domain. As the loss function can be arbitrarily complex and jagged, it should be nearly impossible to reproduce it accurately within a huge domain with only a small number of data points. Furthermore, the discretization error would be too high within a huge domain for the determination of the global minimum. It would be possible to apply the algorithm globally first and to make local refinements afterwards, but an inaccurate approximation of the loss function can lead far away from the minimum. Hence, a local consideration should be preferred, and SpaGrOW should be a local iterative procedure, like the gradient-based methods applied for the present task.

In order to verify these heuristic considerations, a practical analysis was performed, where the loss function was replaced by the simple two-dimensional function $H(x\_1, x\_2) = -\exp\left(-(x\_1 - 2)^2 - (x\_2 + 1)^2\right)$. The global minimum is situated at $(x\_1^\*, x\_2^\*) = (2, -1)$, with corresponding function value $H(2, -1) = -1$. Artificial statistical uncertainties, *i.e*., uniformly distributed random numbers from the interval $[-0.03 H(x\_1, x\_2), +0.03 H(x\_1, x\_2)]$, were added to the function values, $H(x\_1, x\_2)$. For reasons of brevity, the analysis is not reported here. The results were the following:

• Comparison of the total errors:

	- Without artificial uncertainties, the global variant of SpaGrOW only led to the minimum when the minimum was a grid point. Otherwise, the minimum could only be determined within the accuracy of the discretization.
	- With artificial uncertainties, both discretization and approximation error had a negative effect on the convergence. The resulting approximated minimum was far away from the real one.

To summarize, a local consideration is preferred to a global one.
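The test function of the analysis above, together with the artificial uncertainties, can be reproduced as follows. This is a minimal sketch; the function name `noisy_H` and its parameters are hypothetical, and the relative noise level of 3% is taken from the interval stated above.

```python
import math
import random


def H(x1, x2):
    """Test function with global minimum H(2, -1) = -1."""
    return -math.exp(-(x1 - 2.0) ** 2 - (x2 + 1.0) ** 2)


def noisy_H(x1, x2, rel_noise=0.03, rng=random):
    """H plus uniform noise of at most rel_noise * |H(x1, x2)|."""
    value = H(x1, x2)
    return value + rng.uniform(-rel_noise, rel_noise) * abs(value)


rng = random.Random(0)  # fixed seed for reproducibility
print(H(2.0, -1.0))
print(noisy_H(2.0, -1.0, rng=rng))
```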

#### 3.1.2. Combination with the Trust Region Approach

For each iteration, a compact neighborhood, where the minimization problem is solved discretely, is determined. In most cases, the minimum is situated at the boundary of the compact domain. If certain assumptions are fulfilled, this boundary minimum becomes the new iterate and the center of a new compact neighborhood.

Following the idea of Trust Region methods [25,26], the compact neighborhood is a trust region of the size $\Delta\_k > 0$. Due to the sparse grid interpolation, it is not a ball, $B\_{\Delta\_k}(x^k)$, but a hyperdice, $\mathcal{W}^k$, of the form:

$$\mathcal{W}^k := \bigotimes\_{i=1}^N \left[ x\_i^k - \Delta\_k, x\_i^k + \Delta\_k \right] \tag{33}$$

where $x^k$ is the $k$th iterate of SpaGrOW. On the one hand, the size, $\Delta\_k$, has to be small enough, so that $\mathcal{W}^k$ is situated within the feasible domain of the force field parameters and the interpolation model is consistent with the original loss function, $F$, as already discussed in Section 3.1.1. On the other hand, it must not be too small because of the statistical noise, *cf*. Section 2.2.1. The quality of the model, $q$, is estimated using the following ratio:

$$r^k := \frac{F(x^k) - F(x^\*)}{q(x^k) - q(x^\*)} = \frac{F(x^k) - F(x^\*)}{G(x^k) - q(x^\*)}\tag{34}$$

where $x^\*$ is the discrete minimum on a full grid within the hyperdice, $\mathcal{W}^k$. As $x^k$ is a point of the sparse grid and the model, $q$, interpolates the approximating function, $G$, from the sparse grid on the full grid, it holds $q(x^k) = G(x^k)$.

In practice, two thresholds, $0 < \eta\_1 < \eta\_2$, and size parameters, $0 < \gamma\_1 < 1 < \gamma\_2$, are introduced, and the following three cases are considered:


As for the Trust Region methods, a minimal $\Delta\_{\min} > 0$ is defined *a priori*. The algorithm is stopped as soon as $\exists\, k : \Delta\_k < \Delta\_{\min}$.
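The update of the trust region can be sketched as follows, using the ratio of Equation (34) and the three cases given for the iteration step in Section 3.2.1: a large ratio leads to acceptance of the new point and an enlarged region, a moderate ratio to acceptance with an unchanged region, and a small ratio to rejection and a reduced region. The numerical values of $\eta\_1$, $\eta\_2$, $\gamma\_1$ and $\gamma\_2$ used as defaults are purely illustrative assumptions, as are the function names.

```python
def quality_ratio(F_old, F_new, q_old, q_new):
    """Ratio of actual to predicted reduction, cf. Equation (34).

    Assumes q_old != q_new, i.e., the model predicts a change.
    """
    return (F_old - F_new) / (q_old - q_new)


def trust_region_step(x_old, x_new, delta, ratio,
                      eta1=0.2, eta2=0.8, gamma1=0.5, gamma2=2.0):
    """Return the next iterate and the next trust region size."""
    if ratio >= eta2:              # very good model: accept and enlarge
        return x_new, gamma2 * delta
    if ratio >= eta1:              # acceptable model: accept, keep the size
        return x_new, delta
    return x_old, gamma1 * delta   # poor model: reject and shrink


def is_converged(delta, delta_min):
    """Stopping criterion: the trust region has become too small."""
    return delta < delta_min


r = quality_ratio(F_old=1.0, F_new=0.4, q_old=1.0, q_new=0.3)
print(r, trust_region_step([0.0, 0.0], [0.5, 0.5], 1.0, r))
```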

#### 3.1.3. Treatment of Boundary Points

After the application of a transformation, $\xi : \mathcal{W}^k \to [0, 1]^N$, into the unit hyperdice, *cf*. Section 3.2.1, the loss function values are set to zero at its boundary by an appropriate modification. The reason for realizing homogeneous Dirichlet boundary conditions is that sparse grids, too, are fully occupied at their boundaries. In order to save a large number of simulations, the original loss function, $F$, is multiplied with a product of sine functions, *i.e*., the modified function:

$$\begin{aligned} \bar{F}: [0,1]^N &\rightarrow & \mathbb{R}\_0^+ \\ y &\mapsto & \left(\prod\_{i=1}^N \sin \pi y\_i\right) F(y) \end{aligned} \tag{35}$$

where $y = \xi(x)$ with $x \in \mathcal{W}^k$, is considered instead of $F$. Hence, the smoothing and interpolation procedures are applied for $\bar{F} \circ \xi : \mathcal{W}^k \to \mathbb{R}\_0^+$. Due to Equation (35), it holds $\bar{F}|\_{\partial [0,1]^N} = 0$. However, as $F$ and not $\bar{F}$ has to be minimized and as $F$ and $\bar{F}$ do not have the same minimum, the back-transformation:

$$F(x) = F(\xi^{-1}(y)) = \frac{\bar{F}(y)}{\prod\_{i=1}^{N} \sin \pi y\_i} \tag{36}$$

has to be applied before the discrete minimization is performed. Hence, the minimum of $F \circ \xi^{-1}$ has to be determined, which is not possible at the boundary of $\mathcal{W}^k$. As the minimum is expected to be situated at the boundary, only the grid:

$$
\tilde{G}\_{\bar{\ell}}^{N} := G\_{\bar{\ell}}^{N} \backslash \partial G\_{\bar{\ell}}^{N} \tag{37}
$$

is considered for the minimization. This grid does not contain any boundary point of the original grid. This reduces the size of the trust region, but the resulting loss in convergence speed is negligible compared with the number of simulations saved: for $N = 4$ and $\hat{\ell} = 2$, only nine instead of 393 simulations (*cf*. Table 1) per iteration are required.
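The interplay of Equations (35) to (37) can be illustrated by a small sketch. It assumes that the loss function is already given on the unit hyperdice, and it uses the exact modified function in place of the interpolation model; the function names and the quadratic example function are hypothetical.

```python
import math
from itertools import product


def sine_weight(y):
    """Product of sine functions that vanishes on the boundary of [0, 1]^N."""
    return math.prod(math.sin(math.pi * y_i) for y_i in y)


def modified_loss(F, y):
    """Modified function of Equation (35)."""
    return sine_weight(y) * F(y)


def interior_grid(level, dim):
    """Full grid with mesh size 1 / 2**level without its boundary points."""
    n = 2 ** level
    return product([i / n for i in range(1, n)], repeat=dim)


def discrete_minimum(q_bar, level, dim):
    """Divide by the sine product (Equation (36)) and minimize on the interior grid."""
    return min(interior_grid(level, dim), key=lambda y: q_bar(y) / sine_weight(y))


def F(y):
    # Hypothetical positive loss function on the unit square
    return (y[0] - 0.25) ** 2 + (y[1] - 0.75) ** 2 + 1.0


def q_bar(y):
    # Stand-in for the interpolation model of the modified function
    return modified_loss(F, y)


print(modified_loss(F, (0.0, 0.5)))   # zero on the boundary
print(discrete_minimum(q_bar, 2, 2))  # interior grid point minimizing F
```

The division by the sine product is only well defined in the interior, which is why the boundary points are excluded from the minimization.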

Figure 9. Overview of the SpaGrOW algorithm, *i.e*., the inner iteration of the optimization procedure visualized in Figure 1. The Trust Region size, Δ, is increased or decreased, depending on the quality of the approximation model on the sparse grid.

#### *3.2. The Full Algorithm*

#### 3.2.1. Structure

The algorithm of SpaGrOW, *i.e*., the inner iteration of the optimization procedure shown in Figure 1, is visualized in Figure 9 and has the following structure:

• Initialization: Choose an initial vector, $x^0$, and an initial step length, $\Delta\_0 > 0$, so that:

$$\forall\_{i=1,\ldots,N}\ c\_i^0 \le x\_i^0 \le C\_i^0, \,\Delta\_0 < C\_i^0 - x\_i^0, \,\Delta\_0 < x\_i^0 - c\_i^0 \tag{38}$$

Thereby, $[c\_i^0, C\_i^0]$ is the feasible interval for the $i$th force field parameter. The maximal step length possible, $\Delta\_{\max}$, is computed at the beginning, and $\Delta\_0$, as well as a minimal step length, $\Delta\_{\min}$, are set in relation to it. Please note that, on the one hand, $\Delta\_0$ must not be too small, due to the noise, and that, on the other hand, it must not be too large, so that the problem described in Section 3.1.1 is not faced.

Let k := 0.

• Transformation: Via the transformation:

$$\begin{aligned} \xi: \mathcal{W}^k &\to \ [0, 1]^N\\ x &\mapsto \ \frac{x - \left(\min\_{i=1,\ldots,N} x\_i^k\right) \cdot e}{\left(\max\_{i=1,\ldots,N} x\_i^k\right) \cdot e - \left(\min\_{i=1,\ldots,N} x\_i^k\right) \cdot e}\\ &= \ \frac{1}{2\Delta\_k} \left(x - x^k + \Delta\_k \cdot e\right) \end{aligned} \tag{39}$$

the initial vector, $x^0$, is mapped from the hyperdice of the size $\Delta\_0$ into the unit hyperdice. Thereby, $e = (1, \ldots, 1)^T \in \mathbb{R}^N$ and $x\_i^k$, $i = 1, \ldots, N$, are the components of the vector, $x^k$. Please note that only the back-transformation obtained by the inverse function:

$$\begin{aligned} \xi^{-1} : [0,1]^N &\rightarrow & \mathcal{W}^k \\ y &\mapsto & 2\Delta\_k y + x^k - \Delta\_k \cdot e \end{aligned} \tag{40}$$

is important, because, first, a sparse grid is simply constructed in $[0, 1]^N$. Then, the grid points are back-transformed into the force field parameter space via $\xi^{-1}$, so that molecular simulations can be executed.


• Modification: By the modification:

$$\bar{F}: [0,1]^N \quad \rightarrow \quad \mathbb{R}\_0^+\tag{41}$$

$$y \quad \mapsto \left(\prod\_{i=1}^{N} \sin \pi y\_i \right) F(y) \tag{42}$$

homogeneous Dirichlet boundary conditions are realized in order to reduce the computational effort significantly. The function, $\bar{F}$, is applied to each point, $y\_{\ell,j}$, of the sparse grid.

• Smoothing: As the function to be minimized is affected by statistical noise, the function, F̄, is smoothed and regularized by the methods indicated in Section 2.2.2. Hence, for each point, y,j , of the sparse grid, a value, G(y,j ), of the approximating function is obtained.

• Interpolation: The function, G, is interpolated from the sparse grid, Ĝ<sup>N</sup><sub>ℓ̂</sub>, on the full grid, G<sup>N</sup><sub>ℓ̄</sub>, by the combination technique. Hence, an interpolation model, q̄, is obtained for each point, y ∈ G<sup>N</sup><sub>ℓ̄</sub>.

As the smoothing and interpolation procedures have been executed for F¯, the following division has to be applied for each non-boundary point of the full grid:

$$\forall\_{y \in \bar{G}\_{\bar{\ell}}^N} q(y) = \frac{\bar{q}(y)}{\prod\_{i=1}^N \sin \pi y\_i} \tag{43}$$

The interpolation model, q, is only valid for all y ∈ G̃<sup>N</sup><sub>ℓ̄</sub>.

• Discrete minimization: Determine:

$$y^\* := \arg\min\_{y \in \tilde{G}^N\_{\tilde{\ell}}} q(y) \tag{44}$$

• Iteration step x<sup>k</sup> → x<sup>k+1</sup>: The ratio:

$$r^k := \frac{F(y^k) - F(y^\*)}{q(y^k) - q(y^\*)} = \frac{F(y^k) - F(y^\*)}{G(y^k) - q(y^\*)}, \ y^k = \xi(x^k) \tag{45}$$

is determined, and the following three cases are differentiated:

$$\begin{aligned} -\;\eta\_1 \le r^k < \eta\_2 \Rightarrow x^{k+1} &:= \xi^{-1}(y^\*) \land \Delta\_{k+1} := \Delta\_k. \\ -\;r^k \ge \eta\_2 > \eta\_1 \Rightarrow x^{k+1} &:= \xi^{-1}(y^\*) \land \Delta\_{k+1} := \gamma\_2 \Delta\_k. \\ -\;r^k < \eta\_1 \Rightarrow x^{k+1} &:= x^k \land \Delta\_{k+1} := \gamma\_1 \Delta\_k. \end{aligned}$$

Thereby, η<sub>2</sub> > η<sub>1</sub> > 0 and γ<sub>2</sub> > 1 > γ<sub>1</sub> > 0 are global parameters.

Let k := k + 1, and go to step 2.

• Stopping criteria: The general stopping criterion is:

$$F(x^\*) \le \tau \tag{46}$$

where τ > 0. However, an additional criterion is that the minimum is situated within the hyperdice and not at its boundary. In total, the following three stopping criteria are considered:

(i) F(x<sup>∗</sup>) ≤ τ ∧ ξ(x<sup>∗</sup>) ∉ U<sub>0</sub> ∪ U<sub>1</sub>

(ii) F(x<sup>∗</sup>) ≤ τ ∧ ξ(x<sup>∗</sup>) ∉ U<sub>0</sub> ∪ U<sub>1</sub> ∧ Δ<sup>∗</sup> < Δ<sub>min</sub>

(iii) ∃ k ∈ N : r<sup>k</sup> < η<sub>1</sub> ∧ Δ<sub>k</sub> < Δ<sub>min</sub>.

Thereby:

$$\begin{aligned} U\_0 &:= \{ y \in [0,1]^N | \exists \ i \in \{1, \ldots, N\} \; : \; y\_i \in \{0,1\} \}, \\ U\_1 &:= \{ y \in [0,1]^N | \exists \ i \in \{1, \ldots, N\} \; : \; y\_i \in \{2^{-\hat{\ell}}, 1 - 2^{-\hat{\ell}}\} \} \end{aligned}$$

Moreover, Δ<sup>∗</sup> := Δk, where x<sup>∗</sup> = x<sup>k</sup>.
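The three stopping criteria and the sets U<sub>0</sub> and U<sub>1</sub> can be sketched in a few lines of Python. This is only an illustrative sketch and not the implementation used for SpaGrOW: the function names, the tolerance `tol` and the returned labels are assumptions introduced here, and the level ℓ̂ is passed as the integer `level`.

```python
def in_u0_or_u1(y, level, tol=1e-12):
    """Return True if y lies in U0 or U1, i.e., if some component of y
    equals 0, 1, 2**-level or 1 - 2**-level (up to the tolerance tol)."""
    edge = 2.0 ** (-level)
    excluded = (0.0, 1.0, edge, 1.0 - edge)
    return any(abs(yi - b) <= tol for yi in y for b in excluded)


def stopping_criterion(loss, y_star, level, delta, delta_min, tau, r_k, eta1):
    """Return 'i', 'ii' or 'iii' if the respective criterion holds, else None.

    loss   : F(x*), the loss function value at the current iterate
    y_star : xi(x*), the iterate mapped into the unit hyperdice
    delta  : current trust region size (Delta* or Delta_k)
    """
    if loss <= tau and not in_u0_or_u1(y_star, level):
        # (ii) is (i) plus the additional condition Delta* < Delta_min.
        return "ii" if delta < delta_min else "i"
    if r_k < eta1 and delta < delta_min:
        return "iii"  # no accurate model found, even for a small trust region
    return None
```

Criterion (ii) is checked before (i) because it is the stricter of the two; a caller that only needs to know whether the run has converged can treat both labels alike.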

If stopping criterion (i) or (ii) is fulfilled, then SpaGrOW has converged successfully. Due to ξ(x<sup>∗</sup>) ∉ U<sub>1</sub>, it is also excluded that the minimum is situated at the boundary of the grid where the interpolation model, q, is valid. Stopping criterion (ii) contains the additional condition, Δ<sup>∗</sup> < Δ<sub>min</sub>, which excludes that improvements can be achieved by local refinements. Hence, this is the ideal stopping criterion.

Stopping criterion (iii) means that SpaGrOW has not led to success, *i.e*., even by decreasing the trust region, no accurate model, q, can be found. In particular, this is the case when the assumptions for the application of the combination technique are not fulfilled, which may be caused by an inaccurate smoothing procedure, wherein the noise has not been filtered out in a sufficient way.
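The transformation of Equations (39) and (40) and the iteration step x<sup>k</sup> → x<sup>k+1</sup> of the algorithm above can be sketched as follows. The sketch is illustrative only: the function names and the default values of η<sub>1</sub>, η<sub>2</sub>, γ<sub>1</sub> and γ<sub>2</sub> are assumptions chosen here to satisfy η<sub>2</sub> > η<sub>1</sub> > 0 and γ<sub>2</sub> > 1 > γ<sub>1</sub> > 0, and the first case is read as η<sub>1</sub> ≤ r<sup>k</sup> < η<sub>2</sub>.

```python
def xi(x, x_k, delta_k):
    """Eq. (39): map x from the hyperdice around x_k into [0, 1]^N."""
    return [(xc - xkc + delta_k) / (2.0 * delta_k) for xc, xkc in zip(x, x_k)]


def xi_inv(y, x_k, delta_k):
    """Eq. (40): map y from [0, 1]^N back into the hyperdice around x_k."""
    return [2.0 * delta_k * yc + xkc - delta_k for yc, xkc in zip(y, x_k)]


def trust_region_step(x_k, delta_k, y_star, r_k,
                      eta1=0.2, eta2=0.7, gamma1=0.5, gamma2=2.0):
    """Return (x_{k+1}, Delta_{k+1}) for the ratio r_k of Eq. (45).

    The default thresholds and factors are hypothetical example values.
    """
    if r_k >= eta2:
        # Very good agreement between model and loss: accept and enlarge.
        return xi_inv(y_star, x_k, delta_k), gamma2 * delta_k
    if r_k >= eta1:
        # Acceptable agreement: accept the step, keep the step length.
        return xi_inv(y_star, x_k, delta_k), delta_k
    # Poor agreement: reject the step and shrink the trust region.
    return list(x_k), gamma1 * delta_k
```

Note that ξ maps the current iterate x<sup>k</sup> to the center (0.5, ..., 0.5) of the unit hyperdice, so a discrete minimum y<sup>∗</sup> on a face of the unit hyperdice corresponds to a step of length Δ<sub>k</sub> in at least one coordinate.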

#### 3.2.2. Complexity

In the following, the complexity of SpaGrOW is discussed. The present section follows the structure of Section 3.2.1, considering each step of the algorithm with respect to its complexity:


$$\mathcal{O}\left[2^{\hat{\ell}}\left(\hat{\ell}^{N-1} - N2^{N-1}\right)\right] \tag{47}$$

This number multiplied by 2N + 1 is the number of required multiplications and multiplied by N, the number of sine evaluations required. The reduction of molecular simulations is achieved at the expense of

$$\mathcal{O}\left[N2^{\hat{\ell}}\left(\hat{\ell}^{N-1} - N2^{N-1}\right)\right] \tag{48}$$

multiplications and sine evaluations, a computational effort that is negligible compared with the large amount of computation time required for a simulation.

• Smoothing: In the case of most approximation methods, a multivariate linear regression has to be performed with complexity O(mM<sup>2</sup> + M<sup>3</sup>), where m is the number of data points and M, the number of basis functions (e.g., M = K for RBFs). However, this complexity can often be reduced to O(mM<sup>2</sup> + M<sup>2</sup>) or, even, to O(1) by smart numerical methods, e.g., by a Cholesky factorization, in the case of positive definiteness. Compared to simulations, the computational effort required for a smoothing procedure is negligible, as well. However, one has to consider that m and M must be large enough, on the one hand, in order to achieve an approximation as accurate as possible, and also small enough, on the other hand, in order to keep the computational effort low and to avoid overfitting. Additional effort appears due to the selection of centroids by the k*-means* algorithm, whose convergency speed always depends on the random choice of the initial centroids. The evaluation of the model function is done by a summation over the centroids only.

Furthermore, the regularization methods require some amount of computational effort, in particular, due to the application of Newton-Lagrange algorithms for the constrained optimization. Please note that most of them have been parameterized, as well, e.g., by cross validation.

• Interpolation: In the multilinear interpolation, all adjacent points have to be considered for each grid point. Hence, the interpolation is in O((2<sup>N</sup> + 1) · 2<sup>ℓ̂</sup> · ℓ̂<sup>N−1</sup>).

After the multilinear interpolation, a division by a sine term has to be executed for each point situated inside the unit hyperdice. As the sine term has already been calculated for each point of the sparse grid, only:

$$\mathcal{O}\left[\left(2N+1\right)\left(\left(2^{\hat{\ell}}-1\right)^{N}-2^{\hat{\ell}}\hat{\ell}^{N-1}\right)\right] \tag{49}$$

multiplications and:

$$\mathcal{O}\left[N\left((2^{\hat{\ell}}-1)^N - 2^{\hat{\ell}}\hat{\ell}^{N-1}\right)\right] \tag{50}$$

sine evaluations have to be performed. The number of required divisions is equal to (2<sup>ℓ̂</sup> − 1)<sup>N</sup>. Please note that the divisions have to be performed for each inner point of the sparse grid, as well, because only the approximating function, G, coincides with the interpolation model, but not F̄.
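The sine weighting of Equation (42) and the division of Equation (43), whose cost is counted above, can be illustrated with a short Python sketch. The function names are assumptions introduced for illustration; the sketch only shows why each point costs N sine evaluations and one division and why the division is restricted to inner points.

```python
import math


def sine_weight(y):
    """Product of sin(pi * y_i) over all N components: N sine evaluations."""
    return math.prod(math.sin(math.pi * yi) for yi in y)


def f_bar(loss, y):
    """Eq. (42): the weighted loss, which vanishes on the boundary of [0, 1]^N."""
    return sine_weight(y) * loss(y)


def undo_weight(q_bar_value, y):
    """Eq. (43): recover q(y) from q_bar(y) by one division per inner point."""
    if any(yi <= 0.0 or yi >= 1.0 for yi in y):
        raise ValueError("the division is only defined for non-boundary points")
    return q_bar_value / sine_weight(y)
```

On the boundary, the weight is zero (up to rounding), so the loss value there never has to be simulated; this is the reason why the homogeneous Dirichlet boundary conditions reduce the number of simulations.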


#### *3.3. Convergency*

In the following, some convergency aspects of the SpaGrOW algorithm are considered. In [40], it was proven that the algorithm converges under certain assumptions. Thereby, in both the smooth and the noisy case, the approximation error with respect to the original function, *i.e*., the function without noise, has to be taken into account. Moreover, it is examined to what extent SpaGrOW manages with fewer simulations than gradient-based methods. In the case of the latter, simulations have to be performed for the gradient components, the entries of the Hessian matrix and the Armijo steps. In the case of SpaGrOW, they have to be executed for the sparse grid points and the Trust Region steps. The steepest descent algorithm and the conjugate gradient methods require significantly fewer simulations for the gradient than SpaGrOW for the sparse grid: for N = 4, four simulations are required for the gradient and nine for a sparse grid of the level 2. As for the step length control, it can be observed that both gradient-based methods and SpaGrOW mostly need one Armijo or, respectively, one Trust Region step at the beginning of the optimization, but many more close to the minimum. The reason for the reduction of computational effort in the case of SpaGrOW lies in the lower number of iterations, not in a lower number of function evaluations per iteration. The advantage is that the step length is determined beforehand, so that, especially at the beginning of the optimization, large steps can be realized. For smooth functions, the combination technique delivers small interpolation errors in most cases, even when the trust region is quite large, and it always delivers a descent direction, mostly leading to the boundary of the current trust region.

Close to the minimum, both approaches have the drawback that after a high number of step length control iterations, *i.e*., after a large computational effort, only marginal improvements in the loss function values are observed. However, due to the statistical noise, the minimum can never be predicted exactly. At some point, the minimization has to be stopped, and the current result has to be evaluated. However, SpaGrOW is capable of searching for a smaller loss function value in more than one direction, due to the grid approach. Furthermore, its modeling approach gives a higher probability of getting close to the minimum than gradient-based methods, as motivated in Section 1.

As the interpolation and smoothing errors are essential for the convergency of SpaGrOW, they are introduced and discussed in the following.

### 3.3.1. Interpolation Error

Under certain assumptions [24], the interpolation error for a sparse grid of the level, ℓ̂, is of the order O(h<sub>ℓ̂</sub><sup>2</sup> (log(h<sub>ℓ̂</sub><sup>−1</sup>))<sup>ℓ̂−1</sup>). First, the two-dimensional case is considered again. If the difference u − u<sub>i,j</sub>, where u<sub>i,j</sub> ∈ T<sub>i,j</sub>, i + j ∈ {ℓ̂, ℓ̂ + 1}, meets point-wise an asymptotic error expansion, *i.e*.,

$$u - u\_{i,j} = C\_1(h\_i)h\_i^2 + C\_2(h\_j)h\_j^2 + D(h\_i, h\_j)h\_i^2h\_j^2 \tag{51}$$

where ∀<sub>i</sub> |C<sub>1</sub>(h<sub>i</sub>)| ≤ κ, ∀<sub>j</sub> |C<sub>2</sub>(h<sub>j</sub>)| ≤ κ and ∀<sub>i,j</sub> |D(h<sub>i</sub>, h<sub>j</sub>)| ≤ κ, κ > 0, then the interpolation error can be estimated as follows:

$$|u - \hat{u}^c\_{\hat{\ell}}| \le \left(1 + \frac{5}{4} \log \left(h\_{\hat{\ell}}^{-1}\right)\right) \kappa h\_{\hat{\ell}}^2 = \mathcal{O}\left(h\_{\hat{\ell}}^2 \log(h\_{\hat{\ell}})^{-1}\right) \tag{52}$$

For the N-dimensional case, an analogous asymptotic error expansion delivers the following formula:

$$|u - \hat{u}^c\_{\hat{\ell}}| \le \left(\sum\_{l=0}^{N-1} \chi\_l \left(\log\left(h\_{\hat{\ell}}^{-1}\right)\right)^l\right) \kappa h\_{\hat{\ell}}^2 = \mathcal{O}\left(h\_{\hat{\ell}}^2 \left(\log(h\_{\hat{\ell}})^{-1}\right)^{\hat{\ell}-1}\right) \tag{53}$$

Thereby, u ∈ S<sup>0</sup><sub>ℓ̄</sub>, ℓ̄ = (ℓ̂, ..., ℓ̂) ∈ N<sup>N</sup> and χ<sub>l</sub> ∈ N, l = 0, ..., ℓ̂ − 1.

Please note that in order to guarantee the existence of such an asymptotic error expansion, the exact solution, u, must fulfill certain continuity and smoothness conditions. As u is supposed to reproduce F exactly on the sparse grid and as accurately as possible between the sparse grid points, these assumptions have to be transferred to F. This motivates again the need for a smoothing procedure in the case of statistical noise. In most cases, the existence of an asymptotic error expansion cannot be proven *a priori*. However, the combination technique was shown to deliver very good results in practice [24].

The interpolation error can be bounded from above as follows:

Theorem 11 (Estimate for the interpolation error). *Let Δ > 0 be the size of the hyperdice, W<sup>Δ</sup>(x), with center x ∈ R<sup>N</sup>, where a sparse grid of the level, ℓ̂, is defined. Let the approximated function, G : R<sup>N</sup> → R<sup>+</sup><sub>0</sub>, be given on the sparse grid and interpolated by the model function, q : B<sub>Δ</sub>(0) → R, on a full grid of the level, ℓ̄ := (ℓ̂, ..., ℓ̂) ∈ N<sup>N</sup>, using the combination method. Then, for the interpolation error, ε = |u − û<sup>c</sup><sub>ℓ̂</sub>|, in Inequality* (53)*, it holds for u := G and û<sup>c</sup><sub>ℓ̂</sub> := q:*

$$\forall\_{y \in \mathcal{W}^{\Delta}(x)} \; |\varepsilon(y)| \leq \kappa\_{\varepsilon}(\hat{\ell}) f\_{\hat{\ell}}^{\varepsilon}(\Delta) \tag{54}$$

*where κ<sub>ε</sub> : N → R<sup>+</sup> and f<sup>ε</sup><sub>ℓ̂</sub> : R<sup>+</sup> → R<sup>+</sup> with lim<sub>ℓ̂→∞</sub> κ<sub>ε</sub>(ℓ̂) = 0 and lim<sub>Δ→0</sub> f<sup>ε</sup><sub>ℓ̂</sub>(Δ) = 0. The function, f<sup>ε</sup><sub>ℓ̂</sub>, is continuous. Moreover, f̃<sup>ε</sup><sub>ℓ̂</sub> := f<sup>ε</sup><sub>ℓ̂</sub>/Δ : R<sup>+</sup> → R<sup>+</sup> is continuous, as well, with lim<sub>Δ→0</sub> f̃<sup>ε</sup><sub>ℓ̂</sub>(Δ) = 0.*

Proof. Let y ∈ W<sup>Δ</sup>(x) be an arbitrary point in the hyperdice. It holds h<sub>ℓ̂</sub> = 2<sup>1−ℓ̂</sup> Δ. Following Inequality (53), there exist constants χ<sub>l</sub> > 0, l = 0, ..., N − 1 and a κ > 0, so that:

$$\begin{split} |\varepsilon(y)| &\leq \left(\sum\_{l=0}^{N-1} \chi\_l \left(\log\left(2^{\hat{\ell}-1}\Delta^{-1}\right)\right)^l\right) \kappa 2^{2-2\hat{\ell}} \Delta^2 \\ &= \left(\sum\_{l=0}^{N-1} \chi\_l \left(\hat{\ell} - 1 - \log \Delta\right)^l\right) \kappa 2^{2-2\hat{\ell}} \Delta^2 \\ &\leq \underbrace{N \cdot \max\_{l=0,\ldots,N-1} \chi\_l \cdot \kappa 2^{2-2\hat{\ell}}}\_{:= \kappa\_{\varepsilon}(\hat{\ell})} \cdot \underbrace{\max\_{l=0,\ldots,N-1} \left(\hat{\ell} - 1 - \log \Delta\right)^l \cdot \Delta^2}\_{:= f\_{\hat{\ell}}^{\varepsilon}(\Delta)} \\ &= \kappa\_{\varepsilon}(\hat{\ell}) f\_{\hat{\ell}}^{\varepsilon}(\Delta) \end{split} \tag{55}$$

It holds lim<sub>ℓ̂→∞</sub> κ<sub>ε</sub>(ℓ̂) = 0, and also, because of l'Hospital's rule, lim<sub>Δ→0</sub> f<sup>ε</sup><sub>ℓ̂</sub>(Δ) = 0. Due to f̃<sup>ε</sup><sub>ℓ̂</sub>(Δ) = max<sub>l=0,...,N−1</sub> (ℓ̂ − 1 − log Δ)<sup>l</sup> · Δ, it follows also from l'Hospital's rule that lim<sub>Δ→0</sub> f̃<sup>ε</sup><sub>ℓ̂</sub>(Δ) = 0.
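The limits used in the proof can be checked numerically. The following sketch evaluates the two factors of Inequality (55) for shrinking Δ and growing ℓ̂; it assumes that log denotes the logarithm to base 2 (which is what the identity log(2<sup>ℓ̂−1</sup>Δ<sup>−1</sup>) = ℓ̂ − 1 − log Δ requires), and the values max χ<sub>l</sub> = κ = 1 as well as all function names are hypothetical choices made for illustration.

```python
import math


def f_eps(delta, level, n_dim):
    """Second factor of Eq. (55): max_l (level - 1 - log2(delta))**l * delta**2."""
    base = level - 1 - math.log2(delta)
    return max(base ** l for l in range(n_dim)) * delta ** 2


def f_eps_tilde(delta, level, n_dim):
    """f_eps divided by delta, which must also vanish for delta -> 0."""
    return f_eps(delta, level, n_dim) / delta


def kappa_eps(level, n_dim, chi_max=1.0, kappa=1.0):
    """First factor of Eq. (55); chi_max and kappa are placeholder constants."""
    return n_dim * chi_max * kappa * 2.0 ** (2 - 2 * level)


if __name__ == "__main__":
    for delta in (1e-1, 1e-2, 1e-3, 1e-4):
        print(delta, f_eps(delta, 2, 4), f_eps_tilde(delta, 2, 4))
```

The polynomial factor in log Δ grows only slowly, so both f<sup>ε</sup><sub>ℓ̂</sub>(Δ) and f<sup>ε</sup><sub>ℓ̂</sub>(Δ)/Δ decrease as Δ is reduced, in line with the statement of Theorem 11.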

#### 3.3.2. Smoothing Error

Let || · || be one of the two norms, || · ||<sub>L<sup>2</sup></sub> or || · ||<sub>L<sup>∞</sup></sub>. The smoothing error, μ, is the error with respect to || · || between the original function, F = F̃ − ΔF, without noise and the approximating function, G. Thereby, F̃ is the noisy function. It holds:

$$\mu := ||F - G|| = ||(F + \Delta F) - G - \Delta F|| = ||\tilde{F} - G - \Delta F|| \le |\vartheta| + ||\Delta F||\tag{56}$$

where ϑ denotes the training error. If Ω is a trust region, then define ∀<sub>x∈Ω</sub> ϑ(x) := F̃(x) − G(x). In the ideal case, ϑ(x) = ΔF<sub>x</sub>, it holds μ(x) = 0. Otherwise, for 0 < δ ≪ |ΔF<sub>x</sub>|, the following inequality must hold:

$$|\mu(x)| = |\Delta F\_x - \vartheta(x)| \le \delta \tag{57}$$

This means that the training error, ϑ, must not be too small. Figure 10a shows an overfitted model, G, which reproduces exactly the oscillations produced by the noise. In this case, ϑ = 0, but |μ(x)| = |ΔF<sub>x</sub>| ≫ δ for certain x ∈ Ω. On the other hand, the training error must not be too high. In Figure 10b, it holds ∀<sub>x∈Ω</sub> |μ(x)| ≈ |ΔF<sub>x</sub>| ≫ δ. Only Figure 10c depicts a feasible case; here, the training error is of the same order as the noise, and it holds |μ| ≤ δ.

Figure 10. Approximation models with different training errors, ϑ, where the function, G, approximates the noisy function, F̃ = F + ΔF: overfitted model with ϑ = 0 (a); model, where ϑ is too high (b); and feasible model with |ΔF − ϑ| ≤ δ (c). In the ideal case, it holds ϑ(x) = ΔF<sub>x</sub>.
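The condition of Inequality (57) can be made concrete with a small numerical sketch. The noise values, the three sets of training errors and the tolerance δ below are invented purely for illustration and are not taken from the simulations; the function names are assumptions as well.

```python
def smoothing_error(noise, training_error):
    """Point-wise |mu(x)| = |Delta F_x - theta(x)|, cf. Eq. (57)."""
    return [abs(n - t) for n, t in zip(noise, training_error)]


def is_feasible(noise, training_error, delta):
    """True if |mu(x)| <= delta holds at every sample point."""
    return all(m <= delta for m in smoothing_error(noise, training_error))


# Hypothetical noise Delta F_x at four sparse grid points.
noise = [0.30, -0.20, 0.25, -0.35]
delta = 0.05

overfitted = [0.0, 0.0, 0.0, 0.0]      # theta = 0: the model reproduces the noise
too_high = [3.0, -2.5, 2.8, -3.1]      # theta far larger than the noise
feasible = [0.28, -0.21, 0.27, -0.33]  # theta of the same order as the noise
```

With these numbers, only the third model satisfies |μ| ≤ δ: for the overfitted model, |μ(x)| equals |ΔF<sub>x</sub>| at every point, and for the second model, |μ(x)| is dominated by the large training error.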

In order to keep the smoothing error low enough, a smoothing procedure has to be applied, which filters out the noise and reproduces the loss function, at least on the sparse grid, as exactly as possible. As a smoothing procedure can only be evaluated on a sparse grid, due to the high amount of computation time for molecular simulations, the condition, |μ| ≤ δ, is considered only on the sparse grid. It does not matter if G has oscillations between the sparse grid points, because the piecewise-multilinear sparse grid interpolation is not capable of modeling them anyway.

The following theorem is essential for the convergency proof of SpaGrOW and gives an estimate for the smoothing error in the case of positive definite RBFs:

Theorem 12 (Estimate for the smoothing error). *Let Δ > 0 be the size of the hyperdice, W<sup>Δ</sup>(x), with the center, x ∈ R<sup>N</sup>, in which a sparse grid of the level, ℓ̂, is constructed. Let the loss function, F : R<sup>N</sup> → R<sup>+</sup><sub>0</sub>, be given on the sparse grid and approximated by the function, G : R<sup>N</sup> → R. Suppose that the approximation is stable. Then, the smoothing error μ from Equation* (56) *can be estimated as follows:*

$$\exists \; \kappa\_{\mu} > 0 \; : \; \forall\_{y \in \mathcal{W}^{\Delta}(x)} \left| \mu(y) \right| \leq \kappa\_{\mu} f^{\mu}(\Delta) \tag{58}$$

*where f<sup>μ</sup> : R<sup>+</sup> → R<sup>+</sup> is continuous with lim<sub>Δ→0</sub> f<sup>μ</sup>(Δ) = 0. Furthermore, f̃<sup>μ</sup> := f<sup>μ</sup>/Δ : R<sup>+</sup> → R<sup>+</sup> is continuous, as well, with lim<sub>Δ→0</sub> f̃<sup>μ</sup>(Δ) = 0.*

Proof. Due to the stability of the approximation, Inequality (29) holds, where Ω = W<sup>Δ</sup>(x) and X is the sparse grid. The fill distance on the sparse grid and Δ only differ by a constant, ω > 0, *i.e*., Δ<sub>Ω,X</sub> = ωΔ. For κ<sub>μ</sub> := κ and f<sup>μ</sup>(Δ) := h(ωΔ) = h(Δ<sub>Ω,X</sub>), the estimate for the smoothing error in Equation (58) follows directly.

Remark 13. *Following Corollary 10, estimate Equation* (58) *is given in the case of a smoothing procedure based on Gaussian RBFs.*

Remark 14. *For Theorem 12 and the full convergency proof, it is irrelevant whether the original function, F : R<sup>N</sup> → R<sup>+</sup><sub>0</sub>, or the transformed function, F̄ : [0, 1]<sup>N</sup> → R<sup>+</sup><sub>0</sub>, is approximated by G : R<sup>N</sup> → R. The function to be smoothed only has to be continuous within the trust region. Please note that in the case of F̄, the function, G, can have negative values, as F̄ is equal to zero at the boundary of the trust region. For the original function, G(x) ≥ 0 can be assumed, due to Theorem 12, when Δ is small enough. For F̄, this can be assumed, as well. Otherwise, consider a translation that does not have any impact on either the approximation or the minimization.*

The convergency proof executed for SpaGrOW was related to a general convergency proof for derivative-free Trust Region methods [41]. However, the Trust Region method used in that paper is based on an interpolation with Newtonian fundamental polynomials. Hence, the partial proofs cannot be transferred, but have to be developed anew. Another crucial difference consists in the assumption for the loss function to be at least two times continuously differentiable with a bounded Hessian norm, which cannot be made in the case of SpaGrOW. The detailed convergency proof was performed in [40].

#### *3.4. Speed of Convergency Compared to Gradient-Based Methods*

The speed of convergency of SpaGrOW is discussed in the following, also with respect to statistical noise. As already mentioned, the trust region size, Δ, must not be too small, due to the noise. However, the convergency proof is based on the choice of a small Δ. This dilemma, which is also present in the case of gradient-based methods whenever two adjacent points are required for the computation of a partial derivative, leads to the need for an optimal parameterization of SpaGrOW. To achieve a high speed of convergency, primarily at the beginning of the optimization, the choice of a large Δ is required without violating any of the assumptions for the convergency proof, *cf*. [40]. In the following, some heuristic considerations are made in this regard. Thereby, the index, g, refers to gradient-based descent methods, the index, H, to descent methods using a Hessian and the index, S, to SpaGrOW. Furthermore, let M̄ be the average number of function evaluations per iteration and l̄, the average number of Armijo or Trust Region steps. Then, it holds:

$$
\bar{M}\_g = \; N + \bar{l}\_g \tag{59}
$$

$$
\bar{M}\_H = \left( N + \frac{N(N+1)}{2} + \bar{l}\_H \right) \tag{60}
$$

$$
\bar{M}\_S \quad = \ 2N \cdot \bar{l}\_S \tag{61}
$$

The drawback of SpaGrOW lies in the multiplicative dependency of M̄<sub>S</sub> on l̄<sub>S</sub>. This is due to the fact that a new sparse grid is used at each iteration step. However, at the beginning of the optimization, l̄<sub>g</sub> = l̄<sub>H</sub> = l̄<sub>S</sub> = 1 is assumed. Then, it holds M̄<sub>g</sub> < M̄<sub>S</sub> < M̄<sub>H</sub>. Hence, SpaGrOW requires fewer function evaluations per iteration on average than a method based on Hessians. However, in total, it has to manage with fewer iterations than a gradient-based method, which requires fewer function evaluations per iteration: Let k be the general number of iterations; then it must hold: k<sub>S</sub> < k<sub>g</sub>. If:

$$k\_S < \frac{N+1}{2N} k\_g \tag{62}$$

SpaGrOW needs fewer iterations and function evaluations than a gradient-based method. Now, the question arises how this can be steered. By choosing an initial Δ<sub>0</sub> (and also γ<sub>1</sub>) that is large enough, a faster convergency can be achieved.
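The bookkeeping behind Equations (59) to (62) can be written down directly. The following Python sketch uses hypothetical function names and assumes one Armijo or Trust Region step per iteration by default, as stated for the beginning of the optimization.

```python
def evals_gradient(n, l_g=1):
    """Eq. (59): function evaluations per iteration of a gradient-based method."""
    return n + l_g


def evals_hessian(n, l_h=1):
    """Eq. (60): function evaluations per iteration of a Hessian-based method."""
    return n + n * (n + 1) // 2 + l_h


def evals_spagrow(n, l_s=1):
    """Eq. (61): function evaluations per iteration of SpaGrOW."""
    return 2 * n * l_s


def max_iteration_ratio(n):
    """Right-hand factor of Eq. (62): k_S / k_g must stay below this value."""
    return (n + 1) / (2 * n)


def spagrow_is_cheaper(n, k_s, k_g):
    """Compare the total function evaluations of both methods (one step each)."""
    return k_s * evals_spagrow(n) < k_g * evals_gradient(n)
```

For N = 4, this gives 5, 15 and 8 function evaluations per iteration for the gradient-based method, the Hessian-based method and SpaGrOW, respectively, and SpaGrOW is cheaper in total as long as k<sub>S</sub> < 0.625 · k<sub>g</sub>.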

In the following, the initial phase of an optimization process is considered, and a short comparison between SpaGrOW and a gradient-based descent method is made, *i.e*., it is discussed under what conditions the speed of convergency is significantly higher in the case of SpaGrOW. Please note that *at the beginning of the optimization* means here that the number of trust region or Armijo steps is equal to one at each iteration and that k<sub>g</sub> is chosen, so that the number of Armijo steps for x<sup>k<sub>g</sub></sup> is still equal to one and for x<sup>k<sub>g</sub>+1</sup>, greater than one. Furthermore, choose k<sub>S</sub>, so that ||x<sup>k<sub>S</sub></sup> − x<sup>0</sup>|| ≤ ||x<sup>k<sub>g</sub></sup> − x<sup>0</sup>||, the size of the trust region is equal to Δ<sub>0</sub> and x<sup>k<sub>g</sub></sup> ∈ Ω<sup>k</sup>. Hence, x<sup>k<sub>g</sub></sup> ∉ Ω<sup>k−1</sup>. This means that both x<sup>k<sub>S</sub></sup> and x<sup>k<sub>g</sub></sup> are reached by SpaGrOW with k<sub>S</sub> steps of length Δ<sub>0</sub>, which is depicted in Figure 11. The distance between x<sup>0</sup> and x<sup>k<sub>S</sub></sup> is equal to k<sub>S</sub>Δ<sub>0</sub>. It holds:

$$k\_S \Delta\_0 \le ||x^{k\_g} - x^0||\tag{63}$$

If:

$$
\Delta\_0 > \underbrace{\frac{||x^{k\_g} - x^0||}{N + 1}}\_{> 1, \text{ if } k\_g \ge 2} \tag{64}
$$

*i.e*., if:

$$
\Delta\_0 > \zeta ||x^{k\_g} - x^0||\tag{65}
$$

for a maximal ζ ∈ (0, 1); then, Inequality (62) follows. In practice, a realistic case is k<sub>g</sub> = 4. If:

$$
\zeta := \frac{4N}{(N+1)k\_g} = \frac{N}{N+1} < 1\tag{66}
$$

is chosen, SpaGrOW requires less than 50% of the iterations and function evaluations required by the gradient-based descent method. Hence, a reduction of computation time by a factor of two is plausible at the beginning of the optimization process.
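The factor of two can be traced back to Inequalities (63) and (65) in a short worked step. The following derivation is a sketch that assumes one Trust Region and one Armijo step per iteration, so that the function evaluations per iteration are 2N for SpaGrOW and N + 1 for the gradient-based method according to Equations (59) and (61). Combining Inequalities (63) and (65) with the choice of ζ from Equation (66) gives:

$$k\_S \Delta\_0 \le ||x^{k\_g} - x^0|| < \frac{\Delta\_0}{\zeta} \;\Rightarrow\; k\_S < \frac{1}{\zeta} = \frac{(N+1) k\_g}{4N} = \frac{1}{2} \cdot \frac{N+1}{2N} k\_g$$

so that the bound of Inequality (62) is undercut by a factor of two. For the total number of function evaluations, it then follows that:

$$\frac{k\_S \cdot 2N}{k\_g \cdot (N+1)} < \frac{2N}{N+1} \cdot \frac{N+1}{4N} = \frac{1}{2}$$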

Figure 11. Speed of convergency of SpaGrOW at the beginning of the optimization. For an appropriate choice of the size of the initial trust region, Δ0, the number of iterations in the case of SpaGrOW (kS) is significantly smaller than in the case of a gradient-based method (kg). It is realistic that the number of iterations and function evaluations can be reduced by a factor of two.

#### 4. Practical Evaluation and Results

The methodology of SpaGrOW was implemented in *python* (version 2.4.3) and is modularly constructed. The software consists of a main control script and secondary control scripts for specific parts of the algorithm, e.g., for the sparse grid interpolations and the control of the smoothing procedures, which were implemented in *S+* (version 2.1.10), a scripting language related to the *R Project for Statistical Computing* [42].

The implementation of SpaGrOW contains the full algorithm described in the previous section and acts as an interface between optimization and simulation, providing all necessary control routines for both tasks. On the optimization side, the method starts with an initial guess and evaluates the loss function based on the results of a simulation. If one of the stopping criteria of SpaGrOW is fulfilled, the optimization workflow terminates. Otherwise, the parameters are updated by SpaGrOW.

On the simulation side, a control script calls the simulation tool performing all preparation routines and computing the trajectory, as well as the desired physical target properties. The latter are passed on to the optimization workflow of SpaGrOW. In the case of a simultaneous optimization of properties at different temperatures, the respective simulations are executed in parallel. In this case, a script distributing K jobs at K temperatures is called. A script controlling the parallel environment and the simulation control script are called K times. If m = n/K physical properties are fitted, the result of each job consists of m properties files. Figure 12 shows both the optimization and the simulation side and how they interact with each other in the case of parallel jobs at different temperatures.

Figure 12. Technical realization of the optimization procedure for physical target properties at different temperatures. If the simulated properties do not coincide well with their experimental reference data, the optimization control script (depicted on the left) passes the current force field parameters on to a distribution control script, which submits parallel jobs at different temperatures. Then, a parallel environment control script is executed, and a simulation control script is called, which performs the following three tasks: preparation routines, the simulation itself and the computation of the simulated target properties. The properties are written into separate files, which are read by the optimization control script. Finally, the loss function is evaluated and the workflow continues.
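The distribution of K jobs at K temperatures can be illustrated by a minimal Python sketch. It is not the control script of the actual implementation: `run_simulation` is a hypothetical stand-in that returns dummy numbers instead of calling a simulation tool, the parameter names are placeholders, and a thread pool from the standard library replaces the parallel environment of the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def run_simulation(parameters, temperature):
    """Hypothetical stand-in for the simulation control script.

    A real job would run the preparation routines, the simulation and the
    computation of the target properties; here, dummy arithmetic is used.
    """
    density = parameters["sigma"] * 100.0 - 0.5 * temperature   # placeholder
    pressure = parameters["epsilon"] * temperature              # placeholder
    return {"density": density, "pressure": pressure}


def distribute_jobs(parameters, temperatures):
    """Submit one job per temperature and collect the m properties of each job."""
    with ThreadPoolExecutor(max_workers=len(temperatures)) as pool:
        futures = {t: pool.submit(run_simulation, parameters, t)
                   for t in temperatures}
        return {t: future.result() for t, future in futures.items()}
```

With K = 6 temperatures and m = 2 properties per job, the optimization side receives n = mK = 12 simulated properties before the loss function is evaluated.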

The simulations were performed on a parallel computer cluster with 215 available nodes, where each node is provided with two Intel-Nehalem-EP-Quadcore processors (Xeon X5550) with 24 GB of main memory, which are connected by a fast QDR InfiniBand interconnect with a 40-Gb/s Double Data Rate (DDR).

In this section, SpaGrOW is evaluated in practice and applied to molecular simulations as described above. The questions to be answered are the following:


#### *4.1. Selection of Smoothing and Regularization Methods*

As molecular simulations are extremely time-consuming, an analytical model replacing them was used for the selection. In previous work [22], a similar assessment has already been performed for gradient-based methods. Vapor-Liquid Equilibrium (VLE) data, like the saturated liquid density, ρ<sub>l</sub>, the enthalpy of vaporization, Δ<sub>v</sub>H, and the vapor pressure, p<sub>σ</sub>, can be evaluated directly as functions of certain force field parameters. These functions were determined in [43] by nonlinear regression using a large amount of simulation data for two-centered Lennard-Jones (LJ) fluids with a quadrupolar moment. Here, nitrogen was considered as an example for this kind of fluid. The model parameters to be optimized were the elongation, L, of the two LJ sites, the quadrupolar moment, Q<sup>2</sup>, and the two LJ parameters, σ and ε. In order to mimic molecular simulations, uniformly distributed artificial uncertainties (0.5% for the density and 3% for the pressure) were added for the simultaneous optimization of ρ<sub>l</sub> and p<sub>σ</sub> at six different temperatures, T/K ∈ {65, 75, 85, 95, 105, 115}. As in [22], the weights of the properties in the loss function were all equal to one, because all properties were considered as homologous. As the simulation data were noisy, ten statistically independent random replicates were performed for each optimization run, whose results were averaged.
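The way artificial noise can be added to an analytical model and averaged over replicates may be sketched as follows. This is an illustration under explicit assumptions: the loss is taken to be a sum of squared relative deviations with unit weights (the exact form of the loss function is not restated in this section), the noise is multiplicative and uniform within the stated percentages, and all function names as well as the seeding scheme are hypothetical.

```python
import random

# Relative uncertainties stated in the text: 0.5% (density), 3% (pressure).
NOISE = {"density": 0.005, "pressure": 0.03}


def add_uniform_noise(value, rel, rng):
    """Perturb value by a uniformly distributed relative error in [-rel, rel]."""
    return value * (1.0 + rng.uniform(-rel, rel))


def noisy_loss(model, reference, rng):
    """Assumed loss: sum of squared relative deviations, all weights equal to one."""
    total = 0.0
    for prop, values in model.items():
        for value, ref in zip(values, reference[prop]):
            noisy = add_uniform_noise(value, NOISE[prop], rng)
            total += ((noisy - ref) / ref) ** 2
    return total


def averaged_loss(model, reference, replicates=10, seed=0):
    """Average the noisy loss over statistically independent replicates."""
    losses = [noisy_loss(model, reference, random.Random(seed + r))
              for r in range(replicates)]
    return sum(losses) / replicates
```

Each replicate uses its own random number generator, so the replicates are independent of each other while the whole experiment remains reproducible for a fixed `seed`.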

Theoretical considerations favor positive definite RBFs, in particular, Gaussian RBFs. In the following, it is shown that this is also a good choice in practice.

In order to evaluate whether a smoothing or regularization procedure is appropriate for SpaGrOW, the behavior of the algorithm combined with each preprocessing procedure is analyzed. Thereby, both efficiency and robustness with respect to noise are considered. Table 2 shows the candidates and their abbreviations.


Table 2. Candidates for smoothing and regularization procedures within SpaGrOW together with their abbreviations.

Selection of the Best Smoothing Procedure. It was already motivated that a smoothing procedure is indispensable whenever noisy loss function values are present. A detailed analysis has shown that certain RBFs deliver far better results than others. Suitable RBFs are the linear, cubic, Gaussian and thin plate spline RBF, *cf*. Section 2.2.2. The multiquadric functions were not reliable. Moreover, Wendland functions were considered, as well. Figure 13 shows box plots for the loss function values achieved by SpaGrOW combined with the different RBFs, which is a criterion for the quality of convergency. The results over ten statistically independent replicates are indicated. The lower the loss function value achieved, the closer the algorithm got to the minimum. The smallest loss function values were achieved robustly by the four RBFs mentioned above. A Gaussian RBF is selected here for the following reasons:


Please note that linear RBFs delivered good approximations at the beginning of the optimization, as well, which was not surprising, because the steepest descent method was also very successful [22]. For higher dimensions, the Lipra method, *i.e*., a quadratic approximation of the loss function, could be convincing.

Figure 13. Box plots of the loss function values achieved by SpaGrOW in combination with a smoothing procedure based on Radial Basis Functions (RBFs). The RBFs were the linear, cubic, multiquadric (Multi), inverse multiquadric (Invers), Gaussian, thin plate spline RBF (TPS) and a Wendland function. Suitable RBFs were only the linear, cubic, Gaussian and thin plate spline RBF.

Figure 14. Approximations on the unit square, [0, 1]<sup>2</sup>, based on Gaussian RBFs (a) and thin plate spline RBFs (b). The blue points mark the original (noisy) function values of F¯ on the sparse grid. It holds x1 = ξ(Q<sup>2</sup>), x2 = ξ(L) and y = F¯(x1, x2). The smoothing procedure based on thin plate spline RBFs reproduces F¯ at the boundary of the unit square in a very bad fashion.

**Selection of the Best Regularization Method.** As already mentioned, the best regularization method can only be selected by practical evaluations. Candidates are least squares, NENs (LASSO for α = 0 and Ridge Regression for α = 1), as well as MARS.

Figure 15. Approximations of F¯ on the unit square, [0, 1]<sup>2</sup>, based on Gaussian RBFs, combined with a LASSO regularization. The blue points mark the original (noisy) function values of F¯ on the sparse grid. It holds x1 = ξ(Q<sup>2</sup>), x2 = ξ(L) and y = F¯(x1, x2). At the boundary of the unit square, the function is reproduced in a bad fashion.

The regularization methods were evaluated in combination with the selected smoothing procedure based on Gaussian RBFs. The application of least squares is the same as not using any regularization method. All other regularization methods improved on the results achieved with least squares alone. The LASSO algorithm performs a variable selection, *i.e*., it tends to detect outliers by mistake. Hence, the model obtained by LASSO was often under-fitted. In contrast, the Ridge Regression estimator was more suitable for the present task, as it is a compromise between least squares, which often leads to overfitting, and the LASSO, which often leads to under-fitting. Figure 15 shows an approximation based on Gaussian RBFs in combination with LASSO. As can be seen, the function to be approximated was reproduced poorly at the boundary.

An NEN with α = 0.7 delivered an even better quality of convergence; however, the computational effort to optimize α was too high compared to the benefit achieved. The application of an NEN with α ∉ {0, 1} is not worthwhile for the present task.
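The effect of a ridge penalty on a Gaussian RBF approximation of noisy loss function values can be illustrated with a minimal sketch. This is not the SpaGrOW implementation: the function names, the kernel width `width`, the penalty `lam` and the toy loss function are hypothetical choices made here for illustration only.

```python
import numpy as np

def gaussian_rbf_matrix(x, centers, width):
    """Matrix of Gaussian RBFs exp(-(|x - c| / width)^2), one column per center."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d / width) ** 2)

def fit_rbf_ridge(x, y, width=0.3, lam=1e-2):
    """Solve (Phi^T Phi + lam I) w = Phi^T y; lam -> 0 approaches plain least squares."""
    phi = gaussian_rbf_matrix(x, x, width)
    a = phi.T @ phi + lam * np.eye(len(x))
    return np.linalg.solve(a, phi.T @ y)

def predict(x_new, centers, weights, width=0.3):
    """Evaluate the fitted RBF model at new points."""
    return gaussian_rbf_matrix(x_new, centers, width) @ weights

rng = np.random.default_rng(0)
x = rng.random((30, 2))                          # sample points in the unit square
clean = np.sum((x - 0.5) ** 2, axis=1)           # smooth toy loss function (assumed)
noisy = clean + 0.02 * rng.standard_normal(30)   # artificial statistical noise
w = fit_rbf_ridge(x, noisy)
smoothed = predict(x, x, w)
```

A larger `lam` shrinks the weight vector and yields a smoother model, which mirrors the compromise between over-fitting and under-fitting described above.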

Figure 16. Box plots of the loss function values (a) and of the number of function evaluations (b) resulting from the application of SpaGrOW combined with a smoothing procedure based on Gaussian RBFs and different regularization methods: Ridge Regression, LASSO, a weighted linear regression (rlm), an RBF approximation with an additional linear term (lt), an NEN (eln) with α = 0.7 and MARS. It becomes clear that MARS is the algorithm to select for regularization. Ridge Regression is reliable, as well.

The best regularization method is the MARS algorithm, not only with respect to robustness and quality of convergency, but also with respect to the number of function evaluations: Figure 16 shows box plots of all regularization methods applied. Figure 16a shows the loss function values achieved and Figure 16b the number of function evaluations. Besides the methods considered here, two other ones were tried: a weighted linear regression based on M-estimators and an RBF approximation with an additional linear term. As can be seen, the MARS algorithm delivers the best results. Hence, it is selected as regularization method for SpaGrOW. However, Ridge Regression is reliable, as well, and suggested as the alternative. The NEN with α = 0.7 achieved very low loss function values in a very robust way, but it always required more than a hundred function evaluations.

For Lipra, the least square estimator was the best regularization method. All other methods biased the quadratic approximation in a highly inappropriate way.

To summarize, Gaussian RBFs in combination with MARS and, in particular, for N ≥ 5, the Lipra method together with the least square estimator have turned out to be most suitable for the present task and are implemented in SpaGrOW for this reason.

#### *4.2. Application of SpaGrOW to Molecular Simulations*

Finally, SpaGrOW is applied to molecular simulations. Thereby, it is compared to gradient-based methods with respect to computational effort. Additionally, for an eight-dimensional problem, the Lipra method is evaluated, and it is analyzed how close SpaGrOW can get to the minimum.

## 4.2.1. Comparison to Gradient-Based Methods with Respect to Computation Time: Benzene and Ethylene Oxide

In the following, SpaGrOW is compared to GROW with respect to computational effort on the basis of two applications: benzene and ethylene oxide.

**Benzene.** Benzene (C6H6) is a fairly simple molecule because of its symmetric structure and the absence of a permanent dipole moment. Furthermore, benzene has only two chemically distinct atom types. Hence, the force field parameterization for benzene was expected to be a relatively easy task. However, it is still challenging because of the π interactions.

Figure 17 shows the comparison between SpaGrOW and GROW with respect to the computational effort required within the respective optimization procedures. The target observables were the enthalpy of vaporization (Figure 17a) and the saturated liquid density (Figure 17b), considered at three different temperatures. The values indicated on the y-axis are the Mean Absolute Percentage Errors (MAPE), *i.e*., the absolute deviations from the respective experimental reference data in %, averaged over the range of temperatures. The experimental saturated liquid density was taken from [44] and the enthalpy of vaporization from the NIST database [45]. The simulations performed were molecular dynamics simulations in the NpT ensemble executed with the software tool, *GROMACS* [46]. The non-bonded potential energy was computed by *Moscito* [47] using the trajectories produced by GROMACS. Please note that the experimental target observables were VLE data and that the simulated properties were only approximations, due to the lack of an explicit gas phase, which was assumed to be ideal. On the computer cluster mentioned above, three to four hours were required for the simulation of 1,000 benzene molecules for 2 ns using a time step of 2 fs. For SpaGrOW, the following variables were chosen: η<sub>1</sub> = 0.2, η<sub>2</sub> = 0.7, γ<sub>1</sub> = 0.5, γ<sub>2</sub> = 1.1, and Δ<sub>0</sub> = 0.3 × Δ<sub>max</sub>. The force field parameters were σ(H), σ(C), ε(H) and ε(C). The initial parameters were taken from previous work [23], in which the saturated liquid density and the self-diffusion coefficient had been optimized at the vapor-liquid coexistence curve. The feasible domain was defined as follows: σ was changed by no more than 30% and ε by no more than 80%. As it was a four-dimensional optimization problem, Gaussian RBFs were chosen for the smoothing and the MARS algorithm for the regularization procedure.
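The MAPE defined above can be written down in a few lines. The following sketch uses hypothetical density values that are not data from this work; the function name `mape` is likewise chosen here for illustration.

```python
def mape(simulated, experimental):
    """Mean absolute percentage error over a set of temperatures, in %."""
    if len(simulated) != len(experimental):
        raise ValueError("need one simulated value per experimental value")
    deviations = [abs(s - e) / abs(e) for s, e in zip(simulated, experimental)]
    return 100.0 * sum(deviations) / len(deviations)

# Hypothetical liquid densities in kg/m^3 at three temperatures (not from the paper).
rho_exp = [800.0, 850.0, 900.0]
rho_sim = [808.0, 850.0, 891.0]
rho_mape = mape(rho_sim, rho_exp)   # deviations of 1%, 0% and 1%, averaged
print(f"MAPE = {rho_mape:.2f} %")
```

A force field is regarded as optimal for an observable once this value falls below the statistical uncertainty of the simulation.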

Figure 17. Mean Absolute Percentage Error (MAPE) values with respect to ΔvH (a) and ρ<sub>l</sub> (b) for benzene during the SpaGrOW optimization in comparison to GROW. The smoothing procedure was based on Gaussian RBFs in combination with MARS. The force field parameters to be optimized were σ(H), σ(C), ε(H) and ε(C). A faster convergency of SpaGrOW could be confirmed.

GROW needed seven iterations in total: five steepest descent and two conjugate gradient iterations. In contrast, SpaGrOW required only six iterations. Please note that an optimal force field with respect to ΔvH and ρ<sub>l</sub> was already achieved after four iterations, *i.e*., both target observables were equal to their experimental reference data up to statistical uncertainties for all temperatures. Typical statistical uncertainties for ΔvH and ρ<sub>l</sub> are 1% and 0.5%, respectively, *cf*. e.g., [19]. For GROW, this was the case after seven iterations. The number of function evaluations, *i.e*., simulations, for SpaGrOW and GROW was 37 and 62, respectively. However, in the latter case, seven simulations have to be subtracted for the comparison, because the partial atomic charges were optimized, as well. Hence, in the case of GROW, it was a five-dimensional optimization problem, and for the gradient calculation, one more simulation was required at each iteration. Even so, SpaGrOW was significantly faster than GROW.
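For a like-for-like count, the adjustment described above works out as follows. This is simple arithmetic on the numbers quoted in the text, and the symbol names are introduced here only for illustration:

$$N_{\mathrm{GROW}}^{\mathrm{adj}} = 62 - 7 = 55, \qquad \frac{N_{\mathrm{SpaGrOW}}}{N_{\mathrm{GROW}}^{\mathrm{adj}}} = \frac{37}{55} \approx 0.67$$

That is, SpaGrOW needed roughly two thirds of the simulations required by GROW for the four-dimensional problem.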

Figure 18 depicts the development of the LJ parameters for GROW and SpaGrOW (Figure 18a refers to σ and Figure 18b to ε). σ(C) remained constant, and all other force field parameters were increased. At the fourth iteration of SpaGrOW, the parameters were very similar to the ones of GROW at the seventh iteration. However, the following interesting observation could be made. For SpaGrOW, the parameter, σ(C), remained constant during the whole optimization procedure. In the case of GROW, it was first decreased and later returned to nearly its initial value. As it is gradient-based, GROW tends to make detours. As it is grid-based, SpaGrOW is capable of keeping one or more parameters constant during the whole optimization procedure, because it can converge along a certain grid line or hyperplane. This is another reason for the faster convergency of SpaGrOW. Only in the case of ε were some small detours observed. The algorithm ran through the triangle indicated in Figure 18b. SpaGrOW delivered a set of force field parameters that differed slightly from the ones obtained by GROW. Hence, it reached a different domain close to the minimum.

Figure 18. Development of the Lennard-Jones (LJ) parameters in the case of benzene (ΔvH,ρl) for GROW and SpaGrOW—σ(H) and σ(C) (a)—as well as ε(H) and ε(C) (b). The unfilled circles indicate the optimal parameters. SpaGrOW led in a more direct way to the minimum than GROW. Only in the case of ε, some detours could be observed, due to the triangle.

**Ethylene oxide.** Ethylene oxide (C2H4O) is a highly toxic substance, but very relevant for industrial applications, because it is a starting material for many industrially fabricated materials. It is very suitable for the evaluation of the optimization algorithms, because it has been characterized both experimentally [48,49] and by molecular simulations [14,50,51].

Figure 19 shows the MAPE values for ethylene oxide during the optimization procedures. The target observables were the saturated liquid density (Figure 19a), the enthalpy of vaporization (Figure 19b) and the vapor pressure (Figure 19c), considered at seven different temperatures. For pressures, the statistical uncertainty within molecular simulations is higher than for ΔvH and ρ<sub>l</sub>. Here, 5% was assumed. Experimental data were taken from [48]. The simulations performed were Grand-Equilibrium Monte-Carlo (GEMC) simulations based on a rigid united-atom model with a dipole moment developed in [14]. They were executed with the software tool *ms2* [52]. As GEMC simulations simulate the liquid and gas phase successively, the VLE data could be calculated correctly. The liquid simulation was performed in the NpT ensemble and the gas simulation in the μVT ensemble, where the chemical potential, μ, was kept constant. On the computer cluster mentioned above, eight to ten hours were required for the simulation of 500 liquid and 500 gaseous ethylene oxide molecules. The total of 345,000 Monte-Carlo steps was distributed over eight processors. The force field parameters to be optimized were ε(CH2), ε(O), σ(CH2) and σ(O). The initial parameters were obtained by a global pre-optimization based on random search [51]. Both σ and ε were changed by no more than 20%. The SpaGrOW variables were the same as for benzene. As the optimization problem was four-dimensional, Gaussian RBFs in combination with MARS were used again.

Figure 19. MAPE values with respect to ρ<sub>l</sub> (a), ΔvH (b) and p<sub>σ</sub> (c) in the case of ethylene oxide during the Vapor-Liquid Equilibrium (VLE) optimization for SpaGrOW and GROW. The smoothing procedure was based on Gaussian RBFs combined with the MARS algorithm. The force field parameters were ε(CH2), ε(O), σ(CH2) and σ(O). The faster convergency of SpaGrOW compared to GROW could be confirmed again.

In total, GROW required 14 iterations for the optimization using the steepest descent method only. SpaGrOW needed only five to achieve a loss function value of the same order of magnitude: for GROW, F(x<sup>(14)</sup>) = 2.4 × 10<sup>−4</sup> and, for SpaGrOW, F(x<sup>(5)</sup>) = 3.9 × 10<sup>−4</sup>. Please note that the force field was not optimal and that other methods had to be applied in order to optimize it. For details, *cf*. [40]. SpaGrOW and GROW required 54 and 77 molecular simulations, respectively. Hence, SpaGrOW was significantly faster again.

Figure 20 shows the development of the LJ parameters during the optimization process in comparison to GROW (Figure 20a refers to ε and Figure 20b to σ). All parameters were decreased, and SpaGrOW delivered approximately the same parameters as GROW. Undesired detours were avoided again, except for σ. However, the detours made by GROW could be linearized.

Figure 20. Development of the LJ parameters in the case of ethylene oxide (VLE) for GROW and SpaGrOW: ε(CH2) and ε(O) (a), as well as σ(CH2) and σ(O) (b). The unfilled circles show the final parameters. SpaGrOW again led to the minimum in a more direct way.

To summarize, SpaGrOW exhibits a significantly higher speed of convergency than GROW.

## 4.2.2. Comparison to Gradient-Based Methods Close to the Minimum: Dipropylene Glycol Dimethyl Ether

In the framework of the *Industrial Fluid Property Simulation Challenge (IFPSC) 2010* [53], a liquid-liquid equilibrium (LLE) between two liquid phases was to be calculated with molecular models. One of the two components was water and the other, dipropylene glycol dimethyl ether (C8H18O3), an inert, water-resistant, nontoxic and industrially highly relevant solvent.

The target observable was the liquid density, ρ, only, considered at four different temperatures and taken from [54]. The simulations performed were molecular dynamics simulations in the NpT ensemble executed with the software tool, *GROMACS* [46]. On the computer cluster mentioned above, three to four hours were required for the simulation of 512 ether molecules for 0.5 ns using a time step of 2 fs.

The LJ sites were located at the CH3, the CH2 and the CH group and at the oxygen. Hence, it was an eight-dimensional optimization problem. The initial force field parameters were taken from [55]. As GROW could not achieve an optimal force field reproducing the liquid density of the ether [55], another gradient-based method was applied, based on a first-order Taylor expansion of the target observables, which delivers a quadratic model for the loss function. It is a modified Gauss-Newton method combined with the Trust Region approach, *i.e*., the quadratic model is minimized within a compact domain, as well. The algorithm has also been developed by the authors, *cf*. [40] for more details. After three iterations, the modified Gauss-Newton method could achieve optimal liquid densities.
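The step from a first-order expansion of the observables to a quadratic loss model can be sketched as follows. The weighted relative least-squares form of the loss function, the weights $w_i$ and the symbols $f_i$, $f_i^{\mathrm{exp}}$ and $s$ are assumptions made here for illustration; the precise definitions are those of [40].

$$F(x) = \sum_i w_i \left( \frac{f_i(x) - f_i^{\mathrm{exp}}}{f_i^{\mathrm{exp}}} \right)^2, \qquad f_i(x + s) \approx f_i(x) + \nabla f_i(x)^{\mathsf{T}} s$$

$$q(s) = \sum_i w_i \left( \frac{f_i(x) + \nabla f_i(x)^{\mathsf{T}} s - f_i^{\mathrm{exp}}}{f_i^{\mathrm{exp}}} \right)^2, \qquad \min_{\|s\| \le \Delta} q(s)$$

Because each term is the square of an expression that is affine in $s$, the model $q$ is quadratic in $s$, and only first derivatives of the observables are needed to build it.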

Figure 21 indicates that SpaGrOW (with Δ<sub>0</sub> = 0.3 × Δ<sub>max</sub>) required only two iterations to do so. As the optimization problem was eight-dimensional, the Lipra method was applied as a smoothing procedure. However, SpaGrOW needed 38 simulations, about three times as many as the modified Gauss–Newton method, which only had to execute twelve simulations. With an optimal Δ<sub>0</sub>, the number of simulations could have been reduced to 19 in the case of SpaGrOW. The modified Gauss-Newton method is more reliable than SpaGrOW close to the minimum, but it could be shown that SpaGrOW is capable of getting closer to the minimum than the standard gradient-based algorithms used in GROW.

Figure 21. MAPE values with respect to ρ in the case of dipropylene glycol dimethyl ether during the optimization process of SpaGrOW in comparison to the modified Gauss-Newton method. The smoothing procedure was realized by the Lipra method. The optimization problem was eight-dimensional. SpaGrOW needed two iterations only, but significantly more simulations than the modified Gauss-Newton method in order to achieve optimal liquid densities.

However, the Gauss-Newton method is only applicable close to the minimum, *i.e*., when the loss function is nearly quadratic. In contrast, SpaGrOW is more generally applicable, and another drawback of the Gauss-Newton method is that it requires a gradient, which is not guaranteed to be computable, due to the reasons mentioned in Section 1. Please note that the density is not as noisy as other target properties, like the pressure or diffusion coefficient. Of course, the existence of an optimal Δ for SpaGrOW is not guaranteed either in practice.

Figure 22 shows the optimal liquid densities as a function of temperature for the modified Gauss-Newton method and SpaGrOW. Both methods could achieve an optimal force field with respect to ρ in contrast to GROW. However, SpaGrOW could reproduce the trend of the curve better than the modified Gauss-Newton method.

To summarize, SpaGrOW is capable of getting closer to the minimum than GROW when the smoothing procedure and Δ<sub>0</sub> are chosen properly. However, the modified Gauss-Newton method needs significantly fewer simulations.

Figure 22. Optimization of the density ρ in the case of dipropylene glycol dimethyl ether using the modified Gauss-Newton method and SpaGrOW. Optimal densities could be achieved by both methods, but SpaGrOW needed many more simulations. However, SpaGrOW could reproduce the trend of the curve better.

#### 5. Conclusions

In this paper, the new derivative-free optimization method SpaGrOW was presented in detail and applied to the parameterization of force fields in molecular simulations. It combines appropriate smoothing and regularization procedures, interpolation on sparse grids and the Trust Region approach. SpaGrOW turned out to be a highly efficient algorithm outperforming standard gradient-based methods with respect to the speed of convergency. Furthermore, it is capable of getting closer to the minimum. However, if a gradient can be calculated correctly, the gradient-based methods exhibit a slightly higher robustness than SpaGrOW. The new method is assessed in the following with respect to three criteria: speed of convergency, local refinements and robustness:

• Speed of convergency: Whenever Δ<sub>0</sub> is chosen properly, SpaGrOW often requires only half as many simulations as gradient-based methods. The speed of convergency was also higher, *i.e*., the number of iterations was significantly lower.

The choice, Δ<sub>0</sub> = 0.3 × Δ<sub>max</sub>, was suitable in most cases, *i.e*., the parameterization of SpaGrOW is not critical at the beginning of the optimization.

• Local refinements: SpaGrOW is capable of getting closer to the minimum than GROW. However, the choice of Δ becomes critical, which reduces the robustness of SpaGrOW: If Δ is too large, the smoothing and interpolation algorithms cannot deliver a reliable model for the loss function. If it is too small, only noise can be reproduced. A modified Gauss-Newton method turned out to be more efficient close to the minimum, if the associated gradients can be computed correctly. An important advantage of SpaGrOW is the fact that the step length can be modified by the Trust Region steps, leading to different descent directions, which is not possible for gradient-based methods, which first determine the descent direction and then search for a reliable step length.

• Robustness: SpaGrOW exhibits a slightly lower robustness than the gradient-based methods. Due to an inappropriate choice of Δ, the minimum of the model can be shifted under certain conditions, and the course of the algorithm can be modified. Δ must be chosen so that a decreasing trend of the loss function is present within the actual trust region. However, this is not trivial here, because the shape of the loss function is not known *a priori*.
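How the trust region size Δ reacts to the quality of the model can be illustrated with a textbook-style update rule. The sketch below is an assumption about the general Trust Region scheme, not the exact rule implemented in SpaGrOW; the function name and the interpretation of the thresholds are hypothetical, while the default values are those quoted for the benzene application (η<sub>1</sub> = 0.2, η<sub>2</sub> = 0.7, γ<sub>1</sub> = 0.5, γ<sub>2</sub> = 1.1).

```python
def update_trust_region(delta, ratio, eta1=0.2, eta2=0.7,
                        gamma1=0.5, gamma2=1.1, delta_max=1.0):
    """Generic trust-region update (assumed textbook form).

    ratio is the actual reduction of the loss function divided by the
    reduction predicted by the model. Returns (new_delta, step_accepted).
    """
    if ratio < eta1:
        # Model was unreliable: reject the step and shrink the region.
        return gamma1 * delta, False
    if ratio > eta2:
        # Model was very reliable: accept the step and enlarge the region.
        return min(gamma2 * delta, delta_max), True
    # Acceptable agreement: accept the step and keep the region size.
    return delta, True

delta0 = 0.3  # corresponds to 0.3 x delta_max with delta_max = 1.0
shrunk = update_trust_region(delta0, ratio=0.1)
grown = update_trust_region(delta0, ratio=0.9)
kept = update_trust_region(delta0, ratio=0.5)
```

With noisy loss function values, the ratio itself is uncertain, which is one way to see why the choice of Δ becomes critical close to the minimum.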

It is extremely difficult to find a method that is efficient and robust at the same time. These two properties often conflict: Stochastic global optimization methods, like simulated annealing or evolutionary algorithms, are very robust with respect to statistical noise, but require a large amount of computation time to determine the global minimum exactly. Gradient-based ones converge quickly, but are less robust and rely on the differentiability of the function to be minimized. SpaGrOW is situated in between: The higher speed of convergency is paid for with a lower robustness. However, the robustness was not reduced significantly, and the assumption that the loss function is smooth does not have to be made. Furthermore, SpaGrOW may still be successful when gradient-based methods fail.

To summarize, SpaGrOW is a generic, efficient and, also, quite robust algorithm, which can be used for many optimization problems. For force field parameterization tasks, it is highly recommended and preferred to gradient-based algorithms.

#### Acknowledgments

We are grateful to Janina Hemmersbach for the detailed analysis and selection of appropriate smoothing procedures, as well as to Anton Schüller for fruitful discussions and advice.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


Reprinted from *Entropy*. Cite as: Takahashi, K.Z. Truncation Effects of Shift Function Methods in Bulk Water Systems. *Entropy* 2013, *15*, 3249–3264.

#### *Article*

## Truncation Effects of Shift Function Methods in Bulk Water Systems

#### Kazuaki Z. Takahashi

Department of Mechanical Engineering, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan; E-Mail: takahashi@mech.keio.ac.jp; Tel.: +81-45-566-1454 (ext. 47151); Fax: +81-45-566-1495

*Received: 30 June 2013; in revised form: 5 August 2013 / Accepted: 7 August 2013 / Published: 13 August 2013*

Abstract: Reducing the cost of long-range interaction calculations is essential for large-scale molecular systems that contain many point charges. Cutoff methods are often used for this purpose. Molecular dynamics (MD) simulations can be accelerated by using cutoff methods; however, simple truncation or approximation of long-range interactions often introduces serious defects for various systems. For example, thermodynamic properties of polar molecular systems are strongly affected by the treatment of the Coulombic interactions, and an inadequate treatment may lead to unphysical results. To assess the truncation effect of some cutoff methods that are categorized as the shift function method, MD simulations for bulk water systems were performed. The results reflect two main factors, *i.e.*, the treatment of cutoff boundary conditions and the presence/absence of the theoretical background for the long-range approximation.

Keywords: molecular-dynamics simulations; long-range interactions; liquid water; electrostatic interactions; reaction field

#### 1. Introduction

In the calculation of thermodynamic, structural and dynamical properties by molecular dynamics (MD) simulations, the effect of long-range interactions is an important issue. Long-range interactions under periodic boundary conditions (PBCs) can be calculated using the Ewald sum or cutoff methods. The Ewald sum [1] is the standard method used in calculations involving long-range interactions with periodic boundary conditions. In this method, the total energy is split into real and reciprocal space contributions. Calculation of the Ewald sum involves three problems, the first being that the reciprocal part is computationally expensive. Particle mesh Ewald (PME) [2,3] reduces the computational cost of the reciprocal part by using the fast Fourier transform (FFT); however, the FFT becomes a severe bottleneck on massively parallel computers [4]. The second is the inherent periodicity, which can produce artifacts [5–14]. The third is that the thermodynamic limit is unclear [15]. Notwithstanding the three problems, the Ewald sum and PME are the methods of choice that most appropriately represent the long-range interactions.

Cutoff methods are often used to accelerate long-range interaction calculations. Only the interactions between molecular pairs with a distance shorter than a given cutoff length are considered, and effects from more distant pairs are truncated or approximated. The plain cutoff, the cutoff with the switch/shift function and the reaction field (RF) method are some examples of typical cutoff methods for Coulombic systems. Simulations can be accelerated by using cutoff methods, but simple truncation or approximation of long-range interactions has serious defects for various systems. In Lennard-Jones (LJ) fluid systems, long-range interactions do not have a prominent effect on transport properties [16,17], but phase equilibria and interfacial properties change drastically [18–20]. In water systems, the electrostatic interactions dominate the physical properties, and truncation or continuum approximation may lead to unphysical results. Many cutoff methods applied to the Coulombic interaction offer insufficient accuracy [21–31], and all of the results are highly sensitive to the cutoff distance. Cutoff methods are also applied to macromolecular systems [12–14,32–43], and many results indicate that the aforementioned approximations have difficulties when estimating these systems. On the other hand, advanced cutoff-like methods have been developed to avoid the aforementioned difficulties and to accelerate long-range interaction calculations. Wolf *et al*. [44] developed a method to calculate electrostatic interactions, which is simpler than the Ewald sum. They took into account charge neutrality in the cutoff sphere and discovered that the electrostatic potential of condensed phases seems to have short-range behavior. The modified method developed by Fennell and Gezelter [45] could reproduce some thermodynamic properties of homogeneous systems obtained by the Ewald sum. However, the method can hardly estimate heterogeneous systems [46,47]. 
Wu and Brooks developed the isotropic periodic sum (IPS) method [48–50]. The IPS method is a method that can calculate contributions from the infinite periodic structure without reciprocal space calculations. Some reports on the accuracy of the IPS method of homogeneous [48,50–54] and heterogeneous systems [47,50,55–57] show that the method yields estimates in good agreement with the results of the Ewald sum. Improved methods were developed to speed up calculations for large-scale systems [58] and to improve the accuracy for homogeneous and heterogeneous systems [59,60].

Some cutoff methods and the aforementioned two cutoff-like methods can be regarded as the shift function method, which produces the pairwise-potential shifted from the original potential function (e.g., Coulombic interaction) by any theoretical or other requirements. To discover the relation between truncation effects and approximation treatments of long-range interactions, in this work, we focused on shift function methods. MD simulations of bulk water systems were carried out for evaluating the truncation effects of the potential energy, self-diffusion, radial distribution function and the dipole-dipole correlation. The results reflect two main factors, *i.e.*, the treatment of cutoff boundary conditions and the presence/absence of the theoretical background for long-range approximation.

#### 2. Experimental

MD simulations for bulk water systems were conducted to examine the truncation effects of shifted potentials, and physical properties were compared with those from simulations using the Ewald sum. For shift function methods, CHARMm-shift [61], Ohmine-shift [62], the damped shifted force potential of the Wolf method (Wolf-DSF) [45], the RF method with an infinite dielectric constant (RF-metal), the IPS method for non-polar systems (IPSn) [48], the IPS method for polar systems (IPSp) [50] and the linear-combination-based IPS (LIPS) method with a fifth-order cutoff boundary condition (LIPS-fifth) [59] were chosen. CHARMm-shift is used in CHARMm [61] for shifting Coulombic and LJ interactions. The cutoff boundary conditions are satisfied up to the first-order derivative of the interaction potential (first-order cutoff boundary condition). Ohmine-shift is originally one of the switching function methods. The method provides a shifted pairwise potential if the switching point is set to zero. This potential has second-order cutoff boundary conditions. Wolf-DSF was developed by Fennell and Gezelter [45]. The charge neutrality assumption inside the cutoff sphere is the basic concept of the original Wolf method and Wolf-DSF. In the bulk water systems, the α-parameter of Wolf-DSF is set to 0.2 nm<sup>−1</sup> [45]. It should be noted that better α-parameters may exist for other systems [46,47]. RF-metal is the RF method with an infinite dielectric constant. In the RF theory, the Coulombic interaction can be modified for homogeneous systems by assuming a constant dielectric environment beyond the cutoff sphere. Originally, the dielectric constant of the RF method should be set to a realistic value. However, some results indicate that the RF method with an infinite dielectric constant is best for estimating bulk water systems [30]. Therefore, we set the dielectric constant of the RF method to infinity, as in a bulk metal. 
IPSn and IPSp are two different versions of the IPS method. IPSn is applied to calculations for point charges, whereas IPSp calculates polar molecules. The IPS method assumes the isotropic periodic structure outside of the cutoff sphere. The contribution from this structure (periodic reaction field) determines the shape of the IPS potentials. LIPS-fifth is the potential produced by the improved IPS method, called the LIPS method. The LIPS method is based on the extended IPS theory that provides the design procedure of the periodic reaction fields.
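What a first-order cutoff boundary condition means in practice can be checked numerically on two of the simpler shifted potentials. The functional forms below are the ones commonly quoted in the general literature for a CHARMM-type shift function and for the reaction field with an infinite dielectric constant; they are assumptions made here, not expressions taken from this work, and reduced units with q<sub>i</sub>q<sub>j</sub>/(4πε<sub>0</sub>) = 1 are used.

```python
def charmm_shift(r, rc):
    """CHARMM-type shifted Coulomb potential: (1/r) * (1 - (r/rc)^2)^2 (assumed form)."""
    return (1.0 / r) * (1.0 - (r / rc) ** 2) ** 2 if r < rc else 0.0

def rf_metal(r, rc):
    """Reaction field with infinite dielectric constant (assumed form):
    1/r + r^2 / (2 rc^3) - 3 / (2 rc)."""
    return 1.0 / r + r ** 2 / (2.0 * rc ** 3) - 1.5 / rc if r < rc else 0.0

def numerical_force(potential, r, rc, h=1e-6):
    """Central-difference estimate of -dV/dr."""
    return -(potential(r + h, rc) - potential(r - h, rc)) / (2.0 * h)

rc = 1.2                 # cutoff radius in nm
r_near = rc - 1e-3       # a distance just inside the cutoff
for name, pot in (("CHARMm-shift", charmm_shift), ("RF-metal", rf_metal)):
    v = pot(r_near, rc)
    f = numerical_force(pot, r_near, rc)
    print(f"{name}: V = {v:.2e}, F = {f:.2e} just inside r_c")
```

Both the potential and the force approach zero at the cutoff, so neither quantity jumps when a pair crosses r<sub>c</sub>.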

To clarify the truncation effect of the Coulombic interaction, the shift function methods were applied only to the Coulombic interaction, and a cutoff method was used for the LJ interaction. The cutoff radius of the LJ interaction was set to 1.2664 nm, which is 4.0 in LJ length units. For the shift function methods, the cutoff radius, r<sub>c</sub>, of the Coulombic interaction was changed from 1.2 nm to 2.0 nm in 0.2 nm increments. In this simulation, the extended simple point charge (SPC/E) model [63] was used for water molecules. The velocity Verlet algorithm [64] was used with three-dimensional periodic boundary conditions and a time step of 2 fs. The atoms in a water molecule were constrained by the RATTLE algorithm [65]. The simulation was performed in a constant particle-number, volume and temperature ensemble with the Nosé-Hoover thermostat [66–68], where the number of water molecules was 6,192, the density was 0.997 g/cm<sup>3</sup> and the temperature was 298.15 K. After equilibrating the system, a total of 5 × 10<sup>5</sup> time steps (1 ns) were carried out for each cutoff radius of the shift function methods. The potential energy, U, the self-diffusion coefficient, D, the radial distribution function, g(r), the distance dependence of the Kirkwood factor, G<sub>K</sub>(r), and the radial distribution of the dipole ordering, s(r), were calculated. As a transport coefficient, we calculated the self-diffusion coefficient. It can be determined either by the Einstein relation or the Green-Kubo formula, which are basically equivalent. Here, we used the Einstein relation:

$$D = \lim_{t \to \infty} \frac{1}{6t} \left\langle |\mathbf{r}_i(t) - \mathbf{r}_i(0)|^2 \right\rangle_N \tag{1}$$

where t is the time, ***r***<sub>i</sub>(t) is the position of particle i and ⟨· · ·⟩<sub>N</sub> denotes the particle average. The slope of the mean-squared displacement of the diffusing particle in the long-time limit is calculated for the diffusion coefficient. The radial distribution function, the distance dependence of the Kirkwood factor and the radial distribution of dipole ordering were calculated for the configuration of water. These properties are given as a function of the distance between two water molecules, denoted r. The conventional expressions give:
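The Einstein relation can be applied to a stored trajectory along the following lines. This is a minimal sketch, not the analysis code used in this work: it uses a single time origin, assumes unwrapped coordinates, and the function name, the fit window `fit_start` and the synthetic random-walk test data are hypothetical.

```python
import numpy as np

def self_diffusion_einstein(positions, dt, fit_start=0.5):
    """Estimate D from the long-time slope of the mean-squared displacement.

    positions: array (n_frames, n_particles, 3) of unwrapped coordinates
    dt:        time between stored frames
    fit_start: fraction of the trajectory skipped before the linear fit
    """
    disp = positions - positions[0]
    msd = np.mean(np.sum(disp ** 2, axis=2), axis=1)   # particle average
    t = dt * np.arange(len(msd))
    first = int(fit_start * len(msd))
    slope, _ = np.polyfit(t[first:], msd[first:], 1)   # MSD ~ 6 D t
    return slope / 6.0

# Synthetic check: a 3D random walk with a known diffusion coefficient.
rng = np.random.default_rng(1)
d_true, dt = 2.0, 0.01
steps = rng.normal(0.0, np.sqrt(2.0 * d_true * dt), size=(200, 2000, 3))
trajectory = np.cumsum(steps, axis=0)
d_est = self_diffusion_einstein(trajectory, dt)
```

In production analyses, averaging over many time origins is commonly used to reduce the statistical error of the slope.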

$$g(r) = \frac{V}{4\pi r^2 \Delta r N (N - 1)} \left\langle \sum\_{i} n\_i(r) \right\rangle\_e \tag{2}$$

$$G\_{\mathcal{K}}(r) = \frac{1}{N} \left\langle \sum\_{i} \left( u\_i \cdot \sum\_{j, r\_{ij} < r} u\_j \right) \right\rangle\_e \tag{3}$$

and:

$$s(r) = \frac{1}{N} \left\langle \sum\_{i} \frac{1}{n\_i(r)} \left(\sum\_{j=1}^{n\_i(r)} u\_i \cdot u\_j\right) \right\rangle\_e \tag{4}$$

where n<sub>i</sub>(r) is the number of molecules in the shell between r and r + Δr from molecule *i*, *u*<sub>i</sub> and *u*<sub>j</sub> are the normalized dipole moments of molecules *i* and *j*, respectively, and ⟨···⟩<sub>e</sub> signifies an equilibrium ensemble average.
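As an illustration of Equation (2), the following minimal sketch accumulates the shell counts n<sub>i</sub>(r) for a single configuration and applies the normalization V/(4πr²ΔrN(N − 1)). The function name, the cubic box with the minimum-image convention and the use of shell midpoints for r are assumptions of this sketch, not details of the analysis code used in this work; the ensemble average would be obtained by averaging the result over stored configurations.

```python
import math

def radial_distribution(positions, box, dr, r_max):
    """Histogram estimate of g(r) for one configuration, following Equation (2).

    positions: list of (x, y, z) tuples; box: edge length of the cubic cell.
    Distances use the minimum-image convention, so r_max should not exceed box / 2.
    """
    n = len(positions)
    volume = box ** 3
    n_bins = int(round(r_max / dr))
    counts = [0] * n_bins  # sum over i of n_i(r) for each shell
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = 0.0
            for a in range(3):
                d = positions[i][a] - positions[j][a]
                d -= box * round(d / box)  # minimum image
                d2 += d * d
            k = int(math.sqrt(d2) / dr)
            if k < n_bins:
                counts[k] += 1
    g = []
    for k in range(n_bins):
        r = (k + 0.5) * dr  # shell midpoint (an assumption of this sketch)
        g.append(volume * counts[k] / (4.0 * math.pi * r * r * dr * n * (n - 1)))
    return g
```

The double loop scales as N², which is adequate for illustration; a production analysis of 6,192 molecules would normally use a cell list or a vectorized library.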

All of the above properties calculated with the shift function methods were compared with those from the Ewald sum. For the Ewald sum, the cutoff radius for the real part was 2.8 nm. The α-parameter was determined by the following equation:

$$\text{erfc}(\alpha r\_{\text{c}}) \approx \exp(-\alpha^2 r\_{\text{c}}^2) = \delta \tag{5}$$

where δ is a small number that indicates the convergence of the real-space potential in the Ewald sum. δ was 10<sup>−6</sup>, so αr<sub>c</sub> = (6 ln 10)<sup>1/2</sup>.
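The arithmetic behind this choice can be checked with a short sketch. It solves exp(−α²r<sub>c</sub>²) = δ for α; the function name is hypothetical, and the real-space cutoff of 2.8 nm and δ = 10<sup>−6</sup> are the values quoted above. The product αr<sub>c</sub> comes out at about 3.72, so α is about 1.33 nm<sup>−1</sup>.

```python
import math

def ewald_alpha(delta, r_cut):
    """Return alpha such that exp(-(alpha * r_cut)**2) equals delta."""
    return math.sqrt(-math.log(delta)) / r_cut

alpha = ewald_alpha(1.0e-6, 2.8)  # r_cut in nm, so alpha is in 1/nm
print(f"alpha * r_c = {alpha * 2.8:.4f}, alpha = {alpha:.4f} nm^-1")
```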

#### 3. Results and Discussions

#### *3.1. Bulk Water*

#### 3.1.1. Potential Energy

The thermodynamic behavior of the shift function methods and the Ewald sum was compared through the potential energy. Figure 1 shows the potential energy per molecule at different cutoff radii for the shift function methods and the Ewald sum. The results from CHARMm-shift, Ohmine-shift and Wolf-DSF are far from that of the Ewald sum. In contrast, those from RF-metal, IPSn, IPSp and LIPS-fifth are close to it.

Figure 1. Potential energy for the shift function methods and the Ewald sum. The results from CHARMm-shift, Ohmine-shift and the Wolf method (Wolf-DSF) are far from that of the Ewald sum. In contrast, RF-metal, the isotropic periodic sum for non-polar systems (IPSn), for polar systems (IPSp) and the linear-combination-based IPS (LIPS)-fifth are close to that of the Ewald sum.

To examine the cutoff-radius dependence of the potential energy in more detail, we plotted the error of the potential energy calculated with each shift function method relative to that determined with the Ewald sum, as shown in Figure 2. The error for each method decreases as the cutoff radius increases, except in the case of Wolf-DSF. The fastest decline was observed for LIPS-fifth; the error was roughly proportional to r<sub>c</sub><sup>−4</sup>. RF-metal and LIPS-fifth at r<sub>c</sub> = 2.0 nm achieved the smallest error and agreed with the Ewald sum to within 0.02%. CHARMm-shift, Ohmine-shift and Wolf-DSF clearly give poor estimates of the potential energy for bulk water systems. The reason is related to the presence or absence of a theoretical background for contributions from outside the cutoff sphere. CHARMm-shift and Ohmine-shift have no theory that justifies their shifting procedure. Wolf-DSF justifies its truncation treatment by assuming charge neutrality inside the cutoff sphere, but contributions from outside are not considered. In contrast, RF-metal, IPSn and LIPS-fifth are each based on a definite theory that accounts for the contributions from long-range interactions, and they therefore estimate the potential energy much more accurately. IPSp gave values intermediate between the two groups. This seems to be related to the counter-charge assumption of IPSp: the counter-charge effect assumed at the cutoff boundary may partially interrupt long-range contributions.
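A scaling exponent such as the one quoted for LIPS-fifth can be estimated from a straight-line fit on a log-log plot. The sketch below is one minimal way to do this; the function name is hypothetical and the input values are synthetic, generated purely for illustration, and are not data from this study.

```python
import math

def power_law_exponent(r_cut, error):
    """Least-squares slope of log(error) versus log(r_cut)."""
    xs = [math.log(r) for r in r_cut]
    ys = [math.log(e) for e in error]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Synthetic illustration only: errors generated as 0.5 * r_c**-4, not measured values.
r_cut = [1.2, 1.4, 1.6, 1.8, 2.0]
error = [0.5 * r ** -4 for r in r_cut]
exponent = power_law_exponent(r_cut, error)
```

With only five cutoff radii spanning less than a factor of two, an exponent obtained this way is a rough guide rather than a precise measurement.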

Figure 2. The error of the potential energy calculated with the shift function methods against that determined with the Ewald sum. CHARMm-shift, Ohmine-shift and Wolf-DSF clearly give poor estimates of the potential energy for bulk water systems. In contrast, RF-metal, IPSn and LIPS-fifth estimate the potential energy much more accurately. IPSp gave values intermediate between the two groups. The error for each method decreases as the cutoff radius increases, except in the case of Wolf-DSF. The fastest decline was observed for LIPS-fifth; the error was roughly proportional to r<sub>c</sub><sup>−4</sup>. RF-metal and LIPS-fifth at r<sub>c</sub> = 2.0 nm achieved the smallest error and agreed with the Ewald sum to within 0.02%.

#### 3.1.2. Self-Diffusion Coefficient

Figure 3 shows the self-diffusion coefficient for the shift function methods and the Ewald sum. The results from CHARMm-shift did not reach a value similar to that of the Ewald sum for 1.2 nm ≤ r<sub>c</sub> ≤ 2.0 nm. The other methods seem to estimate the self-diffusion coefficient with adequate accuracy.

To examine the cutoff-radius dependence of the self-diffusion coefficient in more detail, we plotted the error of the self-diffusion coefficient calculated with each shift function method relative to that determined with the Ewald sum, as shown in Figure 4. IPSp and LIPS-fifth converge much faster than the other methods. For IPSp, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.6 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.35%). For LIPS-fifth, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.4 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.36%). The difference in cutoff-radius dependence comes from the treatment of the cutoff boundary conditions and of long-range interactions. The results show that improving either treatment strongly affects the accuracy of the self-diffusion coefficient. In CHARMm-shift, both treatments are insufficient: it has a first-order cutoff boundary condition and no theoretical background for the long-range interaction treatment. Ohmine-shift is more accurate than CHARMm-shift, although the improvement comes merely from an advantage in the cutoff boundary condition. RF-metal and IPSn combine a first-order cutoff boundary condition with an adequate treatment of long-range contributions, and these two methods therefore have accuracy similar to that of Ohmine-shift. Wolf-DSF also had accuracy similar to that of Ohmine-shift, despite the absence of a theoretical background for the long-range interaction treatment. Strictly, Wolf-DSF has a first-order cutoff boundary condition, but it can be regarded as having an infinite-order cutoff boundary condition for certain values of α, which explains its results. In IPSp, a faster convergence of the error was observed, because it has a third-order cutoff boundary condition together with a long-range interaction treatment. LIPS-fifth achieved the fastest convergence: it has a fifth-order cutoff boundary condition and a reliable theoretical background for the long-range interaction treatment.
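For reference, the Einstein relation of Equation (1), from which the self-diffusion coefficients discussed here were obtained, can be sketched as follows. This is a minimal illustration under stated assumptions: positions are unwrapped (not folded back into the periodic cell), a single time origin is used, and the slope is fitted over the second half of the trajectory. The function names and the `fit_start` parameter are hypothetical and do not describe the analysis code of this work.

```python
def mean_squared_displacement(trajectory):
    """MSD relative to the first frame, averaged over particles.

    trajectory[k][i] is the unwrapped (x, y, z) position of particle i in frame k.
    """
    origin = trajectory[0]
    n = len(origin)
    msd = []
    for frame in trajectory:
        total = 0.0
        for p, p0 in zip(frame, origin):
            total += sum((a - b) ** 2 for a, b in zip(p, p0))
        msd.append(total / n)
    return msd

def self_diffusion(trajectory, dt, fit_start=0.5):
    """D = slope / 6, with the slope fitted over the late part of the MSD."""
    msd = mean_squared_displacement(trajectory)
    first = int(fit_start * len(msd))
    ts = [k * dt for k in range(first, len(msd))]
    ys = msd[first:]
    t_mean = sum(ts) / len(ts)
    y_mean = sum(ys) / len(ys)
    sty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    stt = sum((t - t_mean) ** 2 for t in ts)
    return (sty / stt) / 6.0
```

Averaging over several time origins would reduce the statistical noise of the slope, at the cost of a slightly longer loop.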

Figure 3. The self-diffusion coefficient for the shift function methods and the Ewald sum. The results from CHARMm-shift did not reach a value similar to that of the Ewald sum for 1.2 nm ≤ r<sub>c</sub> ≤ 2.0 nm. The other methods seem to estimate the self-diffusion coefficient with adequate accuracy.

#### 3.1.3. Radial Distribution Function

To examine the structure around a molecule for the shift function methods, the radial distribution function, g(r), was calculated. Figure 5 shows the oxygen-oxygen g(r) of water for the shift function methods at r<sub>c</sub> = 2.0 nm and for the Ewald sum. In Figure 5, CHARMm-shift, Ohmine-shift, RF-metal and IPSn show notable deviations from the result of the Ewald sum. On the other hand, Wolf-DSF, IPSp and LIPS-fifth provided adequate accuracy. The oxygen-hydrogen and hydrogen-hydrogen g(r) behave very similarly to the oxygen-oxygen g(r) in Figure 5.

Figure 4. The error of the self-diffusion coefficient calculated with the shift function methods against that determined with the Ewald sum. IPSp and LIPS-fifth converge much faster than the other methods. For IPSp, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.6 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.35%). For LIPS-fifth, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.4 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.36%).

Figure 5. The oxygen-oxygen radial distribution function of water for the shift function methods at r<sub>c</sub> = 2.0 nm and for the Ewald sum. CHARMm-shift, Ohmine-shift, RF-metal and IPSn show notable deviations from the result of the Ewald sum. On the other hand, Wolf-DSF, IPSp and LIPS-fifth provided adequate accuracy. The oxygen-hydrogen and hydrogen-hydrogen g(r) behave very similarly to the oxygen-oxygen g(r) (figures not shown).

To examine how the deviation decreases with r<sub>c</sub>, we plotted the root mean square deviation (RMSD) of the oxygen-oxygen g(r) for each shift function method against the Ewald sum at different cutoff radii in Figure 6a. The RMSDs of the oxygen-hydrogen and hydrogen-hydrogen g(r) are also plotted in Figure 6b,c, respectively. The difference in cutoff-radius dependence is strongly affected by the treatment of the cutoff boundary conditions. In CHARMm-shift, RF-metal and IPSn, which have a first-order cutoff boundary condition, the deviation decreases roughly in proportion to r<sub>c</sub><sup>−2.5</sup>, r<sub>c</sub><sup>−3</sup> and r<sub>c</sub><sup>−3</sup>, respectively. Ohmine-shift has a second-order cutoff boundary condition, and its RMSD of g(r) declines roughly in proportion to r<sub>c</sub><sup>−4</sup>. The RMSD of LIPS-fifth shows a similar dependence on the cutoff radius, but with a faster decline, roughly in proportion to r<sub>c</sub><sup>−6</sup>. Furthermore, LIPS-fifth gives accurate estimates of g(r); the RMSD converges at r<sub>c</sub> ≥ 2.0 nm, and the converged values of the RMSD for LIPS-fifth are the most accurate. On the other hand, the RMSDs of Wolf-DSF and IPSp show adequate accuracy at every cutoff radius. The charge neutrality and counter-charge assumptions of Wolf-DSF and IPSp, respectively, seem to work well for bulk water systems.
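The RMSD used as the deviation measure can be written compactly. The sketch below assumes that both g(r) curves are sampled on the same grid of r values and that every grid point is weighted equally; the range of r over which the deviation was accumulated is not specified here, so that choice is left to the caller. The function name is hypothetical.

```python
import math

def rmsd(curve, reference):
    """Root mean square deviation between two curves sampled on the same r grid."""
    if len(curve) != len(reference):
        raise ValueError("curves must share the same grid")
    squared = sum((a - b) ** 2 for a, b in zip(curve, reference))
    return math.sqrt(squared / len(curve))
```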

Figure 6. The RMSDs of (a) the oxygen-oxygen, (b) the oxygen-hydrogen and (c) the hydrogen-hydrogen radial distribution function for the shift function methods against the Ewald sum at different cutoff radii. In CHARMm-shift, RF-metal and IPSn, which have a first-order cutoff boundary condition, the deviation decreases roughly in proportion to r<sub>c</sub><sup>−2.5</sup>, r<sub>c</sub><sup>−3</sup> and r<sub>c</sub><sup>−3</sup>, respectively. Ohmine-shift has a second-order cutoff boundary condition, and its RMSD of g(r) declines roughly in proportion to r<sub>c</sub><sup>−4</sup>. The RMSD of LIPS-fifth shows a similar dependence on the cutoff radius, but with a faster decline, roughly in proportion to r<sub>c</sub><sup>−6</sup>. Furthermore, LIPS-fifth gives accurate estimates of g(r); the RMSD converges at r<sub>c</sub> ≥ 2.0 nm, and the converged values of the RMSD for LIPS-fifth are the most accurate. The RMSDs of Wolf-DSF and IPSp show adequate accuracy at every cutoff radius.

#### 3.1.4. Dipole-Dipole Correlation

We focused on the distance dependence of the Kirkwood factor, G<sub>K</sub>(r), which reveals the dipole-dipole correlation of bulk water systems. G<sub>K</sub>(r) is strongly affected by the cutoff radius, and the influence of the interaction treatment can be expressed quantitatively through the shape of G<sub>K</sub>(r). An evident shortcoming of cutoff-like methods appears in the G<sub>K</sub>(r) value for bulk water systems. Thus, G<sub>K</sub>(r) was calculated at various cutoff radii using the shift function methods to evaluate the truncation effect on the dipole-dipole correlation.
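To make the definition in Equation (3) concrete, the following sketch evaluates G<sub>K</sub>(r) for one configuration. It assumes a cubic periodic cell with the minimum-image convention and includes the j = i term in the inner sum, so that G<sub>K</sub>(r) tends to 1 at small r; whether the self term is counted is an assumption of this sketch. The function name is hypothetical, and the ensemble average would again be taken over stored configurations.

```python
def kirkwood_factor(positions, dipoles, box, r):
    """G_K(r) of Equation (3) for a single configuration.

    positions: molecular positions as (x, y, z) tuples;
    dipoles: unit dipole vectors u_i; box: cubic cell edge.
    The j = i term is included (an assumption of this sketch).
    """
    n = len(positions)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d2 = 0.0
            for a in range(3):
                d = positions[i][a] - positions[j][a]
                d -= box * round(d / box)  # minimum image
                d2 += d * d
            if d2 < r * r:
                total += sum(ui * uj for ui, uj in zip(dipoles[i], dipoles[j]))
    return total / n
```

For perfectly parallel dipoles the function returns the number of molecules inside the sphere, and for an antiparallel pair inside the sphere it returns zero, which are the two limiting cases that bracket the behavior seen in bulk water.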

Figure 7. Distance dependence of the Kirkwood factor for the shift function methods and the Ewald sum. G<sub>K</sub>(r) calculated with CHARMm-shift, Ohmine-shift, RF-metal and IPSn clearly fluctuates near r<sub>c</sub>, as g(r) does, and this fluctuation remains even when the cutoff radius is increased. The artifact for Ohmine-shift was smaller than that for the other three methods. This defect of G<sub>K</sub>(r) was not seen in Wolf-DSF, IPSp and LIPS-fifth, which estimate G<sub>K</sub>(r) more adequately than the other shift function methods.

Figure 7 shows the shape of G<sub>K</sub>(r) determined using the shift function methods and the Ewald sum. G<sub>K</sub>(r) calculated with CHARMm-shift, Ohmine-shift, RF-metal and IPSn clearly fluctuates near r<sub>c</sub>, as g(r) does, and this fluctuation remains even when the cutoff radius is increased. The artifact for Ohmine-shift was smaller than that for the other three methods. This defect of G<sub>K</sub>(r) was not seen in Wolf-DSF, IPSp and LIPS-fifth, which estimate G<sub>K</sub>(r) more adequately than the other shift function methods.

The quantity s(r) describes the radial distribution of the dipole ordering for water molecules. Figure 8 presents s(r) calculated with the shift function methods at r<sub>c</sub> = 2.0 nm, along with that determined by the Ewald sum for comparison. s(r) calculated with CHARMm-shift, Ohmine-shift, RF-metal and IPSn fluctuates near r<sub>c</sub>, as g(r) does, despite the long cutoff radius. Wolf-DSF, IPSp and LIPS-fifth did not produce the singular features of s(r) seen for the other shift function methods. These three methods can estimate s(r) with adequate accuracy.

The aforementioned characteristics of the truncation effect in the dipole-dipole correlation for the shift function methods are affected strongly by the treatment of the cutoff boundary condition, like the case of g(r).

Figure 8. Radial distributions of dipole ordering calculated with the shift function methods at r<sub>c</sub> = 2.0 nm and for the Ewald sum. s(r) calculated with CHARMm-shift, Ohmine-shift, RF-metal and IPSn fluctuates near r<sub>c</sub>, as g(r) does, despite the long cutoff radius. Wolf-DSF, IPSp and LIPS-fifth did not produce the singular features of s(r) seen for the other shift function methods. These three methods can estimate s(r) with adequate accuracy.

#### 4. Conclusions

To assess the truncation effect of cutoff methods that are categorized as shift function methods, MD simulations of bulk water systems were performed. The results mainly reflect two factors, *i.e.*, the treatment of cutoff boundary conditions and the presence or absence of a theoretical background for the long-range approximation.

The difference in the estimated potential energy is related to the presence or absence of a theoretical background for contributions from outside the cutoff sphere. CHARMm-shift, Ohmine-shift and Wolf-DSF estimated the potential energy poorly, because these methods do not have a reliable theory that justifies their shifting procedure. In contrast, RF-metal, IPSn and LIPS-fifth are each based on a definite theory that accounts for contributions from long-range interactions, and they estimate the potential energy much more accurately. The fastest decline was observed for LIPS-fifth; the error was roughly proportional to r<sub>c</sub><sup>−4</sup>. RF-metal and LIPS-fifth at r<sub>c</sub> = 2.0 nm achieved the smallest error and agreed with the Ewald sum to within 0.02%.

For the self-diffusion coefficient, the difference in cutoff-radius dependence comes from the treatment of cutoff boundary conditions and long-range interactions. In IPSp, a faster convergence of the error was observed, because it has a third-order cutoff boundary condition together with a long-range interaction treatment. For IPSp, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.6 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.35%). LIPS-fifth achieved the fastest convergence in this work. It has a fifth-order cutoff boundary condition and a reliable treatment of long-range interactions. For LIPS-fifth, the self-diffusion coefficient saturates at r<sub>c</sub> ≥ 1.4 nm, and the saturated value is almost the same as that of the Ewald sum (within 0.36%).

The truncation effect in the radial distribution function mainly reflects the treatment of cutoff boundary conditions. In CHARMm-shift, RF-metal and IPSn, which have a first-order cutoff boundary condition, the deviation decreases roughly in proportion to r<sub>c</sub><sup>−2.5</sup>, r<sub>c</sub><sup>−3</sup> and r<sub>c</sub><sup>−3</sup>, respectively. Ohmine-shift has a second-order cutoff boundary condition, and its RMSD of g(r) declines roughly in proportion to r<sub>c</sub><sup>−4</sup>. The RMSD of LIPS-fifth shows a similar dependence on the cutoff radius, but with a faster decline, roughly in proportion to r<sub>c</sub><sup>−6</sup>. Furthermore, LIPS-fifth gives accurate estimates of g(r); the RMSD converges at r<sub>c</sub> = 2.0 nm, and the converged values of the RMSD for LIPS-fifth are the most accurate. On the other hand, the RMSDs of Wolf-DSF and IPSp show adequate accuracy at every cutoff radius. The charge neutrality and counter-charge assumptions of Wolf-DSF and IPSp, respectively, seem to work well for bulk water systems.

The cutoff radius effect in the dipole-dipole correlation is very similar to that in g(r). G<sub>K</sub>(r) and s(r) calculated with CHARMm-shift, Ohmine-shift, RF-metal and IPSn fluctuate near r<sub>c</sub>, as g(r) does. The artifact for Ohmine-shift was smaller than that for the other three methods. This defect of G<sub>K</sub>(r) and s(r) was not seen in Wolf-DSF, IPSp and LIPS-fifth, which estimate the dipole-dipole correlation more adequately than the other shift function methods.

Overall, a shift function method that has a higher-order cutoff boundary condition and a reliable theoretical background for the long-range interaction treatment achieves better accuracy. For estimating the potential energy and self-diffusion, LIPS-fifth is the most accurate shift function method. In the estimation of the radial distribution function, Wolf-DSF and IPSp have good accuracy with relatively short cutoff radii, and LIPS-fifth becomes the most accurate at r<sub>c</sub> = 2.0 nm.

### Acknowledgments

This work was supported by the Japan Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (KAKENHI) Grant Number 25820065.

### Conflicts of Interest

The authors declare no conflict of interest.

### References


Reprinted from *Entropy*. Cite as: Ono, S. Elastic Properties of CaSiO3 Perovskite from *ab initio* Molecular Dynamics. *Entropy* **2013**, *15*, 4300–4309.

#### *Article*

## **Elastic Properties of CaSiO3 Perovskite from**  *ab initio* **Molecular Dynamics**

## **Shigeaki Ono 1,2**


*Received: 26 June 2013 / Accepted: 6 October 2013 / Published: 10 October 2013* 

> **Abstract:** *Ab initio* molecular dynamics simulations were performed to investigate the elasticity of cubic CaSiO3 perovskite at high pressure and temperature. All three independent elastic constants for cubic CaSiO3 perovskite, C11, C12, and C44, were calculated from the computation of stress generated by small strains. The elastic constants were used to estimate the moduli and seismic wave velocities at the high pressure and high temperature characteristic of the Earth's interior. The dependence of temperature for sound wave velocities decreased as the pressure increased. There was little difference between the estimated compressional sound wave velocity (*VP*) in cubic CaSiO3 perovskite and that in the Earth's mantle, determined by seismological data. By contrast, a significant difference between the estimated shear sound wave velocity (*VS*) and that in the Earth's mantle was confirmed. The elastic properties of cubic CaSiO3 perovskite cannot explain the properties of the Earth's lower mantle, indicating that the cubic CaSiO3 perovskite phase is a minor mineral in the Earth's lower mantle.

**Keywords:** perovskite; first-principles calculation; seismic wave velocity

**PACS Codes**: 62.20.de; 91.60.Gf

#### **1. Introduction**

Mineral physics constraints on the composition of the Earth's lower mantle rely on knowledge of the equations of state (EOSs) and sound wave velocities in candidate minerals. According to reliable estimates of the composition of the Earth, an MgO-FeO-SiO2-CaO-Al2O3 system could
comprise about 99% of the mantle volume [1]. Three minerals have been proposed to be possible hosts of the MgO-FeO-SiO2-CaO-Al2O3 system in the Earth's lower mantle. A recent phase equilibrium study using a more representative composition of the mantle shows that Mg, Fe, and Al are mostly accommodated in orthorhombic (Mg,Fe)SiO3 perovskite and ferropericlase, (Mg,Fe)O. On the other hand, a number of other experimental studies indicate that the most likely Ca-bearing phase is CaSiO3 perovskite [2,3]. Thus, the Earth's lower mantle may be composed mainly of aluminous (Mg,Fe)SiO3 perovskite, CaSiO3 perovskite, and ferropericlase. To gain an understanding of the structure and dynamics of the Earth's lower mantle, it is important to investigate the elastic properties of these minerals under the pressure and temperature conditions found in this region. It is easy to investigate the physical properties of orthorhombic (Mg,Fe)SiO3 perovskite and ferropericlase, because both minerals can be recovered under ambient conditions. By contrast, CaSiO3 perovskite is unstable under ambient conditions, and it readily transforms to glass on the release of pressure. Therefore, it is difficult to measure some of its physical properties.

The structure of CaSiO3 perovskite has tetragonal or orthorhombic symmetry at high pressures and room temperature, e.g., [4,5]. The structure of low-symmetry CaSiO3 perovskite is still an open question [6–8]. A phase transformation of CaSiO3 perovskite from this low symmetry into cubic symmetry with an increase in temperature was found in a previous study [5], indicating that the cubic structure of CaSiO3 perovskite is stable under the conditions of the Earth's lower mantle, and that its physical properties are important for understanding the dynamics and evolution of the Earth's interior. The elastic properties of cubic CaSiO3 perovskite were calculated at 0 K [9,10] and high temperatures [8]. These data relating to the elastic properties of cubic CaSiO3 perovskite are insufficient to discuss the composition of the Earth's lower mantle, because an internally consistent data set describing the density-VP-VS relationship in cubic CaSiO3 perovskite is needed in order to compare the density-VP-VS relationship of a preliminary reference earth model (PREM) [11] with that estimated based on cubic CaSiO3 perovskite. Therefore, it is necessary to reevaluate the density-VP-VS relationship using *ab initio* calculations.

We employed the *ab initio* molecular dynamics (AIMD) method using density functional theory (DFT) to determine the density values and sound wave velocities for cubic CaSiO3 perovskite at pressures typical of the Earth's lower mantle. We also used the experimental data to correct the calculated values of the density and the sound wave velocities for cubic CaSiO3 perovskite.

#### **2. Method**

We performed the AIMD calculations based on DFT using the VASP code [12]. The interactions between the electrons and the ionic cores were described using the projector augmented wave (PAW) method [13] with the generalized gradient approximation known as PBE [14]. The advantage of this code is that the *ab initio* energy of the system can be combined with the molecular dynamics method to simulate the properties of cubic CaSiO3 perovskite at high pressure and high temperature simultaneously. The PAW potentials of Ca, Si, and O had core radii of 2.3, 1.5, and 1.1 a.u., respectively. Single particle orbitals were expanded in plane waves, with a plane-wave cut-off of 900 eV. The self-consistency convergence criterion on the total energy was 10<sup>−4</sup> eV per simulation cell. We used a 135-atom supercell, with Γ-point Brillouin zone sampling and a time step of 1 fs at a constant volume. The simulations were run in the constant *NVT* ensemble with a Nosé thermostat [15] for at least 5 ps after equilibration. The computation time required to reach equilibrium varied between configurations, and depended on the starting atomic positions, velocities, temperature, and pressure. In previous studies, we have confirmed that useful data for the elastic properties of solids under high pressure and temperature conditions can be acquired using AIMD calculations, e.g., [16,17]. In this study, AIMD calculations were performed under 27 selected pressure and temperature conditions up to 175 GPa and 4000 K. A comprehensive description of our method as applied to the modeling of condensed matter has been given previously [18].

The elastic constants can be determined from the computation of the second derivatives of the free energy as a function of small strains [19]. For a cubic crystal, the three elastic moduli, *C*11, *C*12, and *C*44, fully describe its elastic behavior. The values of *C*11 and *C*12 can be determined from the bulk modulus *K* and shear constant *CS*:

$$K = (C\_{11} + 2C\_{12})/3 \tag{1}$$

$$C\_S = (C\_{11} - C\_{12}) / 2 \tag{2}$$
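Inverting the two relations K = (C11 + 2C12)/3 and CS = (C11 − C12)/2 gives C11 = K + 4CS/3 and C12 = K − 2CS/3. A minimal sketch of this inversion follows; the function name is hypothetical and the moduli may be in any consistent unit.

```python
def cubic_constants(bulk_modulus, shear_constant):
    """Invert Equations (1) and (2): return (C11, C12) from K and C_S."""
    c11 = bulk_modulus + 4.0 * shear_constant / 3.0
    c12 = bulk_modulus - 2.0 * shear_constant / 3.0
    return c11, c12
```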

The following tetragonal strains were applied to obtain *CS*:

$$
\varepsilon = \begin{pmatrix} e & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & (1+e)^{-2} - 1 \end{pmatrix} \tag{3}
$$

where *e* is the strain magnitude. The change in the free energy of the strained structure, Δ*E*(*e*), is related to *e* as follows:

$$
\Delta E(e) = 3V(C\_{11} - C\_{12})e^2 + O(e^3) \tag{4}
$$

where *V* is the volume of the cell. *C*44 was calculated by applying the volume-conserving orthorhombic strain:

$$
\varepsilon = \begin{pmatrix} 0 & e & 0 \\ e & 0 & 0 \\ 0 & 0 & e^2/(1 - e^2) \end{pmatrix} \tag{5}
$$

The energy associated with this strain is:

$$
\Delta E(e) = 2C\_{44}Ve^2 + O(e^4) \tag{6}
$$

According to the calculations for unstrained and strained structures, the elasticity of cubic CaSiO3 perovskite at high pressure and high temperature can be determined.
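The two strains can be checked numerically to conserve volume, and the leading-order energy expressions can be inverted for the elastic constants. The sketch below takes the leading terms as ΔE = 3V(C11 − C12)e² = 6VCSe² for the tetragonal strain and ΔE = 2C44Ve² for the orthorhombic strain; all function names are hypothetical, and the higher-order terms are simply neglected, which is only reasonable for small *e*.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def tetragonal_strain(e):
    """Strain used to obtain C_S."""
    return [[e, 0.0, 0.0], [0.0, e, 0.0], [0.0, 0.0, (1.0 + e) ** -2 - 1.0]]

def orthorhombic_strain(e):
    """Strain used to obtain C44."""
    return [[0.0, e, 0.0], [e, 0.0, 0.0], [0.0, 0.0, e * e / (1.0 - e * e)]]

def volume_ratio(strain):
    """det(I + strain): ratio of strained to unstrained cell volume."""
    deformed = [[strain[i][j] + (1.0 if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    return det3(deformed)

def shear_constant_from_energy(delta_e, volume, e):
    """C_S from Delta E = 6 V C_S e^2, neglecting higher-order terms."""
    return delta_e / (6.0 * volume * e * e)

def c44_from_energy(delta_e, volume, e):
    """C44 from Delta E = 2 C44 V e^2, neglecting higher-order terms."""
    return delta_e / (2.0 * volume * e * e)
```

Both determinants equal 1 up to rounding, since (1 + e)²(1 + e)⁻² = 1 and (1 − e²)/(1 − e²) = 1. In practice the energy would be evaluated at several strain magnitudes and fitted, rather than taken from a single value of *e*.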

#### **3. Results**

The EOS of cubic CaSiO3 perovskite has been investigated in a previous experimental study [20]. Recent theoretical studies have investigated the physical properties of materials under high pressure and high temperature using first-principles calculations. We noticed that the scatter of the EOS determined by experimental study was smaller than that obtained from first-principles calculations [18], indicating that the EOS determined in experiments was more accurate than that determined from first-principles calculations. By contrast, the AIMD calculations present significant advantages for investigating the elastic properties of materials under high pressures and high temperatures. Therefore, we used the experimental data to determine the EOS for cubic CaSiO3 perovskite. The pressures estimated by the AIMD calculations were corrected based on the EOS determined by the experimental data. The combination of first-principles molecular dynamics calculations and high pressure experimental data allowed us to determine reliable physical properties over a wide range of pressures and temperatures. The EOS for a solid can be described in a general form as a functional relationship between pressure, volume, and temperature:

$$P\_{Total}(V,T) = P\_{st}(V, 300) + P\_{th}(V,T) \tag{7}$$

A fit of the volume-pressure data yielded volume and bulk modulus values of *V*<sub>0</sub> = 45.58 Å<sup>3</sup>, *K*<sub>T0</sub> = 236 GPa and *K'*<sub>T0</sub> = 3.9 [20] for a third-order Birch-Murnaghan EOS [21]:

$$P\_{st} = \frac{3}{2} K\_{T0} \left[ \left( \frac{V\_0}{V} \right)^{\frac{7}{3}} - \left( \frac{V\_0}{V} \right)^{\frac{5}{3}} \right] \left[ 1 - \frac{3}{4} (4 - K'\_{T0}) \left[ \left( \frac{V\_0}{V} \right)^{\frac{2}{3}} - 1 \right] \right] \tag{8}$$

where *V*<sub>0</sub>, *K*<sub>T0</sub>, and *K'*<sub>T0</sub> are the volume, isothermal bulk modulus, and first pressure derivative of the isothermal bulk modulus, respectively. The thermal pressure, *P*<sub>th</sub>, can be written as follows:

$$P\_{th} = \left[\alpha K\_T \left(V\_0, T\right) + \left(\frac{\partial K\_T}{\partial T}\right)\_V \ln\left(\frac{V\_0}{V}\right)\right] \left(T - T\_0\right) \tag{9}$$

A least squares fit of the high temperature data from the AIMD calculations yields α*K*<sub>T</sub>(*V*<sub>0</sub>,*T*) = 0.0083 and (∂*K*<sub>T</sub>/∂*T*)<sub>V</sub> = 0.0031. The value of *P*<sub>th</sub> of cubic CaSiO3 perovskite was not sensitive to changes in volume at the pressures investigated in this study, because the value of (∂*K*<sub>T</sub>/∂*T*)<sub>V</sub> was very small. The fitting parameters of the third-order Birch-Murnaghan EOS combined with the thermal pressure EOS were *V*<sub>0</sub>, *K*<sub>T0</sub>, *K'*<sub>T0</sub>, α*K*<sub>T</sub>(*V*<sub>0</sub>,*T*), and (∂*K*<sub>T</sub>/∂*T*)<sub>V</sub>. The results of the fit of our *P-V-T* data to the thermal pressure equation of state are summarized in Table 1.
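The combined EOS can be evaluated with a few lines of code. The sketch below uses the parameter values quoted in the text; it assumes a reference temperature of 300 K (as in the key to Table 1) and assumes that the two thermal parameters are expressed in GPa/K, since their units are not stated here. The constant and function names are hypothetical.

```python
import math

V0 = 45.58         # reference volume, cubic angstroms
K_T0 = 236.0       # isothermal bulk modulus, GPa
K_T0_PRIME = 3.9   # pressure derivative of the bulk modulus
ALPHA_KT = 0.0083  # alpha * K_T(V0, T); GPa/K is assumed in this sketch
DKT_DT = 0.0031    # (dK_T/dT)_V; GPa/K is assumed in this sketch
T0 = 300.0         # reference temperature, K (assumed from the Table 1 key)

def static_pressure(v):
    """Third-order Birch-Murnaghan pressure (GPa) at the reference temperature."""
    x = V0 / v
    return (1.5 * K_T0 * (x ** (7.0 / 3.0) - x ** (5.0 / 3.0))
            * (1.0 - 0.75 * (4.0 - K_T0_PRIME) * (x ** (2.0 / 3.0) - 1.0)))

def thermal_pressure(v, t):
    """Thermal pressure (GPa) relative to the reference temperature."""
    return (ALPHA_KT + DKT_DT * math.log(V0 / v)) * (t - T0)

def total_pressure(v, t):
    """Total pressure (GPa) at volume v and temperature t."""
    return static_pressure(v) + thermal_pressure(v, t)
```

Because the logarithmic term is multiplied by a small coefficient, the thermal pressure in this sketch is dominated by the first term and varies only weakly with volume, consistent with the statement above.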

**Table 1.** The thermoelastic parameters of cubic CaSiO3 perovskite. The third-order Birch–Murnaghan EOS was used to calculate the parameters of cubic CaSiO3 perovskite. Key: *K*<sub>T0</sub>, isothermal bulk modulus at 0 GPa and 300 K; *K'*<sub>T0</sub>, first pressure derivative of the bulk modulus; *V*<sub>0</sub>, volume at 0 GPa and 300 K. The terms α*K*<sub>T</sub>(*V*<sub>0</sub>,*T*) and (∂*K*<sub>T</sub>/∂*T*)<sub>V</sub> are parameters of the thermal pressure.


The parameters are from Shim *et al.* [20].

We determined the elastic constants by computing the stress generated by small deformations of the equilibrium cell. Figure 1 shows the three elastic constants of cubic CaSiO3 perovskite (*C*11, *C*12 and *C*44) at 2000 K as a function of pressure up to 160 GPa. The bulk modulus of an isotropic aggregate of cubic crystals is well defined, whereas the shear modulus can be constrained by the Voigt-Reuss-Hill scheme [22]:

$$G^{V} = \frac{1}{5} (2C\_{S} + 3C\_{44}) \tag{10}$$

$$G^{R} = \left[\frac{1}{5} \left(\frac{2}{C\_S} + \frac{3}{C\_{44}}\right)\right]^{-1} \tag{11}$$

$$G^{H} = \frac{1}{2} (G^{V} + G^{R}) \tag{12}$$

We also show the bulk modulus *K* and Hill's average *G* at 2000 K as a function of pressure in Figure 1. Karki and Crain [9] calculated elastic constants and moduli of cubic CaSiO3 perovskite at 0 K up to 140 GPa. Our results for elastic parameters calculated at 2000 K were in general agreement with those at 0 K reported by previous studies.

**Figure 1.** Pressure dependence of three elastic constants, *C*11, *C*12, and *C*44, and the isotropic bulk (*K*) and shear (*G*) moduli of cubic CaSiO3 perovskite at 2000 K. The solid circles and diamonds represent the elastic constants and the elastic moduli, respectively. The solid and dashed lines are the fits of each parameter.

The three isotropically averaged aggregate sound velocities could be derived from the bulk modulus *K* and shear modulus *G*:

$$V\_P = \left[ \left( K + \frac{4}{3} G \right) / \rho \right]^{\frac{1}{2}} \tag{13}$$

$$V\_B = \left(\frac{K}{\rho}\right)^{\frac{1}{2}}\tag{14}$$

$$V\_S = \left(\frac{G}{\rho}\right)^{\frac{1}{2}}\tag{15}$$

where *VP*, *VB*, and *VS* are the compressional, bulk, and shear sound wave velocities, respectively, and ρ is the density. The three sound wave velocities, *VP*, *VB*, and *VS*, increased with increasing pressure at 2000 K (Figure 2). Our results for sound wave velocities were in good agreement with those reported by Li *et al.* [8].
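The chain from elastic constants to aggregate velocities can be summarized in a short sketch. It uses the Voigt bound (2CS + 3C44)/5, the Reuss bound 5/(2/CS + 3/C44), their arithmetic mean as the Hill average, and the three velocity expressions above. The function names are hypothetical; with moduli in GPa and density in g/cm<sup>3</sup>, the velocities come out in km/s.

```python
import math

def hill_shear_modulus(c11, c12, c44):
    """Voigt-Reuss-Hill shear modulus of a cubic crystal aggregate."""
    c_s = 0.5 * (c11 - c12)
    g_voigt = (2.0 * c_s + 3.0 * c44) / 5.0
    g_reuss = 1.0 / ((2.0 / c_s + 3.0 / c44) / 5.0)
    return 0.5 * (g_voigt + g_reuss)

def sound_velocities(c11, c12, c44, density):
    """Return (V_P, V_B, V_S) in km/s for moduli in GPa and density in g/cm^3."""
    k = (c11 + 2.0 * c12) / 3.0
    g = hill_shear_modulus(c11, c12, c44)
    v_p = math.sqrt((k + 4.0 * g / 3.0) / density)
    v_b = math.sqrt(k / density)
    v_s = math.sqrt(g / density)
    return v_p, v_b, v_s
```

When CS equals C44 the crystal is elastically isotropic and the Voigt and Reuss bounds coincide, which is a convenient sanity check for any implementation.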

**Figure 2.** Sound wave velocities in cubic CaSiO3 perovskite at 2000 K calculated from the elastic constants. The solid circles, squares, and diamonds represent the compressional, bulk, and shear velocities, respectively.

The effect of temperature on the sound wave velocities was investigated at high temperatures corresponding to conditions in the Earth's mantle. In Figure 3, the results of the AIMD simulations at 2000 K ≤ *T* ≤ 4000 K show that the sound wave velocities decreased with increasing temperature. However, the temperature dependence was small, and it became smaller still as the pressure increased. The sound wave velocities were fitted to the following equation as functions of temperature and pressure:

$$v = a + (b + cP)T + d \ln(P) \tag{16}$$

where *a*, *b*, *c* and *d* are fitted parameters, and *T* and *P* are given in K and GPa, respectively. The results of the fitted parameters are summarized in Table 2.
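Because Equation (16) is linear in *a*, *b*, *c* and *d*, the fit can be done by ordinary linear least squares. The following sketch shows one way to set this up with NumPy; the fitting procedure actually used for Table 2 is not described in the text, and the parameter values in the usage example are synthetic placeholders, not the values of Table 2.

```python
import numpy as np


def velocity(p_gpa, t_k, a, b, c, d):
    """Evaluate Equation (16): v = a + (b + c*P)*T + d*ln(P)."""
    return a + (b + c * p_gpa) * t_k + d * np.log(p_gpa)


def fit_velocity_surface(p_gpa, t_k, v):
    """Least-squares estimate of (a, b, c, d) from samples v(P, T)."""
    p = np.asarray(p_gpa, dtype=float)
    t = np.asarray(t_k, dtype=float)
    # Columns multiply a, b, c and d respectively
    design = np.column_stack([np.ones_like(p), t, p * t, np.log(p)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(v, dtype=float), rcond=None)
    return coeffs


# Synthetic data generated from placeholder parameters
p_grid, t_grid = np.meshgrid([20.0, 60.0, 100.0, 140.0], [2000.0, 3000.0, 4000.0])
p_flat, t_flat = p_grid.ravel(), t_grid.ravel()
v_flat = velocity(p_flat, t_flat, 5.0, -2.0e-4, 1.0e-6, 1.5)
print(fit_velocity_surface(p_flat, t_flat, v_flat))
```

A fit of this kind should only be evaluated inside the pressure and temperature window over which it was constrained.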

**Figure 3.** Sound wave velocities in cubic CaSiO3 perovskite at high temperatures. The diamonds represent the compressional and shear wave velocities calculated by the AIMD simulations. The dashed lines represent the fitted velocities for 2000, 3000, and 4000 K.

**Table 2.** Parameters of the compressional and shear sound velocities. The parameters are defined by *v* = *a* + (*b* + *cP*)*T* + *d* ln *P*, where *T* and *P* are the temperature (K) and the pressure (GPa), respectively. The conditions for applying these parameters to Equation (16) are 15 GPa < *P* < 140 GPa and 1500 K < *T* < 4500 K.


#### **4. Discussion**

It is important to assess the uncertainties of the *ab initio* calculations to understand the implications of the calculated results. In general, different types of approximation have led to different values in previous *ab initio* studies, e.g., [18]. The difference between Local Density Approximation (LDA) and Generalized Gradient Approximation (GGA) leads to a change in cell volume of a few percent. This uncertainty is non-negligible in the context of discussing the behavior of the Earth's mantle. Although GGA was used in the present study, we corrected the calculated values according to the experimental EOS to minimize the uncertainties related to approximations used in *ab initio* simulations. Therefore, our discussion of the comparison between estimated elastic properties and PREM values is more reliable than those of previous studies.

It is believed that the lower mantle contains three minerals: (Fe,Al)-bearing Mg-perovskite, ferropericlase, and Ca-perovskite [2,3]. We calculated the density, and the compressional and shear sound wave velocities for cubic CaSiO3 perovskite in order to compare them with the values from the PREM [11]. The values for cubic CaSiO3 perovskite were calculated using the EOS defined in
Table 1 and Equation (16) defined in Table 2. The estimated values for cubic CaSiO3 perovskite and the PREM are compared in Figure 4. The adiabatic temperature profile (geotherm) was used as the temperature profile in the Earth's lower mantle [3]. The calculated density of cubic CaSiO3 perovskite was in good agreement with that estimated by seismological data (PREM). Li *et al.* [8] estimated the density of cubic CaSiO3 perovskite, and the estimated density was higher than that in the PREM data. Although our AIMD method was similar to that used by Li *et al.* [8], the discrepancy between this and the previous study was confirmed. As the pressure was corrected to estimate density accurately in this study, the difference between our estimated density of cubic CaSiO3 perovskite and that from the PREM data should be small. For the compressional sound wave velocity, the discrepancy between the calculated values for cubic CaSiO3 perovskite and the observed data was small. The difference increased as the depth increased. By contrast, the shear sound wave velocity in cubic CaSiO3 perovskite was much higher than that from the PREM data. If cubic CaSiO3 perovskite is a major mineral in the Earth's lower mantle, the sound wave velocity profiles cannot be explained. Therefore, our study implies that cubic CaSiO3 perovskite is a minor mineral in the Earth's lower mantle. In a previous study, the shear sound wave velocity in orthorhombic (Mg,Fe)SiO3 perovskite was reported to be lower than that in cubic CaSiO3 perovskite at 0 K [23]. As the shear sound wave velocity in ferropericlase, (Mg,Fe)O, is much lower than that of orthorhombic (Mg,Fe)SiO3 perovskite, sound wave velocities in orthorhombic (Mg,Fe)SiO3 perovskite at higher temperatures are therefore in general agreement with those from PREM, indicating that orthorhombic (Mg,Fe)SiO3 perovskite might be a major mineral in the Earth's lower mantle.

**Figure 4.** Density and sound wave velocities for cubic CaSiO3 perovskite compared with PREM data. The solid circles represent the values from PREM [11]. The solid lines represent the calculated values under the conditions of the Earth's lower mantle. The values for cubic CaSiO3 perovskite were calculated using the equations defined in Tables 1 and 2 and the adiabatic temperature profile (geotherm) in the Earth's interior [3]; a: compressional sound velocity; b: shear sound velocity; c: density.

In general, a certain quantity of most elements dissolves in three minerals, namely (Fe,Al)-bearing Mg-perovskite, ferropericlase, and Ca-perovskite, and the partition coefficients of minor elements between the three minerals change with temperature and pressure. In this study, the physical properties of a pure cubic CaSiO3 perovskite host were calculated, because the complicated chemical composition would need a very large simulation time, and it would have been difficult to perform a reliable AIMD study. The effects of the minor elements on the density and the sound wave velocities therefore require further investigation.

#### **5. Conclusions**

We have predicted from the first-principles theory the sound velocities of cubic CaSiO3 perovskite at high pressures and temperatures corresponding to the Earth's lower mantle. Comparison of the elastic properties of cubic CaSiO3 perovskite with the lower mantle properties estimated from the seismic observations supports the prevailing hypothesis that the lower mantle consists primarily of (Mg,Fe)SiO3 perovskite.

#### **Acknowledgments**

This work made use of the super computer system of JAMSTEC and of the computer systems of the Earthquake Information Center of the Earthquake Research Institute. This work was supported by Grant-in-Aid for Scientific Research from JSPS and the Earthquake Research Institute cooperative research program, Japan.

#### **Conflicts of Interest**

The author declares no conflict of interest.

#### **References**


Reprinted from *Entropy*. Cite as: Wang, B.-B.; Wang, X.-D.; Chen, M.; Xu, J.-L. Molecular Dynamics Simulations on Evaporation of Droplets with Dissolved Salts. *Entropy* **2013**, *15*, 1232–1246.

*Article* 

## **Molecular Dynamics Simulations on Evaporation of Droplets with Dissolved Salts**

**Bing-Bing Wang 1,2, Xiao-Dong Wang 1,2,\*, Min Chen 3,\* and Jin-Liang Xu 1,2** 


*Received: 6 January 2013; in revised form: 18 March 2013 / Accepted: 18 March 2013 / Published: 8 April 2013* 

**Abstract:** Molecular dynamics simulations are used to study the evaporation of water droplets containing dissolved LiCl, NaCl or KCl salt in a gaseous surrounding (nitrogen) held at a constant high temperature of 600 K. The initial droplet has a temperature of 298 K and contains 1,120 water molecules and 0, 40, 80 or 120 salt molecules. The effects of the salt type and concentration on the evaporation rate are examined. Three stages with different evaporation rates are observed in all cases. In the initial stage, the droplet evaporates slowly because of the low droplet temperature and the high latent heat of evaporation of water, and pure water and the aqueous solutions have almost the same evaporation rates. In the second stage, the evaporation rate increases significantly, and evaporation is somewhat slower for the salt-containing droplets than for the pure water droplet because of the attractive ion-water interaction and the hydration effect. The Li<sup>+</sup>-water interaction and hydration effect are the strongest, so LiCl aqueous droplets evaporate the slowest, followed by NaCl and KCl. A higher salt concentration also enhances the ion-water interaction and hydration effect, and hence corresponds to slower evaporation. In the last stage, only a small number of water molecules are left in the droplet, leading to a significant increase in ion-water interactions, so that evaporation becomes slower than in the second stage.

**Keywords:** molecular dynamics simulations; evaporation; aqueous droplet; salts

#### **1. Introduction**

The physics of droplet evaporation in an infinite space has attracted much interest because of its crucial role in energy engineering, chemical engineering, and environmental processes [1], and it has been investigated for many years by different methods: hydrodynamics [2], kinetic theory [3], and molecular simulations [4–11]. Molecular dynamics simulates the behavior of a real system by tracking each atom and following its motion in time through the basic laws of classical mechanics [7]. The system behavior and the temporal evolution of its thermodynamic and transport properties are obtained by statistically averaging over all molecular motions. Because a molecular dynamics simulation of evaporation does not need some of the assumptions made in CFD (computational fluid dynamics), the method has been adopted to study droplet evaporation [4–11] and the evaporation of flat thin liquid films on solid surfaces [12–14]. These studies focused on the evaporation of single-component droplets under various conditions, and compared the evaporation rates simulated by molecular dynamics with those predicted by classical kinetic theory, such as the *D*<sup>2</sup> law [5,6,9]. The *D*<sup>2</sup> law was derived for droplet evaporation in an infinite space [15], and predicts that the derivative of the square of the droplet diameter with respect to time is constant, or *dD*<sup>2</sup>/*dt* = *K*, where *K* is the evaporation constant. However, Semenov *et al.* [16] showed that for the evaporation of sessile drops on hydrophobic substrates, the evaporation rate is proportional to the radius of the three-phase contact line rather than to the surface area of the droplet. Semenov *et al.* [17] also investigated the influence of kinetic effects on the evaporation of pinned sessile water droplets of submicrometer size placed on a heat-conductive substrate. 
Their computer simulation model took into account the following phenomena: influence of curvature of the droplet's surface on the saturated vapor pressure above the surface (Kelvin's equation), the effect of latent heat of vaporization, thermal Marangoni convection, and Stefan flow inside an air domain above the droplet.
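Integrating the *D*<sup>2</sup> law, *dD*<sup>2</sup>/*dt* = *K*, gives *D*<sup>2</sup>(*t*) = *D*0<sup>2</sup> + *Kt*, with *K* negative for an evaporating droplet. The short sketch below evaluates this relation; the function names, the sign convention, and the numbers are assumptions made for illustration only.

```python
import math


def d2_law_diameter(d0, k, t):
    """Diameter at time t from D^2(t) = D0^2 + K*t (K < 0 for evaporation)."""
    d_squared = d0 ** 2 + k * t
    return math.sqrt(max(d_squared, 0.0))  # the droplet has vanished once D^2 <= 0


def evaporation_lifetime(d0, k):
    """Time at which D^2 reaches zero; requires a negative evaporation constant."""
    if k >= 0.0:
        raise ValueError("K must be negative for an evaporating droplet")
    return -d0 ** 2 / k


# Hypothetical droplet: D0 = 4 nm, K = -0.01 nm^2/ps
print(evaporation_lifetime(4.0, -0.01))
print(d2_law_diameter(4.0, -0.01, 800.0))
```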

The evaporation of droplets containing dissolved salts (aqueous droplets) has extensive applications in many industrial processes such as crystallization [2,18], electrospraying [19], and electrospinning [20], as well as in atmospheric science [21–25]. Starov and Churaev [2] investigated the crystallization of aqueous solutions in a thin capillary using hydrodynamics, and showed that the evaporation flux differs significantly from that predicted by the classical solution for a one-component liquid. Recently, some studies have used molecular dynamics simulations to investigate the evaporation of a small water cluster containing a single kind of charged ion [19,26–28]. Caleman and Spoel [26] simulated the evaporation from a water cluster (*N* = 216 and 512) containing either Cl<sup>−</sup>, H2PO4<sup>−</sup>, Na<sup>+</sup> or NH4<sup>+</sup> (*N* = 0, 4 and 8) under vacuum, and their results showed a somewhat slower evaporation rate for clusters with Cl<sup>−</sup> and Na<sup>+</sup> than for those with H2PO4<sup>−</sup> and NH4<sup>+</sup>. Daub and Cann [27] studied the evaporation and condensation of a small cluster (*N* = 10, 20, 30 and 40) of water or methanol containing a single Ca<sup>2+</sup>, Na<sup>+</sup> or Cl<sup>−</sup> ion, either in vacuum or under argon gas. Daub and Cann [19] also studied the evaporation of a water cluster (*N* = 10, 15 or 20) containing one Na<sup>+</sup> ion or one Ca<sup>2+</sup> ion under the action of an electric field. Their results demonstrated that the interaction between ions and water molecules affects the evaporation of the cluster [19,27]. Köhler [21] investigated the formation of liquid cloud drops based on equilibrium thermodynamics, where water vapor condenses in the presence of nuclei (solutes), and proposed the well-known Köhler theory expressed as:

$$\ln\left(\frac{p\_{\text{w}}\left(D\right)}{p^{\text{o}}}\right) = \frac{4M\_{\text{w}}\sigma\_{\text{w}}}{RT\,\rho\_{\text{w}}D} - \frac{6n\_{\text{s}}M\_{\text{w}}}{\pi\rho\_{\text{w}}D^{3}}\tag{1}$$

where *p*w is the water vapor pressure outside the droplet, *p*° is the corresponding saturation vapor pressure over a flat surface, *σ*w is the droplet surface tension, *ρ*w is the density of pure water, *n*s is the number of moles of solute, *M*w is the molecular weight of water, *R* is the universal gas constant, *T* is the temperature, and *D* is the droplet diameter. According to Köhler's theory, the droplet diameter, water surface tension, and molar concentration of the solute significantly affect the water vapor pressure and hence the droplet evaporation rate.
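To make the competition between the two terms of Equation (1) concrete, the sketch below evaluates the ratio *p*w/*p*° in SI units. The default surface tension, density, and molar mass are commonly quoted values for water near room temperature and are assumptions of this example, as are the function name and the droplet size and solute amount in the usage lines.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J/(mol K)


def kohler_saturation_ratio(d_m, n_s_mol, temp_k,
                            sigma_w=0.072, rho_w=1000.0, m_w=0.018015):
    """Return p_w/p0 from Equation (1).

    d_m: droplet diameter (m); n_s_mol: moles of solute; temp_k: temperature (K).
    sigma_w (N/m), rho_w (kg/m^3), m_w (kg/mol) are assumed water properties.
    """
    curvature_term = 4.0 * m_w * sigma_w / (R_GAS * temp_k * rho_w * d_m)
    solute_term = 6.0 * n_s_mol * m_w / (math.pi * rho_w * d_m ** 3)
    return math.exp(curvature_term - solute_term)


# Hypothetical 100 nm droplet, without and with dissolved solute
print(kohler_saturation_ratio(1.0e-7, 0.0, 298.0))
print(kohler_saturation_ratio(1.0e-7, 1.0e-18, 298.0))
```

The curvature term raises the equilibrium vapor pressure above that of a flat surface, whereas the solute term lowers it, which is consistent with the slower evaporation of salt-containing droplets discussed in this work.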

Generally, an aqueous solution is electrically neutral and, for monovalent salts, contains the same number of cations and anions. The interaction between cations and anions may influence the evaporation properties of the aqueous droplet; however, few molecular dynamics simulations have considered this process. The salt crystallization process from an evaporating NaCl aqueous solution has been studied by Mucha and Jungwirth [18] using molecular dynamics simulations, but they did not focus on evaporation rates. This work uses molecular dynamics simulations to investigate the evaporation of water droplets containing dissolved LiCl, NaCl or KCl salt. The droplet is surrounded and heated by a nitrogen gas atmosphere at a constant temperature. The effects of the salt concentration and salt type on the evaporation rate are examined. By analyzing the spatial positions and the interaction energies of ions and water molecules, the differences between the evaporation rates for the various cases are explained in detail.

#### **2. Molecular Dynamics Simulations**

#### *2.1. Interatomic Potential and Initial Configuration*

For molecular dynamics simulations, selecting a proper intermolecular potential function and constructing a correct initial configuration of the system are important to correctly describe the physical process concerned. This work simulates the evaporation of water droplets with or without dissolved salts. Three salts, LiCl, NaCl or KCl, at various concentrations are added to the water droplet. The droplet is surrounded and heated by nitrogen gas at a constant temperature. Since the system includes nitrogen molecules, water molecules, and Li<sup>+</sup>, Na<sup>+</sup>, K<sup>+</sup>, and Cl<sup>−</sup> ions, the long-range Coulombic force between ions must be considered. Thus, a combined potential model of the Lennard-Jones 12-6 potential and the Coulombic potential is adopted here, which can be expressed as [29,30]:

$$U\_{ij} = \frac{q\_i q\_j}{r\_{ij}} + 4\varepsilon\_{ij} \left[ \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^{12} - \left( \frac{\sigma\_{ij}}{r\_{ij}} \right)^6 \right] \tag{2}$$

where the subscripts *i* and *j* denote the *i*th and *j*th particles (atoms or ions), *q* is the particle charge, *r* is the distance between particles, and *ε* and *σ* are the minimum energy (well depth) and the zero-energy separation distance, respectively. The water molecules are described by the SPC/E model [30]. The values of the potential parameters *q*, *ε* and *σ* for like particles are summarized in Table 1 [29,30]. The following mixing rules are adopted for the potential parameters between unlike particles:

$$
\sigma\_{ij} = \left(\sigma\_i + \sigma\_j\right) / 2 \tag{3}
$$

$$
\varepsilon\_{ij} = \sqrt{\varepsilon\_i \varepsilon\_j} \tag{4}
$$
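Equations (2)-(4) translate directly into a pair-energy routine. The sketch below is illustrative only: Equation (2) is written without an explicit Coulomb prefactor, so the code assumes a unit system of kcal·mol<sup>−1</sup>, Å, and elementary charges and inserts the corresponding Coulomb constant; the function names and the parameters in the usage example are hypothetical and are not the entries of Table 1.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); the unit system is an assumption
COULOMB_K = 332.0637


def mix(sigma_i, eps_i, sigma_j, eps_j):
    """Mixing rules of Equations (3) and (4): arithmetic sigma, geometric epsilon."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)


def pair_energy(r, q_i, q_j, sigma_ij, eps_ij):
    """Coulomb plus Lennard-Jones 12-6 energy of Equation (2) at separation r."""
    sr6 = (sigma_ij / r) ** 6
    return COULOMB_K * q_i * q_j / r + 4.0 * eps_ij * (sr6 * sr6 - sr6)


# Hypothetical parameters for an unlike pair (sigma in Angstrom, epsilon in kcal/mol)
sigma_ij, eps_ij = mix(3.0, 0.1, 4.0, 0.4)
print(sigma_ij, eps_ij)
print(pair_energy(3.0, 1.0, -1.0, sigma_ij, eps_ij))
```

With both charges set to zero, the energy vanishes at *r* = *σ* and reaches its minimum of −*ε* at *r* = 2<sup>1/6</sup>*σ*, which matches the meaning of the two parameters given above.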


**Table 1.** Values of potential parameters.

The cutoff distances for the short-range and long-range forces are taken as 10 Å, and the PPPM summation technique [31] is used to treat the long-range Coulomb interaction. The equations of motion are integrated using the Velocity-Verlet algorithm [5]. The method of constraints [29] is applied to the nitrogen and water molecules to maintain their bond lengths and angles.

The initial configuration of the system is shown in Figure 1, where the nitrogen molecules, water molecules, and ions are distinguished by different colors. The droplet is placed in the center of a cubic box with a side length of 12 nm, and nitrogen molecules surround the droplet. The number of water molecules in the droplet is 1,120, and the number of nitrogen molecules is 600, corresponding to a gas density of 16.47 kg·m<sup>−3</sup>. The number of salt molecules is set to 0, 40, 80, or 120 to analyze the effect of salt concentration on droplet evaporation. It is noted that the maximum salt mole concentration is 9.7%, which is less than the saturation concentration, and hence salt crystallization cannot occur initially. The droplet radius is fixed at 2 nm for all cases. This value corresponds to a density of 1 g·cm<sup>−3</sup> when the droplet is composed of pure water.
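The numbers quoted in this setup can be cross-checked with simple arithmetic, as in the sketch below. The molar masses are standard values; the assumption that the gas occupies the box volume minus the droplet volume is an inference that happens to reproduce the quoted gas density and is not stated explicitly in the text. All function names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23  # 1/mol
M_WATER = 18.015          # g/mol
M_N2 = 28.014             # g/mol


def droplet_radius_nm(n_water, density_g_cm3=1.0):
    """Radius of a sphere holding n_water molecules at the given mass density."""
    volume_nm3 = n_water * M_WATER / AVOGADRO / density_g_cm3 * 1.0e21
    return (3.0 * volume_nm3 / (4.0 * math.pi)) ** (1.0 / 3.0)


def gas_density_kg_m3(n_n2, box_nm, droplet_radius):
    """Gas density, assuming the gas fills the box outside the droplet sphere."""
    gas_volume_nm3 = box_nm ** 3 - 4.0 / 3.0 * math.pi * droplet_radius ** 3
    mass_kg = n_n2 * M_N2 / AVOGADRO * 1.0e-3
    return mass_kg / (gas_volume_nm3 * 1.0e-27)


def salt_mole_fraction(n_salt, n_water):
    """Mole fraction of salt formula units in the droplet."""
    return n_salt / (n_salt + n_water)


print(f"radius        = {droplet_radius_nm(1120):.2f} nm")                   # about 2.00
print(f"gas density   = {gas_density_kg_m3(600, 12.0, 2.0):.2f} kg/m^3")     # about 16.47
print(f"mole fraction = {100.0 * salt_mole_fraction(120, 1120):.1f} %")      # about 9.7
```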

**Figure 1.** Initial configuration of system: green balls are N, white balls are H, red balls are O, blue balls are positive ions, and purple balls are chloride ions.

#### *2.2. Preparation of Initial Equilibrium State*

Before the onset of evaporation, a well-defined system has to be prepared. Initial velocities of the particles in both the gaseous phase (nitrogen) and the droplet are generated from a Maxwell-Boltzmann distribution at the initial temperature of 298 K. Periodic boundary conditions are applied in all three directions of the box. A time step of Δ*t* = 1 fs is used for all cases. The system with the initial configuration is simulated in an NVT ensemble and reaches an equilibrium state after 100,000 time steps. At each time step, the velocities of the gaseous phase and of the droplet are rescaled separately to maintain a constant temperature of 298 K. It is noted that a very small number of water molecules (fewer than 10 for the case in Figure 2a) escape from the droplet and appear in the surrounding vapor by the time the initial equilibrium state is reached, as shown in Figure 2a.
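The velocity initialization and rescaling described above can be sketched as follows. This is a generic illustration rather than the code used in this work: it ignores the degrees of freedom removed by bond constraints and by centre-of-mass motion, and the function names and the particle mass in the usage lines are placeholders.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K


def maxwell_boltzmann_velocities(masses_kg, temp_k, rng):
    """Draw velocities (m/s) with Gaussian components of variance k_B*T/m."""
    masses = np.asarray(masses_kg, dtype=float)
    sigma = np.sqrt(K_B * temp_k / masses)
    return rng.normal(size=(masses.size, 3)) * sigma[:, None]


def kinetic_temperature(masses_kg, velocities):
    """Instantaneous temperature from the kinetic energy (3N degrees of freedom)."""
    masses = np.asarray(masses_kg, dtype=float)
    kinetic = 0.5 * np.sum(masses[:, None] * velocities ** 2)
    return 2.0 * kinetic / (3 * masses.size * K_B)


def rescale_to_temperature(masses_kg, velocities, target_k):
    """Scale all velocities of one group so that its temperature equals target_k."""
    current = kinetic_temperature(masses_kg, velocities)
    return velocities * np.sqrt(target_k / current)


rng = np.random.default_rng(0)
masses = np.full(500, 2.99e-26)  # placeholder particle mass in kg
v = maxwell_boltzmann_velocities(masses, 298.0, rng)
v = rescale_to_temperature(masses, v, 298.0)
print(kinetic_temperature(masses, v))
```

Applying `rescale_to_temperature` separately to the gas and to the droplet, each with its own mass and velocity arrays, corresponds to the group-wise rescaling described in the text.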

To analyze the temporal evolution of the evaporation rate, one must define whether a water molecule belongs to the droplet or to the vapor. The method originally proposed by Shigeo *et al.* [32] is adopted here, which is based on counting the number of neighboring water molecules around each water molecule. Neighbor molecules are defined as the molecules within a distance of 4.34 Å from the molecule of interest. A molecule is considered to be in the droplet if its neighbor number *n*neighbor ≥ 9, in the vapor phase if *n*neighbor ≤ 1, or in the interface region if 2 ≤ *n*neighbor ≤ 8. The interface is ignored in the present work, so *n*neighbor ≥ 4 is used as the threshold for assigning a molecule to the droplet.
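The neighbor-counting criterion can be sketched with a brute-force distance matrix, which is affordable for about a thousand molecules. The example assumes that each water molecule is represented by a single site (for instance its oxygen atom) with coordinates in Å, and that the periodic box is cubic; these choices and the function names are assumptions of the sketch.

```python
import numpy as np


def count_neighbors(positions, cutoff=4.34, box=None):
    """Number of other molecules within `cutoff` of each molecule.

    positions: (N, 3) array of one site per water molecule, in Angstrom.
    box: optional cubic box length for the minimum-image convention.
    """
    pos = np.asarray(positions, dtype=float)
    delta = pos[:, None, :] - pos[None, :, :]
    if box is not None:
        delta -= box * np.round(delta / box)  # wrap separations into the nearest image
    dist = np.linalg.norm(delta, axis=-1)
    return np.sum(dist < cutoff, axis=1) - 1  # subtract the molecule itself


def in_droplet(positions, cutoff=4.34, threshold=4, box=None):
    """Boolean mask: True where the neighbor count reaches the droplet threshold."""
    return count_neighbors(positions, cutoff, box) >= threshold


# Six closely packed sites plus one isolated "vapor" site
cluster = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                    [0, 0, 1], [1, 1, 0], [1, 0, 1]], dtype=float)
sites = np.vstack([cluster, [[50.0, 50.0, 50.0]]])
print(in_droplet(sites))
```

Summing the returned mask at each recorded frame gives the number of water molecules remaining in the droplet as a function of time.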

**Figure 2.** Snapshots of the simulation boxes and corresponding number densities of water molecules, Na<sup>+</sup> and Cl<sup>−</sup> ions at different evaporation instants: (**a**) *t* = 0 ps; (**b**) *t* = 500 ps; (**c**) *t* = 1,000 ps; (**d**) *t* = 1,600 ps.

#### *2.3. Droplet Evaporation*

The temperature of the gaseous phase is abruptly increased to 600 K to trigger the droplet evaporation, and this temperature is maintained by applying the velocity-rescaling method [4] to the nitrogen molecules at every time step throughout the evaporation process for all cases. The total evaporation time is set to 1,600 ps. The instantaneous positions and velocities of the particles are recorded every 1 ps, and all quantities of interest are averaged over 10 recorded frames to reduce statistical fluctuations. Figure 2 presents snapshots of the water droplet with 120 dissolved NaCl molecules and the number densities of water molecules, Na<sup>+</sup> and Cl<sup>−</sup> ions at different evaporation instants. The droplet deviates from its initial spherical shape and its volume gradually decreases with time; however, the Na<sup>+</sup> and Cl<sup>−</sup> ions cannot escape from the droplet and finally crystallize once the droplet has evaporated completely.

#### **3. Results and Discussion**

#### *3.1. Effect of Salt Concentration*

To analyze the effect of the salt concentration on the droplet evaporation, 0, 40, 80, and 120 LiCl molecules are added to 1,120 water molecules, respectively, to prepare aqueous droplets. The droplet, initially at 298 K, is heated by the 600 K nitrogen gas and evaporates. The temporal evolution of the number of water molecules in the droplet is shown in Figure 3.

Figure 3 shows that the evaporation process can be divided into three stages. At the beginning, the evaporation rates for pure water and the three aqueous solutions are low, since only a small amount of heat has been transferred to the droplet and water has a high latent heat of evaporation. No visible difference is observed among the four droplets, which means that Li<sup>+</sup> and Cl<sup>−</sup> have not yet affected the evaporation. Later, the evaporation rates increase because more heat is transferred to the droplet, and differences among the four droplets appear: the pure water droplet evaporates faster than the three aqueous droplets and vanishes at about *t* = 1,150 ps, and the aqueous droplet with a high LiCl concentration has a lower evaporation rate than that with a low LiCl concentration. In the last stage of evaporation (at about *t* > 1,200 ps), the evaporation rates of the aqueous droplets decrease compared with those in the second stage.

**Figure 3.** Temporal evolution of the water molecule number in the droplet with various salt concentrations.

The radial distribution functions of Li<sup>+</sup>-O and Cl<sup>−</sup>-O and their integrals for the aqueous droplet with 80 LiCl molecules at different evaporation instants are shown in Figure 4. The hydration number in the present work is defined as the average number of water molecules around an ion in the first solvation shell, and can be expressed as:

$$N\_{\rm ion\text{-}O}^{\rm sol} = 4\pi \rho\_{\rm ion} \int\_0^{r\_{\rm sol}} g\_{\rm ion\text{-}O} \left( r \right) r^2 \, dr \tag{5}$$

where *ρ*ion is the number density of ions (Li<sup>+</sup> or Cl<sup>−</sup>), *g*ion-O(*r*) is the radial distribution function, and *r*sol is the radius of the first solvation shell. Figure 4 shows that the first peaks of *g*Li<sup>+</sup>-O(*r*) and *g*Cl<sup>−</sup>-O(*r*) occur at *r* = 1.95 Å and 3.25 Å, and the first minima at 2.85 Å and 4.15 Å, for *t* < 600 ps. This implies that water molecules located within 2.85 Å of Li<sup>+</sup> and within 4.15 Å of Cl<sup>−</sup> are strongly attracted by the ions; thus, 2.85 Å and 4.15 Å can be regarded as the radii of the first solvation shell for Li<sup>+</sup> and Cl<sup>−</sup>, respectively.
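Given a tabulated radial distribution function, Equation (5) reduces to a one-dimensional quadrature up to the first minimum. The sketch below uses the trapezoidal rule; the function name is illustrative, `rho` stands for the number-density prefactor of Equation (5), and the uniform *g*(*r*) = 1 and the density value in the usage lines are placeholders chosen only because the integral is then known analytically.

```python
import numpy as np


def hydration_number(r, g, rho, r_sol):
    """Integrate 4*pi*rho*g(r)*r^2 from the first grid point up to r_sol.

    r, g: tabulated radial distribution function on an increasing grid.
    rho: number-density prefactor of Equation (5), in units of 1/length^3.
    """
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = r <= r_sol
    integrand = g[mask] * r[mask] ** 2
    # Trapezoidal rule written out explicitly
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r[mask]))
    return 4.0 * np.pi * rho * integral


# Placeholder check: for g(r) = 1 the result is rho * (4/3) * pi * r_sol^3
r_grid = np.linspace(0.0, 2.85, 2851)
print(hydration_number(r_grid, np.ones_like(r_grid), 0.0334, 2.85))
```

In practice `r_sol` would be taken from the first minimum of the measured *g*(*r*), such as the 2.85 Å and 4.15 Å values quoted above.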

**Figure 4.** Radial distribution functions and hydration numbers at various evaporation instants for the aqueous droplet with 80 LiCl molecules: (a) *g*Li<sup>+</sup>-O(*r*) and *N*Li<sup>+</sup>-O(*r*); (b) *g*Cl<sup>−</sup>-O(*r*) and *N*Cl<sup>−</sup>-O(*r*).

Figure 4 shows that the hydration number of Li<sup>+</sup> is 3.80 at *t* = 0 ps, 3.71 at *t* = 200 ps, 3.45 at *t* = 400 ps, and 3.34 at *t* = 600 ps, with only a 9.2% decrease from *t* = 0 ps to *t* = 600 ps; the hydration number of Cl<sup>−</sup> is 8.70 at *t* = 0 ps, 8.68 at *t* = 200 ps, 8.67 at *t* = 400 ps, and 8.65 at *t* = 600 ps. Therefore, only 52 and four water molecules escape the confinement of Li<sup>+</sup> and Cl<sup>−</sup>, respectively. Over the same period, about 270 water molecules escape from the droplet due to evaporation (Figure 3). These results demonstrate that the free water molecules, which interact only weakly with the ions, make the largest contribution to the evaporation rate at the beginning of evaporation, and hence no visible difference is observed between pure water and aqueous solutions with various LiCl concentrations.

The hydration numbers of Li<sup>+</sup> and Cl<sup>−</sup> at *t* = 600 ps for aqueous droplets with 40, 80, and 120 LiCl molecules are listed in Table 2. Although a high LiCl concentration leads to a small hydration number, the hydration effect is enhanced because the total number of water molecules bound by Li<sup>+</sup> and Cl<sup>−</sup> is increased. The addition of Li<sup>+</sup> and Cl<sup>−</sup> to the water droplet also affects the interaction between water molecules. Table 3 lists the coordination number of a water molecule at *t* = 0 ps, defined as the average number of water molecules in a sphere of 0.35 nm radius around a water molecule. The value of 0.35 nm is chosen because it is a standard length used to determine the formation of hydrogen bonds between water molecules [26]. Table 3 shows that the coordination number of a water molecule is reduced at high LiCl concentration; thus, the interaction between water molecules weakens with increasing LiCl concentration. The average interaction energies between water molecules and ions (Li<sup>+</sup> and Cl<sup>−</sup>) for various LiCl concentrations are calculated by Equation (2) and shown in Figure 5. A negative value means that water molecules are attracted by the ions. The interaction is stronger for high LiCl concentration at *t* < 1,200 ps. Based on the results above, the low evaporation rate of the droplet with high LiCl concentration can be attributed to the stronger hydration effect and the stronger attractive force on water imposed by Li<sup>+</sup> and Cl<sup>−</sup>, compared with the droplet with low LiCl concentration.


**Table 2.** Hydration numbers of Li<sup>+</sup> and Cl<sup>−</sup> for different cases at *t* = 600 ps.


Figure 5 also shows that the ion-water interaction becomes significantly stronger at *t* > 1,200 ps, because fewer and fewer water molecules are left in the droplet; hence, the evaporation becomes slower at *t* > 1,200 ps. The vapor pressure is low at the beginning of evaporation, and it gradually increases as water molecules escape from the droplet. A high vapor pressure means a larger resistance to evaporation, which is another important reason why evaporation is slower in the last stage than in the second stage.

The aqueous droplet with 40 dissolved LiCl molecules has the highest evaporation rate at *t* < 1,200 ps (Figure 3); thus, fewer water molecules are left in this droplet than in the droplets with 80 and 120 LiCl, so that each water molecule in the droplet is surrounded by more ions at *t* > 1,200 ps, which leads to a stronger ion-water interaction for the droplet with 40 LiCl. Therefore, a crossover of the ion-water interaction curves for the various LiCl concentrations is observed at *t* = 1,200 ps.

**Figure 5.** Average interaction energies between water molecules and ions (Li<sup>+</sup> and Cl<sup>−</sup>) for various LiCl concentrations.

#### *3.2. Effect of Salt Category*

Because the interactions between water molecules and different ions differ, the evaporation rates of droplets containing different dissolved salts may also differ. Three common salts, LiCl, NaCl and KCl, are used to analyze this effect. Figure 6 shows the temporal evolution of the number of water molecules in the droplets with 120 LiCl, NaCl or KCl molecules. Again, three stages are observed during evaporation for all three aqueous droplets. The evaporation rates of the aqueous droplets are lower than that of the pure water droplet; the LiCl aqueous droplet is the slowest, followed by NaCl and then KCl.

**Figure 6.** Temporal evolution of the water molecule number in the droplets with various salts.

Figure 7 shows the distribution of water molecules in a sphere of 0.45 nm radius around Li<sup>+</sup>, Na<sup>+</sup>, and K<sup>+</sup> at evaporation instants of 0, 300, and 1,000 ps. Oxygen atoms are closer to the cations than hydrogen atoms because of the attractive Coulombic interaction. The number of water molecules around Li<sup>+</sup>, Na<sup>+</sup>, and K<sup>+</sup> differs significantly: the most water molecules are found around Li<sup>+</sup>, followed by Na<sup>+</sup> and K<sup>+</sup>. As the droplet evaporates, the number of water molecules around the cations gradually decreases.

**Figure 7.** Snapshots of the local distribution of water molecules around cations at different evaporation instants for various salts: (**a1**), (**a2**) and (**a3**) KCl at 0, 300, and 1,000 ps; (**b1**), (**b2**) and (**b3**) NaCl at 0, 300, and 1,000 ps; (**c1**), (**c2**) and (**c3**) LiCl at 0, 300, and 1,000 ps. (White balls: H; red balls: O; purple balls: Cl<sup>−</sup>; blue ball: K<sup>+</sup>, Na<sup>+</sup> or Li<sup>+</sup>).

The radial distribution functions *g*(*r*)Cation-Cl, *g*(*r*)Cation-O, and *g*(*r*)Cl<sup>−</sup>-O for LiCl, NaCl, and KCl aqueous droplets at evaporation instants of 0, 600, 1,300, and 1,600 ps are shown in Figure 8, where the subscript "Cation" denotes K<sup>+</sup>, Na<sup>+</sup>, or Li<sup>+</sup>. The positions of the first peak of *g*(*r*)Cation-Cl are 0.24 nm, 0.27 nm and 0.33 nm for LiCl, NaCl, and KCl aqueous droplets, respectively (Figure 8a), and the positions are almost unchanged throughout the evaporation process. However, the peak values of *g*(*r*)Cation-Cl increase with time, which means that more and more cations and chloride ions aggregate. Eventually, a crystal forms when all water molecules in the droplet have evaporated. The peak values of *g*(*r*)Cation-O (Figure 8b) and *g*(*r*)Cl<sup>−</sup>-O (Figure 8c) are the largest for the LiCl aqueous droplet throughout the evaporation process, followed by NaCl, and the smallest for KCl. Thus, the strongest hydration effect occurs for the LiCl aqueous droplet according to Equation (5).

Comparison of Figure 8b,c indicates that the difference in *g*(*r*)Cation-O among the three aqueous droplets is more significant than that in *g*(*r*)Cl<sup>−</sup>-O. Therefore, only the average interaction energy between water molecules and cations (K<sup>+</sup>, Na<sup>+</sup>, or Li<sup>+</sup>) is calculated by Equation (2) and plotted in Figure 9. The attractive force between water molecules and Li<sup>+</sup> is the strongest, while that for K<sup>+</sup> is the weakest. The results confirm again that the strong hydration effect and attractive force are responsible for the slow evaporation. The results can also be connected to the Hofmeister series effect [33] in terms of structure-breaking or structure-making cations. The Hofmeister series is a classification of ions in order of their ability to salt out. The order of cations is usually given as K<sup>+</sup> > Na<sup>+</sup> > Li<sup>+</sup> in the Hofmeister series; therefore, the present results are in good agreement with the Hofmeister series effects.

**Figure 8.** The radial distribution functions at different evaporation instants: (**a**) *g*(*r*)Cation-Cl; (**b**) *g*(*r*)Cation-O; (**c**) *g*(*r*)Cl<sup>−</sup>-O. (Cation denotes K<sup>+</sup>, Na<sup>+</sup>, or Li<sup>+</sup>).

In Nature, many aqueous solutions include two or more solutes, so it is necessary to discuss the evaporation of water droplets in which several kinds of salts are dissolved simultaneously. The evaporation of KCl + LiCl aqueous droplets is simulated and the results are shown in Figure 10, where KCl20 + LiCl60 means that the droplet contains 20 KCl molecules and 60 LiCl molecules. At the same total salt concentration, the evaporation rates of the KCl20 + LiCl60 and KCl40 + LiCl40 aqueous droplets lie between those of the KCl and LiCl aqueous droplets, and the KCl40 + LiCl40 droplet evaporates faster because K<sup>+</sup> has a weaker hydration effect and a smaller attractive force towards water molecules than Li<sup>+</sup>.

**Figure 9.** Average interaction energies between water molecules and ions (Li<sup>+</sup> and Cl<sup>−</sup>) for various salts.

**Figure 10.** Temporal evolution of the water molecule number in the droplets with various salts.

#### **4. Conclusions**

The evaporation of pure water droplets, as well as of NaCl, KCl and LiCl aqueous droplets, is studied by molecular dynamics simulations. The droplets are placed in a gaseous nitrogen environment and heated by the surroundings, which are held at a constant high temperature of 600 K. The evaporation of an aqueous droplet can be divided into three stages with different evaporation rates. The rate is slow at the beginning of evaporation, because only a small amount of heat has been transferred to the droplet and water has a high latent heat of evaporation. The rate increases in the second stage as more and more heat is transferred to the droplet; however, it decreases again in the last stage of evaporation due to the much stronger ion-water interaction.

The addition of salts to a water droplet results in a slower evaporation rate compared to pure water droplets, which can be attributed to the strong hydration effect and the strong attractive force exerted on the water by cations and anions. The evaporation rates of the various aqueous droplets follow the order LiCl < NaCl < KCl, and the evaporation becomes slower at high salt concentrations, because the hydration effect and attractive force are stronger in LiCl aqueous droplets and at high salt concentration.

The interaction potential model of the particles is important for MD simulation results; however, since the present study focuses on the effect of salt concentration and salt type on the evaporation of aqueous droplets, the conclusions are expected to remain applicable when a different interaction potential model is adopted. Furthermore, previous MD studies [5,6,9] showed that, because there is no "bulk" liquid in nano-scale droplets, their evaporation rate deviates from the classical *D*<sup>2</sup> law, which is regarded as a good description of the evaporation of micro- and milli-scale droplets. Therefore, it is worth studying further whether the present results extend to micro- and milli-scale droplets.

#### **Acknowledgments**

This study was partially supported by the National Natural Science Foundation of China (No. 51076009), by the 111 Project (No. B12034), and by the National Natural Science Foundation of China (Nos. 50876049 and 51210011).

#### **References**


Reprinted from *Entropy*. Cite as: Akhter, T.; Rohlf, K. Quantifying Compressibility and Slip in Multiparticle Collision (MPC) Flow through a Local Constriction. *Entropy* 2014, *16*, 418–442.

*Article*

## Quantifying Compressibility and Slip in Multiparticle Collision (MPC) Flow through a Local Constriction

Tahmina Akhter **<sup>1</sup>** and Katrin Rohlf **<sup>2</sup>***,* \*


*Received: 27 October 2013; in revised form: 13 December 2013 / Accepted: 16 December 2013 / Published: 2 January 2014*

Abstract: The flow of a compressible fluid with slip through a cylinder with an asymmetric local constriction has been considered both numerically and analytically. For the numerical work, a particle-based method whose dynamics is governed by the multiparticle collision (MPC) rule has been used together with a generalized boundary condition that allows for slip at the wall. Since it is well known that an MPC system corresponds to an ideal gas and behaves like a compressible, viscous flow on average, an approximate analytical solution has been derived from the compressible Navier–Stokes equations of motion coupled to an ideal gas equation of state using the Karman–Pohlhausen method. The constriction is assumed to have a polynomial form, and the location of maximum constriction is varied throughout the constricted portion of the cylinder. Results for centerline densities and centerline velocities have been compared for various Reynolds numbers, Mach numbers, wall slip values and flow geometries.

Keywords: multiparticle collision (MPC) dynamics; constriction; slip; Karman–Pohlhausen method; compressible; ideal gas

#### 1. Introduction

Flows through microchannels and microtubes have become recent areas of interest due to new developments in the fabrication technology of microfluidic devices. Examples of applications include micro-gas turbine generators and bio-analytical devices. In order to implement flow control measures or to optimize the design of bio-analytical devices, for example, a proper understanding of the flow through the device has to be developed. On the other hand, in gas microflows, compressibility effects can be important, and wall slip can be measurable, requiring incorporation of these in any numerical or analytical studies in this field. Particle-based methods, such as multiparticle collision dynamics (MPCD), are a means to simulate flows of a Newtonian, compressible, ideal gas, and slip effects can be incorporated very easily. Additionally, a constricted geometry is an ideal flow domain where compressibility effects can be important, for which an analytical solution is feasible. Our goal in this paper is to develop a better understanding, both theoretically and numerically, of the effects of compressibility and wall slip in a flow through a local constriction.

Flows through constrictions are popular in blood flow studies, and the analytical method used in this paper is an extension of the pioneering analysis carried out in [1–3]. The method used is called the Karman–Pohlhausen method, which essentially leads to the determination of the axial velocity profile. In [1–3], the fluid is considered to be Newtonian and incompressible, and the no-slip assumption is used, as would be common for blood flow applications. A more accurate pressure distribution was later developed for the same flow problem and presented in [4]. The same method was also used in [5], where the flow of an incompressible couple-stress fluid through a constriction was developed. In [6], a modified Karman–Pohlhausen method was proposed, and a general (2M)-degree polynomial was used for the flow field rather than a fourth degree polynomial, as per the original Karman–Pohlhausen method. In [7], slip was incorporated for incompressible, Newtonian flow through a local constriction. Weakly compressible flow with slip was later considered by [8,9], who also allowed for a flow geometry that is not necessarily symmetric about the location of maximum constriction. The results presented here are extensions of the results given in [8,9], giving more accurate expressions for the axial velocity profile.

Numerical works for flow through constrictions are two-fold. Discretization of the Navier–Stokes equations of motion for steady flow through stenoses was carried out by a number of authors for a Newtonian fluid [10–16]. Non-Newtonian models were considered numerically in [17–19] to name a few. All but [19] used the no-slip boundary condition. All of these works are for incompressible flows as they are applied to blood flow studies. Particle-based numerical methods, such as the Lattice-Boltzmann method [20], dissipative particle dynamics (DPD) [21,22] and multiparticle collision (MPC) dynamics [8,9,16], have more recently led to numerical solutions for flow through a local constriction. The Lattice-Boltzmann method has also recently been used for blood flow studies in complex flow geometries for realistic cardiovascular flow domains [23–25]. The method has been reviewed recently in [26], and its use for complex flows has been reviewed in [27]. Except for [8,9], the results are numerical. Since compressible flows through constrictions can exhibit significant compressibility effects and since particle-based methods have compressibility built-in, such methods are ideal numerical means for simulating compressible flow through local constrictions.

Additional particle-based methods applied to blood flow studies in microvessels, for which deformable particles are modeled separately from the fluid in which they are suspended, include simulations with MPC [28,29] and DPD [30–33]. In [32], a Y-shaped bifurcation is considered, and [29] consider a complex flow domain. The simulations in this paper differ from these references in that the MPC fluid in this paper has point particles that neither deform nor aggregate, and there is only one type of particle in the system.

In this paper, the Karman–Pohlhausen method is used to develop the axial velocity distribution for steady, Newtonian flow through a stenosed vessel, allowing for slip at the wall, as well as compressibility. The analysis is a natural extension of [1] and an improvement to the results given in [8,9]. The flow geometry considered is axisymmetric, but asymmetric about the location of maximum constriction. Effects of compressibility, slip and flow geometry are assessed. Numerical results for flow through the same geometry using multiparticle collision (MPC) dynamics are also obtained and compared to the analytical solution.

#### 2. Multiparticle Collision Dynamics

The particle system contains N identical point particles of unit mass that are distributed uniformly over cells on a regular three-dimensional lattice. Each cell, ξ, contains n particles on average. At discrete time intervals, Δt, the continuous positions, **r**<sub>i</sub>, and velocities, **v**<sub>i</sub> (i = 1, ..., N), are updated according to the multiparticle collision (MPC) dynamics originally developed in [34]. To ensure Galilean invariance, a random grid shift is implemented prior to each collision step, as first introduced in [35]. The idealized collisions of the MPC algorithm then update the velocity of particle i according to:

$$\mathbf{v}\_{i} \rightarrow \mathbf{V}\_{\xi} + \hat{\omega}\_{\xi} (\mathbf{v}\_{i} - \mathbf{V}\_{\xi}) \tag{1}$$

where ω̂<sub>ξ</sub> is a stochastic rotation matrix that rotates the velocities by either +π/2 or −π/2 about a randomly chosen axis that varies from cell to cell and in time, and **V**<sub>ξ</sub> is the average velocity of all particles in cell ξ in the pre-collision state [34].
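As an illustration of Equation (1), the following minimal Python sketch applies the collision rule to the particles of a single cell. It is not the code used for the simulations reported here: the function names (`random_unit_vector`, `collide_cell`) are hypothetical, the random grid shift and the sorting of particles into cells are omitted, and the rotation by ±π/2 is written out with Rodrigues' rotation formula.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a rotation axis uniformly on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def collide_cell(velocities, rng):
    """Apply Equation (1) to all (unit-mass) particles of one cell."""
    n = len(velocities)
    if n == 0:
        return []
    # Pre-collision mean velocity of the cell, V_xi.
    mean = tuple(sum(v[k] for v in velocities) / n for k in range(3))
    axis = random_unit_vector(rng)
    sign = rng.choice((1.0, -1.0))  # rotation angle is +pi/2 or -pi/2
    out = []
    for v in velocities:
        rel = tuple(v[k] - mean[k] for k in range(3))
        par = sum(axis[k] * rel[k] for k in range(3))
        perp = cross(axis, rel)
        # Rodrigues' formula with cos(angle) = 0 and sin(angle) = sign.
        rot = tuple(axis[k] * par + sign * perp[k] for k in range(3))
        out.append(tuple(mean[k] + rot[k] for k in range(3)))
    return out
```

Because the rotation acts only on the velocities relative to the cell mean, the sketch leaves the momentum and the kinetic energy of the cell unchanged.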

Next, a constant external force accelerates the post-collision velocity of particle i in the z-direction according to:

$$
v\_z^i \to v\_z^i + g\Delta t\tag{2}$$

where v<sub>z</sub><sup>i</sup> is the z-component of the velocity of particle i and g is the acceleration value.

To simulate isothermal flow conditions, a thermostat is applied to the system, so as to remove the energy that the external force pumps into the system. The velocity of each particle is rescaled according to a profile-unbiased Galilean invariant thermostat first introduced by [36], the details of which can be found in [8,9,16].

Finally, free-streaming of the particles updates the positions according to:

$$
\mathbf{r}\_i \to \mathbf{r}\_i + \mathbf{v}\_i \Delta t \tag{3}
$$

where the velocity here is the velocity after the collision, acceleration and thermostatting steps have taken place.

#### *2.1. Boundary Conditions*

Periodic boundary conditions are applied in the z-direction, and collisions with the cylinder walls follow the generalized boundary condition [8,9,16,37,38]:

$$\mathbf{v}\_n \to -\mathbf{v}\_n \tag{4}$$

$$\mathbf{v}\_t \to (2\lambda - 1)\mathbf{v}\_t \tag{5}$$

which is capable of incorporating macroscopic slip by means of changing the value of λ ∈ [0, 1]. No-slip flow is obtained with the λ = 0 bounce-back rule, while elastic collisions (λ = 1) would result in uniform flow through the pipe. For our simulations, we use λ ∈ [0, 0.5].
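The wall rule of Equations (4) and (5) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `wall_collision` and the `normal` argument (the local wall normal at the point of impact, which is not purely radial inside the constriction) are assumptions, and locating the impact point on the wall is not shown.

```python
import math


def wall_collision(v, normal, lam):
    """Return the post-collision velocity for Equations (4)-(5).

    v      : pre-collision velocity (3-tuple)
    normal : any vector along the local wall normal (need not be unit length)
    lam    : slip parameter, lambda in [0, 1]
    """
    norm = math.sqrt(sum(c * c for c in normal))
    n_hat = tuple(c / norm for c in normal)
    vn = sum(v[k] * n_hat[k] for k in range(3))
    v_n = tuple(vn * n_hat[k] for k in range(3))   # normal component
    v_t = tuple(v[k] - v_n[k] for k in range(3))   # tangential component
    # v_n -> -v_n and v_t -> (2*lam - 1) * v_t
    return tuple(-v_n[k] + (2.0 * lam - 1.0) * v_t[k] for k in range(3))
```

With λ = 0 the whole velocity is reversed (bounce-back), with λ = 1 only the normal component is reversed, and λ = 0.5 removes the tangential component entirely.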

In order to compare the particle-based method with the analytical results, the particle-system is subjected to a cumulative averaging procedure as outlined in [16], where it was found that the averaging method is ideal for determining the macroscopic velocity profile for MPC flows.

Theoretical expressions for the viscosity coefficient of an MPC flow have been developed, and it has been shown that for our choice of ω̂:

$$
\mu = \mu\_{kin} + \mu\_{coll} \tag{6}
$$

where:

$$
\mu\_{kin} = \left(\frac{nk\_BT}{m(\Delta x)^3}\right)\Delta t \left[\frac{5n}{6(n-1+e^{-n})} - \frac{1}{2}\right] \tag{7}
$$

$$
\mu\_{coll} = \frac{m}{18\Delta x \Delta t} (n - 1 + e^{-n}) \tag{8}
$$

and k<sub>B</sub> is the Boltzmann constant, T the system temperature, Δx the length of a cubic cell in the lattice and n the average number of particles in a cell [34,35,39–43].
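For orientation, Equations (6)–(8) can be evaluated directly. The short Python sketch below does this with a hypothetical helper `mpc_viscosity`; the default arguments are the dimensionless values quoted for the simulations in this paper (k<sub>B</sub>T = 1, m = 1, Δx = 1), and the formulas are transcribed as written above.

```python
import math


def mpc_viscosity(n, dt, kBT=1.0, m=1.0, dx=1.0):
    """Evaluate Equations (7), (8) and (6); returns (mu_kin, mu_coll, mu)."""
    occupancy = n - 1.0 + math.exp(-n)
    mu_kin = (n * kBT / (m * dx ** 3)) * dt * (5.0 * n / (6.0 * occupancy) - 0.5)
    mu_coll = m / (18.0 * dx * dt) * occupancy
    return mu_kin, mu_coll, mu_kin + mu_coll


# Example with n = 20 particles per cell and time step dt = 1.
mu_kin, mu_coll, mu = mpc_viscosity(n=20, dt=1.0)
```

For n = 20 and Δt = 1 the kinetic contribution (about 7.54) dominates the collisional one (about 1.06), giving μ of roughly 8.60 in these units.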

#### 3. Theoretical Analysis

The governing equations of motion for a compressible, isothermal, viscous flow of an ideal gas are given by:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0, \quad \text{(conservation of mass)} \tag{9}$$

$$
\rho \frac{D}{Dt} \mathbf{u} = -\nabla P + \rho \mathbf{f} + \mu \nabla^2 \mathbf{u} + \left(\kappa - \frac{2}{3}\mu\right) \nabla(\nabla \cdot \mathbf{u}), \quad \text{(conservation of momentum)} \tag{10}
$$

$$P = \frac{k\_B T}{m} \rho,\quad \text{(equation of state)}\tag{11}$$

where ρ is the density, t is time, D/Dt = ∂/∂t + **u** · ∇ is the material derivative, **u** is the velocity vector, P is the pressure, **f** corresponds to an external force, μ is the viscosity, κ is the bulk viscosity, m is the mass of the fluid particle, k<sub>B</sub> is the Boltzmann constant and T is the constant fluid temperature.

Assuming steady-state and axisymmetry, the velocity vector in cylindrical coordinates is assumed to have the form:

$$\mathbf{u} = (u\_r, u\_\theta, u\_z) = (u(r, z), 0, w(r, z)) \tag{12}$$

together with ρ = ρ(r, z). Under the Stokes assumption (κ = 0), the governing equations, with an external force in the form **f** = (f<sub>r</sub>, f<sub>θ</sub>, f<sub>z</sub>) = (0, 0, ρg) become:

$$\frac{\partial}{\partial r}(\rho u) + \frac{\partial}{\partial z}(\rho w) + \frac{\rho u}{r} = 0, \quad \text{(mass)} \tag{13}$$

$$\begin{split} \rho \left( u \frac{\partial u}{\partial r} + w \frac{\partial u}{\partial z} \right) &= -\frac{\partial P}{\partial r} + \mu \left( \frac{\partial^2 u}{\partial r^2} + \frac{1}{r} \frac{\partial u}{\partial r} + \frac{\partial^2 u}{\partial z^2} - \frac{u}{r^2} \right) \\ &\quad + \frac{\mu}{3} \frac{\partial}{\partial r} (\nabla \cdot \mathbf{v}), \quad \text{($r$-momentum)} \end{split} \tag{14}$$

$$\begin{split} \rho \left( u \frac{\partial w}{\partial r} + w \frac{\partial w}{\partial z} \right) &= \rho g - \frac{\partial P}{\partial z} + \mu \left( \frac{\partial^2 w}{\partial r^2} + \frac{1}{r} \frac{\partial w}{\partial r} + \frac{\partial^2 w}{\partial z^2} \right) \\ &\quad + \frac{\mu}{3} \frac{\partial}{\partial z} (\nabla \cdot \mathbf{v}), \quad \text{($z$-momentum)} \end{split} \tag{15}$$

$$P(r,z) = \frac{k\_B T}{m} \rho(r,z), \quad \text{(equation of state)}\tag{16}$$

where:

$$\nabla \cdot \mathbf{v} = \frac{u}{r} + \frac{\partial u}{\partial r} + \frac{\partial w}{\partial z} \tag{17}$$

and the θ-momentum equation is identically satisfied.

As per [1], for a mild stenosis geometry, the r-momentum Equation (14) can be approximated as ∂P/∂r = 0, in which case, Equation (16) implies ρ = ρ(z), which can be used in Equation (13) to give:

$$\frac{u}{r} + \frac{\partial u}{\partial r} = -\frac{1}{\rho} \frac{\partial}{\partial z} (\rho w) \tag{18}$$

Using this in the last term of Equation (15), together with the assumption that u ∂w/∂r ≪ w ∂w/∂z, allows us to write the system for determining w(r, z) and P(z) as:

$$\begin{split} \rho w \frac{\partial w}{\partial z} &= \rho g - \frac{dP}{dz} + \mu \left( \frac{\partial^2 w}{\partial r^2} + \frac{1}{r} \frac{\partial w}{\partial r} + \frac{4}{3} \frac{\partial^2 w}{\partial z^2} \right) \\ &\quad - \frac{\mu}{3} \frac{\partial}{\partial z} \left( \frac{1}{\rho} \frac{\partial}{\partial z} (\rho w) \right) \end{split} \tag{19}$$

$$P(z) = \frac{k\_B T}{m} \rho(z) \tag{20}$$
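The factor 4/3 in Equation (19) can be traced in two short steps; the following worked lines are a reader's check using only Equations (15), (17) and (18). Inserting Equation (18) into Equation (17) gives:

$$\nabla \cdot \mathbf{v} = \frac{\partial w}{\partial z} - \frac{1}{\rho} \frac{\partial}{\partial z} (\rho w)$$

so that the last term of Equation (15) becomes:

$$\frac{\mu}{3} \frac{\partial}{\partial z} (\nabla \cdot \mathbf{v}) = \frac{\mu}{3} \frac{\partial^2 w}{\partial z^2} - \frac{\mu}{3} \frac{\partial}{\partial z} \left( \frac{1}{\rho} \frac{\partial}{\partial z} (\rho w) \right)$$

The first term on the right combines with the μ ∂²w/∂z² already present in Equation (15) to give (4/3) μ ∂²w/∂z², while the second term is the final term of Equation (19).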

Following [1], we now assume that the radial dependence of the axial velocity, w, is a fourth-order polynomial in the form:

$$\frac{w}{W} = A\eta + B\eta^2 + C\eta^3 + D\eta^4 + E \tag{21}$$

where η = (R − r)/R, and W = W(z) is the as yet unknown centerline velocity. Constants A to E are determined by imposing:


Condition (i) follows from solving **u** · **n** = 0 (the vanishing normal component of velocity) and **u** · **t** = w<sub>s</sub> (the tangential component of velocity is w<sub>s</sub>) for w, while (iv) comes from the velocity profile:

$$w^{poi}(r) = (W - w\_s) \left[ 1 - \frac{r^2}{R^2} \right] + w\_s \tag{22}$$

which is Poiseuille flow in an unconstricted tube with slip, w<sub>s</sub>, at the wall (r = R), and W is the centerline velocity.
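As a quick worked check of Equation (22), and assuming a density that is uniform over the cross-section, the mean axial velocity of this profile (written here as W with an overbar) follows from a single integral:

$$\overline{W} = \frac{2}{R^2} \int\_0^R w^{poi}(r) \, r \, dr = \frac{W - w\_s}{2} + w\_s = \frac{W + w\_s}{2}$$

so that the ratio of centerline to mean velocity is:

$$\frac{W}{\overline{W}} = 2 - \frac{w\_s}{\overline{W}}$$

which reduces to the familiar factor of two for no-slip Poiseuille flow (w<sub>s</sub> = 0).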

Imposing (i)–(v) and solving for the unknown constants gives:

$$A = \frac{1}{7} \left( -\lambda + 10 - 12E + T + 2\frac{w\_s}{W} \right) \tag{23}$$

$$B = \frac{1}{7} \left( 3\lambda + 5 - 6E - 3T + \frac{w\_s}{W} \right) \tag{24}$$

$$C = \frac{1}{7} \left( -3\lambda - 12 + 20E + 3T - 8\frac{w\_s}{W} \right) \tag{25}$$

$$D = \frac{1}{7} \left( \lambda + 4 - 9E - T + 5 \frac{w\_s}{W} \right) \tag{26}$$

$$E = \frac{w\_s}{W\sqrt{1 + R'^2}}\tag{27}$$

where:

$$
\lambda = \frac{R^2}{\mu W} \frac{dP}{dz} \tag{28}
$$

and:

$$T = \frac{\rho g R^2}{\mu W} \tag{29}$$

By definition, the flow rate is given by:

$$Q = \pi \rho R^2 \overline{W} = \int\_0^R 2\pi \rho wr \, dr \tag{30}$$

Substituting Equation (21) for w, using Equations (23)–(27) and solving for W in terms of the mean velocity W̄ gives the relationship:

$$W = \frac{210}{97}\overline{W} + \frac{2}{97}\frac{R^2}{\mu}\frac{dP}{dz} - \frac{2}{97}\frac{R^2\rho g}{\mu} - \frac{11}{97}w\_s - \frac{102}{97}\frac{w\_s}{\sqrt{1+R'^2}}\tag{31}$$
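The numerical factors in Equation (31) can be checked with exact rational arithmetic. The Python sketch below is a reader's verification, not part of the original derivation. It assumes the profile is written as w/W = Aη + Bη² + Cη³ + Dη⁴ + E with η = 1 − r/R, and it represents each of A to D by its coefficients on the quantities (λ, 1, E, T, w<sub>s</sub>/W) taken from Equations (23)–(26); the names `COEFFS`, `WEIGHTS`, `centerline_sum` and `mean_over_centerline` are hypothetical.

```python
from fractions import Fraction as F

# Coefficients of (lambda, 1, E, T, w_s/W) in 7*A, 7*B, 7*C, 7*D.
COEFFS = {
    "A": (-1, 10, -12, 1, 2),
    "B": (3, 5, -6, -3, 1),
    "C": (-3, -12, 20, 3, -8),
    "D": (1, 4, -9, -1, 5),
}

# Integrals of eta^k * (1 - eta) over [0, 1] for k = 1, 2, 3, 4.
WEIGHTS = {"A": F(1, 6), "B": F(1, 12), "C": F(1, 20), "D": F(1, 30)}


def centerline_sum():
    """Coefficients of A + B + C + D; adding E must give w/W = 1 at eta = 1."""
    return tuple(sum(F(COEFFS[k][j], 7) for k in COEFFS) for j in range(5))


def mean_over_centerline():
    """Coefficients of Wbar/W = 2 * int_0^1 (w/W) (1 - eta) d(eta)."""
    vec = [2 * sum(WEIGHTS[k] * F(COEFFS[k][j], 7) for k in COEFFS)
           for j in range(5)]
    vec[2] += 1  # the constant term E contributes 2 * E * (1/2) = E
    return tuple(vec)
```

`mean_over_centerline()` returns (−2, 97, 102, 2, 11)/210 on (λ, 1, E, T, w<sub>s</sub>/W); multiplying through by 210 W/97 and using Equations (27)–(29) reproduces the factors 210/97, 2/97, 102/97, 2/97 and 11/97 of Equation (31).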

The details pertaining to the next step involving the derivation of the equation for dP/dz are outlined in Appendix A. The result is:

$$\begin{aligned} \frac{R^2}{\mu \overline{W}} \frac{dP}{dz} &\left(1 - \frac{388}{225} Ma^2 + \frac{97}{225} \frac{w\_s}{\overline{W}} Ma^2 - \frac{194}{225} \frac{w\_s}{\overline{W}} \frac{R'}{\sqrt{1 + R'^2}} \frac{Ma^2}{Re}\right) = \frac{388}{225} R' Re + \frac{gR}{\overline{W}^2} Re \\ &- 8 - \frac{8}{25} \frac{w\_s}{\overline{W}} - \frac{97}{225} \frac{w\_s^2}{\overline{W}^2} R' Re + \frac{w\_s}{\overline{W}} \frac{1}{\sqrt{1 + R'^2}} \left(\frac{208}{25} - \frac{194}{75} \frac{d}{dz} (RR')\right) \\ &+ \frac{97}{75} \frac{R'}{1 + R'^2} \frac{w\_s^2}{\overline{W}^2} Re - \frac{388}{75} RR' \frac{w\_s}{\overline{W}} \frac{d}{dz} (1 + R'^2)^{-1/2} \end{aligned} \tag{32}$$

where we have defined the local Reynolds and Mach numbers as:

$$Re = \frac{\rho \overline{W} R}{\mu} \tag{33}$$

and:

$$Ma = \frac{\overline{W}}{\sqrt{\frac{k\_B T}{m}}}\tag{34}$$

respectively.

Finally, substitution of Equations (31) and (32) in Equation (21) and subsequent simplification gives the axial velocity as:

$$\frac{w(\eta, z)}{\overline{W}} = \frac{G\eta + H\eta^2 + I\eta^3 + J\eta^4}{\left(1 - \frac{388}{225}Ma^2 + \frac{97}{225}\frac{w\_s}{\overline{W}}Ma^2 - \frac{194}{225}\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1 + R'^2}}\frac{Ma^2}{Re}\right)} + K \tag{35}$$

where G, H, I, J and K are given in Appendix B.

Substituting η = 1 and simplifying gives the centerline velocity as:

$$\begin{aligned} &\left[\frac{w(\eta=1,z)}{\overline{W}} - \frac{w\_s}{\overline{W}\sqrt{1+R'^2}}\right] \left(1 - \frac{388}{225}Ma^2 + \frac{97}{225}\frac{w\_s}{\overline{W}} Ma^2 - \frac{194}{225}\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}}\frac{Ma^2}{Re}\right) \\ &= 2 + Re\frac{dR}{dz}\left[\frac{8}{225} - \frac{2}{225}\frac{w\_s^2}{\overline{W}^2} + \frac{6}{225}\frac{1}{1+R'^2}\frac{w\_s^2}{\overline{W}^2}\right] \\ &\quad + \frac{1}{75}\frac{w\_s}{\overline{W}} \left[-9 - \frac{141}{\sqrt{1+R'^2}} - \frac{4}{\sqrt{1+R'^2}}(R'^2 + RR'') + 8\frac{RR'^2R''}{(1+R'^2)^{3/2}}\right] \\ &\quad + Ma^2 \Bigg[-\frac{56}{15} + \frac{2}{225}\frac{gR}{\overline{W}^2}\left(4Re - \frac{w\_s}{\overline{W}}Re + 2\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}}\right) \\ &\qquad + \frac{1}{225}\frac{w\_s}{\overline{W}} \left(254 - 11\frac{w\_s}{\overline{W}} + \frac{796}{\sqrt{1+R'^2}} - \frac{199}{\sqrt{1+R'^2}}\frac{w\_s}{\overline{W}}\right) \\ &\qquad + \frac{2}{225}\frac{1}{Re}\frac{w\_s}{\overline{W}} \left(-210\frac{R'}{\sqrt{1+R'^2}} + 11\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}} + 199\frac{w\_s}{\overline{W}}\frac{R'}{1+R'^2}\right)\Bigg] \end{aligned} \tag{36}$$

Note that substituting Ma = 0 and dR/dz = 0 leads to W/W̄ = 2 − w<sub>s</sub>/W̄, which agrees with Equation (A9) for w = w<sup>poi</sup>, as it should, and that substitution of Ma = 0 and w<sub>s</sub> = 0 for dR/dz ≠ 0 in the above solution gives Forrester and Young's [1] result for incompressible no-slip flow.


#### 4. Equation for Density

In order to plot the velocity profile obtained in the previous section, the explicit solution for ρ(z) has to be found, since Re and Ma depend on ρ(z), due to their local nature. To achieve this, the ideal gas equation of state, Equation (20), can be used to replace pressure terms with density in Equation (A2), while constant flow rate can be used to replace local Re and Ma numbers with upstream values and ρ(z) terms. Specifically, constant flow rate implies (see Equation (30)):

$$
\overline{W} = \frac{\overline{W}\_{0} \rho\_{0} R\_{0}^{2}}{\rho R^{2}} \tag{37}
$$

where the zero subscript indicates constant upstream values in the unconstricted portion of the cylinder. Thus:

$$\begin{aligned} Re &= \frac{\rho \overline{W} R}{\mu} = \frac{\rho \overline{W}\_{0} \rho\_{0} R\_{0}^{2} R}{\rho R^{2} \mu} \\ &= Re\_{0} \frac{R\_{0}}{R} \end{aligned} \tag{38}$$

and:

$$\begin{split} Ma &= \frac{\overline{W}}{\sqrt{\frac{k\_B T}{m}}} = \frac{\overline{W}\_0 \rho\_0 R\_0^2}{\rho R^2 \sqrt{\frac{k\_B T}{m}}} \\ &= Ma\_0 \frac{\rho\_0}{\rho} \left( \frac{R\_0}{R} \right)^2 \end{split} \tag{39}$$

Lastly, the dimensionless slip velocity can be written as:

$$\frac{w\_s}{\overline{W}} = \frac{w\_s}{\overline{W}\_0} \left(\frac{R}{R\_0}\right)^2 \frac{\rho}{\rho\_0} \tag{40}$$
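Equations (38)–(40) amount to a simple rescaling of the upstream values, which can be sketched as a small Python helper. The function name `local_numbers` and the argument `slip0` (standing for the upstream ratio of w<sub>s</sub> to the mean velocity) are hypothetical names introduced for this illustration.

```python
def local_numbers(rho, R, rho0, R0, Re0, Ma0, slip0):
    """Local Re, Ma and dimensionless slip from upstream values.

    rho, R   : local density and radius
    rho0, R0 : upstream density and radius
    Re0, Ma0 : upstream Reynolds and Mach numbers
    slip0    : upstream dimensionless slip, w_s / Wbar_0
    """
    Re = Re0 * R0 / R                          # Equation (38)
    Ma = Ma0 * (rho0 / rho) * (R0 / R) ** 2    # Equation (39)
    slip = slip0 * (R / R0) ** 2 * rho / rho0  # Equation (40)
    return Re, Ma, slip
```

Note that the product of the local Mach number and the local dimensionless slip is independent of position, since both are built from the same mean velocity.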

It follows that the pressure equation can be written in terms of ρ(z) using the equation of state, Equation (20), giving:

$$\begin{aligned} &\frac{R\_0^2}{\mu \overline{W}\_0} \left(\frac{R}{R\_0}\right)^4 \frac{\rho}{\rho\_0} \frac{k\_B T}{m} \frac{d\rho}{dz} \Bigg[1 - \frac{388}{225} Ma\_0^2 \left(\frac{\rho\_0}{\rho}\right)^2 \left(\frac{R\_0}{R}\right)^4 + \frac{97}{225} \frac{w\_s}{\overline{W}\_0} Ma\_0^2 \frac{\rho\_0}{\rho} \left(\frac{R\_0}{R}\right)^2 \\ &\quad - \frac{194}{225} \frac{w\_s}{\overline{W}\_0} \frac{R'}{\sqrt{1 + R'^2}} \frac{Ma\_0^2}{Re\_0} \frac{\rho\_0}{\rho} \frac{R\_0}{R}\Bigg] = \frac{388}{225} R' Re + \frac{gR}{\overline{W}^2} Re \\ &\quad - 8 - \frac{8}{25} \frac{w\_s}{\overline{W}} - \frac{97}{225} \frac{w\_s^2}{\overline{W}^2} R' Re + \frac{w\_s}{\overline{W}} \frac{1}{\sqrt{1 + R'^2}} \left(\frac{208}{25} - \frac{194}{75} \frac{d}{dz}(RR')\right) \\ &\quad + \frac{97}{75} \frac{R'}{1 + R'^2} \frac{w\_s^2}{\overline{W}^2} Re - \frac{388}{75} RR' \frac{w\_s}{\overline{W}} \frac{d}{dz} (1 + R'^2)^{-1/2} \end{aligned} \tag{41}$$

where Re and Ma must be written in terms of Re<sub>0</sub>, Ma<sub>0</sub> and ρ as given by Equations (38)–(40).

#### 5. Flow Geometry

In order to be able to consider an asymmetric stenosis, the radius is taken to have the idealized polynomial form:

$$R(z) \quad = \begin{cases} R\_0, & \text{for } z \le z\_1 \\ az^3 + bz^2 + cz + d, & \text{for } z\_1 \le z \le z\_2 \\ ez^3 + fz^2 + gz + h, & \text{for } z\_2 \le z \le z\_3 \\ R\_0 & \text{for } z \ge z\_3 \end{cases} \tag{42}$$

where z<sub>2</sub> = z<sub>1</sub> + l<sub>1</sub> and z<sub>3</sub> = z<sub>2</sub> + l<sub>2</sub>.

Imposing that R(z) be continuously differentiable and that R(z<sub>2</sub>) = R<sub>0</sub> − δ and R′(z<sub>2</sub>) = 0 requires:

$$a = \frac{2\delta}{l\_1^3} \tag{43}$$

$$b = -\frac{3\delta(2z\_1 + l\_1)}{l\_1^3} \tag{44}$$

$$c = \frac{6\delta z\_1 (z\_1 + l\_1)}{l\_1^3} \tag{45}$$

$$d = -\frac{2\delta z\_1^3 + 3\delta z\_1^2 l\_1 - R\_0 l\_1^3}{l\_1^3} \tag{46}$$

$$e = -\frac{2\delta}{l\_2^3} \tag{47}$$

$$f = \frac{3\delta(2z\_1 + 2l\_1 + l\_2)}{l\_2^3} \tag{48}$$

$$g = -\frac{6\delta(z\_1 + l\_1)(z\_1 + l\_1 + l\_2)}{l\_2^3} \tag{49}$$

$$h = \frac{3\delta l\_2(z\_1^2 + l\_1^2) + 6\delta z\_1 l\_1 (z\_1 + l\_1 + l\_2) + 2\delta (z\_1^3 + l\_1^3) + (R\_0 - \delta)l\_2^3}{l\_2^3} \tag{50}$$

The resulting axisymmetric flow domain is shown in Figure 1. As can be seen from the figure, by construction, δ controls the severity of the constriction, while l<sub>1</sub> can be used to create the asymmetry about the z<sub>2</sub> location.

Figure 1. Flow geometry.

For all results that follow, R<sub>0</sub> = 10.5, z<sub>1</sub> = 600.5 and l<sub>1</sub> + l<sub>2</sub> = 30.
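The geometry of Equation (42) with the coefficients of Equations (43)–(50) can be written compactly in Python. The sketch below is illustrative only; the factory name `make_radius` is hypothetical, and the polynomial coefficient `g` is local to the function and unrelated to the acceleration g used elsewhere in the paper.

```python
def make_radius(R0, delta, z1, l1, l2):
    """Return callables R(z) and dR(z) for the constricted tube radius."""
    z2 = z1 + l1
    z3 = z2 + l2
    # Upstream half of the constriction, Equations (43)-(46).
    a = 2 * delta / l1 ** 3
    b = -3 * delta * (2 * z1 + l1) / l1 ** 3
    c = 6 * delta * z1 * (z1 + l1) / l1 ** 3
    d = -(2 * delta * z1 ** 3 + 3 * delta * z1 ** 2 * l1 - R0 * l1 ** 3) / l1 ** 3
    # Downstream half of the constriction, Equations (47)-(50).
    e = -2 * delta / l2 ** 3
    f = 3 * delta * (2 * z1 + 2 * l1 + l2) / l2 ** 3
    g = -6 * delta * (z1 + l1) * (z1 + l1 + l2) / l2 ** 3
    h = (3 * delta * l2 * (z1 ** 2 + l1 ** 2)
         + 6 * delta * z1 * l1 * (z1 + l1 + l2)
         + 2 * delta * (z1 ** 3 + l1 ** 3)
         + (R0 - delta) * l2 ** 3) / l2 ** 3

    def R(z):
        if z <= z1 or z >= z3:
            return R0
        if z <= z2:
            return ((a * z + b) * z + c) * z + d
        return ((e * z + f) * z + g) * z + h

    def dR(z):
        if z <= z1 or z >= z3:
            return 0.0
        if z <= z2:
            return (3 * a * z + 2 * b) * z + c
        return (3 * e * z + 2 * f) * z + g

    return R, dR


# Example: the delta = 0.5, l1 = 20 constriction with R0 = 10.5, z1 = 600.5.
R, dR = make_radius(R0=10.5, delta=0.5, z1=600.5, l1=20.0, l2=10.0)
```

Evaluating `R` and `dR` at z<sub>1</sub>, z<sub>2</sub> and z<sub>3</sub> is a convenient way to confirm that the two cubic pieces join smoothly and that the minimum radius is R<sub>0</sub> − δ.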

#### 6. Numerical Results and Discussion

For all (dimensionless) MPC simulations that follow, there were approximately N = 8.5 million particles of unit mass m = 1 in the system, with Δx = Δy = Δz = 1; there were 1,200 cells in the z direction and 25 cells in each of the x and y directions. The time step was taken to be Δt = 1 and k<sub>B</sub>T = 1, together with n = 20. For the cumulative average, the averaging started after 5,000 time steps and was performed for 35,000 time steps thereafter. The initial system was set up with x and y velocities drawn from a Maxwellian velocity distribution, and z velocity drawn from the steady velocity profile of flow through a cylinder of fixed radius R<sub>0</sub>.

A length of 1,200 cells in the z-direction was chosen so as to ensure that periodic boundary conditions are valid. For this cylinder length, the velocity settled back to the expected parabolic profile in an unconstricted cylinder prior to reaching the exit for all constrictions considered here. In addition, since the velocity and density were found to be affected upstream in some simulations, starting the constriction at z = 600.5 ensured that there was a region upstream for which this effect was not present. Although some constrictions did not require a length of 1,200, this length was fixed for all simulations, so as to ensure that the most severe constriction with the highest Reynolds number would satisfy the periodic boundary condition.

The initial velocity distribution in the z direction was chosen, so as to reduce the simulation time. Test simulations (not reported here) were performed using a Maxwellian velocity distribution in all three directions as the initial state. The system maintained the Maxwellian velocity distribution in the x and y directions, and on average, the expected z velocity distribution that was later chosen as the initial state. In this way, the system reached equilibrium earlier, and the cumulative averaging could start after 5,000 time steps in all cases considered.


Table 1. Parameter values used in the analytical solution in Figure 2 for comparison with the particle-based method for compressible no-slip flow (λ = 0, w<sub>s</sub> = 0).

Table 2. Parameter values used in the analytical solution in Figure 3 for comparison with the particle-based method for compressible flow with slip (λ = 0.5).


Simulations were done using serial code on an Intel Xeon X5482 3.2 GHz machine with 8 GB RAM. Typical run times were 3–4 days.

To obtain the required upstream values for ρ<sub>0</sub>, the particle-based numerical results were averaged over the centerline density values for z ∈ [0, 100], and a best parabolic fit to the cross-section at z = 100.5 gave rise to the values for W<sub>0</sub> and w<sub>s</sub>, as provided in Tables 1 and 2. These values were then used to determine the density from numerical integration of Equation (41).
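One way to carry out such a parabolic fit is an ordinary least-squares fit of the form w(r) = c<sub>0</sub> + c<sub>2</sub>r², from which the centerline velocity is c<sub>0</sub> and the wall slip is the fitted value at r = R<sub>0</sub>. The Python sketch below illustrates this; the even-parabola form (no linear term, by axisymmetry), the function names `fit_parabola` and `upstream_values`, and the synthetic data are assumptions of this illustration, since the exact fitting procedure is not specified here.

```python
def fit_parabola(rs, ws):
    """Least-squares fit of w = c0 + c2 * r**2; returns (c0, c2)."""
    n = len(rs)
    x = [r * r for r in rs]
    sx = sum(x)
    sy = sum(ws)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, ws))
    c2 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c0 = (sy - c2 * sx) / n
    return c0, c2


def upstream_values(rs, ws, R0):
    """Centerline velocity W0 and wall slip w_s from the fitted parabola."""
    c0, c2 = fit_parabola(rs, ws)
    return c0, c0 + c2 * R0 * R0


# Synthetic check: sample Equation (22) with W = 2.0, w_s = 0.5, R0 = 10.5.
R0 = 10.5
radii = [0.5 + i for i in range(10)]
profile = [(2.0 - 0.5) * (1.0 - r * r / R0 ** 2) + 0.5 for r in radii]
W0_fit, ws_fit = upstream_values(radii, profile, R0)
```

Applied to noise-free samples of Equation (22), the fit returns the centerline velocity and slip that were used to generate them, which is a useful sanity check before applying it to averaged particle data.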

Figure 2. Comparison of analytical results with the particle-based method for variation in the Reynolds number in a constriction for which δ = 0.5, l<sub>1</sub> = 20, λ = 0 (no slip) and w<sub>s</sub> = 0. (a) Numerical and theoretically-predicted scaled centerline densities; and (b) corresponding numerical and analytical scaled centerline velocities. See also Table 1.

Figure 3. Comparison of analytical results with the particle-based method for variation in the Reynolds number in a constriction for which δ = 0.5, l<sub>1</sub> = 20, λ = 0.5 (slip). (a) Numerical and approximate scaled centerline densities; and (b) corresponding numerical and analytical scaled centerline velocities. See also Table 2.


It can be seen in Figure 4 that the bounce-back rule (MPC-BB, λ = 0) correctly leads to the expected zero velocity at the wall, while slip is clearly present in the MPC-LIT (λ = 0.5) case.

Figure 4. Cross-section velocity profile at various z locations far upstream of the constriction for λ = 0 (bounce-back, multiparticle collision (MPC)-BB) and for λ = 0.5 (loss-in-tangential, MPC-LIT) together with a best parabolic fit. MPC-BB correctly leads to the no-slip boundary condition, while MPC-LIT clearly has finite slip at the wall.

The differential equation for density was found to have a stable positive steady state, ρ<sub>equil</sub>, that differed slightly from the ρ<sub>0</sub> determined from the MPC results. The values have been added to the Tables, as well as the relative errors from ρ<sub>0</sub>. The density equation was solved numerically using the fourth-order Runge–Kutta scheme with Δz = 0.001 using MAPLE. Since the geometry is a piecewise defined function, the equation was solved one piece at a time, and instead of imposing ρ<sub>0</sub> as an initial condition at z = 0, ρ<sub>equil</sub> was used. The differential equation was then solved on [0, z<sub>1</sub>] with the value at z<sub>1</sub> becoming the initial condition for the differential equation on [z<sub>1</sub>, z<sub>2</sub>], and so on. In this way, the numerical solution was found for z ∈ [0, 1200]. Since the system has a steady state, the density settled back to the equilibrium value downstream of the constriction, thus ensuring that periodic boundary conditions are obtained in the analysis, allowing comparison with the MPC results.
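The piece-by-piece integration described above can be sketched in Python with a classical fourth-order Runge–Kutta step. This is not the MAPLE procedure used for the results: the names `rk4_step`, `integrate_piecewise` and `toy_rhs` are hypothetical, and `toy_rhs` is a deliberately simple stand-in with a stable steady state, not the right-hand side obtained from Equation (41).

```python
def rk4_step(f, z, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dz = f(z, y)."""
    k1 = f(z, y)
    k2 = f(z + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(z + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(z + h, y + h * k3)
    return y + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def integrate_piecewise(f, breakpoints, y0, h):
    """Integrate over consecutive pieces; each end value seeds the next piece."""
    z, y = breakpoints[0], y0
    zs, ys = [z], [y]
    for z_end in breakpoints[1:]:
        z_start = z
        steps = max(1, round((z_end - z_start) / h))
        dz = (z_end - z_start) / steps  # step adjusted to land on the breakpoint
        for i in range(steps):
            y = rk4_step(f, z_start + i * dz, y, dz)
            zs.append(z_start + (i + 1) * dz)
            ys.append(y)
        z = z_end
    return zs, ys


def toy_rhs(z, rho, rho_eq=20.0, rate=1.0):
    """Stand-in right-hand side (assumption): relaxation towards rho_eq."""
    return -rate * (rho - rho_eq)


# Starting away from the steady state, the solution relaxes back towards it.
zs, ys = integrate_piecewise(toy_rhs, [0.0, 1.0, 2.0, 3.0], 21.0, 0.01)
```

Starting the integration at the steady state, as done for ρ<sub>equil</sub> above, keeps the solution constant until the geometry changes, whereas any other starting value relaxes towards it.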

#### *6.1. Compressible No-Slip Flow*

In Figure 2a, a comparison of the theoretically-predicted centerline density arising from the numerical solution of Equation (41) is made with the particle-based MPC density results in the no-slip case. It can be seen that although there are some discrepancies between the predicted density curves and those obtained from the MPC simulations, both predict a density increase through the constriction, and the best agreement is found for the lowest Reynolds number considered (g = 0.005 curves). Worth noting in Table 1 is the increase in ρ<sub>0</sub> as Re<sub>0</sub> increases, which is consistent with the increase in ρ<sub>equil</sub>.

Using the theoretically-predicted density curves in the centerline velocity expression (36) gives rise to the theoretically-predicted centerline velocity curves in Figure 2b. It can be seen that the theoretically-predicted centerline velocity agrees fairly well with the MPC result for g = 0.005, but as the Reynolds number increases, the agreement worsens. Worth noting is the appearance of a dip in the centerline velocity in both the theoretically-predicted and MPC results as a result of the constriction for the largest Reynolds number considered (g = 0.02).

#### *6.2. Compressible Flow with Slip*

For compressible flow with slip at the wall, relevant parameter values arising from the theoretical and numerical results are shown in Table 2. Theoretical scaled centerline densities and centerline velocities are compared to MPC results in Figure 3. It can be seen in (a) of the figure that there is some discrepancy between the theoretically predicted and MPC density results, but that the agreement is somewhat better than in the no-slip case. Likely due to the better agreement between the density curves, the scaled centerline velocities agree better, as well, and the dip for the largest Reynolds number (g = 0.02) is slightly overestimated by the theoretical predictions, contrary to the no-slip case.

Worth noting here is that, although the density curves seem to match better in the slip case, glancing at Table 2, ρ<sub>0</sub> is found to increase as the Reynolds number increases, while the reverse is predicted with ρ<sub>equil</sub>.


Table 3. Parameter values used in the analytical solution in Figure 5 for comparison with particle-based method for compressible flow through constrictions of varying degrees.

#### *6.3. Effect of the Severity of the Constriction*

For the smallest Reynolds number considered (g = 0.005), the severity of the constriction is varied for both slip (λ = 0.5) and no-slip (λ = 0) flow. Corresponding parameter values are given in Table 3, and resulting scaled centerline velocity plots are shown in Figure 5. It can be seen that there is relatively good agreement between the theoretically-predicted curves and those from the MPC results for the mildest constriction (δ = 0.5) and that there is some discrepancy as the constriction becomes more severe. The appearance of a dip in the scaled centerline velocity for the more severe constrictions is captured in the slip case, while the decrease in scaled centerline velocity upstream of the constriction is found in the MPC no-slip results, but not in the theoretical predictions. On these same no-slip plots, the theoretical results predict a lower scaled centerline velocity in the post-constriction region, while MPC results do not show this feature.

Figure 5. Comparison of analytical results with the particle-based method as the severity of the constriction varies with g = 0.005, l<sup>1</sup> = 20. (a) Scaled centerline velocities for no-slip flow (λ = 0); (b) scaled centerline velocities for flow with slip (λ = 0.5). See also Table 3.

#### *6.4. Effect of Increasing Slip*

Increasing the slip parameter, λ, and, thus, the wall slip, ws, leads to Figure 6. Parameter values used in the analytical velocity profiles are provided in Table 4. There is very good agreement between the analytical and the numerical results as the slip is varied, and the equilibrium density values from the theoretical predictions agree well with the centerline densities obtained in the MPC results.

Table 4. Parameter values used in the analytical solution in Figure 6 for comparison with the particle-based method for compressible flow through a constriction with δ = 0.5, g = 0.005, l<sup>1</sup> = 20 and variable slip parameter values.


Figure 6. Comparison of analytical and numerical scaled centerline velocities for varying values of the wall slip through a constriction with g = 0.005, δ = 0.5 and l<sup>1</sup> = 20. See also Table 4.

#### *6.5. Contour Plot Comparison*

Figure 7 shows contour plots for the scaled centerline velocity for both the analytical and numerical particle-based method results for a constriction with δ = 2, g = 0.005, l<sup>1</sup> = 20 and λ = 0.5. For the analytical results, the values of the last row of Table 3 were used.

Figure 7. (Color online) Contour plots for the scaled velocity with δ = 2, λ = 0.5, l<sup>1</sup> = 20 and g = 0.005 for (a) the analytical results and (b) the particle-based method.

#### 7. Discussion and Conclusions

Approximate analytical solutions for the density and for the axial velocity distribution in an asymmetric constriction have been developed and compared to the numerical solution of a particle-based system governed by multiparticle collision (MPC) dynamics. The solutions in all cases correspond to compressible flow with slip at the cylinder wall. Reynolds numbers varied from approximately 4 to 17.

Analysis of results revealed that increasing the Reynolds number in a fixed geometry leads to the appearance of a dip in the scaled centerline velocity in the entry region of the constriction, together with more pronounced flow acceleration following the location of maximum constriction. This is true with and without slip. In addition, as the Reynolds number increases, there is an increase in scaled centerline density, ρ0, in all cases considered, except in the analytical results with slip that predict a decrease in centerline density (ρequil) instead. As the severity of the constriction increases, both slip and no-slip results show acceleration through the constriction, although the analytical and MPC results agree best for the mildest constriction (δ = 0.5) considered. Consistent with theory and MPC is the appearance of a dip in the scaled centerline velocity in the post-constriction region that is more pronounced as the severity of the constriction increases. This dip is, however, missing from the no-slip MPC results, which, instead, show a dip in the upstream section that is not captured in the theoretical predictions. Lastly, increasing slip has the effect of leading to faster flow through the constriction with the appearance of a dip in the post-constriction region that is consistent with the MPC results.

Figure 8. Comparison of W *versus* W<sub>approx</sub> = 2W̄ − w<sub>s</sub> for both (a) no-slip; and (b) slip. The best agreement is found for g = 0.005.

Since many key features compare well between the theoretically predicted results and those obtained by MPC, it is expected that improvements in the theory will lead to even better agreement in the constriction region and thereafter. In particular, an approximation was made for ∫<sub>0</sub><sup>R</sup> rw<sup>2</sup> dr, which led to some errors in the pressure equation and in all equations of the subsequent analysis. In Figure 8, plots of W and W<sub>approx</sub> = 2W̄ − w<sub>s</sub> can be found for the constrictions considered in Figures 2 and 3. It can be seen that the relationship in Equation (A9) holds for the smallest Reynolds number considered (g = 0.005) and fails to hold for the higher Reynolds numbers, more so for the no-slip case in (a). This is likely a key reason why the agreement between MPC and theory is worse for larger Reynolds numbers. Furthermore, all quadratic (dP/dz)<sup>2</sup> and second-order d<sup>2</sup>P/dz<sup>2</sup> terms were dropped in the analysis, which likely led to some errors, as well. It would be interesting to explore whether or not keeping such terms in the analysis leads to significant improvements over what was found here, and this is currently under investigation. An additional source of discrepancy between the results could be the use of a thermostat in the MPC simulations that is applied uniformly, rather than locally; whether or not using a local thermostat leads to better agreement with the theoretical results is also currently under investigation. A discussion on the use of thermostats in MPC simulations has been given in [43,44].

In summary, an analytical solution for the flow of a compressible Newtonian fluid with slip at the wall was developed and found to compare fairly well to a numerical solution for a particle-based fluid governed by MPC in mild constrictions with low Reynolds numbers. Various Reynolds numbers, Mach numbers, wall slip values and flow geometries were considered in the analysis for asymmetric flow domains.

#### Acknowledgments

This work was supported by an equipment grant from the Natural Sciences and Engineering Research Council of Canada, as well as grants from the same agency for graduate student support. Additionally, T. Akhter gratefully acknowledges funding support from her Ontario Graduate Scholarship.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


#### Appendix A

In this Appendix, the details of obtaining pressure Equation (A19) are shown.

To obtain an expression for dP/dz, we first integrate Equation (19) across the cylinder to get:

$$\frac{1}{2}\int\_{0}^{R}\rho r\frac{\partial}{\partial z}(w^{2})\,dr = \rho g\frac{R^{2}}{2} - \frac{dP}{dz}\frac{R^{2}}{2} + \mu R\left.\left(\frac{\partial w}{\partial r}\right)\right|\_{r=R} + \frac{4}{3}\mu \int\_{0}^{R}r\frac{\partial^{2}w}{\partial z^{2}}\,dr\tag{A1}$$

$$-\frac{\mu}{3} \int\_0^R r\frac{\partial}{\partial z} \left( \frac{1}{\rho} \frac{\partial}{\partial z} (\rho w) \right) dr \tag{A2}$$

Next, we divide by ρ = ρ(z) and take all z-derivatives outside of the integral using:

$$\frac{d}{dz}\int\_0^R h(r,z)dr = h(R,z)\frac{dR}{dz} + \int\_0^R \frac{d}{dz}h(r,z)dr\tag{A3}$$

to get:

$$\begin{aligned} &\frac{1}{2}\frac{d}{dz}\int\_0^R rw^2 dr - \frac{1}{2}\frac{RR'w\_s^2}{1+R'^2} = g\frac{R^2}{2} - \frac{1}{\rho}\frac{dP}{dz}\frac{R^2}{2} + \nu R\left(\frac{\partial w}{\partial r}\right)\bigg|\_{r=R} \\ &+ \frac{4}{3}\nu \left[\frac{d^2}{dz^2}\int\_0^R rw dr - \frac{w\_s}{\sqrt{1+R'^2}}\frac{d}{dz}(RR') - 2RR'w\_s\frac{d}{dz}(1+R'^2)^{-1/2}\right] \\ &+ \frac{\nu}{3\rho^2}\frac{d\rho}{dz}\frac{d}{dz}\int\_0^R r\rho w dr - \frac{\nu}{3\rho}\frac{d\rho}{dz}\frac{RR'w\_s}{\sqrt{1+R'^2}} - \frac{\nu}{3\rho}\frac{d^2}{dz^2}\int\_0^R r\rho w dr \\ &+ \frac{\nu}{3}\frac{w\_s}{\sqrt{1+R'^2}}\frac{d}{dz}(RR') + \frac{2}{3}\nu RR'w\_s\frac{d}{dz}(1+R'^2)^{-1/2} + \frac{2}{3}\frac{\nu}{\rho}\frac{d\rho}{dz}\frac{RR'w\_s}{\sqrt{1+R'^2}} \end{aligned} \tag{A4}$$

where ν = μ/ρ. Taking w ≈ w<sup>poi</sup> in the integral on the left-hand side gives:

$$\frac{1}{2}\frac{d}{dz}\int\_0^R rw^2 \, dr \quad \approx \quad \frac{1}{2}\frac{d}{dz}\int\_0^R r(w^{poi})^2 \, dr\tag{A5}$$

$$=\frac{1}{2}\frac{d}{dz}\left[\frac{R^2}{6}(W^2+Ww\_s+w\_s^2)\right] \tag{A6}$$

$$=\frac{d}{dz}\left[\frac{1}{3}R^2\overline{W}^2 - \frac{1}{6}R^2\overline{W}w\_s + \frac{1}{12}R^2w\_s^2\right] \tag{A7}$$

$$=\frac{d}{dz}\left[\frac{1}{3}\frac{Q^2}{\pi^2\rho^2R^2} - \frac{1}{6}\frac{Qw\_s}{\pi\rho} + \frac{1}{12}R^2w\_s^2\right] \tag{A8}$$

where:

$$
\overline{W} = \frac{1}{2}(W + w\_s) \tag{A9}
$$

has been used in Equation (A6) to replace W in terms of W̄, and Equation (30) has been used in Equation (A7) to replace W̄ in terms of Q. The relationship in Equation (A9) follows from using w<sup>poi</sup>, as given in Equation (22), in flow rate Equation (30). Although this relationship is exact for w = w<sup>poi</sup> in an unconstricted portion of the cylinder, it is also assumed to hold throughout the constriction, thus potentially giving rise to some error in the analysis.
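The algebra in Equations (A6), (A7) and (A9) can be checked numerically. The following is a minimal sketch, assuming that the Poiseuille-type profile with slip has the form w<sup>poi</sup> = w<sub>s</sub> + (W − w<sub>s</sub>)(1 − r<sup>2</sup>/R<sup>2</sup>); Equation (22) is not reproduced in this Appendix, so this form, the function names and the parameter values are illustrative assumptions only.

```python
import math

def w_poi(r, R, W, w_s):
    """Assumed slip-Poiseuille profile: W on the centerline, w_s at the wall."""
    return w_s + (W - w_s) * (1.0 - (r / R) ** 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

# Arbitrary illustrative values (not taken from the tables of the paper).
R, W, w_s = 1.5, 2.0, 0.4
W_bar = 0.5 * (W + w_s)  # Equation (A9)

# Integral appearing on the left-hand side of Equation (A5).
integral_rw2 = simpson(lambda r: r * w_poi(r, R, W, w_s) ** 2, 0.0, R)

# Bracketed expressions of Equations (A6) and (A7).
bracket_A6 = R**2 / 6.0 * (W**2 + W * w_s + w_s**2)
bracket_A7 = (R**2 * W_bar**2 / 3.0
              - R**2 * W_bar * w_s / 6.0
              + R**2 * w_s**2 / 12.0)

# Cross-sectional mean of the assumed profile, to compare with W_bar.
mean_w = 2.0 / R**2 * simpson(lambda r: r * w_poi(r, R, W, w_s), 0.0, R)
```

Under this assumed profile, `integral_rw2` agrees with `bracket_A6`, half of `bracket_A6` agrees with `bracket_A7`, and `mean_w` agrees with `W_bar`, which is the content of Equation (A9) in an unconstricted section.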

Now:

$$\frac{dQ}{dz} = \frac{d}{dz} \int\_0^R 2\pi\rho wr \, dr \quad \text{(from (30))}\tag{A10}$$

$$=2\pi\rho R\,w\big|\_{r=R}\frac{dR}{dz} + 2\pi\int\_0^R r\frac{\partial(\rho w)}{\partial z}\,dr\tag{A11}$$

$$=2\pi\rho R\,w\big|\_{r=R}\frac{dR}{dz} - 2\pi\int\_0^R r\rho\left(\frac{u}{r} + \frac{\partial u}{\partial r}\right)\,dr\quad\text{(using (13))}\tag{A12}$$

$$=2\pi\rho R\,w\big|\_{r=R}\frac{dR}{dz} - 2\pi\rho\int\_0^R \frac{\partial}{\partial r}(ur)\,dr\tag{A13}$$

$$=2\pi \rho R \, w\big|\_{r=R} \frac{dR}{dz} - 2\pi \rho R \, u\big|\_{r=R} \tag{A14}$$

$$\begin{split} &= 2\pi\rho R \frac{w\_s}{\sqrt{1+R'^2}} R' - 2\pi\rho R \frac{w\_s R'}{\sqrt{1+R'^2}} \\ &= 0. \end{split} \tag{A15}$$

Thus, Equation (A8) gives:

$$\begin{split} \frac{1}{2}\frac{d}{dz}\int\_{0}^{R}r w^{2} \, dr &\quad \approx \quad \frac{2Q}{3\pi^{2}\rho^{2}R^{2}} \left(\frac{dQ}{dz} - \frac{Q}{\rho}\frac{d\rho}{dz} - \frac{Q}{R}\frac{dR}{dz}\right) \\ &\quad - \frac{w\_{s}}{6\pi\rho} \left(\frac{dQ}{dz} - \frac{Q}{\rho}\frac{d\rho}{dz}\right) + \frac{1}{6}w\_{s}^{2}R\frac{dR}{dz} \\ &\quad = \quad -\frac{4}{3}R\overline{W}\,u\vert\_{r=R} - \frac{2}{3}R^{2}\overline{W}^{2}\frac{m}{\rho k\_{B}T}\frac{dP}{dz} - \frac{2}{3}\overline{W}^{2}R\frac{dR}{dz} \\ &\quad \quad + \frac{1}{3}Rw\_{s}\,u\vert\_{r=R} + \frac{1}{6}R^{2}\overline{W}w\_{s}\frac{m}{\rho k\_{B}T}\frac{dP}{dz} \\ &\quad + \frac{1}{6}R\frac{dR}{dz}w\_{s}^{2} \end{split} \tag{A18}$$

where we have also used equation of state Equation (16) to write dρ/dz in terms of dP/dz and flow rate Equation (30) to write Q in terms of W̄.

Substituting Equation (A18) in Equation (A4), writing all integrals in terms of Q and differentiating, noting that dQ/dz = 0, using (∂w/∂r)|<sub>r=R</sub> = −AW/R from Equation (21) together with Equations (23), (28), (29) and (31), and u|<sub>r=R</sub> = w<sub>s</sub>(dR/dz)/√(1 + R′<sup>2</sup>), gives:

$$\begin{aligned} \frac{R^2}{\mu \overline{W}} \frac{dP}{dz} \left(1 - \frac{388}{225} Ma^2 + \frac{97}{225} \frac{w\_s}{\overline{W}} Ma^2 - \frac{194}{225} \frac{w\_s}{\overline{W}} \frac{R'}{\sqrt{1 + R'^2}} \frac{Ma^2}{Re}\right) &= \frac{388}{225} R' Re + \frac{gR}{\overline{W}^2} Re \\\ -8 - \frac{8}{25} \frac{w\_s}{\overline{W}} - \frac{97}{225} \frac{w\_s^2}{\overline{W}^2} R' Re + \frac{w\_s}{\overline{W}} \frac{1}{\sqrt{1 + R'^2}} \left(\frac{208}{25} - \frac{194}{75} \frac{d}{dz} (RR')\right) \\\ + \frac{97}{75} \frac{R'}{1 + R'^2} \frac{w\_s^2}{\overline{W}^2} Re - \frac{388}{75} RR' \frac{w\_s}{\overline{W}} \frac{d}{dz} (1 + R'^2)^{-1/2} \end{aligned} \tag{A19}$$

where we have defined the local Reynolds and Mach numbers as:

$$Re\_{\perp} = \frac{\rho \overline{W} R}{\mu} \tag{A20}$$

and

$$Ma\_{\perp} = \frac{\overline{W}}{\sqrt{\frac{k\_B T}{m}}} \tag{A21}$$

respectively.

This is the pressure equation provided in Equation (32).
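For reference, the local dimensionless numbers of Equations (A20) and (A21) can be evaluated as in the following minimal sketch. The function and argument names are illustrative and not taken from the paper, and the sample values in the comments are hypothetical.

```python
import math

def local_reynolds(rho, w_bar, radius, mu):
    """Local Reynolds number, Equation (A20): Re = rho * W_bar * R / mu."""
    return rho * w_bar * radius / mu

def local_mach(w_bar, k_b_t, mass):
    """Local Mach number, Equation (A21): Ma = W_bar / sqrt(k_B T / m)."""
    return w_bar / math.sqrt(k_b_t / mass)

# Hypothetical values in reduced units (k_B T = 1, m = 1), for illustration only:
# local_reynolds(rho=10.0, w_bar=0.5, radius=8.0, mu=4.0)
# local_mach(w_bar=0.5, k_b_t=1.0, mass=1.0)
```

Both quantities vary along the channel, since W̄, R and ρ all depend on z.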

#### Appendix B

In this Appendix, we provide the coefficients of η in axial velocity Equation (35):

$$\begin{aligned} G &= 4 + Re\frac{dR}{dz}\left[-\frac{44}{225} + \frac{11}{225}\frac{w\_s^2}{\overline{W}^2} - \frac{33}{225}\frac{1}{1+R'^2}\frac{w\_s^2}{\overline{W}^2}\right] \\ &+ \frac{2}{75}\frac{w\_s}{\overline{W}}\left[6 - \frac{156}{\sqrt{1+R'^2}} + \frac{11}{\sqrt{1+R'^2}}(R'^2 + RR'') - 22\frac{RR'^2R''}{(1+R'^2)^{3/2}}\right] \\ &+ Ma^2\Bigg[-\frac{16}{3} + \frac{11}{225}\frac{gR}{\overline{W}^2}\left(-4Re + \frac{w\_s}{\overline{W}}Re - 2\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}}\right) \\ &+ \frac{4}{75}\frac{w\_s}{\overline{W}}\left(21 + \frac{w\_s}{\overline{W}} + \frac{104}{\sqrt{1+R'^2}} - \frac{26}{\sqrt{1+R'^2}}\frac{w\_s}{\overline{W}}\right) \\ &+ \frac{8}{75}\frac{1}{Re}\frac{w\_s}{\overline{W}}\left(-25\frac{R'}{\sqrt{1+R'^2}} - \frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}} + 26\frac{w\_s}{\overline{W}}\frac{R'}{1+R'^2}\right)\Bigg] \end{aligned} \tag{A22}$$

$$\begin{aligned} H &= -2 + \frac{43}{225}Re\frac{dR}{dz}\left[4 - \frac{w\_s^2}{\overline{W}^2} + \frac{3}{1+R'^2}\frac{w\_s^2}{\overline{W}^2}\right] \\ &+ \frac{2}{75}\frac{w\_s}{\overline{W}}\left[-3 + \frac{78}{\sqrt{1+R'^2}} - \frac{43}{\sqrt{1+R'^2}}(R'^2 + RR'') + 86\frac{RR'^2R''}{(1+R'^2)^{3/2}}\right] \\ &+ Ma^2\Bigg[-\frac{8}{3} + \frac{43}{225}\frac{gR}{\overline{W}^2}\left(4Re - \frac{w\_s}{\overline{W}}Re + 2\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}}\right) \\ &+ \frac{2}{75}\frac{w\_s}{\overline{W}}\left(21 + \frac{w\_s}{\overline{W}} + \frac{104}{\sqrt{1+R'^2}} - \frac{26}{\sqrt{1+R'^2}}\frac{w\_s}{\overline{W}}\right) \end{aligned} \tag{A23}$$

$$+\frac{4}{75} \frac{1}{Re} \frac{w\_s}{\overline{W}} \left( -25 \frac{R'}{\sqrt{1+R'^2}} - \frac{w\_s}{\overline{W}} \frac{R'}{\sqrt{1+R'^2}} + 26 \frac{w\_s}{\overline{W}} \frac{R'}{1+R'^2} \right)\Bigg] \tag{A24}$$

$$\begin{aligned} I &= Re\frac{dR}{dz}\left[-\frac{4}{5} + \frac{1}{5}\frac{w\_s^2}{\overline{W}^2} - \frac{3}{5}\frac{1}{1+R'^2}\frac{w\_s^2}{\overline{W}^2}\right] \\ &+ 2\frac{w\_s}{5\overline{W}}\left[-2 + \frac{2}{\sqrt{1+R'^2}} + \frac{3}{\sqrt{1+R'^2}}(R'^2 + RR'') - 6\frac{RR'^2R''}{(1+R'^2)^{3/2}}\right] \\ &+ Ma^2 \Bigg[\frac{32}{5} + \frac{1}{5}\frac{gR}{\overline{W}^2}\left(-4Re + \frac{w\_s}{\overline{W}}Re - 2\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}}\right) \\ &+ \frac{4}{225}\frac{w\_s}{\overline{W}}\left(2 - 23\frac{w\_s}{\overline{W}} - \frac{452}{\sqrt{1+R'^2}} + \frac{113}{\sqrt{1+R'^2}}\frac{w\_s}{\overline{W}}\right) \\ &+ \frac{8}{225}\frac{1}{Re}\frac{w\_s}{\overline{W}}\left(90\frac{R'}{\sqrt{1+R'^2}} + 23\frac{w\_s}{\overline{W}}\frac{R'}{\sqrt{1+R'^2}} - 113\frac{w\_s}{\overline{W}}\frac{R'}{1+R'^2}\right)\Bigg] \end{aligned} \tag{A26}$$

$$\begin{aligned} J &= Re \frac{dR}{dz} \left[ \frac{4}{15} - \frac{1}{15} \frac{w\_s^2}{\overline{W}^2} + \frac{3}{15} \frac{1}{1+R'^2} \frac{w\_s^2}{\overline{W}^2} \right] \\ &+ \frac{1}{5} \frac{w\_s}{\overline{W}} \left[ 3 - \frac{3}{\sqrt{1+R'^2}} + \frac{2}{\sqrt{1+R'^2}} (R'^2 + RR'') - 4 \frac{RR'^2R''}{(1+R'^2)^{3/2}} \right] \\ &+ Ma^2 \Bigg[ -\frac{32}{15} + \frac{1}{15} \frac{gR}{\overline{W}^2} \left( 4Re - \frac{w\_s}{\overline{W}} Re + 2 \frac{w\_s}{\overline{W}} \frac{R'}{\sqrt{1+R'^2}} \right) \\ &+ \frac{1}{75} \frac{w\_s}{\overline{W}} \left( -44 + 21 \frac{w\_s}{\overline{W}} + \frac{244}{\sqrt{1+R'^2}} - \frac{61}{\sqrt{1+R'^2}} \frac{w\_s}{\overline{W}} \right) \\ &+ \frac{2}{75} \frac{1}{Re} \frac{w\_s}{\overline{W}} \left( -40 \frac{R'}{\sqrt{1+R'^2}} - 21 \frac{w\_s}{\overline{W}} \frac{R'}{\sqrt{1+R'^2}} + 61 \frac{w\_s}{\overline{W}} \frac{R'}{1+R'^2} \right) \Bigg] \\ K &= \frac{w\_s}{W\sqrt{1+R'^2}} \end{aligned} \tag{A28}$$

Reprinted from *Entropy*. Cite as: Herman, A. Shear-Jamming in Two-Dimensional Granular Materials with Power-Law Grain-Size Distribution. *Entropy* 2013, *15*, 4802–4821.

#### *Article*

## Shear-Jamming in Two-Dimensional Granular Materials with Power-Law Grain-Size Distribution

#### Agnieszka Herman

Institute of Oceanography, University of Gdansk, Pilsudskiego 46, Gdynia 81-378, Poland; E-Mail: oceagah@ug.edu.pl; Tel.: +48-58-5236887; Fax: +48-58-5236678

*Received: 13 August 2013; in revised form: 28 October 2013 / Accepted: 31 October 2013 / Published: 5 November 2013*

Abstract: Although substantial progress has been made in recent years in research on sheared granular matter, relatively few studies concentrate on the behavior of materials with very strong polydispersity. In this paper, shear deformation of a two-dimensional granular material composed of frictional disk-shaped grains with power-law size distribution is analyzed numerically with a molecular-dynamics model. The analysis of the results concentrates on those aspects of the behavior of the modeled system that are related to its polydispersity. It is demonstrated that many important global material properties are dependent on the behavior of the largest grains from the tail of the size distribution. In particular, they are responsible for global correlation of velocity anomalies emerging at the jamming transition. They also build a skeleton of the global contact and force networks in shear-jammed systems, leading to the very open, "sparse" structure of those networks, consisting of only ∼35% of all grains. The details of the model are formulated so that it represents fragmented sea ice moving on a two-dimensional sea surface; however, the results are relevant for other types of strongly polydisperse granular materials, as well.

Keywords: granular materials; molecular-dynamics simulation; shear deformation; jamming phase transition; polydispersity; force networks

#### 1. Introduction

Granular materials are an example of systems in which relatively simple interactions between similar discrete objects (grains, or particles) produce very complex emergent behavior. Extensive experimental and numerical research on granular materials in recent years produced many important insights into the dynamics of those systems. One group of studies has concentrated on the jamming phase transition, revealing new details of the (relatively well understood) isotropic jamming (e.g., [1,2]), as well as the existence of previously unexplored jammed states in systems subject to shear strain [3–10]. However, the behavior of very strongly polydisperse materials in those settings remains very poorly understood. Most works, including those cited above, concentrate on materials with narrow grain-size distributions (GSD). How polydispersity influences the system dynamics close to and at the jamming phase transition remains an open question.

An example of a granular material with a very wide GSD is sea ice, especially close to the ice edge (the so-called marginal ice zone) or, more generally, in regions where, due to the action of wind, ocean surface waves and currents, the ice cover is fragmented into separate floes. A typical example of this ice cover type is shown in Figure 1. Because the vertical dimension (thickness) of the floes is much smaller than their horizontal dimension (diameter), sea ice can be regarded as two-dimensional (2D). The shape of the ice floes may vary from very irregular through polygonal to nearly circular, depending on the external forcing (especially waves) and the ice age and thickness. However, in most situations, the geometrical properties of the floes, like, e.g., the aspect ratio, remain within a relatively narrow range independently of the area and conditions of observation [11–13]. More importantly, the observed floe-size distributions (FSDs) are very wide and have power-law tails with an exponent α < 2 [11–16]. Although it is generally acknowledged that the granular nature of fragmented sea ice influences its dynamics (see, e.g., [17]), most large-scale sea ice models treat ice as a viscous-plastic continuum; our knowledge of how and when the processes taking place at a floe level influence the large-scale behavior of sea ice is very limited.

Figure 1. Fragment of a satellite image of fragmented sea ice in the marginal ice zone off the Antarctic Peninsula (source: Landsat [18]).

This work is a continuation of previous numerical studies on sea ice composed of disk-shaped floes with power-law size distribution [19–21]. It examines the behavior of a 2D polydisperse granular material composed of frictional grains under pure-shear deformation (constant packing fraction, or, in the sea-ice nomenclature, ice concentration A). The grains are placed on a frictional substrate (representing the ocean) and interact with each other by means of Hertzian contact forces. Although the details of the model are formulated so that it can represent sea ice moving on the sea surface, the results are relevant in a more general context of sheared, strongly polydisperse granular materials. Therefore, the specific sea-ice terminology is generally avoided in the rest of this paper, with the exception of some parts of the discussion in the last section.

The paper is structured as follows: the next section contains the description of the model—its assumptions, governing equations and numerical formulation. The results are presented and discussed in Section 3, with an emphasis on those aspects of the model behavior that are related to the polydispersity of the material. In particular, it is demonstrated that grains from the tail of the GSD play a crucial role in the development of the force and contact networks during the jamming phase transition and are responsible for the emergence of domain-wide correlations between velocity anomalies of individual grains. Finally, conclusions are formulated in Section 4.

#### 2. Model Description

#### *2.1. Model Equations*

The modeled system consists of i = 1, ..., N disk-shaped grains with radii r<sub>i</sub>, thickness h<sub>i</sub> and density ρ<sub>i</sub>, occupying a certain two-dimensional region, S. Let us denote the surface area, volume and mass of the i-th grain with S<sub>i</sub>, V<sub>i</sub> and m<sub>i</sub>, respectively. Obviously, m<sub>i</sub> = ρ<sub>i</sub>V<sub>i</sub> = πρ<sub>i</sub>h<sub>i</sub>r<sub>i</sub><sup>2</sup>. The grains move within S, due to both external forcing (e.g., friction against the underlying material) and interactions with neighboring grains. The external forcing acting on the individual grains can be expressed in terms of the density of the surface and body forces, denoted with **f̌**<sub>s,i</sub> and **f̌**<sub>b,i</sub>, respectively. The net interaction force acting on grain i at a given time instance, t, is a sum of all pairwise interaction forces with grains that are in contact with i at time t. The set of those grains will be denoted with C<sub>i</sub>(t). For j ∈ C<sub>i</sub>(t), **F̂**<sub>ij,n</sub> and **F̂**<sub>ij,t</sub> denote the normal and tangential components, respectively, of the grain-grain interaction force. Under these assumptions, the general form of the equations for the linear and angular momentum of the i-th grain is:

$$m\_i \frac{d\mathbf{u}\_i}{dt} = \int\_{S\_i} \mathbf{\check{f}}\_{s,i} \mathrm{d}s + \int\_{V\_i} \mathbf{\check{f}}\_{b,i} \mathrm{d}v + \sum\_{j \in \mathcal{C}\_i(t)} \mathbf{\hat{F}}\_{ij,n} \tag{1}$$

and:

$$m\_i \frac{r\_i^2}{2} \frac{d\omega\_i}{dt} = \mathbf{k} \cdot \left[ \int\_{S\_i} \mathbf{r} \times \mathbf{\check{f}}\_{s,i} \mathrm{d}s + \int\_{V\_i} \mathbf{r} \times \mathbf{\check{f}}\_{b,i} \mathrm{d}v + \sum\_{j \in \mathcal{C}\_i(t)} \mathbf{r}\_{ij} \times \mathbf{\hat{F}}\_{ij,t} \right] \tag{2}$$

where **u**<sub>i</sub> denotes the velocity of the grain's mass center, ω<sub>i</sub> its angular velocity, **k** a unit vector pointing vertically upward, **r** the horizontal position vector measured from the grain's center and **r**<sub>ij</sub> a vector pointing from the center of grain i to the contact point with grain j. The interaction forces are calculated based on the Hertzian contact model. The normal force, **F̂**<sub>ij,n</sub>, has two components: a contact force (dependent on the overlap between grains) and a damping force (dependent on the relative normal velocity between grains). The tangential force, **F̂**<sub>ij,t</sub>, has two components, as well, namely the shear force (the so-called "history effect" that accounts for the tangential displacement of the interacting grains during contact) and the damping force (dependent on the relative tangential velocity between grains). All four of those forces depend on the effective radius of grains i and j, r<sub>ij</sub> = r<sub>i</sub>r<sub>j</sub>/(r<sub>i</sub> + r<sub>j</sub>), and on their material properties, *i.e.*, the elastic modulus, E, and Poisson's ratio, ν, assumed constant for all grains. Details concerning the formulation of **F̂**<sub>ij,n</sub> and **F̂**<sub>ij,t</sub> can be found, e.g., in [22,23] and in the documentation of the numerical model (see below).

Further details concerning the model formulation are given in [21]. The previous works [19–21] stressed the importance of the size-dependent response of individual grains (ice floes) to the forcing acting on them, relevant at low and medium packing fractions (A  1). However, the focus of this paper is on a slow deformation of a compact material in or close to the jammed state. Therefore, the simulations described further were performed with a simplified set of equations, without the form-drag terms responsible for the size-dependent response (see [21] for comparison). Furthermore, it is assumed that the frictional substrate is at rest (*i.e.*, both the wind speed and the current speed are zero); the inertial effects (Coriolis term) are omitted, as well. Linearized formulae are used for the grain-substrate (ice-ocean) friction term. Thus, Equations (1) and (2) simplify to:

$$m\_i \frac{d\mathbf{u}\_i}{dt} = -\pi r\_i^2 C\_f \mathbf{u}\_i + \sum\_{j \in \mathcal{C}\_i(t)} \hat{\mathbf{F}}\_{ij,n} \tag{3}$$

and:

$$m\_i \frac{r\_i^2}{2} \frac{d\omega\_i}{dt} = -\pi \frac{r\_i^4}{2} C\_f \omega\_i + \mathbf{k} \cdot \sum\_{j \in \mathcal{C}\_i(t)} \mathbf{r}\_{ij} \times \hat{\mathbf{F}}\_{ij,t} \tag{4}$$

where C<sub>f</sub> denotes the friction coefficient (in the context of sea ice, C<sub>f</sub> = ρ<sub>w</sub>C<sub>hw</sub>, where ρ<sub>w</sub> is the water density and C<sub>hw</sub> is the water-ice drag coefficient).

As described in [21], the model is based on the LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) library [24,25], designed for simulating large systems of interacting objects (particles, molecules, *etc*.). For the purpose of sea ice modeling, LAMMPS has been extended to disk-shaped particles moving within two-dimensional domains. The Hertzian contact model, available in the official version of LAMMPS, but only for spherical particles, has been modified in order to account for non-spherical grain shape. The modification concerns the relationship between the overlap between the grains and the shape and size of the contact area between them, which, in turn, determines the resulting interaction force. In the case of spherical particles, the contact area is circular, in the case of cylindrical particles, rectangular.

For all N grains, Newton equations of motion (1) and (2) (or, in this particular case, (3) and (4)) are solved by means of the velocity-Verlet integrator.
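The time stepping can be illustrated for the simplest case of Equation (3): a single grain with no contacts, for which only the linear substrate friction acts. The following is a minimal sketch of a velocity-Verlet-type update, not the LAMMPS implementation. Evaluating the velocity-dependent friction at the half-step velocity is an assumption made here for illustration, and all names and parameter values are hypothetical.

```python
import math

def drag_force(u, r, c_f):
    """Linear substrate friction of Equation (3): -pi * r^2 * C_f * u."""
    return -math.pi * r**2 * c_f * u

def velocity_verlet_drag(u0, r, h, rho, c_f, dt, n_steps):
    """Integrate one contact-free grain in 1D; returns (position, velocity)."""
    m = math.pi * rho * h * r**2          # disk mass, m = pi * rho * h * r^2
    x, u = 0.0, u0
    f = drag_force(u, r, c_f)
    for _ in range(n_steps):
        u_half = u + 0.5 * dt * f / m     # first half-kick
        x += dt * u_half                  # drift
        f = drag_force(u_half, r, c_f)    # force from half-step velocity (assumption)
        u = u_half + 0.5 * dt * f / m     # second half-kick
    return x, u

def exact_velocity(u0, h, rho, c_f, t):
    """Analytical solution of m du/dt = -pi r^2 C_f u: exponential decay."""
    return u0 * math.exp(-c_f / (rho * h) * t)
```

For a contact-free grain, the decay rate C<sub>f</sub>/(ρh) is independent of the grain radius, so the numerical velocity can be compared directly with `exact_velocity` for any r.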

#### *2.2. Model Configuration and Simulations*

In the simulations described in this work, the model domain, S, was rectangular, with length L<sub>x</sub>, width L<sub>y</sub> = L<sub>x</sub>/2 and surface area S = L<sub>x</sub>L<sub>y</sub> = π Σ<sub>i=1</sub><sup>N</sup> r<sub>i</sub><sup>2</sup>/A. In isotropic-compression simulations, periodic boundary conditions were used in both the x and y directions; in pure-shear simulations, they were used only along the x-axis, with the grains along the lower model boundary defined as "frozen" (velocity set to zero throughout the simulation) and the grains along the upper model boundary moving with a prescribed velocity **u**<sub>i</sub> = [u<sub>b</sub>, 0].
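The relation between the grain radii, the packing fraction and the domain dimensions can be sketched as follows. This is only an illustration of the geometry described above; the function names are hypothetical, and the strain rate is taken here as the ratio of the boundary velocity to the domain width.

```python
import math

def domain_size(radii, packing_fraction, aspect=2.0):
    """Return (L_x, L_y) such that L_x * L_y = pi * sum(r_i^2) / A and L_x = aspect * L_y."""
    area = math.pi * sum(r * r for r in radii) / packing_fraction
    l_y = math.sqrt(area / aspect)
    return aspect * l_y, l_y

def shear_strain_rate(u_b, l_y):
    """Strain rate imposed by the moving upper boundary, u_b / L_y."""
    return u_b / l_y
```

For a fixed set of radii, lowering the packing fraction A enlarges the domain and therefore lowers the strain rate obtained for a given boundary velocity.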

A complete list of the model parameters can be found in Table 1. The simulations were conducted for grains with a power-law (PL) GSD, with the mean grain radius r̄ = 4.0 m and the slope of the distribution α = 1.8, a typical value observed in sea ice (see, e.g., [15,16]). The sample of N = 2 · 10<sup>4</sup> grain radii was generated with a maximum-likelihood method (e.g., [26], chapter 6.5), which provides an estimate of the most probable value of the i-th element in the rank-ordered sample of finite size N from a given distribution. Thus, deviations from a power law in the tail of the GSD, resulting from the finite sample size, are properly accounted for.


Table 1. Physical and numerical model parameters used in the simulations. GSD, grain-size distributions.

In order to better illustrate the role of the extreme polydispersity in systems with a PL GSD, additional simulations were performed with a narrow, bidisperse (BD) GSD, corresponding to that used by Bi and colleagues [7,8], *i.e.*, with the ratio of the radii of the coarse and fine fraction r<sub>1</sub>/r<sub>2</sub> = 1.16 and the respective numbers of grains n<sub>1</sub> = 0.2N and n<sub>2</sub> = 0.8N. The total number, N, and mean grain radius, r̄, were the same as in the model setup with the PL GSD, resulting in r<sub>1</sub> = 4.503 m, r<sub>2</sub> = 3.874 m. In the remaining parts of the paper, all results and comments relate to the PL-GSD simulations unless clearly stated otherwise. Experiments with PL GSD with other values of α from the range [1.5, 2.0] produced very similar results and will not be discussed here.

The simulations were performed in two stages: (i) uniform, biaxial compression up to the jamming phase transition; and (ii) pure shear for a set of combinations of packing fraction A and strain rate (the ratio u<sub>b</sub>/L<sub>y</sub>). The simulations of the second stage were initialized by sampling the results of the first stage at selected values of A and letting the system relax before applying shear strain.

#### 3. Results and Discussion

The general model behavior in uniform-compression simulations is described in [21]. The jamming phase transition in the analyzed case occurs at A<sub>J</sub> ≈ 0.918. It is accompanied by a rapid increase of the internal pressure, p, the fraction of non-rattler grains (defined here as grains with at least two contacts) and the mean contact number, η<sub>c</sub>, *i.e.*, changes indicative of the percolation of the contact and force network. Additional simulations performed with different values of the GSD exponent, α, showed that, not surprisingly, the jamming packing fraction, A<sub>J</sub>, increases with decreasing α (the wider the GSD, the denser the packing fraction attainable), but the course of the jamming transition (for example, the shape of the p(A − A<sub>J</sub>) curve) remains almost unaffected. This suggests that the results presented here are relevant for a wider range of model parameters than those actually used in the simulations.
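The two contact-based diagnostics mentioned above can be computed from a list of contacting grain pairs, as in the following minimal sketch. The function name and data layout are hypothetical; in particular, averaging the contact number over non-rattler grains only is an assumption made here, since the averaging set is not specified in this section.

```python
from collections import Counter

def contact_statistics(n_grains, contact_pairs):
    """Return (non-rattler fraction, mean contact number of non-rattlers).

    contact_pairs is an iterable of (i, j) index pairs, each contact listed once.
    Non-rattlers are grains with at least two contacts, as defined in the text.
    """
    counts = Counter()
    for i, j in contact_pairs:
        counts[i] += 1
        counts[j] += 1
    non_rattlers = [g for g in range(n_grains) if counts[g] >= 2]
    if not non_rattlers:
        return 0.0, 0.0
    fraction = len(non_rattlers) / n_grains
    # Assumption: the mean is taken over non-rattler grains only.
    mean_contacts = sum(counts[g] for g in non_rattlers) / len(non_rattlers)
    return fraction, mean_contacts
```

In a simulation, the pair list would be rebuilt at every output time step, so that both quantities can be followed as functions of A.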

The analysis below concentrates on the pure-shear model runs, with an emphasis on the role of polydispersity in the model behavior close to and at the jamming phase transition. All results have been obtained for packing fractions A < A<sub>J</sub>, *i.e.*, below the isotropic jamming point. Hence, the "jammed states" in the discussion below refer to regions of shear-jammed and fragile states on the jamming phase diagram proposed by Bi and colleagues [8]. Anticipating the further analysis of the results, the term "jammed state" used throughout the rest of the paper refers to states in which the largest contact network has percolated the whole system in both directions (and which are accompanied by certain global characteristics described further).

#### *3.1. Shear Jamming: General Characteristics*

The general behavior of the modeled system under pure shear deformation depends on the packing fraction, A, and the strain rate, γ̇ [8]. At low A, the system remains in an unjammed state, in which the internal stress is generated via short, binary collisions between neighboring grains. Regions of jammed, more densely packed grains develop only locally (Figure 2); they are short-lived and disperse due to interactions with the surrounding, more loosely packed regions. Hence, the internal stress level at the system scale remains very low, a few orders of magnitude lower than in the jammed states (Figure 3), when the force network between grains percolates the whole system (Figure 2) and the neighboring grains remain in contact for many seconds or even minutes (see further Section 3.2), *i.e.*, periods of time up to a few orders of magnitude longer than the duration of a typical binary collision.

Figure 2. Snapshots of contact networks in the modeled system in the unjammed (left; A = 0.890, u<sub>b</sub> = 0.5 m/s) and jammed (right; A = 0.905, u<sub>b</sub> = 1.0 m/s) state. For each grain, i, a line is drawn from its center to the center of the neighboring grain, j, if j ∈ C<sub>i</sub>(t). Grains belonging to the 'frozen' and moving boundaries are not shown.

Figure 3. Time series of the average contact number, η<sub>c</sub> (a), force-network anisotropy, η<sub>a</sub> (b), pressure, p (c), shear stress, τ (d), and the principal angle, θ<sub>p</sub> (e), during simulations with: A = 0.908 and u<sub>b</sub> = 1.0 m/s (blue); A = 0.905 and u<sub>b</sub> = 0.5 m/s (black); A = 0.905 and u<sub>b</sub> = 0.2 m/s (magenta).

The large-scale system behavior can be described by means of the properties of the stress and fabric tensors: the pressure p = (σ<sub>1</sub> + σ<sub>2</sub>)/2 and the shear stress τ = (σ<sub>2</sub> − σ<sub>1</sub>)/2 are calculated from the principal stresses, σ<sub>1</sub> and σ<sub>2</sub>; the mean contact number η<sub>c</sub> = λ<sub>1</sub> + λ<sub>2</sub> and the contact-network anisotropy η<sub>a</sub> = (λ<sub>2</sub> − λ<sub>1</sub>)/η<sub>c</sub> are calculated from the eigenvalues of the fabric tensor, λ<sub>1</sub> and λ<sub>2</sub> (see [8,21] for details). Both in the jammed and unjammed states, far from the jamming-transition point, those four large-scale system characteristics (p, τ, η<sub>c</sub> and η<sub>a</sub>) remain relatively stable in time, and the system recovers fast from short rearrangement events that sporadically take place (Figure 3). In between those two extremes, the system undergoes rapid changes and shifts from unjammed to jammed states and *vice versa* (black lines in Figure 3). In this intermediate regime, the force networks often have a fragile, "openwork" structure, with relatively large unjammed areas where the stress remains very low and with forces transmitted via long "strands" of approximately linearly aligned grains. As in the case of fragile states observed recently [8], those force networks may span the whole model domain in only one (compressive) direction, giving the material anisotropic strength in response to deformation, which manifests itself in high values of η<sub>a</sub> (see, also, Section 3.2). The present results suggest that, even in constant strain conditions, the fragile states are short-lived, at least in the range of A and γ̇ combinations analyzed here.
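The four scalar characteristics defined above can be illustrated with a minimal sketch. It assumes that the components of the (symmetric, 2D) stress and fabric tensors are already available; how those tensors are assembled from the contacts is described in [8,21] and is not reproduced here. The function names, the ordering σ<sub>1</sub> ≤ σ<sub>2</sub> and the sign convention for stress are illustrative assumptions.

```python
import math


def principal_values(t_xx, t_xy, t_yy):
    """Return the eigenvalues (smaller, larger) of a symmetric 2x2 tensor."""
    mean = 0.5 * (t_xx + t_yy)
    radius = math.hypot(0.5 * (t_xx - t_yy), t_xy)
    return mean - radius, mean + radius


def stress_invariants(s_xx, s_xy, s_yy):
    """Pressure p = (s1 + s2)/2 and shear stress tau = (s2 - s1)/2."""
    s1, s2 = principal_values(s_xx, s_xy, s_yy)  # assumed ordering: s1 <= s2
    return 0.5 * (s1 + s2), 0.5 * (s2 - s1)


def fabric_invariants(f_xx, f_xy, f_yy):
    """Mean contact number eta_c = l1 + l2 and anisotropy eta_a = (l2 - l1)/eta_c."""
    l1, l2 = principal_values(f_xx, f_xy, f_yy)
    eta_c = l1 + l2
    # Guard against an empty contact network (assumption: anisotropy set to 0).
    eta_a = (l2 - l1) / eta_c if eta_c > 0 else 0.0
    return eta_c, eta_a
```

With this ordering, τ and η<sub>a</sub> are non-negative by construction, and an isotropic fabric tensor (λ<sub>1</sub> = λ<sub>2</sub>) gives η<sub>a</sub> = 0.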

Apart from the properties of the stress and contact-fabric tensors, a signature of jamming is also present in the grains' velocities, both in their means and in their anomalies. Let us define **u**<sub>m</sub>(y, t) and σ<sub>m</sub>(y, t) as the mean and standard deviation, respectively, of the velocity of all grains that at time t have their y-coordinate within a certain small distance, δ, from y (*i.e.*, that lie inside a stripe of length L<sub>x</sub> and width 2δ). Further, let **u**<sub>m</sub>(y) and σ<sub>m</sub>(y) denote the time mean of **u**<sub>m</sub>(y, t) and σ<sub>m</sub>(y, t) over the whole simulation time, and let **u**′<sub>i</sub>(**x**, t) denote the velocity anomaly of grain i, *i.e.*, **u**′<sub>i</sub>(**x**, t) = **u**<sub>i</sub>(**x**, t) − **u**<sub>m</sub>(y).
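The stripe statistics and the anomalies defined above can be sketched as follows for a single velocity component at a single time instance. This is only an illustration: the function names are hypothetical, the population form of the standard deviation (division by the number of grains) is an assumption, and the time-mean profile is passed in as a callable rather than computed from a full simulation.

```python
import math


def stripe_stats(y_coords, velocities, y, delta):
    """Mean and standard deviation of the velocities of grains with |y_i - y| <= delta.

    Returns None if no grain lies inside the stripe.
    """
    sample = [v for y_i, v in zip(y_coords, velocities) if abs(y_i - y) <= delta]
    if not sample:
        return None
    mean = sum(sample) / len(sample)
    variance = sum((v - mean) ** 2 for v in sample) / len(sample)
    return mean, math.sqrt(variance)


def velocity_anomalies(y_coords, velocities, mean_profile):
    """Subtract the time-mean profile value at each grain's y from its velocity.

    `mean_profile` is a callable y -> time-mean velocity (assumed to be known).
    """
    return [v - mean_profile(y_i) for y_i, v in zip(y_coords, velocities)]
```

For example, with a linear mean profile `lambda y: y`, grains at y = 0 and y = 1 moving at 0.5 and 1.5 both have an anomaly of 0.5.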

The profiles of **u**<sub>m</sub>(y) and σ<sub>m</sub>(y) are shown in Figure 4 for a range of A values corresponding to unjammed and jammed states. At low packing fractions, the motion of the grains is confined to the region close to the moving boundary, and a narrow zone of strong shear separates this region from the rest of the model domain, which remains almost at rest. In contrast, jammed states are characterized by an almost constant velocity gradient, d**u**<sub>m</sub>(y)/dy, and a constant standard deviation of velocity, σ<sub>m</sub>(y), independent of the distance from the moving boundary, *i.e.*, the strain is distributed over the whole system.

In order to characterize the variability of velocity anomalies, it is convenient to define a measure analogous to entropy (as used in statistical mechanics), characterizing the spread of velocity anomalies of individual grains at a given time, t:

$$E(t) = -c \sum_{i=1}^{n} p_i \log_2 p_i \tag{5}$$

where n denotes the number of bins of the discrete pdf of **u**′<sub>i</sub>(**x**, t), p<sub>i</sub> is the probability density of the i-th bin and c = 1/log<sub>2</sub> n is a normalization constant, introduced so that the maximum value of E is 1. In order to account for different ranges of **u**′<sub>i</sub>(**x**, t) in different model runs, the pdfs were estimated by dividing the range [q<sub>0.01</sub>, q<sub>0.99</sub>] into n = 100 bins of equal width, where q<sub>0.01</sub> and q<sub>0.99</sub> denote the 1% and 99% quantiles of the data, respectively. Thus, E analyzed here reflects the shape of the pdfs within the respective inter-quantile range, and not their widths, which, as shown in Figure 4b, are much larger in the jammed than in the unjammed states.
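A minimal sketch of how Equation (5) might be evaluated for one scalar component of the anomalies at a given time is shown below. Several details are assumptions rather than the procedure actually used: p<sub>i</sub> is taken as the fraction of samples falling into bin i (so that the p<sub>i</sub> sum to one), empty bins contribute zero, samples outside the inter-quantile range are discarded, and the quantiles are linearly interpolated.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of pre-sorted data (one common convention)."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def normalized_entropy(values, n_bins=100, q_low=0.01, q_high=0.99):
    """Normalized entropy E in [0, 1] of the histogram of `values`."""
    data = sorted(values)
    if not data:
        raise ValueError("at least one value is required")
    low, high = quantile(data, q_low), quantile(data, q_high)
    if high <= low:
        return 0.0  # degenerate range: all retained samples share one bin
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in data:
        if low <= v <= high:
            k = min(int((v - low) / width), n_bins - 1)  # right edge -> last bin
            counts[k] += 1
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:  # convention: 0 * log2(0) = 0
            p = c / total
            entropy -= p * math.log2(p)
    return entropy / math.log2(n_bins)
```

With this normalization, samples spread evenly over all bins give E = 1, while samples concentrated in two equally populated bins out of 100 give E = 1/log<sub>2</sub> 100 ≈ 0.15.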

Figure 4. Profiles of the average (a) and standard deviation (b) of the x-component of grain velocity as a function of the normalized y-distance (y = 0 at the "frozen" boundary and y = 1 at the moving boundary). Results obtained with u<sub>b</sub> = 1.0 m/s.

Figure 5. Normalized entropy E of the anomalies, **u**′<sub>i</sub>(**x**, t), as a function of the grain packing fraction, A (a), and grain size (b). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles and the whiskers extend to the most extreme data points not considered outliers. In (b), for the two selected values of A (0.890 and 0.905), the statistics are calculated three times: for all N grains and for the subsets of the 10% largest and 10% smallest grains, respectively. Results obtained with u<sub>b</sub> = 1.0 m/s.

As can be seen in Figure 5a, E increases with increasing packing fraction A. It has the highest values, exceeding 0.85, and the lowest time variability (see the boxes and whiskers in Figure 5) in shear-jammed states. In unjammed states, E remains within the 0.65–0.7 range most of the time. Thus, the range of instantaneous velocity anomalies in jammed systems is significantly larger, even when corrected for the width of the respective pdfs. On the other hand, jamming is associated with a transition from local to global correlations of **u**′<sub>i</sub>(**x**, t), as illustrated in Figures 6 and 7, showing the linear correlation coefficient, C, between pairs of grains in two selected model runs (for two grains, i and j, C is the Pearson correlation coefficient between the x-components of **u**′<sub>i</sub> and **u**′<sub>j</sub> over time t<sub>c</sub> = 100 min). At low A, statistically significant correlation of velocity anomalies is observed only between grains within a small spatial distance from each other. At high A, the correlation remains high within the whole model domain. Those two facts (velocity anomalies correlated on the system scale and high values of E) indicate that in a jammed state, the grains tend to have large velocity anomalies that are of the same sign.
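For reference, the pairwise coefficient C described above is the standard Pearson correlation of two time series. A minimal sketch, assuming that the x-components of the anomalies of grains i and j have been sampled at the same time instants and stored in two equally long lists (the function name and the handling of constant series are illustrative choices):

```python
import math


def pearson(series_a, series_b):
    """Pearson correlation coefficient between two equally sampled time series."""
    n = len(series_a)
    if n != len(series_b) or n < 2:
        raise ValueError("series must have the same length (at least 2 samples)")
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(series_a, series_b))
    var_a = sum((a - mean_a) ** 2 for a in series_a)
    var_b = sum((b - mean_b) ** 2 for b in series_b)
    if var_a == 0.0 or var_b == 0.0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(var_a * var_b)
```

Two grains whose anomalies rise and fall together give C close to 1, whereas anomalies of opposite sign give C close to −1.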

Figure 6. Snapshots of the modeled system for u<sub>b</sub> = 1.0 m/s and the packing fraction A = 0.809 (a) and A = 0.905 (b), showing the linear correlation coefficient, C, between the velocity anomalies, **u**′<sub>i</sub>(**x**, t), of a selected grain (dark brown, C = 1) and all other grains in the system. C was calculated for a period of time equal to 100 minutes. Grains belonging to the "frozen" and moving boundaries are not shown.

Figure 7. The correlation coefficient, C, between the velocity anomalies, **u**′<sub>i</sub>(**x**, t), calculated for pairs of grains from a subset of the 10% largest (continuous lines) and 10% smallest (dashed lines) grains in the whole ensemble, as a function of the grain-grain distance. Results of simulations with u<sub>b</sub> = 1.0 m/s and A = 0.890 (blue), A = 0.905 (red).

#### *3.2. The Role of Polydispersity*

In systems with power-law GSD, the largest grains occupy a substantial part of the model domain (even with increasing system size N), and it is their locations and relative movement that have a deciding influence on the system as a whole. Sub-regions of the model domain that at a given time instance are filled with small grains can change their shape (and thus react to strain deformation) more easily than assemblies of large grains. In many respects, assemblies of very small grains act as a plastic, easily deformable 'filler' occupying empty spaces between very large grains. An analysis of animations illustrating the time evolution of the modeled system reveals that the rapid jamming and un-jamming events mentioned earlier (black curves in Figure 3) tend to be associated with reorganization of the positions of the largest grains. This observation is supported by the fact that, at high packing fractions, the analyzed measures of the grains' velocity anomalies, like the entropy, E, are strongly correlated with the global instantaneous pressure, p, and shear stress, τ, and that this correlation is higher for a subset of the largest grains than for the whole system. For example, in the model run with A = 0.905 and u<sub>b</sub> = 1 m/s, the correlation of E with log(p) equals 0.83 and 0.95 for, respectively, all grains and the subset of the 10% largest grains.

Previous experiments with an earlier version of the model demonstrated that polydispersity plays an important role in many aspects of the dynamics of sea ice composed of floes with power-law size distribution, including the formation of clusters in response to wind [19,20]. Not surprisingly, polydispersity also influences the behavior of the sheared systems studied here. Many global characteristics of the system, including those analyzed above, have different values when they are calculated for a subset of the largest or smallest grains, revealing their different response to the forcing and interactions with neighboring grains. In particular, the entropy, E, of velocity anomalies of the largest grains in an ensemble is higher than the system average at all packing fractions analyzed, *i.e.*, both in jammed and unjammed states (Figure 5b). The emergence of long-range correlations between velocity anomalies at the jamming transition, described in the previous section, takes place almost exclusively due to correlations between the largest grains in the system (Figures 7 and 8). Similarly, at low A, the high values of C within clusters (0.5–0.6 on average) are observed only for pairs of the largest grains. Furthermore, whereas at low A, those values drop rapidly with increasing grain-grain distance, the rate of that decrease is much slower in jammed states (compare the continuous curves in Figure 7), resulting in a shift of the pdf of C towards larger values, representing statistically significant correlation (Figure 8b). In contrast, the pdfs of C of the smallest grains (in this case, r<sub>i</sub> < 1.87 m) hardly change at the jamming transition, with most values of C remaining at a very low, statistically insignificant level.

Figure 8. pdfs of the correlation coefficient, C, between the velocity anomalies, **u**′<sub>i</sub>(**x**, t), calculated for pairs of grains from a subset of the 10% largest (blue) and 10% smallest (red) grains in the whole ensemble. Results of simulations with u<sub>b</sub> = 1.0 m/s and A = 0.890 (a), A = 0.905 (b).

Thus, the increase of the packing fraction, A, towards jamming is accompanied by the growth of clusters of coordinated motion of the relatively small subset of the largest grains (notably, the range of sizes of those grains is still very wide, between 6.5 and 180 m). It is worth noting that similar behavior has been described recently for sheared bidisperse granular systems, in which the dominant dynamical modes were associated with reorganization of grains within localized clusters [10]. Similarly, Weeks and colleagues [27] observed cooperative motion of particles within clusters in colloidal supercooled fluids, with the size of clusters rapidly increasing when the system approached the glass transition.

Figure 9. Selected properties of the contact networks in the modeled system for a number of packing fractions, A: number of contacts of individual non-rattler grains, n<sub>c,i</sub> (a); n<sub>c,i</sub> scaled with grain perimeter 2πr<sub>i</sub> (b); percentage of the simulation time when individual grains were non-rattler grains (c); percentage of grains with at least three contacts (d); average contact number η<sub>c</sub> (e); and contact-network anisotropy η<sub>a</sub> (f). In (d–f), the elements of the box symbols are the same as in Figure 5; they reflect the time variability of the analyzed variables during the simulation.

For obvious geometrical reasons, in strongly polydisperse materials, the number of contacts of individual grains varies strongly. Interestingly, the jamming transition (inferred from the size of the largest connected cluster) in the analyzed cases still takes place when the average contact number, η<sub>c</sub>, exceeds the value of three (packing fraction A = 0.905 in Figure 9e), as in monodisperse and weakly polydisperse systems ([8] and Figure 10). For the large grains from the tail of the GSD, the contact number of individual grains, n<sub>c,i</sub>, is an approximately linear function of their radius (and perimeter), with n<sub>c,i</sub>/(2πr<sub>i</sub>) ≈ 0.05 in the jammed state (Figure 9a,b), e.g., n<sub>c,i</sub> ∼ 30 for r<sub>i</sub> = 100 m.
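As a quick check of the example quoted above, and assuming that the ratio of about 0.05 is expressed per metre of grain perimeter (the units are not stated explicitly), a grain with r<sub>i</sub> = 100 m gives:

$$n_{c,i} \approx 0.05\,\mathrm{m^{-1}} \times 2\pi r_i = 0.05 \times 2\pi \times 100 \approx 31$$

which is consistent with the order-of-magnitude value of about 30 contacts.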


A linear relationship for n<sub>c</sub>(r) has been obtained recently by Shaebani and colleagues [28] in simulations of 2D uniformly compressed, weakly polydisperse systems, in agreement with their mean-field solution. In our simulations, smaller grains often have just one or two neighbors (hence the points on the left side of Figure 9b tend to lie on the r<sup>−1</sup> curve), and importantly, it is their incorporation into the system-wide contact network that leads to its consolidation at the jamming transition: whereas the largest grains are non-rattler grains most of the time, even in unjammed states, the smallest grains switch at jamming from predominantly freely moving to predominantly non-rattler (Figure 9c; see also Figure 2). Consequently, jamming is associated with a decrease of the mean radius of grains forming the main contact network (not shown). On the other hand, it is the largest grains that build the stable skeleton of the global contact and force network, in the sense that the great majority of grains, predominantly from the left part of the GSD, do not participate in its formation. Even in a jammed state, only ∼35% of grains have three or more contacts (Figure 9d), and even though η<sub>c</sub> > 3, the mean contact number for the 90% smaller grains (*i.e.*, excluding the 10% largest) is smaller than three. The "sparse" character of the grain-grain contacts is clearly seen in Figure 11. In the jammed state at A = 0.905, the force network percolates the whole model domain (see also Figure 2), but it has an "openwork" structure, with the largest grains incorporated into long force chains and irregular "cells", surrounding unjammed regions, usually filled with very small grains. As already mentioned, the assemblies of the smallest grains act as a semi-plastic "filler", adjusting its shape to the deforming cells of the main force network.

All those properties of the analyzed system are directly related to its extreme polydispersity. In the bidisperse reference case (Figure 10), the jamming transition is much more rapid in terms of the number of grains that are incorporated into the global contact network: as soon as η<sub>c</sub> exceeds the value of three, roughly 80% of grains become non-rattler grains (Figure 10d,e). Moreover, the coarser and finer fractions contribute similarly to the contact network (compare Figure 10b,c), and the (very small) difference between the average number of neighbors of individual grains from the two fractions (Figure 10a) results simply from the difference between their perimeters.

Returning to the PL-GSD simulations, it is also worth noting that although the contact numbers of the largest grains are high regardless of the packing fraction (Figures 9c and 11), in unjammed states, most of those contacts do not form part of stable force chains, but reflect individual collisions as they 'fight their way' among smaller neighbors (in Figure 11 at A = 0.890, most of the lines outgoing from the centers of the largest grains are black, *i.e.*, they lead to grains with a number of contacts lower than three). Such contacts rarely survive more than a few seconds. Indeed, at low A the exceedance probability of contact lifetime is similar for all grain sizes (Figure 12), and only ∼10% of contacts survive for longer than one minute, as compared to ∼25% and ∼40% of contacts between the smallest and largest grains, respectively, observed in the jammed state.
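The exceedance probability used above is simply the fraction of contacts that last longer than a given time. A minimal sketch, assuming that the lifetimes of all recorded contacts (in seconds) have been collected in a list; the function name and the sample values in the usage note are purely illustrative and are not model output:

```python
def exceedance_probability(lifetimes, threshold):
    """Fraction of contacts whose lifetime is strictly longer than `threshold`."""
    if not lifetimes:
        raise ValueError("at least one contact lifetime is required")
    longer = sum(1 for t in lifetimes if t > threshold)
    return longer / len(lifetimes)
```

For instance, for ten hypothetical lifetimes of which three exceed 60 s, `exceedance_probability(lifetimes, 60.0)` returns 0.3. Evaluating the function over a range of thresholds gives a curve of the kind shown in Figure 12.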

Moreover, it must be remembered that the condition n<sub>c,i</sub> ≥ 3 alone is not sufficient to stabilize the position of individual grains within the contact network. The arrangement of the contacts around the grain's perimeter is important as well. For a given grain, i, this arrangement is determined by a set of vectors, **r**<sub>ij</sub>, for all j ∈ C<sub>i</sub> (see Section 2.1), which divide the grain into n<sub>c,i</sub> sectors. The central angle of the widest of those sectors (for brevity, we will call it the maximum contact angle, α<sub>max</sub>) provides a useful measure of the above-mentioned stability of the i-th grain within the force network. Obviously, α<sub>max</sub> < 180°, possible only if n<sub>c,i</sub> > 2, is necessary for stability; α<sub>max</sub> < 120° is possible only if n<sub>c,i</sub> > 3.
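The two bounds quoted above follow from the fact that the n sectors around a grain add up to 360°, so the widest one cannot be narrower than 360°/n. A minimal sketch of the maximum contact angle is given below; it assumes that each contact direction is taken along the line joining the centers of grain i and its neighbor j, and the function name is hypothetical.

```python
import math


def max_contact_angle(center, neighbor_centers):
    """Widest angular sector (in degrees) between consecutive contact directions.

    Directions are assumed to point from the grain center to each neighbor center.
    Returns 360.0 for a grain with no contacts or a single contact.
    """
    cx, cy = center
    angles = sorted(math.atan2(y - cy, x - cx) for x, y in neighbor_centers)
    if not angles:
        return 360.0
    # Gaps between consecutive directions, plus the gap closing the circle.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2.0 * math.pi - angles[-1])
    return math.degrees(max(gaps))
```

Three neighbors spaced evenly around the perimeter give 120°, two neighbors on opposite sides give 180°, and two neighbors on the same side (for example, at right angles to each other) give 270°, *i.e.*, an arrangement that cannot hold the grain in place.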

Figure 10. Selected properties of the contact networks in the bidisperse (BD) model case for a number of packing fractions, A: mean number of contacts of individual non-rattler grains, n<sub>c,i</sub>, from the finer and coarser fractions (a); percentage of the simulation time when individual grains from the finer (b) and coarser (c) fraction were non-rattler grains; percentage of grains with at least three contacts (d); average contact number η<sub>c</sub> (e); and contact-network anisotropy η<sub>a</sub> (f). In (b–f), the elements of the box symbols are the same as in Figure 5; they reflect the time variability of the analyzed variables during the simulation.

As can be seen in Figure 11b, many non-rattler grains, *i.e.*, those with n<sub>c,i</sub> > 2, have α<sub>max</sub> > 180°. In fact, ∼14% out of the ∼35% of grains building the percolated contact network (*i.e.*, ∼5% of the total) have α<sub>max</sub> > 180°. For comparison, out of 82%–83% of grains building the global network in the BD case, only ∼1% are unstable. Figure 13 shows the pdfs of α<sub>max</sub> for the PL and BD cases in shear-jammed states. In the BD simulations, the pdfs have high peaks close to the 120° value, indicating the prevailing, very stable contact arrangement with n<sub>c,i</sub> = 3 and a roughly uniform distribution of neighbors around the grain's perimeter. Notably, the pdfs for the coarser and finer fractions have very similar shapes, with the exception of a small second peak close to 180° for the finer fraction. In contrast, in the PL simulations, the pdf of α<sub>max</sub> of the smallest grains has a wide maximum shifted towards the unstable region and a long tail corresponding to individual, instantaneous collisions.

Figure 11. Zoomed fragments of the modeled system (top: A = 0.890; bottom: A = 0.905) corresponding to the situations shown in Figure 2. Color scale: number of contacts, n<sub>c,i</sub>, of individual grains (dark blue: zero; light blue: one; yellow: two; brown: three or more). Red lines: forces between grains with n<sub>c</sub> ≥ 3; black lines: the remaining forces.

Finally, another very important property of force networks in sheared systems is their anisotropy, η<sub>a</sub>. It has been identified as an order parameter for shear-jammed states, in that it is non-zero in such states and zero in isotropically jammed systems [8]. The values of η<sub>a</sub> obtained in this work, in the BD and, especially, PL simulations, are lower than those reported in [8], which may be related to the details of the contact-model formulation. Characteristically, for the PL case, the η<sub>a</sub> values are roughly 50% higher for the 10% largest grains than the ensemble mean (not shown), again underlining the special role of those grains in shaping the force network structure. The overall variability of η<sub>a</sub>(A) is, however, similar in both PL and BD simulations, *i.e.*, a rapid drop of η<sub>a</sub> is observed at jamming, preceded by a slight increase when jamming is approached from below (Figures 9f and 10f). The strongest anisotropy was obtained in those model runs in which the system underwent strong shifts between unjammed and jammed states (black curves in Figure 3).

Figure 12. Exceedance probability of the contact lifetime (in seconds) for two values of the packing fraction (A = 0.890, dashed lines; A = 0.905, continuous lines), calculated for contacts between the 10% largest (blue) and 10% smallest (red) grains.

Figure 13. pdfs of the maximum contact angle, α<sub>max</sub>, for two model runs: with bidisperse (BD; A = 0.816) and power-law (PL; A = 0.905) grain-size distribution. For the BD case, the pdfs are calculated separately for the coarser and finer grain fraction; for the PL case, for the subsets of the 10% largest and 10% smallest grains. The vertical dotted and dashed lines mark the values α<sub>max</sub> = 120° and α<sub>max</sub> = 180°, respectively (see the text for a description).

#### 4. Conclusions

The results of the present study suggest that many global characteristics of granular materials with very wide GSD, including those indicative of the jamming phase transition, are determined by the behavior of and interactions between a relatively small subset of the largest grains from the tail of the GSD. They build the "core" of the contact and force networks in the material and, consequently, react in a coordinated manner to strain deformation.

To summarize, the percolation of the contact networks in the analyzed system with PL GSD is associated with the following changes of the global system properties: (i) rapid increase of the entropy of grain velocity anomalies; (ii) emergence of large-scale correlation between the velocity anomalies of the largest grains; (iii) rapid increase of the mean contact number and the fraction of non-rattler grains, accompanied by a rapid decrease of the contact network anisotropy; and (iv) rapid increase of the contact lifetimes, especially between the largest grains. In comparison to less strongly polydisperse systems, the percolated contact networks are built of a significantly smaller subset of grains in stable positions, capable of sustaining non-zero strain.

In the context of sea ice (and presumably other real-world polydisperse granular materials as well), the behavior of the largest grains is relevant from a practical point of view: very likely, it is they that are subject to observation. Measuring equipment is usually mounted on the largest and thickest ice floes, which provide stable and relatively safe observational platforms. Similarly, in the analysis of remote sensing (satellite or airborne) images of sea ice, it is the largest floes that can be easily identified and tracked. Thus, it is relevant to understand how the behavior of floes (or, more generally, grains) from the tail of the GSD differs from the behavior of the remaining grains and how it is related to the properties of the granular material as a whole. Even though the simulations described in this paper are highly idealized (no wind, ocean currents, *etc*.), given how little is known about the influence of granular effects on sea ice dynamics, they provide a starting point for further, more advanced studies.

#### Acknowledgments

The calculations described in this paper were carried out at the Academic Computer Center (TASK) in Gdansk, Poland [29]. I would like to thank the anonymous reviewers for their insightful and very valuable comments and suggestions that helped to improve the quality of this paper.

#### Conflicts of Interest

The author declares no conflict of interest.

#### References

1. Zhang, H.; Makse, H. Jamming transition in emulsions and granular materials. *Phys. Rev. E* 2005, *72*, 011301.


Reprinted from *Entropy*. Cite as: Potestio, R.; Peter, C.; Kremer, K. Computer Simulations of Soft Matter: Linking the Scales. *Entropy* 2014, *16*, 4199–4245.

*Review*

## Computer Simulations of Soft Matter: Linking the Scales

Raffaello Potestio <sup>1,</sup>\*, Christine Peter <sup>2</sup> and Kurt Kremer <sup>1</sup>


*Received: 3 June 2014; in revised form: 10 July 2014 / Accepted: 11 July 2014 / Published: 28 July 2014*

Abstract: In the last few decades, computer simulations have become a fundamental tool in the field of soft matter science, allowing researchers to investigate the properties of a large variety of systems. Nonetheless, even the most powerful computational resources presently available are, in general, sufficient to simulate complex biomolecules only for a few nanoseconds. This limitation is often circumvented by using coarse-grained models, in which only a subset of the system's degrees of freedom is retained; for an effective and insightful use of these simplified models, however, an appropriate parametrization of the interactions is of fundamental importance. Additionally, in many cases the removal of fine-grained details in a specific, small region of the system would destroy relevant features; such cases can be treated using dual-resolution simulation methods, where a subregion of the system is described with high resolution, and a coarse-grained representation is employed in the rest of the simulation domain. In this review we discuss the basic notions of coarse-graining theory, presenting the most common methodologies employed to build low-resolution descriptions of a system and putting particular emphasis on their similarities and differences. The AdResS and H-AdResS adaptive resolution simulation schemes are reported as examples of dual-resolution approaches, focusing in particular on their theoretical background.

Keywords: soft matter; coarse-graining; adaptive resolution simulations

#### 1. Introduction

Since the pioneering work carried out by Berni Alder [1] in the 1950s, *in silico* experiments, such as Molecular Dynamics (MD) or Monte Carlo (MC) simulations, have allowed researchers to make major advances in the understanding of systems with many degrees of freedom. In particular, during the last few decades, the increasing accuracy of the force-fields, the improvement of the algorithms, and the steady boost of computer power made it possible to perform insightful simulations of a broad variety of systems of increasing size and complexity, ranging from simple liquids (composed of idealized, point-like molecules interacting via simple potentials) to biomolecules. Nonetheless, the amount of available computational resources can be insufficient to simulate, for a physically meaningful time, even the simplest nontrivial macromolecule. It is often the case, in fact, that "interesting" phenomena in these systems occur on very long time-scales: a simple example of this is provided by the diffusion of a polymer in a melt [2,3]; the same behavior can be observed in conformational changes of proteins [4–9], at least in those cases in which the force field provides a good approximation to the real atomistic interactions.

At the same time, in many cases the massive amount of data that are produced in a simulation is composed mostly of non-useful information. A prototypical example is given by the solvent: the water molecules that solvate a protein or a membrane are typically discarded from the analysis that follows the simulation, with the possible exception of a few solvation shells around the molecule itself. In this case a large fraction of the computational power is employed in the integration of the equations of motion of degrees of freedom which are extremely relevant *during* the simulations, but are completely neglected *afterwards*.

In order to overcome this limitation, *coarse-grained models* [10–15] have been developed, where the structure and interactions of the original system are replaced with simpler ones, which are easier to describe, model, simulate and understand. The assumption underlying the coarse-graining of a system is that above a given length scale the low-level, chemistry-specific detail of the model affects some properties of the system only in a simple, functionally trivial way, often through prefactors. Examples of systems for which this approach proved to be extremely successful are molecular fluids, polymers [2,3,16,17], elastic network models of proteins [18–23], lipid membranes and other biomolecular systems, just to mention a few.

In recent years, systematic coarse graining approaches have gained importance, where the interactions in the coarse-grained (CG) model are derived systematically from atomistic reference simulations in a bottom-up fashion. These models are often used in a multiscale simulation framework, where the closeness of higher and lower levels of resolution allows a switching back and forth between them. Below, we will review several systematic coarse graining approaches and address some of the most important methodological issues and challenges.

The smaller number of degrees of freedom retained in coarse-grained models and the simpler force-fields employed allow the characterization of relevant properties of a system at a lower computational cost compared to the high-resolution atomistic models; on the other hand, there are cases in which the chemical detail *in a small region of the system* plays a crucial role, such that no simplification of the description is possible: think, for example, of the active site of a large enzyme, where fine-grained chemical processes take place. A high-resolution modeling of each part of the system would not be necessary, but at the same time a coarse-graining approach would delete important information.

This last observation naturally leads us to identify a particular class of soft matter systems among those that are studied with the help of computer simulations. Specifically, we can consider those systems where the focus is on a small, well-defined subregion of the simulation box. To this class belong, for example, certain solvated (macro)molecules, active sites of enzymes, the interaction of specific polymer ends at a surface, or simply a small spherical region in a homogeneous fluid whose radius is of the length scale of the property we are interested in.

For such systems the remaining, "non-interesting" region consists of the volume containing all those degrees of freedom which will be eventually neglected and/or discarded once the simulation is done, such as the solvent or large parts of a macromolecule which do not play an active role in the process of interest (e.g., all atoms sufficiently far from the active site of an enzyme). Usually, detailed knowledge about structural, energetic and thermodynamical properties of these large sections of the system is not required; nonetheless these "non-interesting" degrees of freedom have to be explicitly present and integrated, inasmuch as they "scaffold" the target object of the simulation and represent a reservoir of energy and molecules.

A method is thus required that allows one to perform a simulation where the largest part of the computational resources is concentrated on that region of the system that will be subsequently analyzed. *Adaptive resolution simulation methods* [24–34] were developed to resolve the contradiction between the necessity of simulating all parts of the system and the fact that, eventually, the detailed information from a large subgroup of them will be neglected. The underlying idea is to replace these "non-interesting" degrees of freedom of the system with a simpler, coarse-grained representation, such that a considerably smaller number of computations (e.g., force calculations) is required, while the "interesting" region is treated at a higher resolution.

This approach gives rise to at least two important conceptual problems that have to be solved:


These two problems are obviously interconnected, since the way the high- and low-resolution regions interact at the interface naturally depends on the specific properties of the models used in each of them; a thorough discussion of these aspects will be carried out in the context of the Adaptive Resolution Simulation (AdResS) [24–32] and Hamiltonian AdResS [33,34] (H-AdResS) methods.

The present review is composed of two principal parts: in Section 2 the basics of coarse-graining theory are presented together with a few examples of the most commonly used techniques, e.g., Force Matching, Boltzmann Inversion and Relative Entropy; in Section 3 we discuss two strategies, the adaptive resolution simulation (AdResS) scheme and the Hamiltonian AdResS (H-AdResS) to perform simulations in which different regions of the same system are modeled with different resolution.

Large parts of the present review are based on course material that was compiled for two workshops at the Forschungszentrum Jülich ("Hierarchical Methods for Dynamics in Complex Molecular Systems, 2012" [35], and "Workshop on Hybrid Particle-Continuum Methods in Computational Materials Physics", 2013 [36]), as well as on original publications on the respective methodologies [33,34,37–40].

#### 2. Coarse-Graining

As was mentioned in the Introduction, there are many interesting physical problems for which a detailed description of the system at the all-atom (AA) level is not necessary to obtain the relevant information. In these cases a simpler model might be used, where a given high-resolution, computationally expensive model is replaced with a simpler one.

These coarse-grained models possess a number of features that make them particularly appealing. For example, fewer computational resources are required to perform a simulation: this is due to both the reduced number of degrees of freedom and the simpler form of the interactions. Another important characteristic is that, since many interaction centers are replaced with a single one, the fluctuations of the force experienced by a molecule are generally much smaller; this results in smoother free energy profiles and, as a consequence, in faster diffusive processes, allowing the system to reach larger time-scales with fewer computations. This last aspect implies that one typically has to determine a rescaling factor between the simulation timescale (usually given in Lennard-Jones units) and the corresponding real-world time (or the corresponding timescale in a higher-resolution system). A detailed discussion of these dynamic aspects with further references can be found in Reference [41]. Finally, coarse-grained models are designed to reproduce large length-scale properties of the system, such as the global, collective conformational changes of a protein or the diffusive process of a polymer in a melt, that can be strongly insensitive to the fine-grained, chemistry-specific details; as a consequence, the parametrization of the coarse-grained interactions is also advantageously simpler.

Many CG models are generic, *i.e.*, they were not developed to model a specific chemical system but rather with the aim of studying a physical phenomenon such as folding or aggregation in general. One example is generic CG lipid models, which have been successfully employed to study the self-assembly of micelles, bilayers and other structures [42–46]. Generic CG models have also been employed to study folding and aggregation of peptides and proteins [47–59]. For polymers, such generic models were especially successful. Following the so-called n → 0 theorem of de Gennes [60–62] it was shown that properties such as the overall chain extension as a function of the polymerisation index follow the same power law with the same exponent for all polymers, independent of the chemical species. The results of these scaling theories were instrumental in the development of generic and thus very efficient models, as well as in the interpretation of experiments. For dynamical properties, simulations of generic models provided the first direct evidence of the reptation/tube concept put forward by Edwards and de Gennes [63,64]. The reptation model is based on the fact that the dynamics of long polymer chains is dominated by the constraint that polymer chains cannot simply cut through each other.

A wide range of approaches have been developed that aim for consistency between a CG model and either experimental data or simulations of accurate high resolution models. Typically, these approaches are divided into thermodynamics-based and so-called *structure-based* ones. In thermodynamic coarse graining approaches, individual elements of the CG interaction function are separately parameterized based on thermodynamic reference data such as solvation free energies and partitioning data, liquid densities, surface tension, *etc.* [65–76]. (These are usually experimental reference data, but in a multiscale simulation approach the reference data can of course also be obtained from an atomistic simulation, to keep the CG and atomistic level thermodynamically consistent). In another group of approaches, one numerically generates CG interaction functions with the aim of reproducing the configurational phase space sampled in an atomistic reference simulation. These approaches may rely on different types of reference properties such as structure functions [77–89], mean forces [90–95] or relative entropies [96–98]. In the following subsection, a few basic notions of coarse-graining theory will be introduced, together with examples of the strategies that can be employed to perform the coarse-graining in practice.

#### *2.1. The Mapping Function and the Potential of Mean Force*

In a multiscale approach, one first needs to define the relationship between the two levels of resolution. This is typically done via mapping functions which determine the CG Cartesian coordinates of each site as a linear combination of coordinates for the atoms that are involved in the site (that could be via a center-of-mass or a center-of-geometry mapping or some other geometric construction). This means the CG coordinates **R** are constructed from the atomistic coordinates **r** via

$$\mathbf{R} = \mathbf{M}\mathbf{r} \tag{1}$$

where **M** is an N×n matrix (n and N being the number of particles in the atomistic and CG system, respectively). In the (canonical) sampling of the atomistic and CG systems with respective interaction potentials $V^{AA}(\mathbf{r})$ and $V^{CG}(\mathbf{R})$, the corresponding configurational distribution functions $P^{AA}(\mathbf{r})$ and $P^{CG}(\mathbf{R})$ are given by

$$P^{AA}(\mathbf{r}) = Z\_{AA}^{-1} \exp[-\beta V^{AA}(\mathbf{r})] \tag{2}$$

and

$$P^{CG}(\mathbf{R}) = Z\_{CG}^{-1} \exp[-\beta V^{CG}(\mathbf{R})] \tag{3}$$

with $Z\_{AA} = \int \exp[-\beta V^{AA}(\mathbf{r})]\,\mathrm{d}\mathbf{r}$ and $Z\_{CG} = \int \exp[-\beta V^{CG}(\mathbf{R})]\,\mathrm{d}\mathbf{R}$ being the respective partition functions and $\beta = 1/k\_B T$. If one analyses the atomistically sampled system in CG coordinates, one can determine the probability distribution of sampling atomistic coordinates that map to a given set of CG coordinates **R**:

$$P^{AA}(\mathbf{R}) = \langle \delta(\mathbf{M}\mathbf{r} - \mathbf{R})\rangle \tag{4}$$

(Here, we follow the notation used by Noid and collaborators, e.g., in References [99,100]). The angular brackets indicate canonical sampling of the atomistic system (*i.e.*, according to $P^{AA}(\mathbf{r})$). One can formulate the aim of many systematic coarse graining approaches in the following way: to sample the part of phase space which is sampled by the atomistic system with the same probability distribution. Following this, one possible definition of consistency between atomistic and CG level of resolution is that the two models are consistent if the canonical configurational distribution sampled by the CG model, $P^{CG}(\mathbf{R})$, is equal to the probability distribution $P^{AA}(\mathbf{R})$ obtained after mapping the atomistic system to CG coordinates. In a canonical ensemble, independent degrees of freedom q are Boltzmann distributed and the Boltzmann inverse of P(q)

$$V(q) = -k\_B T \ln P(q) \tag{5}$$

is a many-dimensional potential of mean force (PMF), which, when used for example as an interaction potential in a CG simulation, reproduces the distribution P(q). This means that Boltzmann inversion of $P^{AA}(\mathbf{R})$ defines, uniquely up to an additive constant, a high-dimensional CG potential

$$V\_{PMF}^{CG}(\mathbf{R}) = -k\_B T \ln P^{AA}(\mathbf{R}) + const \tag{6}$$

which will result in a sampling of CG configurations consistent with the atomistic reference simulation. This high-dimensional, many-body CG potential contains both energetic and entropic contributions from the configurational sampling in the high-resolution model and the mapping between high-resolution and CG model (Equation (4)). Therefore, the resulting CG model is state point dependent and not necessarily readily transferable. While it is conceptually easy to formulate the PMF as a solution of the systematic coarse graining task, it is practically unfeasible. In most cases the PMF cannot be easily determined, and even if it were possible, the resulting high-dimensional potentials are computationally prohibitive. In addition, $V\_{PMF}^{CG}(\mathbf{R})$ is a function of **R**, *i.e.*, this PMF as it stands can in principle only be applied to a system which is identical in size to the atomistic reference system; if this limitation cannot be overcome, e.g., by breaking it down to short-range interactions, it would defeat the purpose of coarse graining. Therefore, one has to decompose the PMF into simpler independent terms and approximate it by simpler interaction functions, ideally ones that resemble interaction functions typically used in molecular mechanics force fields, *i.e.*, short-range bonded contributions and pair potentials or similar. Conceptually, one can decompose the PMF into a series of many-body terms up to an N-body term, where N is the number of particles in the system. However, this in itself does not solve the problem, since these multi-body interactions are again computationally unfeasible.

$$\begin{aligned} V\_{PMF}^{CG}(\mathbf{R}) &= \sum\_{i,j} V\_2(r\_{ij}) + \sum\_{i,j,k} V\_3(r\_{ij}, r\_{jk}, r\_{ik}) + \dots + const \\ &\approx \sum\_{i,j} V\_{\text{eff}}(r\_{ij}) + const \end{aligned} \tag{7}$$

In Equation (7) one approximates the series by an effective pair interaction which also contains contributions from the higher order terms in Equation (7) (some approaches also include three-body terms for systems where this is necessary [101]). There are many approaches to this task of determining effective CG interactions, and all the resulting CG models are (only) approximations to $V\_{PMF}^{CG}(\mathbf{R})$.
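The two basic operations of this subsection, the linear mapping of Equation (1) and the Boltzmann inversion of Equation (5), can be illustrated with a minimal NumPy sketch. It assumes a center-of-mass mapping and a single, independent harmonic degree of freedom; the function names, the masses and the synthetic samples are illustrative choices and are not taken from the references.

```python
import numpy as np

def com_mapping_matrix(masses, site_of_atom, n_sites):
    """Build M (n_sites x n_atoms) such that R = M @ r are centre-of-mass coordinates."""
    masses = np.asarray(masses, dtype=float)
    site_of_atom = np.asarray(site_of_atom)
    M = np.zeros((n_sites, masses.size))
    M[site_of_atom, np.arange(masses.size)] = masses   # atom i contributes to its site only
    return M / M.sum(axis=1, keepdims=True)            # mass-weighted average per site

def boltzmann_invert(samples, bin_edges, kT):
    """Estimate V(q) = -kT ln P(q) (Eq. (5)) from samples of one degree of freedom q."""
    density, edges = np.histogram(samples, bins=bin_edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    filled = density > 0                               # empty bins carry no information
    V = -kT * np.log(density[filled])
    return centers[filled], V - V.min()                # fix the additive constant

# Mapping: two three-atom molecules (masses 16, 1, 1) -> two CG sites.
masses = [16.0, 1.0, 1.0, 16.0, 1.0, 1.0]
M = com_mapping_matrix(masses, site_of_atom=[0, 0, 0, 1, 1, 1], n_sites=2)
rng = np.random.default_rng(0)
r = rng.normal(size=(6, 3))                            # hypothetical atomistic coordinates
R = M @ r                                              # Eq. (1), shape (2, 3)

# Boltzmann inversion: synthetic samples of a harmonic degree of freedom with stiffness k.
kT, k, q0 = 1.0, 100.0, 1.0
q = rng.normal(loc=q0, scale=np.sqrt(kT / k), size=200_000)
centers, V = boltzmann_invert(q, np.linspace(0.7, 1.3, 41), kT)
k_fit = 2.0 * np.polyfit(centers - q0, V, 2)[0]        # V ~ (k/2)(q - q0)^2 + const
```

In this sketch **M** has one row per CG site and one column per atom, and the fitted stiffness `k_fit` recovers the input value only because the sampled degree of freedom is independent, which is the situation Equation (5) presupposes.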

#### *2.2. Multi-Scale Coarse-Graining*

Probably the most painful limitation in the use of the many-body PMF is the fact that, in general, it cannot be decomposed into a sum of local contributions depending on the interactions between two to a few particles. A simple strategy would therefore be to decide on a simple functional form of the potential, e.g., a sum of pairwise, radial interactions, which depends on a set of parameters; the values of the latter are then chosen so that the CG potential is as close as possible to the true PMF. This approach was pioneered by Ercolessi and Adams in 1994 [102] and Tschöp and coworkers in 1998 [103]. Later, Izvekov and Voth [104,105] made use of the force-matching concept of Ercolessi and Adams in the development of the Multi-Scale Coarse-Graining (MS-CG) method. These approaches have been successfully applied to a multitude of soft matter systems, in particular to biomolecules [90–95].

The central idea of Force Matching is to use a variational (*i.e.*, non-iterative) approach for constructing the CG potential based on the atomistic reference simulation (the recorded forces from the atomistic simulation). The numerical implementation of this variational principle works in such a way that the exact many-body PMF (Equation (6)) is represented by a linear combination of basis functions that are functions of the CG site coordinates [14,15]. For a given configuration of the CG coordinates, in fact, the average of the total atomistic force $\mathbf{f}\_{\alpha}$ acting on a CG site α is equal to the negative gradient of the many-body PMF, written here as U(**R**), with respect to the position of that site:

$$
\langle \mathbf{f}\_{\alpha} \rangle\_{\mathbf{R}} \equiv -\frac{\partial U[\mathbf{R}]}{\partial \mathbf{R}\_{\alpha}} \tag{8}
$$

where the subscript **R** on the averages indicates that the sampling is constrained to those configurations of the AA system having the CG sites in a fixed configuration. The CG force field depends on M parameters $g\_1, \dots, g\_M$, which can be prefactors of analytical functions, tabulated values of the interaction potentials, or coefficients of splines used to describe these potentials. These parameters have to be optimized so that the CG force field reproduces the forces in the atomistic system (after mapping) as closely as possible. To this end, one minimizes the difference between the average AA force $\langle \mathbf{f}\_{\alpha} \rangle\_{\mathbf{R}}$ and the force $\mathbf{F}\_{\alpha}$ due to the CG potential by minimizing the following quadratic function:

$$\chi^2[\mathbf{F}] = \frac{1}{3N} \left\langle \sum\_{\alpha=1}^{N} |\mathbf{f}\_{\alpha} - \mathbf{F}\_{\alpha}|^2 \right\rangle \tag{9}$$

Equation (9) can be rephrased in terms of generalized scalar products of elements in a multi-dimensional vector space; these elements are the 3N-dimensional force-fields **f** and **F** acting on the CG sites, with the scalar product and the corresponding norm given by:

$$\mathbf{F}^a \cdot \mathbf{F}^b \equiv \left\langle \sum\_{\alpha=1}^N \mathbf{F}\_{\alpha}^a \cdot \mathbf{F}\_{\alpha}^b \right\rangle \tag{10}$$
 
$$||\mathbf{F}|| \equiv \sqrt{\mathbf{F} \cdot \mathbf{F}}$$

Given the definitions in Equation (10), it can be shown that minimizing the function χ<sup>2</sup> in the MS-CG method is equivalent to minimizing the 'distance' between the many-body PMF and the CG potential:

$$
\chi^2[\mathbf{F}] = \chi^2[\mathbf{F}^{\mathbf{PMF}}] + \frac{1}{3N}||\mathbf{F}^{\mathbf{PMF}} - \mathbf{F}||^2\tag{11}
$$

The force-matching strategy thus *projects* the true many-body PMF onto the basis of functions that are used to define the CG force-field; a thorough formal explanation of this interpretation can be found in References [14,15].
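Because the CG force depends linearly on the parameters when it is expanded in basis functions, the minimization of Equation (9) becomes a linear least-squares problem. The following sketch illustrates this under simplifying assumptions that are not part of the MS-CG literature cited above: a piecewise-constant pair force on distance bins, no periodic boundaries, and synthetic "reference" forces generated from a hypothetical soft repulsion plus noise instead of mapped atomistic forces.

```python
import numpy as np

def pair_design_matrix(pos, edges):
    """Matrix A of one frame such that the CG forces are F = A @ g.

    g[m] is the (piecewise-constant) pair force magnitude in distance bin m,
    positive meaning repulsive. A has 3N rows (force components) and M columns.
    """
    n, n_bins = len(pos), len(edges) - 1
    A = np.zeros((3 * n, n_bins))
    for a in range(n):
        d = pos[a] - pos                       # vectors pointing from each site to site a
        dist = np.linalg.norm(d, axis=1)
        dist[a] = np.inf                       # exclude the self-pair
        m = np.digitize(dist, edges) - 1       # bin index of every pair distance
        inside = (m >= 0) & (m < n_bins)       # keep pairs within the cutoff
        unit = d[inside] / dist[inside, None]
        for c in range(3):
            np.add.at(A[3 * a + c], m[inside], unit[:, c])
    return A

def true_pair_force(dist, r_cut=1.0):
    """Hypothetical reference pair force: a soft linear repulsion."""
    return np.where(dist < r_cut, 5.0 * (1.0 - dist / r_cut), 0.0)

rng = np.random.default_rng(1)
edges = np.linspace(0.0, 1.0, 11)              # M = 10 distance bins up to the cutoff
centers = 0.5 * (edges[:-1] + edges[1:])
A_frames, f_frames = [], []
for _ in range(200):                           # 200 independent toy configurations
    pos = rng.uniform(0.0, 3.0, size=(30, 3))
    d = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(d, axis=2)
    np.fill_diagonal(dist, np.inf)
    f = (true_pair_force(dist)[:, :, None] * d / dist[:, :, None]).sum(axis=1)
    f += rng.normal(scale=0.1, size=f.shape)   # stand-in for fluctuations of the mapped forces
    A_frames.append(pair_design_matrix(pos, edges))
    f_frames.append(f.ravel())

A, f_ref = np.vstack(A_frames), np.concatenate(f_frames)
g_fit, *_ = np.linalg.lstsq(A, f_ref, rcond=None)     # minimises chi^2 of Eq. (9)
chi2_fit = np.mean((f_ref - A @ g_fit) ** 2)          # (1/3N) <sum |f - F|^2>
chi2_centers = np.mean((f_ref - A @ true_pair_force(centers)) ** 2)
```

The least-squares solution `g_fit` is the projection of the reference forces onto the chosen basis, so its residual `chi2_fit` cannot exceed that of any other parameter set in the same basis, such as the reference force evaluated at the bin centers.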

It should be noted, however, that the CG force field is still an approximation to the high dimensional PMF within the limitations of the types of CG forces chosen (for example pair forces that can be derived either from analytical or from numerical tabulated potentials). This also implies that a CG model obtained from force matching does not by construction reproduce the pair correlation functions in the system, and the reproduction of local structural properties such as pair distributions may (or may not) be imperfect depending on the importance of cross-correlations between degrees of freedom. An exact reproduction of the underlying atomistic problem by matching mean forces therefore potentially requires the introduction of higher order (e.g., three-body) interactions. Noid and coworkers have extended the force matching method and demonstrated that the CG force field can be directly determined from structural correlation functions obtained from the atomistic system instead of the forces [99]. Their theoretical approach also allows an assessment of the correlations between different interactions that are neglected by straightforward Boltzmann inversion and allows the quantification of the importance of many-body correlations in CG models. In a recent study, Rudzinski and Noid explore these aspects in detail [106]. They demonstrate how the balance between accurately reproducing individual correlation functions (such as pair correlation functions or angle distributions) and also reproducing cross correlations between the respective degrees of freedom is affected by the mapping scheme and the coarse graining method (or more accurately its targets, namely the mean forces versus the individual correlation functions).

#### *2.3. Boltzmann-Inversion Based Methods*

In contrast to the Force Matching or Multi-scale coarse graining scheme, other structure-based methods provide CG interactions that reproduce pre-defined target structure properties, often a set of radial distribution functions [77–89]. This means that the many-body PMF (Equations (6) and (7)) is replaced as a target by a set of simpler structural correlation functions. If the interactions in the CG model are statistically independent or only weakly coupled then direct Boltzmann inversion determines each term in the potential immediately from the corresponding distribution function [77,107–109]; for non-bonded interactions in dense systems, though, this is typically not the case. This means that the individual distribution functions and their corresponding potentials of mean force, e.g., a radial distribution function of a simple liquid $g\_{target}(r)$ and its Boltzmann inverse, the pair PMF, $V\_0^{CG}(r) = -k\_B T \ln g\_{target}(r)$, cannot be directly used as an interaction function since they correspond not only to the interaction potential but also to the correlated contributions from the surroundings. These multi-body effects of the environment need to be removed from the PMF in order to generate an effective pair potential that reproduces the target structure, for example the pair correlation function in the liquid. It can be shown that such a pair potential is unique up to an arbitrary constant [110] and exists [96,111–113]. There are several numerical methods to generate this pair potential (tabulated interaction function).

Iterative Boltzmann inversion (IBI) [81,114,115] is a natural extension of the Boltzmann inversion method. Here, a numerical CG potential is iteratively refined until the target structure is reproduced within a predefined error. Each step in the iteration procedure is a CG simulation with potential $V\_i^{CG}(r)$ which yields an RDF $g\_i(r)$ that differs from the target $g\_{target}(r)$. The potential is then modified by a correction term $\Delta V\_i(r)$ according to

$$V\_{i+1}^{CG}(r) = V\_i^{CG}(r) + \Delta V\_i(r) = V\_i^{CG}(r) + k\_B T \ln \frac{g\_i(r)}{g\_{target}(r)}$$

Sometimes the potential correction $\Delta V\_i(r)$ is multiplied by a prefactor 0 < λ ≤ 1 to avoid overshooting in the numerical procedure. The iterative procedure is often initiated with the pair potential of mean force $V\_0^{CG}(r) = -k\_B T \ln g\_{target}(r)$, but that is not mandatory. Different starting potentials can be useful, in particular for more complex mixed systems where the iterative procedure may be unstable because intermediate CG models lead to phase separation. This is for example observed in the case of hydrophobic molecules in aqueous solution, where both above-mentioned precautions have been found to be useful to prevent strong oscillations or even instability of the IBI procedure.
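A single IBI update, including the damping prefactor λ, can be sketched as follows. In a real application each iteration requires a CG simulation to obtain the current RDF; here that step is replaced by a deliberately simple stand-in, the low-density limit g(r) = exp(−V(r)/k_BT), so that the loop is self-contained. The stand-in, the Lennard-Jones-like reference potential and the guard `g_min` are assumptions made for this illustration only.

```python
import numpy as np

def ibi_update(V, g_current, g_target, kT, lam=1.0, g_min=1e-10):
    """One IBI step: V_{i+1}(r) = V_i(r) + lam * kT * ln(g_i(r) / g_target(r))."""
    V, g_current, g_target = (np.asarray(x, dtype=float) for x in (V, g_current, g_target))
    valid = (g_current > g_min) & (g_target > g_min)   # skip bins where the log is undefined
    dV = np.zeros_like(V)
    dV[valid] = kT * np.log(g_current[valid] / g_target[valid])
    return V + lam * dV

def toy_rdf(V, kT):
    """Stand-in for a CG simulation: low-density limit, where g(r) = exp(-V(r)/kT)."""
    return np.exp(-np.asarray(V) / kT)

kT = 1.0
r = np.linspace(0.95, 3.0, 100)
V_ref = 4.0 * (r**-12 - r**-6)                 # hypothetical potential behind the target RDF
g_target = toy_rdf(V_ref, kT)

V = np.zeros_like(r)                           # deliberately poor starting potential
for _ in range(30):
    V = ibi_update(V, toy_rdf(V, kT), g_target, kT, lam=0.5)
```

In this idealized limit the error in the potential shrinks by the factor (1 − λ) per iteration; in a dense liquid the convergence is not guaranteed to be this regular, which is where the damping and the choice of starting potential become relevant.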

IBI is by no means the only numerical method that solves the above task. Another numerical scheme is the so-called inverse Monte Carlo (IMC, more recently renamed Newton inversion) method [78,79,83,84] which, according to Henderson's theorem, should lead to the same numerical solution for the pair potential corresponding to a given pair correlation function. While in IBI the potential update $\Delta V\_i$ is ad hoc, in IMC it is computed using rigorous statistical mechanical arguments (for details see Reference [78]). In the case of multicomponent systems, where several pair potentials need to be updated, IMC accounts for correlations between observables, *i.e.*, the updates for the different potentials are interdependent. In contrast, for IBI each potential is updated independently, which might lead to oscillations and convergence problems in the iteration procedure. The disadvantage of IMC on the other hand is a high computational cost and problems with numerical stability; for a detailed comparison see Reference [116]. Related to IMC, there are several other recent developments, e.g., a molecular renormalization group approach [85–87] or an approach that relies on relative entropies [96–98] (which will be discussed in more detail below). While the above structure-based methods by construction reproduce *exactly*, within the error of the numerical procedure, the local pair structures and thus are well-suited to the reinsertion of atomistic coordinates, it can be expected *a priori* that they will not be equally well suited to the reproduction of thermodynamic properties (pressure, phase behavior, *etc.*) of the reference system; in this respect, water provides a prototypical case and a reference for testing.

Note also that CG models based on pair correlation functions do not necessarily reproduce higher-order (e.g., three-body) structural correlations [116] since the pair correlation functions as structural targets are just an approximation to the total conformational distribution function obtained from the atomistic sampling, $P^{AA}(\mathbf{R})$ (Equation (4)). This means that if higher order correlations are a crucial part of the many-body PMF, models based on pair structures may fail to represent these, and it may even be possible that models which are limited to pair potentials may fail to reproduce these correlations irrespective of the parametrization methodology. One example where this is studied in detail is liquid water [101,116–119]. Recently Noid and coworkers have analyzed these aspects using concepts from liquid state theory [100,120].
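The difference between the IBI and IMC updates can be made concrete with a toy model. The sketch below assumes the standard linear-response relation for a tabulated potential with Hamiltonian H = Σ_γ V_γ S_γ, namely ∂⟨S_α⟩/∂V_γ = −(⟨S_α S_γ⟩ − ⟨S_α⟩⟨S_γ⟩)/k_BT, and uses it in a Newton step. To stay self-contained it replaces the Monte Carlo sampling by exact averages for a single pair distance distributed over bins; this toy system, the function names and the reference potential are assumptions of the illustration, and the details of the actual method should be taken from Reference [78].

```python
import numpy as np

def toy_distribution(V, w, kT):
    """Exact bin probabilities of a single pair distance, p_k proportional to w_k exp(-V_k/kT)."""
    p = w * np.exp(-(V - V.min()) / kT)
    return p / p.sum()

def imc_update(V, S_mean, S_cov, S_target, kT):
    """Newton step: solve J dV = S_target - <S> with J = -cov(S)/kT."""
    J = -S_cov / kT
    # J is singular (a constant shift of V changes nothing); lstsq picks the minimum-norm dV
    dV, *_ = np.linalg.lstsq(J, S_target - S_mean, rcond=None)
    return V + dV

kT = 1.0
r = np.linspace(0.5, 2.0, 20)
w = r**2                                          # ideal-gas weight of each distance bin
V_ref = -0.5 * np.exp(-((r - 1.0) ** 2) / 0.1)    # hypothetical potential behind the target
p_target = toy_distribution(V_ref, w, kT)

V = np.zeros_like(r)
for _ in range(20):
    p = toy_distribution(V, w, kT)                # <S_k> for indicator observables S_k
    cov = np.diag(p) - np.outer(p, p)             # <S_a S_g> - <S_a><S_g>
    V = imc_update(V, p, cov, p_target, kT)
```

The off-diagonal elements of the covariance matrix are what couple the updates of different bins (or, in a mixture, of different potentials); in a real IMC run they have to be estimated from sampling, which is the origin of the cost and the numerical noise mentioned above.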

One more note concerning Henderson's theorem: even though there is in principle one *exact* solution for the effective pair potential that reproduces a given pair correlation function, different potentials might give a reasonably close representation of the structure, *i.e.*, the above inverse problem is mathematically ill-posed [116,121]. This effect becomes even more pronounced in complex systems where several interaction functions corresponding to several RDFs need to be numerically determined. This can to some extent be turned into an advantage since it allows one to impose thermodynamic constraints in the parametrization procedure. This will result in interaction functions which do *not exactly* reproduce the target structure but give a very close representation while at the same time producing the desired thermodynamic behavior. One example of this is pressure correction terms [81,117]. Here, an additional linear pressure correction is applied during the iterative Boltzmann inversion procedure with

$$
\Delta V\_{i,P}^{CG}(r) = A\_i \left( 1 - \frac{r}{r\_{cut}} \right) \tag{12}
$$

where $r\_{cut}$ is the radial cutoff distance of the non-bonded interaction and the constant $A\_i$ is determined from the virial expression for the pressure via

$$-\left[\frac{2\pi N\rho}{3r\_{cut}}\int\_{0}^{r\_{cut}}r^3 g\_i(r) \text{d}r\right] A\_i \approx \left(P\_i - P\_{target}\right)V\tag{13}$$

V is the volume of the system, $P\_i$ the pressure of the CG model in the i-th iteration, and $P\_{target}$ the target pressure. The price to pay for this adjustment, however, is the loss of the perfect compressibility match. This phenomenon is of course a direct consequence of the state point dependency of coarse grained interactions. Further details on this topic can be found in Reference [117]. Recently, different functional forms of pressure correction terms and the influence of the cutoff length have been explored by Fu *et al.* [122].
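Equations (12) and (13) translate directly into a few lines of code. The sketch below solves Equation (13) for the amplitude, treating the approximate relation as an equality, and returns the linear ramp of Equation (12); the trapezoidal integration, the function name and the numbers in the example (reduced units, g(r) = 1) are choices made for this illustration.

```python
import numpy as np

def pressure_correction(r, g, P_current, P_target, n_particles, volume, r_cut):
    """Return (A_i, dV) with dV(r) = A_i (1 - r/r_cut), A_i taken from Eq. (13)."""
    r, g = np.asarray(r, dtype=float), np.asarray(g, dtype=float)
    rho = n_particles / volume
    inside = r <= r_cut
    y, x = r[inside] ** 3 * g[inside], r[inside]
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))      # trapezoidal rule
    prefactor = 2.0 * np.pi * n_particles * rho / (3.0 * r_cut) * integral
    A = -(P_current - P_target) * volume / prefactor
    dV = np.where(inside, A * (1.0 - r / r_cut), 0.0)           # ramp vanishes at r_cut
    return A, dV

# Check case with g(r) = 1, for which the integral equals r_cut^4 / 4.
r = np.linspace(0.0, 1.0, 1001)
A, dV = pressure_correction(r, np.ones_like(r), P_current=1.1, P_target=1.0,
                            n_particles=1000, volume=1000.0, r_cut=1.0)
```

With the pressure of the CG model above the target, the amplitude comes out negative, so the ramp adds a weak constant attractive force A_i/r_cut inside the cutoff, which lowers the pressure in the next iteration.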

It is to be expected that there will be more development in this direction (using other types of thermodynamic constraints), since in particular for complex soft matter systems the balancing of structural and thermodynamic behavior in CG models is an ongoing field of research [88,89].

The IBI method is in its original form designed and best suited for systems with uniform density distributions. Recently, Jochum *et al.* have shown how it can be generalized for non-bonded potentials for inhomogeneous systems [123]. For a system with a slab geometry (such as systems of solvent slabs in vacuum or phase-separated systems consisting of two liquid slabs in contact with each other), the method is analogous to IBI but the iterative update of the interaction potential consists of two terms, one based on the radial distribution function calculated in a slab geometry and one that accounts for the slab and interfacial widths. These latter geometric features are very sensitive to the thermodynamic properties (surface tension) of the interface. Therefore the two update terms allow for a balance between the local liquid structure and the thermodynamic properties of the liquid/vapor or liquid/liquid interface. In addition to water/vapor and methanol/vapor interfaces, the method has also been successfully applied to a solute-solvent system of a single benzene molecule at the vacuum-water interface, *i.e.*, it is possible to account to some extent for the partitioning behavior of a solute between bulk and interface, an aspect that makes this method promising in the context of designing transferable CG models for phase separation processes (see below).

Last but not least, one should mention the particular case of Boltzmann-inversion based approaches for mixed systems where (at least) one component is very dilute (from now on termed solute), e.g., biomolecules in aqueous solution. In this case, iterative Boltzmann inversion and similar methods are problematic. While one can easily compute the solvent-solvent and the solute-solvent radial distribution functions, and therefore determine the corresponding CG potentials with for example IBI, this is not so straightforward for the interactions between the low concentration component (solute). (Note that for simplicity only solutes that are represented by a single CG bead will be discussed here.) In these cases, obtaining the PMF through brute force sampling of a radial distribution function is not advisable. One should rather compute the solute-solute pair PMF (between two solute particles) with an advanced sampling method such as umbrella sampling or thermodynamic integration (using distance constraints) [124,125].

When solvent degrees of freedom are not explicitly present in the CG system, this solute-solute PMF can be used directly as an effective solute-solute non-bonded interaction since the environmental (solvent) effects within the PMF are not explicitly represented through solvent degrees of freedom in the CG model. For many types of solutes the solute-solute PMF has been used as an interaction potential in implicit solvent models [126,127]. One prominent example is the use of the solute-solute pair PMF for implicit solvent models of aqueous electrolyte solutions, *i.e.*, implicit solvent ion models [37,79,85,128,129].

The case is somewhat different if some sort of explicit solvent representation, for example in the form of a CG water model, is present in the CG system. In this case, effective solute-solute non-bonded pair interactions are needed from which the solvent contributions are removed in the same way they are removed by IBI in other systems. However, due to the sampling problem of the PMF between dilute components, an iterative procedure is prohibitive for solute-solute interactions. To solve this problem, an approximate method has been developed by Villa *et al.* [38,130]. Here, the CG solvent-solvent and solute-solvent interactions are first determined, for example through normal IBI. Then the pair PMF between the solutes, $V\_{PMF}^{AA}(r)$, is computed (from atomistic umbrella sampling or thermodynamic integration) and used as a target; in other words, the resulting CG model is parameterized to reproduce the solute-solute association strength observed in the atomistic system. In order to remove the solvent contribution from $V\_{PMF}^{AA}(r)$, a subtraction procedure is employed. One conducts a separate PMF calculation (again with umbrella sampling or thermodynamic integration), this time in a CG system, where the (previously determined) CG solvent-solvent and solute-solvent interactions are present but no direct interaction between the solute particles is turned on. The resulting PMF $V\_{PMF,excl}^{CG}(r)$ consists *only* of the environmental contributions (in the CG environment). By subtracting $V\_{PMF,excl}^{CG}(r)$ from the target PMF one obtains the missing direct pair interaction

$$V^{CG}(r) = V^{AA}\_{PMF}(r) - V^{CG}\_{PMF,excl}(r) \tag{14}$$

which by construction reproduces the target PMF. Note that this subtraction procedure is not necessarily limited to CG solvent-solvent or solute-solvent interactions determined by IBI. In principle other types of CG solvent-solvent or solute-solvent interactions could also be used to determine $V\_{PMF,excl}^{CG}(r)$. If one then applies Equation (14), one obtains an effective solute-solute interaction $V^{CG}(r)$ which reproduces the atomistically observed solute-solute association strength (*i.e.*, $V\_{PMF}^{AA}(r)$) in the particular CG solvent that was chosen.
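In practice the two PMFs entering Equation (14) are tabulated on different grids and are each defined only up to an additive constant. A minimal sketch of the subtraction, assuming linear interpolation onto a common grid and a shift that sets the resulting potential to zero at the largest distance, is given below; the synthetic PMFs are purely illustrative and use the same environmental term in both calculations, which a real system would only approximate.

```python
import numpy as np

def direct_pair_interaction(r, r_aa, pmf_aa, r_cg, pmf_cg_excl):
    """Eq. (14) on a common grid r; both PMFs are interpolated linearly."""
    V = np.interp(r, r_aa, pmf_aa) - np.interp(r, r_cg, pmf_cg_excl)
    return V - V[-1]                       # PMFs are defined only up to a constant

def direct(x):
    """Hypothetical direct solute-solute attraction (short-ranged well)."""
    return -np.exp(-((x - 0.5) ** 2) / 0.02)

def environment(x):
    """Hypothetical solvent-mediated (oscillatory) contribution."""
    return 0.3 * np.cos(2.0 * np.pi * x / 0.3) * np.exp(-x)

r_aa = np.linspace(0.3, 1.5, 241)          # grid of the atomistic PMF calculation
r_cg = np.linspace(0.3, 1.5, 121)          # grid of the CG PMF without direct interaction
r = np.linspace(0.3, 1.5, 61)              # grid of the tabulated CG potential
pmf_aa = direct(r_aa) + environment(r_aa) + 2.0     # arbitrary offsets
pmf_cg_excl = environment(r_cg) - 1.0
V_cg = direct_pair_interaction(r, r_aa, pmf_aa, r_cg, pmf_cg_excl)
```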

#### *2.4. Relative Entropy*

Aiming at reproducing different properties or objective functions of the reference, atomistic system, IBI and Force Matching have manifestly different algorithms and produce qualitatively different results. Recently a different coarse-graining strategy has been developed, namely the Relative Entropy method [131–133], which relies on a quantitative measure of the loss of information that follows from the description of a system in terms of different interaction potentials and/or different resolution. Remarkably, it is possible to demonstrate that the information function employed in this strategy connects Relative Entropy, IBI and Force Matching together. The functional form of this measure function is given by the relative entropy, or Kullback–Leibler distance:

$$S\_{rel} = \sum\_{\nu} \mathcal{P}\_{AA}(\nu) \cdot \ln \left( \frac{\mathcal{P}\_{AA}(\nu)}{\mathcal{P}\_{CG}(\nu)} \right) \tag{15}$$

In Equation (15), ν labels a given microstate or atomistic configuration, $\mathcal{P}\_{AA}(\nu)$ is the probability of sampling a configuration ν in the fully atomistic system, and $\mathcal{P}\_{CG}(\nu)$ is the probability of sampling the same (atomistic) configuration in the system with coarse-grained interactions, but still described by a high-resolution structure. This latter probability is degenerate with respect to the atomistic configurations, as many of them correspond to the same coarse-grained configuration $\mathcal{V}$. It is therefore advantageous to write the probability to sample a given atomistic configuration in the CG system in terms of the function that maps the fine-grained configurations onto the coarse-grained ones:

$$\mathcal{P}\_{CG}(\nu) \equiv \frac{\mathcal{P}\_{CG}'(\mathcal{V})}{\Omega(\mathcal{V})}$$

$$\mathcal{V} \equiv \mathbf{M}(\nu) \tag{16}$$

Here, P′<sub>CG</sub>(V) is the probability of sampling the CG configuration V in the low-resolution system and Ω(V) = Σ<sub>ν</sub> δ(**M**(ν) − V) is a measure of the degeneracy of the configuration V in the atomistic system. It should be noted that this last quantity depends only on the mapping function **M** and not on the coarse-grained interactions; this term can therefore be separated out in the definition of the relative entropy to obtain:

$$\begin{aligned} S\_{rel} &= S\_{map} + \langle \phi \rangle \\ \text{with:} \\ \langle \mathcal{Q} \rangle &\equiv \sum\_{\nu} \mathcal{P}\_{AA}(\nu) \cdot \mathcal{Q}(\nu) \\ S\_{map} &= \langle \ln \left( \Omega(\mathbf{M}(\nu)) \right) \rangle \\ \phi(\nu) &= \ln \left( \frac{\mathcal{P}\_{AA}(\nu)}{\mathcal{P}\_{CG}'(\mathbf{M}(\nu))} \right) \end{aligned} \tag{17}$$

The quantity φ(ν) can be interpreted as the amount of information in the configuration ν which discriminates between the atomistic and the coarse-grained probability. The definition in Equation (17) is particularly appealing because it shows that the relative entropy can be computed as the sum of operator averages. In the special, but quite common case of systems in thermal equilibrium, the probability distributions P are simply given by the Boltzmann weights, and the relative entropy reduces to the form:

$$S\_{rel} = S\_{map} + \beta \left[ \left( A\_{AA} - A\_{CG} \right) - \left\langle U\_{AA} - U\_{CG} \right\rangle\_{AA} \right] \tag{18}$$

with A<sub>AA</sub> (resp. A<sub>CG</sub>) being the free energy of the atomistic (resp. CG) system. For a given choice of the mapping function **M**, the optimal coarse-grained potential is obtained by minimizing the relative entropy functional with respect to the parameters that define this potential: common choices for non-bonded, two-body interactions are the coefficients of a Lennard-Jones potential or the nodes of a spline.
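The decomposition of Equation (15) into a mapping term and an information term, Equation (17), can be checked numerically on a small discrete system. The following is a minimal sketch under stated assumptions: the four microstates, their probabilities, the two-state mapping and all function names are hypothetical and chosen only for illustration.

```python
import math

def relative_entropy(p_aa, p_cg):
    """Equation (15): sum over microstates of P_AA * ln(P_AA / P_CG)."""
    return sum(pa * math.log(pa / pc) for pa, pc in zip(p_aa, p_cg) if pa > 0.0)

def decomposed_relative_entropy(p_aa, p_cg_prime, mapping):
    """Equation (17): return (S_map, <phi>), both averaged over the AA ensemble."""
    # Omega(V): number of atomistic microstates mapped onto the CG state V.
    omega = {}
    for v in mapping:
        omega[v] = omega.get(v, 0) + 1
    s_map = sum(pa * math.log(omega[v]) for pa, v in zip(p_aa, mapping))
    phi_avg = sum(pa * math.log(pa / p_cg_prime[v])
                  for pa, v in zip(p_aa, mapping) if pa > 0.0)
    return s_map, phi_avg

# Hypothetical toy system: four atomistic microstates, two CG states.
p_aa = [0.4, 0.3, 0.2, 0.1]        # atomistic probabilities P_AA(nu)
mapping = [0, 0, 1, 1]             # CG state M(nu) of each microstate
p_cg_prime = {0: 0.6, 1: 0.4}      # CG-model probabilities P'_CG(V)

# Equation (16): P_CG(nu) = P'_CG(M(nu)) / Omega(M(nu)); here Omega = 2 for both CG states.
p_cg = [p_cg_prime[v] / 2 for v in mapping]

s_rel = relative_entropy(p_aa, p_cg)
s_map, phi_avg = decomposed_relative_entropy(p_aa, p_cg_prime, mapping)
```

In this toy case `s_map` equals ln 2 because every CG state has two atomistic preimages, and `s_rel` agrees with `s_map + phi_avg` to numerical precision. Only `phi_avg` depends on the CG model, which is why the mapping term can be ignored when optimizing the potential parameters.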

As anticipated at the beginning of this section, IBI and Force Matching can be connected using the concept of relative entropy. In fact, a straightforward minimization of Srel making use of two-body coarse-grained potentials can be shown to be equivalent to the IBI algorithm; on the other hand, the Force Matching scheme is retrieved if the average of the function |∇φ| <sup>2</sup> is minimized instead of the average of φ [134]: the squared gradient of the φ function with respect to the Cartesian coordinates, in fact, is proportional to the squared difference of the forces obtained from the AA and the CG descriptions, so that:

$$
\chi^2[\mathbf{F}] = \chi^2[\mathbf{F}^{\mathbf{PMF}}] + \frac{(k\_B T)^2}{3N} \left< |\nabla \phi|^2 \right>\tag{19}
$$

In conclusion, it therefore appears evident that different coarse-graining schemes are obtained through the minimization of different functionals of the same information function φ, which represents the unifying element between various approaches.
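For orientation, the force-matching residual on the left-hand side of Equation (19) can be sketched as a mean squared force deviation. This is a minimal sketch assuming the usual definition, χ² = (1/3N) ⟨Σ|**F**<sup>ref</sup> − **F**<sup>CG</sup>|²⟩, with the reference forces already mapped onto the CG sites; the function name and the data layout are hypothetical.

```python
def force_matching_residual(ref_forces, cg_forces):
    """Mean squared deviation per Cartesian component between reference and CG forces.

    Both arguments are lists of frames; each frame is a list of N (fx, fy, fz) tuples.
    """
    total = 0.0
    n_components = 0
    for frame_ref, frame_cg in zip(ref_forces, cg_forces):
        for f_ref, f_cg in zip(frame_ref, frame_cg):
            total += sum((a - b) ** 2 for a, b in zip(f_ref, f_cg))
            n_components += 3
    # Dividing by 3N per frame and averaging over frames is the same as
    # dividing the grand total by the number of force components.
    return total / n_components

# Hypothetical single-frame, single-site example.
ref = [[(1.0, 0.0, 0.0)]]
cg = [[(0.0, 0.0, 0.0)]]
chi2 = force_matching_residual(ref, cg)
```

A CG force field that reproduced the mapped reference forces exactly would give a residual of zero; in practice the residual is bounded from below by the term χ²[**F**<sup>PMF</sup>] in Equation (19).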

#### *2.5. Transferability of Coarse-Grained Models*

From the preceding sections we have seen that there are different approaches to the systematic parameterization of CG models which by construction will not be equally well suited to the reproduction of thermodynamic and structural properties of the system. It is not *a priori* clear whether structure-based potentials reproduce macroscopic thermodynamic properties and, vice versa, whether thermodynamics-based potentials reproduce microscopic structural properties. However, the interplay of structure and thermodynamics is crucial for the investigation of structure formation processes, in particular for biomolecular aggregation in aqueous solution where partitioning and phase separation play a decisive role. All CG models (in fact also all classical atomistic force fields) are state-point dependent and cannot necessarily be transferred, without reparametrization, to thermodynamic conditions or a chemical environment different from the one in which they were derived. This means "transferability" can refer to a change in temperature, density, concentration, system composition, phase, *etc.*, but also a change in chemical environment, e.g., the change of length or sequence of an amino acid chain. Structure-motivated CG models which approximate the high dimensional PMF obtained from an atomistic reference are by construction heavily state point dependent, and several studies have addressed questions regarding their ability to reproduce thermodynamic properties. One system that has been of particular interest in this context is liquid water [112,117,135]. The reason is on the one hand of course its immense importance in all questions regarding biomolecular systems.
In addition, it is of particular methodological interest because for single bead models of water it is known that three-body correlations play a decisive role and the potential compromise between reproducing pair- or higher order structural correlations is particularly relevant for the properties of the model [101,116,117]. Different studies have been carried out that compare structure-motivated and thermodynamics-based CG models [121,136,137]. While CG models where the parametrization targets had been solvation and partitioning properties are particularly well suited to reproduce processes where for example hydrophilicity/hydrophobicity arguments play a decisive role, they do not per se reproduce the structure of the system [121,136]. Related to their ability to reproduce the thermodynamic properties of certain chemical units, these models exhibit considerable transferability and can often be applied to a variety of molecular systems and a range of thermodynamic conditions. Motivated by these observations, intensive research is currently being carried out to derive CG potentials that are both thermodynamically as well as structurally consistent with the underlying higher-resolution description, thus ensuring for example a certain state point transferability [38,88,89,94,138].

One possibility to improve transferability in this context is to exploit—similar to the case of the pressure correction described above—the fact that the derivation of a CG model based on the reproduction of structural properties (potentials of mean force) is an ill-posed problem which allows a reproduction of the original target property within a given error while at the same time including certain thermodynamic target properties during parametrization. One approach developed by Ganguly *et al.* for multicomponent systems that follows this idea combines the IBI method with Kirkwood–Buff integrals as additional targets which are related to the activity coefficients of the components [139]. With this approach transferability over a certain concentration range can be achieved.

Yet another non-structure-based method that produces CG pair potentials with remarkable state point transferability is the conditional reversible work method by Brini *et al.* [140–142]. Here, several calculations of pair potentials of mean force on the atomistic level are used to assess and correctly account for the indirect contribution by the environment to the effective CG pair forces. The observed transferability of this method can be ascribed to the fact that the method relies on direct pairwise interactions in the atomistic reference system. In other words, the method does *not* rely on reproducing a structural property such as a pair PMF or multi-body PMF, *i.e.*, on properties that are extremely dependent on the precise thermodynamic state of the reference system.

It has been mentioned before that effective pair potentials account for multi-body effects, for example, three body interactions. For this reason, they are only to a limited extent additive, which limits the transferability of the potentials [38,143]. Understanding the physical nature of non-additivity in the system of interest can help to make a CG model transferable. In principle, there are various possibilities to approach the question of transferability of effective pair potentials: (i) One applies a model derived at/optimized for a given state point unaltered to a range of state points nearby; in that case, one has to carefully investigate the range in which this is permitted [144–146]; (ii) One creates a new set of potentials for each state point one wants to investigate [144]; (iii) One specifically designs a single CG model with the aim of transferability (for example specific density dependent potentials [94,147,148], CG models that are designed to be applicable for a range of mixture compositions [71,138], or CG models that are capable of capturing a liquid crystalline phase transition [88,89]); (iv) One uses a model derived at one state point and (analytically) modifies it to be applicable to different conditions (one example being the rescaling of potentials in order to apply them to a different temperature [149]).

The approach of using a model at a specific state point and then testing its transferability over a reasonably wide range of different physical conditions has traditionally been applied in the case of classical polymer melts. In this field, structure-based models have been very successfully applied, and decent temperature [77,150–152] and pressure transferability [153] have been found. In fact, in the first papers by Tschöp *et al.* [77,103] the temperature transferability already allowed the semi-quantitative prediction of shifts in the Vogel–Fulcher temperature for different polycarbonate modifications. This observation appears to hold for classical isotropic polymer melt systems where the behavior is largely dominated by the correct representation of the chain conformations and the excluded volume of the chain. As soon as more specific chemical interactions play a role, the question of transferability becomes more delicate.

In the following, we discuss three examples which illustrate that understanding the physical basis behind the limitations in transferability can help to design transferable models.

Binary mixtures have in general been widely used as model systems to explore various aspects of the transferability of CG models [37,38,71,128,129,138,143,147,148]. The transferability to different concentrations of liquid mixtures or solutions is of vital importance for simulation of processes such as (bio)molecular aggregation which are characterized by spatially varying structure and fluctuating concentrations.

Following the above-described method to apply Boltzmann-inversion derived methods in dilute solute solvent systems, a CG model for mixed systems of benzene in water had been derived [38]. This means that the CG benzene-benzene potential had been parameterized on the basis of the benzene-benzene PMF of two benzene molecules in aqueous solution, *i.e.*, at "infinite" dilution. Benzene-water mixtures of different composition have been studied with this CG model and analyzed using the Kirkwood-Buff theory of solutions [154]. Kirkwood-Buff theory provides a link between local structural information and thermodynamic properties of the solution. This CG model, parametrized at infinite dilution of benzene, reproduces the Kirkwood-Buff integrals of mixtures at various concentrations obtained with the detailed-atomistic model. It reproduces the changes in the benzene chemical potential and the activity coefficients of the mixtures over a range of mixture compositions (up to concentrations where benzene and water demix in the atomistic reference simulation). A possible explanation is that hydrophobic interactions between benzene solutes are short-ranged, and the multi-body correlations involved in hydrophobic association can be described by pairwise additive effective potentials (category (i) of the above list). The observed transferability of the potential supports the idea that hydrophobic interactions between small molecules are pairwise additive. Villa *et al.* also found that a different CG model for benzene-benzene interactions that had been derived for pure benzene (via IBI) is neither suited to describe benzene-benzene interactions in aqueous solution at different concentrations nor a phase-separated benzene/water system with a bulk benzene layer [38].
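The link between local structure and thermodynamics mentioned above runs through the Kirkwood-Buff integrals, G<sub>ij</sub> = 4π ∫ (g<sub>ij</sub>(r) − 1) r² dr, which are evaluated from the pair distribution functions. The sketch below shows a simple trapezoidal estimate. The step-function g(r), the grid and the diameter `sigma` are hypothetical test inputs, not data from the benzene-water study.

```python
import math

def kirkwood_buff_integral(r, g):
    """Trapezoidal estimate of G = 4*pi * integral of (g(r) - 1) * r^2 dr."""
    total = 0.0
    for k in range(len(r) - 1):
        f_lo = (g[k] - 1.0) * r[k] ** 2
        f_hi = (g[k + 1] - 1.0) * r[k + 1] ** 2
        total += 0.5 * (f_lo + f_hi) * (r[k + 1] - r[k])
    return 4.0 * math.pi * total

# Hypothetical g(r): complete exclusion below sigma, ideal behavior (g = 1) beyond.
sigma = 0.5                      # nm, illustrative only
dr = 0.001                       # nm, grid spacing
r = [k * dr for k in range(2001)]
g = [0.0 if x < sigma else 1.0 for x in r]

G = kirkwood_buff_integral(r, g)
G_exact = -4.0 / 3.0 * math.pi * sigma ** 3   # minus the excluded volume
```

For this idealized g(r) the integral is simply minus the excluded volume, which gives a quick sanity check. With simulation data the integral has to be truncated at a finite distance, and its convergence with the upper limit should be inspected before comparing atomistic and CG models.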

To reproduce the actual phase separation process as well as the behavior of the mixed (or dilute) systems is much more complicated (yet it is of vital importance in the parameterization of bottom-up CG models that are able to reproduce biological partitioning and self aggregation phenomena). Here, a combination of a wise choice of one or possibly several reference state points is promising, in particular combining the reference of infinite dilution with the phase separated one. For the latter, application of the IBI extension by Jochum *et al.* for inhomogeneous systems with an interface/phase boundary can be utilized [123].

In this context it should also be mentioned that similar transferability problems exist in other areas, for example in the simulations of solids (e.g., with embedded atom potentials). As soon as one encounters surfaces or interfaces the local environment of an atom differs substantially from the bulk (crystalline) phase, which was used to parameterize the interaction potentials. Consequently the transferability of the potentials will affect the ability to model processes such as crack formation or the relocation of grain boundaries [155,156].

In the second example, the situation is different. Here, the transferability of CG (in this case implicit-solvent) ion models in aqueous solution had been investigated. Due to long-range electrostatic interactions, the ions affect the behavior of water increasingly strongly with increasing ion concentration. More specifically, the presence of many ions reduces the orientational fluctuations of the water molecules and thus the dielectric permittivity of the solvent. Therefore, effective ion-ion potentials parametrized at infinite dilution are not directly transferable to higher salt concentrations. Hess *et al.* developed a reduced-resolution (in this case implicit-solvent) potential for aqueous electrolyte solutions where an ion-concentration-dependent Coulomb term was added to the (ion-specific) pair interaction. Thus, by using a concentration-dependent dielectric permittivity for water, part of the multi-body effects in the system were accounted for in the ion-ion pairwise interaction in the implicit solvent model [128,129]. This approach reproduced the NaCl solution osmotic properties and the ion coordination up to a concentration of 2.8 M (mol/L). While in the case of the CG model of benzene/water mixtures [38] the short-range hydrophobic interactions parameterized at infinite dilution were directly transferable to higher benzene concentrations, the ion-ion interactions determined at infinite dilution had to be split into a short-range ion-specific and a long-range electrostatic part. The interactions were then made transferable by keeping the short-ranged part constant and analytically modifying the long-ranged electrostatic part (category (iv) of the above list). Shen *et al.* have further investigated the structure and osmotic properties of electrolyte solutions over a wide range of concentrations [37]. 
Using a concentration-dependent dielectric constant one also obtains very good structural properties of the electrolyte solution at low and intermediate salt concentrations while for larger salt concentrations multi-body ion-ion correlations limit straightforward transferability. Guided by this structural analysis, the transferability of the implicit-solvent model could also be improved for high ion concentrations. One obtains transferable implicit-solvent effective pair potentials which are both structurally and thermodynamically consistent with an explicit solvent reference model.
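The splitting of the ion-ion interaction into a fixed short-range part and a concentration-dependent electrostatic part can be sketched as follows. This is an illustration under loud assumptions: the linear permittivity model, its coefficients and the r<sup>−12</sup> short-range term are hypothetical placeholders, not the parametrization of References [128,129]. The Coulomb prefactor is the approximate value of 1/(4πε<sub>0</sub>) in kJ mol<sup>−1</sup> nm e<sup>−2</sup>.

```python
COULOMB_CONSTANT = 138.935  # kJ mol^-1 nm e^-2, approximate 1/(4*pi*eps0)

def dielectric_permittivity(conc, eps_water=78.0, decrement=10.0):
    """Hypothetical linear model eps(c) = eps_water - decrement * c (c in mol/L)."""
    return max(eps_water - decrement * conc, 1.0)

def short_range(r, a=1.0e-4):
    """Placeholder ion-specific short-range term a / r^12, kept fixed for all c."""
    return a / r ** 12

def ion_pair_potential(r, q_i, q_j, conc):
    """Effective pair potential: fixed short-range part plus screened Coulomb part."""
    eps = dielectric_permittivity(conc)
    return short_range(r) + COULOMB_CONSTANT * q_i * q_j / (eps * r)

# Cation-anion interaction at 1 nm, at infinite dilution and at 2 mol/L.
v_dilute = ion_pair_potential(1.0, +1.0, -1.0, 0.0)
v_concentrated = ion_pair_potential(1.0, +1.0, -1.0, 2.0)
```

Because the permittivity decreases with salt concentration, the long-range attraction between oppositely charged ions becomes stronger at higher concentration while the short-range part is untouched, which is the analytic modification referred to as category (iv).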

The third example again stresses the immense importance of a good reference state point. It also shows how the reference choice can be guided by understanding the underlying physics.

One highly relevant case of a transferability problem is the ability of a CG model to correctly describe a phase transition while being (reasonably) faithful at both phases below and above, a prominent example being liquid crystalline systems. For such systems, coarse graining can gain access to large system sizes with local disorder, domains *etc.*, and a bottom-up, non-generic CG model has the power to include molecular flexibility and other chemistry specific details. This means that the model should on the one hand faithfully represent the structure in the LC ordered state and on the other hand reproduce the LC/isotropic phase transition.

For an azobenzene-based liquid crystalline compound (8AB8) it was found that state point transferability could be achieved by choosing as an appropriate state point for the reference simulation the supercooled liquid just below the smectic-isotropic phase transition. This reference state is characterized by a high degree of local nematic order while being overall isotropic. The primary idea behind this choice of reference state is the observation that, in the spirit of arguments from classical density functional theories of liquids [157], the short ranged correlations in the ordered phase are not very different from the local correlations present in a disordered phase at a suitable thermodynamic state (density, temperature, *etc.*) as one approaches the transition from the high-temperature side. If one captures these local correlations and builds them into the (structure-based) potentials, then these potentials should be able to describe phases on both sides of the transition. For 8AB8, indeed an excellent structural correspondence with the atomistic reference in the smectic state has been found. With the resulting CG model it is possible to switch between the atomistic and the CG levels (and vice versa) in a seamless manner maintaining values of all the relevant order parameters which describe the LC ordered state (see Figure 1). At the same time, this CG model shows remarkable state point transferability and reproduces the LC-isotropic phase transition upon heating and cooling [39]. Such a CG LC model is on the one hand sufficiently coarse grained to study a variety of processes in the LC phase, while on the other hand it remains closely related to an underlying chemically realistic atomistic description, e.g., allowing for realistic molecular flexibility. It is therefore able to give new insights into, for example, microscopic dynamics in LC phases [40].

Figure 1. A transferable coarse-grained (CG) model for a liquid crystalline molecule that reproduces the ordered/disordered phase transition while at the same time being highly consistent with an atomistic level of resolution. This is achieved by the choice of reference state point, namely the supercooled liquid just below the smectic-isotropic phase transition which is characterized by a high degree of local nematic order while being overall isotropic, for details see Reference [39]. Left panel: snapshot of a CG simulation in the LC state with a backmapped atomistic structure superimposed; Right panel: This model allows mechanistic studies of dynamic processes in smectic systems, where the influence of the intrinsic flexibility of the molecules on the free energy of different permeation pathways can be elucidated (reprinted from [40]).

#### 3. Adaptive Resolution Simulations

In the introduction we defined a class of systems for which the focus of the researcher's interest is on a (possibly small) subregion of the simulated system: this is the case, for example, of the hydrogen bond network at the surface of a solvated molecule in water. The bulk of water molecules has to be simulated in order to sustain the thermodynamic properties of the subsystem of interest (the interfacial water) and to provide the correct exchange of molecules. Nonetheless, the fine-grained detail of molecules far from the interface is not relevant; it would therefore be desirable to replace the expensive atomistic interactions of hydrogen and oxygen atoms with a coarser model.

We can then introduce a geometrical separation between an "inside" and an "outside", *i.e.*, an all-atom and a coarse-grained region, and assign different types of representations and interactions to the molecules according to their position in the simulation domain.

This idea has a long and successful history: to investigate crack propagation in hard matter, for example, several authors [158–162] made use of a hybrid description of the system, where a "high resolution" description is employed only for the area in the proximity of the crack, and the material far from the crack is treated with a simpler model. Another important example of hybrid resolution simulation is provided by Quantum Mechanics/Molecular Mechanics (QM/MM) methods [163–167]. In this case the structure of the system is described at the same (atomistic) level everywhere; however, the interactions are obtained from a classical force-field in the bulk of the system, while in a small region *ab initio* methods (such as Density Functional Theory, DFT) are employed to calculate the forces. Many different "flavors" of this approach have been developed; in all of them, though, one of the crucial aspects is how to interface the two domains where interactions are different, and in most of the established methods the identity or resolution of the particle is not allowed to change. In general, one has to answer the two following questions:


The last question is of particular importance for all systems whose components can diffuse on large length scales (at least of the order of the molecules' size) in the simulation time. It appears natural to introduce a *transition region* (often called hybrid region, or healing region) that allows for a smooth interpolation from a given representation of the molecule's structure/interaction to another; a schematic representation of this setup is provided in Figure 2. The choice of the specific way this interpolation is implemented depends, as we mentioned earlier, on the properties that have to be preserved in the CG region.

Figure 2. Typical scheme of an adaptive resolution simulation: a high-resolution region, where molecules are described at the atomistic level, is coupled to a low-resolution region where a simpler, coarse-grained model is employed. These two sub-parts of the system are interfaced via a hybrid region, in which the molecule's representation smoothly changes from one to the other, depending on their positions. It is on this last region and its properties (*i.e.*, the way molecules change resolution) that the complexity of adaptive resolution schemes concentrates.

Irrespective of the chosen method to interface the two regions of the system, however, it is natural to expect that the equilibrium state that will be reached in the absence of external driving forces will not be the desired one. A further crucial point is then to find the simplest way to impose the desired thermodynamics.

The central, strong requirement that has to be satisfied is that molecules should be free to diffuse from any region of the simulation box to any other. Additionally, in a hybrid resolution model thermal equilibrium should be preserved, *i.e.*, the temperature of the system has to be constant during the simulation. Another possible constraint is to impose a uniform density across the box, irrespective of the specific resolution; nonetheless, we will see that there are cases where this is neither strictly necessary nor desirable.

These are the fundamental constraints that can be imposed on the system as a whole. Other, more specific ones can be introduced on the properties of the CG region as well as the transition region, which will "drive" us towards a specific formulation of a double-resolution simulation method.

#### *3.1. The Adaptive Resolution Simulation Scheme*

The Adaptive Resolution Scheme (AdResS) represents the first effective and computationally efficient method to simulate a system where two different models, e.g., an all-atom one and a coarse-grained one, are *simultaneously* employed in different subregions of the simulation domain, interfaced in such a way as to allow molecules to freely diffuse from one region to the other.

The basic constraint that was enforced in the original version of this scheme is that Newton's third law has to be exactly satisfied everywhere in the simulation domain. This requirement rules out any form of potential energy interpolation: it can in fact be formally demonstrated [168] that no method exists to smoothly "blend" the interaction between two molecules from a given potential energy to another without generating forces that cannot be recast in a form that satisfies Newton's third law. In order to preserve the latter, then, a *force-interpolation scheme* is required, such that the force that a given molecule receives due to the interaction with a second one is antisymmetric under exchange of the molecules' labels:

$$\mathbf{F}\_{\alpha|\beta} = -\mathbf{F}\_{\beta|\alpha} \tag{20}$$

A second, less strict requirement is that CG molecules possess CG degrees of freedom only; this determines the specific way the force mixing is performed: a molecule in the CG region loses completely its atomistic detail (thus retaining, for example, the center of mass coordinates only), and interacts with a molecule in the AA or even the transition region only via its CG degrees of freedom. Formally, this constraint imposes that the atomistic forces vanish when at least one of the two interacting molecules is in the CG domain.

These two constraints are sufficient to define the force-field interpolation; the force acting between molecules α and β is given by:

$$\mathbf{F}\_{\alpha\beta} = \lambda(\mathbf{R}\_{\alpha})\lambda(\mathbf{R}\_{\beta})\mathbf{F}\_{\alpha\beta}^{AA} + \left(1 - \lambda(\mathbf{R}\_{\alpha})\lambda(\mathbf{R}\_{\beta})\right)\mathbf{F}\_{\alpha\beta}^{CG} \tag{21}$$

In Equation (21) λ(x) is any smooth function that goes from 1 in the AA region to 0 in the CG region. **R**<sub>α</sub> (resp. **R**<sub>β</sub>) is the CoM coordinate of molecule α (resp. β). **F**<sup>AA</sup><sub>αβ</sub> and **F**<sup>CG</sup><sub>αβ</sub> are, respectively, the atomistic and the coarse-grained forces acting on molecule α due to the interaction with molecule β.
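The interpolation of Equation (21) can be illustrated with a few lines of code. This is a minimal sketch, assuming a slab geometry in which the resolution depends only on the distance |x| of the molecule's CoM from the center of the AA region, and assuming a cos² switching function for λ; Equation (21) only requires λ to be smooth, so this particular form, the parameter names `x_aa` and `d_hy`, and the function names are choices made for the example.

```python
import math

def weight(x, x_aa, d_hy):
    """Switching function lambda: 1 in the AA region, 0 in the CG region.

    x_aa is the half-width of the AA region, d_hy the width of the hybrid region.
    The cos^2 form is an assumption; any smooth monotonic function would do.
    """
    d = abs(x)
    if d <= x_aa:
        return 1.0
    if d >= x_aa + d_hy:
        return 0.0
    return math.cos(0.5 * math.pi * (d - x_aa) / d_hy) ** 2

def adress_pair_force(x_a, x_b, f_aa, f_cg, x_aa=1.0, d_hy=1.0):
    """Equation (21): force on molecule alpha due to beta, interpolated by lambda."""
    w = weight(x_a, x_aa, d_hy) * weight(x_b, x_aa, d_hy)
    return tuple(w * fa + (1.0 - w) * fc for fa, fc in zip(f_aa, f_cg))

# Hypothetical pair forces on molecule alpha from the two force fields.
f_aa = (2.0, 0.0, -1.0)
f_cg = (0.5, 0.5, 0.0)

f_both_atomistic = adress_pair_force(0.0, 0.5, f_aa, f_cg)   # both in the AA region
f_one_coarse = adress_pair_force(0.0, 5.0, f_aa, f_cg)       # beta in the CG region
f_hybrid = adress_pair_force(1.5, 0.0, f_aa, f_cg)           # alpha halfway through
```

Because the weight is the symmetric product λ(**R**<sub>α</sub>)λ(**R**<sub>β</sub>), exchanging the two molecules leaves it unchanged, so the interpolated force inherits the antisymmetry of Equation (20) from the underlying AA and CG pair forces. The product also vanishes as soon as one molecule is in the CG region, which switches off the atomistic contribution there.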

The CG force is computed between the coarse-grained centers of the molecules and then redistributed to the atoms, weighted by the ratio of the atom's mass to the mass of the molecule [169]; in the transition region this operation is required by the fact that molecules interact at both the AA and the CG level. AA degrees of freedom thus *have* to be explicitly integrated, at least in the hybrid region. In the CG region, on the other hand, it is in principle not necessary to conserve the atomistic detail of the molecules, so that the CG force could be applied directly to the CoM coordinate; a molecule's internal structure can thus be removed when it enters the CG region, and reintroduced (e.g., taking it from a reservoir/repertoire of equilibrated atomistic molecules) as soon as it approaches the hybrid region. In all AdResS versions implemented so far, though, the atomistic DoFs are retained for simplicity of implementation [24]; the CoM of the molecule is nonetheless decoupled from the internal atomistic structure, and it evolves only subject to the CG force.
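The mass-weighted redistribution of the CG force onto the atoms can be sketched in a few lines. The function name and the example values (approximate atomic masses of a water molecule, an arbitrary force vector) are illustrative assumptions.

```python
def distribute_cg_force(f_cg, masses):
    """Split a CG force acting on the CoM among atoms in proportion to m_i / M."""
    total_mass = sum(masses)
    return [tuple(m / total_mass * c for c in f_cg) for m in masses]

# Hypothetical example: one water molecule (O, H, H) and an arbitrary CG force.
masses = [15.9994, 1.008, 1.008]
f_cg = (1.0, -2.0, 0.5)
atom_forces = distribute_cg_force(f_cg, masses)

# Net force on the molecule and per-atom accelerations along x.
net_force = tuple(sum(f[k] for f in atom_forces) for k in range(3))
accelerations_x = [f[0] / m for f, m in zip(atom_forces, masses)]
```

With this weighting the atom forces sum to the original CG force, and every atom receives the same acceleration, so the CG interaction translates the molecule rigidly without exciting internal degrees of freedom.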

It was previously mentioned that no energy interpolation is possible that is compatible with the requirement of having Newton's third law preserved everywhere in the system [168]; as a consequence, a force interpolation had to be chosen. It is evident, then, that the AdResS scheme cannot be formulated in terms of a Hamiltonian, thus making it impossible to perform microcanonical, *i.e.*, energy-conserving simulations. The force-field used in this adaptive resolution simulation framework is not conservative in the transition region, and when crossing it a molecule receives a surplus of energy that has to be removed in order to prevent the system from artificially heating up. This excess energy can be removed with a *local* thermostat, such as a Langevin thermostat: in this way, the temperature of the system is kept constant everywhere. The equilibrium state of the system is then *dynamical*: the thermostat takes care of absorbing the extra heat produced in the transition region by non-conservative forces, and the system samples equilibrium configurations according to Boltzmann's distribution [24–32].
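How a local Langevin thermostat drains excess kinetic energy can be seen in a one-dimensional toy example. The sketch below applies the exact Ornstein-Uhlenbeck velocity update to a single force-free velocity component that starts out artificially hot. It is not the integrator of any particular AdResS implementation, and the friction coefficient, time step, mass and k<sub>B</sub>T value are arbitrary illustrative numbers.

```python
import math
import random

def langevin_velocity_step(v, gamma, dt, kT, mass, rng):
    """Exact Ornstein-Uhlenbeck update of one velocity component (no external force)."""
    c1 = math.exp(-gamma * dt)                        # friction damping factor
    c2 = math.sqrt(kT / mass * (1.0 - c1 * c1))       # matching noise amplitude
    return c1 * v + c2 * rng.gauss(0.0, 1.0)

rng = random.Random(0)                        # fixed seed for reproducibility
kT, mass, gamma, dt = 2.5, 18.0, 10.0, 0.1    # illustrative values only
v = 10.0                                      # artificially "hot" initial velocity

squared_velocities = []
for step in range(20100):
    v = langevin_velocity_step(v, gamma, dt, kT, mass, rng)
    if step >= 100:                           # discard the initial relaxation
        squared_velocities.append(v * v)

# Equipartition estimate of kT from the sampled velocities: m <v^2> = kT.
kinetic_temperature = mass * sum(squared_velocities) / len(squared_velocities)
```

The friction term removes the surplus energy within a few multiples of 1/γ, after which the noise term maintains the target temperature; the estimate `kinetic_temperature` should therefore fluctuate around the chosen `kT`.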

The pressure difference between an AA system and a low-resolution model typically resulting from coarse graining procedures determines the onset of a non-uniform density profile. For example, a one-site CG model of water obtained with IBI can have a pressure ∼6000 times the atomistic reference value [117]. Therefore, the densities in the two subregions will change in order to equate the pressures. A possible solution to this density imbalance is to parametrize the CG potential to the target pressure. In the IBI framework this can be achieved by introducing a "pressure correction" [81]. This approach can provide a CG potential that has the target pressure, but this would also result in a modified compressibility [117].

Another option to preserve a uniform density across the simulation domain without modifying the CG potential is to introduce an external force which counterbalances the high pressure of the CG model. This *thermodynamic force* can be obtained with an iterative procedure via the following expression [169–171]:

$$\mathbf{f}\_{th}^{i+1} = \mathbf{f}\_{th}^{i} - \frac{1}{\rho^\* \kappa\_T} \nabla \rho^i(r) \tag{22}$$

where ρ<sup>\*</sup> is the reference molecular density, κ<sub>T</sub> is the system's isothermal compressibility and ρ<sup>i</sup>(r) is the molecular density profile as a function of the position in the direction perpendicular to the CG-AA interface. The thermodynamic force is initialized to zero, **f**<sup>0</sup><sub>th</sub> = 0, while the initial density profile is the one calculated from an AdResS simulation with **f**<sub>th</sub> = 0. As can be easily seen, the iterative procedure converges once the density profile is flat (∇ρ(r) = 0).
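A single update step of Equation (22) on a one-dimensional grid can be sketched as follows. The finite-difference gradient, the grid, and the numerical values of the reference density and compressibility are assumptions made for the example; in an actual calculation the density profile would be measured from an AdResS run after each update.

```python
def density_gradient(density, dx):
    """Central differences in the interior, one-sided differences at the two ends."""
    n = len(density)
    grad = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        grad.append((density[hi] - density[lo]) / ((hi - lo) * dx))
    return grad

def update_thermodynamic_force(f_th, density, dx, rho_ref, kappa_t):
    """Equation (22): f_th^(i+1) = f_th^(i) - grad(rho^i) / (rho_ref * kappa_T)."""
    grad = density_gradient(density, dx)
    return [f - g / (rho_ref * kappa_t) for f, g in zip(f_th, grad)]

# Hypothetical inputs: a density bump in the middle of an 11-point grid.
dx, rho_ref, kappa_t = 0.1, 33.0, 0.027
density = [rho_ref + (1.0 if 4 <= k <= 6 else 0.0) for k in range(11)]

f_th = [0.0] * 11                                   # initial guess: zero force
f_next = update_thermodynamic_force(f_th, density, dx, rho_ref, kappa_t)

# Once the measured profile is flat, a further update leaves the force unchanged.
f_converged = update_thermodynamic_force(f_next, [rho_ref] * 11, dx, rho_ref, kappa_t)
```

On the rising flank of the bump the updated force is negative and on the falling flank it is positive, so molecules are pushed away from the region of excess density; where the profile is locally flat the force is not modified.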

This approach guarantees a flat density profile without having to modify the CG potential: because of its very definition, the thermodynamic force only acts on those molecules that cross the hybrid region, leaving the others unaffected. It can also be shown [24,169] that the integral of the thermodynamic force across the interface, *i.e.*, the work performed by this force on a molecule while crossing the hybrid region, is proportional to the local pressure profile, the proportionality factor being the reference density ρ<sup>\*</sup>.

In summary, the thermodynamic force allows us to couple a system at atomistic resolution to a coarse-grained counterpart whose pressure, for given values of density and temperature, is significantly different. The global properties of the force, whose direct effect is restricted to the hybrid region, only depend on the pressure difference between the two coupled subsystems; the detailed profile of the force, on the other hand, can be obtained via a system-specific iterative procedure. This method not only allows one to preserve the desired structure of the system in the CG region; in principle, in fact, an arbitrary CG force-field, with pressure *and structure* completely different from the target atomistic ones, can be used. Consequently, the AA region behaves as an open system [169] that exchanges energy and molecules with a reservoir: the molecule number fluctuations, the pressure and all other thermodynamically relevant quantities are the same as if the AA region were simply 'cut' from a large all-atom simulation. It is relevant to stress here that because of the thermodynamic force this condition can be established *irrespective of the specific model used in the CG region*.

#### *3.2. Applications*

The possibility of treating a system with a reduced number of degrees of freedom except where it is strictly necessary was explored, making use of the AdResS method, in several applications [24–30,172]. From the numerical/computational point of view it clearly represents an advantage, since a much smaller number of force calculations is required in the coarse-grained region: this is particularly true for parallel MD codes such as GROMACS [173], where a dynamical decomposition of the simulation box allows one to subdivide the box with a finer grid in the AA and hybrid regions, while a smaller number of processors is assigned to the CG region. For example, for a water system with an AA region covering 1/6 of the total simulation box, simulated with GROMACS on a 16-core processor, the speed-up is about a factor of three. This factor is nonetheless small compared to what can be achieved with other simulation packages, such as ESPRESSO++ [174]: in fact, water simulation in GROMACS is extremely optimized, and any modification of the standard code can introduce a bottleneck.

A major strength of the AdResS method is the fact that it introduces a decoupling between a given region of the system and the rest while keeping the thermodynamic properties of both regions under control: as a consequence, it is possible to conceive numerical experiments in which the spatial extension of correlations in the system is investigated. More specifically, one can study the structural properties of the high-resolution region as a function of its size, in order to determine their dependency on the interaction with molecules in the bulk region. This kind of experiment is different from the study of finite-size effects: in the latter, in fact, the system has the same resolution and interaction type everywhere, and the change of a property with the box size depends on the asymptotic approach to the thermodynamic limit. In the AdResS setup, on the other hand, finite-size effects can be neglected for sufficiently large boxes, thus allowing one to characterize the response of the system's properties in a small subregion when atomistic interactions with the bulk are switched off, but the thermodynamics is the same as in a fully-atomistic simulation. An example of this application is provided by the work in Reference [175]: here a molecule with both hydrophilic and hydrophobic interactions was solvated in water and put at the center of the high-resolution region, while the water molecules far from the surface were treated at the coarse-grained level. The ordering degree of the hydrogen bond network on the molecule's surface was measured as a function of the size of the all-atom region: the results showed that the ordering of water molecules close to the surface of the repulsive solute depends on this size, while no relevant effect was observed for the attractive case.

The same strategy has been applied to investigate the extent of spatial correlations in a quantum fluid, namely low-temperature para-hydrogen [30,176]. The latter is the spin-zero singlet state of molecular hydrogen. Because of the spherical symmetry of the global wave function, para-hydrogen in the solid and gas phase can be modeled as a classical, point-like particle interacting via a simple radial potential, such as Lennard-Jones or the more accurate Silvera-Goldman potential [177,178]. The same classical potential has been shown to correctly reproduce the experimental results both in the solid and the gas phase [178]. In the fluid phase, however, nuclear delocalization effects become important, and a quantum mechanical treatment of the problem is necessary. This can be achieved through the path integral formalism [179,180], which allows for the explicit inclusion of nuclear quantum effects in a "classical" description; unfortunately, this also implies a significant increase in the number of degrees of freedom that have to be simulated, since each molecule becomes a collection of P beads connected by springs. The possibility to simulate a quantum system in a classical framework such as classical MD makes it possible to couple quantum and classical descriptions with the AdResS scheme. In particular, a low-temperature para-hydrogen system was simulated making use of the explicit path integral representation only in a small spherical subregion of the domain, while the molecules in the outer region were treated at the purely classical level, *i.e.*, as point-like particles interacting through a coarse-grained potential [30]; in Figure 3 a snapshot of the simulated system is provided.
This study showed that a few molecules in a small (∼0.6 nm radius) region of the system are sufficient to reproduce the quantum pair correlation function obtained from a full path integral simulation, while the molecules in the outer region are treated at the CG level; this result opens the way to simulating large systems of low-temperature para-hydrogen taking advantage of a double resolution without disrupting the thermodynamical and structural properties of the small, purely quantum region, thus saving computational time in the CG region.
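To make the "beads connected by springs" picture concrete, the following sketch evaluates the harmonic spring energy of a single ring polymer in the standard primitive path integral discretization, where a molecule of mass m at inverse temperature β maps onto a closed chain of P beads with spring constant mP/(β²ħ²) between neighbours. The function name, the reduced units (ħ = 1) and the test configurations are assumptions made for the illustration; the sketch does not attempt to reproduce the setup of Reference [30].

```python
import numpy as np

def ring_spring_energy(beads, mass, beta, hbar=1.0):
    """Harmonic spring energy of one ring polymer.

    beads : array of shape (P, 3), bead positions of a single molecule
    Returns sum_k m P / (2 beta^2 hbar^2) * |r_k - r_{k+1}|^2,
    with the ring closed by r_{P+1} = r_1.
    """
    beads = np.asarray(beads, dtype=float)
    n_beads = beads.shape[0]
    k_spring = mass * n_beads / (beta**2 * hbar**2)
    # np.roll pairs each bead with its successor and closes the ring
    bonds = beads - np.roll(beads, -1, axis=0)
    return 0.5 * k_spring * np.sum(bonds**2)

# A collapsed ring (all beads on one point) corresponds to a classical particle
collapsed = np.zeros((4, 3))
# Four beads on the corners of a unit square in the xy plane
square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])

e_collapsed = ring_spring_energy(collapsed, mass=1.0, beta=1.0)
e_square = ring_spring_energy(square, mass=1.0, beta=1.0)
```

The cost argument made in the text follows directly: every molecule treated at the path integral level carries P sets of coordinates instead of one, which is why restricting this representation to a small subregion is attractive.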

More recently the AdResS scheme has been successfully employed to perform simulations of biologically relevant systems such as methanol-water mixtures [181] and triglycine in aqueous urea [171], and to study the coil-globule transition of a PNIPAm molecule in aqueous methanol [182]. In all these cases a crucial necessity is to correctly reproduce the solvation free energies of the system, a condition that is verified only when the particle number fluctuations are compatible with those observed in the Grand Canonical ensemble. The large system sizes necessary to fulfill this requirement in a standard, all-atom simulation often make the latter unfeasible; the employment of dual-resolution simulation methods, possibly coupled to a Monte Carlo scheme [182] to enforce fluctuations in the total number of molecules (see Figure 4), allows one to keep the computational cost low and obtain results that would otherwise require a significantly longer time.

Figure 3. Set-up of the Adaptive Resolution Simulation (AdResS) para-hydrogen simulation performed in Reference [30] (figure adapted from therein). A small sphere in the center of the box, having radius as small as 0.6 nm, is treated at the path integral level (red rings), while the rest is described by point-like molecules (the white spheres); the hybrid region (blue) interfaces these two representations.

Figure 4. Schematic representation of the schemes used for the simulations of a PNIPAm molecule solvated in aqueous methanol: (a) Conventional AdResS scheme, where a small all-atom (AA) region is coupled to a large "closed boundary" coarse-grained reservoir; (b) Particle exchange adaptive resolution scheme (PE-AdResS), where an AA region is coupled to a much smaller open boundary coarse-grained reservoir, where particle exchange is performed at the eight corners of the simulation domain to avoid depletion effects; (c) Mapping scheme representing the smooth coupling between AA and CG particle representations. Figure from [182].

#### *3.3. The Limitations of the Force-Based Approach*

The AdResS method discussed so far represents a simple, effective way to perform double-resolution simulations, *i.e.*, simulations where the model used to represent a molecule and its interactions with the others changes according to the molecule position. Assigning the lowest-resolution model to the largest region allows one to save computational resources and characterize the bulk dependence of structural properties of the high-resolution subsystem. A particularly important point is the possibility to keep the thermodynamics of the system under control: this can be achieved by direct intervention on the CG model's properties, or by introducing an external field, the thermodynamic force, in the hybrid region to compensate for density imbalances. This second strategy is crucial, since it allows one to *couple arbitrarily different systems* while keeping locally well-defined temperature, pressure and energy.

The AdResS method was conceived based on the requirement that Newton's Third Law has to be exactly satisfied everywhere. This constraint poses a strict limitation to the possible ways to interface the two representations of the system: specifically, no potential energy interpolation is possible, via a position-dependent switching function, that preserves Newton's Third Law [168]; as a consequence, the only acceptable interpolation can be performed on the forces.

*A posteriori*, the lack of a global energy function proves not to be a major problem: equilibrium and canonical sampling can be enforced making use of a local Langevin thermostat. A theoretical analysis of the AdResS dual resolution scheme has been recently carried out in Reference [183], where the presence of a local thermostat and the thermodynamic force have been shown to be necessary and sufficient conditions to guarantee the equivalence of the atomistic region to an open region of a fully atomistic simulation up to second order correlation functions (density profile and radial distribution function). These results have been obtained from a completely general model of a dual resolution setup under the assumption of the thermodynamic limit; the generality of this approach makes it thus applicable to different types of adaptive resolution schemes, independently of the detailed form of the method chosen to interpolate the resolutions.

Nonetheless, the lack of a Hamiltonian has negative consequences on the usage of the AdResS method. The four major ones are the following: microcanonical, *i.e.*, energy-conserving, simulations are not possible; no partition function can be written for the system as a whole; no Monte Carlo scheme can be implemented; finally, due to the non-conservative nature of the forces in the hybrid region, the system necessarily has to be locally thermostatted to compensate for the heat that is produced in the hybrid region, so that an AdResS simulation is found to be in a state of *dynamical equilibrium* [32], with a constant flux of heat between the system and the thermostat.

In the next section a method is discussed, named H-AdResS [33] (for Hamiltonian Adaptive Resolution Simulation scheme), that provides a solution to the aforementioned problems; clearly, as no free lunch is usually available, there is a price to pay: the Hamiltonian formulation requires a local breakdown of Newton's Third Law.

#### *3.4. The Hamiltonian Adaptive Resolution Scheme*

As was discussed in the previous section, the force-based AdResS method was developed on the basis of a central requirement, namely that Newton's Third Law has to be exactly satisfied everywhere. A consequence of this constraint is that no Hamiltonian formulation is possible [168]: if a position-dependent interpolation of the potential energies is done, in fact, the resulting forces include a term proportional to the derivatives of the switching function λ that cannot be recast in a form that satisfies Newton's Third Law. The only method developed in the past that allows one to explicitly conserve the energy in an adaptive resolution simulation is that proposed by Heyden and Truhlar [184,185], where a sum of the Lagrangians of all possible groupings of atomistic and coarse-grained molecules is done. Due to its combinatorial nature, this approach is extremely difficult to implement efficiently; moreover, the resulting Lagrangian includes a position-dependent kinetic energy term for which a specific, non-symplectic integrator is required.

In the H-AdResS method [33], which we now describe, the aforementioned constraints are relaxed in order to develop an energy-based, Hamiltonian adaptive resolution simulation scheme. As will be clear in a few lines, the particular choice of energy "mixing" gives rise to forces that do not comply with the first constraint; nevertheless, the physical interpretation of these terms is immediate and naturally points towards the solution, though approximate, of the Newton's Third Law breakdown.

The core idea of the energy-based approach is to weight the *total energy* of each molecule with a position-dependent function:

$$H = \mathcal{K} + V^{int} + \sum\_{\alpha} \left\{ \lambda\_{\alpha} V\_{\alpha}^{AA} + (1 - \lambda\_{\alpha}) V\_{\alpha}^{CG} \right\} \tag{23}$$

where 𝒦 is the (all-atom) kinetic energy of the molecules, V<sup>int</sup> is the interaction internal to the molecules, and:

$$\begin{cases} V\_{\alpha}^{\rm AA} \equiv \frac{1}{2} \sum\_{\beta, \beta \neq \alpha}^{N} \sum\_{ij} V^{AA} (|\mathbf{r}\_{\alpha i} - \mathbf{r}\_{\beta j}|) \\\ V\_{\alpha}^{\rm CG} \equiv \frac{1}{2} \sum\_{\beta, \beta \neq \alpha}^{N} V^{CG} (|\mathbf{R}\_{\alpha} - \mathbf{R}\_{\beta}|) \\\ \lambda\_{\alpha} = \lambda(\mathbf{R}\_{\alpha}) \end{cases}$$

The switching function λ goes from 0 (purely CG) to 1 (purely AA). The force acting on atom i in molecule α is obtained through differentiation of the Hamiltonian in Equation (23):

$$\mathbf{F}\_{\alpha i} = \mathbf{F}\_{\alpha i}^{int} + \sum\_{\beta, \beta \neq \alpha} \left\{ \frac{\lambda\_{\alpha} + \lambda\_{\beta}}{2} \mathbf{F}\_{\alpha i | \beta}^{AA} + \left( 1 - \frac{\lambda\_{\alpha} + \lambda\_{\beta}}{2} \right) \mathbf{F}\_{\alpha i | \beta}^{CG} \right\} - \left[ V\_{\alpha}^{AA} - V\_{\alpha}^{CG} \right] \nabla\_{\alpha i} \lambda\_{\alpha} \tag{24}$$

The forces **F**<sup>AA</sup><sub>αi|β</sub> and **F**<sup>CG</sup><sub>αi|β</sub> are defined as:

$$\mathbf{F}\_{\alpha i|\beta}^{AA} \equiv \sum\_{j=1}^{n\_{\beta}} -\frac{\partial}{\partial \mathbf{r}\_{\alpha i}} V^{AA}(|\mathbf{r}\_{\alpha i} - \mathbf{r}\_{\beta j}|)$$

$$\mathbf{F}\_{\alpha i|\beta}^{CG} \equiv -\frac{m\_{\alpha i}}{M\_{\alpha}} \frac{\partial}{\partial \mathbf{R}\_{\alpha}} V^{CG}(|\mathbf{R}\_{\alpha} - \mathbf{R}\_{\beta}|) \tag{25}$$

The redistribution of the CG force on the atomistic degrees of freedom follows the same rules as applied in the case of the force-based AdResS method. It's worth noting that in this energy-based scheme the atomistic degrees of freedom are retained and integrated everywhere in the system, a necessary requirement in order to perform a *microcanonical* simulation making use of a Hamiltonian.
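Equation (24) can be checked numerically on a toy system. The sketch below is an illustration under simplifying assumptions, not the authors' implementation: the system is one-dimensional, every molecule consists of a single atom (so that **R**<sub>α</sub> coincides with the atom position and V<sup>int</sup> = 0), the two pair potentials are arbitrary exponentials, and the switching function is a squared sine. All function names are hypothetical. The code evaluates the interaction part of Equation (23), the symmetric pairwise part of Equation (24) and the term proportional to the gradient of λ, and compares their sum with a finite-difference derivative of the energy.

```python
import numpy as np

X_LO, X_HI = 1.0, 2.0   # hybrid region: CG for x < X_LO, AA for x > X_HI

def lam(x):
    """Switching function: 0 in the CG region, 1 in the AA region."""
    s = np.clip((x - X_LO) / (X_HI - X_LO), 0.0, 1.0)
    return np.sin(0.5 * np.pi * s) ** 2

def dlam(x):
    """Derivative of lam; it vanishes outside the hybrid region."""
    s = np.clip((x - X_LO) / (X_HI - X_LO), 0.0, 1.0)
    return 0.5 * np.pi * np.sin(np.pi * s) / (X_HI - X_LO)

# Toy pair potentials (assumed for illustration) and their radial derivatives
def v_aa(r):
    return 4.0 * np.exp(-2.0 * r)

def dv_aa(r):
    return -8.0 * np.exp(-2.0 * r)

def v_cg(r):
    return np.exp(-r)

def dv_cg(r):
    return -np.exp(-r)

def per_molecule_energies(x):
    """V_alpha^AA and V_alpha^CG: half of the pair energy of each molecule."""
    r = np.abs(x[:, None] - x[None, :])
    off = ~np.eye(len(x), dtype=bool)          # exclude beta == alpha
    va = 0.5 * np.sum(v_aa(r) * off, axis=1)
    vc = 0.5 * np.sum(v_cg(r) * off, axis=1)
    return va, vc

def potential_energy(x):
    """Interaction part of Equation (23)."""
    va, vc = per_molecule_energies(x)
    l = lam(x)
    return np.sum(l * va + (1.0 - l) * vc)

def forces(x):
    """Pairwise part and lambda-gradient part of Equation (24)."""
    d = x[:, None] - x[None, :]
    r = np.abs(d)
    off = ~np.eye(len(x), dtype=bool)
    l = lam(x)
    w = 0.5 * (l[:, None] + l[None, :])        # weight symmetric in alpha, beta
    f_pair = -(w * dv_aa(r) + (1.0 - w) * dv_cg(r)) * np.sign(d) * off
    va, vc = per_molecule_energies(x)
    f_drift = -(va - vc) * dlam(x)
    return f_pair.sum(axis=1), f_drift

x = np.array([0.5, 1.2, 1.6, 2.4])   # one CG, two hybrid, one AA molecule
f_pair, f_drift = forces(x)

# Central finite differences of the energy as an independent check
h = 1e-6
f_num = np.array([
    -(potential_energy(x + h * e) - potential_energy(x - h * e)) / (2 * h)
    for e in np.eye(len(x))
])
```

In this toy setup the pairwise contributions sum to zero over all molecules, while the λ-gradient term is nonzero only for the two molecules located between `X_LO` and `X_HI`.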

We now detail the various components of the force, Equation (24). The first term, **F**<sup>int</sup><sub>αi</sub>, is due to the interactions *internal* to the molecule; as such, it automatically satisfies Newton's Third Law. The second term is a sum of pairwise forces obtained from all-atom and coarse-grained Hamiltonians, weighted by a function that is symmetric under molecule label exchange, that is α ↔ β; this force also complies with Newton's Third Law. Up to this point we modified only one aspect of the original AdResS scheme, that is, the force weights are not given by the product of the two molecules' switching functions, but rather by their average; consequently, the molecules in the coarse-grained region are also allowed to interact through their atomistic degrees of freedom.

The third term of the forces in Equation (24) is the part that breaks down Newton's Third Law: in fact, it cannot be written as a sum of terms antisymmetric under molecule label exchange. This force, which is nonzero *only in the hybrid region*, is proportional to the difference between the potential energies of a given molecule in the AA and the CG representation; if a systematic difference exists between the AA and the CG potentials, the effect of this term is to push molecules into one of the two bulk regions. The hybrid region thus behaves as an *active membrane*, inducing a density imbalance and a non-flat pressure profile. One is then naturally led to ask how strong the drift term **F**<sup>dr</sup><sub>α</sub> = −[V<sup>AA</sup><sub>α</sub> − V<sup>CG</sup><sub>α</sub>] ∇<sub>αi</sub>λ<sub>α</sub> is; whether it is negligible in some cases; which cases these are; and whether there is a general way to at least minimize its effect without giving up the Hamiltonian character of these forces. We shall now address these questions.

The optimal case in which this term is minimized is *when the CG potential perfectly reproduces the many-body PMF*. If this is true, in fact, the drift term vanishes on average:

$$V^{CG}\_{\alpha\beta} \equiv \left\langle V^{AA}\_{\alpha\beta} \right\rangle \Rightarrow \left\langle \mathbf{F}^{dr}\_{\alpha} \right\rangle \propto \left\langle \left[ V^{AA}\_{\alpha} - V^{CG}\_{\alpha} \right] \right\rangle \rightarrow 0$$

This can be numerically verified with a simple toy model, for which a pairwise CG potential represents an excellent approximation to the PMF. Such a model is provided by a low-density fluid of purely repulsive tetrahedral molecules [24], whose CG potential has been obtained from IBI. This model was used in an energy-conserving H-AdResS simulation, and the resulting density profile is plotted in Figure 5. The molecular density attains the same value in both the AA and CG regions; in the hybrid region a small depletion is present, because the free energy of the mixed potential is different from the free energy of the "pure" (*i.e.*, purely AA or purely CG) potentials. The same behavior has been systematically observed in AdResS simulations [24].

Needless to say, this particular case is very fortunate: as we discussed in the previous sections, the CG potentials almost never reproduce the many-body potential of mean force [117,186]. The difference between an atomistic model and its coarse-grained representation therefore results in a thermodynamic imbalance, that is, both pressure and density of the two bulk (AA and CG) regions are different [13]. The solution to this problem is again to introduce a compensation term in the Hamiltonian, as was done in the AdResS scheme with the thermodynamic force. More specifically, we modify the Hamiltonian as follows:

$$H\_{\Delta} = H - \sum\_{\alpha=1}^{N} \Delta H(\lambda(\mathbf{R}\_{\alpha})) \tag{26}$$

where ΔH(λ) is a function to be defined. It's worth noting that this term preserves the conservative nature of the Hamiltonian.

Figure 5. H-AdResS simulation of a system of tetrahedral molecules coupled to point-like molecules interacting through an Iterative Boltzmann inversion (IBI)-CG potential (reprinted from the Supporting Information of Reference [33]). Top: density profile; bottom: radial distribution functions of the atomistic (red lines) and coarse-grained (blue lines) degrees of freedom in the all-atom region; the solid lines are the reference RDFs calculated in the all-atom system, while the dashed lines are obtained from a H-AdResS simulation.

In order to determine the specific form of ΔH we impose that the drift force cancels on average:

$$\left. \frac{d\Delta H(\lambda)}{d\lambda} \right|\_{\lambda\_\alpha} \nabla\_\alpha \lambda\_\alpha + \langle \mathbf{F}\_\alpha^{dr} \rangle \equiv 0 \tag{27}$$

or equivalently:

$$\left. \frac{d\Delta H(\lambda)}{d\lambda} \right|\_{\lambda = \lambda\_\alpha} = \left\langle \left[ V\_\alpha^{AA} - V\_\alpha^{CG} \right] \right\rangle\_{\mathbf{R}\_\alpha} \tag{28}$$

where the subscript in the average indicates that the latter has to be performed constraining the CG site of molecule α in the position **R**α.
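The step leading from Equation (26) to Equations (27) and (28) can be written out explicitly. The lines below are a sketch that assumes, as in Equation (24), that forces follow from differentiating the Hamiltonian with respect to the particle coordinates, and that the average of the drift force is taken at fixed **R**<sub>α</sub>, so that ∇λ<sub>α</sub> can be moved outside the average:

$$
-\nabla\_{\alpha}\left[-\Delta H(\lambda\_{\alpha})\right] = \left.\frac{d\Delta H(\lambda)}{d\lambda}\right|\_{\lambda\_{\alpha}} \nabla\_{\alpha}\lambda\_{\alpha}, \qquad \left\langle \mathbf{F}\_{\alpha}^{dr} \right\rangle = -\left\langle \left[ V\_{\alpha}^{AA} - V\_{\alpha}^{CG} \right] \right\rangle\_{\mathbf{R}\_{\alpha}} \nabla\_{\alpha}\lambda\_{\alpha}
$$

Requiring the sum of the two contributions to vanish wherever ∇λ<sub>α</sub> ≠ 0 removes the common factor ∇<sub>α</sub>λ<sub>α</sub> and leaves Equation (28).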

In principle, Equation (28) provides us with the way to compute the compensating function—or, more precisely, its derivative; nonetheless, an approximation to ΔH might be sufficient. A way to do this is the following:

$$
\left\langle \left[ V\_{\alpha}^{AA} - V\_{\alpha}^{CG} \right] \right\rangle\_{\mathbf{R}\_{\alpha}} \simeq \frac{1}{N} \left\langle \left[ V^{AA} - V^{CG} \right] \right\rangle\_{\lambda} \tag{29}
$$

where λ ≡ λ(**R**<sub>α</sub>) is the same for all molecules. The approximate function ΔH is obtained by integration:

$$
\Delta H(\lambda) = \int\_0^\lambda d\lambda' \frac{d\Delta H(\lambda')}{d\lambda'} \simeq \frac{1}{N} \int\_0^\lambda d\lambda' \langle \left[ V^{AA} - V^{CG} \right] \rangle\_{\lambda'} = \frac{\Delta F(\lambda)}{N}
$$

Most interestingly, we see from the integral above that the compensation needed to cancel **F**<sup>dr</sup><sub>α</sub> is related to the *Helmholtz free energy* difference between the AA and CG systems [187]. Therefore, it is possible to calculate the compensating function needed to restore, on average, Newton's Third Law by performing a Kirkwood thermodynamic integration.
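In practice, the thermodynamic integration amounts to measuring the average of V<sup>AA</sup> − V<sup>CG</sup> in a series of homogeneous simulations, each at a fixed value of λ shared by all molecules, and integrating the result numerically. The following sketch shows the integration step only, using a cumulative trapezoidal rule; the function name and the synthetic linear "measurements" are assumptions made for the example and do not come from any simulation.

```python
import numpy as np

def helmholtz_compensation(lambdas, mean_dv, n_molecules):
    """Cumulative trapezoidal integral of <V^AA - V^CG>_lambda / N.

    lambdas     : increasing grid of coupling values starting at 0
    mean_dv     : ensemble averages <V^AA - V^CG> at each grid point
    n_molecules : number of molecules N in the homogeneous system
    Returns Delta F(lambda) / N on the same grid.
    """
    lambdas = np.asarray(lambdas, dtype=float)
    integrand = np.asarray(mean_dv, dtype=float) / n_molecules
    steps = np.diff(lambdas)
    areas = 0.5 * steps * (integrand[1:] + integrand[:-1])
    return np.concatenate(([0.0], np.cumsum(areas)))

# Synthetic averages (not simulation data): <V^AA - V^CG> = N (a + b lambda)
N = 100
a, b = -1.5, 0.8
grid = np.linspace(0.0, 1.0, 21)
samples = N * (a + b * grid)

delta_h = helmholtz_compensation(grid, samples, N)
```

The resulting table of ΔH values on the λ grid could then be interpolated to evaluate the compensation at the instantaneous λ of each molecule in the hybrid region.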

The "Helmholtz free energy compensation" thus cancels the active effect of the hybrid region, restoring a flat pressure profile. Nonetheless, coarse-grained models have, in general, a substantially different pressure with respect to their atomistic reference [117], thus inducing a further density imbalance (usually larger than the one due to the different Helmholtz free energy). In order to restore a flat density profile a second term has then to be added to the compensating function, that counterbalances the pressure difference.

The right way to introduce the pressure into the compensating function is to balance, rather than the Helmholtz free energy, the *Gibbs free energy* difference per particle, that is, the chemical potential Δμ = ΔG/N:

$$
\Delta H(\lambda) \equiv \Delta \mu(\lambda) = \frac{\Delta F(\lambda)}{N} + \frac{\Delta p(\lambda)}{\rho^\star} \tag{30}
$$

Figure 6 shows the density and pressure profiles for the three possible cases we discussed: the previously mentioned system of tetrahedral molecules was coupled to a coarse-grained fluid of purely repulsive point-like molecules; this fluid has a larger pressure than the reference all-atom one for the same temperature and density. In the plot, the red lines correspond to the case in which no free energy compensation is introduced: the density is higher in the AA region, due to the molecules in the CG region that "push" with a higher pressure. The profile of the pressure is also not flat: the Helmholtz free energies of the two systems differ, therefore an active force exists in the hybrid region. When the Helmholtz free energy compensation is applied we have the situation shown by the green lines: the density is still higher in the AA region, but the pressure profile is now flat: the forces that break Newton's Third Law in the hybrid region are cancelled on average, and the density imbalance decreases. Finally, when the Gibbs free energy compensation is applied the densities of the AA and CG regions attain the same value, except for a small deviation due to the fluctuations present in the hybrid region (that the compensation function ΔH, computed in a homogeneous system, cannot remove). The pressure, on the other hand, is different: in fact, in each region it reaches the value that corresponds to the reference state of density and temperature. Analogous results are obtained in a thermostatted simulation of a water box, as shown in Figure 7: here the system is composed of a slab of water molecules described at atomistic resolution, coupled to a CG bulk where particles interact via a purely repulsive WCA potential. As in the previous case, the CG interaction was parametrized to induce an increase of the density in the atomistic region, as can be seen in Figure 8 (upper panel).
The Free Energy Compensation restores the correct density profile, and guarantees that in the AA region the pairwise correlations, *i.e.*, the radial distribution functions, are the same that one would measure in a fully atomistic simulation, as shown in Figure 8 (bottom panel). We notice that Gibbs free energy compensation, even though it equates the densities in the bulk regions, is not sufficient to remove small fluctuations (of the order of ∼3%) in the hybrid region: these deviations from the reference value are due to the fact that the compensation ΔH is computed in a homogeneous system, where all molecules have the same value of λ—that is, a regular Kirkwood thermodynamic integration Hamiltonian. The molecules in the hybrid region, on the other hand, interact with other molecules having different λ values. The resulting fluctuations are expected to decrease with increasing size of the hybrid region, in which case the environment of a given molecule approaches the condition of homogeneous λ. Another strategy to flatten the density profile is clearly provided by the iterative approach of the thermodynamic force (Equation (22)), a few iterations of which would be sufficient to modify the ΔH function by the small amount necessary to remove the fluctuations.

Figure 6. Plots showing the effect of the free energy compensations on the density profile (upper panel) and pressure profile (lower panel) in a H-AdResS simulation with CG potential having larger pressure, for identical temperature and density values, than the all-atom one (reprinted from Reference [33]). The red line corresponds to the case where no compensating function was employed; the green line to the Helmholtz free energy compensation; and the blue line to the Gibbs free energy compensation. All densities are normalized to the value of the fully atomistic simulation (dotted line at ρ = 1). All pressures are normalized to the value of the fully atomistic simulation (dash-dot line); the dotted line indicates the normalized pressure of the fully coarse-grained simulation.

Figure 7. Schematic view of a dual-resolution simulation of water: the central slab of the box is described at atomistic resolution, while in the bulk the molecules are point-like particles interacting via a purely repulsive WCA potential.

Figure 8. Top panel: density profile of the water system along the x coordinate. The red dotted line corresponds to the H-AdResS simulation without FEC, while the solid black line has been obtained using the FEC. Bottom panel: radial distribution functions of the water atoms in the central (AT) slab of the box, as obtained from a fully atomistic simulation (solid lines) and a H-AdResS simulation with FEC (dots).

The Free Energy Compensation (FEC) strategy, defined by Equation (26), can be extended to multi-component systems. To illustrate this idea we consider a molecular liquid composed of two types of molecules, A and B, indexed with a and b, respectively. The corresponding H-AdResS Hamiltonian for this system reads:

$$H^{MIX} = \mathcal{K} + V^{int} + \sum\_{a \in A} \left[ \lambda\_a V\_a^{AA} + (1 - \lambda\_a) V\_a^{CG} \right] + \sum\_{b \in B} \left[ \lambda\_b V\_b^{AA} + (1 - \lambda\_b) V\_b^{CG} \right] \tag{31}$$

with λ<sub>a</sub> = λ(**R**<sub>a</sub>) and λ<sub>b</sub> = λ(**R**<sub>b</sub>). The intermolecular potential energy terms are given by the following expressions:

$$\begin{aligned} V\_a^{AA} &= \frac{1}{2} \left[ \sum\_{\substack{a' \in A \\ a' \neq a}} \sum\_{ij} V[AA]\_{ai;a'j}^{AA} + \sum\_{b \in B} \sum\_{ij} V[AB]\_{ai;bj}^{AA} \right] \\\ V\_a^{CG} &= \frac{1}{2} \left[ \sum\_{\substack{a' \in A \\ a' \neq a}} V[AA]\_{aa'}^{CG} + \sum\_{b \in B} V[AB]\_{ab}^{CG} \right] \\\ V\_b^{AA} &= \frac{1}{2} \left[ \sum\_{\substack{b' \in B \\ b' \neq b}} \sum\_{ij} V[BB]\_{bi;b'j}^{AA} + \sum\_{a \in A} \sum\_{ij} V[AB]\_{bi;aj}^{AA} \right] \\\ V\_b^{CG} &= \frac{1}{2} \left[ \sum\_{\substack{b' \in B \\ b' \neq b}} V[BB]\_{bb'}^{CG} + \sum\_{a \in A} V[AB]\_{ba}^{CG} \right] \end{aligned} \tag{32}$$

where V[XY] is the non-bonded interaction between a molecule of type X and a molecule of type Y, with X, Y = A, B, and the indices i, j labeling the atoms.

In analogy with one-component systems we introduce a FEC term for each species to compensate for the free energy difference between the AA and the CG regions:

$$H\_{\Delta}^{MIX} = H^{MIX} - \sum\_{a \in A} \Delta H\_A(\lambda\_a) - \sum\_{b \in B} \Delta H\_B(\lambda\_b) \tag{33}$$

An *Ansatz* for the compensation term of a given species k = A, B can be obtained from thermodynamic integration (TI) as follows:

$$
\Delta H\_k(\lambda) = \frac{\Delta F\_k(\lambda)}{N\_k} + \frac{\Delta p\_k(\lambda)}{\rho\_k^{\star}}
$$

$$
\begin{aligned}
\Delta F\_k(\lambda) &= \int\_0^\lambda d\lambda' \left< \left[ V\_k^{AA} - V\_k^{CG} \right] \right>\_{\lambda'} \\
\Delta p\_k(\lambda) &= p\_k(\lambda) - p\_k(0) \end{aligned}
\tag{34}
$$

where N<sub>k</sub>, ρ<sup>⋆</sup><sub>k</sub> ≡ N<sub>k</sub>/V and p<sub>k</sub> are, respectively, the number of molecules, the reference partial density and the partial virial pressure of species k. We stress that all the quantities in Equation (34) can be computed in a single TI of the mixture from AA to CG at the concentration of interest, irrespective of the number of species. All the cross-interactions between different types of molecules are automatically included in the free energy contribution of each species. Additionally, the Free Energy Compensation ΔH<sub>k</sub>(λ) is an *intensive* quantity and does not depend on the specific geometry of the H-AdResS setup. It is therefore possible to perform the TI in a relatively small system, provided that it is statistically representative, *i.e.*, finite-size effects are negligible.
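Equation (34) translates into a few lines of post-processing once the per-species averages and partial pressures have been collected along the TI path. The sketch below assumes that these quantities are available as arrays on a common λ grid; the function name `species_compensation` and all numerical inputs are placeholders chosen for illustration, not data from Reference [34].

```python
import numpy as np

def species_compensation(lambdas, mean_dv_k, pressure_k, n_k, volume):
    """Evaluate Equation (34) for one species on a grid of lambda values.

    mean_dv_k  : <V_k^AA - V_k^CG> at each lambda (from the TI runs)
    pressure_k : partial virial pressure p_k at each lambda
    n_k        : number of molecules of species k
    volume     : volume of the TI box, so that rho_k = n_k / volume
    """
    lambdas = np.asarray(lambdas, dtype=float)
    dv = np.asarray(mean_dv_k, dtype=float)
    p = np.asarray(pressure_k, dtype=float)
    # Delta F_k(lambda): cumulative trapezoidal integral from 0 to lambda
    areas = 0.5 * np.diff(lambdas) * (dv[1:] + dv[:-1])
    delta_f = np.concatenate(([0.0], np.cumsum(areas)))
    delta_p = p - p[0]              # Delta p_k = p_k(lambda) - p_k(0)
    rho_k = n_k / volume
    return delta_f / n_k + delta_p / rho_k

# Synthetic inputs for a 70/30 mixture (placeholders, not simulation data)
grid = np.linspace(0.0, 1.0, 11)
n_a, n_b, volume = 700, 300, 1000.0
dh_a = species_compensation(grid, n_a * (-0.2) * np.ones_like(grid),
                            0.7 * (2.0 - 0.5 * grid), n_a, volume)
dh_b = species_compensation(grid, n_b * (-0.6) * np.ones_like(grid),
                            0.3 * (2.0 - 1.0 * grid), n_b, volume)
```

Both species are processed from the same set of runs, which reflects the remark above that a single TI of the mixture suffices regardless of the number of components.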

The effectiveness of this strategy has been proven by the Monte Carlo simulations of binary mixtures performed in Reference [34]. Here we report one of these simulations, specifically the mixture of 70% A-type molecules and 30% B-type molecules, both made of four identical atoms; the A–A and B–B interactions are identical WCA potentials, while the A–B interaction is a Lennard-Jones potential. In the CG region both molecules are represented as spherical particles with identical, purely repulsive WCA A–A, B–B and A–B interactions, resulting in a particularly large thermodynamic mismatch between AA and CG domains. This can be directly observed in the snapshot of the simulation reported in Figure 9 (top) as well as in the density profiles (dotted lines in Figure 10): the chemical potential imbalance between the two resolutions leads to a large accumulation of B-molecules in the AA zone. As a consequence, neither the total density nor the relative concentrations in the AA zone obtained using the uncompensated adaptive resolution Hamiltonian in Equation (31) correspond to the reference atomistic system.

Figure 9. Snapshots of a H-AdResS Monte Carlo simulation (reprinted from [34]). Top panel: Equilibrated configuration, without FEC. Bottom panel: Equilibrated configuration, with FEC. The A-type atoms are represented in gray, the B-type atoms in orange. Molecules in the coarse-grained (CG) region are represented as large spheres. White vertical lines mark the boundaries of the CG-hybrid and hybrid-atomistic regions.

According to Equation (34), a thermodynamic integration was performed to determine the thermodynamic mismatch between the AA and the CG zone. The Helmholtz and Gibbs free energy differences per molecule between the CG and AA models as a function of the coupling parameter λ, computed for both species *simultaneously* in a single TI, are shown in Figure 11. In spite of the same interaction between molecules of the same type (V[AA] ≡ V[BB]), the uneven relative concentration of the two species determines a much larger free energy difference between the AA and CG models for the B-type. In fact, the latter shows a Gibbs free energy difference per particle |ΔG<sub>B</sub>/N<sub>B</sub>| > 2 |ΔG<sub>A</sub>/N<sub>A</sub>|. This is mainly due to the fact that the interaction between A and B types is attractive only in the AA representation, thus determining a lower chemical potential for the minority type (B) in the AA region. In addition, in both cases the sign of ΔG favors the densification of particles in the AA region, as can be seen in Figure 10. To counterbalance the mismatch in chemical potentials a FEC was introduced in the H-AdResS Hamiltonian according to Equation (33), using the free energy functions shown in Figure 11. The resulting density profiles (solid lines in Figure 10) demonstrate the success of the procedure.

Figure 10. Density profiles along the direction of resolution change (reprinted from [34]). Dotted lines: H-AdResS simulations without FEC; solid lines: with FEC. Vertical dashed lines indicate the boundaries between the AT, hybrid and CG regions; horizontal dashed lines mark the reference value of the density (normalized to the total density) as expected in a fully atomistic simulation of the system.

Figure 11. Free energy differences per molecule between the AA and CG models as a function of the mixing parameter λ (reprinted from [34]). The Helmholtz free energy is represented by the dotted lines, the Gibbs free energy by the solid lines. Molecular species A corresponds to the black curves, species B to the orange curves.

In this section we discussed the H-AdResS method, which allows for a seamless coupling of two models of the same system with different resolution within a Hamiltonian framework. In order to define an energy-based mixing rule for the two models, the requirement to preserve Newton's Third Law everywhere in the system had to be relaxed. Nonetheless, the "undesired" term that appears in the forces due to the differentiation of the switching function λ is non-zero only in the hybrid region, and its particular form naturally indicates how to introduce, in a physically sound manner, a compensation function that cancels the average effect of the drift force without disrupting the Hamiltonian character of the model. The computational cost of H-AdResS simulations is comparable to that of the AdResS method, the only difference being the need to calculate the drift force **F**<sub>dr</sub><sup>α</sup> in the hybrid region. However, the number of molecules affected by this force is typically small (both the AA and the CG regions are expected to be much larger than the hybrid region), and the quantities involved, namely interaction energies and molecules' CoM coordinates, are normally computed in an MD simulation anyway.
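To make the locality of the drift force concrete, here is a minimal one-dimensional sketch. It assumes the commonly used H-AdResS form in which the drift force on a molecule equals minus the difference between its AA and CG interaction energies, multiplied by the gradient of the switching function evaluated at the molecule's CoM. The cos² shape of the switching function, the function names, and the parameters `r_at` (half-width of the atomistic region) and `d_hy` (hybrid width) are illustrative assumptions, not taken from the text.

```python
import math


def switching(x, r_at, d_hy):
    """Assumed switching function: 1 in the AA region, 0 in the CG region,
    with a smooth cos^2 decay across a hybrid region of width d_hy."""
    r = abs(x)
    if r <= r_at:
        return 1.0
    if r >= r_at + d_hy:
        return 0.0
    return math.cos(0.5 * math.pi * (r - r_at) / d_hy) ** 2


def switching_gradient(x, r_at, d_hy):
    """Derivative of the switching function with respect to x.
    It vanishes identically outside the hybrid region."""
    r = abs(x)
    if r <= r_at or r >= r_at + d_hy:
        return 0.0
    arg = 0.5 * math.pi * (r - r_at) / d_hy
    # d/dr cos^2(arg) = -(pi / (2 d_hy)) * sin(2 arg)
    dlam_dr = -(math.pi / (2.0 * d_hy)) * math.sin(2.0 * arg)
    sign = 1.0 if x > 0.0 else -1.0  # chain rule for r = |x|
    return dlam_dr * sign


def drift_force(x, v_aa, v_cg, r_at, d_hy):
    """Drift force on one molecule with CoM coordinate x, given its AA and
    CG interaction energies v_aa and v_cg (assumed already computed)."""
    return -(v_aa - v_cg) * switching_gradient(x, r_at, d_hy)
```

Because the gradient of the switching function is zero in both the AA and the CG regions, `drift_force` returns zero there, and only molecules whose CoM lies in the hybrid region require the extra evaluation.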

In spite of its simple formulation and relatively small difference with respect to the force-based method, H-AdResS represents a major step forward in terms of understanding and practical advantages. In fact, the existence of a Hamiltonian allows one to precisely formulate a statistical physics theory of double-resolution systems, providing a deep insight into the properties of a given all-atom model, its coarse-grained counterpart and the relation between them. In particular, the free energy compensations provide a simple and effective way to modulate the thermodynamic balance of AA and CG regions, thus leaving to the user the choice of the environment for the AA region most appropriate for the specific problem under examination. Last but not least, this scheme broadens the spectrum of physical ensembles that can be simulated to include the microcanonical ensemble, and allows the use of simulation techniques (e.g., Monte Carlo) that were not accessible in the force-based AdResS framework, with the *a priori* guarantee that real equilibrium configurations are sampled.

#### 4. Conclusions

The characterization of the properties of new materials, as well as the investigation of biological macromolecular machineries, has largely benefited from *in silico* experiments. In spite of a steady increase in available computational power, these resources turn out to be insufficient for very large systems and for the long timescales of the processes involved, due to the extraordinarily large amount of data that has to be stored and the number of force/energy calculations that have to be performed. To overcome these limitations, the field of multiscale simulations has vastly expanded over recent years, and in the present review we have covered two aspects that are central to many multiscale approaches.

In the first part of the review we have addressed methodological questions associated with the development of coarse grained models, where atoms are grouped into super-atoms to reduce the number of degrees of freedom in the system. We have summarized the current approaches to bottom-up coarse graining and addressed some of the ongoing coarse graining issues such as the choice of parametrization targets and the choice of interaction functions used for the coarse grained model. These choices lead to several possibilities (*i.e.*, coarse graining methodologies) for solving the inverse problem of finding parameters for the coarse grained interaction functions
given the selected target properties. We have briefly discussed these (statistical-mechanically interrelated) methods in relation to one another. An inevitable question that arises from having to choose coarse graining target properties and approximations to solve the parametrization problem is that of the representability of different thermodynamic and structural properties. These representability challenges go hand in hand with the question of transferability, *i.e.*, to what extent a reduced-resolution model is applicable to a state point different from the one where it was parametrized. In general, transferability problems increase with decreasing level of resolution, *i.e.*, the coarser a model, the more limited its range of applicability, which then needs to be very carefully assessed. On the positive side, the investigation of transferability issues can help to gain insight into the physical-chemical principles that drive the behavior of the system. We have illustrated transferability-related questions with the help of a few examples. Finally, one should mention that transferability problems are *not* specific to coarse grained models. Such problems are well known for classical atomistic forcefield models as well. A good example is the simulation of mineral systems in contact with electrolyte or polyelectrolyte solutions. Here, forcefields for ions in solution and in the mineral solid have to be combined. This combination leads to transferability issues since electronic polarizability is not represented in a classical atomistic forcefield, and the compromises that are made to approximately account for its effects in a classical parametrization are different in the different phases. As a consequence, the typical "recipes" to combine parameters for different components cannot be straightforwardly applied, resulting in a significant parametrization effort for such problems [188–190].
The increasing awareness of transferability as a modeling challenge and the solution strategies developed in the context of coarse grained models may therefore very well benefit other areas of model development such as classical atomistic force-fields for multicomponent materials systems.

In the second part of the review we have discussed the recent advances in the field of adaptive resolution approaches. The above mentioned limitation in system size comes together with the disappointing fact that a considerable fraction of the simulated data is often discarded afterwards: the solvent, for example, is usually not involved in the analysis of the system, but it is nonetheless required by the simulation. Adaptive resolution methods try to reduce the amount of resources dedicated to the simulation of large, non-interesting regions of the system by replacing them with a simpler, coarse-grained representation of their content. Such "dual-resolution" schemes are built with the constraint that the thermodynamical properties of the region of interest (*i.e.*, the one with the higher resolution) do not differ from those that an equivalent subdomain of the system would have in a fully high-resolution simulation.

In the present work we discussed two methods to achieve this goal: the Adaptive Resolution Simulation (AdResS) scheme, based on the interpolation of two different force-fields, and its Hamiltonian formulation, H-AdResS, where the all-atom and coarse-grained potential energies are interpolated. These methods have been successfully used to interface different molecular fluids, treated at the atomistic level, with their coarse-grained models; the different properties of the AA and the CG potentials naturally induce thermodynamical imbalances in the corresponding sub-regions, but simple and effective ways to overcome this problem have been described.

The possibility of replacing vast regions of the simulated system with a crude, cheap-to-compute representation and concentrating the computational resources on smaller parts, while keeping the relative thermodynamics under control, makes it possible to substantially reduce the amount of computation required to perform a simulation. It also opens the way to a broad spectrum of applications, such as large-scale simulations of complex biomolecules in solution and efficient open-boundary simulations with a varying number of particles.

#### Acknowledgments

We thank all present and past members of the multiscale modeling group at the Max Planck Institute for Polymer Research, the theoretical chemistry group at the University of Konstanz, as well as many further colleagues for fruitful and enjoyable collaborations, in particular Luigi Delle Site, Rafael Delgado Buscalioni, Davide Donadio, Pep Español, Ralf Everaers, Sebastian Fritsch, Mara Jochum, Christoph Junghans, Biswaroop Mukherjee, Simon Poblete, Matej Praprotnik, and Nico van der Vegt. We would also like to thank the Kavli Institute for Theoretical Physics for sponsoring and hosting the workshop "Physical Principles of Multiscale Modeling, Analysis and Simulation in Soft Condensed Matter" and the participants of this workshop for many stimulating discussions. CP acknowledges financial support by the German Science Foundation within the Emmy Noether Programme (grant PE 1625/1-1) and by the Volkswagen Foundation within the call "New Conceptual Approaches to Modeling and Simulation of Complex Systems". Kurt Kremer acknowledges research funding through the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n. 340906-MOLPROCOMP. All authors express their deep gratitude to Aoife Fogarty for a careful proofreading of the manuscript.

#### Author Contributions

All authors contributed equally to the composing as well as the writing of this paper. All authors have read and approved the final published manuscript.

#### Conflicts of Interest

The authors declare no conflict of interest.

#### References


#### MDPI AG

Klybeckstrasse 64 4057 Basel, Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 http://www.mdpi.com/

*Entropy* Editorial Office E-Mail: entropy@mdpi.com http://www.mdpi.com/journal/entropy

MDPI • Basel • Beijing ISBN 978-3-906980-66-9 www.mdpi.com