Article Entropy and Free Energy of a Mobile Loop Based on the Crystal Structures of the Free and Bound Proteins

A mobile loop changes its conformation from "open" (free enzyme) to "closed" upon ligand binding. The difference in the Helmholtz free energy, ΔF(loop) between these states sheds light on the mechanism of binding. With our "hypothetical scanning molecular dynamics" (HSMD-TI) method ΔF(loop) = F(free) - F(bound) where F(free) and F(bound) are calculated from two MD samples of the free and bound loop states; the contribution of water is obtained by a thermodynamic integration (TI) procedure. In previous work the free and bound loop structures were both attached to the same "template" which was "cut" from the crystal structure of the free protein. Our results for loop 287-290 of AcetylCholineEsterase agree with the experiment, ΔF(loop)~ -4 kcal/mol if the density of the TIP3P water molecules capping the loop is close to that of bulk water, i.e., N(water) = 140 - 180 waters in a sphere of a 18 Å radius. Here we calculate ΔF(loop) for the more realistic case, where two templates are "cut" from the crystal structures, 2dfp.pdb (bound) and 2ace.pdb (free), where N(water) = 40 - 160; this requires adding a computationally more demanding (second) TI procedure. While the results for N(water) ≤ 140 are computationally sound, ΔF(loop) is always positive (18 ± 2 kcal/mol for N(water) = 140). These (disagreeing) results are attributed to the large average B-factor, 41.6 of 2dfp (23.4 Å(2) for 2ace). While this conformational uncertainty is an inherent difficulty, the (unstable) results for N(water) = 160 suggest that it might be alleviated by applying different (initial) structural optimizations to each template.


Introduction
In recent years we have developed a new simulation method for calculating the absolute entropy, S and the absolute Helmholtz free energy, F called the hypothetical scanning Monte Carlo (HSMC) (or HSMD), where molecular dynamics (MD) is used [1][2][3][4][5][6][7].HSMC(D) has been applied to systems of increasing complexity, where the most recent ones have been mobile loops in proteins [8][9][10].Typically, a mobile loop has an "open" conformation in the unbound protein, which is changed to a "closed" conformation upon ligand binding, i.e., the loop becomes a "lid" which covers the bound ligand.In these studies the main focus has been on entropy and free energy differences, ΔF loop between these two loop conformations in the bound and free proteins, rather than on the absolute values themselves.(As discussed later, ΔF loop not only sheds light on the mechanism of ligand binding, but in some cases it is an (ignored) component of the absolute free energy of binding.)In the present work we develop HSMD further, extending its applicability to more complex models, in particular to more complex models of loops, as explained below.
Before describing HSMC(D) and its planned enhancements in detail, it should be emphasized that calculation of S and F still constitutes a central problem in computer simulation, in spite of the significant progress achieved in the last 50 years [11][12][13][14][15][16][17][18][19][20][21][22][23].This problem is in particular severe in structural biology due to the flexibility and strong long-range interactions characterizing bio-macromolecules such as proteins.More specifically, the potential energy surface of a protein, E(x) is rugged (x is the 3N-dimensional vector of the Cartesian coordinates of the molecule's N atoms), i.e., this surface is "decorated" by a tremendous number of localized wells and "wider" ones, defined over regions,  m (called microstates) each consisting of many localized wells.A microstate  m , (e.g., the α-helical region of a peptide) which typically constitutes only a tiny part of the entire conformational space, , can be represented by a sample (trajectory) generated by a local MD [24,25] simulation.A molecule will visit a localized well only for a very short time [several femtoseconds (fs)] while staying much longer within a microstate [26,27] which is thus of a greater physical significance (for further discussions about microstates and their problematic definition in simulations, see [8,9]).
Typically one is interested in calculating the free energy of the most stable microstates rather than calculating the total free energy (of ).Thus, flexible protein segments (e.g., sidechains and surface loops), cyclic peptides and ligands bound to proteins can populate significantly several  m in thermodynamic equilibrium, which should be identified and their populations, p m = exp[-F m /k B T] calculated (k B is Boltzmann constant and T is the absolute temperature).In particular, identifying the global free energy minimum of a protein is the daunting task of protein folding.
As stated above, we are primarily interested in entropy and free energy differences, which are commonly calculated by thermodynamic integration (TI) techniques where the integration can be carried out over physical quantities such as the energy, temperature, and the specific heat, as well as over non-thermodynamic parameters [free energy perturbation (FEP), and histogram analysis methods are also included in this category] [11][12][13][14][15][16][17][18][19][20][21][22][23].While TI is a robust and highly versatile approach, it has some limitations; for example, if one seeks to calculate ΔF m,n = ∫dF between two microstates m and n with significant structural variance the integration from m to n becomes difficult and for large molecules unfeasible.This drawback of TI can be overcome to a large extent by methods that provide the absolute F m (and S m ) from a given sample; thus, one is required to carry out (only) two separate local MD simulations of microstates m and n, calculating directly the absolute F m and F n hence their difference ΔF mn = F m -F n , where the complex TI process is avoided.HSMC(D) (and other techniques [28][29][30][31] such as the quasi-harmonic approximation [32][33][34]) enable one to calculate the absolute F (and S) (for a more extensive discussion on TI and other techniques for calculating differences, ΔF (and ΔS) see [17]).
With HSMC(D) each conformation i of a given MD or MC sample is reconstructed step-by-step (from nothing) using transition probabilities (TPs); the product of these TPs leads to an approximation for the correct Boltzmann probability P i B from which various free energy functionals can be defined.
While the TPs of HSMC(D) are stochastic in nature (calculated by MD or MC simulations), all the system interactions are taken into account; in this respect HSMC(D) can be viewed as exact [3], where the only approximation involved is due to insufficient MC (MD) sampling for calculating the TPs.HSMC(D) has unique features: it provides rigorous lower and upper bounds for F, which enable one to determine the accuracy from HSMD results alone without the need to know the correct answer.Furthermore, F can be obtained from a very small sample and in principle even from any single conformation (e.g., see results for argon in [3]).
The HSMC(D) methodology has been developed systematically, as applied to systems of increasing complexity.The initial (HSMC) calculations of liquid argon, TIP3P water [3], self-avoiding walks [5] and polyglycine molecules [6] have verified the validity of the theoretical predictions stated above by comparisons with accurate results obtained by other well established techniques; a subsequent application of HSMD to peptides has led to a 100 times reduction in computer time [7].
As mentioned earlier, HSMD has also been applied to mobile loops calculating ΔF m,n (and ΔS m,n ) between the open and closed microstates.First, we treated the 7-residue mobile loop, 304-310 (Gly-His-Gly-Ala-Gly-Gly-Ser) of the enzyme porcine pancreatic α-amylase [8,9], where the system is modeled by the AMBER96 force field [35] alone and by AMBER96 with the GB/SA implicit solvent of Still and coworkers [36].In this work only protein atoms close to the loop (the template) within a sphere of radius, R tmpl were considered, where they were kept fixed in their x-ray crystallography positions, while only the loop's atoms were moved by MD.Later the implicit solvent was replaced by explicit solvent, i.e., the α-amylase loop was capped with 70 TIP3P [37] water molecules (within a sphere of radius R water = 14 Å), which (together with the loop's atoms) were moved in the MD simulation.Because the application of HSMD to water has not been optimized yet, the contribution of water to the free energy was calculated by a TI procedure incorporated within the framework of HSMD.This HSMD-TI process leads to a relatively small statistical error because only the water-loop interactions are involved, where their full values are gradually decreased to zero, for a fixed loop structure.
The number of water molecules used (70) is relatively small relying on a previous paper [38] where the minimal number of water molecules, min water N required to cap a loop was studied.min water N was determined from a stability criterion based on the loop's root mean square deviation (RMSD) from the Protein Data Bank (pdb) structure during 4 ns MD simulations.Surprisingly, min water N (which depends on the loop's sequence and environment) was found to be relatively small on average 12 waters per residue, in accord with a study of Steinbach and Brooks [39] For the loop of amylase, which consists of small residues, min water N = 70 seemed to be adequate.Clearly, these conclusions should further be tested and corroborated by free energy calculations.
A detailed such study was carried out in a following paper [10] where HSMD-TI was applied to the loop 287-290 with the bulky residues, Ile, Phe, Arg, and Phe of acetylcholinesterase (AChE) from Torpedo california; reaction of AChE with the inhibitor diisopropylphosphorofluoridate (DFP) leads to a displacement of the loop's backbone roughly by 4 Å [40][41][42][43][44][45], and experimental evidence suggests that the free energy penalty for the loop displacement is on the order of 4 kcal/mol (i.e., ΔF loop = F free -F bound ~ −4 kcal/mol) [40,42,46].Therefore AChE has been an ideal system for checking the performance of HSMD-TI, and in particular for examining the minimal number of water molecules needed to cap the loop.Thus we carried out two sets of calculations for systems of increasing size defined by R tmpl = 12 and R water = 13 Å, and by R tmpl = 13 and R water = 14 Å, for an increasing number of water molecules, N water = 80, 100, 120, 180 and 80, 100, 120, …, 220, respectively.We have found that to recover the experimental free energy difference the water density should be close to that of bulk water and our results for the first set, ΔF loop = −3.1 ± 2.5 and −3.6 ± 4 kcal/mol for a sphere containing 160 and 180 waters are equal within error bars to the experimental value [10].
Because the crystal structures of the free (unbound) [47] and bound [40] proteins are similar, in [10], (but also in our previous loop studies [8,9] and studies by others [48,49]) the template was "cut" from the crystal structure of the unbound protein, where the loop conformation in the bound protein was attached to the free template by superimposing the two crystal structures.Obviously, it would be more realistic to carry out the calculations with respect to two templates, which are cut separately from the free and bound x-ray structures.However, this would require defining an additional TI procedure where the water-template interactions are also gradually eliminated.This TI is expected to be computationally demanding since, unlike the original TI where (relatively small) loop-water interactions are considered, the number of template-water interacting atoms is relatively large increasing with the increase of the system size.
AChE is expected to be a suitable system for studying the effect of two different templates because the experimental ΔF loop is known.Therefore, the loop of AChE is treated here again, where the simulations are carried out with respect to two different templates which are cut from the free and bound crystal structures; this requires applying the (relatively) elaborate HSMD-TI procedure mentioned above.While the basic theory is similar to that described in [10], we derive it here in a new and shorter way for better clarity.
The foregoing (mainly technical) discussion did not touch on the relation of ΔF loop to ligand binding.One question of interest in this area is whether the conformational change adopted by a mobile loop upon ligand binding is induced by the ligand (induced fit [50,51]) or alternatively, the free loop (in the apo protein) interconverts among different microstates, one of which is selected upon binding (selected fit [52]).While some recent studies favor the latter mechanism, there is no general consensus, and the adopted mechanism is probably system dependent.The present study is not aimed at solving this problem, which would require to apply the longest simulation possible to the entire AChE protein soaked in explicit water, as done, for example, in [53] for studying loop 6 of TIM.Notice, however, that in the case of induced fit ΔF loop (in the apo protein) should be added to the total absolute free energy of binding of the protein-ligand complex.Thus, if the template remains unchanged (due to binding), calculation of ΔF loop carried out in [10] is expected to be adequate, whereas for a changed template the present calculation should be used.Indeed, in our recent study of the avidin-biotin complex ΔF loop is taken into account (for the first time) in the calculation of the absolute free energy of binding [54].
However, even if the behavior of the loop of AChE is of a selected fit type (i.e., the loop conformations in the apo protein also cover the structural region of the bound loop) the crystal structures and other experimental data suggest that the loop spends appreciable amount of time in its unbound microstate.Therefore, calculating ΔF loop , which measures the difference in stability between the two (possibly) meta-stable microstates is of interest.For this purpose, generating the longest possible MD trajectories might be a disadvantage because the loop is expected to escape from the original microstates during the simulations.Therefore, ΔF loop is calculated from relatively short trajectories (0.5 ns) where the loop remains in its original microstates (bound and free).
In this context it should be pointed out that determining the exact limits of a microstate in conformational space is practically impossible and therefore it is commonly defined by an MC or MD sample initiated from a microstate's structure.Thus, the microstate's size typically increases with simulation time, t and estimation of E, S, and F depend on t as well; in previous publications we have developed procedures for checking that the system remains in its microstate during a simulation.Thus, in the case of two microstates one should verify that the differences, ΔE loop (t), ΔS loop (t) and ΔF loop (t) are stable during a long enough time, t; for more details see [8,9].
Finally, while the differences ΔE loop , ΔS loop and ΔF loop between two loop microstates might be small, the fluctuations in E and S increase as N 1/2 , where N is the number of atoms simulated.Therefore, simulating the entire protein might be prohibitive and to keep the fluctuations manageable the coordinates of the template in our calculations are held fixed.Notice, however, that this is not an inherent limitation as in our present application of HSMD-TI to the avidin-biotin complex [54] part of the template is also moved in the MD simulation since the size of the fluctuations remains under control.

The Loop and the Protein's Template
As in [10] we study the 4-residue loop, Ile, Phe, Arg, Phe, (287-290) of AChE in two microstates related to the free and bound loop structures.The starting point is the available crystal structures of AChE from the protein data bank (PDB), where the unbound (free) structure 2ace [47] is based on residues 4 to 535 and 2dfp, the structure of AChE with DFP consists of residues 2 to 535 (the bound structure) [40].To be able to compare these structures 2dfp was trimmed so that both structures consist of the 531 residues, 4-535 and crystal water molecules were retained.
Moreover changing the bound structure also consisted in recovering the catalytic residue Serine (200) of this serine hydrolase (2ace) by remodeling the mono-isoprophyl-phosphoryl residue (MIS) (200) of the aged phosphorylated enzyme (2dfp).Remodeling consisted in removing the d-isopropyl-phosphoryl chain without changing the heavy atoms of the modified Serine residue and adding the corresponding hydrogen atoms.
A close analysis shows that these conformations differ overall by (all heavy atoms) root mean square deviation (RMSD) of 1.22 Å, while the loop conformations differ by RMSD = 1.38, 2.91, and 2.71 Å for the backbone, sidechains, and all loop heavy atoms, respectively.The RMSD of the templates, i.e., all atoms excluding the loop's atoms, is 1.10 Å.It should be emphasized that the coordinates of the x-ray bound structure (2dfp) appear with significantly larger B-factors than those of the free structure (2ace).
Using the program TINKER [55], and the AMBER force field [35], where Lys, Arg, and His are positively charged and Glu and Asp have a negative charge.Hydrogens were added to the free and bound structures (including the crystal water) and the potential energy was minimized, with all heavy atoms restrained to their crystal positions by harmonic forces (with a force constant of 1 kcal/Å 2 ).Afterwards the loop and (TIP3P) water atoms were allowed to relax in the presence of a fixed template.These minimizations eliminate bad atomic overlaps and strains in the original structures, while keeping the atoms reasonably close to the PDB coordinates.
The relatively weak force constant (1 kcal/Å 2 ) (as compared to 100 kcal/Å 2 used in [10] for 2ace) was chosen to allow more extensive changes in the bound structure due to its relatively large B-factors and the chemical change imposed on it by removing the phosphate group; correspondingly, larger structural changes were also allowed to occur in the free structure.To assess the change in the structure during minimization we list the various RMSD values (Å) in pairs, for the beginning and the end of the process: RMSD for all atoms, (0.98, 0.99), backbone atoms, (0.43, 0.41), sidechains, (1.20, 1.21), whole template, (0.85, 0.86), template backbone, (0.35, 032), template sidechains, (1.05, 1.06), whole loop (2.71, 2.87), loop backbone (1.39, 1.45), loop sidechains (2.92, 3.10).Thus, the strongest changes occurred for the loops.
Taking into account the whole protein (of 8,284 atoms) would be computationally prohibitive; therefore (as in [8][9][10]), the template size is reduced to the N tmpl atoms closest to the loop, where the rest of the protein atoms are ignored.More specifically, the center of mass of the backbone atoms of the free loop (in the 2ace structure) is calculated as a (3D) reference point denoted x cmb and a distance (R tmpl ) is chosen.If the distance of any atom of a residue from x cmb is less than R tmpl , the entire residue is included in the template; otherwise, the residue is eliminated.We use R tmpl = 12 Å which was found in [10] to lead to a large enough template consisting of N tmpl = 944 atoms; however, for the present (somewhat different) structure N tmpl = 961, and exactly all these atoms were included in the template which was "cut" from the crystal structure of the bound protein (2dfp).We added to each template 33 crystal water molecules selected from those which are closest to the loop (30 in [10]).
After defining the templates, the water molecules and the orientations of the polar hydrogens of the free and bound loops (on different templates) were subjected to an optimization procedure.This procedure (where all the loop's heavy atoms are kept fixed), was carried out in several rounds, each consisting of a 0.1 ns MD run (1 fs time step) at high temperature (1,250 K) followed by energy minimization; the process is stopped after a fixed number of unsuccessful rounds (typically 100), namely rounds which do not reduce the energy further.Although the heavy atoms are fixed, considerable gain in potential energy is achieved.Next, keeping the template atoms fixed, the energy of each system was minimized where the loop and water molecules were allowed to move freely.Each of the final "free" and "bound" structures constitute starting points for a further analysis.The RMSD values between the final (optimized) loop structure and its crystal structure are relatively small.For the free loop they are 0.18, 0.45, and 0.42 Å, and for the bound loop, 0.27, 0.47, and 0.44 Å for the backbone, sidechains, and all loop atoms, respectively.

Addition of Water
While our considerations thus far are based on the crystal structures, we seek to simulate the loop in solution.Therefore, it is not clear whether the positions of the crystal waters are relevant for the solution environment.In particular, water molecules that are caged within the crystal structure are expected to stay there during the simulation, and thus can be considered as part of the template.Therefore, the number and arrangement of these waters should be globally optimized, practically by energy minimization, which, however, is a non-trivial problem (see below).
As in [10], our strategy is to add more water molecules to the already existing crystallographic waters.Thus, we define a sphere centered at x cmb with a radius, R water (R water = R tmpl + 1 Å) where waters are added at random to the hemisphere oriented towards the exterior of the template.To hold these waters around the loop they are restrained with a flat-welled half-harmonic potential (with a force constant of 10 kcal mol -1 Å -2 ) based on their distance from x cmb .That is, if the distance of a water oxygen from x cmb is greater than R water a harmonic restoring force is applied, otherwise the restraining force is zero.
Even though we have found in [10] that for the above R tmpl and R water , N water should be 140-180, it is of interest to check this conclusion again for the present case of two different templates; therefore, we carry out calculations for N water = 40, 60, 80, 100, 120, 140, and 160.
Each of the two fully hydrated systems is then treated by a sequence of MD/minimization procedure similar to that outlined above, applied only to the hydrogen atoms of the loop and all water molecules restrained within the full sphere of radius R water .Again, at the end of this optimization the energy is minimized where the loop's atoms and the water molecules are free to move.The change in the loops' conformations from their crystal structures measured by the RMSD are relatively small; for the free loop we obtain 0.31, 1.03, and 0.94 Å for backbone, side chains and all loop atoms, respectively, where the corresponding results for the bound loop are 0.37, 1.06, and 0.98 Å (see the discussion related to Table 1).
For each of the optimized "free" and "bound" structures an MD run at 300 K is performed, where only the loop and water atoms are moved while the template atoms are kept fixed.Thus, 1,000 loop/water configurations are collected by retaining a configuration every 0.5 ps along the 0.5 ns MD trajectory.An equilibration run of 0.5 ns is carried out prior to the production run.The total potential energy E total is the sum of partial energies related to the loop and water (the template-template energy is constant and thus is ignored): where E loop-loop is the intra loop energy, E loop-tmpl is the energy due to loop-template interactions; these energies define the total loop energy E loop , and the interactions related to water are defined in a similar way, where their total is defined as E water .The reconstruction of the loop structure is carried out in internal coordinates; therefore, the loop conformations simulated by MD are transferred from Cartesians to the dihedral angles φ i , ψ i , and ω i (i = 1,N res = 4), the bond angles θ i , l (i = 1,N res , l = 1,3), the side chain angles χ, and the corresponding bond angles.For convenience, all these angles (ordered along the backbone) are denoted by α k , k = 1,37 = K.We have argued in [7,8] that to a good approximation bond stretching can be ignored.

Statistical Mechanics of a Loop in Internal Coordinates
The theory of HSMD-TI has been developed in previous publications; therefore, the present description is shorter, providing mainly equations that are related directly to the calculations.
The reconstruction of the loop structure (of N res = 4 residues) is carried out in internal coordinates; therefore, the loop conformations simulated by MD are transferred from Cartesians to the dihedral angles φ i , ψ i , and ω i (i = 1, N res ), the bond angles θ i , l (i = 1, N res , l = 1, 3), the side chain angles χ, and the corresponding bond angles.For convenience, all these angles (ordered along the backbone) are denoted by α k , k = 1,K; We have argued in [7,8] that to a good approximation bond stretching can be ignored, thus the bond lengths are considered to be constant.
The partition function of the loop/water /template system is: where E(x loop ,x N ) = E total is defined in Equation ( 1), x loop is the Cartesian coordinates of the loop in microstate m and N x is the 9N Cartesian coordinates of the water molecules; E total also depends on the coordinates of the "frozen" template which are omitted for simplicity.For the same reason the letter m will be omitted in most of the equations and N water will be replaced in the theoretical section by N (N = N water ).After changing the variables of integration from x loop to internal coordinates, the integral becomes a function of the K dihedral and bond angles, α k , k = 1,K and a Jacobian, cos(θ i , l ) that depends (only) on each of the bond angles θ i , l [27,28,32]: In Equation (3) we have omitted a factor, which depends on the bond lengths and is assumed to be the same (i.e., constant) for different microstates of the same loop and therefore does not affect entropy differences (e.g., see discussion in [10]).The Jacobian is also omitted for simplicity because we have shown [8] that it cancels out (within the error bars) in entropy and free energy differences.The Boltzmann probability density corresponding to Z [Equation (3)] is: and the exact entropy, S and exact free energy, F (defined up to an additive constant) are: (5) and: It should be noted that the fluctuation of the exact F is zero [56,57], because by substituting ]) [Equation ( 4)] inside the curly brackets of Equation ( 6) one obtains, , i.e., the expression in the curly brackets is constant and equal to F for any set ] [ k  within m.This means that the free energy can be obtained from any single conformation if its Boltzmann probability density is known.[Notice, however, that calculation of ) ], ([ for a single conformation depends on the entire microstate as is also evident from the HSMC(D) procedure discussed later].Still, the fluctuation of an approximate free energy (i.e., which is based on an approximate probability density) is finite and is expected to decrease as the approximation improves [3][4][5]56,57].Because HSMC(D) provides an approximation for ) ], ([ , one can, in principle, estimate the free energy of the system from any single structure [3][4][5].This is the reason why in practice reliable HSMC(D) results for F (but not necessarily for S and E) can be obtained from a relatively small sample.

Exact Future Scanning Procedure
It should first be pointed out that MC and MD are exact dynamical methods that enable one to sample system configuration i correctly with its Boltzmann probability, P i B , while the value of P i B is not provided (due to the dynamical character of these methods).[To simplify the discussion we use the discrete probability, P i B rather than the probability density ) ], ([ defined in Equation ( 4).] Thus, properties such as the energy that are "written" on i can easily be calculated, while a direct calculation of the absolute S is difficult because lnP i B is unknown (it depend not only on i but on the entire ensemble through the partition function Z, which cannot be obtained from a finite sample).Unlike the dynamical MC and MD, the exact future scanning method [that constitutes the basis of HSMC(D)] is a growth procedure which enables one (at least in principle) generating any system configuration (from nothing) by determining the positions of the atoms step-by-step with the help of transition probabilities (TPs); the product of these TPs leads to the value of P i B hence to S and F.
Practically, a loop/water/template configuration would be generated by initially building a loop structure (in the presence of moving water) followed by the construction of a configuration of the surrounding water molecules (in the presence of a fixed loop conformation).In this way a sample of statistically independent system configurations can be obtained.
For simplicity this construction is described for a loop without side chains, consisting of M Gly residues, i.e., ordered along the chain are the 3M heavy atoms denoted k' = 1…3M and the 6M dihedral and bond angles denoted α k , 1 ≤ α k ≤ 6M = K with values within microstate m; the loop is surrounded by N water water molecules moving within the volume defined by a sphere of radius, R water , the template, and the loop.We seek to generate a configuration of the entire system by first generating a loop conformation and then a configuration of the water molecules.
With the scanning procedure the position of the heavy atoms (x loop ) is determined step-by step.Thus, the position of the first atom k' = 1 is defined by the simultaneous determination of the first pair of dihedral and bond angles α 1, and α 2 .The maximum range Δα 1 Δα 2 which will keep the loop within m is defined, and each of Δα 1 and Δα 2 is divided into n b small bins (of sizes Δα 1 /n b and Δα 2 /n b ) denoted j 1 and j 2 , j 1 = 1…n b , j 2 = 1…n b , respectively.A long MD simulation of the whole system (loop + water) is carried out within microstate m, where a conformation is retained every l fs leading to a huge sample of size n; then, the number of conformations n j1j2 that visit simultaneously the (double) bin j 1 j 2 is calculated from which the corresponding TP is obtained, A double bin is then selected by a random number according to the probabilities p j1j2 which defines the position of atom k' = 1 (and its hydrogen or oxygen).The position of this atom is not changed in the next steps of the build-up process, i.e., it becomes part of the "past".The position of the second atom (k' = 2) is determined in the same manner from a long MD simulation of the future part of the system (i.e., atoms k' = 2…3M and water) where α 3 and α 4 are considered, bins Δα 3 /n b and Δα 4 /n b are defined, probabilities are calculated and a "lottery" (like above) determines the values of α 3 and α 4 which define the position of atom k' = 2; the process continues until the positions of all the loop's atoms (and their hydrogens or oxygens) have been determined.A configuration of the N water molecules is then determined in a similar way step-by-step in the presence of the fixed loop structure previously constructed (for details see [3]).Obviously, the smaller are the bins the higher is the accuracy of the construction process, provided that the statistics is adequate, i.e., that the (future) MD simulations are long enough; this stochastic scanning method becomes exact as the bin size → 0 (n b → ∞) and n→ ∞.
Notice that in applications of the (deterministic) scanning method to lattice models, only part of the future has been considered, (i.e., only f steps ahead), where this part has been scanned completely; therefore, the corresponding TPs are approximate but deterministic (rather than stochastic), and accurate results were obtained by using an additional importance sampling procedure [58].This procedure can be described more formally as follows: at step k' (k = 2k'), the positions of k' − 1 atoms have already been determined (from the values of the corresponding k − 2 angles α 1, … , α k-2 ) and they are kept fixed (defining the "past"); α k-1 and α k (which will determine the position of atom k') are defined with the exact TP density where ) , , ( is a future partition function.The term "future" indicates that the integration defining Z future is carried out over the positions of atoms k' = k/2 + 1…K/2 (which affect angles, ) and the 9N coordinates N x of the water molecules (which will be determined in future steps of the build-up process).Notice that this integration is carried out in a restrictive way where the corresponding conformations (of the loop) remain within microstate m.Also, in this integration the atoms treated in the past (1… k' − 1) (which were determined by ) are held fixed in their coordinates.For simplicity the integrations below are written over the angles rather than the Cartesian coordinates (x loop ) of the loop atoms, k' = k/2+1…K/2.Thus: where E (Equation ( 1)) is the total potential energy of the loop/template/water system, which also imposes the loop closure condition.The product of the TPs (Equation ( 7)) leads to the (Boltzmann) probability density of the entire loop conformation, ) , , ( .One can define for m "the loop entropy of mean force", loop S : where S loop is defined up to an additive constant.Extending the exact scanning procedure to side chains is straightforward, where again the position of a side chain atom is defined by two angles as described above.However, implementation of the exact scanning procedure (as described prior to Equation ( 7)) for generating Boltzmann weighted configurations of a large loop/water/template system would be inefficient, due to the need to calculate (at each step) a large set of accurate TPs (i.e., for an extremely large number of small bins); this would require extremely long (future) MD simulations.Also, with long simulations it is difficult to guarantee that the loop will remain in microstate m (see discussion in [8]).However, the exact scanning procedure provides the theoretical basis for HSMC(D).Thus, the exact scanning method is equivalent to any other exact simulation technique (in particular, to MC or MD) in the sense that large samples generated by such methods lead to the same averages and fluctuations (the sample does not carry a memory of the simulation method with which it has been generated).Therefore, one can assume that a given MC or MD sample has rather been generated by the exact scanning method, which enables one to reconstruct each conformation i by calculating the TP densities that hypothetically were used to create it step-by-step.With HSMC(D) the efficiency problems discussed above for the exact (stochastic) scanning procedure are alleviated to a large extent since only a single TP is calculated at each step (rather than many TPs (p j ) required with the scanning method), as described below; also, because we are mainly interested in entropy differences, approximations (e.g., ignoring the Jacobians and bond stretching) can be applied without compromising the accuracy of the results.

Loop Reconstruction by HSMC(D)
The theory of HSMC(D) is described again for a loop consisting of M Gly residues.Notice that while HSMD and the exact scanning method (described in the previous section) have some resemblance, they are different.With the scanning method a loop conformation is generated with P i B whereas with HSMD an already generated structure (by MD) is reconstructed in order to obtain its probability.Thus, one starts by generating an MD sample of the loop in water in microstate m; the configurations of this sample will be reconstructed by HSMD (application of HSMC is similar).First, the conformations are represented in terms of the dihedral and bond angles α k , 1 ≤ α k ≤ 6M = K, and the variability range Δα k is calculated: where α k (max) and α k (min) are the maximum and minimum values of α k found in the sample, respectively.Δα k , α k (max), and α k (min) enable one to verify that the sample has not "escaped" from , is calculated from an MD run, where the entire future of the loop and water is moved [i.e., the loop's atoms k',k' + 1,…K/2 and their connected hydrogens, or oxygens and the water coordinates x N ] while the past (loop's atoms 1,2,…,k'−1 and their connected hydrogens or oxygens) are held fixed at their values in conformation i.By considering a future conformation every 10 fs, a sample of size n f is generated.Two small segments (bins) δα k-1 and δα k are centered at α k-1 (i) and α k (i), respectively, and the number of simultaneous visits, n visit , of the future chain to these two bins during the simulation is calculated; one obtains (see Figure 1): becomes exact for very large n f (n f →∞) and very small bins (δα k-1 , δα k →0).This means that in practice ) , , ( will be somewhat approximate due to insufficient future sampling (finite n f ) and relatively large bin sizes.The same treatment can be applied to side chain atoms; however, notice that the sidechain of Arg is treated here approximately, where at each step three consecutive atoms are considered and their positions are defined only by the corresponding dihedral angles.In [8] we have shown that δα k and δα k+1 can be optimized.Notice that with HSMD the future loop conformations generated by MD at each step k' remain in general within the limits of m, which is represented by the analyzed MD sample.The corresponding probability density related to the loop is: ]) ([ defines an approximate entropy functional, denoted A loop S , which can be shown (using Jensen's inequality) to constitute a rigorous upper bound for loop S (Equation ( 9)) [3,59]: is also known as the Gibbs' inequality).Being an upper bound suggests that A loop S will decrease as the approximation improves.In practice, however, this inequality is satisfied only if the probabilities are well defined (e.g., they are correctly normalized).This is pointed out because stochastically and its error (and the error in A loop S , the estimation of A loop S , see Equation ( 14)) increases with increasing system size (i.e., increasing the number of TPs; see later discussions in 3.3).13)); correspondingly, the ensemble averages of the energy are estimated from a sample of size n s and should appear with a bar as well.However, from now on only estimations will be considered and for simplicity, all of them will appear without the bar, like the energies defined in Equation (1).A loop S (Equations ( 13) and ( 14)) constitutes a measure of a pure geometrical character for the loop flexibility, i.e., with no direct dependence on the interaction energy.When the converged or the best value of A loop S is considered, it will be denoted by S loop ; thus, F loop = E loop −TS loop is defined as the loop's contribution to the total free energy, where E loop is defined in Equation ( 1).In the same way, the difference in the loop entropies between the free and One can define a free energy difference for the loop, ΔF loop = ΔE loop −TΔS loop , where loop E  is obtained from Equation (1).Notice that (unlike in [10]) ΔS loop and ΔF loop are obtained for different templates; for simplicity their dependence on the templates is omitted.

Thermodynamic Integration of Water
To reconstruct the water configuration one can use, in principle, the HSMC(D) procedure for fluids mentioned previously [3] , which depends on the free or bound template (tmpl).However, this procedure for fluids has not been optimized yet and it is relatively time consuming.Alternatively, as in [9,10], one can obtain (defined up to an additive constant) by a thermodynamic integration (TI) procedure based on the same reference state for the free and bound structures (i.e., the above constant is the same for both structures).In this state only the water-water interactions and the harmonic restraining force that keeps the waters in the spherical volume are preserved, while the water-template and water-loop interactions [electrostatic and Lennard Jones (LJ)] are switched off, i.e., the fixed loop structure ] [ k  and the fixed template do not "see" the surrounding waters.Thus, ΔF water (our main interest) can be obtained by a TI procedure (applied to each system) where these interactions are gradually increased (from zero) during an MD simulation of water (while the loop structure and template remain fixed).(Notice that while the templates are structurally different, the template-template interactions are ignored, therefore the templates are considered to be in the same state.) In practice, however, it is more convenient to carry out the integration in an opposite direction, starting from the current water configuration with full water-loop and water-template interactions, which are gradually decreased to zero.More specifically, the water-loop interactions are integrated (eliminated) first (in the presence of intact water-template interactions), which leads to ) ], ([ The subsequent template-water integration (that depends only on the template) is denoted ) , tmpl ( TI water m F , where m stands for "free" or "bound".Each of the free and bound integrations are carried out in two stages, where in the first stage the electrostatic interactions (i.e., the charges) are decreased to zero followed by decreasing to zero the Lennard Jones (LJ) interactions, where the latter is performed by decreasing a parameter λ from 1 to zero (see Section 3.3 and Appendix).
The water-loop integration is carried out n s times for each of the loop structures ] [ k  of the sample of size n s .Denoting these ] [ k  of the sample by t and keeping for brevity the letter m ("free" or "bound") only on the left side of the equation, leads to: It should be pointed out again that the template-template energy is not considered in our model, therefore the two templates only differ by their geometry and the corresponding reference states are thus the same.

The Reconstruction Procedure with HSMD
The HSMD reconstruction procedure needs further discussions.Thus, the MD simulation of the future chain at step k' starts from the reconstructed conformation i, and every g = 10 fs the current conformation is considered, where the n init initial considered conformations are discarded for equilibration.For each of the next n f (considered) future conformations the pair (α k-1 ,α k ) (k = 2k') are calculated and their contribution to n visit (Equation ( 11)) is calculated.An essential issue is how to guarantee an adequate coverage of microstate m, i.e., that the future chains will span its entire region (in particular the side chain rotamers) while avoiding their "overflow" to neighboring microstates, conditions that will occur for a too small and a too large n f , respectively.[Note that even at step k', where the "past" of the loop (atoms 1…k'−1) is kept fixed, the (future) unfixed part (atoms k', k' + 1…) can leave the microstate during long MD simulations.Such an "overflow" is more likely to happen for small residues such as Gly and for small k'.] In previous work [6][7][8][9] we developed procedures for keeping the loop in its original microstate and measures for estimating the extent of coverage of the microstate by the reconstructed samples.In this paper these procedures have not been applied because the maximal n f values used are not large and the microstates are concentrated (i.e., the Δα k values of Equation ( 10) are relatively small; see discussions in 3.2).

Simulation Details
The solvated free and bound structures (defined on the corresponding x-ray structures) were initially optimized as described in sections 2.1 and 2.2 leading to two optimized structures.Then, starting from each optimized structure, an 1 ns MD trajectory was generated at 300 K, where the initial 0.5 ns part was used for equilibration and a sample of 1,000 structures was generated from the last (half) part of the trajectory by retaining a structure to a sample every 0.5 ps.From these samples (which represent the free and bound microstates on their different templates) the free energy and entropy are calculated.
All simulations were carried out with the velocity-Verlet algorithm [60] based on a time step of 2 fs, where bonds involving hydrogens (including those of water) were frozen to their ideal values by the RATTLE algorithm [60]; the Berendsen [60] heat bath controlled the temperature.Cut-offs on long-range interactions were not imposed, and in the reconstruction process a structure was added to the sample every g = 10 fs, where the n init = 250 initial structures (2.5 ps) were discarded for equilibration.The future samples were generated for several bin sizes, δ, where results are presented for δ = Δα k /l, l = 5, 10, 20, 30, 40, and 50, centered at α k (i.e., α k ± δ/2) (Equation ( 11)).If the counts of the smallest bin are smaller than 50 the bin size is increased to the next size, and if necessary to the next one, etc.In the case of zero counts, n visit is taken to be 1; however, an event of zero counts is very rare.

Dihedral Angles for Different Microstates
In Table 1 we present results for α k (min), α k (max) and Δα k (Equation ( 10)) for the backbone dihedral angles φ and ψ and the sidechains, χ.These values are based on the samples of 1,000 conformations generated for the free and bound microstates for N water = 140 (rather than the results for N water = 160, which show some inconsistencies, see 3.4).The table reveals that the Δα k values for the backbone of the free and bound microstates are relatively small (in most cases smaller than 80 o ) and they are comparable to those obtained by other samples (based on different N water values).Somewhat larger Δα k values were obtained for χ, where all these results are also comparable to those obtained in [10]; this suggests that ΔS loop (Equation (15b)) would be small as well.These small Δα k values stem from the constraints imposed by the template on the inner loop.
For comparison we also provide in Table 1 the dihedral values of the crystal structures, α k (crystal).These angles enable one to determine whether the samples have escaped from their original microstates.While exact definition of a microstate is practically unfeasible (see discussion in Section 1.3 of [8]), we have accepted an "escape" criterion for a dihedral angle when α k (crystal) + 60 o is smaller than α k (max) or α k (crystal)−60 o is larger than α k (min), i.e., if some angle values fall beyond the range α k (crystal) ± 60 o ; these angles are bold-faced in the table.The table reveals that only one backbone angle, φ(Arg) of both the free and bound microstates has been escaped but with small deviations of 18 o and 10 o , respectively.The number of escaped sidechain angles is seven for the free microstate and five for the bound one.Thus, the original microstates are retained for the backbone but only partially for the side chains, which might be expected since we model the loops in solution rather in the crystal environment.*α k (min), α k (max), and Δα k are defined in Equation (10); their values were calculated from samples of 1,000 loop conformations (0.5 ps) generated for the free and bound microstates.The values of α k (crystal) were calculated from the PDB crystal structures, 2ace [47] and 2dfp [40] of the free and bound protein, respectively.13) and ( 14)) and for the differences, (Equation (15a)]; they were obtained by reconstructing 80 loop structures selected homogeneously from larger MD samples (of 1,000 water-loop configurations) of the free and bound microstates obtained for N water = 140.The results are calculated as a function of the bin size δ = Δα k /l (Equation ( 10)) and n f (Equation ( 11)) the sample size of the future chains used in the reconstruction process.

Results for the Loop Entropy
Results for the loop entropy, A loop

S
[Equations ( 13) and ( 14)] appear in Table 2 for N water = 140 for the free and bound microstates and for their difference [see the discussion preceding Equation (15a)].These results were obtained by reconstructing n s = 80 loop structures, distributed homogeneously along the entire sample of 1,000 system configurations.The simulated future consists of the future part of the loop including all the surrounding water molecules.
The results are presented for several values of n f -the sample size of the future chains [Equation (11)], where n f = 200, 400, 1,200, 1,600, and 2,000; these values of n f are used for pairs of angles, such as a backbone dihedral and the successive bond angle.However, for the sidechains we also reconstruct a single χ angle and triplets of successive χ angles (e.g., for Arg) for which the maximal n f is 1,000 and 4,000 (rather than 2,000), respectively (see Equation (11)).The results are also presented as a function of bin size, δ = Δα k /l (Equations ( 10) and ( 11)) where l = 30, 40 and 50, while for n f = 2,000 we also provide results for larger bin sizes defined by l = 5, 10 and 20.The statistical errors were obtained from the fluctuations (standard deviation).

Being an upper bound,
A loop TS [Equation (13)] is expected to decrease as the approximation improves, i.e., with decreasing the bin size an expectation that is fully satisfied.For example, for n f = 2,000, the ) free ( TS slightly increase in going from n f = 200 to 1,200 and then become constant.While most of these changes are within the error bars, they reflect an insufficient equilibration of 2.5 ps (see discussion following Equation ( 13)).Thus, eliminating the results for n f = 200 (i.e., increasing the equilibration to 4.5 ps) would lead to the expected decrease among the (new) A loop TS results due to the expected increase in the (new) results for n f = 200 and 1,000 (however, to be consistent with [10] the results in the table are based on a 2.5 ps equilibration).In any case, within the error bars the A loop TS values in Table 2 can be considered as converged; they are only slightly larger (by ~1 kcal/mol) than those of Table 4 of [10] and the errors in both tables are comparable.We are mostly interested in the difference in entropy between the free and bound microstates, loop S T [Equation (15b)].Table 2 shows that for all approximations loop S T ~0 kcal/mol, i.e., the entropy of both microstates is the same, while in [10] for 160 water molecules kcal/mol, i.e., the free microstate has slightly higher entropy than the bound one.
The HSMD results for the entropy are compared to those obtained with the quasi-harmonic (QH) method [32], i.e., based on the equation: where the covariance matrix, σ, is obtained from a local MD sample and N is the number of internal coordinates used in the HSMD procedure.It should be pointed out that in a detailed study of QH Chang et al. [61] have found that the method might be unreliable when used in Cartesian coordinates or applied (in internal coordinates) to several microstates.On the other hand, QH was found suitable for treating a single microstate, while the of the results is slow and large samples are typically needed; also, because QH takes into account only second order correlations ) ( QH loop m S constitutes an upper bound for the correct S. Still, entropy differences ) ( QH loop m S  are expected to be reliable (see also [21]).Thus, to obtain reasonable precision, results for (m is bound or free) were obtained from MD samples of 10,000 loop-water configurations.To avoid the "escape" of a sample from the original microstate, it consists of 10 separate samples of 1,000 configurations (0.5 ns), each started from the same structure with a different set of initial velocities, where the initial trajectory of 0.5 ns is used for equilibration and is thus discarded.The values of QH loop TS exceed the HSMD results for loop TS (for n s = 2,000) by 11-10 kcal/mol for the free and bound microstates.These elevated results are in accord with QH loop

S
being an upper bound and are comparable to the overestimation values found in our previous studies [6][7][8][9][10].On the other hand, bound) ( -free) ( QH loop QH loop S S = 0.6 ± 2.0 is in accord (within the error bars) with the HSMD result ΔS loop = 0.
As in [10], the computer time required to reconstruct a loop structure capped with N water = 160 is 8.1 hours CPU on a 2.1 GHz Athlon processor, meaning that the entire reconstructions required 1,296 and 1,472 h CPU; this time is smaller for N water = 140.However, we have shown that considering only 10% (n f = 200) of the maximal reconstruction samples and using smaller samples of n s = 40 (rather than 80) have led to sufficiently accurate entropy differences, meaning that the total computer time can be reduced to 65 h CPU, respectively.We have generated the relatively large reconstruction to verify the convergence of the results.

The Effect of Water
In Table 3

F
(which is the total free energy without the contribution of the loop entropy, S loop , Equations ( 13) and ( 14)), and for the total energy, E total (Equation ( 1)).For simplicity we have omitted the letter "m" from these quantities; however, each one of them is calculated for the free and bound microstates, and their difference, Δ (free bound) is also provided.These results are obtained for 40 ≤ N water ≤ 160.Details about the integrations are described in the Appendix.
To check the stability of these results, and assess their statistical errors they were calculated for an increasing sample size (not shown), where the maximal n s is 80 (N water ≤ 140) and 40 for N water = 160.Statistical errors, s/(n s ) 1/2 , where s is the standard deviation, are provided only for N water = 140 and 160, which are larger than those obtained for N water < 140.The largest error of 2.5 kcal/mol is for E total (N water = 160), where the other errors are significantly smaller.The errors in the differences, Δ are again very stable, and smaller than 2.6 kcal/mol for ΔF sum (N water = 160); these errors are comparable to those obtained in [10].It is in particular encouraging that small errors are found for the water-template free energies,

F
, which are based on an integration of a significant number of atoms and are calculated for the first time in this paper.
Based on [10], one would expect to obtain the experimental difference in free energy, ΔF loop = F free -F bound ~ −4 kcal/mol for water densities close to that of bulk water, which for the present system occur for N water ranging from 140 to 180.Still, it is of interest to check this expectation by calculating results also for smaller N water .We have seen in 3.2 that for N water = 140 ΔS loop ~0, where small ΔS loop was found also in [10].We therefore assume that ΔS loop ~0 for other sets of results (N water ≠ 140) meaning that ΔF sum actually constitutes the difference in free energy between the two microstates.However, the results for ΔF sum in the table are positive for all N water , even for N water = 140 and 160, i.e., they differ significantly from the experimental value; in particular, for N water = 140 ΔF sum = 18 ± 2 kcal/mol.
Expecting ΔS loop to be close to zero for all values of N water , also means that the total entropy, TΔS total = ΔE total -ΔF sum is mainly due to water.The table shows that in all cases ΔE total is positive, i.e., the total energy of the free loop is larger than that of the bound one.Correspondingly, in all cases (besides for N water = 100), ΔE total is larger than ΔF sum , i.e., the entropy of the free loop (as expected) is also the largest, where TΔS total ranging from 5 (N water = 140) to 42 kcal/mol for N water = 160.Thus the entropy contributes significantly to the free energy where it constitutes between 20 and 64% of the total energy.
Before returning to the experimental-computational disagreement related to ΔF loop , it is of interest to discuss the results for N water = 160.These results (in particular those for the differences Δ) show an unexpected "jump" as compared to their counterparts for N water < 160.Thus, while the results for ΔF sum for 60 ≤ N water ≤ 140 are comparable (ranging between 15 and 26 kcal/mol), ΔF sum (N water = 160) = 95 kcal/mol.Correspondingly, ΔE total increases from 23 (N water = 140) to 137 kcal/mol (N water = 160).The main contributors for these results are the relatively large values of the sum of the loop energies, ΔE loop = 98 and ) tmpl LJ, ( TI water F  = 55 kcal/mol; these values are significantly larger than their counterparts for N water < 160.To check this point further we integrated the water-template interactions in our sample for the free microstate generated in [10] for N water = 160 to find the still relatively large results, ΔF sum = 43 kcal/mol and ΔE total = 134 kcal/mol.In these calculations (which do not appear in the table) ΔE loop = 117 contributes significantly to the large ΔF sum ; on the other hand, ) tmpl LJ, ( TI water F  = −64 kcal/mol is negative!It has been difficult to identify the configurationl elements that lead to these energetic changes even after examining the individual configurations numerically and by computer graphics.However, since the template is fixed, these results suggest that as the number of waters increase to N water = 160 their configuration undergoes a significant change, probably due to the increase of pressure in the sphere.Thus, water molecules that stay in the bulk for N water < 160 enter the template already in the equilibration stage (i.e., when the water-template interactions are still intact) affecting thereby the results for These configurational changes of water depend on the template which is different in [10] than in the present study.In this context it should be noted that there is a large cavity in the active site of the bound template as the catalytic residue was trimmed significantly to become a smaller Ser residue (see 2.1).It should be noted, that this effect of water is expected and indeed is found to be less pronounced in the loop-water integrations, TI water F (ch,loop) and TI water F (LJ,loop) (Equation ( 16)).The problems involving the results for N water = 160 are not expected to disappear to a large extent when a single template is used, because changes in the arrangement of water will affect approximately equally the results for the free and bound microstates and the corresponding deviations will mostly be cancelled in differences.Unlike the non-consistent results for N water = 160 discussed above, the results for N water < 160 show a consistent behavior, and based on [10] one would expect the simulations with N water = 140 to lead to the experimental, ΔF loop = −4 rather than ΔF loop = +18 kcal/mol.This disagreement might stem from the uncertainty in the crystal structure of the bound protein, 2dfp, which is characterized by large B-factors (with an average of 41.6 Å 2 ) whereas for many atoms the values are much larger; for comparison, the average B-factors in 2ace, the crystal structure of free AChE, is 23.4 Å 2 .The large cavity formed in the active site of the bound template mentioned above might also affect the results for the bound protein.These are inherent problems which might not be easy to overcome completely; however, one might alleviated them by applying different initial optimizations to the two templates.Thus, in this work we have used the same harmonic force constants of 1 kcal/(molÅ 2 ) to the free and bound x-ray structures, while it is plausible to anticipate that the force constant should be commensurate with the quality of the crystal structure, i.e., stronger for well defined structures.The results for N water = 160 provide some support for this assumption; thus, F sum = 43 is better (i.e., smaller) than F sum = 95 kcal/mol where these values are based on templates optimized by the force constants, 100 ( [10]) and 1 kcal/(molÅ 2 ) (here), respectively.However, the optimal force constants are unknown a-priori and their determination probably would require quite a few trials.
In summary: The fact that the experimental value, ΔF loop = −4 is known makes AChE a suitable system for checking the calculation of ΔF loop based on two different templates.As previously discussed, such calculations are of interest, as they mimic better the experimental conditions, and for an enzyme with a mobile loop, ΔF loop might be an important ingredient of the total absolute free energy of binding.Our calculations have revealed some instability in the results for N water = 160 but have led to consistent results for N water ≤ 140; obviously, it is somewhat disappointing that ΔF loop = −4 kcal/mol has not been recovered for N water = 140.However, this is not a failure of HSMD-TI per se but reflects the uncertainty in the crystal structure of the bound protein.One should also bear in mind that no other studies of a loop attached to two templates have been carried out thus far and besides our HSMD-TI papers [8][9][10]62] only few calculations of ΔF loop (based on a single template) are available [48,49].Thus, the present work should be considered as the first step and a basis for future more extensive studies, which can be pursued in several avenues: for example, HSMD-TI can be applied to a mobile loop with known ΔF loop where the quality of the crystal structures of both, the bound and the free protein is high.As is already stated above, one can try optimizing the templates by changing the force constants in a systematic way for N water = 140, and more work can be done to elucidate the problematic behavior of the results for N water = 160, which again reflects problems in the simulation itself rather than in the application of HSMD-TI.

Summary and Conclusions
This is an additional paper in a series of papers where our HSMD-TI method has been extended systematically for mobile loops in proteins.The method provides the absolute entropy and free energy and thus enables one calculating these quantities for microstates m and n with a large conformational variance without the need to carry out a complex thermodynamic integration from m to n. Mobile loops in enzymes can be crucial for the extent of binding, where their open conformation is moved to form a "lid" over the bound active site.
Thus far we have studied the 7-residue mobile loop, 304-310 (Gly-His-Gly-Ala-Gly-Gly-Ser) of the enzyme porcine pancreatic α-amylase, where the system was modeled by the AMBER96 force field alone and by AMBER96 with the GB/SA implicit solvent [8].Later the implicit solvent was replaced by explicit solvent, i.e., the α-amylase loop was capped with 70 TIP3P [9] water molecules.To determine the minimum size of the template and the minimum number of water molecules required to cap a loop, we have recently carried out a systematic study of the loop, 287-290 (Ile, Phe, Arg, and Phe) of AChE [10].The conclusion is that to recover the experimental free energy difference the water density should be close to that of bulk water, and the minimum template size is of 1,000 atoms for this loop.In all these studies and in a recent study of a mobile loop of the protein streptavidin [62] the free and bound loop structures were attached to a common template that was "cut" from the crystal structure of the free loop.Obviously, one would seek to treat each loop attached to its own template.This has been the objective of the present paper.
As a first step of this study the heavy atoms of the two crystal structures were harmonically restrained (with a force constant of 1 kcal/Å 2 ) and the energy was minimized.Afterwards the loop and (TIP3P) water atoms were allowed to relax in the presence of a fixed template.The relatively weak force constant (1 kcal/Å 2 ) (as compared to 100 kcal/Å 2 used in [10] for the pdb structure 2ace) was chosen to allow more extensive changes in the bound structure due to its relatively large B-factors and the chemical change imposed on it by removing the phosphate group; correspondingly, larger structural changes were also allowed to occur in the free structure.Then two cuts from each crystal structures were performed, which contain exactly the same atoms; these templates and the attached loops were subjected to further optimization steps as described earlier.
The new computational component in this study is the annihilation (for each template) of the water-template interactions using a TI procedure.Unlike the loop-water TI, one would expect the accuracy of the water-template TI to depend strongly on the system size.Thus, for N water ≤ 140 the errors are relatively small and the results for the free-bound differences in the total free energy and energy, ΔF sum and ΔE total are comparable, while for N water = 160 the errors become larger (in particular for the Lennard-Jones part of the water-template TI) and a significant increase in ΔF sum and ΔE total occurs.We attribute this jump in the results to changes in the configuration of water stemming from the increase in water pressure in the sphere as the number of waters grows.
Finally, calculation of ΔF loop based on separate templates which are cut from the free and bound crystal structures is more realistic than using a single common template for both loop conformations.ΔF loop should also be considered in the calculation of the absolute free energy of binding when the conformational change of the loop is due to ligand binding is of an induced fit type.Thus, while the present calculations show that recovering the experimental result, ΔF loop = F free − F bound ~ −4 kcal/mol is not straightforward, to achieve improvement more research is needed, which will be based on the conclusions of the initial work carried out here, as specified in the summary of the previous section.

Thermodynamic Integration of Water
As described earlier, two TI processes are carried out where the first is applied to the loop and water.Thus, the interaction energy (electrostatic and Lennard Jones (LJ)) between a fixed loop structure and the (moving) water molecules is decreased gradually to zero (rather than increased from zero) at constant T and V; the water-water and water-template potential energy is unchanged.After completing a loop-water TI (i.e., the loop-water interactions are zero), a second TI process is performed where the water-template interactions (LJ and electrostatic) are decreased to zero; this TI process starts from the last water configuration obtained in the loop-water TI.For the (LJ) potential we have used the shifted scaling potential, introduced by Zacharias et al. [63]: where the shift parameter,  = 3 Å 2 , prevents the divergence of the potential (and its derivative) at small pair separations; a similar scaling function is used for the Coulomb interactions.The free energy derivatives with respect to λ, ∂F/∂λ is: where the derivative of the energy is calculated analytically.The integration with respect to λ is carried out by dividing the range [1,0] into 20 equal integration bins Δλ i .The (λ = 1→λ = 0) integration of the electrostatic interactions (i.e., charge elimination) is carried out first (in the presence of intact LJ interactions) followed by a ε = λ = 1→0 integration of the LJ interactions.Thus, the entire two-stage process is based on 40 ∂F/∂λ i integration steps.
The MD simulation consists of a 2 fs integration step.Each (Δλ i ) step (window) starts with energy minimization (based on λ i ) of the last structure obtained in the simulation of the i-1 step, followed by 5 ps MD simulation for equilibration, which is discarded.The next step the production MD simulation, is of 20 ps in the loop-water TI, where a configuration is retained every 0.02 ps, i.e., altogether 1,000 configurations are used for evaluating <∂F/∂λ i >.For the template-water TI, the production simulation is longer, of 40 ps (2,000 configurations), due to the larger size of the template.It should be pointed out again that these two integrations are carried out for two different templates.The computer time is discussed in the Appendix of [10].

2 
microstate m.Notice that in the discussion below we define the loop conformation by the set of angles [α k ] rather than by the positions of the loop atoms, x loop , which are determined by [α k ].Each of the configurations (frames) for brevity) of the sample is reconstructed in two stages, where the biotin structure is reconstructed first followed by the reconstruction of the water configuration x N .Because the position of atom k' is defined by a dihedral and a bond angle one has to calculate their TP simultaneously.Thus, at step k' (k = 2k') of stage 1, the k−2 angles 1    k have already been reconstructed and the TP density of k

Figure 1 .(
Figure 1.An illustration of the HSMD reconstruction process of conformation i of a peptide consisting of three glycine residues, where for simplicity the oxygens and most of the hydrogens are discarded.At step k' = 6 the TPs related to the "past" atoms k' = 1…5 (depicted by full spheres) have already been determined and these atoms are kept fixed in their positions at conformation i.At this step (6) one calculates the TPs of bond angle α k (defined by C'-C α -N) and dihedral angle α k-1 (defined by C'-C α -N-C') which are related to C', where k = 2k' and thus k = 12.The TPs are obtained from an MD simulation where the as yet unreconstructed atoms k' = 6…10 (the "future" atoms) are moved (depicted by empty spheres connected by dashed lines) while the past atoms k' = 1…5 are kept fixed; notice that the future part should remain within the limits of the microstate and future-past interactions are taken into account.Small bins δα k-1 and δα k are centered at the values α k-1 and α k in i.The TP is calculated from the number of visits (n visit Equation11) of the future part to δα k-1 and δα k simultaneously during the simulation.After ) , , ( 1 10 12 11 microstates obtained for a specific set of parameters is denoted by

Table 2 .
HSMD results (in kcal/mol) for the loop entropy, results are presented for the loop energy,

Table 3 .
Results in kcal/mol for energy and free energy components and their difference (Δ) between the free and bound microstates*.
energy (without the contribution of the loop entropy, S loop ).Δ is the difference, free bound.n s is the sample size.Statistical errors, s/(n s ) 1/2 , where s is the standard deviation, are given only for results based of a number of water molecules, N water = 140 and 160; the errors for N water < 140 are smaller than those obtained for N water = 140 and 160.