A Comparison of QM/MM Simulations with and without the Drude Oscillator Model Based on Hydration Free Energies of Simple Solutes

Maintaining a proper balance between specific intermolecular interactions and non-specific solvent interactions is of critical importance in molecular simulations, especially when predicting binding affinities or reaction rates in the condensed phase. The most rigorous metric for characterizing solvent affinity are solvation free energies, which correspond to a transfer from the gas phase into solution. Due to the drastic change of the electrostatic environment during this process, it is also a stringent test of polarization response in the model. Here, we employ both the CHARMM fixed charge and polarizable force fields to predict hydration free energies of twelve simple solutes. The resulting classical ensembles are then reweighted to obtain QM/MM hydration free energies using a variety of QM methods, including MP2, Hartree–Fock, density functional methods (BLYP, B3LYP, M06-2X) and semi-empirical methods (OM2 and AM1 ). Our simulations test the compatibility of quantum-mechanical methods with molecular-mechanical water models and solute Lennard–Jones parameters. In all cases, the resulting QM/MM hydration free energies were inferior to purely classical results, with the QM/MM Drude force field predictions being only marginally better than the QM/MM fixed charge results. In addition, the QM/MM results for different quantum methods are highly divergent, with almost inverted trends for polarizable and fixed charge water models. While this does not necessarily imply deficiencies in the QM models themselves, it underscores the need to develop consistent and balanced QM/MM interactions. Both the QM and the MM component of a QM/MM simulation have to match, in order to avoid artifacts due to biased solute–solvent interactions. Finally, we discuss strategies to improve the convergence and efficiency of multi-scale free energy simulations by automatically adapting the molecular-mechanics force field to the target quantum method.


Introduction
Biological systems are mostly composed of water, and the interactions with water are a central feature of life as we know it [1][2][3][4][5]. Solvation influences a wide variety of processes, including protein folding [6][7][8][9][10], crystal polymorphism [11], conformational equilibria [12][13][14][15] and even basic reaction pathways [16]. Furthermore, water is one of the main actors in the selectivity of biochemical interactions and has a profound influence on both the kinetics and thermodynamics of protein-protein, protein-nucleic acid and protein-ligand binding [17]. Any binding event between a ligand and a receptor in aqueous solution is first preceded by the desolvation of water molecules from the binding site and the ligand's surface. A binding event only occurs if the ligand-receptor interactions can compensate for the loss of ligand-solvent and receptor-solvent interactions and the associated entropy changes [18][19][20]. Given the fundamental importance of the solvent, no biomolecular model is adequate without properly accounting for solvation.
The free energy costs of (de-)solvation are quantified by its solvation free energy, which corresponds to the transfer free energy of the molecule from the gas phase to solution [21][22][23][24][25]. In aqueous solution, the solvation free energy is also known as hydration free energy (∆G hyd ). In the molecular mechanics (MM) modeling community, ∆G hyd values have been an essential benchmark quantity for decades [14,. Furthermore, significant efforts have been invested in the quantum mechanical (QM) community to develop highly accurate implicit solvent methods [55][56][57][58][59][60][61][62][63]. However, when it comes to a hybrid QM/MM approach, where a quantum mechanical solute is embedded in a classical explicit solvent, solvation free energies have received less attention because of the computational cost and complexity of sampling the solvent degrees of freedom.
One of the most important shortcomings of conventional force fields is the neglect of electronic polarization. During a simulation, the charge distribution of an MM molecule cannot respond to its environment. Since polarizability is known to be important, especially in QM/MM simulations, there is major interest in the use of polarizable force fields such as the CHARMM Drude force field [132]. Here, we perform simulations with both the CHARMM fixed charge force field and the CHARMM Drude polarizable force field, to discern the benefits and challenges of this new generation of force fields and help lay the groundwork for future development of QM/MM methods with increased predictive capability. It is of particular practical interest to ascertain the degree to which optimization of the QM/MM van der Waals interaction parameters may be needed for different QM methods, and the additional computational efforts of the Drude force field are beneficial. Our recent work [133] analytically showed that significant additional computational costs can be justified in multi-scale free energy simulations, if the sampling method exhibits a higher phase space overlap with the target QM Hamiltonian. Thus, it can be expected that polarizable force fields and, ultimately, quantum-mechanical methods will play an increasing role in free energy calculations [134][135][136][137][138][139].
The remainder of this paper is organized as follows: First, we summarize the details of the model systems and simulations. Next, we present the results for the ∆G hyd values of twelve simple solutes, using both the fixed charge and the Drude force field. Finally, we compare the performance of MP2, Hartree-Fock, several density functional methods (BLYP, B3LYP, M06-2X) and semi-empirical methods (OM2 and AM1 ) in terms of ∆G hyd with QM/MM. This is done for both the fixed charge force field and the Drude force field. We also discuss other aspects that can have an impact on the accuracy of the results and the efficiency of the free energy simulations, including empirically scaling ∆G hyd values, using a self-consistent optimization of the Drude particles at each step, or increasing the overlap between the MM force field and the QM target energy function by introducing a tailored MM' force field. The Appendix includes a comparison of the convergence properties of free energy estimates based on the fixed charge and the Drude force field and also provides the detailed results of all MM free energy sub-steps.

Simulations
A test set of 12 molecules was used: water, methanol, ethanol, methanethiol, acetamide, tetrahydrofuran, benzene, phenol, aniline, ethane, n-hexane and cyclohexane (see Figure 1). These molecules were chosen to cover a large range of hydration free energy values, between −8.05 kcal/mol (acetamide) [140] and +2.55 kcal/mol (n-hexane) [24], encompassing different levels of hydrophobicity. In addition, this set includes amino acid side-chain analogs, ring compounds and hydrophobic molecules, thus providing a minimalistic test set without additional challenges, such as protonation, tautomerism or extensive conformational flexibility. We have previously used this test set to study polarization energies [141], the convergence of free energy simulations [133] and the use of 1-butanol for the extraction of polar solutes [142]. All free energy calculations were conducted using the CHARMM software [143,144], with the CHARMM General Force Field (CGenFF) for organic molecules [145] (the fixed charge force field) and the CHARMM Drude force field (polarizable force field) [146][147][148][149][150][151]. To determine ∆G hyd , each molecule was alchemically annihilated both in the aqueous phase and in the gas phase. The gas phase simulations used Langevin dynamics with a friction coefficient of 1 ps −1 and a temperature of 300 K. The simulation time was 500 ns, and coordinates were saved every 20 ps.
We modeled the aqueous phase with 1687 water molecules in a cubic simulation cell with edge lengths between 36.85 and 36.89 Å, as determined from equilibration simulations of 0.5 ns in the isobaric isothermal ensemble (NPT). For the fixed charge simulations, we used CHARMM TIP3P water with Lennard-Jones parameters on the hydrogens [152] and a Nosé-Hoover thermostat [153,154] at 300 K. We performed the Drude simulations with the SWM4 water model [155] and an operator-splitting velocity-Verlet algorithm [156], using a response time τ of 0.1 and a temperature of 300 K for the atomic particles and a τ of 0.005 and a temperature of 1 K for the relative motion of Drude particles. The friction constant was set to 10 ps −1 . In all solvent simulations, long-range electrostatic interactions were computed with the particle Mesh Ewald method [157], and Lennard-Jones interactions were switched off between 10 and 12 Å. All molecules were first equilibrated for 0.5 ns using constant pressure, followed by an equilibration of each alchemical transformation state (λ state) for 0.5 ns in the constant volume ensemble (NVT). Production simulations in aqueous solution were conducted for 5 ns. All simulations used a time step of 1 fs, saving frames every 1000 steps. SHAKE [158] was used to keep all bonds in the solvent rigid.
Both the simulations in the aqueous phase and in the gas phase employed λ-Hamiltonian replica exchange [159] to enhance sampling by exchanging structures between adjacent λ states every 20,000 steps, using the REPD module of CHARMM. The free energy calculation was broken into two parts. First, the charges were scaled to zero (∆G char ), using three steps for the fixed charge force field (λ = 0.00, 0.20, 0.55 and 1.00) and five steps for the Drude force field (λ = 0.00, 0.10, 0.25, 0.5, 0.75 and 1.00). The choice of this protocol is discussed in more detail in Appendix A. Second, the van der Waals interactions were turned off (∆G vdw ) with λ = 0.00, 0.15, 0.30, 0.45, 0.60, 0.75, 0.87, 0.96 and 1.00 for the fixed charge force field and λ = 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1.0 for the Drude force field. Soft core potentials, as implemented with the PSSP command in the PERT module of CHARMM, were used with the default parameters to avoid the "van der Waals endpoint problem [143,160]." Based on the information in the Hamiltonian replica exchange log file, free energy differences were calculated with Bennett's acceptance ratio (BAR) method [161], as implemented in the FREN module of CHARMM. Each free energy simulation was repeated four times to calculate averages and standard deviations.
Electrostatic embedding was used (i.e., the QM or SQM solute is polarized by the MM point charges of the solvent, but the solute-solvent van der Waals interactions are still calculated on the MM level). Since Q-Chem and MNDO do not support periodic boundary conditions (PBC) in a fully consistent manner, potential energy differences for the free energy corrections were calculated by centering the solvent box around the solute molecule with the MERGE command of CHARMM, followed by a potential energy evaluation of the simulation box without any cutoffs.
Only the solute-solute and electrostatic solute-solvent interactions were considered for the potential energy differences, by using the BLOCK module of CHARMM. This is justified by the fact that the solvent-solvent interactions and the solute-solvent van der Waals interactions cancel when calculating the potential energy difference between MM and QM/MM. This way, also long-range electrostatic interactions are still considered, but only at the MM level. This treatment is analogous to that done in [79,130,177]. For the Hartree-Fock QM/MM corrections based on the Drude simulations, we also considered potential energy evaluations where all Drude particles within 6 Å of the solute were first relaxed with 100 steps of conjugate gradient energy minimization using the MM force field, followed by five steps of energy minimization with QM/MM. This allows some response to the polarization from the QM region. All other particles were held in place during the minimization with the CONS FIX command.
The MM→(S)QM/MM free energy corrections were computed with the Zwanzig equation (also known as free energy perturbation or the exponential formula), as implemented in the FREN module of CHARMM [178]. This choice is justified, since the limiting factor for convergence is the phase space overlap between MM and QM [177,179]. A more thorough discussion of this aspect can be found in [133], where Figure 10 is based on exactly the same data as some of the gas phase results here.

Generation of the MM' Tailored Force Field
To illustrate the effect of adjusting the bonded parameters of the force field on the convergence of multi-scale free energy simulations, tailored force field parameters were generated for each molecule based on OM2. For this purpose, the initial conformation in the gas phase of each molecule was energy minimized. The resulting OM2-optimized structure was then used to populate the bond and angle parameters in CHARMM. To allow unique equilibrium bond length and angle values for all atoms, every atom was assigned to its own unique type. Charge and Lennard-Jones parameters for each unique atom type were obtained from the original parameterization. The overall procedure of generating tailored bonded parameters was implemented as the new QMFIX command in the FREN module of CHARMM. Tailored parameter and topology files were written with the MKDUMMY command in the FREN module.
Based on the original parameters and the tailored MM' force field, simulations of only the physical end points in both the gas phase and in solution were performed. The simulation length was adjusted to 100 ns in the gas phase and 10 ns in the aqueous phase, with 10,000 frames saved for later analysis and all other settings unchanged. Three different ways to calculate the QM/MM free energy corrections were employed: (a) using the Zwanzig equation based on a simulation with the original force field; (b) using the Zwanzig equation based on a simulation with the MM' tailored force field; (c) combining the data of the original force field and the MM' tailored force field simulation with the Non-Boltzmann-Bennett (NBB) method [79,80,180].

Results and Discussion
Before discussing the impact of using QM/MM on the affinity for water, it is illustrative to observe the faithfulness of the solute-water interactions in pure MM. Hydration free energies have been classical benchmark systems for decades. In the CHARMM force field, the compatibility with a particular water model such as TIP3P is a centerpiece of the parameterization strategy, in particular for the charges. Thus, it is expected that the interactions with water are comparable to experiment. Table 1 lists the hydration free energies for both the CHARMM fixed charge force field (∆G FC hyd ) and the Drude force field (∆G Drude hyd ). More detailed results, listing the free energy results of the gas phase, electrostatic and van der Waals changes can be found in Appendix B. Since each simulation was repeated four times, also the corresponding standard deviations of the results are provided. The overall metrics for agreement with experiment are listed in the last three rows. While the fixed charged force field exhibits a root mean squared deviation (RMSD) of 0.89 kcal/mol, the Drude force field reaches an RMSD of 0.55 kcal/mol. Thus, the Drude force field outperforms the fixed charge force field. Both force fields yield what is considered "chemical accuracy", but this is most likely a reflection of the simplicity of the test set and the high level of optimization of the parameters. In terms of mean signed deviation, the Drude force field also yields a more favorable result (0.04 kcal/mol compared to 0.65 kcal/mol). This indicates some small systematic bias of the fixed charge force field in terms of being overly hydrophobic. The correlation coefficients with the experiment are in both cases excellent (R 2 of 0.97 and 0.99).
The last column of Table 1 lists the differences between the fixed charge and the Drude force field results. While the deviations for most apolar molecules are not statistically significant, the results for water, acetamide and phenol differ by more than one kcal/mol. Furthermore, several other polar molecules exhibit a change of their ∆G hyd , but all changes improve the agreement with experiment. The only notable exception is cyclohexane, where the deviation from the experimental ∆G hyd increases from ca. half to one kcal/mol. On the other hand, the small differences for methanol and ethanol are a bit surprising.
The increased accuracy of the Drude force field comes at a price though. First, the average CPU times for the aqueous phase simulations increase by at least a factor of two due to the additional Drude and lone pair particles. Second, additional λ points were required to achieve approximately the same level of precision as the fixed charge force field. This aspect is more thoroughly discussed in Appendix A based on the ∆G char calculations. The largest differences between the fixed charge and the Drude force field are found for acetamide (2.5 kcal/mol), phenol (2.35 kcal/mol), aniline (1.08 kcal/mol), benzene (1.07 kcal/mol) and water (−1.14 kcal/mol). The ∆G hyd values obtained from different QM/MM methods based on trajectories with the CHARMM fixed charge force field are presented in Table 2. The columns are ordered based on the RMSD of the corresponding method from experimental hydration free energies, starting with the lowest RMSD on the left. The last six rows again represent the root mean squared deviation from experiment, the mean signed deviation and the Pearson correlation coefficient. RMSD, MSD and R 2 are given twice: once for the complete dataset (unmarked) and once for all molecules except ethanol and acetamide (marked with asterisks). The two molecules were omitted because of the high standard deviations of more than one kcal/mol in some calculations (ethanol in the case of the fixed charge force field and acetamide because of problems encountered with the Drude force field). This allows a direct comparison of the converged parts of the two datasets.  Overall, the QM/MM results with electrostatic embedding and CHARMM TIP3P water in the MM region are slightly disappointing. The RMSD vary between 1.3 and 2.4 kcal/mol, which is worse than the pure MM result of 0.9 kcal/mol. This finding can partly be explained by the high level of optimization of the MM force field. Furthermore, the QM methods were not adapted to cancel some of the shortcomings of the TIP3P water model.
Before discussing the results in more detail, we want to validate our protocol for obtaining QM/MM hydration free energies based on the existing literature. The ∆G hyd values for water are in good agreement with relative free energy results between MM and QM based on QM/MM sampling with Monte Carlo simulations by Shaw, Woods and Mulholland [181]. Table 1 of [181] lists a free energy difference between QM water and CHARMM TIP3P water of 1.5 ± 0.5 kcal/mol for MP2, while we obtain a difference of 1.9 ± 0.2 kcal/mol. The discrepancies for BLYP (0.5 ± 0.3 versus our 1.2 ± 0.2 kcal/mol) and HF (2.7 ± 0.5 versus 3.1 ± 0.2 kcal/mol) are higher, but this can be explained by the use of different basis sets (Shaw et al. used aug-cc-pVDZ, while we employed 6-31G(d)). Furthermore, the BLYP and M06-2X ∆G hyd values exhibit an average deviation of 0.5 and 0.3 kcal/mol from the results published in Table 4 of [141]. The small discrepancies can be explained by the use of rigid gas-phase geometries for the solutes in [141] and by the high uncertainty of the ethanol result here. For B3LYP, the ∆G hyd values for ethane (1.9 kcal/mol) and methanol (−5.1 kcal/mol) are in excellent agreement with previously published hydration free energy differences (−7.0 kcal/mol here compared to −6.96 kcal/mol in Table 1 of [80] and −7.15 kcal/mol in Figure 7 of [177]). The relatively good agreement with previously published results, in conjunction with the simplicity of the solutes and the high number of QM/MM potential energy evaluations, supports our findings.
In terms of the compatibility of different QM methods with CHARMM TIP3P water based on the RMSD from experiment, the OM2 method seems to be the best (RMSD = 1.3 kcal/mol), followed by BLYP (1.4 kcal/mol), B3LYP (1.5 kcal/mol), M06-2X (1.9 kcal/mol), MP2 (2.1 kcal/mol), AM1 (2.4 kcal/mol) and HF (2.4 kcal/mol). This finding agrees with the ranking by Shaw et al. based on the free energy difference between QM and MM water (BLYP < MP2 < HF) [181]. To some degree, it is surprising that the semi-empirical method OM2 and the pure functional BLYP clearly outperform more advanced QM methods. As discussed in Section IV E and Table S14 of [141], the QM/MM electrostatics become more attractive as the amount of Hartree-Fock exchange increases from BLYP to B3LYP to M06-2X to HF/MP2. With fixed QM/MM van der Waals interactions, the hydration free energies become more negative. The MSD are −0.8 kcal/mol for BLYP, −1.0 kcal/mol for B3LYP, −1.5 kcal/mol for M06-2X and −1.9 kcal/mol for Hartree-Fock. Thus, the QM/MM results are significantly too hydrophilic. Although the CHARMM charges are based on Hartree-Fock calculations [145], the results imply that Hartree-Fock itself is not particularly suited for QM/MM simulations, due to the large systematic bias in favor of solute-solvent interactions. However, since the QM/MM ∆G hyd values are highly correlated with the experimental data, it is possible to address this shortcoming by scaling the interactions. This is illustrated in Appendix C.
The ∆G hyd values obtained from different QM/MM methods based on trajectories with the CHARMM Drude force field are presented in Table 3. The columns are again ordered based on the RMSD of the corresponding method from experimental ∆G hyd values, starting with the lowest RMSD on the left. Except for the two semi-empirical methods, the rank order of the QM methods based on RMSD is actually inverted, with Hartree-Fock (RMSD = 2.4 kcal/mol) followed by MP2 (2.4 kcal/mol), M06-2X (2.6 kcal/mol), B3LYP (2.7 kcal/mol) and BLYP (3.1 kcal/mol). However, the RMSD is not a reliable measure here, since the acetamide results are far from converged, with standard deviations between 1.4 and 2.9 kcal/mol. As more thoroughly discussed in a recent paper, high standard deviations in multi-scale free energy simulations can be an indicator that the MM energy minimum is very far away from the QM energy minimum [133]. Indeed, when comparing the optimal C-C bond length of acetamide of MM (1.153 Å for the types CD201A and CD33C) with, e.g., the bond length of an energy optimized structure with OM2 (1.513 Å), there is a clear discrepancy of 0.36 Å, which leads to substantial energy differences. Given that the equilibrium bond lengths of C-C bonds are typically between 1.35 and 1.55 Å in the CHARMM force field, this is a clear indication for a typo in the Drude parameter file for acetamide. A further investigation is in progress. Ignoring the flawed acetamide results and focusing on the metrics marked with an asterisk, the overall results of most methods (except BLYP and AM1) are surprisingly similar, with RMSD * between 1.3 and 1.5 kcal/mol and MSD * between mere −0.4 and 0.5 kcal/mol. While the RMSD * are a little bit higher than the best results for the CHARMM TIP3P water model (RMSD * between 0.9 and 2.4 kcal/mol), the consistency between most methods and the low systematic errors can be regarded as a sign of better compatibility with QM/MM methods. Given that the development of polarizable Drude force fields is still in its early stages, one can still expect some improvements in the future. The AM1 semi-empirical method is among the most inaccurate methods in the test set, with RMSD of 2.4 kcal/mol for the fixed charge model and 3.1 kcal/mol for the Drude model. In the light of such results, it is somewhat surprising that the popular AM1-BCC method to determine MM charges [182,183], which builds upon AM1, is as effective as it is when it comes to hydration free energies [44].
Another aspect that can influence the accuracy of the Drude oscillator model is the use of the extended Lagrangian formalism [184], in which Drude particles have a mass and kinetic energy. This implies that the particles do not necessarily reside at the energy minimum at each step. Also in our QM/MM energy evaluations, the Drude particles in the MM region were not relaxed in response to the QM wave function. To evaluate the effect of relaxing the Drude particles, five steps of conjugate gradient energy minimization were performed with QM/MM after an MM SCF optimization of the Drude particles. The resulting hydration free energies for Hartree-Fock with the extended Lagrangian approach (HF-EL) and based on the self-consistent optimization of the Drude particles (HF-SCOD) are shown in Table 4. While the overall agreement with experiment in terms of the RMSD does not change significantly with the use of self-consistent Drude particles (RMSD of 2.4 and 2.5 kcal/mol), the solvent affinity increases in all cases (as it should). For the Hartree-Fock calculations, this leads to a lower systematic error in terms of MSD of a mere 0.04 kcal/mol (instead of 0.34 kcal/mol). The average change of 0.3 kcal/mol is lower than the average standard deviation of ca. 0.7 kcal/mol, so most differences here are not statistically significant. Because the convergence of some of the QM/MM ∆G hyd results was not satisfactory, we also explored the possibility to improve this situation by employing a tailored force field (denoted as MM'). By adopting bonded terms that match more closely the bond lengths and angles encountered in the target QM method, the phase space overlap is supposed to be increased, which also improves the convergence of the free energy calculation between MM and QM [133]. The approach is outlined in Figure 2. In particular, we explored three different ways to perform the "bookend" corrections: (a) using the Zwanzig equation [178] to directly calculate the free energy difference between the MM force field and the QM Hamiltonian; (b) generating an MM' tailored force field with optimized parameters to increase the phase space overlap with the QM Hamiltonian; the free energy difference between the original force field and the tailored force field can be calculated with Bennett's acceptance ratio method (BAR) [161], while the free energy difference between the modified MM' force field and the QM state is calculated with the Zwanzig equation; (c) combining all the potential energy data from MM, MM' and QM with the Non-Boltzmann-Bennett equation [79,80,180]. A comparison of the results of the three theoretically equivalent approaches is given in Table 5. The third column (MM→QM) reflects the ∆G hyd values from direct free energy calculations between MM and QM energy surfaces using the Zwanzig equation. In principle, the results here should correspond to those in the third column of Table 2 (OM2). However, since different trajectories and setups were employed, one can expect some small discrepancies. The overall RMSD (1.5 compared to 1.3 kcal/mol) and MSD (0.3 versus −0.3 kcal/mol) are similar compared to Table 2, which serves as another verification of the approach. The third column of Table 5 shows the results obtained using the tailored MM' force field to calculate the free energy difference to the QM state. The ∆G hyd values of Columns 2 and 3 should also match within the corresponding uncertainties, since the end points are the same. Indeed, except for aniline, the differences between the two columns are below 0.2-0.3 kcal/mol, which also corresponds to the average standard deviation of the results (shown in the last line). Importantly, the average standard deviation is a little bit lower for the MM→MM'→QM transformation, due to the increased phase space overlap between the MM' and the QM state. The last column shows the result of an NBB calculation that combines the potential energy data of the two transformations in an optimal way. The fact that the NBB results are almost identical to the MM→MM'→QM transformation further indicates that there is more phase space overlap between the MM' and QM state, thus dominating the NBB calculation. However, the overall improvement is rather small, which signifies that the original bonded parameters were already well optimized.

Conclusions
In this work, we computed hydration free energies for twelve simple solutes to determine an effective choice of QM method to use in combination with explicit solvent. Here, we focused on the fixed charge CHARMM TIP3P and the polarizable SWM4 water model in the CHARMM force field. As a reference, we first provided hydration free energies based on pure MM simulations. Both the fixed charge (RMSD = 0.89 kcal/mol) and the Drude force field simulations (RMSD = 0.55 kcal/mol) exhibit excellent agreement with the experimental data and are well converged with respect to conformational sampling.
For QM/MM hydration free energy calculations based on the CHARMM CGenFF fixed charge force field, the best results were obtained with the OM2 semi-empirical method (RMSD = 1.3 kcal/mol) and the BLYP method (RMSD = 1.4 kcal/mol). The other methods (B3LYP, M06-2X, MP2, AM1 and Hartree-Fock) yielded RMSD between 1.5 and 2.4 kcal/mol. This ranking of QM methods agrees with the previous observation that the systematic error of hydration free energies of QM/MM methods with CHARMM TIP3P water increases systematically with the amount of Hartree-Fock exchange [141]. Therefore, we recommend using either OM2 or BLYP for QM/MM simulations in aqueous solution with CHARMM TIP3P water. This QM/MM protocol was also successfully applied to the calculation of distribution coefficients in SAMPL5 [130], which reflects the change from a hydrophilic to a hydrophobic environment.
As for the QM/MM hydration free energy calculations based on the CHARMM Drude force field, the best results were obtained with the OM2 semi-empirical method (RMSD = 1.6 kcal/mol). However, the ranking of the other methods is nearly reversed, with Hartree-Fock (RMSD = 2.4 kcal/mol) outperforming MP2, M06-2X, B3LYP, BLYP and AM1. The MP2, M06-2X and Hartree-Fock methods perform slightly better with the Drude force field in terms of RMSD, and their systematic error is significantly lower. Thus, if a potential bias from the solute-solvent interactions is a concern, it might be advisable to employ the Drude force field for QM/MM simulations with those methods. However, the performance of QM/MM with the Drude force field is only marginally better. Furthermore, the Drude accuracy between the extended-Lagrangian (EL) and self-consistent optimization implementations is statistically indistinguishable, but can slightly affect the systematic bias.
Overall, the OM2 semi-empirical method shows the best performance for both datasets with RMSD of 1.3 and 1.6 kcal/mol, while the AM1 semi-empirical method exhibits the worst performance with RMSD of 2.4 and 3.1 kcal/mol. The PM3 semi-empirical method was omitted in the manuscript because of its RMSD of 3.5 and 4.5 kcal/mol, further demonstrating the high variability in the quality of semi-empirical methods. However, both the accuracy and robustness of the OM2 hydration free energy results are very encouraging, especially since the OM2 parametrization did not include solvation free energies. This makes the method suitable for improving the quality of MM free energy predictions via post-processing, as OM2 can be applied to thousands of snapshots within mere minutes on a commodity laptop.
Our results also corroborate the conclusions of a recent study by Ganguly, Boulanger and Thiel [185]. The effect of MM polarization via Drude particles on QM/MM hydration free energies is only moderate compared to the well-developed CHARMM fixed charge force field. Fixed charge force fields are well tested, faster and more robust than the recently developed polarizable force fields. Therefore, they will most likely continue to play a significant role in computational chemistry. While polarization is a highly relevant physical effect, Drude force fields still neglect other important factors such as charge penetration, coupling of polarization with many-body exchange, dispersion and charge transfer [186][187][188]. In addition, the impact of Drude point charges in proximity to the QM region is still unclear at this point.
The force field parameters (e.g., the van der Waals parameters) will likely have to be adapted according to the target QM function. Thus, some form of tailored MM' force field will be beneficial for future applications of QM/MM in multi-scale free energy simulations. The need for improvement is highlighted by the systematic errors of QM/MM in the kcal/mol range, as well as the clear superiority of the MM ∆G hyd results compared to QM/MM. Our results show that spending computer power to add all the right physics to the QM region in a QM/MM simulation will be in vain if the MM description of the solvent environment is not compatible with the QM description of the solute.

Acknowledgments:
The authors are much indebted to Yihan Shao for his help and advice. The authors especially thank Tim Miller for his support during the generation of the underlying software, as well as Richard Venable and John Legato for technical assistance. The authors also thank Pavlo Dral and Abir Ganguly for very helpful discussions.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:  Figure A1 shows a comparison of the convergence of the ∆G char free energy estimates of the fixed charge (left side) and Drude force field (right). The first row shows the average deviation of all twelve molecules with respect to a reference BAR protocol with four λ steps (λ = 0.00, 0.25, 0.50, 0.75, 1.00). Each λ point uses 1900 snapshots from 1.9 ns of simulation time, and each simulation was repeated four times to obtain standard deviations. The second row shows the average standard deviation of the free energy estimates. In addition, each plot also shows a comparison of BAR free energy estimates (blue) with the corresponding estimates based on Thermodynamic Integration (TI) [189] using Clenshaw-Curtis numerical quadrature (red) [190]. Clenshaw-Curtis quadrature is an attractive option for Thermodynamic Integration because it uses the physical end points of the simulation and is nestable. In addition, it is almost as efficient as the well-known Gauss-Legendre method [191][192][193]. In contrast to Gauss-Legendre quadrature, which does not include the physical end points, Clenshaw-Curtis quadrature allows a direct comparison to BAR, using exactly the same potential energy data.
Focusing on the average deviation of the free energy estimates (top of Figure A1), one can observe that more λ steps are necessary with the Drude force field in order to yield the same level of accuracy and precision as the fixed charge force field. The TI results based on one λ step only use the physical end points, which implicitly assumes that the ∂G ∂λ is linear. Thus, the errors of the one-step TI protocol are an indicator for higher-order coupling within the system and that the probability distribution of the potential energy difference cannot be described by a Gaussian [30,194]. Not surprisingly, this problem is more pronounced in the polarizable Drude force field with an average error of 4.37 kcal/mol of the one-step protocol (compared to an error of 1.11 kcal/mol for the fixed charge force field). This suggests that linear response methods such as the Linear Interaction Energy method [195] or low-order cumulant expansion [196] are only marginally compatible with the Drude force field.
The average standard deviation (bottom of Figure A1) is elevated for the one-step BAR protocol (0.14 kcal/mol for the fixed charge force field and 0.29 kcal/mol for the Drude force field), while all other protocols exhibit average standard deviations between 0.04 and 0.08 kcal/mol. This demonstrates that the precision is not a reliable measure for the accuracy of the free energy estimates. This is especially striking for TI, where the main source of error arises from the treatment of the higher derivatives of the integrand. This notwithstanding, both BAR and TI can reach converged values of ∆G char with just two or three λ steps for the molecules considered here. Our choice of using more λ points than necessary was motivated by the exchange rate of the Hamiltonian replica exchange scheme to increase sampling.

Fixed Charge Drude
Average deviation from reference results   [133,161,197,198] [133,161,197,198]). d Free energy difference of removing all Lennard-Jones interactions of the uncharged solute in aqueous solution. e Total hydration free energy (cf. Table 1).

Appendix C. Scaling the Hydration Free Energies
The high correlation coefficient between the QM/MM derived hydration free energies and the experimental data in Table 2 indicates that a better agreement with experiment can be achieved by scaling the results. Using the ratio of the experimental chemical potential of water in its own liquid relative to the computational result, i.e., ∆G water Expt hyd ∆G water hyd , as a scaling factor, one can obtain significantly better agreement with experiment. Table A3 illustrates this for the QM/MM results with the fixed charge force field and Hartree-Fock (which yielded the worst RMSD with 2.4 kcal/mol and an MSD of −1.9 kcal/mol). The employed scaling factor was 0.6317. Interestingly, the RMSD drops to 0.8 kcal/mol and the MSD to a mere −0.02 kcal/mol. This is an indicator that the scaling factor might be transferable. The results also show that the bias of QM/MM ∆G hyd values is not constant and, therefore, does not cancel if the end points of a free energy calculation involve different chemical species (such as in chemical reactions).