Benchmarking Free Energy Calculations in Liquid Aliphatic Ketone Solvents Using the 3D ‐ RISM ‐ KH Molecular Solvation Theory

: The three ‐ dimensional reference interaction site model of the molecular solvation theory with the Kovalenko–Hirata closure is used to calculate the free energy of solvation of organic so ‐ lutes in liquid aliphatic ketones. The ketone solvent sites were modeled using a modified unit ‐ ed ‐ atom force field. The successful application of these solvation models in calculating ketone– water partition coefficients of a large number of solutes supports the validation and benchmarking reported here.


Introduction
The calculation of solvation free energy (SFE) is one of the cornerstones in chemistry, biology, and drug development efforts. The process of experimental solvation energy measurements is a tedious non-trivial process. Screening of solvation energy for a large number of compounds is time-and resource-consuming. Theoretical methods of calculation of SFEs have been developed over the years and are validated against experimental solvation free energy databases [1][2][3]. Such theoretical frameworks of solvation energy calculations encompass both explicit and implicit considerations of the environment provided by solvents [4][5][6]. A combination of these two models gave rise to the cluster-continuum model [7,8]. As with all theoretical models, these methodologies have limitations and drawbacks related to system size, type of molecules, and computational resource requirements. One of the statistical-mechanics-based molecular solvation theories, the three-dimensional reference interaction site model, a.k.a. 3D-RISM, has gained attention in the last couple of decades due to the proven applicability and accuracy of this theory in applications to (bio-)molecular simulations, material science applications, chemical reactivity problems, and inhomogeneous medium modeling [9][10][11][12][13][14][15][16][17][18][19][20].
The interaction site model has its origin in the works of Chandler and co-workers [21][22][23][24][25]. The detailed theoretical derivations and approximations of the theory of liquid state were reported elsewhere [26][27][28][29]. Here we will briefly explain the key points pertaining to the theoretical construct used for calculations. The RISM theory of molecular liquids is based on first principle statistical mechanics. The distribution of solvent sites around a solute of arbitrary shape and size is generated via the total correlation function h(r), direct correlation function c(r), and the site-site bulk susceptibility function χαγ (r) for  solvent sites around a solute at position r, as: The solute-solvent interaction potentials, u(r), are related to the direct correlation function as: c(r)~−u(r)/(kBT) with T and kB are temperature and the Boltzmann constant, respectively. A simplified mathematical construct is needed to obtain the total and direct correlation functions to help integrate an infinite chain of inter-and intramolecular interactions and to impose a set of consistency conditions of the path-independent chemical potential μ. The bridge function(s) B(r) used to achieve such transformations rewrites equation 1 as: A closure relation also helps simplifying mathematical and computational requirements in calculating the bridging function. In this work, we have used the well-established Kovalenko-Hirata (KH) closure relations for all RISM calculations [30]. The functional form of the KH closure is given as: The excess chemical potential (and hence the solvation energy) is obtained using closed analytical expression as: where  is the Heaviside step function.
The other important physical properties, partial molar volume (PMV, Ṽ) and isothermal compressibility (T), are computed using the number density () as: The 3D-RISM-KH theory can produce solvation structures efficiently, although the internal pressure computed is wrong since the formulation of RISM theory. Further, the maxima on the partial distribution functions were shifted and broadened, as shown in other reports using this theory. One way to correct the Gaussian fluctuation solvation free energy (ΔGGF) to get correct correlation to experimental data is to use a "universal correction" scheme using the 3D-RISM computed PMVs as [31]: The coefficients a and b are obtained from linear regression analysis against known solvation energies. The success of this correction scheme depends on the availability of a sufficient number of solvation free energy values in benchmark dataset(s).
The 3D-RISM-KH theory was successfully applied to calculate SFEs in a broad range of solvents previously, with both low-and high-polarity solvents. In this report, we have selected four aliphatic liquid ketones to standardize application in free energy calculation with the 3D-RISM-KH theory. Ketones are an important class of compounds with appli-cations as solvents, polymer building blocks, pharmaceutical ingredients, etc. We have chosen four different aliphatic ketones, viz. acetone, butanone (methyl ethyl ketone), methyl isobutyl ketone (4-Methyl-2-pentanone), and cyclohexanone in this study due to the availability of a sufficient number of experimental solvation energy data to calibrate 3D-RISM-KH calculations and to further validate the results against experimental ketone-water partition coefficient calculations.

Materials and Methods
Electronic structure calculations: All of the solute molecules used for the solvation energy calculations were optimized at the M06-2X/Def2-TZVPP level in the gas phase as well as in cyclohexanone (CyO), butanone (MEK), and methyl isobutyl ketone (MIBK) continuums [32,33]. The solvent effects were incorporated using the conductor-like polarizable continuum model (CPCM) and the SMD solvation models [34][35][36]. All the quantum mechanics (QM) calculations were performed using the Gaussian16 software package with default settings [37]. All the optimized structures were verified as minima on the respective potential energy surface via vibrational mode analysis.
Molecular dynamics (MD) simulations: All the MD simulations were performed using the GROMACS simulation package [38]. The all-atom GAFF force fields with the AM1-BCC charges were used to model the liquid states of the ketone [39,40]. A solvent box containing 256 ketone molecules was created and equilibrated for a total of 2 ns at a temperature of 298 K and a pressure of 1 bar using a Berendsen thermostat without any geometrical constraints for stable temperature and density profiles. The final production runs were of 10 ns for each system. All of the distribution functions were calculated using the standard utilities incorporated in the GROMACS package.
RISM calculations: A united-atom model of the liquid ketones was used for all the RISM calculations using previously reported parameters [41,42]. Atomic charges obtained from quantum chemical calculations at the MP2/cc-pVTZ level were used to assign partial charges to solvent sites [43,44]. The solvent susceptibility functions of the liquid ketones were computed using the extended-RISM (XRISM) formalism at 298 K. A tolerance criterion of 1 was used for the convergence of solvent distribution function calculations. The cyclohexanone system failed to converge with the united atom parameters. Such behavior was reported earlier by Luchko et al. for a cyclohexane system [45]. Those authors have developed united-atom parameters for C[H2]-centers in cyclohexane. In this work we have adopted those parameters for cyclohexanone C[H2] centers. For water, the DRISM theory is used with the modified SPCe water model. Please refer to Table 1 for the force field parameters used for the solvent molecules reported in this study. The lowest energy conformation of the solutes for the molecular partition coefficient calculations were generated using the MMFF94 force field [46,47], and were used for all the 3D-RISM-KH calculations. All of the solutes were parameterized using the GAFF force field with the AM1-BCC charges. The 3D-RISM-KH calculations on the solutes were performed on a uniform cubic 3D-grid of 128 × 128 × 128 points in a box of size 64 × 64 × 64 Å 3 representing a solute with a few solvation layers with convergence accuracy set to 10 -5 of the modified direct inversion in the iterative subspace (MDIIS) solver. All the RISM calculations were conducted with our in-house code, a working version of which is available in the AMBERTOOLS utility. The partition coefficients between acetate esters and water were calculated as: Solute Databases: For solvation free energy calculations in butanone, methyl isobutyl ketone, and cyclohexanone, the experimental data were taken from the Minnesota Solvation Database [48]. The ketone-water partition coefficient data for acetone, butanone, methyl isobutyl ketone, and cyclohexanone to water partitioning were taken from the work of Abraham et al. [49]. The experimental hydration free energies were taken from the FreeSolv database [50].

Results
In this manuscript, we have benchmarked the performance of the 3D-RISM-KH molecular solvation theory in predicting the solvation free energy of organic solutes in liquid aliphatic ketones, butanone (MEK), methyl isobutyl ketone (MIBK), and cyclohexanone (CyO) using the united-atom amber force field parameters. A correction scheme is developed to predict solvation free energy in liquid ketones. This correction scheme is used to calculate ketone-water partition coefficients for a large set of organic molecules. Our findings are divided as follows: the first section provides a comparative picture of the liquid structures of four ketones obtained from the all-atom MD simulations and the XRISM-KH calculations; the following sections detail the SFE calculations using the QM calculations and the 3D-RISM-KH calculations, and a comparison between different computation schemes; the third and final section provides a proof of concept of a successful application of the 3D-RISM-KH theory in predicting ketone-water partition coefficients.

Molecular Simulations of Pure Liquid Ketones
The liquid state of four ketones, acetone, MEK, MIBK, and CyO, are simulated with all-atom MD. The MD equilibration runs provided satisfactory results for density profiles for all of the ketones in the temperature range of 298 K. The findings are summarized in Table 2. It is important to note that the dielectric constant and density of the liquid state are inputs for XRISM-KH calculations. The first check of the quality of solvent susceptibility function calculations using the XRISM-KH theory is performed by confirming that all the isothermal compressibilities of liquid ketones under study are positive at 298 K. A negative isothermal compressibility would otherwise be indicative of a non-physical nature of the solution(s) obtained at this level of theory. The liquid structures of four ketones as shown from the radial distribution functions present distinctive features between cyclic-and non-cyclic ketones (Figure 1). For example, the first maxima in the carbonyl oxygen distribution are around 5.4 Å for acetone, MEK, and MIBK, with a pre-peak shoulder in the 4-5 Å region. The same radial distribution for CyO have two equal-density broad peaks at 5.4 Å and 7.2 Å. The second peak for the three acyclic ketones are of much less intensity than the first maxima. The features in the carbonyl oxygen partial distribution functions obtained using the XRISM-KH theory showed similar features as those obtained with the all-atom MD simulations, although with some differences. First, the CyO distribution has a first maximum around 3.1 Å, followed by two other peaks around 5.9 Å and 7.1 Å. The peak around 5.1 Å has the smallest intensity. The carbonyl carbon center (C[=O]) of ketones showed unique features based on the identity of the molecule, as described below. The MD computed maxima for acetone are around 5.4 Å and 9.5 Å. In the XRISM-KH calculations, the maxima are around 3.4 and 6 Å. For the MEK system, the first maxima in the MD RDF is around 4.2 Å,s with a broad hump in the 7-9 Å region. The RISM profile of the carbonyl carbon of MEK had maxima at 3.9 Å and 5.6 Å, and the broad hump was located in the 7-9 Å. The MD-computed rdfs for the MIBK carbonyl carbon has maxima around 4.6 and 6.1 Å. The same distributions from the XRISM calculations are located about 3.6 and 6.1 Å. The profile of the carbonyl carbon of CyO showed stark differences from that of other ketones. The MD simulations picked the first maxima around 2.5 Å, with multiple minima between 4.4 and 8.6 Å and a broad peak in the region of 10-13 Å. The XRISM distribution showed multiple peaks in the CyO carboyl carbon distribution in the region of 4.7-7 Å, with a broad peak in the 10-13 Å region. Overall, the MD-and XRISM-KH-computed distributions of solvent sites for individual ketones are qualitatively similar. The differences in the computed distribution profiles between these two computational methods can be atrributed to the choice of the force field (all-atom in MD vs. united atom in XRISM-KH) and other theoretical artifacts associated with molecular simulation methods. The difference in the distribution profile between the three acyclic ketones (viz. acetone, MEK, and MIBK) and the cylohexanone are indicative of different liquid structures for these two (sub)classes of compounds. Hence, we have decided to scale the solvation behavior differently for the cyclic-and acyclic-ketones in this study.

Solvation Free Energy Calculations
The solvent susceptibility functions computed using the XRISM-KH theory for MEK, MIBK, and CyO are used to calculate the SFEs of small organic molecules in these three solvents using the 3D-RISM-KH theory. There is a total of thirty-six experimental SFE values available in the Minnesota Solvation Database, out of which 10 are SFEs in CyO, and 13 experimental SFEs, each for MEK and MIBK. As mentioned in the previous section, due to the differences in the liquid structures between cyclic and acyclic ketones, we have used the "universal correction" scheme for the combined SFE datasets for MEK and MIBK, while the SFEs of CyO were calibrated separately. The coefficients for the "universal correction" scheme of SFEs are provided in Table 3. It is important to note that the fitting coefficients are dependent on the accuracy of the benchmark dataset, as well as the number of data points used in the regression analysis. The solute dataset reported here was used for the development of the SMD solvation model, and hence, the best results in the SFE calculation were obtained using the QM calculations in the SMD model with an overall relative mean square error (RMSE) of 1.33 kcal/mol. The other atomic surface-charge-based QM solvation model, i.e., CPCM, yielded a RMSE of 2.71 kcal/mol. The 3D-RISM-KH-computed corrected SFE has an overall RMSE of 1.51 kcal/mol. The performance of the 3D-RISM-KH theory in calculating SFEs is on par with that of the state-of-the-art QM SFE calculation methods. The 3D-RISM-KH method requires less computational resource requirements than QM calculations. The experimental and computed SFEs are provided in Table 4.

Ketone-Water Partition Coefficient Calculations Using the 3D-RISM-KH Theory
The ketone water partition coefficients were calculated by correcting the GF-solvation free energy calculated using the 3D-RISM-KH theory in respective solvent media. The regression coefficients for the hydration free energy calculation was obtained by benchmarking the hydration free energy computation against the experimental hydration free energies obtained from the FreeSolv database. For hydration free energy calculation, a MAD of 1.4 kcal/mol is obtained for 642 solutes. The details of these solutes are provided in the Supplementary Materials. The 3D-RISM-KH theory computed cyclohexanone-water partition coefficients of 29 solutes have MAD of 0.42 units. The partition coefficients for acyclic ketone-water partitioning with a higher MAD of ~1.3 units were obtained. The overall correlation between the experimental and computed ketonewater partition coefficients showed a good correlation (r 2 = 0.596) for a total of 284 solutes ( Figure 2).

Discussion
In this work, we have first validated the force field parameters for four aliphatic ketones for use as solvents in the 3D-RISM-KH molecular solvation theory. The liquid states of three acyclic ketones (acetone, MEK, and MIBK) and one cyclic ketone (cyclohexanone) were modeled using united-atom force field parameters. The resultant XRISM-KH-based liquid profiles of these four ketones agree very well with the relevant features observed in the MD simulations with the all-atom force field. The liquid structures of the acyclic ketones differ significantly from the cyclic one. Several different local orientations of the molecules exist in the cyclohexane system, owing to the six-member ring framework. Both the MD and RISM calculations underscore these findings. Subsequently, the XRISM-KH-computed solvent susceptibility function is used to calculate the solvation free energy of small molecules with known experimental solvation free energies in different ketones studied here. Our result showed a very good performance of the 3D-RISM-KH-theory-based solvation free energy calculations in liquid ketones, with excellent accuracy at a very modest computational cost. Finally, we have compared the ketone-water partition coefficients for 284 organic molecules for which experimental partitioning coefficients were reported in the literature. Our results are in good agreement with the experimental results. We have noted comparatively large deviations in the calculations for the systems containing N/P/S atom(s), and further considerations should be made while choosing force field parameters for such a type of systems. The partition coefficients calculated for CYO-water and MEK-Water were found to be better than those computed for other two-ketone water systems. Tuning or modifying force field parameters may further help in better correlations with experiments. Further refinements of the ketone-water partition coefficient can be achieved by considering conformational flexibility of relatively larger systems via longer simulations.

Supplementary
Materials: The following are available online at www.mdpi.com/article/10.3390/j4040044/s1, Details of the solutes, partition coefficients computed using the 3D-RISM-KH theory.
Author Contributions: Conceptualization, D.R. and A.K.; methodology, D.R. and A.K.; software, D.R. and A.K.; validation, D.R. and A.K.; formal analysis, D.R.; writing-original draft preparation and editing, D.R. and A.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.
Funding: This work was financially supported by the NSERC Discovery Grant (RES0029477), and Canadian Consortium on Neurodegeneration in Aging (CCNA) Protein Misfolding, Phase II grant (RES0051206).

Conflicts of Interest:
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.