Hydration Thermodynamics of Non-Polar Aromatic Hydrocarbons: Comparison of Implicit and Explicit Solvation Models

The precise description of solute-water interactions is essential to understand the chemo-physical nature in hydration processes. Such a hydration thermodynamics for various solutes has been explored by means of explicit or implicit solvation methods. Using the Poisson-Boltzmann solvation model, the implicit models are well designed to reasonably predict the hydration free energies of polar solutes. The implicit model, however, is known to have shortcomings in estimating those for non-polar aromatic compounds. To investigate a cause of error, we employed a novel systematic framework of quantum-mechanical/molecular-mechanical (QM/MM) coupling protocol in explicit solvation manner, termed DFT-CES, based on the grid-based mean-field treatment. With the aid of DFT-CES, we delved into multiple energy parts, thereby comparing DFT-CES and PB models component-by-component. By applying the modified PB model to estimate the hydration free energies of non-polar solutes, we find a possibility to improve the predictability of PB models. We expect that this study could shed light on providing an accurate route to study the hydration thermodynamics for various solute compounds.


Introduction
Hydration is a key phenomenon, which governs the chemo-physical processes occurring in an aqueous solution [1]. By means of non-covalent interactions among solutes and solvents, hydration largely affects the thermodynamics of the various processes in water and, thus, is often critical in determining the energetics of chemical reaction [2], bio-molecular association [3,4], structural stabilization of protein and nucleic acids [4][5][6][7][8] and self-assembly [9]. Therefore, an in-depth understanding and an accurate quantification of the hydration free energy have been a long-standing goal of many theoretical chemists to date.
According to Ben-Naim's definition, hydration free energy is the free-energy change associated with a transfer of an isolated solute in gas-phase into water [10][11][12]. To evaluate ∆G hyd , a number of theoretical treatments have been developed [13][14][15][16][17][18][19][20][21][22]. These so-called solvation models treat the S (aq) + S (aq) → S · · · S(aq) It is, thus, necessary to understand the hydration thermodynamics of hydrophobic solutes. Lum-Chandler-Weeks (LCW) theory provides a unified picture of length-scale-dependent hydration thermodynamics of non-polar solutes [31]. The theory suggests a transition from entropy-driven to enthalpy-driven hydration thermodynamic while increasing the solute size [26]. In most theoretical models, the hydrophobic solutes are often conceived as having a repulsive solute-solvent interaction, and thus have been usually modeled using hard-sphere particles. Interestingly, aromatic hydrocarbons yield the negative values of hydration free energies, in contrast to other hydrophobic solutes such as aliphatic hydrocarbons (e.g., the hydration free energy of methane is +1.91 kcal/mol while that of benzene is −0.87 kcal/mol [32]). This implies the existence of attractive solute-solvent interaction, which delicately manifests the hydration energetics of the hydrophobic aromatic molecules. Indeed, implicit solvation methods usually fail to predict the hydration free energies of aromatic molecules. For instance, PB model underestimates the value by around half and SMD models constantly underestimates the values ( Figure S1).
In this article, we investigate hydration free energies of various non-polar aromatic hydrocarbons. For accurate evaluation of their hydration free energies, we employ the mean-field quantum mechanics/molecular mechanics (QM/MM) simulation, namely DFT-CES, recently developed by our group. By means of combining DFT-CES and the two-phase thermodynamic (2PT) model, we compute the hydration free energies for a set of non-polar solutes, in comparison with those obtained from an implicit solvation model. We then decompose the hydration free energy into four distinct components: reorganization, electrostatic, cavitation, and dispersion energies.

Density Functional Theory in Classical Explicit Solvents (DFT-CES)
In the mean-field QM/MM method [33][34][35], a QM subsystem is embedded in the mean electrostatic potential arising from MM particles, V MM . Instead of numerically solving equations of the motions of QM and MM particles at the same discretized time domain, the mean-field QM/MM employs an iterative procedure consisting of QM optimizations and classical molecular dynamics (MD) simulations. At each step of the QM optimization, both the electron density (ρ QM ) and the atomic structure (r QM ) are subject to be minimized under the influence of V MM . Additionally, at each step of the MD simulation, the MM particles at r MM , with momenta of p MM , undergo classical dynamics under the influence of van der Waals (vdW) and electrostatic potentials due to the fixed QM particles. This iterative procedure can be seen as a QM/MM version of a self-consistent reaction field (SCRF) theory to treat solvation effect, and thus can be referred to as a self-consistent ensemble-averaged reaction field (SCERF). During SCERF, if one treats QM region using Kohn-Sham density functional theory (KS-DFT), the free-energy functional of the total QM/MM system defined in Equation (2) is (approximately) minimized upon QM electronic and nuclei degrees of freedom [36]: On the right-hand side of Equation (2), the first term E KS , which is the KS-DFT total energy functional, denotes the internal electronic energy (E int QM ) of the QM subsystem. E int QM can be evaluated in the gas phase, although the ρ QM and r QM are optimized under the effects of V MM . The second term of Equation (2) denotes the free energy (A MM ) of the MM subsystem, where H MM is the classical Hamiltonian governing the MM particle dynamics under an external potential describing the QM subsystem.
Recently, our group developed a grid-based mean-field QM/MM method, which incorporates the periodic DFT with plane-wave basis set, coupled with the classical MD. For brevity, the QM/MM method is called DFT in Classical Explicit Solvents, namely DFT-CES [37]. DFT-CES is designed to utilize a very fine three-dimensional rectangular grid as a communication medium between QM and MM subsystems, thereby transferring precise information of ρ QM and V MM seamlessly to the MM and QM subsystem, respectively, without involving a charge fitting scheme, such as an electrostatic potential (ESP) fitting procedure [38]. In addition, by performing both DFT and MD simulations in the periodic box, long-range electrostatic interaction is straightforwardly taken into consideration when solving the Poisson equation by using the fast Fourier transform method.
Detail of the DFT-CES method is fully provided in our recent literature [37].

Free-Energy Calculation Using Two-Phase Thermodynamic (2PT) Model
To compute the hydration free energy (∆A hyd ) using DFT-CES, a natural partitioning of the total hydrated system is the QM description on the solute and MM description on the water. After the SCERF iterations, the converged DFT results of ρ QM and r QM enable us to evaluate the electronic and structural reorganization energy of the QM solute, which is (or E 0 QM(solute) ) is the KS-DFT total energy of the QM solutes in the gas phase, fully optimized in the presence (or in the absence) of V MM .
Having the converged ρ QM and r QM of the hydrated QM solute after the SCERF iteration, the classical dynamics of MM water subsystem under the fixed solute potential is well defined, and the free energy of the MM water subsystem, A MM(wat) is given by the second term of Equation (2). By defining A 0 MM(wat) as the free energy of the bulk water (i.e., a reference state), the free energy change of water associated with the hydration process becomes ∆A wat = A MM(wat) − A 0 MM(wat) , and the hydration free energy is defined as: The free-energy change of ∆A wat can be calculated by means of an alchemical transformation such as free-energy perturbation [39] or thermodynamic integration [40][41][42]. During MD simulations of the water subsystem, such a calculation is commonly achieved by switching on and off the fixed solute potential. To speed up the evaluation of A MM(wat) and A 0 MM(wat) , instead, we choose the two-phase thermodynamic (2PT) model to extract directly thermodynamic properties from MD  [43][44][45][46]. For a given liquid system, the 2PT model decomposes the total number of its degrees of freedom into two parts (gas-like and the solid-like parts), as inheriting the viewpoint of Eyring's significant theory of liquids [47,48]. Then, the free energy of the liquid system is assumed to be given as the linear combination of the free energy of the gas-like part and that of the solid-like part. The thermodynamics of the gas-like part is analytically calculated by considering a hard-sphere system having the corresponding effective packing fraction, and the thermodynamics of the solid-like part is numerically calculated using harmonic oscillator model.
The 2PT model has been widely tested and proven successful for the quantitative prediction of the thermodynamics and phase behavior of Lennard-Jones particles at various densities [43], absolute entropy of water molecules [44] and organic solvents [45], and the interfacial thermodynamics of various liquids, e.g., surface tensions [46]. It is further emphasized that the combination of DFT-CES and 2PT has demonstrated a satisfactory description of the ∆A hyd of the 17 polar solutes [37], which were adopted as the official SAMPL0 challenge test set [25], and of the solid-liquid interfacial tension of water-graphene/graphite system [49].

Simulation Details
Using our in-house code, we implemented DFT-CES by coupling two open-source programs of Quantum Espresso (plane-wave DFT engine) [50] and the LAMMPS (classical MD engine) [51]. For the DFT part, we employed the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [52] with the projector augmented-wave (PAW) scheme [53,54], and set the kinetic energy cutoff as 50 Ry. For the MD part, we performed the canonical ensemble (NVT) simulations at 300 K, using the Nosé-Hoover thermostat [55,56] with a damping constant of 100 fs. During the MD part of each SCERF iteration, we performed a total of 2.5 ns NVT MD sampling, where the last 2 ns of the trajectory was employed to evaluate V MM . Solute-water vdW interactions were described using the OPLS-AA force field [57]. Intermolecular interactions of waters were described using the modified TIP3P water potential [58], wherein all bond lengths and angles were constrained to their equilibrium value with the RATTLE algorithm [59]. During MD, long-range Coulomb interactions were treated using the Ewald summation method [60] with a real space cut-off of 15 Å.
DFT-CES simulation box contains a QM solute at the center, which is hydrated by 1,000 classical water molecules. To compute the energies of reference states, E 0 QM(solute) and A 0 MM(wat) , we separately performed a DFT calculation for the gas-phase optimization of the QM solute and an MD simulation of the bulk water box, respectively. Figure 1 illustrates the simulation box for the representative case. The SCERF iteration of the DFT-CES calculation was repeated until the convergence criterion was sufficiently achieved. The criterion is set such that the DFT energy difference becomes less than 0.1 kcal/mol. be given as the linear combination of the free energy of the gas-like part and that of the solid-like part. The thermodynamics of the gas-like part is analytically calculated by considering a hard-sphere system having the corresponding effective packing fraction, and the thermodynamics of the solidlike part is numerically calculated using harmonic oscillator model. The 2PT model has been widely tested and proven successful for the quantitative prediction of the thermodynamics and phase behavior of Lennard-Jones particles at various densities [43], absolute entropy of water molecules [44] and organic solvents [45], and the interfacial thermodynamics of various liquids, e.g., surface tensions [46]. It is further emphasized that the combination of DFT-CES and 2PT has demonstrated a satisfactory description of the Δ of the 17 polar solutes [37], which were adopted as the official SAMPL0 challenge test set [25], and of the solid-liquid interfacial tension of water-graphene/graphite system [49].

Simulation Details
Using our in-house code, we implemented DFT-CES by coupling two open-source programs of Quantum Espresso (plane-wave DFT engine) [50] and the LAMMPS (classical MD engine) [51]. For the DFT part, we employed the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [52] with the projector augmented-wave (PAW) scheme [53,54], and set the kinetic energy cutoff as 50 Ry. For the MD part, we performed the canonical ensemble (NVT) simulations at 300 K, using the Nosé-Hoover thermostat [55,56] with a damping constant of 100 fs. During the MD part of each SCERF iteration, we performed a total of 2.5 ns NVT MD sampling, where the last 2 ns of the trajectory was employed to evaluate 〈 〉. Solute-water vdW interactions were described using the OPLS-AA force field [57]. Intermolecular interactions of waters were described using the modified TIP3P water potential [58], wherein all bond lengths and angles were constrained to their equilibrium value with the RATTLE algorithm [59]. During MD, long-range Coulomb interactions were treated using the Ewald summation method [60] with a real space cut-off of 15 Å. DFT-CES simulation box contains a QM solute at the center, which is hydrated by 1,000 classical water molecules. To compute the energies of reference states, ( ) and ( ) , we separately performed a DFT calculation for the gas-phase optimization of the QM solute and an MD simulation of the bulk water box, respectively. Figure 1 illustrates the simulation box for the representative case. The SCERF iteration of the DFT-CES calculation was repeated until the convergence criterion was sufficiently achieved. The criterion is set such that the DFT energy difference becomes less than 0.1 kcal/mol. ). In the perspective of the hydration process, the first term corresponds to the water-solvated system ( ( ) + ( ) ), whereas the second term (in parentheses) corresponds to the reference states for the bulk water box ( ( ) ) and the QM solute in the gas-phase ( ( ) ). Figure 1. Schematized process of calculating the hydration free energy (∆A hyd ). In the perspective of the hydration process, the first term corresponds to the water-solvated system (A MM(wat) + E int QM(solute) ), whereas the second term (in parentheses) corresponds to the reference states for the bulk water box (A 0 MM(wat) ) and the QM solute in the gas-phase (E 0 QM(solute) ).

Hydration Free Energies of Aromatic Hydrocarbons
Using DFT-CES/2PT method, we calculated ∆A hyd of eight non-polar aromatic hydrocarbons, listed as benzene, toluene, biphenyl, naphthalene, fluorene, phenanthrene, pyrene, and anthracene ( Figure 2a). For comparison, we also evaluated ∆A hyd by applying the Poisson-Boltzmann (PB) implicit solvation model as implemented in Jaguar 8.4 [61]. Such DFT calculations were carried out with the choice of Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [52] and 6-31G ** basis sets [62,63]. We note that our test set of eight molecules, likewise as most of the other aromatic hydrocarbons, do not exhibit much flexibility in their conformations so that a rigorous conformational sampling analysis is not significant. In Figure 2b, the calculated ∆A hyd values are compared with the available experimental results [64,65]. We find that our DFT-CES/2PT method shows a remarkable performance in predicting the ∆A hyd values of aromatic hydrocarbons, which is characterized by small mean absolute error (MAE) and root-mean-square error (RMSE) of 0.34 and 0.37 kcal/mol, respectively. The implicit PB solvation model, however, shows a significant underestimating tendency, estimating only ca. 40% of experimental ∆A hyd results. Considering that the PB model in Jaguar is well optimized to accurately predict the ∆A hyd values for polar solutes (MAE = 1.73 kcal/mol and RMSE = 1.88 kcal/mol for the SAMPL0 test set; Figure 2), such a gross error in predicting the aromatic hydrocarbons is notable. Using DFT-CES/2PT method, we calculated Δ of eight non-polar aromatic hydrocarbons, listed as benzene, toluene, biphenyl, naphthalene, fluorene, phenanthrene, pyrene, and anthracene ( Figure 2a). For comparison, we also evaluated Δ by applying the Poisson-Boltzmann (PB) implicit solvation model as implemented in Jaguar 8.4 [61]. Such DFT calculations were carried out with the choice of Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional [52] and 6-31G ** basis sets [62,63]. We note that our test set of eight molecules, likewise as most of the other aromatic hydrocarbons, do not exhibit much flexibility in their conformations so that a rigorous conformational sampling analysis is not significant. In Figure 2b, the calculated values are compared with the available experimental results [64,65]. We find that our DFT-CES/2PT method shows a remarkable performance in predicting the Δ values of aromatic hydrocarbons, which is characterized by small mean absolute error (MAE) and root-mean-square error (RMSE) of 0.34 and 0.37 kcal/mol, respectively. The implicit PB solvation model, however, shows a significant underestimating tendency, estimating only ca. 40% of experimental Δ results. Considering that the PB model in Jaguar is well optimized to accurately predict the Δ values for polar solutes (MAE = 1.73 kcal/mol and RMSE = 1.88 kcal/mol for the SAMPL0 test set; Figure 2), such a gross error in predicting the aromatic hydrocarbons is notable. To trace the error source of the PB solvation model, we perform a comparative study between DFT-CES and PB calculation results. Implicit solvation models, including the PB solvation model, generally assume that the total hydration free energy ( Δ ) is given as the sum of the electronic/structural reorganization energy of the solute ( ) , and electrostatic and nonelectrostatic free energies (which are respectively denoted as Δ and Δ ), where Δ consists of the cavitation (Δ ) and dispersion (Δ ) components: We now decompose the estimate of Δ from the DFT-CES into four different contributions: , Δ , Δ , and Δ . The component of can be readily separated from Δ , as being defined as the first term of Equation (3). To extract the rest of the term (∆ ), we additionally To trace the error source of the PB solvation model, we perform a comparative study between DFT-CES and PB calculation results. Implicit solvation models, including the PB solvation model, generally assume that the total hydration free energy (∆A hyd ) is given as the sum of the electronic/structural reorganization energy of the solute E reorg , and electrostatic and non-electrostatic free energies (which are respectively denoted as ∆A ES and ∆A nonES ), where ∆A nonES consists of the cavitation (∆A cav ) and dispersion (∆A disp ) components: We now decompose the estimate of ∆A hyd from the DFT-CES into four different contributions: E reorg , ∆A ES , ∆A cav , and ∆A disp . The component of E reorg can be readily separated from ∆A hyd , as being defined as the first term of Equation (3). To extract the rest of the term (∆A wat ), we additionally carry out a series of DFT-CES/2PT calculations in a non-self-consistent manner without involving SCERF iteration. ∆A ES is first decomposed from ∆A wat , with the help of the non-self-consistent DFT-CES/2PT simulation where the electrostatic potential of the QM solute (which is fully stored in the grid) is switched off. ∆A disp is further decomposed using the non-self-consistent DFT-CES/2PT result where the vdW potential is replaced with the purely repulsive Weeks-Chandler-Anderson (WCA) potential [66]. The remaining term, which is associated with the switching-off process of the WCA potential, becomes ∆A cav . (For more information, please refer to Equations (15)- (17) in [37].) Each energy component decomposed from ∆A hyd is summarized in Table 1.

Validity of Linear-Reponse Theory
In the implicit solvation models, linear-response theory (LRT) is underlying to estimate the "free energy quantity of ∆A" from the easily computable "interaction energy of U (having no entropic term)". Considering the reversible work done by water during the process of turning on solute-solvent interaction from U uv to U uv,0 , the LRT yields ∆A wat = (U uv + U uv,0 )/2 [67]. For the process of turning the electrostatic interaction on, the solvent medium is expected to be gradually polarized by re-orienting the dipoles of water molecules. Thus, the electrostatic interaction energy is assumed to be grown from U ES,0 = 0 to a finite value of U ES , resulting in ∆A ES = U ES /2.
In Figure 3a, we compare the free-energy component of ∆A ES (that is computed using 2PT method) and its corresponding interaction energy of U ES (that is ensemble averaged over the MD trajectories). It shows a linear relationship of ∆A ES ≈ 0.46 U ES , where the slope of 0.46 is in consistent not only with the theoretical value from the LRT (slope = 0.5), but also with the value (slope ≈ 0.40) determined for the polar solutes in the SAMPL0 set [37]. This implies that the LRT can be applied for both polar and non-polar solutes with a reasonable accuracy. The global linear fit on the polar and non-polar solutes leads to the linear coefficient of 0.41 with R 2 = 0.95.
While gradually tuning the dispersive vdW interaction on, no dipolar relaxation of the water is expected, yielding U vdW,0 = U vdW and thereby ∆A disp = U vdW . Figure 3b further shows the near identity between the ∆A disp and the ensemble-averaged U vdW , in consistent with the LRT about vdW interaction.

Non-Electrostatic Interaction Components: Surface-Area Dependence
The PB solvation model evaluates Δ based on the LRT by calculating the electrostatic interaction ( ) between the solute and the SCRF that is obtained by solving a continuum equation such as a Poisson-Boltzmann equation. However, Δ cannot be obtained using the LRT due to the difficulty in calculating , but is estimated by simply assuming its linear relationship with the solvent accessible surface area (ASA) in the PB solvation model. Additionally, the cavitation free energy of Δ is conceived to be proportional to either the surface area or the volume of the solute, which is rationalized using the scaled-particle theory for hard spheres [68]. Figure 4 shows how Δ or Δ changes as a function of ASA of the solute molecules. We find a good positive proportionality of Δ and a negative proportionality of Δ with the ASA, resulting in a mutual cancellation when both terms are combined into the Δ . At a glance, these trends are generally as similar as for polar solutes [37]. However, the degree of mutual cancellation is somewhat different. The more effective stabilization of Δ is found for the larger aromatic hydrocarbon solutes, yielding a small, but still appreciable negative proportionality of Δ with the ASA. This is in contrast to the case of polar solutes, where a more complete mutual cancellation has been found [37].

Non-Electrostatic Interaction Components: Surface-Area Dependence
The PB solvation model evaluates ∆A ES based on the LRT by calculating the electrostatic interaction (U ES ) between the solute and the SCRF that is obtained by solving a continuum equation such as a Poisson-Boltzmann equation. However, ∆A disp cannot be obtained using the LRT due to the difficulty in calculating U vdW , but is estimated by simply assuming its linear relationship with the solvent accessible surface area (ASA) in the PB solvation model. Additionally, the cavitation free energy of ∆A cav is conceived to be proportional to either the surface area or the volume of the solute, which is rationalized using the scaled-particle theory for hard spheres [68]. Figure 4 shows how ∆A disp or ∆A cav changes as a function of ASA of the solute molecules. We find a good positive proportionality of ∆A disp and a negative proportionality of ∆A cav with the ASA, resulting in a mutual cancellation when both terms are combined into the ∆A nonES . At a glance, these trends are generally as similar as for polar solutes [37]. However, the degree of mutual cancellation is somewhat different. The more effective stabilization of ∆A disp is found for the larger aromatic hydrocarbon solutes, yielding a small, but still appreciable negative proportionality of ∆A nonES with the ASA. This is in contrast to the case of polar solutes, where a more complete mutual cancellation has been found [37].

Non-Electrostatic Interaction Components: Surface-Area Dependence
The PB solvation model evaluates Δ based on the LRT by calculating the electrostatic interaction ( ) between the solute and the SCRF that is obtained by solving a continuum equation such as a Poisson-Boltzmann equation. However, Δ cannot be obtained using the LRT due to the difficulty in calculating , but is estimated by simply assuming its linear relationship with the solvent accessible surface area (ASA) in the PB solvation model. Additionally, the cavitation free energy of Δ is conceived to be proportional to either the surface area or the volume of the solute, which is rationalized using the scaled-particle theory for hard spheres [68]. Figure 4 shows how Δ or Δ changes as a function of ASA of the solute molecules. We find a good positive proportionality of Δ and a negative proportionality of Δ with the ASA, resulting in a mutual cancellation when both terms are combined into the Δ . At a glance, these trends are generally as similar as for polar solutes [37]. However, the degree of mutual cancellation is somewhat different. The more effective stabilization of Δ is found for the larger aromatic hydrocarbon solutes, yielding a small, but still appreciable negative proportionality of Δ with the ASA. This is in contrast to the case of polar solutes, where a more complete mutual cancellation has been found [37].

Component-by-Component Comparisons
In the previous sections, we confirm that the basic approximations underlying the PB solvation model are more-or-less reasonably working still for aromatic hydrocarbons; (1) the LRT is still valid, and (2) both ∆A disp and ∆A cav are directly proportional to the ASA. In this section, we further analyze each component of ∆A hyd in detail, for the sake of quantitative comparison between DFT-CES and PB calculations, as shown in Figure 5.

Component-by-Component Comparisons
In the previous sections, we confirm that the basic approximations underlying the PB solvation model are more-or-less reasonably working still for aromatic hydrocarbons; (1) the LRT is still valid, and (2) both Δ and Δ are directly proportional to the ASA. In this section, we further analyze each component of Δ in detail, for the sake of quantitative comparison between DFT-CES and PB calculations, as shown in Figure 5.  For the case of the electrostatic component (Δ ), Figure 5b shows that the PB model predicts reasonably well the relative difference of Δ . When the proportional coefficient of the LRT is changed from 0.5 to 0.41 as determined from the Section 3.2, the relative difference of Δ is even more reliably estimated using the PB model, as shown in the linear-fit slope on the PB versus DFT-CES values curve changing from 1.19 to 0.97. However, Δ from the PB model tends to be overestimated as a whole with a 1.65 or 1.36 kcal/mol offset when using the original and modified LRT coefficient, respectively. We attribute the offset error to the incomplete description on the reaction field. It is worth to note that aromatic hydrocarbons are known to develop π-water hydrogen bonds based on the substantial quadrupoledipole interaction. Figure 6 shows the reaction-field maps for fluorine, as a representative example. (The other maps for the remaining seven non-polar solutes are available in Figure S2). By using the explicit solvation model of DFT-CES, in Figure 6a, the map successfully exhibits the local solventdensity fluctuation with positive and negative charges alternatively, compared with the map from the PB implicit method showing only one polarity of the positive charge near π-electron. Furthermore,  Figure 5a shows that the PB model predicts quite well the reorganization energies (E reorg ) and their absolute values are relatively small with a weak influence on the total solvation free energy.
For the case of the electrostatic component (∆A ES ), Figure 5b shows that the PB model predicts reasonably well the relative difference of ∆A ES . When the proportional coefficient of the LRT is changed from 0.5 to 0.41 as determined from the Section 3.2, the relative difference of ∆A ES is even more reliably estimated using the PB model, as shown in the linear-fit slope on the PB versus DFT-CES values curve changing from 1.19 to 0.97.
However, ∆A ES from the PB model tends to be overestimated as a whole with a 1.65 or 1.36 kcal/mol offset when using the original and modified LRT coefficient, respectively. We attribute the offset error to the incomplete description on the reaction field. It is worth to note that aromatic hydrocarbons are known to develop π-water hydrogen bonds based on the substantial quadrupole-dipole interaction. Figure 6 shows the reaction-field maps for fluorine, as a representative example. (The other maps for the remaining seven non-polar solutes are available in Figure S2). By using the explicit solvation model of DFT-CES, in Figure 6a, the map successfully exhibits the local solvent-density fluctuation with positive and negative charges alternatively, compared with the map from the PB implicit method showing only one polarity of the positive charge near π-electron. Furthermore, our DFT-CES model can accurately describe the π-water hydrogen bonding interaction, which is known experimentally to be crucial in non-polar aromatic compounds.
our DFT-CES model can accurately describe the π-water hydrogen bonding interaction, which is known experimentally to be crucial in non-polar aromatic compounds. The systematic offset in predicting Δ is one source of error originated by the incompleteness of the PB model, which can bring an overestimating trend of Δ by 1 to 2 kcal/mol. However, it is notable that an underestimating, instead of overestimating, trend of Δ has been found ( Figure  2), which implies that the electrostatic contribution is not the major error source of the Δ of the PB model. Figure 5c shows the non-electrostatic component (Δ ) from the PB model, compared with that from the DFT-CES calculation. We note that the PB results show a significant error in predicting Δ , which is even anti-correlated with the DFT-CES results with an offset of 2.55 kcal/mol. The PB model, implemented in Jaguar, yields Δ by using the linear-regression result, obtained from the linear and branched aliphatic hydrocarbons [69]. The calculated data produced from the original PB model shows a positive correlation between Δ (in unit of kcal/mol) and the ASA (in unit of Å 2 ): However, from our analysis on the DFT-CES results, as aforementioned in Section 3.3, the modified linear regression between Δ and the ASA yields the negative slope of −0.0131 kcal/mol/Å 2 and the intercept of 3.56 kcal/mol.
We now apply two modifications, which are derived from our component-by-component analyses, to the PB model; (1) change of the linear coefficient of the LRT from 0.5 into 0.41 when evaluating Δ , and (2) use of the modified linear model predicting Δ from the ASA. Although the offset error of 1.36 kcal/mol found in the electrostatic component (Δ ) propagates to Δ , it is unfortunately deemed to be an innate limitation of the PB model ignoring the solvent shell structure. As an ad-hoc remedy, we here make the non-electrostatic component to cancel the offset error in the Δ . To help readers' understanding, results without including such an ad hoc correction are also provided in Figure S3. From the linear-regression results on the DFT-CES data, we simply change the intercept value from the 3.56 kcal/mol to 4.92 (=3.56 + 1.36) kcal/mol: In Figure 5c, the newly predicted results of Δ are also depicted (in red triangles) with respect to the DFT-CES results. The refined PB model offers much improved estimates of Δ , albeit not fully satisfactory, and also a positive correlation between Δ and Δ . Figure 5d shows the new estimates of from the modified PB model, which are compared with the experimental values. Despite of some discrepancies in each component of Δ , the refined The systematic offset in predicting ∆A ES is one source of error originated by the incompleteness of the PB model, which can bring an overestimating trend of ∆A hyd by 1 to 2 kcal/mol. However, it is notable that an underestimating, instead of overestimating, trend of ∆A hyd has been found ( Figure 2), which implies that the electrostatic contribution is not the major error source of the ∆A hyd of the PB model. Figure 5c shows the non-electrostatic component (∆A nonES ) from the PB model, compared with that from the DFT-CES calculation. We note that the PB results show a significant error in predicting ∆A nonES , which is even anti-correlated with the DFT-CES results with an offset of 2.55 kcal/mol. The PB model, implemented in Jaguar, yields ∆A nonES by using the linear-regression result, obtained from the linear and branched aliphatic hydrocarbons [69]. The calculated data produced from the original PB model shows a positive correlation between ∆A nonES (in unit of kcal/mol) and the ASA (in unit of Å 2 ): ∆A PB nonES = 0.005 × ASA + 1.09 (6) However, from our analysis on the DFT-CES results, as aforementioned in Section 3.3, the modified linear regression between ∆A nonES and the ASA yields the negative slope of −0.0131 kcal/mol/Å 2 and the intercept of 3.56 kcal/mol.
We now apply two modifications, which are derived from our component-by-component analyses, to the PB model; (1) change of the linear coefficient of the LRT from 0.5 into 0.41 when evaluating ∆A ES , and (2) use of the modified linear model predicting ∆A nonES from the ASA. Although the offset error of 1.36 kcal/mol found in the electrostatic component (∆A ES ) propagates to ∆A hyd , it is unfortunately deemed to be an innate limitation of the PB model ignoring the solvent shell structure. As an ad-hoc remedy, we here make the non-electrostatic component to cancel the offset error in the ∆A ES . To help readers' understanding, results without including such an ad hoc correction are also provided in Figure S3. From the linear-regression results on the DFT-CES data, we simply change the intercept value from the 3.56 kcal/mol to 4.92 (=3.56 + 1.36) kcal/mol: In Figure 5c, the newly predicted results of ∆A modPB nonES are also depicted (in red triangles) with respect to the DFT-CES results. The refined PB model offers much improved estimates of ∆A modPB nonES , albeit not fully satisfactory, and also a positive correlation between ∆A modPB nonES and ∆A DFT−CES nonES . Figure 5d shows the new estimates of ∆A hyd from the modified PB model, which are compared with the experimental values. Despite of some discrepancies in each component of ∆A hyd , the refined PB values of ∆A hyd agree fairly well with the experimental data (MAE = 0.29 kcal/mol and RMSE = 0.36 kcal/mol).
In Figure 7, we further check the transferability of our modified PB model into other various compounds, including the 17 polar solutes adopted as the SAMPL0 test set. For the SAMPL0 test compounds, Figure 7a,b show the results from the original PB model in Jaguar and from our modified PB model, denoted as ∆A PB nonES and ∆A modPB nonES , respectively. At a glance, we find that the good predictability of the original PB model becomes deteriorated, when we modify the parameters using the ones derived from non-polar aromatic hydrocarbon system. However, a deeper look on data, in Figure 7b, allows us to suggest a direction for further possible modifications of the implicit model to incorporate both polar and non-polar solutes. We find that the modified PB model can predict relatively well the estimates of ∆A hyd , for relatively more hydrophobic solutes, such as halide-containing solutes as well as no nitrogen (N)-nor oxygen (O)-containing solutes. This infers that the newly derived parameters from the aromatic hydrocarbon could rather generally be utilized for the non-polar solutes, although more intensive tests are required. In addition, we find that the modified PB model systematically underestimates ∆A hyd of the N-containing solutes, while overestimates ∆A hyd of the O-containing solutes. This implies that it is strongly necessary to treat atom-species dependency, for the sake of improving predictability of the PB model. Considering that ∆A cav is estimated from a dummy-atom model (where no atom-species information is left), we suggest that the atom-species dependency needs to be included in developing a model for ∆A disp . In Figure 7, we further check the transferability of our modified PB model into other various compounds, including the 17 polar solutes adopted as the SAMPL0 test set. For the SAMPL0 test compounds, Figure 7a,b show the results from the original PB model in Jaguar and from our modified PB model, denoted as Δ and Δ , respectively. At a glance, we find that the good predictability of the original PB model becomes deteriorated, when we modify the parameters using the ones derived from non-polar aromatic hydrocarbon system. However, a deeper look on data, in Figure 7b, allows us to suggest a direction for further possible modifications of the implicit model to incorporate both polar and non-polar solutes. We find that the modified PB model can predict relatively well the estimates of Δ , for relatively more hydrophobic solutes, such as halidecontaining solutes as well as no nitrogen (N)-nor oxygen (O)-containing solutes. This infers that the newly derived parameters from the aromatic hydrocarbon could rather generally be utilized for the non-polar solutes, although more intensive tests are required. In addition, we find that the modified PB model systematically underestimates Δ of the N-containing solutes, while overestimates of the O-containing solutes. This implies that it is strongly necessary to treat atom-species dependency, for the sake of improving predictability of the PB model. Considering that Δ is estimated from a dummy-atom model (where no atom-species information is left), we suggest that the atom-species dependency needs to be included in developing a model for Δ .   (a,b)) as well as other various types of aromatic and aliphatic hydrocarbons (in (c,d)). (Left panels (a,c)) Calculated estimates of ∆A hyd from the original PB model. (Right panels (b,d)) Refined estimates of ∆A hyd from the modified PB model. Figure 7c,d also show the results from the original and modified PB models, respectively, for the other test sets including aromatic and aliphatic (linear, branched and cyclic) hydrocarbons. Concerning other ten aromatic hydrocarbons (t-butylbenzene, ethylbenzene, p-xylene, m-xylene, o-xylene, 1,3-dimethylnaphthalene, 2,6-dimethylnaphthalene, 2,3-dimethylnaphthalene, 1,4-dimethylnaphthalene, and acenaphthene), the original PB model still seems problematic as shown in Figure 7c and Figure S4 (The slope of the linear-fit curve: 0.38). On the other hand, our modified PB model also predicts well for those non-polar compounds (slope: 1.13). Moreover, we assessed the prediction accuracy of the both PB models for other aliphatic compounds. The modified PB model works reasonably well for cyclic hydrocarbons (cyclopropane, cyclopentane, and cyclohexane). However, for branched (isobutane, isopentane, neopentane, and isohexane) and linear (methane, ethane, propane, n-butane, n-pentane, n-hexane, n-heptane, and n-octane) hydrocarbons, Figure 7d represents a negative trend for estimating ∆A hyd . We conceive that the origin of the gross error found in the linear and branched hydrocarbons is mostly due to the lack of conformational sampling since the ASA changes significantly during their structural dynamics. In addition, the change of conformational entropy during hydration process needs to be taken into consideration for a reliable prediction of hydration free energies of these flexible molecules, which is entirely absent in current implicit solvation methods. Indeed, it is found that the error of small molecules (e.g., methane, ethane, propane) is acceptable, while the error increases as the solute molecule becomes longer and thereby more flexible. These results further pointed out a possible problem of the previous parametrization method of the original PB model, which was carried out by comparing the experimental hydration free energy and the ASA of one conformer of the flexible linear hydrocarbon. By performing a similar comparative study of DFT-CES and PB model in our next study, we expect to build more systematic understanding on the hydration process of linear and branched hydrocarbons and then to answer the open question about the origin of the different solvation of aliphatic and aromatic hydrocarbons.

Conclusions
In this work, we examined the thermodynamic properties in the hydration process of non-polar aromatic compounds. The in-depth case study on their hydration free energies (∆A hyd ) was carried out with the aid of a mean-field QM/MM method, i.e., DFT-CES, recently developed in our group. Here, we tested the extensibility of DFT-CES toward non-polar species, which are commonly known to be difficult challenges for general implicit solvation models. Our DFT-CES/2PT method successfully offers an excellent estimate of ∆A hyd for non-polar solutes, compared with the experimental dataset, whereas, for the same set of data, almost all ∆A hyd is poorly determined by the PB implicit model with large prediction error (ca. 40%). To delve into a cause of error, we decomposed ∆A hyd into multiple energy components, thereby conducting component-by-component comparisons between DFT-CES and PB models. It is noteworthy that the PB implicit model can accurately predict the experimental values in hydration free energies of aromatic hydrocarbons, after revising ∆A PB ES and ∆A PB nonES energy parts. By further testing our modified PB model against to the polar solutes, we find an atom-species dependent error, suggesting a need for a more elaborated method accounting for exposed atom-species dependent modeling of ∆A disp . We anticipate that our study provides not only a fundamental understanding on the hydration thermodynamics of non-polar aromatic hydrocarbons, but also a route to overcome limitations of implicit solvation methods toward an improved accuracy for extended sets of solute molecules.