Predicting 1,9-Decadiene−Water Partition Coefficients Using the 3D-RISM-KH Molecular Solvation Theory

Abstract: The Three-Dimensional Reference Interaction Site Model (3D-RISM) with the Kovalenko−Hirata (KH) closure is applied to calculate the 1,9-Decadiene/Water partition coefficients for a diverse class of compounds. The liquid state of 1,9-Decadiene is represented with the united atom TraPPE force field parameters. The partition coefficients computed with 3D-RISM-KH are in good agreement with the experimental results. Our computational scheme can be used for quantitative structure partitioning prediction for the decadiene-water system, which has been used as a membrane mimic in egg-lecithin/water permeability experiments.


Introduction
Rapid development of computational methods and tools has yielded a vast collection of drugs and drug-like compounds that can potentially be used for drug development as well as drug repurposing. Such drug development programs involve (bio-)physical property predictions, quantitative structure activity relationships, applicability domain calculations, etc. The bottleneck in such activities is an accurate prediction of the bioavailability of a drug candidate. Bioavailability depends on the physical and chemical properties of a molecule. Within the physical property domain, solubility and permeability are key factors. These two together constitute one of the major challenges in biophysics, i.e., the prediction of permeability through the cell membrane. Permeability in turn depends on a series of solvation/desolvation steps on the way to the target tissue(s). Several molecular structure and activity relationships were developed over the years to incorporate lipid-membrane permeability in the absorption−distribution−metabolism−excretion (ADME) studies of drug candidates and risk assessments of chemical exposures [1][2][3][4]. Membrane permeability models have attracted a lot of attention from both experimental and computational points of view, owing to the elaborate experimental setups and non-trivial molecular simulations they require. The most common permeations across any barrier are diffusion-controlled. Models of such permeation processes often make assumptions to simplify an otherwise very complex process. In its simplest form, a direct membrane/bilayer model approximates the bilayer as homogeneous and isotropic. A more realistic extension of this model relates the permeability coefficient to the local permeability of the solute in a given region of the bilayer and to the interfacial resistance. While this approach works reasonably well for small molecules, other considerations are needed for peptides [5].
A common practice in experimental permeability determination is to replace the actual membrane with solvent(s) of "adequate" physico-chemical properties.
While such replacements have innate issues, they are easier to set up and can often be used in a high-throughput manner. One such example is the use of the non-polar 1,9-decadiene (DED) molecule as a mimic of the egg-lecithin membrane. The egg-lecithin membrane (also known as the black lipid membrane) is used in experiments for determining the bilayer permeability of compounds [6][7][8]. The DED/Water partitioning thus attracted attention as an estimator of lecithin−water permeability as well as an excellent descriptor of chemical selectivity in lecithin membrane permeability [9][10][11]. It is important to point out that the success of such correlations between a membrane mimic and actual membrane permeability depends on the uncertainty in the K_mimic/water calculations, membrane composition (and in turn, density of phospholipids), ionic strength, etc. Detailed molecular studies of the liquid state of DED are conspicuously absent in the literature, although interesting quantitative structure activity relationships for DED/Water partitioning were reported using molecular descriptors based on linear free energy relationships [12][13][14]. Theoretical modeling of permeability as well as partitioning requires a detailed treatment of the solvents and of the solvation free energies involved in molecular partitioning. Continuum solvation models are thus suitable candidates for calculating the explicit solvation free energy terms that are required in a partition coefficient calculation. However, a continuum model for DED is not available yet.
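The role of solvation free energies in partitioning can be made concrete with the textbook relation between a transfer free energy and a partition coefficient, ΔG(water → DED) = −RT ln K_DED/W. The sketch below is only an illustration of that identity. It assumes both solvation free energies refer to the same concentration standard state and to infinite dilution, and the function name and default temperature are hypothetical. It is not the prediction scheme used in this work, which relies on machine learning models.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def log_partition_coefficient(dg_water, dg_ded, temperature=298.0):
    """log10 K_DED/W from solvation free energies (kcal/mol) in water and in DED.

    Assumes both free energies use the same concentration standard state.
    """
    dg_transfer = dg_ded - dg_water  # free energy of transfer, water -> DED
    return -dg_transfer / (R_KCAL * temperature * math.log(10))
```

A solute that is solvated more favourably in DED than in water (more negative `dg_ded`) gives a positive log K_DED/W, and equal solvation free energies give log K_DED/W = 0.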
The statistical mechanics-based 3D-RISM-KH molecular solvation theory is an alternative to the continuum solvation model in the sense that it represents a solvent molecule with a fixed number of solvent sites, with sizes and charges based on the choice of force field parameters, around a solute of arbitrary shape. This theory provides direct correlation functions (DCFs) for all species in solution [15][16][17]. It starts from the molecular Ornstein-Zernike (MOZ) equation, in which the pair correlation functions (PCFs) of a liquid depend on a six-dimensional vector of three positional {r} and three orientational {Θ} degrees of freedom, and reduces it to site-based correlation functions in three dimensions (3D). The solvent is represented with a finite number of sites (γ) around a solute with the 3D total correlation function h_γ(r), given by the 3D-RISM integral equation (Equation (1)). The 3D site distribution function (g_γ) is calculated as g_γ(r) = h_γ(r) + 1 and consists of all the interactions among all the solvent sites [18,19]. The physical characteristics of a given solvent/solution, e.g., density and dielectric constant, are used as input in an RISM calculation. For instance, the bulk susceptibility function reflects the shape and orientation of the solvent and is constructed from the intramolecular correlation function ω_αγ from the dielectrically consistent RISM (DRISM). The computational speed and accuracy of an RISM calculation depend on the nature of the closure relation used for integrating the infinite chain of diagrams produced through Equation (1). Among a handful of successful closure relations, the Kovalenko−Hirata (KH) closure has proven to be the most numerically stable and provides the solvation structure with reasonable accuracy at a modest computational cost [20].
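For reference, the 3D-RISM integral equation and the bulk solvent susceptibility are commonly written in the standard forms below. They are given on the assumption that the first corresponds to Equation (1). The symbols c_α (3D direct correlation function of solvent site α), ρ_α (number density of site α), and h_αγ (site-site total correlation function of the bulk solvent) are notational assumptions and may differ from the original typesetting.

```latex
% 3D-RISM integral equation (assumed to correspond to Equation (1))
h_\gamma(\mathbf{r}) = \sum_{\alpha} \int d\mathbf{r}'\,
    c_\alpha(\mathbf{r} - \mathbf{r}')\, \chi_{\alpha\gamma}(|\mathbf{r}'|)

% Site-site susceptibility of the bulk solvent
\chi_{\alpha\gamma}(r) = \omega_{\alpha\gamma}(r) + \rho_\alpha\, h_{\alpha\gamma}(r)
```

The susceptibility is computed once per solvent and then reused for every solute. The integral equation has two unknown functions per site, h_γ and c_γ, which is why a closure relation is needed.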
The KH closure approximation accounts for both electrostatic and non-polar features of the liquid. It combines the so-called mean spherical approximation (MSA), applied to the spatial regions of solvent density enrichment, g_γ(r) > 1, with the hypernetted chain (HNC) approximation, applied to the spatial regions of solvent density depletion, g_γ(r) < 1, and provides numerical stability and accuracy. While the KH closure is known to underestimate the height of strong associative peaks, these errors are mitigated by broadening of the peaks, and this often corrects the solvation thermodynamics and structure. For further theoretical details, please refer to [21][22][23]. It is important to keep in mind that the solvation free energy predicted with any theoretical model is not absolute.
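The piecewise character of the KH closure described above can be sketched pointwise as follows. This is a minimal illustration of the standard literature form, g = exp(d) for d ≤ 0 and g = 1 + d for d > 0 with d = −u/(k_B T) + h − c, and not an excerpt from any RISM code. The function name and argument names are hypothetical.

```python
import math

def kh_closure(beta_u, h, c):
    """Distribution function g at one grid point from the KH closure.

    beta_u : solute-solvent site potential divided by k_B*T (dimensionless)
    h, c   : current total and direct correlation function values
    """
    d = -beta_u + h - c
    if d <= 0.0:
        return math.exp(d)  # HNC branch: density depletion, g <= 1
    return 1.0 + d          # MSA-like linear branch: density enrichment, g > 1
```

Both branches give g = 1 at d = 0, so the closure is continuous there. The linear branch grows far more slowly than an exponential in regions of strong attraction, which is consistent with the numerical stability and the lowered associative peaks noted in the text.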
The excess chemical potentials obtained from the 3D-RISM-KH theory have a qualitative relation with the experimental solvation free energies, and a more quantitative measure can be obtained by careful calibration using the so-called "universal correction" scheme [24] and the partial molar volume (PMV, Ṽ) computed by extending the RISM formalism with the Kirkwood−Buff theory and the isothermal compressibility (χ_T).
The objective of this manuscript is to first establish a 3D-RISM-KH-based computational protocol to describe the liquid state of DED. This theoretical framework is then validated against traditional molecular dynamics (MD) simulations. The 3D-RISM-KH-based calculations are then extended to calculate excess chemical potentials of 48 solutes in DED and in water. These excess chemical potentials are then used as molecular solvation descriptors, together with other 2D descriptors, to develop a quantitative structure partitioning model using machine learning techniques. This work serves as a proof of concept for successful application of the 3D-RISM-KH solvation energy descriptors in predictive modeling of the decadiene−water partitioning of small molecules.

Materials and Methods
Database preparation: The experimental DED/Water partition coefficients of small molecules (K_DED/W) were collected from the work of Nitsche and co-workers [14]. For the logK_DED/W data, the reported standard errors in K_DED/W are ignored. This dataset is also a part of those reported by Abraham et al. [12].
Molecular dynamics (MD) simulations: The MD simulations of liquid DED were performed using the GROMACS software package [25]. The molecule was parameterized using the all-atom OPLS and CHARMM force fields. The OPLS parameters were generated from the LigParGen server with the 1.14*CM1A-LBCC charge assignment protocol [26][27][28]. The CHARMM parameters were generated using the SWISSPARAM webserver [29,30]. Additionally, the united atom GROMOS parameters, obtained from the Automated Topology Builder webserver [31][32][33], were used for the MD simulations of DED. For all the liquid phase simulations, a homogeneous cubic simulation box with 256 solvent molecules was generated. The initial energy-minimized solvent box was subjected to 500 ps NVT and NPT equilibration without any constraints under periodic boundary conditions. The target temperature was set to 298 K and the target pressure to 1 bar using the Berendsen thermostat. The temperature and density profiles were used to judge the adequacy of the equilibration steps. The final production runs were 5 ns long. All the radial distribution functions were calculated from the production simulation trajectories using the built-in functions of the GROMACS package.
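The size of a 256-molecule cubic box follows directly from the liquid density, and the same number density is the kind of input a RISM calculation requires. The helper below is a sketch under stated assumptions. The molar mass is computed for C10H18 from standard atomic masses, the function names are hypothetical, and the density must be supplied by the user, for example from the equilibrated MD run, because no value is quoted in the text.

```python
AVOGADRO = 6.02214076e23           # 1/mol
M_DED = 10 * 12.011 + 18 * 1.008   # g/mol for 1,9-decadiene, C10H18

def cubic_box_edge(n_molecules, molar_mass, density):
    """Edge length (in Å) of a cubic box holding n_molecules at `density` g/cm^3."""
    volume_cm3 = n_molecules * molar_mass / (AVOGADRO * density)
    return (volume_cm3 * 1.0e24) ** (1.0 / 3.0)  # 1 cm^3 = 1e24 Å^3

def number_density(n_molecules, edge):
    """Molecular number density (in 1/Å^3) of a cubic box with the given edge in Å."""
    return n_molecules / edge ** 3

# Example call; 0.75 g/cm^3 is a placeholder density, not a value from this work.
edge = cubic_box_edge(256, M_DED, 0.75)
```

Comparing the edge obtained from the equilibrated density with the final NPT box dimensions is a quick sanity check on the equilibration step.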
Three-dimensional RISM-KH calculations: The lowest energy conformation of each solute, generated using the OpenBabel toolkit with the MMFF94 force field, was used for all the RISM calculations [34]. The 3D-RISM-KH-based excess chemical potential and partial molar volume (used as descriptors in the prediction) were calculated for all the solutes using our in-house 3D-RISM-KH code, a working version of which is implemented in the AMBERTOOLS suite of programs [35]. We used the Transferable Potentials for Phase Equilibria (TraPPE) family of force fields of Siepmann and co-workers for the 1,9-decadiene molecule [36,37]. This is a united atom force field with no charges on the carbon sites. The extended-RISM (X-RISM) formalism was used for calculating the susceptibility functions of the DED molecule. The geometry of the DED molecule was used after optimization with the ANTECHAMBER module of AMBERTOOLS. For susceptibility calculations of the water solvent, the dielectrically consistent RISM (DRISM) formalism was used with the modified SPCe force field parameters [38]. The solute force field parameters are summarized in Figure 1.
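For reference, the two 3D-RISM-KH descriptors named above are commonly evaluated from the converged correlation functions with the closed-form expressions below. These are the standard literature forms, given as an assumption about what the calculation evaluates rather than as a transcription of the code. Here ρ_γ is the number density of solvent site γ, Θ is the Heaviside step function, k_B T is the thermal energy, and χ_T is the isothermal compressibility of the pure solvent.

```latex
% Excess chemical potential in the KH approximation
\mu^{\mathrm{ex}} = k_B T \sum_{\gamma} \rho_\gamma \int d\mathbf{r}
  \left[ \tfrac{1}{2}\, h_\gamma^{2}(\mathbf{r})\, \Theta\!\left(-h_\gamma(\mathbf{r})\right)
       - c_\gamma(\mathbf{r})
       - \tfrac{1}{2}\, h_\gamma(\mathbf{r})\, c_\gamma(\mathbf{r}) \right]

% Partial molar volume from the Kirkwood-Buff theory
\tilde{V} = k_B T\, \chi_T
  \left[ 1 - \sum_{\gamma} \rho_\gamma \int d\mathbf{r}\, c_\gamma(\mathbf{r}) \right]
```

Both quantities are integrals over the 3D grid of functions that are already available once the solver converges, so they add little cost to the calculation.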
We employed the generalized Amber force field (GAFF) parameters with the AM1-BCC charges for all the solutes [39,40]. The 3D-RISM-KH calculations for the solute molecules were performed using a uniform cubic 3D grid of 128 × 128 × 128 points in a box of size 64 × 64 × 64 Å³ to represent a solute with a few solvation layers. The convergence accuracy was set to 10⁻⁵ in the modified direct inversion in the iterative subspace (MDIIS) solver. The excess chemical potentials (exchem) of the solutes in each solvent were used as additional descriptors for the QSAR models. All the solutes were treated in their neutral form, unless otherwise mentioned in the discussion section.
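The grid settings quoted above fix the spatial resolution of the correlation functions. The short sketch below works out the spacing and grid size and estimates how much solvent surrounds a centred solute. The function names are hypothetical, and the 10 Å solute extent in the example is an illustrative value rather than a property of any solute in the dataset.

```python
def grid_spacing(box_length, n_points):
    """Spacing (in Å) of a uniform grid with n_points along a box edge of box_length Å."""
    return box_length / n_points

def buffer_to_box_face(box_length, solute_extent):
    """Distance (in Å) from the surface of a centred solute to the nearest box face."""
    return 0.5 * (box_length - solute_extent)

spacing = grid_spacing(64.0, 128)          # 0.5 Å for the settings in the text
total_points = 128 ** 3                    # grid points per correlation function
buffer = buffer_to_box_face(64.0, 10.0)    # 10 Å extent is a hypothetical solute size
```

With a 0.5 Å spacing and a buffer of this order, the box accommodates the few solvation layers mentioned in the text around a small solute.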
Two-dimensional molecular descriptor generation: Molecular descriptors were generated from the corresponding SMILES strings of the solute molecules using the publicly available PaDEL-Descriptor software [41].
Machine learning and statistical modeling [42,43]: The machine learning predictive models for molecular partitioning were developed with the above-generated molecular descriptors. The statistical importance analysis of the descriptors, the machine learning calculations, and the performance indices of the models were obtained with the "Extreme Gradient Boosting" (XGBoost) and random forest (RF) techniques, applied successively, with the parameters optimized to reduce the root mean square error (RMSE). The training set used in the machine learning protocols contained 80% of the dataset, chosen at random.
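The random 80/20 partition and the RMSE metric described above can be sketched with the Python standard library alone. This is an illustration, not the script used in this work. The seed, the truncation used to round the training-set size, and the function names are all assumptions.

```python
import math
import random

def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Randomly partition range(n_samples) into sorted train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # seeded for reproducibility
    n_train = int(train_fraction * n_samples)  # truncation is an assumed convention
    return sorted(indices[:n_train]), sorted(indices[n_train:])

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

train_idx, test_idx = split_indices(48)  # 48 solutes in the dataset
```

With so few compounds the test set is small, so a single random split can give a noisy RMSE. Repeating the split with several seeds is one way to gauge that variability.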

Results and Discussion
In the following sections, we compare and contrast the results of the RISM-KH calculations for liquid DED with the MD simulations using both the all-atom and united atom versions of the current generation force fields. Subsequently, we show the applicability of the 3D-RISM-KH protocol developed herein to predicting DED−Water partitioning of a diverse set of compounds.
The lack of experimental data on the liquid state of DED made it difficult to assess the accuracy of the simulation results. There are only a handful of reports in the chemical literature on molecular simulations involving DED in the context of a membrane mimic [10][44][45][46]. The dielectric constant of DED (2.16) was adopted from the work of Lomize et al. [45]. For the MD simulations with different force fields, the equilibration step yielded a reasonable density of the system (Table 1). The intermolecular separations, as observed from the radial distribution functions (RDFs) for the alkene sites (1 and 2 in Figure 1) and the saturated CH2 centers (3 in Figure 1), are consistent among the different force fields used (Table 2). The DED molecules are compactly arranged in the liquid state, which is an excellent property for mimicking a lipid membrane. As evident from the relatively short intermolecular separations of the saturated sites (~2.5-3.9 Å, depending on the force field), the core of the liquid structure is more compact than the terminal parts. It is possible that an aliphatic π-interaction between the terminal alkene groups of DED molecules exists in the liquid state. For ethylene dimers, the intermolecular distances were reported to be ~3.8 Å from gas phase geometry optimizations at the coupled cluster level with correlation consistent basis sets [46]. The partial distribution functions from the RISM-KH calculations with the TraPPE parameters for liquid DED are qualitatively similar to those from the MD simulations, but with certain deviations. For instance, the first maxima for the alkene sites consist of two overlapping peaks. These terminal groups are packed closer than in the MD simulations (Figure 2). The second maxima of these distribution functions appear at distances slightly shorter than those from the MD simulations, although qualitatively in a similar region for both types of calculations.
The alkane groups are less tightly packed in the RISM-KH calculations.
a Solvent sites are provided in Figure 1. C1 is the terminal H2C= site (1), C2 is the =CH− site (2), and CH2 is the saturated alkane site (3).
To calculate the DED/Water partition coefficients, we used the DED susceptibility function calculated using the TraPPE parameters and the water susceptibility functions with the modified SPCe parameters using the RISM formalism.
The excess chemical potentials calculated from the 3D-RISM-KH theory for the DED and water media, as well as the partial molar volumes (PMVs) of the solutes in the two solvents, were used as molecular descriptors. Molecular polarity is an important factor for partitioning between two solvents of opposite polarity. Hence, we have used the topological polar surface area (TopoPSA) [47] and the polarity indices calculated from the connectivity table of the molecule (apol, bpol). The numbers of hydrogen bond donors and acceptors in the solute molecules (calculated based on Lipinski's convention) were also incorporated in the initial calculations. The hybridization ratio of the solutes was another structural descriptor. All the statistical manipulations and machine learning methods were carried out using standard Python implementations. A sample set of scripts for the XGBoost, linear regression, and random forest methods is provided in the GitHub link. The target function for all the machine learning models was logK_DED/W, collected from the work of Nitsche and coworkers [14]. The most important parameters for predicting the partition coefficients are TopoPSA and the excess chemical potentials in the DED and water media. The polarity index apol also has a positive effect on predictive power. The parameters with the least influence in the prediction scheme are bpol, the hydrogen bond donor/acceptor counts, the PMVs, and the hybridization ratio (Figure 3).
The 3D-RISM-KH descriptors (excess chemical potentials and PMVs in decadiene and water) are used with the aforementioned important 2D descriptors for quantitative structure activity modeling. The XGBoost method yielded an overall root mean square error (RMSE) of 1.09 units for the test set.
Prediction of the entire dataset by XGBoost yielded an RMSE of 0.05 units. The small error in this prediction could be an effect of overfitting due to the small dataset. The random forest (RF) predictions yielded RMSEs of 1.14 and 0.68 units for the test set and the whole dataset, respectively. Application of the simple linear regression model resulted in larger RMSEs, 1.23 and 0.92 units for the test set and the whole dataset, respectively. In order to judge the performance gain in predicting partition coefficients from incorporating the 3D-RISM-KH computed solvation descriptors, we have built predictive models using only the 2D molecular descriptors, from here on denoted as the XGB-2D-Descriptor Model. This model also yielded excellent predictions (RMSEs of 0.13 and 1.04 units for the whole dataset and the test set, respectively), albeit with a higher whole-dataset RMSE than that computed by the XGBoost method with the 3D-RISM-KH solvation descriptors. The calculated partition coefficients and the statistical correlation with experimental data of the partition coefficients computed by the XGBoost and random forest machine learning models are provided in Table 3 and Figure 4. The statistical correlations of these machine learning models for the test set and the whole dataset of 48 compounds are provided in Table 4. Earlier studies by Abraham and coworkers [12] and by Nitsche and coworkers [13] also reported excellent predictive models with empirical descriptors.
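When test-set and whole-dataset RMSEs are reported side by side, it helps to recall how the two are related if a single fitted model is scored on both. The sketch below gives that pooling identity and the lower bound it implies. The function names are hypothetical, and the subset sizes must be supplied by the user.

```python
import math

def pooled_rmse(rmse_a, n_a, rmse_b, n_b):
    """RMSE over the union of two disjoint subsets scored by the same fitted model."""
    total_squared_error = n_a * rmse_a ** 2 + n_b * rmse_b ** 2
    return math.sqrt(total_squared_error / (n_a + n_b))

def min_pooled_rmse(rmse_test, n_test, n_total):
    """Lower bound on the pooled RMSE, reached when the remaining points are fit exactly."""
    return rmse_test * math.sqrt(n_test / n_total)
```

The identity applies only when one fitted model is evaluated on both subsets. The text does not state whether the whole-dataset RMSEs come from the same fitted models as the test-set RMSEs or from models refit on all 48 compounds, so the sketch is offered as a consistency check and not as a description of how the reported values were obtained.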
Physchem 2021, 1, FOR PEER REVIEW
a PubChem CID of the solutes [48]. Molecules in the test set are marked with an asterisk (*). b Experimental logK_DED/W. c logK_DED/W computed using the XGBoost method. d logK_DED/W computed using the random forest method. e logK_DED/W computed using the multiple linear regression model.

Conclusions
The 3D-RISM-KH molecular solvation theory was first used to generate suitable solvent susceptibility functions of liquid 1,9-decadiene. The choice of the force field, viz. TraPPE, was guided by the nonpolar nature of DED and also by the fact that it reduces the number of solvent sites with different atomic parameters, thus helping the convergence of the MDIIS solver used for the RISM calculations. The XRISM-KH computed partial distribution functions showed qualitative agreement with the radial distribution functions obtained from the all-atom and united atom MD simulations. Some differences are observed in the nature of the molecular packing in liquid DED between the RISM and MD methods. The RISM calculations provide a more compact ordering of the terminal region than what was observed in the MD simulation data. In the absence of experimental results, it is not possible to comment further on these differences. The DED/Water partition coefficients for 48 solutes were predicted quantitatively using the excess chemical potentials in the DED and water solvents as descriptors, together with a few other 2D molecular descriptors. Amongst the three machine learning models, the XGBoost method provided the best performance, followed by the random forest and multiple linear regression methods. The benzoic acid system is an outlier in the XGBoost method. The DED/Water solvent combination is regarded as a mimic of lecithin/water permeation of molecules, and hence, it is useful in drug development applications. The 48 solutes used in this study cover a broad range of chemical functionality, including peptide bond analogs, and so a broad applicability of this quantitative prediction scheme is anticipated. In summary, the present work serves as a proof of concept of successful application of the 3D-RISM-KH calculated excess chemical potentials of solutes in the 1,9-decadiene and water solvents as descriptors in predicting decadiene-water partitioning.