Assessment of the SM12, SM8, and SMD Solvation Models for Predicting Limiting Activity Coefﬁcients at 298.15 K

: The SM x ( x = 12, 8, or D) universal solvent models are implicit solvent models which using electronic structure calculations can compute solvation free energies at 298.15 K. While solvation free energy is an important thermophysical property, within the thermodynamic modeling of phase equilibrium, limiting (or inﬁnite dilution) activity coefﬁcients are preferred since they may be used to parameterize excess Gibbs free energy models to model phase equilibrium. Conveniently, the two quantities are related. Therefore the present study was performed to assess the ability to use the SM x universal solvent models to predict limiting activity coefﬁcients. Two methods of calculating the limiting activity coefﬁcient where compared: 1) The solvation free energy and self-solvation free energy were both predicted and 2) the self-solvation free energy was computed using readily available vapor pressure data. Overall the ﬁrst method is preferred as it results in a cancellation of errors, speciﬁcally for the case in which water is a solute. The SM12 model was compared to both UNIFAC and MOSCED. MOSCED was the highest performer, yet had the smallest available compound inventory. UNIFAC and SM12 exhibited comparable performance. Therefore further exploration and research should be conducted into the viability of using the SM x models for phase equilibrium calculations.


Introduction
Limiting (or infinite dilution) activity coefficients are of particular interest both in their theoretical and practical applications. Theoretically, they shed insight into the type and strength of the interactions between the solute and solvent molecules in a mixture and serves as a good measure of a mixture's non-ideality. This theoretical understanding provides the framework that directly impacts design schemes for chemical processes [1]. The applications are vast for they are used in the design of separation equipment, solvent selection, the design of pharmaceuticals, stripping operations, leaching operations, and the extraction of dilute materials [2,3]. Previous studies have also shown that activity coefficients can be used to predict the formation of an azeotrope [4][5][6][7][8]. The equilibrium composition of a dilute mixture can be also be determined, thus having industrial applications in azeotropic distillation and high-purity extraction [9]. This can serve as a means for process intensification in the environmental sphere, since environmental engineering addresses the implications of a chemical very dilute in a phase, often requiring dilute separation processes [10]. For example, limiting activity coefficients can be used to determine partition coefficients for a dilute species between two phases, which is critical in predicting the bio-accumulation of long-lived chemicals in the environment and food chain [2]. Furthermore, limiting activity coefficients of each component in a binary mixture can be used to parameterize binary interaction excess Gibbs free energy models to make phase equilibrium calculations over an entire range of concentrations [8,[11][12][13]. These parameterizations are particularly advantageous for maximizing the efficiency of separation processes, which comprise a large majority of chemical plant operations and costs.
Limiting activity coefficients can be experimentally determined through a variety of methods including: gas chromatography, gas stripping, ebulliometry, and the dew point method [9,14]. However when there is little or no experimental data present, predictive models and tools can be implemented. Some popular methods include group contributions methods, solubility parameter based methods, molecular dynamic simulations and electronic structure calculations. The Universal Quasichemical Functional-group Activity Coefficients (UNIFAC) method is a group contribution method designed by breaking molecules into functional groups and calculating the configuration energy of the molecule as the sum of group interaction energies [15,16]. Similarly, mod-UNIFAC (Dortmund) is a revision of the original UNIFAC model that accounts for the temperature dependence of activity coefficients by including temperature-dependent interaction parameters within the model [17,18]. Solubility parameter based methods such as the modified separation of cohesive energy density (MOSCED) [8,19,20] and Hansen solubility parameter (HSP) methods [21], while they limit the user to modeling systems for which solute parameters are available, they give molecular-level insight into the the behavior of a binary system. Molecular dynamic simulations and electronic structure calculations can be computationally rigorous, yet are advantageous for they only require the input of a molecular structure [22][23][24][25].
The pursuit of accurate, efficient predictive models is vital as it could reduce the experimental and design cost associated with chemical processes. This is of particular interest in separation processes, as they may account for 40-70 percent of the capital and operating cost of a chemical plant [26]. Predictive computational tools have become an essential asset in conducting feasibility and comparative studies, as they are significantly quicker and more cost-efficient than experimental determinations [27]. They are necessary when property data is unavailable, specifically in the case of dealing with new products and compounds. However within computational advancement, there is a risk in losing the intuitive molecular understanding of the processes that lead to observed outcomes. Thus, including thermodynamic parameters such as limiting activity coefficients in the computational process provides molecular insight that could greatly assist in resultant recommendations for industrial design [10].
This study seeks to assess the capability of the SMx (x= 12, 8, or D) universal solvent models for predicting limiting activity coefficients [28][29][30]. The SMx universal solvent models are implicit or continuum solvent models that can compute solvation free energies at 298.15 K using electronic structure calculations. The models have long been under active development with their accuracy continuously improving and are implemented in a number of software packages [28,[31][32][33]. The only input needed from the user is a structure of the molecule; therefore the SMx models are set apart from other methods in the sheer simplicity for the user. Previous work to use the SMx (x= 12, 8, or D) universal solvent models to predict limiting activity coefficients is limited. Recently, Lisboa and Pliego [34] used SMD to compute 46 limiting activity coefficients. In the present study, we compute over 3000 limiting activity coefficients with comparison to reference data, for each SM12, SM8 and SMD. In addition, we compare two methods of calculating the limiting activity coefficient: 1) The solvation free energy and self-solvation free energy were both predicted and 2) the self-solvation free energy was computed using readily available vapor pressure data. To better understand the two methods, we compare the individual contributions of the solvation and self-solvation free energy to available reference data. In general, there is not a significant difference between the limiting activity coefficients computed using the two methods. However, for the cases in which water is a solute, the results are significantly better when both the solvation and self-solvation free energy are predicted using SMx. This results from a cancellation of errors in the two terms. We therefore recommend that in general SMx be used to predict both the solvation and self-solvation free energy when computing limiting activity coefficients.

Method
Performing electronic structure calculations in the SMx (x= 12, 8, or D) universal solvent model, one calculates the solvation free energy of a solute (component i) at infinite dilution in a solvent (component j), ∆G solv i,j , where i ={1 or 2} and j ={1 or 2} [28][29][30]35,36]. The solvation free energy in this context is defined as taking a solute from a non-interacting ideal gas state to solution at the same molecular density (or concentration). We have shown previously that the solvation free energy is readily related to the limiting activity coefficient (γ ∞ i,j ) as [24,[37][38][39][40][41][42]: where ∆G self i,i is the solute "self"-solvation free energy, R is the molar gas constant, and v i and v j correspond to the pure liquid molar volume of component i and j, respectively. When i = j we obtain the correct limiting behavior that γ ∞ i,j = 1. The self-solvation free energy, ∆G self i,i , may readily be calculated by performing a solvation free energy calculation for component i in itself. Alternatively, we can relate ∆G self i,i to the pure liquid fugacity of component i, f 0 i , as [38]: We can expand f 0 i as [13]: where φ sat i and P sat i are the fugacity coefficient and vapor pressure of pure component i at saturation at T, and the term in brackets is the Poynting correction, and accounts for the change in fugacity in going from P sat to P at constant T. If we assume that the vapor phase is an ideal gas and that the Poynting correction is negligible, we have [13,38,43,44]: If we are at low pressures well removed from the critical point, and we have a non-self associating fluid (i.e., no carboxylic acid), use of this expression is generally reasonable. Often times, the pure component property P sat i is known. This presents an opportunity to use reference data to compute ∆G self i,i as: and then using electronic structure calculations in the SMx universal solvent models to compute ∆G solv i,j , which is a property of the mixture.

Computational Details
All solvation free energy calculations were performed in the SMx (x= 12, 8, or D) universal solvent models using QChem 5.1.2 [45,46]. We selected systems by searching through the Minnesota Solvation Database version 2012 [47] for binary systems for which QChem had (SMx) solvent parameters for both components. The Minnesota Solvation Database was selected as our reference for comparison as it was used in the parameterization of SM12 [28], and earlier versions were used in the parameterization of SM8 and SMD [29,30]. The data in the Minnesota Solvation Database is restricted to 298.15 K, and it follows that calculations using the SMx solvent models are restricted to 298.15 K. Also, the Minnesota Solvation Database contains only solvation and transfer free energy data. We computed reference limiting activity coefficients from the solvation free energies using eq. (1), where the self solvation free energy was computed via eq. (5) using reference vapor pressure and molar volume data [48]. Note that iodine and carboxylic acid containing molecules were excluded from our search. We excluded the seven iodine hydrocarbons because SM8 and SMD were not parameterized to treat iodine containing solutes [29,30]. The six molecules containing a carboxylic acid functional group were excluded as they are known to dimerize in the vapor phase [49], and we wish to compare the use of eq. (2) and eq. (4) when computing limiting activity coefficients. Excluding these 13 molecules, QChem contains SMx parameters for 148 additional molecules (solvents).
This resulted in a data set of solvation free energies (and hence limiting activities) for 1140 systems, consisting of 107 unique solutes and 73 unique solvents. The Minnesota Solvation Database contained self-solvation free energies at 298.15 K for only 62 of the 148 possible molecules. We therefore instead computed a set of reference self-solvation free energies for all 148 molecules using eq. (5) with reference vapor pressure and molar volume data [48]. For the 62 molecules for which values were available in the Minnesota Solvation Database, the computed values were found to be consistent. The inclusion of both solvation free energies and self-solvation free energies will allow us to look at the individual contributions to the limiting activity coefficient.
As already mentioned, the Minnesota Solvation Database was used in the parameterization of the SMx universal solvent models. As a secondary test, we curated reference data from Parts 1 to 6 of DECHEMA's "Activity Coefficients at Infinite Dilution" volumes [50][51][52][53][54][55]. We again restricted ourselves to binary systems for which both components were in our set of 148 molecules (solvents) for which QChem contains SMx parameters. This resulted in limiting activity coefficients for an additional 2163 systems, consisting of 117 unique solutes and 77 unique solvents. Note that as compared to the Minnesota Solvation Database wherein there was only a single entry for each unique system (i.e., solute-solvent pair), duplicates were present in DECHEMA. We retained all of the data, including duplicates, in the final set.
As we will discuss, we find that challenges occur in predicting limiting activity coefficients for aqueous systems. We therefore sourced an additional data set composed of organics in water and water in organics at 298.15 K from "Yaws' Handbook of Properties for Aqueous Systems" [56]. Data was available for 121 systems where water is the solvent and 90 systems where water is the solute, with no duplicates.
The calculations proceeded by first obtaining a daylight SMILES [57,58] for each of the 148 molecules from Pubchem [59]. We then generated 3-D structures from the SMILES using Open Babel 2.3.2 [60,61]. Open Babel was then used to perform a systematic conformational search to identify the lowest energy conformer, followed by geometry optimization. The conformational search and geometry optimization were conducted using the General Amber Force Field [62,63] with Gasteiger partial charges [64]. The final structures were then further refined by performing a geometry optimization in vacuum using QChem 5.1.2 at the M06-2X/cc-pVDZ level of theory/basis set [35,65,66]. This final structure was then used for all of the solvation free energy calculations.
To calculate the solvation (including self-solvation) free energies, single point energy calculations were performed in vacuum and in the SM12, SM8 and SMD universal solvent models for each solvent of interest at the M06-2X/6-31G(d) level of theory/basis set using QChem 5.1.2. Using QChem 5.1.2, the calculation of the solvation free energy is handled completely internally by the software. We emphasize that we used the same conformation for the solute in the ideal gas (vacuum) and solution phases, consistent with the SM12, SM8, and SMD parameterizations [28][29][30]. It is possible to consider different configurations for the two phases, and it is possible to incorporate multiple configurations into the calculation of the solvation free energy which may be important for some systems [67]. However, this was not pursued here for simplicity. All of the SMx computed self-solvation free energies (∆G self i,i ), solvation free energies (∆G solv i,j ), and limiting activity coefficients (γ ∞ i,j ), along with the reference values and results of the reference calculations (UNIFAC and MOSCED) to which comparison is made, are tabulated in the supporting information accompanying the electronic version of this manuscript.

Results and Discussion
The central goal of this work is to assess the ability of electronic structure calculations with the SMx (x= 12, 8, or D) universal solvent models to predict limiting activity coefficients. As seen in eq. (1), this is composed of contributions due to the solvation free energy (∆G solv i,j ) and self solvation free energy (∆G self i,i ). We will therefore first assess each individual contribution. We will quantify the agreement between the reference data and the SMx predictions using the root mean square error (RMSE): the average absolute percent deviation (AAPD): and the mean unsigned error (MUE): where the superscript "pred" and "ref" correspond to the predicted and reference values, respectively, Y corresponds to the property of interest, and N corresponds to the number of systems for which comparison is being made.

Self-Solvation Free Energy
We begin with ∆G self i,i . A parity plot of the predicted versus reference values of ∆G self i,i / (RT) and a summary of the computed statistics is provided in fig. 1 and table 1, respectively. The statistics in table 1 are computed for Y = ∆G self i,i / (RT), with the summation carried out over all N = 148 molecules for which reference data is available. Note that we are interested in the dimensionless ∆G self  Table 1. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), average absolute percent deviation (AAPD), and mean unsigned error (MUE) for the values of ∆G self i,i / (RT) predicted using the SM12, SM8, and SMD universal solvent model as compared to the reference data. Overall, we find that SM12 and SM8 outperform SMD, with SM12 performing slightly better than SM8. The top performance of SM12 is in agreement with the recent SM12 parameterization [28]. Visually, it is clear in fig. 1 that SM12 and SM8 offer a noticeably smaller spread in the data as compared to SMD, corresponding to a smaller overall error.
During the parameterization of SM12, it was pointed out that the largest deviation was observed for the self-solvation free energy of water, which was overly negative by 1 to 2 kcal/mol (or 1.7 to 3.4 RT in dimensionless units) using all methods, and the solvation free energy of the water dimer in water was overly negative by 3 to 4 kcal/mol (or 5.1 to 6.8 RT in dimensionless units) using all methods [28]. We likewise observe large errors for the self-solvation free energy of water, although our predictions are slightly more negative. The reference value of ∆G self i,i / (RT) for water is -10.689, while we predict -14.473, -15.485, and -15.024 with SM12, SM8, and SMD, respectively. This would correspond to predicting a vapor pressure that is too small, and that we are over estimating the affinity of water for the liquid phase or equivalently the strength of the water-water interactions.

Solvation Free Energy
Next we consider ∆G solv i,j , making comparison for the 1140 systems, consisting of 107 unique solutes and 73 unique solvents, from the Minnesota Solvation Database. A parity plot of the predicted versus reference values of ∆G solv i,j / (RT) and a summary of the computed statistics is provided in fig. 2 and table 2. The statistics in table 2 are computed for Y = ∆G solv i,j / (RT), with the summation carried out over all N = 1140 systems for which reference data is available.  Table 2. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), average absolute percent deviation (AAPD), and mean unsigned error (MUE) for the values of ∆G solv i,j / (RT) predicted using the SM12, SM8, and SMD universal solvent model as compared to the reference data from the Minnesota Solvation Database. We again find that SM12 is the top performing model, although the difference in performance between the models is much smaller for ∆G solv i,j as compared to ∆G self i,i . Moreover, we find that except for the AAPD, the overall errors are all improved for ∆G solv i,j relative to ∆G self i,i . Comparing figs. 1 and 2, there appears to be a smaller spread in the ∆G solv i,j data, which would correspond to the smaller RMSE and MUE. This at first had us surprised. In theory, ∆G self i,i corresponds to a single component, homogeneous system. On the other hand, ∆G solv i,j corresponds to a single solute infinitely dilute in a solvent. In this case we need to account for solute-solvent and solvent-solvent interactions which are not the same. We would therefore assume that the latter case is more difficult to predict and would result in a lower accuracy. However, this was not the case. We do not offer an explanation for this difference, but would like to point out two differences in the reference data sets. 1) Our data set for ∆G solv i,j consists of 1140 systems, while we have just 148 values of ∆G self i,i . And of the 148 values of ∆G self i,i , only 62 were included in the Minnesota Solvation Database. 2) Comparing figs. 1 and 2, we see there is a difference in the range of values in each data set. While a majority of the values of ∆G self i,i / (RT) are over the range -15 to -5, there is an appreciable number of values of ∆G solv i,j / (RT) greater than -5. We also point out that this last point contributes to the larger AAPD for ∆G solv i,j . The computed errors and top performance of SM12 are consistent with those published in the recent SM12 parameterization [28].
Considering again the original SM12 parametrization, it was found that the errors for the solvation free energy (both aqueous and non-aqueous) for the solute class containing molecular hydrogen, ammonia, water, and the water dimer were relatively large compared to the other solute classes [28]. As already mentioned, the original SM12 parameterization pointed out a particular challenge with the self-solvation free energy of water which was predicted to be overly negative. To explore this further, we broke down ∆G solv i,j to cases where water was the solvent (j =water), water was the solute (i =water), and when neither the solvent or solute was water. Note that ∆G self i,i was not include in the set of ∆G solv i,j . This resulted in 104 systems wherein water was the solvent, 14 systems wherein water was the solute, and 1022 systems wherein water was neither the solvent or solute. A parity plot of the predicted versus reference values of ∆G solv i,j / (RT) with SM12 wherein these three cases are differentiated is provided in fig. 3. We find that in general when water is the solute, ∆G solv i,j is predicted to be overly negative. We find that ∆G solv i,j / (RT) is overly negative on average by 3.4, 3.5, and 2.4 (dimensionless units) when using SM12, SM8, and SMD, respectively. Compare this to the case of ∆G self i,i / (RT) for water predicted to be overly negative by 3.8, 4.8, and 4.3 (dimensionless units) for SM12, SM8, and SMD, respectively.

Limiting Activity Coefficients
When predicting limiting activity coefficients (γ ∞ i,j ) with the SMx (x= 12, 8, or D) universal solvent models, two different strategies were evaluated. 1) We predicted both ∆G self i,i and ∆G solv i,j , then using reference values of the molar volume [48], γ ∞ i,j was computed via eq. (1). 2) In the second approach, since the pure component vapor pressure (P sat i ) is commonly available [48], we used this reference data to compute ∆G self i,i via eq. (5). We then combined this reference value of ∆G self i,i with our SMx predicted value of ∆G solv i,j to predict γ ∞ i,j . When computing statistics, for AAPD we use Y = γ ∞ i,j . For all other cases (R 2 , RMSE, and MUE) we use Y = ln γ ∞ i,j .

Minnesota Solvation Database
Limiting activity coefficients (γ ∞ i,j ) were computed for the same 1140 systems from the Minnesota Solvation Database for which ∆G solv i,j was computed. Reference values of γ ∞ i,j were computed via eq. (1) using the reference values of ∆G solv i,j from the Minnesota Solvation Database, the reference values of ∆G self i,i already described, and reference values of the molar volume [48]. A parity plot of the predicted versus reference values of ln γ ∞ i,j is provided in fig. 4. In the top pane results are shown when we predict both ∆G self i,i and ∆G solv i,j , and the bottom pane shows the case when ∆G self i,i was computed using reference data.   [47]. Predictions are made using the SM12, SM8, and SMD universal solvent model, as indicated. The blue lines corresponds to the y = x line, and the dashed blue lines correspond to ±1.1 and are drawn as a reference; the value of 1.1 was used based on the RMSE for the SM12 predictions. In the top pane we present results when ∆G self i,i was predicted using the SM12, SM8, or SMD universal solvent model. In the bottom pane we present results when ∆G self i,i was computed using reference vapor pressure data.
In general, there does not appear to be a significant difference between the predictions wherein ∆G self i,i was predicted or computed using reference data. The spread in results does appear to decrease when ∆G self i,i is computed using reference data, which may reflect the spread in the predicted values of ∆G self i,i (see fig. 1). A comparison of the statistics is provided in table 3. Table 3. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), and mean unsigned error (MUE) for ln γ ∞ i,j , and the average absolute percent deviation (AAPD) for γ ∞ i,j predicted using the SM12, SM8, and SMD universal solvent model as compared to the reference data from the Minnesota Solvation Database. For the predictions, we compare the case where 1) ∆G self i,i is predicted using the SM12, SM8, and SMD universal solvent model, and where 2) ∆G self i,i is computed using reference data, as indicated. Again, while the difference in the two sets of predictions does not appear to be large, in all cases we find that R 2 increases and the computed overall errors decrease when using reference values of ∆G self i,i . One might expect this result. Consider eq. (1) and the errors summarized in tables 1 to 3, where we will take RMSE as a measure of the error in the predictions. When using reference values of ∆G self i,i , the only predicted quantity in eq. (1) is ∆G solv i,j . As a result, we find for this case that the RMSE for the predicted values of ln γ ∞ i,j is identical to that for ∆G solv i,j / (RT). Next, consider the case of when ∆G self i,i is predicted. Using propagation of error, we would expect the error (taken to be RMSE) in ln γ ∞ i,j to be equal to RMSE 2 solv + RMSE 2 self , where RMSE solv and RMSE self are the RMSE for the predicted values of ∆G solv i,j / (RT) and ∆G self i,i / (RT), respectively. This would result in an RMSE of 1.893, 1.970, and 2.170 for SM12, SM8 and SMD, respectively. However, we find that the actual error is less than the predicted error. The expression used assumes that the error is random [68]. The smaller observed uncertainty suggests the presence of systematic error.
With this in mind, let us again look at the specific case of water. We found earlier that when water is the solute, on average ∆G solv i,j / (RT) is predicted to be overly negative by 3.4, 3.5, and 2.4 (dimensionless units) when using SM12, SM8, and SMD, respectively. Further, ∆G self i,i / (RT) for water is predicted to be overly negative by 3.8, 4.8, and 4.3 (dimensionless units) for SM12, SM8, and SMD, respectively. From eq. (1), this leads to a convenient cancellation of errors in the calculation of ln γ ∞ i,j . For the case of SM12, using the predicted value of ∆G self i,i , this would suggest that on average the value of ln γ ∞ i,j is overly positive by just 0.4. On the other hand, using the reference value of ∆G self i,i , the corresponding value would be overly negative by 3.4. This is illustrated for SM12 in fig. 5 where the predictions were again broken down to cases where water was the solvent (j =water), water was the solute (i =water), and when neither the solvent or solute was water.  Figure 5. Parity plot of the predicted versus reference values of ln γ ∞ i,j using the SM12 universal solvent model for the case where water was the solvent, water was the solute, and where water was neither the solvent or solute, as indicated. The blue lines corresponds to the y = x line, and the dashed blue lines correspond to ±1.1 and are drawn as a reference; the value of 1.1 was used based on the RMSE for the SM12 predictions. The reference values are from the Minnesota Solvation Database. In the top pane we present results when ∆G self i,i was predicted using the SM12 universal solvent model. In the bottom pane we present results when ∆G self i,i was computed using reference vapor pressure data.
Again, in general we find that the predicted values of ln γ ∞ i,j do not change substantially when ∆G self i,i was predicted or computed using reference vapor pressure data. However, when water is the solute, there is a clear difference. This would suggest that when water is the solute of interest, both ∆G solv i,j and ∆G self i,i should be computed consistently in the calculation of ln γ ∞ i,j . In fig. 5 we additionally point out that we observe an apparent break in our trend for reference values of ln γ ∞ i,j > 9. For all cases when the reference values of ln γ ∞ i,j are greater than 9, we under-predict the value of ln γ ∞ i,j . These would correspond to highly non-ideal (unfavorable) systems. For all 26 cases, water is the solvent. A possible source of error could be due to the difficulty of experimentally measuring such non-ideal values [69].

DECHEMA
Next, as a secondary test, values of γ ∞ i,j at 298.15 K were computed for 2163 systems, consisting of 117 unique solutes and 77 unique solvents, from Parts 1 to 6 of DECHEMA's "Activity Coefficients at Infinite Dilution" volumes [50][51][52][53][54][55]. A parity plot of the predicted versus reference values of ln γ ∞ i,j is provided in fig. 6. In the top pane of fig. 6 results are shown when we predict both ∆G self i,i and ∆G solv i,j , and the bottom pane shows the case when ∆G self i,i was computed using reference data. A summary of the statistics is provided in table 4. In the top pane we present results when ∆G self i,i was predicted using the SM12, SM8, or SMD universal solvent model. In the bottom pane we present results when ∆G self i,i was computed using reference vapor pressure data. Table 4. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), and mean unsigned error (MUE) for ln γ ∞ i,j , and the average absolute percent deviation (AAPD) for γ ∞ i,j predicted using the SM12, SM8, and SMD universal solvent model as compared to the reference data from DECHEMA. For the predictions, we compare the case where 1) ∆G self i,i is predicted using the SM12, SM8, and SMD universal solvent model, and where 2) ∆G self i,i is computed using reference data, as indicated. We again breakdown the predictions to cases where water was the solvent (j =water), water was the solute (i =water), and when neither the solvent or solute was water, and plot the results for SM12 in fig. 7. The breakdown resulted in 437 systems where water is the solvent, 52 systems where water is the solute, and 1668 systems where water was neither the solvent or solute.  Figure 7. Parity plot of the predicted versus reference values of ln γ ∞ i,j using the SM12 universal solvent model for the case where water was the solvent, water was the solute, and where water was neither the solvent or solute, as indicated. The blue lines corresponds to the y = x line, and the dashed blue lines correspond to ±1.1 and are drawn as a reference. The reference values are from DECHEMA. In the top pane we present results when ∆G self i,i was predicted using the SM12 universal solvent model. In the bottom pane we present results when ∆G self i,i was computed using reference vapor pressure data.
In general, our observations are the same as when we compared to the Minnesota Solvation Database. First, there again does not appear to be a significant difference between the predictions wherein ∆G self i,i was predicted or computed using reference data. However, for the specific case of water as a solute, there is a noticeable difference. When ∆G self i,i was computed using reference vapor pressure data, ln γ ∞ i,j when water was a solute was under-predicted on average by 3.02, 3.68, and 2.84 with SM12, SM8, and SMD, respectively. For of theses cases, the value of ln γ ∞ i,j was always under-predicted except for water in tetrachloroethene with SMD. Additionally, we again notice an abrupt deviation in our trend for reference values of ln γ ∞ i,j greater than 9. As before, all of these highly non-ideal (unfavorable) systems, water was the solvent. Interestingly, the value of R 2 does increase. However, we are dealing with a larger data set that does contain duplicates.

Yaws' Aqueous Systems
Having identified challenges in predicting ln γ ∞ i,j for aqueous systems, we sourced an additional data set composed of organics in water and water in organics at 298.15 K from "Yaws' Handbook of Properties for Aqueous Systems" [56]. Data was available for 121 systems where water is the solvent and 90 systems where water is the solute, with no duplicates. A parity plot of the results for SM12 is provided in fig. 8. In the top pane of fig. 8 results are shown when we predict both ∆G self i,i and ∆G solv i,j , and the bottom pane shows the case when ∆G self i,i was computed using reference data. A summary of the statistics is provided in table 5, broken down into the case where water is the solvent and water is the solute.  Figure 8. Parity plot of the predicted versus reference values of ln γ ∞ i,j using the SM12 universal solvent model for the case where water was the solvent and water was the solute, as indicated. The blue lines corresponds to the y = x line, and the dashed blue lines correspond to ±1.1 and are drawn as a reference. The reference values are from "Yaws' Handbook of Properties for Aqueous Systems" [56]. In the top pane we present results when ∆G self i,i was predicted using the SM12 universal solvent model. In the bottom pane we present results when ∆G self i,i was computed using reference vapor pressure data. Table 5. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), and mean unsigned error (MUE) for ln γ ∞ i,j , and the average absolute percent deviation (AAPD) for γ ∞ i,j predicted using the SM12, SM8, and SMD universal solvent model as compared to the reference data from "Yaws' Handbook of Properties for Aqueous Systems" [56]. For the predictions, we compare the case where 1) ∆G self i,i is predicted using the SM12, SM8, and SMD universal solvent model, and where 2) ∆G self i,i is computed using reference data, as indicated. The results are additionally separated into case where water was the solvent and where water was the solute, as indicated. The results here are consistent with the other data sets. Considering first ln γ ∞ i,j for the case of water as a solute when ∆G self i,i was computed using reference vapor pressure data. For all cases with SM12 and SM8 ln γ ∞ i,j is under predicted with an average error of 3.485 and 3.397, respectively. With SMD, ln γ ∞ i,j is under predicted for 85 of the 90 systems, with an average error of 2.834. On the other hand, when ∆G self i,i was also predicted, with SM12, SM8, and SMD the number of systems for which ln γ ∞ i,j was under predicted was 22, 13, and 17, respectively. And most importantly, with SM12, SM8, and SMD, the overall MUE noticeably decreased.
When water is the solvent, we find that with SM12, SM8, and SMD, the value of R 2 increases, and the value of the MUE and RMSE decreases for ln γ ∞ i,j . The difference in behavior results from a systematic error when water is the solute, where when ∆G self i,i for water is also predicted, we obtain a convenient cancellation of errors. We note that we find the values of AAPD to be less informative as the value can be dominated by just a few systems.

Free SMx γ ∞
i,j Calculator To facilitate the use of the SM12, SM8, and SMD solvation models to predict γ ∞ i,j and ultimately phase equilibrium, we are actively developing HTML (HyperText Markup Language) based software that can run in a web browser. The software is currently housed on GitHub at: https://github.com/ SpencerSabatino/SolvationFreeEnergyWebsite. For the 148 molecules for which QChem 5.1.2 [45,46] has SMx parameters (which excludes iodine and carboxylic acid containing molecules), the user can select any of the 148 molecules to form a binary pair and compute ∆G self i,i , ∆G solv i,j , and γ ∞ i,j for each component. The QChem 5.1.2 calculation results used by the software were generated in the present study. The software will continue to be developed with additional features added.

Reference Calculations
Overall, we found that the SMx (x= 12, 8, or D) predicted values of ln γ ∞ i,j did not change substantially when ∆G self i,i was predicted compared to when it was computed using reference vapor pressure data, except in the case when water is the solute. Therefore, moving forward the selected method for calculating ln γ ∞ i,j will be to compute ∆G solv i,j and ∆G self i,i consistently; put differently, both ∆G solv i,j and ∆G self i,i will be predicted. To provide further insight into the potential utility of electronic structure calculations with SMx universal solvent models to predict ln γ ∞ i,j , the top performing SM12 was compared to two well-known methods, the Universal Quasichemical Functional-group Activity Coefficients (UNIFAC) method [15,16,70] and the Modified Separation of Cohesive Energy Density (MOSCED) model [19,20,71]. UNIFAC was selected as a comparative method because it is perhaps the most common predictive excess Gibbs free energy method [72]. Additionally, it is widely used and integrated into most process simulators. Brouwer and Schuur [72] recently conducted a comparative study of eight predictive methods to compute ln γ ∞ i,j at 298.15 K. Of the methods evaluated, the solubility parameter method MOSCED was the top performer. This strong performance of MOSCED is consistent with the findings of the 2005 re-parameterization [20,73] and our recent work [8,74]. MOSCED was therefore selected as an additional comparison alongside UNIFAC. Note that for water, we adopted the revised MOSCED parameters from ref. 74 The Minnesota Solvation Database, DECHEMA, and Yaws' reference databases were constrained to match the compound availability provided by the two models, UNIFAC and MOSCED. In comparison to the Minnesota Solvation Database, UNIFAC was able to make predictions for 1065 (of 1140) systems with 99 unique solutes and 68 unique solvents. MOSCED was able to make predictions for 732 (of 1140) systems from the Minnesota Solvation Database with 58 unique solutes and 54 unique solvents. In regards to DECHEMA, UNIFAC was able to model 1988 (of 2163) systems with 112 unique solutes and 70 unique solvents. MOSCED was able to model 1818 (of 2163) systems with 71 unique solutes and 59 unique solvents. We again note that the DECHEMA data set includes repeated systems. All of the UNIFAC calculations were performed with CHEMCAD 7 [75]. A simple GNU Octave/MATLAB [76,77] M-file to predict limiting activity coefficients using MOSCED, a screen cast tutorial and additional references are available on the YouTube Channel of Andrew Paluch [78].
A parity plot of the results for SM12, UNIFAC and MOSCED is provided in fig. 9. In the top pane of fig. 9 results for the Minnesota Solvation Database are displayed, and in the bottom pane the corresponding DECHEMA results are displayed. A summary of the statistics is included in table 6.  In following the same methodology conducted previously, we sought to analyze the models' ability to predict limiting activity coefficients when water is the solute and when water is the solvent. The reference data from "Yaws' Handbook of Properties for Aqueous Systems" again served as the basis for systems that were modeled using the SM12 universal continuum solvent model, UNIFAC and MOSCED. A parity plot is provided in fig. 10 where the top pane contains data where water is the solute and the bottom pane where water is the solvent. The corresponding statistics are recorded in table 7.  i,j at 298.15 K using the SM12 universal solvent model, UNIFAC, and MOSCED versus reference data from "Yaws' Handbook of Properties for Aqueous Systems" [56]. In the top pane results as shown for the case of water as the solute (water in organics), and in the bottom pane results are shown for the case of water as the solvent (organics in water). The blue lines corresponds to the y = x line, and the dashed blue lines correspond to ±1.1 and are drawn as a reference. For the SM12 universal solvent model predictions, ∆G self i,i was predicted. Table 7. A summary of the squared correlation coefficient (R 2 ), root mean square error (RMSE), and mean unsigned error (MUE) for ln γ ∞ i,j , and the average absolute percent deviation (AAPD) for γ ∞ i,j predicted using UNIFAC and MOSCED as compared to the reference data from "Yaws' Handbook of Properties for Aqueous Systems" [56]. The results are separated for the case of water as the solute (water in organics) and water as the solvent (organics in water Within the Yaws' reference set where water is the solute, the SM12 universal continuum solvent model made predictions for 90 systems, UNIFAC modeled 87 systems and MOSCED 53 systems. When water was the solvent, the SM12 universal continuum solvent model made predictions for 121 systems, UNIFAC 114 systems and MOSCED 61 systems. MOSCED in both cases was the highest performer, yet modeled the lowest number of systems due to its limited compound inventory. We additionally note that the MOSCED parameters for water were optimized using limiting activity coefficients from "Yaws' Handbook of Properties for Aqueous Systems" [74]. Overall, all models seem to have more difficulty predicting ln γ ∞ i,j when water is the solute. UNIFAC is the lowest performer in both cases. A plausible reason for the SM12 universal continuum solvent model's improved accuracy over UNIFAC could be due to the cancellation of errors that occurs in computing ∆G self i,i and ∆G solv i,j consistently.
Again, when water is the solvent with reference values of ln γ ∞ i,j >9, we notice a tapering or an under prediction in ln γ ∞ i,j for both the SM12 and UNIFAC models. As stated previously, these are highly non-ideal systems.
The SMx universal solvent models, MOSCED, and UNIFAC are all limited in the system they can treat. MOSCED can only be used to model binary systems for which the the molar volume and five molecular descriptors are available for each component. At present, MOSCED parameters are available for 130 organic solvents and water [20,74]. Efforts have been made to use electronic structure calculations to predict missing MOSCED parameters, including the use of solvation free energy calculations with the SMx universal solvent models [42,[79][80][81][82][83]. While we would not expect the predicted MOSCED parameters to be as good as those regressed using experimental data, the results of the present study nonetheless would suggest they would yield acceptable results, especially for early stage process conceptualization and design applications. Interestingly, ref. 42 suggested that results obtained with MOSCED parameters predicted using solvation free energy calculations with the SMx universal solvent model were superior to predictions made with SMx alone. This may result from implicitly including reference data by using MOSCED parameters that were regressed using reference data.
UNIFAC is a group contribution-based method, and in general is able to cover a wider range of chemical compound space. Nonetheless, some parameters are still missing from the parameter matrix. Recently, it has likewise been demonstrated how electronic structure calculations, specifically COSMO-based (conductor-like screening model) calculations, can be used to predict missing UNIFAC parameters [84][85][86]. So while the SMx universal solvent models, MOSCED, and UNIFAC are all limited in the system they can treat, there is opportunity for synergy among the methods.

Summary and Conclusion
The SMx (x= 12, 8, or D) universal solvent models are implicit solvent models that can be used to compute the solvation free energy at 298.15 K using electronic structure calculations. Fundamentally, the solvation free energy is an important thermophysical property, and corresponds to the transfer free energy of taking a single solute from a non-interacting ideal gas state to solution (at infinite dilution) at the same molar density. In going from a non-interacting state to a state with a single solute molecule in solution, the solvation free energy is representative of the solute-solvent interactions in solution. If the solvation free energy of a solute is computed in two different solvents, they can readily be used (via a thermodynamic cycle) to compute the transfer free energy between the two solvents. This transfer free energy is directly related to the partition coefficient, which can be used to relate to the relative composition of the solute in the two (immiscible) solvents at equilibrium, or more generally can be used to compare the relative affinity of the solute for the two solvents.
In conventional chemical engineering thermodynamic modeling of phase equilibrium, we are interested in the activity coefficient of the components in the mixture, and the excess Gibbs free energy of the mixture. Coincidentally, values of the solvation free energy are not directly measured experimentally. Rather, it is common to measure limiting activity coefficients from which one may obtain the solvation free energy. The limiting activity coefficient is related to the transfer free energy of a single solute from a system of pure liquid solute to a solution phase at infinite dilution. Moreover, limiting activity coefficients may be related directly to the parameters in common binary interaction excess Gibbs free energy models used to model phase equilibrium.
In the present study we therefore assessed the ability to use the SMx universal solvent models to predict limiting activity coefficients. We considered 148 molecules for which SMx (solvent) parameters were available in QChem 5.1.2. As seen in eq. 1 the calculation of limiting activity coefficients is dependent upon both the self-solvation free energy (∆G self i,i ) and solvation free energy terms (∆G solv i,j ). Therefore we assessed the contribution of each individual term. The Minnesota Solvation Database was used as our reference set, since it was used to parameterize the SM12 model, which was consistently the top performer throughout the analysis.
In assessing the contribution of ∆G self i,i and ∆G solv i,j terms, water was found to be a large source of deviation. The ∆G self i,i of water was predicted to be overly negative, suggesting we over-estimate the strength of the water-water interactions. We gained further insight into this contribution by breaking down the ∆G solv i,j analysis into cases where water was the solute (i =water), water was the solvent (j =water), and where water was neither the solute or solvent. We found that when water was the solute, ∆G solv i,j / (RT) is predicted to be overly negative on average by 3.4 (dimensionless units) in the SM12 model as compared to the self-solvation free energy term ∆G self i,i / (RT) which was overly negative by 3.8 (dimensionless units). This deviation ends up being advantageous in discerning how to calculate limiting activity coefficients, as there were two methods in which to do so: 1) prediction of ∆G self i,i and ∆G solv i,j and use of reference values for the molar volume, and 2) compute ∆G self i,i using pure component vapor pressure data, which is readily available. We evaluated these two methods using three reference data sets: the Minnesota Solvation Database, Parts 1 to 6 of DECHEMA's "Activity Coefficients at Infinite Dilution" volumes [50][51][52][53][54][55], and "Yaws' Handbook of Properties for Aqueous Systems" [56]. All three methods told the same story, where at first glance there does not appear to be a significant difference in the predictions when ∆G self i,i is predicted or computed from reference data. However, upon isolating systems where water is the solute, we see a that ln γ ∞ i,j is under-predicted when ∆G self i,i is computed from reference vapor pressure data. Yet when both ∆G self i,i and ∆G solv i,j are predicted there is a convenient cancellation of errors in the ln γ ∞ i,j . Therefore this suggests that the best method to calculate limiting activity coefficients moving forward is to compute ∆G self i,i and ∆G solv i,j consistently; that is, to predict both terms.
In conclusion, the findings were compared to two popular and successful predictive methods, UNIFAC and MOSCED. Comparison to these models allows us to assess the viability of taking the SMx models into further chemical engineering applications. A recent 2019 study conducted by Brouwer and Schuur [72], previously showed MOSCED was the top performing model in an comparative study to predict limiting activity coefficients at 298.15 K. Our comparison further confirms that MOSCED is the top performing model in comparison to UNIFAC and the SM12 universal continuum solvent model, yet it is limited by the number of compounds and systems it can model. UNIFAC and the SM12 universal continuum solvent model are comparable in performance among the three reference sets. This suggests that the SMx models have potential to be used and tested in chemical engineering applications such as the prediction of phase equilibrium There is also opportunity for synergy between SM12, UNIFAC, and MOSCED.