Abraham Solvation Parameter Model: Calculation of L Solute Descriptors for Large C11 to C42 Methylated Alkanes from Measured Gas–Liquid Chromatographic Retention Data

Abstract: Abraham model L solute descriptors have been determined for 149 additional C11 to C42 monomethylated and polymethylated alkanes from published Kováts retention indices obtained by gas–liquid chromatographic measurements. The calculated solute descriptors, in combination with previously published Abraham model correlations, can be used to predict a number of important chemical and thermodynamic properties, including partition coefficients, molar solubility ratios, gas–liquid chromatographic and HPLC retention data, infinite dilution activity coefficients, molar enthalpies of solvation, standard molar enthalpies of vaporization and sublimation at 298 K, vapor pressures, and limiting diffusion coefficients. The predictive computations are illustrated by estimating both the standard molar enthalpies of sublimation and the enthalpies of solvation in benzene for the monomethylated and polymethylated alkanes considered in the current study.


Introduction
Scientists and engineers in both academia and the industrial manufacturing sector must rely upon empirical and semi-theoretical models to predict the thermodynamic and physical properties required in process design calculations. Even with modern instrumentation it is not feasible to experimentally determine the properties of the more than 60 million known chemical compounds [1]. Chemical manufacturing processes rarely involve only a single compound, and experimental measurements become even more challenging when one takes into account the number of possible binary, ternary and higher-order multicomponent mixtures that can be prepared from existing compounds. The number of possible combinations increases annually as new compounds are synthesized, discovered and identified.
Over the last 50 years numerous predictive methods have been developed based upon either quantitative structure-property relationships (QSPRs) or group contribution concepts. QSPR methods [2][3][4][5][6][7][8] identify mathematical relationships between the property that one wishes to calculate and either other physical properties or compound descriptor values that can be derived from molecular structure and/or quantum mechanical considerations. In some published QSPR studies [2,6,8] more than 1000 different descriptors were considered before the descriptor set was narrowed down to those yielding the desired predictive accuracy. Group contribution methods [9][10][11][12][13][14][15][16], on the other hand, fragment the given molecule into atoms or functional groups. A numerical value is then assigned to each atom or fragment group. In the simplest case the desired property is calculated by summing, over all fragment groups, the number of times each group appears in the molecule multiplied by its respective numerical group value. Naturally the method is limited to those molecules having known functional group values.
The predictive method that we have been using to predict thermodynamic and physical properties is based upon the Abraham solvation parameter model [17][18][19][20], which describes solute transfer between two immiscible (or partly miscible) phases in terms of a linear free energy relationship (LFER). The first LFER:
$$\text{Solute Property} = e_1 \times E + s_1 \times S + a_1 \times A + b_1 \times B + v_1 \times V + c_1 \tag{1}$$
describes solute transfer between two condensed phases, while the second LFER:
$$\text{Solute Property} = e_2 \times E + s_2 \times S + a_2 \times A + b_2 \times B + l_2 \times L + c_2 \tag{2}$$
describes solute transfer into a condensed phase from the gas phase.
Specific properties that have been successfully described include partition coefficients [20][21][22], molar solubility ratios [19,23,24], aquatic toxicities [25][26][27], nasal pungencies [28], gas-liquid chromatographic and HPLC retention data [18][29][30][31][32][33], Draize scores and eye irritation thresholds [34,35], human and rat intestinal absorption data [36,37], human skin permeability [38,39], infinite dilution activity coefficients [40,41], molar enthalpies of solvation [42][43][44], standard molar enthalpies of vaporization [45] and sublimation [46] at 298 K, vapor pressures [47], and limiting diffusion coefficients [48,49]. Unlike many of the QSPRs that have been proposed in the published chemical and engineering literature, Equations (1) and (2) are based upon a fundamental understanding of how molecules interact in solution. Each of the first five terms on the right-hand side of Equations (1) and (2) represents a different type of molecular interaction, described by the product of a solute property (E, S, A, B, V and L) and its complementary solvent or process property (e_1, s_1, a_1, b_1, v_1, c_1, e_2, s_2, a_2, b_2, l_2 and c_2). The uppercase alphabetical characters in both Abraham model correlations are called solute descriptors, and are defined in the following manner: A and B represent the respective overall hydrogen-bond donating and accepting abilities of the dissolved solute; E denotes the given solute's excess molar refraction referenced to that of a linear alkane having a comparable molecular size; L refers to the logarithm of the solute's gas-to-hexadecane partition coefficient determined at 298.15 K; S corresponds to a combination of the electrostatic polarity and polarizability of the solute; and V is the solute's McGowan molecular volume calculated from known sizes of the constituent atoms and chemical bond numbers.
The numerical values of the lowercase solvent or process coefficients are determined by a least-squares analysis in which the measured property for a series of solutes with known descriptor values in a given solvent (or for a given process) is fitted to Equation (1) or (2) of the Abraham model. The calculated numerical values of the solvent/process equation coefficients depend upon the organic solvent or process under consideration. In other words, the equation coefficients that describe the enthalpy of solvation of organic solutes dissolved in benzene are numerically different from the values that describe the enthalpy of solvation of organic solutes in dimethyl sulfoxide. The Abraham model is described in greater detail elsewhere [17][18][50][51][52].
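As a minimal sketch of how Equation (2) is evaluated once both the solute descriptors and the process coefficients are known, the Python snippet below multiplies each descriptor by its complementary coefficient and adds the constant. The class and function names, and every numerical value, are hypothetical placeholders chosen for illustration; they are not taken from any published Abraham model correlation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteDescriptors:
    """Uppercase solute descriptors appearing in Equation (2)."""
    E: float
    S: float
    A: float
    B: float
    L: float


@dataclass(frozen=True)
class GasToCondensedCoefficients:
    """Lowercase process coefficients of Equation (2), including the constant c."""
    e: float
    s: float
    a: float
    b: float
    l: float
    c: float


def predict_property(solute: SoluteDescriptors,
                     coeffs: GasToCondensedCoefficients) -> float:
    """Evaluate Equation (2): sum of descriptor-coefficient products plus c."""
    return (coeffs.e * solute.E + coeffs.s * solute.S + coeffs.a * solute.A
            + coeffs.b * solute.B + coeffs.l * solute.L + coeffs.c)


# Hypothetical alkane-like solute: E = S = A = B = 0, so only l*L + c survives.
alkane = SoluteDescriptors(E=0.0, S=0.0, A=0.0, B=0.0, L=10.0)
# Hypothetical process coefficients (placeholders, not a published correlation).
demo = GasToCondensedCoefficients(e=0.1, s=0.2, a=0.3, b=0.4, l=0.5, c=-0.25)

predicted = predict_property(alkane, demo)  # 0.5 * 10.0 - 0.25
```

Because the same descriptor object can be passed to any coefficient set, one solute record serves every correlation of this form.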
One major advantage that the Abraham model offers over most predictive QSPRs is that the same numerical values of the solute descriptors are used in every Abraham model correlation, irrespective of the chemical property being predicted. This feature avoids having to calculate a new set of solute descriptors every time that one wishes to predict a different chemical property. A common set of solute descriptors for every correlation also permits one to directly compare the solubilizing properties of different organic solvents and two-phase partitioning systems through Principal Component Analysis, the Euclidean distance formula and other computational methods. Such comparisons might be used to assist design engineers in identifying less toxic, more environmentally compatible solvent alternatives to replace the more hazardous organic solvents currently used in industrial manufacturing processes, or to help in identifying partition systems that might mimic biological response properties. These latter comparisons have involved skin permeation and water-to-organic solvent systems [38], parallel artificial membrane permeability assays and biological systems [53], and aquatic toxicity and water-to-organic solvent systems [25,27]. In each of the aforementioned comparisons, both the biological and the chemical process were described by an Abraham model correlation.
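The Euclidean distance comparison mentioned above can be sketched as follows. Each solvent or partitioning system is treated as a point whose coordinates are its equation coefficients, and a small distance suggests similar solubilizing behavior. Leaving out the constant c and using five coefficients per system is an assumption of this sketch, and the coefficient values are hypothetical, not published ones.

```python
import math
from typing import Sequence


def coefficient_distance(system_a: Sequence[float],
                         system_b: Sequence[float]) -> float:
    """Euclidean distance between two sets of Abraham model coefficients."""
    if len(system_a) != len(system_b):
        raise ValueError("both systems need the same number of coefficients")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(system_a, system_b)))


# Hypothetical (e, s, a, b, v) coefficient sets for two partitioning systems.
system_1 = (0.3, -0.5, -1.0, -4.0, 4.0)
system_2 = (0.3, -0.5, -1.0, -1.0, 0.0)

# Only b and v differ here: sqrt((-3)^2 + 4^2)
distance = coefficient_distance(system_1, system_2)
```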
At present, experimental-based solute descriptors are available for more than 8500 different molecular organic and organometallic compounds and for more than 300 different ionic species (such as tetraalkylammonium and substituted-pyridinium cations, and substituted-phenolate and substituted-benzoate anions), which is only a tiny fraction of the known chemical compounds. Recognizing that it will never be possible to perform a sufficient number of experimental measurements to calculate experimental-based Abraham model solute descriptors for every known chemical compound, researchers have searched for alternative methods to estimate the numerical values. Published group contribution [53][54][55][56] and machine learning methods [55,56] have shown considerable promise, in that both the molar enthalpies of solvation (∆H_solv) and molar Gibbs energies of solvation (∆G_solv) of molecular organic solutes and inorganic gases in a wide range of organic solvents of varying polarity and hydrogen-bonding character could be reasonably predicted from the estimated solute descriptors. In the study by Chung and coworkers [56] the authors used an "in-house" solute descriptor database containing 8366 unique chemical compounds. The compounds were fragmented into atom-centered (AC) functional groups. Each solute descriptor was then calculated by summing the contributions from all AC groups, with special ring strain corrections (RSC) and long-distance interaction (LDI) groups being added to account for any more advanced structural features that were not fully captured by the atom-centered approach. Halogenated molecules required a slightly different estimation scheme in that the halogen atoms were replaced by hydrogen atoms prior to the fragmentation. The correction(s) for the halogen atom(s) were added at the end to the calculated solute descriptor of the "non-halogenated compound":  [59][60][61][62].
The continued development of predictive group contribution and machine learning methods for the Abraham model requires the determination of both experimental-based solute descriptors and experimental-based solvent/process correlations. Additional experimental measurements are also needed for testing the limitations and applications of new predictive methods. To aid in future endeavors, we have recently reported solute descriptors for an additional 174 monomethyl branched alkanes [63], for an additional 127 C9–C26 mono-alkyl alkanes and polymethyl alkanes [64], for an additional 33 linear C7–C14 alkynes [65], and for several important active pharmaceutical ingredients and intermediates [23][66][67][68]. Abraham model correlations have also been recently determined for two practical partitioning systems, water-methyl ethyl ketone (MEK) [21] and water-methyl isobutyl ketone (MIBK) [22], and for solute transfer into isopropyl acetate [69], anisole [70] and cyclohexanol [71]. In the current communication we have calculated Abraham model L solute descriptor values for an additional 149 C11 to C42 methylated alkanes from measured gas-liquid chromatographic retention data gathered from a compilation by Katritzky and coworkers [72].

Calculation of Abraham Model Solute Descriptors for Methylated Alkanes
Normally the determination of Abraham model solute descriptors involves constructing a series of mathematical expressions for the measured properties of the given solute in a series of solvents and/or for a series of processes for which the lowercase equation coefficients are known. The solute properties used in past descriptor calculations have included logarithms of the measured solubility ratio of the solute in several different organic mono-solvents, logarithms of the measured water-to-organic solvent partition coefficients, measured gas-liquid chromatographic retention data, and/or experimental enthalpies of solvation for the solute dissolved in several different organic solvents. These properties are chosen because they can generally be measured with small experimental uncertainty. Aquatic toxicities and biological response factors generally have too much experimental error to be used effectively in solute descriptor calculations.
The Abraham model expressions for each of the aforementioned processes contain a common set of solute descriptors. The series of mathematical expressions is then solved for the "best" set of descriptor values, namely the set that minimizes the sum of the squared differences between the measured properties and the respective back-calculated values based on Equations (1) and (2). For the methylated hydrocarbons listed in Table 1, the computational process is greatly simplified in that the E, S, A and B solute descriptors are all equal to zero. The V solute descriptor is readily calculated from the solute's molecular structure, the atomic volumes of the constituent atoms contained in the solute molecule, and the number of chemical bonds in the solute molecule as described by Abraham and McGowan [73]. This leaves only the L solute descriptor to be calculated.
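The V calculation described above can be sketched for acyclic alkanes. The snippet assumes the commonly tabulated McGowan increments of 16.35 cm3 mol−1 per carbon atom and 8.71 cm3 mol−1 per hydrogen atom, less 6.56 cm3 mol−1 for every bond, with V expressed in units of (cm3 mol−1)/100; these constants and the function name are supplied here for illustration and should be checked against the Abraham and McGowan reference before use.

```python
# Assumed McGowan increments in cm^3 mol^-1 (verify against the original reference).
CARBON_VOLUME = 16.35
HYDROGEN_VOLUME = 8.71
BOND_CORRECTION = 6.56  # subtracted once per bond, regardless of bond type


def mcgowan_volume_acyclic_alkane(n_carbons: int) -> float:
    """McGowan volume V, in (cm^3 mol^-1)/100, of an acyclic alkane CnH2n+2."""
    if n_carbons < 1:
        raise ValueError("an alkane needs at least one carbon atom")
    n_hydrogens = 2 * n_carbons + 2
    # An acyclic molecule has one bond fewer than it has atoms.
    n_bonds = n_carbons + n_hydrogens - 1
    raw = (n_carbons * CARBON_VOLUME + n_hydrogens * HYDROGEN_VOLUME
           - n_bonds * BOND_CORRECTION)
    return raw / 100.0


v_hexadecane = mcgowan_volume_acyclic_alkane(16)  # C16H34
v_c28_alkane = mcgowan_volume_acyclic_alkane(28)  # any acyclic C28H58 isomer
```

Because the result depends only on the atom and bond counts, every acyclic isomer of a given molecular formula receives the same V; positional differences between methylated isomers are carried by L.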
To calculate the L solute descriptor from the gas-liquid chromatographic data retrieved from the published paper by Katritzky and coworkers [72], we must first establish a mathematical correlation between the reported Kováts retention indices, KRI, and the L solute descriptor for alkane solute molecules. Numerical KRI values are available for 178 methylated alkanes [72]; however, for most of the molecules the L solute descriptor is not known. We can increase the number of compounds used in establishing the Abraham model correlation by noting that, by definition, the KRI value of a linear alkane is simply 100 times its number of carbon atoms. This allows us to add 34 linear alkanes (heptane through tetracontane) to the regression data set. By combining the 34 linear alkanes and the 29 monomethyl alkanes with known L descriptor values from the Katritzky et al. compilation [72], we have 63 experimental data points to use in developing our Abraham model KRI versus L descriptor correlation. Linear least-squares analysis of the values in the second and third columns of Table 1 yielded Equation (5), where N is the number of experimental data points used in the analysis, SD is the standard deviation, R2 is the squared correlation coefficient, and F is the Fisher F-statistic. Standard errors in the slope and intercept are given in parentheses immediately following the respective equation coefficient. Equation (5) relates the L descriptor values to the KRI/100 values for the 63 data points used in deriving it. The derived mathematical relationship then allows the calculation of the L solute descriptors of the remaining 149 methylated alkanes. These calculations are summarized in the last column of Table 1.
As an informational note, Katritzky et al. identified the different alkanes by a numerical labelling scheme in which the first two digits indicate the length of the longest carbon chain and each subsequent two-digit pair gives the location of a methyl substituent on the carbon-atom backbone. We named the compounds labelled 22_0822 and 38_162024 as 8-methyltricosane and 15,19,23-trimethyloctatriacontane, respectively. For the first alkane, the placement of a methyl substituent on the 22nd carbon atom increases the carbon backbone by one carbon atom. For the second alkane, we renumbered the carbon atoms starting at the other end of the carbon backbone to obtain a smaller set of locants.
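The calibration step described in this section, a straight-line fit of L against KRI/100 followed by prediction of L for compounds with known KRI, can be sketched as below. This is an illustration only and does not reproduce Equation (5): the four L values for heptane through decane are commonly tabulated figures quoted here as an assumption and should be verified against a curated descriptor database, the KRI of 1065 is a hypothetical input, and the function names are invented for the example.

```python
def fit_line(x_values, y_values):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    s_xx = sum((x - mean_x) ** 2 for x in x_values)
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(x_values, y_values))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Assumed calibration data: carbon number -> L descriptor of the linear alkane.
calibration_L = {7: 3.173, 8: 3.677, 9: 4.182, 10: 4.686}

# For a linear alkane KRI = 100 * (number of carbons), so KRI/100 is the carbon count.
kri_over_100 = [float(n) for n in calibration_L]
slope, intercept = fit_line(kri_over_100, list(calibration_L.values()))


def predict_L(kri: float) -> float:
    """Estimate L from a Kovats retention index using the fitted line."""
    return slope * (kri / 100.0) + intercept


estimated_L = predict_L(1065.0)  # hypothetical methylated alkane eluting between C10 and C11
```

The published correlation was built from 63 data points spanning a far wider range of carbon numbers, so the slope and intercept from this four-point sketch should not be used in its place.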

Calculation of Thermodynamic Properties of Large Methylated Alkanes Using Abraham Model Solute Descriptors
The L solute descriptors that are tabulated in the last column of Table 1 provide researchers with an additional 149 chemical compounds to use in developing group contribution and machine learning methods for predicting Abraham model solute descriptors. Remember that the E, S, A and B solute descriptors of the tabulated compounds are equal to zero, and that the V solute descriptor values can be easily obtained using the method described by Abraham and McGowan [73]. The tabulated values given in Table 1 can also be used in conjunction with published Abraham model correlations to predict a wide range of physical, biological and thermodynamic properties including vapor pressures, enthalpies of vaporization and sublimation, enthalpies of solvation, aquatic toxicities and other properties. In the next few paragraphs, we will illustrate the computational methodology by calculating the standard molar enthalpies of sublimation, ∆H_sub,298K, and of vaporization at 298 K, ∆H_vap,298K, as well as discussing how the former values might be combined with measured enthalpy of solution data, ∆H_soln,298K, to obtain a second set of predicted ∆H_sub,298K values. While our computations are focused on the large, methylated alkanes studied in the current communication, we note that the computational method can be applied to other organic compounds as well. All that is needed for the predictions is knowledge of the Abraham model solute descriptors for the given compound and the Abraham model correlation for the property/process that one wishes to predict. This is the driving force behind the development of group contribution and machine learning methods to predict Abraham model solute descriptors.
For the methylated alkane compounds studied in the current communication, the numerical values of ∆H_sub,298K are calculated according to Equation (6) [46].
For the convenience of readers, we have removed the e_k × E, s_k × S, a_k × A and b_k × B terms from Equation (2), as these terms do not contribute to the calculated ∆H_sub,298K values because the E, S, A and B solute descriptors of the methylated alkane compounds are equal to zero. The second column in Table 2 contains our calculated ∆H_sub,298K values for the 178 compounds gathered from the Katritzky et al. [72] paper.
Table 2. Comparison of the enthalpies of sublimation, ∆H_sub,298K (in kJ mol−1), predicted by the Abraham model Equation (6) and the group-additivity method of Naef and Acree (Equation (8)). Columns: compound name; ∆H_sub,298K from Equation (6); ∆H_sub,298K from Equation (8).
In our search of the published chemical and engineering literature we did not find experimental ∆H_sub,298K data against which to compare our calculated values. The sublimation enthalpies of large, nonvolatile compounds are difficult to measure because of the compounds' very small vapor pressures; however, these quantities are often needed in the design of high-temperature industrial processes and in the calculation of gas-phase standard molar enthalpies of formation from enthalpy of combustion measurements. In lieu of experimental data, we compare our Abraham model predictions against the values calculated by the Naef and Acree group-additivity method [14]. The group-additivity method has been shown to predict ∆H_sub,298K values for 1866 molecular compounds to within a standard deviation of SD = 10.33 kJ mol−1. The basic method estimates a given thermodynamic or physical property by summing the contributions that each individual atom group makes to the overall property:
$$\text{Property} = \sum_i a_i A_i + \sum_j b_j B_j + C \tag{7}$$
In Equation (7), A_i denotes the number of occurrences of the ith atom group, B_j is the number of times each special group occurs, a_i and b_j are the numerical values of each atom group and special group, and C is a constant. For monomethylated and polymethylated alkanes, the atom group-additivity method proposed by Naef and Acree [14] fragments molecules into three distinct kinds of sp3-hybridized carbon atoms based on the number of each type of atom directly bonded to the carbon atom. The first carbon-atom group is bonded to three hydrogen atoms and one carbon atom (CH3 group), the second carbon-atom group is attached to two hydrogen and two carbon atoms (CH2 group), and the third carbon-atom group has a single hydrogen and three carbon-atom nearest neighbors (CH group). The method also includes one special group that is defined as the total number of carbon atoms in the alkane molecule.
In Equation (8) below we have inserted the numerical group values and constants into Equation (7) for predicting ∆H_sub,298K of CnH2n+2 monomethylated and polymethylated alkanes:
$$\Delta H_{\mathrm{sub,298K}}\ (\mathrm{kJ\ mol^{-1}}) = 5.99\, n_{\mathrm{CH_3}} + 6.88\, n_{\mathrm{CH_2}} + 2.28\, n_{\mathrm{CH}} - 0.53\, n_{\mathrm{carbons}} + 21.03 \tag{8}$$
We note that the predicted values based on Equation (6) of the Abraham model are similar to the predictions based on the group-additivity model of Naef and Acree [14] for the "smaller" of the large polymethylated alkanes, as shown by the numerical entries in the last two columns of Table 2. The differences become more pronounced with increasing carbon-atom chain length. The group-additivity method is unable to distinguish between different placements of the alkyl-substituent group along the large carbon-atom chain, and yields the same predicted value for any set of molecules having the same numbers of each atom type. In other words, the predicted values of all dimethylhexacosane isomers in Table 2 are the same. Every dimethylhexacosane isomer considered in this study has 4 CH3-type carbon atoms, 22 CH2-type carbon atoms, 2 CH-type carbon atoms, and 28 total carbon atoms. The only dimethylhexacosane isomers that would differ are those in which both methyl groups are situated on the same carbon atom. The predicted values of these latter dimethylhexacosane isomers would be identical to each other, but different from those of the dimethylhexacosane isomers in Table 2. Identical predicted values for isomeric compounds are a common feature of most published group-additivity and group contribution methods for predicting thermodynamic properties of organic compounds. The Abraham model, on the other hand, will give different predicted values for each dimethylhexacosane isomer, and does not require that the molecule be fragmented into atom groups or functional groups.
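Equation (8) is simple enough to check by direct evaluation. The short Python sketch below codes the equation as written and applies it to the group counts quoted above for the dimethylhexacosane isomers; the function name is hypothetical, and the function covers only alkanes built from CH3, CH2 and CH carbon atoms, since Equation (8) contains no term for a carbon bearing two methyl groups.

```python
def sublimation_enthalpy_group_additivity(n_ch3: int, n_ch2: int, n_ch: int) -> float:
    """Equation (8): predicted enthalpy of sublimation at 298 K, in kJ/mol.

    Valid only for alkanes containing CH3, CH2 and CH carbon atoms.
    """
    n_carbons = n_ch3 + n_ch2 + n_ch  # special group: total number of carbon atoms
    return (5.99 * n_ch3 + 6.88 * n_ch2 + 2.28 * n_ch
            - 0.53 * n_carbons + 21.03)


# Group counts stated in the text for every dimethylhexacosane isomer in Table 2.
dimethylhexacosane = sublimation_enthalpy_group_additivity(n_ch3=4, n_ch2=22, n_ch=2)
```

The function takes only group counts as input, which makes the isomer limitation discussed above explicit: no argument carries information about where the methyl groups sit on the chain.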
The enthalpies of vaporization would be calculated in similar fashion using the mathematical correlation developed by Churchill and coworkers [45]:
$$\Delta H_{\mathrm{vap,298K}}\ (\mathrm{kJ\ mol^{-1}}) = 6.100 + 9.537\, L \tag{9}$$
and the group-additivity method proposed by Naef and Acree [14]:
$$\Delta H_{\mathrm{vap,298K}}\ (\mathrm{kJ\ mol^{-1}}) = 3.07\, n_{\mathrm{CH_3}} + 4.67\, n_{\mathrm{CH_2}} + 3.57\, n_{\mathrm{CH}} + 0.09\, n_{\mathrm{carbons}} + 8.61 \tag{10}$$
As before, only the terms needed in the calculation of the ∆H_vap,298K values for the polymethylated alkanes studied in the current communication are given. The standard molar enthalpies of vaporization of large alkanes can be experimentally determined using correlation gas chromatography [74][75][76]. The experimental method is applicable to both solid and liquid compounds and, moreover, does not require highly purified chemical samples, because the chromatographic peaks of impurities in the samples can be distinguished from that of the desired chemical compound by their much smaller peak areas.
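The two vaporization correlations can be placed side by side in a brief sketch. The code below evaluates Equations (9) and (10) exactly as written; the function names are hypothetical, and the L value of 10.0 is an arbitrary illustrative input rather than a descriptor taken from Table 1.

```python
def vaporization_enthalpy_from_L(L: float) -> float:
    """Equation (9): enthalpy of vaporization at 298 K (kJ/mol) from the L descriptor."""
    return 6.100 + 9.537 * L


def vaporization_enthalpy_group_additivity(n_ch3: int, n_ch2: int, n_ch: int) -> float:
    """Equation (10): group-additivity estimate (kJ/mol) for methylated alkanes."""
    n_carbons = n_ch3 + n_ch2 + n_ch  # special group: total number of carbon atoms
    return (3.07 * n_ch3 + 4.67 * n_ch2 + 3.57 * n_ch
            + 0.09 * n_carbons + 8.61)


# Arbitrary illustrative descriptor value (assumption, not from Table 1).
from_descriptor = vaporization_enthalpy_from_L(10.0)

# Group counts of a dimethylhexacosane isomer: 4 CH3, 22 CH2 and 2 CH carbons.
from_groups = vaporization_enthalpy_group_additivity(n_ch3=4, n_ch2=22, n_ch=2)
```

Equation (9) responds to isomer-specific differences through L, whereas Equation (10) returns one value for every isomer sharing the same group counts.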
Solomonov and coworkers [77][78][79][80][81] recently devised an indirect method for determining both standard molar enthalpies of vaporization and sublimation from measured enthalpies of solution, ∆H_soln, and predicted enthalpies of solvation, ∆H_solv, for the given chemical compound dissolved in a suitable organic solvent. Numerical values of ∆H_vap,298K and ∆H_sub,298K were calculated from Equations (11) and (12), respectively, depending upon whether the dissolved solute is a liquid or a crystalline compound:
$$\Delta H_{\mathrm{vap,298K}} = \Delta H_{\mathrm{soln}} - \Delta H_{\mathrm{solv}} \tag{11}$$
$$\Delta H_{\mathrm{sub,298K}} = \Delta H_{\mathrm{soln}} - \Delta H_{\mathrm{solv}} \tag{12}$$
The standard molar enthalpies of vaporization and sublimation based upon the proposed solution calorimetry approach of Solomonov and coworkers were found to be in good agreement (within experimental uncertainties) with the values determined by the more direct calorimetric, gas saturation and vapor pressure methods. The Abraham general solvation model was one of the predictive methods used by the authors [81] to calculate the solvation enthalpies. Abraham model correlations are available for calculating ∆H_solv values for solutes dissolved in water and in more than 30 different organic solvents of varying polarity and hydrogen-bonding character. Group contribution and atom-additivity methods for predicting enthalpies of solvation are currently available for only a handful of organic solvents. In Table 3

Conclusions
The current study is a continuation of our ongoing research involving the calculation of Abraham model solute descriptors based on experimental water-to-organic solvent partition coefficients, infinite dilution activity coefficients, molar solubilities and gas-liquid chromatographic and high-performance liquid chromatographic retention data. Abraham model L solute descriptors have been calculated for an additional 149 C11 to C42 monomethylated and polymethylated alkanes based on the measured Kováts gas-liquid chromatographic retention indices retrieved from the published chemical literature. The calculated solute descriptors, in combination with previously published Abraham model correlations, can be used to predict a number of important chemical and thermodynamic properties, including partition coefficients, molar solubility ratios, gas-liquid chromatographic and HPLC retention data, infinite dilution activity coefficients, molar enthalpies of solvation, standard molar enthalpies of vaporization and sublimation at 298 K, vapor pressures, and limiting diffusion coefficients. The predictive computations are illustrated by estimating both the standard molar enthalpies of sublimation and the enthalpies of solvation in benzene for the monomethylated and polymethylated alkanes considered in the current study. The standard molar enthalpy of sublimation predictions were also performed using the group-additivity method proposed by Naef and Acree. Experimental thermodynamic data are not currently available for many of the larger monomethylated and polymethylated alkanes. Predictive methods provide practicing chemists and engineers working in the industrial manufacturing sector with a means to estimate the physical and thermodynamic properties needed in process design calculations.
A comparison of the two methods revealed that the ∆H_sub,298K predictions based on the Abraham model are similar to predictions based on the group-additivity model of Naef and Acree [14] for the "smaller" of the large polymethylated alkanes. The differences became more pronounced with increasing carbon-atom chain length. The group-additivity method was not able to distinguish between different placements of the alkyl-substituent group on the large carbon-atom chain, and yielded the same predicted values for a given molecular formula. In other words, the predicted values of all dimethylhexacosane isomers were the same. This limitation is a common feature of most group-additivity and group contribution methods. The Abraham model, on the other hand, provides different predicted values for the different dimethylhexacosane molecules, and does not require fragmentation of the molecule into atom groups or functional groups.
The solute descriptors determined in the current study further the development of group contribution and machine learning methods by providing experimental-based values for an additional 149 chemical compounds. Published group contribution [53][54][55][56] and machine learning methods [55,56] have shown some promise along these lines in that the estimated descriptor values provide reasonably accurate predictions for both the molar enthalpies of solvation (∆H_solv) and molar Gibbs energies of solvation (∆G_solv) of organic solutes and inorganic gases in a wide range of organic solvents of varying polarity and hydrogen-bonding character.