Abraham Solvation Parameter Model: Revised Predictive Expressions for Solute Transfer into Polydimethylsiloxane Based on Much Larger and Chemically Diverse Datasets

Updated Abraham model correlations are reported for the transfer of organic solutes and inorganic gases to a polydimethylsiloxane coating from both water and the gas phase based on published experimental data for more than 220 different compounds. The derived mathematical expressions back-calculate the observed partitioning behavior to within standard deviations of the residuals of 0.206 and 0.176 log units, respectively.


Introduction
Sample preparation is a vital part of analytical method development, particularly in the case of trace analysis. For complex unknown samples containing many chemical constituents, one often must isolate the analyte from all interferences that might be present. Preconcentration may also be needed as there is only a limited number of analytical techniques that have both the selectivity and sensitivity to permit accurate quantification at very low analyte concentrations. Classical separation methods such as liquid-liquid extraction and solid-phase extraction, which were once popular in analytical method development, have been replaced by microextraction methods that consume much smaller quantities of organic solvents. Considerable attention has been afforded in recent years to solvent (adsorbent) selection for use in microextraction devices as analyte partitioning between the unknown sample matrix and the device solvent controls the extraction efficiency. Currently, devices have been constructed using polymeric solvents/sorbents (e.g., polydimethylsiloxane (PDMS), low-density polyethylene, polyoxymethylene, etc.) [1][2][3][4], ionic liquids [4][5][6] and deep eutectic solvents [7][8][9]. Mathematical expressions have been reported for predicting analyte partitioning in select polymeric materials [10][11][12][13] and ionic liquid solvents [14][15][16][17][18] to aid in the solvent selection process.
Our research in this area has been to develop Abraham model expressions [19][20][21][22][23] for describing solute transfer between two condensed phases:

$$\log P = e_{eq1} E + s_{eq1} S + a_{eq1} A + b_{eq1} B + v_{eq1} V + c_{eq1} \quad (1)$$

and solute transfer from the gas phase into a condensed phase:

$$\log K = e_{eq2} E + s_{eq2} S + a_{eq2} A + b_{eq2} B + l_{eq2} L + c_{eq2} \quad (2)$$

where the dependent solute properties on the left-hand side of Equations (1) and (2) are the logarithm of the water-to-organic coating partition coefficient, log P, and the logarithm of the gas-to-organic coating partition coefficient, log K. The uppercase and lowercase quantities on the right-hand side of both equations represent the solute descriptors (E, S, A, B, V and L) and the complementary solvent/coating properties ($c_{eq1}$, $e_{eq1}$, $s_{eq1}$, $a_{eq1}$, $b_{eq1}$, $v_{eq1}$, $c_{eq2}$, $e_{eq2}$, $s_{eq2}$, $a_{eq2}$, $b_{eq2}$ and $l_{eq2}$), respectively. Descriptor values of a given solute remain the same for all partitioning processes; in other words, the solute descriptors for benzene would be independent of both the partitioning process, log P or log K, and the identity of the receiving organic solvent/coating. The solute descriptors encode valuable chemical information regarding the ability of the solute to interact with its solubilizing media, and are defined as follows: A and B refer to the respective overall hydrogen-bond donating and accepting capacities of the dissolved solute; E corresponds to the molar refraction of the given solute (in units of (cm³·mol⁻¹)/10) in excess of that of a linear alkane having a comparable molecular size; L is the logarithm of the solute's gas-to-hexadecane partition coefficient determined at 298.15 K; S represents a combination of the electrostatic polarity and polarizability of the solute; and V denotes the McGowan molecular volume of the solute (in units of (cm³·mol⁻¹)/100) calculated from atomic sizes and chemical bond numbers.
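As an illustration of how Equations (1) and (2) are applied, the following minimal Python sketch evaluates both expressions for one solute. The class and function names are our own, and the coefficient and descriptor values are hypothetical numbers chosen only to make the arithmetic easy to follow; they are not PDMS coefficients or measured descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoluteDescriptors:
    """Abraham solute descriptors as defined in the text."""
    E: float  # excess molar refraction
    S: float  # polarity/polarizability
    A: float  # overall hydrogen-bond donating capacity
    B: float  # overall hydrogen-bond accepting capacity
    V: float  # McGowan molecular volume
    L: float  # log of the gas-to-hexadecane partition coefficient (298.15 K)


def log_p(coeffs: dict, solute: SoluteDescriptors) -> float:
    """Equation (1): transfer between two condensed phases (uses V)."""
    return (coeffs["c"] + coeffs["e"] * solute.E + coeffs["s"] * solute.S
            + coeffs["a"] * solute.A + coeffs["b"] * solute.B
            + coeffs["v"] * solute.V)


def log_k(coeffs: dict, solute: SoluteDescriptors) -> float:
    """Equation (2): transfer from the gas phase (uses L instead of V)."""
    return (coeffs["c"] + coeffs["e"] * solute.E + coeffs["s"] * solute.S
            + coeffs["a"] * solute.A + coeffs["b"] * solute.B
            + coeffs["l"] * solute.L)


# Hypothetical values for illustration only (not from any published correlation).
demo_solute = SoluteDescriptors(E=0.5, S=0.5, A=0.0, B=0.25, V=1.0, L=3.0)
demo_eq1 = {"c": 0.5, "e": 1.0, "s": -1.0, "a": -2.0, "b": -3.0, "v": 4.0}
demo_eq2 = {"c": 0.0, "e": 0.0, "s": 1.0, "a": 2.0, "b": 0.5, "l": 1.0}

print(log_p(demo_eq1, demo_solute))
print(log_k(demo_eq2, demo_solute))
```

The same descriptor object is passed to both functions, which mirrors the point made above that a solute's descriptors do not depend on the partitioning process.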
Numerical values of the complementary solvent/coating properties in Equations (1) and (2) are determined by regressing measured log P and/or log K data for a series of solutes with known descriptor values in accordance with the respective solute property. Once determined, the lowercase equation coefficients allow one to predict the specified property of additional solutes in the given organic solvent/coating, provided that the solute descriptors are known. Abraham model correlations [20] have been reported for many partitioning processes that are used in commercial manufacturing processes and private analytical laboratories to isolate the desired chemical compound/analyte from unwanted impurities. There are still a large number of common organic solvents and solvent mixtures for which predictive expressions are not available. We have tried to address this issue by determining Abraham model log P and log K correlations for additional organic solvents based on measured solubility and partition coefficient data, and as the occasion arises we have also updated previously published correlations using much larger datasets. For example, we recently reported Abraham model predictive expressions for solute transfer into tert-butyl acetate based on our measured solubility data for 31 different crystalline nonelectrolyte organic compounds of varying polarity and hydrogen-bonding character [21]. As part of this study we also updated existing equations for both ethyl acetate and butyl acetate, which had been published 14 years earlier [24]. It is important to periodically update existing correlations using larger and more chemically diverse datasets. The chemical diversity, as reflected by the solute descriptor values, defines the area of predictive chemical space over which a derived Abraham correlation is valid.
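The regression step described above can be sketched as an ordinary least-squares fit. The example below is a minimal standard-library Python illustration, not the procedure actually used by the authors; the function names are hypothetical, and the data are synthetic values generated from assumed "true" coefficients so that the recovered coefficients can be checked.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_abraham(descriptor_rows, observed):
    """Least-squares coefficients [c, e, s, a, b, v_or_l] via the normal equations."""
    design = [[1.0] + list(row) for row in descriptor_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, observed)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data: 30 hypothetical solutes with descriptors (E, S, A, B, L).
rng = random.Random(0)
true_coeffs = [0.2, 0.1, -0.5, 1.0, 0.3, 0.8]  # assumed c, e, s, a, b, l
rows = [[rng.uniform(0.0, 2.0) for _ in range(5)] for _ in range(30)]
log_k_values = [true_coeffs[0] + sum(c * d for c, d in zip(true_coeffs[1:], row))
                for row in rows]

fitted = fit_abraham(rows, log_k_values)
print([round(c, 6) for c in fitted])
```

Because the synthetic data contain no noise, the fit should return the assumed coefficients; with real measurements the residual scatter determines the reported standard deviation.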
It is not good practice to utilize a mathematical expression to make predictions for those solutes whose descriptor values fall too far outside of the range of values used in determining the equation coefficients, nor should one use mathematical correlations to calculate solute descriptors of additional compounds, if the newly obtained descriptor values fall too far outside of the range of values that the correlations themselves were based upon. In the present communication we critically re-examine the ability of the Abraham solvation parameter model to describe solute transfer into PDMS after contradictory studies have appeared in the chemical literature.
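One simple way to apply this caution in practice is to compare each descriptor of a candidate solute against the minimum and maximum values in the training set. The sketch below is a hedged illustration of that check; the function names, the optional `margin` parameter and all descriptor values are hypothetical.

```python
DESCRIPTOR_NAMES = ("E", "S", "A", "B", "V", "L")


def descriptor_ranges(training_set):
    """Return {descriptor: (min, max)} over the training-set solutes."""
    return {name: (min(s[name] for s in training_set),
                   max(s[name] for s in training_set))
            for name in DESCRIPTOR_NAMES}


def descriptors_outside_range(solute, ranges, margin=0.0):
    """List descriptors of `solute` lying outside the training range.

    `margin` is an assumed tolerance that widens each range on both sides.
    """
    return [name for name in DESCRIPTOR_NAMES
            if not (ranges[name][0] - margin <= solute[name] <= ranges[name][1] + margin)]


# Hypothetical training solutes and candidate (illustrative values only).
training = [
    {"E": 0.0, "S": 0.0, "A": 0.0, "B": 0.0, "V": 0.5, "L": 1.0},
    {"E": 0.6, "S": 0.5, "A": 0.0, "B": 0.1, "V": 0.7, "L": 2.8},
    {"E": 0.8, "S": 0.9, "A": 0.6, "B": 0.3, "V": 0.8, "L": 3.8},
]
candidate = {"E": 0.7, "S": 2.5, "A": 0.1, "B": 0.2, "V": 0.6, "L": 3.0}

ranges = descriptor_ranges(training)
print(descriptors_outside_range(candidate, ranges))
```

A range check of this kind is only a first screen; it does not detect combinations of descriptors that are individually in range but jointly unrepresented in the training set.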

Prior Abraham Model Studies Describing Solute Transfer into Polydimethylsiloxane
Several predictive expressions have been proposed for predicting chromatographic retention behavior on a PDMS stationary phase column, solute diffusion through PDMS membranes and partition/sorption coefficients in PDMS film coatings. However, we focus our attention in this communication on those studies pertaining to the Abraham solvation parameter model. First, Hierlemann and coworkers [25] reported an Abraham model correlation (Equation (3)) for the sorption coefficients of vapors of 32 organic compounds on a thickness-shear-mode polydimethylsiloxane-coated resonator. In terms of statistical information, the authors gave the number of experimental data points used in the regression analysis (N), the squared correlation coefficient (R²), the standard error for the correlation (SE), the Fisher F-statistic (F) and the standard errors in the calculated equation coefficients, which are given in parentheses immediately following the respective coefficient. Equation (3) was found to back-calculate the observed log K PDMS-air values, which ranged from log K PDMS-air = 1.65 (propionaldehyde) to log K PDMS-air = 4.03 (decane), to within a standard error of SE = 0.127 log units. As an informational note, the authors also obtained Abraham model correlations for resonators coated with poly(methyloctylsiloxane), poly(methyl(cyanopropyl)siloxane), poly(methylphenylsiloxane), poly(methyl(2-carboxy (D-valinyl-tert-butylamide)propyl)siloxane, poly(methyl(isopropylcarboxylic acid) siloxane), poly(methylphenylsiloxane) and poly(methyl(aminopropyl)siloxane). In each case the Abraham model was found to provide a reasonably accurate mathematical description of the observed log K PDMS-air data. The largest calculated standard error, SE = 0.163, was for the poly(methyl(isopropylcarboxylic acid)siloxane) coating. Second, Xia et al. [26] reported an Abraham model correlation (Equation (4)) for absorption from aqueous solution onto a PDMS membrane based on limited experimental data for 32 aromatic compounds (naphthalene, biphenyl, 1-methylnaphthalene and 29 benzene derivatives). The authors did not provide a value of the standard error for their correlation. At the time that Equations (3) and (4) were published the solute descriptors were denoted using a different set of alphabetical characters. For the convenience of journal readers, we have converted the older symbolism used by both Hierlemann and coworkers [25] and Xia et al. [26] to the current set of alphabetical characters. Third, Sprunger and coworkers [10] derived Abraham model correlations for both log P PDMS-water and log K PDMS-air (Equations (5) and (6), respectively) that described experimental partition coefficient data for approximately 170 different inorganic and organic compounds to within standard deviations of the residuals of SD = 0.171 log units (Equation (5)) and SD = 0.180 log units (Equation (6)). The relevant statistical information includes not only the number of experimental data points, squared correlation coefficient, standard deviation and Fisher F-statistic, but also the adjusted squared correlation coefficient, R²adj. The much larger dataset used in determining Equations (5) and (6) resulted from the authors' more thorough search of the published chemical literature, coupled with the decision to combine measured values based on "dry" and "wet" experimental methodologies into a single dataset.
The main difference between the two experimental methodologies is whether or not the PDMS phase was in direct contact with water as the values were being determined. From an experimental standpoint the direct measurement of log P PDMS-water values requires that the aqueous and PDMS phases be in contact with one another, while log K PDMS-air values are generally measured in the absence of a water phase. It is possible to convert measured log P PDMS-water values to calculated log K PDMS-air values and vice versa through Equation (7):

$$\log P_{\mathrm{PDMS\text{-}water}} = \log K_{\mathrm{PDMS\text{-}air}} - \log K_{w} \quad (7)$$

The conversion requires knowledge of the logarithm of the solute's gas-to-water partition coefficient, log K w. For purposes of this discussion we will refer to the values obtained from Equation (7) as "calculated" experimental values in that they were not obtained directly from the experimental methodology. Sprunger and coworkers carefully denoted for each solute-PDMS data point whether the PDMS phase was "wet" or "dry" in their tabulation of log P PDMS-water and log K PDMS-air values. Separate log P PDMS-water and log K PDMS-air expressions were also reported where the "wet" and "dry" values were not combined. For predictive applications it was recommended that the separate "wet" and "dry" correlations be used. The combined "wet" and "dry" correlations, Equations (5) and (6), were offered as possibilities in the event that the descriptor values of the solute whose log P PDMS-water and/or log K PDMS-air one wished to predict fell far outside of the range of values used in generating the separate "wet" and "dry" correlations.
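The interconversion can be sketched in a few lines of Python, assuming that Equation (7) takes the usual thermodynamic-cycle form log P = log K − log K w. The function names and the numerical inputs below are hypothetical and serve only to show the round trip between the two quantities.

```python
def log_p_from_log_k(log_k_pdms_air: float, log_kw: float) -> float:
    """Assumed form of Equation (7): log P = log K - log Kw."""
    return log_k_pdms_air - log_kw


def log_k_from_log_p(log_p_pdms_water: float, log_kw: float) -> float:
    """Equation (7) rearranged: log K = log P + log Kw."""
    return log_p_pdms_water + log_kw


# Hypothetical inputs (not measured values).
example_log_k = 3.0
example_log_kw = 0.5

calculated_log_p = log_p_from_log_k(example_log_k, example_log_kw)
print(calculated_log_p)
print(log_k_from_log_p(calculated_log_p, example_log_kw))
```

Any uncertainty in log K w carries over directly into the "calculated" experimental value, which is one reason to record whether a data point was measured or converted.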
The three aforementioned studies suggest that the Abraham model does provide a reasonably accurate mathematical description of solute transfer into polydimethylsiloxane. A recent study by Zhu and Tao [11] calls into question these earlier observations in that their reported Abraham model correlation for log K PDMS-air:

$$\log K_{\mathrm{PDMS\text{-}air}} = 1.524 + 0.660E - 0.006S + 0.896A + 0.369B + 0.452L \quad (8)$$

which is based on a training set containing 192 experimental values, had a very large root-mean-square error of RMSE = 0.532 log units. The training set included values determined by both "wet" and "dry" experimental methodologies. Zhu and Tao did not provide in their paper or accompanying supporting information what numerical values were used for the Abraham model solute descriptors. The authors simply stated that "the optimized Abraham descriptors were calculated by PaDEL Descriptor (Version 2.21)" [27]. An earlier paper co-authored by Zhu and coworkers [28] did include the numerical values; however, our private descriptor database did not contain many of the compounds so we were not able to properly ascertain the quality of the estimated values. It is entirely possible that bad estimates of the descriptor values for several compounds may have led to the rather poor Abraham model correlation and the large resulting RMSE value.
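For readers who wish to reproduce this kind of comparison, the sketch below evaluates Equation (8) with the coefficients quoted above and computes a root-mean-square error between predicted and experimental values. The function names are our own, and the descriptor and data values in the demonstration are illustrative placeholders rather than entries from any dataset.

```python
import math


def zhu_tao_log_k(E: float, S: float, A: float, B: float, L: float) -> float:
    """Equation (8) as quoted in the text (Zhu and Tao [11])."""
    return 1.524 + 0.660 * E - 0.006 * S + 0.896 * A + 0.369 * B + 0.452 * L


def rmse(predicted, experimental) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    squared = [(p - x) ** 2 for p, x in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative inputs only: unit descriptors and made-up value pairs.
unit_prediction = zhu_tao_log_k(E=1.0, S=1.0, A=1.0, B=1.0, L=1.0)
print(round(unit_prediction, 3))
print(rmse([1.0, 2.0], [2.0, 3.0]))
```

Note that RMSE divides by the number of data points, whereas the standard deviation of residuals quoted for regression fits normally divides by the degrees of freedom, so the two statistics are close but not identical for the same residuals.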
It is also possible that Zhu and Tao did not carefully curate their experimental log K PDMS-air database, and that incorrect values and/or values for other polymeric materials were included in their data analysis. For example, in glancing through the log K PDMS-air values used in the regression analyses, we found that values taken from a paper by Boscaini and coworkers [29] were often much larger than values determined by other independent researchers, e.g., log K PDMS-air = 3.28 [29] versus log K PDMS-air = 2.57 [30] for ethanol; log K PDMS-air = 3.90 [29] versus log K PDMS-air = 2.99 [31,32] versus log K PDMS-air = 3.37 [30] for 2-pentanone. In the case of multiple entries for a given solute, the authors simply averaged all numerical values.
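The effect of averaging discordant entries can be seen with the values quoted above. The following sketch computes the mean and the spread for each solute and flags large spreads; the 0.3 log unit threshold is an arbitrary assumption made for illustration and is not a criterion used in this work.

```python
from statistics import mean

# log K PDMS-air values quoted in the text for two solutes.
reported = {
    "ethanol": [3.28, 2.57],
    "2-pentanone": [3.90, 2.99, 3.37],
}


def summarize(values, max_spread=0.3):
    """Mean, spread (max - min) and a flag for spreads above `max_spread`."""
    spread = max(values) - min(values)
    return {"mean": mean(values), "spread": spread, "flag": spread > max_spread}


summary = {solute: summarize(values) for solute, values in reported.items()}
for solute, stats in summary.items():
    print(solute, round(stats["mean"], 3), round(stats["spread"], 2), stats["flag"])
```

Flagging rather than silently averaging makes it easier to trace a discordant value back to its source before it enters a regression.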
We also noted in our search of the published literature that the earlier paper coauthored by Zhu and coworkers [28] did report Abraham model correlations for log P PDMS-water (Equations (9) and (10)), the latter of which contained only two of the five solute descriptors. The reason for the large RMSE in the latter equation is the failure to include all five solute descriptors. Again, since this earlier study contained several compounds for which experiment-based solute descriptors are not available, we suspect that bad estimates of the descriptor values may have played a part in the poor descriptive ability of both Equations (9) and (10). Given the above concerns, we have re-examined the ability of the Abraham model to describe solute transfer into polydimethylsiloxane using a much larger dataset of experimental values. The results of our analysis are provided in the following pages.

Construction of Databases and Determination of Updated Abraham Model Correlations
We start our analysis with the combined database that Sprunger and coworkers [10] used in deriving Equations (5) and (6). The database listed the references from which each experimental value was taken and denoted whether the values were measured on a "wet" or "dry" PDMS phase. We add to the dataset the experimental log K PDMS-air data for 28 isoparaffinic compounds (methyl- and ethyl-branched alkanes) and for 31 alkyl-substituted benzene derivatives determined by Martos and coworkers [33] using a solid-phase microextraction fiber coated with PDMS. Values for cyclopentane, methylcyclopentane and methylcyclohexane were taken from the training data set used by Chao and coworkers [34] in developing empirical QSPR expressions for predicting water-to-DMSO partition coefficients. From the published compilations of Zhu and Tao [11] and Zhu et al. [28], we added those compounds for which we had experiment-based solute descriptors. The objectives of the current study are not only to ascertain if the Abraham model can describe solute transfer into polydimethylsiloxane, but also to develop an updated Abraham model correlation that can be used to calculate solute descriptors for additional organic compounds from measured log P PDMS-water and log K PDMS-air data. The latter objective is not met by including compounds with questionable, estimated solute descriptors in the data analysis.
The measured log P PDMS-water data were converted to their "calculated" experimental log K PDMS-air counterparts using Equation (7) and log K w values taken from our private database. Measured log K PDMS-air data were converted to "calculated" experimental log P PDMS-water values in similar fashion. In total, we were able to assemble 244 log P PDMS-water values and 229 log K PDMS-air values to use in updating the earlier Abraham model expressions of Sprunger and coworkers [10] for describing solute transfer into PDMS. The additional experimental data increase the databases used by Sprunger et al. by 43.5% and 61.3%, respectively, which is more than enough new values to merit revision of the earlier correlations. The two sets of experimental data are tabulated in Tables 1 and 2, along with the descriptor values for all compounds considered in the current study. Given in the last column of both tables are the references from which the measured data were taken. If the values came from the Sprunger et al. database we referenced this paper [10] as the source of the experimental data in order to conserve journal space.

Analysis of the 244 log P PDMS-water values and 229 log K PDMS-air values given in Tables 1 and 2 yielded Equations (11) and (12), which described experimental partition coefficient data for slightly more than 220 different inorganic and organic compounds to within standard deviations of residuals of SD = 0.206 log units (Equation (11)) and SD = 0.176 log units (Equation (12)). The associated statistical information is given below the derived mathematical correlations, and also includes the standard error of estimate, SEE. The above regression analyses, as well as the following training set and test set analyses, were performed using the IBM SPSS software (Version 29.0.0.0, Armonk, US).
Both derived equations provide a reasonably accurate mathematical description of the observed log P PDMS-water and log K PDMS-air data, as evidenced by the relatively small SD values and near unity values for R² and R²adj.
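The goodness-of-fit statistics mentioned here can be computed from the observed and back-calculated values. The sketch below uses the conventional textbook definitions, which we assume (but have not verified) correspond to the quantities reported by the regression software; the function name and the four-point dataset are hypothetical.

```python
import math


def regression_statistics(observed, predicted, n_descriptors):
    """Conventional SD, R2, adjusted R2 and F for a fit with an intercept.

    n_descriptors is the number of fitted slopes (5 for Equations (1) and (2)).
    """
    n = len(observed)
    dof = n - n_descriptors - 1  # residual degrees of freedom
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {
        "N": n,
        "SD": math.sqrt(ss_res / dof),
        "R2": r2,
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / dof,
        "F": ((ss_tot - ss_res) / n_descriptors) / (ss_res / dof),
    }


# Hypothetical four-point example with a single descriptor.
stats = regression_statistics(
    observed=[1.0, 2.0, 3.0, 4.0],
    predicted=[1.1, 1.9, 3.1, 3.9],
    n_descriptors=1,
)
print({key: round(value, 4) for key, value in stats.items()})
```

The adjusted R² penalizes additional fitted coefficients, which is why it is reported alongside R² when correlations with different numbers of descriptors are compared.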
The descriptive ability is further illustrated in Figures 1 and 2. Except for pentan-3-one, most of the solute molecules, represented by the graphed points, fall near the drawn straight line, indicating a near-perfect back-calculation. The experimental log P PDMS-water and log K PDMS-air values of pentan-3-one differ significantly from those of pentan-2-one. We could not think of any reason for excluding the data for pentan-3-one from the regression other than that its experimental values were out of line with those of other similar alkanones. The experimental value for pentan-3-one was measured on a PDMS membrane using a proton transfer reaction mass spectrometric method [29]. Experimental values determined by this particular methodology tended to be consistently larger than values measured using other techniques, as noted in the preceding section of this communication. Additional examples include log K PDMS-air = 3.923 [29] versus log K PDMS-air = 3.462 [33] for 1,2-dimethylbenzene; log K PDMS-air = 3.875 [29] versus log K PDMS-air = 3.320 [33] for 1,3-dimethylbenzene; and log K PDMS-air = 4.214 [29] versus log K PDMS-air = 3.702 [33] for propylbenzene. Large interlaboratory differences are not uncommon as the measurements are not trivial, particularly in the case of nonvolatile compounds and compounds having limited aqueous solubility. Figures 3 and 4 depict the residual values for Equations (11) and (12), respectively.
The updated log K PDMS-air correlation reported in the current communication differs significantly from the recently published equation of Zhu and Tao [11] in terms of its descriptive ability. The root-mean-square error associated with the Zhu and Tao correlation was RMSE = 0.532 log units. Similar differences were observed between our updated log P PDMS-water correlation and the published equation of Zhu et al. [28], which had an RMSE of 0.812. The only explanation that we can offer at this time for why the correlations of Zhu and Tao [11] and Zhu et al. [28] had such a large RMSE is that the authors employed estimated solute descriptors, rather than experiment-based solute descriptors, in their database. We cannot eliminate the possibility that there may have been some bad numerical values of log K PDMS-air and log P PDMS-water included in their analysis, as some of the values that we suspect may have been in error are for compounds for which we do not have experiment-based Abraham model solute descriptors. We also note that a careful examination of Equations (5), (6), (11) and (12) shows that the updated equation coefficients differ only slightly from the earlier values of Sprunger and coworkers [10].

In order to assess the predictive abilities of Equations (11) and (12), each dataset was divided into a training set and a test set, and the training sets were regressed to give Equations (13) and (14). There is very little difference in the equation coefficients for the full dataset and the training set correlations, thus showing that the training set of compounds is a representative sample of the total dataset.
The training set correlations were then used to predict log P PDMS-water values of the 122 compounds in the test set and the log K PDMS-air values of the 114 compounds in the test set. Comparing the predicted and experimental values, we found AAE (average absolute error) = 0.163 and AAE = 0.138, and AE (average error) = −0.023 and AE = −0.064, for Equations (13) and (14), respectively. The training and test computations were performed an additional three times by splitting the large log P PDMS-water and log K PDMS-air datasets into different combinations of experimental values. Very similar results were obtained each time.
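The validation procedure can be sketched as follows. The random half-split and the sign convention (predicted minus experimental) are assumptions made for this illustration, since the text does not specify how compounds were assigned to each set or how the error was signed; the function names and demonstration values are likewise hypothetical.

```python
import random


def split_half(items, seed=0):
    """Shuffle and split into (training, test); the test set gets len // 2 items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 2
    return shuffled[n_test:], shuffled[:n_test]


def average_error(predicted, experimental):
    """AE: mean signed error, here taken as predicted minus experimental."""
    return sum(p - x for p, x in zip(predicted, experimental)) / len(predicted)


def average_absolute_error(predicted, experimental):
    """AAE: mean of the absolute errors."""
    return sum(abs(p - x) for p, x in zip(predicted, experimental)) / len(predicted)


# Splitting 229 indexed data points gives 115 training and 114 test values.
training_ids, test_ids = split_half(range(229), seed=0)
print(len(training_ids), len(test_ids))

# Hypothetical predicted and experimental values.
demo_predicted = [1.0, 2.0]
demo_experimental = [1.5, 1.5]
print(average_error(demo_predicted, demo_experimental))
print(average_absolute_error(demo_predicted, demo_experimental))
```

Repeating the split with different `seed` values corresponds to the additional training and test computations described above; an AE close to zero indicates little systematic bias, while the AAE measures typical prediction error.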

Summary
Updated mathematical correlations based on the Abraham solvation parameter model have been shown to provide a reasonably accurate description/prediction of the observed log P PDMS-water and log K PDMS-air data for a chemically diverse set of slightly more than 220 organic solutes and inorganic gases in "dry" and "wet" PDMS. The updated equations were found to back-calculate the observed data to within a standard deviation of residuals of SD = 0.21 log units (or less). Equation coefficients for the updated correlations differ slightly from the earlier values reported by Sprunger and coworkers [10] and reaffirm the applicability of the Abraham model to describe solute transfer into PDMS. The updated log K PDMS-air correlation reported in the current communication differs significantly from the recently published equation of Zhu and Tao [11] in terms of descriptive/predictive ability. The root-mean-square error associated with the correlation of Zhu and Tao is RMSE = 0.532 log units. Similarly, our updated log P PDMS-water correlation differs from that obtained by Zhu et al. [28] and also has a much better descriptive ability. The only possible explanation that we can offer at this time for the conflicting observations regarding predictive ability is that Zhu and coworkers [11,28] used estimated numerical values for the Abraham model solute descriptors, while we used experiment-based descriptor values determined from measured partition coefficient, molar solubility and chromatographic retention data. Our past experience in using group contribution [35][36][37][38] and machine learning methods [38,39] to estimate Abraham model solute descriptors is that the software programs can return estimated values that differ significantly from values determined from actual experimental data [20,40].
The updated PDMS correlations reported in the current communication provide us with two additional Abraham model equations for calculating experiment-based solute descriptors of additional organic compounds. The published literature contains log P PDMS-water and log K PDMS-air data for many organic compounds for which we do not have descriptor values. In fact, we excluded some of the log K PDMS-air data from the paper by Zhu and Tao [11], as well as some of the log P PDMS-water data from the paper by Zhu et al. [28], from our analyses because we did not have experiment-based descriptor values for the compounds. One of the objectives of the current study was to update the existing Abraham model correlations for PDMS so that we could use the updated correlations in planned later studies to calculate descriptor values for pesticides and other important environmental pollutants. As noted earlier, one should not use mathematical correlations to calculate solute descriptors of additional compounds if the newly obtained descriptor values fall too far outside of the range of values that the correlations themselves were based upon. Pesticides had solute descriptors that fell outside of the range of predictive chemical space for several of our existing correlations. We are gradually updating several of our existing Abraham model correlations to expand their predictive chemical space, and PDMS was one of the solvents/coatings selected for updating. Correlations for ethyl acetate and butyl acetate were recently updated as well [21].