Skin Permeation of Solutes from Metalworking Fluids to Build Prediction Models and Test A Partition Theory

Permeation of chemical solutes through skin can create major health issues. Using the membrane-coated fiber (MCF) as a solid phase membrane extraction (SPME) approach to simulate skin permeation, we obtained partition coefficients for 37 solutes under 90 treatment combinations that could broadly represent formulations that could be associated with occupational skin exposure. These formulations were designed to mimic fluids in the metalworking process, and they are defined in this manuscript using: one of mineral oil, polyethylene glycol-200, soluble oil, synthetic oil, or semi-synthetic oil; at a concentration of 0.05 or 0.5 or 5 percent; with solute concentration of 0.01, 0.05, 0.1, 0.5, 1, or 5 ppm. A single linear free-energy relationship (LFER) model was shown to be inadequate, but extensions that account for experimental conditions provide important improvements in estimating solute partitioning from selected formulations into the MCF. The benefit of the Expanded Nested-Solute-Concentration LFER model over the Expanded Crossed-Factors LFER model is only revealed through a careful leave-one-solute-out cross-validation that properly addresses the existence of replicates to avoid an overly optimistic view of predictive power. Finally, the partition theory that accompanies the MCF approach is thoroughly tested and found to not be supported under complex experimental settings that mimic occupational exposure in the metalworking industry.


Introduction
The assessment of skin permeation of chemical solutes can be used to inform scientific research and regulatory agencies in the risk management of chemical solutes that may be of concern, especially for occupational exposures [1][2][3]. For example, in the metalworking industry, certain performance-enhancing solutes such as corrosion inhibitors, emulsifiers, and biocides/preservatives are often added to the metalworking fluids (MWF). Contact with industrial fluids containing some or all of these performance additives can cause skin irritation or even more harmful consequences [4][5][6][7]. Thus, it is of interest to study the permeation capability of the added solutes through skin, in the hope of finding less permeable solutes that can be used in metalworking fluids.
Unfortunately, conducting skin absorption studies of the many industrial chemicals and many formulations can be very expensive, and many efforts have been made to mimic the skin using synthetic membranes [8][9][10][11][12][13]. Xia et al. [14] proposed an intriguing technique, called the membrane-coated fiber (MCF) assay approach, to simulate the different molecular interactions in skin permeation by different types of materials. In this approach, an MCF is used as the absorption membrane to determine partition coefficients, namely the ratio of the concentration of solute partitioning to the MCF relative to the concentration of solute not partitioning to the MCF. The partition coefficient is a measurement of the strength of molecular interaction that governs percutaneous absorption processes. Assuming that the MCF adequately represents skin absorption, larger values of partition coefficients suggest greater levels of absorption of the solute into skin, translating to possible health implications during the metalworking processes.
To relate the dermal permeability of a solute to the solute's chemical structure or properties, it is very common practice to develop and study a relevant quantitative structure-activity relationship (QSAR) model, as classically demonstrated by [15] and [16] and more recently in studies more relevant to this paper [17][18][19]. Many commonly used QSAR models are linear regression models that use the biological activity (partition coefficients, permeation coefficients, etc.) as the response variable and the molecular descriptors as predictors. The linear free-energy relationship (LFER) model of [20] is a particular type of QSAR model that is widely used in modeling results from dermal permeability studies. The LFER model is easy to use and interpret. However, when experimental conditions are complex, a simple LFER model may not be able to appropriately account for the observed variability, leading to a model with poor fit statistics and low predictive power. Xu et al. [19] expanded the LFER model to account for the heterogeneity introduced by experimental factors by defining one set of partial slopes for each experimental condition. This model proved to be useful, improving both the model fit statistics and predictive power. This article pursues extensions of the LFER model that are in the spirit of [19], but we are able to obtain further improvements in model performance by incorporating additional features observed in the current study. The critical role played by the model assessment criterion $Q^2_{\mathrm{LOSO}}$ is also reviewed. The resulting model provides interpretations that are useful for identifying solutes whose chemical structures are consistent with low predicted levels of skin permeability.
An attractive feature of the MCF approach of [14] is their proposed partition theory, namely that the partition coefficient of a solute from a formulation is not affected by the starting concentration of that solute in the formulation. This theory, if realized, can lead to simplified analysis even in the most complex of experimental conditions. By applying an expanded LFER model, we are able to test this theory that could not otherwise be tested.
Earlier efforts by Xia et al. 2007 [13] demonstrated the use of an MCF array to simulate skin permeability in simple binary mixtures. The present paper instead uses the MCF and molecular structure parameters within the LFER framework described above to better estimate the effects of several real-world formulations, at various concentrations, on the partitioning behavior of 37 solutes at different concentrations. The aim is to estimate solute partitioning into the MCF, which serves as a surrogate for skin permeability.

Data Summaries
Formulations are designed to mimic fluids used in the metalworking process. For this article, a formulation refers to: a particular metalworking fluid (MWF), at a particular MWF concentration, spiked with a solute at a particular concentration. Formulations are spiked with trace levels of solutes in such a way that the chemistry of the MWF is not altered.
In this study, we considered 37 solutes (see Table 1) and five solvatochromic descriptors believed to be most relevant to the solvation process during permeation [16,20]. These descriptors represent different characteristics of compounds involved in the solvation process, specified as follows. E is the solute excess molar refractivity, S is the solute dipolarity/polarizability, A is the overall hydrogen bond acidity, B is the overall hydrogen bond basicity, and V is the McGowan characteristic volume. For most solutes, V can be calculated directly, E can be obtained from experiment or calculated, but A, B, and S must be experimentally derived. We varied the three other factors to create a formulation: the MWF, MWF concentration, and solute concentration. Five MWFs were considered: mineral oil (MO), polyethylene glycol-200 (PEG), soluble oil (SO), synthetic oil (SYN), and semi-synthetic oil (SSYN). MWF concentrations were at three levels: 0.05 percent, 0.5 percent, and 5 percent. Six solute concentrations were considered: 0.01, 0.05, 0.1, 0.5, 1, and 5 ppm. As a result, there were 5 × 3 × 6 = 90 treatment combinations, as displayed in Table A1 in Appendix A.
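The factorial layout described above can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and the numbering order (MWF varying slowest, solute concentration varying fastest) is an assumption chosen because it agrees with the treatment combinations cited in the text (for example, combination 5 is mineral oil at 0.05 percent with 1 ppm solute).

```python
from itertools import product

# Factor levels as listed in the text.
MWFS = ["MO", "PEG", "SO", "SYN", "SSYN"]
MWF_CONC_PERCENT = [0.05, 0.5, 5.0]
SOLUTE_CONC_PPM = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]

# Assumed numbering: MWF varies slowest, solute concentration fastest.
treatments = dict(
    enumerate(product(MWFS, MWF_CONC_PERCENT, SOLUTE_CONC_PPM), start=1)
)

n_solutes, n_replicates = 37, 3
max_observations = n_solutes * len(treatments) * n_replicates

print(len(treatments))    # 90
print(treatments[5])      # ('MO', 0.05, 1.0)
print(max_observations)   # 9990
```

Under this assumed ordering, combinations 17 and 52 also match the descriptions given later in the article (mineral oil at five percent with 1 ppm solute, and soluble oil at five percent with 0.5 ppm solute).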
The study was designed to obtain partition coefficients, $K_{\mathrm{MCF/mix}}$, for all 37 solutes, under each of the 90 treatment combinations, using three replicates. Unfortunately, for a variety of reasons (e.g., lack of detection in gas chromatography, records outside the calibration range, etc.), not all replicates were recordable, and some treatment combinations ended with no replicates at all for a particular solute. Fitting the QSAR model does not require replicates because of the structure provided by the model, and all collected data inform the fitting process. Having replicates would likely result in smaller measures of variability and hence greater power to make inference beyond what could be done here, but the lack of replicates has not impeded the ability to conduct statistical analysis and model building. Of the maximum possible 37 × 90 × 3 = 9990 observations, we actually generated 4646 partition coefficients.
Summary statistics are displayed in Table 2 for all variables, based on the complete dataset of 4646 observations. Partition coefficients range from 0.015 to 1279 (−1.820 to 3.107 on the base 10 logarithm scale). To get a more detailed view of the range of values for partition coefficients, Figure 1 shows boxplots of $\log K_{\mathrm{MCF/mix}}$ grouped by solute concentration. It is somewhat surprising that the smallest partition coefficients are associated with higher concentrations of solute present in the formulation; we return to this observation later in the article.

Insufficiency of the LFER Model
Abraham and Martins [20] proposed the general linear free-energy relationship (LFER) model to study dermal absorption:

$$SP = \beta_0 + \beta_1 E + \beta_2 S + \beta_3 A + \beta_4 B + \beta_5 V,$$

where SP is the property of interest for the solutes (such as $\log K_p$, $\log P$, etc.). Given data, the coefficients in the LFER model are determined by multiple linear regression. These coefficients are also commonly denoted as c, e, s, a, b, and v; we use $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ as this is more common in the literature of multiple linear regression. In this article, the logarithm of the partition coefficient, $\log K_{\mathrm{MCF/mix}}$, is the property of interest. The resulting LFER model is shown in Equation (1):

$$\log K_{\mathrm{MCF/mix}} = \beta_0 + \beta_1 E + \beta_2 S + \beta_3 A + \beta_4 B + \beta_5 V. \tag{1}$$

While the LFER model in Equation (1) is simple and easy to interpret, it is not always sufficient, especially for large datasets under complicated experimental conditions. Equation (1) suggests that the expected value of $\log K_{\mathrm{MCF/mix}}$ is a function of only E, S, A, B, and V. However, as is clearly demonstrated in Figure 1, $\log K_{\mathrm{MCF/mix}}$ decreases as solute concentration increases, suggesting that solute concentration should likely be included as a predictor in Equation (1); we return to this observation below.
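As an illustration of how the LFER coefficients are determined by multiple linear regression, the following sketch fits an intercept and five partial slopes by ordinary least squares using only the Python standard library. The descriptor values and "true" coefficients are invented for the demonstration and are not taken from this study; the function names are hypothetical, and solving the normal equations directly is a simplification that statistical software typically replaces with more numerically stable methods.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lfer(descriptors, log_k):
    """Least-squares fit of log K = b0 + b1*E + b2*S + b3*A + b4*B + b5*V.

    descriptors: list of (E, S, A, B, V) tuples, one per observation.
    Returns [b0, b1, b2, b3, b4, b5] from the normal equations X'X b = X'y.
    """
    rows = [[1.0, *d] for d in descriptors]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_k)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical descriptor values (E, S, A, B, V); NOT data from this study.
descriptors = [
    (0.61, 0.52, 0.00, 0.14, 0.72), (0.80, 0.89, 0.60, 0.30, 0.78),
    (1.34, 0.92, 0.00, 0.20, 1.09), (0.22, 0.42, 0.37, 0.48, 0.59),
    (0.87, 1.11, 0.00, 0.28, 0.89), (0.72, 0.65, 0.00, 0.07, 0.84),
    (0.95, 1.20, 0.26, 0.35, 1.01), (0.18, 0.60, 0.00, 0.45, 0.75),
    (0.45, 0.95, 0.10, 0.60, 0.65), (1.10, 0.70, 0.50, 0.10, 1.20),
]
true_beta = [0.5, 0.3, -0.8, -1.2, -2.0, 2.5]  # invented for the demo
log_k = [
    true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], d))
    for d in descriptors
]

beta_hat = fit_lfer(descriptors, log_k)
print([round(b, 4) for b in beta_hat])
```

Because the demonstration responses are generated without noise, the fitted coefficients should reproduce the invented ones up to rounding; with real measurements the estimates would carry standard errors, as reported in Table 3.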
Focusing for the moment on the LFER model, Equation (1) was separately applied to data from each of the 90 treatment combinations, resulting in 90 separate estimated models. If all 90 estimated models essentially coincide, then the LFER model that only accounts for E, S, A, B, and V, and does not adjust for experimental conditions, is sufficient. To investigate this, Table 3 presents details on three of the 90 estimated models; details include estimated coefficients, their standard errors, and associated 95 percent confidence intervals. Estimated models are shown for: treatment combination 5, with mineral oil at 0.05 percent and solute concentration 1 ppm; treatment combination 17, with mineral oil at five percent and solute concentration 1 ppm; and treatment combination 52, with soluble oil at five percent and solute concentration 0.5 ppm.

Table 3. Results from fitting separate LFER models (Equation (1)) for each of three treatment combinations (T).

The estimated models in Table 3 did not coincide. Consider, for example, the coefficient $\beta_1$ corresponding to E. For treatment combination 5, the 95 percent confidence interval consists of only positive values (0.89 to 2.65), suggesting that $\log K_{\mathrm{MCF/mix}}$ is expected to increase as excess molar refractivity increases. On the other hand, the 95 percent confidence interval consists of only negative values (−1.35 to −0.48) for treatment combination 17, suggesting that $\log K_{\mathrm{MCF/mix}}$ is expected to decrease as excess molar refractivity increases. These conflicting interpretations are not isolated. Figure 2 graphs the 95 percent confidence intervals for coefficient $\beta_1$ corresponding to E from all 90 treatment combinations, and these intervals clearly do not coincide. Moreover, similar results hold for all coefficients, as demonstrated in Table 3.

Figure 2.
Estimated $\beta_1$ coefficients (circles) for molecular descriptor E from fitting separate LFER models across the 90 treatment combinations. Ninety-five percent (95%) confidence intervals are also shown, as vertical lines with two bars at the ends.
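The comparison of confidence intervals described in this section can be made concrete with a small helper. The sketch below uses the two intervals for the coefficient of E quoted in the text (treatment combinations 5 and 17); the function names are hypothetical. Checking for non-overlap is used here only as a simple, conservative indication that two estimates disagree, not as a formal test.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return max(a_lo, b_lo) <= min(a_hi, b_hi)


def sign_of_interval(interval):
    """Classify an interval as entirely positive, entirely negative, or spanning zero."""
    lo, hi = interval
    if lo > 0:
        return "positive"
    if hi < 0:
        return "negative"
    return "includes zero"


# 95 percent confidence intervals for the coefficient of E, as quoted in the text.
ci_e_treatment_5 = (0.89, 2.65)
ci_e_treatment_17 = (-1.35, -0.48)

print(sign_of_interval(ci_e_treatment_5))                      # positive
print(sign_of_interval(ci_e_treatment_17))                     # negative
print(intervals_overlap(ci_e_treatment_5, ci_e_treatment_17))  # False
```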

Improvement by Expanded LFER Models
Xu et al. [19] demonstrated the insufficiency of the LFER model for accounting for experimental conditions defined by four MWFs. They extended the LFER model by allowing for different sets of estimated coefficients for each of the four MWFs, all while using a single model, and obtained substantial improvements in predictive power of the Extended LFER model compared to the (single) LFER model. Hoping to achieve similar levels of improvement as [19], we also fitted an Extended LFER model that allows for different sets of estimated coefficients for each of the 90 treatment combinations, while using a single model, as follows:

$$
\begin{aligned}
\log K_{\mathrm{MCF/mix},ijkl} = {} & F_{1jkl} C_{i1kl} W_{ij1l} \left(\beta_{0111} + \beta_{1111} E_l + \beta_{2111} S_l + \beta_{3111} A_l + \beta_{4111} B_l + \beta_{5111} V_l\right) \\
& + F_{1jkl} C_{i1kl} W_{ij2l} \left(\beta_{0112} + \beta_{1112} E_l + \beta_{2112} S_l + \beta_{3112} A_l + \beta_{4112} B_l + \beta_{5112} V_l\right) \\
& + \cdots \\
& + F_{5jkl} C_{i3kl} W_{ij6l} \left(\beta_{0536} + \beta_{1536} E_l + \beta_{2536} S_l + \beta_{3536} A_l + \beta_{4536} B_l + \beta_{5536} V_l\right),
\end{aligned}
\tag{2}
$$

where $\log K_{\mathrm{MCF/mix},ijkl}$ is the lth observation from MWF i (i = 1 for MO, i = 2 for PEG, i = 3 for SO, i = 4 for SYN, and i = 5 for SSYN), MWF concentration j (j = 1 for 0.05, j = 2 for 0.5, and j = 3 for 5 percent), and solute concentration k (k = 1 for 0.01, k = 2 for 0.05, k = 3 for 0.1, k = 4 for 0.5, k = 5 for 1, and k = 6 for 5 ppm). In Equation (2), $\beta_{dijk}$ denotes the coefficient for descriptor d (with d = 0 for the intercept, d = 1 for E, d = 2 for S, d = 3 for A, d = 4 for B, and d = 5 for V) corresponding to MWF i, MWF concentration j, and solute concentration k. For example, $\beta_{1111}$ is the partial slope for descriptor E under treatment combination 1, with mineral oil at 0.05 percent and solute concentration 0.01 ppm. Three "dummy variables" $F_{ijkl}$, $C_{ijkl}$, and $W_{ijkl}$ are defined to indicate treatment combinations; these variables take value zero or one according to the levels of MWF, MWF concentration, and solute concentration.
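To illustrate how the three dummy variables select one coefficient set, the following sketch computes the product of the MWF, MWF concentration, and solute concentration indicators for every (i, j, k) and lists which products equal one for a given observation. The dictionary keys and function name are hypothetical; only the factor levels and index conventions come from the text.

```python
MWFS = ["MO", "PEG", "SO", "SYN", "SSYN"]            # i = 1..5
MWF_CONC_PERCENT = [0.05, 0.5, 5.0]                  # j = 1..3
SOLUTE_CONC_PPM = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]   # k = 1..6


def indicator_product(obs, i, j, k):
    """Product of the three 0/1 dummy variables for coefficient set (i, j, k)."""
    f = 1 if obs["mwf"] == MWFS[i - 1] else 0
    c = 1 if obs["mwf_conc"] == MWF_CONC_PERCENT[j - 1] else 0
    w = 1 if obs["solute_conc"] == SOLUTE_CONC_PPM[k - 1] else 0
    return f * c * w


# An observation from treatment combination 2:
# mineral oil at 0.05 percent with solute concentration 0.05 ppm.
obs = {"mwf": "MO", "mwf_conc": 0.05, "solute_conc": 0.05}

active = [
    (i, j, k)
    for i in range(1, len(MWFS) + 1)
    for j in range(1, len(MWF_CONC_PERCENT) + 1)
    for k in range(1, len(SOLUTE_CONC_PPM) + 1)
    if indicator_product(obs, i, j, k) == 1
]
print(active)  # [(1, 1, 2)]

# One intercept plus five partial slopes per treatment combination.
n_coefficients = len(MWFS) * len(MWF_CONC_PERCENT) * len(SOLUTE_CONC_PPM) * 6
print(n_coefficients)  # 540
```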
The model in Equation (2) is quite large, having a maximum of 90 intercepts (one for each treatment combination) and 5 × 90 = 450 partial slopes (slopes corresponding to each of E, S, A, B, and V for each treatment combination). For any given observation, Equation (2) activates only a single set of coefficients because the product $F_{ijkl} C_{ijkl} W_{ijkl}$ will be nonzero for only a single treatment combination. For example, if the observation is in treatment combination 2 (mineral oil at concentration 0.05 percent with solute concentration 0.05 ppm), then $F_{1jkl} C_{i1kl} W_{ij2l} = 1$ and all other $F_{ijkl} C_{ijkl} W_{ijkl} = 0$, thus activating only $\beta_{0112} + \beta_{1112} E_l + \beta_{2112} S_l + \beta_{3112} A_l + \beta_{4112} B_l + \beta_{5112} V_l$ in Equation (2). Since Equation (2) is based on multiplying the dummy variables, we refer to it as the Expanded Crossed-Factors LFER model.

Table 4 shows regression statistics from fitting the Expanded Crossed-Factors LFER model of Equation (2). Regression statistics are also shown for the (single) LFER model of Equation (1), and another model to be described later. The improvements in $r^2$, Adj-$r^2$, $Q^2_{\mathrm{LOO}}$, and $Q^2_{\mathrm{LOSO}}$ are quite noticeable in favor of the Expanded Crossed-Factors LFER model over the LFER model. While $r^2$ and Adj-$r^2$ are widely known, $Q^2_{\mathrm{LOO}}$ and $Q^2_{\mathrm{LOSO}}$ may be less familiar. Both are designed to measure the predictive ability of a model, but [19] demonstrate the advantage of $Q^2_{\mathrm{LOSO}}$ over $Q^2_{\mathrm{LOO}}$ for the current context. Leave-one-out (LOO) cross-validation is employed in both, meaning that models are fit after reducing the dataset, and the resulting fit is then used to make predictions on the portion of the data that was left out. The difference is that $Q^2_{\mathrm{LOSO}}$ leaves out an entire solute at a time, whereas $Q^2_{\mathrm{LOO}}$ omits a single row from the dataset.
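The distinction between leaving out one row and leaving out one solute can be sketched as follows. This is a deliberately simplified illustration, not the analysis performed in this article: it uses a single made-up descriptor and a straight-line fit in place of the full LFER model, the data are invented, and all function names are hypothetical. Each statistic is computed as one minus the ratio of the summed squared held-out prediction errors to the total squared deviation of the responses from their overall mean.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one descriptor only)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def q2(records, group_of):
    """Cross-validated Q^2; records are (solute, x, y) tuples.

    group_of(index) returns the key of the group that is held out
    together with that row.
    """
    groups = {}
    for idx in range(len(records)):
        groups.setdefault(group_of(idx), []).append(idx)
    press = 0.0  # sum of squared held-out prediction errors
    for held_out in groups.values():
        held = set(held_out)
        train = [r for i, r in enumerate(records) if i not in held]
        b0, b1 = fit_line([x for _, x, _ in train], [y for _, _, y in train])
        for i in held_out:
            _, x, y = records[i]
            press += (y - (b0 + b1 * x)) ** 2
    y_bar = mean(y for _, _, y in records)
    total = sum((y - y_bar) ** 2 for _, _, y in records)
    return 1.0 - press / total


def q2_loo(records):
    """Hold out one row at a time (replicates of that solute stay in)."""
    return q2(records, group_of=lambda i: i)


def q2_loso(records):
    """Hold out every row belonging to one solute at a time."""
    return q2(records, group_of=lambda i: records[i][0])


# Invented data with replicates: (solute label, descriptor value, response).
noisy = [
    ("a", 1.0, 3.1), ("a", 1.0, 2.9), ("a", 1.0, 3.0),
    ("b", 2.0, 5.4), ("b", 2.0, 5.6),
    ("c", 3.0, 6.8), ("c", 3.0, 7.1),
    ("d", 4.0, 9.3),
]
print(q2_loo(noisy), q2_loso(noisy))
```

The only difference between the two functions is the grouping key: rows are their own groups for LOO, whereas all rows sharing a solute label form one group for LOSO, so no replicate of the held-out solute can inform the fit.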
If only a single row is removed from the dataset, we are left with the possibility that a single replicate of a solute in a particular formulation may be removed, but the other two replicates remain in the dataset. The result is that the model is fit with almost full knowledge of the solute in question, and the consequence is that we are misled about the quality of the model for fitting "new, unseen" solutes. By removing every instance of a solute, $Q^2_{\mathrm{LOSO}}$ provides a better assessment of the quality of the model for predicting new, unseen solutes. Large values are desirable for both $Q^2_{\mathrm{LOO}}$ and $Q^2_{\mathrm{LOSO}}$, but the extra demands placed on $Q^2_{\mathrm{LOSO}}$ usually result in smaller values of $Q^2_{\mathrm{LOSO}}$ compared to $Q^2_{\mathrm{LOO}}$, in much the same way that Adj-$r^2$ is often smaller than $r^2$. (It is important to note that $Q^2_{\mathrm{LOSO}}$ in this article is equivalent to $Q^2_{\mathrm{LOO-adj}}$ in [19]. We prefer the simpler "LOSO" as it more clearly explains the difference from "LOO".)

$Q^2_{\mathrm{LOO}}$ is defined as:

$$Q^2_{\mathrm{LOO}} = 1 - \frac{\sum_{l} \left(y_l - \hat{y}_{l,-l}\right)^2}{\sum_{l} \left(y_l - \bar{y}\right)^2}, \tag{3}$$

where $y_l$ is the lth observed response of $\log K_{\mathrm{MCF/mix}}$, $\hat{y}_{l,-l}$ is the leave-one-out prediction of the lth observation based on the model fit without the lth observation, and $\bar{y}$ is the average of all the observed responses. $Q^2_{\mathrm{LOSO}}$, designed by [19] to handle pseudo or real replicates in leave-one-out cross-validation for proper assessment of predictive power, is defined as:

$$Q^2_{\mathrm{LOSO}} = 1 - \frac{\sum_{s} \sum_{l} \left(y_{sl} - \hat{y}_{sl,-s}\right)^2}{\sum_{s} \sum_{l} \left(y_{sl} - \bar{y}\right)^2}, \tag{4}$$

where $y_{sl}$ is the lth observation of the sth solute, $\bar{y}$ is the average of all the observed responses, and $\hat{y}_{sl,-s}$ is the predicted value of $y_{sl}$ based on the model fit from leaving out all the observations belonging to the sth solute.

While $Q^2_{\mathrm{LOSO}}$ showed improvement of the Expanded Crossed-Factors LFER model over the LFER model, the value of 0.68 is not impressive and indicates some deficiency of the model. One possible reason may be overfitting. With so many regression parameters, this model seems to fit the data too closely, so that the idiosyncrasies of the data are captured instead of the general trends.
The problem of overfitting is that when the model is applied to a new dataset, it cannot predict the new data well, as indicated by the weak value of $Q^2_{\mathrm{LOSO}}$. This motivates us to look for an alternative model, which not only accounts for the heterogeneity introduced by different experimental conditions, but is also simpler and more predictive. The LFER model may be expanded in a variety of ways that accommodate experimental conditions, and the goal is to identify the simplest adequate expansion.

As previously mentioned, the Expanded Crossed-Factors LFER model of Equation (2) is quite large, and we wondered whether it could be simplified. Figure 1 tells us that partition coefficients decrease as the solute concentration increases. This suggests that there may be a quantifiable relationship between $\log K_{\mathrm{MCF/mix}}$ and solute concentration. However, Figure 1 shows the overall effect of solute concentration, not accounting for the effect of MWF or MWF concentration. Thus, a more detailed visualization is desired. Figure 3 depicts the trend of $\log K_{\mathrm{MCF/mix}}$ over solute concentration in all 15 combinations of MWF and MWF concentration. It shows a similar trend as in Figure 1 for each of the 15 combinations of MWF and MWF concentration. Figure 3 suggests that instead of viewing solute concentration as a third factor crossed with MWF and MWF concentration, we can take it as a (numerically) nested factor within each of the combinations of MWF and MWF concentration. In other words, for each combination of MWF and MWF concentration, we allow a different partial slope for solute concentration. By doing this, we place a structure within each MWF × MWF concentration condition, and may be able to see how $\log K_{\mathrm{MCF/mix}}$ changes as a function of solute concentration.
We propose a new Expanded Nested-Solute-Concentration LFER model as in Equation (5):

$$
\begin{aligned}
\log K_{\mathrm{MCF/mix},ijl} = {} & F_{1jl} C_{i1l} \left(\beta_{011} + \beta_{111} E_l + \beta_{211} S_l + \beta_{311} A_l + \beta_{411} B_l + \beta_{511} V_l + \beta_{611} t_l\right) \\
& + F_{1jl} C_{i2l} \left(\beta_{012} + \beta_{112} E_l + \beta_{212} S_l + \beta_{312} A_l + \beta_{412} B_l + \beta_{512} V_l + \beta_{612} t_l\right) \\
& + \cdots \\
& + F_{5jl} C_{i3l} \left(\beta_{053} + \beta_{153} E_l + \beta_{253} S_l + \beta_{353} A_l + \beta_{453} B_l + \beta_{553} V_l + \beta_{653} t_l\right),
\end{aligned}
\tag{5}
$$

where $\log K_{\mathrm{MCF/mix},ijl}$ is the lth observation from MWF i and MWF concentration j, $t_l$ is the logarithm (base 10) of solute concentration of the lth observation, and $\beta_{dij}$ is the regression coefficient of descriptor d (d = 0 for intercept, d = 1 for E, d = 2 for S, d = 3 for A, d = 4 for B, d = 5 for V, and d = 6 for logarithm of solute concentration) for MWF i and MWF concentration j. We take the logarithm of solute concentration as it is common practice and it linearizes the relationship. This model is relatively small, with a maximum of 15 × 7 = 105 coefficients to be estimated, compared to a maximum of 540 for the model in Equation (2).

Regression statistics are shown in Table 4, and it is clear that the Expanded Nested-Solute-Concentration LFER model of Equation (5) is at least as good as the Expanded Crossed-Factors LFER model of Equation (2), because it has comparable or larger values for all regression statistics. Moreover, the Expanded Nested-Solute-Concentration LFER model of Equation (5) has a tremendous advantage in that: (1) it is much smaller, and so more amenable to interpretation; and (2) it is more predictive, as indicated by a much larger value for $Q^2_{\mathrm{LOSO}}$. Figure 4 plots observed versus predicted $\log K_{\mathrm{MCF/mix}}$ values for both the LFER and Expanded Nested-Solute-Concentration LFER models. The tighter grouping around the line for the latter model is yet another demonstration of that model's better predictive power.
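The block structure of the nested model can be sketched as a design-matrix row: each observation contributes seven terms (intercept, E, S, A, B, V, and the base 10 logarithm of solute concentration) placed in the block belonging to its MWF and MWF concentration, with zeros elsewhere. The function name, the column ordering, and the example descriptor values below are assumptions made for illustration only.

```python
import math

MWFS = ["MO", "PEG", "SO", "SYN", "SSYN"]
MWF_CONC_PERCENT = [0.05, 0.5, 5.0]
N_TERMS = 7  # intercept, E, S, A, B, V, log10(solute concentration)
N_BLOCKS = len(MWFS) * len(MWF_CONC_PERCENT)  # 15 MWF/MWF concentration pairs


def nested_design_row(mwf, mwf_conc, descriptors, solute_conc_ppm):
    """Return one row of a 105-column design matrix for the nested model.

    descriptors is an (E, S, A, B, V) tuple. Only the seven columns of the
    block for (mwf, mwf_conc) are filled; all other columns stay at zero.
    """
    block = MWFS.index(mwf) * len(MWF_CONC_PERCENT) + MWF_CONC_PERCENT.index(mwf_conc)
    terms = [1.0, *descriptors, math.log10(solute_conc_ppm)]
    row = [0.0] * (N_BLOCKS * N_TERMS)
    row[block * N_TERMS:(block + 1) * N_TERMS] = terms
    return row


# Hypothetical solute descriptors (E, S, A, B, V); not taken from Table 1.
row = nested_design_row("SSYN", 5.0, (0.8, 0.9, 0.1, 0.3, 1.0), 0.1)
print(len(row))                                     # 105
print([i for i, v in enumerate(row) if v != 0.0])   # columns 98 to 104
```

Treating solute concentration as a numeric column within each block, rather than as a sixth-level factor crossed with the other two, is what reduces the maximum number of coefficients from 540 to 105.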

Model Interpretation
We now interpret the estimated Expanded Nested-Solute-Concentration LFER model of Equation (5). There are 15 rows in Equation (5), each representing the regression function for one combination of MWF/MWF concentration. For example, row one is for MWF mineral oil at concentration 0.05 percent, while row 15 is for MWF semi-synthetic oil at concentration five percent. Each row has a set of partial slopes that vary among the different combinations of MWF/MWF concentration. The estimates and associated standard errors of all partial slopes are shown in Table A2 in Appendix A.
To show how the partial slopes vary, in Figure 5 we plot 95 percent confidence intervals for each partial slope corresponding to E, S, A, B, V and log solute concentration across all 15 combinations of MWF/MWF concentration. The 95 percent confidence intervals are shown as vertical lines with two bars at the ends. A horizontal reference line at zero is also shown. There are some interesting trends seen in Figure 5.
For example, in Figure 5a, the partial slope of E generally decreases as MWF concentration increases within each MWF. In mineral oil, the effect (sign of $\beta_1$) of E (solute excess molar refractivity) even changes as MWF concentration increases. To be specific, using mineral oil at a concentration of 0.05 percent, if we increase solute excess molar refractivity while other predictors are held fixed, then the partition coefficient is expected to increase (the 95 percent confidence interval lies above the reference line). On the other hand, using mineral oil at the higher concentration of five percent, if we increase solute excess molar refractivity, then we expect the partition coefficient to decrease (the 95 percent confidence interval lies below the reference line).
In Figure 5b, the partial slope of S generally increases as MWF concentration increases within mineral oil, soluble oil, and semi-synthetic oil, but partial slopes show no significant change as MWF concentration increases within polyethylene glycol-200 and synthetic oil. In general, S (solute dipolarity/polarizability) has an inverse relationship with the expected partition coefficient, meaning that as S increases we expect a decrease in partition coefficient. Figure 5c suggests that increased levels of hydrogen bond acidity A are associated with decreased partition coefficients. However, the pattern of decrease changes according to the concentration of MWF. For example, in both mineral oil and soluble oil, higher MWF concentrations result in smaller decreases in partition coefficients. Figure 5d indicates that increased levels of hydrogen bond basicity B generally lead to decreased partition coefficients. Figure 5e shows that larger molecules tend to have larger partition coefficients. In soluble oil, synthetic oil and semi-synthetic oil, the effect of molecule size V gets smaller as MWF concentration increases, resulting in a less dramatic effect of molecule size on partition coefficients. Figure 5f suggests that higher concentrations of solute generally result in lower partition coefficients. In both mineral oil and soluble oil, higher MWF concentrations result in stronger inverse relationships.
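Because the response is on the base 10 logarithm scale, a partial slope translates into a multiplicative change in the partition coefficient when the other predictors are held fixed. The short sketch below makes this explicit; the slope value used is hypothetical and is not an estimate from Table A2, and the function name is an assumption.

```python
def fold_change_in_k(slope, predictor_change):
    """Multiplicative change in K implied by a slope on the log10 K scale.

    If log10 K changes by slope * predictor_change, then K is multiplied
    by 10 ** (slope * predictor_change), other predictors held fixed.
    """
    return 10 ** (slope * predictor_change)


# Hypothetical slope for log10(solute concentration); NOT a reported estimate.
hypothetical_beta_6 = -0.3

# A tenfold increase in solute concentration raises log10(concentration) by 1.
print(round(fold_change_in_k(hypothetical_beta_6, 1.0), 3))  # 0.501
```

Under this hypothetical slope, a tenfold increase in solute concentration would roughly halve the expected partition coefficient, whereas a slope of zero would leave it unchanged.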

Implication of Partition Theory
According to [14], it is assumed that the amount of solute extracted from the MCF, $n_0$, is proportional to the solute concentration, $C_0$, where the proportionality constant is not affected by $C_0$. Based on this assumption, we obtain $n_0 = pC_0$, where p is the proportionality constant and 0 ≤ p ≤ 1. Applying this relationship to partition coefficients, we obtain Equation (6), which suggests that $K_{\mathrm{MCF/mix}}$ is independent of $C_0$; that is, irrespective of the solute concentration, the partition coefficient remains the same. This so-called "partition theory", if true, has practical meaning in the metalworking industry as it would indicate that increasing solute concentration has no impact on the skin permeation ability of the solute. For example, higher concentrations of biocides might be preferred to extend preservation of fluids, without the detrimental effect of increasing the biocide's ability to permeate skin.

As described in more detail in the methods section, the MCF consists of a PDMS coating that is 100 µm thick and 1 cm long on an inert silica fiber. Solute partitioning into this membrane is dependent on the many chemical-chemical interactions quantified by our Expanded LFER models. However, the membrane volume ($V_m$) suggests that there may be a limit on partitioning as solute concentration increases. It was, therefore, of interest to see whether this partition theory is supported by our data.

Violation from Experimental Data
Assume the Expanded Nested-Solute-Concentration LFER model of Equation (5). To test whether the partition theory holds, we test whether the coefficients corresponding to any solute concentration terms are different from zero. If all coefficients corresponding to solute concentration terms equal zero in Equation (5), then $\log K_{MCF/mix}$ will not change as solute concentration changes. More specifically, we test the following null hypothesis: $H_0\colon \beta_{6ij} = 0$ for all $i = 1, 2, 3, 4, 5$ and $j = 1, 2, 3$.
The resulting p-value of less than 0.0001 provides strong evidence that the solute concentration term for at least one combination of MWF/MWF concentration is different from zero. In fact, the individual p-values for testing each $\beta_{6ij} = 0$ show that the solute concentration effect is significantly different from zero for 12 of the 15 combinations; nonsignificance is obtained only in MO/0.05, PEG/5, and SYN/0.05. These results are consistent with Figure 5f, where confidence intervals contain zero only for mineral oil at concentration 0.05, polyethylene glycol-200 at concentration 5, and synthetic oil at concentration 0.05.
Hoping to find that the partition theory holds at either low or high solute concentrations, we considered subsets of the data containing only some of the solute concentrations. Table 5 gives detailed results of testing the null hypothesis that the partition theory holds for a number of different subsets of solute concentrations. For example, does the partition theory hold when considering only observations with solute concentrations less than or equal to 1 ppm? The answer is provided by row two of Table 5: with a p-value of less than 0.0001, the partition theory does not hold for solute concentrations less than or equal to 1 ppm, with violations occurring in eight of the 15 combinations. In fact, the partition theory is violated in all subsets of solute concentrations.

Table 5. The subset of solute concentrations is shown in the first column, with the p-value given in the second column. MWF/MWF concentrations that support the partition theory (meaning their individual p-values are larger than 0.05/15, where division by 15 adjusts for multiple testing) are shown in the third column (with sample sizes in parentheses). MWF/MWF concentrations that violate the partition theory are shown in the last column (with sample sizes in parentheses). The partition theory is violated in every subset, with the greatest support for the partition theory being achieved when limiting solute concentration to 0.05, 0.1, or 0.5 ppm as the largest subset.
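The multiple-testing rule used to sort combinations into the "support" and "violate" columns can be sketched as below. This is only an illustration of the 0.05/15 threshold described above: the combination labels and p-values are hypothetical placeholders rather than results from the study, and the helper name `classify` is our own.

```python
ALPHA = 0.05
N_TESTS = 15  # 5 MWF types x 3 MWF concentrations


def classify(p_values, alpha=ALPHA, n_tests=N_TESTS):
    """Split combinations by the Bonferroni-adjusted threshold alpha / n_tests.

    A combination "supports" the partition theory when its individual
    p-value is larger than the threshold, and "violates" it otherwise.
    """
    threshold = alpha / n_tests
    supports = sorted(k for k, p in p_values.items() if p > threshold)
    violates = sorted(k for k, p in p_values.items() if p <= threshold)
    return threshold, supports, violates


# HYPOTHETICAL labels and p-values, for illustration only.
example = {
    "MWF1/0.05": 0.40,
    "MWF1/5": 0.0001,
    "MWF2/0.5": 0.02,
    "MWF2/5": 0.003,
}

threshold, supports, violates = classify(example)
print(f"threshold = {threshold:.5f}")
print("support:", supports)
print("violate:", violates)
```

Note that a p-value of 0.02 would be significant at the unadjusted 0.05 level but falls in the "support" group once the threshold is divided by 15.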

Materials and Methods
Our experiments were based on the MCF approach proposed in [14]. Only a single MCF was used, namely PDMS (polydimethylsiloxane). In the current study, solutes were dissolved into a particular formulation, then an MCF was placed in the vial to allow the solute to partition from the solute-spiked formulation into the MCF over a period of one to four hours; see Figure 6. Gas chromatography and mass spectrometry were then used to extract or desorb the solute from the MCF, and the amount extracted was recorded.

Solvent/Solute Preparation
Three generic industry metalworking fluid (MWF) formulations (soluble oil, synthetic fluid, and semi-synthetic fluid) were kindly supplied by Cimcool Industrial Products LLC (Cincinnati, OH, USA). The precise composition of each of these three formulations is proprietary information. In general, soluble oil concentrates contained approximately 58% mineral oil along with various other performance additives such as sulfonates and ethanolamines; semi-synthetic fluid concentrates contained about 15% mineral oil along with other additives such as sulfonates and ethanolamines; and synthetic fluid concentrates contained no mineral oil but contained various carboxylic acid salts, ethanolamines, ethylene glycols, and plant seed oils. This is typical of many commercial MWF formulations that fall into these three categories. In addition to these three MWFs, two laboratory-prepared surrogate formulations, mineral oil and PEG-200 (Aldrich, St. Louis, MO, USA), were prepared volumetrically as 0.05%, 0.5%, and 5.0% formulations in ultrapure water (Pure Water Solutions, Hillsborough, NC, USA). Each of these formulations was then spiked to six concentrations in the range of 0.01-5.0 µg/mL with a set of 37 solutes (Table 1). These solutes were chosen to represent a wide variety of physicochemical properties. All solutes were of the highest purity available for purchase (Sigma Aldrich, Milwaukee, WI, USA). The 37 solutes were also prepared in acetone as a 2000 µg/mL stock solution. Experimental solutions were prepared fresh, and all samples were kept at ambient temperature prior to analysis by SPME/GC-MS. Liquid GC-MS injections of the same 37 solutes prepared in acetone (0.01-10.00 µg/mL) were run daily, as were blank liquid (acetone) and blank SPME (prepared solvent without addition of the 37 solutes) injections.

SPME/GC-MS Analysis
SPME absorption and injection were performed by a CTC Analytics Combi-PAL auto injector (Varian Inc., Walnut Creek, CA, USA) outfitted with a 100 µm polydimethylsiloxane SPME unit (Supelco Analytical, Bellefonte, PA, USA). A 9 mL sample was first agitated in a 37 °C heating block for 5 min; the SPME MCF (Figure 6) was then inserted and exposed for 30 min at 37 °C with constant agitation. SPME and liquid (0.5 µL) injections were introduced into a Varian 1079 injector (Varian Inc., Walnut Creek, CA, USA) at 280 °C in splitless mode for five min; at 5.5 min the split was turned on to 100%. For the first 30 s, a pressure pulse of 21.0 psi was applied. Column flow was maintained at a constant 1.0 mL/min using helium as the carrier gas (National Welders, Raleigh, NC, USA). The Varian CP-3800 GC oven (Varian Inc., Walnut Creek, CA, USA) was programmed to hold at 40 °C for the first minute, followed by a 20 °C/min ramp to 90 °C (3.5 min), at which time the ramp slowed to 2.5 °C/min until 127 °C (18.30 min) was reached. The ramp was then increased to 40 °C/min until it reached 250 °C, which was held for 2.0 min (23.38 min), followed by another 40 °C/min ramp to 280 °C, which was held for 5.0 min (29.13 min). The Saturn 2200-MS (Varian Inc., Walnut Creek, CA, USA) was programmed to run in full scan mode (40-300 m/z) after the first 3.0 min. Individual solute peaks were identified and quantified by the Star v6.5 software (Varian Inc., Walnut Creek, CA, USA) using retention time and known quant ions as identified and confirmed in the initial method development. Our sensitivity was set at 0.01 µg/mL, as we were working with solutes ranging in concentration from 0.01 to 5.0 µg/mL. More importantly, no residues were detected in the second injection after each first test injection, which indicated that there was negligible carryover under the optimum desorption conditions.
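The cumulative run times quoted in parentheses for the oven program follow from the ramp rates and hold times, and the arithmetic can be checked with a short sketch. The function name `oven_program_times` and the tuple layout are our own conventions, not part of the instrument software.

```python
def oven_program_times(start_temp, initial_hold, segments):
    """Return the cumulative elapsed time (min) at the end of each segment.

    Each segment is (ramp_rate in °C/min, target temperature in °C,
    hold time in min at the target temperature).
    """
    elapsed = initial_hold
    temp = start_temp
    times = []
    for rate, target, hold in segments:
        elapsed += (target - temp) / rate + hold  # ramp duration plus hold
        temp = target
        times.append(elapsed)
    return times


# Oven program as described in the text: 40 °C held for 1 min, then four ramps.
segments = [
    (20.0, 90.0, 0.0),   # 20 °C/min to 90 °C
    (2.5, 127.0, 0.0),   # 2.5 °C/min to 127 °C
    (40.0, 250.0, 2.0),  # 40 °C/min to 250 °C, hold 2.0 min
    (40.0, 280.0, 5.0),  # 40 °C/min to 280 °C, hold 5.0 min
]

times = oven_program_times(start_temp=40.0, initial_hold=1.0, segments=segments)
for (rate, target, hold), t in zip(segments, times):
    print(f"reach {target:.0f} °C (hold {hold:.1f} min): {t:.3f} min elapsed")
```

The computed end points agree with the reported 3.5, 18.30, 23.38, and 29.13 min to within rounding, giving a total run time of just over 29 min.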
Differential ability of the solute to dissolve into the MCF or remain in the formulation was measured using a partition ratio (coefficient) $K_{MCF/mix}$ between the equilibrium concentration of the solute in the MCF and the equilibrium concentration of the solute in the formulation. $K_{MCF/mix}$ was calculated, following [14], as:

$$K_{MCF/mix} = \frac{C_{pe}}{C_{me}} = \frac{n_0/V_m}{C_0 - n_0/V_d}$$

where $n_0$ is the amount (in µg) of solute extracted from the MCF, $V_m$ is the volume (in mL) of the MCF, $V_d$ is the volume (in mL) of formulation placed in the vial based on solute concentration $C_0$ (in µg/mL), $C_{pe} = n_0/V_m$ is the equilibrium concentration of solute in the MCF, and $C_{me} = C_0 - n_0/V_d$ is the equilibrium concentration of solute in the formulation. ADME Boxes 4.95, commercial software from ACD/Labs [21], was used to identify the E, S, A, B, and V descriptors for all 37 solutes used in the experiment.

Summary and Conclusions
The partition theory of [14] does not appear to hold for the current study, as evidenced by Figure 1, Figure 3, and Table 5. It is probable that there is a finite number of binding sites available in the coating of the fiber (i.e., in the MCF). As the solute concentration increases, the percentage of the solute that absorbs and/or adsorbs to the membrane coating decreases due to this finite number of binding sites.
Notwithstanding the complications that arise from violations of the partition theory, our Expanded LFER models are able to adequately capture the variability of partition coefficients as a function of solute properties and experimental conditions. The Expanded Crossed-Factors LFER model based on [19] is a vast improvement over the single LFER model, while the Expanded Nested-Solute-Concentration LFER model developed in this article is even more refined, more predictive, and offers simple interpretations. Table 3, Table 4, Figure 2, and Figure 4 provide strong evidence that the simple LFER model is not adequate in the presence of complicated experimental conditions. Proper assessment of model prediction ability is demonstrated with $Q^2_{LOSO}$ (previously $Q^2_{LOO-adj}$ in [19]), and this measure is contrasted with $Q^2_{LOO}$ and the more familiar $r^2$ and Adj-$r^2$. The leave-one-solute-out strategy allows assessment to occur based on completely unseen solutes.
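The key property of the leave-one-solute-out strategy, that all replicates of a solute are held out together, can be illustrated with a minimal sketch of the fold construction. The solute labels below are hypothetical placeholders, the generator name `leave_one_solute_out` is our own, and the model fitting and the $Q^2$ computation are deliberately omitted.

```python
from collections import defaultdict


def leave_one_solute_out(solute_ids):
    """Yield (held_out_solute, train_indices, test_indices) for each solute.

    Every observation of the held-out solute (all replicates and all
    experimental conditions) goes to the test set, so the model is always
    evaluated on a solute it has never seen.
    """
    groups = defaultdict(list)
    for idx, solute in enumerate(solute_ids):
        groups[solute].append(idx)
    for solute, test_idx in groups.items():
        train_idx = [i for i, s in enumerate(solute_ids) if s != solute]
        yield solute, train_idx, test_idx


# HYPOTHETICAL solute labels; repeated labels stand for replicate observations.
solute_ids = ["solute_A", "solute_A", "solute_B", "solute_B", "solute_B", "solute_C"]

folds = list(leave_one_solute_out(solute_ids))
for solute, train_idx, test_idx in folds:
    print(f"hold out {solute}: train={train_idx} test={test_idx}")
```

By contrast, an ordinary leave-one-out split would remove a single observation while leaving its replicates in the training set, which is the source of the overly optimistic view of predictive power that the solute-level split avoids.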