A New Approach on Estimation of Solubility and n-octanol/water Partition Coefficient for Organohalogen Compounds

The aqueous solubility (logW) and the n-octanol/water partition coefficient (logP_OW) are important properties in pharmacology, toxicology and medicinal chemistry. Based on an understanding of the dissolution process, a frontier orbital interaction model is suggested in the present paper to describe the solvent-solute interactions of organohalogen compounds. A general three-parameter model is then proposed to predict the aqueous solubility and the n-octanol/water partition coefficient of organohalogen compounds that have no (or only very weak) hydrogen-bonding interactions with the solvent. The model has satisfactory prediction accuracy. Furthermore, every term in the model has an explicit meaning. This should help in understanding the structure-solubility relationship and may provide a new view on the estimation of solubility.


Introduction
Aqueous solubility (logW) and the n-octanol/water partition coefficient (logP_OW) have long been recognized as key molecular properties. They are widely used in areas as diverse as pharmaceutics, biochemistry, environmental chemistry, toxicology, chemistry and chemical engineering. Several activities depend on solubility and partition properties [1,2]: drug delivery, transport and distribution; prediction of environmental fate; and the development of analytical methods. It is therefore of considerable value to know the logW and logP_OW values of molecules. Measuring logW or logP_OW requires synthesizing a compound and then determining the property experimentally, which is time-consuming and expensive. Hence, there is strong interest in the structure-based prediction of logW and logP_OW. Such predictions support the rational development of new drugs and the assessment of the environmental impact of chemicals before they are released into the environment. Not surprisingly, numerous methods for predicting aqueous solubility or partition coefficients have been reported in the literature [3-19]. Jorgensen has provided a thorough review of the methods for predicting the logW of organic compounds [20]. More recently, Kühne compared the widely used methods and pointed out that "every method has its method-specific application domains" [21]. New methods that supplement the existing approaches are therefore still needed.
Organohalogen compounds have been manufactured for many years and used in the chemical industry as solvents, propellants, additives, cooling agents and insecticides [22]. In addition, these compounds can be formed during combustion processes in waste incineration. Organohalogen compounds generally have some negative impact on the environment and on ecosystems. Examples include polychlorinated biphenyls (PCBs), polybrominated biphenyls (PBBs), polychlorinated benzenes, polybrominated benzenes and polychlorinated naphthalenes (PCNs). Assessing the environmental risk of these compounds is therefore very important, and a rough assessment can be made by studying their logW or logP_OW. Recently, Padmanabhan [23], Lü [24] and Zou [25] proposed QSPR models that predict the logP_OW of PCBs with good accuracy. In the present paper, a new and very simple method is proposed to predict the logP_OW and logW of some halogen-containing organic compounds. The method is based on an understanding of the processes involved in dissolution. It has good prediction accuracy, and every term in the proposed equation has an explicit physical and chemical meaning.

Methodology
Conceptually, the dissolution of a solute in a solvent takes place in two stages: (i) a cavity large enough to accommodate the solute molecule has to be formed in the solvent; (ii) the solute molecule is inserted into the cavity and interacts with the solvent molecules around it. After these two steps, a stable solution is formed [26].
In the first stage of dissolution, an input energy or enthalpy (E_input) is needed to separate the solvent molecules, i.e., to overcome the solvent-solvent cohesive interactions. This energy is proportional to the size or volume of the solute molecule. The second stage of dissolution is exothermic. Consider organic compounds that have no (or only very weak) hydrogen-bonding interactions with the solvent molecules. For these compounds, the output energy (E_output) of this stage is, in our opinion, correlated with the interaction between the frontier orbitals of the solute molecules (FMO_solute) and those of the solvent molecules (FMO_solvent). In other words, neglecting hydrogen-bonding interactions between solute and solvent molecules, E_output is determined mainly by two interactions: that between the solvent's HOMO (HOMO_solvent) and the solute's LUMO (LUMO_solute), and that between the solute's HOMO (HOMO_solute) and the solvent's LUMO (LUMO_solvent), viz.:

Based on the above considerations, the following equation is proposed to predict the logW of organohalogen compounds that have no (or only very weak) hydrogen-bonding interactions with the solvent:

Here, a, b, c and d are the coefficients, and V, E_HOMO and E_LUMO are the volume, the HOMO energy and the LUMO energy of the solute, respectively. The volume V of a solute can be calculated by an additive method (see Ref. [10] for details). E_HOMO and E_LUMO were calculated at the HF/6-31G(d) level with the Gaussian 98 program, using the Gaussian package in SYBYL 6.7 (Tripos, Inc.).
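The equation itself is not reproduced in this text. The Python sketch below assumes it is linear in the three descriptors, logW = a + b*V + c*E_HOMO + d*E_LUMO, with a as the intercept. That functional form and the pairing of letters with descriptors are our assumptions, chosen to match the description of a, b, c and d above. The sketch estimates the coefficients by ordinary least squares, solving the normal equations by Gaussian elimination in plain Python. The function names are hypothetical, and no data from this paper are used.

```python
from typing import List, Sequence


def solve_linear_system(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | y], so row operations act on both sides at once.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, y)]
    scale = max(abs(v) for row in A for v in row) or 1.0
    for col in range(n):
        # Partial pivoting: move the row with the largest |entry| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) <= 1e-12 * scale:
            raise ValueError("numerically singular system: descriptors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_three_descriptor_model(
    volume: Sequence[float],
    e_homo: Sequence[float],
    e_lumo: Sequence[float],
    log_w: Sequence[float],
) -> List[float]:
    """Return [a, b, c, d] for the ASSUMED form logW = a + b*V + c*E_HOMO + d*E_LUMO."""
    # Design matrix: a column of ones (intercept a) plus the three descriptors.
    X = [[1.0, v, h, l] for v, h, l in zip(volume, e_homo, e_lumo)]
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, log_w)) for i in range(p)]
    return solve_linear_system(xtx, xty)
```

With a real training set, V would come from the additive method of Ref. [10], and the orbital energies would come from the HF/6-31G(d) calculations. A library least-squares routine would be numerically preferable for larger data sets. The explicit normal equations are used here only to keep the sketch dependency-free.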

Aqueous Solubility of PCBs
Some experimental logW values of PCBs [11] (listed in Table 1) were taken as the training set. A regression analysis with Eq. (2b) gave Eq. (3), and the corresponding statistics are listed in Tables 2 and 3. They suggest that the three descriptors (V, E_HOMO and E_LUMO) are significant and are not strongly correlated with each other. According to the t-test values (Table 2), the most significant descriptor in Eq. (3) is V. This indicates that the volume of the PCB molecule is the predominant factor determining its aqueous solubility. The t-value of E_LUMO implies that the interaction of the LUMO of the PCB molecule with the HOMO of water also plays a very important role in determining the aqueous solubility of PCBs.
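One way to check that the descriptors are not strongly correlated with each other is to compute their pairwise Pearson correlation coefficients. The Python sketch below does this. The function names and the |r| >= 0.8 flagging threshold are our assumptions (a common rule of thumb), not values taken from this paper.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant descriptor has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(descriptors: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation coefficient for every unordered pair of descriptor columns."""
    return {(p, q): pearson_r(descriptors[p], descriptors[q])
            for p, q in combinations(descriptors, 2)}


def collinear_pairs(descriptors: Dict[str, Sequence[float]],
                    threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Descriptor pairs whose |r| reaches the (assumed) threshold."""
    return [pair for pair, r in intercorrelations(descriptors).items() if abs(r) >= threshold]
```

For a PCB training set, the dictionary would map "V", "E_HOMO" and "E_LUMO" to their per-congener values. An empty result from `collinear_pairs` would be consistent with the statement that the descriptors are not strongly intercorrelated.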

n-Octanol/Water Partition Coefficient of PCBs
Hansch et al. [27] have shown that there is a linear relationship between the aqueous solubility (logW) and the n-octanol/water partition coefficient (logP_OW) of a solute. Eq. (2b) describes the structure-solubility relationship of PCB congeners well, so it was also expected to predict their n-octanol/water partition coefficients. Thus, taking some experimental logP_OW values of PCBs [10,23] (see Table 1) as the training set, we applied Eq. (2b) in a regression analysis and obtained Eq. (4). Equation (4) has a high correlation coefficient R and a small standard deviation S for predicting the n-octanol/water partition coefficients of PCB congeners.
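The expectation that a solubility model can also describe logP_OW can be made explicit under two assumptions. First, the logW-logP_OW relationship of Ref. [27] is linear, logP_OW = α + β logW. Second, Eq. (2b) has the linear form logW = a + bV + cE_HOMO + dE_LUMO. Neither expression is reproduced from this paper, and α and β are hypothetical constants. Substituting one into the other gives:

```latex
\begin{aligned}
\log P_{OW} &= \alpha + \beta \log W \\
            &= \alpha + \beta\,\bigl(a + bV + cE_{\mathrm{HOMO}} + dE_{\mathrm{LUMO}}\bigr) \\
            &= (\alpha + \beta a) + (\beta b)\,V + (\beta c)\,E_{\mathrm{HOMO}} + (\beta d)\,E_{\mathrm{LUMO}}
\end{aligned}
```

Under these assumptions, logP_OW is again linear in V, E_HOMO and E_LUMO, so the same three descriptors can be refitted directly. Suppose β is negative, meaning that more water-soluble compounds partition less into n-octanol. Then each descriptor coefficient changes sign relative to the logW model. That pattern matches the later discussion, in which logP_OW increases with V while a higher E_HOMO or E_LUMO favors the aqueous phase.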

Discussion
A closer analysis of the coefficients of the descriptors in Eq. (3) provides physical insight into the structure-solubility relationship. The negative coefficient of V implies that a PCB molecule with a larger volume has a smaller logW value than a smaller PCB. That is, a larger PCB is less soluble in water than a smaller one. The reason is that a larger cavity has to be created in water to accommodate the larger molecule, which requires a larger energy input. The positive coefficient of E_HOMO means that the higher the HOMO of the PCB molecule, the larger its logW value. In our opinion, a higher-lying solute HOMO interacts more effectively with the LUMO of water, which is energetically favorable for the formation of the solution. Thus, the higher the HOMO of the PCB molecule, the more soluble it is. Likewise, the higher the LUMO energy of the solute molecule, the more effectively its LUMO interacts with the HOMO of water. Thus, the solubility of PCBs increases with increasing E_LUMO.
To test the robustness and predictive ability of Eq. (3), a cross-validation analysis was performed. In cross-validation, the model is recalculated with groups of objects (i.e., PCB congeners) omitted in turn, and the logW values of the omitted objects are then predicted. In the present study, the leave-one-out (LOO) cross-validation method is employed. The internal predictive ability and the robustness of the models are characterized by two quantities. These are the leave-one-out cross-validation correlation coefficient (R_CV) and the cross-validation standard error of prediction (S_CV), defined by Eqs. (5) and (6). Here, N is the number of samples used for model building and M is the number of descriptors. The R_CV and S_CV values of Eq. (3) show that it is robust, with a prediction error of only 0.27 log units for the logW of PCBs. The R_CV and S_CV values obtained for Eq. (4) also show that Eq. (4) is robust.

The results of Eqs. (4) and (7) show that the predictive accuracy of the present model is better than that of Padmanabhan's QSPR model and comparable to that of Lü's model. Examination of Eq. (4) or Eq. (7) leads to the following interpretations. The value of logP_OW increases with V, which means that an increase in solute size favors the wet octanol phase. The reason is that water is more polar than n-octanol, so the cohesive energy between water molecules is larger than that between n-octanol molecules. Thus, more energy is needed to create a cavity of similar size in the more polar solvent (water) than in the less polar solvent (n-octanol). Consequently, PCB molecules tend to enter the n-octanol phase, which is energetically more favorable. Increasing the E_HOMO or E_LUMO of the solute favors the aqueous phase.
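The defining equations for R_CV and S_CV are not reproduced in this text. The Python sketch below implements leave-one-out cross-validation for a linear model with M descriptors. It assumes the commonly used definitions R_CV = sqrt(1 - PRESS/SST) and S_CV = sqrt(PRESS/(N - M - 1)). PRESS is the sum of squared LOO prediction errors, and SST is the total sum of squares about the mean. The exact forms used by the authors may differ, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ols(X: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients; X must already contain an intercept column."""
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        if A[piv][col] == 0.0:
            raise ValueError("singular normal equations")
        A[col], A[piv] = A[piv], A[col]
        for k in range(p):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def loo_cross_validation(descriptors: Sequence[Sequence[float]],
                         y: Sequence[float]) -> Tuple[float, float]:
    """Return (R_CV, S_CV) for y = b0 + sum_j b_j * x_j under the assumed definitions."""
    n, m = len(y), len(descriptors[0])
    if n - m - 1 <= 0:
        raise ValueError("need more samples than descriptors + 1")
    X = [[1.0] + [float(v) for v in row] for row in descriptors]
    press = 0.0
    for i in range(n):
        # Refit without object i, then predict the held-out object.
        beta = ols(X[:i] + X[i + 1:], list(y[:i]) + list(y[i + 1:]))
        y_hat = sum(b * x for b, x in zip(beta, X[i]))
        press += (y[i] - y_hat) ** 2
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)
    r_cv = math.sqrt(max(1.0 - press / sst, 0.0))  # clipped at 0 if PRESS > SST
    s_cv = math.sqrt(press / (n - m - 1))
    return r_cv, s_cv
```

Each row of `descriptors` holds the M descriptor values of one congener, for example [V, E_HOMO, E_LUMO]. Refitting N times is simple and transparent. For ordinary least squares, the same PRESS can also be obtained from a single fit through the leverages, as e_i / (1 - h_ii).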
It is relatively easy to build a QSPR model for a series of congeners, but more difficult to correlate a data set of heterogeneous compounds. Besides high correlation and low deviation, a valuable QSPR model should also have a wide range of application. To test whether Eq. (2) applies to more complex data sets, we assembled one data set from the logP_OW values of 157 PCBs and some other halogen-substituted aromatic compounds [10,11] (including PBBs, PCNs and HBs, listed in Table 1). A correlation analysis of this data set with Eq. (2b) gave Eq. (8).

It should be noted that several programs (such as ACD/LogP and CLogP [28]) have been developed to compute logP_OW. To compare the present results with values calculated by such programs, a set of compounds was selected (see Table 4). Their logP_OW values were computed both with Eq. (8) and with the CLogP software, using the CLogP package in SYBYL 6.7 (Tripos, Inc.). For the compounds in Table 4, the average absolute deviation between the experimental values (logP_OW exp.) and those calculated by CLogP (logP_OW CLogP) is 0.45 log units. For the same compounds, the average absolute deviation between logP_OW exp. and the values predicted by Eq. (8) (logP_OW calc.) is only 0.14 log units. In terms of average absolute deviation, the present method is therefore somewhat more accurate than the CLogP software for these compounds.
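The average absolute deviation used to compare Eq. (8) with CLogP is the mean of |logP_OW,exp - logP_OW,calc| over the compounds. A minimal Python sketch follows. The function name is hypothetical, and the three values in the example are made up for illustration, not taken from Table 4.

```python
from typing import Sequence


def average_absolute_deviation(experimental: Sequence[float],
                               calculated: Sequence[float]) -> float:
    """Mean |experimental - calculated| over paired compounds, in log units."""
    if not experimental or len(experimental) != len(calculated):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - c) for e, c in zip(experimental, calculated)) / len(experimental)


# Hypothetical logP_OW values for three compounds (illustration only).
log_p_exp = [4.0, 5.5, 6.2]
log_p_model = [4.2, 5.3, 6.2]
print(round(average_absolute_deviation(log_p_exp, log_p_model), 3))  # 0.133
```

Comparing two methods on the same compounds amounts to calling the function once per method with the same experimental list. Unlike a root-mean-square error, the average absolute deviation does not weight large individual errors more heavily.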

Conclusions
Based on an understanding of the dissolution process, a very simple three-parameter model has been proposed. It predicts the aqueous solubility and n-octanol/water partition coefficients of organohalogen compounds that have no (or only very weak) hydrogen-bonding interactions with the solvent. The model has satisfactory prediction accuracy. Furthermore, every term in the model has an explicit meaning, which should help in understanding structure-solubility relationships.