Predicting Aqueous Solubility of Chlorinated Hydrocarbons by the MCI Approach

Ying-Long Wang, Yang-Dong Hu *, Lian-Ying Wu and Wei-Zhong An

College of Chemistry and Chemical Engineering, Ocean University of China, Qingdao 266003, People’s Republic of China

* Author to whom correspondence should be addressed; E-mail: chem_ouc@yahoo.com.cn or ylong_wang@yahoo.com

Received: 24 November 2005 / Accepted: 17 February 2006 / Published: 28 February 2006

Abstract: A correlation for estimating the aqueous solubility (logSw) of chlorinated hydrocarbon molecules is proposed. The MCI-based quantitative structure-property relationship (QSPR) model is predictive and requires only three connectivity indices in the calculation. The correlation equation, which is based on a training set of 50 chlorinated hydrocarbons, has a correlation coefficient of 0.9670 and a standard error of 0.44 log


Introduction
Aqueous solubility is a particularly important physicochemical property of organic chemicals that plays a significant role in various physical and biological processes, especially in drug transport and environmental impact. Compared with the time-consuming experimental procedures that determine aqueous solubility directly, reliable computational methods for predicting it are more popular in today's research [1][2][3].
Molecular connectivity indices, which were proposed 30 years ago, have been used successfully to correlate various physicochemical properties of organic substances [33][34][35], especially in recent applications to computational molecular design [36]. Previous work has correlated aqueous solubility using molecular connectivity indices together with other descriptors and has demonstrated that these indices can model aqueous solubility. In this study, we use a different set of indices from those in the existing models to correlate aqueous solubility and obtain a simpler model with the same or higher accuracy.

Data Set
The data set studied by Eduardo J. Delgado [22] is adopted as the training set and listed in Table 2. To test the predictive ability of the proposed model, aqueous solubility data for 73 chlorinated hydrocarbons were collected from the literature [6,10,27] as the testing set and are shown in Table 3. Both the training set and the testing set contain saturated, unsaturated, aliphatic and aromatic compounds, dioxins and PCBs.

Methods
Molecular connectivity indices have been widely used as molecular structural descriptors to correlate the physical properties of organic chemicals and in computational molecular design studies. Recently, higher-order connectivity indices have been shown to have the advantage of incorporating the effects of larger-scale structural features of a molecule on its physical properties [36].

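The general form of a connectivity index is

$$ {}^{m}\chi_{k}^{p} = \sum_{j=1}^{n_m} \prod_{i=1}^{m+1} \left(\delta_{i}^{p}\right)_{j}^{-1/2} $$

where $m$ is the order of the connectivity index; $k$ denotes the type of contiguous fragment, which is divided into paths (P), clusters (C), etc.; $p$ denotes the type of connectivity index (simple, valence or other); $n_m$ is the number of relevant paths; and $\delta_{i}^{p}$ is the connectivity index of atom $i$ in the fragment.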
In this work, for each chemical the values of the connectivity indices up to third order are calculated using the vertex adjacency matrix. The simple connectivity index ($\delta$) and the valence connectivity index ($\delta^{v}$) used in this study are summarized in Table 1.
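The calculation from the vertex adjacency matrix can be illustrated with a minimal Python sketch. It assumes a hydrogen-suppressed molecular graph and takes the simple $\delta$ of each atom as its vertex degree; the function name `connectivity_indices` and the optional `delta` argument are hypothetical and are not taken from the paper. Valence indices would be obtained by passing $\delta^{v}$ values such as those of Table 1, which are not reproduced here.

```python
from itertools import combinations

import numpy as np


def connectivity_indices(adj, delta=None):
    """Connectivity indices up to third order for a hydrogen-suppressed graph.

    adj   : symmetric 0/1 vertex adjacency matrix
    delta : optional per-atom delta values (e.g. valence deltas);
            defaults to the vertex degree, i.e. the simple delta
    """
    adj = np.asarray(adj, dtype=int)
    n = adj.shape[0]
    if delta is None:
        delta = adj.sum(axis=1)
    delta = np.asarray(delta, dtype=float)
    nbrs = [np.flatnonzero(adj[i]).tolist() for i in range(n)]

    def term(atoms):
        # Contribution of one fragment: (product of its deltas)^(-1/2)
        return float(np.prod(delta[list(atoms)]) ** -0.5)

    # Zeroth order: one term per atom
    chi0 = sum(term((i,)) for i in range(n))
    # First order: one term per bond
    chi1 = sum(term((i, j)) for i in range(n) for j in nbrs[i] if i < j)
    # Second order: two-bond paths a-c-b around each central atom c
    chi2 = sum(term((a, c, b))
               for c in range(n) for a, b in combinations(nbrs[c], 2))
    # Third-order paths i-j-k-l with distinct atoms, each counted once (i < l)
    chi3p = sum(term((i, j, k, l))
                for j in range(n) for k in nbrs[j]
                for i in nbrs[j] if i != k
                for l in nbrs[k] if l != j and l != i and i < l)
    # Third-order clusters: a central atom with three of its neighbours
    chi3c = sum(term((c, a, b, d))
                for c in range(n) for a, b, d in combinations(nbrs[c], 3))
    return {"chi0": float(chi0), "chi1": float(chi1), "chi2": float(chi2),
            "chi3p": float(chi3p), "chi3c": float(chi3c)}


# 2-chloropropane: a central carbon bonded to two carbons and one chlorine
two_chloropropane = [[0, 1, 1, 1],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0]]
print(connectivity_indices(two_chloropropane))
```

With simple $\delta$ values the chlorine atom is treated like any other heavy atom, so this graph gives the same simple indices as isobutane; only the valence indices distinguish the two.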

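The detailed equations for the simple molecular connectivity indices of zeroth, first, second, and third order are special cases of the general definition for $m = 0, 1, 2, 3$:

$$ {}^{0}\chi = \sum_{i} \delta_{i}^{-1/2}, \qquad {}^{1}\chi = \sum_{\text{bonds}} \left(\delta_{i}\delta_{j}\right)^{-1/2}, \qquad {}^{2}\chi = \sum_{\text{paths}} \left(\delta_{i}\delta_{j}\delta_{k}\right)^{-1/2} $$

$$ {}^{3}\chi_{p} = \sum_{\text{paths}} \left(\delta_{i}\delta_{j}\delta_{k}\delta_{l}\right)^{-1/2}, \qquad {}^{3}\chi_{c} = \sum_{\text{clusters}} \left(\delta_{i}\delta_{j}\delta_{k}\delta_{l}\right)^{-1/2} $$

The corresponding valence indices (${}^{0}\chi^{v}$, ${}^{1}\chi^{v}$, ${}^{2}\chi^{v}$, ${}^{3}\chi_{p}^{v}$, ${}^{3}\chi_{c}^{v}$) are obtained by replacing $\delta$ with $\delta^{v}$.

After the ten molecular connectivity indices are calculated, stepwise regression with the MATLAB Statistics Toolbox [45] is used to select the variables and fit the experimental data of the data set.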
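The variable-selection step can be sketched as follows. This is a simplified forward selection by residual sum of squares, written with NumPy; it is not the algorithm of the MATLAB Statistics Toolbox, which enters and removes terms on the basis of significance tests. The function names `rss` and `forward_select`, the `max_terms` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_select(X, y, max_terms=3):
    """Greedily add the descriptor that lowers the RSS the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_terms:
        best = min(remaining, key=lambda j: rss(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in for a 50-compound table of ten descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 5.0 * X[:, 0] - 3.0 * X[:, 4] + 2.0 * X[:, 7] + 0.01 * rng.normal(size=50)
print(sorted(forward_select(X, y, max_terms=3)))
```

Capping the number of terms at three mirrors the three-descriptor model reported here; a significance-based stopping rule would be closer to a true stepwise procedure.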
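The average absolute error (AAE) and the root-mean-square error (RMSE) were calculated with the following equations for comparison with the existing model. The AAE was calculated as

$$ \mathrm{AAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\log S_{w,i}^{\mathrm{calc}} - \log S_{w,i}^{\mathrm{exp}}\right| $$

The RMSE was calculated as

$$ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\log S_{w,i}^{\mathrm{calc}} - \log S_{w,i}^{\mathrm{exp}}\right)^{2}} \qquad (13) $$

where $N$ is the number of compounds.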
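The two error measures can be computed with a few lines of Python. The sketch below uses the usual definitions (mean of the absolute deviations, and square root of the mean squared deviation); the function names `aae` and `rmse` and the three logSw values are hypothetical and are not data from this study.

```python
import numpy as np


def aae(calc, exp):
    """Average absolute error between calculated and experimental logSw."""
    calc, exp = np.asarray(calc, dtype=float), np.asarray(exp, dtype=float)
    return float(np.mean(np.abs(calc - exp)))


def rmse(calc, exp):
    """Root-mean-square error between calculated and experimental logSw."""
    calc, exp = np.asarray(calc, dtype=float), np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean((calc - exp) ** 2)))


# Hypothetical logSw values, for illustration only
log_sw_calc = [-1.0, -2.5, -4.0]
log_sw_exp = [-1.2, -2.0, -4.3]
print(aae(log_sw_calc, log_sw_exp), rmse(log_sw_calc, log_sw_exp))
```

The RMSE is never smaller than the AAE for the same residuals, so a large gap between the two points to a few compounds with large errors.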

Results and Discussion
Delgado [22] used CODESSA to develop a QSPR model and carried out a correlation analysis with a heuristic method to find the best model. He succeeded in obtaining two descriptors that have definite physical meaning and correspond to different intermolecular interactions.
In our work, the coefficients of the best correlation model for the aqueous solubility of the 50 chlorinated hydrocarbons used as the training set are shown in Table 2 and equation (14). The index ${}^{0}\chi$, which reflects the size of the molecule, is the most significant descriptor, as can be seen from its t-test value, the highest in the model. This conclusion agrees with the existing models [22,27]. The other descriptors, ${}^{3}\chi_{c}$ and ${}^{3}\chi_{c}^{v}$, which reflect the contribution of clusters in a molecule to aqueous solubility, are also important in describing the aqueous solubility of chlorinated hydrocarbons. This demonstrates again that higher-order connectivity indices contain a large amount of information about the molecule, especially its larger-scale structural features (such as branching) [36]. The model we obtained is the general correlation given as equation (14). The results calculated with equation (14) are shown in Table 3, where the experimental values and the results calculated with the Delgado method are also listed, and the scatter plot is shown in Fig. 1.
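The way a t-test value ranks the descriptors can be sketched for an ordinary least-squares fit: each coefficient is divided by its standard error, and the descriptor with the largest absolute ratio is the most significant. The function name `ols_t_values` and the synthetic data below are assumptions for illustration; they do not reproduce the coefficients of Table 2.

```python
import numpy as np


def ols_t_values(X, y):
    """Least-squares coefficients (intercept first) and their t-values."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = A.shape[0] - A.shape[1]              # observations minus parameters
    sigma2 = float(resid @ resid) / dof        # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)      # coefficient covariance matrix
    std_err = np.sqrt(np.diag(cov))
    return coef, coef / std_err


# Synthetic data: the first descriptor dominates, the second barely matters
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 1.0 + 3.0 * X[:, 0] + 0.1 * X[:, 1] + 0.5 * rng.normal(size=50)
coef, t = ols_t_values(X, y)
print(coef, t)
```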
The AAE for our model is 0.31, slightly smaller than the 0.32 for the Delgado model, indicating that the new model has accuracy comparable to that of the existing model. To test the predictive ability of our model, the aqueous solubility data for 73 chlorinated hydrocarbons were collected from the literature [6,10,27] as the testing set. The predicted results calculated with equation (14) are shown in Table 4, where the experimental values and the residuals are also listed, and the scatter plot is shown in Fig. 2.
The AAE for the testing set is 0.38, which indicates that the proposed model is reliable and has good predictive ability.

Conclusion
A predictive QSPR model based on molecular connectivity indices is proposed in this work to correlate the aqueous solubility of 50 chlorinated hydrocarbons. Application of the developed model to a testing set of 73 chlorinated hydrocarbons demonstrates that the new model is reliable, with good predictive accuracy and a simple formulation. In addition, the new model does not require any experimental physicochemical properties in the calculation, so it is easy to apply, especially in cases where it is inconvenient or impossible to measure those properties.

Figure 2. Calculated values versus experimental values of logSw for the testing data set.

Table 2. The best correlation model of logSw of 50 compounds.

Table 3. Calculated results of the molar aqueous solubility of the 50 compounds (logSw).

Table 4. Predicted results of the molar aqueous solubility for 73 compounds (logSw).