Study on QSTR of Benzoic Acid Compounds with MCI

Quantitative structure-toxicity relationship (QSTR) analysis plays an important role in toxicity prediction. Using a modified method, the structural parameters (modified molecular connectivity indices, MCI) of 57 benzoic acid compounds were calculated with a program written in Visual Basic, and the QSTR of benzoic acid compounds for oral acute toxicity in mice (LD50) was studied. A model for predicting the oral LD50 of benzoic acid compounds in mice was built with SAS 9.0: 39 compounds were used as the training dataset to build the regression model, and the other 18 as the forecasting dataset to test its predictive ability. The model is

$$\log \mathrm{LD}_{50} = 1.2399\,{}^0J^A + 2.6911\,{}^1J^A - 0.4445\,J^B \qquad (R^2 = 0.9860)$$

where $^0J^A$ is the zero-order connectivity index, $^1J^A$ is the first-order connectivity index, and $J^B = {}^0J^A \times {}^1J^A$ is the cross factor. The model showed good predictive ability.


Introduction
Benzoic acid compounds are important organic chemical raw materials, widely used in food, medicine, cosmetics, preservatives, insecticides, dyestuffs, etc. For example, benzoic acid is a common preservative, aspirin is a well-known nonsteroidal anti-inflammatory drug, triflusal is an antithrombotic agent, and chloramben and dicamba are common pesticides (see Figure 1). Most benzoic acid compounds are toxic and are poorly degraded by microorganisms in the natural environment, which may cause serious public health and environmental problems.

Figure 1. Structures of benzoic acid (1), aspirin (2), triflusal (3), chloramben (4) and dicamba (5).
With the development of synthetic chemistry, combinatorial chemistry and pharmaceutical chemistry, millions of new compounds are being synthesized. Classical toxicity evaluation of chemical substances is time-consuming and expensive, and toxicity testing cannot keep pace with the discovery of new compounds. Consequently, increasing attention is being paid to predicting toxicity at an early stage. Quantitative structure-toxicity relationships (QSTR) have been used effectively to study the toxicity mechanisms of various compounds [1].
QSTR plays an important role in toxicity forecasting and is widely used in modern compound research; as more and more compounds are discovered, their toxicity must be predicted accurately and quickly [2][3][4]. However, QSTR studies of benzoic acid compounds based on the molecular connectivity index (MCI) and oral LD50 in mice (acute toxicity, median lethal dose) have not been reported. In this work, the quantitative structural parameters of 57 benzoic acid compounds were obtained with MCI, and their oral LD50 values in mice were collected from various literature sources. The QSTR of these compounds was then studied, and a model was developed to predict their oral LD50 in mice. Thirty-nine benzoic acid compounds were used as the training dataset to build the regression model, and the other 18 as the forecasting dataset to test its predictive ability. The analysis showed that $^0J^A$, $^1J^A$ and the cross factor $J^B$ are important factors affecting the toxicity of benzoic acid compounds (although the toxicity mechanism is not yet clear), where $^0J^A$ is the zero-order connectivity index, $^1J^A$ is the first-order connectivity index and $J^B = {}^0J^A \times {}^1J^A$ is the cross factor.

Research Methods
In 1975, Milan Randić described a skeletal branching index that correlated with three physical properties of alkanes [5]. The concept was further developed and applied extensively by Kier and Hall [6][7][8], leading to the molecular connectivity index (MCI). Kier and Hall later modified the connectivity indices to discriminate carbon atoms from heteroatoms, introducing the valence molecular connectivity index $^m\chi_t$ [9]. The MCI is calculated with the following formula:

$$^m\chi_t = \sum_{j=1}^{N_m} \left( \prod_{i=1}^{m+1} \frac{1}{\delta_i} \right)^{1/2} \qquad (1)$$

where $^m\chi_t$ is the $m$th-order MCI; $t$ is the subgraph type, namely path (p), cluster (c) or path-cluster (pc); $N_m$ is the number of subgraphs of the given type and order; and $\delta_i$ is the connectivity value of atom $i$, defined as $\delta = \sigma - h$, where $\sigma$ is the number of electrons in σ orbitals and $h$ is the number of bonded hydrogen atoms.
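As a minimal sketch of formula (1) for path subgraphs, the Python code below enumerates every path with $m$ bonds in a hydrogen-suppressed molecular graph and sums $(\prod_i \delta_i)^{-1/2}$ over those paths. It uses the simple Kier-Hall connectivity $\delta = \sigma - h$, which in a hydrogen-suppressed graph equals the number of non-hydrogen neighbours; it does not implement the modified $\delta$ values used in this study. The atom numbering and the helper names (`path_subgraphs`, `path_connectivity_index`, `BENZOIC_ACID`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]


def path_subgraphs(graph: Graph, m: int) -> Set[Tuple[int, ...]]:
    """Return every simple path with m bonds (m + 1 atoms), each listed once."""
    paths: Set[Tuple[int, ...]] = set()

    def extend(path: List[int]) -> None:
        if len(path) == m + 1:
            # A path and its reverse are the same subgraph; keep one orientation.
            key = tuple(path) if path[0] <= path[-1] else tuple(reversed(path))
            paths.add(key)
            return
        for neighbour in graph[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for atom in graph:
        extend([atom])
    return paths


def path_connectivity_index(graph: Graph, delta: Dict[int, float], m: int) -> float:
    """m-th order path index: sum over paths of (product of 1/delta_i) ** 0.5."""
    total = 0.0
    for path in path_subgraphs(graph, m):
        product = 1.0
        for atom in path:
            product *= delta[atom]
        total += product ** -0.5
    return total


# Hydrogen-suppressed graph of benzoic acid (hypothetical numbering):
# 0-5 ring carbons (0 = ipso carbon), 6 = carboxyl C, 7 = carbonyl O, 8 = hydroxyl O.
BENZOIC_ACID: Graph = {
    0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0],
    6: [0, 7, 8], 7: [6], 8: [6],
}
# Simple connectivity delta = sigma - h = number of heavy-atom neighbours.
SIMPLE_DELTA = {atom: float(len(nbrs)) for atom, nbrs in BENZOIC_ACID.items()}

if __name__ == "__main__":
    for order in (0, 1):
        print(order, round(path_connectivity_index(BENZOIC_ACID, SIMPLE_DELTA, order), 4))
```

With these simple $\delta$ values the script prints about 6.6902 for the zero-order index and 4.3045 for the first-order index of benzoic acid; substituting valence or modified $\delta$ values into `SIMPLE_DELTA` changes only the atom weights, not the summation.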
The MCI has proved to be one of the most successful and widely used molecular descriptors and has been applied in many studies [10][11][12][13].
From Randić's skeletal branching index to the connectivity index modified by Kier and Hall, the core quantity is the connectivity of atoms, which evolved from the simple connectivity $\delta_i$ of an atom to the valence connectivity $\delta_i^v$. Kier and Hall computed the valence connectivity of a heteroatom $i$ with the following formula:

$$\delta_i^v = \frac{Z_i - h_i}{Z - Z_i - 1}$$

where $Z$ and $Z_i$ are the numbers of extranuclear electrons and valence electrons, respectively, and $h_i$ is the number of hydrogen atoms bonded to heteroatom $i$. Although Kier et al. contributed this method for heteroatoms, it cannot discriminate the same heteroatom in different oxidation states. More recently, Yu et al. improved the method and redefined the valence connectivity value $\delta_i^h$ [14]. In their formula, $m_i$ is the number of bonding electrons, $Z$ is the number of extranuclear electrons, $n_i$ is the maximum principal quantum number, $Z_i$ is the number of valence electrons, $N_i$ is the count, and $L_p$ reflects the hybridization of heteroatom $i$, with the following values: sp$^3$, $L_p = 1$; sp$^2$, $L_p = -1.8$; sp, $L_p = 2$; for the atom itself, $L_p = 2$ and $m_i = 0$.

A program package for calculating the MCI of compounds according to the modified formula was written in Visual Basic. To obtain a model for predicting the toxicity of benzoic acid compounds, the molecular structures of the 57 compounds were entered into the program package and their MCI values were calculated. Thirty-nine of them formed the training dataset for building a multiple linear regression model (with the logarithm of LD50 as the dependent variable and the MCI values as independent variables), and the remaining 18 were used as prediction samples to test the model's predictive ability, using SAS 9.0. The cross factor was included when building the regression model.
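The Kier-Hall valence connectivity $\delta^v = (Z^v - h)/(Z - Z^v - 1)$ can be sketched as follows. This is a minimal sketch: the function name and the example atoms are hypothetical, and Yu's modified $\delta_i^h$ is not implemented because its full formula is given in [14]. The example also illustrates the limitation noted for the Kier-Hall method: a divalent sulfide sulfur and a sulfonyl sulfur, neither bearing hydrogens, receive the same value because the formula only sees $Z$, $Z^v$ and $h$.

```python
def kier_hall_valence_delta(z_total: int, z_valence: int, n_hydrogens: int) -> float:
    """Kier-Hall valence connectivity: (Z_v - h) / (Z - Z_v - 1).

    z_total     -- total number of electrons of the atom (atomic number Z)
    z_valence   -- number of valence electrons (Z_v)
    n_hydrogens -- number of hydrogen atoms bonded to the atom (h)
    """
    return (z_valence - n_hydrogens) / (z_total - z_valence - 1)


# Illustrative atoms (hypothetical selection): (label, Z, Z_v, h)
EXAMPLES = [
    ("aromatic CH carbon", 6, 4, 1),
    ("carbonyl oxygen (=O)", 8, 6, 0),
    ("hydroxyl oxygen (-OH)", 8, 6, 1),
    ("chlorine (-Cl)", 17, 7, 0),
    ("sulfide sulfur (-S-)", 16, 6, 0),
    ("sulfonyl sulfur (-SO2-)", 16, 6, 0),  # same Z, Z_v and h as the sulfide
]

if __name__ == "__main__":
    for label, z, zv, h in EXAMPLES:
        print(f"{label:26s} {kier_hall_valence_delta(z, zv, h):.4f}")
```

For second-row atoms such as C and O the denominator is 1, so $\delta^v$ reduces to $Z^v - h$; both sulfur atoms give $6/9 \approx 0.667$, which is the kind of ambiguity a modified definition has to resolve.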

Results and Discussion
In what follows, we present the process of computing the MCI, choosing the factors of the regression model, building the model and testing it. First, the zero-order connectivity index $^0J^A$ and the first-order connectivity index $^1J^A$ were calculated using the program package. The LD50 values were converted to logarithms so that all data were of the same order of magnitude and easier to analyze statistically and compare. The toxicity data in the training dataset were then subjected to regression analysis, using non-intercept stepwise regression. The candidate influencing factors were the zero-order connectivity index $^0J^A$, the first-order connectivity index $^1J^A$ and the cross factor $J^B = {}^0J^A \times {}^1J^A$. Combinations of these factors were examined, and the results were as follows: all groups were acceptable except (3) and (4), and the coefficient of determination ($R^2$) showed that group (7) was the best, that is, its regression linearity was better than that of the other groups. Therefore, $^0J^A$, $^1J^A$ and $J^B$ were chosen as the independent variables of the model (see Table 1). The p values in Table 1 show that $^0J^A$, $^1J^A$ and $J^B$ all had a significant influence, and the following regression model was estimated:

$$\log \mathrm{LD}_{50} = 1.2399\,{}^0J^A + 2.6911\,{}^1J^A - 0.4445\,J^B \qquad (R^2 = 0.9860)$$

Given that $R^2$ is close to 1, the p values are less than 0.01, and the number of parameters equals the number of tested coefficients, the linearity of the model is appropriate. Residual analysis shows that the model fits well (see Table 2). The residuals are approximately normally distributed, since the points in the residual plot lie almost on a straight line (see Figure 2).
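The non-intercept regression with a cross factor can be sketched in plain Python as below. This is a minimal sketch of ordinary least squares through the origin, solved from the normal equations; it is not the SAS stepwise procedure used in the study, and the data are synthetic (generated from the published coefficients plus noise over arbitrary index ranges), not the 39 training compounds. The $R^2$ computed here is the uncentered form, $1 - \mathrm{SSE}/\sum y^2$, which is the usual definition for models without an intercept (SAS PROC REG also redefines R-Square when NOINT is used); it is often much larger than the centered $R^2$ of a model with an intercept when the responses are far from zero, so the two are not directly comparable.

```python
import random
from typing import List, Sequence


def fit_through_origin(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares without intercept: solve (X^T X) b = X^T y by Gaussian elimination."""
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p + 1):
                a[r][c] -= factor * a[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (a[i][p] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


def design_row(j0: float, j1: float) -> List[float]:
    """Independent variables 0J^A, 1J^A and the cross factor J^B = 0J^A * 1J^A."""
    return [j0, j1, j0 * j1]


def uncentered_r2(X: Sequence[Sequence[float]], y: Sequence[float], b: Sequence[float]) -> float:
    """R^2 for a no-intercept model: 1 - SSE / sum(y^2)."""
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))
    return 1.0 - sse / sum(yi ** 2 for yi in y)


if __name__ == "__main__":
    random.seed(1)
    true_b = [1.2399, 2.6911, -0.4445]  # published coefficients, used only to simulate data
    X, y = [], []
    for _ in range(39):  # synthetic "training set"; index ranges are arbitrary
        row = design_row(random.uniform(4.0, 9.0), random.uniform(2.5, 6.0))
        X.append(row)
        y.append(sum(bj * xj for bj, xj in zip(true_b, row)) + random.gauss(0.0, 0.05))
    b = fit_through_origin(X, y)
    print("coefficients:", [round(v, 4) for v in b])
    print("uncentered R^2:", round(uncentered_r2(X, y, b), 4))
```

Dropping the third column of `design_row` gives the two-factor model without the cross factor, so the same helpers can compare candidate factor groups on a common footing.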
Analysis of the model shows that $^0J^A$, $^1J^A$ and the cross factor $J^B$ have a strong influence on oral toxicity in mice. According to the signs of the regression coefficients, LD50 increases as $^0J^A$ and $^1J^A$ increase, and decreases as $J^B$ increases. Because a higher LD50 means lower toxicity, $^0J^A$ and $^1J^A$ are negatively correlated with the toxicity of benzoic acid compounds, whereas $J^B$ is positively correlated with it. The model's predictive ability was also tested on the 18 benzoic acid compounds of the forecasting dataset, and the results indicate that it predicts well (Table 3). These results show that the influencing factors have a significant effect on toxicity and that introducing the cross factor ($J^B$) improves the forecasting accuracy of the model.
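Because $J^B$ is the product of the other two indices, the sign of each coefficient describes the effect of $^0J^A$ or $^1J^A$ only with $J^B$ held fixed. The minimal Python sketch below (function names and the example index values are hypothetical, not compounds from Table 3) evaluates the published model and its partial derivatives, $\partial \log\mathrm{LD}_{50}/\partial\,{}^0J^A = 1.2399 - 0.4445\,{}^1J^A$ and $\partial \log\mathrm{LD}_{50}/\partial\,{}^1J^A = 2.6911 - 0.4445\,{}^0J^A$, together with a root-mean-square error that could summarize predictions on a forecasting set.

```python
import math
from typing import Sequence, Tuple

# Coefficients of the published no-intercept model.
B_J0, B_J1, B_JB = 1.2399, 2.6911, -0.4445


def predict_log_ld50(j0: float, j1: float) -> float:
    """log LD50 = 1.2399*0J^A + 2.6911*1J^A - 0.4445*J^B, with J^B = 0J^A * 1J^A."""
    return B_J0 * j0 + B_J1 * j1 + B_JB * j0 * j1


def marginal_effects(j0: float, j1: float) -> Tuple[float, float]:
    """Partial derivatives of log LD50 w.r.t. 0J^A and 1J^A, cross term included."""
    return B_J0 + B_JB * j1, B_J1 + B_JB * j0


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted log LD50 values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    j0, j1 = 2.0, 3.0  # hypothetical index values, not a real compound
    print("predicted log LD50:", round(predict_log_ld50(j0, j1), 4))
    print("d/d0J, d/d1J:", [round(v, 4) for v in marginal_effects(j0, j1)])
    print("d/d0J changes sign at 1J =", round(B_J0 / -B_JB, 3))
```

Under the fitted model, the net effect of $^0J^A$ on $\log \mathrm{LD}_{50}$ changes sign at $^1J^A = 1.2399/0.4445 \approx 2.79$, and that of $^1J^A$ at $^0J^A = 2.6911/0.4445 \approx 6.05$, so where a compound lies relative to these values matters when reading the direction of each effect.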

Conclusions
LD50 is a common index for evaluating compound toxicity; it reflects the sensitivity of the test animals, and LD50 values have high reproducibility and stability. In QSTR studies, linear regression analysis is a widely used quantitative method [15]. In this work, the quantitative parameters were calculated with MCI, and the following toxicity prediction model for benzoic acid compounds was obtained:

$$\log \mathrm{LD}_{50} = 1.2399\,{}^0J^A + 2.6911\,{}^1J^A - 0.4445\,J^B \qquad (R^2 = 0.9860)$$

The model has good forecasting ability.