Article

Estimation of the Toxicity of Different Substituted Aromatic Compounds to the Aquatic Ciliate Tetrahymena pyriformis by QSAR Approach

1
College of Chemistry and Chemical Engineering, Yantai University, Yantai 264005, China
2
LAQV/REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
*
Author to whom correspondence should be addressed.
Molecules 2018, 23(5), 1002; https://doi.org/10.3390/molecules23051002
Submission received: 19 March 2018 / Revised: 20 April 2018 / Accepted: 21 April 2018 / Published: 24 April 2018

Abstract

Quantitative structure–activity relationship (QSAR) methods are widely used to predict the toxicity of compounds to organisms because they are simple, easy to implement, and low in hazard. In this study, QSAR models were established by multiple linear regression (MLR) and radial basis function neural network (RBFNN) to estimate the toxicities of substituted aromatic compounds to Tetrahymena pyriformis. Unlike other QSAR studies, the whole dataset was divided into three groups according to the functional groups present (−NO2, −X), and each group was modeled separately. The statistical characteristics of the models are as follows: MLR: n = 36, R2 = 0.829, RMS (root mean square error) = 0.192; RBFNN: n = 36, R2 = 0.843, RMS = 0.167 for Group 1; MLR: n = 60, R2 = 0.803, RMS = 0.222; RBFNN: n = 60, R2 = 0.821, RMS = 0.193 for Group 2; MLR: n = 31, R2 = 0.852, RMS = 0.192; RBFNN: n = 31, R2 = 0.885, RMS = 0.163 for Group 3. The results were within the acceptable range, and the models were found to be statistically robust with high external predictivity. Moreover, the models gave some insight into the structural characteristics that most affect the toxicity.


1. Introduction

With the rapid development of science and technology, tens of thousands of new chemicals are synthesized and widely used in all walks of life. If chemicals are used or handled incorrectly, they may enter the aquatic environment or bio-accumulate in the food chain, where they may ultimately harm people. One of the current interests in medicinal chemistry, environmental science, and especially toxicology is to rank chemical substances with respect to their potential hazardous effects on humans, wildlife, and aquatic flora and fauna [1]. Among organic chemicals, the substituted aromatic compounds [2,3,4,5,6,7,8] occupy an important position: they are produced in large quantities, are released into the environment through their wide use in agriculture and industry, and are widely distributed in air, natural water, waste water, soil, sediment, and living organisms [9,10]. In addition, recent studies have shown that substituted aromatic compounds are biotoxic environmental pollutants that can even have carcinogenic and mutagenic effects on organisms [10,11]. Therefore, studying the properties of substituted aromatics is of considerable importance.
Up to now, both experimental [12,13,14,15] and theoretical methods [16,17] have been used to evaluate the toxicities of various substituted aromatic compounds. Theoretical prediction of properties or activities by quantitative structure–activity relationship (QSAR) studies has been widely adopted since the 1990s because it is rapid, easy, sensitive, and inexpensive [11]. The QSAR method has been applied in many fields, including physical chemistry, pharmaceutical chemistry, environmental chemistry, and toxicology [18]. QSAR modeling for toxicological prediction can help to determine the potential adverse effects of chemical entities in risk assessment.
Considerable QSAR research has addressed the toxicity of substituted aromatic compounds. In 1982, Schultz et al. performed a QSAR study relating the cellular response of Tetrahymena pyriformis to molecular connectivity indexes for a series of 24 mono- and dinitrogen heterocyclic compounds. The authors established a better model than earlier ones and pointed out that toxicity increases with the number of atoms and the degree of methylation per compound, and decreases with increasing nitrogen substitution [1]. In 1998, Cronin et al. established several QSAR models for the toxicity of 42 alkyl- and halogen-substituted nitro- and dinitrobenzenes to Tetrahymena pyriformis [19]. Using models with one or two molecular descriptors, they concluded that the nitrobenzenes elicit their toxic response through multiple (and mixed) mechanisms. In 2001, in order to compare different QSAR model-building methods, Cronin and Schultz developed QSARs for the toxicity of 268 aromatic compounds in the Tetrahymena pyriformis growth inhibition assay [16]. They compared not only the influence of different descriptors on the models, but also Bayesian regularized neural network (BRANN) and partial least-squares (PLS) analysis as modeling methods. In the following year, the same authors carried out a similar study on a dataset of phenol toxicity data for Tetrahymena pyriformis [17]. These works provide guidance on how to build better toxicity models for this group of compounds.
Netzeva et al. developed relatively simple QSAR models (one or two descriptors) for the acute toxicity of 77 aromatic aldehydes to the ciliate Tetrahymena pyriformis using mechanistically interpretable descriptors [20]. They revealed that the octanol/water partition coefficient (log KOW) is the most important descriptor, and that the models are improved by adding an electronic descriptor. Roy et al. performed a QSAR study on the toxic potency to Tetrahymena pyriformis of 174 aromatic compounds (phenols, nitrobenzenes, and benzonitriles) using the electrophilicity index [21]. In that study, the compounds were divided into electron donor and electron acceptor groups, and the authors stated that electrophilicity indices, together with the total Hartree–Fock energy, can be used to build good models. Later, Devillers et al. compared the performance of linear and nonlinear models on a structurally heterogeneous set of 200 phenol derivatives tested on Tetrahymena pyriformis, and pointed out the superiority of the nonlinear methods for finding complex structure–toxicity relationships among large sets of structurally diverse chemicals [22]. Tetko et al. studied the applicability domain and the influence of overfitting in the QSAR model-building process using a toxicity dataset against Tetrahymena pyriformis [23]. The hierarchical technology for QSAR was applied to 95 diverse nitroaromatic compounds tested against the ciliate Tetrahymena pyriformis [24]. Zarei et al. developed a model for predicting the toxicity to T. pyriformis of 268 substituted benzene compounds, including phenols, monosubstituted nitrobenzenes, multiply substituted nitrobenzenes, and benzonitriles, using the bee algorithm (BA) for descriptor selection and an adaptive neuro-fuzzy inference system (ANFIS) for model building [25]. A molecular structural characterization (MSC) method named the molecular vertexes correlative index (MVCI) was successfully used to describe the structures of 30 substituted aromatic compounds, and the results suggested good stability and predictability of the QSAR models [26].
Comparative molecular field analysis (CoMFA), comparative molecular similarity index analysis (CoMSIA), and density functional theory (DFT) methods were used to establish QSAR models for analyzing and predicting the toxicities of 31 substituted thiophenols [27]. Later, Salahinejad et al. also used the CoMFA, CoMSIA, and VolSurf techniques to develop valid and predictive models for estimating the toxicity of substituted benzenes toward T. pyriformis. They confirmed that, in addition to hydrophobic effects, electrostatic and H-bonding interactions play important roles in the toxicity of substituted benzenes, and that the information obtained from CoMFA and CoMSIA 3-D contour maps can help explain the toxicity mechanism of substituted benzenes [28]. In our earlier work, linear (MLR) and nonlinear (RBFNN) methods were used to build a reliable and fast QSAR model for predicting the mixture toxicity of non-polar narcotic chemicals, including 9 PFCAs, 12 alcohols, and 8 chlorobenzenes and bromobenzenes; the predicted values were in good agreement with the experimental ones [18]. Similarly, recursive neural networks (RNN) and multiple linear regression (MLR) were employed to build models for predicting the toxicity values of 69 benzene derivatives, and both methods gave good results compared with other studies in the literature [29]. To build a reliable and predictive QSAR model, a genetic algorithm combined with partial least squares (GA–PLS) was employed to select the optimal subset of descriptors that significantly contribute to the toxicity of 45 nitrobenzene derivatives to Tetrahymena pyriformis [30].
The goal of the present study was to develop reliable and predictive QSAR models using both MLR and RBFNN methods to predict the acute toxicity (the 50% growth inhibitory concentration, IGC50) of substituted aromatic compounds to the aquatic ciliate Tetrahymena pyriformis. For this purpose, the whole dataset was divided into three groups according to the key functional groups of the substituted aromatic compounds (−NO2 and −X): Group 1, compounds with −NO2 but without −X (46 compounds); Group 2, compounds with −X but without −NO2 (75 compounds); and Group 3, compounds with both −NO2 and −X (39 compounds). A separate model was then built for each group to evaluate the toxicities of these aromatic compounds.

2. Materials and Methods

2.1. Datasets

For aromatic compounds, Wei et al. reported that the contribution of particular substituents to toxicity follows the order −NO2 > −Cl > −CH3 > −NH2 > −OH [31]. From the dataset given by Schultz et al. [32], we selected the typical compounds containing the most influential functional groups (−NO2 and −X) and divided them into three groups. Group 1 includes 46 compounds whose structures contain −NO2 but not −X; of these, 36 compounds carry one −NO2 group and 10 compounds carry two. Group 2 contains 75 compounds that have −X but not −NO2; of these, 54, 16, and 5 compounds carry one, two, and three −X groups, respectively. Group 3 contains 39 compounds that include both −NO2 and −X, with the total number of −NO2 and −X substituents not exceeding 3.
The compounds in each group were randomly divided into two subsets. The training set, used to build the model, contained 36, 60, and 31 compounds for Groups 1, 2, and 3, respectively. The remaining compounds formed the test set, used to verify the robustness and feasibility of the model, with 10, 15, and 8 compounds for the corresponding groups. The CAS number, name, and toxicity (−log IGC50) of all compounds are listed in Table 1.

2.2. Molecular Descriptors’ Generation and Selection

To calculate the molecular descriptors of each compound, the structures were drawn using ISIS Draw 2.3 (MDL Information Systems, Inc., San Ramon, CA, USA) [33]. The MM+ molecular mechanics force field in the HyperChem 6.0 program (Hypercube, Inc.: Waterloo, ON, Canada) was then used for a preliminary geometry optimization [34]. The structures were further optimized with the semi-empirical PM3 method using the Polak–Ribiere algorithm until the root mean square gradient reached 0.01 kcal/mol [35]. Finally, a more precise optimization was carried out with the MOPAC 6.0 software package (Indiana University: Bloomington, IN, USA) [36]. The final optimized structures were then transferred to the CODESSA 2.63 program (University of Florida, Gainesville, FL, USA) to calculate five classes of descriptors, namely constitutional, topological, geometrical, electrostatic, and quantum-chemical descriptors [37]. The logP descriptor, which cannot be calculated by CODESSA 2.63 but can be obtained from HyperChem, was added to the descriptor pool [34]. In this way, 494, 597, and 611 descriptors were obtained for each compound in Groups 1, 2, and 3, respectively.
Before establishing the QSAR models, it is necessary to remove the insignificant descriptors, and the constant and highly intercorrelated descriptors (the intercorrelation of the descriptors should be lower than 0.8). In this paper, the heuristic method (HM) was used to achieve a thorough search for the best multilinear correlations with the computed descriptors in the framework of the program CODESSA 2.63 [37].
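The pre-filtering of constant and highly intercorrelated descriptors can be illustrated with a minimal Python sketch. It is not the heuristic method implemented in CODESSA: the function name `prefilter_descriptors`, the greedy keep-first strategy, and the toy matrix are assumptions made for illustration, and only the 0.8 intercorrelation limit is taken from the text.

```python
import numpy as np

def prefilter_descriptors(X, max_corr=0.8, tol=1e-12):
    """Return indices of columns that are non-constant and pairwise |r| < max_corr.

    Greedy keep-first rule (an assumption, not the CODESSA procedure).
    """
    X = np.asarray(X, dtype=float)
    # Constant columns carry no information and are dropped first.
    candidates = [j for j in range(X.shape[1]) if np.std(X[:, j]) > tol]
    kept = []
    for j in candidates:
        # Keep column j only if it is weakly correlated with every kept column.
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < max_corr for k in kept):
            kept.append(j)
    return kept

# Toy descriptor matrix: 6 compounds x 4 descriptors (made-up numbers).
X_toy = np.array([
    [1.0, 2.0, 5.0, 0.3],
    [2.0, 4.1, 5.0, 0.9],
    [3.0, 5.9, 5.0, 0.1],
    [4.0, 8.2, 5.0, 0.8],
    [5.0, 9.9, 5.0, 0.2],
    [6.0, 12.0, 5.0, 0.7],
])
kept_columns = prefilter_descriptors(X_toy)
```

In the toy matrix, the second column is nearly proportional to the first and the third is constant, so only the first and fourth columns survive.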

2.3. Multiple Linear Regressions (MLR)

Multiple linear regressions (MLR) are often accepted as a classical method for solving linear problems when there are two or more than two independent variables in QSAR modeling. The purpose of MLR is to find a mathematical function which best depicts the desired activity Y (here, −log IGC50 values) as a linear combination of the X-variables (the molecular descriptors), with the regression coefficients bn. The equation is as follows:
$$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n.$$
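As an illustration of this equation, the following Python sketch fits the coefficients by ordinary least squares and computes the fit statistics. The function names and the toy data are hypothetical, and the RMS is taken here as the square root of the mean squared residual, which is an assumption since the text does not give its exact definition.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn; returns [b0, b1, ..., bn]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def r2_and_rms(y_obs, y_pred):
    """Coefficient of determination and RMS error (assumed definition)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / len(y_obs))

def loo_q2(X, y):
    """Leave-one-out q2: refit without compound i, then predict compound i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        press += (y[i] - predict_mlr(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Toy data generated from a known linear relation (not from the paper).
X_toy = np.array([[0., 1.], [1., 0.], [2., 3.], [3., 1.], [4., 5.], [5., 2.]])
y_toy = 1.0 + 2.0 * X_toy[:, 0] - 0.5 * X_toy[:, 1]
coef = fit_mlr(X_toy, y_toy)
r2, rms = r2_and_rms(y_toy, predict_mlr(coef, X_toy))
q2 = loo_q2(X_toy, y_toy)
```

Because the toy response is exactly linear in the two descriptors, the fit recovers the generating coefficients; real toxicity data would of course leave residuals.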
A good fit alone, as measured by $R^2$ (coefficient of determination), LOO $q^2$ (leave-one-out cross-validated correlation coefficient), RMS (root mean square error), $F$ (Fisher's statistic), etc., does not guarantee that a model is useful for prediction [38]. Some statistical characteristics of the test set also need to be considered: $R^2$ (coefficient of determination), $R_0^2$ and $R_0'^2$ (the coefficients of determination for predicted vs observed activities, and vice versa, when the Y-intercept $b_0$ is set to zero), and the corresponding slopes $k$ and $k'$. The following conditions need to be fulfilled to adequately estimate the predictive ability of a model [39]:
$$q^2 > 0.5$$
$$R^2 > 0.6$$
$$\frac{R^2 - R_0^2}{R^2} < 0.1 \quad \text{or} \quad \frac{R^2 - R_0'^2}{R^2} < 0.1$$
$$0.85 \le k \le 1.15 \quad \text{or} \quad 0.85 \le k' \le 1.15$$
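The acceptance conditions listed above can be checked programmatically. The sketch below is only one possible reading: the function name `external_validation_checks` is hypothetical, and the formulas used for the slopes and for the through-origin coefficients of determination follow one common convention, which may differ in detail from the one used in [39].

```python
import numpy as np

def external_validation_checks(y_obs, y_pred, q2):
    """Evaluate the four acceptance conditions for a test set (assumed formulas)."""
    y = np.asarray(y_obs, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, p)[0, 1] ** 2
    # Slopes of the regression lines forced through the origin.
    k = np.sum(y * p) / np.sum(p ** 2)        # observed on predicted
    k_prime = np.sum(y * p) / np.sum(y ** 2)  # predicted on observed
    # Coefficients of determination with the intercept set to zero.
    r0_sq = 1.0 - np.sum((y - k * p) ** 2) / np.sum((y - y.mean()) ** 2)
    r0_prime_sq = 1.0 - np.sum((p - k_prime * y) ** 2) / np.sum((p - p.mean()) ** 2)
    close_ok = (r2 - r0_sq) / r2 < 0.1 or (r2 - r0_prime_sq) / r2 < 0.1
    slope_ok = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    return {
        "r2": r2, "r0_sq": r0_sq, "r0_prime_sq": r0_prime_sq,
        "k": k, "k_prime": k_prime,
        "acceptable": bool(q2 > 0.5 and r2 > 0.6 and close_ok and slope_ok),
    }

# Toy check: perfect predictions pass, reversed predictions fail on the slope.
y_demo = np.array([1.0, 2.0, 3.0, 4.0])
good = external_validation_checks(y_demo, y_demo, q2=0.8)
bad = external_validation_checks(y_demo, y_demo[::-1], q2=0.8)
```

The reversed example shows why the slope and through-origin conditions matter: its squared correlation is still 1, yet the predictions are useless.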

2.4. Radial Basis Function Neural Networks (RBFNN)

In general, RBFNN may have a better result than MLR, because it can take into account some nonlinear behavior between the molecular descriptors and the desired activities values (−log IGC50). The detailed introduction of RBFNN has been stated in previous studies [40,41], so we only make a simple statement of the key parts here.
The RBFNN is a typical feed-forward neural network composed of three layers: the input layer, the hidden layer, and the output layer. The first layer is linear and distributes the input values, the second layer is nonlinear and uses radial basis functions, and the third layer linearly combines the outputs. Each neuron in a layer is fully connected to the next layer, but there are no connections between neurons within a given layer. Each hidden layer unit represents a single radial basis function, which is characterized by a center and a width and acts as a nonlinear transfer function on the input from the previous layer. The most commonly used RBF is the Gaussian function, characterized by the center (cj) and width (rj) [42]. It measures the Euclidean distance between the input vector (x) and the radial basis function center (cj) and performs the nonlinear transformation within the hidden layer, defined as
$$h_j = \exp\left(-\frac{\lVert x - c_j \rVert^2}{r_j^2}\right),$$
where hj is the output of the jth RBF unit, while cj and rj are the center and width of such a unit, respectively. The operation of the output layer is linear and is given by
$$y_k(x) = \sum_{j=1}^{n_h} w_{kj} h_j(x) + b_k,$$
where yk is the kth output unit for the input vector x, wkj is the weight connection between the kth output unit and the jth hidden layer unit, nh is the number of hidden layer units, and bk is the respective bias.
In the present study, all RBFNN calculations were carried out with the MATLAB package (MathWorks, Natick, MA, USA) (www.mathworks.com/products/matlab/). The overall performance of the RBFNN model, together with its reliability and robustness, can be evaluated by the same statistical parameters as for the MLR method.
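The two layer equations of this section can be sketched in a few lines of Python. This is not the MATLAB implementation used in the study: the function names are hypothetical, the centers are simply passed in (the text does not state how they were chosen), a single shared width is used, and the output weights are obtained by linear least squares, which is an assumption.

```python
import numpy as np

def rbf_hidden(X, centers, widths):
    """Hidden layer: h_j(x) = exp(-||x - c_j||^2 / r_j^2) for every row x of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / widths[None, :] ** 2)

def fit_rbfnn(X, y, centers, width):
    """Fit the linear output layer for fixed centers and one shared width."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.full(len(centers), float(width))
    H = rbf_hidden(X, centers, widths)
    A = np.column_stack([H, np.ones(len(X))])  # last column carries the bias b_k
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"centers": centers, "widths": widths,
            "weights": params[:-1], "bias": params[-1]}

def predict_rbfnn(model, X):
    """Output layer: y(x) = sum_j w_j * h_j(x) + b."""
    H = rbf_hidden(np.asarray(X, dtype=float), model["centers"], model["widths"])
    return H @ model["weights"] + model["bias"]

# Toy 1-D example (made-up data): one hidden unit centered on each training point.
X_toy = np.arange(5.0).reshape(-1, 1)
y_toy = np.sin(X_toy[:, 0])
model = fit_rbfnn(X_toy, y_toy, centers=X_toy, width=1.0)
y_fit = predict_rbfnn(model, X_toy)
```

With as many hidden units as training points the network interpolates the toy data exactly, which also illustrates why the number of hidden units and the width have to be chosen with validation data rather than training error alone.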

2.5. Applicability Domain (AD) of the Model

The applicability domain (AD) of a QSAR model must also be defined. The AD refers to a theoretical region in the space defined by the compounds in the training set, and it describes the kinds of compounds for which the model can be used. In other words, the AD delimits a theoretical region, also for unknown chemicals without experimental data, that contains as few bad predictions (Y-outliers) and chemicals far from the structural domain of the training set as possible [43]. In this study, a William's plot, i.e., a plot of standardized residuals (R) vs leverages, was used [44]. A simple measure of how far a chemical lies from the applicability domain of the model is its leverage, hi [43], defined as follows:
$$h_i = x_i^{T} \left( x^{T} x \right)^{-1} x_i \quad (i = 1, 2, \dots, n).$$
In the above equation, xi represents the descriptor row vector of the studied compound, while x represents the n × k − 1 matrix of k model descriptor values for the n training set compounds. The superscript “T” refers to the transpose of the matrix/vector. hi characterizes the leverage of a compound, and is one of the coordinates of the William’s plot (standardized residuals versus leverage).
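The leverage formula can be evaluated directly with NumPy. In this sketch the function name `leverages` and the toy matrix are hypothetical, and including a column of ones for the intercept in the descriptor matrix is an assumption, since the text does not say whether it is included.

```python
import numpy as np

def leverages(X_train, X_query=None):
    """h_i = x_i^T (X^T X)^-1 x_i for each row of X_query (default: training rows)."""
    X_train = np.asarray(X_train, dtype=float)
    xtx_inv = np.linalg.inv(X_train.T @ X_train)
    Xq = X_train if X_query is None else np.asarray(X_query, dtype=float)
    # Row-wise quadratic form, avoiding the full n x n hat matrix.
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

# Toy training matrix: intercept column plus one descriptor (made-up values).
X_toy = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.4], [1.0, 3.0]])
h_train = leverages(X_toy)
# A query compound far outside the training range gets a much larger leverage.
h_new = leverages(X_toy, np.array([[1.0, 10.0]]))
```

The training leverages sum to the number of columns of the matrix, and the compound whose descriptor value lies farthest from the training mean has the largest leverage.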

3. Results and Discussion

3.1. MLR Results

As mentioned above, based on the structural differences caused by the influential functional groups (−NO2 and −X), Groups 1, 2, and 3 contain 46, 75, and 39 compounds, respectively. The model for each group was established with its training set. Beforehand, the heuristic method (HM) was used to select the descriptors. After this preselection, in which descriptors that did not obey the rules of thumb [45] were removed, 178, 203, and 160 descriptors remained for Groups 1, 2, and 3, respectively.
Multilinear regression models were then developed in a stepwise procedure, that is, the descriptors and correlations were sorted by the values of the F-test and the correlation coefficients. Beginning with the top descriptor from the list, two-parameter correlations were calculated. Later, the descriptors were added one by one, until the preselected number of descriptors in the model is fulfilled. Finally, three descriptors were used to describe the relationship between molecule structure and toxicity for each group of compounds. The selected descriptors and their chemical meaning, along with the statistical parameters, are listed in Table 2, Table 3 and Table 4.
The external test sets were also used to further evaluate the three models. The statistical parameters obtained are as follows: Next = 10, R2 = 0.917, $q_{\text{ext}}^2$ = 0.851, F = 13.820, RMS = 0.222 for Group 1; Next = 15, R2 = 0.789, $q_{\text{ext}}^2$ = 0.732, F = 13.720, RMS = 0.266 for Group 2; Next = 8, R2 = 0.733, $q_{\text{ext}}^2$ = 0.730, F = 260.404, RMS = 0.380 for Group 3. Figure 1, Figure 2 and Figure 3a show the predicted vs observed −log IGC50 values for all the training and test set compounds. Thus, the models are reasonable in both statistical significance and predictive ability.

3.2. Model Applicability Domain Analysis and Improved MLR Model

Possible outliers of the models must also be considered. To visualize the AD, the plot of standardized cross-validated residuals versus leverage (the William's plot), which provides immediate and simple graphical detection, was used to identify outliers. In this plot, the horizontal and vertical straight lines represent the control limits for Y-outliers and X-outliers, respectively. The limit on the X-axis (leverage) is 3m/n, where m is the number of model parameters and n is the number of samples in the training set. In the present study, the control limit for Y-outliers (RES) was set to ±3σ. Figure 4, Figure 5 and Figure 6 show the William's plots based on the MLR models for all compounds of Groups 1, 2, and 3, respectively.
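The two control limits of the William's plot translate into a simple flagging rule. The sketch below uses the 3m/n leverage limit and the ±3 limit on standardized residuals given in the text; the function name `williams_flags` and the numerical inputs are hypothetical, and taking m = 3 for a three-descriptor model is an assumption.

```python
def williams_flags(leverage, std_residuals, n_params, n_train, res_limit=3.0):
    """Flag X-outliers (leverage above 3m/n) and Y-outliers (|residual| above 3 sigma)."""
    h_limit = 3.0 * n_params / n_train
    flags = [
        {"x_outlier": h > h_limit, "y_outlier": abs(r) > res_limit}
        for h, r in zip(leverage, std_residuals)
    ]
    return h_limit, flags

# Made-up leverages and standardized residuals for three compounds,
# with m = 3 parameters and n = 36 training compounds.
h_limit, flags = williams_flags(
    leverage=[0.10, 0.30, 0.05],
    std_residuals=[0.5, -1.0, 3.5],
    n_params=3,
    n_train=36,
)
```

Here the second compound would be an X-outlier (leverage above 0.25) and the third a Y-outlier, while the first lies inside both limits.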
As can be judged from Figure 4, the model for Group 1 has one X-outlier (Group 1: compound 2), which is 2-nitroanisole. Its structure contains two functional groups, −NO2 and methoxy. The former, a strong electron-withdrawing group, is present in all compounds of this group. The methoxy group, however, has oxygen lone-pair electrons and is a strong electron-donating moiety compared with the other substituents in the group. Therefore, care should be taken with compounds containing a methoxy group, since it can activate the benzene ring and exert an unusual influence on the toxicity. Figure 6 likewise shows one X-outlier (Group 3: compound 15), namely 2-chloromethyl-4-nitrophenol. This compound has three electron-withdrawing moieties, −Cl, −OH, and −NO2, giving it almost the strongest inductive effect among the compounds in this group. There also seems to be another outlier (Group 3: compound 36), which belongs to the test set. This may be due to variability in the measurement, or it may indicate experimental error.
If outliers are handled inappropriately, the accuracy of the model, and hence the quality of its predictions, will be affected. Therefore, we removed the outliers from Group 1 and Group 3 and rebuilt the models, with the following results: for the training set of Group 1 (after removing compound 2): N = 35, R2 = 0.926, LOOq2 = 0.916, F = 94.266, RMS = 0.125; for the training set of Group 3 (after removing compound 15): N = 30, R2 = 0.852, LOOq2 = 0.834, F = 49.710, RMS = 0.196. The statistical parameters of the models are better after removing the outliers.
To further assess the predictive power of the models established by the MLR method, parameters such as $(R^2 - R_0^2)/R^2$, $k$, $k'$, etc., were also calculated, and the results are shown in Table 5. From the table, we can see that the statistical results were all within the acceptable ranges for the MLR method.

3.3. Validation Results of the Models

Further, a fivefold cross-validation algorithm was applied to validate the stability of the three models. The members selected for each group (i.e., groups A, B, C, D, and T) are shown in Table 1. The R2, F, and RMS values for each validation, along with their average values, are shown in Table 6 for the MLR models. As can be seen, all three models are stable, judging from the values obtained for the average training quality and the average predicting quality.
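A fivefold scheme over labeled subsets can be sketched as follows. The labels A, B, C, D, and T are taken from the text, but how the statistics in Table 6 were actually computed is not described here, so holding out each labeled subset in turn and reporting training and held-out RMS is an assumption; the function names and toy data are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bn]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def rms_error(coef, X, y):
    residuals = y - (coef[0] + X @ coef[1:])
    return float(np.sqrt(np.mean(residuals ** 2)))

def cross_validate_by_label(X, y, labels):
    """Hold out each labeled subset once; fit on the rest and score both parts."""
    results = {}
    for held_out in sorted(set(labels)):
        test = labels == held_out
        coef = fit_mlr(X[~test], y[~test])
        results[str(held_out)] = {
            "train_rms": rms_error(coef, X[~test], y[~test]),
            "test_rms": rms_error(coef, X[test], y[test]),
        }
    return results

# Toy data: 10 compounds, 2 descriptors, exact linear response (not from the paper).
X_toy = np.array([[i, (i * i) % 7] for i in range(10)], dtype=float)
y_toy = 0.5 + 1.0 * X_toy[:, 0] - 0.3 * X_toy[:, 1]
labels_toy = np.array(list("ABCDT") * 2)
cv_results = cross_validate_by_label(X_toy, y_toy, labels_toy)
mean_test_rms = float(np.mean([v["test_rms"] for v in cv_results.values()]))
```

Comparing the average training and held-out errors across the five runs is what indicates stability: a stable model shows similar values in every run.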

3.4. RBFNN Results

In QSAR research, RBFNN often gives better results than MLR because it can account for nonlinear relationships between molecular structure and activity. To test this, RBFNN was used to build nonlinear predictive models with the same descriptors selected by the MLR models. An RBFNN architecture can be denoted as i-nk-1, indicating the numbers of units in the input, hidden, and output layers, respectively. The width (r) of the RBF was determined by systematically varying its value in the training step from 0.1 to 4.0 in increments of 0.1. For the training sets of the three groups in this study, the RBFNN architectures were 3-10-1, 3-9-1, and 3-9-1, with widths of 0.8, 2.0, and 1.7, respectively.
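The width scan from 0.1 to 4.0 in steps of 0.1 can be sketched as a simple grid search. The selection criterion is not stated in the text, so choosing the width with the lowest leave-one-out RMS error is an assumption, as are the fixed centers, the function names, and the toy data.

```python
import numpy as np

def design_matrix(X, centers, width):
    """Gaussian hidden-layer outputs plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.column_stack([np.exp(-d2 / width ** 2), np.ones(len(X))])

def loo_rms(X, y, centers, width):
    """Leave-one-out RMS error of the linear output layer for a given width."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        w, *_ = np.linalg.lstsq(design_matrix(X[keep], centers, width),
                                y[keep], rcond=None)
        pred = design_matrix(X[i:i + 1], centers, width) @ w
        press += (y[i] - pred[0]) ** 2
    return float(np.sqrt(press / n))

# Candidate widths 0.1, 0.2, ..., 4.0 as described in the text.
width_grid = np.round(np.arange(1, 41) * 0.1, 1)

# Toy 1-D data with five fixed centers (made-up, not the study's descriptors).
X_toy = np.linspace(0.0, 4.0, 9).reshape(-1, 1)
y_toy = np.sin(X_toy[:, 0])
centers_toy = X_toy[::2]
scores = [loo_rms(X_toy, y_toy, centers_toy, r) for r in width_grid]
best_width = float(width_grid[int(np.argmin(scores))])
```

Very small widths make each hidden unit respond only to its own center, while very large widths make all units respond almost identically; the scan looks for a value between these extremes.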
The statistical results for the training and test sets are as follows. Group 1: for the training set, N = 36, R2 = 0.843, LOOq2 = 0.838, F = 182.306, RMS = 0.167, and for the test set, Next = 10, R2 = 0.881, $q_{\text{ext}}^2$ = 0.867, F = 59.483, RMS = 0.210; Group 2: for the training set, N = 60, R2 = 0.821, LOOq2 = 0.818, F = 265.898, RMS = 0.192, and for the test set, Next = 15, R2 = 0.810, $q_{\text{ext}}^2$ = 0.796, F = 55.506, RMS = 0.232; Group 3: for the training set, N = 31, R2 = 0.885, LOOq2 = 0.882, F = 224.261, RMS = 0.163, and for the test set, Next = 8, R2 = 0.632, $q_{\text{ext}}^2$ = 0.622, F = 63.660, RMS = 0.298. The corresponding predicted endpoint values of each compound in each group are shown in Table 1, and the plots of predicted versus experimental values for both training and test sets are displayed in Figure 1b, Figure 2b and Figure 3b. Unlike the original literature [35], we selected and classified the original compounds according to their structural characteristics and then modeled, analyzed, and predicted the corresponding toxicity values. The models thus established are more targeted to the particular compounds, and the statistical results of $(R^2 - R_0^2)/R^2$, $k$, $k'$, etc., shown in Table 5 for RBFNN, also indicate that the models are statistically robust with high external predictivity.

3.5. Interpretation of Model Descriptors

To deepen the understanding of the models, the descriptors selected for each group are explained in more detail. For Group 1, three descriptors were selected in the QSAR model, namely G2, PAB, and Enn(C-H). Their positive coefficients indicate that the −log IGC50 values increase as these descriptors increase, and vice versa. G2 refers to the gravitation index for all bonded pairs of atoms, and it is defined as $G_2 = \sum_{(i>j)}^{N_b} \frac{m_i m_j}{r_{ij}^2}$ [46], where mi and mj are the atomic weights of atoms i and j, rij is the interatomic distance, and Nb is the number of bonds in the molecule. Po belongs to the valency-related descriptors, which relate to the strength of intermolecular bonding interactions and characterize the stability of the molecules, their conformational flexibility, and other valency-related properties [47]. Enn(C-H) is the maximum n–n repulsion for a C–H bond, calculated as $E_{nn}(\text{C–H}) = \frac{Z_C Z_H}{R_{CH}}$, where ZC and ZH are the nuclear (core) charges of atoms C and H, respectively, and RCH is the distance between them. This energy describes the nuclear repulsion driven processes in the molecule, and may be related to conformational (rotational, inversional) changes or atomic reactivity in the molecule [46].
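The two formulas for G2 and Enn(C-H) are simple enough to evaluate by hand, as the following Python sketch shows. It reproduces only the expressions given in the text, without any unit conversion or prefactor that CODESSA may apply; the function names, the methane-like geometry, and the core charges (4 for C, 1 for H) are illustrative assumptions.

```python
def gravitation_index_bonds(bonds):
    """G2 = sum over bonded atom pairs of m_i * m_j / r_ij**2."""
    return sum(m_i * m_j / r_ij ** 2 for m_i, m_j, r_ij in bonds)

def nuclear_repulsion(z_a, z_b, r_ab):
    """E_nn(A-B) = Z_A * Z_B / R_AB for one bonded pair."""
    return z_a * z_b / r_ab

def max_nuclear_repulsion(pairs):
    """Largest E_nn over all bonds of one type, e.g. all C-H bonds."""
    return max(nuclear_repulsion(z_a, z_b, r_ab) for z_a, z_b, r_ab in pairs)

# Methane-like toy molecule: four C-H bonds of 1.09 angstrom (assumed geometry).
# Each bond is given as (mass of atom i, mass of atom j, bond length).
methane_bonds = [(12.011, 1.008, 1.09)] * 4
g2_methane = gravitation_index_bonds(methane_bonds)

# Core charges of 4 (C) and 1 (H) are assumed here for illustration.
# Each pair is given as (Z_C, Z_H, distance); the shortest bond dominates.
ch_pairs = [(4.0, 1.0, 1.09), (4.0, 1.0, 1.10)]
enn_ch_max = max_nuclear_repulsion(ch_pairs)
```

Both quantities grow when bonds are shorter or atoms heavier or more highly charged, which is consistent with their interpretation as measures of how compact and strained the bonded framework is.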
For Group 2, which covers the compounds with −X but without −NO2, three descriptors were chosen: LogP, PNSA-2/TMSA, and PSIGMA. PNSA-2/TMSA is the FNSA-2 fractional PNSA (PNSA-2/TMSA) [Zefirov's PC], which relates the atomic partial charges to the total molecular solvent-accessible surface area [46]. PSIGMA represents the maximum bond order for a given pair of atomic species in the molecule, with the lower limit PSIGMA (min) > 0.1. LogP is a solvational characteristic (the hydrophobicity of chemicals), because it is closely related to the change in the Gibbs energy of solvation of a solute between two solvents.
For Group 3, three descriptors were selected to build the model: Ic, Enn(C-C), and RPCG. Their chemical meaning is given in Table 4. Ic is a geometrical descriptor that relates to the atomic masses and the distances of the atomic nuclei from the main rotational axes, and thus characterizes the mass distribution in the molecule. Enn(C-C) is calculated as $E_{nn}(\text{C–C}) = \frac{Z_C Z_C}{R_{CC}}$, where ZC and ZC are the nuclear (core) charges of the two C atoms and RCC is the distance between them. This energy describes the nuclear repulsion driven processes in the molecule, and may be related to conformational (rotational, inversional) changes or atomic reactivity in the molecule [48]. RPCG, the relative positive charge, belongs to the electrostatic descriptors. Its coefficient shows that the relative positive charge of the molecule is negatively related to the endpoint values (−log IGC50).
In summary, we found that the repulsion between the two bonds and the local charge on the surface of the molecule appeared in different models, indicating that these two factors have a greater influence on the structure of the compound and should be relatively valued.

4. Conclusions

In the present study, QSAR models for the acute toxicity of substituted aromatic compounds to the aquatic ciliate Tetrahymena pyriformis were built using the MLR and RBFNN methods, after dividing the whole dataset into three groups based on the most influential functional groups (−NO2 and −X). The acceptable statistical results for each model indicate good stability and predictability. The results also show that the MLR method can establish reasonable models for evaluating the activity of the compounds, while the RBFNN method provides better statistical parameters. The selected descriptors are effective and feasible for evaluating the toxicity of this group of compounds. Lastly, the results of this study provide useful insights into the structural characteristics that most affect the toxicity.

Author Contributions

Feng Luan and M. Natália Dias Soeiro Cordeiro conceived and designed the experiments; Ting Wang and Lili Tang performed the experiments and wrote the paper; Feng Luan and Shuang Zhang reviewed the paper.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (No. 21675138).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schultz, T.W.; Kier, L.B.; Hall, L.H. Structure-toxicity relationships of selected nitrogenous heterocyclic compounds. III. Relations using molecular connectivity. Bull. Environ. Contam. Toxicol. 1982, 28, 373–378. [Google Scholar] [CrossRef] [PubMed]
  2. Oren, A.; Gurevich, P.; Henis, Y. Reduction of nitrosubstituted aromatic compounds by the halophilic anaerobic eubacteria Haloanaerobium praevalens and Sporohalobacter marismortui. Appl. Environ. Microbiol. 1991, 57, 3367–3370. [Google Scholar] [PubMed]
  3. Gooch, A.; Sizochenko, N.; Rasulev, B.; Gorb, L.; Leszczynski, J. In vivo toxicity of nitroaromatics: A comprehensive quantitative structure–activity relationship study. Environ. Toxicol. Chem. 2017, 36, 2227–2233. [Google Scholar] [CrossRef] [PubMed]
  4. Finger, G.C.; Kruse, C.W. Aromatic fluorine compounds. VII. Replacement of aromatic-Cl and -NO2 groups by -F1,2. J. Am. Chem. Soc. 1956, 78, 6034–6037. [Google Scholar] [CrossRef]
  5. Zhang, A.Q.; Chen, R.Q.; Wei, D.B.; Wang, L.S. QSAR research of chlorinated aromatic compounds toxicity to Selenastrum capricornutum. China Environ. Sci. 2000, 20. [Google Scholar] [CrossRef]
  6. Gupta, A.K.; Chakraborty, A.; Giri, S.; Chattaraj, P. Toxicity of halogen, sulfur and chlorinated aromatic compounds. Int. J. Chemoinform. Chem. Eng. 2011, 1, 61–74. [Google Scholar] [CrossRef]
  7. Lu, G.H.; Wang, C.; Yuan, X.; Lang, P.Z. Quantitative structure-activity relationships for the toxicity of substituted benzenes to Cyprinus carpio. Biomed. Environ. Sci. 2005, 18, 53–57. [Google Scholar] [PubMed]
  8. Shintou, T.; Fujii, S.; Kubo, S. Process for Producing Iodinated Aromatic Compounds. U.S. Patent 6,437,203, 20 August 2002. [Google Scholar]
  9. Arcangeli, J.P.; Arvin, E. Biodegradation rates of aromatic contaminants in biofilm reactors. Water Sci. Technol. 1995, 31, 117–128. [Google Scholar]
  10. Jing, G.H.; Li, X.L.; Zou, Z.M. Quantitative structure-activity relationship (QSAR) study of toxicity of substituted aromatic compounds to Photobacterium phosphoreum. Chin. J. Struct. Chem. 2010, 29, 1189–1196. [Google Scholar]
  11. Khadikar, P.V.; Mather, K.C.; Singh, S.; Phadnis, A.; Shrivastava, A.; Mandaloi, M. Study on quantitative structure toxicity relationships for benzene derivatives acting by narcosis. Bioorg. Med. Chem. 2002, 10, 1761–1766. [Google Scholar] [CrossRef]
  12. Giddings, J.M. Acute toxicity to Selenastrum capricornutum, of aromatic compounds from coal conversion. Bull. Environ. Contam. Toxicol. 1979, 23, 360–364. [Google Scholar] [CrossRef] [PubMed]
  13. Kuivasniemi, K.; Eloranta, V.; Knuutinen, J. Acute toxicity of some chlorinated phenolic compounds to Selenastrum capricornutum, and phytoplankton. Arch. Environ. Contam. Toxicol. 1985, 14, 43–49. [Google Scholar] [CrossRef]
  14. Sverdrup, L.E.; Krogh, P.H.; Nielsen, T.; Kjaer, C.; Stenersen, J. Toxicity of eight polycyclic aromatic compounds to red clover (Trifolium pratense), ryegrass (Lolium perenne), and mustard (Sinapsis alba). Chemosphere 2003, 53, 993–1003. [Google Scholar] [CrossRef]
  15. Kobetičová, K.; Bezchlebová, J.; Lána, J.; Sochová, I.; Hofman, J. Toxicity of four nitrogen-heterocyclic polyaromatic hydrocarbons (NPAHs) to soil organisms. Ecotoxicol. Environ. Saf. 2008, 71, 650–660. [Google Scholar] [CrossRef] [PubMed]
  16. Cronin, M.T.; Schultz, T.W. Development of quantitative structure−activity relationships for the toxicity of aromatic compounds to Tetrahymena pyriformis: Comparative assessment of the methodologies. Chem. Res. Toxicol. 2001, 14, 1284–1295. [Google Scholar] [CrossRef] [PubMed]
  17. Cronin, M.T.; Aptula, A.O.; Duffy, J.C.; Netzeva, T.I.; Rowq, P.H.; Valkova, I.V.; Schultz, T.W. Comparative assessment of methods to develop QSARs for the prediction of the toxicity of phenols to Tetrahymena pyriformis. Chemosphere 2002, 49, 1201–1221. [Google Scholar] [CrossRef]
  18. Luan, F.; Xu, X.; Liu, H.T.; Cordeiro, M.N. Prediction of the baseline toxicity of non-polar narcotic chemical mixtures by QSAR approach. Chemosphere 2013, 90, 1980–1986. [Google Scholar] [CrossRef] [PubMed]
  19. Cronin, M.T.; Gregory, B.W.; Schultz, T.W. Quantitative structure-activity analyses of nitrobenzene toxicity to Tetrahymena pyriformis. Chem. Res. Toxicol. 1998, 11, 902–908. [Google Scholar] [CrossRef] [PubMed]
  20. Netzeva, T.I.; Schultz, T.W. QSAR for the aquatic toxicity of aromatic aldehydes from tetrahymena data. Chemosphere 2005, 61, 1632–1643. [Google Scholar] [CrossRef] [PubMed]
  21. Roy, D.R.; Parthasarathi, R.; Subramanian, V.; Chattaraj, P.K. An electrophilicity based analysis of toxicity of aromatic compounds towards Tetrahymena pyriformis. Mol. Inform. 2006, 25, 114–122. [Google Scholar]
  22. Devillers, J. Linear versus nonlinear QSAR modeling of the toxicity of phenol derivatives to Tetrahymena pyriformis. SAR QSAR Environ. Res. 2007, 15, 237–249. [Google Scholar] [CrossRef] [PubMed]
  23. Tetko, I.V.; Sushko, I.; Pandey, A.K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: Focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Model. 2008, 48, 1733–1746. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Artemenko, A.G.; Muratov, E.N.; Kuz’Min, V.E.; Muratov, N.N.; Varlamova, E.V.; Kuz’Mina, A.V.; Gorb, L.G.; Golius, A.; Hill, F.C.; Leszczynski, J.; et al. QSAR analysis of the toxicity of nitroaromatics in Tetrahymena pyriformis: Structural factors and possible modes of action. SAR QSAR Environ. Res. 2011, 22, 575–601. [Google Scholar] [CrossRef] [PubMed]
  25. Zarei, K.; Atabati, M.; Kor, K. Bee algorithm and adaptive neuro-fuzzy inference system as tools for QSAR study toxicity of substituted benzenes to Tetrahymena pyriformis. Bull. Environ. Contam. Toxicol. 2014, 92, 642–649. [Google Scholar] [CrossRef] [PubMed]
  26. Li, J.F.; Liao, L.M. Structural characterization and acute toxicity prediction of substituted aromatic compounds by using molecular vertexes correlative index. Chin. J. Struct. Chem. 2013, 32, 557–563. [Google Scholar] [CrossRef]
  27. Shi, J.Q.; Cheng, J.; Wang, F.Y.; Flamm, A.; Wang, Z.Y.; Yang, X. Acute toxicity and n-octanol/water partition coefficients of substituted thiophenols: Determination and QSAR analysis. Ecotoxicol. Environ. Saf. 2012, 78, 134–141. [Google Scholar] [CrossRef] [PubMed]
  28. Salahinejad, M.; Ghasemi, J.B. 3D-QSAR studies on the toxicity of substituted benzenes to Tetrahymena pyriformis: CoMFA, CoMSIA and VolSurf approaches. Ecotoxicol. Environ. Saf. 2014, 105, 128–134. [Google Scholar] [CrossRef] [PubMed]
  29. Bertinetto, C.; Duce, C.; Solaro, R.; Tiné, M.R.; Micheli, A.; Héberger, K.; Miličević, A.; Nikolić, S. Modeling of the acute toxicity of benzene derivatives by complementary QSAR methods. Match-Commun. Math. Comput. Chem. 2013, 70, 1005–1021. [Google Scholar]
  30. Wang, D.D.; Feng, L.L.; He, G.Y.; Chen, H.Q. QSAR studies for assessing the acute toxicity of nitrobenzenes to Tetrahymena pyriformis. J. Serb. Chem. Soc. 2014, 79, 1111–1125. [Google Scholar] [CrossRef]
  31. Wei, D.B.; Zhai, L.H.; Dong, C.H.; Hu, H.Y. Determination and prediction of the acute toxicity of substituted benzene compounds to luminescent bacteria. Chin. J. Environ. Sci. 2002, S1, 3–7. [Google Scholar]
  32. Schultz, T.W.; Netzeva, T.I.; Cronin, M.T. Selection of data sets for QSARs: Analyses of tetrahymena toxicity from aromatic compounds. SAR QSAR Environ. Res. 2003, 14, 59–81. [Google Scholar] [CrossRef] [PubMed]
  33. ISIS Draw2.3, MDL Information Systems, Inc.: San Ramon, CA, USA, 1990–2000.
  34. HyperChem 6.01, Hypercube, Inc.: Waterloo, ON, Canada, 2000.
  35. Dewar, M.J.; Storch, D.M. Development and use of quantum molecular models. 75. Comparative tests of theoretical procedures for studying chemical reactions. J. Am. Chem. Soc. 1985, 107, 3898–3902. [Google Scholar] [CrossRef]
  36. Stewart, J.P.P. MOPAC 6.0, Quantum Chemistry Program Exchange, No. 455; Indiana University: Bloomington, IN, USA, 1989.
  37. Katritzky, A.R.; Lobanov, V.S.; Karelson, M. CODESSA 2.63: Training Manual; University of Florida: Gainesville, FL, USA, 1995. [Google Scholar]
  38. Golbraikh, A.; Tropsha, A. Beware of q2! J. Mol. Graph. Model. 2002, 20, 269–276. [Google Scholar] [CrossRef]
  39. Tropsha, A.; Gramatica, P.; Gombar, V. The importance of being earnest: Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci. 2003, 22, 69–77. [Google Scholar] [CrossRef]
  40. Xiang, Y.H.; Liu, M.C.; Zhang, X.Y.; Zhang, R.S.; Hu, Z.D.; Fan, B.T.; Doucet, J.P.; Panaye, A. Quantitative prediction of liquid chromatography retention of N-benzylideneanilines based on quantum chemical parameters and radial basis function neural network. J. Chem. Inf. Comput. Sci. 2002, 42, 592–597. [Google Scholar] [CrossRef] [PubMed]
  41. Gharagheizi, F. QSPR analysis for intrinsic viscosity of polymer solutions by means of GA-MLR and RBFNN. Comput. Mater. Sci. 2007, 40, 159–167. [Google Scholar] [CrossRef]
  42. Shahlaei, M.; Madadkar-Sobhani, A.; Fassihi, A.; Saghaie, L.; Arkan, E. QSAR study of some CCR5 antagonists as anti-HIV agents using radial basis function neural network and general regression neural network on the basis of principal components. Med. Chem. Res. 2012, 21, 3246–3262. [Google Scholar] [CrossRef]
  43. Atkinson, A.C. Plots, transformations, and regression. An introduction to graphical methods of diagnostic regression analysis. J. R. Stat. Soc. 1985, 152, 1927–1934. [Google Scholar]
  44. Gadaleta, D.; Mangiatordi, G.F.; Catto, M.; Carotti, A.; Nicolotti, O. Applicability domain for QSAR models: Where theory meets reality. Int. J. QSPR 2016, 1, 45–63. [Google Scholar] [CrossRef]
  45. Luan, F.; Tang, L.L.; Zhang, L.H.; Zhang, S.; Monteagudo, M.C.; Cordeiro, M.N.D.S. A further development of the QNAR model to predict the cellular uptake of nanoparticles by pancreatic cancer cells. Food Chem. Toxicol. 2018, 112, 571–580. [Google Scholar] [CrossRef] [PubMed]
  46. Katritzky, A.R.; Lobanov, V.S.; Karelson, M. Comprehensive Descriptors for Structural and Statistical Analysis; Reference Manual, Version 2.0; University of Florida: Gainsville, FL, USA, 1994. [Google Scholar]
  47. Sannigrahi, A.B. AB initio molecular orbital calculations of bond index and valency. Adv. Quantum Chem. 1992, 23, 301–351. [Google Scholar]
  48. Štrouf, O. Chemical Pattern Recognition; Research Studies Press: Baldock, UK, 1986; Volume 11. [Google Scholar]
Sample Availability: Samples of the compounds are available from the authors.
Figure 1. Plot of the predicted versus experimental −log IGC50 including the training and the test set compounds of Group 1 by MLR model (a) and by RBFNN model (b).
Figure 2. Plot of the predicted versus experimental −log IGC50 including the training and the test set compounds of Group 2 by MLR model (a) and by RBFNN model (b).
Figure 3. Plot of the predicted versus experimental −log IGC50 including the training and the test set compounds of Group 3 by MLR model (a) and by RBFNN model (b).
Figure 4. The Williams plot for the training and test set compounds of Group 1 by the MLR model.
Figure 5. The Williams plot for the training and test set compounds of Group 2 by the MLR model.
Figure 6. The Williams plot for the training and test set compounds of Group 3 by the MLR model.
Table 1. The CAS number, name, experimental −log IGC50 values, predicted −log IGC50 values and their corresponding residual for the three groups of compounds.

Group 1. Compounds with the functional group −NO2.

| No. | CAS | Name | Experimental −log IGC50 | Predicted (MLR) | Residual (MLR) | Predicted (RBFNN) | Residual (RBFNN) | Set |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 * | 619-25-0 | 3-Nitrobenzyl alcohol | −0.22 | 0.58 | 0.80 | 0.01 | 0.23 | T |
| 2 | 91-23-6 | 2-Nitroanisole | −0.07 | 0.22 | 0.29 | 0.00 | 0.07 | A |
| 3 | 99-09-2 | 3-Nitroaniline | 0.03 | 0.46 | 0.43 | 0.34 | 0.31 | B |
| 4 | 88-74-4 | 2-Nitroaniline | 0.08 | 0.43 | 0.35 | 0.35 | 0.27 | C |
| 5 | 99-61-6 | 3-Nitrobenzaldehyde | 0.11 | 0.27 | 0.16 | 0.16 | 0.05 | D |
| 6 * | 98-95-3 | Nitrobenzene | 0.14 | 0.31 | 0.17 | −0.10 | −0.24 | T |
| 7 | 552-89-6 | 2-Nitrobenzaldehyde | 0.17 | 0.25 | 0.08 | 0.19 | 0.02 | A |
| 8 | 555-16-8 | 4-Nitrobenzaldehyde | 0.2 | 0.38 | 0.18 | 0.16 | −0.04 | B |
| 9 | 88-72-2 | 2-Nitrotoluene | 0.26 | 0.48 | 0.22 | 0.32 | 0.06 | C |
| 10 | 704-13-2 | 3-Hydroxy-4-nitrobenzaldehyde | 0.27 | 0.57 | 0.30 | 0.39 | 0.12 | D |
| 11 * | 121-89-1 | 3′-Nitroacetophenone | 0.32 | 0.47 | 0.15 | −0.02 | −0.34 | T |
| 12 | 42454-06-8 | 5-Hydroxy-2-nitrobenzaldehyde | 0.33 | 0.54 | 0.21 | 0.42 | 0.09 | A |
| 13 | 89-62-3 | 4-Methyl-2-nitroaniline | 0.37 | 0.62 | 0.25 | 0.55 | 0.18 | B |
| 14 | 619-50-1 | Methyl-4-nitrobenzoate | 0.39 | 0.70 | 0.31 | 0.46 | 0.07 | C |
| 15 | 99-08-1 | 3-Nitrotoluene | 0.42 | 0.53 | 0.11 | 0.43 | 0.01 | D |
| 16 * | 5292-45-5 | Dimethyl nitroterephthalate | 0.43 | 1.51 | 1.08 | 0.09 | −0.34 | T |
| 17 | 619-24-9 | 3-Nitrobenzonitrile | 0.45 | 0.70 | 0.25 | 0.67 | 0.22 | A |
| 18 | 554-84-7 | 3-Nitrophenol | 0.51 | 0.58 | 0.07 | 0.49 | −0.02 | B |
| 19 | 83-41-0 | 1,2-Dimethyl-3-nitrobenzene | 0.56 | 0.67 | 0.11 | 0.66 | 0.10 | C |
| 20 | 119-33-5 | 4-Methyl-2-nitrophenol | 0.57 | 0.63 | 0.06 | 0.62 | 0.05 | D |
| 21 * | 99-51-4 | 1,2-Dimethyl-4-nitrobenzene | 0.59 | 0.74 | 0.15 | 0.31 | −0.28 | T |
| 22 | 700-38-9 | 5-Methyl-2-nitrophenol | 0.59 | 0.78 | 0.19 | 0.76 | 0.17 | A |
| 23 | 4920-77-8 | 3-Methyl-2-nitrophenol | 0.61 | 0.64 | 0.03 | 0.51 | −0.10 | B |
| 24 | 3011-34-5 | 4-Hydroxy-3-nitrobenzaldehyde | 0.61 | 0.47 | −0.14 | 0.45 | −0.16 | C |
| 25 | 99-99-0 | 4-Nitrotoluene | 0.65 | 0.54 | −0.11 | 0.52 | −0.13 | D |
| 26 * | 5428-54-6 | 2-Methyl-5-nitrophenol | 0.66 | 0.84 | 0.18 | 0.60 | −0.06 | T |
| 27 | 601-89-8 | 2-Nitroresorcinol | 0.66 | 0.63 | −0.03 | 0.55 | −0.11 | A |
| 28 | 88-75-5 | 2-Nitrophenol | 0.67 | 0.43 | −0.24 | 0.32 | −0.35 | B |
| 29 | 99-77-4 | Ethyl-4-nitrobenzoate | 0.7 | 0.76 | 0.06 | 0.67 | −0.03 | C |
| 30 | 555-03-3 | 3-Nitroanisole | 0.71 | 0.66 | −0.05 | 0.48 | −0.23 | D |
| 31 * | 97-02-9 | 2,4-Dinitroaniline | 0.72 | 1.33 | 0.61 | 0.97 | 0.25 | T |
| 32 | 616-86-4 | 4-Ethoxy-2-nitroaniline | 0.76 | 1.09 | 0.33 | 0.84 | 0.08 | A |
| 33 | 99-65-0 | 1,3-Dinitrobenzene | 0.76 | 1.07 | 0.31 | 0.94 | 0.18 | B |
| 34 | 100-29-8 | 4-Nitrophenetole | 0.83 | 0.98 | 0.15 | 0.84 | 0.01 | C |
| 35 | 573-56-8 | 2,6-Dinitrophenol | 0.83 | 1.25 | 0.42 | 1.30 | 0.47 | D |
| 36 * | 606-22-4 | 2,6-Dinitroaniline | 0.84 | 1.30 | 0.46 | 0.80 | −0.04 | T |
| 37 | 603-71-4 | 1,3,5-Trimethyl-2-nitrobenzene | 0.86 | 0.92 | 0.06 | 0.73 | −0.13 | A |
| 38 | 121-14-2 | 2,4-Dinitrotoluene | 0.87 | 1.30 | 0.43 | 1.07 | 0.20 | B |
| 39 | 329-71-5 | 2,5-Dinitrophenol | 1.04 | 1.50 | 0.46 | 0.94 | −0.10 | C |
| 40 | 528-29-0 | 1,2-Dinitrobenzene | 1.25 | 1.06 | −0.19 | 0.89 | −0.36 | D |
| 41 * | 100-25-4 | 1,4-Dinitrobenzene | 1.3 | 1.23 | −0.07 | 1.33 | 0.03 | T |
| 42 | 86-00-0 | 2-Nitrobiphenyl | 1.3 | 1.20 | −0.10 | 1.03 | −0.27 | A |
| 43 | 620-88-2 | 4-Nitrophenyl phenyl ether | 1.58 | 1.71 | 0.13 | 1.59 | 0.01 | B |
| 44 | 69212-31-3 | 2-(Benzylthio)-3-nitropyridine | 1.72 | 1.98 | 0.26 | 1.71 | −0.01 | C |
| 45 | 534-52-1 | 4,6-Dinitro-2-methylphenol | 1.73 | 1.61 | −0.12 | 1.70 | −0.03 | D |
| 46 * | 4097-49-8 | 4-(tert)-Butyl-2,6-dinitrophenol | 1.8 | 2.00 | 0.20 | 1.71 | −0.09 | T |
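
Group 2. Compounds with the functional group −X.

| No. | CAS | Name | Experimental −log IGC50 | Predicted (MLR) | Residual (MLR) | Predicted (RBFNN) | Residual (RBFNN) | Set |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 * | 348-54-9 | 2-Fluoroaniline | −0.37 | −0.20 | 0.17 | −0.17 | 0.20 | T |
| 2 | 95-51-2 | 2-Chloroaniline | −0.17 | 0.03 | 0.20 | −0.04 | 0.13 | A |
| 3 | 108-90-7 | Chlorobenzene | −0.13 | 0.11 | 0.24 | 0.05 | 0.18 | B |
| 4 | 372-19-0 | 3-Fluoroaniline | −0.1 | −0.27 | −0.17 | −0.23 | −0.13 | C |
| 5 | 371-41-5 | 4-Fluorophenol | 0.02 | −0.12 | −0.14 | −0.14 | −0.16 | D |
| 6 * | 106-47-8 | 4-Chloroaniline | 0.05 | −0.01 | −0.06 | 0.06 | 0.01 | T |
| 7 | 100-44-7 | Benzyl chloride | 0.06 | 0.31 | 0.25 | 0.24 | 0.18 | A |
| 8 | 108-86-1 | Bromobenzene | 0.08 | 0.18 | 0.10 | 0.15 | 0.07 | B |
| 9 | 18982-54-2 | 2-Bromobenzyl alcohol | 0.1 | 0.32 | 0.22 | 0.24 | 0.14 | C |
| 10 | 95-88-5 | 4-Chlororesorcinol | 0.13 | 0.50 | 0.37 | 0.48 | 0.35 | D |
| 11 * | 156-41-2 | 2-(4-Chlorophenyl)-ethylamine | 0.14 | 0.43 | 0.29 | 0.46 | 0.32 | T |
| 12 | 873-63-2 | 3-Chlorobenzyl alcohol | 0.15 | 0.56 | 0.41 | 0.55 | 0.40 | A |
| 13 | 104-86-9 | 4-Chlorobenzylamine | 0.16 | 0.28 | 0.12 | 0.27 | 0.11 | B |
| 14 | 615-65-6 | 2-Chloro-4-methylaniline | 0.18 | 0.46 | 0.28 | 0.41 | 0.23 | C |
| 15 | 367-12-4 | 2-Fluorophenol | 0.19 | 0.17 | −0.02 | 0.09 | −0.10 | D |
| 16 * | 108-42-9 | 3-Chloroaniline | 0.22 | −0.07 | −0.29 | 0.04 | −0.18 | T |
| 17 | 873-76-7 | 4-Chlorobenzyl alcohol | 0.25 | 0.33 | 0.08 | 0.26 | 0.01 | A |
| 18 | 1875-88-3 | 4-Chlorophenethyl alcohol | 0.32 | 0.48 | 0.16 | 0.43 | 0.11 | B |
| 19 | 95-56-7 | 2-Bromophenol | 0.33 | 0.50 | 0.17 | 0.45 | 0.12 | C |
| 20 | 95-69-2 | 4-Chloro-2-methylaniline | 0.35 | 0.45 | 0.10 | 0.39 | 0.04 | D |
| 21 * | 615-43-0 | 2-Iodoaniline | 0.35 | 0.40 | 0.05 | 0.48 | 0.13 | T |
| 22 | 591-50-4 | Iodobenzene | 0.36 | 0.30 | −0.06 | 0.32 | −0.04 | A |
| 23 | 87-60-5 | 3-Chloro-2-methylaniline | 0.38 | 0.35 | −0.03 | 0.30 | −0.08 | B |
| 24 | 95-74-9 | 3-Chloro-4-methylaniline | 0.39 | 0.39 | 0.00 | 0.38 | −0.01 | C |
| 25 | 104-88-1 | 4-Chlorobenzaldehyde | 0.4 | 0.48 | 0.08 | 0.41 | 0.01 | D |
| 26 * | 103-63-9 | (2-Bromoethyl)-benzene | 0.42 | 0.51 | 0.09 | 0.65 | 0.23 | T |
| 27 | 5922-60-1 | 2-Amino-5-chlorobenzonitrile | 0.44 | 0.28 | −0.16 | 0.22 | −0.22 | A |
| 28 | 106-38-7 | 4-Bromotoluene | 0.47 | 0.48 | 0.01 | 0.42 | −0.05 | B |
| 29 | 95-79-4 | 5-Chloro-2-methylaniline | 0.5 | 0.44 | −0.06 | 0.40 | −0.10 | C |
| 30 | 95-50-1 | 1,2-Dichlorobenzene | 0.53 | 0.53 | 0.00 | 0.49 | −0.04 | D |
| 31 * | 106-48-9 | 4-Chlorophenol | 0.54 | 0.06 | −0.48 | 0.27 | −0.27 | T |
| 32 | 615-74-7 | 2-Chloro-5-methylphenol | 0.54 | 0.59 | 0.05 | 0.59 | 0.05 | A |
| 33 | 554-00-7 | 2,4-Dichloroaniline | 0.56 | 0.49 | −0.07 | 0.44 | −0.12 | B |
| 34 | 95-82-9 | 2,5-Dichloroaniline | 0.58 | 0.36 | −0.22 | 0.29 | −0.29 | C |
| 35 | 7120-43-6 | 5-Chloro-2-hydroxybenzamide | 0.59 | 0.39 | −0.20 | 0.43 | −0.16 | D |
| 36 * | 623-12-1 | 4-Chloroanisole | 0.6 | 0.27 | −0.33 | 0.36 | −0.24 | T |
| 37 | 6627-55-0 | 2-Bromo-4-methylphenol | 0.6 | 0.96 | 0.36 | 0.84 | 0.24 | A |
| 38 | 16532-79-9 | 4-Bromophenyl acetonitrile | 0.6 | 0.66 | 0.06 | 0.63 | 0.03 | B |
| 39 | 2973-76-4 | 5-Bromovanillin | 0.62 | 0.85 | 0.23 | 0.87 | 0.25 | C |
| 40 | 626-01-7 | 3-Iodoaniline | 0.65 | 0.40 | −0.25 | 0.35 | −0.30 | D |
| 41 * | 140-53-4 | 4-Chlorobenzyl cyanide | 0.66 | 0.57 | −0.09 | 0.72 | 0.06 | T |
| 42 | 1585-07-5 | 1-Bromo-4-ethylbenzene | 0.67 | 0.92 | 0.25 | 0.94 | 0.27 | A |
| 43 | 106-37-6 | 1,4-Dibromobenzene | 0.68 | 0.61 | −0.07 | 0.58 | −0.10 | B |
| 44 | 106-41-2 | 4-Bromophenol | 0.68 | 0.48 | −0.20 | 0.42 | −0.26 | C |
| 45 | 1124-04-5 | 2-Chloro-4,5-dimethylphenol | 0.69 | 0.90 | 0.21 | 0.83 | 0.14 | D |
| 46 * | 1570-64-5 | 4-Chloro-2-methylphenol | 0.7 | 0.54 | −0.16 | 0.52 | −0.18 | T |
| 47 | 626-43-7 | 3,5-Dichloroaniline | 0.71 | 0.52 | −0.19 | 0.49 | −0.22 | A |
| 48 | 65262-96-6 | 3-Chloro-5-methoxyphenol | 0.76 | 0.54 | −0.22 | 0.49 | −0.27 | B |
| 49 | 59-50-7 | 4-Chloro-3-methylphenol | 0.8 | 0.49 | −0.31 | 0.49 | −0.31 | C |
| 50 | 2905-69-3 | Methyl-2,5-dichlorobenzoate | 0.81 | 1.13 | 0.32 | 1.13 | 0.32 | D |
| 51 * | 14548-45-9 | 4-Bromophenyl-3-pyridyl ketone | 0.82 | 1.25 | 0.43 | 1.20 | 0.38 | T |
| 52 | 540-38-5 | 4-Iodophenol | 0.85 | 0.59 | −0.26 | 0.56 | −0.29 | A |
| 53 | 108-43-0 | 3-Chlorophenol | 0.87 | 0.43 | −0.44 | 0.37 | −0.50 | B |
| 54 | 108-70-3 | 1,3,5-Trichlorobenzene | 0.87 | 1.13 | 0.26 | 1.14 | 0.27 | C |
| 55 | 120-83-2 | 2,4-Dichlorophenol | 1.04 | 0.89 | −0.15 | 0.95 | −0.09 | D |
| 56 * | 874-42-0 | 2,4-Dichlorobenzaldehyde | 1.04 | 0.97 | −0.07 | 1.08 | 0.04 | T |
| 57 | 95-75-0 | 3,4-Dichlorotoluene | 1.07 | 1.02 | −0.05 | 0.95 | −0.12 | A |
| 58 | 120-82-1 | 1,2,4-Trichlorobenzene | 1.08 | 1.10 | 0.02 | 1.16 | 0.08 | B |
| 59 | 14143-32-9 | 4-Chloro-3-ethylphenol | 1.08 | 0.70 | −0.38 | 0.74 | −0.34 | C |
| 60 | 2374-05-2 | 4-Bromo-2,6-dimethylphenol | 1.16 | 1.04 | −0.12 | 0.95 | −0.21 | D |
| 61 * | 1689-84-5 | 3,5-Dibromo-4-hydroxybenzonitrile | 1.16 | 1.54 | 0.38 | 1.29 | 0.13 | T |
| 62 | 88-04-0 | 4-Chloro-3,5-dimethylphenol | 1.2 | 0.70 | −0.50 | 0.74 | −0.46 | A |
| 63 | 90-90-4 | 4-Bromobenzophenone | 1.26 | 1.37 | 0.11 | 1.36 | 0.10 | B |
| 64 | 7530-27-0 | 4-Bromo-6-chloro-2-cresol | 1.28 | 1.37 | 0.09 | 1.34 | 0.06 | C |
| 65 | 636-30-6 | 2,4,5-Trichloroaniline | 1.3 | 1.18 | −0.12 | 1.31 | 0.01 | D |
| 66 * | 5798-75-4 | Ethyl-4-bromobenzoate | 1.33 | 1.16 | −0.17 | 1.23 | −0.10 | T |
| 67 | 13608-87-2 | 2′,3′,4′-Trichloroacetophenone | 1.34 | 1.44 | 0.10 | 1.31 | −0.03 | A |
| 68 | 615-58-7 | 2,4-Dibromophenol | 1.4 | 1.07 | −0.33 | 1.15 | −0.25 | B |
| 69 | 88-06-2 | 2,4,6-Trichlorophenol | 1.41 | 1.62 | 0.21 | 1.47 | 0.06 | C |
| 70 | 134-85-0 | 4-Chlorobenzophenone | 1.5 | 1.30 | −0.20 | 1.34 | −0.16 | D |
| 71 * | 1016-78-0 | 3-Chlorobenzophenone | 1.55 | 1.27 | −0.28 | 1.20 | −0.35 | T |
| 72 | 90-60-8 | 3,5-Dichlorosalicylaldehyde | 1.55 | 1.49 | −0.06 | 1.52 | −0.03 | A |
| 73 | 591-35-5 | 3,5-Dichlorophenol | 1.56 | 1.28 | −0.28 | 1.22 | −0.34 | B |
| 74 | 90-59-5 | 3,5-Dibromosalicylaldehyde | 1.65 | 1.55 | −0.10 | 1.61 | −0.04 | C |
| 75 | 456-47-3 | 3-Fluorobenzyl alcohol | −0.39 | −0.05 | 0.34 | −0.09 | 0.30 | D |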
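
Group 3. Compounds with both −NO2 and −X.

| No. | CAS | Name | Experimental −log IGC50 | Predicted (MLR) | Residual (MLR) | Predicted (RBFNN) | Residual (RBFNN) | Set |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 * | 89-59-8 | 4-Chloro-2-nitrotoluene | 0.43 | 1.08 | 0.65 | 0.75 | 0.32 | T |
| 2 | 585-79-5 | 1-Bromo-3-nitrobenzene | 0.53 | 0.48 | −0.05 | 0.54 | 0.01 | A |
| 3 | 7149-70-4 | 2-Bromo-5-nitrotoluene | 0.68 | 0.99 | 0.31 | 1.09 | 0.41 | B |
| 4 | 100-14-1 | 4-Nitrobenzyl chloride | 0.68 | 0.70 | 0.02 | 0.71 | 0.03 | C |
| 5 | 610-78-6 | 4-Chloro-3-nitrophenol | 0.73 | 1.08 | 0.35 | 1.03 | 0.30 | D |
| 6 * | 7147-89-9 | 4-Chloro-6-nitro-3-cresol | 0.73 | 1.20 | 0.47 | 1.12 | 0.39 | T |
| 7 | 364-74-9 | 2,5-Difluoronitrobenzene | 0.75 | 0.66 | −0.09 | 0.85 | 0.10 | A |
| 8 | 6361-21-3 | 2-Chloro-5-nitrobenzaldehyde | 0.75 | 0.86 | 0.11 | 0.75 | 0.00 | B |
| 9 | 83-42-1 | 2-Chloro-6-nitrotoluene | 0.75 | 0.55 | −0.20 | 0.53 | −0.22 | C |
| 10 | 88-73-3 | 1-Chloro-2-nitrobenzene | 0.75 | 0.69 | −0.06 | 0.72 | −0.03 | D |
| 11 * | 121-73-3 | 1-Chloro-3-nitrobenzene | 0.8 | 0.69 | −0.11 | 0.83 | 0.03 | T |
| 12 | 87-65-0 | 2,6-Dichlorophenol | 0.82 | 0.83 | 0.01 | 0.81 | −0.01 | A |
| 13 | 121-87-9 | 2-Chloro-4-nitroaniline | 0.82 | 0.79 | −0.03 | 0.77 | −0.05 | B |
| 14 | 577-19-5 | 1-Bromo-2-nitrobenzene | 0.99 | 0.71 | −0.28 | 0.80 | −0.19 | C |
| 15 | 2973-19-5 | 2-Chloromethyl-4-nitrophenol | 1.03 | 1.03 | 0.00 | 1.00 | −0.03 | D |
| 16 * | 78056-39-0 | 4,5-Difluoro-2-nitroaniline | 1.06 | 1.13 | 0.07 | 0.86 | −0.20 | T |
| 17 | 350-30-1 | 3-Chloro-4-fluoronitrobenzene | 1.07 | 0.86 | −0.21 | 0.93 | −0.14 | A |
| 18 | 42087-80-9 | Methyl-4-chloro-2-nitrobenzoate | 1.09 | 1.25 | 0.16 | 1.30 | 0.21 | B |
| 19 | 611-06-3 | 2,4-Dichloronitrobenzene | 1.12 | 1.23 | 0.11 | 1.27 | 0.15 | C |
| 20 | 51-28-5 | 2,4-Dinitrophenol | 1.13 | 1.17 | 0.04 | 1.12 | −0.01 | D |
| 21 * | 3209-22-1 | 2,3-Dichloronitrobenzene | 1.13 | 0.83 | −0.30 | 1.04 | −0.09 | T |
| 22 | 3819-88-3 | 1-Fluoro-3-iodo-5-nitrobenzene | 1.16 | 0.90 | −0.26 | 0.97 | −0.19 | A |
| 23 | 618-62-2 | 3,5-Dichloronitrobenzene | 1.16 | 0.96 | −0.20 | 1.08 | −0.08 | B |
| 24 | 89-61-2 | 2,5-Dichloronitrobenzene | 1.18 | 1.19 | 0.01 | 1.24 | 0.06 | C |
| 25 | 99-54-7 | 3,4-Dichloronitrobenzene | 1.24 | 0.87 | −0.37 | 0.92 | −0.32 | D |
| 26 * | 2683-43-4 | 2,4-Dichloro-6-nitroaniline | 1.26 | 1.28 | 0.02 | 1.14 | −0.12 | T |
| 27 | 3460-18-2 | 2,5-Dibromonitrobenzene | 1.27 | 1.27 | 0.00 | 1.31 | 0.04 | A |
| 28 | 827-23-6 | 2,4-Dibromo-6-nitroaniline | 1.37 | 1.40 | 0.03 | 1.45 | 0.08 | B |
| 29 | 6641-64-1 | 4,5-Dichloro-2-nitroaniline | 1.62 | 1.31 | −0.31 | 1.38 | −0.24 | C |
| 30 | 609-89-2 | 2,4-Dichloro-6-nitrophenol | 1.63 | 1.46 | −0.17 | 1.50 | −0.13 | D |
| 31 * | 305-85-1 | 2,6-Diiodo-4-nitrophenol | 1.66 | 1.40 | −0.26 | 1.47 | −0.19 | T |
| 32 | 3531-19-9 | 6-Chloro-2,4-dinitroaniline | 1.71 | 1.53 | −0.18 | 1.64 | −0.07 | A |
| 33 | 1817-73-8 | 2-Bromo-4,6-dinitroaniline | 1.75 | 1.63 | −0.12 | 1.73 | −0.02 | B |
| 34 | 97-00-7 | 1-Chloro-2,4-dinitrobenzene | 1.81 | 1.82 | 0.01 | 1.88 | 0.07 | C |
| 35 | 709-49-9 | 2,4-Dinitro-1-iodobenzene | 2.12 | 1.78 | −0.34 | 2.02 | −0.10 | D |
| 36 * | 70-34-8 | 2,4-Dinitro-1-fluorobenzene | 2.16 | 1.67 | −0.49 | 0.67 | −1.49 | T |
| 37 | 350-46-9 | 1-Fluoro-4-nitrobenzene | 0.1 | 0.21 | 0.11 | 0.32 | 0.22 | A |
| 38 | 1493-27-2 | 1-Fluoro-2-nitrobenzene | 0.23 | −0.07 | −0.30 | 0.23 | 0.00 | B |
| 39 | 100-00-5 | 1-Chloro-4-nitrobenzene | 0.33 | 0.46 | 0.13 | 0.51 | 0.18 | C |

* denotes a compound in the test set.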
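
The residual columns of Table 1 follow the convention residual = predicted − experimental; for compound 1 of Group 1, 0.58 − (−0.22) = 0.80. A minimal sketch of that arithmetic is given below. The function names are illustrative, and dividing by the number of compounds n in the RMS is an assumption, since the table does not state which definition was used.

```python
import math

def residual(predicted, experimental):
    """Residual as tabulated in Table 1: predicted minus experimental."""
    return predicted - experimental

def rms(residuals):
    """Root mean square of the residuals (assumed divisor: n)."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# First three Group 1 rows of Table 1: (experimental, MLR predicted)
rows = [(-0.22, 0.58), (-0.07, 0.22), (0.03, 0.46)]
mlr_residuals = [round(residual(pred, exp), 2) for exp, pred in rows]

print(mlr_residuals)
print(round(rms(mlr_residuals), 3))
```

The same two functions can be applied to any subset of rows, for example only those marked T, to summarize the error on the test set.
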
Table 2. Descriptors, Coefficients, Standard Errors, and t-Test Values for the Best MLR Model of Group 1.

| No. | Coefficient | Standard Error | t-Test | Descriptor |
| --- | --- | --- | --- | --- |
| 0 | 73.123 | 20.547 | 3.559 | Intercept |
| 1 | 0.002 | 0.000 | 10.075 | Gravitation index (all bonds) (G2) |
| 2 | −1.016 | 0.188 | −5.394 | Max bond order of a O atom (Po) |
| 3 | −1.082 | 0.510 | −3.568 | Max n–n repulsion for a C–H bond (Enn(C–H)) |

N = 36, R2 = 0.829, LOOq2 = 0.813, F = 51.697, RMS = 0.192
Table 3. Descriptors, Coefficients, Standard Errors, and t-Test Values for the Best MLR Model of Group 2.

| No. | Coefficient | Standard Error | t-Test | Descriptor |
| --- | --- | --- | --- | --- |
| 0 | −18.030 | 3.703 | −4.869 | Intercept |
| 1 | 0.438 | 0.045 | 9.808 | LogP |
| 2 | −6.605 | 0.605 | −10.918 | FNSA-2 Fractional PNSA (PNSA-2/TMSA) [Zefirov’s PC] (PNSA-2/TMSA) |
| 3 | 17.066 | 3.794 | 4.498 | Max SIGMA–SIGMA bond order (PSIGMA) |

N = 60, R2 = 0.803, LOOq2 = 0.792, F = 76.016, RMS = 0.222
Table 4. Descriptors, Coefficients, Standard Errors, and t-Test Values for the Best MLR Model of Group 3.

| No. | Coefficient | Standard Error | t-Test | Descriptor |
| --- | --- | --- | --- | --- |
| 0 | −16.640 | 3.092 | −5.382 | Intercept |
| 1 | −2.141 | 0.322 | −6.650 | Principal moment of inertia C (Ic) |
| 2 | 0.151 | 0.024 | 6.292 | Max e–e repulsion for a C–C bond (Enn(C–C)) |
| 3 | −51.290 | 7.493 | −6.845 | RPCG Relative positive charge (QMPOS/QTPLUS) [Quantum-Chemical PC] (QMPOS/QTPLUS) |

N = 31, R2 = 0.852, LOOq2 = 0.835, F = 51.678, RMS = 0.193
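
As a consistency check on the summary lines of Tables 2–4, the F value of an ordinary least-squares model with p descriptors fitted to n compounds can be recovered from R2 through the standard relation F = (R2/p)/((1 − R2)/(n − p − 1)). The sketch below applies it to the three reported models. The function name is illustrative, and small differences from the reported F values are expected because R2 is quoted to only three decimals.

```python
def f_from_r2(r2, n, p):
    """F statistic of an OLS fit with p descriptors and n observations."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# (R2, n, number of descriptors, reported F) from the summary lines of Tables 2-4
reported = {
    "Group 1": (0.829, 36, 3, 51.697),
    "Group 2": (0.803, 60, 3, 76.016),
    "Group 3": (0.852, 31, 3, 51.678),
}

for group, (r2, n, p, f_reported) in reported.items():
    print(group, round(f_from_r2(r2, n, p), 2), f_reported)
```

For all three groups the recomputed value agrees with the reported F to within about 0.2, which suggests that the tabulated R2, N, and F are mutually consistent.
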
Table 5. The statistical results of the external test set for the three models of each group.

| Statistic | Group 1 MLR | Group 1 RBFNN | Group 2 MLR | Group 2 RBFNN | Group 3 MLR | Group 3 RBFNN |
| --- | --- | --- | --- | --- | --- | --- |
| $R^2$ | 0.83 | 0.84 | 0.80 | 0.82 | 0.85 | 0.89 |
| $q^2_{\mathrm{ext}}$ | 0.92 | 0.88 | 0.79 | 0.81 | 0.73 | 0.63 |
| $R_0^2$ | 0.81 | 0.84 | 0.79 | 0.82 | 0.84 | 0.88 |
| $(R^2 - R_0^2)/R^2$ | 0.024 | 0.00 | 0.012 | 0.00 | 0.012 | 0.011 |
| $k$ | 0.88 | 0.86 | 0.80 | 0.83 | 0.85 | 0.87 |
| $k'$ | 1.11 | 0.97 | 0.93 | 0.91 | 0.93 | 0.98 |
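
The derived row of Table 5 can be reproduced from the R2 and R0^2 rows as (R2 − R0^2)/R2. The sketch below recomputes it for the MLR columns and then applies a set of acceptance thresholds. Those thresholds (R2 > 0.6, q2ext > 0.5, a relative gap below 0.1, and at least one of the two slopes between 0.85 and 1.15) are the values commonly attributed to Golbraikh and Tropsha and are an assumption here, because this part of the article does not restate them. All function and variable names are illustrative.

```python
def relative_gap(r2, r0_2):
    """(R^2 - R0^2) / R^2, the derived row of Table 5."""
    return (r2 - r0_2) / r2

def passes_external_checks(r2, q2_ext, r0_2, k, k_prime):
    """Apply the assumed acceptance thresholds to one model."""
    slope_ok = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    return (r2 > 0.6 and q2_ext > 0.5
            and relative_gap(r2, r0_2) < 0.1 and slope_ok)

# MLR columns of Table 5; k_prime is the second slope row of the table
mlr = {
    "Group 1": dict(r2=0.83, q2_ext=0.92, r0_2=0.81, k=0.88, k_prime=1.11),
    "Group 2": dict(r2=0.80, q2_ext=0.79, r0_2=0.79, k=0.80, k_prime=0.93),
    "Group 3": dict(r2=0.85, q2_ext=0.73, r0_2=0.84, k=0.85, k_prime=0.93),
}

for group, stats in mlr.items():
    gap = relative_gap(stats["r2"], stats["r0_2"])
    print(group, round(gap, 4), passes_external_checks(**stats))
```

Because the table quotes R2 and R0^2 to two decimals, the recomputed gap can differ from the tabulated one in the third decimal place.
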
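
Table 6. Validation of the MLR models.

| Training Set | R2 (training) | F (training) | RMS (training) | Test Set | R2 (test) | F (test) | RMS (test) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Group 1 | | | | | | | |
| A + B + C + D | 0.829 | 51.697 | 0.192 | T | 0.917 | 13.820 | 0.222 |
| A + B + C + T | 0.735 | 30.457 | 0.261 | D | 0.874 | 48.592 | 0.153 |
| B + C + D + T | 0.742 | 31.565 | 0.255 | A | 0.795 | 27.199 | 0.217 |
| A + C + D + T | 0.723 | 28.775 | 0.259 | B | 0.889 | 55.784 | 0.173 |
| A + B + D + T | 0.744 | 31.912 | 0.247 | C | 0.835 | 35.501 | 0.217 |
| Average | 0.755 | 34.881 | 0.243 | | 0.862 | 36.180 | 0.196 |
| Group 2 | | | | | | | |
| A + B + C + D | 0.803 | 76.016 | 0.222 | T | 0.852 | 51.678 | 0.193 |
| A + B + C + T | 0.802 | 75.661 | 0.225 | D | 0.748 | 38.559 | 0.256 |
| B + C + D + T | 0.784 | 67.805 | 0.235 | A | 0.857 | 78.133 | 0.193 |
| A + C + D + T | 0.786 | 68.459 | 0.233 | B | 0.818 | 58.416 | 0.221 |
| A + B + D + T | 0.780 | 66.184 | 0.235 | C | 0.831 | 63.981 | 0.218 |
| Average | 0.791 | 70.834 | 0.230 | | 0.821 | 58.153 | 0.216 |
| Group 3 | | | | | | | |
| A + B + C + D | 0.789 | 13.720 | 0.260 | T | 0.733 | 260.404 | 0.380 |
| A + B + C + T | 0.760 | 29.502 | 0.263 | D | 0.927 | 63.850 | 0.115 |
| B + C + D + T | 0.755 | 27.787 | 0.254 | A | 0.899 | 53.612 | 0.170 |
| A + C + D + T | 0.762 | 28.869 | 0.250 | B | 0.917 | 66.385 | 0.159 |
| A + B + D + T | 0.754 | 27.573 | 0.248 | C | 0.846 | 32.957 | 0.237 |
| Average | 0.764 | 25.490 | 0.255 | | 0.864 | 95.442 | 0.212 |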
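
Table 6 rotates the five subsets (A, B, C, D, and T) so that each serves once as the test set while the remaining four form the training set; the "Average" rows are arithmetic means over the five splits. A minimal sketch of the rotation and of the averaging follows, using the Group 1 training R2 values from the table. The names are illustrative, and the model fitting itself is not reproduced.

```python
subsets = ["A", "B", "C", "D", "T"]

def rotations(names):
    """Return (training subsets, held-out subset) for every rotation."""
    return [([s for s in names if s != held_out], held_out)
            for held_out in names]

def mean(values):
    """Arithmetic mean, as used for the 'Average' rows."""
    return sum(values) / len(values)

for training, test in rotations(subsets):
    print(" + ".join(training), "->", test)

# Group 1 training-set R2 values from Table 6, one per split
group1_training_r2 = [0.829, 0.735, 0.742, 0.723, 0.744]
group1_mean_r2 = mean(group1_training_r2)
print(round(group1_mean_r2, 3))
```
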

Share and Cite

MDPI and ACS Style

Luan, F.; Wang, T.; Tang, L.; Zhang, S.; Cordeiro, M.N.D.S. Estimation of the Toxicity of Different Substituted Aromatic Compounds to the Aquatic Ciliate Tetrahymena pyriformis by QSAR Approach. Molecules 2018, 23, 1002. https://doi.org/10.3390/molecules23051002
