1. Introduction
The benzimidazoles are a large chemical family used as antimicrobial agents against a wide spectrum of microorganisms [
1–
9]. Because of its synthetic utility and broad range of pharmacological effects, the benzimidazole nucleus is an important heterocyclic ring, and interest in the chemistry, synthesis and microbiology of this pharmacophore continues to be fuelled by its antifungal [
10], antitubercular [
11], antioxidant [
12,
13], and antiallergic [
14,
15] properties. Other reports have revealed that these molecules are also present in a variety of antiparasitic [
16,
17] and herbicidal agents [
18]. Albendazole, fenbendazole and their sulphoxide derivatives are methylcarbamate benzimidazoles with broad-spectrum anthelmintic activity, widely used in human and veterinary medicine [
19]. They are used against several systemic parasitoses, including nematodoses, hydatidosis, taeniasis and others [
20]. They are also used to treat microsporidial and cryptosporidial infections, which can cause lethal diarrhea in patients treated with immunosuppressive drugs or infected with HIV [
21,
22].
Differently substituted benzimidazolyl quinolinyl mercaptotriazoles are remarkably effective compounds with respect to both their virus inhibitory activity and their favourable antibacterial activity [
23]. In recent years, benzimidazole derivatives have attracted particular interest due to their antiviral activity against hepatitis C virus (HCV) [
24,
25].
Although a variety of benzimidazole derivatives are known, the development of new and convenient strategies to synthesize new biologically active benzimidazoles is of considerable interest. Quantitative structure-activity relationship (QSAR) studies are useful tools in the rational search for bioactive molecules. The main strength of the QSAR method is that it makes it possible to estimate the characteristics of new chemical compounds without the need to synthesize and test them. The analysis attempts to relate structural descriptors of compounds to their physicochemical properties and biological activities, and it is widely used for the prediction of physicochemical properties in the chemical, pharmaceutical, and environmental fields. The method comprises data collection, molecular descriptor selection, correlation model development, and finally model evaluation. QSAR studies have predictive ability and simultaneously provide deeper insight into the mechanism of drug-receptor interactions [
26,
27].
In view of the above and in continuation of our studies on the inhibitory activities of benzimidazole derivatives, as well as on correlation of molecular properties with activity [
4–
8,
28–
35], the objective of this investigation was to study the usefulness of QSAR in the prediction of the antibacterial activity of benzimidazole derivatives against
Pseudomonas aeruginosa. Multiple linear regression (MLR) models were developed as mathematical equations that relate chemical structure to inhibitory activity.
2. Results and Discussion
In the first step of the present study, differently substituted benzimidazoles (
Table 1) were evaluated for
in vitro antibacterial activity against Gram-negative
Pseudomonas aeruginosa. The inhibitory effects of compounds
1 –
14, expressed as minimum inhibitory concentration (MIC) values, are summarized in
Table 2.
The screening results reveal that all the compounds exhibited appreciable
in vitro activity against the tested strain. In the second step, we focused our efforts on developing the QSAR models of compounds
1 –
14 as antibacterial agents. A set of benzimidazoles was used for MLR model generation. The reference drugs were not included in model generation as they belong to a different structural series. Inhibitory activity data determined as μg/mL were first transformed to the negative logarithms of molar MICs (log1/
cMIC;
Table 2), which were used as the dependent variable in the QSAR study. Different physicochemical, steric, electronic, and structural molecular descriptors were used as independent variables and were correlated with antibacterial activity.
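The unit conversion described above can be sketched as follows. This is a minimal illustration, assuming that cMIC is the MIC expressed in mol/L; the function name and the example compound (MIC of 12.5 μg/mL, molar mass of 250 g/mol) are hypothetical and are not taken from Table 2.

```python
import math


def log_inv_cmic(mic_ug_per_ml: float, molar_mass_g_per_mol: float) -> float:
    """Return log10(1/cMIC), where cMIC is the MIC expressed in mol/L."""
    if mic_ug_per_ml <= 0 or molar_mass_g_per_mol <= 0:
        raise ValueError("MIC and molar mass must be positive")
    # 1 ug/mL equals 1 mg/L, i.e. 1e-3 g/L
    c_mol_per_l = mic_ug_per_ml * 1e-3 / molar_mass_g_per_mol
    return -math.log10(c_mol_per_l)


# Hypothetical compound, not one of compounds 1-14
value = log_inv_cmic(12.5, 250.0)
print(round(value, 3))
```

Working on the molar scale makes compounds of different molar mass comparable, and the logarithm turns the serial dilution steps of an MIC assay into roughly evenly spaced values.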
Developing a QSAR model requires a diverse set of data, and therefore a large number of descriptors has to be considered. Descriptors are numerical values that encode different structural features of the molecules. Selecting a set of appropriate descriptors from a large number of them requires a method that is able to discriminate between the parameters. A Pearson correlation matrix was computed for all descriptors using NCSS Statistical Software. Analysis of the matrix identified nine descriptors for the development of the MLR model. The values of the descriptors selected for the MLR model are presented in
Table 3. Linear models were then formed by a stepwise addition of terms. A deletion process was then employed, whereby each variable in the model was held out in turn and models were generated using the remaining parameters. Each descriptor was chosen as input for the statistical software package, and the stepwise addition method implemented in the software was then used to choose the descriptors contributing to the antibacterial activity of benzimidazole derivatives.
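One common way to use a Pearson correlation matrix for descriptor pre-selection is sketched below. It is not the NCSS procedure used in this work: the greedy strategy, the 0.9 intercorrelation threshold, the function names and the toy descriptor values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(descriptors, activity, threshold=0.9):
    """Greedy filter: rank descriptors by |r| with activity, then keep a
    descriptor only if its |r| with every already kept one is below threshold."""
    ranked = sorted(
        descriptors,
        key=lambda name: abs(pearson(descriptors[name], activity)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(descriptors[name], descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): d2 is nearly collinear with d1
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
activity = [1.1, 2.0, 2.9, 4.2, 5.0]
kept = filter_descriptors(descriptors, activity)
print(kept)
```

Removing strongly intercorrelated descriptors before the stepwise step keeps the regression coefficients interpretable, since two nearly collinear descriptors carry the same information.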
The specifications for the best-selected MLR models are shown in
Table 4. The monoparametric model indicated the importance of molar weight (
MW) in contribution to inhibitory activity (model 1,
Table 4). Addition of total energy (
TE) as an additional parameter to
MW significantly increased the correlation coefficient from 0.7910 to 0.8587 (model 2,
Table 4). Similarly, the addition of a third parameter also increased the correlation coefficient, but the MLR method can be used only when the number of molecular descriptors is relatively small (at least five to six times smaller than the total number of compounds). In this case (fourteen compounds), only two descriptors can be used to develop a good QSAR model without a high chance of spurious correlations. Accordingly, only QSAR models 1 and 2 are considered.
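The descriptor limit quoted above follows from simple integer arithmetic, sketched here. The helper name is hypothetical; the ratios of five and six compounds per descriptor are the ones stated in the text.

```python
def max_descriptors(n_compounds: int, compounds_per_descriptor: int = 5) -> int:
    """Largest descriptor count that keeps at least the given number of
    compounds per descriptor."""
    if compounds_per_descriptor <= 0:
        raise ValueError("compounds_per_descriptor must be positive")
    return n_compounds // compounds_per_descriptor


# Fourteen compounds, checked against both ratios mentioned in the text
limits = {ratio: max_descriptors(14, ratio) for ratio in (5, 6)}
print(limits)
```

Both ratios give the same limit of two descriptors for fourteen compounds, which is why a three-parameter model is not considered.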
It is well known that there are three important components in any QSAR study: development of models, validation of models and utility of developed models. Validation is a crucial aspect of any QSAR analysis [
36]. The statistical quality of the resulting models, as depicted in
Table 4, is determined by
r,
s, and
F [
37–
39]. It is noteworthy that all these equations were derived using the entire data set of compounds (
n = 14) and no outliers were identified. The
F-values presented in
Table 4 are statistically significant at the 99% level, since all the calculated
F values are higher than the tabulated values.
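The significance check can be illustrated with the usual relation between the correlation coefficient and the F statistic of a regression with p descriptors and n compounds. This is a sketch under assumptions: the values are computed only from the correlation coefficients quoted in the text (0.7910 and 0.8587), so they may differ slightly from the entries of Table 4, and the critical values are rounded figures from standard F tables at the 1% level.

```python
def f_statistic(r: float, n: int, p: int) -> float:
    """F statistic of a linear regression from its correlation coefficient r,
    with n observations and p independent variables."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Rounded critical values F(0.01; p, n - p - 1) for n = 14 (standard tables)
F_CRIT_99 = {1: 9.33, 2: 7.21}

f_model1 = f_statistic(0.7910, n=14, p=1)
f_model2 = f_statistic(0.8587, n=14, p=2)

for p, f_value in ((1, f_model1), (2, f_model2)):
    print(p, round(f_value, 2), f_value > F_CRIT_99[p])
```

A calculated F above the tabulated value means that the regression explains significantly more variance than a model containing only the intercept.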
To test the predictive power of the selected MLR models, the leave-one-out (LOO) technique was used. The developed models were validated by calculating the following statistical parameters: PRESS, SSY, SPRESS, r2 CV, and r2 adj (
Table 5). These parameters were calculated from the following equations:
where Yobs, Ycalc and Ymean are the observed, calculated and mean values; n is the number of compounds; and p is the number of independent parameters.
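The equations themselves are not reproduced in this text. The symbols defined above are consistent with the standard forms of these statistics, written out below as an assumption rather than as the authors' exact expressions. In particular, the normalisation of SPRESS varies between authors (n − p − 1 is assumed here), and Ycalc in PRESS denotes the leave-one-out prediction.

```latex
\begin{align}
\mathrm{PRESS} &= \sum_{i=1}^{n} \left(Y_{\mathrm{obs},i} - Y_{\mathrm{calc},i}\right)^{2} \\
\mathrm{SSY} &= \sum_{i=1}^{n} \left(Y_{\mathrm{obs},i} - Y_{\mathrm{mean}}\right)^{2} \\
S_{\mathrm{PRESS}} &= \sqrt{\frac{\mathrm{PRESS}}{n - p - 1}} \\
r^{2}_{\mathrm{CV}} &= 1 - \frac{\mathrm{PRESS}}{\mathrm{SSY}} \\
r^{2}_{\mathrm{adj}} &= 1 - \left(1 - r^{2}\right)\frac{n - 1}{n - p - 1}
\end{align}
```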
PRESS is an acronym for prediction sum of squares. It is used to validate a regression model with regard to predictability. To calculate PRESS, each observation is individually omitted. The remaining n – 1 observations are used to calculate a regression and estimate the value of the omitted observation. This is done n times, once for each observation. The difference between the actual Y value, yobs, and the predicted Y, ycalc, is called the prediction error. The sum of the squared prediction errors is the PRESS value. The smaller PRESS is, the better the predictability of the model. A PRESS value smaller than SSY indicates that the model predicts better than chance and can be considered statistically significant. SSY is the sum of squares associated with the total variation of the dependent variable, y, about its mean.
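The leave-one-out procedure described above can be sketched for the simplest case of a one-descriptor model. The function names and the toy data are hypothetical; a two-descriptor model would follow the same loop with a multiple regression in place of the straight-line fit.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_press(x, y):
    """Omit each observation in turn, refit on the remaining n - 1 points,
    and sum the squared prediction errors for the omitted points."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return press


def ssy(y):
    """Sum of squared deviations of y from its mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


# Toy data (hypothetical descriptor and activity values)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0]
press_value = loo_press(x, y)
ssy_value = ssy(y)
print(press_value < ssy_value)
```

Because each prediction is made by a model that never saw the omitted compound, PRESS is always at least as large as the ordinary residual sum of squares of the full fit.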
The PRESS value above can be used to compute an r2 CV statistic, called r2 cross validated, which reflects the prediction ability of the model. This is a good way to validate the prediction of a regression model without selecting another sample or splitting the data. It is quite possible to have a high r2 and a very low r2 CV. When this occurs, it implies that the fitted model is data dependent. This r2 CV ranges from below zero to above one. When outside the range of zero to one, it is truncated to stay within this range. Adjusted r-squared (r2 adj) is an adjusted version of r2. The adjustment seeks to remove the distortion due to a small sample size.
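The two statistics can be sketched as small helper functions. The formulas are the standard definitions (r2 CV = 1 − PRESS/SSY, and the usual degrees-of-freedom correction for r2 adj) and are assumed here rather than taken from the source. The r2 adj values are computed from the correlation coefficients quoted in the text and may differ from Table 5 by rounding.

```python
def r2_cv(press: float, ssy: float) -> float:
    """Cross-validated r2 from PRESS and SSY, truncated to the range [0, 1]."""
    raw = 1.0 - press / ssy
    return min(1.0, max(0.0, raw))


def r2_adj(r2: float, n: int, p: int) -> float:
    """Adjusted r2 for n observations and p independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Models 1 and 2: r = 0.7910 (p = 1) and r = 0.8587 (p = 2), n = 14
adj_model1 = r2_adj(0.7910 ** 2, n=14, p=1)
adj_model2 = r2_adj(0.8587 ** 2, n=14, p=2)
print(round(adj_model1, 4), round(adj_model2, 4))

# A PRESS larger than SSY would give a negative raw value, truncated to zero
print(r2_cv(2.0, 1.0))
```

The adjustment penalises each additional descriptor, so r2 adj rises only when a new descriptor improves the fit by more than would be expected by chance.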
In many cases, high values of r2 CV and r2 adj (> 0.5) are taken as proof of the high predictive ability of QSAR models, although recent reports have shown that this is not necessarily so [
40]. Although a low value of
r2 CV for the training set can indeed serve as an indicator of a low predictive ability of a model, the opposite is not necessarily true. Indeed, the high
r2 CV does not automatically imply a high predictive ability of the model. Thus, a high value of LOO
r2 CV is a
necessary condition for a model to have high predictive power, but it is not a
sufficient condition. It has been shown that the only way to estimate the true predictive power of a model is to test it on a sufficiently large collection of compounds from an external test set. The test set must include no fewer than five compounds, whose activities and structures must cover the range of activities and structures of compounds from the training set. This is necessary for obtaining reliable statistics for comparison between the observed and predicted activities of these compounds. Besides a high
r2 CV, a reliable model should also be characterized by a high correlation coefficient between the predicted and observed activities of compounds from a test set of molecules that was not used to develop the models.
To confirm the predictive power of the QSAR models, an external set of benzimidazoles was used. Five benzimidazole derivatives which were tested in our previous paper for their antibacterial activity against
Pseudomonas aeruginosa were used as the external set of molecules [
33]. In the present paper, the inhibitory activity of the following compounds was calculated: 1-(3-methoxybenzyl)-5,6-dimethylbenzimidazole (
15), 1-(3-methylbenzyl)-2-aminobenzimidazole (
16), 1-(3-chlorobenzyl)-2-aminobenzimidazole (
17), 1-(3-fluorobenzyl)-2-amino-5,6-dimethylbenzimidazole (
18) and 1-(3-methoxybenzyl)-2-amino-5,6-dimethylbenzimidazole (
19).
The inhibitory activities of the test set of molecules were calculated with models 1 and 2. These data were compared with the experimentally obtained values of antibacterial activity against the same bacterium. The data presented in
Table 6 show high agreement between the experimental and predicted inhibitory values (the residual values are small), indicating the good predictability of the established models. In line with reference [
40], without validating the QSAR models on the external test set, we could not have drawn a sound conclusion about the high predictive ability of the derived models.
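The comparison between experimental and predicted values for an external set can be sketched as follows. The five pairs of values are hypothetical placeholders, not the entries of Table 6, and the function names are illustrative.

```python
import math


def residuals(observed, predicted):
    """Observed minus predicted activity for each test compound."""
    return [o - p for o, p in zip(observed, predicted)]


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return (sop / math.sqrt(soo * spp)) ** 2


# Hypothetical log(1/cMIC) values for a five-compound external set
observed = [4.10, 4.25, 4.40, 4.55, 4.70]
predicted = [4.12, 4.24, 4.41, 4.53, 4.71]

res = residuals(observed, predicted)
r2_ext = r_squared(observed, predicted)
print(max(abs(v) for v in res) < 0.05, r2_ext > 0.99)
```

Small residuals together with a high r2 on compounds that were not used for fitting are the kind of evidence that the external validation is meant to provide.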
Figure 1 shows the linear regression plots of predicted
versus experimental values of the antibacterial activity of the external set of benzimidazoles. The plots for QSAR models 1 and 2 show a very good fit with
r2 = 0.9992 and 0.9989, respectively. This indicates that models 1 and 2 can be successfully applied to predict the antibacterial activity of this class of molecules. However, the reported QSAR models cannot be used to predict the activity of arbitrary types of molecules against
Pseudomonas aeruginosa. The applicability domain of the derived QSAR models is limited to differently substituted 1-benzyl or 1-benzoylbenzimidazole derivatives. It is also important to point out a possible source of failure of QSAR models: activity cliffs [
41]. These arise because structurally similar molecules can show significantly different biological activities. For such molecules, activities are often mispredicted, even when the overall predictivity of the model is high.
Comparing the activities of the tested molecules, it was found that the 2-aminobenzimidazole derivatives (compounds 1 – 7) were more active than the 2-methylbenzimidazoles (compounds 8 – 14). It can be concluded that the presence of an amino substituent leads to an increase in activity in comparison to the presence of a methyl group. The analysis of the results also indicates that the activity increased in the series of compounds 2 – 3 – 4; 5 – 6 – 7; 9 – 10 – 11 and 12 – 13 – 14. These observations reveal that the nature of the substituents has an effect on inhibitory activity. It can be concluded that the presence of substituents (CH3 or Cl) enhanced the activity relative to compounds 2, 5, 9 and 12. Moreover, the presence of a chloro substituent leads to a greater increase in activity than the presence of a methyl group. The comparison between the antibacterial activities of the 1-benzyl and 1-benzoylbenzimidazole derivatives showed that they were similar, except in the cases of compounds 2 vs. 5 and 9 vs. 12, where the presence of the carbonyl group leads to a decrease in activity.