This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (
A series of 3-hydroxypyridine-4-one and 3-hydroxypyran-4-one derivatives was subjected to quantitative structure-antimicrobial activity relationship (QSAR) analysis. A collection of chemometric methods, including factor analysis-based multiple linear regression (FA-MLR), principal component regression (PCR), and partial least squares combined with a genetic algorithm for variable selection (GA-PLS), was employed to relate structural parameters to antimicrobial activity. The results revealed the significant role of topological parameters in the antimicrobial activity of the studied compounds against
Quantitative structure-activity relationship (QSAR) studies, one of the most important areas of chemometrics, provide information useful for molecular design and medicinal chemistry [
Almost 120 years ago, physicians observed that the co-occurrence of blood and bacteria in a wound may cause a life-threatening infection. It has also been shown that blood or hemoglobin enhances the lethality of intraperitoneal or subcutaneous inocula of bacteria such as
Few reports of antimicrobial studies of 3-hydroxypyridine-4-one and 3-hydroxypyran-4-one derivatives are available [
The two-dimensional structures of the molecules were drawn using the Hyperchem 7.0 software. The final geometries were obtained with the semi-empirical AM1 method in the Hyperchem program. The molecular structures were optimized using the Polak-Ribière algorithm until the root-mean-square gradient was 0.01 kcal mol^{−1}. The resulting geometries were transferred to the Dragon program package, developed by the Milano Chemometrics and QSAR Group [
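As a rough illustration of the optimization criterion described above, the sketch below runs a Polak-Ribière nonlinear conjugate-gradient minimization and stops once the root-mean-square gradient falls below a tolerance. It is a minimal sketch under stated assumptions: the two-variable `energy` function is a hypothetical stand-in for the AM1 energy surface, the backtracking line search and the PR+ restart rule are illustrative choices, and nothing here reproduces Hyperchem's internal implementation. The tolerance is arbitrary and unrelated to the 0.01 kcal mol^{−1} criterion used in the study.

```python
import math


def energy(x):
    # Hypothetical smooth, convex stand-in for an energy surface; minimum at (1, -2).
    a, b = x[0] - 1.0, x[1] + 2.0
    return a * a + 0.5 * a ** 4 + 5.0 * b * b


def gradient(x):
    a, b = x[0] - 1.0, x[1] + 2.0
    return [2.0 * a + 2.0 * a ** 3, 10.0 * b]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def rms(g):
    # Root-mean-square gradient, the convergence criterion described in the text.
    return math.sqrt(dot(g, g) / len(g))


def polak_ribiere(x, tol=1e-6, max_iter=5000):
    g = gradient(x)
    d = [-gi for gi in g]  # the first direction is steepest descent
    for _ in range(max_iter):
        if rms(g) < tol:
            break
        slope = dot(g, d)
        if slope >= 0.0:  # not a descent direction: restart along -g
            d = [-gi for gi in g]
            slope = dot(g, d)
        # Backtracking (Armijo) line search along d.
        step, e0 = 1.0, energy(x)
        while energy([xi + step * di for xi, di in zip(x, d)]) > e0 + 1e-4 * step * slope:
            step *= 0.5
        x = [xi + step * di for xi, di in zip(x, d)]
        g_new = gradient(x)
        # Polak-Ribiere beta, clipped at zero (the "PR+" variant).
        y = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, y) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x, rms(g)


x_min, final_rms = polak_ribiere([4.0, 3.0])
print(x_min, final_rms)
```

Clipping β at zero resets the search to the steepest-descent direction whenever the Polak-Ribière update would stop making progress, a common safeguard for this method.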
The biological data used in this study are the antimicrobial activities (expressed as −log MIC) of a set of 3-hydroxypyridine-4-one and 3-hydroxypyran-4-one derivatives [
Gaussian 98 was employed to calculate various quantum chemical descriptors, including dipole moment (DM), local charges, and HOMO and LUMO energies. Hardness (η), softness (S), electronegativity (χ), and electrophilicity (ω) were calculated according to the method proposed by Thanikaivelan
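The four global reactivity indices can be computed directly from the frontier orbital energies. The sketch below assumes the common Koopmans-type definitions χ = −(E_HOMO + E_LUMO)/2, η = (E_LUMO − E_HOMO)/2, S = 1/η, and ω = χ^{2}/(2η); whether these match every detail of the cited method is an assumption (some authors define S = 1/(2η)), and the orbital energies in the example are hypothetical.

```python
def reactivity_descriptors(e_homo, e_lumo):
    """Global reactivity indices from frontier orbital energies (output in the input's units)."""
    chi = -0.5 * (e_homo + e_lumo)  # electronegativity
    eta = 0.5 * (e_lumo - e_homo)   # chemical hardness
    softness = 1.0 / eta            # S = 1/eta (assumed convention)
    omega = chi ** 2 / (2.0 * eta)  # electrophilicity index
    return {"chi": chi, "eta": eta, "S": softness, "omega": omega}


# Hypothetical orbital energies in eV, not values from this study.
d = reactivity_descriptors(-9.0, -0.5)
print(d)
```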
Constitutional, topological, geometrical, functional group, quantum, and physicochemical indices were used in this study; a brief description of some of them is given in
The calculated descriptors were collected in a data matrix whose rows and columns correspond to the molecules and descriptors, respectively. Genetic algorithm-partial least squares (GA-PLS), MLR with factor analysis as the data preprocessing step for variable selection (FA-MLR), and principal component regression analysis (PCRA) were used to derive the QSAR equations, and feature selection was performed with a genetic algorithm (GA). Genetic algorithms are efficient methods for function minimization; in the descriptor-selection context, the quantity optimized is the prediction error of the model built on a given set of features [
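The descriptor-selection step can be pictured as a search over binary masks, one bit per descriptor, in which a genetic algorithm minimizes the prediction error of the model built on the selected columns. The sketch below is a minimal, hypothetical version: tournament selection, uniform crossover, bit-flip mutation, elitism, and all settings are illustrative assumptions rather than the configuration used in this study, and `toy_cost` is an invented stand-in for a cross-validated error.

```python
import random


def ga_select(n_features, cost, pop_size=30, generations=100, seed=0):
    """Search binary descriptor masks for the one with the lowest cost (e.g. prediction error)."""
    rng = random.Random(seed)
    p_mut = 1.0 / n_features  # about one bit flip per child on average
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored):
        # Binary tournament: the lower-cost of two random individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((cost(m), m) for m in pop), key=lambda s: s[0])
        next_pop = [scored[0][1][:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover, then bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = min(pop, key=cost)
    return [i for i, keep in enumerate(best) if keep]


# Hypothetical stand-in for a model's cross-validated error: descriptors 0 and 2
# carry the signal, and every selected descriptor adds a small penalty.
INFORMATIVE = {0, 2}


def toy_cost(mask):
    selected = {i for i, keep in enumerate(mask) if keep}
    return len(INFORMATIVE - selected) + 0.1 * len(selected)


print(ga_select(8, toy_cost))
```

In a GA-PLS setting, `cost` would instead fit a PLS model on the masked descriptor matrix and return its cross-validated prediction error.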
To model the structure-antimicrobial activity relationships better, genetic algorithm-partial least squares (GA-PLS) was employed in this study [
Application of the PLS method thus allows larger QSAR equations to be constructed while still avoiding overfitting and eliminating most variables. The method is normally used in combination with cross-validation to obtain the optimum number of components [
In our previous study, the classical multiple regression approach was used to develop QSAR relationships [
In PLS analysis, the descriptor data matrix is decomposed into orthogonal matrices with an inner relationship between the dependent and independent variables. Therefore, unlike in MLR analysis, multicollinearity among the descriptors is not a problem in PLS analysis. Because PLS models use a minimal number of latent variables, this modeling method copes with noisy data better than MLR. To find the most suitable set of descriptors for PLS modeling, a genetic algorithm was used, and many GA-PLS runs were conducted with different initial populations. The data set (compounds tested against
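A minimal pure-Python sketch of PLS1 (one response variable) via the NIPALS algorithm shows the decomposition into orthogonal score vectors with deflation, together with the leave-one-out cross-validation used to choose the number of latent variables. The data set is hypothetical (the response is an exact linear function of two of the three descriptors), and the genetic-algorithm search over descriptor subsets is not included.

```python
def center(rows):
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(r, means)] for r in rows], means


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 for a single response; returns what is needed to predict new samples."""
    Xc, x_mean = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    comps = []
    for _ in range(n_comp):
        # Weight vector w proportional to X'y, scaled to unit length.
        w = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(len(x_mean))]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # y is fully explained; no further components
            break
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(r, w)) for r in Xc]  # latent-variable scores
        tt = sum(v * v for v in t)
        p = [sum(r[j] * s for r, s in zip(Xc, t)) / tt for j in range(len(w))]  # X loadings
        q = sum(v * s for v, s in zip(yc, t)) / tt  # y loading
        # Deflation: remove this component from X and y before extracting the next.
        Xc = [[a - s * b for a, b in zip(r, p)] for r, s in zip(Xc, t)]
        yc = [v - q * s for v, s in zip(yc, t)]
        comps.append((w, p, q))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    xr = [a - m for a, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(xr, w))
        y_hat += q * t
        xr = [a - t * b for a, b in zip(xr, p)]
    return y_hat


def loo_q2(X, y, n_comp):
    """Leave-one-out Q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for k in range(len(y)):
        model = pls1_fit(X[:k] + X[k + 1:], y[:k] + y[k + 1:], n_comp)
        press += (y[k] - pls1_predict(model, X[k])) ** 2
    y_mean = sum(y) / len(y)
    return 1.0 - press / sum((v - y_mean) ** 2 for v in y)


# Hypothetical data: 8 compounds x 3 descriptors; y depends only on the first two.
X = [[1, 2, 0.5], [2, 1, 1.5], [3, 4, 0.2], [4, 3, 2.2],
     [5, 6, 1.0], [6, 5, 0.3], [7, 8, 1.8], [8, 7, 0.9]]
y = [2 * r[0] - r[1] for r in X]
q2 = {a: loo_q2(X, y, a) for a in (1, 2, 3)}
best = max(q2, key=q2.get)  # number of components with the highest Q^2
print(q2, best)
```

With as many components as the descriptor matrix has independent columns, PLS1 reproduces the ordinary least-squares fit; using fewer components is what lets PLS cope with collinear or noisy descriptors.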
As can be observed, a combination of quantum, topological, geometrical, constitutional, and functional group descriptors was selected by GA-PLS to account for the antimicrobial activity of the studied compounds; most of these descriptors are topological indices. The resulting GA-PLS model has high statistical quality (R^{2} = 0.96, Q^{2} = 0.91). The pMIC values predicted by the PLS model (obtained from cross-validation or for the external prediction set), along with the corresponding relative errors of prediction (REP), are shown in
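The statistics quoted above can be computed as follows. R^{2} is 1 − SS_res/SS_tot for fitted values; Q^{2} uses the same expression with cross-validated predictions, so that SS_res becomes the predicted residual sum of squares (PRESS). The text does not define REP explicitly; the tabulated values (for example, experimental 4.19, predicted 3.7498, REP −11.740) are reproduced to within rounding when the error is divided by the predicted pMIC, so the sketch assumes that convention. Dividing by the experimental value instead would give about −10.5 for that row.

```python
def r_squared(y_obs, y_fit):
    """1 - SS_res / SS_tot. With leave-one-out predictions as y_fit, SS_res is PRESS and this is Q^2."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(y_obs, y_fit))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rep_percent(experimental, predicted):
    """Relative error of prediction (%), relative to the predicted value (inferred convention)."""
    return 100.0 * (predicted - experimental) / predicted


print(round(rep_percent(4.19, 3.7498), 2))  # compare with the tabulated REP of -11.740
```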
The data set (compounds tested against
Following the procedure described in the Experimental section, the following three-parameter equation was derived.
Equation 1 explains 73% and predicts 68% of the variance in the pMIC data. It describes the effects of geometrical (PJI3), functional group (nCONHR), and quantum (DMy) indices on antimicrobial activity.
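A three-descriptor MLR equation of this kind is an ordinary least-squares fit with an intercept. The sketch below solves the normal equations in pure Python; the descriptor values and coefficients are invented for illustration (they are not Equation 1 or the study's data), and the factor-analysis step that FA-MLR uses to preselect descriptors is not shown.

```python
def ols_fit(X, y):
    """Least-squares coefficients [b0, b1, ..., bm] for y = b0 + b1*x1 + ... + bm*xm."""
    A = [[1.0] + list(row) for row in X]  # prepend an intercept column
    n, m = len(A), len(A[0])
    # Augmented normal equations [A'A | A'y].
    M = []
    for r in range(m):
        row = [sum(A[i][r] * A[i][c] for i in range(n)) for c in range(m)]
        row.append(sum(A[i][r] * y[i] for i in range(n)))
        M.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(m):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][m] / M[r][r] for r in range(m)]


# Hypothetical values of three descriptors (stand-ins for PJI3, nCONHR, DMy)
# and pMIC values generated from invented coefficients.
X = [[0.80, 0, 1.2], [0.60, 1, 2.5], [0.90, 2, 0.7],
     [0.70, 0, 3.1], [0.50, 1, 1.9], [0.95, 2, 2.2]]
y = [3.0 + 0.5 * a - 0.2 * b + 0.1 * c for a, b, c in X]
print(ols_fit(X, y))  # recovers approximately [3.0, 0.5, -0.2, 0.1]
```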
When factor scores were used as the predictor variables in a multiple regression equation with forward selection (PCRA), the following equation was obtained:
Equation 2 also shows good statistics (81% explained and 79% predicted variance in the pMIC data). Because factor scores are used instead of selected descriptors, and each factor score contains information from several descriptors, loss of information is avoided; consequently, the PCRA equation is of better quality than the one derived by FA-MLR.
As can be observed from
Following the procedure described in the Experimental section, the following four-parameter equation was derived.
Equation 3 explains 85% and predicts 81% of the variance in the pMIC data. It describes the effects of topological (piID and PW3), functional group (nCp), and geometrical (ASP) indices on antimicrobial activity.
When factor scores were used as the predictor variables in a multiple regression equation with forward selection (PCRA), the following equation was obtained:
Equation 4 also shows good statistics (88% explained and 83% predicted variance in the pMIC data). Note that the variables used in Equation 4 (factor scores) are perfectly orthogonal to each other. Because factor scores are used instead of selected descriptors, and each factor score contains information from several descriptors, loss of information is avoided; consequently, the PCRA equation is of better quality than the one derived by FA-MLR.
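Because the factor scores are orthogonal, forward selection becomes simple: a score's coefficient and its share of explained variance do not depend on which other scores are already in the equation. The sketch below assumes mean-centred, orthogonal scores and uses a minimum R^{2} gain as the entry criterion; that threshold is a hypothetical stand-in for whatever criterion was actually applied (partial F tests are a common choice). The scores and activities are invented.

```python
def forward_select_orthogonal(scores, y, min_gain=0.05):
    """Forward selection over mutually orthogonal, mean-centred factor scores.

    With orthogonal predictors each coefficient is t'y / t't regardless of which other
    scores are in the equation, and each score adds a fixed share of explained variance,
    so forward selection reduces to ranking the scores by that share.
    """
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    ss_tot = sum(v * v for v in yc)
    ranked = []
    for j, t in enumerate(scores):
        tt = sum(v * v for v in t)
        ty = sum(a * b for a, b in zip(t, yc))
        ranked.append((ty * ty / (tt * ss_tot), j, ty / tt))  # (R^2 gain, index, coefficient)
    ranked.sort(reverse=True)
    # Enter scores in order of R^2 gain; stop at the first one below the threshold.
    chosen = []
    for gain, j, coef in ranked:
        if gain < min_gain:
            break
        chosen.append((j, coef))
    return y_mean, chosen


# Hypothetical, exactly orthogonal and centred scores for four compounds.
S = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [5 + 2 * a + 0.1 * c for a, c in zip(S[0], S[2])]
print(forward_select_orthogonal(S, y))
```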
As can be observed from
Comparison of the results obtained by GA-PLS with those of the other regression methods indicates that GA-PLS describes the antimicrobial activity of the studied compounds more accurately.
The differences in accuracy among the regression methods used in this study are visualized in
Quantitative relationships between molecular structure and the inhibitory activity of a series of 3-hydroxypyridine-4-one and 3-hydroxypyran-4-one derivatives were established using a collection of chemometric methods, including GA-PLS, FA-MLR, and PCRA. The results revealed the significant role of topological parameters in the antimicrobial activity of the studied compounds against
This work was supported by Isfahan Pharmaceutical Sciences Research Center.
PLS regression coefficients for the variables used in the GA-PLS model (against
PLS regression coefficients for the variables used in the GA-PLS model (against
Plots of the cross-validated predicted activity against the experimental activity for the QSAR models obtained by different chemometric methods (against
Plots of the cross-validated predicted activity against the experimental activity for the QSAR models obtained by different chemometric methods (against
Chemical structure of the compounds used in QSAR analysis.
Compound  X  R_{2}  R_{3}  R_{5}  R_{6} 

NH  CH_{3}  OH  CH_{2}R^{a}  H  
NH  C_{2}H_{5}  OH  CH_{2}R^{a}  H  
NH  CH_{3}  OH  CH_{2}N(CH_{3})_{2}  H  
NH  C_{2}H_{5}  OH  CH_{2}N(CH_{3})_{2}  H  
NH  CH_{3}  OH  CH_{2}N(C_{2}H_{5})_{2}  H  
NH  C_{2}H_{5}  OH  CH_{2}N(C_{2}H_{5})_{2}  H  
NPh  CH_{3}  OH  H  H  
N  CH_{3}  OH  H  H  
NC_{3}H_{7}  CH_{3}  OH  H  H  
NC_{4}H_{9}  CH_{3}  OH  H  H  
O  CH_{2}Cl  H  OH  H  
O  CH_{3}  H  OH  H  
O  CH_{2}OH  OH  H  CH_{3}  
O  CH_{2}OH  OCH_{2}Ph  H  CH_{3}  
O  CHO  OCH_{2}Ph  H  CH_{3}  
O  COOH  OCH_{2}Ph  H  CH_{3}  
O  CONHR^{b}  OCH_{2}Ph  H  CH_{3}  
O  CONHR^{c}  OCH_{2}Ph  H  CH_{3}  
O  CONHR^{d}  OCH_{2}Ph  H  CH_{3}  
O  CONHR^{b}  OH  H  CH_{3}  
O  CONHR^{c}  OH  H  CH_{3}  
O  CONHR^{d}  OH  H  CH_{3}  
O  CH_{2}OH  H  OCH_{2}Ph  H  
O  COOH  H  OCH_{2}Ph  H  
O  CONHPh  H  OCH_{2}Ph  H  
NCH_{3}  CONHPh  H  OCH_{2}Ph  H  
NCH_{3}  CONHPh  H  OH  H  
O  CONHR^{e}  H  OCH_{2}Ph  H  
NCH_{3}  CONHR^{e}  H  OCH_{2}Ph  H  
NCH_{3}  CONHR^{e}  H  OH  H  
O  CH_{2}OH  H  OH  H 
Brief description of some descriptors used in this study.
Descriptor Type  Molecular Description 

Constitutional  Mean atomic van der Waals volume (Mv) (scaled on Carbon atom), no. of heteroatoms, no. of multiple bonds (nBM), no. of rings, no. of circuits, no. of H-bond donors, no. of H-bond acceptors, no. of Nitrogen atoms (nN), chemical composition, sum of Kier-Hall electrotopological states (Ss), mean atomic polarizability (Mp), number of rotatable bonds (RBN), mean atomic Sanderson electronegativity (Me), etc.

Topological  Narumi harmonic topological index (HNar), Total structure connectivity index (Xt), information content index (IC), mean information content on the distance degree equality (IDDE), total walk count, path/walk Randić shape indices (PW3, PW4, PW5), Zagreb indices, Schultz indices, Balaban J index (such as MSD), Wiener indices, Information content index (neighborhood symmetry of 2-order) (IC2), Ratio of multiple path count to path counts (PCR), Lovász-Pelikan index (leading eigenvalue) (LP1), total information content index (neighborhood symmetry of 1-order) (TIC1), reciprocal hyper-detour index (Rww), Average connectivity index chi-5 (X5A), piID (conventional bond-order ID number), etc.

Geometrical  3D Petitjean shape index (PJI3), Asphericity (ASP), Gravitational index, Balaban index, Wiener index, Length-to-breadth ratio by WHIM (L/Bw), etc.

Quantum  Highest Occupied Molecular Orbital energy (HOMO), Lowest Unoccupied Molecular Orbital energy (LUMO), Most positive charge (MPC), Sum of squares of positive charges (SSPC), Sum of squares of negative charges (SSNC), Sum of positive charges (SUMPC), Sum of negative charges (SUMNC), Sum of absolute charges (SAC), Standard deviation (Std), Total dipole moment (DM_{t}), Molecular dipole moment along the X direction (DM_{X}), Molecular dipole moment along the Y direction (DM_{Y}), Molecular dipole moment along the Z direction (DM_{Z}), Electronegativity (χ = −0.5 (HOMO + LUMO)), Electrophilicity (ω = χ^{2}/(2η)), Hardness (η = 0.5 (LUMO − HOMO)), Softness (S = 1/η).

Functional group  Number of total secondary C(sp3) (nCs), Number of total tertiary carbons (nCt), Number of H-bond acceptor atoms (nHAcc), Number of secondary amides (aliphatic) (nCONHR), Number of unsubstituted aromatic C (nCaH), Number of ethers (aromatic) (nRORPh), Number of ketones (aliphatic) (nCO), Number of tertiary amines (aliphatic) (nNR2), Number of phenols (nOHPh), Number of total primary C(sp3) (nCp), etc.

Chemical  LogP (Octanol-water partition coefficient), Hydration Energy (HE), Polarizability (Pol), Molar refractivity (MR), Molecular volume (V), Molecular surface area (SA).

Experimental and predicted activity of compounds against
Compound  Experimental pMIC^{a}  Predicted pMIC  REP^{b} (%) 

3.29  3.3205  0.9173  
3.29  3.3007  0.3242  
3.29  3.2266  −1.9664  
3.29  3.3976  3.1675  
4.19  3.7498  −11.740  
3.29  3.3205  0.9173  
3.89  3.8255  −1.6850  
3.29  3.2698  −0.6172  
3.29  3.2886  −0.0440  
3.89  3.9283  0.9738  
3.59  3.6207  0.8470  
3.59  3.7254  3.6340  
3.59  3.5063  −2.3883  
3.59  3.6212  0.8627  
4.19  4.1563  −0.8119  
3.59  3.5611  −0.8123  
3.59  3.6177  0.7647  
3.59  3.5548  −0.9915  
3.89  3.8950  0.1293  
4.19  4.0995  −2.2079  
3.59  3.7117  3.2787  
5.10  5.0840  −0.3141  
3.59  3.5533  −1.0318  
3.59  3.7223  3.5534  
3.89  3.9222  0.8214  
3.89  3.9779  2.2092  
4.80  4.8022  0.0453  
3.89  3.8591  −0.8011  
3.59  3.4907  −2.8470  
4.49  4.5105  0.4549  
3.59  3.4728  −3.3746 
^{a} pMIC = −log(MIC).
^{b} REP = relative error of prediction.
* Compounds used as the prediction set.
Experimental and predicted activity of compounds against
Compd.  Experimental pMIC  Predicted pMIC  REP(%) 

3.29  3.4139  3.6304  
3.29  3.3893  2.9303  
3.89  3.8920  0.0514  
3.29  3.3591  2.0577  
3.29  3.3835  2.7631  
3.59  3.6477  1.5813  
3.29  3.3208  0.9272  
3.59  3.6196  0.8175  
3.89  3.9567  1.6857  
3.89  3.7481  −3.7870  
3.89  3.9092  0.4922  
3.89  3.7076  −4.9191  
3.89  3.8892  −0.0203  
3.89  3.8422  −1.2433  
4.49  4.3961  −2.1360  
4.49  4.4476  −0.9524  
3.89  3.7076  −4.9191  
3.89  3.8014  −2.3296  
3.89  3.9525  1.5813  
3.89  3.7450  −3.8727  
3.89  3.9056  0.3994  
3.89  3.9969  2.6755  
3.89  3.8489  −1.0691  
3.89  3.7573  −3.5304  
3.89  3.9503  1.5262  
3.89  3.9964  2.6619  
3.89  3.8978  0.2006  
3.89  3.8732  −0.4333 
* Compounds used as the prediction set.
Factor loadings on factors 1–4 for some descriptors after VARIMAX rotation (against
Factor 1  Factor 2  Factor 3  Factor 4  Communality  

0.588  −0.105  0.587  −0.313  0.799  
0.195  −0.054  0.762  0.071  0.627  
0.059  0.637  −0.013  0.620  0.794  
−0.643  −0.206  −0.199  −0.496  0.741  
0.751  −0.413  0.362  −0.259  0.934  
0.001  −0.781  0.097  −0.298  0.708  
0.087  0.902  0.068  0.003  0.826  
0.866  0.051  0.217  −0.252  0.863  
−0.645  −0.505  −0.307  0.081  0.772  
0.746  0.359  0.215  0.324  0.837  
0.667  0.460  0.368  0.292  0.877  
0.714  0.413  0.175  0.127  0.726  
0.375  0.611  −0.315  −0.276  0.689  
−0.559  0.578  −0.411  0.199  0.855  
0.894  −0.140  −0.143  −0.079  0.845  
0.261  0.220  0.695  −0.906  0.765  
−0.082  0.081  −0.214  0.853  0.787  
0.041  −0.116  0.898  −0.051  0.824  
Variance (%)  29.87  20.10  17.15  12.12  79.24 
Factor loadings on factors 1–5 for some descriptors after VARIMAX rotation (against
Factor 1  Factor 2  Factor 3  Factor 4  Factor 5  Communality  

−0.491  −0.431  −0.459  −0.107  0.095  0.657  
−0.007  0.102  −0.209  0.860  0.322  0.898  
0.240  0.811  −0.156  −0.349  0.014  0.861  
−0.706  −0.389  0.142  0.323  −0.310  0.871  
−0.627  −0.664  −0.134  −0.102  0.129  0.879  
−0.166  0.594  −0.377  −0.158  0.893  0.584  
0.913  −0.079  0.055  0.135  −0.132  0.879  
0.579  0.272  −0.164  0.210  0.584  0.820  
0.750  −0.070  −0.333  −0.190  −0.208  0.758  
−0.075  0.087  0.866  −0.198  0.322  0.905  
0.064  0.117  0.926  −0.023  0.164  0.902  
−0.206  0.754  −0.224  −0.097  −0.325  0.777  
−0.366  0.722  0.148  0.287  −0.234  0.814  
−0.191  −0.415  −0.165  −0.447  0.356  0.562  
0.571  −0.522  0.379  0.002  −0.341  0.858  
0.628  −0.627  −0.277  −0.107  0.602  0.872  
Variance (%)  22.58  20.58  14.71  14.02  8.71  80.60 
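The communality column can be checked against the loadings: for orthogonally rotated factors such as VARIMAX, a descriptor's communality is the sum of its squared loadings. The bottom row of each table appears to give the percentage of variance explained by each factor, since its last entry equals the sum of the others. A short check on values copied from the four-factor table:

```python
def communality(loadings):
    """Communality of one descriptor: sum of its squared loadings on orthogonal factors."""
    return sum(a * a for a in loadings)


# First data row of the four-factor loading table.
h2 = communality([0.588, -0.105, 0.587, -0.313])
print(round(h2, 3))  # tabulated communality: 0.799

# Bottom row of the same table: per-factor values whose sum is the last entry.
print(round(sum([29.87, 20.10, 17.15, 12.12]), 2))  # tabulated total: 79.24
```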