Computer-Aided Design of Orally Bioavailable Pyrrolidine Carboxamide Inhibitors of Enoyl-Acyl Carrier Protein Reductase of Mycobacterium tuberculosis with Favorable Pharmacokinetic Profiles

We have carried out a computational structure-based design of new potent pyrrolidine carboxamide (PCAMs) inhibitors of enoyl-acyl carrier protein reductase (InhA) of Mycobacterium tuberculosis (MTb). Three-dimensional (3D) models of InhA-PCAMx complexes were prepared by in situ modification of the crystal structure of InhA-PCAM1 (Protein Data Bank (PDB) entry code: 4U0J), the reference compound of a training set of 20 PCAMs with known experimental inhibitory potencies (IC50exp). First, we built a gas phase quantitative structure-activity relationships (QSAR) model, linearly correlating the computed enthalpy of the InhA-PCAM complex formation and the IC50exp. Further, taking into account the solvent effect and loss of inhibitor entropy upon enzyme binding led to a QSAR model with a superior linear correlation between computed Gibbs free energies (ΔΔGcom) of InhA-PCAM complex formation and IC50exp (pIC50exp = −0.1552·ΔΔGcom + 5.0448, R2 = 0.94), which was further validated with a 3D-QSAR pharmacophore model generation (PH4). Structural information from the models guided us in designing of a virtual combinatorial library (VL) of more than 17 million PCAMs. The VL was adsorption, distribution, metabolism and excretion (ADME) focused and reduced down to 1.6 million drug like orally bioavailable analogues and PH4 in silico screened to identify new potent PCAMs with predicted IC50pre reaching up to 5 nM. Combining molecular modeling and PH4 in silico screening of the VL resulted in the proposed novel potent antituberculotic agent candidates with favorable pharmacokinetic profiles.


Introduction
The World Health Organization (WHO) has set as its next goal the decrease of incidence of tuberculosis (TB) and TB-related deaths by more than 90% in the coming 20 years [1]. However, due to emerging and rapidly spreading multi-drug resistant and extensively-drug resistant strains of Mycobacterium tuberculosis (MTb), the causative agent of TB, this goal will be difficult to achieve without the discovery of new potent antituberculotic drugs. Despite the increasing worldwide incidence of TB and the threat for the public health, no novel antituberculotic drugs have been introduced into clinical practice over the past decades. It is therefore imperative to diversify the mycobacterial drug targets in order to fight the increasing incidence of drug-resistant strains of MTb. The oxidoreductase activity of enoyl-acyl carrier protein reductase (InhA or ENR) plays a key role in the type-II fatty-acid synthesis (FAS-II system) of MTb. This essential enzyme catalyzes the elongation cycle of biosynthesis of mycolic acid, which is a vital component of the mycobacterial cell wall [2]. Thus, InhA represents a validated drug target of antituberculotic agents and has been indicated in the Special Programme for Research & Training in Tropical Diseases of WHO (TDR) targets database as an attractive pharmacological target for design of new drug candidates [3,4]. In addition, the InhA sequence and structural organization for bacterial species are distinctly different from mammalian's fatty acid biosynthesis enzymes. Isoniazid prodrug (INH), Figure 1, is a first-line medication used for prevention and treatment of TB. The INH is a prodrug that must be activated by the bacterial catalase-peroxidase enzyme (KatG) which couples the isonicotinic acyl with the reduced form of nicotinamide adenine dinucleotide (NADH) to form an isonicotinic acyl-NADH complex. The complex binds tightly to InhA, blocks the natural substrate and hinders the action of fatty acid synthase, which hampers the synthesis of mycolic acid [5].

Introduction
The World Health Organization (WHO) has set as its next goal the decrease of incidence of tuberculosis (TB) and TB-related deaths by more than 90% in the coming 20 years [1]. However, due to emerging and rapidly spreading multi-drug resistant and extensively-drug resistant strains of Mycobacterium tuberculosis (MTb), the causative agent of TB, this goal will be difficult to achieve without the discovery of new potent antituberculotic drugs. Despite the increasing worldwide incidence of TB and the threat for the public health, no novel antituberculotic drugs have been introduced into clinical practice over the past decades. It is therefore imperative to diversify the mycobacterial drug targets in order to fight the increasing incidence of drug-resistant strains of MTb. The oxidoreductase activity of enoyl-acyl carrier protein reductase (InhA or ENR) plays a key role in the type-II fatty-acid synthesis (FAS-II system) of MTb. This essential enzyme catalyzes the elongation cycle of biosynthesis of mycolic acid, which is a vital component of the mycobacterial cell wall [2]. Thus, InhA represents a validated drug target of antituberculotic agents and has been indicated in the Special Programme for Research & Training in Tropical Diseases of WHO (TDR) targets database as an attractive pharmacological target for design of new drug candidates [3,4]. In addition, the InhA sequence and structural organization for bacterial species are distinctly different from mammalian's fatty acid biosynthesis enzymes. Isoniazid prodrug (INH), Figure 1, is a first-line medication used for prevention and treatment of TB. The INH is a prodrug that must be activated by the bacterial catalase-peroxidase enzyme (KatG) which couples the isonicotinic acyl with the reduced form of nicotinamide adenine dinucleotide (NADH) to form an isonicotinic acyl-NADH complex. The complex binds tightly to InhA, blocks the natural substrate and hinders the action of fatty acid synthase, which hampers the synthesis of mycolic acid [5]. The observed resistance of MTb to INH arises from mutations in the mycobacterial KatG and its reduced ability to activate the prodrug [6][7][8].
Several research groups were working on the discovery of InhA inhibitors not requiring KatG activation which were based on various scaffolds: triclosan [9], diphenyl ether [10,11], pyrrolidine carboxamide [12] and arylamide derivatives [13], all displaying intermediate inhibitory potencies. Recently, structural requirements for efficient InhA inhibitors have been evaluated by 2D-and 3D-quantitative structure-activity relationships (QSAR) methods, hologram QSAR (HQSAR) and comparative molecular field analysis (CoMFA) [14]. These studies indicated that a potent InhA inhibitor should be a relatively long molecule, which binds to the InhA next to the NADH cofactor binding site. This inhibitor should also contain a bulky group that selectively fits into a hydrophobic pocket of InhA constituted by residues Met155, Pro193, Ile215, Leu217, Leu218 and Trp222 that is located near to a larger solvent accessible cavity [14]. The crystal structure conformation of the NADH cofactor in InhA complexes with bound nanomolar inhibitors is shown on Figure 2. Except for the aryl amide inhibitor ( Figure 2C), conformation of the NADH in these ternary complexes is extended, similar to the NADH conformation in the reference binary InhA-cofactor complex (PDB entry 4DRE [15]), Figure 3. The observed resistance of MTb to INH arises from mutations in the mycobacterial KatG and its reduced ability to activate the prodrug [6][7][8].
Furthermore, we have built a virtual combinatorial library of PCAM analogues with the aim to design more potent orally bioavailable InhA inhibitors. The initial diversity library was focused using computed adsorption, distribution, metabolism and excretion (ADME)-related properties and a subset of predicted orally bioavailable PCAM analogues. In a subsequent screening, generated virtual library of analogues were compared to the PH4 pharmacophore model in order to identify analogues which can fit into the mycobacterial InhA binding site. Preference was given to analogues substituted on the benzene ring so as to enhance the van der Waals interactions in the hydrophobic pocket of the InhA binding site. Based on the complexation QSAR model we were able to predict the activities of the designed PCAM analogues and select the best analogues with the highest predicted inhibitory potencies. Finally, ADME profiles of the best-designed analogues were estimated and compared with those of the antituberculotic drugs currently used in the clinical practice.

Results and Discussion
A training set of 20 PCAMs and validation set of 3 PCAMs (Table 1) were selected from a homogeneous series of InhA inhibitors for which experimentally determined inhibitory activities were available from a single laboratory [12]. The whole series was obtained by substitutions at five The best InhA inhibitor designed so far belongs to the pyrrolidine carboxamide family (PCAM) and displays an inhibitory concentration IC 50 of 390 nM [12]. Since PCAMs are direct InhA inhibitors bypassing the drug resistance related to the need for KatG activation, a possibility of optimizing the IC 50 of PCAMs down to one digit nanomolar concentrations, would be rather attractive. Previous SAR knowledge of the existing PCAMs can be summarized as follows: halogen atom substitution in para-position of the benzene ring is detrimental for the potency, size limitation for the electron-withdrawing group in meta-position is required although that position is preferred to ortho. As we can see on Figure 2B, the lack of π-π stacking, cation-π or σ-π interactions explains partially the relatively low potency and opens the gate to the design of more potent analogues of the PCAM family. In this study, we have built a Hansch-type "complexation" QSAR model of InhA inhibition for a training set of known PCAMs inhibitors [12] and correlated computed Gibbs free energies of InhA-PCAMx complex formation with their observed inhibitory potencies. The robustness of this QSAR model was confirmed by a four features 3D-QSAR pharmacophore model (PH4) (see Section 3.8), which was prepared with help of bound conformations of the training set of PCAMs.
Furthermore, we have built a virtual combinatorial library of PCAM analogues with the aim to design more potent orally bioavailable InhA inhibitors. The initial diversity library was focused using computed adsorption, distribution, metabolism and excretion (ADME)-related properties and a subset of predicted orally bioavailable PCAM analogues. In a subsequent screening, generated virtual library of analogues were compared to the PH4 pharmacophore model in order to identify analogues which can fit into the mycobacterial InhA binding site. Preference was given to analogues substituted on the benzene ring so as to enhance the van der Waals interactions in the hydrophobic pocket of the InhA binding site. Based on the complexation QSAR model we were able to predict the activities of the designed PCAM analogues and select the best analogues with the highest predicted inhibitory potencies. Finally, ADME profiles of the best-designed analogues were estimated and compared with those of the antituberculotic drugs currently used in the clinical practice.

Results and Discussion
A training set of 20 PCAMs and validation set of 3 PCAMs (Table 1) were selected from a homogeneous series of InhA inhibitors for which experimentally determined inhibitory activities were available from a single laboratory [12]. The whole series was obtained by substitutions at five positions of the aromatic ring of 1-cyclohexyl-5-oxo-N-phenylpyrrolidine-3-carboxamide (PCAM1) as shown in Table 1. Their experimental inhibitory concentrations IC 50 exp [12] cover a concentration range sufficiently wide to serve well for building of a reliable QSAR model of InhA inhibition.

Quantitative Structure-Activity Relationships (QSAR) Model
The relative Gibbs free energy of the enzyme-inhibitor (E:I) complex formation was computed for the InhA-PCAMx complexes prepared by in situ modification of the template inhibitor PCAM1 within the binding site of InhA of the refined crystal structure (PDB entry code 4U0J [12]), as described in the Methods Section. Table 2 lists computed values of Gibbs free energies of complex formation (∆∆G com ) and its components, Equation (7) for the training and test sets of pyrrolidine carboxamide inhibitors [12]. Since the ∆∆G com was computed using an approximate approach, the relevance of the binding model was evaluated via correlation with the observed activity data (IC 50 exp , [12]) by linear regression, Equation (8), Section 3.6. Two correlation equations obtained for the Gibbs free energy of enzyme-inhibitor complex formation ∆∆G com and its enthalpic component ∆∆H MM are shown in Table 3 together with the relevant statistical data. The correlations are plotted on Figure 4. Relatively high values of the regression coefficient and Fischer F-test of the correlation involving ∆∆G com , Equation (B), Table 3, indicate that there is a strong relationship between the binding model and the experimental inhibitory potencies of the series of PCAMs.   The ratio of predicted and observed inhibitory potencies (pIC50 pre /pIC50 exp , the pIC50 pre was calculated using correlation Equation (B), Table 3) for validation set of PCAMs not included into the training set, is close to one and documents considerable predictive power of the complexation QSAR model, Table 2. Therefore, the correlation Equation (B) and computed ΔΔGcom quantities can be used for prediction of inhibitory potencies IC50 pre against InhA of MTb for novel PCAM analogues, provided that they share the same binding mode as the training set of pyrrolidine carboxamides.
In the crystal structure of InhA-PCAM1 [12] the benzene ring of the inhibitor sits in a hydrophobic cavity of the active-site surrounded by side chains of predominantly nonpolar  The ratio of predicted and observed inhibitory potencies (pIC 50 pre /pIC 50 exp , the pIC 50 pre was calculated using correlation Equation (B), Table 3) for validation set of PCAMs not included into the training set, is close to one and documents considerable predictive power of the complexation QSAR model, Table 2. Therefore, the correlation Equation (B) and computed ∆∆G com quantities can be used for prediction of inhibitory potencies IC 50 pre against InhA of MTb for novel PCAM analogues, provided that they share the same binding mode as the training set of pyrrolidine carboxamides.
In the crystal structure of InhA-PCAM1 [12] the benzene ring of the inhibitor sits in a hydrophobic cavity of the active-site surrounded by side chains of predominantly nonpolar residues: Met103, Gly104, Phe149, Met155, Pro156, Ala157, Tyr158, Pro193, Met199, Ile202, Leu207, Ala211, Gln214, Ile215 and Leu218, Figure 5 [9,14]. Ring substitutions in the meta position by a halogen atom (F, Cl or Br), which form van der Waal contacts to residues Met103, Ala157, Tyr158, Leu202 and Ile215 one side and Phe149, Met155 and Leu218 on the other side of the benzene ring, enhanced observed inhibitory potencies of PCAMs. Double substitution by chlorine atoms in meta positions yielded the most active inhibitor PCAM9. Besides halogen atoms also substitutions by smaller nonpolar groups such as methyl, methoxy or isopropyl group in the meta position resulted in somewhat elevated potencies compared to PCAM1. On the other hand, substitutions by small to moderate size electron-withdrawing groups in ortho and para positions decreased the inhibitory potencies of PCAMs substantially,  Figure 5 [9,14]. Ring substitutions in the meta position by a halogen atom (F, Cl or Br), which form van der Waal contacts to residues Met103, Ala157, Tyr158, Leu202 and Ile215 one side and Phe149, Met155 and Leu218 on the other side of the benzene ring, enhanced observed inhibitory potencies of PCAMs. Double substitution by chlorine atoms in meta positions yielded the most active inhibitor PCAM9. Besides halogen atoms also substitutions by smaller nonpolar groups such as methyl, methoxy or isopropyl group in the meta position resulted in somewhat elevated potencies compared to PCAM1. On the other hand, substitutions by small to moderate size electron-withdrawing groups in ortho and para positions decreased the inhibitory potencies of PCAMs substantially, Table 1 [14]. The CoMFA and CoMSIA analyses of PCAM inhibitors proposed useful structural information facilitating the design of new effective antituberculotic agents, namely the aromatic ring that can fit into the larger hydrophobic pocket of InhA should contain judicious substitution(s) by nucleophilic groups [14]. In fact as we can see from Table 1, for the three bromo substituted compounds PCAM8, PCAM3 and PCAM2 a change of the bromine atom from position 2 (PCAM8, IC50 exp = 101 μM) to position 3 (PCAM3, IC50 exp = 0.89 μM) and finally to position 4 (PCAM2, IC50 exp = 28 μM) [12] resulted in a significant variation of the inhibitory potencies. It can be pointed out that the ten most active training set compounds with IC50 exp lower than 4 μM are substituted in positions 3 or 5 or both except PCAM16, which contains a 2-CH3, 5-Cl substituents on its benzene ring.
In order to identify structural modifications of the benzene ring leading to increased binding affinity of PCAMs to InhA of MTb we have carried out detailed analysis of interactions in a series of InhA-PCAMs complexes with help of the complexation QSAR model. The first step of this analysis aimed at obtaining insight into InhA active-site interactions by performing the interaction energy breakdown into contributions from individual residues filling the hydrophobic pocket displayed on Figure 5 for the most active inhibitor and an inactive inhibitor PCAM9 and PCAM11, respectively, Table 2 [12]. The results of the interaction energy analysis are shown on Figure 6. Figure 6 shows the dependence of Eint on substitutions of the benzene ring of PCAMs by comparing the potent 3,5-dichloro pyrrolidine carboxamide PCAM9 (IC50 exp = 0.39 μM, Table 2) and 2,5-dichlorophenyl analogue PCAM11 (IC50 exp = 56.5 μM) [12]. Noticeable differences were observed for the active-site residues Ile215 and Tyr158 favoring the interaction with PCAM9 vs. Phe149 and Ala157 favoring the PCAM11. The effect of moving one Cl substituent in PCAM9 from meta to ortho position resulted in a 145-fold decrease of the inhibitory potency. In our recent study on thymine like inhibitors of thymidine monophosphate kinase (TMPK) of MTb, we were able to discriminate between active and inactive analogues based on the breakdown of Eint into individual contributions from active-site residues [16]. The same approach applied to PCAMs in order to access the influence of halogen substitution over the biological activity is not as conclusive as previously about the suggestion of structural modifications able to lead to novel more potent PCAMs. The CoMFA and CoMSIA analyses of PCAM inhibitors proposed useful structural information facilitating the design of new effective antituberculotic agents, namely the aromatic ring that can fit into the larger hydrophobic pocket of InhA should contain judicious substitution(s) by nucleophilic groups [14]. In fact as we can see from Table 1, for the three bromo substituted compounds PCAM8, PCAM3 and PCAM2 a change of the bromine atom from position 2 (PCAM8, IC 50 exp = 101 µM) to position 3 (PCAM3, IC 50 exp = 0.89 µM) and finally to position 4 (PCAM2, IC 50 exp = 28 µM) [12] resulted in a significant variation of the inhibitory potencies. It can be pointed out that the ten most active training set compounds with IC 50 exp lower than 4 µM are substituted in positions 3 or 5 or both except PCAM16, which contains a 2-CH 3 , 5-Cl substituents on its benzene ring. In order to identify structural modifications of the benzene ring leading to increased binding affinity of PCAMs to InhA of MTb we have carried out detailed analysis of interactions in a series of InhA-PCAMs complexes with help of the complexation QSAR model. The first step of this analysis aimed at obtaining insight into InhA active-site interactions by performing the interaction energy breakdown into contributions from individual residues filling the hydrophobic pocket displayed on Figure 5 for the most active inhibitor and an inactive inhibitor PCAM9 and PCAM11, respectively, Table 2 [12]. The results of the interaction energy analysis are shown on Figure 6. Figure 6 shows the dependence of E int on substitutions of the benzene ring of PCAMs by comparing the potent 3,5-dichloro pyrrolidine carboxamide PCAM9 (IC 50 exp = 0.39 µM, Table 2) and 2,5-dichlorophenyl analogue PCAM11 (IC 50 exp = 56.5 µM) [12]. Noticeable differences were observed for the active-site residues Ile215 and Tyr158 favoring the interaction with PCAM9 vs. Phe149 and Ala157 favoring the PCAM11. The effect of moving one Cl substituent in PCAM9 from meta to ortho position resulted in a 145-fold decrease of the inhibitory potency. In our recent study on thymine like inhibitors of thymidine monophosphate kinase (TMPK) of MTb, we were able to discriminate between active and inactive analogues based on the breakdown of E int into individual contributions from active-site residues [16]. The same approach applied to PCAMs in order to access the influence of halogen substitution over the biological activity is not as conclusive as previously about the suggestion of structural modifications able to lead to novel more potent PCAMs.

Ligand-Based 3D-QSAR Pharmacophore Model of Inhibitory Activity
The process of 3D-QSAR PH4 pharmacophore generation was carried out in three steps: the constructive step, the subtractive step, and the optimization step. During the constructive phase of HypoGen the most active compounds for which IC50 exp ≤ 2 × 0.39 μM, were automatically selected as the leads. Thus, the most active PCAM9 (IC50 exp = 0.39 μM) alone was used to generate the starting PH4 features. The features, which matched this lead, were retained. During the subsequent subtractive phase, pharmacophoric features which were present in more than half of the inactive compounds were removed. The pharmacophores which contained all features were retained. None of the training set compounds was found to be inactive (IC50 exp > 0.39 × 10 3.5 μM = 1233.3 μM). During the final optimization phase, the score of the pharmacophoric hypotheses was improved. Hypotheses are scored via simulated annealing approach according to errors in the activity estimates from regression and complexity. At the end of the optimization, 10 best scoring unique pharmacophoric hypotheses displaying four features were kept.
The reliability of the generated pharmacophore models was then assessed using the calculated cost parameters. The overall costs ranged from 64.6 (Hypo1) to 115.6 (Hypo10). The gap between the highest and lowest cost parameter was relatively small and matched well with the homogeneity of the generated hypotheses and consistency of the training set. For the best PH4 model the fixed cost (40.1) was lower than the null cost (217.4) by Δ = 177.3. This difference represents a chief indicator of the predictability of the PH4 model (Δ > 70 corresponds to a probability higher than 90% that the model represents a valid correlation [16]). In order to be statistically significant the hypotheses have to reach values as similar as possible to the fixed cost and as different as possible from the null cost. The difference Δ ≥ 101.8 for the set of 10 hypotheses confirms high quality of the pharmacophore model. The standard indicator as the root-mean-square deviation (RMSD) between the hypotheses ranged from 1.827 to 3.138 while the squared correlation coefficient (R 2 ) occupied an interval from 0.63 to 0.88. The PH4 hypothesis with the best RMSD and highest R 2 was retained for further analysis. The statistical data for the set of hypotheses (costs, RMSD, R 2 ) are listed in Table 4.
The geometry of the Hypo1 pharmacophore of InhA inhibition is displayed on Figure 7.
The regression equation for pIC50 exp vs. pIC50 pre estimated from Hypo1: pIC50 exp = 0.9237·pIC50 pre + 0.393 (n = 20, R 2 = 0.91, Rxv 2 = 0.9, F-test = 191.5, σ = 0.224, α > 95%) is also plotted on Figure 7. To check the predictive power of the generated pharmacophore model we have computed the ratio of predicted and observed activities (pIC50 pre /pIC50 exp ) for the validation set, Table 1. The computed ratios are as follows: PCAM21: 1.261, PCAM22: 1.017, PCAM23: 0.991; all of them displayed values of the ratio relatively close to one, which confirms the substantial predictive power of this regression for the best PH4 model.

Ligand-Based 3D-QSAR Pharmacophore Model of Inhibitory Activity
The process of 3D-QSAR PH4 pharmacophore generation was carried out in three steps: the constructive step, the subtractive step, and the optimization step. During the constructive phase of HypoGen the most active compounds for which IC 50 exp ď 2ˆ0.39 µM, were automatically selected as the leads. Thus, the most active PCAM9 (IC 50 exp = 0.39 µM) alone was used to generate the starting PH4 features. The features, which matched this lead, were retained. During the subsequent subtractive phase, pharmacophoric features which were present in more than half of the inactive compounds were removed. The pharmacophores which contained all features were retained. None of the training set compounds was found to be inactive (IC 50 exp > 0.39ˆ10 3.5 µM = 1233.3 µM). During the final optimization phase, the score of the pharmacophoric hypotheses was improved. Hypotheses are scored via simulated annealing approach according to errors in the activity estimates from regression and complexity. At the end of the optimization, 10 best scoring unique pharmacophoric hypotheses displaying four features were kept. The reliability of the generated pharmacophore models was then assessed using the calculated cost parameters. The overall costs ranged from 64.6 (Hypo1) to 115.6 (Hypo10). The gap between the highest and lowest cost parameter was relatively small and matched well with the homogeneity of the generated hypotheses and consistency of the training set. For the best PH4 model the fixed cost (40.1) was lower than the null cost (217.4) by ∆ = 177.3. This difference represents a chief indicator of the predictability of the PH4 model (∆ > 70 corresponds to a probability higher than 90% that the model represents a valid correlation [16]). In order to be statistically significant the hypotheses have to reach values as similar as possible to the fixed cost and as different as possible from the null cost. The difference ∆ ě 101.8 for the set of 10 hypotheses confirms high quality of the pharmacophore model. The standard indicator as the root-mean-square deviation (RMSD) between the hypotheses ranged from 1.827 to 3.138 while the squared correlation coefficient (R 2 ) occupied an interval from 0.63 to 0.88. The PH4 hypothesis with the best RMSD and highest R 2 was retained for further analysis. The statistical data for the set of hypotheses (costs, RMSD, R 2 ) are listed in Table 4.
The geometry of the Hypo1 pharmacophore of InhA inhibition is displayed on Figure 7. The regression equation for pIC 50 exp vs. pIC 50 pre estimated from Hypo1: pIC 50 exp = 0.9237¨pIC 50 pre + 0.393 (n = 20, R 2 = 0.91, R xv 2 = 0.9, F-test = 191.5, σ = 0.224, α > 95%) is also plotted on Figure 7. To check the predictive power of the generated pharmacophore model we have computed the ratio of predicted and observed activities (pIC 50 pre /pIC 50 exp ) for the validation set, Table 1. The computed ratios are as follows: PCAM21: 1.261, PCAM22: 1.017, PCAM23: 0.991; all of them displayed values of the ratio relatively close to one, which confirms the substantial predictive power of this regression for the best PH4 model.    The validation of the PH4 model was carried out by randomization using the CatScramble algorithm of Catalyst for 49 random runs corresponding to a 98% confidence level. This validation procedure created 10 hypotheses for each run. However, all of them were less predictive then the Hypo10, the hypothesis with the highest cost out of the ten best-generated hypotheses, Table 4. Thus, the best-selected hypothesis Hypo1 represents with a probability of 98% a pharmacophore model of the inhibitory activity of PCAMs with a similar level of predictive power as the complexation QSAR model, which relies on the computed Gibbs free energies of enzyme-inhibitor binding.
Next we have performed computational design and selection of new PCAM analogues with increased inhibition potencies against InhA of MTb. The design strategy relied on the hydrophobic feature included in the best PH4 pharmacophore model at the position of the benzene ring coupled with mapping of the ring substitutions to the hydrophobic features in the PH4 hypothesis Hypo1 (Figure 7). The validation of the PH4 model was carried out by randomization using the CatScramble algorithm of Catalyst for 49 random runs corresponding to a 98% confidence level. This validation procedure created 10 hypotheses for each run. However, all of them were less predictive then the Hypo10, the hypothesis with the highest cost out of the ten best-generated hypotheses, Table 4. Thus, the best-selected hypothesis Hypo1 represents with a probability of 98% a pharmacophore model of the inhibitory activity of PCAMs with a similar level of predictive power as the complexation QSAR model, which relies on the computed Gibbs free energies of enzyme-inhibitor binding.
Next we have performed computational design and selection of new PCAM analogues with increased inhibition potencies against InhA of MTb. The design strategy relied on the hydrophobic feature included in the best PH4 pharmacophore model at the position of the benzene ring coupled with mapping of the ring substitutions to the hydrophobic features in the PH4 hypothesis Hypo1 (Figure 7).

Library Design and Adsorption, Distribution, Metabolism and Excretion (ADME) Focusing
We have built a virtual library of new pyrrolidine carboxamide compounds with a variety of substitutions in five positions of the benzene ring with the goal to identify more potent orally bioavailable inhibitors of the InhA of MTb. PCAM analogues with substituted phenyl ring can be prepared through amide synthesis from pyrrolidine carboxylic acid and substituted anilines following a previously published protocol [12]. During the virtual library enumeration the R-groups listed in Table 5 were attached to positions R 1 -R 5 of the benzene ring of the PCAM scaffold to form an combinatorial library of the size: R 1ˆR2ˆR3ˆR4ˆR5 = 31ˆ26ˆ27ˆ26ˆ31 = 17,540,172 analogues. In order to match the substitution pattern of the best training set inhibitor PCAM9 and to take into account the reported structural information about para position that is not suitable for substitutions [12]. Hydrogen atom was added to the list of R 3 -group bringing it to 27 fragments. This subset of 649,636 virtual PCAMs is expected to yield the best new PCAM analogues with activity better than the 0.39 µM potency of PCAM9. In order to avoid steric effects, bulkier substituents were used only at extreme positions R 1 and R 5 . Including them at all positions is detrimental since they will be excluded through the Lipinski's rule violation (molecular weight >500 g¨mol´1) [17]. Relatively large initial diversity library of PCAM analogues was generated from the substituted anilines listed in the databases of available chemicals [18]. In order to design a more focused library of a reduced size and increased content of drug-like and orally bioavailable molecules, we have introduced a set of filters and penalties, which can help to select smaller number of suitable PCAMs that can be submitted to in silico screening. The initial virtual library was thus filtered in an ADME-based focusing step to remove compounds with expected poor oral bioavailability and low drug-like character. Only analogues with high predicted percentage of human oral absorption (HOA) in the gastrointestinal tract larger than 80% [19,20] and compounds satisfying the Lipinski's rule of five [17] computed for the entire virtual library using QikProp software [21], were kept. This focusing has reduced the size of the initial library to 1,604,448 analogues, less than 10% of its original number size.

Library Design and Adsorption, Distribution, Metabolism and Excretion (ADME) Focusing
We have built a virtual library of new pyrrolidine carboxamide compounds with a variety of substitutions in five positions of the benzene ring with the goal to identify more potent orally bioavailable inhibitors of the InhA of MTb. PCAM analogues with substituted phenyl ring can be prepared through amide synthesis from pyrrolidine carboxylic acid and substituted anilines following a previously published protocol [12]. During the virtual library enumeration the R-groups listed in Table 5 were attached to positions R1-R5 of the benzene ring of the PCAM scaffold to form an combinatorial library of the size: R1 × R2 × R3 × R4 × R5 = 31 × 26 × 27 × 26 × 31 = 17,540,172 analogues. In order to match the substitution pattern of the best training set inhibitor PCAM9 and to take into account the reported structural information about para position that is not suitable for substitutions [12]. Hydrogen atom was added to the list of R3-group bringing it to 27 fragments. This subset of 649,636 virtual PCAMs is expected to yield the best new PCAM analogues with activity better than the 0.39 μM potency of PCAM9. In order to avoid steric effects, bulkier substituents were used only at extreme positions R1 and R5. Including them at all positions is detrimental since they will be excluded through the Lipinski's rule violation (molecular weight >500 g·mol −1 ) [17]. Relatively large initial diversity library of PCAM analogues was generated from the substituted anilines listed in the databases of available chemicals [18]. In order to design a more focused library of a reduced size and increased content of drug-like and orally bioavailable molecules, we have introduced a set of filters and penalties, which can help to select smaller number of suitable PCAMs that can be submitted to in silico screening. The initial virtual library was thus filtered in an ADME-based focusing step to remove compounds with expected poor oral bioavailability and low drug-like character. Only analogues with high predicted percentage of human oral absorption (HOA) in the gastrointestinal tract larger than 80% [19,20] and compounds satisfying the Lipinski's rule of five [17] computed for the entire virtual library using QikProp software [21], were kept. This focusing has reduced the size of the initial library to 1,604,448 analogues, less than 10% of its original number size. Table 5. R-groups (fragments, building blocks, substituents) used in the design of the initial diversity library of PCAM analogues.
-SO2NH2 a fragments 1-26 were used in R2 and R4-groups; fragments 1-26 and H were used in R3-group and fragments 1-31 were used in R1 and R5-groups; b dashed bonds indicates the attachment points of individual fragments.
The selected library subset then underwent virtual screening by means of the PH4 pharmacophore of PCAM inhibitory activity towards model of InhA of MTb.
-SO 2 NH 2 a fragments 1-26 were used in R 2 and R 4 -groups; fragments 1-26 and H were used in R 3 -group and fragments 1-31 were used in R 1 and R 5 -groups; b dashed bonds indicates the attachment points of individual fragments.
The selected library subset then underwent virtual screening by means of the PH4 pharmacophore of PCAM inhibitory activity towards model of InhA of MTb.

In Silico Screening of Library of PCAMs
The library of PCAM analogues was further screened for molecular structures matching to the 3D-QSAR PH4 pharmacophore model Hypo1 of InhA inhibition. From the set of 1,604,448 analogues few thousands of PCAMs mapped to at least 2 features, 592 of which mapped to 4 features of the pharmacophore. Out of then, only 115 best fitting analogues (PH4 hits) have been retained and submitted to screening with help of the complexation QSAR model. Computed Gibbs free energy of complex formation with InhA of MTb and its component as well as predicted half-maximal inhibitory concentrations IC 50 pre estimated from the correlation Equation (B), Table 3, are given in Table 6.
For the majority of new PCAM analogues, the estimated inhibitory potencies shown in Table 6 are better than that for the most active training set compound PCAM9 (IC 50 exp = 390 nM [12]). In fact, for the best-designed PCAM analogue 21-2-H-6-4 the predicted inhibitory potency is more than 80 times higher than for the PCAM9.

Analysis of New Inhibitors
In order to identify which substituents on the benzene ring of PCAMs (Table 5) lead to new inhibitor candidates with the highest predicted potencies towards the InhA of MTb, we have prepared histograms of the frequency of occurrence of R 1 , R 4 and R 5 groups in the 115 PH4 best fitting hit PCAMs selected from the focused virtual library shown in Table 6 ( Figure 8).

In Silico Screening of Library of PCAMs
The library of PCAM analogues was further screened for molecular structures matching to the 3D-QSAR PH4 pharmacophore model Hypo1 of InhA inhibition. From the set of 1,604,448 analogues few thousands of PCAMs mapped to at least 2 features, 592 of which mapped to 4 features of the pharmacophore. Out of then, only 115 best fitting analogues (PH4 hits) have been retained and submitted to screening with help of the complexation QSAR model. Computed Gibbs free energy of complex formation with InhA of MTb and its component as well as predicted half-maximal inhibitory concentrations IC50 pre estimated from the correlation Equation (B), Table 3, are given in Table 6.
For the majority of new PCAM analogues, the estimated inhibitory potencies shown in Table 6 are better than that for the most active training set compound PCAM9 (IC50 exp = 390 nM [12]). In fact, for the best-designed PCAM analogue 21-2-H-6-4 the predicted inhibitory potency is more than 80 times higher than for the PCAM9.

Analysis of New Inhibitors
In order to identify which substituents on the benzene ring of PCAMs (Table 5) lead to new inhibitor candidates with the highest predicted potencies towards the InhA of MTb, we have prepared histograms of the frequency of occurrence of R1, R4 and R5 groups in the 115 PH4 best fitting hit PCAMs selected from the focused virtual library shown in Table 6 ( Figure 8).  Histograms of frequency of occurrence of individual R 1 , R 4 , R 5 -groups in the 115 best-selected analogues mapping to the four features of the PH4 pharmacophore hypothesis Hypo1 (for fragments' structures see Table 5); For R 2 and R 3 the occurrences are indicated in parenthesis: R 2 = -F (14), -Cl (97), -C"CH (3), -NH 2 (1) and R 3 = -F (46), -H (69).   Table 2); c ∆∆G sol is the relative solvation Gibbs free energy contribution to ∆∆G com ; d ∆∆TS vib is the relative entropic (vibrational) contribution to ∆∆G com ; e ∆∆G com is the relative Gibbs free energy change related to the enzyme-inhibitor InhA-PCAM complex formation ∆∆G com -∆∆H MM + ∆∆G sol´∆ ∆TS vib ; f IC 50 pre is the predicted half-maximal inhibitory concentration of PCAMx towards InhA of MTb calculated from ∆∆G com using correlation Equation (B), Table 3; g IC 50 exp is given for the reference inhibitor PCAM1 instead of IC 50 pre .

ADME Profiles of Designed PCAMs
The ADME-related properties described in Section 3.9 were computed. The values for the best active designed PCAMs were compared with those computed for drugs used for treatment of tuberculosis or currently undergoing clinical trials, Table 7. The best designed analogues all display #stars descriptor equal to zero, meaning that the optimal value range of none of the drug-likeness descriptors was violated. Thus, the designed PCAMs are predicted to display high level of druglikeness as well as human oral absorption in the gastrointestinal tract (HOA).

ADME Profiles of Designed PCAMs
The ADME-related properties described in Section 3.9 were computed. The values for the best active designed PCAMs were compared with those computed for drugs used for treatment of tuberculosis or currently undergoing clinical trials, Table 7. The best designed analogues all display #stars descriptor equal to zero, meaning that the optimal value range of none of the drug-likeness descriptors was violated. Thus, the designed PCAMs are predicted to display high level of druglikeness as well as human oral absorption in the gastrointestinal tract (HOA).  0.0-6.0); i estimated number of hydrogen bonds that would be accepted (acc) by the solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so they can be non-integer (range for 95% of drugs: 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs:´2-6.5); k logarithm of predicted aqueous (wat) solubility, logS. S in mol¨dm´3 is the concentration of the solute in a saturated solution that is in equilibrium with the crystalline solid (range for 95% of drugs:´6.0-0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs:´1.5-1.5); m logarithm of predicted brain/blood partition coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are CNS negative because they are too polar to cross the blood-brain barrier (range for 95% of drugs:´3.0-1.2); n predicted apparent Caco-2 cell membrane permeability in Boehringer-Ingelheim scale, in nm/s (range for 95% of drugs: <25 poor, > 500 great);˝number of likely metabolic reactions (range for 95% of drugs: 1-8); p predicted half-maximal inhibitory concentrations IC 50 pre . IC 50 pre was predicted from computed ∆∆G com using the regression equation B shown in Table 3; q human oral absorption (1-low, 2-medium, 3-high); r percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that the property descriptor value falls outside the range of values for 95% of known drugs.

Experimental Section
The workflow describing the steps of the whole process of virtual design of novel PCAM analogues is presented in Scheme 1.

Experimental Section
The workflow describing the steps of the whole process of virtual design of novel PCAM analogues is presented in Scheme 1.

Scheme 1.
Workflow describing the multistep approach to virtually design novel PCAM analogues with higher predicted potency against InhA.

Training and Validation Sets
Chemical structures and biological activities (IC50 exp ) of training and validation sets of pyrrolidine carboxamide inhibitors of InhA studied here were taken from literature [12]. The potencies of these compounds cover a broad range of half-maximal inhibitory concentrations (0.39

Training and Validation Sets
Chemical structures and biological activities (IC 50 exp ) of training and validation sets of pyrrolidine carboxamide inhibitors of InhA studied here were taken from literature [12]. The potencies of these compounds cover a broad range of half-maximal inhibitory concentrations (0.39 ď IC 50 exp ď 101.0 µM) in order to allow construction of a QSAR model. The training set contained 20 PCAM inhibitors while the validation set included 3 PCAMs taken from the same reference [12].
The structures of InhA and enzyme-inhibitor (E:I) complexes were regarded to be at the pH of 7 with capped N-and C-terminal residues set to neutral charge. The protonizable and ionizable residues of the enzyme were considered charged. All crystallographic water molecules were removed from the model. The inhibitors were prepared from the reference crystal structure 4U0J [12] by in situ modification of function groups in the molecular scaffold of the reference inhibitor PCAM1. An exhaustive conformational search was carried out over all rotatable bonds of the replaced function groups, which was tied with a careful gradual energy-minimization of the inhibitor and active-site residues of the InhA located in the close vicinity (ď5 Å). This process helped to identify low-energy bound conformations of the modified inhibitors leading to stable structures of binary E:I complexes, which were then carefully refined by energy-minimization of the complex. This procedure has been successfully applied to building of models of viral, bacterial and protozoal enzyme-inhibitor complexes and design of peptidomimetic, hydroxynaphthoic, thymidine and triclosan-based enzyme inhibitors [16,[24][25][26][27][28][29][30].

Molecular Mechanics
Modeling of inhibitors, InhA enzyme and E:I complexes was done in all-atom representation using consistent force field CFF91 force field parameters and charges [31]. In all molecular models a dielectric constant of 4 was employed for all molecular mechanics (MM) calculations in order to take into account the dielectric shielding effect in proteins. Energy minimizations of the E:I complexes, free E and I were completed by gradually relaxing the molecular structures, starting with the added hydrogen atoms, continued with heavy atoms of inhibitor, followed by residue side chains, and concluded with the protein backbone relaxation. During the geometry optimization process, a sufficient number of steepest descent steps followed by conjugate gradient iterative cycles were used, while the convergence criterion for the average gradient was set to 0.01 kcal¨mol´1¨Å´1.

Conformational Search
Conformations of free inhibitor were obtained from the bound conformations in the binary E:I complexes by gradual relaxation to the nearest local energy minimum. Then a Monte Carlo search (ď50,000 iterations) for low-energy conformations was performed over all rotatable bonds, except those in the rings, using Discovery Studio 2.5 (DS 2.5) molecular modeling program [32]. Two hundred unique inhibitor conformations were generated by randomly varying torsion angles of the last accepted conformer by˘15 deg at 5000 K followed by subsequent energy minimization. During the minimization a dielectric constant ε = 80 was used to approximate the dielectric screening effect of solvation. The conformer with the lowest total energy was selected and re-minimized at a dielectric constant of 4.

Solvation Gibbs free energies
The electrostatic component of the solvation Gibbs free energy which includes the effect of ionic strength via solving the nonlinear Poisson-Boltzmann equation [33,34] was computed by the DelPhi module of Discovery Studio [32]. The program represents the solvent by a continuous medium of high dielectric constant (ε o = 80) and the solute as charge distribution filling a low dielectric (ε i = 4) cavity with boundaries linked to the solute's molecular surface. The program numerically solves for the molecular electrostatic potential and reaction field around the solute using finite difference method. DelPhi calculations were done on a (235ˆ235ˆ235) cubic lattice grid for the E:I complexes and free E and on (65ˆ65ˆ65) grid for the free I. Full coulombic boundary conditions were employed. Two subsequent focusing steps led to a similar final resolution of about 0.3 Å per grid unit at 70% filling of the grid by the solute. Physiological ionic strength of 0.145 mol¨dm´3, atomic partial charges and radii defined in the CFF force field parameter set [32] and a probe sphere radius of 1.4 Å were used. The electrostatic component of the Poisson Boltzmann solvation Gibbs free energy was calculated as the reaction field energy [29,30,[35][36][37].

Calculation of Binding Affinity and QSAR Model
The standard Gibbs free energy (GFE) change of enzyme-inhibitor (E:I) complexes (∆G com ) formation in water is related to inhibition constant (K i ) of a reversible inhibitor I. Thus, prediction of the K i value from computed complexation GFE as lnK i =´∆G com /RT, is achievable assuming the following equilibrium [25]: tEu aq`t Iu aq Ø tE : Iu aq (1) where {} aq indicates solvated species. Half-maximal inhibitory concentration IC 50 is for tight binding competitive inhibitors proportional to K i : where S is the substrate concentration, K m represents the Michaelis constant and E means the free enzyme concentration [38]. The standard GFE change of the reaction Equation (1) can be estimated from molecular simulations of the complex and free reactants: In our laboratory we approximate the exact values of standard GFE for larger systems such as E:I complexes by expression [25][26][27]:  PCAM analogues, Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list of 49 descriptors of QikProp, ver. 3.7, release tside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of drugs: 130-725 g·mol −1 ) [21]; d total solvent-accessi surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of the solvent-accessible molecular (mol) surface radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-accessible molecular surface, in Å 3 (probe radius for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, small ring) rotatable bonds (range for 95% of dru timated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an aqueous solution. Values are averages taken ov of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hydrogen bonds that would be accepted (acc) by water molecules in an aqueous solution. Values are averages taken over a number of configurations, so they can be non-integer (range for 95% of dru j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs: −2-6.5); k logarithm of predicted aqueous (w logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with the crystalline solid (range for 95% of drugs: −6 rithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5); m logarithm of predicted brain/blood partiti . Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are CNS negative because they are too polar to cr brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeability in Boehringer-Ingelheim scale, in nm/s (ran f drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p predicted half-maximal inhibitory concentratio 0 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3; q human oral absorption (1-low, 2-medium, 3-high e of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that the property descriptor value falls outside the range 95% of known drugs.  ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, small ring) ro 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an aqueous so a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hydrogen bon solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so they can b 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs: −2-6.5); k solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with the crystalli 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5); m logarit coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are CNS nega the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeability in Boeh for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p predicted h IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3; q human oral absorp percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that the property des values for 95% of known drugs.   Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of drug molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-acc Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, sma 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an a a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hyd solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with th 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5 coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeabili for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p p IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3; q human o percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that the pro values for 95% of known drugs.
G sol (6) where TS tran {I} and TS rot {I} describe the translational and rotational entropy terms of the free inhibitor and ∆TS vib represents a simplified vibrational entropy change upon the complex formation: [39,40]. Similarly, ∆∆H MM is the relative enthalpic contribution to the GFE change related to the intermolecular interactions in the E:I complex derived by MM.
Relative changes in the complexation GFE of different inhibitors with respect to a reference inhibitor, I ref , were computed assuming ideal gas behavior for the rotational and translational motions of the inhibitors [25]:  Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list of 49 descriptors of QikProp, that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of drugs: 130-725 g·mol −1 ) [21]; d tota molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of the solvent-accessible molecu Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-accessible molecular surface, in Å Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, small ring) rotatable bonds (rang 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an aqueous solution. Values are a a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hydrogen bonds that would be a solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so they can be non-integer (rang 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs: −2-6.5); k logarithm of predi solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with the crystalline solid (range for 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5); m logarithm of predicted br coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are CNS negative because they ar the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeability in Boehringer-Ingelheim sc for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p predicted half-maximal inhibi IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3 Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list of 49 descriptors that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of drugs: 130-725 g·mol molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of the solvent-acces Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-accessible molecula Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, small ring) rotatable 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an aqueous solution a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hydrogen bonds tha solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so they can be non 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs: −2-6.5); k logari solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with the crystalline soli 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5); m logarithm of coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are CNS negative be the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeability in Boehringerfor 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p predicted half-ma IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3 Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list o that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of drug molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of th Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-acce Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, sma 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an aq a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hydr solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drugs: solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with th 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5) coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are C the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeabilit for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p pr IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3 Table 6; b drug-likeness, number of property descriptors (from 24 out of the full list that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 95% of dru molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) portion of Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by solvent-ac Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, sm 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in an a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of hy solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations, so 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of drug solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium with t 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5-1.5 coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin are the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeabil for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); p p IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3 drug-likeness, number of property descriptors (from 24 out of t that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for 9 molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo) Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene, 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molec a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated num solute from water molecules in an aqueous solution. Values are averages taken over a number of configu 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95 solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibr 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drug coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and ser the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drug IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3 Table 6; b drug-likeness, number of property descriptors (from 24 out of that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 (range for molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrophobic (hfo Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) enclosed by Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered (not alkene 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water mole a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated nu solute from water molecules in an aqueous solution. Values are averages taken over a number of config 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 9 solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilib 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of dru coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and se the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of dru IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table   percentage Table 6; b drug-likeness, number of property descriptors (from that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol −1 molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydrop Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) e Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered ( 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to w a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i est solute from water molecules in an aqueous solution. Values are averages taken over a numb 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases ( solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopam the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 9  Table 6; b drug-likeness, number of property descriptors (fro that fall outside of the range of values for 95% of known drugs; c molecular weight in g·mol molecular surface, in Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 300-1000 Å 2 ); e hydro Å 2 (probe radius 1.4 Å ) (range for 95% of drugs: 0-750 Å 2 ); f total volume of molecule (mol) Å) (range for 95% of drugs: 500-2000 Å 3 ); g number of non-trivial (not CX3), non-hindered 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i e solute from water molecules in an aqueous solution. Values are averages taken over a num 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range fo coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopa the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cel for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B show percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * sta values for 95% of known drugs.
G sol (7) This evaluation of relative changes is preferable as it may lead to partial cancellation of errors originating from the approximate nature of the MM method, solvent and entropic effects description.
Quantitative structure-activity relationships (QSAR), in which a linear relationship between the computed relative GFE of the InhA-PCAM complex formation ∆∆G com for the receptor and observed inhibitory potencies IC 50 exp specific to MTb, is assumed according to Equations (1) and (2)  ); g number of non-trivial (not CX3), non-hindered (not alkene, amide, s 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in a a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of h solute from water molecules in an aqueous solution. Values are averages taken over a number of configurations 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of dru solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium wit 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin a the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane permeab for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: 1-8); IC50 pre . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3; q huma percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that the values for 95% of known drugs.  ); g number of non-trivial (not CX3), non-hindered (not alkene, amide 0-15); h estimated number of hydrogen bonds that would be donated (don) by the solute to water molecules in a number of configurations, so they can be non-integer (range for 95% of drugs: 0.0-6.0); i estimated number of solute from water molecules in an aqueous solution. Values are averages taken over a number of configuration 2.0-20.0); j logarithm of partitioning coefficient between n-octanol and water (o/w) phases (range for 95% of d solubility, logS. S in mol·dm −3 is the concentration of the solute in a saturated solution that is in equilibrium w 0.5); l logarithm of predicted binding constant to human serum albumin (HSA) (range for 95% of drugs: −1.5 coefficient. Note: QikProp predictions are for orally delivered drugs so, for example, dopamine and serotonin the blood-brain barrier (range for 95% of drugs: −3.0-1.2); n predicted apparent Caco-2 cell membrane perme for 95% of drugs: <25 poor, > 500 great); ° number of likely metabolic reactions (range for 95% of drugs: [1][2][3][4][5][6][7][8] . IC50 pre was predicted from computed ΔΔGcom using the regression equation B shown in Table 3; q hum percentage of human oral absorption in gastrointestinal tract (<25% poor; >80% high); * star indicating that th values for 95% of known drugs.
The QSAR model was prepared by linear regression for the training set of PCAMs [12] using ∆∆G com quantities calculated from Equation (7), where a and b are regression coefficients. This QSAR model (termed here also as a target-specific scoring function) was then evaluated with help of validation set, not included into the training set, Section 3.1 and employed for prediction of inhibitory potencies (IC 50 pre ) of newly designed and modeled PCAM analogues.

Interaction Energy
The MM interaction energy (E int ) protocol available in DS 2.5 [32] computes the non-bonded interactions (van der Waals and electrostatic terms) between enzyme residues and the inhibitor. The calculations were performed using CFF forcefield [32] with a dielectric constant of 4. The breakdown of E int into active-site residue contributions reveals the significance of individual interactions and permits a comparative analysis, which leads to identification of affinity enhancing as well as unfavorable PCAM substitutions.
The interaction energy diagram displaying active-site residue contribution to the E int permits identification of residues with the highest contribution to the ligand binding and suggests the position and type of structural modifications, which can lead to improvement of binding affinity as exemplified in the design of thymine-like inhibitors of TMPK of MTb [16].

Pharmacophore Generation
Pharmacophore modeling assumes that a set of key structural features responsible for biological activity of the compound is recognized by the active site during receptor binding. In this work the pharmacophore was prepared by the 3D-QSAR pharmacophore protocol of Catalyst HypoGen algorithm [41] implemented in DS 2.5 [32]. Bound conformations of PCAM inhibitors taken from the refined E:I complexes were considered for the constructing of the pharmacophore building. The top scoring pharmacophore hypothesis was prepared in three stages: constructive, subtractive and optimization step, from a set of most active PCAMs inhibitors. The inactive compounds served for the definition of excluded volume. During the pharmacophore generation, five features available in the HypoGen algorithm were selected: hydrophobic aromatic (HYdAr), hydrophobic aliphatic (HYd), hydrogen-bond donor (HBD), hydrogen-bond acceptor (HBA), and ring aromatic (Ar) feature. Default values of the adjustable parameters were kept during the pharmacophore generation, except the uncertainty on the biological activity, which was reduced to 1.25 instead of 3. This adjustment modified the uncertainty interval of experimental activity from a wide span xIC 50 /3, 3¨IC 50 y to a relatively narrow one x4¨IC 50 /5, 5¨IC 50 /4y, due to accuracy and homogeneity of the measured activities originating from the same laboratory [12]. The top ten pharmacophores were generated with the number of missing features set to 0. Finally, the best pharmacophore model was selected. The PH4 pharmacophore model was then evaluated with help of the validation set of PCAM inhibitors [12], Section 3.1. Generally a PH4 model, as the one described here, can be used to estimate the pIC 50 of new analogues on the basis of their mapping to the pharmacophore features. In this study, priority was given to the PH4 based screening of ADME focused VLs.

ADME Properties
Properties that determine the pharmacokinetics profile of a compound, besides octanol/water partitioning coefficient, aqueous solubility, blood/brain partition coefficient, Caco-2 cell permeability, serum protein binding, number of likely metabolic reactions and other eighteen descriptors related to adsorption, distribution, metabolism and excretion (ADME properties) of the inhibitors were computed by the QikProp program [21] based on the methods of Jorgensen [19,20,42]. According to those methods, experimental results of more than 710 compounds among which about 500 drugs and related heterocycles were correlated with computed physicochemical descriptors resulting in an accurate prediction of molecule's pharmacokinetic profile. Drug likeness (#stars) is represented by the number of descriptors that exceed the range of values determined for 95% of known drugs out of 24 selected descriptors computed by the QikProp [21]. Drug-likeness was used as the global compound selection criterion related to ADME properties. The calculation of the selected ADME descriptors were performed from 3D structures of compounds considered. These descriptors were used to assess the pharmacokinetics profile of designed compounds and served also for the VL focusing (see Section 3.11).

Virtual Combinatorial Library Generation
The analogue model building was performed with Molecular Operating Environment (MOE) program [43]. The library of analogues was enumerated by attaching R-groups (fragments, building blocks) onto PCAM scaffold using the Quasar CombiDesign module of MOE [43]. Reagents and chemicals considered in this paper were selected from the directories of chemicals available from the commercial sources [18]. Each analogue was built as a neutral molecule in the MOE program [43], its molecular geometry was refined by MM optimization through smart minimizer of Discovery Studio [32] at high convergence criteria (threshold on energy difference of 10´4 kcal¨mol´1 and root mean square deviation (RMSD) of 10´5 Å), dielectric constant of 4, using class II consistent force field CFF [31], as described in the Section 3.3.

ADME-Based Library Focusing
Twenty four pharmacokinetics-related molecular descriptors available in QikProp [21], which characterize a wide spectrum of molecular properties as described in Section 3.9, e.g., such as molecular mass, total solvent-accessible molecular surface, hydrophobic portion of the solvent-accessible molecular surface, total volume of molecule enclosed by solvent-accessible molecular surface, number of non-trivial non-hindered rotatable bonds, estimated number of hydrogen bonds that would be donated by the solute to water molecules in an aqueous solution, estimated number of hydrogen bonds that would be accepted by the solute from water molecules, logarithm of partitioning coefficient between n-octanol and water phases, logarithm of predicted aqueous solubility, logarithm of predicted binding constant to human serum albumin, logarithm of predicted brain/blood partition coefficient, apparent Caco-2 cell membrane permeability in Boehringer-Ingelheim scale, number of likely metabolic reactions, percentage of human oral absorption in gastrointestinal tract, etc. Optimum ranges of these 24 descriptors were defined in terms of upper and lower bounds according to QikProp [21]. Only compounds with predicted drug-likeness (#stars, Section 3.9) equal to zero, were retained in the focused VL of drug-like PCAM analogues.

Pharmacophore-Based Library Focusing
The pharmacophore model (PH4) described in Section 3.8 was derived from the bound conformations of PCAMs at the active-site of InhA. The enumerated and ADME-focused virtual library was further focused by using the ligand pharmacophore mapping protocol available of Discovery Studio [32]. Within this protocol, each generated conformer of the analogues was geometry optimized by means of the CFF forcefield for a maximum of 500 energy minimization steps and subsequently aligned and mapped to the PH4 model in order to select the top ranking overlaps. Twenty best-fitting inhibitor conformers were saved and clustered into 10 conformational families according to their mutual RMSD by Jarvis-Patrick complete linkage clustering method [44]. The best representative of each cluster was considered in the virtual screening of analogues. Only those analogues mapping to all four PH4 features were retained for the in silico screening.

In Silico Screening
The conformer with the best mapping on the PH4 pharmacophore in each cluster of the focused library subset was selected for in silico screening by the complexation QSAR model. The relative GFE of E:I complex formation in water ∆∆G com was computed for each selected new analogue and then used for prediction of InhA inhibitory potencies (IC 50 pre ) of the focused virtual library of PCAM analogues by inserting this parameter into the target-specific scoring function, Equation (9). The scoring function, which is specific for the InhA receptor of MTb: pIC 50 pre [InhA] = a¨∆∆G com + b, was parameterized using the QSAR model described above, Section 3.6.

Conclusions
Structural information from the crystal structure of InhA-PCAM1 complex guided us during preparation of a reliable QSAR model of inhibition of the InhA of MTb by pyrrolidine carboxamide inhibitors, which correlated computed Gibbs free energies of complex formation with observed inhibitory potencies. In addition to this QSAR model, we have derived a PH4 pharmacophore model for PCAM inhibitors using a training set of 20 and validation set of 3 PCAMs with known inhibitory activities [12]. Analysis of interactions between the InhA and PCAMs in the enzyme active-site directed us in our effort to design an initial diversity virtual combinatorial library of new PCAM analogues with multiple substitutions on the benzene ring. A focused library filtered by a set of ADME-related descriptors and screened by matching of the analogues to the PH4 pharmacophore, permitted selection of a library subset of orally bioavailable PCAMs. This subset of 115 best virtual hits was submitted to computation of predicted InhA inhibitory potencies by the complexation QSAR model. The best analogues reached predicted activities in the low nanomolar concentration range.  Table 6, are recommended for synthesis and subsequent activity evaluation in InhA inhibition assays and may lead to a discovery of novel potent orally bioavailable antituberculotics.
Author Contributions: Affiba Florance Kouassi performed the complexation study, the subsequent PH4 generation, the interaction energy analysis, the PH4-based VL screening and the analogues evaluation under the supervision of Eugene Megnassan. Eugene Megnassan wrote the manuscript with assistance and intellectual input from Vladimir Frecer, Stanislav Miertus and Yao Thomas N'Guessan. Mawa Kone established the first preliminary complexation model in order to confirm the feasibility of this work. Melalie Keita and Akori Esmel performed the VL generation and focusing. All the authors approved the final manuscript and had complete access to the study data.

Conflicts of Interest:
The authors declare no conflict of interest.