A Combined Pharmacophore Modeling, 3D QSAR and Virtual Screening Studies on Imidazopyridines as B-Raf Inhibitors

B-Raf kinase is an important target in treatment of cancers. In order to design and find potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model obtained which was used in effective alignment of the data set contains two acceptor atoms, three donor atoms and three hydrophobes. In succession, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment shows the best result (q2 = 0.621, r2pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the obtained best pharmacophore model was used for virtual screening against the NCI2000 database. The hit compounds were further filtered with molecular docking, and their biological activities were predicted using the CoMSIA model, and three potential BRIs with new skeletons were obtained.


Introduction
Cancer as the second cause of mortality is a major health problem all over the world [1], so it is still important to discover new anticancer drugs in spite of the progress in medicine. The Ras-Raf-MEK-ERK pathway, also called ERK/MAP or MAPK kinase pathway, is important for cell proliferation and survival, and its hyper-activation has been reported in up to 30% of human cancers [2][3][4]. Raf kinase exists as three isoformsin this pathway: A-, B-, and C-Raf. B-Raf kinase has been identified as the primary activator [5], and the mutations in B-Raf kinase have been observed in approximately 7% of human cancers, with a different frequency in a variety of human cancers, such as melanoma (50%-70%), ovarian (35%), thyroid (30%), and colorectal (10%) cancers [6]. Therefore, B-Raf kinase has recently emerged as an important and exciting target in cancer treatment [7][8][9].
Pharmacophore modeling can offer a valuable insight into interactions between receptors and ligands. A pharmacophore model reveals the ensemble of steric and electrostatic characteristics of different compounds, by which new classes of inhibitors can be discovered when one class of inhibitors is found. Therefore, pharmacophore searching is a good way to find various chemical structures with the same features in a virtual screening [10][11][12].
Quantitative structure-activity relationship (QSAR) methods have been successfully employed to assist the design of new small molecule drug candidates and explore the ligand-protein interaction mechanism [13][14][15][16][17][18][19][20]. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) are two of the most widely used 3D QSAR methodologies. Lennard-Jones and Coulomb potential functions are introduced by CoMFA to calculate the energies of steric and electrostatic interactions between the compound and the probe atom, and the obtained results can be represented as a three-dimensional "coefficient contour" map [21]. However, in order to avoid some inherent deficiencies caused by Lennard-Jones and Coulomb potential functions, Gaussian function is used by CoMSIA to calculate the energies of interactions between the compound and the probe atom, and the obtained coefficient contour maps can show how steric fields, electrostatic fields, hydrophobic fields, hydrogen bond donor (HBD) and hydrogen bond acceptor (HBA) influence the activity of inhibitors [22].
Recently, a series of imidazopyridines as B-Raf inhibitors (BRIs), shown in Table 1, have been reported in the literature [23]. To understand the structural basis for inhibitory activity and design potent inhibitors, pharmacophore models were created and 3D QSAR studies were performed for these imidazopyridines by using CoMFA and CoMSIA based on both pharmacophore and docking alignments. In addition, in order to discover new classes of BRIs, the obtained pharmacophore model was used as a 3D query for virtual screening against the NCI2000 database. The hit compounds were further filtered by molecular docking, and the biological activities of the hit compounds were predicted by the obtained 3D QSAR model.

Pharmacophore Generation
After the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD) run, twenty pharmacophore models were generated with default parameters, for which statistical values are listed in Table 2. Among the conflicting demands of maximizing pharmacophore consensus, maximizing steric consensus, and minimizing energy, each obtained model represents a different tradeoff. As shown in Table 2, each model has Pareto rank 0, which implies no one model is superior to any other one. Model_12 has a high energy, which is due to steric clashes [24]. The good models should have small value of energy, high values of Specificity, N_hits, Sterics, H-bond and Mol_Qry [25]. Therefore, Model_06 was considered to be the best model and its statistical values are listed in Table 2. This model was not only used for the molecular alignment to produce CoMFA and CoMSIA models in the 3D QSAR studies, but also was converted into a UNITY query for virtual screening studies. As shown in Figure 1, this model contains two acceptor atoms, three donor atoms and three hydrophobes, and the nitrogen atom attached to -SO2-group acts both acceptor and donor atoms. The selected model (Model_06) is indicated in boldface. Figure 1. The selected GALAHAD model includes two acceptor atoms (green), three donor atoms (magenta) and three hydrophobes (cyan). The sphere sizes indicate query tolerances.

3D QSAR Studies
The structural alignment of compounds is a crucial step in the development of successful 3D QSAR models. In order to obtain reasonable results, all compounds of the data set were aligned according to both pharmacophore and molecular docking to derive CoMFA and CoMSIA models in current study.

CoMFA and CoMSIA Statistical Results
In order to get an effective 3D QSAR model, a series of statistical parameters, cross-validated coefficient (q 2 /r 2 cv), standard error estimate (SEE) and F-statistic values (F) were calculated as defined in SYBYL. The CoMFA and CoMSIA statistical results based on both pharmacophore-based and docking-based models are shown in Table 3, which shows that the pharmacophore-based modeling yielded q 2 (r 2 cv) = 0.501 for CoMFA model and q 2 (r 2 cv) = 0.621 for CoMSIA model, while the docking-based modeling gave q 2 (r 2 cv) = 0.690, for CoMFA model, q 2 (r 2 cv) = 0.541 for CoMSIA model, respectively.

Validation of 3D QSAR Models
In order to validate the 3D QSAR models, the predictive correlation (r 2 pred) was used to assess the predictive abilities of the CoMFA and CoMSIA models from the test set (Table 1) which was not included in the generation of the models. As shown in Table 3, the pharmacophore-based models exhibit better predictive ability than the docking-based models, where the pharmacophore-based modeling yielded r 2 pred = 0.786 for CoMFA model and r 2 pred = 0.885 for CoMSIA model, while the docking-based modeling gave r 2 pred = 0.590 for CoMFA model and r 2 pred = 0.607 for CoMSIA model, respectively.
We mainly focus on the CoMSIA obtained from pharmacophore-based alignment due to its satisfactory statistical results and its best predictive ability. As shown in Table 3, this CoMSIA model has a q 2 (r 2 cv) of 0.621 with ten optimal components, SEE of 0.063 and F value of 410.567, which indicates it is a quite good model. The corresponding field contributions of steric, electrostatic, hydrophobic, HBD and HBA are 0.196, 0.201, 0.291, 0.161 and 0.151, respectively, which suggests that each field gives similar contribution to activity. The observed and predicted pIC50 by the CoMSIA model of the training and test sets are given in Table 4, and the correlations between the observed and predicted pIC50 of training and test sets are depicted in Figure 3. Observed pIC 50 Predicted pIC50 Figure 3. Plots of observed vs. predicted activities of the training set and test set molecules from CoMSIA analysis.

CoMSIA Contour Maps
CoMSIA not only calculates steric and electrostatic fields as in CoMFA, but also additionally computes hydrophobic, HBD and HBA fields. The CoMSIA contour maps of steric, electrostatic, hydrophobic, HBD, and HBA fields are revealed in Figure 4a-e. Compound 18 and compound 10 were selected to be superimposed into the contour maps because compound 18 is the most active compound in all 39 imidazopyridines and compound 10 is the least active compound in 30 compounds (compounds 4-33) in which there is a substituent group attached to the imidazole ring. For each field, the favorable and disfavored contours represent 80% and 20% level contributions, respectively.
The steric contour map with compounds 18 and 10 is shown in Figure 4a, in which green contours refer to sterically favored regions, while yellow contours indicate sterically disfavored areas. A large green contour near the phenyl group attached to the imidazole ring of compound 18 indicates that a bulky group in this region is favorable to bioactivity. It is confirmed by the fact that compounds 12-39 with bulky substitution in this region have higher bioactivity than compounds 1-11 with no substitution. A large yellow contour near the piperidine group attached to the imidazole ring of compound 10 suggests that a bulky group in this area is unfavorable to bioactivity. This is supported by the lower activity of compounds 10-11 with large substituents in this area, compared with the higher activity of compounds 4-9 with small substituents.
The electrostatic contour map with compound 18 is shown in Figure 4b, where a negative potential is favorable to activity in the red areas while a positive potential is favorable in the blue areas. A red contour near the 4-position of the benzene ring and a blue contour near the 2-position of the benzene ring suggest that an electronegative atom on the 4-position of the benzene ring can increase the bioactivity while an electronegative atom on the 2-position of benzene ring is able to reduce the bioactivity. This can be validated by the higher activity of compounds 13, 14 and 15, compared with the lower activity of compounds 19, 20 and 21, respectively. Figure 4b also shows that a blue contour is near the nitrogen atom of the imidazole ring, which means that the nitrogen atom is unfavorable to activity. This is supported by the fact that compounds 12 and 13 with a nitrogen atom at that position are less active than compounds 34 and 35 with no nitrogen atom, respectively.  The hydrophobic contour map in combination with compounds 18 and 10 is shown in Figure 4c, in which yellow contours refer to regions where hydrophobic groups are favored while white contours represent areas where hydrophilic groups are favored. Two large yellow contours near the phenyl group attached to the imidazole ring of compound 18 means that a hydrophobic group in this region is favorable to bioactivity while a large white contour near the piperidine group attached to the imidazole ring of compound 10 suggests that a hydrophobic group in this area is unfavorable. This can be seen by the fact that compounds 12-39 with a benzene ring near the yellow contour have higher bioactivity while compounds 5-11 with a hydrophobic group in white contour have lower bioactivity. Figure 4c also displays that there is a yellow contour near the 3-position of benzene ring, which hints that a hydrophilic group on the 3-position of benzene ring is able to decrease the bioactivity. This can be proved by the descending activities of compounds 18, 17 and 16.
The HBD contour map in combination with compound 18 is shown in Figure 4d, where cyan contours indicate that HBD groups in this area are favorable to activity while purple contours represent that HBD substituents in this region are unfavorable. Three cyan contours are close to the three -NH groups of the compound 18, which suggests the necessity of the hydrogen atoms at these positions for high bioactivity. This can be validated by the fact that compounds 12-36 with three -NH groups have higher bioactivity than the other compounds with two -NH groups.
The HBA contour map in combination with compound 18 is shown in Figure 4e, in which magenta and red contours represent areas where HBA substituents are favored and disfavored, respectively. Two large magenta contours are located near the oxygen atoms of -CO-and -SO2-groups and one large magenta contour is next to the nitrogen atom of the pyridine ring, which implies that the two oxygen atoms and one nitrogen atom are HBA atoms. Figure 4e also shows there is a large red contour near the nitrogen atom of the imidazole ring, which indicates that the nitrogen is unfavorable to the activity. This is proved by the lower activity of compounds 12 and 13 with a nitrogen atom, compared with the higher activity of compounds 34 and 35 with no nitrogen atom, respectively, which is consistent with the result derived from electrostatic contour map.

Pharmacophore Model Validation
In order to validate the pharmacophore model in the virtual screening, the QFIT (pharmacophoric match between query and the hit compound) values of all the 39 imidazopyridine inhibitors were tested by using the obtained best pharmacophore model (Model_06), and the correlations between the QFIT values and pIC50 values of the 39 inhibitors are depicted in Figure 5. It can be seen that 37 of 39 inhibitors have high QFIT values (QFIT > 45), which indicates that the Model_06 is a potent pharmacophore model in the virtual screening. In order to screen the dataset effectively, the QFIT values of hit compounds were set to more than 45 in this study.

Docking Model Validation
In order to validate the docking model in the virtual screening, the C_score values of all the 39 imidazopyridine inhibitors were assessed by using Surflex-Dock, and the correlations between the C_score values and pIC50 values of the 39 inhibitors are depicted in Figure 6. It can be shown that all the 39 inhibitors have high C_score values (C_score > 5.0), which suggests that the protomol generated by Surflex-Dock is an effective docking model in the virtual screening. In order to screen the dataset effectively, the C_score values of hit compounds were set to more than 5.0 in this study.

Screening of NCI2000 Database
The obtained best GALAHAD model (Model_06) was converted into an UNITY query, which was screened against NCI2000 database (234,054 compounds). The "3D Search" option was implemented to perform virtual screening, primary filters such as Lipinski's rule of five and Van der Waals bumps were applied to reduce the dataset [26], and the QFIT values of hit compounds were set to more than 45. The screening of the pharmacophore query yielded eight hit compounds that met the specific requirements. The eight hit compounds were further subjected to molecular docking by using the Sulflex-Dock. Three compounds were selected based on the docking C_score values (C_score > 5.0). The pIC50 values of the three hit compounds were predicted by the obtained CoMSIA model based on pharmacophore alignment. Chemical structures and predicted pIC50 values of the three hit compounds are listed in Table 5. The three hit compounds show quite good predicted pIC50 values (pIC50 > 7.6), which are expected to design novel BRIs with new skeleton.

Compounds and Biological Data
A series of imidazopyridines as BRIs were used for this study, and their structures and bioactivity values are shown in Table 1 [23]. The pIC50 (−log IC50) values were used to derive 3D QSAR models. The whole data set of 39 compounds was divided into two groups in an approximate ratio of 3:1; a training set with 29 compounds, and a test set with 10 compounds ( Table 1). The selection of the training and test sets was performed manually such that high, moderate and low activity compounds were selected in roughly equal proportions in both sets. The training set was used to build predictive models, while the test set was used to validate the predictive ability of the models.

Molecular Modeling
The SYBYL 7.3 molecular modeling package from Tripos Inc., St. Louis, MO, USA, which was installed on Red Hat Linux workstations [27], was used to perform 3D QSAR modeling analyses, calculations and visualizations. Identification of the bioactive conformation is a very crucial step in a 3D QSAR study [28], so the conformation of compound 4 was obtained from the protein data bank (PDB code: 4MBJ) [23], where the compound 4 is combined with B-Raf kinase. The compound 4 was modified by adding hydrogen atoms after extracted from the complex, and then minimized with the following three steps: (i) using Steepest Descent with initial optimization of 200 simplex iterations in the condition of Tripos force field and Gasteiger-Marsili charge to optimize the energy; (ii) using conjugate gradient to optimize the energy; and (iii) using BFGS to optimize the energy [29]. Each conformation of other inhibitors was constructed based on the compound 4, and energy minimizations were performed according to the above procedure.

Pharmacophore Hypothesis
The pharmacophore hypothesis includes two main stages: firstly, the ligands are aligned to each other in internal coordinate space; secondly, the produced conformations are aligned in Cartesian space, which was performed by using GALAHAD module of SYBYL. The features used to generate the pharmacophore model include HBD atoms, HBA atoms, hydrophobic and charged centers [30][31][32]. Eight compounds with high activity (Table 1) were selected to create the pharmacophore hypothesis in the current study, and the conformers for all molecules were generated by genetic algorithm.

Molecular Docking
The molecular docking was carried out by using the Surflex-Dock module of SYBYL, and all parameters were set with default values in the whole process. All the molecules were docked to the binding site of B-Raf kinase crystal structure in complex with compound 4 (PDB code: 4MBJ) [23]. Before docking, the ligand was extracted, all the water molecules were removed and hydrogen atoms were added to the receptor. The protomol was generated using the docking based method; with the ligand location in the same coordinate space in the receptor. In our study, each conformer of all 39 inhibitors was docked into the binding site 10 times and the C_score values were used to evaluate the docking analysis. The top ranked conformations for each molecule were extracted and aligned together for the subsequent 3D QSAR study [33]. The Surflex-Dock was also used to filter the hit compounds in the virtual screening.

Molecular Alignment
The 3D molecular alignment plays a very important role in 3D QSAR studies, and affects the outcome of the CoMFA and CoMSIA statistical analysis. There are three main different molecular alignments for 3D QSAR: maximum common substructures overlap, pharmacophore-based alignment and docking-based alignment [29]. In order to get reasonable results, both of the pharmacophore-based and docking-based alignment procedures were performed in our study. Pharmacophore-based alignment was performed using GALAHAD and docking-based alignment was done by using Surlflex-Dock.

CoMFA and CoMSIA Models
CoMFA models use a Lennard-Jones potential to calculate steric fields and a Coulombic potential to compute electrostatic fields. During the calculation of CoMFA fields, a 3D cubic lattice with grid spacing of 2.0 Å in three-dimensional directions was generated by SYBYL, and the grid pattern stretched 4.0 Å in all directions of each molecule. A sp 3 carbon probe atom as steric probe and a +1.0 charge as an electrostatic probe were taken to calculate the probe-ligand interaction energies at each lattice point. The electrostatic contributions were ignored at lattice points with maximal steric interaction and the cut-off for energies was set to ±30 kcal/mol [21]. However, five different similarity fields (steric, electrostatic, hydrophobic, HBD, and HBA) were computed in CoMSIA models. The same lattice box as in CoMFA was used to derive CoMSIA models, in which a probe of charge +1, a radius of 1, hydrophobicity and hydrogen bonding properties of +1 were use to calculate the five fields, and an attenuation factor was set to 0.3 for the Gaussian distance-dependent function [22].

Statistical Analysis
The pIC50 values were used as dependent variables and CoMFA and CoMSIA descriptors as independent variables in the 3D QSAR models. The optimal number of components was obtained according to q 2 (r 2 cv) when the partial least squares (PLS) method with cross-validation (leave-one-out) was used in SYBYL. Based on the obtained optimal number of components, the final model was generated with the training set after a PLS analysis was performed with no validation and column filtering 2.0. The quality of the 3D QSAR models can be evaluated by the obtained q 2 and the predictive capability of the models can be determined by r 2 pred. The predicted activities for the test set were obtained from the model produced by the training set.

Conclusions
B-Raf kinase has proven to be an important target for treatment of cancers. In order to design and search for potent BRIs, a combined pharmacophore modeling, 3D QSAR and virtual screening studies on imidazopyridines were performed. Pharmacophore models were derived from eight compounds with high activity and diverse structure by using GALAHAD, and the best pharmacophore model obtained included two acceptor atoms, three donor atoms and three hydrophobes. 3D QSAR techniques based on both pharmacophore and docking alignments, CoMFA and CoMSIA, were applied for the 39 imidazopyridine BRIs. The CoMSIA model obtained from the pharmacophore-based alignment showed the best result (q 2 = 0.621, r 2 pred = 0.885), and the CoMSIA contour maps indicated that the phenyl group attached to the imidazole ring and -NH, -CO-, -SO2-groups in imidazopyridine molecules can increase the inhibitory activity, while the nitrogen atom of the imidazole ring may reduce the inhibitory activity. In addition, the best pharmacophore model obtained was used for virtual screening against the NCI2000 database, and eight hit compounds were obtained. Three compounds were selected using molecular docking, which showed perfect predicted pIC50 values. The present pharmacophore modeling, 3D QSAR, and virtual screening approach provides useful information to design and synthesize novel BRIs.