3D QSAR Studies, Pharmacophore Modeling and Virtual Screening on a Series of Steroidal Aromatase Inhibitors

Aromatase inhibitors are the most important targets in treatment of estrogen-dependent cancers. In order to search for potent steroidal aromatase inhibitors (SAIs) with lower side effects and overcome cellular resistance, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on a series of SAIs to build 3D QSAR models. The reliable and predictive CoMFA and CoMSIA models were obtained with statistical results (CoMFA: q2 = 0.636, r2ncv = 0.988, r2pred = 0.658; CoMSIA: q2 = 0.843, r2ncv = 0.989, r2pred = 0.601). This 3D QSAR approach provides significant insights that can be used to develop novel and potent SAIs. In addition, Genetic algorithm with linear assignment of hypermolecular alignment of database (GALAHAD) was used to derive 3D pharmacophore models. The selected pharmacophore model contains two acceptor atoms and four hydrophobic centers, which was used as a 3D query for virtual screening against NCI2000 database. Six hit compounds were obtained and their biological activities were further predicted by the CoMFA and CoMSIA models, which are expected to design potent and novel SAIs.


Introduction
Aromatase is a cytochrome P-450 dependent enzyme that catalyzes the aromatization of androgens to estrogens. Aromatase inhibitors (AIs) reduce the synthesis of estrogens and offer a therapeutic alternative for the treatment of estrogen-dependent cancers, such as breast cancer [1][2][3]. There are two classes of AIs, steroidal and non-steroidal compounds, which cause potent estrogen suppression [4]. The non-steroidal aromatase inhibitors (NSAIs) are mostly azole type compounds, such as the clinically used anastrozole and letrozole, which compete with the substrate for binding to the enzyme active site [5]. Among steroidal aromatase inhibitors (SAIs), formestane was widely used during the early 1990s, but it is not used nowadays because of the need to administer it by intramuscular injection. Therefore, the orally active exemestane is the main steroidal inhibitor [6]. These SAIs mimic the natural substrate androstenedione and are converted by the enzyme to reactive intermediates, which bind irreversibly to the enzyme active site, resulting in inactivation of aromatase [7]. Despite the success of the third-generation NSAIs (anastrazole and letrozole) and SAIs (exemestane), they still have some major side effects, such as increase of bone loss, joint pain, and heart problems [8]. In addition, after some years of usage they can develop cellular resistance. For these reasons, it is important to search for other potent and specific molecules with lower side effects and which can overcome the resistance phenomena.
Quantitative structure-activity relationship (QSAR) methods have been successfully employed to assist the design of new small molecule drug candidates [9][10][11][12][13][14][15][16]. Comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) are two of the most widely used three-dimensional quantitative structure-activity relationship (3D QSAR) methodologies. CoMFA calculates the energies of steric and electrostatic interactions between the compound and the probe atom at various intersections of a regular 3D lattice according to Lennard-Jones and Coulomb potentials. The resulting energies derived from these two potential functions can be contoured to offer a quantitative spatial description of the molecular properties [17]. CoMSIA introduces the Gaussian function for the distance dependence between the molecular atoms and the probe atom in order to avoid some inherent deficiencies arising from the Lennard-Jones and Coulomb potential functional forms. CoMSIA is applied to gain an insight into how steric fields, electrostatic fields, hydrophobic fields, hydrogen bond donor (HBD) and hydrogen bond acceptor (HBA) influence the activity of inhibitors [18].
Pharmacophore modeling can provide valuable insight into ligand-receptor interactions. Pharmacophore searches are the best option to find a range of chemical structures with viable features. A pharmacophore model can be considered as the ensemble of steric and electrostatic features of different compounds, which are necessary to ensure optimal supramolecular interactions with a specific biological target structure and to trigger or to block its biological response. Thus, pharmacophore modeling is the method of choice for the first round of compound selection. This ability of a pharmacophore model is used to find new classes of inhibitors when one class is known. This is known as "scaffold hopping" [19][20][21].
A series of SAIs, shown in Table 1, have been reported in the recent literatures [22][23][24][25][26][27]. To understand the structural requirements for inhibitory activity and design more potent agents, 3D QSAR studies were performed for the fist time for these SAIs using CoMFA and CoMSIA. In addition, 3D pharmacophore models were created and the selected best model was used as a 3D query for virtual screening against NCI2000 database. The biological activities of hit compounds were further predicted by using CoMFA and CoMSIA models.

CoMFA and CoMSIA Statistical Results
The statistical parameters of standard CoMFA models constructed with steric and electrostatic fields are given in Table 2. The cross-validated coefficient (q 2 /r 2 cv), non-cross-validated correlation coefficient (r 2 ncv), standard error estimate (SEE) and F-statistic values (F) were computed as defined in SYBYL. The cross-validated (leave-one-out) PLS analysis shows a q 2 value of 0.636 with ten components and non-cross-validated PLS analysis results in r 2 ncv of 0.988, SEE of 0.094 and F value of 309.026, which indicates it is a model with high quality. The corresponding field contributions of steric and electrostatic are 0.671 and 0.329, respectively, which means the steric field gives more contribution to the bioactivity than the electrostatic field does. Commonly, in CoMSIA, five different similarity fields (steric, electrostatic, hydrophobic, HBD, and HBA) are calculated. However, for CoMSIA models, the model with global descriptors is not the best model in all probability. Some papers [28,29] have discussed whether the five different descriptor fields in CoMSIA are totally independent of each other. The dependencies of the individual fields usually decrease the signal-to-noise ratio in the data [29] and lower the statistical significance of the results. Therefore, in our study, an optimization of 31 possible combinations of five different descriptor fields was evaluated from the values of the q 2 , which is shown in Figure 1. The higher the value of q 2 shows, the better the model is. With the highest q 2 of 0.843, CoMSIA model with the combination of steric and HBA fields (SA) was finally chosen as the best model, which indicates that the steric and HBA fields mainly contribute to the binding affinities. This CoMSIA (SA) model was obtained with seventeen optimal components. The analysis results are summarized in Table 2 (q 2 = 0.843, r 2 ncv = 0.989, F = 174.304, SEE = 0.096). The corresponding field contributions of steric and HBA are 0.677 and 0.323, respectively, which means that the steric field provides more bioactivity contribution than the HBA field does.

Validation of 3D QSAR Models
In order to validate the obtained 3D QSAR models, r 2 pred was used to determine the predictive abilities of the CoMFA and CoMSIA models from the 16 compounds (test set), which were not included in the generation of the models. The obtained r 2 pred of the test set is 0.658, 0.601 for the CoMFA, CoMSIA model, respectively, which indicates that both models have good predictive ability. The observed and predicted pIC50 of the training and test sets by the CoMFA and CoMSIA models are listed in Table 3, and the correlations between the observed and predicted pIC50 of training and test sets are depicted in Figure 2 for CoMFA model, Figure 3 for CoMSIA model, respectively.      Observed pIC 50 Predicted pIC50

CoMFA Contour Maps
The steric contour map for the CoMFA model with the most active inhibitor compound 7 is shown in Figure 4a, in which the green contours represent regions of high steric bulk tolerance (80% contribution), while the yellow contours represent regions of low steric bulk tolerance (20% contribution). It can be seen that a large green contour near C-6 of the ring B indicates a bulky group in this position is favorable to bioactivity. This is supported by the higher activity of compounds 6-11, which have large substituents in that position, compared with the lower activity of compounds 4-5, which have small substituents in that position. It also can be observed that there are two large yellow contours: one is next to C-3 of the ring A, and another is near C-15 and C-16 of the ring D, which suggests that a bulky group in the two regions will decrease inhibitory activity. It is confirmed by the fact that compound 17 with bulky substitution on C-3 of the ring A has lower bioactivity than compound 14 with small substitution in that position, and all the compounds with bulky groups near C-15 and C-16 of the ring D (compounds 1-3, compounds 29-66) show low bioactivities (pIC50 < 6).
The CoMFA electrostatic contour map is shown in Figure 4b, in which the red areas are the regions where a negative potential is favorable to activity, while a negative potential is unfavorable in the blue areas. This figure displays two red contours around C-3 of the ring A and one red contour near C-4 of the ring A, which means that the bioactivity can be enhanced if an electronegative atom is present in these areas. It is supported that most of the studied compounds have electronegative atoms on C-3 or C-4 of the ring A. In addition, it also can be confirmed by the fact that compound 22, 24 with oxygen atom between C-3 and C-4 show higher activity than compound 21, 23 with no oxygen atom in that position, respectively.

CoMSIA Contour Maps
The steric and HBA fields are shown in Figure 5. For each field, the favorable and disfavored contours represent 80% and 20% level contributions, respectively. Figure 5a shows the steric contour map for the CoMSIA model with the most active inhibitor compound 7, in which green regions are sterically favorable and yellow regions are sterically unfavorable. It is clear that the distribution of steric field in CoMSIA is basically consistent with CoMFA results. A little difference is that there is a yellow contour below the large green contour, which indicates a proper bulky substituent on C-6 of the ring B, is favorable to bioactivity while too large substituent in that position is unfavorable. This is supported by the case that the inhibitory activities of compounds 7-10 decline with the augment of the substituents on C-6 of the ring B. Figure 5b shows the HBA contour maps, in which magenta and red contours represent areas where HBA substituents are favored and disfavored, respectively. There are there magenta contours in this figure: one is next to C-3 of the ring A, another is near C-4 of the ring A, and the third one is close to C-17 of the ring D, which indicates that the activity can be enhanced if the HBA atoms are present in these there positions. This is also consistent with the distribution of electrostatic field in CoMFA. It is supported by the case that all the studied compounds have at least one HBA atom in those positions. It also can be confirmed by the example that compound 13 (Formestane) show quite high bioactivity because of its three HBA atoms in those positions.

Pharmacophore Generation
Twenty pharmacophore models were generated with default parameters after Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD) run, and their statistical values are listed in Table 4. Each of the obtained models represents a different tradeoff among the conflicting demands of maximizing steric consensus, maximizing pharmacophore consensus, and minimizing energy. All the twenty models had Pareto rank 0, which means no one model is superior to any other one. Model_03 has very high energy, which is recognized that a high-energy value is due to steric clashes [30]. Small value of energy and high values of Specificity, N_hits, Sterics and Mol_Qry are desired for the best model [31]. Therefore, Model_04 was considered to be the best model and its statistical values are shown in Table 4. This model contains two acceptor atoms and four hydrophobic centers, which is shown in Figure 6 and will be converted into a 3D query for the further virtual screening studies. This pharmacophore model is consistent with the results of CoMFA and CoMSIA. Four hydrophobic centers mean the centers of A, B, C and D rings of steroids, and two acceptor atoms next to C-3 of the ring A and C-17 of the ring D indicate that hydrogen bond acceptor atoms in these two positions can enhance the activity.   Figure 6. The selected GALAHAD model includes two acceptor atoms (green) and four hydrophobic centers (cyan). The sphere sizes indicate query tolerances.

Virtual Screening
The obtained best GALAHAD model ( Figure 6) was converted into a UNITY query, which was screened against NCI2000 database. The "flexible database search" option was implemented to perform virtual screening. Primary filters such as Lipinski's rule of five, Van der Waals bumps, and QFIT (pharmacophoric match between query and the hit compound) were applied to reduce the dataset [32]. The screening of the pharmacophore query yielded six hit compounds that met the specific requirements. The pIC50 values of the six hit compounds were further predicted using the obtained CoMFA and CoMSIA models. Chemical structures and their predicted activity values of the hit compounds are listed in Table 5. Among the hit compounds, compounds NCI 77798 and NCI 79104 show high-predicted pIC50 values by both CoMFA and CoMSIA models (pIC50 > 6.5), which are expected to design potent and novel SAIs.

Compounds and Biological Data
Compounds 1-3 [22], compounds 4-13 [23], compounds 14-17 [24], compounds 18-24 [25], compounds 25-28 [26] and compounds 29-66 [27] were used for this analysis, and their structures and bioactivity values are presented in Table 1. The pIC50 (−log IC50) values were used to derive 3D QSAR models. The pIC50 values of the studied compounds cover an interval of more than 3 log units. The whole data set of 66 compounds was divided into two groups in approximate ratio of 4:1; A training set with 50 compounds, a test set with 16 compounds ( Table 1). The selection of the training and test sets was done manually such that low, moderate and high activity compounds were present in roughly equal proportions in both sets. The training set was used to build predictive models, while the test set was used to validate the predictive ability of the models.

Molecular Modeling and Alignment
The 3D QSAR modeling analyses, calculations and visualizations were performed using the SYBYL 7.3 molecular modeling package from Tripos Inc., St. Louis, Mo, USA, installed on Red Hat Linux workstations. Identification of the bioactive conformation is a very important step in a 3D QSAR study [33]. Among the inhibitors, the crystal structure of compound 7 bound with aromatase is available from the protein data bank (PDB code: 4GL7) [23]. Therefore, compound 7 was extracted from the complex and chosen to define the most likely binding conformation, which was modified by adding hydrogen atoms without any change of conformation and was minimized with the following steps: (i) Optimization by Steepest Descent with initial optimization of 200 simplex iterations using Tripos force field and Gasteiger-Marsili charges; (ii) Optimization by conjugate gradient; and (iii) Optimization by BFGS [34]. Three-dimensional structures of the other molecules were constructed from the compound 7. Energy minimizations of each compound were performed according to the above procedure. In the present study, compound 7 was used as a template because of its highest activity and all other compounds were aligned on the basis of the common structure shown in Figure 7. The alignment of training and test compounds is shown in Figure 8.

CoMFA and CoMSIA Models
In CoMFA, the steric fields were calculated using a Lennard-Jones potential, while the electrostatic fields were calculated using a Coulombic potential. To calculate the CoMFA fields, a 3D cubic lattice with grid spacing of 2.0 Å in X, Y and Z directions was created automatically by SYBYL. The grid pattern extended 4.0 Å units in all directions beyond the dimensions of each molecule. The steric and electrostatic probe-ligand interaction energies were calculated by using a sp 3 carbon probe atom and a +1.0 charge with a distance-dependent dielectric function at each lattice point. The cut-off for energies was set to ± 30 kcal/mol and the electrostatic contributions were ignored at lattice points with maximal steric interactions [17]. In CoMSIA, five different similarity fields (steric, electrostatic, hydrophobic, HBD, and HBA) were calculated. CoMSIA models were also derived with the same lattice box and all five fields were calculated using a probe of charge +1, a radius of 1, hydrophobicity and hydrogen bonding properties of +1, and an attenuation factor of 0.3 for the Gaussian distancedependent function [18].

Statistical Analysis
In order to derive 3D QSAR models, CoMFA and CoMSIA descriptors were used as independent variables and the pIC50 values as the dependent variables. PLS method with cross-validation (leave-one-out) was used in SYBYL to determine the optimal numbers of components by using cross-validated coefficient q 2 (r 2 cv). After obtaining the optimal numbers of components, a PLS analysis was performed with no validation and column filtering 2.0 to generate the final model with the training set. The obtained final non-cross-validated correlation coefficient (r 2 ncv) is a measure of the quality of the model. The predictive capability of the 3D QSAR models was determined from the predictive correlation (r 2 pred). The predicted activities for the test set were obtained from the model produced by the training set.

Pharmacophore Hypothesis
The pharmacophore hypothesis was generated using GALAHAD module of SYBYL, which operates in two main stages: The ligands are aligned to each other in internal coordinate space, and then the conformations produced are aligned in Cartesian space. The feature considered in developing the pharmacophore model includes HBD atoms, HBA atoms, hydrophobic and charged centers [35][36][37]. In our study, eight compounds shown in Table 1 were selected to carry out the pharmacophore hypothesis and the genetic algorithm was used to create conformers for all molecules. The compounds selected to generate the pharmacophore hypothesis are highly active and structurally diverse.

Conclusions
Aromatase inhibitors have proven to be the most important targets for treatment of estrogen-dependent cancers. In order to search for more potent SAIs with lower side effects and overcome the drug resistance, 3D QSAR studies, pharmacophore modeling and virtual screening were performed. The 3D QSAR techniques, CoMFA and CoMSIA, were applied for the first time to 66 new-synthesized SAIs, and the obtained models show good statistical results, q 2 = 0.636, r 2 ncv = 0.988, r 2 pred = 0.658 for CoMFA and q 2 = 0.843, r 2 ncv = 0.989, r 2 pred = 0.601 for CoMSIA, which indicate that both of the models have good quality and predictive ability. CoMFA and CoMSIA contour maps show that a proper bulky group near C-6 is favorable for activity while a bulky group near C-3, C-15 and C-16 is unfavorable, and the hydrogen bond acceptor atoms on C-3, C-4 and C-17 can enhance the bioactivity. In addition, pharmacophore models were derived from eight highly active and structurally diverse compounds using GALAHAD. The obtained best pharmacophore model includes two acceptor atoms and four hydrophobic centers, which was used as a query to search NCI2000 database. Six hit compounds were obtained after the screening of the pharmacophore query, and their pIC50 values were further predicted using the obtained CoMFA and CoMSIA models, which are expected to design potent and novel SAIs. The present 3D QSAR, pharmacophore modeling, and virtual screening approach provide useful information to design and synthesize potent and novel SAIs.

Statements
As shown in a series of recent publications [38][39][40][41][42][43][44][45] in response to the call in a comprehensive review [46], user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful prediction and modeling methods or demonstrating new and novel findings, we shall make efforts in our future work to provide a web-server for the approach and finding presented in this paper.