Combined 3D-QSAR Modeling and Molecular Docking Studies on Pyrrole-Indolin-2-ones as Aurora A Kinase Inhibitors

Aurora kinases have emerged as attractive targets for the design of anticancer drugs. 3D-QSAR (comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA)) and Surflex-docking studies were performed on a series of pyrrole-indoline-2-ones as Aurora A inhibitors. The CoMFA and CoMSIA models using 25 inhibitors in the training set gave r2cv values of 0.726 and 0.566, and r2 values of 0.972 and 0.984, respectively. The adapted alignment method with the suitable parameters resulted in reliable models. The contour maps produced by the CoMFA and CoMSIA models were employed to rationalize the key structural requirements responsible for the activity. Surflex-docking studies revealed that the sulfo group, secondary amine group on indolin-2-one, and carbonyl of 6,7-dihydro-1H-indol-4(5H)-one groups were significant for binding to the receptor, and some essential features were also identified. Based on the 3D-QSAR and docking results, a set of new molecules with high predicted activities were designed.


Introduction
The Aurora kinases are a family of highly conserved serine/threonine protein kinases that play a key role in regulating many pivotal processes of mitosis and completion of cell division [1][2][3][4][5]. The two major Aurora kinases, Aurora A and Aurora B, play distinct roles in mitosis, though they are very closely related in kinase domain sequence (71% identical) and have the identical residues lining the binding pocket for the ATP adenine ring [6]. Aurora A is involved in centrosome maturation and separation, bipolar spindle assembly, and mitotic entry, while Aurora B is essential for accurate chromosome segregation and cytokinesis [7]. In recent years, Aurora A and B have been actively pursued as anticancer targets for the discovery of new cancer chemotherapeutics [8]. Aurora A has especially been identified as a colon-cancer-associated kinase that is overexpressed in a wide range of human tumors such as breast, colorectal, ovarian, as well as glioma [9,10]. Thus, targeted inhibition of Aurora A has become an attractive therapeutic strategy in cancer therapy, and more than 10 Aurora inhibitors have entered early clinical assessment [11,12].
A series of pyrrole-indoline-2-ones with Aurora A inhibitory activities were reported [13]. These pyrrole-indoline-2-ones, with excellent Aurora A inhibitory activities, were designed and synthesized by sharing a similar scaffold with Hesperadin [14]. In the present study, three-dimensional quantitative structure-activity relationship (3D-QSAR) methods along with docking approaches were used to explore the structure-activity relationship (SAR) of these pyrrole-indoline-2-ones. 3D-QSAR methods, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA), were performed to foresee the activities of these pyrrole-indoline-2-ones and offered the regions where interactive fields (steric, electrostatic, hydrophobic, hydrogen bond donor and hydrogen bond acceptor fields) may increase or decrease the activity. Surflex-Docking was applied to study the interactions between 35 pyrrole-indoline-2-ones and Aurora A. These developed models can help understanding the SAR of the pyrrole-indoline-2-ones and can also serve as a valuable guide for the design of novel inhibitors with robust potency. Furthermore, we have designed a number of new pyrrole-indoline-2-ones derivatives by utilizing the structure information obtained from the CoMFA and CoMSIA models, which exhibit excellent predictive potencies. Moreover, based on the admirable performance of docking studies, the predicted activities of these newly designed molecules may be reliable.

Data Sets
All of the 35 compounds and associated data involved in this study were obtained from literature [13]. The inhibitory activity data were reported as IC 50 against Aurora A. The IC 50 values were converted into pIC 50 according to the formula in Equation 1 [15]. In CoMFA and CoMSIA models, the dataset was randomly divided into training and test sets including 25 and 10 molecules, respectively. The structures of the molecules are shown in Table 1 and associated inhibitory activities  are shown in Table 2, where pIC 50 values for 35 inhibitors ranged from 5.611 to 8.073.

Molecular Modeling and Database Alignment
Molecular modeling and database alignment were performed by using the molecular modeling package SYBYL 8.1 Tripos, Inc. [16]. The three-dimensional structures of all pyrrole-indoline-2-ones were constructed by using the Sketch Molecule module. Energy minimization of each structure was performed using the SYBYL energy minimizer Tripos force field and Gasteiger-Hückel charge. The maximum iterations for the minimization was set to 2000. The minimization was terminated when the energy gradient convergence criterion of 0.05 kcal/mol·Å was reached [17]. Molecular alignment was considered as one of the most sensitive parameters in 3D-QSAR analyses [18]. The quality and the predictive ability of the model are directly dependent on the alignment rule [19]. In this paper, all of the structures were aligned into a lattice box by fitting with (Z)-3-((1H-pyrrol-2-yl)methylene)indolin-2-one ( Figure 1) as a common structure using compound 20 as a template, which was the most active compound. The aligned molecules are shown in Figure 2.

CoMFA and CoMSIA Setup
CoMFA is a widely used 3D-QSAR method, which relates the biological activity of a series of molecules with their steric and electrostatic fields. The CoMFA descriptor fields were calculated at each lattice with a grid spacing of 1 Å and extending to 4 Å units in all three dimensions within the defined region [20]. The Van Der Waals potentials and Coulombic terms, which represent steric and electrostatic fields, respectively, were calculated by using the standard Tripos force field. In the CoMFA method, a sp 3 hybridized carbon atom with a charge of 1e was used as a probe atom, the energy values of the steric and electrostatic fields were truncated at 30 kcal/mol [21].
The steric, electrostatic, hydrophobic, hydrogen bond donor and hydrogen bond acceptor CoMSIA potential fields were calculated at each lattice intersection of a regularly spaced grid of 1 Å and extending to 4 Å using a probe atom with radius 1.0 Å, +1.0 charge, and hydrophobic and hydrogen bond properties of +1. The attenuation factor was set to the default value of 0.3 [22].

Regression Analysis and Models Validation
The partial least-squares (PLS) approach, an extension of multiple regression analysis, was applied to linearly correlate the CoMFA and CoMSIA fields to the pIC 50 values. CoMFA and CoMSIA descriptors were used as the independent variables. Column filtering was used at the default value of 2.0 kcal/mol in the cross-validation part.
The cross-validation analysis was performed using the leave-one-out (LOO) method in which one molecule was omitted from the dataset. The activity of the omitted molecule was then predicted by using the model derived from the rest of the dataset [23]. The leave-one-out (LOO) cross-validation method could check the predictivity of the obtained model and identify the optimum number of components (ONC). Thus, the optimum number of components (ONC) was the number of components that led to the highest cross-validated correlated correlation coefficient r 2 (r 2 cv ) [24]. Finally, the CoMFA and CoMSIA models were generated using non-cross-validated PLS analysis with the optimum number of components (ONC) determined by the cross-validation.

Predictive Correlation Co-efficient (r 2 pred )
The predictive abilities of 3D-QSAR models were validated by predicting the activities of a test set of 10 compounds that were not included in the training set. These molecules were aligned to the template and their pIC 50 values were predicted by the produced models which were obtained using the training set. The predictive correlation coefficient (r 2 pred ), based on the molecules of the test set, was calculated using Equation (2): In this equation, SD is the sum of the squared deviations between the inhibitory activities of the test set and the mean activity of the training molecules and PRESS is the sum of squared deviations between predicted and actual activity values for each molecule in the test set [25].

Molecular Docking
The Surflex-Dock was applied to study molecular docking by using an empirical scoring function and a patented search engine to dock ligands into a protein's binding site [16]. The crystal structure of Aurora A was retrieved from the RCSB Protein Data Bank (PDB entry code: 2X6E). The ligands were docked into corresponding protein's binding site by an empirical scoring function and a patented search engine in Surflex-Dock [16]. All ligands and water molecules in Aurora A 2X6E have been deleted and the polar hydrogen atoms were added to 2X6E. Protomol, a representation of a ligand making every potential interaction with the binding site, was applied to guide molecular docking. Protomols could be established by three manners: (1) Automatic: Surflex-Dock finds the largest cavity in the receptor protein; (2) Ligand: a ligand in the same coordinate space as the receptor; (3) Residues: specified residues in the receptor [26,27].
In this paper, the automatic docking was applied. The Aurora A structure was utilized in subsequent docking experiments without energy minimization. Other parameters were established by default in the software. Surflex-Dock scores (total scores) were expressed in -log 10 (K d ) units to represent binding affinities. Then, the MOLCAD (Molecular Computer Aided Design) program was employed to visualize the binding mode between the protein and ligand. MOLCAD calculates and exhibits the surfaces of channels and cavities, as well as the separating surface between protein subunits [20][21][22]. MOLCAD program provides several types to create a molecular surface [16]. The fast Connolly method using a marching cube algorithm to engender the surface was applied in this work, thus the MOLECAD Robbin and Multi-Channel surfaces program exhibited with copious potentials were established. Moreover, Surflex-Dock total scores, which were expressed in -log 10 (K d ) units to represent binding affinities, were applied to estimate the ligand-receptor interactions of newly designed molecules.

CoMFA and CoMSIA Models
The statistical parameters for the CoMFA and CoMSIA models are given in Table 3. For the CoMFA model, partial least squares (PLS) regression produced a excellent cross-validated correlation coefficient (r 2 cv ) of 0.726 (>0.5) with an optimized component of 6, which suggesting that the model is reliable and it should be a useful tool for predicting the IC 50 values. The non cross-validated PLS analysis gave a high correlation coefficient (r 2 ) of 0.972, F value of 105.300 and a low standard error estimate (SEE) of 0.144. The contributions of steric and electrostatic fields to this model were 0.457 and 0.543, respectively. The predictive correlation coefficient (r 2 pred ) value based on molecules of the test set was 0.937 for the CoMFA model. The actual and predicted pIC 50 values of the training set and test set by the model are given in Table 2. The relationship between actual and predicted pIC 50 of the training set and test set compounds of the CoMFA model is illustrated in Figure 3a, where almost all points are located on the diagonal line.    Table 2. The graph of actual activity versus predicted pIC 50 of the training set and test set is illustrated in Figure 3b, where almost all points are located on the diagonal line.

CoMFA and CoMSIA Contour Maps
The results of the CoMFA and CoMSIA models were visualized through contour maps. These maps showed regions in 3D space where variation in specific molecular properties increased or decreased the activity. The molecular fields around the most active compound 20 are displayed in Figures 4-6, accordingly. These contour maps are significant for drug design, as they showed regions in 3D space where modifications of the molecular fields strongly correlated with concomitant changes in biological activity.
The steric contour map of CoMFA is shown in Figure 4a, which was almost the same as the corresponding CoMSIA steric contour map (Figure 4b). Compound 20 was selected as a reference molecule. The steric field was represented by green and yellow contours, in which green contours indicate regions where presence of bulky steric groups was favored and should enhance inhibitory activity of molecules, while the yellow contours represent regions where occupancy of steric groups was unfavorable. As shown in Figure 4, the presence of the green contour around the R 1 position suggested that a bulky group at this region would be favorable. By checking up all the R 1 modified compounds, it was found that derivatives 07-08 have the activity order of 07 (R 1 = Br) > 08 (R 1 = NO 2 ); compounds 13, 14, 17 have the activity order of 14 (R 1 = -SO 2 CH 2 CHCH 2 ) > 13 (R 1 = -SO 2 C 2 H 5 ) > 17 (R 1 = -SO 2 NH 2 ); compounds 17-19 have the activity order of 20 (R 1 = sulfo-pyrrolidine) > 19 (R 1 = -SO 2 N(CH 3 ) 2 ) > 18 (R 1 = -SO 2 NHCH 3 ) > 17 (R 1 = -SO 2 NH 2 ); compounds 23-26 have the activity order of 23 (R 1 = -NHSO 2 C 2 H 5 ) < 24 (R 1 = -NHSO 2 -benzene), 25 (R 1 = -NHSO 2 -CH 2 -benzene) < 26 (R 1 = -NHSO 2 -benzene). These were satisfactory according to the steric contour map. The R 2 was surrounded by three yellow contours, which suggested a bulky group at this region would decrease the inhibitory activity. This may explain why compounds 1-2, 5, which possessed a relative bulky group (e.g., -COOEt) at R 1 , showed significantly decreased activities than other compounds with a relatively minor substituent at R 2 . For instance, derivative 24 bearing a carboxy group at R 2 exhibited improved potency than compound 26 with an ethoxycarbonyl at this position. Furthermore, compound 20 with carboxyl group at the R 2 position was the most inactive compound.
The electrostatic field contour maps of CoMFA and CoMSIA are shown in Figure 5a and b, respectively. Compound 20 was selected as a reference molecule again. The electrostatic field is indicated by blue and red contours, which demonstrate the regions where electron-donating group and electron-withdrawing group would be favorable, respectively. In the electrostatic field, two blue contours around the terminal of R 1 and two red contours at the middle of the R 1 revealed that the electron-donating substituents at the terminal of the R 1 and the electron-withdrawing groups at the middle of the R 1 were essential for the inhibitory activity. Take the compounds 13-22 and 24-26 (R 1 = 4-CF 3 -benzyl) for an example, the strong electron-withdrawing sulfo, and sulfanilamido group at the middle of R 1 and electron-donating Et, -NH 2 , -NHCH 3 , -N(CH 3 ) 2 , 1-pyrrolidine, 1-piperidine, and 4-morpholine groups at the terminal of R 1 in these compounds resulted in significantly increased activity. The blue contour near the chain between 6,7-dihydro-1H-inden-4(5H)-one and R 2 suggested the electron-donating group (-CH 2 CH 2 -) at this position may be essential for potency. All of the derivatives involved in this study possessed a -CH 2 CH 2 -group at this site, which revealed the extreme importance of the electron-donating substituent. In hydrophobic fields, yellow and white contours highlighted areas where hydrophobic and hydrophilic properties were favored. In Figure 6a, the yellow contour around the chain of R 1 position indicated that a hydrophobic substituent would benefit the potency. Most of the derivatives involved in this study possessed a hydrophobic group at this site, which revealed the extreme importance of the hydrophobic substituent. Furthermore, the actual IC 50 of these compounds was basically 10-fold than those without a hydrophobic group. A huge white contour near the terminal of R 1 site suggested that a hydrophilic group may be favored. This may explain why derivative 20 with a relatively more hydrophilic N atom at this position exhibited better potencies than compounds 25-33. Another white contour around R 2 position demonstrated that a hydrophilic substituent carboxyl would be favorable. Most of the compounds possessed a hydrophilic substituent at this site, except compounds 1-2 with a hydrophobic -OOC 2 H 5 at R 2 , which displayed lower activity than compounds 3-25.
In hydrogen bond donor field, the cyan and purple contours indicated favorable and unfavorable hydrogen bond donor groups. In Figure 6b, the two bulk purple contours near the R 1 position revealed that hydrogen bond acceptor groups may benefit the potency. In fact the sulfo group and quaternary amine atom at this position acted as hydrogen bond acceptor by forming H-bonds with residues of the ATP binding site of Aurora A. This may explain why compounds 19-22 displayed relatively better activities. Likewise, a bulk purple contour near R 2 revealed that they acted as hydrogen bond acceptor by forming H-bonds with residues of the ATP binding site of Aurora A. Most of the derivatives involved in this study possessed a hydrogen bond acceptor group (carboxyl) at this site, which indicated the extreme importance of the hydrogen bond acceptor group substituent.
In hydrogen bond acceptor field, the magenta and red contours identified favorable and unfavorable positions for hydrogen bond acceptors. In Figure 6c, two bulk red contours around the R 1 and R 2 indicated that a hydrogen bond acceptor substituent at these sites would increase the activity. The inference obtained by Figure 6c satisfactorily matched the hydrogen bond donor contour map.

Docking Analysis
Surflex-Dock was applied to investigate the binding mode between these pyrrole-indoline-2-ones and Aurora A. In this paper, Surflex-Dock could also serve to inspect the stability of 3D-QSAR models previous established. To visualize secondary structure elements, the MOLCAD Robbin surfaces program was applied. Furthermore, the MOLCAD surface of ATP site was also developed and displayed with cavity depth (CD), electrostatic potential (EP) to further explore the interaction between these inhibitors and the receptor. The most potent inhibitor 20 was selected for more detailed research.
In Figure 7a, the hydrogen bonding (dashed lines) interactions between the reference compound 20 with highest inhibitory activity is shown and the key residues (Lys162, Asp274, Glu211, and Arg220) of the ATP site of Aurora A (PDB code 2X6E) are labeled. A total of five hydrogen bonds were formed: the sulfo at R 1 position acted as the hydrogen bond acceptor and formed two H-bonds with the secondary amino group of the Tys162 residue, and a H-bond with the secondary amino group of Asp274; the carbonyl substituent on the 6,7-dihydro-1H-inden-4(5H)-one scaffold in compound 20 also acted as the hydrogen bond acceptor and formed H-bond with primary amino group of Arg220 residue; the secondary amino on the 6,7-dihydro-1H-inden-4(5H)-one in compound 20 acted as H-bond donor and formed H-bond with carbonyl group of Glu211. These results observed by Figure  7a satisfactorily matched the observation taken from the CoMSIA hydrogen bond donor contour map.
In Figure 7b, the secondary structure of the ATP pocket within compound 20 is depicted: alpha helices are displayed as helices or cylinders, while beta sheets are shown as arrows and the loop regions as tubes. The key residues and hydrogen bonds (dashed lines) are labeled.
In Figure 7c, the MOLCAD Multi-Channel cavity depth potential surfaces structure of the binding site within the compound 20 is displayed and the cavity depth color ramp ranged from blue (low depth values = outside of the pocket) to light red (high depth values = cavities deep inside the pocket). In Figure 7c, the R 1 position of compound 20 is observed in a blue area, revealing that this position was embedded deep inside the ATP pocket. It can be simply inferred that a bulky group at R 1 position may be favorable. Since the R 2 site was oriented to a light red area, which illustrated a minor group was anchored into a favorable region, this suggests that minor groups may benefit the potency. The observation obtained by Figure 7c satisfactorily matched the corresponding CoMFA and CoMSIA steric contour maps. In Figure 7d, the MOLCAD electrostatic potential surface of the binding region was demonstrated with the color ramp for EP ranging from red (most positive) to purple (most negative). The 1-pyrrolidinyl group at the terminal of R 1 was found in a blue area, which indicated that electron-donating properties at this site were essential for the potency; the sulfo group was in a yellow area, which suggested that electron-withdrawing properties would be favored; the -CH 2 CH 2 -chain between 6,7-dihydro-1H-inden-4(5H)-one and carboxyl was anchored in a blue area which suggested that an electron-donating substituent at this position would be essential for the potency. These results were well compared with the corresponding CoMFA and CoMSIA electrostatic contour maps.

Design for New Molecules
The detailed contour map analysis of both COMFA and CoMSIA models and the docking analysis empowered us to identify structural requirements for the observed inhibitory activity (Figure 8). In detail, bulky, electron-donating, hydrophobic, and hydrogen bond acceptor substituents at the terminal of R 1 (e.g., pyrrolidine) would increase activity; bulky, electron-withdrawing, hydrophilic, and hydrogen bond acceptor substituents at the chain of the R 1 (e.g., sulfo group) are favored for inhibitory activity; minor, hydrophilic, hydrogen bond donor groups at the R 2 (e.g., carboxyl) may benefit potency. Moreover, the electron-donating chain (-CH 2 CH 2 -) between pyrrole and R 2 may be essential for the activity of the inhibitors. The chain of the R 1 , indolin-2-one, and 6,7-dihydro-1H-indol-4(5H)-one groups were crucial for binding to ATP pocket of Aurora A.
Based on QSAR and docking results, inhibitor 20, with the highest activity, was taken as a template to design new compounds. A set of nine new compounds with high predicted activity were designed and assessed (Table 4), and the graphs of their predicted pIC 50 values versus the most active compound 20 are shown in Figure 9. After energy minimization, the nine new compounds were docked into the ATP binding site of Aurora A. The total scores of these compounds were higher than that of the template molecule (Table 4). The designed molecule D8 was selected for more detailed investigation as one more H-bond from carbonyl on indolin-2-one group with Ala213 can be observed in Figure 10, where the compound D8 with the highest surflex-dock total score were docked into the ATP pocket of Aurora A.

Conclusion
We employed 3D-QSAR and docking methods to explore the structure-activity relationship of a series of pyrrole-indoline-2-ones as Aurora inhibitors. The 3D-QSAR models described herein possessed excellent consistency. The predictive ability of the models were manifested in the good correlation between actual and predicted pIC 50 values for the test molecules. The CoMFA and CoMSIA contour maps as well as the docking results provided enough information to understand the structure-activity relationship, identify structural features influencing the inhibitory activity, and furthermore, design new molecules. A number of novel pyrrole-indoline-2-ones derivatives were designed by using the SAR taken from the present study. The predicted activities of these newly designed pyrrole-indoline-2-ones may be reliable. The correlation of the results obtained from 3D-QSAR and docking studies can serve as a useful guideline for the further modification of pyrrole-indoline-2-ones that function as Aurora A inhibitors.