Structure-Based Discovery and Bioactivity Evaluation of Novel Aurora-A Kinase Inhibitors as Anticancer Agents via Docking-Based Comparative Intermolecular Contacts Analysis (dbCICA)

Aurora-A kinase plays a central role in mitosis, where aberrant activation contributes to cancer by promoting cell cycle progression, genomic instability, epithelial-mesenchymal transition, and cancer stemness. Aurora-A kinase inhibitors have shown encouraging results in clinical trials but have not gained Food and Drug Administration (FDA) approval. An innovative computational workflow named Docking-based Comparative Intermolecular Contacts Analysis (dbCICA) was applied—aiming to identify novel Aurora-A kinase inhibitors—using seventy-nine reported Aurora-A kinase inhibitors to specify the best possible docking settings needed to fit into the active-site binding pocket of Aurora-A kinase crystal structure, in a process that only potent ligands contact critical binding-site spots, distinct from those occupied by less-active ligands. Optimal dbCICA models were transformed into two corresponding pharmacophores. The optimal one, in capturing active hits and discarding inactive ones, validated by receiver operating characteristic analysis, was used as a virtual in-silico search query for screening new molecules from the National Cancer Institute database. A fluorescence resonance energy transfer (FRET)-based assay was used to assess the activity of captured molecules and five promising Aurora-A kinase inhibitors were identified. The activity was next validated using a cell culture anti-proliferative assay (MTT) and revealed a most potent lead 85(NCI 14040) molecule after 72 h of incubation, scoring IC50 values of 3.5–11.0 μM against PANC1 (pancreas), PC-3 (prostate), T-47D and MDA-MB-231 (breast)cancer cells, and showing favorable safety profiles (27.5 μM IC50 on fibroblasts). Our results provide new clues for further development of Aurora-A kinase inhibitors as anticancer molecules.


Introduction
Aurora kinases are a group of serine/threonine kinases fundamental for cell cycle control and mitosis. The first Aurora-kinase was discovered in Drosophila [1], where its mutations caused a failure of centrosome separation, leading to the formation of scattered monopolar spindles instead of bipolar ones; therefore, this kinase was named "Aurora" indicating resemblance to the northern lights Computer-aided structure-based modeling is a drug discovery procedure that depends on the existence of a three-dimensional (3D)-structure of the enzyme to be targeted [51,52]. Docking is a fundamental element for structure-based drug design approaches that involve virtual fitting of ligands into the relevant binding site. Forcefield-based algorithms are employed by docking engines in order to calculate the interactions between bonded groups within the virtual ligand-protein complex, either attractive or repulsive [51,53,54]. Docking results in various conformational patterns exploring the best fitting one. Docking-based comparative intermolecular contacts analysis (dbCICA) is a new 3D-quantitative structure-activity relationship (QSAR) approach that was developed to build structure-based pharmacophores after verifying the best docking configurations [55,56]. Particular docking configurations are validated through defining a cluster of binding-site atoms in the binding-site to which potent ligands are fitted, while poorer or inactive ones evade contacting to docking ligands to the 3D structure for the protein [56].
Corresponding docking settings are assumed successful when they properly differentiate the molecules in the binding site, in an affinity-discriminating manner, explaining both the bioactivity variation among them and determining the critical interactions important for the affinity [57,58]. Moreover, successful dbCICA models give information about the vital amino-acids engaged in ligand-binding, which will be valuable in subsequent structure-based optimization [59]. Next, pharmacophoric model(s) are built based on the critical contact points determined by dbCICA modeling.
In this research, seventy-nine Aurora-A Kinase inhibitors were docked into the binding pocket of the protein crystal-structure of Aurora-A Kinase, utilizing the LibDock docking engine [59]. The inhibitors were docked in two ionization states (unionized and ionized) and two binding-site hydration states (anhydrous and hydrous). High ranking docked poses produced by each docking setting were scored using seven scoring functions. Subsequently, dbCICA was used to gauge the success of different docking/scoring settings. Thereafter, the best dbCICA models were recruited as templates to create pharmacophores that were used later as a query to search, and to screen the National Cancer Institute (NCI) database. A fluorescence resonance energy transfer (FRET)-based Z′-LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA) [68,69] enzyme assay was used to assess the Computer-aided structure-based modeling is a drug discovery procedure that depends on the existence of a three-dimensional (3D)-structure of the enzyme to be targeted [51,52]. Docking is a fundamental element for structure-based drug design approaches that involve virtual fitting of ligands into the relevant binding site. Forcefield-based algorithms are employed by docking engines in order to calculate the interactions between bonded groups within the virtual ligand-protein complex, either attractive or repulsive [51,53,54]. Docking results in various conformational patterns exploring the best fitting one. Docking-based comparative intermolecular contacts analysis (dbCICA) is a new 3D-quantitative structure-activity relationship (QSAR) approach that was developed to build structure-based pharmacophores after verifying the best docking configurations [55,56]. Particular docking configurations are validated through defining a cluster of binding-site atoms in the binding-site to which potent ligands are fitted, while poorer or inactive ones evade contacting to docking ligands to the 3D structure for the protein [56].
Corresponding docking settings are assumed successful when they properly differentiate the molecules in the binding site, in an affinity-discriminating manner, explaining both the bioactivity variation among them and determining the critical interactions important for the affinity [57,58]. Moreover, successful dbCICA models give information about the vital amino-acids engaged in ligand-binding, which will be valuable in subsequent structure-based optimization [59]. Next, pharmacophoric model(s) are built based on the critical contact points determined by dbCICA modeling.
In this research, seventy-nine Aurora-A Kinase inhibitors were docked into the binding pocket of the protein crystal-structure of Aurora-A Kinase, utilizing the LibDock docking engine [59]. The inhibitors were docked in two ionization states (unionized and ionized) and two binding-site hydration states (anhydrous and hydrous). High ranking docked poses produced by each docking setting were scored using seven scoring functions. Subsequently, dbCICA was used to gauge the success of different docking/scoring settings. Thereafter, the best dbCICA models were recruited as templates to create pharmacophores that were used later as a query to search, and to screen the National Cancer Institute (NCI) database. A fluorescence resonance energy transfer (FRET)-based Z -LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA) [68,69] enzyme assay was used to assess the activity of captured molecules requested from the NCI. Five promising Aurora-A kinase inhibitors were identified. Their bioactivity was next evaluated using a FRET-based kinase assay and cell culture anti-proliferative assay (MTT), which revealed five promising molecules. The most potent lead inhibitor 85(NCI 14040) scored half-maximal inhibitory concentration (IC 50 ) values of 3.5 µM, 8.2 µM, 8.8 µM, and 11.0 µM against PANC1 (pancreas adenocarcinoma), PC-3 (prostate adenocarcinoma), T-47D (ductal breast carcinoma), and MDA-MB-231 (triple negative breast adenocarcinoma), respectively, and showed favorable safety profile (27.5 µM IC 50 on fibroblasts). Our results provide new clues for further development of Aurora-A kinase inhibitors as anticancer molecules.

Structure-Based Molecular Modeling
To understand the interactions associated in ligand binding into Aurora-A kinase, we implemented the innovative structure-based 3D-QSAR methodology; docking-based comparative intermolecular contacts analysis (dbCICA) [51,56,57,[60][61][62][63][64][65][66][67] as a tool to study how the seventy-nine Aurora-A kinase inhibitors (Table S1 in Supplementary Material) bind into Aurora-A binding pocket. dbCICA modeling was used to create successful pharmacophore hypotheses that highlighted the critical binding interactions engaged in ligand-Aurora-A kinase binding. These successful pharmacophores were exploited as 3D search queries to screen the NCI database for novel anti-Aurora-A kinase molecules.
The cycles of docking/scoring/dbCICA modeling were repeated to conduct all probable docking combinations, originating from using ionized ligands or un-ionized ligands, and the absence of crystallographically explicit water molecules, or its presence within the binding site.
Intermolecular ligand-binding site contacts were discovered by applying two thresholds of distances: 2.5 Å and 3.5 Å. Thus, two corresponding contacts binary matrices were engendered for each docking solution (detailed in methods Section 4 Docking-Based Comparative Molecular Contacts Analysis (dbCICA)). Afterwards, a genetic algorithm (GA)-based search was employed to find the best combination of receptor-ligand intermolecular contacts, efficient in justifying the variations in bioactivity within training compounds. GA was exploited to visualize combinations of four to ten directly proportional intermolecular (positive) contacts followed by five or ten inversely proportional (negative) contacts. Interestingly, an intermolecular ligand binding site contact threshold of 3.5 Å yielded superior dbCICA models compared to their 2.5 Å counterparts; moreover, combining ten inversely proportional (negative) contacts yielded superior dbCICA models compared to combining five (negative) contacts. Table 1 shows the contacts distance thresholds, number of negative and positive contacts, and the statistical criteria of the best dbCICA models based on the LibDock docking engine and multiple scoring functions. Clearly from Table 1, the best correlation statistics (particularly r 2 5-fold ) were achieved for dbCICA models, based on two conditions: (i) docking ionized ligands into a hydrous binding site, using the PMF04 scoring function with five features and ten exclusion volumes, i.e., model (SB-1), and (ii) docking ionized ligands into a unhydrous binding site, using the Ligscore2 scoring function, with ten features and ten exclusion volumes i.e., model (SB-2). Table 2 shows critical Aurora-A kinase binding-site contact atoms and their weights as suggested by optimal dbCICA models, (SB-1) and (SB2).

Building Pharmacophores Based on Optimal dbCICA models
Successful dbCICA models and their corresponding pharmacophore are summarized in Figure 2. As an example, for the complete stepwise process of pharmacophore building, dbCICA model (SB-1) translation into its corresponding pharmacophore within DiscoveryStudio 2.5.5 environment is demonstrated in Figure 3A-F (see Section 4 Manual Generation of Pharmacophores Corresponding to Successful dbCICA Models). Successful dbCICA models, their corresponding pharmacophores, and the refined pharmacophore Hypo(SB-1) (A) pharmacophore extracted form successful SB-1 model that is based on docking ionized ligands into hydrous binding site using PMF04 scoring function with five features. (B) Hypo(SB-2) pharmacophore extracted form successful SB-2 model that is based on docking ionized ligands into unhydrous binding site using Ligscore2 scoring function with ten features and ten exclusion volumes. (C) Sterically-refined version of Hypo(SB-1) includes 93 exclusion spheres that are shown as grey spheres (1.2 Å tolerance radii). Light blue sphere represents Hbic (hydrophobic feature), orange-vectored spheres represent ring aromatic feature, green-vectored spheres represent hydrogen bond acceptor (HBA), and gray spheres represent exclusion region. Initially, the binding pocket was annotated by rendering significant contact atoms in spherical forms according to the particular successful dbCICA model ( Figure 3A). Subsequently, the docked poses of potent (Ki ≤ 5.1 nM) and the docked ligands that were well-behaved were left in the binding pocket (as in Figure 3B,C). Well-behaved ligands are those of least difference between fitted and experimental bioactivities, as determined by regression against their docking-based contact-summations in the specified dbCICA model. Thereafter, proper pharmacophoric features were positioned onto common aligned chemical functionalities of well-behaved docked potent ligands ( Figure 3D). The pharmacophoric features were assigned in a way that hallmarks the interactions encoded by the critical contacts.
The conversion of the dbCICA model SB-1 (Tables 1 and 2) into the corresponding pharmacophore model Hypo(SB-1) ( Figure 2) is described herein (i.e., as an example, Figure 3A-F. Figure 3 shows how pharmacophore Hypo(SB-1) was extracted from dbCICA model SB-1 (Tables 1 and 2 and Figure 2). Appearance of significant contact at the NH of ARG137 and agreement among potent well-behaved docked ligands on placing nearby hydrogen-bond acceptor atoms (carbonyl oxygen of 52 and the heterocyclic sp 2 N of 50, and 77, 71, Figure 3C,D) directed towards the guanidine of ARG137 suggested placing the HBA feature onto these ligand atoms, directed towards the guanidine of ARG137 ( Figure 3D).
Likewise, emergence of significant positive contact at the isopropyl side chain of LEU210, combined with agreement among well-behaved potent ligands on placing hydrophobic groups nearby (e.g., methylpyrazole in 52) suggested placing hydrophobic (Hbic) feature onto these groups, as in Figure 3D.
The emergence of significant positive contacts at TYR212 and PHE144, combined with agreement among potent well-behaved docked inhibitors to project aromatic molecular fragments, such as the pyrimidine, pyrazole, and phenyl ring in 52 and 50, and pyrimidine, thiazole, and the phenyl ring in 71, and pyrazine and phenyl rings in 77, adjacent to these significant contacts, prompted us to place the aromatic ring pharmacophoric features (RingArom) at these positions (as in Figure 3D,E). Apparently, these features encode for stacking against the aromatic side chains of TYR212 and PHE144 ( Figure 3C-E). Figure 2 shows the dbCICA-based pharmacophores, while Table 3 shows their 3D coordinates.

Evaluation via Receiver Operating Characteristic Analysis and HIPHOP Refinement Using Exclusion Spheres
The resulting pharmacophores; Hypo(SB-1) and Hypo(SB-2) were evaluated using receiver operating characteristic (ROC) curve analysis to validate their capability of selectively capturing various potent Aurora-A kinase inhibitors from large lists of closely related experimentally-established Aurora-A kinase inhibitors extracted from the European Bioinformatics Institute database ChEMBL database (Testing Set, Table S2 in Supplementary Material), meaning that ROC analysis evaluates the ability of a pharmacophore to correctly discriminate between active compounds to be mapped and inactive compounds to be discarded. There are several successfulness criteria for ROC evaluation: (1) area under the curve (AUC) of the corresponding ROC curve; (2) sensitivity (SE) known as overall true positive rate (TPR) or truly active compound being captured, the (TPR) is calculated by dividing the true positive (TP) by the sum of true positives (TP), and false negatives (FN); (3) specificity (SP) known as overall true negative rate (TNR) or truly inactive compounds being discarded after proper identification, the (TNR) is calculated by dividing the true negative (TN) by the sum of true negatives (TN) and false positives (FP), and also (4) overall accuracy (ACC), and (5) overall false positives (FP). ROC performances are shown in Table 4. Hypo(SB-1) had superior/better specificity (SP) or (TNR) than Hypo(SB-2), though Hypo(SB-1) had weaker sensitivity (SE) or (TPR). It is known that the sensitivity (SE) gives information about active molecules that will be discarded, or the false negative compounds. The lower the false negative is, the higher the sensitivity, and the better the test in selecting active ones. Besides, Specificity (SP) gives information about inactive compounds that are wrongly mapped or captured, or the false positive compounds. The lower the false positive is, the higher the specificity, and the better the test in discarding inactive compounds. However, it is well-known that it is not possible to optimize both the specificity and the sensitivity simultaneously, and thus the ROC curve analysis providing a comprehensive image of the capability of a test to perform the distinction, considering all selection thresholds [79,80].
Therefore, the behavior of Hypo(SB-1) is not considered inferior in terms of sensitivity (SE). Looking back at (AUC) and (ACC) ( Table 4), Hypo(SB-1) has higher (AUC) than Hypo(SB-2). It is known that the greater the AUC, the more effective the virtual screening workflow in discriminating active from inactive compounds. The most important dependable principle to assess which pharmacophore has better performance is concluded from ROC curves in Table 4. The optimal ROC curve is where active molecules are completely separated from inactive ones; the curve skyrockets vertically northwest to the upper left corner, significantly, starts at the point (0,1), which represent perfect classifier pharmacophore [79,80]. Accordingly, Hypo(SB-1) excel over Hypo(SB-2) because though Hypo(SB-2) starts at (0,1), however, it swings away from y-axis earlier than the Hypo(SB-1) does. Hypo(SB-1) starts at (0,1) and lays over y-axis, which means that it assigns the highest capturing values to several active molecules before it mistakes and captures an inactive molecule for the first time.
Based on ROC, results Hypo(SB-1) had better behavior over Hypo(SB-2) in terms of ROC-AUC, ACC, and TPR, and in order to enhance its ability to discard inactive compounds, or to improve its (TNR), it was decided to complement it with exclusion spheres by employing HIPHOP REFINE module of DiscoveryStudio 2.5.5. [81,82]. Although pharmacophore models help in understanding ligand-receptor interactions, they lack proper steric constrains crucial for identifying the binding pocket size. This problem renders pharmacophoric models sometimes promiscuous. Thus, it was decided to decorate our optimal dbCICA pharmacophore; Hypo(SB-1), with exclusion spheres, using the HIPHOP REFINE module of DiscoveryStudio software suit [81,82]. Compounds listed in Table S2 in Supplementary Material, were utilized to decorate pharmacophore Hypo(SB-1) with exclusion spheres. This HIPHOP REFINE Testing Set (Table S2 in Supplementary Material) was chosen in a way that the weakly active-member bioactivities are explainable by steric clashes with the binding pocket. HIPHOP REFINE was configured to permit a maximum of 100 exclusion spheres to be added to pharmacophoric hypothesis. Pharmacophore Hypo(SB-1) was decorated with an additional 93 exclusion volumes.
Unsurprisingly, the sterically Refined-Hypo(SB-1) version of the pharmacophore (shown in Figure 2) illustrated improved ROC-AUC, ACC, and superior capability of discarding inactive compounds (TNR) in comparison to it is unrefined counterpart, and this is based on ROC analysis for Refined-Hypo(SB-1) (Table 4). Accordingly, it was decided to use the Refined-Hypo(SB-1) pharmacophore for in-silico search of the National Cancer Institution (NCI) database for new anti-Aurora-A kinase inhibitors.

In-Silico Screening and FRET Based Enzyme Inhibition Assay
The fundamental use of pharmacophores models is the discovery of new chemical scaffolds demonstrating similar biological characteristics or scaffold hopping [56][57][58][59][60][61][62][63][64][65][83][84][85]. Therefore, Refined-Hypo (SB-1) pharmacophore ( Figure 2) was used as a 3D query to search and screen the NCI list of compounds for new Aurora-A kinase inhibitors with novel chemotypes. The large number of added exclusion volumes on Refined-Hypo (SB-1) pharmacophore (i.e., 93), led to intensive data distinction, by which a huge number of inactive compounds were discarded upon in-silico screening, resulting in enriched probability that the remaining un-discarded hits are more likely to be active ones. The captured hits were filtered based on SMARTS SMILES Arbitrary Target Specification pattern) filtration to exclude reactive ligands [81] and based on Lipinski's rule to ensure good pharmacokinetic properties [86]. Surviving hits were fitted against the Hypo(SB-1) and the lowest fitting hits were discarded leaving only 117 hits that were subsequently docked into Aurora-A kinase protein binding pocket (PDB code: 3w2c) employing the same docking/scoring settings of corresponding dbCICA models (SB-1) and (SB-2) (in Table 1). The crystallographic 3D coordinates of Aurora-A kinase protein were retrieved from the Protein Data Bank (PDB code: 3w2c).
The resulting docked poses (i.e., of hits) were analyzed to identify their binding critical contacts. The activity of each hit was predicted by using the sums of critical contacts as identified by the respective dbCICA model (Tables 1 and 2) through substituting the sum of contacts with binding-site atoms in the corresponding dbCICA-regression equations [56,57,[60][61][62][63][64][65][66][67].
Only 75 highest ranking hits were obtainable from the NCI to be further assessed via kinase inhibition assay. Table S3 and Figure S2 in Supplementary Material show the structures of the seventy-five captured hits (numbered 80-154) available from the NCI, their contact atoms summation, and their predicted bioactivities. These hits were bioassayed against Aurora-A kinase with a FRET-based enzyme inhibition assay conducted using Z -LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA) [68,69]. The 75 hits were initially screened at 10 µM; results are summarized in Table S3, Supplementary Material. Five promising hits have shown anti-Aurora-A Kinase inhibition percentages exceeding 51% at 10 µM as detailed in Table 5. Inhibition percentages at 10 µM were equal to 86% by hit 130 (NCI 12849), 75% by 88 (NCI 1576), 56% by 85 (NCI 14040), 55% by 112 (NCI 12415), and 51% by 141 (NCI 12492). Chemical structures of the five hits are shown in Figure 4. values (<2) [87,88] and excellent R 2 values, which strongly suggest their authenticity (i.e., nonpromiscuous) in binding to the protein kinase. However, 130 (NCI 12849) show steep dose-response curve, and Hill slope coefficient of 7.6 indicate additional non-specific enzymatic mechanisms of inhibition. Figure 6A-T demonstrates how these hits fit or map to the optimal Hypo(SB-1) before and after refinement (binding to sterically refined (SB-1)), and to Hypo(SB-2). Nuclear magnetic resonance (NMR) and mass spectrometry were used to validate the chemical structures and purities of those hits ( Figures S3-S38 Table 4. Three of these promising hits (inhibition % at 10 µM above 56%) were next tested at several concentrations range (0.1-100 µM) to determine their IC 50 values by using nonlinear regression to plot log (concentration) vs. inhibition % curves (shown in Figure 5) using GraphPad Prism software.   Table 5. The identities and purities were validated by NMR and mass spectroscopy in ( Figures S3-S38 in Supplementary Data). Table 5. The IC 50 values of the five active hits that produced anti-Aurora-A kinase inhibition above 50% at 10 µM using Z -LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA), sum of their critical contacts, and their predicted activity. These high-ranking hits captured by Refined-Hypo(SB-1) derived from Hypo(SB-1) pharmacophore were docked into Aurora-A Kinase binding pocket (3w2c) using the docking-scoring settings of (SB-1). Their docked poses were analyzed to identify their critical binding contacts (marked by dbCICA model Table 2) and were used to predict their Ki values or activities by substituting the sum of binding contacts in the respective dbCICA regression equations ( Table 1). Steps of activity prediction were also employed using docking/scoring settings of (SB-2) dbCICA model to assess the similarity extent in "predicted activity" of both (SB-2) model and the better performing (SB-1) model. inhibition % values plotted in Figure 5. f Not Detected.
The five promising hits codes, their contact atoms summation, their predicted bioactivities, and their inhibitory IC 50 values are summarized Table 5 (other weakly active hits are presented in  Table S3 and Figure S2 in Supplementary Material). Figure 4 shows the dose-response curves of the three most potent hits. Evidently, the dose-response curves of hits 88 (NCI 1576) and 85 (NCI 14040) show reasonable Hill slopes/coefficient values (<2) [87,88] and excellent R 2 values, which strongly suggest their authenticity (i.e., non-promiscuous) in binding to the protein kinase. However, 130 (NCI 12849) show steep dose-response curve, and Hill slope coefficient of 7.6 indicate additional non-specific enzymatic mechanisms of inhibition. Figure 6A-T demonstrates how these hits fit or map to the optimal Hypo(SB-1) before and after refinement (binding to sterically refined (SB-1)), and to Hypo(SB-2).
Nuclear magnetic resonance (NMR) and mass spectrometry were used to validate the chemical structures and purities of those hits ( Figures S3-S38 in Supplementary Data).

Cell Culture Anti-Proliferation Assay: In Vitro Evaluation
The five potent hits showing anti-Aurora-A Kinase inhibition above 51% at 10 µM using FRET-based Enzyme Inhibition Assay (Table 5) were further examined using MTT cytotoxicity assay against various cancer cells. The standard Aurora-A kinase inhibitor Tozasertib/(VX-680) [41] was used as a positive control. Tozasertib/(VX-680) is a pyrimidine derivative discovered in 2004 with high-affinity to Aurora-A Kinase and its inhibitory constant or Ki value for Aurora-A is 0.6 nM [40,41] (Figure 1).
In the present research, the cytotoxicity of the five compounds was initially screened against cancer cell-lines including MDA-MB-231, PANC1, and T-47D at two preliminary concentrations: 50µM and 25µM. Cytotoxicity MTT assay results are detailed in (Table 6). Next, the compounds that produced cell death above 50% at 50µM were further examined to determine their (IC 50    Finally, to investigate the safety index of the compounds with specified IC 50 values, these compounds were examined using MTT assay against Fibroblasts to determine their cytotoxic effect against non-cancerous cells. A compound that demonstrates a higher IC 50 value against normal non-cancerous cells than the IC 50 it produces against cancer cells, demonstrates favorable safety profile. The IC 50 value of hit 85 (NCI 14040) was 27.5 µM against fibroblasts, which is a concertation that is higher than all the IC 50

Discussion and Conclusions
Aurora-A kinase inhibitors have shown promising results in suppressing many types of human cancers, because it is involved in regulating multiple molecular targets and signaling pathways, specifically in mitosis. Therefore, Aurora-A is considered a promising kinase to be targeted via designing new inhibitors against it. In this research, a structure-based computational workflow was implemented to explore the structural features required to design new anti-Aurora-A kinase molecules by employing seventy-nine Aurora-A kinase known inhibitors and utilizing an innovative structure-based methodology, i.e., dbCICA, to identify the best possible docking settings required to fit the seventy-nine inhibitors into the active binding site pocket of Aurora-A kinase protein structure. Optimal dbCICA models were converted into corresponding pharmacophore models. The model that had the optimal quality of capturing the active hits and discarding the inactive molecules (i.e., SB-1) was used after refinement as an in-silico search query to screen the National Cancer Institute chemical structural database. The (SB-2) dbCICA model was used for purposes of re-evaluating the activity predicted by (SB-1) or to assess the similarity extent in "predicted activity" between the two models.
Previous studies have shown that there are three active site residues of Aurora-A including (Leu215, Thr217, and Arg220) that lead to selective Aurora-A inhibition upon targeting these residues in the active site [39,92]. It is noteworthy to emphasize that Arg220 and Leu215 are two critical binding site contacts determined by the (SB-2) dbCICA model ( Table 2). The Refined-Hypo (SB-1) was adopted in this research because it demonstrated better ROC analysis than (SB-2) model, additionally, the predicted activity by (SB-2) was taken into consideration upon requesting the hits from NCI (Table  S3 in Supplementary Material).
Looking into the FRET-based kinase inhibition assay results (Table 5) the IC 50 values for the Hits 88 (NCI 1576), 85 (NCI 14040) and 130 (NCI 12849) were equal to 2.6 µM, 4.3 µM, and 5.9 µM, respectively. Although there is a minimal mismatch in the hierarchy between the predicted activities employed by (SB-1) model and the activities obtained by FRET-based kinase assay, however, it is a considered a minimal difference in hierarchy because their IC 50 values are relatively similar. It must be noted that predicted bioactivity obtained by dbCICA models were utilized to prioritize the captured hits for subsequent in vitro testing, bearing in mind that dbCICA approach succeeded in extracting 75 suggested molecules out of an enormous number of compounds in the NCI database, from which 5 molecules are active anti-Aurora-A inhibitors with a top-ranking potent lead inhibitor against four cancer types with a favorable safety profile.
Interestingly, the predicted activity by (SB-1) model, whose refined pharmacophore was used as 3D search query, has revealed that hit 85 (NCI 14040) is the most effective Anti-Aurora-A among the five active hits (i.e., predicted Ki = 1.2 nM). This result is consistent with the discovery of the effect that hit 85 (NCI 14040) produce as the most potent inhibitor against PANC1, PC-3, T-47D, and MDA-MB-231 cells (IC 50 values of 3.54 µM, 8.2 µM, 8.8 µM, and 11.0 µM), respectively, demonstrating consistency between cell-based assay and dbCICA prediction in this case.
The increasing availability of crystal structures of Aurora-A Kinase with different inhibitors will allow further optimization of existing leads as well as the discovery of new ones. Moreover, the newly identified compounds belong to distinct chemical classes than the existing inhibitors, shown in Figure 1, in which Aurora-A kinases comprise successive donor-acceptor functional groups, accordingly, this research enlarges the arsenal of chemical scaffolds for additional rational drug design. The subsequent future lead optimization research will be further assisted by modifying functional groups to approach more selective, more potent, and less toxic compounds.
Although this research study delivers new inhibitors against Aurora-A kinase with new chemical structural backbone compared to the previously published Aurora-A inhibitors employed in clinical trials, however, some limitations need to be addressed here and examined in future work. Bioactivity evaluation should be estimated from three independent experiments in duplicates or triplicates at each concentration point. Permeability and selectivity of these inhibitors should be investigated. As a beneficial improvement; the Hypo(SB-2) pharmacophore could be utilized for in silico 3D search query to explore the NCI library for additional possible Aurora-A kinase inhibitors rather than using it for purposes of validating the activity predicted by Hypo(SB-1), as previous studies have shown that there are three active site residues for Aurora-A (i.e., Leu215, Arg220 and Thr217) that lead to selective Aurora-A kinase inhibition upon targeting these residues [39,92] and it is notable that Leu215 and Arg220 are two critical binding site contacts defined by the (SB-2) dbCICA model ( Table 2) which implies that (SB-2) might deliver promising selective anti-Aurora-A kinase molecules.

Software and Hardware
The following software packages were utilized in the present research:

Method
The workflow adopted in this research project is summarized in Figure 7.

Molecular Modeling
Data Set for Aurora-A Kinase inhibitors and Ligands Preparation (ChEMBL) database (https://www.ebi.ac.uk/chembl/) was searched for Aurora-A Kinase inhibitors. The search identified 1168 available Aurora-A Kinase inhibitors, they were downloaded and carefully inspected to confirm the consistency of their bioassay procedure standardized with a well-known Aurora-A kinase standard inhibitor. The consistent bioassay determining their bioactivity (Ki value) is affinity or binding kinase assay with minor differences in the procedure, any molecule assayed functionally was not utilized. List of seventy-nine Aurora-A Kinase inhibitors were collected for modeling from six literature articles [42][43][44][70][71][72]. The consistency of the bioassay data is critical for quantitative structure-activity relationship (QSAR) modeling including dbCICA [56,57,61]. The collected list of compounds and their reported Ki values (in (nM)) are demonstrated in Table S1 under Supplementary Material. The logarithmic transformations of 1/Ki were used in structure-activity correlation in order to correlate free energy change to the bioactivity linearly.
The chemical structures of the collected compounds were exported from ChEMBL in simplified molecular-input line-entry system (SMILES) format. Subsequently, they were imported into DiscoveryStudio (version 2.5.5) [81], and transformed automatically into a reasonable single conformer 3D structure using rule-based methods implemented in DiscoveryStudio 2.5.5 [81], and saved in structure database format for subsequent docking experiments. For each inhibitor, two protonation states were considered: un-ionized and ionized as guided by MarvinView (Version 5.1.4) at physiological pH (7.4). In the ionized structures, amino substituent (pKa ≈ 9.0-9.5) were protonated and granted positive charges while the carboxylic acids (pKa ≈ 4.5) were deprotonated and granted formal negative charges. The unionized and ionized lists were utilized in further docking experiments.

Preparation of Aurora-A Kinase Crystal Structure
The crystallographic 3D coordinates of Aurora-A kinase protein were retrieved from the Protein Data Bank (PDB code: 3w2c, resolution = 2.45 Å). Hydrogen atoms were added to the protein utilizing DiscoveryStudio 2.5.5 templates for protein residues. Energy minimization was not performed upon using the protein structure for docking. Explicit water molecules were either removed or kept according to the desired docking conditions, i.e., docking in hydrous or anhydrous binding pocket. The binding pocket was defined as the cavity volume occupied by co-crystallized ligand within the PDB protein (3w2c).

Docking and Scoring
Docking experiments were performed using LibDock docking engine, which consider the flexibility of the ligand while treat the receptor as rigid [59,73]. Docking was performed by using the seventy-nine Aurora-A Kinase inhibitors with known (absolute) stereochemistries as ligands (1-79, Table S1 in Supplementary Material) after removing hydrogen atoms into the (3w2c) protein structure putative active site/binding pocket guided by binding hotspots. Polar and apolar receptor interactions hotspots align the ligand conformations [59,73]. Details of LibDock docking process and corresponding docking settings are described in Section S1 in the Supplementary Materials.

Docking-Based Comparative Molecular Contacts Analysis (dbCICA)
This methodology was performed through the following steps:

1.
Allocating binary code based on neighboring contacts: the closest binding site atoms for each docked conformer of each ligand was identified. The neighboring atoms lying exactly at or closer than a predefined distance-threshold of 3.5 Å or 2.5 Å are allocated an intermolecular contact value of "one", otherwise they are given a contact value of "zero". The distance evaluation was automatically performed via employing the Intermolecular Monitor of DS 2.5.5.

2.
A 2D matrix is engendered for every docking-scoring configuration, such that each matrix is composed of column labels correlated to different binding site atoms and row labels correlated to docked ligands. The binary code is filled into the matrix, by which "zeros" correspond to inter-atomic distances that exceed the predetermined threshold and "ones" for distances equal or below the predefined threshold cutoffs.

3.
Two binary matrices (each one relevant to distance threshold: 2.5 Å or distance threshold: 3.5 Å) were constructed for each docking configuration (i.e., 2 protein hydration states × 2 ligand ionization states × 2 distance thresholds × 7 scoring functions). Accordingly, 56 binary files were generated in the study.

4.
Each individual column in the matrix was regressed against respective molecular bioactivities (i.e., −log (Ki)). If the columns showed negative correlation with bioactivity, thus, they are inverted, i.e., zeros are converted to ones and vice versa, and excluded from the subsequent step.

5.
The remaining binary matrix composed of bioactivity correlated with positive contact columns, is then subjected to genetic algorithm (GA)-based search (within MATLAB) for optimal sum of contacts columns able to explain variations in bioactivity (detailed information on how GA is implemented in dbCICA modeling, is in Section S3 under Supplementary Materials). The dbCICA model is built or represented based on selecting the best column-summation model. 6.
dbCICA algorithm permits varying contacts-weights to detect intermolecular contacts comprising higher weights contributions in optimal dbCICA models: variable contacts weights emerged by positive contacts column could be up to three times in the summation model. To perform this, dual valued genes are implemented in the GA, whereby every gene encodes not only the respective contacts column number but also its weight. 8.
After using positive contacts to identify the optimal summation model(s), subsequently, GA is used to seek the best summation generated by linking inverted columns (exclusion columns, detailed in steps 4 and 5 at Section 4 Docking-Based Comparative Molecular Contacts Analysis (dbCICA)) with the best positive summation model(s). In this project, we executed two exclusion settings; the allowed negative contacts were either five or ten.
Manual Generation of Pharmacophores Corresponding to Successful dbCICA Models Building pharmacophores was guided by using the best dbCICA models. Subsequently, pharmacophores were used as 3D-search queries for the discovery of new Aurora-A kinase inhibitors. The pharmacophores were created via the following steps: 1.
The docking setting that comprised the best dbCICA models were nominated. In this project two conditions produced the best models: (1) docking the ionized ligands into hydrous binding site using PMF04 scoring function with five features and ten exclusion volumes, named (SB-1) (2) docking ionized ligands into unhydrous binding site using Ligscore2 scoring function with ten features and ten exclusion volumes, named (SB-2) (Tables 1 and 2). The docked poses, corresponding to the model of the most potent ligands (Ki ≤ 17 nM), were preserved in the binding pocket while less potent or inactive ligands were discarded.

2.
Afterwards, these best dbCICA models are used to predict the bioactivities of the potent Aurora kinase A inhibitors in the binding pocket i.e., by substituting the sum of contacts for every docked molecule in the relevant regression equation associated to the dbCICA model. The well-behaved and potent compounds (i.e., showing minimal difference between real and fitted bioactivities) were maintained in the binding pocket for subsequent steps.

3.
Those compounds, having weights of 2 or 3 or major positive molecular contacts in the binding site, were carefully evaluated and annotated to discover the ligands closest moieties. Consensus among potent training molecules to locate moieties of common physicochemical properties next to significant contact atom permit placing a representative pharmacophoric feature onto that region. For example, if potent docked molecules agreed on the presence of aromatic rings next to a certain dbCICA significant contact atom then a hydrophobic aromatic feature is positioned on top of the aromatic rings within the predefined threshold of distance. Manual addition of pharmacophoric features was employed using DiscoveryStudio 2.5.5 feature library and default feature radii (1.6 Å or 2.2 Å).

4.
Finally, the steric constraints of the binding pocket were accounted for based on binding-site contact atoms of negative correlations with bioactivity. Negative contacts specify the spaces where the conformers of inactive or minimal active compounds dominate and surely, active ones are absent. Therefore, these exclusion volumes were filled with exclusion spheres manually from DiscoveryStudio 2.5.5 feature library and default feature radius (1.2 Å) was employed.

5.
The overall manual pharmacophores modeling procedure generated two pharmacophore models from dbCICA based on: first, ionized ligands into hydrous binding site using PMF04 scoring function with five features and 10 exclusion volumes i.e., Hypo(SB-1). Second, based on ionized ligands into unhydrous binding site using Ligscore2 scoring function with ten features and ten exclusion volumes Hypo(SB-2) (see Tables 1 and 2 Validation of Generated Pharmacophore Models Using Receiver Operating Characteristic (ROC) Curve Analysis The classification power of dbCICA derived pharmacophores; Hypo(SB-1) and Hypo(SB-2) was validated using receiver operating characteristic (ROC) curve analysis.
The ROC analysis assesses the ability of the pharmacophore to distinguish between Aurora-A kinase active or inactive group of molecules within a designated large "testing list" and to selectively capture the ones with better activity. This testing list was entirely composed of experimentally validated compounds extracted from the European Bioinformatics Institute database (ChEMBL, https://www.ebi.ac.uk/chembl) and included 86 active compounds (anti-Aurora-A Kinase with Ki values ≤10 nM) and 248 less-active compounds (anti-Aurora-A Kinase with Ki values > 500 nM considered as decoy list).
Selection process of the testing set compounds is detailed in Section S5 under Supplementary Materials. Conformational ensembles were generated for the testing set using "CESEAR" conformation generation option implemented in DiscoveryStudio 2.5.5.
The results were presented in the form of ROC curves (Table 4). ROC analysis provides several success criteria for evaluation: (1) area under the curve (AUC) of the corresponding ROC curve, (2) overall accuracy (ACC), (3) overall specificity (SP), or true negative (TNR), (4) overall sensitivity (SE), or true positive rate (TPR) and (5) overall false positives (FP). ROC curves were plotted by considering the highest score (fit value against the tested pharmacophore) of an active molecule as the first threshold then counting the number of decoy compounds within this cut-off value, and the corresponding sensitivity and specificity pair is calculated. This process is repeated using the active molecule possessing the second highest score and so forth, until the scores of all active compounds are considered as selection cut-off values rate [79,80,93,94]. Details of ROC analysis are shown in Section S4 under Supplementary Materials. dbCICA models ROC analysis curves and results are illustrated in Table 4.

Addition of Exclusion Volumes
Based on ROC results and in order to improve the classification properties of Hypo(SB-1) model, which had better behavior over Hypo(SB-2), as detailed in results Section 2.3, and in Table 4, it was decided to complement Hypo(SB-1) with exclusion spheres by employing HIPHOP REFINE module of DiscoveryStudio 2.5.5 [81,82]. HIPHOP REFINE identifies spaces occupied by the conformations of inactive compounds and free from conformations of active ones. These areas are filled with exclusion volumes to represent the steric constrains of the binding pocket [56,82,95]. See Section S5 under Supplementary Materials for more information on HIPHOP REFINE and how it adds exclusion spheres to pharmacophore model Hypo(SB-1). A subset of training compounds was carefully selected for HIPHOP REFINE modeling of the Hypo(SB-1) as detailed in Table S2. This list includes active, moderately active, and inactive compounds. Based on their activity, exclusion spheres occupy the space conformations of inactive compounds and avoid space conformations of active compounds. The HIPHOP REFINE process resulted in adding 93 exclusion volumes to Hypo(SB-1), and the sterically refined pharmacophore was named Refined-Hypo(SB-1).
The performance of Refined-Hypo(SB-1) was evaluated again by ROC analysis, as illustrated in (ROC) curves in Table 4, and accordingly, Refined-Hypo(SB-1) model was adopted for in-silico search screening, whereas Hypo(SB-2) was used as a confirmatory model for the predicted activity.

In-Silico Screening of NCI for new Aurora-A Kinase Inhibitors
The sterically refined dbCICA-derived pharmacophore Refined-Hypo(SB-1) in Figure 2 was used as 3D query to search and screen the National Cancer Institute compounds database (253,368 compounds). Screening was performed by employing the "Fast Flexible Database" search option implemented within DiscoveryStudio 2.5.5, and captured 293 hits that were subsequently filtered based on SMARTS filtration [81] and based on Lipinski rule [86], which yielded 262 hits. Subsequently, conformational ensembles were generated for surviving hits using "CEASAR" conformation generation option within DiscoveryStudio2.5.5 to allow fitting hits against the capturing pharmacophore. The lowest fitting 145 hits (ca. the least 55% against Hypo(SB-1) pharmacophore) were discarded leaving 117 hits for subsequent docking into Aurora-A kinase.
The remaining 117 hits were docked into Aurora-A kinase protein binding pocket (PDB code: 3w2c) employing the same docking/scoring settings of a corresponding dbCICA model (SB-1) (Tables 1  and 2, pharmacophores in Figure 2) (i.e., docking the ionized ligands into hydrous binding site using PMF04 scoring function with five features and ten exclusion volumes). Steps of activity prediction were also employed using docking/scoring settings of (SB-2) dbCICA model to assess the similarity extent in "predicted activity" of (SB-2) model and the better performing (SB-1) model. Therefore, the remaining 117 hits were also docked into Aurora-A Kinase binding pocket (3w2c) using the docking-scoring settings of (SB-2) dbCICA model i.e., docking ionized ligands into unhydrous binding site using Ligscore2 scoring function with ten features and ten exclusion volumes.
The resulting docked poses (i.e., of hits) were analyzed to identify their binding critical contacts. The sums of critical contacts (as identified by corresponding dbCICA model, Tables 1 and 2) for each compound were used to predict their corresponding Ki values or bioactivities by substituting their sum of contacts with binding site atoms in the respective dbCICA-regression equations (Table 1), tying the contacts sums with bioactivities.
The process of hits request was done in accordance with the predicted activity of (SB-1) model (i.e., whose settings were used as the 3D search query derived pharmacophore Refined-Hypo(SB-1), and after a voting step to prioritize the hits that scored high predicted activity via (SB-2) dbCICA model, and lastly, based on the compounds' availability at NCI. The highest-ranking 75 obtainable hits were requested from the NCI for subsequent enzyme inhibition assay and in-vitro cell-based biological evaluation. Table S3 and Figure S2 in Supplementary Material summarizes the structures of all the 75 captured hits (numbered 80-154) obtained by in-silico screening, their contact atoms summation, and their predicted bioactivities.

FRET-Based Enzyme Inhibition Assay
The activity, the 75 captured hits, was evaluated using enzyme inhibition assay. This bioassay was conducted using Z -LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA) [68,69]. This biochemical assay depends on fluorescence emission as detection method utilizing a synthetic fluorescence resonance energy transfer (FRET) peptide with two fluorophores: coumarin that work as donor and fluorescein as acceptor [69]. Each hit or compound was dissolved in DMSO to form 10 mM stock solution. Hits were screened at 10 µM concentration. The bioassay was performed in black 384-well plate. For each assay, 100 nL of tested compound stock solution (10 mM) was mixed with 2.4 µL kinase buffer, 5 µL (peptide substrate/Aurora-A Kinase) mixture, and 2.5 µL ATP solution. The final 10 µL kinase reaction consisted of (0.91-8.56) ng Aurora-A Kinase (in DMSO < 1%), and 2 µM Ser/Thr 01 (kinase peptide substrate from Z -LYTE ® (Invitrogen, Carlsbad, CA, USA) and 10 µM ATP in 50 mM HEPES pH 7.5, 0.01% BRIJ-35, 10 mM MgCl 2 , and 1 mM EGTA. The mixture was incubated at room temperature for 60 min to allow the kinase reaction to complete. Subsequently, 5-µL development reagent A (from Z -LYTE ® , 1:4096 dilution) were added to the reaction mixture and shaken over 30 s. The mixture was incubated for another 60 min at room temperature before the fluorescence is measured at λEx of 445 and λEm of 520 nm (λEx: excitation wavelengths, λEm: emission wavelengths).
For initial screening, the 75 hits were examined at 10 µM. Five compounds have shown anti-Aurora-A Kinase inhibition percentages > 51% at 10 µM (Table 5), which were further tested at several concentrations range (0.1-100 µM) to determine their anti-Aurora-A Kinase IC 50 values.
The tests were conducted in duplicates. Staurosporine was used as control (standard inhibitor with IC 50 of 3.72 nM) [62]. Nonlinear regression of log (concentration) vs. inhibition percentages values using GraphPad Prism 7.03 software was adopted to plot the dose-response curves ( Figure 5) and calculate IC 50 values ( Table 5).
The five bioactive hits were further examined using Cell Culture Based Cytotoxicity Assay.
Cell Culture Based Antiproliferation Assay The human cell-lines triple negative breast adenocarcinoma MDA-MB-231, Pancreas adenocarcinoma PANC1, ductal breast carcinoma T-47D and fibroblasts were preserved and expanded in DMEM medium, while the prostate adenocarcinoma PC-3 cells were cultured in RPMI-1640 growth medium. All growth media were supplemented with 10% heat-inactivated fetal bovine serum (FBS). Cells were maintained under standard culture conditions at 37 • C, 5% CO 2 in a humidified incubator. At 80% confluency, cells were washed with 5 mL phosphate buffer saline (1X PBS), then incubated for 4 min with 2-3 mL Trypsin-EDTA 0.25% in order to obtain suspended cells and collect them in a known volume of fresh media for cell count. Cells were counted using conventional hemocytometer and trypan blue dye method.
The five potent hits that have shown anti-Aurora-A Kinase inhibition percentages above 51% at 10 µM upon conducting the kinase enzyme Inhibition assay (Table 5) were further validated using MTT cytotoxicity Assay. This assay includes 3-(4,5-Dimethylthiazol-2-yl)-2,5-diphenyl tetrazolium bromide (MTT) dye, which is a yellowish tetrazole reduced by viable cells into formazan purple color, which detects the normal mitochondrial role in viable cells. In 96-well coated flat bottom microplates, 100 µL cell suspension of PANC1 or T-47D cells were seeded at a density of 7000 cell/well, and 6000 cell/well for MDA-MB-231, PC-3, or fibroblasts, and incubated under the same conditions at 37 • C. After 24 h, each well was treated with the intended drug concentration diluted in 200 µL fresh media after removal of the old 100-µL media. After 72 h incubation, drug-containing media was aspirated and on each well 100 µL phenol-red free media and 15-µL MTT solution (5 mg/mL (wt/vol) PBS). After 3 h of dark incubation at 37 • C, MTT and media were carefully withdrawn from each well, and 200 µL/well DMSO were added. Colorimetric absorbance (OD) was measured at 570 nm wavelength using 96-well plate reader (Tecan-Sunrise, Männedorf, Switzerland).
The five compounds were prepared in a stock concentrate of 10 mM using cell culture grade DMSO to be utilized in MTT assay. The cytotoxicity of each compound was initially screened on cancer cell-lines at two concentrations: 50 µM and 25 µM in duplicates. Wells treated with 200 µL media plus DMSO only (0.5% v/v) represented negative control and the standard inhibitor Tozasertib/(VX-680) was used as positive control. The outgrowth inhibition rate or (cell death) was calculated using this formula: outgrowth inhibition % = (1 − (mean (ab treated)/mean (ab control))) × 100%. Subsequently, compounds that produced cell death or inhibition rate above 50% at 50 µM were further tested to determine their half-maximal inhibitory concentration (IC 50 ) values. Cells were treated with multiple decreasing concentrations of the tested compound starting at 50 µM in triplicates for IC 50 determination. The safety index for the compounds with specified IC 50 values was evaluated via determining their IC 50 values against normal Fibroblast cells. It is noteworthy to mention that DMSO concentration was constant on all the wells (i.e., (0.5% v/v or 1 µL/200 µL) per well), for the treated and the control ones. IC 50 values were calculated using nonlinear regression of the dose-response curves using GraphPad Prism software 7.03.

Supplementary Materials:
The following are available online, Table S1: The structures of the seventy-nine Aurora-A Kinase inhibitors collected from the (ChEMBL), utilized in modeling. Table S2: Refinement list for steric refinement of Hypo(SB-1). Table S3: The 75 high-ranking hits captured by Refined-Hypo(SB-1) 3D search query-obtained from NCI and their anti-Aurora-A kinase inhibition % at 10 µM using Z -LYTE ® kinase assay (Invitrogen, Carlsbad, CA, USA), sum of their critical contacts and their predicted activity. Section S1: LibDock Docking Engine and Scoring Settings. Section S2: Scoring of Docked Ligand Poses. Section S3: Genetic Algorithm Implementation in dbCICA Modeling. Section S4: receiver operating characteristic (ROC) curve analysis. Section S5: Steric Refinement of pharmacophores. Figure S1: Three-dimensional plot showing three main principal components calculated for the Testing Set of for HIPHOP REFINE modeling. Figure S2: The chemical structures of the tested highest-ranking hits. Figures S3-S10: "NMR" and Mass Spectrum for Hit 88 (NCI 1576). Figure