1. Introduction
Cancer remains one of the leading causes of mortality worldwide, with millions of new cases and deaths reported annually [1]. Despite significant advances in cancer therapy, drug resistance continues to pose a substantial challenge, driving the ongoing search for new treatments [2]. This underscores the importance of identifying novel therapeutic targets and developing new drugs to enhance the efficacy of cancer treatments and overcome resistance.
Protein kinases have emerged as pivotal targets in cancer therapy due to their crucial roles in regulating cellular processes such as proliferation, differentiation, and apoptosis. These enzymes function by transferring phosphate groups to specific substrates, an essential process for signaling pathways that control cell growth and survival. Aberrations in kinase activity, whether through overexpression, mutation, or dysregulation, are commonly associated with cancer, making these enzymes attractive targets for cancer drug design [3].
Among the plethora of protein kinases implicated in cancer, Aurora kinases have drawn considerable attention. Aurora kinases are a family of serine/threonine kinases comprising isoforms A, B, and C. These kinases are essential regulators of mitosis, ensuring accurate chromosomal segregation and cytokinesis [4,5]. Aurora kinase B (AurB) is particularly critical as it plays a central role in the chromosomal passenger complex (CPC), which is vital for correcting kinetochore-microtubule attachments, regulating the mitotic spindle checkpoint, and ensuring proper chromosomal alignment and segregation during cell division [5,6]. The CPC, which includes the proteins AurB, INCENP (inner centromere protein), survivin, and borealin, orchestrates these processes, highlighting the indispensable function of AurB in maintaining genomic stability [5,7].
AurB overexpression and hyperactivation have been linked to various malignancies, including colorectal, breast, prostate, ovarian, testicular, thyroid, and lung cancers, often correlating with poor prognosis and resistance to therapy [6,8]. This overexpression disrupts normal mitotic processes, causing defects such as multipolar spindles, chromosomal mis-segregation, and cytokinesis failure, which drive tumor progression and aneuploidy, a hallmark of cancer [8,9,10]. Moreover, AurB contributes to tumor cell survival by enhancing DNA damage tolerance and activation of pro-survival pathways [4,8,11]. As a result, AurB inhibitors have gained attention for their potential to selectively induce mitotic catastrophe and apoptosis in cancer cells [8,12,13,14].
Several AurB inhibitors have been developed and tested, with varying degrees of success. Barasertib (AZD1152), an ATP-competitive inhibitor, has shown high selectivity for AurB and efficacy in preclinical models, inducing mitotic defects and apoptosis in acute myeloid leukemia (AML) and lung cancer [15,16]. Other ATP-competitive inhibitors, such as hesperadin and ZM447439, also disrupt mitosis and cytokinesis, leading to polyploidy and cancer cell death [16,17]. Allosteric inhibitors, targeting non-ATP-binding sites, and dual inhibitors like VX-680 and GSK1070916 offer alternative strategies, though challenges remain in developing these compounds due to structural complexities and specificity requirements [16,18,19].
Despite these advancements, developing selective and potent AurB inhibitors remains challenging. The structural similarity between AurB and its closely related isoform Aurora A (AurA) complicates the design of highly specific inhibitors, as they share a conserved catalytic domain. Achieving high selectivity is crucial to minimize off-target effects such as cytotoxicity and hematological toxicity [8,20,21]. Drug resistance is another important obstacle in AurB-targeted therapy, as cancer cells can adapt through various mechanisms, including activation of compensatory pathways or mutations that alter the binding affinity of inhibitors [4,16,22]. For instance, mutations in the AURKB gene, such as Gly160Glu, can reduce inhibitor binding efficiency, diminishing therapeutic efficacy [22,23].
Advances in structure-based drug design, including high-throughput screening and virtual docking, have been instrumental in identifying new AurB inhibitors. However, optimizing these compounds for high specificity and overcoming resistance mechanisms remains a critical challenge. In silico drug discovery methods, such as molecular docking and quantitative structure-activity relationship (QSAR) analysis, can accelerate the identification of lead compounds, but further refinement is needed to translate these findings into clinical therapies [16,24].
Structurally, AurB comprises three main domains: an N-terminal regulatory domain, a highly conserved kinase domain, and a short C-terminal extension. The kinase domain consists of a β-stranded lobe and an α-helical lobe connected by a hinge region, which is critical for its catalytic activity. The phosphorylation of the activation loop, particularly at Thr232, is essential for AurB activation [8,24,25].
As shown in other studies, key residues within the ATP-binding pocket include Leu83, Phe88, Glu155, and Ala157, which contribute significantly to substrate recognition and inhibitor binding. The hinge region and the conserved T-loop are also important for modulating kinase activity. The structural flexibility of these regions allows AurB to interact with various inhibitors, though this also complicates the design of selective inhibitors due to the overlap with AurA binding sites [20,26].
To improve affinity while also increasing selectivity for the AurB isoform, the interaction of known ligands and inhibitors with the target protein can be studied from the perspective of chemical structure and amino acid interactions, as this detailed information may aid in the development or selection of more potent and selective drug candidates.
Molecular interaction fingerprints (MIFs) are computational representations of protein–ligand interactions that capture critical contact points within a binding site. These fingerprints map interactions such as hydrogen bonds, hydrophobic contacts, π–π stacking, and electrostatic forces, offering a detailed profile of binding affinity. Utilizing MIFs can significantly enhance post-docking analysis by allowing researchers to quantitatively assess the strength and nature of interactions between the targeted protein and potential inhibitors. This technique goes beyond traditional scoring functions by identifying key residues involved in binding and predicting the impact of structural modifications on inhibitor efficacy [26,27,28,29].
By integrating the analysis of interaction profiles predicted with molecular docking and machine learning (ML)-based approaches, the current study aims to develop a drug repurposing framework that facilitates the identification of structural features contributing to high binding affinity towards AurB and the discovery of novel potential inhibitors among approved or investigational drugs.
2. Results
2.1. Datasets
The initial dataset contained 179 inhibitors tested on the AurB/INCENP complex, as retrieved from the ChEMBL database. Of these, 127 compounds had exact half-maximal inhibitory concentration (IC50) values available and were retained for further analysis. The pIC50 values, defined as the negative base-10 logarithm of the IC50, were used in all subsequent analyses. This subset, referred to as the “exact values set” (EV set), served as the basis for scaffold structure analysis and QSAR modeling.
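The IC50-to-pIC50 conversion can be sketched as follows. This is a minimal illustration that assumes the IC50 is first expressed in mol/L, which is the usual convention but is not stated explicitly above; the function name and the unit table are illustrative and not part of the original workflow.

```python
import math

# Conversion factors to mol/L (assumed convention: pIC50 uses molar IC50).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}


def pic50_from_ic50(ic50, unit="nM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be a positive number")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])


# A compound with IC50 = 100 nM has a pIC50 of about 7.
print(round(pic50_from_ic50(100, "nM"), 2))
```

On this scale, a one-unit increase in pIC50 corresponds to a ten-fold decrease in IC50.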
Supplementary Material Table S1 presents descriptive statistics for the total and EV sets of AurB inhibitors, including several molecular descriptors calculated from their chemical structures with DataWarrior v06.03.01 software [29].
The EV set exhibited diverse physicochemical properties, with molecular weights ranging from approximately 198.2 to 663.0 g/mol and pIC50 values spanning from 4.0 to 9.3.
The 550 decoy compounds generated with the DUD-E platform presented a comparable profile of molecular descriptors to the inhibitors, supporting their suitability as decoy molecules. This dataset exhibits a similar range of molecular weights, AlogP values, number of H-bond acceptors and H-bond donors, and other descriptors, as detailed in Supplementary Material Table S2.
2.2. Scaffold Analysis
Using the EV set, scaffold analysis was performed to identify the most central ring (MCR) systems associated with the compounds. The 127 compounds yielded 11 distinct MCR fragments, numbered from 1 to 11. Scaffolds 10 and 11, each represented by a single compound, were excluded from further analysis due to limited statistical significance. Additionally, MCR1 and MCR2, each comprising two compounds and corresponding to the bioisosteric pyrazole and furan rings, were grouped into a single category (MCR1 + 2), resulting in a final set of eight scaffold categories for further evaluation.
To evaluate the relationship between these scaffolds and inhibitor potency (pIC50), Mann–Whitney U tests and Chi-squared tests were performed (results shown in Supplementary Material, Table S3). The pIC50 threshold of 6.5 was selected to divide the compounds into two subsets, distinguishing potent inhibitors from weaker ones. Significant results were observed for MCR1 + 2, MCR8 (pyrrolo[2,3-b]pyridine), and MCR9 (quinazoline), which showed both high enrichment factors (EF) and statistically significant associations with increased potency (Figure 1A–C). MCR1 + 2 exhibited a mean pIC50 value of 8.15 (p = 0.0038), underscoring its potential as a scaffold associated with enhanced activity.
The Chi-squared test yielded a value of 29.7068 (p = 0.0001), indicating statistically significant differences in the distribution of high-potency inhibitors across the scaffolds. Significant residuals were observed for MCR9, suggesting this scaffold was disproportionately represented among the most potent inhibitors.
The EF analysis further highlighted trends in scaffold potency. MCR1 + 2, MCR5, MCR8, and MCR9 displayed EF values greater than 1, indicating a higher-than-expected proportion of potent inhibitors within these scaffolds. However, MCR5 (cyclohexane), despite showing an EF > 1, did not exhibit statistically significant differences in mean pIC50 values when compared to the other scaffolds, suggesting that while MCR5 contains potent inhibitors, its contribution to potency may not be as robust or consistent as that of MCR1 + 2, MCR8, and MCR9.
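The scaffold-level enrichment factor can be illustrated with a short sketch. It assumes one common definition, namely the fraction of potent compounds within a scaffold divided by the fraction of potent compounds in the whole set, and it treats pIC50 ≥ 6.5 as potent; whether the threshold is inclusive is an assumption. The scaffold labels and pIC50 values below are invented toy data, not values from the EV set.

```python
def scaffold_enrichment(scaffolds, pic50_values, scaffold, threshold=6.5):
    """EF = (potent fraction within the scaffold) / (potent fraction overall)."""
    total = len(pic50_values)
    potent_total = sum(p >= threshold for p in pic50_values)
    members = [p for s, p in zip(scaffolds, pic50_values) if s == scaffold]
    if not members or potent_total == 0:
        raise ValueError("scaffold is empty or the set has no potent compounds")
    potent_members = sum(p >= threshold for p in members)
    return (potent_members / len(members)) / (potent_total / total)


# Toy data (hypothetical): six compounds over three scaffold categories.
labels = ["MCR9", "MCR9", "MCR9", "MCR5", "MCR5", "MCR3"]
pic50 = [8.1, 7.4, 6.0, 6.9, 5.2, 5.0]

for name in ("MCR9", "MCR5", "MCR3"):
    print(name, round(scaffold_enrichment(labels, pic50, name), 2))
```

An EF above 1 means that the scaffold contains a larger share of potent compounds than the dataset as a whole, which is how the values reported for MCR1 + 2, MCR5, MCR8, and MCR9 should be read.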
2.3. QSAR Models
Several quantitative structure–activity relationship (QSAR) models were developed using the exact values set to predict pIC50 values based on selected molecular descriptors generated with RDKit (as described in Section 4). The molecular descriptors selected using the implemented feature selection approaches are explained in Table 1, while the heatmap of intercorrelations is shown in Supplementary Materials Figure S1. The highest negative correlation was observed between SPS and BCUT2D_CHGLO (r = −0.83), while the highest positive correlation was noted between SPS and VSA_EState3 (r = 0.74). Most of the absolute values of the correlation coefficients are below 0.7, highlighting a relatively low degree of collinearity between the selected features.
Five regression models, namely Multiple Linear Regression (MLR), Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forest (RF), and Gradient Boosting (GB), were trained and evaluated after splitting the dataset into training (80%) and test (20%) subsets. The performance metrics for the trained models, including R², RMSE, MAE, and Q² cross-validation values, are summarized in Table 2.
Supplementary Material Figure S2 shows the correlation diagrams between experimental and the pIC50 values predicted by all five models, further illustrating the poor predictive performance of linear models (MLR and PLS) compared to non-linear methods (SVM, RF, and GB) in accordance with the corresponding correlation coefficients.
The SVM model demonstrated the best overall performance, achieving a cross-validated Q² of 0.67 ± 0.10 and a test set R² of 0.81. The RF and GB models also demonstrated strong test R² values (~0.76) but showed evidence of more pronounced overfitting, as reflected by their exceptionally high training R² values (0.9587 for RF and 0.9949 for GB).
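For reference, the regression metrics reported here can be written out in a few lines. The sketch below uses the standard definitions of the coefficient of determination, RMSE, and MAE; it assumes that the cross-validated Q² is the same coefficient evaluated on held-out folds, which is a common convention rather than something specified in this section. The pIC50 values are toy numbers.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and MAE for paired experimental and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Toy example (hypothetical pIC50 values).
experimental = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
print(regression_metrics(experimental, predicted))
```

A large gap between the training and test values of this coefficient, as seen for RF and GB, is the signature of overfitting referred to above.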
Feature importance computed by the permutation approach for the SVM model revealed that molecular descriptors such as SlogP_VSA11, PEOE_VSA2, SMR_VSA10, and qed had the most significant contributions to model predictions (Figure 2A). The SHAP (Shapley additive explanations) summary plot (Figure 2B) shows that, on average, lower values for PEOE_VSA2, qed, VSA_EState3, and PEOE_VSA7 are required for higher predicted potencies. On the other hand, high values for PEOE_VSA8, SMR_VSA10, and EState_VSA8 are correlated with higher AurB inhibitory activity. However, the dependency between SlogP_VSA11, BCUT2D_CHGLO, SPS, and predicted pIC50 is less linear, highlighting the capacity of the SVM model to capture non-linear relationships. Partial dependence and individual conditional expectation plots are shown in Supplementary Material Figure S3A–J, further illustrating the influence of each descriptor on the predicted pIC50 value.
According to the SHAP analysis, compounds with a lower van der Waals surface area (VSA) of atoms with more electronegative partial charges (PEOE_VSA2) and with partial charges close to 0 (PEOE_VSA7) are more likely to be potent inhibitors, while molecules with lower VSA of atoms with partial charges between 0 and 0.05 (PEOE_VSA8) are less active. PEOE is a method of partial equalization of orbital electronegativities for calculating atomic partial charges, accounting for the differences in electronegativity among bonded atoms. Furthermore, compounds with atoms with high contributions to molar refractivity (SMR_VSA10) are also correlated with higher activities, as well as atoms with high EState indices (EState_VSA8). Additionally, molecules with lower sums of EState indices for atoms with relatively low VSA contributions (VSA_EState3) are also more potent. Interestingly, lower drug-likeness scores (qed) are associated with higher activities, while higher spatial complexity scores (SPS) are specific to less active compounds. Additional insights into the QSAR models are provided in the
Supplementary Material.
Figure S4 illustrates the applicability domain of the SVM model through a Williams plot and principal component analysis (PCA). Most test set compounds fall within the reliable prediction domain, validating the model’s robustness.
Table S4 outlines the range of values for selected molecular descriptors in the training and test sets, showing sufficient overlap to ensure that the model was tested on a representative chemical space.
2.4. Fingerprints Classification Models
Classification models were developed using 20 principal components (PCs) of flexophore fingerprints to distinguish active (pIC50 > 7) from inactive inhibitors. Performance metrics for logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB), and K-nearest neighbors (KNN) classifiers are presented in
Figure 3C, with additional data in
Supplementary Materials Table S5.
RF and GB models outperformed the other classifiers, with training accuracies of 97.2% and 94.4%, respectively, and test accuracies exceeding 90%. ROC curves for the test set confirmed robust model performance, with test set ROC AUC values of 0.9688 for RF and 0.9469 for GB. The RF classifier was selected for downstream analysis based on its ability to accurately identify active ligands while minimizing false positives and its higher values for cross-validation performance metrics. The Gini importance scores of the 20 selected PCs for the RF model are shown in
Figure 3D. Notably, PC3 has a significantly larger contribution to predictions compared to the other PCs.
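The ROC AUC values quoted for the classifiers have a simple probabilistic reading: the chance that a randomly chosen active compound receives a higher predicted score than a randomly chosen inactive one. The sketch below computes the metric directly from that definition (ties count as one half). It is a didactic illustration on invented scores, not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly."""
    actives = [s for l, s in zip(labels, scores) if l == 1]
    inactives = [s for l, s in zip(labels, scores) if l == 0]
    if not actives or not inactives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5  # ties contribute half a win
    return wins / (len(actives) * len(inactives))


# Toy predictions (hypothetical): 1 = active (pIC50 > 7), 0 = inactive.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(y, p), 4))
```

The pairwise loop is quadratic in the number of compounds, which is acceptable for datasets of this size; rank-based formulas give the same value more efficiently.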
2.5. Ligand-Based Meta-Model
A stratified logistic regression meta-model was developed to integrate scaffold-specific information (one-hot encoded MCR scaffolds), predicted pIC50 values (from the selected QSAR model), and predicted probabilities (from the flexophore-based classification model) for distinguishing active from inactive AurB inhibitors (pIC50 > 7).
Figure 4 illustrates the performance of the meta-model, presenting the ROC curves for the training, test, and cross-validation datasets (
Figure 4A), as well as the regression coefficients for the features used in the model (
Figure 4B).
The meta-model demonstrated strong predictive performance, achieving a ROC AUC of 0.9922 on the training set, 0.9594 on the test set, and a mean ROC AUC of 0.9142 across five-fold cross-validation. These results indicate strong discriminatory power and generalizability of the meta-model. Among the input features, the pIC50 values predicted by the QSAR model and the flexophore-based classification probabilities contributed the most, as evidenced by their higher regression coefficients (Figure 4B). This suggests that the integration of QSAR predictions and molecular fingerprints effectively captures key aspects of ligand activity. MCR1 + 2 and MCR9 also exhibited positive regression coefficients, reinforcing the relevance of these scaffolds in defining active AurB inhibitors.
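The way the three information sources enter the meta-model can be sketched as a plain logistic function over a concatenated feature vector. Everything numeric below is hypothetical: the weights and intercept are placeholders chosen only to make the example run, not the fitted coefficients of Figure 4B, and any feature scaling used in the actual model is not reproduced. Compounds without a relevant scaffold are assumed to receive an all-zero one-hot block.

```python
import math

# Eight scaffold categories retained after the scaffold analysis.
SCAFFOLDS = ["MCR1+2", "MCR3", "MCR4", "MCR5", "MCR6", "MCR7", "MCR8", "MCR9"]

# HYPOTHETICAL coefficients: 8 scaffold weights, then predicted pIC50,
# then flexophore-based probability. Not taken from the study.
WEIGHTS = [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 1.2, 2.0]
INTERCEPT = -9.0


def build_features(scaffold, predicted_pic50, flexophore_probability):
    """One-hot encode the scaffold and append the two model outputs."""
    one_hot = [1.0 if scaffold == name else 0.0 for name in SCAFFOLDS]
    return one_hot + [predicted_pic50, flexophore_probability]


def predict_probability(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


with_scaffold = build_features("MCR9", 7.0, 0.6)
without_scaffold = build_features(None, 7.0, 0.6)
print(predict_probability(with_scaffold), predict_probability(without_scaffold))
```

With positive scaffold coefficients, as reported for MCR1 + 2 and MCR9, the presence of such a scaffold raises the predicted probability for otherwise identical inputs.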
2.6. Docking-Based Classification
Docking studies using AutoDock Vina 1.1.2 and AutoDock 4 (AD4) were conducted on AurB inhibitors and decoy molecules. Redocking the co-crystallized structure of AurB inhibitor VX-680 showed that AD4 is more accurate than Vina in correctly predicting the binding pose in this specific case (RMSD of 1.8497 vs. 2.3386 Å,
Supplementary Materials Figure S5). AD4 outperformed Vina in terms of accuracy and reliability, achieving higher ROC AUC values for distinguishing active inhibitors from inactive/decoys (
Figure 5A,B). Additionally, AD4 demonstrated a better correlation between predicted binding energies and experimental pIC50 values, as illustrated in
Supplementary Material Figure S6A,B.
Binding energy distributions and enriched ROC curves highlighted the ability of AD4 to prioritize potent inhibitors, with statistically significant differences (Student’s t-test, p < 0.0001) in binding energies between active and inactive groups (Figure 5C,D). The effectiveness of the docking protocol was also evaluated by computing the enrichment factors in the top 1% and 10% of the dataset. For the top 1%, we determined enrichment factors of 2.13 for Vina and 3.19 for AD4, while at 10%, the values were 3.72 for Vina and 4.75 for AD4, the latter showing higher performance in correctly identifying active molecules.
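The enrichment factor at a given fraction of the ranked list can be sketched as follows. The example assumes that compounds are ranked by predicted binding energy (most negative first) and that the top fraction is rounded up to a whole number of compounds; both choices are assumptions, and the energies and activity labels are toy data.

```python
import math


def enrichment_at(energies, is_active, fraction):
    """EF at a list fraction: active rate in the top slice / overall active rate."""
    n = len(energies)
    n_top = max(1, math.ceil(fraction * n))
    # Lower (more negative) binding energy means a better rank.
    ranked = sorted(zip(energies, is_active), key=lambda pair: pair[0])
    hits_top = sum(active for _, active in ranked[:n_top])
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("no active compounds in the dataset")
    return (hits_top / n_top) / (total_actives / n)


# Toy screen (hypothetical): 10 compounds, 3 of them active.
toy_energies = [-12.0, -11.0, -10.0, -9.0, -8.0, -7.0, -6.0, -5.0, -4.0, -3.0]
toy_actives = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(enrichment_at(toy_energies, toy_actives, 0.2))
print(enrichment_at(toy_energies, toy_actives, 0.5))
```

The maximum attainable value depends on the proportion of actives in the dataset, so enrichment factors are best compared between docking programs run on the same compound set, as done here for Vina and AD4.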
To further assess the performance of the docking algorithms, Tanimoto similarity indices were calculated and compared for the binding poses generated by Vina and AD4 (
Figure 6). More specifically, Tanimoto similarity indices were calculated to quantitatively compare the molecular interaction fingerprints of docked ligands against the reference co-crystallized ligand (VX-680), reflecting the degree of overlap between predicted and reference interaction profiles, ranging from 0 (no similarity) to 1 (perfect match). By assessing this metric, we evaluated the fidelity of docking algorithms in reproducing experimentally validated binding interactions. The docking software that achieves higher Tanimoto similarity indices demonstrates a closer match to the reference interaction profile, indicating superior predictive accuracy.
Figure 6A illustrates that, in our study, AD4 achieves a higher median Tanimoto similarity score compared to Vina, indicating improved consistency in docking pose accuracy. In contrast, Vina demonstrated less robust pose predictions.
Figure 6B further supports these findings, where AD4’s distribution is shifted toward higher similarity indices, signifying a stronger overlap between predicted and reference poses. These results confirm AD4 as the superior docking algorithm in reproducing interaction patterns relevant for active AurB inhibitors, demonstrating the reliability of AD4 in generating biologically relevant binding poses through a quantitative and reproducible measure. Therefore, the selection of interaction fingerprints for downstream machine-learning models was performed based on the AD4 predictions.
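The Tanimoto comparison of interaction fingerprints described above reduces to a set operation when each fingerprint is represented as a collection of (interaction type, residue) pairs. The sketch below uses that representation as an assumption; the reference and docked fingerprints are invented for illustration and do not reproduce the actual VX-680 interaction profile.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto index = |A ∩ B| / |A ∪ B| for two sets of interactions."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints: (interaction type, residue).
reference = {
    ("hydrophobic", "Leu83"),
    ("hydrophobic", "Phe88"),
    ("hbond", "Glu155"),
    ("hbond", "Ala157"),
}
docked_pose = {
    ("hydrophobic", "Leu83"),
    ("hydrophobic", "Phe88"),
    ("hbond", "Ala157"),
    ("hbond", "Glu161"),
}
print(tanimoto(reference, docked_pose))
```

A value of 1 indicates that the docked ligand reproduces exactly the interactions of the reference ligand, while 0 indicates no shared interactions.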
A random forest classifier and Chi-square test were performed to select the most relevant interactions as features for training the docking-based ML model. Based on the computed importance scores, we selected the presence of hydrophobic interactions with Lys85, Tyr156, Glu161, and Lys164 and hydrogen bonds with Tyr156 and Glu161 as common features identified using both approaches. The feature importance scores for the top 10 interactions detected by both methods and the proportions of the common interactions in both active and inactive compounds are shown in
Figure S7. Binding energies and the selected interaction fingerprints obtained using AD4 were thereafter used to train and fine-tune multiple multi-layer perceptron (MLP) classification models, leading to a total of 64 hyperparameter combinations. The best-performing MLP model was trained with a learning rate of 0.001, a batch size of 8, and a dropout rate of 0.1. The training, test, and cross-validation ROC curves, feature weights, loss curves, and accuracy curves for the selected MLP model are shown in
Figure 7. The model achieved robust performance across multiple evaluation metrics. The training and test loss curves (
Figure 7C) demonstrate consistent convergence, with no significant overfitting observed. Accuracy curves (
Figure 7D) highlight the model’s ability to maintain high prediction accuracy for both training and test sets. Additionally, ROC curves (
Figure 7A) indicate strong discrimination between active and inactive inhibitors, with test AUC values exceeding 0.95. The importance of specific interaction features in driving inhibitory activity predictions is visualized in
Figure 7B.
As expected, binding energy stands out as the primary determinant, emphasizing its high contribution to ligand activity. Hydrophobic interactions with residues such as Lys85, Tyr156, Lys164, and Glu161 are highly ranked, highlighting the importance of hydrophobic contacts in stabilizing ligand binding within the active site. Hydrogen bonds involving the Tyr156 and Glu161 residues are also noteworthy, as they provide key polar contacts with the ligand.
2.7. Drug Repurposing Solutions
After predicting AurB inhibitory activity using molecular docking results as independent variables in the MLP model, we further analyzed compounds from DrugBank (investigational and approved drug sets encompassing 4680 molecules). Following applicability domain verification and similarity filtering using flexophore descriptors, 1024 compounds were selected. After further filtering through QSAR-based classification and meta-model analysis, 548 candidates were predicted to possess inhibitory activity against AurB and were further chosen for docking simulations.
Among the compounds with potential AurB inhibitory activity, 30 candidates with probabilities exceeding 95% were ranked and further analyzed (
Table S6 in Supplementary Material). Notably, saredutant, montelukast, and canertinib emerged as the most promising repurposing candidates based on docking results and predicted probabilities.
Saredutant, an investigational neurokinin-2 (NK2) antagonist, demonstrated a predicted binding energy of −11.40 kcal/mol, a 57% probability of being active based on flexophore fingerprints, and a predicted pIC50 of 6.80. While its structure lacked a relevant MCR scaffold, the ligand-based meta-model predicted an 85.72% probability of activity, suggesting its potential for repurposing.
Montelukast, a leukotriene receptor antagonist used in asthma management, also lacked any relevant MCR scaffold. It showed a predicted binding energy of −11.10 kcal/mol, a flexophore-based predicted probability of activity of 78%, a predicted pIC50 of 6.35, and a meta-model predicted probability of 75.77%, making it another potential repurposing candidate.
Canertinib, an investigational EGFR, HER2, and ErbB-4 inhibitor, featured a quinazoline core scaffold (MCR9) and had a predicted binding energy of −10.39 kcal/mol. Despite a lower flexophore-based probability of activity (41%), it exhibited a high predicted pIC50 of 8.10 and an exceptional meta-model probability of 99.63%, underscoring its potential as an AurB kinase inhibitor.
Figure 8A–C illustrates the predicted binding poses and molecular interactions of the three selected candidates within the AurB kinase active site. Hydrophobic interactions dominate the binding mode of montelukast (
Figure 8A), with significant contacts involving Leu83, Pro82, and Phe88, contributing to ligand stabilization. Additionally, a hydrogen bond with Lys85 strengthens the interaction, which is further supported by the salt bridge formed with Glu161. The overall binding energy of montelukast (−11.10 kcal/mol) reflects a strong affinity, yet the lack of an MCR scaffold indicates that its mechanism of action may be unconventional, relying more on its interaction network than conserved pharmacophores.
Hydrophobic contacts with residues Leu83, Tyr156, and Ala157 of saredutant (
Figure 8B) ensure firm anchoring within the pocket. The hydrogen bond interaction with Glu161, a key residue, highlights its potential for kinase inhibition. Notably, despite the strong binding energy of saredutant (−11.40 kcal/mol) and moderate predicted probabilities in some models, it also lacks a scaffold from one of the established relevant MCRs, suggesting it may act through atypical binding dynamics, making it an interesting candidate for further exploration.
Figure 8C reveals that canertinib engages extensively with hydrophobic residues such as Leu138, Ala157, and Phe88, similar to known inhibitors (e.g., VX-680). Hydrogen bonding with Glu161, Leu138, and Phe219 further stabilizes the binding, while the quinazoline scaffold aligns well within the pocket, reminiscent of other kinase inhibitors. With a binding energy of −10.39 kcal/mol and a high predicted meta-model probability of 99.63%, canertinib emerges as a strong candidate for repurposing, especially due to its preserved MCR scaffold.
Given their strong binding energies and favorable interaction profiles within the active site, the three candidates demonstrated significant potential as AurB inhibitors. The compounds were subjected to further validation through molecular dynamics simulations aimed at assessing the stability of the predicted complex with AurB.
2.8. Molecular Dynamics Results
The MD simulations were conducted to evaluate the stability and dynamic behavior of the AurB–ligand complexes over a 125 ns trajectory. Two reference systems were included: the apo structure of AurB (negative control) and the co-crystallized inhibitor-bound structure (positive control). The results, illustrated in
Figure 9, highlighted key differences in the stability and flexibility of the complexes, providing insights into the suitability of the three repurposing candidates.
The root mean square deviation (RMSD) of Cα atoms (Figure 9A) revealed that the ligand-free (apo) structure exhibited higher stability than the complexes with montelukast and canertinib, whose RMSD values fluctuated significantly throughout the simulation. Conversely, the co-crystallized inhibitor (VX-680) and the complex with saredutant showed relatively stable trajectories. Among the candidates, saredutant demonstrated the most stable complex formation, maintaining low RMSD values close to those of VX-680, indicating a robust binding within the active site.
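As a reminder of what the RMSD profiles measure, the quantity can be sketched for two sets of matched atomic coordinates. This minimal example omits the least-squares superposition onto the reference frame that MD analysis tools normally perform before computing RMSD, so it is only meaningful for frames that are already aligned; the coordinates are toy values in ångströms.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy frames (hypothetical): every atom shifted by 1 Å along z.
reference_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
later_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(reference_frame, later_frame))
```

Evaluating this quantity for every frame against the starting structure yields the time series plotted for each system, where a flat profile indicates a conformationally stable complex.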
The radius of gyration (Rg) values (Figure 9B) further supported the structural compactness of the protein–ligand complexes. The Rg values for the apo form fluctuated prominently, suggesting structural instability. The complex with VX-680 maintained a more compact conformation toward the end of the simulation. The complexes with the three candidates showed similar compactness in the last 15 ns, while the AurB–saredutant complex behaved similarly to the positive control in the first 50 ns.
The average number of intramolecular hydrogen bonds (
Figure 9C) had similar values for all the simulated systems (214–217). Towards the end of the simulation, a higher number of intramolecular hydrogen bonds was observed for the complexes with saredutant and canertinib.
The root mean square fluctuation (RMSF) per residue (Figure 9D) highlighted localized flexibility across the protein structure. Interestingly, residues in the apo system displayed fluctuations similar to those of the AurB–VX-680 complex. However, the complexes with the three candidates showed lower fluctuations of the residues within the loop (hinge region) connecting the two lobes. Moreover, some of the residues involved in saredutant binding, such as Phe88, which is involved in ATP binding, exhibited higher fluctuations than in both the apo structure and the other complexes, suggesting that the initial conformation of the AurB–saredutant complex could represent an intermediate binding mode.
The ligand movement RMSD values (Figure 9E) further revealed that the positive control showed the lowest movement within the binding site during the simulation, followed by canertinib and montelukast. Interestingly, saredutant showed high movement in the first 25 ns of simulation time, after which it stabilized for the rest of the trajectory. Ligand conformation RMSD values (Figure 9F) emphasized minimal fluctuations for canertinib within the active site, while VX-680 showed higher conformational changes. Nonetheless, both saredutant and montelukast displayed higher variations in binding conformations. Both the ligand movement and conformation RMSD profiles of saredutant indicated a 25 ns equilibration time for the protein–ligand complex, with much lower RMSD fluctuations during the rest of the simulation (last 100 ns). Therefore, it can be assumed that saredutant underwent a large conformational change in the equilibration phase to engage in more stable binding with the active site. Moreover, the overall ligand movement and ligand conformation RMSD profiles of the last 100 ns showed relatively fewer fluctuations when compared to VX-680. The reference ligand VX-680 had the lowest value for the minimum binding free energy calculated with the MM/PBSA approach (−86.542 kcal/mol), followed by montelukast (−72.098 kcal/mol), saredutant (−65.691 kcal/mol), and canertinib (−55.996 kcal/mol). Binding free energy calculations were performed only for the last 100 ns, corresponding to the production phase.
3. Discussion
The developed drug repurposing framework integrated ligand-based and structure-based computational methods to enhance the prediction accuracy of AurB inhibitors. Initially, ligand-based techniques, including scaffold analysis, QSAR modeling, and classification models using molecular fingerprints, were utilized to identify structural features predictive of AurB activity. These approaches informed the creation of a predictive meta-model that combined outputs from QSAR predictions, classification probabilities, and scaffold information to refine candidate selection. This ligand-based meta-model filtered 548 compounds with over 50% probability of inhibitory activity before subjecting them to molecular docking screening. Subsequently, docking-derived interaction fingerprints and binding energies were used to train the final multi-layer perceptron model. This step incorporated molecular interaction data into the prediction process, extending beyond simple quantitative evaluations. By integrating these complementary methodologies, the proposed workflow achieved a balance between structural specificity and prediction robustness, underscoring the synergistic potential of combining ligand- and structure-based approaches in drug discovery.
The structure–activity analysis of the AurB inhibitors provided valuable insights into the chemical features that are important for inhibitory activity. The proposed strategy emphasizes drug repurposing due to its potential to significantly shorten the timeline for bringing new therapies to market. Since repurposed drugs have already undergone extensive clinical testing, including safety and pharmacokinetic evaluations, they can move more swiftly through the regulatory process compared to novel compounds that require comprehensive preclinical and clinical studies.
Other in silico studies on AurB inhibitors have also identified critical binding site residues and employed advanced docking protocols. For example, a study by Sarvagalla et al. highlighted residues Arg159, Glu161, and Lys164 as critical for designing subtype-selective inhibitors due to their solvent-exposed location [
30]. Similarly, our docking results align with these findings, particularly highlighting interactions with Glu161 across our proposed candidates. This consistency with previous studies underscores the robustness of our combined framework. Furthermore, the integration of meta-model predictions in our pipeline extends the capabilities of traditional QSAR or docking-only methods, a step forward compared to other similar works in the literature [
20,
25,
26].
This study demonstrates the powerful synergy of computational tools in drug discovery. By integrating docking simulations with machine learning models, we developed a comprehensive framework to predict AurB inhibitors, identifying saredutant, montelukast, and canertinib as promising candidates. Molecular dynamics (MD) simulations revealed that saredutant could potentially be the most suitable candidate for repurposing. The initial fluctuations (first 25 ns) of saredutant may represent conformational adjustments as the ligand settles into the binding pocket, followed by more stable dynamics in the remaining 100 ns. However, the complexity of the predictive models using advanced machine learning techniques hinders interpretability, making it difficult to clearly understand the molecular features that contribute most to activity. Another limitation of our study is the nature of the molecular descriptors used in QSAR modeling, since six out of ten features represent sums of van der Waals surface area contributions of specific atoms based on several ranges for partial charges, lipophilicity, molecular refractivity, and electrotopological state indices, while three of the features combine van der Waals surface areas and different bins of atomic partial charges computed using the PEOE method.
Despite these promising findings, experimental validation remains essential to confirm the inhibitory activity of the identified compounds. The achievement of this study lies not only in the identification of the potential repurposed drugs but also in the development of a versatile, integrated computational pipeline for drug discovery. This combined method, which incorporates docking, machine learning, and molecular dynamics, is broadly applicable and can be adapted to other drug targets beyond AurB. Additionally, leveraging structural insights gained from this study can orient de novo drug design, offering a pathway to novel inhibitors with greater specificity and improved therapeutic profiles.