Pharmacophore Synergism in Diverse Scaffold Clinches in Aurora Kinase B

Aurora kinase B (AKB) is a crucial signaling kinase with an important role in cell division; inhibition of AKB is therefore an attractive approach to the treatment of cancer. In the present work, an extensive quantitative structure–activity relationship (QSAR) analysis was performed on a set of 561 structurally diverse aurora kinase B inhibitors. Following the Organization for Economic Cooperation and Development (OECD) guidelines, a QSAR model with high statistical performance was developed (R2tr = 0.815, Q2LMO = 0.808, R2ex = 0.814, CCCex = 0.899). The newly developed seven-variable QSAR model offers an excellent balance between external predictive ability (predictive QSAR) and mechanistic interpretation (mechanistic QSAR). The analysis identifies not only the visible pharmacophoric features but also hidden ones. It indicates that lipophilic and polar groups, especially H-bond-capable groups, must be present at a specific distance from each other. Moreover, ring nitrogen and ring carbon atoms play important roles in determining the inhibitory activity against AKB. The analysis captures both reported and previously unreported pharmacophoric features, and its results are supported by the reported crystal structures of inhibitors bound to AKB.


Introduction
The machinery of cell division, also known as mitosis, is tightly regulated. Any irregularity or imperfection in mitosis results in non-diploid DNA content, which can ultimately lead to cancer [1]. Researchers have therefore become interested in developing cancer chemotherapeutics that target centrosome maturation and separation, mitotic spindle assembly, chromosome segregation, and cytokinesis, processes that involve numerous important signaling kinases, including aurora kinases, polo-like kinase (Plk), and cyclin-dependent kinase (Cdk) [2,3]. The successful transition to mitosis depends on the aurora kinase family of serine/threonine kinases [4][5][6][7]. Since their discovery in 1995 and the first detection of their expression in human cancer tissue in 1998 [2,5,7–9], these kinases have received a great deal of attention, owing to their oncogenic activity and their aberrant overexpression in a wide range of solid and liquid tumors, such as pancreatic, lung, liver, and breast tumors [2,4,5,7–11].
The aurora kinase family consists of three isoforms (A, B, and C), which differ in the length and amino acid composition of their N-terminal domains but share a common, conserved ATP-binding site [2,12]. In order for the centrosome to mature, and for spindle assembly, meiosis, and metaphase spindle orientation to occur, aurora-A
In these conditions, a quick and effective strategy for finding AKB inhibitors remains a key goal for medicinal chemists. Meeting this goal requires modern methods such as computer-aided drug design (CADD) to reduce time, costs, trial-and-error procedures, and other resources [14,15]. The vibrant and growing field of CADD owes its success to the result-oriented performance of molecular docking, QSAR, and its other branches [14][15][16]. In QSAR, a mathematical model is built that relates chemical descriptors (structural features) to a desired bioactivity profile, using a wide range of machine learning techniques [17,18]. In more pragmatic terms, QSAR allows one to prioritize compounds with desirable attributes for subsequent (and presumably successful) biological evaluation [17][18][19]. Traditional QSAR concentrates on producing statistically significant models [17][18][19].
QSAR models for AKB have previously been reported by different researchers using different techniques. For example, Neaz et al. [20] reported a 3D-QSAR model for a dataset of forty-eight quinazoline derivatives bearing other heterocyclic rings; the model had a leave-one-out cross-validated correlation coefficient (Q2LOO) of 0.56. Lan and co-workers [21] carried out another 3D-QSAR and molecular docking study of azaindole derivatives as AKB inhibitors; their best QSAR model, based on forty-one molecules, had Q2LOO = 0.575. Likewise, Ashraf et al. [22] used a dataset of 57 acylureidoindolin derivatives to develop a 3D-QSAR model (Q2LOO = 0.641), which indicated that electrostatic and hydrophobic fields determine the activity of the compounds. Thus, AKB has been the subject of QSAR research; however, the resulting models have found limited use because of a lack of generalizability, low predictive power, reliance on small datasets comprising limited scaffolds, or a combination of these factors. There is therefore a need for a robust and balanced QSAR model based on a larger dataset encompassing diverse structural scaffolds. Consequently, in the present work, a QSAR model was developed that combines high external predictive ability with extensive mechanistic interpretation supported by X-ray-resolved structures.

Results
As stated in Section 1, the focus was on developing a genetic algorithm-multilinear regression (GA-MLR) model that combines mechanistic interpretation with high predictive power. Several structural features were identified in the current investigation. The newly constructed seven-parameter model and its statistical validation parameters are as follows.
Model A is statistically robust, as shown by high values of various statistical parameters, such as the coefficient of determination (R2tr), the leave-one-out cross-validated coefficient of determination (R2cv or Q2LOO), the external coefficient of determination (R2ex), Q2-Fn, and the concordance correlation coefficient (CCCex), and by low values of lack-of-fit (LOF), root mean square error (RMSEtr), and mean absolute error (MAE). Model A therefore has high external predictive ability [23][24][25][26][27][28][29][30], is free of chance correlations [31,32], and meets the suggested threshold values for key parameters. The formulae for these parameters are given in the Supplementary Materials. The applicability domain of the model was evaluated with a Williams plot [33][34][35][36]. Model A thus complies with all OECD-recommended standards and requirements for developing a valuable QSAR model. Different graphs associated with model A are depicted in Figure 2.
There are seven descriptors in model A; they were calculated with PyDescriptor [37] and are listed in Table 1. Five of them, namely fringNplaN4B, fsp3Csp2N5B, N_H_2B, fsp2Osp2C5B, and da_lipo_5B, have positive coefficients in model A, implying that increasing their values could improve the activity profile, whereas the reverse is true for the remaining two, fOringC6B and fringNC6B, which have negative coefficients.
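For readers who want to reproduce statistics of the kind reported for model A, the following Python/NumPy sketch computes R2, Q2LOO, CCC, RMSE, MAE, and Williams-plot leverages for an ordinary least-squares MLR model. It uses commonly cited textbook definitions (Q2LOO via the hat-matrix shortcut, Lin's CCC with population moments, warning leverage h* = 3(p + 1)/n); the exact formulae used for model A are those in the Supplementary Materials. All function names are our own, hypothetical helpers. Note that `r2` uses the mean of whichever set is passed in, so applied to an external set it corresponds to the Q2-F2 convention; other Q2-Fn variants use a different reference mean.

```python
import numpy as np


def design_matrix(X):
    """Prepend a column of ones so that the MLR model has an intercept."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [b0, b1, ..., bp]."""
    coef, *_ = np.linalg.lstsq(design_matrix(X), np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    return design_matrix(X) @ coef


def r2(y_obs, y_pred):
    """1 - SS_res / SS_tot, with SS_tot taken about the mean of y_obs."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def q2_loo(X, y):
    """Leave-one-out Q2 for OLS: PRESS uses residuals e_i / (1 - h_ii), so no refitting."""
    A = design_matrix(X)
    y = np.asarray(y, float)
    hat = A @ np.linalg.pinv(A.T @ A) @ A.T
    loo_resid = (y - hat @ y) / (1.0 - np.diag(hat))
    return 1.0 - np.sum(loo_resid ** 2) / np.sum((y - y.mean()) ** 2)


def ccc(y_obs, y_pred):
    """Lin's concordance correlation coefficient (population variances and covariance)."""
    a, b = np.asarray(y_obs, float), np.asarray(y_pred, float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return 2.0 * cov / (a.var() + b.var() + (a.mean() - b.mean()) ** 2)


def rmse(y_obs, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_obs, float) - np.asarray(y_pred, float)) ** 2)))


def mae(y_obs, y_pred):
    return float(np.mean(np.abs(np.asarray(y_obs, float) - np.asarray(y_pred, float))))


def leverages(X_train, X_query):
    """Williams-plot leverage h_i = x_i^T (X^T X)^-1 x_i, X being the training design matrix."""
    A = design_matrix(X_train)
    Q = design_matrix(X_query)
    return np.einsum("ij,jk,ik->i", Q, np.linalg.pinv(A.T @ A), Q)


def warning_leverage(n_train, n_descriptors):
    """Commonly used applicability-domain threshold h* = 3(p + 1) / n."""
    return 3.0 * (n_descriptors + 1) / n_train


if __name__ == "__main__":
    # Synthetic data only: 7 random "descriptors", not the AKB dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 7))
    y = X @ rng.normal(size=7) + rng.normal(scale=0.3, size=120)
    X_tr, y_tr, X_ex, y_ex = X[:90], y[:90], X[90:], y[90:]
    coef = fit_mlr(X_tr, y_tr)
    y_tr_pred, y_ex_pred = predict(coef, X_tr), predict(coef, X_ex)
    print("R2tr  =", r2(y_tr, y_tr_pred), " Q2LOO =", q2_loo(X_tr, y_tr))
    print("R2ex  =", r2(y_ex, y_ex_pred), " CCCex =", ccc(y_ex, y_ex_pred))
    print("RMSEtr =", rmse(y_tr, y_tr_pred), " MAEtr =", mae(y_tr, y_tr_pred))
    h_star = warning_leverage(len(X_tr), X_tr.shape[1])
    print("external compounds with h > h*:", int(np.sum(leverages(X_tr, X_ex) > h_star)))
```

The hat-matrix form of Q2LOO gives the same value as refitting the model n times with one compound left out, which is why it is convenient for GA-MLR searches that evaluate many descriptor subsets.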
Each molecular descriptor is a numerical representation of structural features [37][38][39] and correlates with different types of pharmacophoric features that govern the inhibitory profile. It should be noted, however, that no single structural feature can explain or fully determine the final biological activity (IC50) of a molecule. Biological activity (IC50, etc.) results from a combination of different structural features and some unknown factors: some features enhance the desired pharmacological activity, whereas others reverse it. It is believed that two or more pharmacophoric groups jointly determine the biological activity (pharmacophore synergism).

Discussion
Of the seven descriptors in model A, five (fringNplaN4B, fsp3Csp2N5B, N_H_2B, da_lipo_5B, and fringNC6B) indicate the importance of different types of nitrogen atoms in determining the inhibitory activity against aurora kinase B. The same is true for carbon, which is present in four descriptors (fsp3Csp2N5B, da_lipo_5B, fringNC6B, and fOringC6B). The relevance of oxygen follows from its presence in three descriptors (fsp2Osp2C5B, da_lipo_5B, and fOringC6B). At the same time, the descriptors in model A are highly interlinked; that is, increasing the value of one descriptor could significantly change the value of another. This can lead to substantial changes in the biological profile of a molecule and points toward pharmacophore synergism, as molecular descriptors are mathematical representations of pharmacophores. For example, the values of fringNplaN4B and fringNC6B both depend on the presence or absence of ring nitrogen atoms, so increasing fringNplaN4B by adding ring nitrogen atoms could also increase fringNC6B. Therefore, in the present work, two or more molecular descriptors were considered concomitantly to explain the variance in the activity of matched molecular pairs (MMPs). Accordingly, the descriptors whose values change within an MMP are discussed together, with relevant examples, in Section 3.
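Under a linear model such as model A, the intercept cancels when the two members of a matched molecular pair are compared, so the predicted activity difference is ΔpIC50 = Σj bj (xj,B − xj,A). The short Python sketch below makes this concrete. The coefficients and descriptor values are hypothetical placeholders, chosen only to show how a positive-coefficient descriptor and a negative-coefficient descriptor that move together (as fringNplaN4B and fringNC6B can when a ring nitrogen is added) may partly cancel; they are not model A's fitted values.

```python
def mmp_delta(coefficients, desc_a, desc_b):
    """Predicted pIC50(B) - pIC50(A) under a linear model, plus per-descriptor terms.

    coefficients: descriptor name -> regression coefficient b_j. The intercept is not
    needed because it cancels in the difference between the two molecules.
    """
    contributions = {
        name: b * (desc_b[name] - desc_a[name]) for name, b in coefficients.items()
    }
    return sum(contributions.values()), contributions


# Hypothetical placeholder coefficients and descriptor values (NOT model A's).
coefs = {"fringNplaN4B": 0.30, "fringNC6B": -0.10}
mol_a = {"fringNplaN4B": 1, "fringNC6B": 2}
mol_b = {"fringNplaN4B": 2, "fringNC6B": 6}  # an extra ring N raises both descriptors

total, parts = mmp_delta(coefs, mol_a, mol_b)
for name, value in parts.items():
    print(f"{name:>14}: {value:+.2f}")
print(f"{'net change':>14}: {total:+.2f}")
```

In this toy case the gain from fringNplaN4B (+0.30) is outweighed by the penalty from fringNC6B (−0.40), which is why changes in several descriptors have to be read together when explaining an MMP.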

da_lipo_5B:
The descriptor da_lipo_5B simultaneously captures two important aspects of a molecule: lipophilic character and H-bond-capable (donor and acceptor) atoms. Note that, in the present work, a carbon atom is treated as non-lipophilic when calculating da_lipo_5B if an oxygen or nitrogen atom is attached to it. The average value of da_lipo_5B is 15.29 for the one hundred most active molecules (IC50 = 0.26 to 4.3 nM) and 8.51 for the one hundred least active molecules (IC50 = 611 to 16,000 nM). This suggests that the more lipophilic atoms lie within five bonds of an H-bond-capable atom, the higher the activity. At first glance, this might suggest that lipophilicity (mostly represented by logP [40]) is the only governing factor. However, the calculated logP (clogP), which represents overall molecular lipophilicity, correlates only weakly with pIC50 (R = 0.077), whereas da_lipo_5B shows R = 0.533. Therefore, the conditional occurrence of lipophilic atoms in the vicinity of H-bond-capable atoms is a better descriptor than overall lipophilicity. A plausible reason is the composition of the AKB active site, in which lipophilic residues such as Gly, Leu, Val, and Phe are consistently interspersed between acidic or basic residues such as Glu, Asp, and Lys [22]. This is why an aurora kinase B inhibitor also requires H-bond-capable atoms, preferably separated by five bonds, with lipophilic atoms concomitantly present in their vicinity. This observation is confirmed by the reported X-ray-resolved structure of aurora kinase B (PDB: 4c2w [41]) (see Figure 3).
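The idea behind da_lipo_5B can be illustrated with a minimal Python sketch that counts (H-bond-capable atom, lipophilic carbon) pairs separated by at most five bonds on a heavy-atom graph. This is not PyDescriptor's implementation; we assume that every N and O atom is H-bond-capable, that only carbons count as lipophilic (applying the rule above that a carbon bearing an N or O neighbour is non-lipophilic), and we read "5B" as "within five bonds", following the wording of this section. The two hexanol isomers share a molecular formula, yet the position of the hydroxyl group changes the count.

```python
from collections import deque


def bond_distances(adjacency, source):
    """Shortest-path bond counts from `source` (breadth-first search); None = unreachable."""
    dist = [None] * len(adjacency)
    dist[source] = 0
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if dist[neighbour] is None:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def da_lipo_count(elements, bonds, max_bonds=5):
    """Count (N/O atom, lipophilic carbon) pairs at most `max_bonds` bonds apart."""
    adjacency = [[] for _ in elements]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    # Assumption: every N and O atom is treated as H-bond-capable (donor or acceptor).
    polar = [i for i, el in enumerate(elements) if el in ("N", "O")]
    # Rule from the text: a carbon with an attached N or O is not lipophilic.
    lipophilic = [
        i for i, el in enumerate(elements)
        if el == "C" and all(elements[n] not in ("N", "O") for n in adjacency[i])
    ]
    total = 0
    for p in polar:
        dist = bond_distances(adjacency, p)
        total += sum(1 for c in lipophilic if dist[c] is not None and dist[c] <= max_bonds)
    return total


# Heavy-atom graphs (hydrogens implicit) for two C6H14O isomers.
hexan_1_ol = (["O", "C", "C", "C", "C", "C", "C"],
              [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)])
hexan_3_ol = (["C", "C", "C", "C", "C", "C", "O"],
              [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (2, 6)])

print(da_lipo_count(*hexan_1_ol))  # 4: the terminal carbon is six bonds from O
print(da_lipo_count(*hexan_3_ol))  # 5: every lipophilic carbon is within five bonds
```

If PyDescriptor counts lipophilic atoms at exactly five bonds rather than within five bonds, replacing `<=` with `==` gives that variant.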

fringNplaN4B:
fringNplaN4B stands for the frequency of occurrence of planar nitrogen atoms exactly four bonds from a ring nitrogen atom. If the same planar nitrogen atom is also present at ≤4 bonds from the same or any other ring nitrogen atom through any path, it is excluded when fringNplaN4B is calculated. The importance of fringNplaN4B is reflected by the fact that the 110 most active molecules (IC50 = 0.26 to 5.9 nM) have one or more combinations of planar and ring nitrogen atoms. The reverse is true for the less active molecules (IC50 = 611 to 16,000 nM), with some exceptions, such as molecules 213, 73, 71, 66, and 20. Moreover, replacing fringNplaN4B with its three- and five-bond counterparts, fringNplaN3B and fringNplaN5B, reduced the performance of model A (R2 = 0.770 in both cases). In addition, fringNplaN3B and fringNplaN5B correlate with pIC50 with R = 0.084 and 0.028, respectively, whereas fringNplaN4B, with R = 0.628, is a better choice of descriptor. At first sight, ringN (number of ring nitrogen atoms) or nplanN (number of planar nitrogen atoms) might individually appear to be alternatives to fringNplaN4B. However, both correlate only weakly with pIC50 (R = 0.207 and 0.374, respectively). Moreover, the loss in statistical performance of model A on replacing fringNplaN4B with ringN (R2 = 0.772) or nplanN (R2 = 0.770) again confirms the importance of fringNplaN4B. Therefore, a combination of ring and planar nitrogen atoms separated by exactly four bonds is an important structural feature for achieving a better pIC50 against AKB.
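The exact-distance-with-exclusion logic of fringNplaN4B can be sketched in Python as follows. This is our reading of the definition, not PyDescriptor's code: distances are shortest paths, a planar N is skipped when it lies fewer than k bonds from any ring N, and otherwise each ring N exactly k bonds away contributes one count. Ring and planar flags are assigned by hand for two small hypothetical examples: 3-aminopyrazole, whose amino N is only two bonds from one ring N (the situation discussed for the N-C-N-N pattern), and 4-aminopyridine, whose amino N lies exactly four bonds from the ring N. Varying k reproduces the three-, four-, and five-bond variants compared above.

```python
from collections import deque


def bond_distances(adjacency, source):
    """Shortest-path bond counts from `source` (breadth-first search); None = unreachable."""
    dist = [None] * len(adjacency)
    dist[source] = 0
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if dist[neighbour] is None:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def exact_distance_count(n_atoms, bonds, a_atoms, b_atoms, k):
    """Count (A, B) pairs exactly k bonds apart, skipping any B atom that lies
    fewer than k bonds from some A atom (assumed reading of the exclusion rule)."""
    adjacency = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    dist_from = {a: bond_distances(adjacency, a) for a in a_atoms}
    total = 0
    for b in b_atoms:
        ds = [dist_from[a][b] for a in a_atoms if dist_from[a][b] is not None]
        if any(d < k for d in ds):
            continue  # B is closer than k bonds to some A: excluded
        total += sum(1 for d in ds if d == k)
    return total


# 3-aminopyrazole: N0(H)-N1=C2(-N5H2)-C3=C4-N0; ring N = {0, 1}, planar N = {0, 1, 5}
aminopyrazole = dict(n_atoms=6, bonds=[(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (2, 5)],
                     a_atoms=[0, 1], b_atoms=[0, 1, 5])
# 4-aminopyridine: ring N0 and C1..C5, amino N6 on C3; ring N = {0}, planar N = {0, 6}
aminopyridine = dict(n_atoms=7, bonds=[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (3, 6)],
                     a_atoms=[0], b_atoms=[0, 6])

for name, mol in [("3-aminopyrazole", aminopyrazole), ("4-aminopyridine", aminopyridine)]:
    counts = {k: exact_distance_count(k=k, **mol) for k in (3, 4, 5)}
    print(name, counts)
# 3-aminopyrazole {3: 0, 4: 0, 5: 0}
# 4-aminopyridine {3: 0, 4: 1, 5: 0}
```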
A literature survey reveals that, for pyrrolopyrazole derivatives, a substituted 3-aminopyrazole moiety is important because of its ability to interact with the hinge region of the ATP-binding site [2]. The three nitrogen atoms of the N-C-N-N pattern in 3-aminopyrazole are responsible for binding with the receptor [2]. However, the reported pattern appears to be specific to pyrrolopyrazole derivatives bearing a substituted 3-aminopyrazole moiety. Interestingly, the terminal nitrogen atoms of the N-C-N-N pattern are ring and planar nitrogen atoms, suggesting the possible presence of fringNplaN4B. Nevertheless, in many active molecules of the present dataset bearing a substituted 3-aminopyrazole moiety, fringNplaN4B is zero, because the planar nitrogen of the N-C-N-N pattern also lies within ≤4 bonds of the other ring nitrogen atom. In several other active AKB inhibitors, fringNplaN4B is present owing to other scaffolds (see Figure 4). In other words, rather than focusing on the N-C-N-N pattern or a substituted 3-aminopyrazole moiety, emphasizing the simultaneous presence of planar and ring nitrogen atoms separated by four bonds is a better strategy for enhancing the inhibitory profile against AKB. Hence, the present work identified a novel aspect of a reported pattern (N-C-N-N) and extended it to other scaffolds.

N_H_2B:
The positive coefficient of N_H_2B indicates that the presence of hydrogen atoms in the vicinity of nitrogen is beneficial for inhibitory activity against aurora kinase B. In many molecules, N_H_2B arises from a hydrogen atom attached directly to a nitrogen atom (N-H) or from hydrogen atoms bonded to a carbon atom adjacent to nitrogen (N-CHn fragment). N_H_2B thus favors two structural features that could lead to a better inhibitory profile: (1) the presence of polar hydrogen atoms in N-H or N-CHn fragments; and (2) reduced steric hindrance (bulkiness) in the vicinity of nitrogen atoms, because hydrogen is the smallest of all atoms. The less bulky the environment around nitrogen atoms, the better the inhibitory profile. In combination, these two features permit polar interactions or H-bond formation between the ligand and the receptor. This observation, and the significance of both N_H_2B and da_lipo_5B, is confirmed by the two forms of the ligand VX-680 (molecule 14) in PDB 4b8m [42].
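A minimal Python sketch of an N_H_2B-style count on an explicit-hydrogen graph is shown below. We assume, from the description above, that a hydrogen contributes when it lies within two bonds of a nitrogen atom (N-H or N-C-H) and that each hydrogen is counted once; this is not PyDescriptor's actual implementation. The example compares two textbook tautomers, 2-pyridone and 2-hydroxypyridine, which differ only in where one hydrogen sits, much as the two forms of VX-680 in PDB 4b8m differ in their hydrogen placement.

```python
from collections import deque


def hydrogens_near_nitrogen(elements, bonds, max_bonds=2):
    """Number of distinct H atoms within `max_bonds` bonds of any N atom."""
    adjacency = [[] for _ in elements]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    near = set()
    for start, el in enumerate(elements):
        if el != "N":
            continue
        # Breadth-first search limited to `max_bonds` steps from this nitrogen.
        depth = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            if depth[atom] == max_bonds:
                continue
            for nb in adjacency[atom]:
                if nb not in depth:
                    depth[nb] = depth[atom] + 1
                    queue.append(nb)
        near.update(a for a in depth if elements[a] == "H")
    return len(near)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # N0, C1 ... C5
ring_h = [(2, 8), (3, 9), (4, 10), (5, 11)]              # C-H hydrogens on C2..C5
elements = ["N", "C", "C", "C", "C", "C", "O", "H", "H", "H", "H", "H"]

pyridone = ring + [(1, 6), (0, 7)] + ring_h         # H7 on the ring nitrogen
hydroxypyridine = ring + [(1, 6), (6, 7)] + ring_h  # H7 on the oxygen instead

print(hydrogens_near_nitrogen(elements, pyridone))         # 2: N-H and the H on C5
print(hydrogens_near_nitrogen(elements, hydroxypyridine))  # 1: only the H on C5
```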
The ligand VX-680 exists in two different forms, labeled TA and TB in the present work, in the two chains of PDB 4b8m. Figure 5 and Table 2 show that the TA form has more hydrogen atoms than TB, especially in the vicinity of nitrogen atoms, which leads to different N_H_2B values for the two forms (see Figure 5). The TA form, with the higher N_H_2B value, makes more interactions with the receptor, because the additional hydrogen atoms attached to the nitrogen atoms of the pyrazole ring (designated N19 and N20) and the aminopyrimidine (designated N14) form H-bonds with Glu171, Phe172, and Ala173 (see Table 2). These interactions are absent for TB, even though the corresponding atoms N19 and N14 of TB are closer to receptor atoms. The TB form has only one prominent interaction with the receptor: an H-bond between the pyrazole nitrogen (designated N20) and Ala173.
The following comparisons of molecules further highlight the importance of N_H_2B (see Figure 6): 108 with 75 and 101, 486 with 487 and 484, and 148 with 144, to list a few. A simple analysis of these examples suggests that the presence of a pyrazole ring leads to a better IC50 (see Figure 6). However, the presence of a pyrazole ring itself shows a weak negative correlation (R = −0.177) with pIC50. A plausible explanation from the present work is that H-bond-capable polar groups near the periphery of a molecule, rather than a pyrazole ring per se, are more suitable for achieving good interactions with the receptor.

fsp3Csp2N5B:
The descriptor fsp3Csp2N5B is associated with two features: sp2-hybridized nitrogen and sp3-hybridized carbon atoms. Because it has a positive coefficient in model A, increasing the number of such atoms favors a higher pIC50. At the same time, increasing fsp3Csp2N5B could influence the values of da_lipo_5B and N_H_2B, as these descriptors also involve carbon and nitrogen atoms. This indicates that pharmacophore synergism determines the final inhibitory ability of a molecule against AKB, as is clearly reflected when molecule 435 is compared with molecule 438.
PDB 4c2v contains two different tautomeric forms of the ligand YJA in two different chains, A and B. The influence of fsp3Csp2N5B, along with N_H_2B, is observed for these two tautomeric forms of the co-crystallized ligand YJA [41]: YJA-T1 and YJA-T2 (see Figure 7) have different values of fsp3Csp2N5B and N_H_2B (see Table 3). The online tautomer generator from Chemaxon (https://disco.chemaxon.com/calculators/demo/plugins/tautomers/, accessed on 28 October 2022) indicates that YJA can exist in seven different tautomeric forms. However, only two of them, YJA-T1 and YJA-T2, predominate, at approximately 16% and 84%, respectively. The remaining tautomeric forms have less than a 0.1% probability of existence.
A comparison of the interactions of YJA-T1 and YJA-T2 with the receptor and the solvent indicates that both forms establish H-bonds with similar amino acid residues of the receptor, but at different distances (see Figure 8). YJA-T2 has an additional H-bond with the solvent (HOH2108). Moreover, it makes more interactions with the receptor and the solvent (H2O) within 5 Å than YJA-T1. Thus, for these two tautomeric forms, an increased value of fsp3Csp2N5B and N_H_2B correlates with a higher number of receptor atoms in the vicinity, which ultimately leads to more interactions. Additional details of the interactions of YJA-T1 and YJA-T2 with the receptor are available in Table S1 in the Supplementary Materials.

fsp2Osp2C5B:
The molecular descriptor fsp2Osp2C5B underlines the influence of a specific combination of sp2-hybridized carbon and sp2-hybridized oxygen atoms on the inhibitory profile against AKB. Its positive coefficient indicates that increasing such combinations of oxygen and carbon could lead to a better inhibitory profile. In the present dataset, 426 molecules contain at least one such combination. Likewise, the 200 most active molecules (IC50 = 0.26 to 24 nM), except molecules 36 and 469, possess fsp2Osp2C5B > 1. A comparison of molecule 167 with molecule 168 further strengthens this observation (see Figure 9).
A closer analysis revealed that the sp2-hybridized carbon and sp2-hybridized oxygen atoms required for fsp2Osp2C5B are, in general, aromatic carbon atoms and the oxygen of a carbonyl group (especially an amide group), respectively. This further highlights the importance of aromatic rings, and in turn of lipophilic atoms, as aromatic carbons are mostly lipophilic in nature. The need for a conjugated amide group points to the necessity of a polar group to enhance interactions with the receptor. Both tautomeric forms, YJA-T1 and YJA-T2, possess such a combination, which results in enhanced interactions with the receptor (see Figure 8). Because an sp2-hybridized carbon atom can lie three bonds from the nitrogen and five bonds from the oxygen of the same amide group, we also checked the importance of famdNsp2C3B (frequency of occurrence of sp2-hybridized carbon atoms exactly three bonds from amide nitrogen atoms). fsp2Osp2C5B and famdNsp2C3B correlate with pIC50 with R = 0.64 and 0.58, respectively. Therefore, fsp2Osp2C5B is the better choice for future optimization and activity prediction.
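The comparison above, in which fsp2Osp2C5B (R = 0.64) is preferred over famdNsp2C3B (R = 0.58), amounts to ranking candidate descriptors by their Pearson correlation coefficient with pIC50. A minimal Python sketch follows; the helper names are our own, and the descriptor values and activities in the usage example are invented toy numbers, not data from the present set of 561 inhibitors.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def rank_descriptors(descriptor_table, activity):
    """Return (name, r) pairs sorted by |r| with the activity, strongest first."""
    scores = [(name, pearson_r(values, activity)) for name, values in descriptor_table.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy example: five hypothetical molecules.
pic50 = [6.0, 6.5, 7.1, 7.8, 8.4]
table = {
    "fsp2Osp2C5B": [0, 1, 1, 2, 3],
    "famdNsp2C3B": [0, 2, 1, 1, 3],
}
for name, r in rank_descriptors(table, pic50):
    print(f"{name}: R = {r:.2f}")
```

Ranking by |r| keeps strongly negative descriptors (such as fOringC6B) in view as well; in a GA-MLR workflow this simple screen would only precede, not replace, refitting the full model with the alternative descriptor.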

fOringC6B:
The descriptor fOringC6B captures the simultaneous and conditional occurrence of polar (oxygen) and lipophilic (ring carbon) features separated by exactly six bonds. A ring carbon that also lies within five or fewer bonds of any other oxygen atom is omitted when calculating fOringC6B. The negative coefficient of fOringC6B in model 1 means that a higher number of such carbon atoms could reduce the inhibitory activity of a molecule against AKB. This is confirmed by comparing the following pairs of molecules: 526 with 511, 526 with 521, 204 with 205, 229 with 231, 477 with 485, and 256 with 257. The descriptor is depicted in Figure 10, where red dots mark the ring carbons that contribute to fOringC6B at exactly six bonds from the oxygen atom, and the six bonds separating each such carbon and oxygen atom are numbered. Reducing the number of ring carbon atoms would lower fOringC6B, but it would also adversely affect other descriptors, viz. da_lipo_5B and fsp2Osp2C5B. A better solution is to reduce the number of oxygen atoms or, alternatively, to place oxygen atoms within five or fewer bonds of the ring carbon atoms. The second solution is observed for molecule 229: its additional -OCH3 group lowers fOringC6B, because a ring carbon atom that lies six bonds from one oxygen atom but within fewer than six bonds of another oxygen atom is excluded from the count.
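The omission rule can be sketched in Python as follows. This minimal sketch assumes that fOringC6B counts each ring carbon once when its nearest oxygen is exactly six bonds away, so that a closer oxygen (such as the extra -OCH3 of molecule 229) removes it from the count. The graph encoding and the names `bond_distances` and `conditional_count` are hypothetical; this is not PyDescriptor's implementation.

```python
from collections import deque


def bond_distances(n_atoms, bonds, start):
    """Breadth-first search: bonds on the shortest path from start to each atom (-1 if unreachable)."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = [-1] * n_atoms
    dist[start] = 0
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in neighbours[u]:
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def conditional_count(n_atoms, bonds, polar_atoms, target_atoms, d=6):
    """Count target atoms whose nearest polar atom is exactly d bonds away.

    A target that is d bonds from one polar atom but closer to another polar
    atom is omitted, mirroring the exclusion rule described in the text.
    """
    nearest = [None] * n_atoms  # shortest distance from each atom to any polar atom
    for p in polar_atoms:
        for i, dist in enumerate(bond_distances(n_atoms, bonds, p)):
            if dist >= 0 and (nearest[i] is None or dist < nearest[i]):
                nearest[i] = dist
    return sum(1 for t in target_atoms if nearest[t] == d)


# Toy chain O0-C1-C2-C3-C4-C5-C6 in which only C6 is treated as a ring carbon.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
print(conditional_count(7, chain, polar_atoms=[0], target_atoms=[6]))  # 1

# Attach a methoxy group (O7-C8) to C6: its nearest oxygen is now one bond away.
with_methoxy = chain + [(6, 7), (7, 8)]
print(conditional_count(9, with_methoxy, polar_atoms=[0, 7], target_atoms=[6]))  # 0
```

Passing ring nitrogens as `polar_atoms` and carbons as `target_atoms` would mimic the analogous omission rule used for fringNC6B.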

fringNC6B:
The molecular descriptor fringNC6B provides crucial information about the upper limit of separation between lipophilic (carbon) and polar (ring nitrogen) moieties for a better activity profile. When calculating fringNC6B, a carbon atom that also lies within five bonds of any other ring nitrogen is omitted. A carbon atom located exactly six bonds from a ring nitrogen atom contributes negatively; therefore, such a combination should be avoided. Reducing the bond gap between carbon and ring nitrogen is a feasible and justified solution, as other descriptors, viz. da_lipo_5B and fsp3Csp2N5B, point in the same direction. As stated earlier, a plausible reason for this could be the nature of the AKB active site (see Figure 11). The influence of fringNC6B on activity is confirmed by comparing the following pairs of molecules: 5 with 500, 5 with 506, 374 with 406, and 507 with 514, among others.
Figure 11. Depiction of fringNC6B using molecules 5, 500, and 506 as representative examples. Carbon atoms located six bonds from a ring nitrogen are marked with black dots, and the black numbers count the bonds between the ring nitrogen and each such carbon.
As stated earlier, the descriptors in model A are entangled, so changing one descriptor can change others. For example, both fringNplaN4B and fringNC6B reflect the importance of ring nitrogen atoms. The descriptor fringNplaN4B correlates positively with pIC50, whereas fringNC6B correlates negatively. Therefore, increasing fringNplaN4B by adding ring nitrogen atoms could also raise fringNC6B. Hence, an appropriate balance in the number and types of nitrogen, carbon, and oxygen atoms could lead to significant inhibitory activity against aurora kinase B.

Materials and Methods
In this work, we followed the standards and recommendations of the OECD and other researchers [17–19,32,43,44] for a successful QSAR analysis. Model development comprised careful dataset selection, data curation, generation of 3D structures for all molecules, calculation and pruning of molecular descriptors, model building with extensive validation, and mechanistic interpretation [45,46]. To eliminate bias and ensure proper model validation, these stages were carried out sequentially.

Selection of Dataset
The success and efficacy of a QSAR analysis in the drug discovery pipeline are significantly influenced by the size, composition, and structural diversity of the dataset used [17–19,32,43,44]. Therefore, a large dataset of 3398 reported AKB ligands was downloaded from BindingDB (https://www.bindingdb.org/bind/index.jsp, accessed on 14 January 2022). During data curation [47], duplicates (replaced by their average value), salts, metal derivatives, rule-of-five violators, molecules with undefinable Ki values, etc., were removed, reducing the dataset to 561 molecules. The curated dataset still included a variety of molecules, such as stereoisomers, positional and chain isomers, and various heterocyclic and aromatic scaffolds, and thus covered a broad chemical space. The experimental IC50 values ranged from 0.26 to 16,000 nM and were converted to pIC50 (−log10 IC50) for the QSAR analysis. Figure 12 and Table 4 show some of the most and least active molecules to illustrate the structural variation in the dataset.
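The IC50-to-pIC50 conversion can be sketched in Python as below. It assumes the reported IC50 values are in nM and follows the usual convention of taking the logarithm of the molar concentration; the function name is hypothetical.

```python
import math


def pic50_from_nanomolar(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L); the input is assumed to be in nM (1 nM = 1e-9 M)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


# The two ends of the reported IC50 range.
print(round(pic50_from_nanomolar(0.26), 3))   # 9.585
print(round(pic50_from_nanomolar(16000), 3))  # 4.796
```

Because a higher pIC50 means higher potency, the reported range of 0.26 to 16,000 nM spans roughly 4.8 to 9.6 log units under this convention.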

Calculation of Molecular Descriptors and Objective Feature Selection (OFS)
The next step was to convert the SMILES notations into 3D-optimized structures. OpenBabel 3.1 [48] was used to convert SMILES to SDF. The SDF files were then converted to MOL2 with MOPAC 2016 [49], using the semiempirical PM3 method for structure optimization and partial-charge assignment. Molecular descriptors were then calculated with PyDescriptor [37] and PaDEL [50], which together provided more than 40,000 descriptors for each molecule. Although a large pool of descriptors increases the likelihood of an effective QSAR analysis with a balance of predictive and mechanistic interpretation abilities, it also raises the risk of overfitting due to redundant, noisy descriptors or chance correlations. Therefore, OFS was carried out using QSARINS 2.2.4 [51] to remove descriptors that were nearly constant (the same value for 90% of the molecules) or highly inter-correlated (|R| > 0.90). After OFS, 1150 descriptors remained. They nevertheless covered a wide descriptor space, including fingerprints, charge-based descriptors, 1D to 3D descriptors, and a good number of atom-pair descriptors. Because a significant portion of these descriptors can be readily interpreted in terms of structural features, the likelihood of obtaining a mechanistically interpretable model increased.
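The two OFS filters can be sketched in Python as a greedy pass over the descriptor columns. This minimal sketch assumes that a descriptor is "nearly constant" when a single value covers at least 90% of the molecules and that, of two descriptors with |R| > 0.90, the one encountered first is kept; QSARINS may apply different ordering or tie-breaking rules, and the function names are hypothetical.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def objective_feature_selection(columns, constant_frac=0.90, r_cut=0.90):
    """columns maps descriptor name -> values (one per molecule); returns surviving names."""
    kept = {}
    for name, values in columns.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= constant_frac:
            continue  # nearly constant descriptor
        if any(abs(pearson_r(values, other)) > r_cut for other in kept.values()):
            continue  # highly inter-correlated with a descriptor already kept
        kept[name] = values
    return list(kept)
```

A greedy, order-dependent pass is the simplest choice for illustration; other strategies (for example, keeping the member of a correlated pair that correlates better with pIC50) are equally plausible.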

Splitting the Dataset into Training and External Sets and Subjective Feature Selection (SFS)
SFS is one of the most important steps in QSAR model building; it involves choosing an appropriate feature selection technique and an adequate number and set of molecular descriptors. Before model development, the dataset was randomly divided at a ratio of 80:20 into a training set (449 molecules) and a prediction set (112 molecules), to allow proper training and validation of the model. Random splitting was used to eliminate bias, reduce information leakage [32], improve the composition of the training and prediction sets, and confirm the model's ability to predict molecules outside the training set. Molecular descriptors were selected using the training set only. The prediction set, also known as the test set or external set, was used exclusively to judge the external predictive ability of the model.
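A seeded random 80:20 split can be sketched in Python as follows; for 561 molecules it yields the 449/112 partition sizes given above. The seed value and the function name are hypothetical, since the text does not state how the randomization was seeded.

```python
import random


def random_split(ids, train_frac=0.80, seed=42):
    """Shuffle molecule IDs reproducibly and cut them into training and prediction sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # private RNG, so global random state is untouched
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = random_split(range(1, 562))
print(len(train_ids), len(test_ids))  # 449 112
```

Fixing the seed makes the partition reproducible, which matters because descriptor selection must later be repeated on exactly the same training set.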
To prevent over- and underfitting, a QSAR model must contain an optimal number of molecular descriptors (variables). This number was identified using a simple graphical (breaking point) method [45,46,52]. When descriptors are added to an MLR model one at a time, Q2LOO typically increases considerably until a certain elevation is reached, after which it increases only slightly or negligibly. The number of descriptors at this elevation (breaking) point is therefore considered optimal for building the QSAR model. The corresponding plot is shown in Figure 13, where the last elevation point corresponds to seven molecular descriptors. Therefore, a genetic algorithm combined with multiple linear regression (GA-MLR), as implemented in QSARINS 2.2.4, was used for an extensive search for the seven descriptors of the QSAR model, with Q2LOO as the fitness function.
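The breaking-point procedure can be sketched in plain Python. This minimal sketch assumes Q2LOO = 1 − PRESS/TSS, computed by refitting an ordinary least-squares MLR model (with intercept) once per left-out training molecule. `min_gain` is a hypothetical threshold for "negligible" improvement, since the text locates the breaking point visually in Figure 13. QSARINS computes these quantities internally; this is not its code.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], row))


def q2_loo(rows, y):
    """Q2LOO = 1 - PRESS / TSS, refitting the model once per left-out molecule."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / tss


def breaking_point(q2_by_size, min_gain=0.01):
    """q2_by_size[k - 1] is the best Q2LOO with k descriptors; return the last k that gained >= min_gain."""
    best_k = 1
    for k in range(2, len(q2_by_size) + 1):
        if q2_by_size[k - 1] - q2_by_size[k - 2] >= min_gain:
            best_k = k
    return best_k
```

Refitting n times is the most transparent form of leave-one-out validation; for MLR the same PRESS can also be obtained in one fit from the residuals divided by (1 − h_ii).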
The applicability domain of the model must be defined as part of its validation. To assess the applicability domain of the QSAR model, we used a Williams plot (standardized residuals versus hat values).
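A Williams plot places each compound by its hat value (leverage) and standardized residual. The Python sketch below computes hat values h = xᵀ(XᵀX)⁻¹x for an intercept-plus-descriptors design matrix. It assumes the common warning leverage h* = 3(p + 1)/n and a ±3 standardized-residual band; these cut-offs are conventions assumed here, not values stated in the text, and the function names are hypothetical. Under this convention, p = 7 and n = 449 would give h* ≈ 0.053.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a must be non-singular)."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hat_values(train_rows, query_rows):
    """h = x^T (X^T X)^-1 x, with X the training design matrix including an intercept column."""
    design = [[1.0] + list(r) for r in train_rows]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    result = []
    for q in query_rows:
        x = [1.0] + list(q)
        z = solve(xtx, x)  # z = (X^T X)^-1 x, without forming the inverse explicitly
        result.append(sum(a * b for a, b in zip(x, z)))
    return result


def warning_leverage(n_train, n_descriptors):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train


def inside_domain(h, std_residual, h_star, residual_cut=3.0):
    """A compound is inside the domain if it is neither high-leverage nor a response outlier."""
    return h <= h_star and abs(std_residual) <= residual_cut
```

Hat values for training compounds sum to the number of fitted parameters, which is a quick sanity check on any implementation.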

Conclusions
The present analysis highlighted the significance of different types of atoms, groups, patterns, and tautomerism for the inhibitory activity against AKB. It also emphasized the importance of specific patterns of atoms with different hybridization states, and of their inter-relations, in determining the final activity. Model A also recognized that the conditional presence of lipophilic (carbon) atoms or groups relative to nitrogen atoms is beneficial for higher inhibitory activity against AKB. The present work, for the first time, points out the role of tautomerism in AKB inhibitors. Model A showed good statistical performance, indicating strong external predictive ability. Because the current work recognized both previously described and novel pharmacophoric features associated with AKB inhibition, the results could be useful throughout the drug discovery pipeline for developing lead and drug candidates against AKB.

Conflicts of Interest:
The authors declare no conflict of interest.

SMILES
Simplified molecular-input line-entry system