AKT Inhibitors: The Road Ahead to Computational Modeling-Guided Discovery

AKT, is a serine/threonine protein kinase comprising three isoforms—namely: AKT1, AKT2 and AKT3, whose inhibitors have been recognized as promising therapeutic targets for various human disorders, especially cancer. In this work, we report a systematic evaluation of multi-target Quantitative Structure-Activity Relationship (mt-QSAR) models to probe AKT’ inhibitory activity, based on different feature selection algorithms and machine learning tools. The best predictive linear and non-linear mt-QSAR models were found by the genetic algorithm-based linear discriminant analysis (GA-LDA) and gradient boosting (Xgboost) techniques, respectively, using a dataset containing 5523 inhibitors of the AKT isoforms assayed under various experimental conditions. The linear model highlighted the key structural attributes responsible for higher inhibitory activity whereas the non-linear model displayed an overall accuracy higher than 90%. Both these predictive models, generated through internal and external validation methods, were then used for screening the Asinex kinase inhibitor library to identify the most potential virtual hits as pan-AKT inhibitors. The virtual hits identified were then filtered by stepwise analyses based on reverse pharmacophore-mapping based prediction. Finally, results of molecular dynamics simulations were used to estimate the theoretical binding affinity of the selected virtual hits towards the three isoforms of enzyme AKT. Our computational findings thus provide important guidelines to facilitate the discovery of novel AKT inhibitors.


Introduction
AKT, also known as protein kinase B (PKB), is a serine/threonine-specific protein kinase that belongs to the AGC family of kinases. Three closely related isoforms of AKT occur in mammals, namely: AKT1 (PKBα), AKT2 (PKBβ) and AKT3 (PKBγ) [1,2]. All these AKT isoforms have a common structure comprised of three domains, i.e., a pleckstrin homology (PH) domain at the N terminus that binds to phosphatidylinositol-3-kinase (PI3K), a catalytic kinase domain with an ATP-binding site and a hydrophobic motif (HM) at the C-terminus [3]. A high degree of sequence homology is observed at the catalytic domain of AKT isoforms though the other two domains vary to a certain extent. AKT1 and AKT2 are expressed ubiquitously whereas AKT3 is found primarily in the brain, kidney, and heart. Being a key enzyme of the PI3K cascade, AKT plays a crucial role in the regulation of diverse cellular functions. Some major functions of AKT include cell proliferation, cell cycle progression, cell survival and inhibition of apoptosis through inactivation of more than 20 pro-apoptotic proteins [1,[4][5][6]. AKT signaling is regularly impaired in several types of cancers and increased AKT activity has been detected in a number of aggressive malignancies. Therefore, these enzymes are considered as promising targets for the development of novel anticancer agents [7][8][9]. However, the scope of AKT inhibitors action is not just limited to the treatment of malignant diseases. Recent investigations suggested that AKT inhibitors might also be applied in the treatment of neurological diseases, diabetes, obesity, cardiovascular diseases, idiopathic pulmonary fibrosis, inflammatory and autoimmune diseases [6,10,11]. Multiple attempts have so far been made to develop AKT inhibitors as anticancer agents. GSK690693 was the first clinically tested AKT inhibitor that was followed by AZD5363, ipatasertib, afuresertib, uprosertib, MK-2206, etc. [12]. Currently, at least seven AKT inhibitors are in different stages of clinical trials [13]. Most of these clinically tested agents bind to the catalytic domain of the enzymes, and therefore these simultaneously inhibit all three isoforms of the AKT (i.e., they are pan-AKT-inhibitors). However, allosteric inhibitors were also developed in order to obtain more selectivity towards one or more AKT isoforms [14]. As an example, BAY1125976 is an allosteric inhibitor with higher specificity towards AKT1 and AKT2 isoforms. However, the advantage of isoform specific AKT inhibitors against pan-AKT inhibitors are yet to be established clinically [13,15].
Nevertheless, it has already been confirmed that bioactivity against all three isoforms should be taken into consideration while developing novel AKT inhibitors.
Machine learning-based (ML) tools have thus far been proved to be an extremely useful strategy for the design and discovery of therapeutically active agents [16][17][18]. In particular, these have been frequently employed in Quantitative Structure-Activity Relationships (QSAR) modeling to find structural requirements for higher active molecules and/or to predict activity of novel hit molecules [19][20][21]. ML tools have also been applied in multi-target QSAR (mt-QSAR) modeling for jointly predicting the bioactivity of compounds optimized under multiple biological targets and assay conditions [16,20,[22][23][24][25][26]. Recently, we have launched the software code QSAR-Co for easy tackling of multi-target classification-based QSAR modeling efforts [22]. In QSAR-Co (available in https://sites. google.com/view/qsar-co (accessed on 8 February 2021)), linear mt-QSAR models are developed by the genetic algorithm based linear discriminant analysis (GA-LDA) whereas non-linear models are generated by the random forest (RF) technique. With the desire to extend its functionalities that can further help in understanding the scope and reliability of computational modeling-guided approaches, we examined here various feature selection algorithms and machine learning tools to build reliable mt-QSAR models for probing the inhibitory action of AKT enzyme isoforms. The best predictive linear and non-linear models were then used to screen a focused kinase library to obtain the most potential virtual hits that were further investigated by structure-based methods, such as pharmacophore-based prediction, docking and molecular dynamics (MD) simulation techniques. Even though several computational modeling works targeting AKT inhibitors have been reported so far, these were always focused only on one subtype of AKT pertaining to one experimental assay condition [27][28][29][30][31][32][33]. To the best of our knowledge, the current work is the first one to report multi-target computational modeling-guided discovery of inhibitors for all three AKT isoforms assayed under multiple experimental assay conditions.

Dataset Collection and Preparation
Dataset compounds were collected from the ChEMBL database (https://www.ebi. ac.uk/chembl/ (accessed on 1 June 2020)). Details of the dataset can be found in Supplementary Materials (SM1.xlsx). Each of these compounds has been tested against at least one of the three isoforms of AKT (i.e., AKT1, AKT2 and AKT3, BAO label: Single protein) and the corresponding activity evaluated according to either half-maximal inhibitory concentration (IC 50 ) or binding affinity (Ki). Moreover, at least one of two assay techniques has been applied, i.e., either a binding assay (B) or a functional assay (F). After removal of duplicate data-points, a dataset containing 5523 samples was used for modeling the inhibitory activity, in which the latter was converted into a binary categorical response variable, IAi(c j ), with values +1 (active) and −1 (inactive). Samples with IC 50 /K i values ≤ 500 nM were considered as active [IAi(c j ) = +1], otherwise they were considered as inactive [IAi(c j ) = −1] [20,34]. Further, we adopted the Box-Jenkins moving average approach for handling the mt-QSAR modeling. Details of this computational modeling approach have been thoroughly discussed over the past and so we limit ourselves here to a brief outline [16,22,25,35]. According to the Box-Jenkins moving average approach, the input structural descriptors (D i ) of each compound are converted to deviation descriptors (∆(D i )c j based on the experimental conditions c j (or ontology) these have been tested for. As referred to above, three different experimental elements are considered here for mt-QSAR modeling, i.e., the biological target (b t : AKT1, AKT2, or AKT3), measure of effect (m e : IC 50 or Ki), and assay type (a t : B or F). The final deviation descriptors ∆(D i )c j not only encode structural aspects of the compounds but also information related to the experimental conditions under which these have been assayed (i.e., c j ) [22,36,37]. Details regarding the calculation of input descriptors (D i ), data curation and dataset division schemes, as well as model development strategies are discussed in the Materials and Methods section. The mt-QSAR model development strategy is outlined in Figure 1.

Linear Interpretable Mt-QSAR Models
The dataset was first randomly divided into a training set containing 3867 data points (70% of the data) and a validation set containing 1656 data-points (30% of the data) using the random division technique of QSAR-Co tool [22]. A total of 5305 input descriptors (D i ) were calculated for the training set by employing the alvaDesc tool [38], and these descriptors were subsequently converted to 15,915 deviation descriptors with the help of QSAR-Co tool [22]. It must be noted here that all models were set up solely on the basis of the training set, which was further split into a sub-training set containing 2707 data points (70% of the training set) and a test set containing 1160 data points (30% of the training set) by applying the random division technique of QSAR-Co tool [22] for models' development purposes. The predictivity of the built models was finally tested with the validation set. Therefore, the difference between the test set and validation set is that the test set datapoints participated in the calculation of deviation descriptors but datapoints of the validation set had no role on that.
Then, 2881 descriptors were identified to have an intercorrelation greater than 0.999 with other descriptors and after removing these descriptors the remaining 13,034 descrip-tors were considered for development of the linear models. However, the intercorrelations among the selected descriptors of the final models were critically examined.
Three different feature selection techniques, namely: genetic algorithm (GA), forward stepwise (FS) and sequential forward selection (SFS), were used one-by-one for the development of linear interpretable models. For each of the following linear discriminant analysis (LDA) models-i.e., GA-LDA, FS-LDA and SFS-LDA, a maximum of ten descriptors was allowed. The best linear models derived from the sub-training set, with descriptors selected by these feature selection techniques, are depicted in Table 1 along with the LDA statistical parameters.
As seen, the low Wilk's lambda (λ) values and high chi-square (χ 2 ), squared Mahalanobis distance (D 2 ) and F values are indicative of the statistical significance of all three models developed. Among these models, the FS-LDA model is found to have the lowest λ value. Significantly, the goodness-of-fit of GA-LDA is very similar to that of the FS-LDA model. The degree of collinearity among the selected variables was also inspected, and the resultant cross-correlation matrices can be found in the Supplementary Materials (Tables S1-S3). The highest Pearson correlation coefficients (r) observed between two independent variables were 0.838, 0.614 and 0.742 for the GA-LDA, FS-LDA and SFS-LDA models, respectively. It must be pointed out here that we discarded all models generated with highly intercorrelated independent variables (r ≥ 0.85). That was the case, for example, of two initial SFS-LDA models that had to be discarded and then re-generated after removing one of the descriptors with r > 0.85.
The next step was to verify the uniqueness of the derived models, which can easily be done by applying the Y-based randomization technique [39]. In our previous works [16,20], the Y-based randomization was performed only by scrambling the response variable but here, we slightly modified this technique and named the new technique as Yc randomization. Generally, the Y-based randomization allows one to check if the linear model was not developed by chance. In conventional computational-guided modeling, the response variable is randomly shuffled n times to generate n number of randomized models, the statistical parameters of which are then compared to that of the original model [22,40]. However, in the Box-Jenkins based mt-QSAR, the experimental elements (c j ) participate also in the calculation of the final deviation descriptors. Therefore, these experimental elements should be randomized along with the response variables to assess the robustness of the models. In order to fulfil such criteria, both the responses I A i c j and elements c j were shuffled 100 times to generate 100 different randomized datasets along with their deviation descriptors. The models developed subsequently using the same feature selection techniques were evaluated by computing the corresponding λ (λ r ) values. The average of the latter values (λ rm ) was then compared with the λ values obtained for the original models. The λ rm values obtained for the GA-LDA, FS-LDA and SFS-LDA randomized models (0.994, 0.996 and 0.992, respectively) were found to be much higher than the λ values obtained for the original models (0.414, 0.408 and 0.507, respectively), thus confirming the unique nature of the later models. Table 1. Goodness-of-fit of the linear models produced by different feature selection algorithms.

Method
Model λ χ 2 D 2 p F (10,2696)   Let us now check the overall predictive ability of these linear models. To do so, statistical parameters such as the sensitivity, specificity, F-measure, accuracy and the Matthews correlation coefficient [41,42] values were carefully examined not only for the sub-training subset but also, to infer their external predictivity, firstly for the test set (n = 1160) and finally for the validation set (n = 1656). As seen in Table 2, all models display a high predictivity against the sub-training, test and validation sets. The overall predictivity of the GA-LDA model however supersedes that of both FS-LDA and SFS-LDA models, judging from the obtained accuracy values for such sets (88.2%, 89.6%, 88.2%, respectively). Interestingly, the overall predictivity of SFS-LDA model is similar to that of the GA-LDA model. Even though FS-LDA model had the highest goodness-of-fit (lowest λ value), it afforded a lower overall predictive power compared to that of the other two models. Another way of confirming the classification ability of any LDA model is by means of the receiver operating characteristics (ROC) curve [43]. Figure 2 shows the ROC plots for the GA-LDA and SFS-LDA models. One can see that both these models are not random, but truly statistically significant classifiers, since the area under the ROC curves (ROC AUC scores calculated with Scikit-learn [44]) for all the sets are statistically higher (>0.88) than that of a random classifier model (ROC AUC score = 0.5).  Another way of confirming the classification ability of any LDA model is by means of the receiver operating characteristics (ROC) curve [43]. Figure 2 shows the ROC plots for the GA-LDA and SFS-LDA models. One can see that both these models are not random, but truly statistically significant classifiers, since the area under the ROC curves (ROC AUC scores calculated with Scikit-learn [44]) for all the sets are statistically higher (>0.88) than that of a random classifier model (ROC AUC score = 0.5). To sum up, it can be inferred that the GA-LDA is the best linear model considering its goodness-of-fit as well as internal and external predictive ability. Moreover, when the sub-training set of GA-LDA model was subjected to a 10-fold cross-validation, an accuracy of 87.99% and MCC value of 0.752 were obtained indicating high internal predictivity of this model. Yet, to establish the overall reliability of any mt-QSAR model, one should also access its applicability domain (AD). When the GA-LDA model was examined by means of the standardization-based AD approach [45], 64 data-points of the sub-training set, 20 datapoints of test set and 49 data-points of the validation set are found to be possible structural outliers, meaning that for those predictions might not be reliable.

GA-LDA
Further approval of this GA-LDA classification model should only be carried out af- To sum up, it can be inferred that the GA-LDA is the best linear model considering its goodness-of-fit as well as internal and external predictive ability. Moreover, when the sub-training set of GA-LDA model was subjected to a 10-fold cross-validation, an accuracy of 87.99% and MCC value of 0.752 were obtained indicating high internal predictivity of this model. Yet, to establish the overall reliability of any mt-QSAR model, one should also access its applicability domain (AD). When the GA-LDA model was examined by means of the standardization-based AD approach [45], 64 data-points of the sub-training set, 20 datapoints of test set and 49 data-points of the validation set are found to be possible structural outliers, meaning that for those predictions might not be reliable.
Further approval of this GA-LDA classification model should only be carried out after verifying if a consensus modeling approach might not yield a new model with higher predictivity. To accomplish this, the predicted response variables from three different LDA models were collected and the outcomes occurring more frequently were regarded as the consensus predicted activities. By comparing results from both models (Table 3), it was observed that the predictive power of the consensus LDA model is particularly similar but not higher than that of the original GA-LDA model. Therefore, the GA-LDA model is to be considered as the best linear interpretable model for the current dataset. However, considering the performance of the other LDA models, one can reach to the conclusion that the feature selection algorithms used here may simultaneously be applied for future mt-QSAR modeling of other datasets.

Interpretation of Molecular Descriptors
Undeniably, one of the primary advantages of linear models is the possibility of identifying the most crucial structural and physicochemical factors responsible for the higher activity of the compounds [46]. A description of all the descriptors appearing in the three linear models is given in Table 4. However, considering the higher statistical quality of both the GA-LDA and SFS-LDA models, our discussion will focus mainly on the descriptors appearing in these models. Since the relative contributions of the descriptors can only be understood by analyzing the absolute values of their standardized coefficients, these are shown in Figure 3 for the GA-LDA and SFS-LDA models.
Coefficient sum of the last eigenvector (absolute values) from Barysz matrix weighted by atomic number    Firstly, it is noteworthy that all the experimental elements considered in this work (i.e., at, me, and bt) consistently appeared in the final LDA models demonstrating their importance (Table 4). To set up the models, 30 distinct categories of descriptors (available in alvaDesc [38]) were considered but only 10 persisted. The latter pertained more frequently to 2D atom pairs (pairs of atoms at a given topological distance) and functional group counts descriptors, as well as to atom-centered fragments [49]. Importantly, these three categories of descriptors are easily interpretable. Nevertheless, 3D descriptors, such as 3D-Morse (3D-Molecular Representation of Structures based on Electronic diffraction) [50], WHIM (Weighted Holistic Invariant Molecular) [47], and CATS3D (Chemically Advanced Template Search 3D) [48], also appeared in the final models.
As can be seen in Figure 3, the three most important descriptors of the GA-LDA model pertain to two topological indices (i.e., the graph-based descriptors the importance of aliphatic primary amines for achieving high activity against the AKT enzyme isoforms. Other important descriptors in this model are the frequency of atom pairs at particular topological distances, e.g., between two nitrogen atoms or sulfur and bromine atoms of the compounds [49]. Two mentions also are the two CATS2D descriptors [48] of the SFS-LDA model, i.e., descriptors , which embody a potential pharmacophore point on pairs of atoms. Both these involve two hydrogen bond donor groups (DD) in the compounds located at different topological distances.
Overall, it may thus be concluded that the presence as well as topological orientation of primary amines, amides and hydrogen bond donor groups in the molecules play crucial roles in promoting activity against the AKT isoforms. The topological distance between Firstly, it is noteworthy that all the experimental elements considered in this work (i.e., a t , m e , and b t ) consistently appeared in the final LDA models demonstrating their importance (Table 4). To set up the models, 30 distinct categories of descriptors (available in alvaDesc [38]) were considered but only 10 persisted. The latter pertained more frequently to 2D atom pairs (pairs of atoms at a given topological distance) and functional group counts descriptors, as well as to atom-centered fragments [49]. Importantly, these three categories of descriptors are easily interpretable. Nevertheless, 3D descriptors, such as 3D-Morse (3D-Molecular Representation of Structures based on Electronic diffraction) [50], WHIM (Weighted Holistic Invariant Molecular) [47], and CATS3D (Chemically Advanced Template Search 3D) [48], also appeared in the final models.
As can be seen in Figure 3, the three most important descriptors of the GA-LDA model pertain to two topological indices (i.e., the graph-based descriptors ∆[VE_1Dz(Z)] b t and ∆(Wi_D/Dt) m e ) and one constitutional descriptor (∆(nRNH2) a t : counts of the number of primary amines). Other contributing descriptors of the model are geometrical descriptors either weighted by atomic masses or unweighted, these obviously containing information about the whole 3D-molecular structure of the compounds [47,49]. Interestingly, similar to the number of aliphatic amines, the number of aromatic amines (∆(nArNHR) a t ) is found also to be important in the GA-LDA model for triggering a higher biological activity, both being dependent on the experimental element assay type (a t ). Except one 3D-MoRSE descriptor, all SFS-LDA descriptors are found as 2D descriptors. For instance, its most significant descriptor is ∆(nRNH2) m e , which along with the ∆(nRNH2) a t descriptor of GA-LDA (which also appears in the FS-LDA model), reiterates the importance of aliphatic primary amines for achieving high activity against the AKT enzyme isoforms. Other important descriptors in this model are the frequency of atom pairs at particular topological distances, e.g., between two nitrogen atoms or sulfur and bromine atoms of the compounds [49]. Two mentions also are the two CATS2D descriptors [48] of the SFS-LDA model, i.e., descriptors ∆(CATS2D_02_DD) b t and ∆(CATS2D_06_DD) a t , which embody a potential pharmacophore point on pairs of atoms. Both these involve two hydrogen bond donor groups (DD) in the compounds located at different topological distances.
Overall, it may thus be concluded that the presence as well as topological orientation of primary amines, amides and hydrogen bond donor groups in the molecules play crucial roles in promoting activity against the AKT isoforms. The topological distance between two nitrogen atoms, nitrogen and chlorine atoms, nitrogen and oxygen atoms, as well as sulfur and bromine atoms in the compounds appear to be the key factors for a higher AKT inhibition. At the same time, the importance of carbon atoms (e.g., ∆(C − 032) m e and ∆(H − 052) m e ), topological graph-based descriptors and geometrical descriptors (particularly 3D-MoRSE ones) is revealed. All these descriptors are found to be significant in ascertaining the biological activity of the compounds against AKT enzymes.

Non-Linear Predictive Mt-QSAR Models
Herein, descriptors with variances less than 0.001 and inter-correlations higher than 0.95 were removed, leaving 3342 deviation descriptors for setting up various non-linear mt-QSAR models, based on the ML tools: Xgboost [51], (RF) [52], kNN [53], RBF-SVC [54], MLP [55], DT [56], and NB [57]. Since the statistical quality of the resultant non-linear models substantially depends on the ML parameter settings [58], the latter were optimized by hyperparameter tuning [59]. A 10-fold cross-validation (CV) grid search scheme was employed to optimize the parameters and to achieve the best possible model based upon the sub-training set. Table 5 shows the parameter values that were optimized through hyperparameter tuning as well as the final optimized values of these parameters for each of the machine learning tools applied. Furthermore, Table 5 also depicts the 10-fold CV accuracies obtained when the machine learning tools with optimized parameter values were fitted with the sub-training set. It is evident that the classification ability of these machine learning tools varies to a considerable extent, but those achieving the better ones are the DT, RF and Xgboost tools. In fact, the MLP model demonstrates moderate internal predictivity, and the classifications produced by kNN, SVC and NB suggest an extremely poor internal predictivity of these tools.
The RF and Xgboost tools produced a 10-fold CV accuracy on the sub-training set of 91.02% and 91.54%, respectively, indicating that both resultant models display high and almost similar internal predictivity. These two models were then chosen to further investigate their external predictivity on the test and validation sets. Table 6 shows the overall predictivity of the RF and Xgboost models, whereas their ROC plots are displayed in Figure S1 of the Supplementary Materials. As expected, the internal and external predictivities of the RF and Xgboost based non-linear models are noticeably higher than those of the linear models (see Tables 2 and 3). On the basis of their overall predictivity (cf. accuracy values and MCC scores for all sets in Table 6), the Xgboost model stands for the best non-linear model.
One aspect that warrants explicit attention is to inspect how the two best-derived models (i.e., GA-LDA and Xgboost) manage to classify the datapoints pertaining to different experimental elements c j (Table 7). In general, the datasets applied in mt-QSAR computational modeling encompass a large variation in the number of data-points vis-àvis the various experimental elements. As expected, the same situation happens in the current dataset. Still, the non-linear Xgboost model is unaffected by that since it affords high accuracies irrespectively of the experimental element or validation set. The GA-LDA model, with less overall predictivity than the Xgboost model, shows also high accuracies in case of the test set. Nevertheless, it reaches low accuracy values against some experimental conditions (e.g., for c j = 4 and 7). Yet, if both these models are considered simultaneously, there is apparently a greater chance of finding more accurate predictions.

Virtual Screening
In order to describe how the developed mt-QSAR models perform on identifying virtual hits, we employed the GA-LDA and Xgboost models for screening the focused library Asinex kinase (http://www.asinex.com/focus_kinases/, accessed on 17 August 2020), which comprises 6538 compounds. Details about this dataset can be found in Supplementary Materials (SM2.xlsx). Similarly, the descriptors of all such compounds were calculated by the alvaDesc tool [38]. In the modeling dataset used here, we found 10 unique experimental elements c j depending on the m e , a t and b t conditions (cf. Table 7). Each compound of the Asinex kinase inhibitor library was also assigned to those conditions, and a virtual dataset containing 65,380 samples was then prepared. Afterwards, the deviation descriptors of each of these samples were calculated using the QSAR-Co tool [22]. Initially the GA-LDA model was used for screening this virtual dataset, and 28 compounds were predicted as active (IA i (c j ) = +1) against at least 7 out of 10 experimental elements. The predictivity of these 28 compounds was then tested with the best non-linear model (i.e., the Xgboost model) and seven compounds were predicted to be active against at least 6 experimental elements c j . These seven compounds (Figure 4) are thus to be considered as the most potential virtual hits according to the current computational multi-target modeling. Note that, for screening the dataset, we mainly relied on the linear model since it is likely to be less susceptible to overfitting than the non-linear model. Table 8 displays the number of experimental conditions pertaining to each virtual hit picked by the GA-LDA and Xgboost models, whereas details of these experimental conditions are provided in Supplementary Materials (Table S4). It is observed that each of these virtual hits are predicted to be active against all three AKT isoforms in one or more experimental assay conditions (defined by the combinations of a t and b t ).   Table 8 displays the number of experimental conditions pertaining to each virtual hit picked by the GA-LDA and Xgboost models, whereas details of these experimental conditions are provided in Supplementary Materials (Table S4). It is observed that each of these virtual hits are predicted to be active against all three AKT isoforms in one or more experimental assay conditions (defined by the combinations of at and bt).

Pharmacophore Based Biological Target Identification
After identifying seven virtual hits from multi-target computational-guided modeling, we have decided to implement a filtering scheme based on reverse pharmacophore mapping. The latter will allow pre-filtering such hits that best fulfil simple geometric and chemical functionality requirements to trigger a biological response, before embarking on more computationally demanding approaches. For this, each of these virtual hits were investigated with the PharmMapper webserver (http://www.lilab-ecust.cn/pharmmapper/ index.html, accessed on 4 January 2021)) [60,61] to identify their possible human macromolecular targets. The results of PharmMapeer based predictions are outlined in Table 9. Details about the fitting of the virtual hits with the obtained structure-based pharmacophores are provided in the Supplementary Materials ( Figures S2 and S3).
Interestingly, except Asn5283 and Asn2706, all virtual hits are mapped with the structure-based pharmacophores coming from both human AKT1 (PDB ID: 3CQU) and human AKT2 (PDB ID: 2UW9). Note that the PharmMapper predictions are based on the protein structures available in the Protein Data Bank (PDB) and so far, no complete structure of AKT3 is available. Therefore, these results mean that multiple AKT isoforms may be possible biological targets for these five virtual hits, which are Asn0019, Asn0021, Asn0022, Asn5093 and Asn6236. Furthermore, all these five virtual hits are fitted with structure-based pharmacophores generated at the catalytic sites of these enzyme isoforms. We randomly selected five Asinex kinase inhibitor library compounds (names and structures are provided in SM2.xlsx of Supplementary Materials) that were predicted to be inactive in the QSAR based virtual screening and none of these compounds was mapped with structure-based pharmacophores pertaining to both AKT1 and AKT2. Based upon the current pharmacophore analysis, we removed Asn5283 and Asn2706, and decide to further investigate the remaining six virtual hits by other more complicated structure-based analysis techniques.

Structure-Based Prediction of the Virtual Hits
Even though pharmacophore mapping insinuated that selected virtual hits may interact with multiple AKT enzyme isoforms, considering high flexibilities of kinase enzymes, it may not be able to predict the stabilities ligand-receptor complexes. Therefore, we decided to go a step forward and carry out molecular dynamics (MD) simulations to understand the dynamic behavior of the virtual hits within the AKT enzyme isoforms. The five virtual hits predicted by both the mt-QSAR modeling and the pharmacophore-based mapping (PharmMapper) were initially docked into the isoforms AKT1 (PDB ID:4GV1) and AKT2 (PDB ID: 1O6K), as well as into an AKT3 homology model. For this, two molecular docking tools were employed, namely: (a) Autodock Vina [62] and (b) Autodock 4.2 [63]. The homology modeling of AKT3 was carried out using the SWISS-MODEL webserver (swissmodel.expasy.org) [64,65]. Details about the procedures of homology modeling used as well as the validation of the model are described in the Materials and Methods section.
Since multiple binding sites exist in these AKT enzyme isoforms, we initially conducted a blind docking experiment for the virtual hits with the help of Autodock Vina. To do so, a grid box of 100 Å × 100 Å × 100 Å dimensions was centered on each of such macromolecules. The blind docking (performed with an exhaustiveness of 45) indicated that all the virtual hits may preferably bind to the catalytic domain of the enzyme isoforms. Thereafter, an Autodock based docking was performed for all the virtual hits by setting a grid box of 50 Å × 50 Å × 50 Å dimensions, with a grid spacing of 0.375 Å, centered on the catalytic residue of Asp292, Asp293, and Asp289 for AKT1, AKT2, and AKT3, respectively. The pan-AKT inhibitor GSK690693 was employed as a reference compound in this Autodock docking experiment. The results of both docking experiments performed with Autodock Vina and Autodock 4.2 are provided in Table S5 of Supplementary Materials.
These MD simulations were carried out with docked (Autodock-based) complexes of these compounds for 50 ns. The docked complexes of some selected virtual hits (i.e., the starting ligand-protein complexes used in the MD simulations) are provided in the Supplementary Materials. The root-mean-square-deviation (RMSD) plots of the backbone atoms of the receptor-ligand complexes, ligand RMSD plots, the root-mean-square-fluctuation (RMSF) plots of these protein structures, and the radius of gyration' plots are presented in the Supplementary Materials (Figures S4-S6). A close look at these plots reveals an adequate dynamic stability as well as compactness of the ligand-macromolecule complexes. However, our major goal was to estimate the binding free energies of these ligand-protein complexes, which was computed by the molecular mechanics-generalized Born surface area (MM-GBSA) approach.
As observed from the results in Table 10, Asn0019 depicts maximum theoretical binding energy values against three different enzyme isoforms of AKT among five virtual hits. The theoretical binding free energy values of Asn0019 against AKT2 and AKT3 enzymes were comparable to those of the reference compound GSK690693. Even though Asn5093 depicted slightly increased theoretical ∆G bind values against AKT1 and AKT2 as compared to Asn0019 (as well as Asn0021 and Asn0022), its binding energy was found to be reduced against AKT3 enzyme. The Asn0019 may be projected as the most promising virtual hits based on its average theoretical ∆G bind value. Per residue decomposition analysis of Asn0019 was performed to understand the binding interactions of this virtual hit against X-ray crystal structures AKT isoforms (i.e., AKT1 and AKT2). Total energy values obtained from important binding site residues are depicted in Figure 5. It is observed that interactions of Asn0019 with Leu156/158, Val164/166, Ala177/179, Lys179/181, Tyr229/231, Ala230/232, Glu278/279, Asn279/280, Met281/282, Thr291/292 and Phe438/439 of AKT1 and AKT2 enzymes may play crucial roles in determining the binding affinities of this molecule in these proteins.

Development of Linear Interpretable Models
The linear models were based on linear discriminant analysis (LDA) models and three different feature selection algorithms, i.e., (a) genetic algorithm (GA), (b) forward stepwise (FS), and (c) sequential forward selection (SFS). For the first one, the resultant GA-LDA model was set up with the help of the QSAR-Co tool [22]. Details of the GA selection procedure have been extensively reported in the past [69]. Briefly, GA is based on the evolution of a population of randomly generated models. Firstly, parent models are chosen, and these are then subjected to random "cross-over" and "mutation" processes to produce child models, which are then used to check their fitness scores. The new generated models with the highest fitness scores are then forwarded to a next iteration. The algorithm terminates either when the maximum number of allowed population models is reached or when no improvement regarding the fitness score is observed for the subsequent 10 population generations. The parameters employed here for setting up the GA-LDA model with QSAR-Co were: (i) total number of iterations/generations: 100, (ii) equation length: 10 (fixed), (iii) mutation probability: 0.3, (iv) initial number of equations generated: 100, and (v) number of equations selected in each generation: 30. Forward stepwise (FS) is a very popular feature selection algorithm in which the independent descriptors are included in the model stepwise depending on a specific statistical parameter. In this work, the FS-LDA model was set up with a python program where the features are selected and included in the model stepwise by the corresponding p-values of the Fischer statistic [69]. Initially, criteria for the forward selection (i.e., p-value to enter) as well as for the backward elimination (p-value to remove) have to be set. The descriptor with the lowest p-value is first included and subsequently other descriptors are included in the model based on their p-values only if the criterion for forward selection is met. However, if the p-value of a descriptor included in the model is found to be greater than the "p-value to

Descriptor Calculation
The SMILES structures of the compounds were converted to 2D structures using the MarvinView software (https://docs.chemaxon.com/display/docs/MarvinView, accessed on 6 July 2020). Such structures were subsequently standardized by the ChemAxon Standardizer tool using the following options: strip salts, aromatize, clean 3D, tautomerize, neutralize and add explicit hydrogens [66]. The initial descriptors were calculated by alvaDesc tool (https://www.alvascience.com/alvadesc/, accessed on 6 July 2020) [38] in the OCHEM web-platform [67]. Geometry optimizations of the compounds were performed with the Corina software [68] under OCHEM [67]. Calculation of the deviation descriptors (∆(D i )c j ) from the initial descriptors (D i ) and experimental elements (c j ) were carried out using the QSAR-Co tool [22].

Development of Linear Interpretable Models
The linear models were based on linear discriminant analysis (LDA) models and three different feature selection algorithms, i.e., (a) genetic algorithm (GA), (b) forward stepwise (FS), and (c) sequential forward selection (SFS). For the first one, the resultant GA-LDA model was set up with the help of the QSAR-Co tool [22]. Details of the GA selection procedure have been extensively reported in the past [69]. Briefly, GA is based on the evolution of a population of randomly generated models. Firstly, parent models are chosen, and these are then subjected to random "cross-over" and "mutation" processes to produce child models, which are then used to check their fitness scores. The new generated models with the highest fitness scores are then forwarded to a next iteration. The algorithm terminates either when the maximum number of allowed population models is reached or when no improvement regarding the fitness score is observed for the subsequent 10 population generations. The parameters employed here for setting up the GA-LDA model with QSAR-Co were: (i) total number of iterations/generations: 100, (ii) equation length: 10 (fixed), (iii) mutation probability: 0.3, (iv) initial number of equations generated: 100, and (v) number of equations selected in each generation: 30. Forward stepwise (FS) is a very popular feature selection algorithm in which the independent descriptors are included in the model stepwise depending on a specific statistical parameter. In this work, the FS-LDA model was set up with a python program where the features are selected and included in the model stepwise by the corresponding p-values of the Fischer statistic [69]. Initially, criteria for the forward selection (i.e., p-value to enter) as well as for the backward elimination (p-value to remove) have to be set. The descriptor with the lowest p-value is first included and subsequently other descriptors are included in the model based on their p-values only if the criterion for forward selection is met. However, if the p-value of a descriptor included in the model is found to be greater than the "p-value to remove", it is eliminated from the model. In the current work, both p-values to enter and to remove were fixed at 0.05. The final LDA models were developed and were subsequently validated using the LinearDiscriminantAnalysis function of Scikit-learn [44]. The python code used for FS-LDA model development is provided in the Supplementary Materials (file SM1.py). The last feature selection algorithm is based on a sequential forward selection which adds features into an empty set until the performance of the model is not improved either by addition of another feature or by reaching the maximum allowed number of features [70]. Similar to FS, this procedure is also a greedy search algorithm where the best subsets of descriptors are selected stepwise and the model performance is justified by the user-specific statistical measure. In this work, the python based SequentialFeatureSelector algorithm of the "mlxtend" library (http://rasbt.github.io/mlxtend/, accessed on 27 July 2020) was applied to setup the resultant SFS-LDA model. The parameters used for that purpose were: (i) maximum number of features: 10, (ii) forward: True, (iii) floating: True, (iv) scoring: accuracy, and (v) cross-validation (cv) = 0. Thereby, accuracy was employed as the userspecific statistical measure for feature selection, in contrast to FS where the p-value was used, and no cross-validation was performed during feature selection.
Several diagnostic statistical indices were employed for assessing our model equations, in terms of the criteria goodness-of-fit and goodness-of-prediction. Measures of goodnessof-fit of the LDA models based on the sub-training set were estimated by standard indices such as the Wilks' lambda (λ) [71], chi-squared (χ 2 ), the square of Mahalanobis distance (D 2 ), the Fisher's statistic index (F), and the corresponding p-value (p) [41]. Measures of goodness-of-prediction for both linear and non-linear models were estimated by computing the following statistical measures for the sub-training, test and external validation sets: sensitivity-correct classification of the active cases, specificity-correct classification of inactive cases, accuracy-overall correct classification, F-measure, and Matthews correlation coefficient (MCC) [41,42].
All other machine learning techniques were applied by resorting to the respective Scikit-learn machine learning packages [44]. The model development parameters for each of these techniques were determined by hyperparameter tuning as implemented in GridSearchCV of Scikit-learn [58]. During hyperparameter optimization, a 10-fold cross-validation (10-fold CV) was performed with the sub-training set to identify the best model estimators. The 10-fold CV accuracies obtained from the different machine learning techniques were compared to find the highly predictive classifiers. Finally, the external predictivity of these classifiers were estimated with the test and the validation sets.

PharmMapper Based Prediction of Biological Targets
The freely-accessed PharmMapper (version 2017) web server (http://www.lilab-ecust. cn/pharmmapper/, accessed on 4 January 2021) searches for the best mapping poses of the given molecules against structure-based pharmacophore models generated with all targets of PharmTargetDB [60,61]. PharmMapper applies a large database of receptor-based pharmacophores (>7000 pharmacophore models based on 1627 drug targets, 459 of which are human protein targets.) to find possible macromolecular targets for the input ligands. It implements Cavity (version 1.1) program in order to identify the binding sites on the surface of a protein structure and to rank these subsequently as per their druggability scores. The receptor-based pharmacophore modelling was then performed using Pocket (version 4.0) program for extracting the pharmacophore features within these druggable cavities.
In this work, the virtual hits obtained from multi-target QSAR computational modeling were subjected to PharmMapper based predictions of biological targets. A maximum number of 1000 conformers were generated during the search of pharmacophore fitting on human biological targets and the top 200 targets based on the fit scores were analyzed.

Homology Modeling
The homology model of AKT3 was set up with the help of the SWISS-MODEL webserver [64,65]. The FASTA sequence of human AKT3 was retrieved from Uniprot (Uniprot ID: Q9Y243) and we used the sequence 141-479 for homology modeling. The FASTA sequence was uploaded to the SWISS-MODEL server to search for templates and then, ranging from 50 templates to 10 templates were selected for generating the model on the basis of higher GMQE values and also on their diversity. Each of the homology models was analyzed by the 'structural assessment' option available in this webserver. Among these models, one model developed with the template of 6CCY.1.A (sequence identity: 83.18%) was found to have the lowest MolProbity value (=1.02), calculated from http://molprobity.biochem.duke.edu/, accessed on 11 January 2021). This model, which showed a Qualitative Model Energy Analysis (QMEAN) value of −1.84, was selected for further structural modifications. From the low MolProbity value, it is ensured that the homology model is of good structural quality, however, the model showed some complications mainly due to its Clashscore (all atoms) of 0.37, the presence of 5 Ramachandran outliers, as well as the presence of 26 bad angles. Therefore, the UCSF Chimera software [72] was utilized for structural modifications of this homology model. The latter included step-by-step loop refinement, rotamer adjustment and structural minimization of the selected residues. Then, for each modification step, the quality of the model was checked in the Molprobilty webserver (http://molprobity.biochem.duke.edu/, accessed on 11 January 2021) [64]. Ultimately, the modified homology model was found to have an even lower Molprobilty score of 0.87, indicating that the initially developed model was clear structurally improved. In this final model, the Clashscore (all atoms) reduced to zero and the number of outliers to one, as well as the number of bad angles lowered to two. The final model also depicted an improved Global Model Quality Estimation (GMQE) value of −1.61. Alignment of Q9Y243 with 6CCY.1.A, validation parameters as well as Ramachandran plots of the final homology model are shown in the Supplementary Materials ( Figures S7 and S8).

Molecular Docking
The docking experiments with the X-ray crystal structures and homology model were performed with Autodock Vina (version 1.1.2., The Scripps Research Institute, La Jolla, CA, USA) [62] and Autodock 4.2 (The Scripps Research Institute, La Jolla, CA, USA) [63]. During preparation of the macromolecules, water molecules and the peptide substrate obtained from the PDB were removed. Other necessary details about the methodology followed on such docking experiments are the same as described earlier [20].

Molecular Dynamics Simulations
All MD simulations were carried out using the software package AMBER 12 [73,74]. Initially, the protonation states of the amino acid residues of each protein were fixed at pH = 7.0. These protonation states were attained by the PDB2PQR server (http:// server.poissonboltzmann.org/pdb2pqr, accessed on 12 January 2021), using the AMBER forcefield and output naming scheme [75]. The ff99SB and general AMBER forcefield (GAFF) were applied for describing the protein-inhibitor and inhibitor-solvent interactions, respectively [76,77]. The optimization of ligands was carried out with the help of Leap in Antechamber, by performing MD simulations of the hydrated complexes centered in a cubic box of edge length of 8 Å, and applying the forcefields GAFF, ff99SB, and TIP3P for water molecules [78]. Subsequently, the negative charges of the complex systems were neutralized. The SHAKE algorithm was used to constrain all bonds related to hydrogen atoms, whereas the Partial Mesh Ewald (PME) method was employed to handle long range electrostatic forces (using a cutoff of 12 Å). Energy minimization of the complexes was performed in two stages. In the first stage, only the ions and water molecules were allowed to relax during 1000 steps of the steepest descent method and during 1000 steps of the conjugate gradient algorithm using a restrained force of 500 kcal/mol on the solute. In the second stage, the whole system was relaxed during 5000 minimization steps, i.e., 2500 steps of steepest decent minimization followed by 2500 of conjugated gradient. The minimized systems were then gradually heated up from 0 to 300 K (50 K in each step) with a weak harmonic restraint of 10 kcal/mol to keep the solute fixed for 200 ps. Subsequently, a 2-ns equilibration on the NpT ensemble was performed, with pressure kept fixed at 1 bar and temperature at 300 K. Finally, the 50-ns MD simulations without restrictions were run with constant temperature (T = 300 K) and constant pressure (p = 1 bar). After completion of such simulations, various post-dynamic analyses were carried out with PTRAJ and CPPTRAJ implemented in the AMBER package [79]. The graphs were plotted using the QTGRACE tool (https://sourceforge.net/projects/qtgrace/files/, accessed on 1 February 2021). Molecular Mechanics Generalized Born Surface Area (MM-GBSA) based binding free energies of the ligands were calculated using MMPBSA.py program in AMBER [80,81]. One hundred snapshots were taken from the last 10 ns of MD trajectory.
The binding energy calculation is represented as the following equation: ∆E ele and ∆E vdW are electrostatic and van der Waals interactions between the ligands and the proteins (in gas phase), respectively. The polar solvation free energy (∆G pol ) accounts for the polar interactions with the solvent molecules whereas the ∆G nonpolar term represents non-polar solvation free energy, which is obtained from the equation ∆G nonpolar = γSASA + β. The the solvent accessible surface area is represented by the term SASA. The surface tension proportionality constant (γ) and the free energy of nonpolar solvation of a point solute (β), were set to 0.00542 kcal mol −1 Å −2 and 0 kcal mol −1 , respectively. The entropic contribution (T∆S) is not calculated because apart from being computationally expensive (especially for large macromolecular complexes), it has been reported to be less accurate [80,81].
The energy contributions of the close contact amino acid residues into the total binding free energies were computed using MM-GBSA per residue free energy decomposition method with Amber 12 MM-GBSA module [73,74,82,83]. All energy components (van der Waals, electrostatic, polar solvation, and nonpolar solvation contributions) were calculated using 200 snapshots extracted from the last 10 ns MD trajectories.

Conclusions
The latest advances in machine learning tools, coupled with the availability of everlarger data sets, brought about a fresh wave for faster and less complicated computationalguided drug discovery efforts [84][85][86]. In this work, we could assemble a large dataset containing 5523 inhibitors of the three AKT isoforms, assayed under a variety of experimental conditions. With the desire to build reliable predictive multi-target QSAR classification models for probing the inhibitory activity from such data, we examined the use of various machine learning tools along with several feature selection algorithms. Considering that machine learning is a powerful mean for finding drug like leads [18], the best linear and non-linear mt-QSAR models were finally used for screening a focused kinase library to identify virtual hits as potential pan-AKT inhibitors. The results obtained were finally post-processed by structure-based approaches.
With regard to the mt-QSAR modeling, the combination of LDA with feature selection algorithms such as FS, SFS, or GA was found to produce classification models exhibiting very good accuracy (>87%), as well as internal and external predictivity. Nevertheless, the GA feature selection algorithm yielded the best predictive linear model, even though in a not so straightforward and less time-consuming way as the other two algorithms. More significantly, these linear models aided us in understanding the most crucial structural and/or physicochemical properties required for higher AKT inhibition. At the same time, the classification ability of the seven different ML-based mt-QSAR models were found to vary to a considerable extent. The Xgboost technique produced the most predictive non-linear mt-QSAR model (accuracy > 90%), but its overall predictivity was similar to that of the RF model. This leads us to suppose that tree-based modeling techniques are superior to other machine learning ones for multi-target computational modeling. Yet, more investigations are needed to confirm that supposition.
To judge how the best linear and nonlinear mt-QSAR models perform on identifying virtual hits with activity against AKT inhibitors, we used them to screen the Asinex kinase inhibitor library. The obtained virtual hits were further evaluated by structurebased pharmacophore modeling, molecular docking and MD simulations studies. Worth mentioning here that in case of multi-target modeling, structure-based approaches become more problematic when one or more biological targets are not sufficiently characterized. Indeed, this was the case for the AKT3 isoform, the X-ray crystal structure of which is yet to be reported, and thereby a reliable homology model had to be derived. The pharmacophoremapping target-identification search led to results reinforcing the former mt-QSAR based predictions. Further, the results obtained in the following MD simulations allowed us to put forward Asn0019 as the most potent virtual hit for the inhibition of all AKT isoforms.
To conclude, the information gathered and the derived mt-QSAR computational models provide important guidelines for the discovery of novel AKT inhibitors. What is more, such models are not limited only to pan-inhibitors but can also be applied to identify inhibitors that have selectivity towards one or two AKT isoforms.
Supplementary Materials: The following are available online https://www.mdpi.com/article/10.3 390/ijms22083944/s1, Table S1: Degree of collinearity among the variables of the GA-LDA model; Table S2: Degree of collinearity among the variables of the FS-LDA model; Table S3: Degree of collinearity among the variables of the SFS-LDA model; Figure S1: ROC curves for the two best nonlinear models (Xgboost-ROC-AUC score (test): 0.919, ROC-AUC score (validation): 0.932, and RF-AUROC: ROC-AUC score (test): 0.917, ROC-AUC score (validation): 0.930); Table S4: Experimental conditions under which the virtual hits were predicted to be active; Table S5: Molecular docking results with binding energy values (in kcal/mol) of the virtual hits; Figure S2: Fitting of virtual hits to the structure-based pharmacophores of AKT1 (PDB: 3CQU) and AKT2 (PDB: 2UW9) enzymes; Figure S3: Fitting of virtual hits to the structure-based pharmacophores of AKT1 (PDB: 3CQU) and AKT2 (PDB: 2UW9) enzymes.; Figure S4: Protein backbone RMSD, ligand RMSD, RMSF and radius of gyration plots for the AKT1 complexes.; Figure S5: Protein backbone RMSD, ligand RMSD, RMSF and radius of gyration plots for the AKT2 complexes; Figure S6: Protein backbone RMSD, ligand RMSD, RMSF and radius of gyration plots for the AKT3 complexes; Figure S7: (A) Alignment of the AKT3 target sequence with potential template; (B) Z-score estimation of the AKT3 homology model; (C) Local QMEAN estimates after manual refinement; (D) AKT3 homology model built using the Swiss-Model server and the 6CCy.1.A template 3D structure; Figure S8: Ramachandran plots for the homology model of AKT3 enzyme. Starting_protein_ligand_complexes.zip: PDB files containing the complexes of virtual hits with the enzymes, used in the MD simulations. Funding: This work was supported by UID/QUI/50006/2020 with funding from FCT/MCTES through national funds.

Conflicts of Interest:
The authors declare no conflict of interest.