2.2.1. Distribution of Physicochemical Properties of PKIs
A common quality check for oral drug candidates is to assess their compliance with Lipinski’s rule of five (Ro5) [11]. This rule states that a compound is more likely to show poor passive absorption or permeation if it violates two or more of the following constraints: molecular weight (MW) ≤ 500, calculated logP (ClogP) ≤ 5, number of hydrogen bond acceptors (HBA) ≤ 10, number of hydrogen bond donors (HBD) ≤ 5. Two additional properties, topological polar surface area (TPSA) and number of rotatable bonds (NRB), have also been shown to be correlated with oral bioavailability, and the following conditions are often used in predictive drug design: TPSA ≤ 140 Å² and NRB ≤ 10 [12]. These property ranges can serve as an early filter to reduce the size of a compound collection before running experimental or virtual screening campaigns.
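The criteria above can be sketched as a simple filter. This is a minimal illustration, assuming the six descriptors have already been computed by some toolkit; the `Descriptors` class, the function names and the example values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hba: int      # number of hydrogen bond acceptors
    hbd: int      # number of hydrogen bond donors
    tpsa: float   # topological polar surface area, in square angstroms
    nrb: int      # number of rotatable bonds


def ro5_violations(d: Descriptors) -> List[str]:
    """Return the names of the Lipinski criteria that the compound violates."""
    checks = {
        "MW": d.mw <= 500,
        "ClogP": d.clogp <= 5,
        "HBA": d.hba <= 10,
        "HBD": d.hbd <= 5,
    }
    return [name for name, ok in checks.items() if not ok]


def passes_tpsa_nrb(d: Descriptors) -> bool:
    """Check the two additional criteria: TPSA <= 140 and NRB <= 10."""
    return d.tpsa <= 140 and d.nrb <= 10


def violation_histogram(compounds: Iterable[Descriptors]) -> Dict[int, float]:
    """Fraction of compounds with 0, 1, 2, ... Ro5 violations."""
    counts: Dict[int, int] = {}
    total = 0
    for d in compounds:
        n = len(ro5_violations(d))
        counts[n] = counts.get(n, 0) + 1
        total += 1
    return {n: c / total for n, c in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up descriptor values, for illustration only.
    small = Descriptors(mw=450.0, clogp=3.2, hba=6, hbd=2, tpsa=90.0, nrb=5)
    large = Descriptors(mw=560.0, clogp=5.6, hba=8, hbd=1, tpsa=150.0, nrb=12)
    print(ro5_violations(small), ro5_violations(large))
    print(violation_histogram([small, large]))
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easy to tabulate both the number of violations per compound and which property is most often exceeded.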
Since current drugs do not always conform to Lipinski’s rules, PKIs can also be expected to violate some of the four criteria proposed by Lipinski. We checked to what extent the compounds in the PKI dataset violate the Ro5 and found that a substantial fraction of them have properties falling outside Lipinski’s boundaries (Table 1). Although 56% (or 63%, depending on the method used to calculate the ClogP; Appendix A: Table A1) of the compounds fully comply with all conditions, more than a quarter of the compounds (28%) violate a single rule and 16% violate two rules. These numbers are comparable to the statistics obtained for the approved PKI subset.
Going into more detail, we looked at the individual components of Lipinski’s Ro5. Because the calculated logP is strongly dependent on the software used, the logarithm of the octanol/water partition coefficient was calculated using two different methods: the ClogP functionality included in the RDKit toolkit and the ClogP calculator from ChemAxon [13] (Figure 4f,h, Table 2 and Appendix A: Table A2). We found that most offending compounds exceed Lipinski’s boundaries in terms of molecular weight (32%) and partition coefficient (ClogP; 24% or 12% using RDKit or ChemAxon, respectively). In fact, high molecular weight and lipophilicity are often seen in Type-II inhibitors, whose chemical structure is elongated compared to that of Type-I inhibitors. This elongation is required for Type-II inhibitors to extend into and interact with the kinase hydrophobic back-pocket, but it comes at the expense of a higher molecular weight. Interestingly, only one compound, barasertib, a prodrug of the active entity, reaches the limit of five hydrogen bond donors, and it therefore still complies with this criterion.
As for the TPSA and NRB properties (Table 3), the proportion of PKIs complying with the criteria TPSA ≤ 140 Å² and NRB ≤ 10 is very high. Only 3.9% and 4.4% of compounds exceed the TPSA and NRB thresholds, respectively. Similarly, all kinase inhibitors approved so far comply with these two criteria, except lapatinib and neratinib.
The individual physicochemical properties are approximately normally distributed, as shown by their bell-shaped curves (Figure 4). Only a few compounds have properties deviating significantly from the mean and can be seen as ‘exceptions to the rule’. In order to provide experimentalists with property ranges that apply to most kinase inhibitors, we disregarded property values beyond two standard deviations from the mean (an interval covering about 95.4% of a normal distribution). The upper and lower molecular descriptor boundaries delimit the current chemical space of kinase inhibitors. They could be used as guidelines, rather than filters, to help prioritize compounds with physicochemical properties comparable to current PKIs. It is important to note that these boundaries were extracted from protein kinase inhibitors that are administered orally. Thus, we could argue that a compound is more likely to reach clinical trials if it does not violate most of these boundaries. The rationale is that most approved and under-development kinase inhibitor drugs successfully passed the safety tests of Phase 1 while following these boundaries (more than 90% of compounds in PKIDB have passed phases 0 and 1).
Considering all PKIs, to be prioritized a compound could:
Have a molecular weight (MW) between 309 and 617 Da (average of 463.3)
Have a ClogP (calculated with RDKit) between 1.4 and 6.7 (average of 4.0)
Contain between 0 and 4 hydrogen bond donors (HBD) (average of 2.1)
Contain between 3 and 11 hydrogen bond acceptors (HBA) (average of 6.7)
Have a topological polar surface area (TPSA) between 54 and 140 Å² (average of 96.8)
Contain between 1 and 11 rotatable bonds (NRB) (average of 6.2).
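The ranges listed above correspond to the mean plus or minus two standard deviations of each property. The following sketch shows one way such a range could be derived from a list of descriptor values. The use of the sample standard deviation and the optional clipping at a lower bound (for example 0 for count-based descriptors) are assumptions, since the text does not specify these details; the function names and input values are hypothetical.

```python
import statistics
from typing import Optional, Sequence, Tuple


def two_sigma_range(values: Sequence[float],
                    lower_bound: Optional[float] = None
                    ) -> Tuple[float, float, float]:
    """Return (low, mean, high) with low/high = mean -/+ 2 standard deviations.

    Assumption: the sample standard deviation is used. If lower_bound is
    given, the lower limit is clipped so it never falls below that value.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - 2 * sd, mean + 2 * sd
    if lower_bound is not None:
        low = max(low, lower_bound)
    return low, mean, high


def fraction_within(values: Sequence[float], low: float, high: float) -> float:
    """Fraction of values falling inside the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


if __name__ == "__main__":
    # Made-up molecular weights, for illustration only.
    mw = [380.0, 420.5, 455.2, 470.9, 501.3, 530.7, 610.4]
    low, mean, high = two_sigma_range(mw, lower_bound=0.0)
    print(f"MW between {low:.0f} and {high:.0f} Da (average of {mean:.1f})")
    print(f"{100 * fraction_within(mw, low, high):.1f}% of values in range")
```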
2.2.2. Chemometrical Analysis of Protein Kinase Inhibitors
We asked whether PKIs have structural features that set this class of compounds apart from other orally bioavailable drugs. We reported previously that Type-II inhibitors tend to have a higher molecular weight and lipophilicity than other types of kinase inhibitors [14]. This can likely be attributed to their binding mode, which requires an elongated structure in order to simultaneously bind hinge residues and fully occupy the adjacent hydrophobic back-pocket.
In order to gain insight into the relationships between the physicochemical properties of PKIs and their inhibitory effect on kinases, we projected the PKI chemical space into a low-dimensional space using Principal Component Analysis (PCA) [15]. PCA is a well-established multivariate statistical method able to condense a high-dimensional description of individual entities, i.e., molecules in our case, into a 2D or 3D space. This space is delimited by factorial axes or principal components (PCs) formed by linear combinations of the original variables used to describe the individuals. The PCs are rank-ordered according to the fraction of the total variance accounted for by each. The resulting graphs help to understand similarities between molecules as well as correlations between variables (i.e., descriptors). PCA is routinely used in the chemoinformatics field to analyze chemical datasets [16].
We performed two separate PCAs on the following sets of compounds. The first set was built using the 180 PKIs collected in PKIDB augmented with 956 FDA-approved oral drugs; the goal was to determine whether the two classes of compounds could be efficiently discriminated into distinct groups. The second set contained only the 180 PKIs; here, the goal was to highlight physicochemical features specific to each inhibitor type. Both compound sets were described using 11 classical physicochemical descriptors (Table 4) well suited to quantifying the properties of chemical structures.
The first PCA plot (Figure 5) illustrates the chemical space of PKIs and oral drugs in a 2D space delimited by the first two principal components (PC1 and PC2).
The first two principal components explain 42.2% and 21.8%, respectively, of the total variance. This sums to 64%, which is an acceptable value for the graphical analysis of the data on a 2D scatterplot without losing too much information. The next PC (PC3) explains 13.8% of the total variance.
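To make the notion of explained variance concrete, the sketch below runs a PCA on the correlation matrix of a small descriptor table using only the Python standard library. It is an illustration under stated assumptions, not the procedure used in the study: eigenpairs are obtained by power iteration with deflation (a dedicated linear algebra library would normally be used instead), and all function names and input data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def standardize(rows: Matrix) -> Matrix:
    """Center each descriptor column and scale it to unit variance."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in col) / n)
           for col, mu in zip(columns, means)]
    if any(sd == 0.0 for sd in sds):
        raise ValueError("constant descriptor: cannot scale to unit variance")
    return [[(x - mu) / sd for x, mu, sd in zip(row, means, sds)]
            for row in rows]


def correlation_matrix(z: Matrix) -> Matrix:
    """Correlation matrix of already standardized data."""
    n, m = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / n for j in range(m)]
            for i in range(m)]


def leading_eigenpairs(matrix: Matrix, k: int,
                       iterations: int = 500) -> List[Tuple[float, List[float]]]:
    """First k (eigenvalue, eigenvector) pairs by power iteration + deflation."""
    m = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        # Deterministic, non-uniform start vector, normalized to unit length.
        v = [1.0 / (j + 1) for j in range(m)]
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * a[i][j] * v[j]
                         for i in range(m) for j in range(m))
        pairs.append((eigenvalue, v))
        # Deflation: remove the component that was just found.
        for i in range(m):
            for j in range(m):
                a[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def explained_variance_ratio(pairs: List[Tuple[float, List[float]]],
                             n_variables: int) -> List[float]:
    """Fraction of total variance per PC (trace of a correlation matrix = m)."""
    return [eigenvalue / n_variables for eigenvalue, _ in pairs]


def scores(z: Matrix, pairs: List[Tuple[float, List[float]]]) -> Matrix:
    """Coordinates of each molecule on the retained principal components."""
    return [[sum(x * v_j for x, v_j in zip(row, v)) for _, v in pairs]
            for row in z]
```

Summing the first two entries returned by `explained_variance_ratio` gives the share of variance retained in a PC1/PC2 scatterplot, which is how a figure such as 42.2% + 21.8% = 64% is obtained.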
Each dot on the PC1/PC2 2D scatterplot represents a molecule. Molecules lie far from the center of gravity of the cloud (the center of the graph if the initial data matrix was centered and scaled to unit variance) when they have high values of the descriptors contributing to each factorial axis. PKIs occupy the top right quadrant of the graph, while oral drugs tend to occupy the top left and lower left quadrants. This suggests that PKIs share structural characteristics specific to this class of compounds. Sometimes a better separation can be achieved by accounting for the next component, PC3, but this is not the case here (data not shown).
Applying PCA to a correlation matrix enables the graphical representation of normalized variables in a unit hypersphere called the ‘correlation sphere’, or ‘correlation circle’ in a 2D representation (Figure 5b). In this space, collinear variable vectors are inter-correlated; likewise, vectors collinear to factorial axes are correlated with these axes. This makes it possible to assign factorial axes a meaning in terms of the original descriptors. A vector approaching the surface of the sphere (or circle in a 2D representation) indicates a strong contribution to the creation of a factorial axis.
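As a sketch of how the coordinates in a correlation circle relate to the PCA output, the snippet below uses one common convention for a PCA on standardized data: the correlation between a variable and a PC is the corresponding unit-eigenvector component multiplied by the square root of the eigenvalue, and the percentage contribution of a variable to a PC is its squared eigenvector component. Whether the contributions reported here were computed with exactly this convention is an assumption; function names and numbers are hypothetical.

```python
import math
from typing import List


def variable_coordinates(eigenvector: List[float],
                         eigenvalue: float) -> List[float]:
    """Correlation of each variable with one PC (its correlation-circle coordinate)."""
    return [v * math.sqrt(eigenvalue) for v in eigenvector]


def variable_contributions(eigenvector: List[float]) -> List[float]:
    """Percentage contribution of each variable to one PC (sums to 100)."""
    total = sum(v * v for v in eigenvector)
    return [100.0 * v * v / total for v in eigenvector]


if __name__ == "__main__":
    # Toy eigenpair for three standardized descriptors, for illustration only.
    s = 1.0 / math.sqrt(2.0)
    print(variable_coordinates([s, s, 0.0], 2.0))
    print(variable_contributions([s, s, 0.0]))
```

Plotting the coordinates of each variable on PC1 against those on PC2 gives the 2D correlation circle; a variable whose point lies near the unit circle is well represented by the two axes.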
Analysis of the correlation circle (Figure 5b) shows that the first factorial axis is correlated with high MW and Labute’s Approximate Surface Area (LabuteASA). These two variables contribute to PC1 with values of 17.2% and 16.5%, respectively. The second axis, PC2, is correlated with a high number of aromatic rings (NAR) and high logP (contributions of 17.4% and 29.9%, respectively), and negatively correlated with the fraction of sp3 hybridized carbon atoms (FCSP3). These observations are consistent with kinase inhibitors being less flexible than other drugs and containing more aromatic rings. The collinearity between NAR and logP is consistent with the fact that logP tends to increase with the number of aromatic rings. The correlation with molecular weight confirms the preliminary observation drawn from the distribution of the physicochemical properties of the PKIs: they are larger molecules than the average oral drug and have a higher LabuteASA.
The second PCA plot (Figure 6) is a projection of the PKI dataset in the PC1/PC2 factorial plane. PKIs are labelled according to their type (Type-I, Type-I½, Type-II, Type-III, and NaN for unknown kinase inhibitor type).
The variance explained by the first three principal components PC1, PC2 and PC3 is 36.4%, 20% and 14%, respectively (56.4% explained by PC1 and PC2 alone). The variables contributing most to PC1 are, in decreasing order of importance, the molecular weight (19.9%), Labute’s approximate surface area (18.8%) and the number of rotatable bonds (15.4%). Thus, PC1 primarily represents molecular size. For PC2, the variables contributing most are the number of aromatic rings (30.8%), logP (26.8%), and the fraction of sp3 hybridized C atoms (19.2%).
In this plot, inhibitors were colored according to their type of binding mode when the information was available. As expected, PC1 does not discriminate inhibitor types, but rather separates large and flexible molecules from smaller compounds. Type-I and Type-II inhibitors, however, are better separated along the PC2 axis, as is apparent from their projection in two different areas of the plot. Indeed, PC2 is correlated with the same variables as those able to discriminate PKIs from oral drugs in our first PCA (Figure 5). Type-II inhibitors contain more aromatic rings and are more planar than Type-I inhibitors. They need to reach the adjacent hydrophobic pocket, which tolerates aromatic moieties. On the other hand, Type-I inhibitors generally contain a greater fraction of sp3 carbons, which is consistent with the negative correlation of FCSP3 with PC2. Type-I½ inhibitors are distributed evenly across Type-I and Type-II chemical spaces. Because of their hybrid nature, they combine Type-I and Type-II features and display physicochemical properties similar to both types of inhibitors. The three Type-III inhibitors (trametinib, cobimetinib and selumetinib) that bind specifically to the allosteric pocket are projected in the center of the PCA plot and cannot be discriminated by the first two principal components.
2.2.3. Principal Moments of Inertia
The Principal Moments of Inertia (PMI) plot visually represents the shape-based distribution of a set of molecules [17]. In a PMI plot, molecules are projected in a triangular space (Figure 7) whose vertices represent the extremes of molecular shape: rod (diacetylene), disc (benzene) and sphere (adamantane). Here, we used this method to simultaneously compare the shape diversity of three sets of compounds: approved PKIs, PKIs under development and other oral drugs. Since the shape of a ligand is often complementary to the shape of a protein binding site, molecules spanning a wide space of the PMI plot are expected to target a large diversity of protein sites. Conversely, ligands occupying a narrow shape space of the PMI plot could target similar binding sites, such as the ATP binding site of kinases.
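The coordinates of a molecule in a PMI plot are usually the two normalized ratios I1/I3 and I2/I3, where I1 ≤ I2 ≤ I3 are the principal moments of inertia of a 3D conformer. The sketch below computes them from atomic coordinates and masses with the standard library only. It is an illustration, not the code used in the study: the function names are hypothetical, and the test shapes are idealized point sets rather than real molecules. The eigenvalues of the inertia tensor are obtained with the closed-form trigonometric solution for symmetric 3×3 matrices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def inertia_tensor(coords: Sequence[Point],
                   masses: Sequence[float]) -> List[List[float]]:
    """Inertia tensor about the center of mass."""
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for m, (x, y, z) in zip(masses, coords):
        x, y, z = x - center[0], y - center[1], z - center[2]
        ixx += m * (y * y + z * z)
        iyy += m * (x * x + z * z)
        izz += m * (x * x + y * y)
        ixy -= m * x * y
        ixz -= m * x * z
        iyz -= m * y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]


def symmetric_eigenvalues_3x3(a: List[List[float]]) -> List[float]:
    """Eigenvalues of a symmetric 3x3 matrix in ascending order."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted([a[0][0], a[1][1], a[2][2]])
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding errors
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e_mid = 3.0 * q - e_max - e_min
    return [e_min, e_mid, e_max]


def normalized_pmi_ratios(coords: Sequence[Point],
                          masses: Sequence[float]) -> Tuple[float, float]:
    """Return (I1/I3, I2/I3): rod -> (0, 1), disc -> (0.5, 0.5), sphere -> (1, 1)."""
    i1, i2, i3 = symmetric_eigenvalues_3x3(inertia_tensor(coords, masses))
    if i3 <= 0.0:
        raise ValueError("at least two distinct atom positions are required")
    return i1 / i3, i2 / i3
```

In practice the coordinates would come from a 3D conformer generated by a cheminformatics toolkit, and the resulting ratios depend on the conformer chosen.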
As is apparent in Figure 7, the space covered by the structurally diverse oral drugs (green-blue) is wider than the one occupied by both PKI sets (red and orange). The distribution of oral drugs is skewed toward elongated and disc-like shapes. A few drugs, however, exhibit a sphere-like shape. The most spherical drug is methenamine (e), used as an antibacterial drug for the treatment of urinary tract infection. This drug shows up in the same location as adamantane because of its similar “cage-like” 3D structure. Levacetylmethadol (d) is another example of a drug that adopts a spherical shape, because its central stereogenic center is able to project substituents in all directions in space.
Most kinase inhibitors are located close to the rod vertex and along the densely populated rod-disc axis. Type-II PKIs such as quizartinib (Figure 7c) can be found close to the rod edge because of their binding requirement for an extended conformation [18,19]. The three PKIs closest to the extreme vertices are all molecules under development: rabusertib (a), galunisertib (b), and quizartinib (c) (Figure 7). This might be an indication that kinase inhibitors currently in development tend to exhibit new molecular shapes, and potentially explore novel chemical space, by moving away from the rod-disc axis.