Exploring the Chemical Space of Cytochrome P450 Inhibitors Using Integrated Physicochemical Parameters, Drug Efficiency Metrics and Decision Tree Models

Abstract: The cytochrome P450s (CYPs) play a central role in the metabolism of various endogenous and exogenous compounds, including drugs. CYPs are vulnerable to inhibition and induction, which can lead to adverse drug reactions. Therefore, insights into the underlying mechanism of CYP450 inhibition and the estimation of overall CYP inhibitor properties might serve as valuable tools during the early phases of drug discovery. Herein, we present a large data set of inhibitors against five major metabolic CYPs (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4) for the evaluation of important physicochemical properties and ligand efficiency metrics to define property trends across various activity levels (active, efficient and inactive). Decision tree models for CYP inhibition were developed with an accuracy > 90% for both the training set and 10-fold cross-validation. Overall, molecular weight (MW), hydrogen bond acceptors/donors (HBA/HBD) and lipophilicity (clogP/logPo/w) represent important physicochemical descriptors for CYP450 inhibitors. However, highly efficient CYP inhibitors show mean MW, HBA, HBD and logP values between 294.18–482.40, 5.0–8.2, 1–7.29 and 1.68–2.57, respectively. Our results might help in the optimization of toxicological profiles associated with new chemical entities (NCEs), through a better understanding of the inhibitor properties leading to CYP-mediated interactions.


Introduction
Drug discovery and development is a grueling and lengthy process that is prone to high attrition rates throughout all phases of development [1]. To increase research and development output, AstraZeneca has proposed an improved "5R" strategy that focuses on the right target, right safety, right tissue, right patient and right commercial potential [2]. Various proof-of-concept applications of the "5R" strategy indicate an improved success rate from candidate selection to the completion of phase III [2,3]. Thus, for high-quality leads and drug candidates, better insight into pharmacokinetics (PK) and pharmacodynamics (PD), along with ADMET (absorption, distribution, metabolism, excretion and toxicity) properties, is highly recommended [3]. Additionally, more focused approaches towards incorporating pharmacokinetics and drug metabolism into compound design have assisted in making PK/PD and dose-related predictions in humans [3]. Drug metabolism is an influential factor in pharmacokinetics and hence modulates the behavior of a drug. Therefore, an early understanding of the metabolism of new chemical entities (NCEs) and their affinity towards various metabolic enzymes might assist PK/PD optimization during the drug development process [4]. Amongst all metabolic enzymes, the most important are the cytochrome P450s, which constitute a ubiquitous superfamily of heme proteins that play a key role in the oxidative, peroxidative and reductive metabolism of a wide range of endogenous and exogenous compounds, including drugs [4]. In humans, 57 CYP isoforms have been identified, with CYP1A2, 2C9, 2C19, 2D6 and 3A4 mediating ~90% of all phase I metabolic reactions of clinically relevant drugs [5]. The association of cytochrome P450s with toxicological events due to metabolic alterations has established CYP-mediated drug metabolism as the principal reason for the occurrence of several drug-drug interactions (DDIs) [6]. Moreover, the co-administration of drugs might lead to the inhibition or induction of cytochrome P450s; therefore, there is a pressing need to assess the CYP-mediated interaction profiles of NCEs during the drug design and development phase [7,8]. Furthermore, during the last decade, DDIs associated with the inhibition of cytochrome P450s, mainly due to the broad substrate specificity of the CYP family of enzymes, emerged as the most common reason for the withdrawal of various marketed drugs [9][10][11].
Additionally, the cytochrome P450 enzymes display an inherent affinity for lipophilic substrates due to their lipophilic nature [12][13][14]. Depending on their ionization states, lipophilic compounds also show inhibition potential against the cytochrome P450s [15]. This makes lipophilicity one of the most significant physicochemical properties in drug discovery and design programs, where it plays a major role in determining ADMET properties [16] along with selectivity, promiscuity [17] and potency [18]. Many two- and three-dimensional quantitative structure-activity relationship (2D and 3D QSAR) studies have also reported the effect of lipophilicity on the inhibition of cytochrome P450s [19][20][21][22][23][24].
Therefore, from the drug design perspective, NCEs are expected to display suitable metabolism with negligible or no potential for CYP inhibition or induction [25]. In recent years, the availability of X-ray crystallographic structures of various mammalian CYP isoforms and of mutagenesis data has provided a better understanding of CYP structure-function relationships [26][27][28][29][30][31][32]. Most importantly, significant in silico, in vitro and experimental efforts have been made to elucidate the underlying mechanisms behind CYP inhibition [33][34][35][36][37][38][39]. Moreover, various ligand- and structure-based in silico models, as well as machine learning approaches, have been used for the classification of inhibitors and substrates of individual CYP isoforms [40][41][42][43][44][45][46]. Herein, we estimate a set of physicochemical parameters in combination with lipophilic efficiency (LipE) and ligand efficiency (LE) metrics to classify the most active and efficient inhibitors of the target CYPs (CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4). Additionally, we attempt to build simple and easily interpretable decision tree models for the prediction of cytochrome P450 inhibition. The identification of molecular descriptor ranges that are important for CYP inhibition in general, and for highly efficient binding in particular, might provide a valuable tool for the classification and prediction of CYP inhibition against the selected subtypes.

Database Collection
A data set of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 inhibitors with inhibitory potency (IC50) values was collated from the ChEMBL database [47] using a filtering criterion of IC50 ≤ 100 µM against each CYP subtype. The dataset was further refined by removing inconsistent potency values (percentage inhibition, non-absolute values) and duplicate entries. After refinement, the final data set of 6999 CYP inhibitors includes 612 CYP1A2, 1341 CYP2C9, 651 CYP2C19, 1647 CYP2D6 and 2747 CYP3A4 inhibitors (Figure 1 and Tables S1-S5 in Supplementary Materials). Additionally, the inhibitor dataset of each CYP class was divided into three activity levels: active, efficient and inactive. An activity threshold of IC50 ≤ 50 µM was used to categorize compounds as actives, and the remaining compounds with IC50 > 50-100 µM were treated as inactives. In this study, we used the activity threshold of 50 µM to build a more generalized inhibition model for each CYP subtype, as proposed by Tie et al. [5,6]. Furthermore, the active inhibitors with LipE ≥ 5, lipophilicity (clogP) values of ~1.0-3.0, IC50 ~10-150 nM and LE ≥ 0.29 (kcal/mol/heavy atom) were classified as highly efficient (more prone to drug-drug interactions due to CYP inhibition). These include 12 CYP1A2, eight CYP2C9, five CYP2C19, eight CYP2D6 and 17 CYP3A4 inhibitors. The activity ranges and the number of actives and inactives against each CYP isoform are presented in Figure 1. Furthermore, the IC50 (µM) values were normalized by converting them into pIC50 for the calculation of the LipE and LE metrics. The schematic workflow used in this study for the elucidation of CYP inhibitor properties across various activity levels is shown in Figure 2.
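The activity binning and pIC50 conversion described above can be sketched as follows. This is a minimal illustration, assuming IC50 values are given in µM; the function names are hypothetical and are not part of the original workflow.

```python
import math

def pic50_from_um(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 (-log10 of the molar IC50)."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # 1 µM = 1e-6 M, so -log10(ic50_um * 1e-6) = 6 - log10(ic50_um)
    return 6.0 - math.log10(ic50_um)

def activity_class(ic50_um: float) -> str:
    """Bin a compound using the 50 µM threshold and 100 µM cutoff from the text."""
    if ic50_um <= 50.0:
        return "active"
    if ic50_um <= 100.0:
        return "inactive"
    return "excluded"  # assumption: values above the collection cutoff are dropped
```

For example, an inhibitor with an IC50 of 0.1 µM (100 nM) has a pIC50 of 7.0 and falls into the active bin.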
Computation 2019,7, x FOR PEER REVIEW 3 of 31
Figure 1. The total number of cytochrome P450 (CYP) inhibitors split into "active," "inactive" and "efficient" along with the respective potency ranges against each CYP isoform.
Lipophilic Efficiency (LipE)
Lipophilicity contributes towards drug solubility, permeability and metabolism, thus representing an important factor in the pharmacokinetics- and pharmacodynamics-mediated toxicity of a chemical entity [16]. Leeson and Springthorpe proposed the lipophilic efficiency (LipE) metric as an explicit approach to estimate drug-likeness by linking lipophilicity and potency [17]. However, a drug-like compound may also show off-target toxicity due to its potential to interact with antitargets such as CYP450, hERG and P-glycoprotein. Herein, we apply this concept to the inhibitors of the selected CYP450 subtypes to estimate the properties of the most efficient CYP inhibitors, and anticipate that avoiding these properties during lead optimization programs may reduce the antitarget interaction potential of new chemical entities. The LipE profiles were generated by subtracting lipophilicity (clogP) from the negative logarithm of potency (pIC50) against the respective CYP isoform (Equation (1)) (Tables S1-S5 in Supplementary Materials).
The clogP values were calculated with the Bio-Loom software package [48] using the SMILES of the entire data set, whereas the LipE calculations were performed in an Excel spreadsheet.
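As a small worked illustration of the LipE definition given in the text (pIC50 minus clogP), the calculation might look like this. The function name is hypothetical, and the input values in the note below are illustrative rather than taken from the dataset.

```python
def lipe(pic50: float, clogp: float) -> float:
    """Lipophilic efficiency: potency (pIC50) corrected for lipophilicity (clogP)."""
    return pic50 - clogp
```

Under this definition, a 100 nM inhibitor (pIC50 = 7.0) with a clogP of 2.0 has a LipE of 5.0, which sits exactly at the LipE ≥ 5 threshold used in this study.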

Ligand Efficiency (LE)
Ligand efficiency is a measure that quantifies a ligand's affinity towards its target and is obtained by dividing the binding free energy (∆G) in kcal/mol by the number of heavy atoms (HA) [49,50]. The binding free energies (∆G) were calculated using Equation (2), where R is the ideal gas constant, T is the temperature in Kelvin and Kd is the dissociation constant. A temperature of 310 K was used to compute ligand efficiencies in kcal/mol/heavy atom. Additionally, ∆G values for CYP inhibitors were computed by substituting the dissociation constant (Kd) with pIC50 values, as explicated by Hopkins et al. [49] and further supported by the experimental findings of Kuntz et al. [51].
In order to estimate the binding quality of a compound towards the respective CYP isoforms, ligand efficiency (LE) profiling was performed for the entire inhibitor dataset using Equation (3). The ∆G and LE values for the inhibitors of each CYP subtype are shown in Tables S1-S5 in Supplementary Materials. An Excel spreadsheet was used to perform the ligand efficiency calculations.
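The LE calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' spreadsheet: the molar IC50 stands in for Kd, ∆G is taken as RT ln(IC50), and LE is reported as a positive number (−∆G/HA). The function names and the value used for R are choices made for this sketch.

```python
import math

R_KCAL = 1.987e-3  # ideal gas constant in kcal/(mol*K)

def delta_g(pic50: float, temp_k: float = 310.0) -> float:
    """Approximate binding free energy (kcal/mol), with IC50 substituted for Kd."""
    # RT * ln(IC50 in mol/L) = -ln(10) * R * T * pIC50
    return -math.log(10) * R_KCAL * temp_k * pic50

def ligand_efficiency(pic50: float, heavy_atoms: int, temp_k: float = 310.0) -> float:
    """Ligand efficiency in kcal/mol per heavy atom (positive by convention here)."""
    if heavy_atoms <= 0:
        raise ValueError("heavy atom count must be positive")
    return -delta_g(pic50, temp_k) / heavy_atoms
```

With these assumptions, a compound with pIC50 = 7.0 and 25 heavy atoms has an LE of roughly 0.40 kcal/mol/heavy atom, above the 0.29 threshold applied in this study.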
Furthermore, Lipinski's rule of five [54], the Golden Triangle [55] and Pfizer's 3/75 rule [56] were also applied using these physicochemical descriptors to probe inhibition rules for the active and efficient inhibitors of the respective CYP isoforms.

Decision Trees (C4.5 DT)
The decision trees for the classification of active and inactive inhibitors of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 were built using the complete set of previously calculated physicochemical descriptors. The WEKA software package [57] was used to train decision trees based on the J48 classifier [58] with a 10-fold cross-validation procedure. J48, one of the most powerful and commonly used decision tree classifiers, is WEKA's implementation of the C4.5 algorithm [59]. A J48 classifier creates a binary decision tree to model the classification procedure based on the divide-and-conquer rule [60].
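To make the divide-and-conquer idea more concrete, the sketch below computes the gain ratio, the split criterion used by C4.5, for a single threshold on one numeric descriptor. It is a simplified illustration only: it is not WEKA's J48 code, it omits pruning and the other refinements of the full algorithm, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split 'value <= threshold' on a numeric descriptor."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    n = len(labels)
    w_l, w_r = len(left) / n, len(right) / n
    gain = entropy(labels) - (w_l * entropy(left) + w_r * entropy(right))
    split_info = -(w_l * math.log2(w_l) + w_r * math.log2(w_r))
    return gain / split_info

# Toy example: four compounds described by one descriptor (e.g., logP)
logp = [1.0, 2.0, 3.0, 4.0]
cls = ["inactive", "inactive", "active", "active"]
best = max([1.0, 2.0, 3.0], key=lambda t: gain_ratio(logp, cls, t))
```

In this toy case the threshold 2.0 separates the two classes perfectly and therefore has the highest gain ratio; a tree learner would pick it and then repeat the search within each branch.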

Model Performance Evaluation
In order to evaluate the overall performance of the decision tree models, several parameters including accuracy (Equation (4)), sensitivity (Equation (5)) and specificity (Equation (6)) were calculated. Accuracy is the ratio of correctly categorized instances to the total number of instances, whereas sensitivity and specificity correspond to the ratios of correctly classified inhibitors and correctly predicted noninhibitors, respectively [61].
The Matthews correlation coefficient (MCC) was further used to measure the quality of the classification models by taking into account the true positive (TP), true negative (TN), false positive (FP) and false negative (FN) instances (Equation (7)) [62]. MCC values fall between −1 and +1, where a value of +1 indicates perfect agreement between the predicted and experimental classes [62]. Another index that provides a better evaluation of a model's predictive power is the kappa statistic, which corrects the observed agreement for the agreement expected by chance, based on the ratio between the classes (Equations (8) and (9)) [63]. Here, values of 1, 0 and −1 indicate perfect agreement, no agreement above that expected by chance and complete disagreement, respectively [63]. In Equation (8), E is the expected agreement, which is calculated using Equation (9).
Furthermore, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was also calculated to estimate the overall model performance [64]. An AUC of about 0.5 corresponds to the expected performance of random selection, whereas a value below 0.5 indicates performance inferior to random selection [64].
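The performance measures named above can be computed from a binary confusion matrix as sketched below. The formulas are the standard textbook definitions of these metrics and are assumed, not confirmed, to match Equations (4)-(9); the function name and the example counts are hypothetical.

```python
import math

def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity, specificity, MCC and Cohen's kappa from raw counts.

    Assumes at least one true inhibitor and one true noninhibitor are present.
    """
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # fraction of inhibitors recovered
    specificity = tn / (tn + fp)   # fraction of noninhibitors recovered
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # Agreement expected by chance, from the predicted and actual class totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "kappa": kappa,
    }

metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

For these illustrative counts the accuracy is 0.85, sensitivity 0.80, specificity 0.90, and both MCC and kappa are close to 0.70.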

Activity and Efficiency Landscape of the Selected CYP Isoforms Inhibitors
In order to refine the CYP inhibition rules, the inhibitor datasets of the selected CYP isoforms were divided into active, efficient and inactive classes. The highly efficient inhibitors of the CYP subtypes were selected on the basis of the lipophilic efficiency and ligand efficiency metrics.

Lipophilic Efficiency
Previously, Leeson and Springthorpe demonstrated a clogP of ~2.5, potency in the range ~1-10 nM and LipE of ~5-7 or greater as the optimal criteria for an average oral drug against a true target [17].
In the present study, we attempt to apply this concept to a set of antitargets, the cytochrome P450 family of enzymes, to further refine the respective inhibition rules. The clogP and LipE distributions for the inhibitors of the selected CYP isoforms show that a large percentage of CYP inhibitors are highly lipophilic, with clogP values from 2.0-7.0 and LipE values from 0.0-5.0 (Figure 3a,b).


Ligand Efficiency
To gain insight into the highly efficient inhibitors of the selected CYP450 isoforms in terms of binding free energy with the respective enzyme, we computed the ligand efficiency (LE) metric for the entire inhibitor dataset as outlined in the Materials and Methods section. For our dataset of CYP inhibitors, the vast majority (97.9%) of inhibitors displayed a heavy atom count (HA) from 10-50 with LE values from 0.1-0.5 kcal/mol/heavy atom, as shown by the distribution plots in Figure 5a,b. Overall, LE values for the entire CYP inhibitor data set vary from 0.016-1.07 kcal/mol/heavy atom (Table 1 and Tables S1-S5). Out of the total data, about 365 CYP1A2, 361 CYP2C9, 236 CYP2C19, 739 CYP2D6 and 675 CYP3A4 inhibitors showed LE values meeting the established threshold (≥ 0.29 kcal/mol/heavy atom) for optimal binding to a true therapeutic target, which may reflect an optimal fit inside the respective binding site [49]. Therefore, in the present study, compounds having LE ≥ 0.29 kcal/mol/heavy atom along with LipE ≥ 5 and clogP ~1.0-3.0 were classified as the highly efficient inhibitors of the respective CYP subtype. The overall ranges and mean values of LE and HA for the entire set of inhibitors, as well as for the most efficient inhibitors against each CYP subtype, are shown in Table 1. The absolute LE, HA count and ∆G values of the inhibitors of CYP1A2, 2C9, 2C19, 2D6 and 3A4 are presented in Tables S1-S5, respectively.
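The combined efficiency criteria stated above can be expressed as a simple filter. This is a sketch only: the text gives the clogP window as approximate (~1.0-3.0), so treating its bounds as hard limits is an assumption, and the function name is hypothetical.

```python
def is_highly_efficient(le: float, lipe: float, clogp: float) -> bool:
    """Apply the LE, LipE and clogP criteria for a 'highly efficient' inhibitor."""
    return le >= 0.29 and lipe >= 5.0 and 1.0 <= clogp <= 3.0  # hard clogP bounds assumed
```

For instance, a compound with LE = 0.35, LipE = 5.5 and clogP = 2.0 passes the filter, whereas the same compound with clogP = 3.5 does not.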


Physicochemical Properties
The physicochemical properties associated with chemical compounds might influence their overall efficacy, metabolism and safety profiles. Therefore, various studies elucidating the relationships between potency, ADME and physicochemical properties of chemical entities have been reported in the literature [17][76][77][78][79]. For various classes of compounds, a better understanding of the physicochemical properties might assist in differentiating target families and ultimately in avoiding undesirable binding to off-targets. Additionally, it may also contribute towards the design of compounds capable of binding to multiple biological targets, which might prove beneficial for the treatment of complex disease conditions [80]. Here, physicochemical properties including MW, logP, logD, TPSA, rotatable bonds, HBDs and HBAs, vsa_acc, vsa_don, rings, number of stereocenters, fraction of sp3 carbons (Fsp3) and formal charges have been computed to probe the general and specific properties of CYP inhibitors across various classes and activity levels.
Additionally, the two most important applications of physicochemical parameters to assess drug-likeness are the well-known Lipinski's rule of five (RO5) [54] and the Golden Triangle rule [55] that were originally proposed by taking into account the properties of successful drug compounds of that time.The application of these rules to true therapeutic targets has been extensively reported in literature [81][82][83][84][85].However, here we have monitored the RO5 and Golden Triangle violations for the inhibitors of antitargets, the cytochrome P450 family of enzymes.
The physicochemical properties of oral drugs reaching clinical phase II were estimated by Lipinski et al. to frame the well-known rule of five, indicating that a logP ≤ 5, MW ≤ 500, HBAs (O + N atom count) ≤ 10 and HBDs (OH + NH count) ≤ 5 are necessary for absorption or permeation [54]. Considering the trends of these descriptors across the family of CYP inhibitors, it is notable that the CYP3A4 inhibitors show the highest mean (466.29) and median (455.63) molecular weights, with a 95th percentile of 677.69 (Table 2). This is well explained by the fact that CYP3A4 accommodates large and structurally diverse compounds due to its promiscuous binding site [86]. Similarly, the highly efficient CYP3A4 inhibitors with optimal LipE and LE values show the highest mean and median MW (M: 482.46, Mdn: 493.99) in comparison to the highly efficient inhibitors of the remaining CYP isoforms in the data set. Overall, CYP1A2 inhibitors, including those fulfilling the efficiency criteria, display the lowest mean and median MW (all inhibitors M: 345.14, Mdn: 330.37; efficient inhibitors M: 294.18, Mdn: 288.3) as compared to the other CYP isoforms, which is consistent with the observation that molecular planarity with a small volume to surface ratio may favor CYP1A2 inhibition [11]. For the analysis of molecular weight for the other CYP isoforms, refer to Table 2 and Table S6. The 95% confidence intervals (CI) for the difference between calculated property means were also computed for all datasets (Table S6).
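The rule-of-five thresholds quoted above translate directly into a violation check. The sketch below is illustrative: the function name is hypothetical, and the logP, HBA and HBD inputs used in the note afterwards are made-up values, while the two MW values are the CYP3A4 mean and 95th percentile reported in the text.

```python
def ro5_violations(mw: float, logp: float, hba: int, hbd: int) -> list:
    """Return the names of the rule-of-five criteria that a compound violates."""
    checks = {
        "MW": mw > 500,      # molecular weight above 500
        "logP": logp > 5,    # lipophilicity above 5
        "HBA": hba > 10,     # more than 10 hydrogen bond acceptors (O + N)
        "HBD": hbd > 5,      # more than 5 hydrogen bond donors (OH + NH)
    }
    return [name for name, violated in checks.items() if violated]
```

A compound at the mean CYP3A4 molecular weight (466.29) with logP 3.5, seven HBAs and one HBD returns an empty list, whereas one at the 95th percentile (677.69) with logP 5.6 returns both the MW and logP violations.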
Table 2. The range (R), mean (M), standard error of mean (SEM), median (Mdn) and 95th percentile (P) values for physicochemical parameters of all inhibitors and highly efficient inhibitors of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. TPSA: Topological polar surface area; HBAs: Hydrogen bond acceptors; HBDs: Hydrogen bond donors; Fsp3: Fraction of sp3 carbons; Vsa_acc: Sum of van der Waals (vdW) surface areas (Å²) of hydrogen bond acceptors; Vsa_don: Sum of van der Waals (vdW) surface areas (Å²) of hydrogen bond donors; MW: Molecular weight.
An important component of the RO5 is lipophilicity, which is a major contributing factor in several ADMET parameters as well as potency. Higher lipophilicity of chemical entities might lead to unsuitable metabolism and solubility, whereas reduced permeability might be an outcome of lower lipophilicity [87]. Specifically, for the CYP family of enzymes, lipophilicity is crucial for determining the binding affinity of a compound and its selectivity towards specific CYP isoforms [88]. Previously, various in silico models based on logP/logD, hydrogen bonding potential or polar surface area for the prediction of ADME and efficiency have been reported in the literature [89][90][91]. Herein, the lipophilicity values characterized by computed logP and logD were compiled for each CYP subtype, showing that, overall, CYP inhibitors of the selected subtypes are highly lipophilic. This is consistent with the inherent affinity of the cytochrome P450 family of enzymes for lipophilic compounds [12][13][14]. In our datasets, the highest mean/median logP values are shown by CYP2C9 (M: 4.03/Mdn: 3.9) and CYP2C19 (M: 3.9/Mdn: 3.88) inhibitors, whereas the highly efficient inhibitors of each CYP subtype show mean/median logP values of ~2.5 (Table 1). Moreover, in comparison to all other CYP inhibitor datasets, the lowest mean/median logD values were displayed by CYP2D6 inhibitors (M: 2.24/Mdn: 2.135), as shown in Table 2. Briefly, these are basic compounds with a positive charge on nitrogen [36]. Thus, CYP2D6 inhibitors might show lower logD values due to lower partitioning of the protonated amines into the organic phase at pH 7.4 [92]. Similarly, the lowest mean/median logD values among the highly efficient inhibitors (M: 0.57/Mdn: 0.405) were shown by the highly efficient CYP2D6 inhibitors.
A CYP-family-based property analysis of inhibitors in terms of hydrogen bonding potential (HBA and HBD) was also performed. It is well established that overall shape and flexibility, along with molecular size, compound lipophilicity and hydrogen bonding potential, are of great importance for estimating the permeability of chemical entities [93]. Therefore, all these parameters were also assessed for the selected CYP inhibitor datasets (Table 2, Table S6). We observed mean hydrogen bond donor (HBD) values within 1 to 2 for all inhibitors and 1 to 3 for the highly efficient inhibitors of each CYP subtype (Table 2). The highest mean/median hydrogen bond acceptor (HBA) values were shown by CYP3A4 inhibitors (M: 6.7/Mdn: 7), including the highly efficient inhibitors of CYP3A4 (M: 8/Mdn: 9). This can be explained by the high molecular weight of CYP3A4 inhibitors, which increases the atom count and thus the hydrogen bonding potential. In contrast, the lowest mean/median HBA counts were shown by CYP1A2 inhibitors (M: 4/Mdn: 4), mainly because these are planar aromatic compounds with a small volume to surface ratio [94]. A similar trend in HBA count was observed for the highly efficient inhibitors of CYP1A2 (M: 5/Mdn: 5) (Table 2).
Figure 6 represents the overall Lipinski violations for each CYP inhibitor dataset. The largest fraction of compounds without RO5 violations was observed for CYP1A2 inhibitors (81.37%), followed by inhibitors of CYP2D6 (74.32%), CYP2C19 (70.20%) and CYP2C9 (64.50%), with only 57.26% of CYP3A4 inhibitors showing no violations (Figure 6). Various studies elucidating the relationships between CYP enzymes and calculated properties have been reported extensively in the literature; however, CYP inhibition is of utmost concern in terms of RO5 violations [15,55,95,96]. The RO5 is a simplistic criterion based solely on molecular properties and, therefore, takes into account neither the affinity of a ligand towards its particular target [97] nor safety profiling. Therefore, the majority of the CYP inhibitors in our dataset that show drug-like properties (no RO5 violations) also carry a greater chance of toxicological outcomes due to the inhibition of CYP isoforms.
Interestingly, the greatest numbers of RO5 violations were observed for the logP and MW descriptors, which can be explained by the fact that CYP inhibitors are larger and highly lipophilic in nature. Generally, for the two most important RO5 descriptors (logP and MW), it has been shown that increased lipophilicity (logP) is associated with target promiscuity and toxicity, whereas increased MW leads to decreased promiscuity [17,96,98,99]. It is also well established that highly lipophilic compounds show a greater potential for hERG and CYP inhibition, which is consistent with the high number of RO5 logP violations observed in our dataset [15,87]. Moreover, CYP3A4 inhibition has been correlated with increased MW and lipophilicity and with decreased Fsp3, which might lead to potential drug-drug interactions and clearance issues [15,55,95,96]. This is also reflected in the greatest number of RO5 MW and logP violations being observed for the CYP3A4 inhibitors in our data set (Figure 6).
Moreover, the Golden Triangle hypothesis, originally proposed by Johnson et al., aids the selection of molecules with better permeability, metabolic stability and improved potency by simultaneously optimizing the overall absorption and clearance of chemical entities. Principally, in vitro permeability (Caco-2 cells: 16,227 compounds) and metabolic data (human liver microsomes (HLM): 47,018 compounds) were analyzed together with physicochemical properties including MW and logD, where a positive correlation was observed between logD and permeability at a given MW. However, for metabolic clearance, a negative correlation was observed with logD and MW. Therefore, the combination of permeability and HLM data was used to define favorable thresholds, with a baseline logD ranging from −2.0 to 5.0 at a MW of 200 Da and an apex at logD 1.0-2.0 and a MW of 450 Da, for compounds with better permeability and metabolic stability properties [55]. Since the logD and MW parameters are also closely related to the LipE, LE and lipophilic metabolic efficiency (LipMetE) parameters, the Golden Triangle can be used effectively by designing leads against true therapeutic targets with optimal LipE, LE and LipMetE so that they fall into the center of the Golden Triangle, providing better potency, absorption/permeability, metabolic stability and suitable clearance properties for new chemical entities [55].
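The geometry described above can be turned into a simple membership test. This is a sketch under explicit assumptions: the region is modeled as a trapezoid whose logD bounds narrow linearly from −2.0 to 5.0 at 200 Da to 1.0 to 2.0 at 450 Da, and compounds outside the 200-450 Da range are treated as lying outside the region. Neither the linear interpolation nor the handling of compounds below 200 Da is specified in the text, and the function name is hypothetical.

```python
def in_golden_triangle(mw: float, logd: float) -> bool:
    """Check whether (MW, logD) falls inside the assumed Golden Triangle region."""
    mw_base, mw_apex = 200.0, 450.0
    if not mw_base <= mw <= mw_apex:
        return False  # assumption: nothing outside the MW span counts as inside
    frac = (mw - mw_base) / (mw_apex - mw_base)   # 0 at the baseline, 1 at the apex
    lower = -2.0 + frac * (1.0 - (-2.0))          # lower logD edge: -2.0 -> 1.0
    upper = 5.0 + frac * (2.0 - 5.0)              # upper logD edge:  5.0 -> 2.0
    return lower <= logd <= upper
```

Under these assumptions, a compound with a MW of 325 Da and logD of 3.4 lies inside the region (the allowed logD window at that MW is −0.5 to 3.5), while a compound at 450 Da with logD 3.0 lies outside.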
Herein, the logD and MW properties have been calculated for our antitarget inhibitor datasets (selected CYP450 isoforms).The highly lipophilic CYP inhibitors lying outside the Golden Triangle fail to display better permeability and show low in vitro clearance and, thus, represent poor pharmacokinetics.The CYP inhibitors from our datasets with high MW and lower logD values do not lie with the Golden Moreover, the Golden Triangle hypothesis was originally proposed by Johnson et al. that aids the selection of molecules with better permeability, metabolic stability and improved potency by simultaneously optimizing the overall absorption and clearance of chemical entities.Principally, in vitro permeability (Caco-2 cells: 16,227 compounds) and metabolic data (human liver microsomes (HLM): 47,018 compounds) were used for analysis with physicochemical properties including MW and logD, where a positive correlation was observed between logD and permeability at a given MW.However, for metabolic clearance, a negative correlation was observed with logD and MW.Therefore, the combination of permeability and HLM data were used to define favorable thresholds with baseline logD ranging from −2.0-5.0 at MW of 200 Da and an apex at logD 1.0-2.0 and MW of 450 Da for compounds with better permeability and metabolic stability properties [55].Since the logD and MW parameters are also closely related to LipE, LE and lipophilic metabolic efficiency (LipMetE) parameters, therefore, the Golden Triangle can be effectively used by designing leads against true therapeutic targets with optimal LipE, LE and LipMetE into the center of Golden Triangle to provide better potency, absorption/permeability, metabolic stability and suitable clearance properties for new chemical entities [55].
Herein, the logD and MW properties have been calculated for our antitarget inhibitor datasets (selected CYP450 isoforms).The highly lipophilic CYP inhibitors lying outside the Golden Triangle fail to display better permeability and show low in vitro clearance and, thus, represent poor pharmacokinetics.The CYP inhibitors from our datasets with high MW and lower logD values do not lie with the Golden Triangle mainly due to low permeability, whereas a greater number of highly lipophilic CYP inhibitors with high MW lie outside this region due to higher in vitro clearance.Ideally, while screening against an antitarget (toxicity prediction), compounds within the Golden Triangle represent safer chemical entities while the ones lying outside this region represent more notorious compounds due to poor absorption and permeability properties.However, for our datasets, the highest number of CYP1A2 (70.75%) and the lowest number of CYP3A4 (44.70%) inhibitors were observed within the Golden Triangle region mainly due to high MW and highly lipophilicity.Moreover, a better prevalence of inhibitors within the Golden Triangle has also been observed for CYP2D6 (66.48%),CYP2C19 (62.52%) and CYP2C9 (52.57%).It is also observed that the majority of the highly efficient inhibitors of each CYP subtype also lie within this window mainly due to the fulfillment of the efficiency criteria (clogP ~1.0-3.0,LipE ≥ 5, LE ≥ 0.29, MW ≤ 500) (Figure 7a-e).Overall, the presence of most efficient, as well as highly active, inhibitors of CYP isoforms (toxic) within the Golden Triangle indicates that majority of the CYP inhibitors show properties of safer compounds but still they are notorious and show a greater degree of CYP inhibition potential.
The 3/75 rule is another important rule, introduced by Pfizer, that takes physicochemical properties into account. It is mainly based on the observation that, at a plasma concentration (Cmax) < 10 µM, a logP > 3 and a TPSA < 75 Å² lead to a greater possibility of adverse and toxicological outcomes [56]. TPSA is an important physicochemical parameter related to hydrogen bonding that represents the summed surface of all polar atoms (mainly oxygen and nitrogen) and is frequently used for the assessment of oral bioavailability and permeability [96,100]. Moreover, an increasing trend in TPSA values is indicative of reduced permeability and overall bioavailability [100]. The trends of TPSA for our datasets were also monitored across various classes and activity levels and are shown in Table 2.
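The 3/75 rule lends itself to a direct region assignment. The sketch below is illustrative only: it uses the thresholds stated in the text (clogP > 3 with TPSA < 75 Å² as the flagged region, and clogP < 3 with TPSA > 75 Å² as the acceptable region described for Figure 8), while the function names and the label "intermediate" for the remaining quadrants are assumptions introduced here.

```python
def pfizer_3_75_region(clogp: float, tpsa: float) -> str:
    """Assign a compound to a 3/75 region from clogP and TPSA (in A^2).

    "flagged"      : clogP > 3 and TPSA < 75 (higher toxicity likelihood)
    "acceptable"   : clogP < 3 and TPSA > 75
    "intermediate" : everything else (label is an assumption of this sketch)
    """
    if clogp > 3.0 and tpsa < 75.0:
        return "flagged"
    if clogp < 3.0 and tpsa > 75.0:
        return "acceptable"
    return "intermediate"


def percent_flagged(compounds) -> float:
    """Percentage of (clogP, TPSA) pairs that fall in the flagged region."""
    compounds = list(compounds)
    if not compounds:
        return 0.0
    flagged = sum(pfizer_3_75_region(c, t) == "flagged" for c, t in compounds)
    return 100.0 * flagged / len(compounds)
```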
Additionally, the Pfizer 3/75 rule was applied to all inhibitor datasets. The largest share of compounds in the pink region was observed for CYP1A2 inhibitors (54.24%), followed by CYP2D6 (46.1%), CYP2C19 (44.4%), CYP2C9 (34.5%) and CYP3A4 (29%) inhibitors, indicating a greater likelihood to cause toxicity and experimental promiscuity (Figure 8a–e). A compound with clogP > 3 and TPSA < 75 falls within the unacceptable region (pink), whereas a chemical entity with clogP < 3 and TPSA > 75 falls within the acceptable region of safety (green). Similar to the Golden Triangle results, a substantial number of CYP inhibitors, including the highly efficient inhibitors, reside outside the pink region but are still capable of causing toxicological outcomes mediated by CYP inhibition. Therefore, there is a clear need to assess the physicochemical property trends of CYP inhibitors and noninhibitors at different classification levels and to construct highly accurate predictive models for safety profiling.
Other important descriptors encoding the molecular flexibility of CYP inhibitors were also assessed, namely the number of rings, rotatable bonds, stereocenters and the fraction of sp3 hybridized carbons (Fsp3), since these play an influential role in determining the overall permeability, bioavailability and promiscuity against a particular target [95,101]. Overall, an increased risk of hERG toxicity and CYP inhibition has also been associated with an aromatic ring count greater than three [101,102]. In our datasets, ring counts ranging from 0 to 7 were shown by all inhibitors and highly efficient inhibitors of CYPs, and no significant variations were observed in the mean/median ranges, as shown in Table 2. Additionally, the greatest molecular flexibility in terms of rotatable bonds was shown by CYP3A4 inhibitors, including highly efficient CYP3A4 inhibitors, and the lowest by CYP1A2 inhibitors, which correlates with the molecular planarity of CYP1A2 inhibitors [36]. For a detailed analysis of the calculated properties of all classes of inhibitors (all/highly efficient) against each CYP isoform, refer to Table 2.

Decision Trees
In the current study, predictive decision tree models were obtained with the following descriptors: HBD_HBA count, sCenters, number of rings, HBAs, HBDs, total charge, molecular weight, logD, logP and vsa_acc. The values of the selected descriptors for active, efficient and inactive compounds against each CYP subtype are summarized in Table 3. The statistical parameters of each model are shown in Table 4.
A decision tree model for the classification of CYP1A2 inhibition was built using a training set of 612 inhibitors, split into 566 active compounds (including 12 efficient) with IC50 values ≤ 50 µM and 46 inactives with IC50 > 50 µM. Most prominently, for CYP1A2 inhibitors the combined HBD and HBA count (HBA_HBD) was identified as an important classification descriptor. Other discerning descriptors for this class include stereocenters (sCenters), hydrogen bond acceptors (HBA) and molecular weight (Figure 9a). Additionally, for active inhibitors an HBA count of ≤ 6 is shown by the decision tree classifier for CYP1A2 inhibition. A similar range of HBA count for CYP1A2 inhibitors has been reported by Vasanthanathan et al. [103]. Overall, the decision tree classifier shows that the active inhibitors of CYP1A2 have lower HBA_HBD counts (≤ 11), sCenters, HBA (≤ 6) and molecular weights (≤ 507) in comparison to inactive CYP1A2 inhibitors. However, the highly efficient CYP1A2 inhibitors display lower mean molecular weights and sCenters along with higher HBA in comparison to actives (mean: 294.18/0.11/5) and inactives (mean: 422.65/0.80/5.78) (Table 3).
For the CYP2C9 pruned decision tree model, molecular weight appeared to be the branching descriptor. Other discriminating descriptors for CYP2C9 inhibitors include HBD and logD, as shown in Figure 9b. Previously, Jónsdóttir et al. proposed that, in comparison to CYP2C9 substrates, CYP2C9 inhibitors exhibit larger mean molecular weight and polar surface area, which further supports the selection of our descriptor set [104]. In contrast, Ekins et al. have delineated, through different inhibitor data sets, that CYP2C9 inhibitor binding is controlled by multiple factors within the binding site, such as hydrophobic, hydrogen bond acceptor and donor interactions, which reflects the significance of the hydrogen bonding potential described by the descriptors (HBD, logD) in our model [105]. The trends of the MW, logD and HBD descriptors selected by our CYP2C9 inhibition classification model have largely been discussed across various activity levels in the physicochemical property analysis section (Table 3).
For CYP2D6 inhibition classification, shape, atomic polarizability, electrostatic, hydrophobic, lipophilicity and acid-base features have already been reported in the literature [106,107]. Herein, a CYP2D6 decision tree classifier was built using 1647 compounds and a set of seven descriptors, as explained in Table 3. The total charge descriptor appeared as the branching node, and the other selected descriptors include HBD, HBA, rings, sCenters, logP and molecular weight (Figure 9c). Two extremely important descriptors, molecular weight and logP(o/w), were selected by our CYP2D6 inhibitor classification, with mean MW/logP(o/w) values of 400.19/3.69 and 395.50/3.61 for actives and inactives, respectively. Considering the selected descriptors for the three categories of active, efficient and inactive CYP2D6 inhibitors, it is evident that efficient inhibitors show the highest mean HBD, HBA, total charge and stereocenters, with lower molecular weights, logD values and ring counts (Table 3).
Finally, a CYP3A4 inhibition-based decision tree was built using a set of 2641 active compounds (including 43 efficient) and 106 inactive compounds (Figure 1). Briefly, the size and hydrophobicity of a chemical entity are the molecular properties that have an influential role in determining CYP3A4 inhibition [82]. Choi et al. have built recursive partitioning trees for CYP3A4 inhibitor and noninhibitor classification using a set of 2D descriptors, indicating molecular weight as the most conclusive feature, which is also shown by the physicochemical property analysis and the decision tree of CYP3A4 inhibitors. For the decision tree, the other discerning descriptor was vsa_acc (Figure 9d). The chemical entities with a molecular weight > 235.28 were classified as active CYP3A4 inhibitors. These also include 17 highly efficient inhibitors with LE and LipE values in the range of the thresholds already discussed. The highest mean values of molecular weight and vsa_acc were shown by highly efficient CYP3A4 inhibitors, followed by active and inactive compounds (Table 3).
Additionally, for our decision tree classifiers, the model evaluation was performed using specificity, sensitivity, accuracy, MCC and kappa statistics. A large number of independent classification studies probing the inhibition of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 have been reported in the literature, with data sets of 109–17,143 compounds and overall accuracies, MCC and kappa statistics in the ranges 61.9–97%, 0.287–1 and 0.38–0.65, respectively [36,103,106–115]. The statistical parameters used to evaluate the model performance are provided in Table 4. It is notable that all four CYP inhibition-based decision tree models in this study show an overall accuracy and sensitivity above 90% for both the training set and 10-fold cross-validation. Moreover, based on the model evaluation parameters, the best performance was shown by the decision tree classifier for CYP1A2, followed by that for CYP2D6 inhibition (Table 4).
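For reference, the evaluation statistics named above can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions of sensitivity, specificity, accuracy, the Matthews correlation coefficient (MCC) and Cohen's kappa; the function name and the counts in the usage note are hypothetical and are not taken from Table 4.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary classification statistics from confusion-matrix counts.

    tp/fn: inhibitors predicted correctly/incorrectly;
    tn/fp: noninhibitors predicted correctly/incorrectly.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / n if n else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2 if n else 0.0
    kappa = (accuracy - p_chance) / (1.0 - p_chance) if p_chance != 1.0 else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
        "kappa": kappa,
    }
```

With hypothetical counts such as `classification_metrics(tp=90, fn=10, tn=30, fp=20)`, accuracy and sensitivity are high while MCC and kappa are more moderate, which illustrates why the chance-corrected statistics are reported alongside accuracy for imbalanced active/inactive sets.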

Discussion
The discovery of small-molecule drugs is a challenging endeavor that relies on the parallel optimization of several parameters including efficacy, pharmacokinetics and safety [116]. Therefore, ADMET properties, in addition to the pharmacological parameters, are of extreme importance for the clinical success of a drug candidate [117]. Amongst these, drug metabolism plays a major role in determining the therapeutic fate of drugs, where high metabolic liability can ultimately lead to high clearance and loss of pharmacological activity. Conversely, poor metabolic turnover might lead to toxicity and adverse drug reactions due to the accumulation of drugs or active metabolites. Additionally, the inhibition or induction of drug metabolism due to the co-administration of drugs might also lead to potential drug-drug interactions. Furthermore, in recent years, high attrition rates during the preclinical and later stages of clinical drug development have been associated with drug safety problems [118–121].
For small-molecule drug candidates, toxicology at the preclinical phase and drug safety concerns in clinical trials remain the major reasons for higher attrition rates and clinical failures, accounting for 25% of phase I and 14% of phase II failures during 2013 to 2015 [122,123]. Therefore, for the elucidation of drug metabolism at the molecular level, a better understanding of metabolic properties and a revised research and development (R&D) strategy might assist in the optimization of the metabolic stability and safety properties of NCEs, eventually leading to an efficacious drug discovery and development process [124]. Thus, to improve R&D productivity, a 5R framework based on right target, right tissue, right safety, right patient and right commercial potential has recently been applied, which increased the success rates from 4% (2005–2010) to 19% (2012–2016) from candidate selection to the successful completion of phase III [3].
Furthermore, probing the toxicological profiles of new chemical entities remains an important cornerstone of the drug development process; it is experimentally expensive, and the translation of animal model results to humans is also challenging. Therefore, a number of in silico models based on machine learning techniques using different combinations of data types have been developed for the toxicity prediction of thousands of NCEs/drugs, each with its own strengths and weaknesses [125]. Herein, we presented a data set of 6999 inhibitors with known activity values against five CYP isoforms, namely CYP1A2, 2C9, 2C19, 2D6 and 3A4. A combination of ligand efficiency metrics (LipE and LE), physicochemical parameters and decision tree models has been used to discern the important property trends across various activity levels (active -> highly efficient -> inactive) and to probe CYP inhibition using large datasets of CYP inhibitors.
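The assignment of compounds to activity levels can be sketched in a few lines. The thresholds below are those stated in the text (active: IC50 ≤ 50 µM; highly efficient: clogP ~1.0–3.0, LipE ≥ 5, LE ≥ 0.29, MW ≤ 500). The formulas are the commonly used definitions, LipE = pIC50 − clogP and LE ≈ 1.4 × pIC50 / heavy-atom count, and are an assumption here, since the exact expressions used in this study are not restated in this section; the function names are likewise illustrative.

```python
import math


def pic50(ic50_um: float) -> float:
    """Convert an IC50 in micromolar to pIC50 (-log10 of the molar IC50)."""
    return 6.0 - math.log10(ic50_um)


def lipe(ic50_um: float, clogp: float) -> float:
    """Lipophilic efficiency, assumed here as pIC50 - clogP."""
    return pic50(ic50_um) - clogp


def ligand_efficiency(ic50_um: float, heavy_atoms: int) -> float:
    """Ligand efficiency, assumed here as 1.4 * pIC50 / heavy-atom count."""
    return 1.4 * pic50(ic50_um) / heavy_atoms


def activity_level(ic50_um: float, clogp: float, heavy_atoms: int, mw: float) -> str:
    """Classify a compound as 'inactive', 'efficient' or 'active'."""
    if ic50_um > 50.0:
        return "inactive"
    efficient = (
        1.0 <= clogp <= 3.0
        and lipe(ic50_um, clogp) >= 5.0
        and ligand_efficiency(ic50_um, heavy_atoms) >= 0.29
        and mw <= 500.0
    )
    return "efficient" if efficient else "active"
```

In this sketch, efficient compounds are treated as a subset of the actives, which matches the way the training sets are described (for example, 566 actives including 12 efficient compounds for CYP1A2).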
Table 1 summarizes the clogP, LipE, LE and heavy atom ranges across the activity levels of CYP inhibitors, which might provide valuable ranges and mean values for all CYP inhibitors in general and for the most efficient CYP inhibitors in particular. It is important to note that a CYP inhibitor might not necessarily be an efficient inhibitor. Therefore, the estimation of these parameter ranges might prove useful for differentiating between general inhibitors and efficient/potent inhibitors against the selected CYP450s. Additionally, in recent years, many studies have probed the properties of fragments, high-throughput screening (HTS) hits, corresponding leads, clinical candidates and marketed drugs to identify successful trends in physicochemical parameters, leading to the formulation of several rules for future drug design programs [49,54,126–129]. The physicochemical properties of oral drugs reaching clinical phase II were estimated by Lipinski et al. to frame the well-known rule of five (RO5), indicating that a logP ≤ 5, MW ≤ 500, HBAs ≤ 10 and HBDs ≤ 5 are necessary for absorption or permeation [54]. For fragments and drug-like compounds, these parameters have already been estimated (see Table 5), but the current study is aimed at identifying the ranges of RO5 parameters, TPSA, rotatable bonds, LipE and LE for all CYP inhibitors in general and, more specifically, for active and highly efficient inhibitors against the selected CYP isoforms. Amongst the RO5 physicochemical properties, the MW of a chemical entity is one of the most important parameters in drug discovery programs, as it can influence absorption, elimination, blood-brain barrier penetration and interactions with on-targets and off-targets [15,76]. For fragment-like and drug-like compounds, MW < 300 and < 500, respectively, have been reported [54,127,130] (Table 5), whereas from the analysis of MW trends in our dataset, mean values between 340 and 470.8 and between 294 and 482 have been observed for active and highly efficient CYP inhibitors, respectively. Additionally, an increasing order of MW across CYP1A2 -> 2D6 -> 2C19 -> 2C9 -> 3A4 and CYP1A2 -> 2C19 -> 2C9 -> 2D6 -> 3A4 has been observed for the active and efficient CYP inhibitors, respectively. Overall, amongst all subtypes the lowest molecular weight was observed for highly efficient (MW: 195.22
Additionally, lipophilicity is an important mediator of the overall ADMET properties, where high lipophilicity might hamper compound solubility and metabolism, whereas lower lipophilicity might ultimately lead to decreased permeability [87]. It is also well established that the high lipophilicity of a chemical entity might lead to target promiscuity and toxicity issues arising from hERG inhibition, phospholipidosis or cytochrome P450 (CYP) inhibition [15,87]. Generally, for fragments and drug-like compounds, a clogP of < 3 and < 5, respectively, has been reported [54,127,130], but for our dataset of CYP inhibitors, a mean clogP of ~4 has been observed for active inhibitors of all CYP subtypes, whereas for highly efficient inhibitors, mean clogP values between 1.56 and 2.36 have been observed (Table 5). Moreover, the lipophilicity of active compounds increased in the order CYP2D6 -> 3A4 -> 1A2 -> 2C9 -> 2C19, whereas an inverse order was observed for highly efficient CYP inhibitors. Thus, new chemical entities displaying clogP values of 1.56 to ~4 might show greater CYP450 inhibition potential.
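The rule-of-five thresholds quoted earlier in this discussion (logP ≤ 5, MW ≤ 500, HBAs ≤ 10, HBDs ≤ 5) can be checked per compound to tally violations, as in the RO5 bar chart of Figure 6. The following is a minimal sketch; the function name and return format are illustrative choices rather than the procedure used in the study.

```python
def ro5_violations(logp: float, mw: float, hba: int, hbd: int) -> list:
    """Return the names of the rule-of-five criteria a compound violates."""
    violated = []
    if logp > 5.0:
        violated.append("logP")
    if mw > 500.0:
        violated.append("MW")
    if hba > 10:
        violated.append("HBA")
    if hbd > 5:
        violated.append("HBD")
    return violated


def violation_bin(logp: float, mw: float, hba: int, hbd: int) -> str:
    """Bin a compound by number of violations: '0', '1', '2' or '>=3'."""
    count = len(ro5_violations(logp, mw, hba, hbd))
    return str(count) if count < 3 else ">=3"
```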
However, as far as the hydrogen bonding potential is concerned, fragment- and drug-like compounds generally follow the rule of three and the RO5, whereas from our datasets, the mean HBA and HBD counts for active and efficient CYP inhibitors were observed between 3 to 5/3 to 6 and 1 to 2/1 to 3, respectively. For active compounds, the number of both HBA and HBD increased in the order 1A2 -> 2C9 -> 2C19 -> 2D6 -> 3A4. However, with the exception of CYP1A2 and CYP2D6 inhibitors, all highly efficient inhibitors displayed a greater number of HBA/HBD, as shown in Table 5. This indicates that new compounds containing a lower number of HBA/HBD (mean: 2–3/1–2) screened against the studied parameters might display greater inhibition potential against CYP1A2 and CYP2D6, respectively. Another descriptor associated with hydrogen bonding is the topological polar surface area (TPSA) [131,132], which was monitored for the CYP inhibitor datasets. A previous quantitative structure-property relationship (QSPR) study associated an increase in the TPSA and the number of rotatable bonds with decreased oral bioavailability, and proposed that TPSA can be used effectively together with the number of rotatable bonds to reveal the flexibility of molecules [133]. Moreover, thresholds for rotatable bonds (≤ 10) and TPSA (≤ 140 Å²) were defined to obtain a direct correlation with oral bioavailability in the rat [133]. A TPSA of ≤ 60 Å² and ≤ 140 Å² has been described in the literature for fragments and drugs, respectively [54,127,130,133], in humans, whereas for our CYP inhibitor data sets, an increasing trend of TPSA was observed while moving from actives to efficient inhibitors, with CYP3A4 showing the highest TPSA values (> 83.46 Å²) and CYP1A2 showing the lowest TPSA values (> 57.25 Å²), as shown in Table 5. For CYP1A2 and CYP2C9, an increase in the number of rotatable bonds was observed while moving from efficient to active inhibitors, whereas for CYP2C19, CYP2D6 and CYP3A4, no difference was observed between active and efficient inhibitors in terms of rotatable bonds. Overall, our results show that new chemical entities displaying mean TPSA values between 67.6 and 105 and mean rotatable bonds between 2 and 8 might show better bioavailability, ultimately leading to a greater inhibition potential against the major metabolic CYP isoforms during first-pass metabolism. Specifically, new chemical entities with mean TPSA values and numbers of rotatable bonds falling within 56.25–67.61/2–5, 81.3–93.75/6–7, 71.9–98.81/6, 62.58–71.67/7 and 83.46–105/8 might show greater inhibition potential against CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4, respectively, due to increased bioavailability.
Various classification models probing the inhibition of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 with variable data sets, overall accuracies, MCC and kappa statistics have been reported in the literature [35,71–76,78,79,95–97]. However, a direct comparison with previously reported models is difficult, since variable datasets, descriptor selection and description methods have been used. Thus, a comparison of the model evaluation parameters of studies reporting CYP inhibition classification models is the more reasonable approach. Overall, an accuracy and sensitivity above 90% is shown by our CYP inhibition-based decision tree models for both the training sets and 10-fold cross-validation (Table 4). The MCC, kappa statistics and AUC values for the CYP inhibition-based decision tree models were observed in the ranges 0.299-0.50,0.233-0.4).
Generally, for the prediction of drug-drug interactions associated with CYP inhibition, various in silico approaches and web-based computational tools have been reported in the literature [134–139]. These include WhichCyp [134], the vNN Web Server [136], admetSAR [140] and other freely available tools based on classification models for the prediction of the CYP inhibition potential of new chemical entities. More sophisticated methods have also been reported, including a dynamic mechanistic model that takes into account the simultaneous influence of reversible and irreversible CYP inhibition, and the DDI module of GastroPlus™ for the prediction of time-dependent CYP inhibition [137,139]. In this study, however, the trends of ligand efficiency metrics and physicochemical properties for cytochrome P450 enzymes have been calculated using large datasets of CYP inhibitors as a simple step towards a better understanding of cytochrome P450 inhibition. Activity thresholds were estimated across various classes and activity levels (active, efficient and, finally, inactive inhibitors), which might assist in the optimization of the overall properties of new chemical entities during the drug discovery phases.

Conclusions
Here, we estimated LipE and LE profiles along with the physicochemical properties and decision tree models for CYP1A2, 2C9, 2C19, 2D6 and 3A4 inhibitor classification to effectively distinguish active inhibitors from inactive and highly efficient inhibitors. Moreover, the features important for inhibition against each CYP isoform were encoded by the relevant set of descriptors, including molecular weight, lipophilicity, number of hydrogen bond acceptors and donors, total charge, stereocenters and ring counts. Additionally, the clogP, LipE, heavy atom count and LE trends were analyzed for the CYP1A2, 2C9, 2C19, 2D6 and 3A4 inhibitor data sets to provide the thresholds of these parameters for active (IC50 ≤ 50 µM), highly potent (clogP ~1.0–3.0, LipE ≥ 5, LE ≥ 0.29) and inactive (IC50 > 50–100 µM) inhibitors against each CYP subtype. Generally, amongst the entire data set of CYP inhibitors, the highly efficient inhibitors show mean MW, HBA, HBD and logP values between 294.18–482.40, 5.0–8.2, 1–7.29 and 1.68–2.57, respectively. Overall, our results could aid the early prediction of CYP inhibition against the major players of drug metabolism (CYP1A2, 2C9, 2C19, 2D6 and 3A4) during the drug discovery phases.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2079-3197/7/2/26/s1, Table S1: LipE and LE profiling results of CYP1A2 inhibitors dataset, Table S2: LipE and LE profiling results of CYP2C9 inhibitors dataset, Table S3: LipE and LE profiling results of CYP2C19 inhibitors dataset, Table S4: LipE and LE profiling results of CYP2D6 inhibitors dataset and Table S5: LipE and LE profiling results of CYP3A4 inhibitors dataset.
Author Contributions: Y.S.K. and I.J. conceived and designed the project.Y.S.K. and I.J. carried out all the work, analyzed the results and wrote the paper.

Figure 1. The total number of cytochrome P450 (CYP) inhibitors split into "active," "inactive" and "efficient" along with the respective potency ranges against each CYP isoform.

Figure 2. The schematic workflow used in this study to probe the properties of the selected CYP inhibitors across various activity levels. IC50: Inhibitory Potency; clogP: Lipophilicity.

Figure 6. The bar chart distribution showing the number of Lipinski's rule of five (RO5) violations for the CYP inhibitor datasets. The most commonly violated RO5 descriptor pairs for each CYP subtype, including logP, MW and HBA, have also been labelled. For each data set, bars are color-coded according to the number of RO5 violations, where 0 indicates no violation and 1, 2 and ≥ 3 show one, two and three or more RO5 violations, respectively.

Figure 7. Golden Triangle rule positioning of (a) CYP1A2 inhibitors, (b) CYP2C9 inhibitors, (c) CYP2C19 inhibitors, (d) CYP2D6 inhibitors and (e) CYP3A4 inhibitors. Here, all compounds are denoted by blue points, whereas highly efficient inhibitors are shown by red points. The compounds located in the Golden Triangle show a greater likelihood of optimal permeability, low clearance and better metabolic stability.

Figure 8. Pfizer 3/75 rule positioning of (a) CYP1A2 inhibitors, (b) CYP2C9 inhibitors, (c) CYP2C19 inhibitors, (d) CYP2D6 inhibitors and (e) CYP3A4 inhibitors. Here, all compounds are denoted by blue points and highly efficient inhibitors of CYP isoforms are shown by red points. Compounds located in the pink square show a greater likelihood of causing toxicity and experimental promiscuity.

Figure 9. (a) A J-48-pruned decision tree for CYP1A2 inhibitors based on HBD_HBA, sCenters, HBA and molecular weight. (b) A J-48-pruned decision tree for CYP2C9 inhibitors based on molecular weight, HBD and logD. (c) A J-48-pruned decision tree for CYP2D6 inhibitors based on total charge, stereocenters, HBD, HBA, logP, rings and molecular weight descriptors. (d) A J-48-pruned decision tree for CYP3A4 inhibitors based on molecular weight and vsa_acc from the set of selected descriptors.

Table 3. The relevant set of descriptors for each CYP inhibition decision tree classifier, along with their descriptions and the average values of the selected descriptors for active, efficient and inactive inhibitors against each CYP isoform.

Table 4. The model evaluation parameters for the inhibitor-based decision tree classifiers of CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4. MCC: Matthews correlation coefficient; AUC: area under the curve.
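
For readers unfamiliar with the MCC reported in Table 4, the sketch below computes it from the four cells of a binary confusion matrix using the standard definition. The function name and the example counts are hypothetical; returning 0.0 when the denominator is zero is a common convention assumed here, not something stated in the study.

```python
import math


def matthews_corrcoef_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only, not results from the paper.
print(matthews_corrcoef_binary(tp=50, tn=50, fp=0, fn=0))    # perfect agreement
print(matthews_corrcoef_binary(tp=25, tn=25, fp=25, fn=25))  # chance-level agreement
```

Unlike accuracy, the MCC stays informative when the active and inactive classes are imbalanced, because all four confusion matrix cells enter the calculation.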

Table 5. The ranges and estimated average values of rule of five parameters and ligand efficiency metrics for fragments, drug-like compounds and CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A4 active and highly efficient inhibitors.