1. Introduction
Oral administration is the predominant route for medication, as manifested by the fact that ca. 56% of the unique drugs approved by the FDA in 2018 were orally administered [
1]. Accordingly, drug absorption is one of the critical absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) factors that should be taken into consideration in drug discovery and development as well as in practical applications [
2]. For instance, curcumin, which is the major constituent of the spice turmeric (
Curcuma longa), has great potential in treating cancer, diabetes, osteoarthritis, anxiety, and even novel coronavirus disease 2019 (COVID-19) [
3,
4] and yet its practical clinical applications are very limited mainly due to its poor absorption [
5]. Clinically, tuberculosis (TB) is one of the leading causes of death globally, especially for HIV/AIDS patients [
6], and the survival of extremely ill TB patients is diminished due to the poor absorption of anti-TB agents [
7].
Drug absorption mainly relies on solubility and intestinal permeability [
8], which is also termed intestinal absorption [
9], since oral drugs must permeate the gastrointestinal barrier before they can be absorbed by the body [
9]. In fact, solubility and permeability have been adopted by the biopharmaceutics drug disposition classification system (BDDCS), which suggests that the intestinal permeability rate is closely correlated with the extent of metabolism [
10]. Nevertheless, intestinal permeability is an extremely complicated process since drugs can pass through the intestinal epithelium to enter the blood vessels by active transport as well as passive diffusion, as illustrated by Figure 3 of Dahlgren and Lennernäs [
11]. Mechanistically, the active transport can be mediated by two superfamilies expressed in the intestine, namely the influx transporters of the solute carrier (SLC) family and the efflux transporters of the ATP-binding cassette (ABC) family, whereas the passive diffusion can take place through the transcellular and/or paracellular routes [
12].
In addition, the ABC transporters including P-glycoprotein (P-gp, MDR1,
ABCB1), breast cancer resistance protein (BCRP,
ABCG2), MRP2 (
ABCC2), MRP3 (
ABCC3), MRP4 (
ABCC4), MRP5 (
ABCC5), MRP6 (
ABCC6), MRP7 (
ABCC10), MRP8 (
ABCC11), and MRP9 (
ABCC12) [
13], and the SLC transporters involving peptide transporter 1 (PepT1,
SLC15A1), concentrative nucleoside transporter 1 (CNT1,
SLC28A1), concentrative nucleoside transporter 2 (CNT2,
SLC28A2), equilibrative nucleoside transporter 2 (ENT2,
SLC29A2), organic cation transporter 1 (OCT1,
SLC22A1), organic cation/carnitine transporter 1 (OCTN1,
SLC22A4), organic cation/carnitine transporter 2 (OCTN2,
SLC22A5), monocarboxylate transporter 1 (MCT1,
SLC16A1), organic anion transporting polypeptide 2B1 (OATP2B1,
SLCO2B1), serotonin transporter (SERT,
SLC6A4), and apical sodium-dependent bile acid transporter (ASBT,
SLC10A2) [
14] can be found in the intestine. Their expression levels can differ among the various segments of the intestine [
15,
16].
Of the various in vitro assay systems used to measure intestinal permeability, human colorectal adenocarcinoma cells (Caco-2), Madin–Darby canine kidney cells (MDCK), and the parallel artificial membrane permeability assay (PAMPA) are commonly used [
9], and they can be affected by factors such as cell line type and culture conditions. The in situ single-pass intestinal perfusion (SPIP) model is the most prevalent assay [
17] that normally measures effective permeability (
Peff) of the gastrointestinal (GI) tract segments, namely duodenum, jejunum, ileum, and colon, in human, rat, and mouse [
18]. The parameter
Peff, which is expressed in cm/s, can be calculated by
$$P_{\mathrm{eff}} = \frac{-Q \,\ln\left(C_{\mathrm{out}}/C_{\mathrm{in}}\right)}{A},$$
where
Q is the perfusion buffer flow rate;
C_out and C_in are the outlet and inlet solute concentrations, respectively; and
A represents the surface area within the intestinal segment that can be computed by the radius of the intestinal segment (
R) and the length of the perfusion intestinal segment (
L) [
19],
$$A = 2\pi R L.$$
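As an illustration of the calculation described above, the following minimal Python sketch computes Peff from perfusion data. It assumes the commonly used parallel-tube form Peff = −Q·ln(Cout/Cin)/A with A = 2πRL; the function names, units, and numerical inputs are hypothetical and chosen only for demonstration, not taken from this study.

```python
import math


def segment_surface_area(radius_cm: float, length_cm: float) -> float:
    """Cylindrical surface area A = 2*pi*R*L of the perfused segment (cm^2)."""
    if radius_cm <= 0 or length_cm <= 0:
        raise ValueError("radius and length must be positive")
    return 2.0 * math.pi * radius_cm * length_cm


def effective_permeability(q_ml_per_s: float, c_out: float, c_in: float,
                           radius_cm: float, length_cm: float) -> float:
    """Peff in cm/s, assuming the form Peff = -Q * ln(Cout/Cin) / A.

    Q is given in mL/s (i.e., cm^3/s); Cout and Cin only need to share
    the same unit because only their ratio is used.
    """
    if q_ml_per_s <= 0 or c_out <= 0 or c_in <= 0:
        raise ValueError("flow rate and concentrations must be positive")
    area = segment_surface_area(radius_cm, length_cm)
    return -q_ml_per_s * math.log(c_out / c_in) / area


# Hypothetical inputs: 0.2 mL/min flow, 10% loss of solute across a
# 10 cm segment with a 0.18 cm radius.
peff = effective_permeability(0.2 / 60.0, 0.9, 1.0, 0.18, 10.0)
print(f"Peff = {peff:.2e} cm/s")
```

Because only the ratio Cout/Cin enters the expression, any consistent concentration unit can be used, whereas Q and the segment dimensions must be expressed in cm-based units to obtain cm/s.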
When compared with in vitro assays, in vivo tests provide an environment closer to real life, but they are costly, time consuming, and sometimes inhumane, and they are subject to discrepancies arising from a number of factors such as individual differences in intestinal cell surface and epithelial cell integrity [
20]. In particular, they are very sensitive to the animal species because of differences in physiological conditions [
21]. More importantly, those factors can contribute substantially to data inhomogeneity, which, in turn, can create major obstacles to producing a sound quantitative theoretical model based on data compiled from the public domain, since only homogeneous data can produce a good in silico model [
22].
In silico technologies have been seamlessly integrated into drug discovery and development, and they provide especially valuable advantages in ADME/Tox profiling due to their extremely fast throughput and low cost [
23]. As such, it is plausible to expect that an in silico model that can predict intestinal permeability would be very useful. Nevertheless, no sound quantitative structure–activity relationship (QSAR) model has been published to date, even though some qualitative studies have been conducted. The scarcity of QSAR models can be plausibly attributed to the lack of consistent and homogeneous data in the public domain and, more importantly, to the extremely complex process of intestinal permeability (vide supra), since it can take place through various active transport and passive diffusion routes. More specifically, the SLC transporters can enhance drug uptake into the intestine and hence increase drug absorption, whereas ABC proteins can elevate drug efflux out of the intestine and therefore reduce drug absorption [
24], leading to problematic situations for model development. For instance, the substrates of PepT1 and P-gp, which are two of the most abundant SLC and ABC transporters, respectively, in jejunum [
15], can interact with their transporter proteins through hydrogen-bond donor (HBD) groups [
25,
26], suggesting that HBD can simultaneously promote and hinder intestinal permeability. As such, traditional or machine learning (ML) modeling schemes are not sophisticated enough to manage such exceedingly nonlinear situations.
Accordingly, it is extraordinarily difficult, if not entirely infeasible, to develop a robust in silico model to predict intestinal permeability that accounts for the critical factors governing the efflux and influx active transport and passive diffusion mentioned above. Such a challenge, nevertheless, can be addressed by the novel ML-based hierarchical support vector regression (HSVR) scheme devised by Leong et al. [
27], since HSVR can properly depict the complicated and varying dependencies of descriptors. This capability can be largely attributed to the fact that HSVR has the advantageous features of both a global model and a local model, namely larger coverage of the applicability domain (AD) and a higher degree of predictivity, respectively. Unlike most predictive models, which are vulnerable to the “mesa effect”, i.e., give mediocre performance when applied to extrapolated predictions, HSVR can substantially minimize such performance deterioration, as demonstrated elsewhere [
1,
28,
29], suggesting that HSVR is insusceptible to outliers, in contrast to other predictive models, a property that is of crucial importance to a theoretical model [
30]. Herein, this investigation aimed to develop an accurate, rapid, and predictive in silico model based on the HSVR scheme to predict intestinal permeability and thereby facilitate drug discovery and development.
3. Discussion
Numerous in silico models have been reported to predict intestinal permeability [
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55]. However, those published models were based on data assayed under different experimental conditions or on different measured parameters, and some of them were qualitative models, making a direct comparison between HSVR and those models extremely difficult. In addition, intestinal permeability is an extremely complicated process, which can take place through various active transport and passive diffusion routes (vide supra). As such, it is not uncommon to observe that various descriptor combinations associated with intestinal permeability have been identified. For example, Shultz proposed the significance of HBD, topological polar surface area (TPSA), and log
P in intestinal permeability [
56], whereas Broccatelli et al. recognized the contributions of TPSA, MW, HBD, number of rotamers (
nrot), charge, and fraction ionized at pH 7.4 (cFI
7.4) to intestinal permeability [
57].
Drugs must pass through the hydrophobic mucus layer, which is adjacent to the intestinal wall, before they can penetrate through the intestinal epithelial cells [
58]. As such, hydrophobicity is of critical importance in intestinal permeability and it can be represented by the
n-octanol–water partition coefficient (log
P) and the
n-octanol–water distribution coefficient at pH 6.5 (log
D). Moreover, it was proposed by Balimane et al. that log
P and log
D should be adopted to predict the intestinal permeability since log
P alone is not sufficient to accurately render this complicated process [
9]. As such, both log
P and log
D were adopted by this study (
Table 1). However, the selection of both descriptors can plausibly lead to an overtrained model since the correlation coefficient between log
P and log
D was 0.73 for all molecules included in this study. This issue is resolved by the fact that log
D was adopted by SVR A and SVR B, whereas log
P was selected by SVR C, so that no single SVR model included both descriptors simultaneously. In fact, this dilemma of selecting two correlated descriptors to accurately predict intestinal permeability cannot be resolved by traditional linear or machine learning-based QSAR schemes, but only by an ensemble-based scheme such as HSVR.
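The check described above, namely that no single sub-model of the ensemble contains both members of a highly correlated descriptor pair, can be sketched in a few lines of Python. This is an illustrative sketch only: the 0.7 threshold and the helper names are assumptions, and the descriptor sets list only log P and log D (the full descriptor assignments of Table 1 are omitted); only the reported r of 0.73 comes from the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_conflicts(model_descriptors, pair_r, threshold=0.7):
    """List (model, d1, d2, r) for each model holding a correlated pair.

    threshold is a hypothetical cut-off, not a value from the study.
    """
    conflicts = []
    for model, descriptors in model_descriptors.items():
        for d1, d2 in combinations(sorted(descriptors), 2):
            r = pair_r.get(frozenset((d1, d2)))
            if r is not None and abs(r) >= threshold:
                conflicts.append((model, d1, d2, r))
    return conflicts


# r between log P and log D as reported in the text.
pair_r = {frozenset(("logP", "logD")): 0.73}

# Only the log P / log D assignments stated in the text are listed here.
ensemble = {"SVR A": {"logD"}, "SVR B": {"logD"}, "SVR C": {"logP"}}
single_model = {"one model": {"logP", "logD"}}

print(collinear_conflicts(ensemble, pair_r))
print(collinear_conflicts(single_model, pair_r))
```

In practice, `pair_r` would be filled by applying `pearson_r` to the descriptor columns of the data set rather than entered by hand.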
It has been observed that PSA is profoundly implicated in membrane permeability in passive diffusion [
59], which is completely consistent with the PAMPA study [
1] as well as intestinal permeability [
56]. In addition, permeability also depends on MW, as proposed in [
13]. Nevertheless, neither PSA nor MW was adopted by any of the SVR models in the ensemble (
Table 1). Conversely, it is seemingly unusual to observe that the descriptor
nN+O was selected by SVR A and yet has hitherto not been adopted by any reported study. These discrepancies can be explained by the fact that
nN+O was correlated with PSA and MW with
r values of 0.88 and 0.71, respectively, for all molecules selected in this study. The empirical observation indicated that models with the selection of
nN+O performed better than those with the selection of PSA or MW (data not shown), plausibly due to the descriptor–descriptor interaction [
1], suggesting that it is plausible to represent PSA or MW by
nN+O. The negative correlation between
nN+O and log
Peff (−0.29) is also consistent with the fact that permeability can decrease with MW [
60].
It has been postulated that hydrogen bonding, which can be characterized by the numbers of hydrogen-bond acceptors (HBA) and HBD, plays a critical role in intestinal P-gp-mediated transport [
61] and that HBD makes more substantial contributions to intestinal permeability than its HBA counterpart [
56]. Accordingly, HBD was adopted in this study. Nevertheless, the relationship between HBD and
Peff is seemingly obscure, as illustrated by
Figure 8, which shows the average
Peff for each histogram bin of HBD for all molecules included in this investigation. This peculiar relationship can be plausibly attributed to the fact that the substrates of PepT1 and P-gp, which are the most abundant SLC and ABC transporters, respectively, in jejunum [
15], can interact with their transporter proteins via HBD [
25,
26]. The complexity can be further increased by taking into account the fact that P-gp inhibitors, modulators, and substrates can interact with P-gp through HBD [
26,
62,
63]. As such, HBD can simultaneously facilitate and hinder intestinal permeability, leading to a perplexing dependency, which, in turn, can create an insurmountable hurdle for creating a predictive theoretical model by traditional linear or machine learning-based schemes.
Shadow-
ν is a size-related descriptor which measures the ratio of largest to smallest dimension. It can be observed in
Figure 9, which displays the average
Peff for each histogram bin of shadow-
ν, that
Peff generally increased with shadow-
ν for all molecules selected in this investigation, suggesting that molecules with larger shadow-
ν have higher permeability than their smaller counterparts.
It has been observed that molar refractivity (MR), which is possibly associated with molecular size, polarity, and/or polarizability [
64], is closely related to ligand‒P-gp interactions [
65,
66]. Nevertheless, little correlation was manifested between MR and log
Peff for all molecules enrolled in this study, with an insignificant
r value of −0.12 (
Table 1). This incongruity can be explained by the nonlinear relationship between MR and
Peff, as demonstrated in
Figure 10, which illustrates the average
Peff for each histogram bin of MR. It can be observed that
Peff marginally increased with MR and substantially decreased afterwards, suggesting the nonlinear relationship between MR and
Peff. Thus, linear models cannot properly render such a complicated relationship.
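The point that a near-zero r can coexist with a pronounced rise-then-fall trend can be illustrated with a short Python sketch. The data below are synthetic (a symmetric inverted-U curve) and the bin count is arbitrary; neither reflects the molecules or the histogram bins used in this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def binned_means(x, y, n_bins):
    """Mean of y within equal-width bins of x (None for empty bins)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("x must span a non-zero range")
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        # The maximum value is placed in the last bin.
        idx = min(int((xi - lo) / width), n_bins - 1)
        sums[idx] += yi
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Synthetic descriptor values and a synthetic inverted-U response.
descriptor = list(range(21))
response = [-(d - 10) ** 2 for d in descriptor]

r = pearson_r(descriptor, response)
means = binned_means(descriptor, response, 4)
print(f"r = {r:.3f}")
print("bin means:", means)
```

The binned means rise and then fall even though the linear correlation is essentially zero, which is why bin-averaged plots are more informative than r alone for descriptors with non-monotonic behavior.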
The significance of the descriptor
μ in intestinal permeability has been recognized [
67] since
μ can describe the solute‒solute and solute–solvent dipole interactions [
68], as demonstrated in PAMPA permeability [
1], leading to a nonlinear relationship between
μ and permeability. In addition, it has been observed that ligands can interact with the efflux transporter P-gp and the influx transporter PepT1 through dipole interactions [
69,
70,
71], giving rise to the complex role played by
μ in intestinal permeability.
It is of interest that most of the descriptors selected in this study are associated with passive diffusion, which is consistent with the fact that passive diffusion is the major route of intestinal permeability for many administered drugs [
12]. Additionally, MR, shadow-
ν, and
nN+O, which was selected in place of MW in this study (vide supra), are closely linked to molecular size, and the molecular size is a determining barrier factor in intestinal permeability as postulated [
72,
73,
74].
CSA, which is another characteristic associated with molecular size, has been implicated in membrane permeability [
75].
Figure 11 exhibits the average
Peff for each histogram bin of CSA. It can be observed that
Peff did not show substantial variations with CSA initially, yet the
Peff value greatly dropped once CSA was larger than 75, which is very similar to the trend observed for MR (
Figure 10), suggesting that it is more difficult to penetrate the intestinal wall once the CSA value exceeds a threshold. Nevertheless, the empirical observation has indicated that HSVR models with the selection of MR performed better than those with the selection of CSA (data not shown), presumably because of the descriptor–descriptor interaction [
1].
It has been indicated that ion class is one of the critical factors that should be taken into account in physiologically based pharmacokinetic (PBPK) models and ADME/Tox properties [
20,
76]. Actually, it has been demonstrated that neutral compounds show higher passive diffusion [
1]. Accordingly, all molecules enrolled in this investigation were subjected to ion class analysis.
Figure 12 displays the box plot of the log
Peff minimum, maximum, mean, median, the 25th percentile, and the 75th percentile for each ion class. The log
Peff values of neutral compounds are statistically greater than those of the other ion classes, indicating that neutral compounds show higher intestinal permeability. When compounds of the other ion classes show low intestinal permeability, it may be possible to improve their permeability by chemical modification to produce neutral compounds.
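The per-class summary statistics shown in such a box plot can be computed as in the following Python sketch. The ion class labels, the log Peff values, and the choice of the inclusive quartile method are hypothetical placeholders for illustration and are not data or settings from this study.

```python
import statistics


def summarize(values):
    """Box-plot statistics for one group of values (needs >= 2 values)."""
    if len(values) < 2:
        raise ValueError("at least two values are required")
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "mean": statistics.mean(values),
        "q3": q3,
        "max": max(values),
    }


def summarize_by_class(records):
    """Group (ion_class, log_peff) pairs and summarize each group."""
    groups = {}
    for ion_class, log_peff in records:
        groups.setdefault(ion_class, []).append(log_peff)
    return {name: summarize(vals) for name, vals in groups.items()}


# Hypothetical (ion class, log Peff) records for demonstration only.
records = [
    ("neutral", -3.9), ("neutral", -4.1), ("neutral", -4.3),
    ("acid", -4.6), ("acid", -4.9), ("acid", -5.2),
    ("base", -4.4), ("base", -4.8), ("base", -5.0),
]

for name, stats in summarize_by_class(records).items():
    print(name, stats)
```

A formal comparison between classes would additionally require a significance test, which is outside the scope of this sketch.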
Initially, extensive attempts were made in this investigation to construct various 2-QSAR models by partial least squares (PLS), but no acceptable models were obtained (data not shown) [
29]. This challenge was due to little correlation between the selected descriptors and log
Peff for the molecules selected in this study, and the largest absolute
r was merely 0.29 between
nN+O and log
Peff (
Table 1), depicting the highly nonlinear relationship between them. More importantly, the substantial difference in 2-QSAR development between the passive diffusion, viz. the PAMPA system, and intestinal permeability can be greatly attributed to the significant and complex roles played by those active (influx and efflux) transporters. As such, it is extremely difficult to build a sound linear model to predict intestinal permeability. Conversely, the accurate and predictive HSVR model can considerably delineate such nonlinear dependence of log
Peff on descriptors.