Machine Learning Techniques Applied to the Study of Drug Transporters

Kong, Xiaorui; Lin, Kexin; Wu, Gaolei; Tao, Xufeng; Zhai, Xiaohan; Lv, Linlin; Dong, Deshi; Zhu, Yanna; Yang, Shilei

doi:10.3390/molecules28165936

Open Access Review

by Xiaorui Kong ¹, Kexin Lin ¹, Gaolei Wu ², Xufeng Tao ¹, Xiaohan Zhai ¹, Linlin Lv ¹, Deshi Dong ¹, Yanna Zhu ¹,* and Shilei Yang ¹,*

¹ Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China

² Department of Pharmacy, Dalian Women and Children’s Medical Group, Dalian 116024, China

* Authors to whom correspondence should be addressed.

Molecules 2023, 28(16), 5936; https://doi.org/10.3390/molecules28165936

Submission received: 30 June 2023 / Revised: 27 July 2023 / Accepted: 2 August 2023 / Published: 8 August 2023

(This article belongs to the Special Issue New Advances in Drug Metabolism and Pharmacokinetics)

Abstract

With the advancement of computer technology, machine learning-based artificial intelligence technology has been increasingly integrated and applied in the fields of medicine, biology, and pharmacy, thereby facilitating their development. Transporters have important roles in influencing drug resistance, drug–drug interactions, and tissue-specific drug targeting. The investigation of drug transporter substrates and inhibitors is a crucial aspect of pharmaceutical development. However, long duration and high expenses pose significant challenges in the investigation of drug transporters. In this review, we discuss the present situation and challenges encountered in applying machine learning techniques to investigate drug transporters. The transporters involved include ABC transporters (P-gp, BCRP, MRPs, and BSEP) and SLC transporters (OAT, OATP, OCT, MATE1,2-K, and NET). The aim is to offer a point of reference for and assistance with the progression of drug transporter research, as well as the advancement of more efficient computer technology. Machine learning methods are valuable and attractive for helping with the study of drug transporter substrates and inhibitors, but continuous efforts are still needed to develop more accurate and reliable predictive models and to apply them in the screening process of drug development to improve efficiency and success rates.

Keywords:

machine learning; drug transporters; inhibitor; substrate

1. Introduction

Drug transporters are a group of transmembrane proteins that are widely distributed throughout the human body. They move endogenous and exogenous substances across biological membranes, thereby influencing drug absorption, distribution, metabolism, excretion, and other pharmacokinetic processes. The investigation of transporters is therefore important for pharmacokinetics, pharmacodynamics, drug–drug interactions (DDIs), and drug toxicity. Over 400 transporters have been identified in the human genome [1], primarily belonging to two superfamilies: the ATP-binding cassette (ABC) and the solute carrier (SLC) transporters. Over nearly two decades, various in vitro, in situ/ex vivo, and in vivo methods have been developed to study transporter function and drug–transporter interactions and to identify transporter substrates or inhibitors. In vitro models comprise membrane-based and cell-based assays, whereas in vivo models encompass transporter gene knockout, natural mutant, and humanized animal models. In situ/ex vivo models use isolated and perfused organs or tissues, such as the liver, kidney, intestine, lung, and brain [2]. Although traditional research methods are constantly updated and improved, their cost and time consumption remain significant obstacles, as is common throughout drug development. Computational methods such as virtual screening (VS) and molecular docking are also employed in the study of drug transporters [3]. Molecular docking is a computational tool that predicts ligand conformation and binding affinity and can help identify drug side effects and toxicity. Initially developed for investigating molecular recognition between small and large molecules, molecular docking has gained increasing popularity in supporting drug discovery programs. 
Its applications include but are not limited to hit identification and optimization, drug repositioning, a posteriori target identification (reverse screening), and multi-target ligand design [4]. Common molecular docking platforms include DockThor-VS, Durrant Lab, iGEMDOCK [5], and AutoDock [6]. The proliferation of data and improvements in analysis techniques have given rise to machine learning (ML)-based prediction models, which present a valuable opportunity for investigating transporters [7]. These models can address the challenges of traditional research methods by saving cost and time and improving overall efficiency.

Currently, an increasing number of machine learning models are being developed in the field of transporter research. In this review, we focus on the application of machine learning in the study of transporters, with particular emphasis on recent advances in predicting transporter substrate/inhibitor interactions using machine learning models, in order to provide a reference for and help with the study of transporters.

2. Drug Transporters and Important Implications

Transporters are a ubiquitous class of proteins that are located on the cell membrane and carry out transport functions throughout the human body (Figure 1). Multispecific drug transporters are generally assigned to two transporter superfamilies: the ABC superfamily and the SLC superfamily. ABC transporters are a family of efflux transporters that use the energy released by ATP hydrolysis to move drugs and endogenous substances against their concentration gradient, and they are related to drug bioavailability, tumor multidrug resistance, and disease. Among the ABC family members, P-glycoprotein (P-gp), multidrug resistance-associated protein (MRP), and breast cancer resistance protein (BCRP) are considered to be important causes of multidrug resistance (MDR) of tumor cells [8] and are therefore the most studied subtypes related to drug transport. In addition, there is the bile salt export pump (BSEP). Bile salt transporters that function improperly or are expressed abnormally have been identified as significant factors contributing to various liver diseases, particularly those causing cholestasis [9]. Most SLC transporters are located on the cell membrane and rely on electrochemical and ion concentration gradients to transport substrates, regulate the exchange of soluble molecular substrates between the two sides of the lipid membrane, and maintain the stability of the intracellular environment. Over 400 transporters have been identified to date, displaying a wide range of substrates such as sugars, amino acids, vitamins, nucleotides, metals, inorganic ions, organic anions, oligopeptides, and drugs [10]. The SLC22 transporter family is among the most extensively researched SLC families in terms of drug handling [11], playing a central role in the transport of small molecule endogenous substances, drugs, and endotoxins across tissues and interfacial fluids. 
SLC transporters involved in drug transport are primarily composed of organic anion transporting polypeptides (OATPs), organic anion transporters (OATs), organic cation transporters (OCTs), and oligopeptide transporters (PEPTs). Another SLC transporter, the multidrug and toxin extrusion transporter (MATE), is an efflux transporter. Transporters expressed in the intestine, liver, and kidney are central to the drug absorption, distribution, metabolism, and excretion (ADME) process and regulate drug concentrations in both blood and tissues. Oral medication is primarily absorbed in the gastrointestinal tract, and its bioavailability is influenced by both uptake and efflux transporters present in this region. PEPT1, a transport protein expressed on the brush border membrane of the intestine, facilitates the absorption and transportation of peptide-like anticancer drugs within the gut. Linking a drug with a dipeptide can improve its bioavailability in the human body [12]. P-gp is the most extensively studied efflux transporter and limits the bioavailability of numerous orally administered drugs [13]. MRP2 and BCRP are also expressed in the intestinal tract, with known substrates including statins, methotrexate, and other compounds. Transporters also shape drug tissue distribution and elimination, ultimately influencing drug selectivity. Within the blood–brain barrier (BBB), various transport proteins, including P-gp, BCRP, and OCTs, govern the distribution of neuroactive drugs by regulating the rate and direction of drug transport across the BBB. P-gp and BCRP can act together in the transport of chemotherapy drugs [14].

SLC family members such as OCT1, OAT2, and NTCP are responsible for drug uptake into liver cells [15], whereas transport proteins involved in drug hepatobiliary efflux include P-gp, MRP2, BSEP, and BCRP [16,17]. There are many transporters (OCT, OAT, OATP, PEPTs, etc.) expressed on renal tubular epithelial cells that participate in proximal tubular secretion and reabsorption processes. These transporters play a crucial role in transferring drugs or their metabolites into urine for excretion [18]. In summary, alterations in transporter function can affect the ADME process and consequently drug efficacy, with transporters playing a crucial role in pharmacokinetics.

Transporters also play a crucial role in DDIs by modulating the disposition of drugs within the body. A DDI occurs when one drug influences the action of another by inhibiting or inducing one or more processes. Transporter-mediated DDIs, particularly those involving transporters expressed in the intestine, liver, kidney, and BBB, have garnered significant attention. A DDI is likely to occur when the co-administered drug is a substrate, inhibitor, or inducer of the transporter protein. Machine learning techniques allow the substrates or inhibitors of a transporter to be found more efficiently. This helps to clarify drug interactions during transporter studies and is important for both drug development and basic medical research.

3. Machine Learning

Artificial intelligence (AI) is widely utilized, leveraging computational power to emulate human cognitive processes. Machine learning, a pivotal component of artificial intelligence, can be traced back to 1943 [19]. The term refers to the capability of software to accomplish a task by means of learning from data and has been widely employed in various domains, such as data integration and analysis [20,21]. Machine learning possesses the ability to identify complex patterns from vast and complex molecular descriptor datasets, making it particularly suitable for predicting transporter substrates and inhibitors. Depending on the type of data, such as whether sample labels are available, machine learning algorithms can be classified as supervised, semi-supervised, and unsupervised learning [22]. When machine learning is used to predict transporter substrates or inhibitors, it is often done through supervised learning, where models are built using labeled training data. Model building can include options such as decision trees, random forests, neural networks, support vector machines, logistic regression, k-nearest neighbors, and more. Each model has its own characteristics and suitable environments.

3.1. Decision Trees and Random Forests

Tree-based algorithms are very popular in machine learning and perform classification and regression using decision trees [23]. Decision tree learning is a supervised learning technique based on recursive partitioning. A classification model is represented as a tree that starts at a decision point and uses a feature to split the data. Each split leads to a new decision point that uses further features to separate the data. In addition to simple decision trees, there are newer ensemble methods, such as random forest (RF) and gradient-boosted trees (for example, XGBoost). Random forest builds multiple decision trees and combines their predictions to improve prediction accuracy and reduce overfitting. Each decision tree is built using a random subset of the features rather than all of them [24].
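As an illustration of a single decision point, the following minimal Python sketch searches for the feature and threshold that give the lowest weighted Gini impurity. Gini impurity is one common splitting criterion and is assumed here; the descriptor values (logP, aromatic ring count) and labels are hypothetical toy data, not taken from any of the cited studies.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = (None, None, gini(labels))  # start from the impurity of the unsplit node
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical toy data: [logP, aromatic ring count]; 1 = inhibitor, 0 = non-inhibitor
X = [[1.0, 0], [1.5, 1], [3.8, 2], [4.2, 3], [4.5, 1], [0.8, 2]]
y = [0, 0, 1, 1, 1, 0]
feature, threshold, impurity = best_split(X, y)
```

A full decision tree would apply `best_split` recursively to each side of the split, and a random forest would repeat the process on resampled data with a random subset of features per tree.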

3.2. Neural Network

Artificial neural networks (ANNs), deep neural networks (DNNs), and deep learning (DL) are also common in the field of machine learning. The concept is loosely modeled on the architecture of the human brain and can be applied to both regression and classification problems. An ANN model consists of units (neurons) that each combine multiple inputs and produce a single output; the units are arranged in an input layer, an output layer, and one or more hidden layers between them, each consisting of multiple neurons in parallel. Hidden layers enable categories in the input signal that are not linearly separable to be distinguished. A nonlinear activation function transforms the signal at each node and passes it to the next node; each output node corresponds to a task to be predicted, and in this way complex information is finally classified [25]. DNNs are artificial neural networks with multiple hidden layers. They are considered deep learning algorithms and involve more complex networks and larger data volumes, so the problem of overfitting needs to be considered. There are several well-known variants of deep learning, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) [26].
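The layered structure described above can be sketched as a forward pass through one hidden layer. This is only an illustrative sketch: the weights, biases, choice of ReLU and sigmoid activations, and the three input descriptors are all hypothetical, and no training step is shown.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1), read here as a class probability."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Nonlinear activation used in the hidden layer."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical parameters: 3 descriptors -> 2 hidden neurons -> 1 output neuron
W_hidden = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b_hidden = [0.0, 0.1]
W_out = [[1.2, -0.7]]
b_out = [0.05]

def predict(descriptors):
    """Return the predicted probability that a compound is, e.g., a substrate."""
    hidden = dense(descriptors, W_hidden, b_hidden, relu)
    return dense(hidden, W_out, b_out, sigmoid)[0]

p = predict([1.0, 0.5, 2.0])
```

In a trained network the weights would be fitted to labeled data by backpropagation; a DNN simply stacks more `dense` layers between input and output.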

3.3. Support Vector Machine

Support vector machine (SVM) is a classical classification and regression algorithm that separates classes with the hyperplane that maximizes the margin; with kernel functions it can also model nonlinear relationships. A separating hyperplane is constructed in the feature space, the distance between this hyperplane and the nearest training points is defined as its margin, and classification ability is maximized by selecting the hyperplane with the largest margin. The optimal hyperplane is determined by the support vectors, a subset of the training data that lie closest to it [27,28]. When the data are not linearly separable, kernel functions can be applied to map the data into a higher-dimensional space in which they become better separated, which is also an advantage of SVM.
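To make the notion of margin concrete, the short sketch below compares two candidate hyperplanes on hypothetical two-dimensional points and also defines a radial basis function (RBF) kernel, one commonly used kernel. It does not solve the SVM optimization problem; it only evaluates the quantity that an SVM maximizes. All data and parameter values are assumptions for illustration.

```python
import math

def decision(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any training point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision(w, b, x) / norm for x, y in zip(points, labels))

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: similarity that decays with squared Euclidean distance."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Hypothetical points with labels +1 / -1
points = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
labels = [1, 1, -1, -1]

# Both hyperplanes separate the classes, but the first leaves a wider margin
wide = geometric_margin([1.0, 1.0], -2.0, points, labels)
narrow = geometric_margin([1.0, 0.0], -0.5, points, labels)
```

Here the points `[2.0, 2.0]` and `[0.0, 0.0]` lie closest to the first hyperplane and would act as its support vectors.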

3.4. Naïve Bayes

Naïve Bayes (NB) uses Bayes’ theorem to classify data under the strong assumption that each feature of a sample is independent of the other features given the class. Compared with other machine learning algorithms, NB is faster and simpler because it only needs to consider each predictor variable in each class separately [29]. Its accuracy is often relatively low on complex data, so it tends to perform better on less complex data.
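Because each feature is treated separately within each class, a naïve Bayes classifier can be written in a few lines. The sketch below assumes binary features (for example, substructure fingerprint bits) and a Bernoulli model with Laplace smoothing; the fingerprints and labels are hypothetical.

```python
import math

def train_nb(fingerprints, labels):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    n_features = len(fingerprints[0])
    for c in set(labels):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == c]
        prior = len(rows) / len(labels)
        # P(bit j = 1 | class c), smoothed so no probability is exactly 0 or 1
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def predict_nb(model, fp):
    """Return the class with the highest log posterior (up to a constant)."""
    def log_posterior(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if bit else 1.0 - p) for bit, p in zip(fp, probs))
    return max(model, key=log_posterior)

# Hypothetical 4-bit fingerprints; 1 = substrate, 0 = non-substrate
X = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0],
     [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = train_nb(X, y)
label = predict_nb(model, [1, 1, 0, 0])
```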

3.5. k-Nearest Neighbor Algorithm

The k-nearest neighbor algorithm (k-NN) is a machine learning algorithm mainly used for classification and is widely used due to its simple and easy-to-understand design [30]. It classifies unlabeled data by assigning them to the most similar labeled category. The k nearest training samples (neighbors) are considered, and the final class is assigned by majority vote [31]. Factors such as the k value, distance calculation, and appropriate predictor variable selection can all have a significant impact on model performance [32].
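The majority-vote rule can be sketched as follows. The example assumes that compounds are represented as sets of "on" fingerprint bits and compared with Tanimoto similarity, a common choice in cheminformatics but not one prescribed by the cited works; the training compounds are hypothetical.

```python
from collections import Counter

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_predict(train, query, k=3):
    """train: list of (bit_set, label). Majority vote among the k most similar."""
    ranked = sorted(train, key=lambda item: tanimoto(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled compounds
train = [
    ({1, 2, 3, 4}, "inhibitor"),
    ({1, 2, 3, 5}, "inhibitor"),
    ({2, 3, 4, 6}, "inhibitor"),
    ({7, 8, 9}, "non-inhibitor"),
    ({7, 8, 10}, "non-inhibitor"),
]
pred = knn_predict(train, {1, 2, 3, 6}, k=3)
```

Changing `k` or swapping `tanimoto` for another similarity measure changes which neighbors vote, which is why these choices affect model performance.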

The general procedure for identifying new substrates/inhibitors of drug transporters through machine learning techniques is outlined as follows: (1) a dataset is built from compounds known to be substrates/inhibitors; (2) the chemical information of each compound is analyzed, extracted, and converted into a form that can be recognized by the algorithm; (3) the constructed dataset is split into a training set and a validation set, machine learning methods are employed to learn from the training set and develop the model, and the validation set is used to test and refine the newly created model; and (4) unknown compounds are predicted and the predictions are verified.
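Step (3) of this procedure can be sketched as a stratified split that keeps the ratio of positive to negative compounds the same in both sets. The 0.25 validation fraction, the fixed random seed, and the placeholder dataset are assumptions chosen for illustration only.

```python
import random

def stratified_split(records, validation_fraction=0.25, seed=0):
    """Split (features, label) records so both sets keep the class balance."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = {}
    for rec in records:
        by_label.setdefault(rec[1], []).append(rec)
    train, validation = [], []
    for group in by_label.values():
        group = group[:]  # copy so the caller's list is left untouched
        rng.shuffle(group)
        n_val = max(1, round(len(group) * validation_fraction))
        validation.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, validation

# Stand-in for steps (1) and (2): hypothetical (descriptor vector, label) pairs,
# 12 substrates (label 1) and 20 non-substrates (label 0)
dataset = ([([i, i % 3], 1) for i in range(12)]
           + [([i, i % 5], 0) for i in range(20)])
train, validation = stratified_split(dataset)
```

A model from any of the algorithm families in Section 3 would then be fitted on `train` and assessed on `validation` before being applied to unknown compounds in step (4).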

4. Application of Machine Learning Methods in the Investigation of Drug Transporters

4.1. ABC Transporters

Many human ABC proteins, including P-gp (ABCB1), MRPs (ABCC), and BCRP (ABCG2), function as efflux pumps that actively extrude compounds such as drugs from the cell. The classical ABC transporter is composed of four domains: two transmembrane domains (TMDs) and two cytoplasmic nucleotide-binding domains (NBDs) [33]. MDR, one of the major barriers to cancer therapy, is associated with transporter proteins of the ABC superfamily, which affect drug accumulation in cancer cells [34]. Among these transporters, P-gp is considered to be the major contributor to cellular multidrug resistance. The tissue distribution and cellular localization of transporters influence drug efficacy and toxicity. Therefore, it is essential to study the efflux transport of drugs and identify the substrates of efflux transporters. Additionally, exploring efflux transporter inhibitors represents a promising research direction for addressing drug resistance. The machine learning methods used by the researchers are listed in Table 1.

4.1.1. P-gp

The ABCB1 transporter, also known as P-gp, belongs to the ABCB subfamily. It was first identified in Chinese hamster ovary cells in 1976 [55]. With the introduction of the concept of the ABC transporter family [56], research on P-gp gradually increased. P-gp is encoded by MDR1 (ABCB1) in humans and by Mdr1a and Mdr1b in rodents. It is mainly distributed in the human small intestine, colon, liver, kidney, brain, and other tissues and organs, as well as in barrier tissues such as the blood–brain barrier, blood–testis barrier, and placental barrier. It is also expressed in the lung, heart, and spleen [57]. P-gp functions as an efflux transporter that moves endogenous substances, exogenous substances (drugs and their metabolites), and toxins out of cells. Therefore, in normal tissues, P-gp-mediated efflux transport helps to reduce toxicity and protect cells, but at the same time, it limits the absorption of drugs and reduces their bioavailability [58]. P-gp is highly expressed on the membrane of many tumor cells, which is directly related to the multidrug resistance of tumor cells. Not only anticancer drugs but also HIV protease inhibitors and immunosuppressants are substrates of P-gp [59]. Therefore, drugs that inhibit P-gp are anticipated to elevate the intracellular concentration of chemotherapeutic agents and enhance the sensitivity of tumor cells to them.

P-gp is the earliest discovered transporter [60] and has been studied for several decades. Therefore, a large amount of data on P-gp has accumulated, and most early machine learning studies of transporters focused on P-gp. With the development of computer technology, machine learning prediction models of P-gp have been constantly improved.

In 2019, Kadioglu et al. [35] established a prediction platform for P-gp modulators using machine learning methods (including k-NN, neural networks, RF, and SVM). They used defined chemical descriptors to predict whether test compounds can act as substrates or inhibitors of P-gp. Notably, they also validated the results using molecular docking in terms of binding energy and docking poses. The RF classification algorithm performed better than other algorithms in feature selection. In 2020, Esposito et al. [36] combined machine learning with molecular dynamics (MD) simulations in the MDFP/ML approach, using molecular dynamics fingerprints (MDFPs) as orthogonal descriptors to distinguish and predict substrates and non-substrates of P-gp. The study used four different ML methods, namely, RF, gradient tree boosting (GTB), SVM, and a meta-learner. When the models were validated with an external validation set, only models trained on MDFPs or property-based descriptors could be applied to regions of chemical space not covered by the training set. Although P-gp has been known for over three decades, improved selective inhibitors targeting it are still lacking, which can be attributed to its specificity and its incompletely known structural characteristics.

4.1.2. BCRP

BCRP, a member of the G subfamily of the ABC family, was first identified in the multidrug-resistant human breast cancer cell line MCF-7/AdrVp [61]. BCRP is widely expressed and distributed in several normal tissues, such as the small intestine, liver, brain endothelium, and placenta [62]. It can confer resistance by pumping chemotherapy drugs out of cells. In the past decade, machine learning research on BCRP has developed rapidly.

Hazai et al. [44] developed an SVM prediction model of BCRP substrates based on the known substrates and non-substrates of BCRP in 2013. For model verification, a training set/test set ratio of 0.75/0.25 was chosen, and the overall prediction accuracy for the independent external validation dataset was 73%. Moreover, the prediction accuracy for wild-type BCRP substrates, at 76%, was higher than that for non-substrates. The molecular descriptors used indicated that the 3D structure of the substrate is a possible determinant of the BCRP–substrate interaction. In 2014, Ding et al. [45] developed an accurate, fast, and robust pharmacophore ensemble/support vector machine (PhE/SVM) model to predict the BCRP inhibition of structurally diverse molecules. Given the promiscuous nature of BCRP, this method does not produce significant bias when applied to structurally diverse inhibitors. In 2016, Montanari et al. [41] integrated data using KNIME workflows to build a multi-label classification model of BCRP/P-gp inhibitory activity using a machine learning approach. Key molecular features affecting transporter selectivity were retrieved by comparing various multi-label learning algorithms. The KNIME workflow is an effective solution for merging data from multiple sources and constructing multi-label datasets that are tailored for BCRP and/or P-gp. Using the dataset created through the KNIME workflow, it was possible to distinguish between selective BCRP inhibitors and selective P-gp inhibitors by examining only two features, the number of hydrophobic atoms and the number of aromatic atoms, and to explore the characteristics shared between dual and selective inhibitors. In 2017, Gantner et al. [42] were the first to combine in silico predictions of BCRP with experimental validation to develop nonlinear in silico models of BCRP substrates. 
The J48 algorithm, the WEKA 3.6 implementation of the C4.5 decision tree induction algorithm, was used to obtain the nonlinear classification model, and a genetic algorithm (GA) was used to select the best descriptors. The selected non-substrate compounds were experimentally validated using an everted rat intestinal sac model, which demonstrated the predictive power of the model. The rfSA technique is a feature selection approach that combines the simulated annealing (SA) algorithm with RF to eliminate redundant and irrelevant features. In 2020, Jiang et al. [43] used XGBoost and DNN methods for the prediction of BCRP inhibitors for the first time and obtained good prediction results. A diverse set of 1098 BCRP inhibitors and 1701 non-inhibitors was compiled as a dataset, and the molecular descriptors linked to BCRP inhibition were explored. High hydrophobicity and aromaticity were found to be characteristic of BCRP inhibitors. Seven ML methods (DNN, SGB, XGBoost, NB, weighted k-NN, RLR, and SVM) were used to develop the classification model, and the Bayesian optimization algorithm was used to optimize the hyperparameters. The results showed that the SVM, XGBoost, and DNN methods were superior to the other methods, and SVM had the best prediction ability. Analysis of the misclassified compounds revealed that most of them had complex structures that may not be accurately characterized by traditional descriptors. In 2021, Ganguly et al. [40] used a Bayesian machine learning model to predict the metabolites most likely to be BCRP or P-gp substrates in the CSF and plasma of double-knockout (dKO) rats, demonstrating that CSF may be a better matrix for identifying endogenous substrates of BCRP and P-gp.

BCRP has been shown to affect the permeability of the blood–brain barrier, contributing to the failure of most CNS-acting drugs in clinical trials [63]. In 2014, Garg et al. [46] developed machine learning models to evaluate the effect of BCRP at the BBB: an artificial neural network model to predict the BBB permeability of molecules and an SVM model to predict BCRP substrates. Through molecular docking analysis, 11 molecules were identified as meeting the criteria for BBB penetration. These compounds were also predicted to be substrates of BCRP.

4.1.3. MRPs

MRPs are active transporters of the ABC family and are widely distributed in the lung, kidney, brain, and other organs. MRPs contain many isoforms, among which MRP1, MRP2, and MRP4 are highly expressed in tumor cells and mediate the efflux of a variety of anti-tumor drugs, leading to the occurrence of multidrug resistance. Therefore, the study of MRPs is highly significant in combating multidrug resistance in tumors. Recently developed machine learning methods for predicting substrates or inhibitors of MRPs have demonstrated remarkable accuracy.

In 2017, Lingineni et al. [48] established an SVM model for MRP1 substrate classification based on previous studies [46], and the accuracy of the best MRP1 substrate model in the training set, test set, and external validation set was 87.39%, 93.54%, and 80%, respectively. The BBB permeability artificial neural network model and molecular docking analysis demonstrated that MRP1 plays an important role in the transport of substances in the BBB.
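Accuracy figures such as those quoted above are computed from the counts of correct and incorrect predictions, and reporting the accuracy separately for substrates (sensitivity) and non-substrates (specificity) shows whether a model favors one class. The sketch below illustrates the calculation on hypothetical labels; it does not reproduce the data of any cited study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Overall accuracy plus per-class accuracy for the two classes."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # fraction of substrates recovered
        "specificity": tn / (tn + fp),  # fraction of non-substrates recovered
    }

# Hypothetical external validation set: 1 = substrate, 0 = non-substrate
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
metrics = summarize(y_true, y_pred)
```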

Kharangarh et al. [47] used k-NN, RF, SVM, and other machine learning methods to train classification models of MRP2 inhibitors and non-inhibitors using compounds from the Metrabase database. Descriptors were selected through four methods: variance threshold, SelectKBest, RF, and recursive feature elimination (RFE). Five-fold cross-validation and analysis of the relevant parameters showed that the SVM model constructed from the features selected by the RFE method performed well. The key descriptors for developing MRP2 inhibitor models were obtained by RStudio analysis and could help determine the MRP2-inhibitory properties of compounds in the early stages of drug discovery. The prediction of MRP substrates can reduce the failure rate of preclinical drug studies, and the prediction of inhibitors can support the study of MDR; both can be used in the early stages of drug discovery.
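Two of the steps mentioned above, variance-threshold feature selection and five-fold cross-validation, can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' implementation: the descriptor matrix is hypothetical, and the folds are taken in order without shuffling.

```python
def variance_threshold(rows, threshold=0.0):
    """Return indices of descriptor columns whose variance exceeds threshold."""
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > threshold:
            keep.append(j)
    return keep

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# Hypothetical descriptor matrix: the first column is constant and carries no signal
X = [[1.0, 0.0, 3.2], [1.0, 1.0, 2.9], [1.0, 0.0, 4.1], [1.0, 1.0, 3.7]]
kept = variance_threshold(X)
folds = list(k_fold_indices(10, k=5))
```

In practice the sample order is usually shuffled (and often stratified by class) before the folds are formed, and the model is retrained once per fold.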

4.1.4. BSEP

BSEP is an ABC transporter encoded by the ABCB11 gene. It is located in the canalicular membrane of hepatocytes and is responsible for transporting bile acids and bile salts from hepatocytes into the bile canaliculi [64]. Inhibition of BSEP can cause the toxic accumulation of bile salts in cells, triggering cholestatic liver injury and ultimately leading to the premature termination of preclinical development and clinical trials of drug candidates. Machine learning prediction of BSEP inhibition has also been studied in recent years. In 2021, McLoughlin et al. [49] developed a model for predicting and classifying BSEP inhibitors. They utilized the ATOM Modeling PipeLine (AMPL) to train and assess over 15,500 classification and regression models. The optimal combination of model type, dataset splitting strategy, chemical featurization method, and model parameters was determined by testing various configurations using AMPL’s hyperparameter search function. The best performing model for this purpose was finally found to be an RF model that used MOE descriptor features.

4.2. SLC Transporters

More than 400 transporters have been identified, and the SLC22 transporter family is one of the best studied SLC families for drug handling, with a central role in moving small molecule endogenous substances, drugs, and endotoxins between tissues and interfacial fluids. The kidney expresses high levels of OCT2 and OAT1, which are crucial for the renal uptake of clinical drugs and endogenous substances. OAT1, OAT3, OCT1, and OCT2 are widely studied drug transporters. OAT2 is mainly expressed in hepatocytes and is involved in the transport of small molecule anionic drugs into hepatocytes. OAT1 and OAT3 are mainly expressed in renal cells and regulate the transport of organic anions from the blood into proximal tubular cells. OCT1 is mainly expressed in the hepatic sinusoidal membrane and mediates the transport of drugs and endogenous substances. OCT2 and OCT3 are involved in the renal and biliary excretion of cationic drugs, respectively. In 2016, Ose et al. [39] developed a model for predicting drug transporter substrates based on the SVM method and established a database of seven classes of transporter substrates (OATP1B1/1B3, OAT1/3, OCT1/2, MRP2/3/4, MDR1, BCRP, and MATE1/2-K). Physicochemical parameters were used as the basic descriptors. This model has the potential to accurately predict transporter substrates without the need for in vitro transport assays. In 2017, Shaikh et al. [37] performed multi-transporter modeling and developed substrate prediction models for transporters using quantitative structure–activity relationship (QSAR) and proteochemometric (PCM) methods. After the established models were evaluated, the top-performing model was merged with other models to create a heterogeneous ensemble model for each transporter. This analysis involved 6 efflux transporters, 7 uptake transporters, and 4575 substrate/non-substrate data points. In 2020, Nigam et al. 
[50] combined machine learning, chemoinformatics, and multi-specific drug transporter knockout metabolomics to analyze the unique metabolites accumulated in the plasma of OAT1 and OAT3 knockout mice and define the molecular properties of endogenous ligands. Finally, seven key molecular descriptors were obtained. The RF classification model based on the metabolite dataset correctly classified ≥ 75% of the drugs known to interact with OAT1/3. This helps to clarify the physiological role of drug transporters and supports metabolite-based drug design and the analysis of drug–metabolite interactions. This chemoinformatics–machine learning approach was subsequently used by Nigam et al. [51] to analyze drugs transported by OATPs and OATs. The results showed that liver OATPs preferred highly hydrophobic, complex drugs with more rings as substrates, whereas kidney OATs preferred more polar drugs. This provides a molecular basis for tissue-specific targeting strategies, drug interactions, and drug delivery to minimize toxicity in liver and kidney diseases. In 2021, Jensen et al. [54] used machine learning methods to predict substrates for OCT1. A database containing more than 1000 substances was screened virtually, and 19 substances were tested in vitro. This study demonstrates that machine learning methods can accurately predict substrates of OCT1, even in the absence of its crystal structure.

The norepinephrine transporter (NET/SLC6A2) is also a member of the SLC family and is better studied than other SLCs. NET regulates norepinephrine (NE)-mediated physiological effects by taking NE back up into presynaptic terminals, thereby terminating noradrenergic signaling. In 2023, Bongers et al. [65] developed an approach for identifying NET inhibitors in which models built with the RF, GBt, and PLS algorithms were combined with virtual screening and experimental validation, and finally identified five novel NET inhibitors. This method incorporates the chemical space of the ligand and utilizes a similarity-based network to select related proteins for the modeling of NET.

Access to data is sometimes limited by privacy issues, and collecting and curating data requires considerable time and effort. Once data have been obtained, it is also crucial to select chemical descriptors that yield high-quality models for each transporter. To make it quick and easy to build predictive models, Smajic et al. [38] used Jupyter Notebooks in 2022 to create a framework that can build or retrain ML models in a semi-automatic manner. The framework generates classification models for six transporters (BCRP, BSEP, OATP1B1, OATP1B3, MRP3, and P-gp), and the resulting models can be updated and shared through Jupyter Notebooks. This is a valuable tool for making predictions on new data for ABC and SLC transporters. Table 2 shows the sources of the datasets used by the researchers to build their machine learning models.

5. Conclusions and Future Prospects

Advances in information technology have opened up new methods in many fields, including pharmaceutical research. Cost and efficiency remain major challenges in the drug discovery process. Drug resistance is one of the main reasons chemotherapeutic drugs fail, so predicting whether a chemotherapeutic drug is a transporter substrate is essential; likewise, changes in drug efficacy and toxicity caused by DDIs can be anticipated by identifying and predicting transporter substrates and inhibitors. Computational classification models for the substrates and inhibitors of various transporters have now been established to save experimental resources. They have been instrumental in overcoming drug resistance, discovering DDIs, and targeting drugs.

The quality of the data has a significant impact on the performance of the model. For the machine learning models built so far, the initial dataset was assembled not only from public databases but also from data that had to be manually collected and collated from different studies, or from unpublished in-house laboratory data. This heterogeneity may affect the reliability of the prediction results. Therefore, the selection of descriptors and the validation of the model are also important steps in ensuring the accuracy of the prediction results.
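
As an illustration of the validation step, the Python sketch below computes several metrics commonly reported for binary substrate/inhibitor classifiers (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) from true and predicted labels. The labels are made up for the example, and the function name `binary_metrics` is hypothetical; the reviewed studies use a variety of validation schemes, and this sketch is not taken from any one of them.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute basic validation metrics for labels coded as 1 (active) and 0 (inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Made-up external test set: 4 substrates (1) and 4 non-substrates (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters for transporter datasets, which are often imbalanced between actives and inactives; the Matthews correlation coefficient summarizes all four confusion-matrix cells in a single value.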

In this paper, we have reviewed machine learning-based approaches to the study of different transporters. Relying on a single source of data to build models of the interactions between drugs and transporter proteins is not sufficiently accurate. The classification efficiency and predictive accuracy of machine learning models depend on comprehensive and reliable data and on trade-offs among individual machine learning approaches. In addition, transporter substrates and inhibitors predicted through machine learning require further experimental validation. In general, machine learning provides a highly useful tool for studying transporters, improving research efficiency, and allowing us to focus on compounds with higher potential.

Author Contributions

Conceptualization, K.L., Y.Z. and S.Y.; methodology, X.K. and K.L.; formal analysis, X.K. and X.Z.; writing – original draft preparation, K.L., G.W. and X.T.; writing – review and editing, D.D. and L.L.; supervision, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

National Natural Science Foundation of China, grant/award number: 82003837.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No experimental data are available for this review.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Sample Availability

Not available.

Abbreviations

DDIs, drug–drug interactions; ABC, ATP-binding cassette; SLC, solute carrier; VS, virtual screening; ML, machine learning; P-gp, P-glycoprotein; MRP, multidrug resistance-associated protein; BCRP, breast cancer resistance protein; MDR, multidrug resistance; BSEP, bile salt export pump; OATPs, organic anion transporting polypeptides; OATs, organic anion transporters; OCTs, organic cation transporters; PEPTs, oligopeptide transporters; MATE, multidrug and toxic compound efflux transporter; ADME, absorption, distribution, metabolism, and excretion; BBB, blood–brain barrier; AI, artificial intelligence; RF, random forest; XGBoost, extreme gradient boosting; ANN, artificial neural network; DNN, deep neural network; DL, deep learning; RNNs, recurrent neural networks; SVM, support vector machine; MDFPs, molecular dynamics fingerprints; PhE, pharmacophore ensemble; GA, genetic algorithm; SA, simulated annealing; QSAR, quantitative structure–activity relationship; PCM, proteochemometric modeling; NET, norepinephrine transporter; NE, norepinephrine.

References

1. Liang, Y.; Li, S.; Chen, L. The physiological role of drug transporters. Protein Cell 2015, 6, 334–350.
2. Wang, D. Current Research Method in Transporter Study. Adv. Exp. Med. Biol. 2019, 1141, 203–240.
3. Shoichet, B.K.; McGovern, S.L.; Wei, B.; Irwin, J.J. Lead discovery using molecular docking. Curr. Opin. Chem. Biol. 2002, 6, 439–446.
4. Pinzi, L.; Rastelli, G. Molecular Docking: Shifting Paradigms in Drug Discovery. Int. J. Mol. Sci. 2019, 20, 4331.
5. Hsu, K.-C.; Chen, Y.-F.; Lin, S.-R.; Yang, J.-M. iGEMDOCK: A graphical environment of enhancing GEMDOCK using pharmacological interactions and post-screening analysis. BMC Bioinform. 2011, 12, S33.
6. Forli, S.; Huey, R.; Pique, M.E.; Sanner, M.F.; Goodsell, D.S.; Olson, A.J. Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nat. Protoc. 2016, 11, 905–919.
7. Carracedo-Reboredo, P.; Liñares-Blanco, J.; Rodríguez-Fernández, N.; Cedrón, F.; Novoa, F.J.; Carballal, A.; Maojo, V.; Pazos, A.; Fernandez-Lozano, C. A review on machine learning approaches and trends in drug discovery. Comput. Struct. Biotechnol. J. 2021, 19, 4538–4558.
8. Cascorbi, I. Role of pharmacogenetics of ATP-binding cassette transporters in the pharmacokinetics of drugs. Pharmacol. Ther. 2006, 112, 457–473.
9. Meier, P.J.; Stieger, B. Bile salt transporters. Annu. Rev. Physiol. 2002, 64, 635–661.
10. Hediger, M.A.; Clémençon, B.; Burrier, R.E.; Bruford, E.A. The ABCs of membrane transporters in health and disease (SLC series): Introduction. Mol. Aspects Med. 2013, 34, 95–107.
11. Nigam, S.K. The SLC22 Transporter Family: A Paradigm for the Impact of Drug Transporters on Metabolic Pathways, Signaling, and Disease. Annu. Rev. Pharmacol. Toxicol. 2018, 58, 663–687.
12. Gyimesi, G.; Hediger, M.A. Transporter-Mediated Drug Delivery. Molecules 2023, 28, 1151.
13. Zhao, J.; Zeng, Z.; Sun, J.; Zhang, Y.; Li, D.; Zhang, X.; Liu, M.; Wang, X. A Novel Model of P-Glycoprotein Inhibitor Screening Using Human Small Intestinal Organoids. Basic Clin. Pharmacol. Toxicol. 2017, 120, 250–255.
14. Wolf, A.; Bauer, B.; Hartz, A.M.S. ABC Transporters and the Alzheimer’s Disease Enigma. Front. Psychiatry 2012, 3, 54.
15. Bi, Y.-A.; Costales, C.; Mathialagan, S.; West, M.; Eatemadpour, S.; Lazzaro, S.; Tylaska, L.; Scialis, R.; Zhang, H.; Umland, J.; et al. Quantitative Contribution of Six Major Transporters to the Hepatic Uptake of Drugs: “SLC-Phenotyping” Using Primary Human Hepatocytes. J. Pharmacol. Exp. Ther. 2019, 370, 72–83.
16. Brouwer, K.L.R.; Evers, R.; Hayden, E.; Hu, S.; Li, C.Y.; Meyer Zu Schwabedissen, H.E.; Neuhoff, S.; Oswald, S.; Piquette-Miller, M.; Saran, C.; et al. Regulation of Drug Transport Proteins-From Mechanisms to Clinical Impact: A White Paper on Behalf of the International Transporter Consortium. Clin. Pharmacol. Ther. 2022, 112, 461–484.
17. Droździk, M.; Oswald, S.; Droździk, A. Extrahepatic Drug Transporters in Liver Failure: Focus on Kidney and Gastrointestinal Tract. Int. J. Mol. Sci. 2020, 21, 5737.
18. Zou, W.; Shi, B.; Zeng, T.; Zhang, Y.; Huang, B.; Ouyang, B.; Cai, Z.; Liu, M. Drug Transporters in the Kidney: Perspectives on Species Differences, Disease Status, and Molecular Docking. Front. Pharmacol. 2021, 12, 746208.
19. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. 1943. Bull. Math. Biol. 1990, 52, 99–115.
20. Tkatchenko, A. Machine learning for chemical discovery. Nat. Commun. 2020, 11, 4125.
21. Klambauer, G.; Hochreiter, S.; Rarey, M. Machine Learning in Drug Discovery. J. Chem. Inf. Model. 2019, 59, 945–946.
22. Loftus, T.J.; Tighe, P.J.; Ozrazgat-Baslanti, T.; Davis, J.P.; Ruppert, M.M.; Ren, Y.; Shickel, B.; Kamaleswaran, R.; Hogan, W.R.; Moorman, J.R.; et al. Ideal algorithms in healthcare: Explainable, dynamic, precise, autonomous, fair, and reproducible. PLoS Digit. Health 2022, 1, e0000006.
23. Chen, X.; Wang, M.; Zhang, H. The use of classification trees for bioinformatics. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 55–63.
24. Choi, R.Y.; Coyner, A.S.; Kalpathy-Cramer, J.; Chiang, M.F.; Campbell, J.P. Introduction to Machine Learning, Neural Networks, and Deep Learning. Transl. Vis. Sci. Technol. 2020, 9, 14.
25. Sarker, I.H. Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective. SN Comput. Sci. 2021, 2, 377.
26. Lavecchia, A. Deep learning in drug discovery: Opportunities, challenges and future prospects. Drug Discov. Today 2019, 24, 2017–2032.
27. Zhang, L.; Ai, H.; Chen, W.; Yin, Z.; Hu, H.; Zhu, J.; Zhao, J.; Zhao, Q.; Liu, H. CarcinoPred-EL: Novel models for predicting the carcinogenicity of chemicals using molecular fingerprints and ensemble learning methods. Sci. Rep. 2017, 7, 2118.
28. Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
29. Zhang, Z. Naïve Bayes classification in R. Ann. Transl. Med. 2016, 4, 241.
30. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256.
31. Bzdok, D.; Krzywinski, M.; Altman, N. Machine learning: Supervised methods. Nat. Methods 2018, 15, 5–6.
32. Zhang, Z. Introduction to machine learning: K-nearest neighbors. Ann. Transl. Med. 2016, 4, 218.
33. Linton, K.J. Structure and function of ABC transporters. Physiology 2007, 22, 122–130.
34. Huang, J.; Ecker, G.F. A Structure-Based View on ABC-Transporter Linked to Multidrug Resistance. Molecules 2023, 28, 495.
35. Kadioglu, O.; Efferth, T. A Machine Learning-Based Prediction Platform for P-Glycoprotein Modulators and Its Validation by Molecular Docking. Cells 2019, 8, 1286.
36. Esposito, C.; Wang, S.; Lange, U.E.W.; Oellien, F.; Riniker, S. Combining Machine Learning and Molecular Dynamics to Predict P-Glycoprotein Substrates. J. Chem. Inf. Model. 2020, 60, 4730–4749.
37. Shaikh, N.; Sharma, M.; Garg, P. Selective Fusion of Heterogeneous Classifiers for Predicting Substrates of Membrane Transporters. J. Chem. Inf. Model. 2017, 57, 594–607.
38. Smajić, A.; Grandits, M.; Ecker, G.F. Using Jupyter Notebooks for re-training machine learning models. J. Cheminform. 2022, 14, 54.
39. Ose, A.; Toshimoto, K.; Ikeda, K.; Maeda, K.; Yoshida, S.; Yamashita, F.; Hashida, M.; Ishida, T.; Akiyama, Y.; Sugiyama, Y. Development of a Support Vector Machine-Based System to Predict Whether a Compound Is a Substrate of a Given Drug Transporter Using Its Chemical Structure. J. Pharm. Sci. 2016, 105, 2222–2230.
40. Ganguly, S.; Finkelstein, D.; Shaw, T.I.; Michalek, R.D.; Zorn, K.M.; Ekins, S.; Yasuda, K.; Fukuda, Y.; Schuetz, J.D.; Mukherjee, K.; et al. Metabolomic and transcriptomic analysis reveals endogenous substrates and metabolic adaptation in rats lacking Abcg2 and Abcb1a transporters. PLoS ONE 2021, 16, e0253852.
41. Montanari, F.; Zdrazil, B.; Digles, D.; Ecker, G.F. Selectivity profiling of BCRP versus P-gp inhibition: From automated collection of polypharmacology data to multi-label learning. J. Cheminform. 2016, 8, 7.
42. Gantner, M.E.; Peroni, R.N.; Morales, J.F.; Villalba, M.L.; Ruiz, M.E.; Talevi, A. Development and Validation of a Computational Model Ensemble for the Early Detection of BCRP/ABCG2 Substrates during the Drug Design Stage. J. Chem. Inf. Model. 2017, 57, 1868–1880.
43. Jiang, D.; Lei, T.; Wang, Z.; Shen, C.; Cao, D.; Hou, T. ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning. J. Cheminform. 2020, 12, 16.
44. Hazai, E.; Hazai, I.; Ragueneau-Majlessi, I.; Chung, S.P.; Bikadi, Z.; Mao, Q. Predicting substrates of the human breast cancer resistance protein using a support vector machine method. BMC Bioinform. 2013, 14, 130.
45. Ding, Y.-L.; Shih, Y.-H.; Tsai, F.-Y.; Leong, M.K. In silico prediction of inhibition of promiscuous breast cancer resistance protein (BCRP/ABCG2). PLoS ONE 2014, 9, e90689.
46. Garg, P.; Dhakne, R.; Belekar, V. Role of breast cancer resistance protein (BCRP) as active efflux transporter on blood-brain barrier (BBB) permeability. Mol. Divers. 2015, 19, 163–172.
47. Kharangarh, S.; Sandhu, H.; Tangadpalliwar, S.; Garg, P. Predicting Inhibitors for Multidrug Resistance Associated Protein-2 Transporter by Machine Learning Approach. Comb. Chem. High Throughput Screen. 2018, 21, 557–566.
48. Lingineni, K.; Belekar, V.; Tangadpalliwar, S.R.; Garg, P. The role of multidrug resistance protein (MRP-1) as an active efflux transporter on blood-brain barrier (BBB) permeability. Mol. Divers. 2017, 21, 355–365.
49. McLoughlin, K.S.; Jeong, C.G.; Sweitzer, T.D.; Minnich, A.J.; Tse, M.J.; Bennion, B.J.; Allen, J.E.; Calad-Thomson, S.; Rush, T.S.; Brase, J.M. Machine Learning Models to Predict Inhibition of the Bile Salt Export Pump. J. Chem. Inf. Model. 2021, 61, 587–602.
50. Nigam, A.K.; Li, J.G.; Lall, K.; Shi, D.; Bush, K.T.; Bhatnagar, V.; Abagyan, R.; Nigam, S.K. Unique metabolite preferences of the drug transporters OAT1 and OAT3 analyzed by machine learning. J. Biol. Chem. 2020, 295, 1829–1842.
51. Nigam, A.K.; Ojha, A.A.; Li, J.G.; Shi, D.; Bhatnagar, V.; Nigam, K.B.; Abagyan, R.; Nigam, S.K. Molecular Properties of Drugs Handled by Kidney OATs and Liver OATPs Revealed by Chemoinformatics and Machine Learning: Implications for Kidney and Liver Disease. Pharmaceutics 2021, 13, 1720.
52. Tuerkova, A.; Bongers, B.J.; Norinder, U.; Ungvári, O.; Székely, V.; Tarnovskiy, A.; Szakács, G.; Özvegy-Laczka, C.; van Westen, G.J.P.; Zdrazil, B. Identifying Novel Inhibitors for Hepatic Organic Anion Transporting Polypeptides by Machine Learning-Based Virtual Screening. J. Chem. Inf. Model. 2022, 62, 6323–6335.
53. Lane, T.R.; Urbina, F.; Zhang, X.; Fye, M.; Gerlach, J.; Wright, S.H.; Ekins, S. Machine Learning Models Identify New Inhibitors for Human OATP1B1. Mol. Pharm. 2022, 19, 4320–4332.
54. Jensen, O.; Brockmöller, J.; Dücker, C. Identification of Novel High-Affinity Substrates of OCT1 Using Machine Learning-Guided Virtual Screening and Experimental Validation. J. Med. Chem. 2021, 64, 2762–2776.
55. Juliano, R.L.; Ling, V. A surface glycoprotein modulating drug permeability in Chinese hamster ovary cell mutants. Biochim. Biophys. Acta 1976, 455, 152–162.
56. Hyde, S.C.; Emsley, P.; Hartshorn, M.J.; Mimmack, M.M.; Gileadi, U.; Pearce, S.R.; Gallagher, M.P.; Gill, D.R.; Hubbard, R.E.; Higgins, C.F. Structural model of ATP-binding proteins associated with cystic fibrosis, multidrug resistance and bacterial transport. Nature 1990, 346, 362–365.
57. Dewanjee, S.; Dua, T.K.; Bhattacharjee, N.; Das, A.; Gangopadhyay, M.; Khanra, R.; Joardar, S.; Riaz, M.; Feo, V.D.; Zia-Ul-Haq, M. Natural Products as Alternative Choices for P-Glycoprotein (P-gp) Inhibition. Molecules 2017, 22, 871.
58. Constantinides, P.P.; Wasan, K.M. Lipid formulation strategies for enhancing intestinal transport and absorption of P-glycoprotein (P-gp) substrate drugs: In vitro/in vivo case studies. J. Pharm. Sci. 2007, 96, 235–248.
59. DeGorter, M.K.; Xia, C.Q.; Yang, J.J.; Kim, R.B. Drug transporters in drug efficacy and toxicity. Annu. Rev. Pharmacol. Toxicol. 2012, 52, 249–273.
60. Ueda, K.; Clark, D.P.; Chen, C.J.; Roninson, I.B.; Gottesman, M.M.; Pastan, I. The human multidrug resistance (mdr1) gene. cDNA cloning and transcription initiation. J. Biol. Chem. 1987, 262, 505–508.
61. Doyle, L.A.; Yang, W.; Abruzzo, L.V.; Krogmann, T.; Gao, Y.; Rishi, A.K.; Ross, D.D. A multidrug resistance transporter from human MCF-7 breast cancer cells. Proc. Natl. Acad. Sci. USA 1998, 95, 15665–15670.
62. Ni, Z.; Bikadi, Z.; Rosenberg, M.F.; Mao, Q. Structure and function of the human breast cancer resistance protein (BCRP/ABCG2). Curr. Drug Metab. 2010, 11, 603–617.
63. Begley, D.J. Delivery of therapeutic agents to the central nervous system: The problems and the possibilities. Pharmacol. Ther. 2004, 104, 29–45.
64. Sohail, M.I.; Dönmez-Cakil, Y.; Szöllősi, D.; Stockner, T.; Chiba, P. The Bile Salt Export Pump: Molecular Structure, Study Models and Small-Molecule Drugs for the Treatment of Inherited BSEP Deficiencies. Int. J. Mol. Sci. 2021, 22, 784.
65. Bongers, B.J.; Sijben, H.J.; Hartog, P.B.R.; Tarnovskiy, A.; Ijzerman, A.P.; Heitman, L.H.; van Westen, G.J.P. Proteochemometric Modeling Identifies Chemically Diverse Norepinephrine Transporter Inhibitors. J. Chem. Inf. Model. 2023, 63, 1745–1755.
66. Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B.; et al. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.
67. Mak, L.; Marcus, D.; Howlett, A.; Yarova, G.; Duchateau, G.; Klaffke, W.; Bender, A.; Glen, R.C. Metrabase: A cheminformatics and bioinformatics database for small molecule transporter data analysis and (Q)SAR modeling. J. Cheminform. 2015, 7, 31.
68. Montanari, F.; Knasmüller, B.; Kohlbacher, S.; Hillisch, C.; Baierová, C.; Grandits, M.; Ecker, G.F. Vienna LiverTox Workspace-A Set of Machine Learning Models for Prediction of Interactions Profiles of Small Molecules With Transporters Relevant for Regulatory Agencies. Front. Chem. 2019, 7, 899.
69. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem in 2021: New data content and improved web interfaces. Nucleic Acids Res. 2021, 49, D1388–D1395.
70. Yoshida, S.; Yamashita, F.; Ose, A.; Maeda, K.; Sugiyama, Y.; Hashida, M. Automated extraction of information on chemical-P-glycoprotein interactions from the literature. J. Chem. Inf. Model. 2013, 53, 2506–2510.
71. Morgan, R.E.; van Staden, C.J.; Chen, Y.; Kalyanaraman, N.; Kalanzi, J.; Dunn, R.T.; Afshari, C.A.; Hamadeh, H.K. A multifactorial approach to hepatobiliary transporter assessment enables improved therapeutic compound development. Toxicol. Sci. 2013, 136, 216–241.
72. Rodríguez-Pérez, R.; Gerebtzoff, G. Identification of bile salt export pump inhibitors using machine learning: Predictive safety from an industry perspective. Artif. Intell. Life Sci. 2021, 1, 100027.
73. Baidya, A.T.K.; Ghosh, K.; Amin, S.A.; Adhikari, N.; Nirmal, J.; Jha, T.; Gayen, S. In silico modelling, identification of crucial molecular fingerprints, and prediction of new possible substrates of human organic cationic transporters 1 and 2. New J. Chem. 2020, 44, 4129–4143.
74. Malani, M.; Hiremath, M.S.; Sharma, S.; Jhunjhunwala, M.; Gayen, S.; Hota, C.; Nirmal, J. Interaction of systemic drugs causing ocular toxicity with organic cation transporter: An artificial intelligence prediction. J. Biomol. Struct. Dyn. 2023, 1–12.
75. Sterling, T.; Irwin, J.J. ZINC 15 – Ligand Discovery for Everyone. J. Chem. Inf. Model. 2015, 55, 2324–2337.
76. Burggraaff, L.; Oranje, P.; Gouka, R.; van der Pijl, P.; Geldof, M.; van Vlijmen, H.W.T.; Ijzerman, A.P.; van Westen, G.J.P. Identification of novel small molecule inhibitors for solute carrier SGLT1 using proteochemometric modeling. J. Cheminform. 2019, 11, 15.

Figure 1. Expression of ABC and SLC transporters with major roles in drug efficacy or toxicity in human intestinal epithelia, hepatocytes, kidney proximal tubule epithelia, brain capillary endothelial cells, and choroid plexus epithelial cells.

Table 1. Application of machine learning methods in the investigation of drug transporters.

Family	Transporter	ML Methods	References
ABC	P-gp	RF	[35,36,37,38]
ABC	P-gp	NN	[35]
ABC	P-gp	SVM	[35,36,37,38,39]
ABC	P-gp	k-NN	[35,37,38]
ABC	P-gp	Bayes	[37,40]
ABC	P-gp	Logistic regression (LR)	[37,38]
ABC	P-gp	GTB	[36]
ABC	BCRP	RF	[37,38,41,42]
ABC	BCRP	Bayes	[37,40]
ABC	BCRP	DNN	[43]
ABC	BCRP	SVM	[37,38,39,41,44,45,46]
ABC	BCRP	k-NN	[37,38]
ABC	BCRP	XGBoost	[43]
ABC	BCRP	LR	[37,38,41]
ABC	MRPs	RF	[37,38,47]
ABC	MRPs	SVM	[37,38,39,47,48]
ABC	MRPs	Bayes	[37]
ABC	MRPs	k-NN	[37,38,47]
ABC	MRPs	LR	[37,38]
ABC	BSEP	RF	[38,49]
ABC	BSEP	SVM, LR, k-NN	[38]
SLC	OAT	RF	[50,51]
SLC	OAT	SVM	[39]
SLC	OAT	SNN, NB, k-NN, LR	[51]
SLC	OATP	RF	[52,53]
SLC	OATP	k-NN, LR	[37,38,51,53]
SLC	OATP	XGBoost, DL	[53]
SLC	OATP	SVM	[37,38,39]
SLC	OATP	SNN	[51]
SLC	OATP	Bayes	[37,51,53]
SLC	OCT	RF, Bayes, k-NN, LR	[37]
SLC	OCT	SVM	[37,39]
SLC	MATE1,2-K	SVM	[39]
SLC	NET	RF	[54]

Table 2. The source of the dataset.

Transport Protein	Data Sources	References
P-gp	Literature	[35]
P-gp	In-house dataset; ChEMBL [66]	[36]
P-gp, BCRP, MRPs, OATP, OCT	Metrabase [67] (http://www-metrabase.ch.cam.ac.uk, accessed on 25 July 2023); literature	[37]
P-gp, BCRP, MRPs, BSEP, OATP	LiverTox [68]; ChEMBL and PubChem [69]	[38]
P-gp, BCRP, MRPs, OAT, OATP, OCT, MATE1,2-K	Text-mining technique [70]; TP search (http://togodb.dbcls.jp/tpsearch, accessed on 25 July 2023); DIDB (http://www.druginteractioninfo.org/, accessed on 25 July 2023); PharmGKB (www.pharmgkb.org); TransPortal (http://dbts.ucsf.edu/fdatransportal, accessed on 25 July 2023); PubChem	[39]
P-gp, BCRP	Literature	[40]
BCRP	Literature; the Open PHACTS Discovery Platform	[41]
BCRP	Literature	[42,43,45,46]
BCRP	Literature; University of Washington Metabolism & Transport Drug Interaction Database (http://www.druginteractioninfo.org/, accessed on 25 July 2023); PubChem Database (http://pubchem.ncbi.nlm.nih.gov, accessed on 25 July 2023)	[44]
MRPs	Literature; Metrabase	[47]
MRPs	Literature; PubMed; TP search	[48]
BSEP	A proprietary BSEP assay dataset; published dataset [71]	[49]
BSEP	Training dataset	[72]
OAT	Literature; training dataset	[50]
OAT, OATP	PubChem	[51]
OATP	ChEMBL, UCSF-FDA TransPortal, DrugBank, Metrabase, IUPHAR	[52]
OATP	ChEMBL	[53]
OCT	Literature	[73]
OCT1	Training dataset	[74]
NET	ZINC database [75]; PubChem; literature	[54]
SGLT1	ChEMBL; the Spectrum Collection compound library	[76]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
