Article

A Robust Drug–Target Interaction Prediction Framework with Capsule Network and Transfer Learning

1 School of Medicine, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
2 Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, Longgang District, Shenzhen 518172, China
3 Institute of Bioinformatics and Systems Biology, Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Int. J. Mol. Sci. 2023, 24(18), 14061; https://doi.org/10.3390/ijms241814061
Submission received: 27 July 2023 / Revised: 27 August 2023 / Accepted: 28 August 2023 / Published: 14 September 2023
(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine)

Abstract:
Drug–target interactions (DTIs) are a crucial component of drug design and drug discovery. To date, many computational methods have been developed for DTI prediction, but they remain insufficiently accurate owing to the lack of experimentally verified negative datasets, imprecise molecular feature representations, and ineffective DTI classifiers. We address the limitation of randomly selecting negative DTI data from unknown drug–target pairs by establishing two experimentally validated datasets, and we propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships between drugs and targets. CapBM-DTI adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning, and a message-passing neural network (MPNN) for 2-D graph feature extraction from compounds, to accurately and robustly identify drug–target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods on four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. A case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.

1. Introduction

Drug–target interaction (DTI) refers to the recognition of an interaction between a drug and a protein target that may be implicated in human disease. Such binding typically alters the physicochemical characteristics of the target protein and causes it to malfunction. Discovering and understanding these interactions can not only help explain the mechanisms of many currently available medications, but can also be employed for drug repositioning and side-effect forecasting [1]. Therefore, DTI prediction is one of the most significant research areas in the drug development process [2]. Wet-lab experiments are the most reliable and effective method for determining whether a drug interacts with its target, but they are time-consuming and resource-intensive [3]. To alleviate this problem, several computational methods have been proposed to speed up the development of new drugs and reduce costs.
Structure-based methods and ligand-based methods are the two main existing computational approaches for identifying drug–target interactions [4]. In structure-based strategies such as molecular docking, the three-dimensional (3D) structures of proteins and chemical compounds are utilized to explore potential binding poses at the atomic level and identify binding affinities [5]. Nevertheless, the great computational difficulty of solving such 3D structures and the scarcity of small molecules and proteins with known 3D structures limit the scope of these approaches, even though they have yielded adequate biological interpretation and somewhat attractive predictive performance [6].
Ligand-based methods, including machine learning-based and deep learning-based methods, require fewer computational resources than structure-based methods because they rely only on one- or two-dimensional sequence information of proteins and chemical compounds [1,7]. A machine learning-based method extracts discriminative biological features for the chemical compound and target protein in a drug–target pair and feeds the extracted features into a model such as a random forest, logistic regression, support vector machine, or other kernel-based method to determine whether the drug and the target protein will interact. Wang et al. used the structure similarity profile of drugs and the sequence similarity profile of proteins to encode a given drug–protein pair and input the resulting feature vector into a support vector machine (SVM) to predict drug–protein interactions [8]. Tabei et al. employed an improved min-wise hashing algorithm to construct compact fingerprints for compound–protein pairs and adopted a linear SVM for large-scale prediction of compound–protein interactions [9]. Yu et al. extracted conserved patterns from proteins and their corresponding ligands and then constructed a systematic approach based on random forest (RF) and SVM to predict drug–target interactions [10].
Although machine learning-based methods are highly generalizable and sequence-sensitive (the arrangement and order of elements within the data sequences, such as drug SMILES strings and target protein sequences, affect the model's predictions), they are constrained by an excessive reliance on hand-crafted, expert knowledge-based features [1,11]. These restrictions can be overcome by deep learning techniques, namely end-to-end differentiable models: despite having many parameters, they can automatically learn the characteristics and invariances of the provided data and offer good generalization. For example, Öztürk et al. proposed a convolutional neural network-based method called DeepDTA to predict the binding affinities of protein–ligand interactions; it employs CNN blocks to learn representations from raw protein sequences and SMILES strings and combines these representations in a fully connected layer block [12]. MolTrans, built by Huang et al., adopts frequent consecutive sub-sequence (FCS) mining to extract fit-sized sub-structures for both proteins and drugs, which are further processed by an augmented transformer-embedding module to predict drug–target interactions [13]. Cheng et al. developed a bidirectional encoder–decoder structure named IIFDTI that extracts interactive features of substructures between drugs and targets and feeds them into fully connected dense layers for downstream DTI prediction [14]. Chatterjee et al. proposed AI-Bind to overcome the limitations of current deep learning models in predicting novel drug–target interactions. AI-Bind includes two layers corresponding to binding and non-binding annotations between proteins and ligands; positive and negative link probabilities are determined by entropy maximization and are used to estimate the conditional probability in transductive, semi-inductive, and inductive conditions [15]. Despite this progress, DTI prediction models still have limitations. For instance, some methods lack experimentally verified negative datasets [16], while others are restricted to specific datasets and are therefore less robust due to imprecise molecular feature representations and ineffective DTI classifiers [9]. Such limitations can result in overfitting, causing the models to adapt to specific cohorts of data and perform poorly on unseen drug–target pairs. Imprecise molecular feature representations can introduce missing or inaccurate information, adding noise and inconsistency to the model, and ineffective DTI classifiers may fail to capture the categorical properties of the input features, producing false-positive and false-negative predictions.
Taking into account the limitations of existing models, we constructed a novel, robust capsule-based prediction model to accurately identify DTIs on four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Briefly, we use the simplified molecular-input line-entry system (SMILES) string of a compound, a line notation that uses predefined rules to describe the structure of a compound sequentially, and the sequence of its target protein as input. Our model is made up of three major modules. The first module is a 2-dimensional (2-D) graph feature encoder, which converts the SMILES into an atomic graph and employs the message-passing neural network (MPNN) to extract the graph's high-order structures and semantic relations. Because drug molecules are inherently graph-structured, with atoms as nodes and bonds as edges, the atomic graph representation is natural and sensible. The second module uses ProtBert [17] to vectorize protein sequences, rather than encoding proteins by the physicochemical properties of amino acids and performing clustering. This approach maps each word (amino acids are considered words) to a latent vector space where geometric relationships can be used to characterize semantic relationships. The last module adds structures called "capsules" to better model hierarchical relationships. In this step, we apply an activation function called 'squash' and a process widely known as dynamic routing between capsules. The robust representations generated by this module are highly discriminative in distinguishing DTIs from non-DTIs. In comparison to former studies, our novelty is outlined as follows:
  • To overcome the limitation of other studies in which negative DTI data are randomly selected from unknown drug–target pairs, we established two experimentally validated datasets.
  • Protein sequences are treated as natural language and are vectorized by the state-of-the-art ProtBert model through transfer learning, while drug molecules are transformed by MPNN. Both encoding approaches represent protein targets and drug molecules more precisely.
  • The proposed capsule network-based DTI prediction model describes the internal hierarchical representation of features. It outperforms existing SOTA DTI prediction tools on four experimentally validated DTI datasets of varying sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets.

2. Results

2.1. Effectiveness of BERT Module, MPNN Module, and Capsule Network Module

In this study, seven models with different protein sequence features, drug features, and DTI decision-making modules were selected as baselines to investigate the effectiveness of the BERT module, MPNN module, and capsule network module in the proposed model (Figure 1E). Specifically, one-hot encoding and BERT were used for protein sequence feature extraction, while fingerprint and MPNN were employed for feature extraction of drug structure (SMILES). As for the DTI decision-making module, the dense layer and the capsule layer were adopted to differentiate the DTI class and the non-DTI class. Overall, there are seven baseline models for comparison; that is, BERT + MPNN + dense model, BERT + fingerprint + capsule model, BERT + fingerprint + dense model, one-hot + MPNN + capsule model, one-hot + MPNN + dense model, one-hot + fingerprint + capsule model, and one-hot + fingerprint + dense model.
The performance of the proposed CapBM-DTI model was compared with the above seven baseline models on Dataset 1 (chosen because it is a medium-sized expert-curated dataset, so its comparison results are representative); the results are shown in Figure 2 and Table S3. The data were randomly split into training and independent test sets at a ratio of 8:2: 13,302 positive and 9414 negative samples were randomly selected for model training, and the remaining data (3325 positive and 2354 negative samples) were used for external (independent) testing. The training and independent test samples did not overlap. The model built in this paper outperforms all baseline models in terms of accuracy and F1 score, achieving an accuracy of 89.3% and an F1 score of 90.1%. Although the other metrics (sensitivity, specificity, and precision) are not the best among the baseline models, the highest accuracy and F1 score suffice to show that the proposed model is the most effective and powerful: accuracy is the most widely used metric for evaluating model performance because it considers all correctly predicted cases, whether positive or negative, and the F1 score is a robust evaluation metric for many classification problems because it takes both false positives and false negatives into account.
To compare the proposed CapBM-DTI model with the baseline models more intuitively, ROC and PR curves were plotted, as illustrated in Figure 3 and Table S3. The model achieved the highest auROC value of 0.946 and auPRC value of 0.97 on the independent test set, showing stronger predictive ability than the baseline models in identifying DTIs. To compare the models graphically, t-distributed stochastic neighbor embedding (t-SNE) visualizations of the proposed model and the seven baseline models are shown in Figure S2.
Beyond the BERT layer and the MPNN layer effectively extracting the latent features of target proteins and drug molecules, respectively, an important reason why the CapBM-DTI model identifies DTIs well on the given dataset is the capsule layer, which extracts an internal hierarchical representation of the concatenated features of protein targets and drug molecules. To visualize the feature extraction process of each layer of the capsule network (PrimaryCap_Conv1D layer, PrimaryCap_Squash layer, capsule layer, and length layer), the features captured by each layer were displayed in 2D scatter space using t-SNE. In Figure 4, the yellow triangles and blue stars denote DTIs and non-DTIs, respectively. These images show that DTIs and non-DTIs become progressively easier to distinguish as successive layers of the capsule network process the features from the concatenate layer, which joins the target protein and drug molecule features from the BERT layer and the MPNN layer.

2.2. Performance and Generalization Comparison with Existing SOTA Predictors

To further demonstrate the power of the CapBM-DTI predictor, we compared it with several existing methods (Figure 1F). We selected four state-of-the-art (SOTA) deep learning models (i.e., DeepConv-DTI [18], CPI_prediction [19], TransformerCPI [20], and IIFDTI [14]) for comparison on the four datasets in Table 1 (training data:test data = 8:2 in each dataset). We chose these methods because they use experimentally validated datasets, making the comparison with our model fairer. Specifically, DeepConv-DTI uses experimentally validated independent test sets collected from PubChem and KinaseSARfari, while CPI_prediction, TransformerCPI, and IIFDTI use Dataset 3 and Dataset 4 to train and test their models. The comparative results (accuracy, F1 score, AUC, and AUPR scores obtained by these four predictors) are presented in Table 2, with the best results shown in bold. As shown in Table 2, CapBM-DTI obtains the best performance among all competing methods on all four datasets, except for the AUC score on Dataset 2, the AUPR score on Dataset 3, and the accuracy score on Dataset 4. Importantly, CapBM-DTI also exhibited excellent performance on the cross-species Dataset 4 (C. elegans), proving the strong generalization ability of the model.
To further prove the strong predictive performance and generalization ability of the model, we compared the proposed method with previous models under three settings (new compounds, new proteins, and new pairs; "new" means the test set contains items unseen in the training set) derived from Dataset 3 (chosen for subset splitting because it is a benchmark dataset for humans, making the split subsets more comparable). The detailed statistics and definitions of the new compound, new protein, and new pairs subsets are shown in Table 3. As shown in Table 4, the proposed model exhibits superior performance in all three tasks compared to previous predictors, except for the AUC score on the new compounds and new pairs subsets.
In summary, the proposed CapBM-DTI achieves competitive or better performance than the four SOTA predictors in all settings, proving its superior generalization ability and robust performance.

2.3. Feature Analysis

Discriminative features are the key to building a powerful computational classifier. Compared with other existing methods, the proposed CapBM-DTI has two main advantages: (i) it integrates BERT and MPNN to extract the features of protein targets and drug molecules more precisely, and (ii) it uses the capsule network to extract an internal hierarchical representation of the concatenated features of protein targets and drug molecules. The latent DTI-related features, constructed as 32 features in our capsule network layer, differ distinctly between the DTI and non-DTI classes. To highlight the discriminative power of the DTI-related features extracted by the proposed CapBM-DTI, we randomly selected 50 DTI samples and 50 non-DTI samples from each of the four datasets (Dataset 1, Dataset 2, Dataset 3, and Dataset 4) and performed a clustering analysis of the features on each dataset (Figure 1G). The clustering results are presented in Figure 5, from which we can easily see on each dataset that: (i) DTI and non-DTI samples are clustered into two distinct sub-trees, and (ii) samples of the DTI class or the non-DTI class tend to show similar values of the DTI-related features. These results demonstrate that the features extracted by the proposed CapBM-DTI can accurately capture DTI-related characteristics.

2.4. Case Study of Drug Repurposing to Treat COVID-19

Drug repurposing, also known as drug repositioning, is a strategy for identifying new uses for approved or investigational drugs outside their original medical indication. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for causing coronavirus disease 2019 (COVID-19). The virus infects host cells by attaching the receptor-binding domain of its spike glycoprotein to the angiotensin-converting enzyme 2 (ACE2) receptor of the host. Therefore, ACE2 is considered a potential target for the treatment of COVID-19 [22]. To identify potential repositioned drugs for COVID-19 treatment, this study used CapBM-DTI to predict whether the drugs in DrugBank (11,296 drugs in total) could bind to ACE2 and prevent SARS-CoV-2 from entering host cells, thus halting infection (Figure 1H). Since we have models trained on three human datasets (Dataset 1, Dataset 2, and Dataset 3), we needed to choose the model most suitable for predicting ACE2-binding drugs based on the amount of data and the balance of ACE2 BLASTP hits between the positive and negative DTI subsets. We chose the CapBM-DTI model trained on Dataset 2, not only because Dataset 2 has the largest amount of data (64,026 samples), but also because, according to the BLASTP results of ACE2 aligned to Dataset 2 (Table S5), ACE2 has hits in both the positive and negative DTI sets, so Dataset 2 is likely balanced with respect to ACE2 and thus more suitable for predicting ACE2-binding drugs. Dataset 1 (Table S4) and Dataset 3 (Table S6) were not adopted because of unbalanced BLASTP results and a much smaller amount of data (6728 samples), respectively. Importantly, six COVID-19 therapeutic drugs reported in the literature, shown in Table 5, were successfully predicted with high DTI probability (0.83505136–0.9957441). We also provide an additional 50 ACE2-binding drugs with very high probability (DTI probability > 0.9975) in Table S7, which have high potential value for clinical drug repurposing in treating COVID-19. In summary, this case study proves the powerful application value of CapBM-DTI in drug repurposing and virtual screening.

3. Discussion

To demonstrate the practicality of our model in drug development, we employed the treatment of COVID-19 as a case study, highlighting the model’s robust applicability and accuracy in drug repurposing and virtual screening. We also presented three case studies to showcase the model’s predictive performance on three key incremental problems in the drug development process: (a) predicting which compound molecules are “pan-assay interference” (PAIN) substances that bind promiscuously to many biochemical targets, producing extensive side effects or toxicity (Table S8); (b) accurately identifying a small subset of specific amino acids that are close to viable, druggable targets in protein sequences (binding region prediction) (Table S9); and (c) distinguishing binding–enhancing analogs from loss-of-potency analogs to refine activity and selectivity during lead optimization (Table S10). We believe that with the ongoing accumulation of drug-related data and the continual advancement of artificial intelligence algorithms, drug–target prediction models will play an increasingly pivotal role in the realm of drug development.

4. Materials and Methods

4.1. Experimentally Validated Datasets

Most supervised learning methods are restricted by the issue of negative sample selection due to the absence of experimentally validated non-DTI data. As a result, such approaches can only randomly select negative DTI data from unknown drug–target pairs of interest; however, these selected negative samples may include true positive DTIs, which severely affects the predictive performance and generalization ability of the model [21,26,27,28,29,30]. The most straightforward way to overcome this obstacle is to create experimentally validated benchmark datasets containing both positive and negative DTI data (Figure 1A).
We collected two DTI datasets: Dataset 1 (a medium-sized expert-curated dataset) from KinaseSARfari [31] and IUPHAR [32], and Dataset 2 (a large-sized BioAssay-based dataset) from the PubChem BioAssay database [33]. For Dataset 1, KinaseSARfari contains 404 compounds and 365 proteins, comprising a positive DTI dataset of 3969 pairs with the dissociation constant (cut-off: Kd < 10 μM [34]) and an experimentally validated negative dataset of 11,768 pairs. Moreover, we also obtained a positive DTI dataset of 13,643 pairs, consisting of 1541 compounds and 6218 proteins, from IUPHAR. The percentages of different types of target proteins in Dataset 1 are as follows (Table S1): kinase (3.1%), enzyme (13.7%), GPCR (18.3%), ion channel (5.6%), nuclear receptor (1.7%), transporter (1.4%), and others (56.2%). For Dataset 2, we collected "Active" DTIs from assays with the dissociation constant (Kd < 10 μM) as the positive DTI dataset. For the negative samples, we took the samples annotated as "Inactive" from the other assay types. Because the PubChem BioAssay database contains too many negative samples, we first collected only negative samples whose drug or target was included in the positive samples; second, we randomly selected as many negative samples as positive DTIs. In this way, a total of 64,026 positive and negative samples containing 14,737 drugs and 2709 proteins were generated. In Dataset 2, the proportions of distinct target protein categories are as follows (Table S1): kinase (5%), enzyme (45.8%), GPCR (9.2%), ion channel (0.7%), nuclear receptor (2%), transporter (0.3%), and others (37%).
In addition, we also selected two DTI datasets (i.e., Dataset 3 and Dataset 4), originally proposed by Liu et al. [21], as benchmark datasets because they are universally applicable and effective in drug discovery and development. The authors performed statistical tests on the protein (or compound) corresponding to each compound (or protein) in the negative samples; as a result, their datasets contain highly credible negative compound–protein pairs [19]. The detailed statistics, Venn diagram, and classification and percentages of target proteins of the four DTI datasets are shown in Table 1, Figure S1, and Table S1, respectively.

4.2. Framework of the Constructed Model

In this article, we construct a novel capsule network-based model for DTI prediction based on large-scale pre-trained bidirectional encoder representations from transformers (BERT) and a message-passing neural network (MPNN). An overview of the DTI model is given in Figure 1D; it has three modules: (a) the BERT-based protein sequence encoding module (Figure 1B), (b) the MPNN-based drug molecule encoding module (Figure 1C), and (c) the capsule network-based DTI decision-making module (Figure 6). First, in the protein sequence feature engineering module, we generate embedding vectors for protein sequences using the pre-trained ProtBert model, which was trained on UniRef100 data containing 216 million protein sequences; each protein is thus represented by a 1024-D vector (the dimensionality of the features extracted by ProtBert). After a fully connected layer and a batch normalization layer, the 1024-D vectors are transformed into 200-D vectors. Second, in the drug molecule feature engineering module, the structures of drug molecules are represented by 64-D vectors through the MPNN. Lastly, the 264-D vectors (a concatenation of the protein sequence features and drug features) are fed into the capsule network-based decision-making module to generate interaction information. The optimization module contains a cross-entropy loss. At the end of the model, we obtain an interaction score and a non-interaction score (generated by a length layer in the capsule network); the pair is predicted to interact if the interaction score is greater than the non-interaction score.
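The following is a minimal Keras sketch of this top-level wiring, assuming the protein and drug embeddings are precomputed; the softmax head is a placeholder standing in for the capsule decision module detailed in Section 4.2.3, and none of the layer names are the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Inputs: 1024-D ProtBert protein vector and 64-D MPNN drug vector
protein_in = layers.Input(shape=(1024,), name="protbert_feature")
drug_in = layers.Input(shape=(64,), name="mpnn_feature")

# 1024-D -> 200-D protein representation via a dense + batch-norm block
p = layers.Dense(200)(protein_in)
p = layers.BatchNormalization()(p)

# Concatenate into the 264-D joint representation fed to the decision module
joint = layers.Concatenate()([p, drug_in])

# Placeholder head standing in for the capsule network (Section 4.2.3)
scores = layers.Dense(2, activation="softmax")(joint)
model = tf.keras.Model(inputs=[protein_in, drug_in], outputs=scores)
```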
The model in this study was constructed with the TensorFlow library and the Keras framework (https://keras.io/, accessed on 15 August 2023). Overfitting was avoided by employing an early stopping strategy during model training: the training process was terminated when the accuracy did not improve within 20 epochs. The Adam optimizer was used, and the batch size and learning rate were set to 64 and 0.0001, respectively. We used grid search for hyperparameter tuning; the optimal parameters are shown in Table S2. NVIDIA A100 40 GB GPUs in a high-performance computing system were used throughout the training process.
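Continuing the sketch above, the reported training configuration could be expressed as follows; the X_*/y_* variable names are illustrative, and since the text mentions both a cross-entropy loss and a margin loss (Section 4.2.3), categorical cross-entropy is shown here.

```python
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# Adam optimizer, learning rate 1e-4, as reported in the text
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Early stopping: terminate when accuracy fails to improve for 20 epochs
early_stop = EarlyStopping(monitor="val_accuracy", patience=20,
                           restore_best_weights=True)

model.fit(
    [X_protein_train, X_drug_train], y_train,
    validation_data=([X_protein_val, X_drug_val], y_val),
    batch_size=64, epochs=500, callbacks=[early_stop],
)
```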

4.2.1. Feature Extraction from Protein

For protein feature extraction, numerous word-embedding techniques have recently been utilized [35], but these techniques map each word to a single vector, making the representation context-independent. In response to the exponential growth of textual data, the first fine-tuning-based representation model, bidirectional encoder representations from transformers (BERT) [36], can generate distinct representations for the same word based on context [36,37]. In particular, Elnaggar et al. released a model named ProtBert that was trained on the UniRef100 dataset containing 216 million protein sequences [17]. The structure of ProtBert is similar to that of the original BERT publication, and the BERT model includes some special encoding symbols, such as [CLS] and [SEP]. However, accurately predicting the complete 3D structure of a protein solely from its sequence using BERT-based models remains a complex and challenging problem [17,38,39,40,41]. While BERT-based models are not designed to provide high-resolution 3D structural predictions like AlphaFold [42], they have shown promise in capturing local structural features such as secondary structure (alpha-helices and beta-sheets), solvent accessibility, and other properties that can be inferred from sequence patterns [4,43]. In this study, we employ ProtBert, a large protein language model, to extract contextual features from protein sequences via transfer learning (Figure 1B). This approach may help capture the local structure of proteins and model the binding interactions between drugs and the protein's local structure.
We prepend the [CLS] token, which serves as an aggregate sequence representation and is typically utilized for sequence classification tasks in the BERT model, and append the [SEP] token, marked as $R_{L+1}$, at the end of the protein sequence. The protein can then be represented as a feature matrix $P_{BERT}$, in which every amino acid is converted to a 1024-dimensional vector $BERT_{R_i}$ from the final layer of ProtBert:
$$BERT_{R_i} = \left[ BERT_{R_i}^{1} \; BERT_{R_i}^{2} \; BERT_{R_i}^{3} \cdots BERT_{R_i}^{1024} \right] \tag{1}$$

$$P_{BERT} = \begin{bmatrix} BERT_{R_0}^{1} & \cdots & BERT_{R_0}^{1024} \\ \vdots & \ddots & \vdots \\ BERT_{R_{L+1}}^{1} & \cdots & BERT_{R_{L+1}}^{1024} \end{bmatrix} \tag{2}$$
Equation (2) shows that the size of $P_{BERT}$ varies with the protein length. To represent all protein sequences with the same dimensionality, the matrix was averaged (mean pooled) over the vertical axis, and the resulting 1024-dimensional vector is referred to as BERT_Mean:
$$BERT\_Mean^{n} = \frac{\sum_{i=0}^{L+1} BERT_{R_i}^{n}}{L+2} \quad (1 \le n \le 1024) \tag{3}$$

$$P_{BERT\_Mean} = \left[ BERT\_Mean^{1} \; BERT\_Mean^{2} \cdots BERT\_Mean^{1024} \right] \tag{4}$$
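A minimal sketch of extracting the 1024-D BERT_Mean vector of Equation (3) with the public Rostlab/prot_bert checkpoint is shown below; whether the authors used exactly this tokenization pipeline is an assumption, and the toy sequence is illustrative.

```python
import re
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert")
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"   # toy protein sequence
sequence = re.sub(r"[UZOB]", "X", sequence)       # map rare residues to X, per ProtTrans usage
spaced = " ".join(sequence)                       # ProtBert expects space-separated residues

inputs = tokenizer(spaced, return_tensors="pt")   # adds [CLS] and [SEP] automatically
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # shape (1, L+2, 1024)

# Mean-pool over all positions, including [CLS] and [SEP], as in Equation (3)
bert_mean = hidden.mean(dim=1).squeeze(0)         # shape (1024,)
```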

4.2.2. Feature Extraction from Drug Molecule

The message-passing neural network (MPNN) [44] is a generalized graph neural network that is well suited to extracting features from graph-structured data, and it has recently been used for molecular property prediction [45,46,47,48]. In the DTI prediction task, we employ an MPNN as the drug feature encoder for the atomic graph (Figure 1C).
The module takes a 2-D atomic graph $G = \{V, E\}$ as its input. Here, $V$ is the set of nodes, containing the atoms of the molecule, e.g., $V = \{C, H, O, \dots\}$, and $E$ is the set of edges, each labeled with one of four bond types:

$$E = \{ e_{v,w}^{type} \mid v, w \in V \}, \quad type \in \{ \text{single}, \text{double}, \text{triple}, \text{aromatic bond} \}$$
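A hedged sketch of building such an atomic graph from a SMILES string with RDKit follows; the exact atom and bond featurization used by the authors is an assumption.

```python
from rdkit import Chem

# The four bond types of E, mapped to integer labels
BOND_TYPES = {"SINGLE": 0, "DOUBLE": 1, "TRIPLE": 2, "AROMATIC": 3}

def smiles_to_graph(smiles):
    mol = Chem.MolFromSmiles(smiles)
    # Nodes V: one entry per atom with a few simple chemical attributes
    atoms = [
        {"symbol": a.GetSymbol(),
         "degree": a.GetDegree(),
         "num_h": a.GetTotalNumHs(),
         "is_aromatic": a.GetIsAromatic()}
        for a in mol.GetAtoms()
    ]
    # Edges E: (v, w, bond type) for each bond in the molecule
    bonds = [
        (b.GetBeginAtomIdx(), b.GetEndAtomIdx(), BOND_TYPES[str(b.GetBondType())])
        for b in mol.GetBonds()
    ]
    return atoms, bonds

atoms, bonds = smiles_to_graph("CC(=O)Oc1ccccc1C(=O)O")  # aspirin as an example
```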
Graph feature encoder is performed through two phases:
Phase 1: Message passing. Along the edges of the graph, nodes transmit their information in the form of message vectors to their neighbors, and each node updates its hidden features by aggregating the message vectors transmitted by its neighbors. After K rounds of message passing, each node has received information from neighbors up to K hops away, and its hidden features have been updated K times.
Phase 2: Readout. A readout function is used to combine the features of all nodes into a representation of the entire graph after the hidden features of all nodes are updated.

Message Passing

Each node is initialized with a fixed-size feature $h_v^{(0)} \in \mathbb{R}^r$ encoding the atom's chemical information (such as its type, valency, number of implicit hydrogens, number of electrons, hybridization type, and number of aromatic rings). The hidden features of the nodes are updated iteratively along the edges of the graph by passing information between neighboring nodes. Accordingly, we define the locally operated calculation step for the aggregated message vector $m_v^{(k)}$ as

$$m_v^{(k)} = Aggregation\left( \left\{ \left( h_v^{(k)}, h_w^{(k)}, e_{v,w} \right) \mid w \in N(v),\ e_{v,w} \in E \right\} \right)$$

where $Aggregation$ is a message-aggregating function, $N(v)$ is the self-included neighborhood of node $v$, and $e_{v,w}$ is the edge connecting nodes $v$ and $w$.
Notably, the message vector generated by a sender node is passed to its neighbor nodes along a specific type of edge. The receiving node aggregates the messages from its neighbors, including information about the neighbor nodes and the edges between them. Each node then updates its hidden features using its current hidden features $h_v^{(k)}$ and the aggregated message from its neighbors $m_v^{(k)}$, according to the following formula:

$$h_v^{(k+1)} = GRU\left( h_v^{(k)}, m_v^{(k)} \right)$$
As the update function, the gated recurrent unit (GRU) takes the most recent node state as input and updates it based on previous node states. To put it another way, the GRU gets its information from the most recent node state, while the memory state of the GRU contains information from previous nodes.
After K rounds of message passing, the hidden features of each node contain information from its neighbors up to K hops away.
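A minimal TensorFlow sketch of one such message-passing loop is given below: sum-aggregation of neighbor messages followed by a GRU cell update, as in the equations above. A single dense transform stands in for the per-bond-type edge network, so this is an illustrative reading of the text rather than the authors' exact implementation.

```python
import tensorflow as tf

class MessagePassing(tf.keras.layers.Layer):
    def __init__(self, units, steps=4):
        super().__init__()
        self.steps = steps
        self.message_fn = tf.keras.layers.Dense(units)  # stand-in for the edge network
        self.gru = tf.keras.layers.GRUCell(units)

    def call(self, node_states, edges):
        # node_states: (num_nodes, units); edges: (num_edges, 2) [sender, receiver]
        num_nodes = tf.shape(node_states)[0]
        h = node_states
        for _ in range(self.steps):
            # Messages sent by each sender node along its edges
            messages = self.message_fn(tf.gather(h, edges[:, 0]))
            # m_v^(k): sum the incoming messages at each receiver node
            m = tf.math.unsorted_segment_sum(messages, edges[:, 1],
                                             num_segments=num_nodes)
            # h_v^(k+1) = GRU(h_v^(k), m_v^(k))
            h, _ = self.gru(m, [h])
        return h
```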

Readout

When the message-passing procedure ends, the k-step-aggregated node states are partitioned into subgraphs (corresponding to each molecule in the batch) and subsequently reduced to graph-level embeddings. In this study, a transformer encoder followed by average pooling is used. Specifically (see the sketch after this list):
  • The k-step-aggregated node states will be partitioned into the subgraphs (corresponding to each molecule in the batch);
  • Each subgraph will then be padded to match the subgraph with the greatest number of nodes;
  • The (stacked, padded) tensor encoding the subgraphs (each containing a set of node states) is masked to make sure the padding does not interfere with training;
  • Finally, the tensor is passed to the transformer followed by average pooling.
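The sketch below illustrates this readout phase: per-molecule grouping via a ragged batch (equivalent to the partition, padding, and masking steps above), a self-attention block standing in for the transformer encoder, and average pooling over the real (unpadded) node positions. The layer structure is an assumption, not the authors' exact code.

```python
import tensorflow as tf

class Readout(tf.keras.layers.Layer):
    def __init__(self, units, num_heads=4):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=units)
        self.dense = tf.keras.layers.Dense(units, activation="relu")

    def call(self, node_states, molecule_ids, num_molecules):
        # Partition node states into per-molecule subgraphs, padded to the largest one
        ragged = tf.RaggedTensor.from_value_rowids(node_states, molecule_ids,
                                                   nrows=num_molecules)
        padded = ragged.to_tensor()                    # (mols, max_nodes, units)
        mask = tf.sequence_mask(ragged.row_lengths())  # True at real node positions
        # Self-attention with padding masked out, then a position-wise dense layer
        x = self.dense(self.mha(padded, padded, attention_mask=mask[:, None, :]))
        # Average pooling over unpadded positions -> one embedding per molecule
        m = tf.cast(mask, x.dtype)[..., None]
        return tf.reduce_sum(x * m, axis=1) / tf.reduce_sum(m, axis=1)
```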

4.2.3. Capsule Network

CNNs have significantly outperformed many conventional curated feature extraction models over the past few years, achieving breakthroughs in numerous fields such as computer vision [49] and bioinformatics [18,50,51,52,53]. Despite these successes, CNNs are limited by their inability to learn spatial relationships between features and by the invariance caused by pooling operations [54]. Sabour et al. proposed a novel deep learning architecture called the capsule network (CapsNet) to address these issues [55,56,57]. The fundamental component of capsule networks is the capsule, a collection of neurons arranged as a vector. The capsule layer consists of a primary capsule layer and a class capsule layer. In contrast to conventional neural networks, capsule networks use vectors rather than scalars as inputs and outputs. Each vector (capsule) represents a type of pattern (values for feature properties such as velocity, size, orientation, and color), while the orientation of the capsule denotes the characteristics of that pattern.
A Conv1D located in the primary capsule layer is utilized for additional feature extraction. The outputs of the Conv1D are reshaped into multiple 8-dimensional vectors (the dimension is a hyper-parameter). According to the idea of capsule networks, the length of a vector indicates the probability that the corresponding pattern is present. These 8-dimensional vectors are treated with squash, a non-linear function that does not change the direction of a vector but compresses its length to a value between 0 and 1:
$$squash(s) = \frac{\lVert s \rVert^2}{1 + \lVert s \rVert^2} \cdot \frac{s}{\lVert s \rVert}$$

so that $squash(s) \approx \lVert s \rVert \, s$ when $\lVert s \rVert$ is small, and $squash(s) \approx \frac{s}{\lVert s \rVert}$ when $\lVert s \rVert$ is large.
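The squash non-linearity translates directly into a few lines of TensorFlow; the epsilon term, added here to guard against division by zero for all-zero vectors, is a common implementation detail rather than part of the formula above.

```python
import tensorflow as tf

def squash(s, axis=-1, epsilon=1e-7):
    # ||s||^2 along the capsule dimension
    squared_norm = tf.reduce_sum(tf.square(s), axis=axis, keepdims=True)
    norm = tf.sqrt(squared_norm + epsilon)
    # Preserve direction, compress length into (0, 1)
    return (squared_norm / (1.0 + squared_norm)) * (s / norm)
```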
The class capsule layer contains two 16-dimensional vectors for this binary classification task: a positive capsule and a negative capsule. Each capsule in this layer represents one class for the input concatenated protein sequence and drug structure features. The computation between the primary and class capsule layers is depicted in Figure 6. To obtain the prediction vector $\hat{u}_{j|i}$ from capsule $i$ to capsule $j$, the output $u_i$ of the primary capsule layer is first multiplied by a learnable weight matrix $W_{ij}$. The weighted sum $S_j$ of all $\hat{u}_{j|i}$ is determined by the following equations:
$$\hat{u}_{j|i} = W_{ij} u_i$$

$$S_j = \sum_{i=1}^{L} c_{ij} \hat{u}_{j|i}$$
where the $c_{ij}$ are coupling coefficients determined by the dynamic routing process, which compares the prediction $\hat{u}_{j|i}$ with the actual output $v_j$; they reflect the probabilities that primary capsules should be coupled to the class capsules activated by the specific input target sequence and drug structure. $L$ denotes the number of primary capsules.
Algorithm 1 demonstrates the complete process of dynamic routing. The scalar product $b_{ij} = \hat{u}_{j|i}^{T} v_j$ is the log prior probability between primary capsule $i$ and class capsule $j$, and $c_{ij}$ is calculated as a softmax over $b_{ij}$; therefore, the coupling coefficients of primary capsule $i$ over the class capsules sum to one, $\sum_{k=1}^{N} c_{ik} = 1$, where $N$ denotes the number of class capsules. The number of routing iterations $r$ is a hyper-parameter assigned in advance.
Algorithm 1: Dynamic Routing
Input: $\hat{u}_{j|i}$, $r$, and $l$
Output: $v_j$
  • for all capsules $i$ (primary capsules) in layer $l$ and capsules $j$ (class capsules) in layer $(l+1)$: $b_{ij}$ ⇐ 0
  • for $r$ iterations do
  • for all capsules $i$ in layer $l$: $c_i$ ⇐ softmax($b_i$)
  • for all capsules $j$ in layer $(l+1)$: $s_j$ ⇐ $\sum_i c_{ij} \hat{u}_{j|i}$
  • for all capsules $j$ in layer $(l+1)$: $v_j$ ⇐ squash($s_j$)
  • for all capsules $i$ in layer $l$ and capsules $j$ in layer $(l+1)$: $b_{ij}$ ⇐ $b_{ij} + \hat{u}_{j|i} \cdot v_j$
  • end for
  • return $v_j$
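A compact NumPy sketch of Algorithm 1 is shown below, where u_hat has shape (num_primary, num_class, dim) and holds the prediction vectors $\hat{u}_{j|i}$; shapes and helper names are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squash_np(s, axis=-1, eps=1e-7):
    sq = (s ** 2).sum(axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, r=3):
    num_primary, num_class, _ = u_hat.shape
    b = np.zeros((num_primary, num_class))       # b_ij <- 0
    for _ in range(r):
        c = softmax(b, axis=1)                   # c_i <- softmax(b_i) over class capsules
        s = (c[..., None] * u_hat).sum(axis=0)   # s_j <- sum_i c_ij * u_hat_{j|i}
        v = squash_np(s)                         # v_j <- squash(s_j)
        b = b + (u_hat * v[None]).sum(axis=-1)   # b_ij <- b_ij + u_hat_{j|i} . v_j
    return v                                     # (num_class, dim)
```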
During dynamic routing, the class capsule layer generates two output vectors $v_j$, whose elements encode features and whose length (L2 norm) denotes the probability of the corresponding class, DTI or non-DTI. The final part of the network therefore computes this length:
$$p_j = \lVert v_j \rVert_2$$
Finally, this study utilizes the margin loss introduced by Sabour et al. [56], which is calculated as follows for DTI and non-DTI classes:
$$L_c = I_c \max(0, m^+ - \lVert v_c \rVert_2)^2 + \lambda (1 - I_c) \max(0, \lVert v_c \rVert_2 - m^-)^2$$
where $c$ and $I$ denote the classification category and the indicator function, respectively. In particular, $I_c = 1$ if the current sample belongs to class $c$; otherwise, $I_c = 0$. $m^+$, $m^-$, and $\lambda$ are hyper-parameters of this function, for which we use the suggested values 0.9, 0.1, and 0.5, respectively.
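The margin loss above maps directly to TensorFlow as follows, with the suggested values $m^+ = 0.9$, $m^- = 0.1$, and $\lambda = 0.5$; here y_true is the one-hot indicator $I_c$ and lengths holds $\lVert v_c \rVert_2$ for each class capsule.

```python
import tensorflow as tf

def margin_loss(y_true, lengths, m_plus=0.9, m_minus=0.1, lam=0.5):
    # Penalize short positive-class capsules...
    present = y_true * tf.square(tf.maximum(0.0, m_plus - lengths))
    # ...and long capsules for the absent class, down-weighted by lambda
    absent = lam * (1.0 - y_true) * tf.square(tf.maximum(0.0, lengths - m_minus))
    return tf.reduce_mean(tf.reduce_sum(present + absent, axis=-1))
```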

4.3. Performance Metrics

Determining whether a pair is an interactive or non-interactive drug–target pair is a single-label classification task. Metrics such as sensitivity (Sen.), specificity (Spe.), precision (Pre.), accuracy (Acc.), and the F1 measure (F1) are frequently used. The specific formulas are as follows:
$$Sen. = \frac{TP}{P} \qquad Spe. = \frac{TN}{N} \qquad Pre. = \frac{TP}{TP + FP} \qquad Acc. = \frac{TP + TN}{P + N} \qquad F1 = \frac{2 \cdot Sen \cdot Pre}{Sen + Pre}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively, and P and N are the total numbers of positive and negative samples.
Additionally, the precision–recall (PR) curve and the receiver operating characteristic (ROC) curve are frequently utilized to intuitively assess a predictor's overall predictive performance. We therefore also evaluated predictive performance by calculating the area under the ROC curve (auROC) and the area under the PR curve (auPRC). The auROC ranges from 0.5 (random guessing) to 1 (perfect prediction); for both metrics, higher values indicate better model performance.
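All of these metrics can be computed with scikit-learn as in the sketch below, where y_true holds the binary labels and y_score the predicted interaction probabilities (both illustrative NumPy arrays); y_pred thresholds the scores at 0.5.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, average_precision_score)

y_pred = (y_score >= 0.5).astype(int)
sen = recall_score(y_true, y_pred)                # sensitivity = TP / P
spe = recall_score(y_true, y_pred, pos_label=0)   # specificity = TN / N
pre = precision_score(y_true, y_pred)             # TP / (TP + FP)
acc = accuracy_score(y_true, y_pred)              # (TP + TN) / (P + N)
f1 = f1_score(y_true, y_pred)                     # 2 * Sen * Pre / (Sen + Pre)
auroc = roc_auc_score(y_true, y_score)            # area under the ROC curve
auprc = average_precision_score(y_true, y_score)  # area under the PR curve
```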

5. Conclusions

In this study, we developed a potent and robust capsule-based drug–target interaction prediction framework named CapBM-DTI based on drug structures and protein sequences. To our knowledge, this study is the first to use a capsule network to classify drug–target interactions. Specifically, we used pre-trained BERT to represent proteins through transfer learning, mapping each word (amino acids are considered words) to a latent vector space where geometric relationships can be used to characterize semantic relationships. Additionally, we convert SMILES into an atomic 2D graph to represent each drug and employ the message-passing neural network (MPNN) to extract the graph's high-order structures and semantic relations. Lastly, we adopted a capsule network, which describes the internal hierarchical representation of features, as a classifier to differentiate DTIs from non-DTIs. Importantly, we constructed two experimentally validated datasets to overcome the drawback that negative DTI data in other studies are picked at random from unidentified drug–target pairs. Furthermore, CapBM-DTI achieved robust performance and strong generalizability compared to SOTA methods on four experimentally validated DTI datasets with varying amounts of data, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets. The case studies demonstrate the applicability of the model in virtual screening, drug repositioning, and drug development. Overall, CapBM-DTI is a robust and accurate DTI prediction framework that offers significant potential and scope for drug target identification, virtual screening, drug repurposing, and drug development. The source code and data are freely available at https://github.com/huangyixian666/CapBM-DTI (accessed on 15 August 2023).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms241814061/s1. References [58,59,60,61] are cited in the supplementary materials.

Author Contributions

Conceptualization, Y.H., H.-Y.H. and H.-D.H.; data curation, Y.H., Y.C. (Yigang Chen), T.L. and J.L.; funding acquisition, H.-D.H.; methodology, Y.H. and L.Y.; project administration, H.-D.H.; supervision, H.-Y.H. and H.-D.H.; validation, Y.H.; writing—original draft, Y.H. and Y.C. (Yuan Chang); writing—review and editing, Y.H., H.-Y.H., Y.-C.-D.L., Y.Z., Z.Z., K.M., Y.-N.C., T.-Y.L. and H.-D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the National Natural Science Foundation of China (No. 32070674); Shenzhen Science and Technology Program [JCYJ20220530143615035]; the Warshel Institute for Computational Biology funding from Shenzhen City and Longgang District; Shenzhen-Hong Kong Cooperation Zone for Technology and Innovation (HZQB-KCZYB-2020056, P2-2022-HDH-001-A); Guangdong Young Scholar Development Fund of Shenzhen Ganghong Group Co., Ltd. (2021E0005, 2022E0035); Key Program of Guangdong Basic and Applied Basic Research Fund (Guangdong–Shenzhen Joint Fund) [2020B1515120069].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

CapBM-DTI and datasets of this study are available at https://github.com/huangyixian666/CapBM-DTI (accessed on 15 August 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sachdev, K.; Gupta, M.K. A comprehensive review of feature based methods for drug target interaction prediction. J. Biomed. Inform. 2019, 93, 103159. [Google Scholar] [CrossRef]
  2. Yamanishi, Y.; Kotera, M.; Moriya, Y.; Sawada, R.; Kanehisa, M.; Goto, S. DINIES: Drug-target interaction network inference engine based on supervised analysis. Nucleic Acids Res. 2014, 42, W39–W45. [Google Scholar] [CrossRef] [PubMed]
  3. Bagherian, M.; Sabeti, E.; Wang, K.; Sartor, M.A.; Nikolovska-Coleska, Z.; Najarian, K. Machine learning approaches and databases for prediction of drug–target interaction: A survey paper. Brief. Bioinform. 2021, 22, 247–269. [Google Scholar] [CrossRef] [PubMed]
  4. Zheng, J.; Xiao, X.; Qiu, W.-R. DTI-BERT: Identifying drug-target interactions in cellular networking based on BERT and deep learning method. Front. Genet. 2022, 13, 859188. [Google Scholar] [CrossRef]
  5. Ferreira, L.G.; Dos Santos, R.N.; Oliva, G.; Andricopulo, A.D. Molecular docking and structure-based drug design strategies. Molecules 2015, 20, 13384–13421. [Google Scholar] [CrossRef]
  6. Sethi, A.; Joshi, K.; Sasikala, K.; Alvala, M. Molecular docking in modern drug discovery: Principles and recent applications. Drug Discov. Dev.-New Adv. 2019, 2, 27–47. [Google Scholar]
  7. He, Z.; Zhang, J.; Shi, X.H.; Hu, L.L.; Kong, X.; Cai, Y.D.; Chou, K.C. Predicting drug-target interaction networks based on functional groups and biological features. PLoS ONE 2010, 5, e9603. [Google Scholar] [CrossRef] [PubMed]
  8. Wang, Y.-C.; Yang, Z.-X.; Wang, Y.; Deng, N.-Y. Computationally probing drug-protein interactions via support vector machine. Lett. Drug Des. Discov. 2010, 7, 370–378. [Google Scholar] [CrossRef]
  9. Tabei, Y.; Yamanishi, Y. Scalable prediction of compound-protein interactions using minwise hashing. BMC Syst. Biol. 2013, 7 (Suppl. S6), S3. [Google Scholar] [CrossRef]
  10. Yu, H.; Chen, J.; Xu, X.; Li, Y.; Zhao, H.; Fang, Y.; Li, X.; Zhou, W.; Wang, W.; Wang, Y. A systematic prediction of multiple drug-target interactions from chemical, genomic, and pharmacological data. PLoS ONE 2012, 7, e37608. [Google Scholar] [CrossRef]
  11. Sawada, R.; Kotera, M.; Yamanishi, Y. Benchmarking a Wide Range of Chemical Descriptors for Drug-Target Interaction Prediction Using a Chemogenomic Approach. Mol. Inform. 2014, 33, 719–731. [Google Scholar] [CrossRef] [PubMed]
  12. Ozturk, H.; Ozgur, A.; Ozkirimli, E. DeepDTA: Deep drug-target binding affinity prediction. Bioinformatics 2018, 34, i821–i829. [Google Scholar] [CrossRef] [PubMed]
  13. Huang, K.; Xiao, C.; Glass, L.M.; Sun, J. MolTrans: Molecular Interaction Transformer for drug-target interaction prediction. Bioinformatics 2021, 37, 830–836. [Google Scholar] [CrossRef]
  14. Cheng, Z.; Zhao, Q.; Li, Y.; Wang, J. IIFDTI: Predicting drug-target interactions through interactive and independent features based on attention mechanism. Bioinformatics 2022, 38, 4153–4161. [Google Scholar] [CrossRef] [PubMed]
  15. Chatterjee, A.; Walters, R.; Shafi, Z.; Ahmed, O.S.; Sebek, M.; Gysi, D.; Yu, R.; Eliassi-Rad, T.; Barabási, A.-L.; Menichetti, G. Improving the generalizability of protein-ligand binding predictions with AI-Bind. Nat. Commun. 2023, 14, 1989. [Google Scholar] [CrossRef]
  16. You, J.; McLeod, R.D.; Hu, P. Predicting drug-target interaction network using deep learning model. Comput. Biol. Chem. 2019, 80, 90–101. [Google Scholar] [CrossRef]
  17. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rihawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M. ProtTrans: Towards cracking the language of Life’s code through self-supervised deep learning and high performance computing. arXiv 2020, arXiv:2007.06225. [Google Scholar]
  18. Lee, I.; Keum, J.; Nam, H. DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences. PLoS Comput. Biol. 2019, 15, e1007129. [Google Scholar] [CrossRef]
  19. Tsubaki, M.; Tomii, K.; Sese, J. Compound–protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 2019, 35, 309–318. [Google Scholar] [CrossRef]
  20. Chen, L.; Tan, X.; Wang, D.; Zhong, F.; Liu, X.; Yang, T.; Luo, X.; Chen, K.; Jiang, H.; Zheng, M. TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics 2020, 36, 4406–4414. [Google Scholar] [CrossRef]
  21. Liu, H.; Sun, J.; Guan, J.; Zheng, J.; Zhou, S. Improving compound–protein interaction prediction by building up highly credible negative samples. Bioinformatics 2015, 31, i221–i229. [Google Scholar] [CrossRef]
  22. Mulling, N.; Rohn, H. Angiotensin-converting enzyme 2 (ACE2): Role in the pathogenesis of diseases outside of COVID-19. Der nephrologe 2021, 16, 185–188. [Google Scholar] [CrossRef]
  23. Inoue, Y.; Tanaka, N.; Tanaka, Y.; Inoue, S.; Morita, K.; Zhuang, M.; Hattori, T.; Sugamura, K. Clathrin-dependent entry of severe acute respiratory syndrome coronavirus into target cells expressing ACE2 with the cytoplasmic tail deleted. J. Virol. 2007, 81, 8722–8729. [Google Scholar] [CrossRef]
  24. Touret, F.; Gilles, M.; Barral, K.; Nougairède, A.; Van Helden, J.; Decroly, E.; De Lamballerie, X.; Coutard, B. In vitro screening of a FDA approved chemical library reveals potential inhibitors of SARS-CoV-2 replication. Sci. Rep. 2020, 10, 13093. [Google Scholar] [CrossRef]
  25. Hoffmann, M.; Hofmann-Winkler, H.; Smith, J.C.; Krüger, N.; Arora, P.; Sørensen, L.K.; Søgaard, O.S.; Hasselstrøm, J.B.; Winkler, M.; Hempel, T. Camostat mesylate inhibits SARS-CoV-2 activation by TMPRSS2-related proteases and its metabolite GBPA exerts antiviral activity. EBioMedicine 2021, 65, 103255. [Google Scholar] [CrossRef]
  26. Chen, X.; Yan, C.C.; Zhang, X.; Zhang, X.; Dai, F.; Yin, J.; Zhang, Y. Drug–target interaction prediction: Databases, web servers and computational models. Brief. Bioinform. 2016, 17, 696–712. [Google Scholar] [CrossRef] [PubMed]
  27. Gönen, M. Predicting drug–target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 2012, 28, 2304–2310. [Google Scholar] [CrossRef] [PubMed]
  28. Peng, L.; Zhu, W.; Liao, B.; Duan, Y.; Chen, M.; Chen, Y.; Yang, J. Screening drug-target interactions with positive-unlabeled learning. Sci. Rep. 2017, 7, 8087. [Google Scholar] [CrossRef]
  29. Ezzat, A.; Wu, M.; Li, X.-L.; Kwoh, C.-K. Computational prediction of drug–target interactions using chemogenomic approaches: An empirical survey. Brief. Bioinform. 2019, 20, 1337–1357. [Google Scholar] [CrossRef] [PubMed]
  30. Chen, R.; Liu, X.; Jin, S.; Lin, J.; Liu, J. Machine learning for drug-target interaction prediction. Molecules 2018, 23, 2208. [Google Scholar] [CrossRef]
  31. Gaulton, A.; Bellis, L.J.; Bento, A.P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; Al-Lazikani, B. ChEMBL: A large-scale bioactivity database for drug discovery. Nucleic Acids Res. 2012, 40, D1100–D1107. [Google Scholar] [CrossRef] [PubMed]
  32. Harmar, A.J.; Hills, R.A.; Rosser, E.M.; Jones, M.; Buneman, O.P.; Dunbar, D.R.; Greenhill, S.D.; Hale, V.A.; Sharman, J.L.; Bonner, T.I. IUPHAR-DB: The IUPHAR database of G protein-coupled receptors and ion channels. Nucleic Acids Res. 2009, 37 (Suppl. S1), D680–D685. [Google Scholar] [CrossRef]
  33. Kim, S.; Chen, J.; Cheng, T.; Gindulyte, A.; He, J.; He, S.; Li, Q.; Shoemaker, B.A.; Thiessen, P.A.; Yu, B.; et al. PubChem 2023 update. Nucleic Acids Res. 2023, 51, D1373–D1380. [Google Scholar] [CrossRef]
  34. Niijima, S.; Shiraishi, A.; Okuno, Y. Dissecting kinase profiling data to predict activity and understand cross-reactivity of kinase inhibitors. J. Chem. Inf. Model. 2012, 52, 901–912. [Google Scholar] [CrossRef] [PubMed]
  35. Zheng, J.; Xiao, X.; Qiu, W.-R. iCDI-W2vCom: Identifying the Ion channel–Drug interaction in cellular networking based on word2vec and node2vec. Front. Genet. 2021, 12, 738274. [Google Scholar] [CrossRef]
  36. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  37. Bianchi, F.; Terragni, S.; Hovy, D. Pre-training is a hot topic: Contextualized document embeddings improve topic coherence. arXiv 2020, arXiv:2004.03974. [Google Scholar]
  38. Hu, B.; Xia, J.; Zheng, J.; Tan, C.; Huang, Y.; Xu, Y.; Li, S.Z. Protein language models and structure prediction: Connection and progression. arXiv 2022, arXiv:2211.16742. [Google Scholar]
  39. Brandes, N.; Ofer, D.; Peleg, Y.; Rappoport, N.; Linial, M. ProteinBERT: A universal deep-learning model of protein sequence and function. Bioinformatics 2022, 38, 2102–2110. [Google Scholar] [CrossRef]
  40. Dumortier, B.; Liutkus, A.; Carré, C.; Krouk, G. PeTriBERT: Augmenting BERT with tridimensional encoding for inverse protein folding and design. bioRxiv 2022, 2022, 503344. [Google Scholar]
  41. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
  42. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Žídek, A.; Potapenko, A. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  43. Chandra, A.; Tünnermann, L.; Löfstedt, T.; Gratz, R. Transformer-based deep learning for predicting protein properties in the life sciences. eLife 2023, 12, e82819. [Google Scholar] [CrossRef] [PubMed]
  44. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural Message Passing for Quantum Chemistry. In Proceedings of the International Conference on Machine Learning 2017, Sydney, Australia, 6–11 August 2017; PMLR: Cambridge, UK, 2017; pp. 1263–1272. [Google Scholar]
  45. Withnall, M.; Lindelöf, E.; Engkvist, O.; Chen, H. Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction. J. Cheminform. 2020, 12, 1. [Google Scholar] [CrossRef]
  46. Jo, J.; Kwak, B.; Choi, H.-S.; Yoon, S. The message passing neural networks for chemical property prediction on SMILES. Methods 2020, 179, 65–72. [Google Scholar] [CrossRef]
  47. Wang, Z.; Liu, M.; Luo, Y.; Xu, Z.; Xie, Y.; Wang, L.; Cai, L.; Qi, Q.; Yuan, Z.; Yang, T. Advanced graph and sequence neural networks for molecular property prediction and drug discovery. Bioinformatics 2022, 38, 2579–2586. [Google Scholar] [CrossRef]
  48. Datta, R.; Das, D.; Das, S. Efficient lipophilicity prediction of molecules employing deep-learning models. Chemom. Intell. Lab. Syst. 2021, 213, 104309. [Google Scholar] [CrossRef]
  49. Lu, L.; Yi, Y.; Huang, F.; Wang, K.; Wang, Q. Integrating local CNN and global CNN for script identification in natural scene images. IEEE Access 2019, 7, 52669–52679. [Google Scholar] [CrossRef]
  50. Cao, X.; He, W.; Chen, Z.; Li, Y.; Wang, K.; Zhang, H.; Wei, L.; Cui, L.; Su, R.; Wei, L. PSSP-MVIRT: Peptide secondary structure prediction based on a multi-view deep learning architecture. Brief. Bioinform. 2021, 22, bbab203. [Google Scholar] [CrossRef]
  51. Khanal, J.; Nazari, I.; Tayara, H.; Chong, K.T. 4mCCNN: Identification of N4-methylcytosine sites in prokaryotes using convolutional neural network. IEEE Access 2019, 7, 145455–145461. [Google Scholar] [CrossRef]
  52. Khanal, J.; Tayara, H.; Chong, K.T. Identifying enhancers and their strength by the integration of word embedding and convolution neural network. IEEE Access 2020, 8, 58369–58376. [Google Scholar] [CrossRef]
  53. Luo, X.; Kang, X.; Schönhuth, A. Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks. Nat. Mach. Intell. 2023, 5, 114–125. [Google Scholar] [CrossRef]
  54. Ali, S.D.; Kim, J.H.; Tayara, H.; Chong, K.T. Prediction of RNA 5-hydroxymethylcytosine modifications using deep learning. IEEE Access 2021, 9, 8491–8496. [Google Scholar] [CrossRef]
  55. LaLonde, R.; Bagci, U. Capsules for object segmentation. arXiv 2018, arXiv:1804.04241. [Google Scholar]
  56. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. Adv. Neural Inf. Process. Syst. 2017, 30, 3859–3869. [Google Scholar]
  57. Hinton, G.E.; Sabour, S.; Frosst, N. Matrix capsules with EM routing. In Proceedings of the International Conference on Learning Representations 2018, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  58. Baell, J.B. Feeling nature’s PAINS: Natural products, natural product drugs, and pan assay interference compounds (PAINS). J. Nat. Prod. 2016, 79, 616–628. [Google Scholar] [CrossRef]
  59. Rimassa, L. Drugs in development for hepatocellular carcinoma. Gastroenterol. Hepatol. 2018, 14, 542. [Google Scholar]
  60. Lee, I.; Nam, H. Sequence-based prediction of protein binding regions and drug–target interactions. J. Cheminform. 2022, 14, 5. [Google Scholar] [CrossRef]
  61. Verhasselt, S.; Roman, B.I.; Bracke, M.E.; Stevens, C.V. Improved synthesis and comparative analysis of the tool properties of new and existing D-ring modified (S)-blebbistatin analogs. Eur. J. Med. Chem. 2017, 136, 85–103. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The schematic of this study. (A) Construction of two experimentally validated DTI datasets. (B) The principles of feature extraction from protein sequences using bidirectional encoder representations from transformers (BERT). (C) The principles of feature extraction from drug molecules using the message-passing neural network (MPNN). (D) Overview of the proposed model (CapBM-DTI). It comprises three modules: (a) a BERT-based protein sequence encoding module, (b) an MPNN-based drug molecule encoding module, and (c) a capsule network-based DTI decision-making module. (E) Effectiveness of the BERT, MPNN, and capsule network modules, assessed against seven baseline models with different protein sequence features, drug molecule features, and DTI decision-making modules. (F) Comparison with previous state-of-the-art methods across datasets (human and worm) and settings (new compounds, new proteins, and new pairs). (G) Feature analysis of the DTI-related features extracted by CapBM-DTI. (H) Case study of drug repurposing to treat COVID-19.
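For readers who prefer code to diagrams, the following minimal PyTorch sketch mirrors the three-module composition in panel (D). The encoders and the capsule head are plain linear stand-ins for the pre-trained BERT, the MPNN, and the dynamic-routing module; all names and dimensions are illustrative, not the authors' implementation.

```python
# Minimal, runnable sketch of the CapBM-DTI three-module flow (panel D).
# Both encoders and the capsule head are stand-in linear layers so the
# composition runs end to end; dimensions are illustrative.
import torch
import torch.nn as nn

class CapBMDTISketch(nn.Module):
    def __init__(self, prot_dim=768, drug_dim=300, caps_dim=16, n_classes=2):
        super().__init__()
        self.protein_encoder = nn.Linear(prot_dim, 256)  # stand-in for BERT
        self.drug_encoder = nn.Linear(drug_dim, 256)     # stand-in for MPNN
        # stand-in for the capsule decision module: projects the fused
        # representation into two class capsules of dimension caps_dim
        self.class_capsules = nn.Linear(512, n_classes * caps_dim)
        self.caps_dim, self.n_classes = caps_dim, n_classes

    def forward(self, prot_feat, drug_feat):
        fused = torch.cat([self.protein_encoder(prot_feat),
                           self.drug_encoder(drug_feat)], dim=-1)
        caps = self.class_capsules(fused).view(-1, self.n_classes, self.caps_dim)
        # the length (L2 norm) of each class capsule scores non-DTI vs. DTI
        return caps.norm(dim=-1)

model = CapBMDTISketch()
scores = model(torch.randn(4, 768), torch.randn(4, 300))
print(scores.shape)  # (4, 2): per-pair non-DTI / DTI capsule lengths
```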
Figure 2. Performance comparison between our models and the baseline models on Dataset 1.
Figure 3. ROC and PR curves for our models and the baseline models on Dataset 1. (A) ROC curves. (B) Partially enlarged ROC curves. (C) PR curves. (D) Partially enlarged PR curves.
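As a rough guide to how curves like those in Figure 3 are produced, the snippet below computes ROC and PR curves with scikit-learn; `y_true` and `y_score` are synthetic placeholders for the test labels and the predicted DTI scores, not the paper's data.

```python
# Sketch of ROC/PR curve computation from model scores (cf. Figure 3).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (roc_curve, precision_recall_curve, auc,
                             average_precision_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                           # placeholder labels
y_score = np.clip(y_true * 0.6 + rng.random(200) * 0.5, 0, 1)   # placeholder scores

fpr, tpr, _ = roc_curve(y_true, y_score)
prec, rec, _ = precision_recall_curve(y_true, y_score)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
ax1.set(xlabel="False positive rate", ylabel="True positive rate", title="ROC")
ax2.plot(rec, prec, label=f"AUPR = {average_precision_score(y_true, y_score):.3f}")
ax2.set(xlabel="Recall", ylabel="Precision", title="PR")
for ax in (ax1, ax2):
    ax.legend()
plt.tight_layout()
plt.show()
```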
Figure 4. t-SNE visualizations of different layers on Dataset 1. DTI pairs (yellow triangles) and non-DTI pairs (blue stars) are far more clearly separated in the length layer than in the concatenate layer, which simply concatenates the target protein and drug molecule features from the BERT and MPNN layers.
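A projection of this kind can be reproduced along the following lines; `features` and `labels` stand in for the intermediate-layer activations and DTI labels, and the t-SNE hyperparameters are illustrative rather than the paper's settings.

```python
# Sketch of the t-SNE visualization in Figure 4: embed layer activations
# in 2-D and color by DTI label.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(300, 128))   # placeholder layer activations
labels = rng.integers(0, 2, size=300)    # 1 = DTI, 0 = non-DTI

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
plt.scatter(*coords[labels == 1].T, marker="^", label="DTI")
plt.scatter(*coords[labels == 0].T, marker="*", label="non-DTI")
plt.legend()
plt.show()
```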
Figure 5. Clustering analysis of the features learned by CapBM-DTI on the four datasets. (A) Dataset 1. (B) Dataset 2. (C) Dataset 3. (D) Dataset 4.
Figure 6. Computation between the primary capsule layer and the class capsule layer. Each prediction vector $\hat{\mathbf{u}}_{j|i} = \mathbf{W}_{ij}\mathbf{u}_i$ is obtained by multiplying the primary capsule output $\mathbf{u}_i$ by a transformation matrix $\mathbf{W}_{ij}$. The class capsule layer contains a positive and a negative capsule, each computed as a weighted sum of all prediction vectors followed by the squash function. Through dynamic routing, the class capsule layer produces two output vectors $\mathbf{v}_j$, whose lengths (L2 norms) give the class probabilities for DTI and non-DTI.
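The routing step in Figure 6 can be made concrete with a short sketch of the squash function and the dynamic routing of Sabour et al. [56]; the shapes and iteration count below are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of the primary-to-class-capsule computation in Figure 6,
# following the dynamic routing algorithm of Sabour et al. [56].
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # squash(s) = (|s|^2 / (1 + |s|^2)) * (s / |s|): keeps the direction,
    # maps the length into [0, 1) so it can act as a class probability
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: (batch, n_primary, n_class, caps_dim) prediction vectors,
    # where u_hat[b, i, j] = W_ij @ u_i is precomputed outside this function
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=2)                              # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum
        v = squash(s)                                        # class capsules
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
    return v

u_hat = torch.randn(4, 32, 2, 16)   # 32 primary capsules, 2 class capsules
v = dynamic_routing(u_hat)
probs = v.norm(dim=-1)              # capsule lengths score non-DTI / DTI
print(probs.shape)                  # (4, 2)
```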
Table 1. Statistics of DTI datasets.

| Dataset | Dataset 1 | Dataset 2 | Dataset 3 [21] | Dataset 4 [21] |
|---|---|---|---|---|
| Species | H. sapiens | H. sapiens | H. sapiens | C. elegans |
| Number of compounds | 6602 | 14,737 | 2726 | 1767 |
| Number of proteins | 1900 | 2709 | 2001 | 1876 |
| Number of positive interactions a | 16,627 | 32,013 | 3364 | 3893 |
| Number of negative interactions b | 11,768 | 32,013 | 3364 | 3893 |
| Density (%) | 0.226 | 0.16 | 0.123 | 0.235 |

a: Positive interactions are experimentally validated drug–target interactions. b: Negative interactions are experimentally validated non-drug–target interactions.
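The density row can be sanity-checked under the assumption, consistent with the counts above, that density is the labeled fraction of all possible compound–protein pairs, expressed as a percentage:

```python
# Sanity check of the Density row, assuming
# density = (positive + negative interactions) / (compounds * proteins) * 100.
# Dataset 1's numbers reproduce the reported 0.226%.
compounds, proteins = 6602, 1900
positives, negatives = 16627, 11768
density = (positives + negatives) / (compounds * proteins) * 100
print(f"Density: {density:.3f}%")  # Density: 0.226%
```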
Table 2. Performance comparison between our model and previous models on the four datasets (training set:test set = 8:2). The best value for each metric and dataset is in bold.

| Model | Dataset 1 (Accuracy/F1/AUC/AUPR) | Dataset 2 (Accuracy/F1/AUC/AUPR) | Dataset 3 (Accuracy/F1/AUC/AUPR) | Dataset 4 (Accuracy/F1/AUC/AUPR) |
|---|---|---|---|---|
| DeepConv-DTI | 0.877/0.894/0.941/0.964 | 0.825/0.843/0.933/0.932 | 0.611/0.662/0.636/0.780 | **0.943**/0.936/0.978/0.975 |
| CPI_prediction | 0.885/0.894/0.943/0.967 | 0.864/0.852/0.935/0.938 | 0.891/0.901/0.936/0.945 | 0.926/0.931/0.965/0.972 |
| TransformerCPI | 0.872/0.883/0.940/0.965 | 0.855/0.851/0.938/0.935 | 0.892/0.893/0.954/0.958 | 0.914/0.911/0.977/0.977 |
| IIFDTI | 0.857/0.879/**0.946**/0.968 | 0.736/0.631/**0.952**/0.943 | 0.880/0.890/0.951/**0.963** | 0.938/**0.942**/0.980/**0.983** |
| CapBM-DTI | **0.893**/**0.901**/**0.946**/**0.970** | **0.870**/**0.862**/0.935/**0.944** | **0.915**/**0.915**/**0.958**/0.961 | 0.941/0.938/**0.982**/**0.983** |
Table 3. Statistics of the new compound, new protein, and new pair subsets from Dataset 3.

| Dataset | New Compounds Dataset c (Training / Test) | New Proteins Dataset d (Training / Test) | New Pairs Dataset e (Training / Test) |
|---|---|---|---|
| Number of compounds | 2000 / 726 | 2406 / 824 | 1770 / 227 |
| Number of proteins | 1797 / 915 | 1500 / 501 | 1351 / 125 |
| Number of positive interactions a | 2569 / 795 | 2888 / 476 | 2218 / 125 |
| Number of negative interactions b | 2445 / 919 | 2532 / 832 | 1834 / 221 |
| Density (%) | 0.140 / 0.258 | 0.150 / 0.317 | 0.169 / 0.674 |

a: Positive interactions are experimentally validated drug–target interactions. b: Negative interactions are experimentally validated non-drug–target interactions. c: No compound in the training set appears in the test set. d: No protein in the training set appears in the test set. e: No overlap between training and test sets: neither training compounds nor training proteins appear in the test set.
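The three splits described in footnotes c–e can be constructed along the following lines. Here `cold_split` is a hypothetical helper operating on (compound, protein, label) triples, and the entity-level sampling is one plausible reading of the footnotes rather than the authors' exact procedure.

```python
# Sketch of the three cold-start splits from Table 3's footnotes.
import random

def cold_split(pairs, mode="new_pairs", test_frac=0.2, seed=0):
    rng = random.Random(seed)
    compounds = sorted({c for c, p, y in pairs})
    proteins = sorted({p for c, p, y in pairs})
    test_c = set(rng.sample(compounds, int(len(compounds) * test_frac)))
    test_p = set(rng.sample(proteins, int(len(proteins) * test_frac)))
    train, test = [], []
    for c, p, y in pairs:
        if mode == "new_compounds":        # footnote c: compounds disjoint
            (test if c in test_c else train).append((c, p, y))
        elif mode == "new_proteins":       # footnote d: proteins disjoint
            (test if p in test_p else train).append((c, p, y))
        else:                              # footnote e: both entities unseen
            if c in test_c and p in test_p:
                test.append((c, p, y))
            elif c not in test_c and p not in test_p:
                train.append((c, p, y))
            # pairs mixing seen and unseen entities are discarded here,
            # which is consistent with the smaller new-pairs counts above
    return train, test

# toy usage with synthetic triples
pairs = [(f"c{i % 50}", f"p{i % 40}", i % 2) for i in range(400)]
train, test = cold_split(pairs, mode="new_proteins")
print(len(train), len(test))
```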
Table 4. Comparison of the proposed method with previous models under the three settings. The best value for each metric and setting is in bold.

| Model | New Compounds (Accuracy/F1/AUC/AUPR) | New Proteins (Accuracy/F1/AUC/AUPR) | New Pairs (Accuracy/F1/AUC/AUPR) |
|---|---|---|---|
| DeepConv-DTI | 0.744/0.714/0.808/0.699 | 0.792/0.719/0.849/0.668 | 0.668/0.590/0.729/0.645 |
| CPI_prediction | 0.571/0.613/0.599/0.669 | 0.399/0.462/0.443/0.530 | 0.455/0.567/0.527/0.645 |
| TransformerCPI | 0.779/0.743/**0.847**/0.832 | 0.829/0.744/0.892/0.841 | 0.723/0.575/0.753/0.661 |
| IIFDTI | 0.750/0.765/0.834/0.818 | 0.851/0.814/0.922/**0.912** | 0.705/0.608/**0.765**/0.668 |
| CapBM-DTI | **0.788**/**0.766**/0.832/**0.848** | **0.906**/**0.874**/**0.945**/0.908 | **0.725**/**0.615**/0.751/**0.704** |
Table 5. Reference-supported ACE2-binding drugs identified through large-scale virtual screening of DrugBank.

| DrugBank ID | Name | Interact Status a | Non-DTI Possibility | DTI Possibility | Drug Mechanism of Action | Ref. |
|---|---|---|---|---|---|---|
| DB00691 | Moexipril | 1 | 0.03831145 | 0.98414844 | An angiotensin-converting enzyme (ACE) inhibitor used to treat hypertension and congestive heart failure | [23] |
| DB00477 | Chlorpromazine | 1 | 0.01079374 | 0.995042 | Facilitates ACE2 endocytosis, reducing virus–receptor binding capacity in vitro | [23] |
| DB13609 | Umifenovir | 1 | 0.12442836 | 0.84877896 | Decreases viral endocytosis | [24] |
| DB12466 | Favipiravir | 1 | 0.03333556 | 0.98836476 | Decreases viral endocytosis | [24] |
| DB13729 | Camostat | 1 | 0.0929359 | 0.83505136 | TMPRSS2 hydrolyzes ACE2 and thus degrades it | [25] |
| DB12598 | Nafamostat | 1 | 0.04878383 | 0.9957441 | TMPRSS2 hydrolyzes ACE2 and thus degrades it | [25] |

a: 1 indicates interaction; 0 indicates no interaction.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
