GraphATT-DTA: Attention-Based Novel Representation of Interaction to Predict Drug-Target Binding Affinity

Drug-target binding affinity (DTA) prediction is an essential step in drug discovery. Drug-target protein binding occurs at specific regions between the protein and drug, rather than the entire protein and drug. However, existing deep-learning DTA prediction methods do not consider the interactions between drug substructures and protein sub-sequences. This work proposes GraphATT-DTA, a DTA prediction model that constructs the essential regions for determining interaction affinity between compounds and proteins, modeled with an attention mechanism for interpretability. We make the model consider the local-to-global interactions with the attention mechanism between compound and protein. As a result, GraphATT-DTA shows an improved prediction of DTA performance and interpretability compared with state-of-the-art models. The model is trained and evaluated with the Davis dataset, the human kinase dataset; an external evaluation is achieved with the independently proposed human kinase dataset from the BindingDB dataset.


Introduction
Drug development is a high-risk industry involving complex experiments, drug discovery, and pre-clinical and clinical trials. Drug discovery is the process of identifying new candidate compounds with potential therapeutic effects, and it is essential for identify drug-target interactions (DTIs). Moreover, the drug-target binding affinity (DTA) provides information on the strength of the interaction between a drug-target pair. However, as there are millions of drug-like compounds, it can take years and costs about 24 million US dollars for experimental assays of target-to-hit process for a new drug [1]. Efficient computational models for predicting DTA are urgently needed to speed up drug development and reduce resource consumption.
There are several computational approaches to predicting DTA [2,3]. One is the ligandbased method, which compares a query ligand to known ligands based on their target proteins. However, prediction results become unreliable if the number of known ligands with target proteins is insufficient [4]. Another approach is molecular docking [5], which simulates the binding of the conformational spaces between compounds and proteins based on their three-dimensional (3D) structures. However, it is too challenging to produce the 3D protein-ligand complex. Another approach is the chemogenomic method [6] that integrates the chemical attributes of drug compounds, the genomic attributes of proteins, and their interactions into a unified math framework.
In feature-based chemogenomic methods, drug-target pairs are taken as input, and their binding strength or whether to interact, determined by regression or binary classification, are output [7,8]. Efficient input representation is key to accurate prediction. The Biomedicines 2023, 11, x FOR PEER REVIEW 3 of 15 of our model using the Davis kinase binding affinity dataset and the public, web-accessible BindingDB database of measured binding affinities. GraphATT-DTA's prediction performance is then compared with state-of-the-art (SOTA) global and local interaction modeling methods.

Figure 1.
GraphATT-DTA architecture. Molecule graph and protein amino acid sequences are taken as input, and the predicted binding affinity of drug-target pair is the output. The attention-based novel representation is learned by molecule encoding with a graph neural network, sequence encoding using convolutional neural networks, and interaction modeling. The local-to-global relationships between drug substructures and protein sub-sequences are learned via interaction modeling using an attention mechanism. CNN, convolutional neural network; GNN, graph neural network; D, drug embedding matrix; S, protein embedding matrix; R, relation matrix; a, atom-wise softmax of relation matrix; s, subseq-wise softmax of relation matrix.

Dataset
In this study, our proposed model and comparison baselines were trained with the Davis dataset [25] and evaluated for external validity with the BindingDB dataset [26]. Table 1 and Supplementary Table S1 provide a summary. The Davis dataset contains the GraphATT-DTA architecture. Molecule graph and protein amino acid sequences are taken as input, and the predicted binding affinity of drug-target pair is the output. The attention-based novel representation is learned by molecule encoding with a graph neural network, sequence encoding using convolutional neural networks, and interaction modeling. The local-to-global relationships between drug substructures and protein sub-sequences are learned via interaction modeling using an attention mechanism. CNN, convolutional neural network; GNN, graph neural network; D, drug embedding matrix; S, protein embedding matrix; R, relation matrix; a, atom-wise softmax of relation matrix; s, subseq-wise softmax of relation matrix.

Dataset
In this study, our proposed model and comparison baselines were trained with the Davis dataset [25] and evaluated for external validity with the BindingDB dataset [26]. Table 1 and Supplementary Table S1 provide a summary. The Davis dataset contains the kinase protein family and relevant inhibitors with dissociation constants K d , whose value is transformed into the log space as pK d = − log 10 K d e 9 (1) The BindingDB dataset is publicly accessible and contains experimentally measured binding affinities whose values are expressed as K d , K i , IC50, and EC50 terms. For the external test, we extracted drug-target pairs in which the protein is human kinase, and the binding affinity is recorded as a K d value. These values are then transformed into the log space as described.
The Davis dataset consists of six parts. Five are used for cross-validation and one is used for testing. We use the same training and testing scheme as GraphDTA. The hyperparameter is tuned using five parts with five-fold cross-validation. After tuning the hyperparameter, we train all five parts and evaluate the performance with one test part. To evaluate the generalizability of the model, BindingDB is used as the external test dataset.

Input Data Representation
GraphATT-DTA takes SMILES as the compound input and amino acid sequence string as the protein input. First, the SMILES string is converted to a graph structure that takes atoms as nodes and bonds as edges using the open-source Deep Graph Library (DGL) v.0.4.3(2) [27], DGL-LifeSci v.0.2.4 [28], and RDKit v.2019.03.1(1) [29]. We used the atomic feature defined in GraphDTA (i.e., atom symbol, number of adjacent atoms, number of adjacent hydrogens, implicit values of the atoms, and whether the atom is in aromatic structures). We leverage the bond feature used by the directed message-passing neural network (DMPNN; i.e., bond type, conjugation, in the ring, stereo). Tables 2 and 3 list detailed information for each feature. Each amino acid sequence type is encoded with an integer and cut by a maximum length of 1000. If the sequences are shorter than the maximum length, they are padded with zeros. The maximum length can cover at least 80% of all proteins.

Feature Dimension
One hot encoding of the atom element 44 One hot encoding of the degree of the atom in the molecule 11 One hot encoding of the total number of H bonds to the atom 11 One hot encoding of the number of implicit H bonds to the atom 11 Whether the atom is aromatic 1 All 78 Table 3. Input data representations: Compound bond features.

Feature Dimension
One hot encoding of bond type 4 One hot encoding of bond conjugating 2 One hot encoding of whether the bond is part of a ring 2 One hot encoding of the stereo 6 All 14

Drug Representation Learning Model
The molecule is originally represented by a graph structure consisting of atoms and bonds. The GNN uses its structural information and applies a message-passing phase consisting of message_passing and update functions. In the message_passing function, node v aggregates information from its neighbor's hidden representation, h where N(v) is the set of the neighbors of v in graph G, and h (t) v follows time step t of initial atom features, x v . This mechanism, in which atoms aggregate and update information from neighbor nodes, captures information about the substructure of the molecule. GNN models have variants, such as the graph convolutional network (GCN) [30], graph attention network (GAT) [31], graph isomorphism network (GIN) [32], message-passing neural network (MPNN) [33], and directed message-passing neural network (DMPNN) [34], which can be leveraged by specifying the message_passing function, m   Table 4). The output is a drug embedding matrix, D ∈ R N a ×d , where N a is the number of atoms, and d is the dimension of the embedding vectors. In the drug embedding matrix, each atom has the information of its neighbor atoms (i.e., substructure) along with the number of GNN layers. Table 4. Graph neural network variants used for drug embedding matrix generation.

Model
Message Passing Function Update Function

Protein Representation Learning Model
The Davis and BindingDB datasets have 21 and 20 amino acid types, respectively. Hence, we consider 21 and 20 amino acids for learning and testing, respectively. The integer forms of protein amino acid sequences become the input to the embedding layers. These are then used as input to three consecutive 1D convolutional layers, which learn representations from the raw sequence data of proteins. The CNN models capture local dependencies by sliding the input features with filters, and their output is the protein sub-sequence embedding matrix, S ∈ R N s ×d , where N s is the number of sub-sequences. The number of amino acids in a sub-sequence depends on the filter size. The larger the filter size, the greater the number of amino acids in the sub-sequence.

Interaction Learning Model
The relationship between the protein and the compound is a determinant key for DTA prediction. The attention mechanism can make the input pair information influence the computation of each other's representation. The input pairs can jointly learn a relationship. GraphATT-DTA model constructs the relation matrix R using dot product of protein and compound embedding where R ∈ R N a ×N s . It provides information about the relationship between the substructures of compounds and protein sub-sequences.
GraphATT-DTA reflects the local interactions by considering the crucial relationships between protein sub-sequences and compound substructures. The subseq-wise/atomwise SoftMax is applied to the relation matrix to construct the substructure and subsequence significance matrices. The formulas appear in (5) and (6). The element of substructure_significance indicates the substructure's importance to the sub-sequence. Similarly, the element of subsequence_significance indicates the sub-sequence's importance to the substructure.
The substructure_significance is directed to the drug embedding matrix via elementwise multiplication ( ) with a j and D, where a j ∈ R N a ×1 , and j = 1, . . . , N s . a j indicates each substructure's importance of the jth sub-sequence. D (j) ∈ R N a ×d indicates the drug embedding matrix with the importance of the jth sub-sequence.
Drug vector d (j) is constructed by (8) and carries the information of the compound The concatenation of d (j) with all sub-sequences causes D to inherit all information about the sub-sequences and compounds, where D ∈ R N s ×d . The new drug feature is thus constructed to reflect all protein sequences and compound atoms where drug_feature ∈ R 1×d .
The new protein feature is calculated the same way. Using the element-wise multiplication of the subsequence_significance and protein embedding matrix, the protein embedding matrix, P (i) , with the ith substructure significance to the sub-sequence, is constructed so that s i ∈ R 1×N s , S ∈ R N s ×d , and P (i) ∈ R N s ×d . The summation of P (i) makes protein vector p (i) with the sub-sequence information about the compound, where p (i) ∈ R 1×d . After the concatenation of p (i) , the summation of P makes the new protein feature vector reflect compound sub-structure significance information, where P ∈ R N a ×d , and protein_feature ∈ R 1×d .
Protein and drug features reflecting the local-to-global interaction information are collected via concatenation. The fully connected layers can then predict the binding affinity. We use mean squared error (MSE) as the loss function.

Implementation and Hyperparameter Settings
GraphATT-DTA was implemented with Pytorch 1.5.0 [35], and the GNN models were built with DGL v.0.4.3(2) [27] and DGL-LifeSci v.0.2.4 [28]. Early stopping was configured with the patience of 30 epochs to avoid potential overfitting and obtain improved generalization performance. The hyperparameter settings are summarized in Table 5. Multiple experiments are used with five-fold cross-validation, applied for hyperparameter selection. The layers of GNN are important because they pertain to how many neighbor nodes are regarded by the model. Because there are many layers, the model can consider many neighbors; however, doing so can cause an over-smoothing problem in which all node embeddings converge to the same value. Additionally, if the number of layers is too small, the graph substructure will not be captured. Therefore, proper layer configuration is important. The optimal number of GNN layers was experimentally chosen for GraphATT-DTA by using each GNN graph embedding model. Specific experimental results can be found in Supplementary Table S2 and Supplementary Figure S1.

Performance Evaluation Metrics
We used the concordance index (CI) and MSE to evaluate prediction performance. We formulated the CI to estimate whether the predicted binding affinity values were in the same order as their true values. b x is the prediction value with the larger affinity, d x , b y is Biomedicines 2023, 11, 67 8 of 15 the prediction value with the smaller affinity, d y , Z is a normalization constant, and h(x) is the step function.
The MSE was used to calculate the differences between predicted and true affinity values, p i is the predictive score, y i is the ground-truth score, and n is the number of samples.

Binding Affinity Prediction on Testing Data
First, we investigated the predictive performance of GraphATT-DTA using various graph embedding models. The drug was represented using different graphs embedding models such as GCN, GAT, GIN, MPNN, and DMPNN. We performed ten times repeated training and testing. In Figure 2, we report the mean and standard deviation scores of the MSE and CI from the Davis dataset. The drug embedding models showed similar performances (i.e., MSE and CI). For MSE, the GIN drug embedding model performed best with an MSE of 0.213, followed by the GCN and MPNN models with MSEs of 0.214 and 0.215. For CI, MPNN performed best, with a CI of 0.899, followed by DMPNN and GCN.
The MSE was used to calculate the differences between predicted and true affinity values, is the predictive score, is the ground-truth score, and n is the number of samples.

Binding Affinity Prediction on Testing Data
First, we investigated the predictive performance of GraphATT-DTA using various graph embedding models. The drug was represented using different graphs embedding models such as GCN, GAT, GIN, MPNN, and DMPNN. We performed ten times repeated training and testing. In Figure 2, we report the mean and standard deviation scores of the MSE and CI from the Davis dataset. The drug embedding models showed similar performances (i.e., MSE and CI). For MSE, the GIN drug embedding model performed best with an MSE of 0.213, followed by the GCN and MPNN models with MSEs of 0.214 and 0.215. For CI, MPNN performed best, with a CI of 0.899, followed by DMPNN and GCN.
Next, using the compound embedding model, we evaluated the overall affinity prediction performance again using the Davis dataset. Figure 3 illustrates the correlations between the true and predicted affinity scores. Here, MSE, Pearson correlation, CI, and R 2 values are reported on graphs' y-axes. When the model predicts the binding affinity perfectly, the slope follows the y = x line. In the Davis dataset, we observed that the predicted distribution follows the true binding affinity quite well (MSE 0.204).  Next, using the compound embedding model, we evaluated the overall affinity prediction performance again using the Davis dataset. Figure 3 illustrates the correlations between the true and predicted affinity scores. Here, MSE, Pearson correlation, CI, and R 2 values are reported on graphs' y-axes. When the model predicts the binding affinity perfectly, the slope follows the y = x line. In the Davis dataset, we observed that the predicted distribution follows the true binding affinity quite well (MSE 0.204).

Comparison with Other Methods
For comparisons, we considered both global and local interaction modeling methods. Global methods included DeepDTA and GraphDTA, and local methods included DeepAffinity, ML-DTI, HyperAttentionDTI, and FusionDTA. For a fair comparison, we used the same training and testing datasets. The Davis dataset consists of six parts: five for training and one for testing. Finally, the model showing the best prediction performance was selected, and if the model used early stopping for training, we also applied early stopping on the Davis testing dataset to avoid overfitting. We trained using the same hyperparameter settings described in the baseline model papers [16,17,20,21,23,24].
DeepDTA used 1D CNNs to learn representations from the raw sequence data from SMILES of drugs and amino acids of a protein. After pooling the final convolution layer, the drug and protein features were concatenated, and the fully connected layers regressed the DTA.
GraphDTA represents drugs as molecule graphs and uses GNNs for molecule representation. The amino acid sequences were the protein inputs, and 1D CNNs were used for protein representation. Max-pooling was applied, followed by concatenation, and the DTA was predicted using hidden layers. GCN, GAT, GIN, and GCN-GAT models were tested, and GIN was the best performer. Hence, we used it for molecule embedding. It did not use early stopping; thus, the best model performance on the Davis testing dataset was selected for testing on the BindingDB.
DeepAffinity [19] takes the SMILES string of the compound and the structural propery sequence (SPS) of the protein. SPS includes secondary structure elements, solvent accessibility, physico-chemical characteristics, and lengths. The secondary structure element and solvent accessibility are determined using the SSPro prediction model [31], and the input is pretrained using the Seq2seq autoencoder. After the recurrent neural network (RNN) encodes the input, the pair is represented by a weighted compound and protein summation. Joint attention is then calculated using the compound and protein string pairs. In the original study, the researchers predicted the pIC50 value using a model trained by the BindingDB dataset.
ML-DTI [20] applies a mutual learning mechanism and takes SMILES and amino

Comparison with Other Methods
For comparisons, we considered both global and local interaction modeling methods. Global methods included DeepDTA and GraphDTA, and local methods included Deep-Affinity, ML-DTI, HyperAttentionDTI, and FusionDTA. For a fair comparison, we used the same training and testing datasets. The Davis dataset consists of six parts: five for training and one for testing. Finally, the model showing the best prediction performance was selected, and if the model used early stopping for training, we also applied early stopping on the Davis testing dataset to avoid overfitting. We trained using the same hyperparameter settings described in the baseline model papers [16,17,20,21,23,24].
DeepDTA used 1D CNNs to learn representations from the raw sequence data from SMILES of drugs and amino acids of a protein. After pooling the final convolution layer, the drug and protein features were concatenated, and the fully connected layers regressed the DTA.
GraphDTA represents drugs as molecule graphs and uses GNNs for molecule representation. The amino acid sequences were the protein inputs, and 1D CNNs were used for protein representation. Max-pooling was applied, followed by concatenation, and the DTA was predicted using hidden layers. GCN, GAT, GIN, and GCN-GAT models were tested, and GIN was the best performer. Hence, we used it for molecule embedding. It did not use early stopping; thus, the best model performance on the Davis testing dataset was selected for testing on the BindingDB.
DeepAffinity [19] takes the SMILES string of the compound and the structural propery sequence (SPS) of the protein. SPS includes secondary structure elements, solvent accessibility, physico-chemical characteristics, and lengths. The secondary structure element and solvent accessibility are determined using the SSPro prediction model [31], and the input is pretrained using the Seq2seq autoencoder. After the recurrent neural network (RNN) encodes the input, the pair is represented by a weighted compound and protein summation. Joint attention is then calculated using the compound and protein string pairs. In the original study, the researchers predicted the pIC50 value using a model trained by the BindingDB dataset.
ML-DTI [20] applies a mutual learning mechanism and takes SMILES and amino acid sequences as input; 1D CNNs are used for encoding. The model leverages protein information during compound encoding and vice versa. It creates a probability map between the global protein descriptor and drug string feature vector. In the original study, the researchers used the Davis dataset processed by PADME for training and testing. The number of drugs was 72, the targets were 442, and the interactions were 31,824. However, to compare the performances, we used the same Davis dataset as GraphATT-DTA, which consisted of 68 drugs and 442 targets.
HyperAttentionDTI takes compound SMILES and protein input as amino acid sequences. 1D CNN layers encode the compound and protein representations, and a hyperattention module models the semantic interdependencies spatially and channel-wise between the drug and protein sub-sequences. DTI is predicted via binary classification; hence, we changed the last layer and applied MSE to the loss function.
FusionDTA uses a pretrained transformer and BI-LSTM to encode amino acid sequence, and BI-LSTM to encode SMILES. The original researchers proposed a fusion layer, which consisted of a multi-head linear attention layer that focused on the important token from the biological sequence and aggregated global information based on the attention score. The fully connected layer then predicts binding affinity.
Tables 6 and 7 report the MSE and CI scores of the Davis and external BindingDB datasets, respectively. We used the model trained on the Davis dataset to evaluate the BindingDB external dataset and verify its generalizability. GraphATT-DTA consistently performed well on both sets.  For the Davis dataset, the results are shown in Table 6. GraphATT-DTA achieved an MSE of 0.204 and a CI of 0.904. The local interaction modeling methods tend to exhibit superior performance to the global interaction methods. Local interaction methods employ local characteristics extracted from protein and compound to predict DTA. This gives the model a more comprehensive insight into the interacting parts of the protein and the compound. Thus, such models perform better when predicting drug-target affinity based on the training data.
For the external BindingDB dataset test, GraphATT-DTA achieved an MSE of 1.582 and a CI of 0.651. The BindingDB dataset consisted of various large human kinase DTA pairs. The local interaction modeling methods had better results because the attention mechanism caused the model to have a more informative representation. Additionally, early stopping seems to have played an important role as it avoided overfitting. Compared with other models, GraphATT-DTA showed consistently higher performance on both the Davis and BindingDB datasets. Additionally, when comparing the performance per drug scaffold, we confirmed that no particular prediction model has a specially high or low prediction performance for a specific scaffold (Supplementary Figures S2 and S3 [36]). However, there were performance gaps between the two. We speculate that discrepancies in the predicted affinity values vs. actual affinity values (pKd) in the two datasets led to performance deterioration. That is, in the Davis dataset, the lowest pKd value was five (10 µM). However, for the BindingDB dataset, many samples had values lower than five (Supplementary Figure S4).

Ablation Study
Next, we performed an ablation study of the GraphATT-DTA model to confirm the advantages obtained by the attention interaction. The ablation model used max-pooling on the drug and protein embedding matrices, followed by concatenation. The fully connected layers predicted the binding affinity. The drug and protein embedding modules were the same as those in the proposed GraphATT-DTA model. Figure 4 compares the performances of concatenation and attention. The results showed that interaction modeling with the attention mechanism had more predictive power.

Ablation Study
Next, we performed an ablation study of the GraphATT-DTA model to confirm th advantages obtained by the attention interaction. The ablation model used max-poolin on the drug and protein embedding matrices, followed by concatenation. The fully con nected layers predicted the binding affinity. The drug and protein embedding module were the same as those in the proposed GraphATT-DTA model. Figure 4 compares th performances of concatenation and attention. The results showed that interaction mode ing with the attention mechanism had more predictive power.

Visualization with a Case Study
The proposed GraphATT-DTA uses an attention mechanism to consider the im portant interactions between molecule substructures and protein sub-sequences whe predicting the DTA score. For substructure and sub-sequence significance score matrices we investigated where high attention scores existed. Thus, we prepared a visualized cas study experiment for the GraphATT-DTA ( Figure 5, Supplementary Figure S5). For this we selected the complex compound Lapatinib and gene EGFR (PDB ID: 1XKK). The actua affinity was 8.38, and the predicted affinity was 7.58 via GraphATT-DTA using GCN.
For the 1XKK complex, we found that the high attention scores were clustered nea the residue, from 780 to 807 of the protein and quinazoline substructure of the drug. Th

Visualization with a Case Study
The proposed GraphATT-DTA uses an attention mechanism to consider the important interactions between molecule substructures and protein sub-sequences when predicting the DTA score. For substructure and sub-sequence significance score matrices, we investigated where high attention scores existed. Thus, we prepared a visualized case study experiment for the GraphATT-DTA ( Figure 5, Supplementary Figure S5). For this, we selected the complex compound Lapatinib and gene EGFR (PDB ID: 1XKK). The actual affinity was 8.38, and the predicted affinity was 7.58 via GraphATT-DTA using GCN. sub-sequence. Figure 5c shows the significances matched to the sub-sequences. From the protein sub-sequence's perspective, the sub-sequence index, including 793 methionine, regarded quinazoline as the important compound substructure. We identified the top five significance scores from the two significance matrices, finding that they had the same substructure-subsequence pairs. This result indicates that the sub-sequence deemed important by the substructure and the substructure deemed important by the sub-sequence retain the same position in the drug target pair.

Discussion and Conclusions
Identification of DTIs is an essential step for finding novel drug candidates for a given target protein or vice versa. Once initial binding molecules are successfully identified, the next step is to optimize the activity by enhancing the binding affinity between molecules. Thus, DTA prediction models could be used in many efficient ways in the compound optimization process. For the 1XKK complex, we found that the high attention scores were clustered near the residue, from 780 to 807 of the protein and quinazoline substructure of the drug. The binding positions of the RCSB PDB showed that the focused interaction between the sub-sequence and substructure, as determined by GraphATT-DTA was one of the true binding sites. The residue, 793 Methionine, and atom, Nitrogen, have a true hydrogen bond. Figure 5a shows the 3D structure of Lapatinib and EGFR, where we colored a ligand as green, the true binding sites are yellow, and hydrogen bonds are red. In Figure 5b, we colored ligand substructures with a high attention score in orange and protein regions with a high attention score in light blue.
The substructure significance is visualized in Figure 5b, where a high score was matched to atoms. From the drug substructure's perspective, the quinazoline substructure referred to the sub-sequence index, including 793 methionine as the important protein sub-sequence. Figure 5c shows the significances matched to the sub-sequences. From the protein sub-sequence's perspective, the sub-sequence index, including 793 methionine, regarded quinazoline as the important compound substructure. We identified the top five significance scores from the two significance matrices, finding that they had the same substructure-subsequence pairs. This result indicates that the sub-sequence deemed important by the substructure and the substructure deemed important by the sub-sequence retain the same position in the drug target pair.

Discussion and Conclusions
Identification of DTIs is an essential step for finding novel drug candidates for a given target protein or vice versa. Once initial binding molecules are successfully identified, the next step is to optimize the activity by enhancing the binding affinity between molecules.
Thus, DTA prediction models could be used in many efficient ways in the compound optimization process.
In this paper, we proposed an attention-based novel representation that considers local-to-global interactions to predict DTA. We used a powerful GNN to describe the raw graph data of drugs and 1D CNNs for raw amino acid sequences of proteins. The attention mechanism was used to construct a new molecule and the protein feature that reflects the relationship between the compound substructure and protein sub-sequence. Consequently, the physical interaction patterns between proteins and compounds can be captured by these attention-based protein sequential and molecule graph feature representations. Moreover, in our model, binding patterns of proteins with diverse 3D structures can be addressed during the learning phase and it is one of the advantages of employing 2D sequential features as opposed to 3D structural features. In this regard, the proposed method is not limited to be applied to specific structured proteins but can be applied to more generally structured proteins.
In the case of the performance evaluation, prediction results with the Davis and Bind-ingDB datasets clearly demonstrated the efficacy of the proposed method. We observed that the precise modeling of the interaction allowed for more efficient feature learning from training data. Consequently, when modeling DTA, if interaction patterns are observed during training for affinity prediction, the prediction performance can be improved to the point where the model can provide guidance regarding unidentified drug-protein interaction regions. In addition, our case study demonstrated that GraphATT-DTA can provide unique biological insights to help understand the predicted binding affinity based on the attention scores. We also identified a substructure with high attention scores of a compound that triggers decreasing the binding affinity between EGFR and Canertinib (Supplementary Figures S6 and S7 [37]). Notably, it is well known that the overall performance of the deep learning model heavily depends on the training data. Due to the fact that only kinase inhibitors are included in the training dataset used in this study, the performance of the proposed model against other proteins may be compromised. Thus, in order to provide more general DTA predictions, it is necessary to construct large-scale benchmark datasets containing different classes of proteins.
The mutation of protein can lead to human disease. The mutated amino acid can change biological functions and three-dimensional structures that can change the binding affinity. However, no SOTA models have yet incorporated the concept of mutational alteration effects on protein sequence for modeling DTA (Supplementary Tables S3 and S4; Supplementary Figure S8). For future research, it is necessary to develop a deep-learning model that can identify small-scale alterations in a protein sequence and apply their effects to predicting DTIs and DTAs.
Recently, deep-learning models that predict protein structures based only on amino acid sequences have been developed [38,39]. With such methods, the binding pockets of proteins can be captured, and interactions can be modeled for drug target affinity prediction. We believe that such approaches will soon allow us to achieve higher performance, and leave the exploration using interaction sites with binding pockets in a data-driven manner to future work.