Prediction of Drug–Target Interaction Using Dual-Network Integrated Logistic Matrix Factorization and Knowledge Graph Embedding

Drug–target interaction (DTI) prediction is a fundamental part of drug repositioning. However, on the one hand, DTI prediction models usually consider only drug or target information and ignore prior knowledge between drugs and targets. On the other hand, models incorporating prior knowledge cannot predict interactions for under-studied drugs and targets. Hence, this article proposes a novel dual-network integrated logistic matrix factorization DTI prediction scheme (Ro-DNILMF) based on a knowledge graph embedding approach. This model feeds prior knowledge into the prediction model as input data and inherits the advantages of the DNILMF model, which can predict under-studied drug–target interactions. Firstly, a knowledge graph embedding model based on relational rotation (RotatE) is trained to construct the interaction adjacency matrix and integrate prior knowledge. Secondly, a dual-network integrated logistic matrix factorization prediction model (DNILMF) is used to predict new drugs and targets. Finally, experiments conducted on public datasets demonstrate that the proposed method outperforms the single baseline model and several mainstream methods in efficiency.


Introduction
In recent years, the discovery of new drugs has seen enormous technological advancement and research investment. However, a drug rarely binds only to its intended target. This may lead to off-target effects and extend drug development time. As a consequence, there is a pressing need for researchers to develop new drugs in more effective ways. Drug repositioning [1] is an essential part of the discovery of new drugs. Herein, it should be pointed out that one of the fundamentals of computational drug repositioning is to accurately predict drug-target interactions. There have been abundant research studies on DTI prediction over the past several decades, including chemical genetic and proteomic methods such as affinity chromatography [2] and expression cloning approaches [3]. However, limited by laboratory experiments and physical resources, these methods can only process a small number of possible drugs and targets. Therefore, computational prediction approaches [4,5] have received much attention, as they lead to a much faster assessment of possible DTIs.
Mei et al. [6] proposed one of the approaches to predict drug-target interactions computationally, using neighbor-based interaction-profile inference for both drugs and targets. KRONRLS-MKL [7] studied a linear combination of multiple similarity measures to model the overall similarity between drugs and targets. However, these models used a simple linear combination technique to predict DTIs. In fact, such a linear setting may not be appropriate when the linear relationship is not evident. In view of this bottleneck, the regularized least squares model integrated with a kernel fusion technique (RLS-KF) [8] employed a nonlinear kernel diffusion technique to combine different kernels and then used the diffused kernel to perform DTI prediction. As a result, this model performs better than the linear combination models. However, when tested with 10-fold cross-validation on the whole dataset, it failed to produce satisfactory results.
Recently, a neighborhood regularized logistic matrix factorization (NRLMF) [9] was developed to predict DTIs by using logistic matrix factorization and a neighborhood smoothing method. The NRLMF model showed encouraging results under 10-fold cross-validation. Moreover, the dual-network integrated logistic matrix factorization (DNILMF) [10], built on NRLMF, used matrix factorization to predict drug-target interactions over drug information networks and showed significant improvements over other methods on standard benchmarking datasets. Other models, such as DTI-CDF and DTI-MLCD, used machine-learning-based methods. DTI-CDF [11] used a pseudo position-specific scoring matrix (PsePSSM) to extract the evolutionary information of protein sequences and added a path-category-based multi-similarity feature (PathCS) based on the heterogeneous graph of DTIs. DTI-MLCD [12] utilized a community detection method to facilitate multi-label classification. Nevertheless, the difficulty of these methods lies in their overdependence on known drug and target information, so latent information between drugs and targets might be missed. In view of this problem, more advanced prior-knowledge-based approaches have been proposed to satisfy various DTI tasks.
The current prior-knowledge-based approaches in this context are arguably DDR [13], NeoDTI and TriModel. DDR used a multiphase procedure to predict drug-target interactions from relevant heterogeneous graphs; nonlinear fusion was employed to combine different similarity indices as well as random walk features from the input graphs. NeoDTI [14] integrated diverse information from heterogeneous networks of drugs and targets. TriModel [15] approached the DTI prediction problem as link prediction in knowledge graphs. In contrast, existing prior-knowledge-based prediction methods such as DDR are best suited to finding new associations between well-studied drugs and targets (useful, for instance, in the drug repurposing context). In the real world, under-studied drugs and targets are more easily obtained than well-studied ones. Therefore, there is a critical need for methods that combine prior knowledge with the ability to predict under-studied drug-target interactions.
Motivated by the previous studies [10,16], a novel dual-network integrated logistic matrix factorization DTI prediction scheme via relational rotation knowledge graph embedding (Ro-DNILMF) is proposed in this article. This model combines knowledge graph embedding and DNILMF. Firstly, we add the tanh function as an optimization function into the knowledge graph embedding to produce better results on this task. Secondly, we construct an interaction adjacency matrix with a knowledge graph embedding model based on relational rotation (RotatE) [16] to improve information integrity. Finally, we feed the interaction adjacency matrix into DNILMF to predict interactions between new drugs and new targets.
The remainder of this article is organized as follows. We briefly introduce basic concepts and related work, such as DNILMF and RotatE, in Section 2. Section 3 details the proposed Ro-DNILMF model for the drug-target interaction prediction task. Experimental results and discussions are presented in Section 4, and the conclusion and future work are given in Section 5.

Principle of the DNILMF
DNILMF is a drug-target interaction prediction model proposed by Hao et al. [10]. It inherits most of the features, and demonstrates the advantages, of the neighborhood regularized logistic matrix factorization (NRLMF) [9]. The logistic matrix factorization of DNILMF is especially suitable for binary variables, and the diffused kernel matrices take the drug-target profile information into account to predict new drugs or targets. Many researchers have built on DNILMF in recent years [17,18], and the architecture of DNILMF is shown in Figure 1. Firstly, the target sequence similarity matrix, the chemical structure similarity matrix and the interaction adjacency matrix are used as input data. Secondly, to infer new drug and target information, the Gaussian kernel matrix and the latent variable matrix are derived from the interaction adjacency matrix. Thirdly, the final kernel matrix is composed by integrating drug or target neighbor information. Finally, the final kernel matrix is fed into the logistic function to yield interaction probabilities between drugs and targets.

Data Preparation
The given input training data consist of the target similarity matrix, the drug similarity matrix and the interaction adjacency matrix. The target sequence similarity matrix is denoted by S_ct (similarity scores among proteins for both datasets are computed using a normalized version of the Smith-Waterman score [19]), which is an N × N square matrix (N is the number of targets). The drug similarity matrix is denoted by S_cd (similarity scores among compounds for both datasets are computed using the SIMCOMP tool [20]), which is an M × M square matrix (M is the number of drugs). The interaction adjacency matrix is denoted by Y_cn, where Y_cn[d, t] = 1 if drug d interacts with target t, and Y_cn[d, t] = 0 otherwise, as shown in Figure 1A-D.
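As a minimal sketch, the construction of Y_cn from a list of known interacting pairs might look as follows (the drug/target identifiers and pairs are illustrative, not taken from the datasets):

```python
import numpy as np

def build_interaction_matrix(pairs, drugs, targets):
    """Build the binary interaction adjacency matrix Y_cn:
    Y_cn[d, t] = 1 if drug d interacts with target t, else 0."""
    d_idx = {d: i for i, d in enumerate(drugs)}
    t_idx = {t: i for i, t in enumerate(targets)}
    Y = np.zeros((len(drugs), len(targets)), dtype=int)
    for d, t in pairs:
        Y[d_idx[d], t_idx[t]] = 1
    return Y

# Toy example: drug D1 interacts with targets T1 and T3.
Y_cn = build_interaction_matrix([("D1", "T1"), ("D1", "T3")],
                                drugs=["D1", "D2"],
                                targets=["T1", "T2", "T3"])
```

Here D2 is a "new drug" in the sense of the definition below: its row in Y_cn contains no interactions.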

Definition
In the DNILMF model, "known drug", "new drug", "known target" and "new target" are defined as follows [8]. A "known drug" refers to a drug that has at least one interaction with targets (e.g., D1 in Figure 1A,B, respectively), while a "new drug" refers to a drug that does not have any interaction with targets (e.g., D1 in Figure 1C,D, respectively) in the dataset. A "known target" refers to a target that has at least one interaction with drugs (e.g., D1 in Figure 1A,C, respectively), while a "new target" refers to a target that does not have any interaction with drugs (e.g., D1 in Figure 1B,D, respectively) in the dataset.

Latent Matrix and Gaussian Kernel Matrix Construction
The goal of this model is to use known drugs and targets to infer information about new drugs and targets. Specifically, the algorithm uses the known drug/target interaction profiles to build the new drug/target latent matrix and the Gaussian kernel matrix. The known drug/target interaction profiles (denoted by Y_u for known drug u and Y_v for known target v) are inferred from Y_cn. For example, for a known drug u, the interaction profile is calculated from its nearest neighbors, whose interactions are extracted from Y_cn. The known target interaction profile Y_v is calculated in a similar way. After that, the new drug/target latent variable matrices (denoted by D_i for new drug i and T_j for new target j) are calculated from Y_u/Y_v. The formulations are as follows:

D_i = (Σ_u S_iu^cd · Y_u) / (Σ_u S_iu^cd)    (1)

T_j = (Σ_v S_jv^ct · Y_v) / (Σ_v S_jv^ct)    (2)

where S_iu^cd is the similarity score between new drug i and known drug u, and S_jv^ct is the similarity score between new target j and known target v. Once the profiles are inferred for all new drugs and targets, the Gaussian kernel matrices, denoted by K_gd(d_i, d_x) (x = 1, 2, ..., M) and K_gt(t_j, t_z) (z = 1, 2, ..., N), are calculated as Formulas (3) and (4):

K_gd(d_i, d_x) = exp(−φ ‖Y_di − Y_dx‖²)    (3)

K_gt(t_j, t_z) = exp(−φ ‖Y_tj − Y_tz‖²)    (4)

where Y_t· is the target interaction profile, Y_d· is the drug interaction profile and φ is the kernel bandwidth.
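A compact sketch of these two steps, assuming the similarity-weighted nearest-neighbor inference described above and the Gaussian kernel of Formulas (3) and (4) (the neighbor count k and bandwidth phi are illustrative choices, not the tuned values):

```python
import numpy as np

def infer_profile(S, Y, i, k=3):
    """Infer the interaction profile of entity i as the similarity-weighted
    average of the profiles of its k most similar known entities."""
    sims = S[i].copy()
    sims[i] = 0.0                        # exclude the entity itself
    nn = np.argsort(sims)[::-1][:k]      # k nearest neighbors by similarity
    w = sims[nn]
    return w @ Y[nn] / (w.sum() + 1e-12)

def gaussian_kernel(Y, phi=1.0):
    """K(i, x) = exp(-phi * ||Y_i - Y_x||^2) over interaction profiles."""
    sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-phi * sq)
```

The kernel matrix is symmetric with ones on the diagonal, since every profile has zero distance to itself.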

Final Diffused Matrix Construction
To add the similarity network information between drugs and targets into the model, the final diffused matrices for drugs and targets (denoted by S_d for drugs and S_t for targets) are built from the similarity matrices S_ct and S_cd and the Gaussian kernel matrices K_gd(d_i, d_j) and K_gt(t_i, t_j). These matrices are normalized and symmetrized. The resulting matrices are status similarity matrices, denoted by P(1), P(2), P(3) and P(4) for S_ct, S_cd, K_gd(d_i, d_j) and K_gt(t_i, t_j), respectively. The status similarity matrices are iterated for a given number of steps, t, for drugs and targets, respectively. After the iteration process finishes, the final diffused matrices are generated. For details of the calculation procedure, the reader can refer to previous studies [21].

Interaction Probabilities Score Calculation
The interaction probability score is key to predicting interactions between drugs and targets. A high score indicates a higher chance of a drug-target interaction. To obtain the interaction probability score, a logistic function is applied with the final drug diffused matrix S_d and the final target diffused matrix S_t. The formulation is as follows:

P = exp(α DT^T + ρ S_d DT^T + τ DT^T S_t) / (1 + exp(α DT^T + ρ S_d DT^T + τ DT^T S_t))    (5)

where α, ρ and τ are the corresponding smoothing coefficients, which sum to 1, and T^T denotes the transpose of T.

RotatE
RotatE is a knowledge graph embedding model based on relational rotation in complex space. It is able to model and infer three relation patterns (i.e., symmetry/antisymmetry, inversion, and composition) from the observed facts. The RotatE method proceeds as follows. Firstly, to initialize the knowledge graph embeddings, three types of relation patterns are defined. Then, after the distance from the source entity to the target entity is calculated, self-adversarial negative sampling is used to optimize the embeddings. Finally, a score function is proposed to measure the salience of a candidate triplet.

Three Relation Patterns Definition
Specifically, for a given triplet (h, r, t), h represents the source entity, t represents the target entity, and r is the relation between h and t. RotatE defines each relation as a rotation from the source entity to the target entity. The relation types are symmetry/antisymmetry, inversion, and composition. According to the existing literature [22], the three relation patterns are defined as follows. A relation r is symmetric (antisymmetric) if ∀h, t: r(h, t) ⇒ r(t, h) (r(h, t) ⇒ ¬r(t, h)). The relation r_1 is inverse to relation r_2 if ∀h, t: r_2(h, t) ⇒ r_1(t, h). The relation r_1 is composed of relations r_2 and r_3 if ∀h, t, z: r_2(h, z) ∧ r_3(z, t) ⇒ r_1(h, t).

Embeddings Optimization
RotatE initializes its embeddings with random noise and updates them by self-adversarial negative sampling, so that true triplets score much higher than corrupted false triplets. The negative sampling loss function is:

L = −log σ(γ − d_r(Θ(h), Θ(t))) − Σ_{i=1}^{Tri_n} p(h_i, r_i, t_i) · log σ(d_r(Θ(h_i), Θ(t_i)) − γ)    (6)

where σ is the optimization function, γ denotes the fixed margin, Tri_n is the number of negative triplets, Θ(·) is the embedding, p(h_i, r_i, t_i) represents the weight of the negative sample (h_i, r_i, t_i) and d_r(Θ(h), Θ(t)) is the distance function. p(h_i, r_i, t_i) is calculated as Formula (7):

p(h_j, r_j, t_j) = exp(−β d_r(Θ(h_j), Θ(t_j))) / Σ_i exp(−β d_r(Θ(h_i), Θ(t_i)))    (7)

where β is the temperature. The distance function is as follows:

d_r(Θ(h), Θ(t)) = ‖Θ(h) ∘ Θ(r) − Θ(t)‖    (8)

where ‖·‖ is the Euclidean distance and ∘ denotes the Hadamard product.
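The distance function and the self-adversarial weighting can be sketched in a few lines. This is a toy illustration of Formulas (7) and (8), not the authors' implementation; the embedding dimension and β value are arbitrary:

```python
import numpy as np

def rotate_distance(h, r_phase, t):
    """d_r = ||h ∘ r - t||, where r is a unit-modulus complex rotation
    parameterized by its phase angles."""
    r = np.exp(1j * r_phase)             # |r_i| = 1 by construction
    return np.linalg.norm(h * r - t)

def self_adversarial_weights(distances, beta=1.0):
    """p_i ∝ exp(-beta * d_i): harder (closer) negative samples
    receive larger weight in the loss."""
    logits = -beta * np.asarray(distances)
    e = np.exp(logits - logits.max())    # stable softmax
    return e / e.sum()
```

Note that rotating the 1-dimensional embedding h = 1 by a phase of π lands exactly on t = −1, giving distance zero, which is how RotatE represents a perfect match.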

Score Function Definition
The score function scores true triplets much higher than corrupted false triplets. To measure the salience of a candidate triplet (h, r, t), the score function is defined as follows:

f_RotatE(h, r, t) = γ − d_r(Θ(h), Θ(t)) = γ − ‖Θ(h) ∘ Θ(r) − Θ(t)‖    (9)

DNILMF can achieve good performance on new drugs and targets. However, it does not incorporate prior knowledge, which is important for enhancing predictive accuracy. RotatE achieves good performance on known drugs and targets, but it is not suited to finding new associations between new drugs and targets. Therefore, to improve the performance of DNILMF, a novel DTI prediction method combining DNILMF and RotatE, called Ro-DNILMF, is proposed in this article.

Architecture
In this section, we describe the proposed Ro-DNILMF model for drug-target interaction prediction. The scheme adopts a knowledge graph embedding model to integrate prior knowledge, and DNILMF, as the base prediction model, is used to predict interactions between new drugs and targets. The model is graphically illustrated in Figure 2 and comprises three main stages: data preparation, construction of the interaction adjacency matrix, and training of the DNILMF model. Firstly, all of the input data are integrated into triplets. Then, the RotatE model is trained to optimize the embeddings with the negative sampling loss function, in which a self-adversarial temperature method is used to choose the temperature. The interaction adjacency matrix is generated by the score function so as to integrate prior knowledge into the prediction model. Finally, the new interaction adjacency matrix is applied to the DNILMF model, and the latent variable matrices and the final diffused matrices are integrated into the logistic function to obtain the interaction probabilities between new drugs and targets.


Data Preparation
The knowledge graph embedding model requires the data to be modeled in triplet form, where the objective is to predict new links between entities. In the case of drug discovery, the input data include the triplets (h, r, t), the target sequence similarity matrix S_ct, and the chemical structure similarity matrix S_cd.

Embedding Initialization
As illustrated in Figure 2, all triplets must first have their embeddings initialized by RotatE. The entity embeddings, Θ_0(h) and Θ_0(t), are initialized with random noise. The relation embeddings, Θ_0(r), are calculated based on Euler's identity: each element r_i is a unit-modulus complex number

r_i = e^{iθ_i} = cos θ_i + i sin θ_i

The relation types include symmetry/antisymmetry, inversion, and composition. If a relation r is symmetric (antisymmetric), each element r_i of its embedding Θ_0(r) satisfies r_i² = 1 (r_i² ≠ 1). If two relations r_1 and r_2 are inverse, each element of their embeddings satisfies r_2i = r̄_1i, i.e., the complex conjugate. If a relation r_3 is a composition of two relations r_1 and r_2, each element of their embeddings satisfies r_3i = r_1i · r_2i.
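A small numeric illustration of the initialization and the relation-pattern constraints (the embedding dimension and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each element of a relation embedding is a rotation on the unit circle,
# r_i = e^{i*theta_i} = cos(theta_i) + i*sin(theta_i) (Euler's identity).
theta = rng.uniform(0, 2 * np.pi, size=8)
r1 = np.cos(theta) + 1j * np.sin(theta)

# Pattern constraints: the inverse relation carries conjugate phases,
# and composition is the element-wise product of rotations.
r2 = np.conj(r1)   # inverse of r1
r3 = r1 * r2       # r1 composed with its inverse: the identity rotation
```

Because every element lies on the unit circle, composing a relation with its inverse rotates each coordinate back to 1.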

Embedding Optimization
In order to update the embeddings, self-adversarial negative sampling is used to train the distance d_r(Θ(h), Θ(t)), reducing the distance of true triplets and enlarging the distance of corrupted false triplets. The negative sampling loss function is given by Formula (6). Because the traditional handcrafted sampling temperature is inefficient, a method called self-adversarial temperature is adopted to choose the temperature according to the current training level. The negative sampling probability is calculated by Formula (7), and the temperature, denoted by β, is obtained from the initial temperature β_0 and the sigmoid function ω. The final values of each triplet embedding (Θ(h), Θ(r), Θ(t)) are generated by Formulas (6) and (8).

Interaction Adjacency Matrix Construction
To construct the interaction adjacency matrix, the score function f_RotatE is trained to score each triplet (Θ(h), Θ(r), Θ(t)) by Formula (9) and select new relations. If f_RotatE gives a score higher than the minimum passing score, denoted by ξ, the relation r is added to the interaction adjacency matrix as a new element, Y_cn[h, t] = 1; otherwise, Y_cn[h, t] = 0. The interaction adjacency matrix thus integrates prior knowledge, which prepares for DTI prediction in the next stage.
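The thresholding step can be sketched as follows (the score values and the threshold ξ are illustrative):

```python
import numpy as np

def score_to_adjacency(scores, xi):
    """Turn RotatE scores f_RotatE(h, r, t) into binary adjacency entries:
    1 if the score exceeds the minimum passing score xi, else 0."""
    return (np.asarray(scores) > xi).astype(int)
```

For example, with scores [0.2, 1.5, -0.3] and ξ = 0.5, only the second candidate triplet is added to Y_cn.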

Predicting DTI with DNILMF
As shown in Figure 2, the interaction adjacency matrix Y_cn is constructed by RotatE and integrated into DNILMF as one of the input matrices, together with S_ct and S_cd.

Latent Variable Matrix and Gaussian Kernel Matrix Construction
Combined with the above interaction adjacency matrix Y_cn, the drug latent variable matrix D_i and the target latent variable matrix T_j are generated to predict new DTIs. The important steps are summarized as follows: (1) the interaction profile is built. For a known drug u, the interaction profile Y_u is calculated from its nearest neighbors, whose interactions are extracted from Y_cn; for a known target v, the interaction profile Y_v is calculated in the same way. (2) The latent variable matrix is calculated by multiplying the similarity scores with the interaction profiles. According to Formulas (1) and (2), the latent matrices D_i and T_j are calculated by:

D_i = (Σ_u S_iu^cd · Y_u) / (Σ_u S_iu^cd)

T_j = (Σ_v S_jv^ct · Y_v) / (Σ_v S_jv^ct)

where Y_u and Y_v are the known drug u interaction profile and the known target v interaction profile, respectively. According to Formulas (3) and (4), the Gaussian kernel matrices, denoted by K_gd(d_i, d_x) (x = 1, 2, ..., M) for drug i and K_gt(t_j, t_z) (z = 1, 2, ..., N) for target j, are calculated by:

K_gd(d_i, d_x) = exp(−φ ‖Y_di − Y_dx‖²)

K_gt(t_j, t_z) = exp(−φ ‖Y_tj − Y_tz‖²)

where Y_d· and Y_t· are the drug and target interaction profiles. Thus, after the above calculation, D_i, T_j, K_gd(d_i, d_x) and K_gt(t_j, t_z) are constructed.

Final Diffused Matrix Construction
In the DNILMF model, the final diffused matrix is constructed to integrate neighbor information between drugs and targets. It is calculated from the Gaussian kernel matrix and the similarity matrix. Specifically, for a new target, the similarity matrix S_ct is first converted into a kernel matrix according to previous studies [23]. By normalizing and symmetrizing both the target kernel matrix and the target Gaussian kernel matrix, the status matrices, denoted by P(1) and P(2), are constructed for the target kernel matrix and the target Gaussian kernel matrix, respectively. The final diffused matrix is calculated by multiplying the local similarity matrix L for each P matrix with the status matrix over t iterations. The local similarity matrix for each P matrix is calculated by the following equation:

L_ij = P_ij / Σ_{x∈N_i} P_ix  if j ∈ N_i;  L_ij = 0  otherwise

where N_i denotes the nearest neighbors of target i and k is the number of nearest neighbors. Note that this operation sets the similarities among non-nearest neighbors to zero. P(1)_{t+1} and P(2)_{t+1} are calculated by the following equations:

P(1)_{t+1} = L(1) × P(2)_t × (L(1))^T

P(2)_{t+1} = L(2) × P(1)_t × (L(2))^T

where P(1)_{t+1} is the status matrix of the target kernel matrix after t iterations, and P(2)_{t+1} is the status matrix of K_gt(t_i, t_j) after t iterations. To make P(1)_t and P(2)_t symmetrical, in each iteration the status matrices are further updated as follows:

P_t = (P_t + P_t^T + I) / 2

where I denotes the identity matrix. After t steps, the final target diffused matrix S_t is calculated from P(1)_t and P(2)_t. For a new drug, after applying the same steps, we obtain the final drug diffused matrix S_d.
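A toy, SNF-style sketch of the diffusion procedure described above (a hypothetical illustration; the neighbor count k, the number of iterations and the final averaging are arbitrary choices, not the tuned settings):

```python
import numpy as np

def local_similarity(P, k):
    """Keep each row's k nearest neighbors and renormalise the row;
    similarities to non-nearest neighbors are set to zero."""
    L = np.zeros_like(P)
    for i in range(P.shape[0]):
        nn = np.argsort(P[i])[::-1][:k]
        L[i, nn] = P[i, nn] / P[i, nn].sum()
    return L

def diffuse(P1, P2, k=2, steps=3):
    """Cross-diffuse two status matrices, symmetrising after each step,
    and average them to obtain the final diffused matrix."""
    L1, L2 = local_similarity(P1, k), local_similarity(P2, k)
    I = np.eye(P1.shape[0])
    for _ in range(steps):
        # simultaneous update: each matrix is diffused through the other
        P1, P2 = L1 @ P2 @ L1.T, L2 @ P1 @ L2.T
        P1, P2 = (P1 + P1.T + I) / 2, (P2 + P2.T + I) / 2
    return (P1 + P2) / 2
```

Because each iteration ends with an explicit symmetrisation, the final diffused matrix is symmetric by construction.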

Interaction Probability Calculation
Using the latent variable matrices and the final diffused matrices, the interaction probabilities P between new drugs and targets are obtained. The equation is as follows:

P = exp(α DT^T + ρ S_d DT^T + τ DT^T S_t) / (1 + exp(α DT^T + ρ S_d DT^T + τ DT^T S_t))

where α, ρ and τ are the smoothing coefficients, which sum to 1.
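The probability calculation can be sketched as follows, assuming the logistic form with smoothing coefficients α, ρ, τ summing to 1 (the coefficient values and toy matrices are arbitrary):

```python
import numpy as np

def interaction_probability(D, T, S_d, S_t, alpha=0.5, rho=0.25, tau=0.25):
    """Logistic interaction probabilities:
    P = sigmoid(alpha*D@T.T + rho*S_d@(D@T.T) + tau*(D@T.T)@S_t),
    with alpha + rho + tau = 1 (smoothing coefficients)."""
    Z = D @ T.T                          # raw latent-factor scores
    logits = alpha * Z + rho * S_d @ Z + tau * Z @ S_t
    return 1.0 / (1.0 + np.exp(-logits))
```

The logistic mapping guarantees every entry lies strictly between 0 and 1, so the output can be read directly as an interaction probability.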

Data Preparation and Experimental Settings
To demonstrate the effectiveness of our proposed scheme, it is thoroughly evaluated on the Kyoto Encyclopedia of Genes and Genomes (KEGG) dataset [24], the DrugBank dataset [25] and the Yamanishi_08 dataset [26].

Dataset Preparation
The KEGG dataset is a large benchmark dataset covering metabolism, cellular processes, diseases, drug pathways, genetic information processing, environmental information processing, and organismal systems. Its training set contains 10,979 drugs, 13,959 targets and 12,112 interactions.
The DrugBank dataset serves as both a bioinformatics and a cheminformatics resource. In our experiment, its training set contains 1482 drugs, 1408 targets and 9881 interactions.
The Yamanishi_08 dataset is the most frequently used gold-standard dataset in previous state-of-the-art models and is used to validate the proposed model for DTI prediction. The dataset is divided into four groups: enzymes (EN), with 445 drugs and 664 targets; ion channels (IC), with 210 drugs and 204 targets; G-protein coupled receptors (GPCR), with 223 drugs and 95 targets; and nuclear receptors (NR), with 54 drugs and 26 targets. All samples are used for training in our experiment. The dataset information is shown in Table 1.

Experimental Environment
The presented method can easily run on a laptop. All experiments are conducted on a laptop configured with an NVIDIA GeForce MX250 GPU, 8 GB of memory and an Intel Core i5-1021 1.60-GHz processor, running 64-bit Windows 10.

Results and Discussion
In this section, we comprehensively evaluate the performance of the proposed method from several angles: parameter setting, the choice of optimization function in Ro-DNILMF, the performance of the score function in Ro-DNILMF, the performance of Ro-DNILMF under different sample sizes, and comparative results against mainstream prediction methods.

The Optimization Function Determination of Ro-DNILMF
In order to improve the computational efficiency of the loss function, an optimization function is used. This function maps the distance values into a fixed range. This experiment compares two optimization functions: the sigmoid function and the tanh function. The sigmoid function is a common optimization function that maps the gap between the fixed margin γ and the distance d_r to (0, 1):

σ(x) = 1 / (1 + e^{−x})

The tanh function extends the mapping range to (−1, 1):

tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x})

Mean Reciprocal Rank (MRR) and Hit at N (H@N) are the standard evaluation measures used on the Yamanishi_08 dataset.
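The two mappings can be compared numerically (a trivial illustration; the sample gap values are arbitrary):

```python
import numpy as np

def sigmoid(x):
    """Maps any real input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Maps any real input to (-1, 1), preserving the sign."""
    return np.tanh(x)

# Applied to the margin-distance gap (gamma - d_r): tanh keeps the sign
# of the gap, so good and bad triplets land on opposite sides of zero,
# whereas sigmoid compresses both into the positive range.
gap = np.array([-2.0, 0.0, 2.0])
```

For instance, sigmoid maps the gap −2 to roughly 0.12 (still positive), while tanh maps it to a negative value, which is the distinction the comparison below attributes the tanh function's advantage to.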
The results of the tanh-based and sigmoid-based optimization functions are shown in Table 3. The best MRR scores of the sigmoid function and the tanh function are 0.723 and 0.743, respectively. The H@1 and H@3 scores of the sigmoid function are lower than those of the tanh function. In the Hit@10 comparison, the tanh function tops the sigmoid function by up to 0.093. In conclusion, the tanh function outperforms the sigmoid function (e.g., the highest score of the tanh function is 0.884, while the highest score of the sigmoid function is 0.817). We attribute this to how the gap between the fixed margin γ and the distance d_r is mapped: the tanh function preserves both positive and negative values (−1, 1), whereas the sigmoid function produces only positive ones (0, 1). In Table 3, the highlighted values 0.817 and 0.884 mark the best performance of the sigmoid function and the tanh function, respectively.

Performance of the Score Function in Ro-DNILMF
To measure the salience of the score function in Ro-DNILMF, this part trains different embedding models, including TransE [28], ComplEx [29] and RotatE, on the DrugBank dataset. The score function in TransE is −‖h + r − t‖, and the score function in ComplEx is Re(⟨r, h, t̄⟩), where Re(·) takes the real component. The score function in RotatE is given in Formula (9). Figure 3 shows Hit@N for the different embedding models trained with their best optimal parameters. In Figure 3a, RotatE achieves the highest hit score (79%) and the maximum difference is 30%. In Figure 3b, RotatE achieves the highest hit score (88.4%) and the maximum difference is 26%. In Figure 3c, RotatE achieves the highest hit score (88.5%) and the maximum difference is 14%. These results show that the scores of the RotatE model are higher than those of the other embedding models. This is because RotatE defines each relation as a rotation from the source entity to the target entity, and its explicit treatment of relation types produces better generalization.


Performance of Ro-DNILMF under Different Samples
In order to test the prediction performance of the Ro-DNILMF model under different sample sizes, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) are evaluated on the KEGG dataset. This experiment increases the number of training samples from 0 to 2000 in steps of 100. As can be seen from Figure 4, the Ro-DNILMF model is more robust than the DNILMF model when there are fewer samples. With 100 training samples, the AUC score of the Ro-DNILMF model is 0.903, while that of the DNILMF model is 0.59. With 1000 training samples, the AUPR score of the Ro-DNILMF model is 0.96, while that of the DNILMF model is 0.72. With 1500 training samples, the AUC score of Ro-DNILMF is 0.965, while that of DNILMF is 0.892. With 2000 training samples, the AUPR score of the Ro-DNILMF model is 0.972, while that of the DNILMF model is 0.945.
These results show that the Ro-DNILMF model converges significantly faster than the DNILMF model as the number of training samples increases.


Comparison with Other Mainstream Methods
We further compare the presented method with other state-of-the-art methods, namely BLM-NII, KRONRLS-MKL, NRLMF and DNILMF. The comparative results are shown in Figure 5. Note that all of the comparative methods are tuned with the optimal parameters from previous works [10,30-32]. The performance of each method is tested on the Yamanishi_08 dataset and evaluated with AUC and AUPR. As can be seen from Figure 5, the scores of BLM-NII and KRONRLS-MKL are lower than those of the other methods, while NRLMF and DNILMF achieve higher AUC and AUPR scores on the EN dataset. Although the proposed method scores slightly lower than NRLMF and DNILMF on the EN dataset, its AUPR and AUC scores are clearly the highest on the other three datasets (AUPR: 72.6%, 91.2% and 62.5%; AUC: 94.5%, 98.6% and 91.3%).

Comparison with Other Combination Models
To verify the contribution of RotatE, we compare the performance of RotatE with previous knowledge graph embedding models on the Yamanishi_08 dataset, including TransE, DistMult [33], HolE [34], ComplEx [29] and ConvE [35]. In this experiment, each of these embedding models, including RotatE, is combined with DNILMF and NRLMF, respectively. The AUC and AUPR scores of all combined models are shown in Table 4. Ro-DNILMF performs better than the plain combination of RotatE and DNILMF thanks to its optimization function, and the combination of RotatE and DNILMF in turn outperforms the other combined models. Notably, almost all of the combined models outperform their baseline model. We attribute this to the knowledge graph embedding model added to the prediction models: the new sample information produces better performance. In Table 4, the green part marks the performance of the Ro-DNILMF model and the blue part marks the best performance among the other combination models.

Conclusions and Future Work
Ro-DNILMF is an efficient drug-target interaction prediction model designed on the basis of RotatE and DNILMF. The method uses RotatE to learn efficient vector representations for both drugs and targets and constructs the interaction adjacency matrix to integrate prior knowledge. DNILMF is then trained to predict new drug/target interactions. The method addresses the growing amount of prior knowledge available in the real world: combining prior knowledge with the prediction of new drugs and targets markedly improves prediction accuracy.
Moreover, the tanh function was added into RotatE to increase its generalization capability. Experiments conducted on benchmark datasets demonstrate that the proposed method achieves high efficiency and better effectiveness than many other popular methods. Our experiments also show that a prediction model equipped with knowledge graph embedding can improve accuracy.
In future work, we will further explore the relationship between drug-target interactions and annotation information, and extend this method to more complicated applications. Last but not least, selected predictions of our model will be validated in laboratory experiments to demonstrate the clinical relevance of our results.