Interpretable Graph-Embedding Framework Based on Joint Feature Similarity for Drug–Drug Interaction Prediction

Li, Xiaowei; Chen, Cheng; Zhao, Zihao; Wang, Qingyong; Gu, Lichuan

doi:10.3390/electronics15030712


by Xiaowei Li^1,2,†, Cheng Chen^1,2,†, Zihao Zhao^1,2, Qingyong Wang^1,2 and Lichuan Gu^1,2,*

¹ School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China

² Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Hefei 230036, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2026, 15(3), 712; https://doi.org/10.3390/electronics15030712

Submission received: 9 January 2026 / Revised: 3 February 2026 / Accepted: 4 February 2026 / Published: 6 February 2026


Abstract

Deep learning methods have been extensively used for drug–drug interaction (DDI) prediction, aiding the development of effective and safe combination therapies. Most studies focus on either the internal molecular structure or the external contextual information of individual drugs to improve feature diversity and validity. However, the latent similarities between drug pairs, which are essential for accurate predictions, have largely been overlooked. Therefore, we propose PINGE, an interpretable graph-embedding approach to prediction that relies solely on the interaction network of drugs. Specifically, we constrain the joint features of drug pairs by their interactions, so that pairs with similar interaction types have joint features with high cosine similarity. This similarity in direction helps the joint features converge to the same class during prediction. Additionally, each known drug can link to multiple others, which enhances feature diversity. Extensive experiments demonstrate that PINGE outperforms current advanced prediction methods on both the KEGG and DrugBank datasets, achieving improvements of 0.7% and 2.4% in ACC while providing network structure-based explanations for its predictions. Furthermore, PINGE surpasses advanced baselines by 1% and 1.1% in AUC on the human drug–target dataset and the HuRI protein–protein interaction dataset, showcasing excellent versatility.

Keywords:

drug-drug interaction prediction; graph embedding; joint feature; cosine similarity

1. Introduction

Drug–drug interactions (DDIs) occur when one drug’s effects are altered by another, leading to unexpected adverse effects or reduced efficacy [1]. These interactions can arise from pharmacokinetic mechanisms (affecting absorption, distribution, metabolism, or excretion) and pharmacodynamic mechanisms (modifying effects at the receptor level). The clinical significance of DDIs is substantial; they can result in serious adverse reactions, increased toxicity, or therapeutic failure [2]. For example, the interaction between warfarin and certain antibiotics raises bleeding risk, while combining antihypertensives with non-steroidal anti-inflammatory drugs (NSAIDs) may impair blood pressure control [3]. Understanding and predicting these interactions is essential for patient safety and optimizing treatment outcomes. Furthermore, within the evolving digital transformation of healthcare, such as Hospital 4.0 and 5.0, the integration of smart sensors and advanced data science is essential for creating a safer healthcare environment and facilitating data-driven decision-making [4].

However, leveraging large-scale DDI networks may introduce privacy and governance risks when the underlying evidence is associated with sensitive biomedical or clinical records. To mitigate these risks, prior studies have explored blockchain-based mechanisms to enhance decentralized data provenance, integrity, and auditability, as well as homomorphic encryption to support model training or inference directly on encrypted data without exposing raw information. Such privacy-preserving approaches are important for responsible deployment and for strengthening clinician and stakeholder trust in real-world clinical decision support systems [5,6].

Within the Internet of Medical Things (IoMT) landscape, protecting sensitive patient information such as diagnoses and drug histories is particularly critical. Aligning the PINGE framework with real-world healthcare data protection standards therefore necessitates robust cryptographic mechanisms and secure data storage architectures. Recent studies on bit-count transmutation encryption models [7] highlight that integrating decentralized storage solutions such as the InterPlanetary File System (IPFS) with blockchain technology can provide a resilient security infrastructure for safeguarding medical records. Such measures are essential not only for preventing unauthorized access, but also for ensuring data integrity and auditability in decentralized clinical environments.

Accurate prediction of drug–drug interactions (DDIs) is crucial for preventing adverse events and ensuring effective treatment. Early identification of potential interactions enables healthcare professionals to make informed prescribing decisions and adjust dosages as needed [8]. By leveraging large-scale biomedical data, deep learning approaches enable more accurate and robust drug–drug interaction (DDI) predictions [1]. However, the challenge remains to create models that not only predict interactions effectively but also provide clear explanations for their outcomes.

The use of deep learning models in healthcare has raised concerns about the interpretability of their predictions [9]. While complex models like deep neural networks can achieve high accuracy, their “black-box” nature makes it hard for clinicians to understand the rationale behind them. In drug–drug interactions (DDIs), interpretability is crucial as it helps healthcare professionals grasp interaction mechanisms, validate predictions, and make informed decisions [10]. Interpretable models bridge the gap between predictive accuracy and clinical usability by revealing which features influence predictions and how drugs interact [11]. This transparency is essential for gaining clinician trust and effectively integrating predictive models into clinical practice.

With advancements in graph neural networks and embedding algorithms, quantifying drug chemical structures has become feasible. Additionally, biomedical knowledge graphs (KGs) containing extensive external drug feature information have found significant applications in DDI prediction [12]. The extraction of drug features has shifted from simple similarity fusion to aggregating semantic information from KG-associated features or emphasizing structural details. Novel graph attention methods have been developed on KGs, allowing different drug pairs to generate unique attention paths linking their entities [13]. These paths represent the dynamic generation of drug prediction features based on specific pairs, enhancing feature diversity. Recent studies have focused on increasing this diversity while neglecting the hidden similarities among drugs with similar interaction types. Earlier similarity fusion methods also struggled to capture these similarities effectively, as they did not consider joint features of drug pairs and only assessed individual drugs’ similarities [14].

To uncover hidden similarities among drug pairs’ joint features, we introduce a computational hypothesis: drug pairs with the same interaction type should exhibit similarity in their joint features within the prediction module. This module can be viewed as a learnable parameter matrix that maps the joint feature vectors of drug pairs sharing an interaction type to the same one-hot vector. Based on this assumption, we propose an interpretable predictive approach for graph embedding called PINGE, which relies solely on its own DDI network to accurately predict DDIs without external drug features. Consequently, it significantly reduces prediction costs and can be generalized to other domains as a universal method. The core idea of PINGE is to group drug pairs with similar interaction types under a unified relation in the training set. This allows analogous drug pairs to share directional features, forming a relation domain in feature space. This approach aligns with knowledge representation learning algorithms that generate entity embeddings through triples construction. While these algorithms focus on semantic accuracy from knowledge graphs, their application to DDI graphs proves advantageous; the resulting entity embeddings are highly representative and align well with the similarity features required for DDI prediction. Additionally, we maintain diversity in joint features by considering the specificity of different relations. The relation domain is shaped by relation embeddings, and by ensuring variability in their directional aspects, we achieve diverse joint features. In summary, the main contributions of this paper are as follows:

We propose an interpretable predictive approach for graph embedding called PINGE, which maps the joint features of drug pairs with identical interaction types to a common relation domain.
We demonstrate that input features in networks exhibiting similar cosine directions tend to share analogous distributions of maximum values.
The results consistently showed that PINGE outperformed existing state-of-the-art models across all four real-world datasets, highlighting its strong predictive performance and versatility.

The remainder of this paper is organized as follows. In Section 2, we summarize some studies related to our work. Section 3 introduces our method. In Section 4, we carry out some experiments. Section 5 concludes this paper.

2. Related Works

DDI prediction can be viewed as a representative problem in Artificial Intelligence, where decision errors may incur disproportionate clinical costs and therefore demand reliable and trustworthy modeling. Similar requirements arise in other high-impact domains such as financial fraud detection and Industrial Internet of Things (IoT) monitoring, in which models support cost- or safety-critical decisions under imperfect and heterogeneous data [15].

Across these domains, a recurring bottleneck is the data regime. First, class distributions are often highly skewed: fraudulent events are rare within massive transaction streams [16], and clinically verified adverse DDIs are sparse relative to the combinatorial space of drug pairs. Second, the underlying signals are strongly relational: fraud patterns depend on transaction graphs and account linkages, while IoT anomalies propagate through interconnected devices and sensors [17]; analogously, DDIs emerge from complex biological and pharmacological interaction networks. These shared characteristics motivate graph-based learning frameworks and robustness-aware training strategies that can handle long-tailed supervision and complex relational dependencies.

2.1. Drug–Drug Interaction Prediction

Modern pharmacotherapeutic strategies involve drug combinations to enhance efficacy and reduce toxicity, leading to increased interest in exploring the combinatorial space of approved and investigational drugs for effective and safe therapies across multiple indications [18]. However, the vast number of potential drug pairs raises research costs and reduces clinical trial efficiency [19]. Therefore, computational methods for predicting drug–drug interactions (DDIs) have become essential to address these challenges.

DDIs can be categorized into pharmacokinetic (PK) and pharmacodynamic (PD) interactions [20]. PK interactions occur when one drug affects the absorption, distribution, metabolism, or excretion of another drug. In contrast, PD interactions happen when one drug alters the pharmacological effects of another without influencing its pharmacokinetics [21]. This indicates that different DDI types may arise from distinct molecular structures, enzymes, and pathways in drugs. Early prediction methods aimed to identify potential DDIs by analyzing similarities in various drug features. Researchers proposed that drugs with similar characteristics are likely to exhibit comparable DDIs [22]. While this hypothesis does not fully capture the complexities of drug interactions, it paved the way for further DDI prediction research focused on integrating multidimensional drug features for a more comprehensive similarity assessment. This combined similarity was then applied to DDI prediction [23]. However, acquiring and standardizing these multidimensional features remains challenging; thus, merely merging feature similarities is insufficient for accurate predictions.

2.2. Deep Learning-Based Prediction

Traditional machine learning methods have been effectively used for DDI prediction. Recently, advances in graph neural networks (GNNs) and biomedical knowledge graphs have enabled researchers to incorporate more comprehensive molecular structural information about drugs, beyond external feature similarity alone [11]. For example, one DDI prediction method combines atomic 3D position encoding with an elastic message-passing GNN and uses attention mechanisms, multi-head attention, and adversarial attack detection to improve robustness [24]. However, it only considers interactions between drug molecules and overlooks external factors that may influence these interactions. Furthermore, it focuses solely on binary DDI prediction and does not address multi-class classification performance. In another example, Chen [25] leveraged both drug knowledge and structural information through a cross-fusion strategy that utilizes drug molecular graphs and features from large-scale biomedical knowledge graphs to create robust drug representations. This method considers various factors related to DDI biological processes, ensuring diversity in drug features. Subsequent prediction methods have further examined the diversity of drug features or the representativeness of molecular structures but have overlooked the fundamental similarity of joint features in drug prediction [26]. Beyond DDI prediction, graph-based learning has demonstrated remarkable versatility across the broader landscape of health analytics. For instance, graph analytics frameworks like MEGA have been developed for infodemic risk management, enabling the systematic analysis of health-related misinformation spread [27]. Additionally, federated graph learning has been successfully applied to privacy-preserving, individual-level COVID-19 infection prediction, demonstrating the power of graph models in handling decentralized and sensitive healthcare data [28]. Situating PINGE within this evolving field highlights its potential as a robust and generalized solution for complex relational challenges in modern healthcare.

2.3. Multi-Relational Graph Embedding

TransE is a classical algorithm for modeling multi-relational data, widely used in knowledge question answering, information retrieval, and recommendation systems [29]. With the growth of biomedical knowledge graphs, TransE has also found applications in bioinformatics areas such as drug repositioning and DDI prediction [30]. Several other graph embedding methods have since emerged, including TransA, DistMult, and RotatE, which are outlined below. TransA enhances distance scoring to better model complex entities or relationships [31]. DistMult learns low-dimensional entity vectors while constraining the relationship matrix to be diagonal, which reduces the number of parameters, and scores triples with a bilinear function [32]. RotatE builds on TransE by defining relationships as rotations from source to target entities, allowing for inference of patterns such as symmetry/antisymmetry, inversion, and composition within knowledge graphs [33]. These algorithms focus on constructing triple relationships and effectively integrate joint features into relationship embeddings. Consequently, graph embedding methods that model the triplets of a DDI graph are well suited to representing drug similarity.
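
To make the differences between these scoring functions concrete, the following minimal Python sketch scores one triple under TransE, DistMult, and RotatE using the commonly published forms of the three scores. The toy embeddings and the function names are assumptions introduced here for illustration only; they are not part of PINGE.

```python
import cmath
import math


def transe_distance(h, r, t):
    """TransE: the relation is a translation, so h + r should land near t."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """DistMult: bilinear score with a diagonal relation matrix (higher is better)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def rotate_distance(h, phases, t):
    """RotatE: the relation rotates each complex coordinate of h by a phase angle."""
    return math.sqrt(sum(abs(hi * cmath.exp(1j * p) - ti) ** 2
                         for hi, p, ti in zip(h, phases, t)))


# Toy real-valued embeddings (hypothetical values).
h = [1.0, 2.0]
r = [0.5, -1.0]
t = [1.5, 1.0]  # equals h + r, so the TransE distance is zero

# Toy complex-valued embeddings for RotatE (hypothetical values).
h_c = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]  # rotate by 90 and 180 degrees
t_c = [hi * cmath.exp(1j * p) for hi, p in zip(h_c, phases)]
```

In this sketch the DistMult score is unchanged when head and tail are swapped, whereas the RotatE distance is not, which reflects the symmetry/antisymmetry patterns mentioned above.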

3. Methods

3.1. Problem Definition

In our study, the set of drugs is defined as $D = \{d_{0}, d_{1}, \dots, d_{N_{d}}\}$, where $N_{d}$ is the total number of drugs, and the corresponding feature vectors are represented as $V = \{v_{0}, v_{1}, \dots, v_{N_{d}}\}$.

The set of relationships is represented as $R = \{r_{0}, r_{1}, \dots, r_{N_{r}}\}$, where $N_{r}$ is the total number of interaction types. Let $G = \{(h, t, r) \mid h, t \in D, r \in R\}$ be the DDI graph, where each triplet represents a DDI pair. In this representation, $h$ and $t$ represent two interacting drug entities, while $r$ represents the type of interaction between them. The joint feature in graph $G$ is defined as the concatenation of the individual drug features. For the interactions between drugs, we propose two hypotheses:

Assumption 1.

If there are two pairs of drugs that exhibit the same interaction, it is likely that the underlying mechanisms of interaction for each pair are similar.

Assumption 2.

If there exist two different pairs of drugs that exhibit the same interaction, it is possible to identify two distinct drug combinations. For any drug pair within each combination, it is possible to find a potential intermediate drug in the drug interaction network such that the pathways from both drugs to the intermediate drug are similar.

In previous studies on link prediction in graphs with only positive links, it was assumed that two nodes with many common neighbors are likely to be connected [34]. Combining this with Assumption 2, we propose a new assumption:

Assumption 3.

In a graph with all positive links, if there exists an intermediate node between two non-adjacent nodes such that the path length from the intermediate node to both nodes is the same and less than or equal to 2, then the probability of a connection between those two non-adjacent nodes is higher.

We first propose the following definition for prediction problems:

Definition 1.

(Same Maxima Distribution Space) Let $v_{ht}, v_{ht}^{'} \in \mathbb{R}^{2m}$ be two nonzero real vectors, where $\mathbb{R}$ denotes the set of real numbers and $m$ is the number of dimensions of a single drug feature. If the same linear transformation $W \in \mathbb{R}^{N_{r} \times 2m}$ can be applied to both vectors such that the positions of the resulting maxima are the same, we say that these two nonzero real vectors form the same maxima distribution space.
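
As an illustration of Definition 1, the following minimal Python sketch checks whether a given matrix maps two vectors to outputs whose maxima sit at the same position. The matrix, the vectors, and the helper names (`argmax`, `matvec`, `same_maxima`) are hypothetical and introduced here only for illustration.

```python
def argmax(values):
    """Index of the largest entry (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def matvec(W, v):
    """Multiply an N_r x 2m matrix (list of rows) by a length-2m vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def same_maxima(W, v, v_prime):
    """True if W maps v and v_prime to outputs whose maxima share one index."""
    return argmax(matvec(W, v)) == argmax(matvec(W, v_prime))


# Hypothetical example with N_r = 3 interaction types and 2m = 2.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
v = [2.0, 1.0]
v_close = [3.0, 1.0]   # points in nearly the same direction as v
v_far = [-1.0, 2.0]    # points in a clearly different direction
```

With these toy values, `same_maxima(W, v, v_close)` holds while `same_maxima(W, v, v_far)` does not. Rescaling `v` by any positive factor leaves the position of the maximum unchanged, which is why only the direction of the joint feature matters in the discussion that follows.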

Clearly, a nonzero real vector can form the same maxima distribution space with infinitely many other nonzero real vectors. The question is which vectors form the largest such space with it. Let

$$v_{ht} = [\alpha_{1}, \alpha_{2}, \dots, \alpha_{2m}] \tag{1}$$

$$v_{ht}^{'} = [\beta_{1}, \beta_{2}, \dots, \beta_{2m}] \tag{2}$$

The linear transformation matrix $W$ for them is as follows:

$$W = \begin{bmatrix} w_{1,1} & \dots & w_{1,2m} \\ \vdots & \ddots & \vdots \\ w_{N_{r},1} & \dots & w_{N_{r},2m} \end{bmatrix} \tag{3}$$

Let the position of the maximum value be denoted as $k$. Then we have:

$$W_{k} = \begin{bmatrix} w_{k,1} & \dots & w_{k,2m} \\ \vdots & \ddots & \vdots \\ w_{k,1} & \dots & w_{k,2m} \end{bmatrix} - W \tag{4}$$

After removing the zero row vector from matrix $W_{k}$ (row $k$, which is subtracted from itself), we obtain matrix $W_{k}^{'}$, which resides in the same maxima distribution space. The matrix $W_{k}^{'}$ is denoted as:

$$W_{k}^{'} = \begin{bmatrix} \omega_{1,1} & \dots & \omega_{1,2m} \\ \vdots & \ddots & \vdots \\ \omega_{N_{r}-1,1} & \dots & \omega_{N_{r}-1,2m} \end{bmatrix} \tag{5}$$

Then, $W_{k}^{'} \cdot v_{ht}^{T}$ and $W_{k}^{'} \cdot v_{ht}^{'T}$ can be represented as a set of inequalities:

$$\begin{cases} \alpha_{1}\omega_{1,1} + \dots + \alpha_{2m}\omega_{1,2m} > 0 \\ \vdots \\ \alpha_{1}\omega_{N_{r}-1,1} + \dots + \alpha_{2m}\omega_{N_{r}-1,2m} > 0 \\ \beta_{1}\omega_{1,1} + \dots + \beta_{2m}\omega_{1,2m} > 0 \\ \vdots \\ \beta_{1}\omega_{N_{r}-1,1} + \dots + \beta_{2m}\omega_{N_{r}-1,2m} > 0 \end{cases} \tag{6}$$
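
The construction in Equations (4)–(6) can be sketched in a few lines of Python: subtracting every other row of $W$ from row $k$ gives the rows of $W_{k}^{'}$, and position $k$ holds the strict maximum of $W v$ exactly when every entry of $W_{k}^{'} v$ is positive. The matrix, vector, and function names below are hypothetical values chosen for illustration.

```python
def matvec(W, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def difference_matrix(W, k):
    """Rows of W_k': row k of W minus each other row (the zero row is dropped)."""
    return [[wk - wi for wk, wi in zip(W[k], row)]
            for i, row in enumerate(W) if i != k]


def is_strict_maximum(W, v, k):
    """True when every entry of W_k' v is positive, as in inequality set (6)."""
    return all(entry > 0 for entry in matvec(difference_matrix(W, k), v))


# Hypothetical example with N_r = 3 and 2m = 2.
W = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
v = [2.0, 1.0]
outputs = matvec(W, v)           # the largest entry is at position 0
W_k_prime = difference_matrix(W, 0)
```

Here `W_k_prime` has $N_{r} - 1 = 2$ rows, and `is_strict_maximum(W, v, 0)` holds while the same check fails for positions 1 and 2.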

We can express the above inequalities as follows:

$$\begin{cases} \alpha_{1}\theta_{1} + \dots + \alpha_{2m}\theta_{2m} > 0 \\ \beta_{1}\theta_{1} + \dots + \beta_{2m}\theta_{2m} > 0 \end{cases} \tag{7}$$

The row vectors of $W_{k}^{'}$ belong to the set of solution vectors of inequality set (7). We measure the direction of two vectors with cosine similarity. If the vectors $v_{ht}$ and $v_{ht}^{'}$ have opposite directions, that is, $v_{ht} = \lambda v_{ht}^{'}$ with $\lambda < 0$, then inequality set (7) has no solution. Therefore, the matrix $W_{k}^{'}$ lies in the same maxima distribution space formed by nonzero real vectors whose directions are not opposite to each other. When $2m = 1$, the directions can only be the same or opposite, so inequality set (7) has either infinitely many solutions or no solution. When $2m = 2$, the first inequality in set (7) has a solution for any $v_{ht}$. If $v_{ht}^{'}$ points in the same direction as $v_{ht}$, the solution set of the first inequality is also the solution set of the whole inequality set, and the solution plane is at its largest. As the angle between $v_{ht}$ and $v_{ht}^{'}$ increases, the solution plane gradually shrinks until the directions are completely opposite, at which point its size is 0, as shown in Figure 1a. Similarly, as shown in Figure 1b, inequality set (7) retains these properties when the dimension increases to $2m = 3$. Therefore, we can infer that for any $2m \in \mathbb{N}^{+}$, two vectors with the same direction form the largest same maxima distribution space. The size of the space gradually decreases as the directions diverge, until the vectors point in opposite directions, at which point no same maxima distribution space exists.
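
The shrinking solution region can be checked numerically. The sketch below is a Monte Carlo estimate, not part of the original method: it draws random directions $\theta$ and counts how many satisfy both inequalities in set (7). For two half-spaces through the origin whose normals form an angle $\varphi$, the expected share is $(\pi - \varphi)/(2\pi)$, so the estimate should fall from about 0.5 for identical directions to 0 for opposite ones. The sample size, seed, and function names are assumptions made for illustration.

```python
import math
import random


def solution_fraction(v, v_prime, samples=20000, seed=0):
    """Estimate the share of random directions theta that satisfy both
    theta . v > 0 and theta . v_prime > 0, i.e. inequality set (7)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # A vector of independent Gaussians has a uniformly random direction.
        theta = [rng.gauss(0.0, 1.0) for _ in v]
        dot_v = sum(a * b for a, b in zip(theta, v))
        dot_vp = sum(a * b for a, b in zip(theta, v_prime))
        if dot_v > 0 and dot_vp > 0:
            hits += 1
    return hits / samples


def rotate_2d(v, angle):
    """Rotate a 2-D vector counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


v = [1.0, 0.0]
angles = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
fractions = [solution_fraction(v, rotate_2d(v, a)) for a in angles]
expected = [(math.pi - a) / (2 * math.pi) for a in angles]
opposite = solution_fraction(v, [-x for x in v])
```

The same function accepts vectors of any dimension, so the $2m = 3$ case can be explored by passing three-component vectors.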

In Definition 1, $W$ refers to the parameter matrix of the MLP. The same maxima distribution space can be seen as the largest parameter space that the model can explore while still mapping two vectors to the same category. During training, if two input features that belong to the same category are closely aligned in cosine similarity, then the parameter space in which a suitable $W$ can be found is maximized. As a result, input features that are close in cosine similarity to these training features are likely to be assigned to the same category during prediction. Therefore, we aim for the joint features of drug pairs belonging to the same category to be closely aligned in cosine similarity, residing within the same relationship domain.

Our ultimate task is to learn a prediction function $y_{ij} = F(i, j \mid w, Y, G)$ and complete the mapping from drug combination features to relationship representations $M : D \times D \to R_{D}$, where $y_{ij}$ represents the probability of interaction between drugs $i$ and $j$, and $w$ represents the learnable parameters in the function $F$. As shown in Figure 2, we design a multilayer perceptron (MLP) to accomplish the final prediction task. We concatenate the features of drug pairs and input them into the neural network for training. Finally, we use the trained model to make predictions on the test set. The prediction process can be described as follows:

$$\hat{y}_{ij} = \sigma(\mathrm{MLP}([v_{i} \,\|\, v_{j}])), \tag{8}$$

where $\hat{y}_{ij}$ represents the predicted scores. In both binary and multi-class classification tasks, we use the Softmax function as the activation function $\sigma$. The Softmax function is chosen because it is order-preserving, so the position of the maximum value is unchanged after the mapping.
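
A minimal sketch of Equation (8) is given below. As a simplifying assumption, the MLP is reduced to a single linear layer, and the weights, biases, and drug features are hypothetical toy values. The sketch concatenates the two drug features, applies the layer, and confirms that Softmax leaves the position of the maximum unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, b, v_i, v_j):
    """One linear layer on the concatenated pair feature, followed by softmax."""
    joint = v_i + v_j  # list concatenation plays the role of [v_i || v_j]
    logits = [sum(w * x for w, x in zip(row, joint)) + bias
              for row, bias in zip(W, b)]
    return logits, softmax(logits)


# Hypothetical weights for N_r = 3 classes and m = 2 (so 2m = 4 inputs).
W = [[0.2, -0.1, 0.4, 0.0],
     [-0.3, 0.5, 0.1, 0.2],
     [0.1, 0.1, -0.2, 0.3]]
b = [0.0, 0.1, -0.1]
logits, probs = predict(W, b, [1.0, 2.0], [0.5, -1.0])
predicted_class = max(range(len(probs)), key=lambda i: probs[i])
```

Because the exponential is strictly increasing, the class with the largest logit is also the class with the largest probability, which is the property the text relies on.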

3.2. Extraction of Positive and Negative Samples from DDI Graph

Different negative sampling strategies can impact the generation of drug features. To identify the most suitable negative sampling method for the model, we enumerate all subsets of the available negative sampling methods and validate each of them on two datasets. For each triplet in the DDI training graph, we randomly perform negative sampling on the head node, the tail node, and the relation. By constructing nonexistent triplets using these three types of negative sampling, we aim to maximize the distance score of negative triplets in the drug feature generation module, thereby enhancing the specificity of feature representations for different drug relationships. Negative sampling is primarily aimed at the drug feature generation module.

We employ three types of negative sampling methods, namely head node negative sampling, tail node negative sampling, and relation negative sampling, as illustrated in the negative sampling section of Figure 2. Negative sampling of head and tail nodes is commonly used in knowledge representation learning. By fixing either $(h, r)$ or $(r, t)$, we randomly sample other entities in the graph that are not connected to the given pair to obtain negative samples. We additionally incorporate relation negative sampling, where $(h, t)$ is fixed and a random relation is drawn from the remaining relation types as the negative sample. Its purpose is to increase the specificity of relation embeddings and prevent the training process from causing high similarity between certain relation types.

Due to the different requirements of various tasks, the construction of the DDI graph may vary. Therefore, we employ different sampling strategies tailored to different problems. For instance, in a multi-class classification problem where the graph contains the actual interaction types between drugs, we can perform negative sampling on each triple by randomly combining the three negative sampling methods. On the other hand, in a binary classification problem, we introduce a fictitious relation $y_{ij} = 0$ to ensure that the entire DDI graph structure includes all the relations that need to be predicted. We combine relation negative sampling with the other negative sampling methods to generate negative samples for triples. After the negative sampling process, each positive triple will have multiple corresponding negative samples. These positive and negative samples are fed into the drug feature generation module to generate high-level representations of the drugs.
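
The three corruption modes described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `corrupt`, the retry limit, the exclusion of self-pairs, and the toy drug and relation names are all hypothetical. Triples are ordered $(h, t, r)$ to match the definition of $G$.

```python
import random


def corrupt(triple, entities, relations, known, mode, rng, max_tries=100):
    """Return a corrupted copy of `triple` that is not a known positive.

    `mode` is "head", "tail", or "relation".
    """
    h, t, r = triple
    for _ in range(max_tries):
        if mode == "head":
            candidate = (rng.choice(entities), t, r)
        elif mode == "tail":
            candidate = (h, rng.choice(entities), r)
        elif mode == "relation":
            candidate = (h, t, rng.choice(relations))
        else:
            raise ValueError(f"unknown mode: {mode}")
        is_self_pair = candidate[0] == candidate[1]
        if candidate != triple and candidate not in known and not is_self_pair:
            return candidate
    return None  # no valid corruption found within max_tries


# Toy DDI graph (hypothetical drug and relation names).
entities = ["d0", "d1", "d2", "d3"]
relations = ["r0", "r1", "r2"]
known = {("d0", "d1", "r0"), ("d1", "d2", "r1"), ("d2", "d3", "r0")}
rng = random.Random(0)
positive = ("d0", "d1", "r0")
negatives = {mode: corrupt(positive, entities, relations, known, mode, rng)
             for mode in ("head", "tail", "relation")}
```

Each positive triple can be passed through several modes, or through the same mode several times, to obtain the multiple negative samples per positive triple that the pre-training step expects.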

3.3. Remote Negative Link Generation

In binary classification tasks for link prediction, the dataset usually contains only positive links and lacks verified negative samples. Therefore, most algorithms address this issue by randomly generating an equal number of non-existent links as negative samples. However, we believe that this approach may mistakenly include hidden positive samples as negative samples during training, thereby reducing the predictive capability of the model. Therefore, based on Assumption 3, we propose a negative sample generation method.

In a biological network where all links are positive, let us consider two non-adjacent nodes, $h$ and $h^{'}$, that are connected by paths of length $k \leq 4$. Assuming there are $n$ paths between them, for each path we can find an intermediate node $h_{m}$ such that the difference between the path lengths from the two non-adjacent nodes to $h_{m}$ is minimal, with the difference equal to 0 or 1. The distance score of these two nodes on each path can be represented as follows:

$$p_{s} = |\mathrm{distance}(h \to h_{m}) - \mathrm{distance}(h_{m} \to h^{'})| \tag{9}$$

$$p_{s} = 0 \text{ or } 1 \tag{10}$$

When $p_{s} = 0$, we consider the two nodes to be close, meaning that a link between them is more likely to exist than not. Therefore, we should not classify the link between these two nodes as a negative sample. However, when $p_{s} = 1$, we consider the two nodes to be far apart, and a link between them is less likely to exist than not. In simple terms, if the number of intermediate nodes on a path between the two nodes is less than 4 and odd, their distance is considered small, whereas if it is even, the distance is considered large. In Figure 3a, the dashed line connecting the two nodes represents a close distance, and they should not be classified as negative samples. Two nodes without such a connection are allowed to be classified as negative samples, but this does not mean that they will always be selected; they participate in the random selection of distant nodes. As shown in Figure 3b, we can consider the three possibly fully connected intermediate nodes as a single intermediate node, so the two nodes connected through this intermediate node should not be classified as negative samples.
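
The filtering rule can be sketched with a breadth-first search. The sketch below makes two simplifying assumptions that are not stated in the text: only the shortest path between two nodes is inspected (rather than all $n$ paths), and pairs that are more than four hops apart or disconnected remain eligible as negatives. Along a path of length $d$, the midpoint gives $p_{s} = d \bmod 2$. The function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, max_depth):
    """Shortest-path lengths from `source`, explored up to `max_depth` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def remote_negative_candidates(adj, max_len=4):
    """Non-adjacent pairs that may be drawn as negatives.

    Assumption of this sketch: only the shortest path is inspected, and
    pairs farther apart than `max_len` (or disconnected) stay eligible.
    """
    candidates = set()
    dist_from = {u: bfs_distances(adj, u, max_len) for u in adj}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # existing positive link
        d = dist_from[u].get(v)  # None if farther than max_len
        p_s = None if d is None else d % 2
        if p_s != 0:  # p_s = 1 or unreachable: eligible as a negative
            candidates.add((u, v))
    return candidates


# Path graph a - b - c - d - e (hypothetical toy network).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = {n: set() for e in edges for n in e}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
candidates = remote_negative_candidates(adj)
```

On this toy path graph, the pairs at distance 2 or 4 are excluded and only the pairs at distance 3 remain available for random selection as negative samples.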

3.4. Pre-Train Drug Feature

The fitting of PINGE’s joint feature similarity relies on the construction of triplet relationships in the interaction network, where drug pairs with the same mechanism of action are mapped to the same relationship domain. This mapping is consistent for all triplets in the graph, and the joint features of the head and tail entities in the same link need to be directed to the same relationship domain. The joint representation of drug pairs affects the size and direction of the joint features and further influences the choice of the mapping function. In this paper, we use feature concatenation to obtain the joint features of drug pairs. This approach retains the differences between drug entities while fully incorporating the mapping function. We hope that a good mapping function can make the joint features closer to each other in terms of direction, under the assumption of the same type of interaction.

A closer direction implies a larger same maxima distribution space shared by the joint features, making them more likely to be assigned to the same category by predictive models. For the DDI graph, we need to consider the directionality of triplets. When dealing with directed graphs, we define the tail entity as the element-wise rotation of the head entity with respect to the relationship representation. The specific process is shown in Figure 2b, where we introduce different mapping patterns for all triplets in the training set based on the task, and then update the corresponding entity and relationship representations according to the positive or negative samples. PINGE is not limited to RotatE; any algorithm that possesses the above characteristics is feasible, such as the classical TransE, which can also achieve good results in similar tasks.

We use RotatE to generate drug representations. RotatE is inspired by Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$, under which multiplication by a unit-modulus complex number can be viewed as a rotation in the complex plane. Specifically, the RotatE model maps entities and relations to a complex vector space and defines each relationship as an element-wise rotation from the source entity to the target entity. Given a triplet $(h, t, r)$, we expect $v_{t} = v_{h} \odot u_{r}$, where $v_{h}, v_{t}, u_{r} \in \mathbb{R}^{m}$, $|u_{ri}| = 1$, and $\odot$ represents the element-wise (Hadamard) product. We begin by randomly initializing the embeddings of all nodes in the graph and the edges in the training set. This results in embeddings for both nodes and relationships, with their values constrained within the range $(-\gamma, \gamma)$, where $\gamma$ represents a fixed margin. Subsequently, we apply the RotatE model to calculate the distance score for each triplet as follows:

$$d_{r}(h, t) = \| v_{t} - v_{h} \odot u_{r} \| \tag{11}$$
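
Equation (11) can be illustrated with Python's built-in complex numbers. The sketch assumes, as in the original RotatE formulation, that embeddings are stored as complex vectors and that each relation coordinate is parameterized by a phase angle so that $|u_{ri}| = 1$ holds by construction. The embedding values and function name are hypothetical.

```python
import cmath
import math


def rotate_distance(v_h, phases, v_t):
    """Equation (11): norm of v_t - v_h * u_r (element-wise product),
    with u_r = exp(i * phase) so that every |u_ri| equals 1."""
    u_r = [cmath.exp(1j * p) for p in phases]
    return math.sqrt(sum(abs(t - h * u) ** 2 for h, u, t in zip(v_h, u_r, v_t)))


# Hypothetical 3-dimensional complex embeddings.
v_h = [1 + 1j, 0.5 - 2j, -1 + 0j]
phases = [0.3, -1.2, 2.0]
v_t = [h * cmath.exp(1j * p) for h, p in zip(v_h, phases)]  # exact rotation of v_h
v_other = [t + 0.5 for t in v_t]                            # perturbed tail

d_true = rotate_distance(v_h, phases, v_t)
d_other = rotate_distance(v_h, phases, v_other)
```

A tail that is an exact rotation of the head scores a distance of zero, while the perturbed tail scores a positive distance. Because the rotation has unit modulus, each coordinate of `v_t` keeps the modulus of the corresponding coordinate of `v_h`.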

It is worth noting that, no matter how many training iterations are run, the entity and relationship embeddings in the graph cannot reach an ideal state. In an ideal state, where $| u_{r i} | = 1$, the relationship can also be defined as a rotation from the tail entity to the head entity. However, in practice, the edges in graph G are directed, and the relationship cannot be seen as a rotation from the tail entity to the head entity. This has an impact on handling undirected graphs. As mentioned earlier, in a binary graph, the edges with $y_{i j} = 0$ are fictional and undirected. Treating them as directed reduces the accuracy of distinguishing DDI types. To address this issue, when dealing with link prediction in undirected graph networks, we match the relationship representation to the element-wise product of the source and target entity representations, which is symmetric in the two entities. This approach leads to a new distance score as follows:

$$d_{r} (h, t) = ‖ u_{r} - v_{h} ⊙ v_{t} ‖ \tag{12}$$
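The two distance scores in Equations (11) and (12) can be compared on a toy example. The following is a minimal sketch, not the PINGE implementation: it assumes complex-valued embeddings (following the Euler-formula description above), a Euclidean norm for ‖·‖, and hypothetical values for `v_h` and the relation phases.

```python
import cmath
import math

def rotate_distance(v_h, u_r, v_t):
    """Eq. (11): d_r(h, t) = || v_t - v_h ⊙ u_r || (Euclidean norm assumed)."""
    return math.sqrt(sum(abs(t - h * r) ** 2 for h, r, t in zip(v_h, u_r, v_t)))

def symmetric_distance(v_h, u_r, v_t):
    """Eq. (12): d_r(h, t) = || u_r - v_h ⊙ v_t ||, unchanged if h and t swap."""
    return math.sqrt(sum(abs(r - h * t) ** 2 for h, r, t in zip(v_h, u_r, v_t)))

# Hypothetical toy embeddings with m = 3.
phases = [0.3, -1.2, 2.0]
u_r = [cmath.exp(1j * p) for p in phases]       # |u_ri| = 1 by Euler's formula
v_h = [complex(0.5, 0.1), complex(-0.2, 0.7), complex(1.0, -0.4)]
v_t = [h * r for h, r in zip(v_h, u_r)]         # tail = element-wise rotated head

d_forward = rotate_distance(v_h, u_r, v_t)      # triplet (h, r, t) fits exactly
d_reverse = rotate_distance(v_t, u_r, v_h)      # reversed triplet does not fit
d_sym_ht = symmetric_distance(v_h, u_r, v_t)
d_sym_th = symmetric_distance(v_t, u_r, v_h)    # same value as d_sym_ht

print(d_forward, d_reverse, d_sym_ht, d_sym_th)
```

In this sketch the score of Equation (11) changes when head and tail are swapped, whereas the score of Equation (12) does not, which is the property the undirected setting relies on.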

The advantage of this approach is that we do not need to consider the directionality of the triplets. By applying a negative sampling strategy, we can generate corresponding negative samples for each existing triplet. The loss function for pre-training can be described as follows:

$$L = - \log σ (γ - d_{r} (h, t)) - \sum_{i = 1}^{m} \frac{1}{m} \log σ (d_{r} (h_{i}^{'}, t_{i}^{'}) - γ) \tag{13}$$

To demonstrate that RotatE can capture the similarity of drug combination features, let us assume that after multiple rounds of training, for any triplet in the graph, we have $v_{t} = v_{h} ⊙ u_{r}$. According to Definition 1, two non-opposing, non-zero real vectors form the same maxima distribution space, and there exists a matrix $W_{k}^{'}$ in this space. Let $v_{h t}$ and $v_{h t}^{'}$ form an angle $θ$, where $v_{h t} = concat (v_{h}, v_{t})$ and $v_{h t}^{'} = concat (v_{h}^{'}, v_{t}^{'})$ belong to the triplets $(h, r, t)$ and $(h^{'}, r^{'}, t^{'})$, respectively. Given the conditions $v_{t} = v_{h} ⊙ u_{r}$, $v_{t}^{'} = v_{h}^{'} ⊙ u_{r}^{'}$, $| u_{r i} | = | u_{r i}^{'} | = 1$, and $\cos θ = \cos < v_{h t}, v_{h t}^{'} >$, we can conclude the following:

$$\begin{aligned} \cos θ &= \frac{v_{h} v_{h}^{' T} + v_{t} v_{t}^{' T}}{\sqrt{v_{h} v_{h}^{T} + v_{t} v_{t}^{T}} \cdot \sqrt{v_{h}^{'} v_{h}^{' T} + v_{t}^{'} v_{t}^{' T}}} \\ &= \frac{v_{h} v_{h}^{' T} + v_{h} (v_{h}^{' T} ⊙ u_{r} ⊙ u_{r}^{'})}{\sqrt{v_{h} v_{h}^{T} + v_{h} v_{h}^{T}} \cdot \sqrt{v_{h}^{'} v_{h}^{' T} + v_{h}^{'} v_{h}^{' T}}} \end{aligned} \tag{14}$$

In light of the DDI graph being connected, it is guaranteed that there exists a vertex $u_{s}$ for which $v_{h} ⊙ u_{s} = v_{h}^{'}$. Furthermore, given the condition $| u_{s i} | = 1$, we can ultimately ascertain:

$$\cos θ = \frac{v_{h} (v_{h}^{T} ⊙ u_{s}) + v_{h} (v_{h}^{T} ⊙ u_{s} ⊙ u_{r} ⊙ u_{r}^{'})}{2 v_{h} v_{h}^{T}} \tag{15}$$

When the two triplets share the same head entity ($h = h^{'}$) but have different tail entities ($t \neq t^{'}$), $\cos θ \geq 0$ always holds. When $r \neq r^{'}$, the value of $\cos θ$ depends on the cosine similarity of the two types of relationships, and the similarity of relationships depends on their distribution in the graph G. When $r = r^{'}$, $\cos θ = 1$, which means that the directions of the two joint features are completely the same, resulting in the maximum space of the same maxima distribution. In the case where the head entities and tail entities are different, when $r = r^{'}$, the magnitude of $\cos θ$ is actually equal to the magnitude of $\cos < v_{h}, v_{h}^{'} >$. This means that $\cos θ$ depends on the cosine similarity of the two head nodes, and whether the two head nodes are similar depends on whether there exists an intermediate node $h_{m}$ in graph G that has similar paths to the two head nodes.
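The identity in Equation (15) and the two special cases discussed above can be checked numerically. The sketch below is illustrative only: it assumes real-valued vectors whose relation entries are +1 or -1 (so that each entry has modulus 1 in $R^{m}$), and all numbers are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hadamard(a, b):
    return [x * y for x, y in zip(a, b)]

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Hypothetical toy values; entries of u_* are +1 or -1, so |u_i| = 1.
v_h = [0.8, -0.3, 0.5, 1.1]
u_s = [1, 1, -1, 1]       # maps v_h to v_h'
u_r = [1, -1, 1, 1]       # relation r
u_r2 = [1, -1, -1, 1]     # relation r'

v_h2 = hadamard(v_h, u_s)      # v_h' = v_h ⊙ u_s
v_t = hadamard(v_h, u_r)       # v_t  = v_h ⊙ u_r
v_t2 = hadamard(v_h2, u_r2)    # v_t' = v_h' ⊙ u_r'

# Left-hand side: cosine between the concatenated joint features.
lhs = cosine(v_h + v_t, v_h2 + v_t2)

# Right-hand side: Eq. (15).
term1 = dot(v_h, hadamard(v_h, u_s))
term2 = dot(v_h, hadamard(hadamard(hadamard(v_h, u_s), u_r), u_r2))
rhs = (term1 + term2) / (2 * dot(v_h, v_h))

# Special case r = r': cos θ reduces to cos<v_h, v_h'>.
same_rel = cosine(v_h + hadamard(v_h, u_r), v_h2 + hadamard(v_h2, u_r))
head_cos = cosine(v_h, v_h2)

# Special case h = h': cos θ is non-negative.
same_head = cosine(v_h + v_t, v_h + hadamard(v_h, u_r2))

print(lhs, rhs, same_rel, head_cos, same_head)
```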

Let us revisit Assumptions 1 and 2. We consider that in the real world, different drugs that take part in the same type of interaction do so through similar mechanisms. This similarity could be due to interactions at the substructure level, enzyme interactions, or the effects of metabolites, among other factors. Therefore, drugs within the same relationship category may trigger similar interaction events elsewhere or have similar sets of interaction events. Consequently, when $r = r^{'}$, there exists an intermediate node $h_{m}$ that makes the paths $h \to h_{m}$ and $h^{'} \to h_{m}$ similar, thereby making the cosine direction of the joint features closer. During prediction, we aim to maximize the distribution space of the maximum values of drug pairs within the same relationship type. Therefore, based on the above discussion, for drug pairs in the test set, we expect to find a drug pair in the training set of the same relationship type that has highly similar drug joint features. This maximizes the same maxima distribution space and allows accurate identification by the prediction module.

3.5. Optimization Strategy

To clarify how entity and relation representations are updated, we describe the loss objective and the gradient-based training procedure used in PINGE. We optimize all learnable parameters end-to-end by minimizing a joint objective that couples the downstream DDI prediction loss with a representation-learning term:

$$L = L_{pred} + α L_{repr} \tag{16}$$

where $L_{pred}$ is the cross-entropy loss for interaction-type prediction, $L_{repr}$ is the representation-learning term, and $α$ balances the two terms.

For representation learning, we adopt negative sampling. For each observed triple $(h, r, t)$, we construct N negative triples by corrupting the head or tail entity, obtaining $\{(h_{i}^{'}, r, t_{i}^{'})\}_{i = 1}^{N}$. The representation-learning term is computed as the average over training triples of the following margin-based objective:

$$ℓ_{repr} (h, r, t) = - \log σ (γ - d_{r} (h, t)) - \sum_{i = 1}^{N} \frac{1}{N} \log σ (d_{r} (h_{i}^{'}, t_{i}^{'}) - γ) \tag{17}$$

where $σ (\cdot)$ is the sigmoid function, $γ$ is the margin, and $d_{r} (\cdot)$ is the relation-specific distance induced by the RotatE scoring function. This objective increases the score separation between valid and corrupted triples, yielding discriminative entity and relation representations.
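The per-triple objective in Equation (17) and the head-or-tail corruption step can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names `repr_loss` and `corrupt` are hypothetical, the negatives are weighted uniformly (no adversarial weighting), corrupted triples are not filtered against other known positives, and the distances passed in are made-up numbers.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def repr_loss(d_pos, d_negs, gamma):
    """Eq. (17) for one triple, given its distance and its negatives' distances."""
    pos_term = -log_sigmoid(gamma - d_pos)
    neg_term = -sum(log_sigmoid(d - gamma) for d in d_negs) / len(d_negs)
    return pos_term + neg_term

def corrupt(triple, entities, n_neg, rng):
    """Build n_neg negatives by replacing either the head or the tail at random."""
    h, r, t = triple
    negatives = []
    while len(negatives) < n_neg:
        e = rng.choice(entities)
        cand = (e, r, t) if rng.random() < 0.5 else (h, r, e)
        if cand != triple:            # only the observed triple itself is excluded
            negatives.append(cand)
    return negatives

rng = random.Random(0)
triple = ("d1", "r7", "d2")           # hypothetical drug and relation identifiers
negs = corrupt(triple, ["d1", "d2", "d3", "d4"], n_neg=4, rng=rng)

# Small positive distance and large negative distances give a low loss.
good = repr_loss(d_pos=2.0, d_negs=[20.0, 18.0], gamma=12.0)
# The opposite arrangement gives a high loss.
bad = repr_loss(d_pos=20.0, d_negs=[2.0, 3.0], gamma=12.0)
print(negs, good, bad)
```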

Training is performed with mini-batch stochastic optimization using Adam. In each iteration, negative samples are generated on-the-fly for the current batch and parameters are updated by backpropagating $\nabla L$. We apply early stopping based on validation performance and use a unified set of optimization-related hyperparameters across experiments for fair comparison. Specifically, we use learning rate $10^{- 3}$ and batch size 2048. For representation learning, we set embedding dimension 800, negative sampling size $N = 128$, and margin $γ = 12.0$. When using adversarial negative sampling for RotatE, the temperature is set to 1.0. The loss-balance factor $α$ is fixed as reported in the experimental section.

4. Experimental Settings

In this section, we conducted extensive experiments to validate the effectiveness of PINGE in the task of predicting drug interactions and its versatility in similar tasks.

4.1. Data Preparation

For the DDI prediction task, we used two widely used real-world datasets:

KEGG-drug: This dataset consists of a knowledge graph containing 1925 approved drugs and 56,983 approved drug interactions. Negative samples were randomly generated at a 1:1 ratio, where negative samples represent drug pairs that do not have evidence of interaction in the positive samples. This dataset was provided by [35].

DrugBank: This dataset includes 1710 drugs and 192,284 drug pairs from DrugBank [36], covering 86 interaction types, each describing a specific mechanism of action. It was also provided by [35] and the version used in our experiments is from 2018. These datasets were chosen for their wide usage and comprehensive coverage of drug interactions, allowing us to evaluate how accurately our approach predicts drug interactions.

To evaluate the portability of PINGE, we utilized the following datasets:

Human drug–target interactions (CPI dataset): This dataset contains highly reliable positive and negative samples of drug–target interactions. The positive samples were extracted using a systematic filtering framework based on similarity rules [37]. It includes 1052 unique compounds and 852 unique targets, resulting in 3369 positive interactions. To ensure fairness in comparison, we directly used the negative samples provided in the baseline literature for training and evaluation.

Human Reference Interactome (HuRI): It is a human protein–protein interaction (PPI) dataset comprising 8274 proteins and 52,548 protein–protein interactions [38]. It is assembled from three independent high-quality yeast two-hybrid (Y2H) screens. To maintain balance, we generated negative samples for training and evaluation in a 1:1 ratio with the positive samples, following remote negative link generation as described in the paper.

For DDI prediction, we performed 5-fold cross-validation on the DDI dataset to evaluate PINGE’s performance. For the human dataset and the HuRI dataset, we used 10-fold cross-validation during evaluation.

4.2. Baseline and Performance Metrics

In the DDI prediction task, we compared PINGE with the following baselines: KGNN [35], GAT [39], GAT-const [13], SumGNN [40], GIN [41], and LaGAT [13]. For the experiments demonstrating the portability of PINGE in PPI prediction, we used RW [42], cGAN1 [43], SkipGNN [44], DNN+node2vec [45], and SEAL [46] as baselines. For DPI prediction, the baselines included GCN, GraphDTA [47], TransformerCPI [48], DrugVQA, and MINN-DTI [49].

To improve reproducibility, we clarify the implementation sources and primary settings of all baseline models. All baselines are trained and evaluated under the same data splits and evaluation protocol as PINGE. For each baseline, we follow the original publications or official implementations where available, and tune the key hyperparameters on the validation set; early stopping is employed where applicable.

In addition, since TransE and RotatE are utilized in our multi-relational embedding module and ablation study (rather than as primary competing baselines), we report their exact configurations here. We adopt the Adam optimizer with a learning rate of 0.001, batch size 2048, embedding dimension 800, negative sampling size 128, and margin $γ = 12.0$. For RotatE, adversarial negative sampling is activated with a temperature value of 1.0.

We evaluated the performance of PINGE on each dataset using six metrics: Accuracy (ACC), Area Under the Receiver Operating Characteristic curve (AUROC), Area Under the Precision–Recall curve (AUPRC), F1 score, Precision, and Recall. Note that in multi-class DDI prediction, all metrics except ACC are macro-averaged over the interaction types. Together, these metrics provide a comprehensive assessment of PINGE’s predictive performance across the different datasets.
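To make the averaging concrete, the sketch below computes ACC together with class-averaged Precision, Recall, and F1 on a toy label set. It assumes macro-averaging (an unweighted mean over classes), in line with the Macro-Precision, Macro-F1, and Macro-Recall scores reported in Section 5.1; the function name and the labels are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (one-vs-rest per class)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "acc": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }

# Hypothetical three-class example in which class 2 is never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

In this example ACC is 5/7, while the macro-averaged scores are pulled down by the class that is never predicted, which is why the two kinds of metric can diverge on imbalanced interaction types.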

4.3. Parameter Setting

In our experiments, PINGE was implemented using PyTorch. The model was trained using the Adam optimizer, and the learning rate for both pre-training and fine-tuning was set to 0.001. The optimal hyperparameter settings are summarized in Table 1.

4.3.1. Computational Cost and Efficiency Analysis

To address concerns regarding computational efficiency, potential over-parameterization, and deployment feasibility—particularly given the 800-dimensional embeddings—we characterize the computational profile of PINGE on a consumer-grade workstation (Intel i7-14700KF, 32 GB RAM, NVIDIA RTX 4060 Ti with 16 GB VRAM). All experiments are implemented in PyTorch 2.0 with mixed-precision training.

On our core benchmarks, the KG-embedding pre-training phase (using RotatE in our default setting) takes an average of 7 min, and subsequent fine-tuning converges in an average of 5 min under the standard protocol. The average inference latency is 0.85 ms per drug pair on GPU and 3 ms on CPU in our implementation. Although the embedding dimension is relatively high, it helps capture rich multi-relational semantics in the biomedical knowledge graph and yields consistent performance under our evaluation protocol (the validation–test Macro-F1 gap typically stays below 1%). The peak GPU memory usage is around 4 GB under mixed-precision training, and the embedding-related parameter/memory cost scales approximately linearly with the embedding dimension d and the numbers of entities and relations, i.e., $O ((| E | + | R |) d)$. Overall, these results indicate that PINGE remains practical and scalable on standard hardware.
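As a rough worked example of the $O ((| E | + | R |) d)$ scaling, the sketch below counts embedding parameters for the DrugBank figures given in Section 4.1 (1710 drugs, 86 interaction types) at d = 800. It assumes one real-valued d-dimensional float32 vector per entity and per relation; the actual parameter layout of PINGE is not stated here and may differ (complex-valued embeddings would double the count, and the MLP weights are excluded).

```python
def embedding_cost(num_entities, num_relations, dim, bytes_per_param=4):
    """Parameter count and memory (bytes) of the entity and relation tables."""
    params = (num_entities + num_relations) * dim
    return params, params * bytes_per_param

# DrugBank sizes from Section 4.1; float32 storage is an assumption.
params, nbytes = embedding_cost(num_entities=1710, num_relations=86, dim=800)
print(params, nbytes, round(nbytes / 2**20, 2))

# Halving the dimension halves the embedding table (linear scaling in d).
half_params, _ = embedding_cost(1710, 86, 400)
```

Under these assumptions the embedding tables hold about 1.44 million parameters (roughly 5.5 MiB), so the reported peak memory is presumably dominated by activations, batches, and optimizer state rather than by the embeddings themselves.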

4.3.2. Trade-Off of Structure-Only Modeling

We discuss the trade-off between structure-only learning and feature-enriched hybrid models, particularly highlighting clinical and data-scarce scenarios where a network-only approach like PINGE is preferable. Structure-only models infer relational patterns purely from the topology of biomedical networks, bypassing the need for explicit molecular descriptors or pharmacological annotations. While hybrid models can yield more fine-grained representations when high-quality, standardized features are available, their performance may degrade in the presence of missing, noisy, or inconsistently annotated data across heterogeneous sources.

A structure-only strategy offers distinct advantages in three practical settings. First, for novel or cold-start drugs whose molecular or pharmacological profiles are incomplete or unavailable, structure-only inference remains viable through network connectivity. Second, in multi-center clinical datasets where feature annotation standards vary substantially across institutions, PINGE avoids integration noise and data-alignment artifacts by relying on graph topology. Third, in long-tailed DDI scenarios with sparse supervision, global relational constraints derived from graph topology provide a more stable inductive bias than unreliable features, reducing the risk of overfitting to rare interaction types. Consequently, PINGE prioritizes applicability and robustness in real-world environments characterized by feature scarcity or data heterogeneity.

4.4. Systematic Hyperparameter Optimization

To reduce reliance on manual tuning and mitigate dataset-specific over-optimization, we conduct systematic hyperparameter optimization using Particle Swarm Optimization (PSO) [50]. PSO is a population-based global search method that maintains a swarm of candidate configurations and iteratively refines them via information sharing, reducing the risk of converging to suboptimal local optima. Specifically, each particle represents a hyperparameter configuration and is evaluated under a fixed training protocol, with Macro-F1 on the validation set as the optimization objective; the test set is not used during optimization. Particles iteratively update their velocity and position by combining their personal best historical solution (pbest) and the swarm’s global best solution (gbest), balancing exploration and exploitation. We run PSO with $P = 20$ particles for $T = 50$ iterations, resulting in $P \times T = 1000$ objective evaluations within a predefined search space of these hyperparameters, and select the best-performing configuration under this computational budget.
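The pbest/gbest update described above can be sketched in a few lines. This is a generic PSO loop, not the exact procedure used for PINGE: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds, and the toy objective (a stand-in for validation Macro-F1) are all assumptions made for illustration.

```python
import random

def pso(objective, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over box `bounds` with P * T objective evaluations."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [float("-inf")] * n_particles
    gbest, gbest_val = pos[0][:], float("-inf")
    for _ in range(n_iters):
        # Evaluate every particle once per iteration and refresh pbest / gbest.
        for i in range(n_particles):
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val > gbest_val:
                gbest, gbest_val = pos[i][:], val
        # Move particles toward their personal best and the global best.
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
    return gbest, gbest_val

def toy_objective(x):
    """Hypothetical smooth score peaking at (0.3, 0.7); replaces a real training run."""
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)

best, best_val = pso(toy_objective, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best, best_val)
```

In a real run, each call to the objective would train the model under the fixed protocol and return validation Macro-F1, so the budget of 1000 evaluations is the dominant cost.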

We optimize $α$, $τ$, $N$, and $λ$ while keeping all other settings fixed. The best-performing configuration returned by PSO is then fixed for all subsequent experiments: $α = 0.5$, $τ = 0.1$, $N = 128$, and $λ = 1.2 \times 10^{- 5}$. Under this unified configuration, PINGE yields improved Macro-F1 compared with the default setting, with only marginal additional training overhead in practice.

5. Results

5.1. Performance Comparison

We selected the two best negative sampling strategies for the model and compared PINGE with the baselines above on the two datasets. Table 2 shows the AUC, ACC, F1 score, and AUPR of each model on the KEGG-drug dataset, as well as the ACC, Macro-Precision, Macro-F1 score, and Macro-Recall on the DrugBank dataset. The baseline evaluation results on the KEGG-drug dataset are from [13], and the baseline evaluation results on the DrugBank dataset were obtained by conducting experiments using the models and hyperparameters mentioned in the original papers. Our model outperforms all the baselines. On the KEGG-drug dataset, LaGAT is the best-performing baseline, as it enhances the specificity of each drug feature. However, our model has advantages over LaGAT, with improvements of 0.4% on AUC, 0.7% on ACC, 0.6% on F1 score, and 0.5% on AUPR. The reason is that our method maintains the specificity of drug features while providing similarity for drug pairs with the same relationship. The same applies to the DrugBank dataset, where we can see from Table 3 that LaGAT remains the best-performing baseline, and our model outperforms LaGAT with improvements of 2.4% on ACC, 3.4% on Macro-Precision, 3.8% on Macro-F1 score, and 4.2% on Macro-Recall.

These results demonstrate the effectiveness of our method, and the improvement is markedly larger on the DrugBank dataset than on the KEGG-drug dataset. By contrast, the strongest baseline, LaGAT, improved ACC over the other baselines by 2.68% in the binary task and 3.02% in the multi-class task, which is not a large difference. As mentioned earlier, drugs in the same relationship pairs may trigger the same interaction events elsewhere or have similar sets of interaction events. Our method can generate more comprehensive drug features on drug interaction networks with such characteristics, leading to more accurate predictions. However, the KEGG-drug dataset has 1925 drugs with only 56,983 positive samples, making it sparser than the DrugBank dataset. Therefore, fewer drug pairs in its training and test sets exhibit such properties. Consequently, the performance improvement of our model differs significantly between the two datasets, which indirectly supports the rationale behind PINGE.

Additionally, by substituting another knowledge representation learning method, TransE, as the core of our model, we still achieved better results than LaGAT on the DrugBank dataset. This is because RotatE is an improved algorithm based on TransE, and the two possess equivalent properties within PINGE.

5.2. Discussion

We conducted the respective prediction tasks using PINGE on two different datasets, and compared its performance with existing methods. The results show that PINGE achieved better performance, which is sufficient to demonstrate its portability and effectiveness for similar tasks. When applying PINGE to predict protein-protein interactions in the protein interaction network, we did not adopt the random generation of negative samples. Instead, we followed Assumption 3 and tried to avoid generating negative samples in cases where there is a higher possibility of a connection between two nodes.

During the prediction process, we found that when $n = 1$ in remote negative link generation, the overall prediction performance of PINGE was average. Compared to randomly generating negative samples, there was only a slight improvement in metrics such as AUC. However, when $n = 2$, there was a significant improvement in all evaluation metrics. This indicates that Assumption 3 holds. Table 4 demonstrates that PINGE outperforms existing protein–protein interaction prediction methods.
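The idea of generating negatives only between distant nodes can be illustrated as follows. The exact procedure is defined elsewhere in the paper, so this sketch rests on an assumption: a node pair qualifies as a negative only if its shortest-path distance in the positive graph exceeds n. The function names and the toy graph are hypothetical.

```python
import random
from collections import deque

def within_distance(adj, source, n):
    """Nodes reachable from `source` in at most n hops (BFS), including itself."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == n:
            continue                      # do not expand beyond n hops
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def remote_negatives(edges, n, k, seed=0):
    """Sample up to k node pairs whose distance in the positive graph exceeds n."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = sorted(adj)
    candidates = []
    for i, a in enumerate(nodes):
        near = within_distance(adj, a, n)
        candidates.extend((a, b) for b in nodes[i + 1:] if b not in near)
    rng = random.Random(seed)
    return rng.sample(candidates, min(k, len(candidates)))

# Toy path graph A-B-C-D-E.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
neg_n1 = remote_negatives(edges, n=1, k=100)   # any non-adjacent pair
neg_n2 = remote_negatives(edges, n=2, k=100)   # pairs with no common neighbour
print(sorted(neg_n1), sorted(neg_n2))
```

Under this reading, n = 1 excludes only existing edges and so behaves much like random sampling, while n = 2 also excludes pairs that share a neighbour, which would be consistent with the larger gain reported for n = 2.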

In the drug–target dataset, we directly referenced the negative samples from the baseline literature. This is because the drug–target interaction network is a many-to-one and unidirectional network, and the negative samples should not only be generated within the positive sample network but also consider incorporating drugs that do not appear as positive samples.

Such data heavily tests the generalization ability of PINGE, as some nodes in the negative samples were only randomly initialized during the feature generation process. This is similar to the case of PINGE predicting DDI on the DrugBank dataset, where a few nodes in the test set of the five-fold cross-validation did not appear in the training set. The difference is that these drug nodes, which only underwent random feature initialization, appear more frequently in the drug–target test set. Identifying negative samples containing these drug nodes is approximately equivalent to random selection, as their drug features are obtained through random initialization. This lowers the overall performance of drug–target prediction. As shown in Table 5, despite the presence of drug nodes with random features in the drug–target test set, PINGE still exhibits better performance than state-of-the-art models on the human drug–target dataset, which demonstrates the strength of PINGE.

Class imbalance is a pervasive challenge in real-world drug–drug interaction (DDI) datasets, where minority interaction types may contain only one-fifth, one-twentieth, or one-fiftieth as many samples as majority types. This skewed distribution can bias model learning toward majority classes and degrade performance on minority DDI types. To mitigate this issue, PINGE integrates two complementary strategies: (1) class-weighted cross-entropy, where loss weights are set inversely proportional to class frequency to prevent minority gradients from being overwhelmed; and (2) a cosine-direction alignment mechanism, which acts as a global relational-geometry constraint to preserve distinct directional patterns across interaction types and reduce minority embedding collapse into majority distributions. To evaluate robustness to class imbalance, we construct DDI subsets with three imbalance ratios (1:5, 1:20, and 1:50), defined as the majority-to-minority class sample-count ratio, and repeat each setting five times. We report the loss value as mean ± std over 5 repeats, since class imbalance primarily manifests as gradient domination by majority classes and thus increases optimization difficulty under skewed supervision.
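The first strategy, class-weighted cross-entropy with weights inversely proportional to class frequency, can be sketched as below. The particular normalization (n divided by the number of classes times the class count, and a weighted mean over samples) is one common convention and is an assumption here; the paper does not specify it.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (num_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); `probs` holds per-sample distributions."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)

# Hypothetical subset with a 20:1 majority-to-minority ratio.
labels = [0] * 20 + [1]
weights = inverse_frequency_weights(labels)

# With these weights, the single minority sample carries as much total
# weight as the twenty majority samples together.
majority_mass = sum(weights[y] for y in labels if y == 0)
minority_mass = sum(weights[y] for y in labels if y == 1)

uniform = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
print(weights, majority_mass, minority_mass, loss)
```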

As shown in Figure 4, the loss value increases as the imbalance becomes more severe for all models, consistent with increased optimization difficulty when minority classes are underrepresented. PINGE exhibits the smallest loss growth: its loss increases by 0.24 (from 0.31 to 0.55) when moving from 1:5 to the most severe 1:50 imbalance, while all baseline methods show larger increases under the same shift. Overall, these results indicate that PINGE is more robust to highly skewed class distributions in DDI datasets.

Robustness to sample noise: Real-world biomedical datasets may contain noisy samples, which can introduce misleading supervision signals and degrade model learning. To evaluate robustness under sample noise, we perturb 10%/20%/30% of the training samples by replacing their interaction-type labels with valid but incorrect types, repeating each noise ratio five times. We report the loss value (mean ± std) as an indicator of sensitivity to corrupted supervision, where larger loss increases imply higher vulnerability to sample noise.
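The perturbation described above, replacing a fixed fraction of labels with valid but incorrect interaction types, can be sketched as follows. The function name, the uniform choice among the wrong classes, and the toy labels are assumptions; only the noise ratios and the number of interaction types come from the text.

```python
import random

def corrupt_labels(labels, num_classes, noise_ratio, seed=0):
    """Replace a fraction of labels with a different, uniformly chosen class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = round(noise_ratio * len(labels))
    for i in rng.sample(range(len(labels)), n_flip):
        wrong = rng.randrange(num_classes - 1)
        # Shift past the true label so the replacement is always incorrect.
        noisy[i] = wrong if wrong < labels[i] else wrong + 1
    return noisy

# Hypothetical training labels over 86 interaction types.
clean = [i % 86 for i in range(1000)]
noisy = corrupt_labels(clean, num_classes=86, noise_ratio=0.2)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed)
```

Repeating the call with different seeds gives the independent repeats over which the mean and standard deviation of the loss are reported.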

As shown in Figure 5, the loss value increases with the noise ratio for all models, consistent with reduced supervision reliability under stronger corruption. Notably, PINGE exhibits the smallest loss increase. Under 30% sample noise, its loss rises from 0.28 to 0.65, indicating higher tolerance to noisy supervision than the baseline methods under the same noise level. We attribute this robustness to the cosine-direction alignment constraint, which acts as an inductive bias that encourages learning intrinsic interaction patterns rather than overfitting corrupted samples.

In addition, contrastive or consistency regularization across neighboring subgraphs could further enhance robustness to weak supervision. These extensions would allow PINGE to better accommodate the noisy and evolving nature of real-world DDI repositories, strengthening its reliability in clinical and pharmacovigilance-oriented applications [51].

Furthermore, for downstream clinical deployments where external validation may involve sensitive patient-level records, privacy and governance constraints become important engineering considerations beyond the modeling core. In decentralized clinical environments, future implementations of PINGE could incorporate secure multi-party computation (SMPC) to preserve the confidentiality of raw data during collaborative validation [52]. Additionally, to mitigate the risks of unauthorized access, blockchain-enabled auditability and permission management (with sensitive records stored off-chain in encrypted form) could be employed to enhance data governance and traceability [4]. These security-aware design considerations can strengthen trustworthiness and resilience for practical deployments.

Extension to temporal and dynamic interaction graphs. The current implementation of PINGE is designed for static interaction graphs, where drug–drug relationships are treated as time-invariant. However, real-world pharmacological systems are inherently dynamic, as drug usage patterns evolve, new medications are introduced, and evidence of adverse interactions accumulates over time. Extending PINGE to temporal or dynamic interaction graphs therefore represents an important direction for future work. One potential adaptation is to construct time-sliced interaction graphs, where edges are indexed by temporal windows, and to integrate temporal graph embedding techniques or recurrent architectures to model the evolution of interaction patterns [53]. In this setting, the PINGE framework could be extended to learn joint structural and temporal representations, capturing not only whether interactions exist, but also how they emerge, strengthen, or diminish over time. Additionally, incremental updating mechanisms could be explored to incorporate newly reported adverse interactions, reducing the need for full retraining from scratch. Such dynamic extensions would enable time-aware DDI prediction and support longitudinal risk monitoring in practical clinical screening settings.

5.3. Ablation Study

To validate the contribution of joint feature similarity, we remove the key multilayer perceptron in an ablation study, which means that RotatE is used directly to predict drug–drug interactions, relying on the semantic information in the DDI graph network. Because RotatE itself is suited to binary prediction tasks, the experiments were performed on the binary dataset.

The results are shown in Figure 6: the performance obtained by using only the semantic information in the positive link network is much lower than that obtained by using the joint feature similarity.

We analyze the parameter sensitivity of the proposed PINGE for prediction. Specifically, the main focus is on the variation of the performance of the proposed method in the drug feature dimension and the negative link generation distance. For fair comparison, we adjust only the relevant variables and keep the other variables constant.

We analyzed the effect of the drug feature dimension on the prediction results on the multi-class dataset. As shown in Figure 7, the performance of PINGE improves as the feature dimension increases and peaks at 800. This may be because the inequality set (7) obtains a larger solution space as the feature dimension grows, while the limited number of layers in the MLP prevents still larger feature dimensions from being exploited.

As can be seen in Figure 8, the performance of the proposed negative sample generation method for PINGE as well as the link prediction algorithm SEAL, which relies on graph structural features, is positively affected with increasing distance. This may be due to the fact that a greater distance makes the generated negative samples more realistic and enhances the sensitivity of the model to positive samples.

5.4. Interpretable Analysis

To clarify the interpretability of PINGE, we emphasize that the model’s explanations rely on structural relationships and multi-hop paths within the DDI network. While the interpretability is largely post hoc, the intermediate nodes and paths provide mechanistic chains that map directly to pharmacological reasoning workflows, allowing clinicians to understand and validate predicted interactions through observable structural connectivity.

The larger the space of the maximum value co-distribution formed by two vectors, the more likely they are to be classified into the same category by the model. The size of the maximum value co-distribution space depends on whether the directions of the two vectors tend to be consistent, that is, the size of the cosine similarity. In the prediction task, we hope that the combined features of each drug pair in the test set are more similar to those of drug pairs of the same kind in the training set than to those of other kinds of drug pairs. Therefore, we analyze the similarity of the combined features of the generated drug features on the DrugBank training and test sets to verify our point of view.

We averaged the combined features of drug pairs of the same interaction type in the training set triples and obtained 86 average drug joint features, one per type. We also obtained 86 average combined features for the test set. The results are shown in Figure 9. Figure 9a shows the cosine similarity confusion matrix between the per-category average combined features of the training set and those of the correctly predicted test samples. The horizontal axis represents the average drug combined features of each interaction category in the test set, and the vertical axis represents the average drug combined features of each interaction category in the training set. Each column gives the cosine similarity between the average combined features of that category in the test set and the average combined features of all categories in the training set. The greater the similarity, the darker the color. We can clearly see that, except for the few categories that did not appear in the test set, the average combined features of the same relationship type have relatively darker colors and higher similarity. Figure 9b shows the corresponding similarity confusion matrix for the test set categories that were predicted incorrectly.

To make the similar relationships more intuitive, we only kept the maximum value of each column in Figure 9c,d, which was assigned a value of 1, and the other values were set to 0. It can be seen that although most of the maximum value distributions of the similarity of the average combined features of the categories predicted incorrectly are on the diagonal, there are still many categories whose maximum value distributions are scattered. The categories on the diagonal of the test set may be misclassified due to parameter adjustment errors, or their own similarity is not high, and the difference in similarity with other categories is small, leading to misclassification. However, the maximum value distribution of the similarity between the average combined features of the categories predicted correctly in the test set and the categories’ average combined features in the training set is entirely on the diagonal. That is to say, the overall similarity of the correctly predicted drug combined features in the test set to the drug combined features of the same relationship type in the training set is the highest and closest in direction, which undoubtedly matches our expectations.
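The analysis above (class-averaged joint features, a train-versus-test cosine similarity matrix, and a per-column maximum mask) can be sketched on toy data. The two-dimensional features, the three classes, and all function names are hypothetical, and the sketch assumes that every class appears in both sets.

```python
import math

def class_means(features, labels):
    """Average the joint features of all pairs that share an interaction type."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, x in enumerate(f):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def similarity_matrix(train_means, test_means, classes):
    """Rows index training classes, columns index test classes."""
    return [[cosine(train_means[r], test_means[c]) for c in classes] for r in classes]

def column_max_mask(sim):
    """Keep only the largest entry of each column (set to 1, all others 0)."""
    mask = [[0] * len(sim[0]) for _ in sim]
    for c in range(len(sim[0])):
        best = max(range(len(sim)), key=lambda r: sim[r][c])
        mask[best][c] = 1
    return mask

# Hypothetical joint features for three interaction types.
train_x = [(1.0, 0.1), (0.9, -0.1), (0.1, 1.0), (-0.1, 0.8), (-1.0, 0.2), (-0.8, -0.2)]
train_y = [0, 0, 1, 1, 2, 2]
test_x = [(0.8, 0.2), (0.2, 0.9), (-0.7, 0.1)]
test_y = [0, 1, 2]

classes = [0, 1, 2]
sim = similarity_matrix(class_means(train_x, train_y), class_means(test_x, test_y), classes)
mask = column_max_mask(sim)
print(mask)
```

A mask whose ones all lie on the diagonal corresponds to the pattern described for the correctly predicted categories: each test category is most similar in direction to the training category of the same type.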

5.5. Interpretable Results

The usual explanation for predicting drug interactions focuses on the feature generation part, and PINGE is no exception. While PINGE’s interpretability is primarily structural and post hoc, the model provides ’white-box’ evidence chains through intermediate drugs and multi-hop paths. For example, predictions for Doxercalciferol with Metolazone or Quinethazone are supported by intermediate nodes such as Methantheline, whose known interactions align with clinical knowledge. Similarly, for other drug pairs, intermediate drugs ensure that the combined features of the drugs are directionally similar to those observed in training data, corresponding to their actual pharmacological behavior. These structure-based explanations enable clinicians to interpret and validate the model outputs against biochemical pathways, bridging the gap between deep learning predictions and bedside decision-making. Future work will explore more intrinsically interpretable approaches, such as causally grounded graph reasoning, to further enhance transparency. However, unlike other interpretable methods, we do not directly focus on the specific interaction process, but instead explain the prediction process based on Assumption 2. As an example, let’s take the drug Doxercalciferol, which is an approved medication for treating secondary hyperparathyroidism in patients with stage 3 or 4 chronic kidney disease. When used in combination with Metolazone, a long-acting diuretic for chronic kidney failure, the risk or severity of hypokalemia may increase [36]. In PINGE, their combined use is predicted to potentially increase the risk of adverse reactions, which is consistent with the description in reality.

The combined use of Quinethazone and Doxercalciferol may also increase the risk of adverse reactions. Suppose that this interaction appears in the test set, while the interaction between Metolazone and Doxercalciferol appears in the training set. As shown in Figure 10a, PINGE infers that Quinethazone and Metolazone have the same interaction with Doxercalciferol through an intermediate drug that links Metolazone and Quinethazone. According to Assumption 2, two drugs are expected to have similar interactions, or similar interaction paths, with their intermediate drugs. Indeed, the intermediate drug Methantheline, when used in combination with Metolazone or Quinethazone, can increase the serum concentration of the latter drug. Many other drugs also interact similarly with Metolazone and Quinethazone. Their presence pulls the representations of the two drugs closer together and makes the combined features of Doxercalciferol with Quinethazone and of Doxercalciferol with Metolazone more similar in direction.
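The role of intermediate drugs can be illustrated by listing the drugs that are linked to both members of a pair by the same interaction type. The sketch below is only an illustration under stated assumptions: the graph is treated as undirected, the interaction labels are hypothetical shorthand, "DrugX" is a placeholder node, and the helper names are not part of PINGE.

```python
from collections import defaultdict


def build_graph(triples):
    """Index an undirected DDI graph as drug -> {neighbor: interaction label}."""
    graph = defaultdict(dict)
    for drug_a, drug_b, label in triples:
        graph[drug_a][drug_b] = label
        graph[drug_b][drug_a] = label
    return graph


def shared_intermediates(graph, drug_u, drug_v):
    """Return drugs linked to both drug_u and drug_v by the same interaction label."""
    common = graph[drug_u].keys() & graph[drug_v].keys()
    return sorted(w for w in common if graph[drug_u][w] == graph[drug_v][w])


# Toy training edges. The labels are hypothetical shorthand; "DrugX" is a
# placeholder whose two labels differ, so it is filtered out.
triples = [
    ("Metolazone", "Doxercalciferol", "adverse_risk_up"),
    ("Methantheline", "Metolazone", "serum_conc_up"),
    ("Methantheline", "Quinethazone", "serum_conc_up"),
    ("DrugX", "Metolazone", "label_a"),
    ("DrugX", "Quinethazone", "label_b"),
]
graph = build_graph(triples)
intermediates = shared_intermediates(graph, "Metolazone", "Quinethazone")
```

Here only Methantheline qualifies as a shared intermediate of Metolazone and Quinethazone, mirroring the evidence chain described for Figure 10a.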

In Figure 10b, suppose PINGE needs to predict the interaction between Toremifene and Chlorthalidone. Chlorothiazide, a drug used to treat congestive heart failure, cirrhosis, and hypertension and edema associated with corticosteroid and estrogen therapy, appears as an intermediate node between Doxercalciferol and Toremifene. Its combined use with Doxercalciferol or Toremifene increases the risk or severity of hypercalcemia in patients [36], indicating the same interaction. Carbamazepine, an anticonvulsant used to treat various types of seizures and the pain caused by trigeminal neuralgia, appears as an intermediate node between Metolazone and Chlorthalidone. Its combined use with either of these two drugs also increases the risk or severity of adverse reactions. Likewise, multiple other intermediate drugs share the same interaction, making the combined features of Toremifene and Chlorthalidone more similar in direction to those of Doxercalciferol and Metolazone. The impact of intermediate drugs on the similarity of drug pairs’ combined features is detailed in Section 3.
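What "similar in direction" means can be made concrete with cosine similarity. The following is a minimal sketch, assuming that the joint feature of a drug pair is the element-wise sum of the two drug embeddings (Figure 2 mentions element-wise addition, but the exact construction is the one defined in Section 3). The 3-dimensional embeddings are made-up values and the function names are hypothetical.

```python
import math


def joint_feature(x, y):
    """Assumed joint feature: element-wise sum of two drug embeddings."""
    return [a + b for a, b in zip(x, y)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up embeddings for illustration only (not learned by PINGE).
emb = {
    "Doxercalciferol": [1.0, 0.0, 0.5],
    "Metolazone": [0.2, 1.0, 0.0],
    "Toremifene": [0.9, 0.1, 0.6],
    "Chlorthalidone": [0.3, 0.8, 0.0],
}
train_pair = joint_feature(emb["Doxercalciferol"], emb["Metolazone"])
test_pair = joint_feature(emb["Toremifene"], emb["Chlorthalidone"])
similarity = cosine(train_pair, test_pair)
```

With these toy values the two joint features point in almost the same direction, which is the situation the shared intermediate drugs are meant to produce.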

From a clinical perspective, interpretability is most useful when explanations can be related to established pharmacological reasoning, for example, by separating pharmacokinetic (PK) from pharmacodynamic (PD) mechanisms, rather than relying on abstract latent features alone. In our framework, structure-based explanations are provided as intermediate nodes and paths connecting a drug pair, where intermediate nodes correspond to clinically meaningful entities such as CYP enzymes or transporters, protein targets, or biological pathways. These explanations can be communicated to clinicians as a concise list of top-k high-scoring interaction paths with the mediating entities explicitly named, and can be summarized into mechanism-oriented statements by grouping enzyme/transporter-related paths as PK-related and target/pathway-related paths as PD-related. For example, paths reflecting shared CYP3A4 metabolism may indicate a potential PK-related interaction risk, whereas paths involving shared targets or QT-related pathways may be consistent with additive PD effects; in all cases, the mediating entities are explicitly listed to facilitate follow-up cross-checking. This presentation is broadly consistent with common clinical pharmacology workflows, where clinicians consult labels, knowledge bases, and guidelines to form mechanistic hypotheses and then decide on appropriate monitoring strategies or dose adjustments. Accordingly, the path-level rationales produced by our model are intended to provide traceable, mechanism-oriented support for hypothesis-driven verification, rather than serving as definitive causal evidence or replacing clinical judgment.
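The grouping of top-k paths into PK-related and PD-related rationales can be sketched as follows. This is an illustrative assumption about how such a summary might be produced: the path records, their scores, the entity-type labels, and the function name are all hypothetical, and "P-gp" is used only as an example transporter.

```python
from collections import defaultdict

# Assumed mapping from mediating-entity type to mechanism group.
PK_TYPES = {"enzyme", "transporter"}
PD_TYPES = {"target", "pathway"}


def summarize_paths(paths, k=3):
    """Group the k highest-scoring paths by mechanism, naming each mediator."""
    top = sorted(paths, key=lambda p: p["score"], reverse=True)[:k]
    groups = defaultdict(list)
    for path in top:
        if path["type"] in PK_TYPES:
            label = "PK-related"
        elif path["type"] in PD_TYPES:
            label = "PD-related"
        else:
            label = "unclassified"
        groups[label].append(path["mediator"])
    return dict(groups)


# Hypothetical scored paths for one drug pair.
paths = [
    {"mediator": "shared target", "type": "target", "score": 0.60},
    {"mediator": "CYP3A4", "type": "enzyme", "score": 0.91},
    {"mediator": "P-gp", "type": "transporter", "score": 0.20},
    {"mediator": "QT-related pathway", "type": "pathway", "score": 0.75},
]
summary = summarize_paths(paths, k=3)
```

Keeping the mediator names in the output, rather than only the group label, preserves the traceability that the cross-checking workflow relies on.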

6. Conclusions

In this work, we present a new method, based on knowledge-based representation learning, for generating drug representations to predict binary and multiclass drug interactions. It does not rely on any drug features, including molecular structure; instead, it generates the high-level representations needed for drug interaction prediction directly from the drug interaction network. It preserves the specificity of individual drug representations while adding similarity to the joint features of drug pairs of the same class. We also demonstrated that this similarity improves prediction accuracy, and experiments on two real datasets showed the effectiveness of our model compared with existing methods. In future work, we will focus on the drugs’ own features, link them with attention aggregation methods, and further optimize representation generation within our approach, so that the final drug representations can predict DDIs for new drugs while also providing interpretability for the model.

Author Contributions

Conceptualization, X.L.; Methodology, X.L., Q.W. and L.G.; Software, X.L. and Z.Z.; Validation, X.L., C.C. and Z.Z.; Formal analysis, X.L. and C.C.; Investigation, X.L., C.C., Z.Z. and Q.W.; Resources, X.L., Z.Z., Q.W. and L.G.; Data curation, X.L., C.C. and Z.Z.; Writing—original draft, X.L.; Writing—review & editing, X.L.; Visualization, X.L. and C.C.; Supervision, Q.W. and L.G.; Project administration, X.L. and L.G.; Funding acquisition, Q.W. and L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program (grant number 2023YFD1802200), the National Natural Science Foundation of China (grant number 32472007), and the University Natural Science Research Project of Anhui Province (grant numbers 2025AHGXZK10033, 2025AHGXZK40390).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. He, Y.; Ma, T.; Li, C.; Ma, P.; Xiang, H.; Wang, J.; Liu, Y.; Song, B.; Zeng, X. ImageDDI: Image-enhanced molecular motif sequence representation for drug-drug interaction prediction. Inf. Fusion 2025, 126, 103574.
2. Li, T.; Zhou, S.; Wang, L.; Zhao, T.; Wang, J.; Shao, F. Docetaxel, cyclophosphamide, and epirubicin: Application of PBPK modeling to gain new insights for drug-drug interactions. J. Pharmacokinet. Pharmacodyn. 2024, 51, 367–384.
3. Dunbar, D.; Ouanounou, A. An update on drug interactions involving anti-inflammatory and analgesic medications in oral and maxillofacial medicine: A narrative review. Front. Oral Maxillofac. Med. 2025, 7, 11.
4. Kumar, D.; Hemamalini, V.; Tyagi, A.K.; Singh, R. Smart sensors for Hospital 4.0/5.0: An introduction. In Human-Centric Integration of Next-Generation Data Science and Blockchain Technology; Academic Press: Cambridge, MA, USA, 2025; pp. 369–384.
5. Wenhua, Z.; Qamar, F.; Abdali, T.A.N.; Hassan, R.; Jafri, S.T.A.; Nguyen, Q.N. Blockchain technology: Security issues, healthcare applications, challenges and future trends. Electronics 2023, 12, 546.
6. Benaissa, A.; Retiat, B.; Cebere, B.; Belfedhal, A.E. Tenseal: A library for encrypted tensor operations using homomorphic encryption. arXiv 2021, arXiv:2104.03152.
7. Almotairi, S.; Addula, S.R.; Alharbi, O.; Alzaid, Z.; Hausawi, Y.M.; Almutairi, J. Personal data protection model in IOMT-blockchain on secured bit-count transmutation data encryption approach. Fusion Pract. Appl. 2024, 16, 152–170.
8. Hughes, J.E.; Moriarty, F.; Bennett, K.E.; Cahir, C. Drug–Drug interactions and the risk of adverse drug reaction-related hospital admissions in the older population. Br. J. Clin. Pharmacol. 2024, 90, 959–975.
9. Wang, Q. Interpretable vertical federated learning with privacy-preserving multi-source data integration for prognostic prediction. Eng. Appl. Artif. Intell. 2025, 148, 110408.
10. Lee, C.K.; Samad, M.; Hofer, I.; Cannesson, M.; Baldi, P. Development and validation of an interpretable neural network for prediction of postoperative in-hospital mortality. Npj Digit. Med. 2021, 4, 8.
11. Yu, H.; Wang, Q.; Zhou, X. Adaptive-weighted federated graph convolutional networks with multi-sensor data fusion for drug response prediction. Inf. Fusion 2025, 122, 103147.
12. Li, D.; Yang, Y.; Cui, Z.; Yin, H.; Hu, P.; Hu, L. LLM-DDI: Leveraging Large Language Models for Drug-Drug Interaction Prediction on Biomedical Knowledge Graph. IEEE J. Biomed. Health Inform. 2025, 30, 773–781.
13. Hong, Y.; Luo, P.; Jin, S.; Liu, X. LaGAT: Link-aware graph attention network for drug–drug interaction prediction. Bioinformatics 2022, 38, 5406–5412.
14. Pujahari, S.R.; Engla, S.; Soni, R.; Patra, S.; Hanawal, M.K.; Kumar, A. Structural similarity of biological drugs using statistical signal processing and nuclear magnetic resonance spectral pattern analysis. Mol. Pharm. 2025, 22, 2684–2693.
15. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
16. Qiao, H.; Tong, H.; An, B.; King, I.; Aggarwal, C.; Pang, G. Deep Graph Anomaly Detection: A Survey and New Perspectives. IEEE Trans. Knowl. Data Eng. 2025, 37, 5106–5126.
17. Latif, S.; Driss, M.; Boulila, W.; Huma, Z.E.; Jamal, S.S.; Idrees, Z.; Ahmad, J. Deep learning for the industrial internet of things (IIoT): A comprehensive survey of techniques, implementation frameworks, potential applications, and future directions. Sensors 2021, 21, 7518.
18. Bisht, A.; Avinash, D.; Sahu, K.K.; Patel, P.; Das Gupta, G.; Kurmi, B.D. A comprehensive review on doxorubicin: Mechanisms, toxicity, clinical trials, combination therapies and nanoformulations in breast cancer. Drug Deliv. Transl. Res. 2025, 15, 102–133.
19. Wang, X.; Liu, Y.; Wang, Q.; Gu, L. Disentangled contrastive learning with dynamic intent adaptation for unveiling gene–drug associations. Briefings Bioinform. 2025, 26, bbaf530.
20. Nebert, D.W.; Russell, D.W. Clinical importance of the cytochromes P450. Lancet 2002, 360, 1155–1162.
21. Niu, J.; Straubinger, R.M.; Mager, D.E. Pharmacodynamic drug–drug interactions. Clin. Pharmacol. Ther. 2019, 105, 1395–1406.
22. Vilar, S.; Harpaz, R.; Uriarte, E.; Santana, L.; Rabadan, R.; Friedman, C. Drug-drug interaction through molecular structure similarity analysis. J. Am. Med. Inform. Assoc. 2012, 19, 1066–1074.
23. Deng, Y.; Xu, X.; Qiu, Y.; Xia, J.; Zhang, W.; Liu, S. A multimodal deep learning framework for predicting drug–drug interaction events. Bioinformatics 2020, 36, 4316–4322.
24. Luo, T.; Lin, T.; Yang, C.; Fan, L.; Wang, W. A Drug-Drug Interaction Prediction Method Based on Atomic 3D Position Encoding and Elastic Message Passing Graph Neural Network. IEEE J. Biomed. Health Inform. 2025, 29, 6915–6928.
25. Chen, Y.; Ma, T.; Yang, X.; Wang, J.; Song, B.; Zeng, X. MUFFIN: Multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics 2021, 37, 2651–2658.
26. Chen, C.; Shi, X.; Nie, J.; Xu, J.; Wang, L. A Molecular Representation Learning Model Based on Multidimensional Joint and Cross-Learning for Drug–Drug Interaction Prediction. J. Chem. Inf. Model. 2025, 65, 8889–8900.
27. Hang, C.N.; Yu, P.D.; Chen, S.; Tan, C.W.; Chen, G. MEGA: Machine learning-enhanced graph analytics for infodemic risk management. IEEE J. Biomed. Health Inform. 2023, 27, 6100–6111.
28. Fu, W.; Wang, H.; Gao, C.; Liu, G.; Li, Y.; Jiang, T. Privacy-Preserving Individual-Level COVID-19 Infection Prediction via Federated Graph Learning. ACM Trans. Inf. Syst. 2024, 42, 82.
29. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795.
30. Panahandeh, F.; Mansouri, N. A comprehensive review of neural network-based approaches for drug–target interaction prediction. Mol. Divers. 2025.
31. Xiao, H.; Huang, M.; Hao, Y.; Zhu, X. TransA: An adaptive approach for knowledge graph embedding. arXiv 2015, arXiv:1509.05490.
32. Cao, Z.; Xu, Q.; Yang, Z.; He, Y.; Cao, X.; Huang, Q. GAHE: Geometry-Aware Embedding for Hyper-Relational Knowledge Graph Representation. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 205.
33. Zhu, H.; Zeng, Y. SectorE: Knowledge Graph Embeddings with Representing Relations as Annular Sectors. In Proceedings of the International Conference on Advanced Data Mining and Applications, Kyoto, Japan, 22–24 October 2025; Springer: Singapore, 2025; pp. 451–466.
34. Liben-Nowell, D.; Kleinberg, J. The link prediction problem for social networks. In Proceedings of the Twelfth International Conference on Information and Knowledge Management, New Orleans, LA, USA, 3–8 November 2003; pp. 556–559.
35. Lin, X.; Quan, Z.; Wang, Z.J.; Ma, T.; Zeng, X. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20, Yokohama, Japan, 11–17 July 2020; Volume 380, pp. 2739–2745.
36. Wishart, D.S.; Feunang, Y.D.; Guo, A.C.; Lo, E.J.; Marcu, A.; Grant, J.R.; Sajed, T.; Johnson, D.; Li, C.; Sayeeda, Z.; et al. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082.
37. Liu, H.; Sun, J.; Guan, J.; Zheng, J.; Zhou, S. Improving compound–protein interaction prediction by building up highly credible negative samples. Bioinformatics 2015, 31, i221–i229.
38. Luck, K.; Kim, D.K.; Lambourne, L.; Spirohn, K.; Begg, B.E.; Bian, W.; Brignall, R.; Cafarelli, T.; Campos-Laborie, F.J.; Charloteaux, B.; et al. A reference map of the human binary protein interactome. Nature 2020, 580, 402–408.
39. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
40. Yu, Y.; Huang, K.; Zhang, C.; Glass, L.M.; Sun, J.; Xiao, C. SumGNN: Multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 2021, 37, 2988–2995.
41. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826.
42. Liu, W.; Lü, L. Link prediction based on local random walk. Europhys. Lett. 2010, 89, 58007.
43. Balogh, O.M.; Benczik, B.; Horváth, A.; Pétervári, M.; Csermely, P.; Ferdinandy, P.; Ágg, B. Efficient link prediction in the protein–protein interaction network using topological information in a generative adversarial network machine learning model. BMC Bioinform. 2022, 23, 78.
44. Huang, K.; Xiao, C.; Glass, L.M.; Zitnik, M.; Sun, J. SkipGNN: Predicting molecular interactions with skip-graph networks. Sci. Rep. 2020, 10, 21092.
45. Wang, X.W.; Madeddu, L.; Spirohn, K.; Martini, L.; Fazzone, A.; Becchetti, L.; Wytock, T.P.; Kovács, I.A.; Balogh, O.M.; Benczik, B.; et al. Assessment of community efforts to advance network-based prediction of protein–protein interactions. Nat. Commun. 2023, 14, 1582.
46. Dudziak-Gajowiak, D.; Juszczyszyn, K.; Chudzicki, D.M.; Skorupka, D. Link Prediction Using Temporal Graph Neural Network Model. Electronics 2026, 15, 662.
47. Nguyen, T.; Le, H.; Quinn, T.P.; Nguyen, T.; Le, T.D.; Venkatesh, S. GraphDTA: Predicting drug–target binding affinity with graph neural networks. Bioinformatics 2021, 37, 1140–1147.
48. Chen, L.; Tan, X.; Wang, D.; Zhong, F.; Liu, X.; Yang, T.; Luo, X.; Chen, K.; Jiang, H.; Zheng, M. TransformerCPI: Improving compound–protein interaction prediction by sequence-based deep learning with self-attention mechanism and label reversal experiments. Bioinformatics 2020, 36, 4406–4414.
49. Li, F.; Zhang, Z.; Guan, J.; Zhou, S. Effective drug–target interaction prediction with mutual interaction neural network. Bioinformatics 2022, 38, 3582–3589.
50. Abualigah, L.; Sheikhan, A.; Ikotun, A.M.; Abu Zitar, R.; Alsoud, A.R.; Al-Shourbaji, I.; Hussien, A.G.; Jia, H. Particle swarm optimization algorithm: Review and applications. In Metaheuristic Optimization Algorithms: Optimizers, Analysis, and Applications; Elsevier (Morgan Kaufmann): Amsterdam, The Netherlands, 2024; pp. 1–14.
51. Ju, W.; Yi, S.; Wang, Y.; Xiao, Z.; Mao, Z.; Li, H.; Gu, Y.; Qin, Y.; Yin, N.; Wang, S.; et al. A survey of graph neural networks in real world: Imbalance, noise, privacy and OOD challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2026, 48, 3036–3055.
52. Rahaman, M.; Arya, V.; Orozco, S.M.; Pappachan, P. Secure multi-party computation (SMPC) protocols and privacy. In Innovations in Modern Cryptography; IGI Global: Hershey, PA, USA, 2024; pp. 190–214.
53. Feng, Z.; Wang, R.; Wang, T.; Song, M.; Wu, S.; He, S. A Comprehensive Survey of Dynamic Graph Neural Networks: Models, Frameworks, Benchmarks, Experiments and Challenges. IEEE Trans. Knowl. Data Eng. 2026, 38, 26–46.

Figure 1. Visualization of the same maxima distribution space. (a) illustrates the gray same maxima distribution plane formed by a vertical line of two vectors when m = 1. (b) represents the gray same maxima distribution space formed by a vertical plane of two vectors when 2m = 3. Both the plane and space depicted in the figures are extendable, and the circular shapes are employed for enhanced understanding.

Figure 2. Illustration of the proposed PINGE. The DDI graph is constructed from the training set; the negative sampling component and the mapping of drug-pair joint features to the relation domain are its key parts. ⊕ denotes element-wise addition.

Figure 3. Schematic of remote negative link generation. Four connected nodes are depicted in (a) and five connected nodes are depicted in (b). The dashed connections indicate possible connections between two nodes that cannot be classified as negative samples.

Figure 4. Loss-value comparison under different class-imbalance ratios (mean ± std over 5 repeats).

Figure 5. Loss-value comparison under different sample-noise ratios (mean ± std over 5 repeats).

Figure 6. The results of ablation experiments.

Figure 7. Sensitivity of the proposed method to feature dimensions.

Figure 8. Sensitivity of remote negative link generation methods to distance n. (a) Shows the AUC performance of different methods with respect to distance n; (b) Shows the AUPR performance of different methods with respect to distance n.

Figure 9. Heat maps of the cosine similarity of drug joint features between the training set and test set in multi-class classification. (a) Cosine similarity heat map for correctly predicted categories. (b) Cosine similarity heat map for incorrectly predicted categories. (c) Maximum-similarity heat map extracted from (a). (d) Maximum-similarity heat map extracted from (b).

Figure 10. PINGE demonstrates structure-based interpretable predictions. (a) denotes an interpretable prediction with duplicate nodes and (b) denotes an interpretable prediction with non-duplicate nodes. The dashed line indicates the connection of the predictions.

Table 1. Hyperparameter settings of PINGE.

Hyperparameter	Setting
Dimension of drug embedding	800
Pre-training epochs	1200
Negative sample size (pre-training)	128
Threshold of feature value	12.0
Number of layers of multi-layer perceptron	4
Learning rate	0.001
Batch size	2048
Fine-tuning epochs (per fold)	100

Table 2. Experimental results on the KEGG-drug dataset. Bold font indicates the best performance.

Methods	AUC	ACC	F1	AUPR
KGNN	0.956 ± 0.002	0.901 ± 0.003	0.903 ± 0.002	0.942 ± 0.003
GAT	0.974 ± 0.001	0.928 ± 0.002	0.929 ± 0.002	0.966 ± 0.003
GAT-const	0.976 ± 0.002	0.932 ± 0.005	0.933 ± 0.005	0.971 ± 0.002
LaGAT	0.989 ± 0.001	0.959 ± 0.002	0.959 ± 0.002	0.989 ± 0.002
PING_TransE	0.967 ± 0.001	0.902 ± 0.002	0.917 ± 0.007	0.969 ± 0.001
PINGE	0.993 ± 0.001	0.966 ± 0.001	0.965 ± 0.001	0.994 ± 0.001

Table 3. Experimental results on the DrugBank dataset. Bold font indicates the best performance.

Methods	ACC	Macro-Precision	Macro-F1	Macro-Recall
KGNN	0.924 ± 0.002	0.862 ± 0.017	0.837 ± 0.001	0.830 ± 0.016
GAT	0.914 ± 0.001	0.886 ± 0.005	0.877 ± 0.001	0.873 ± 0.001
GAT-const	0.910 ± 0.001	0.875 ± 0.005	0.864 ± 0.007	0.864 ± 0.007
SumGNN	0.906 ± 0.003	0.863 ± 0.001	0.830 ± 0.001	0.820 ± 0.001
GIN	0.932 ± 0.001	0.906 ± 0.002	0.902 ± 0.002	0.898 ± 0.002
LaGAT	0.953 ± 0.001	0.928 ± 0.009	0.910 ± 0.008	0.899 ± 0.007
PING_TransE	0.972 ± 0.001	0.959 ± 0.005	0.943 ± 0.005	0.934 ± 0.004
PINGE	0.977 ± 0.001	0.962 ± 0.009	0.948 ± 0.007	0.941 ± 0.06

Table 4. Experimental results on the HuRI dataset. Bold font indicates the best performance.

Methods	AUC	AUPR	ACC
RW	0.849 ± 0.003	–	–
cGAN1	0.755 ± 0.021	0.761 ± 0.019	0.761 ± 0.019
SkipGNN	0.925 ± 0.003	0.924 ± 0.003	0.748 ± 0.002
DNN+node2vec	0.939 ± 0.033	0.939 ± 0.034	0.792 ± 0.035
SEAL	0.985 ± 0.003	0.989 ± 0.002	0.946 ± 0.032
PINGE	0.996 ± 0.002	0.995 ± 0.001	0.995 ± 0.001

Table 5. Experimental results on the human dataset. Bold font indicates the best performance.

Methods	AUC	Recall	Precision
GCN	0.956 ± 0.004	0.862 ± 0.006	0.928 ± 0.010
GraphDTA	0.960 ± 0.005	0.882 ± 0.040	0.882 ± 0.040
TransformerCPI	0.973 ± 0.002	0.916 ± 0.006	0.925 ± 0.006
DrugVQA	0.979 ± 0.003	0.961 ± 0.002	0.954 ± 0.030
MIN-DTI	0.981 ± 0.003	0.945 ± 0.030	0.902 ± 0.045
PINGE	0.991 ± 0.001	0.964 ± 0.018	0.952 ± 0.017
