Article

Biomedical Knowledge Graph Embedding with Hierarchical Capsule Network and Rotational Symmetry for Drug-Drug Interaction Prediction

1 China Jiliang University College of Modern Science and Technology, 8 Daxue Road, Yiwu 322000, China
2 School of Information Technology, Renmin University of China, Haidian District, Beijing 100872, China
3 Yangtze River Delta Research Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
4 Institute of Integrated Circuit Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611700, China
* Authors to whom correspondence should be addressed.
Symmetry 2025, 17(11), 1793; https://doi.org/10.3390/sym17111793
Submission received: 18 August 2025 / Revised: 19 September 2025 / Accepted: 22 September 2025 / Published: 23 October 2025
(This article belongs to the Section Computer)

Abstract

The forecasting of Drug-Drug Interactions (DDIs) is essential in pharmacology and clinical practice to prevent adverse drug reactions. Existing approaches, often based on neural networks and knowledge graph embedding, face limitations in modeling correlations among drug features and in handling complex BioKG relations, such as one-to-many, hierarchical, and composite interactions. To address these issues, we propose Rot4Cap, a novel framework that embeds drug entity pairs and BioKG relationships into a four-dimensional vector space, enabling effective modeling of diverse mapping properties and hierarchical structures. In addition, our method integrates molecular structures and drug descriptions with BioKG entities, and it employs capsule network–based attention routing to capture feature correlations. Experiments on three benchmark BioKG datasets demonstrate that Rot4Cap outperforms state-of-the-art baselines, highlighting its effectiveness and robustness.

1. Introduction

In clinical settings, patients are often administered multiple drugs simultaneously to achieve optimal therapeutic outcomes. However, such polypharmacy increases the likelihood of Drug-Drug Interactions (DDIs), where the pharmacological effect of one drug is altered by another, potentially resulting in adverse drug reactions (ADRs) that range from mild symptoms to severe complications or even death. Empirical evidence shows that the probability of DDIs rises sharply with the number of co-administered drugs, from approximately 6% for two drugs to 50% for five, and it is nearly 100% when ten drugs are prescribed concurrently [1]. These observations underscore the critical need for accurate DDI prediction to enhance patient safety and treatment efficacy.
Over the past decade, numerous machine learning approaches have been proposed to predict DDIs, aiming to reduce drug development costs and improve safety profiles. Early methods primarily relied on feature-based models that incorporated attributes such as positional or syntactic features [2], but their performance was limited by the extensive manual feature engineering required. Subsequent neural network-based models, including BERT-derived architectures, have considerably advanced DDI extraction; nevertheless, they often focus on graph structure while underexploiting node and edge attributes [3]. To overcome these limitations, knowledge graph (KG)-based techniques have been developed [4], employing knowledge graph embedding (KGE) and integrating auxiliary information. For instance, the approach by Xin et al. [5] combined PubMedBERT, CNN, and RotatE [6] to jointly leverage textual and structural knowledge, achieving enhanced predictive accuracy.
Despite these advancements, significant challenges persist. Within Biomedical Knowledge Graphs (BioKGs), drug entities are interconnected through complex relation patterns, including relation mapping properties (RMPs, e.g., 1-to-1, 1-to-N, N-to-1, and N-to-N) and hierarchical structures. Existing KG-based models often fail to reason effectively over these patterns. Although KG2ECapsule [7] introduced capsule networks to model relation mappings, it does not fully exploit molecular structures or textual descriptions, limiting its capacity to capture comprehensive drug information.
To address these challenges, we propose Rot4Cap, a novel framework that embeds BioKG entities and relations into a four-dimensional vector space, facilitating reasoning over relation mapping properties and hierarchical structures via geometric symmetry. Additionally, molecular features are extracted using graph neural networks (GNNs), while BERT and CNN are employed to encode semantic representations of drug descriptions. These heterogeneous features are integrated within a capsule network utilizing an inverted dot-product attention routing mechanism, enabling richer information exchange between entities and relations.
The key contributions of this work are summarized as follows:
  • We introduce Rot4Cap, which embeds BioKGs in a 4D vector space and leverages geometric symmetry to capture complex relational patterns, including RMPs and hierarchical structures.
  • We integrate BERT, CNN, and GNN to obtain molecular and textual representations of drugs, employing capsule networks to model inter-dimensional correlations of entity embeddings.
  • We demonstrate the effectiveness of Rot4Cap on three widely used BioKG datasets, where it consistently surpasses both traditional and state-of-the-art DDI prediction models.

2. Related Work

In the early stages of Drug-Drug Interaction (DDI) research, conventional feature-based machine learning approaches dominated the field. For instance, Chowdhury et al. [2] applied kernel-based methods for DDI extraction, whereas Thomas et al. [8] employed majority voting techniques specifically designed for this task. Subsequent studies sought to enhance predictive performance by incorporating a broader set of feature types, including relative positional information, syntactic structure features, and phrasal auxiliary verb features, as illustrated in the works of Bokharaeian et al. [9], Kim et al. [10], and Xin et al. [5]. Despite these improvements, these early approaches remained heavily dependent on manual feature engineering and selection, which limited their effectiveness and risked overlooking subtle yet important patterns in the data.
In contrast, recent years have seen a transition toward deep neural network (NN)-based models for DDI prediction, leveraging their ability to automatically capture complex relationships from high-dimensional data. Broadly, these models can be classified into three categories: (I) Matrix Factorization (MF)-based approaches, as exemplified by Belkin et al. [11] and Cao et al. [12]; (II) Random Walk (RW)-based methods, as demonstrated by Perozzi et al. [13] and Grover et al. [14]; and (III) Neural Network (NN)-based approaches, with significant contributions from Tang et al. [15] and Wang et al. [16]. Compared to traditional machine learning techniques, these NN-based models have consistently yielded superior performances, representing a substantial advancement in DDI prediction research.
The deployment of advanced language models such as BERT has markedly influenced the domain of Drug-Drug Interaction (DDI) extraction from biomedical texts. In this context, the research community has been actively investigating the application of BERT variants, specifically tailored for biomedical language processing, including BioBERT [17] and SciBERT [18]. Notable in this regard is the work of Asada et al. [19], who utilized drug description and molecular structure data to augment SciBERT’s efficacy in DDI extraction. However, these models exhibit a certain limitation in their representation of potential inter-entity relationships. Predominantly, they focus on the linkages between nodes whilst neglecting critical dimensions, such as node attributes and edge types. As a result, there remains a need for enhancement in these models to achieve a more comprehensive understanding of DDIs, encompassing the intricate web of relationships and contextual data associated with drug entities.
Alongside these advancements, knowledge graph embedding (KGE) techniques have increasingly been applied to DDI prediction, facilitating the automated extraction of informative features for inference. KGE-based methods have demonstrated notable effectiveness in this domain. For example, the Tiresias model [20] was among the earliest to integrate multiple drug-related attributes into a Biomedical Knowledge Graph (BioKG), employing a logistic regression classifier to compute similarity metrics between drugs for potential DDI prediction. Similarly, Celebi et al. [21] utilized classical KGE models, including TransE [22] and TransD [23], to infer drug interactions. More recently, BERTKG-DDIs [24] extended traditional KGE approaches by modeling interactions between medicine embeddings and other biomedical entities, incorporating a BioBERT-based Relation Classification (RC) framework tailored to the biomedical domain. Nevertheless, many of these methods depend on simple mathematical operations, such as addition, subtraction, or basic multiplication, which limits their ability to capture complex inter-entity relationships. To overcome this limitation, Ma [25] introduced an attention mechanism to assign weights to different views, thereby improving model interpretability and relational reasoning.
In recent studies, KGNN [26] has leveraged neural networks to capture both higher-order structural patterns and semantic relationships within knowledge graphs (KGs). The model treats the neighborhood of each entity as its local receptive field and integrates information from neighboring entities with deviations in the current entity representation to enhance the embedding quality. Similarly, Xin [5] proposes a hybrid framework that combines neural networks with knowledge graph embedding, incorporating multiple levels of textual features into the neural network component while employing RotatE [6] for knowledge graph modeling. This integrated approach has demonstrated strong performance in DDI prediction. However, most existing network-based methods in this category approach DDI prediction as a binary classification task, which insufficiently reflects the diverse and complex relational mapping properties (RMPs) present among drug pairs. Addressing this limitation, KG2ECapsule [7] introduces a capsule network combined with a Bernoulli distribution framework to model these relationship patterns more effectively. Despite its promise, the inherent computational complexity of the Bernoulli-based approach limits its practical modeling efficiency. More recently, TIGER [27] has presented a Transformer-based relation-aware graph representation learning framework specifically designed for DDI prediction, providing a novel avenue for capturing complex drug interactions.

3. Model

In our research, we developed the Rot4Cap framework, which is predicated on a fusion of a four-dimensional (4D) vector space and a capsule network architecture, as depicted in Figure 1. The Rot4Cap framework is composed of three principal components, each contributing to its overall functionality. The first component, the entity embedding layer, maps drug pair entities and their corresponding relationships into a four-dimensional (4D) vector space. By incorporating geometric symmetry within this space, the embedding process enhances the capacity to represent complex relationship patterns, ranging from chains and hierarchical structures to symmetric and asymmetric relation mapping properties (RMPs). Subsequently, the molecular structure layer of Rot4Cap employs graph neural networks (GNNs) to represent the molecular graph structures of drugs. In parallel, the descriptive sentences pertaining to drugs are processed through a combination of BERT and CNN algorithms, thereby translating these textual descriptions into real-valued, fixed-size vectors. This dual approach allows for the extraction of fundamental features, including molecular organization and literature-based descriptions of drug pairs. The final component of the Rot4Cap framework is the capsule network layer. This layer is tasked with the intricate learning of feature variables encapsulated within each capsule. Its primary objective is to optimize the retention of valuable information within the network. Additionally, this layer plays a critical role in predicting the likelihood and nature of interactions between two entities within the relational space defined by the framework. This multifaceted approach enables Rot4Cap to address complex Drug-Drug Interaction prediction tasks with enhanced efficacy.

3.1. Motivation for Four-Dimensional Embedding

The adoption of a four-dimensional (4D) embedding space is motivated by both mathematical properties and empirical observations. Mathematically, quaternions—consisting of one real and three imaginary components—are inherently 4D objects. Embedding entities and relations in 4D space enables the use of quaternion algebra and $SO(4)$ group operations, which allow for the simultaneous representation of left- and right-isoclinic rotations. This provides greater expressive power than lower-dimensional spaces (e.g., $SO(2)$ or $SO(3)$), which can only capture limited rotational symmetries. Such flexibility is crucial for biomedical knowledge graphs, where asymmetric (e.g., Drug A inhibits Drug B but not vice versa) and hierarchical relations (e.g., drug categories, sub-classes, and specific molecules) are pervasive.
From an empirical perspective, we conducted ablation studies comparing embeddings in 2D, 3D, and 4D spaces. Results demonstrate that 4D embeddings achieve higher accuracy in modeling relation mapping properties and hierarchical structures while avoiding the excessive computational burden and risk of overfitting associated with higher dimensions (e.g., 8D and 16D). Thus, the 4D embedding space offers an effective balance between expressiveness and efficiency, making it a principled choice for our framework.

3.2. Entity Embedding Layer

We represent the BioKG $\mathcal{G}$ as a collection of drug–relation–drug triples $\{(h, r, t) \mid h, t \in \mathcal{E}, r \in \mathcal{R}\}$, where $\mathcal{E}$ is the entity set and $\mathcal{R}$ is the relation set. To capture richer semantics, entities and relations are embedded in a four-dimensional (4D) vector space based on the $SO(4)$ group [28].
A quaternion $q$ is defined as
$$q = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}, \qquad a, b, c, d \in \mathbb{R},$$
where $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$. Equivalently, $q$ can be expressed as a scalar–vector pair $q = [a, \mathbf{v}]$, where $\mathbf{v} \in \mathbb{R}^3$.
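The quaternion algebra above can be made concrete in a few lines. The following is a minimal illustration (not part of the released implementation) of the Hamilton product for quaternions stored as `(a, b, c, d)` tuples:

```python
def qmul(q1, q2):
    """Hamilton product of two quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part
            a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i coefficient
            a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j coefficient
            a1*d2 + b1*c2 - c1*b2 + d1*a2)   # k coefficient
```

Because the product is non-commutative ($\mathbf{i}\mathbf{j} = \mathbf{k}$ but $\mathbf{j}\mathbf{i} = -\mathbf{k}$), quaternion rotations can distinguish the direction of a relation, which is exactly what asymmetric DDI relations require.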
Each entity is embedded as a quaternion vector, e.g.,
$$\mathbf{h} = a_h + b_h\mathbf{i} + c_h\mathbf{j} + d_h\mathbf{k}, \qquad \mathbf{t} = a_t + b_t\mathbf{i} + c_t\mathbf{j} + d_t\mathbf{k},$$
where $a_\cdot, b_\cdot, c_\cdot, d_\cdot \in \mathbb{R}^k$.
The relation $r$ defines a pair of unit quaternions $p$ and $p'$, which represent left- and right-isoclinic rotations as follows:
$$p = [\cos\theta, \mathbf{u}\sin\theta], \qquad p' = [\cos\sigma, \mathbf{v}\sin\sigma],$$
where $\theta, \sigma \in [0, \pi]$ are rotation parameters and $\mathbf{u}, \mathbf{v} \in \mathbb{R}^3$ are unit axis vectors.
Hierarchy modeling. Pure rotations preserve vector norms and thus struggle to represent hierarchical relations. To address this, we introduce a relation-specific weight matrix $W_r$ that scales entity norms in a hierarchy-aware manner. Specifically, hierarchical relations are identified based on prior knowledge or dataset annotations. For such relations, entries $\omega \in W_r$ are initialized to 1 and kept fixed or constrained, while non-hierarchical relations allow $\omega \neq 1$, which is optimized during training. Intuitively, $W_r$ acts as a rescaling operator that ensures the rotated head entity $\mathbf{h}$ aligns with the hierarchical level of the tail entity $\mathbf{t}$. The scoring function is as follows:
$$f(h, r, t) = W_r \cdot (p \otimes \mathbf{h} \otimes p') \cdot \mathbf{t},$$
where $\otimes$ denotes the Hamilton product. This formulation explicitly defines how hierarchical relations are determined and how the weight matrix $W_r$ is applied, ensuring reproducibility.
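To make the scoring function concrete, the following sketch applies the left- and right-isoclinic rotations coordinate-wise and scores against the tail. It is illustrative only: the diagonal form of $W_r$ and the quaternion inner product are our simplifying assumptions, not the paper's exact implementation.

```python
import math

def qmul(q1, q2):
    """Hamilton product of quaternions stored as (a, b, c, d) tuples."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2)

def unit_quaternion(angle, axis):
    """[cos θ, u sin θ] for a unit 3-vector u (unit norm by construction)."""
    s = math.sin(angle)
    return (math.cos(angle), axis[0]*s, axis[1]*s, axis[2]*s)

def score(h, t, p, p_prime, w_r):
    """f(h, r, t) = W_r · (p ⊗ h ⊗ p') · t with a diagonal W_r.
    h, t: lists of k quaternion coordinates; w_r: k per-coordinate scales."""
    total = 0.0
    for h_q, t_q, w in zip(h, t, w_r):
        rot = qmul(qmul(p, h_q), p_prime)                # left/right isoclinic rotation
        total += w * sum(a*b for a, b in zip(rot, t_q))  # quaternion inner product
    return total
```

With $\theta = \sigma = 0$ both rotations reduce to the identity, so the score collapses to a plain weighted inner product of $\mathbf{h}$ and $\mathbf{t}$; rotation by unit quaternions leaves each coordinate's norm unchanged, which is why the separate rescaling matrix $W_r$ is needed for hierarchies.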

3.3. Molecular Structure Layer

In the Biomedical Knowledge Graph (BioKG) dataset, each pharmacological agent is accompanied by descriptive sentences elucidating its characteristics and therapeutic efficacy. For instance, the introductory sentence pertaining to the Salbutamol drug elucidates that “Salbutamol is a short-acting, selective beta2-adrenergic receptor agonist utilized in the management of asthma and COPD.” Additionally, this dataset encompasses details regarding the molecular configuration of each drug, encapsulated in the form of Simplified Molecular Input Line Entry System (SMILES) string codes. In the context of this study, the SMILES string representations of drug molecule pairs from the BioKG dataset serve as the foundational input. SMILES is a linear notation for representing chemical molecules, and it is widely used in cheminformatics and computational chemistry. These representations are, subsequently, transformed into graph structures utilizing the RDKit software (version 2022.09.5) [29]. Furthermore, molecular fingerprints are derived from these graph structures, employing a preprocessing script, as delineated by Tsubaki et al. [30]. This research project endeavors to establish a connection between mentions of drug pairs and their corresponding entries within drug databases. In doing so, it seeks to retrieve comprehensive descriptions and molecular structure data for these drugs. To accomplish this, we deploy the BERT (specifically, SciBERT) and CNN frameworks for the analysis of drug descriptions, alongside the application of graph neural networks (GNNs) for the interpretation of molecular structures of the drugs.
The textual descriptions of drugs are encoded into real-valued, fixed-length vectors using a combination of BERT and CNN. In our framework, block-level embeddings obtained from BERT serve as the input to the convolutional layer. Let the embeddings of a drug pair be denoted as $\mathbf{h}_w \in \mathbb{H}^d$ and $\mathbf{t}_w \in \mathbb{H}^d$, where $d$ represents the dimensionality of the entity embeddings. The convolutional operation on these embeddings is expressed as follows:
$$\mathbf{m}_h = f(\mathbf{W} \odot \mathbf{h}_w + b),$$
where $\odot$ denotes element-wise multiplication, $\mathbf{W} \in \mathbb{H}^{d \times f}$ is the convolutional filter bank with $f$ filters, $b$ is a bias term, and $f(\cdot)$ corresponds to the Gaussian Error Linear Unit (GELU) activation function [31]. A max-pooling operation is subsequently applied to the output of each filter, yielding a fixed-length vector $\mathbf{h}_d = \max_i \mathbf{m}_h$. Following the same procedure for $\mathbf{t}_w$, we obtain the description-based embedding $\mathbf{t}_d$ for the second drug in the pair.
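As a plain-Python sketch of this per-filter operation (filter values and dimensions here are arbitrary; a real implementation would use PyTorch tensors):

```python
import math

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def encode_description(h_w, filters, b):
    """m = GELU(W ⊙ h_w + b) per filter, then max-pool to one value per filter."""
    pooled = []
    for W in filters:                       # one filter (a d-dim weight vector) at a time
        m = [gelu(w * x + b) for w, x in zip(W, h_w)]
        pooled.append(max(m))               # max-pooling over the d positions
    return pooled                           # fixed-length description embedding h_d
```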
The molecular structures of drugs are modeled as graphs using GNNs, where atoms correspond to nodes and chemical bonds represent edges in the molecular graph $\mathcal{G}$. We adopted the neural molecular GNN approach proposed by [30], which leverages r-radius subgraphs or molecular fingerprints to encode each atom along with its local chemical environment. Initially, fingerprint vectors for each atom are assigned randomly and subsequently updated based on the graph topology of the molecule. Denoting the feature vector of the $i$-th atom as $\mathbf{a}_i$ and its set of neighboring atoms as $N_i$, the update at the $\ell$-th iteration is formulated as follows:
$$\mathbf{a}_i^{(\ell)} = \mathbf{a}_i^{(\ell-1)} + \sum_{j \in N_i} f\!\left(\mathbf{W}_b^{(\ell-1)} \mathbf{b}_j^{(\ell-1)} + \mathbf{b}_b^{(\ell-1)}\right),$$
where $f(\cdot)$ denotes the Rectified Linear Unit (ReLU) activation function, and $\mathbf{W}_b$ and $\mathbf{b}_b$ are the learnable weight and bias parameters, respectively. The molecular representation for the entire drug is obtained by aggregating the embeddings of all constituent atoms, which are then passed through a linear transformation:
$$\mathbf{h}_e / \mathbf{t}_e = f\!\left(\mathbf{W}_{\mathrm{out}} \sum_{i=1}^{M} \mathbf{b}_i^{(L)} + \mathbf{b}_{\mathrm{out}}\right),$$
where $M$ denotes the total number of fingerprints in the molecule, and $\mathbf{W}_{\mathrm{out}}$ and $\mathbf{b}_{\mathrm{out}}$ represent the weight and bias parameters of the output layer, respectively.
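The message-passing update and the sum-pooled readout can be sketched as follows (a toy illustration on dense lists; the output linear layer and activation are omitted for brevity, and all weights are assumptions):

```python
def relu(x):
    return max(0.0, x)

def gnn_step(feats, neighbors, W_b, b_b):
    """One update: a_i <- a_i + Σ_{j∈N_i} ReLU(W_b · b_j + b_b)."""
    dim = len(b_b)
    new_feats = []
    for i, a_i in enumerate(feats):
        agg = [0.0] * dim
        for j in neighbors[i]:                       # messages from bonded atoms
            for d in range(dim):
                msg = sum(W_b[d][e] * feats[j][e] for e in range(dim)) + b_b[d]
                agg[d] += relu(msg)
        new_feats.append([a_i[d] + agg[d] for d in range(dim)])
    return new_feats

def readout(feats):
    """Sum-pool atom embeddings into one molecular vector."""
    return [sum(f[d] for f in feats) for d in range(len(feats[0]))]
```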

3.4. Capsule Network Layer

In our framework, we integrate multiple types of inputs, including entity and relation embeddings, molecular graph representations, and descriptive information for drug pairs. These inputs are organized into a matrix representation:
$$\mathbf{B} = \left[\mathbf{h}_e, \mathbf{e}_h, \mathbf{h}_d, \mathbf{e}_r, \mathbf{t}_e, \mathbf{e}_t, \mathbf{t}_d\right] \in \mathbb{H}^{k \times 7}.$$
For each row $\mathbf{B}_i$ of $\mathbf{B}$, a convolution operation is applied using a filter $\omega$ to produce a feature map $\mathbf{q} = [q_1, q_2, \ldots, q_k]$:
$$q_i = g(\omega \cdot \mathbf{B}_i + b_c),$$
where $\omega \in \mathbb{H}^{1 \times 7}$, $(\cdot)$ represents the dot product, $b_c$ is a bias term, and $g(\cdot)$ is a non-linear activation function (such as ReLU or Sigmoid).
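A minimal sketch of this row-wise convolution (the seven input columns stand for the concatenated feature vectors; the concrete values are made up):

```python
def relu(x):
    return max(0.0, x)

def feature_map(B, omega, b_c):
    """q_i = g(ω · B_i + b_c) for every row B_i of the k×7 input matrix."""
    return [relu(sum(w * x for w, x in zip(omega, row)) + b_c) for row in B]
```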
Within the convolutional layer, this operation generates $k$ capsule values per filter. Denoting the total number of filters as $N$, we obtain $N$ feature maps, each capturing unique patterns across the same embedding dimensions. The first capsule layer, thus, contains $k$ capsules, with each capsule $i \in \{1, 2, \ldots, k\}$ outputting a vector $\mathbf{Q}_i \in \mathbb{H}^{N \times 1}$. These vectors are subsequently projected through weight matrices $\mathbf{w}_i \in \mathbb{H}^{d \times N}$ to yield $\mathbf{v}_i \in \mathbb{H}^{d \times 1}$. The projected vectors are then combined to form the input $\mathbf{s} \in \mathbb{H}^{d \times 1}$ for the next capsule layer. Finally, the capsule produces a normalized output vector $\mathbf{e} \in \mathbb{H}^{d \times 1}$, which is used to compute a score evaluating the correctness of a given triple:
$$\mathbf{v}_i = \mathbf{w}_i \cdot \mathbf{Q}_i, \qquad \mathbf{s} = \sum_i r_i \mathbf{v}_i, \qquad \mathbf{e} = \mathrm{LayerNorm}(\mathbf{s}), \qquad f = \mathbf{e} \cdot \mathbf{W}.$$
Layer Normalization [32] is applied to stabilize and accelerate the routing procedure. The coefficients $r_i$ are coupling parameters learned during training, controlling the contribution of each capsule's output to the aggregated representation.
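One aggregation step of this capsule layer can be sketched as follows (a toy illustration with fixed coupling coefficients; real routing would update them iteratively):

```python
import math

def layer_norm(v, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def capsule_aggregate(Q, w, r):
    """v_i = w_i · Q_i, s = Σ_i r_i v_i, e = LayerNorm(s)."""
    d = len(w[0])
    s = [0.0] * d
    for Q_i, w_i, r_i in zip(Q, w, r):
        v_i = [sum(w_i[row][col] * Q_i[col] for col in range(len(Q_i)))
               for row in range(d)]        # project N-dim capsule output to d dims
        s = [s_val + r_i * v_val for s_val, v_val in zip(s, v_i)]
    return layer_norm(s)
```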
Our proposed framework is illustrated in Figure 2 and configured there with the following illustrative parameters: embedding dimension $k = 6$, number of convolutional filters $N = 5$, $N$ neurons in the first capsule layer, and $d = 4$ neurons in the second capsule layer. The model's scoring function is formulated as follows:
$$f(h, r, t) = \mathrm{caps}\!\left(g\!\left(\left[\mathbf{h}_e, \mathbf{e}_h, \mathbf{h}_d, \mathbf{e}_r, \mathbf{t}_e, \mathbf{e}_t, \mathbf{t}_d\right] * \Omega\right)\right) \cdot \mathbf{W},$$
where $\Omega$ denotes the set of convolutional filters and $\mathbf{W}$ is the weight vector, both serving as shared hyperparameters in the convolution layer. Here, $*$ represents the convolution operation, $\mathrm{caps}$ denotes the capsule network function, and $g$ refers to the ReLU activation.
The utility of negative sampling has been widely recognized in both knowledge graph embedding learning [33] and word embedding training [34]. Accordingly, we adopted a loss function similar to the negative sampling loss [34] to efficiently optimize our model, which is expressed as follows:
$$\mathcal{L} = -\log \sigma\!\left(\gamma - f(h, r, t)\right) - \frac{1}{n} \sum_{i=1}^{n} \log \sigma\!\left(f(h_i', r, t_i') - \gamma\right),$$
where $\gamma$ is a fixed margin hyperparameter, $\sigma$ denotes the sigmoid function, and $(h_i', r, t_i')$ represents the $i$-th negative triple.
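The loss can be sketched directly from the formula; the sketch below assumes distance-like scores (lower means more plausible), matching the margin convention:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ns_loss(pos_score, neg_scores, gamma):
    """-log σ(γ - f(h,r,t)) - (1/n) Σ log σ(f(h'_i, r, t'_i) - γ)."""
    loss = -math.log(sigmoid(gamma - pos_score))                     # positive triple term
    loss -= sum(math.log(sigmoid(s - gamma)) for s in neg_scores) / len(neg_scores)
    return loss
```

A well-ranked batch (small positive score, large negative scores) yields a small loss, while an inverted ranking is heavily penalized.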

4. Experimental Design

4.1. Data Sets

To evaluate the proposed Rot4Cap, we conducted link prediction tasks on three widely used BioKG benchmark datasets: OGB-Biokg [35], DrugBank [36], and KEGG [37]. The detailed statistics of these datasets are summarized in Table 1.
OGB-Biokg was constructed by Stanford University and comprises five types of entities connected by 51 types of directed relations. DrugBank is a comprehensive, freely accessible database providing detailed information on drugs and drug targets, including chemical, pharmacological, pharmaceutical data, as well as sequence, structure, and pathway information. KEGG serves as a resource for understanding high-level biological functions and processes from molecular-level information, featuring manually curated pathway maps for metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, and human diseases.

4.2. Baselines and Metrics

To assess Rot4Cap’s effectiveness, we compared it with a wide range of baseline methods. Traditional network representation learning baselines include the following: MF-based (Laplacian [12]), RW-based (DeepWalk [14]), and NN-based (LINE [16]). For knowledge graph-based approaches, we included the following: KGNN [26], KGAT [38], R-GCN [39], BERTKG-DDIs [24], Xin [5], and KG2ECapsule [7].
To investigate the effect of embedding dimensionality, we modeled TransECap, RotatECap, and QuatECap in 1D, 2D, and 3D spaces using TransE [22], RotatE [6], and QuatE [40], respectively, instead of the conventional 4D space.
We evaluated model performance using the following metrics:
  • Accuracy (Acc.): Overall correctness.
    $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
  • Precision (Pre.): Ability to correctly identify positive instances.
    $$\mathrm{Precision} = \frac{TP}{TP + FP}.$$
  • Recall (Rec.): Ability to capture all relevant positive instances.
    $$\mathrm{Recall} = \frac{TP}{TP + FN}.$$
  • F1 Score (F1): Harmonic mean of Precision and Recall.
    $$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
  • Area Under the ROC Curve (AUC): The area under the ROC curve (TPR vs. FPR); higher values indicate better performance.
  • Area Under the Precision–Recall Curve (AUPR): The area under the Precision–Recall curve, particularly useful for imbalanced datasets.
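The four count-based metrics above follow directly from the confusion-matrix entries:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return accuracy, precision, recall, f1
```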

4.3. Implementation Details

Rot4Cap was implemented in PyTorch (Python 3.7) and executed on an Ubuntu 20.04 system. The maximum training iterations were set to 1000, with a batch size of 512 following RotatE [6]. Embedding dimension k was selected from { 100 , 200 , 500 , 1000 , 1500 } to balance representation capability and computation. Learning rate r was tuned within [ 0.01 , 1 ] , and margin γ { 6 , 9 , 12 , 24 , 30 } was chosen based on validation performance. The temperature parameter α for self-adversarial negative sampling was tuned in { 0.5 , 1.0 } .
Relation rotation angles were initialized uniformly in [ π , π ] . Entity, relation, and temporal embeddings were initialized from biquaternions following CapsE [41]. For convolutional feature extraction, the number of filters | ω | was chosen from { 50 , 100 , 200 , 400 } . In the capsule network, the second layer contained b = 10 neurons with a batch size of 128. The number of iterations m in the inverted dot-product attention routing was searched in { 1 , 3 , 5 , 7 , 9 } ; empirically, three iterations provided the best balance between stability and efficiency.

4.4. Computational Cost Analysis

To evaluate scalability, we measured the training time and memory consumption of Rot4Cap on a single NVIDIA RTX 3090 GPU (24 GB memory; NVIDIA Corporation, Santa Clara, CA, USA). For the largest dataset, the average training epoch required approximately 2.5 min, and the peak memory usage remained below 12 GB. Compared to baseline models, Rot4Cap demonstrated competitive efficiency: it required slightly more computation than RotatE due to the capsule routing, but it remained substantially lighter than graph neural network-based models, such as KGAT and R-GCN, which exceeded the available memory under similar embedding dimensions. This highlights that Rot4Cap balances modeling complexity with computational feasibility, enabling the integration of multi-source features without prohibitive cost.

4.5. Experiment Results

The experimental results, elucidated in Table 2, encompass an extensive comparative analysis of our proposed Rot4Cap model against a suite of knowledge graph (KG)-based models, including KGNN [26], KGAT [38], R-GCN [39], and BERTKG-DDIs [24]. This comparison demonstrates Rot4Cap's exemplary performance across all three evaluated datasets. Such an enhanced performance of Rot4Cap is primarily attributable to the innovative implementation of the capsule layer, which adeptly facilitates sophisticated entity representations within a designated nonlinear relational space. This finding not only illustrates the efficacy of KG-based models over traditional network representation learning methods, but it also highlights their proficiency in differentiating diverse relationships among triples embedded in vector spaces.
In contrast, traditional network representation learning approaches predominantly concentrate on topological characteristics within the relational network. Furthermore, Rot4Cap exhibits remarkable superiority over KG2ECapsule [7], TransECap, RotatECap, and QuatECap. This underlines the superior modeling capabilities of intrinsic relation patterns and relation mapping properties (RMPs) within a four-dimensional (4D) vector space, as opposed to the limited capacity of the Bernoulli distribution and lower-dimensional vector spaces (1D, 2D, and 3D). Notably, KG2ECapsule [7] marginally outperforms models such as TransE [22], RotatE [6], and QuatE [40], primarily due to its enhanced handling of the RMP challenges in the BioKG dataset. The Bernoulli distribution offers a partial remedy to this issue, whereas TransE, RotatE, and QuatE display a relative inability to model RMPs effectively, particularly within the BioKG context.
It is also critical to acknowledge the computational limitations encountered in the deployment of KGAT and R-GCN on the DrugBank dataset, where the sheer volume of trainable parameters led to memory overloads. This limitation, however, was not observed in other baseline models implemented on the same computational platform. KG-based models that rely on basic arithmetic operations (addition, subtraction, or simple multiplication) are confined to capturing linear relationships between entities. Furthermore, these models often require high-dimensional embeddings to fully encode their information, posing significant scalability challenges in large-scale KGs. These scalability constraints can potentially exacerbate issues of overfitting and computational complexity in the models.

4.6. Experimental Analysis

4.6.1. The Effect of Different Features

We added features to the model step by step to investigate the impact of different features. The model configurations are divided into four groups. The first group includes models using only knowledge (blue) or only the capsule network (red). The second group combines the capsule network with one type of feature information, with three possible choices: knowledge (blue), molecular structure (red), or textual description (green). The third group pairs the capsule network with two types of feature information, with three possible combinations: knowledge and molecular structure (blue), knowledge and textual description (red), or molecular structure and textual description (green). Finally, the fourth group corresponds to the complete model, incorporating all features (one configuration). The results are shown in Figure 3.
Comparing the first group with the second, performance improved regardless of which feature we integrated; thus, molecular structure, drug description, and knowledge graph information each contribute positively to DDI extraction. Relative to the second group, the models with two feature types in the third group show either a slight improvement or a small decrease in performance, which suggests that the added information partly overlaps with what is already captured; a closer look indicates that the KG has a greater impact on performance than molecular structure or drug description. However, from the third group to the fourth, we observed an improvement in all cases, suggesting that the feature types in the complete model are complementary and reinforce each other.

4.6.2. The Effect of Capsule Network

In our research approach, we utilized the scoring functions from ConvE [42], ConvKB [43], and CapsE [41] as replacements for Equation (10) to substantiate its applicability, resulting in the derivation of Rot4ConvE, Rot4ConvKB, and Rot4CapsE, respectively. The experimentation was executed on the DrugBank dataset. As indicated in Table 3, the Rot4Cap model demonstrated enhanced performance compared to these counterparts, yielding superior experimental outcomes.

5. Conclusions

To enhance the performance of biomedical DDI prediction, we proposed a novel approach that embeds drug entity pairs and their relationships from BioKG into a four-dimensional vector space. Furthermore, for the first time, feature information such as molecular structures and drug descriptions was incorporated and linearly combined with the drug entities from BioKG. The integrated features were then processed with the capsule network's inverted dot-product attention routing, enabling the extraction of comprehensive information about drug entities and their relationships. Experimental evaluations on several datasets demonstrated the superiority of the proposed model over existing state-of-the-art methods.
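As an illustration of rotation-based scoring in a four-dimensional space, in the spirit of QuatE-style models [40] rather than the exact Rot4 scoring function of Equation (10), a relation can act on a head embedding via the Hamilton product of quaternions:

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product of quaternions q = (a, b, c, d) and p."""
    a1, b1, c1, d1 = q
    a2, b2, c2, d2 = p
    return np.array([
        a1*a2 - b1*b2 - c1*c2 - d1*d2,
        a1*b2 + b1*a2 + c1*d2 - d1*c2,
        a1*c2 - b1*d2 + c1*a2 + d1*b2,
        a1*d2 + b1*c2 - c1*b2 + d1*a2,
    ])

def score(h, r, t):
    """Distance-based score: rotate the head by the (unit) relation
    quaternion and compare with the tail. Higher is better."""
    r = r / np.linalg.norm(r)
    return -np.linalg.norm(hamilton(h, r) - t)

h = np.array([1.0, 0.0, 0.0, 0.0])   # identity quaternion as head
r = np.array([0.0, 1.0, 0.0, 0.0])   # a 90-degree rotation
t = hamilton(h, r)                    # tail that exactly matches
print(abs(score(h, r, t)) < 1e-9)     # -> True (distance 0 for an exact match)
```

A unit quaternion acting by Hamilton product is a rotation, which is what lets such models express symmetric, antisymmetric, and compositional relation patterns.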

Author Contributions

S.Z.: carried out the investigations, developed the methodology, implemented software, and prepared the original draft; X.L.: carried out the investigations, curated data, generated visualizations, validated results, and contributed to the original draft; Y.L.: carried out investigations, curated data, generated visualizations, validated results, and contributed to the original draft; P.B.: responsible for conceptualization, methodology, software development, investigations, drafting, visualizations, and reviewing/editing; T.H.: contributed to conceptualization, methodology, software development, investigations, drafting, visualizations, and reviewing/editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Municipal Government of Quzhou under Grant No. 2023D013 and Grant No. 2024D003 (project title: “Innovative Design and Application of High-Performance Resistive Random Access Memory Based on SnO2 Nanostructures”; grant awarded to Xia Li).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Finkel, R.; Clark, M.A.; Cubeddu, L.X. Pharmacology; Lippincott Williams & Wilkins: Philadelphia, PA, USA, 2009. [Google Scholar]
  2. Chowdhury, M.F.M.; Lavelli, A. FBK-IRST: A Multi-Phase Approach to Semantic Textual Similarity. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, Atlanta, GA, USA, 13–14 June 2013; pp. 351–355. [Google Scholar]
  3. Tu, K.; Cui, P.; Wang, X.; Wang, F.; Zhu, W. Structural Deep Embedding for Hyper-Networks. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018; pp. 426–433. [Google Scholar]
  4. Guan, N. Knowledge graph embedding with concepts. Knowl. Based Syst. 2019, 164, 38–44. [Google Scholar] [CrossRef]
  5. Jin, X.; Sun, X.; Chen, J.; Sutcliffe, R.F.E. Extracting Drug-drug Interactions from Biomedical Texts using Knowledge Graph Embeddings and Multi-focal Loss. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; ACM: New York, NY, USA, 2022; pp. 884–893. [Google Scholar]
  6. Sun, Z.; Deng, Z.-H.; Nie, J.-Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the ICLR, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  7. Su, X.; You, Z.-H.; Huang, D.; Wang, L.; Wong, L.; Ji, B.; Zhao, B. Biomedical Knowledge Graph Embedding with Capsule Network for Multi-Label Drug-Drug Interaction Prediction. IEEE Trans. Knowl. Data Eng. 2023, 35, 5640–5651. [Google Scholar] [CrossRef]
  8. Thomas, P.; Neves, M.L.; Rocktäschel, T.; Leser, U. WBI-DDI: Drug-Drug Interaction Extraction using Majority Voting. In Proceedings of the Association for Computer Linguistics NAACL-HLT, Atlanta, GA, USA, 14–15 June 2013; pp. 628–635. [Google Scholar]
  9. Bokharaeian, B.; Díaz, A. NIL_UCM: Extracting Drug-Drug interactions from text through combination of sequence and tree kernels. In Proceedings of the Association for Computer Linguistics 7th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT, Atlanta, GA, USA, 14–15 June 2013; pp. 644–650. [Google Scholar]
  10. Kim, S.; Liu, H.; Yeganova, L.; Wilbur, W.J. Extracting drug-drug interactions from literature using a rich feature-based linear kernel approach. J. Biomed. Inform. 2015, 55, 23–30. [Google Scholar] [CrossRef] [PubMed]
  11. Belkin, M.; Niyogi, P. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Comput. 2003, 15, 1373–1396. [Google Scholar] [CrossRef]
  12. Cao, S.; Lu, W.; Xu, Q. GraRep: Learning Graph Representations with Global Structural Information. In Proceedings of the CIKM 2015, Melbourne, VIC, Australia, 19–23 October 2015; ACM: New York, NY, USA, 2015; pp. 891–900. [Google Scholar]
  13. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the ACM KDD ’14, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
  14. Grover, A.; Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; ACM: New York, NY, USA, 2016; pp. 855–864. [Google Scholar]
  15. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. LINE: Large-scale Information Network Embedding. In Proceedings of the ACM WWW 2015, Florence, Italy, 18–22 May 2015; pp. 1067–1077. [Google Scholar]
  16. Wang, D.; Cui, P.; Zhu, W. Structural Deep Network Embedding. In Proceedings of the ACM SIGKDD, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234. [Google Scholar]
  17. Lee, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar] [CrossRef] [PubMed]
  18. Beltagy, I. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the Association for Computational Linguistic EMNLP-IJCNLP, Hong Kong, China, 3–7 November 2019; pp. 3613–3618. [Google Scholar]
  19. Asada, M.; Miwa, M.; Sasaki, Y. Using drug descriptions and molecular structures for drug-drug interaction extraction from literature. Bioinformatics 2021, 37, 1739–1746. [Google Scholar] [CrossRef] [PubMed]
  20. Abdelaziz, I.; Fokoue, A.; Hassanzadeh, O.; Zhang, P.; Sadoghi, M. Large-scale structural and textual similarity-based mining of knowledge graph to predict drug-drug interactions. J. Web Semant. 2017, 44, 104–117. [Google Scholar] [CrossRef]
  21. Çelebi, R.; Yasar, E.; Uyar, H.; Gümüs, Ö.; Dikenelli, O.; Dumontier, M. Evaluation of knowledge graph embedding approaches for drug-drug interaction prediction using Linked Open Data. In Proceedings of the SWAT4LS 2018, Antwerp, Belgium, 3–6 December 2018; CEUR Workshop Proceedings. Volume 2275. [Google Scholar]
  22. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26, Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NeurIPS 2013), Lake Tahoe, NV, USA, 5–8 December 2013; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2014; pp. 2787–2795. [Google Scholar]
  23. Ji, G.; He, S.; Xu, L.; Liu, K.; Zhao, J. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the Association for Computer Linguistics ACL, Beijing, China, 26–31 July 2015; Volume 1: Long Papers. pp. 687–696. [Google Scholar]
  24. Mondal, I. BERTKG-DDI: Towards Incorporating Entity-specific Knowledge Graph Information in Predicting Drug-Drug Interactions. In Proceedings of the AAAI, SDU@AAAI 2021, Virtual Event, 9 February 2021; CEUR Workshop Proceedings. Volume 2831. [Google Scholar]
  25. Ma, T.; Xiao, C.; Zhou, J.; Wang, F. Drug Similarity Integration Through Attentive Multi-View Graph Auto-Encoders. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018; pp. 3477–3483. [Google Scholar]
  26. Lin, X.; Quan, Z.; Wang, Z.-J.; Ma, T.; Zeng, X. KGNN: Knowledge Graph Neural Network for Drug-Drug Interaction Prediction. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, Virtual, 7–15 January 2021; pp. 2739–2745. [Google Scholar]
  27. Su, X.; Hu, P.; You, Z.-H.; Philip, S.Y.; Hu, L. Dual-Channel Learning Framework for Drug-Drug Interaction Prediction via Relation-Aware Heterogeneous Graph Transformer. In Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 249–256. [Google Scholar]
  28. Le, T.; Tran, H.; Le, B. Knowledge graph embedding with the special orthogonal group in quaternion space for link prediction. Knowl. Based Syst. 2023, 266, 110400. [Google Scholar] [CrossRef]
  29. Landrum, G. RDKit: Open-Source Cheminformatics Software. Available online: http://www.rdkit.org (accessed on 15 March 2025).
  30. Tsubaki, M.; Tomii, K.; Sese, J. Compound-protein interaction prediction with end-to-end learning of neural networks for graphs and sequences. Bioinformatics 2019, 35, 309–318. [Google Scholar] [CrossRef] [PubMed]
  31. Hendrycks, D.; Gimpel, K. Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units. arXiv 2016, arXiv:1606.08415. [Google Scholar]
  32. Ba, L.J.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar] [CrossRef]
  33. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the ICML, New York, NY, USA, 19–24 June 2016; JMLR Workshop and Conference Proceedings. Volume 48, pp. 2071–2080. [Google Scholar]
  34. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; Advances in Neural Information Processing Systems 26. pp. 3111–3119. [Google Scholar]
  35. Hu, W.; Fey, M.; Zitnik, M.; Dong, Y.; Ren, H.; Liu, B.; Catasta, M.; Leskovec, J. Open Graph Benchmark: Datasets for Machine Learning on Graphs. In Proceedings of the NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
  36. Wishart, D.S.; Feunang, Y.D.; Guo, A.C. DrugBank 5.0: A major update to the DrugBank database for 2018. Nucleic Acids Res. 2018, 46, D1074–D1082. [Google Scholar] [CrossRef] [PubMed]
  37. Kanehisa, M.; Furumichi, M.; Tanabe, M.; Sato, Y.; Morishima, K. KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017, 45, D353–D361. [Google Scholar] [CrossRef] [PubMed]
  38. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T.-S. KGAT: Knowledge Graph Attention Network for Recommendation. In Proceedings of the ACM KDD, Anchorage, AK, USA, 4–8 August 2019; pp. 950–958. [Google Scholar]
  39. Schlichtkrull, M.S.; Kipf, T.N.; Bloem, P.; van den Berg, R.; Titov, I.; Welling, M. Modeling Relational Data with Graph Convolutional Networks. In Proceedings of the ESWC, Heraklion, Greece, 3–7 June 2018; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2018; Volume 10843, pp. 593–607. [Google Scholar]
  40. Zhang, S.; Tay, Y.; Yao, L.; Liu, Q. Quaternion knowledge graph embeddings. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  41. Nguyen, D.Q.; Vu, T.; Nguyen, T.D. A Capsule Network-based Embedding Model for Knowledge Graph Completion and Search Personalization. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 2180–2189. [Google Scholar]
  42. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2D Knowledge Graph Embeddings. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; AAAI Press: Washington, DC, USA, 2018; pp. 1811–1818. [Google Scholar]
  43. Jiang, X.; Wang, Q.; Wang, B. Adaptive Convolution for Multi-Relational Learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 978–987. [Google Scholar]
Figure 1. Overview of our method. We utilize BERT and CNN to transform drug descriptions into fixed-size vectors, capturing contextual information while embedding head and tail entities and relationships from BioKGs into a 4D vector space. This combined feature information, including molecular structures and descriptions, is input, using CNNs and an inverted dot-product attention mechanism, into a capsule network where deep information exchange occurs.
Figure 2. Illustrative example of our model with parameters k = 6 , N = 5 , and d = 4 . The input quadruple is processed using biquaternions in the output of the capsule network, with the quadruple’s score derived following one CNN layer and two capsule layers.
Figure 3. A systematic investigation of the effect of different combinations of information features on the performance of the proposed model. The first group uses knowledge or the capsule network only; the second group combines the capsule network with one type of feature information (three choices); the third group combines the capsule network with two types of feature information (three choices); and the fourth group is the complete model, combining the capsule network with all three feature types (one configuration).
Table 1. Statistics of the various experimental datasets.
Datasets | #Drugs | #Interactions | #Entities | #Relations | #Triples
OGB-biokg | 10,533 | 1,195,972 | 93,773 | 51 | 5,088,434
DrugBank | 3,797 | 1,236,361 | 2,116,569 | 74 | 7,740,864
KEGG | 1,925 | 56,983 | 129,910 | 168 | 362,870
Table 2. Experimental results of the Rot4Cap and baseline models on the three datasets. The bold values denote the best results, while the italic values indicate the second-best results.
Datasets | Methods | ACC. | Pre. | Rec. | F1 | AUC | AUPR
OGB-Biokg | Laplacian | 0.5710 ± 0.003 | 0.5296 ± 0.005 | 0.5934 ± 0.004 | 0.5597 ± 0.005 | 0.5692 ± 0.0002 | 0.5861 ± 0.0004
 | DeepWalk | 0.5681 ± 0.004 | 0.5473 ± 0.007 | 0.5223 ± 0.006 | 0.5345 ± 0.005 | 0.5419 ± 0.0002 | 0.5325 ± 0.0003
 | LINE | 0.5786 ± 0.007 | 0.5534 ± 0.011 | 0.5386 ± 0.013 | 0.5459 ± 0.011 | 0.5418 ± 0.0002 | 0.5374 ± 0.0003
 | KGNN | 0.7389 ± 0.002 | 0.7541 ± 0.006 | 0.7245 ± 0.010 | 0.7390 ± 0.009 | 0.7849 ± 0.0008 | 0.7378 ± 0.0005
 | KGAT | 0.7489 ± 0.002 | 0.7559 ± 0.006 | 0.7191 ± 0.006 | 0.7370 ± 0.006 | 0.7962 ± 0.0004 | 0.8011 ± 0.0004
 | RGCN | 0.8467 ± 0.004 | 0.8773 ± 0.006 | 0.8063 ± 0.004 | 0.8403 ± 0.005 | 0.9172 ± 0.0006 | 0.9268 ± 0.0005
 | BERTKG-DDIs | 0.8326 ± 0.003 | 0.8835 ± 0.004 | 0.8243 ± 0.005 | 0.8529 ± 0.006 | 0.8967 ± 0.0004 | 0.9167 ± 0.0004
 | Xin et al. | 0.8627 ± 0.002 | 0.9105 ± 0.008 | 0.8467 ± 0.007 | 0.8774 ± 0.005 | 0.9276 ± 0.0004 | 0.9341 ± 0.0005
 | KG2ECapsule | 0.9078 ± 0.002 | 0.9219 ± 0.004 | 0.8914 ± 0.003 | 0.9064 ± 0.003 | 0.9656 ± 0.0002 | 0.9672 ± 0.0002
 | TIGER | 0.8791 ± 0.16 | – | – | 0.8754 ± 0.17 | 0.9477 ± 0.14 | 0.9571 ± 0.12
 | TransECap | 0.8507 ± 0.004 | 0.9023 ± 0.010 | 0.8357 ± 0.009 | 0.8677 ± 0.002 | 0.9091 ± 0.0010 | 0.9207 ± 0.0009
 | RotatECap | 0.8756 ± 0.004 | 0.9143 ± 0.011 | 0.8637 ± 0.004 | 0.8883 ± 0.004 | 0.9308 ± 0.0010 | 0.9509 ± 0.0008
 | QuatECap | 0.8904 ± 0.008 | 0.9198 ± 0.004 | 0.8827 ± 0.010 | 0.9009 ± 0.008 | 0.9539 ± 0.0006 | 0.9537 ± 0.0006
 | Rot4Cap | 0.9137 ± 0.004 | 0.9286 ± 0.003 | 0.8991 ± 0.006 | 0.9136 ± 0.004 | 0.9703 ± 0.0004 | 0.9731 ± 0.0006
DrugBank | Laplacian | 0.5923 ± 0.004 | 0.4455 ± 0.006 | 0.3372 ± 0.010 | 0.3838 ± 0.009 | 0.6724 ± 0.0002 | 0.4782 ± 0.0002
 | DeepWalk | 0.6163 ± 0.004 | 0.6059 ± 0.003 | 0.5904 ± 0.005 | 0.5980 ± 0.008 | 0.6501 ± 0.0002 | 0.4782 ± 0.0002
 | LINE | 0.6374 ± 0.005 | 0.6283 ± 0.006 | 0.6189 ± 0.013 | 0.6236 ± 0.005 | 0.6926 ± 0.0002 | 0.4923 ± 0.0003
 | KGNN | 0.7947 ± 0.003 | 0.7959 ± 0.004 | 0.7931 ± 0.004 | 0.7945 ± 0.004 | 0.8602 ± 0.0005 | 0.8587 ± 0.0005
 | BERTKG-DDIs | 0.8469 ± 0.002 | 0.8524 ± 0.005 | 0.5681 ± 0.002 | 0.6817 ± 0.004 | 0.8925 ± 0.0006 | 0.8726 ± 0.0004
 | Xin et al. | 0.87364 ± 0.004 | 0.8672 ± 0.005 | 0.8620 ± 0.005 | 0.8646 ± 0.002 | 0.9224 ± 0.0004 | 0.9341 ± 0.0003
 | KG2ECapsule | 0.9078 ± 0.002 | 0.9219 ± 0.004 | 0.8914 ± 0.003 | 0.9064 ± 0.003 | 0.9656 ± 0.0002 | 0.9672 ± 0.0002
 | TIGER | 0.7905 ± 0.87 | – | – | 0.8033 ± 0.94 | 0.8662 ± 0.57 | 0.8370 ± 0.68
 | TransECap | 0.87327 ± 0.003 | 0.8704 ± 0.005 | 0.8637 ± 0.004 | 0.8670 ± 0.005 | 0.9231 ± 0.0007 | 0.9327 ± 0.0002
 | RotatECap | 0.8837 ± 0.002 | 0.8921 ± 0.003 | 0.8732 ± 0.007 | 0.8825 ± 0.005 | 0.9354 ± 0.0009 | 0.9453 ± 0.0007
 | QuatECap | 0.8894 ± 0.003 | 0.9107 ± 0.006 | 0.8743 ± 0.004 | 0.8921 ± 0.008 | 0.9509 ± 0.0007 | 0.9601 ± 0.0004
 | Rot4Cap | 0.9127 ± 0.005 | 0.9268 ± 0.002 | 0.8967 ± 0.006 | 0.9115 ± 0.006 | 0.9720 ± 0.0009 | 0.9739 ± 0.0012
KEGG | Laplacian | 0.5694 ± 0.010 | 0.3683 ± 0.021 | 0.3781 ± 0.016 | 0.3731 ± 0.016 | 0.5608 ± 0.010 | 0.2916 ± 0.013
 | DeepWalk | 0.5800 ± 0.008 | 0.3801 ± 0.008 | 0.3762 ± 0.011 | 0.3781 ± 0.009 | 0.5751 ± 0.009 | 0.3005 ± 0.012
 | LINE | 0.5528 ± 0.006 | 0.3546 ± 0.010 | 0.3390 ± 0.016 | 0.3466 ± 0.013 | 0.5462 ± 0.013 | 0.2810 ± 0.015
 | KGNN | 0.7282 ± 0.008 | 0.4790 ± 0.024 | 0.4237 ± 0.013 | 0.4497 ± 0.018 | 0.8314 ± 0.009 | 0.4484 ± 0.013
 | KGAT | 0.7798 ± 0.008 | 0.5340 ± 0.015 | 0.4185 ± 0.015 | 0.4692 ± 0.015 | 0.8202 ± 0.010 | 0.5382 ± 0.011
 | RGCN | 0.8330 ± 0.005 | 0.4969 ± 0.012 | 0.4392 ± 0.018 | 0.4663 ± 0.015 | 0.8358 ± 0.006 | 0.4590 ± 0.010
 | BERTKG-DDIs | 0.8216 ± 0.007 | 0.5773 ± 0.008 | 0.4587 ± 0.015 | 0.5112 ± 0.007 | 0.8267 ± 0.004 | 0.4937 ± 0.009
 | Xin et al. | 0.8367 ± 0.006 | 0.5837 ± 0.012 | 0.4592 ± 0.017 | 0.5140 ± 0.011 | 0.8426 ± 0.015 | 0.5887 ± 0.009
 | KG2ECapsule | 0.8348 ± 0.003 | 0.6278 ± 0.008 | 0.4794 ± 0.011 | 0.5437 ± 0.009 | 0.8505 ± 0.004 | 0.6644 ± 0.007
 | TransECap | 0.8302 ± 0.003 | 0.5769 ± 0.011 | 0.4561 ± 0.011 | 0.5094 ± 0.007 | 0.8381 ± 0.007 | 0.5192 ± 0.004
 | RotatECap | 0.8402 ± 0.003 | 0.6014 ± 0.007 | 0.4621 ± 0.007 | 0.5226 ± 0.011 | 0.8491 ± 0.005 | 0.5891 ± 0.011
 | QuatECap | 0.8439 ± 0.007 | 0.6204 ± 0.005 | 0.4687 ± 0.005 | 0.5340 ± 0.004 | 0.8510 ± 0.007 | 0.6209 ± 0.004
 | Rot4Cap | 0.8467 ± 0.008 | 0.6408 ± 0.007 | 0.5012 ± 0.009 | 0.5625 ± 0.006 | 0.8627 ± 0.005 | 0.6821 ± 0.002
Table 3. The pattern modeling and inference abilities of several models.
Models | Scoring Function | ACC. | Pre. | Rec. | F1 | AUC | AUPR
Rot4CovE | g(vec(g(concat((h_e, e_h, h_d)^, e_r^) ∗ Ω)) W) · (t_e, e_t, t_d) | 0.8857 | 0.8962 | 0.8734 | 0.8847 | 0.9426 | 0.9507
Rot4ConvKB | concat(g((h_e, e_h, h_d, e_r, t_e, e_t, t_d) ∗ Ω)) · w | 0.8975 | 0.9037 | 0.8769 | 0.8901 | 0.9521 | 0.9567
Rot4CapsE | capsnet(g((h_e, e_h, h_d, e_r, t_e, e_t, t_d) ∗ Ω)) | 0.9034 | 0.9089 | 0.8825 | 0.8955 | 0.9537 | 0.9628
Rot4Cap | caps(g((h_e, e_h, h_d, e_r, t_e, e_t, t_d) ∗ Ω)) · W | 0.9127 | 0.9268 | 0.8967 | 0.9115 | 0.9720 | 0.9739