Next Article in Journal
Exploring the Interplay Between Gut Microbiota and the Melatonergic Pathway in Hormone Receptor-Positive Breast Cancer
Previous Article in Journal
Comparative Analysis of Differentially Expressed Long Non-Coding RNA in Pre- and Postmenopausal Fibroids
Previous Article in Special Issue
A Multi-Modal Graph Neural Network Framework for Parkinson’s Disease Therapeutic Discovery
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

LSA-DDI: Learning Stereochemistry-Aware Drug Interactions via 3D Feature Fusion and Contrastive Cross-Attention

1
School of Economics and Management, Yan’an University, Yan’an 716000, China
2
College of Computer Science, Inner Mongolia University, Hohhot 010021, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Mol. Sci. 2025, 26(14), 6799; https://doi.org/10.3390/ijms26146799
Submission received: 29 May 2025 / Revised: 1 July 2025 / Accepted: 12 July 2025 / Published: 16 July 2025

Abstract

Accurate prediction of drug–drug interactions (DDIs) is essential for ensuring medication safety and optimizing combination-therapy strategies. However, existing DDI models face limitations in handling interactions related to stereochemistry and precisely locating drug interaction sites. These limitations reduce the prediction accuracy for conformation-dependent interactions and the interpretability of molecular mechanisms, potentially posing risks to clinical safety. To address these challenges, we introduce LSA-DDI, a Spatial-Contrastive-Attention-Based Drug–Drug Interaction framework. Our 3D feature extraction method captures the spatial structure of molecules through three features—coordinates, distances, and angles—and fuses them to enhance the model of molecular spatial structures. Concurrently, we design and implement a Dynamic Feature Exchange (DFE) mechanism that dynamically regulates the flow of information across modalities via an attention mechanism, achieving bidirectional enhancement and semantic alignment of 2D topological and 3D spatial structure features. Additionally, we incorporate a dynamic temperature-regulated multiscale contrastive learning framework that effectively aligns multiscale features and enhances the model’s generalizability. Experiments conducted on public drug databases under both warm-start and cold-start scenarios demonstrated that LSA-DDI achieved competitive performance, with consistent improvements over existing methods.

1. Introduction

In recent years, combination therapy has gradually replaced single-drug treatment approaches in clinical practice for disease management. However, the concurrent use of multiple medications carries potential risks of adverse drug interactions. These interactions not only pose risks to patient health and safety, but can even become life threatening. Research data indicate that adverse drug interactions account for approximately 30% of all reported adverse drug reactions and represent a leading cause of drug withdrawals from the market [1,2,3,4,5,6]. Such issues not only significantly increase morbidity and mortality, but also lead to substantial increases in public healthcare expenditure, due to the high costs associated with drug development and increased treatment expenses.
Predicting drug–drug interactions (DDIs) presents substantial challenges, as conventional approaches primarily depend on time-intensive in vitro experiments and clinical trials—methods known for their high costs and limited efficiency [7]. Recent advances in computational approaches, particularly through machine learning [8] and deep learning [9], have revolutionized this field. These techniques synergize multidisciplinary expertise, spanning computational science, pharmaceutical chemistry, pharmacology, and bioinformatics, offering powerful tools to accelerate drug development and optimize clinical medication strategies [2,10]. A growing number of deep-learning-driven computational frameworks have emerged as efficient alternatives for DDI prediction, demonstrating remarkable effectiveness in addressing this complex problem. Consequently, DDI prediction has become a prominent research frontier in artificial intelligence applications for healthcare.
Recent years have seen rapid progress in deep learning and neural network technologies, which have demonstrated remarkable capabilities in such fields as natural language processing, video recognition, and image identification. However, these applications are primarily confined to Euclidean spaces and are not easily applicable to graph-structured data. Advances in graph neural networks have significantly shifted focus to applying deep learning to graph data, leading to progress in such tasks as link prediction and node classification. The richness of graph data in computational biology, including molecular graphs, protein–protein interaction networks, and drug–target interaction networks, has driven the widespread adoption of graph neural networks in bioinformatics.
Many recent methods for predicting drug adverse interactions use graph neural networks (GNNs) such as GCN [11], GAT [12], and GIN [13], which are widely applied in DDI tasks [1,14,15,16,17]. These methods can be grouped into two categories: molecular-graph-based and DDI-network-based. The former uses Python’s RDKit toolkit [18] to convert SMILES [19] into molecular graphs, treating atoms and bonds as nodes and edges. For example, Deac et al. [20] presented a GNN architecture for DDI prediction using drug molecular structures. Wang et al. [21] integrated genomic and pharmacological features with GCN and attention mechanisms to identify synergistic drug combinations. Zhang et al. [22] employed centrality, spatial, and edge coding with a lightweight attention mechanism to extract structural features from drug graphs. Since drug molecules can be broken down into bioactive substructures, such as functional groups, some studies have predicted DDIs by analyzing interactions between these substructures [1,23,24,25]. Nyamabo et al. [26] input molecular graph representations of drug pairs into GAT to extract substructural information. Yu et al. [27] embedded substructure information based on a predefined list and proposed a substructure-aware tensor neural network for DDI prediction. Multimodal models, which integrate diverse data types (images, text, sequences, and graphs), enhance predictive power and robustness. Fu et al. [28] proposed a hierarchical interaction network for clinical trial outcome prediction using multimodal data. Duan et al. [29] combined BERT and GNN models to extract textual and two-dimensional (2D) graph information for DDI prediction. They also used interactive attention vectors to address relationship overlap in drug interactions. Compared to 2D graph representations, three-dimensional (3D) graph representations can capture molecular spatial relationships and stereochemistry, improving prediction accuracy.
As research in the DDI-prediction field continues to advance, several innovative approaches have emerged. Recent studies have begun exploring the integration of 2D and 3D molecular information, as well as multimodal feature fusion based on attention mechanisms. For example, Molormer [30] offers a new perspective on DDI prediction by combining 2D molecular graphs with spatial information. This method primarily enhances molecular representation by fusing 2D structural features with partial 3D descriptors, thereby improving prediction performance. Meanwhile, MHCADDI (2023) employs a co-attention mechanism to integrate various drug features, significantly improving the model’s ability to capture complex relationships between drugs and offering a more effective solution for predicting interactions involving novel drugs.
However, these methods still face limitations when dealing with complex stereochemical interactions. Although Molormer [30] incorporates spatial information, it may not fully exploit the spatial structure of molecules during feature extraction. Likewise, co-attention-based fusion models such as MHCADDI [31] may struggle to accurately identify the key interaction regions of drug pairs during the feature-fusion process. In addition, some 3D-based DDI predictors often rely on simple geometric descriptors or single-scale features, which limits their ability to comprehensively capture stereochemical characteristics.
To address these challenges, the LSA-DDI model introduces significant improvements and innovations. First, LSA-DDI adopts a systematic 3D spatial encoding strategy—including coordinate, distance, and angle encoding—to comprehensively capture stereochemical information. Second, it introduces a bidirectional cross-attention module and a dynamic feature-exchange mechanism, which not only more precisely identify the critical interaction regions of drug pairs, but also achieve deep semantic alignment between 2D and 3D features. Finally, LSA-DDI incorporates a multiscale contrastive learning framework, along with a dynamic temperature-adjustment mechanism, to effectively align and integrate molecular features across levels, enabling the model to capture complex stereochemical interaction patterns at multiple scales. These innovations endow LSA-DDI with stronger representational power and higher predictive accuracy to model intricate stereochemical relationships.
We conducted a series of experiments on the DrugBank benchmark dataset to evaluate the model’s effectiveness and generalizability. In warm-start tasks, the model achieved an AUROC value exceeding 98%, demonstrating its exceptional performance. For the most challenging cold-start tasks, the model achieved competitive performance on the DrugBank dataset, showing consistent improvements compared to state-of-the-art DDI-prediction models.
Our paper makes two main contributions.
  • To address the problem of insufficient 3D information, LSA-DDI first employs molecular conformation and dynamic feature exchange to align 2D and 3D feature information of drugs. Random rotation is also used to incorporate the stereoscopic structure of drug molecules, enhancing the model’s generalization of new drug prediction.
  • In this study, a dynamic feature exchange framework with dynamic temperature-regulated contrastive learning precisely locates drug interaction sites and boosts the model’s generalization ability. Moreover, we assessed our method on standard benchmark tests.
The results demonstrate that our approach led to modest but consistent improvements across most evaluation metrics.

2. Methods

2.1. Problem Settings

Given a set of drugs G , an interaction type space I = { I 1 , I 2 , , I m } , and a training dataset D = { ( G i , G j , r ) k } k = 1 n consisting of annotated drug pairs with their corresponding interaction types, we seek a sophisticated mapping function,
f : G × G × I [ 0 , 1 ] ,
designed to predict the likelihood, p = f ( G i , G j , I t ) , that a given drug pair, ( G i , G j ) , will exhibit a specific interaction type, I t . The mapping function, f, captures the complex relationships between drugs and their interaction types, providing a probabilistic assessment of the interactions.

2.2. Theoretical Background

To provide a clearer understanding of the core techniques integrated within the LSA-DDI framework, this section outlines the foundational concepts of attention mechanisms and contrastive learning.

2.2.1. Attention Mechanism

An attention mechanism, a fundamental component in many neural architectures, can be formally defined as
Attention ( Q , K , V ) = softmax Q K d k V
This formulation maps a query and a set of key–value pairs to an output, where the query, keys, values, and output are all vector representations. The output is computed as a weighted sum of the values, with the weights determined by a compatibility function measuring the similarity between the query and each key. In practice, multiple queries are processed in parallel, with the queries, keys, and values represented as matrices Q, K, and V, respectively [32]. This mechanism enables the model to selectively attend to the most relevant parts of the input, which is critical for capturing intricate dependencies in drug–drug interaction (DDI) scenarios.

2.2.2. Contrastive Learning

Contrastive learning is commonly implemented via the InfoNCE loss function, defined as
L InfoNCE = log exp ( sim ( q , k + ) / τ ) exp ( sim ( q , k + ) / τ ) + i = 0 K exp ( sim ( q , k i ) / τ )
Here, q denotes the anchor (or query) sample, k + is a positive (similar) sample, and each k i represents a negative (dissimilar) sample. The temperature parameter τ is a tunable hyperparameter that controls the sharpness of the similarity distribution. The similarity function sim ( · , · ) is typically implemented as cosine similarity. The principle behind Noise Contrastive Estimation (NCE) is to distinguish observed (positive) data from artificially generated noise (negatives) through a discriminative objective [33].

2.3. LSA-DDI Overview

This study proposes a novel Spatial-Contrastive-Attention-Based Drug–Drug-Interaction prediction framework named LSA-DDI. As shown in Figure 1, the model integrates 3D molecular modeling, cross-modal dynamic feature fusion, and multiscale contrastive learning, effectively enhancing the prediction of stereochemistry-sensitive interactions. Figure 1 presents the four major modules of the LSA-DDI workflow: 3D spatial-feature extraction, 2D topological-graph encoding, dynamic feature exchange, and multiscale contrastive learning.
In the 3D spatial feature extraction module, the SMILES representation of each drug is converted to a 3D molecular conformation using RDKit. Random spatial rotations are applied to improve the model’s robustness to conformational variance. Then, three parallel pathways extract atomic-level stereochemical features from coordinate, distance, and angular information. These features are fused using a dedicated fusion network to produce a unified 3D spatial representation, as illustrated in Figure 2.
For topological representation learning, LSA-DDI employs a multibranch GNN encoder composed of GAT and SAGPooling layers. These branches are designed to extract structural information at the atom, functional group, and molecular scaffold levels. The multiscale features are then fused via an MLP to generate the final 2D graph representation.
To achieve deep semantic alignment between the 3D spatial and 2D topological modalities, LSA-DDI introduces a Dynamic Feature Exchange (DFE) mechanism. As shown in Figure 3, the DFE module adopts a bidirectional cross-attention strategy to dynamically regulate the information flow between modalities, enabling both 2D→3D and 3D→2D feature enhancement. This design improves the model’s ability to identify critical interaction regions between drug pairs.
Furthermore, LSA-DDI incorporates a multiscale contrastive learning framework. Drug representations at three structural levels—atomic, functional group, and scaffold—are used to perform InfoNCE-based contrastive learning. Scale-specific contrastive losses are adaptively weighted via learnable parameters and jointly optimized with the main task loss to promote better generalization. Finally, the DDI score is computed using RESCAL tensor decomposition, enabling relation-specific interaction prediction.

2.4. Three-Dimensional Feature Extraction

For each pair of drug molecules, G x and G y , we first generate their 3D conformations from SMILES representations using the RDKit toolkit. To ensure robustness to conformational variations, we apply random rotation transformations by sampling Euler angles, θ U ( 0 , 2 π ) , from a uniform distribution and constructing rotation matrices, R θ , to transform the initial atomic coordinates:
X aug = R θ X init .
This rotational invariance augmentation ensures that the model learns the intrinsic structural features of molecules, independently of their absolute spacial orientations.
Our Spatial3DLayer captures molecular stereochemical information through three complementary feature-extraction pathways. First, the coordinate-feature encoder directly processes the augmented 3D coordinates, mapping the raw coordinates to a high-dimensional feature space through a multilayer perceptron structure:
f pos = SpatialEncoder ( X aug ) ,
where SpatialEncoder consists of linear layers, ReLU activation functions, layer normalization, and dropout, mapping 3D coordinates to a d-dimensional feature space. This encoder transforms atomic spatial-position information into latent representations through nonlinear transformations, enabling the model to learn associations between atomic positions and their chemical properties.
Subsequently, the distance feature encoder focuses on the molecule’s radial distribution features. First, we calculate the molecule’s geometric center, c = 1 n i = 1 n X aug , i . Then, we compute the Euclidean distance of each atom to the center:
d = X aug c 2 .
These distance values are processed through the DistanceEncoder to obtain f dist , which maps one-dimensional distance features to a d / 2 -dimensional space. These radial features provide rotation-invariant information on molecular shape and atomic distribution, crucial for representing drug molecules’ structural characteristics.
Finally, we calculate normalized direction vectors:
v norm = X aug c X aug c 2 .
These normalized vectors contain cosine values of the angles between atoms and coordinate axes, processed through AngleEncoder to obtain f angle , mapping 3D angular features to a d / 2 -dimensional space. Angular features provide information on atomic distribution in spherical coordinates, essential for understanding molecular spatial configuration and functional-group orientation.
Ultimately, we concatenate and fuse the three feature types:
f combined = [ f pos ; f dist ; f angle ]
s = FusionNetwork ( f combined ) ,
where FusionNetwork employs linear transformations, ReLU activation functions, and layer normalization to map the 2 d -dimensional concatenated features to a d-dimensional unified representation space. This multipathway feature extraction and fusion strategy enables our model to fully capture molecular 3D structural information, including absolute positions, relative distance distributions, and spatial orientations.

2.5. Dynamic Feature Exchange

To achieve efficient fusion of multimodal features, we design and introduce a DFE mechanism after extracting 2D topological-structure features and 3D spatial-structure features. This mechanism is used for semantic alignment and information enhancement between the two modalities.
Let the node-level representation output by the 2D encoder be H 2 D R N × d , representing the atomic features of molecules in the topological structure. The output of the 3D encoder is H 3 D R N × d , representing the geometric features in the spatial structure. Here, N denotes the number of atoms, and d is the feature dimension. The features of the two modalities will be fused in the DFE module.
To dynamically regulate cross-modal information, we employ an attention mechanism to generate gating weights. Specifically, query (Query) and key (Key) vectors are constructed through linear mappings:
Q = H 2 D W Q , K = H 3 D W K , α = Softmax Q K d ,
where W Q , W K R d × d are learnable parameters, and the attention matrix α R N × N represents the attention strength of each 2D atom for all 3D atoms.
After obtaining the attention weights, the features of the other modality are weighted and injected into the current modality to achieve bidirectional feature enhancement:
H ˜ 2 D = H 2 D + γ · α H 3 D W V , H ˜ 3 D = H 3 D + γ · α H 2 D W V ,
where W V R d × d is the value projection matrix, and γ is a learnable scaling factor with an initial value of 0.1. This value helps to avoid instability caused by overly strong modality fusion during the early stages of training.
After the feature interaction is completed, global average pooling operations on H ˜ 2 D and H ˜ 3 D obtain independent vector representations for the two modalities:
v 2 D = MeanPool ( H ˜ 2 D ) , v 3 D = MeanPool ( H ˜ 3 D ) .
These representations not only retain the unique structural information of each modality but also have the ability to respond to the other modality, which is beneficial for subsequent multiscale contrastive learning and interaction type recognition.
Finally, we combine multi-head outputs and predict interactions via RESCAL tensor decomposition:
score = Tr ( v 2 D T W rel ( r ) v 3 D ) .
where W rel ( r ) R 64 × 64 specifically encodes the r-th interaction type, enabling interpretable pattern learning. This architecture effectively captures the bidirectional, multiscale nature of DDI by leveraging both structural and spatial information.

2.6. Multiscale Contrastive Learning

To explore the implicit structural information within molecules, we implement a multiscale contrastive learning mechanism with drug representations from the bidirectional cross-attention layer as input, constructing contrastive constraints at different abstraction levels to encourage learning richer molecular interaction patterns. We select [1, 2, 4] as scale parameters, corresponding to the atomic, functional-group, and molecular-scaffold levels, based on the hierarchical molecule characteristics.
For drug pair representations f 1 and f 2 , we perform scale-specific pooling operations:
f 1 s = AvgPool 1 d ( f 1 , s ) , f 2 s = AvgPool 1 d ( f 2 , s ) .
We then apply L2 normalization for numerical stability:
f ^ 1 s = f 1 s | | f 1 s | | 2 , f ^ 2 s = f 2 s | | f 2 s | | 2 .
The similarity matrix is computed using
S s = f ^ 1 s ( f ^ 2 s ) T τ ,
where τ = 0.1 2.0 is a learnable hyperparameter (referred to as a ‘temperature parameter’ in machine learning, unrelated to physical temperature), which controls the distribution smoothness, with smaller values producing sharper distributions that enhance discriminative ability.
Our contrastive learning treats each drug pair and its 3D-augmented counterpart as positive pairs, while different drug pairs serve as negative samples. Specifically, for a batch of N drug pairs, we construct
  • Positive pairs:  { ( f i s , f i s ) } i = 1 N for each scale s.
  • Negative pairs:  { ( f i s , f j s ) , ( f i s , f j s ) } i j for each scale s.
The InfoNCE loss for scale s is defined as
L contrast s = 1 N i = 1 N log exp sim ( f ^ i s , f ^ i s ) τ exp sim ( f ^ i s , f ^ i s ) τ + j i exp sim ( f ^ i s , f ^ j s ) τ + exp sim ( f ^ i s , f ^ j s ) τ .
where f ^ i s and f ^ i s represent the 2 -normalized features of drug pair i and its 3D-augmented version at scale s, and τ is the temperature hyperparameter. The function sim ( · , · ) denotes cosine similarity.
To adaptively adjust scale importance, we introduce learnable weight parameters:
w s = exp ( ff s ) s { 1 , 2 , 4 } exp ( ff s ) .
The final multiscale contrastive loss combines scale-specific losses:
L total = s { 1 , 2 , 4 } w s L contrast s
During training, we employ a dynamic weight adjustment strategy:
γ = γ 0 · ( 1 t T )
where γ 0 = 0.1 and T = 500 epochs, t is the current epoch. This ensures an early focus on representation learning and task optimization later. The final loss combines the main task loss and contrastive learning loss:
L = L main + γ ( t ) · L total .

3. Experiments and Results

3.1. Dataset

To assess our method, we conducted experiments on a widely adopted real-world DDI benchmark, DrugBank [34]. This benchmark includes 191,808 drug pairs, involving 1706 distinct drugs and 86 interaction types. Each drug pair is associated with a single type of interaction that specifies how one drug influences the metabolism of the other.
We converted each drug’s SMILES into graph data using RDKit (version 2024.09.1) [18]. In this graph, the drug’s atoms were represented as nodes, while the chemical bonds formed the edges connecting these nodes, with corresponding features assigned to each node and edge. The detailed features of the atoms and bonds are shown in Table 1.

3.2. Experimental Setup

The SSI-DDI model comprises a 4-layer GAT with parameter sharing, with each layer having two multi-head attention heads (each head outputs 32-dimensional features, concatenated to 64 dimensions), followed by LayerNorm and ELU activation. The input data undergo preliminary LayerNorm processing, and each interaction type, I r , corresponds to a learnable 64 × 64 relation matrix. For 3D-structure processing, molecular structures are generated from SMILES and augmented with random rotation. Coordinates, distances, and direction features are encoded via separate modules and fused through a linear layer for unified molecule representation. In feature fusion, the DFI module maps 2D features to Query and 3D features to Key, generating gating coefficients via attention weights for bidirectional feature injection (2D→3D and 3D→2D), using a learnable scaling factor, γ (initially 0.1). The final drug pair representation is generated via global average pooling. For training, multiscale contrastive learning (atomic, substructure, molecular levels) uses a three-scale InfoNCE loss, with dynamic loss weight combination via learnable parameters. The total loss combines contrastive and main task losses, the former decaying linearly with training epochs. Optimization uses AdamW (weight decay 1 × 10−3, batch size 512), with an initial learning rate of 5 × 10−3 and cosine annealing lr scheduling. We conducted experiments on an NVIDIA GeForce RTX 4090 GPU.

3.3. Baselines

We compared our method with the following baselines.
  • MHCADDI [20]: Incorporates external message-passing protocols between drug structures, integrating joint drug information during the process of learning individual drug representations.
  • SSI-DDI [26]: This model uses molecular graphs as inputs and employs GAT layers to extract substructure features.
  • GMPNN-CS [35]: This gating mechanism enables the learning of substructures of diverse sizes and shapes from drug molecule graphs, with DDI event predictions derived from the interaction computations of these substructures.
  • BDN-DDI [36]: A bidirectional dynamic network with a multihead attention mechanism designed to capture molecular-structure information and employed for adaptive feature extraction and dynamic message passing to predict DDIs.
  • DGNN-DDI [37]: A directed message passing network incorporating a substructure attention mechanism, used to adaptively extract substructures and applied for DDI prediction.

3.4. Performance Evaluation of LSA-DDI in an Inductive Setting

Our initial experiments were conducted in a transductive setting. That is, the drugs in the test phase were also included in the training phase. To further evaluate our model’s generalization ability, we extended our experiments to a cold-start scenario involving completely unseen drugs. This setting better reflected real-world scenarios, where novel drugs must be predicted. To achieve this, we restructured our experimental protocol by splitting the dataset at the drug level rather than the interaction level.
Let G new denote the set of new (unseen) drugs and G old the set of seen drugs, satisfying
G new G old = G , G new G old = .
We then defined the splits as
D train = { ( G x , G y , r ) D G x , G y G old } ,
D S 1 = { ( G x , G y , r ) D G x , G y G new } ,
D S 2 = { ( G x , G y , r ) D | ( G x G old G y G new ) ( G x G new G y G old ) } .
D train contains DDI tuples of drug pairs (known drugs) used for training, while D S 1 comprises tuples of unknown drugs not present in D train . The remaining tuples (where one drug belongs to D S 1 and the other to D train ) form D S 2 . The model was trained on D train and evaluated on both D S 1 and D S 2 .
Table 2 compares our method with baseline approaches using the same model configuration as in Section IV-B, but with a cosine-annealing learning rate schedule (initial learning rate = 5 × 10−3) and a new drug ratio of 1/5. Based on the average and standard deviation of three runs reported in Table 2, our LSA-DDI method achieved the highest AUROC on known–unknown drug pairs, significantly outperforming existing methods. Table 3 presents the detailed comparison in cold-start scenarios, while all methods experienced a drop in performance, ours still achieved a competitive AUROC of 86.12%, While BDN-DDI demonstrated comparable performance in cold-start scenarios (AUROC: 86.20% in S1, 93.68% in S2), LSA-DDI showed consistent advantages through its systematic 3D spatial feature extraction and dynamic cross-modal attention mechanisms. The performance improvements, though modest in absolute terms, reflect the effectiveness of incorporating stereochemical information for DDI prediction. This can also be seen from Figure 4. This strategy generated diverse spatial representations, enhancing model generalization by leveraging common substructures and spatial features across different drugs. Our model consistently outperformed the other methods in most performance metrics, showing the highest median values and tightest interquartile ranges.

3.5. Ablation Experiment

To delve deeper into the factors that bolstered our model’s generalization, we ran ablation experiments on test data from S1 and S2 partitions by methodically removing each component of the LSA-DDI framework and observing the subsequent performance shifts. Using three-fold cross validation, we assessed these modified versions against our complete LSA-DDI model.
  • wo_3d: Model lacking 3D information. We explored how eliminating 3D stereochemical data of drugs affected drug-interaction modeling.
  • wo_DFE: Model without the dynamic feature exchange mechanism. We assessed the dynamic feature exchange mechanism’s role in capturing interaction patterns between drugs.
  • wo_constrast: Model lacking multiscale contrastive learning. We examined how multiscale contrastive learning contributed to capturing multiscale structural features.
  • wo_3D_constract: Model without both 3D data augmentation and multiscale contrastive learning. We investigated the combined effect of spatial information and contrastive learning.
Table 4 shows that LSA-DDI outperformed all its variants, confirming the effectiveness of each component. Removing the 3D module (wo_3D) significantly reduced performance, demonstrating that incorporating spatial geometric information effectively captures chemical features and addresses the cold-start problem. Similarly, the results from wo_DFE indicate the crucial role of bidirectional cross-attention in identifying key drug regions and aligning features, while wo_contrast confirms that contrastive learning enhanced the feature discriminability and preserved molecular topology invariance. In the wo_3D_contrast experiment, with neither 3D information extraction nor contrastive learning, the model depended only on 2D topological information, leading to a significant performance decline. To test the effectiveness of the scale parameters 1, 2, 4, we conducted ablation studies involving removing or keeping only molecule-level features (scale = 4), both of which caused sharp drops (Table 4). This shows the need for multiple scales. It underlines how crucial it is to combine information from various structural spaces and to use their combined effects for accurate DDI prediction. This can also be clearly seen in Figure 5.

3.6. Case Study

Figure 6 presents the spatial visualizations of the 3D molecular structures and DDIs processed by LSA-DDI. The red dashed lines in the visualizations represent critical interaction bonds and key stereochemical sites identified by the model’s attention mechanism as crucial for DDIs. In the Warfarin–aspirin case, the relative positions of Warfarin’s coumarin structure and aspirin’s salicylic acid structure in 3D space determine their synergistic anticoagulant effect, with the red dashed lines highlighting the specific interaction sites where these structures achieve optimal spatial complementarity. LSA-DDI successfully identified the interaction pattern of increased bleeding risk through extraction of coordinate, distance, and angle features at these critical sites. The metformin–insulin analog case further validated the model’s adaptability in handling different pharmacological mechanisms, where the red dashed lines indicate the complementary interaction regions between metformin’s biguanide structure and insulin’s polypeptide structure that contribute to their synergistic glucose-lowering effect. These visualization results fully demonstrate that LSA-DDI can accurately capture stereochemical information and spatial interaction patterns that traditional 2D methods cannot handle, providing important technical support for predicting complex DDIs.
As shown in Figure 7, the DFE mechanism achieved dynamic fusion of 2D topological features and 3D spatial features through bidirectional attention. The upper left plot shows that the original features exhibited obvious modal differences, while the upper right plot demonstrates the bidirectional information flow of 2D→3D and 3D→2D attention weights. The features enhanced after processing (lower left plot) achieved better semantic alignment, with feature similarity significantly improving from 0.03 to 0.95, demonstrating the effectiveness of the DFE mechanism.
The multiscale contrastive learning framework in Figure 8 demonstrates the working principle of the dynamic temperature regulation mechanism. Before contrastive learning, features at different scales (Scale 1, 2, 4) exhibited scattered distributions, while dynamic temperature parameters ( τ = 0.1 2.0 ) adjusted learning intensity during training, ultimately achieving feature alignment and clustering optimization. This mechanism enabled LSA-DDI to simultaneously learn effective feature representations at multiple levels, demonstrating excellent generalization performance when handling new drug combinations.

4. Conclusions

We proposed a novel drug–drug interaction prediction model, LSA-DDI, which integrates three key innovations to enhance prediction accuracy. Through a 3D spatial-feature extractor and position information embedding, the model can accurately capture and retain the fine stereochemical characteristics of drug molecules, use a bidirectional cross-attention module to accurately identify critical interaction regions between drugs, and implement a multiscale contrastive learning framework with dynamic InfoNCE loss to effectively align 2D and 3D features, strengthening the model’s consistency and generalization capability. In benchmark tests on DrugBank, LSA-DDI achieved competitive performance and demonstrated consistent improvements over baseline methods. From a clinical perspective, the model can not only support new drug development and polypharmacy safety assessment, but its interpretable attention patterns also provide deep pharmacological insights into interaction mechanisms, offering robust support for precision medical decision making.
Moreover, we are aware of the potential ethical issues that may arise from the practical application of such models. The predictions generated by LSA-DDI should be used solely for clinical-decision support, with final treatment decisions made by qualified medical professionals. During the development and deployment of the model, it is essential to strictly protect patient privacy and ensure the legal and compliant use of all data. Finally, the model’s performance and fairness across different populations and clinical settings should be continuously monitored and improved.

Author Contributions

S.W. and C.Y. contributed equally to conceptualization, methodology, and writing. L.C. supervised the project and provided critical revisions. S.W. performed data analysis. C.Y. implemented the model. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Inner Mongolia Natural Science Foundation Project “Research on Cross-border E-commerce Product ‘Localization’ Promotion Based on Large Language Models” (Grant No. 2025MS07016) and the research project “Social Network Public Ideology Mining and Prediction Research” (Grant No. YAU202507636).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the reported results are publicly available. This study utilized the DrugBank database (version 4.0), which contains 191,808 drug pairs involving 1706 distinct drugs and 86 interaction types. The dataset can be accessed at https://go.drugbank.com/. No new datasets were generated during this study. All experimental configurations and hyperparameters are detailed in the Methods section to ensure reproducibility.

Acknowledgments

The authors thank the anonymous reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ryu, J.Y.; Kim, H.U.; Lee, S.Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl. Acad. Sci. USA 2018, 115, E4304–E4311. [Google Scholar] [CrossRef] [PubMed]
  2. Tatonetti, N.P.; Ye, P.P.; Daneshjou, R.; Altman, R.B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 2012, 4, 125ra31. [Google Scholar] [CrossRef]
  3. Silverman, R.B.; Holladay, M.W. The Organic Chemistry of Drug Design and Drug Action; Academic Press: Cambridge, MA, USA, 2014. [Google Scholar]
  4. Zhang, Y.; Pan, X.; Shi, T.; Gu, Z.; Yang, Z.; Liu, M.; Xu, Y.; Yang, Y.; Ren, L.; Song, X.; et al. P450Rdb: A manually curated database of reactions catalyzed by cytochrome P450 enzymes. J. Adv. Res. 2024, 63, 35–42. [Google Scholar] [CrossRef]
  5. Zhou, G.; Xuan, C.; Wang, Y.; Zhang, B.; Wu, H.; Gao, J. Drug repositioning based on a multiplex network by integrating disease, gene, and drug information. Curr. Bioinform. 2023, 18, 267–275. [Google Scholar] [CrossRef]
  6. Ai, C.; Yang, H.; Ding, Y.; Tang, J.; Guo, F. Low rank matrix factorization algorithm based on multi-graph regularization for detecting drug–disease association. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 3033–3043. [Google Scholar] [CrossRef]
  7. Whitebread, S.; Hamon, J.; Bojanic, D.; Urban, L. Keynote review: In vitro safety pharmacology profiling: An essential tool for successful drug development. Drug Discov. Today 2005, 10, 1421–1433. [Google Scholar] [CrossRef] [PubMed]
  8. Vo, T.H.; Nguyen, N.T.K.; Le, N.Q.K. Improved prediction of drug–drug interactions using ensemble deep neural networks. Med. Drug Discov. 2023, 17, 100149. [Google Scholar] [CrossRef]
  9. Zhong, Y.; Li, G.; Yang, J.; Zheng, H.; Yu, Y.; Zhang, J.; Luo, H.; Wang, B.; Weng, Z. Learning motif-based graphs for drug–drug interaction prediction via local–global self-attention. Nat. Mach. Intell. 2024, 6, 1094–1105. [Google Scholar] [CrossRef]
  10. Qiu, Y.; Zhang, Y.; Deng, Y.; Liu, S.; Zhang, W. A comprehensive review of computational methods for drug–drug interaction detection. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 19, 1968–1985. [Google Scholar] [CrossRef]
  11. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  12. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  13. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826. [Google Scholar]
  14. Sun, M.; Wang, F.; Elemento, O.; Zhou, J. Structure-based drug–drug interaction detection via expressive graph convolutional networks and deep sets. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI Press: Palo Alto, CA, USA, 2020; Volume 34, pp. 13927–13928. [Google Scholar]
  15. Hong, Y.; Luo, P.; Jin, S.; Liu, X. LaGAT: Link-aware graph attention network for drug–drug interaction prediction. Bioinformatics 2022, 38, 5406–5412. [Google Scholar] [CrossRef] [PubMed]
  16. Li, Z.; Tu, X.; Chen, Y.; Lin, W. HetDDI: A pre-trained heterogeneous graph neural network model for drug–drug interaction prediction. Brief. Bioinform. 2023, 24, bbad385. [Google Scholar] [CrossRef]
  17. Le, N.Q.K. Predicting emerging drug interactions using GNNs. Nat. Comput. Sci. 2023, 3, 1007–1008. [Google Scholar] [CrossRef] [PubMed]
  18. Landrum, G. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling. Greg Landrum 2013, 8, 5281. [Google Scholar]
  19. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. [Google Scholar] [CrossRef]
  20. Deac, A.; Huang, Y.H.; Veličković, P.; Liò, P.; Tang, J. Drug–drug adverse effect prediction with graph co-attention. arXiv 2019, arXiv:1905.00534. [Google Scholar]
  21. Wang, J.; Liu, X.; Shen, S.; Deng, L.; Liu, H. DeepDDS: Deep graph neural network with attention mechanism to predict synergistic drug combinations. Brief. Bioinform. 2022, 23, bbab390. [Google Scholar] [CrossRef]
  22. Zhang, W.; Chen, Y.; Liu, F.; Luo, F.; Tian, G.; Li, X. Predicting potential drug–drug interactions by integrating chemical, biological, phenotypic and network data. BMC Bioinform. 2017, 18, 18. [Google Scholar] [CrossRef]
  23. Feng, Y.H.; Zhang, S.W.; Shi, J.Y. DPDDI: A deep predictor for drug–drug interactions. BMC Bioinform. 2020, 21, 419. [Google Scholar] [CrossRef] [PubMed]
  24. Lin, S.; Wang, Y.; Zhang, L.; Chu, Y.; Liu, Y.; Fang, Y.; Shi, M.; Jiang, H.; Zhang, S.; Chen, X.; et al. MDF-SA-DDI: Predicting drug–drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism. Brief. Bioinform. 2022, 23, bbab421. [Google Scholar] [CrossRef]
  25. Chen, Y.; Ma, T.; Yang, X.; Wang, J.; Song, B.; Zeng, X. MUFFIN: Multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics 2021, 37, 2651–2658. [Google Scholar] [CrossRef] [PubMed]
  26. Nyamabo, A.K.; Yu, H.; Shi, J.Y. SSI–DDI: Substructure–substructure interactions for drug–drug interaction prediction. Brief. Bioinform. 2021, 22, bbab133. [Google Scholar] [CrossRef] [PubMed]
  27. Yu, H.; Zhao, S.; Shi, J.Y. STNN-DDI: A substructure-aware tensor neural network to predict drug–drug interactions. Brief. Bioinform. 2022, 23, bbac209. [Google Scholar] [CrossRef]
  28. Fu, G.; Ding, Y.; Seal, A.; Chen, B.; Sun, Y.; Bolton, E. Predicting drug target interactions using meta-path-based semantic network analysis. BMC Bioinform. 2016, 17, 160. [Google Scholar] [CrossRef]
  29. Duan, Y.; Guo, C.; Zhang, E.; Ma, X.; Wen, C. Attention-wise masked graph contrastive learning for predicting molecular property. Brief. Bioinform. 2022, 23, bbac303. [Google Scholar]
  30. Zhang, X.; Wang, G.; Meng, X.; Wang, S.; Zhang, Y.; Rodriguez-Paton, A.; Wang, J.; Wang, X. Molormer: A lightweight self-attention-based method focused on spatial structure of molecular graph for drug-–drug interactions prediction. Brief. Bioinform. 2022, 23, bbac296. [Google Scholar] [CrossRef]
  31. Cheng, B.; Peng, J. Molecular substructure-based cross-dependent network for drug–drug interaction prediction. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 101–106. [Google Scholar]
  32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  33. van den Oord, A.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar] [CrossRef]
  34. Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A.C.; Liu, Y.; Maciejewski, A.; Arndt, D.; Wilson, M.; Neveu, V.; et al. DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Res. 2014, 42, D1091–D1097. [Google Scholar] [CrossRef] [PubMed]
  35. Nyamabo, A.K.; Yu, H.; Liu, Z.; Shi, J.Y. Drug–drug interaction prediction with learnable size-adaptive molecular substructures. Brief. Bioinform. 2022, 23, bbab441. [Google Scholar] [CrossRef] [PubMed]
  36. Ning, G.; Sun, Y.; Ling, J.; Chen, J.; He, J. BDN-DDI: A bilinear dual-view representation learning framework for drug–drug interaction prediction. Comput. Biol. Med. 2023, 165, 107340. [Google Scholar] [CrossRef] [PubMed]
  37. Ma, M.; Lei, X. A dual graph neural network for drug–drug interactions prediction based on molecular structure and interactions. PLoS Comput. Biol. 2023, 19, e1010812. [Google Scholar] [CrossRef]
Figure 1. The figure illustrates the framework of a drug–drug-interaction prediction model. It includes the 3D spatial-feature extraction and GNN encoding processes for two drug molecules, A and B. Also covered are the dynamic feature-exchange mechanism, multiscale contrastive learning strategy, and final DDI predictor. The interactions between model components are detailed, showing the entire process of drug-interaction prediction via the integration of 3D spatial and topological features.
Figure 1. The figure illustrates the framework of a drug–drug-interaction prediction model. It includes the 3D spatial-feature extraction and GNN encoding processes for two drug molecules, A and B. Also covered are the dynamic feature-exchange mechanism, multiscale contrastive learning strategy, and final DDI predictor. The interactions between model components are detailed, showing the entire process of drug-interaction prediction via the integration of 3D spatial and topological features.
Ijms 26 06799 g001
Figure 2. The structure of the 3D feature-extraction module, showing how spatial, distance, and angle encoders extract coordinate, distance, and angle information from molecular structures to generate a unified 3D spatial-feature representation.
Figure 2. The structure of the 3D feature-extraction module, showing how spatial, distance, and angle encoders extract coordinate, distance, and angle information from molecular structures to generate a unified 3D spatial-feature representation.
Ijms 26 06799 g002
Figure 3. Dynamic Feature Exchange (DFE) mechanism. The mechanism uses linear transformations to generate query and key vectors, computes attention weights via a cross-modal attention mechanism, and enhances 2D and 3D features bidirectionally. Finally, it uses global average pooling to obtain feature representations for subsequent predictions.
Figure 3. Dynamic Feature Exchange (DFE) mechanism. The mechanism uses linear transformations to generate query and key vectors, computes attention weights via a cross-modal attention mechanism, and enhances 2D and 3D features bidirectionally. Finally, it uses global average pooling to obtain feature representations for subsequent predictions.
Ijms 26 06799 g003
Figure 4. Figure shows the performance evaluation of different models on the S1 and S2 partitions (involving old and new drug combinations). By comparing the models’ performance across ACC, AUROC, AP, and F1 metrics, it visually demonstrates the performance of the LSA-DDI model in multiple scenarios.
Figure 4. Figure shows the performance evaluation of different models on the S1 and S2 partitions (involving old and new drug combinations). By comparing the models’ performance across ACC, AUROC, AP, and F1 metrics, it visually demonstrates the performance of the LSA-DDI model in multiple scenarios.
Ijms 26 06799 g004
Figure 5. Figure presents the results of an ablation study, comparing the performance of the LSA-DDI model and its variants across S1 and S2 partitions (new drug–new drug and old drug–new drug combinations, respectively).
Figure 5. Figure presents the results of an ablation study, comparing the performance of the LSA-DDI model and its variants across S1 and S2 partitions (new drug–new drug and old drug–new drug combinations, respectively).
Ijms 26 06799 g005
Figure 6. Three-dimensional data augmentation case study.
Figure 6. Three-dimensional data augmentation case study.
Ijms 26 06799 g006
Figure 7. Comparative study case analysis.
Figure 7. Comparative study case analysis.
Ijms 26 06799 g007
Figure 8. DFE data fusion case analysis.
Figure 8. DFE data fusion case analysis.
Ijms 26 06799 g008
Table 1. Chemical feature overview.
Table 1. Chemical feature overview.
FeatureTypeDimension
Node Features
The atomic symbolNode44
The degree of the atomNode1
The implicit valenceNode1
The formal chargeNode1
The number of radical electronsNode1
The hybridizationNode5
Whether or not the atom is aromaticNode1
The total number of hydrogens on the atomNode1
Subtotal for Node Features55
Bond Features
Bond typeBond4
ConjugationBond1
In ringBond1
Subtotal for Bond Features6
Total Features61
Table 2. The performance of each model in the warm-start task. The maximum value of each column is bolded.
Table 2. The performance of each model in the warm-start task. The maximum value of each column is bolded.
ModelACCAUROCAPF1
MHCADDI82.6088.4089.1285.40
SSI-DDI95.4897.4997.8696.07
GMPNN-CS94.8997.5297.4895.30
DGNN-DDI96.3298.2098.2295.23
BDN-DDI96.4298.2898.2496.30
LSA-DDI(Ours)97.4398.7098.4996.50
Table 3. The performance of each model in the cold-start tasks. The maximum value of each column is bolded.
Table 3. The performance of each model in the cold-start tasks. The maximum value of each column is bolded.
ModelS1 Partition (New Drug, New Drug)S2 Partition (Old Drug, New Drug)
ACCAUROCAPF1ACCAUROCAPF1
MHCADDI65.5372.4873.2967.2170.5877.8476.1572.73
SSI-DDI65.1872.5874.1063.3276.3884.2284.0873.42
GMPNN-CS68.5774.9675.4465.3277.4584.8484.5678.29
DGNN-DDI66.5773.2172.3262.1175.3983.1584.7569.71
BDN-DDI74.8886.2084.3270.2184.9293.6893.6484.18
LSA-DDI (Ours)75.4586.1286.3770.0285.4193.2194.1286.21
Table 4. Performance demonstration of LSA-DDI and its variants on S1 and S2 partitions. The maximum value of each column is bolded.
Table 4. Performance demonstration of LSA-DDI and its variants on S1 and S2 partitions. The maximum value of each column is bolded.
ModelS1 Partition (New Drug, New Drug)S2 Partition (Old Drug, New Drug)
ACCAUROCAPF1ACCAUROCAPF1
wo_3d72.3578.2879.3064.7079.6588.4088.2078.90
wo_DFE72.3387.3879.5368.0779.5588.2387.4276.86
wo_contrast69.4179.2779.4065.1270.1888.0788.1079.29
wo_3d_contrast65.1471.3773.9760.1375.2982.9884.1074.47
wo_3d_contrast_scale_461.1469.8071.8856.2174.1976.4378.9070.29
wo_3d_contrast_scale_1,264.1170.3170.9651.2274.3978.9080.3071.22
LSA-DDI (Ours)75.4586.1286.3770.0285.4193.2194.1286.21
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, S.; Yang, C.; Chen, L. LSA-DDI: Learning Stereochemistry-Aware Drug Interactions via 3D Feature Fusion and Contrastive Cross-Attention. Int. J. Mol. Sci. 2025, 26, 6799. https://doi.org/10.3390/ijms26146799

AMA Style

Wang S, Yang C, Chen L. LSA-DDI: Learning Stereochemistry-Aware Drug Interactions via 3D Feature Fusion and Contrastive Cross-Attention. International Journal of Molecular Sciences. 2025; 26(14):6799. https://doi.org/10.3390/ijms26146799

Chicago/Turabian Style

Wang, Shanshan, Chen Yang, and Lirong Chen. 2025. "LSA-DDI: Learning Stereochemistry-Aware Drug Interactions via 3D Feature Fusion and Contrastive Cross-Attention" International Journal of Molecular Sciences 26, no. 14: 6799. https://doi.org/10.3390/ijms26146799

APA Style

Wang, S., Yang, C., & Chen, L. (2025). LSA-DDI: Learning Stereochemistry-Aware Drug Interactions via 3D Feature Fusion and Contrastive Cross-Attention. International Journal of Molecular Sciences, 26(14), 6799. https://doi.org/10.3390/ijms26146799

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop