1. Introduction
Polypharmacy, the concurrent use of more than five active substances (a definition commonly used in the literature [
1]), has grown considerably in cardiology, oncology, psychiatry, and geriatrics [
2]. Although it enables holistic management of multimorbidity, each additional drug enlarges the combinatorial space of potential drug–drug interactions (DDIs). A 2023 claims-based analysis covering four European Union states found that 78% of adults taking more than seven prescriptions experienced at least one clinically actionable DDI [
3,
4].
Clinical decision support systems (CDSSs) are the primary safeguard against such risks, as they can screen medication orders in real time and generate context-aware alerts during regimen construction. Linking polypharmacy profiles to CDSS logic, therefore, directly influences patient safety and therapeutic efficacy [
5].
However, most commercial CDSSs treat DDIs as binary hazards [
6], ignoring their polarity. Interactions are simply flagged as present or absent, without indicating whether they enhance or diminish therapeutic effects. Distinguishing synergistic (therapeutically beneficial) from antagonistic (harmful or efficacy-reducing) interactions could help clinicians deliberately exploit positive synergies—such as β-lactam/β-lactamase inhibitor pairs in infectious disease—while mitigating adverse antagonism.
State-of-the-art biomedical language models such as BiomedBERT [
7] can interpret unstructured textual evidence in DrugBank, MEDLINE, or electronic health records (EHR), offering a scalable route to polarity-aware classification. The present study employs BiomedBERT as the backbone and fine-tunes it with low-rank adaptation (LoRA) [
8], allowing memory-constrained hospital servers to deploy the model without storing full-precision weight deltas.
This research bridges the intensifying demands of polypharmacy, the need for polarity-aware CDSS alerts, and a resource-efficient transformer pipeline that generates those alerts. A polarity-balanced seed corpus was assembled from three curated sources and used to fine-tune the BiomedBERT backbone via parameter-efficient LoRA adaptation. Although DrugBank provides millions of interaction descriptions, only a small subset carries polarity labels; these labels, drawn from DrugComb and the DrugBank antagonism file, were used to build the supervised training set. The trained model was then applied to DrugBank's structured interaction descriptions, which are well suited to polarity inference, to classify the remaining unlabeled drug–drug interaction sentences as synergistic or antagonistic. The proposed modular pipeline enables scalable polarity classification with minimal annotation effort and prepares the labeled data for downstream CDSS integration.
This article is organized as follows:
Section 2 reviews pertinent work on polarity-aware models, transformer fine-tuning, and DDI classification techniques. The methodology in
Section 3 explains the data sources, labeling system, and LoRA fine-tuning technique.
Section 4 outlines possible approaches to clinical integration. Comparative findings and experimental assessment are presented in
Section 5, while
Section 6 discusses key findings, limitations, and implications.
Section 7 presents the conclusions of the paper and directions for future research.
A preliminary version of this work was presented as an abstract at FMF-AI 2025, but not published as an article.
2. Literature Review
Recent rule-centric engines use metabolic ontologies and curated CYP-450 tables to detect pharmacokinetic clashes but still miss pharmacodynamic synergies. For example, in a seminal study [
9], drug database providers were found to update their interaction rule bases monthly, with each update vetted by clinical experts to maintain more than 90% precision for recently approved drugs. Noor and Assiri [
10] showed that a Tanimoto similarity threshold above 0.85 can recover over 60% of true DDIs, although recall remains limited.
Comprehensive surveys covering 2020–2024 report that graph neural networks and variational auto-encoders obtain AUROC between 0.82 and 0.86 on DrugBank pairs. Yan et al. [
11] employed a heterogeneous graph attention network (GAT) with chemical, gene-expression, and pathway edges. Liu et al. [
12] proposed the synergistic graph neural network (SynerGNet), a graph attention network tailored for predicting synergistic drug pairs in oncology. By integrating cell-line viability profiles and chemical descriptors, their model achieved a balanced accuracy of 84.1% on DrugComb-derived synergy data. However, it relies heavily on experimental omics inputs (e.g., gene expression and dose-response curves), limiting its direct applicability in real-world clinical decision support, where such data are unavailable at prescription time. Despite strong performance, the infrastructure demands and feature requirements of SynerGNet restrict its use to specialized research environments or pharmacological studies where rich experimental annotation is available.
Other studies have used domain-adapted language models such as BioBERT, PubMedBERT, and BiomedBERT, which have outperformed their general-domain counterparts by 6–10 percentage points in the F1 score (the harmonic mean of precision and recall) on relation-extraction benchmarks [
13]. Shankar et al. [
14] demonstrated that incorporating sentence-level attention explanations improved pharmacist trust ratings in a simulated medication-review task.
While domain-specific models like BioBERT, PubMedBERT, and BiomedBERT significantly improve biomedical language comprehension, they are not trained to recognize interaction polarity. These models learn general contextual embeddings but do not distinguish between synergistic and antagonistic effects unless explicitly fine-tuned on labeled polarity data. In this study, we use LoRA-based tuning to adapt BiomedBERT for this specific classification task.
Hu et al. [
8] benchmarked LoRA—which decomposes weight updates into low-rank matrices, often of rank 8–16, so that only 0.5–2% of parameters are trainable—against adapters and prefix-tuning on eight biomedical tasks. LoRA matched full fine-tuning within 0.3 F1 points while reducing video random-access memory (VRAM) usage by 12×.
In a different approach, Zhang et al. [
15] applied confidence-weighted pseudo-labels to MEDI-SPAN entries—a widely used commercial drug database containing structured information about drug interactions, dosages, and contraindications—observing a 5–7% macro-F1 gain with only 1000 human annotations. Unlike their method, which applied fixed confidence thresholds during training, our framework logs all predictions along with their confidence scores. This allows for flexible, rule-based filtering or manual review after inference, without discarding potentially informative cases prematurely.
Several earlier studies have improved predictive accuracy in DDI modeling by using molecular graph structures. One such method, for example, presented a graph neural network learning size-adaptive molecular substructures to capture chemically relevant interactions at several levels, enhancing pharmacological effect classification among compounds [
16]. This graph-based framework highlights the predictive utility of structural drug features, though it does not explicitly address sentence-level interpretability or clinical deployment.
Excessive system alerts and poor targeting still hinder the adoption of CDSS tools—nearly half of the alerts in outpatient care are ignored or overridden by clinicians [
17]. Intelligent systems with a computational understanding of medical context are therefore crucial. AI-enhanced CDSS prototypes that incorporate interaction polarity have reduced override rates to 28%, supporting the clinical relevance of polarity-aware classifiers.
In the medical field, it is especially important to understand how decisions are made. That is why explainability should be a core element in CDSS. Tanvir et al. [
18] proposed a heterogeneous attention network for drug–drug interaction prediction (HAN-DDI), a heterogeneous graph attention network trained on biomedical interaction graphs consisting of drugs, targets, enzymes, and side effects. The model achieved high performance in DDI prediction, reaching an F1 score of 95.18% for existing drugs and 82.87% for novel drugs, demonstrating strong generalization. However, the system depends on structured biomedical triples and does not operate directly on unstructured clinical narratives, limiting its utility in NLP-based settings.
Unlike previous models that relied on structured biomedical triples or omics data, our LoRA-BiomedBERT model operates directly on free-text interaction statements, making it more suitable for NLP-based clinical decision systems where such unstructured data is prevalent.
Despite recent advances, many DDI classification models either demand extensive experimental inputs or fail to process the real-world textual descriptions found in clinical databases, leaving academic prototypes disconnected from accessible tools. Our work seeks to close this gap with a lightweight, polarity-aware model that classifies DDIs directly from raw narrative data. A further aim of the research was to offer a scalable solution for real-time decision support in healthcare settings where structured data is often limited. The proposed technique balances accuracy against practicality.
Table 1 summarizes representative approaches in DDI prediction, highlighting the input types, architectures, and whether polarity awareness was supported. Most prior works focused on interaction presence rather than directional classification, which this study addresses explicitly.
3. Methodology
The experimental workflow follows a linear four-step process. It starts with label acquisition, followed by model fine-tuning, large-scale inference, and finally, ledger-guided refinement. The workflow is illustrated in
Figure 1.
3.1. Polarity-Balanced Seed Corpus
Three curated resources containing explicit polarity information were used, as follows: DrugComb-ASDCD synergism, a dataset of synergistic drug pairs experimentally validated in cancer studies; the comprehensive DrugComb synergy-score matrix (Bliss, Loewe, ZIP, HSA), which quantifies interaction effects across multiple models; and the DrugBank Antagonism file, which lists clinically documented cases where drug co-administration leads to reduced efficacy or adverse outcomes. The data was normalized by mapping drug names and synonyms to their canonical DrugBank identifiers using exact string matching and synonym resolution heuristics, then merged across sources. Rows were retained only when the synergy or antagonism score deviated by at least ±10% from the expected null effect (i.e., no interaction), or when an adverse outcome was documented in the literature. The 10% threshold was selected empirically based on prior use in DrugComb synergy scoring and to exclude minor numerical fluctuations around neutral interactions. It provides a practical filter that reduces noise while retaining biologically meaningful polarity shifts.
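The merge-and-filter step above can be sketched in a few lines of Python. This is a minimal illustration assuming a simplified row schema (`score`, `adverse_outcome_documented`); the field names and DrugBank-style identifiers are hypothetical, not the actual DrugComb/DrugBank schema:

```python
# Sketch of the seed-corpus filter: keep a row only if its synergy/antagonism
# score deviates by at least +/-10% from the expected null effect, or an
# adverse outcome is documented in the literature.

NULL_EFFECT = 0.0   # expected score under "no interaction"
THRESHOLD = 0.10    # +/-10% deviation required to retain a row

def keep_row(row):
    """Apply the polarity-significance filter described in the text."""
    if row.get("adverse_outcome_documented"):
        return True
    return abs(row["score"] - NULL_EFFECT) >= THRESHOLD

rows = [  # illustrative, already-normalized rows (canonical IDs, merged sources)
    {"pair": ("DB00316", "DB00945"), "score": 0.32,  "adverse_outcome_documented": False},
    {"pair": ("DB00316", "DB01050"), "score": 0.03,  "adverse_outcome_documented": False},
    {"pair": ("DB00945", "DB00563"), "score": -0.05, "adverse_outcome_documented": True},
]

seed = [r for r in rows if keep_row(r)]
# The 0.03 row falls inside the neutral band and is dropped; the documented
# adverse-outcome row is kept despite its small score.
```

The second row illustrates why the threshold matters: small fluctuations around the null effect would otherwise inject noisy, effectively neutral pairs into the polarity labels.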
After this procedure, the resulting set was balanced, yielding 7436 synergistic and 7436 antagonistic interaction sentences (14,872 total; a 1:1 class ratio). These were formatted as short declarative statements (e.g., “Drug A increases the anticoagulant effect of Drug B”). Class balance is essential in classification problems, as imbalanced datasets can cause biased learning and worse generalization for the minority class; here, the balanced class distribution made neither intentional oversampling nor undersampling necessary. As shown in [
19], this helps reduce problems such as inflated accuracy at the expense of recall or model overfitting to the dominant class.
To evaluate generalization, the corpus was split into a training set (11,744 sentences), a held-out test set (2936 sentences), and a validation set (192 sentences), as displayed in
Table 2. No drug entity appears in more than one split, guaranteeing that the model must extrapolate polarity to unseen drugs rather than memorize drug-specific phrases.
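The drug-disjoint property of such a split can be verified with a lightweight check; the triples and DrugBank-style identifiers below are illustrative:

```python
# Verify that no drug identifier leaks between splits of
# (drug_a, drug_b, label) triples.

def entities(split):
    """Collect every drug ID mentioned in a split."""
    return {d for a, b, _ in split for d in (a, b)}

train = [("DB0001", "DB0002", "synergistic"),
         ("DB0003", "DB0004", "antagonistic")]
test  = [("DB0005", "DB0006", "synergistic")]

# An overlap here would mean the model could memorize drug-specific phrasing
# instead of extrapolating polarity to unseen drugs.
assert entities(train).isdisjoint(entities(test)), "drug leakage between splits!"
```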
The model was provided with the Interaction_Description field values, a set of natural-language sentences, which were broken into tokens with BiomedBERT’s default tokenizer while keeping the meaning of each sentence intact. Structured fields, such as DrugBank IDs and labels, were used only to assemble the datasets and track metadata.
3.2. LoRA Fine-Tuning of BiomedBERT
In the present work, the
microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext model was fine-tuned using LoRA [
For each self-attention block, the original query and value projection matrices $W_q, W_v \in \mathbb{R}^{d \times d}$ are augmented by a trainable low-rank term $\Delta W = BA$, with $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times d}$, so that the adapted weights become as follows:
$$W' = W + BA.$$
With rank $r = 8$ and $d = 768$ (matching BiomedBERT’s hidden size), the two low-rank matrices $B$ and $A$ together introduce exactly $2 \times 768 \times 8 = 12{,}288$ additional parameters—only 0.01117% of the base model—while recovering much of the representational capacity of a full-rank update. The value $r = 8$ was selected based on prior LoRA studies and early empirical results indicating a favorable tradeoff between accuracy and parameter efficiency.
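The quoted adapter parameter count follows directly from the shapes of the low-rank matrices; a quick sanity check:

```python
# Adapter parameter count: B is (d x r) and A is (r x d), so a single
# adapted projection gains d*r + r*d trainable parameters.
d, r = 768, 8
extra_params = d * r + r * d
print(extra_params)  # 12288

# Share of a ~110M-parameter BiomedBERT base (approximate base size).
share = extra_params / 110_000_000 * 100
print(f"{share:.5f}%")  # matches the ~0.01117% figure in the text
```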
Training minimizes the cross-entropy objective
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c \in \{\mathrm{syn},\, \mathrm{ant}\}} y_{i,c} \, \log \frac{\exp(z_{i,c})}{\sum_{c'} \exp(z_{i,c'})},$$
where $\mathrm{syn}$ and $\mathrm{ant}$ denote the synergistic and antagonistic classes, $y_{i,c}$ is the one-hot ground-truth indicator, and $z_{i,c}$ is the logit for sample $i$ and class $c$. The Adam optimizer with weight decay (AdamW) was used, set to the conventional defaults $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$. These values regulate the exponential decay rates for the moving averages of the gradient (first moment) and squared gradient (second moment), helping to stabilize convergence and prevent oscillations during training, and have been experimentally shown to suit transformer-based designs. The learning rate followed a 10% linear warm-up schedule, and early stopping after two epochs without improvement was employed to avoid overfitting.
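For concreteness, the two-class cross-entropy objective can be evaluated in a few lines of pure Python; the logits and labels below are illustrative, not drawn from the actual corpus:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(logits, labels):
    """Mean negative log-likelihood over a batch; labels index the true class."""
    total = 0.0
    for z, y in zip(logits, labels):
        total -= math.log(softmax(z)[y])
    return total / len(logits)

batch_logits = [[2.0, -1.0], [0.3, 0.8]]  # one row of class logits per sentence
batch_labels = [0, 1]                     # 0 = synergistic, 1 = antagonistic
loss = cross_entropy(batch_logits, batch_labels)
```

A confident correct prediction (first row) contributes a small loss term, while the weakly separated second row dominates the batch loss, which is exactly the gradient signal AdamW then uses to update the LoRA matrices.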
Training was performed on an NVIDIA RTX 3060 with 12 GB of VRAM. At batch size 8 and sequence length 256, convergence was reached within 6–7 epochs over ∼3.5 h. With only 12,288 trainable parameters, the LoRA adapter allows practical fine-tuning on consumer hardware. The resulting adapter weighs <10 MB and can be merged into the base checkpoint on-the-fly or kept separate for rapid version control.
While DrugBank provides millions of interaction descriptions, only the small polarity-labeled subset identified from DrugComb and the DrugBank antagonism file was used to build the supervised training set.
After fine-tuning, the adapter-augmented model was used to label all remaining interaction sentences in DrugBank v5.1.10. Prior to inference, every pair present in the seed corpus was removed. This ensured a strict separation between training and application data. Roughly 1.5 million sentences were streamed through the model in batches of 256; predictions, confidences, and checkpoint hashes were inserted into a resumable SQLite ledger. This exhaustive logging allows downstream rules or pharmacists to review low-confidence pairs without re-executing the classifier.
3.3. Ledger-Guided Refinement
No prediction is discarded, even uncertain ones. Every output is logged in a lightweight SQLite file that acts as a ledger, storing the label alongside the confidence score, the source sentence, and basic context such as model version and timestamp. This makes it easy to track what the model predicted, on what evidence, and with what certainty, so suspicious or borderline cases can be reviewed later or used to refine future versions. The ledger thus allows post hoc decisions—e.g., identifying low-confidence samples for human review or retraining. Unlike hard-threshold filtering, this confidence-aware logging enables rules-based feedback (e.g., ATC class contradictions) and pharmacist-guided refinement without data loss. It also supports resumable inference and persistent tracking of model decisions for each processed interaction.
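A minimal sketch of such a confidence-aware ledger using Python's standard `sqlite3` module follows; the table layout, column names, and example rows are assumptions for illustration, not the authors' actual schema:

```python
import datetime
import hashlib
import sqlite3

# In-memory database for the sketch; the paper's ledger is an on-disk,
# resumable SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS ledger (
        pair_id     TEXT PRIMARY KEY,
        sentence    TEXT,
        label       TEXT,
        confidence  REAL,
        model_hash  TEXT,
        logged_at   TEXT
    )
""")

def log_prediction(pair_id, sentence, label, confidence, checkpoint):
    """INSERT OR REPLACE keeps the ledger resumable across interrupted runs."""
    conn.execute(
        "INSERT OR REPLACE INTO ledger VALUES (?, ?, ?, ?, ?, ?)",
        (pair_id, sentence, label, confidence,
         hashlib.sha256(checkpoint).hexdigest()[:12],          # checkpoint hash
         datetime.datetime.now(datetime.timezone.utc).isoformat()),
    )

log_prediction("DB0001|DB0002", "Drug A increases the effect of Drug B.",
               "synergistic", 0.97, b"adapter-v1")
log_prediction("DB0003|DB0004", "Drug C may affect the activity of Drug D.",
               "antagonistic", 0.51, b"adapter-v1")
conn.commit()

# Post hoc, rule-based filtering: pull low-confidence rows for pharmacist review
# without re-executing the classifier.
review = conn.execute(
    "SELECT pair_id, confidence FROM ledger WHERE confidence < 0.6").fetchall()
```

The `INSERT OR REPLACE` on a primary-keyed pair identifier is what makes re-running an interrupted inference pass idempotent: already-processed pairs are simply overwritten with identical rows.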
5. Experimental Evaluation
Experimental results show that applying LoRA tuning to BiomedBERT yields a strong gain in polarity classification without the heavy cost of full model retraining. After eight training epochs, the fine-tuned model correctly classified 2347 out of 2936 test samples—an accuracy of roughly 79.96%.
Figure 2 provides a multi-angle comparison. In
Figure 2a,b, the confusion matrices show that while the baseline model struggles with both classes, LoRA recovers a more balanced and accurate classification.
Figure 2c tracks clear gains across accuracy, F1, precision, and recall.
Figure 2d further confirms that LoRA improves both synergistic and antagonistic recognition.
ROC curves in
Figure 2e show a marked improvement in the area under the curve (AUC), rising from 0.449/0.644 (baseline) to 0.864/0.866 (LoRA), with synergistic interactions treated as the positive class during evaluation. The model thus separates the two classes reliably across a wide range of decision thresholds (AUC ≈ 0.865). In practice this matters because clinical decisions often depend on distinguishing combinations that are reliably beneficial from those that are neutral or harmful.
However, since precision–recall (PR) curves are asymmetric, a complete view of model performance requires plotting an additional curve with the complementary class treated as positive. This is achieved by inverting the class labels and their associated probabilities. The multi-curve evaluation in
Figure 2f highlights recall–precision trade-offs between classes, exposes mild imbalance, and reveals asymmetries that would be hidden in scalar metrics like F1 or accuracy. In clinical applications, a higher PR curve is desirable when false positives (e.g., predicting synergy when not present) would lead to risky combinations. Therefore, this analysis helps prioritize recall or precision depending on the clinical tolerance for risk.
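The inversion described above amounts to flipping each binary label and using the complementary probability as the positive-class score. A small self-contained sketch (with illustrative labels and probabilities) shows how a complementary PR point is computed at one threshold:

```python
labels = [1, 0, 1, 1, 0]            # 1 = original positive class
probs  = [0.9, 0.4, 0.7, 0.2, 0.1]  # model's probability of that class

# Invert labels and probabilities so the other class becomes positive.
inv_labels = [1 - y for y in labels]
inv_probs  = [1.0 - p for p in probs]

def precision_recall_at(th, ys, ps):
    """Precision/recall treating scores >= th as positive predictions."""
    tp = sum(1 for y, p in zip(ys, ps) if y == 1 and p >= th)
    fp = sum(1 for y, p in zip(ys, ps) if y == 0 and p >= th)
    fn = sum(1 for y, p in zip(ys, ps) if y == 1 and p < th)
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

prec, rec = precision_recall_at(0.5, inv_labels, inv_probs)
```

Sweeping the threshold `th` over [0, 1] and collecting these (recall, precision) points yields the second PR curve; because precision depends on the class prevalence, the two curves are not mirror images of each other, which is exactly why both are needed.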
We trained and tested the model on both the synergistic and antagonistic polarity classes, using balanced data.
Figure 3 summarizes the dataset characteristics: the class distribution, the split sizes, and the input-sentence lengths. Accuracy, precision, recall, F1 score, and the confusion matrix are reported for both classes; no class was excluded during training or analysis.
To obtain a point of reference, we also ran the base BiomedBERT model without any fine-tuning. We used the same test set and did not apply LoRA or any task-specific adjustments. The results were clearly weaker, especially for synergy cases, where the model often missed the correct label; see
Table 3. This makes sense, since the original model was not trained to recognize polarity. These baseline scores helped us gauge how much the tuning process actually improves the outcome.
In more concrete terms, the model recognizes synergistic statements roughly nine times out of ten (recall ≈ 89.59%) and still identifies around 70.31% of antagonistic ones. These results represent solid performance for sentence-level DDI extraction, particularly given that only 12,288 parameters (a small fraction of the base model) were updated during training.
Several follow-up experiments—varying LoRA rank, α-scaling, dropout, and learning rate—exhibited the same pattern: accuracy fluctuated within a narrow ±2% band, while memory consumption and training time remained essentially constant. This stability suggests that performance is largely driven by the pretrained biomedical priors of the backbone rather than by fine-tuned hyperparameter settings.
Misclassifications tend to cluster around vague formulations such as “Drug A may affect the activity of Drug B,” where the sentence provides no explicit indication of benefit or harm. It is expected that recall for the Antagonistic class will improve once such borderline examples are incorporated into the next pseudo-labeling round, thereby exposing the model to a broader variety of negative cues.
We also checked how the model performs on completely new drug combinations—the ones it never saw during training.
Table 4 shows a side-by-side comparison between the original BiomedBERT and the LoRA-tuned version. LoRA handled the unfamiliar data much better, with substantial improvements in F1 and precision. This suggests the model does not merely memorize but generalizes to new cases.
To better understand how the model performs on drugs it has not seen before,
Figure 4 presents a side-by-side comparison of the baseline and LoRA-enhanced versions. In
Figure 4a,b, we see that the original model fails to detect any synergistic interactions, whereas the LoRA variant manages to correctly identify both types with strong precision. The bar chart in
Figure 4c breaks down the metric gains, showing the biggest improvements in F1 score and precision. ROC and AUC scores for each class are shown in
Figure 4d,e, and they indicate that the model can separate synergistic and antagonistic cases with high reliability. Lastly, the curves in
Figure 4f suggest that LoRA remains confident across a wide range of thresholds, which is useful when making real-world decisions based on prediction certainty.
The results validate LoRA as a pragmatic compromise between performance and deployability. Achieving almost 86% accuracy on unseen drugs with a model trained on only 11,744 examples (alongside 2936 test and 192 validation examples) provides a credible foundation for integration into live CDSS modules. Meanwhile, the adapter’s small footprint ensures ease of installation and version control.
7. Conclusions
The present study describes a polarity-aware classification framework based on LoRA-tuned BiomedBERT that distinguishes synergistic from antagonistic DDIs using sentence-level input. The system was trained on polarity-labeled data derived from DrugComb and DrugBank and evaluated on a held-out set sharing no drug entities with the training data.
This method looks promising for clinical support tools, especially where structured labels are unavailable but sentence-level interaction descriptions exist. It is efficient enough to run in resource-constrained environments and does not require full model retraining. However, the model has not yet been tested on clinical data such as patient records or hospital notes; at present, it operates only on labeled sentences from public datasets and must be evaluated in more realistic settings before practical use.
Future work will expand the label set, which currently covers only synergism and antagonism, and apply the model to other biomedical text sources to assess how well it generalizes beyond this setup.