Advances in Large Language Models for Biological and Medical Applications

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289). This special issue belongs to the section "Large Language Models and Embodied Intelligence".

Deadline for manuscript submissions: 30 April 2026

Special Issue Editors


Dr. Irene Li
Guest Editor
Information Technology Center, University of Tokyo, Tokyo, Japan
Interests: NLP; bioinformatics; LLMs

Dr. Ruihai Dong
Guest Editor
Computer Science Department, University College Dublin, Dublin, Ireland
Interests: LLMs; recommendation systems; machine learning

Special Issue Information

Dear Colleagues,

The rapid advancement of Large Language Models (LLMs) has revolutionized numerous fields, with biological and medical applications standing out as particularly transformative. LLMs, with their ability to process and generate human-like text, offer unprecedented opportunities for enhancing patient care, accelerating biomedical research, and improving healthcare management. Integrating LLMs into these domains not only augments the capabilities of healthcare professionals but also democratizes access to advanced medical knowledge, enabling more informed and timely interventions. This Special Issue, "Advances in Large Language Models for Biological and Medical Applications", highlights cutting-edge research that bridges the gap between artificial intelligence and healthcare, underscoring the importance of developing reliable, interpretable, and ethically sound AI-driven solutions to the complex challenges of modern medicine and biology.

This Special Issue seeks to showcase research that leverages LLMs to advance the biological and medical fields, with particular emphasis on transparency and factual accuracy in generative AI, so that LLM-driven solutions are reliable and trustworthy for clinical and biomedical use. Research exploring the application of LLMs to under-represented languages, and efforts to bridge health disparities, is highly encouraged. By fostering a comprehensive and inclusive dialogue, this Special Issue aims to advance the integration of LLMs into biological and medical research, ultimately contributing to improved healthcare outcomes globally. We encourage researchers from diverse backgrounds and disciplines to submit their innovative work, driving forward the frontiers of AI-assisted biomedical and clinical applications.

In this Special Issue, original research articles and reviews are welcome. Research areas may include (but are not limited to) the following:

  • Infrastructure and Pre-trained Language Models for Biomedical NLP;
  • Processing and Annotation Platforms;
  • Synthetic Data Generation and Data Augmentation;
  • Translating NLP Research into Clinical Practice;
  • Applications and Methods for Low-Resource Languages;
  • Medical or Clinical Knowledge Graphs;
  • Achieving Reproducible Results;
  • Clinical Decision Support Systems;
  • Medical Information Retrieval and Mining;
  • Electronic Health Records (EHR) Analysis;
  • Privacy and Security in Biomedical LLM Applications;
  • Multimodal Large Language Models in Healthcare;
  • Explainability and Interpretability of Biomedical LLMs;
  • Ethical Considerations in Medical LLM Deployment;
  • Predictive Analytics in Healthcare Using LLMs;
  • Personalized Medicine and Large Language Models;
  • Integration of LLMs with Existing Healthcare Systems;
  • Natural Language Understanding for Biomedical Literature;
  • Real-time Data Processing and LLMs in Emergency Medicine;
  • Text Simplification;
  • Question Answering;
  • System Testing and Evaluation Strategies.

We look forward to receiving your contributions.

Dr. Irene Li
Dr. Ruihai Dong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, use the online submission form to submit your manuscript. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for manuscript submission are available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • LLMs
  • NLP
  • bioinformatics
  • EHRs
  • medical text processing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (3 papers)


Research

19 pages, 1065 KB  
Article
Fine-Tuning LLaMA2 for Summarizing Discharge Notes: Evaluating the Role of Highlighted Information
by Mahshad Koohi Habibi Dehkordi, Yehoshua Perl, Fadi P. Deek and Hao Liu
Big Data Cogn. Comput. 2026, 10(1), 4; https://doi.org/10.3390/bdcc10010004 - 22 Dec 2025
Cited by 2
Abstract
This study investigates whether incorporating highlighted information in discharge notes improves the quality of summaries generated by Large Language Models (LLMs). Specifically, it evaluates the effect of using highlighted versus unhighlighted inputs when fine-tuning the LLaMA2-13B model for summarization. We fine-tuned LLaMA2-13B in two variants using the MIMIC-IV-Ext-BHC dataset: one variant on highlighted discharge notes (H-LLaMA) and the other on the same notes without highlighting (U-LLaMA). Highlighting was performed automatically using a Cardiology Interface Terminology (CIT) presented in our previous work. H-LLaMA and U-LLaMA were evaluated on a randomly selected test set of 100 discharge notes using multiple metrics (including BERTScore, ROUGE-L, BLEU, and SummaC_CONV). Additionally, LLM-based judgment via ChatGPT-4o rated coherence, fluency, conciseness, and correctness, alongside a manual completeness evaluation on a random sample of 40 notes. H-LLaMA consistently outperformed U-LLaMA across all metrics. H-summaries (generated by H-LLaMA) achieved higher BERTScore (63.75 vs. 59.61), ROUGE-L (23.43 vs. 21.82), BLEU (10.4 vs. 8.41), and SummaC_CONV (67.7 vs. 40.2) than U-summaries (generated by U-LLaMA). Manual review also showed improved completeness for H-summaries (54.8% vs. 47.6%). All improvements were statistically significant (p < 0.05). Moreover, LLM-based evaluation indicated higher average ratings for coherence, correctness, and conciseness.
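For readers unfamiliar with the summarization metrics reported above, the following is a minimal sketch of ROUGE-L, the LCS-based F-measure used (among others) to compare H-LLaMA and U-LLaMA outputs against reference summaries. This token-level implementation and its function names are illustrative only; published evaluations use a standard scorer with stemming and careful tokenization.

```python
# Illustrative ROUGE-L: F-measure over the longest common subsequence (LCS)
# of reference and candidate token sequences.

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference: str, candidate: str) -> float:
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand)   # fraction of candidate tokens in the LCS
    recall = lcs / len(ref)       # fraction of reference tokens in the LCS
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A higher ROUGE-L thus rewards candidate summaries that preserve long in-order stretches of the reference wording, which is why it complements embedding-based scores such as BERTScore.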

21 pages, 5329 KB  
Article
CURE: Confidence-Driven Unified Reasoning Ensemble Framework for Medical Question Answering
by Ziad Elshaer and Essam A. Rashed
Big Data Cogn. Comput. 2025, 9(12), 299; https://doi.org/10.3390/bdcc9120299 - 23 Nov 2025
Abstract
High-performing medical Large Language Models (LLMs) typically require extensive fine-tuning with substantial computational resources, limiting accessibility for resource-constrained healthcare institutions. This study introduces a confidence-driven multi-model framework that leverages model diversity to enhance medical question answering without fine-tuning. Our framework employs a two-stage architecture: a confidence detection module assesses the primary model’s certainty, and an adaptive routing mechanism directs low-confidence queries to helper models with complementary knowledge for collaborative reasoning. We evaluate our approach using Qwen3-30B-A3B-Instruct, Phi-4 14B, and Gemma 2 12B across three medical benchmarks: MedQA, MedMCQA, and PubMedQA. Results demonstrate that our framework achieves competitive performance, with particularly strong results on PubMedQA (0.95) and MedMCQA (0.78). Ablation studies confirm that confidence-aware routing combined with multi-model collaboration substantially outperforms single-model approaches and uniform reasoning strategies. This work establishes that strategic model collaboration offers a practical, computationally efficient pathway to improving medical AI systems, with significant implications for democratizing access to advanced medical AI in resource-limited settings.
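The two-stage idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the CURE implementation: model calls are stubbed as plain functions returning an (answer, confidence) pair, the threshold value is invented, and the paper's actual confidence detection and collaborative-reasoning logic are more involved than a majority vote.

```python
# Sketch of confidence-driven routing: accept the primary model's answer when
# it is confident; otherwise gather helper-model answers and take a majority vote.
from collections import Counter
from typing import Callable, List, Tuple

Model = Callable[[str], Tuple[str, float]]  # question -> (answer, confidence)

def route(question: str, primary: Model, helpers: List[Model],
          threshold: float = 0.7) -> str:
    answer, confidence = primary(question)
    if confidence >= threshold:
        return answer  # high confidence: accept the primary answer directly
    # Low confidence: include the primary answer and all helper answers, then vote
    votes = [answer] + [helper(question)[0] for helper in helpers]
    return Counter(votes).most_common(1)[0][0]
```

The design keeps the expensive collaboration path off the common case: only queries the primary model is unsure about pay the cost of consulting additional models.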

40 pages, 2077 KB  
Article
Robust Clinical Querying with Local LLMs: Lexical Challenges in NL2SQL and Retrieval-Augmented QA on EHRs
by Luka Blašković, Nikola Tanković, Ivan Lorencin and Sandi Baressi Šegota
Big Data Cogn. Comput. 2025, 9(10), 256; https://doi.org/10.3390/bdcc9100256 - 11 Oct 2025
Abstract
Electronic health records (EHRs) are typically stored in relational databases, making them difficult to query for nontechnical users, especially under privacy constraints. We evaluate two practical clinical NLP workflows, natural language to SQL (NL2SQL) for EHR querying and retrieval-augmented generation for clinical question answering (RAG-QA), with a focus on privacy-preserving deployment. We benchmark nine large language models, spanning open-weight options (DeepSeek V3/V3.1, Llama-3.3-70B, Qwen2.5-32B, Mixtral-8x22B, BioMistral-7B, and GPT-OSS-20B) and proprietary APIs (GPT-4o and GPT-5), chosen to represent a diverse cross-section of sparse MoE, dense general-purpose, domain-adapted, and proprietary LLMs. On MIMICSQL (27,000 generations; nine models × three runs), the best NL2SQL execution accuracy (EX) is 66.1% (GPT-4o), followed by 64.6% (GPT-5). Among open-weight models, DeepSeek V3.1 reaches 59.8% EX and DeepSeek V3 58.8%, with Llama-3.3-70B at 54.5% and BioMistral-7B at only 11.8%, underscoring a persistent gap relative to general-domain benchmarks. We introduce SQL-EC, a deterministic SQL error-classification framework with adjudication, revealing string mismatches as the dominant failure (86.3%), followed by query-join misinterpretations (49.7%), while incorrect aggregation-function usage accounts for only 6.7%. This highlights lexical/ontology grounding as the key bottleneck for NL2SQL in the biomedical domain. For RAG-QA, evaluated on 100 synthetic patient records across 20 questions (54,000 reference–generation pairs; three runs), BLEU and ROUGE-L fluctuate more strongly across models, whereas BERTScore remains high for most, with DeepSeek V3.1 and GPT-4o among the top performers; pairwise t-tests confirm significant differences among the LLMs. Cost–performance analysis based on measured token usage shows per-query costs ranging from USD 0.000285 (GPT-OSS-20B) to USD 0.005918 (GPT-4o); DeepSeek V3.1 offers the best open-weight cost–accuracy trade-off, and GPT-5 provides a balanced API alternative. Overall, privacy-conscious RAG-QA attains strong semantic fidelity, whereas clinical NL2SQL remains brittle under lexical variation. SQL-EC pinpoints actionable failure modes, motivating ontology-aware normalization and schema-linked prompting for robust clinical querying.
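Execution accuracy (EX), the headline NL2SQL metric above, counts a predicted query as correct when it returns the same result set as the gold query on the same database. A minimal sketch using Python's standard sqlite3 module follows; the table, column names, and queries are invented for illustration (MIMICSQL uses a far larger clinical schema), and the second example pair shows the string-mismatch failure mode the abstract identifies as dominant.

```python
# Sketch of execution accuracy (EX): run gold and predicted SQL on the same
# database and compare result sets, order-insensitively.
import sqlite3

def execution_match(db: sqlite3.Connection, gold_sql: str, pred_sql: str) -> bool:
    try:
        gold = db.execute(gold_sql).fetchall()
        pred = db.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False  # a non-executable prediction counts as wrong
    return sorted(gold) == sorted(pred)

def execution_accuracy(db: sqlite3.Connection, pairs) -> float:
    """Fraction of (gold, predicted) SQL pairs whose results match."""
    hits = sum(execution_match(db, gold, pred) for gold, pred in pairs)
    return hits / len(pairs)
```

Note that a prediction filtering on 'aspirin' against a column storing 'Aspirin' executes without error yet returns a different result set, which is exactly why the paper argues for ontology-aware normalization of literals.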
