LLMs and AI Agents in Biomedical and Health Sciences

A special issue of AI (ISSN 2673-2688). This special issue belongs to the section "Medical & Healthcare AI".

Deadline for manuscript submissions: 22 December 2026 | Viewed by 1984

Editors


Guest Editor
Division of Nephrology, Hypertension & Renal Transplantation, Department of Medicine (Quantitative Health), Intelligent Clinical Care Center (IC3), University of Florida College of Medicine, Gainesville, FL 32610, USA
Interests: machine learning; AI; LLM; multimodal representation learning; AR/VR/XR; digital twin; HCI; clinical decision making; precision medicine
School of Information Sciences, University of Tennessee, Knoxville, TN 37996, USA
Interests: multimodal machine learning; large language model–based methods; genomic, environmental and clinical data; clinical decision support; precision medicine; population health

Special Issue Information

Dear Colleagues,

Recent advances in large language models (LLMs) and AI agents are transforming the biomedical and health sciences by enabling systems that can reason, interact, and act across complex, data-rich environments. Unlike earlier task-specific machine learning models, LLMs and agentic AI systems can integrate heterogeneous biomedical data, perform multi-step reasoning, collaborate with humans, and dynamically adapt to evolving clinical and research contexts. These capabilities present unprecedented opportunities for advancing biomedical discovery, clinical decision support, digital health, education, and healthcare operations, while also raising critical challenges related to safety, reliability, interpretability, ethics, and governance. 

This Special Issue focuses on the design, evaluation, and application of LLMs and AI agents in biomedical and health sciences, emphasizing systems that move beyond static prediction toward interactive, goal-directed, and context-aware intelligence. The scope includes foundational methods, system architectures, and real-world deployments spanning clinical care, biomedical research, public health, and health system operations. Topics of interest include LLM-based clinical and scientific reasoning, multimodal and agentic AI systems, retrieval-augmented generation (RAG), tool-using and autonomous agents, human–AI collaboration, and methods for ensuring robustness, transparency, and trustworthiness in high-stakes biomedical settings. 

The purpose of this Special Issue is threefold. First, it aims to provide a dedicated venue for high-quality methodological and applied research on LLMs and AI agents tailored to biomedical and healthcare domains. Second, it seeks to establish best practices for evaluation, validation, and governance of these systems, particularly in safety-critical and ethically sensitive applications. Third, it encourages interdisciplinary contributions that bridge artificial intelligence, medicine, biomedical informatics, health systems engineering, and social sciences. 

This Special Issue complements and extends existing literature on medical AI and biomedical informatics by shifting the focus from isolated predictive models to agentic, interactive, and system-level AI. While prior work has largely emphasized supervised learning and narrow clinical tasks, this issue highlights emerging paradigms centered on reasoning, autonomy, multimodal integration, and continuous human–AI interaction. By bringing together theoretical advances, practical implementations, and real-world evaluations, this Special Issue aims to advance the responsible development and translation of LLMs and AI agents into biomedical research and healthcare practice.

Dr. Zhenhong Hu
Dr. Yingbo Ma
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-anonymized peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. AI is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • large language models (LLMs)
  • agents
  • biomedical AI
  • healthcare AI
  • clinical decision support
  • multimodal AI
  • retrieval-augmented generation (RAG)
  • agentic AI systems
  • human–AI collaboration
  • trustworthy and explainable AI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)


Research


30 pages, 1280 KB  
Article
FHIR-RAG-MEDS: Integrating HL7 FHIR with Retrieval-Augmented Large Language Models for Enhanced Medical Decision Support
by Yildiray Kabak, Gokce B. Laleci Erturkmen, Mert Gencturk, Tuncay Namli, A. Anil Sinaci, Ruben Alcantud Corcoles, Cristina Gómez Ballesteros, Pedro Abizanda, Volkan Atmis and Asuman Dogac
AI 2026, 7(7), 246; https://doi.org/10.3390/ai7070246 - 2 Jul 2026
Abstract
Background: Evidence-based clinical guidelines are essential for high-quality care, yet translating them into personalized clinical decision support remains resource-intensive and time-consuming. Large language models (LLMs) show promise for supporting clinical decision-making, but their limited access to patient-specific data and explicit guideline sources constrains trustworthiness, personalization, and clinical applicability. Retrieval-augmented generation (RAG) addresses part of this challenge by grounding model outputs in curated evidence sources; however, true personalization requires structured access to electronic health record data. Methods: This study presents FHIR-RAG-MEDS, a medical decision support system that integrates HL7 Fast Healthcare Interoperability Resources (FHIR) with an RAG-enhanced LLM to enable patient-specific, guideline-concordant clinical recommendations. Through SMART on FHIR, the system retrieves real-time patient data from FHIR servers, generates structured medical summaries, and incorporates this personalized context into the RAG pipeline, grounding responses in evidence-based clinical guidelines stored in a vector database. Results: FHIR-RAG-MEDS was evaluated using 139 physician-generated clinical questions covering dementia, chronic obstructive pulmonary disease, hypertension, and sarcopenia. Performance was assessed using automated metrics, RAG-specific evaluation frameworks, and independent expert physician review. The system consistently outperformed state-of-the-art medical LLMs, demonstrating higher semantic accuracy, improved faithfulness to guideline content, and stronger clinical relevance. Conclusions: Integrating HL7 FHIR with RAG-based LLMs enables trustworthy, personalized clinical decision support, bridging the gap between static language models and real-world, patient-centered care.
(This article belongs to the Special Issue LLMs and AI Agents in Biomedical and Health Sciences)
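
The abstract above describes a pipeline that summarizes patient data from FHIR resources and adds that summary to a retrieval-augmented prompt. The following is only a minimal sketch of that general idea, not the authors' implementation. The simplified bundle shape, the guideline snippets, the bag-of-words similarity (standing in for a vector database), and the helper names `summarize_patient`, `retrieve`, and `build_prompt` are all assumptions made for illustration. No FHIR server or LLM is called.

```python
import math
import re
from collections import Counter

# Hypothetical guideline snippets; a real system would hold embeddings in a vector database.
GUIDELINES = [
    "Hypertension: consider lifestyle changes and review blood pressure targets in older adults.",
    "COPD: assess inhaler technique and exacerbation history at each visit.",
    "Sarcopenia: recommend resistance exercise and adequate protein intake.",
]


def tokens(text):
    """Lowercased bag of alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize_patient(bundle):
    """Flatten a simplified list of FHIR-style resources (assumed shape) into one line."""
    parts = []
    for res in bundle:
        if res["resourceType"] == "Patient":
            parts.append(f"Patient born {res['birthDate']}, gender {res['gender']}")
        elif res["resourceType"] == "Condition":
            parts.append(f"condition: {res['code']['text']}")
    return "; ".join(parts)


def retrieve(query, docs, k=1):
    """Return the k snippets most similar to the query."""
    q = tokens(query)
    return sorted(docs, key=lambda d: cosine(q, tokens(d)), reverse=True)[:k]


def build_prompt(question, bundle, docs, k=1):
    """Combine patient context and retrieved guideline text into one prompt string."""
    summary = summarize_patient(bundle)
    evidence = retrieve(f"{question} {summary}", docs, k)
    return (
        f"Patient context: {summary}\n"
        f"Guideline evidence: {' '.join(evidence)}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    example_bundle = [
        {"resourceType": "Patient", "birthDate": "1940-05-01", "gender": "female"},
        {"resourceType": "Condition", "code": {"text": "Essential hypertension"}},
    ]
    print(build_prompt("What blood pressure target applies?", example_bundle, GUIDELINES))
```

The point of the sketch is the ordering of steps: the patient summary is built first and then becomes part of the retrieval query, so the evidence selected depends on the individual patient rather than on the question alone.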

36 pages, 11796 KB  
Article
Gemini-Augmented Digital Twin Framework for Biodegradable Mg-Based Implants: A Proof-of-Concept for Multi-Domain Design Integration
by Veronica Manescu (Paltanea), Iosif-Vasile Nemoianu, Gheorghe Paltanea, Iulian Antoniac, Aurora Antoniac, Alexandru Streza, Gabriel Cristescu, Costel Paun and Adrian-Vasile Dumitru
AI 2026, 7(6), 221; https://doi.org/10.3390/ai7060221 - 15 Jun 2026
Viewed by 570
Abstract
Background: Biodegradable implants manufactured from Mg-based alloys are among the most commonly used in orthopedics. However, their overall clinical acceptance is influenced by their fast corrosion speed and hydrogen emission. Based on an innovative manufacturing route previously described, this study introduces a preliminary proof-of-concept for a Gemini-assisted Digital Twin (Gemini-DT), which is an AI-augmented in silico framework designed to consider a MgF2 conversion coating on the implant surface and to model the synchronization of the degradation process with new bone formation. Methods: Based on the integration of experimental data for Mg-Nd and Mg-Zn alloys and by considering the implant geometry and coating formation, we developed, in collaborative work with LLM Gemini 1.5 Flash (Google), a four-module cognitive framework (surface thermodynamic synergy (Module 1), degradation analysis and alloy extract concentration management (Module 2), micro-channel fluidics and mechanical stability (Module 3), and bio-mechanical synchronization and regenerative evaluation (Module 4)) to evaluate simulated implant behaviors. Results: Using a 10,000-iteration Monte Carlo stability simulation, the model demonstrated a potential 12% reduction in false-negative design screening errors compared to rigid rule-based systems, achieving strong internal decision consistency in sustaining the mandated parametric compliance window. Computational verification supports the projected biocompatibility trends of Mg-Zn alloys, as previously demonstrated in our in vivo studies. Conclusions: Our research leads to a consistent computational architecture dedicated to Mg-based implants and offers a robust platform for virtual design and optimization. These observations suggest that the developed model can recover viable designs, whereas traditional linear models may reject them.
(This article belongs to the Special Issue LLMs and AI Agents in Biomedical and Health Sciences)

Review


22 pages, 3938 KB  
Review
Human Evaluation of Large Language Models: A Review and Protocol Selection Framework
by Tad T. Brunyé
AI 2026, 7(5), 174; https://doi.org/10.3390/ai7050174 - 19 May 2026
Viewed by 966
Abstract
Evaluating large language models (LLMs) critically depends on human judgment. This article reviews and develops a conceptual framework for human-centered LLM evaluation, synthesizing research across evaluation methodology, psychometrics, cognitive science, and domain-specific applications. Four primary challenges are identified that limit current human evaluation practice: imperfect gold standards, evaluator fatigue and overload, shared and unique bias structures across humans and LLM judges, and the routine omission of uncertainty and dispersion estimates. To address these gaps, the STEP-V design framework is proposed: Stakes, Task-type, Evaluator availability, Purpose, and Volume, for selecting human and/or automated LLM evaluation methods under real-world constraints. An evaluator failure mode taxonomy is also proposed that analyzes human and LLM judges within a common error framework, clarifying where hybrid pipelines can compensate for weaknesses and where they might compound them. The framework motivates a more rigorous science of LLM evaluation, one that treats human judgment as a necessary but fallible measurement requiring explicit design, calibration, and uncertainty quantification.
(This article belongs to the Special Issue LLMs and AI Agents in Biomedical and Health Sciences)
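
One of the challenges named in the abstract above is the omission of uncertainty and dispersion estimates when reporting human ratings. As a hedged illustration only, and not a method taken from the review, the sketch below reports a mean rating together with its standard deviation and a percentile bootstrap confidence interval. The ratings, the function name `bootstrap_ci`, and the choice of 2000 resamples are hypothetical.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    # Hypothetical 1-5 ratings given by eight human evaluators to one model response.
    ratings = [4, 5, 3, 4, 2, 5, 4, 3]
    mean = statistics.fmean(ratings)
    spread = statistics.stdev(ratings)  # sample standard deviation as a dispersion estimate
    low, high = bootstrap_ci(ratings)
    print(f"mean={mean:.2f}, sd={spread:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Reporting the interval and the spread alongside the mean makes it visible when a small evaluator pool cannot distinguish two systems, which a bare average would hide.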