Large Language Models in Medical Diagnostics: Advancing Clinical Practice, Research, and Patient Care

A special issue of Diagnostics (ISSN 2075-4418). This special issue belongs to the section "Machine Learning and Artificial Intelligence in Diagnostics".

Deadline for manuscript submissions: 31 July 2026 | Viewed by 2218

Special Issue Editor

Special Issue Information

Dear Colleagues, 

Large Language Models (LLMs) represent a breakthrough in artificial intelligence with significant potential to support clinical diagnostics and workflows. Their capacity to process, interpret, and generate natural language—particularly from unstructured clinical narratives, laboratory reports, and imaging summaries—enables novel applications in diagnostic reasoning, patient communication, workflow automation, and research that can improve patient health.

While promising, the integration of LLMs into diagnostic medicine remains at an early stage, constrained by challenges related to real-world performance, interpretability, bias mitigation, and regulatory compliance. There is a critical need for a dedicated forum to examine how LLMs can enhance diagnostic accuracy, efficiency, and accessibility while addressing inherent risks such as algorithmic errors, data privacy concerns, and model generalizability.

This Special Issue of Diagnostics will focus specifically on the role of LLMs in supporting, refining, and accelerating diagnostic processes across medical specialties. We seek contributions that demonstrate empirical advances, validate clinical utility, and engage with the practical and ethical dimensions of implementing LLM-based tools in diagnostic settings.

Potential Topics for the Special Issue

  1. Diagnostic Decision Support
    • LLMs for differential diagnosis generation and case-based reasoning.
    • Integration of LLMs with clinical data (EHRs, lab results, imaging reports) to support diagnostic accuracy.
    • Comparative studies between LLM-assisted diagnostics and existing clinical decision support systems.
  2. Interpretation and Reporting of Diagnostic Data
    • Automated generation of structured reports from radiology, pathology, and cardiology interpretations.
    • Summarization and translation of complex diagnostic information into clinician-friendly formats.
    • Extraction of structured findings from free-text clinical notes for diagnostic validation.
  3. Patient-Centered Diagnostic Applications
    • LLM-enabled tools for explaining diagnostic results to patients in plain language.
    • Enhancing patient understanding of imaging, laboratory, and pathology reports.
    • Evaluating patient engagement and comprehension when interacting with LLM-generated diagnostic explanations.
  4. Workflow Efficiency in Diagnostic Medicine
    • Reducing documentation burden through automated note-taking, coding, and preliminary summarization.
    • Applications in prior authorization, referral coordination, and test result communication.
    • Operational impacts of LLMs on diagnostic throughput and turnaround time.
  5. Research and Data Curation for Diagnostics
    • LLM-assisted systematic reviews and meta-analyses related to diagnostic methods.
    • Mining biomedical literature for diagnostic biomarker discovery or test validation.
    • Synthesizing evidence to support diagnostic guideline development.
  6. Validation, Ethics, and Implementation
    • Addressing biases, inequities, and fairness in diagnostic LLM applications.
    • Explainable AI approaches that ensure diagnostic model interpretability.
    • Regulatory and privacy considerations (e.g., HIPAA, GDPR) when deploying LLMs in clinical diagnostics.

Impact and Significance

This Special Issue will provide a timely and focused platform for presenting cutting-edge research on the application of LLMs in medical diagnostics. It aims to bridge the gap between computational innovation and clinical practice by highlighting studies that emphasize validation, usability, and integration into real-world diagnostic pathways. By convening research from AI experts, clinical diagnosticians, laboratory scientists, and regulatory scholars, the Special Issue will contribute to establishing best practices and guiding the responsible adoption of LLMs in diagnostic medicine.

Prof. Dr. Tim Duong
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and completing the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Diagnostics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • large language models
  • diagnostic
  • prognosis
  • decision support
  • clinical practice

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (3 papers)

Research

13 pages, 2440 KB  
Article
Evaluating GPT-5 for Melanoma Detection Using Dermoscopic Images
by Qingguo Wang, Ihunna Amugo, Harshana Rajakaruna, Maria Johnson Irudayam, Hua Xie, Anil Shanker and Samuel E. Adunyah
Diagnostics 2025, 15(23), 3052; https://doi.org/10.3390/diagnostics15233052 - 29 Nov 2025
Viewed by 505
Abstract
Background: Melanoma is the deadliest form of skin cancer, for which early detection is crucial and can lead to positive survival outcomes. Advances in AI, particularly large language models (LLMs) such as GPT-5, present promising opportunities to support melanoma early detection, but their performance in this domain has not been systematically assessed. Objectives: Assess GPT-5’s diagnostic performance on dermoscopic images. Methods: GPT-5 was evaluated on two public benchmark datasets: the ISIC Archive and HAM10K, using 100 and 500 randomly selected dermoscopic images, respectively. Via the OpenAI Application Programming Interface (API), GPT-5 was prompted to perform three tasks: (1) top-1 or primary diagnosis, (2) top-3 differential diagnoses, and (3) malignancy discrimination (melanoma vs. benign). Model outputs were compared with histopathology-verified ground truth, and performance was measured by sensitivity, specificity, accuracy, F1 score, and other metrics. Results: GPT-5 achieved modest accuracy in top-1 or primary diagnosis but markedly improved performance in top-3 differential diagnoses, with sensitivity > 93%, specificity > 86%, accuracy ≥ 92%, and F1 score > 91%. For malignancy discrimination, GPT-5 showed more balanced sensitivity and specificity than GPT-4-based models (GPT-4V, GPT-4T, and GPT-4o), resulting in more reliable classification overall. Conclusions: GPT-5 outperformed GPT-4 and its derivatives, particularly in differential diagnosis, highlighting its potential for clinical decision support and medical education. However, GPT-5 also showed a tendency to misclassify melanoma as benign, underscoring the need for cautious clinical interpretation and refinement. Full article
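The abstract above reports sensitivity, specificity, accuracy, and F1 score for GPT-5's melanoma-vs-benign discrimination. As an illustrative sketch (not the authors' code, and using made-up toy labels rather than the study's data), these metrics can be computed from predicted and histopathology-verified labels as follows:

```python
# Illustrative metric computation for a binary melanoma-vs-benign task.
# Labels here are invented for demonstration; the study used ISIC/HAM10K images.

def binary_metrics(y_true, y_pred, positive="melanoma"):
    """Return sensitivity, specificity, accuracy, and F1 score."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return sensitivity, specificity, accuracy, f1

# Toy example with invented labels:
truth = ["melanoma", "melanoma", "benign", "benign", "benign"]
pred  = ["melanoma", "benign",   "benign", "benign", "melanoma"]
sens, spec, acc, f1 = binary_metrics(truth, pred)
```

The same confusion-matrix counts underlie the top-1 and top-3 evaluations; for top-3 differential diagnosis, a prediction would count as correct if the ground-truth label appears anywhere in the model's three candidates.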

16 pages, 2243 KB  
Article
Evaluating Large Language Models in Interpreting MRI Reports and Recommending Treatment for Vestibular Schwannoma
by Arthur H. A. Sales, Christine Julia Gizaw, Jürgen Beck and Jürgen Grauvogel
Diagnostics 2025, 15(22), 2841; https://doi.org/10.3390/diagnostics15222841 - 10 Nov 2025
Viewed by 770
Abstract
Background/Objectives: The use of large language models (LLMs) by patients seeking information about their diagnosis and treatment is rapidly increasing. While their application in healthcare is still under scientific investigation, the demand for these models is expected to grow significantly in the coming years. This study evaluates and compares the diagnostic accuracy and treatment recommendations of three publicly available AI tools—GPT-4, Gemini, and Bing—in interpreting MRI reports and suggesting treatments for patients with vestibular schwannomas (VS), in light of the growing use of these tools by patients seeking medical information. Methods: This retrospective study included 35 consecutive patients with VS treated at a university-based neurosurgery department. Anonymized MRI reports in German were translated to English, and the AI tools were given five standardized prompts requesting diagnoses and treatment recommendations. Diagnostic accuracy, differential diagnoses, and treatment recommendations were assessed and compared. Results: Thirty-five patients (mean age, 57 years ± 13; 18 men) were included. GPT-4 achieved the highest diagnostic accuracy for VS at 97.14% (34/35), followed by Gemini at 88.57% (31/35) and Bing at 85.71% (30/35). GPT-4 provided the most accurate treatment recommendations (57.1%, 20/35), compared to Gemini (45.7%, 16/35) and Bing (31.4%, 11/35). GPT-4 correctly recommended surgery in 60% of cases (21/35), compared to 51.4% for Bing (18/35) and 45.7% for Gemini (16/35). The difference between GPT-4 and Bing was statistically significant (p = 0.02). Conclusions: GPT-4 outperformed Gemini and Bing in interpreting MRI reports and providing treatment recommendations for VS. Although the AI tools demonstrated good diagnostic accuracy, their treatment recommendations were less precise than those made by an interdisciplinary tumor board. This study highlights the growing role of AI tools in patient-driven healthcare inquiries. Full article
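The abstract reports a statistically significant difference between GPT-4 and Bing (p = 0.02) but does not name the statistical test. For paired per-patient correctness judgments like these, an exact McNemar test is one common choice; the sketch below shows how such a p-value can be computed from discordant-pair counts. The counts used in the example are invented, not taken from the study.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant pair counts:
    b = cases model A got right and model B got wrong,
    c = cases model A got wrong and model B got right.
    Under the null, discordant pairs split 50/50 (binomial, p = 0.5)."""
    n = b + c
    k = min(b, c)
    one_sided = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

# Invented example: 8 patients where only model A was correct,
# 1 patient where only model B was correct.
p_value = mcnemar_exact(8, 1)
```

With balanced discordant counts (e.g., `mcnemar_exact(5, 5)`), the test correctly returns p = 1.0, reflecting no evidence of a difference between the paired models.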

13 pages, 1102 KB  
Article
From Prompts to Practice: Evaluating ChatGPT, Gemini, and Grok Against Plastic Surgeons in Local Flap Decision-Making
by Gianluca Marcaccini, Luca Corradini, Omar Shadid, Ishith Seth, Warren M. Rozen, Luca Grimaldi and Roberto Cuomo
Diagnostics 2025, 15(20), 2646; https://doi.org/10.3390/diagnostics15202646 - 20 Oct 2025
Cited by 1 | Viewed by 713
Abstract
Background: Local flaps are a cornerstone of reconstructive plastic surgery for oncological skin defects, ensuring functional recovery and aesthetic integration. Their selection, however, varies with surgeon experience. Generative artificial intelligence has emerged as a potential decision-support tool, although its clinical role remains uncertain. Methods: We evaluated three generative AI platforms (ChatGPT-5 by OpenAI, Grok by xAI, and Gemini by Google DeepMind) in their free-access versions available in September 2025. Ten preoperative photographs of suspected cutaneous neoplastic lesions from diverse facial and limb sites were submitted to each platform in a two-step task: concise description of site, size, and tissue involvement, followed by the single most suitable local flap for reconstruction. Outputs were compared with the unanimous consensus of experienced plastic surgeons. Results: Performance differed across models. ChatGPT-5 consistently described lesion size accurately and achieved complete concordance with surgeons in flap selection. Grok showed intermediate performance, tending to recognise tissue planes better than lesion size and proposing flaps that were often acceptable but not always the preferred choice. Gemini estimated size well, yet was inconsistent for anatomical site, tissue involvement, and flap recommendation. When partially correct answers were considered acceptable, differences narrowed but the overall ranking remained unchanged. Conclusion: Generative AI can support reconstructive reasoning from clinical images with variable reliability. In this series, ChatGPT-5 was the most dependable for local flap planning, suggesting a potential role in education and preliminary decision-making. Larger studies using standardised image acquisition and explicit uncertainty reporting are needed to confirm clinical applicability and safety. Full article
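The abstract above notes that "when partially correct answers were considered acceptable, differences narrowed but the overall ranking remained unchanged." A hypothetical scoring sketch (not the authors' protocol; the judgments below are invented) shows one way such strict and lenient concordance rates against a surgeon consensus could be tabulated:

```python
# Hypothetical concordance scoring: exact matches only vs. half credit
# for "acceptable but not preferred" flap recommendations.

def concordance(judgments, strict=True):
    """judgments: list of 'correct' | 'partial' | 'wrong' per case."""
    score = 0.0
    for j in judgments:
        if j == "correct":
            score += 1.0
        elif j == "partial" and not strict:
            score += 0.5  # partial credit only in the lenient scheme
    return score / len(judgments)

# Invented per-case judgments for one model against surgeon consensus:
judgments = ["correct", "partial", "wrong", "correct", "partial"]
strict_acc  = concordance(judgments, strict=True)   # exact matches only
lenient_acc = concordance(judgments, strict=False)  # partial credit allowed
```

Applying both schemes to each model, as the study describes, lets one check whether the ranking of models is robust to how partially correct answers are credited.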