
Evaluating Google Gemini’s Capability to Generate NBME-Standard Pharmacology Questions Using a 16-Criterion NBME Rubric

Department of Foundational Medical Studies, Oakland University William Beaumont School of Medicine, Rochester, MI 48309, USA
Algorithms 2025, 18(10), 612; https://doi.org/10.3390/a18100612
Submission received: 10 August 2025 / Revised: 19 September 2025 / Accepted: 25 September 2025 / Published: 29 September 2025
(This article belongs to the Special Issue Algorithms for Computer Aided Diagnosis: 2nd Edition)

Abstract

Background: Large language models (LLMs) such as Google Gemini have demonstrated strong capabilities in natural language generation, but their ability to create medical assessment items aligned with National Board of Medical Examiners (NBME) standards remains underexplored. Objective: This study evaluated the quality of Gemini-generated NBME-style pharmacology questions using a structured rubric to assess accuracy, clarity, and alignment with examination standards. Methods: Ten pharmacology questions were generated using a standardized prompt and assessed independently by two pharmacology experts. Each item was evaluated using a 16-criterion NBME rubric with binary scoring. Inter-rater reliability was calculated (Cohen’s Kappa = 0.81) following a calibration session. Results: On average, questions met 14.3 of 16 criteria. Strengths included logical structure, appropriate distractors, and clinically relevant framing. Limitations included occasional pseudo-vignettes, cueing issues, and one instance of factual inaccuracy (albuterol mechanism of action). The evaluation highlighted Gemini’s ability to produce high-quality NBME-style questions, while underscoring concerns regarding sample size, reproducibility, and factual reliability. Conclusions: Gemini shows promise as a tool for generating pharmacology assessment items, but its probabilistic outputs, factual inaccuracies, and limited scope necessitate caution. Larger-scale studies, inclusion of multiple medical disciplines, incorporation of student performance data, and use of broader expert panels are recommended to establish reliability and educational applicability.
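The abstract reports inter-rater reliability as Cohen's Kappa = 0.81 for two experts applying binary (met / not met) scoring across the 16 rubric criteria. As a minimal sketch of how that statistic is computed, the following Python function derives kappa from observed agreement and chance agreement; the per-criterion ratings shown are hypothetical illustrations, not the study's actual data.

```python
# Sketch of Cohen's kappa for two raters' binary rubric judgments.
# The example ratings below are hypothetical, not the study's data.

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length binary rating sequences."""
    n = len(rater_a)
    assert n == len(rater_b) and n > 0
    # Observed agreement: fraction of items both raters scored identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of each rater's marginal rates for 1s and 0s.
    p1_a = sum(rater_a) / n
    p1_b = sum(rater_b) / n
    p_chance = p1_a * p1_b + (1 - p1_a) * (1 - p1_b)
    # Kappa: agreement beyond chance, normalized by its maximum possible value.
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical per-criterion scores (1 = criterion met) for one question.
rater_a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
rater_b = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1]
print(round(cohen_kappa(rater_a, rater_b), 3))  # → 0.375
```

In the study itself, kappa would be computed over all rated items (10 questions × 16 criteria per rater), where a value of 0.81 indicates strong agreement beyond chance.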
Keywords: google gemini; NBME-style questions; pharmacology education; artificial intelligence in medical education; prompt engineering; clinical vignettes; USMLE preparation

Share and Cite

MDPI and ACS Style

Almasri, W.; Saad, M.; Mohiyeddini, C. Evaluating Google Gemini’s Capability to Generate NBME-Standard Pharmacology Questions Using a 16-Criterion NBME Rubric. Algorithms 2025, 18, 612. https://doi.org/10.3390/a18100612

