1. Introduction
The acquisition of proficient history-taking skills is a cornerstone of clinical education, forming the foundation for critical reasoning, accurate diagnosis, and effective patient care [1]. These skills are crucial for effective patient communication and clinical decision-making [2].
Traditionally, clinical history-taking skills were acquired through direct student–patient interactions. Medical students were granted access to hospital wards, where they engaged with patients to collect medical histories. Typically, students were expected to summarize and report their findings to a clinical tutor afterward, who would provide feedback and guidance, an approach commonly referred to as conventional bedside clinical teaching [3].
Other traditional instructional formats, such as practicing with standardized patients, using simulators, or engaging in peer role-playing, remain essential for developing history-taking skills. However, these methods often face significant challenges, including limited patient diversity and availability, time constraints, and a lack of consistent opportunities for immediate feedback. Such limitations can impede the structured and continuous development of students’ clinical competencies [4].
Technology-enhanced learning tools have revolutionized medical history-taking education, offering innovative solutions that address traditional limitations while providing scalable, accessible, and personalized learning experiences. These tools range from 360-degree virtual reality films, virtual patient simulations, and mobile applications to artificial intelligence-powered chatbots, each designed to enhance student competence and confidence in this fundamental clinical skill [5].
Advancements in artificial intelligence (AI) and natural language processing (NLP) have offered a promising solution to these pedagogical challenges. AI-powered chatbots, acting as virtual patients, can generate unlimited, discipline-specific clinical scenarios on demand, enabling fully interactive, dynamic training sessions and immersive, scalable learning experiences [6]. To allow more student–patient interactions without increasing costs, staff workload, or the burden on patients, virtual simulated patients have emerged as an adjunctive approach [6,7,8].
Chatbot-based virtual patient simulators are emerging as effective tools for enhancing history-taking skills. Simulating patient encounters provides medical students with a safe and controlled environment to practice history-taking skills, free from the constraints of clinical settings [9]. Studies have demonstrated that these tools offer comparable efficacy to traditional bedside teaching while providing unique advantages in accessibility and scenario diversity [3].
Recent empirical studies consistently support the feasibility and effectiveness of AI-powered chatbots in history-taking training. For instance, simulators driven by GPT-4 have demonstrated the ability to facilitate realistic and adaptive questioning, leading to measurable improvements in students’ self-reported competence after only a few sessions. Foundational research further confirms that large language model (LLM)-based virtual patients can generate medically plausible and contextually appropriate responses, effectively simulating authentic clinical encounters and enhancing the learning experience [10].
This potential is amplified when such platforms integrate automated, rubric-based feedback grounded in a custom knowledge base, which has been shown to produce measurable gains in history completeness and diagnostic reasoning [10]. Moreover, the integration of real-time feedback mechanisms allows students to identify areas for improvement and refine their techniques [11]. This approach can enhance learning efficiency and foster confidence and competence in clinical communication [12].
During internal medicine rotations, where students are exposed to a wide range of clinical presentations, the ability to take a thorough and accurate history is paramount. A chatbot-based virtual simulator tailored to this context could serve as a valuable adjunct to traditional teaching methods, bridging the gap between theoretical knowledge and practical application [10]. By customizing a chatbot’s knowledge base to incorporate diverse clinical scenarios, adaptive questioning techniques, and evidence-based feedback, such a tool has the potential to revolutionize how history-taking skills are taught and assessed.
A key advantage repeatedly highlighted in the literature is the creation of a safe and accessible learning environment, which students value for the opportunity to engage in repetitive practice without the pressure of being evaluated by peers or supervisors [13]. Beyond establishing feasibility, recent work has demonstrated the tangible impact of these tools on skill acquisition, particularly when they incorporate automated feedback. While the existing literature establishes that AI chatbots are a feasible and promising tool, most studies have focused on either technical validity or self-reported outcomes.
While existing studies have demonstrated the potential of chatbots with customized knowledge bases to enhance clinical training [14], most have examined either performance outcomes or user perceptions in isolation. Evidence from nursing education further supports the educational value of chatbot patients, suggesting promising cross-disciplinary generalizability [13].
A clear gap remains in providing a comprehensive evaluation that triangulates objective data, such as OSCE scores, with rich qualitative insights into user experience. Such an integrated approach is crucial not only to determine whether the intervention is effective, but also to understand how and why it supports learning.
This study aims to address these gaps by evaluating a custom-built chatbot virtual patient simulator, tailored to the internal medicine rotation of medical students in this regional context. Building on the established potential of AI simulation to provide a safe, flexible platform for repeated practice [15], this research offers a much-needed holistic appraisal of chatbot-based history-taking training.
This study hypothesized that medical students who practice with a chatbot simulator built on a customized knowledge base would achieve significantly higher scores on a history-taking OSCE post-intervention compared to their pre-intervention scores and to students trained with traditional methods. This mixed-methods study examines the effectiveness of the chatbot with a custom knowledge base in improving the history-taking skills of medical students and explores the students’ experiences and perceptions of the chatbot as a learning tool.
2. Materials and Methods
2.1. Study Design and Setting
This study employed a mixed-methods, quasi-experimental, non-equivalent control group design to evaluate the educational intervention. A non-randomized, convenience, consecutive sampling technique was used. The study was conducted at Ibn Sina National College for Medical Studies, Jeddah, Saudi Arabia. The target population consisted of all fifth-year medical students (N = 313) enrolled in the internal medicine rotation during the second semester of the academic year 2024–2025.
The design integrated quantitative performance data (OSCE scores) with qualitative insights from survey responses and focus groups to rigorously assess the impact of the chatbot on student performance.
2.2. Sample/Participants
The sample size was determined using G*Power 3.1, with a minimum of 50 students per group (power = 0.95, effect size = 0.5).
Experimental Group: 157 students who completed the internal medicine rotation with the AI chatbot intervention during an 18-week semester.
Control Group: 156 students from the preceding semester who underwent the traditional rotation without access to the chatbot.
To ensure comparability and mitigate potential confounding variables, both groups were taught by the same faculty members using an identical curriculum and learning objectives. Baseline academic performance was matched between the two groups, as the college policy ensures equal distribution of GPAs when allocating students to rotations.
The intervention was rolled out in a structured sequence. First, all students in the experimental group participated in an initial training session on the principles of effective history-taking as part of their standard course curriculum. Following this, an onboarding session was conducted where all students were given a link to the first virtual patient. This session served as a hands-on tutorial to familiarize students with the chat interface and the process of interacting with the chatbot. Subsequently, links to the remaining three virtual patient scenarios were made available to the students for independent, self-paced practice at designated points throughout their rotation.
2.3. The AI Chatbot Intervention
The intervention introduced AI-powered virtual patients to train students on clinical history-taking skills. It integrated simulated patient encounters, automated formative feedback, and structured faculty debriefing.
2.3.1. The Virtual Patient Intervention
The virtual patient intervention was designed to simulate authentic clinical history-taking encounters using AI-powered chatbot technology. The intervention integrated structured clinical scenarios, standardized patient responses, automated formative feedback, and guided reflective learning to support the development of medical students’ history-taking skills during internal medicine rotations. The following subsections describe the technical platform, case development, prompt design, and feedback mechanisms used in the intervention.
Knowledge Base and Virtual Patient Scenarios
Four virtual patient cases were developed in alignment with the internal medicine curriculum and mapped to two clinical themes: (1) shortness of breath (bronchial asthma; congestive heart failure) and (2) lower-limb swelling (decompensated liver cirrhosis; diabetic kidney disease).
Table 1 summarizes the scenarios.
Case information supplied by the Department of Internal Medicine was organized into case-specific knowledge bases, which contained demographic data, chief complaint, history of present illness, and associated symptoms. Two professors of internal medicine independently reviewed all cases for realism, internal consistency, and alignment with OSCE evaluation criteria.
Screenshots from Patient No. 3 (Leila) exemplify the consistency and structure of chatbot case development (Figure 2).
Prompt Design Architecture
Each chatbot was governed by a detailed prompt architecture designed to emulate high-fidelity patient interactions. The instruction set included:
Role Fixation: The chatbot was instructed to remain strictly in character (e.g., “You are Khalid, 85-year-old male with nephrotic syndrome”) throughout the session.
Interaction Rules: The chatbot responded only to the specific questions asked by the student and avoided unsolicited or excessive elaboration. The prompt architecture explicitly instructed the chatbot not to disclose additional clinical details unless directly requested by the student. Responses were designed to remain concise, medically plausible, and strictly consistent with the assigned clinical scenario.
Embedded Scoring System: A 30-point rubric, aligned with OSCE domains, was embedded in the chatbot’s prompt logic. The chatbot silently assigned marks during the session based on the appropriateness and completeness of each student’s question.
Conditional Feedback Disclosure: The chatbot revealed the total score and qualitative feedback once the student requested evaluation, even if the diagnosis was incomplete. This was handled through structured internal scripts that generated strengths and areas for improvement based on student questioning patterns, without external grading. Examples of such automated feedback for both correct (Patient Ahmed, Figure 3) and incomplete (Patient Leila, Figure 4) sessions are included to demonstrate feedback granularity.
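The scoring and feedback logic described in the list above can be illustrated with a minimal Python sketch. In the study, the chatbot applied the rubric through its prompt; the deterministic tally below is only an assumed stand-in showing how marks, strengths, and areas for improvement relate to the items a student covered. The class, function, and rubric items are hypothetical, and the fragment is far shorter than the study's 30-point rubric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    domain: str  # OSCE-style domain the item belongs to
    label: str   # topic the student is expected to ask about
    marks: int   # marks awarded when the topic is covered


# Hypothetical rubric fragment for illustration only (not the study rubric).
RUBRIC = [
    RubricItem("Chief complaint", "onset and duration", 2),
    RubricItem("History of present illness", "aggravating and relieving factors", 2),
    RubricItem("Associated symptoms", "orthopnea", 1),
    RubricItem("Past medical history", "chronic diseases", 1),
]


def summarize_session(covered: set[str], rubric: list[RubricItem]) -> dict:
    """Tally marks for covered topics and list strengths and gaps."""
    strengths = [item.label for item in rubric if item.label in covered]
    gaps = [item.label for item in rubric if item.label not in covered]
    score = sum(item.marks for item in rubric if item.label in covered)
    total = sum(item.marks for item in rubric)
    return {
        "score": score,
        "total": total,
        "strengths": strengths,
        "areas_for_improvement": gaps,
    }


# A student who asked about two of the four topics.
report = summarize_session({"onset and duration", "orthopnea"}, RUBRIC)
```

Keeping the tally outside the language model, as in this sketch, would make scores reproducible across sessions; whether the study's platform did so is not stated.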
To enhance consistency and standardization across chatbot interactions, all virtual patient cases were developed using a predefined prompt architecture with fixed interaction rules and case-specific instructions. All prompts, scoring logic, and case content were reviewed and approved by two internal medicine professors to ensure alignment with OSCE-oriented history-taking objectives, response consistency, and traditional clinical teaching expectations.
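A predefined prompt architecture of this kind could be assembled from a fixed template plus case-specific fields, as in the following Python sketch. It is not the study's actual prompt: the class name, field names, wording of the instructions, and the chief complaint and history values are assumptions made for illustration. Only the role line and the 30-point rubric size are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualPatientCase:
    """Case-specific knowledge base; all field names are hypothetical."""
    name: str
    age: int
    sex: str
    diagnosis: str
    chief_complaint: str
    history: dict[str, str] = field(default_factory=dict)


def build_system_prompt(case: VirtualPatientCase, rubric_points: int = 30) -> str:
    """Combine the fixed interaction rules with case-specific instructions."""
    facts = "\n".join(f"- {topic}: {answer}" for topic, answer in case.history.items())
    return "\n\n".join([
        # Role fixation
        f"You are {case.name}, {case.age}-year-old {case.sex} with {case.diagnosis}. "
        "Stay strictly in character as this patient for the whole session.",
        # Interaction rules
        "Answer only the specific question the student asks. Keep answers concise "
        "and consistent with the case facts. Do not volunteer additional clinical "
        "details unless the student asks for them directly.",
        # Case-specific knowledge base
        f"Case facts:\n- Chief complaint: {case.chief_complaint}\n{facts}",
        # Embedded scoring system
        f"Silently score the student's questions against the {rubric_points}-point "
        "history-taking rubric. Do not mention the score during the interview.",
        # Conditional feedback disclosure
        "Only when the student asks for an evaluation, report the total score, "
        "strengths, and areas for improvement, even if the diagnosis is incomplete.",
    ])


khalid = VirtualPatientCase(
    name="Khalid", age=85, sex="male", diagnosis="nephrotic syndrome",
    chief_complaint="swelling of both legs",    # illustrative value, not from the study
    history={"duration": "about three weeks"},  # illustrative value, not from the study
)
prompt = build_system_prompt(khalid)
```

Because every case passes through the same template, only the case fields vary between virtual patients, which is one way to obtain the standardization described above.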
Reflective Practice and Faculty Debriefing
To enhance metacognitive learning, students were prompted to self-reflect at the end of each session using guided questions (e.g., “What key information did you miss?”). Additionally, students were provided with a standardized rubric for self-evaluation. Weekly debriefing sessions facilitated by faculty instructors analyzed anonymized session transcripts, discussed exemplar interviews, and offered structured feedback to the cohort.
2.4. Data Collection and Measurement Tools
A multi-method data collection strategy was employed to capture both quantitative and qualitative outcomes, integrating students’ performance scores, self-reported perceptions measured with an adapted TAM survey, and qualitative experiential insights from focus groups. Three primary instruments were utilized.
2.4.1. Survey (Modified TAM Survey)
The post-intervention survey was structured according to the Technology Acceptance Model (TAM) [16], which explains technology adoption through four dimensions: Perceived Usefulness (PU), Perceived Ease of Use (PEOU), Attitude Toward Use (ATU), and Behavioral Intention to Use (BIU).
2.4.2. Quantitative Assessment of Students’ Performance
Assessment of Course Learning Outcome (CLO 1.2) employed both direct and indirect measures.
Direct assessment: Student performance was evaluated via OSCE stations using a validated history-taking rubric across two scenarios—an organ-specific case (hematemesis/upper GI bleeding) and an internal medical case (lower-limb swelling/edema).
Indirect assessment: Student perceptions were gathered through a course evaluation survey. Responses on a five-point Likert scale were converted to percentages and used as indirect indicators of achievement of CLO 1.2 for the two cohorts in the 2024/25 academic year.
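The conversion rule from Likert responses to percentages is not specified here. One common convention, assumed in the Python sketch below, expresses the mean response as a fraction of the scale maximum. The function name is hypothetical, and other conventions (for example, rescaling 1 to 5 onto 0 to 100%) would give different values.

```python
from statistics import mean


def likert_to_percent(responses: list[int], scale_max: int = 5) -> float:
    """Express the mean Likert response as a percentage of the scale maximum.

    Assumed convention: percent = 100 * mean(responses) / scale_max.
    """
    if not responses:
        raise ValueError("at least one response is required")
    if any(r < 1 or r > scale_max for r in responses):
        raise ValueError(f"responses must lie between 1 and {scale_max}")
    return 100 * mean(responses) / scale_max


# Four illustrative responses (not study data) with a mean of 4 out of 5.
percent = likert_to_percent([5, 4, 4, 3])  # 80.0
```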
Validity: The rubric was critically reviewed by one medical education expert and two internal medicine experts, focusing on domain coverage, clarity of performance descriptors, and alignment with OSCE standards.
2.4.3. Focus Group Interview
Qualitative insights were collected through focus group discussions with a total of 32 students (17 female and 15 male), organized in smaller groups of 4–5 participants to ensure interactive and in-depth dialogue. The discussions explored students’ perceptions regarding usability, realism, and educational value of the AI chatbot intervention.
2.5. Ethical Considerations
The study protocol received full approval from the Institutional Review Board at Ibn Sina National College for Medical Studies (IRRB-02-24112024). Before participation, all students were provided with a detailed information sheet and were required to give written informed consent (included in the survey). To protect participant identity and privacy and to ensure confidentiality, all collected data, including performance rubric scores and focus group transcripts, were fully anonymized.
2.6. Data Analysis Procedures
Quantitative data were analyzed using IBM SPSS Statistics, version 22. The mean performance scores on the history-taking rubric for the experimental and control groups were compared using independent samples t-tests to determine statistical significance. Perceptual data from the survey were analyzed using descriptive statistics, and the experimental group was compared to the control group using the chi-square test. Qualitative data were coded and analyzed using thematic analysis to identify recurrent themes related to the student learning experience.
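For readers who wish to reproduce this type of analysis outside SPSS, the two tests can be sketched in Python using only the standard library. This is an illustrative sketch, not the study's analysis script: the function names are hypothetical, the data are made-up toy values, the t-test p-value uses a normal approximation that is reasonable only for large samples such as the cohorts here, and the chi-square test is the uncorrected Pearson form for a 2 × 2 table.

```python
import math
from statistics import NormalDist, mean, variance


def pooled_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Independent-samples t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    # Two-sided p-value from the normal approximation to the t distribution;
    # adequate for large df, too small for small samples.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


def chi_square_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    chi2 = 0.0
    for i, row_obs in enumerate(table):
        for j, obs in enumerate(row_obs):
            expected = rows[i] * cols[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Toy values for illustration only; these are not study data.
t_stat, df, p_t = pooled_t_test([80.0, 75.0, 78.0, 82.0], [70.0, 72.0, 68.0, 74.0])
chi2, p_chi = chi_square_2x2([[10, 20], [20, 10]])
```

For small samples, an exact t distribution (as used by SPSS) should replace the normal approximation.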
3. Results
The study included 157 fifth-year medical students (95 females, 62 males) in the experimental cohort and 156 (91 females, 65 males) in the control cohort. Most participants (84.8%) were younger than 25 years.
3.1. Quantitative Performance Assessment Outcomes
OSCE performance (direct assessment) was significantly higher in the experimental group than in the control group (79.2% vs. 74.8%, p = 0.002; mean difference of ~4%), as was student-reported learning of history taking (indirect assessment) (81.2% vs. 79%, p = 0.026; mean difference of ~2%) (Figure 5). Only the experimental group came close to the 80% target level for learning outcome achievement (Figure 5). Moreover, the chatbot training group scored significantly higher than the traditional group on history-taking skills in both clinical scenarios: focused clinical history taking (71.0% vs. 66.2%, p = 0.008; mean difference of ~5%) and general clinical history (87.4% vs. 83.4%, p = 0.023; mean difference of ~4%) (Figure 6).
3.2. The Perception of the Experimental Group of the Use of Chatbots in History Taking
The chatbot-trained group reported equally high perception scores (80%) for chatbot training across PU, PEOU, ATU, and BIU. The median perception score for the use of chatbots in history taking was high (76, IQR 28; 80%) (Table 2). Most students (63.8%) considered the chatbot excellent, 16.2% good, and only 20% poor for history taking (Figure 7). Students who were good or excellent at history taking before the training showed a better perception of chatbot training than other students, irrespective of their age or gender (p < 0.001) (Table 3).
3.3. Qualitative Insights from Focus Groups
The focus group discussions revealed three dominant themes regarding students’ experiences with the AI chatbot simulator, which complemented the quantitative findings:
Safe and Accessible Environment for Practice
Psychological Safety: Students valued the stress-free space to practice without fear of harming patients, which helped build confidence.
Accessibility and Flexibility: The ability to practice anytime, anywhere, and at one’s own pace was considered a major advantage.
Repetitive, Low-Stakes Practice: Students appreciated the opportunity to repeat cases multiple times, reinforcing skills through iterative, low-pressure learning.
High Usability and Immediate Feedback
User-Friendly Interface: The platform was described as intuitive, clear, and easy to use, resembling familiar messaging apps.
Value of Feedback: Immediate, structured feedback was viewed as highly beneficial, helping students identify mistakes, improve reasoning, and refine diagnostic skills.
Limitations and Supplementary Role
Lack of Human Interaction: Students noted that the chatbot could not replicate emotional depth, body language, or real-life variability.
Limited Communication Skills Training: It was useful for information gathering but insufficient for practicing empathy, rapport, and spontaneous dialogue.
Supplementary Function: Students agreed the chatbot is best as a bridge or adjunct, supporting early skill development but not replacing real patient interactions.
The overall evaluation of the chatbot, collected via a Mentimeter word cloud, is illustrated in Figure 8.
4. Discussion
In this study, integrating a custom knowledge base chatbot into history-taking training significantly enhanced both students’ performance and their overall satisfaction.
The findings of this study provide strong evidence that the custom knowledge base chatbot is an effective adjunct for enhancing medical students’ history-taking skills. The key strength lies in its triangulation of evidence, where the statistically significant performance improvements are explained and corroborated by user survey responses and positive subjective feedback in focus groups.
Participants reported consistently positive perceptions regarding the tool’s usefulness, ease of use, attitude toward it, and intention to adopt it in the future. A majority of students (63.8%) rated the chatbot as excellent, particularly those who already possessed good or excellent skills before training. Insights from the focus group discussions emphasized the value of the chatbot in providing a safe, flexible, and feedback-rich environment that supports the development of clinical reasoning and history-taking abilities. Nonetheless, students acknowledged its limitations in cultivating communication and emotional skills, underscoring its role as a supplementary and preparatory aid rather than a substitute for real patient interaction.
Additionally, a significant increase in student scores in both the organ-specific history (p = 0.008) and general history (p = 0.023) scenarios is a direct indicator of improved clinical reasoning.
The safe learning environment provided by the chatbot platform addresses a critical component of effective medical simulation training. Studies consistently demonstrate that psychological safety in simulation-based learning is essential for optimal skill acquisition, as it allows learners to engage fully without fear of judgment or negative consequences. This safe container for learning enables students to make mistakes and receive feedback without the stakes associated with real patient care, facilitating the iterative refinement of clinical reasoning skills. The accessibility factor, enabling practice “anytime, anywhere,” addresses known barriers in traditional medical education, where structured learning opportunities are often limited by scheduling constraints and resource availability [17].
The distinction between organ-specific history (hematemesis) and general history (lower limb swelling) scenarios in our study reflects the complexity of clinical reasoning development across different medical domains. Research indicates that clinical reasoning skills are often domain-specific, requiring targeted practice in various clinical presentations to develop comprehensive diagnostic capabilities. Our findings suggest that the chatbot intervention was effective across both focused, system-specific presentations and more general symptomatic complaints, indicating its versatility as a training tool for diverse clinical scenarios [18].
Furthermore, the study underscores the importance of user acceptance, a critical factor for the successful adoption of educational technology. The high rate of perceived helpfulness (85%) and the notable increase in confidence (78%) reported in the survey are not merely satisfaction metrics; they are reflections of the learning experience itself.
Research consistently demonstrates that perceived helpfulness serves as a rational motivation for engagement and directly correlates with learning outcomes, and that learners’ perceptions of helpfulness significantly influence their attitudes toward technology adoption and subsequent learning performance. The strong correlation between satisfaction and self-confidence observed in simulation-based learning (r = 0.684, p < 0.05) supports our findings that these metrics represent genuine indicators of educational effectiveness rather than superficial satisfaction measures [19].
Studies have found that self-confidence serves as a mediating factor between educational interventions and clinical performance, with higher confidence levels directly associated with improved clinical capabilities. Research by Aljohani et al. demonstrated a strong positive correlation between satisfaction with simulation learning environments and learning achievement (r = 0.80, p < 0.01), indicating that satisfaction metrics are indeed reflective of substantive learning gains. Similarly, studies on online learning self-efficacy have shown that confidence in learning domains significantly predicts student satisfaction and academic outcomes (β = 8.93, p < 0.001) [20].
Educational technology adoption is most successful when users perceive genuine value and enhanced performance, with satisfaction serving as a reliable proxy for educational impact. This evidence base supports our interpretation that the observed user acceptance reflects meaningful educational outcomes, not superficial user preferences.
4.1. Interpretation of Student Perception Using TAM
The consistently high levels of agreement across Technology Acceptance Model (TAM) domains in this study affirm that the chatbot-based virtual patient simulator was highly accepted by medical students (overall TAM average: 3.89/5.0). Notably, elevated ratings for Perceived Usefulness (PU) (mean = 3.91) demonstrate the intervention’s significant educational impact, with the highest scores observed for enhanced communication skills and rapport-building (3.98) and improved logical flow during history-taking (3.96). These findings are concordant with previous TAM-centered research, which demonstrates that perceived usefulness is the strongest predictor of technology uptake among health professional learners and is central to successful implementation in clinical education settings [21].
Similarly, strong scores in Perceived Ease of Use (PEOU) (mean = 3.85), with students rating the chatbot as easy to use and navigate (3.95) and as requiring minimal mental effort for interaction (3.88), underscore the accessibility of the chatbot-based platform. This is especially crucial, as ease of use has been shown to mediate the effect of perceived usefulness on both students’ attitudes toward technology and their behavioral intention to adopt educational innovations. Findings from recent systematic reviews and TAM-based research confirm that platforms perceived as effortless to use facilitate acceptance and sustained engagement among medical learners [21,22].
Positive Attitudes Toward Use (ATU) (mean = 3.82) and Behavioral Intentions to Use (BIU) (mean = 3.93), reflected in students’ enthusiasm to continue using AI-based tools for learning (3.96) and their strong desire to see similar technologies integrated into other areas of medical education (3.98), suggest high technology acceptance and significant potential for broader integration across medical curricula. These findings affirm TAM’s relevance as a robust theoretical lens for interpreting the effectiveness of educational technologies in clinical training settings, as research consistently demonstrates that behavioral intention serves as the strongest predictor of actual technology adoption and sustained use in medical education contexts. The strong correlations observed between attitude and behavioral intention align with established TAM research showing that positive attitudes toward educational technology directly influence students’ willingness to integrate these tools into their learning practices [21,23].
The integration of quantitative performance improvements with qualitative insights from thematic analysis reveals a multifaceted understanding of how this technology facilitates learning while illuminating its inherent limitations.
The experimental group’s stronger performance likely stems from what the AI chatbot makes possible, and what traditional teaching often cannot. The chatbot gives students anytime, on-demand practice with realistic cases, so they can repeat, refine, and retry without waiting for a scheduled session or a free tutor. This directly supports deliberate practice (frequent, focused repetition with feedback), which has long been recognized as crucial for building clinical skills [24,25].
In medical education, access is usually the bottleneck: limited faculty time, crowded ward schedules, and scarce slots for simulated patients. By removing those constraints, the chatbot lets learners practice more often and with more variety than paper exercises or occasional, faculty-led sessions. That repeated exposure to diverse presentations helps students structure their questioning, spot patterns, and grow diagnostic expertise [24].
The standardization delivered through uniform virtual patient cases ensures equitable training quality and is particularly valuable given evidence that inconsistent clinical exposure creates disparities in learning outcomes among medical students [26,27]. The platform’s capacity to maintain fidelity across multiple learner interactions provides reliable, reproducible learning experiences that support systematic skill development.
The qualitative analysis reveals that the chatbot’s creation of a psychologically safe learning environment represents one of its most significant educational contributions. This environment enabled students to overcome the anxiety associated with making mistakes, a barrier that commonly impedes learning in traditional clinical settings. This non-judgmental space allowed learners to explore different questioning techniques and experiment with clinical approaches without fear of evaluation by peers or instructors.
The importance of psychological safety in simulation-based medical education is well-established, with research demonstrating that learners require secure environments to engage fully in skill development activities [9]. The chatbot platform effectively addresses this need by eliminating social pressures and evaluative concerns that can inhibit learning, particularly among students in early stages of clinical training who may lack confidence in their abilities.
4.2. Immediate Feedback and Learning Optimization
The chatbot’s capacity to provide instantaneous, objective feedback emerged as a critical pedagogical strength that distinguishes it from traditional teaching methods. Educational research consistently demonstrates that immediate feedback is superior to delayed feedback for skill acquisition, particularly in complex domains like clinical reasoning, where misconceptions can become entrenched without timely correction [28]. The real-time corrective input provided by the chatbot enables students to recognize and address errors immediately while reinforcing appropriate clinical reasoning pathways.
The chatbot’s ability to provide personalized, specific feedback after each interaction creates opportunities for continuous improvement and self-regulation that are difficult to achieve in conventional educational settings.
Despite its strengths, the qualitative analysis revealed important limitations that define the chatbot’s optimal role within medical education. Students consistently acknowledged the platform’s inability to replicate the full spectrum of human interaction, particularly emotional nuance, non-verbal communication, and spontaneous variability characteristic of real patient encounters. These elements remain essential for developing communication skills, empathy, and the interpersonal competencies central to patient-centered care.
The recognition of these limitations led to a strong consensus among participants that the chatbot should function as a supplementary tool rather than a replacement for direct patient contact. Students appreciated its role as a preparatory bridge that helps organize clinical approaches and build foundational skills before transitioning to higher-stakes patient interactions. This perspective aligns with blended learning frameworks that emphasize the integration of digital simulations with experiential clinical training for comprehensive professional development [26].
5. Educational Innovation and Future Directions
AI-powered virtual patient simulators represent a valuable innovation in medical education that addresses longstanding challenges related to practice accessibility, feedback timeliness, and learning standardization. Technology’s strength lies not in replacing human elements of medical education but in augmenting traditional approaches through the provision of safe, accessible, and feedback-rich learning environments.
Beyond educational effectiveness, the design of AI-driven learning tools should also consider issues of transparency, interpretability, and learner trust.
Recent advances in artificial intelligence in medical education have increasingly emphasized the importance of interpretable and knowledge-based systems to enhance transparency, reliability, and user trust in healthcare education applications. In this context, integrating structured priors and explainable AI frameworks has shown promise in improving the interpretability and practical applicability of AI-driven models in medical education [29]. Similarly, the development of educational chatbots should extend beyond conversational performance to include pedagogically aligned, transparent, and trustworthy interactions that support learners’ clinical reasoning and confidence during history-taking practice.
Future developments should focus on enhancing the platform’s capacity to simulate more complex aspects of patient interaction while maintaining its core strengths in providing structured, repeatable practice opportunities. The integration of automated analytics, multilingual support, and enhanced emotional modeling could further expand the technology’s educational utility while preserving the psychological safety and accessibility that students value.
The evidence presented supports the strategic integration of AI chatbot simulators into medical curricula as complementary tools that enhance rather than replace traditional clinical education. This approach maximizes the benefits of technological innovation while preserving the irreplaceable human elements essential for comprehensive medical training and professional development.
6. Study Limitations
The quasi-experimental design, which used a historical control group, is susceptible to confounding variables, although care was taken to ensure curricular and instructional consistency. The non-randomized design also introduces a potential for selection bias. The reliance on self-reported measures of confidence is a further limitation. Finally, as students noted, the chatbot cannot replicate the full spectrum of human interaction, especially emotional nuance and non-verbal cues. This reinforces its role as a powerful adjunct to, not a replacement for, real-world clinical experience.
Future research should include longitudinal evaluation of chatbot-based training to examine whether repeated exposure to AI-supported history-taking practice has sustained effects on students’ communication skills, clinical reasoning, and clinical performance during advanced clinical rotations and internship training. Further studies should also incorporate systematic auditing of chatbot–student interactions and direct comparisons between chatbot-generated feedback and faculty assessment to evaluate reliability, consistency, and educational validity.
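As an illustration of how such a comparison between chatbot-generated feedback and faculty assessment might be quantified, the sketch below computes Cohen's kappa, a chance-corrected agreement statistic, for two raters scoring the same checklist items. This is a minimal example under stated assumptions and not an analysis performed in this study: the function name `cohen_kappa`, the binary coding (1 = item judged as adequately covered, 0 = not covered), and the ratings themselves are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and 1.0 is returned here by convention (an assumption of this sketch).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten history-taking checklist items (illustrative only).
chatbot_ratings = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
faculty_ratings = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]

kappa = cohen_kappa(chatbot_ratings, faculty_ratings)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this hypothetical example the raters agree on 8 of 10 items, but because both rate most items as covered, the chance-corrected agreement is lower than the raw 80% figure. For continuous rubric scores, an intraclass correlation coefficient would likely be a more suitable choice than kappa.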
7. Conclusions
The findings suggest that chatbot-assisted history-taking training was associated with significantly improved OSCE performance and increased student confidence, indicating potential value as an adjunctive educational tool in clinical training. The chatbot-based training also significantly improved students’ satisfaction, with most rating it highly for usefulness, ease of use, and intention to adopt. While they valued it as a safe, flexible, and feedback-rich tool for developing history-taking and reasoning skills, students agreed that it should serve as a supplementary aid rather than a substitute for real patient interaction.
Author Contributions
Conceptualization, S.A. and S.E.-T.; methodology, I.S.; software, S.A.; validation, S.A., S.E.-T. and I.S.; formal analysis, I.S.; investigation, S.E.-T.; resources, S.A.; data curation, I.S.; writing—original draft preparation, S.A.; writing—review and editing, S.E.-T.; visualization, I.S.; supervision, S.A.; project administration, S.E.-T. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
The study protocol received full approval from the Institutional Review Board at Ibn Sina National College for Medical Studies (IRRB-02-24112024). Before participation, all students were provided with a detailed information sheet and were required to give written informed consent (included in the survey). To protect participants’ identities and privacy and to ensure confidentiality, all collected data, including performance rubric scores and focus group transcripts, were fully anonymized.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Data Availability Statement
Data will be available upon request.
Acknowledgments
The authors would like to express their gratitude to all students who agreed to participate, without whom this work would not have been possible.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper. All authors have contributed to the work objectively, and no financial, personal, or professional relationships that could inappropriately influence or bias the research and its findings have been identified.
Abbreviations
The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| OSCE | Objective Structured Clinical Examination |
| TAM | Technology Acceptance Model |
| AI | Artificial intelligence |
References
- Palsson, R.; Kellett, J.; Lindgren, S.; Merino, J.; Semple, C.; Sereni, D. Core competencies of the European internist: A discussion paper. Eur. J. Intern. Med. 2007, 18, 104–108.
- Keifenheim, K.E.; Teufel, M.; Ip, J.; Speiser, N.; Leehr, E.J.; Zipfel, S.; Herrmann-Werner, A. Teaching history taking to medical students: A systematic review. BMC Med. Educ. 2015, 15, 159.
- Co, M.; Yuen, T.H.J.; Cheung, H.H. Using clinical history-taking chatbot mobile app for clinical bedside teachings: A prospective case-control study. Heliyon 2022, 8, e09751.
- Kaplonyi, J.; Bowles, K.; Nestel, D.; Kiegaldie, D.; Maloney, S.; Haines, T.; Williams, C. Understanding the impact of simulated patients on health care learners’ communication skills: A systematic review. Med. Educ. 2017, 51, 1209–1219.
- Khan, A.; Rodwell, V.; Luhar, L.; Nandakumar, S.; Sivam, S.; Rosil, J.; Bird, T. Virtual reality 360-degree films for Objective Structured Clinical Examination preparation: A descriptive study. Cureus 2025, 17, e78120.
- Wong, R.S.-Y. ChatGPT in medical education: Promoting learning or killing critical thinking? Educ. Med. J. 2024, 16, 177–183.
- Kelly, S.; Smyth, E.; Murphy, P.; Pawlikowska, T. A scoping review: Virtual patients for communication skills in medical undergraduates. BMC Med. Educ. 2022, 22, 429.
- Plackett, R.; Kassianos, A.P.; Mylan, S.; Kambouri, M.; Raine, R.; Sheringham, J. The effectiveness of using virtual patient educational tools to improve medical students’ clinical reasoning skills: A systematic review. BMC Med. Educ. 2022, 22, 365.
- Holderried, F.; Stegemann-Philipps, C.; Herrmann-Werner, A.; Festl-Wietek, T.; Holderried, M.; Eickhoff, C.; Mahling, M. A language model-powered simulated patient with automated feedback for history taking: Prospective study. JMIR Med. Educ. 2024, 10, e59213.
- Holderried, F.; Stegemann-Philipps, C.; Herschbach, L.; Moldt, J.-A.; Nevins, A.; Griewatz, J.; Holderried, M.; Herrmann-Werner, A.; Festl-Wietek, T.; Mahling, M. A generative pretrained transformer (GPT)-powered chatbot as a simulated patient to practice history taking: Prospective, mixed methods study. JMIR Med. Educ. 2024, 10, e53961.
- Veloski, J.; Boex, J.R.; Grasberger, M.J.; Evans, A.; Wolfson, D.B. Systematic review of the literature on assessment, feedback and physicians’ clinical performance: BEME Guide No. 7. Med. Teach. 2006, 28, 117–128.
- Natesan, S.; Jordan, J.; Sheng, A.; Carmelli, G.; Barbas, B.; King, A.; Gore, K.; Estes, M.; Gottlieb, M. Feedback in medical education: An evidence-based guide to best practices from the Council of Residency Directors in Emergency Medicine. West. J. Emerg. Med. 2023, 24, 479–494.
- Srinivasan, M.; Venugopal, A.; Venkatesan, L.; Kumar, R. Navigating the pedagogical landscape: Exploring the implications of AI chatbots in nursing education. JMIR Nurs. 2024, 9, e52105.
- Pereira, D.S.; Falcão, F.; Nunes, A.; Santos, N.; Costa, P.; Pêgo, J.M. Designing and building OSCEBot® for virtual OSCE: Performance evaluation. Med. Educ. Online 2023, 28, 2228550.
- Or, A.J.; Sukumar, S.; Ritchie, H.E.; Sarrafpour, B. Using artificial intelligence chatbots to improve patient history taking in dental education (Pilot study). J. Dent. Educ. 2024, 88, 1988–1990.
- Davis, F.D. Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Q. 1989, 13, 319–340.
- Madireddy, S.; Rufa, E.P. Maintaining confidentiality and psychological safety in medical simulation. In StatPearls; StatPearls Publishing: Treasure Island, FL, USA, 2025.
- Lai, J.H.; Cheng, K.H.; Wu, Y.J.; Lin, C.C. Assessing clinical reasoning ability in fourth-year medical students via an integrative group history-taking with an individual reasoning activity. BMC Med. Educ. 2022, 22, 573.
- Bdiri Gabbouj, S.; Zedini, C.; Naija, W. Nursing students’ satisfaction and self-confidence with simulation-based learning and its associations with simulation design characteristics and educational practices. Adv. Med. Educ. Pract. 2024, 15, 1093–1102.
- Aljohani, A.S.; Karim, Q.; George, P. Students’ satisfaction with simulation learning environment in relation to self-confidence and learning achievement. J. Health Sci. 2016, 4, 228–235.
- Lee, J.W.Y.; Tan, J.Y.; Bello, F. Technology Acceptance Model in medical education: Systematic review. JMIR Med. Educ. 2025, 11, e67873.
- Mastour, H.; Yousefi, R.; Niroumand, S. Exploring the acceptance of e-learning in health professions education in Iran based on the technology acceptance model (TAM). Sci. Rep. 2025, 15, 8178.
- Kucuk, S.; Baydas Onlu, O.; Kapakin, S. A model for medical students’ behavioral intention to use mobile learning. J. Med. Educ. Curric. Dev. 2020, 7, 2382120520973222.
- Mitchell, S.A.; Boyer, T.J. Deliberate practice in medical simulation. In StatPearls; StatPearls Publishing: Treasure Island, FL, USA, 2025.
- Taraporewalla, K.; Barach, P.; van Zundert, A. Teaching medical procedural skills for performance. Clin. Pract. 2024, 14, 862–869.
- Cheng, K.H.; Lee, C.Y.; Wu, Y.J.; Lin, C.C. Using group history-taking and individual reasoning to identify shortcomings in clinical reasoning for medical students. J. Med. Educ. Curric. Dev. 2024, 11, 23821205241280946.
- Kononowicz, A.A.; Woodham, L.A.; Edelbring, S.; Stathakarou, N.; Davies, D.; Saxena, N.; Tudor Car, L.; Carlstedt-Duke, J.; Car, J.; Zary, N. Virtual patient simulations in health professions education: Systematic review and meta-analysis by the Digital Health Education Collaboration. J. Med. Internet Res. 2019, 21, e14676.
- Elendu, C.; Amaechi, D.C.; Okatta, A.U.; Amaechi, E.C.; Elendu, T.C.; Ezeh, C.P.; Elendu, I.D. The impact of simulation-based training in medical education: A review. Medicine 2024, 103, e38813.
- Li, S.; Zhu, Q.; Tian, C.; Zhang, D. Interpretable dynamic brain network analysis with functional and structural priors. IEEE Trans. Med. Imaging 2025, 44, 4878–4889.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the Academic Society for International Medical Education. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.