Integrating Generative Artificial Intelligence (AI) in Medical Education: A Framework for Preserving Clinical Reasoning

Corral-Gudino, Luis; Herrero-Montano, Isabel; de la Torre-Díez, Isabel; Miramontes-González, José Pablo

doi:10.3390/app16125946

by Luis Corral-Gudino ¹, Isabel Herrero-Montano ², Isabel de la Torre-Díez ² and José Pablo Miramontes-González ¹,*

¹ Department of Medicine, Dermatology and Toxicology, School of Medicine, Universidad de Valladolid, 47002 Valladolid, Spain

² Department of Signal Theory and Communications, and Telematics Engineering, Universidad de Valladolid, 47002 Valladolid, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5946; https://doi.org/10.3390/app16125946

Submission received: 15 April 2026 / Revised: 5 June 2026 / Accepted: 8 June 2026 / Published: 12 June 2026

Featured Application

The M3RGE-AI framework offers a directly implementable model for medical schools seeking to integrate generative AI tools into their curricula without compromising the development of independent clinical reasoning. Its staged progression, peer moderation structure, and AI-off assessment checkpoints can be adopted incrementally by institutions. However, full implementation requires adequate computing infrastructure, reliable internet connectivity, and sustainable access to AI tools, factors that may limit applicability in resource-constrained or community-based settings. Addressing these barriers through institutional investment, open-source alternatives, or locally hosted models represents a necessary condition for equitable framework adoption.

Abstract

Generative artificial intelligence (AI) is increasingly present in medical education, yet its indiscriminate use risks impairing the acquisition of foundational clinical competencies, including clinical reasoning, hypothesis generation, and patient-centered communication, through processes of never-skilling, mis-skilling, and deskilling. This paper presents M3RGE-AI (Responsible, Reliable, and Reflexive use of Generative AI in Medical Education), a conceptual framework for the purposeful integration of AI as a cognitive scaffold in medical training. Drawing on established learning theories (zone of proximal development, deliberate practice, and peer learning), the framework assigns progressively expanding AI functions across training stages, prioritizes Socratic over directive interactions, requires transparent and verifiable sourcing of AI-generated content, and incorporates peer moderation and AI-off assessment checkpoints to mitigate over-reliance. The framework is operationalized through alternating AI-on and AI-off cycles, governance processes, and educator training protocols. Applied within these constraints, AI can shorten feedback loops and broaden clinical exposure while preserving independent reasoning and authentic patient communication. M3RGE-AI offers a theoretically grounded and institutionally implementable model for integrating generative AI into medical curricula without sacrificing the essential human competencies that underpin safe clinical practice.

Keywords:

artificial intelligence; clinical education; clinical competence; active methodologies; curriculum design

1. Introduction

Artificial intelligence (AI) encompasses computational approaches enabling machines to perform tasks traditionally requiring human intelligence. Generative AI, particularly through Large Language Models (LLMs), has the potential to transform medical practice in ways distinct from earlier AI applications. Tools like ChatGPT, Gemini, and Claude can instantly generate differential diagnoses, summarize patient histories, and synthesize medical literature. However, these general-purpose AI tools have not received regulatory approval for clinical use, their medical outputs remain unvalidated, and expert verification is required. Early evidence suggests that for certain tasks, these systems may match or exceed the diagnostic reasoning speed of junior clinicians in well-defined research scenarios [1,2,3].

While generative AI holds great promise, it may disrupt essential reflective processes in medical practice [4]. Educators are exploring how AI can enhance simulations and personalized learning [5,6], yet there is increasing concern that students may bypass critical diagnostic reasoning steps, relying on algorithmic outputs without critical interrogation [7]. Students requesting AI-generated differential diagnoses may use these lists without performing independent reasoning, thereby reducing engagement with critical thinking skills [8]. Unlike previous educational tools, AI autonomously generates plausible clinical narratives. This “black box” aspect means learners may accept outputs at face value rather than questioning the underlying assumptions, a skill that is central to safe clinical practice [9]. One study found that 73% of medical students showed significant anchoring bias when presented with AI-generated differential diagnoses [10].

The “GPS effect” illustrates how automation can erode skill: just as reliance on navigation systems diminishes spatial orientation abilities [11], consistent AI dependence for generating clinical reasoning may prevent the development of internal frameworks essential for flexible practice. Recent literature identifies four trajectories: “never-skilling” (AI supplants essential reasoning), “mis-skilling” (uncritical reliance on outputs), “deskilling” (erosion of competencies through cognitive offloading), and “skilling” (effective AI integration enhancing critical thinking) [12]. Each of the three risk trajectories maps directly to specific M3RGE-AI safeguards. Restricted AI use and AI-off checkpoints address never-skilling and deskilling, while the Chief-of-GPT model and 3T-RAG principle target mis-skilling and automation bias.

Rejecting AI entirely is neither practical nor desirable. Medical data complexity and workforce constraints demand that future physicians become proficient in harnessing AI tools as collaborators [13]. Regulatory bodies now call for digital health and AI literacy competencies [14]. However, a significant gap remains in the literature. Although many reports detail early AI uses [15,16], there is a lack of structured frameworks explaining how to integrate AI while maintaining core clinical skills. Most initiatives are fragmented local projects rather than unified curriculum strategies, and available guidance seldom addresses balancing AI’s efficiency with the prevention of skill loss.

Medical schools require frameworks that: (1) delineate progressive AI integration stages; (2) outline strategies to prevent never-skilling, mis-skilling, and deskilling; (3) integrate assessment methods ensuring skill retention without AI; and (4) establish governance structures for responsible implementation.

We propose the M3RGE-AI framework (Medical Education with Responsible, Reliable, and Reflexive use of Generative Artificial Intelligence), grounded in established learning principles including zone of proximal development and deliberate practice. Key components include Socratic AI interactions promoting active reasoning, mandatory AI-off assessment checkpoints verifying independent mastery, and governance mechanisms with peer moderation and transparent sourcing. This positions AI as a scaffold for skill acquisition rather than a substitute for clinical competence, enabling educators to leverage AI’s benefits while preserving independent clinical reasoning development.

2. Materials and Methods

The M3RGE-AI framework was developed as a conceptual proposal through a structured reasoning process informed by three complementary sources: established learning theories, emerging literature on AI integration in medical education, and observed risks of unstructured AI use in clinical training contexts.

Framework foundations. Three educational theories were selected as cornerstones based on their established relevance to skill acquisition in health professions education and their practical applicability to AI-mediated learning environments: instructional scaffolding and the zone of proximal development (ZPD), deliberate practice theory, and peer-assisted learning (PAL) combined with near-peer teaching (NPT). These theories were chosen because they collectively address the three core challenges identified in the literature: the developmental appropriateness of AI exposure, the maintenance of effortful cognitive engagement, and social accountability in AI use.

Framework derivation. Framework components were derived through three sequential steps: (1) identifying the foundational clinical competencies most vulnerable to displacement by unstructured AI use, specifically hypothesis generation, clinical prioritization, and patient-centered communication; (2) mapping each identified risk (never-skilling, mis-skilling, and deskilling) to specific design responses grounded in the selected learning theories; and (3) proposing implementation mechanisms, including staged progression, restricted AI outputs, peer moderation, and AI-off assessment checkpoints, aligned with institutional feasibility in medical school curricula.

Scope and limitations. This framework has not undergone a systematic literature review, formal expert consensus, or empirical validation. It is presented as a theoretically reasoned proposal intended to guide initial implementation efforts, rather than a validated protocol ready for direct adoption. The explicit expectation is that any institutional adoption should be accompanied by rigorous pilot evaluation and evidence-based refinement. Accordingly, the framework should be read as a structured starting point for empirical investigation, not as a prescriptive implementation guide.

3. Results

3.1. M3RGE-AI Framework

No single theory comprehensively addresses the challenges of integrating AI into medical education. We therefore draw upon three educational theories as cornerstones for our framework. First, instructional scaffolding and the ZPD [17,18] support a developmentally appropriate progression: diagnosing learner capability, introducing graduated prompts within the ZPD, and fading support as competence grows. This means constraining AI early by providing Socratic questions rather than answers, thereby preserving productive difficulty and preventing never-skilling or mis-skilling. Consistent with its original formulation [17], scaffolding within M3RGE-AI operates as a fading mechanism. AI support is deliberately and progressively reduced as learner competence increases, moving from highly constrained interactions in foundational stages to increasingly autonomous AI use in advanced clinical years, ensuring that the scaffold is withdrawn before dependence becomes established. Second, Ericsson’s theory of deliberate practice [19] provides a rationale for goal-directed activities with immediate feedback, repetition with variability, and progressive challenge toward transferable expertise. AI can support deliberate practice through rapid case generation and timely feedback while requiring learners to justify responses, preventing superficial performance improvements. Third, PAL/NPT principles offer scalable collaboration through cognitive congruence and low-stakes questioning environments. We operationalize PAL/NPT through a “Chief-of-GPT” role that challenges AI suggestions, normalizes explanation-before-acceptance, and serves as a first-line validation layer for AI-generated clinical scenarios, tasked with identifying and flagging clinically implausible or internally inconsistent cases before they are presented to the group for reasoning practice. This role is not merely moderating; it is epistemically active. The Chief-of-GPT models critical appraisal as a social norm rather than an individual habit. To fulfill this function effectively, facilitator training should explicitly address the risks of authority bias and groupthink, ensuring that the peer leader promotes genuine critical debate rather than inadvertently anchoring the group to their own reasoning.

These theories shape our design choices (early AI restriction, verification-ready outputs, peer moderation, and AI-off assessments), all of which aim to strengthen genuine skill acquisition rather than substitute for it. Table 1 details how framework components are grounded in these theories.

3.2. Core Components

The M3RGE-AI framework proposes a staged, developmentally aligned approach for integrating AI into medical curricula. Rather than viewing AI as an add-on or standalone tool, M3RGE-AI offers a structured progression that adjusts the scope and complexity of AI-assisted learning to match students’ evolving cognitive and clinical abilities.

This approach ensures that AI serves as a scaffold for developing clinical reasoning, not as a substitute for the deliberate practice that is essential to safe patient care.

3.2.1. Overview: Principles and Longitudinal Design

M3RGE-AI operates on three principles: responsibility (preserving essential human skills), reliability (validating AI accuracy), and reflexivity (promoting critical student engagement). Unlike rigid checklists, this flexible framework adapts to learner progress, with support through AI, peer review, and tutoring scaled back as competence increases. Training shifts from guided use of AI in early stages to more independent and critical application during clinical years.

The 3T-RAG principle, which requires AI outputs to be Traceable, Trustworthy, and Thought-provoking, operationalizes responsible AI use at the level of individual student–AI interaction. Concretely, students should be trained to apply structured verification prompts, such as requesting explicit source attribution, asking the AI to quantify its uncertainty, and cross-referencing outputs against validated clinical resources, before incorporating any AI-generated content into their reasoning. These habits, practiced deliberately across all framework stages, transform the 3T-RAG principle from an abstract standard into a concrete clinical skill. Empirical models operationalizing Socratic AI tutoring in medical education contexts provide concrete implementation guidance for institutions adopting this approach [20].
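The three verification habits named above could be logged per interaction so that skipped steps become visible during debriefing. The following is a minimal sketch under stated assumptions, not part of the framework as published: the class `VerificationRecord`, its field names, and the helper functions are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VerificationRecord:
    """Hypothetical log entry for one student-AI interaction."""
    asked_for_sources: bool      # student requested explicit source attribution
    asked_for_uncertainty: bool  # student asked the AI to quantify its uncertainty
    cross_referenced: bool       # output was checked against a validated clinical resource


def missing_checks(record: VerificationRecord) -> list[str]:
    """Return the verification habits that were skipped, in a fixed order."""
    checks = [
        ("source attribution", record.asked_for_sources),
        ("uncertainty estimate", record.asked_for_uncertainty),
        ("cross-reference", record.cross_referenced),
    ]
    return [name for name, done in checks if not done]


def ready_for_reasoning(record: VerificationRecord) -> bool:
    """True only when all three habits were applied to the AI output."""
    return not missing_checks(record)


# Illustrative interaction: sources requested and output cross-referenced,
# but the student never asked the AI about its uncertainty.
record = VerificationRecord(True, False, True)
print(missing_checks(record))       # ['uncertainty estimate']
print(ready_for_reasoning(record))  # False
```

A record of this kind only captures whether each habit was attempted; judging the quality of the sources or of the cross-check would still rest with the student, the peer moderator, and faculty.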

Figure 1 shows how AI use progresses from basic simulations to complex, realistic clinical scenarios as students advance.

This approach helps students build core skills and then improve them in increasingly lifelike settings (see Table 2).

AI assistants can dynamically create endless clinical cases [21,22], thereby surpassing pre-programmed scenarios and enabling true deliberate practice that is essential for expertise. When students repeatedly encounter identical cases, exercises devolve into the memorization of specific answers rather than the mastery of reasoning processes. By providing novel variations with each interaction, AI ensures that students engage in focused cognitive work, specifically hypothesis generation and synthesis, preventing reliance on simple recall and fostering robust, transferable clinical skills.

3.2.2. Cross-Stage Safeguards: Monitoring, Peer Review, and Adaptive Governance

Across all stages of the M3RGE-AI framework, safeguards are needed to ensure that the integration of AI remains supportive of skill development rather than inadvertently promoting passive dependence. While restrictions on AI outputs and transparent feedback loops are essential, equally important are the social and pedagogical mechanisms that keep students actively engaged and critically reflective.

3.2.3. Continuous Individual Assessment

Outcomes and metrics. We recommend (i) unaided diagnostic reasoning via the Diagnostic Thinking Inventory (DTI); (ii) uncertainty-tolerant reasoning via Script Concordance Tests (SCT); (iii) AI-wrong OSCE stations in which AI outputs deliberately contain clinical errors or inconsistencies that students must identify and correct; and (iv) supervised observation of real clinical scenarios in the final year of training, following the DEFT-AI proposal [12]. AI-off checkpoints are implemented progressively across training stages. At the foundational stage, unassisted history-taking exercises and basic differential diagnosis tasks verify that students can generate hypotheses independently before AI exposure is broadened. At the intermediate stage, SCTs administered without AI access assess whether students can tolerate and reason through clinical uncertainty independently. At the advanced stage, AI-wrong OSCE stations verify that senior students have developed the critical appraisal capacity to detect AI failures and override unreliable outputs safely in high-stakes clinical contexts. Pre/post contrasts and staged progression checkpoints allow early detection of over-reliance on AI.

Pending empirical threshold definition, institutions should establish local baseline performance on AI-off assessments when implementation begins and treat meaningful within-student gaps between AI-assisted and independent performance as an early warning sign of over-reliance requiring targeted intervention. In addition, structured reflection on student attitudes should be incorporated into debriefings and self-assessment exercises. Students should be encouraged to recognize and articulate when their trust in AI might override their independent clinical judgment.
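The within-student comparison described above can be sketched as a simple difference between AI-assisted and independent scores. This is an illustrative sketch only: the names `StudentScores`, `reliance_gap`, and `flag_over_reliance` are hypothetical, the scores are invented example data, and `gap_threshold` is a placeholder that each institution would have to set from its own local baseline, since no validated threshold exists yet.

```python
from dataclasses import dataclass


@dataclass
class StudentScores:
    """Hypothetical paired scores for one student, on the same scale."""
    student_id: str
    ai_assisted: float  # score on an AI-assisted task
    independent: float  # score on the matched AI-off checkpoint


def reliance_gap(scores: StudentScores) -> float:
    """Positive values mean the student performs better with AI than without."""
    return scores.ai_assisted - scores.independent


def flag_over_reliance(cohort: list[StudentScores], gap_threshold: float) -> list[str]:
    """Return IDs of students whose gap exceeds the locally chosen threshold."""
    return [s.student_id for s in cohort if reliance_gap(s) > gap_threshold]


# Invented example data; the threshold of 15 points is an assumption,
# not a recommendation from the framework.
cohort = [
    StudentScores("A", ai_assisted=82, independent=78),
    StudentScores("B", ai_assisted=90, independent=61),
    StudentScores("C", ai_assisted=70, independent=72),
]
print(flag_over_reliance(cohort, gap_threshold=15))  # ['B']
```

A flag of this kind is best read as a prompt for the targeted intervention and structured reflection described above, not as a grade.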

3.2.4. Supervision Challenges in Large Cohorts: The Role of PAL/NPT

One of the most valuable capacities of AI is that it can be trained to provide students with immediate, personalized feedback even outside formal teaching hours. Introducing an AI-based assistant may offer a practical solution for supporting large student cohorts, but unstructured access alone does not guarantee proper use or meaningful feedback, because individual students interacting with AI are at high risk of cognitive errors such as automation bias or anchoring [23].

The M3RGE-AI framework proposes a social, peer-moderated structure to mitigate this risk by incorporating the figure of the “Chief of GPT”, a role inspired by the traditional “Chief of Table” used in anatomical teaching, where an advanced student leads and coordinates the learning activities of a small group at a dissection table. In the Chief-of-GPT model, a designated student, typically an advanced learner or one demonstrating strong clinical reasoning skills, facilitates their peers’ use of AI during small-group learning sessions, ensuring critical engagement rather than passive acceptance of AI outputs.

PAL/NPT is a well-established and effective pedagogy in health professions education. Research shows that PAL achieves learning outcomes comparable to those of faculty-led teaching for theoretical knowledge and significantly improves procedural skills [24,25].

The Chief-of-GPT leads small-group sessions where students evaluate AI-powered clinical cases. By guiding discussions, modeling critical analysis of AI results, and prompting questions like “Is the AI correct?” or “What information is missing?”, the Chief-of-GPT fosters metacognitive thinking. This approach helps students to treat AI outputs as hypotheses that need verification rather than accepting them as facts.

The Chief-of-GPT role can be rotated among qualified students to distribute leadership opportunities and reinforce teaching skills, embedding the principle that mastery includes guiding others in critical AI application. Faculty oversight remains essential: educators supervise peer-led sessions, train student facilitators, monitor for over-reliance or uncritical acceptance, and integrate insights into curriculum refinements. This peer facilitation coupled with faculty supervision creates sustainable scalability while maintaining academic rigor.
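One simple way to rotate the role is a round-robin schedule over the pool of qualified students. The sketch below assumes a fixed pool and a known number of sessions; the function name `rotation_schedule` and the student names are hypothetical, and the framework itself does not prescribe any particular rotation rule.

```python
from itertools import cycle, islice


def rotation_schedule(qualified_students: list[str], n_sessions: int) -> list[str]:
    """Assign one Chief-of-GPT per session, cycling through the qualified pool."""
    if not qualified_students:
        raise ValueError("at least one qualified student is required")
    return list(islice(cycle(qualified_students), n_sessions))


# Illustrative pool of three qualified students across five sessions.
print(rotation_schedule(["Ana", "Luis", "Marta"], 5))
# ['Ana', 'Luis', 'Marta', 'Ana', 'Luis']
```

Faculty would still decide who enters the qualified pool and could override the schedule when supervision reveals problems in a given group.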

The Chief-of-GPT model fundamentally transforms learning dynamics from a traditional one-to-one human–machine dialog to a more robust one-to-many-to-one human–humans–machine interaction. The peer tutor guides the group as they collectively engage with and critically appraise the AI outputs. This human–humans interaction serves as a built-in, real-time check against the cognitive pitfalls of human–machine interaction.

4. Discussion

The central challenge of AI integration in medical education is not technological but pedagogical. We need to address how to harness AI’s capacity to accelerate learning while preserving the productive difficulty that builds genuine clinical competence. M3RGE-AI proposes a structured answer to this question, conceptualizing AI not as a threat or a solution but as a collaborative learning partner whose effectiveness depends entirely on how, when, and under what constraints it is introduced.

Realizing this promise requires protecting the cognitive work underlying diagnostic expertise. M3RGE-AI’s staged progression and safeguards contrast with fragmented experiments, aligning AI complexity to learners’ developmental stage (Figure 1) and addressing digital medical education’s tendency toward uniform rather than developmental technology adoption.

Some argue that restricting AI limits real-world preparation, contending that students naturally develop critical skills through experience. This overlooks evidence that expertise requires deliberate, scaffolded practice, not passive exposure. Aviation provides a compelling analogy: pilots demonstrate manual competence before using autopilot [26]. Similarly, medical students should master independent reasoning (AI-off checkpoints) before depending on AI. These AI-off checkpoints should not be understood as a permanent assessment model or as a reflection of future clinical practice. Rather, they serve as a developmental safeguard to ensure that students establish independent reasoning before routine reliance on AI takes hold. The aviation analogy remains instructive: pilots must demonstrate manual competence not because autopilot will be unavailable, but because recognizing system failure and overriding automation safely is an essential professional skill. As learners move through the framework, assessment should evolve as well. In advanced clinical training, hybrid formats become more appropriate, evaluating both independent reasoning and the quality of human–AI collaboration, including the ability to critically assess AI outputs, recognize their limitations, and incorporate them appropriately into clinical judgment. In this sense, AI-off and AI-assisted assessments serve complementary functions across the continuum of training rather than representing competing educational philosophies.

The framework’s distinctive contribution is operationalizing peer-facilitated reflexivity through the Chief-of-GPT model. While peer-assisted learning is established in health professions education, its application in an AI context is novel. Senior students moderating AI-enhanced reasoning sessions add critical accountability, promote leadership and debate, and prevent passive AI acceptance, thereby combining near-peer teaching with adaptive AI to develop facilitation skills.

Major practical obstacles remain. Many educators lack the experience to properly oversee AI-augmented learning or to create tools fitting strict 3T-RAG guidelines. Comprehensive professional development is essential, not just for technical skills, but to train educators in encouraging critical thinking, recognizing automation bias, and managing AI-enhanced classrooms. Building scaffolded AI scenarios with adjustable restrictions and responsive feedback is far more complex than deploying commercial chatbots, requiring interdisciplinary collaboration and consistent oversight.

Governance issues persist, requiring institutions to define permissible AI uses, particularly regarding patient data. While RAG methodologies enhance traceability, they introduce concerns about data integrity and licensing when students bypass restrictions. Equity is central: without deliberate effort, the framework may widen gaps between resource-rich and resource-limited institutions. Future adaptations should explore open-source or low-bandwidth versions that can be shared across contexts.

Another practical challenge within M3RGE-AI is tool selection. Tool selection should be guided by institutional governance criteria rather than performance benchmarks alone. Given the rapid evolution of available models, accuracy comparisons between specific LLMs are likely to be outdated before implementation. More durable selection principles include the transparency of sourcing and the explainability of outputs, alignment with data protection regulations, and institutional control over student and patient data. In this regard, locally hosted Small Language Models represent an emerging privacy-preserving alternative to cloud-based tools in educational settings [27]; this suggests that data security may be a more stable and educationally appropriate selection criterion than raw diagnostic performance.

A further theoretical dimension warranting explicit acknowledgment is AI literacy, defined as the capacity to critically evaluate, responsibly use, and meaningfully collaborate with AI systems. AI literacy is not a byproduct of AI exposure but a teachable competency that must be deliberately cultivated. Within M3RGE-AI, the 3T-RAG principle and the Chief-of-GPT model operationalize core AI literacy skills, namely, source appraisal, uncertainty recognition, and critical interrogation of outputs, across all training stages. Empirical validation of M3RGE-AI should include explicit measurement of AI literacy as an outcome variable, assessing whether framework exposure produces measurable improvements in students’ capacity to critically appraise, selectively trust, and appropriately override AI outputs in clinical decision-making contexts.

There are several limitations to our framework. M3RGE-AI is a conceptual model developed from pedagogical reasoning rather than systematic reviews or empirical validation. Components like staged restrictions, Socratic prompting, and Chief-of-GPT moderation remain untested. Unintended consequences may include students bypassing restrictions, overestimating AI reliability, or developing a psychological dependence despite AI-off checkpoints. The framework’s applicability beyond medical students’ clinical reasoning remains uncertain. These limitations highlight the fact that M3RGE-AI should be viewed as a conceptual guide requiring systematic empirical validation through pilot studies, comparative research, and continuous refinement across varied institutional settings.

Beyond pedagogical concerns, the technical limitations of generative AI also require explicit recognition. Current LLMs remain vulnerable to hallucinations, whereby they generate plausible but incorrect clinical information, and they convey epistemic uncertainty poorly, often expressing unjustified confidence that may mislead learners who lack the experience to detect errors. In deliberate practice settings, this creates a particular hazard: AI-generated case variations may include clinically inconsistent or physiologically implausible details that, if accepted uncritically, could reinforce flawed reasoning. These risks further support treating the 3T-RAG principle and faculty oversight as essential implementation requirements rather than optional safeguards.

5. Conclusions

The M3RGE-AI framework recognizes that generative AI’s principal risk in medical education stems from unstructured application rather than the mere use of the technology. By positioning AI as a cognitive scaffold rather than a substitute for clinical reasoning, the framework aims to produce graduates who benefit from AI’s efficiency and feedback capacity while retaining the independent judgment, contextual understanding, and patient-centered communication that safe medical practice demands.

However, a conceptual framework is only a starting point. The next critical step is empirical validation. Priority research questions include: whether staged AI restrictions meaningfully reduce automation bias compared to unstructured access; whether the Chief-of-GPT model produces measurable differences in critical appraisal skills; and whether AI-off assessment checkpoints reliably detect over-reliance before it becomes clinically consequential.

One suitable pilot study would involve third- and fourth-year medical students during clinical rotations, comparing a cohort using structured M3RGE-AI with a cohort using AI without structured guidance. Primary outcomes could include performance on AI-off OSCE stations and Script Concordance Tests administered before and after the intervention to assess the retention of independent clinical reasoning. Secondary outcomes could evaluate automation bias with the Diagnostic Thinking Inventory and measure students’ attitudes toward AI through structured reflective tools at multiple time points. This design would allow researchers to examine the framework’s central assumptions (namely, that staged restriction, peer moderation, and AI-off checkpoints help preserve clinical reasoning) and generate evidence to refine or reassess its components across different institutional contexts.

The value of generative AI in medical education will be determined not by the technology itself but by the rigor and intentionality of its pedagogical integration. M3RGE-AI offers a structured foundation for that work.

Author Contributions

Conceptualization, L.C.-G.; methodology, L.C.-G. and J.P.M.-G.; software and AI proof of concepts, I.H.-M. and I.d.l.T.-D.; writing—original draft preparation, L.C.-G.; writing—review and editing, L.C.-G., J.P.M.-G. and I.H.-M.; visualization, I.H.-M.; supervision, L.C.-G. and J.P.M.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Universidad de Valladolid, Proyectos de Innovación Docente (PID) 2025–2026, grant number PID-38.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the author(s) used myProse (DocuScope Writing Mentor, Carnegie Mellon University, Pittsburgh, PA, USA, 2025); Google Gemini 3 Pro (Version 3 Pro); and OpenAI ChatGPT (GPT-5.2) for the purposes of English language editing, grammar improvement, and fluency enhancement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

3T-RAG	Traceable, Trustworthy, Thought-provoking Retrieval-Augmented Generation
AI	Artificial Intelligence
DEFT-AI	Directed Experiential Feedback and Training with AI
DTI	Diagnostic Thinking Inventory
GenAI	Generative Artificial Intelligence
LLM	Large Language Model
M3RGE-AI	Medical Education with Responsible, Reliable, and Reflexive use of Generative Artificial Intelligence
NPT	Near-Peer Teaching
OSCE	Objective Structured Clinical Examination
PAL	Peer-Assisted Learning
RAG	Retrieval-Augmented Generation
SCT	Script Concordance Test
ZPD	Zone of Proximal Development

References

Cabral, S.; Restrepo, D.; Kanjee, Z.; Wilson, P.; Crowe, B.; Abdulnour, R.-E.; Rodman, A. Clinical Reasoning of a Generative Artificial Intelligence Model Compared with Physicians. JAMA Intern. Med. 2024, 184, 581–583.
Rodman, A.; Topol, E.J. Is Generative Artificial Intelligence Capable of Clinical Reasoning? Lancet 2025, 405, 689.
Takita, H.; Kabata, D.; Walston, S.L.; Tatekawa, H.; Saito, K.; Tsujimoto, Y.; Miki, Y.; Ueda, D. A Systematic Review and Meta-Analysis of Diagnostic Performance Comparison between Generative AI and Physicians. npj Digit. Med. 2025, 8, 175.
Mullankandy, D.S. The Impact of Large Language Models on Medical Education: Preparing for a Revolutionary Shift in Doctor Training. J. Artif. Intell. Gen. Sci. (JAIGS) 2024, 4, 270–277.
Ammenwerth, E.; Zwan, L. Integration of Digital Tools into Clinical Reasoning Education: A Rapid Review. Stud. Health Technol. Inform. 2025, 324, 49–50.
Fatima, S.S.; Sheikh, N.A.; Osama, A. Authentic Assessment in Medical Education: Exploring AI Integration and Student-as-Partners Collaboration. Postgrad. Med. J. 2024, 100, 959–967.
Pham, T.D.; Karunaratne, N.; Exintaris, B.; Liu, D.; Lay, T.; Yuriev, E.; Lim, A. The Impact of Generative AI on Health Professional Education: A Systematic Review in the Context of Student Learning. Med. Educ. 2025, 59, 1280–1289.
Liaw, W.; Chavez, S.; Pham, C.; Tehami, S.; Govender, R. The Hazards of Using ChatGPT: A Call to Action for Medical Education Researchers. PRiMER Peer-Rev. Rep. Med. Educ. Res. 2023, 7, 27.
Tran, M.; Balasooriya, C.; Jonnagaddala, J.; Leung, G.K.-K.; Mahboobani, N.; Ramani, S.; Rhee, J.; Schuwirth, L.; Najafzadeh-Tabrizi, N.S.; Semmler, C.; et al. Situating Governance and Regulatory Concerns for Generative Artificial Intelligence and Large Language Models in Medical Education. npj Digit. Med. 2025, 8, 315.
Jabbour, S.; Fouhey, D.; Shepard, S.; Valley, T.S.; Kazerooni, E.A.; Banovic, N.; Wiens, J.; Sjoding, M.W. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA 2023, 330, 2275–2284.
Miola, L.; Muffato, V.; Sella, E.; Meneghetti, C.; Pazzaglia, F. GPS Use and Navigation Ability: A Systematic Review and Meta-Analysis. J. Environ. Psychol. 2024, 99, 102417.
Abdulnour, R.-E.E.; Gin, B.; Boscardin, C.K. Educational Strategies for Clinical Supervision of Artificial Intelligence Use. N. Engl. J. Med. 2025, 393, 786–797.
Triola, M.M.; Rodman, A. Integrating Generative Artificial Intelligence Into Medical Education: Curriculum, Policy, and Governance Strategies. Acad. Med. 2025, 100, 413–418.
Knopp, M.I.; Warm, E.J.; Weber, D.; Kelleher, M.; Kinnear, B.; Schumacher, D.J.; Santen, S.A.; Mendonça, E.; Turner, L. AI-Enabled Medical Education: Threads of Change, Promising Futures, and Risky Realities Across Four Potential Future Worlds. JMIR Med. Educ. 2023, 9, e50373.
Benjamin, J.; Masters, K.; Agrawal, A.; MacNeill, H.; Mehta, N. Twelve Tips on Applying AI Tools in HPE Scholarship Using Boyer’s Model. Med. Teach. 2025, 47, 949–954.
Moser, M.; Posel, N.; Ganescu, O.; Fleiszer, D. Twelve Tips: Using Generative AI to Create and Optimize Content for Virtual Patient Simulations. Med. Teach. 2025, 47, 1745–1751.
Wood, D.; Bruner, J.S.; Ross, G. The Role of Tutoring in Problem Solving. J. Child. Psychol. Psychiatry 1976, 17, 89–100.
Vygotsky, L. Mind in Society: The Development of Higher Psychological Processes; Harvard University Press: Cambridge, MA, USA, 1978.
Ericsson, K.A.; Prietula, M.J.; Cokely, E.T. The Making of an Expert. Harv. Bus. Rev. 2007, 85, 114–121, 193.
Thesen, T.; Park, S.H. A Generative AI Teaching Assistant for Personalized Learning in Medical Education. npj Digit. Med. 2025, 8, 627.
Cayres Ribeiro, L.M.; Sidorenkov, G.; El-Baz, N.; Vliegenthart, R.; Koopman, M.Y.; Durning, S.J.; de Carvalho Filho, M.A. Generating Synthetic Patient Vignettes from Real Medical Texts for the Teaching of Clinical Reasoning. Med. Teach. 2025, 48, 945–948.
Herrera Montano, I.; Góngora Alonso, S.; Martínez Licort, R.; Sainz de Abajo, B.; de la Torre Díez, I.; Miramontes González, J.P.; Simón Pérez, C.; Briongos Figuero, L.; Corral Gudino, L. Anamnesio_bot: A Chatbot Prototype Based on Generative Artificial Intelligence for Clinical Anamnesis Study and Simulation. In Proceedings of the 20th Iberian Conference on Information Systems and Technologies (CISTI 2025), Lisbon, Portugal, 16–19 June 2025; Rocha, A., García Peñalvo, F., Costa, C.J., Gonçalves, R., Eds.; Springer Nature: Cham, Switzerland, 2026; pp. 132–143.
Khera, R.; Simon, M.A.; Ross, J.S. Automation Bias and Assistive AI: Risk of Harm from AI-Driven Clinical Decision Support. JAMA 2023, 330, 2255–2257.
Zhang, H.; Liao, A.W.X.; Goh, S.H.; Wu, X.V.; Yoong, S.Q. Effectiveness of Peer Teaching in Health Professions Education: A Systematic Review and Meta-Analysis. Nurse Educ. Today 2022, 118, 105499.
Guraya, S.Y.; Abdalla, M.E. Determining the Effectiveness of Peer-Assisted Learning in Medical Education: A Systematic Review and Meta-Analysis. J. Taibah Univ. Med. Sci. 2020, 15, 177–184.
Casner, S.M.; Geven, R.W.; Recker, M.P.; Schooler, J.W. The Retention of Manual Flying Skills in the Automated Cockpit. Hum. Factors 2014, 56, 1506–1516.
Masters, K.; Valanci-Aroesty, S.; Benjamin, J.; Mehta, N.; MacNeill, H. Using Locally-Hosted Small Language Models (SLMs) to Protect Student, Patient and Research Subject Data in Health Professions Education. Med. Teach. 2026, 1–4.

Figure 1. Flowchart of M3RGE-AI (Medical Education with Responsible, Reliable, and Reflexive use of Generative Artificial Intelligence): implementation process for incorporating AI throughout the medical degree program.

Table 1. Mapping M3RGE-AI components to foundational educational theories.

M3RGE-AI Component	Primary Theoretical Basis	How It Protects Clinical Reasoning Development
Staged progression (Preclinical to clinical)	Scaffolding/ZPD + Deliberate practice	By calibrating AI complexity to the learner’s developmental stage, the framework ensures students build independent reasoning foundations before AI assistance is introduced, preventing premature cognitive offloading at critical skill-acquisition moments
Restricted AI Use	Scaffolding/ZPD + Deliberate practice	Prohibiting direct answer generation in early stages forces students to engage in the effortful hypothesis generation and differential reasoning that underpin clinical expertise, while gradually fading restrictions as competence is demonstrated
The “Chief-of-GPT” model	Scaffolding/ZPD + Deliberate practice + PAL/NPT	By designating a peer facilitator to lead critical appraisal of AI outputs, the model transforms individual human–machine interaction into a socially accountable group process, reducing automation bias and anchoring while reinforcing metacognitive habits
Responsible use of AI	Scaffolding/ZPD + Deliberate practice + PAL/NPT	Progressive exposure to increasingly autonomous AI use, within peer-moderated environments, ensures that ethical reasoning and professional accountability are practiced as deliberate skills rather than assumed as byproducts of experience
Reliable use of AI	Scaffolding/ZPD + Deliberate practice + PAL/NPT	Repeated, structured practice in critically appraising AI outputs for accuracy and evidence, supported by peer cross-checking, builds verification as an automatic cognitive habit rather than an optional step
Reflexive use of AI	Scaffolding/ZPD + Deliberate practice + PAL/NPT	Structured individual and peer-led reflection ensures students develop sustained awareness of their own reasoning processes, recognizing when AI outputs should be trusted, challenged, or overridden in clinical context
3T-RAGs Principle	Scaffolding/ZPD + Deliberate practice + PAL/NPT	By requiring AI interactions to be traceable, trustworthy, and thought-provoking, this principle scaffolds source appraisal skills, reduces hallucination risk, and ensures AI outputs serve as starting points for reasoning rather than endpoints

3T-RAGs: Traceable, Trustworthy, Thought-provoking Retrieval-Augmented Generation; AI: Artificial intelligence; M3RGE-AI: Medical Education with Responsible, Reliable, and Reflexive use of Generative Artificial Intelligence; NPT: near-peer teaching; PAL: peer-assisted learning; ZPD: zone of proximal development.
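
The 3T-RAGs row of Table 1 lends itself to a small automated check. The following is a minimal sketch only, not the authors' implementation: the `Claim` record, the `check_3t` function, the idea of a faculty-approved source list, and the "ends with a question" heuristic for thought-provoking output are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One statement in an AI response (hypothetical structure)."""
    text: str
    source_id: Optional[str]  # identifier of the retrieved passage backing the claim


def check_3t(claims, retrieved_ids, approved_ids, follow_up_question):
    """Return the 3T criteria that a response fails (an empty list means it passes)."""
    failures = []
    # Traceable: every claim points to a passage that was actually retrieved.
    if not claims or any(c.source_id not in retrieved_ids for c in claims):
        failures.append("traceable")
    # Trustworthy (assumed proxy): every cited passage is on an approved source list.
    if any(c.source_id is not None and c.source_id not in approved_ids for c in claims):
        failures.append("trustworthy")
    # Thought-provoking (assumed proxy): the response hands a question back to the learner.
    if not follow_up_question.strip().endswith("?"):
        failures.append("thought-provoking")
    return failures
```

A response that fails any criterion could be withheld or flagged for the peer moderator rather than shown to the student; how failures are handled is a local design decision that the framework does not prescribe.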

Table 2. Exploring M3RGE-AI traits across medical degree years.

Phase	Key Educational Risk	AI Integration Approach	Competency Safeguards
Foundational (Years 1–2): basic AI simulations and Socratic tutors	Unsupervised AI use shortcuts history-taking and hypothesis generation before foundational skills are established	Scripted AI patient encounters requiring iterative questioning; direct answer generation prohibited; Socratic prompts guide reasoning	Real-time faculty or peer debriefs; AI withholds diagnosis until students generate independent hypotheses
Intermediate (Years 3–4): hybrid real patient and AI simulation encounters	Over-generalization from limited clinical exposure; rare or atypical presentations under-represented in everyday wards	AI generates focused supplementary scenarios extending real patient encounters; students compare reasoning across contexts	Structured prompts requiring students to contrast AI-suggested and self-generated differential diagnoses
Advanced (Years 5–6): high-fidelity hybrid simulations	Cognitive overload and automation bias under pressure in complex multisystem scenarios	AI combined with manikins and interprofessional teams for crisis scenarios; unexpected variables injected dynamically	Structured debriefs requiring students to justify decisions, reconcile AI suggestions with clinical judgment, and articulate when to override the system
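
The safeguards in Table 2 can be read as a phase-dependent gating rule. The sketch below is illustrative only and rests on assumptions: the `Phase`, `Encounter`, and `tutor_reply` names, the prompt wording, and the simple set comparison of diagnoses are hypothetical and are not taken from the authors' proofs of concept.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    FOUNDATIONAL = 1   # Years 1-2
    INTERMEDIATE = 2   # Years 3-4
    ADVANCED = 3       # Years 5-6


@dataclass
class Encounter:
    phase: Phase
    student_hypotheses: list = field(default_factory=list)


def tutor_reply(encounter, ai_differential):
    """Decide what the tutor may show, given the phase and the student's work so far."""
    student = {h.strip().lower() for h in encounter.student_hypotheses if h.strip()}
    ai = {d.strip().lower() for d in ai_differential if d.strip()}

    # Safeguard: no AI-suggested diagnosis before the student commits to hypotheses.
    if not student:
        return {"mode": "socratic",
                "message": "Which diagnoses would you consider, and what supports each?"}

    # Foundational phase: direct answer generation stays prohibited.
    if encounter.phase is Phase.FOUNDATIONAL:
        return {"mode": "socratic",
                "message": "Which further question would best separate your hypotheses?"}

    # Later phases: show a contrast and leave the appraisal to the student.
    return {
        "mode": "contrast",
        "shared": sorted(student & ai),
        "student_only": sorted(student - ai),
        "ai_only": sorted(ai - student),
    }
```

In this sketch the contrast output lists only the overlap and the differences; the justification of why a diagnosis was kept, added, or rejected is deliberately left to the structured debrief rather than generated by the tool.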

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
