Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges
Abstract
1. Introduction
1.1. General Context
1.2. Rationale
1.3. Emerging Issues and Challenges
1.4. Specific Objectives
2. Materials and Methods
2.1. Literature Search Strategy
- PubMed search string: (("artificial intelligence"[MeSH] OR "artificial intelligence"[Title/Abstract] OR "AI"[Title/Abstract] OR "machine learning"[Title/Abstract] OR "deep learning"[Title/Abstract] OR "Large Language Models"[Title/Abstract] OR "Generative Artificial Intelligence"[Title/Abstract] OR "Generative AI"[Title/Abstract]) AND ("curriculum"[Title/Abstract] OR "training program"[Title/Abstract] OR "simulation"[Title/Abstract] OR "skills training"[Title/Abstract] OR "competency-based education"[Title/Abstract] OR "clinical teaching"[Title/Abstract] OR "medical education"[MeSH] OR "clinical education"[Title/Abstract]) AND ("medical students"[Title/Abstract] OR "residents"[Title/Abstract] OR "fellows"[Title/Abstract] OR "physicians"[Title/Abstract] OR "health personnel"[MeSH])) AND ("2018/01/01"[Date - Publication] : "2024/12/31"[Date - Publication])
- Embase search string: ('artificial intelligence':ti,ab OR 'ai':ti,ab OR 'machine learning':ti,ab OR 'deep learning':ti,ab OR 'large language models':ti,ab OR 'generative artificial intelligence':ti,ab OR 'generative ai':ti,ab) AND ('medical education':ti,ab OR 'curriculum':ti,ab OR 'training program':ti,ab OR 'simulation':ti,ab OR 'skills training':ti,ab OR 'competency-based education':ti,ab OR 'clinical teaching':ti,ab OR 'clinical education':ti,ab) AND ('healthcare professionals':ti,ab OR 'clinicians':ti,ab OR 'medical personnel':ti,ab OR 'medical students':ti,ab OR 'residents':ti,ab OR 'fellows':ti,ab OR 'specialists':ti,ab OR 'physicians':ti,ab) AND [2018-2024]/py
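To make the PubMed search reproducible, the string can be assembled from its three concept blocks (AI terms, educational setting, learner population) and submitted through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the query and the request URL and does not contact the server. The term lists are abbreviated from the full string above. The helper names (`or_block`, `build_query`, `esearch_url`) are hypothetical and do not describe the authors' actual workflow.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Abbreviated concept blocks (a hypothetical subset of the full PubMed string).
AI_TERMS = ['"artificial intelligence"[MeSH]', '"machine learning"[Title/Abstract]',
            '"Large Language Models"[Title/Abstract]']
EDU_TERMS = ['"medical education"[MeSH]', '"simulation"[Title/Abstract]',
             '"curriculum"[Title/Abstract]']
POP_TERMS = ['"medical students"[Title/Abstract]', '"residents"[Title/Abstract]',
             '"health personnel"[MeSH]']


def or_block(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks):
    """Combine concept blocks with AND, mirroring the structure of the search string."""
    return " AND ".join(or_block(b) for b in blocks)


def esearch_url(query, mindate="2018/01/01", maxdate="2024/12/31", retmax=10000):
    """Build an esearch request restricted by publication date (datetype=pdat)."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    q = build_query(AI_TERMS, EDU_TERMS, POP_TERMS)
    print(q)
    print(esearch_url(q))
```

The date limits are passed as `mindate`/`maxdate` parameters rather than embedded in the `term` string. This is an equivalent way to express the publication-date restriction, and it keeps the concept blocks easy to audit.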
2.2. Study Selection and Eligibility Criteria
Inclusion criteria:
- Empirical studies examining AI applications in undergraduate, graduate, or continuing medical education;
- Studies reporting objective educational outcomes (e.g., performance metrics, test scores, skill acquisition);
- Interventions involving AI-based tools as tutors, simulators, evaluators, or diagnostic aids.

Exclusion criteria:
- Studies involving only patients or non-healthcare learners;
- Articles reporting only subjective outcomes or perceptions;
- Editorials, conference abstracts, reviews, or protocols.
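Taken together, the eligibility criteria work as a set of exclusion tests followed by inclusion requirements that must all hold. The minimal sketch below assumes a hypothetical `Record` structure whose fields were coded during title/abstract screening. The field names and category labels are illustrative and are not taken from the review's extraction form.

```python
from dataclasses import dataclass, field

# Publication types excluded by the criteria above.
EXCLUDED_TYPES = {"editorial", "conference abstract", "review", "protocol"}
# Learner groups within undergraduate, graduate, or continuing medical education.
ELIGIBLE_LEARNERS = {"medical student", "resident", "fellow", "physician"}
# Roles an AI tool may play in an eligible intervention.
AI_ROLES = {"tutor", "simulator", "evaluator", "diagnostic aid"}


@dataclass
class Record:
    """Hypothetical screening record for one retrieved article."""
    pub_type: str                      # e.g. "original article", "review"
    empirical: bool                    # reports primary data
    learners: set = field(default_factory=set)
    ai_role: str = ""
    objective_outcome: bool = False    # test scores, performance metrics, skills


def is_eligible(r: Record) -> bool:
    """Apply the exclusion criteria first, then require every inclusion criterion."""
    if r.pub_type in EXCLUDED_TYPES:
        return False
    # Exclude studies whose participants are only patients or non-healthcare learners.
    if not (r.learners & ELIGIBLE_LEARNERS):
        return False
    # Exclude studies reporting only subjective outcomes or perceptions.
    if not r.objective_outcome:
        return False
    return r.empirical and r.ai_role in AI_ROLES


if __name__ == "__main__":
    rct = Record("original article", True, {"medical student"}, "tutor", True)
    survey = Record("original article", True, {"resident"}, "tutor", False)
    print(is_eligible(rct), is_eligible(survey))  # True False
```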
3. Results
3.1. AI as a Tutor and Generator of Educational Content
- Limitations and Challenges in the Use of AI as a Tutor and Generator of Educational Content
- Content reliability: Several studies report conceptual errors, omissions, and incoherent responses generated by LLMs such as ChatGPT, especially in complex clinical contexts [35]. The performance of these systems is highly sensitive to prompt quality and requires expert supervision to ensure accuracy [36].
- Pedagogical risks: The widespread use of AI may promote passive learning and cognitive outsourcing unless accompanied by specific training in AI literacy [29,31,37,38]. In academic contexts, unregulated use of generative tools in writing raises concerns about originality and the development of critical thinking [29,34].
3.2. From Simulation to Practice: Developing Competence with AI
- Limitations and Challenges
- Sample size and generalizability: Several studies rely on extremely small samples, as in the case of Ruberto et al. (n = 4) [44], which limits generalizability.
- Model interpretability: Although advanced models such as CNNs and LSTMs demonstrate high predictive accuracy, their opaque architecture impedes adoption in high-responsibility domains such as surgery and anesthesia, where trust and transparency are critical [40,50,52]. Interpretability techniques such as saliency maps for CNNs aim to mitigate this “black box” problem by visualizing the input features that most influence a model’s decision, but they are not yet standard practice in educational platforms.
- Learner engagement and self-efficacy: Some evidence suggests that AI-led training may reduce learner confidence or perceived competence. Sok et al. reported lower self-efficacy scores in simulations involving AI instructors, and Chang et al. warned against cognitive offloading and automation dependency [50,54].
- Retention and long-term efficacy: The durability of skills acquired through AI-supported training remains uncertain. Liu et al. observed skill decay over time, highlighting the need for periodic reinforcement and longitudinal curriculum strategies [55].
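A gradient-based saliency map scores each input feature by the magnitude of the derivative of the model's output with respect to that feature (the "vanilla gradient" approach). The sketch below shows what such a map computes. It assumes a toy logistic model over a flattened 3×3 "image" with made-up weights. So that it runs on the standard library alone, the gradient is approximated by central finite differences. Real CNN pipelines obtain the same quantity by backpropagation.

```python
import math


def model(x, w, b):
    """Toy differentiable classifier: sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x, w, b, eps=1e-5):
    """|d model / d x_i| for each input pixel, via central finite differences."""
    scores = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad = (model(x_plus, w, b) - model(x_minus, w, b)) / (2 * eps)
        scores.append(abs(grad))
    return scores


def as_grid(values, width):
    """Reshape a flat list into rows so the map can be read as an image."""
    return [values[r:r + width] for r in range(0, len(values), width)]


if __name__ == "__main__":
    # Hypothetical 3x3 input and weights; the centre pixel carries the largest weight.
    x = [0.2, 0.1, 0.0, 0.4, 0.9, 0.3, 0.0, 0.2, 0.1]
    w = [0.1, -0.2, 0.0, 0.3, 2.0, -0.5, 0.0, 0.1, 0.05]
    sal = saliency(x, w, b=-1.0)
    for row in as_grid([round(s, 3) for s in sal], 3):
        print(row)
```

For this linear-sigmoid model the exact gradient is s(1 - s)·w_i, so the map ranks pixels by |w_i|. In a deep CNN, the same procedure shows which image regions drive a prediction. That is the information an educator would need to check whether a model attends to clinically meaningful features.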
3.3. Enhancing Clinical Perception: AI in Diagnostic Training
- Limitations and Challenges
- Algorithmic reliability: Some generative systems, such as DALL·E 3, have demonstrated poor reliability for educational purposes. In one evaluation, over 78% of AI-generated anatomical images for congenital heart disease were judged unsuitable due to structural errors and misleading labels [68]. NLP tools like OSCEBot also show reduced performance in non-scripted or unexpected clinical interactions [64].
- Generalizability and dataset bias: Many AI models perform well in narrowly defined diagnostic domains but fail to generalize beyond their training data. For instance, CNNs used in histological classification struggled with atypical or rare morphologies, limiting training validity [56]. In dermatology and rare disease education, synthetic datasets may under-represent relevant pathologies or demographic groups, introducing bias and limiting realism [62].
- Technical and infrastructural barriers: Effective implementation requires high-quality hardware, validated datasets, and stable platforms—resources not evenly distributed across institutions. Limited digital infrastructure can compromise access and reproducibility, especially in low-resource settings [60,61].
- Ethical concerns and data governance: The use of facial recognition technologies for training in clinical genetics (e.g., DeepGestalt) raises critical questions about biometric data protection, informed consent, and the potential for unintended re-identification. Biases embedded in facial datasets may also propagate into learner judgments [63].
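Dataset bias of this kind is often surfaced by reporting representation and performance separately for each subgroup (e.g., demographic group, or rare versus common morphology) rather than as a single pooled figure. The minimal sketch below assumes hypothetical labelled predictions tagged with a subgroup label. The group names, the toy data, and the 10% representation threshold are illustrative choices, not values from the cited studies.

```python
from collections import defaultdict


def subgroup_report(records, min_share=0.10):
    """Per-subgroup share of the dataset and accuracy.

    records: iterable of (subgroup, true_label, predicted_label) tuples.
    Groups below `min_share` of the data are flagged as under-represented.
    """
    counts = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        counts[group] += 1
        correct[group] += int(y_true == y_pred)
    total = sum(counts.values())
    report = {}
    for group in sorted(counts):
        share = counts[group] / total
        report[group] = {
            "n": counts[group],
            "share": round(share, 3),
            "accuracy": round(correct[group] / counts[group], 3),
            "under_represented": share < min_share,
        }
    return report


if __name__ == "__main__":
    # Toy data: 23 common-morphology cases, 2 rare-morphology cases.
    data = [("common", 1, 1)] * 22 + [("common", 0, 1)] + [("rare", 1, 0), ("rare", 1, 1)]
    for group, stats in subgroup_report(data).items():
        print(group, stats)
```

In the toy data, pooled accuracy looks high (23 of 25 correct), while the rare subgroup is both under-represented and classified at chance level. A single aggregate figure would hide exactly this pattern.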
3.4. Towards Data-Driven Training: AI in Competency Assessment
- Limitations and Challenges
- Generalizability and sample limitations: Many models have been tested on small or homogeneous populations (e.g., medical students only, low-fidelity simulators), limiting external validity. For instance, YOLOv8 and 3D CNNs were validated in highly controlled settings and may not generalize to real clinical tasks [73,74].
- Clinical validation: Although AI tools show high accuracy in simulated environments, few studies have demonstrated translation to real-world clinical outcomes. The connection between performance improvements in VR and patient-level results remains tenuous [75].
- Technological and cognitive burden: High technical complexity and limited digital familiarity can inhibit learner engagement. Meade et al. observed reduced participation and course retention due to initial intimidation regarding AI concepts [81].
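The gap between performance in controlled settings and in new ones is usually quantified by evaluating the same model on an internal test set and on an external one (a different site, simulator, or learner population). The sketch below computes AUROC with the rank-based (Mann-Whitney) formulation and compares the two sets. The labels and scores are invented toy values standing in for a skill classifier's outputs, not data from the cited studies.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U statistic divided by n_pos * n_neg).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # 1 = expert, 0 = novice; hypothetical model scores on two test sets.
    internal_y = [1, 1, 1, 0, 0, 0]
    internal_s = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
    external_y = [1, 1, 1, 0, 0, 0]
    external_s = [0.7, 0.4, 0.6, 0.5, 0.2, 0.65]
    auc_int = auroc(internal_y, internal_s)
    auc_ext = auroc(external_y, external_s)
    print(f"internal AUROC = {auc_int:.2f}, external AUROC = {auc_ext:.2f}, "
          f"drop = {auc_int - auc_ext:.2f}")
```

The pairwise loop is O(n_pos × n_neg), which is fine for small validation sets. For large ones, a sort-based rank computation gives the same value more efficiently.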
4. Discussion
4.1. Summary of Evidence: Where AI Impacts Medical Education
4.2. Effectiveness, Methodological Rigor, and Epistemic Boundaries
4.3. Systemic Barriers: Technology, Pedagogy, and Ethics
4.4. Future Perspectives: Towards Responsible and Integrated Adoption
- Supervised Hybrid Models (Human-in-the-Loop): AI should enhance rather than replace educational interactions, providing automated feedback that experts then validate and interpret [72].
- Multicenter and Longitudinal Evaluations: Large-scale studies with clinical impact measures are essential to move beyond the exploratory phase and generate transferable evidence. Academic hospitals are well suited as testbeds for such integrated models because they combine education, clinical care, and research.
- AI Literacy for Learners and Educators: Medical curricula should include foundational modules on ML principles, digital ethics, and critical appraisal of algorithmic outputs. Without such training, there is a risk of creating passive users rather than critically engaged professionals [81].
- Standardization and Interoperability: There is a pressing need for shared benchmarks, validated datasets, and interoperable systems to support algorithmic transparency and reliable cross-institutional implementation. This includes developing reference metrics for procedural and diagnostic competencies.
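In the supervised hybrid model, the AI component drafts feedback, but the learner receives it only after an expert has approved or corrected it. The sketch below illustrates that gating rule. The `Feedback` class, its status values, the function names, and the placeholder feedback texts are all hypothetical and are not drawn from any tool in the reviewed studies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """AI-drafted feedback item awaiting expert review (hypothetical schema)."""
    learner_id: str
    ai_text: str
    status: str = "pending"            # pending -> approved | corrected | rejected
    expert_text: Optional[str] = None  # expert correction, if any


def review(item: Feedback, approve: bool, correction: Optional[str] = None) -> Feedback:
    """Record the expert decision; a correction overrides the AI draft."""
    if correction is not None:
        item.status, item.expert_text = "corrected", correction
    else:
        item.status = "approved" if approve else "rejected"
    return item


def releasable(items):
    """Return only expert-validated text, never raw AI drafts."""
    out = []
    for item in items:
        if item.status == "approved":
            out.append((item.learner_id, item.ai_text))
        elif item.status == "corrected":
            out.append((item.learner_id, item.expert_text))
    return out


if __name__ == "__main__":
    queue = [
        Feedback("s01", "AI draft: instrument path inefficient"),
        Feedback("s02", "AI draft: knot tension adequate"),
        Feedback("s03", "AI draft: task completed"),
    ]
    review(queue[0], approve=True)
    review(queue[1], approve=False, correction="Expert: knot slipped; repeat the tie")
    # queue[2] is still pending and is therefore withheld from the learner.
    print(releasable(queue))
```

The design choice here is fail-closed: pending and rejected drafts are simply never released. Expert oversight therefore does not depend on anyone remembering to block an item.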
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
AI-TIG | Artificial Intelligence Text-to-Image Generation |
ANOVA | Analysis of Variance |
AUC | Area Under the Curve |
AUROC | Area Under the Receiver Operating Characteristic Curve |
CAD | Computer-Aided Diagnostic |
CCEP | Clinical Cardiac Electrophysiology |
CHD | Congenital Heart Disease |
CHDs | Congenital Heart Diseases |
CNN | Convolutional Neural Network |
cPOCUS | Cardiac Point-Of-Care Ultrasound |
DCNN | Deep Convolutional Neural Network |
DL | Deep Learning |
DNN | Deep Neural Network |
GANs | Generative Adversarial Networks |
GDPR | General Data Protection Regulation |
GEARS | Global Evaluative Assessment of Robotic Skills |
LLMs | Large Language Models |
LSTM | Long Short-Term Memory |
MCRDR | Multiple Classification Ripple Down Rules |
MeSH | Medical Subject Headings |
ML | Machine Learning |
Mocap | Motion Capture |
NEC | Necrotizing Enterocolitis |
NICU | Neonatal Intensive Care Unit |
NLP | Natural Language Processing |
OSATS | Objective Structured Assessment of Technical Skills |
OSCE | Objective Structured Clinical Examination |
SBERT | Sentence-BERT |
SUS | System Usability Scale |
SVM | Support Vector Machine |
VR | Virtual Reality |
WSI | Whole Slide Images |
References
1. Hallquist, E.; Gupta, I.; Montalbano, M.; Loukas, M. Applications of Artificial Intelligence in Medical Education: A Systematic Review. Cureus 2025, 17, e79878.
2. Gordon, M.; Daniel, M.; Ajiboye, A.; Uraiby, H.; Xu, N.Y.; Bartlett, R.; Hanson, J.; Haas, M.; Spadafore, M.; Grafton-Clarke, C.; et al. A Scoping Review of Artificial Intelligence in Medical Education: BEME Guide No. 84. Med. Teach. 2024, 46, 446–470.
3. Nagi, F.; Salih, R.; Alzubaidi, M.; Shah, H.; Alam, T.; Shah, Z.; Househ, M. Applications of Artificial Intelligence (AI) in Medical Education: A Scoping Review. Stud. Health Technol. Inform. 2023, 305, 648–651.
4. Shaw, K.; Henning, M.A.; Webster, C.S. Artificial Intelligence in Medical Education: A Scoping Review of the Evidence for Efficacy and Future Directions. Med. Sci. Educ. 2025, 35, 1803–1816.
5. Younis, H.A.; Eisa, T.A.E.; Nasser, M.; Sahib, T.M.; Noor, A.A.; Alyasiri, O.M.; Salisu, S.; Hayder, I.M.; Younis, H.A. A Systematic Review and Meta-Analysis of Artificial Intelligence Tools in Medicine and Healthcare: Applications, Considerations, Limitations, Motivation and Challenges. Diagnostics 2024, 14, 109.
6. Giansanti, D.; Pirrera, A. Integrating AI and Assistive Technologies in Healthcare: Insights from a Narrative Review of Reviews. Healthcare 2025, 13, 556.
7. Lee, J.; Wu, A.S.; Li, D.; Kulasegaram, K.M. Artificial Intelligence in Undergraduate Medical Education: A Scoping Review. Acad. Med. 2021, 96, S62–S70.
8. Kovalainen, T.; Pramila-Savukoski, S.; Kuivila, H.-M.; Juntunen, J.; Jarva, E.; Rasi, M.; Mikkonen, K. Utilising Artificial Intelligence in Developing Education of Health Sciences Higher Education: An Umbrella Review of Reviews. Nurs. Educ. Today 2025, 147, 106600.
9. Feigerlova, E.; Hani, H.; Hothersall-Davies, E. A Systematic Review of the Impact of Artificial Intelligence on Educational Outcomes in Health Professions Education. BMC Med. Educ. 2025, 25, 129.
10. Batista, J.; Mesquita, A.; Carnaz, G. Generative AI and Higher Education: Trends, Challenges, and Future Directions from a Systematic Literature Review. Information 2024, 15, 676.
11. Al-kfairy, M.; Mustafa, D.; Kshetri, N.; Insiew, M.; Alfandi, O. Ethical Challenges and Solutions of Generative AI: An Interdisciplinary Perspective. Informatics 2024, 11, 58.
12. Mohammad Amini, M.; Jesus, M.; Fanaei Sheikholeslami, D.; Alves, P.; Hassanzadeh Benam, A.; Hariri, F. Artificial Intelligence Ethics and Challenges in Healthcare Applications: A Comprehensive Review in the Context of the European GDPR Mandate. Mach. Learn. Knowl. Extr. 2023, 5, 1023–1035.
13. van Kolfschooten, H.B. A Health-Conformant Reading of the GDPR’s Right Not to Be Subject to Automated Decision-Making. Med. Law Rev. 2024, 32, 373–391.
14. Gilbert, F.J.; Palmer, J.; Woznitza, N.; Nash, J.; Brackstone, C.; Faria, L.; Dunbar, J.K.; Hogg, H.D.J.; Liu, X.; Denniston, A.K. Data and Data Privacy Impact Assessments in the Context of AI Research and Practice in the UK. Front. Health Serv. 2025, 5, 1525955.
15. Garcia, P.E.; Marques, F.C. Issues and Limitations on the Integration of Artificial Intelligence into Medical Education: A Narrative Review. Educ. Sci. 2024, 14, 379.
16. Barrera Castro, G.P.; Chiappe, A.; Ramírez-Montoya, M.S.; Alcántar Nieblas, C. Key Barriers to Personalized Learning in Times of Artificial Intelligence: A Literature Review. Appl. Sci. 2025, 15, 3103.
17. Lalova-Spinks, T.; Valcke, P.; Ioannidis, J.P.A.; Huys, I. EU–US Data Transfers: An Enduring Challenge for Health Research Collaborations. NPJ Digit. Med. 2024, 7, 215.
18. Falagas, M.E.; Pitsouni, E.I.; Malietzis, G.A.; Pappas, G. Comparison of PubMed, Scopus, Web of Science, and Google Scholar: Strengths and Weaknesses. FASEB J. 2008, 22, 338–342.
19. Laohawetwanit, T.; Apornvirat, S.; Kantasiripitak, C. ChatGPT as a Teaching Tool: Preparing Pathology Residents for Board Examination with AI-Generated Digestive System Pathology Tests. Am. J. Clin. Pathol. 2024, 162, 471–479.
20. Gan, W.; Ouyang, J.; Li, H.; Xue, Z.; Zhang, Y.; Dong, Q.; Huang, J.; Zheng, X.; Zhang, Y. Integrating ChatGPT in Orthopedic Education for Medical Undergraduates: Randomized Controlled Trial. J. Med. Internet Res. 2024, 26, e57037.
21. Brügge, E.; Ricchizzi, S.; Arenbeck, M.; Keller, M.N.; Schur, L.; Stummer, W.; Holling, M.; Lu, M.H.; Darici, D. Large Language Models Improve Clinical Decision Making of Medical Students through Patient Simulation and Structured Feedback: A Randomized Controlled Trial. BMC Med. Educ. 2024, 24, 1391.
22. Yamamoto, A.; Koda, M.; Ogawa, H.; Miyoshi, T.; Maeda, Y.; Otsuka, F.; Ino, H. Enhancing Medical Interview Skills Through AI-Simulated Patient Interactions: Nonrandomized Controlled Trial. JMIR Med. Educ. 2024, 10, e58753.
23. Zheng, K.; Shen, Z.; Chen, Z.; Che, C.; Zhu, H. Application of AI-Empowered Scenario-Based Simulation Teaching Mode in Cardiovascular Disease Education. BMC Med. Educ. 2024, 24, 1003.
24. Aster, A.; Hütt, C.; Morton, C.; Flitton, M.; Laupichler, M.C.; Raupach, T. Development and Evaluation of an Emergency Department Serious Game for Undergraduate Medical Students. BMC Med. Educ. 2024, 24, 1061.
25. Liaw, S.Y.; Tan, J.Z.; Lim, S.; Zhou, W.; Yap, J.; Ratan, R.; Ooi, S.L.; Wong, S.J.; Seah, B.; Chua, W.L. Artificial Intelligence in Virtual Reality Simulation for Interprofessional Communication Training: Mixed Method Study. Nurs. Educ. Today 2023, 122, 105718.
26. Cianciolo, A.T.; LaVoie, N.; Parker, J. Machine Scoring of Medical Students’ Written Clinical Reasoning: Initial Validity Evidence. Acad. Med. 2021, 96, 1026–1035.
27. Su, J.-M.; Hsu, S.-Y.; Fang, T.-Y.; Wang, P.-C. Developing and Validating a Knowledge-Based AI Assessment System for Learning Clinical Core Medical Knowledge in Otolaryngology. Comput. Biol. Med. 2024, 178, 108765.
28. Brutschi, R.; Wang, R.; Kolbe, M.; Weiss, K.; Lohmeyer, Q.; Meboldt, M. Speech Recognition Technology for Assessing Team Debriefing Communication and Interaction Patterns: An Algorithmic Toolkit for Healthcare Simulation Educators. Adv. Simul. 2024, 9, 42.
29. Wang, J.; Liao, Y.; Liu, S.; Zhang, D.; Wang, N.; Shu, J.; Wang, R. The Impact of Using ChatGPT on Academic Writing Among Medical Undergraduates. Ann. Med. 2024, 56, 2426760.
30. Wiggins, W.F.; Caton, M.T., Jr.; Magudia, K.; Rosenthal, M.H.; Andriole, K.P. A Conference-Friendly, Hands-On Introduction to Deep Learning for Radiology Trainees. J. Digit. Imaging 2021, 34, 1026–1033.
31. Krive, J.; Isola, M.; Chang, L.; Patel, T.; Anderson, M.; Sreedhar, R. Grounded in Reality: Artificial Intelligence in Medical Education. JAMIA Open 2023, 6, ooad037.
32. Furlan, R.; Gatti, M.; Menè, R.; Shiffer, D.; Marchiori, C.; Giaj Levra, A.; Saturnino, V.; Brunetta, E.; Dipaola, F. A Natural Language Processing-Based Virtual Patient Simulator and Intelligent Tutoring System for the Clinical Diagnostic Process: Simulator Development and Case Study. JMIR Med. Inform. 2021, 9, e24073.
33. Wang, M.; Sun, Z.; Jia, M.; Wang, Y.; Wang, H.; Zhu, X.; Chen, L.; Ji, H. Intelligent Virtual Case Learning System Based on Real Medical Records and Natural Language Processing. BMC Med. Inform. Decis. Mak. 2022, 22, 60.
34. Desseauve, D.; Lescar, R.; de la Fourniere, B.; Ceccaldi, P.F.; Dziadzko, M. AI in obstetrics: Evaluating residents’ capabilities and interaction strategies with ChatGPT. Eur. J. Obstet. Gynecol. Reprod. Biol. 2024, 302, 238–241.
35. Scherr, R.; Halaseh, F.F.; Spina, A.; Andalib, S.; Rivera, R. ChatGPT Interactive Medical Simulations for Early Clinical Education: Case Study. JMIR Med. Educ. 2023, 9, e49877.
36. Saluja, S.; Tigga, S.R. Capabilities and Limitations of ChatGPT in Anatomy Education: An Interaction with ChatGPT. Cureus 2024, 16, e69000.
37. Veras, M.; Dyer, J.O.; Shannon, H.; Bogie, B.J.M.; Ronney, M.; Sekhon, H.; Rutherford, D.; Silva, P.G.B.; Kairy, D. A Mixed Methods Crossover Randomized Controlled Trial Exploring the Experiences, Perceptions, and Usability of Artificial Intelligence (ChatGPT) in Health Sciences Education. Digit. Health 2024, 10, 20552076241298485.
38. Xie, Y.; Seth, I.; Hunter-Smith, D.J.; Rozen, W.M.; Seifman, M.A. Investigating the Impact of Innovative AI Chatbot on Post-Pandemic Medical Education and Clinical Assistance: A Comprehensive Analysis. ANZ J. Surg. 2024, 94, 68–77.
39. Siyar, S.; Azarnoush, H.; Rashidi, S.; Winkler-Schwartz, A.; Bissonnette, V.; Ponnudurai, N.; Del Maestro, R.F. Machine Learning Distinguishes Neurosurgical Skill Levels in a Virtual Reality Tumor Resection Task. Med. Biol. Eng. Comput. 2020, 58, 1357–1367.
40. Alkadri, S.; Ledwos, N.; Mirchi, N.; Reich, A.; Yilmaz, R.; Driscoll, M.; Del Maestro, R.F. Utilizing a Multilayer Perceptron Artificial Neural Network to Assess a Virtual Reality Surgical Procedure. Comput. Biol. Med. 2021, 136, 104770.
41. Simmonds, C.; Brentnall, M.; Lenihan, J. Evaluation of a Novel Universal Robotic Surgery Virtual Reality Simulation Proficiency Index That Will Allow Comparisons of Users across Any Virtual Reality Simulation Curriculum. Surg. Endosc. 2021, 35, 5867–5875.
42. Radi, I.; Tellez, J.C.; Alterio, R.E.; Scott, D.J.; Sankaranarayanan, G.; Nagaraj, M.B.; Hogg, M.E.; Zeh, H.J.; Polanco, P.M. Feasibility, Effectiveness and Transferability of a Novel Mastery-Based Virtual Reality Robotic Training Platform for General Surgery Residents. Surg. Endosc. 2022, 36, 7279–7287.
43. Fazlollahi, A.M.; Bakhaidar, M.; Alsayegh, A.; Yilmaz, R.; Winkler-Schwartz, A.; Mirchi, N.; Langleben, I.; Ledwos, N.; Sabbagh, A.J.; Bajunaid, K.; et al. Effect of Artificial Intelligence Tutoring vs. Expert Instruction on Learning Simulated Surgical Skills Among Medical Students: A Randomized Clinical Trial. JAMA Netw. Open 2022, 5, e2149008.
44. Ruberto, A.J.; Rodenburg, D.; Ross, K.; Sarkar, P.; Hungler, P.C.; Etemad, A.; Howes, D.; Clarke, D.; McLellan, J.; Wilson, D.; et al. The Future of Simulation-Based Medical Education: Adaptive Simulation Utilizing a Deep Multitask Neural Network. AEM Educ. Train. 2021, 5, e10605.
45. Cai, N.; Wang, G.; Xu, L.; Zhou, Y.; Chong, H.; Zhao, Y.; Wang, J.; Yan, W.; Zhang, B.; Liu, N. Examining the Impact of Perceptual Learning Artificial-Intelligence-Based on the Incidence of Paresthesia When Performing the Ultrasound-Guided Popliteal Sciatic Block: Simulation-Based Randomized Study. BMC Anesthesiol. 2022, 22, 392.
46. Yovanoff, M.A.; Chen, H.E.; Pepley, D.F.; Mirkin, K.A.; Han, D.C.; Moore, J.Z.; Miller, S.R. Investigating the Effect of Simulator Functional Fidelity and Personalized Feedback on Central Venous Catheterization Training. J. Surg. Educ. 2018, 75, 1410–1421.
47. Ledwos, N.; Mirchi, N.; Yilmaz, R.; Winkler-Schwartz, A.; Sawni, A.; Fazlollahi, A.M.; Bissonnette, V.; Bajunaid, K.; Sabbagh, A.J.; Del Maestro, R.F. Assessment of Learning Curves on a Simulated Neurosurgical Task Using Metrics Selected by Artificial Intelligence. J. Neurosurg. 2022, 137, 1160–1171.
48. Di Mitri, D.; Schneider, J.; Specht, M.; Drachsler, H. Detecting Mistakes in CPR Training with Multimodal Data and Neural Networks. Sensors 2019, 19, 3099.
49. Melnyk, R.; Campbell, T.; Holler, T.; Cameron, K.; Saba, P.; Witthaus, M.W.; Joseph, J.; Ghazi, A. See Like an Expert: Gaze-Augmented Training Enhances Skill Acquisition in a Virtual Reality Robotic Suturing Task. J. Endourol. 2021, 35, 376–382.
50. Liaw, S.Y.; Tan, J.Z.; Bin Rusli, K.D.; Ratan, R.; Zhou, W.; Lim, S.; Lau, T.C.; Seah, B.; Chua, W.L. Artificial Intelligence Versus Human-Controlled Doctor in Virtual Reality Simulation for Sepsis Team Training: Randomized Controlled Study. J. Med. Internet Res. 2023, 25, e47748.
51. Riaño, D.; Real, F.; Alonso, J.R. Improving Resident’s Skills in the Management of Circulatory Shock with a Knowledge-Based E-Learning Tool. Int. J. Med. Inform. 2018, 113, 49–55.
52. Hamilton, B.C.; Dairywala, M.I.; Highet, A.; Nguyen, T.C.; O’Sullivan, P.; Chern, H.; Soriano, I.S. Artificial Intelligence Based Real-Time Video Ergonomic Assessment and Training Improves Resident Ergonomics. Am. J. Surg. 2023, 226, 741–746.
53. Hershberger, P.J.; Pei, Y.; Bricker, D.A.; Crawford, T.N.; Shivakumar, A.; Castle, A.; Conway, K.; Medaramitta, R.; Rechtin, M.; Wilson, J.F. Motivational Interviewing Skills Practice Enhanced with Artificial Intelligence: ReadMI. BMC Med. Educ. 2024, 24, 237.
54. Chang, J.; Bliss, L.; Angelov, N.; Glick, A. Artificial Intelligence-Assisted Full-Mouth Radiograph Mounting in Dental Education. J. Dent. Educ. 2024, 88, 933–939.
55. Liu, S.; Watkins, K.; Hall, C.E.; Liu, Y.; Lee, S.H.; Papandria, D.; Delman, K.A.; Srinivasan, J.; Patel, A.; Davis, S.S.; et al. Utilizing Simulation to Evaluate Robotic Skill Acquisition and Learning Decay. Surg. Laparosc. Endosc. Percutan. Tech. 2023, 33, 317–323.
56. Barui, S.; Sanyal, P.; Rajmohan, K.S.; Panigrahi, A.; Kundu, R. Perception without Preconception: Comparison between the Human and Machine Learner in Recognition of Tissues from Histological Sections. Sci. Rep. 2022, 12, 16420.
57. Cheng, C.T.; Chen, C.C.; Fu, C.Y.; Chaou, C.H.; Wu, Y.T.; Hsu, C.P.; Chang, C.C.; Chung, I.F.; Hsieh, C.H.; Hsieh, M.J.; et al. Artificial Intelligence-Based Education Assists Medical Students’ Interpretation of Hip Fracture. Insights Imaging 2020, 11, 119.
58. Aronovitz, N.; Hazan, I.; Jedwab, R.; Ben Shitrit, I.; Quinn, A.; Wacht, O.; Fuchs, L. The Effect of Real-Time EF Automatic Tool on Cardiac Ultrasound Performance among Medical Students. PLoS ONE 2024, 19, e0299461.
59. Lei, T.; Zheng, Q.; Feng, J.; Zhang, L.; Zhou, Q.; He, M.; Lin, M.; Xie, H.N. Enhancing Trainee Performance in Obstetric Ultrasound through an Artificial Intelligence System: Randomized Controlled Trial. Ultrasound Obstet. Gynecol. 2024, 64, 453–462.
60. Fang, Z.; Xu, Z.; He, X.; Han, W. Artificial Intelligence-Based Pathologic Myopia Identification System in the Ophthalmology Residency Training Program. Front. Cell Dev. Biol. 2022, 10, 1053079.
61. Yang, Y.; Xian, D.; Yu, L.; Kong, Y.; Lv, H.; Huang, L.; Liu, K.; Zhang, H.; Wei, W.; Tang, H. Integration of AI-Assisted in Digital Cervical Cytology Training: A Comparative Study. Cytopathology 2024, 36, 156–164.
62. Tabuchi, H.; Nakajima, I.; Day, M.; Masumoto, H.; Tsuji, S.; Miki, M.; Enno, H.; Masumoto, K. Comparative Educational Effectiveness of AI Generated Images and Traditional Lectures for Diagnosing Chalazion and Sebaceous Carcinoma. Sci. Rep. 2024, 14, 29200.
63. Marwaha, A.; Chitayat, D.; Meyn, M.S.; Mendoza-Londono, R.; Chad, L. The Point-of-Care Use of a Facial Phenotyping Tool in the Genetics Clinic: Enhancing Diagnosis and Education with Machine Learning. Am. J. Med. Genet. A 2021, 185, 1151–1158.
64. Pereira, D.S.M.; Falcão, F.; Nunes, A.; Santos, N.; Costa, P.; Pêgo, J.M. Designing and Building OSCEBot® for Virtual OSCE—Performance Evaluation. Med. Educ. Online 2023, 28, 2228550.
65. Yang, W.; Hebert, D.; Kim, S.; Kang, B. MCRDR Knowledge-Based 3D Dialogue Simulation in Clinical Training and Assessment. J. Med. Syst. 2019, 43, 200.
66. Ramgopal, S.; Varma, S.; Gorski, J.K.; Kester, K.M.; Shieh, A.; Suresh, S. Evaluation of a Large Language Model on the American Academy of Pediatrics’ PREP Emergency Medicine Question Bank. Pediatr. Emerg. Care 2024, 40, 871–875.
67. Berbenyuk, A.; Powell, L.; Zary, N. Feasibility and Educational Value of Clinical Cases Generated Using Large Language Models. Stud. Health Technol. Inform. 2024, 316, 1524–1528.
68. Temsah, M.H.; Alhuzaimi, A.N.; Almansour, M.; Aljamaan, F.; Alhasan, K.; Batarfi, M.A.; Altamimi, I.; Alharbi, A.; Alsuhaibani, A.A.; Alwakeel, L.; et al. Art or Artifact: Evaluating the Accuracy, Appeal, and Educational Value of AI-Generated Imagery in DALL·E 3 for Illustrating Congenital Heart Diseases. J. Med. Syst. 2024, 48, 54.
69. Yang, J.H.; Goodman, E.D.; Dawes, A.J.; Gahagan, J.V.; Esquivel, M.M.; Liebert, C.A.; Kin, C.; Yeung, S.; Gurland, B.H. Using AI and Computer Vision to Analyze Technical Proficiency in Robotic Surgery. Surg. Endosc. 2023, 37, 3010–3017.
70. Bissonnette, V.; Mirchi, N.; Ledwos, N.; Alsidieri, G.; Winkler-Schwartz, A.; Del Maestro, R.F.; Neurosurgical Simulation & Artificial Intelligence Learning Centre. Artificial Intelligence Distinguishes Surgical Training Levels in a Virtual Reality Spinal Task. J. Bone Jt. Surg. Am. 2019, 101, e127.
71. Gleason, A.; Servais, E.; Quadri, S.; Manganiello, M.; Cheah, Y.L.; Simon, C.J.; Preston, E.; Graham-Stephenson, A.; Wright, V. Developing Basic Robotic Skills Using Virtual Reality Simulation and Automated Assessment Tools: A Multidisciplinary Robotic Virtual Reality-Based Curriculum Using the Da Vinci Skills Simulator and Tracking Progress with the Intuitive Learning Platform. J. Robot. Surg. 2022, 16, 1313–1319.
72. Fazlollahi, A.M.; Yilmaz, R.; Winkler-Schwartz, A.; Mirchi, N.; Ledwos, N.; Bakhaidar, M.; Alsayegh, A.; Del Maestro, R.F. AI in Surgical Curriculum Design and Unintended Outcomes for Technical Competencies in Simulation Training. JAMA Netw. Open 2023, 6, e2334658.
73. Smith, R.; Julian, D.; Dubin, A. Deep Neural Networks Are Effective Tools for Assessing Performance during Surgical Training. J. Robot. Surg. 2022, 16, 559–562.
74. Nagaraj, M.B.; Namazi, B.; Sankaranarayanan, G.; Scott, D.J. Developing Artificial Intelligence Models for Medical Student Suturing and Knot-Tying Video-Based Assessment and Coaching. Surg. Endosc. 2023, 37, 402–411.
75. Laverde, R.; Rueda, C.; Amado, L.; Rojas, D.; Altuve, M. Artificial Neural Network for Laparoscopic Skills Classification Using Motion Signals from Apple Watch. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2018, 2018, 5434–5437.
76. Bogar, P.Z.; Virag, M.; Bene, M.; Hardi, P.; Matuz, A.; Schlegl, A.T.; Toth, L.; Molnar, F.; Nagy, B.; Rendeki, S.; et al. Validation of a Novel, Low-Fidelity Virtual Reality Simulator and an Artificial Intelligence Assessment Approach for Peg Transfer Laparoscopic Training. Sci. Rep. 2024, 14, 16702.
77. Mears, J.; Kaleem, S.; Panchamia, R.; Kamel, H.; Tam, C.; Thalappillil, R.; Murthy, S.; Merkler, A.E.; Zhang, C.; Ch’ang, J.H. Leveraging the Capabilities of AI: Novice Neurology-Trained Operators Performing Cardiac POCUS in Patients with Acute Brain Injury. Neurocrit. Care 2024, 41, 523–532.
78. Mistro, M.; Sheng, Y.; Ge, Y.; Kelsey, C.R.; Palta, J.R.; Cai, J.; Wu, Q.; Yin, F.F.; Wu, Q.J. Knowledge Models as Teaching Aid for Training Intensity Modulated Radiation Therapy Planning: A Lung Cancer Case Study. Front. Artif. Intell. 2020, 3, 66.
79. Li, J.; Zong, H.; Wu, E.; Wu, R.; Peng, Z.; Zhao, J.; Yang, L.; Xie, H.; Shen, B. Exploring the Potential of Artificial Intelligence to Enhance the Writing of English Academic Papers by Non-Native English-Speaking Medical Students—The Educational Application of ChatGPT. BMC Med. Educ. 2024, 24, 736.
80. Xin, L.; Bin, Z.; Xiaoqin, D.; Wenjing, H.; Yuandong, L.; Jinyu, Z.; Chen, Z.; Lin, W. Detecting Task Difficulty of Learners in Colonoscopy: Evidence from Eye-Tracking. J. Eye Mov. Res. 2021, 14, 5.
81. Meade, S.M.; Salas-Vega, S.; Nagy, M.R.; Sundar, S.J.; Steinmetz, M.P.; Benzel, E.C.; Habboub, G. A Pilot Remote Curriculum to Enhance Resident and Medical Student Understanding of Machine Learning in Healthcare. World Neurosurg. 2023, 180, e142–e148.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Roveta, A.; Castello, L.M.; Massarino, C.; Francese, A.; Ugo, F.; Maconi, A. Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges. AI 2025, 6, 227. https://doi.org/10.3390/ai6090227
Roveta A, Castello LM, Massarino C, Francese A, Ugo F, Maconi A. Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges. AI. 2025; 6(9):227. https://doi.org/10.3390/ai6090227
Chicago/Turabian Style: Roveta, Annalisa, Luigi Mario Castello, Costanza Massarino, Alessia Francese, Francesca Ugo, and Antonio Maconi. 2025. "Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges" AI 6, no. 9: 227. https://doi.org/10.3390/ai6090227
APA Style: Roveta, A., Castello, L. M., Massarino, C., Francese, A., Ugo, F., & Maconi, A. (2025). Artificial Intelligence in Medical Education: A Narrative Review on Implementation, Evaluation, and Methodological Challenges. AI, 6(9), 227. https://doi.org/10.3390/ai6090227