1. Introduction
Medicine is a vast and ever-evolving field that places increasingly heavy demands on medical students [1,2]. Because of the large volume of knowledge that must be acquired, retained, and applied, students must dedicate considerable time to memorizing and understanding diverse material [3,4]. Medical students spend an average of 7.8 h per weekday on academic activities and an additional 4.9 h on weekends [5]. This time commitment often drives both students and educators to seek innovative methods that enhance learning efficiency and retention [6].
The application of generative artificial intelligence (genAI) in medical imaging is a rapidly advancing field, with significant focus on developing algorithms for automated image analysis, disease detection, and diagnostic support [7,8,9,10]. However, alongside the development of AI as an analyst, there is a critical, complementary need to enhance the abilities of human clinicians who remain central to the diagnostic process. These clinicians must interpret complex images, often working in collaboration with or validating AI findings, requiring interpretive skills learned during their training and practice [11]. One emerging approach that can contribute to developing these crucial human skills is the use of genAI in medical education [12].
The integration of genAI into medical education has opened new avenues for creating engaging and personalized learning materials [12]. Image-generating AI, such as DALLE-3, Stable Diffusion, and Midjourney, can produce high-quality images and visual aids that have the potential to enhance the learning experience [13]. Crucially, in the context of medical image interpretation, these tools offer novel ways to represent complex diagnostic information, potentially accelerating the development of pattern recognition skills essential for accurate analysis by future practitioners [14,15]. Recent studies have highlighted the potential of AI-generated content to improve educational outcomes by providing interactive and visually appealing resources [13,16]. Furthermore, we have shown that the use of AI-generated videos can improve student understanding and retention of medical topics [17]. Generative AI also allows for the customization of educational content to meet individual learning needs, making it a valuable tool in personalized education [18].
Despite the promising potential of genAI, there are several challenges and concerns associated with its use in medical education [19]. One significant issue is the accuracy and reliability of AI-generated images. Medical education requires highly accurate and precise visuals to ensure that students learn correct information. However, genAI models often produce images that lack the necessary anatomical or clinical accuracy, leading to potential misunderstandings [20]. While genAI applications involving direct image analysis are still in their relative infancy, educational tools leveraging genAI’s generative capabilities can adopt different strategies, particularly when the goal is to enhance the human learning process related to image interpretation [21]. Additionally, there are concerns about the integration of AI-generated images into existing curricula, as educators may be hesitant to adopt new technologies without clear evidence of their effectiveness and reliability [22,23]. To address these challenges, and recognizing that proficient human interpretation remains a cornerstone of clinical practice even in the age of genAI analysis, we propose the use of genAI to create mnemonic images rather than strictly medically accurate images. This approach focuses genAI’s capabilities on enhancing cognitive processes (memory and association) that are fundamental to learning how to interpret complex visual medical data, such as electrocardiogram (ECG) readings.
Mnemonic techniques, considered the art of memory, can be useful for learning difficult and complex information [24]. These techniques involve transforming hard-to-remember material into something more memorable [25]. Visual mnemonics, in particular, aid in recalling abstract or complex information and facilitate both the sequential and immediate retrieval of memorized material [26]. Numerous studies have demonstrated the effectiveness of pictorial mnemonics in improving the recall of factual knowledge, supporting long-term memory retention in college students, and enhancing students’ memory for important textual information integrated by an underlying central theme [27]. In medical education, where the breadth and depth of required knowledge are extensive, memory tools like mnemonics can significantly augment the learning process [28]. This is evident in the widespread use of visual learning platforms such as Sketchy Medical and Picmonic, which have become favored study sources among medical students [29,30]. Mnemonic strategies have proven to be invaluable in equipping students with materials for acquiring straightforward, but nonetheless not easily remembered, facts and information [24].
The problem addressed in this study is the difficulty medical students face in retaining complex information about coronary artery occlusions using traditional ECG diagrams, even though localizing occlusions is a critical skill in medical image interpretation. Diagnosing an ST-elevation myocardial infarction (STEMI), a type of heart attack, involves first identifying ST-segment elevations, often accompanied by reciprocal ST-segment depressions, and then determining which ECG leads show these changes to localize the affected coronary artery. Our intervention involved using genAI (DALLE-3) to create mnemonic-based images overlaid on a 12-lead ECG. These mnemonic overlays are not designed to detect ST-segment elevations in the waveform but to facilitate the correlation of ST-segment elevations in specific leads with the underlying coronary artery territories they represent. By focusing on mnemonics, the AI-generated images can leverage the strengths of visual memory aids while circumventing the need for precise anatomical accuracy. This approach can enhance the learning experience by making complex information more memorable and easier to recall.
The aim of our research was to evaluate the effectiveness of these mnemonic images in enhancing long-term retention and student preference. We conducted a comparative study with a control and an experimental group, measuring exam performance and student preferences using surveys. We hypothesized that the mnemonic-based images would improve long-term retention and be preferred by students over traditional ECG images. To the best of our knowledge, this is the first study to evaluate the effectiveness of mnemonic images created with generative AI and specifically designed to support the interpretation of ECGs in the context of coronary artery localization. While previous work has examined AI tools for image analysis or generic educational support, this study explores the cognitive benefits of AI-generated images as a personalized learning aid for a high-stakes diagnostic skill.
The remainder of this paper is structured as follows: Section 2 describes the methodology, including participant recruitment, intervention design, and data analysis procedures. Section 3 presents the results from both achievement data and student survey responses. Section 4 discusses the educational implications and limitations of the findings. Finally, Section 5 concludes with a summary of key takeaways and directions for future research.
2. Materials and Methods
This educational research was approved as exempt by the institutional review board of the University of Idaho (21-223). This study was conducted at the University of Idaho WWAMI Medical Education Program, which is part of the six-campus/site-collaborative University of Washington School of Medicine program that serves Washington, Wyoming, Alaska, Montana, and Idaho. The WWAMI program allows students to complete their first two preclinical years of medical education in their home states before transitioning to clinical training, providing an accessible pathway for medical education across these five states. This study involved first-year medical students from all six WWAMI sites (n = 275) who received uniform material and exam questions across all sites. The primary aim was to evaluate the effectiveness of a mnemonic-based image, generated using generative AI, in improving exam performance and educational material preference in ECG interpretation for acute coronary syndrome.
2.1. Participants
The participants were first-year medical students from six sites. All students received the same lecture content and exam questions. The experimental group comprised the 40 students attending the corresponding author’s site (Site 6), while the remaining 235 students, spread across the other five sites, served as the control group.
2.2. Intervention
All students were enrolled in a 6-week cardiovascular course during which they received a one-hour lecture covering material related to ECG interpretation in acute coronary syndrome, including an image correlating changes in each lead of a 12-lead ECG with its corresponding coronary artery (Figure 1A). The experimental group (n = 40) received additional material featuring a graphic with mnemonic-based images overlaid onto a localization ECG image (Figure 1B). These mnemonic images were generated using DALLE-3 through the ChatGPT interface (see Appendix A, Figure A1, Figure A2 and Figure A3), upscaled with Krea.ai, and further refined in Adobe Photoshop.
2.3. Assessments
Students were assessed using a multiple-choice exam question (MCQ) related to the localization of coronary artery occlusion in acute coronary syndrome on their weekly exam (Exam 3), conducted 6 days after the lecture, and again on their final course exam (Exam 4), 11 days after the lecture. The same exams were administered to all students at all sites at similar times.
2.4. Data Collection
Following the final exam, students in the experimental group were invited to participate in a survey. A total of 31 students from the experimental group completed the Situational Interest Survey of Multimedia (SIS-M) (Table 1) through the Qualtrics platform. The SIS-M was developed by Dr. Tonia Dousay, a professor specializing in instructional design and educational technology, to assess different aspects of situational interest in multimedia learning environments. Designed for use in educational settings, the SIS-M targets adult learners and evaluates constructs such as triggered situational interest (initial engagement with multimedia), maintained interest, and value interest (perceived usefulness of the content). Initially used to assess the effectiveness of multimedia in promoting engagement and motivation in higher education and adult learning [31,32], the SIS-M has recently been applied to medical education research [17,33,34], making it a suitable tool for evaluating learner engagement in this study.
The survey asked students to consent to participate and then to respond to the 12-item SIS-M twice, first referencing the original image and then the experimental image. Each item was rated on a 1–5 scale (1 = strongly disagree; 5 = strongly agree). The survey also included a question asking for preference of image format and an open-ended question asking, “Why do you think this is your preference?”.
Student exam grades and specific grades on the material-specific exam question were recorded to measure baseline achievement and material-specific achievement, respectively.
2.5. Data Analysis
Researchers utilized SPSS (30.0.0.0 (172)) to analyze the students’ grades and SIS-M survey results. Achievement data were reported as the average exam score at each site and the average score on material-specific exam questions for each site. Because we had access only to the site-level average grades for each exam and exam question, and not to individual student grades, differences in exam question achievement between the weekly exam and the final exam were measured using a 2 × 2 contingency table with a one-tailed chi-squared test. Linear regression of site scores between the control groups and the experimental group was performed using GraphPad Prism (v 9.5.1 (733)).
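The contingency-table test described above can be illustrated with a minimal sketch. It assumes the 2 × 2 table crosses exam (weekly vs. final) with outcome (correct vs. incorrect), and that counts are reconstructed from a site's average score and its number of students. Both the layout and the counts below are hypothetical and are not the study's data; the analysis itself was run in SPSS.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

def p_value_1df(chi2, one_tailed=True):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom.

    With 1 df the statistic is a squared standard normal, so the two-sided
    p-value is erfc(sqrt(chi2 / 2)). Halving it gives a directional p-value,
    which is only meaningful when the observed change is in the predicted direction.
    """
    two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    return two_sided / 2.0 if one_tailed else two_sided

# Hypothetical counts for one site of 40 students: (correct, incorrect) per exam.
weekly = (36, 4)
final = (28, 12)

stat = chi2_2x2(*weekly, *final)
p = p_value_1df(stat)
print(f"chi2 = {stat:.3f}, one-tailed p = {p:.4f}")
```

A statistics package such as SciPy (`scipy.stats.chi2_contingency`) would normally be used instead; the standard-library version here only makes the arithmetic visible.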
The SIS-M survey analysis considered multiple dimensions of situational interest: triggered interest (Trig), maintained interest (MT), maintained feeling (MF), and maintained value (MV). Given the parametric nature of the data, four paired t-tests were used to evaluate experimental group students’ interest in the original and experimental images.
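As a rough illustration of the paired comparison, the sketch below computes a paired t statistic from per-student subscale scores. The ratings are invented placeholders, and the sign convention (original minus experimental) is an assumption chosen to match the negative t values reported in the Results.

```python
import math
import statistics

def paired_t(first, second):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples.

    Differences are computed as first - second, so a negative t means the
    second condition was rated higher.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / se, len(diffs) - 1

# Hypothetical subscale scores for six students (not study data).
original = [2.0, 3.0, 2.5, 3.5, 2.0, 3.0]
experimental = [4.5, 5.0, 4.0, 5.0, 4.5, 4.0]

mean_diff, t_stat, df = paired_t(original, experimental)
print(f"mean difference = {mean_diff:.2f}, t({df}) = {t_stat:.2f}")
```

The same function would be applied once per dimension (Trig, MT, MF, MV), giving the four tests described above.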
For the open-ended question in the SIS-M survey, thematic analysis was conducted using multiple large language models (LLMs), including ChatGPT (GPT-4o and o1-preview) and Claude 3.5 Sonnet (Figure 2). This involved generating initial codes and identifying themes, followed by the researcher combining and refining these themes for overlap and relevancy across the three LLMs [17,33,34]. Prompt engineering techniques used included Persona Prompting [35,36], Zero-Shot Chain of Thought (CoT) [37], and Self-Criticism [38]. Zero-Shot Chain of Thought prompting was omitted in prompts utilizing the ChatGPT o1-preview model, as that model already performs chain-of-thought reasoning internally before producing each output. The initial prompt was the following:
“Act like a brilliant medical education researcher. I am doing a study on the use of a graphic with mnemonic-based images overlayed onto an ECG image that teaches the localization coronary artery obstructions during STEMI. These mnemonic-based images were generated with generative AI. I surveyed the participants on their preference of the mnemonic image over the traditional image that only included colored boxes over the ECG image and asked them to explain their preference. Please perform a thematic analysis on the below participant responses marked between <response> </response>. Let’s work this out in a step by step way to be sure we have the right answer.
<response>
Participant responses here
</response>”
The follow-up query in the conversation was a Self-Criticism prompt: “Please reflect on your previous answer for any errors”.
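The prompting steps described above (persona, tagged responses, an optional zero-shot CoT cue, and a self-criticism follow-up) can be sketched as a small prompt builder. This is only an illustration: the function names, the model labels, and the placeholder context are hypothetical, and no LLM API is called.

```python
PERSONA = "Act like a brilliant medical education researcher."
TASK = (
    "Please perform a thematic analysis on the below participant responses "
    "marked between <response> </response>."
)
COT_CUE = "Let’s work this out in a step by step way to be sure we have the right answer."
SELF_CRITICISM = "Please reflect on your previous answer for any errors"

# Hypothetical label set: models assumed to reason step by step without a cue.
MODELS_WITHOUT_COT_CUE = {"o1-preview"}

def build_initial_prompt(study_context, responses, use_cot_cue=True):
    """Assemble persona + study context + task (+ optional CoT cue) + tagged responses."""
    sentences = [PERSONA, study_context, TASK]
    if use_cot_cue:
        sentences.append(COT_CUE)
    tagged = "<response>\n" + "\n".join(responses) + "\n</response>"
    return " ".join(sentences) + "\n" + tagged

def build_turns(model_label, study_context, responses):
    """Return the two user turns: the initial prompt, then the self-criticism follow-up."""
    use_cot_cue = model_label not in MODELS_WITHOUT_COT_CUE
    return [build_initial_prompt(study_context, responses, use_cot_cue), SELF_CRITICISM]

# Placeholder inputs (not study data).
context = "I am doing a study on mnemonic-based images overlaid onto an ECG image."
answers = ["Participant response 1", "Participant response 2"]

turns_gpt = build_turns("gpt-4o", context, answers)
turns_o1 = build_turns("o1-preview", context, answers)
print(turns_gpt[0])
```

Keeping the CoT cue as a switch makes it easy to send an otherwise identical prompt to each model, which simplifies comparing the themes they return.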
2.6. Ethical Considerations
This educational research was approved as exempt by the institutional review board of the University of Idaho (21-223). According to OpenAI’s Content Policy and Terms of Use, users retain ownership of images generated with DALLE-3. The images used in this study were not representations of real individuals but rather generic, non-identifiable subjects such as dogs, insects, and a race car, minimizing any ethical or privacy concerns related to likeness or identity.
3. Results
To address the question of increased performance with the experimental media, we first measured baseline knowledge of students across the six sites. The average exam score for each of the four exams in the course was used for this measure (Figure 3A). The experimental site (Site 6) did not show significantly higher overall exam scores compared to the other sites, indicating comparable baseline knowledge levels across the experimental and control groups.
Students were assessed on their knowledge of ECG interpretation during acute coronary syndrome presentation six days after learning the material (Exam 3) and again eleven days after learning the material, on the course final exam (Exam 4). Each assessment consisted of an MCQ related to the material covered in the lecture. On Exam 3, there was no significant difference in achievement between any sites, including the experimental site (Figure 3B,C). This suggests that the initial understanding of the material was similar across all groups shortly after the lecture.
However, on the final exam (Exam 4) question related to the same material, a significant difference was observed. Students in the control group at all sites, but not those in the experimental group, showed a significant drop in exam question scores from the previous exam (Figure 3B,D). This drop indicates a decline in long-term retention of the material. In contrast, students in the experimental group had a slight drop in scores that was not statistically significant, suggesting that the mnemonic-based images promoted better long-term memory retention of the material.
Linear regression analysis of the material-related question scores on the weekly exam and final exam was consistent with these findings. The analysis showed an interaction effect between group (control vs. experimental) and time (weekly exam vs. final exam) on exam scores that trended toward, but did not reach, statistical significance. The experimental group exhibited a smaller decline in scores on the final exam compared to the control group, consistent with the mnemonic-based images promoting long-term retention (Figure 3C,D).
To address the question of increased interest and preference for the experimental media, four paired-sample t-tests were conducted to explore the students’ preference for the redesigned learning materials over the original ones. The results revealed significant differences on all four dimensions among the 31 participants (Table 2). Notably, 80% (n = 25) of the participants preferred the experimental image, while 13% (n = 4) preferred the original image, and 6% (n = 2) had no preference. A large majority of respondents therefore preferred the mnemonic-based image.
The participants’ average triggered situational interest (Trig) in the experimental image (M = 4.69, SD = 0.43) was significantly higher than in the original image (M = 2.52, SD = 0.90), t = −11.709, p < 0.001. The 95% confidence interval for the mean difference between the two ratings was −2.55 to −1.79, suggesting a preference for the experimental image.
The findings for maintained (MT) interest indicated that the participants’ interest rating of the experimental image (M = 4.60, SD = 0.48) was significantly greater than that of the original learning image (M = 3.83, SD = 0.90), t = −3.816, p < 0.001. The 95% confidence interval for the mean difference between the two ratings was −1.18 to −0.36.
The results for maintained-feeling (MF) interest revealed that the participants’ interest rating of the experimental image (M = 4.46, SD = 0.62) was significantly greater than that of the original image (M = 3.49, SD = 1.01), t = −4.069, p < 0.001. The 95% confidence interval for the mean difference between the two ratings was −1.45 to −0.48.
The outcomes for maintained-value (MV) interest suggested that the participants’ interest rating of the experimental image (M = 4.74, SD = 0.47) was significantly greater than that of the original image (M = 4.17, SD = 0.94), t = −2.956, p = 0.006. The 95% confidence interval for the mean difference between the two ratings was −0.97 to −0.18.
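The reported means, t values, and confidence intervals above can be cross-checked against one another. The sketch below recovers the standard error of each mean difference as (difference / t) and rebuilds the 95% interval. It assumes all 31 respondents contributed to every test (30 degrees of freedom, two-sided critical value of about 2.042) and that differences are taken as original minus experimental; small discrepancies are expected because the published means are rounded to two decimals.

```python
T_CRIT_DF30 = 2.042  # two-sided 95% critical value of Student's t with 30 df (assumes n = 31)

# Reported values: (mean original, mean experimental, t, CI low, CI high).
REPORTED = {
    "Trig": (2.52, 4.69, -11.709, -2.55, -1.79),
    "MT": (3.83, 4.60, -3.816, -1.18, -0.36),
    "MF": (3.49, 4.46, -4.069, -1.45, -0.48),
    "MV": (4.17, 4.74, -2.956, -0.97, -0.18),
}

def reconstruct_ci(mean_orig, mean_exp, t_stat, t_crit=T_CRIT_DF30):
    """Rebuild the CI of the mean difference from the two means and the paired t."""
    diff = mean_orig - mean_exp
    se = diff / t_stat  # because t = diff / se
    half_width = t_crit * se
    return diff - half_width, diff + half_width

results = {}
for name, (m_orig, m_exp, t_stat, ci_low, ci_high) in REPORTED.items():
    low, high = reconstruct_ci(m_orig, m_exp, t_stat)
    results[name] = (low, high)
    print(f"{name}: reconstructed {low:.2f} to {high:.2f}, reported {ci_low} to {ci_high}")
```

Under these assumptions, the reconstructed intervals agree with the reported ones to within rounding, which suggests the summary statistics are internally consistent.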
A thematic analysis of the open-ended survey responses in the SIS-M provided insights into why students preferred the experimental image over the traditional image. Four primary themes emerged from the responses:
Mnemonic’s Impact on Retention and Learning: The mnemonic-based image helped with long-term retention and made memorization easier, both for exams and in the long term.
Engagement and Interest: The mnemonic-based image made the material more fun and interactive compared to traditional learning methods.
Preference for Mnemonic Imagery: Participants favored mnemonic-based characters and visuals over the traditional colored boxes, citing better memorization aids.
Challenges with Mnemonic:
Confusion with Mnemonic Imagery: Some found the mnemonic elements confusing or hard to connect to the content.
Prior Memorization of Traditional Image: Some participants struggled with switching to the mnemonic image after memorizing the traditional one.
In summary, participants overwhelmingly preferred the mnemonic-based image for its ability to enhance memorability, make learning more engaging, and simplify complex information. Achievement data supported this finding by demonstrating improved long-term retention of students’ ability to correlate ST-segment elevations in specific ECG leads with their corresponding obstructed coronary artery. However, some participants noted challenges, such as confusion with certain mnemonic elements or the influence of image order on their preference. Overall, the mnemonic-based image provided a more interactive and memorable learning experience for most participants.
4. Discussion
This study presents a novel application of genAI by using it to create mnemonic-based images that support medical students in learning ECG interpretation for coronary artery localization. Unlike prior uses of genAI in medical education that have focused on automated analysis or content delivery, this approach evaluates AI-generated images as a tool to enhance cognitive processes, specifically the long-term retention of complex visual associations. The results revealed two main findings: First, students exposed to mnemonic-based images generated by genAI showed improved long-term retention of the material as compared to those exposed to the traditional ECG image. This was demonstrated by a lesser decline in scores on the final exam for the experimental group compared to the control group. Second, there was a notable increase in student preference for the mnemonic images, with 80% of participants favoring the redesigned materials as opposed to the original ECG diagrams. These mnemonic images significantly improved the students’ ability to recall and apply information over time due to the effective use of word/image associations. This study showed that such associations help in memorizing, as they create visually interesting and captivating materials, thus reducing cognitive load. This approach made learning more interesting and interactive while at the same time reinforcing memory through powerful visual cues for complex medical concepts, ultimately leading to the enhanced long-term retention of interpreting ECGs related to acute coronary syndrome.
The enhanced memory retention observed in this study can be attributed to the mnemonic images’ ability to create strong visual associations, aligning with established principles of memory formation and recall [39]. The principal goal of mnemonic instruction is to help students remember facts and concepts, which is imperative for academic success, as content in every area needs to be memorized and quickly retrieved [10]. The increased student engagement with mnemonic images likely stems from their reported captivating and entertaining nature, which aligns with previous findings on the effectiveness of pictorial mnemonics in improving recall of factual knowledge [27]. The reduction in cognitive load, as reported by the students, contributed to improved performance by allowing more cognitive resources to be allocated to understanding and retaining the material, rather than struggling with memorization. These findings are consistent with the existing literature on mnemonic use in medical education; notably, Sketchy Medical is heavily utilized throughout the organ-system-based curriculum during the first year of medical education [30]. However, this study extends beyond traditional mnemonic techniques by incorporating AI-generated images, offering a new approach to creating personalized and engaging educational content. Previous studies have shown the potential of AI-generated content in improving educational outcomes and knowledge acquisition [20]. This research specifically demonstrates its effectiveness in creating mnemonic aids for complex medical concepts, bridging the gap between traditional mnemonics and cutting-edge AI technology in educational materials.
The efficacy of AI-generated mnemonic images in enhancing the retention of ECG interpretation skills indicates significant potential for this approach to be used across various domains within medical education. Complex subjects, such as anatomy, pharmacology, and pathophysiology, could benefit from the implementation of similar mnemonic-based resources, thereby transforming the way students assimilate extensive medical information [14,40,41,42]. Most of the material that must be learned in medical school consists of words and numbers. Mnemonics employ a form of chunking [4,43], decreasing the number of items to remember by grouping them together, which is particularly beneficial in the context of medical education, where continuing education and the maintenance of knowledge are of utmost importance. Visual mnemonics serve the brain by associating diagnoses and disease processes with easy-to-recall images [43]. The incorporation of these tools into current educational frameworks could involve supplementing standard lectures with AI-generated mnemonics, promoting a more diverse learning atmosphere. This may influence teaching strategies by encouraging educators to incorporate more visual and associative learning modalities, and it could lead to the development of comprehensive, AI-enhanced mnemonic resources tailored for medical education. Increased student engagement and preference for mnemonic images indicate that this approach could significantly enhance student satisfaction and overall learning experience. By decreasing cognitive load and making hard-to-remember information more memorable, these tools could lessen the pressure associated with the demanding nature of medical education. AI-generated mnemonics can be applied as a resource to aid students in efficiently retaining important information as medical knowledge continues to expand.
Limitations
The strengths of this study are evident in its rigorous design, including the multicampus setting, which offered a varied student demographic and ensured uniform lecture content across all locations. The integration of both qualitative and quantitative data, such as exam performance indicators and the Situational Interest Survey of Multimedia (SIS-M), provided a thorough understanding of the intervention’s effects. Moreover, the novel application of genAI for developing mnemonic-based educational resources marks a notable progression in research methodology within education. Nonetheless, the study also faced limitations. There is a risk of self-selection bias in the survey responses, possibly skewing the qualitative data, as students with strong opinions on the mnemonic images may have been more inclined to respond and participate. The applicability of the findings to other medical schools or educational contexts might be restricted due to the study’s particular circumstances. Furthermore, the relatively small size of the experimental group (n = 40) compared to the control group (n = 235) could have affected the statistical power of the results and may not fully reflect the broader student population.
In addition, the use of genAI technologies poses a challenge in environments with varying technological resources. However, this limitation is potentially mitigated by the growing accessibility of user-friendly AI platforms, which are simplifying the use of AI in educational contexts and may broaden the applicability of such innovative teaching tools.
To overcome these constraints and further investigate the potential of AI-generated mnemonic tools in medical education, future studies should aim to broaden the scope and scale of similar research. Examining the use of this technique in other medical fields and disciplines, such as anatomy, pharmacology, or pathophysiology, could yield important insights into its overall effectiveness. Longitudinal studies that assess the long-term impact of mnemonic-based images on medical education would be useful in understanding their lasting influence on knowledge retention and clinical application. To improve the generalizability and statistical power of the results, future research should seek to involve larger and more varied groups of students from different medical institutions and educational backgrounds. Moreover, investigating various types of mnemonic images and their effects on different learning styles could help customize this method to meet the diverse needs and preferences of students, potentially resulting in more personalized and effective educational strategies in medical training.