Search Results (226)

Search Parameters:
Keywords = final exam

15 pages, 1755 KB  
Article
A Faculty-Constructed AI Tutor for Personalized Learning and Remediation in a U.S. PharmD Immunology Course: An “In-House” Evaluation of New Learning Technology
by Ashim Malhotra
Pharmacy 2026, 14(2), 59; https://doi.org/10.3390/pharmacy14020059 - 3 Apr 2026
Viewed by 220
Abstract
As generative AI becomes increasingly available in higher education, faculties find it challenging to design, implement, and evaluate AI-enabled personalized learning systems within accreditation-constrained professional curricula. This method paper describes ADAPT (Assessment-Driven AI for Personalized Tutoring), a home-grown AI tutoring and remediation ecosystem implemented in a required PharmD immunology course. Using standard learning management (Canvas) and assessment (ExamSoft) platforms, a 20-item quiz mapped to six immunology mastery domains (N = 34; mean 69.1%, SD 17.9; Cronbach’s α = 0.73) was used to trigger tiered, structured generative AI remediation at both individual student and cohort levels. Instructional impact was evaluated using reliability indices, item-level difficulty analyses, and paired pre/post-assessment comparisons. Following AI-guided remediation, mean performance increased to 79.8% (+10.7 percentage points), variability decreased (SD 14.4), and assessment reliability improved (ExamSoft KR-20 0.87) compared with the diagnostic exam, the first midterm exam, and the final exam, respectively. Item difficulty stabilized (mean ≈ 0.80), with sustained retention of targeted concepts on the final examination. ADAPT provides a replicable, low-cost methodological blueprint for faculties to independently construct assessment-driven AI tutoring systems and lays the foundation for a future AI-based predictive-analytics workflow for at-risk students.
(This article belongs to the Section Pharmacy Education and Student/Practitioner Training)
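The KR-20 reliability this abstract reports for the post-remediation assessment can be computed directly from a 0/1 item-response matrix. A minimal sketch in Python, using invented response data rather than the study's (the study's coefficients come from ExamSoft):

```python
def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) item responses.

    `responses` is one list per student, with a 0/1 entry per item.
    """
    n = len(responses)
    k = len(responses[0])                      # number of items
    # sum of p*q over items (p = proportion correct, q = 1 - p)
    pq_sum = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n
        pq_sum += p * (1 - p)
    # population variance of the total scores
    totals = [sum(r) for r in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - pq_sum / var)

print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75 on this toy data
```

For dichotomous items Cronbach's α reduces to KR-20, so the same sketch covers both coefficients the abstract quotes.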

33 pages, 2332 KB  
Article
EvalHack: Answer-Side Prompt Injection for Probing LLM Exam-Grading Panel Stability
by Catalin Anghel, Marian Viorel Craciun, Adina Cocu, Andreea Alexandra Anghel, Antonio Stefan Balau, Adrian Istrate and Aurelian-Dumitrache Anghele
Information 2026, 17(3), 297; https://doi.org/10.3390/info17030297 - 18 Mar 2026
Viewed by 377
Abstract
Large language models are increasingly used as automated graders, yet their reliability under answer-side manipulation and their behavior in multi-model panels remain insufficiently understood. This paper introduces EvalHack, a matrix benchmark in which a fixed committee of four LLMs grades university-level machine learning exam answers under a strict integer-only contract (0–10) grounded in instructor-authored rubric artifacts. The dataset comprises 100 students answering 10 short, open-ended items (1000 answers). For each answer, the evaluation includes a clean version and two content-preserving adversarial variants that operate only on the student text: A1, a visible coercive suffix appended to the answer, and A2, a stealth variant that uses Unicode control characters (e.g., zero-width and bidirectional marks) to embed an instruction. EvalHack instruments the full grading pipeline, recording item-level member scores, the committee aggregate, within-panel disagreement, and discrepancies to human grades. Empirically, answer-side edits induce systematic score inflation and stronger top-end concentration, with edited answers clustering near the upper end of the scale. Within-panel disagreement, measured as the range between the highest and lowest member score, varies across conditions, with median Consistency Spread values of 3.0 (clean), 2.0 (A1), and 6.0 (A2). Compared to human graders, the panel is more lenient on average (MAE = 1.897; bias human − panel = −1.345). Finally, grouping items by disagreement shows that low-disagreement items exhibit smaller human-panel errors, indicating that within-panel spread can serve as a practical uncertainty signal for routing difficult answers to human review or to larger/more specialized panels.
(This article belongs to the Section Artificial Intelligence)
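The panel metrics this abstract reports (Consistency Spread as the range between the highest and lowest member score, MAE, and bias defined as human minus panel) are simple to reproduce. A sketch that assumes the committee aggregate is the member mean, which may differ from EvalHack's actual aggregation rule:

```python
from statistics import median

def panel_metrics(panel_scores, human_scores):
    """Per-answer panel spread plus aggregate error versus human grades.

    panel_scores: one list of member scores per answer.
    human_scores: one human reference score per answer.
    """
    spreads = [max(s) - min(s) for s in panel_scores]      # Consistency Spread
    aggregates = [sum(s) / len(s) for s in panel_scores]   # committee aggregate (assumed mean)
    n = len(human_scores)
    mae = sum(abs(h - a) for h, a in zip(human_scores, aggregates)) / n
    bias = sum(h - a for h, a in zip(human_scores, aggregates)) / n  # human - panel
    return median(spreads), mae, bias
```

On a two-answer toy input, `panel_metrics([[8, 9, 10, 9], [5, 7, 6, 8]], [9, 5])` yields a median spread of 2.5, MAE 0.75, and bias -0.75 (the panel grading more leniently than the human, as in the paper's pooled result).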

33 pages, 2576 KB  
Article
ExamQ-Gen: Instructor-in-the-Loop Generation of Self-Contained Exam Questions from Course Materials and Decision-Support Grading
by Catalin Anghel, Emilia Pecheanu, Andreea Alexandra Anghel, Marian Viorel Craciun and Adina Cocu
Computers 2026, 15(3), 177; https://doi.org/10.3390/computers15030177 - 9 Mar 2026
Viewed by 296
Abstract
Reliable evaluation of large language models (LLMs) for educational use requires benchmarks that reflect exam constraints, instructor grading practices, and the operational consequences of thresholded decisions. This paper introduces ExamQ-Gen, an instructor-in-the-loop benchmark that couples two tasks: (i) an LLM answering university-style exam questions and (ii) decision-support grading aligned with an instructor reference. Automatic grading is used for triage and feedback; in practice, ExamQ-Gen supports instructor-led exam authoring and provides grading recommendations, while the instructor issues the final grade and pass/fail decision. ExamQ-Gen is constructed from the course content by using an LLM to generate exam-style questions directly from the lecture materials, producing a course-derived question set suitable for controlled experimentation. The benchmark then instantiates contrasting exam conditions, including instructor-authored (HUMAN) versus pipeline-generated (PIPELINE) artifacts, to evaluate robustness under distribution shifts that can occur when exam questions and answers are produced through different generation workflows. Using two LLM “students” (Llama3-8B-Instruct and Mistral-7B-Instruct) and an LLM-based grader, we compare automatic grading against an instructor reference on a 1–10 score scale and at the decision level induced by the operational pass policy (pass if score ≥ 9). Accordingly, our conclusions are conditioned on the two evaluated student models. Score-level agreement is strong under HUMAN conditions but degrades substantially under PIPELINE conditions, indicating condition-dependent stability. At the pass threshold, decision errors are highly asymmetric, with false fails dominating false passes, meaning that conservative grading may appear safe while producing credit denial. A severity-focused analysis isolates a high-stakes failure mode—denial of instructor-perfect answers—and shows that, in the most affected PIPELINE condition, the perfect-pass miss rate reaches 0.926 (50/54), consistent with systematic conservatism rather than borderline noise. Overall, the results highlight that aggregate score agreement and accuracy are insufficient for instructor-controlled exam deployment and motivate reporting practices that combine disaggregated score agreement, threshold-based error asymmetry with uncertainty, and severity-aware diagnostics under exam-relevant condition shifts.
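The asymmetry reported here between false fails and false passes at the operational pass policy (pass if score ≥ 9) reduces to two counts over (human, model) score pairs. An illustrative sketch with invented scores, not the paper's data:

```python
def decision_errors(human, model, threshold=9):
    """Asymmetric decision errors at a pass threshold (pass if score >= threshold).

    false_fail: human passes the answer, model fails it (credit denial).
    false_pass: human fails the answer, model passes it.
    """
    false_fail = sum(1 for h, m in zip(human, model)
                     if h >= threshold and m < threshold)
    false_pass = sum(1 for h, m in zip(human, model)
                     if h < threshold and m >= threshold)
    return false_fail, false_pass

print(decision_errors([10, 9, 8, 10], [8, 9, 9, 7]))  # (2, 1): false fails dominate
```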

16 pages, 803 KB  
Article
AI-Powered Physiotherapy: Evaluating LLMs Against Students in Clinical Rehabilitation Scenarios
by Ioanna Michou, Athanasios Fouras, Dionysia Chrysanthakopoulou, Marina Theodoritsi, Savina Mariettou, Sotiria Stellatou and Constantinos Koutsojannis
Appl. Sci. 2026, 16(3), 1165; https://doi.org/10.3390/app16031165 - 23 Jan 2026
Viewed by 1033
Abstract
Generative artificial intelligence (GenAI), particularly large language models (LLMs) such as ChatGPT and DeepSeek, is transforming healthcare by enhancing clinical decision-making, education, and patient interaction. This exploratory study compares the responses of ChatGPT (GPT-4.1) and DeepSeek-V2 against 90 final-year physiotherapy students in Greece on the quality of responses to 60 clinical questions across four rehabilitation domains: low back pain, multiple sclerosis, frozen shoulder, and knee osteoarthritis (15 questions per domain). The questions spanned basic knowledge, diagnosis, alternative treatments, and rehabilitation practices. The responses were evaluated for their relevance, accuracy, clarity, completeness, and consistency with clinical practice guidelines (CPGs), emphasizing conceptual understanding. This study provides novel contributions by (i) benchmarking LLMs in physiotherapy-specific domains (low back pain, multiple sclerosis, frozen shoulder, and knee osteoarthritis) underrepresented in prior AI-health evaluations; (ii) directly comparing LLM written response quality to student performance under exam constraints; and (iii) highlighting the improvement potential for education, complementing ChatGPT’s established role in physician decision support. The results indicate that the LLMs produced higher-quality written responses than students in most domains, particularly in global response quality and the conceptual depth of written responses, highlighting their potential as educational aids for knowledge-based tasks, although not equivalent to clinical expertise. This suggests that AI’s role in physiotherapy is that of a supportive tool rather than a replacement for hands-on clinical skills, and raises the question of whether GenAI could transform physiotherapy practice by augmenting, rather than threatening, human-centered care; its promise as a knowledge-support tool in education remains pending validation in clinical contexts. This study explores these findings, compares them with related work, and discusses whether GenAI will transform or threaten physiotherapy practice. Ethical considerations, limitations, and future directions, including AI voice assistants and AI characters, are addressed.

19 pages, 1421 KB  
Article
Turning the Page: Pre-Class AI-Generated Podcasts Improve Student Outcomes in Ecology and Environmental Biology
by Laura Díaz and Víctor D. Carmona-Galindo
Educ. Sci. 2026, 16(1), 168; https://doi.org/10.3390/educsci16010168 - 22 Jan 2026
Viewed by 662
Abstract
In the aftermath of the COVID-19 pandemic, instructors in higher education have reported a decline in foundational reading habits, particularly in STEM courses where dense, technical texts are common. This study examines a low-barrier instructional intervention that used generative AI (GenAI) to support pre-class preparation in two upper-division biology courses. Weekly AI-generated audio overviews—“podcasts”—were paired with timed, textbook-based online quizzes. These tools were not intended to replace reading, but to scaffold engagement, reduce preparation anxiety, and promote early familiarity with course content. We analyzed student engagement, perceptions, and performance using pre/post surveys, quiz scores, and exam outcomes. Students reported that the podcasts helped manage time constraints, improved their readiness for lecture, and increased their motivation to read. Those who consistently completed the quizzes performed significantly better on closed-book, in-class exams and earned higher final course grades. Our findings suggest that GenAI tools, when integrated intentionally, can reintroduce structured learning behaviors in post-pandemic classrooms. By meeting students where they are—without compromising cognitive rigor—audio-based scaffolds may offer inclusive, scalable strategies for improving academic performance and reengaging students with scientific content in an increasingly attention-fragmented educational landscape.

11 pages, 480 KB  
Article
Comparison of Visual Acuity and Strabismus Pre- and Post-Baerveldt 350 Glaucoma Drainage Device Placement in Refractory Childhood Glaucomas
by Adam Jacobson, Elizabeth M. Bolton and Brenda L. Bohnsack
J. Clin. Transl. Ophthalmol. 2026, 4(1), 4; https://doi.org/10.3390/jcto4010004 - 19 Jan 2026
Viewed by 354
Abstract
Objective: Assess visual acuity (VA) and strabismus changes in children after Baerveldt 350 (BV350) device placement. Methods and Analysis: Retrospective cohort study of children (<21 years of age) who had superotemporal BV350 placement (2011–2023) and >6-month follow-up. Ocular diagnoses, surgical details, and preoperative and final follow-up exam findings were collected. In bilateral cases, the first eye implanted was included in analysis. Results: Ninety-seven patients underwent BV350 surgery at a median age of 6.7 (interquartile range (IQR) 3.1, 11.2) years and with a median of 4.2 (IQR 1.8, 6.8) years of follow-up. The most common glaucomas were secondary to non-acquired ocular anomaly (n = 31) or primary congenital glaucoma (n = 21). There was no difference between preoperative and final VA (p = 0.6583). Twenty-seven (28%) and twenty-five (26%) patients were orthophoric preoperatively and at final follow-up, respectively. Orthophoria at final follow-up was associated with preoperative (odds ratio (OR) 1.8 [1.2, 2.9]) and final VA (OR 1.5 [1.1, 2.3]). At final follow-up, 13 patients (13%) and 19 patients (20%) showed worsened or improved horizontal deviation (>10 prism diopter (PD) change), respectively. No patients reported postoperative diplopia. Only four patients, all with esotropia, underwent subsequent strabismus surgery. Conclusions: Children who underwent BV350 placement did not have a significant change in VA, and a high percentage of patients had strabismus prior to (72%) and following (74%) glaucoma surgery. Orthophoria was associated with better VA. The majority of patients did not show worsening of strabismus postoperatively, and none reported diplopia.

27 pages, 1930 KB  
Article
SteadyEval: Robust LLM Exam Graders via Adversarial Training and Distillation
by Catalin Anghel, Marian Viorel Craciun, Adina Cocu, Andreea Alexandra Anghel and Adrian Istrate
Computers 2026, 15(1), 55; https://doi.org/10.3390/computers15010055 - 14 Jan 2026
Viewed by 484
Abstract
Large language models (LLMs) are increasingly used as rubric-guided graders for short-answer exams, but their decisions can be unstable across prompts and vulnerable to answer-side prompt injection. In this paper, we study SteadyEval, a guardrailed exam-grading pipeline in which an adversarially trained LoRA filter (SteadyEval-7B-deep) preprocesses student answers to remove answer-side prompt injection, after which the original Mistral-7B-Instruct rubric-guided grader assigns the final score. We build two exam-grading pipelines on top of Mistral-7B-Instruct: a baseline pipeline that scores student answers directly, and a guardrailed pipeline in which a LoRA-based filter (SteadyEval-7B-deep) first removes injection content from the answer and a downstream grader then assigns the final score. Using two rubric-guided short-answer datasets in machine learning and computer networking, we generate grouped families of clean answers and four classes of answer-side attacks, and we evaluate the impact of these attacks on score shifts, attack success rates, stability across prompt variants, and alignment with human graders. On the pooled dataset, answer-side attacks inflate grades in the unguarded baseline by an average of about +1.2 points on a 1–10 scale, and substantially increase score dispersion across prompt variants. The guardrailed pipeline largely removes this systematic grade inflation and reduces instability for many items, especially in the machine-learning exam, while keeping mean absolute error with respect to human reference scores in a similar range to the unguarded baseline on clean answers, with a conservative shift in networking that motivates per-course calibration. Chief-panel comparisons further show that the guardrailed pipeline tracks human grading more closely on machine-learning items, but tends to under-score networking answers. These findings are best interpreted as a proof-of-concept guardrail and require per-course validation and calibration before operational use.
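SteadyEval's filter is a trained LoRA model, but the Unicode-stealth attack class it targets (zero-width and bidirectional marks that smuggle hidden instructions into an answer) can also be caught by a simple rule-based pass. A sketch of such a pass, not the paper's method:

```python
import unicodedata

# Zero-width and bidirectional control characters commonly abused to hide text.
STEALTH = {"\u200b", "\u200c", "\u200d", "\ufeff",          # zero-width marks, BOM
           "\u202a", "\u202b", "\u202c", "\u202d", "\u202e", # bidi embeddings/overrides
           "\u2066", "\u2067", "\u2068", "\u2069"}           # bidi isolates

def strip_stealth(text):
    """Drop the listed marks plus any other Cf (format) code points.

    All of the listed characters fall in Unicode category Cf, so the category
    check alone would suffice; the explicit set documents the attack surface.
    """
    return "".join(ch for ch in text
                   if ch not in STEALTH and unicodedata.category(ch) != "Cf")

print(strip_stealth("grade\u200b me\u202e 10/10"))  # "grade me 10/10"
```

A filter like this only neutralizes hidden-character attacks; visible coercive suffixes (the other attack class these papers study) still require semantic filtering.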

24 pages, 476 KB  
Article
APAR: A Structural Design and Guidance Framework for Gamification in Education Based on Motivation Theories
by J. Carlos López-Ardao, Miguel Rodríguez-Pérez, Sergio Herrería-Alonso, M. Estrella Sousa-Vieira, Alfonso Lago Ferreiro, Andrés Suárez-González and Raúl F. Rodríguez-Rubio
Multimodal Technol. Interact. 2026, 10(1), 10; https://doi.org/10.3390/mti10010010 - 10 Jan 2026
Viewed by 1161
Abstract
Gamification is widely used to enhance student motivation, yet many educational design proposals remain conceptual and provide limited operational guidance for digital learning environments. This paper introduces APAR (Activities, Points, Achievements and Rewards), a content-independent structural framework for designing and implementing educational gamification in learning platforms. Grounded in motivation theories (including Self-Determination Theory and Relatedness–Autonomy–Mastery–Purpose) and reward taxonomies (Status, Access, Power and Stuff), APAR distinguishes high-level design constructs from concrete game elements (e.g., points, badges and leaderboards) and provides a systematic design loop linking learning activities, feedback, intermediate goals and reinforcement. The contribution includes (i) a mapping table relating each APAR construct to motivation models, supported dynamics and typical learning-platform implementations; (ii) an actionable design guide; and (iii) an empirical illustration implemented in Moodle in a higher-education Computer Networks course. In this setting, the proportion of enrolled students taking the final exam increased from 58% to 72% in the first year, and the proportion of enrolled students passing increased from 17% to 38%; in 2022–2023 these values were 70% and 39%, respectively (56% of exam takers passed). While the use case relies on quantitative course-level indicators and is observational, the findings support the potential of structural gamification as an integrated methodological tool and motivate further mixed-method validations.

18 pages, 2272 KB  
Article
Machine Learning Approaches for Early Student Performance Prediction in Programming Education
by Seifeddine Bouallegue, Aymen Omri and Salem Al-Naemi
Information 2026, 17(1), 60; https://doi.org/10.3390/info17010060 - 8 Jan 2026
Viewed by 1018
Abstract
Intelligent recommender systems are essential for identifying at-risk students and personalizing learning through tailored resources. Accurate prediction of student performance enables these systems to deliver timely interventions and data-driven support. This paper presents the application of machine learning models to predict final exam grades in a university-level programming course, leveraging multi-modal student data to improve prediction accuracy. In particular, a recent raw dataset of students enrolled in a programming course across 36 class sections from the Fall 2024 and Winter 2025 terms was initially processed. The data was collected up to one month before the final exam. From this data, a comprehensive set of features was engineered, including the student’s background, assessment grades and completion times, digital learning interactions, and engagement metrics. Building on this feature set, six machine learning prediction models were initially developed using data from the Fall 2024 term. Both training and testing were conducted on this dataset using cross-validation combined with hyperparameter tuning. The XGBoost model demonstrated strong performance, achieving an accuracy exceeding 91%. To assess the generalizability of the considered models, all models were retrained on the complete Fall 2024 dataset. They were then evaluated on an independent dataset from Winter 2025, with XGBoost achieving the highest accuracy, exceeding 84%. Feature importance analysis has revealed that the midterm grade and the average completion duration of lab assessments are the most influential predictors. This data-driven approach empowers instructors to proactively identify and support at-risk students, enabling adaptive learning environments that deliver personalized learning and timely interventions.
(This article belongs to the Special Issue Human–Computer Interactions and Computer-Assisted Education)
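The finding that the midterm grade is the most influential predictor suggests a single-feature rule as a sanity-check baseline for any richer model. A toy sketch with invented field names and cutoff (the paper itself uses XGBoost over a much larger engineered feature set):

```python
def flag_at_risk(students, midterm_cutoff=60.0):
    """Toy early-warning rule: flag students whose midterm grade is below a cutoff.

    students: dicts with a 'midterm' grade and, for evaluation only,
    a 'passed_final' outcome. Returns the flags and the rule's accuracy
    when the flag is read as a predicted final-exam failure.
    """
    flags = [s["midterm"] < midterm_cutoff for s in students]
    correct = sum(1 for s, f in zip(students, flags)
                  if f == (not s["passed_final"]))
    return flags, correct / len(students)
```

Any deployed model would be expected to beat this baseline's accuracy on held-out terms, mirroring the Fall-to-Winter evaluation design described above.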

12 pages, 1051 KB  
Article
Assessing the Efficacy of Ortho GPT: A Comparative Study with Medical Students and General LLMs on Orthopedic Examination Questions
by Philippe Fabian Pohlmann, Maximilian Glienke, Richard Sandkamp, Christian Gratzke, Hagen Schmal, Dominik Stephan Schoeb and Andreas Fuchs
Bioengineering 2025, 12(12), 1290; https://doi.org/10.3390/bioengineering12121290 - 24 Nov 2025
Cited by 1 | Viewed by 759
Abstract
Background: Domain-specific large language models (LLMs) like Ortho GPT have potential advantages over general-purpose models in medical education, offering improved factual accuracy and contextual relevance. This study evaluates the performance of Ortho GPT against general LLMs and senior medical students on validated orthopedic examination questions. Methods: Six LLMs (Ortho GPT 4o, ChatGPT 4o, ChatGPT 3.5, Perplexity AI, DeepSeek-R1, and Llama 3.3-70B) were tested using multiple-choice items from final-year medical student orthopedic exams in the German language. Each model answered identical questions under standardized zero-shot conditions; accuracy rates and item-level results were compared using McNemar’s test, Jaccard similarity, and point-biserial correlation with student difficulty ratings. Results: Ortho GPT achieved the highest accuracy across models. McNemar’s tests revealed the significant superiority of Ortho GPT over DeepSeek (p = 2.33 × 10⁻³⁵), Llama 3.3-70B (p = 1.11 × 10⁻³²), and Perplexity (p = 4.01 × 10⁻⁵). Differences between Ortho GPT and ChatGPT 4o were non-significant (p = 0.065), suggesting near-equivalent performance to the strongest general model. No LLM showed correlation with student item difficulty (|r| < 0.07, p > 0.05), indicating that models solved items independently of human-perceived difficulty. Jaccard indices suggested moderate overlap between Ortho GPT and ChatGPT 4o, but distinct response profiles compared with general LLMs. Conclusions: These findings illustrate the superiority of Ortho GPT in orthopedic exam accuracy and context relevance, attributed to its specialized training data. The domain-specific approach enables performance matching or exceeding top general LLMs in orthopedics, emphasizing the importance of domain specialization for reliable, curriculum-aligned support in medical education.
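McNemar's test, used here for the model-vs-model comparisons, depends only on the two discordant counts: items one model answered correctly and the other did not, and the reverse. An exact two-sided version (a sketch; the paper does not state whether it used the exact or the chi-square form):

```python
from math import comb

def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from discordant-pair counts.

    b: items model A got right and model B got wrong; c: the reverse.
    Under the null, the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # one-sided tail
    return min(1.0, 2 * tail)
```

For example, `mcnemar_exact_p(9, 1)` is about 0.021: nine-to-one discordance on just ten items already rejects equal accuracy at the 5% level, which is why the huge discordant counts behind p-values like 10⁻³⁵ indicate a very lopsided comparison.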

16 pages, 594 KB  
Article
A Data-Driven Analysis of Cognitive Learning and Illusion Effects in University Mathematics
by Rodolfo Bojorque, Fernando Moscoso, Miguel Arcos-Argudo and Fernando Pesántez
Data 2025, 10(11), 192; https://doi.org/10.3390/data10110192 - 19 Nov 2025
Cited by 1 | Viewed by 1184
Abstract
The increasing adoption of video-based instruction and digital assessment in higher education has reshaped how students interact with learning materials. However, it also introduces cognitive and behavioral biases that challenge the accuracy of self-perceived learning. This study aims to bridge the gap between perceived and actual learning by investigating how illusion learning—an overestimation of understanding driven by the fluency of instructional media and autonomous study behaviors—affects cognitive performance in university mathematics. Specifically, it examines how students’ performance evolves across Bloom’s cognitive domains (Understanding, Application, and Analysis) from midterm to final assessments. This paper presents a data-driven investigation that combines the theoretical framework of illusion learning, the tendency to overestimate understanding based on the fluency of instructional media, with empirical evidence drawn from a structured and anonymized dataset of 294 undergraduate students enrolled in a Linear Algebra course. The dataset records midterm and final exam scores across three cognitive domains (Understanding, Application, and Analysis) aligned with Bloom’s taxonomy. Through paired-sample testing, descriptive analytics, and visual inspection, the study identifies significant improvement in analytical reasoning, moderate progress in application, and persistent overconfidence in self-assessment. These results suggest that while students develop higher-order problem-solving skills, a cognitive gap remains between perceived and actual mastery. Beyond contributing to the theoretical understanding of metacognitive illusion, this paper provides a reproducible dataset and analysis framework that can inform future work in learning analytics, educational psychology, and behavioral modeling in higher education.

18 pages, 3035 KB  
Article
A Multi-Institution Mixed Methods Analysis of a Novel Acid-Base Mnemonic Algorithm
by Camille Massaad, Harrison Howe, Meize Guo and Tyler Bland
Multimodal Technol. Interact. 2025, 9(11), 113; https://doi.org/10.3390/mti9110113 - 11 Nov 2025
Cited by 2 | Viewed by 1173
Abstract
Acid-base analysis is a high-load diagnostic skill that many medical students struggle to master when taught using traditional text-based flowcharts. This multi-institution mixed-methods study evaluated a novel visual mnemonic algorithm that integrated Medimon characters, symbolic imagery, and pop-culture references into the standard acid-base diagnostic framework. First-year medical students (n = 273) at six distributed WWAMI campuses attended an identical lecture on acid-base physiology. Students at five control campuses received the original text-based algorithm, while students at one experimental campus received the Medimon algorithm in addition. Achievement was measured with a unit exam (nine focal items, day 7) and a final exam (four focal items, day 11). A Differences-in-Differences approach compared performance on focal items versus baseline items across sites. Students at the experimental campus showed no significant advantage on the unit exam (DiD = +1.2%, g = 0.12) but demonstrated a larger, but still non-significant, medium-to-large effect on the final exam (DiD = +11.0%, g = 0.85). At the experimental site, 39 students completed the Situational Interest Survey for Multimedia (SIS-M), revealing significantly higher triggered, maintained-feeling, maintained-value, and overall situational interest scores for the Medimon algorithm (all p < 0.001). Thematic analysis of open-ended responses identified four themes: enhanced clarity, improved memorability, increased engagement, and barriers to interpretation. Collectively, the findings suggest that embedding visual mnemonics and serious-game characters into diagnostic algorithms can enhance learner interest and may improve long-term retention in preclinical medical education.
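The Differences-in-Differences estimate reported here compares the focal-vs-baseline item gap at the experimental campus with the same gap at the control campuses. A sketch of that arithmetic with invented percent-correct lists, not the study's data:

```python
def mean(xs):
    return sum(xs) / len(xs)

def diff_in_diff(exp_focal, exp_base, ctrl_focal, ctrl_base):
    """DiD estimate from per-student percent-correct lists: the experimental
    site's focal-minus-baseline gap minus the control sites' gap, so shared
    item difficulty and site-level ability differences cancel out."""
    return (mean(exp_focal) - mean(exp_base)) - (mean(ctrl_focal) - mean(ctrl_base))

print(diff_in_diff([80, 90], [70, 80], [75, 85], [74, 84]))  # 9.0
```

Isolating the gap-of-gaps is what lets the study attribute the +11.0 percentage-point final-exam difference to the intervention rather than to one campus simply having stronger students.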

15 pages, 648 KB  
Article
Evaluating a 30-Hour Training Program for Community Health Workers on 4Ms Implementation in FQHCs Using the Kirkpatrick Model
by Sweta Tewary, Cherell Cottrell-Daniels, Kevin Espinoza, Katherine Chung-Bridges, Diego I. Shmuels, Deborah Gracia and Joycelyn J. Lawrence
Healthcare 2025, 13(21), 2677; https://doi.org/10.3390/healthcare13212677 - 23 Oct 2025
Viewed by 857
Abstract
Objective: To evaluate a 30 h educational program delivered to community health workers (CHWs) involved in geriatric care within a primary care clinic; to measure the increase in knowledge and the likelihood of applying the education; and to report baseline results of geriatric screenings conducted by CHWs for patients 65 and older in their clinics. Methods: Design, Setting and Participants: This is a two-center, pre–post evaluation of a 30 h in-person educational program, using the Kirkpatrick model as the evaluation framework. The study combined quantitative and qualitative data collection: surveys measuring knowledge, feedback, content, and participant demographics, and chart reviews measuring clinical implementation of the 4Ms discussion. Qualitative data collection included a focus group and open-ended survey questions, with thematic analysis of the focus-group feedback on the educational program. Results: Twelve CHWs (average age 40, 90% female) from two federally qualified health centers (FQHCs) participated in the 30 h training program. Perceived knowledge improved after completion of the training. Final exam scores after the training also improved significantly, indicating better content retention. Overall, 98% of participants described the training as "Excellent" and 96% rated the speakers as excellent. Additionally, 83% indicated they would apply the training in their practice. Approximately 40% of chart reviews documented completion of the 4Ms (What Matters, Mentation, Medication, and Mobility) discussion with patients. Thematic analysis yielded two new practice dimensions: care provision and clinical documentation. The training also produced organizational adaptation, with the development of an intake form in the Electronic Health Record (EHR) to document the 4Ms.
Conclusion: Results indicate improvement across all dimensions of the training, with particular strength at Level 4, indicating wider organizational adoption of the 4Ms discussion. Full article
19 pages, 257 KB  
Review
From Recall to Resilience: Reforming Assessment Practices in Saudi Theory-Based Higher Education to Advance Vision 2030
by Mubarak S. Aldosari
Sustainability 2025, 17(21), 9415; https://doi.org/10.3390/su17219415 - 23 Oct 2025
Viewed by 1562
Abstract
Assessment practices are central to higher education, and particularly critical in theory-based programs, where they facilitate the development of conceptual understanding and higher-order cognitive skills. They also support Saudi Arabia’s Vision 2030 agenda, which aims to drive educational innovation. This narrative review examines assessment practices in theory-based programs at a Saudi public university, identifies discrepancies with learning objectives, and proposes potential solutions. The review synthesised peer-reviewed literature (2015–2025) from Scopus, Web of Science, ERIC, and Google Scholar, focusing on traditional and alternative assessments, barriers, progress, and comparisons with international standards. It found that traditional summative methods (quizzes, final exams) still dominate and emphasise memorisation, limiting the development of higher-order skills. Emerging techniques, such as projects, portfolios, oral presentations, and peer assessment, are gaining traction but face institutional constraints and faculty resistance. Digital adoption is growing: 63% of students are satisfied with learning management system tools, and 75% find online materials easy to understand; yet advanced analytics and AI-based assessments remain rare. A comparative analysis reveals that international standards favour formative feedback, adaptive technologies, and holistic competencies. The misalignment between current practices and Vision 2030 highlights the need to broaden assessment portfolios, integrate technology, and provide faculty training. Saudi theory-based programs must transition from memory-oriented evaluations to student-centred, evidence-based assessments that foster critical thinking and real-world application. Recommended actions include adopting diverse assessments (projects, portfolios, peer reviews), investing in digital analytics and adaptive learning, aligning assessments with learning outcomes and Vision 2030 competencies, and implementing ongoing faculty development.
The study offers practical pathways for reform and highlights strategic opportunities for achieving Saudi Arabia’s national learning outcomes. Full article
(This article belongs to the Section Sustainable Education and Approaches)
13 pages, 644 KB  
Article
Reaching Students Where They Scroll: A Pilot Study Using Facebook as a Supplementary Learning Platform in Undergraduate Anatomy and Physiology Education
by Homaira M. Azim, Dimitrios E. Bakatsias, Brittnay K. Harrington, Patrick A. Vespa and Kristyn A. Spetz
Anatomia 2025, 4(4), 16; https://doi.org/10.3390/anatomia4040016 - 15 Oct 2025
Viewed by 1136
Abstract
Background: Social networking platforms offer promising educational value, particularly for undergraduate students whose daily lives are deeply embedded in online spaces. Yet in most courses, instructional technologies remain limited to institutional learning management systems (LMSs), which often do not foster informal interaction or community. This study examined whether supplementing LMSs with a Facebook group could enhance academic outcomes and retention in undergraduate Anatomy and Physiology (A&P) courses. Methods: Over two semesters, two student cohorts (n = 39) were taught by the same instructor using identical materials; one cohort also used a closed Facebook group for course-related engagement. Results: While final course grades were not significantly different between groups (p = 0.186), students in the Facebook cohort scored significantly higher on mid-semester unit exams (p < 0.001 to p = 0.006). Regression analysis revealed a 9.4% higher mean final course grade among Facebook users. Importantly, the pass rate in the Facebook cohort was 94.7% compared to 45.0% in the control group, with dropout rates significantly lower (5.3% vs. 55%, p = 0.001). Conclusions: These findings suggest that incorporating social media into undergraduate science instruction may promote academic success and retention by providing a familiar, collaborative space for active learning and peer support. Full article