MDPI - Publisher of Open Access Journals

22 pages, 923 KB

Open Access Article

AI-Powered Natural Language Processing Framework for Reverse-Engineering Examination Questions from Marking Schemes

by Julius Olaniyan, Silas Formunyuy Verkijika and Ibidun Christiana Obagbuwa

Computers 2026, 15(4), 204; https://doi.org/10.3390/computers15040204 - 26 Mar 2026

Viewed by 713

The generation of examination questions from examiner-provided marking schemes remains a critical yet underexplored challenge in automated assessment. This study proposes an AI-powered natural language processing (NLP) framework that reverse-engineers exam questions using transformer-based generative modeling, semantic reconstruction, and pedagogical constraints. Marking schemes are encoded with MPNet embeddings and decoded into candidate questions by a T5-small model, with a reconstruction module ensuring semantic fidelity and Bloom-level embeddings enforcing cognitive alignment. Evaluation on a dataset of 7021 marking schemes from Sol Plaatje University demonstrated strong performance, with BLEU = 0.71, ROUGE-L = 0.68, METEOR = 0.65, reconstruction fidelity = 0.84, and Bloom-level accuracy = 0.79. Comparative baselines, including an unconstrained T5 (BLEU = 0.62, RF = 0.68, Bloom = 0.56) and rule-based methods (BLEU = 0.48, RF = 0.51, Bloom = 0.43), confirmed the effectiveness of the proposed approach. The results indicate that the framework generates questions that are semantically accurate, structurally coherent, and pedagogically valid, offering a scalable solution for adaptive assessment, digital archiving, and automated exam construction.

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling (2nd Edition))

21 pages, 2721 KB

Open Access Article

Assessing the Efficacy of Artificial Intelligence Platforms in Answering Dental Caries Multiple-Choice Questions: A Comparative Study of ChatGPT and Google Gemini Language Models

by Amr Ahmed Azhari, Walaa Magdy Ahmed, Abdulaziz Alhamadani, Amal Alfaraj, Min Zhang and Chang-Tien Lu

Dent. J. 2026, 14(2), 72; https://doi.org/10.3390/dj14020072 - 27 Jan 2026

Viewed by 831

Abstract

Objective: This study aimed to compare the accuracy of two large language models (LLMs)—ChatGPT (version 3.5) and Google Gemini (formerly Bard)—in answering dental caries-related multiple-choice questions (MCQs) using a simulated student examination framework across seven examination lengths. Materials and Methods: A total of 125 validated dental caries MCQs were extracted from Dental Decks and Oxford University Press question banks. Seven examination groups were constructed with varying question counts (25, 35, 45, 55, 65, 75, and 85 questions). For each group, 100 simulations were generated per LLM (ChatGPT and Gemini), resulting in 1400 simulated examinations. Each simulated student received a unique randomized subset of questions. MCQs were answered by each LLM using a standardized prompt to minimize ambiguity. Outcomes included mean score, passing rate (≥60%), and performance differences between LLMs. Statistical analyses included independent t-tests, one-way ANOVA within each LLM, and two-way ANOVA examining interactions between LLM type and question count. Results: Across all seven examination formats, Gemini significantly outperformed ChatGPT (p < 0.001). Gemini achieved higher passing rates and higher mean scores in every examination length. One-way ANOVA revealed significant score variation with increasing exam length for both LLMs (p < 0.05). Two-way ANOVA demonstrated significant main effects of LLM type and question count, with no significant interaction. Randomization had no measurable effect on Gemini performance but influenced ChatGPT scores. Conclusions: Gemini demonstrated superior accuracy and higher passing rates compared to ChatGPT in all simulated examination formats. While both LLMs struggled with complex caries-related content, Gemini provided more reliable performance across question quantities. 
Educators should exercise caution in relying on LLMs for automated assessment or self-study, and future research should evaluate human–AI hybrid models and LLM performance across broader dental domains.
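The simulation design described in this abstract (7 examination lengths, 100 simulations per length, 2 LLMs, 1400 examinations in total) can be sketched as follows. This is only an illustrative reconstruction from the figures stated in the abstract, not the authors' code: the function names `build_simulated_exams` and `passed`, the seed, and the use of uniform sampling without replacement are all assumptions.

```python
import random

# Figures taken from the abstract.
BANK_SIZE = 125                               # validated dental caries MCQs
EXAM_LENGTHS = [25, 35, 45, 55, 65, 75, 85]   # seven examination groups
SIMULATIONS_PER_GROUP = 100                   # simulated students per group and LLM
MODELS = ["ChatGPT", "Gemini"]
PASS_MARK = 0.60                              # passing rate threshold (>= 60%)


def build_simulated_exams(seed=0):
    """Hypothetical helper: draw one random question subset per simulated student.

    Assumption: subsets are sampled uniformly without replacement. Sampling
    alone does not strictly guarantee that every subset is unique, although a
    repeat is extremely unlikely at these sizes.
    """
    rng = random.Random(seed)
    exams = []
    for model in MODELS:
        for length in EXAM_LENGTHS:
            for _ in range(SIMULATIONS_PER_GROUP):
                question_ids = rng.sample(range(BANK_SIZE), length)
                exams.append(
                    {"model": model, "length": length, "questions": question_ids}
                )
    return exams


def passed(num_correct, length):
    """Hypothetical helper: apply the >= 60% passing criterion to one exam."""
    return num_correct / length >= PASS_MARK


exams = build_simulated_exams()
print(len(exams))  # 2 models x 7 lengths x 100 simulations
```

Scoring each exam would additionally require the LLM's answers and the answer key, which the abstract does not provide, so that step is left out here.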

32 pages, 29650 KB

Open Access Article

Unsupervised Optical Mark Recognition on Answer Sheets for Massive Printed Multiple-Choice Tests

by Yahir Hernández-Mier, Marco Aurelio Nuño-Maganda, Said Polanco-Martagón, Guadalupe Acosta-Villarreal and Rubén Posada-Gómez

J. Imaging 2025, 11(9), 308; https://doi.org/10.3390/jimaging11090308 - 8 Sep 2025

Viewed by 4156

Abstract

The large-scale evaluation of multiple-choice tests is a challenging task from the perspective of image processing. A typical instrument is a multiple-choice question test that employs an answer sheet with circles or squares. Once students have finished the test, the answer sheets are digitized and sent to a processing center for scoring. Operators compute each exam score manually, but this task requires considerable time. Although mature algorithms exist for detecting circles under controlled conditions, they may fail in real-life applications, even when the answer sheet images are acquired under controlled conditions. This paper proposes a desktop application for optical mark recognition (OMR) on scanned multiple-choice question (MCQ) test answer sheets. First, we compiled a set of answer sheet images corresponding to 6029 exams (totaling 564,040 four-option answers) administered in 2024 in Tamaulipas, Mexico. Subsequently, we developed an image-processing module that extracts answers from the answer sheets and an interface that lets operators run the analysis by selecting the folder containing the exams and generates results in a tabulated format. We evaluated the image-processing module, with 96.15% of exams graded without error and 99.95% of four-option answers classified correctly. We obtained these percentages by comparing the answers generated by our system with those generated by human operators, who took an average of 2 min to produce the answers for a single answer sheet, while the automated version took an average of 1.04 s.

(This article belongs to the Special Issue Self-Supervised Learning for Image Processing and Analysis)

15 pages, 4095 KB

Open Access Article

AI-Generated Mnemonic Images Improve Long-Term Retention of Coronary Artery Occlusions in STEMI: A Comparative Study

by Zahraa Alomar, Meize Guo and Tyler Bland

Technologies 2025, 13(6), 217; https://doi.org/10.3390/technologies13060217 - 26 May 2025

Cited by 4 | Viewed by 3501

Abstract

Medical students face significant challenges retaining complex information, such as interpreting electrocardiograms (ECGs) for coronary artery occlusions, amidst demanding curricula. While artificial intelligence (AI) is increasingly used for medical image analysis, this study explored using generative AI (DALLE-3) to create mnemonic-based images to enhance human learning and retention of medical images, in particular ECGs. This study is among the first to investigate generative AI as a tool not for automated diagnosis but as a human-centered educational aid designed to enhance long-term retention in complex visual tasks like ECG interpretation. We conducted a comparative study with 275 first-year medical students across six campuses; an experimental group (n = 40) received a lecture supplemented with AI-generated mnemonic ECG images, while control groups (n = 235) received standard lectures with traditional ECG diagrams. Student achievement and retention were assessed by course examinations, and student preference and engagement were measured using the Situational Interest Survey for Multimedia (SIS-M). Control groups showed a significant decline in scores on the relevant exam question over time, whereas the experimental group’s scores remained stable, indicating improved long-term retention. Experimental students also reported significantly higher situational interest in the mnemonic-based images than in traditional images. AI-generated mnemonic images can effectively improve long-term retention of complex ECG interpretation skills and enhance student engagement and preference, highlighting generative AI’s potential as a valuable cognitive tool in image analysis during medical education.

(This article belongs to the Special Issue Application of Artificial Intelligence in Medical Image Analysis)

34 pages, 6263 KB

Open Access Article

Advancing AI in Higher Education: A Comparative Study of Large Language Model-Based Agents for Exam Question Generation, Improvement, and Evaluation

by Vlatko Nikolovski, Dimitar Trajanov and Ivan Chorbev

Algorithms 2025, 18(3), 144; https://doi.org/10.3390/a18030144 - 4 Mar 2025

Cited by 22 | Viewed by 8597

Abstract

The transformative capabilities of large language models (LLMs) are reshaping educational assessment and question design in higher education. This study proposes a systematic framework for leveraging LLMs to enhance question-centric tasks: aligning exam questions with course objectives, improving clarity and difficulty, and generating new items guided by learning goals. The research spans four university courses (two theory-focused and two application-focused), covering diverse cognitive levels according to Bloom’s taxonomy. A balanced dataset ensures representation of question categories and structures. Three LLM-based agents (VectorRAG, VectorGraphRAG, and a fine-tuned LLM) are developed and evaluated against a meta-evaluator, supervised by human experts, to assess alignment accuracy and explanation quality. Robust analytical methods, including mixed-effects modeling, yield actionable insights for integrating generative AI into university assessment processes. Beyond exam-specific applications, this methodology provides a foundational approach for the broader adoption of AI in post-secondary education, emphasizing fairness, contextual relevance, and collaboration. The findings offer a comprehensive framework for aligning AI-generated content with learning objectives, detailing effective integration strategies, and addressing challenges such as bias and contextual limitations. Overall, this work underscores the potential of generative AI to enhance educational assessment while identifying pathways for responsible implementation.

(This article belongs to the Special Issue Artificial Intelligence Algorithms and Generative AI in Education)

22 pages, 988 KB

Open Access Article

Assessing the Potential and Risks of AI-Based Tools in Higher Education: Results from an eSurvey and SWOT Analysis

by Kerstin Denecke, Robin Glauser and Daniel Reichenpfader

Trends High. Educ. 2023, 2(4), 667-688; https://doi.org/10.3390/higheredu2040039 - 6 Dec 2023

Cited by 29 | Viewed by 18451

Abstract

Recent developments related to tools based on artificial intelligence (AI) have raised interest in many areas, including higher education. While machine translation tools have been available and in use for many years in teaching and learning, generative AI models have sparked concerns within the academic community. The objective of this paper is to identify the strengths, weaknesses, opportunities and threats (SWOT) of using AI-based tools (ABTs) in higher education contexts. We employed a mixed methods approach to achieve our objectives; we conducted a survey and used the results to perform a SWOT analysis. For the survey, we asked lecturers and students to answer 27 questions (Likert scale, free text, etc.) on their experiences and viewpoints related to AI-based tools in higher education. A total of 305 people from different countries and with different backgrounds answered the questionnaire. The results show that the participants expect ABTs to have a moderate to high future impact on teaching, learning and exams. The perceived strengths of ABTs are the personalization of the learning experience and increased efficiency via automation of repetitive tasks. Several use cases are envisioned but are not yet used in daily practice. Challenges include skills teaching, data protection and bias. We conclude that research is needed to study the unintended consequences of ABT usage in higher education, in particular for developing countermeasures, and to demonstrate the benefits of ABT usage in higher education. Furthermore, we suggest defining a competence model specifying the required skills that ensure the responsible and efficient use of ABTs by students and lecturers.

(This article belongs to the Special Issue EdTech in Higher Education: Future Perspective on Teaching and Learning)

16 pages, 1744 KB

Open Access Article

An Ontology-Driven Learning Assessment Using the Script Concordance Test

by Maja Radovic, Nenad Petrovic and Milorad Tosic

Appl. Sci. 2022, 12(3), 1472; https://doi.org/10.3390/app12031472 - 29 Jan 2022

Cited by 7 | Viewed by 4003

Abstract

Assessing the level of domain-specific reasoning acquired by students is one of the major challenges in education, particularly in medical education. Considering the importance of clinical reasoning in preclinical and clinical practice, it is necessary to evaluate students’ learning achievements accordingly. The traditional way of assessing clinical reasoning includes long-case exams, oral exams, and objective structured clinical examinations. However, the traditional assessment techniques are not enough to answer emerging requirements in the new reality due to limited scalability and difficulty of adoption in online education. In recent decades, the script concordance test (SCT) has emerged as a promising tool for assessment, particularly in medical education. The question is whether the usability of SCT could be raised to a level high enough to match the current education requirements by exploiting opportunities that new technologies provide, particularly semantic knowledge graphs (SCGs) and ontologies. In this paper, an ontology-driven learning assessment is proposed using a novel automated SCT generation platform. The SCTonto ontology is adopted for knowledge representation in SCT question generation, with a focus on using electronic health records data for medical education. Direct and indirect strategies for generating Likert-type scores of SCT are described in detail as well. The proposed automatic question generation was evaluated against the traditional manually created SCT, and the results showed that the time required for test creation was significantly reduced, which confirms significant scalability improvements with respect to traditional approaches.

(This article belongs to the Special Issue Application of Ontologies and Semantic Web Technologies in Biomedical Science)

Search Results (7)
