by Amy H. Y. Chow, Barbara Caffery, Angela Di Marco, Sarah Guthrie, Mira Acs, Stephanie Fromstein, Shalu Pal, Stephanie Ramdass, Vishakha Thakrar, Matthew Zeidenberg and Deborah A. Jones

J. Clin. Med. 2026, 15(7), 2748; https://doi.org/10.3390/jcm15072748 - 5 Apr 2026

Viewed by 307

Abstract

Background/Objectives: Given the growing prevalence of myopia worldwide, prevention and proactive management of at-risk children becomes increasingly important. This study sought to evaluate trends in myopia development in pediatric pre-myopic patients and determine how optometrists in Canada manage pre-myopia. Methods: In this retrospective chart review, records for children aged 6–10 years who had an eye exam between 2017 and 2021 were reviewed. Pre-myopic children were included if the presenting refraction at the first visit was between +0.75D and −0.25D (inclusive). Up to five unique patients were selected for each age (6, 7, 8, 9, and 10) and initial visit year (2017 to 2021) at each clinical site. Demographic information, refractive status and recommended interventions were recorded. Results: A total of 1740 pre-myopic patients were included across 15 practices in Ontario, of which 184 patients developed myopia (10.6%) during the years studied. Cohort year groups did not differ in baseline age (mean ± SD 8.39 ± 1.43 years) or baseline refractive error (+0.13 ± 0.27 DS). At initial encounters, most clinicians monitored without intervention (mean across cohort years 91.9%), with some recommending lifestyle changes (3.5%) and SV spectacles/CL (3.0%). This pattern remained stable over the years studied. Pre-myopic children developed myopia at a similar age over the study period (mean ± SE: 9.66 ± 0.16 years) and experienced a faster rate of loss of hyperopic reserve (loss of −0.26 ± 0.07 D/year in the 2017 cohort vs. −0.73 ± 0.18 D/year in the 2020 cohort and −0.71 ± 0.10 D/year in the 2021 cohort) regardless of patient age. Conclusions: Pre-myopic children in the 2020 and 2021 cohort years experienced an accelerated loss of hyperopic reserve compared to those in the 2017 cohort. Despite this, very few pre-myopic children were recommended lifestyle changes, which were known to be effective for delaying myopia onset. 
Since delaying myopia onset may be more impactful than subsequent myopia treatment, additional research should focus on effective interventions for the pre-myopic population.

(This article belongs to the Special Issue Unraveling Myopia: Current Science, Clinical Impact, and Future Horizons)

15 pages, 1755 KB

Open Access Article

A Faculty-Constructed AI Tutor for Personalized Learning and Remediation in a U.S. PharmD Immunology Course: An “In-House” Evaluation of New Learning Technology

by Ashim Malhotra

Pharmacy 2026, 14(2), 59; https://doi.org/10.3390/pharmacy14020059 - 3 Apr 2026

Viewed by 220

Abstract

While generative AI becomes increasingly available in higher education, faculties find it challenging to design, implement, and evaluate AI-enabled personalized learning systems within accreditation-constrained professional curricula. This method paper describes ADAPT (Assessment-Driven AI for Personalized Tutoring), a home-grown AI tutoring and remediation ecosystem implemented in a required PharmD immunology course. Using standard learning management (Canvas) and assessment (ExamSoft) platforms, a 20-item quiz mapped to six immunology mastery domains (N = 34; mean 69.1%, SD 17.9; Cronbach’s α = 0.73) was used to trigger tiered, structured generative AI remediation at both individual student and cohort levels. Instructional impact was evaluated using reliability indices, item-level difficulty analyses, and paired pre/post-assessment comparisons. Following AI-guided remediation, mean performance increased to 79.8% (+10.7 percentage points), variability decreased (SD 14.4), and assessment reliability improved (ExamSoft KR-20 0.87) compared with the diagnostic exam, the first midterm exam, and the final exam, respectively. Item difficulty stabilized (mean ≈ 0.80), with sustained retention of targeted concepts on the final examination. ADAPT provides a replicable, low-cost methodological blueprint for faculties to independently construct assessment-driven AI tutoring systems and lays the foundational steps for future AI-based predictive analysis workflow for at-risk students.

(This article belongs to the Section Pharmacy Education and Student/Practitioner Training)

15 pages, 381 KB

Open Access Article

Assessment Validity in the Age of Generative AI: A Natural Experiment

by Håvar Brattli, Alexander Utne and Matthew Lynch

Informatics 2026, 13(4), 56; https://doi.org/10.3390/informatics13040056 - 3 Apr 2026

Viewed by 306

Abstract

Universities play a dual role as sites of learning and as institutions that certify student competence through assessment. The rapid diffusion of generative artificial intelligence (GenAI) challenges this certification function by altering the conditions under which assessment evidence is produced. When powerful AI tools are widely available, grades may increasingly reflect a combination of individual understanding and external cognitive support rather than solely independent competence. This study examines how changes in assessment format interact with GenAI availability to reshape observable performance outcomes in higher education. Using exam grade data from a compulsory undergraduate course delivered over five years (2021–2025; N = 1066), the study exploits a naturally occurring change in assessment conditions as a natural experiment. From 2021 to 2024, the course was assessed using an AI-permissive take-home examination, while in 2025 the assessment shifted to an AI-restricted, supervised in-person examination. Course content, intended learning outcomes, grading criteria, examiner continuity, and the structural design of the examination tasks remained stable across cohorts. The results reveal a pronounced shift in grade distributions coinciding with the format change. Failure rates increased sharply in 2025, mid-range grades declined, and the proportion of top grades remained largely unchanged. Statistical analysis indicates a significant association between examination period and grade outcomes (χ²(5, N = 1066) = 60.62, p < 0.001), with a small-to-moderate effect size (Cramér’s V = 0.24), driven primarily by the increase in failing grades. These findings suggest that AI-permissive and AI-restricted assessment formats may not be measurement-equivalent under conditions of widespread GenAI use. 
The results raise concerns about construct validity and the credibility of grades as signals of independent competence, while also highlighting tensions between certification credibility and assessment authenticity.
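
The reported effect size can be checked from the reported test statistic. The sketch below is a minimal illustration, not the authors' analysis code: it assumes the standard Cramér's V formula and a 2 × 6 contingency table (two examination periods by six grade categories), which is only inferred from the reported 5 degrees of freedom. The function name `cramers_v` is hypothetical.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V for an r x c contingency table: sqrt(chi2 / (n * min(r-1, c-1)))."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi2 / (n * k))


# Values reported in the abstract.
chi2_reported = 60.62
n_students = 1066

# Assumption: a 2 x 6 table, inferred from df = (2 - 1) * (6 - 1) = 5.
v = cramers_v(chi2_reported, n_students, n_rows=2, n_cols=6)
print(round(v, 2))
```

Under that assumed table shape, the result rounds to the V = 0.24 stated in the abstract.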

(This article belongs to the Special Issue Generative AI in Higher Education: Applications, Implications, and Future Directions)

19 pages, 1462 KB

Open Access Article

Heterogeneous Layout-Aware Cross-Modal Knowledge Point Classification for Exam Questions

by Zhushun Su, Bi Zeng, Pengfei Wei, Keyun Wang and Zhentao Lin

Computation 2026, 14(4), 82; https://doi.org/10.3390/computation14040082 - 1 Apr 2026

Viewed by 182

Abstract

With the continuous emergence of exam question types, accurate classification of knowledge points is crucial for intelligent exam analysis. Existing methods focus on text or text–image fusion but largely ignore spatial layout. To address this limitation, we propose a heterogeneous layout-aware cross-modal framework for knowledge point classification. The architecture begins with an encoding module where independent text and layout encoders extract semantic content and spatial configurations, respectively. We then design a layout-aware enhancing module consisting of two parallel cross-modal blocks, namely a Layout-Aware Text-Enhancing block and a Context-Aware Layout-Enhancing block. This module supports the bidirectional fusion of text and layout features and generates a comprehensive representation that integrates both semantic and spatial information. Furthermore, a dynamic router with top-k expert selection is introduced to dynamically adapt to question-specific knowledge distributions and focus on core knowledge points for precise classification. Experimental results demonstrate that our method effectively integrates text and layout information, significantly enhancing performance on the proposed QType-EDU dataset. The approach achieves 91.56% accuracy for coarse-grained classification and 80.58% for fine-grained classification, with an overall F1-score of 91.39%, surpassing all baseline models.

(This article belongs to the Section Computational Engineering)

19 pages, 610 KB

Open Access Article

Quality Assessment of Generative AI in Cybersecurity Certification

by Vanessa G. Félix, Rodolfo Ostos, Luis J. Mena, Homero Toral-Cruz, Alberto Ochoa-Brust, Pablo Velarde-Alvarado, Apolinar González-Potes, Ramón A. Félix-Cuadras, José A. León-Borges and Rafael Martínez-Peláez

Informatics 2026, 13(4), 53; https://doi.org/10.3390/informatics13040053 - 30 Mar 2026

Viewed by 527

Abstract

Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs), is rapidly changing how higher education approaches teaching, learning, and assessment. In cybersecurity education, professional certification exams are key for measuring competence and helping professionals find better job offers, but there is little research on how GenAI systems perform in these exam settings. This study looks at how three popular LLMs, ChatGPT-5, Gemini-2.5 Pro, and Copilot-2.5 Pro, handle 183 practice questions from the CompTIA Security+ certification. The study used a two-phase evaluation: a domain-based assessment and a full-length practice exam that mirrors real certification tests. The researchers measured model performance with accuracy scores, chi-square tests for statistical differences, and an error taxonomy to spot patterns of mistakes important for education. All three GenAI systems scored above the passing mark, and there were no significant differences between them. Still, the error analysis showed ongoing conceptual and classification mistakes that did not show up in the overall accuracy scores. Our results show that GenAI systems can pass structured certification tests, but accuracy by itself does not fully measure professional skills. The study points out important issues for the reliability and validity of AI-based assessments in higher education and stresses the need for more realistic, concept-focused ways to evaluate GenAI in cybersecurity education.

(This article belongs to the Special Issue Generative AI in Higher Education: Applications, Implications, and Future Directions)

15 pages, 2435 KB

Open Access Article

Clinical Performance Tradeoffs of ChatGPT-5.2 Thinking (OpenAI) Compared with Radiologist Interpretation in Biopsy-Referred Mammography: Cancer Detection, False Positives, and Laterality

by Mohammad Alarifi, Areej Aloufi, Abdulrahman Jabour, Ahmad Abanomy, Haitham Alahmad, Khaled Alenazi, Alhanouf Alshedi and Mansour Almanaa

Tomography 2026, 12(4), 45; https://doi.org/10.3390/tomography12040045 - 29 Mar 2026

Viewed by 412

Abstract

Background/Objectives: Breast cancer screening such as mammography supports earlier detection, but variability in interpretation can still lead to missed cancers and avoidable follow-up testing. We evaluated ChatGPT-5.2 Thinking (OpenAI) as a stand-alone model for examination-level malignancy classification on standard bilateral mammography views in a biopsy-referred cohort, compared with breast radiologists, and assessed laterality performance. Methods: We conducted a retrospective, multicenter diagnostic-accuracy study across breast imaging centers in Saudi Arabia. From an upstream screened cohort (n = 1225), we constructed a biopsy-referred test set of 100 mammography examinations (four 2D views per exam: bilateral CC and MLO; 400 images), including 61 biopsy-confirmed malignancies and 39 biopsy-negative controls, with pathology as the reference standard. Radiologists were blinded to pathology and AI outputs and assigned BI-RADS (0–5) and suspected laterality. ChatGPT-5.2 interpreted the same de-identified views using a BI-RADS-guided prompt to generate BI-RADS and laterality. The sensitivity, specificity, accuracy, and laterality concordance were then estimated. Results: ChatGPT-5.2 had higher sensitivity than radiologists (95.08% vs. 81.97%) but markedly lower specificity (10.26% vs. 56.41%), resulting in lower overall accuracy (62.00% vs. 72.00%). The AI produced 58 true positives, 35 false positives, and 3 false negatives, while radiologists produced 50 true positives, 17 false positives, and 11 false negatives. Laterality accuracy among malignant examinations was 60.66%. Conclusions: In this pathology-anchored, biopsy-referred evaluation, ChatGPT-5.2 identified more cancers but generated substantially more false-positive classifications and showed only moderate breast-side localization. 
These findings support use as a concurrent aid or prioritization tool rather than a stand-alone reader and motivate efforts to improve specificity and laterality before prospective validation.
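
The reported percentages follow directly from the counts given in the Results. The sketch below is a minimal illustration under one stated assumption: true negatives are not reported, so they are derived here as the 39 biopsy-negative controls minus the false positives. The `Confusion` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a binary malignant / not-malignant classification."""
    tp: int
    fp: int
    fn: int
    tn: int

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total


# Assumption: TN = biopsy-negative controls (39) minus reported false positives.
N_NEGATIVE = 39
ai = Confusion(tp=58, fp=35, fn=3, tn=N_NEGATIVE - 35)
radiologists = Confusion(tp=50, fp=17, fn=11, tn=N_NEGATIVE - 17)

for name, c in (("AI", ai), ("Radiologists", radiologists)):
    print(name,
          f"sensitivity={100 * c.sensitivity():.2f}%",
          f"specificity={100 * c.specificity():.2f}%",
          f"accuracy={100 * c.accuracy():.2f}%")
```

With the derived true-negative counts (4 for the AI, 22 for radiologists), the computed values match the sensitivity, specificity, and accuracy figures stated in the abstract.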

(This article belongs to the Section Artificial Intelligence in Medical Imaging)

21 pages, 363 KB

Open Access Article

Teacher Bilingual Ideology as Catalyst in EAP: Influencing Chinese Graduate Students’ Language Beliefs

by Shuai An and Wenli Zhang

Educ. Sci. 2026, 16(4), 516; https://doi.org/10.3390/educsci16040516 - 26 Mar 2026

Viewed by 307

Abstract

English for Academic Purposes (EAP) courses primarily aim to cultivate academic communication, yet English-only norms and exam-oriented histories often discourage bilingual participation. This qualitative study traced Chinese graduate students’ language-belief development over one semester in a graduate EAP course and examined how the instructor mediated that process. Data included two rounds of open-ended surveys in two intact classes (N = 40), two interview rounds and end-of-semester reflections from ten purposively selected focus students (n = 10), and video-recorded classroom observations of 12 lessons. Findings show that the students increasingly legitimized bilingual participation and reframed English learning from test preparation toward academic communication. Beliefs nevertheless remained layered. Many still upheld an English-only ideal, treated English as the default language, and positioned the first language (L1) mainly as support when second language (L2) expression became difficult. Endorsement also exceeded uptake, with L1 use treated as a compensatory fallback rather than a co-equal academic resource. Instructor policy, conceptual framing, and interactional modeling reduced anxiety around bilingual moves and sometimes supported greater willingness to attempt more English, which identifies mechanisms for bilingual-aware EAP pedagogy in monolingual-leaning EFL contexts.

(This article belongs to the Special Issue Research, Innovation, and Practice in Bilingual Education)

22 pages, 923 KB

Open Access Article

AI-Powered Natural Language Processing Framework for Reverse-Engineering Examination Questions from Marking Schemes

by Julius Olaniyan, Silas Formunyuy Verkijika and Ibidun Christiana Obagbuwa

Computers 2026, 15(4), 204; https://doi.org/10.3390/computers15040204 - 26 Mar 2026

Viewed by 368

Abstract

The generation of examination questions from examiner-provided marking schemes remains a critical yet underexplored challenge in automated assessment. This study proposes an AI-powered natural language processing (NLP) framework that reverse-engineers exam questions using transformer-based generative modeling, semantic reconstruction, and pedagogical constraints. Marking schemes are encoded with MPNet embeddings and decoded into candidate questions by a T5-small model, with a reconstruction module ensuring semantic fidelity and Bloom-level embeddings enforcing cognitive alignment. Evaluation on a dataset of 7021 marking schemes from Sol Plaatje University demonstrated strong performance, with BLEU = 0.71, ROUGE-L = 0.68, METEOR = 0.65, reconstruction fidelity = 0.84, and Bloom-level accuracy = 0.79. Comparative baselines, including an unconstrained T5 (BLEU = 0.62, RF = 0.68, Bloom = 0.56) and rule-based methods (BLEU = 0.48, RF = 0.51, Bloom = 0.43), confirmed the effectiveness of the proposed approach. The results indicate that the framework generates questions that are semantically accurate, structurally coherent, and pedagogically valid, offering a scalable solution for adaptive assessment, digital archiving, and automated exam construction.

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling (2nd Edition))

15 pages, 254 KB

Open Access Article

Sustaining Learning Practices: Exploring the Roles of External Engagement for Engineering Graduates

by Pornthipa Ongkunaruk, Panuwat Rodchom, Bordin Rassameethes and Kongkiti Phusavat

Sustainability 2026, 18(7), 3218; https://doi.org/10.3390/su18073218 - 25 Mar 2026

Viewed by 386

Abstract

This exploratory study addresses the shift in employment preferences among recent engineering graduates toward small and medium enterprises (SMEs) and startups, highlighting the importance of learning. Learning, instead of conventional training, has been pivotal for the sustainability of small businesses. The study evaluated industrial engineering students’ perception of learning following a 2023 change, which replaced traditional exams with professional presentations and business reports based on enterprise visits. Using a mixed-methods approach with 218 third-year students, the findings demonstrate that external engagement relates positively to the perception of learning. Given the rising interest among new engineering graduates in SMEs and startups, these findings offer useful background for preparing workplaces to support and sustain business operations.

(This article belongs to the Special Issue Sustainable Higher Education: Innovative Teaching and Learning, and Leadership for Creating Impacts on Local Society and Globally)

13 pages, 1341 KB

Open Access Article

Incidental Hepatic Findings in Cardiac Magnetic Resonance Imaging Examinations in Patients with Congenital Heart Disease: A Pilot Study

by Gretha Hecke, Bianca Haase, Nikolaus Clodi, Karolin Hauptvogel, David Plajer, Jakob Spogis, Anja Hanser, Jürgen F. Schäfer, Konstantin Nikolaou, Johannes Nordmeyer and Sarah Nordmeyer

J. Clin. Med. 2026, 15(6), 2453; https://doi.org/10.3390/jcm15062453 - 23 Mar 2026

Viewed by 359

Abstract

Objectives: During cardiac magnetic resonance imaging (cMRI) exams in patients with congenital heart disease (CHD), incidental liver abnormalities are increasingly found. However, no systematic data exist on the incidence of liver lesions in patients with different CHDs. In order to gain a first overview, we retrospectively analyzed cMRI examinations from the last 10 years at our institution. Methods: cMRI examinations including T2-weighted images covering parts of the liver were performed on 899 patients with CHD at our institution between 2014 and 2024. The cMRI examinations were analyzed by a medical student, a pediatrician, a radiologist, and a pediatric cardiologist. Liver lesions were defined as atypical liver parenchyma, showing T2 hyper- or hypointensity compared to the surrounding liver tissue. Results: Liver lesions were found in 9.5% (85/899) of all cMRI studies; of these, 89% (76/85) were unknown at the time of cMRI, 96% (82/85) were T2 hyperintense, and 38% (32/85) were larger than 1 cm. The patients with liver lesions were older (29 years vs. 22 years, p < 0.0001). There were no sex differences in the incidence of liver lesions or differences in right or left ventricular function (LVEF: 57% vs. 58%, p = 0.78; RVEF: 55% vs. 54%, p = 0.35). The patients with univentricular hearts, transposition of great arteries after atrial switch operation, and atrial septal defects showed the highest incidence (18%, 17%, and 21%, respectively). However, 9% of patients with left heart-sided valve disease also showed liver lesions. Conclusions: Incidental liver lesions in cMRI examinations of patients with CHD are fairly common, occurring in almost 10% of studies. In the growing population of adults with CHD, liver monitoring might be helpful to assure overall patient health.

(This article belongs to the Section Cardiovascular Medicine)

19 pages, 1032 KB

Open Access Review

Assessment of Congestion in Heart Failure Using VExUS: Current Evidence, Limitations and Clinical Perspectives

by Cosmina-Georgiana Ponor, Maria-Ruxandra Cepoi, Marilena Renata Spiridon, Ionuț Tudorancea, Amelian Mădălin Bobu, Minerva Codruta Badescu, Alexandru Dan Costache, Sandu Cucută and Irina-Iuliana Costache-Enache

Life 2026, 16(3), 518; https://doi.org/10.3390/life16030518 - 20 Mar 2026

Viewed by 1361

Abstract

Background: Systemic venous congestion is a key driver of organ dysfunction in heart failure (HF), yet accurate non-invasive quantification remains challenging. Recognizing residual congestion is critical, since it predicts HF readmissions and mortality. Traditional assessments (physical exam, jugular venous pressure, inferior vena cava [IVC] size) are imprecise. The Venous Excess Ultrasound Score (VExUS) is a semi-quantitative point-of-care ultrasound (POCUS) protocol that integrates IVC diameter with Doppler flow patterns in the hepatic, portal and intrarenal veins to grade systemic venous overload. Methods: We conducted a narrative review of literature (2018–2025) regarding the usefulness of VExUS in HF, covering congestion pathophysiology, clinical evidence (hemodynamic correlations, organ dysfunction, outcomes), potential applications, integration with lung ultrasound, echocardiography and biomarkers, limitations of its assessment and future directions. Results and Discussions: In HF, elevated right atrial pressure causes venous congestion. VExUS integrates IVC diameter with Doppler waveforms of hepatic, portal, and intrarenal veins to grade congestion. Emerging evidence shows higher VExUS grades correlate with elevated filling pressures, renal dysfunction, and worse outcomes. Its use may guide diuretic therapy, aid discharge planning, and monitor outpatient congestion, especially when combined with lung ultrasound and biomarkers. However, VExUS has limitations: it is technical and operator-dependent. Importantly, large trials validating VExUS-guided management are lacking. Future directions include AI-driven automation of Doppler analysis and integration with multimodal congestion monitoring to provide a comprehensive congestion assessment. Conclusions: VExUS is a promising noninvasive tool for quantifying congestion in HF. Higher grades are associated with organ dysfunction and poor prognosis. 
Incorporating this technique into HF care may improve congestion-guided therapy, but large-scale validation is required before routine use.

(This article belongs to the Special Issue Precision Medicine in Heart Failure: From Biomarkers to Targeted Therapies)

15 pages, 1561 KB

Open Access Article

Virtual Reality Enables Rapid and Multi-Faceted Vision Screening in a Pilot Study

by Margarita Labkovich, Andrew J. Warburton, Christopher P. Cheng, Oluwafeyikemi O. Okome, Vicente Navarro, Randal A. Serafini, Aly A. Valliani, Harsha Reddy and James Chelnis

J. Clin. Transl. Ophthalmol. 2026, 4(1), 8; https://doi.org/10.3390/jcto4010008 - 18 Mar 2026

Viewed by 232

Abstract

Background: Given global population growth and aging, it is imperative to prioritize early eye disease detection and treatment. However, as patient volume increases, providers are facing a shortage of workforce capacity, particularly in areas where eye doctors are already scarce, making it important to consider alternative innovative solutions that could help increase eye screening capabilities. This study compared a virtual reality (VR) platform for vision screening exams used to evaluate ocular health, such as 24-2 perimetry, Ishihara tiles, and the Amsler grid, against their in-clinic counterparts. Methods: A total of 86 subjects were recruited from Mount Sinai’s ophthalmology clinic (New York, USA) for a comparison trial that was internally controlled across healthy eyes and those with glaucoma and retinal diseases. VR and in-office tests were administered to the patients during their clinical visit, including 24-2 perimetry, Ishihara tiles, and the Amsler grid in a randomized order, and the results were compared for each test. Results: Perimetry results from Humphrey Visual Field Analyzer (HVFA) and VR suprathreshold testing demonstrated good sensitivity both overall (80% OD, 84% OS) and across control (86% OD, 89% OS), glaucoma (69% OD, 78% OS), and retinal disease (76% OD, 80% OS) groups. A Garway-Heath anatomical map showed an overall 70–80% agreement. Ishihara plate tests did not show a significant difference between the two testing modalities (p = 0.12; Mann–Whitney U test), which remained true across all groups. Amsler grid testing differences were also non-significant within each subgroup (p = 0.81; Mann–Whitney U test). Patient time required to complete VR exams was significantly improved (p < 0.0001; Welch’s t-test) compared to the clinical standard tests. Conclusions: All VR-based exams tested in this study showed high sensitivity and percent agreement when compared to their in-office standards. 
Given the results of this study, VR has promising potential in visual function screening, which, in addition to its portable design and easy use, could assist eye doctors in screening for prevalent diseases such as glaucoma and retinal conditions. Translational Relevance: VR-based vision exams that test visual fields, color vision and visual distortions provide comparable results in healthy patients, as well as those with glaucoma and retinal diseases, indicating their potential as a screening technology for different ocular pathologies. Given VR’s portable and low-profile features, it is important to consider leveraging VR to augment delivery of vision care.

33 pages, 2332 KB

Open Access Article

EvalHack: Answer-Side Prompt Injection for Probing LLM Exam-Grading Panel Stability

by Catalin Anghel, Marian Viorel Craciun, Adina Cocu, Andreea Alexandra Anghel, Antonio Stefan Balau, Adrian Istrate and Aurelian-Dumitrache Anghele

Information 2026, 17(3), 297; https://doi.org/10.3390/info17030297 - 18 Mar 2026

Viewed by 377

Abstract

Large language models are increasingly used as automated graders, yet their reliability under answer-side manipulation and their behavior in multi-model panels remain insufficiently understood. This paper introduces EvalHack, a matrix benchmark in which a fixed committee of four LLMs grades university-level machine learning exam answers under a strict integer-only contract (0–10) grounded in instructor-authored rubric artifacts. The dataset comprises 100 students answering 10 short, open-ended items (1000 answers). For each answer, the evaluation includes a clean version and two content-preserving adversarial variants that operate only on the student text: A₁, a visible coercive suffix appended to the answer, and A₂, a stealth variant that uses Unicode control characters (e.g., zero-width and bidirectional marks) to embed an instruction. EvalHack instruments the full grading pipeline, recording item-level member scores, the committee aggregate, within-panel disagreement, and discrepancies to human grades. Empirically, answer-side edits induce systematic score inflation and stronger top-end concentration, with edited answers clustering near the upper end of the scale. Within-panel disagreement, measured as the range between the highest and lowest member score, varies across conditions, with median Consistency Spread values of 3.0 (clean), 2.0 (A₁), and 6.0 (A₂). Compared to human graders, the panel is more lenient on average (MAE = 1.897; bias human − panel = −1.345). Finally, grouping items by disagreement shows that low-disagreement items exhibit smaller human-panel errors, indicating that within-panel spread can serve as a practical uncertainty signal for routing difficult answers to human review or to larger/more specialized panels.

(This article belongs to the Section Artificial Intelligence)

22 pages, 3196 KB

Open Access Article

An Explainable Neuro-Symbolic Framework for Online Exam Cheating Detection

by Turgut Özseven and Beyza Esin Özseven

Appl. Sci. 2026, 16(6), 2884; https://doi.org/10.3390/app16062884 - 17 Mar 2026

Viewed by 300

Abstract

With the proliferation of online examination systems, protecting academic integrity and reliably detecting cheating have become significant research problems. Current AI-based online monitoring systems can achieve high accuracy by analyzing visual behavioral cues; however, their often black-box nature limits their explainability, reliability, and legal compliance (e.g., GDPR). In contrast, while rule-based approaches are interpretable, they are insufficient for generalizing complex and ambiguous human behaviors. This study proposes an explainable neuro-symbolic framework combining data-driven learning with symbolic reasoning for cheating detection in online exams. The proposed framework comprises three main layers: a neural perceptron layer that generates a suspicious behavior score; a symbolic reasoning layer comprising ANFIS and ILP methods to increase explainability and manage ambiguity; and a neuro-symbolic fusion layer that integrates these two layers. The success of the proposed framework for plagiarism detection was evaluated using a dataset containing visual–behavioral features such as gaze behavior, head pose, hand-object interaction, and device usage, along with the XGBoost method at the neural perceptron layer. Experimental results show that the proposed approach achieves high detection success and supports decision-making using logical rules, thereby reducing false positives. In this respect, the study offers an ethical, transparent, and reliable solution for online exam security.

32 pages, 2055 KB

Open Access Article

Leveraging Transformers and LLMs for Automated Grading and Feedback Generation Using a Novel Dataset

by Asmaa G. Khalf, Emad Nabil, Wael H. Gomaa, Oussama Benrhouma and Amira M. El-Mandouh

Data 2026, 11(3), 57; https://doi.org/10.3390/data11030057 - 16 Mar 2026

Viewed by 383

Abstract

Automated Short Answer Grading (ASAG) has garnered significant attention in the field of educational technology due to its potential to improve the efficiency, scalability, and consistency of student assessments. This study introduces a novel dataset of 651 student responses from a Database Transaction course exam at Beni-Suef University, referred to as the Beni-Suef Transaction Processing (BeSTraP) dataset. The BeSTraP is specifically designed to support ASAG evaluation. To assess ASAG performance, five approaches were employed: string-based similarity, semantic similarity, a hybrid of both, fine-tuning transformer-based models, and the application of Large Language Models (LLMs). The experimental results indicated that fine-tuned transformers, particularly GPT-2, achieved the highest Pearson correlation with human scores (0.8813) on the new dataset and maintained robust performance on the Mohler benchmark (0.7834). In addition to grading, the framework integrates automated feedback generation through LLMs, further enriching the assessment process. This research contributes (i) a novel, domain-specific dataset derived from an actual university examination, (ii) a comprehensive comparison of traditional and transformer-based approaches, and (iii) evidence of the efficacy of fine-tuned models in providing accurate and scalable grading solutions. The created dataset will be publicly available for the community.

(This article belongs to the Special Issue Mining and Computational Intelligence for E-Learning and Education—4th Edition)
