Search Results (66)

Search Parameters:
Keywords = AI-assisted instruction

11 pages, 525 KB  
Article
Assessment of Stage Two Hypertension Treatment Plans Written by Generative AI
by Tai Metzger, Zaheen Hossain, Kody Park, Stephen Vu, Simon Dixon and Tracey A. H. Taylor
J. Clin. Med. 2026, 15(8), 3103; https://doi.org/10.3390/jcm15083103 - 18 Apr 2026
Viewed by 162
Abstract
Background/Objectives: As use of large language models (LLMs) in clinical practice, in medical education, and by patients increases, it is essential to ensure that information provided is accurate and safe. Our objective was to compare stage two hypertension treatment plans generated by popular LLMs. Methods: ChatGPT (GPT-4o), Claude (Claude 4 Sonnet), ClinicalKey AI, Microsoft Copilot (Wave 2), DeepSeek-V3-0324, Dyna AI, Google Gemini (2.5 Flash), Grok (version 3), Meta AI assistant (Llama 4 Maverick), OpenEvidence (version 2.0), Perplexity (Sonar backend model), and Pi (Inflection-2.5) were prompted to generate a treatment plan for stage two hypertension. Six blinded reviewers scored each response in three domains: adherence to clinical guidelines, detail/clarity, and reliability/safety. Results: Perplexity received the highest composite score (8.17 out of 9), followed by OpenEvidence (7.92 out of 9). Dyna AI had the lowest overall score (3.75 out of 9). Perplexity (3.00 out of 3), Grok (2.83 out of 3), and OpenEvidence (2.75 out of 3) had the highest scores for detail/clarity, while Dyna AI had the lowest for both detail/clarity (1.00 out of 3) and reliability/safety (1.00 out of 3). ChatGPT had the highest score for adherence to guidelines (2.75 out of 3) while Pi had the lowest (1.58 out of 3). Kruskal–Wallis test showed p < 0.05 across sub-score domains and composite scores. Conclusions: LLMs tended to adhere to clinical guidelines and provide detailed responses but often did not provide sources or instruct users to see a healthcare professional. There was notable variability in quality, and medicine-specific LLMs were not superior to popular LLMs. Full article
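The scoring scheme described in this abstract (six reviewers, three domains each scored out of 3, composite out of 9) can be illustrated with a minimal Python sketch. It assumes the composite is the sum of the per-domain reviewer means, which the abstract does not state explicitly. The function name, the dictionary keys, and all scores below are hypothetical and are not data from the study.

```python
from statistics import mean

# Domain names follow the abstract; the dictionary keys themselves are hypothetical.
DOMAINS = ("guideline_adherence", "detail_clarity", "reliability_safety")


def composite_score(reviews):
    """Average each domain across reviewers, then sum the domain means.

    reviews: list of dicts, one per reviewer, mapping domain -> score (max 3).
    Assumption: composite = sum of the three domain means (max 9).
    """
    domain_means = {d: mean(r[d] for r in reviews) for d in DOMAINS}
    return domain_means, sum(domain_means.values())


# Hypothetical scores from six reviewers for one model's response.
example_reviews = [
    {"guideline_adherence": 3, "detail_clarity": 3, "reliability_safety": 2},
    {"guideline_adherence": 2, "detail_clarity": 3, "reliability_safety": 2},
    {"guideline_adherence": 3, "detail_clarity": 3, "reliability_safety": 3},
    {"guideline_adherence": 3, "detail_clarity": 3, "reliability_safety": 2},
    {"guideline_adherence": 2, "detail_clarity": 3, "reliability_safety": 3},
    {"guideline_adherence": 3, "detail_clarity": 3, "reliability_safety": 2},
]

domain_means, composite = composite_score(example_reviews)
print(domain_means)
print(round(composite, 2))
```

Under this assumption, a composite such as 8.17 out of 9 would correspond to domain means that sum to roughly 49/6 across the six reviewers.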
20 pages, 660 KB  
Article
Rapid AI-Assisted Instructional Design: Using Agentic LLM Tools to Develop UDL-Aligned Curricula for Student Veterans and Multilingual Learners
by John C. Chick and Laura T. Morello
Appl. Sci. 2026, 16(8), 3871; https://doi.org/10.3390/app16083871 - 16 Apr 2026
Viewed by 182
Abstract
Background/Context: Creating instructional materials that authentically meet the needs of marginalized learner groups such as student veterans, multilingual adult learners, and first-generation doctoral students demands consistent application of Universal Design for Learning (UDL) principles coupled with meaningful content expertise about those learners’ traits, access needs, and lived experiences. Faculty at teaching-intensive institutions face persistent constraints of time, knowledge, and course load that make systematic UDL implementation difficult. Objective: This practitioner-scholar case study examines whether HAIST-structured agentic LLM-assisted instructional design can produce UDL-aligned materials for student veterans and multilingual learners at a quality level and time frame realistic for under-resourced faculty. Methodology: Drawing from the Human-AI Symbiotic Theory (HAIST) and UDL guidelines, we document four AI-assisted cycles of instructional design at a Hispanic-Serving Institution. Outcomes related to UDL alignment were measured using a rubric adapted from CAST Guidelines 2.2. Results: Across four materials, initial AI generation averaged 61.4% UDL alignment (SD = 8.7%); following iterative calibration, this rose to 84.2% (SD = 5.3%). The largest gains occurred in the Engagement category. Conclusions: These descriptive findings, interpreted as exploratory rather than inferential given the single-site case study design and n = 4 materials, suggest that HAIST-structured AI-assisted design has the potential to produce accessible materials for underserved learner populations in time frames feasible for working faculty. Learner outcome data were not collected in this study; future quasi-experimental work is needed to assess the effectiveness of these materials with target learner populations. Full article
24 pages, 318 KB  
Article
“I’m Not as Good as AI”: The Impact of Generative AI Use on Learning Anxiety and Self-Efficacy
by Tao Jiang and Yan Xu
Sustainability 2026, 18(8), 3869; https://doi.org/10.3390/su18083869 - 14 Apr 2026
Viewed by 444
Abstract
This study investigates whether metacognitive prompting for responsible generative AI (GenAI) use can enhance students’ psychological sustainability in AI-assisted learning. Using a face-to-face classroom experiment (N = 148; 74 prompting, 74 control), we examined how metacognitive prompts embedded in a GenAI-assisted academic task influence learning anxiety and academic self-efficacy, and whether anxiety mediates the effect on self-efficacy. Manipulation checks indicated that the prompting condition produced significantly higher metacognitive engagement than the control condition (t(146) = 7.50, p < 0.001, d = 1.23). Hypothesis tests showed that metacognitive prompting reduced learning anxiety (b = −0.68, p < 0.001) and increased academic self-efficacy (b = 0.40, p = 0.008). Learning anxiety was negatively associated with self-efficacy (b = −0.42, p < 0.001). Mediation analyses using bootstrap confidence intervals revealed a significant indirect effect of prompting on self-efficacy via reduced anxiety (ab = 0.26, 95% CI [0.12, 0.43]), indicating partial mediation. These findings suggest that responsible GenAI use can be supported through instructional design. Brief metacognitive prompts may help students regulate AI use, reduce learning anxiety, and maintain academic self-efficacy. More broadly, the study contributes to sustainable education and educational technology research by showing that classroom scaffolds can support student agency and well-being in AI-assisted learning. Full article
(This article belongs to the Special Issue AI for Sustainable and Creative Learning in Education)
14 pages, 871 KB  
Article
Validation of a Dermatology-Focused Multimodal Image-and-Data Assistant in Diagnosis and Management of Common Dermatologic Conditions
by Joshua Mijares, Emma J. Bisch, Eanna DeGuzman, Kanika Garg, David Pontes, Neil K. Jairath, Vignesh Ramachandran, George Jeha, Andjela Nemcevic and Syril Keena T. Que
Medicina 2026, 62(4), 715; https://doi.org/10.3390/medicina62040715 - 9 Apr 2026
Viewed by 343
Abstract
Background and Objectives: Shortages of dermatologists create significant barriers to care, particularly for inflammatory and history-dependent conditions where image-only artificial intelligence (AI) classifiers have limited applicability. Current teledermatology solutions largely focus on single-task, morphology-based neoplasm classifiers, leaving the vast majority of dermatologic presentations underserved. This study evaluated the diagnostic accuracy and management plan quality of Dermflow (Prava Medical, Delaware, USA), a proprietary dermatology-focused Multimodal Image-and-Data Assistant (MIDA) that autonomously gathers dermatology-specific history, integrates data with patient-submitted images, and outputs structured differential diagnoses and management summaries. Materials and Methods: Two AI systems, Dermflow and Claude Sonnet 4 (Claude, a leading vision–language model), analyzed 87 clinical images from the Skin Condition Image Network and Diverse Dermatology Images databases, representing 10 inflammatory dermatoses and 9 neoplastic conditions stratified across Fitzpatrick Skin Tone (FST) categories (I–II, III–IV, V–VI). For the diagnostic comparison, Dermflow received images and autonomously gathered clinical history, while Claude received identical images without history. For the management plan comparison, both systems received the correct diagnosis and the clinical histories gathered by Dermflow. The primary outcome was diagnostic accuracy. The secondary outcome was management plan quality, assessed by two blinded dermatologists across eight clinical dimensions using 5-point Likert scales. Chi-square tests compared diagnostic accuracy between models; t-tests and ANOVA compared management quality scores. Results: Dermflow achieved markedly superior diagnostic accuracy compared to Claude (86.2% vs. 24.1%, p < 0.001). 
Both models maintained consistent diagnostic performance across FST categories without significant within-model differences (Dermflow p = 0.924; Claude p = 0.828). Management plan quality showed no significant overall differences between models. However, composite management quality scores declined significantly for darker skin tones across both systems: Dermflow scored 4.20 (FST I–II), 3.99 (FST III–IV), and 3.47 (FST V–VI); Claude scored 4.35, 3.97, and 3.44, respectively (p < 0.001 for most pairwise FST comparisons within each model). Conclusions: Multimodal AI integrating targeted history with image analysis achieves substantially higher diagnostic accuracy than image-only approaches across both inflammatory and neoplastic dermatologic conditions. Autonomous history gathering addresses fundamental limitations of morphology-only classifiers and enables scalable, patient-facing triage across the full spectrum of dermatologic disease. However, both models demonstrated reduced management plan quality for darker skin tones despite receiving the correct diagnosis, suggesting persistent training data limitations that require targeted bias-mitigation strategies beyond domain-specific instruction. Full article
25 pages, 1802 KB  
Article
Integrating Generative AI and Cultural Storytelling to Enhance Geometry Learning in Vietnamese Primary Classrooms: A Quasi-Experimental Study
by Nguyen Huu Hau, Pham Sy Nam, Trinh Cong Son, Dao Chung Lan Anh, Nguyen Thuy Van, Pham Thi Thanh Tu, Tran Thuy Nga and Vo Xuan Mai
Educ. Sci. 2026, 16(4), 588; https://doi.org/10.3390/educsci16040588 - 7 Apr 2026
Viewed by 319
Abstract
In Vietnamese primary mathematics education, geometry instruction often emphasizes rote calculation and formula memorization rather than meaningful contextualization, leaving students disconnected from abstract concepts and lacking opportunities to connect learning with cultural identity. This quasi-experimental study investigates how integrating generative AI tools (ChatGPT, DALL·E, Canva) with the culturally grounded Vietnamese folktale Bánh Chưng—Bánh Giầy can support Grade 5 students’ understanding of circle geometry. Employing a mixed-methods design with 30 students divided into experimental (AI + storytelling) and control (traditional instruction) groups, the study measured cognitive and affective learning outcomes through pre/post-tests, a validated 25-item questionnaire, interviews, and classroom observations. Quantitative results revealed significant improvements in the experimental group across all measured dimensions, learning interest, attentional focus, conceptual understanding, mathematics passion, and cultural preservation awareness, with large effect sizes. Qualitative findings confirmed enhanced engagement, multimodal conceptual clarity, and cultural affective resonance. The study demonstrates that low-cost, teacher-mediated generative AI can effectively support learning in resource-constrained primary settings when anchored in local narratives. Implications for ethical AI integration and teacher professional development in Vietnamese contexts are discussed. Full article
17 pages, 1826 KB  
Review
Integrating AI Segmentation, Simulated Digital Twins, and Extended Reality into Medical Education: A Narrative Technical Review and Proof-of-Concept Case Study
by Parhesh Kumar, Ingharan Siddarthan, Catharine Kelsh Keim, Daniel K. Cho, John E. Rubin, Robert S. White and Rohan Jotwani
J. Pers. Med. 2026, 16(4), 202; https://doi.org/10.3390/jpm16040202 - 3 Apr 2026
Viewed by 553
Abstract
Background/Objectives: Simulation digital twins (DT) models that integrate patient-specific imaging with artificial intelligence (AI)-based segmentation and extended reality (XR) technologies are rapidly increasing in relevance in personalized medicine. While their clinical applications are expanding, their role as reusable educational tools and the technical pipeline utilized for their development remain incompletely characterized. This narrative review examines current approaches to digital twin creation and XR integration, illustrated by a scoliosis-specific proof-of-concept educational case study. Methods: A narrative technical review was conducted by identifying relevant search keywords within the fields of AI-based image segmentation, extended reality in medicine, and medical education based on the authors’ expertise and familiarity with the subject. PubMed, Google Scholar, and Scopus were searched for English-language studies published primarily between 2015 and 2025 addressing patient-specific three-dimensional modeling, AI-driven segmentation, and XR applications in spine, orthopedic, anesthesiology, and interventional care. A de-identified case of scoliosis is used to present a proof-of-concept example of this process of creating a simulated digital twin for the purpose of medical education in a recorded XR format. Results: Prior studies demonstrated benefits of patient-specific 3D models for anatomical understanding and procedural planning, while highlighting limitations in segmentation accuracy and workflow integration. Nevertheless, while DTs have traditionally served clinical roles in surgical planning or pre-procedural rehearsal, their pedagogical potential remains under-explored. In the proof-of-concept case study, AI-assisted segmentation enabled rapid creation of an anatomically detailed scoliosis digital twin that was incorporated into XR and used to produce a reusable, spatially anchored instructional experience focused on neuraxial access. 
Conclusions: AI-enabled digital twin models integrated with XR represent a promising approach for personalized, anatomy-driven medical education. Further evaluation is needed to assess educational outcomes, scalability, and integration into clinical training workflows. Full article
14 pages, 466 KB  
Review
Fidelity, Virtual Human Assistants, and Engagement in Immersive Virtual Learning Environments: The Role of Temporal Functional Fidelity
by Thomas Gaudi, Bill Kapralos and Alvaro Quevedo
Encyclopedia 2026, 6(4), 77; https://doi.org/10.3390/encyclopedia6040077 - 30 Mar 2026
Viewed by 484
Abstract
Advances in consumer virtual reality (VR) and artificial intelligence (AI) have accelerated the use of immersive virtual learning environments (iVLEs) for skills training. Learner engagement is a critical determinant of training effectiveness, which can be shaped by VR system features (e.g., visual, auditory, and tactile immersion) coupled with interaction mechanics and instructional design integrated with the instructional behaviors of virtual human assistants (VHAs). Although visual and behavioral fidelity in VHAs have been extensively studied, functional fidelity (i.e., the extent to which the iVLE and/or VHAs support cognitive, perceptual, and motor processes required to perform a task regardless of visual realism), and particularly the temporal alignment of instructional guidance with learners’ cognitive and motor demands, remains underexamined. This article highlights research on VHAs in iVLEs with a special emphasis on temporal functional fidelity as an emerging requirement for synchronizing instructional support with user workload and task phases. By consolidating existing findings and highlighting gaps in current empirical work, this article outlines key implications for the design and evaluation of VHAs and identifies directions for future research aimed at optimizing instructional timing in iVLEs. The goal is to inform principled VHA design and clarify how fidelity dimensions should be integrated to support effective, pedagogically grounded immersive learning experiences. Full article
(This article belongs to the Section Mathematics & Computer Science)
22 pages, 1060 KB  
Systematic Review
Artificial Intelligence in EFL Speaking Instruction: A Systematic Review of Pedagogical Design, Affective Conditions and Instructional Input
by Sareen Kaur Bhar
Encyclopedia 2026, 6(4), 74; https://doi.org/10.3390/encyclopedia6040074 - 27 Mar 2026
Viewed by 944
Abstract
Speaking proficiency remains one of the most challenging skills for learners of English as a Foreign Language (EFL), particularly in contexts where sustained spoken interaction is limited. This systematic review synthesises 36 empirical studies (2015–2025) identified through a PRISMA-guided Scopus search to examine how artificial intelligence (AI)-mediated instruction supports EFL speaking development. The included studies were analysed according to AI modality, pedagogical integration, instructional input characteristics, and linguistic and affective outcomes. Findings indicate that AI tools—such as chatbots, automatic speech recognition systems, and large language models—consistently support affective outcomes, including reduced speaking anxiety and increased willingness to communicate. Improvements in fluency, pronunciation, and accuracy were frequently reported, particularly when AI tools were embedded within task-based and pedagogically structured instructional designs. However, evidence for sustained development of higher-order communicative competence was more variable. The review proposes a mediated input framework conceptualising AI as a design-sensitive instructional resource rather than an autonomous teaching agent. Full article
(This article belongs to the Section Arts & Humanities)
22 pages, 13466 KB  
Article
On-Premise Multimodal AI Assistance for Operator-in-the-Loop Diagnosis in Machine Tool Mechatronic Systems
by Seongwoo Cho, Jongsu Park and Jumyung Um
Appl. Sci. 2026, 16(7), 3166; https://doi.org/10.3390/app16073166 - 25 Mar 2026
Viewed by 325
Abstract
Modern machine tools are safety-critical mechatronic systems, yet shop floor maintenance from abnormal events still relies heavily on scarce expert know-how and time-consuming manual searches across heterogeneous controller documentation. This paper presents an on-premise multimodal AI assistant. It integrates large language models with retrieval augmented generation and real-time machine signals to support operator-in-the-loop fault diagnosis. The proposed system provides three tightly coupled functions: (1) alarm-grounded guidance, which answers controller alarms and recommends corrective actions by grounding generation on manuals, maintenance procedures, and historical alarm cases; (2) parameter-aware reasoning, which injects live process and health indicators (e.g., spindle temperature, vibration, and axis states) into the reasoning context through an industrial data pipeline, enabling context specific troubleshooting; and (3) vision enabled support, which retrieves similar visual cases and generates concise visual instructions when text alone is insufficient. The assistant is deployed within an intranet environment to satisfy industrial security and privacy requirements and is orchestrated via lightweight tool calling for seamless integration with existing shop floor systems. Experiments on real machine tool alarm scenarios demonstrate that the proposed system achieves 82% answer correctness for alarm Q&A and improves response consistency and time-to-resolution compared with baseline keyword search and template-based guidance. The results suggest that grounded, multimodal chatbot assistants can act as practical AI-based feedback and decision support mechanisms for mechatronic production equipment, bridging human skill gaps while enhancing reliability and maintainability. Full article
20 pages, 502 KB  
Article
Design and Evaluation of a Retrieval-Augmented Generation LLM Chatbot with Structured Database Access
by Juan Burbano, Pablo Landeta-López, Cathy Guevara-Vega and Antonio Quiña-Mera
Appl. Sci. 2026, 16(7), 3147; https://doi.org/10.3390/app16073147 - 25 Mar 2026
Viewed by 771
Abstract
Context. The grocery sector is undergoing a massive shift in consumer behavior, with global chatbot usage projected to reach 8.4 billion units by 2024—surpassing the total human population—and online grocery revenue per shopper expected to hit USD 449.00 by 2023. In this competitive landscape, small grocery stores must adopt AI-driven tools to modernize their operations. However, these businesses often face significant inefficiencies in manual inventory management, resulting in errors and reduced competitiveness. Objective. This research aims to develop and validate a chatbot application using Large Language Models and Retrieval-Augmented Generation (RAG) for operational management of grocery stores. Method. The method employed a quantitative experimental approach with a five-component system architecture: a web interface, a FastAPI API, a Mistral-7B-Instruct-v0.2 model, a dynamic SQL generator, and a custom RAG application with an FAISS vector database, all integrated through SQLAlchemy 2.0.40. Results. The results demonstrate that a chatbot achieves an average response time of 0.08 s with 80% overall accuracy, showing a 96.2% improvement in information query time and a 92.9% reduction in operational errors. Conclusions. Major conclusions suggest that the chatbot system is effective for retail environments and has the potential to enhance the operational efficiency of grocery stores, serving as a foundation for future research in applied conversational assistance. Full article
20 pages, 15973 KB  
Article
Streamlining Human–Robot Interaction: Integrating LLM-Based Planning into Modular Robotic Frameworks
by MinHyuk Kim, JooHee Park, Kwanyong Park, Yong-Ju Lee and Sanghun Jeon
Sensors 2026, 26(6), 1978; https://doi.org/10.3390/s26061978 - 21 Mar 2026
Viewed by 633
Abstract
Embodied artificial intelligence (AI), which integrates AI and robotics, has made significant progress, particularly in human–robot interaction, task-assisting robots, and the integration of multimodal AI models. Experimental studies have demonstrated strong performance in complex tasks, such as providing human assistance, performing household chores, and object manipulation through pick-and-place operations. However, despite these impressive capabilities, real-world applicability remains limited. While tasks such as household chores and object manipulation offer significant practical utility, users often struggle to provide effective instructions, and execution remains prohibitively slow for real-world deployment. This study introduces an approach to enhance usability through spoken human instructions and reduce operation time by streamlining intermediate steps through our Module Handler. The proposed approach leverages a large language model to extract information from spoken human instructions accurately. Through experiments, we validated the accuracy of our approach and confirmed speed improvements compared with related studies. Our experiments evaluated system accuracy in extracting relevant information from spoken human instruction, achieving an object identification accuracy rate of approximately 92.47%. In addition, our method reduced task completion times by an average of 33 s across four different experimental environments compared with existing modular robotics systems. This time reduction is significant for enhancing robotic task execution efficiency. Full article
(This article belongs to the Section Sensors and Robotics)
17 pages, 249 KB  
Article
ChatGPT-Assisted Task Analysis for Special Education Teachers: An Exploratory Study of Alignment, Readability, Efficiency, and Acceptability
by Serife Balikci, Nesime Kubra Terzioglu and Salih Rakap
Future Internet 2026, 18(3), 158; https://doi.org/10.3390/fi18030158 - 18 Mar 2026
Viewed by 297
Abstract
Task analysis is a foundational component of instructional design in special education, yet it can impose substantial time and cognitive demands on teachers. Artificial intelligence (AI) tools such as ChatGPT may provide support for instructional planning tasks by assisting educators in generating and organizing task sequences. This study examined the effectiveness, readability, time efficiency, and acceptability of ChatGPT-assisted task analysis compared to a traditional task analysis method. Thirty-two special education teachers participated in a randomized between-groups study in which they developed task analyses using either a traditional approach or ChatGPT supported by a structured interaction protocol. Task analyses were evaluated based on alignment with expert-developed models, readability, and development time, and teachers’ perceptions of acceptability were also examined. Results indicated that ChatGPT-assisted task analyses required significantly less development time while demonstrating strong alignment with expert-generated models. Readability levels and the number of task steps were similar across groups. Teachers who used ChatGPT also reported positive perceptions regarding the usefulness and acceptability of AI assistance in instructional planning. These findings suggest that AI-assisted tools may support teachers in developing task analyses more efficiently while maintaining instructional clarity. However, given the exploratory nature of the study and the limited sample, further research is needed to examine how AI-assisted task analysis may influence instructional practice and student learning outcomes in special education. Full article
(This article belongs to the Special Issue Human-Centered Artificial Intelligence)
42 pages, 1179 KB  
Article
Towards Reliable LLM Grading Through Self-Consistency and Selective Human Review: Higher Accuracy, Less Work
by Luke Korthals, Emma Akrong, Gali Geller, Hannes Rosenbusch, Raoul Grasman and Ingmar Visser
Mach. Learn. Knowl. Extr. 2026, 8(3), 74; https://doi.org/10.3390/make8030074 - 16 Mar 2026
Viewed by 815
Abstract
Large language models (LLMs) show promise for grading open-ended assessments but still exhibit inconsistent accuracy, systematic biases, and limited reliability across assignments. To address these concerns, we introduce SURE (Selective Uncertainty-based Re-Evaluation), a human-in-the-loop pipeline that combines repeated LLM prompting, uncertainty-based flagging, and selective human regrading. Three LLMs—gpt-4.1-nano, gpt-5-nano, and the open-source gpt-oss-20b—graded answers of 46 students to 130 open questions and coding exercises across five assignments. Each student answer was scored 20 times to derive majority-voted predictions and self-consistency-based certainty estimates. We simulated human regrading by flagging low-certainty cases and replacing them with scores from four human graders. We used the first assignment as a training set for tuning certainty thresholds and to explore LLM output diversification via sampling parameters, rubric shuffling, varied personas, multilingual prompts, and post hoc ensembles. We then evaluated the effectiveness and efficiency of SURE on the other four assignments using a fixed certainty threshold. Across assignments, fully automated grading with a single prompt resulted in substantial underscoring, and majority-voting based on 20 prompts improved but did not eliminate this bias. Low certainty (i.e., high output diversity) was diagnostic of incorrect LLM scores, enabling targeted human regrading that improved grading accuracy while reducing manual grading time by 40–90%. Aggregating responses from all three LLMs in an ensemble improved certainty-based flagging and most consistently approached human-level accuracy, with 70–90% of the grades students would receive falling inside human-grader ranges. 
A reanalysis based on outputs from a more diversified LLM ensemble composed of gpt-5, codestral-25.01, and llama-3.3-70b-instruct replicated these findings but also suggested that large reasoning models such as gpt-5 might eliminate the need for human oversight of LLM grading entirely. These findings demonstrate that self-consistency-based uncertainty estimation and selective human oversight can substantially improve the reliability and efficiency of AI-assisted grading. Full article
(This article belongs to the Section Learning)
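The flagging step described in this abstract (repeated scoring, majority vote, self-consistency certainty, selective human regrading) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.8 threshold, the tie-breaking rule, and the toy scores are all assumptions made for illustration, and certainty is taken here to be the share of repeated scores that agree with the majority vote.

```python
from collections import Counter


def majority_vote_with_certainty(scores):
    """Return (majority score, certainty) for repeated gradings of one answer.

    Certainty is the fraction of gradings that equal the majority score
    (an assumed definition of self-consistency). Ties resolve to the score
    that appears first in the input, which is an arbitrary choice.
    """
    if not scores:
        raise ValueError("at least one score is required")
    top_score, top_count = Counter(scores).most_common(1)[0]
    return top_score, top_count / len(scores)


def selective_regrade(llm_scores, human_scores, threshold=0.8):
    """Keep confident majority votes; replace low-certainty ones with human scores.

    llm_scores:   dict of answer id -> list of repeated LLM scores
    human_scores: dict of answer id -> human score (consulted only when flagged)
    threshold:    hypothetical certainty cutoff, not a value from the article
    """
    final, flagged = {}, []
    for answer_id, scores in llm_scores.items():
        voted, certainty = majority_vote_with_certainty(scores)
        if certainty < threshold:
            flagged.append(answer_id)
            final[answer_id] = human_scores[answer_id]
        else:
            final[answer_id] = voted
    return final, flagged


# Toy data: 20 repeated gradings per answer (invented values).
llm_scores = {
    "a1": [2] * 18 + [1] * 2,   # 18/20 agree, certainty 0.90, kept
    "a2": [0] * 9 + [1] * 11,   # 11/20 agree, certainty 0.55, flagged
}
human_scores = {"a1": 2, "a2": 0}

final, flagged = selective_regrade(llm_scores, human_scores)
print(final)    # {'a1': 2, 'a2': 0}
print(flagged)  # ['a2']
```

In this sketch the human workload is the share of flagged answers, so raising the threshold trades more manual grading for fewer uncorrected LLM errors.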
21 pages, 409 KB  
Article
Motivational Mechanisms in CDIO-Based Sustainability Education: Effects of Experiential and AI-Supported Learning on Interest and Satisfaction
by Yang-Chieh Chin and Chiao-Chen Chang
Sustainability 2026, 18(6), 2724; https://doi.org/10.3390/su18062724 - 11 Mar 2026
Viewed by 304
Abstract
Higher education institutions are expected to cultivate graduates capable of addressing sustainability challenges through innovation, collaboration, and digital competence. However, many business programs struggle to integrate experiential authenticity, intelligent technologies, and collaborative learning into coherent instructional models, limiting students’ intrinsic motivation and sustainability-oriented competence development. This study aims to examine how experiential learning, artificial intelligence-assisted collaborative learning, and team-based learning operate within the Conceive–Design–Implement–Operate instructional framework to influence learning interest and learning satisfaction in a sustainability-oriented business course. Survey data from 217 undergraduate students were analyzed using confirmatory factor analysis, structural equation modeling, and moderated regression analysis. The results indicate that both experiential and AI-supported collaborative learning positively enhance students’ learning interest, which partially mediates their effects on learning satisfaction. Team-based learning strengthens the experiential pathway but does not significantly moderate the AI-assisted pathway. These findings clarify differentiated motivational mechanisms within structured instructional systems and provide theoretical support for designing digitally enhanced sustainability education. Full article
19 pages, 1152 KB  
Article
Building Capacity with Assistive Technology in Teacher Education
by Alicia M. Drelick and Brent Elder
Educ. Sci. 2026, 16(3), 418; https://doi.org/10.3390/educsci16030418 - 10 Mar 2026
Viewed by 464
Abstract
Assistive Technology (AT) is legally mandated via the Individuals with Disabilities Education Act (IDEA), Section 504, and the Every Student Succeeds Act (ESSA), but remains unevenly implemented in K-12 schools, in part because teachers have limited preparation for selecting and using AT in inclusive classrooms. In this practice-oriented article, we describe how we designed two co-taught courses to systematically embed AT within a special education teacher preparation program. Guided by Disability Studies in Education (DSE) and Universal Design for Learning (UDL), we organized instruction around the AT Multi-Tiered Systems of Support (AT-MTSS), positioning pre-service teachers as being responsible for various assistive technologies. In this article, we outline course structures and assignments, including accessibility labs grounded in Web Content Accessibility Guidelines (WCAG); structured labs for text-to-speech, speech-to-text, and AAC tools; field-based AT consideration and intervention using the SETT Framework; and an emerging focus on ethical considerations around the use of artificial intelligence (AI) in special education. We describe an emerging co-teaching practice, “One Teach–One Tech,” through which we pair methods-based instruction with a real-time model of AT. We argue that the intentional embedding of assistive technology in pre-service programs is critical to expanding technology-assisted instruction and realizing the access promised to students under policies (i.e., IDEA, Section 504, and ESSA). Full article
(This article belongs to the Special Issue Technology-Assisted Instruction in Special Education)