
Search Results (316)

Search Parameters:
Keywords = CHAT training

25 pages, 1893 KB  
Article
Contribution to Sarcasm Detection in Arabic Using Natural Language Processing Techniques
by Mennat Allah Hassan, Silvia García-Méndez and Francisco de Arriba-Pérez
Appl. Sci. 2026, 16(6), 2724; https://doi.org/10.3390/app16062724 - 12 Mar 2026
Viewed by 85
Abstract
Sarcasm detection remains a challenging task in Natural Language Processing (NLP), especially for low-resource and non-standardized languages. Hence, this study addresses Franco-Arabic, a widely used form of online communication where Arabic words are written with Latin characters and numerals. Its informal nature and orthographic variation complicate sarcasm identification and limit the applicability of existing NLP models. We propose an approach that integrates transformer-based representations with auxiliary linguistic features and rule-based cues to capture both contextual meaning and sentiment-driven inconsistencies. This research opens the door to practical applications. In particular, future work will investigate integrating sarcasm detection into the marketing sector, where accurate recognition of sarcastic reviews can enhance sentiment analysis, customer segmentation, and personalized communication strategies.

9 pages, 733 KB  
Brief Report
Narrative Medicine and AI in Anesthesiology Training: Teaching Empathy in End-of-Life Care
by Anna La Palma, Giuliana Scarpati, Giulia Savarese and Ornella Piazza
Int. Med. Educ. 2026, 5(1), 33; https://doi.org/10.3390/ime5010033 - 10 Mar 2026
Viewed by 121
Abstract
Teaching empathy remains a challenge in medical education, particularly in anesthesiology, where physicians frequently care for patients at the end of life. Narrative Medicine, centered on communicative competence and patients’ lived experience, offers a framework for cultivating reflective and relational skills. Meanwhile, artificial intelligence (AI) systems can generate expressions of empathy, raising questions about authentic moral engagement. To explore how narrative-based education, combined with AI-generated texts, may stimulate reflection, we implemented an exploratory narrative-based intervention involving 25 anesthesiology residents, supported by three tutors, integrating literature, film, and AI-generated narratives. After an introductory session, participants engaged with excerpts from the book What Are You Going Through and the film The Room Next Door, followed by reflective writing based on five prompts. The same prompts were submitted to ChatGPT (OpenAI, GPT-4o) for comparative analysis, discussed during a debriefing session. Reflective writings were assessed using an adapted REFLECT rubric, alongside qualitative lexical and semantic analyses. Most participants did not reach the highest levels of reflective capacity, while ChatGPT texts achieved higher REFLECT scores, primarily due to linguistic coherence. These findings suggest that empathic competence is neither automatically acquired through medical training nor reducible to verbal fluency. Rather, it requires structured training grounded in meaningful engagement with patients.

24 pages, 6373 KB  
Article
Augmented Reality-Based Training System Using Multimodal Language Model for Context-Aware Guidance and Activity Recognition in Complex Machine Operations
by Waseem Ahmed and Qingjin Peng
Designs 2026, 10(2), 30; https://doi.org/10.3390/designs10020030 - 5 Mar 2026
Viewed by 247
Abstract
Augmented Reality (AR) and Large Language Models (LLMs) have made significant advances across many fields, opening new possibilities, particularly in complex machine operations. In complex operations, non-expert users often struggle to perform high-precision tasks and require constant supervision to execute tasks correctly. This paper proposes a novel AR-MLLM-based training system that integrates AR, multimodal large language models (MLLMs), and prompt engineering to interpret real-time machine feedback and user activity. It converts extensive technical text into structured, step-by-step commands. The system uses a prompt structure developed through an iterative design method and refined across multiple machine operation scenarios, enabling ChatGPT to generate task-specific contextual digital overlays directly on the physical machines. A case study with participants was conducted to assess the effectiveness and usability of the AR-MLLM system in Coordinate Measuring Machine (CMM) operation training. The experimental results demonstrate high accuracy in task recognition and feature measurement activity. The data further show reduced time and user workload during task execution with the proposed AR-MLLM system. The proposed system not only provides real-time guidance and enhances efficiency in CMM operation training but also demonstrates the potential of the AR-MLLM design framework for broader industrial applications.

7 pages, 332 KB  
Brief Report
Large Language Models (LLM) for Emergency Department Triage Based on Vital Signs
by Thomas G. Lederer, William C. Herring, Lama A. Ammar, Benjamin S. Abella, Donald J. Apakama, Ethan E. Abbott and Aditya C. Shekhar
Emerg. Care Med. 2026, 3(1), 9; https://doi.org/10.3390/ecm3010009 - 5 Mar 2026
Viewed by 184
Abstract
Introduction: Large language models (LLMs) have proven effective in many different fields, including the allocation of scarce resources. Triage within emergency departments (ED) is a core process that ensures the sickest patients are seen in a timely manner. Relatively little research has examined the use of existing LLMs in the triage process. Methods: Twelve widely available LLMs were provided with real-world patient triage vital sign data from an academic trauma center in a major metropolitan area. The LLMs were asked to assign a triage score to each patient based on this information alone. The deviation between each LLM triage score and the real-world triage score was calculated for each patient; the absolute value of this deviation was then averaged across the entire dataset per LLM. The average absolute value of deviation (AAVD) could then be used to compare LLMs against each other. All LLMs were blinded to the real-world triage score and received no additional training or instruction. Results: The models with the highest concordance with real-world triage scores were Claude Sonnet 4.5 (AAVD: 0.37; 62.37% concordance), ChatGPT-5 Instant (AAVD: 0.39; 62.89% concordance), and Claude Opus 4.1 (AAVD: 0.40; 62.37% concordance). The least accurate models were Gemini 2.5 Flash (AAVD: 0.42; 43.81% concordance), ChatGPT-4o Mini (AAVD: 0.49; 45.36% concordance), and ChatGPT-o3 (AAVD: 0.48; 48.45% concordance). Conclusions: This study analyzes the ability of LLMs to triage emergency department patients based primarily on vital sign data. Certain LLMs demonstrated moderate concordance with real-world triage scores. LLMs may be able to synthesize objective vital sign data and provide a triage recommendation. Further study could involve clinical validation against patient outcomes.
(This article belongs to the Special Issue Application of Artificial Intelligence in Emergency Care)
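The AAVD and concordance figures in this abstract reduce to a short computation. A minimal sketch, assuming paired per-patient scores; the `triage_metrics` helper and the toy triage levels are illustrative, not the authors' code or data:

```python
def triage_metrics(llm_scores, true_scores):
    """Compare LLM triage scores against real-world triage scores.

    AAVD = mean absolute deviation between paired scores (lower is better);
    concordance = fraction of exact matches.
    """
    deviations = [abs(l - t) for l, t in zip(llm_scores, true_scores)]
    aavd = sum(deviations) / len(deviations)
    concordance = sum(d == 0 for d in deviations) / len(deviations)
    return aavd, concordance

# Toy example: triage levels 1-5 for four patients
aavd, conc = triage_metrics([2, 3, 3, 5], [2, 3, 4, 5])
# aavd = 0.25, concordance = 0.75
```

A lower AAVD means the model's scores track the real-world triage scores more closely; concordance only credits exact agreement.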

7 pages, 980 KB  
Proceeding Paper
Implicitly Empathy Prompting Features to Improve Empathetic Chatbot Performance in Lightweight Language Models
by Yun-Rong Chen, Kun-Ta Chuang and Hung-Yu Kao
Eng. Proc. 2026, 129(1), 8; https://doi.org/10.3390/engproc2026129008 - 26 Feb 2026
Viewed by 185
Abstract
An empathetic chatbot is an essential component of intelligent mental healthcare. We adopted implicitly empathy prompting (IEP) by decomposing empathy into supportive dialogue, paraphrased response, emotional understanding, and attitude expression, referred to as the four features of empathy decomposition. IEP is based on lightweight language multi-agents (LLM-Agents) to generate empathy dialogue. The approach contrasts with the explicitly defined empathy of simply prompting a model to be empathetic. Three datasets for the four-feature scenario were generated by using the Generative Pre-Trained Transformer (GPT)-4o model, with cases in finance, family, and health issues. For each dataset, 30 examples were randomly selected and examined as input prompting onto six lightweight language models: Mistral (7B), Phi-4 (14B), StableLM2 (12B), Tulu3 (8B), Neural-chat (7B), and Llama 3.1-Instruct (8B). The output was then evaluated by using GPT-4o to calculate empathy perception scores (EP scores). The average EP scores on the three datasets for implicit/explicit empathy prompting ranged from 1 to 10. The final evaluation results are as follows: (1) implicitly empathy prompting (IEP): Mistral (8.83), Phi-4 (8.96), StableLM2 (9.03), Tulu3 (7.24), Neural-chat (8.03), Llama 3.1-Instruct (8.74); (2) explicitly empathy prompting (EEP): Mistral (7.52), Phi-4 (8.55), StableLM2 (7.78), Tulu3 (7.67), Neural-chat (8.35), Llama 3.1-Instruct (8.76). Among these values, three models (Mistral, Phi-4, and StableLM2) clearly achieve higher and more stable EP scores, while the other models (Tulu3, Neural-chat, and Llama 3.1-Instruct) maintain comparable EP scores. Our experimental findings showed that the prompt engineering method with the IEP approach could significantly outperform EEP.

16 pages, 3452 KB  
Article
Impact of Large Language Model Assistance on Radiologists’ Diagnostic Performance for Brain Tumors by Experience Level
by Chae Won Song, Byung Hyun Baek, Seul Kee Kim, Woong Yoon, Yun Young Lee, Ilwoo Park, Jae Hyun Park, Seol Bin Park and In Woo Choi
J. Clin. Med. 2026, 15(4), 1673; https://doi.org/10.3390/jcm15041673 - 23 Feb 2026
Viewed by 401
Abstract
Background: Large language models (LLMs) may assist radiologists in interpreting brain tumor MRI. We compared the diagnostic accuracy of ChatGPT-4o and Claude 3.5 Sonnet with that of board-certified radiologists and trainees, and evaluated whether LLM assistance could enhance diagnostic performance. Methods: A total of 127 histologically confirmed brain tumor cases were included. Two LLMs analyzed representative MRI images together with structured radiologic reports, whereas two board-certified radiologists and three trainees reviewed representative images with basic demographic information only. All participants generated up to three differential diagnoses per case. The accuracy of the primary diagnosis and the accuracy of the top-three differential diagnoses were calculated and compared. Following the initial readings, LLM-generated differential diagnoses were provided to the readers, and their post-assistance diagnostic performance was re-evaluated. Results: Claude 3.5 Sonnet achieved a primary diagnostic accuracy of 50.4% and a top-three differential accuracy of 85.0%, comparable to ChatGPT-4o (44.9% and 82.7%, respectively). Radiologists demonstrated a higher primary diagnostic accuracy (69.3%, p < 0.001) compared to LLMs, but a similar top-three differential accuracy (80.7%). In contrast, trainees showed a primary diagnostic accuracy (48.0%) comparable to LLMs, but a lower top-three differential accuracy (62.5%) than LLMs. With LLM assistance, radiologists exhibited a significant improvement in the top-three differential accuracy (from 80.7% to 90.2%, p < 0.001), and trainees showed significant improvements in both the primary and top-three differential accuracy (from 48.0% to 58.8%, p < 0.001, and from 62.5% to 81.1%, p < 0.001, respectively). Conclusion: LLMs demonstrated the ability to expand differential diagnostic considerations when operating on structured imaging inputs. LLM assistance was associated with improved trainee performance in this constrained experimental setting. These findings should be interpreted cautiously and require validation under balanced input conditions and clinically realistic workflows.
(This article belongs to the Section Clinical Research Methods)
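The primary and top-three accuracies reported in this abstract are standard top-k metrics over ranked differential lists. A minimal sketch under assumed inputs; the diagnosis strings and the `diagnostic_accuracy` helper are illustrative, not the study's cases or code:

```python
def diagnostic_accuracy(differentials, truths):
    """Primary accuracy: the first-listed diagnosis matches the ground truth.

    Top-three accuracy: the truth appears anywhere in the (up to three)
    listed differentials for that case.
    """
    primary = sum(d[0] == t for d, t in zip(differentials, truths))
    top3 = sum(t in d[:3] for d, t in zip(differentials, truths))
    n = len(truths)
    return primary / n, top3 / n

# Toy example: three cases, each with a ranked differential list
cases = [["glioblastoma", "metastasis", "lymphoma"],
         ["meningioma", "schwannoma"],
         ["metastasis", "glioblastoma", "abscess"]]
truth = ["glioblastoma", "meningioma", "abscess"]
prim, top3 = diagnostic_accuracy(cases, truth)
# prim = 2/3, top3 = 1.0
```

Top-three accuracy is always at least the primary accuracy, which is why the gap between the two quantifies how often the correct answer was listed but not ranked first.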

26 pages, 2554 KB  
Article
Semi-Automated Reporting from Environmental Monitoring Data Using a Large Language Model-Based Chatbot
by Angelica Lo Duca, Rosa Lo Duca, Arianna Marinelli, Donatella Occhiuto and Alessandra Scariot
ISPRS Int. J. Geo-Inf. 2026, 15(2), 80; https://doi.org/10.3390/ijgi15020080 - 14 Feb 2026
Viewed by 359
Abstract
Producing high-quality analytical reports for the environmental domain is typically time-consuming and requires significant human expertise. This paper describes MeteoChat, a semi-automatic framework for efficiently generating specialized environmental reports from heterogeneous environmental data. MeteoChat utilizes a Large Language Model (LLM) fine-tuned and integrated with Retrieval-Augmented Generation (RAG). The system’s core is its plug-and-play philosophy, which separates analytical reasoning from the data source and the report’s intended audience. The fine-tuning phase uses data-agnostic, parameterized question–context–answer triples defined by an environmental expert to teach the LLM domain-specific analytical logic and audience-appropriate communication styles. Subsequently, the RAG phase integrates the model with actual datasets, which are processed via an Extract–Transform–Load (ETL) workflow to generate statistical summaries. This architectural separation ensures that the same reporting engine can operate on different sources, such as meteorological time series, satellite imagery, or geographical data, without additional training. Users interact with the system via a web-based conversational interface, where responses are tailored for either technical experts (using explicit calculations and tables) or the general public (using simplified, narrative language). MeteoChat has been tested with real data extracted from the micrometeorological network of ARPA Lazio.
(This article belongs to the Special Issue LLM4GIS: Large Language Models for GIS)

43 pages, 621 KB  
Article
A Benchmark for Evaluating Cognitive Reasoning in Modern Language Models
by Kinga Piętka and Michał Bereta
Appl. Sci. 2026, 16(4), 1918; https://doi.org/10.3390/app16041918 - 14 Feb 2026
Viewed by 398
Abstract
With the growth of large language models (LLMs), there are increasing calls to interpret their behavior through the prism of analogies to human cognitive mechanisms. At the same time, scientific literature points to the fundamental limitations of these systems, describing them, among other things, as models that generate a superficial simulation of reasoning without real access to semantic meanings (“stochastic parrots” or “illusion of reasoning”). This paper proposes an innovative, modular benchmark for assessing the cognitive competence of LLMs, integrating three complementary dimensions of language processing: factual, syntactic, and logical. Eight language models (LLama 3.2, Mistral 7B, LLama 3:8B, Gemini 2.5 Flash, ChatGPT-3, ChatGPT-4o mini, ChatGPT-4, and ChatGPT-5) were tested using a uniform procedure with context reset after each interaction and a three-point scoring scheme (0/0.5/1). The results obtained showed a clear advantage for the largest models in tasks based on general knowledge and formal transformations known from training, with a significant decrease in effectiveness, regardless of model size, in tasks requiring conjunctive reasoning based solely on new, local premises. Importantly, unstable but measurable corrective abilities of some models were also observed after feedback, suggesting the presence of reactive mechanisms, but were insufficient to consider them systems capable of cognitive self-reflection. The combined analysis indicates that LLMs effectively simulate syntax and logic rules when the task corresponds to recognizable formal patterns, but fail in situations requiring the construction of new, coherent chains of beliefs and symbolic inferences, which undermines the thesis of their cognitive “understanding”. The results justify the need to create more complex and semantically restrictive evaluation frameworks that will allow distinguishing statistical fit from systemic, multi-stage formal reasoning. The proposed benchmark is a step towards a more multidimensional and diagnostic evaluation of LLMs, shifting the focus from “will the model respond correctly?” to “why and under what conditions is the model able to reason?”

19 pages, 1073 KB  
Article
Domain-Adaptive Multimodal Large Language Models for Photovoltaic Fault Diagnosis via Dynamic LoRA Routing
by Junjian Wu, Yiwei Chen, Qihao Min, Ming Chen, Jie Zhao and Mang Ye
Processes 2026, 14(4), 653; https://doi.org/10.3390/pr14040653 - 13 Feb 2026
Viewed by 318
Abstract
The reliability of photovoltaic (PV) equipment is vital for ensuring the safe and stable operation of power systems. While multimodal large language models (MLLMs) open up promising avenues for intelligent fault diagnosis, they often falter when confronted with the heterogeneity of PV data—where visual observations come from different sensor modalities (e.g., visible, infrared, and thermal) and display strong domain-dependent variations. Conventional Low-Rank Adaptation (LoRA) is not expressive enough to model such modality-aware differences, which can result in insufficient exploitation of informative patterns. To overcome this limitation, we propose PV-FaultExpert, a domain-adaptive MLLM designed specifically for PV equipment fault analysis. PV-FaultExpert is built upon DyLoRA (Dynamic Expert Routing with LoRA), a dynamic routing strategy that reformulates standard LoRA into a shared low-rank component coupled with multiple expert-specific adapters. A routing module then selects expert paths according to input characteristics, allowing the model to adapt to diverse modalities while maintaining parameter efficiency. Moreover, we construct a PV fault diagnosis dataset via ChatGPT-4o-assisted chain-of-thought reasoning and subsequent expert verification, which both supports model training and enables rigorous evaluation of our method. Extensive experiments demonstrate that PV-FaultExpert consistently surpasses strong baselines, including GPT-4 and Claude-3, across multiple evaluation criteria, producing fault analysis reports that are accurate, interpretable, and aligned with safety-critical requirements.

22 pages, 7883 KB  
Article
A Comparative Evaluation of Multimodal Generative AI as an Early-Stage Biophilic Design Assistant
by Bekir Huseyin Tekin
Buildings 2026, 16(4), 768; https://doi.org/10.3390/buildings16040768 - 13 Feb 2026
Viewed by 245
Abstract
This study investigates how two widely used language-modelled generative AI tools, ChatGPT-5.1 (with DALL·E 3) and Gemini 3 (with Imagen), perform as early-stage co-design partners for biophilic interior design. Focusing on real-world use rather than theoretical capability, the research asks to what extent these systems can generate conceptually robust, visually coherent and practically feasible proposals when designers explicitly request biophilic strategies. A multiple-case design was employed across three scenarios: (1) an empty “tabula rasa” room, (2) a damaged rustic room requiring contextual renovation, and (3) a hospital staff break room to be transformed into a “cognitive restoration sanctuary.” For each case, both tools were prompted to produce a step-by-step biophilic design plan and a corresponding photorealistic image. Textual outputs were coded against the 14 Patterns of Biophilic Design and related restorative concepts, while images were evaluated by an expert panel of 15 architects with formal training in biophilic design using a structured Likert-scale instrument. Exterior and building-scale applications were not assessed. Results show that both systems can articulate broadly plausible biophilic strategies but differ in emphasis: ChatGPT tends to produce more spatially coherent, pattern-rich and functionally grounded plans, whereas Gemini excels in visual realism and atmospheric rendering. Expert ratings indicate a consistent, though not overwhelming, preference for ChatGPT in spatial composition, human-spatial responses, contextual fit, and strategic support for cognitive restoration, with a slight advantage for Gemini in visual realism. Across all cases, however, plan-to-image fidelity is limited, particularly for non-visual and operational patterns (e.g., sound, scent, thermal variability, circadian systems, infrastructure access). The findings suggest that current generative AI tools are best positioned as fast, co-creative aides for early exploration of biophilic ideas, rather than as reliable autonomous consultants for evidence-based, cognitively targeted biophilic design.
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

18 pages, 1272 KB  
Study Protocol
Leveraging Student-Athlete Mental Health Through an AI-Augmented Mobile Platform: The ThriveNudge Study Protocol
by Sameer Chakraborty, Nicholas Mendro and Longxi Li
Behav. Sci. 2026, 16(2), 268; https://doi.org/10.3390/bs16020268 - 11 Feb 2026
Viewed by 365
Abstract
Playing sports remains one of the most common avenues for youth engagement in physical activity. Yet mental health challenges, such as performance anxiety, depressive symptoms, reduced motivation, and burnout, place many young athletes at risk. As key mediators of sport participation, coaches’ roles are often underscored in recognizing shifts in athlete motivation, behavior, or well-being. Gaining better insight into athlete mental health status may enable coaches to provide timely support and strengthen athlete and team well-being. In this study protocol, we employ a mixed-methods design, evaluating the effectiveness of an AI-augmented mobile application (i.e., ThriveNudge) in promoting the mental health of youth athletes. ThriveNudge helps coaches monitor athlete mental health, flag mood disruptions, and practice supportive communication via simulated chats. A target sample of four interscholastic teams (with athletes aged 14–18 years) and their head coaches will be recruited. Teams will be cluster-randomized to either the intervention condition (n = 2), receiving pre-season training to implement ThriveNudge, or to a waitlist control condition (n = 2). Primary outcomes, including athlete burnout, motivation, coach–athlete relationships, and sport enjoyment, will be measured using psychometric scales administered online. Semi-structured interviews will be conducted with coaches and athletes in the experimental group to collect qualitative data on user interface and user experience. We hypothesize that teams using ThriveNudge will report lower athlete anxiety and burnout, higher intrinsic motivation and enjoyment, and stronger coach–athlete relationships than athletes in control teams. We aim to provide a scalable and accessible digital platform that safeguards youth mental health.
(This article belongs to the Special Issue The Use of AI in the Behavioral Sciences)

15 pages, 339 KB  
Article
Teacher Education Students’ Practices, Benefits, and Challenges in the Use of Generative AI Tools in Higher Education
by Stavros Athanassopoulos, Aggeliki Tzavara, Spyridon Aravantinos, Konstantinos Lavidas, Vassilis Komis and Stamatios Papadakis
Educ. Sci. 2026, 16(2), 228; https://doi.org/10.3390/educsci16020228 - 2 Feb 2026
Viewed by 614
Abstract
Despite the growing adoption of generative artificial intelligence (GenAI) tools in higher education, limited research has examined how future educators perceive and use these technologies in their academic practices. This study investigates the practices, perceived benefits, and challenges associated with the use of GenAI tools—such as ChatGPT—among undergraduate students enrolled in programs that confer teaching qualifications. Using a mixed-methods design, data were collected from 314 students from the Early Childhood Education, Philosophy, and Philology departments. The findings indicate that the majority of students use GenAI tools primarily for academic purposes, most commonly for information searching, data analysis, study advice, and exam preparation. Students reported several perceived benefits, including rapid access to information, time efficiency, improved comprehension of complex concepts, enhanced study organization, and support with assignments and research-related tasks such as summarizing or translating academic texts. At the same time, participants expressed notable concerns, particularly regarding over-reliance on AI, reduced personal effort, risks to academic integrity, diminished critical thinking, and weakened research skills. Additional challenges included misinformation, reduced creativity, improper use of AI-generated content, skill underdevelopment, and potential technological dependence. The study concludes that teacher education programs should systematically integrate AI literacy and responsible-use training to prepare future educators to address the pedagogical and ethical implications of GenAI in educational settings.
(This article belongs to the Special Issue Unleashing the Potential of E-learning in Higher Education)

23 pages, 974 KB  
Systematic Review
Performance of Large Language Models for Radiology Report Impression Generation: A Systematic Review
by Curtise K. C. Ng, Zhonghua Sun and Ian K. H. Te
Technologies 2026, 14(2), 99; https://doi.org/10.3390/technologies14020099 - 2 Feb 2026
Viewed by 575
Abstract
No systematic review has previously examined the application of large language models (LLMs) for generating impressions from radiology report findings. This study systematically reviews the performance of LLMs on this task and their associated evaluation methodologies. A search of seven electronic databases on 7 August 2025 identified 15 eligible papers (average quality score: 71.4%). These articles evaluated 35 LLMs, including 21 base models. The reported performance ranges were as follows: Recall-Oriented Understudy for Gisting Evaluation (ROUGE)-1, 35.9% (Generative Pre-Trained Transformer (GPT)-4) to 69.7% (Baichuan2-13B); ROUGE-2, 13.4% (Large Language Model Meta AI (Llama)) to 52.4% (Baichuan2-13B); and ROUGE-L, 16.5% (Chat General Language Model–Medical (ChatGLM-Med)) to 63.8% (fine-tuned Text-to-Text Transfer Transformer (T5)). The fine-tuned T5 consistently demonstrated high performance, based on Bidirectional Encoder Representations from Transformers Score (BERTScore): 89.2%; BiLingual Evaluation Understudy (BLEU)-1: 65.2%; BLEU-2: 57.9%; BLEU-3: 52.5%; BLEU-4: 48.3%; Metric for Evaluation of Translation with Explicit ORdering (METEOR): 38.1%; ROUGE-1: 59.9%; ROUGE-2: 50.9%; ROUGE-L: 63.8%; and subjective metrics (clinical usability: 4.5/5.0; completeness: 4.3/5.0; conciseness: 4.3/5.0; fluency: 4.4/5.0). These results, based on 132,043 computed tomography, echocardiography, magnetic resonance imaging, and X-ray reports, indicate its strong clinical potential for assisting radiologists in impression generation through supervised fine-tuning rather than prompting techniques used in closed-source LLMs.
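The ROUGE-1 figures quoted above measure clipped unigram overlap between a generated impression and the reference impression. A simplified sketch of that computation (whitespace tokenization only; the `rouge1` helper and the sample sentences are illustrative, and published scores typically add stemming and other preprocessing):

```python
from collections import Counter

def rouge1(candidate, reference):
    """ROUGE-1 recall/precision/F1 from clipped unigram overlap counts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & clips each count to the minimum
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

# Toy example: 3 of 4 reference unigrams are recovered
r, p, f = rouge1("no acute intracranial abnormality",
                 "no acute abnormality identified")
# r = 0.75, p = 0.75, f = 0.75
```

The review's ROUGE-2 and ROUGE-L variants follow the same pattern, substituting bigrams and the longest common subsequence for unigrams.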
19 pages, 2519 KB  
Article
Evaluating Fairness in LLM Negotiator Agents via Economic Games Using Multi-Agent Systems
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Mathematics 2026, 14(3), 458; https://doi.org/10.3390/math14030458 - 28 Jan 2026
Viewed by 412
Abstract
With the surge of artificial intelligence (AI) systems, autonomous Large Language Model (LLM)-based negotiator agents are being developed to negotiate on behalf of humans, particularly in commercial contexts. In human interactions, marginalized groups, such as racial minorities and women, often face unequal outcomes due to gender and social biases. Since these models are trained on human data, a key question arises: do LLM-based agents reflect existing human biases in their negotiation strategies? To address this question, we investigated the impact of such biases in one of the most advanced LLMs available, ChatGPT-4 Turbo, by employing a buyer–seller game with male and female agents from four racial groups (White, Black, Asian, and Latino). We found that when either the seller or the buyer is aware of the gender and race of the other player, they secure more profit than when negotiations are gender- and race-blind. Additionally, we examined whether conditioning buyer agents by prompting them with an additional persona improves their negotiation strategy. Interestingly, we observed that such conditioning can mitigate LLM-based agents’ biases, suggesting a way to empower underrepresented groups to achieve more equitable outcomes. Based on these findings, while LLM-generated text may not exhibit explicit biases, hidden gender and social biases in the training data can still lead to skewed outcomes for users. Therefore, it is crucial to mitigate these biases and prevent their transfer during dataset curation to ensure fair human–agent interactions and build user trust. Full article
10 pages, 193 KB  
Review
Attention to Elderspeak: A Call for Dignity-Affirming Communication in Advanced Nursing Care
by Takahiko Nagamine
Clin. Pract. 2026, 16(1), 21; https://doi.org/10.3390/clinpract16010021 - 22 Jan 2026
Cited by 1 | Viewed by 377
Abstract
Elderspeak is a form of communication overaccommodation directed toward older adults, characterized by simplified language and an elevated pitch. While typically well-intentioned, it is rooted in ageist stereotypes and linked to negative health outcomes. A literature search was conducted in PubMed, CINAHL, and PsycINFO (2018–2025), yielding 24 key articles focusing on acute and surgical settings. The purpose of this narrative review is to synthesize current evidence on Elderspeak within acute care hospitals and propose a research framework and intervention strategies. Elderspeak is a key determinant of resistiveness to care (RTC), particularly in acute settings where it is triggered by functional impairment. Exposure increases patient distress and negatively impacts vital signs and cooperation with medical interventions. Inconsistent measurement is being addressed through standardized schemes like the Iowa Coding Scheme for Elderspeak (ICodE). This paper proposes that future research must employ mixed-methods, longitudinal designs to capture the impact of Elderspeak on long-term outcomes. Drawing on the ICodE, we propose a qualitative self-reflection tool for clinicians to enhance awareness in high-stakes acute settings. Eliminating Elderspeak is a foundational necessity for patient safety and dignity-affirming care in advanced nursing. Full article