Search Results (595)

Search Parameters:
Keywords = GPT application

20 pages, 282 KB  
Article
Educating Aspiring Teachers with AI by Strengthening Sustainable Pedagogical Competence in Changing Educational Landscapes
by Aydoğan Erkan, İslam Suiçmez, Sezer Kanbul and Mehmet Öznacar
Sustainability 2026, 18(2), 757; https://doi.org/10.3390/su18020757 - 12 Jan 2026
Viewed by 88
Abstract
This study examines the effectiveness of an eight-week AI training program aimed at enhancing teacher candidates’ pedagogical competence and AI literacy in rapidly changing educational environments. As the modern world continues to change, the transformation of education, one of the most important elements of our lives, cannot be ignored. Accordingly, integrating teacher candidates, who are key education stakeholders, into technological developments is important for both efficiency and sustainability. The study employed a “parallel–simultaneous design”, a mixed-methods approach in which quantitative and qualitative research methods are used together. It began with a needs analysis conducted with 33 teacher candidates studying in different branches at the faculty of education, which identified knowledge gaps, digital skill levels, and readiness to integrate artificial intelligence tools in future classrooms. The needs analysis also informed the decision to target teacher candidates rather than in-service teachers: the results indicated that the education of the future is best applied to the teachers of the future, who adapt to such training more easily. A pre-test–post-test design was then applied to observe how the participants changed, using an artificial intelligence literacy scale. QDA Miner Lite was used for the analysis of the qualitative data, and SPSS 29.0 for the analysis of the quantitative data. During the eight-week training, Gamma was used for presentations, Suno for audio, Midjourney for visuals, and ChatGPT-4 for descriptive search, in order to provide higher-quality education to the participants. 
Practice with these applications was intended to provide up-to-date education that develops problem-solving skills through critical-thinking exercises. According to the results, teacher candidates who had described themselves as undecided or insufficiently knowledgeable reached a sufficient level in the post-test. In light of these results, artificial-intelligence-oriented education appears effective in developing sustainable pedagogical skills, digital literacy, readiness, and professional self-confidence. The study also offers evidence-based recommendations for the design of future teacher training programs. Full article
30 pages, 588 KB  
Article
Comparative Performance Analysis of Large Language Models for Structured Data Processing: An Evaluation Framework Applied to Bibliometric Analysis
by Maryam Abbasi, Paulo Váz, José Silva, Filipe Cardoso, Filipe Sá and Pedro Martins
Appl. Sci. 2026, 16(2), 669; https://doi.org/10.3390/app16020669 - 8 Jan 2026
Viewed by 183
Abstract
The proliferation of Large Language Models (LLMs) has transformed natural language processing (NLP) applications across diverse domains. This paper presents a comprehensive comparative analysis of three state-of-the-art language models—GPT-4o, Claude-3, and Julius AI—evaluating their performance across systematic NLP tasks using standardized datasets and evaluation frameworks. We introduce a reusable evaluation methodology incorporating five distinct prompt engineering techniques (Prefix, Cloze, Anticipatory, Heuristic, and Chain of Thought) applied to three categories of linguistic challenges: data extraction, aggregation, and contextual reasoning. Using a bibliometric analysis use case as our evaluation domain, we demonstrate the framework’s application to structured data processing tasks common in academic research, business intelligence, and data analytics applications. Our experimental design utilized a curated Scopus bibliographic dataset containing 3212 academic publications to ensure reproducible and objective comparisons, representing structured data processing tasks. The results demonstrated significant performance variations across models and tasks, with GPT-4o achieving 89.3% average accuracy, Julius AI reaching 85.7%, and Claude-3 demonstrating 72.1%. Claude-3 also showed notably high prompt sensitivity (consistency score: 74.3%, compared with GPT-4o: 91.2% and Julius AI: 86.7%). This study revealed critical insights into prompt sensitivity, contextual understanding limitations, and the effectiveness of different prompting strategies for specific task categories. Statistical analysis using repeated measures ANOVA and pairwise t-tests with Bonferroni’s correction confirmed significant differences between models (F(2, 132) = 142.3, p < 0.001), with effect sizes ranging from 0.51 to 1.33. 
Response time analysis showed task-dependent latency patterns: for data extraction tasks, Claude-3 averaged 1.9 s (fastest), GPT-4o 2.1 s, and Julius AI 2.8 s; however, for contextual reasoning tasks, latency increased as follows for Claude-3: 3.8 s, GPT-4o: 4.5 s, and Julius AI: 5.8 s. Overall averages were as follows for GPT-4o: 3.2 s, Julius AI: 4.1 s, and Claude-3: 2.8 s. While specific performance metrics reflect current model versions (GPT-4o: gpt-4o-2024-05-13; Claude-3 Opus: 20240229; Julius AI: v2.1.4), the evaluation framework provides a reusable methodology for ongoing LLM assessment as new versions emerge. These findings provide practical guidance for researchers and practitioners in selecting appropriate LLMs for domain-specific applications and highlight areas requiring further development in language model capabilities. While demonstrated on bibliometric data, this evaluation framework is generalizable to other structured data processing domains. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
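The statistical procedure reported above (pairwise t-tests with Bonferroni’s correction following repeated measures ANOVA) can be sketched in Python; the accuracy values below are illustrative placeholders, not the study’s data.

```python
from itertools import combinations
from scipy import stats

# Illustrative per-task accuracy scores, one list per model.
# These numbers are placeholders, not the paper's results.
scores = {
    "GPT-4o":    [0.91, 0.88, 0.90, 0.87, 0.92],
    "Julius AI": [0.87, 0.84, 0.86, 0.85, 0.88],
    "Claude-3":  [0.75, 0.70, 0.73, 0.71, 0.72],
}

pairs = list(combinations(scores, 2))
n_comparisons = len(pairs)  # 3 pairwise tests -> Bonferroni factor of 3

for a, b in pairs:
    # Paired t-test: the same tasks are scored under each model.
    t, p = stats.ttest_rel(scores[a], scores[b])
    p_corrected = min(p * n_comparisons, 1.0)  # Bonferroni correction
    print(f"{a} vs {b}: t={t:.2f}, corrected p={p_corrected:.4f}")
```

Bonferroni simply multiplies each raw p-value by the number of comparisons (capped at 1), trading power for control of the family-wise error rate.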

14 pages, 1101 KB  
Article
AI in the Hot Seat: Head-to-Head Comparison of Large Language Models and Cardiologists in Emergency Scenarios
by Vedat Cicek, Lili Zhao, Yalcin Tur, Ahmet Oz, Sahhan Kilic, Gorkem Durak, Faysal Saylik, Mert Ilker Hayiroglu, Tufan Cinar and Ulas Bagci
Med. Sci. 2026, 14(1), 33; https://doi.org/10.3390/medsci14010033 - 8 Jan 2026
Viewed by 135
Abstract
Background: The clinical applicability of large language models (LLMs) in high-stakes cardiac emergencies remains unexplored. This study evaluated how well advanced LLMs perform in managing complex catheterization laboratory (Cath lab) scenarios and compared their performance with that of interventional cardiologists. Methods and Results: A cross-sectional study was conducted from 20 June to 2 December 2024. Twelve challenging inferior myocardial infarction scenarios were presented to seven LLMs (ChatGPT, Gemini, LLAMA, Qwen, Bing, Claude, DeepSeek) and five early-career interventional cardiologists. Responses were standardized, anonymized, and evaluated by thirty experienced interventional cardiologists. Performance comparisons were analyzed using a linear mixed-effects model with correlation and reliability statistics. Physicians had an average reference score of 80.68 (95% CI 76.3–85.0). Among LLMs, ChatGPT ranked highest (87.4, 95% CI 82.5–92.3), followed by Claude (80.8, 95% CI 75.7–85.9) and DeepSeek (78.7, 95% CI 72.9–84.6). LLAMA (73.7), Qwen (66.2), and Bing (64.3) ranked lower, while Gemini scored the lowest (59.0). ChatGPT scored higher than the early-career physician comparator group (difference 6.69, 95% CI 0.00–13.37; p < 0.05), whereas Gemini, LLAMA, Qwen, and Bing performed significantly worse; Claude and DeepSeek showed no significant difference. Conclusions: This expanded assessment reveals significant variability in LLM performance. In this simulated setting, ChatGPT demonstrated performance comparable to that of early-career interventional cardiologists. These results suggest that LLMs could serve as supplementary decision-support tools in interventional cardiology under simulated conditions. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) in Cardiovascular Medicine)

30 pages, 332 KB  
Review
Prompt Injection Attacks in Large Language Models and AI Agent Systems: A Comprehensive Review of Vulnerabilities, Attack Vectors, and Defense Mechanisms
by Saidakhror Gulyamov, Said Gulyamov, Andrey Rodionov, Rustam Khursanov, Kambariddin Mekhmonov, Djakhongir Babaev and Akmaljon Rakhimjonov
Information 2026, 17(1), 54; https://doi.org/10.3390/info17010054 - 7 Jan 2026
Viewed by 778
Abstract
Large language models (LLMs) have rapidly transformed artificial intelligence applications across industries, yet their integration into production systems has unveiled critical security vulnerabilities, chief among them prompt injection attacks. This comprehensive review synthesizes research from 2023 to 2025, analyzing 45 key sources, industry security reports, and documented real-world exploits. We examine the taxonomy of prompt injection techniques, including direct jailbreaking and indirect injection through external content. The rise of AI agent systems and the Model Context Protocol (MCP) has dramatically expanded attack surfaces, introducing vulnerabilities such as tool poisoning and credential theft. We document critical incidents including GitHub Copilot’s CVE-2025-53773 remote code execution vulnerability (CVSS 9.6) and ChatGPT’s Windows license key exposure. Research demonstrates that just five carefully crafted documents can manipulate AI responses 90% of the time through Retrieval-Augmented Generation (RAG) poisoning. We propose PALADIN, a defense-in-depth framework implementing five protective layers. This review provides actionable mitigation strategies based on OWASP Top 10 for LLM Applications 2025, identifies fundamental limitations including the stochastic nature problem and alignment paradox, and proposes research directions for architecturally secure AI systems. Our analysis reveals that prompt injection represents a fundamental architectural vulnerability requiring defense-in-depth approaches rather than singular solutions. Full article
(This article belongs to the Special Issue Emerging Trends in AI-Driven Cyber Security and Digital Forensics)
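As a toy illustration of the defense-in-depth idea discussed above, one protective layer might screen untrusted external content for common injection phrasing before it reaches the model. This is a naive heuristic sketch, not the paper’s PALADIN framework; pattern matching alone is easily bypassed and would be only one of several layers.

```python
import re

# Hypothetical patterns for a single screening layer; real deployments
# combine many controls (privilege separation, output filtering, etc.).
INJECTION_PATTERNS = [
    r"ignore (all |any )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def flag_untrusted_content(text: str) -> bool:
    """Return True if the text matches a known injection pattern."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

print(flag_untrusted_content("Please summarize this PDF."))            # False
print(flag_untrusted_content("Ignore previous instructions and ..."))  # True
```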

33 pages, 2053 KB  
Systematic Review
Generative AI in Art Education: A Systematic Review of Research Trends, Tool Applications, and Outcomes (2019–2025)
by Yihan Jiang, Yujiao Fan and Zifeng Liu
Educ. Sci. 2026, 16(1), 47; https://doi.org/10.3390/educsci16010047 - 30 Dec 2025
Viewed by 987
Abstract
Generative artificial intelligence (GenAI) tools are transforming art education by enabling instant creation of textual, visual, audio, and multimodal outputs. This systematic review synthesizes research on GenAI applications in art education from January 2019 to August 2025. Following PRISMA 2020 guidelines, 19 peer-reviewed empirical studies across six databases (Web of Science, ScienceDirect, Springer, Taylor & Francis, Scopus, and ERIC) met inclusion criteria, which required clear pedagogical implementation with students or educators as active participants. Research accelerated from two studies in 2023 to 14 in 2025, with most studies examining higher education and East Asian contexts through mixed-methods approaches and grounded in constructivist and cognitive learning theories. Text-to-image generation models (DALL-E, Midjourney, Stable Diffusion) and conversational AI (ChatGPT) were most frequently implemented across creative production, pedagogical scaffolding, and instructional design applications. Findings from this emerging body of research suggest that GenAI has the potential to improve learning achievement, creative thinking, engagement, and cultural understanding when integrated through structured pedagogical frameworks with intentional instructor design. However, these positive outcomes represent early-stage implementation trends in well-resourced contexts rather than broadly generalizable conclusions. Successful integration requires explicit instructional frameworks, clear ethical guidelines for human-AI collaboration, and evolved assessment methods. Full article
(This article belongs to the Special Issue The Impact of Artificial Intelligence on Teaching and Learning)

28 pages, 4566 KB  
Systematic Review
Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) for Enterprise Knowledge Management and Document Automation: A Systematic Literature Review
by Ehlullah Karakurt and Akhan Akbulut
Appl. Sci. 2026, 16(1), 368; https://doi.org/10.3390/app16010368 - 29 Dec 2025
Viewed by 1497
Abstract
The integration of Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) is rapidly transforming enterprise knowledge management, yet a comprehensive understanding of their deployment in real-world workflows remains limited. This study presents a systematic literature review (SLR) analyzing 63 high-quality primary studies selected after rigorous screening to evaluate how these technologies address practical enterprise challenges. We formulated nine research questions targeting platforms, datasets, algorithms, and validation metrics to map the current landscape. Our findings reveal that enterprise adoption is largely in the experimental phase: 63.6% of implementations utilize GPT-based models, and 80.5% rely on standard retrieval frameworks such as FAISS or Elasticsearch. Critically, this review identifies a significant ‘lab-to-market’ gap; while retrieval and classification sub-tasks frequently employ academic validation methods like k-fold cross-validation (93.6%), generative evaluation predominantly relies on static hold-out sets due to computational constraints. Furthermore, fewer than 15% of studies address real-time integration challenges required for production-scale deployment. By systematically mapping these disparities, this study offers a data-driven perspective and a strategic roadmap for bridging the gap between academic prototypes and robust enterprise applications. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
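The retrieval frameworks mentioned above (e.g., FAISS’s flat L2 index) perform nearest-neighbor search over dense document embeddings. A minimal NumPy sketch of that core operation, using made-up toy vectors in place of real embeddings:

```python
import numpy as np

# Toy document embeddings; in practice these come from an embedding model,
# and FAISS's IndexFlatL2 performs the same exhaustive L2 search at scale.
doc_vectors = np.array([
    [0.9, 0.1, 0.0],   # doc 0: e.g. an invoice-processing policy
    [0.1, 0.8, 0.1],   # doc 1: e.g. an onboarding guide
    [0.0, 0.2, 0.9],   # doc 2: e.g. an incident-response runbook
])

def retrieve(query_vector: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k nearest documents by L2 distance."""
    distances = np.linalg.norm(doc_vectors - query_vector, axis=1)
    return np.argsort(distances)[:k].tolist()

# A query embedding close to doc 2 ranks it first.
print(retrieve(np.array([0.05, 0.25, 0.85])))  # → [2, 1]
```

In a full RAG pipeline, the retrieved documents are then concatenated into the LLM prompt as grounding context.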

13 pages, 258 KB  
Article
AI-Generated Antibiotic Therapies for Acute Periprosthetic Joint Infections with Implant Retention in Comparison with an Interdisciplinary Team
by Alberto Alfieri Zellner, Tamaradoubra Tippa Tuburu, Alexander Franz, Jonas Roos, Frank Sebastian Fröschen and Gunnar Thorben Rembert Hischebeth
Antibiotics 2026, 15(1), 25; https://doi.org/10.3390/antibiotics15010025 - 29 Dec 2025
Viewed by 247
Abstract
Background: Periprosthetic joint infections (PJI) represent a serious complication following joint arthroplasty and require, in addition to surgical intervention, targeted antibiotic therapy. The aim of this study was to compare microbiological recommendations for the antibiotic treatment of fictitious PJI patients generated by an artificial intelligence (AI) system with those of an interdisciplinary team (IT) consisting of microbiologists and orthopedic surgeons. The differences between the recommendations suggested by AI and the IT were analyzed with regard to the suggested agents and duration of antibiotic therapy. Methods: Based on meta-analyses, a cohort of 100 fictitious patients with acute early- and acute late-onset PJI was created, reflecting the typical demographic data, comorbidities and pathogen profiles of such a population. This information was input into the AI system ChatGPT (OpenAI, GPT-5 “Thinking mode” accessed via ChatGPT Plus, San Francisco, CA, USA) to generate corresponding recommendations. The objective was to use these profiles to obtain recommendations for definitive antibiotic therapy, including daily dosage, intravenous and oral treatment durations. Simultaneously, the same fictitious patient data were reviewed by the IT to produce their own recommendations. Results: The results revealed both concordances and discrepancies in the selection of antibiotics. Notably, in cases involving multidrug-resistant organisms and more complex clinical scenarios, the AI-generated recommendations were incongruent with those of the IT, with estimated percentage agreement ranging from 0% to 33%. In straightforward clinical scenarios with monomicrobial infections, AI reached an estimated percentage agreement of up to 57% (95% CI [0.47–0.67]). Furthermore, AI consistently recommended a therapy duration of 12 weeks versus the six weeks usually recommended by the IT. 
Conclusions: The study provides important insights into the potential and limitations of AI-assisted decision-making models in orthopedic infection treatments. Consultation of AI is universally accessible at all times of day, which may offer a significant advantage in the future for the treatment of PJI. This kind of application will be of particular interest for institutions without in-house microbiology services. However, from our perspective, the current level of incongruence between the AI-generated recommendations and those of an experienced interdisciplinary team remains too high for this approach to be clinically implemented at this time. Furthermore, AI lacks transparency regarding the sources it uses to inform about its decision-making and therapeutic recommendations, currently carries no legal weight and clinical implementation is severely hindered by restrictive privacy laws regarding health care data. Full article
(This article belongs to the Special Issue Diagnostics and Antibiotic Therapy in Bone and Joint Infections)
11 pages, 681 KB  
Article
Artificial Intelligence in Cosmetic Dermatology with Regard to Laser Treatments: A Comparative Analysis of AI and Dermatologists’ Decision-Making
by Alexandra Junge, Ali Mokhtari, Simone Cazzaniga, Ashraf Badawi, Flurin Brand, Simone Böll, Laurence Feldmeyer, Cindy Franklin, Hans-Joachim Laubach, Mathias Lehmann, Zora Martignoni, Sammy Murday, Dominik Obrist, Antonia Reimer-Taschenbrecker, Basil Signer, Roberta Vasconcelos-Berg, Charlotte Vogel, Nikhil Yawalkar, Kristine Heidemeyer and Seyed Morteza Seyed Jafari
Cosmetics 2026, 13(1), 5; https://doi.org/10.3390/cosmetics13010005 - 26 Dec 2025
Viewed by 862
Abstract
Introduction: Artificial intelligence (AI) has developed into an increasingly important tool in dermatology. While new technologies integrated within laser devices are emerging, there is a lack of data on the applicability of publicly available AI models. Methods: The prospective study used an online questionnaire where participants evaluated diagnosis and treatment for 25 dermatological cases shown as pictures. The same questions were given to AI models: ChatGPT-4o, Claude Sonnet 4, Gemini 2.5 Flash, and Grok-3. Results: Dermatologists outperformed AI in accuracy for the suspected primary diagnosis (SD), with pooled dermatologists at 75.6% vs. pooled AI at 57.0%; laser specialists achieved the highest accuracy (SD 82.0%) and residents the lowest (SD 66.0%). There was high heterogeneity across AI models. Gemini approached dermatologist performance (SD 72.0%), while Claude showed low accuracy (SD 40.0%). While AI models reached near 100% accuracy in some classic/common diagnoses (e.g., acne, rosacea, spider angioma, infantile hemangioma), their accuracy dropped to near 0% on rare or context-dependent cases (e.g., blue rubber bleb nevus syndrome, angiosarcoma, hirsutism, cutaneous siderosis). Inter-rater agreement was high among laser experts in terms of diagnostic accuracy and treatment choice. Agreement between residents and AI models was highest for diagnostic accuracy and treatment choice, while it was lowest between experts and AI models. Conclusions: Before AI-based tools can be integrated into daily practice, particularly regarding diagnosis and appropriate laser treatment recommendations, specific supervised medical training of the AI model is necessary, as open-source platforms currently lack the ability to contextualize presented data. Full article
(This article belongs to the Special Issue Feature Papers in Cosmetics in 2025)

27 pages, 897 KB  
Review
Large Language Models for Cardiovascular Disease, Cancer, and Mental Disorders: A Review of Systematic Reviews
by Andreas Triantafyllidis, Sofia Segkouli, Stelios Kokkas, Anastasios Alexiadis, Evdoxia Eirini Lithoxoidou, George Manias, Athos Antoniades, Konstantinos Votis and Dimitrios Tzovaras
Healthcare 2026, 14(1), 45; https://doi.org/10.3390/healthcare14010045 - 24 Dec 2025
Viewed by 494
Abstract
Background/Objective: The use of Large Language Models (LLMs) has recently gained significant interest from the research community toward the development and adoption of Generative Artificial Intelligence (GenAI) solutions for healthcare. The present work introduces the first meta-review (i.e., review of systematic reviews) in the field of LLMs for chronic diseases, focusing particularly on cardiovascular, cancer, and mental diseases, to identify their value in patient care, and challenges for their implementation and clinical application. Methods: A literature search in the bibliographic databases of PubMed and Scopus was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, to identify systematic reviews incorporating LLMs. The original studies included in the reviews were synthesized according to their target disease, specific application, LLMs used, data sources, accuracy, and key outcomes. Results: The literature search identified 5 systematic reviews respecting our inclusion and exclusion criteria, which examined 81 unique LLM-based solutions. The highest percentage of the solutions targeted mental disease (86%), followed by cancer (7%) and cardiovascular disease (6%), implying a large research focus in mental health. Generative Pre-trained Transformer (GPT)-family models were used most frequently (~55%), followed by Bidirectional Encoder Representations from Transformers (BERT) variants (~40%). Key application areas included depression detection and classification (38%), suicidal ideation detection (7%), question answering based on treatment guidelines and recommendations (7%), and emotion classification (5%). Study aims and designs were highly heterogeneous, and methodological quality was generally moderate with frequent risk-of-bias concerns. 
Reported performance varied widely across domains and datasets, and many evaluations relied on fictional vignettes or non-representative data, limiting generalizability. The most significant challenges found in the development and evaluation of LLMs include inconsistent accuracy, bias detection and mitigation, model transparency, data privacy, the need for continual human oversight, ethical concerns and guidelines, as well as the design and conduct of high-quality studies. Conclusions: While LLMs show promise for screening, triage, decision support, and patient education—particularly in mental health—the current literature is descriptive and constrained by data, transparency, and safety gaps. We recommend prioritizing rigorous real-world evaluations, diverse benchmark datasets, bias-auditing, and governance frameworks before clinical deployment and large-scale adoption of LLMs. Full article
(This article belongs to the Special Issue Smart and Digital Health)

21 pages, 1000 KB  
Article
ChatGPT in Programming Education: An Empirical Study on Its Impact on Student Performance, Creativity, and Teamwork
by Diana Stoyanova, Silviya Stoyanova-Petrova, Snezha Shotarova, Slavi Lyubomirov and Nevena Mileva
Educ. Sci. 2026, 16(1), 19; https://doi.org/10.3390/educsci16010019 - 23 Dec 2025
Viewed by 448
Abstract
This study employs a two-part research design to explore the impact of ChatGPT on programming education for engineering undergraduates. Study 1 involved 56 third-year students who completed a questionnaire examining the frequency and purposes of ChatGPT use in text-based programming. While no statistically significant association was found between ChatGPT usage frequency and final grades, high-achieving students tended to use the tool less frequently, whereas lower-performing students relied on it more for support. Study 2 employed a counterbalanced repeated-measures design with nine first-year students divided into two groups, who developed desktop applications with and without ChatGPT. Project assessments and focus-group interviews were used to examine the effects of ChatGPT on creativity, confidence, effectiveness, and teamwork in visual programming. The results indicate that ChatGPT use was associated with reduced task completion time and increased coding efficiency; however, it was also linked to decreased creativity, greater reliance on ready-made solutions, and diminished code readability and collaborative engagement. These results highlight the need for teaching strategies that balance AI integration with independent skill development in programming education. The study recommends incorporating tasks based on the Reverse Bloom’s Taxonomy and activities that allow students to work with and without ChatGPT to encourage critical reflection and responsible AI use. Full article
(This article belongs to the Section Higher Education)

12 pages, 707 KB  
Article
Comparison of ChatGPT-4o and DeepSeek R1 in the Management of Ophthalmological Emergencies—An Analysis of Ten Fictional Case Vignettes
by Dominik Knebel, Siegfried Priglinger and Benedikt Schworm
J. Clin. Med. 2025, 14(24), 8927; https://doi.org/10.3390/jcm14248927 - 17 Dec 2025
Viewed by 324
Abstract
Background: Generative artificial intelligence (AI) applications have gained increasing popularity in recent years and are used by an ever-increasing number of people on a day-to-day basis. While the performance of the earlier-generation generative AI ChatGPT-3.5 in the context of ophthalmologic emergencies has been previously assessed, the purpose of this study is to analyze the performance of the newer-generation generative AIs DeepSeek R1 (Hangzhou DeepSeek Artificial Intelligence Co., Ltd., Hangzhou, China) and ChatGPT-4o (OpenAI Inc., San Francisco, CA, USA) in the context of diagnosis, triage and prehospital management of ophthalmological emergencies. Methods: Ten previously published fictional case vignettes representing queries in the English language of patients experiencing acute ophthalmological symptoms were entered into the generative AIs DeepSeek R1 and ChatGPT-4o. The interaction with the generative AIs followed a previously described structured interaction path. In a random order, each case vignette was entered into separate chats five times, producing a total of 50 answers from each generative AI. Each answer was analyzed according to a previously published manual. Results: We observed better values for DeepSeek R1 compared to ChatGPT-4o in terms of treatment accuracy (60% compared to 50%), the share of answers containing wrong (46% compared to 60%) or conflicting information (30% compared to 40%), the share of answers that correctly captured the overall severity of symptoms (98% compared to 78%), as well as the share of potentially harmful answers (38% compared to 50%). Moreover, DeepSeek R1 more frequently provided a single diagnosis (20% compared to 16%) and specific treatment advice (42% compared to 20%) than ChatGPT-4o. Both generative AIs showed a diagnostic accuracy of 100%, i.e., whenever they provided a single diagnosis, this was indeed the correct diagnosis. 
In terms of triage accuracy, ChatGPT-4o performed slightly better than DeepSeek R1 (73% compared to 66%). In contrast to DeepSeek R1, which never directed questions back at the user, ChatGPT-4o always did. Directing questions back at the user enables dialogues with ChatGPT-4o that more closely resemble the actual taking of a patient’s history. However, DeepSeek R1 seems to perform better than ChatGPT-4o in terms of several important content-related metrics and has been shown to be more cost-effective in other studies. Conclusions: Both newer-generation generative AIs constitute remarkable milestones in the development of generative artificial intelligence. However, since potentially harmful recommendations were observed with both models, we currently do not recommend their use as a sole source of information on ophthalmological emergencies for laypersons.
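The percentages reported above are simple proportions over the 50 graded answers per model. A minimal sketch of how such a tally might be computed; the grading fields below are illustrative stand-ins, not the categories of the study's actual grading manual:

```python
from dataclasses import dataclass

@dataclass
class GradedAnswer:
    # Illustrative grading fields; the study graded answers
    # according to a previously published manual.
    correct_treatment: bool
    contains_wrong_info: bool
    captures_severity: bool
    potentially_harmful: bool

def share(answers, predicate):
    """Share of answers (as a percentage) satisfying a grading predicate."""
    return 100 * sum(predicate(a) for a in answers) / len(answers)

# Four toy graded answers instead of the study's 50 per model.
answers = [
    GradedAnswer(True, False, True, False),
    GradedAnswer(False, True, True, True),
    GradedAnswer(True, False, True, False),
    GradedAnswer(False, True, False, True),
]
print(share(answers, lambda a: a.correct_treatment))  # 50.0
```

Each reported metric (treatment accuracy, share of harmful answers, and so on) reduces to one such predicate evaluated over the same answer set.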
(This article belongs to the Section Ophthalmology)

36 pages, 8767 KB  
Article
AI-Powered Multimodal System for Haiku Appreciation Based on Intelligent Data Analysis: Validation and Cross-Cultural Extension Potential
by Renjie Fan and Yuanyuan Wang
Electronics 2025, 14(24), 4921; https://doi.org/10.3390/electronics14244921 - 15 Dec 2025
Viewed by 357
Abstract
This study proposes an artificial intelligence (AI)-powered multimodal system designed to enhance the appreciation of traditional poetry, using Japanese haiku as the primary application domain. At the core of the system is an intelligent data analysis pipeline that extracts key emotional features from poetic texts. A fine-tuned Japanese BERT model is employed to compute three affective indices—valence, energy, and dynamism—which form a quantitative emotional representation of each haiku. These features guide a generative AI workflow: ChatGPT constructs structured image prompts based on the extracted affective cues and contextual information, and these prompts are used by DALL·E to synthesize stylistically consistent watercolor illustrations. Simultaneously, background music is automatically selected from an open-source collection by matching each poem’s affective vector with that of instrumental tracks, producing a coherent multimodal (text, image, sound) experience. A series of validation experiments demonstrated the reliability and stability of the extracted emotional features, as well as their effectiveness in supporting consistent cross-modal alignment. These results indicate that poetic emotion can be represented within a low-dimensional affective space and used as a bridge across linguistic and artistic modalities. The proposed framework illustrates a novel integration of affective computing and natural language processing (NLP) within cultural computing. Because the underlying emotional representation is language-agnostic, the system holds strong potential for cross-cultural extensions, including applications to Chinese classical poetry and other forms of traditional literature.
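The music-selection step pairs each poem's three-dimensional affective vector (valence, energy, dynamism) with the closest instrumental track. A minimal nearest-neighbour sketch under that reading; the track names, vectors, and the choice of Euclidean distance are illustrative assumptions, as the abstract does not specify the matching metric:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two affective vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_track(poem_vec, tracks):
    """Pick the instrumental track whose affective vector is closest to the poem's."""
    return min(tracks, key=lambda name: euclidean(poem_vec, tracks[name]))

# Illustrative (valence, energy, dynamism) vectors in [0, 1].
tracks = {
    "still_pond.ogg":   (0.6, 0.2, 0.1),
    "summer_storm.ogg": (0.3, 0.9, 0.8),
}
print(match_track((0.7, 0.25, 0.15), tracks))  # still_pond.ogg
```

Because both poems and tracks live in the same low-dimensional affective space, the same matching step would apply unchanged to poetry in other languages.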

28 pages, 3811 KB  
Article
Diagnosing and Mitigating LLM Failures in Recognizing Culturally Specific Korean Names: An Error-Driven Prompting Framework
by Xiaonan Wang, Gyuri Choi, Subin An, Joeun Kang, Seoyoon Park, Hyeji Choi, Jongkyu Lee and Hansaem Kim
Appl. Sci. 2025, 15(24), 12977; https://doi.org/10.3390/app152412977 - 9 Dec 2025
Viewed by 631
Abstract
As large language models (LLMs) improve in understanding and reasoning, they are increasingly used in privacy protection tasks such as de-identification, privacy-sensitive text generation, and entity obfuscation. However, these applications depend on an essential requirement: the accurate identification of personally identifiable information (PII). Compared with template-based PII that follows clear structural patterns, name-related PII depends much more on cultural and pragmatic context, which makes it harder for models to detect and raises higher privacy risks. Although recent studies begin to address this issue, existing work remains limited in language coverage, evaluation granularity, and the depth of error analysis. To address these gaps, this study proposes an error-driven framework that integrates diagnosis and intervention. Specifically, the framework introduces a method called Error-Driven Prompt (EDP), which transforms common failure patterns into executable prompting strategies. It further explores the integration of EDP with general advanced prompting techniques such as Chain-of-Thought (CoT), few-shot learning, and role-playing. In addition, the study constructed K-NameDiag, the first fine-grained evaluation benchmark for Korean name-related PII, which includes twelve culturally sensitive subtypes designed to examine model weaknesses in real-world contexts. The experimental results showed that EDP improved F1-scores in the range of 6 to 9 points across three widely used commercial LLMs, namely Claude Sonnet 4.5, GPT-5, and Gemini 2.5 Pro, while the Combined Enhanced Prompt (CEP), which integrates EDP with advanced prompting strategies, resulted in different shifts in precision and recall rather than consistent improvements. 
Further subtype-level analysis suggests that subtypes reliant on implicit cultural context remain resistant to correction, which shows the limitations of prompt engineering in addressing a model’s lack of internalized cultural knowledge.
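The observation that CEP shifts precision and recall in opposite directions without consistent improvement follows from how F1 combines the two. A small numeric sketch with illustrative values (not the study's reported scores):

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative numbers: a prompt variant can raise precision while
# lowering recall and leave F1 essentially unchanged.
baseline = f1(0.70, 0.70)   # 0.70
shifted  = f1(0.80, 0.62)   # ~0.699
print(round(baseline, 3), round(shifted, 3))
```

This is why the targeted error-driven prompts, which improve both components, move F1 by several points, while the combined strategies mainly trade one component for the other.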

25 pages, 1219 KB  
Article
Chain-of-Thought Prompt Optimization via Adversarial Learning
by Guang Yang, Xiantao Cai, Shaohe Wang and Juhua Liu
Information 2025, 16(12), 1092; https://doi.org/10.3390/info16121092 - 9 Dec 2025
Viewed by 1073
Abstract
Chain-of-Thought (CoT) prompting has demonstrated strong effectiveness in improving the reasoning capabilities of Large Language Models (LLMs). However, existing CoT optimization approaches still lack systematic mechanisms for evaluating and refining prompts. To address this gap, we propose Adversarial Chain-of-Thought (adv-CoT), a framework that introduces adversarial learning into prompt optimization. Adv-CoT iteratively refines an initial prompt through generator–discriminator interactions and integrates both feedback and verification mechanisms. This process enables more targeted and interpretable improvements to CoT instructions and demonstrations. We evaluate adv-CoT on twelve datasets spanning commonsense, factual, symbolic, and arithmetic reasoning, where it yields an average improvement of 4.44% on GPT-3.5-turbo and 1.08% on GPT-4o-mini, with both gains being statistically significant (paired t-test, p < 0.05). The experimental results show that the framework yields consistent but task-dependent gains, particularly on numerical and factual reasoning tasks, and maintains competitive performance on symbolic and commonsense benchmarks. Paired significance tests further indicate that improvements are statistically reliable on high-capacity proprietary models, while results on smaller open-source models exhibit greater variance. Although these findings demonstrate the promise of adversarial refinement for CoT prompting, the conclusions remain preliminary. The effectiveness of adv-CoT depends on the base model’s reasoning capability, and the current evaluation is limited to four major categories of reasoning tasks. We will release the full implementation and prompts to support further investigation into broader applications and more generalizable prompt optimization strategies.
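The generator–discriminator interaction can be pictured as a hill-climbing loop over candidate prompts. A schematic sketch under that reading; the `score` and `revise` callables stand in for the LLM-based discriminator and generator and are purely illustrative, not the paper's implementation:

```python
def adversarial_refine(prompt, score, revise, rounds=3):
    """Keep a revised prompt only when the discriminator's score improves.

    `score` plays the discriminator: it rates how well a candidate
    prompt's CoT instructions perform. `revise` plays the generator:
    it proposes a refined prompt from the current best one.
    """
    best, best_score = prompt, score(prompt)
    for _ in range(rounds):
        candidate = revise(best)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best

# Toy stand-ins: longer prompts score higher until a plateau at 40.
result = adversarial_refine(
    "Think step by step.",
    score=lambda p: min(len(p), 40),
    revise=lambda p: p + " Check each step.",
)
print(result)
```

The feedback and verification mechanisms described above would slot into `revise` and `score` respectively; the loop itself only enforces that accepted revisions are improvements.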

43 pages, 7699 KB  
Review
Unveiling the Algorithm: The Role of Explainable Artificial Intelligence in Modern Surgery
by Sara Lopes, Miguel Mascarenhas, João Fonseca, Maria Gabriela O. Fernandes and Adelino F. Leite-Moreira
Healthcare 2025, 13(24), 3208; https://doi.org/10.3390/healthcare13243208 - 8 Dec 2025
Viewed by 962
Abstract
Artificial Intelligence (AI) is rapidly transforming surgical care by enabling more accurate diagnosis and risk prediction, personalized decision-making, real-time intraoperative support, and postoperative management. Ongoing trends such as multi-task learning, real-time integration, and clinician-centered design suggest AI is maturing into a safe, pragmatic asset in surgical care. Yet significant challenges, such as the complexity and opacity of many AI models (particularly deep learning), lack of transparency, bias, barriers to data sharing, and inequitable deployment, must be overcome to achieve clinical trust, ethical use, and regulatory approval of AI algorithms in healthcare. Explainable Artificial Intelligence (XAI) is an emerging field that plays an important role in bridging the gap between algorithmic power and clinical use as surgery becomes increasingly data-driven. The authors reviewed current applications of XAI in the context of surgery—preoperative risk assessment, surgical planning, intraoperative guidance, and postoperative monitoring—and highlighted the absence of these mechanisms in Generative AI (e.g., ChatGPT). XAI will allow surgeons to interpret, validate, and trust AI tools. XAI applied in surgery is not a luxury: it must be a prerequisite for responsible innovation. Model bias, overfitting, and user interface design are key challenges that need to be addressed and will be explored in this review to achieve the integration of XAI into the surgical field. Unveiling the algorithm is the first step toward safe, accountable, transparent, and human-centered surgical AI.
(This article belongs to the Section Artificial Intelligence in Healthcare)
