Search Results (2,565)

Search Parameters:
Keywords = LLM-105

20 pages, 1516 KB  
Article
Unlikely Storyteller: Leveraging Narrative-Based Communication in LLM-Generated Medical Advice
by Fan Wang, Ningshen Wang, Weiming Xu and Peng Zhang
Healthcare 2026, 14(8), 1015; https://doi.org/10.3390/healthcare14081015 (registering DOI) - 13 Apr 2026
Abstract
Background/Objectives: Time-constrained consultations in high-volume settings can crowd out patient-centered communication, while AI-generated advice may face algorithm aversion when it lacks a humanistic dimension. This study examined whether a brief narrative-based prompt could improve coded patient-facing communication features in an LLM relative to both clinicians and an unprompted model on authentic patient queries. Methods: We conducted a three-condition comparative evaluation using a stratified sample of 1000 de-identified MedDialog-CN consultations (2016–2020). For each consultation, the same patient query was used to generate (i) a zero-shot GPT-o3-mini response and (ii) a narrative-prompted GPT-o3-mini response; the original physician reply served as the human baseline. Responses were annotated with a pre-specified schema operationalizing four communication dimensions—Storytelling, Empathy, Personalization, and Clarity—with expert adjudication. Frequency-based indicators were summarized as mean events per consultation, and binary indicators as proportions; secondary checks captured unwarranted certainty and risk-relevant language. Results: Narrative prompting shifted coded patient-facing communication from sparse and selectively deployed (clinicians and zero-shot AI) to more routine and standardized. Across the reported communication measures, the prompted model showed the most favorable overall pattern, with higher narrative-device use, empathic support, contextual tailoring, and terminology explanation, alongside more frequent consideration of patient preferences and markedly higher rates of emotion–symptom linkage and the presence of a patient-centered narrative framework. Conclusions: Narrative prompting may offer a lightweight and potentially scalable strategy for improving patient-facing communication in Chinese asynchronous, text-based online consultations. An important next step is calibration: humanistic cues should be delivered selectively and safely so that responses remain credible, locally feasible, and cognitively manageable.
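The narrative prompt itself is not reproduced in the abstract; the sketch below only illustrates how the zero-shot and narrative-prompted conditions could be constructed for the same patient query. The instruction wording is a hypothetical placeholder, and the actual model call is omitted.

```python
# Sketch of the two prompting conditions compared in the study.
# The exact narrative prompt wording is not given in the abstract;
# the system instructions below are hypothetical placeholders.

def build_messages(patient_query: str, narrative: bool) -> list[dict]:
    """Return a chat-style message list for a zero-shot or narrative-prompted run."""
    if narrative:
        system = (
            "You are a physician answering an online consultation. "
            "Frame your advice as a short, patient-centered narrative: "  # hypothetical wording
            "acknowledge the patient's feelings, relate symptoms to their daily life, "
            "and explain any medical terms in plain language."
        )
    else:
        system = "You are a physician answering an online consultation."  # zero-shot baseline
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": patient_query},
    ]

if __name__ == "__main__":
    query = "I have had chest tightness at night for two weeks. Should I be worried?"
    for condition in (False, True):
        msgs = build_messages(query, narrative=condition)
        # Both conditions would be sent to the same model (GPT-o3-mini in the paper);
        # the actual API call is omitted here.
        print("narrative" if condition else "zero-shot", msgs[0]["content"], sep="\n", end="\n\n")
```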
(This article belongs to the Special Issue Artificial Intelligence in Healthcare: Opportunities and Challenges)

22 pages, 3734 KB  
Article
CLEAR: A Cognitive LLM-Empowered Adaptive Restoration Framework for Robust Ship Detection in Complex Maritime Scenarios
by Min Li, Xinyu Zhao and Yunfeng Wan
Remote Sens. 2026, 18(8), 1142; https://doi.org/10.3390/rs18081142 (registering DOI) - 12 Apr 2026
Abstract
Ship detection in remote sensing imagery serves as a cornerstone of modern maritime surveillance. Existing visible light detectors suffer from severe performance degradation in adverse environmental conditions (e.g., fog, low light) due to domain gaps. Traditional global enhancement methods often lack adaptability, leading to “negative transfer”, where artifacts are introduced into clean images or the enhancement is mismatched with the degradation type. To address these challenges, we propose the CLEAR (Cognitive Large Language Model (LLM)-Empowered Adaptive Restoration) framework. Inspired by the dual-process theory of cognition, we introduce a dynamic switching mechanism between fast perception and deep reasoning. Rather than processing all images indiscriminately, CLEAR uses a hybrid gating mechanism to efficiently filter nominal samples, triggering a Vision–Language Model (VLM) only when necessary to diagnose degradation and dispatch targeted restoration operators. Extensive experiments on the constructed HRSC-Robust dataset demonstrate that CLEAR achieves an overall mean Average Precision (mAP) at 0.5 Intersection-over-Union (IoU) of 86.92%, outperforming the baseline by 7.74%. Notably, it establishes a “fail-safe” mechanism for optical degradations. By adaptively resolving fog and low light, it effectively mitigates detector blindness, exemplified by a doubled Recall rate (52.52%) in dark scenarios. Furthermore, a confidence-based sparse triggering strategy ensures operational efficiency, maintaining a throughput of ~11.8 FPS in nominal conditions. This work validates the potential of VLMs for interpretable and robust remote sensing tasks.
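The hybrid gating idea described in the abstract (fast detection on nominal images, VLM-based diagnosis and restoration only on low-confidence ones) can be sketched roughly as follows. The detector, VLM diagnosis function, restoration operators, and confidence threshold are all stand-ins, not the paper's implementation.

```python
# Minimal sketch of the confidence-gated "fast perception vs. deep reasoning" loop
# described for CLEAR. Detector, VLM diagnosis, and restoration operators are
# stand-in callables; thresholds and operator names are assumptions.

from typing import Callable

RESTORATION_OPERATORS: dict[str, Callable] = {
    "fog": lambda img: img,        # placeholder dehazing operator
    "low_light": lambda img: img,  # placeholder low-light enhancement
    "none": lambda img: img,       # pass-through for clean images
}

def detect_ships(image, detector) -> tuple[list, float]:
    """Run the fast detector; return boxes and a mean confidence score."""
    boxes, scores = detector(image)
    mean_conf = sum(scores) / len(scores) if scores else 0.0
    return boxes, mean_conf

def clear_pipeline(image, detector, vlm_diagnose, conf_threshold: float = 0.5):
    """Sparse triggering: only low-confidence images reach the VLM."""
    boxes, conf = detect_ships(image, detector)
    if conf >= conf_threshold:
        return boxes  # nominal sample, no restoration needed
    degradation = vlm_diagnose(image)          # e.g. returns "fog" or "low_light"
    restore = RESTORATION_OPERATORS.get(degradation, RESTORATION_OPERATORS["none"])
    boxes, _ = detect_ships(restore(image), detector)
    return boxes

if __name__ == "__main__":
    dummy_detector = lambda img: ([(10, 10, 40, 30)], [0.35])  # one low-confidence box
    dummy_vlm = lambda img: "fog"                              # pretend diagnosis
    print(clear_pipeline("image.png", dummy_detector, dummy_vlm))
```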

20 pages, 5504 KB  
Article
A Large Language Model for Traffic Flow Prediction Based on Stationary Wavelet Transform and Graph Convolutional Networks
by Xin Wang, Gang Liu, Jing He, Xiangbing Zhou and Zhiyong Luo
ISPRS Int. J. Geo-Inf. 2026, 15(4), 166; https://doi.org/10.3390/ijgi15040166 (registering DOI) - 11 Apr 2026
Abstract
With the rapid development of Intelligent Transportation Systems (ITSs), traffic prediction, a crucial component of ITSs, has garnered growing scholarly attention. The application of deep learning to traffic prediction has emerged as a prominent research direction, especially amid the rapid advancement of pretrained large language models (LLMs), which offer substantial benefits in time-series analysis through cross-modal knowledge transfer. In response to this advancement, this study introduces an innovative model for traffic flow prediction, designated WGLLM. To capture the spatiotemporal characteristics inherent in traffic flow data, the model incorporates a sequence embedding layer constructed on the stationary wavelet transform (SWT) and long short-term memory (LSTM), in conjunction with a spatial embedding layer founded on graph convolutional networks (GCNs). Additionally, a fully connected layer is utilized to integrate the embeddings into the LLM for comprehensive global dependency analysis. To verify the effectiveness of the proposed approach, experiments were carried out on two real traffic flow datasets. The experimental results demonstrate that WGLLM achieves superior predictive performance compared to multiple mainstream baseline models, with a significant improvement in prediction accuracy.
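A minimal sketch of the two embedding layers named in the abstract, under assumed dimensions: SWT coefficients feed an LSTM for the temporal embedding, and one symmetric-normalized graph-convolution step provides the spatial embedding. The wavelet choice, decomposition level, and layer sizes are assumptions, and the downstream LLM backbone that consumes the fused embeddings is omitted.

```python
# Sketch of WGLLM's embedding layers as described in the abstract: a temporal
# embedding from a stationary wavelet transform (SWT) followed by an LSTM, and a
# spatial embedding from a single graph-convolution step. Dimensions and the
# wavelet are assumptions; the LLM backbone is omitted.

import numpy as np
import pywt
import torch
import torch.nn as nn

def swt_features(series: np.ndarray, wavelet: str = "db1", level: int = 2) -> np.ndarray:
    """Stack SWT approximation/detail coefficients as channels (length must be divisible by 2**level)."""
    coeffs = pywt.swt(series, wavelet, level=level)                  # list of (cA, cD) pairs
    return np.stack([c for pair in coeffs for c in pair], axis=-1)   # (T, 2*level)

class TemporalEmbedding(nn.Module):
    def __init__(self, in_channels: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(in_channels, hidden, batch_first=True)

    def forward(self, x):                      # x: (batch, T, channels)
        _, (h, _) = self.lstm(x)
        return h[-1]                           # (batch, hidden)

def gcn_embedding(node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """One symmetric-normalized graph convolution step over the road-sensor graph."""
    a_hat = adj + torch.eye(adj.size(0))
    d_inv_sqrt = torch.diag(a_hat.sum(1).pow(-0.5))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt @ node_feats

if __name__ == "__main__":
    flow = np.random.rand(64).astype(np.float32)                       # one sensor's flow series
    feats = torch.tensor(swt_features(flow), dtype=torch.float32).unsqueeze(0)  # (1, 64, 4)
    temporal = TemporalEmbedding(in_channels=feats.shape[-1])(feats)
    adj = torch.ones(3, 3)                                             # toy 3-sensor graph
    spatial = gcn_embedding(torch.rand(3, 8), adj)
    print(temporal.shape, spatial.shape)       # fused downstream by a fully connected layer
```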

18 pages, 439 KB  
Article
Understanding and Predicting Tourist Behavior Through Large Language Models
by Anna Dalla Vecchia, Simone Mattioli, Sara Migliorini and Elisa Quintarelli
Big Data Cogn. Comput. 2026, 10(4), 117; https://doi.org/10.3390/bdcc10040117 (registering DOI) - 11 Apr 2026
Abstract
Understanding and predicting how tourists move through a city is a challenging task, as it involves a complex interplay of spatial, temporal, and social factors. Traditional recommender systems often rely on structured data in trying to capture the nature of the problem. However, recent advances in Large Language Models (LLMs) open new possibilities for reasoning over richer, text-based representations of user context, even without a dedicated pre-training phase. In this study, we investigate the potential of LLMs to interpret and predict tourist movements in a real-world application scenario involving tourist visits to Verona, a municipality in Northern Italy, between 2014 and 2023. We propose an incremental prompt engineering approach that gradually enriches the model input, from spatial features alone to richer behavioral information, including visit histories, time information, and user cluster patterns. The approach is evaluated using six open-source models, enabling us to compare their accuracy and efficiency across various levels of contextual enrichment. The results provide a first insight into the abilities of LLMs to incorporate spatio-temporal contextual factors, thus improving predictions while maintaining computational efficiency. The analysis of the model-generated explanations completes the picture by adding an interpretability dimension that most existing next-PoI prediction solutions lack. Overall, the study demonstrates the potential of LLMs to integrate multiple contextual dimensions in tourism mobility, highlighting the possibility of a more text-oriented, adaptive, and explainable T-RS.
(This article belongs to the Section Large Language Models and Embodied Intelligence)

15 pages, 392 KB  
Article
Random Forest Predicts Human Ratings of Creative Stories Using Very Small Training Samples
by Baptiste Barbot and Thomas Calogero Kiekens
Behav. Sci. 2026, 16(4), 576; https://doi.org/10.3390/bs16040576 (registering DOI) - 11 Apr 2026
Abstract
The Consensual Assessment Technique (CAT) is a gold standard of creativity assessment which provides valid product-based creativity scores that are contextually grounded (stemming from raters with unique expertise, culturally and historically situated). However, its implementation is often demanding (raters’ burden, complex rating designs). This study investigates whether machine learning can effectively simulate expert-panel judgments of creativity using minimal training data. Using a dataset of 411 short stories, we compared the performance of Random Forest (RF), Gradient Boosted Trees, and Decision Tree models, based on story length and Divergent Semantic Integration, to predict expert CAT ratings by (1) identifying the optimal algorithm and (2) determining the minimum training sample size required for reliable prediction. Results indicate that RF consistently outperformed the other algorithms, achieving high correlations with CAT scores (r = 0.80) using as few as 25 training stories. Furthermore, RF demonstrated superior accuracy and lower reliance on story length compared to LLM-based scoring models. These findings provide a robust proof of concept for using simulated expert panels as a scalable alternative to (decontextualized) automated assessment methods, while reducing human raters’ burden and the logistical constraints of complex rating designs. Extending this work to different contexts, creativity tasks, and domains is necessary to gauge its generalizability.
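A small sketch of the reported setup, on synthetic placeholder data: a Random Forest fit on only 25 stories, using story length and DSI as predictors, evaluated by the Pearson correlation between predicted and held-out ratings. The toy rating-generating rule below is invented solely to make the snippet runnable; the real study used 411 human-rated short stories.

```python
# Sketch of the core result: a Random Forest trained on only 25 rated stories,
# with story length and Divergent Semantic Integration (DSI) as predictors of
# expert CAT ratings. All data here are synthetic placeholders.

import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 411
length = rng.normal(300, 80, n)               # story length (words), synthetic
dsi = rng.normal(0.8, 0.05, n)                # DSI score, synthetic
cat = 0.2 * (length - 300) / 80 + 8 * (dsi - 0.8) + rng.normal(0, 0.3, n)  # toy CAT rating

X = np.column_stack([length, dsi])
train, test = slice(0, 25), slice(25, None)   # 25 training stories, rest held out

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X[train], cat[train])
r, _ = pearsonr(rf.predict(X[test]), cat[test])
print(f"Pearson r with held-out ratings: {r:.2f}")   # the paper reports r = 0.80 on real data
```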
(This article belongs to the Section Cognition)

22 pages, 1449 KB  
Article
On the Vulnerability of Citation Metrics in the Era of Generative Artificial Intelligence
by Kay Smarsly
Publications 2026, 14(2), 23; https://doi.org/10.3390/publications14020023 (registering DOI) - 11 Apr 2026
Abstract
Large language model (LLM) chatbots, as a widely used form of generative artificial intelligence, have reduced the marginal cost of producing publication-style manuscripts and have expanded feasible routes for manipulating citation metrics within the publishing ecosystem. Citation-based indicators (e.g., the h-index, the i10-index, and total citation counts) remain embedded in research evaluation and are sensitive to indexing practices of bibliographic databases, with Google Scholar providing broad coverage combined with comparatively limited curation. In this study, a systematic literature review is conducted to synthesize reported mechanisms of citation-metric manipulation and to examine limitations of citation-metric use, including evidence reported in civil engineering. A Google Scholar proof-of-concept case study examines whether the indexing of LLM-assisted, non-peer-reviewed documents with concentrated references to a target author is associated with changes in author-level citation metrics under platform-specific conditions. After indexing, a stepwise increase in author-level metrics is observed, demonstrating the feasibility of citation-metric manipulation under the platform-specific conditions. Finally, this paper discusses the implications for research integrity and citation manipulation in the era of generative artificial intelligence. It also presents recommendations for researchers, academic institutions and evaluation committees, publishers and editors, bibliographic database providers, and funding institutions and policymakers.
(This article belongs to the Special Issue AI in Academic Metrics and Impact Analysis)

6 pages, 450 KB  
Proceeding Paper
Class Entity Identification Based on Large Language Models: A Choice Between Classification and Generation
by Eric Jui-Lin Lu and Cheng-Hao Yang
Eng. Proc. 2026, 134(1), 42; https://doi.org/10.3390/engproc2026134042 (registering DOI) - 10 Apr 2026
Abstract
Large language models (LLMs) have been widely applied to knowledge graph question answering (KGQA) systems. Recent Text-to-SPARQL studies have demonstrated that generation performance can achieve an F1 score exceeding 90%. Further error analysis has categorized common errors into entity translation errors, entity position errors, and resource description framework (RDF) triple-count errors, with the latter accounting for 24% of all errors. Notably, nearly 90% of RDF triple-count errors occur when the triples involve class entities. Previous research has shown that incorporating prompts can effectively enhance model performance. Building on these findings, we predict whether a question contains a class entity and how many RDF triples the corresponding query requires, supplying this precise task-related information to the LLM through prompt design to reduce RDF triple-count errors. Since both prediction tasks are classification-oriented, two implementation paradigms were established and compared: traditional classification architectures and generative modeling. For the classification-based architectures, we employed Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimized BERT Approach (RoBERTa) to obtain question embeddings for classification. For the generative approach, we adopted the Instruction-Tuned Text-to-Text Transfer Transformer (Flan-T5). Experimental results show that the generative model slightly outperforms the conventional classification architectures, indicating that generative approaches can achieve higher prediction accuracy and provide more reliable information without the need for additional complex encoder designs, thereby improving the overall quality of Text-to-SPARQL generation.

31 pages, 3673 KB  
Article
Unveiling Systemic Risks in Sustainable Safety Management: Integrating BERTopic, LLM, and SNA for Accident Text Mining
by Lanjing Wang, Rui Huang, Yige Chen, Yunxiang Yang, Jing Zhan and Haiyuan Gong
Sustainability 2026, 18(8), 3787; https://doi.org/10.3390/su18083787 - 10 Apr 2026
Abstract
To unveil the underlying risk structures in complex industrial systems, this paper proposes a hybrid analytical framework that integrates BERTopic modeling, a large language model (LLM), and social network analysis (SNA). This framework aims to extract systemic safety intelligence from unstructured accident reports. It first employs BERTopic to identify latent causal topics based on 745 Chinese accident investigation reports and utilizes DeepSeek-V3.1 (LLM) for semantic refinement and causal mapping of these topics. Subsequently, a semantic network of causal keywords based on positive pointwise mutual information (PPMI) is constructed, and its topological structure is analyzed using SNA methods. The study identifies and analyzes five major risk communities: confined spaces, fire, mining, construction, and road traffic. It reveals that accident causation exhibits the small-world characteristics of multi-factor coupling and non-linearity, with core risk nodes concentrated in systemic inducements such as organizational management and compliance deficiencies. The results demonstrate that this framework effectively identifies the latent systemic risk patterns embedded within the texts, providing methodological support for developing sustainable safety management mechanisms based on design for safety.
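The PPMI network step can be illustrated with a toy example: count keyword co-occurrences within reports, keep only edges with positive pointwise mutual information, and read off centrality as a proxy for core risk nodes. The keyword lists and report contents below are invented; only the construction mirrors the described pipeline.

```python
# Sketch of the PPMI-based semantic network: compute positive pointwise mutual
# information between causal keywords co-occurring in the same report, then
# analyze the weighted graph with standard SNA measures. Keywords are illustrative;
# the study extracted them from 745 accident investigation reports.

import math
from collections import Counter
from itertools import combinations
import networkx as nx

reports = [
    ["confined_space", "ventilation_failure", "toxic_gas"],
    ["confined_space", "toxic_gas", "missing_permit"],
    ["fire", "missing_permit", "ventilation_failure"],
]

word_count = Counter(w for r in reports for w in set(r))
pair_count = Counter(frozenset(p) for r in reports for p in combinations(set(r), 2))
n_reports = len(reports)

G = nx.Graph()
for pair, c in pair_count.items():
    a, b = tuple(pair)
    pmi = math.log((c / n_reports) / ((word_count[a] / n_reports) * (word_count[b] / n_reports)))
    if pmi > 0:                                   # keep only positive PMI (PPMI)
        G.add_edge(a, b, weight=pmi)

centrality = nx.degree_centrality(G)              # core risk nodes have high centrality
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```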
(This article belongs to the Special Issue Achieving Sustainability in Safety Management and Design for Safety)
39 pages, 5852 KB  
Article
SAPIENT: A Multi-Agent Framework for Corporate Reputation Intelligence Through Sentinel Monitoring and LLM-Based Synthetic Population Simulation
by Alper Ozpinar and Saha Baygul Ozpinar
Systems 2026, 14(4), 425; https://doi.org/10.3390/systems14040425 - 10 Apr 2026
Abstract
Corporate reputation teams rely on media monitoring and qualitative research, both limited in speed and coverage when digital narratives form rapidly. This paper proposes SAPIENT (Sentinel-Augmented Population Intelligence for Emerging Narrative Tracking), a multi-agent system that links a sentinel layer over public text streams with a simulation layer that runs moderated, repeatable in silico focus-group sessions. The sentinel layer ingests social media, news, and forum text to produce a compact signal state (topics, sentiment, anomaly scores, risk labels), which conditions the simulation layer through an orchestrator. Persona agents and a moderator follow an Agentic Focus Group (AFG) protocol with repeated runs, variance reporting, and human review gates. We describe four sustainability communication scenarios: greenwashing backlash prediction, greenhushing risk assessment, campaign pre-testing, and crisis communication simulation. Nine experiments span 280 AFG runs across 20 conditions, three LLM backends (Claude Sonnet 4, GPT-4o, and Gemini 2.5 Flash), and a preregistered pilot human validation study with 54 participants. Signal conditioning improved simulation specificity (p=0.012). Cross-lingual sessions revealed a sentiment asymmetry between English and Turkish (p=0.001) with preserved persona rank ordering (r=0.81, p=0.015). Cross-model comparison showed consistent persona differentiation across all three backends (Pearson r>0.92, p<0.002 for all pairs). Sentiment was robust to prompt paraphrasing (p=0.061, n.s.), though credibility was sensitive to prompt wording (p<0.001). All significant results from Experiments 1–8 survived Benjamini–Hochberg correction. A preregistered pilot with 54 human participants on Prolific replicated the predicted credibility ranking across framing variants (p=0.004) but not the sentiment ranking, identifying a specific calibration target for future work.
(This article belongs to the Section Artificial Intelligence and Digital Systems Engineering)
29 pages, 2439 KB  
Review
Agentic and LLM-Based Multimodal Anomaly Detection: Architectures, Challenges, and Prospects
by Mohammed Ayalew Belay, Amirshayan Haghipour, Adil Rasheed and Pierluigi Salvo Rossi
Sensors 2026, 26(8), 2330; https://doi.org/10.3390/s26082330 - 9 Apr 2026
Abstract
Anomaly detection is crucial in maintaining the safety, reliability, and optimal performance of complex systems across diverse domains, such as industrial manufacturing, cybersecurity, and autonomous systems. While conventional methods typically handle single data modalities, multimodal detection has recently seen increasing application in dynamic real-world environments. This paper presents a comprehensive review of recent research at the intersection of agentic artificial intelligence and large language model (LLM)-based multimodal anomaly detection. We systematically analyze and categorize existing studies based on the agent architecture, reasoning capabilities, tool integration, and modality scope. The main contribution of this work is a novel taxonomy that unifies agentic and multimodal anomaly detection methods, alongside benchmark datasets, evaluation methods, key challenges, and mitigation strategies. Furthermore, we identify major open issues, including data alignment, scalability, reliability, explainability, and evaluation standardization. Finally, we outline future research directions, with a particular emphasis on trustworthy autonomous agents, efficient multimodal fusion, human-in-the-loop systems, and real-world deployment in safety-critical applications.
(This article belongs to the Special Issue Intelligent Sensors for Security and Attack Detection)
27 pages, 3278 KB  
Article
Multimodal PPG-Based Arrhythmia Detection Using a CLIP-Initialized Multi-Task U-Net and LLM-Assisted Reporting
by Youngho Huh, Minhwan Noh, Dongwoo Ji, Yuna Oh and Sukkyu Sun
Sensors 2026, 26(8), 2316; https://doi.org/10.3390/s26082316 - 9 Apr 2026
Abstract
Photoplethysmography (PPG) has emerged as an attractive modality for non-invasive cardiovascular monitoring due to its low cost, unobtrusive nature, and ubiquity in consumer wearable devices. Despite its potential, existing PPG-based arrhythmia detection systems remain limited in scope: (i) most target only atrial fibrillation, (ii) temporal localization of abnormal segments is rarely provided, and (iii) deep learning models lack explainability, hindering adoption in clinical workflows. We present a comprehensive and fully integrated framework for multi-class arrhythmia detection, segmentation, and explainability based on PPG waveforms, Heart Rate Variability (HRV), and structured clinical metadata. The proposed system introduces a CLIP-style contrastive learning module aligning PPG waveforms with clinical variables and rhythm-state textual descriptions using BioBERT; a multitask U-Net architecture performing 4-class classification and 1D segmentation; a Retrieval-Augmented Generation (RAG) pipeline leveraging Gemini Flash large language models to produce guideline-grounded diagnostic reports; and a real-time Streamlit-based web platform supporting inference, visualization, and database storage. The system significantly improves classification accuracy (from 86.27% to 91.19%) and segmentation Dice (from 0.5815 to 0.7167). These results demonstrate the feasibility of a robust, multimodal, and explainable PPG-based arrhythmia monitoring system for real-world applications.
(This article belongs to the Section Wearables)

31 pages, 380 KB  
Article
Hybrid Approach to Patient Review Classification at Scale: From Expert Annotations to Production-Ready Machine Learning Models for Sustainable Healthcare
by Irina Evgenievna Kalabikhina, Anton Vasilyevich Kolotusha and Vadim Sergeevich Moshkin
Big Data Cogn. Comput. 2026, 10(4), 114; https://doi.org/10.3390/bdcc10040114 - 9 Apr 2026
Abstract
Patients leave millions of medical reviews annually, providing critical data for quality management. However, manual processing is infeasible, and existing systems fail to distinguish medical from organizational problems—a distinction essential for complaint routing. The consequences of misrouting are significant: clinical issues may go unaddressed when medical complaints reach administrative staff, while systemic service problems remain unresolved when organizational complaints reach medical directors. We developed a hybrid approach combining expert annotation with Large Language Models (LLMs). Fifteen prompt iterations on 1500 reviews with expert validation (modified Cohen’s kappa (κ_mod), which weights errors hierarchically, reached 0.745) preceded the LLM annotation of 15,000 mixed-sentiment and positive reviews. These were combined with 7417 expert-annotated negative reviews to form a corpus of 22,417 reviews. Eight architectures, ranging from Logistic Regression to a BERT + TF-IDF + LightGBM ensemble, were compared using both standard metrics and domain-specific practical metrics tailored to complaint routing. The best model, scaled to 4.3 million Russian-language reviews from the Prodoctorov.ru platform, achieved 92.9% Practical Accuracy—the proportion of reviews classified without critical medical–organizational misclassification errors (M ↔ O)—compared to 68.0% standard accuracy, which treats all errors equally. Critical errors were reduced to 1.4%, yielding 144,000 more correctly processed complaints than traditional methods (TF-IDF + Logistic Regression). Analysis of the scaled data revealed the following: 46.1% M (medical), 21.0% O (organizational), and 32.9% C (combined) reviews; medical ratings were highest (4.75 vs. 4.59 for organizational, p < 0.001); combined reviews were longest (802 characters); zero-star reviews comprised 3.8% of feedback, with organizational complaints dominating (38.2%) among extreme negatives; and average ratings rose by 1.24 points over 14 years. This hybrid approach yields expert-comparable corpora, automates 93% of feedback processing, ensures correct complaint routing, and contributes to healthcare sustainability by reducing administrative burden, accelerating resolution, and enabling data-driven quality management without proportional increases in human resources. All analyses were conducted on Russian-language patient reviews.
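The abstract defines Practical Accuracy as the share of reviews classified without a critical medical/organizational (M ↔ O) misclassification. A minimal sketch of that metric, on invented toy labels:

```python
# Sketch of the paper's "Practical Accuracy" idea: a review counts as correctly
# processed unless it suffers a critical medical <-> organizational (M <-> O)
# misclassification. The labels below are placeholders, not study data.

def practical_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Share of reviews without a critical M<->O routing error (C confusions are non-critical)."""
    critical = {("M", "O"), ("O", "M")}
    ok = sum((t, p) not in critical for t, p in zip(y_true, y_pred))
    return ok / len(y_true)

if __name__ == "__main__":
    y_true = ["M", "O", "C", "M", "O", "C"]
    y_pred = ["M", "C", "M", "O", "O", "C"]   # one critical error: M predicted as O
    print(f"standard accuracy:  {sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true):.2f}")
    print(f"practical accuracy: {practical_accuracy(y_true, y_pred):.2f}")
```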
25 pages, 1844 KB  
Article
Retrieval-Augmented Large Language Model-Based Framework for Hierarchical Classification of Public Feedback on Transportation Infrastructure
by Milan Knezevic, Trevor Neece, Marko Vukojevic, Lev Khazanovich and Aleksandar Stevanovic
Appl. Sci. 2026, 16(8), 3663; https://doi.org/10.3390/app16083663 - 9 Apr 2026
Abstract
Transportation agencies receive large volumes of free-form public comments describing infrastructure conditions, safety concerns, and service issues. These comments are often processed manually for downstream operational actions, which is time-consuming, inconsistent across reviewers, and difficult to scale, thereby limiting their value for operational decision-making. This study presents a machine learning and Large Language Model (LLM) framework for automated triage of free-form public comments, assigning each report to a three-level hierarchical taxonomy consisting of Category, Subcategory, and Final Decision. The proposed framework uses agency historical data together with retrieval-based evidence, where semantically similar past comments are provided to the LLM as contextual support to better align predictions with agency-specific labeling practices. The framework was evaluated using TF-IDF with Logistic Regression, TF-IDF with Linear SVM, embedding-based kNN with cosine similarity, few-shot LLM prompting, and retrieval-based LLM prompting. Results show that retrieval-based prompting achieved the best overall performance, with the highest accuracy at both the Category and Subcategory levels. At the Final Decision level, retrieval-based prompting slightly outperformed kNN, while few-shot prompting performed worse. Error analysis showed that many misclassifications were semantically plausible alternatives, reflecting the overlap across infrastructure-related complaint categories. Allowing a second candidate label further improved performance. Latency analysis also indicated that the framework can process more than 2000 comments in under 30 min, supporting faster and more consistent agency workflows.
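The retrieval-based prompting step can be sketched as follows: retrieve the k most similar labeled historical comments and place them in the prompt as contextual evidence before the new comment. A TF-IDF retriever stands in for the embedding-based retrieval described in the abstract, and the comments and taxonomy labels are invented.

```python
# Sketch of retrieval-based prompting: semantically similar past comments and
# their agency-assigned labels are retrieved and inserted into the prompt.
# A TF-IDF retriever is a stand-in; the history and labels are invented.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = [
    ("Large pothole on the ramp to the interstate", "Roadway / Pothole / Forward to maintenance"),
    ("Signal at 5th and Main stays red too long", "Traffic signal / Timing / Forward to operations"),
    ("Guardrail damaged after last week's crash", "Roadside / Guardrail / Forward to maintenance"),
]

def build_prompt(new_comment: str, k: int = 2) -> str:
    texts = [c for c, _ in history]
    vec = TfidfVectorizer().fit(texts + [new_comment])
    sims = cosine_similarity(vec.transform([new_comment]), vec.transform(texts))[0]
    top = sims.argsort()[::-1][:k]                       # indices of the k most similar comments
    evidence = "\n".join(f'- "{history[i][0]}" -> {history[i][1]}' for i in top)
    return (
        "Classify the comment into Category / Subcategory / Final Decision.\n"
        f"Similar past comments and their labels:\n{evidence}\n"
        f'Comment: "{new_comment}"\nAnswer:'
    )

print(build_prompt("Deep pothole near the bridge on Route 30"))   # prompt is then sent to the LLM
```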
(This article belongs to the Special Issue Intelligent Transportation and Mobility Analytics)

7 pages, 707 KB  
Proceeding Paper
Enhancing Text-to-SPARQL Generation via In-Context Learning with Example Selection Strategies
by Eric Jui-Lin Lu and Zi-Ting Su
Eng. Proc. 2026, 134(1), 36; https://doi.org/10.3390/engproc2026134036 - 9 Apr 2026
Abstract
Large language models demonstrate strong in-context learning (ICL) capabilities, allowing them to perform diverse tasks without fine-tuning. In knowledge graph question answering (KGQA), natural language questions are translated into SPARQL queries. Existing ICL approaches mainly rely on semantic similarity, often neglecting structural features. To address this limitation, we developed a structure-aware example selection strategy that integrates both semantic and structural patterns by abstracting Resource Description Framework (RDF) triples. We compare four strategies: (1) fully random, (2) semantic similarity, (3) same-type random, and (4) same-type semantic similarity. Experiments on LC-QuAD 1.0 using FLAN-T5 show that in non-fine-tuned settings, structure-aware semantic selection achieves the best results, highlighting the importance of structural congruence, while after fine-tuning, differences between strategies converge but diversity and semantic relevance remain beneficial. These findings demonstrate the critical role of example quality in ICL and provide empirical insights for KGQA design.
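A toy sketch of the same-type (structure-aware) semantic selection idea: abstract each candidate to a structural signature, here the query form and triple count, keep only candidates matching the target signature, and rank them by similarity to the new question. The signature definition, the difflib similarity stand-in, and the examples are simplifications, not the paper's implementation.

```python
# Sketch of "same-type semantic similarity" example selection: candidates are
# abstracted to a structural signature (query form, number of RDF triple
# patterns) and, within the matching type, ranked by similarity to the question.
# Examples and the similarity measure are placeholders.

from difflib import SequenceMatcher

TRAIN = [
    {"q": "Who wrote The Hobbit?",                                  "form": "SELECT", "n_triples": 1},
    {"q": "How many films did Kubrick direct?",                     "form": "COUNT",  "n_triples": 1},
    {"q": "Which rivers flow through cities founded before 1200?",  "form": "SELECT", "n_triples": 2},
    {"q": "Who painted works exhibited in the Louvre?",             "form": "SELECT", "n_triples": 2},
]

def select_examples(question: str, form: str, n_triples: int, k: int = 2) -> list[dict]:
    """Keep same-signature candidates, then rank by similarity to the question."""
    same_type = [ex for ex in TRAIN if ex["form"] == form and ex["n_triples"] == n_triples]
    same_type.sort(key=lambda ex: SequenceMatcher(None, question, ex["q"]).ratio(), reverse=True)
    return same_type[:k]

# Selected examples would be prepended to the prompt before asking the LLM
# to generate the SPARQL query for the new question.
print(select_examples("Which lakes lie in countries bordering France?", "SELECT", 2))
```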

15 pages, 719 KB  
Article
Efficacy of Large Language Models for Screening of Systematic Reviews on Periprosthetic Joint Infection
by Woojin Shin, Jaeyoung Hong, Sunwoo Lee, Seongchan Park, Hyoungtae Kim and Suenghwan Jo
J. Clin. Med. 2026, 15(8), 2830; https://doi.org/10.3390/jcm15082830 - 8 Apr 2026
Abstract
Background: Periprosthetic joint infection (PJI) remains a devastating complication following arthroplasty. Systematic reviews of PJI provide essential evidence to inform clinical practice; however, the screening process remains labor-intensive. Recent advancements in large language models (LLMs) offer potential for automating literature screening, though evaluation of current generation models is needed. Methods: This validation study evaluated GPT-5, GPT-5 Pro, and Gemini 2.5 Pro in replicating the title/abstract and full-text screening stages of a published systematic review on intraosseous versus intravenous antibiotic prophylaxis in total joint arthroplasty. Title/abstract screening was performed on 165 articles, followed by a full-text eligibility assessment of 26 articles. Accuracy, sensitivity, specificity, and Cohen’s kappa (κ) were calculated against human screening decisions as the gold standard. Results: In title/abstract screening, GPT-5 Pro achieved the highest accuracy (92.1–92.7%) and specificity (98.6–99.3%), while GPT-5 demonstrated the highest sensitivity (84.6–96.1%). In full-text screening, Gemini 2.5 Pro showed the most consistent performance across repeated evaluations (κ = 0.839 in both trials), whereas GPT-5 Pro exhibited marked intra-model variability (κ = 0.399 to 0.920). Conclusions: Current-generation LLMs achieve near-human accuracy in systematic review screening for PJI research, though substantial intra-model variability underscores the continued need for human oversight in systematic review workflows.
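The agreement metrics reported here (accuracy, sensitivity, specificity, Cohen's kappa against human decisions as the gold standard) can be computed as in the short sketch below, shown on invented include/exclude vectors rather than the study's data.

```python
# Sketch of the screening-agreement metrics, computed against human decisions
# as the gold standard. The include/exclude vectors are toy placeholders.

from sklearn.metrics import cohen_kappa_score, confusion_matrix

human = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # 1 = include, 0 = exclude (gold standard)
llm   = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]   # one missed inclusion and one false inclusion

tn, fp, fn, tp = confusion_matrix(human, llm).ravel()
accuracy    = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)              # share of truly relevant articles retained
specificity = tn / (tn + fp)              # share of irrelevant articles correctly excluded
kappa       = cohen_kappa_score(human, llm)

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} kappa={kappa:.2f}")
```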
(This article belongs to the Section Orthopedics)
