Search Results (340)

Search Parameters:
Keywords = LLaMA

15 pages, 740 KB  
Article
A Scalable and Low-Cost Mobile RAG Architecture for AI-Augmented Learning in Higher Education
by Rodolfo Bojorque, Andrea Plaza, Pilar Morquecho and Fernando Moscoso
Appl. Sci. 2026, 16(2), 963; https://doi.org/10.3390/app16020963 (registering DOI) - 17 Jan 2026
Abstract
This paper presents a scalable and low-cost Retrieval-Augmented Generation (RAG) architecture designed to enhance learning in university-level courses, with a particular focus on supporting students from economically disadvantaged backgrounds. Recent advances in large language models (LLMs) have demonstrated considerable potential in educational contexts; however, their adoption is often limited by computational costs and the need for stable broadband access, issues that disproportionately affect low-income learners. To address this challenge, we propose a lightweight, mobile-friendly RAG system that integrates the LLaMA language model with the Milvus vector database, enabling efficient on-device retrieval and context-grounded generation using only modest hardware resources. The system was implemented in a university-level Data Mining course and evaluated over four semesters using a quasi-experimental design with randomized assignment to experimental and control groups. Students in the experimental group had voluntary access to the RAG assistant, while the control group followed the same instructional schedule without exposure to the tool. The results show statistically significant improvements in academic performance for the experimental group, with p < 0.01 in the first semester and p < 0.001 in the subsequent three semesters. Effect sizes, measured using Hedges' g to account for small cohort sizes, increased from 0.56 (moderate) to 1.52 (extremely large), demonstrating a clear and growing pedagogical impact over time. Qualitative feedback further indicates increased learner autonomy, confidence, and engagement. These findings highlight the potential of mobile RAG architectures to deliver equitable, high-quality AI support to students regardless of socioeconomic status. The proposed solution offers a practical engineering pathway for institutions seeking inclusive, scalable, and resource-efficient approaches to AI-enhanced education.
(This article belongs to the Section Computing and Artificial Intelligence)
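
Because the study's headline numbers rest on Hedges' g, a quick sketch of the statistic may be useful: it is Cohen's d scaled by a small-sample correction factor. The function below is the standard formulation; the group means, standard deviations, and sizes in the example are hypothetical, not figures from the paper.

```python
import math

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges' g: Cohen's d with a small-sample bias correction."""
    # Pooled standard deviation of the two groups.
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled            # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)     # correction factor J for small cohorts
    return j * d

# Hypothetical experimental vs. control cohorts (illustration only).
print(round(hedges_g(16.2, 2.1, 28, 14.9, 2.4, 27), 2))
```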

22 pages, 6241 KB  
Article
Using Large Language Models to Detect and Debunk Climate Change Misinformation
by Zeinab Shahbazi and Sara Behnamian
Big Data Cogn. Comput. 2026, 10(1), 34; https://doi.org/10.3390/bdcc10010034 (registering DOI) - 17 Jan 2026
Abstract
The rapid spread of climate change misinformation across digital platforms undermines scientific literacy, public trust, and evidence-based policy action. Advances in Natural Language Processing (NLP) and Large Language Models (LLMs) create new opportunities for automating the detection and correction of misleading climate-related narratives. This study presents a multi-stage system that employs state-of-the-art large language models such as Generative Pre-trained Transformer 4 (GPT-4), Large Language Model Meta AI (LLaMA) version 3 (LLaMA-3), and RoBERTa-large (Robustly optimized BERT pretraining approach large) to identify, classify, and generate scientifically grounded corrections for climate misinformation. The system integrates several complementary techniques, including transformer-based text classification, semantic similarity scoring using Sentence-BERT, stance detection, and retrieval-augmented generation (RAG) for evidence-grounded debunking. Misinformation instances are detected through a fine-tuned RoBERTa–Multi-Genre Natural Language Inference (MNLI) classifier (RoBERTa-MNLI), grouped using BERTopic, and verified against curated climate-science knowledge sources using BM25 and dense retrieval via FAISS (Facebook AI Similarity Search). The debunking component employs RAG-enhanced GPT-4 to produce accurate and persuasive counter-messages aligned with authoritative scientific reports such as those from the Intergovernmental Panel on Climate Change (IPCC). A diverse dataset of climate misinformation categories covering denialism, cherry-picking of data, false causation narratives, and misleading comparisons is compiled for evaluation. Benchmarking experiments demonstrate that LLM-based models substantially outperform traditional machine-learning baselines such as Support Vector Machines, Logistic Regression, and Random Forests in precision, contextual understanding, and robustness to linguistic variation. Expert assessment further shows that generated debunking messages exhibit higher clarity, scientific accuracy, and persuasive effectiveness compared to conventional fact-checking text. These results highlight the potential of advanced LLM-driven pipelines to provide scalable, real-time mitigation of climate misinformation while offering guidelines for responsible deployment of AI-assisted debunking systems.
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
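
As a rough sketch of the NLI-based detection stage, the stock roberta-large-mnli checkpoint can be run through Hugging Face's zero-shot classification pipeline. This is a stand-in for the authors' fine-tuned RoBERTa-MNLI classifier, and the claim and label set below are illustrative assumptions.

```python
from transformers import pipeline

# NLI-based classifier; "roberta-large-mnli" is the stock Hub checkpoint,
# not the fine-tuned model used in the paper.
classifier = pipeline("zero-shot-classification", model="roberta-large-mnli")

claim = "Global temperatures stopped rising in 1998."
labels = ["denialism", "cherry-picking", "false causation",
          "misleading comparison", "accurate"]
result = classifier(claim, candidate_labels=labels)
print(result["labels"][0], round(result["scores"][0], 3))
```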

22 pages, 572 KB  
Article
Machines Prefer Humans as Literary Authors: Evaluating Authorship Bias in Large Language Models
by Marco Rospocher, Massimo Salgaro and Simone Rebora
Information 2026, 17(1), 95; https://doi.org/10.3390/info17010095 - 16 Jan 2026
Abstract
Automata and artificial intelligence (AI) have long occupied a central place in cultural and artistic imagination, and the recent proliferation of AI-generated artworks has intensified debates about authorship, creativity, and human agency. Empirical studies show that audiences often perceive AI-generated works as less authentic or emotionally resonant than human creations, with authorship attribution strongly shaping esthetic judgments. Yet little attention has been paid to how AI systems themselves evaluate creative authorship. This study investigates how large language models (LLMs) evaluate literary quality under different framings of authorship—Human, AI, or Human+AI collaboration. Using a questionnaire-based experimental design, we prompted four instruction-tuned LLMs (ChatGPT 4, Gemini 2, Gemma 3, and LLaMA 3) to read and assess three short stories in Italian, originally generated by ChatGPT 4 in the narrative style of Roald Dahl. For each story × authorship condition × model combination, we collected 100 questionnaire completions, yielding 3600 responses in total. Across esthetic, literary, and inclusiveness dimensions, the stated authorship systematically conditioned model judgments: identical stories were consistently rated more favorably when framed as human-authored or human–AI co-authored than when labeled as AI-authored, revealing a robust negative bias toward AI authorship. Model-specific analyses further indicate distinctive evaluative profiles and inclusiveness thresholds across proprietary and open-source systems. Our findings extend research on attribution bias into the computational realm, showing that LLM-based evaluations reproduce human-like assumptions about creative agency and literary value. We publicly release all materials to facilitate transparency and future comparative work on AI-mediated literary evaluation.
(This article belongs to the Special Issue Emerging Research in Computational Creativity and Creative Robotics)
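
The 3600-response figure follows from the full factorial grid: 3 stories × 3 authorship framings × 4 models × 100 completions. A minimal sketch of that loop, with a hypothetical query_model stub in place of the real API clients:

```python
import itertools

stories = ["story_1", "story_2", "story_3"]
framings = ["Human", "AI", "Human+AI"]
models = ["ChatGPT 4", "Gemini 2", "Gemma 3", "LLaMA 3"]
RUNS = 100  # questionnaire completions per combination

def query_model(model, story, framing):
    # Hypothetical stub; a real run would send each model the questionnaire,
    # prefixed by the stated authorship framing.
    return {"model": model, "story": story, "framing": framing, "ratings": None}

responses = [query_model(m, s, f)
             for s, f, m in itertools.product(stories, framings, models)
             for _ in range(RUNS)]
print(len(responses))  # 3 x 3 x 4 x 100 = 3600
```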
20 pages, 7030 KB  
Article
Latency-Aware Benchmarking of Large Language Models for Natural-Language Robot Navigation in ROS 2
by Murat Das, Zawar Hussain and Muhammad Nawaz
Sensors 2026, 26(2), 608; https://doi.org/10.3390/s26020608 - 16 Jan 2026
Abstract
A growing challenge in mobile robotics is the reliance on complex graphical interfaces and rigid control pipelines, which limit accessibility for non-expert users. This work introduces a latency-aware benchmarking framework that enables natural-language robot navigation by integrating multiple Large Language Models (LLMs) with the Robot Operating System 2 (ROS 2) Navigation 2 (Nav2) stack. The system allows robots to interpret and act upon free-form text instructions, replacing traditional Human–Machine Interfaces (HMIs) with conversational interaction. Using a simulated TurtleBot4 platform in Gazebo Fortress, we benchmarked a diverse set of contemporary LLMs, including GPT-3.5, GPT-4, GPT-5, Claude 3.7, Gemini 2.5, Mistral-7B Instruct, DeepSeek-R1, and LLaMA-3.3-70B, across three local planners, namely Dynamic Window Approach (DWB), Timed Elastic Band (TEB), and Regulated Pure Pursuit (RPP). The framework measures end-to-end response latency, instruction-parsing accuracy, path quality, and task success rate in standardised indoor scenarios. The results show clear trade-offs between latency and accuracy: smaller models respond quickly but exhibit weaker spatial reasoning, while larger models capture navigation intent more consistently but take longer to respond. The proposed framework is the first reproducible multi-LLM, multi-planner evaluation within ROS 2, supporting the development of intuitive and latency-efficient natural-language interfaces for robot navigation.
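
A minimal sketch of the latency-measurement idea, assuming the LLM is prompted to return a navigation goal as JSON; llm_parse is a hypothetical stand-in for the benchmarked model APIs, and the coordinates are invented.

```python
import json
import time

def llm_parse(instruction: str) -> str:
    # Hypothetical stub; a real system would query GPT/Claude/Gemini/etc.
    # with a prompt asking for a goal pose as JSON.
    return '{"x": 3.2, "y": -1.0, "yaw": 1.57}'

t0 = time.perf_counter()
goal = json.loads(llm_parse("go to the charging dock near the window"))
latency_ms = (time.perf_counter() - t0) * 1000.0
# In a full pipeline, `goal` would be forwarded to Nav2's NavigateToPose
# action as a PoseStamped and the chosen local planner would execute it.
print(goal, f"{latency_ms:.1f} ms")
```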

22 pages, 8300 KB  
Article
Sign2Story: A Multimodal Framework for Near-Real-Time Hand Gestures via Smartphone Sensors to AI-Generated Audio-Comics
by Gul Faraz, Lei Jing and Xiang Li
Sensors 2026, 26(2), 596; https://doi.org/10.3390/s26020596 - 15 Jan 2026
Viewed by 34
Abstract
This study presents a multimodal framework that uses smartphone motion sensors and generative AI to create audio comics from live news headlines. The system operates without direct touch or voice input, instead responding to simple hand-wave gestures. The system demonstrates potential as an alternative input method, which may benefit users who find traditional touch or voice interaction challenging. In our experiments, we investigated the generation of comics based on the latest tech-related news headlines, retrieved via Really Simple Syndication (RSS) and triggered by a simple hand-wave gesture. The proposed framework demonstrates extensibility beyond comic generation, as various other tasks utilizing large language models and multimodal AI could be integrated by mapping them to different hand gestures. Our experiments with open-source models like LLaMA, LLaVA, Gemma, and Qwen revealed that LLaVA delivers superior results in generating panel-aligned stories compared to Qwen3-VL, both in terms of inference speed and output quality, relative to the source image. These large language models (LLMs) collectively contribute imaginative and conversational narrative elements that enhance diversity in storytelling within the comic format. Additionally, we implement an AI-in-the-loop mechanism to iteratively improve output quality without human intervention. Finally, AI-generated audio narration is incorporated into the comics to create an immersive, multimodal reading experience.
(This article belongs to the Special Issue Body Area Networks: Intelligence, Sensing and Communication)
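
As an illustration of the sensing side, a hand wave can be approximated as repeated swings of the accelerometer magnitude away from gravity. The thresholds and peak count below are assumptions for the sketch, not the paper's parameters.

```python
import math

def is_hand_wave(samples, g=9.81, thresh=4.0, min_peaks=3):
    """Treat a wave as >= min_peaks excursions of |accel| away from gravity.
    samples: iterable of (ax, ay, az) readings in m/s^2."""
    peaks, above = 0, False
    for ax, ay, az in samples:
        dev = abs(math.sqrt(ax * ax + ay * ay + az * az) - g)
        if dev > thresh and not above:
            peaks, above = peaks + 1, True
        elif dev <= thresh:
            above = False
    return peaks >= min_peaks

# Synthetic burst with four strong excursions -> detected as a wave.
still, swing = (0.0, 0.0, 9.81), (12.0, 0.0, 9.81)
print(is_hand_wave([still, swing, still, swing, still, swing, still, swing]))
```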

26 pages, 2786 KB  
Article
Time-Series Modeling and LLM-Based Agents for Peak Energy Management in Smart Campus Environments
by Mossab Batal, Youness Tace, Hassna Bensag, Sanaa El Filali and Mohamed Tabaa
Sustainability 2026, 18(2), 875; https://doi.org/10.3390/su18020875 - 15 Jan 2026
Viewed by 54
Abstract
Smart campuses increasingly rely on data-driven operations, but growing energy demand puts their control over costs and sustainability at risk. This study addresses the challenge of anticipating and managing energy consumption peaks in multi-campus environments by proposing a hybrid framework that combines advanced time-series forecasting models with a large language model (LLM)-driven multi-agent system. Based on the UNICON dataset, LSTM, CNN, GRU, and a hybrid architecture are trained and compared in terms of MAE and RMSE. The hybrid configuration achieves the best forecasting results, returning the lowest loss values. To identify critical periods, we employed a median-thresholding strategy that categorizes consumption into low, normal, and extreme levels, allowing peak-mitigation actions to be targeted. We also introduce an LLM-based multi-agent system, comprising a data aggregator, a forecaster, and a policy advisor, which together create actionable, context-informed policies. We further compare LLMs (Qwen-2.5, Gemma-2, Phi-4, Mistral, Llama-3.3) in terms of context accuracy, response relevance, semantic similarity, and retrieval/recall accuracy and fidelity, with Llama-3.3 achieving the best overall results. This framework shows great potential, not only for forecasting energy consumption but also for developing precise policies to manage consumption peaks effectively.
(This article belongs to the Section Environmental Sustainability and Applications)
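
The median-thresholding step lends itself to a compact sketch: label each reading relative to the series median. The 0.8/1.2 multipliers below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def categorize(load, low_k=0.8, high_k=1.2):
    """Label each reading low/normal/extreme relative to the series median.
    The multipliers are illustrative, not taken from the study."""
    med = np.median(load)
    return np.where(load < low_k * med, "low",
                    np.where(load > high_k * med, "extreme", "normal"))

print(categorize(np.array([42.0, 55.0, 120.0, 50.0, 31.0])))
# -> ['normal' 'normal' 'extreme' 'normal' 'low']
```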

11 pages, 412 KB  
Article
Artificial Intelligence Chatbots in Peritoneal Dialysis Education: A Cross-Sectional Comparative Study of Quality, Readability, and Reliability
by Engin Onan, İlter Bozaci, Yelda Deligoz Bildaci, Sevinc Puren Yucel Karakaya, Ruya Kozanoglu and Rumeyza Kazancioglu
J. Clin. Med. 2026, 15(2), 692; https://doi.org/10.3390/jcm15020692 - 15 Jan 2026
Viewed by 52
Abstract
Background: Peritoneal dialysis (PD) remains underutilized worldwide, partly due to limited patient education, misconceptions, and barriers to accessing reliable health information. Artificial intelligence (AI)-based chatbots have emerged as promising tools for improving health literacy, supporting shared decision-making, and enhancing patient engagement. However, concerns regarding content quality, reliability, and readability persist, and no study to date has systematically evaluated AI-generated content in the context of PD. Therefore, this study aimed to systematically evaluate the quality, reliability, and readability of AI-generated educational content on peritoneal dialysis using multiple large language model-based chatbots. Methods: A total of 45 frequently asked questions about PD were developed by nephrology experts and categorized into three domains: general information (n = 15), technical and clinical issues (n = 21), and myths/misconceptions (n = 9). Three AI-based chatbots, Gemini Pro 2.5, ChatGPT-5, and LLaMA Maverick 4, were prompted to generate responses to all questions. Each response was independently evaluated by two blinded reviewers for textual characteristics, readability using the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL), and content quality/reliability using the Ensuring Quality Information for Patients (EQIP) tool and the Modified DISCERN instrument. Results: Across all domains, significant differences were observed among the chatbots. Gemini Pro 2.5 achieved higher Flesch Reading Ease (FRES) scores (32.6 ± 10.5) compared with ChatGPT-5 (24.2 ± 11.7) and LLaMA Maverick 4 (16.2 ± 7.5; p < 0.001), as well as higher EQIP scores (75.4% vs. 59.4% and 61.5%, respectively; p < 0.001) and Modified DISCERN scores (4.0 [4.0–4.5] vs. 3.0 [3.0–3.5] and 3.0 [2.5–3.5]; p < 0.001). ChatGPT-5 demonstrated intermediate performance, while LLaMA Maverick 4 showed lower scores across evaluated metrics. Conclusions: These findings demonstrate differences among AI-based chatbots in readability, content quality, and reliability when responding to identical peritoneal dialysis–related questions. While AI chatbots may support health literacy and complement clinical decision-making, their outputs should be interpreted with caution and under appropriate clinical oversight. Future research should focus on multilingual, multicenter, and outcome-based studies to ensure the safe integration of AI into PD patient education.
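
For reference, both readability scores used in the study are closed-form functions of word, sentence, and syllable counts; the counts in the example below are invented, not data from the paper.

```python
def fres(words, sentences, syllables):
    # Flesch Reading Ease Score: higher = easier to read.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def fkgl(words, sentences, syllables):
    # Flesch-Kincaid Grade Level: approximate U.S. school grade.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts for one chatbot answer.
print(round(fres(180, 9, 310), 1), round(fkgl(180, 9, 310), 1))
# -> 40.8 12.5
```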

28 pages, 22992 KB  
Article
Domain Knowledge-Infused Synthetic Data Generation for LLM-Based ICS Intrusion Detection: Mitigating Data Scarcity and Imbalance
by Seokhyun Ann, Hongeun Kim, Suhyeon Park, Seong-je Cho, Joonmo Kim and Harksu Cho
Electronics 2026, 15(2), 371; https://doi.org/10.3390/electronics15020371 - 14 Jan 2026
Viewed by 118
Abstract
Industrial control systems (ICSs) are increasingly interconnected with enterprise IT networks and remote services, which expands the attack surface of operational technology (OT) environments. However, collecting sufficient attack traffic from real OT/ICS networks is difficult, and the resulting scarcity and class imbalance of malicious data hinder the development of intrusion detection systems (IDSs). At the same time, large language models (LLMs) have shown promise for security analytics when system events are expressed in natural language. This study investigates an LLM-based network IDS for a smart-factory OT/ICS environment and proposes a synthetic data generation method that injects domain knowledge into attack samples. Using the ICSSIM simulator, we construct a bottle-filling smart factory, implement six MITRE ATT&CK for ICS-based attack scenarios, capture Modbus/TCP traffic, and convert each request–response pair into a natural-language description of network behavior. We then generate synthetic attack descriptions with GPT by combining (1) statistical properties of normal traffic, (2) MITRE ATT&CK for ICS tactics and techniques, and (3) expert knowledge obtained from executing the attacks in ICSSIM. The Llama 3.1 8B Instruct model is fine-tuned with QLoRA on a seven-class classification task (Benign vs. six attack types) and evaluated on a test set composed exclusively of real ICSSIM traffic. Experimental results show that synthetic data generated only from statistical information, or from statistics plus MITRE descriptions, yield limited performance, whereas incorporating environment-specific expert knowledge is associated with substantially higher performance on our ICSSIM-based expanded test set (100% accuracy in binary detection and 96.49% accuracy with a macro F1-score of 0.958 in attack-type classification). Overall, these findings suggest that domain-knowledge-infused synthetic data and natural-language traffic representations can support LLM-based IDSs in OT/ICS smart-factory settings; however, further validation on larger and more diverse datasets is needed to confirm generality.
(This article belongs to the Special Issue AI-Enhanced Security: Advancing Threat Detection and Defense)
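
A minimal sketch of the QLoRA setup described above, using the standard transformers + peft recipe: a 4-bit quantized base model plus LoRA adapters. The rank, alpha, and target modules are illustrative assumptions, not the authors' configuration, and the Llama checkpoint is gated on the Hugging Face Hub.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization of the frozen base model.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", quantization_config=bnb)

# Small trainable LoRA adapters on the attention projections.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```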

29 pages, 2829 KB  
Article
Real-Time Deterministic Lane Detection on CPU-Only Embedded Systems via Binary Line Segment Filtering
by Shang-En Tsai, Shih-Ming Yang and Chia-Han Hsieh
Electronics 2026, 15(2), 351; https://doi.org/10.3390/electronics15020351 - 13 Jan 2026
Viewed by 199
Abstract
The deployment of Advanced Driver-Assistance Systems (ADAS) in economically constrained markets frequently relies on hardware architectures that lack dedicated graphics processing units. Within such environments, the integration of deep neural networks faces significant hurdles, primarily stemming from strict limitations on energy consumption, the absolute necessity for deterministic real-time response, and the rigorous demands of safety certification protocols. Meanwhile, traditional geometry-based lane detection pipelines continue to exhibit limited robustness under adverse illumination conditions, including intense backlighting, low-contrast nighttime scenes, and heavy rainfall. Motivated by these constraints, this work re-examines geometry-based lane perception from a sensor-level viewpoint and introduces a Binary Line Segment Filter (BLSF) that leverages the inherent structural regularity of lane markings in bird’s-eye-view (BEV) imagery within a computationally lightweight framework. The proposed BLSF is integrated into a complete pipeline consisting of inverse perspective mapping, median local thresholding, line-segment detection, and a simplified Hough-style sliding-window fitting scheme combined with RANSAC. Experiments on a self-collected dataset of 297 challenging frames show that the inclusion of BLSF significantly improves robustness over an ablated baseline while sustaining real-time performance on a 2 GHz ARM CPU-only platform. Additional evaluations on the Dazzling Light and Night subsets of the CULane and LLAMAS benchmarks further confirm consistent gains of approximately 6–7% in F1-score, together with corresponding improvements in IoU. These results demonstrate that interpretable, geometry-driven lane feature extraction remains a practical and complementary alternative to lightweight learning-based approaches for cost- and safety-critical ADAS applications.
(This article belongs to the Special Issue Feature Papers in Electrical and Autonomous Vehicles, Volume 2)
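
The geometric pipeline can be sketched with standard OpenCV calls. The perspective points, threshold parameters, and the near-vertical slope test below are rough stand-ins for the paper's calibrated inverse perspective mapping and Binary Line Segment Filter, not a reproduction of them.

```python
import cv2
import numpy as np

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder input frame
h, w = img.shape

# Inverse perspective mapping to a bird's-eye view (uncalibrated placeholder points).
src = np.float32([[w * 0.45, h * 0.60], [w * 0.55, h * 0.60], [w * 0.90, h], [w * 0.10, h]])
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
bev = cv2.warpPerspective(img, cv2.getPerspectiveTransform(src, dst), (w, h))

# Local thresholding, then keep only near-vertical segments -- the lane-like
# structural regularity in BEV that the BLSF exploits.
binary = cv2.adaptiveThreshold(bev, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY, 31, -10)
segments = cv2.HoughLinesP(binary, 1, np.pi / 180, 40, minLineLength=40, maxLineGap=10)

lanes = []
for seg in (segments if segments is not None else []):
    x1, y1, x2, y2 = seg[0]
    dx, dy = abs(x2 - x1), abs(y2 - y1)
    if dy > 0 and dx / dy < 0.2:  # nearly vertical in BEV => lane candidate
        lanes.append((x1, y1, x2, y2))
print(len(lanes), "lane-like segments")
```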

22 pages, 884 KB  
Article
Sentiment-Augmented RNN Models for Mini-TAIEX Futures Prediction
by Yu-Heng Hsieh, Keng-Pei Lin, Ching-Hsi Tseng, Xiaolong Liu and Shyan-Ming Yuan
Algorithms 2026, 19(1), 69; https://doi.org/10.3390/a19010069 - 13 Jan 2026
Viewed by 85
Abstract
Accurate forecasting in low-liquidity futures markets is essential for effective trading. This study introduces a hybrid decision-support framework that combines Mini-TAIEX (MTX) futures data with sentiment signals extracted from 13 financial news sources and PTT forum discussions. Sentiment features are generated using three domain-adapted large language models—FinGPT-internLM, FinGPT-llama, and FinMA—trained on more than 360,000 finance-related texts. These features are integrated with technical indicators in four deep learning models: LSTM, GRU, Informer, and PatchTST. Experiments from June 2024 to June 2025 show that sentiment-augmented models consistently outperform baselines. Backtesting further demonstrates that the sentiment-enhanced PatchTST achieves a 526% cumulative return with a Sharpe ratio of 0.407, highlighting the value of incorporating sentiment into AI-driven futures trading systems.
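
For the backtest metrics quoted above, the two standard formulas are easy to sketch; the return series here is randomly generated for illustration, not the strategy's data, and a zero risk-free rate is assumed.

```python
import numpy as np

def sharpe(daily_returns, periods=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(daily_returns)
    return r.mean() / r.std(ddof=1) * np.sqrt(periods)

def cumulative_return(daily_returns):
    """Compound the daily returns into a total return over the period."""
    return np.prod(1 + np.asarray(daily_returns)) - 1

# Synthetic one-year daily return series (illustration only).
r = np.random.default_rng(0).normal(0.001, 0.02, 252)
print(round(sharpe(r), 3), f"{cumulative_return(r):.1%}")
```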

14 pages, 1101 KB  
Article
AI in the Hot Seat: Head-to-Head Comparison of Large Language Models and Cardiologists in Emergency Scenarios
by Vedat Cicek, Lili Zhao, Yalcin Tur, Ahmet Oz, Sahhan Kilic, Gorkem Durak, Faysal Saylik, Mert Ilker Hayiroglu, Tufan Cinar and Ulas Bagci
Med. Sci. 2026, 14(1), 33; https://doi.org/10.3390/medsci14010033 - 8 Jan 2026
Viewed by 160
Abstract
Background: The clinical applicability of large language models (LLMs) in high-stakes cardiac emergencies remains unexplored. This study evaluated how well advanced LLMs perform in managing complex catheterization laboratory (Cath lab) scenarios and compared their performance with that of interventional cardiologists. Methods and Results: A cross-sectional study was conducted from 20 June to 2 December 2024. Twelve challenging inferior myocardial infarction scenarios were presented to seven LLMs (ChatGPT, Gemini, LLAMA, Qwen, Bing, Claude, DeepSeek) and five early-career interventional cardiologists. Responses were standardized, anonymized, and evaluated by thirty experienced interventional cardiologists. Performance comparisons were analyzed using a linear mixed-effects model with correlation and reliability statistics. Physicians had an average reference score of 80.68 (95% CI 76.3–85.0). Among LLMs, ChatGPT ranked highest (87.4, 95% CI 82.5–92.3), followed by Claude (80.8, 95% CI 75.7–85.9) and DeepSeek (78.7, 95% CI 72.9–84.6). LLAMA (73.7), Qwen (66.2), and Bing (64.3) ranked lower, while Gemini scored the lowest (59.0). ChatGPT scored higher than the early-career physician comparator group (difference 6.69, 95% CI 0.00–13.37; p < 0.05), whereas Gemini, LLAMA, Qwen, and Bing performed significantly worse; Claude and DeepSeek showed no significant difference. Conclusions: This expanded assessment reveals significant variability in LLM performance. In this simulated setting, ChatGPT demonstrated performance comparable to that of early-career interventional cardiologists. These results suggest that LLMs could serve as supplementary decision-support tools in interventional cardiology under simulated conditions.
(This article belongs to the Special Issue Artificial Intelligence (AI) in Cardiovascular Medicine)
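
A toy version of the linear mixed-effects comparison, with a random intercept per evaluating cardiologist; the scores below are invented placeholders purely to show the model structure, not the study's data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented long-format ratings: each rater scores several responders.
df = pd.DataFrame({
    "score":     [85, 62, 78, 88, 70, 81, 90, 59, 74, 83, 66, 80],
    "responder": ["ChatGPT", "Gemini", "Claude"] * 4,
    "rater":     ["r1"] * 3 + ["r2"] * 3 + ["r3"] * 3 + ["r4"] * 3,
})
# Fixed effect of responder, random intercept per expert rater.
fit = smf.mixedlm("score ~ responder", df, groups=df["rater"]).fit()
print(fit.summary())
```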

11 pages, 1027 KB  
Article
Clustering-Based Characterization of Mixed Herds and the Influence of Pasture Fertilization in High-Andean Livestock Systems
by Jesus Nuñez, Felimon Paxi-Meneses, Wilder Cruz and Richard Estrada
Ruminants 2026, 6(1), 5; https://doi.org/10.3390/ruminants6010005 - 8 Jan 2026
Viewed by 150
Abstract
Livestock production in the high Andes is vital for rural livelihoods and food security but is limited by poor pasture quality, environmental variability, and restricted resources. Pasture improvement, achieved through management practices and particularly through fertilization, may enhance productivity and sustainability in high-Andean livestock systems. This study aimed to characterize mixed herds composed of domestic sheep (Ovis aries), alpacas (Vicugna pacos), llamas (Lama glama), and domestic cattle (Bos taurus) and to evaluate the role of pasture fertilization on herd composition and livestock size. Primary data were collected through structured questionnaires administered to 88 randomly selected livestock producers, complemented by direct field observations of grazing areas, corrals, shelters, and water sources. The survey documented herd structure, grazing management, pasture conservation, fertilization practices, and farm infrastructure. Data from multiple farms were analyzed using a clustering approach to group production units with similar characteristics, and statistical models were applied to assess the effects of fertilization, pasture area, and water sources. Three distinct clusters were identified: one dominated by alpacas, another by sheep, and a third by llamas with the most uniform stocking density. Pasture fertilization was most common in the sheep-dominated cluster and was significantly associated with higher sheep numbers, while no significant effects were detected for alpacas, llamas, or cattle. Farms without fertilization showed slightly higher overall livestock size; however, a strong negative interaction between pasture area and lack of fertilization indicated that expanding grazing land alone could not offset low forage quality. These findings suggest that targeted fertilization, when combined with sustainable grazing practices, may contribute to improved herd performance and long-term resilience in heterogeneous Andean livestock systems.
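
The clustering step can be sketched with scikit-learn; the per-farm herd counts below are invented for illustration, and the paper does not specify that k-means was the algorithm used.

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-farm herd counts: (sheep, alpacas, llamas, cattle).
herds = [[120, 10, 5, 2], [15, 200, 40, 0], [30, 25, 150, 1],
         [140, 5, 0, 3], [10, 180, 60, 0], [20, 30, 170, 2]]
X = StandardScaler().fit_transform(herds)  # scale so no species dominates
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # three groups: sheep-, alpaca-, and llama-dominated farms
```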

21 pages, 3379 KB  
Article
KORIE: A Multi-Task Benchmark for Detection, OCR, and Information Extraction on Korean Retail Receipts
by Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla and Hyun Soo Kang
Mathematics 2026, 14(1), 187; https://doi.org/10.3390/math14010187 - 4 Jan 2026
Viewed by 685
Abstract
We introduce KORIE, a curated benchmark of 748 Korean retail receipts designed to evaluate scene text detection, Optical Character Recognition (OCR), and Information Extraction (IE) under challenging digitization conditions. Unlike existing large-scale repositories, KORIE consists exclusively of receipts digitized via flatbed scanning (HP LaserJet MFP), specifically selected to preserve complex thermal printing artifacts such as ink fading, banding, and mechanical creases. We establish rigorous baselines across three tasks: (1) Detection, comparing Weakly Supervised Object Localization (WSOL) against state-of-the-art fully supervised models (YOLOv9, YOLOv10, YOLOv11, and DINO-DETR); (2) OCR, benchmarking Tesseract, EasyOCR, PaddleOCR, and a custom Attention-based BiGRU; and (3) Information Extraction, evaluating the zero-shot capabilities of Large Language Models (Llama-3, Qwen-2.5) on structured field parsing. Our results identify YOLOv11 as the optimal detector for dense receipt layouts and demonstrate that while PaddleOCR achieves the lowest Character Error Rate (15.84%), standard LLMs struggle in zero-shot settings due to domain mismatch with noisy Korean receipt text, particularly for price-related fields (F1 scores ≈ 25%). We release the dataset, splits, and evaluation code to facilitate reproducible research on degraded Hangul document understanding.
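
Character Error Rate, the OCR metric quoted above, is the Levenshtein edit distance between hypothesis and reference, divided by the reference length. A compact sketch, with an invented Korean receipt line as the example:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate = (substitutions + deletions + insertions) / len(reference),
    computed via a one-row Levenshtein dynamic program."""
    m, n = len(reference), len(hypothesis)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (reference[i - 1] != hypothesis[j - 1]))  # substitution
            prev = cur
    return dp[n] / max(m, 1)

# One wrong digit in a ten-character line -> 10% CER.
print(f"{cer('총액 12,000원', '총액 12,800원'):.2%}")
```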

21 pages, 507 KB  
Article
KGEval: Evaluating Scientific Knowledge Graphs with Large Language Models
by Vladyslav Nechakhin, Jennifer D’Souza, Steffen Eger and Sören Auer
Information 2026, 17(1), 35; https://doi.org/10.3390/info17010035 - 3 Jan 2026
Viewed by 393
Abstract
This paper explores the novel application of large language models (LLMs) as evaluators for structured scientific summaries—a task where traditional natural language evaluation metrics may not readily apply. Leveraging the Open Research Knowledge Graph (ORKG) as a repository of human-curated properties, we augment a gold-standard dataset by generating corresponding properties using three distinct LLMs—Llama, Mistral, and Qwen—under three contextual settings: context-lean (research problem only), context-rich (research problem with title and abstract), and context-dense (research problem with multiple similar papers). To assess the quality of these properties, we employ LLM evaluators (Deepseek, Mistral, and Qwen) to rate them on criteria, including similarity, relevance, factuality, informativeness, coherence, and specificity. This study addresses key research questions: How do LLM-as-a-judge rubrics transfer to the evaluation of structured summaries? How do LLM-generated properties compare to human-annotated ones? What are the performance differences among various LLMs? How does the amount of contextual input affect the generation quality? The resulting evaluation framework, KGEval, offers a customizable approach that can be extended to diverse knowledge graphs and application domains. Our experimental findings reveal distinct patterns in evaluator biases, contextual sensitivity, and inter-model performance, thereby highlighting both the promise and the challenges of integrating LLMs into structured science evaluation.
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)
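
A minimal sketch of the LLM-as-a-judge pattern KGEval instantiates, with a hypothetical llm_call client and rubric wording that only approximates the paper's six criteria:

```python
import json

CRITERIA = ["similarity", "relevance", "factuality",
            "informativeness", "coherence", "specificity"]

def judge_prompt(problem: str, properties: list[str]) -> str:
    # Hypothetical rubric wording; the actual KGEval prompts are in the paper.
    return (f"Research problem: {problem}\n"
            f"Candidate properties: {', '.join(properties)}\n"
            "Rate each criterion from 1 to 5 and answer as JSON with keys: "
            + ", ".join(CRITERIA))

def evaluate(llm_call, problem, properties):
    # llm_call is a hypothetical client returning the judge model's raw reply.
    return json.loads(llm_call(judge_prompt(problem, properties)))
```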

25 pages, 1090 KB  
Article
Evaluating Large Language Models on Chinese Zero Anaphora: A Symmetric Winograd-Style Minimal-Pair Benchmark
by Zimeng Li, Yichen Qiao, Xiaoran Chen and Shuangshuang Chen
Symmetry 2026, 18(1), 47; https://doi.org/10.3390/sym18010047 - 26 Dec 2025
Viewed by 324
Abstract
This study investigates how large language models (LLMs) handle Chinese zero anaphora under symmetric minimal-pair conditions designed to neutralize shallow syntactic cues. We construct a Winograd-style benchmark of carefully controlled sentence pairs that require semantic interpretation, pragmatic inference, discourse tracking, and commonsense reasoning rather than structural heuristics. Using GPT-4, ChatGLM-4, and LLaMA-3 under zero-shot, one-shot, and few-shot prompting, we assess both accuracy and the reasoning traces generated through a standardized Chain-of-Thought diagnostic. Results show that all models perform consistently on items solvable through local cues but display systematic asymmetric errors on 19 universally misinterpreted sentences that demand deeper discourse reasoning. Analysis of these failures reveals weaknesses in semantic role differentiation, topic-chain maintenance, logical-relation interpretation, pragmatic inference, and long-distance dependency tracking. These findings suggest that while LLMs perform well on simpler tasks, they still face challenges in interpreting contextually omitted arguments in Chinese. The study provides a new controlled evaluation resource, an interpretable error analysis framework, and evidence of differences in symmetric versus asymmetric reasoning behaviors in LLMs. Future research could expand the current benchmark to longer discourse contexts, incorporate multi-modal or knowledge-grounded cues, and explore fine-tuning LLMs on discourse data, helping clarify whether asymmetric patterns stem from deeper reasoning challenges or from interactions between models and the evaluation format.
(This article belongs to the Special Issue Symmetry and Asymmetry in Natural Language Processing)
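
One common scoring convention for Winograd-style minimal pairs, sketched below as an assumption (the paper's exact protocol may differ): a pair counts only when both of its symmetric items are resolved correctly, so one-sided heuristics score nothing.

```python
def pairwise_accuracy(results):
    """Credit a minimal pair only if BOTH items are resolved correctly.
    `results` maps pair_id -> [bool, bool] correctness flags."""
    pairs = list(results.values())
    return sum(all(p) for p in pairs) / len(pairs)

print(pairwise_accuracy({"p1": [True, True],
                         "p2": [True, False],
                         "p3": [False, False]}))
# -> 0.333...: only p1 earns credit
```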
