Search Results (1,167)

Search Parameters:
Keywords = domain specific language

33 pages, 2618 KB  
Article
Bridging Cross-Modal Semantic Gaps with Multi-Source Semantic Anchors in Knowledge-Based Visual Question Answering
by Junming Hu, Jinxiong Zhang, Feng Zhan and Yiran Huang
Electronics 2026, 15(9), 1837; https://doi.org/10.3390/electronics15091837 (registering DOI) - 26 Apr 2026
Abstract
Knowledge-based visual question answering (KB-VQA) requires leveraging external knowledge relevant to the image to assist reasoning. Existing methods typically convert images into a single textual description for knowledge retrieval or directly rely on the implicit knowledge within large language models to generate answers. However, a single textual description struggles to preserve fine-grained visual information such as object attributes and scene text, limiting retrieval quality. Meanwhile, naively fusing multi-source information tends to introduce modality noise, undermining reasoning accuracy. To address these issues, we propose a unified framework that constructs multi-source semantic anchors to bridge the cross-modal semantic gaps among vision, questions, and external knowledge. Specifically, we unify image captions, object tags, and optical character recognition (OCR) text as semantic anchors. These anchors serve as shared intermediaries to pre-align visual and textual features, avoiding direct interaction between heterogeneous modalities. During cross-modal fusion, a cross-residual gating mechanism adaptively suppresses modality noise by leveraging the semantic anchors as stable references. The framework further integrates contrastive learning to strengthen cross-modal alignment and employs a retrieve-then-read pipeline for open-domain answer reasoning. Experiments on the OK-VQA, FVQA, and A-OKVQA datasets demonstrate that the proposed framework outperforms state-of-the-art methods across multiple metrics, validating the effectiveness and robustness of the proposed framework. Full article
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
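The cross-residual gating idea in this abstract can be sketched in miniature: a gate computed from the anchor and a modality feature scales the modality's residual contribution, so the semantic anchor acts as a stable reference. The scalar gate, the random weight vector `w`, and all dimensions below are illustrative assumptions, not the paper's actual parameterization.

```python
import math
import random

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def cross_residual_gate(anchor, modality, w):
    """Fuse a modality feature into a shared semantic-anchor feature.

    A scalar gate in (0, 1), computed from the concatenated features,
    scales the residual contribution of the noisier modality; `w`
    stands in for a learned gate projection (random here).
    """
    concat = anchor + modality
    gate = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)))
    fused = [a + gate * (m - a) for a, m in zip(anchor, modality)]
    return fused, gate

random.seed(0)
d = 8
anchor = [random.gauss(0, 1) for _ in range(d)]   # e.g. caption/OCR anchor
visual = [random.gauss(0, 1) for _ in range(d)]   # e.g. visual region feature
w = [random.gauss(0, 1) for _ in range(2 * d)]    # hypothetical gate weights
fused, gate = cross_residual_gate(anchor, visual, w)
```

When the gate saturates near zero, the fused feature collapses back to the anchor, which is one simple way to suppress modality noise.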
22 pages, 1380 KB  
Article
Intelligent Question-Answering System for New Energy Vehicles Integrating Deep Semantic Parsing and Knowledge Graphs
by Yaqi Wu, Pengcheng Li, Tong Geng, Yi Wang, Haiyu Zhang and Shixiong Li
Informatics 2026, 13(5), 66; https://doi.org/10.3390/informatics13050066 - 24 Apr 2026
Abstract
The new energy vehicle (NEV) industry generates massive multi-source heterogeneous data. To overcome traditional database limitations in terminology disambiguation and multi-hop reasoning, this paper proposes a knowledge graph (KG)-based question-answering (QA) architecture. Three primary domain challenges are addressed: First, to tackle the poor semantic extraction of informal diagnostic texts, a deep semantic parsing network (BERT-BiLSTM-CRF) is integrated to extract high-precision knowledge from 150,000 real-world maintenance records. Second, to solve topological redundancy, the Labeled Property Graph (LPG) specification is employed to encapsulate parameters of 2157 vehicle models as internal attributes, significantly streamlining complex multi-hop reasoning. Finally, to enhance limited reasoning capabilities, an intent classification module (TextCNN) automatically translates natural language into graph queries, enabling deep fault tracing across up to five semantic levels. Experimental results demonstrate 98% and 93% accuracy in entity-relation recognition and intent classification, respectively. The resulting KG (8274 nodes, 14,488 edges) establishes a scalable paradigm for intelligent diagnostic reasoning in complex vertical domains. Full article
(This article belongs to the Section Machine Learning)
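The "intent classification module translates natural language into graph queries" step can be illustrated as template filling once an intent and an entity are available. The intent labels, Cypher-style templates, and graph schema below are hypothetical; the paper's TextCNN classifier is not reproduced.

```python
# Hypothetical intent labels mapped to Cypher-style query templates;
# the real system would classify the intent with a TextCNN first.
TEMPLATES = {
    "fault_cause": (
        "MATCH (f:Fault {name: $x})-[:CAUSED_BY*1..5]->(c) RETURN c.name"
    ),
    "model_param": "MATCH (m:Model {name: $x}) RETURN m.battery_capacity",
}

def to_graph_query(intent: str, entity: str) -> str:
    """Map a classified intent plus an extracted entity to a graph query."""
    return TEMPLATES[intent].replace("$x", f"'{entity}'")

query = to_graph_query("fault_cause", "battery overheating")
```

The `*1..5` bound mirrors the abstract's "fault tracing across up to five semantic levels".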
18 pages, 880 KB  
Article
Comparative Evaluation of Five Multimodal Large Language Models for Medical Laboratory Image Recognition: Impact of Prompting Strategies on Diagnostic Accuracy
by Hui-Ru Yang, Kuei-Ying Lin, Ping-Chang Lin, Jih-Jin Tsai and Po-Chih Chen
Diagnostics 2026, 16(9), 1258; https://doi.org/10.3390/diagnostics16091258 - 22 Apr 2026
Abstract
Background: Multimodal large language models (MLLMs) show promise in medical imaging, but their performance is highly dependent on prompt engineering. This study systematically evaluates how different prompting strategies affect diagnostic accuracy in clinical laboratory image interpretation. Methods: We evaluated five MLLMs (ChatGPT-4o, Gemini 2.0 Flash, Claude 3.5 Sonnet, Grok-2, and Perplexity Pro (Claude 3.5 Sonnet)) using 177 proficiency testing images across three domains: blood smears (n = 78), urinalysis (n = 50), and parasitology (n = 49). Three prompting approaches were compared: (1) complex multi-choice prompts with 20 diagnostic options, (2) zero-shot open-ended prompts, and (3) two-step descriptive-reasoning prompts. Images were sourced from the Taiwan Society of Laboratory Medicine external quality assurance archives with expert consensus diagnoses. Results: Zero-shot prompting significantly outperformed complex multi-choice prompts across all models and domains (p < 0.001). With zero-shot prompts, Gemini achieved 78.5% overall accuracy (urinalysis: 92.0%; parasitology: 75.5%; blood smears: 64.1%), representing a 17% improvement over complex prompts. Two-step descriptive-reasoning prompts further improved blood smear accuracy by 8–12% for top-performing models, but showed minimal benefit in urinalysis and parasitology. The re-query mechanism (“please reconsider”) improved urinalysis accuracy by 7.6% but had a negligible effect on blood smears and parasitology. Conclusions: Prompting strategy critically determines MLLM diagnostic performance. Zero-shot approaches with minimal constraints consistently outperform complex multi-choice formats. The remarkable performance of general-purpose models in structured domains like urinalysis (>90% accuracy) demonstrates the considerable progress of multimodal AI. However, complex morphological tasks like blood smear interpretation require either specialized prompting techniques or domain-specific fine-tuning. 
These findings provide evidence-based guidance for optimizing AI integration in clinical laboratories. Full article
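The three prompting styles compared in this study can be sketched as plain prompt builders. The exact wording below is invented for illustration; only the three-way structure (multi-choice, zero-shot, two-step describe-then-reason) follows the abstract.

```python
def complex_multichoice_prompt(options):
    """Style (1): constrain the model to a fixed list of diagnoses."""
    listed = "\n".join(f"{i + 1}. {o}" for i, o in enumerate(options))
    return f"Identify the finding in this image. Choose exactly one:\n{listed}"

def zero_shot_prompt():
    """Style (2): open-ended, minimal constraints."""
    return "What is the most likely diagnosis for this image? Answer briefly."

def two_step_prompts():
    """Style (3): describe first, then reason from the description."""
    return (
        "Describe all cells and structures visible in this image.",
        "Given your description, what is the most likely diagnosis?",
    )

p1 = complex_multichoice_prompt(["Malaria (P. falciparum)", "Babesiosis"])
p2 = zero_shot_prompt()
step1, step2 = two_step_prompts()
```

Keeping the builders separate makes it easy to send the same image through all three styles and compare accuracy per domain, as the study does.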
18 pages, 701 KB  
Article
PatternStudio: A Neuro-Symbolic Framework for Dynamic and High-Throughput Complex Event Processing
by Jesús Rosa-Bilbao
IoT 2026, 7(2), 36; https://doi.org/10.3390/iot7020036 - 22 Apr 2026
Abstract
Complex Event Processing (CEP) is essential for real-time analytics in domains such as industrial IoT, cybersecurity, and financial monitoring, yet CEP adoption is still hindered by the difficulty of authoring temporal rules and by rigid redeployment workflows. This paper presents PatternStudio, a neuro-symbolic CEP framework that translates natural language specifications into validated event-processing patterns and executes them on a deterministic Apache Flink-based runtime without interrupting service. The generative layer is constrained to produce a typed intermediate representation, while the symbolic layer enforces validation and runtime execution guarantees. We evaluate the prototype as a single-node system-characterization study on commodity hardware representative of edge and near-edge gateways rather than microcontroller-class devices. Under this setting, PatternStudio reaches 47,910 events per second at 250 active rules while maintaining a bounded memory footprint between 1.6 GB and 1.9 GB during the reported runs. Beyond 500 active rules, throughput degradation is driven primarily by CPU saturation and alert amplification, which also explains the sharp increase in tail latency. Additional measurements with parallelism 4, a static baseline, and a two-stage NL-to-IR evaluation further show that the architecture remains functional under partitioned execution, incurs moderate dynamic-orchestration overhead, preserves rule structure reliably under natural-language authoring, and supports interchangeable LLM backends at the semantic front end. Full article
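The split between a generative layer that emits a typed intermediate representation and a symbolic layer that validates it can be sketched with a dataclass and a checker. The field names, the toy event schema, and the validation rules are assumptions for illustration, not PatternStudio's actual IR.

```python
from dataclasses import dataclass

ALLOWED_EVENT_TYPES = {"TemperatureReading", "LoginAttempt"}  # toy schema

@dataclass(frozen=True)
class PatternIR:
    """Typed intermediate representation emitted by the generative layer."""
    name: str
    event_type: str
    predicate: str       # e.g. "value > 80"
    window_seconds: int

def validate(ir: PatternIR) -> list:
    """Symbolic layer: reject ill-typed generated rules before deployment."""
    errors = []
    if ir.event_type not in ALLOWED_EVENT_TYPES:
        errors.append(f"unknown event type: {ir.event_type}")
    if ir.window_seconds <= 0:
        errors.append("window must be positive")
    return errors

good = PatternIR("overheat", "TemperatureReading", "value > 80", 60)
bad = PatternIR("ghost", "NoSuchEvent", "value > 80", 0)
```

Only rules that pass validation would be handed to the runtime, which is what lets the LLM front end stay interchangeable.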
20 pages, 621 KB  
Review
Conditional Generative AI in Oncology Diagnostics
by Chiara Frascarelli, Alberto Concardi, Elisa Mangione, Mariachiara Negrelli, Francesca Maria Porta, Michela Tulino, Joana Sorino, Antonio Marra, Nicola Fusco, Elena Guerini-Rocco and Konstantinos Venetis
Appl. Sci. 2026, 16(8), 4015; https://doi.org/10.3390/app16084015 - 21 Apr 2026
Abstract
The increasing complexity of oncology diagnostics requires advanced Clinical Decision Support Systems (CDSS) capable of integrating multimodal data. Traditional discriminative models often struggle with missing data and cross-modal dependencies. This review provides a novel, systematic analysis of conditional generative artificial intelligence (AI), including Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), diffusion models and Multimodal Large Language Models (MLLMs), specifically tailored for oncological CDSS. We examine how these architectures move beyond simple prediction to learn joint data distributions, enabling robust data imputation, virtual staining, and automated clinical reporting. A central focus of this work is the assessment of translational application, identifying the gaps between experimental proof-of-concepts and clinical deployment. We address critical hurdles such as model hallucinations, domain shift, and demographic bias, providing a roadmap for biological consistency and regulatory compliance. This review highlights the transition from task-specific generators to multimodal reasoning systems. Ultimately, we argue that the integration of generative AI into diagnostic workflows is essential for precision oncology, provided that human-in-the-loop validation and uncertainty-aware inference remain central to their implementation. Full article
26 pages, 31446 KB  
Article
A Training-Free Paradigm for Data-Scarce Maritime Scene Classification Using Vision-Language Models
by Jiabao Wu, Yujie Chen, Wentao Chen, Yicheng Lai, Junjun Li, Xuhang Chen and Wangyu Wu
Sensors 2026, 26(8), 2549; https://doi.org/10.3390/s26082549 - 21 Apr 2026
Abstract
Maritime Domain Awareness (MDA) relies heavily on data acquired from high-resolution optical spaceborne sensors; however, processing this massive quantity of sensor data via traditional supervised deep learning is severely bottlenecked by its dependency on exhaustively annotated datasets. Under extreme data scarcity, conventional architectures suffer severe performance degradation, rendering them impractical for time-critical, zero-day deployments. To overcome this barrier, we propose a training-free inference paradigm that leverages the extensive pre-trained knowledge of Large Vision-Language Models (VLMs). Specifically, we introduce a Domain Knowledge-Enhanced In-Context Learning (DK-ICL) framework coupled with a Macro-Topological Chain-of-Thought (MT-CoT) strategy. This approach bridges the perspective gap between natural images and top–down optical sensor imagery by translating expert remote sensing heuristics into a strict, step-by-step reasoning pipeline. Extensive evaluations demonstrate the substantial efficacy of this framework. Armed with merely 4 visual exemplars per category as in-context triggers, our MT-CoT augmented VLMs outperform traditional models trained under identical scarcity by over 38% in F1-score. Crucially, real-world case studies confirm that this zero-gradient approach maintains robust generalization on unannotated, out-of-distribution coastal clutter, achieving performance parity with data-heavy networks trained on 50 times the data volume. By substituting massive human annotation and GPU optimization with scalable logical deduction, this paradigm establishes a resource-efficient foundation for next-generation intelligent maritime sensing networks. Full article
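The in-context setup described here (a handful of exemplars plus expert heuristics rendered as explicit reasoning steps) can be sketched as a prompt assembler. The heuristics, exemplar labels, and phrasing below are invented placeholders approximating the DK-ICL/MT-CoT idea, not the paper's actual prompts.

```python
def build_dk_icl_prompt(exemplars, heuristics, query_ref):
    """Assemble a domain-knowledge-enhanced in-context prompt.

    exemplars: (image_ref, label) pairs (the paper uses 4 per category);
    heuristics: expert remote-sensing rules rendered as ordered steps,
    approximating the Macro-Topological Chain-of-Thought strategy.
    """
    steps = "\n".join(f"Step {i + 1}: {h}" for i, h in enumerate(heuristics))
    shots = "\n".join(f"Image {img}: category = {lab}" for img, lab in exemplars)
    return (
        f"Expert heuristics:\n{steps}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Now classify {query_ref}. Follow the steps in order, "
        f"then state the category."
    )

prompt = build_dk_icl_prompt(
    [("A1", "cargo ship"), ("A2", "open sea")],
    ["Check the overall scene layout (coastline vs. open water).",
     "Check for elongated wake patterns before deciding 'ship'."],
    "image Q7",
)
```

Because everything is carried in the prompt, no gradients or annotation pipelines are needed, which is the "training-free" point of the abstract.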
18 pages, 1843 KB  
Article
MENARA: Medical Natural Arabic Response Assistant
by Ahmed Ibrahim, Abdullah Hosseini, Hoda Helmy, Maryam Arabi, Aya AlShareef, Wafa Lakhdhar and Ahmed Serag
Mach. Learn. Knowl. Extr. 2026, 8(4), 110; https://doi.org/10.3390/make8040110 - 21 Apr 2026
Abstract
Dialectal variation presents a major challenge for deploying medical language models in real-world healthcare settings, where patient–clinician communication often occurs in regional vernaculars rather than standardized language forms. This challenge is particularly pronounced in the Arabic-speaking world, where clinical interactions frequently take place in diverse dialects that differ substantially from Modern Standard Arabic. Fine-tuning and maintaining separate models for each dialect is computationally inefficient and difficult to scale, motivating more integrated approaches. In this work, we present MENARA, an Arabic medical language model constructed by merging Egyptian Arabic, Moroccan Darija, and medical-domain specialists through model merging. We extend prior feasibility findings through comprehensive evaluation of cross-dialect performance, medical safety, and cross-lingual knowledge retention. Specifically, we introduce a fine-grained dialect composition analysis to quantify lexical purity and structured code-switching behavior, benchmark against state-of-the-art Arabic LLMs, and conduct subject-matter-expert assessment of both dialectal fidelity and medical appropriateness. The results show that model merging preserves core medical competence while enabling robust dialectal adaptation, achieving strong cross-dialect fidelity while substantially reducing storage and deployment overhead compared to maintaining separate models. These findings establish model merging as a potentially practical and resource-efficient paradigm for dialect-aware medical NLP in linguistically fragmented healthcare environments. Full article
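At its core, merging dialect and domain specialists is per-parameter arithmetic over the models' weights. The toy below shows a weighted linear merge over parameter dictionaries; real merges operate on LLM tensors and may use more elaborate schemes (e.g. task arithmetic), but the per-parameter averaging is the basic building block. All names here are illustrative.

```python
def merge_state_dicts(state_dicts, weights):
    """Weighted linear merge of per-model parameter dictionaries.

    Assumes all models share the same architecture, so every
    parameter name maps to same-shaped values in each dict.
    """
    total = float(sum(weights))
    norm = [w / total for w in weights]  # normalize mixing coefficients
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for w, sd in zip(norm, state_dicts))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

# hypothetical two-parameter "models" standing in for dialect specialists
egyptian = {"layer.w": [1.0, 2.0]}
darija = {"layer.w": [3.0, 4.0]}
merged = merge_state_dicts([egyptian, darija], [0.5, 0.5])
```

One merged model is stored and served instead of one per dialect, which is where the storage and deployment savings come from.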
16 pages, 830 KB  
Systematic Review
Concurrent (Dual) Disorder Management Guidelines: Systematic Review Update
by Syune Hakobyan, Zachary Allan, Stephen Lee-Cheong, Kristina Adorjan, Peter Falkai and Christian G. Schütz
J. Clin. Med. 2026, 15(8), 3123; https://doi.org/10.3390/jcm15083123 - 20 Apr 2026
Abstract
Background/Objectives: The initial systematic review of “Concurrent Disorder Management Guidelines. Systematic Review” assessed the quality of the concurrent disorders’ clinical management guidelines in 2020, including the guidelines in the field from 2000 to 2020. Twenty-four guidelines were identified and assessed with AGREE II (Appraisal of Guidelines for Research and Evaluation). As dual disorder needs increased specifically among the younger population, requiring significant healthcare resources, more efficient approaches targeting complex concurrent disorders are essential. Since 2020, multiple new guidelines have been developed in response to new developments in the field of substance use disorder management. This systematic review update aimed to identify and appraise all new available concurrent disorder management guidelines to strategize the management of concurrent disorders, support better outcomes and further research directions. Methods: The review was registered, and protocol is available in the international register—PROSPERO. Literature searches were performed by two independent authors in electronic databases and the gray literature. The inclusion criteria were English language clinical management guidelines for adult concurrent disorders between 2020 and 2025. Sources that were not formal clinical guidelines, not addressed to physicians for adult age group, addressed to intellectual/developmental disability, or written in languages other than English were excluded. Results: The initial search resulted in 5003 records. A total of eight new guidelines were identified and assessed with AGREE II, highlighting the consistent gap in the evidence-based management recommendations. Conclusions: The appraised guidelines had similar quality to the 2020 findings, supporting dual or combined treatment; however, all guidelines had multiple domains not developed rigorously and with methodological limitations. 
Levels of complexity and staging of treatment were not considered in recommendations. Average domain scores were very low, with the lowest being applicability and editorial independence. Development of high-quality, rigorously developed, evidence-based guidelines, addressing staging, resource implications, and patient involvement is recommended as the evidence base remains underdeveloped. Full article
11 pages, 525 KB  
Article
Assessment of Stage Two Hypertension Treatment Plans Written by Generative AI
by Tai Metzger, Zaheen Hossain, Kody Park, Stephen Vu, Simon Dixon and Tracey A. H. Taylor
J. Clin. Med. 2026, 15(8), 3103; https://doi.org/10.3390/jcm15083103 - 18 Apr 2026
Abstract
Background/Objectives: As use of large language models (LLMs) in clinical practice, in medical education, and by patients increases, it is essential to ensure that information provided is accurate and safe. Our objective was to compare stage two hypertension treatment plans generated by popular LLMs. Methods: ChatGPT (GPT-4o), Claude (Claude 4 Sonnet), ClinicalKey AI, Microsoft Copilot (Wave 2), DeepSeek-V3-0324, Dyna AI, Google Gemini (2.5 Flash), Grok (version 3), Meta AI assistant (Llama 4 Maverick), OpenEvidence (version 2.0), Perplexity (Sonar backend model), and Pi (Inflection-2.5) were prompted to generate a treatment plan for stage two hypertension. Six blinded reviewers scored each response in three domains: adherence to clinical guidelines, detail/clarity, and reliability/safety. Results: Perplexity received the highest composite score (8.17 out of 9), followed by OpenEvidence (7.92 out of 9). Dyna AI had the lowest overall score (3.75 out of 9). Perplexity (3.00 out of 3), Grok (2.83 out of 3), and OpenEvidence (2.75 out of 3) had the highest scores for detail/clarity, while Dyna AI had the lowest for both detail/clarity (1.00 out of 3) and reliability/safety (1.00 out of 3). ChatGPT had the highest score for adherence to guidelines (2.75 out of 3) while Pi had the lowest (1.58 out of 3). Kruskal–Wallis test showed p < 0.05 across sub-score domains and composite scores. Conclusions: LLMs tended to adhere to clinical guidelines and provide detailed responses but often did not provide sources or instruct users to see a healthcare professional. There was notable variability in quality, and medicine-specific LLMs were not superior to popular LLMs. Full article
38 pages, 24838 KB  
Article
LLM-Driven Modeling and Decision Support Methods for Cross-Domain Collaborative Mission Systems
by Han Li, Dongji Li, Yunxiao Liu, Jinyu Ma, Guangyao Wang and Jianliang Ai
Appl. Syst. Innov. 2026, 9(4), 80; https://doi.org/10.3390/asi9040080 - 17 Apr 2026
Abstract
Cross-domain formations composed of Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vessels (USVs) are critical for maritime defense but face significant challenges in countering complex aerial threats and developing flexible, collaborative strategies. Addressing the limitations of traditional decision support systems in semantic understanding and dynamic adaptation, this paper proposes a novel Large Language Model (LLM)-driven decision support framework grounded in the Department of Defense Architecture Framework (DoDAF). By integrating Retrieval-Augmented Generation (RAG) with a domain-specific knowledge base, the framework enhances the LLM’s ability to align natural-language directives with standardized DoDAF view models, effectively mitigating hallucinations in tactical generation. The proposed framework coordinates a closed-loop process, using Petri net-based static logic verification to ensure structural consistency and Monte Carlo-based dynamic effectiveness evaluation to optimize the selection of kill chains. Experimental validations in a simulated UAV-USV maritime defense scenario demonstrate that the framework achieves 96.6% entity accuracy and 100% format compliance in model generation. In comparison, the generated cooperative kill chains significantly outperform non-cooperative methods by improving interception efficacy by approximately 26.08% under saturation attack conditions. This study develops an automated, interpretable workflow that transforms unstructured situational understanding into decision reporting, significantly enhancing the efficiency and reliability of cross-domain collaborative mission planning. Full article
(This article belongs to the Special Issue AI-Driven Decision Support for Systemic Innovation)
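The retrieval half of the RAG pipeline described above can be sketched with a bag-of-words ranker: score each knowledge-base snippet against the directive and pass the top-k to the LLM as grounding context. Real systems use dense embeddings; the cosine-over-token-counts version below, and the toy knowledge base, are simplifying assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query: str, docs, k: int = 2):
    """Rank knowledge-base snippets by bag-of-words cosine similarity
    before handing the top-k to the LLM as grounding context."""
    q = Counter(query.lower().split())
    ranked = sorted(
        docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True
    )
    return ranked[:k]

kb = [
    "UAV formation interception kill chain sequencing",
    "USV sensor maintenance schedule",
    "DoDAF operational view template for maritime defense",
]
top = retrieve("maritime defense kill chain for UAV formation", kb)
```

Grounding generation in retrieved snippets like these is what the abstract credits with mitigating hallucinations in tactical generation.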
23 pages, 2967 KB  
Article
SPARK_AI: A Prompt-Orchestrated Architecture for Stateful, Process-Oriented Reasoning with Large Language Models
by Marija Kaplar, Sebastijan Kaplar, Miloš Vučić, Lidija Ivanović, Aleksandra Stevanović, Aleksandar Milenković and Nemanja Vučićević
Informatics 2026, 13(4), 63; https://doi.org/10.3390/informatics13040063 - 17 Apr 2026
Abstract
This paper presents SPARK_AI, a prompt-orchestrated system architecture for governing how large language models (LLMs) conduct structured and adaptive reasoning in human–AI interaction. The framework mitigates ad hoc LLM use by replacing direct answer generation with a process-oriented, step-by-step reasoning workflow. We focus on SPARK_AI_MATH, a domain module that supports learners in solving non-routine problem-solving tasks by operationalizing well-established problem-solving phases and guided questioning dialog strategies (Socratic-style prompts), with an optional tool-mediated visualization layer (e.g., GeoGebra). The module implements a five-phase conversational protocol consisting of problem interpretation, analysis of givens, planning, execution, and reflection, together with a controlled hint policy. This design is realized through a stateful system architecture in which each problem instance is maintained as an independent interaction track with a persistent reasoning state. User acceptance was evaluated by first-year mechanical engineering students (N = 108) using an expanded Technology Acceptance Model instrument, and the results were analyzed via PLS-SEM. The findings indicate overall favorable perceptions, with perceived usefulness and learning support emerging as key predictors of intention for continued use. Beyond this specific domain, the SPARK_AI framework enables efficient domain adaptation through localized prompt strategies while preserving a shared cognitive control layer for reasoning-centered human–LLM interaction. Full article
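The stateful, per-problem track with a five-phase protocol and a controlled hint policy can be sketched as a small state machine. Phase names follow the abstract; the hint cap, method names, and everything else are illustrative assumptions.

```python
PHASES = ["interpretation", "givens", "planning", "execution", "reflection"]

class ProblemTrack:
    """One problem instance as an independent interaction track with a
    persistent reasoning state, per the five-phase protocol above."""

    def __init__(self, problem: str, max_hints: int = 3):
        self.problem = problem
        self.phase_idx = 0
        self.hints_used = 0
        self.max_hints = max_hints

    @property
    def phase(self) -> str:
        return PHASES[self.phase_idx]

    def advance(self) -> str:
        """Move to the next phase; stay at reflection once reached."""
        self.phase_idx = min(self.phase_idx + 1, len(PHASES) - 1)
        return self.phase

    def request_hint(self):
        """Controlled hint policy: a hard cap, never a direct answer."""
        if self.hints_used >= self.max_hints:
            return None
        self.hints_used += 1
        return f"hint {self.hints_used} for phase '{self.phase}'"

track = ProblemTrack("Find all x with |x - 1| + |x + 1| = 4.")
```

Because each problem owns its own track, a learner can switch between problems without the phases or hint budgets interfering.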
22 pages, 5917 KB  
Review
Mapping Research on Virtual Reality for Balance, Coordination, and Motor Rehabilitation: A Bibliometric Analysis with Topic Modeling
by Hongfei Zhang, Wenjun Hu, Qing Zhang, Man Jiang and Jakub Kortas
Healthcare 2026, 14(8), 1067; https://doi.org/10.3390/healthcare14081067 - 17 Apr 2026
Abstract
Virtual reality (VR) has been increasingly adopted as a digital tool in rehabilitation for balance training, coordination improvement, and motor recovery, yet the literature remains dispersed across clinical rehabilitation, exercise-based interventions, and broader motor-related applications. This fragmentation makes it difficult to determine how the field has evolved and where research emphasis has shifted. This study mapped the research landscape and thematic evolution of VR for balance, coordination, and motor rehabilitation using bibliometric analysis and topic modeling. A total of 1258 articles indexed in the Web of Science Core Collection from 2011 to 2025 were analyzed. Only English language articles and reviews relevant to VR-based balance, coordination, or motor rehabilitation research were included, yielding a final dataset of 1258 publications. CiteSpace and VOSviewer were used to examine keyword co-occurrence, clustering patterns, and temporal trends, while Latent Dirichlet Allocation (LDA) was applied to identify latent themes and their temporal dynamics. The field has moved beyond early feasibility testing toward a more differentiated landscape shaped by distinct clinical targets, population groups, and training purposes. Seven recurring themes were identified, including vestibular rehabilitation and immersive training, post-stroke upper-limb rehabilitation, efficacy and adverse-effect assessment, balance and gait training interventions, evidence synthesis and review-based evaluation, elderly exercise and cognitive interventions, and skill-oriented virtual task training with recent expansion toward broader population groups and task-specific applications beyond traditional rehabilitation settings. VR research on balance, coordination, and motor rehabilitation has evolved into a more thematically differentiated field rather than remaining a single rehabilitation-oriented domain. 
By combining bibliometric mapping with topic modeling, this study clarifies where evidence is concentrated and which thematic directions are gaining visibility, providing a clearer basis for future evidence synthesis and more comparable intervention reporting. Full article
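The keyword co-occurrence analysis mentioned above reduces to counting how often keyword pairs appear together across articles; that pair matrix is what tools like VOSviewer cluster and visualize. The toy records below are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def keyword_cooccurrence(records):
    """Count co-occurring keyword pairs across article keyword lists,
    the raw data behind a keyword co-occurrence map."""
    pairs = Counter()
    for keywords in records:
        # sort so each unordered pair is counted under one canonical key
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

records = [
    ["virtual reality", "stroke", "upper limb"],
    ["virtual reality", "balance", "gait"],
    ["virtual reality", "stroke", "balance"],
]
pairs = keyword_cooccurrence(records)
```

Frequent pairs become edges in the co-occurrence network; clustering that network is one route to the thematic groupings the study reports (LDA, which it also uses, works on full abstracts instead).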
17 pages, 3629 KB  
Article
Toward Auditable Urban Soil Management: A Knowledge Graph and LLM Approach Fusing Environmental and Geochemical Data
by Xi Qin, Yanlin Tang, Yirong Deng, Meiqu Lu, Wenqiang He, Jinrui Song, Keyu Lin and Feng Han
Appl. Sci. 2026, 16(8), 3895; https://doi.org/10.3390/app16083895 - 17 Apr 2026
Abstract
Urban soil contamination poses persistent risks to redevelopment, public health, and ecological restoration, yet actionable evidence is scattered across site investigation reports, monitoring databases, and regulatory documents. Existing decision-support tools often depend on manual searches and provide limited structured reasoning. This study develops a domain knowledge graph (KG) and a KG-powered question-answering (KBQA) system for urban soil management to organize multi-source evidence and deliver precise, auditable answers to parcel- and pollutant-specific queries. The approach (1) defines an urban soil ontology covering parcels, land uses, pollutants, measurements, pathways, and regulatory thresholds; (2) extracts and links entities and relations from textual and tabular sources; (3) constructs a graph database with provenance; and (4) implements a KBQA pipeline that maps natural-language questions to constrained graph queries and verbalizes results with citations. The resulting system supports source identification, land-use-specific exceedance checks, affected-parcel listing, and remediation reference retrieval. Experiments on a curated QA set and a South China case study show higher answer accuracy and lower latency than text-only baselines, while consistently returning traceable evidence and reducing cross-document lookup effort. Compared to text-only RAG baselines, the KG-powered system achieved a 0.14 improvement in Exact Match scores (e.g., 0.81 vs. 0.58 for Threshold tasks) and maintained a competitive median latency of 0.75 s. The pipeline utilizes a 13B-parameter instruction-tuned LLM. The ontology, schema, benchmark QA sets, and sample queries are publicly released to support transfer to other regions. Full article
(This article belongs to the Topic Big Data and AI for Geoscience)
21 pages, 635 KB  
Article
Agentic Hallucination Risk Scoring for Medical LLMs via Uncertainty Quantification and Clinical Knowledge Injection
by Mayank Kapadia and Mohammad Masum
Algorithms 2026, 19(4), 315; https://doi.org/10.3390/a19040315 - 17 Apr 2026
Abstract
Large Language Models (LLMs) have witnessed significant adoption across numerous domains since 2020, but their proclivity to hallucinate creates unacceptable dangers in high-risk environments like healthcare, where wrong outputs can directly jeopardize human safety. While present systems focus on pre-generation mitigation strategies, they cannot ensure the safety of individual outputs during inference. We provide a post hoc Hallucination Risk Scoring (HRS) methodology that intercepts questionable outputs before they reach patients via an agentic pipeline. Given a medical question, a domain-specific LLM generates an initial response from which five complementary uncertainty signals are computed, which are then separated into a decision layer that governs escalation and a guidance layer that directs clinical knowledge injection by a GPT. The framework is tested using three biomedical question-answering datasets of various complexity: PubMedQA-Labeled, PubMedQA-Artificial, and BioASQ Task B. The results show a safety increase of up to 38% at the most sensitive threshold configuration, zero deterioration across all experimental configurations enforced by the Revert Baseline method, and complexity-aware escalation rates that scale organically with dataset difficulty. Tunable thresholds allow physicians to calibrate system behavior based on deployment requirements, providing a practical safety–accuracy trade-off. Statistical analysis identifies entropy as the primary uncertainty signal separating escalated from non-escalated cases across all datasets. These findings provide a deployable, interpretable, and configurable post hoc safety paradigm for reliable medical AI implementation. Full article
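The abstract names entropy as the primary uncertainty signal driving escalation. A minimal sketch of that idea, assuming access to per-token output probability distributions and a tunable escalation threshold (both hypothetical details for illustration), might look like this:

```python
import math

def sequence_entropy(token_distributions):
    """Mean Shannon entropy (in nats) over per-token probability
    distributions; higher values indicate a less confident generation."""
    total = 0.0
    for dist in token_distributions:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(token_distributions)

def should_escalate(token_distributions, threshold=1.0):
    """Escalate for review/knowledge injection when mean entropy exceeds
    the tunable threshold. The threshold value here is arbitrary."""
    return sequence_entropy(token_distributions) > threshold

# A confident token distribution yields low entropy; a near-uniform one
# (e.g. four options at 0.25 each, entropy = ln 4 ~ 1.386) triggers escalation.
confident = [[0.99, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
```

In a deployment like the one described, the threshold would be the knob physicians tune to trade escalation rate against coverage.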
(This article belongs to the Special Issue Evolution of Algorithms in the Era of Generative AI)
20 pages, 991 KB  
Article
Collaborative Multi-Agent Method for Zero-Shot LLM-Generated Text Detection
by Gang Sun, Bowen Li, Ying Zhou, Yi Zhu and Jipeng Qiang
Informatics 2026, 13(4), 62; https://doi.org/10.3390/informatics13040062 - 16 Apr 2026
Abstract
With the rapid proliferation of large language models (LLMs), distinguishing machine-generated text from human-authored content has become increasingly critical for ensuring content authenticity, academic integrity, and trust in information systems. However, detecting text generated by LLMs remains a challenging problem, particularly in zero-shot settings where labeled data and domain-specific tuning are unavailable. To address this challenge, in this paper, we propose a novel Collaborative Multi-Agent Zero-Shot Detection framework (CMA-ZSD). In contrast to existing methods based on watermarking, statistical heuristics, or neural classifiers, our CMA-ZSD employs three functionally heterogeneous agents that perform differentiated perturbations of the input text. By jointly modeling semantic consistency, grammatical normalization, and feature-level reconstruction, our method captures intrinsic asymmetries between human-authored and LLM-generated text. A semantic similarity evaluation mechanism, combined with majority voting, enables robust and interpretable detection decisions that balance individual agent autonomy with collective consensus. Extensive experiments across 11 domains demonstrate the effectiveness of our method, with its zero-shot detection achieving accuracy comparable to domain-finetuned models in specific domains such as Finance and Reddit-dli5. Full article
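The decision step the abstract describes, combining per-agent similarity evaluation with majority voting, can be sketched as follows. The similarity threshold, vote labels, and the assumption that LLM-generated text stays more similar to itself under perturbation are illustrative simplifications, not the paper's exact formulation:

```python
# Hypothetical sketch of the voting layer: each of the three agents perturbs
# the input text and reports a semantic similarity between the original and
# its perturbed version. Under the asymmetry assumption, machine-generated
# text tends to survive perturbation with higher similarity.

def agent_vote(similarity, threshold=0.85):
    """One agent's verdict from its perturbation-similarity score.
    The 0.85 threshold is an arbitrary placeholder."""
    return "machine" if similarity >= threshold else "human"

def majority_vote(agent_similarities, threshold=0.85):
    """Aggregate three heterogeneous agents' verdicts by simple majority,
    balancing individual agent autonomy with collective consensus."""
    votes = [agent_vote(s, threshold) for s in agent_similarities]
    return max(set(votes), key=votes.count)

# Two of three agents see high post-perturbation similarity -> "machine".
label = majority_vote([0.92, 0.95, 0.70])
```

Because each agent applies a different perturbation (semantic, grammatical, feature-level in the paper's framing), the vote is more robust than any single signal, and the per-agent verdicts keep the final decision interpretable.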
(This article belongs to the Section Big Data Mining and Analytics)