Search Results (390)

Search Parameters:
Keywords = language style

17 pages, 1283 KB  
Article
LedgerRAG: Governance-Driven Agentic Chain of Retrieval for Dynamic Knowledge Scenarios
by Siwei Wang, Yangsen Zhang, Yalong Guo and Jing Kang
Electronics 2026, 15(7), 1376; https://doi.org/10.3390/electronics15071376 - 26 Mar 2026
Viewed by 39
Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) with external evidence. Dynamic knowledge tasks, however, require systems to decide not only what to retrieve but also when to refresh, how to arbitrate conflicts, and how to preserve an auditable record of the evidence used to answer a query. We present LedgerRAG, a trigger-aware retrieval chain framework that maintains an explicit claim-level evidence ledger and uses coverage, temporal validity, authority, and conflict signals to control retrieval, refresh, and stopping decisions. We expand the evaluation with a query-level BM25 baseline, a dense retriever setting, and task-aligned proxy baselines representing graph-style retrieval, temporal-only retrieval, and conflict-focused retrieval. The revised results show that LedgerRAG’s clearest advantage lies in conflict governance and auditable evidence control, achieving near-perfect conflict adjudication (CRAcc = 0.993) under authority-aware routing while yielding more modest gains and explicit trade-offs in regulation-change and streaming settings. Full article
(This article belongs to the Section Computer Science & Engineering)
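The control signals this abstract describes (coverage, temporal validity, authority, conflict) can be pictured as a small decision policy over ledger entries. The sketch below is purely hypothetical: the field names, thresholds, and action names are illustrative assumptions, not LedgerRAG's actual schema.

```python
from dataclasses import dataclass

# Hypothetical claim-level evidence ledger entry; field names and
# thresholds are illustrative, not LedgerRAG's actual schema.
@dataclass
class LedgerEntry:
    claim: str
    coverage: float     # fraction of the claim supported by retrieved evidence
    age_days: int       # time since the supporting evidence was retrieved
    authority: float    # trust score of the evidence source, in [0, 1]
    conflicting: bool   # whether another source contradicts this claim

def next_action(e: LedgerEntry, max_age: int = 30) -> str:
    """Map ledger signals to a retrieval-chain decision (illustrative policy)."""
    if e.conflicting:
        return "adjudicate"   # route to authority-aware conflict resolution
    if e.coverage < 0.8:
        return "retrieve"     # evidence gap: keep retrieving
    if e.age_days > max_age:
        return "refresh"      # evidence may be stale: re-fetch
    return "stop"             # sufficient, fresh, uncontested evidence

entry = LedgerEntry("Policy X took effect in 2025", coverage=0.9,
                    age_days=45, authority=0.7, conflicting=False)
print(next_action(entry))  # refresh
```

The point of such a policy is that stopping and refreshing become explicit, auditable decisions rather than side effects of a fixed retrieval budget.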

28 pages, 4780 KB  
Article
Retrieval over Response: Large Language Model-Augmented Decision Strategies for Hierarchical Wildfire Risk Evaluation
by Yuheng Cheng, Yuchen Lin, Yanwei Wu, Lida Huang, Tao Chen, Wenguo Weng and Xiaole Zhang
Fire 2026, 9(4), 143; https://doi.org/10.3390/fire9040143 - 26 Mar 2026
Viewed by 131
Abstract
The Analytic Hierarchy Process (AHP) is widely used in Multi-Criteria Decision Analysis (MCDA), yet its strong reliance on expert judgment constrains its scalability and may introduce variability in weighting outcomes, particularly in high-stakes applications such as wildfire risk assessment. In this study, we investigate how Large Language Models (LLMs) can function as decision-support agents in an AHP-style hierarchical evaluation task derived from validated wildfire literature. Based on this structure, four representative LLM-assisted strategies are examined: Direct LLM Scoring (DLS), Multi-Model Debate Scoring (MDS), Full-Document Prompting (FDP), and Indicator-Guided Prompting (IGP). To evaluate their effectiveness, we benchmark LLM-generated rankings against expert-defined ground truth across 16 sub-criteria. Using the mean correlation coefficient R as the key evaluation metric, with reported values expressed as mean ± standard deviation across models: DLS shows no correlation with expert rankings (R = 0.009 ± 0.070), MDS yields marginal gains (R = 0.181), and FDP remains unstable (R = 0.081 ± 0.189). By contrast, IGP, which incorporates retrieval-informed structured prompting, shows the highest agreement with the expert reference among the four compared strategies (R = 0.598 ± 0.065), suggesting that structured contextual guidance may improve the performance of LLM-assisted weighting within the evaluated benchmark. This study suggests that, within the evaluated wildfire benchmark and the tested set of hosted LLMs, LLMs may serve as useful decision-support tools in MCDA tasks when guided by structured inputs or coordinated through multi-agent mechanisms. The proposed framework provides an interpretable basis for exploring LLM-assisted risk evaluation in the present wildfire benchmark, while further validation is needed before extending it to other environmental or safety-critical contexts. Full article
(This article belongs to the Special Issue Fire Risk Management and Emergency Prevention)
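The evaluation metric R in this abstract is a correlation between LLM-derived and expert rankings. A minimal sketch, assuming Spearman's rho over tie-free scores (the paper's exact coefficient and data are not reproduced here; the weight values below are invented):

```python
def spearman_r(xs, ys):
    """Spearman rank correlation (assumes no ties, for brevity)."""
    n = len(xs)
    rank = lambda v: {x: i for i, x in enumerate(sorted(v))}
    rx, ry = rank(xs), rank(ys)
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Invented sub-criteria weights: expert-defined ground truth vs. LLM-assigned
expert = [0.30, 0.25, 0.20, 0.15, 0.10]
llm    = [0.28, 0.20, 0.26, 0.15, 0.11]
print(round(spearman_r(expert, llm), 3))  # 0.9
```

High R means the LLM ranks the sub-criteria in nearly the same order as the experts, which is what the IGP strategy approaches (R = 0.598 ± 0.065) while DLS does not (R = 0.009 ± 0.070).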

24 pages, 55161 KB  
Article
Navigating the Future: A Design Fiction Study on User Perceptions of Next-Gen LLM-Based Voice Interaction
by Biju Thankachan, Deepak Akkil, Sama Rahman, Kristiina Jokinen and Markku Turunen
Multimodal Technol. Interact. 2026, 10(3), 31; https://doi.org/10.3390/mti10030031 - 20 Mar 2026
Viewed by 209
Abstract
Voice user interfaces (VUIs) have evolved from simple command-based systems to more advanced platforms capable of engaging in complex, multi-turn conversations. While current VUIs primarily perform routine tasks, their future trajectory is poised to be significantly shaped by advancements in large language models (LLMs), enhancing their language understanding and human-like interaction capabilities. This study explores user perceptions of next-generation VUIs using a design fiction approach. We crafted five plausible future scenarios, depicted in comic-style formats, showcasing diverse VUI use-cases. Results from the focus group discussions reveal valuable insights highlighting the potential and challenges of integrating advanced VUIs into everyday interactions. Our results highlight the importance of building trust, factors influencing trust, social aspects and implications of technology, preferences for interaction techniques, and various ethical considerations associated with technology. We conclude by providing design guidelines for future VUIs, emphasizing the need for designing to build trust, the importance of domain specificity, the importance of enabling social experiences mediated via VUIs, and more. Full article

12 pages, 290 KB  
Article
The Linguistic Method of Abraham Joshua Heschel: Interpretative, Linguistic, and Cognitive Aspects
by Yonatan Karish
Religions 2026, 17(3), 394; https://doi.org/10.3390/rel17030394 - 20 Mar 2026
Viewed by 173
Abstract
Abraham Joshua Heschel proposed a linguistic method that he applied in his interpretation of biblical texts and rabbinic teachings. A central feature of this method is the reinterpretation of certain terms beyond their direct, literal meaning. While this approach is rooted in earlier traditions, Heschel gave it a distinct conceptual formulation and regarded it as a key component of his theological vision. This article articulates its structure and explores how it may be understood through the lens of contemporary research on creative language. To that end, the article compares Heschel’s view with selected philosophical and theological models and introduces cognitive tools, such as metaphor theory and semantic networks, that may support a more systematic understanding of his exegetical style. The aim is not only to deepen our comprehension of Heschel’s linguistic method, but also to propose a path toward advancing his broader vision through the integration of traditional thought and modern research. Full article
(This article belongs to the Special Issue Modern Jewish Thought and Philosophy)
16 pages, 317 KB  
Article
Veiled Expressions of the Sacred: Ghazal, Genre, and Mystical Experience in Neshāṭī’s Poetry
by Muhammed Tarik Ablak
Religions 2026, 17(3), 371; https://doi.org/10.3390/rel17030371 - 16 Mar 2026
Viewed by 212
Abstract
This article examines how religious experience is articulated through genre in the poetry of the seventeenth-century Ottoman Mawlawī shaykh Neshāṭī (d. 1674), focusing on the striking contrast between his ghazals and non-ghazal compositions. While Neshāṭī’s qaṣīdas, mathnawīs and other formal genres employ an explicit and direct religious language—addressing God, the Prophet, sacred figures, and doctrinal themes—his ghazals are dominated by imagery of wine, love, and the beloved, which at first glance appears markedly profane. Rather than reading this contrast as a sign of secularization or doctrinal inconsistency, the article argues that it reflects a conscious poetic strategy shaped by the expressive style of the ghazal. Through a close reading of Neshāṭī’s Dīwān, the study demonstrates that religious meaning in ghazals is not absent but deliberately rendered implicit. Drawing on motifs such as the mirror, secret (sirr), annihilation (fanāʾ fīʾllāh), and states of spiritual contraction, Neshāṭī transforms the language of human love into a vehicle for divine experience. In this context, the ghazal emerges as a genre particularly suited to conveying religious meaning through ambiguity, emotional intensity, and symbolic indirection rather than overt doctrinal exposition. By situating Neshāṭī within both the Mawlawī tradition and the aesthetics of Sabk-i Hindī, this article highlights how genre manifests religious expression in Ottoman poetry. It proposes that divine encounter in Neshāṭī’s work is realized less through explicit theological discourse than through the affective and symbolic potential of the ghazal. In doing so, the study offers a new reading of Neshāṭī’s poetry and contributes to broader discussions on the relationship between literary/lyrical genre, mysticism, and religious experience in Islamic literary traditions. Full article
(This article belongs to the Special Issue Divine Encounters: Exploring Religious Themes in Literature)
21 pages, 3762 KB  
Article
Multimodal Large Language Models for Visual Attribute Inference in iRAP Road Attribute Coding
by Horia Ameen, Natchapon Jongwiriyanurak, Jesús Balado and Mario Soilan
Infrastructures 2026, 11(3), 95; https://doi.org/10.3390/infrastructures11030095 - 12 Mar 2026
Viewed by 309
Abstract
Road safety assessment is essential for reducing traffic fatalities, with road infrastructure contributing to a substantial proportion of crashes worldwide. International frameworks such as the International Road Assessment Program (iRAP) define standardized attributes for infrastructure auditing; however, many of these attributes remain challenging to automate using imagery alone. This study evaluates V-RoAst (visual question answering for road assessment), a public dataset of road images that are annotated with iRAP-style attributes, using state-of-the-art multimodal large language models (MLLMs), specifically Gemini 2.0 and Gemini 2.5. The analysis focuses on how prompt design influences the accuracy and stability of single image iRAP inference. A token-efficient reduced prompt is developed that preserves the iRAP schema while removing single-class constants, hard-coded administrative fields, and derived or non-visual codes, retaining only visually interpretable attributes. Performance is compared with the original full multi-attribute prompt and single attribute prompts using a fixed evaluation protocol incorporating majority voting, bootstrap 95% confidence intervals, and per-code sample-size checks. Results indicate only minor performance differences between Gemini 2.0 and Gemini 2.5, while prompt optimization produces the most consistent gains, improving macro-F1 scores and tightening confidence intervals for visually grounded attributes such as roadside severity, intersection channelization, and service-road presence. Token analysis shows an approximate 30% reduction in prompt length, reducing computational cost and truncation risk. Overall, the findings demonstrate that prompt scope has a greater impact than model version in image-only iRAP coding, offering practical guidance for scalable infrastructure assessment. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Infrastructures)
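The evaluation protocol named in the abstract (macro-F1 with bootstrap 95% confidence intervals) can be sketched in a few lines. The labels, resample count, and seed below are illustrative choices, not the paper's configuration or data:

```python
import random

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

def bootstrap_ci(y_true, y_pred, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for macro-F1 (illustrative protocol)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(macro_f1([y_true[i] for i in idx],
                              [y_pred[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot)]

# Invented per-image attribute codes, not iRAP data
y_true = ["severe", "severe", "mild", "mild", "mild"]
y_pred = ["severe", "mild",   "mild", "mild", "mild"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.762
```

A tighter bootstrap interval is what the abstract means by prompt optimization "tightening confidence intervals": the metric varies less across resamples of the evaluation set.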

18 pages, 2341 KB  
Article
Structure-Aware Lightweight Document-Level Event Extraction via Code-Based Large Language Models
by Xing Xu, Jianbin Zhao, Pengfei Zhang, Yaduo Liu, Bingyang Yu, Puyuan Zheng, Dingyuan Hu, Zhongchen Deng, Ping Zong, Guoxin Zhang, Zhonghong Ou, Meina Song and Yifan Zhu
Electronics 2026, 15(6), 1187; https://doi.org/10.3390/electronics15061187 - 12 Mar 2026
Viewed by 272
Abstract
Document-level Event Extraction (DEE) requires identifying complex event records and arguments dispersed across unstructured texts. However, applying general Large Language Models (LLMs) to DEE is intrinsically hindered by their lack of inductive bias for rigid structural constraints, often leading to schema violations and suboptimal performance in complex structural prediction tasks. To address this, we propose the Structure-Aware Lightweight DEE, termed SALE, which leverages the structural reasoning potential of Code-Based LLMs (Code-LLMs) as a favorable inductive preference. We leverage the natural isomorphism between event schemas and programming object definitions, formulating event extraction as a Python 3.9 class instantiation task to bridge the gap between semantic understanding and structural adherence. Specifically, SALE employs a novel two-stage training paradigm: First, a Structure-Aware Fine-tuning stage injects general structural knowledge via diverse code-style instruction tasks derived from broad Information Extraction (IE) datasets; second, an Event Extraction Alignment stage utilizes a reward-based alignment loss—optimized via policy gradient—to adapt this capability to document-level intricacies. The effectiveness of SALE stems from the synergy between its structure-aware prompting and the specialized alignment stage built on a code-oriented backbone. Extensive experiments on established news-domain benchmarks (RAMS and WikiEvents) demonstrate that our approach significantly outperforms representative supervised and general LLM baselines in cross-task zero-shot and few-shot transfer settings (e.g., surpassing supervised baselines by over 7% in F1 score). Furthermore, SALE maintains a highly efficient inference profile and parameter-efficient footprint, offering a practical and scalable solution for vertical domain applications. Full article
(This article belongs to the Section Artificial Intelligence)

33 pages, 2576 KB  
Article
ExamQ-Gen: Instructor-in-the-Loop Generation of Self-Contained Exam Questions from Course Materials and Decision-Support Grading
by Catalin Anghel, Emilia Pecheanu, Andreea Alexandra Anghel, Marian Viorel Craciun and Adina Cocu
Computers 2026, 15(3), 177; https://doi.org/10.3390/computers15030177 - 9 Mar 2026
Viewed by 223
Abstract
Reliable evaluation of large language models (LLMs) for educational use requires benchmarks that reflect exam constraints, instructor grading practices, and the operational consequences of thresholded decisions. This paper introduces ExamQ-Gen, an instructor-in-the-loop benchmark that couples two tasks: (i) an LLM answering university-style exam questions and (ii) decision-support grading aligned with an instructor reference. Automatic grading is used for triage and feedback; in practice, ExamQ-Gen supports instructor-led exam authoring and provides grading recommendations, while the instructor issues the final grade and pass/fail decision. ExamQ-Gen is constructed from the course content by using an LLM to generate exam-style questions directly from the lecture materials, producing a course-derived question set suitable for controlled experimentation. The benchmark then instantiates contrasting exam conditions, including instructor-authored (HUMAN) versus pipeline-generated (PIPELINE) artifacts, to evaluate robustness under distribution shifts that can occur when exam questions and answers are produced through different generation workflows. Using two LLM “students” (Llama3-8B-Instruct and Mistral-7B-Instruct) and an LLM-based grader, we compare automatic grading against an instructor reference on a 1–10 score scale and at the decision level induced by the operational pass policy (pass if score ≥ 9). Accordingly, our conclusions are conditioned on the two evaluated student models. Score-level agreement is strong under HUMAN conditions but degrades substantially under PIPELINE conditions, indicating condition-dependent stability. At the pass threshold, decision errors are highly asymmetric, with false fails dominating false passes, meaning that conservative grading may appear safe while producing credit denial. A severity-focused analysis isolates a high-stakes failure mode—denial of instructor-perfect answers—and shows that, in the most affected PIPELINE condition, the perfect-pass miss rate reaches 0.926 (50/54), consistent with systematic conservatism rather than borderline noise. Overall, the results highlight that aggregate score agreement and accuracy are insufficient for instructor-controlled exam deployment and motivate reporting practices that combine disaggregated score agreement, threshold-based error asymmetry with uncertainty, and severity-aware diagnostics under exam-relevant condition shifts. Full article
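The threshold-based error asymmetry this abstract reports can be made concrete with a small sketch. The pass policy (pass if score ≥ 9) follows the abstract; the score values are invented for illustration, not the paper's data:

```python
def decision_errors(ref_scores, auto_scores, pass_threshold=9):
    """Split grader disagreements into false fails and false passes
    at the operational pass policy (pass if score >= threshold)."""
    false_fail = false_pass = 0
    for ref, auto in zip(ref_scores, auto_scores):
        ref_pass, auto_pass = ref >= pass_threshold, auto >= pass_threshold
        if ref_pass and not auto_pass:
            false_fail += 1   # instructor would pass, automatic grader fails
        elif auto_pass and not ref_pass:
            false_pass += 1   # automatic grader passes, instructor would fail
    return false_fail, false_pass

# Invented instructor-reference vs. automatic scores on the 1-10 scale
ref  = [10, 9, 9, 8, 10, 7]
auto = [ 8, 9, 7, 8,  9, 9]
print(decision_errors(ref, auto))  # (2, 1)
```

Note that the two graders can agree closely on average scores while still disagreeing sharply at the threshold, which is why the abstract argues that score-level agreement alone is insufficient.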

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Viewed by 224
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes. Full article
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
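The zero-shot mechanism described here, predicting withheld classes from their textual descriptions in a shared embedding space, reduces at inference time to nearest-neighbor search under cosine similarity. A toy sketch with invented three-dimensional embeddings (the model's actual encoders produce much higher-dimensional vectors):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_predict(image_emb, text_embs):
    """Assign the class whose textual description embeds closest to the
    image in the shared space (toy vectors, not real encoder outputs)."""
    return max(text_embs, key=lambda label: cosine(image_emb, text_embs[label]))

# Invented shared-space embeddings for two unseen letter classes
text_embs = {"alif": [1.0, 0.1, 0.0], "ba": [0.0, 1.0, 0.2]}
print(zero_shot_predict([0.9, 0.2, 0.1], text_embs))  # alif
```

This is why classes withheld during training remain predictable: only their textual embeddings are needed at test time, no gesture examples.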

41 pages, 2707 KB  
Article
Prompt Engineering and Multimodal Tasks in AI-Supported EFL Education: A Mixed Methods Study
by Debopriyo Roy, George F. Fragulis and Adya Surbhi
Sustainability 2026, 18(5), 2415; https://doi.org/10.3390/su18052415 - 2 Mar 2026
Viewed by 463
Abstract
The rapid integration of artificial intelligence (AI) into higher education is reshaping how learners develop academic, linguistic, and research competencies. This mixed-methods study examines how second-year EFL computer science students employ prompt engineering techniques across four task domains—research summarization, academic video note-taking, style transformation, and concept mapping—within a smart learning environment. Sixty-nine students completed a structured survey requiring AI-assisted draft generation followed by student-led revision. Quantitative analyses included descriptive statistics, chi-square tests, Cramer’s V, t-tests, ANOVA, Kruskal–Wallis tests, and three text-similarity measures (cosine, Jaccard, and Levenshtein). Qualitative evidence was drawn from students’ revised outputs and reflective responses. Results indicate that students consistently preserved semantic meaning while significantly rephrasing AI-generated text, demonstrating moderate conceptual alignment but substantial lexical and structural transformation. Frequent AI users reported stronger search and revision skills, but prompt type had little effect on revision depth or learning outcomes. Iterative prompting and revision emerged as central drivers of metacognitive growth, academic language development, and sustainable learning behaviors. Across tasks, students viewed AI prompts as effective scaffolds for organizing information and synthesizing multimodal input, though reliance varied by learner. The findings underscore that sustainable AI use in EFL technical education depends not on AI output alone, but on structured prompting, iterative human revision, and critical engagement—practices that cultivate autonomy, digital literacy, and long-term academic resilience. Full article
(This article belongs to the Special Issue AI for Sustainable and Creative Learning in Education)
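The three text-similarity measures named in this abstract can be sketched in a few lines. The granularity choices here (whitespace tokens for cosine and Jaccard, characters for Levenshtein) are assumptions, since the paper's exact configuration is not given in the abstract:

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity over token-count vectors."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[t] * cb[t] for t in ca)
    return dot / (math.sqrt(sum(v * v for v in ca.values())) *
                  math.sqrt(sum(v * v for v in cb.values())))

def jaccard_sim(a, b):
    """Jaccard similarity over token sets."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def levenshtein(a, b):
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(levenshtein("revise", "revised"))  # 1
```

Read together, the measures separate the study's finding: high cosine (meaning preserved) alongside lower Jaccard and large Levenshtein distance (substantial lexical and structural rewriting).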

33 pages, 900 KB  
Article
Limits of Computational Selection and Their Implications for Human–AI Divergence in Convergent Creativity
by Sungwook Jung and Ken Nah
Information 2026, 17(3), 243; https://doi.org/10.3390/info17030243 - 2 Mar 2026
Viewed by 322
Abstract
This study investigated whether humans and generative Large Language Models (LLMs) exhibit similar performance in divergent ideation but diverge in convergent selection. To address the critical oversight in current AI creativity research, which predominantly focuses on generative output, this study introduces the original conceptual framework of ‘Selection Alignment’ and a ‘novel dual-phase experimental protocol.’ This research transcends traditional generation-centric evaluations to establish a new paradigm for assessing the evaluative stage of creativity. A controlled experiment involved 240 design professionals (120 idea generators, 120 independent selectors) and two LLM agents (GPT-4o, Gemini 1.5 Pro). Participants and LLMs responded to identical divergent prompts, including 10 Alternative Uses Task-style prompts and 10 design problems. Both humans and LLMs generated candidate idea pools, then performed convergent selection by choosing the top five items per prompt. Idea generation was evaluated based on Fluency, Flexibility, and Semantic Breadth. Selection outcomes were compared using top-5 overlap rates derived from semantic clustering. The results indicated near-parity in generation metrics, showing no statistically significant differences between human and AI outputs. However, a substantial divergence was observed in convergent selection: the mean human–AI top-5 overlap was 19.2% for Model-A and 22.4% for Model-B, both significantly below permutation-based chance levels (null mean overlap ≈ 35%). AI selections were strongly predicted by embedding- and probability-based metrics, while human choices were better predicted by context- and experience-based criteria, highlighting a fundamental mechanistic divide. This suggests that convergent selection amplifies human–AI divergence, carrying significant implications for designing co-creative interfaces that integrate human experience into AI’s selection mechanisms. Full article
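The top-5 overlap comparison against a permutation-based chance level can be sketched as follows. The pool size and permutation count are illustrative (the abstract's ≈35% null implies a different candidate-pool size than the toy value used here):

```python
import random

def overlap_rate(top_a, top_b):
    """Fraction of shared items between two top-k selections."""
    return len(set(top_a) & set(top_b)) / len(top_a)

def permutation_null(pool_size, k, n_perm=10000, seed=0):
    """Mean overlap of two independent random top-k picks from the pool:
    the chance level against which observed overlap is compared."""
    rng = random.Random(seed)
    pool = range(pool_size)
    total = sum(overlap_rate(rng.sample(pool, k), rng.sample(pool, k))
                for _ in range(n_perm))
    return total / n_perm

# Invented idea labels, not the study's candidate pools
human = ["idea1", "idea2", "idea3", "idea4", "idea5"]
ai    = ["idea3", "idea4", "idea5", "idea6", "idea7"]
print(overlap_rate(human, ai))  # 0.6
```

Observed overlap below the permutation null, as the study reports (19.2% and 22.4% vs. ≈35%), indicates the two selectors agree less than two random pickers would, i.e., systematic divergence rather than mere noise.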

17 pages, 702 KB  
Article
Leveraging AI to Mitigate Learning Poverty in the Digital Era: The Impacts of Integrated AI Educational Tools on Students’ Literacy Skills
by Yirga Yayeh Munaye, Mekuriaw Genanew Asratie, Bantalem Derseh Wale and Demeke Siltan Adane
AI 2026, 7(3), 84; https://doi.org/10.3390/ai7030084 - 2 Mar 2026
Viewed by 481
Abstract
Technological innovation plays a crucial role in improving educational quality worldwide. In Ethiopia, however, literacy skills face significant obstacles, worsening the problem of learning poverty. This study aimed to analyze the effects of integrated AI educational tools on students’ literacy development. It also explored how learners perceived the use of these tools in reading and writing instruction. A quasi-experimental single-group time series design, combining both quantitative and qualitative approaches, was used. A total of 46 students from the Information Technology department at Injibara University were selected through a comprehensive census sampling method. For a period of three months, participants received reading and writing lessons supported by AI tools (NoRedInk, Rewordify v2.1.0, and LanguageTool 9.5.0) to assess their impact on literacy skills. Data collection included pre- and post-tests, focus group discussions, and reflective journals. Quantitative data were analyzed with ANOVA, and qualitative data were analyzed thematically. Results revealed that the integration of AI educational tools significantly enhanced students’ literacy skills, including grammar, vocabulary, comprehension, content organization, and writing style. Students also expressed positive perceptions of using these tools in their reading and writing lessons. Therefore, this study encourages scholars, educators, and learners to adopt integrated AI educational tools to improve literacy development. Full article

31 pages, 1230 KB  
Review
A Review of Multi-Agent AI Systems for Biological and Clinical Data Analysis
by Jackson Spieser, Ali Balapour, Jarek Meller, Krushna C. Patra and Behrouz Shamsaei
Methods Protoc. 2026, 9(2), 33; https://doi.org/10.3390/mps9020033 - 28 Feb 2026
Viewed by 689
Abstract
This review evaluates the emerging paradigm of multi-agent systems (MASs) for biomedical and clinical data analysis, focusing on their ability to overcome the reasoning and reliability limitations of standalone large language models (LLMs). We synthesize findings from recent architectural frameworks, specifically LangGraph, CrewAI, and the Model Context Protocol (MCP), to examine how specialized agent teams divide labor, utilize precision tools, and cross-verify outputs. We find that MAS architectures yield significant performance gains in various domains: recent implementations improved oncology decision-making accuracy from 30.3% to 87.2% and reached a peak of 93.2% accuracy on USMLE-style benchmarks through simulated clinical evolution. In clinical trial matching, multi-agent frameworks achieved 87.3% accuracy and enhanced clinician screening efficiency by 42.6% (p < 0.001). However, we also highlight critical operational challenges, including an unreliability tax of 15–50× higher token consumption compared to standalone models and the risk of cascading errors where initial hallucinations are amplified across the agent collective. We conclude that while MAS enables a shift toward collaborative intelligence in biomedicine, its clinical and research adoption requires the development of deterministic orchestration and rigorous cost-utility frameworks to ensure safety and expert-centered oversight. Full article
(This article belongs to the Section Biomedical Sciences and Physiology)

24 pages, 11644 KB  
Article
Authenticating Matryoshka Nesting Dolls via an Auditable 2D–3D–Text Evidence Framework with BMA Compression and Zero-Shot 3D Completion
by Yulia Kumar and Srotriyo Sengupta
Electronics 2026, 15(5), 992; https://doi.org/10.3390/electronics15050992 - 27 Feb 2026
Viewed by 306
Abstract
Authenticating cultural heritage artifacts such as Matryoshka Nesting Dolls (MNDs) is increasingly complicated by high-fidelity replicas that successfully mimic surface textures and palettes, leading traditional 2D computer vision models to exhibit dangerous overconfidence in false-positive classifications. To address this, we propose an auditable multimodal framework that transitions from appearance-only detection to a robust verification system based on the following three technical pillars: (1) a 2D visual stream utilizing a ConvNeXt-Tiny backbone for fine-grained style recognition; (2) a 3D geometric stream employing a custom 2D-to-3D reconstruction pipeline based on the Blum Medial Axis (BMA) and surfaces of revolution to capture axisymmetric structural fidelity; and (3) a semantic stream leveraging the Qwen3-VL vision-language model to generate human-interpretable evidence cards. To support this framework, we introduce a novel multimodal dataset comprising 168 unique physical MND sets and 27,387 labeled frames, archived for reproducibility. Our experimental results demonstrate that while 2D-only baselines achieve 77.9% authenticity accuracy, they suffer from a high Expected Calibration Error (ECE) of 0.121. The integrated multimodal framework achieves a superior authenticity accuracy of 96.7% and reduces the ECE to 0.041, representing a 66% improvement in calibration reliability. Crucially, the system shifts the mean confidence for incorrect replica classifications from a high-risk 0.82 to a safe 0.45. Full article
(This article belongs to the Section Computer Science & Engineering)
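The 3D geometric stream's core idea, exploiting the dolls' axisymmetry by revolving an axial radius profile about the turning axis, can be sketched in a few lines. The profile values and sampling scheme below are hypothetical; the paper's actual Blum Medial Axis extraction from 2D silhouettes is omitted.

```python
import math

def surface_of_revolution(profile, n_theta=36):
    """Revolve an axial radius profile [(z, r), ...] about the z-axis,
    returning 3D surface points (x, y, z). A stand-in for the final step
    of the reconstruction pipeline described in the abstract."""
    points = []
    for z, r in profile:
        for k in range(n_theta):
            theta = 2.0 * math.pi * k / n_theta
            points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points

# Toy doll-like silhouette: narrow neck, bulging body, tapered base.
profile = [(0.0, 0.2), (0.5, 0.5), (1.0, 0.45), (1.5, 0.15)]
pts = surface_of_revolution(profile, n_theta=8)
print(len(pts))  # 4 rings x 8 angular samples = 32 points
```

Because every cross-section is a circle by construction, deviations between such a reconstructed surface and observed geometry directly measure the "axisymmetric structural fidelity" the framework scores.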

18 pages, 2369 KB  
Article
TransGoT: Structured Graph-of-Thoughts Reasoning for Machine Translation with Large Language Models
by Danying Zhang, Yixin Liu, Jie Zhao and Cai Xu
Big Data Cogn. Comput. 2026, 10(3), 70; https://doi.org/10.3390/bdcc10030070 - 27 Feb 2026
Viewed by 395
Abstract
Machine translation with large language models has recently attracted growing attention due to its flexibility and strong zero-shot and few-shot capabilities. However, most prompt-based LLM translation methods rely on linear generation or shallow self-refinement, implicitly committing to a single reasoning path. Such designs are brittle when translating long and syntactically complex sources, where reliable translation often requires structured planning and hypothesis exploration. In this paper, we propose TransGoT, a novel machine translation framework inspired by the graph-of-thoughts paradigm, which formulates translation as a structured, multi-stage reasoning process over a graph of intermediate thoughts. TransGoT explicitly decomposes translation into constraint identification, draft generation, and culture- and style-aware refinement, enabling systematic exploration and aggregation of alternative translation hypotheses. To better adapt graph-based reasoning to translation, we design two key mechanisms: (1) Uncertainty-driven thought transformation. Unlike general reasoning tasks, translation uncertainty is often localized and unevenly distributed across tokens, making holistic regeneration inefficient. We therefore design uncertainty-driven thought transformation, which leverages model-internal confidence signals to guide targeted token-level revision; (2) Dispersion-adaptive thought scoring. It emphasizes evaluation criteria with stronger inter-candidate variance to enable robust multi-criteria thought selection. We evaluate TransGoT on the WMT22 benchmarks and experimental results demonstrate that TransGoT consistently outperforms strong LLM-based translation baselines, validating the effectiveness of structured graph-based reasoning for machine translation. Full article
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
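The dispersion-adaptive thought scoring mechanism, weighting each evaluation criterion by how much it varies across candidate translations, can be sketched as follows. The variance-proportional weighting and the toy scores are assumptions for illustration; the paper's exact formulation is not reproduced here.

```python
from statistics import pvariance

def dispersion_adaptive_score(candidate_scores):
    """candidate_scores: {candidate: {criterion: score}}.
    Weight each criterion by its variance across candidates, so criteria
    that actually discriminate between hypotheses dominate selection."""
    candidates = list(candidate_scores)
    criteria = list(next(iter(candidate_scores.values())))
    weights = {c: pvariance([candidate_scores[cand][c] for cand in candidates])
               for c in criteria}
    total = sum(weights.values()) or 1.0          # avoid division by zero
    weights = {c: w / total for c, w in weights.items()}
    return {cand: sum(weights[c] * s for c, s in scores.items())
            for cand, scores in candidate_scores.items()}

scores = {
    "draft_a": {"adequacy": 0.9, "fluency": 0.8},
    "draft_b": {"adequacy": 0.5, "fluency": 0.8},  # fluency does not discriminate
}
ranked = dispersion_adaptive_score(scores)
print(max(ranked, key=ranked.get))  # adequacy variance dominates -> draft_a
```

Here fluency has zero inter-candidate variance, so it receives zero weight and selection is driven entirely by adequacy, which is the behavior the mechanism is designed to produce.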
