
Search Results (51)

Search Parameters:
Keywords = LLM orchestration

35 pages, 2917 KB  
Article
Generative AI-Assisted Automation of Clinical Data Processing: A Methodological Framework for Streamlining Behavioral Research Workflows
by Marta Lilia Eraña-Díaz, Alejandra Rosales-Lagarde, Iván Arango-de-Montis and José Alejandro Velázquez-Monzón
Informatics 2026, 13(4), 48; https://doi.org/10.3390/informatics13040048 - 25 Mar 2026
Viewed by 104
Abstract
This article presents a methodological framework for automating clinical data processing workflows using Generative Artificial Intelligence (AI) as an interactive co-developer. We demonstrate how Large Language Models (LLMs), specifically ChatGPT and Claude, can assist researchers in designing, implementing, and deploying complete ETL (Extract, Transform, Load) pipelines without requiring advanced programming or DevOps expertise. Using a dataset of 102 participants from a nonverbal expression study as a proof-of-concept, we show how AI-assisted automation transforms FaceReader video analysis outputs recorded during the Cyberball paradigm into structured, analysis-ready datasets through containerized workflows orchestrated via Docker and n8n. The resulting framework successfully processes all 102 datasets, generating machine learning outputs to validate pipeline execution stability (rather than clinical predictivity), and deploys interactive visualization dashboards, tasks that would normally require significant manual effort and specialized technical expertise. This work establishes a replicable methodology for integrating Generative AI into research data management workflows, with implications for accelerating scientific discovery across behavioral and medical research domains. Full article
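The Extract-Transform stage of such a pipeline can be sketched in a few lines of Python. This is an illustrative toy only: the column names (`participant`, `emotion`, `intensity`) and the helper name are invented and do not reflect the actual FaceReader export schema used in the paper.

```python
import csv
import io

def extract_transform(raw_csv: str) -> list[dict]:
    """Toy ETL step: parse a FaceReader-style CSV export into
    analysis-ready rows (column names are hypothetical)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "participant": rec["participant"],
            "emotion": rec["emotion"],
            "intensity": float(rec["intensity"]),  # cast for analysis
        })
    return rows

raw = "participant,emotion,intensity\nP001,happy,0.82\nP002,neutral,0.41\n"
data = extract_transform(raw)
```

In the framework described, logic of this kind would run inside Docker containers triggered and sequenced by n8n workflows rather than by hand.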

23 pages, 2120 KB  
Review
The Impact of Generative AI on 6G Network Architecture and Service
by Yedil Nurakhov, Serik Aibagarov, Nurislam Kassymbek, Aksultan Mukhanbet, Bolatzhan Kumalakov and Timur Imankulov
Electronics 2026, 15(7), 1345; https://doi.org/10.3390/electronics15071345 - 24 Mar 2026
Viewed by 111
Abstract
The transition from 5G to 6G wireless systems marks a paradigm shift from “connected things” to “connected intelligence,” driven by the necessity to manage hyper-heterogeneous networks and overcome the Shannon capacity limit. This Systematic Literature Review (SLR) analyzes 118 primary studies to evaluate the transformative impact of Generative AI (GenAI) and Large Language Models (LLMs) on 6G architecture. We categorize the integration of GenAI into five semantic clusters: Architecture, Management, Security, Semantics, and Edge AI. The synthesis reveals that 6G is evolving toward an “AI-Native” ecosystem where LLMs show strong promise for augmenting network orchestration through Intent-Based Networking (IBN) and generative models demonstrate significant potential to augment or transcend traditional physical layer algorithms. Furthermore, the review identifies a fundamental transition from bit-oriented to semantic-oriented communication, utilizing GenAI to reconstruct meaning from minimal data. However, critical challenges remain, particularly the “energy–intelligence paradox” and the risks of model hallucinations in critical infrastructure. We conclude that while GenAI provides the necessary cognitive flexibility for 6G, its successful deployment depends on solving the “inference gap” through split learning and extreme model quantization at the edge. Full article

44 pages, 643 KB  
Article
A Hybrid Multi-Agent System for Early Scam Detection in Crypto-Assets
by Mario Trerotola, Mimmo Parente and Davide Calvaresi
Appl. Sci. 2026, 16(7), 3122; https://doi.org/10.3390/app16073122 - 24 Mar 2026
Viewed by 202
Abstract
The rapid expansion of crypto-asset markets and the introduction of the Markets in Crypto-Assets Regulation (MiCAR) pose novel supervisory challenges. Existing blockchain intelligence platforms focus predominantly on on-chain surveillance, leaving gaps in off-chain documentary due diligence automation. This paper presents a Multi-Agent System (MAS) integrating Large Language Model (LLM) capabilities with rule-based compliance frameworks. The architecture comprises seven specialized agents: a Coordinator Agent for orchestration; data acquisition agents (Searcher, Crawler); three parallel analytical agents—Heuristic Agent (LLM-powered qualitative risk assessment), Compliance Agent (hybrid-AI MiCAR asset classification and regulatory requirement verification), and On-Chain Agent (machine learning-based fraud detection); and a Reconciliator Agent synthesizing findings into unified alerts. Component-level empirical validation on 150 projects indicates 95% output reproducibility (identical alert tier and score deviation 0.05 across five reruns) and 210 s mean latency, providing proof-of-concept evidence for the integrated pipeline. A pilot user evaluation (six researchers/master students and two experts from regulatory authorities) provides preliminary usability evidence and surfaces domain-specific feedback from regulatory-authority experts. The architecture advances proactive regulatory technology by enabling scalable analysis combining off-chain documentary evidence with on-chain forensics. Full article
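The coordinator-plus-parallel-analysts pattern described above can be sketched minimally. The agent stubs, risk values, and tier thresholds below are stand-ins for illustration, not the paper's actual scoring logic.

```python
def heuristic_agent(doc):   # stand-in for LLM-powered qualitative assessment
    return {"risk": 0.7}

def compliance_agent(doc):  # stand-in for hybrid-AI MiCAR classification
    return {"risk": 0.4}

def onchain_agent(doc):     # stand-in for ML-based fraud detection
    return {"risk": 0.9}

def coordinator(doc):
    """Dispatch the analytical agents, then reconcile their findings
    into a single alert with a score and a tier (toy thresholds)."""
    findings = [agent(doc) for agent in
                (heuristic_agent, compliance_agent, onchain_agent)]
    score = sum(f["risk"] for f in findings) / len(findings)
    tier = "high" if score >= 0.66 else "medium" if score >= 0.33 else "low"
    return {"score": round(score, 2), "tier": tier}

alert = coordinator({"whitepaper": "example project documentation"})
```

In the paper's architecture this reconciliation role belongs to a dedicated Reconciliator Agent, and the Searcher/Crawler agents would supply the document inputs.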

37 pages, 2886 KB  
Article
A Zero-Touch Vulnerability Remediation Framework Based on OpenVAS, Threat Intelligence, and RAG-Enhanced Large Language Models
by Cheng-Hui Hsieh, Chen-Yi Cheng and Yung-Chung Wang
Mathematics 2026, 14(6), 1072; https://doi.org/10.3390/math14061072 - 22 Mar 2026
Viewed by 213
Abstract
Vulnerability disclosures are outpacing manual remediation capacity. We present a Zero-Touch Vulnerability Remediation Framework combining OpenVAS scanning, multi-source threat intelligence, and Large Language Models (LLMs) enhanced through Retrieval-Augmented Generation (RAG). The Scanning Layer normalizes findings into structured JSON; the AI Decision Layer applies hybrid FAISS + BM25 retrieval, dual-LLM verification (a primary generator checked by a gpt-4o auxiliary verifier), and confidence-based routing; the Orchestration Layer executes validated patches via CI/CD pipelines with automated rollback. On 350 real-world vulnerability cases across five GPT-family models, the full Prompt + RAG pipeline raised accuracy from 52.0% to 76.7–82.6% (all p < 0.001, Cohen’s h = 0.51–0.68) and reduced hallucination from 23.4% to 7.8%. Confidence routing routed 34.9% of cases to the high-confidence auto-execution tier, yielding a 4.1% rollback rate and zero service outages. The framework addresses the most relevant categories of the OWASP LLM Top 10 and lays groundwork for enterprise-scale, Zero-Touch vulnerability management. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
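The reported effect sizes use Cohen's h, the standard effect size for a difference between two proportions: h = 2·arcsin(√p1) − 2·arcsin(√p2). Plugging in the stated accuracies approximately reproduces the reported 0.51–0.68 range:

```python
import math

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's effect size h for two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

h_high = cohens_h(0.826, 0.520)  # best model vs. 52.0% baseline, ~0.67
h_low = cohens_h(0.767, 0.520)   # weakest improved model, ~0.52
```

Small discrepancies against the paper's exact endpoints are expected, since each model is compared against its own baseline rather than the pooled 52.0% figure.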

18 pages, 1815 KB  
Article
Predictive Maintenance MCP: An Open-Source Framework for Bridging Large Language Models and Industrial Condition Monitoring via the Model Context Protocol
by Luigi Gianpio Di Maggio
Appl. Sci. 2026, 16(6), 2812; https://doi.org/10.3390/app16062812 - 15 Mar 2026
Viewed by 296
Abstract
This paper presents a Proof of Concept (PoC) for PredictiveMaintenance MCP, an open-source server based on the Model Context Protocol (MCP) that supports machine condition monitoring and predictive maintenance via natural language interaction with Large Language Models (LLMs). The server constrains the LLM within an explicit perimeter of deterministic resources and tools for vibration-based diagnostics, including FFT spectral analysis with peak identification, envelope analysis for rolling element bearing defects, time-domain indicators, vibration severity assessment consistent with ISO standards and semi-supervised anomaly detection on extracted features. Each tool invocation produces structured outputs and artifacts that record inputs, parameters, and results. The LLM acts as an orchestrator that selects resources, configures parameters, invokes tools, and synthesizes conclusions anchored to computed evidence, thereby improving traceability and repeatability compared to unconstrained text-only interaction. End-to-end workflows are demonstrated in a reproducible package with code, examples, and demo data to support community-driven validation and extension toward industrial requirements. The software is archived on Zenodo and the GitHub repository serves as the collaboration hub. Full article
(This article belongs to the Section Mechanical Engineering)
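Of the deterministic tools listed, the time-domain indicators are the easiest to illustrate. The sketch below (helper name and output keys invented, not the server's API) computes RMS, peak, and crest factor for a pure 50 Hz tone sampled at 1 kHz:

```python
import math

def time_domain_indicators(signal):
    """Toy vibration indicators: RMS, absolute peak, and crest factor
    (peak / RMS), common scalar features in condition monitoring."""
    n = len(signal)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    return {"rms": rms, "peak": peak, "crest": peak / rms}

# One second of a unit-amplitude 50 Hz sine at 1000 samples/s.
sig = [math.sin(2 * math.pi * 50 * t / 1000) for t in range(1000)]
ind = time_domain_indicators(sig)
```

For a pure sine this yields RMS ≈ 0.707 and crest factor ≈ √2; in the described server, each such tool invocation would additionally emit a structured artifact recording its inputs, parameters, and results.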

27 pages, 717 KB  
Article
Cognitively Diverse Multiple-Choice Question Generation: A Hybrid Multi-Agent Framework with Large Language Models
by Yu Tian, Linh Huynh, Katerina Christhilf, Shubham Chakraborty, Micah Watanabe, Tracy Arner and Danielle McNamara
Electronics 2026, 15(6), 1209; https://doi.org/10.3390/electronics15061209 - 13 Mar 2026
Viewed by 293
Abstract
Recent advances in large language models (LLMs) have made automated multiple-choice question (MCQ) generation increasingly feasible; however, reliably producing items that satisfy controlled cognitive demands remains a challenge. To address this gap, we introduce ReQUESTA, a hybrid, multi-agent framework for generating cognitively diverse MCQs that systematically target text-based, inferential, and main idea comprehension. ReQUESTA decomposes MCQ authoring into specialized subtasks and coordinates LLM-powered agents with rule-based components to support planning, controlled generation, iterative evaluation, and post-processing. We evaluated the framework in a large-scale reading comprehension study using academic expository texts, comparing ReQUESTA-generated MCQs with those produced by a single-pass GPT-5 zero-shot baseline. Psychometric analyses of learner responses assessed item difficulty and discrimination, while expert raters evaluated question quality across multiple dimensions, including topic relevance and distractor quality. Results showed that ReQUESTA-generated items were consistently more challenging, more discriminative, and more strongly aligned with overall reading comprehension performance. Expert evaluations further indicated stronger alignment with central concepts and superior distractor linguistic consistency and semantic plausibility, particularly for inferential questions. These findings demonstrate that hybrid, agentic orchestration can systematically improve the reliability and controllability of LLM-based generation, highlighting workflow design as a key lever for structured artifact generation beyond single-pass prompting. Full article
(This article belongs to the Special Issue Multi-Agentic Systems for Automated Task Execution)
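The decompose-generate-evaluate orchestration described above can be sketched as a loop over target comprehension levels. Function names and the rule check below are illustrative stand-ins, not the authors' ReQUESTA API.

```python
def generate(passage: str, kind: str) -> str:
    """Stand-in for an LLM-powered generation agent."""
    return f"{kind} question about: {passage[:20]}..."

def evaluate(item: str) -> bool:
    """Stand-in for a rule-based evaluation component."""
    return item.endswith("...")

def mcq_pipeline(passage, kinds=("text-based", "inferential", "main idea")):
    """Generate one item per cognitive target, keeping only items
    that pass the evaluator (toy version of iterative evaluation)."""
    items = []
    for kind in kinds:
        draft = generate(passage, kind)
        if evaluate(draft):
            items.append({"kind": kind, "question": draft})
    return items

out = mcq_pipeline("Photosynthesis converts light energy into chemical energy.")
```

The real framework iterates: failed drafts are routed back for regeneration and then post-processed, rather than being dropped as in this toy.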

18 pages, 28063 KB  
Article
Towards Hyper-Personalized Travel Planning: A Multimodal AI Agent with Integrated Neural Rendering for Immersive Itineraries
by José Márquez-Algaba, Pablo Vicente-Martínez, Emilio Soria-Olivas, Manuel Sánchez-Montañés, María Ángeles García-Escrivà and Edu William-Secin
Electronics 2026, 15(6), 1142; https://doi.org/10.3390/electronics15061142 - 10 Mar 2026
Viewed by 402
Abstract
The digital transformation of the tourism industry faces a dual challenge: the fragmentation of data across platforms and the lack of immersive “try-before-you-buy” experiences. While Large Language Models (LLMs) have revolutionized information synthesis, they typically lack real-time visual verification capabilities. This paper proposes a novel, multimodal AI Agent architecture that integrates advanced natural language planning with photorealistic 3D visualization. We present a system where a conversational agent, powered by Gemini 2.5 Flash, orchestrates a suite of dynamic tools to build structured travel itineraries (flights, hotels, activities) while simultaneously deploying a neural rendering engine. This engine utilizes a modular Structure-from-Motion (SfM) pipeline feeding into 3D Gaussian Splatting (3DGS) to render navigable, high-fidelity digital twins of hotel facilities directly within the chat interface. Positioned as a Technology Readiness Level 4 (TRL 4) proof of concept (PoC), this work demonstrates the technical feasibility of the multimodal integration between conversational logic and automated visual synthesis. The results demonstrate the technical feasibility of a pipeline that dynamically binds LLM inference to 3D spatial data, providing a foundation for high-fidelity, interactive travel consultancy. Full article

16 pages, 775 KB  
Review
ChatMicroscopy: A Perspective Review of Large Language Models for Next-Generation Optical Microscopy
by Giuseppe Sancataldo
Appl. Sci. 2026, 16(5), 2502; https://doi.org/10.3390/app16052502 - 5 Mar 2026
Viewed by 316
Abstract
Optical microscopy is a fundamental tool in the physical, chemical, and life sciences, enabling direct investigation of structure, dynamics, and function across multiple spatial and temporal scales. Advances in optical design, detectors, and computational techniques have greatly enhanced performance, but have also increased the complexity of modern microscopes, which are now software-driven and embedded in data-intensive workflows. Artificial intelligence has become an important component of this landscape, particularly through task-specific machine learning approaches for image analysis, optimization, and limited instrument control. While effective, these solutions are often fragmented and lack the ability to integrate experimental intent, contextual knowledge, and multi-step reasoning. Recent progress in large language models (LLMs) offers a new paradigm for intelligent microscopy. As foundation models trained on large-scale text and code, LLMs exhibit emergent capabilities in reasoning, abstraction, and tool coordination, allowing them to act as natural interfaces between users and complex experimental systems. This perspective highlights how LLMs can function as cognitive and orchestration layers that connect experiment design, instrument control, data analysis, and knowledge integration. Emerging applications include conversational microscope control, workflow supervision, and scientific assistance for data exploration and hypothesis generation, alongside important technical, ethical, and governance challenges. Full article
(This article belongs to the Special Issue Biomedical Optics and Imaging: Latest Advances and Prospects)

31 pages, 1230 KB  
Review
A Review of Multi-Agent AI Systems for Biological and Clinical Data Analysis
by Jackson Spieser, Ali Balapour, Jarek Meller, Krushna C. Patra and Behrouz Shamsaei
Methods Protoc. 2026, 9(2), 33; https://doi.org/10.3390/mps9020033 - 28 Feb 2026
Viewed by 663
Abstract
This review evaluates the emerging paradigm of multi-agent systems (MASs) for biomedical and clinical data analysis, focusing on their ability to overcome the reasoning and reliability limitations of standalone large language models (LLMs). We synthesize findings from recent architectural frameworks, specifically LangGraph, CrewAI, and the Model Context Protocol (MCP), to examine how specialized agent teams divide labor, utilize precision tools, and cross-verify outputs. We find that MAS architectures yield significant performance gains in various domains: recent implementations improved oncology decision-making accuracy from 30.3% to 87.2% and reached a peak of 93.2% accuracy on USMLE-style benchmarks through simulated clinical evolution. In clinical trial matching, multi-agent frameworks achieved 87.3% accuracy and enhanced clinician screening efficiency by 42.6% (p < 0.001). However, we also highlight critical operational challenges, including an unreliability tax of 15–50× higher token consumption compared to standalone models and the risk of cascading errors where initial hallucinations are amplified across the agent collective. We conclude that while MAS enables a shift toward collaborative intelligence in biomedicine, its clinical and research adoption requires the development of deterministic orchestration and rigorous cost-utility frameworks to ensure safety and expert-centered oversight. Full article
(This article belongs to the Section Biomedical Sciences and Physiology)

21 pages, 372 KB  
Review
Open-Source Large Language Models in Education: A Narrative Review of Evidence, Pedagogical Roles, and Learning Outcomes
by Michael Pin-Chuan Lin, Jing-Yuan Huang, Daniel H. Chang, Gerald Tembrevilla, G. Michael Bowen, Eric Poitras, Vasudevan Janarthanan and Jeeho Ryoo
AI Educ. 2026, 2(1), 4; https://doi.org/10.3390/aieduc2010004 - 27 Feb 2026
Viewed by 878
Abstract
Open-source large language models (LLMs) are increasingly explored in educational contexts due to their transparency, adaptability, and alignment with institutional governance and equity considerations. Despite growing interest, empirical research on how open-source LLMs are deployed in education and what evidence currently supports their integration remains limited and fragmented. This paper presents a state-of-the-art narrative review of peer-reviewed, human empirical studies examining the use of open-source LLMs in education. Guided by three questions, the review synthesizes how open-source LLMs are deployed across instructional contexts, what learner-related evidence is reported, and how teachers engage in human–AI collaboration. The reviewed literature is concentrated in higher education, particularly within computer science and programming domains, with applications focused on post-class tutoring, guidance, and formative feedback. Learner perceptions are generally positive, but evidence linking open-source LLM use to measurable learning outcomes remains emerging and inconsistent. Through interpretive synthesis, the review articulates a four-role model—Designer, Facilitator, Monitor, and Evaluator—that captures how teacher agency is enacted across AI-supported instructional workflows. This review maps recurring orchestration dimensions, decision points, and tensions that characterize early implementations, and it proposes a minimal orchestration reporting scaffold (configuration, boundaries, logging, adjudication) intended to support auditability and cross-study comparison as the empirical base develops. Full article

24 pages, 2591 KB  
Article
AI-Driven IFC Processing for Automated IBS Scoring
by Annamária Behúnová, Matúš Pohorenec, Lucia Ševčíková and Marcel Behún
Algorithms 2026, 19(3), 178; https://doi.org/10.3390/a19030178 - 27 Feb 2026
Viewed by 341
Abstract
The assessment of Industrialized Building System (IBS) adoption in construction projects—a critical metric for evaluating prefabrication levels and construction modernization—remains largely manual, time-intensive, and prone to inconsistencies, with practitioners typically requiring 4–8 h to evaluate a single building using spreadsheet-based frameworks and visual documentation review. This paper presents a novel AI-enhanced workflow architecture that automates IBS scoring through systematic processing of Industry Foundation Classes (IFC) building information models—the first documented integration of web-based IFC processing, visual workflow automation (n8n), and large language model (LLM) reasoning specifically for construction industrialization assessment. The proposed system integrates a web-based frontend for IFC file upload and configuration, an n8n workflow automation backend orchestrating data transformation pipelines, and an Azure OpenAI-powered scoring engine (GPT-4o-mini and GPT-5-0-mini) that applies Construction Industry Standard (CIS) 18:2023 rules to extracted building data. Experimental validation across 136 diverse IFC building models (ranging from 0.01 MB to 136.26 MB) achieved a 100% processing success rate with a median processing duration of 61.62 s per model, representing approximately 99% time reduction compared to conventional manual assessment requiring 4–8 h of expert practitioner effort. The system demonstrated consistent scoring performance with IBS scores ranging from 31.24 to 100.00 points (mean 37.14, SD 8.84), while GPT-5-0-mini exhibited 71% faster inference (mean 23.4 s) compared to GPT-4o-mini (mean 80.2 s) with no significant scoring divergence, validating prompt engineering robustness across model generations. Processing efficiency scales approximately linearly with file size (0.67 s per megabyte), enabling real-time design feedback and portfolio-scale batch processing previously infeasible with manual methods. 
Unlike prior rule-based compliance checking systems requiring extensive manual programming, this approach leverages LLM semantic reasoning to interpret ambiguous construction classifications while maintaining deterministic scoring through structured prompt engineering. The system addresses key interoperability challenges in IFC data heterogeneity while maintaining traceability and compliance with established scoring methodologies. This research establishes a replicable architectural pattern for BIM-AI integration in construction analytics and positions LLM-enhanced IFC processing as a practical, accessible approach for industrialization evaluation that democratizes advanced assessment capabilities through open-source workflow automation technologies. Full article
(This article belongs to the Special Issue AI Applications and Modern Industry)
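The headline time-reduction figure is easy to verify arithmetically against the conservative (4 h) end of the manual baseline:

```python
manual_low_s = 4 * 3600   # low end of the reported 4-8 h manual assessment
automated_s = 61.62       # reported median pipeline duration (seconds)

# Fraction of manual time eliminated; ~0.996, i.e. >99% even vs. 4 h.
reduction = 1 - automated_s / manual_low_s
```

Note that the reported median of 61.62 s is not purely the 0.67 s/MB scaling term; it also includes fixed overhead such as LLM inference latency.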

31 pages, 2277 KB  
Article
Performance Comparison of a Neuro-Symbolic Large Language Model System Versus Human Experts in Acute Cholecystitis Management
by Evren Ekingen and Mete Ucdal
J. Clin. Med. 2026, 15(5), 1730; https://doi.org/10.3390/jcm15051730 - 25 Feb 2026
Viewed by 407
Abstract
Background/Objectives: Large language models (LLMs) have shown promising results in medical decision support; however, their effectiveness in managing acute cholecystitis and other gallbladder diseases remains insufficiently examined. This study evaluated the performance of a neuro-symbolic LLM system that integrates multiple AI agents with neural–symbolic reasoning for acute cholecystitis management and compared its diagnostic accuracy with that of human expert physicians across three clinical specialties. Methods: This multi-center cross-sectional study included 30 case-based questions covering acute cholecystitis and gallbladder diseases, stratified across eight predefined disease categories: acute calculous cholecystitis (n = 6), acute acalculous cholecystitis (n = 2), complicated cholecystitis including gangrenous, emphysematous, and perforated variants (n = 5), chronic cholecystitis and biliary colic (n = 4), gallbladder polyps and adenomyomatosis (n = 3), Mirizzi syndrome (n = 2), gallbladder carcinoma (n = 4), and post-cholecystectomy complications (n = 4). Questions were categorized into diagnosis (n = 10), treatment (n = 10), and complications/prognosis (n = 10). Gold standard answers were established through consensus by an expert panel consisting of two senior general surgery expert clinicians and one senior emergency medicine expert clinician, each with more than 20 years of clinical experience, utilizing the Tokyo Guidelines 2018 (TG18) as the reference standard for diagnostic criteria, severity grading, and management recommendations. The expert panel achieved unanimous consensus on all 30 gold standard answers. All responses were cross-referenced against the primary TG18 publications to ensure guideline-based rather than solely opinion-based reference standards. This consensus-based, guideline-anchored approach is consistent with established methodologies for gold standard establishment in AI diagnostic accuracy studies. 
Performance of a neuro-symbolic LLM system orchestrated via LangGraph v1.0 was compared against 10 general surgery specialists, 10 emergency medicine physicians, and 10 gastroenterology specialists from four tertiary centers in Turkey. The neuro-symbolic system incorporated the Tokyo Guidelines 2018 (TG18) as its symbolic knowledge base for diagnostic criteria, severity grading, and management algorithms. Results: The neuro-symbolic system attained the highest overall accuracy rate of 96.7% (29/30), markedly surpassing the performance of general surgery specialists (average 82.3% ± 6.8%), emergency medicine physicians (average 71.0% ± 8.2%), and gastroenterology specialists (average 78.7% ± 7.4%). Furthermore, the neuro-symbolic system exhibited superior performance across all clinical categories. Among human participants, general surgeons showed the highest accuracy in treatment decisions (88.0%), while gastroenterologists excelled in diagnostic questions (82.0%). Emergency medicine physicians showed comparable performance to other specialties in acute presentation scenarios. ROC analysis revealed excellent discrimination for the neuro-symbolic system (AUC = 0.983) compared to general surgery (AUC = 0.856), gastroenterology (AUC = 0.821), and emergency medicine (AUC = 0.764). Conclusions: The neuro-symbolic LLM system exhibited superior performance in standardized guideline-concordant case-based assessment of acute cholecystitis management compared to all human expert groups, reflecting its consistent application of encoded guideline criteria. These findings support its potential role as a clinical decision-support tool that augments, rather than replaces, physician expertise. The system’s consistent application of standardized guidelines indicates its potential utility as a clinical decision support tool, particularly in settings where specialist expertise is limited. 
However, these results should be interpreted within the constraints of a structured case-based evaluation and do not imply global clinical superiority over human experts. Full article

26 pages, 3776 KB  
Article
AgoraAI: An Open-Source Voice-to-Voice Framework for Multi-Persona and Multi-Human Interaction
by Antonio Concha-Sánchez, José Adalberto Bernal-Millan, Alfredo Hernández-Muñiz and Suresh Kumar Gadi
Appl. Sci. 2026, 16(4), 2120; https://doi.org/10.3390/app16042120 - 22 Feb 2026
Viewed by 556
Abstract
This article presents AgoraAI, an open-source framework designed to enable dynamic, multi-participant conversations by integrating Multi-Persona Orchestration within a shared conversational environment. Unlike traditional single-agent Large Language Model (LLM) interactions or passive commercial meeting assistants, AgoraAI allows users to configure distinct AI personas that engage in active facilitation and simultaneous, turn-based dialogues with human participants. The system supports diverse high-stakes use cases, including formal panel discussions and interactive educational settings. Crucially, this work addresses the engineering challenge of the “Concurrency-Coherence Paradox” in real-time voice systems. Key architectural contributions include: (1) the implementation of Asynchronous Dual-Queue Processing, a thread-safe integration strategy that synchronizes real-time Speech-to-Text streams with LLM generation to resolve race conditions; and (2) Dynamic Context-Injection pipelines that ensure persona consistency. The platform’s ecological validity is demonstrated through deployment in a human-supervised Master’s thesis seminar and a corporate coordination meeting. Results from an exploratory pilot study indicate high usability, perceived utility, and strong user acceptance. These findings suggest that AgoraAI provides a flexible, empirically evaluated architecture for democratizing multi-perspective collaboration across education, research, and professional domains. Full article
(This article belongs to the Special Issue State of the Art in AI-Based Co-Creativity)
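The "Asynchronous Dual-Queue Processing" this abstract describes can be illustrated with a minimal Python sketch. The names here (`DualQueuePipeline`, `fake_llm_generate`) are hypothetical and not the AgoraAI API; the sketch only shows the general pattern: finalized speech-to-text transcripts land on one thread-safe queue, and a single worker drains them in arrival order before pushing replies onto a second queue, so generation never races with the incoming stream.

```python
import queue
import threading

def fake_llm_generate(transcript):
    # Stand-in for an LLM call; a real system would stream tokens.
    return f"reply-to:{transcript}"

class DualQueuePipeline:
    """Thread-safe hand-off between an STT stream and LLM generation.

    Finalized transcripts go onto `transcript_q`; one worker thread
    consumes them in FIFO order and puts replies onto `response_q`,
    so a partial transcript can never race with generation.
    """
    def __init__(self):
        self.transcript_q = queue.Queue()
        self.response_q = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            transcript = self.transcript_q.get()  # blocking, thread-safe
            if transcript is None:                # sentinel: shut down
                break
            self.response_q.put(fake_llm_generate(transcript))

    def submit(self, transcript):
        self.transcript_q.put(transcript)

    def close(self):
        self.transcript_q.put(None)
        self.worker.join()

pipeline = DualQueuePipeline()
pipeline.submit("hello panel")
pipeline.submit("next question")
pipeline.close()
replies = [pipeline.response_q.get(), pipeline.response_q.get()]
```

Because a single worker serializes the hand-off, ordering is preserved without explicit locks; the queues themselves provide the thread safety.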

43 pages, 1927 KB  
Article
A Large-Scale Empirical Study of LLM Orchestration and Ensemble Strategies for Sentiment Analysis in Recommender Systems
by Konstantinos I. Roumeliotis, Dionisis Margaris, Dimitris Spiliotopoulos and Costas Vassilakis
Future Internet 2026, 18(2), 112; https://doi.org/10.3390/fi18020112 - 20 Feb 2026
Abstract
This paper presents a comprehensive empirical evaluation of meta-model aggregation strategies against traditional ensemble methods and standalone large language models (LLMs) for sentiment analysis in recommender systems. We investigate whether aggregating multiple LLMs through a reasoning-based meta-model provides measurable performance advantages over individual models and standard statistical aggregation approaches in zero-shot sentiment classification. Using a balanced dataset of 5000 verified Amazon purchase reviews (1000 reviews per rating category from 1 to 5 stars, sampled via two-stage stratified sampling across five product categories), we evaluate 12 leading pre-trained LLMs from four major providers (OpenAI, Anthropic, Google, and DeepSeek) in both standalone and meta-model configurations. Our experimental design systematically compares individual model performance against GPT-based meta-model aggregation and traditional ensemble baselines (majority voting, mean aggregation). Results show statistically significant improvements (McNemar’s test, p < 0.001): the GPT-5 meta-model achieves 71.40% accuracy (a 10.15 percentage point improvement over the 61.25% individual-model average), while the GPT-5 mini meta-model reaches 70.32% (a 9.07 percentage point improvement). These improvements surpass traditional ensemble methods (majority voting: 62.64%; mean aggregation: 62.96%), suggesting potential value in meta-model aggregation for sentiment analysis tasks. Our analysis reveals empirical patterns including neutral sentiment classification challenges (3-star ratings show 64.83% failure rates across models), model influence hierarchies, and cost-accuracy trade-offs ($130.45 aggregation cost vs. $0.24–$43.97 for individual models per 5000 predictions). 
This work provides evidence-based insights into the comparative effectiveness of LLM aggregation strategies in recommender systems, demonstrating that meta-model aggregation with natural language reasoning capabilities achieves measurable performance gains beyond statistical aggregation alone. Full article
(This article belongs to the Special Issue Intelligent Agents and Their Application)
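The two traditional ensemble baselines the abstract names, majority voting and mean aggregation, can be sketched in a few lines of Python. The predictions below are hypothetical stand-ins for twelve per-model star ratings on one review; a meta-model would instead pass all per-model outputs (with reasoning) to a separate LLM rather than aggregate them statistically.

```python
from collections import Counter
from statistics import mean

def majority_vote(preds):
    """Return the most common predicted star rating.

    Ties break toward the rating that appears first in the list,
    one common deterministic convention.
    """
    counts = Counter(preds)
    top = max(counts.values())
    for p in preds:  # first-seen tie-break
        if counts[p] == top:
            return p

def mean_aggregate(preds):
    """Average the numeric star ratings and round to the nearest star."""
    return round(mean(preds))

# Hypothetical predictions (1-5 stars) from twelve models on one review.
preds = [4, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4]
vote = majority_vote(preds)      # rating 4 occurs most often
avg = mean_aggregate(preds)      # mean 4.08 rounds to 4
```

Both baselines discard the models' reasoning and treat every model equally, which is one plausible explanation for why a reasoning-based meta-model can outperform them.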

19 pages, 1131 KB  
Article
Multi-Agent-Based Smart-Home Energy Management with Adaptive Reasoning
by Elena Dolinin and Chairi Kiourt
Appl. Sci. 2026, 16(4), 1896; https://doi.org/10.3390/app16041896 - 13 Feb 2026
Abstract
This paper introduces SmartHouseOperator, a multi-agent intelligent control framework for adaptive and energy-efficient smart-home management. Modern smart homes integrate heterogeneous devices and sensors, yet most existing solutions rely on static rules or manual coordination, limiting their ability to adapt to dynamic environmental conditions and evolving user preferences. SmartHouseOperator addresses these limitations through an agentic architecture that coordinates device-specific agents for air conditioning, lighting, refrigeration, and shutters under a central orchestrator. The system combines contextual inputs (e.g., weather, occupancy, power load), persistent knowledge, reinforcement-learning-based preference modeling, and LLM-powered reasoning to enable coordinated and personalized control decisions. Experimental results show that the framework achieves consistent reasoning performance across multiple agent orchestration engines and reduces air-conditioning power consumption by up to 16% under critical load conditions. These findings demonstrate the potential of multi-agent, learning-enabled control systems to deliver intelligent, energy-aware, and user-centric smart-home operation. Full article
(This article belongs to the Special Issue Advancements and Applications in Reinforcement Learning)
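The orchestrator-plus-device-agents pattern the abstract describes can be reduced to a short sketch. This is an illustrative simplification, not SmartHouseOperator code: the class names, context keys, and the air-conditioning load-shedding rule are all assumptions standing in for the paper's learned, LLM-assisted policies.

```python
class DeviceAgent:
    """Base class: each agent proposes an action for its device."""
    name = "device"

    def propose(self, context):
        raise NotImplementedError

class ACAgent(DeviceAgent):
    name = "air_conditioning"

    def propose(self, context):
        # Shed AC load when the household nears its power limit.
        if context["power_load_kw"] > context["critical_load_kw"]:
            return {"setpoint_delta_c": 1.0, "mode": "eco"}
        return {"setpoint_delta_c": 0.0, "mode": "normal"}

class LightingAgent(DeviceAgent):
    name = "lighting"

    def propose(self, context):
        # Light only occupied, dim rooms.
        return {"on": context["occupied"] and context["lux"] < 200}

class Orchestrator:
    """Collects one proposal per device agent into a single control plan."""
    def __init__(self, agents):
        self.agents = agents

    def plan(self, context):
        return {a.name: a.propose(context) for a in self.agents}

orchestrator = Orchestrator([ACAgent(), LightingAgent()])
plan = orchestrator.plan({
    "power_load_kw": 6.5, "critical_load_kw": 5.0,
    "occupied": True, "lux": 120,
})
```

In the paper's framework the fixed rules above are replaced by reinforcement-learned preferences and LLM reasoning, but the coordination shape, per-device agents under one orchestrator, is the same.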
