A Review of Multi-Agent AI Systems for Biological and Clinical Data Analysis
Abstract
1. Introduction and Methodology
Methods
2. Definitions and Frameworks
2.1. Agent Definitions and Orchestration Frameworks
2.2. Memory, Guardrails, and Communication Protocols
3. State of the Art in Biomedicine
3.1. Basic Science Applications
3.1.1. Drug Discovery and Pharmacology
3.1.2. Bioinformatics and Multi-Omics Analysis
3.1.3. Cancer Biology
3.2. Clinical Applications
3.2.1. Medical Imaging and Multimodal Diagnosis
3.2.2. Clinical Trials and Evidence Synthesis
3.2.3. Clinical Decision Support
4. Opportunities and Underutilized Domains
5. Platforms and Benchmarks
6. Challenges and Future Directions
6.1. Reliability, Verification, and Safety
6.2. Scalability and Efficiency
6.3. Continual Learning and Adaptation
6.4. Ethics, Regulation, and Trust
7. Discussion
- High-Stakes Scenarios (Justified): In domains such as oncology or the identification of rare, fatal conditions, a 15% improvement in diagnostic precision has the potential to be transformative. The systemic burden of diagnostic safety lapses is estimated at 1.8% of GDP in OECD countries [55]. For patients with suspected rare diseases, uncoordinated clinical investigations typically incur costs 7.6-fold higher than those of matched controls [41]. In these high-morbidity settings, the sub-dollar cost of additional tokens is negligible compared with the thousands of dollars saved by averting redundant secondary testing and acute care utilization. Recent evaluations indicate that structured agentic panels can reduce complex diagnostic expenditures from approximately $7850 to $2397 while achieving significantly higher specificity than human specialists [54,55].
- Low-Stakes Scenarios (Prohibitive): Conversely, for routine, high-volume tasks such as the triage of the common cold, contact dermatitis, or administrative chart summaries, a 50-fold cost increase provides diminishing returns. In these low-risk environments, the marginal gains do not offset the increased latency, API costs, or computational footprint [41,55]. For such tasks, standalone models remain the preferred economic choice, as the clinical utility of multi-agent deliberation does not warrant the “unreliability tax” of excessive agent communication.
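The break-even logic contrasted in the two scenarios above can be sketched as a simple expected-value calculation. The function and all figures below are illustrative placeholders chosen to echo the orders of magnitude discussed, not values drawn from the cited studies.

```python
# Hypothetical cost-utility sketch for deciding when multi-agent
# deliberation is economically justified. All numbers are illustrative
# placeholders, not figures from the cited evaluations.

def marginal_value(p_error_single: float,
                   p_error_multi: float,
                   harm_cost: float,
                   extra_compute_cost: float) -> float:
    """Expected net value of escalating from a single agent to a
    multi-agent panel: averted harm minus added compute cost."""
    averted_harm = (p_error_single - p_error_multi) * harm_cost
    return averted_harm - extra_compute_cost

# High-stakes: rare-disease workup where an uncorrected error triggers
# roughly $5000 of redundant testing and acute care utilization.
high = marginal_value(0.30, 0.15, harm_cost=5000.0, extra_compute_cost=5.0)

# Low-stakes: common-cold triage where an error costs ~$20 to correct,
# yet the agent panel still costs far more than the single-agent call.
low = marginal_value(0.05, 0.04, harm_cost=20.0, extra_compute_cost=0.50)

print(f"high-stakes net value: ${high:+.2f}")   # positive -> justified
print(f"low-stakes net value:  ${low:+.2f}")    # negative -> prohibitive
```

The asymmetry is stark: in the high-stakes case the averted harm dwarfs the token overhead, while in the low-stakes case even a sub-dollar overhead exceeds the expected benefit.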
8. Future Directions
- Rigorous Real-World Evaluation: The MAS community should move beyond purely benchmark-driven experiments and conduct real-world deployment studies. Pilot implementations, such as testing a multi-agent system as a virtual tumor board in an oncology department or as a decision-support assistant in live hospital workflows, would be invaluable for assessing true utility and reliability [35]. Such trials will help determine whether MAS can genuinely improve outcomes or efficiency in practice, and they will reveal unanticipated failure modes and human–AI interaction challenges that are not evident from offline simulations.
- Technical Robustness and Efficiency: Continued research is needed to improve the reliability, scalability, and efficiency of MAS. This includes incorporating redundancy and consensus mechanisms as standard practice to improve accuracy. Communication protocols must evolve toward ‘value-aware’ orchestration, where agentic depth is dynamically scaled based on the Cost-Utility Analysis (CUA) of the task at hand. Rather than applying uniform agentic reasoning to all queries, future MAS should autonomously reserve high-compute multi-agent deliberation for high-stakes clinical scenarios, such as working up rare or complex disease variants, while utilizing leaner single-agent pathways for routine administrative triage [66]. Methods for safe continual learning are also important so that agent teams can update their knowledge bases as medical knowledge evolves, without compromising prior validated performance. Additionally, there is a clear need for better agent observability and debugging tools, for example, dashboards or immutable logs that allow developers and regulators to inspect how the agent collective reached its decisions. Improving the transparency and auditability of MAS reasoning will build confidence that these systems are working as intended [69]. Together, advances in these areas will determine whether MAS can evolve from exciting demos into trusted, production-grade systems.
- Human-Centered Design and Governance: Finally, the development of MAS should be guided by human-centered principles. Domain experts, such as clinicians, biomedical researchers, and pharmacologists, must be involved in co-designing multi-agent solutions for their workflows. Human oversight should remain integral to MAS deployments: studies consistently indicate that keeping doctors and scientists involved is essential for reviewing AI outputs, correcting errors, and making the ultimate decisions. Features that facilitate this oversight and build user trust are crucial. For example, developers should include explanation modules (an interface that clearly explains the team’s reasoning in human-understandable terms) and implement fail-safe triggers that defer to human judgment when the system’s uncertainty is high or when ethical boundaries might be crossed [74]. Importantly, MAS should be positioned as assistive tools that enhance human expertise, rather than autonomous entities that supplant it. By designing for transparency, control, and accountability from the start, we can ensure these AI agent teams are adopted as dependable collaborators in medicine and science. With diligent interdisciplinary effort, spanning AI research, software engineering, clinical evaluation, and ethics, MAS can mature into a transformative asset for biomedicine.
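The ‘value-aware’ orchestration advocated above can be illustrated with a minimal routing sketch: a dispatcher escalates to a multi-agent panel only when the expected averted harm exceeds the panel's overhead, and otherwise falls back to a lean single-agent pathway. All names, thresholds, and stub agents here are hypothetical, standing in for real LLM calls and a real cost-utility estimator.

```python
# Illustrative sketch of 'value-aware' orchestration: each query is
# routed to a lean single-agent path or a costly multi-agent panel
# based on a simple cost-utility estimate. Names, thresholds, and
# stub agents are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    query: str
    harm_cost: float        # estimated cost of an uncorrected error ($)
    error_reduction: float  # expected drop in error rate from a panel

def route(task: Task,
          single_agent: Callable[[str], str],
          multi_agent_panel: Callable[[str], str],
          panel_overhead: float = 5.0) -> str:
    """Escalate to the multi-agent panel only when expected averted
    harm exceeds the panel's compute overhead."""
    expected_benefit = task.error_reduction * task.harm_cost
    if expected_benefit > panel_overhead:
        return multi_agent_panel(task.query)
    return single_agent(task.query)

# Stub agents standing in for real LLM calls.
single = lambda q: f"[single-agent] {q}"
panel = lambda q: f"[multi-agent panel] {q}"

print(route(Task("rare metabolic disorder workup", 5000.0, 0.15), single, panel))
print(route(Task("common cold triage", 20.0, 0.01), single, panel))
```

In a production system the `harm_cost` and `error_reduction` estimates would themselves need validation, since a mis-calibrated router reintroduces exactly the risk asymmetry it is meant to manage.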
9. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
- Li, X.; Wang, S.; Zeng, S.; Yang, Y. A survey on LLM-based multi-agent systems: Workflow, infrastructure, and challenges. Vicinagearth 2024, 1, 9. [Google Scholar] [CrossRef]
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The rise and potential of large language model based agents: A survey. Sci. China Inf. Sci. 2025, 68, 121101. [Google Scholar] [CrossRef]
- Gao, S.; Fang, A.; Huang, Y.; Giunchiglia, V.; Noori, A.; Schwarz, J.R.; Ektefaie, Y.; Kondic, J.; Zitnik, M. Empowering biomedical discovery with AI agents. Cell 2024, 187, 6125–6151. [Google Scholar] [CrossRef]
- Sami, A.M.; Rasheed, Z.; Kemell, K.K.; Waseem, M.; Kilamo, T.; Saari, M.; Abrahamsson, P. System for systematic literature review using multiple AI agents: Concept and an empirical evaluation. arXiv 2024, arXiv:2403.08399. [Google Scholar] [CrossRef]
- Gottesman, O.; Johansson, F.; Komorowski, M.; Faisal, A.; Sontag, D.; Doshi-Velez, F.; Celi, L.A. Guidelines for reinforcement learning in healthcare. Nat. Med. 2019, 25, 16–18. [Google Scholar] [CrossRef] [PubMed]
- Wooldridge, M.; Jennings, N.R. Intelligent agents: Theory and practice. Knowl. Eng. Rev. 1995, 10, 115–152. [Google Scholar] [CrossRef]
- Stone, P.; Veloso, M. Multiagent systems: A survey from a machine learning perspective. Auton. Robot. 2000, 8, 345–383. [Google Scholar] [CrossRef]
- Adimulam, A.; Gupta, R.; Kumar, S. The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption. arXiv 2026, arXiv:2601.13671. [Google Scholar] [CrossRef]
- Hong, S.; Zhuge, M.; Chen, J.; Zheng, X.; Cheng, Y.; Zhang, C.; Wang, J.; Wang, Z.; Yau, S.K.S.; Lin, Z.; et al. MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework. arXiv 2023, arXiv:2308.00352. [Google Scholar]
- Shen, Y.; Song, K.; Tan, X.; Li, D.; Lu, W.; Zhuang, Y. HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face. Adv. Neural Inf. Process. Syst. 2023, 36, 38154–38180. [Google Scholar]
- Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Zhang, S.; Zhu, E.; Li, B.; Jiang, L.; Zhang, X.; Wang, C. AutoGen: Enabling next-gen LLM applications via multi-agent conversation frameworks. arXiv 2023, arXiv:2308.08155. [Google Scholar]
- Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. arXiv 2023, arXiv:2303.17760. [Google Scholar] [CrossRef]
- Derouiche, H.; Brahmi, Z.; Mezni, H. Agentic AI Frameworks: Architectures, Protocols, and Design Challenges. arXiv 2025, arXiv:2508.10146. [Google Scholar] [CrossRef]
- Duan, Z.; Wang, J. Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI. arXiv 2024, arXiv:2411.18241. [Google Scholar]
- Joshi, S. Review of autonomous systems and collaborative AI agent frameworks. Int. J. Sci. Res. Arch. 2025, 14, 961–972. [Google Scholar] [CrossRef]
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.R.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. Adv. Neural Inf. Process. Syst. 2023, 36, 30636–30650. [Google Scholar]
- Schick, T.; Dwivedi-Yu, J.; Dessì, R.; Raileanu, R.; Lomeli, M.; Hambro, E.; Scialom, T. Toolformer: Language models can teach themselves to use tools. arXiv 2023, arXiv:2302.04761. [Google Scholar] [CrossRef]
- Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.; Narang, S.; Zhou, D. Self-consistency improves chain-of-thought reasoning in language models. arXiv 2022, arXiv:2203.11171. [Google Scholar]
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.; Cao, Y.; Narasimhan, K. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv 2023, arXiv:2305.10601. [Google Scholar] [CrossRef]
- Shinn, N.; Labash, M.F.; Tran, T. Reflexion: Language agents with verbal reinforcement learning. arXiv 2023, arXiv:2303.11366. [Google Scholar] [CrossRef]
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Clark, P. Self-refine: Iterative refinement with self-feedback. Adv. Neural Inf. Process. Syst. 2023, 36, 3940–3955. [Google Scholar]
- Wang, T.; Yu, P.; Tan, X.E.; O’Brien, S.; Pasunuru, R.; Dwivedi-Yu, J.; Celikyilmaz, A. Shepherd: A Critic for Language Model Generation. arXiv 2023, arXiv:2308.04592. [Google Scholar] [CrossRef]
- He, J.; Treude, C.; Lo, D. Llm-based multi-agent systems for software engineering: Literature review, vision, and the road ahead. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–30. [Google Scholar] [CrossRef]
- Louck, Y.; Stulman, A.; Dvir, A. Improving Google A2A Protocol: Protecting Sensitive Data and Mitigating Unintended Harms in Multi-Agent Systems. arXiv 2025, arXiv:2505.12490. [Google Scholar]
- Surapaneni, R.; Jha, M.; Vakoc, M.; Segal, T. Announcing the Agent2Agent Protocol (A2A)—A New Era of Agent Interoperability. Google for Developers Blog. 9 April 2025. Available online: https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/ (accessed on 1 October 2025).
- Zhang, Z.; Cui, S.; Lu, Y.; Zhou, J.; Yang, J.; Wang, H.; Huang, M. Agent-SafetyBench: Evaluating the safety of multi-agent AI systems. arXiv 2024, arXiv:2412.14470. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018; pp. 321–323. [Google Scholar]
- Ferber, D.; Wei, J.; Ghaffari Laleh, N.; Wu, Z.; Tan, Y.; Peng, C.; Wang, X.; Liu, Y.; Zhang, Z.; Chen, J.; et al. Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma. arXiv 2025, arXiv:2512.05824. [Google Scholar] [CrossRef]
- Multi-Agent Orchestration for Knowledge Extraction and Retrieval: AI Expert System for GPCRs. bioRxiv 2025. Available online: https://submit.biorxiv.org/submission/pdf?msid=BIORXIV/2025/696782 (accessed on 20 February 2026).
- Gao, S.; Zhu, R.; Kong, Z.; Noori, A.; Su, X.; Ginder, C.; Zitnik, M. TxAgent: An AI agent for therapeutic reasoning across a universe of tools. arXiv 2025, arXiv:2503.10970. [Google Scholar] [CrossRef]
- Su, H.; Long, W.; Zhang, Y. BioMaster: Multi-agent system for automated bioinformatics analysis workflows. bioRxiv 2025. [Google Scholar] [CrossRef]
- Mehandru, N.; Hall, A.K.; Melnichenko, O.; Dubinina, Y.; Tsirulnikov, D.; Bamman, D.; Malladi, V.S. Bioagents: Democratizing bioinformatics analysis with multi-agent systems. arXiv 2025, arXiv:2501.06314. [Google Scholar]
- Ferber, D.; El Nahhas, O.S.; Wölflein, G.; Wiest, I.C.; Clusmann, J.; Leßmann, M.E.; Kather, J.N. Development and validation of an autonomous artificial intelligence agent for clinical decision-making in oncology. Nat. Cancer 2025, 6, 1337–1349. [Google Scholar] [CrossRef]
- Chen, X.; Yi, H.; You, M.; Liu, W.; Wang, L.; Li, H.; Li, J. Enhancing diagnostic capability with multi-agent conversational LLMs. npj Digit. Med. 2025, 8, 159. [Google Scholar] [CrossRef] [PubMed]
- Jin, Q.; Wang, Z.; Floudas, C.S.; Chen, F.; Gong, C.; Bracken-Clarke, D.; Lu, Z. Matching patients to clinical trials with large language models. Nat. Commun. 2024, 15, 9074. [Google Scholar] [CrossRef] [PubMed]
- Gupta, S.; Basu, A.; Nievas, M.; Thomas, J.; Wolfrath, N.; Ramamurthi, A.; Singh, H. PRISM: Patient Records Interpretation for Semantic clinical trial Matching using LLMs. npj Digit. Med. 2024, 7, 35. [Google Scholar] [CrossRef]
- Naumov, V.; Zagirova, D.; Lin, S.; Xie, Y.; Gou, W.; Urban, A.; Zhavoronkov, A. DORA AI Scientist: Multi-agent Virtual Research Team for Scientific Exploration Discovery and Automated Report Generation. bioRxiv 2025. [Google Scholar] [CrossRef]
- Li, J.; Lai, Y.; Li, W.; Ren, J.; Zhang, M.; Kang, X.; Liu, Y. Agent Hospital: A simulacrum of hospital with evolvable medical agents. arXiv 2025, arXiv:2405.02957. [Google Scholar] [CrossRef]
- Miller, G.E. The assessment of clinical skills/competence/performance. Acad. Med. 1990, 65, S63–S67. [Google Scholar] [CrossRef]
- Glaubitz, R.; Heinrich, L.; Tesch, F.; Seifert, M.; Reber, K.C.; Marschall, U.; Müller, G. The cost of the diagnostic odyssey of patients with suspected rare diseases. Orphanet J. Rare Dis. 2025, 20, 222. [Google Scholar] [CrossRef]
- Kim, Y.; Park, C.; Jeong, H.; Chan, Y.S.; Xu, X.; McDuff, D.; Park, H.W. MDAgents: An adaptive collaboration of LLMs for medical decision-making. Adv. Neural Inf. Process. Syst. 2024, 37, 79410–79452. [Google Scholar]
- Wang, H.; Fu, T.; Du, Y.; Gao, W.; Huang, K.; Liu, Z.; Chandak, P.; Liu, S.; Van Katwyk, P.; Deac, A.; et al. Scientific discovery in the age of artificial intelligence. Nature 2023, 620, 47–60. [Google Scholar] [CrossRef]
- Alsentzer, E.; Li, M.M.; Kobren, S.N.; Noori, A.; Undiagnosed Diseases Network; Kohane, I.S.; Zitnik, M. Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases. npj Digit. Med. 2025, 8, 380. [Google Scholar] [CrossRef]
- Zhu, Y.; He, Z.; Hu, H.; Zheng, X.; Zhang, X.; Wang, Z.; Gao, J.; Ma, L.; Yu, L. MedAgentBoard: Evaluating multi-agent LLM collaboration for medical training. arXiv 2025, arXiv:2505.12371. [Google Scholar]
- Wei, H.; Qiu, J.; Yu, H.; Yuan, W. Medco: Medical education copilots based on a multi-agent framework. In European Conference on Computer Vision; Springer Nature Switzerland: Cham, Switzerland, 2024; pp. 119–135. [Google Scholar]
- Zheng, J.; Shi, C.; Cai, X.; Li, Q.; Zhang, D.; Li, C.; Yu, D.; Ma, Q. Lifelong learning of large language model based agents: A roadmap. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 1–20. [Google Scholar] [CrossRef]
- Mialon, G.; Dessì, R.; Lomeli, M.; Nalmpantis, C.; Pasunuru, R.; Raileanu, R.; Rozière, B.; Schick, T.; Dwivedi-Yu, J.; Celikyilmaz, A.; et al. Augmented language models: A survey. arXiv 2023, arXiv:2302.07842. [Google Scholar] [CrossRef]
- Deng, Z.; Guo, Y.; Han, C.; Ma, W.; Xiong, J.; Wen, S.; Xiang, Y. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways. ACM Comput. Surv. 2025, 57, 182. [Google Scholar] [CrossRef]
- Sapkota, R.; Roumeliotis, K.I.; Karkee, M. AI agents vs. agentic AI: A conceptual taxonomy, applications and challenges. Inf. Fusion 2025, 126, 103599. [Google Scholar] [CrossRef]
- Su, H.; Luo, J.; Liu, C.; Yang, X.; Zhang, Y.; Dong, Y.; Zhu, J. A Survey on Autonomy-Induced Security Risks in Large Model-Based Agents. arXiv 2025, arXiv:2506.23844. [Google Scholar]
- Singh, A.; Ehtesham, A.; Kumar, S.; Khoei, T.T. A Survey of the Model Context Protocol (MCP): Standardizing Context to Enhance LLMs. Preprints 2025. [Google Scholar] [CrossRef]
- Hou, X.; Zhao, Y.; Wang, S.; Wang, H. Model Context Protocol (MCP): Landscape, Security Threats, and Future Research Directions. arXiv 2025, arXiv:2503.23278. [Google Scholar] [CrossRef]
- Klang, E.; Arnold, M.; Tessler, I.; Apakama, D.U.; Abbott, E.; Glicksberg, B.S.; Moses, A.; Nadkarni, G.N. Assessing retrieval-augmented large language models for medical coding. NEJM AI 2025, 2, AIcs2401161. [Google Scholar] [CrossRef]
- Slawomirski, L.; Kelly, D.; de Bienassis, K.; Kallas, K.A.; Klazinga, N. The Economics of Diagnostic Safety; OECD Health Working Papers, No. 176; OECD Publishing: Paris, France, 2025. [Google Scholar] [CrossRef]
- Nori, H.; Daswani, M.; Kelly, C.; Lundberg, S.; Ribeiro, M.T.; Wilson, M.; Liu, X.; Sounderajah, V.; Carlson, J.; Lungren, M.P.; et al. Sequential Diagnosis with Language Models. arXiv 2025, arXiv:2506.22405. [Google Scholar] [CrossRef]
- High-Level Expert Group on Artificial Intelligence. Ethics Guidelines for Trustworthy AI. European Commission. 2019. Available online: https://digital-strategy.ec.europa.eu/en/library/ethics-guidelines-trustworthy-ai (accessed on 5 July 2025).
- Yang, G.; Rao, A.; Fernandez-Maloigne, C.; Calhoun, V.; Menegaz, G. Explainable AI (XAI) In Biomedical Signal and Image Processing: Promises and Challenges. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 1531–1535. [Google Scholar] [CrossRef]
- Sullivan, H.R.; Schweikart, S.J. Are Current Tort Liability Doctrines Adequate for Addressing Injury Caused by AI? AMA J. Ethics 2019, 21, E160–E166. [Google Scholar] [CrossRef]
- Rigby, M.J. Ethical Dimensions of Using Artificial Intelligence in Health Care. AMA J. Ethics 2019, 21, E121–E124. [Google Scholar] [CrossRef]
- National Institute of Standards and Technology. Artificial Intelligence Risk Management Framework (AI RMF 1.0) (NIST AI 100-1); U.S. Department of Commerce: Washington, DC, USA, 2023. [Google Scholar] [CrossRef]
- ISO 14971:2019; Medical Devices: Application of Risk Management to Medical Devices. International Organization for Standardization (ISO): Geneva, Switzerland, 2019.
- IEC 62304:2006 + A1:2015; Medical Device Software: Software Life-Cycle Processes. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2015.
- IMDRF Software as a Medical Device (SaMD) Working Group. Characterization Considerations for Medical Device Software and Software-Specific Risk. IMDRF/SaMD WG/N81 FINAL:2025; International Medical Device Regulators Forum: Geneva, Switzerland, 2025. [Google Scholar]
- U.S. Department of Health and Human Services (HHS). HIPAA Privacy Rule, 45 CFR Parts 160 and 164; Office for Civil Rights: Washington, DC, USA, 2023. [Google Scholar]
- Choi, H.K.; Zhu, X.; Li, S. Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models? arXiv 2025, arXiv:2508.17536. [Google Scholar] [CrossRef]
- Rose, C.; Preiksaitis, C. AI passed the test, but can it make the rounds? AEM Educ. Train. 2024, 8, e11044. [Google Scholar] [CrossRef]
- Kim, J.; Podlasek, A.; Shidara, K.; Liu, F.; Alaa, A.; Bernardo, D. Limitations of large language models in clinical problem-solving arising from inflexible reasoning. Sci. Rep. 2025, 15, 39426. [Google Scholar] [CrossRef] [PubMed]
- Jimenez-Romero, C.; Yegenoglu, A.; Blum, C. Multi-agent systems powered by large language models: Applications in swarm intelligence. Front. Artif. Intell. 2025, 8, 1593017. [Google Scholar] [CrossRef]
- Gu, L.; Zhu, Y.; Sang, H.; Wang, Z.; Sui, D.; Tang, W.; Harrison, E.; Gao, J.; Yu, L.; Ma, L. MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems. arXiv 2025, arXiv:2510.10185. [Google Scholar]
- Drummond, M.F.; Sculpher, M.J.; Claxton, K.; Stoddart, G.L.; Torrance, G.W. Methods for the Economic Evaluation of Health Care Programmes, 4th ed.; Oxford University Press: Oxford, UK, 2015. [Google Scholar]
- Gong, E.J.; Bang, C.S.; Lee, J.J.; Baik, G.H. Knowledge-Practice Performance Gap in Clinical Large Language Models: Systematic Review of 39 Benchmarks. J. Med. Internet Res. 2025, 27, e84120. [Google Scholar] [CrossRef]
- U.S. Food and Drug Administration. Artificial Intelligence in Software as a Medical Device. 2025. Available online: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device (accessed on 25 September 2025).
- Li, H.; Cheng, X.; Zhang, X. Accurate Insights, Trustworthy Interactions: Designing a Collaborative AI-Human Multi-Agent System with Knowledge Graph for Diagnosis Prediction. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI ’25); Article No. 788; Association for Computing Machinery: New York, NY, USA, 2025; pp. 1–15. [Google Scholar] [CrossRef]


| Dimension | LangGraph | CrewAI |
|---|---|---|
| Control Flow | Explicit, developer-defined state machine. Nodes are agents/tools; edges are deterministic transitions. Supports branching, loops, retries, parallelism. | Emergent role-based delegation. Planner, Executor, Reviewer coordinate dynamically. No fixed global graph: flow adapts to agent decisions. |
| State Handling | Centralized, persistent graph state passed between nodes; memory is first-class and scoped to workflow steps. | Distributed state via shared crew memory and message passing. Less centralized; context emerges from dialogue rather than a single state object. |
| Human-in-the-loop | Native support through approval nodes, gating steps, and enforced checkpoints. | Supported through reviewer/critic roles or human task overrides, but less formally structured. |
| Determinism | High. Given the same inputs and settings, execution is reproducible and replayable. | Medium. Agent decisions depend on role behavior, LLM variability, and message contents; reproducibility possible but less guaranteed. |
| Best-fit Workloads | Safety-critical, protocolized workflows; regulated pipelines; workflows needing strict branching logic, validation, compliance, or deterministic replay. | Creative ideation, collaborative reasoning, decomposition and critique tasks, research workflows, expert-team simulations, open-ended problem solving. |
| Limitations | More upfront design effort: large graphs can become complex; less flexible for exploratory tasks. | Less deterministic; harder to enforce strict protocols; role interactions can become noisy or redundant if not carefully designed. |
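The control-flow contrast in the table above can be made concrete with a framework-agnostic sketch of the explicit state-machine pattern attributed to LangGraph: nodes are callables over a shared state, and edges are deterministic transition functions. This is not the LangGraph API itself, only the underlying control-flow idea; the toy pipeline and its node names are invented for illustration.

```python
# Framework-agnostic sketch of an explicit state machine, the control-flow
# pattern the table attributes to LangGraph. Not the LangGraph API; the
# pipeline and node names are illustrative.

from typing import Callable

State = dict
Node = Callable[[State], State]

def run_graph(nodes: dict[str, Node],
              edges: dict[str, Callable[[State], str]],
              start: str, state: State) -> State:
    """Execute nodes until an edge routes to the terminal 'END' label.
    Given the same inputs, the trajectory is fully reproducible."""
    current = start
    while current != "END":
        state = nodes[current](state)
        current = edges[current](state)  # deterministic branching
    return state

# Toy pipeline: extract -> validate -> (retry extract | END)
nodes = {
    "extract":  lambda s: {**s, "value": s["raw"].strip(),
                           "tries": s.get("tries", 0) + 1},
    "validate": lambda s: {**s, "ok": s["value"].isdigit()},
}
edges = {
    "extract":  lambda s: "validate",
    "validate": lambda s: "END" if s["ok"] or s["tries"] >= 3 else "extract",
}
result = run_graph(nodes, edges, "extract", {"raw": " 42 "})
print(result["ok"], result["value"])  # True 42
```

Because every transition is a pure function of the state, the run can be replayed and audited step by step, which is exactly the determinism property the table contrasts with emergent role-based delegation.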
| Guardrail Area | Core Control Mechanism | Biomedical Implementation Example |
|---|---|---|
| Accountability & Transparency | Persistent, tamper-evident decision logging; structured explainability layers that expose intermediate agent reasoning; human-approval checkpoints for consequential actions. | Append-only audit trails capturing prompts, model versions, and tool outputs; clinician-readable summaries of agent deliberations before final recommendations. |
| Bias & Fairness | Continuous subgroup-performance auditing; counterargument or “devil’s-advocate” agents to challenge consensus; diversity of model sources to avoid correlated bias. | Periodic bias stress-tests across sex, race, and socioeconomic strata; fairness-review agents questioning diagnostic or treatment disparities. |
| Data & Privacy Governance | Principle of least privilege; automatic de-identification; end-to-end encryption; retrieval allow-lists; local-only data residency where possible. | Role-scoped access to de-identified EHR records; encrypted inter-agent channels; content filters that block PHI leakage during retrieval or tool use. |
| Safety & Human Oversight | Clinical rule-packs for contraindications and dosage safety; approval gates for high-risk or off-label actions; fallback to deterministic single-agent mode on anomalies. | Embedded drug–drug interaction checks; dual sign-off for therapeutic orders; circuit-breakers that halt unsafe multi-agent cascades. |
| Evidence & Auditability | Provenance enforcement (“no-evidence, no-claim”); automated citation extraction; confidence-linked references; replayable decision provenance. | Each recommendation linked to guideline ID, DOI, or PMID; confidence scores with evidence hyperlinks; reproducible workflow re-runs for audits. |
| Regulatory Alignment & Trust | Mapping of system functions to SaMD risk classes; integration of ISO 14971/IEC 62304 life-cycle controls; post-market drift monitoring and human-in-the-loop review. | Documented verification and change-control plans; real-time performance dashboards; clinician feedback loops during adaptive model updates. |
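The append-only, tamper-evident decision logging named in the Accountability & Transparency row can be sketched with a hash-chained log: each entry commits to the previous entry's hash, so any retroactive edit breaks the chain on verification. The class, field names, and example events below are illustrative, not a standard audit schema.

```python
# Minimal sketch of a tamper-evident, append-only decision log: each
# record is hash-chained to its predecessor, so retroactive edits are
# detectable. Field names and events are illustrative only.

import hashlib
import json
import time

class AuditTrail:
    def __init__(self):
        self.entries = []

    def append(self, agent: str, model_version: str, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else "GENESIS"
        record = {"agent": agent, "model_version": model_version,
                  "event": event, "prev_hash": prev_hash,
                  "ts": time.time()}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute every hash; any tampered entry breaks the chain."""
        prev = "GENESIS"
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if recomputed != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditTrail()
log.append("triage-agent", "demo-model-v1", {"action": "suggest_dx", "dx": "J00"})
log.append("reviewer-agent", "demo-model-v1", {"action": "approve"})
print(log.verify())                    # True: chain intact
log.entries[0]["event"]["dx"] = "C34"  # retroactive tampering
print(log.verify())                    # False: chain broken
```

In practice such a log would also capture prompts, tool outputs, and model versions per the table's first row, and would be persisted to write-once storage rather than held in memory.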
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Spieser, J.; Balapour, A.; Meller, J.; Patra, K.C.; Shamsaei, B. A Review of Multi-Agent AI Systems for Biological and Clinical Data Analysis. Methods Protoc. 2026, 9, 33. https://doi.org/10.3390/mps9020033

