An Agentic LLM Framework for Autonomous Surgical Continuum Monitoring: ReAct-Driven Tool-Use Agents for Presurgical, Intraoperative, and Postsurgical Cardiopulmonary Care
Abstract
1. Introduction
- Architectural transformation: We formally specify the replacement of five rule-based MAS agents with six ReAct-driven tool-use agents, demonstrating how each hardcoded decision function maps to a dynamic reasoning capability.
- Surgical continuum adaptation: We adapt the prior ED-focused architecture to the presurgical-to-post-discharge cardiopulmonary care continuum, redesigning agent roles, tool registries, and HITL gates for the multi-phase surgical journey.
- Tool registry specification: We define a ten-function clinical tool registry with full parameter specifications, enabling agents to invoke DETER deterioration prediction, RAG evidence retrieval, FHIR data access, and resource coordination as runtime tool calls.
- Confidence-gated HITL: We replace fixed escalation thresholds with a confidence-gated HITL architecture where agents self-assess their reasoning quality and escalate to clinicians only when genuinely uncertain—reducing alert fatigue without compromising safety.
- Extended conflict resolution: We extend the prior P(p,t) priority function with surgical-phase and DETER trajectory gradient terms and replace the rule-based Priority Queue Manager with an LLM-mediated Coordination Supervisor Agent.
2. Background and Related Work
2.1. From Rule-Based MAS to Agentic LLM Frameworks
2.2. Surgical Continuum Coordination: The Unsolved Problem
2.3. LLM Tool Use in Clinical Settings
2.4. Reflexion and Self-Evaluation in Healthcare AI
3. Agentic Digital Twin Architecture
3.1. Architectural Overview
3.2. The ReAct Loop as Agent Computation Model
- Observe: The agent reads its relevant state slice from S(t) via authorised FHIR tool calls. For the DETER Monitoring Agent, this includes the current 144-step physiological window, recent EMR deltas, and active DETER risk scores. For the Coordination Supervisor, this includes all agent heartbeat states and the current resource competition queue.
- Reason: The agent generates a verbal chain-of-thought reasoning trace over the observed state, drawing on its domain-specific system prompt, the retrieved evidence from RAG_retrieve(), and the outputs of any computational tools invoked during reasoning. The reasoning trace is explicitly recorded in the audit log for every cycle.
- Act: The agent selects and invokes the appropriate action from its tool registry. Before any tool execution that modifies the care pathway—FHIR_observation_write(), bed_assign(), examination_order_evaluate(), and alert_escalate()—the agent invokes HITL_escalate() if its self-assessed confidence is below the configured threshold (default: 0.75) or if the action type is flagged as requiring mandatory human confirmation.
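The Observe, Reason, Act steps above can be sketched as a single cycle. This is a minimal illustration, not the authors' implementation: `Decision`, `run_cycle`, and the callables passed in are hypothetical names, and the gate follows the Act step as worded above (escalate for a care-pathway tool when confidence is below 0.75 or the action is flagged mandatory). The outcome strings mirror the return values listed for HITL_escalate() in the tool registry.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

CONFIDENCE_THRESHOLD = 0.75  # default threshold stated in the Act step

# Care-pathway-modifying tools named in the Act step.
GATED_TOOLS = {
    "FHIR_observation_write",
    "bed_assign",
    "examination_order_evaluate",
    "alert_escalate",
}


@dataclass
class Decision:
    """Hypothetical container for the output of the Reason step."""
    tool: str
    args: Dict[str, Any]
    confidence: float          # self-assessed confidence in [0, 1]
    trace: str                 # verbal chain-of-thought, kept for the audit log
    mandatory_hitl: bool = False


def run_cycle(
    observe: Callable[[], Dict[str, Any]],
    reason: Callable[[Dict[str, Any]], Decision],
    tools: Dict[str, Callable[..., Any]],
    hitl_escalate: Callable[[Decision], str],
    audit_log: Callable[[Dict[str, Any]], None],
) -> Optional[Any]:
    """Run one Observe-Reason-Act cycle and return the tool result, if any."""
    state = observe()                       # Observe
    decision = reason(state)                # Reason

    needs_hitl = decision.tool in GATED_TOOLS and (
        decision.confidence < CONFIDENCE_THRESHOLD or decision.mandatory_hitl
    )
    hitl_outcome = hitl_escalate(decision) if needs_hitl else "not_required"

    result = None
    # Assumption: "modified_action" is not handled here and blocks execution,
    # exactly like "rejection".
    if hitl_outcome in ("not_required", "approval"):
        result = tools[decision.tool](**decision.args)   # Act

    # The reasoning trace is recorded for every cycle, executed or not.
    audit_log({
        "reasoning_chain": decision.trace,
        "tool": decision.tool,
        "confidence": decision.confidence,
        "hitl_outcome": hitl_outcome,
        "executed": result is not None,
    })
    return result
```

In this sketch the gate sits in the cycle driver rather than in the model's reasoning, so a low-confidence write cannot reach the tool without a clinician outcome of `approval`.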
3.3. Confidence-Gated HITL Architecture
- Confidence gate: agent self-assessed reasoning confidence < 0.75 at the Act step, as expressed in the Reflexion self-critique.
- Novel scenario detection: agent identifies in its reasoning trace that the current clinical presentation is outside its training distribution or its episodic memory of prior cases.
- Mandatory action types: any tool execution in a designated mandatory HITL category regardless of confidence: surgical plan modifications, admission-to-discharge transitions, and anaesthesia protocol recommendations.
- Priority tie: inter-agent resource conflict where |P(p,t) − P(p′,t)| < ε_tie = 0.01 after LLM-mediated reasoning by the Coordination Supervisor.
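The four escalation triggers listed above can be read as independent predicates, any one of which is sufficient. The following is a hedged sketch under that reading: the function name, argument names, and the string labels for action types are hypothetical, while the numeric values 0.75 and 0.01 come from the list.

```python
from typing import List, Optional

CONFIDENCE_THRESHOLD = 0.75   # confidence gate
EPSILON_TIE = 0.01            # priority-tie tolerance

# Hypothetical labels for the mandatory HITL categories named in the text.
MANDATORY_ACTION_TYPES = {
    "surgical_plan_modification",
    "admission_to_discharge_transition",
    "anaesthesia_protocol_recommendation",
}


def hitl_triggers(
    confidence: float,
    novel_scenario: bool,
    action_type: str,
    priority_gap: Optional[float] = None,
) -> List[str]:
    """Return the names of all triggers that fire; an empty list means no escalation."""
    reasons: List[str] = []
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("confidence_gate")
    if novel_scenario:
        reasons.append("novel_scenario")
    if action_type in MANDATORY_ACTION_TYPES:
        reasons.append("mandatory_action_type")
    # priority_gap is P(p,t) - P(p',t); only relevant when a resource conflict exists.
    if priority_gap is not None and abs(priority_gap) < EPSILON_TIE:
        reasons.append("priority_tie")
    return reasons
```

Returning every trigger that fired, rather than a single boolean, lets the escalation record state why the clinician was interrupted.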
Confidence Score Elicitation Mechanism
4. Agentic Agent Design
4.1. Preoperative Risk Stratification Agent
4.2. Intraoperative Monitoring Agent
4.3. DETER Monitoring Agent (Core)
4.4. Resource Allocation Agent
4.5. Discharge and Rehabilitation Agent
4.6. Coordination Supervisor Agent
5. Clinical Tool Registry
5.1. Tool Security and Authorisation Architecture
Adversarial Threat Model
5.2. DETER DSS Extension to Multi-Phase Surgical Knowledge
- Cardiopulmonary surgery literature (PubMed): 97 procedure-specific outcome profiles with temporal evidence weighting [31].
- Clinical practice guidelines: ESC, AHA/ACC, STS guidelines for cardiac, thoracic, and major vascular surgery; NICE perioperative guidelines.
- FHIR patient state records: real-time S(t) slice from the DT Core, including Φ(t) surgical context vector.
- SNOMED-CT + ICD-11: coded clinical concept matching for structured query disambiguation.
- BNF/RxNorm drug-interaction safety index: postsurgical polypharmacy interaction screening.
- Anonymised institutional outcome data: procedure-matched prior cases from the University of Patras Cardiothoracic Clinic, indexed by EuroSCORE II risk decile and procedure type.
- DETER prediction history: the current patient’s own DETER risk score trajectory, enabling retrieval of similar physiological trajectories from anonymised prior cases for comparative context.
6. Agentic Coordination Framework
6.1. Three-Tier Structure Retained and Extended
6.2. Extended Conflict Resolution Function P(p,t,context)
6.3. Surgical Phase as Coordination Anchor
6.4. Scalability: Dynamic Agent Instantiation
7. Safety Properties and Formal Specifications
7.1. Safety Invariants
- No care-pathway modification without HITL confirmation: Any tool execution in the {FHIR_observation_write, bed_assign, EHR_writeback, CAREPOI_remote_init, MCI_protocol_trigger} set requires a prior HITL_escalate() call with a clinician digital signature. This invariant is enforced at Layer 3 of the tool authorisation architecture, not by the agent’s reasoning logic.
- No tool invocation outside agent registry: The Layer 2 tool registry whitelist is immutable at runtime. Agents cannot invoke tools not in their registry, and the DETER DSS clinical safety filter rejects out-of-registry calls before execution.
- Immutable audit trail: Every Observe–Reason–Act cycle, every tool call (attempted and executed), every Reflexion self-critique, and every HITL outcome is written to the DETER DSS audit trail via audit_log(). The audit trail is append-only and cryptographically signed (EU AI Act Article 12 compliance).
- Graceful degradation: If any agent fails (timeout, exception, or LLM inference failure), the Coordination Supervisor activates a backup rule-based fallback agent for that role. The fallback agent implements the equivalent of the prior framework’s rule-based logic, ensuring patient safety is maintained even during LLM infrastructure failures. The fallback activation is logged and triggers a HITL alert.
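The source does not say how the append-only, cryptographically signed audit trail is built. One common way to make a log tamper-evident is a hash chain in which each entry's keyed MAC covers the previous entry's MAC. The sketch below uses that technique purely as an illustration; `AuditTrail` and its methods are hypothetical names, and an HMAC is a shared-key integrity check, not the clinician digital signature mentioned in the invariants.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List


class AuditTrail:
    """Illustrative append-only log; each entry is chained to its predecessor."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._entries: List[Dict[str, str]] = []

    def _mac(self, prev: str, payload: str) -> str:
        message = (prev + payload).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def append(self, record: Dict[str, Any]) -> str:
        """Serialise the record, chain it to the last entry, and return its MAC."""
        prev = self._entries[-1]["mac"] if self._entries else ""
        payload = json.dumps(record, sort_keys=True)
        mac = self._mac(prev, payload)
        self._entries.append({"prev": prev, "payload": payload, "mac": mac})
        return mac

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed, or reordered entry breaks it."""
        prev = ""
        for entry in self._entries:
            expected = self._mac(prev, entry["payload"])
            if entry["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
                return False
            prev = entry["mac"]
        return True
```

A record appended here could carry the fields listed for audit_log() in the tool registry (agent_id, reasoning_chain, tools_called, and so on); the chain only guarantees that later edits are detectable.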
7.2. Formal State-Space Preservation
7.3. Confidence Threshold Calibration Protocol
7.4. EU AI Act High-Risk Compliance Pathway
- Transparency and explainability (Article 13): Every agent reasoning trace is recorded verbatim in the audit trail; every tool call is logged with input parameters and output; HITL confirmations are digitally signed and timestamped [8].
- Human oversight (Article 14): HITL_escalate() is a mandatory architectural gate for all care-pathway modifications; clinicians have one-click override capability at every escalation point; the confidence-gated HITL ensures human oversight is proportionate to clinical uncertainty.
- Accuracy, robustness, and cybersecurity (Article 15): Reflexion self-correction improves reasoning robustness; the rule-based fallback agent ensures continuity of function during LLM failures; the tool authorisation architecture prevents prompt-injection attacks from influencing tool execution.
- Data governance (Article 10): BioMistral-7B or Meditron-70B on-premise deployment ensures no patient data egress to third-party LLM providers; all FHIR data access is governed by hospital IG policies and GDPR Article 25 privacy-by-design.
7.5. Reasoning Trace Access Control and Governance
8. Limitations and Future Directions
8.1. LLM Reasoning Reliability
Reflexion-Specific Failure Modes
8.2. Latency Profile
Concurrent Multi-Patient Load Analysis
8.3. Confidence Calibration
8.4. Illustrative Simulation Scenarios
8.5. Future Directions
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Pylarinou, C.; Gortzis, L.G.; Koletsis, E.; Mulita, F.; Leivaditis, V.; Liolis, E.; Mavrilas, D. DETER: A Clinical Deterioration Prediction Algorithm to Improve Patient Care with Devices-Based Telemetry and Generative AI. J. Comput. Intell. Biomed. (ICCK JCIB), 2026; in press.
2. Yao, S.; Zhao, J.; Yu, D.; Shafran, I.; Griffiths, T.L.; Cao, Y.; Narasimhan, K. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv 2023, arXiv:2210.03629.
3. Shinn, N.; Cassano, F.; Labash, A.; Gopalan, A.; Narasimhan, K.; Yao, S. Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv 2023, arXiv:2303.11366.
4. Maleki Varnosfaderani, S.; Forouzanfar, M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering 2024, 11, 337.
5. Shengli, W. Is human digital twin possible? Comput. Methods Programs Biomed. Update 2021, 1, 100014.
6. Bellifemine, F.; Caire, G.; Greenwood, D. Developing Multi-Agent Systems with JADE; Wiley: Chichester, UK, 2007.
7. Bordini, R.H.; Hubner, J.F.; Wooldridge, M. Programming Multi-Agent Systems in AgentSpeak Using Jason; Wiley: Chichester, UK, 2007.
8. Metta, C.; Beretta, A.; Pellungrini, R.; Rinzivillo, S.; Giannotti, F. Towards Transparent Healthcare: Advancing Local Explanation Methods in Explainable Artificial Intelligence. Bioengineering 2024, 11, 369.
9. Nashef, S.A.M.; Roques, F.; Sharples, L.D.; Nilsson, J.; Smith, C.; Goldstone, A.R.; Lockowandt, U. EuroSCORE II. Eur. J. Cardio-Thorac. Surg. 2012, 41, 734–745.
10. Hemmerling, T.M.; Charabati, S.; Zaouter, C.; Minardi, C.; Mathieu, P.A. A randomized controlled trial demonstrates that a novel closed-loop propofol system performs better hypnosis control than manual administration. Can. J. Anaesth. 2010, 57, 725–735.
11. Grieves, M.; Vickers, J. Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems. In Transdisciplinary Perspectives on Complex Systems; Springer: Cham, Switzerland, 2017; pp. 85–113.
12. Pylarinou, C.; Gortzis, L.; Zimeras, S. Coordinating effectively a heart attack pre-hospital service process using constraint optimization programming. In Proceedings of the 2011 International Conference on Electrical and Control Engineering, Yichang, China, 16–18 September 2011; pp. 4752–4755.
13. Yuan, S.; Yang, Z.; Li, J.; Wu, C.; Liu, S. AI-Powered Early Warning Systems for Clinical Deterioration Significantly Improve Patient Outcomes: A Meta-Analysis. BMC Med. Inform. Decis. Mak. 2025, 25, 203.
14. Gortzis, L.; Sakellaropoulos, G.; Nikiforidis, G. Multi-Agent Cooperation Infrastructure to Support Patient-Oriented Telecare Services. In Proceedings of the 2006 International Conference on Information Technology: Research and Education, Tel Aviv, Israel, 16–19 October 2006; pp. 4278–4281.
15. Singhal, K.; Azizi, S.; Tu, T.; Mahdavi, S.S.; Wei, J.; Chung, H.W.; Scales, N.; Tanwani, A.; Cole-Lewis, H.; Pfohl, S.; et al. Large language models encode clinical knowledge. Nature 2023, 620, 172–180.
16. Labrak, Y.; Bazoge, A.; Morin, E.; Gourraud, P.-A.; Rouvier, M.; Dufour, R. BioMistral: A collection of open-source pretrained large language models for medical domains. arXiv 2024, arXiv:2402.10373.
17. Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Li, B.; Zhu, E.; Jiang, L.; Zhang, X.; Zhang, S.; Liu, J.; et al. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. arXiv 2023, arXiv:2308.08155.
18. Katsoulakis, E.; Wang, Q.; Wu, H.; Shahriyari, L.; Fletcher, R.; Liu, J.; Achenie, L.; Liu, H.; Jackson, P.; Xiao, Y.; et al. Digital Twins for Health: A Scoping Review. npj Digit. Med. 2024, 7, 77.
19. Lehmann, C.U.; Gundlapalli, A.V.; Williamson, J.J.; Fridsma, D.B.; Hersh, W.R.; Krousel-Wood, M.; Ondrula, C.J.; Munger, B. Five Years of Clinical Informatics Board Certification for Physicians in the United States of America. Yearb. Med. Inform. 2018, 27, 237–242.
20. Bica, I.; Alaa, A.M.; Lambert, C.; van der Schaar, M. From real-world patient data to individualized treatment effects using machine learning: Current and future methods to address underlying challenges. Clin. Pharmacol. Ther. 2021, 109, 87–100.
21. Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model. Bioinformatics 2020, 36, 1234–1240.
22. Moor, M.; Banerjee, O.; Abad, Z.S.H.; Krumholz, H.M.; Leskovec, J.; Topol, E.J.; Rajpurkar, P. Foundation models for generalist medical artificial intelligence. Nature 2023, 616, 259–265.
23. Han, S.; Zhang, Q.; Yao, Y.; Jin, W.; Xu, Z.; He, C. LLM multi-agent systems: Challenges and open problems. arXiv 2024, arXiv:2402.03578.
24. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
25. Kim, Y.; Park, C.; Jeong, H.; Chan, Y.S.; Xu, X.; McDuff, D.; Lee, H.; Ghassemi, M.; Breazeal, C.; Park, H.W. MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making. arXiv 2024, arXiv:2404.15155.
26. Johnson, Z.; Saikia, M.J. Digital Twins for Healthcare Using Wearables. Bioengineering 2024, 11, 606.
27. Royal College of Physicians. National Early Warning Score (NEWS) 2; RCP: London, UK, 2017.
28. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-T.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
29. HL7 International. HL7 FHIR R4 Standard; HL7 International: Ann Arbor, MI, USA, 2019.
30. European Parliament and Council of the EU. Regulation (EU) 2024/1689 Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). Official Journal of the European Union. 2024. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689 (accessed on 9 June 2026).
31. Pylarinou, C.; Mulita, F.; Koletsis, E.; Leivaditis, V.; Liolis, E.; Gortzis, L.; Mavrilas, D. A Clinical Decision Support System for Post-Surgical Cardiovascular Remote Monitoring. Clin. Pract. 2026, 16, 93.
32. Thirunavukarasu, A.J.; Ting, D.S.J.; Elangovan, K.; Gutierrez, L.; Tan, T.F.; Ting, D.S.W. Large Language Models in Medicine. Nat. Med. 2023, 29, 1930–1940.
33. Chen, Z.; Hernández-Cano, A.; Romanou, A.; Bonnet, A.; Matoba, K.; Salvi, F.; Pagliardini, M.; Fan, S.; Köpf, A.; Mohtashami, A.; et al. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. arXiv 2023, arXiv:2311.16079.
34. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.

| System | Autonomy | Tool Use | Safety Enforcement | Temporal Coverage | Self-Correction |
|---|---|---|---|---|---|
| JADE/JASON BDI [6,7] | Rule-bound (L1) | Fixed APIs | Formal (plan library) | Episode | No |
| DETER MAS Prior [1] | Rule-bound+PPO (L1) | Fixed APIs | Formal (priority queue) | Hours–days | No |
| Med-PaLM 2 [15] | Single LLM (L3) | None | None | Single query | No |
| Hemmerling 2010 [10] | Closed-loop (L2) | PK/PD model | Rule-based | Single procedure | No |
| AutoGen [17] | Dynamic LLM (L4) | Dynamic | Informal | Task-specific | Minimal |
| MDAgents [25] | Dynamic LLM (L4) | Limited | Voting consensus | Single case | Limited |
| This work | Agentic LLM (L5) | Formal registry (10) | Multi-layer + CFVL | Full continuum | Reflexion+CFVL |

| Agent | Trigger Condition | Threshold Value | Basis |
|---|---|---|---|
| Intraoperative Monitoring | Mean arterial pressure (MAP) | <65 mmHg for >30 consecutive seconds | AAGBI/SCA intraoperative crisis criteria |
| Intraoperative Monitoring | Heart rate | <40 bpm or >160 bpm for >20 s | AAGBI/SCA intraoperative crisis criteria |
| Intraoperative Monitoring | SpO2 | <88% for >60 s | AAGBI/SCA intraoperative crisis criteria |
| Intraoperative Monitoring | CPB pump flow | Drop > 30% from baseline within one observation window | Cardiothoracic surgical protocol |
| DETER Monitoring (ICU/ward) | DETER 6 h risk score | ≥0.85 on two consecutive prediction cycles (~10–20 min) | DETER validation cohort [1] |
| DETER Monitoring (ICU/ward) | NEWS2 score | ≥7 on single assessment | Royal College of Physicians NEWS2 [27] |
| DETER Monitoring (ICU/ward) | LLM inference/Reflexion limit | Timeout > 2000 ms, exception, or N_max = 2 reached without convergence | Latency budget (Section 8.2) |
| Coordination Supervisor | LLM conflict resolution | Fails after N_max = 2 iterations | Reflexion termination condition (Section 3.2) |
| Coordination Supervisor | Simultaneous HITL requests | ≥3 agents requesting HITL in one coordination cycle | System load threshold |
| Coordination Supervisor | OPEL level | Level 4 (system black)—MCI protocol activated; agentic coordination suspended | NHS England OPEL framework |
| All agents (universal) | LLM inference failure | Any timeout > 2000 ms, any exception, or out-of-registry tool call attempt | Safety-by-default fallback policy |
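Several thresholds in the table above are duration-qualified (for example, MAP < 65 mmHg for more than 30 consecutive seconds). A minimal sketch of such a check follows, assuming time-ordered `(timestamp_seconds, value)` samples and measuring duration between sample timestamps. The function name and the sampling model are assumptions for illustration, not part of the described system.

```python
from typing import Callable, Iterable, Optional, Tuple


def sustained_breach(
    samples: Iterable[Tuple[float, float]],
    is_breach: Callable[[float], bool],
    min_duration_s: float,
) -> bool:
    """Return True if is_breach holds continuously for MORE than min_duration_s.

    samples: time-ordered (timestamp_seconds, value) pairs.
    """
    breach_start: Optional[float] = None
    for timestamp, value in samples:
        if is_breach(value):
            if breach_start is None:
                breach_start = timestamp      # first sample of a new breach run
            if timestamp - breach_start > min_duration_s:
                return True
        else:
            breach_start = None               # any recovery resets the run
    return False


# Usage with thresholds from the table: MAP < 65 mmHg for > 30 s,
# heart rate < 40 or > 160 bpm for > 20 s.
map_samples = [(0, 62.0), (10, 61.0), (20, 63.0), (30, 60.0), (40, 59.0)]
map_fallback = sustained_breach(map_samples, lambda v: v < 65.0, 30.0)

hr_samples = [(0, 170.0), (10, 172.0), (20, 150.0), (30, 171.0)]
hr_fallback = sustained_breach(hr_samples, lambda v: v < 40.0 or v > 160.0, 20.0)
```

Because the check is a plain comparison loop with no model inference, it fits the rule-based fallback path that the table reserves for time-critical events.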

| Dimension | Rule-Based MAS (Prior Framework) | Agentic LLM Framework (This Work) |
|---|---|---|
| Agent logic | Hardcoded decision trees and PPO-trained policies per role | LLM-based autonomous agents with dynamic goal decomposition, tool invocation, and self-correction via ReAct loops |
| Reasoning | Deterministic if–then rules; cannot handle novel scenario combinations | Chain-of-thought + tool-use reasoning; generalises to unseen clinical presentations |
| Tool integration | Fixed API calls per agent role; no dynamic tool selection | Dynamic tool registry: agents select and invoke the appropriate FHIR/clinical/computational tool at runtime |
| Inter-agent coordination | Rule-based message passing with static priority queue P(p,t) | LLM-mediated negotiation with structured handoff protocols; P(p,t) replaced by agent-reasoned consensus under constraints |
| Knowledge source | Static RAG pipeline queried per fixed trigger | Agents dynamically decide when and what to retrieve; multi-hop retrieval chains for complex clinical questions |
| Self-correction | No; errors propagate unless HITL intervenes | Reflexion-style self-evaluation: agents critique their own reasoning before acting |
| Application domain | Pre-hospital to ED triage (acute, episodic) | Presurgical–intraoperative–postsurgical continuum (longitudinal, multi-phase) |
| Temporal horizon | Minutes to hours (ED encounter) | Days to weeks (surgical journey from consent to rehabilitation) |
| HITL interface | Override at decision escalation points | HITL is a mandatory gating step in every ReAct cycle before any tool execution that modifies care pathway |

| Agent | Surgical Phase | ReAct Loop Signature | Tool Registry | Authorisation and HITL Gate |
|---|---|---|---|---|
| Preoperative Risk Stratification Agent | Presurgical consent to scheduling | Observe: patient HIS data + FHIR records. Reason: EuroSCORE II/STS risk synthesis. Act: risk report + surgical plan endorsement or flag. | EuroSCORE_calc(), STS_risk_api(), FHIR_patient_read(), guideline_retrieve(), DETER_DSS_query(), comorbidity_screen() | Read EHR; generate risk report; HITL gate before any surgical plan modification or high-risk flag propagation |
| Intraoperative Monitoring Agent | Operative phase (sterile-field-aware) | Observe: real-time haemodynamic + anaesthesia streams. Reason: deterioration trajectory vs. operative stage. Act: alert escalation or physiological correction suggestion. | biosensor_stream_read(), anaesthesia_monitor_api(), CPB_event_log(), DETER_predict(), NEWS2_calc(), alert_publish() | Read monitors; generate alerts; HITL gate before any anaesthesia or perfusion protocol recommendation |
| DETER Monitoring Agent ★ Core | Postsurgical ICU and ward | Observe: CAREPOI telemetry 5–10 min + EMR delta. Reason: 6 h/24 h/7 d deterioration trajectory via DETER. Act: personalised risk score + CDS recommendation chain. | DETER_predict(), RAG_retrieve(), FHIR_observation_write(), NEWS2_calc(), alert_escalate(), audit_log() | Write observations; escalate alerts; HITL gate before all care pathway modifications; every act logged immutably |
| Resource Allocation Agent ★ | All surgical phases | Observe: DT Core S(t) resource state + examination orders. Reason: evidence-based investigation appropriateness per phase. Act: approve/flag/redirect examination orders; coordinate HEART ECG. | DT_state_read(), bed_assign(), examination_order_evaluate(), LIMS_api(), RIS_api(), ECG_coordinate(), cost_track() | Read resource state; flag orders; HITL gate before bed reassignment or examination restriction affecting care |
| Discharge and Rehabilitation Agent | Post-discharge remote monitoring | Observe: LACE+ readiness + CAREPOI remote telemetry. Reason: trajectory-based discharge timing and follow-up intensity. Act: discharge plan + remote monitoring initiation + PROMs scheduling. | LACE_plus_calc(), FHIR_task_write(), CAREPOI_remote_init(), PROM_schedule(), follow_up_book(), EHR_writeback() | Write discharge plan; HITL gate before admission-to-discharge transition; initiate remote monitoring autonomously within authorised parameters |
| Coordination Supervisor Agent | System-wide (all phases) | Observe: all agent states + DT Core health + OPEL level. Reason: multi-agent conflict under priority function + surge detection. Act: re-prioritise queue; escalate to HITL for unresolvable conflicts. | agent_health_monitor(), priority_queue_manage(), conflict_resolve_P(), OPEL_read(), MCI_protocol_trigger(), HITL_escalate() | Orchestrate all agents; HITL escalation for any inter-agent conflict above epsilon threshold; MCI protocol activation |

| Tool Name | Category | Description and Parameters | Invoked by |
|---|---|---|---|
| DETER_predict() | Prediction | Runs DETER Transformer inference on current 144-step physiological window. Returns: risk_score{6 h, 24 h, 72 h}, confidence_interval, and feature_importance_map. | DETER Monitoring, Intraoperative |
| RAG_retrieve() | Knowledge | Semantic search over the cardiopulmonary surgery literature, 97 procedure profiles, ESC/AHA/ACC guidelines, and SNOMED-CT. Params: query_text, top_k, and phase_filter. Returns: evidence_list with provenance [28]. | DETER Monitoring, Preoperative Risk |
| FHIR_patient_read() | Data | Reads Patient, Observation, Condition, MedicationRequest FHIR R4 resources. Params: patient_id, resource_types[], and time_window. Returns: structured FHIR bundle [29]. | All agents |
| FHIR_observation_write() | Data | Creates FHIR observation resource with HITL approval flag. Params: patient_id, observation_type, value, unit, and flag_hitl. Returns: resource_id. | DETER Monitoring |
| EuroSCORE_calc() | Clinical | Computes EuroSCORE II and STS risk scores from structured patient data. Returns: logistic_euroSCORE, additive_euroSCORE, STS_mortality, and STS_morbidity. | Preoperative Risk |
| examination_order_evaluate() | Resource | Evaluates proposed examination order against ESI-stratified evidence protocols and current resource availability from DT Core. Returns: approve/flag/redirect with justification. | Resource Allocation |
| conflict_resolve_P() | Coordination | Computes the extended priority function P(p,t,context) = w1·f1(acuity_p) + w2·f2(WT_p(t)) + w3·f3(CR_p(t)) + w4·f4(phase_p(t)) + w5·f5(∇DETER_p(t)) for each competing patient, with default weights w = (0.40, 0.15, 0.20, 0.15, 0.10). Returns: ranked patient list with priority scores. | Coordination Supervisor |
| agent_health_monitor() | Infrastructure | Polls heartbeat of all active agents. Detects timeout, exception, or deadlock. Activates backup instance on failure. Returns: agent_status_map. | Coordination Supervisor |
| HITL_escalate() | Safety | Packages agent reasoning chain, evidence provenance, and proposed action into structured HITL dashboard alert. Requires clinician digital signature before proceeding. Returns: approval/rejection/modified_action. | All agents (mandatory) |
| audit_log() | Compliance | Writes immutable timestamped record of agent_id, reasoning_chain, tools_called, evidence_retrieved, action_proposed, and HITL_outcome to DETER DSS audit trail (EU AI Act Article 12 compliant) [19,30]. | All agents (mandatory) |
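The weighted sum computed by conflict_resolve_P() can be illustrated numerically. The component functions f1 to f5 are not specified in this section, so the sketch below assumes each feature has already been normalised to [0, 1], which is equivalent to treating every f_i as the identity. The function names and the example patients are hypothetical; only the weight vector is taken from the table.

```python
from typing import Dict, List, Sequence, Tuple

# Default weights from the registry entry, in the order:
# acuity, waiting time, CR term, surgical phase, DETER trajectory gradient.
WEIGHTS: Tuple[float, ...] = (0.40, 0.15, 0.20, 0.15, 0.10)


def priority(features: Sequence[float], weights: Sequence[float] = WEIGHTS) -> float:
    """P(p,t,context) as a weighted sum of pre-normalised features (assumption)."""
    if len(features) != len(weights):
        raise ValueError("expected one feature per weight")
    return sum(w * f for w, f in zip(weights, features))


def rank_patients(patients: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (patient_id, score) pairs, highest priority first."""
    scored = [(pid, priority(feats)) for pid, feats in patients.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical example: patient "a" has high acuity, patient "b" is moderate throughout.
ranking = rank_patients({
    "a": (0.9, 0.2, 0.5, 0.5, 0.1),
    "b": (0.5, 0.5, 0.5, 0.5, 0.5),
})
```

Because the default weights sum to 1.00, scores stay in [0, 1] whenever the inputs do, which keeps the 0.01 tie tolerance on a fixed scale.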

| Component | Prior MAS Framework | Agentic Framework (This Work) | Rationale |
|---|---|---|---|
| Priority function | P(p,t) = w1·f1(ESI) + w2·f2(WT) + w3·f3(CR) | Extended: P(p,t,context) = P(p,t) + w4·f4(surgical_phase) + w5·f5(DETER_trajectory_gradient) | Surgical phase modulates urgency beyond acute ESI; DETER gradient captures rate of deterioration, not just current level |
| Conflict resolution | Rule-based Tier 2 Priority Queue Manager | Agent-mediated: Coordination Supervisor LLM reasons over competing requests, invokes conflict_resolve_P(), and proposes resolution with chain-of-thought justification | LLM reasoning handles novel conflict types not encodable in fixed weight vectors |
| HITL escalation trigger | Fixed threshold: \|P(p) − P(p′)\| < epsilon = 0.01 | Dynamic: Agent requests HITL when self-assessed confidence < 0.75 OR when two competing resources have equal priority OR when novel scenario type detected | Confidence-gated HITL reduces alert fatigue while preserving safety for genuinely ambiguous situations |
| Deadlock handling | Supervisor activates backup agent instance | Reflexion loop: Coordination Supervisor critiques own conflict resolution, attempts re-reasoning with broader context before escalating to HITL | Self-correction before escalation reduces unnecessary human interruptions |
| MCI surge mode | OPEL term added to P: P_MCI += w4*f4(OPEL) | Autonomous protocol switch: MCI Coordinator sub-agent instantiated dynamically; full surgical care pathway suspended in favour of damage-control prioritisation | Agentic architecture allows runtime agent instantiation for novel scenarios |

| Agent | Observation Polling | ReAct Cycle (est.) | Reflexion Overhead | Latency Regime |
|---|---|---|---|---|
| Preoperative Risk Stratification | On-demand at surgical consent | 1200–2000 ms (EuroSCORE + STS + RAG multi-hop) | 800–1200 ms | Non-time-critical (hours to days before surgery) |
| Intraoperative Monitoring | 60 s window; continuous CPB event monitoring | 800–1500 ms (routine); <100 ms (rule-based fallback, time-critical events) | 600–900 ms; N_max = 2 | Time-critical; dual-path hybrid |
| DETER Monitoring Core | 5–10 min CAREPOI telemetry cycle | 900–1500 ms | 700–1100 ms | Routine ICU/ward; <100 ms fallback for DETER score ≥ 0.85 |
| Resource Allocation | Event-driven (examination order receipt) | 700–1200 ms per order | 500–800 ms | Non-time-critical |
| Discharge and Rehabilitation | Daily LACE+ + PROM cycle | 1000–1800 ms | 700–1000 ms | Non-time-critical; mandatory HITL adds clinician response time |
| Coordination Supervisor | Continuous agent heartbeat (30 s intervals) | 1000–1500 ms for conflict resolution | 800–1200 ms | Continuous; rule-based fallback on LLM failure < 50 ms |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Pylarinou, C.; Gortzis, L.; Leivaditis, V.; Liolis, E.; Antzoulas, A.; Papadoulas, S.; Nikolakopoulos, K.; Panagiotopoulos, I.; Mitsos, S.; Tomos, P.; et al. An Agentic LLM Framework for Autonomous Surgical Continuum Monitoring: ReAct-Driven Tool-Use Agents for Presurgical, Intraoperative, and Postsurgical Cardiopulmonary Care. Bioengineering 2026, 13, 686. https://doi.org/10.3390/bioengineering13060686
Pylarinou C, Gortzis L, Leivaditis V, Liolis E, Antzoulas A, Papadoulas S, Nikolakopoulos K, Panagiotopoulos I, Mitsos S, Tomos P, et al. An Agentic LLM Framework for Autonomous Surgical Continuum Monitoring: ReAct-Driven Tool-Use Agents for Presurgical, Intraoperative, and Postsurgical Cardiopulmonary Care. Bioengineering. 2026; 13(6):686. https://doi.org/10.3390/bioengineering13060686
Chicago/Turabian Style: Pylarinou, Charalampia, Lefteris Gortzis, Vasileios Leivaditis, Elias Liolis, Andreas Antzoulas, Spyros Papadoulas, Konstantinos Nikolakopoulos, Ioannis Panagiotopoulos, Sofoklis Mitsos, Periklis Tomos, et al. 2026. "An Agentic LLM Framework for Autonomous Surgical Continuum Monitoring: ReAct-Driven Tool-Use Agents for Presurgical, Intraoperative, and Postsurgical Cardiopulmonary Care" Bioengineering 13, no. 6: 686. https://doi.org/10.3390/bioengineering13060686
APA Style: Pylarinou, C., Gortzis, L., Leivaditis, V., Liolis, E., Antzoulas, A., Papadoulas, S., Nikolakopoulos, K., Panagiotopoulos, I., Mitsos, S., Tomos, P., Koletsis, E., & Mulita, F. (2026). An Agentic LLM Framework for Autonomous Surgical Continuum Monitoring: ReAct-Driven Tool-Use Agents for Presurgical, Intraoperative, and Postsurgical Cardiopulmonary Care. Bioengineering, 13(6), 686. https://doi.org/10.3390/bioengineering13060686

