Review

Advances in Closed-Loop Artificial Intelligence for Healthcare

1 School of Engineering, Deakin University, Geelong, VIC 3216, Australia
2 School of Engineering, Deakin University, Burwood, Melbourne, VIC 3125, Australia
3 School of Nursing & Midwifery, Centre for Quality and Patient Safety Research, Institute for Health Transformation, Deakin University, Geelong, VIC 3220, Australia
4 Alfred Health, Melbourne, VIC 3004, Australia
* Author to whom correspondence should be addressed.
Electronics 2026, 15(7), 1396; https://doi.org/10.3390/electronics15071396
Submission received: 24 February 2026 / Revised: 18 March 2026 / Accepted: 25 March 2026 / Published: 27 March 2026

Abstract

Artificial intelligence (AI) is increasingly used in healthcare to support clinical decision-making through clinical decision support systems (CDSS). Human-in-the-loop (HITL) approaches introduce clinician oversight to improve model interpretability, reliability, and adaptability, while explainable AI (XAI) helps clinicians understand model behaviour. This review categorises HITL AI approaches in healthcare into pre-deployment and post-deployment stages and provides a dedicated examination of post-deployment HITL systems. It also introduces the concept of closed-loop AI, where real-time expert feedback can refine AI outputs without requiring model retraining. A systematic review following PRISMA guidelines was conducted using the Scopus and PubMed databases for studies published between 2020 and July 2025. From 3466 identified records, 3012 remained after duplicate removal. After title and abstract screening, 1630 articles were assessed through full-text review, and 15 studies met the predefined inclusion criteria related to HITL, post-deployment adaptation, and interactive XAI in healthcare. The selected studies indicate growing interest in post-deployment HITL systems that allow clinicians to refine AI outputs, provide real-time feedback, and support adaptive CDSS. These findings highlight a shift toward human-centred, closed-loop AI frameworks that integrate expert feedback into deployed systems to improve transparency, trust, and responsiveness in clinical decision-making.

1. Introduction

Artificial intelligence (AI) systems in the healthcare industry have experienced rapid growth and wide adoption [1]. These systems are revolutionizing healthcare by assisting in critical areas such as diagnosis, treatment planning, drug development, clinical decision-making, and rehabilitation [2]. Machine learning (ML) models can identify patterns that may be missed by clinicians, enabling early detection of conditions such as sepsis [3], cancer [4], and cardiac arrest [5]. Natural language processing (NLP) allows unstructured clinical notes to be mined for relevant information, while predictive models support clinical decision-making and risk stratification [6]. Deep learning expands AI’s ability to process vast amounts of data and respond dynamically to patient needs [6,7,8]. These AI technologies enable real-time analysis and decision support across various clinical settings.
Clinical decision support systems (CDSS) powered by AI have the potential to support healthcare professionals [9]. However, the integration of AI into healthcare requires these systems to be effective and understandable to everyone involved. This includes doctors, patients, and healthcare professionals [10,11]. Growing concern has emerged regarding the technical, clinical, and ethical implications of AI integration into healthcare systems [12]. Studies have highlighted that AI-driven healthcare applications are susceptible to various forms of error, which can inadvertently compromise patient safety [13]. These systems may also reflect underlying biases, suffer from limited transparency and accountability, and raise significant concerns related to data privacy and security [14]. Furthermore, challenges such as biased training data, poor model interpretability, and a lack of mechanisms for real-time validation undermine the reliability and credibility of the AI tools [15]. These issues can result in clinical errors, loss of trust, and ethical dilemmas, particularly when algorithms are applied to diverse patient populations [16,17].
This gap in the state-of-the-art AI algorithms has prompted a shift toward human-in-the-loop (HITL) paradigms. HITL is an approach that incorporates human judgment into the AI process, ensuring that people remain involved in critical decisions [18,19,20]. In HITL systems, human experts interact with AI systems to align outcomes with ethical standards and societal values, providing oversight and making final decisions when necessary [21,22]. Concerns have been raised about humans being left out of the loop, and professional bodies such as the European Society of Radiology have issued statements asserting that AI cannot replace human clinicians [23,24,25,26]. Therefore, in healthcare, HITL is necessary, as clinicians must remain integral to the decision-making process; this integration can even be an advantage by providing feedback to refine model performance. This collaborative approach ensures that clinical judgment is preserved and that AI supports rather than replaces human expertise [27].
A key issue with HITL is transparency. To collaborate effectively with an AI model, experts must be able to understand how it has arrived at the conclusions it presents. This explainability plays a central role in enabling effective human–AI collaboration [28,29]. The concept of explainable AI (XAI) has emerged as a way to enhance transparency and support informed, trustworthy clinical decision-making [30,31]. In XAI, a black-box model refers to AI models whose internal processes are not easily visible or interpretable [32]. While these models can generate predictions from input data, the rationale behind their decisions remains unclear to users [33]. This opacity makes it difficult to understand how the model functions, identify potential biases or errors, and ensure accountability [34]. The term “black box” is typically contrasted with “white box” or “transparent” models, where the decision-making process is open and understandable [35]. Transparent models enable users to understand the outcomes better and trust them more. However, despite their strong predictive performance, high-complexity models like deep neural networks (DNNs) [36,37] often lack transparency, posing challenges that must be addressed to justify their use in healthcare [38]. XAI techniques such as Shapley additive explanations (SHAP), local interpretable model-agnostic explanations (LIME), and counterfactual reasoning frameworks have been developed to make complex models more transparent for experts [39,40]. When combined with HITL workflows, these methods facilitate model debugging and auditability, fostering greater trust and accountability in AI-assisted decisions.
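To illustrate how such techniques surface feature attributions, the following is a minimal sketch using the shap package with a scikit-learn gradient-boosting classifier; the synthetic dataset and feature names are purely illustrative assumptions, not a clinical model.

```python
# Minimal SHAP sketch: synthetic data and feature names are illustrative.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                    # four synthetic "clinical" features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # outcome driven by features 0 and 1

model = GradientBoostingClassifier().fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])

# Per-feature attributions for one case: positive values push the
# prediction towards the positive class, negative values away from it.
print(dict(zip(["f0", "f1", "f2", "f3"], np.round(shap_values[0], 3))))
```

Attributions of this kind are what a clinician would inspect in the HITL workflows discussed below before accepting or challenging a prediction.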
Table 1 presents a comparison of review papers on HITL in healthcare published between 2020 and 2025, including the scope of our work. These state-of-the-art reviews highlight a growing interest in integrating HITL across various healthcare domains such as medical imaging, electronic health records (EHR), clinical decision-making, and AI governance. Budd et al. [41] emphasize HITL through active learning in medical imaging, though their scope remains limited to prototypes with minimal clinical deployment. Chen et al. [42] explore HITL in XAI for imaging, focusing on usability and transparency, though without quantifying clinical impact. Tam et al. [43] introduced the QUEST framework for evaluating clinical LLMs, covering quality of information, understanding and reasoning, expression style and persona, safety and harm, and trust and confidence; however, it overlooks training-stage human interaction and lacks real-world validation. Yuan et al. [44] provide a conceptual classification of HITL roles in the ML pipeline using EHRs, but fall short in addressing the practical and ethical implementation aspects. Kabata and Thaldar [45] frame HITL as a regulatory and ethical necessity in low-resource settings but offer little in terms of technical detail or deployment strategy.
Collectively, these papers underscore HITL’s promise but reveal critical gaps in implementation, evaluation, and generalizability across real-world clinical environments. Early research has predominantly focused on the pre-deployment phase, where clinicians contribute to activities such as data annotation, feature selection, model training, and validation. As a result, most existing studies and reviews emphasize how human expertise improves model development prior to clinical implementation. Post-deployment HITL AI, which involves human oversight and interaction after the model has been integrated into clinical workflows, has received comparatively less attention in earlier literature, partly because large-scale deployment of AI-based CDSS in real-world healthcare settings has only recently begun to expand. In contrast to the existing works, our review centres specifically on post-deployment HITL systems, a critical yet underexplored area, bridging the divide between existing HITL models and the practical demands of real-world healthcare AI deployment.
In this work, we categorise HITL approaches for healthcare based on their application stage. Pre-deployment HITL strategies typically involve expert participation in variable curation, annotation design, and interpretability-guided feature engineering. Conversely, post-deployment HITL mechanisms focus on monitoring model performance in dynamic environments, correcting for distributional shifts, and enabling interactive decision support, typically via XAI tools. This work synthesises recent literature that operationalises human oversight beyond interpretability, examining how experts can be involved in the moderation of model variables. We present conceptual frameworks for post-deployment HITL stages, integrating algorithmic inference with human judgment to benefit clinical decision-making. We also highlight the advantages and challenges associated with the proposed frameworks. This review focuses specifically on post-deployment HITL with real-time integration, bridging a critical gap left by prior conceptual reviews.
The contributions of this paper are as follows.
(1) This work systematically categorises Human-in-the-Loop (HITL) AI approaches in healthcare into pre-deployment and post-deployment stages, providing a structured perspective on how human involvement evolves across the lifecycle of clinical AI systems.
(2) This work presents a focused review of post-deployment HITL AI in healthcare, synthesising emerging studies that examine clinician interaction, feedback mechanisms, and human–AI collaboration after model deployment in clinical settings.
(3) This work introduces conceptual frameworks for closed-loop AI without retraining, illustrating how real-time expert feedback can be incorporated into deployed systems to refine AI outputs and support adaptive clinical decision support systems.
This article is organised as follows: Section 2 discusses current HITL AI approaches. Section 3 discusses the method and results for a systematic search for post-deployment HITL. Section 4 presents the overall discussions and future directions. Finally, Section 5 concludes the paper.

2. HITL AI Approaches

HITL refers to AI systems where human expertise is systematically integrated into one or more phases of model development, evaluation, or deployment to guide, supervise, or correct algorithmic behaviour [46,47]. HITL approaches can be categorised based on the stage of the AI lifecycle in which human involvement occurs. As illustrated in Figure 1, these approaches broadly fall into two categories: pre-deployment and post-deployment HITL.

2.1. Definitions of Pre-Deployment and Post-Deployment

Pre-deployment HITL refers to human participation during the development and validation stages of an AI model, before it is integrated into clinical workflows. At this stage, domain experts contribute through activities such as data annotation, feature engineering, model validation, active learning, and reinforcement learning during training. The primary goal is to improve model quality and ensure that the system reflects clinical knowledge and real-world medical reasoning before deployment.
Post-deployment HITL, in contrast, involves human interaction with the AI system after it has been deployed in operational clinical settings. During this phase, clinicians may review, modify, or provide feedback on model outputs during real-world use. Post-deployment HITL mechanisms can include approaches that involve model retraining, such as feedback-informed learning and continuous active learning, as well as approaches without retraining, such as clinician override, input or output moderation, and other real-time control mechanisms.
To organise these approaches, Figure 1 presents a taxonomy of HITL AI methods across the AI lifecycle, distinguishing between pre-deployment and post-deployment human involvement. Within post-deployment HITL, approaches can be broadly categorised into (1) methods involving model retraining and (2) methods operating without retraining. While several studies have explored post-deployment HITL mechanisms, most existing work focuses on retraining-based approaches or simple override strategies. However, structured mechanisms that enable clinicians to guide AI system behaviour during inference without modifying model parameters remain relatively underexplored. To address this gap, this paper proposes a set of conceptual moderation frameworks for post-deployment HITL AI, enabling clinicians to guide AI system behaviour in real time while preserving the stability of validated models. These frameworks represent structured human interventions during inference rather than model updates.
As illustrated in Figure 1, the proposed moderation frameworks fall under HITL without model retraining and are organised into three categories:
  • Output moderation, where clinicians review and adjust model predictions before they are used in decision-making.
  • Input moderation, where clinicians influence model behaviour by modifying inputs or feature representations.
  • Internal moderation, where clinicians interact with selected model parameters to guide system behaviour.
These proposed frameworks provide a conceptual basis for analysing post-deployment HITL strategies in the remainder of this paper.

2.2. Pre-Deployment

Conventional AI model development typically follows a linear, automation-focused pipeline beginning with problem definition and data collection, followed by data preprocessing and, if applicable, feature engineering. Developers then select a suitable model, train it using labelled data, and evaluate its performance using metrics such as accuracy or area under the curve (AUC) on a validation set. Hyperparameter tuning is performed to optimise performance, after which the model is tested on an unseen dataset to estimate its generalizability [48]. Finally, the model is deployed into a production environment, often with minimal mechanisms for human oversight or feedback. Conventional AI model development therefore emphasises automation, relying primarily on statistical performance metrics and offline evaluation [49]. In contrast, pre-deployment HITL involves structured human involvement during the development phase. From a healthcare perspective, examples of pre-deployment HITL include clinician-led data annotation, feature selection informed by domain knowledge, or expert validation of model predictions on retrospective datasets. This ensures that models are not only technically accurate but also clinically valid and aligned with real-world decision-making contexts [50]. It addresses limitations of purely data-driven development by embedding clinical reasoning and expert judgment into the modelling process, thereby improving the relevance, reliability, and safety of AI systems before their integration into clinical practice.
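For concreteness, the conventional pipeline described above can be sketched as follows; the synthetic dataset, model choice, and hyperparameter grid are illustrative assumptions rather than a recommended configuration.

```python
# Conventional (pre-HITL) pipeline sketch: split, tune, evaluate on held-out data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 6))                    # synthetic feature matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hyperparameter tuning via cross-validated grid search on the training set.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [3, 5]},
    scoring="roc_auc", cv=3).fit(X_train, y_train)

# Final evaluation on unseen data estimates generalizability.
auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(f"Held-out AUC: {auc:.3f}")
```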

2.2.1. Human Labelling Data

Data labelling is a collaborative process where human experts actively participate in annotating or validating data used to train AI models [51,52,53]. In healthcare, this approach leverages clinicians’ domain knowledge to ensure that training datasets are accurate, clinically relevant, and representative of diverse patient populations. By integrating expert feedback during the labelling phase, HITL helps address common issues like noisy or incomplete data, reducing biases, and improving model robustness. Furthermore, HITL labelling enables iterative refinement: AI models can pre-label data, which humans then review and correct, increasing efficiency while maintaining quality. This interactive labelling process is especially valuable in complex medical contexts such as radiology image annotation, pathology slide labelling, or clinical note classification, where subtle nuances require expert interpretation. This method is prominent in the literature. For example, the annotation of the RibFrac dataset followed a HITL labelling procedure, which ensures a high standard of annotation quality [54]. Ramesh et al. [55] showed that twenty-six predefined medical conditions were annotated by a team of humans (comprising two glaucoma specialists and two optometrists) by using the Microsoft Visual Object Tagging Tool (VoTT). Ultimately, HITL data labelling enhances the trustworthiness and performance of healthcare AI systems by combining human judgment with machine scalability. Huang et al. [56] showed human involvement to ensure accurate anatomical segmentation and reliable feature extraction from magnetic resonance imaging (MRI) images. Yu et al. [57] showed subspecialist radiologists’ contribution to expert annotations during lesion detection and segmentation via a UNet model, ensuring accurate identification of biopsy-relevant regions for prostate cancer diagnosis from MRI. Zhou et al. [58] showed the annotation of a few key 2-dimensional slices in a 3-dimensional scan for medical image segmentations by an expert.

2.2.2. Active Learning

Most commonly, active learning is used during the model development phase. Here, the AI model selects the most uncertain or informative data points for expert labelling, helping to efficiently build a high-quality training dataset before the model is deployed in clinical settings [59,60,61]. The model is trained on a small, labelled set, queries uncertain examples from a large unlabelled pool, gets human labels, retrains, and repeats this cycle until performance is satisfactory. This reduces labelling costs and improves model accuracy upfront. For example, Brandenburg et al. [62] used active learning along with the annotation of selected frames from robot-assisted minimally invasive esophagectomies that were performed by at least three independent medical experts.
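The query-label-retrain cycle described above can be sketched as follows; query_expert is a hypothetical stand-in for the human annotator, and the data are synthetic.

```python
# Pool-based active learning with uncertainty sampling (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_expert(x):
    # Hypothetical stand-in for expert labelling; here a synthetic oracle.
    return int(x[0] > 0)

rng = np.random.default_rng(1)
pool = rng.normal(size=(1000, 5))                # large unlabelled pool
X_lab = pool[:20]                                # small initial labelled set
y_lab = np.array([query_expert(x) for x in X_lab])
pool = pool[20:]

for _ in range(20):                              # annotation rounds
    model = LogisticRegression().fit(X_lab, y_lab)
    proba = model.predict_proba(pool)[:, 1]
    idx = int(np.argmin(np.abs(proba - 0.5)))    # most uncertain sample
    X_lab = np.vstack([X_lab, pool[idx]])
    y_lab = np.append(y_lab, query_expert(pool[idx]))
    pool = np.delete(pool, idx, axis=0)          # remove the queried sample
```

Uncertainty sampling is only one query strategy; diversity- or committee-based criteria follow the same loop with a different selection rule.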

2.2.3. Reinforcement Learning

Reinforcement learning (RL) with HITL is an emerging paradigm in healthcare that combines the adaptive learning capabilities of RL with the expert judgment of clinicians to develop safer, more effective decision-support systems [47]. In traditional RL, agents learn optimal treatment policies by maximising rewards through trial-and-error interactions with data; however, in high-stakes domains like healthcare, this exploration can be risky or unethical [63]. HITL-RL addresses this by integrating human expertise into critical points in the learning process. Clinicians may provide demonstrations, give real-time feedback on treatment decisions (reward shaping), or rank policy outcomes (preference learning) to guide the RL agent toward clinically valid behaviours. For example, by combining retrospective data (e.g., the Medical Information Mart for Intensive Care (MIMIC)-IV dataset) with expert-in-the-loop oversight, HITL-RL enables offline learning of treatment strategies, such as optimal fluid or vasopressor dosing for sepsis, while reducing the risk of unsafe recommendations [64]. This technique is also a promising avenue for incorporating crowdsourcing as a form of HITL [65].
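To make the reward-shaping idea concrete, the toy sketch below adds a clinician feedback term to tabular Q-learning; the states, actions, and human_feedback signal are illustrative abstractions, not a clinical protocol.

```python
# Toy tabular Q-learning with a human reward-shaping term (sketch).
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

def env_step(state, action):
    # Hypothetical dynamics: action 1 advances the patient state.
    nxt = min(state + action, n_states - 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward

def human_feedback(state, action):
    # Clinician penalises an aggressive action in an unstable state.
    return -0.5 if (state == 0 and action == 1) else 0.0

rng = np.random.default_rng(0)
for _ in range(200):                             # training episodes
    s = 0
    for _ in range(10):
        a = int(rng.integers(n_actions)) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next, r = env_step(s, a)
        r += human_feedback(s, a)                # reward shaping by the expert
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```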

2.2.4. Others

Humans can be involved in the training phase by knowledge injection during feature engineering. Experts can provide their knowledge for encoding medical scores (e.g., the Sequential Organ Failure Assessment (SOFA) score) or treatment guidelines into the model pipeline. An example is incorporating clinical expert knowledge before applying machine learning techniques to a severe asthma case study [66]. Wu et al. [67] used expert knowledge in the pre-deployment stage for concept verification for imaging techniques. Subba et al. [68] presented an advanced LLM-driven workflow with a novel optimised scoring strategy for candidate gene prioritisation, incorporating human-in-the-loop augmentation. Boden et al. [69] showed that HITL via a pathologists’ scoring approach provides an important safety mechanism for detecting and correcting algorithmic errors for breast cancer diagnosis with digital image analysis. Lee et al. [70] used a Turing test that integrated clinicians into the AI evaluation loop, having them assess both the model’s outputs and its explanations. Ghani [71] leveraged human expertise in HiLTS© to optimize EEG stimulation parameters, improving system reliability and clinical relevance even without requiring manual data labelling.

2.3. Post-Deployment

Post-Deployment HITL refers to the integration of human oversight and intervention after an AI model has been deployed. Here, the AI model has already been deployed in a clinical setting, and human experts interact with the system during real-world operation. In this phase, clinicians review AI outputs in real time, providing feedback by confirming, correcting, or overriding predictions as necessary. This ongoing interaction helps detect model drift, identify errors, and capture new clinical knowledge or emerging patterns that were not part of the original training data. This approach helps ensure that the model behaves as expected in real-world scenarios and adapts to new data or edge cases. We categorise the post-deployment HITL strategies mainly into two categories—with model retraining and without model retraining—as described in the following.

2.3.1. With Model Retraining

In this HITL approach, after examining the model’s outputs, humans can incorporate feedback and new data to retrain and improve the model.
Without XAI
Retraining HITL without XAI involves incorporating human feedback into the model improvement process based solely on outcomes or observable errors, without relying on model interpretability tools. In this setup, human experts review model predictions in deployment or during evaluation and identify incorrect outputs. Based on this, they may provide corrected labels, flag specific data points, or adjust system thresholds. These human-validated examples are collected and used to retrain or fine-tune the model, improving its accuracy and generalisation. In a hospital setting, a decision-support model may recommend incorrect drug dosages; if clinicians notice and override these decisions, their corrections can be logged and used to retrain the model. While this process enhances performance over time, the lack of explainability means that humans can only respond to surface-level errors without understanding why the model made a specific decision. As a result, HITL without XAI still enables continuous improvement, but it may be less effective in identifying systemic biases, hidden shortcuts, or unsafe reasoning compared to HITL systems that include XAI tools.
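A minimal sketch of this feedback loop is shown below: clinician overrides are logged and periodically folded back into the model via incremental fitting. The clinician_review oracle and the synthetic data are illustrative assumptions.

```python
# Logging clinician overrides and periodically refitting on corrections (sketch).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X0 = rng.normal(size=(200, 3))
y0 = (X0[:, 0] > 0).astype(int)

model = SGDClassifier(random_state=0)
model.partial_fit(X0, y0, classes=[0, 1])        # initial deployed model

override_log = []                                # (features, corrected label)

def clinician_review(x, pred):
    true_label = int(x[0] > 0)                   # stand-in for clinical judgement
    if pred != true_label:
        override_log.append((x, true_label))     # log the override
        return true_label
    return pred

for x in rng.normal(size=(100, 3)):              # simulated live cases
    pred = int(model.predict(x.reshape(1, -1))[0])
    final_decision = clinician_review(x, pred)
    if len(override_log) >= 20:                  # periodic update on corrections
        Xc = np.array([f for f, _ in override_log])
        yc = np.array([lbl for _, lbl in override_log])
        model.partial_fit(Xc, yc)
        override_log.clear()
```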
With XAI
HITL retraining with XAI combines human judgment and model interpretability to improve AI systems in a transparent and accountable way. In this approach, explainability tools (e.g., SHAP, LIME, attention maps) are used to make model predictions understandable to domain experts, who can then identify errors, biases, or spurious correlations. These insights guide targeted retraining: experts may correct misclassified examples, relabel ambiguous cases, or flag harmful reasoning patterns revealed by the model’s explanations. The corrected or augmented data is then fed back into the model for fine-tuning, leading to more accurate and trustworthy predictions. For instance, Metsch et al. [72] proposed an interactive XAI platform prototype, CLARUS, that allows not only the evaluation of specific human counterfactual questions based on user-defined alterations of patient networks and a re-prediction of the clinical outcome but also a retraining of the entire graph neural network (GNN) after changing the underlying graph structures. They used a synthetic dataset and a Protein–Protein Interaction (PPI) dataset on kidney cancer to demonstrate the platform.
Active Learning
While active learning is traditionally associated with pre-deployment data annotation and model training, it can also be part of a post-deployment HITL system for continuous improvement. Post-deployment active learning, also known as online active learning, refers to the use of active learning techniques after a machine learning model has been deployed in a real-world environment [73]. In this setup, the model continuously monitors incoming data and flags uncertain or low-confidence predictions for human review [74]. By involving human experts to label these edge cases, the system ensures that the most informative data points are added to the training set. These human-labelled examples can then be used for periodic or online retraining, improving the model’s adaptability and robustness over time. This technique bridges the gap between static model deployment and evolving real-world data, enabling continuous learning and better long-term performance [41].
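A minimal sketch of the flagging mechanism follows, assuming a scikit-learn model and an illustrative confidence threshold; the queue simply accumulates cases for later expert labelling and periodic retraining.

```python
# Online active learning sketch: queue low-confidence cases for expert labelling.
import numpy as np
from collections import deque
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)           # the deployed model

review_queue = deque(maxlen=500)                 # cases awaiting expert labels

def triage(x, threshold=0.65):
    proba = model.predict_proba(x.reshape(1, -1))[0]
    if proba.max() < threshold:                  # low-confidence prediction
        review_queue.append(x)                   # flag for human review
    return int(proba.argmax()), float(proba.max())

for x in rng.normal(size=(50, 4)):               # simulated incoming stream
    triage(x)
print(f"{len(review_queue)} cases queued for expert labelling")
```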
Reinforcement Learning
Post-deployment HITL RL refers to the ongoing collaboration between AI agents and human experts after an RL model has been deployed in real-world settings. Humans can supervise the RL models and provide additional feedback in the online deployment phase. Li et al. proposed two approaches for online deployment [75]. The first one uses model selection and the upper confidence bound algorithm to adaptively select a model to deploy from a candidate set of trained offline RL models. The second approach involves fine-tuning the model in the online deployment phase when a supervision signal arrives. For healthcare, the agent can continue to operate in a live environment such as a hospital Intensive Care Unit (ICU), clinical decision support system, or robotic surgical assistant, and humans oversee, correct, and refine its behaviour. Human experts may intervene in real time to prevent unsafe actions, provide feedback on outcomes (used to adjust rewards), or label critical edge cases where the agent’s confidence is low. This input is incorporated into online updates or offline policy refinement, allowing the agent to adapt to new scenarios, shifts in clinical practices, or rare events not seen during training.

2.3.2. Without Model Retraining

Post-deployment HITL approaches without model retraining represent a broad category of mechanisms in which human feedback influences system behaviour without modifying the underlying model parameters. In this approach, the model itself is not retrained, but human input can be used to moderate or override its outputs.
Override
In this taxonomy, override is considered a specific intervention mechanism within the broader category of post-deployment HITL without model retraining, distinguished by the clinician’s direct authority to replace the model’s final output. Manual overriding by clinicians is a critical component of post-deployment HITL workflows in healthcare AI systems. Such overrides may occur with or without the inclusion of XAI. Non-explainable or black-box models offer limited interpretability, requiring clinicians to fully discount the results presented by the model when overriding decisions [76]. In contrast, in override systems with XAI, clinicians are presented with interpretable outputs such as feature attributions, rule-based explanations, and visual saliency maps [77], enabling them to assess the model’s reasoning before accepting or overriding its recommendation. This transparency enhances trust, supports accountability, and improves clinical decision-making. While both system types allow for human intervention, XAI-enabled systems facilitate more structured post-deployment feedback loops, where override events can be logged and used to retrain or recalibrate the model. Consequently, HITL systems that incorporate manual override mechanisms, particularly those enhanced with XAI, should be considered foundational for the safe deployment of adaptive AI in health settings. Manual override with XAI, as shown in Figure 2, allows human clinicians to overrule AI system decisions, ensuring safety and trust in critical care environments. This is particularly crucial in contexts like CDSS, where false positives or negatives can have severe consequences. Modern CDSS provides recommendations to health professionals, and health professionals are expected to make their own decisions and override AI recommendations that they consider inappropriate [78]. Clinicians often overrode AI recommendations when the AI’s predictions conflicted with their clinical judgment or when the AI’s suggestions lacked sufficient explanation [79]. For instance, in an AI-based sepsis prediction system, clinicians were allowed to disregard AI alerts based on their judgment to prevent unnecessary interventions and alarm fatigue [80].
Manual overriding, while essential for ensuring safety and accountability, has key limitations such as poor scalability, inconsistency in human judgment, cognitive fatigue, and susceptibility to bias or error. It can also be resource-intensive, requiring expert time and attention, which may not be sustainable on a large scale [81]. Moreover, if overrides are not fed back into the model for retraining, the AI system does not learn from its mistakes, limiting long-term improvement. These challenges highlight the need to balance human oversight with system design that supports learning and efficiency.
Moderation
Definition and Advantages Over Override
Post-deployment HITL moderation frameworks provide structured human oversight of AI outputs without modifying the underlying model parameters. Compared with simple override mechanisms, moderation establishes a systematic and traceable interaction between clinicians and AI systems, enabling consistent review and validation of model predictions. This structured oversight improves accountability, transparency, and reproducibility of decisions, which are essential for clinical auditability and regulatory evaluation. By integrating XAI techniques, clinicians can interpret model reasoning in real time, identify potential errors or biases, and provide corrective feedback without requiring model retraining.
Unlike ad hoc overrides, moderation frameworks support a structured and continuous interaction in which clinicians review AI outputs before final decisions are made. This approach promotes meaningful collaboration between human expertise and algorithmic decision-making while maintaining clear audit trails of expert interventions. Furthermore, moderation enables immediate correction of problematic predictions without the operational costs, data requirements, and safety risks associated with frequent model retraining. Because effective moderation depends on clinicians’ ability to understand model behaviour, explainability mechanisms play a central role across all moderation strategies.
Before introducing the concept of closed-loop AI, it is important to consider the regulatory and practical context of moderation frameworks. While theoretically promising, the practical implementation of moderation frameworks must address several challenges, including workflow integration, cognitive burden on clinicians, and regulatory compliance. AI systems that directly influence clinical decision-making may be classified as Software as a Medical Device (SaMD) [82] under regulatory frameworks such as those of the U.S. Food and Drug Administration (FDA) or European Medical Device Regulation (EU MDR). SaMD systems are typically categorised according to their potential clinical risk and the degree to which they influence medical decision-making. AI-based CDSS commonly fall under Class II or Class III medical devices, depending on their intended use and risk level [83]. Class II devices generally include decision-support tools that assist clinicians in diagnosis or treatment planning but still require human oversight for the final decision. These systems must undergo regulatory review and demonstrate safety, effectiveness, and appropriate risk management. In contrast, Class III devices represent high-risk systems that may directly drive or automate critical clinical decisions, such as treatment recommendations for life-threatening conditions. These devices require the most rigorous regulatory scrutiny, including extensive clinical validation and post-market monitoring. These regulatory expectations, which emphasise human oversight, safety, and accountability, underscore the need for AI systems that can benefit from expert input without compromising stability or requiring constant retraining. Closed-Loop AI without Retraining is one such paradigm that directly responds to these requirements.
Closed-Loop AI Without Retraining
Closed-loop AI without retraining refers to systems in which the underlying model remains fixed while human experts interact with outputs at inference time to guide predictions, correct errors, or adjust inputs and internal parameters. In this paradigm, adaptation occurs through structured human feedback, moderation, or interpretability-guided interventions rather than retraining.
Figure 3 illustrates a conceptual closed-loop HITL AI system, where human feedback dynamically shapes AI outputs before final decisions are made. Unlike online learning, which continuously updates model parameters, continual learning, which adapts through sequential retraining, or rule-based overrides, which rely on static rules, closed-loop AI without retraining leverages real-time expert intervention to provide traceable, accountable, and explainable guidance. This framework enables oversight through targeted moderation strategies, including output, input, and internal moderation.
Output Moderation
Output moderation refers to the mechanism by which experts review AI-generated predictions before final decisions are issued. The diagram in Figure 4 illustrates an HITL AI system in which a machine learning model processes input data to generate a prediction that subsequently passes through an output moderation stage involving a human expert before the final output is produced. To facilitate effective human oversight, explainable AI (XAI) tools provide insights into the model’s decision-making process, enabling the expert to interpret, validate, or adjust the model’s prediction when necessary. Although real-time output moderation can enhance safety and transparency, it may also introduce workflow disruptions and additional cognitive load for clinicians, particularly in high-throughput environments such as emergency care. Therefore, systems should ensure that moderated outputs are systematically logged and traceable to maintain accountability, especially when such interventions influence treatment decisions. From a regulatory perspective, moderation mechanisms that directly affect clinical recommendations may require rigorous documentation, validation, and post-market monitoring to ensure patient safety.
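A minimal sketch of such an output-moderation stage is given below; the record structure, expert identifiers, and explanation payload are illustrative assumptions.

```python
# Output moderation sketch: the expert's decision becomes the final output,
# and every intervention is logged for auditability.
from dataclasses import dataclass

@dataclass
class ModeratedOutput:
    model_prediction: float
    final_output: float
    expert_id: str
    rationale: str                               # logged for audit trails

def moderate(prediction, explanation, expert_decision, expert_id, rationale):
    # The validated model is untouched; only its output is reviewed.
    final = prediction if expert_decision is None else expert_decision
    return ModeratedOutput(prediction, final, expert_id, rationale)

# Example: the expert lowers a risk score after reviewing the explanation.
record = moderate(
    prediction=0.82,
    explanation={"lactate": +0.31, "age": +0.12},  # e.g. SHAP attributions
    expert_decision=0.60,
    expert_id="icu-04",
    rationale="lactate elevated by sampling artefact",
)
print(record)
```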
Input Moderation
Input moderation refers to mechanisms that allow experts to influence model behaviour by modifying input features or providing additional control signals. Since the input to an AI model may consist of raw data or pre-processed features, experts can intervene at the input stage by adjusting feature values or introducing supplementary control inputs. These mechanisms enable clinicians to guide model behaviour during inference without altering the underlying model parameters.
(1) Feature Upgrade/Downgrade
Figure 5 illustrates an HITL AI framework that integrates expert feedback and XAI to iteratively enhance model performance through feature upgrade/downgrade. Once the model is deployed, experts analyse the output using XAI techniques to identify influential features. Experts may intervene in feature engineering by upgrading or downgrading feature importance based on domain knowledge and XAI insights.
(2) Additional Control Inputs
Figure 6 illustrates an HITL AI framework that integrates expert feedback and XAI to iteratively enhance model performance through additional control inputs. Once the model is deployed, the output is analysed using XAI techniques. The expert can provide feedback and influence by adding extra inputs, in addition to the preassigned inputs.
While these mechanisms improve flexibility and clinician control, they may increase system complexity and require clear audit trails to ensure compliance with regulatory standards. Input adjustments must be carefully monitored, as modifications can unintentionally bias outputs or introduce safety risks.
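A minimal sketch of these input-moderation mechanisms follows; the feature names, weights, and the assumption that the model accepts optional control inputs are illustrative.

```python
# Input moderation sketch: expert weights rescale features (upgrade > 1,
# downgrade < 1) and optional control inputs are appended before inference.
import numpy as np

def moderate_inputs(x, feature_names, expert_weights, extra_inputs=None):
    scale = np.array([expert_weights.get(name, 1.0) for name in feature_names])
    x_mod = x * scale                            # feature upgrade/downgrade
    if extra_inputs is not None:                 # additional control inputs,
        x_mod = np.concatenate([x_mod, extra_inputs])  # if the model accepts them
    return x_mod

x = np.array([7.1, 96.0, 38.4])                  # e.g. lactate, SpO2, temperature
weights = {"lactate": 1.5, "spo2": 1.0, "temp": 0.5}
print(moderate_inputs(x, ["lactate", "spo2", "temp"], weights))
```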
Internal Moderation
Internal moderation refers to expert-controlled adjustments of predefined internal model parameters without retraining the underlying model. The HITL framework in Figure 7 illustrates how an expert may influence the internal behaviour of the AI system through controlled moderation mechanisms. The system receives input data, which is processed by the AI model to produce an initial prediction. Explainable AI (XAI) methods then generate interpretable explanations of the model’s output, making the reasoning process more transparent to clinicians. These explanations are presented to the human expert, who reviews the prediction in the context of domain knowledge and clinical judgement. Based on this interpretation, the healthcare professional may intervene by moderating predefined internal control parameters, such as decision thresholds, feature weight constraints, or confidence calibration settings. The moderation affects controlled parameters or decision thresholds, not the learned weights of the model. These moderated parameters are fed back into the inference pipeline to refine the model’s response without retraining the underlying model. In this way, expert knowledge can influence the behaviour of the system while preserving the integrity of the validated model.
While this mechanism provides fine-grained human oversight and may enhance interpretability and responsiveness in clinical decision support, it also introduces potential algorithmic safety risks. Real-time adjustments to internal thresholds or control parameters could inadvertently induce unintended shifts in model behaviour, producing outputs that diverge from the model’s validated performance. To mitigate these risks, internal moderation frameworks should implement safeguards such as bounded parameter adjustments, predefined intervention constraints, and automated monitoring mechanisms to detect anomalous changes in output distributions. Comprehensive logging and versioning of all expert interventions are essential to maintain traceability and accountability, enabling clinicians and regulators to audit decision pathways. From a regulatory perspective, extensive human intervention in internal model controls may challenge the notion of a fixed-model SaMD [84]. Continuous or untracked modifications could require additional validation to ensure safety and efficacy. Therefore, internal moderation is most appropriate in controlled advisory systems where adjustments are limited, reversible, and fully documented, rather than in fully autonomous or high-risk clinical AI applications.
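The sketch below illustrates one such safeguard: a decision threshold that experts may adjust only within preapproved bounds, with every change logged. The parameter names, bounds, and expert identifiers are illustrative.

```python
# Internal moderation sketch: bounded, logged threshold adjustment;
# the model's learned weights are never modified.
import datetime

class ThresholdModerator:
    def __init__(self, base=0.5, lower=0.4, upper=0.6):
        self.lower, self.upper = lower, upper    # preapproved bounds
        self.threshold = base
        self.audit_log = []

    def adjust(self, new_value, expert_id, rationale):
        clipped = min(max(new_value, self.lower), self.upper)
        self.audit_log.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "expert": expert_id,
            "from": self.threshold,
            "to": clipped,
            "rationale": rationale,              # traceability for audits
        })
        self.threshold = clipped

    def decide(self, risk_score):
        return int(risk_score >= self.threshold)

mod = ThresholdModerator()
mod.adjust(0.35, "icu-04", "reduce false alarms overnight")  # clipped to 0.4
print(mod.threshold, mod.decide(0.45))
```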
Ultimately, the effectiveness of HITL moderation frameworks depends not only on technical design but also on their integration into clinical workflows, regulatory compliance, and the ability to balance human oversight with operational efficiency.

3. Systematic Search for Post-Deployment HITL AI

Post-deployment HITL mechanisms are more critical than pre-deployment HITL in the context of healthcare AI, due to the dynamic and complex nature of real-world clinical environments [85]. While pre-deployment HITL plays a vital role in refining models through expert review and retrospective validation, it is inherently limited by the static nature of training data and the controlled settings in which development occurs. Once deployed, models are exposed to a broad spectrum of patient presentations, data inconsistencies, institutional variations, and evolving clinical practices that are often underrepresented or absent from development datasets. In this context, post-deployment HITL becomes essential for ensuring patient safety by enabling clinicians to monitor, interpret, and, where necessary, override or adjust model outputs in real-time. This oversight is crucial in mitigating the risks associated with incorrect or contextually inappropriate predictions that may adversely affect patient outcomes.
Moreover, post-deployment HITL provides a necessary feedback loop for detecting model drift and concept shift, which can be significant sources of error in long-running AI models [86]. It also supports mechanisms for continuous model adaptation, such as active learning or periodic retraining based on real-world performance metrics. From a regulatory and ethical standpoint, post-deployment HITL facilitates ongoing auditing, transparency, and compliance with medical device regulations and AI governance frameworks that increasingly mandate lifecycle monitoring of deployed systems.
In addition, it plays a pivotal role in fostering clinician trust and promoting the adoption of AI tools by enabling human oversight, contextual interpretation, and shared decision-making. This trust is further reinforced when HITL processes are integrated into routine workflows, allowing for iterative refinement of both model behaviour and its interface within clinical systems. Unlike pre-deployment HITL, which primarily evaluates hypothetical risk, post-deployment HITL directly addresses practical safety, reliability, and usability in the point-of-care environment. Furthermore, by empowering clinicians to engage in a continuous feedback loop, post-deployment HITL ensures that AI systems remain clinically relevant, user-centred, and aligned with patient-specific decision-making needs. Therefore, in this review, we explore post-deployment HITL works for healthcare over the last 5 years.

3.1. Methodology

This systematic literature review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [87].

3.1.1. Search Strategy and Data Sources

A comprehensive literature search was conducted using the Scopus and PubMed databases. A combination of relevant keywords and Boolean operators was used to identify articles pertinent to the research objectives. Keywords were selected based on common terminology found in the literature and refined through preliminary scoping searches. The final search string used for title and abstract searching was: (“artificial intelligence” OR “machine learning” OR “deep learning” OR “neural network” OR “reinforcement learning”) AND (“closed loop” OR “human in the loop” OR “feedback loop” OR “XAI” OR “explainable AI” OR “explainable artificial intelligence” OR “Interpretable AI” OR “Interpretable artificial intelligence”) AND (“healthcare” OR “health care” OR “clinical decision support” OR “medical system” OR “hospital” OR “patient monitoring”). The searches were undertaken on 30 June 2025 and were limited to studies published from 2020 onwards.

3.1.2. Inclusion Criteria

Studies were included if:
  • They were peer-reviewed original research publications (journal and conference papers).
  • They included healthcare applications.
  • They contained experimental works.
  • The model variables were modified by a human expert.
  • They were published in English.
  • They were published between 2020 and 2025.
  • The full text was available.

3.1.3. Study Selection and Data Collection

Search results were imported into EndNote, where duplicates were removed. The inclusion/exclusion criteria were used to screen titles and abstracts and to identify potentially relevant articles. Full texts for all potentially relevant articles were retrieved for full-text screening. A PRISMA study flow diagram was used to visually represent the selection process at each stage (Figure 8).

3.2. Results

Of the 3466 records identified, 3012 unique papers remained after duplicate removal. Title screening reduced this number to 2299, and abstract screening to 1630. These papers were assessed for full-text eligibility, and a total of 15 articles were considered eligible to be included in this review.

3.2.1. Publication Trend of Initial Search String

Figure 9 presents the publication trend over the last five years for the search string above, shown as solid lines for Scopus and PubMed from January 2021 to January 2025. The fitted cubic polynomials are y = 3.75x^3 − 2.272×10^4 x^2 + 4.587×10^7 x − 3.087×10^10 for the Scopus trend and y = −3.083x^3 + 1.872×10^4 x^2 − 3.79×10^7 x + 2.558×10^10 for the PubMed trend, where x is the calendar year. The fitted curves reveal a nonlinear growth pattern for both databases, with Scopus showing a steeper acceleration in publication volume compared to PubMed. Forecasts for January 2026 predict 1155 papers in Scopus and 508 papers in PubMed, based on observed counts as of July 2025.
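The trend fitting can be reproduced along the following lines; the yearly counts below are illustrative placeholders, not the study data, and the years are centred to keep the fit well conditioned.

```python
# Cubic trend fit and one-year-ahead forecast (sketch); counts are
# illustrative, not the actual Scopus/PubMed data.
import numpy as np

t = np.arange(5)                                 # years 2021..2025, centred at 2021
counts = np.array([310, 420, 580, 790, 1010])    # hypothetical yearly paper counts

coeffs = np.polyfit(t, counts, deg=3)            # cubic fit, as in Figure 9
forecast_2026 = np.polyval(coeffs, 5)            # extrapolate one year ahead
print(f"Forecast for 2026: {forecast_2026:.0f} papers")
```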

3.2.2. Existing Post-Deployment HITL AI Studies

Recent studies have explored how HITL mechanisms can improve alignment with clinical reasoning. In healthcare, HITL design has become essential for promoting transparency and ethical oversight. This section reviews existing post-deployment HITL studies that integrate explainability and human feedback to improve the outcome.
Table 2 summarises recent applications of post-deployment HITL AI systems. The studies are categorised into two groups: post-deployment HITL with retraining and post-deployment HITL without retraining (override).
For the first group, studies span diverse domains, from predicting COVID-19 outcomes to arrhythmia detection and personalised nutrition, employing models such as GNNs, deep convolutional neural networks (CNNs), and gradient boosting techniques. In each case, explainability methods like SHAP, LIME, and counterfactual explanations are used to facilitate expert understanding, after which human feedback is used to refine the model through retraining. CLARUS [72] exemplifies active HITL integration by allowing users to manipulate counterfactuals in GNNs, thereby promoting real-time exploration of decision boundaries with retraining of the model. This level of interactivity facilitates deeper cognitive engagement and fosters trust in AI predictions. Aboutalebi et al. [88] introduce COVID-Net Biochem using clinical and biochemical data, which incorporates clinician-in-the-loop validation during the model development process using GSInquire XAI, ensuring interpretability and alignment with expert knowledge. Similarly, Sheu et al. [89] propose a feedback-enabled CNN system for pneumonia detection that adapts based on SHAP-driven examiner input, highlighting the feasibility of scalable, transparent AI deployment across clinical environments. Ju et al. [90] used LIME to interpret CNN decisions for cardiac arrhythmia detection and incorporated physician feedback into iterative error correction. A broader application of HITL principles is reflected in the AI-driven nutrition system [91], where expert oversight ensures that personalised dietary recommendations, generated via a retrieval-augmented generation (RAG) model, are nutritionally sound and adaptable in real time. Collectively, these systems represent an interactive and iterative HITL design paradigm, where human insight plays a vital role in refining model behaviour.
In the second group, which applies HITL principles in a post-deployment override capacity, human experts interact with or validate AI outputs without modifying or retraining the underlying model. For instance, models for seizure recognition [96], superficial surgical site infection detection [94], and clinical sequence prediction [95] rely on expert review, feedback panels, or prototype explanation interfaces that enable users to assess or override AI decisions but not update the model weights. Similarly, explainable systems for heart disease [92,100], Alzheimer’s [93], lung cancer [97], and tuberculosis [101] employ XAI techniques (e.g., SHAP, LIME, visualisations) to provide transparency, allowing clinicians to make the final decision without altering the model itself. Also, graphical interfaces or clinician-in-the-loop designs are implemented to serve as interpretive or decision-support tools [98,99].

4. Discussions and Future Directions

This section reflects on the advantages and limitations of the existing HITL approaches and discusses how these limitations can be addressed in future research and implementation.
Post-deployment HITL systems in healthcare AI can be broadly categorised based on whether they involve model retraining or enable human interaction without retraining, through either override or moderation mechanisms. Table 3 summarises these three categories, highlighting key advantages and limitations based on recent studies.
The first category, HITL with model retraining, includes approaches where human feedback, such as correction or annotation, is incorporated into iterative updates of the AI model. They demonstrate that retraining-based HITL systems can lead to improved accuracy, adaptability to evolving clinical data, and enhanced relevance of AI outputs to specific patient populations. These systems can reduce algorithmic bias, support error correction, and foster knowledge transfer between clinicians and AI systems. However, their practical deployment remains challenging. Retraining requires significant computational resources, introduces complexity in integration, and imposes additional burdens of version management and system validation. Moreover, model updates are typically delayed, reducing responsiveness to urgent clinical needs. Human variability and fatigue may also impact the consistency of feedback, particularly when used at scale. Challenges specific to real-time retraining in healthcare extend beyond technical considerations to encompass data quality issues, distributional shifts in patient populations, annotation bottlenecks, and the risk of catastrophic forgetting. Moreover, regulatory constraints and resource limitations further restrict frequent model updates, underscoring the need for strategies that balance adaptability with safety and compliance [102,103].
The second category, HITL without model retraining (override), includes systems where clinicians can directly override AI outputs in real time, without altering the core model. These studies explore mechanisms that support real-time adaptation and clinician oversight, while maintaining a lightweight infrastructure. This class of HITL is particularly appealing in clinical settings due to its lower computational overhead, ease of regulatory compliance (since the model itself remains unchanged), and capacity to improve safety by enabling rapid feedback integration. Clinicians retain agency over final decisions, thereby increasing trust and accountability. However, these systems can increase the cognitive load on users, as clinicians must constantly interpret and decide whether to accept or override model suggestions. Over time, this may lead to feedback fatigue. Furthermore, inconsistency in override behaviour between users, or even the same users at different times, can lead to uneven performance, particularly in edge cases that require nuanced judgment.
The third category, which includes our proposed framework, falls under HITL without retraining (moderation). To mitigate the practical and ethical challenges associated with closed-loop HITL systems, especially the risks of biased or inconsistent feedback corrupting model performance, our proposed solution is to incorporate feedback but with a no-retraining approach. This approach allows clinicians to guide or adjust the behaviour of an AI system in real time, without retraining, via predefined moderation logic. Rather than completely overriding or retraining the model, clinicians can influence output presentation, suppress low-confidence predictions, or adjust decision thresholds. This hybrid model maintains the autonomy of the AI system while enabling human-influenced adaptations. Advantages include low deployment friction, rapid feedback loops, and simplified regulatory handling. However, the effectiveness of this approach relies heavily on the quality of interface design and the robustness of moderation protocols. Poorly calibrated moderation logic could either suppress important alerts or fail to adequately adjust harmful outputs.
While moderation frameworks provide structured, traceable, and accountable human oversight, they may also increase cognitive demand on clinicians compared to simple override mechanisms because they require deeper interpretation of model reasoning and potential interaction with system parameters. Real-time review of model outputs, input adjustments, and internal parameter interventions requires continuous attention and interpretive reasoning, potentially exacerbating cognitive fatigue in already busy healthcare environments.
To mitigate these risks, the proposed frameworks integrate explainable AI (XAI) tools to highlight the most relevant features, model reasoning, and potential sources of error, reducing the mental effort required to interpret complex predictions. Additionally, bounded intervention mechanisms, such as limiting the number or scope of changes a clinician can make per decision cycle, can help prevent overload. Logging and monitoring systems allow clinicians to focus only on outputs that exceed predefined uncertainty or risk thresholds, ensuring that interventions are targeted and efficient. Despite these strategies, it is important to recognise that HITL moderation cannot eliminate cognitive burden. Workflow integration, task prioritisation, and user interface design are critical factors that determine whether these frameworks enhance trust and safety without overtaxing clinicians. Future work should empirically evaluate the trade-off between oversight benefits and cognitive fatigue in real-world clinical settings.
By organising HITL methods into these categories, we highlight a key trade-off: retraining-based HITL offers long-term model improvement, while non-retraining approaches provide immediate clinical control. Our proposed moderation-based framework aims to strike a balance between these two extremes, enabling responsiveness, transparency, and safety without the complexity of frequent model updates. Our no-retraining feedback incorporation approach circumvents these challenges by utilising clinician input exclusively at inference time. This preserves the stability and auditability of validated models, mitigates risks associated with erroneous corrections, and enables real-time customisation without the delays and complexities inherent in retraining workflows.
The limited number of eligible studies identified in this review indicates that post-deployment HITL AI in healthcare remains an underexplored research area. Despite the critical importance of HITL frameworks, the extant literature reveals a lack of rigorous, prospective evaluations in real-world clinical environments, calling for further investigation. Future research should prioritise assessing the impact of post-deployment HITL methods on clinical outcomes.
Human Factors and Cognitive Load in HITL Systems
Human factors play a critical role in the effectiveness of post-deployment HITL AI systems, particularly in clinical environments where healthcare professionals operate under significant time pressure and cognitive demands. Prior research in human–computer interaction (HCI) and clinical decision support systems has shown that poorly designed interfaces, excessive alerts, and opaque model behaviour can significantly reduce clinician adoption and trust in AI-assisted decision-making. Override-based and moderation-based HITL approaches require clinicians to interpret AI outputs, evaluate explanations, and decide whether to intervene. While these mechanisms enhance safety and human oversight, they can also increase cognitive workload and contribute to decision fatigue if not carefully designed.
One key challenge is alert fatigue, a well-documented issue in clinical decision support systems where excessive notifications reduce clinician responsiveness and trust. If HITL mechanisms require frequent overrides or constant moderation of model outputs, clinicians may experience increased mental workload, leading to inconsistent interactions with the AI system. Similarly, poorly designed explanations may impose additional interpretive effort, undermining the intended benefits of explainable AI.
Trust calibration is another important consideration. Clinicians must develop an appropriate level of trust in AI systems, neither over-relying on automated outputs nor dismissing useful recommendations. HITL systems should therefore support transparent reasoning, clear uncertainty communication, and intuitive interaction mechanisms that help clinicians understand when intervention is necessary.
From a human–computer interaction perspective, the design of HITL interfaces should aim to minimize cognitive burden while preserving human agency. Several design principles can help achieve this balance.
Design Recommendations for Human-Centred HITL Systems
(1) Selective intervention mechanisms: HITL interactions should be triggered primarily for low-confidence predictions or high-risk decisions, reducing unnecessary clinician interruptions (a routing sketch is provided after this list).
(2) Explanation prioritisation: XAI interfaces should highlight the most influential features or reasoning steps to reduce interpretation time.
(3) Workflow-aware integration: HITL mechanisms should align with existing clinical workflows rather than requiring additional steps outside routine decision-making processes.
(4) Adaptive alerting strategies: Systems should limit excessive notifications and prioritise alerts based on clinical relevance and uncertainty levels.
(5) Auditability and traceability: Logging clinician interventions supports accountability and enables system improvement without increasing user burden.
These design principles highlight that effective HITL systems must be engineered not only for algorithmic performance but also for usability, interpretability, and seamless integration into clinical workflows.
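The routing sketch below illustrates principles (1) and (4) together: only uncertain or high-risk predictions interrupt the clinician, while routine cases pass silently. The function route_prediction and its cut-off values are hypothetical placeholders for illustration, not a validated triage policy; in practice the thresholds would be calibrated per task and reviewed under clinical governance.

```python
from enum import Enum, auto

class Route(Enum):
    AUTO_ACCEPT = auto()       # confident, low-risk: no interruption
    CLINICIAN_REVIEW = auto()  # uncertain or ambiguous: request human input
    URGENT_ALERT = auto()      # high-risk and confident: prioritised notification

def route_prediction(risk_score: float, confidence: float,
                     risk_cutoff: float = 0.8, conf_cutoff: float = 0.6) -> Route:
    """Selective intervention: interrupt clinicians only when it matters.

    The cut-offs here are illustrative assumptions, not clinically
    calibrated values.
    """
    if confidence < conf_cutoff:
        return Route.CLINICIAN_REVIEW      # low confidence -> human in the loop
    if risk_score >= risk_cutoff:
        return Route.URGENT_ALERT          # high risk -> prioritised alert
    return Route.AUTO_ACCEPT               # routine case -> no interruption

# A confident, low-risk case generates no interruption at all:
assert route_prediction(risk_score=0.3, confidence=0.9) is Route.AUTO_ACCEPT
# An uncertain case is escalated for review rather than silently acted on:
assert route_prediction(risk_score=0.5, confidence=0.4) is Route.CLINICIAN_REVIEW
```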
Regulatory and Safety Considerations for Post-Deployment HITL Adaptation
Post-deployment adaptation through human-in-the-loop (HITL) mechanisms must also be considered within the context of emerging regulatory frameworks for clinical AI systems. Regulatory agencies such as the FDA have proposed lifecycle-based governance models for AI/ML-enabled Software as a Medical Device, emphasising transparency, continuous monitoring, and post-market performance evaluation. Similarly, the European Union AI Act classifies many healthcare AI systems as high-risk and requires strict oversight, traceability, and human supervision. In this context, HITL-based moderation mechanisms can support regulatory compliance by enabling controlled expert oversight while preserving the stability of validated models. Implementing structured logging of clinician interventions, monitoring output distributions, and maintaining clear documentation of human–AI interactions are essential to ensure auditability, safety, and accountability. Such governance mechanisms help ensure that post-deployment adaptations remain transparent and aligned with regulatory expectations for trustworthy clinical AI systems.
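As a non-authoritative sketch of what such structured logging might look like, the snippet below appends one immutable record per human–AI interaction to an append-only JSON Lines file. The InterventionRecord fields are assumptions chosen to support auditability (timestamp, clinician, model version, action, rationale), not a prescribed regulatory schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class InterventionRecord:
    """One auditable human-AI interaction, suitable for post-market review."""
    timestamp: str
    clinician_id: str
    model_version: str   # ties the intervention to a specific validated model build
    model_output: float
    action: str          # e.g. "override", "threshold_change", "suppress"
    rationale: str       # free-text justification captured at the point of care

def log_intervention(path: str, record: InterventionRecord) -> None:
    # Append-only JSON Lines: each intervention is a separate, self-contained row.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")

# Example: a clinician overrides an elevated risk score and records why.
log_intervention("interventions.jsonl", InterventionRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    clinician_id="MD-117",
    model_version="sepsis-risk-1.4.2",
    model_output=0.74,
    action="override",
    rationale="Recent surgery explains elevated lactate; risk overestimated.",
))
```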
In summary, HITL is essential to ensure the safe and effective integration of AI into healthcare. Pre-deployment human involvement establishes foundational model robustness, while post-deployment HITL mechanisms ensure ongoing adaptability and clinician oversight. The proposed moderation-based framework contributes a viable approach that balances responsiveness, regulatory feasibility, and user control, advancing the development of trustworthy AI-driven clinical decision support systems.

5. Conclusions

The integration of human-in-the-loop approaches with explainable AI in clinical decision support systems represents a pivotal advancement in healthcare AI, emphasising safety, transparency, and collaborative decision-making. This review has systematically examined post-deployment human-in-the-loop methodologies in healthcare, categorising them into three primary frameworks: retraining-based, override-based, and moderation-based. Each framework offers distinct advantages and challenges in terms of clinical integration, computational demands, and user experience. Retraining approaches enhance model performance over time but face practical limitations related to resource intensity and system complexity. Override-based systems allow immediate human intervention but may increase cognitive load on clinicians. Moderation-based frameworks strike a balance by enabling real-time adjustments without retraining, thereby enhancing efficiency and regulatory compliance.

Despite these advances, significant challenges persist, including managing data variability, addressing annotation scarcity, and ensuring rigorous system validation, all of which complicate real-time model updates. This review highlights an emerging body of research incorporating expert knowledge during the post-deployment phase, demonstrating the transformative potential of closed-loop human-in-the-loop systems. Proposed frameworks that facilitate real-time feedback without continual retraining underscore the critical role of domain experts in refining AI behaviour, ultimately promoting more responsive and effective patient care.

As healthcare increasingly adopts intelligent systems, fostering human–AI collaboration through interpretable and feedback-driven models will be crucial for building clinician trust, enabling transparency, and enhancing patient outcomes. Moreover, the limited empirical evidence on the effectiveness of human-in-the-loop systems in clinical environments underscores a vital avenue for future research.
Future research should prioritise the development of robust evaluation protocols for post-deployment HITL systems, including controlled studies that measure not only predictive accuracy but also clinician workload, decision quality, response time, and downstream clinical outcomes. Establishing standardised experimental designs and reporting practices will be essential to enable meaningful comparisons across HITL frameworks.

Another important direction involves developing methods for detecting and managing data drift in continuously evolving clinical environments. Changes in patient populations, clinical practices, and data collection procedures can significantly affect model reliability. Integrating automated drift detection mechanisms with structured human oversight may provide a practical approach for maintaining model validity without requiring frequent full retraining (a minimal drift-monitoring sketch is given at the end of this section).

Future work should also investigate safe mechanisms for incorporating expert feedback into deployed systems. While override and moderation strategies allow clinicians to intervene in model outputs, further research is needed to design controlled feedback pipelines that ensure expert interventions improve system performance without introducing unintended biases or instability. Approaches such as constrained parameter adjustments, feedback validation procedures, and intervention logging may help ensure safe integration of human expertise.

Finally, there is a need for benchmark datasets and evaluation frameworks for interactive explainable AI (XAI) in clinical settings. Current XAI evaluations often focus on technical explanation quality rather than clinical usability. Developing benchmarks that measure interpretability, trust calibration, and decision support effectiveness during real clinician–AI interaction would provide valuable guidance for the design of future HITL systems.
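To make the drift-monitoring direction above concrete, the sketch below (assuming NumPy and SciPy are available) applies a two-sample Kolmogorov–Smirnov test to a window of recent model scores and escalates to clinician review when the distribution departs from the validation-time reference. The window sizes, significance level, and synthetic data are illustrative assumptions, not recommended operating values.

```python
import numpy as np
from scipy.stats import ks_2samp

def check_drift(reference: np.ndarray, live: np.ndarray,
                alpha: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test on a model score stream.

    Returns True when the live window differs significantly from the
    validation-time reference, signalling that human review is needed.
    """
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

rng = np.random.default_rng(seed=0)
reference_scores = rng.normal(0.40, 0.10, size=5000)   # distribution at validation
stable_window = rng.normal(0.40, 0.10, size=500)       # recent scores, unchanged
shifted_window = rng.normal(0.55, 0.10, size=500)      # simulated population shift

for name, window in [("stable", stable_window), ("shifted", shifted_window)]:
    if check_drift(reference_scores, window):
        # Escalate to structured human oversight instead of silent retraining.
        print(f"{name}: drift detected -> route recent cases to clinician review")
    else:
        print(f"{name}: no significant drift")
```

Routing drift alarms to clinicians rather than triggering automatic retraining keeps the deployed model stable and auditable, consistent with the moderation-based approach discussed in this review.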
Advancing human-in-the-loop strategies that seamlessly integrate human expertise with AI autonomy is crucial to developing clinical decision support that is safe, reliable, and user-friendly. By synthesising current methodologies and identifying key gaps, this review provides a foundation for designing human-in-the-loop systems that are both practical and impactful, thereby enhancing the transformative role of AI in healthcare delivery.

Author Contributions

Conceptualization, D.D., S.D.A., D.M.C. and A.Z.K.; methodology, D.D., S.D.A. and A.Z.K.; formal analysis, D.D.; investigation, D.D.; resources, S.D.A., T.K.B. and A.Z.K.; data curation, D.D.; writing—original draft, D.D.; writing—review & editing, S.D.A., D.M.C., T.K.B. and A.Z.K.; visualization, D.D.; supervision, S.D.A., D.M.C., T.K.B. and A.Z.K. All authors have read and agreed to the published version of the manuscript.

Funding

No funding was received to assist with the preparation of this manuscript.

Data Availability Statement

No datasets were generated or analysed during the current study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
CDSS: Clinical Decision Support Systems
CNN: Convolutional Neural Network
EHR: Electronic Health Records
GUI: Graphical User Interface
GNN: Graph Neural Network
HITL: Human-In-The-Loop
ICU: Intensive Care Unit
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
SHAP: Shapley Additive Explanations
LIME: Local Interpretable Model-agnostic Explanations
XAI: Explainable AI

References

  1. Kim, M.; Sohn, H.; Choi, S.; Kim, S. Requirements for Trustworthy Artificial Intelligence and its Application in Healthcare. Healthc. Inform. Res. 2023, 29, 315–322. [Google Scholar] [CrossRef] [PubMed]
  2. Scarpato, N.; Ferroni, P.; Guadagni, F. XAI Unveiled: Revealing the Potential of Explainable AI in Medicine: A Systematic Review. IEEE Access 2024, 12, 191498–191516. [Google Scholar] [CrossRef]
  3. Al-Ansari, A.A.; Nejad, F.A.B.; Al-Nasr, R.J.; Prithula, J.; Rahman, T.; Hasan, A.; Chowdhury, M.E.H.; Alam, M.F. Predicting ICU Mortality Among Septic Patients Using Machine Learning Technique. J. Clin. Med. 2025, 14, 3495. [Google Scholar] [CrossRef]
  4. Frezza, B.; Nurchis, M.C.; Capolupo, G.T.; Carannante, F.; De Prizio, M.; Rondelli, F.; Alunni Fegatelli, D.; Gili, A.; Lepre, L.; Costa, G. A Comparison of Machine Learning-Based Models and a Simple Clinical Bedside Tool to Predict Morbidity and Mortality After Gastrointestinal Cancer Surgery in the Elderly. Bioengineering 2025, 12, 544. [Google Scholar] [CrossRef]
  5. Bindewari, S.; Sharma, K.; Gaddam, S.S.; Verma, A.; Parashar, D.; Arse, M. Machine Learning Models for Early Detection of Cardiac Arrest Risk Factors. In Proceedings of the 2025 International Conference on Cognitive Computing in Engineering, Communications, Sciences and Biomedical Health Informatics (IC3ECSBHI), Greater Noida, India, 16–18 January 2025; pp. 1–5. [Google Scholar]
  6. Moazemi, S.; Vahdati, S.; Li, J.; Kalkhoff, S.; Castano, L.J.V.; Dewitz, B.; Bibo, R.; Sabouniaghdam, P.; Tootooni, M.S.; Bundschuh, R.A.; et al. Artificial intelligence for clinical decision support for monitoring patients in cardiovascular ICUs: A systematic review. Front. Med. 2023, 10, 1109411. [Google Scholar] [CrossRef]
  7. Abubeker, K.M.; Baskar, S.; Chandran, P.; Yadav, P. Computer Vision-Assisted Smart ICU Framework for Optimized Patient Care. IEEE Sens. Lett. 2024, 8, 6001004. [Google Scholar] [CrossRef]
  8. Alowais, S.A.; Alghamdi, S.S.; Alsuhebany, N.; Alqahtani, T.; Alshaya, A.I.; Almohareb, S.N.; Aldairem, A.; Alrashed, M.; Bin Saleh, K.; Badreldin, H.A.; et al. Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Med. Educ. 2023, 23, 689. [Google Scholar] [CrossRef]
  9. Van Dort, B.A.; Engelsma, T.; Medlock, S.; Dusseljee-Peute, L. User-Centered Methods in Explainable AI Development for Hospital Clinical Decision Support: A Scoping Review. Stud. Health Technol. Inform. 2025, 326, 17–21. [Google Scholar] [CrossRef]
  10. Wang, B.; Asan, O.; Mansouri, M. Patients’ Perceptions of Integrating AI into Healthcare: Systems Thinking Approach. In Proceedings of the 2022 IEEE International Symposium on Systems Engineering (ISSE), Vienna, Austria, 24–26 October 2022; pp. 1–6. Available online: https://ieeexplore.ieee.org/document/10005383 (accessed on 23 March 2026).
  11. Gerke, S.; Minssen, T.; Cohen, G. Ethical and legal challenges of artificial intelligence-driven healthcare. In Artificial Intelligence in Healthcare; Academic Press: Cambridge, MA, USA, 2020; pp. 295–336. [Google Scholar] [CrossRef]
  12. Siala, H.; Wang, Y. SHIFTing artificial intelligence to be responsible in healthcare: A systematic review. Soc. Sci. Med. 2022, 296, 114782. [Google Scholar] [CrossRef] [PubMed]
  13. De Micco, F.; Di Palma, G.; Ferorelli, D.; De Benedictis, A.; Tomassini, L.; Tambone, V.; Cingolani, M.; Scendoni, R. Artificial intelligence in healthcare: Transforming patient safety with intelligent systems-A systematic review. Front. Med. 2024, 11, 1522554. [Google Scholar] [CrossRef] [PubMed]
  14. Lekadir, K.; Frangi, A.F.; Porras, A.R.; Glocker, B.; Cintas, C.; Langlotz, C.P.; Weicken, E.; Asselbergs, F.W.; Prior, F.; Collins, G.S.; et al. FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare. BMJ 2025, 388, e081554. [Google Scholar] [CrossRef] [PubMed]
  15. Cross, J.L.; Choma, M.A.; Onofrey, J.A. Bias in medical AI: Implications for clinical decision-making. PLoS Digit. Health 2024, 3, e0000651. [Google Scholar] [CrossRef]
  16. Adegbesan, A.; Akingbola, A.; Ojo, O.; Jessica, O.U.; Alao, U.H.; Shagaya, U.; Adewole, O.; Abdullahi, O. Ethical Challenges in the Integration of Artificial Intelligence in Palliative Care. J. Med. Surg. Public Health 2024, 4, 100158. [Google Scholar] [CrossRef]
  17. Karimian, G.; Petelos, E.; Evers, S.M.A.A. The ethical issues of the application of artificial intelligence in healthcare: A systematic scoping review. AI Ethics 2022, 2, 539–551. [Google Scholar] [CrossRef]
  18. Chen, X.; Wang, X.; Qu, Y. Constructing Ethical AI Based on the “Human-in-the-Loop” System. Systems 2023, 11, 548. [Google Scholar] [CrossRef]
  19. Sezgin, E. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers. Digit. Health 2023, 9, 20552076231186520. [Google Scholar] [CrossRef]
  20. Mosqueira-Rey, E.; Hernández-Pereira, E.; Alonso-Ríos, D.; Bobes-Bascarán, J.; Fernández-Leal, Á. Human-in-the-loop machine learning: A state of the art. Artif. Intell. Rev. 2022, 56, 3005–3054. [Google Scholar] [CrossRef]
  21. Tomaszewski, J.E. Overview of the role of artificial intelligence in pathology: The computer as a pathology digital assistant. In Artificial Intelligence and Deep Learning in Pathology; Elsevier: Amsterdam, The Netherlands, 2021; pp. 237–262. [Google Scholar] [CrossRef]
  22. Leersum, C.M.v.; Maathuis, C. Human centred explainable AI decision-making in healthcare. J. Responsible Technol. 2025, 21, 100108. [Google Scholar] [CrossRef]
  23. Chustecki, M. Benefits and Risks of AI in Health Care: Narrative Review. Interact. J. Med. Res. 2024, 13, e53616. [Google Scholar] [CrossRef] [PubMed]
  24. Akingbola, A.; Adeleke, O.; Idris, A.; Adewole, O.; Adegbesan, A. Artificial Intelligence and the Dehumanization of Patient Care. J. Med. Surg. Public Health 2024, 3, 100138. [Google Scholar] [CrossRef]
  25. European Society of Radiology. Summary of the proceedings of the International Forum 2021: “A more visible radiologist can never be replaced by AI”. Insights Imaging 2022, 13, 43. [Google Scholar] [CrossRef] [PubMed]
  26. Abbasian Ardakani, A.; Airom, O.; Khorshidi, H.; Bureau, N.J.; Salvi, M.; Molinari, F.; Acharya, U.R. Interpretation of Artificial Intelligence Models in Healthcare: A Pictorial Guide for Clinicians. J. Ultrasound Med. 2024, 43, 1789–1818. [Google Scholar] [CrossRef]
  27. Sadeghi, Z.; Alizadehsani, R.; Cifci, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A review of Explainable Artificial Intelligence in healthcare. Comput. Electr. Eng. 2024, 118, 109370. [Google Scholar] [CrossRef]
  28. Jung, J.; Kang, S.; Choi, J.; El-Kareh, R.; Lee, H.; Kim, H. Evaluating the impact of explainable AI on clinicians’ decision-making: A study on ICU length of stay prediction. Int. J. Med. Inform. 2025, 201, 105943. [Google Scholar] [CrossRef]
  29. Bharati, S.; Mondal, M.R.H.; Podder, P. A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When? IEEE Trans. Artif. Intell. 2024, 5, 1429–1442. [Google Scholar] [CrossRef]
  30. Nazar, M.; Alam, M.M.; Yafi, E.; Su’Ud, M.M. A Systematic Review of Human-Computer Interaction and Explainable Artificial Intelligence in Healthcare with Artificial Intelligence Techniques. IEEE Access 2021, 9, 153316–153348. [Google Scholar] [CrossRef]
  31. Reddy, S. Explainability and artificial intelligence in medicine. Lancet Digit. Health 2022, 4, e214–e215. [Google Scholar] [CrossRef] [PubMed]
  32. Hamida, S.U.; Chowdhury, M.J.M.; Chakraborty, N.R.; Biswas, K.; Sami, S.K. Exploring the Landscape of Explainable Artificial Intelligence (XAI): A Systematic Review of Techniques and Applications. Big Data Cogn. Comput. 2024, 8, 149. [Google Scholar] [CrossRef]
  33. Saeed, W.; Omlin, C. Explainable AI (XAI): A systematic meta-survey of current challenges and future opportunities. Knowl.-Based Syst. 2023, 263, 110273. [Google Scholar] [CrossRef]
  34. Kiseleva, A.; Kotzinos, D.; De Hert, P. Transparency of AI in Healthcare as a Multilayered System of Accountabilities: Between Legal Requirements and Technical Limitations. Front. Artif. Intell. 2022, 5, 879603. [Google Scholar] [CrossRef]
  35. Ali, S.; Abuhmed, T.; El-Sappagh, S.; Muhammad, K.; Alonso-Moral, J.M.; Confalonieri, R.; Guidotti, R.; Del Ser, J.; Díaz-Rodríguez, N.; Herrera, F. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Inf. Fusion 2023, 99, 101805. Available online: https://www.sciencedirect.com/science/article/pii/S1566253523001148 (accessed on 23 March 2026). [CrossRef]
  36. Roy, S.; Pal, D.; Meena, T. Explainable artificial intelligence to increase transparency for revolutionizing healthcare ecosystem and the road ahead. Netw. Model. Anal. Health Inform. Bioinform. 2024, 13, 4. [Google Scholar] [CrossRef]
  37. Dritsas, E.; Trigka, M. Application of Deep Learning for Heart Attack Prediction with Explainable Artificial Intelligence. Computers 2024, 13, 244. [Google Scholar] [CrossRef]
  38. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  39. Vimbi, V.; Shaffi, N.; Mahmud, M. Interpreting artificial intelligence models: A systematic review on the application of LIME and SHAP in Alzheimer’s disease detection. Brain Inform. 2024, 11, 10. [Google Scholar] [CrossRef]
  40. Antoniadi, A.M.; Du, Y.; Guendouz, Y.; Wei, L.; Mazo, C.; Becker, B.A.; Mooney, C. Current Challenges and Future Opportunities for XAI in Machine Learning-Based Clinical Decision Support Systems: A Systematic Review. Appl. Sci. 2021, 11, 5088. [Google Scholar] [CrossRef]
  41. Budd, S.; Robinson, E.C.; Kainz, B. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Med. Image Anal. 2021, 71, 102062. [Google Scholar] [CrossRef] [PubMed]
  42. Chen, H.; Gomez, C.; Huang, C.M.; Unberath, M. Explainable medical imaging AI needs human-centered design: Guidelines and evidence from a systematic review. NPJ Digit. Med. 2022, 5, 156. [Google Scholar] [CrossRef]
  43. Tam, T.Y.C.; Sivarajkumar, S.; Kapoor, S.; Stolyar, A.V.; Polanska, K.; McCarthy, K.R.; Osterhoudt, H.; Wu, X.; Visweswaran, S.; Fu, S.; et al. A framework for human evaluation of large language models in healthcare derived from literature review. NPJ Digit. Med. 2024, 7, 258. [Google Scholar] [CrossRef]
  44. Yuan, H.; Kang, L.; Li, Y.; Fan, Z. Human-in-the-loop machine learning for healthcare: Current progress and future opportunities in electronic health records. Med. Adv. 2024, 2, 318–322. [Google Scholar] [CrossRef]
  45. Kabata, F.; Thaldar, D. Human in the loop requirement and AI healthcare applications in low-resource settings: A narrative review. S. Afr. J. Bioeth. Law 2024, 17, 70–73. Available online: https://samajournals.co.za/index.php/sajbl/article/view/1975 (accessed on 23 March 2026).
  46. Gómez-Carmona, O.; Casado-Mansilla, D.; López-de-Ipiña, D.; García-Zubia, J. Human-in-the-loop machine learning: Reconceptualizing the role of the user in interactive approaches. Internet Things 2024, 25, 101048. [Google Scholar] [CrossRef]
  47. Kumar, S.; Datta, S.; Singh, V.; Datta, D.; Kumar Singh, S.; Sharma, R. Applications, Challenges, and Future Directions of Human-in-the-Loop Learning. IEEE Access 2024, 12, 75735–75760. [Google Scholar] [CrossRef]
  48. Sarker, I.H. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput. Sci. 2022, 3, 158. [Google Scholar] [CrossRef] [PubMed]
  49. Steidl, M.; Felderer, M.; Ramler, R. The pipeline for the continuous development of artificial intelligence models—Current state of research and practice. J. Syst. Softw. 2023, 199, 111615. [Google Scholar] [CrossRef]
  50. Ennab, M.; McHeick, H. Enhancing interpretability and accuracy of AI models in healthcare: A comprehensive review on challenges and future directions. Front. Robot. AI 2024, 11, 1444763. [Google Scholar] [CrossRef] [PubMed]
  51. Adnan, N.; Faizan Ahmed, S.M.; Das, J.K.; Aijaz, S.; Sukhia, R.H.; Hoodbhoy, Z.; Umer, F. Developing an AI-based application for caries index detection on intraoral photographs. Sci. Rep. 2024, 14, 26752. [Google Scholar] [CrossRef]
  52. Ignesti, G.; Deri, C.; D’Angelo, G.; Pratali, L.; Bruno, A.; Benassi, A.; Salvetti, O.; Moroni, D.; Martinelli, M. Deep learning methods for point-of-care ultrasound examination. In Proceedings of the 17th International Conference on Signal-Image Technology and Internet-Based Systems, SITIS 2023, Bangkok, Thailand, 8–10 November 2023; pp. 435–440. Available online: https://ieeexplore.ieee.org/document/10472834 (accessed on 23 March 2026).
  53. Lee, M.H.; Siewiorek, D.P.; Smailagic, A.; Bernardino, A.; Bermúdez I Badia, S. Towards Efficient Annotations for a Human-AI Collaborative, Clinical Decision Support System: A Case Study on Physical Stroke Rehabilitation Assessment. In Proceedings of the International Conference on Intelligent User Interfaces, Proceedings IUI; Association for Computing Machinery: New York, NY, USA, 2022; pp. 4–14. Available online: https://ink.library.smu.edu.sg/sis_research/7307/ (accessed on 23 March 2026).
  54. Jin, L.; Yang, J.; Kuang, K.; Ni, B.; Gao, Y.; Sun, Y.; Gao, P.; Ma, W.; Tan, M.; Kang, H.; et al. Deep-learning-assisted detection and segmentation of rib fractures from CT scans: Development and validation of FracNet. EBioMedicine 2020, 62, 103106. [Google Scholar] [CrossRef]
  55. Ramesh, P.V.; Subramaniam, T.; Ray, P.; Devadas, A.K.; Ramesh, S.V.; Ansar, S.M.; Ramesh, M.K.; Rajasekaran, R.; Parthasarathi, S. Utilizing human intelligence in artificial intelligence for detecting glaucomatous fundus images using human-in-the-loop machine learning. Indian J. Ophthalmol. 2022, 70, 1131–1138. [Google Scholar] [CrossRef]
  56. Huang, Z.; Wang, X.; Liu, X.; Li, J.; Hu, X.; Yu, Q.; Kuang, G.; Xiong, N.; Gao, Y. Human-in-the-loop machine learning-based quantitative assessment of hemifacial spasm based on volumetric interpolated breath-hold examination MR. Br. J. Radiol. 2025, 98, 562–570. [Google Scholar] [CrossRef]
  57. Yu, R.; Jiang, K.W.; Bao, J.; Hou, Y.; Yi, Y.; Wu, D.; Song, Y.; Hu, C.H.; Yang, G.; Zhang, Y.D. PI-RADS(AI): Introducing a new human-in-the-loop AI model for prostate cancer diagnosis based on MRI. Br. J. Cancer 2023, 128, 1019–1029. [Google Scholar] [CrossRef]
  58. Zhou, T.; Li, L.; Bredell, G.; Li, J.; Unkelbach, J.; Konukoglu, E. Volumetric memory network for interactive medical image segmentation. Med. Image Anal. 2023, 83, 102599. [Google Scholar] [CrossRef]
  59. Busch, F.; Xu, L.; Sushko, D.; Weidlich, M.; Truhn, D.; Müller-Franzes, G.; Heimer, M.M.; Niehues, S.M.; Makowski, M.R.; Hinsche, M.; et al. Dual center validation of deep learning for automated multi-label segmentation of thoracic anatomy in bedside chest radiographs. Comput. Methods Programs Biomed. 2023, 234, 107505. [Google Scholar] [CrossRef]
  60. Talaat, F.M.; Elnaggar, A.R.; Shaban, W.M.; Shehata, M.; Elhosseini, M. CardioRiskNet: A Hybrid AI-Based Model for Explainable Risk Prediction and Prognosis in Cardiovascular Disease. Bioengineering 2024, 11, 822. [Google Scholar] [CrossRef] [PubMed]
  61. Chandler, C.; Foltz, P.W.; Elvevåg, B. Improving the Applicability of AI for Psychiatric Applications through Human-in-the-loop Methodologies. Schizophr. Bull. 2022, 48, 949–957. [Google Scholar] [CrossRef] [PubMed]
  62. Brandenburg, J.M.; Jenke, A.C.; Stern, A.; Daum, M.T.J.; Schulze, A.; Younis, R.; Petrynowski, P.; Davitashvili, T.; Vanat, V.; Bhasker, N.; et al. Active learning for extracting surgomic features in robot-assisted minimally invasive esophagectomy: A prospective annotation study. Surg. Endosc. 2023, 37, 8577–8593. [Google Scholar] [CrossRef] [PubMed]
  63. Banumathi, K.; Venkatesan, L.; Benjamin, L.S.; Vijayalakshmi, K.; Satchi, N.S. Reinforcement Learning in Personalized Medicine: A Comprehensive Review of Treatment Optimization Strategies. Cureus 2025, 17, e82756. [Google Scholar] [CrossRef]
  64. Tang, S.; Modi, A.; Sjoding, M.W.; Wiens, J. Clinician-in-The-loop decision making: Reinforcement learning with near-optimal set-valued policies. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020; JMLR: Norfolk, MA, USA, 2020; pp. 9329–9338. Available online: https://arxiv.org/abs/2007.12678 (accessed on 23 March 2026).
  65. Washington, P. A Perspective on Crowdsourcing and Human-in-the-Loop Workflows in Precision Health. J. Med. Internet Res. 2024, 26, e51138. [Google Scholar] [CrossRef]
  66. Roe, K.D.; Jawa, V.; Zhang, X.; Chute, C.G.; Epstein, J.A.; Matelsky, J.; Shpitser, I.; Taylor, C.O. Feature engineering with clinical expert knowledge: A case study assessment of machine learning model complexity and performance. PLoS ONE 2020, 15, e0231300. [Google Scholar] [CrossRef]
  67. Wu, Y.; Liu, Y.; Yang, Y.; Yao, M.S.; Yang, W.; Shi, X.; Yang, L.; Li, D.; Liu, Y.; Yin, S.; et al. A concept-based interpretable model for the diagnosis of choroid neoplasias using multimodal data. Nat. Commun. 2025, 16, 3504. [Google Scholar] [CrossRef]
  68. Subba, B.; Toufiq, M.; Omi, F.; Yurieva, M.; Khan, T.; Rinchai, D.; Palucka, K.; Chaussabel, D. Human-augmented large language model-driven selection of glutathione peroxidase 4 as a candidate blood transcriptional biomarker for circulating erythroid cells. Sci. Rep. 2024, 14, 23225. [Google Scholar] [CrossRef]
  69. Bodén, A.C.S.; Molin, J.; Garvin, S.; West, R.A.; Lundström, C.; Treanor, D. The human-in-the-loop: An evaluation of pathologists’ interaction with artificial intelligence in clinical practice. Histopathology 2021, 79, 210–218. [Google Scholar] [CrossRef]
  70. Lee, S.; Lee, J.; Park, J.; Park, J.; Kim, D.; Lee, J.; Oh, J. Deep learning-based natural language processing for detecting medical symptoms and histories in emergency patient triage. Am. J. Emerg. Med. 2024, 77, 29–38. [Google Scholar] [CrossRef]
  71. Ghani, A. HiLTS©: Human-in-the-Loop Therapeutic System: A Wireless-Enabled Digital Neuromodulation Testbed for Brainwave Entrainment. Technologies 2026, 14, 71. [Google Scholar] [CrossRef]
  72. Metsch, J.M.; Saranti, A.; Angerschmid, A.; Pfeifer, B.; Klemt, V.; Holzinger, A.; Hauschild, A.C. CLARUS: An interactive explainable AI platform for manual counterfactuals in graph neural networks. J. Biomed. Inform. 2024, 150, 104600. [Google Scholar] [CrossRef] [PubMed]
  73. Zliobaite, I.; Bifet, A.; Pfahringer, B.; Holmes, G. Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 27–39. [Google Scholar] [CrossRef]
  74. El-Hasnony, I.M.; Elzeki, O.M.; Alshehri, A.; Salem, H. Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction. Sensors 2022, 22, 1184. [Google Scholar] [CrossRef] [PubMed]
  75. Ziniu, L.; Ke, X.; Liu, L.; Lanqing, L.; Deheng, Y.; Peilin, Z. Deploying Offline Reinforcement Learning with Human Feedback. arXiv 2023, arXiv:2303.07046. Available online: https://arxiv.org/abs/2303.07046 (accessed on 23 March 2026).
  76. Wysocki, O.; Davies, J.K.; Vigo, M.; Armstrong, A.C.; Landers, D.; Lee, R.; Freitas, A. Assessing the communication gap between AI models and healthcare professionals: Explainability, utility and trust in AI-driven clinical decision-making. Artif. Intell. 2023, 316, 103839. [Google Scholar] [CrossRef]
  77. Mienye, I.D.; Obaido, G.; Jere, N.; Mienye, E.; Aruleba, K.; Emmanuel, I.D.; Ogbuokiri, B. A survey of explainable artificial intelligence in healthcare: Concepts, applications, and challenges. Inform. Med. Unlocked 2024, 51, 101587. [Google Scholar] [CrossRef]
  78. Steffny, L.; Dahlem, N.; Reichl, L.; Gisa, K.; Greff, T.; Werth, D. Design of a Human-in-the-Loop Centered AI-Based Clinical Decision Support System for Professional Care Planning. In HHAI 2023: Augmenting Human Intellect; IOS Press: Amsterdam, The Netherlands, 2023. [Google Scholar] [CrossRef]
  79. Zhang, S.; Yu, J.; Xu, X.; Yin, C.; Lu, Y.; Yao, B.; Tory, M.; Padilla, L.M.; Caterino, J.; Zhang, P.; et al. Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis. In Proceedings of the 2024 CHI Conference on Human Factors in Computing System; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  80. Sendak, M.; Elish, M.C.; Gao, M.; Futoma, J.; Ratliff, W.; Nichols, M.; Bedoya, A.; Balu, S.; O’Brien, C. The human body is a black box. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency; Association for Computing Machinery: New York, NY, USA, 2020; pp. 99–109. [Google Scholar] [CrossRef]
  81. van Voorst, R. Challenges and Limitations of Human Oversight in Ethical Artificial Intelligence Implementation in Health Care: Balancing Digital Literacy and Professional Strain. Mayo Clin. Proc. Digit. Health 2024, 2, 559–563. [Google Scholar] [CrossRef]
  82. Hau, D. Update on Software as a Medical Device (SaMD): The TGA and IMDRF Perspectives; Therapeutic Goods Administration: Canberra, Australia, 2017. Available online: https://www.tga.gov.au/sites/default/files/update-on-software-as-a-medical-device-samd.pdf (accessed on 15 March 2026).
  83. OpenRegulatory. FDA Risk Classification for Software as a Medical Device (SaMD). Available online: https://openregulatory.com/articles/fda-risk-classification-for-software-as-a-medical-device-samd (accessed on 15 March 2026).
  84. Harmon, R.; Williams, P.A.H.; McCauley, V.B. Software as a Medical Device (SaMD): Useful or Useless Term? In Proceedings of the 54th Hawaii International Conference on System Sciences; University of Hawai’i at Mānoa: Honolulu, HI, USA, 2021; Available online: https://researchnow-admin.flinders.edu.au/ws/portalfiles/portal/35103812/Hermon_Williams_McCauley_SaMD_2021_Paper_0367.pdf (accessed on 23 March 2026).
  85. Harris, S.; Bonnici, T.; Keen, T.; Lilaonitkul, W.; White, M.J.; Swanepoel, N. Clinical deployment environments: Five pillars of translational machine learning for health. Front. Digit. Health 2022, 4, 939292. [Google Scholar] [CrossRef]
  86. Bayram, F.; Ahmed, B.S.; Kassler, A. From concept drift to model degradation: An overview on performance-aware drift detectors. Knowl.-Based Syst. 2022, 245, 108632. [Google Scholar] [CrossRef]
  87. Aasvang, E.K.; Meyhoff, C.S. The future of postoperative vital sign monitoring in general wards: Improving patient safety through continuous artificial intelligence-enabled alert formation and reduction. Curr. Opin. Anaesthesiol. 2023, 36, 683–690. [Google Scholar] [CrossRef]
  88. Aboutalebi, H.; Pavlova, M.; Shafiee, M.J.; Florea, A.; Hryniowski, A.; Wong, A. COVID-Net Biochem: An explainability-driven framework to building machine learning models for predicting survival and kidney injury of COVID-19 patients from clinical and biochemistry data. Sci. Rep. 2023, 13, 17001. [Google Scholar] [CrossRef] [PubMed]
  89. Sheu, R.K.; Pardeshi, M.S.; Pai, K.C.; Chen, L.C.; Wu, C.L.; Chen, W.C. Interpretable Classification of Pneumonia Infection Using eXplainable AI (XAI-ICP). IEEE Access 2023, 11, 28896–28919. [Google Scholar] [CrossRef]
  90. Ju, Y.; Waugh, J.L.S.; Singh, S.; Rusin, C.G.; Patel, A.B.; Jain, P.N. A multimodal deep learning tool for detection of junctional ectopic tachycardia in children with congenital heart disease. Heart Rhythm O2 2024, 5, 452–459. [Google Scholar] [CrossRef]
  91. Gavai, A.K.; van Hillegersberg, J. AI-driven personalized nutrition: RAG-based digital health solution for obesity and type 2 diabetes. PLoS Digit. Health 2025, 4, e0000758. [Google Scholar] [CrossRef] [PubMed]
  92. Bahani, K.; Moujabbir, M.; Ramdani, M. An accurate fuzzy rule-based classification systems for heart disease diagnosis. Sci. Afr. 2021, 14, e01019. [Google Scholar] [CrossRef]
  93. Deshmukh, A.; Kallivalappil, N.; D’Souza, K.; Kadam, C. AL-XAI-MERS: Unveiling Alzheimer’s Mysteries with Explainable AI. In Proceedings of the 2nd International Conference on Emerging Trends in Information Technology and Engineering, ICETITE 2024, Vellore, India, 22–23 February 2024; Available online: https://ieeexplore.ieee.org/document/10493489 (accessed on 23 March 2026).
  94. Bonde, A.; Lorenzen, S.; Brixen, G.; Troelsen, A.; Sillesen, M. Assessing the utility of deep neural networks in detecting superficial surgical site infections from free text electronic health record data. Front. Digit. Health 2023, 5, 1249835. [Google Scholar] [CrossRef]
  95. Panigutti, C.; Beretta, A.; Fadda, D.; Giannotti, F.; Pedreschi, D.; Perotti, A.; Rinzivillo, S. Co-design of Human-centered, Explainable AI for Clinical Decision Support. ACM Trans. Interact. Intell. Syst. 2023, 13, 21. [Google Scholar] [CrossRef]
  96. Yang, Y.; Truong, N.D.; Maher, C.; Nikpour, A.; Kavehei, O. Continental generalization of a human-in-the-loop AI system for clinical seizure recognition. Expert Syst. Appl. 2022, 207, 118083. [Google Scholar] [CrossRef]
  97. Wani, N.A.; Kumar, R.; Bedi, J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput. Methods Programs Biomed. 2024, 243, 107879. [Google Scholar] [CrossRef] [PubMed]
  98. Yu, H.Q.; Alaba, A.; Eziefuna, E. Evaluation of Integrated XAI Frameworks for Explaining Disease Prediction Models in Healthcare. In Internet of Things of Big Data for Healthcare; Communications in Computer and Information Science; Springer: Cham, Switzerland, 2024; pp. 14–28. Available online: https://link.springer.com/chapter/10.1007/978-3-031-52216-1_2 (accessed on 23 March 2026).
  99. Awais, M.; Ghayvat, H.; Krishnan Pandarathodiyil, A.; Nabillah Ghani, W.M.; Ramanathan, A.; Pandya, S.; Walter, N.; Naufal Saad, M.; Zain, R.B.; Faye, I. Healthcare professional in the loop (HPIL): Classification of standard and oral cancer-causing anomalous regions of oral cavity using textural analysis technique in autofluorescence imaging. Sensors 2020, 20, 5780. [Google Scholar] [CrossRef] [PubMed]
  100. Saranya, A.; Narayan, S. Risk Prediction of Heart Disease using Deep SHAP Techniques. In Proceedings of the 2nd International Conference on Advancement in Computation and Computer Technologies, InCACCT 2024, Gharuan, India, 2–3 May 2024; pp. 332–336. Available online: https://ieeexplore.ieee.org/document/10551212 (accessed on 23 March 2026).
  101. Nafisah, S.I.; Muhammad, G. Tuberculosis detection in chest radiograph using convolutional neural network architecture and explainable artificial intelligence. Neural Comput. Appl. 2022, 36, 111–131. [Google Scholar] [CrossRef] [PubMed]
  102. Hatherley, J.; Sparrow, R. Diachronic and synchronic variation in the performance of adaptive machine learning systems: The ethical challenges. J. Am. Med. Inform. Assoc. 2023, 30, 361–366. [Google Scholar] [CrossRef] [PubMed]
  103. Pulicharla, M.R. Detecting and addressing model drift: Automated monitoring and real-time retraining in ML pipelines. World J. Adv. Res. Rev. 2019, 3, 147–152. [Google Scholar] [CrossRef]
Figure 1. Classification of HITL approaches.
Figure 2. Experts override AI for determining a final clinical decision.
Figure 3. Conceptual closed-loop HITL AI without retraining.
Figure 4. Experts moderate outputs (proposed Framework 1).
Figure 5. Experts moderate features (proposed Framework 2).
Figure 6. Experts add extra control inputs (proposed Framework 3).
Figure 7. Experts moderate internal parameters of model (proposed Framework 4).
Figure 8. PRISMA flow diagram of literature review.
Figure 9. Publication trend in the last 5 years.
Table 1. Summary of some recent healthcare HITL review papers (2020–2025).

| Ref | Year | Main Task | Scope | Key Gaps |
|---|---|---|---|---|
| [41] | 2021 | Reviews HITL and active learning in medical imaging | Focused on DL methods for classification and segmentation tasks | Does not cover non-imaging applications; mostly prototype-level studies; needs deployment focus |
| [42] | 2022 | Reviews HITL in XAI through human-centred design | Systematic analysis of usability and transparency in medical imaging AI | Narrow focus on imaging; needs economic and deployment considerations; few studies measure actual clinical decision impact |
| [43] | 2024 | Presents a framework for human evaluation of clinical large language models (LLMs) | Evaluates trust, quality, safety, and reasoning of LLM outputs | Needs HITL during training or annotation; narrow to evaluation; framework not yet prospectively validated; limited LLM deployment data |
| [44] | 2024 | Classifies HITL roles across the ML lifecycle | Broad coverage: annotation, training, validation | Only covers opportunities for electronic health records; needs practical integration and ethical discussion; conceptual; minimal empirical validation in clinical environments |
| [45] | 2024 | Analyses ethical and regulatory aspects of HITL in low-resource settings | HITL as a governance necessity for safety and fairness | No technical taxonomy; purely normative; needs implementation examples or data; needs practical integration pathways |
| This work | 2026 | Presents post-deployment HITL frameworks for clinical settings | HITL design enabling real-time expert modification of the model without model retraining | Prospective deployment evaluation is pending |
Table 2. Comparative synthesis of post-deployment HITL AI studies in healthcare.

| HITL Approaches | Ref | Clinical Domain | Interaction Type | Deployment Setting | AI Model | XAI Method | Evaluation Metrics |
|---|---|---|---|---|---|---|---|
| Post-deployment HITL with retraining | [72] | Kidney cancer | Human feedback on counterfactuals | Synthetic and PPI dataset | GNN | GNNExplainer | Sensitivity: 0.75; specificity: 0.74 |
| | [88] | COVID-19 survival and acute kidney injury | Human-in-the-loop retraining based on clinician feedback | Hospital (Stony Brook) | eXtreme Gradient Boosting (XGB), LightGBM, CatBoost, RF, LR | GSInquire | XGBoost performs best, with accuracy 92.30% (survival) and 88.05% (AKI) |
| | [89] | Pneumonia infection | Clinician-in-the-loop transfer learning | Multi-centre CXR datasets | DCNN | SHAP | Accuracy: 92.14% (independent learning), 93.29% (TL) |
| | [90] | Pediatric cardiac arrhythmia | Clinician review and model retraining | ECG data, Texas Children's Hospital | 5-layer CNN | LIME | AUC ROC: 0.95 |
| | [91] | Dietary recommendation system | Feedback on AI nutrition suggestions | User-facing recommendation system | LLaMA3 model | Virtual nutritionist providing structured, evidence-backed explanations | Success rate: 80.1% (nutritional), 92% (sustainability) |
| Post-deployment HITL without retraining (override) | [92] | Heart disease | Override rule-based HITL | Clinical dataset (Cleveland & CombinedHunVada) | FCRLC (IF-THEN) | Rule-based | Accuracy: 83.17% (CL), 80.46% (CO) |
| | [93] | Alzheimer's diagnosis | Human override and validation | OASIS dataset | DenseNet121 and MobileNet v2 | LIME | Accuracy: 88% (D), 93% (M) |
| | [94] | Surgical site infections | Clinician review | 11 hospitals, Denmark | NLP | Human review | AUC ROC: 0.989 |
| | [95] | Predicting next visit time, diagnosis, and medications | Human-in-the-loop override and explanation | MIMIC-IV ICU data | RNN | DoctorXAI, prototype UI for explanation | Max avg fidelity: 0.90 ± 0.03 |
| | [96] | Seizure recognition | Clinician feedback on predictions | RPAH EEG | Conv-LSTM | Occlusion-based approach | Sensitivity: 92.19% |
| | [97] | Lung cancer detection | Human feedback for model correction | Survey Lung Cancer dataset | Hybrid (ConvXGB) model | SHAP | Accuracy: 97.43% |
| | [98] | Evaluation of XAI frameworks in disease prediction | Human evaluation of explanations | Prostate cancer, pneumonia CXR, medical Q&A | RF, LR (tabular); CNN (image) | LIME, SHAP | AUC ROC (12-s window): 0.84 |
| | [99] | Oral cancer | Graphical user interface-based clinician review | VELscope imaging (own data) | KNN | GUI visualisation | Accuracy: 83% |
| | [100] | Heart disease risk | Clinician-guided retraining | UCI dataset | CNN | DeepSHAP | Accuracy: 90% |
| | [101] | Tuberculosis detection | Clinician feedback | CXR: Montgomery, Shenzhen, Belarus | CNN | Visualisation | Accuracy: 99.1% |
Table 3. Comparison of post-deployment HITL approaches and the proposed framework.

| HITL Approaches | Ref | Advantages | Limitations |
|---|---|---|---|
| With model retraining | [72,88,89,90,91] | Improved accuracy over time; adaptability to evolving data; enhanced clinical relevance; error correction and bias reduction; knowledge transfer; improved trust and transparency | High computational and resource costs; data drift risk between retraining cycles; human fatigue and variability; integration complexity; version management and validation overhead; delayed responsiveness |
| Without model retraining (override) | [92,93,94,95,96,97,98,99,100,101] | Real-time adaptation; lower computational burden; rapid feedback integration; improved safety and control; simpler regulatory compliance as the core model is unchanged; clinician trust and oversight | Increased cognitive load; risk of feedback fatigue; inconsistent handling of edge cases |
| Without model retraining (moderation), proposed frameworks | - | Maintains autonomous operation while allowing human guidance; allows immediate adjustment of model behaviour without delay; no heavy retraining or optimisation steps needed; potentially simpler regulatory management than systems requiring frequent model retraining | Human influence limited to predefined moderation mechanisms; requires well-designed interfaces or protocols to be effective; moderation logic adds complexity to the inference pipeline; poorly calibrated moderation may suppress important alerts |