Article

Explainable AI for Clinical Decision Support Systems: Literature Review, Key Gaps, and Research Synthesis

by Mozhgan Salimparsa 1, Kamran Sedig 1,*, Daniel J. Lizotte 2,3, Sheikh S. Abdullah 1,4,5,6,7, Niaz Chalabianloo 2,6,8 and Flory T. Muanda 3,6,7,8
1 Insight Lab, Department of Computer Science, Western University, London, ON N6A 3K7, Canada
2 Department of Computer Science, Faculty of Science, Western University, London, ON N6A 3K7, Canada
3 Department of Epidemiology & Biostatistics, Western University, London, ON N6G 2M1, Canada
4 Department of Computer Science, MacEwan University, Edmonton, AB T5J 2P2, Canada
5 London Health Sciences Centre Research Institute, London, ON N6A 5W9, Canada
6 ICES Western, London, ON N6A 5W9, Canada
7 Lawson Health Research Institute, London Health Sciences Centre, London, ON N6A 4V2, Canada
8 Department of Physiology and Pharmacology, Western University, London, ON N6A 5C1, Canada
* Author to whom correspondence should be addressed.
Informatics 2025, 12(4), 119; https://doi.org/10.3390/informatics12040119
Submission received: 13 September 2025 / Revised: 14 October 2025 / Accepted: 21 October 2025 / Published: 28 October 2025
(This article belongs to the Section Health Informatics)

Abstract

While Artificial Intelligence (AI) promises significant enhancements for Clinical Decision Support Systems (CDSSs), the opacity of many AI models remains a major barrier to clinical adoption, primarily due to interpretability and trust challenges. Explainable AI (XAI) seeks to bridge this gap by making model reasoning understandable to clinicians, but technical XAI solutions have too often failed to address real-world clinician needs, workflow integration, and usability concerns. This study synthesizes persistent challenges in applying XAI to CDSS—including mismatched explanation methods, suboptimal interface designs, and insufficient evaluation practices—and proposes a structured, user-centered framework to guide more effective and trustworthy XAI-CDSS development. Drawing on a comprehensive literature review, we detail a three-phase framework encompassing user-centered XAI method selection, interface co-design, and iterative evaluation and refinement. We demonstrate its application through a retrospective case study analysis of a published XAI-CDSS for sepsis care. Our synthesis highlights the importance of aligning XAI with clinical workflows, supporting calibrated trust, and deploying robust evaluation methodologies that capture real-world clinician–AI interaction patterns, such as negotiation. The case analysis shows how the framework can systematically identify and address user-centric gaps, leading to better workflow integration, tailored explanations, and more usable interfaces. We conclude that achieving trustworthy and clinically useful XAI-CDSS requires a fundamentally user-centered approach; our framework offers actionable guidance for creating explainable, usable, and trusted AI systems in healthcare.

1. Introduction

Clinical Decision Support Systems (CDSSs) are computer applications designed to facilitate clinicians’ decision-making processes [1]. These systems assist clinicians in leveraging data and modeling techniques to improve decision quality, which, in turn, can enhance healthcare delivery. CDSSs have diverse applications, ranging from information management to providing patient-specific recommendations [2]. In recent years, there has been growing interest among CDSS developers in applying Artificial Intelligence (AI) to predict clinical outcomes [3]. Machine Learning (ML), a subfield of AI, enables systems to learn from past experiences (e.g., patient histories) and recognize useful patterns in health data. ML models take data features as input and, based on underlying patterns, generate predictions. A review by Montani and Striani [4] suggests that such ML models are becoming integral to CDSS.
Although applying ML models in CDSS holds considerable potential for improving clinical outcomes, the ‘black-box’ nature of many such models limits their utility. Predictions from these ML models are often characterized by a lack of interpretability [5]. Rudin [6], for example, argues that opaque models should be avoided in high-stakes settings such as medicine, where interpretability and trustworthiness are at least as important as predictive accuracy. Clinicians and patients may find it difficult to trust predictions if they do not understand the underlying logic [7]. Thus, limited model interpretability remains a significant barrier to widespread clinical adoption.
These challenges have spurred interest in interpretable and Explainable AI (XAI) as a means to improve accountability and trustworthiness. A variety of XAI methods have been developed to construct and communicate explanations for ML model predictions [8]. These methods can act as ‘translators’ for the models; their outputs, when consistent with domain knowledge, help users develop a deeper understanding of the model’s logic and mitigate its ‘black-box’ nature. Ultimately, the goal is to enable users to understand, and consequently trust, the models and their outputs [8,9]. XAI methods, therefore, hold significant potential to enhance the implementation, utility, and adoption of AI in CDSS.
While current XAI methods show promise in making systems more accountable and transparent [10], effectively communicating explanations to users via the system interface remains a major obstacle to integrating ML into clinical practice. This results in a persistent gap between the technical capabilities of XAI tools and their practical utility for end-users. To accelerate XAI integration, it is crucial to bridge this gap by meticulously considering clinicians’ needs and task objectives during system design.
This paper underscores the importance of a user-centered design approach for XAI methods within clinical applications. Section 2 provides background on XAI methodologies. Section 3 outlines our literature review methodology. Section 4 synthesizes recent research, presenting findings on XAI applications, challenges, and evaluation approaches in CDSS. Key themes are discussed, with Section 4.1 identifying primary factors that contribute to the gap between clinicians and CDSS. In response, Section 5 proposes a structured, three-phase user-centered framework for the development and evaluation of XAI-CDSS. Section 6 demonstrates the framework’s practical applicability through a retrospective case study analysis of a recent XAI-CDSS implementation. Finally, Section 7 concludes the paper with a summary and outlines directions for future research.

2. Background

2.1. Explainable AI

There are two general categories of XAI methods: ante hoc and post hoc [8,9,11,12]. Ante hoc (or explainable modeling) methods involve models specifically designed to be transparent or ‘glass-box’ systems, whose logic is understandable by end-users. Examples include RuleFit, additive models (e.g., Generalized Additive Models—GAMs), fuzzy inference systems, decision trees, and linear regression (see cited sources for specific details). Post hoc (or post-modeling) methods are employed to explain existing ‘black-box’ models whose logic is not inherently understandable to end-users.
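To make the ante hoc category concrete, the sketch below trains a deliberately shallow decision tree and prints its learned rules so the entire decision logic is inspectable. It is a minimal illustration only: the data are synthetic, the feature names are hypothetical stand-ins for clinical variables, and a real glass-box model would be chosen and validated with clinicians.

```python
# Minimal ante hoc sketch: a shallow decision tree whose full rule set is
# readable by end-users. Data and feature names are synthetic/hypothetical.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a clinical tabular dataset.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
feature_names = ["age", "heart_rate", "lactate", "wbc_count", "creatinine"]

# Depth is restricted so the whole model stays human-readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Render the fitted tree as nested if/else rules.
print(export_text(tree, feature_names=feature_names))
```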
Since post hoc methods operate on already-trained models, they are widely applicable across diverse modeling scenarios. Post hoc methods can be categorized along three dimensions: model specificity, explanation scope, and explanation type. Regarding model specificity, XAI methods are divided into two major groups: model-specific and model-agnostic. Model-specific methods explain operations within a particular ML model and are limited to that specific model architecture. For instance, Layer-Wise Relevance Propagation (LRP) is a model-specific explanation method for neural networks. In contrast, model-agnostic methods are not tied to a specific model and aim to explain predictions without referring to the model’s internal workings. Model-agnostic methods are especially popular due to their broad applicability across models constructed by various ML algorithms.
Regarding explanation scope, XAI methods can be global or local. Importantly, model specificity and explanation scope are orthogonal; an XAI method can be model-specific and global, model-specific and local, model-agnostic and global, or model-agnostic and local. Global methods explain a model’s overall logic and reasoning across an entire dataset, whereas local methods explain the reasoning for a specific prediction related to a single data point.
Concerning the type of explanation, XAI methods can be categorized as simplification-based, influence-based, or example-based. We adapt the categorization used by Adadi and Berrada [11] and Guidotti et al. [12] (refer to Figure 1). These explanation types either reveal information about a model’s internal workings and the logic behind its predictions or expose the model’s input–output relationships. They can be employed individually or in combination.
Simplification strategies aim to render complex models more understandable by providing approximations or distilled representations of the model’s logic. One approach is knowledge extraction, which derives simplified (often symbolic) descriptions of what the model has learned. This includes distillation methods that compress complex models into simpler forms [13,14,15] and rule extraction techniques that generate sets of rules approximating the model’s decision-making process [16,17,18,19,20]. Note, however, that the interpretability of extracted rules may diminish in situations where models involve very high-dimensional feature spaces. Another simplification strategy is surrogate modeling, where an inherently interpretable model (e.g., a linear model or decision tree) is trained to mimic the predictions of the original complex model. Local Interpretable Model-Agnostic Explanation (LIME) is a popular example of a local surrogate approach that explains individual predictions by fitting simple models locally around those instances [21]. However, surrogate models may only accurately approximate the original model in specific local regions [22].
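The following sketch illustrates the local-surrogate idea behind LIME under simplifying assumptions: perturb a single instance, query the black-box model on the perturbations, weight samples by proximity, and fit a simple weighted linear model whose coefficients serve as the local explanation. Library implementations (e.g., the lime package) handle sampling, encoding, and kernel choices more carefully than this toy version.

```python
# Minimal LIME-style local surrogate: a weighted linear model approximates the
# black box in the neighborhood of one instance. Data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=1000, n_features=6, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

x0 = X[0]                                                  # instance to explain
rng = np.random.default_rng(0)
Z = x0 + rng.normal(scale=0.5, size=(2000, X.shape[1]))    # local perturbations
p = black_box.predict_proba(Z)[:, 1]                       # black-box outputs

# Proximity kernel: perturbations closer to x0 receive larger weights.
dist = np.linalg.norm(Z - x0, axis=1)
weights = np.exp(-(dist ** 2) / (2 * 0.75 ** 2))

# Weighted linear fit = local surrogate; its coefficients are the explanation.
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=weights)
for i, coef in enumerate(surrogate.coef_):
    print(f"feature_{i}: local contribution weight {coef:+.3f}")
```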
Influence assessment strategies focus on determining and communicating the importance or influence of input features on the model’s predictions. This helps users identify which inputs drive a specific outcome and enables them to compare these with their clinical judgment. For example, sensitivity analysis examines how changes in input values affect the output, aiding in the verification of model behavior and stability [23]. More commonly, feature importance methods quantify each input feature’s contribution to a particular prediction (or to the model’s overall performance) [24]. Techniques for measuring feature importance include perturbation-based methods, which assess the impact of altering or omitting feature values but perturbing correlated features can produce implausible data instances [25]; SHAP (SHapley Additive exPlanations), which uses game theory to assign contribution values to features for local explanations or approximate global importance [26,27]; and saliency maps, often used with deep learning models on images or text, which visually highlight the input regions most influential for a given output [28,29]. For neural networks specifically, model-specific neuron contribution methods assess the importance of internal components. These include propagation-based techniques like LRP, which backpropagate a quantity of interest to determine input contributions [30,31,32], and activation maximization, which identifies input patterns that maximally activate specific neurons [33].
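As a minimal example of a perturbation-based influence measure, the sketch below uses scikit-learn's permutation importance: each feature is shuffled in turn and the resulting drop in held-out performance is reported as that feature's importance. The dataset is synthetic; SHAP or saliency methods would replace this step when local or model-specific explanations are required.

```python
# Perturbation-based feature importance via permutation on a held-out set.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# n_repeats controls how many shuffles are averaged per feature.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: mean importance {result.importances_mean[i]:.3f} "
          f"(+/- {result.importances_std[i]:.3f})")
```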
Finally, example-based strategies explain model behavior by referencing specific data instances; these methods are often model-agnostic. Prototype and criticism approaches select representative instances (‘prototypes’) from the dataset to exemplify typical model behavior, possibly supplemented by ‘criticisms’ that represent atypical cases [34]. Counterfactual explanations identify minimal changes to an input instance’s features that would alter the model’s prediction outcome, helping users understand the model’s decision boundary [35,36]. Adversarial examples are a related concept: these are small, often imperceptible perturbations to inputs designed to intentionally cause misclassification, thereby revealing model vulnerabilities or sensitivities [37]. In clinical contexts, Case-Based Reasoning (CBR) is another example-based approach that explains predictions by retrieving and presenting similar past cases from a case database [38]. Collectively, these example-based methods help users understand model behavior through concrete illustrations and comparisons.
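A minimal sketch of the CBR-style retrieval step is shown below: for a new case, the most similar past cases are retrieved and their recorded outcomes displayed. The data, scaling, and Euclidean distance metric are simplifying assumptions; a deployed CBR component would use a curated case base and a clinically validated similarity measure.

```python
# Example-based explanation sketch: retrieve the most similar past cases
# (case-based reasoning). Data and similarity metric are illustrative only.
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

scaler = StandardScaler().fit(X)
case_base = scaler.transform(X)

# Index the historical cases for similarity search.
retriever = NearestNeighbors(n_neighbors=3).fit(case_base)

new_patient = scaler.transform(X[[10]])          # query case
distances, indices = retriever.kneighbors(new_patient)

for dist, idx in zip(distances[0], indices[0]):
    print(f"similar case #{idx}: distance={dist:.2f}, recorded outcome={y[idx]}")
```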

2.2. Joint Cognitive Systems

Clinical decision-making in complex conditions involves intricate cognitive activities [39]. According to the theory of distributed cognition, such activities occur not only within an individual’s mind but also through interactions with the external environment [40]. This distribution of cognitive work transpires over time and arises from interactions between internal factors (e.g., an individual’s analytical reasoning ability and background knowledge) and external resources (e.g., computer processing power and information representations). In this manner, external tools become coupled with an individual’s cognitive system and can effectively extend it.
When a clinician uses a CDSS to perform complex decision-making, the clinician and the CDSS constitute a joint cognitive system. In this joint system, cognitive activities result from a coupling between the clinician’s cognitive processes and the CDSS’s external representations of information [41]. Consequently, the clinician and the CDSS share the information processing required to perform the task. The effectiveness of such complex cognitive work depends on the characteristics of both the user and the CDSS, as well as the strength of the coupling between them. Coupling can be weak or strong; when coupling is weak, external aids contribute little to actual information processing [40].

3. Review Methodology

We conducted a literature search to identify research papers describing the development, application, evaluation, or proposed frameworks related to XAI in the context of CDSS. To capture both foundational and recent developments relevant to the field’s evolution, we considered publications from 2010 through December 2024.
We used a set of relevant keywords indicating research at the intersection of XAI and CDSS (summarized in Table 1). We performed searches across multiple databases, including PubMed, Scopus, Web of Science, Google Scholar, the ACM Digital Library, and IEEE Xplore, to ensure comprehensive coverage across the biomedical, computer science, and human–computer interaction literature. Search strategies combined various XAI and CDSS-related terms (e.g., “Explainable AI” AND “Clinical Decision Support”).
Inclusion criteria comprised peer-reviewed publications (journal articles or conference proceedings) and highly cited preprints focused on XAI within a healthcare CDSS context. Studies were excluded if they (a) did not primarily focus on the intersection of XAI and CDSS in healthcare, (b) dealt solely with topics such as public health surveillance, genomics or administrative data analysis without a direct CDSS application, or institutional policy guidelines, or (c) were not available in English.
After gathering the search results, we conducted a thematic synthesis of the literature following a structured screening process, as illustrated in the PRISMA diagram in Figure 2. Following the removal of duplicates, two authors independently screened the titles and abstracts against the inclusion and exclusion criteria. Any disagreements were resolved through discussion with a third author to reach a consensus. The full texts of all potentially relevant articles were then retrieved and evaluated by two authors to determine final inclusion. We also manually scanned the reference lists of key articles and systematic reviews found in the search to capture any further relevant publications.
Studies meeting the inclusion criteria and directly addressing core themes such as XAI application in CDSS, user-centered design considerations, evaluation methodologies, clinician trust, or interaction patterns were selected for review. This process resulted in a final set of 30 core studies (comprising 13 foundational works published before 2021 and 17 recent publications from 2021 onward). These studies form the basis of our literature synthesis (Section 4), gap analysis (Section 4.1), and case study evaluation (Section 6).

4. Literature Review: State of XAI in CDSS

Based on the comprehensive search described in Section 3 (covering 2010–2024), this section synthesizes key findings from the literature on the application, evaluation, and challenges of XAI in CDSS. Early foundational studies explored a wide range of interpretability techniques, including visualizing ontology-based preferences for decision support [42], using visual case-based reasoning in oncology [38], developing inherently interpretable models like RETAIN for electronic health records [43], explaining inferences in Bayesian networks [44], deriving rule-based explanations from predictive models [45], employing mimic learning (training simpler models to replicate complex model behavior) [46], and applying post hoc methods such as LIME and LRP for specific use cases [47,48,49]. A key limitation in much of this foundational work was the insufficient rigorous assessment of how useful or effective these explanations were in actual clinical workflows, revealing an early evaluation gap.
As summarized by the PRISMA diagram (Figure 2), our search yielded 1032 records. After removing 274 duplicates, 758 titles and abstracts were screened, and 56 potentially relevant articles underwent full-text review. Applying the inclusion/exclusion criteria resulted in a final set of 30 core studies for our analysis. To provide a clear narrative of the field’s evolution, we categorized these into 13 foundational works (published before 2021) and 17 recent publications (2021–2024). While the foundational studies were essential for identifying the historical context and persistent challenges in the field, our thematic synthesis focuses primarily on the 17 recent studies to ensure our analysis of key themes reflects the current state of the art in XAI for CDSS. Insights from the foundational works are integrated where they provide essential background or contrast. The key themes identified in the recent literature are summarized in Table 2.

4.1. Addressing Gaps in User-Centered XAI-CDSS Evaluation

The integration of AI into clinical practice necessitates a shift from purely technology-driven developments to more human-centered approaches. While AI models have shown increasing technical prowess [50], their effective adoption hinges on alignment with clinical workflows, user needs, and the complex realities of healthcare environments. The ‘black-box’ nature of many AI systems remains a significant hurdle, limiting transparency and hindering trust among clinicians who require understandable justifications for AI-driven recommendations, especially in high-stakes decisions [51]. While early XAI efforts explored diverse interpretability techniques, as noted previously, a persistent challenge has been ensuring these solutions effectively meet user needs and integrate into clinical practice.
Recent systematic reviews reinforce the persistence of these user-centered challenges. For example, a 2024 review by Ayorinde et al. [51] found widespread reports of healthcare professionals struggling to understand AI outputs and rationale due to the AI’s opacity. This lack of understanding was explicitly linked to diminished trust and acceptance of AI, increased difficulty integrating AI tools into existing clinical workflows, and confusion over specific AI alerts [51]. Such empirically grounded challenges underscore the critical importance of developing a thorough understanding of the clinical context and user needs early in the development process, as well as of designing systems that integrate seamlessly into workflows and clearly communicate AI outputs.
A central challenge in this domain is navigating the well-documented trade-off between model accuracy and inherent interpretability [6]. Highly complex, opaque models (e.g., deep neural networks) may offer superior predictive performance, while simpler, inherently transparent models (e.g., rule-based systems) are easier for clinicians to understand but may not capture intricate patterns in the data. The debate over how to balance this trade-off is ongoing and context-dependent [52,53]. Choosing a simpler, interpretable model might be preferable in high-stakes scenarios where the cost of an unexplainable error is catastrophic, aligning with arguments to avoid ‘black box’ models altogether in medicine [6]. Conversely, a post hoc explanation for a more accurate black-box model might be sufficient for lower-risk tasks. This decision is not merely technical; it is a clinical and ethical one. A user-centered approach, as proposed in our framework, is essential for navigating this balance as it forces developers to first define the specific clinical needs, user requirements, and acceptable levels of risk, which in turn determines the most appropriate modeling and explanation strategy.
Further reviews focusing on XAI in CDSS (e.g., Kim et al. [54] and Aziz et al. [55]) provide broad overviews of applied methods and clinical domains, but they consistently critique existing evaluation practices for lacking robustness and user-centered perspectives. Aziz et al. [55], analyzing studies up to 2024, identified gaps including the need for better healthcare data handling, more comprehensive XAI evaluation methods tailored to usability, and stronger interdisciplinary collaboration to bridge the technical–clinical divide. Echoing the call for more user-centered processes, Panigutti et al. [56] explicitly advocate for co-design and human-centered development, highlighting the importance of involving clinicians early and iteratively to avoid building systems misaligned with clinical needs.
Empirical studies further illustrate these gaps. Turri et al. [57] found that achieving meaningful transparency in a deployed AI system required addressing nuanced user needs and practical limitations far beyond implementing a basic XAI technique. This finding reinforces the necessity of deep contextual understanding and thorough user-need analysis. Similarly, Micocci et al. [58] highlighted the danger of clinicians passively adhering to incorrect AI recommendations, demonstrating a critical need for systems and evaluations that encourage appropriate reliance. The ongoing debate about explainability trade-offs [52,53], building on earlier work [43,44,45,46], underscores that selecting suitable XAI methods requires careful consideration of the context, balancing interpretability needs against performance constraints and workflow pressures.
Overall, these consistent findings point toward systemic challenges and a disconnect between technical XAI development and practical clinical utility—often stemming from insufficient user-centered evaluation and difficulty achieving truly meaningful transparency. This indicates a clear need for development processes that prioritize end-user needs and incorporate evaluation methods aligned with real clinical practice.

4.2. Understanding Clinician Trust, Acceptance, and Interaction

Trust is a cornerstone of AI adoption in healthcare, yet it is a complex construct to define and measure [11]. Rosenbacke et al. [59] conducted a systematic review on XAI’s impact on clinician trust and found considerable heterogeneity in how trust is assessed across studies. Most evaluations focus on cognitive facets of trust (e.g., perceptions of reliability) rather than affective facets, despite the latter’s influence on technology acceptance. The review found XAI’s impact on trust to be nuanced: clear and relevant explanations can increase trust, but poorly designed or irrelevant explanations can decrease trust or have no effect. Moreover, fostering trust is not an unqualified good: excessive or unfounded trust (over-trust) in AI is risky [59]. Methods to evaluate trust in these studies ranged from self-reported surveys [60,61] to behavioral measures like the Weight-of-Advice (WoA) technique [61] and richer qualitative analyses [62]. Notably, Laxar et al. [61] discovered a disconnect between low self-reported trust and high behavioral reliance (WoA) on an AI-based triage support tool, suggesting that clinicians’ actual reliance on AI may not directly align with what they say about trust.
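For readers unfamiliar with the measure, the sketch below shows the Weight-of-Advice calculation as it is commonly defined in the judge-advisor literature: the fraction of the distance from the clinician's initial estimate to the AI's advice that the final estimate actually moves. The exact operationalization used in the cited triage study may differ; this is an assumption-laden illustration, not that study's code.

```python
# Weight-of-Advice (WoA) reliance measure, as commonly defined:
#   WoA = (final - initial) / (advice - initial)
# ~0 means the advice was ignored; ~1 means it was fully adopted.
from typing import Optional

def weight_of_advice(initial: float, advice: float, final: float) -> Optional[float]:
    """Return WoA for one decision, or None when the advice equals the initial estimate."""
    if advice == initial:
        return None  # undefined: there is no room to move toward the advice
    return (final - initial) / (advice - initial)

# Example: initial triage score 4, AI suggests 8, clinician settles on 7 -> WoA = 0.75.
print(weight_of_advice(initial=4, advice=8, final=7))
```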
Clinician trust is modulated by various factors. While some earlier work tried to link specific explanation types to perceived trustworthiness (e.g., rule-based vs. example-based explanations) [44], recent studies confirm that the relationship is complex [59]. The type of explanation provided matters significantly: for instance, Gomez et al. [60] found that in their telehealth application, example-based explanations garnered more trust from clinicians than feature-importance explanations, indicating the importance of selecting explanation modalities that align with clinician reasoning processes. User expertise also plays a role: more knowledgeable clinicians are often better able to calibrate their trust in an AI system appropriately [58,62]. Critically, a baseline level of perceived AI reliability and evidence of rigorous external validation are often prerequisites for clinicians to trust an AI tool at all [62]. Furthermore, the effectiveness of the interface in presenting explanations can significantly impact how explanations are received and used [60,62]; a clunky or confusing interface may undermine trust even if the underlying explanation is sound.
Interaction patterns between clinicians and AI are often complex and do not boil down to simple accept/reject decisions. Sivaraman et al. [62] identified a prevalent interaction pattern they termed ‘negotiation’, where clinicians selectively engage with, modify, or ignore AI recommendations. This nuance is missed by binary metrics that only record whether the AI’s advice was followed. Such findings underscore the need to understand how clinicians actually integrate AI advice into their workflow and to develop evaluation metrics capable of capturing these nuanced behaviors. The field grapples with dual risks: on one hand, passive over-reliance on AI (taking AI suggestions without sufficient scrutiny) [58], and on the other, under-reliance or excessive skepticism of AI (dismissal of useful AI input) [60]. This tension highlights that the goal of system design and evaluation should be appropriate trust calibration [59], empowering clinicians to leverage AI effectively while maintaining their critical judgment. The evolution toward more rigorous, multifaceted trust assessments in recent work [53,63,64,65] stands in contrast to many earlier XAI studies [11,45,46,66,67], which often demonstrated technical feasibility and interpretability but did not empirically evaluate user trust or integration into real clinical workflows.

4.3. Advancements in XAI Evaluation Methodologies

Recent research has begun to critique traditional evaluation methods for AI and XAI that focus too narrowly on performance metrics or simplistic user feedback. Morrison et al. [64], for example, note that metrics like accuracy or brief usability surveys fail to capture real-world interaction nuances or the true effectiveness of explanations. This critique builds on limitations that were often implicit in early XAI studies [38,45,46,66,68], where evaluations prioritized technical feasibility or proof-of-concept over robust behavioral assessment. Even when interactive visual explanation tools were introduced (e.g., RetainVis [68]), these highlighted the need for more dynamic user evaluations beyond static metrics. There is now a growing call for more robust, behaviorally grounded, and human-centered evaluation methodologies [53]. Systematic reviews such as Aziz et al. [55] explicitly urge the development of comprehensive XAI evaluation strategies that better assess clinical usability and impact.
Frameworks and tools for richer behavioral evaluation are emerging. Cabrera et al.’s Zeno platform [69] provides an interactive framework to probe AI systems for systematic failures and biases by examining performance on specific data subgroups, directly addressing the need for rigorous stress-testing tools. To tackle the challenge of scaling up human evaluations, Morrison et al. [64] proposed ‘Eye into AI,’ a game with a purpose (GWAP) that collects human judgments on the interpretability of AI explanations and human–AI agreement at scale. This approach offers an innovative way to perform large-scale, user-centered evaluation by engaging end-users (or proxy participants) in a structured game setting.
Increasingly, evaluations are focusing on the explanation itself as a first-class object of study. Building on earlier efforts that compared different model outputs [70] or validated the intuitiveness of visual explanations (like those from LRP) [48,49], recent studies have begun to experimentally test the impact of different explanation strategies on user performance and reliance. For example, Morrison et al. [64] varied explanation types and measured effects on users’ decision outcomes and confidence. Cabrera et al. [69] introduced the concept of ‘behavior descriptions’ as a form of explanation and evaluated how these influence user understanding. Such research informs the selection of effective XAI methods and the design of explanation interfaces. Moreover, the historical divergence between using inherently interpretable models [43] versus applying post hoc explanations to black-box models [45,46,66] necessitates evaluation approaches that can assess both a model’s intrinsic interpretability and the fidelity of any additional explanations.
The development of tools (e.g., Zeno) and methods (e.g., GWAPs) signals a shift towards more meaningful assessment of human–AI interaction. While early explanation techniques (e.g., LRP [48,49]) provided intuitive outputs, recognition of the need to validate those explanations’ utility underscores the challenge of ensuring explanations are both faithful to the model and relevant to the user’s task. This is a core concern for modern, comprehensive evaluation methodologies. The focus is gradually shifting to evaluating the explanation as an active component of the system design, which requires assessment criteria distinct from the underlying model’s predictive accuracy.

4.4. Frameworks for Responsible and Trustworthy XAI-CDSS

Given the complexity of deploying AI in healthcare, there is a recognized need for overarching frameworks to guide development toward systems that are reliable, fair, transparent, and clinically aligned. Recent proposals have started to fill this gap. For instance, Sáez et al. [71] introduced a ‘Resilient AI Framework’ focused on robustness in AI for health, and Nasarian et al. [72] proposed a ‘Responsible Clinician–AI Collaboration Framework.’ The latter, derived from a systematic review, emphasizes a structured interpretability process (covering pre-model, model, and post-model stages), as well as quality control, clear communication, and implementation roadmaps aimed at enhancing trust and collaboration between clinicians and AI.
These emerging frameworks share common ground with the approach proposed in this paper, particularly in their overarching goals of fostering trustworthiness through user focus, rigorous evaluation, and structured processes. However, they differ in their primary methodological focus and contribution. For instance, the framework by Sáez et al. [71] prioritizes system resilience, outlining principles to ensure AI is robust against real-world data challenges as a pathway to trustworthiness. The framework by Nasarian et al. [72], derived from a systematic review, focuses on an interpretability workflow, detailing a three-level process (pre-model, model, and post-model) to guide responsible clinician–AI collaboration.
The framework we propose in this paper offers a distinct methodological contribution by structuring the development process as a user-centered evaluative lifecycle. Its novelty lies not in the individual components (user needs analysis and evaluation are known principles) but in its integration of these into three distinct, iterative phases that place continuous, context-aware user evaluation at the center of the entire development process. Unlike frameworks that focus on system properties (like resilience) or technical workflows (like interpretability), our framework provides a process-oriented methodology specifically for ensuring the final XAI-CDSS is demonstrably usable, trusted, and clinically integrated, with a feedback loop designed to drive iterative refinement based on empirical user-centered evidence.
Related concepts such as co-design and participatory frameworks for AI development [56], new behavioral evaluation tools (e.g., Zeno [69]), and emerging ethical and governance principles for XAI [11] also enrich this evolving landscape. The proliferation of such frameworks and best-practice guidelines indicates that the field is moving beyond ad hoc applications of XAI toward more principled, systematic development approaches. This shift is driven by the need for holistic strategies that address technical performance, ethical considerations (like fairness and accountability), and effective user collaboration simultaneously. Our proposed framework aims to provide a clear, process-oriented contribution to this shift, focusing on achieving trustworthy AI in healthcare through rigorous user-centered evaluation and design.
Table 2. Key themes in recent XAI for CDSS literature (2021–2024).
Theme | Key Findings/Trends in Recent Literature | Representative Citations
1. User-Centeredness and Contextual Needs Analysis | Persistent gap between technical XAI and clinical usability; lack of understanding/transparency hinders adoption; need for deep contextual understanding, workflow integration, and early user involvement (co-design); meaningful transparency beyond basic XAI required. | Ayorinde et al. [51], Turri et al. [57], Panigutti et al. [56], Nasarian et al. [72], Aziz et al. [55], Amann et al. [52], Pierce et al. [53]
2. Clinician Trust, Acceptance, and Interaction | Trust is complex, multi-faceted (cognitive/affective), and inconsistently measured; XAI impact on trust is nuanced (depends on quality, type, and context); disconnect between stated trust and behavioral reliance (WoA); nuanced interaction patterns (“negotiation”); risks of over-reliance and under-reliance; trust calibration is the goal. | Rosenbacke et al. [59], Sivaraman et al. [62], Laxar et al. [61], Gomez et al. [60], Micocci et al. [58]
3. Evaluation Methodologies | Critique of traditional metrics (accuracy and simple surveys); need for robust, behavioral, and user-centered evaluation; emergence of behavioral testing frameworks (e.g., Zeno); novel methods for scalable human assessment (e.g., GWAPs); increased focus on evaluating the explanation itself (strategies, types, and formats). | Cabrera et al. [69], Morrison et al. [64], Rosenbacke et al. [59], Aziz et al. [55], Sivaraman et al. [62], Laxar et al. [61]
4. Frameworks for Responsible and Trustworthy AI | Proliferation of frameworks guiding development (resilience and responsible collaboration); shared emphasis on user needs, trust, evaluation, and ethics; move towards principled, structured approaches beyond ad hoc XAI application. | Sáez et al. [71], Nasarian et al. [72], Panigutti et al. [56], Cabrera et al. [69]

4.5. The CDS-Clinician Gap

Integrating CDSS into clinical workflows has the potential to improve decision-making by leveraging data-driven insights. However, a persistent gap exists between clinicians and CDSS, undermining the adoption and effectiveness of these systems. This gap arises from three critical factors: (1) clinicians’ needs and abilities, (2) interface design, and (3) the evaluation of XAI methods. Addressing these factors is essential to create systems that are not only technically robust but also accessible and trustworthy for end-users.
  • Clinicians’ needs and abilities: Our review (Section 4) indicates that existing XAI efforts in CDSS rarely select explanation methodologies based on the characteristics of end-users. This is a missed opportunity; different XAI methods could be tailored to different users’ reasoning processes and preferences. The main goal of XAI is to explain the model to users, and users will only trust the model if they understand how its outputs are produced. However, studies show that many XAI methods are designed according to developers’ intuition of what constitutes a ‘good’ explanation, rather than actual insights into users’ needs [73]. In this regard, it is useful to consider how explanations are defined in fields like philosophy and psychology: explanations are often viewed as a conversation or interaction intended to transfer knowledge, implying that the explainer must leverage the receiver’s existing understanding to enhance it. Different individuals reason differently and perceive information in varied ways. Therefore, accommodating the user’s mode of reasoning is crucial in designing useful, effective, and practical XAI. Not considering end-users can result in systems that clinicians do not want to use; ignoring users is a major design flaw. Designing an explanation mechanism in isolation from the decisions it is meant to inform will not lead to effective support. Clinical decision-making is highly information-intensive and, especially under complex conditions, involves complex human cognition. As discussed in the background, distributed cognition theory posits that cognitive activities involve both the mind and the external environment. This means that to design effective explanations, the reasoning processes of users (and how they use external aids) must be taken into account.
  • Interface Design: A gap often exists between how information is represented by a CDSS and how humans conceptualize that information. Many CDSS interfaces fail to address this mismatch, leading to ineffective communication of information and a poor user experience. Most current interfaces rely on static representations of data (e.g., tables or fixed graphs), which force users to mentally bridge gaps in the presented information. This weak coupling between the system and user increases cognitive load, making it harder for clinicians to derive meaningful insights. Dynamic and interactive interfaces could help by allowing users to explore data in ways that align with their mental models and clinical tasks. However, such capabilities are largely underdeveloped in today’s CDSS. For example, a clinician trying to investigate the underlying drivers of a cluster of patients may struggle if the system only provides static visualizations that cannot be filtered or tailored, or if the clinician cannot incorporate additional contextual data. The lack of interactivity limits the clinician’s ability to iteratively refine their understanding or adapt the system’s outputs to their specific needs. The concept of coupling, that is, the degree of interconnection between the user and the interface, is particularly relevant here. Weak coupling, characterized by static and disconnected representations, forces clinicians to infer relationships or trends manually, increasing the likelihood of errors. Strong coupling, on the other hand, involves dynamic, reciprocal interactions that allow users to engage with the system more intuitively and effectively. Despite its importance, the idea of strengthening this user–interface coupling is rarely discussed or prioritized in current XAI–CDSS design. Additionally, the visual design of many CDSS interfaces fails to balance clarity and complexity. Overloaded interfaces with excessive information or poorly designed visualizations can overwhelm users, whereas overly simplified displays risk omitting critical details. These interface shortcomings further exacerbate the clinician–CDSS gap, reducing user trust and hindering adoption.
  • Evaluation of XAI Methods: The evaluation of XAI methods within CDSS is another underdeveloped area. In many cases, it remains unknown to what degree the provided AI explanations are understandable or useful to end-users. There is a clear need to evaluate the interpretability and explainability of XAI methods to ensure their usability and effectiveness. Despite a growing body of XAI research, relatively few studies have focused on evaluating these explanation methods in practice or assessing their influence on decision-making. In our survey of the literature, only a handful of papers included formal user evaluations of their XAI components, and just one study explicitly investigated whether providing explanations actually helped users trust the system or use the model’s output in their decisions [68]. This highlights a pressing need for systematic evaluation to validate, compare, and quantify different explanation approaches in terms of user comprehension and decision impact [11]. Without such evaluation, developers may not know which explanations truly aid clinicians or how to improve them.
The CDS–clinician gap arises from the interplay of these interconnected factors: misaligned XAI methodologies (stemming from poorly understood user needs), ineffective communication through suboptimal interfaces, and insufficient or inappropriate evaluation practices. Figure 3 illustrates how these factors influence one another: ignoring user needs often leads to poor interface design; a poor interface hinders effective evaluation and user feedback; and weaknesses in evaluation mean that misalignments in methods or interfaces go uncorrected. All three factors contribute directly to the overall gap, undermining clinician trust and system adoption. These shortcomings not only impede the uptake of CDSS but also limit their effectiveness in supporting clinical decision-making. Understanding and addressing these interrelated issues are critical for developing a CDSS that clinicians can trust, readily use, and seamlessly integrate into their workflows.

5. Proposed Framework

CDSSs are intended to assist healthcare providers in making well-informed decisions, thereby improving patient outcomes. Integrating XAI into these systems holds significant promise—particularly for addressing challenges such as the black-box nature of ML models and issues of user trust and interpretability. However, the adoption of XAI in CDSS has been hampered by several barriers, including insufficient alignment with clinicians’ cognitive processes, inadequate interactive visualization techniques, and a lack of systematic evaluation frameworks. Most XAI methods in current CDSS have been developed with little specific consideration of human–computer interaction principles. For instance, Liang and Sedig note that interaction design significantly influences a user’s ability to perform complex cognitive tasks with visualization tools [74]. By neglecting interactive elements, many existing XAI methods risk imposing additional cognitive burdens on users, such as requiring them to interpret static, poorly contextualized explanations.
To bridge this gap, we introduce a comprehensive framework that advocates user involvement throughout the entire lifecycle of XAI algorithm development, interface design, and evaluation. This framework is structured into three distinct phases for clarity, as illustrated in Figure 4.
Phase 1: User-Centered XAI Method Selection
To ensure active user involvement early in development, it is imperative to engage users, clinicians, and other stakeholders from the very beginning, particularly in selecting appropriate XAI methods. This effort requires collaboration within an interdisciplinary team (clinicians, cognitive scientists, computer scientists, etc.) to guarantee that chosen explanation techniques are both clinically relevant and technically feasible. Such collaboration helps bridge the gap between clinicians’ cognitive workflows and the inner workings of XAI models, enabling the creation of systems that are interpretable, actionable, and trustworthy.
Selecting the right XAI method is pivotal and should align the method’s explanations with the users’ intended reasoning objectives. This requires an approach focused on user needs and how humans naturally think and reason about decisions. As outlined in Section 2, different XAI methods produce explanations grounded in different rationales, each with unique strengths and limitations. Identifying the most suitable and acceptable explanation approach for the intended users is critical to an effective XAI-CDSS. For this, we recommend a reasoning-driven evaluation of XAI approaches, similar to the framework proposed by Wang et al. [57], which emphasizes human cognitive principles, to systematically determine which XAI methods best align with users’ reasoning goals. This framework emphasizes understanding how users seek explanations and reason through specific tasks, enabling developers to identify explanation types that best satisfy their cognitive needs.
To gather the necessary information, we engage potential users (e.g., clinicians and possibly patients for patient-facing tools) through interviews or focus groups, asking targeted questions such as ‘What kind of explanation would you expect from a colleague when they provide information to support your decision-making?’ The insights from these discussions establish a clear link between users’ natural reasoning processes and the explanatory strategies of candidate XAI methods, ensuring that the chosen explanations are both relevant and useful. For instance, if clinicians indicate they often reason by comparing a new patient to similar past cases, an example-based method like Case-Based Reasoning (CBR) would be a strong candidate to align with this analogical thinking [38].
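To illustrate how elicited reasoning patterns might be linked to candidate explanation strategies, the sketch below encodes such a mapping as a simple lookup table. The pattern labels and pairings are hypothetical examples, not a validated taxonomy; in practice this mapping would be co-developed and revised with the clinical team during Phase 1.

```python
# Hypothetical mapping from elicited reasoning patterns to candidate XAI methods.
# Labels and pairings are illustrative assumptions for this sketch only.
CANDIDATE_XAI_METHODS = {
    "compares new patient to similar past cases": ["case-based reasoning", "prototypes/criticisms"],
    "asks which findings drove the assessment":   ["SHAP values", "feature importance"],
    "asks what would change the recommendation":  ["counterfactual explanations"],
    "wants the overall decision logic":           ["rule extraction", "global surrogate model"],
}

def suggest_methods(elicited_pattern: str) -> list:
    """Return candidate explanation strategies for a reasoning pattern elicited from users."""
    return CANDIDATE_XAI_METHODS.get(elicited_pattern, ["needs further elicitation"])

print(suggest_methods("compares new patient to similar past cases"))
```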
By establishing this connection between users’ reasoning and the XAI mechanism, we can select explanation approaches that cater to users’ cognitive abilities and preferences, delivering insights in a way that resonates with the intended audience. A key part of Phase 1 is verifying that the chosen XAI methods indeed meet the needs of specific user groups, thereby justifying their applicability. Insights gained in this phase guide the subsequent stages of XAI development, ensuring the system is tailored to end-user requirements and expectations.
To further enhance the design and implementation of XAI-CDSS within healthcare, we integrate the following guiding principles into Phase 1:
  • User-Centered Design Approach: Engaging end-users (healthcare professionals such as doctors and nurses, as well as patients and other stakeholders when appropriate) is a cornerstone of the XAI development process. Early and continuous involvement of these users ensures the system aligns with their diverse needs, expectations, and cognitive abilities. Techniques like participatory design, structured interviews, and focus groups help reveal how users interact with CDSS and what they require from explanations. Healthcare providers vary in their expertise and familiarity with AI models. For example, technically proficient clinicians may prefer detailed, model-specific explanations (such as feature importance graphs or counterfactual examples), while others might find simplified visualizations like heatmaps or plain-language summaries more helpful. Liang and Sedig emphasize that interactive tools tailored to users’ cognitive processes can significantly improve reasoning and decision-making performance [74]. Patients, who typically lack technical expertise, might benefit from explanations that focus on the clinical implications of the system’s outputs, helping them understand how recommendations impact their care. By involving end-users from the beginning, developers can create XAI systems that foster trust, are usable, and fit into real-world workflows. This participatory approach boosts decision support, ensures explanations resonate with users, and ultimately improves adoption and patient care outcomes.
  • Customizable Explanation Level: Offering explanations at varying levels of granularity is critical to address the diverse expertise and preferences of users. Non-experts (such as patients or clinicians not deeply familiar with AI) may benefit from simplified explanations that highlight key clinical takeaways, whereas experienced medical professionals often require more detailed insights to support their reasoning. For example, a CDSS designed for radiologists might include heatmaps on medical images to highlight regions of interest, directly tying the AI model’s output to specific visual evidence. In contrast, primary care physicians might prefer plain-language summaries emphasizing the most relevant factors influencing a model’s prediction (for instance, a risk score based on patient history). Liang and Sedig underscore the importance of adapting information to users’ cognitive styles in order to improve comprehension and reduce cognitive load during decision-making tasks [74]. Gadanidis et al. [75] similarly argue that enabling users to customize the level of detail in explanations not only enhances usability but also empowers them to engage with the system at a depth aligned with their individual needs and expertise, making explanations more actionable and meaningful. By tailoring the depth and complexity of explanations to each user, XAI systems can better support decision processes, ensure interpretability, and foster trust in AI-driven recommendations. This adaptability also makes the system more inclusive, effectively serving a broader range of stakeholders.
  • User Profiling: Creating detailed user profiles based on factors like experience level, domain specialty, and preferred interaction style is vital for personalizing XAI explanations. Understanding users’ backgrounds and their familiarity with AI enables developers to ensure explanations align with their needs and cognitive preferences. For instance, novice users might benefit from intuitive, step-by-step explanations that draw attention to the key elements of a decision, whereas advanced users might prefer complex visualizations or interactive tools that allow deeper exploration and customization of information. Tailoring explanations to fit users’ mental models (their internal understanding of how a process works) is particularly important. For example, a clinician with a diagnostic reasoning mindset might prefer explanations structured around causal relationships and decision pathways, enabling them to map the CDSS’s reasoning to their own diagnostic process. In contrast, a clinician focused on patient monitoring might prioritize trend-based visualizations or alerts tied to threshold values. By considering such user profile factors, the system can present explanations in a form that each user finds intuitive and valuable.
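As a concrete, hypothetical sketch of the user-profiling idea above, the snippet below encodes a profile as a small data structure and uses it to pick a default explanation configuration. The profile fields, option names, and selection rules are illustrative assumptions rather than components of any reviewed system; they only show how Phase 1 findings could be made operational.

```python
# Hypothetical user profile driving a default explanation configuration.
from dataclasses import dataclass

@dataclass
class UserProfile:
    role: str            # e.g., "radiologist", "primary_care", "patient"
    ai_familiarity: str  # "low", "medium", "high"
    reasoning_style: str # e.g., "diagnostic", "monitoring"

def default_explanation_config(profile: UserProfile) -> dict:
    """Pick a starting explanation presentation; users can still override it."""
    if profile.role == "patient" or profile.ai_familiarity == "low":
        return {"format": "plain-language summary", "detail": "low"}
    if profile.reasoning_style == "monitoring":
        return {"format": "trend plots with threshold alerts", "detail": "medium"}
    return {"format": "feature-importance chart with counterfactuals", "detail": "high"}

print(default_explanation_config(UserProfile("primary_care", "low", "diagnostic")))
```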
Phase 2: User-Centered Interface Design
To achieve user-oriented interfaces, we draw on a robust body of HCI research on interaction design, context-awareness, and software learnability. As Xie and Gao [76] note, a context-aware system should provide users with a clear understanding of the actions being performed by the system and help predict what it will do next, while software learnability (tied to ease of use) ensures that users can quickly adapt to and effectively utilize the system. In CDSS, these principles are especially critical for ensuring that the integration of explanations supports, rather than hinders, clinical decision-making.
Addressing gaps in current systems requires that interface design foster a joint cognitive system between the XAI and the human user. This means prioritizing cognitive elements such as perception, reasoning, and insight—each integral to effective human–AI interaction [77]. Parsons and Sedig [41] emphasize designing with careful attention to the specific context and activity of use. Developers must ask, ‘In what tasks and scenarios will clinicians engage with this system?’ Determining this ensures that the interface design is well aligned with real-world clinical workflows.
Visual representation properties of the CDSS interface should then be co-designed with users to ensure clarity and effectiveness. These properties include appearance, complexity, dynamism, fidelity, fragmentation, interiority, scope, and type, as outlined by Parsons and Sedig [78]. For example, a clinician who needs to make time-sensitive decisions might benefit from low-complexity, high-fidelity visuals (showing essential information clearly), while someone conducting long-term patient analyses might prefer more dynamic, multi-dimensional representations that can show trends over time. A practical way to conduct such co-design is through participatory workshops where clinicians can sketch low-fidelity wireframes of how explanations should be integrated into their existing workflow, ensuring the final design is both intuitive and clinically useful [56].
Furthermore, how users interact with the explanations significantly influences their understanding and trust. If the system’s representations conflict with the user’s mental model (their internal expectation of how a process or decision unfolds), it can lead to confusion and mistrust [76]. To mitigate this, interfaces must bridge the gap between the system’s logic and user expectations. For instance, an interface for clinicians who use diagnostic reasoning might incorporate causal pathways in its explanations, whereas an interface for monitoring-focused clinicians might emphasize trends and threshold-based alerts. Therefore, it is essential to focus on interaction design—not just static visual representations.
Interactivity in the interface can be considered at two levels: micro and macro [79]. Micro-level interactivity involves individual user actions, such as zooming into specific data points, toggling different layers of explanation, or highlighting key variables to explore particular predictions. Macro-level interactivity focuses on the overarching strategies for combining these individual interactions into a cohesive workflow. To guide this design process, developers should ask questions like ‘How does interactivity facilitate the clinician’s decision-making?’, ‘Which actions should the CDSS support for the user?’, and ‘Which parameters or properties should be adjustable by the clinician?’ Tools like the interaction catalog by Sedig et al. [40] can assist in systematically designing interactive features within visual representations.
By incorporating principles of context-aware design, software learnability, and thoughtful interaction design, XAI systems can foster more effective communication and engagement. This approach produces interfaces that genuinely respond to user needs, smoothly support decision-making workflows, and help clinicians make sense of AI explanations. Below, we outline specific guidelines for designing interactive visualization components that embody these principles:
  • Interactivity: Facilitate immediate exploration of the model’s behavior using interactive visualization tools. Users should be able to perform ‘what-if’ analyses by adjusting input variables and observing how the model’s predictions change. For example, a clinician could modify patient parameters (like age or lab results) to see how the recommendation from the model would differ. Such hands-on capabilities give users valuable insight into the model’s reasoning and enable clinicians to validate or challenge recommendations through real-time experimentation—ultimately fostering greater trust.
  • Multi-modal Explanations: Present information through multiple forms of visual representation (such as heatmaps, line graphs, bar charts, text summaries, etc.) to convey different aspects of the model’s reasoning. Different modalities can cater to different user preferences and cognitive styles. For instance, one clinician might prefer a graphical visualization of feature importance, while another might find a textual explanation more accessible. Providing explanations in various formats ensures that a wider range of users can engage effectively with the XAI system.
  • Interactive Explanations: Allow users to control and customize the explanation outputs. Users might adjust parameters of the explanation (for example, changing the threshold for what constitutes a ‘high’ feature importance) or explore alternative decision pathways suggested by the model. Enabling such user control fosters a sense of ownership and control over the AI-assisted decision process. For example, a clinician could interactively explore why the AI recommended a certain treatment by manipulating elements of a visualization; this deeper engagement can lead to more informed and confident use of the CDSS.
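To make the ‘what-if’ guideline above concrete, the following sketch shows the underlying computation in its simplest form: perturb one input for a single patient and observe how the predicted risk changes. It is illustrative only; the model, feature names, and data are hypothetical stand-ins and do not correspond to any system described in this article.

```python
# Minimal what-if sketch (hypothetical model, features, and data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
feature_names = ["age", "systolic_bp", "lactate", "wbc_count"]

# Synthetic stand-in for historical patient records and outcomes.
X = rng.normal(size=(300, 4))
y = (X[:, 2] - 0.3 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def what_if(baseline, feature, new_value):
    """Return (risk before, risk after) setting `feature` to `new_value`."""
    idx = feature_names.index(feature)
    modified = baseline.copy()
    modified[idx] = new_value
    before = model.predict_proba(baseline.reshape(1, -1))[0, 1]
    after = model.predict_proba(modified.reshape(1, -1))[0, 1]
    return before, after

patient = X[0]
before, after = what_if(patient, "lactate", patient[2] + 1.0)
print(f"Predicted risk: {before:.2f} -> {after:.2f} after raising lactate")
```

In a deployed interface, the same computation would sit behind a slider or input field so that clinicians can experiment directly rather than reading static output.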
Phase 3: Evaluation and Iterative Refinement
Developing a user-centered XAI system requires a robust methodology for evaluating its impact on end-users and outcomes. The evaluation process must be contextualized to the specific environment and tasks in which the XAI system operates, as highlighted by Hundhausen [80]. A critical starting point is understanding the exact clinical task or decision the system is intended to support. Once the task and setting are clear, it is essential to define what ‘effectiveness’ looks like in that scenario. Hundhausen emphasizes that, for any visualization (or explanatory) system, effectiveness must be evaluated in terms of the environment’s and users’ goals [80]. Thus, the initial stage of evaluation involves translating the clinical and user objectives identified in Phases 1 and 2 into measurable criteria.
Despite the existence of comprehensive evaluation guidelines (e.g., those by Freitas et al. [81] for visualization), significant challenges remain in conducting truly user-centered usability assessments for XAI-CDSS. Much of the literature still speaks in subjective terms of ‘good’ or ‘quality’ explanations [73], often without measuring how well those explanations integrate into clinicians’ decision-making processes. To address this gap, well-defined, consensus-driven evaluation criteria must be established—criteria that reflect both user needs and system characteristics. These criteria should be co-developed with clinicians and domain experts to ensure practical relevance and alignment with clinical workflows.
Translating these principles into practice, we propose the following guidelines for evaluating the effectiveness of XAI systems in CDSS:
  • User Satisfaction Surveys: User satisfaction is a key metric for evaluating XAI systems. Conduct surveys and interviews with end-users (clinicians and potentially patients for patient-facing tools) to gather feedback on the clarity, usefulness, and trustworthiness of the AI’s explanations. Regularly collecting this feedback ensures the system evolves in response to user expectations and helps pinpoint areas for improvement. These iterative feedback loops are especially valuable; over successive versions, explanations can be refined to better meet user needs, thereby enhancing usability and adoption.
  • Diagnostic Accuracy with Explanations: Perform comparative assessments to measure clinicians’ diagnostic accuracy or decision outcomes when using the CDSS with versus without the AI explanations. This helps determine whether the presence of explanations positively influences decision-making processes and clinical outcomes. Importantly, it can also verify that adding explanations does not inadvertently degrade the model’s predictive performance or the clinician’s speed and accuracy. By linking explanations to tangible improvements (such as increased diagnostic confidence, better decision accuracy, or reduced errors), this evaluation underscores the practical benefits of integrating XAI into healthcare workflows.
  • Comprehensibility Metrics: Comprehensibility is critical; explanations must be understandable to be actionable. Use quantifiable metrics to assess whether explanations are accessible to their intended audience. For example, apply readability scores like Flesch–Kincaid [62] to textual explanations to ensure they are pitched at an appropriate level (perhaps simpler for patients, more technical for expert users). For visual explanations, one might use measures of complexity or conduct comprehension quizzes. Ensuring that explanations are easily understood by users with varying expertise levels fosters effective communication between the XAI system and its users, promoting trust and adoption. The sketch following this list illustrates such a readability check, together with the uncertainty and fairness checks described below.
  • Uncertainty Quantification: Incorporate evaluation of how well the system communicates uncertainty in its predictions. It is essential to inform users about the reliability or confidence of the AI’s recommendations. Provide metrics or visual cues indicating uncertainty (for instance, confidence intervals or probability estimates) and then assess whether users appropriately factor this uncertainty into their decisions. Highlighting low-confidence predictions or cases where the model is extrapolating beyond the data can prompt users to be cautious or seek additional information. Transparent communication of uncertainty can improve decision-making and help prevent over-reliance on the AI.
  • Bias and Fairness Analysis: Evaluate the system for biases and fairness to ensure equitable outcomes across diverse patient groups. Assess both the training data and model outputs for any bias that could disproportionately affect certain populations (e.g., under-performance on minority groups). Strategies to mitigate identified biases might include diversifying training data or adjusting the algorithm. Regularly auditing the system’s recommendations for fairness and documenting these evaluations helps prevent discriminatory practices and promotes impartial decision support.
  • Continuous Improvement and User Feedback: Integrate continuous feedback mechanisms so the XAI system can adapt over time to changing clinical contexts and user needs. For example, implement a way for clinicians to flag when an explanation was not helpful or when the system made an unexpected suggestion. Such feedback can drive updates: improving the model, refining explanations, or adjusting the interface. This ensures the system remains up-to-date with medical knowledge and evolving user expectations. Regular updates based on real-world feedback not only improve the system’s utility but also demonstrate a commitment to transparency and accountability, further reinforcing user trust.
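The sketch below gives a minimal, self-contained illustration of three of these guidelines: a readability check for textual explanations, a simple low-confidence flag for uncertainty communication, and a per-subgroup discrimination audit. All data, explanation text, subgroup labels, and thresholds are hypothetical, and the textstat and scikit-learn packages are assumed to be available; this is not the evaluation protocol of any study cited here.

```python
# Illustrative evaluation checks only; all values below are invented.
import numpy as np
import textstat
from sklearn.metrics import roc_auc_score

# 1. Comprehensibility: estimate the reading grade level of a textual
#    explanation (lower grade = simpler language, e.g., for patients).
explanation = ("The model predicts a high risk of sepsis because the "
               "patient's lactate level has risen over the last six hours.")
print("Flesch-Kincaid grade:", textstat.flesch_kincaid_grade(explanation))

# 2. Uncertainty communication: flag predictions the interface should mark
#    as low-confidence (here, risk scores near the 0.5 decision boundary).
rng = np.random.default_rng(1)
risk = rng.uniform(size=1000)                     # hypothetical risk scores
low_confidence = (risk > 0.4) & (risk < 0.6)      # hypothetical band
print("Cases to flag as uncertain:", int(low_confidence.sum()))

# 3. Fairness: compare discrimination (AUC) across patient subgroups.
y_true = (risk + rng.normal(scale=0.3, size=1000) > 0.5).astype(int)
subgroup = rng.choice(["group_a", "group_b"], size=1000)
for g in np.unique(subgroup):
    mask = subgroup == g
    print(g, "AUC:", round(roc_auc_score(y_true[mask], risk[mask]), 3))
```

In practice, each of these checks would be run on real explanations and held-out clinical data, and the thresholds and subgroup definitions would be co-developed with clinicians and domain experts as described above.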
This comprehensive framework integrates XAI into CDSS in a way that enhances transparency, usability, and trust across three phases: User-Centered XAI Method Selection, User-Centered Interface Design, and Evaluation and Iterative Refinement. By aligning XAI methods with users’ cognitive processes (Phase 1), we ensure explanations are relevant and actionable for healthcare providers and patients. Phase 2 centers on user-driven interface design, employing principles of interaction design, context-awareness, and learnability to create interfaces that reduce cognitive strain and streamline clinical reasoning. Well-designed interfaces foster a joint cognitive system, bridging technical and user domains and increasing transparency and adoption. Phase 3 introduces rigorous evaluation methodologies—including user satisfaction, decision impact, comprehensibility, uncertainty communication, and fairness—to ensure XAI systems truly serve practical clinical needs and uphold ethical standards. Findings from each phase inform iterative refinement, so each XAI-CDSS is not static but evolves with user and domain feedback. By systematically addressing barriers like model opacity, workflow misalignment, and lack of user-centered evaluation, this framework offers actionable guidance to developers and researchers for bridging the gap between technical AI capabilities and actual clinical utility. The result is a pathway to XAI systems that demonstrably improve clinical decision-making and patient outcomes while fostering appropriate, sustainable trust in AI.

6. Case Study: Applying the User-Centered XAI Framework

To bridge the gap between theoretical XAI principles and effective implementation in clinical practice, we illustrate our three-phase user-centered framework (see Section 5) using a retrospective analysis of a recent XAI-CDSS implementation (2021–2024).
The purpose of this retrospective analysis is not to serve as a formal validation of the framework but rather to illustrate its analytical utility. By applying the framework as a retrospective lens to a well-documented, real-world study, we can demonstrate how its principles can be used to systematically deconstruct an XAI-CDSS implementation, identify specific user-centric gaps, and pinpoint opportunities for improvement. This analysis, therefore, makes the case for the framework’s value in guiding future prospective development and evaluation, showing how it can help anticipate and mitigate common challenges before a system is built. Our analysis is based on published findings and aims to highlight the framework’s value through this illustrative application.

6.1. Case Study Selection and Rationale

We selected the study ‘Ignore, Trust, or Negotiate: Understanding Clinician Acceptance of AI-Based Treatment Recommendations in Health Care’ by Sivaraman et al. (CHI 2023) [62] as our primary case for the following reasons:
  • High Relevance: This study directly investigates clinician interaction, acceptance, and trust—including ‘negotiation’ behaviors—with an interpretable AI-CDSS for sepsis treatment [62].
  • Illustrates Framework Phases: It provides empirical data on context/user needs (Phase 1), interface/explanation design (Phase 2), and a mixed-methods evaluation examining user interactions (Phase 3).
  • Highlights Key Gaps: The research explicitly surfaces unmet user needs, interface design issues, and evaluation shortcomings—validating the need for our structured approach.
  • Detailed Reporting: The article provides clear details on methodology, interface (‘AI Clinician Explorer’), the XAI technique (SHAP), and user feedback—enabling a comprehensive re-analysis through our framework.
To broaden our perspective, we also reference Laxar et al. (2023) [61] (trust and reliance) and Gomez et al. (2024) [60] (explanation type and under-reliance) where relevant.

6.2. Applying the Three-Phase Framework to the Case Study

This section analyzes the Sivaraman et al. (2023) study [62] through the lens of our proposed three-phase user-centered framework.
Phase 1: User-Centered XAI Method Selection
  • Context: The Sivaraman study is set in the ICU for sepsis management—a high-stakes, time-pressured domain [62]. Such environments demand extremely high trustworthiness from any CDSS. Phase 1 of our framework would emphasize deep contextual understanding before beginning design.
  • User Needs Identification: Through think-aloud protocols and interviews, several clinician needs not fully addressed by the AI system were identified:
    - Desire for validation evidence to justify trusting the recommendations.
    - Inclusion of ‘gestalt’/bedside context.
    - Alignment with clinical workflow (discrepancy between AI’s discrete intervals and clinicians’ continuous management).
    - Clearer explanations, especially for non-standard recommendations.
    Transparency of the underlying model was also a concern (echoed in Laxar et al. [61]). In the Sivaraman study, these surfaced after deployment; Phase 1 would advocate for uncovering such needs before design, via ethnography, contextual inquiry, and participatory workshops.
  • XAI Method Selection: The system used SHAP (feature attribution), but the publication does not document clinician involvement in this choice. Our framework would have called for XAI method selection based on user reasoning processes—potentially identifying, for example, the value of example-based explanations if clinicians favored case-based reasoning (as in Gomez et al. [60]). A purely technical or developer-driven selection risks persistent mismatches. A minimal sketch of such an example-based explanation is shown after the Phase 1 summary below.
  • Phase 1: Summary: Many gaps could have been identified by a proactive, user-centered needs analysis and method selection—aligning the explanation approach, data, and interface with clinician reasoning and practice before deployment, not after.
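To indicate what the example-based alternative mentioned above could look like, the sketch below retrieves similar past cases and their outcomes for a new patient. The data and outcome labels are synthetic, and this is not the system studied by Sivaraman et al. [62] or Gomez et al. [60]; it only illustrates the general shape of a case-based explanation, assuming scikit-learn is available.

```python
# Example-based explanation sketch: "patients like this one" retrieval.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(7)
feature_names = ["age", "lactate", "mean_bp", "creatinine"]
past_patients = rng.normal(size=(200, 4))      # synthetic historical cohort
past_outcomes = rng.integers(0, 2, size=200)   # e.g., responded to treatment

index = NearestNeighbors(n_neighbors=3).fit(past_patients)

new_patient = rng.normal(size=(1, 4))
_, neighbor_ids = index.kneighbors(new_patient)

# The "explanation" is a set of comparable past cases and their outcomes,
# which clinicians can judge against their own case-based reasoning.
for i in neighbor_ids[0]:
    print(f"Similar case {i}: outcome={past_outcomes[i]}")
```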
Phase 2: User-Centered Interface Design
  • Interface Design: The ‘AI Clinician Explorer’ included patient trajectory visualization, navigation controls, and an explanation panel for recommendations [62]. This tool is central to Phase 2 of our framework.
  • Explanation Presentation: Three interface conditions were tested: (1) Text-Only, (2) Feature Importance (SHAP bar chart), and (3) Alternative Treatments (lists with historical frequency) [62]. Comparing multiple formats fits Phase 2’s recommendation to iteratively prototype and evaluate different presentation solutions.
  • Design Rationale and Feedback: SHAP-based visualizations improved confidence in the AI, suggesting increased transparency, but ‘Alternative Treatments’ received mixed reactions (some found them confusing or less helpful). Critically, a mismatch remained: clinicians managed sepsis as a continuous process, but the AI provided discrete recommendations. Phase 2 in our framework emphasizes co-design, real-world simulation, and deep integration into clinical workflow, which might have addressed this issue earlier.
  • Phase 2: Summary: The case demonstrates the difficulty of bridging AI logic and clinical workflow. Despite thoughtful design, full alignment with real-world tasks and needs may require more simulation, prototyping, and continuous user feedback, as prescribed by Phase 2.
Phase 3: Evaluation and Iterative Refinement
  • Evaluation Methodology: Sivaraman et al. used a mixed-methods approach: 24 clinicians applied the tool to real cases with both quantitative (concordance and confidence ratings) and qualitative (think-aloud and interviews) outcomes [62]. Coding interaction patterns revealed a ‘negotiation’ mode, neither blind trust nor rejection—capturing nuances a binary approach would miss.
  • Key Findings and Limitations: SHAP explanations increased confidence but not concordance with recommendations. Qualitative insights (negotiation behavior) highlight the value of behavioral evaluation advocated in Phase 3. As with Laxar et al. [61], actual reliance and self-reported trust may diverge—suggesting the importance of multifaceted metrics. A minimal sketch of one such reliance metric (weight of advice) appears after this list.
  • Trust and Acceptance Factors: Trust was driven more by evidence of external validation than interface features or explanation types. Phase 3 advises not only measuring outcomes but probing underlying reasons for acceptance, trust, or skepticism—aiding iterative improvement.
  • Iterative Refinement: Gaps identified (need for validation, holistic context, and workflow integration) would, under our framework, feed back to Phase 1 and Phase 2—closing the design loop. Solutions might include adding evidence summaries or continuous integration into the workflow.
  • Phase 3: Summary: The case shows how mixed-method evaluation uncovers real barriers and opportunities for improvement and highlights the importance of iterative design refinement driven by real-world user feedback.
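As a small illustration of what moving beyond binary concordance can mean in practice, the sketch below computes a weight-of-advice (WoA) style reliance score from hypothetical decision logs; intermediate values are one way to surface negotiation-like adjustment rather than all-or-nothing acceptance. This is a sketch under stated assumptions, not the coding scheme used by Sivaraman et al. [62], and all numbers are invented.

```python
# Weight of Advice (WoA): 0 = advice ignored, 1 = advice fully adopted,
# intermediate values suggest negotiation-like partial adjustment.
import numpy as np

def weight_of_advice(initial, advised, final):
    """WoA = (final - initial) / (advised - initial); undefined when the
    advice equals the clinician's initial judgment."""
    if np.isclose(advised, initial):
        return np.nan
    return (final - initial) / (advised - initial)

# Hypothetical cases: initial fluid plan (mL), AI-recommended dose, final decision.
cases = [(500, 1000, 500),    # advice ignored        -> WoA = 0.0
         (500, 1000, 1000),   # advice fully adopted  -> WoA = 1.0
         (500, 1000, 750)]    # partial adjustment    -> WoA = 0.5

for initial, advised, final in cases:
    print(weight_of_advice(initial, advised, final))
```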
Table 3 links Sivaraman et al.’s study elements to specific framework recommendations.

6.3. Discussion: Illustrating the Framework’s Value

The Sivaraman et al. case (contextualized with Laxar et al. [61] and Gomez et al. [60]) illustrates several key insights:
  • The ‘negotiation’ interaction pattern demonstrates the limitations of binary or technical evaluation metrics and justifies our framework’s Phase 3 call for nuanced, mixed-method, behavioral evaluation.
  • User needs such as system validation, workflow integration, and holistic context—surfaced post hoc—could have shaped better decisions had they been proactively addressed in Phases 1 and 2.
  • Evaluation challenges like the trust–validation paradox [62] reinforce that assessment must be reflective and continuous, not a one-time event.
  • Findings from Laxar et al. (difference between self-reported trust and real reliance) and Gomez et al. (explanation type and reliance) further strengthen the need for user-aligned method selection and interface design—and for complex, multi-modal evaluation.
In sum, this case exemplifies how the three-phase user-centered XAI framework can anticipate, surface, and address the real, nuanced barriers to adoption, usability, and trust in clinical XAI deployment. Each phase targets empirically observed issues—ensuring that technology meets real-world needs, interfaces fit actual clinical workflows, and evaluation produces actionable, iterative progress toward effective, trustworthy AI-CDSS.

7. Limitations and Conclusions

Explainable Artificial Intelligence (XAI) holds immense potential to enhance the interpretability, trustworthiness, and clinical utility of AI-driven CDSS. However, as our literature review (Section 4) highlights, a significant gap persists between the development of XAI techniques and their effective, user-centered implementation in complex healthcare environments. Key challenges consistently emerge around aligning XAI methods with clinician needs and workflows, managing the nuanced dynamics of trust calibration and interaction patterns (such as the ‘negotiation’ behavior), and overcoming the limitations of traditional evaluation approaches that often fail to capture real-world use.
To address these challenges, we have proposed a comprehensive three-phase user-centered framework for guiding the development and evaluation of XAI within CDSS: (1) User-Centered XAI Method Selection, (2) User-Centered Interface Design, and (3) Evaluation and Iterative Refinement. Through the detailed case study analysis in Section 6, we demonstrated the practical utility of this framework. By retrospectively applying its principles to a relevant XAI-CDSS implementation study, we showed how the framework provides a structured lens to identify specific user-centric gaps related to needs assessment, interface effectiveness, and evaluation methodology. Our analysis illustrated how, by following the framework’s guidance, developers can create explanations and interfaces aligned with clinicians’ reasoning and workflows and adopt evaluation methods capable of capturing crucial interaction dynamics—thereby fostering appropriately calibrated trust.
Achieving successful XAI integration in clinical practice requires deep engagement with user context and continuous evaluation throughout the development lifecycle. We reiterate the need for interdisciplinary teams—including clinicians, cognitive scientists, HCI specialists, and computer scientists—to collaborate closely in this process. Such collaboration ensures technical advances are grounded in clinical reality. Future work should focus on prospectively applying and validating the proposed framework across diverse clinical settings. A critical next step is to extend this work beyond its current clinician-centric scope by adapting and validating the framework for the development of patient-facing XAI-CDSS. This will involve exploring the unique explanatory needs of patients to support shared decision-making and patient empowerment, which represents a distinct but equally important challenge. Further research is also needed to develop standardized, yet context-sensitive, metrics for user-centered XAI evaluation and to design adaptive explanation interfaces that dynamically adjust to different user roles and information needs.

Considerations for Data Modality and XAI Complexity

A crucial consideration in applying our user-centered framework is how the underlying data modality of the CDSS influences XAI complexity and user-centered challenges. Image-based systems deserve particular attention, as the cognitive tasks and explanation paradigms for imaging are fundamentally different from those for systems built on structured Electronic Health Record (EHR) data [82].
XAI for medical imaging primarily focuses on spatial attribution, using techniques like saliency maps (e.g., Grad-CAM) to answer the clinician’s question, ‘Where is the model looking?’ [82]. The user’s task is to visually correlate the highlighted region with their anatomical and pathological knowledge. In contrast, XAI for structured EHR data relies on feature attribution (e.g., using SHAP or LIME), answering the question, ‘What factors influenced this prediction?’ [83]. Here, the user’s task is a logical review of contributing factors (e.g., lab values and vital signs) against their understanding of clinical pathways.
These differences create distinct user-centered challenges that must be addressed in each phase of our framework. For image-based XAI, the key challenges include managing visual confirmation bias, interpreting spatial ambiguity (as a heatmap shows *where* but not necessarily *why*), and ensuring the visual explanation is a faithful representation of the model’s logic [82]. For structured data XAI, challenges include managing cognitive overload from long lists of features and assessing the plausibility of abstract feature importance scores, which may be statistically powerful but counter-intuitive to a clinician [83]. The selection of XAI methods (Phase 1), the design of the interface (Phase 2), and the evaluation of user interaction (Phase 3) must, therefore, be tailored to the specific data modality. This distinction is summarized in Table 4.
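To ground the feature-attribution paradigm described above, the sketch below computes per-patient SHAP values for a simple model trained on synthetic, EHR-style tabular data. The features, cohort, and model are hypothetical, and the snippet assumes the shap and scikit-learn packages are available; a saliency-map counterpart for imaging (e.g., Grad-CAM) would instead overlay attribution on pixels and is not shown here.

```python
# Per-patient feature attribution on structured, EHR-style tabular data.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
feature_names = ["age", "lactate", "mean_bp", "creatinine", "heart_rate"]

# Synthetic stand-in for a structured EHR cohort (500 patients x 5 features).
X = rng.normal(size=(500, len(feature_names)))
y = (X[:, 1] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# TreeExplainer yields one additive contribution per feature per patient,
# answering "which factors influenced this prediction, and by how much?"
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
if isinstance(shap_values, list):          # older shap versions return a
    shap_values = shap_values[1]           # per-class list for classifiers

patient = 0
for name, value in sorted(zip(feature_names, shap_values[patient]),
                          key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>12s}: {value:+.3f}")
```

The clinician's task with such output is a logical review of the listed factors against known clinical pathways, which is precisely where the feature-overload and plausibility challenges noted above arise.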
While this study provides a structured framework and demonstrates its application through retrospective analysis, certain limitations must be acknowledged. First, our literature review, though comprehensive and covering many core themes and recent works, may not have captured every relevant study in this rapidly evolving field. Second, the case study is a retrospective, post hoc analysis of a single published work and does not constitute a prospective validation of the framework in a real-world development project. Consequently, the practical impact of the framework is not yet proven, and the evaluation principles discussed in Phase 3 remain conceptual at this stage, as their application has not yet been demonstrated in practice. Third, the framework as presented is primarily clinician-centric. A significant limitation is that we have not explored how it would need to be adapted for patient-facing XAI tools, which would involve substantially different user needs, ethical considerations, and evaluation criteria related to health literacy and shared decision-making. Fourth, our framework relies heavily on end-user involvement, yet clinicians’ behaviors and attitudes toward new technology vary significantly. The challenge of interpreting feedback from a mix of ‘early believers’ and ‘late believers’ is a genuine limitation. These differing attitudes can manifest as distinct cognitive biases when interacting with an AI-CDSS; for instance, early adopters may be susceptible to automation bias, leading to over-reliance on AI recommendations [82], while more skeptical users might exhibit confirmation bias, ignoring AI outputs that contradict their pre-existing beliefs [84]. While our framework provides a foundation for mitigating this through user profiling and training, it does not offer a complete solution [84]. Fifth, the framework itself is conceptual and requires further empirical validation in real-world development projects to test its generalizability. Finally, the field of AI in healthcare is advancing quickly, and the framework will require ongoing refinement to keep pace with new technologies and user expectations.
In conclusion, effective integration of XAI into clinical decision support is not just a technical challenge but fundamentally a human-centered one. By systematically incorporating user needs into method selection, co-designing interfaces with clinicians, and rigorously evaluating use in real-world settings, we can develop XAI-CDSS that are not only explainable in principle but also truly usable and trustworthy in clinical environments. We hope that the framework and insights offered here will guide future research and development—enabling XAI systems that foster meaningful clinician–AI collaboration and truly enhance clinical decision-making and patient outcomes.

Author Contributions

Conceptualization, M.S., D.J.L. and K.S.; methodology, M.S., N.C. and S.S.A.; validation, N.C. and S.S.A.; formal analysis, N.C. and S.S.A.; investigation, M.S., N.C.; resources, K.S., D.J.L. and F.T.M.; writing—original draft preparation, M.S. and N.C.; writing—review and editing, M.S., K.S., D.J.L., S.S.A., N.C. and F.T.M.; visualization, N.C.; supervision, K.S., D.J.L. and F.T.M.; project administration, N.C. and K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
XAI: Explainable Artificial Intelligence
CDSS: Clinical Decision Support Systems
ML: Machine Learning
EHR: Electronic Health Records

References

  1. Berner, E.S.; La Lande, T.J. Overview of clinical decision support systems. In Clinical Decision Support Systems: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–17. [Google Scholar]
  2. Kubben, P.; Dumontier, M.; Dekker, A. Fundamentals of Clinical Data Science; Springer: New York, NY, USA, 2019. [Google Scholar]
  3. Shamout, F.; Zhu, T.; Clifton, D.A. Machine learning for clinical outcome prediction. IEEE Rev. Biomed. Eng. 2020, 14, 116–126. [Google Scholar] [CrossRef]
  4. Montani, S.; Striani, M. Artificial intelligence in clinical decision support: A focused literature survey. Yearb. Med. Inform. 2019, 28, 120–127. [Google Scholar] [CrossRef] [PubMed]
  5. Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018; IEEE: New York, NY, USA, 2018; pp. 80–89. [Google Scholar]
  6. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
  7. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: Contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine Learning for Healthcare Conference, Ann Arbor, MI, USA, 9–10 August 2019; PMLR: Cambridge, MA, USA, 2019; pp. 359–380. [Google Scholar]
  8. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  9. Tjoa, E.; Guan, C. A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4793–4813. [Google Scholar] [CrossRef]
  10. Bunn, J. Working in contexts for which transparency is important: A recordkeeping view of explainable artificial intelligence (XAI). Rec. Manag. J. 2020, 30, 143–153. [Google Scholar] [CrossRef]
  11. Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160. [Google Scholar] [CrossRef]
  12. Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 2018, 51, 1–42. [Google Scholar] [CrossRef]
  13. Tan, S.; Caruana, R.; Hooker, G.; Lou, Y. Distill–and–compare: Auditing black-box models using transparent model distillation. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, New Orleans, LA, USA, 2–3 February 2018; pp. 303–310. [Google Scholar]
  14. Xu, K.; Park, D.H.; Yi, C.; Sutton, C. Interpreting deep classifier by visual distillation of dark knowledge. arXiv 2018, arXiv:1803.04042. [Google Scholar] [CrossRef]
  15. Che, Z.; Purushotham, S.; Khemani, R.; Liu, Y. Distilling knowledge from deep networks with applications to healthcare domain. arXiv 2015, arXiv:1512.03542. [Google Scholar] [CrossRef]
  16. Tickle, A.B.; Andrews, R.; Golea, M.; Diederich, J. The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks. IEEE Trans. Neural Netw. 1998, 9, 1057–1068. [Google Scholar] [CrossRef] [PubMed]
  17. Su, C.T.; Chen, Y.C. Rule extraction algorithm from support vector machines and its application to credit screening. Soft Comput. 2012, 16, 645–658. [Google Scholar] [CrossRef]
  18. De Fortuny, E.J.; Martens, D. Active learning-based pedagogical rule extraction. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 2664–2677. [Google Scholar] [CrossRef] [PubMed]
  19. Bologna, G.; Hayashi, Y. A rule extraction study from svm on sentiment analysis. Big Data Cogn. Comput. 2018, 2, 6. [Google Scholar] [CrossRef]
  20. Hailesilassie, T. Rule extraction algorithm for deep neural networks: A review. arXiv 2016, arXiv:1610.05267. [Google Scholar] [CrossRef]
  21. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  22. Magesh, P.R.; Myloth, R.D.; Tom, R.J. An explainable machine learning model for early detection of Parkinson’s disease using LIME on DaTSCAN imagery. Comput. Biol. Med. 2020, 126, 104041. [Google Scholar] [CrossRef]
  23. Zhang, Y.; Wallace, B. A sensitivity analysis of (and practitioners’ guide to) convolutional neural networks for sentence classification. arXiv 2015, arXiv:1510.03820. [Google Scholar]
  24. Hooker, S.; Erhan, D.; Kindermans, P.J.; Kim, B. A benchmark for interpretability methods in deep neural networks. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
  25. Hara, S.; Ikeno, K.; Soma, T.; Maehara, T. Maximally invariant data perturbation as explanation. arXiv 2018, arXiv:1806.07004. [Google Scholar] [CrossRef]
  26. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  27. Štrumbelj, E.; Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 2014, 41, 647–665. [Google Scholar] [CrossRef]
  28. Fong, R.C.; Vedaldi, A. Interpretable explanations of black boxes by meaningful perturbation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3429–3437. [Google Scholar]
  29. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Cambridge, MA, USA, 2015; pp. 2048–2057. [Google Scholar]
  30. Montavon, G.; Samek, W.; Müller, K.R. Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 2018, 73, 1–15. [Google Scholar] [CrossRef]
  31. Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.R.; Samek, W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 2015, 10, e0130140. [Google Scholar] [CrossRef] [PubMed]
  32. Samek, W.; Montavon, G.; Binder, A.; Lapuschkin, S.; Müller, K.R. Interpreting the predictions of complex ML models by layer-wise relevance propagation. arXiv 2016, arXiv:1611.08191. [Google Scholar] [CrossRef]
  33. Nguyen, A.; Dosovitskiy, A.; Yosinski, J.; Brox, T.; Clune, J. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  34. Bien, J.; Tibshirani, R. Prototype selection for interpretable classification. arXiv 2011, arXiv:1202.5933. [Google Scholar] [CrossRef]
  35. Sharma, S.; Henderson, J.; Ghosh, J. Certifai: Counterfactual explanations for robustness, transparency, interpretability, and fairness of artificial intelligence models. arXiv 2019, arXiv:1905.07857. [Google Scholar] [CrossRef]
  36. Mothilal, R.K.; Sharma, A.; Tan, C. Explaining machine learning classifiers through diverse counterfactual explanations. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 607–617. [Google Scholar]
  37. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824. [Google Scholar] [CrossRef]
  38. Lamy, J.B.; Sekar, B.; Guezennec, G.; Bouaud, J.; Séroussi, B. Explainable artificial intelligence for breast cancer: A visual case-based reasoning approach. Artif. Intell. Med. 2019, 94, 42–53. [Google Scholar] [CrossRef]
  39. Sedig, K.; Parsons, P. Interaction design for complex cognitive activities with visual representations: A pattern-based approach. Ais Trans. Hum. Comput. Interact. 2013, 5, 84–133. [Google Scholar] [CrossRef]
  40. Sedig, K.; Naimi, A.; Haggerty, N. Aligning information technologies with evidence-based health-care activities: A design and evaluation framework. Hum. Technol. Interdiscip. J. Humans ICT Environ. 2017, 13, 180–215. [Google Scholar] [CrossRef]
  41. Parsons, P.; Sedig, K. Distribution of information processing while performing complex cognitive activities with visualization tools. In Handbook of Human Centric Visualization; Springer: New York, NY, USA, 2013; pp. 693–715. [Google Scholar]
  42. Lamy, J.B.; Sedki, K.; Tsopra, R. Explainable decision support through the learning and visualization of preferences from a formal ontology of antibiotic treatments. J. Biomed. Inform. 2020, 104, 103407. [Google Scholar] [CrossRef] [PubMed]
  43. Choi, E.; Bahadori, M.T.; Sun, J.; Kulas, J.; Schuetz, A.; Stewart, W. Retain: An interpretable predictive model for healthcare using reverse time attention mechanism. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  44. Kyrimi, E.; Mossadegh, S.; Tai, N.; Marsh, W. An incremental explanation of inference in Bayesian networks for increasing model trustworthiness and supporting clinical decision making. Artif. Intell. Med. 2020, 103, 101812. [Google Scholar] [CrossRef] [PubMed]
  45. Ming, Y.; Qu, H.; Bertini, E. Rulematrix: Visualizing and understanding classifiers with rules. IEEE Trans. Vis. Comput. Graph. 2018, 25, 342–352. [Google Scholar] [CrossRef]
  46. Che, Z.; Purushotham, S.; Khemani, R.; Liu, Y. Interpretable deep models for ICU outcome prediction. In Proceedings of the AMIA Annual Symposium Proceedings, Washington, DC, USA, 4–8 November 2017; Volume 2016, p. 371. [Google Scholar]
  47. Yang, Y.; Tresp, V.; Wunderle, M.; Fasching, P.A. Explaining therapy predictions with layer-wise relevance propagation in neural networks. In Proceedings of the 2018 IEEE International Conference on Healthcare Informatics (ICHI), New York, NY, USA, 4–7 June 2018; IEEE: New York, NY, USA, 2018; pp. 152–162. [Google Scholar]
  48. Böhle, M.; Eitel, F.; Weygandt, M.; Ritter, K. Layer-wise relevance propagation for explaining deep neural network decisions in MRI-based Alzheimer’s disease classification. Front. Aging Neurosci. 2019, 11, 194. [Google Scholar] [CrossRef]
  49. Slijepcevic, D.; Horst, F.; Lapuschkin, S.; Raberger, A.M.; Zeppelzauer, M.; Samek, W.; Breiteneder, C.; Schöllhorn, W.I.; Horsak, B. On the explanation of machine learning predictions in clinical gait analysis. arXiv 2020, arXiv:1912.07737. [Google Scholar] [CrossRef]
  50. Šajnović, U.; Vošner, H.B.; Završnik, J.; Žlahtič, B.; Kokol, P. Internet of things and big data analytics in preventive healthcare: A synthetic review. Electronics 2024, 13, 3642. [Google Scholar] [CrossRef]
  51. Ayorinde, A.; Mensah, D.O.; Walsh, J.; Ghosh, I.; Ibrahim, S.A.; Hogg, J.; Peek, N.; Griffiths, F. Health Care Professionals’ Experience of Using AI: Systematic Review With Narrative Synthesis. J. Med. Internet Res. 2024, 26, e55766. [Google Scholar] [CrossRef]
  52. Amann, J.; Vetter, D.; Blomberg, S.N.; Christensen, H.C.; Coffee, M.; Gerke, S.; Gilbert, T.K.; Hagendorff, T.; Holm, S.; Livne, M.; et al. To explain or not to explain? Artificial intelligence explainability in clinical decision support systems. PLoS Digit. Health 2022, 1, e0000016. [Google Scholar] [CrossRef]
  53. Pierce, R.L.; Van Biesen, W.; Van Cauwenberge, D.; Decruyenaere, J.; Sterckx, S. Explainability in medicine in an era of AI-based clinical decision support systems. Front. Genet. 2022, 13, 903600. [Google Scholar] [CrossRef]
  54. Kim, S.Y.; Kim, D.H.; Kim, M.J.; Ko, H.J.; Jeong, O.R. XAI-Based Clinical Decision Support Systems: A Systematic Review. Appl. Sci. 2024, 14, 6638. [Google Scholar] [CrossRef]
  55. Aziz, N.A.; Manzoor, A.; Mazhar Qureshi, M.D.; Qureshi, M.A.; Rashwan, W. Explainable AI in Healthcare: Systematic Review of Clinical Decision Support Systems. medRxiv 2024. [Google Scholar] [CrossRef]
  56. Panigutti, C.; Beretta, A.; Fadda, D.; Giannotti, F.; Pedreschi, D.; Perotti, A.; Rinzivillo, S. Co-design of human-centered, explainable AI for clinical decision support. ACM Trans. Interact. Intell. Syst. 2023, 13, 1–35. [Google Scholar] [CrossRef]
  57. Turri, V.; Morrison, K.; Robinson, K.M.; Abidi, C.; Perer, A.; Forlizzi, J.; Dzombak, R. Transparency in the Wild: Navigating Transparency in a Deployed AI System to Broaden Need-Finding Approaches. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, Rio de Janeiro, Brazil, 3–6 June 2024; pp. 1494–1514. [Google Scholar]
  58. Micocci, M.; Borsci, S.; Thakerar, V.; Walne, S.; Manshadi, Y.; Edridge, F.; Mullarkey, D.; Buckle, P.; Hanna, G.B. Attitudes towards trusting artificial intelligence insights and factors to prevent the passive adherence of GPs: A pilot study. J. Clin. Med. 2021, 10, 3101. [Google Scholar] [CrossRef]
  59. Rosenbacke, R.; Melhus, Å.; McKee, M.; Stuckler, D. How Explainable Artificial Intelligence Can Increase or Decrease Clinicians’ Trust in AI Applications in Health Care: Systematic Review. JMIR AI 2024, 3, e53207. [Google Scholar] [CrossRef]
  60. Gomez, C.; Smith, B.L.; Zayas, A.; Unberath, M.; Canares, T. Explainable AI decision support improves accuracy during telehealth strep throat screening. Commun. Med. 2024, 4, 149. [Google Scholar] [CrossRef]
  61. Laxar, D.; Eitenberger, M.; Maleczek, M.; Kaider, A.; Hammerle, F.P.; Kimberger, O. The influence of explainable vs non-explainable clinical decision support systems on rapid triage decisions: A mixed methods study. BMC Med. 2023, 21, 359. [Google Scholar] [CrossRef]
  62. Sivaraman, V.; Bukowski, L.A.; Levin, J.; Kahn, J.M.; Perer, A. Ignore, trust, or negotiate: Understanding clinician acceptance of AI-based treatment recommendations in health care. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; pp. 1–18. [Google Scholar]
  63. Sivaraman, V.; Morrison, K.; Epperson, W.; Perer, A. Over-Relying on Reliance: Towards Realistic Evaluations of AI-Based Clinical Decision Support. arXiv 2025, arXiv:2504.07423. [Google Scholar]
  64. Morrison, K.; Jain, M.; Hammer, J.; Perer, A. Eye into AI: Evaluating the Interpretability of Explainable AI Techniques through a Game with a Purpose. Proc. ACM Hum. Comput. Interact. 2023, 7, 1–22. [Google Scholar] [CrossRef]
  65. Morrison, K.; Shin, D.; Holstein, K.; Perer, A. Evaluating the impact of human explanation strategies on human-AI visual decision-making. Proc. ACM Hum. Comput. Interact. 2023, 7, 1–37. [Google Scholar] [CrossRef]
  66. Katuwal, G.J.; Chen, R. Machine learning model interpretability for precision medicine. arXiv 2016, arXiv:1610.09045. [Google Scholar] [CrossRef]
  67. Giordano, C.; Brennan, M.; Mohamed, B.; Rashidi, P.; Modave, F.; Tighe, P. Accessing artificial intelligence for clinical decision-making. Front. Digit. Health 2021, 3, 645232. [Google Scholar] [CrossRef]
  68. Kwon, B.C.; Choi, M.J.; Kim, J.T.; Choi, E.; Kim, Y.B.; Kwon, S.; Sun, J.; Choo, J. Retainvis: Visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE Trans. Vis. Comput. Graph. 2018, 25, 299–309. [Google Scholar] [CrossRef] [PubMed]
  69. Cabrera, Á.A.; Fu, E.; Bertucci, D.; Holstein, K.; Talwalkar, A.; Hong, J.I.; Perer, A. Zeno: An interactive framework for behavioral evaluation of machine learning. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, Hamburg, Germany, 23–28 April 2023; pp. 1–14. [Google Scholar]
  70. Zihni, E.; Madai, V.I.; Livne, M.; Galinovic, I.; Khalil, A.A.; Fiebach, J.B.; Frey, D. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. PLoS ONE 2020, 15, e0231166. [Google Scholar] [CrossRef] [PubMed]
  71. Sáez, C.; Ferri, P.; García-Gómez, J.M. Resilient artificial intelligence in health: Synthesis and research agenda toward next-generation trustworthy clinical decision support. J. Med. Internet Res. 2024, 26, e50295. [Google Scholar] [CrossRef] [PubMed]
  72. Nasarian, E.; Alizadehsani, R.; Acharya, U.R.; Tsui, K.L. Designing interpretable ML system to enhance trust in healthcare: A systematic review to proposed responsible clinician-AI-collaboration framework. Inf. Fusion 2024, 108, 102412. [Google Scholar] [CrossRef]
  73. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [Google Scholar] [CrossRef]
  74. Sedig, K.; Liang, H.N. On the design of interactive visual representations: Fitness of interaction. In Proceedings of the EdMedia+ Innovate Learning, Vancouver, BC, Canada, 25–29 June 2007; Association for the Advancement of Computing in Education (AACE): Waynesville, NC, USA, 2007; pp. 999–1006. [Google Scholar]
  75. Gadanidis, G.; Sedig, K.; Liang, H.N. Designing online mathematical investigation. J. Comput. Math. Sci. Teach. 2004, 23, 275–298. [Google Scholar]
  76. Xie, Y.; Gao, G.; Chen, X. Outlining the design space of explainable intelligent systems for medical diagnosis. arXiv 2019, arXiv:1902.06019. [Google Scholar] [CrossRef]
  77. Tory, M. User studies in visualization: A reflection on methods. In Handbook of Human Centric Visualization; Springer: New York, NY, USA, 2013; pp. 411–426. [Google Scholar]
  78. Parsons, P.; Sedig, K. Adjustable properties of visual representations: Improving the quality of human-information interaction. J. Assoc. Inf. Sci. Technol. 2014, 65, 455–482. [Google Scholar] [CrossRef]
  79. Sedig, K.; Parsons, P.; Dittmer, M.; Haworth, R. Human-centered interactivity of visualization tools: Micro-and macro-level considerations. In Handbook of Human Centric Visualization; Springer: New York, NY, USA, 2014; pp. 717–743. [Google Scholar]
  80. Hundhausen, C.D. Evaluating visualization environments: Cognitive, social, and cultural perspectives. In Handbook of Human Centric Visualization; Springer: New York, NY, USA, 2013; pp. 115–145. [Google Scholar]
  81. Freitas, C.M.; Pimenta, M.S.; Scapin, D.L. User-centered evaluation of information visualization techniques: Making the HCI-InfoVis connection explicit. In Handbook of Human Centric Visualization; Springer: New York, NY, USA, 2014; pp. 315–336. [Google Scholar]
  82. Abbas, Q.; Jeong, W.; Lee, S.W. Explainable AI in Clinical Decision Support Systems: A Meta-Analysis of Methods, Applications, and Usability Challenges. Healthcare 2025, 13, 2154. [Google Scholar] [CrossRef]
  83. Zytek, A.; Liu, D.; Vaithianathan, R.; Veeramachaneni, K. Sibyl: Understanding and addressing the usability challenges of machine learning in high-stakes decision making. IEEE Trans. Vis. Comput. Graph. 2021, 28, 1161–1171. [Google Scholar] [CrossRef]
  84. Gambetti, A.; Han, Q.; Shen, H.; Soares, C. A Survey on Human-Centered Evaluation of Explainable AI Methods in Clinical Decision Support Systems. arXiv 2025, arXiv:2502.09849. [Google Scholar] [CrossRef]
Figure 1. Division of post hoc methods based on model specificity, explanation scope, and explanation type.
Figure 2. PRISMA diagram illustrating search results and article selection process.
Figure 3. The CDS–clinician gap.
Figure 4. The proposed 3-phase user-centered framework for XAI-CDSS development and evaluation, highlighting the iterative feedback loop.
Table 1. Keywords used in the search engine (comprehensive keywords for XAI in CDSS).
XAI Keyword | CDSS Keyword
Explainable AI | Clinical Decision Support
Interpretable AI | Clinical Decision Support Systems
Transparent AI | Healthcare Decision Support
Accountable AI | Medical Decision Support Systems
Interpretable Machine Learning | Medicine
Explainable Machine Learning | Clinical Decision Support Tools
Black Box | Treatment Recommendations
Interpretable Algorithm | Clinical Prediction
Explainable Algorithm | Clinical Decision-Making
Model Explanation | Patient Outcomes
XAI | CDSS
Transparent Machine Learning | Healthcare
Real-world XAI in Medicine | Disease Diagnosis
Table 3. Analysis of Sivaraman et al. using the 3-phase framework.
Framework Phase | Aspect | Case Study Details (Sivaraman et al.) | Connection to Framework Principles/Goals
Phase 1: User-Centered XAI Method Selection | Context | High-stakes ICU environment; sepsis treatment (uncertainty, variability). | Aligns with Phase 1 goal of understanding the specific clinical setting.
Phase 1: User-Centered XAI Method Selection | User Needs Analysis | Needs identified post hoc via evaluation: validation evidence for trust, inclusion of bedside/gestalt info, workflow alignment, explanation for deviations. Lack of model transparency also noted in other contexts (Laxar et al. [61]). | Demonstrates importance of proactive Phase 1 needs analysis (interviews, observation) rather than discovering needs late.
Phase 1: User-Centered XAI Method Selection | XAI Method Selection | SHAP used for feature explanation; no documented user input in selection. Contrast with Gomez et al. [60], who found higher trust for example-based explanations aligning with clinical reasoning. | Phase 1 emphasizes selecting methods based on identified user needs and reasoning styles—not just technical availability.
Phase 2: User-Centered Interface Design | Interface Design | ‘AI Clinician Explorer’ interface with trajectory views, controls, recommendation panel. | Exemplifies the design artifact created in Phase 2.
Phase 2: User-Centered Interface Design | Explanation Presentation | Tested: Text-Only; Feature Explanation (SHAP chart); Alternative Treatments (AI-ranked actions + historical frequency). | Demonstrates testing multiple explanation formats as part of iterative Phase 2 refinement.
Phase 2: User-Centered Interface Design | Design Feedback | Feature Explanation increased confidence; Alternative Treatments had mixed reception; mismatch between AI’s discrete outputs and clinical workflow noted. Context-dependence also seen in Laxar et al. (time-pressure) studies. | Phase 2 requires translating Phase 1 needs into usable interfaces via iterative design and testing; feedback pinpoints areas for improvement.
Phase 3: Evaluation and Iterative Refinement | Evaluation Method | Mixed-methods: 24 clinicians, real cases, think-aloud, interviews, quantitative ratings, concordance analysis, qualitative coding for interaction patterns. | Exemplifies robust Phase 3 evaluation combining quantitative and qualitative insights.
Phase 3: Evaluation and Iterative Refinement | Key Findings and Limitations | Explanations boosted confidence but not binary concordance; ‘Negotiation’ pattern dominant; trust linked to external validation; gaps identified (data, workflow). Disconnect between reliance (WoA) and trust seen elsewhere (Laxar et al. [61]). | Demonstrates Phase 3 goal of evaluating beyond simple metrics and understanding complex behaviors (‘Negotiation’).
Phase 3: Evaluation and Iterative Refinement | Trust Calibration and Iterative Refinement | Identified the ‘chicken-and-egg’ problem of validation ↔ trust. Contrasting risks of over-reliance (Micocci et al. [58]) and under-reliance (Gomez et al. [60]) noted. | Phase 3 aims for appropriate trust calibration. Findings feed back into refining Phases 1 and 2.
Table 4. Comparative analysis of XAI user-centered challenges for imaging vs. structured EHR data.
Attribute | Image-Based CDSS | Structured EHR-Based CDSS
Data Modality | Continuous, spatial, high-dimensional (e.g., pixels in an MRI). | Discrete, tabular, categorical/numerical features (e.g., lab values, diagnosis codes).
Primary User Question | ‘Where did the model focus?’ | ‘Which factors influenced the prediction and by how much?’
Common Explanation Type | Spatial attribution (visual overlays like heatmaps). | Feature attribution (contribution plots, lists, e.g., SHAP).
Primary User Task | Visual correlation and anatomical plausibility assessment. | Logical review and clinical pathway validation.
Key User-Centered Challenge | Interpreting spatial ambiguity, avoiding visual confirmation bias, and assessing the fidelity of the visual explanation [82]. | Managing feature overload, understanding complex feature interactions, and assessing the plausibility of abstract feature importance values [83].
Potential for Misinterpretation | Trusting a visually plausible but mechanistically flawed localization. | Dismissing a counter-intuitive but statistically correct feature importance.
