Review

Explainability in Deep Learning in Healthcare and Medicine: Panacea or Pandora’s Box? A Systemic View

by
Wullianallur Raghupathi
Gabelli School of Business, Fordham University, 140 W. 62nd Street, New York, NY 10023, USA
Algorithms 2026, 19(1), 63; https://doi.org/10.3390/a19010063
Submission received: 4 November 2025 / Revised: 4 January 2026 / Accepted: 7 January 2026 / Published: 12 January 2026

Abstract

Explainability in deep learning (XDL) for healthcare is increasingly portrayed as essential for addressing the “black box” problem in clinical artificial intelligence. However, this universal transparency mandate may create unintended consequences, including cognitive overload, spurious confidence, and workflow disruption. This paper examines a fundamental question: Is explainability a panacea that resolves AI’s trust deficit, or a Pandora’s box that introduces new risks? Drawing on general systems theory, we demonstrate that the answer is profoundly context dependent. Through systemic analysis of current XDL methods (saliency maps, LIME, SHAP, and attention mechanisms), we reveal systematic disconnects between technical transparency and clinical utility. This paper argues that XDL is a context-dependent systemic property rather than a universal requirement. It functions as a panacea when proportionately applied to high-stakes reasoning tasks (cancer treatment planning, complex diagnosis) within integrated socio-technical architectures. Conversely, it becomes a Pandora’s box when superficially imposed on routine operational functions (scheduling, preprocessing) or time-critical emergencies (e.g., cardiac arrest) where comprehensive explanation delays lifesaving intervention. The paper proposes a risk-stratified framework recognizing that a specific subset of healthcare AI applications—those involving high-stakes clinical reasoning—requires comprehensive explainability, while other applications benefit from calibrated transparency appropriate to their clinical context. We conclude that explainability is neither a cure-all nor an inevitable harm, but rather a dynamic equilibrium requiring continuous rebalancing across technical, cognitive, and organizational dimensions.

1. Introduction

Current discourse increasingly treats explainability as a universal requirement. The European Union AI Act (2024) [1] mandates transparency for high-risk applications. U.S. FDA guidance emphasizes interpretability for medical devices. The World Health Organization (2021) [2] positions explainability as a fundamental ethical principle. Academic literature often assumes that more transparency necessarily yields better outcomes—improved clinical decisions, enhanced trust, strengthened accountability, reduced bias. This universal transparency assumption exhibits three key characteristics. First, explainability is portrayed as an inherently beneficial panacea resolving AI’s trust deficit. Second, lack of transparency is framed as the primary barrier to adoption—make models interpretable and resistance will dissolve. Third, technical solutions are emphasized: develop better explanation methods and the problems will be solved.
The rapid proliferation of deep learning in healthcare has created a paradox. Models increasingly outperform human experts at specific tasks—dermatological diagnosis [3], breast cancer screening [4], pathological image analysis [5]—yet face persistent adoption barriers. The core problem is that deep neural networks function as “black boxes” whose decision-making processes remain opaque to clinicians, patients, and regulators. This opacity triggers legitimate concerns. How can physicians trust recommendations they cannot understand? How can patients provide informed consent for AI-influenced care? How can organizations ensure accountability when algorithms fail? How can regulators validate safety and fairness without insight into model reasoning? These questions have spawned explainability in deep learning (XDL), a rapidly growing subfield attempting to render algorithmic decision-making transparent and interpretable.
However, emerging evidence challenges this assumption. Ref. [6] demonstrated that radiologists shown explanation heatmaps reported increased confidence but made no better diagnostic decisions than controls; explanations created illusions of understanding without improving performance. Ref. [7] found that explanation-induced overconfidence led clinicians to accept flawed AI recommendations they would have questioned otherwise. Ref. [8] revealed systematic disconnects between what explanation methods provide and what clinicians need for decision-making. These findings suggest explainability is not universally beneficial but context-dependent—sometimes helping, sometimes harming, and sometimes irrelevant. They point to a deeper issue: current approaches treat explainability as a technical problem amenable to algorithmic solutions, neglecting the fundamentally socio-technical nature of healthcare AI embedded within complex organizational and clinical contexts.
This paper addresses a question largely absent from current literature: Under what conditions does explainability function as panacea versus Pandora’s box? When does transparency enhance versus undermine clinical decision-making? What systemic characteristics determine whether explanation helps or harms? We approach these questions through general systems theory (GST), a framework for understanding complex socio-technical systems characterized by emergence, feedback, and contextual adaptation [9]. Specifically, we build on Raghupathi et al.’s three-decade application of GST to healthcare information systems (1992–2008), which provides conceptual tools for analyzing when and how transparency should be designed into clinical AI [10,11,12]. We argue that explainability is neither universally beneficial (panacea) nor inevitably harmful (Pandora’s box), but rather a context-dependent systemic property requiring careful design, proportionate deployment, and continuous rebalancing. Its value emerges from alignment across three layers:
  • Technical infrastructure: Generating accurate, reliable explanations.
  • Cognitive interface: Translating technical outputs into clinically meaningful insights.
  • Organizational governance: Embedding transparency within accountability structures.
When systemically integrated across these layers for high-stakes reasoning tasks (cancer treatment selection, complex diagnosis), explainability functions as a panacea—building trust, supporting judgment, and enabling accountability. When superficially imposed on routine operational functions (scheduling, preprocessing) or forced into time-critical contexts (cardiac arrest response), it becomes a Pandora’s box—creating cognitive overload, delaying intervention, and generating false confidence. The paper demonstrates that high-stakes healthcare AI applications genuinely require comprehensive explainability—specifically those involving clinical reasoning where human judgment remains essential. The remaining applications benefit more from performance validation and organizational oversight than from instance-level explanations. The scholarly literature on explainability in healthcare AI consistently emphasizes that transparency requirements should be calibrated to clinical context, cognitive demands, and potential consequences [6,13]. This calibration reflects fundamental differences in how AI systems interact with clinical reasoning across application types.
High-stakes reasoning applications involve AI systems that participate directly in clinical reasoning—diagnostic classification, treatment selection, and prognostic modeling, where algorithmic outputs substantively shape therapeutic decisions [14,15]. Examples include deep learning systems for cancer detection in mammography or histopathology, AI-driven treatment recommendations in oncology, and mortality prediction models in intensive care units. These applications demand comprehensive explainability because clinicians bear professional and legal accountability for outcomes, patients deserve understanding of factors influencing their care, and errors carry severe consequences, including misdiagnosis, inappropriate treatment, or delayed intervention [6,16]. Research demonstrates that in reasoning contexts, clinicians require a mechanistic understanding of how AI systems weight clinical features—not merely confidence scores—enabling appropriate trust calibration and error detection [8,9]. The cognitive science literature suggests that effective human–AI collaboration in reasoning tasks requires shared mental models between clinician and algorithm [17].
Medium-stakes augmentation applications occupy an intermediate position where AI supports but does not replace clinical judgment [18,19]. Examples include sepsis early warning systems that alert clinicians to deteriorating patients, risk stratification tools that prioritize patient populations for intervention, and triage algorithms that suggest urgency levels in emergency departments. These systems augment clinical attention and resource allocation rather than determining specific diagnoses or treatments. Contextual transparency—understanding why an alert was triggered or why a patient was flagged as high-risk—enables appropriate clinician reliance while preserving professional autonomy [20]. The literature suggests that augmentation applications benefit from explanations calibrated to decision urgency: emergency contexts may require streamlined indicators, while routine risk stratification can accommodate richer contextual information [21,22].
Low-stakes functional applications involve AI systems performing operational tasks peripheral to clinical reasoning [23,24]. Examples include appointment scheduling optimization, medical image preprocessing and quality enhancement, automated documentation and coding assistance, and inventory management. These applications primarily require performance validation and systematic monitoring rather than instance-level explanation. While organizational accountability structures remain essential, the cognitive burden of comprehensive explainability would be disproportionate to clinical benefit. However, even functional applications require transparency regarding aggregate performance, failure modes, and systematic biases that might differentially affect patient populations [25].
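As a purely illustrative summary, the sketch below encodes this three-tier stratification as a simple Python data structure; the tier names, examples, and requirement descriptions paraphrase the preceding paragraphs and are not a normative or validated specification.

```python
# Illustrative encoding of the risk-stratified transparency tiers described above.
# Contents paraphrase this section; they are not a prescriptive standard.
EXPLAINABILITY_TIERS = {
    "high_stakes_reasoning": {
        "examples": ["mammography cancer detection",
                     "oncology treatment recommendation",
                     "ICU mortality prediction"],
        "requirement": "comprehensive, instance-level explanation with mechanistic grounding",
    },
    "medium_stakes_augmentation": {
        "examples": ["sepsis early warning", "risk stratification",
                     "emergency department triage suggestion"],
        "requirement": "contextual transparency calibrated to decision urgency",
    },
    "low_stakes_functional": {
        "examples": ["appointment scheduling", "image preprocessing",
                     "documentation and coding assistance"],
        "requirement": "aggregate performance validation and systematic bias monitoring",
    },
}

def required_transparency(tier: str) -> str:
    """Return the transparency requirement associated with an application tier."""
    return EXPLAINABILITY_TIERS[tier]["requirement"]

print(required_transparency("medium_stakes_augmentation"))
```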
This work makes three primary contributions:
Theoretical: We integrate general systems theory with XDL scholarship, demonstrating how GST provides analytical tools for understanding when transparency helps versus harms. We extend Raghupathi’s framework (developed for clinical decision support) to contemporary deep learning challenges [10,11,12].
Empirical: We systematically evaluate major XDL techniques (saliency maps, LIME, SHAP, attention mechanisms, emerging approaches) across technical, cognitive, and organizational dimensions, revealing systematic gaps between technical sophistication and clinical utility.
Practical: We provide a risk-stratified implementation framework and policy recommendations enabling context-appropriate deployment—comprehensive explanations where genuinely needed, lighter-touch approaches where sufficient, and recognition of when transparency should be suppressed entirely (time-critical emergencies).
The paper proceeds as follows. Section 2 discusses explainable deep learning in healthcare and medicine and lays the foundation for the systemic analysis. Section 3 establishes the systemic framework grounded in GST. Section 4 evaluates current XDL methods against systemic criteria. Section 5 discusses context as determinant: panacea versus Pandora’s box, the central theme of this paper. Section 6 provides implementation guidance. Section 7 offers conclusions, policy recommendations, and future research directions.

2. Understanding Explainability in Deep Learning in Healthcare and Medicine

Before examining the systemic framework that guides our analysis, we must establish foundational concepts about explainability in deep learning (XDL) for healthcare. This section addresses four critical questions: What is explainability in the context of healthcare AI? Why is it important? What unique challenges does healthcare present for explainability? And what specific technical issues arise when attempting to explain deep learning models, particularly transformers, in medical applications? Before defining explainability specifically for deep learning in healthcare, we must distinguish between explainability in artificial intelligence (XAI) broadly and explainability in deep learning (XDL) more specifically, as these terms are often conflated despite important differences [26,27].
Explainability in AI (XAI) encompasses the entire spectrum of artificial intelligence systems, from simple rule-based algorithms and decision trees to complex neural networks. Traditional AI systems like expert systems, decision trees, and logistic regression models often possess inherent transparency: their decision rules can be directly inspected, and the logic connecting inputs to outputs remains relatively straightforward. For instance, a clinical decision rule for pneumonia severity (CURB-65) explicitly states that one point each is assigned for confusion, elevated urea, respiratory rate ≥ 30, low blood pressure, and age ≥ 65, with the sum determining treatment location. Such rule-based AI systems are “interpretable by design,” requiring minimal additional explanation mechanisms [28].
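To illustrate what “interpretable by design” means in practice, a minimal sketch of the CURB-65 rule follows; the thresholds used are commonly cited values (e.g., urea > 7 mmol/L) and the snippet is illustrative, not a clinical implementation.

```python
def curb65_score(confusion: bool, urea_mmol_per_l: float,
                 respiratory_rate: int, systolic_bp: int,
                 diastolic_bp: int, age: int) -> int:
    """Compute the CURB-65 pneumonia severity score (0-5).

    One point each for: Confusion, elevated Urea (> 7 mmol/L is a commonly
    cited threshold), Respiratory rate >= 30/min, low Blood pressure
    (systolic < 90 or diastolic <= 60 mmHg), and age >= 65.
    The entire decision logic is directly inspectable.
    """
    score = 0
    score += int(confusion)
    score += int(urea_mmol_per_l > 7.0)
    score += int(respiratory_rate >= 30)
    score += int(systolic_bp < 90 or diastolic_bp <= 60)
    score += int(age >= 65)
    return score

# Example: a 70-year-old with respiratory rate 32 and otherwise normal
# findings scores 2, which the rule links to a treatment-location decision.
print(curb65_score(False, 5.0, 32, 120, 80, 70))  # -> 2
```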
Explainability in deep learning (XDL), by contrast, addresses the unique challenges posed by multi-layered neural networks with millions to billions of parameters distributed across numerous interconnected layers. Deep learning models—including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers—achieve remarkable performance through learning hierarchical representations via non-linear transformations across many processing stages. This depth and complexity create fundamental opacity: the computational pathways connecting inputs to outputs involve millions of learned parameters interacting through non-linear functions, making the decision-making process inherently difficult to trace or comprehend [6,29].
The distinction matters critically for healthcare and medicine. While traditional AI methods may enable clinicians to inspect decision rules directly (e.g., “if troponin > 0.04 ng/mL AND chest pain present, then high cardiac risk”), deep learning models learn representations that resist such direct inspection. A deep CNN analyzing chest X-rays might detect pneumonia through patterns in millions of pixel-level features processed across dozens of layers—patterns that have no simple verbal description and cannot be reduced to inspectable rules. XDL therefore requires post hoc explanation techniques (attribution methods, attention visualization, counterfactual generation) to approximate what opaque models have learned, whereas traditional XAI might simply display the explicit rules already encoded in the system [28,30].
Recent scoping reviews systematically document this landscape. Ref. [14] conducted a comprehensive scoping review of explainability in medical AI, identifying 72 papers and revealing that most focused on post hoc explanation methods for deep learning rather than inherently interpretable traditional AI. Similarly, Ref. [13] performed a scoping review of explainable artificial intelligence in healthcare, analyzing 56 studies and finding that deep learning models (CNNs, RNNs, LSTMs) constituted most applications requiring explanation, with image classification and clinical prediction being the primary use cases. Tjoa and Guan (2021) surveyed explainable AI in healthcare specifically for deep learning, reviewing over 100 papers and confirming that XDL methods like saliency maps, gradient-based attribution, and attention mechanisms dominate the healthcare explainability literature precisely because deep models’ opacity necessitates these techniques [31]. Most recently, Ref. [16] conducted a scoping review examining explainable AI in clinical decision support, finding that 68% of reviewed systems employed deep learning architectures and that explanation adequacy varied dramatically depending on whether methods addressed technical transparency, clinical interpretability, or actionable justification. Building on these reviews, explainability in deep learning for healthcare encompasses three distinct but interrelated dimensions [31,32].
First, technical transparency addresses how models process information internally. This includes understanding model architecture, identifying which input features most influence predictions, and tracing the computational pathways that lead to specific outputs. For a convolutional neural network analyzing chest X-rays, technical transparency might reveal which image regions the model weighted most heavily, which convolutional filters activated, and how these features propagated through network layers [17]. However, technical transparency alone proves insufficient for healthcare because computational patterns do not automatically translate into clinical meaning.
Second, clinical interpretability translates technical outputs into medically meaningful insights. This dimension bridges the gap between what models compute and what clinicians need to understand. A model might identify that elevated troponin levels, ST-segment changes, and patient age collectively predict cardiac risk, but clinical interpretability requires connecting these features to pathophysiological mechanisms—explaining that troponin elevation indicates myocardial damage, ST-segment changes suggest ischemia, and age correlates with cumulative cardiovascular burden [33]. Clinical interpretability demands alignment with existing medical knowledge, causal reasoning, and explicit acknowledgment of uncertainty. The scoping reviews consistently identify this as the dimension where current XDL methods show greatest weakness, with most techniques providing technical attributions without medical contextualization [13,16].
Third, actionable justification provides the reasoning necessary for decision-making and accountability. This dimension answers not just “what did the model detect?” or “how did it reach this conclusion?” but “why should this recommendation guide action?” Actionable justification requires contextualizing predictions within broader clinical scenarios, acknowledging alternative interpretations, clarifying limitations, and explicitly supporting informed consent [21]. For a model recommending cancer treatment, actionable justification must explain the trade-offs between different therapeutic options, the evidence base supporting each recommendation, the uncertainties involved, and how patient values might inform selection among alternatives.
These three dimensions must function coherently. Technical transparency without clinical interpretability produces meaningless feature attributions: knowing which pixels contributed to a diagnosis does not help clinicians verify correctness unless those pixels correspond to anatomical structures or pathological findings. Clinical interpretability without actionable justification fails to support decision-making: understanding that a model detected elevated inflammatory markers does not indicate whether empiric antibiotic treatment should begin immediately or await culture confirmation. As multiple scoping reviews confirm, current XDL methods excel at technical transparency but struggle profoundly with clinical interpretability and actionable justification, creating a systematic gap between algorithmic capability and clinical utility [16,18,31]. As we demonstrate throughout this analysis, effective explainability emerges from integration across all three dimensions within the socio-technical ecosystem of healthcare delivery [25,26,27].
The importance of explainability in healthcare deep learning derives from six interconnected imperatives that distinguish medical applications from other domains [13,22]. Clinical safety demands understanding deep learning reasoning to identify potential errors before they cause harm. Unlike consumer applications where mistakes might cause inconvenience, medical errors can result in death or permanent injury. When a model misdiagnoses a malignant melanoma as benign, delayed treatment may allow metastasis and drastically reduce survival probability. Explainability enables clinicians to detect when models rely on spurious correlations—such as identifying pneumonia based on hospital-specific scan markers rather than actual pathology [21]—before deploying systems clinically.
Professional accountability requires clinicians to justify their decisions based on medical reasoning. Legal and ethical frameworks assign responsibility to licensed practitioners, not algorithms [18]. When DL systems recommend treatments, clinicians must understand the rationale sufficiently to accept or reject recommendations based on professional judgment. A surgeon cannot ethically perform a DL-recommended procedure without understanding why that procedure is indicated for the specific patient. Explainability thus becomes a prerequisite for maintaining professional standards and legal accountability in DL-augmented practice.
Patient autonomy necessitates informed consent, which requires patients to understand how DL influences their care. The principle of informed consent—that patients have the right to accept or refuse treatment based on understanding risks, benefits, and alternatives—extends to DL-assisted decisions [2]. Patients increasingly want to know whether algorithms influenced their diagnosis or treatment recommendations, what information the DL considered, and how certain the predictions are. Without explainability, obtaining meaningful informed consent becomes impossible when DL plays substantial roles in clinical decisions.
Bias detection and mitigation require understanding which features the model weights and how predictions vary across demographic groups. Healthcare AI systems have demonstrated concerning disparities, such as underestimating kidney disease severity in Black patients or providing inferior care recommendations to women [34]. Explainability tools can reveal when models inappropriately weight race or gender, learn biased patterns from historical data reflecting systemic inequities, or produce systematically different predictions for protected groups, enabling intervention before deployment exacerbates existing healthcare disparities.
Regulatory compliance increasingly mandates transparency for high-risk medical DL systems. The European Union AI Act (2024) [1] requires high-risk applications to provide explanations enabling users to understand and appropriately oversee outputs. The U.S. FDA guidance emphasizes that developers should “document and communicate information about algorithms in a manner that supports transparency” [35]. Healthcare organizations deploying DL without adequate explainability mechanisms face regulatory risks, potential penalties, and barriers to market authorization.
Trust and adoption depend on clinicians’ and patients’ understanding of DL reasoning. Even highly accurate models fail to improve care if clinicians refuse to use them or systematically override recommendations. Research demonstrates that clinicians show greater willingness to adopt DL systems when they understand the underlying logic and can verify that recommendations align with medical knowledge [7]. Conversely, opacity breeds suspicion and resistance. Explainability thus becomes not merely a technical feature but a prerequisite for successful integration of DL into clinical practice.
Healthcare presents unique challenges that distinguish medical explainability from other domains and explain why generic XDL techniques often fail in clinical contexts [17,31]. High-dimensional heterogeneous data creates complexity beyond most domains. Medical records integrate structured data (laboratory values, vital signs, medications), semi-structured information (clinical notes, radiology reports), unstructured content (physician assessments, patient histories), temporal sequences (disease progression, treatment responses), and multimodal inputs (imaging, genomics, wearable sensor streams). A single patient encounter might generate thousands of features across multiple modalities with complex interdependencies. Standard feature attribution methods like SHAP struggle to meaningfully explain predictions based on such heterogeneous high-dimensional inputs, often producing unwieldy lists of hundreds of contributing factors without clear clinical interpretation [36].
Clinical reasoning operates through causal pathophysiological mechanisms, not statistical correlations. Physicians learn disease processes through mechanistic understanding: infections trigger inflammatory cascades, genetic mutations disrupt cellular signaling, and vascular occlusions cause tissue ischemia. In contrast, most deep learning models identify statistical patterns without explicit causal modeling [37]. A model might correctly predict sepsis risk by detecting subtle vital sign patterns, but if it cannot explain these patterns through known pathophysiological mechanisms (e.g., hypotension from vasodilation, tachycardia from compensatory responses, fever from inflammatory mediators), clinicians cannot verify the reasoning or apply it to novel contexts [34].
Uncertainty quantification proves essential but difficult. Medical decision-making explicitly acknowledges uncertainty through differential diagnoses, probability estimates, and treatment trade-offs. Clinicians need to know not just what models predict but how confident predictions are, which factors contribute most to uncertainty, and how predictions might change with additional information. Most XDL techniques provide point estimates of feature importance without uncertainty bounds. A model assigning 72% probability to malignancy could be highly confident (tight probability distribution) or quite uncertain (wide distribution), but standard explanations omit this crucial distinction [6].
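A toy numerical illustration of this distinction follows: two hypothetical predictors report roughly the same 72% malignancy probability, but repeated stochastic predictions (as produced, for example, by Monte Carlo dropout or an ensemble) reveal very different levels of confidence. All numbers are synthetic and serve only to show what a point estimate hides.

```python
import numpy as np

# Synthetic stand-ins for 50 stochastic forward passes of two models that
# both report ~0.72 malignancy probability on average.
rng = np.random.default_rng(42)
confident = rng.normal(loc=0.72, scale=0.02, size=50).clip(0, 1)  # tight spread
uncertain = rng.normal(loc=0.72, scale=0.15, size=50).clip(0, 1)  # wide spread

for name, samples in [("confident model", confident), ("uncertain model", uncertain)]:
    lo, hi = np.quantile(samples, [0.025, 0.975])
    print(f"{name}: mean={samples.mean():.2f}, 95% interval=({lo:.2f}, {hi:.2f})")
# Both means are ~0.72, but the intervals differ sharply -- the distinction
# that standard point-estimate explanations omit.
```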
Temporal dynamics and context-dependence complicate interpretation. The same laboratory value carries different meanings depending on timing, trajectory, clinical context, and individual patient factors. An elevated D-dimer might indicate acute venous thromboembolism in a post-surgical patient, chronic inflammation in an elderly patient, or be clinically insignificant in pregnancy. Models must incorporate extensive contextual information to generate appropriate interpretations, but standard XDL methods treat features independently without capturing complex conditional relationships and temporal dynamics [35].
Diverse stakeholders require different explanation types, creating tension between completeness and accessibility. Radiologists need technical details about image analysis, oncologists require treatment rationales grounded in clinical trials, hospital administrators want transparency about resource allocation algorithms, patients need accessible language avoiding medical jargon, and regulators demand comprehensive documentation enabling oversight. No single explanation satisfies all stakeholders simultaneously. Designing explainability systems that serve multiple audiences without becoming unwieldy or contradictory poses fundamental challenges [14].
Distribution shift and generalization create unique risks. Models trained on data from one institution, patient population, or period may perform differently when deployed elsewhere due to demographic differences, practice variations, equipment changes, or temporal shifts. Explainability must help identify when models encounter out-of-distribution inputs or when learned patterns may not generalize to new contexts. A model trained predominantly on data from younger patients might inappropriately apply patterns when deployed for geriatric care, but detecting such distribution mismatches requires explanations revealing what patterns the model learned and how current inputs differ from training distributions [21].
Having established foundational XDL concepts, we now examine transformer architectures, which represent the current frontier of healthcare AI and present unique explainability challenges distinct from earlier neural networks.

2.1. Transformers in Healthcare and Medicine

Transformer architecture has revolutionized natural language processing and is increasingly applied to healthcare challenges ranging from clinical note analysis to protein structure prediction. Understanding transformers and their explainability challenges requires examining their distinctive mechanisms and why they differ fundamentally from earlier neural network architectures [38].
Transformers, introduced in 2017 by Vaswani et al., process sequential data through self-attention mechanisms rather than the recurrence used in earlier architectures like LSTMs. Self-attention computes relationships between all pairs of positions in an input sequence, allowing models to identify dependencies regardless of distance—critical for medical contexts where symptoms appearing in different sections of a clinical note or separated in time might indicate specific diagnoses. For a transformer analyzing clinical notes, self-attention can connect “chest pain” mentioned in the chief complaint with “elevated troponin” appearing later in laboratory results, recognizing this pattern indicates myocardial infarction even though dozens of words separate the relevant information [38].
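As a concrete reference point, the following minimal NumPy sketch implements the scaled dot-product attention at the core of the transformer; the token embeddings are random stand-ins, not outputs of a trained clinical model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017).

    Q, K, V: arrays of shape (sequence_length, d_model). Every token attends
    to every other token, so distant but related mentions (e.g., "chest pain"
    and "elevated troponin") receive attention weights directly.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # pairwise affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights                               # each row sums to 1

# Toy example: 4 tokens with 8-dimensional embeddings (random placeholders).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.shape)  # (4, 4): one attention weight per token pair
```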
Figure 1 illustrates the key components of a transformer processing clinical text, including tokenization, embeddings, multi-head self-attention mechanisms, and feed-forward networks arranged in multiple encoder layers. Each attention head can focus on different linguistic relationships, enabling the model to capture complex patterns in clinical notes.
In healthcare applications, transformers excel at several tasks: clinical note analysis for extracting diagnoses, medications, and clinical reasoning from unstructured text; medical code prediction assigning diagnostic and procedure codes based on encounter documentation; risk prediction identifying patients at high risk for outcomes like hospital readmission or mortality based on longitudinal electronic health record data; and question-answering providing clinicians with evidence-based responses to medical queries [39]. Large language models like GPT-4 and specialized medical transformers like BioBERT demonstrate impressive performance on these tasks, often matching or exceeding human experts.
Despite their clinical promise, transformers present severe explainability challenges that exceed those of earlier architectures and create difficulties for healthcare or medical deployment [6,40].
Figure 2 highlights four major challenges: (1) attention ambiguity from 144+ attention matrices, (2) layer depth opacity as information transforms through 12+ layers, (3) contextual dependencies where word meanings vary by context, and (4) emergent reasoning that produces novel patterns that do not exist in the training data.
Attention visualization, the most common approach to explaining transformers, examines attention weights to identify which input tokens the model focuses on when generating predictions. Attention weights ranging from 0 to 1 quantify how much each input position influences each output position, creating interpretable matrices that can be visualized as heatmaps. For a transformer predicting heart failure risk from clinical notes, attention visualization might show strong weights connecting mentions of “dyspnea,” “edema,” and “elevated BNP,” suggesting the model recognized this constellation as indicating heart failure. However, mounting evidence demonstrates that attention weights correlate poorly with true feature importance and can be manipulated adversarially to appear plausible while the model relies on different information [41].
Multiple attention heads processing information in parallel create interpretive challenges. A 12-layer transformer with 12 attention heads per layer contains 144 attention matrices, each potentially identifying different patterns. Aggregating these matrices to produce coherent explanations proves extremely difficult. Should all heads be averaged? Should certain heads be prioritized? Different heads might identify contradictory patterns—one suggesting malignancy while another indicates benignity, leaving uncertain which heads the model relied on for its final prediction. In healthcare contexts requiring unambiguous justifications, this ambiguity becomes untenable [42].
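To illustrate the raw material such visualizations work from, the hedged sketch below extracts attention matrices from a BERT-style clinical language model via the Hugging Face transformers library. Bio_ClinicalBERT is one publicly available checkpoint; the example sentence, the choice of the final layer, and the averaging over heads are illustrative assumptions, and that aggregation step is precisely the design choice discussed above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumes network access to download the checkpoint; any BERT-style clinical
# model could be substituted.
name = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

text = "Patient reports dyspnea and bilateral edema; BNP markedly elevated."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple with one tensor per layer, each of shape
# (batch, heads, seq, seq). Averaging the final layer's heads yields a single
# seq x seq matrix that could be rendered as a heatmap -- but averaging is
# only one of many possible (and contestable) aggregation choices.
last_layer = outputs.attentions[-1]            # (1, heads, seq, seq)
avg_heads = last_layer.mean(dim=1).squeeze(0)  # (seq, seq)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(len(outputs.attentions), avg_heads.shape, tokens[:6])
```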
Layer depth creates opacity as information transforms through many processing stages. Unlike shallow models where input–output relationships remain relatively direct, transformers pass representations through numerous encoder and decoder layers, each performing complex non-linear transformations. Early layers might identify low-level patterns (frequent word co-occurrences), middle layers detect intermediate concepts (symptom clusters), and late layers capture high-level abstractions (disease categories). Tracing how specific inputs influence final outputs through this cascade of transformations exceeds current explanation capabilities. We can visualize attention weights at individual layers but understanding how these patterns compose across the full depth to produce predictions remains largely infeasible [43].
Contextual embedding prevents meaningful decomposition of predictions into independent feature contributions. Unlike tabular data where SHAP can attribute importance to discrete features, transformers create rich contextual representations where meaning depends entirely on surrounding context. The word “positive” contributes differently when appearing in “positive stress test” (indicating cardiac ischemia) versus “positive attitude” (reflecting mental state). Standard attribution methods assuming independent features fail catastrophically with contextual embeddings where the same token carries different meanings in different contexts, and meaning emerges from complex interactions that resist decomposition [44].
Emergent reasoning beyond training data poses verification challenges. Large transformers trained on massive text corpora develop capabilities not explicitly present in training data—a phenomenon called emergence. A medical transformer might generate plausible clinical reasoning connecting symptoms to diagnoses even when that specific reasoning chain never appeared in training documents. While impressive, this capability creates explainability problems because we cannot verify reasoning against known examples. The model might produce clinically sound logic, hallucinate convincing but incorrect explanations, or use reasoning patterns reflecting training data biases rather than medical evidence. Distinguishing these scenarios requires understanding not just what models output but how they generate outputs—a challenge that current XDL methods cannot reliably solve [40].
Natural language explanation generation through transformers introduces recursive explainability problems. Models that generate textual explanations use the same opaque transformer architectures as the systems being explained. A large language model might produce convincing clinical reasoning explaining why it diagnosed pneumonia, but this explanation comes from another transformer, raising questions: Is the explanation faithful to the diagnostic model’s actual reasoning? Or did the explanation generator produce post hoc rationalization that sounds plausible but does not reflect true model behavior? Explaining the explainer creates infinite regress; we need explanations of explanations, each suffering the same opacity as the original problem [45].
These transformer-specific challenges compound the general healthcare explainability difficulties discussed earlier. High-dimensional medical data processed through massive multi-head attention mechanisms, contextual embeddings obscuring feature independence, and emergent reasoning from training data patterns—all combine to make transformer explainability in healthcare an exceptionally difficult problem. Current XDL techniques developed for simpler architectures and domains prove inadequate, explaining why, despite transformers’ clinical promise, their opacity remains a significant barrier to widespread healthcare deployment.

2.2. Summary

This foundational section establishes that explainability in healthcare DL is neither a simple technical requirement nor a straightforward engineering challenge. It is a multifaceted imperative spanning technical transparency, clinical interpretability, and actionable justification—made essential by clinical safety, professional accountability, patient autonomy, bias detection, regulatory compliance, and trust requirements. Healthcare presents unique explainability challenges through high-dimensional heterogeneous data, requirement for causal reasoning, uncertainty quantification, temporal dynamics, diverse stakeholder needs, and distribution shift risks. Transformers, despite clinical promise, compound these difficulties through attention ambiguity, multiple processing layers, contextual embeddings, emergent reasoning, natural language generation opacity, and massive scale.
These foundational concepts frame the central question this paper addresses: Given these formidable challenges, under what conditions does explainability function as panacea versus Pandora’s box? The systemic analysis that is discussed next demonstrates that answers depend not on universal principles but on context-appropriate deployment informed by careful consideration of sociotechnical factors, stakeholder needs, and institutional capabilities. We now turn to the systemic foundations that provide analytical tools for understanding this context-dependence.

3. Systemic Foundations: Beyond Reductionism

To understand when explainability helps versus when it harms, we must first recognize healthcare DL as a complex socio-technical system rather than a purely technical artifact. This section establishes the general systems theory foundation underlying our analysis.

3.1. The Socio-Technical Paradigm

General systems theory emerged in the mid-20th century as a response to scientific reductionism, the tendency to understand wholes by analyzing parts in isolation [9,46]. GST recognizes that complex systems exhibit emergent properties arising from component interactions that cannot be predicted from individual elements alone. In socio-technical systems like healthcare, technology and social context co-evolve through continuous interaction rather than operating independently [47,48]. This perspective proves essential for understanding XDL because explanation effectiveness emerges from interaction between technical methods, clinical reasoning patterns, workflow characteristics, organizational culture, and governance structures. Optimizing explanation algorithms in isolation, the dominant approach in computer science, cannot ensure systemic success because it ignores the social context within which those algorithms must function.
We draw on five principles from GST particularly relevant to XDL design:
Principle 1: Emergence and Non-Linearity. System behavior arises from interactions rather than being reducible to individual components. Small changes can produce disproportionate effects; large interventions may yield minimal impact. For XDL, this means explanation utility cannot be predicted from technical properties alone; it emerges from how explanations interact with clinical workflows, cognitive processes, and organizational structures. An explanation that works well in oncology deliberation may fail catastrophically in emergency medicine.
Principle 2: Feedback and Self-Regulation. Complex systems maintain stability through feedback loops: negative feedback corrects deviations; positive feedback enables change and adaptation. For XDL, this suggests transparency systems require continuous monitoring and adjustment. User feedback, outcome data, and incident reports should drive iterative refinement rather than one-time deployment. Static explanation methods cannot adapt to evolving clinical needs or emerging evidence.
Principle 3: Openness and Adaptation. Living systems exchange information with their environment, adapting to changing conditions. For XDL, this means explanation systems must remain open to new medical knowledge, evolving clinical guidelines, and changing practices. Explanations that do not update as evidence accumulates become obsolete, potentially misleading users with outdated information.
Principle 4: Requisite Variety. A system’s control mechanisms must possess complexity matching what they attempt to control. Simple universal solutions cannot govern diverse complex problems. For XDL, this implies that different applications require different explanation approaches. Cancer treatment planning demands comprehensive transparency; equipment maintenance prediction may need only performance validation. One-size-fits-all mandates violate requisite variety.
Principle 5: Equifinality. Multiple paths can lead to the same outcome; no single “correct” approach exists. For XDL, this suggests technical pluralism: SHAP, LIME, counterfactuals, and prototypes all offer valid but different explanatory perspectives. Rather than seeking one optimal method, we should combine complementary approaches appropriate to specific contexts.
These principles reveal why purely technical optimization fails to guarantee clinical value. Current XDL research typically performs the following:
  • Evaluates methods in isolation (violating emergence principle).
  • Deploys static solutions without feedback mechanisms (violating self-regulation).
  • Designs explanations divorced from clinical context (violating openness).
  • Proposes universal approaches for diverse applications (violating requisite variety).
  • Seeks single optimal methods (violating equifinality).
A systemic approach recognizes that explanation effectiveness depends on alignment across multiple interacting dimensions—technical accuracy, cognitive interpretability, workflow integration, and organizational governance—within specific clinical contexts. This shifts the focus from “which explanation method is best?” to “how should explanation be designed as a systemic property emerging from careful integration of technology, people, and organizations?” Table 1 summarizes how GST concepts map to XDL design principles, providing a foundation for the analysis that follows.
The systemic perspective thus provides theoretical foundation for understanding XDL as a context-dependent dynamic equilibrium requiring continuous rebalancing rather than universal static solutions. We now apply this foundation to healthcare-specific concepts.

3.2. Healthcare-Medicine Specific Systems Concepts

While GST provides general principles, Refs. [10,11] translate these into healthcare-specific frameworks directly applicable to XDL design. This section examines five key concepts from Raghupathi’s three-decade research program, showing how each illuminates contemporary deep learning challenges.

The Five Key Systemic Properties

Ref. [10] identified five properties characterizing effective health information systems: softness, openness, complexity, flexibility, and generality. Figure 3 illustrates these properties and their relationships [10,11,12].
Figure 3 shows five interconnected properties (softness, openness, complexity, flexibility, and generality) that characterize effective health information systems and serve as evaluative criteria for XDL methods.
Softness recognizes healthcare as an inherently interpretive domain involving human judgment, values, and uncertainty. Unlike “hard” engineering systems with objective optimization criteria, medical decisions balance competing priorities, accommodate individual patient circumstances, and require interpretation that algorithms cannot fully capture. For XDL, softness demands that explanations support rather than replace clinical judgment, acknowledge uncertainty explicitly, and adapt to diverse user needs and circumstances [10,12].
Openness emphasizes continuous information exchange with the environment. Healthcare systems must adapt to new evidence, evolving practices, and changing circumstances. For XDL, openness requires explanations that link to current medical literature, update as guidelines change, and incorporate feedback from clinical use. Static explanations disconnected from evolving knowledge bases rapidly become obsolete [10,12].
Complexity acknowledges the multi-dimensional, multi-stakeholder nature of healthcare. Decisions involve clinical, technical, ethical, economic, and social dimensions; stakeholders include patients, multiple clinician types, administrators, and regulators. For XDL, complexity demands multi-perspective explanations addressing diverse needs—technical details for algorithm auditors, clinical narratives for physicians, and accessible summaries for patients—all coherently integrated [10,12].
Flexibility enables context-sensitive adaptation. Different clinical scenarios (emergency versus deliberative), different users (expert versus novice), and different stakes (life-threatening versus convenience) require different approaches. For XDL, flexibility means explanation depth and format should adapt to the situation—comprehensive analysis when appropriate, streamlined directives when urgent, and suppressed entirely when time-critical action supersedes understanding [10,12].
Generality supports reusability and organizational learning. Systems designed for narrow contexts waste resources; architectures applicable across domains enable institutional investment and knowledge accumulation. For XDL, generality suggests explanation frameworks should apply across multiple applications and data types while allowing customization to specific needs [10,12].
These five properties provide evaluative criteria for XDL methods. Section 4 demonstrates that current techniques fail to embody these properties, explaining systematic gaps between technical sophistication and clinical utility.

3.3. Clinical Reasoning Versus Clinical Function

Raghupathi’s (2007) distinction between clinical reasoning and clinical function proves pivotal for determining when explainability is necessary versus unnecessary or counterproductive [10]. Figure 4 illustrates this spectrum.
This spectrum illustrates the range from reasoning-intensive applications (requiring comprehensive XDL) to function-oriented tasks (requiring minimal XDL). Clinical reasoning involves cognitive processes where clinician judgment integrates multiple considerations to reach decisions: diagnosis (what disease?), treatment selection (what therapy given patient values?), prognosis (what outcomes?), and risk assessment (how likely is deterioration?). These applications intersect with professional expertise, ethical responsibility, and patient autonomy. For reasoning-intensive AI, explainability is essential because clinicians remain legally and ethically accountable, must integrate recommendations with broader context, need to assess when to rely on versus override DL, and should learn from rather than de-skill through automation.
Clinical function involves operational tasks supporting healthcare delivery without directly engaging clinical reasoning: appointment scheduling, supply chain management, equipment maintenance prediction, image preprocessing, and sensor calibration. These optimize logistics and infrastructure without requiring judgment about individual patient care decisions. For function-oriented AI, comprehensive explainability is unnecessary because accountability is systemic rather than individual (a suboptimal appointment time inconveniences but does not directly harm), performance validation suffices (does scheduling minimize wait times?), and transparency may be a burden without improving outcomes. This distinction enables context-appropriate deployment: comprehensive explanation where reasoning occurs and lighter approaches where the AI tasks are functional and routine, recognizing that universal transparency may misallocate resources by forcing explanation where unnecessary while potentially under-resourcing applications that genuinely require comprehensive transparency.

3.4. Total Digital Health Systems and Distributed Transparency

Ref. [12] extended systemic concepts to Total Digital Health Systems (TDHS)—integrated networks linking electronic health records, diagnostic systems, analytics, and governance structures. They contrasted TDHS with narrow EHR implementations, arguing that systemic intelligence emerges from integration rather than mere digitization [12]. Within TDHS, transparency becomes distributed rather than localized in algorithms. Trust does not derive solely from algorithmic interpretability but from alignment of multiple transparency layers: data provenance (training data sources and biases), model validation (performance assessment and limitations), user interaction (interface support for understanding), and organizational oversight (governance structures and accountability enforcement). This distributed accountability perspective aligns with socio-technical systems theory, emphasizing that responsibility in complex systems is inherently shared across multiple actors rather than concentrated in single points of failure [12,48,49]. Figure 5 visualizes this distributed accountability network.
As seen above, the patient sits at the center as rights holder, surrounded by four key stakeholders (clinician, institution, regulator, developer) with bidirectional accountability relationships. Each stakeholder has specific responsibilities, and arrows indicate feedback loops and mutual dependencies. An explainable DL tool is thus one node in a transparent ecosystem. Even technically opaque models can function within transparent systems if surrounded by robust governance, clear accountability structures, and effective communication protocols. Conversely, technically transparent models can fail systemically when embedded in opaque organizational contexts lacking governance or accountability.

3.5. The Integrated Three-Layer Architecture

Synthesizing GST, socio-technical theory, and Raghupathi’s healthcare-specific concepts yields a three-layer architecture where explainability emerges from coherent interaction rather than residing in individual components. This layered approach to information system design has precedent in healthcare informatics, where [22] articulated multiple levels at which clinical decision support systems must function—technical infrastructure, knowledge management, and organizational workflow—to achieve effective implementation. Similarly, Sittig and Singh (2010) proposed a sociotechnical model for health information technology spanning hardware/software, clinical content, human–computer interface, people, workflow, organizational policies, external rules, and system measurement—all of which must align for successful deployment [49]. Our three-layer XDL architecture adapts these established frameworks to contemporary deep learning challenges, recognizing that explanation systems require simultaneous optimization across technical, cognitive, and organizational dimensions. Figure 6 illustrates this systemic architecture with its critical feedback loops.
The three-layer architecture in Figure 6 shows how explainability emerges from coherent interaction across organizational governance (top), cognitive interface (middle), and technical infrastructure (bottom) layers. Bidirectional arrows show feedback loops enabling continuous learning and improvement.
The technical infrastructure layer provides foundational transparency—interpretable architectures where feasible, feature attribution methods (SHAP, LIME), counterfactual generators, and performance monitoring. This layer answers technical questions: Which features influenced predictions? How confident is the model? What alternative outputs were considered?
The cognitive interface layer transforms technical signals into clinically meaningful explanations. Semantic translation maps technical features to clinical concepts—converting “pixel gradient 0.87” to “consolidation pattern in right lower lobe consistent with pneumonia.” This layer aligns explanations with clinical reasoning patterns, communicates uncertainty appropriately, integrates evidence from medical literature and guidelines, and provides interactive exploration enabling “what-if” queries. The necessity of this translation layer reflects longstanding recognition in medical informatics that technical system outputs must be transformed into forms compatible with clinical cognitive processes [22,49].
The organizational governance layer embeds explainability within institutional structures ensuring accountability and continuous learning. This layer addresses policy compliance (FDA, HIPAA, institutional review boards), establishes accountability frameworks clarifying responsibility distribution, provides ethical oversight through governance committees, enables systematic audit and quality assurance, creates continuous learning mechanisms where clinical experience informs system improvements, and maintains regulatory alignment with external oversight requirements. This governance layer reflects principles articulated in frameworks for clinical decision support oversight and health IT safety [49].
Critically, coherent interaction across layers determines success. Technical transparency without cognitive translation produces meaningless outputs. Cognitive interpretability without organizational governance lacks accountability. Governance without technical foundation becomes performative. These layers interact through feedback loops creating cybernetic self-regulation—clinical use reveals limitations, explanation quality affects adoption, governance policies identify gaps, performance monitoring detects drift, and user feedback shapes interface evolution. This three-layer architecture provides a conceptual model for systemic XDL design. We now evaluate current methods against these systemic criteria.
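As a minimal sketch of the semantic translation performed by the cognitive interface layer, the following toy function maps a technical attribution to clinician-facing language; the region labels, phrasing templates, and threshold are hypothetical placeholders, not a validated translation component.

```python
# Toy semantic translation: technical attribution output -> clinical phrasing.
# The mapping and wording are hypothetical illustrations of the layer's role.
SEMANTIC_MAP = {
    "right_lower_lobe": "consolidation pattern in the right lower lobe",
    "left_upper_lobe": "opacity in the left upper lobe",
}

def translate(region: str, attribution: float) -> str:
    """Render a (region, attribution score) pair as a clinician-facing sentence."""
    finding = SEMANTIC_MAP.get(region, f"an unmapped region ('{region}')")
    strength = "strongly" if attribution > 0.7 else "moderately"
    return (f"The model {strength} weighted a {finding} "
            f"(attribution score {attribution:.2f}).")

print(translate("right_lower_lobe", 0.87))
```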

4. Systemic Analysis of XDL Techniques

Having established the systemic framework, we evaluate major XDL techniques across technical, cognitive, and organizational layers. This analysis reveals why technically sophisticated methods often fail to deliver clinical value: they succeed partially in one layer while inadequately addressing others.

4.1. Saliency Maps: Perceptual Appeal, Systemic Failure

Saliency maps generate intuitive heatmaps highlighting influential image regions. Grad-CAM [17] produces visually appealing overlays showing where models “look” when making predictions.
Technical assessment reveals fundamental instability despite superficial appeal. Ref. [50] demonstrated that adversarial techniques manipulate saliency maps to appear correct while underlying models remain flawed. Ref. [51] showed many gradient methods produce plausible explanations even for randomized models; when explanation methods work for random models, they cannot be trusted for real models. Ref. [23] found saliency methods lack reliability across implementations.
Cognitive assessment reveals dangerous over-interpretation. Ref. [6] demonstrated radiologists shown heatmaps reported increased confidence but made no better decisions than controls: explanations created “illusions of understanding.” Ref. [52] found this pattern represents systematic problems across domains. Without semantic grounding connecting visual attention to clinical concepts, saliency maps offer perceptual plausibility without explanatory depth.
Organizational assessment shows minimal governance value. Ref. [53] documented that biased models produce plausible-looking heatmaps masking spurious correlations. Instability undermines reproducibility required for systematic review.
Systemic verdict: Saliency maps achieve narrow technical success but fail systemically due to instability, shallow semantic grounding, creation of false confidence, and weak organizational integration.
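For reference, a minimal vanilla-gradient saliency sketch in PyTorch is shown below (the gradient of the predicted class score with respect to input pixels); the untrained toy network and random image are placeholders, and Grad-CAM itself additionally weights convolutional feature maps rather than raw pixel gradients.

```python
import torch
import torch.nn as nn

# Placeholder classifier and input; a real application would use a trained
# chest X-ray model and an actual image.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()

image = torch.randn(1, 1, 224, 224, requires_grad=True)  # stand-in X-ray
logits = model(image)
logits[0, logits.argmax()].backward()  # gradient of the top class score

# Absolute input gradients form the saliency heatmap; small input
# perturbations can change such maps, which is one source of the
# instability critiques cited above.
saliency = image.grad.abs().squeeze()  # (224, 224)
print(saliency.shape, float(saliency.max()))
```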

4.2. LIME: Flexibility Without Reliability

LIME’s model-agnostic approach [54] enables explanation of any black box system through local linear approximations, which is valuable in healthcare where diverse model types coexist. Technical assessment reveals fatal inconsistency. Alvarez-Melis and Jaakkola (2018) demonstrated that repeated queries on identical inputs yield dramatically different explanations [55]. Ref. [56] showed LIME can be systematically fooled, enabling adversarial models to mask unethical behavior. Molnar (2020) notes that locality creates fundamental trade-offs between fidelity and interpretability, which is problematic for high-stakes medical contexts [57].
Cognitive assessment shows gaps in medical reasoning alignment. LIME lacks medical semantics, treating all features equivalently without distinguishing clinically meaningful patterns from spurious correlations. It might emphasize a patient’s room number if that feature correlated with outcomes in the training data. Ref. [32] documented that such inconsistency undermines trust in both the explanation and the model.
Organizational assessment reveals governance challenges. Which explanation is authoritative when repeated runs yield different results? Lack of reproducibility complicates regulatory review and undermines third-party verification.
Systemic verdict: LIME achieves partial technical success (model-agnostic flexibility) but fails cognitively (lack of semantic grounding, inconsistency) and organizationally (unreliability for governance).
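The instability documented in ref. [55] is easy to reproduce in outline. The sketch below (assuming the `lime` and `scikit-learn` packages) explains the same synthetic “patient” several times with the same model; the returned feature weights typically differ across runs. The feature names, including the deliberately spurious room_number, are hypothetical.

```python
# Sketch of LIME's run-to-run inconsistency on identical inputs (cf. ref. [55]).
# Synthetic data and feature names are hypothetical illustrations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
feature_names = ["lactate", "temperature", "wbc", "heart_rate", "age", "room_number"]
model = RandomForestClassifier(random_state=0).fit(X, y)

patient = X[0]
for run in range(3):  # same patient, same model; explanations may still differ
    explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                     mode="classification", random_state=run)
    exp = explainer.explain_instance(patient, model.predict_proba, num_features=3)
    print(f"run {run}: {exp.as_list()}")
```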

4.3. SHAP: Mathematical Elegance, Clinical Disconnect

SHAP provides principled feature attribution through Shapley values [33], offering mathematical rigor and consistency: the same input always produces the same explanation. Technical assessment shows substantial achievements. For tabular clinical data, SHAP has become a standard tool. Ref. [33] demonstrated that SHAP provides both local and global explanations. However, Ref. [36] showed that SHAP approximations can be inaccurate for deep networks, and ref. [58] documented sensitivity to the feature correlations common in clinical data.
Cognitive assessment reveals fundamental disconnects. Clinicians reason through pathophysiological narratives integrating mechanisms and temporal dynamics—elements absent from SHAP’s numerical decomposition. Consider sepsis prediction showing lactate: +0.35, temperature: +0.22, white blood cells: +0.18. A clinician’s reasoning: “Elevated lactate suggests tissue hypoperfusion from distributive shock. Combined with fever and leukocytosis, this indicates systemic inflammatory response.” SHAP provides mathematical decomposition without clinical synthesis. Ref. [21] documented systematic misalignment between what SHAP provides and what clinicians need.
Organizational assessment shows SHAP performs better than the alternatives. Consistency supports auditing; global summaries can reveal systematic biases [34]. However, Ref. [18] argues that feature attribution provides insufficient transparency for high-stakes decisions, revealing correlations without establishing clinical validity or causal relationships.
Systemic verdict: SHAP excels technically and provides moderate organizational value but disconnects fundamentally from the cognitive layer. In synthesis, SHAP exemplifies a broader pattern: mathematical sophistication does not automatically translate to clinical utility. The gap between Shapley value decomposition and pathophysiological reasoning reflects deep epistemological differences between statistical and clinical understanding.
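To illustrate what Shapley attribution does and does not provide, the sketch below computes exact Shapley values for a toy additive sepsis risk score; the weights, baseline, and patient values are hypothetical. The output is a numerical decomposition of the kind discussed above, with no pathophysiological narrative attached.

```python
# Illustrative exact Shapley-value decomposition for a toy sepsis risk score,
# in the spirit of SHAP [33]. Risk function, weights, and patient values are
# hypothetical; a single baseline patient stands in for the data mean.
from itertools import combinations
from math import factorial

FEATURES = ["lactate", "temperature", "wbc"]
WEIGHTS = {"lactate": 0.35, "temperature": 0.22, "wbc": 0.18}   # toy linear risk model
BASELINE = {"lactate": 1.0, "temperature": 37.0, "wbc": 8.0}
PATIENT = {"lactate": 4.2, "temperature": 39.1, "wbc": 15.0}

def risk(values: dict) -> float:
    """Toy additive risk score: weighted deviation from baseline."""
    return sum(WEIGHTS[f] * (values[f] - BASELINE[f]) for f in FEATURES)

def coalition_value(subset: tuple) -> float:
    """Patient values for features in the coalition, baseline values elsewhere."""
    merged = {f: (PATIENT[f] if f in subset else BASELINE[f]) for f in FEATURES}
    return risk(merged)

def shapley(feature: str) -> float:
    others = [f for f in FEATURES if f != feature]
    n, total = len(FEATURES), 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (coalition_value(subset + (feature,)) - coalition_value(subset))
    return total

for f in FEATURES:
    print(f"{f}: {shapley(f):+.2f}")   # an attribution, not a pathophysiological narrative
```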

4.4. Emerging Approaches

Several emerging approaches attempt to address the identified limitations, though each faces systemic challenges. Counterfactual explanations [59] answer “what would need to change for a different outcome?”—aligning with clinical reasoning about modifiable risk factors. Ref. [60] notes challenges in generating clinically realistic scenarios that respect physiological constraints. Concept-based explanations [61] define high-level clinical concepts as intermediate representations. However, defining appropriate concept vocabularies requires substantial expertise, and enforcing concept bottlenecks may reduce model performance. Causal reasoning approaches [37,62] attempt to move beyond correlation to identify causal relationships. However, learning valid causal structures from observational data remains extraordinarily challenging, requiring strong untestable assumptions [63]. Natural language explanations through neural text generation [39] could produce human-readable narratives. However, Ref. [40] notes that text generation introduces risks: explanations may sound authoritative while being factually incorrect. While promising, none yet achieves systemic adequacy across technical, cognitive, and organizational layers.
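A counterfactual query of the kind described above can be sketched as a constrained search over modifiable features. The model, feature names, and actionability constraints below are hypothetical; realistic deployments would encode physiological plausibility far more carefully, as ref. [60] notes.

```python
# Minimal counterfactual sketch in the spirit of ref. [59]: the smallest change
# to a modifiable feature that flips the predicted outcome. Model, features,
# and constraints are hypothetical illustrations on standardized synthetic data.
import numpy as np
from typing import Optional
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=4, random_state=1)
features = ["systolic_bp", "hba1c", "ldl", "age"]
modifiable = {"systolic_bp", "hba1c", "ldl"}                  # age is not actionable
model = LogisticRegression().fit(X, y)

def counterfactual(x: np.ndarray, step: float = 0.1, max_change: float = 3.0) -> Optional[tuple]:
    """Greedy single-feature search: smallest standardized change that flips the label."""
    base = model.predict(x.reshape(1, -1))[0]
    for delta in np.arange(step, max_change + step, step):    # try smallest changes first
        for i, name in enumerate(features):
            if name not in modifiable:
                continue                                      # respect clinical constraints
            for signed in (delta, -delta):
                x_cf = x.copy()
                x_cf[i] += signed
                if model.predict(x_cf.reshape(1, -1))[0] != base:
                    return name, round(float(signed), 2)
    return None

print(counterfactual(X[0]))   # e.g., ("ldl", -0.4): lower ldl by 0.4 standardized units
```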
Table 2 summarizes how current XDL methods perform against systemic criteria.
The fundamental issue is as follows: these methods treat explainability as a technical problem rather than a socio-technical challenge. They optimize for computational tractability without systematically addressing clinical reasoning alignment, workflow integration, or organizational governance. This systematic gap explains puzzling empirical findings. Why do sophisticated explanations fail to improve clinical decisions [6,21]? It is because technical transparency does not automatically translate to cognitive utility. Why does user-reported trust increase while decision quality does not [52]? It is because explanations satisfy psychological needs while providing unreliable insight into model behavior. Achieving genuine systemic explainability requires moving beyond individual technical methods toward integrated architectures combining multiple approaches with semantic translation, stakeholder adaptation, and organizational integration—the three-layer ecosystem illustrated in Figure 6.

5. Context as Determinant: Panacea Versus Pandora’s Box

The systemic framework and technical analysis reveal that explainability’s value emerges from context rather than being inherent to methods. Identical approaches generate opposite outcomes depending on application type, decision stakes, time constraints, user characteristics, and organizational integration.
Having established that XDL methods exhibit systematic gaps between technical capability and clinical utility, we examine the central question: When does explainability function as a beneficial panacea versus a harmful Pandora’s box?
Figure 7 provides the conceptual framework for understanding this context-dependence.
The 2 × 2 matrix in Figure 7 maps when explainability helps versus harms along two dimensions: stakes (high/low) and time criticality (deliberative/emergency). High-stakes deliberative contexts require comprehensive XDL (panacea), while high-stakes emergencies need minimal XDL during events (Pandora’s box if misapplied).

5.1. The Reasoning–Function Distinction: A Foundational Framework

Raghupathi’s (2007) distinction between clinical reasoning and clinical function provides the foundation for context-appropriate XDL deployment. This distinction cuts to the heart of understanding when explainability becomes essential versus when it becomes unnecessary or even counterproductive [10].
Reasoning-intensive AI supports cognitive processes where clinician judgment integrates multiple considerations to reach decisions: diagnosis (what disease does this patient have?), treatment selection (what therapy is most appropriate given this patient’s values and circumstances?), prognosis (what outcomes should we expect?), and risk assessment (how likely is deterioration?). These applications intersect with professional expertise, ethical responsibility, and patient autonomy. Clinical reasoning is inherently interpretive, values-dependent, and requires integration with broader patient context that algorithms cannot fully capture [10,11,12].
For reasoning-intensive AI, explainability is essential because it enables several critical functions: professional accountability (clinicians remain legally and ethically responsible for decisions), cognitive integration (recommendations must be evaluated against clinical knowledge and patient-specific considerations), trust calibration (clinicians need to assess when to rely on AI versus when to override it), learning and skill maintenance (transparency enables clinicians to learn from AI rather than deskilling through automation), and informed consent (patients have rights to understand how AI influenced their care) [10,11,12].
Function-oriented AI performs operational tasks supporting healthcare delivery without directly engaging clinical reasoning: appointment scheduling optimizing resource utilization, supply chain management maintaining inventory, equipment maintenance prediction preventing failures, medical image quality enhancement preprocessing raw data, and sensor calibration ensuring measurement accuracy. These applications optimize logistics and infrastructure without requiring clinical judgment about individual patient care decisions [10,11,12].
For function-oriented AI, comprehensive explainability is unnecessary and potentially counterproductive because accountability operates differently: accountability is systemic (responsibility lies with organizational processes rather than individual clinical decisions), validation suffices (performance monitoring matters more than instance-level explanations), transparency may burden (detailed explanations add cognitive load without improving outcomes), and trust derives from reliability (consistent accurate performance builds confidence more than explanations) [10,11].
This distinction avoids the universal transparency trap. Not all AI requires explainability—only reasoning-intensive systems do. Function-oriented systems require different accountability mechanisms: performance monitoring, quality assurance, organizational oversight, and incident response for systematic failures. Mandating explanations for functional AI wastes resources that could be better deployed improving high-stakes reasoning applications.
Table 3 operationalizes this reasoning–function distinction into a risk-stratified framework that provides practical guidance for organizations deploying healthcare AI. The classification employs three defining indicators: (1) Clinical decision authority—whether AI outputs directly determine patient care (high-stakes), support clinician decisions (medium-stakes), or optimize operations without patient-level impact (low-stakes); (2) Reversibility—whether errors are potentially irreversible (high-stakes), correctable with intervention (medium-stakes), or cause only operational inconvenience (low-stakes); (3) Regulatory designation—alignment with FDA risk classifications and EU AI Act categories.
This risk-stratified framework recognizes that different applications require fundamentally different approaches to transparency and accountability. High-stakes reasoning AI demands comprehensive explainability across all three systemic layers (technical, cognitive, organizational) as illustrated in Figure 6. Medium-stakes augmentation AI benefits from contextual transparency—enough information to support appropriate reliance without overwhelming users. Low-stakes operational AI requires minimal explanation, focusing instead on performance validation and systemic oversight.
Critically, time-criticality modifies these categories. Even high-stakes reasoning applications should prioritize action over explanation during emergencies, deferring comprehensive analysis to post-event learning.

5.2. Synthesizing Dual Nature: Dynamic Equilibrium

Table 4 synthesizes how XDL manifests as panacea, Pandora’s box, or balanced systemic intervention depending on design and deployment choices.
This synthesis reveals that explainability is neither inherently beneficial nor harmful—it is a managed intervention requiring continuous rebalancing. As illustrated throughout our analysis, the same technical methods produce opposite outcomes depending on systemic alignment. The key is recognizing XDL as a living property of socio-technical ecosystems rather than a static technical feature. From this systemic perspective, explainability functions as follows:
- Panacea when co-designed with clinicians, governed ethically, aligned with feedback loops, proportionate to risk, and embedded across all three systemic layers (Figure 6).
- Pandora’s box when applied as a static, technical, one-size-fits-all add-on without workflow integration, cognitive alignment, or organizational governance.
- Systemic intervention when embedded across technical, cognitive, and organizational layers as a self-correcting, learning subsystem that enhances trust and resilience through continuous adaptation.
The cancer treatment case exemplifies panacea manifestation—comprehensive explanation enabling informed consent, shared decision-making, and value-aligned care. The cardiac arrest case exemplifies Pandora’s box—comprehensive explanation during an emergency causing dangerous delays and cognitive overload. The difference lies not in the explanation quality but in systemic alignment with context. In essence, explainability is not a cure-all for AI in healthcare or medicine; it is a systemic commitment to transparency, context, and co-evolution between human and machine intelligence. When governed holistically, as illustrated in our three-layer architecture and risk-stratified framework, it becomes the panacea that keeps Pandora’s box closed. When imposed superficially without systemic thinking, it opens that box, releasing complexity that obscures rather than illuminates.

6. Implementation as Systemic Intervention

Understanding explainability as a context-dependent systemic property rather than a universal requirement fundamentally changes the implementation approach. Organizations cannot simply “add explainability” to AI systems as a technical feature. They must design integrated architectures spanning technical, cognitive, and organizational dimensions while assessing each application to determine appropriate transparency levels. Figure 8 provides a simplified decision framework for determining appropriate XDL levels.
This decision tree provides systematic guidance for determining appropriate explainability levels. Starting with whether AI is reasoning-intensive, the framework branches through stakes assessment and time-criticality evaluation to recommend minimal, contextual, comprehensive, or emergency-appropriate XDL approaches.
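The branching logic of this decision framework can be expressed compactly. The sketch below paraphrases the criteria of Figure 8 and Table 3; the enumeration names and branch ordering are our shorthand, not a validated instrument.

```python
# Sketch of the XDL deployment decision logic described for Figure 8.
# Branch criteria are paraphrased from Table 3; names are illustrative shorthand.
from dataclasses import dataclass
from enum import Enum

class XDLLevel(Enum):
    MINIMAL = "performance validation and systemic oversight only"
    CONTEXTUAL = "summary explanations, confidence indicators, simplified reasoning"
    COMPREHENSIVE = "feature attribution, counterfactuals, uncertainty, evidence linkage"
    EMERGENCY_DEFERRED = "act first; comprehensive explanation deferred to post-event review"

@dataclass
class Application:
    reasoning_intensive: bool   # supports diagnosis, treatment, prognosis, or risk assessment
    high_stakes: bool           # direct decision authority, potentially irreversible errors
    time_critical: bool         # e.g., cardiac arrest response

def recommend_xdl(app: Application) -> XDLLevel:
    if not app.reasoning_intensive:
        return XDLLevel.MINIMAL                     # function-oriented AI
    if app.time_critical:
        return XDLLevel.EMERGENCY_DEFERRED          # prioritize action over explanation
    return XDLLevel.COMPREHENSIVE if app.high_stakes else XDLLevel.CONTEXTUAL

print(recommend_xdl(Application(True, True, False)))    # e.g., cancer treatment planning
print(recommend_xdl(Application(True, True, True)))     # e.g., cardiac arrest
print(recommend_xdl(Application(False, False, False)))  # e.g., appointment scheduling
```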

6.1. Soft Systems Methodology for XDL Implementation

Before determining appropriate explainability levels, organizations should employ Soft Systems Methodology (SSM)—a participatory, iterative approach from [12] that treats XDL implementation as ongoing systemic intervention rather than one-time technical deployment. Table 5 shows how SSM adapts to explainability design.
This iterative methodology embodies systemic principles by embracing softness (recognizing XDL as interpretive challenge requiring stakeholder participation), maintaining openness (adapting continuously based on clinical experience and evolving needs), managing complexity (addressing technical, cognitive, organizational dimensions simultaneously), enabling flexibility (adjusting approach based on specific organizational context and readiness), and supporting generality (creating frameworks applicable across multiple AI applications).
The SSM approach prevents the common failure mode of treating explainability as a purely technical problem amenable to one-time solution. Instead, it positions XDL as an ongoing socio-technical practice requiring continuous negotiation among stakeholders, iterative refinement based on feedback, and systematic evaluation against multiple criteria beyond technical performance.
Organizations should cycle through these SSM steps for each AI application (following the decision framework in Figure 8), recognizing that appropriate XDL design emerges from participatory process rather than from prescriptive guidelines alone.

6.2. Determining Appropriate Explainability Levels

The reasoning–function distinction provides operational guidance for context-appropriate deployment. Organizations should begin by classifying each AI application using the framework in Figure 4 and Table 3: reasoning-intensive AI supporting diagnosis, treatment selection, prognosis, and risk assessment demands comprehensive explainability because it intersects with clinical judgment, requiring professional accountability, cognitive integration with existing knowledge, trust calibration enabling appropriate reliance, and learning supporting skill maintenance. Examples include cancer treatment recommenders, diagnostic classification systems, drug interaction predictors, surgical planning tools, and genetic counseling systems.
Function-oriented AI performing scheduling, resource allocation, signal processing, image preprocessing, and sensor calibration requires performance validation and organizational monitoring but not instance-level explanations. Examples include appointment scheduling optimizers, supply chain management systems, equipment maintenance predictors, medical image quality enhancement algorithms, and vital sign sensor calibration routines.
Medium-stakes augmentation AI supporting workflow without direct decision authority—triage, prioritization, early warning—falls between these extremes, requiring contextual transparency: summary explanations, confidence indicators, performance transparency, and simplified reasoning without comprehensive detail overwhelming users.
Time-criticality modifies these categories. Even high-stakes reasoning applications should prioritize action over explanation during emergencies, deferring comprehensive analysis to post-event learning. This temporal dimension is often overlooked but proves critical for appropriate deployment, as illustrated in Figure 7.

6.3. Limitations

Even optimally designed XDL faces limitations requiring honest acknowledgment rather than overpromising. Organizations should recognize and communicate these constraints. Epistemological limits: Post hoc explanations approximate rather than perfectly represent model reasoning. The map is not the territory: explanations are interpretive reconstructions, not complete representations of internal computational processes. This gap between explanation and reality means explanations are always partly narrative fiction serving communication purposes rather than literal truth about model cognition.
Complexity irreducibility: Some patterns genuinely exceed human comprehension. Deep networks discovering valid high-dimensional relationships in data spaces humans cannot visualize may resist intelligible explanation. We face fundamental trade-offs between model performance (benefiting from complexity) and human interpretability (requiring simplification). Organizations should acknowledge when full explanation is infeasible while focusing on actionable insight rather than complete transparency.
Resource constraints: Comprehensive XDL demands significant investment in technical infrastructure, semantic translation systems, organizational governance, and training programs. Not all organizations possess necessary resources. Implementation should be phased and prioritized—focusing first on high-stakes reasoning applications where explanation adds most value, using simpler validation for function-oriented systems.
Adversarial exploitation: Bad actors can game explanation systems. Models can be adversarially trained to produce desired explanations (appearing to weight appropriate features) while maintaining biased predictions through complex interactions explanations do not capture. This “explanation theater” satisfies transparency requirements superficially without genuine accountability. Organizations should implement independent validation beyond self-reported explanations and use multiple explanation methods to detect inconsistencies.
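One simple multi-method consistency check is to compare how two attribution methods rank the same features for the same prediction and to flag large disagreements for independent audit. The attribution vectors and the agreement threshold below are hypothetical illustrations, not calibrated standards.

```python
# Sketch of a multi-method consistency check: if two attribution methods rank
# features very differently for the same prediction, flag the explanation for
# independent review. Vectors below are hypothetical stand-ins for, e.g., SHAP
# and LIME outputs on the same patient.
from scipy.stats import spearmanr

features = ["lactate", "temperature", "wbc", "heart_rate", "age"]
attr_method_a = [0.35, 0.22, 0.18, 0.05, 0.02]   # e.g., SHAP values
attr_method_b = [0.05, 0.30, 0.20, 0.33, 0.01]   # e.g., LIME weights

rho, _ = spearmanr(attr_method_a, attr_method_b)
if rho < 0.6:   # threshold is an illustrative governance choice, not a standard
    print(f"Rank agreement {rho:.2f}: explanations disagree; route to independent audit")
else:
    print(f"Rank agreement {rho:.2f}: explanations broadly consistent")
```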
Cultural resistance: Some clinicians may resist AI transparency, preferring autonomous judgment without algorithmic input. Others may demand more explanation than is technically feasible or clinically useful. Managing these divergent preferences requires ongoing negotiation, stakeholder engagement, education about capabilities and limitations, and psychological safety for questioning AI outputs.
Organizations should clearly communicate that explanations are interpretive tools rather than ground truth, focus on actionable insight rather than complete transparency, accept performance-interpretability trade-offs in some contexts, and maintain human oversight resistant to explanation manipulation. This honest acknowledgment of limitations builds more sustainable trust than unrealistic promises.
Deployment architecture adds a further constraint. Federated learning enables model training across distributed institutions without centralizing patient data, addressing privacy while improving generalizability; however, federated architectures require explanations that account for institutional heterogeneity. Recent advances in federated fault diagnosis demonstrate that balance recovery and collaborative adaptation approaches can address inconsistencies across distributed systems, suggesting pathways for maintaining explanation quality in federated healthcare AI.

6.4. Mitigation Strategies

To address identified limitations, organizations should implement targeted strategies: for epistemological limits, deploy ensemble explanation methods with explicit uncertainty quantification. For complexity irreducibility, develop tiered explanation interfaces with drill-down capability. For resource constraints, implement phased deployment prioritizing high-stakes applications. For adversarial exploitation, establish independent auditing and multi-method consistency checks. For cultural resistance, conduct stakeholder engagement from inception.

7. Conclusions

This analysis began with the question: Is explainability in deep learning for healthcare a panacea or a Pandora’s box? Through systemic analysis integrating three decades of healthcare informatics scholarship with contemporary XDL research and concrete clinical use cases, we arrive at a clear answer: the dichotomy is false. Explainability is neither universally beneficial nor inevitably harmful. It is a dynamic equilibrium, a living property of socio-technical ecosystems requiring continuous rebalancing. Its value emerges from systemic alignment: coherence between explanation methods, clinical reasoning patterns, workflow requirements, and organizational governance. Whether explainability heals or harms depends fundamentally on context, design quality, and integration within complex socio-technical systems.

7.1. Key Findings

Six key findings emerge from our analysis.
First, context determines outcome. XDL functions as panacea when applied to deliberative, high-stakes reasoning tasks; designed systemically across technical, cognitive, and organizational layers; tailored to stakeholder needs; and embedded within feedback loops enabling continuous learning. It becomes Pandora’s box when forced onto time-critical emergencies, applied superficially without integration, or provided uniformly without context adaptation.
Second, current methods exhibit systematic gaps. Major XDL techniques achieve partial technical success but inadequately address cognitive and organizational layers. None fully embodies Raghupathi’s five systemic properties. Genuine systemic explainability requires integrated architectures combining multiple approaches with semantic translation, stakeholder adaptation, and organizational governance—the three-layer ecosystem.
Third, only the subset of healthcare AI applications involving clinical reasoning—where AI influences diagnosis, treatment, and prognosis—requires comprehensive transparency. The remaining applications benefit more from performance validation and organizational oversight than from comprehensive instance-level explanations. Figure 4 illustrates this reasoning–function spectrum.
Fourth, XDL is a socio-technical challenge requiring joint optimization across technical transparency, cognitive interpretability, and organizational accountability. Success demands a three-layer architecture with coherent interaction. Trustworthiness emerges from distributed transparency across interacting layers rather than from any single component.
Fifth, policy must embrace systemic thinking through distributed accountability models.
Sixth, XDL embodies dynamic equilibrium requiring continuous rebalancing. It manifests differently depending on design, deployment, and governance. Organizations should employ structured approaches like Soft Systems Methodology to manage XDL as iterative socio-technical practice.

7.2. Theoretical Contributions

This work integrates general systems theory with XDL scholarship, demonstrating that GST provides powerful analytical tools for understanding transparency in healthcare AI. The three-layer systemic architecture (Figure 6) offers a conceptual model for future research. Extension of Raghupathi’s framework shows how concepts developed for clinical decision support apply to contemporary deep learning challenges. The five systemic properties (Figure 3) provide evaluative criteria exposing where current methods fall short. The reasoning–function distinction (Figure 4) offers operational guidance for context-appropriate deployment. The panacea–Pandora’s box framework captures the radically different outcomes explainability produces depending on context and implementation quality. This framework, supported by concrete clinical cases and the accompanying figures and tables, enables practitioners to predict when XDL will help versus harm.

7.3. Implications for Practice and Policy

The systemic perspective yields several practical insights. Organizations should recognize that explainability is not a universal requirement but a context-dependent property. The reasoning–function distinction (Figure 4) and risk-stratification framework (Table 3) provide guidance for determining when comprehensive transparency adds value versus when lighter approaches suffice. Achieving systemic explainability requires moving beyond technical optimization toward integrated design across all three layers (Figure 6), demanding collaboration among AI developers, clinicians, and organizational leadership. Time-criticality fundamentally changes explainability requirements: what functions as panacea in deliberative contexts becomes Pandora’s box in emergencies. Static explanation systems cannot adapt to evolving clinical needs, emerging evidence, or changing practices. Organizations should view XDL as ongoing practice requiring sustained attention rather than one-time implementation. Distributed accountability frameworks (Figure 5) recognize that effective governance requires network models where multiple stakeholders share responsibility through mutual reinforcement and redundancy.

7.4. Future Directions

This analysis has limitations requiring future research. The systemic framework requires empirical validation through large-scale implementation studies and randomized controlled trials. Our risk stratification framework needs refinement for specific clinical domains and organizational contexts. The focus on clinical care settings means other applications may present different challenges. Future research should pursue empirical validation through multi-site studies measuring XDL impact on clinical outcomes; domain-specific refinement developing detailed guidelines for major specialties; causal explanation methods bridging statistical patterns and mechanistic understanding; patient-centered XDL designs addressing diverse information needs; and global health applications extending analysis to resource-constrained settings.
The debate around explainability often presents false dichotomies: transparency versus performance, innovation versus accountability, and human judgment versus algorithmic decision-making. The systemic perspective reveals these as artificial oppositions. Properly understood, explainability is not opposed to performance but complementary when appropriately deployed. Technical sophistication and clinical utility can align when systems are designed with cognitive and organizational integration from inception. Innovation and accountability are synergistic when transparent systems enable faster learning through feedback loops.
The question is not whether AI should be explainable, but when, how, and for whom. By applying systemic thinking—treating healthcare AI as complex socio-technical systems characterized by emergence, feedback, and contextual adaptation—we move beyond simplistic universalism toward nuanced wisdom.
Explainability, properly applied as proportionate systemic intervention, is indeed a panacea: it heals the trust deficit, enables integration of machine pattern recognition with human judgment, supports learning, and maintains accountability. But it requires careful calibration and appropriate indication. Applied indiscriminately without contextual wisdom, it becomes Pandora’s box—cognitive overload, analysis paralysis, workflow disruption, and resource diversion. The same methods that enable understanding in deliberative contexts create confusion in time-critical emergencies.
The path forward requires systemic maturity—recognizing that transparency is an emergent property of well-designed socio-technical systems rather than a technical feature to add. It demands contextual wisdom—understanding that explainability requirements must flex with application type, decision stakes, and time constraints. It necessitates distributed accountability—acknowledging that trust arises from alignment of multiple actors rather than from any single component. It calls for continuous rebalancing—treating explainability as dynamic equilibrium requiring ongoing adjustment as technology evolves and understanding deepens.
Understanding XDL as dynamic equilibrium rather than static property fundamentally changes how we approach design, implementation, and governance. Three decades of systemic scholarship converge on this insight: complex domains like healthcare require open, adaptive, feedback-rich architectures where transparency is structural property rather than add-on feature. Explainability is not a cure-all for AI in healthcare and medicine; it is a systemic commitment to transparency, context, and co-evolution between human and machine intelligence.

Funding

This research received no funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. European Parliament and Council of the European Union. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Off. J. Eur. Union 2024, 1689, 1–144. [Google Scholar]
  2. World Health Organization. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance; World Health Organization: Geneva, Switzerland, 2021. [Google Scholar]
  3. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118, Erratum in Nature 2017, 546, 686. [Google Scholar] [CrossRef]
  4. McKinney, S.M.; Sieniek, M.; Godbole, V.; Godwin, J.; Antropova, N.; Ashrafian, H.; Back, T.; Chesus, M.; Corrado, G.S.; Darzi, A.; et al. International evaluation of an AI system for breast cancer screening. Nature 2020, 577, 89–94. [Google Scholar] [CrossRef]
  5. Campanella, G.; Hanna, M.G.; Geneslaw, L.; Miraflor, A.; Silva, V.W.K.; Busam, K.J.; Brogi, E.; Reuter, V.E.; Klimstra, D.S.; Fuchs, T.J. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 2019, 25, 1301–1309. [Google Scholar] [CrossRef] [PubMed]
  6. Ghassemi, M.; Oakden-Rayner, L.; Beam, A.L. The false hope of current approaches to explainable artificial intelligence in health care. Lancet Digit. Health 2021, 3, e745–e750. [Google Scholar] [CrossRef] [PubMed]
  7. Jacobs, M.; Pradier, M.F.; McCoy, T.H.; Perlis, R.H.; Doshi-Velez, F.; Gajos, K.Z. How machine-learning recommendations influence clinician treatment selections: The example of antidepressant selection. Transl. Psychiatry 2021, 11, 108. [Google Scholar] [CrossRef]
  8. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: Contextualizing explainable machine learning for clinical end use. In Proceedings of the Machine Learning for Healthcare Conference, Ann Arbor, MI, USA, 8–10 August 2019; pp. 359–380. [Google Scholar]
  9. Boulding, K.E. General systems theory—The skeleton of science. Manag. Sci. 1956, 2, 197–208. [Google Scholar] [CrossRef]
  10. Raghupathi, W. Designing clinical decision support systems in health care: A systemic view. Int. J. Healthc. Inf. Syst. Inform. 2007, 2, 44–53. [Google Scholar] [CrossRef]
  11. Raghupathi, W.; Schkade, L.L. Designing artificial intelligence applications in law: A systemic view. Syst. Pract. 1992, 5, 61–78. [Google Scholar] [CrossRef]
  12. Raghupathi, W.; Kesh, S. Designing electronic health records versus total digital health systems. Syst. Res. Behav. Sci. 2008, 25, 571–589. [Google Scholar] [CrossRef]
  13. Singh, A.; Sengupta, S.; Lakshminarayanan, V. Explainable deep learning models in medical image analysis. J. Imaging 2020, 6, 52. [Google Scholar] [CrossRef]
  14. Amann, J.; Blasimme, A.; Vayena, E.; Frey, D.; Madai, V.I. Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Med. Inform. Decis. Mak. 2020, 20, 310. [Google Scholar] [CrossRef] [PubMed]
  15. Loh, H.W.; Ooi, C.P.; Seoni, S.; Barua, P.D.; Molinari, F.; Acharya, U.R. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 2022, 226, 107161. [Google Scholar] [CrossRef] [PubMed]
  16. Bellucci, M.; Delestre, N.; Malandain, N.; Zanni-Merk, C. Towards a terminology for a fully contextualized XAI. In Proceedings of the International Conference on Knowledge-Based Intelligent Information & Engineering Systems, Szczecin, Poland, 8–10 September 2021. [Google Scholar]
  17. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  18. Jin, D.; Sergeeva, E.; Weng, W.H.; Chauhan, G.; Szolovits, P. Explainable deep learning in healthcare: A methodological survey from an attribution view. WIREs Mech. Dis. 2022, 14, e1548. [Google Scholar] [CrossRef]
  19. Cutillo, C.M.; Sharma, K.R.; Foschini, L.; Kunber, S.; Mackintosh, M.; Mandl, K.D. Machine intelligence in healthcare—Perspectives on trustworthiness, explainability, usability, and transparency. npj Digit. Med. 2020, 3, 47. [Google Scholar] [CrossRef] [PubMed]
  20. Price, W.N. Medical malpractice and black-box medicine. In Big Data, Health Law, and Bioethics; Glenn Cohen, I., Fernandez Lynch, H., Vayena, E., Gasser, U., Eds.; Cambridge University Press: Cambridge, UK, 2018; pp. 295–306. [Google Scholar]
  21. Chari, S.; Chakraborty, P.; Ghalwash, M.; Seneviratne, O.; Eyigoz, E.K.; Gruen, D.M.; Saiz, F.S.; Chen, C.H.; Rojas, P.M.; McGuinness, D.L. Leveraging Clinical Context for User-Centered Explainability: A Diabetes Use Case. arXiv 2021, arXiv:2107.02359. [Google Scholar] [CrossRef]
  22. Bates, D.W.; Kuperman, G.J.; Wang, S.; Gandhi, T.; Kittler, A.; Volk, L.; Spurr, C.; Khorasani, R.; Tanasijevic, M.; Middleton, B. Ten commandments for effective clinical decision support: Making the practice of evidence-based medicine a reality. J. Am. Med. Inform. Assoc. 2003, 10, 523–530. [Google Scholar] [CrossRef]
  23. Kindermans, P.J.; Hooker, S.; Adebayo, J.; Alber, M.; Schütt, K.T.; Dähne, S.; Erhan, D.; Kim, B. The (un)reliability of saliency methods. In Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Springer: Berlin/Heidelberg, Germany, 2019; pp. 267–280. [Google Scholar]
  24. He, J.; Baxter, S.L.; Xu, J.; Xu, J.; Zhou, X.; Zhang, K. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 2019, 25, 30–36. [Google Scholar] [CrossRef]
  25. Wiens, J.; Saria, S.; Sendak, M.; Ghassemi, M.; Liu, V.X.; Doshi-Velez, F.; Jung, K.; Heller, K.; Kale, D.; Saeed, M.; et al. Do no harm: A roadmap for responsible machine learning for health care. Nat. Med. 2019, 25, 1337–1340. [Google Scholar] [CrossRef]
  26. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable ai: A review of machine learning interpretability methods. Entropy 2021, 23, 18. [Google Scholar] [CrossRef]
  27. Minh, D.; Wang, H.X.; Li, Y.F.; Nguyen, T.N. Explainable artificial intelligence: A comprehensive review. Artif. Intell. Rev. 2022, 55, 3503–3568. [Google Scholar]
  28. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef] [PubMed]
  29. Samek, W.; Montavon, G.; Lapuschkin, S.; Anders, C.J.; Müller, K.R. Explaining deep neural networks and beyond: A review of methods and applications. Proc. IEEE 2021, 109, 247–278. [Google Scholar] [CrossRef]
  30. Lipton, Z.C. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 2018, 16, 31–57. [Google Scholar] [CrossRef]
  31. Tjoa, E.; Guan, C. A survey on explainable artificial intelligence (xai): Toward medical xai. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4793–4813. [Google Scholar] [CrossRef] [PubMed]
  32. Langer, M.; Oster, D.; Speith, T.; Hermanns, H.; Kästner, L.; Schmidt, E.; Sesing, A.; Baum, K. What do we want from explainable artificial intelligence (XAI)?—A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artif. Intell. 2021, 296, 103473. [Google Scholar]
  33. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4765–4774. [Google Scholar]
  34. Rajkomar, A.; Hardt, M.; Howell, M.D.; Corrado, G.; Chin, M.H. Ensuring fairness in machine learning to advance health equity. Ann. Intern. Med. 2018, 169, 866–872. [Google Scholar] [CrossRef]
  35. FDA. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices; U.S. Food and Drug Administration: Silver Spring, MD, USA, 2022. [Google Scholar]
  36. Kumar, I.E.; Venkatasubramanian, S.; Scheidegger, C.; Friedler, S. Problems with Shapley-value-based explanations as feature importance measures. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5491–5500. [Google Scholar]
  37. Prosperi, M.; Guo, Y.; Sperrin, M.; Koopman, J.S.; Min, J.S.; He, X.; Rich, S.; Wang, M.; Buchan, I.E.; Bian, J. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat. Mach. Intell. 2020, 2, 369–375. [Google Scholar] [CrossRef]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
  39. Mullenbach, J.; Wiegreffe, S.; Duke, J.; Sun, J.; Eisenstein, J. Explainable prediction of medical codes from clinical text. arXiv 2018, arXiv:1802.05695. [Google Scholar] [CrossRef]
  40. Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Virtual, 3–10 March 2021; pp. 610–623. [Google Scholar]
  41. Jain, S.; Wallace, B.C. Attention is not explanation. arXiv 2019, arXiv:1902.10186. [Google Scholar]
  42. Voita, E.; Talbot, D.; Moiseev, F.; Sennrich, R.; Titov, I. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv 2019, arXiv:1905.09418. [Google Scholar]
  43. Rogers, A.; Kovaleva, O.; Rumshisky, A. A primer in BERTology: What we know about how BERT works. Trans. Assoc. Comput. Linguist. 2021, 8, 842–866. [Google Scholar] [CrossRef]
  44. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 3319–3328. [Google Scholar]
  45. Jacovi, A.; Goldberg, Y. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? arXiv 2020, arXiv:2004.03685. [Google Scholar] [CrossRef]
  46. von Bertalanffy, L. General System Theory: Foundations, Development, Applications; George Braziller: New York, NY, USA, 1968. [Google Scholar]
  47. Trist, E.L.; Bamforth, K.W. Some social and psychological consequences of the longwall method of coal-getting. Hum. Relat. 1951, 4, 3–38. [Google Scholar] [CrossRef]
  48. Pasmore, W.A. Designing Effective Organizations: The Sociotechnical Systems Perspective; Wiley: Hoboken, NJ, USA, 1988. [Google Scholar]
  49. Sittig, D.F.; Singh, H. A new sociotechnical model for studying health information technology in complex adaptive healthcare systems. Qual. Saf. Health Care 2010, 19, i68–i74. [Google Scholar] [CrossRef] [PubMed]
  50. Ghorbani, A.; Abid, A.; Zou, J. Interpretation of neural networks is fragile. Proc. AAAI Conf. Artif. Intell. 2019, 33, 3681–3688. [Google Scholar] [CrossRef]
  51. Adebayo, J.; Gilmer, J.; Muelly, M.; Goodfellow, I.; Hardt, M.; Kim, B. Sanity checks for saliency maps. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Volume 31, pp. 9505–9515. [Google Scholar]
  52. Kaur, H.; Nori, H.; Jenkins, S.; Caruana, R.; Wallach, H.; Wortman Vaughan, J. Interpreting interpretability: Understanding data scientists’ use of interpretability tools for machine learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020. [Google Scholar]
  53. Oakden-Rayner, L.; Dunnmon, J.; Carneiro, G.; Ré, C. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. In Proceedings of the ACM Conference on Health, Inference, and Learning, Toronto, ON, Canada, 2–4 April 2020; Association for Computing Machinery: New York, NY, USA; pp. 151–159. [Google Scholar]
  54. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA; pp. 1135–1144. [Google Scholar]
  55. Alvarez-Melis, D.; Jaakkola, T.S. On the robustness of interpretability methods. arXiv 2018, arXiv:1806.08049. [Google Scholar] [CrossRef]
  56. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–8 February 2020; Association for Computing Machinery: New York, NY, USA; pp. 180–186. [Google Scholar]
  57. Molnar, C.; Casalicchio, G.; Bischl, B. Interpretable machine learning—A brief history, state-of-the-art and challenges. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Vilnius, Lithuania, 8–12 September 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 417–431. [Google Scholar]
  58. Chen, H.; Lundberg, S.; Lee, S.I. Explaining models by propagating Shapley values of local components. In Explainable AI in Healthcare and Medicine: Building a Culture of Transparency and Accountability; Springer International Publishing: Cham, Switzerland, 2020; pp. 261–270. [Google Scholar]
  59. Wachter, S.; Mittelstadt, B.; Russell, C. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. J. Law Technol. 2017, 31, 841. [Google Scholar] [CrossRef]
  60. Verma, S.; Dickerson, J.; Hines, K. Counterfactual explanations for machine learning: A review. arXiv 2020, arXiv:2010.10596. [Google Scholar]
  61. Koh, P.W.; Nguyen, T.; Tang, Y.S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. Concept bottleneck models. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5338–5348. [Google Scholar]
  62. Pearl, J.; Mackenzie, D. The Book of Why: The New Science of Cause and Effect; Basic Books: New York, NY, USA, 2018. [Google Scholar]
  63. Hernán, M.; Robins, J.M. Causal Inference: What If; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020. [Google Scholar]
Figure 1. Transformer architecture for healthcare NLP.
Figure 2. The explainability problem in transformers.
Figure 3. Raghupathi’s five systemic properties for healthcare information systems.
Figure 4. The reasoning–function spectrum.
Figure 5. The distributed accountability network.
Figure 6. The systemic explainability ecosystem.
Figure 7. The context-dependency matrix.
Figure 8. XDL deployment decision framework.
Table 1. General systems theory applied to XDL design.
GST Principle | Healthcare AI Implication | XDL Design Requirement
Emergence | Explanation utility arises from technical-clinical-organizational interaction | Design across all three layers simultaneously
Feedback | Static solutions become obsolete; continuous adaptation required | Build monitoring and refinement mechanisms
Openness | Medical knowledge and practice evolve | Link explanations to current evidence and guidelines
Requisite Variety | Different applications have different transparency needs | Risk-stratified approaches matching context
Equifinality | Multiple valid explanation approaches exist | Technical pluralism combining complementary methods
Table 2. Systemic adequacy of major XDL methods.
Method | Technical | Cognitive | Organizational | Systemic Grade
Saliency Maps | Moderate | Poor | Poor | D: Failure
LIME | Moderate | Poor | Poor | D: Failure
SHAP | Good | Poor | Moderate | C: Partial Success
Attention | Moderate | Moderate | Moderate | C+: Partial Success
Table 3. Risk-stratified framework for XDL requirements.
Category | Characteristics | Examples | XDL Requirements | Regulatory Approach
High-Stakes Reasoning | Direct diagnosis/treatment impact; Requires clinician judgment; Significant individual outcomes; Professional accountability central | Cancer treatment selection; Diagnostic classification; Drug interaction prediction; Surgical planning; Genetic counseling | Comprehensive XDL: Feature attribution; Counterfactuals; Uncertainty quantification; Evidence linkage; Alternative presentation; Bias disclosure | Stringent pre-market approval, continuous surveillance, mandatory incident reporting, regular audits
Medium-Stakes Augmentation | Supports workflow without decision authority; Prioritization/triage/early warning; Clinician oversight required; Errors recoverable | Radiology worklist prioritization; ICU deterioration warnings; Fall risk screening; Readmission prediction; Lab result flagging | Contextual XDL: Summary explanations; Confidence indicators; Performance transparency; Threshold justification; Simplified counterfactuals | Moderate oversight, performance-based validation, user feedback integration, bias monitoring
Low-Stakes Operational | Supports operational efficiency; No direct clinical reasoning; Minimal individual outcome impact; Errors cause inconvenience, not harm | Appointment scheduling; Supply chain management; Equipment maintenance; Staff scheduling; Image preprocessing; Sensor calibration | Minimal XDL: Performance metrics; Anomaly detection; System documentation; Failure mode analysis | Light-touch oversight, self-certification, incident reporting for systematic failures only
Table 4. Explainability as dynamic equilibrium.
Aspect | Panacea Manifestation | Pandora’s Box Manifestation | Systemic Balance
Trust | Builds appropriate confidence through understanding | Creates false security through illusions of understanding | Co-validated feedback and audit cycles ensuring reliability
Complexity | Clarifies AI reasoning through semantic translation | Adds interpretive burden through cognitive overload | Role-filtered interfaces with adaptive control
Accountability | Clarifies responsibility distribution | Blurs liability through explanation theater | Shared governance frameworks
Ethics | Promotes fairness through bias detection | Exposes privacy risks and discrimination | Controlled openness with contextual disclosure
Learning | Reinforces model improvement through feedback | Risks instability and alert fatigue | Iterative evaluation via managed feedback loops
Workflow | Supports clinical reasoning integration | Disrupts time-critical interventions | Temporal adaptation (Figure 7)
Resources | Optimizes investment in high-value applications | Wastes resources on unnecessary transparency | Risk-stratified allocation
Table 5. Soft Systems Methodology applied to XDL.
SSM Step | Traditional SSM Focus | XDL Adaptation | Deliverable
1. Problem Situation | Understanding unstructured problem | Diagnose trust deficit, interpretability gaps, stakeholder concerns | Rich picture mapping data flows, stakeholders, transparency needs
2. Root Definition | Defining transformation purpose | Define “a system for transparent, trustworthy AI-based care” using CATWOE | Formal statement of XDL system purpose and boundaries
3. Conceptual Model | Building activity model | Design three-layer explanation architecture with feedback loops | Conceptual architecture spanning technical, cognitive, organizational layers
4. Comparison | Comparing model with reality | Validate explanations against actual clinician reasoning patterns | Gap analysis identifying misalignments
5. Implementation | Defining feasible changes | Deploy adaptive dashboards, role-specific interfaces, governance structures | Integrated XDL system with continuous monitoring
6. Evaluation (5E) | Assessing intervention | Assess Efficacy, Efficiency, Effectiveness, Ethics, Elegance | Multi-dimensional success metrics
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
