Review

From Black Box to Glass Box: A Practical Review of Explainable Artificial Intelligence (XAI)

1 Department of Computer Science, San Jose State University, San Jose, CA 95192, USA
2 School of Engineering and Computing, Trine University, Angola, IN 46703, USA
3 Department of Computer Science, Yale University, New Haven, CT 06520, USA
4 Department of Computer Science, Columbia University, New York, NY 10027, USA
5 Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712, USA
6 Ira A. Fulton Schools of Engineering, Arizona State University, Tempe, AZ 85281, USA
7 Department of Electrical and Computer Engineering, Northwestern University, Evanston, IL 60208, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
AI 2025, 6(11), 285; https://doi.org/10.3390/ai6110285
Submission received: 24 September 2025 / Revised: 23 October 2025 / Accepted: 26 October 2025 / Published: 3 November 2025
(This article belongs to the Section AI Systems: Theory and Applications)

Abstract

Explainable Artificial Intelligence (XAI) has become essential as machine learning systems are deployed in high-stakes domains such as security, finance, and healthcare. Traditional models often act as “black boxes”, limiting trust and accountability. However, most existing reviews treat explainability either as a technical problem or a philosophical issue, without connecting interpretability techniques to their real-world implications for security, privacy, and governance. This review fills that gap by integrating theoretical foundations with practical applications and societal perspectives. We define transparency and interpretability as core concepts and introduce new economics-inspired notions of marginal transparency and marginal interpretability to highlight diminishing returns in disclosure and explanation. Methodologically, we examine model-agnostic approaches such as LIME and SHAP, alongside model-specific methods including decision trees and interpretable neural networks. We also address ante-hoc vs. post hoc strategies, local vs. global explanations, and emerging privacy-preserving techniques. To contextualize XAI’s growth, we integrate capital investment and publication trends, showing that research momentum has remained resilient despite market fluctuations. Finally, we propose a roadmap for 2025–2030, emphasizing evaluation standards, adaptive explanations, integration with Zero Trust architectures, and the development of self-explaining agents supported by global standards. By combining technical insights with societal implications, this article provides both a scholarly contribution and a practical reference for advancing trustworthy AI.

1. Introduction

Artificial intelligence systems and machine learning algorithms are now widespread across many areas [1,2]. Data is used almost everywhere to solve problems and assist humans [3,4]. A large part of this success stems from progress in deep learning, but also, more generally, from the development of new and creative ways of using data. Recent advances in vision–language alignment and unified benchmarks for parameter-efficient transfer learning further highlight how quickly the field is evolving and expanding into complex multimodal and adaptive settings [5,6]. Recent studies show that even in highly specialized domains such as high-frequency trading, the combination of dynamic feature selection and lightweight neural networks is required to process real-time data efficiently while keeping models adaptable [7]. As a consequence, the complexity of these systems becomes incomprehensible even for AI experts. That is why such models are usually referred to as “black boxes” [8].
However, machine learning is also applied in critical decision-making in areas such as engineering, psychology, career development, and policy recommendations. This creates a dilemma: while AI is entrusted with making important decisions, its reasoning process is often either opaque or difficult to understand. Therefore, there is a pressing need for systems that can make the decision-making process of AI more transparent, interpretable, and comprehensible. In other words, such systems allow humans to understand why AI arrives at a particular prediction or decision [9]. This is precisely the role of Explainable AI (XAI).
Explainable Artificial Intelligence (XAI) is an emerging area that has gained momentum alongside the rapid development and commercialization of machine learning algorithms and artificial intelligence applications. As AI has moved beyond textbooks and research laboratories into real-world domains such as search engines, autonomous vehicles, and healthcare, the demand for explainability has become increasingly critical. Figure 1 illustrates the trend of capital investment in AI in the United States from 2015 to 2024 [10]. Investment levels rose steadily through 2020, spiked in 2021, experienced a temporary decline in 2022, and then strongly rebounded in 2023–2024.
To ensure that the review was conducted in a systematic and objective manner, we collected literature from major academic databases, including Scopus, IEEE Xplore, and Web of Science, covering the period from 2015 to 2024. The detailed qualitative evaluation criteria are listed in Table 1. The search used combinations of keywords such as “Explainable Artificial Intelligence”, “XAI”, “model interpretability”, and “trustworthy AI”. Only peer-reviewed journal and conference papers written in English were included, while preprints, editorials, and book chapters were excluded. Duplicate records were removed using DOI filtering. The selected studies were then analyzed thematically to identify methodological patterns, application areas, and emerging research directions in explainable AI.
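For reproducibility, the DOI-based deduplication step can be scripted. The following minimal sketch illustrates one way to do it with pandas; the file names and column names are hypothetical placeholders, not the actual database exports used in this review.

import pandas as pd

# Combine hypothetical exports from the three databases into one frame.
records = pd.concat([
    pd.read_csv("scopus_export.csv"),
    pd.read_csv("ieee_xplore_export.csv"),
    pd.read_csv("web_of_science_export.csv"),
], ignore_index=True)

# Normalize DOIs so formatting differences do not hide duplicates.
records["doi"] = records["doi"].str.strip().str.lower()

# Keep the first occurrence of each DOI; records without a DOI are retained.
deduplicated = pd.concat([
    records[records["doi"].notna()].drop_duplicates(subset="doi"),
    records[records["doi"].isna()],
], ignore_index=True)
print(f"Removed {len(records) - len(deduplicated)} duplicate records")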
Figure 2 presents the corresponding growth of XAI-related publications during the same period, showing an exponential year-over-year increase. Notably, while overall AI investment dropped in 2022—likely influenced by the COVID-19 pandemic—the trajectory of XAI research remained unaffected. This financial independence underscores the essential nature of XAI as a research domain.
We argue that incorporating capital investment trends into the analysis of XAI provides a novel perspective for understanding its importance and societal relevance. Accordingly, this article adopts a dual approach: it not only reviews the existing literature on XAI but also introduces new conceptual frameworks inspired by economic principles. Because XAI concerns not only algorithms but also human trust, the integration of economic thinking—such as marginal transparency and marginal interpretability—offers fresh insights into how explainability can be measured, optimized, and valued. Our goal is to move beyond conventional academic boundaries, presenting this review as both a rigorous scholarly contribution and a practical reference resource, with the ambition of serving as a “dictionary-like” guide to XAI for a broad audience of researchers, practitioners, and policymakers.

2. Key Concepts

This section discusses the key concepts and fundamental principles of XAI. One of the ultimate goals of XAI is to minimize, and ideally eliminate, the trade-off between the accuracy of an AI system and the ease with which its processes can be understood. This challenge leads us to two foundational concepts in the field: Transparency and Interpretability.

2.1. Transparency

Transparency refers to the degree to which users and stakeholders can see and understand the internal workings of an AI system. It highlights the importance of opening the model’s “black box” by showing which algorithms are used, what data is processed, and how decisions are made. A transparent system also provides visibility into the structure of the model, the flow of data, and the influence of specific attributes on outcomes. An AI model is considered transparent when its data-processing steps and decision criteria are presented in a way that people can clearly follow. Research by Van der Waa and colleagues [11] emphasizes that transparency is essential for building trust, particularly in high-stakes domains such as healthcare and finance, where understanding the decision-making process is crucial.

2.2. Interpretability

Interpretability refers to the extent to which people can understand the meaning of a model’s predictions or decisions. As shown in Figure 3, the focus lies on the human ability to grasp the outputs of an AI system and the reasoning that supports them. This balance between accuracy and interpretability has been widely discussed in the literature [12,13,14,15,16,17], often framed as the “black box” problem of machine learning. Prior work emphasizes that while deep neural networks achieve state-of-the-art accuracy, their opacity limits human trust and accountability. Explainable AI seeks to overcome this trade-off by reinforcing models with methods that provide both predictive performance and interpretability, especially in high-stakes domains where decision rationales are as important as outcomes. Interpretability often requires simplifying complex models into more understandable forms or providing explanations that clarify why a particular decision was made. Methods such as LIME and SHAP belong to this category because they highlight the importance of features in individual predictions. However, research in 2023 shows that transparency alone is not enough. Even if a model reveals its inner workings, it may still remain difficult to interpret if the learned relationships are too complex for humans to follow [18].
Building on the idea that capital investment trends can offer new insight into the importance and societal relevance of XAI, we extend the discussion with two new concepts. We propose marginal transparency and marginal interpretability. These concepts borrow from economics and highlight how small, incremental changes in disclosure or explanation can shape the overall value of explainable AI.

2.3. Marginal Transparency

Marginal transparency builds on the economic idea of marginal cost [19]. It describes the extra clarity that users gain when developers add one more layer of disclosure. Examples include model documentation, architectural diagrams, or data-flow visualizations. The first layers of transparency, such as showing the model’s structure or highlighting feature importance, usually provide large gains in understanding. Later efforts, like publishing detailed hyperparameter logs or long mathematical proofs, tend to add less value. These steps often bring diminishing returns. This perspective shows that transparency is not a simple on–off state but a spectrum. Each new disclosure adds something, but the benefit becomes smaller over time. As Van der Waa et al. [11] note, it is therefore important to allocate disclosure resources with care and strategy.
In practical settings, marginal transparency can be implemented through structured documentation frameworks such as model cards and data sheets for datasets, which disclose essential information about model design, training data, and limitations while avoiding unnecessary technical overload. These incremental disclosures help organizations balance the cost of transparency with its utility for regulators and end users. For instance, companies deploying high-risk AI systems often adopt tiered transparency policies—internal documentation for engineers, summarized reports for auditors, and simplified explanations for consumers—reflecting different levels of marginal gain from additional disclosure.
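One possible way to make this notion concrete (our illustrative formalization, not a definition drawn from the cited literature) is to treat cumulative clarity as a function $T(n)$ of the number of disclosure layers $n$ and read marginal transparency as its finite difference:

\[
\mathrm{MT}(n) = T(n) - T(n-1), \qquad \mathrm{MT}(n+1) \leq \mathrm{MT}(n),
\]

where the second condition expresses the diminishing returns described above (i.e., $T$ is concave in $n$). Marginal interpretability in Section 2.4 admits the same reading, with explanation layers in place of disclosure layers.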

2.4. Marginal Interpretability

Marginal interpretability draws on the economic idea of marginal utility [20]. It refers to the extra understanding people gain when one more layer of explanation is added to an AI system. Early methods, such as LIME [21] and SHAP [22], often provide large benefits. They help users see which features matter most for individual predictions and make the reasoning of the model easier to follow. As explanations become more detailed or highly technical, the gains become smaller. For example, showing high-dimensional feature interactions or long mathematical proofs may add little value and may even confuse non-experts. This pattern reflects the principle of diminishing returns. Interpretability, like transparency, is not limitless. Designers must balance the depth of explanation with the limits of human cognition to create effective XAI systems.
Similarly, marginal interpretability can be observed in real-world systems where explanations are dynamically adjusted according to user expertise. In healthcare AI, for example, models may provide clinicians with feature-level attributions while giving patients simplified textual rationales. This adaptive explanation depth illustrates how interpretability can be optimized rather than maximized, aligning with the principle of diminishing returns in explanation utility.
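A minimal sketch of such audience-adaptive explanation depth is shown below; the audience labels, feature names, and attribution values are hypothetical and are not drawn from a deployed clinical system.

# Hypothetical sketch: adapt explanation depth to the audience.
def render_explanation(attributions, audience):
    """attributions: (feature, contribution) pairs sorted by |contribution|."""
    if audience == "clinician":
        # Experts receive full feature-level attributions.
        return [f"{name}: {value:+.3f}" for name, value in attributions]
    # Non-experts receive a simplified rationale based on the top factor only.
    top_name, top_value = attributions[0]
    direction = "raised" if top_value > 0 else "lowered"
    return [f"The prediction was mainly {direction} by {top_name}."]

attrs = [("blood_pressure", 0.42), ("age", -0.17), ("bmi", 0.05)]
print(render_explanation(attrs, "clinician"))
print(render_explanation(attrs, "patient"))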

3. Key Methods

3.1. Summary of Open-Source XAI Toolkits

To emphasize the practical uses of XAI, we first discuss the available open-source XAI toolkits. These frameworks offer diverse capabilities ranging from local and global explanations to auditing for fairness and robustness. As summarized in Table 2, IBM’s AIX360 delivers a broad library of interpretability methods, while Alibi from SeldonIO emphasizes counterfactual reasoning and confidence metrics. Facet, developed by BCG X, focuses on geometric inference of feature interactions, and OmniXAI provides a versatile multi-model platform with an intuitive graphical interface. Meanwhile, Explabox offers a pipeline for auditing transparency and robustness, and Xplique specializes in deep-learning model introspection. Collectively, these toolkits demonstrate the growing ecosystem of practical resources that bring XAI research into real-world deployment.
Although many toolkits and algorithms have been developed for XAI, most approaches can be grouped into two main categories: model-agnostic methods and model-specific approaches. This division is sufficient to capture the core methodology of the field. Within the model-agnostic category, two seminal papers stand out: LIME [21] and SHAP [22]. Nearly all other techniques we have examined build on these foundations. They typically introduce refinements, optimizations, or application-specific adaptations. These contributions are valuable, but the central ideas still trace back to the original work on LIME and SHAP.

3.2. Model-Agnostic Methods

3.2.1. Local Interpretable Model-Agnostic Explanations (LIME)

LIME [21] provides explanations at the local level. It simplifies complex models by building small, interpretable models around a specific prediction. The method perturbs the input data and observes how those changes affect the output. It then fits a weighted linear model to approximate the local behavior of the original model, giving more weight to perturbed points closer to the original input. Studies show that LIME is efficient in many scenarios and can reveal insights into complex models such as deep neural networks and ensembles. However, its reliability depends on the perturbation strategy and the number of samples used, which can affect the stability of the explanations.
To be specific, the model explains predictions by approximating a complex model $f$ with an interpretable model $g$ in the neighborhood of the instance being explained. Let $x \in \mathbb{R}^d$ be the input, and $x' \in \{0,1\}^{d'}$ its interpretable representation (e.g., words in text or super-pixels in images). The explanation is obtained by solving
$$\xi(x) = \underset{g \in G}{\arg\min}\; \mathcal{L}(f, g, \pi_x) + \Omega(g),$$
where $\mathcal{L}(f, g, \pi_x)$ measures how unfaithful $g$ is in approximating $f$ in the locality defined by the proximity measure $\pi_x$, and $\Omega(g)$ penalizes complexity to keep $g$ interpretable.
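A minimal usage sketch of this procedure with the reference lime package is shown below; the dataset and classifier are illustrative choices, not a recommended configuration.

from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
# Perturb around one instance, fit a locally weighted linear surrogate g,
# and report the features with the largest local weights.
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(explanation.as_list())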

3.2.2. Shapley Additive Explanations (SHAP)

SHAP [22] extends the idea of local explanation by applying cooperative game theory. It assigns importance to features using Shapley values, which fairly distribute the contribution of each feature across all possible feature combinations. Unlike LIME, SHAP offers both local and global interpretability. It can explain individual predictions and also summarize feature importance across an entire model. Research shows that SHAP tends to be more stable than other methods, though it is computationally expensive, especially for models with many features.
To be specific, the method unifies additive feature attribution approaches under a single framework. Let $f(x)$ be the original model and $x \in \mathbb{R}^M$ the input with $M$ features. SHAP explains $f(x)$ through an additive surrogate model
$$g(z') = \phi_0 + \sum_{i=1}^{M} \phi_i z'_i,$$
where $z' \in \{0,1\}^M$ represents the presence or absence of features, and $\phi_i$ is the contribution (Shapley value) of feature $i$.
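The following minimal sketch (regression dataset and tree ensemble chosen purely for illustration) uses the shap package to compute the $\phi_i$ values for one instance and checks the additive local-accuracy property of the surrogate above.

import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact Shapley values for tree ensembles
phi = explainer.shap_values(X[:1])[0]   # local attributions phi_i for one instance

# Local accuracy: phi_0 + sum_i phi_i reproduces the model's prediction.
reconstruction = explainer.expected_value + phi.sum()
print(np.round(phi, 2))
print(float(model.predict(X[:1])[0]), float(np.ravel(reconstruction)[0]))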

3.3. Model-Specific Methods

Fortunately, not all machine learning models are black boxes. Some are “glass-box” models, meaning they are inherently interpretable. These models can, in theory, provide users with complete interpretability. Under our definition of marginal interpretability, this situation corresponds to a value of zero. In economic terms, the system has already reached its maximum return. Additional efforts to improve interpretability yield no further benefit. Although many studies continue to explore the explainability of these models, the theoretical focus should shift. For glass-box models, the priority is no longer explainability but rather improving accuracy and performance.

3.3.1. Decision Trees

Decision trees are among the most straightforward and understandable models in machine learning. They represent choices as branching paths defined by feature values, which makes the decision process easy to follow. Each node shows a feature, and each branch shows a decision rule. Their visual structure gives them a natural interpretability [29]. A recent study confirms that decision trees are not only simple to grasp but also flexible [30]. They can handle both categorical and continuous data, which makes them useful across many domains. At the same time, the study warns that decision trees can overfit when they grow too deep. In such cases, the model becomes complex, and its interpretability declines.
Several strategies aim to improve both the performance and interpretability of decision trees. One example is the Random Forest, which combines many decision trees to increase predictive accuracy. This ensemble method often produces stronger results while retaining some level of interpretability. However, as the study notes, the added complexity reduces transparency. Users may find it harder to understand the collective decision-making process compared to a single tree [31].
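As a small illustration of this built-in transparency (the dataset and depth limit are chosen only for readability), a depth-limited tree can be printed directly as plain if/then rules:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# The decision process is the model itself: human-readable branching rules.
print(export_text(tree, feature_names=list(data.feature_names)))

Keeping max_depth small is exactly the trade-off noted above: deeper trees fit more patterns but quickly lose this readability.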

3.3.2. Interpretable Neural Networks

Interpretable neural networks mark an important advance in the progress of XAI. Traditional deep learning models are highly effective, but they often operate as “black boxes”. Their complex structures make them difficult to interpret [12]. To address this challenge, researchers have designed specialized architectures that are easier to understand. One example is the use of attention mechanisms, which allow models to highlight the most relevant features when making predictions [32]. By showing which inputs matter most, attention-based models improve interpretability while preserving strong performance.
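The sketch below shows the underlying mechanism in isolation rather than any specific published architecture: scaled dot-product attention produces one normalized weight per input, and those weights are the interpretability signal referred to above. The vector sizes and random values are illustrative.

import numpy as np

def attention_weights(query, keys):
    """Scaled dot-product attention scores, normalized with softmax."""
    scores = keys @ query / np.sqrt(query.shape[0])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
query = rng.normal(size=8)        # representation of the current prediction target
keys = rng.normal(size=(5, 8))    # representations of five input tokens/features

weights = attention_weights(query, keys)
# Weights sum to 1 and can be read as "how much each input contributed".
print(np.round(weights, 3))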
Another approach is the development of glass-box neural networks. These models combine the power of deep learning with structures that enhance transparency [33]. They are designed to remain as simple as possible while maintaining the efficiency of standard deep neural networks. Achieving this balance between complexity and interpretability is especially critical in domains such as healthcare and finance, where decisions directly affect human well-being.
Model-specific methods are widely applied across fields. Decision trees are often used in finance for credit scoring and risk analysis because of their clarity. Interpretable neural networks, by contrast, have been applied in image recognition and natural language processing, where it is important to understand which factors drive predictions. Across these applications, interpretable models support the safe adoption of AI systems. They show that transparency can strengthen user trust and lead to better decision-making outcomes.

3.3.3. Explainability in Large Language Models (LLMs)

Large Language Models (LLMs) present both new opportunities and challenges for explainability. Unlike earlier AI models, LLMs can generate natural language rationales for their outputs, offering interactive and human-friendly explanations that extend beyond traditional attribution methods [34]. However, these rationales may not always be faithful, raising concerns about hallucinated explanations and the high computational cost of interpretation. Recent studies point to two main directions for advancing XAI in LLMs. The first focuses on user-facing explanations, such as chain-of-thought prompting, attention-based analyses, and retrieval-augmented generation, which aim to clarify how models arrive at predictions [34]. The second emphasizes mechanistic interpretability, which seeks to reverse engineer circuits, features, and representations inside transformer architectures to uncover how knowledge is stored and processed [35]. Together, these approaches highlight the importance of combining accessible explanations with deeper structural analysis to ensure that LLMs can be deployed in high-stakes domains with both transparency and trust.
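As a simple illustration of the user-facing direction, the contrast below shows a plain prompt versus a chain-of-thought-style prompt; the wording is a hypothetical template rather than a prescribed format, and the rationale it elicits still needs to be checked for faithfulness.

# Hypothetical prompt templates for eliciting a user-facing rationale from an LLM.
question = ("Should this transaction be flagged as fraud? "
            "Amount: $4950; location: new country; time: 3 a.m.")

plain_prompt = question + "\nAnswer yes or no."

cot_prompt = (question + "\nExplain your reasoning step by step, citing the "
              "factors you relied on, then give a final yes/no answer.")

print(cot_prompt)   # the elicited step-by-step rationale serves as the explanation artifact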
While methods such as LIME and SHAP have become foundational tools in explainable AI, they still face notable trade-offs compared with other techniques. LIME is efficient and model-agnostic but can produce unstable explanations that vary with random perturbations, while SHAP offers theoretically consistent feature attributions at the cost of high computational complexity. In contrast, gradient-based visualization methods such as Grad-CAM are faster for image data but limited to differentiable models, and counterfactual approaches provide intuitive examples yet lack scalability in large datasets. As shown in Table 3, many studies also emphasize algorithmic accuracy rather than how explanations are perceived by different user groups. Moreover, the reliability of post hoc explanations remains an open issue—two models of similar accuracy may yield inconsistent local explanations for the same instance. Addressing these challenges will require integrating human-centered evaluation, stability testing, and real-world usability assessments into the design of XAI techniques.

4. Taxonomy of Explainable AI

4.1. Ante-Hoc vs. Post-Hoc Approaches

Ante-hoc (intrinsically interpretable) methods are designed to be transparent by construction. Models such as decision trees, generalized additive models, and rule-based systems inherently provide human-readable decision rules [14]. In cybersecurity, these models are widely used in access control policies and rule-based anomaly detection, where interpretability is essential for operational auditability [38]. Ante-hoc methods are advantageous in regulated environments but often struggle to match the accuracy of deep neural networks in large-scale security tasks.
Post hoc methods, in contrast, generate explanations for already-trained black-box models. They include saliency maps, gradient-based attribution, perturbation techniques, and counterfactual explanations [17]. In security domains, post hoc explanations help forensic analysts understand why a deep learning model identified a binary as malware or why an intrusion detection system flagged network activity as suspicious. Although widely adopted, post hoc methods risk producing explanations that are approximate rather than faithful, raising concerns about their reliability in adversarial contexts [12].
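To make the contrast concrete, the sketch below shows an ante-hoc, rule-based access check whose rules are themselves the explanation; the rules and thresholds are hypothetical. A post hoc method would instead approximate an already-trained black-box model after the fact.

# Hypothetical ante-hoc rules for an access/anomaly check: the fired rules
# are the explanation, so no separate post hoc method is needed.
RULES = [
    ("more than 5 failed logins", lambda e: e["failed_logins"] > 5),
    ("login from a previously unseen country", lambda e: e["new_country"]),
    ("access outside business hours", lambda e: not 8 <= e["hour"] < 18),
]

def audit(event):
    fired = [name for name, rule in RULES if rule(event)]
    return {"flagged": bool(fired), "matched_rules": fired}

print(audit({"failed_logins": 7, "new_country": False, "hour": 3}))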

4.2. Local vs. Global Explanations

Local explanations clarify individual predictions, providing instance-specific rationales. For example, in fraud detection, a local explanation might indicate that an unusual transaction location and atypical spending pattern jointly triggered a fraud alert. Local explanations are particularly useful in operational decision-making, where analysts must justify immediate actions such as blocking a transaction or denying access.
Global explanations, on the other hand, aim to capture overall model behavior by identifying dominant patterns and feature interactions. In privacy and security, global explanations can reveal systematic biases in an intrusion detection system or highlight the features most frequently used by a malware classifier across an entire dataset. Global transparency is critical for compliance audits and for aligning AI models with organizational or regulatory expectations.
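The sketch below illustrates the local/global distinction on a toy fraud-style dataset; the feature names, data-generating rule, and crude occlusion-style local attribution are hypothetical simplifications rather than a production method.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000
features = ["amount", "distance_from_home_km", "hour", "merchant_risk"]
X = np.column_stack([
    rng.lognormal(3.0, 1.0, n),   # transaction amount
    rng.exponential(20.0, n),     # distance from usual location
    rng.integers(0, 24, n),       # hour of day
    rng.random(n),                # merchant risk score
])
y = ((X[:, 0] > 60) & (X[:, 1] > 40)).astype(int)   # toy "fraud" rule
model = RandomForestClassifier(random_state=0).fit(X, y)

# Global explanation: dominant features across the whole model.
print("global:", dict(zip(features, np.round(model.feature_importances_, 3))))

# Local explanation (crude occlusion): how much each feature of ONE flagged
# transaction shifts its fraud probability when replaced by the dataset mean.
x = X[y == 1][:1]
base = model.predict_proba(x)[0, 1]
local = {}
for j, name in enumerate(features):
    x_occluded = x.copy()
    x_occluded[0, j] = X[:, j].mean()
    local[name] = round(base - model.predict_proba(x_occluded)[0, 1], 3)
print("local:", local)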

4.3. Privacy-Preserving Explainability

Beyond traditional axes of classification, privacy-preserving explainability has emerged as a critical requirement in sensitive domains. Explanations themselves can inadvertently reveal confidential or personally identifiable information (PII). For instance, a feature attribution method in a medical intrusion detection system might disclose that a specific patient attribute strongly influenced the model’s prediction, thereby breaching privacy [39,40,41].
Techniques to mitigate this risk include differentially private explanations, where controlled noise is added to explanation outputs, and federated explainability, in which explanations are generated in a decentralized manner without exposing raw data [42]. In multi-cloud or edge computing security environments, privacy-preserving XAI is indispensable for balancing transparency with confidentiality obligations [43]. Research into privacy-preserving explainability remains nascent but is gaining momentum due to the dual demands of trustworthiness and data protection.
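A minimal sketch of the first idea is shown below: calibrated Laplace noise is added to feature attributions before they are released. The sensitivity, privacy budget, and attribution values are hypothetical placeholders.

import numpy as np

# Hypothetical sketch: release differentially private feature attributions by
# adding Laplace noise with scale sensitivity/epsilon to each value.
def privatize_attributions(attributions, sensitivity=0.1, epsilon=1.0, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=len(attributions))
    return np.asarray(attributions) + noise

raw = np.array([0.42, -0.17, 0.05])    # e.g., SHAP values for one prediction
print(privatize_attributions(raw))      # noisier, but safer to disclose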

4.4. Visual Taxonomy of XAI Applications

To illustrate how these taxonomic categories translate into practice, Figure 4 maps representative methods across both classification criteria and application domains. The horizontal axis captures methodological dimensions (model-specific vs. model-agnostic, ante-hoc vs. post hoc, local vs. global, and privacy-preserving). The vertical axis lists security and privacy domains where XAI plays a central role, including intrusion detection, malware analysis, fraud detection, access control, and privacy protection.
Each plotted point links a method to its domain of use. For example, decision trees serve in both intrusion detection and access control due to their transparent rule structures, while LIME and SHAP are widely applied to anomaly detection and fraud detection. Post hoc visualization techniques such as Grad-CAM and saliency maps aid malware analysis, whereas counterfactuals provide user-level reasoning in fraud detection. Finally, privacy-preserving approaches like homomorphic encryption ensure that explanations do not expose sensitive data, making them vital in privacy protection. Recent advances in domain adaptation show how explainability must adjust to shifting data distributions while maintaining reliable performance across contexts (e.g., MDANet; Self-training with label-feature-consistency) [44,45]. Parameter-efficient transfer learning has also become central, enabling scalable adaptation across tasks without compromising interpretability (e.g., Vmt-adapter; V-petl bench) [5,46]. Meanwhile, multimodal alignment and generative models highlight the challenge of developing XAI techniques capable of making sense of complex reasoning across text, images, and motion (e.g., Mmap; UniTMGE; calibrated self-rewarding VLMs; diffusion model studies) [6,47,48,49].
This taxonomy diagram consolidates the methodological landscape, showing that XAI is not confined to a single axis but instead spans multiple categories, each suited to different applications and trade-offs.
Compared with earlier taxonomies of explainable AI, which primarily classified methods along the axes of model specificity and explanation scope, the taxonomy proposed in this study introduces two principal extensions. First, it explicitly incorporates privacy-preserving explainability as a new dimension, reflecting the growing importance of data confidentiality and regulatory compliance in modern AI systems. Second, it integrates application-level mapping, linking methodological categories to concrete domains such as intrusion detection, fraud analysis, and privacy protection. This dual-layered design enables the taxonomy to capture both technical and contextual aspects of explainability, making it more adaptable to real-world security and governance scenarios. By aligning methodological classification with emerging societal demands, the proposed taxonomy provides a more holistic and practical framework for understanding the evolving landscape of XAI.

5. Research Roadmap for 2025–2030+

Explainable Artificial Intelligence (XAI) in security and privacy is entering a decisive stage. The growing sophistication of adversarial threats, the expansion of distributed computing, and stricter global regulations call for a clear, forward-looking agenda. From the perspective of a senior machine learning specialist, and as illustrated in Figure 5, the years 2025 to 2030 will determine how XAI matures into a trusted, operational, and internationally aligned element of secure AI systems.

5.1. Short-Term (2025–2026): Establishing Foundations

The first priority is the standardization of evaluation protocols. Current practices are fragmented and depend on isolated measures such as fidelity, comprehensibility, and stability. By 2026, the research community must converge on shared benchmarks that capture adversarial robustness, regulatory compliance, and latency. Latency optimization will also become critical. Explanations must be produced in near real time for domains such as fraud detection and intrusion prevention. At the same time, privacy-preserving XAI methods will begin to emerge. Techniques like differential privacy, homomorphic encryption, and federated explainability will help ensure that explanations do not create new risks of information leakage. Together, these steps will provide the foundation for scalable, compliant, and trustworthy XAI in security-critical sectors.

5.2. Mid-Term (2027–2028): Integration and Expansion

The next stage will be defined by integration with Zero Trust Architectures (ZTA). As ZTA becomes the standard for enterprise and government security [50], XAI will act as the interpretability layer that justifies access, validates trust scores, and supports audits. Beyond the enterprise, XAI will need to adapt to multi-cloud, IoT, and edge environments [51]. Explanations in these contexts must be lightweight, distributed, and sensitive to context. This period will also bring adaptive explanations. Explanation depth will adjust dynamically, offering technical detail to administrators and policy-level reasoning to compliance officers. Achieving this adaptability will require collaboration between machine learning, cognitive science, and human–computer interaction. By 2028, XAI will evolve from an auxiliary function into an embedded and adaptive element of distributed AI ecosystems.

5.3. Long-Term (2029–2030+): Towards Global Explainable Security

The long-term vision points to self-explaining AI agents. These systems will embed reasoning directly into their architectures, allowing them to articulate decisions in natural, human-compatible terms. Reliance on post hoc methods will decrease as intrinsic transparency grows. At the same time, international collaboration will focus on creating global standards for explainable security. These standards, similar to ISO/IEC 27001 for cybersecurity, will define requirements for interpretability, auditability, and fairness [52]. Cross-border compliance will become possible, and trust in AI systems will increase in critical areas such as finance, healthcare, defense, and infrastructure. Looking further ahead, global explainable security networks may emerge. In such networks, XAI-driven threat detection systems across countries will share interpretable insights while respecting data sovereignty. This will transform explanation into a collective defense mechanism for the digital world.

5.4. Synthesis

This roadmap highlights a steady progression. The short term focuses on evaluation and privacy-preserving methods. The mid-term emphasizes integration into distributed, adaptive environments. The long-term advances toward self-explaining systems and international harmonization. Each phase builds on the one before it. Together, they show that explainability is not static but an evolving capability that must keep pace with new technologies, adversaries, and social expectations. By 2030, explainability will stand as a core requirement of secure and trustworthy AI, shaping both resilience in operations and the future of global governance.

6. Applications of Explainable AI in Data Privacy and Security

6.1. Enhancing Transparency in Security Decision-Making

AI systems are increasingly used in cybersecurity tasks such as intrusion detection, malware classification, and anomaly detection. However, the opaque nature of black-box models can hinder analysts’ trust in automated security alerts. XAI techniques provide interpretability by highlighting the specific features or patterns responsible for a given decision. For instance, when a deep learning system flags suspicious network traffic, post hoc explanation methods such as SHAP or LIME can reveal which packet features contributed most strongly to the classification. As shown in Singh et al. [53], SHAP summary plots (Figure 6) reveal the most influential traffic features in encrypted environments, enabling analysts to validate anomaly alerts with clear, feature-level justifications. This transparency enables security professionals to validate alerts, identify false positives, and fine-tune monitoring systems.

6.2. Privacy-Preserving Explanations

A unique challenge in the data privacy domain is balancing interpretability with confidentiality. Traditional explanations may unintentionally reveal sensitive information, creating a secondary privacy risk. Emerging research explores privacy-preserving XAI, where explanations are generated without exposing raw data or individual records. Differential privacy mechanisms have been integrated into explanation models to provide aggregate-level interpretability while safeguarding individual data. Such methods are critical in contexts like healthcare, where medical AI models must explain diagnostic decisions without disclosing identifiable patient data.

6.3. Trustworthy AI for Compliance and Regulation

Global data protection frameworks, including the EU’s General Data Protection Regulation (GDPR), emphasize the “right to explanation” in automated decision-making. As illustrated in Figure 7 of Lukács [54], the GDPR also specifies possible exceptions to this right, clarifying when automated decision-making may proceed without full explanation.
XAI directly addresses this legal and ethical requirement by enabling organizations to justify AI-driven decisions in domains such as credit scoring, identity verification, and fraud detection. By producing human-understandable explanations, XAI contributes to regulatory compliance, reduces legal risk, and supports organizations in establishing accountability frameworks.

6.4. Securing AI Models Against Adversarial Threats

AI models themselves are vulnerable to adversarial attacks, where maliciously crafted inputs deceive classifiers. XAI plays a defensive role by making the model’s decision boundaries and influential features visible, thus aiding in the detection of adversarial manipulation. For example, heatmaps and saliency maps can highlight abnormal perturbations that may not be evident to human observers. This helps researchers strengthen model robustness against attacks while also ensuring that explanations do not leak sensitive model parameters. As shown in Figure 8, recent work further shows that adversarial perturbations can drastically alter explanations while keeping predictions unchanged, underscoring the need for defenses such as explanation ensembling [55].
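The sketch below shows the basic mechanics of such a gradient-based saliency map; the tiny untrained network and random input are stand-ins, not a real malware or vision model.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(4 * 28 * 28, 2))
model.eval()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # one input "image"
score = model(x)[0, 1]                              # score of the class of interest
score.backward()

# Saliency: magnitude of the gradient of the class score w.r.t. each pixel;
# unusually concentrated values can hint at adversarial perturbations.
saliency = x.grad.abs().squeeze()
print(saliency.shape)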

6.5. Human-in-the-Loop Security Systems

The combination of XAI with human expertise offers a promising avenue in cybersecurity operations. Security analysts rely not only on detection accuracy but also on contextual understanding of threats. As shown in Table 4, by integrating interpretable models into human-in-the-loop frameworks, XAI facilitates collaborative decision-making where analysts can interrogate model outputs, challenge incorrect reasoning, and iteratively improve system performance [56,57]. This synergy enhances both operational trust and practical adoption of AI-driven security systems.

6.6. Other Application Examples

Explainable AI has already shown measurable benefits in several high-impact domains. In cybersecurity, techniques such as SHAP and LIME have been used to interpret intrusion detection and malware classification models, helping analysts trace abnormal behaviors and reduce false alerts. In finance, interpretable neural networks and decision-tree ensembles support transparent credit scoring and fraud detection, enabling institutions to meet regulatory compliance while maintaining model accuracy. In healthcare, gradient-based and counterfactual explanations improve clinical decision support by revealing which imaging or laboratory features drive diagnostic predictions. These use cases demonstrate how explainability enhances not only accountability but also operational reliability. However, they also highlight practical challenges such as the trade-off between interpretability and latency, domain-specific bias, and the need for user-centered evaluation in real deployments.
Despite the promising applications of XAI in privacy, compliance, and human-in-the-loop systems, current research still focuses heavily on proof-of-concept studies rather than operational deployment. Many demonstrations rely on small-scale or synthetic datasets, making it difficult to assess real-world effectiveness. Moreover, while explanations are often visually persuasive, their actual contribution to decision quality or user trust is rarely quantified. Future work should move beyond illustrative examples toward longitudinal, domain-specific studies that evaluate the measurable impact of explainability in practice.

7. Limitations, Synthesis and Conclusion

7.1. Limitations

This review has several limitations that should be acknowledged. First, although major databases such as Scopus, IEEE Xplore, and Web of Science were used, the coverage of explainable AI publications may still be incomplete, and certain relevant works from other sources might have been missed. Second, the selection and synthesis of studies were based on qualitative thematic analysis rather than quantitative meta-analysis, which could introduce interpretive bias. Third, the review focuses mainly on peer-reviewed English-language publications, and therefore it may not fully represent ongoing developments in non-English or industrial research communities. Finally, because XAI is a fast-evolving field, new methods and applications may quickly outdate parts of this synthesis. Future work could address these limitations by integrating bibliometric analysis, expanding multilingual coverage, and periodically updating the dataset to reflect emerging trends.

7.2. Synthesis and Conclusion

Explainable Artificial Intelligence has shifted from a technical challenge to a socio-technical necessity. In domains where security, privacy, and accountability are critical, the ability to interpret and trust AI outputs is as important as achieving high predictive accuracy. XAI provides the foundation for this trust by combining transparency, interpretability, and privacy-preserving methods into a framework that enables responsible adoption.
Our review highlights both the diversity of methods and the broader forces shaping the field. Model-agnostic techniques such as LIME and SHAP offer flexibility across systems, while model-specific approaches, including decision trees and interpretable neural networks, deliver clarity by design. At the same time, regulatory frameworks, investment trends, and societal expectations ensure that explainability is not an optional feature but a requirement for compliance and acceptance. The concepts of marginal transparency and marginal interpretability underscore that explainability has an economic dimension, with diminishing returns that call for strategic allocation of resources.
Looking forward, the roadmap for 2025–2030+ shows how XAI will progress from fragmented practices toward global harmonization. Early phases will focus on shared benchmarks and privacy-preserving methods. Mid-term efforts will emphasize integration into enterprise security and adaptive explanations tailored to users. The long-term vision is one of self-explaining AI agents operating within international standards and collaborative networks. By 2030, explainability will stand as a core dimension of trustworthy AI—central not only to technical performance but also to governance, ethics, and collective resilience in the digital age.
Looking ahead, the field of explainable AI is expected to evolve from algorithmic innovation toward large-scale social adoption. Future developments will likely emphasize human-centered design, adaptive explanations tailored to different expertise levels, and integration with legal and ethical frameworks such as the EU AI Act and global data protection regulations. As AI systems increasingly affect employment, healthcare, education, and security, the societal demand for transparent and trustworthy decision-making will grow even stronger. Explainability will thus become not only a technical requirement but also a social contract between humans and intelligent systems.
Moreover, the future of XAI research will depend on interdisciplinary collaboration that bridges computer science, cognitive psychology, economics, and public policy. Governments and organizations will need to define clear standards for explainability, while educational systems must prepare professionals capable of interpreting and auditing AI behavior. In this sense, the progress of XAI will shape—and be shaped by—how societies choose to balance innovation, accountability, and human values in the age of intelligent automation.

Author Contributions

Conceptualization, W.C.; methodology, X.L. and D.H.; validation, J.Y., J.D. and L.S.; formal analysis, J.Y., H.W. and C.Y.; investigation, X.L. and D.H.; resources, W.C.; data curation, X.L. and D.H.; writing—original draft preparation, X.L. and D.H.; writing—review and editing, W.C.; visualization, W.C.; supervision, W.C.; project administration, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (GPT-5, OpenAI) solely for language refinement, including grammar and wording improvements. The authors have reviewed and edited all AI-assisted content and take full responsibility for the final version of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Selmy, H.A.; Mohamed, H.K.; Medhat, W. Big data analytics deep learning techniques and applications: A survey. Inf. Syst. 2024, 120, 102318. [Google Scholar] [CrossRef]
  2. Bhattacherjee, A.; Badhan, A.K. Convergence of data analytics, big data, and machine learning: Applications, challenges, and future direction. In Data Analytics and Machine Learning: Navigating the Big Data Landscape; Springer: Singapore, 2024; pp. 317–334. [Google Scholar]
  3. Wang, Y.; He, Y.; Wang, J.; Li, K.; Sun, L.; Yin, J.; Zhang, M.; Wang, X. Enhancing intent understanding for ambiguous prompt: A human-machine co-adaption strategy. Neurocomputing 2025, 646, 130415. [Google Scholar] [CrossRef]
  4. He, Y.; Li, S.; Wang, J.; Li, K.; Song, X.; Yuan, X.; Li, K.; Lu, K.; Huo, M.; Tang, J.; et al. Enhancing low-cost video editing with lightweight adaptors and temporal-aware inversion. arXiv 2025, arXiv:2501.04606. [Google Scholar]
  5. Xin, Y.; Luo, S.; Liu, X.; Zhou, H.; Cheng, X.; Lee, C.E.; Du, J.; Wang, H.; Chen, M.; Liu, T.; et al. V-petl bench: A unified visual parameter-efficient transfer learning benchmark. Adv. Neural Inf. Process. Syst. 2024, 37, 80522–80535. [Google Scholar]
  6. Zhou, Y.; Fan, Z.; Cheng, D.; Yang, S.; Chen, Z.; Cui, C.; Wang, X.; Li, Y.; Zhang, L.; Yao, H. Calibrated self-rewarding vision language models. Adv. Neural Inf. Process. Syst. 2024, 37, 51503–51531. [Google Scholar]
  7. Fan, Y.; Hu, Z.; Fu, L.; Cheng, Y.; Wang, L.; Wang, Y. Research on Optimizing Real-Time Data Processing in High-Frequency Trading Algorithms using Machine Learning. In Proceedings of the 2024 6th International Conference on Intelligent Control, Measurement and Signal Processing (ICMSP), Xi’an, China, 29 November–1 December 2024; pp. 774–777. [Google Scholar]
  8. Hassija, V.; Chamola, V.; Mahapatra, A.; Singal, A.; Goel, D.; Huang, K.; Scardapane, S.; Spinelli, I.; Mahmud, M.; Hussain, A. Interpreting black-box models: A review on explainable artificial intelligence. Cogn. Comput. 2024, 16, 45–74. [Google Scholar] [CrossRef]
  9. Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  10. Our World in Data. Annual Private Investment in Artificial Intelligence. Available online: https://ourworldindata.org/grapher/private-investment-in-artificial-intelligence (accessed on 15 September 2025).
  11. Van Der Waa, J.; Nieuwburg, E.; Cremers, A.; Neerincx, M. Evaluating XAI: A comparison of rule-based and example-based explanations. Artif. Intell. 2021, 291, 103404. [Google Scholar] [CrossRef]
  12. Lipton, Z.C. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue 2018, 16, 31–57. [Google Scholar] [CrossRef]
  13. Gunning, D.; Aha, D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. 2019, 40, 44–58. [Google Scholar]
  14. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
  15. Gilpin, L.H.; Bau, D.; Yuan, B.Z.; Bajwa, A.; Specter, M.; Kagal, L. Explaining explanations: An overview of interpretability of machine learning. In Proceedings of the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy, 1–3 October 2018; pp. 80–89. [Google Scholar]
  16. Carvalho, D.V.; Pereira, E.M.; Cardoso, J.S. Machine learning interpretability: A survey on methods and metrics. Electronics 2019, 8, 832. [Google Scholar] [CrossRef]
  17. Guidotti, R.; Monreale, A.; Ruggieri, S.; Turini, F.; Giannotti, F.; Pedreschi, D. A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 2018, 51, 1–42. [Google Scholar] [CrossRef]
  18. Kadir, M.A.; Mosavi, A.; Sonntag, D. Evaluation metrics for xai: A review, taxonomy, and practical applications. In Proceedings of the 2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES), Nairobi, Kenya, 26–28 July 2023; pp. 000111–000124. [Google Scholar]
  19. Page-Hoongrajok, A.; Mamunuru, S.M. Approaches to intermediate microeconomics. East. Econ. J. 2023, 49, 368–390. [Google Scholar] [CrossRef]
  20. Varian, H.R. Intermediate Microeconomics: A Modern Approach, 9th ed.; W. W. Norton & Company: New York, NY, USA, 2014; p. 356. ISBN 978-0393934243. [Google Scholar]
  21. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  22. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  23. Arya, V.; Bellamy, R.K.; Chen, P.-Y.; Dhurandhar, A.; Hind, M.; Hoffman, S.C.; Houde, S.; Liao, Q.V.; Luss, R.; Mojsilović, A.; et al. Ai explainability 360 toolkit. In Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data (8th ACM IKDD CODS & 26th COMAD), Bangalore, India, 2–4 January 2021; pp. 376–379. [Google Scholar]
  24. Klaise, J.; Van Looveren, A.; Vacanti, G.; Coca, A. Alibi: Algorithms for Monitoring and Explaining Machine Learning Models. 2020. Available online: https://github.com/SeldonIO/alibi (accessed on 15 September 2025).
  25. Dion, K. Designing an Interactive Interface for FACET: Personalized Explanations in XAI; Worcester Polytechnic Institute: Worcester, MA, USA, 2024. [Google Scholar]
  26. Yang, W.; Le, H.; Laud, T.; Savarese, S.; Hoi, S.C. Omnixai: A library for explainable ai. arXiv 2022, arXiv:2206.01612. [Google Scholar] [CrossRef]
  27. Robeer, M.; Bron, M.; Herrewijnen, E.; Hoeseni, R.; Bex, F. The Explabox: Model-Agnostic Machine Learning Transparency & Analysis. arXiv 2024, arXiv:2411.15257. [Google Scholar] [CrossRef]
  28. Fel, T.; Hervier, L.; Vigouroux, D.; Poche, A.; Plakoo, J.; Cadene, R.; Chalvidal, M.; Colin, J.; Boissin, T.; Bethune, L.; et al. Xplique: A deep learning explainability toolbox. arXiv 2022, arXiv:2206.04394. [Google Scholar] [CrossRef]
  29. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674. [Google Scholar] [CrossRef]
  30. Song, Y.-Y.; Lu, Y. Decision tree methods: Applications for classification and prediction. Shanghai Arch. Psychiatry 2015, 27, 130–135. [Google Scholar]
  31. Caruana, R.; Lou, Y.; Gehrke, J.; Koch, P.; Sturm, M.; Elhadad, N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 1721–1730. [Google Scholar]
  32. Shaik, T.; Tao, X.; Xie, H.; Li, L.; Higgins, N.; Velásquez, J.D. Towards Transparent Deep Learning in Medicine: Feature Contribution and Attention Mechanism-Based Explainability. Hum.-Centric Intell. Syst. 2025, 5, 209–229. [Google Scholar] [CrossRef]
  33. Zhang, Q.; Wu, Y.N.; Zhu, S.-C. Interpretable convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8827–8836. [Google Scholar]
  34. Singh, C.; Inala, J.P.; Galley, M.; Caruana, R.; Gao, J. Rethinking interpretability in the era of large language models. arXiv 2024, arXiv:2402.01761. [Google Scholar] [CrossRef]
  35. Rai, D.; Zhou, Y.; Feng, S.; Saparov, A.; Yao, Z. A practical review of mechanistic interpretability for transformer-based language models. arXiv 2024, arXiv:2407.02646. [Google Scholar] [CrossRef]
  36. Almenwer, S.; El-Sayed, H.; Sarker, M.K. Classification Method in Vision Transformer with Explainability in Medical Images for Lung Neoplasm Detection. In Proceedings of the International Conference on Medical Imaging and Computer-Aided Diagnosis, Manchester, UK, 19–21 November 2024; pp. 85–99. [Google Scholar]
  37. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  38. Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; Müller, H. Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1312. [Google Scholar] [CrossRef] [PubMed]
  39. Molnar, C. Interpretable Machine Learning; Lulu.com: Morrisville, NC, USA, 2020. [Google Scholar]
  40. Puiutta, E.; Veith, E.M. Explainable reinforcement learning: A survey. In International Cross-Domain Conference for Machine Learning and Knowledge Extraction; Springer: Cham, Switzerland, 2020; pp. 77–95. [Google Scholar]
  41. Bhatt, U.; Xiang, A.; Sharma, S.; Weller, A.; Taly, A.; Jia, Y.; Ghosh, J.; Puri, R.; Moura, J.M.; Eckersley, P. Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 648–657. [Google Scholar]
  42. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 1–11. [Google Scholar]
  43. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends® Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
  44. Xin, Y.; Luo, S.; Jin, P.; Du, Y.; Wang, C. Self-training with label-feature-consistency for domain adaptation. In International Conference on Database Systems for Advanced Applications; Springer: Cham, Switzerland, 2023; pp. 84–99. [Google Scholar]
  45. Wang, J.; He, Y.; Li, K.; Li, S.; Zhao, L.; Yin, J.; Zhang, M.; Shi, T.; Wang, X. MDANet: A multi-stage domain adaptation framework for generalizable low-light image enhancement. Neurocomputing 2025, 627, 129572. [Google Scholar] [CrossRef]
  46. Xin, Y.; Du, J.; Wang, Q.; Lin, Z.; Yan, K. Vmt-adapter: Parameter-efficient transfer learning for multi-task dense scene understanding. Proc. AAAI Conf. Artif. Intell. 2024, 38, 16085–16093. [Google Scholar] [CrossRef]
  47. Yi, M.; Li, A.; Xin, Y.; Li, Z. Towards understanding the working mechanism of text-to-image diffusion model. Adv. Neural Inf. Process. Syst. 2024, 37, 55342–55369. [Google Scholar]
  48. Xin, Y.; Du, J.; Wang, Q.; Yan, K.; Ding, S. Mmap: Multi-modal alignment prompt for cross-domain multi-task learning. Proc. AAAI Conf. Artif. Intell. 2024, 38, 16076–16084. [Google Scholar] [CrossRef]
  49. Wang, R.; He, Y.; Sun, T.; Li, X.; Shi, T. UniTMGE: Uniform Text-Motion Generation and Editing Model via Diffusion. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, 26 February–6 March 2025; pp. 6104–6114. [Google Scholar]
  50. Stafford, V. Zero trust architecture. NIST Spec. Publ. 2020, 800, 59p. [Google Scholar]
  51. Khan, M.J. Zero trust architecture: Redefining network security paradigms in the digital age. World J. Adv. Res. Rev. 2023, 19, 105–116. [Google Scholar] [CrossRef]
  52. Allendevaux, S. How US State Data Protection Statutes Compare in Scope to Safeguard Information and Protect Privacy Using Iso/Iec 27001: 2013 and Iso/Iec 27701: 2019 Security and Privacy Management System Requirements as an Adequacy Baseline. Ph.D. Thesis, Northeastern University, Boston, MA, USA, 2021. [Google Scholar]
  53. Singh, K.; Kashyap, A.; Cherukuri, A.K. Interpretable Anomaly Detection in Encrypted Traffic Using SHAP with Machine Learning Models. arXiv 2025, arXiv:2505.16261. [Google Scholar] [CrossRef]
  54. Lukács, A.; Váradi, S. GDPR-compliant AI-based automated decision-making in the world of work. Comput. Law Secur. Rev. 2023, 50, 105848. [Google Scholar] [CrossRef]
  55. Baniecki, H.; Biecek, P. Adversarial attacks and defenses in explainable artificial intelligence: A survey. Inf. Fusion 2024, 107, 102303. [Google Scholar] [CrossRef]
  56. Amershi, S.; Weld, D.; Vorvoreanu, M.; Fourney, A.; Nushi, B.; Collisson, P.; Suh, J.; Iqbal, S.; Bennett, P.N.; Inkpen, K.; et al. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK, 4–9 May 2019; pp. 1–13. [Google Scholar]
  57. Patil, S.; Varadarajan, V.; Mazhar, S.M.; Sahibzada, A.; Ahmed, N.; Sinha, O.; Kumar, S.; Shaw, K.; Kotecha, K. Explainable artificial intelligence for intrusion detection system. Electronics 2022, 11, 3079. [Google Scholar] [CrossRef]
  58. Galli, A.; La Gatta, V.; Moscato, V.; Postiglione, M.; Sperlì, G. Explainability in AI-based behavioral malware detection systems. Comput. Secur. 2024, 141, 103842. [Google Scholar] [CrossRef]
  59. Gupta, S.; Singh, B. An intelligent multi-layer framework with SHAP integration for botnet detection and classification. Comput. Secur. 2024, 140, 103783. [Google Scholar] [CrossRef]
  60. Nazim, S.; Alam, M.M.; Rizvi, S.S.; Mustapha, J.C.; Hussain, S.S.; Suud, M.M. Advancing malware imagery classification with explainable deep learning: A state-of-the-art approach using SHAP, LIME and Grad-CAM. PLoS ONE 2025, 20, e0318542. [Google Scholar] [CrossRef] [PubMed]
Figure 1. US private AI investment from 2015 to 2024 [10].
Figure 2. Number of publications in XAI from 2015 to 2024.
Figure 3. Relationship between interpretability and accuracy for AI models.
Figure 4. Taxonomy of Explainable AI Methods in Security and Privacy Applications.
Figure 5. XAI research roadmap for 2025–2030+.
Figure 6. SHAP Summary Plot for Encrypted Traffic Anomaly Detection (XGBoost Model) [53].
Figure 7. Possible exceptions for automated decision-making under the GDPR [54].
Figure 8. Ensembling explanations from multiple methods provides stronger defense against adversarial attacks, as the aggregated output remains stable even when a single explanation is successfully manipulated [55].
Table 1. Summary of reviewed studies and qualitative evaluation criteria.
Evaluation Item | Description
Research Question | How explainable AI methods (model-agnostic and model-specific) contribute to trustworthy and transparent AI systems across domains
Time Period | 2015–2024
Databases Searched | Scopus, IEEE Xplore, Web of Science
Inclusion Criteria | Peer-reviewed English-language journal and conference papers on XAI methods or applications
Performance Measures | Accuracy, F1-score, fidelity, latency, interpretability
Qualitative Evaluation | Thematic analysis of advantages, limitations, and emerging research gaps
Table 2. Open-Source XAI toolkits summary.
Organization/Project | Tool | Highlights | Reference
IBM | AIX360 (v0.3.0) | Broad library of interpretability and explanation methods | [23]
SeldonIO | Alibi (v0.9.6) | Global/local explainers including counterfactuals and confidence metrics | [24]
BCG X | Facet (v2.0) | Geometric inference of feature interactions and dependencies | [25]
OmniXAI | OmniXAI (v1.2.2) | Multi-data, multi-model XAI toolkit with GUI | [26]
Explabox | Explabox (v0.7.1) | Auditing pipeline for explainability, fairness, robustness | [27]
Xplique | Xplique (v1.3.3) | Toolkit for deep-learning model introspection | [28]
Table 3. Comparative summary of model accuracy and interpretability across representative domains.
Model Type | Typical Domain | Example Study | Accuracy (%) | Interpretability Level 1
Decision Tree | Credit Scoring/Risk Analysis | [31] | 85–90 | ★★★★★
Random Forest | Drug Discovery/Genomics | [16] | 88–93 | ★★★★☆
CNN (Grad-CAM) | Medical Imaging | [36] | 90–95 | ★★★☆☆
Attention-based NN | NLP/Healthcare | [37] | 92–96 | ★★★☆☆
Interpretable NN (Glass-box) | Finance/Tabular Data | [33] | 87–94 | ★★★★☆
1 Interpretability levels are estimated qualitatively based on model transparency and human comprehensibility (★★★★★ = fully interpretable, ★☆☆☆☆ = opaque).
Table 4. Comparison of XAI toolkits and methods in cybersecurity applications.
Study | XAI Method | Application Domain | Main Performance Metric(s) | Interpretability Metric
[57] | LIME on DT/RF/SVM | Intrusion detection | Demonstrated high detection accuracy and balanced precision–recall across multiple classifiers | Qualitative per-class LIME analyses
[58] | SHAP, LIME, LRP, Grad-CAM | Behavioral malware detection | Reported consistently high detection rates and AUC values across evaluated detectors | Comparative usefulness of four XAI techniques
[59] | SHAP integrated in multilayer pipeline | Botnet detection & classification | Achieved competitive F1-scores and low false-positive rates on benchmark datasets | Global + local SHAP attributions
[60] | SHAP, LIME, Grad-CAM | Malware classification | Outperformed conventional CNN baselines in accuracy and robustness | Visual + feature-level explanations
[55] | Robust/adv. XAI | Adversarial settings for XAI | Synthesizes empirical studies on model robustness and explanation fidelity | Stability and robustness of explanations
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
