Systematic Review

Explainable Generative AI: A Two-Stage Review of Existing Techniques and Future Research Directions

Faculty of Information Technology, University of Jyväskylä, P.O. Box 35, FI-40014 Jyväskylä, Finland
* Author to whom correspondence should be addressed.
Submission received: 30 October 2025 / Revised: 29 December 2025 / Accepted: 13 January 2026 / Published: 16 January 2026

Abstract

Generative Artificial Intelligence (GenAI) models produce increasingly sophisticated outputs, yet their underlying mechanisms remain opaque. To clarify how explainability is conceptualized and implemented in GenAI research, this two-stage review systematically examined 261 articles retrieved from six major databases. After removing duplicates and applying predefined inclusion criteria, 63 articles were retained for full analysis. In the first stage, an umbrella review synthesized insights from 18 review papers to identify prevailing frameworks, strategies, and conceptual challenges surrounding explainability in GenAI. In the second stage, an empirical review analyzed 45 primary studies to assess how explainability is operationalized, evaluated, and applied in practice. Across both stages, findings reveal fragmented approaches, a lack of standardized evaluation frameworks, and persistent challenges, including limited generalizability, interpretability–performance trade-offs, and high computational costs. The review concludes by outlining future research directions aimed at developing user-centric, regulation-aware explainability methods tailored to the unique architectures and application contexts of GenAI. By consolidating theoretical and empirical evidence, this study establishes a comprehensive foundation for advancing transparent, interpretable, and trustworthy GenAI systems.

1. Introduction

Generative Artificial Intelligence (GenAI) has undergone rapid development in recent years, transforming from theoretical constructs into widely deployed technologies capable of generating text, images, audio, and multimodal content at human-competitive levels. Advances in deep generative architectures—including Generative Adversarial Networks (GANs) [1], Variational Autoencoders (VAEs) [2], autoregressive transformers [3], and diffusion models [4]—have enabled applications in scientific discovery, creative industries, and decision support [5]. Prominent systems such as GPT-4, DALL·E, and Stable Diffusion exemplify the increasing societal reach and transformative potential of generative technologies.
This rapid progress has intensified concerns regarding transparency and understanding. While explainability research has traditionally focused on discriminative models and their decision boundaries [6,7], generative systems introduce distinct epistemic challenges. Rather than producing deterministic predictions for predefined labels, GenAI models stochastically generate artifacts based on learned distributions, latent representations, and, increasingly, external knowledge retrieval mechanisms. As a result, explainability in generative systems raises different questions about what it means to account for model behavior, outputs, and underlying processes [8].
Importantly, explainability in GenAI encompasses multiple, qualitatively different explanatory targets, including (1) output-oriented explanations (e.g., why a particular token, image region, or object was generated); (2) explanations of latent-space behavior (e.g., the role of internal representations or sampling dynamics); and (3) data- and knowledge-source explanations (e.g., the influence of training data or retrieved external content). These concerns differ fundamentally from classical XAI objectives such as feature attribution for classification tasks, and therefore challenge the direct transfer of existing explainability paradigms to generative settings [9].
Recent survey work has established explainability for generative models as a legitimate and increasingly important research topic. For example, Schneider [8] introduce the notion of GenXAI and outline desiderata such as verifiability and user interaction, while Zhao et al. [9] review interpretability methods for large language models, highlighting the need to account for internal mechanisms, training dynamics, and emergent behaviors. Together, these surveys suggest a growing recognition that generative systems constitute a distinct object of explainability research.
At the same time, explainable GenAI remains an emerging and conceptually unsettled research area rather than a consolidated field. Existing studies are heterogeneous in scope, objectives, and methodological grounding, and many approaches adapt techniques originally developed for discriminative models without a shared theoretical foundation tailored to generative processes. Rather than presupposing the existence of a mature explanatory theory for GenAI, this review treats the current lack of conceptual consolidation as an empirical characteristic of the literature itself. Accordingly, the aim of this article is not to propose a new explanatory model or theory, but to systematically examine how explainability has been approached so far, to identify recurring patterns and limitations, and to clarify open conceptual and methodological gaps.
To this end, we conduct a two-stage systematic review guided by the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) framework [10]. A total of 261 articles were identified across six major databases. After deduplication and screening based on predefined inclusion and exclusion criteria, 63 articles were retained for full-text analysis. The first stage comprises an umbrella review of 18 existing review articles, synthesizing prevailing conceptual and methodological perspectives on explainability in GenAI. The second stage consists of an empirical review of 45 primary studies, focusing on how explainability is operationalized, implemented, and evaluated in concrete generative settings. Together, these stages provide a structured synthesis of the current state of the field.
The contributions of this review are fourfold:
  • It provides the first two-stage synthesis of explainability research in GenAI, integrating insights from both review and empirical studies;
  • It introduces an analytical taxonomy that organizes existing explainability approaches according to shared methodological principles and application contexts;
  • It critically synthesizes strengths, limitations, and recurring tensions observed across the literature; and
  • It articulates future research directions aimed at addressing conceptual fragmentation, evaluation challenges, and human-centered transparency requirements.
By consolidating and critically interpreting existing work, this article establishes a foundation for future theory-building and methodological innovation, while making explicit the limitations of current explainability approaches in generative AI.
The remainder of this article is organized as follows: Section 2 introduces the conceptual background of explainability in GenAI; Section 3 details the two-stage review methodology; Section 4 presents the taxonomy and analysis of existing techniques; Section 5 analyzes open challenges; Section 6 discusses practical recommendations and future research directions; and Section 7 concludes with reflections on implications for research, practice, and policy.

2. Preliminaries of Explainable Generative AI

2.1. Generative Artificial Intelligence

GenAI is a collection of computational techniques designed to autonomously generate novel, synthetic data, such as text, images, audio, and video, from learned model parameters and input data [5]. These GenAI tools mainly leverage deep learning (DL) models such as Transformer-based models (TRMs), GANs, and VAEs to generate highly realistic and meaningful content [11]. Transformer models use self-attention mechanisms to process and generate content, sometimes in combination with CNN-based components [12]. OpenAI’s ChatGPT and Google’s Gemini are representative examples of GenAI models built on transformers. VAEs are used mainly in image synthesis and anomaly detection applications, combining deep latent-variable models with related inference models [2]. GANs, in turn, can produce highly realistic results using a two-network architecture comprising a generator that produces data and a discriminator that judges whether the data are real [1]. This GAN architecture is widely utilized in applications including image generation, music composition, and 3D object generation. These complex back-end architectures enable current GenAI systems to produce content that is nearly indistinguishable from human-generated content. While these architectures differ in their mechanisms, they share several properties that make explainability particularly challenging.
First, generation is driven by latent representations rather than explicit decision boundaries. This makes it difficult to trace how internal factors shape outputs. In models such as VAEs and diffusion models, this latent space is often high-dimensional and abstract, which can limit interpretability [13,14]. Second, GenAI models produce outputs through multi-step stochastic processes (e.g., autoregressive token sampling in transformers or iterative denoising in diffusion models) [15]. Identical inputs may result in different outputs, complicating attribution and the traditional notion of “why” a specific outcome was produced [16]. Third, emergent behaviors and distributed representations, especially in large transformer models, make it difficult to pinpoint localized features or parameters responsible for specific generative effects [17]. This contrasts with discriminative models, where feature attribution techniques have clearer interpretive value. Finally, external augmentation mechanisms (e.g., retrieval, fine-tuning, prompt engineering) introduce additional layers influencing the generative process, further complicating provenance and transparency [18,19]. Together, these architectural characteristics highlight why conventional XAI methods designed for classification tasks often fail to generalize to generative settings, motivating the need for GenAI-specific explainability frameworks.
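The second property above, multi-step stochastic sampling, can be illustrated with a minimal pure-Python sketch of temperature-based token sampling (an illustrative toy, not drawn from any reviewed study): identical input logits yield different sampled tokens across draws, which is precisely what complicates attributing “why” a particular output was produced.

```python
import math
import random

def softmax(logits, temperature=1.0):
    # Convert raw model scores into a sampling distribution;
    # lower temperature sharpens it, higher temperature flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    # Draw one token index from the softmax distribution.
    probs = softmax(logits, temperature)
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

# Identical input logits on every call, yet the sampled token varies.
logits = [2.0, 1.5, 0.5, 0.1]
rng = random.Random(0)
draws = [sample_token(logits, temperature=1.0, rng=rng) for _ in range(1000)]
print(len(set(draws)) > 1)  # several distinct tokens appear
```

Because the “cause” of any single draw includes the random state, a feature-attribution question posed about one realized output is under-determined; only the distribution over outputs is stable.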

2.2. Explainable Artificial Intelligence (XAI)

The term XAI refers to the techniques and methodologies that make the decisions and internal processes of AI systems understandable to humans [20]. XAI provides tools that help address the black-box nature of AI models by using explanations to enhance a model’s transparency, understandability, and reliability.
Explainability techniques fall into two primary divisions according to their scope of applicability: model-specific explanations and model-agnostic explanations. Model-specific explanations are designed for particular model architectures, such as Gradient-Weighted Class Activation Mapping (Grad-CAM), while model-agnostic explanations are applicable to all types of machine learning models, irrespective of their internal architecture, such as LIME [21].
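The model-agnostic idea can be sketched with a minimal occlusion-style attribution routine in pure Python (an illustrative toy in the spirit of perturbation-based methods such as LIME; the `occlusion_importance` function and the toy model are hypothetical, not from the reviewed studies): each feature is scored by how much the black-box output changes when that feature is replaced with a baseline value, requiring no access to the model’s internals.

```python
def occlusion_importance(model, x, baseline=0.0):
    """Model-agnostic attribution: score each feature by the change in
    the model's output when that feature is occluded with a baseline."""
    base_out = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline  # occlude one feature at a time
        scores.append(base_out - model(perturbed))
    return scores

# Toy black-box: a weighted sum we treat as opaque (only called, never inspected).
weights = [0.7, -0.2, 0.0, 1.5]
model = lambda x: sum(w * v for w, v in zip(weights, x))

print(occlusion_importance(model, [1.0, 1.0, 1.0, 1.0]))
```

For this linear toy the recovered scores track the hidden weights, but for a generative model the same probe becomes unreliable: as noted above, stochastic sampling means the output can change even without any occlusion.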
In our classification, explainability techniques are further divided into two distinct groups according to their methodological nature: quantitative and qualitative. Quantitative explainability techniques explain model behavior using numerical metrics or visual outputs, while qualitative explainability techniques focus on human-centered understanding through textual explanations or expert interpretation [22].
Further, evaluation methodologies commonly employed to assess the quality and usefulness of explanations fall into five main categories: metric-based, expert-verified, anecdotal evidence, case-based, and survey-based. Metric-based evaluations rely on computational standards such as robustness, accuracy, and fidelity. In expert-verified evaluations, domain experts assess the reliability and relevance of explanations. Anecdotal evidence illustrates how helpful explanations are through informal or illustrative examples without a thorough analysis, while case-based evaluations use specific example cases or scenarios for qualitative assessment. Lastly, survey-based evaluations use structured surveys to collect user input on how well explanations are understood and how helpful they are.
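A metric-based evaluation such as fidelity can be sketched in a few lines of pure Python (an illustrative toy; the `fidelity` function and the rule-based surrogate are hypothetical, not drawn from the reviewed studies): fidelity here is the fraction of sampled inputs on which a simple, human-readable surrogate explanation agrees with the black-box model’s decision.

```python
import random

def fidelity(black_box, surrogate, samples):
    """Metric-based evaluation: fraction of inputs on which a surrogate
    explanation model reproduces the black-box model's decision."""
    agree = sum(1 for x in samples if black_box(x) == surrogate(x))
    return agree / len(samples)

# Toy black-box classifier and a simpler, human-readable surrogate rule.
black_box = lambda x: int(0.8 * x[0] + 0.2 * x[1] > 0.5)
surrogate = lambda x: int(x[0] > 0.5)  # the "explanation": a one-feature rule

rng = random.Random(42)
samples = [(rng.random(), rng.random()) for _ in range(1000)]
print(round(fidelity(black_box, surrogate, samples), 2))
```

High fidelity indicates the surrogate is a faithful summary of the black box on the sampled region; for generative models, as discussed later, defining the analogous agreement criterion over stochastic outputs is itself an open problem.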
Beyond technical implementation, explainability has long been studied as a theoretical problem in both philosophy of science and cognitive science. Classical accounts of scientific explanation emphasize mechanistic and probabilistic understanding, where explaining a phenomenon involves characterizing the processes and interactions that generate observable outcomes rather than identifying deterministic causes alone [23]. Complementarily, cognitive theories view explanations as tools that support human mental models, enabling users to reason about, predict, and appropriately trust system behavior [24]. These perspectives are particularly relevant for GenAI, where stochastic generation challenges deterministic and feature-centric notions of explanation.

2.3. The Need for Explainability and Transparency in GenAI

The explainability of an AI system refers to its ability to provide human-understandable justifications for its decisions and outputs [25]. Such explanations should be interpretable not only by technical specialists but also by diverse stakeholders, including practitioners, end users, and policymakers [6]. Most existing explanation techniques were developed for traditional AI systems, whose inner structures are comparatively easier to interpret. In this article, the term “traditional AI” refers to non-generative systems. However, it can be challenging to provide meaningful explanations for GenAI models using existing XAI methods or to design new frameworks that can handle their complex, high-dimensional latent representations [8]. Moreover, GenAI models rely heavily on large-scale datasets and intricate statistical dependencies, further complicating efforts to trace and justify individual outputs. The development of inherently interpretable GenAI models, therefore, remains an ongoing research challenge.
Explainability is often linked to transparency, but the two refer to different aspects of GenAI systems. Transparency describes how openly an AI system operates, including access to information about model architecture, training data, and generative or decision-making processes [26]. It is widely considered a prerequisite for explainability and is essential to ensure fairness, ethics, and accountability. However, achieving transparency in GenAI remains difficult because many generative models are built on deep learning architectures that function as black boxes. In contrast, explainability refers to the provision of human-understandable reasons for why a model produced a particular output. Interpretability is a related but distinct concept that describes how easily a human can understand a model’s internal logic or representations, independent of any explanation provided. Finally, human-centered and ethical considerations (e.g., fairness, accountability, usability) do not constitute explainability themselves; rather, they shape the goals and evaluation criteria for explanation methods. Distinguishing these concepts highlights that explainability in GenAI requires both technical mechanisms and alignment with user and societal needs, but these elements are not interchangeable.
Although the complexity of GenAI contributes to its opacity, ongoing efforts increasingly aim to make these systems more transparent and interpretable. Providing understandable justifications for GenAI outputs is vital to fostering trust, responsibility, and ethical use. Enhancing explainability and transparency can also promote user confidence, facilitate adoption, and support compliance with emerging regulatory standards.
Table 1 summarizes key reasons why explainability and transparency are essential in GenAI and outlines their broader implications.
Despite the recognized significance of explainability and transparency in GenAI models, existing explainability techniques and frameworks remain underdeveloped compared to those for traditional AI systems. This study explores and evaluates these techniques and frameworks by determining their strengths, limitations, and challenges.

2.4. A Formalized Model for Explainability in GenAI

Existing XAI techniques are largely grounded in assumptions inherited from discriminative modeling, where predictions are treated as deterministic or near-deterministic mappings from inputs to outputs. Feature attribution, saliency, and counterfactual explanations implicitly assume stable decision boundaries and locally linear behavior. GenAI models violate these assumptions at a fundamental level: outputs are sampled from learned probability distributions, generation unfolds through multi-step stochastic processes, and semantic content is encoded in high-dimensional latent spaces rather than explicit input features. As a result, the central explanatory question in GenAI shifts from “which features caused this decision?” to “which probabilistic mechanisms and latent factors shaped this generative outcome?”. This conceptual mismatch explains why direct transplantation of conventional XAI techniques often yields explanations that are unstable, non-reproducible, or only superficially informative.
From a theoretical perspective, explainability in GenAI cannot be defined as the justification of a single output or the attribution of importance to specific input features. Instead, explainability in generative systems concerns the ability to account for generative behavior, that is, how probability distributions, latent variables, and sampling processes interact to produce families of plausible outputs. An explanation in GenAI therefore aims to make intelligible the mechanisms and constraints shaping generation, rather than to provide a causal account of a single realized instance. This shifts the explanatory focus from outcome justification to mechanism-oriented understanding, where explanations describe how internal representations, stochastic choices, and conditioning information jointly influence the space of possible outputs.
Against this background, although existing XAI research provides important foundations, explainability for GenAI models requires a distinct theoretical formulation that explicitly accounts for stochastic outputs, latent-variable structures, and multimodal architectures. To address this need, we propose the GenAI Explainability Triangle, a conceptual model that frames explainability as an interaction between three interdependent components: generative mechanism transparency, user-centered interpretability, and evaluation fidelity. This model provides a structured theoretical lens for understanding how explainability is operationalized across different GenAI architectures.
The GenAI Explainability Triangle is introduced not as a prescriptive checklist, but as a conceptual reformulation of explainability objectives grounded in established explanatory traditions and adapted to probabilistic generation. The dimension of generative mechanism transparency aligns with mechanistic and probabilistic accounts of scientific explanation, which emphasize understanding how internal processes produce distributions of outcomes rather than attributing single causes. User-centered interpretability draws on cognitive theories of explanation and mental models, according to which explanations are effective only insofar as they support human reasoning and contextual understanding. Evaluation fidelity reflects epistemic accounts of explanation that stress faithfulness between explanatory claims and the underlying processes they purport to describe. Together, these three dimensions explain why explainability in GenAI cannot be reduced to extensions of classical XAI, but instead requires rethinking explanation at the level of generative mechanisms, human cognition, and stochastic processes.
Generative mechanism transparency concerns the extent to which a system reveals information about its internal generative pathways, including latent-space representations, sampling dynamics, architectural dependencies, and any retrieval or conditioning mechanisms [27]. For models such as GANs, VAEs, diffusion models, and transformer-based systems, transparency includes both structural insight and the capacity to trace how internal factors influence the produced output.
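Tracing how internal factors influence a produced output can be illustrated with a minimal latent-traversal sketch in pure Python (the toy `decoder` and `latent_traversal` probe are hypothetical, invented for illustration): perturbing one latent dimension while holding the others fixed reveals which output features that dimension controls, a simple instance of probing generative mechanism transparency.

```python
def decoder(z):
    """Toy deterministic 'decoder': maps a 2-D latent code to a
    3-feature output (a stand-in for an image or text embedding)."""
    z1, z2 = z
    return (2.0 * z1, z1 + z2, 3.0 * z2)

def latent_traversal(decoder, z, dim, deltas):
    """Perturb one latent dimension while holding the rest fixed,
    recording the decoded output for each perturbation."""
    outputs = []
    for d in deltas:
        probe = list(z)
        probe[dim] += d
        outputs.append(decoder(tuple(probe)))
    return outputs

# Traversing dim 0 changes the first two output features but leaves the
# third untouched, suggesting the third is governed by the other factor.
print(latent_traversal(decoder, (0.0, 0.0), dim=0, deltas=[-1.0, 0.0, 1.0]))
```

Real generative models complicate this picture, since latent dimensions are typically entangled and the decoder is stochastic, but the same traversal logic underlies many latent factor visualization techniques.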
User-centered interpretability refers to how understandable, meaningful, and actionable the explanations are for different stakeholders. Interpretability is inherently contextual because domain experts, developers, end users, and policymakers may require different explanatory forms. For GenAI systems, interpretability may involve natural-language rationales, latent factor visualizations, semantic concept manipulations, or training data provenance. This dimension highlights that explainability cannot be defined solely in technical terms; it must align with human cognitive requirements and usage contexts.
Evaluation fidelity addresses the degree to which explanations accurately reflect the true generative behavior of the model. High-fidelity explanations must be robust, replicable, and faithful to the mechanisms they purport to describe. In generative settings, fidelity may be assessed through quantitative robustness measures, alignment with latent factors, expert validation, or human-centered evaluation protocols. This dimension underscores that explanation quality depends on both methodological rigor and empirical grounding.
Together, these three dimensions form the GenAI Explainability Triangle, shown in Figure 1, which conceptualizes explainability in generative AI as involving multiple, potentially competing objectives. Existing approaches in the literature can be reviewed and compared along these dimensions, revealing how emphasis on one aspect (e.g., transparency) is often associated with trade-offs in others (e.g., interpretability or fidelity). Rather than constituting a formal explanatory model, the GenAI Explainability Triangle serves as an analytical framework that complements the taxonomy developed in this review by organizing prior work around shared explanatory concerns. In doing so, it helps clarify how explainability has been operationalized in current GenAI research and highlights systematic gaps and tensions that motivate future, generative-specific methodological and theoretical developments.
Unlike traditional XAI frameworks, which implicitly assume deterministic prediction and static model behavior, the proposed framework treats explainability as inherently probabilistic and multi-objective. Conventional approaches typically evaluate explanations against single-instance correctness or feature attribution fidelity. In contrast, explainability in generative systems must account for distributional behavior, latent-variable dynamics, and variability across multiple valid outputs. By explicitly modeling tensions between transparency, interpretability, and fidelity, the GenAI Explainability Triangle explains why many existing XAI techniques succeed only partially when applied to generative models and why explainability in GenAI must be understood as a trade-off rather than a single optimization target.
This framework provides the analytical foundation for the remainder of the review. It allows us to interpret the literature not merely as a collection of techniques or model-specific adaptations, but as evidence of how transparency, interpretability, and fidelity manifest across different generative architectures. By organizing subsequent findings through this lens, the review moves beyond descriptive mapping and develops an integrated synthesis of conceptual trends and methodological patterns.

3. Research Methodology

This study conducts a two-stage Systematic Literature Review (SLR) on explainability in GenAI systems, identifying, evaluating, and synthesizing research studies to understand current practices and research gaps in the field. The review protocol was registered on the Open Science Framework (OSF; https://doi.org/10.17605/OSF.IO/2ZWMH), ensuring transparency and reproducibility of the planned methodology.
As far as we are aware, no prior study has systematically and comprehensively reviewed the existing literature covering both review and empirical studies. Therefore, to systematically address the research questions, the review follows the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [10] to collect relevant sources and to ensure a rigorous, transparent, and reproducible methodology. The process involved several structured steps: establishing eligibility requirements, specifying information sources, outlining the search approach, detailing the selection and data collection procedures, selecting data items, assessing the risk of bias, determining effect measures, outlining synthesis techniques, disclosing bias assessment, and assessing overall certainty of findings [10]. This systematic approach ensures comprehensive coverage and a reproducible analysis of the literature.

3.1. Data Sources and Search Strategy

A search was conducted across six major academic databases—Scopus, Web of Science, ACM Digital Library, IEEE Xplore, ScienceDirect, and Sage—in order to obtain a thorough view of explainability techniques in GenAI. The search was performed in April 2025 using a Boolean query targeting the title, abstract, and keywords fields. The search string used was: ((“explainable artificial intelligence” OR “XAI” OR “explainable AI” OR “EAI”) AND (“generative artificial intelligence” OR “generative AI” OR “GAI”)). The query was subsequently adapted to each database’s syntax requirements. The search was restricted to peer-reviewed journal articles and conference proceedings published between 1 January 2020 and 30 April 2025 to ensure that the most recent, high-quality advancements were captured.

3.2. Selection Criteria

The PRISMA flow chart shown in Figure 2 summarizes the literature selection process. The database search yielded 261 records across the above-mentioned databases. After removing duplicate records (n = 114), 147 records qualified for full-text review and screening. During screening, a snowball search of the existing records identified three additional records, bringing the total for the next step to 150 articles.
Inclusion and exclusion criteria were applied to assess these 150 articles, as shown in Table 2. As shown in Figure 2, 86 articles were excluded for failing to meet the inclusion criteria. Most reports (n = 78) were excluded because they were not relevant to the primary focus of the study. The main reasons for exclusion were a lack of discussion of explainability techniques for GenAI, a focus solely on GenAI without addressing explainability, or coverage of explainability without specifically referring to GenAI. Upon closely examining the abstracts and results sections, it was confirmed that the excluded articles were not focused on the topic of this study. Eventually, 18 review articles and 45 empirical articles remained for data extraction and synthesis.
Although preprints (e.g., arXiv, SSRN) constitute a substantial proportion of emerging GenAI research, we restricted the review to peer-reviewed publications to ensure methodological consistency, reliability, and sufficient reporting quality for bias and certainty assessment. This approach aligns with PRISMA recommendations for minimizing variability in evidence quality. However, we recognize that this decision may omit very recent developments that have not yet undergone peer review, particularly given the rapid evolution of GenAI explainability techniques.

3.3. Data Extraction and Analysis

Following the analytical process of the PRISMA guidelines, this review applied thematic analysis as described by Braun and Clarke [28]. Each study was read closely to extract information relevant to the research questions, and analytical notes were taken.
A structured data extraction form was used to ensure a consistent and systematic approach across all studies. Each article was evaluated based on predefined coding criteria and guiding questions. The key details collected included bibliographic information (authors, year, country, and source), study type, application domain, explainability techniques, methodology, and key findings. For review articles, we additionally recorded the number of articles they analyzed, how they structured their findings, the challenges they identified, and the trends they synthesized. For empirical studies, specific attention was given to the GenAI modality, the XAI techniques implemented, their evaluation methods, and the results obtained. After data extraction, we synthesized the data using both qualitative and quantitative analysis techniques to answer our research questions. Finally, we consolidated all findings to highlight key advancements, existing limitations, and open challenges within the field of GenAI explainability.

3.4. Bias and Certainty Assessments

A systematic evaluation of reporting bias was carried out to ensure the credibility of the reviewed literature. Each article was assessed for data transparency and completeness to determine whether the findings aligned with the predefined research question. Studies with incomplete results, missing key outcomes, or lacking essential details were flagged as potentially biased. This evaluation helped ensure that the results were based on reliable, well-documented sources and supported the integrity of the systematic review by preventing misleading interpretations.
In addition to reporting bias, several methodological limitations inherent to the review process must be acknowledged. Restricting the search to peer-reviewed and English-language publications introduces potential publication bias, as studies with inconclusive or negative results, or those published in non-English venues, may be underrepresented. The exclusion of preprints—although intended to ensure methodological rigor—may also bias the evidence base toward more established or conservative research, especially given the pace of GenAI development. Database coverage further creates the risk of missing relevant studies from domain-specific repositories. Finally, although data extraction and coding followed predefined criteria, thematic synthesis inevitably involves interpretive judgment, which introduces a degree of subjectivity. Recognizing these limitations is essential for evaluating the scope, generalizability, and ethical transparency of the present review.
A certainty assessment was conducted in addition to the bias assessment to evaluate the overall quality and reliability of the included articles. This evaluation adhered to the general recommendations from Da’u and Salim [29], and guidelines proposed by Kitchenham and Charters [30]. These standards include aspects such as reporting clarity, methodological rigor, and relevance to the research objectives. To ensure a minimum quality threshold, all articles were chosen from prestigious peer-reviewed publication channels that maintain strict academic standards, such as citation-based metrics and expert evaluations [31]. These measures aimed to reduce uncertainty and strengthen the reliability of the study’s conclusions.
Unlike empirical GenAI studies that evaluate individual models, this review treats published primary studies as empirical observations and applies comparative synthesis across model families, application domains, explainability mechanisms, evaluation strategies, and stakeholder groups to extract higher-level empirical patterns that are not visible within individual studies.

4. Results and Analysis

This section presents the findings of the two-stage review, which jointly map the conceptual and empirical landscape of explainability in GenAI. Rather than reporting individual studies in isolation, the analysis in this section treats empirical articles as units of observation and synthesizes their findings comparatively to identify cross-study regularities, divergences, and explanatory gaps in current GenAI explainability practice. The first stage (Section 4.1) synthesizes existing review studies to establish a conceptual and methodological overview of explainability in GenAI. The second stage (Section 4.2) analyzes empirical research to examine how explainability techniques are implemented, evaluated, and applied in practice. Together, these two stages provide an integrated understanding of current approaches, limitations, and emerging trends in explainable GenAI.
Rather than presenting the results as a catalog of techniques, the analysis in this section is structured around the theoretical model introduced earlier. This framing enables us to identify how explainability practices differ across domains, model families, and evaluation strategies, and how these variations reflect deeper conceptual challenges within GenAI. The two-stage approach therefore produces not only a descriptive overview but also a comparative and interpretative synthesis. Importantly, the widespread use of adapted XAI techniques in the reviewed studies is not interpreted here as methodological validation, but as empirical evidence of how the field currently compensates for the lack of generative-native explainability frameworks.

4.1. First Stage: Review of Reviews

In the first stage, 18 review articles were identified and analyzed to capture how explainability in GenAI has been conceptualized, categorized, and critiqued across existing syntheses. Table 3 summarizes these reviews and their main findings.

4.1.1. Temporal Distribution in Review Studies

No review articles related to GenAI explainability were published before 2023. This indicates that explainability became a formal research priority only after the rapid uptake of GenAI models. The first two articles appeared in 2023, followed by a sharp increase in 2024 and 2025 (eight articles each). This delayed emergence likely reflects the field’s initial focus on generative capability rather than explainability. This sharp rise also signals increasing concerns regarding transparency and governance aligned with the first component of our framework: generative mechanism transparency.

4.1.2. Domain Distribution in Review Studies

Review articles are unevenly distributed across domains. The most represented areas are technologically mature and commercially active—such as industry and engineering [34,36,39,42,48] and technology and software development [8,33,35,37,44]. In contrast, law and governance [32,41], education [38,40], and finance and business [45] remain underexplored despite their societal relevance. Healthcare [43,46,47] occupies an intermediate position, with growing but cautious engagement driven by ethical and regulatory concerns. This uneven distribution reflects the interplay between user-centered interpretability (varying stakeholder needs) and regulatory pressure (e.g., the EU AI Act), which influences research maturity across domains. This domain distribution also highlights both the promise and the uneven maturity of GenAI explainability research.

4.1.3. Key Findings

The umbrella review reveals a rapidly growing interest in explainability for GenAI, with most work emerging only after 2023. Across nearly all reviews, the black-box nature of GenAI models is identified as the central challenge [8,32,33,34,35,36,37,39,41,42,45,46,48]. Although traditional XAI methods such as SHAP, LIME, and saliency maps are commonly applied, many reviews highlight their limited ability to capture high-dimensional and stochastic GenAI outputs [35,43].
Existing XAI techniques often fail to account for the probabilistic and generative nature of these models, where identical inputs can yield diverse outputs. This stochasticity challenges the deterministic assumptions underlying most XAI frameworks, resulting in explanations that are inconsistent or lack user relevance [34,35,38].
Most reviews therefore conclude that current research over-relies on pre-GenAI paradigms rather than developing GenAI-specific approaches [36,37,44,45]. The field also lacks standardized evaluation frameworks for assessing fidelity, robustness, and usability of explanations [32,34,38,39]. Several works call for domain- and user-centered evaluation strategies, especially in high-stakes contexts [39,41,43].
Finally, regulatory and ethical imperatives—such as those emerging from the EU AI Act—are increasingly cited as external drivers of explainability research [32,35,38,41], emphasizing the alignment of technical development with social and legal accountability.
These insights reinforce the need for model-agnostic transparency mechanisms and standardized evaluation of explanation fidelity, both essential pillars of the theoretical model introduced in the background section.

4.1.4. Cross-Study Comparisons in Review Studies

While the review articles collectively highlight similar challenges, cross-study comparison reveals notable differences in conceptual emphasis and methodological depth. For example, reviews focusing on technical domains (e.g., engineering, manufacturing, and computer science) generally frame explainability as a problem of mechanistic transparency, drawing attention to latent-space understanding and architectural constraints. In contrast, reviews from healthcare, education, and legal domains emphasize user-centered interpretability and ethical accountability. These differences indicate that explainability expectations vary substantially across sectors, which partly explains the fragmented terminology and inconsistent evaluation criteria identified in the literature.
Additionally, although most reviews agree that traditional post hoc XAI techniques are insufficient for GenAI, they diverge in their recommendations for future research. Some advocate adapting existing tools (e.g., SHAP, Grad-CAM), whereas others call for entirely new generative-specific paradigms. This divergence underscores the lack of consensus on foundational explainability principles for GenAI and highlights the need for unified frameworks that can operate across diverse generative architectures.
The insights synthesized in Stage 1 served as the conceptual foundation shaping the analytical structure of Stage 2. The umbrella review identified several recurring themes, namely fragmented terminology and conceptual inconsistency, an over-reliance on traditional XAI methods such as LIME, SHAP, and saliency-based tools despite their limited suitability for generative architectures, the absence of standardized evaluation protocols, and a broader lack of GenAI-specific explainability frameworks. These themes directly informed the coding and categorization strategy employed in Stage 2. For example, the critique of traditional XAI techniques guided our classification of empirical methods into pre-existing, modified, and novel categories; the observed inconsistency in terminology motivated our mapping of explainability practices across different GenAI model families; and the lack of unified evaluation standards shaped our comparative analysis of metric-based, expert-verified, case-based, anecdotal, and unevaluated approaches. In this way, the empirical review operationalizes the conceptual gaps and priorities highlighted in Stage 1, enabling a coherent, two-stage synthesis in which the theoretical insights of the umbrella review directly motivate and structure the empirical analysis.
Taken together, Stage 1 reveals that explainability in GenAI is not progressing linearly toward convergence, but instead fragments along theoretical fault lines. Across domains, models, and application contexts, explainability efforts repeatedly oscillate between three unresolved objectives: revealing internal generative mechanisms, producing explanations that are meaningful to human users, and maintaining fidelity to inherently stochastic processes. This pattern suggests that the lack of consensus in the literature is not merely methodological immaturity, but reflects a deeper theoretical tension in how explanation is conceptualized for generative systems.

4.2. Second Stage: Empirical Review of Primary Studies

Building on the conceptual insights synthesized in the umbrella review, the second stage of analysis examines how explainability is implemented, operationalized, and evaluated in empirical GenAI research. Whereas Stage 1 revealed fragmented terminology, heavy reliance on pre-existing XAI techniques, limited use of GenAI-specific methods, and a lack of standardized evaluation frameworks, Stage 2 investigates whether and how these patterns manifest in practice. To ensure conceptual continuity, the analytical categories applied to the empirical studies such as the classification of explainability techniques (pre-existing, modified, novel), the mapping of model families, and the comparison of evaluation approaches, were directly derived from the themes identified in Stage 1. This stage, therefore, translates the conceptual gaps and research priorities highlighted in the umbrella review into a structured empirical assessment, enabling a systematic comparison between theoretical expectations and real-world implementations of explainability in GenAI systems.
Specifically, this stage analyzes 45 primary studies that implement, test, or evaluate explainability techniques in GenAI systems. To strengthen theoretical grounding, this analysis is organized around the three components of the GenAI explainability framework.
Table 4 summarizes these studies by reference, geographic location, and key contribution, providing an overview of the empirical evidence base and illustrating the range of methodological and regional perspectives represented in current research.

4.2.1. Temporal and Geographic Distribution

The first empirical study on GenAI explainability appeared in 2022, followed by six studies in 2023 and a sharp increase in 2024 with 27 publications. As of the end of April 2025, an additional ten studies have been published. This growth reflects the broader shift in GenAI research from capability-focused development toward concerns over transparency, accountability, and governance—trends amplified by regulatory initiatives and the rising complexity of generative architectures.
Geographically, Europe accounts for the largest share of contributions (25 studies), a pattern consistent with the region’s strong regulatory environment and emphasis on interpretability requirements such as those introduced by the EU AI Act [94]. Asia (10 studies) and North America (8 studies) follow, each exhibiting distinct research priorities: Asian work often focuses on applied, domain-specific implementations of explainability, whereas North American studies emphasize human–AI interaction, trust calibration, and conceptual perspectives on explanation. South America and Oceania contribute one study each. When a study’s regional context was not explicit, we assigned a region based on the lead or majority author affiliation (see Table 4). These observations suggest that regional policy environments and academic traditions meaningfully shape the research agenda for GenAI explainability.
Further, several regional trends emerge from this distribution. European studies frequently foreground governance, auditability, and risk management, reflecting policy-driven expectations for transparency. North American research, rooted in mature HCI and sociotechnical traditions, tends to investigate user-centered explanations and cognitive aspects of trust. Asian contributions, particularly from South Korea and Japan, demonstrate strong industry–academia integration, with a focus on operationalizing explainability in sectors such as engineering, energy systems, and collaborative design. The limited representation from South America and Oceania likely reflects disparities in research investment and access to large-scale GenAI infrastructure. Collectively, these patterns show that explainability research is not globally uniform but conditioned by regional regulatory pressures, disciplinary orientations, and domain-specific application needs.

4.2.2. Domain Distribution

Figure 3 shows a broad spread across domains, with some studies spanning multiple sectors [71,72]. Technology and software is most represented (eight studies), consistent with the field’s origins in computer science, including work in HCI [64,85], human–AI interaction [78], human-centered explainability [79], code generation [49], semantic text generation [87], conversational AI [75], and causal generative image explainability [82]. When viewed through the theoretical model, technology and software domains prioritize mechanistic transparency (e.g., latent-space analysis, attention visualization).
Cybersecurity and arts and culture follow. Cybersecurity applications include intrusion detection systems (IDS) [93], cloud anomaly detection [76], detection of synthetic images [63], cyber harassment [77], misinformation [81], deepfakes [59], and O-RAN traffic [88]. Arts and culture spans content creation [72], T-shirt layout design [91], GAN-based image generation [53], digitization of archives [84], digital art and emotion [52], co-creation [50], and AI-generated music [66]. Notably, explainability in this domain often serves co-creativity rather than strictly technical validation.
Education, healthcare, and industry and engineering each contribute six studies. Education covers GenAI-supported learning [72], MOOCs and learning analytics [83], at-risk CS1 prediction [60], technology-enhanced learning (TEL) [67], teaching assistant robots [54], and adaptive content generation [71]. Healthcare work includes digital pathology [58], metagenomics [65,68], type 2 diabetes (T2D) prediction [90], chest X-ray (CXR) classification/correction [55], and personalized treatment [71]. Industry and engineering covers air-traffic anomaly detection [51], bridge design [69], industrial AI [89], energy/decarbonization [74], chemical engineering [73], and telecom management [80]. When viewed through the theoretical model, the healthcare and education domains emphasize user-centered interpretability due to safety or pedagogical constraints.
Law and governance includes social epistemology [57], legal decision-making in banking [61], auditing [62], sports policy [92], and ethical content generation [71]. When viewed through the theoretical model, law and governance focuses on evaluation fidelity (auditability, reliability, provenance tracking). Finance and business is least represented (three studies—risk in mission-critical IT [56], customer support [86], robo-advice [70]), which may reflect proprietary constraints and regulatory exposure. Overall, the distribution suggests explainable GenAI is increasingly cross-sectoral, though many studies remain experimental with limited end-user evaluation. Additionally, this cross-domain comparison formalizes how explainability goals shift depending on generative context and stakeholder needs.

4.2.3. Distribution Across GenAI Model Families

Figure 4 categorizes studies by model family; some evaluate multiple families [59,71,80,92].
Transformer-based models dominate (25 studies), reflecting their multi-modal adaptability. This includes LLMs such as GPT-3.5/4/4o-mini, PaLM-2, Phi-2, ChatGPT, Gemini, Bard, Claude [54,56,61,62,64,67,70,71,72,73,75,79,80,84,86,87,89,91], Mistral-7B-Instruct-v0.1 [77], text-to-image systems (VQGAN-CLIP, DALL-E variants) [52,59,71], and DistilBERT [83,87]. Other uses include LLM-powered search [78], structured prediction with Gemma 2B [60], and code models [49]. Attention visualization and prompt-based probes are frequently leveraged for interpretability.
GANs (11 studies) include CTGANs, StyleGAN variants, ProGAN, BigGAN, StarGAN, AttGAN, CycleGAN, GauGAN, GDWCT, CopulaGAN [53,59,65,68,71,76,80,90,92,93]. Their adversarial formulation complicates explanation, motivating post hoc tools.
VAEs (traditional, CVAE, β-VAE, AAE, SS-DeepCAE, Sketch-RNN) appear less often [51,55,66,69,74,92] but offer more structured latent spaces.
Diffusion models (DMs) are emerging: LDMs [63], DDPMs [58], Stable Diffusion 1.5 [85], Glide/LDM/Stable Diffusion [59].
A small subset uses other or hybrid approaches: CDGMs [82], unspecified GenAI [57], or GenAI images without model detail [81]. The trend points toward architectural diversification and the need for explanation methods that generalize across families.
Each GenAI model family in Figure 4 introduces distinct explainability challenges stemming from its underlying generative mechanisms. GANs rely on adversarial training between generator and discriminator networks, producing highly nonlinear and interdependent feature representations that complicate attribution. Diffusion models generate outputs through iterative denoising steps, where each step depends on stochastic sampling; explaining a final output therefore requires tracing a long generative trajectory across latent space. VAEs, although structurally more interpretable, compress information into latent variables whose abstract semantics do not always map cleanly to human concepts. Transformer-based models add further complexity due to distributed attention patterns and emergent behaviors, which obscure localized causal influences on generated tokens. These architectural characteristics illustrate why explainability techniques often need to be tailored to specific generative families rather than applied uniformly across GenAI models.
Organizing these findings through our theoretical model reveals that transformers demand user-centered and workflow-based explanations, GANs and DMs require techniques targeting generative mechanism transparency, and VAEs provide natural alignment with evaluation fidelity through stable latent structures.
To examine practice, Figure 5 maps domains, model families, and explanation types. Transformers appear in all domains except finance and business, often paired with repurposed XAI. Novel methods remain sparse and domain-specific, underscoring gaps in model–method fit.

4.2.4. Explainability Techniques

The categorization of explainability techniques in this section is deliberately critical: the prevalence of classical XAI methods is analyzed as an indicator of conceptual borrowing rather than as evidence of their adequacy for generative models. Across 45 studies, techniques cluster into pre-existing, modified pre-existing, and novel approaches (Table 5).
Pre-existing methods, especially LIME and SHAP, remain the default choices [95], but their post hoc, relevance-based logic struggles with open-ended generative outputs. For instance, Biswal [77] uses LIME and SHAP to attribute word-level contributions in a fine-tuned Mistral-7B-Instruct-v0.1 model for cyber-harassment detection, an approach that is useful but still limited for generative behaviors.
While pre-existing XAI techniques such as SHAP, LIME, and Grad-CAM remain widely used in the reviewed studies, their applicability to generative models is fundamentally limited. These methods were originally developed for discriminative settings with deterministic decision boundaries, whereas generative systems produce outputs through multi-step stochastic sampling and high-dimensional latent representations. As a result, perturbation-based attribution scores often become unstable, non-reproducible, or only loosely related to the internal generative mechanisms [96]. Generative architectures including transformers, GANs, VAEs, and diffusion models compress semantic factors into latent spaces that do not map directly to input features, meaning that attribution methods assuming locality in input space fail to capture how latent variables, cross-token dependencies, or denoising trajectories shape outputs [14,97]. Moreover, emergent behaviors in large-scale generative transformers arise from distributed internal representations that cannot be localized to specific tokens or pixels. Thus, additive or linear feature decompositions used by classical XAI methods cannot meaningfully reflect the nonlinear, context-dependent interactions underlying content generation [98,99]. Consequently, many empirical studies employ SHAP, LIME, or Grad-CAM only as proxy indicators—highlighting salient regions or latent correlations—rather than as faithful explanations of how generative systems produce outputs. This identifies a key methodological gap: transferring classical XAI methods into generative contexts yields surface-level interpretability but rarely provides mechanistic insight into the generative process. These limitations collectively motivate the development of GenAI-specific explainability frameworks that can account for latent-space behavior, sampling dynamics, and data-source influences.
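The instability described above can be illustrated with a minimal, self-contained sketch. The linear "generator" and the simplified perturbation scheme below are both toy constructions (not any reviewed system or library): the same attribution procedure recovers the true feature influences when sampling is switched off, but returns visibly different scores on every run once stochastic sampling is enabled.

```python
import random
import statistics

def stochastic_generator(x, temperature=1.0):
    """Toy stand-in for a generative model: a fixed linear signal
    plus sampling noise whose scale is set by the temperature."""
    signal = 2.0 * x[0] + 0.5 * x[1]  # true feature influences: 2.0 and 0.5
    return signal + random.gauss(0.0, temperature)

def perturbation_attribution(f, x, eps=0.1, samples=20):
    """LIME/SHAP-style local attribution: average change in output
    when each feature is perturbed in isolation."""
    scores = []
    for i in range(len(x)):
        deltas = []
        for _ in range(samples):
            x_pert = list(x)
            x_pert[i] += eps
            deltas.append((f(x_pert) - f(x)) / eps)
        scores.append(statistics.mean(deltas))
    return scores

random.seed(0)
x = [1.0, 1.0]

# With sampling switched off, attribution recovers the true influences.
deterministic = perturbation_attribution(
    lambda z: stochastic_generator(z, temperature=0.0), x)
print("deterministic attributions:", [round(s, 2) for s in deterministic])

# With stochastic sampling, repeating the *same* attribution procedure
# yields a different score on every run.
runs = [perturbation_attribution(stochastic_generator, x) for _ in range(10)]
spread = statistics.stdev(r[0] for r in runs)
print("feature-0 attribution across runs:", [round(r[0], 2) for r in runs])
print("run-to-run standard deviation:", round(spread, 2))
```

In this toy setting the run-to-run spread of the attribution score is of the same order as the true influence itself, which is exactly the non-reproducibility that the reviewed studies report when transferring perturbation-based tools to generative models.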
To address these limitations, many studies introduce modification or hybridization: adapting known tools and combining symbolic, statistical, and visual components. Examples include VAE-focused latent analyses and composite visualization for air-traffic trajectories [51], SHAP+PDPs with proxy models [52], and attention visualizations integrated with semantic role labeling and clustering [84]. These augmentations improve domain fit and human interpretability without inventing entirely new frameworks.
A smaller but growing stream introduces novel techniques. Social Transparency [49,79] surfaces other users’ interactions; Concept Lens maps semantic latent directions in GANs [53]; Epistemic filters [57] make data/epistemic assumptions explicit; LLM self-explanations [78] provide natural-language justifications; and human–GenAI collaboration models [72] support meta-cognitive understanding. These remain early-stage but align more directly with generative characteristics and user needs.
The taxonomy of techniques (pre-existing, modified, and novel) is retained but now contextualized theoretically: pre-existing techniques primarily address local transparency but often fail to capture probabilistic generation; modified techniques enhance alignment with user-centered interpretability or adapt to latent-space structures; and novel techniques represent emerging attempts to bridge all three components of the theoretical model. This reframing clarifies why certain methods dominate and where gaps persist.
Figure 6 shows that modified pre-existing approaches dominate, followed by pre-existing; novel techniques are rare. Some studies span multiple categories [49,56,72,78].
While Table 5 classifies explainability techniques based on their degree of novelty (pre-existing, modified, and novel), this categorization alone does not capture their deeper analytical properties. Therefore, Table 6 introduces a complementary multi-dimensional taxonomy that characterizes empirical GenAI explainability techniques based on their mechanism of explanation, timing (ante hoc vs. post hoc), scope (local vs. global), target audience, and methodological nature.
Table 6 shows that empirically applied GenAI explainability techniques are highly heterogeneous across all analytical dimensions. Post hoc explanations remain dominant, particularly through feature-attribution methods such as SHAP, LIME, and Grad-CAM, reflecting the continued reliance on classical XAI approaches despite their known limitations for generative systems. Ante hoc mechanisms, including latent-space interpretability and epistemic filtering, appear less frequently but offer stronger alignment with the internal generative process. The distribution across local, global, and mixed scopes further illustrates that most studies prioritize instance-level transparency, while fewer address system-level behavior. With respect to target audiences, developers and end users are most frequently supported, whereas policymakers and educators remain comparatively underrepresented.
Finally, beyond functional categories, these techniques can also be grouped by methodological nature: quantitative, qualitative, or mixed. In our review, 66.7% combine quantitative and qualitative elements; 17.8% are purely quantitative; and 15.6% purely qualitative. The prevalence of mixed methodological approaches indicates a growing recognition that neither purely quantitative nor purely qualitative explanations alone are sufficient for capturing the complexity of GenAI behavior. Further, quantitative metrics offer rigor but can be opaque to non-experts, while qualitative analysis surfaces human-centered meaning but may lack robustness.
Collectively, this multi-dimensional perspective reveals that current GenAI explainability research is still fragmented, with limited convergence toward unified, audience-sensitive, and generative-native explanation paradigms.
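As a concrete illustration, the analytical dimensions of the taxonomy can be encoded as a small data structure and tallied in the same way as the distributional analysis above. The class and the three entries below are illustrative placeholders (hypothetical values, not rows coded from Table 6):

```python
from dataclasses import dataclass
from collections import Counter

@dataclass(frozen=True)
class ExplainabilityTechnique:
    """One row of a Table 6-style taxonomy; the fields mirror the
    analytical dimensions named in the text."""
    name: str
    mechanism: str   # e.g., feature attribution, latent-space analysis
    timing: str      # "ante hoc" or "post hoc"
    scope: str       # "local", "global", or "mixed"
    audience: str    # e.g., developer, end user, policymaker
    nature: str      # "quantitative", "qualitative", or "mixed"

# Illustrative entries only (hypothetical codings, not the review's data).
corpus = [
    ExplainabilityTechnique("SHAP", "feature attribution", "post hoc",
                            "local", "developer", "quantitative"),
    ExplainabilityTechnique("Concept Lens", "latent-space analysis", "ante hoc",
                            "global", "developer", "mixed"),
    ExplainabilityTechnique("LLM self-explanation", "natural-language rationale",
                            "post hoc", "local", "end user", "qualitative"),
]

# The same kind of tally used for the methodological-nature distribution.
by_nature = Counter(t.nature for t in corpus)
print(dict(by_nature))  # → {'quantitative': 1, 'mixed': 1, 'qualitative': 1}
```

Encoding techniques this way makes the multi-dimensional comparison reproducible: any of the dimensions (timing, scope, audience) can be tallied with the same one-line `Counter` pattern.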
Across the 45 empirical studies, several outcome-level performance trends can be observed despite the absence of standardized evaluation metrics [52,55,56,65,67,71,74,76,81,83,86]. With respect to computational overhead, lightweight post hoc techniques such as LIME, SHAP, and Grad-CAM generally introduce low to moderate computational cost [59,63,65,77], whereas hybrid and real-time explainability frameworks (e.g., DYNAMIC, ExplainAgent, and blockchain-based auditing systems) report higher overhead, particularly in large-scale or real-time applications [61,62,71,80,89]. Regarding user-centered outcomes, multiple studies consistently report improvements in user trust, satisfaction, confidence calibration, and task performance when explanations are integrated, especially in human–AI interaction, education, and creative co-creation settings [50,54,67,71,75,78,79,83,86,91]. In terms of explainability effectiveness, several studies demonstrate enhanced fidelity, robustness, bias detection, and auditability, particularly in healthcare, cybersecurity, and legal decision-making contexts [51,58,65,71,76,77,80,90,93]. However, due to the heterogeneity of metrics, scales, and evaluation protocols across these studies, these results cannot yet be aggregated into a formal meta-analysis, highlighting the need for standardized GenAI explainability benchmarks.
Further, despite frequent claims of “using explainability,” few papers show end-user or developer-facing explanation artifacts. Selected examples include Grad-CAM overlays for diffusion-generated vs. real images [63] (Figure 7), SHAP global importance for GAN-based anomaly detection [76] (Figure 8), counterfactual comparisons for image classifiers (see, e.g., [82]), and Concept Lens for semantic edits in GANs (see, e.g., [53]).
Finally, Figure 9 distinguishes techniques usable across AI vs. GenAI-only. We observe 43 techniques (53.3%) applicable broadly and 28 (46.7%) GenAI-specific. None of the GenAI-specific methods are universal within GenAI; they are application-bound, underscoring the absence of common standards.
Overall, the field is inching from post hoc rationalization toward more integrated, adaptive, and user-centered approaches, but explainability is still too often an afterthought in empirical deployments.
When synthesized across domains and model families, the empirical evidence suggests that current explainability practices in GenAI function largely as compensatory mechanisms rather than principled explanatory solutions. Pre-existing XAI techniques are repeatedly adapted to approximate transparency, while novel approaches remain sparse and domain-specific. This pattern indicates that explainability research has focused more on mitigating the opacity of existing generative architectures than on reconceptualizing explanation in probabilistic terms. Importantly, this compensatory dynamic manifests differently across generative model families: transformer-based systems predominantly rely on post hoc, user-facing explanations due to opaque distributed representations and interactive deployment contexts; VAE-based models more frequently support ante hoc latent interpretability through structured latent spaces; and GAN-based systems depend heavily on proxy and surrogate explanations shaped by adversarial training dynamics. These systematic differences are not methodological coincidences but reflect structural properties of generative architectures, providing empirical support for the need for architecture-sensitive, generative-native explainability frameworks.

4.2.5. Evaluation Methods

We classify evaluation strategies into six categories: metric-based evaluation, anecdotal evidence, expert-verified assessment, case-based validation, survey-based validation, and unevaluated techniques (Figure 10 and Figure 11). Several papers use multiple strategies [50,51,52,53,56,58,67,68,69,71,78,79,81,85,86].
Metric-based evaluation dominates, especially for pre-existing and modified techniques. Importantly, many metrics gauge model performance (e.g., accuracy, precision/recall/F1, AUC/ROC, MCC, hallucination rates, reconstruction loss) rather than explanation quality [52,55,56,59,65,66,67,71,74,76,81,83,86,88]. Poor model scores can undermine otherwise well-formed explanations. A smaller subset targets explainability directly—e.g., CLIP alignment [52], readability [70], cosine similarity of perceptual features [68], user valuation and choice models [75], latent response/divergence/curvature [51], IM1/IM2/oracle scores [82], edit magnitude/variability and concept distance [53], misattribution/dehumanization scores [64], feature/sensitivity scores [69], SUS and user satisfaction/ethics indices [71], and user trust/reliance calibration [78]. These are replicable, but still miss facets of situated human interpretability; only a few works explicitly close that gap [71,78].
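As one example of the explanation-focused metrics listed above, cosine similarity between feature vectors can be computed directly. The two vectors below are hypothetical stand-ins for an explanation's attribution profile and a reference importance profile (e.g., perceptual features, as in [68]); they are not data from any reviewed study.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors; 1.0 means the
    explanation's emphasis matches the reference profile exactly."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical profiles: per-feature attribution scores from an explanation
# versus a reference importance profile (e.g., expert-annotated features).
explanation = [0.8, 0.1, 0.1]
reference = [0.7, 0.2, 0.1]
print(round(cosine_similarity(explanation, reference), 3))  # prints 0.988
```

A score near 1.0 indicates close alignment of emphasis; as the surrounding discussion notes, however, such replicable metrics still capture only one facet of situated human interpretability.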
Anecdotal evidence and unevaluated techniques are also common. Anecdotal assessments (e.g., visual inspection, think-aloud, qualitative observations) appear in [50,51,52,53,58,63,67,68,71,73,77,79,81,85]. Although insightful, they lack rigor and reproducibility. A notable number of techniques remain unevaluated [54,57,59,60,61,62,72,78,80,84,89,92,93], raising concerns about practical readiness.
To avoid ambiguity, we clarify that the distinction between metric-based and anecdotal evaluations is not based on whether the method is quantitative or qualitative, but rather on the presence or absence of methodological structure and reproducibility. Metric-based evaluations, including both quantitative metrics and systematically coded qualitative measures, apply predefined criteria, follow repeatable procedures, and allow comparison across studies. In contrast, anecdotal evidence refers to informal, observational, or illustrative assessments (e.g., subjective visual inspection) that lack systematic protocols and cannot be replicated consistently. Qualitative evaluation can therefore be rigorous when supported by formalized methods; only unsystematic observations fall under the anecdotal category.
Expert verification appears across categories [49,56,58,67,69,71,87,90,91], adding credibility but incurring potential bias and scalability limits. Case-based validation provides domain realism (e.g., confabulation detection [86], co-design [79]), though generalizability can be narrow. Survey-based evaluation is rare (e.g., [50]) despite its value for human interpretability.
These evaluation approaches reflect the third component of the framework: evaluation fidelity. The review shows a heavy reliance on metric-based evaluation that often measures model performance rather than explanation quality, limited user studies despite their centrality to interpretability, and inconsistent metrics across studies that prevent cross-comparison.
In summary, the key gaps and needs concern the lack of standardized evaluation metrics for GenAI explainability, which limits cross-study comparability and reproducibility. Real-world usability testing also remains scarce, while human-centered evaluation dimensions, such as cognitive alignment, trust calibration, and perceived usefulness, are underrepresented. Moreover, many novel techniques are proposed without rigorous empirical validation. Together, these findings underscore the urgent need for comprehensive evaluation frameworks that integrate technical fidelity with human-centered assessment to advance explainability in GenAI.
Further, integrating the two-stage review with the GenAI Explainability Framework reveals a fragmented landscape where techniques, models, and evaluation methods have evolved independently. The theoretical reframing introduced in this article brings conceptual cohesion by demonstrating how transparency, interpretability, and evaluation fidelity jointly shape explainability practices.

4.2.6. Cross-Study Comparisons

Cross-study comparison of the empirical evidence shows that explainability approaches are strongly shaped by the underlying GenAI model family. Transformer-based systems rely heavily on attention visualization, proxy models, or natural-language rationales, whereas GAN-based systems emphasize latent-space manipulations and semantic editing tools. VAE-based systems, by contrast, demonstrate more interpretable latent representations but are primarily used in narrow, structured domains. These differences reveal that explainability methods are not interchangeable across generative architectures, which contributes to the limited generalizability observed in the literature.
A further comparison across application domains shows additional variation. Studies in high-stakes settings (e.g., healthcare, finance, cybersecurity) prioritize evaluation fidelity and robustness, typically using quantitative metrics or expert assessments. In contrast, creative and educational domains tend to adopt more qualitative or user-centered evaluations, often without rigorous testing. This uneven distribution of evaluation strategies limits comparability across studies and reinforces the field’s current fragmentation. Taken together, these analytical contrasts highlight that explainability in GenAI is implemented inconsistently across models, methods, and domains, reinforcing the need for standardized evaluation practices.

5. Open Challenges in Explainable GenAI

The two-stage review reveals that the challenges surrounding explainability in GenAI are not merely isolated technical limitations, but manifestations of a deeper theoretical misalignment between prevailing XAI assumptions and the epistemic structure of generative models. Whereas traditional XAI techniques were developed for deterministic or near-deterministic prediction tasks, generative systems operate through probabilistic, multi-step sampling processes and high-dimensional latent representations. As a result, many commonly reported difficulties—such as unstable explanations, limited generalizability, and weak correspondence between explanations and actual generative behavior—reflect structural limitations of explanation paradigms rather than shortcomings of individual methods.
Across the reviewed literature, these challenges consistently span methodological, computational, evaluative, and ethical dimensions. A recurring issue is the continued reliance on post hoc techniques such as SHAP, LIME, and Grad-CAM, which provide useful surface-level cues but fail to explain the stochastic dynamics or latent-driven mechanisms underlying generation. Importantly, this methodological reliance emerges directly from the empirical patterns identified in Section 4, where adapted XAI techniques function simultaneously as practical workarounds and as constraints on theoretical progress. This persistent gap reinforces the need to reconceptualize explainability for GenAI as a mechanism-oriented and distribution-aware endeavor rather than as an extension of feature-attribution approaches.
To provide a clearer and more coherent structure, the following subsections are organized according to two complementary principles: (1) the stakeholder groups most affected by each challenge (model developers, end users, domain experts, and policymakers), and (2) the relative urgency of the challenge for ensuring trustworthy GenAI deployment. Methodological and architectural gaps primarily affect researchers and developers; scalability and evaluation challenges impact both developers and organizations deploying GenAI systems; and ethical, regulatory, and user-centered issues directly concern policymakers and end users. This framing provides a structured lens for interpreting the open challenges identified in the literature and aligns them with the decision-making needs of different stakeholder communities.

5.1. Lack of Generalizable and GenAI-Specific Frameworks

An important limitation concerns the generalizability of existing explainability techniques. As identified in both review stages, many methods remain domain- or model-specific and struggle to transfer across contexts [8,32,34,35]. This pattern was especially visible in the empirical studies, where explainability approaches often depended heavily on the underlying GenAI architecture (e.g., transformer-based or GAN-based models) and showed limited applicability across model families (see Figure 5).
Future research will likely need to prioritize the development of model-agnostic explainability frameworks capable of functioning across diverse generative architectures and data modalities. Possible directions include leveraging shared latent representations, multimodal embeddings, or causal abstraction layers to facilitate cross-domain interpretability.
Furthermore, as noted in the results, traditional XAI techniques—such as SHAP, LIME, and Grad-CAM—frequently face challenges when applied to generative architectures due to their open-ended and stochastic outputs [35,38,49,51,61,82]. This underscores the growing need for GenAI-specific explainability paradigms that engage more directly with the generative process, such as latent-space reasoning, probabilistic attribution, or prompt-based interpretability mechanisms [53,79].
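One reason feature-attribution methods struggle with stochastic outputs is that an attribution computed from a single sample reflects the noise of that draw. A distribution-aware workaround is to average perturbation-based attributions over many sampled generations. The sketch below illustrates this with a hypothetical scalar "generation score" standing in for a real generative model; the function names and weights are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_score(prompt_tokens, rng):
    """Stand-in for a stochastic generative model: returns a scalar quality
    score for one sampled generation conditioned on the prompt.
    The hidden token weights are purely illustrative."""
    weights = np.array([0.6, 0.1, 0.3])
    noise = rng.normal(scale=0.05)  # sampling stochasticity
    return float(weights[: len(prompt_tokens)] @ prompt_tokens + noise)

def occlusion_attribution(prompt_tokens, n_samples=200):
    """Average occlusion attributions over many sampled generations, so the
    estimate reflects the output distribution rather than a single draw."""
    prompt_tokens = np.asarray(prompt_tokens, dtype=float)
    base = np.mean([generate_score(prompt_tokens, rng) for _ in range(n_samples)])
    attributions = []
    for i in range(len(prompt_tokens)):
        occluded = prompt_tokens.copy()
        occluded[i] = 0.0  # "remove" token i
        drop = base - np.mean([generate_score(occluded, rng) for _ in range(n_samples)])
        attributions.append(drop)
    return attributions

attr = occlusion_attribution([1.0, 1.0, 1.0])
print(attr)  # token 0 recovers the largest importance, token 1 the smallest
```

Averaging over samples recovers the underlying token importances despite the noise in any individual generation, which is the essence of the probabilistic-attribution direction noted above.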

5.2. Scalability and Computational Feasibility

Another notable challenge concerns scalability. Many explainability techniques are computationally expensive, limiting their use in real-time or large-scale GenAI applications [51,69,73,76,80]. Although these methods show promise in controlled experiments, their deployment in production systems remains rare [45]. Future work should explore computationally efficient approximations, surrogate modeling, or incremental explanation generation to balance interpretability with performance demands. Integrating explainability into model training pipelines, rather than applying it post hoc, could further reduce computational overhead while improving relevance and responsiveness.
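Surrogate modeling, mentioned above as one route to computational feasibility, can be sketched in a few lines: sample an expensive model once around a point of interest, fit a cheap linear surrogate, and reuse its coefficients as a local explanation. The "expensive model" below is an invented toy function, not any system from the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(1)

def expensive_model(x):
    """Stand-in for a costly generative scoring function (illustrative)."""
    return 2.0 * x[..., 0] - 0.5 * x[..., 1] + 0.1 * np.sin(x[..., 0])

# Query the expensive model once, offline, around a point of interest ...
x0 = np.array([0.5, 1.0])
X = x0 + rng.normal(scale=0.1, size=(500, 2))
y = expensive_model(X)

# ... then fit a linear surrogate; its coefficients approximate the local
# gradient and can be served cheaply at explanation time.
A = np.hstack([X, np.ones((len(X), 1))])  # design matrix with bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef[:2])  # roughly [2.09, -0.5], the local slopes near x0
```

The one-time sampling cost is paid offline; at inference time the surrogate replaces repeated calls to the expensive model, which is the trade-off the scalability literature above is pointing at.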

5.3. Evaluation Metrics and Benchmarking

Both the umbrella and empirical reviews highlight the absence of standardized evaluation metrics for explainability in GenAI [34,39,72]. Current assessments employ heterogeneous benchmarks—some quantitative, some qualitative, and others purely subjective—making cross-study comparison difficult (Figure 10 and Figure 11). Moreover, evaluation is often limited to technical validation rather than assessing user comprehension or decision support.
To advance the field, a unified evaluation framework should combine fidelity-based metrics (e.g., faithfulness, completeness) with human-centered metrics (e.g., usability, trust calibration, cognitive alignment). Creating benchmark datasets and task-specific testbeds for explainability in GenAI, similar to those proposed by Kumarage and Saarela [100], could further promote reproducibility and comparability across studies.
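A unified framework of this kind ultimately needs a way to combine fidelity-based and human-centered scores into a comparable quantity. The sketch below shows one trivial weighted-combination scheme; the weights and the assumption that all inputs are rescaled to [0, 1] are illustrative choices, not taken from any reviewed study.

```python
def composite_explainability_score(fidelity, usability, trust, w=(0.5, 0.25, 0.25)):
    """Weighted combination of a fidelity metric (e.g., deletion-based
    faithfulness in [0, 1]) with human-centered measures (e.g., SUS-style
    usability and calibrated-trust scores rescaled to [0, 1]).
    The weights are illustrative and would need empirical justification."""
    assert abs(sum(w) - 1.0) < 1e-9, "weights must sum to 1"
    return w[0] * fidelity + w[1] * usability + w[2] * trust

# High fidelity with middling usability and trust yields a moderate score.
print(composite_explainability_score(0.8, 0.6, 0.7))  # 0.725
```

Even a scheme this simple makes the evaluation trade-offs explicit: how much technical faithfulness should count relative to user comprehension is itself a design decision that a standardized framework would have to justify.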
The comparisons across reviews and empirical studies reveal that current explainability research for GenAI lacks convergence in both methodological approach and evaluative rigor. Techniques that perform well for one model family (e.g., GAN latent-space analysis) rarely transfer to others such as transformers or diffusion models. Likewise, evaluation methods vary widely across domains, preventing systematic comparison of explanation quality. This fragmentation limits cumulative progress and makes it difficult to identify best practices. Future work should therefore aim to develop shared frameworks, cross-model benchmarks, and unified evaluation protocols that allow for meaningful comparison across generative architectures and application contexts.

5.4. Explainability Challenges in Multimodal Generative Models

Multimodal GenAI systems such as text-to-image, text-to-video, audio-to-text, and retrieval-augmented models introduce a distinct set of explainability challenges that differ from unimodal generative architectures. Because the input and output modalities differ, explanations must bridge heterogeneous representational spaces. In text-to-image and diffusion-based models, for example, the generative process unfolds across multiple latent-space transformations and iterative denoising steps, making it difficult to attribute specific visual features to individual input tokens or prompts [97,101]. Furthermore, the semantic alignment between textual inputs and visual outputs is often indirect: linguistic concepts are mapped onto high-dimensional visual latent structures that lack clear human-interpretable boundaries, complicating transparency and attribution [102]. These cross-modal mappings are also susceptible to emergent behaviors such as over-literalization, hallucination, or unintended stylistic associations that current attribution methods struggle to capture [103,104].
Another challenge arises from the distributed and compositional nature of multimodal representations. Visual features emerge from interactions among multiple latent components and diffusion steps rather than from a single interpretable unit, limiting the usefulness of traditional attribution, attention maps, or token-level explanations. Explanations are further complicated by the cross-attention mechanisms that bridge text and image modalities: while attention-weight visualizations are common, recent research shows they do not reliably reflect causal or semantic relationships between modalities. Moreover, evaluation fidelity is difficult to establish because explanations must reflect not only what the model generates, but how meaning, style, and structure propagate across modalities. Techniques such as CLIP-based scoring or cross-attention heatmaps offer partial insight but cannot fully represent the multimodal reasoning process [105].
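The aggregation step behind a cross-attention heatmap can be illustrated as follows: per-step, per-head attention weights over image patches and text tokens are averaged into a single patches-by-tokens map. The tensor shapes are hypothetical, and as the text above notes, the resulting map need not reflect causal influence of any token.

```python
import numpy as np

# Hypothetical cross-attention tensor from a text-to-image model, with shape
# (denoising_steps, heads, image_patches, text_tokens). Random data stands in
# for real attention weights; this only demonstrates the aggregation.
rng = np.random.default_rng(2)
attn = rng.random((10, 8, 64, 5))
attn /= attn.sum(axis=-1, keepdims=True)  # normalize over text tokens

# Average over steps and heads to obtain one patches-x-tokens heatmap.
heatmap = attn.mean(axis=(0, 1))
print(heatmap.shape)  # (64, 5): one token-relevance row per image patch
```

Each row remains a distribution over tokens after averaging, which makes such maps easy to visualize; the difficulty discussed above is that this convenience says nothing about whether the weights track the actual cross-modal computation.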
These challenges highlight the need for multimodal-specific explainability techniques that can align latent cross-modal representations, trace how semantic information flows from one modality to another, and communicate these processes in ways that are meaningful to end-users. Addressing these multimodal complexities represents an important future direction for explainable GenAI research.

5.5. Balancing Performance and Interpretability

As with traditional AI, GenAI research continues to face a trade-off between performance and interpretability. While simpler, more transparent models (e.g., VAEs) typically offer greater explainability, they often underperform on more complex generative tasks. Conversely, high-capacity models such as diffusion models or large transformers provide impressive generative capabilities but remain difficult to interpret [55,56]. This tension is particularly relevant in safety- or mission-critical contexts, where explainability cannot be compromised for performance.
Addressing this balance may require hybrid approaches that embed explainability into model optimization processes. Multi-objective optimization techniques or inherently interpretable generative architectures could help align accuracy and transparency goals [106]. Several emerging studies point toward such integration through human–GenAI collaborative frameworks [72,79], though broader methodological advances are still needed.

5.6. Ethical, Regulatory, and User-Centered Alignment

The reviewed literature underscores the increasing importance of ethical and regulatory considerations [56,62,70,80,83]. Policies such as the EU AI Act and Canada’s AIDA emphasize transparency, explainability, and human oversight in high-risk AI systems. However, existing GenAI explanations are often accessible primarily to technical specialists rather than general users or policymakers [107].
Future research should adopt human-centered explainability as a core design principle, ensuring that explanations are not only technically accurate but also accessible, contextual, and meaningful to non-expert stakeholders. Ethical concerns such as fairness, bias mitigation, and the avoidance of misleading or manipulative explanations should also be systematically integrated into GenAI explanation frameworks.

5.7. Synthesis and Outlook

Overall, this review highlights that despite rapid advances, explainability in GenAI remains fragmented. The field continues to lack (1) model-agnostic frameworks; (2) computationally scalable methods; (3) standardized evaluation metrics; (4) approaches balancing interpretability and performance; and (5) alignment with ethical and regulatory expectations.
Building on insights from both the conceptual (first stage) and empirical (second stage) analyses, future research should continue to develop explainability methods that are generalizable, computationally efficient, human-centered, and regulation-aware. Addressing these challenges will support the development of transparent and socially responsible GenAI systems across domains.

6. Practical Recommendations and Future Research Directions

Building on the challenges identified in Section 5, this section presents a set of practical, forward-looking recommendations derived from the two-stage review. These proposals extend beyond identifying gaps to offer concrete strategies for advancing explainability in GenAI. Together, they support a shift from post hoc interpretation techniques toward more integrated, user-centered, and context-aware approaches.

6.1. Integrating Explainability into Model Training Pipelines

A first recommendation is to incorporate explainability directly into generative model training pipelines. Most current methods operate post hoc, which limits their fidelity and contextual relevance. Training-time strategies such as latent-space regularization, attention supervision with human-annotated signals, and the integration of interpretable intermediate layers have the potential to yield generative models with more structured and conceptually meaningful internal representations. Approaches that embed interpretability constraints or auxiliary explanation objectives during model optimization could help bridge the gap between raw generative capacity and actionable transparency.
  • Actionable research questions:
  • How can training-time constraints (e.g., attention supervision, latent regularization) improve attribution fidelity without degrading generative quality?
  • What forms of internal representations are most interpretable for downstream applications?
  • Near-term milestone (1–2 years): Benchmarks for training-time interpretability across major GenAI architectures.
  • Long-term milestone (5+ years): Widely adopted training frameworks where interpretability is a built-in objective.
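The latent-regularization idea above can be illustrated with a toy training objective: reconstruction error plus an L1 penalty that pushes latent codes toward sparse, more inspectable representations. This is a minimal sketch of the general principle, not a method proposed by any reviewed study, and the penalty weight is an arbitrary illustrative value.

```python
import numpy as np

def interpretable_ae_loss(x, x_hat, z, sparsity_weight=0.1):
    """Toy autoencoder objective: mean-squared reconstruction error plus an
    L1 penalty on the latent code z, an example of a training-time
    interpretability constraint."""
    reconstruction = float(np.mean((x - x_hat) ** 2))
    sparsity = float(np.mean(np.abs(z)))
    return reconstruction + sparsity_weight * sparsity

x = np.array([1.0, 0.0, 1.0])
x_hat = np.array([0.9, 0.1, 1.0])
z_dense = np.array([0.5, -0.5, 0.5, -0.5])
z_sparse = np.array([1.0, 0.0, 0.0, 0.0])

# At equal reconstruction error, the sparse code incurs the smaller penalty,
# so optimization is nudged toward representations with few active factors.
print(interpretable_ae_loss(x, x_hat, z_dense) > interpretable_ae_loss(x, x_hat, z_sparse))
```

The trade-off flagged in the research questions above appears directly in `sparsity_weight`: too small and the constraint does nothing, too large and generative quality degrades.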

6.2. Provenance Tracking and Traceability for Generative Outputs

A complementary direction involves establishing robust mechanisms for multimodal provenance tracking. As GenAI models increasingly influence decision-making processes, stakeholders require clearer visibility into the data, retrieval sources, and generative pathways underlying produced outputs. Incorporating structured provenance metadata such as token-level attribution, retrieval logs, and watermarking techniques can enhance traceability and support regulatory compliance. Provenance-aware generative systems may also enable downstream auditors, developers, and end users to assess model behavior in more informed and responsible ways.
  • Actionable research questions:
  • How can token-level provenance be captured without violating privacy or model efficiency?
  • What are the minimal metadata standards required for regulatory compliance?
  • Near-term milestone: Prototype provenance metadata standards for text-to-image and multimodal systems.
  • Long-term milestone: Harmonized provenance protocols integrated into GenAI APIs.
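A structured provenance record of the kind described above might look like the following sketch. The field names are hypothetical and do not represent a proposed or existing metadata standard; they only illustrate how token-level attribution, retrieval logs, and a watermark identifier could travel together with a generated output.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ProvenanceRecord:
    """Hypothetical minimal provenance metadata for one generated output.
    Field names are illustrative, not a proposed standard."""
    output_id: str
    model_version: str
    prompt_hash: str
    retrieval_sources: list = field(default_factory=list)   # e.g., document IDs
    token_attribution: dict = field(default_factory=dict)   # token -> source ID
    watermark_id: str = ""

record = ProvenanceRecord(
    output_id="gen-0001",
    model_version="demo-model-v1",
    prompt_hash="example-hash",
    retrieval_sources=["doc-17", "doc-42"],
    token_attribution={"revenue": "doc-42"},
)
# Serializing to JSON makes the record auditable and transport-friendly.
print(json.dumps(asdict(record), indent=2))
```

Because the record is plain structured data, downstream auditors can verify it without access to the model itself, which is the traceability property the recommendation above is after.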

6.3. Human-Centered Explanation Interfaces and Interaction Design

Explainability should also be framed as an interactional process rather than a static artifact. Human-centered explanation interfaces can provide layered, adaptive insights tailored to different users’ needs. Examples include interactive visualization tools that reveal latent-space trajectories, concept activation vectors, or uncertainty distributions; conversational explanation agents that combine GenAI outputs with contextual justifications; and counterfactual exploration tools that allow users to inspect how alternative prompts or conditions might shape generated outputs. Such interfaces help position explainability as a dynamic, user-driven experience that aligns with the cognitive and domain-specific demands of stakeholders.
  • Actionable research questions:
  • What interaction patterns (e.g., layered explanations, contrastive queries) best support non-experts?
  • How can explanation interfaces adapt dynamically to user goals and expertise?
  • Near-term milestone: Usability-tested multimodal explanation interfaces.
  • Long-term milestone: Standardized interaction guidelines for explainable GenAI tools.

6.4. Human-Centered Explainability and Stakeholder Needs

Human-centered explainability requires designing explanations that align with the cognitive demands, responsibilities, and expertise of the diverse stakeholders who interact with GenAI systems. While technical transparency remains essential, meaningful explainability must reflect the different interpretive needs of creators, regulators, and end-users.
Developers and model creators require mechanistic transparency, including insights into latent-space dynamics, uncertainty estimates, dataset provenance, and known failure modes. These explanations aid debugging, model alignment, safety assessment, and optimization. Regulators, auditors, and governance bodies require accountability-oriented explanations, emphasizing traceability, fairness assessments, robustness indicators, logs of generative pathways, and documentation that supports legal compliance (e.g., explainability requirements under the EU AI Act). Their needs center on reproducibility and risk visibility rather than architectural detail. End-users (e.g., educators, clinicians, designers, consumers) require task-oriented, domain-specific explanations that help them interpret outputs correctly, assess model confidence, and understand how to steer or refine responses. These explanations often take the form of natural-language rationales, visual cues, example-based explanations, or interactive exploratory tools.
Mapping these needs to the GenAI Explainability Triangle reveals a clear alignment: generative-mechanism transparency supports developers, user-centered interpretability supports end-users, and evaluation fidelity supports regulators. This stakeholder-centered framing highlights that explainability in GenAI cannot be one-size-fits-all; instead, it must adapt to the varying roles, goals, and responsibilities across the GenAI ecosystem.
  • Actionable research questions:
  • What level of explanation granularity is optimal for different stakeholders (developers, regulators, end-users)?
  • How can explanations be evaluated for cognitive alignment and decision usefulness?
  • Near-term milestone: Stakeholder-specific explanation taxonomies and evaluation rubrics.
  • Long-term milestone: Regulatory-ready explanation templates tailored to high-risk domains.

6.5. Domain-Adaptive and Standardized Evaluation Protocols

Another recommendation is the development of domain-adaptive evaluation protocols for explainability. Current evaluation practices remain heterogeneous and fragmented, limiting the comparability of techniques across studies. A tiered evaluation framework that integrates fidelity-based metrics with human-centered measures—such as usability, trust calibration, or cognitive alignment—would support more consistent assessment. Establishing benchmark datasets or shared explainability testbeds could further promote methodological convergence and facilitate meaningful comparisons across generative architectures.
  • Actionable research questions:
  • How can fidelity metrics be combined with human-centered metrics to provide holistic evaluation?
  • Can domain-specific benchmarks (e.g., medical, legal, educational) be generalized across sectors?
  • Near-term milestone: Release of public benchmark datasets for GenAI explainability.
  • Long-term milestone: International standards for GenAI explanation quality (ISO/IEC-style).

6.6. Hybrid Neuro-Symbolic and Generative Approaches

Finally, hybrid neuro-symbolic and generative approaches represent a promising direction for future work. Combining symbolic reasoning layers, causal abstraction models, structured knowledge graphs, and generative sampling mechanisms may help bridge the conceptual gap between low-level model mechanics and high-level human-understandable concepts. Such hybrid frameworks could support more transparent and controllable generative processes across diverse application domains.
  • Actionable research questions:
  • How can causal abstraction layers be integrated into generative pipelines?
  • What types of symbolic representations improve transparency without limiting generative flexibility?
  • Near-term milestone: Experimental hybrid architectures demonstrated across multiple modalities.
  • Long-term milestone: Scalable hybrid systems widely deployed in industry.
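A minimal sketch of the neuro-symbolic idea: candidate statements produced by a generative component are checked against a small symbolic knowledge base before being emitted, so the accept/reject decision comes with an explicit, human-readable rationale. The knowledge base and triples below are invented for illustration.

```python
# Tiny symbolic knowledge base of (subject, relation, object) facts.
knowledge = {
    ("aspirin", "treats", "headache"),
    ("insulin", "treats", "diabetes"),
}

def check_candidate(triple):
    """Symbolic verification layer for generated claims: returns the
    decision together with a rationale, making the filter explainable."""
    if triple in knowledge:
        return True, f"supported by KB fact {triple}"
    return False, f"no KB support for {triple}"

ok, reason = check_candidate(("aspirin", "treats", "headache"))
bad, reason2 = check_candidate(("aspirin", "treats", "diabetes"))
print(ok, bad)  # True False
```

Real hybrid systems would replace the set lookup with knowledge-graph queries or causal-abstraction checks, but the design principle is the same: the symbolic layer contributes both a constraint on generation and a ready-made explanation for each decision.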

6.7. A Roadmap for Explainable GenAI

To prioritize the urgency of the challenges identified in this review, we propose the following roadmap:
  • Immediate Priorities (0–2 years)
  • Training-time interpretability constraints.
  • Provenance and traceability mechanisms.
  • User-tested explanation interfaces.
  • Multimodal explainability benchmarks.
  • Medium-Term Priorities (3–5 years)
  • Stakeholder-aligned explanation standards.
  • Domain-specific evaluation frameworks.
  • Hybrid neuro-symbolic–generative prototypes.
  • Long-Term Priorities (5+ years)
  • Model-agnostic explainability frameworks.
  • Regulatory-aligned, auditable GenAI systems.
  • Fully integrated human-centered design pipelines.
This roadmap translates the conceptual gaps identified in both review stages into a structured, actionable research agenda for the field.

7. Conclusions

This study examined the current state of explainability in GenAI through a two-stage systematic review comprising an umbrella review of existing secondary literature and an empirical review of primary studies. From an initial corpus of 261 records identified across six major databases, 63 articles were retained for full-text analysis. The first stage synthesized 18 review papers to map the conceptual and methodological landscape of explainability in GenAI, while the second stage analyzed 45 empirical studies to examine how explainability techniques are implemented, evaluated, and applied across generative architectures and domains.
Findings from both stages reveal that most explainability approaches in GenAI remain adaptations of pre-existing XAI techniques rather than purpose-built, generative-specific methods [35,36,38]. Evaluation practices are heterogeneous and frequently lack standardization [34,39], limiting cross-study comparability and reproducibility. Synthesizing insights across the two stages, the review identifies four interrelated areas that warrant further research: (1) the development of model-agnostic explainability frameworks applicable across diverse GenAI architectures; (2) the establishment of standardized evaluation metrics, benchmarks, and protocols; (3) the resolution of trade-offs between interpretability and generative model performance [55,56]; and (4) the alignment of explainability with ethical, regulatory, and human-centered principles [32,62,83].
By integrating conceptual and empirical perspectives, this review consolidates a fragmented body of work and clarifies methodological trends that define the emerging field of explainable GenAI. It highlights the persistence of traditional XAI paradigms, the scarcity of GenAI-specific methods, and the urgent need for robust evaluation frameworks that combine technical fidelity with human-centered design. Together, these insights provide a foundation for developing GenAI systems that are transparent, interpretable, and trustworthy across domains.
In addition to outlining these challenges, this review also advances a set of practical recommendations aimed at guiding future research and system design. These include integrating explainability directly into generative model training pipelines, incorporating multimodal provenance tracking mechanisms to support transparency and regulatory compliance, designing human-centered explanation interfaces that enable interactive and adaptive engagement, and developing domain-adaptive evaluation protocols that combine fidelity-based and user-centered metrics. Hybrid neuro-symbolic approaches that link low-level generative processes with high-level conceptual reasoning also represent a promising frontier. These recommendations complement the identified challenges and highlight actionable directions for building explainability into GenAI technologies in a more systematic and anticipatory manner.
Nonetheless, several limitations should be acknowledged. A notable limitation is the exclusion of preprints and gray literature. Given the fast-moving nature of GenAI research, preprints frequently introduce new explainability techniques earlier than formal publications. While our focus on peer-reviewed studies ensured rigor and comparability, future reviews would benefit from incorporating preprints to capture cutting-edge innovations and broaden the scope of emerging GenAI-specific explainability methods. Additionally, the search strategy was based on explicit terminology related to “explainability” and “GenAI,” which may have omitted conceptually related work using alternative descriptors. Future reviews could expand this scope by including gray literature, preprints, and interdisciplinary sources.
While the proposed framework is grounded in established theories of scientific, cognitive, and epistemic explanation, it remains a conceptual synthesis rather than a formalized theory with operational metrics. The review does not empirically test how the proposed explanatory dimensions interact in deployed GenAI systems, nor does it quantify trade-offs between transparency, interpretability, and fidelity. Future work should empirically operationalize these dimensions and validate their applicability across different generative architectures and user contexts.
Explainability remains one of the most complex yet critical frontiers in GenAI research. As generative systems become increasingly embedded in creative, decision-making, and safety-critical contexts, progress will depend on interdisciplinary collaboration, transparent governance, and continued innovation in user-centric explainability [72,79]. Advancing these directions is essential to ensure that GenAI evolves not only as a powerful technological paradigm but also as a responsible and interpretable one.

Author Contributions

Conceptualization, P.M.K. and M.S.; methodology, P.M.K. and M.S.; software, P.M.K. and M.S.; validation, P.M.K. and M.S.; formal analysis, P.M.K.; investigation, P.M.K.; resources, P.M.K.; data curation, P.M.K.; writing—original draft preparation, P.M.K.; writing—review and editing, M.S.; visualization, P.M.K.; supervision, M.S.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Academy of Finland (project no. 356314).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset created during the full-text review, including predefined codes and protocol details, is available from the first author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AAEs: Adversarial Autoencoders
ALE: Accumulated Local Effects
AI: Artificial Intelligence
CAM: Class Activation Mapping
CDGM: Causal Deep Generative Models
CNNs: Convolutional Neural Networks
CTGAN: Conditional Tabular Generative Adversarial Network
CVAEs: Conditional Variational Autoencoders
DL: Deep Learning
DMs: Diffusion Models
DT: Decision Tree
EU AI Act: European Union Artificial Intelligence Act
GANs: Generative Adversarial Networks
GenAI: Generative Artificial Intelligence
GPT: Generative Pre-trained Transformer
Grad-CAM: Gradient-Weighted Class Activation Mapping
HCI: Human–Computer Interaction
ICU: Intensive Care Unit
IEEE: Institute of Electrical and Electronics Engineers
IG: Integrated Gradients
IML: Interpretable Machine Learning
IoT: Internet of Things
k-NN: k-Nearest Neighbor
LIME: Local Interpretable Model-agnostic Explanations
LLMs: Large Language Models
LR: Logistic Regression
LRP: Layer-wise Relevance Propagation
ML: Machine Learning
NLP: Natural Language Processing
NN: Neural Network
OSF: Open Science Framework
PDPs: Partial Dependency Plots
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analysis
RAG: Retrieval-Augmented Generation
RISE: Randomized Input Sampling for Explanation
RLHF: Reinforcement Learning from Human Feedback
SHAP: SHapley Additive exPlanations
SLR: Systematic Literature Review
TRMs: Transformer-based Models
VAEs: Variational Autoencoders
XAI: eXplainable Artificial Intelligence

References

  1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  2. Kingma, D.P.; Welling, M. An Introduction to Variational Autoencoders. Found. Trends Mach. Learn. 2019, 12, 307–392. [Google Scholar] [CrossRef]
  3. Sander, M.E.; Giryes, R.; Suzuki, T.; Blondel, M.; Peyré, G. How do transformers perform in-context autoregressive learning? In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), JMLR.org, Vienna, Austria, 21–27 July 2024; Available online: https://proceedings.mlr.press/v235/sander24a.html (accessed on 12 January 2026).
  4. Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.H. Diffusion Models: A Comprehensive Survey of Methods and Applications. ACM Comput. Surv. 2023, 56, 60. [Google Scholar] [CrossRef]
  5. Feuerriegel, S.; Hartmann, J.; Janiesch, C.; Zschech, P. Generative AI. Bus. Inf. Syst. Eng. 2023, 66, 111–126. [Google Scholar] [CrossRef]
  6. Dwivedi, R.; Dave, D.; Naik, H.; Singhal, S.; Omer, R.; Patel, P.; Qian, B.; Wen, Z.; Shah, T.; Morgan, G.; et al. Explainable AI (XAI): Core Ideas, Techniques, and Solutions. ACM Comput. Surv. 2023, 55, 194. [Google Scholar] [CrossRef]
  7. Linardatos, P.; Papastefanopoulos, V.; Kotsiantis, S. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 2020, 23, 18. [Google Scholar] [CrossRef] [PubMed]
  8. Schneider, J. Explainable Generative AI (GenXAI): A Survey, Conceptualization, and Research Agenda. Artif. Intell. Rev. 2024, 57, 289. [Google Scholar] [CrossRef]
  9. Zhao, H.; Chen, H.; Yang, F.; Liu, N.; Deng, H.; Cai, H.; Wang, S.; Yin, D.; Du, M. Explainability for Large Language Models: A Survey. ACM Trans. Intell. Syst. Technol. 2024, 15, 20. [Google Scholar] [CrossRef]
  10. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. Int. J. Surg. 2021, 88, n71. [Google Scholar] [CrossRef]
  11. Sengar, S.S.; Hasan, A.B.; Kumar, S.; Carroll, F. Generative artificial intelligence: A systematic review and applications. Multimed. Tools Appl. 2024, 84, 23661–23700. [Google Scholar] [CrossRef]
  12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30. [Google Scholar]
  13. Chen, M.; Mei, S.; Fan, J.; Wang, M. Opportunities and challenges of diffusion models for generative AI. Natl. Sci. Rev. 2024, 11, nwae348. [Google Scholar] [CrossRef]
  14. Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  15. Decardi-Nelson, B.; Alshehri, A.S.; Ajagekar, A.; You, F. Generative AI and process systems engineering: The next frontier. Comput. Chem. Eng. 2024, 187, 108723. [Google Scholar] [CrossRef]
  16. Zhang, S.; Han, T.; Bhalla, U.; Lakkaraju, H. Unifying AI Attribution: A New Frontier in Understanding Complex Systems; Insight Article; D^3 Institute—Digital Data Design Institute at Harvard: Boston, MA, USA, 2025. [Google Scholar]
  17. Anthony, Q.; Michalowicz, B.; Hatef, J.; Xu, L.; Abduljabbar, M.; Shafi, A.; Subramoni, H.; Panda, D.K.D. Understanding and Characterizing Communication Characteristics for Distributed Transformer Models. IEEE Micro 2025, 45, 8–17. [Google Scholar] [CrossRef]
  18. Belcic, I.; Stryker, C. RAG vs. Fine-Tuning vs. Prompt Engineering. IBM Think 2025. Available online: https://www.ibm.com/think/topics/rag-vs-fine-tuning-vs-prompt-engineering (accessed on 12 January 2026).
  19. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 195. [Google Scholar] [CrossRef]
  20. Lent, M.; Fisher, W.; Mancuso, M. An Explainable Artificial Intelligence System for Small-Unit Tactical Behavior. In Proceedings of the 16th Innovative Applications of Artificial Intelligence Conference, San Jose, CA, USA, 27–29 July 2004; AAAI Press: Washington, DC, USA, 2004; pp. 900–907. [Google Scholar]
  21. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  22. Ahmed, N.A.; Alpkocak, A. A quantitative evaluation of explainable AI methods using the depth of decision tree. Turk. J. Electr. Eng. Comput. Sci. 2022, 30, 2054–2072. [Google Scholar] [CrossRef]
  23. Machamer, P.; Darden, L.; Craver, C.F. Thinking about Mechanisms. Philos. Sci. 2000, 67, 1–25. [Google Scholar] [CrossRef]
  24. Lombrozo, T. The structure and function of explanations. Trends Cogn. Sci. 2006, 10, 464–470. [Google Scholar] [CrossRef]
  25. Miller, T. Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 2019, 267, 1–38. [Google Scholar] [CrossRef]
  26. Larsson, S.; Heintz, F. Transparency in artificial intelligence. Internet Policy Rev. 2020, 9, 1469. [Google Scholar] [CrossRef]
  27. Ensuring Transparency in Generative AI Systems. 2025. Available online: https://palospublishing.com/ensuring-transparency-in-generative-ai-systems/ (accessed on 6 December 2025).
  28. Braun, V.; Clarke, V. Using thematic analysis in psychology. Qual. Res. Psychol. 2006, 3, 77–101. [Google Scholar] [CrossRef]
  29. Da’u, A.; Salim, N. Recommendation system based on deep learning methods: A systematic review and new directions. Artif. Intell. Rev. 2020, 53, 2709–2748. [Google Scholar] [CrossRef]
  30. Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE Technical Report EBSE-2007-01; Software Engineering Group, School of Computer Science and Mathematics, Keele University: Keele, UK, 2007. [Google Scholar]
  31. Saarela, M.; Kärkkäinen, T. Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator. J. Inf. 2020, 14, 101008. [Google Scholar] [CrossRef]
  32. Bushey, J. AI-Generated Images as an Emergent Record Format. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 2020–2031. [Google Scholar]
  33. Hanif, A.; Beheshti, A.; Benatallah, B.; Zhang, X.; Habiba; Foo, E.; Shabani, N.; Shahabikargar, M. A Comprehensive Survey of Explainable Artificial Intelligence (XAI) Methods: Exploring Transparency and Interpretability. In Proceedings of the Web Information Systems Engineering, Victoria, Australia, 25–27 October 2023; Zhang, F., Wang, H., Barhamgi, M., Chen, L., Zhou, R., Eds.; Springer Nature: Singapore, 2023; pp. 915–925. [Google Scholar]
  34. Zarghami, S.; Kouchaki, H.; Yang, L.; Martinez, P. Explainable Artificial Intelligence in Generative Design for Construction. In Proceedings of the 2024 European Conference on Computing in Construction, Crete, Greece, 14–17 July 2024. [Google Scholar]
  35. Longo, L.; Brcic, M.; Cabitza, F.; Choi, J.; Confalonieri, R.; Ser, J.D.; Guidotti, R.; Hayashi, Y.; Herrera, F.; Holzinger, A.; et al. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Inf. Fusion 2024, 106, 102301. [Google Scholar] [CrossRef]
  36. Mudabbiruddin, M.; Mosavi, A.; Imre, F. From Deep Learning to ChatGPT for Materials Design. In Proceedings of the 2024 IEEE 11th International Conference on Computational Cybernetics and Cyber-Medical Systems (ICCC), Hanoi, Vietnam, 4–6 April 2024; pp. 1–8. [Google Scholar]
  37. Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599. [Google Scholar] [CrossRef]
  38. Jain, R.; Jain, A. Generative AI in Writing Research Papers: A New Type of Algorithmic Bias and Uncertainty in Scholarly Work. In Artificial Intelligence and Soft Computing; Rutkowski, L., Ed.; Springer: Berlin/Heidelberg, Germany, 2024; pp. 656–669. [Google Scholar]
  39. Zeiser, T.; Ehret, D.; Lutz, T.; Saar, J. Explainable AI in Manufacturing. In Proceedings of the 2024 IEEE International Conference on Engineering, Technology, and Innovation (ICE/ITMC), Funchal, Portugal, 24–28 June 2024; pp. 1–8. [Google Scholar]
  40. Qu, T.; Yang, Z. Overview of Artificial Intelligence Applications in Educational Research; ISAIE ’24. In Proceedings of the 2024 International Symposium on Artificial Intelligence for Education, New York, NY, USA, 6–8 September 2024; pp. 101–108. [Google Scholar]
  41. Bui, L.V. Advancing patent law with generative AI: Human-in-the-loop systems for AI-assisted drafting, prior art search, and multimodal IP protection. World Pat. Inf. 2025, 80, 102341. [Google Scholar] [CrossRef]
  42. Ye, X.; Yigitcanlar, T.; Goodchild, M.; Huang, X.; Li, W.; Shaw, S.L.; Fu, Y.; Gong, W.; Newman, G. Artificial intelligence in urban science: Why does it matter? Ann. GIS 2025, 31, 181–189. [Google Scholar] [CrossRef]
  43. Demuth, S.; Paris, J.; Faddeenkov, I.; De Sèze, J.; Gourraud, P.A. Clinical applications of deep learning in neuroinflammatory diseases: A scoping review. Rev. Neurol. 2025, 181, 135–155. [Google Scholar] [CrossRef] [PubMed]
  44. López Joya, S.; Diaz-Garcia, J.; Ruiz, M.; Martin-Bautista, M. Dissecting a social bot powered by generative AI: Anatomy, new trends and challenges. Soc. Netw. Anal. Min. 2025, 15, 7. [Google Scholar] [CrossRef]
  45. Abbas, K. Management accounting and artificial intelligence: A comprehensive literature review and recommendations for future research. Br. Account. Rev. 2025, 57, 101551. [Google Scholar] [CrossRef]
  46. Non, L.R.; Marra, A.R.; Ince, D. Rise of the Machines—Artificial Intelligence in Healthcare Epidemiology. Curr. Infect. Dis. Rep. 2025, 27, 4. [Google Scholar] [CrossRef]
  47. Westphal, A.; Mrowka, R. Special issue European Journal of Physiology: Artificial intelligence in the field of physiology and medicine. Pflügers Arch. Eur. J. Physiol. 2025, 477, 509–512. [Google Scholar] [CrossRef]
  48. Mikołajewska, E.; Mikołajewski, D.; Mikołajczyk, T.; Paczkowski, T. Generative AI in AI-Based Digital Twins for Fault Diagnosis for Predictive Maintenance in Industry 4.0/5.0. Appl. Sci. 2025, 15, 3166. [Google Scholar] [CrossRef]
  49. Sun, J.; Liao, V.; Muller, M.; Agarwal, M.; Houde, S.; Talamadupula, K.; Weisz, J. Investigating Explainability of Generative AI for Code through Scenario-based Design. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April–5 May 2022; pp. 212–228. [Google Scholar] [CrossRef]
  50. El-Zanfaly, D.; Huang, Y.; Dong, Y. Sand-in-the-loop: Investigating Embodied Co-Creation for Shared Understandings of Generative AI. In Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS), Pittsburgh, PA, USA, 10–14 July 2023; pp. 256–260. [Google Scholar] [CrossRef]
  51. Ezzahed, Z.; Chevrot, A.; Hurter, C.; Olive, X. Bringing Explainability to Autoencoding Neural Networks Encoding Aircraft Trajectories. In Proceedings of the 13th SESAR Innovation Days 2023, SIDS 2023, Séville, Spain, 27–30 November 2023. [Google Scholar]
  52. Wang, Y.; Shen, S.; Lim, B.Y. RePrompt: Automatic Prompt Editing to Refine AI-Generative Art Towards Precise Expressions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 23–28 April 2023. [Google Scholar] [CrossRef]
  53. Jeong, S.; Li, M.; Berger, M.; Liu, S. Concept Lens: Visually Analyzing the Consistency of Semantic Manipulation in GANs. In Proceedings of the 2023 IEEE Visualization and Visual Analytics (VIS), Melbourne, Australia, 22–27 October 2023; pp. 221–225. [Google Scholar] [CrossRef]
  54. Hasko, R.; Hasko, O.; Kutucu, H. Teaching Assistant Robots in Various Fields: Natural Sciences, Medicine and Specific Non-Deterministic Conditions. In Proceedings of the 6th International Conference on Informatics and Data-Driven Medicine (IDDM 2023), Bratislava, Slovakia, 17–19 November 2023; Volume 3609, pp. 303–309. [Google Scholar]
  55. Minutti, C.; Escalante-Ramírez, B.; Olveres, J. PumaMedNet-CXR: An Explainable Generative Artificial Intelligence for the Analysis and Classification of Chest X-Ray Images. Comput. Sist. 2023, 27, 909–920. [Google Scholar] [CrossRef]
  56. Esposito, M.; Palagiano, F.; Lenarduzzi, V.; Taibi, D. Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis; ESEM ’24. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, Barcelona, Spain, 24–25 October 2024; pp. 517–527. [Google Scholar] [CrossRef]
  57. Moruzzi, S.; Ferrari, F.; Riscica, F. Biases, Epistemic Filters, and Explainable Artificial Intelligence. In Proceedings of the Workshops at the Third International Conference on Hybrid Human-Artificial Intelligence (HHAI 2024), Malmö, Sweden, 10–14 June 2024; CEUR Workshop Proceedings. Volume 3825, pp. 33–36. [Google Scholar]
  58. Pozzi, M.; Noei, S.; Robbi, E.; Cima, L.; Moroni, M.; Munari, E.; Torresani, E.; Jurman, G. Generating and evaluating synthetic data in digital pathology through diffusion models. Sci. Rep. 2024, 14, 28435. [Google Scholar] [CrossRef] [PubMed]
  59. Pontorno, O.; Guarnera, L.; Battiato, S. On the Exploitation of DCT-Traces in the Generative-AI Domain. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 3806–3812. [Google Scholar] [CrossRef]
  60. Riello, P.; Quille, K.; Jaiswal, R.; Sansone, C. Reimagining Student Success Prediction: Applying LLMs in Educational AI with XAI; HCAIep ’24. In Proceedings of the 2024 Conference on Human Centred Artificial Intelligence—Education and Practice, New York, NY, USA, 2–3 December 2024; pp. 34–40. [Google Scholar] [CrossRef]
  61. Sachan, S.; Dezem, V.; Fickett, D. Blockchain for Ethical and Transparent Generative AI Utilization by Banking and Finance Lawyers. In Proceedings of the Explainable Artificial Intelligence, Valletta, Malta, 17–19 July 2024; Longo, L., Lapuschkin, S., Seifert, C., Eds.; Springer: Cham, Switzerland, 2024; pp. 319–333. [Google Scholar] [CrossRef]
  62. Sachan, S.; Liang, X.; Liu, X. Blockchain-based auditing of legal decisions supported by explainable AI and generative AI tools. Eng. Appl. Artif. Intell. 2024, 129, 107666. [Google Scholar] [CrossRef]
  63. Bird, J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access 2024, 12, 15642–15650. [Google Scholar] [CrossRef]
  64. Burgess, M. Deceptive AI dehumanizes: The ethics of misattributed intelligence in the design of Generative AI interfaces. In Proceedings of the 2024 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Liverpool, UK, 2–6 September 2024; pp. 96–108. [Google Scholar] [CrossRef]
  65. Ince, V.; Bader-El-Den, M.; Sari, O. Enhanced dataset synthesis using CTGAN for metagenomic dataset. In Proceedings of the 2024 IEEE 12th International Conference on Intelligent Systems, IS 2024, Varna, Bulgaria, 29–31 August 2024; Sgurev, V., Jotsov, V., Piuri, V., Doukovska, L., Yoshinov, R., Eds.; IEEE: New York City, NY, USA, 2024; pp. 1–6. [Google Scholar] [CrossRef]
  66. Bryan-Kinns, N.; Zhang, B.; Zhao, S.; Banar, B. Exploring Variational Auto-encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI. Mach. Intell. Res. 2024, 21, 29–45. [Google Scholar] [CrossRef]
  67. Abu-Rasheed, H.; Abdulsalam, M.H.; Weber, C.; Fathi, M. Supporting Student Decisions on Learning Recommendations: An LLM-Based Chatbot with Knowledge Graph Contextualization for Conversational Explainability and Mentoring. In Proceedings of the Joint Proceedings of LAK 2024 Workshops Co-Located with 14th International Conference on Learning Analytics and Knowledge (LAK 2024), Kyoto, Japan, 18–22 March 2024; CEUR-WS.org, CEUR Workshop Proceedings. Volume 3667, pp. 230–239. [Google Scholar]
  68. Herdt, R.; Maass, P. Visualize and Paint GAN Activations. In Proceedings of the 2024 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), London, UK, 22–25 September 2024; pp. 1–6. [Google Scholar] [CrossRef]
  69. Balmer, V.; Kuhn, S.; Bischof, R.; Salamanca, L.; Kaufmann, W.; Perez-Cruz, F.; Kraus, M. Design Space Exploration and Explanation via Conditional Variational Autoencoders in Meta-Model-Based Conceptual Design of Pedestrian Bridges. Autom. Constr. 2024, 163, 105411. [Google Scholar] [CrossRef]
  70. Vilone, G.; Sovrano, F.; Lognoul, M. On the Explainability of Financial Robo-Advice Systems. In Explainable Artificial Intelligence for Finance; Springer: Berlin/Heidelberg, Germany, 2024; pp. 219–242. [Google Scholar] [CrossRef]
  71. Durango, I.; Gallud, J.A.; Penichet, V. The data dance: Choreographing seamless partnerships between humans, data, and GenAI. Int. J. Data Sci. Anal. 2024, 20, 3613–3640. [Google Scholar] [CrossRef]
  72. Kim, P.W. A Framework to Overcome the Dark Side of Generative Artificial Intelligence (GAI) Like ChatGPT in Social Media and Education. IEEE Trans. Comput. Soc. Syst. 2024, 11, 5266–5274. [Google Scholar] [CrossRef]
  73. Lee, D.; Lee, J.; Shin, D. GPT Prompt Engineering for a Large Language Model-Based Process Improvement Generation System. Korean J. Chem. Eng. 2024, 41, 3263–3286. [Google Scholar] [CrossRef]
  74. Heo, S.; Byun, J.; Ifaei, P.; Ko, J.; Ha, B.; Hwangbo, S.; Yoo, C. Towards mega-scale decarbonized industrial park (Mega-DIP): Generative AI-driven techno-economic and environmental assessment of renewable and sustainable energy utilization in petrochemical industry. Renew. Sustain. Energy Rev. 2024, 189, 113933. [Google Scholar] [CrossRef]
  75. Jang, S.; Lee, H.; Kim, Y.; Lee, D.; Shin, J.; Nam, J. When, What, and how should generative artificial intelligence explain to Users? Telemat. Inform. 2024, 93, 102175. [Google Scholar] [CrossRef]
  76. Demirbaga, U. Advancing anomaly detection in cloud environments with cutting-edge generative AI for expert systems. Expert Syst. 2024, 42, e13722. [Google Scholar] [CrossRef]
  77. Biswal, S. SCOUT: Surveillance and Cyber harassment Observation of Unseen Threats. In Proceedings of the 2024 International Conference on Artificial Intelligence, Metaverse and Cybersecurity (ICAMAC), Dubai, United Arab Emirates, 25–26 October 2024; pp. 1–6. [Google Scholar] [CrossRef]
  78. Kim, S.S.Y. Establishing Appropriate Trust in AI through Transparency and Explainability. In Proceedings of the CHI Extended Abstracts, Honolulu, HI, USA, 11–16 May 2024; pp. 433:1–433:6. [Google Scholar] [CrossRef]
  79. Ehsan, U.; Riedl, M. Explainable AI Reloaded: Challenging the XAI Status Quo in the Era of Large Language Models; HttF ’24. In Proceedings of the Halfway to the Future Symposium, Santa Cruz, CA, USA, 21–23 October 2024. [Google Scholar] [CrossRef]
  80. Chaccour, C.; Karapantelakis, A.; Murphy, T.; Dohler, M. Telecom’s Artificial General Intelligence (AGI) Vision: Beyond the GenAI Frontier. IEEE Netw. 2024, 38, 21–28. [Google Scholar] [CrossRef]
  81. Pendyala, V.S.; Chintalapati, A. Using Multimodal Foundation Models for Detecting Fake Images on the Internet with Explanations. Future Internet 2024, 16, 432. [Google Scholar] [CrossRef]
  82. Taylor-Melanson, W.; Sadeghi, Z.; Matwin, S. Causal generative explainers using counterfactual inference: A case study on the Morpho-MNIST dataset. Pattern Anal. Appl. 2024, 27, 89. [Google Scholar] [CrossRef]
  83. Hu, Y.; Giacaman, N.; Donald, C. Enhancing Trust in Generative AI: Investigating Explainability of LLMs to Analyse Confusion in MOOC Discussions. In Proceedings of the Joint Proceedings of LAK 2024 Workshops, Kyoto, Japan, 18–22 March 2024. [Google Scholar]
  84. Toth, G.; Albrecht, R.; Pruski, C. Explainable AI, LLM, and digitized archival cultural heritage: A case study of the Grand Ducal Archive of the Medici. AI Soc. 2025, 40, 4561–4573. [Google Scholar] [CrossRef]
  85. Di Lodovico, C.; Torrielli, F.; Di Caro, L.; Rapp, A. How Do People Develop Folk Theories of Generative AI Text-to-Image Models? A Qualitative Study on How People Strive to Explain and Make Sense of GenAI. Int. J. Hum. Comput. Interact. 2025, 42, 14846–14870. [Google Scholar] [CrossRef]
  86. Leimeister, J.M.; Reinhard, P.; Li, M.; Fina, M. Fact or Fiction? Exploring Explanations to Identify Factual Confabulations in RAG-Based LLM Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, ACM, Yokohama, Japan, 26 April–1 May 2025. [Google Scholar] [CrossRef]
  87. Jeck, J.; Leiser, F.; Hüsges, A.; Sunyaev, A. TELL-ME: Toward Personalized Explanations of Large Language Models; CHI EA ’25. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, New York, NY, USA, 11–16 May 2025. [Google Scholar] [CrossRef]
  88. Basaran, O.T.; Dressler, F. XAInomaly: Explainable and interpretable Deep Contractive Autoencoder for O-RAN traffic anomaly detection. Comput. Netw. 2025, 261, 111145. [Google Scholar] [CrossRef]
  89. Ahmed, M.U.; Begum, S.; Barua, S.; Masud, A.N.; Di Flumeri, G.; Navarin, N. Enhancing Explainability, Robustness, and Autonomy: A Comprehensive Approach in Trustworthy AI. In Proceedings of the 2025 IEEE Symposium on Trustworthy, Explainable and Responsible Computational Intelligence (CITREx), Trondheim, Norway, 17–20 March 2025; pp. 1–7. [Google Scholar] [CrossRef]
  90. Bhattacharya, A.; Stumpf, S.; De Croon, R.; Verbert, K. Explanatory Debiasing: Involving Domain Experts in the Data Generation Process to Mitigate Representation Bias in AI Systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, New York, NY, USA, 26 April–1 May 2025. [Google Scholar] [CrossRef]
  91. Katsuragi, M.; Tanaka, K. Comparing AI-Generated and Human-Crafted T-Shirt Layouts through an XAI Lens: Key Design Elements and Implications for Co-Creative Tools. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA ’25, New York, NY, USA, 11–16 May 2025. [Google Scholar] [CrossRef]
  92. Yoshioka, T.; Morikura, Y.; Izumi, T.; Wada, T. Sustainable data-driven framework and policy recommendations for enhancing sports promotion using generative and explainable Artificial Intelligence. J. Phys. Educ. Sport 2025, 25, 638–645. [Google Scholar] [CrossRef]
  93. Rathakrishnan, M.; Gayan, S.; Edirisinghe, S.; Inaltekin, H. A Multi-Model Framework for Synthesizing High-Fidelity Network Intrusion Data Using Generative AI. In Proceedings of the 2025 5th International Conference on Advanced Research in Computing (ICARC), Belihuloya, Sri Lanka, 19–20 February 2025; pp. 1–6. [Google Scholar] [CrossRef]
  94. Future of Life Institute. The Act Texts—EU Artificial Intelligence Act. 2024. Available online: https://artificialintelligenceact.eu/the-act/ (accessed on 28 October 2025).
  95. Salih, A.M.; Raisi-Estabragh, Z.; Galazzo, I.B.; Radeva, P.; Petersen, S.E.; Lekadir, K.; Menegaz, G. A Perspective on Explainable Artificial Intelligence Methods: SHAP and LIME. Adv. Intell. Syst. 2025, 7, 2400304. [Google Scholar] [CrossRef]
  96. Slack, D.; Hilgard, S.; Jia, E.; Singh, S.; Lakkaraju, H. Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods; AIES ’20. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, New York, NY, USA, 7–12 February 2020; pp. 180–186. [Google Scholar] [CrossRef]
  97. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10674–10685. [Google Scholar] [CrossRef]
  98. Elhage, N.; Nanda, N.; Olsson, C.; Henighan, T.; Joseph, N.; Mann, B.; Askell, A.; Bai, Y.; Chen, A.; Conerly, T.; et al. A Mathematical Framework for Transformer Circuits. 2021. Available online: https://transformer-circuits.pub/2021/framework/index.html (accessed on 6 December 2025).
  99. Rauker, T.; Ho, A.; Casper, S.; Hadfield-Menell, D. Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. In Proceedings of the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), Raleigh, NC, USA, 8–10 February 2023; pp. 464–483. [Google Scholar] [CrossRef]
  100. Kumarage, P.; Saarela, M. Explainability in Generative AI: An Umbrella Review of Current Techniques, Limitations, and Future Directions. In Proceedings of the Late-Breaking Work at the 2025 International Conference on Explainable AI (XAI 2025), Istanbul, Turkey, 9–11 July 2025. [Google Scholar]
  101. Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 8780–8794. [Google Scholar]
  102. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, K.; Gontijo Lopes, R.; Karagol Ayan, B.; Salimans, T.; et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: New York, NY, USA, 2022; Volume 35, pp. 36479–36494. [Google Scholar]
  103. Luccioni, A.S.; Akiki, C.; Mitchell, M.; Jernite, Y. Stable bias: Evaluating societal representations in diffusion models; NIPS ’23. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. [Google Scholar]
  104. Hertz, A.; Mokady, R.; Tenenbaum, J.; Aberman, K.; Pritch, Y.; Cohen-Or, D. Prompt-to-Prompt Image Editing with Cross-Attention Control. In Proceedings of the 11th International Conference on Learning Representations (ICLR), La Jolla, CA, USA, 1–5 May 2023. [Google Scholar]
  105. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; Meila, M., Zhang, T., Eds.; Volume 139, pp. 8748–8763. [Google Scholar]
  106. Monteiro, W.R.; Reynoso-Meza, G. A Review of the Convergence Between Explainable Artificial Intelligence and Multi-Objective Optimization. TechRxiv 2022. [Google Scholar] [CrossRef]
  107. DeGrave, A.J.; Cai, Z.R.; Janizek, J.D.; Daneshjou, R.; Lee, S.I. Auditing the inference processes of medical-image classifiers by leveraging generative AI and the expertise of physicians. Nat. Biomed. Eng. 2023, 9, 294–306. [Google Scholar] [CrossRef]
Figure 1. The GenAI Explainability Triangle, which formalizes explainability as an interaction among generative mechanism transparency, user-centered interpretability, and evaluation fidelity.
Figure 2. PRISMA flow diagram illustrating the two-stage review process. A total of 261 records were identified from six databases. After removing duplicates and applying inclusion and exclusion criteria, 63 articles were retained for full-text analysis. The first stage comprised an umbrella review of 18 review papers, and the second stage comprised an empirical review of 45 primary studies.
Figure 3. Distribution of the selected empirical articles across application domains. Technology and software, cybersecurity, and arts and culture were the most frequently addressed areas.
Figure 4. Distribution of the selected empirical articles by GenAI model family. Transformer-based models dominate, followed by Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).
Figure 5. Flow of empirical studies across application domains, GenAI model families, and types of explainability techniques.
Figure 6. Categorization of the explainability techniques used in the empirical articles. The majority of studies relied on either modified or pre-existing techniques, while only a few proposed novel methods.
Figure 7. Grad-CAM visualizations from Bird and Lotfi [63] showing activation heatmaps over real (first row) and AI-generated (second row) images. Features that contribute to the output class label are represented by brighter pixels. Reproduced from Bird and Lotfi [63] under the Creative Commons Attribution License (CC BY 4.0).
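Heatmaps of the kind shown in Figure 7 follow the standard Grad-CAM recipe: channel weights are obtained by global-average-pooling the gradients of the class score with respect to a convolutional layer's feature maps, the maps are combined with those weights, and a ReLU keeps only positively contributing regions. As a rough illustration only, and not the exact pipeline of Bird and Lotfi [63], a minimal NumPy sketch with synthetic activations and gradients:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heatmap from conv-layer activations and gradients.

    activations: (K, H, W) feature maps from the chosen conv layer.
    gradients:   (K, H, W) gradients of the class score w.r.t. those maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                      # shape (K,)
    # Weighted sum of feature maps, then ReLU to keep positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize so brighter pixels mark stronger evidence for the class.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Tiny synthetic example: 2 channels of 4x4 maps (invented for illustration).
acts = np.stack([np.ones((4, 4)), np.zeros((4, 4))])
grads = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -1.0)])
heatmap = grad_cam(acts, grads)   # 4x4 map in [0, 1]
```

In practice the activations and gradients would come from a deep-learning framework's backward pass, and the heatmap would be upsampled to the input image resolution before overlay.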
Figure 8. SHAP summary plot showing global feature importance in the GAN-generated dataset within the CloudGEN framework. Reproduced from Demirbaga [76] under the Creative Commons Attribution License (CC BY 4.0).
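A SHAP summary plot of the kind shown in Figure 8 ranks features by their mean absolute SHAP value across all samples. As a minimal sketch of that global-importance computation, assuming a hypothetical SHAP matrix and invented feature names (cpu_usage, net_io), not Demirbaga's actual data:

```python
import numpy as np

def global_importance(shap_values: np.ndarray, feature_names: list) -> list:
    """Rank features by mean absolute SHAP value across samples.

    shap_values: (n_samples, n_features) matrix of per-sample SHAP values.
    Returns (name, mean_abs_shap) pairs sorted by descending importance.
    """
    mean_abs = np.abs(shap_values).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]

# Hypothetical SHAP matrix: 3 samples x 2 features (values invented).
values = np.array([[ 0.2, -0.5],
                   [-0.1,  0.4],
                   [ 0.3, -0.6]])
ranking = global_importance(values, ["cpu_usage", "net_io"])
# net_io: mean |SHAP| = 0.5; cpu_usage: mean |SHAP| = 0.2.
```

This is the ordering a summary plot displays on its y-axis; the plot itself additionally shows the signed per-sample values, which this aggregate discards.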
Figure 9. Distribution of explainability technique usage across the empirical articles. The numbers in parentheses indicate the total number of techniques used in each category.
Figure 10. Distribution of evaluation methods used for explainability techniques in the selected empirical articles. The majority were evaluated through metric-based approaches.
Figure 11. Distribution of explainability techniques used in chosen empirical articles alongside their quality evaluations.
Table 1. Key aspects highlighting the importance of explainability and transparency in GenAI and their implications.
Key Aspect: Implications for GenAI
Trust and Transparency: Increases user adoption and confidence in AI-generated content.
Regulatory Compliance: AI developers and deploying organizations must align GenAI models with global AI governance standards to ensure lawful deployment.
Bias and Fairness Assessment: Fairer AI-generated results minimize discrimination and improve AI's societal impact.
Mitigation of Misinformation: Helps minimize AI-generated misinformation and propaganda, and improves trust in AI-powered content creation.
Ethical and Legal Considerations: Reduces legal risks, safeguards user rights, and ensures AI-generated content does not violate regulations or norms.
User Control and Interpretability: Enhances transparency in AI interactions by allowing users to refine and oversee model outputs more effectively.
Security and Robustness: Strengthens AI security by detecting malicious manipulations and reducing risks of AI misuse or cyber threats.
Accountability and Auditing: Ensures AI-generated outputs remain accountable and verifiable to promote ethical AI development.
Table 2. Inclusion and exclusion criteria for article selection.
Criterion | Included | Excluded
a. Type of Publication | Peer-reviewed journals and conference papers | Editorial materials, books, book chapters, short surveys, notes, conference reviews, articles, theses, other gray literature
b. Study Topic | GenAI, explainability | Common AI, explainability for common AI
c. Recentness | Publications from 2020 onward | Publications before 2020
d. Language | English | Other languages
e. Type of Study | Different types of literature reviews and empirical studies | Abstracts and proposals only
f. Study Content | Explainability of GenAI | Explainability only for common AI; GenAI only (without an explainability focus)
Table 3. Overview of the included review articles, summarizing their main findings and thematic scope.
Review Article | Main Findings
Bushey [32] | GenAI explainability is underdeveloped but critical, especially as AI-generated images become part of legal records and medical decisions.
Hanif et al. [33] | XAI can bridge the human–AI understanding gap in high-stakes domains.
Zarghami et al. [34] | Emphasizes explainability in construction; proposes a taxonomy and hybrid methods.
Longo et al. [35] | Current XAI methods do not scale to GenAI due to model complexity; promising but untested techniques exist.
Schneider [8] | GenAI explainability is an urgent but underdeveloped area; trust, interactivity, verifiability, evaluation, and cost need attention.
Mudabbiruddin et al. [36] | Traditional XAI techniques (e.g., SHAP, PatternNet) contribute to GenAI output explainability but remain insufficient for full transparency and trust.
Pan et al. [37] | Combining LLMs and knowledge graphs could improve GenAI explainability.
Jain and Jain [38] | XAI techniques like Reinforcement Learning from Human Feedback (RLHF) and post hoc methods fail to adequately address GenAI explainability.
Zeiser et al. [39] | XAI for GenAI in manufacturing is essential but still lacking.
Qu and Yang [40] | ChatGPT offers educational potential, but interpretability and bias remain issues.
Bui [41] | Explainability is crucial for integrating GenAI into patent law; uses traditional XAI techniques (e.g., SHAP, LIME), but gaps persist.
Ye et al. [42] | GenAI has potential in planning and participation, but its opacity poses risks.
Demuth et al. [43] | Clinical GenAI uses post hoc techniques like saliency maps and layer-wise relevance propagation, which help visualize input regions but lack intrinsic interpretability.
López Joya et al. [44] | SHAP aids GenAI-powered bot detection insights, but GenAI-specific XAI remains immature.
Abbas [45] | Current literature lacks empirical studies and practical integration of techniques for GenAI explainability.
Non et al. [46] | GenAI explainability in healthcare epidemiology is viewed as an ethical and conceptual necessity.
Westphal and Mrowka [47] | GenAI explainability is crucial for clinical integration; adaptation of traditional XAI to GenAI is still evolving.
Mikołajewska et al. [48] | Lack of explainability limits GenAI adoption in industrial settings such as digital twins.
Table 4. Overview of the included empirical studies, indicating each study’s main findings and country of origin to illustrate the geographic and thematic diversity of GenAI explainability research.
Empirical Article | Country | Main Findings
Sun et al. [49] | USA | Identifies users’ explainability needs for GenAI for code and proposes four human-centered explainability features to address them.
El-Zanfaly et al. [50] | USA | Demonstrates that explainability of GenAI can be achieved through intuitive, embodied interaction rather than technical explanations.
Ezzahed et al. [51] | Switzerland | Introduces visual latent space analysis to improve the explainability of VAEs in air traffic management.
Wang et al. [52] | Singapore | Shows how XAI techniques like SHAP and PDPs can be used with proxy models to make GenAI prompt editing transparent and interpretable.
Jeong et al. [53] | USA | Introduces Concept Lens, a tool that explains image-based GenAI by visualizing consistent and inconsistent semantic edits in GANs.
Hasko et al. [54] | Ukraine | Demonstrates how integrating XAI with GenAI models like GPT-3.5 in robotic assistants enhances transparency and user trust.
Minutti et al. [55] | Mexico | Introduces a β-VAE-based GenAI model that enhances explainability through interpretable latent space and bias control.
Esposito et al. [56] | Italy | Proposes actionability as a practical, domain-specific proxy for explainability in GenAI systems used for mission-critical risk analysis.
Moruzzi et al. [57] | Italy | Introduces epistemic filters as a conceptual framework to improve GenAI explainability by accounting for both model and user biases in human-AI interactions.
Pozzi et al. [58] | Italy | Applies explainability via Concept Relevance Propagation to assess the fidelity of features learned from GenAI in digital pathology.
Pontorno et al. [59] | Italy | Uses LIME to identify detectable traces left by GenAI models to improve explainability in deepfake detection.
Riello et al. [60] | Italy | Demonstrates that LLM attention scores can offer interpretable, feature-level explanations for educational predictions.
Sachan et al. [61] | UK | Introduces a framework that links GenAI outputs to explainable decisions using Evidential Reasoning and ensures their transparency through blockchain auditing.
Sachan et al. [62] | UK | Ensures explainability and accountability in GenAI outputs by grounding them in XAI-generated legal decisions and tracking their use through immutable records.
Bird and Lotfi [63] | UK | Demonstrates how Grad-CAM can reveal subtle visual cues in AI-generated images in GenAI image detection systems.
Burgess [64] | UK | Shows that demystification improves GenAI explainability by reducing user misattribution of intelligence.
Ince et al. [65] | UK | Shows that using SHAP with CTGAN makes GenAI data augmentation more transparent and interpretable.
Bryan-Kinns et al. [66] | UK | Demonstrates that structuring VAE latent spaces with musical attributes makes GenAI music more interpretable and controllable.
Abu-Rasheed et al. [67] | Germany | Proposes a hybrid system that improves GenAI explainability in education using GPT-4, knowledge graphs, and expert-guided conversational support.
Herdt and Maass [68] | Germany | Presents a method to visualize and control GAN outputs using activation vectors for improved interpretability and structure-level generation.
Balmer et al. [69] | Switzerland | Shows that GenAI can be made explainable by combining CVAEs with decision trees and sensitivity analysis for transparent design exploration.
Vilone et al. [70] | Switzerland | Proposes a legal compliance framework revealing that current GenAI lacks sufficient explainability for financial advice under EU regulations.
Durango et al. [71] | Spain | Introduces the DYNAMIC framework, which enhances GenAI explainability through adaptive, interpretable, and user-driven system design.
Kim [72] | South Korea | Proposes conceptual frameworks (DIKW hierarchy, Human-GenAI collaboration models, and ZPD) to guide the responsible use of text-generating AI by fostering explainability through XAI literacy.
Lee et al. [73] | South Korea | Develops a GPT-based multi-agent system that generates structured, explainable outputs for chemical process improvements.
Heo et al. [74] | South Korea | Shows how Deep SHAP explains GenAI (AAE) energy forecasts by revealing key climate and feature influences.
Jang et al. [75] | South Korea | Provides a user-centered framework identifying when, what, and how explanations should be delivered in GenAI chatbots.
Demirbaga [76] | Turkey | Introduces CloudGEN, a GAN-powered anomaly detection framework that combines generative modeling with SHAP-based explainability in cloud systems.
Biswal [77] | India | Demonstrates how LIME and SHAP can be effectively applied to a fine-tuned GenAI model to provide explanations for detecting subtle cyber harassment.
Kim [78] | USA | Shows that uncertainty expressions can improve trust calibration in LLMs and proposes a framework for explainability in GenAI.
Ehsan and Riedl [79] | USA | Reframes GenAI explainability by proposing human-centered approaches that emphasize actionable understanding over algorithmic transparency.
Chaccour et al. [80] | USA | Highlights the potential of integrating causal AI, XAI, uncertainty, and neuro-symbolic AI to enhance GenAI explainability in telecom networks.
Pendyala and Chintalapati [81] | USA | Shows that LIME and removal-based methods can explain how foundation models detect GenAI-generated fake images.
Taylor-Melanson et al. [82] | Canada | Introduces a set of counterfactual explanation methods using causal GenAI models that enhance the interpretability of image classifiers.
Hu et al. [83] | New Zealand | Shows that integrated gradients can enhance the transparency of LLM predictions in GenAI for education.
Toth et al. [84] | Italy | Demonstrates how ChatGPT-4 and explainable AI can be combined to make archival metadata more transparent and semantically accessible.
Di Lodovico et al. [85] | Italy | Shows that users draw on diverse, evolving folk theories to explain GenAI outputs, revealing gaps in current GenAI explainability tools.
Leimeister et al. [86] | Germany | Demonstrates that tailored explanations for GenAI, such as factual and analogical, can improve users’ ability to detect confabulations in GenAI outputs.
Jeck et al. [87] | Germany | Introduces TELL-ME, a prototype that provides personalized explanations for GenAI outputs based on user expertise.
Basaran and Dressler [88] | Germany | Introduces fastSHAP-C, a real-time method to explain generative autoencoder decisions in O-RAN anomaly detection.
Ahmed et al. [89] | Sweden | Proposes ExplainAgent, a modular tool that unifies existing XAI methods to improve GenAI transparency through user-tailored explanations.
Bhattacharya et al. [90] | Belgium | Shows how explainability tools can support domain experts in guiding and validating GenAI-generated data to reduce bias.
Katsuragi and Tanaka [91] | Japan | Demonstrates that textual explanations from a GenAI model can enhance designer trust and support human-AI collaboration in creative layout tasks.
Yoshioka et al. [92] | Japan | Shows how XAI can make policy insights from GenAI-generated data more transparent.
Rathakrishnan et al. [93] | Sri Lanka | Applies LIME to make CTGAN-generated intrusion data interpretable and enables analysts to understand feature contributions without exposing real data.
Table 5. Categories of explainability techniques utilized for GenAI in empirical articles.
Category | Specific Techniques

Pre-existing Techniques:
- Local Interpretable Model-Agnostic Explanations (LIME) [59,77,81,92,93]
- SHapley Additive exPlanations (SHAP) [65,76,77,92]
- Gradient-weighted Class Activation Mapping (Grad-CAM) [63]
- Integrated gradients [83]
- Heatmaps [85]
- Concept Relevance Propagation (CRP) [58]
- Retrieval-Augmented Generation (RAG) and fine-tuning [56]
- Removal-based explanations [81]
- Machine learning classifiers [59]
- Attention-based explanations [60]
- Decision trees and sensitivity analysis [69]
- Deep SHAP, latent variable analysis, multiple linear regression and spline methods [74]
- Feature attribution, counterfactual explanations, causal reasoning, fairness and bias assessments, self-evaluation metrics, chain-of-thought reasoning [70]
- Causal AI, common XAI and neuro-symbolic AI [80]
- Natural language textual justifications [91]
- AI documentation (fact sheets and model cards) [49]

Modified Pre-existing Techniques:
- Actionability as a proxy [56]
- fastSHAP-C [88]
- Uncertainty expression technique [78]
- Sand playground interface [50]
- Mystification and demystification [64]
- Rationale generation and Seamful XAI [79]
- MeasureVAE and AdversarialVAE [66]
- Uncertainty indicators and attention visualization [49]
- Revised DIKW hierarchy and Zone of Proximal Development (ZPD) [72]
- Factual explanations, analogical explanations, probabilistic explanations, chain-of-thought explanations [86]
- Pixel-based explanations, attribute-based explanations and counterfactual explanations [82]
- DYNAMIC framework: combination of GenAI lens, interpretable neural modules, XAI techniques (LIME, SHAP, counterfactual explanations, concept activation vectors), and real-time visualizations with D3.js [71]
- Combination of activation vector visualization [68]
- Combination of LIME, SHAP, transformer-based attention mechanisms, context-aware and multimodal explanations, and user-centric interactive explanation systems [89]
- Combination of integrated gradients, remove and retrain (ROAR), LLM embedding-based semantic clustering, LLM-based hierarchical node labeling, t-SNE dimensionality reduction, FrameNet-based semantic role labeling via LLM, location–event extraction using WordNet semantics with LLM [84]
- Combination of evidential reasoning, blockchain-based auditing, and anonymized AI prompting [61]
- Combination of feature ablation, counterfactual explanations, and self-explainability [87]
- Combination of data-centric explanations, model impact analysis, local what-if analysis, and transparency measures [90]
- Combination of I-MAKER, C-MAKER, blockchain-based auditing, anonymization of AI prompts and explainable legal reasoning [62]
- Combination of latent response matrix, divergence plots, mean curvature plots, intervention-based analysis and composite visualization [51]
- Combination of GIPHT system, DSFILES (Detailed Simplified Flowsheet Input Line Entry System), literature-based validation and prompt engineering techniques [73]
- Combination of β-VAE, latent space manipulation, weighted masking, and comparative evaluation [55]
- Combination of SHAP, Partial Dependence Plots (PDPs) and proxy model (LightGBM) [52]
- Combination of knowledge graph-based contextualization, re-prompting and intent classification, expert-defined constraints and rules, and human mentor fallback mechanism [67]
- Combination of XAI, federated learning, and human–robot interaction and collaboration [54]
- Combination of timing of explanations, explanation arrangement, accuracy of presentation, and global and local explanations [75]

Novel Techniques:
- Social transparency [49,79]
- Concept lens [53]
- Epistemic filters [57]
- LLMs’ self-explanations [78]
- Human-GenAI collaboration models [72]
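Several of the pre-existing techniques above (SHAP, Deep SHAP, fastSHAP-C) rest on the same underlying idea: a feature's attribution is its Shapley value, the average marginal contribution of that feature over all orderings in which features could be revealed to the model. The following minimal sketch illustrates this principle with a hypothetical three-feature toy scoring function (the features and the model are invented for illustration and do not come from any of the cited studies); it computes exact Shapley values by brute-force enumeration, which is feasible only for a handful of features.

```python
from itertools import permutations

# Hypothetical toy "model": f1 contributes on its own,
# while f2 and f3 only contribute jointly (an interaction term).
def model(features):
    return 2.0 * features["f1"] + 1.0 * features["f2"] * features["f3"]

BASELINE = {"f1": 0, "f2": 0, "f3": 0}  # "feature absent" reference input

def shapley_values(instance):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible feature ordering."""
    names = list(instance)
    contrib = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(BASELINE)
        prev = model(current)
        for name in order:
            current[name] = instance[name]   # reveal this feature's value
            contrib[name] += model(current) - prev
            prev = model(current)
    return {n: contrib[n] / len(orders) for n in names}

phi = shapley_values({"f1": 1, "f2": 1, "f3": 1})
# Attributions sum to model(instance) - model(BASELINE), and the
# f2*f3 interaction is split evenly between f2 and f3.
```

Practical tools avoid this factorial enumeration: SHAP approximates the average by sampling or kernel regression, and amortized variants such as fastSHAP-C train a separate network to predict the attributions in real time.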
Table 6. Multi-dimensional analytical taxonomy of explainability techniques in empirical GenAI studies.
Analytical Dimension | Categories | Representative Studies from This Review

Mechanism of Explanation:
- Feature attribution: SHAP in CTGAN [65], LIME for cyber harassment [77], Grad-CAM for image detection [63]
- Latent space interpretability: β-VAE for bias control [55], VAE music generation [66], latent response matrices [51]
- Surrogate modeling: SHAP + LightGBM proxy [77], decision trees + CVAE [69]
- Causal and counterfactual reasoning: causal GenAI images [82], feature ablation + counterfactuals [87]
- Semantic concept modeling: Concept Lens in GANs [53], CRP in pathology [58]
- Natural-language rationalization: LLM self-explanations [78], natural language design explanations [91]
- Social and interaction-based transparency: social transparency [49,79], embodied sandbox explanations [50]

Timing of Explanation:
- Ante hoc: β-VAE latent control [55], epistemic filters [57], Concept Lens [53]
- Post hoc: LIME [77,93], SHAP [65,76], Grad-CAM [63], attention visualization [60]
- Hybrid (ante + post hoc): DYNAMIC framework [71], ExplainAgent [89], uncertainty-aware GenAI [78]

Scope of Explanation:
- Local: LIME for intrusion data [93], Grad-CAM images [63], counterfactual classifiers [82]
- Global: SHAP global importance [76], sensitivity analysis [69]
- Mixed (local + global): SHAP + PDPs [77], DYNAMIC framework [71], ExplainAgent [89]

Target Audience:
- Developers: SHAP proxy models [52], fastSHAP-C [88], attention analysis [60]
- End users: chatbot explanation timing [75], uncertainty expression [78], co-creation sandbox [50]
- Domain experts: digital pathology [58], air-traffic VAEs [51], chemical process GPT agents [73]
- Learners and educators: MOOCs with integrated gradients [83], TEL mentoring system [67]
- Policymakers and regulators: blockchain-audited GenAI [61,62], legal explainability in robo-advice [70]

Methodological Nature:
- Quantitative: SHAP, PDPs [52,65,76]
- Qualitative: social transparency [49], epistemic filters [57]
- Mixed: ExplainAgent [89], DYNAMIC framework [71]
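The causal and counterfactual dimension of the taxonomy asks a question of the form "what is the smallest change to the input that would flip the model's decision?". The sketch below illustrates that search over hypothetical binary cues; the classifier and feature names are invented stand-ins (loosely in the spirit of the GenAI-image detection studies cited above), not an implementation of any cited method.

```python
from itertools import combinations

# Hypothetical toy detector: flags an input as "AI-generated"
# when at least two of three binary artifact cues are present.
def classifier(x):
    return sum(x.values()) >= 2

def minimal_counterfactual(instance):
    """Return the smallest set of feature flips that changes the
    prediction, searching subsets in order of increasing size."""
    original = classifier(instance)
    names = list(instance)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = dict(instance)
            for name in subset:
                candidate[name] = 1 - candidate[name]  # flip this cue
            if classifier(candidate) != original:
                return subset, candidate
    return (), instance  # no combination of flips changes the decision

flips, cf = minimal_counterfactual({"noise": 1, "texture": 1, "symmetry": 0})
# A single flip suffices here, so the explanation is compact:
# the input would not be flagged without that one cue.
```

Real counterfactual methods replace this exhaustive search with optimization over continuous inputs and add plausibility constraints, but the local, contrastive character of the explanation is the same.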
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Kumarage, P.M.; Saarela, M. Explainable Generative AI: A Two-Stage Review of Existing Techniques and Future Research Directions. AI 2026, 7, 31. https://doi.org/10.3390/ai7010031
